Chinese-Forums

Tools for putting pinyin above characters?


Rowley


Hi!

 

I'm again thinking of reading a lot of books quickly to solidify the words and their usage I've learned over the last while. For that I want to put the pinyin above the words. Are there any tools out there that allow me to do entire books almost in one go, and that give good results?

 

I don't like the mdbg one for example, because that one has an 8000 character limit (which I could potentially live with), but it also gives you multiple pinyin pronunciations for many characters, which I really don't like.

 

An ideal tool would allow me to mark, or upload a list of, the words I don't need annotated.
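To make the request concrete, here is a minimal Python sketch of what such a tool might do: wrap each word in HTML `<ruby>` markup so a browser shows the pinyin above it, and skip any word on a "known" list. The tiny `PINYIN` dictionary, the `annotate` function and the greedy longest-match segmentation are all assumptions made for illustration; a real tool would need a full dictionary and a proper segmenter, and would still pick wrong readings for some characters.

```python
from html import escape

# Hypothetical mini-dictionary (word -> pinyin), for illustration only.
PINYIN = {"我": "wǒ", "喜欢": "xǐhuan", "看书": "kànshū"}


def annotate(text, pinyin, known=frozenset(), max_len=4):
    """Return HTML with pinyin above each dictionary word not in `known`.

    Segmentation is a naive greedy longest match against `pinyin`.
    """
    out = []
    i = 0
    while i < len(text):
        word = text[i]  # fall back to a single character
        # Try the longest candidate first, down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in pinyin:
                word = text[i:i + n]
                break
        if word in pinyin and word not in known:
            out.append(
                f"<ruby>{escape(word)}<rt>{escape(pinyin[word])}</rt></ruby>"
            )
        else:
            # Known words, punctuation and unknown characters pass through.
            out.append(escape(word))
        i += len(word)
    return "".join(out)


if __name__ == "__main__":
    # "我" is on the known list, so it is left unannotated.
    print(annotate("我喜欢看书。", PINYIN, known={"我"}))
```

Saving the output inside an HTML page and opening it in a browser should display the pinyin above the annotated words.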

Link to comment
Share on other sites

Um, I don't know about mdbg... Microsoft Word definitely won't meet your needs it seems lol. How about this one: https://mandarinspot.com/annotate. I just threw a 320k-word novel at it and it didn't complain.  :mrgreen:  It also has a pop-up dictionary, very convenient.
Speaking of which, I used to load Japanese texts into Firefox and use Rikaichan for reading. I'm sure there are similar browser extensions for Chinese learners?
Oh and there's this really cool idea of using a Chinese font with ruby pinyin built in. I haven't tried it though. See the link at the top of this page.

Link to comment
Share on other sites

I would like to do this too, but I don't need to do more than 100 characters at a time. I used to use the Phonetic Guide in MS Word 2007 on Windows XP, but it doesn't work any more now that I am using Windows 7: no ruby text appears. I have enabled Asian Languages.

 

It seems to be quite common that it stops working.

 

http://www.chinese-forums.com/index.php?/topic/13504-ms-word-2007-cannot-get-pinyin-with-phonetic-guide/

 

Does anyone know how to fix this without having to go to a third party website? Is there a MS update or Windows fix?

 

Thanks for any help.

 

 

Link to comment
Share on other sites

 

How about this one: https://mandarinspot.com/annotate. I just threw a 320k-word novel at it and it didn't complain.  :mrgreen:  It also has a pop-up dictionary, very convenient.

 

Thank you! Thank you! Yes! This works magnificently!

 

 

Sorry this isn't helpful, but why don't you just read the books normally?

 

I'm sure we can both write long essays on the pros and cons of this. My reasons: enjoyment, speed, and unambiguity with regard to the pinyin and tones. If it has the pinyin I can enjoy the book without having to mouse over every word I'm not entirely sure of, as the pinyin helps with recognition. More enjoyment and more immersion equals more time spent reading, and I won't get tired or fed up as quickly. With the pinyin above the words I can read significantly faster. As I'm now studying relatively infrequent words, I might only encounter a word a few times in the book. If I can finish the book in a week instead of in 2 months, I will encounter the words I am struggling with more frequently, and again in the next books I can read after this one. And I will often guess wrong with respect to the pinyin and especially the tones, solidifying mistakes in my mind. This eliminates that.

Link to comment
Share on other sites

@Publius I have been to that site and haven't a clue as to what to download. However, I tried Mandarinspot and it works very well; it even gives a list of words if you want to print it out.

 

So I will use that. Thanks

Link to comment
Share on other sites

@Shelley: I realized by third party site you mean PinyinJoe. Yeah, a bit annoying it keeps you jumping through different pages but it's mostly harmless.

The download link I provided in #5 points to the official Microsoft site. Or you can reach the same Microsoft download page by googling MSPY 2010. The "立即下载" (Download now) button will give you IME2010_zh-cn.exe, which is the update/fix you need. If it installs without incident, then you're all set. Fire up MS Word and enjoy. If, on the other hand, it says you don't need the update, then since you are security conscious, I wouldn't recommend you do anything further. MS Word isn't that great anyways.

Link to comment
Share on other sites

@rowley, no pinyin annotation tool is accurate (it's a difficult task) and most will have several errors per paragraph or even sentence.

In my opinion you should not use them for learning new words or for getting confirmation of words you are not 100% sure of.

As an example, here's the first sentence of the foreword in the novel 《活着》. I can't put the annotations above it here, but the 2 characters it makes mistakes with are 为 and 地 (copy it in to the tool if you want to see the mistakes yourself).

一位真正的作家永远只为内心写作,只有内心才会真实地告诉他,他的自私、他的高尚是多么突出。

A brief perusal of other sentences in the novel shows a similar level of mistakes.

You say the purpose of doing this is because you want pinyin and tones to be unambiguous, but the problem is that the ambiguous words are the ones that cause these sorts of tools problems and often they make the incorrect choice when resolving that ambiguity.

The result is that instead of learning how to resolve that ambiguity yourself, you'll simply learn the incorrect pinyin and tones. I don't think this is a good tradeoff to make.

Link to comment
Share on other sites

@imron, thanks for your reply. The list I am ambiguous about (the tones) is much much much larger than the list where there are actually multiple pronunciations for a single character or especially a multiple character word. So in my case I think it is worth it.

 

I think there are only a few instances where there are multiple pronunciations for a multiple character word, so there the concern isn't as high. A more important concern there, I think, is that the segmentation wasn't done correctly. For the single characters it becomes murkier. Of the, I think, 3500+ characters I'm in the process of learning, these 659 have multiple pronunciations (the list will contain mistakes):

 

[List of 659 characters; only 广, 尿, 绿, 沿, 便 and 宿 are legible here.]

 

Single characters like these, which are not necessarily words on their own, are the most difficult for me to learn, especially those that have many meanings and those that have different pronunciations. I'm not sure how to overcome this yet. I was hoping seeing them in context would help, but then the Mandarin Popup addon I use (I now see) also doesn't show all the pronunciations so I don't know. For a subset of these, where the annotator consistently uses one pronunciation and where Mandarin Popup also isn't helping, it's definitely a hindrance more than a help.
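A list like this can be produced roughly as follows. This is only a sketch in Python, assuming dictionary lines in the CC-CEDICT text layout (`Traditional Simplified [pin1 yin1] /definition/`); the function name `polyphonic_chars` and the few sample lines are made up for illustration and are not copied from the real file. Readings are lower-cased so that a capitalised surname reading is not counted as a separate pronunciation.

```python
import re
from collections import defaultdict

# Matches "Traditional Simplified [pinyin] /..." as used by CC-CEDICT.
LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /")

# Illustrative sample lines in that layout (not copied from the real file).
SAMPLE = [
    "# comment lines start with a hash",
    "行 行 [hang2] /row/line/",
    "行 行 [xing2] /to walk/to go/",
    "匡 匡 [Kuang1] /surname Kuang/",
    "匡 匡 [kuang1] /to rectify/",
    "好 好 [hao3] /good/",
]


def polyphonic_chars(lines):
    """Map each single character to its readings if it has more than one."""
    readings = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        m = LINE.match(line)
        if not m:
            continue  # skip anything that is not a dictionary entry
        simplified, pinyin = m.group(2), m.group(3)
        if len(simplified) == 1:
            # Lower-case so "Kuang1" (surname) and "kuang1" count once.
            readings[simplified].add(pinyin.lower())
    return {ch: sorted(r) for ch, r in readings.items() if len(r) > 1}


if __name__ == "__main__":
    print(polyphonic_chars(SAMPLE))  # {'行': ['hang2', 'xing2']}
```

A list built this way includes every rare or archaic reading the dictionary happens to contain, so it overstates how many characters are ambiguous in ordinary reading.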

Link to comment
Share on other sites

One could perhaps use your Chinese Text Analyser, filter on the characters that have multiple translations, and add an (*) or something to them. Not sure how to do that though without a lot of Replace All in Word.

 

Edit: something like that would be useful for more things. Say I wanted to study 250 words in this text, I could mark them to make sure I paid closer attention to them.

 

Edit 2: then even nicer would be if you could mark it the first (1/3), second (2/3) and however many times (23/43) you encountered the word.

 

Edit 3: here's a Google search. It seems there are some Python scripts out there that do this. I'm not familiar with Python. https://www.google.com/webhp?hl=en#hl=en&q=replace+a+list+of+words+in+a+text
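For what it's worth, the marking idea from Edit 2 is only a few lines of Python. The sketch below is an assumption about how one might do it, not any existing tool: `mark_words` is a made-up name, and it uses plain substring matching, so it ignores word segmentation and will also mark a word where it happens to appear inside a longer word.

```python
import re


def mark_words(text, words):
    """Append "(k/n)" after the k-th of n occurrences of each listed word."""
    words = [w for w in words if w]
    if not words:
        return text
    # Longest words first, so a longer word wins over a shorter one it contains.
    pattern = re.compile(
        "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    )

    # First pass: count how often each word occurs in total.
    totals = {}
    for m in pattern.finditer(text):
        totals[m.group()] = totals.get(m.group(), 0) + 1

    # Second pass: insert the running counter after each occurrence.
    seen = {}

    def repl(m):
        w = m.group()
        seen[w] = seen.get(w, 0) + 1
        return f"{w}({seen[w]}/{totals[w]})"

    return pattern.sub(repl, text)


if __name__ == "__main__":
    print(mark_words("他高兴,我也高兴。", ["高兴"]))
    # 他高兴(1/2),我也高兴(2/2)。
```

Replacing the `(k/n)` suffix with a fixed `(*)` would give the simpler marker from the first paragraph.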

Link to comment
Share on other sites

Rowley, I'm severely humbled by your list!

 

 


Someone once said that learning Chinese is "a five-year lesson in humility". I used to think this meant that at the end of five years you will have mastered Chinese and learned humility along the way. However, now having studied Chinese for over six years, I have concluded that actually the phrase means that after five years your Chinese will still be abysmal, but at least you will have thoroughly learned humility.

-- Why Chinese Is So Damn Hard

 

I've racked my brain so hard yet still can't think of a second pronunciation for 有, 个,  纪, 可, 於, 嘿 and it's only the first row... :roll:

Link to comment
Share on other sites

I'm not sure how to overcome this yet.

How are you learning these words, and where are you getting them from?

 

For me, I've found the most effective way to learn words is through context.  If you are working your way through a list of words (or even worse, characters) that someone else has put together, it's a horribly inefficient way to learn.

 

Chinese Text Analyser can't help with what you suggested, but it can help in other ways.  It's designed so you can mark words you are not sure of or aren't confident with and you can then export lists of those words in to other programs (such as Anki or Pleco) for further study to cement them in your mind.

 

Over time (once CTA gets a good feel for your vocabulary) you can also use it to export lists of unknown words to study (either before or after you read a text), giving you confidence in reading but doing away with visual aids (like pinyin annotations) that will actually slow you down in the long run and prevent you from building up decent, character-only reading speed.

Link to comment
Share on other sites

Hi imron, the choice is not really use this method or that, but use this method or none at all. I'll likely decide I don't like reading after all if I can't put the pinyin above the characters. I'll likely decide I don't like learning new words after all if I can't draw from a large ready made list to learn from.

 

With the list it's easy: learn however many words I like today and review them in the future. I can do that mindlessly and for a long time. If I want to learn words from context I never know if this word is useful outside of the particular books I'm reading. I really don't like that. And I have to expend some mental effort every day in making the list. With your tool this becomes more convenient though.

 

The list: 10,000 most frequent words from Subtlex-CH (maybe the next 10,000 later), all the words from the new and old HSK, and all the characters that appear in the Subtlex-CH list. This can be rightly criticized: the segmentation of the Subtlex-CH list is sometimes very bad, and you can argue I don't really need to know the definition of 怀特. The new HSK is good: it gives you an eclectic and useful mix of words, and more importantly perhaps it exposes you to many different characters and their meaning through those words. The old HSK is outdated. I probably don't need to know 共青团 or 红领巾. Cutting up all the words from the Subtlex-CH and learning the characters... doesn't really sound like a good idea, but you're always told that if you know the meaning of individual characters you can guess at the meaning of words, and at the time I didn't have a better list to use. I still don't. It'd probably be a good idea to not learn those characters that can't make a word on their own.

 

Along with the words I also have example sentences, so I do get some limited learning in context and reading practice through that. For the individual characters I often don't though...

 

Somewhere else you wrote it's better to not use a large corpus to draw words from, but use the particular book you are now reading to do that. I disagree. Then you're going to ignore the words that are common through many works but don't necessarily appear often in this work, and learn the words that appear often in this work but don't necessarily appear often in other works. 明朝那些事儿 for example has a lot of names of people and things common to the Ming Dynasty. Do I really want to learn those first instead of more common words used today? And manually sorting through the list would be a bother. I would still learn the x most common words from the book I was reading though as I came upon them.

Link to comment
Share on other sites

I've racked my brain so hard yet still can't think of a second pronunciation for 有, 个,  纪, 可, 於, 嘿 and it's only the first row... :roll:

 

 

CC-CEDICT, I think, gives multiple pronunciations for these. You can look them up in the dictionary. They should have multiple entries.

 

For example:

 

於 wū<书>叹词,表示感叹。另见yū;yú‘于1’。

於  yū姓。另见wū;yú‘于1’。

 

 

嘿 hēi叹词。(1)表示招呼或提起注意:嘿,老张,快走吧!|嘿!我说的你听见没有?(2)表示得意:嘿,咱们生产的机器可实在不错呀!(3)表示惊异:嘿,下雪了!|嘿,这是甚么话!另见mò;‘嗨’另见hāi。

 

嘿  mò同‘默’。另见hēi。

 

Link to comment
Share on other sites

That's fair enough, and everyone has their own style of learning that works for them and that often changes as they progress with learning Chinese.  There was a point when I preferred having pinyin above characters also, but I gave up on it (and eventually grew to dislike it) when I realised I was spending more time on the pinyin than the characters.  The problem became especially apparent when I came across a word I wasn't sure of and my eyes would flit to the pinyin and then move on, rather than actually trying to learn the character I was having trouble with.

 

 

 

I can do that mindlessly and for a long time.

This is partly the problem I have with such an approach. Personally I find learning much more effective when I do it mindfully; otherwise it's just spinning wheels and going through the motions of studying without actually learning all that much. If you've ever been in the situation where you can read every character in a sentence but you have no idea what it means, then that's the sort of thing it leads to.

 

 

With your tool this becomes more convenient though.

Yes, and this was very much one of the purposes of CTA - to help people find relevant words to learn from context.

 

 

 

Then you're going to ignore the words that are common through many works but don't necessarily appear often in this work, and learn the words that appear often in this work but don't necessarily appear often in other works

Not really, see below...

 

 

 

明朝那些事儿 for example has a lot of names of people and things common to the Ming Dynasty. Do I really want to learn those first instead of more common words used today?

Yes!  If it's relevant to what you are reading now, it's far more useful to learn those words, often by an order of magnitude - even if you never see them again in another book (which won't be true anyway - you'll be surprised how often words you learn in some book come up in a completely different book/article by a completely different author in a completely different context).

 

I've done some analysis on this recently, which I plan to write up eventually, but the crux of it is that learning a few hundred words relevant to what you are reading can often give you the same (or greater) increase in understanding as learning a few *thousand* new words from a general wordlist like the HSK or similar.  The difference really is that clear - even across different types of content (novels vs articles), authors and genres.
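One way to check a claim like this on your own texts is to measure coverage: the share of running words in a text that you already know, before and after learning a given set of words. The Python sketch below is a hypothetical illustration, not the analysis described above. It assumes the text has already been segmented into a list of word tokens, and the names `coverage` and `top_unknown` are invented here.

```python
from collections import Counter


def coverage(tokens, known):
    """Fraction of running tokens that are in the `known` vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def top_unknown(tokens, known, n):
    """The n most frequent tokens in this text that are not yet known."""
    counts = Counter(t for t in tokens if t not in known)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Toy pre-segmented text; a real run would use a whole book.
    tokens = ["皇帝", "的", "大臣", "和", "皇帝", "的", "军队"]
    known = {"的", "和"}

    before = coverage(tokens, known)
    # Learn the single most frequent unknown word from this text.
    learned = set(top_unknown(tokens, known, 1))
    after = coverage(tokens, known | learned)
    print(f"{before:.2f} -> {after:.2f}")
```

Running the same comparison with words taken from a general frequency list instead of `top_unknown` would show how much each kind of list raises coverage for a particular book.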

 

That increase in understanding then brings all the benefits you mentioned in your post above: enjoyment and speed, with more enjoyment and more immersion leading to more time spent reading without getting tired or fed up as quickly, plus reading significantly faster and finishing books in a much shorter timeframe.
 

It's far more efficient to learn those other 'more common' words when you are reading something that uses them, rather than learning them now but never encountering them in the content that you are reading.  There's not much point in giving priority to them if there are words more relevant to what you are reading that you could be learning instead.

Link to comment
Share on other sites

@Rowley: Yes, I looked them up in 现代汉语词典 and they do have more than one pronunciation. But some of those alternative readings are so rare I think time is wasted on learning them. For instance, 於 wū <书>叹词, I have never seen it before and I seriously doubt you will. 於 yū 姓, now I am face to face with it, I do recall yeah there is such a thing. But that family name is also very rare. If you meet one, you will never forget. Really no need to learn it separately, is what I'm saying.

As for the 嘿 mò 同‘默’ part, well, one funny thing about Chinese characters like this is, if an established writer, say, Lu Xun, used them that way, they are 通假字; if you use them in your HSK test, they are 错别字. Double standard at its finest.

I'm not going to judge your method. Everyone has his or her own way of doing things. I'm just thinking, information technology has made learning much easier, but it also creates pitfalls. Run a script and you get a vocab list in a second. But it might not be what you wanted. Too many factors. And oftentimes you have to already know it to know whether it's worth the effort.

Ha, idle rambling. Never mind. Good luck with whatever you're studying.

Link to comment
Share on other sites

@Publius, I've counted the characters again, actually there are 4072. Of those 200 are probably really problematic and another 200 less so. At the time I made the list I contacted someone on fiverr.com to look over those 659 characters and asked them to delete pronunciations that were very uncommon. They deleted some, but in all were not that helpful.

 

Making a list like that is somewhat challenging. You don't want to spend weeks trying to find the perfect list without knowing if there is actually such a list. At the time you think you should spend that time learning. And after you've found a list how do you know it is useful? Much later I stumbled on this: http://www.learnm.org/readmeE.html. If I had known about this before, I would have used that. But even this contains, if I recall correctly, some 400 very rarely used characters. I really like this chart for example showing you similar characters: http://www.learnm.org/data/AdjList.txt

 

What characters are difficult? Those that have many and different and abstract definitions in cedict. 嘿 HĀI ; HĒI ; Mò is easy, I should just delete the extra pronunciations when I next come upon it. I can't really find a good example now. Maybe 乃。 I've had trouble remembering the pronunciation for 契 because it ostensibly also was pronounced xiè.

 

So let's say in my list 400 characters are very difficult for me to remember. That's a problem, because it might actually take me an order of magnitude more effort to remember them. So 400 difficult characters take as much effort to learn as 2000 or 4000 easy words --- though that doesn't sound right ...

Link to comment
Share on other sites

@imron, you've shifted my thinking. That's a rare achievement  :mrgreen: Thanks.

 

I'll continue working down the list, but am now also seriously considering incorporating some learning in context and I think I've thought up a very easy way to do that. Only problem now is that I've started to dislike the cedict definitions (not complete and not infrequently wrong) and I haven't yet found an alternative I can download.

 

Then, okay, the next thing is: how do you organize learning in context the easiest and most convenient and most effective way possible? My first target will probably be 武林外传. I've been editing that a bit now and will probably add the result here http://www.chinese-forums.com/index.php?/topic/8967-%E6%AD%A6%E6%9E%97%E5%A4%96%E4%BC%A0-my-own-swordsman/ (The transcript you posted had episode 77 pasted twice and episode 78 is missing.)

Link to comment
Share on other sites

I've been struggling with 匡 because the cedict definition is short and unhelpful ([ Kuāng ] surname Kuang ////// [ kuāng ] to rectify) and does not reflect its usage in the example sentences I am seeing:

 

四周壁下,挨排的放着许多的小白匡床,里面卧着许多小朋友.

 

监狱的大门揭匡一声关上了.

经纪人陆匡时站在那“岗亭”外边和助手谈话.

新法规即将实施以匡助单亲家庭.

 

So here it's the unhelpful definition, rather than that the character is so difficult, that's holding me back.

 

匡 kuāng(1)<书>纠正:匡谬。(2)<书>救;帮助:匡助|匡我不逮(帮助我所做不到的)。(3)<方>粗略计算;估计:匡一匡|匡计|匡算。(4)姓。

Link to comment
Share on other sites
