sparrow
Posted December 13, 2013 at 07:04 PM (edited December 15, 2013 at 09:25 AM)

Edit: Please read reply #3 by alanmd on this topic. The Wikipedia frequency list is apparently not what it claims to be and repeats words inappropriately.

Edit: Uploaded a corrected copy of the spreadsheet. There was a small error.

Using spreadsheet formulas, I was able to pull apart the Mandarin word frequency list found on Wikipedia.

Wikipedia Source
PDF File Discussing Methodology (Chen, Tseng, et al.)

According to the above PDF, the list comes from a 14-million-character corpus of Chinese newspapers dating 1993 or earlier.

Attached is the spreadsheet. It contains Simplified, Traditional, Pinyin, and English. The comment in the top-left-most cell contains the RAND() formula, which can be used for sorting groups of characters randomly, essentially shuffling them. They can be put back in order by sorting by entry number.

If people want info on how I personally use this kind of list, let me know and I'll do a write-up.

Statistics

Word Set        Characters in Set
0001–2500       1119
0001–5000       1658
0001–7500       2048
0001–10,000     2397

Attachment: Mandarin_10000_Word_Frequency_List.xls
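For anyone who would rather script this than use spreadsheet formulas, here is a rough Python sketch of the two ideas in the post: shuffling rows and restoring them by entry number, and counting the characters used by the first N words. It assumes the "Characters in Set" figures mean distinct characters, which the post does not spell out. The function names and the five sample rows are made up for illustration and are not taken from the actual spreadsheet.

```python
import random


def distinct_characters(words, limit):
    """Count the distinct characters used by the first `limit` words."""
    seen = set()
    for word in words[:limit]:
        seen.update(word)  # adds each character of the word to the set
    return len(seen)


def shuffle_rows(rows, seed=None):
    """Return a shuffled copy of rows (similar in effect to sorting by a RAND() column)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


def restore_order(rows):
    """Put rows back in list order by sorting on the entry number (first field)."""
    return sorted(rows, key=lambda row: row[0])


# Hypothetical sample rows: (entry number, simplified word).
# These are NOT the real list entries, just placeholders.
sample = [(1, "的"), (2, "我们"), (3, "我"), (4, "他们"), (5, "们")]
sample_words = [word for _, word in sample]

if __name__ == "__main__":
    print(distinct_characters(sample_words, 2))
    print(restore_order(shuffle_rows(sample)) == sample)
```

With the real data, you would load the word column from the spreadsheet into a list (in frequency order) and call distinct_characters with 2500, 5000, 7500, and 10000 to compare against the Statistics table.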