fenlan Posted August 11, 2005 at 11:15 PM
The 20-volume Oxford English Dictionary claims 600,000 entries, but many are not real words. There are 15 "words" listed as spelt "a", of which only two are what I would consider to be genuine. These are 1. the letter A; and 2. the indefinite article "a". The other 13 are:
1. Obsolete/dialectal use of "a" to mean "one". Examples only from 1200-1483, including "they satte att dyner in a hall and the quene in another" (they sat at dinner in ONE hall and the queen in another). Can this really be said to be a separate word? It is better to pair "one" and "another" and not "a" and "another", but maybe it shouldn't be separately listed in the OED.
2. "a", also spelt a', an abbreviation for "all", especially in Scots, as in Robbie Burns' "a man's a man for a' that, an' a' that". Again, not really a word, but a representation of an abbreviated "all".
3. Obsolete or dialectal for "he", "she", "it" or "they". An example from 1610 is cited where "a" means "he": "a speaks to you players".
4. Colloquial or dialectal for "have", eg "as might a been" instead of "as might have been", and "coulda" instead of "could have".
5. Obsolete, meaning "ever, always". An example from 1220 is given: "that ha schulen lasten a" (that ought to have lasted for ever).
6. A shortened form of an/on used as a prefix. Can either be two words, eg "a horseback" (on horseback), or one word, eg "ashore", and with the gerund, as in "to set the clock a going". Seems a confused entry, varying between a prefix that actually forms part of the word and a prefix that is a separate word. As far as *I* know "a going" should be hyphenated as "a-going".
7. A shortened form of "of", as in "cuppa" or "kinda". Also says that "what manner of men" was originally "what manner a men". I would say this is not really a separate word.
8. Obsolete, meaning "till". Only example is in Old English from 1175 and is not worth quoting.
9. Obsolete, meaning "and" or "if". Example from 1450: "wendyth home a leue youre werryeng" (go home and leave your worrying?).
10. Obsolete or dialect for "O" and "ah". Example from 1485: "a veray God! I am wel dyscomforted" (O very God! I am well discomforted.)
11. Prefix to past participle. Example, "an' we have all a-left the spot". Not 100% obsolete, but 99% so and quaint.
12. Prefix a- meaning various things, eg alive, awake, amid compared with live, wake, mid. Not really a word, as the individual words need to be learned separately. Similar to number 6.
13. Suffix that has a variety of roles, eg showing feminine ending of Latin names, as Alexandr-a, Albert-a, and representing colloquial pronunciation as in "whatta you want". (Can replace do, have, of, to.)
I am wondering how many of these 600,000 words are "real" words?
Jose Posted August 11, 2005 at 11:52 PM
The OED web site has some interesting statistics -> http://www.oed.com/about/facts.html The 600,000 figure (actually 615,100) is the number of word forms defined or illustrated, not the number of entries, which is significantly lower. According to that site, the 1989 edition has 291,500 entries, of which 60,400 are cross-reference entries, and 231,000 main entries. The latter include 12,200 non-naturalized (i.e. foreign) words, and 47,100 obsolete words. There are also 230 "spurious" words (why were these words included, then?). A subtraction (231,000 - 12,200 - 47,100 - 230) gives 171,470, which I guess would be the figure for the main entries that correspond to words in current usage. I still find the figure too high, though. I guess many of those entries must correspond to dialectal or nearly obsolete words.
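As a quick check of the subtraction in the post above, here is a minimal Python sketch. The variable names are made up for illustration; the figures are only the ones quoted in the post, not independently verified OED statistics.

```python
# Figures as quoted in the post (1989 edition of the OED).
# All names below are hypothetical labels chosen for this sketch.
total_entries = 291_500
cross_reference_entries = 60_400
main_entries = 231_000        # main-entry figure as quoted in the post
non_naturalized = 12_200
obsolete = 47_100
spurious = 230

# The subtraction carried out in the post.
current_main_entries = main_entries - non_naturalized - obsolete - spurious
print(current_main_entries)

# Main entries implied by "total minus cross-references".
implied_main_entries = total_entries - cross_reference_entries
print(implied_main_entries)
```

The first value matches the 171,470 given in the post. The second shows that total entries minus cross-reference entries comes to 231,100, which is 100 more than the main-entry figure quoted, so the result may be off by that much depending on which figure is the accurate one.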
fenlan Posted August 12, 2005 at 12:26 AM
Jose, there are also a lot of plant names and suchlike, so you can take a lot off that 171,000. And a lot of words so rare that I doubt *any* native speaker knows them, apart from having accidentally found them in the dictionary, eg "abacinate", which means "to blind by placing hot irons in front of the eyes".
liuzhou Posted August 12, 2005 at 04:06 AM
I'm not really sure what the point of this thread is, but it is important to remember that the OED is an 'historical' dictionary and, as such, naturally includes archaisms. It is universally regarded as the greatest dictionary of any language. (And since when were dialect words not 'real words'?)
Jose Posted August 13, 2005 at 06:11 PM
Dialectal and archaic words are "real words" in a certain sense, but it is important to note that the inclusion of words that are no longer in current use is what makes the OED such a big dictionary. Contrary to a widely believed myth in the English-speaking world, the sheer bulk of the OED doesn't mean that English has a much larger vocabulary than other languages. In fact, there is no clear criterion to define what a word is, and questions about how many words there are in any given language are virtually impossible to answer. The number of words or word forms in the OED shouldn't be regarded as a definitive "number of words" in the English language. Even if we regard two-word collocations like "car park" or "operating theatre" as words, the total number of words in the educated usage of any language will probably fall well below 100,000.
fenlan Posted August 13, 2005 at 08:51 PM
Jose, the vocabulary of educated native speakers of English *is* larger than that of their counterparts in Spain. I don't think emotion should come into it. English *is* deeper and more expressive because it contains Anglo-Saxon, French and Latin roots, as well as Old Norse, Latin and Greek contributions. Spanish speakers know the word "comunicación". English speakers know both "speech" and "communication" - neither of these is rare, but they are used in subtly different circumstances. That said, most of the words in the OED are only there, as you said, to cover the history of English usage from early modern English times to today. It may be the case that only 100,000 words are needed to have native-like fluency. How many words in the ABCD dictionary are known to the average educated native speaker of Chinese? It would be interesting to find out.
Jose Posted August 13, 2005 at 11:53 PM
"Jose, the vocabulary of educated native speakers of English *is* larger than that of the vocabulary of their counterparts in Spain. I don't think emotion should come into it. English *is* deeper and more expressive [...]"
Any evidence for that? "Speech" is not "comunicación" in Spanish. I would translate it as "habla" (as in "forms of speech") or "discurso" (as in "he delivered a speech"). Can you bring up any example of a sentence or nuance that cannot be translated into Spanish (or French, German, Chinese or whatever)? I don't think it's fair to say that one language is more expressive than any other one. The whole Encyclopaedia Britannica, for example, could be translated into Asturian or Frisian or Zhuang or whatever, and I don't think you would lose any information...
fenlan Posted August 13, 2005 at 11:59 PM
Jose, I have information at my fingertips that I can post if you really want me to. But I am worried the discussion is treading on toes and could easily upset someone. Should I send the information, or would it be argumentative to do so? I think I'll leave it - unless you ask me to.
Jose Posted August 14, 2005 at 12:04 AM
Yes, I would like to know that information. I speak both English and Spanish, and I've never felt that I can say more things in one language than in the other. I feel more confident when using Spanish because that's my mother tongue (I couldn't speak much English until I was 15 years old), but I suppose it would be the other way around if I were a native English speaker with a good knowledge of Spanish.
fenlan Posted August 14, 2005 at 12:45 AM
Jose, large English dictionaries contain 200,000 words. Similarly comprehensive Spanish dictionaries contain 100,000 words (see http://spanish.about.com/library/questions/aa-q-size_of_language.htm). Have a little play with the various online thesauri that exist for English and Spanish. Try this:
Sky: There are actually many English words for this: sky, the skies, heaven, the heavens, firmament, welkin, empyrean - all used in slightly different contexts. Also used as nouns meaning "sky" are the blue, the azure. None of these words is obsolete. There are also obsolete words: the loft, the scrow, the skew. The adjectives are: heavenly, celestial, empyreal, supernal.
In terms of the noun for sky, Spanish has the following alternatives: cielo, firmamento.
Of course every sentence in English can be translated into Spanish and vice versa. There will be some English words that require a circumlocution in Spanish, and vice versa (eg "in tertulia" generally needs to be explained). But: the depth of the vocabulary cannot be the same. If I wrote a poem containing "sky", "heaven", "welkin" and "empyrean", and exploited the difference between "sky" and "the skies" and "heaven" and "the heavens", the Spanish translation would be a very, very poor version of the original. It would put over the sense of what was being said, but the contrasts and the lexical beauty of the original could not be replicated.
Arguably, some sentences are strictly untranslatable into Spanish due to the dearth of expressive contrasts. Take this line from Charlotte Brontë's Shirley for example: "I...see a fine, perfect rainbow, bright with promise, gloriously spanning the beclouded welkin of life." This could be translated into Spanish only by saying "heaven of life" - but the original sentence does not say "heaven of life". (Welkin is from the Anglo-Saxon word "wolcen".) I don't expect Spanish had as many obsolete words for the sky either, as it never had the wealth of roots and sources to call on that English had.
trevelyan Posted August 14, 2005 at 01:54 AM
fenlan -- I'm under the impression that the OED attempts to provide the earliest known reference for as many words as it can. This doubtless contributes to some examples seeming dated. It is also useful to remember that even as late as the 1700s there were multiple ways of writing the same words across regions (even folk a-hailing from northern and southern London had recognizably different accents). This may also be partially responsible for the abundance of entries in the OED. Oh... is it really fair to classify "loft" as anachronistic? : )
fenlan Posted August 14, 2005 at 02:16 AM
Trevelyan, the latest example in the OED of "the loft" meaning the sky is from 1590, from Spenser's Faerie Queene: "And ever-drizling raine upon the loft". But I take your point that a more useful exercise in some ways is not to trace the first use of a word, but the last use of it - that would tell you how obsolete it was. The loft could be used in poetry, couldn't it?
paul62tiger Posted August 14, 2005 at 08:27 AM
The OED is used by many people as a reference for whatever subject concerns them at the time. Many of the examples you have given may well be of use to others. Don't be so quick to condemn before you consider the wider use of the work.
fenlan Posted August 14, 2005 at 12:09 PM
Paul, I am not condemning the OED, but trying to work out the size of the English vocabulary.
Jose Posted August 14, 2005 at 10:51 PM
Counting words is a tricky business. The often cited 200,000 figure for the number of words in English is probably based on the number of entries in the OED. But the OED does not have an exact counterpart in other languages. The main Spanish dictionary, the Diccionario de la Real Academia Española (DRAE), like the French Robert or the German Duden, is not a historic dictionary of the language. Entries are eliminated if they have become obsolete. For example, the 1992 edition of the DRAE had 83014 main entries, and the 2001 edition added 11425 new entries and removed 6008, resulting in a new total of 88431 entries.
Another important difference is that the DRAE doesn't include rare Latin or Greek coinages. Only when a word coined from Latin or Greek has appeared in different sources and over a certain span of time is it accepted as part of the lexicon. Take the words for numbers, for example. The highest number in the DRAE is "cuatrillón" (10 raised to the 24th power, following the Continental convention for big numbers). In contrast, the OED includes quintillion, sextillion, septillion, octillion, nonillion, decillion, and several more weird Latin names up to vingtillion, as well as centillion. All such numbers higher than a quintillion seem to be whimsical inventions of lexicographers rather than actual words ever used. It would be unfair to say that the English language has more numbers than the other European languages that don't list such words in their dictionaries. There's no reason why I couldn't coin words like "decillón", "undecillón" or "centillón" in Spanish, but since the words have never been in actual use, no Spanish dictionary has ever included them. This reflects a different lexicographical attitude rather than a difference in vocabulary size.
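The DRAE entry counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are invented for illustration and the figures are the ones given in the post.

```python
# DRAE main-entry counts as quoted in the post.
# Names are hypothetical labels used only for this check.
entries_1992 = 83_014
added_in_2001 = 11_425
removed_in_2001 = 6_008

# New total = old total + additions - removals.
entries_2001 = entries_1992 + added_in_2001 - removed_in_2001
net_change = added_in_2001 - removed_in_2001

print(entries_2001)
print(net_change)
```

The result agrees with the 88431 entries stated for the 2001 edition, a net growth of 5417 entries. That is the sense in which a dictionary that prunes obsolete words grows more slowly than one that keeps everything.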
Another point where there is a fundamental difference between the OED and other dictionaries is in its treatment of dialectal words. I am not sure about this, but I think the OED includes Scots English words, for example. Dictionaries of German or Italian don't usually include dialectal words, unless they have some currency in the written language. Given the amount of dialectal variation in these languages, a dictionary that included any recorded dialectal words could easily reach hundreds of thousands of words. Imagine a historic dictionary of Italian that included up to the last Sardinian word documented in old folk songs. It would be huge indeed, and a lexicographical marvel, like the OED, but it would hardly be a reference tool for those interested in the modern standard Italian language. In this respect, the Spanish DRAE does not appear to be very consistent. Until recently, its treatment of Latin American Spanish was quite poor. It's better now, and that's probably where the bulk of the more than 11,000 new entries in the latest edition come from. However, there are still lots of Latin American colloquialisms that are not included. As for Spain's dialects, there doesn't seem to be a clear criterion. Some Asturian words are included, like "andarica" or "llocántaro", but other similar words like "pixín" or "bugre" (anyone who has visited Asturias knows these words for different kinds of fish and shellfish, as they appear in so many restaurant menus) are not. In general Astur-Leonese words are not included (you won't find "furacu" or "fame", again words every Asturian, like myself, knows), but they seem to make an exception with a few names of plants and animals, like the two I've mentioned. The inclusion criteria seem quite sloppy, to say the least. Anyway, the point I'm trying to make is that depending on where a dictionary draws the line as to what dialectal usages to include, the total number of entries can vary enormously. 
So, lexicographical habits and the scope of dictionaries clearly affect the number of entries they contain. Another problem with counting the main entries of a dictionary is that spelling conventions can affect whether a lexical unit is considered a word. A British person would regard "dustbin" as one word, while an American might feel that "garbage can" is two words. I think such lexical units, even if they are spelt as two or more words, should be regarded as words (like the two examples "car park" and "operating theatre" that I mentioned in a previous post). Otherwise, I am sure German would have many more words than most languages just because of the practice in that language of spelling compounds as one single unit. But note that this doesn't mean that German dictionaries have a large number of entries. On the contrary, German dictionaries list compounds as subentries of a main entry, which makes the number of main entries lower than in the dictionaries of other languages. Similarly, it is difficult to define what constitutes a word in Chinese, where dictionaries are arranged by characters, as everybody here knows, or in Arabic, where dictionaries are arranged by mainly triliteral roots. What I am trying to say is that comparing the number of words in different languages is fraught with pitfalls because of the differences in scope and lexicographical practices among the main dictionaries of the major languages and the very definition of what a word is.
There are two more points I'd like to comment on. First, the idea that English may have more synonyms because it has drawn words from more sources could be true (although I have to mention that Spanish also draws its vocabulary from more sources than Fenlan has mentioned). In any case, I'm not denying the plausibility of that idea. But I disagree with the notion that an abundance of synonyms makes a language more expressive. For example, Spanish has two words, "aceituna" and "oliva", for "olive".
I wonder how that can make the language more expressive at all. There are also at least two alternative words for things like potatoes, pineapples or avocados. This is the result of the geographic extension of the language and the variety of sources it has drawn its vocabulary from. But, frankly, I can't see any advantages in that. Is German a richer language because it has two words for "Saturday"? I can't see how. Purists in most languages tend to criticise the adoption of foreign words if they don't add anything to the language, and that's a correct attitude. Synonyms for the sake of it don't add anything to a language. A second point I wanted to comment on is that speakers of different languages have different expressive mechanisms at their disposal. When I was studying English as a kid growing up in Asturias, it seemed really bizarre to me that both "ser" and "estar" were just one verb "to be" in English. Similarly, it was strange to learn that "saber" and "conocer" could both be translated as "to know". And how could one speak without barely a hint of a subjunctive mood? If I had grown up in Portugal this perception would have been even more acute, since Portuguese has an even more complex tense system than Spanish. The article Fenlan provided a link to states correctly that Spanish also uses the contrastive difference between "noun + adjective" and "adjective + noun" to express different nuances, as well as using a lot of suffixes. This is quite true. A word like "casa" ("house") can be modified in Spanish with a variety of suffixes: "casita", "casina", "casaza", "casona", "casucha". All these forms would require the addition of adjectives to translate them into English, but the nuance of not using an adjective in Spanish would be lost (and, come to think of it, translating the shade of meaning implied by "casaza" and "casucha" is difficult, anyway). Sorry for writing at such length. 
To sum up, I disagree with Fenlan in his view that the superiority of the English lexicon over that of other languages is an undisputed fact, and even more with the idea that English is a more expressive language than others. I agree with him that translating a poem from English into Spanish will divest it of part of its charm, but that is a general problem with translation that works both ways. A poem by Quevedo translated into English will also fail to convey many of the nuances of the original. That's what makes translation such a difficult field. The idea that those of us, poor devils, who express ourselves in an inferior language can only do so in a sloppy and shallow way is absurd. If you could learn enough Spanish to read the works of people like Ciro Alegría, Juan Rulfo, Jorge Luis Borges or Gabriel García Márquez (to name but a few of my favourite authors in Spanish), you would discover an amazing world of expressivity. If you read those authors in translation you will miss the richness of their original language.
By the way, going back to the original topic, I think the OED is an amazing work. I have to admit that it shows that British lexicography is indeed superior to that of other countries. I wish there was a similar dictionary of Spanish. The only comparable dictionary is the "Diccionario de Autoridades" compiled at the beginning of the 18th century. Like the OED, it includes quotes of the first recorded appearances of words in books. Apart from being almost 300 years old, it is not as comprehensive as the OED, though. At the beginning of the 20th century, there was an attempt to compile a historic dictionary, OED-style, of Spanish. The first two volumes, comprising the letters A, B, and C up to "cevilla", were published between 1933 and 1936. Like so many other things, the project was suspended after the outbreak of the Spanish Civil War. The Spanish Real Academia has resumed the project, and I think they are working on it at present, but there don't seem to be any dates for its completion. As for other languages, I think there was also a failed attempt to compile a comprehensive historic dictionary of French. I don't know about other languages.
Dennis Posted August 15, 2005 at 12:01 AM
It is me again. English and Dutch: 5 million words. Dialectal and archaic words are also words. Egg/eieren, eyes/ogen, cow/koe and cat/kat are based on the same root words.
Dennis Posted August 15, 2005 at 12:27 AM
Chinese has the largest vocabulary in the world. Why? Because you cannot separate Modern Chinese from Classical Chinese. Mandarin/Guanhua was used in China from prehistoric times up till 1922 AD, when it was replaced with Mandarin/Guoyu/Putonghua. A lot of Classical Chinese enters the modern language when you are studying Chinese at the intermediate and advanced level. You can separate Old English from Modern English because Old English words like NIMEN do not enter the modern language. "I NIMEN some clothes with me." "I take some clothes with me."
fenlan Posted August 15, 2005 at 12:43 AM
Jose, your reply indicated how political this subject is. I noted on the Internet that only Spanish speakers object to the common description of English as having the largest vocabulary in the world. Some kind of politics is involved.
Now, the number of words in the OED depends on the distinction between headwords and all entries. You will find that "madden" is a headword, but that "maddening" and "maddeningly" fall into the "other entries" category under the "madden" headword. I believe that the 3rd edition of the OED, which is being produced but may not be finished for 20 years, will address this to produce a more logical list of headwords, as maddening and maddeningly should really be headwords in themselves. So, even removing obsolete and dialectal words, you have to deal with the fact that the OED is huge. Its total number of definitions is 616,500, many of which could be dropped, but many of which could also be separate headwords in themselves. The shorter OED, a 2-volume abbreviated version of the full 20-volume work, eliminates most of the obsolete words (while still covering "every word in general usage since 1700"), and has 98,000 headwords, but still manages to give 500,000 definitions.
It seems likely, given the multiplicity of English roots, that whatever principles are used to compile dictionaries, the English language will always have a larger dictionary. The DRAE that you mentioned has 88,000 main entries, but the statistics on the DRAE site, which you can look up, show that it gives nowhere near 500,000 definitions. So an English dictionary compiled on those principles, with a bias towards creating headwords, would have several hundred thousand main entries.
I am afraid that most educated native speakers are aware of the words quintillion, sextillion and septillion at least.
But centillion and vingtillion are not really ever mentioned. You forgot the frequently used "zillion". As for Scots English words "not having currency in the written language" (???) - I have to inform you that is 101% untrue. Scots has its own glorious written tradition. The Scottish poet Robbie Burns wrote in Scots, not English, and Scots was the official language of Scotland until the union of 1707. My copy of the Concise Scots Dictionary runs to 800 pages, and must contain 20,000-30,000 words that are not in what some might term standard English. You are right that spelling conventions do affect the total wordcount, but this is where a consistent and logical determination of what is a "headword" and what is simply "another entry under a headword" is needed to ensure accurate comparison between languages. It seems right that German dictionaries should list endless compounds of headwords as subentries under headwords, as it is simply a matter of convention that these compounds are written as single words anyway. You are right that having two words for "olive" or "to be" does not significantly add to the expressiveness of a language, unless there are great poetical opportunities in exploiting these distinctions. But you have ignored my illustration of the redundancy of Spanish in terms of words for sky/heaven. There are great poetical opportunities in discussing sky and heaven. There are of course some Spanish distinctions hard, or impossible, to put across exactly in English. So, translations from Spanish to English are bound to lose nuances, but this is even more the case - with knobs on, so to speak - the other way round. Speaking barely with a hint of a subjunctive? Who told you that? The subjunctive is a VITAL part of educated modern English - don't let anyone tell you otherwise! 
By the way, I am more interested in seeing a complete and comprehensive dictionary of Mandarin along OED lines (ie, including historical usages, but excluding Classical Chinese expressions if they haven't been used in Mandarin texts) than in seeing one of Spanish or French, as that is the topic of this forum.
fenlan Posted August 15, 2005 at 12:50 AM
Dennis, as I pointed out to you, the dictionary that you said had 5 million words in fact had at least 90% fewer than you said!! Dutch is a language, like German, that has a lot of compound words, and so the policy of which to include as main entries and which to include as subordinate entries is relevant. See my reply to Jose.
Now, as this is the Chinese forum, your comments on classical Chinese in modern Mandarin are relevant. A lot of classical Chinese has appeared in Mandarin, but not all, by any means. If you were to go back to the earliest vernacular novels - Sanguo Yanyi and Shuihuzhuan - and examine the entire corpus of Mandarin, you would find that the entire 60,000 characters used at some point for Chinese as a whole would not be needed. But I don't know how many would be needed. Unicode includes 20,000 Chinese characters. Can all the 5 classic vernacular novels be printed in Unicode characters? Answers on the back of a postcard, please. Clearly, a lot of classical Chinese is part of modern Mandarin.
Dennis, can you tell me, what is the largest Chinese-Chinese dictionary (of words I mean, not characters)? And is that dictionary restricted to Mandarin only, rather than the whole historical sweep of Sinitic?
Dennis Posted August 15, 2005 at 12:54 AM
孟子見梁惠王。 A sentence from the Mencius, but its characters are also used with the same meaning today.