(Not) finding phonetics that have Outlier entries in Pleco

October 22, 2019 at 05:16 PM

I understood you to say that native speakers don't realise that 監 is the sound component in 藍 and 籃. I wondered what part of those characters they thought was the sound component.

October 22, 2019 at 05:20 PM

Ah. I didn't ask, but I assume they think they're 會意字, which is how native speakers reinterpret most characters. The actual number of 會意字, though, especially ones composed of meaning components (as opposed to form components), is very small. For the native speakers I've interacted with (including Chinese teachers), explaining everything in terms of 會意 is kind of an automatic response.

October 22, 2019 at 05:59 PM

Ash@Outlier said:

No, they haven't been done before. Who did them? Give examples

Sorry, but you go straight on to say that Wenlin has data (but it's not very good) and that it breaks characters down into their component parts (but less well).

You're literally saying that Wenlin does what you do, but you do it better!

Much better? I certainly think so. Really I do. Loads of cool information - I just saw, for instance, the little graphic to show how the 巳 in 导 traces elements of 道 (most teaching would just say it's a simplification of 道 and leave it at that - but your graphic is way way more helpful).

So you put lots of very helpful stuff on a plate for learners. And I can imagine they may well learn faster as a result.

That said, I can imagine that certain kinds of intense Heisigging-plus-sounds, started only after modest competency in spoken Chinese has been achieved, might well be faster. But even such a learner would still want your dictionary later, to make sense of things after the event and to enjoy the fun "aha, that's why there's this, that's why there's that" moments, which are so satisfying once you already know 2 or 3 thousand characters. And not just fun: that would help cement understanding too.

We may have to agree to disagree about whether any of this is revolutionarily new, instead of being a cool dictionary that will help learners loads - especially those being taught characters in an otherwise unsystematic way. (And I do think that Wenlin can boast one feature which you lack, which is to get a list of all the common characters with a given component.)

But maybe I'm just allergic to [edit: excess enthusiasm by a vendor], even when it's rather good stuff that's being [slightly enthusiasticated].

October 22, 2019 at 06:57 PM

@realmayo First of all, I would like to thank you for this discussion. It shows that we can disagree on fundamental things without resorting to insults and sarcasm. Personally, I'm a big fan of sarcasm when talking with friends, but when talking over the internet with people you don't know, it usually leads to arguments and bad blood that could easily be avoided. I'd also like to say that I appreciate your skepticism. I'm totally okay with it. In fact, we are the ones making claims, and it is up to us to back them up.

I'm also okay with agreeing to disagree. I did list 3 points for why I think what we are doing is revolutionary and still stand by them. But, once again, data is not software and software is not data. The searches you're talking about will be added to our stuff. The data is basically ready, minus editing.
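
To make the feature under discussion concrete: a search for all characters containing a given component can be sketched as an inverted index from component to characters. This is only a minimal illustration, not how Outlier, Pleco or Wenlin actually implement it. The function names and the small sample decomposition table are hypothetical, and each character is flattened to one level of components (竹 stands in for the ⺮ form).

```python
from collections import defaultdict

# Hypothetical sample data: character -> its top-level components.
# Illustrative only; not taken from any real dictionary's dataset.
SAMPLE = {
    "藍": ["艹", "監"],
    "籃": ["竹", "監"],
    "濫": ["氵", "監"],
    "鑑": ["金", "監"],
    "道": ["辶", "首"],
    "导": ["巳", "寸"],
}


def build_component_index(decompositions):
    """Invert character -> components into component -> sorted characters."""
    index = defaultdict(set)
    for char, components in decompositions.items():
        for comp in components:
            index[comp].add(char)
    return {comp: sorted(chars) for comp, chars in index.items()}


def characters_with(index, component):
    """Return every indexed character containing `component` (empty if none)."""
    return index.get(component, [])


index = build_component_index(SAMPLE)
print(characters_with(index, "監"))
```

A real version would also need frequency data to restrict results to common characters, and recursive decompositions so that a search for a component also finds characters that contain it only indirectly.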

Btw, I love Wenlin. I've been using it for probably 20 years now. A few years back we were in talks with them to sell our data via Wenlin. That stopped due to the big internal problem we had back then. At some point, I'd like to pick up the conversation with them again. I'd love to see our data there. But it's not just that they break characters down "less well." It's that there is no unified system (unless you're talking about 六書, which is itself problematic). Having a simple unified system helps a lot in making things understandable and in keeping out doubt (which is the enemy of remembering). It also ensures that each character is explained within a familiar framework. Learning 4 component types in order to learn 1000s of characters is cheap, and it's something you'll reuse 1000s of times. It becomes automatic after a while. Also, Karlgren and Wieger simply are not paleographers (paleography basically didn't even exist when they were alive). Karlgren said you don't need to go any further back than Small Seal. It's hard to even conjure the words to express how wrong that statement is.

Also, I agree with you that learning characters is best done after being able to speak at least basic Chinese (more specifically, after having already mastered the sounds of the language), but our dictionary is agnostic to when you learn. It's just there for you whenever you do learn.

The feature we lack is the system-level data. It will probably be added when Pleco 4.0 comes out, or around that time.

I agree with you about the hype bit. I just disagree that we're hyping anything. : )

Once again, thanks for the conversation.

October 22, 2019 at 08:57 PM

I would like to tag on a request to this interesting thread. Can you tell us when written entries will be returning, as that's all many of us really want here.

I'm sure many of us bought into Outlier not because we were expecting it to live up to the Kickstarter hype (yes, it was hype, and thank god it wasn't true, because a good dictionary should take decades to finish, and supporting a Kickstarter in my opinion is about getting niche projects off to a start, not about buying a finished product). I am guessing many were probably hoping, like me, that we might get academic-style writeups added to the Outlier dictionary every month or so. I love reading the updates when new character writeups do come through, but they seem to have become so rare that I sometimes wonder whether any noteworthy progress is being made anymore. I'm not interested in algorithm-generated entries, but rather in the amazing little handcrafted entries that bring characters alive in a way no other book has ever done before (that is where the USP is, for me at least, although whether that counts as 'revolutionary' is another question).

In my humble opinion, if Outlier really wants to be an 'outlier', it should forget about the numbers and about trying to compete with the Cihai for volume of entries (joke...). It would be great if you could get some solid researchers and writers to get those excellent tidbits and stories behind characters back out there and into the dictionary.

@OneEye and @Hofmann are the posters on this forum who first got me really excited about character etymology; I believe they are both part of Outlier, is that still correct? Can you tell us more about what is actually going on with the dictionary at the moment, for those of us not interested in your coding efforts (no offence meant, I'm just not interested in that side of the progress)?

October 22, 2019 at 09:20 PM

Sure. But, like I said, I do both coding and the paleography, so they are interrelated when it comes to progress.

Current state:
I'm now past 2400 entries with completed Essentials paleographic analyses (the dictionary in Pleco currently has 1750, plus roughly 300 component entries).

1. So, for Essentials data, we have roughly 650 new analyses that are waiting to be converted into entries, go through editing, and then be sent to Pleco.

2. But I've also been writing code for the EEE (Expert Entry Expeditor). It's already in working order, and I've made about 10 new Expert entries during testing. The issue now is getting John access to it. This has cost us about a week to 10 days so far. I think I have a solution and will verify it today. If that solution works (and by all accounts it should), then we can start on another 90 Expert entries this week. I estimate that these will progress at 20 entries per week, plus some time on top of that for data testing and editing (say 4 days). We will put out these 100 Expert entries before we get to converting the Essentials entries.

3. We have system-level data basically sitting around until we have time to go through it, edit it, and put it into a format acceptable to Pleco. That data covers all entries currently published, plus some more. I'm thinking that after the Expert data goes out, we'll put out the first 200 to 250 of the 650 Essentials entries that are waiting, then do the system data.
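
The back-of-the-envelope arithmetic behind this plan can be checked quickly. The sketch below reuses only the figures quoted above; it assumes the 20-entries-per-week rate holds, counts a week as 7 calendar days, and treats "past 2400" as exactly 2400, so the backlog is approximate. The variable names are illustrative.

```python
# Figures quoted in the post; names are illustrative, not Outlier's.
ESSENTIALS_COMPLETED = 2400  # "past 2400", so effectively a lower bound
ESSENTIALS_IN_PLECO = 1750   # character entries already in Pleco
EXPERT_REMAINING = 90        # Expert entries still to write
EXPERT_PER_WEEK = 20         # estimated writing rate
TESTING_DAYS = 4             # extra time for data testing and editing
FIRST_BATCH = (200, 250)     # Essentials entries released after the Expert batch

# Essentials analyses finished but not yet published.
backlog = ESSENTIALS_COMPLETED - ESSENTIALS_IN_PLECO

# Calendar days for the Expert batch: writing time plus testing time.
expert_days = EXPERT_REMAINING / EXPERT_PER_WEEK * 7 + TESTING_DAYS

# Essentials entries still waiting after the first batch goes out (min, max).
remaining = (backlog - FIRST_BATCH[1], backlog - FIRST_BATCH[0])

print(f"backlog: {backlog}, Expert batch: {expert_days} days, left: {remaining}")
```

On those assumptions, the backlog is about 650 entries (consistent with item 1), the Expert batch takes roughly five weeks, and 400 to 450 Essentials entries would remain after the first Essentials release.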

The good news is that all of the delays caused by my writing new tools for dictionary development are basically over. Like I said, I already have the EEE working; I just need to get it working for John, then make a few minor adjustments that won't take more than 2 days. After that, we will be in a position to put out a steady flow of new entries (both Essentials and Expert), since all the time that was being spent on coding will go into making new entries. There is still some code I need to finish up for something else (it's already 95% complete), but I should be able to do it in parallel with the new Expert entries.

As for how often the updates will come out, I'm guessing probably every two months, to cut down on testing time (which eats into development time). Of course, if everyone would rather have a slightly slower overall rate so that they can get monthly updates, then I'm open to doing that.

Quote

because a good dictionary should take decades to finish

Indeed. We will be able to finish faster than that because of the custom tools I've created to speed things up. Tools like these weren't available to dictionary makers historically.

@OneEye is still at Outlier. @Hofmann has moved on to greener pastures.


Posters, in order: realmayo, Ash@Outlier, realmayo, Ash@Outlier, Tomsima, Ash@Outlier
