Chinese-forums.com

Is Chinese text on webpages already segmented?

webmagnets

When I go to an English-language web page and long-press on a word, the entire word gets highlighted. I can understand how the browser or OS knows where the word starts and ends, because the words are separated by spaces.

 

However, with Chinese it still knows where the words are. When I long press on the 大 of 大家, the entire 大家 gets highlighted. How does that work?


Publius

Doesn't seem to work for me.

[Two screenshots taken in Chrome on Android, 2018-12-26]

wibr

On iOS you can see word segmentation happening when you select something, but that's a function provided by iOS, not by the website.

webmagnets

I'm seeing this on my Chromebook, but not on my Android.

imron

I would guess it's real-time segmentation done by the OS. It never has to segment much text, because it only needs to look forward and back to the nearest punctuation and/or whitespace.
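
To make the "look forward and back" idea concrete, here is a minimal Python sketch. It is only an illustration of the guess above, not how any particular OS does it; the function names `is_boundary` and `selection_window` are made up for this example, and treating every Unicode punctuation character as a boundary is an assumption.

```python
import unicodedata


def is_boundary(ch: str) -> bool:
    """True for whitespace and any Unicode punctuation character (assumed rule)."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def selection_window(text: str, index: int) -> tuple[int, int]:
    """Return (start, end) of the run of non-boundary characters around index."""
    if is_boundary(text[index]):
        # The press landed on punctuation or whitespace: nothing to segment.
        return index, index + 1
    start = index
    while start > 0 and not is_boundary(text[start - 1]):
        start -= 1
    end = index + 1
    while end < len(text) and not is_boundary(text[end]):
        end += 1
    return start, end


text = "你好，大家都喜欢学中文。明天见！"
start, end = selection_window(text, text.index("大"))
print(text[start:end])  # 大家都喜欢学中文
```

Only that short run between the comma and the full stop would then need to be handed to a word segmenter, however long the page is.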

mikelove

Can confirm this is the case, yes; AFAIK all major OSes now include some sort of Chinese word segmentation support, though not every browser / text editor necessarily taps into it.

 

The default approach (used by anybody without the AI chops to do better) is to use ICU's dictionary-based word segmenter, which finds possible breakdowns using a Chinese word list and then picks the most likely one based on word frequencies. (pretty much the same thing we do, though our dictionary's bigger because we're not asking OEMs to devote flash storage to it on a billion devices 🙂)
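
For anyone curious what "find possible breakdowns, then pick the most likely one" can look like, here is a rough Python sketch of the general technique. It is not ICU's actual code: the word list is tiny, the frequency counts are invented for illustration, and `WORD_FREQ`, `UNKNOWN_COST`, `segment` and `word_at` are hypothetical names. It scores each candidate breakdown by summing negative log frequencies and keeps the cheapest one via dynamic programming.

```python
import math

# Toy word list with made-up frequency counts (hypothetical, illustration only).
WORD_FREQ = {
    "大": 500, "家": 400, "大家": 800, "都": 900, "喜欢": 600,
    "喜": 50, "欢": 40, "学": 300, "中文": 500, "中": 700, "文": 200,
}
TOTAL = sum(WORD_FREQ.values())
MAX_WORD_LEN = max(len(w) for w in WORD_FREQ)
UNKNOWN_COST = 20.0  # assumed penalty for a character missing from the list


def segment(text):
    """Split text into the lowest-cost sequence of dictionary words."""
    n = len(text)
    best = [0.0] + [math.inf] * n   # best[i]: cheapest cost to segment text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            piece = text[start:end]
            if piece in WORD_FREQ:
                cost = -math.log(WORD_FREQ[piece] / TOTAL)
            elif end - start == 1:
                cost = UNKNOWN_COST  # fall back to a single unknown character
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    words = []
    end = n
    while end > 0:                   # walk the back-pointers to recover words
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


def word_at(text, index):
    """Return the segmented word that covers the character at index."""
    pos = 0
    for word in segment(text):
        if pos <= index < pos + len(word):
            return word
        pos += len(word)
    raise IndexError(index)


print(segment("大家都喜欢学中文"))     # ['大家', '都', '喜欢', '学', '中文']
print(word_at("大家都喜欢学中文", 0))  # 大家
```

With these toy numbers, 大家 as one word is cheaper than 大 + 家 as two, which is why a long press on 大 would select both characters.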

