Chinese-forums.com

Is Chinese text on webpages already segmented?


webmagnets

When I long-press a word on an English-language web page, the entire word gets highlighted. I can understand how the browser or OS knows where the word starts and ends, because English has spaces.

However, with Chinese it still knows where the words are. When I long-press the 大 of 大家, the entire 大家 gets highlighted. How does that work?


Publius

Doesn't seem to work for me.

[Two screenshots from Chrome on Android, 2018-12-26]

wibr

On iOS you can see word segmentation happen when you select text, but that's a function provided by iOS, not by the website.

webmagnets

I'm seeing this on my Chromebook, but not on my Android.

imron

I would guess it's real-time segmentation done by the OS. It never has to segment much text, because it only needs to look forward and back to the nearest punctuation and/or whitespace.
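The "look forward and back" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any OS's actual code: it treats whitespace and Unicode punctuation as hard word boundaries, and the function names `is_delimiter` and `context_around` are hypothetical. Given the index the user pressed, it returns only the surrounding run of text, which is all a segmenter would then need to process.

```python
import unicodedata


def is_delimiter(ch: str) -> bool:
    """Whitespace or any Unicode punctuation character (general category P*).

    Assumption for illustration: real implementations follow the full
    Unicode word-break rules rather than this simple test.
    """
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def context_around(text: str, index: int) -> tuple[int, int]:
    """Return (start, end) of the delimiter-free run containing text[index].

    Only this run, not the whole page, would be handed to a segmenter.
    If text[index] is itself a delimiter, an empty span is returned.
    """
    if is_delimiter(text[index]):
        return index, index
    start = index
    while start > 0 and not is_delimiter(text[start - 1]):
        start -= 1  # scan backward to the previous delimiter
    end = index
    while end < len(text) and not is_delimiter(text[end]):
        end += 1  # scan forward to the next delimiter
    return start, end


text = "你好，大家好！今天天气不错。"
start, end = context_around(text, text.index("大"))
print(text[start:end])  # 大家好
```

Full-width Chinese punctuation such as ，！。 falls in Unicode category Po, so it stops the scan just like ASCII punctuation does.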

mikelove

Can confirm this is the case, yes; AFAIK all major OSes now include some sort of Chinese word segmentation support, though not every browser or text editor necessarily taps into it.

The default approach (used by anybody without the AI chops to do better) is ICU's dictionary-based word segmenter, which finds the possible breakdowns of a run of text using a Chinese word list and then picks the most likely one based on word frequencies. (That's pretty much what we do too, though our dictionary is bigger because we're not asking OEMs to devote flash storage to it on a billion devices 🙂)
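A minimal Python sketch of the dictionary-plus-frequency idea follows. It is not ICU's code or data: the tiny word list and its counts are made up for illustration, and the fallback cost for unknown single characters is an assumption. Rather than listing every breakdown, dynamic programming keeps only the cheapest way to reach each position, where a word's cost is its negative log frequency, so the lowest total cost corresponds to the most likely breakdown. The hypothetical `word_at` helper then shows how a long-press on 大 could expand to 大家.

```python
import math

# Hypothetical toy word list with made-up frequency counts, for illustration only.
WORD_FREQ = {
    "大": 50, "家": 40, "好": 80, "大家": 120, "今天": 90, "天": 60,
    "今": 10, "天气": 70, "气": 20, "不": 100, "错": 30, "不错": 60,
}
TOTAL = sum(WORD_FREQ.values())
MAX_LEN = max(len(w) for w in WORD_FREQ)


def word_cost(word: str) -> float:
    """Negative log-probability of a word under the toy unigram model."""
    if word in WORD_FREQ:
        return -math.log(WORD_FREQ[word] / TOTAL)
    if len(word) == 1:
        # Assumption: unknown single characters get a high but finite cost,
        # so any string can still be segmented.
        return -math.log(0.5 / TOTAL)
    return math.inf  # unknown multi-character strings are never chosen


def segment(text: str) -> list[str]:
    """Return the breakdown with the lowest total cost (most probable words)."""
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: lowest cost to segment text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last word on that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            cost = best[start] + word_cost(text[start:end])
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    words = []
    end = n
    while end > 0:  # walk the back-pointers from the end of the text
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def word_at(text: str, index: int) -> str:
    """Return the segmented word covering text[index], as a long-press would select."""
    pos = 0
    for word in segment(text):
        if pos <= index < pos + len(word):
            return word
        pos += len(word)
    raise IndexError(index)


print(segment("大家好"))     # ['大家', '好']
print(word_at("大家好", 0))  # 大家
```

With this toy dictionary, `segment("今天天气不错")` returns `['今天', '天气', '不错']`, because the single dictionary word 天气 costs less than 天 plus 气. A production segmenter differs mainly in the size and tuning of its word list.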
