Chinese-Forums

Sinoscope subtitles, a dialogue aid


ParkeNYU


I am working on a program that accepts an audio file containing Chinese speech alongside a corresponding transcript. The program will display the corresponding character of the spoken syllable for a duration commensurate with that of the utterance. In other words, the character will only be visible while the corresponding syllable is being spoken (essentially acting as one-dimensional subtitles). The temporary and volatile nature of the subtitle reflects these properties of speech (you can neither read ahead nor look back to catch what you've missed, just as in oral conversation). I believe that this system would help students with both character recognition and speech comprehension, as their pairing would be mutually beneficial; think of it as 'training wheels'.
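As a rough illustration of the display logic described above, here is a minimal Python sketch. It assumes that per-syllable timings have already been obtained somehow (for example by hand or from an alignment tool); the `Cue` type, the `visible_char` function and the timings themselves are hypothetical and are not taken from the actual program.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Cue(NamedTuple):
    char: str     # one character of the transcript
    start: float  # syllable onset, in seconds
    end: float    # syllable offset, in seconds


def visible_char(cues: List[Cue], t: float) -> Optional[str]:
    """Return the character to show at time t, or None during silence.

    Assumes cues are sorted by start time and do not overlap.
    """
    starts = [c.start for c in cues]
    i = bisect_right(starts, t) - 1  # last cue that starts at or before t
    if i >= 0 and cues[i].start <= t < cues[i].end:
        return cues[i].char
    return None


# Hypothetical timings for the three syllables of 你好吗
cues = [Cue("你", 0.00, 0.25), Cue("好", 0.25, 0.55), Cue("吗", 0.70, 0.95)]
```

A player would call `visible_char(cues, t)` on every frame with the current playback position, so a character is on screen only while its syllable is being spoken and nothing is shown during pauses.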

 

Here is a sample in Mandarin that I made as a proof-of-concept video:

 

https://www.youtube.com/watch?v=q0jX-p-uGkY&feature=youtu.be


This looks interesting. It also has the same properties when used with English.

 

It is a theory that if you put up English words in the same place on the screen you can increase the speed at which you can read and understand.

 

There are a couple of real-world examples; one, if I remember correctly, is used by Honda in one of their TV ads.

 

http://creativity-online.com/work/honda-uk-keep-up/39050

 

It reckons the rate is 500 words per minute.

 

The more I think about this, the more I think your idea has serious merit.


Thanks for the Honda ad; it really helps support my hypothesis!

 

While it is possible to implement this system in just about any language, I think it is most compatible with Chinese languages because:

 

1) Chinese characters have perfectly square dimensions. Most written languages transcribe units of speech as linear strings of glyphs (usually horizontally, but sometimes vertically, as in Mongolian and Manchu). These equilateral shapes are easier for the eye to scan quickly than strings of varying lengths. The monosyllabic words 'I' and 'twelfths' would occupy different amounts of space, thus straining the reader. It would also be necessary to determine whether such strings should be centered on screen or left-justified (or otherwise right-justified for Arabic and Hebrew); the former would encourage the reader to process the entire word as a single graphic unit, whilst the latter would prompt the reader to 'sound-out' the word from its initial letter.

 

2) Chinese languages have one syllable per morpheme per character (with extremely rare exceptions). This sets them apart from Korean Hangeul, which, whilst also squarely proportioned, can have phonemes that bleed into neighboring syllable blocks (e.g. a word spelt 'hag-a' yet pronounced as 'ha-ga'). Even if one were to render Japanese entirely in Kana, it operates on the level of the 'mora' rather than the syllable, which also disqualifies it (e.g. 'kou-sen' deconstructed as 'ko-u-se-n').


I believe it is called the stationary-window condition, and it is the third of three conditions they tried for speeding up reading:

 

1) the text appears in the top left of the "window" and new words are added without taking away the old words.

 

2) the text appears in the centre of the screen but the new word appears next to the old word which disappears once the new word is there.

 

3) the words appear one after another quickly in the centre of the screen with the old word disappearing as soon as the new word appears.
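To make the third condition concrete, here is a minimal Python sketch of the timing it implies: every word gets the same slot at a fixed rate, and each word replaces the previous one in the same spot. The function name `rsvp_schedule` and the sample words are made up for illustration; the 500 words per minute figure is the one quoted for the Honda ad earlier in the thread.

```python
from typing import List, Tuple


def rsvp_schedule(words: List[str], wpm: float) -> List[Tuple[str, float, float]]:
    """Give each word an equal on-screen interval (start, end) in seconds."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    per_word = 60.0 / wpm  # seconds each word stays on screen
    return [(w, i * per_word, (i + 1) * per_word) for i, w in enumerate(words)]


# Hypothetical input: at 500 wpm each word is shown for 60 / 500 = 0.12 s
schedule = rsvp_schedule(["keep", "up", "with", "this"], 500)
```

The difference from the subtitle idea in the first post is that the interval here is fixed by the chosen rate, whereas there each character's interval follows the duration of the spoken syllable.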

 

There is an excerpt here https://books.google.co.uk/books?id=zCxf_LVzXoEC&pg=RA1-PT175&lpg=RA1-PT175&dq=stationary-window+condition&source=bl&ots=RsBBb739GB&sig=EIcXKXt3eFB8GviVIKwNnPOqJko&hl=en&sa=X&ved=0CCEQ6AEwAGoVChMIlarugKPZxwIVDBjbCh1WVA2C#v=onepage&q=stationary-window%20condition&f=false

 

I agree Chinese does seem well suited to this.

 

I am interested to see how you get on with this, and if there is anything I can do to help, ask; if I can, I will.


"These equilateral shapes are easier for the eye to scan quickly than strings of varying lengths. The monosyllabic words 'I' and 'twelfths' would occupy different amounts of space, thus straining the reader."

Show me the science. Despite common use of the word 'scan', when you actually look at what the eye does, you see brief fixations and jumps. Readers of both Chinese and English will focus on one point and absorb information around that point - not 'scan' a word from start to end - and then jump forward (or maybe back if they need to). Information is actually cut off when the eye is moving. As for straining the reader - well, we all seem to read books without, in the main, developing headaches. And if I was feeling unfair and unscientific, I might even ask you to explain this ;-)

 

This looks like it might be a fun little tool, IF you can get the characters and utterances lined up in real-world input. 


@roddy

I meant 'scan' as in the way one would scan a barcode (a fixed point). I didn't mean to imply that the eye was absorbing information while in motion; I concede that my choice of words may have been misleading. My intended assertion was that this point of fixation is wider in alphabetic scripts, while a Chinese character represents a quadrilaterally symmetrical point. Although our vision is stereoscopic, both eyes nevertheless fixate on a single point, along with its surrounding area in all directions (not just on the sides).

@imron

The concept of a multisyllabic 'word' as distinct from a monosyllabic morpheme (usually assigned to a character) is a relatively recent concept in (written) Chinese. These Chinese 'words' arose from a need for disambiguation in speech (and later for technical and modern nomenclature); they only really proliferated in tandem with vernacular writing.

That aside, flashing polysyllabic words defeats the purpose of this system. Chinese languages are unique in their near-perfect morpheme-syllable ratio (as far as I'm aware), and these pairings are well represented by characters. Aside from the fact that word boundaries are not always clear in Chinese, there is simply no need to visually group the morphemes within a word when the speaker will have to utter two syllables during that short period anyway. Just as two syllables are not vocalised simultaneously, neither is there a need to display two morphemes (and two characters by extension) simultaneously in a system designed to reflect speech as a visual analogue.


"they only really proliferated in tandem with vernacular writing."

Yes, which came about because people wanted to write in the language they spoke, rather than a language that they didn't.

 

"That aside, flashing polysyllabic words defeats the purpose of this system."

Flashing monosyllabic characters will make the system cumbersome because the reality is, the language as used and spoken today is made up of polysyllabic words.  Splitting them up will just make it difficult for people to identify word boundaries.  I  ma  gine  try  ing  to  read  Eng  lish  split  up  in  to syll  a  bles.  It's significantly more burdensome than reading whole words.


"Splitting them up will just make it difficult for people to identify word boundaries."

I agree, and that's by design. This system preserves all of the difficulties of speech comprehension, including undefined word boundaries, except for one: homophony. I do not define word boundaries for the same reason that I do not provide punctuation: it pushes too far into 'reading' territory. By the way, English orthography does not reflect syllable boundaries as clearly as Chinese does.


"This system preserves all of the difficulties of speech comprehension"

It actually makes it more difficult because people speak in words, and the pauses and emphasis they make provide clues as to where the word boundaries are.  People don't speak in a staccato of monosyllables.


If we're talking about Chinese speech, then I cannot detect word boundaries at all unless I am already familiar with the words in question. Tactically placed pauses do allow me to demarcate phrases and clauses without prior acquaintance, but never words. I'd imagine that it is because Chinese languages are tonal whilst my native language is not.

Aside from all that, the other reason is that placing two characters on the screen simultaneously does not force the reader to absorb the characters at the same rate as the speaker is vocalising the corresponding syllables. For example, the user could remain fixated on the complicated 鬱 of 鬱悶, and not have time to move on to the neighboring 悶 before the speaker has finished uttering both syllables. In short, it would be less conducive to reader-speaker synchronisation; I want to ensure that each character is given an amount of time on screen congruent with the duration of its corresponding syllable.

