Chinese-Forums

Book text to machine text


Flickserve


What are the options for converting book text into machine-readable text?

I have a couple of books with Chinese and English sentences. Basically, they are large collections of sentences which I want to convert into Anki notes (I already have the audio).

Since there are so many sentences, the only easy (time-efficient) way I can think of is to pay somebody to type them out. I live near a university in Hong Kong, so I could get a student to do it. Each page has about 12 sentences, and there are nearly 700 pages in total.
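Whichever way the text ends up being captured, the last step into Anki is fairly mechanical, since Anki can import tab-separated text files. Below is a minimal sketch of that step, assuming the sentences are already available as Chinese/English pairs. The function name `write_anki_tsv`, the sample sentences, and the audio filename pattern `s{:04d}.mp3` are all made up for illustration; the real audio files would need whatever names they actually have.

```python
import csv
import io


def write_anki_tsv(pairs, out, audio_pattern=None):
    """Write (chinese, english) pairs as tab-separated lines for Anki's text import.

    audio_pattern is a hypothetical filename template such as "s{:04d}.mp3".
    If given, a third field containing Anki's [sound:...] tag is added.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, (zh, en) in enumerate(pairs, start=1):
        row = [zh.strip(), en.strip()]
        if audio_pattern:
            row.append("[sound:" + audio_pattern.format(i) + "]")
        writer.writerow(row)


# Made-up sample data standing in for the typed or OCR'd sentences.
sample = [("你好嗎？", "How are you?"), ("我很好。", "I am fine.")]

buf = io.StringIO()
write_anki_tsv(sample, buf, audio_pattern="s{:04d}.mp3")
tsv_text = buf.getvalue()
print(tsv_text)
```

In practice `out` would be a file opened with `encoding="utf-8"`, and the audio files would have to be copied into Anki's media folder separately.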


The solution is either to use OCR software like FineReader or, as you said, to pay someone to type it. The OCR result will definitely need some corrections; how many depends on the software, the quality of the scans, the font, etc. OCR software packages usually have a free trial, so you can see whether it would work or not.
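One way to judge a trial run is to type out a page by hand, OCR the same page, and measure how far apart the two texts are. The sketch below computes a character error rate from the Levenshtein edit distance. It is not tied to any particular OCR package, and the function names and the sample strings are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ocr_text, reference):
    """Edit distance divided by the length of the hand-typed reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# Made-up example: one character misread out of seven.
reference = "我想喝一杯水。"
ocr_text = "我想喝一杯氷。"
cer = char_error_rate(ocr_text, reference)
print(f"character error rate: {cer:.1%}")
```

Running this on a handful of pages per program gives a rough basis for comparing trials before committing to all 700 pages.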


OCR works well with books, but you would do best to cut out all the pages first so you can scan them flat. However, as wibr says, you will need to read through the result in case some characters have been wrongly identified. You can create PDF files and import them directly into ABBYY FineReader or an alternative OCR program.


Scanning tends to be pretty slow, especially with 700 pages. Try snapping photos instead; a real camera of course is better than your phone. You may have to convert the .jpgs into .tiffs and do some trial-and-error adjustments. There's a photocopy filter on Photoshop that's good for this, and you can also automate the conversion on Photoshop.


Yes, if you've got lots of time on your hands you can do that. If you photocopy you can use a multi-sheet feeder and get one file. Pleco is great, of course. I haven't tried it with a 700-page book though.


Scanning tends to be pretty slow, especially with 700 pages. Try snapping photos instead; a real camera of course is better than your phone. You may have to convert the .jpgs into .tiffs and do some trial-and-error adjustments. There's a photocopy filter on Photoshop that's good for this, and you can also automate the conversion on Photoshop.

I am just reinstalling my software and haven't got round to Photoshop yet. Lightroom can do a mass export to TIFF in a straightforward manner, but it has no photocopy filter.

What does the photocopy filter actually do? I tried searching on the internet but didn't find much detail. Wouldn't the original jpg/raw converted straight to TIFF hold better quality?


The photocopy filter on Photoshop does a very good job of producing a Xerox-like page from a jpg. Doing a direct conversion from a jpg to a tiff often doesn't work well in practice because uneven lighting produces splotches on the page. Just try it. (Of course, the better and more evenly lit your photos, the better the results, but it's hard to keep the quality up when shooting 700 pages.)

 

The clearer and more distinct the text from the background, the better the results. It may "work" if you just feed in an unadjusted jpg but you'll probably have more errors to correct than if you use images that have been adjusted to resemble sharp black and white photocopies.
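To see why uneven lighting matters, here is a small sketch in plain Python. It is not what Photoshop's photocopy filter does internally; it swaps in a simple local-mean threshold as a stand-in for "compare each pixel with its surroundings". The synthetic page, the function names, and the `radius` and `offset` values are all assumptions chosen for the demonstration.

```python
def global_threshold(img, t=128):
    """Mark a pixel as ink (1) if it is darker than one fixed cutoff."""
    return [[1 if v < t else 0 for v in row] for row in img]


def local_threshold(img, radius=3, offset=20):
    """Mark a pixel as ink if it is clearly darker than its neighbourhood mean."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[yy][xx] for yy in ys for xx in xs)
            mean = total / (len(ys) * len(xs))
            if img[y][x] < mean - offset:
                out[y][x] = 1
    return out


# Synthetic grayscale "photo": the paper fades from bright (left) to
# shadowed (right), with a row of ink dots that are 60 levels darker.
W, H = 40, 9
ink = [[1 if (y == 4 and x % 4 == 0) else 0 for x in range(W)] for y in range(H)]
page = [[(200 - 3 * x) - 60 * ink[y][x] for x in range(W)] for y in range(H)]

by_global = global_threshold(page)
by_local = local_threshold(page)

print("true ink pixels:      ", sum(map(sum, ink)))
print("global threshold says:", sum(map(sum, by_global)))
print("local threshold says: ", sum(map(sum, by_local)))
```

On this toy page the fixed cutoff turns the whole shadowed side black (the "splotches"), while the local comparison recovers only the ink dots. Real photos are noisier, so the window size and offset would need the same kind of trial and error.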

 

There are always going to be errors to correct no matter what method you use, but with 700 pages it's important to keep that error rate as low as possible.

 

In any event, you need to make some trial runs and see what works well for you with your camera and software.


Ahh, so uneven lighting makes a difference. Thanks for that tip! Will try to even it out before starting.

One issue is that the pages are narrow and the book is thick at 700 pages, so the pages are a little difficult to keep flat.


  • 1 month later...

I have put this project on hold for the moment. The book in question is from Taiwan and contains 8000 Chinese-English sentences. However, my learning is orientated towards mainland-style Putonghua at this initial stage, since I don't yet have enough structure and vocabulary to discern the differences. Most of the native Putonghua speakers I come into contact with are from the mainland.

 

Apparently, there are similar books on the mainland. I will visit Guangzhou early next month so I will try to pick up such a textbook. Supposedly, the expressions contained in such books are very natural to mainland speakers.

 

I am in no rush. There are plenty of other aspects of language learning to work on.


I'd say enter them yourself and treat it as a pinyin typing/listening exercise. Imagine how good you'll be after entering 700 pages.

even the great TysonD would pay others... :-)

*edit* I am not too interested in typing out pinyin, and I do it anyway when looking up words in Pleco or typing in WeChat. Besides, practising pinyin is not my primary objective in converting the book into text form.
