Chinese-forums.com
Learn Chinese in China

markhavemann

BLCUP e-book .opz format


markhavemann

I'll post this here in case somebody else comes across the same problem.

I bought an e-book at https://www.blcup.com/. For some reason the e-book version was 100 RMB, while the physical book only costs about 30 on Taobao. Oh well, it's worth it if I can just carry a tablet to class instead of a bunch of heavy books.

After paying for the book, I was really annoyed to find out it's in some weird .opz format, and you need to download their own really crappy reader to open it. There's also nothing on the internet about the .opz format or how to convert it to a better format.

Anyway, here's what I figured out:

  1. Rename the file to .pdf.
  2. Open it with PDF-XChange Editor.
  3. It will open, but it will report errors and ask if you want to save a new, fixed file.
  4. Save it as a fresh PDF that can be read in any application.
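
Before renaming, it can be worth checking that the .opz file really is a PDF under the hood. The rename trick suggests it is, but nobody in the thread has confirmed that. A minimal Python sketch, assuming the file carries the standard `%PDF-` signature somewhere in its first 1024 bytes: it copies the file to a .pdf next to the original only if that signature is found. The function names are made up, and this only covers step 1; the repair in steps 3 and 4 still happens in PDF-XChange.

```python
import shutil
from pathlib import Path
from typing import Optional

PDF_MAGIC = b"%PDF-"  # signature at the start of a PDF file's header


def looks_like_pdf(path: Path, window: int = 1024) -> bool:
    """Return True if the PDF signature appears in the first `window` bytes.

    Some readers tolerate a little junk before the header, so this searches
    a small window instead of insisting on byte offset 0.
    """
    with open(path, "rb") as f:
        return PDF_MAGIC in f.read(window)


def opz_to_pdf_copy(opz_path: Path) -> Optional[Path]:
    """Copy book.opz to book.pdf (original left untouched) if it looks like a PDF.

    Returns the new path, or None if no PDF signature was found.
    """
    if not looks_like_pdf(opz_path):
        return None
    pdf_path = opz_path.with_suffix(".pdf")
    shutil.copyfile(opz_path, pdf_path)
    return pdf_path
```

For example, `opz_to_pdf_copy(Path("textbook.opz"))` (a hypothetical file name) leaves the .opz alone and gives you `textbook.pdf` to open in PDF-XChange. Copying rather than renaming means you still have the original if the repair goes wrong.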

 

Unfortunately, the text seems to have some weird encoding issues, so copying it into another application just produces garbage (not great for quickly looking up characters). I'm trying to figure this out and will post the solution if I do.

thelearninglearner

As far as the copy-paste thing goes, you can probably use Pleco to OCR the parts you want to copy, since you're using it on a tablet or phone.

markhavemann
1 hour ago, thelearninglearner said:

As far as the copy-paste thing goes, you can probably use Pleco to OCR the parts you want to copy, since you're using it on a tablet or phone.

Yeah looks like I will have to resort to that. Not as convenient as copying and pasting, but I guess it will do.

thelearninglearner
3 hours ago, markhavemann said:

Yeah looks like I will have to resort to that. Not as convenient as copying and pasting, but I guess it will do.

Maybe also check out some more advanced PDF readers (I'm thinking Adobe Reader). They might have some features that can help, like extra conversion options.

markhavemann
13 hours ago, thelearninglearner said:

Maybe also check out some more advanced PDF readers (I'm thinking Adobe Reader). They might have some features that can help, like extra conversion options.

I've never had so many PDF editors installed on my computer at once. Eventually I found a tool to look at the PDF's "Unicode mapping" tables. It looks like the character shapes were saved as vector "glyphs" so that they could be displayed, and a text character is linked to each one, but when the PDF was created it didn't specify WHICH Unicode character was linked to which glyph. That means the text is completely unrecoverable without identifying each character manually.
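
If you do go the manual route, applying the identifications afterwards is the easy part. A minimal sketch, assuming you have built a table from each copied (garbled) character to the real one: `MANUAL_MAP` and `decode_garbled` are hypothetical names, and the two entries are just the single-character copies reported later in this thread. If the same glyph copies differently in different places (for example, because pages use different font subsets), one table per font would probably be needed.

```python
from typing import Dict

# Hypothetical, hand-built table: garbled character as copied -> real character.
# A real table would be filled in glyph by glyph; these two entries are
# only the single-character copies reported in this thread.
MANUAL_MAP: Dict[str, str] = {
    "3": "的",
    "+": "人",
}


def decode_garbled(text: str, table: Dict[str, str]) -> str:
    """Swap every identified garbled character for its real one.

    Characters missing from the table pass through unchanged,
    so even a partial table is useful.
    """
    return text.translate(str.maketrans(table))
```

`str.maketrans` accepts a dict whose keys are single characters, so the table can grow one entry at a time as you identify more glyphs.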

 

5 hours ago, 大块头 said:

ocrmypdf may be a solution

I eventually settled on PDF-XChange's built-in OCR, which seems to work much better than Adobe's for some reason. It also has an option to OCR existing "text", which saved me from having to flatten each page into an image or anything like that.
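
For anyone who wants to try the OCRmyPDF route suggested above instead, here is a small Python sketch that builds and runs its command line. It assumes `ocrmypdf` is on your PATH and Tesseract's Simplified Chinese data (`chi_sim`) is installed; the helper names and file names are made up. Note the trade-off: `--force-ocr` rasterizes every page before OCRing it, so unlike the PDF-XChange option described above, you give up the crisp vector text in exchange for a searchable layer.

```python
import subprocess
from typing import List


def build_ocrmypdf_cmd(src: str, dst: str, lang: str = "chi_sim") -> List[str]:
    """Build an OCRmyPDF command that OCRs every page, even ones that already have text.

    -l chi_sim  : Tesseract's Simplified Chinese model (installed separately)
    --force-ocr : rasterize each page and OCR the image, so the broken
                  text layer is replaced rather than kept
    """
    return ["ocrmypdf", "-l", lang, "--force-ocr", src, dst]


def run_ocr(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ocrmypdf exits with an error
    (or FileNotFoundError if ocrmypdf is not installed)."""
    subprocess.run(build_ocrmypdf_cmd(src, dst), check=True)
```

Usage would be something like `run_ocr("book.pdf", "book_ocr.pdf")` on the repaired PDF from the steps at the top of the thread.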

ChTTay

Another option (for next time!?) would be to buy the book and scan it in manually. It sounds tedious, but it's not that bad now that phone scanners are decent. I scanned in a 300-page textbook myself and it took less than an hour, including taking the scans and tweaking a few of them. You could also do a chapter at a time if you wanted to (that probably takes less than 5 minutes). It's not as good as an actual PDF would have been if one were available (it's an old book), but for personal use on my iPad it's great. At least it's a PDF file that can be opened by any standard reader.

NinKenDo

I'm guessing the OPZ format uses some kind of character map, and that's why you get garbage out. To get proper text data back, you would need to know how the codepoints were mapped. That might be relatively easy if they've just shifted every codepoint by a fixed amount (say 1000) purely to break copy-pasting, but a full remapping would be harder. Given the size of the book, my guess is that they haven't done a complete remapping, as that would cut down on the size dramatically (assuming they didn't do a random shuffle just to prevent copy-paste).

If they have just shifted them over, you could reverse engineer it by just looking at the glyph, finding the relevant Unicode codepoint, and calculating the difference from the codepoint that sits under the glyph in the file. Check against a few characters to be sure, and if two or three map the same way, probably that's what they've done.
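
The shift test described above fits in a few lines of Python. This is a sketch that assumes you have collected (real character, copied character) pairs by hand; `infer_shift` and `unshift` are hypothetical helper names. If all pairs agree on one offset, that offset should undo the whole text; if they disagree, it is not a simple shift.

```python
from typing import Iterable, Optional, Tuple


def infer_shift(pairs: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Given (real_char, copied_char) pairs, return the constant codepoint
    offset if every pair agrees on it, otherwise None (also None for no pairs)."""
    offsets = {ord(real) - ord(copied) for real, copied in pairs}
    return offsets.pop() if len(offsets) == 1 else None


def unshift(text: str, offset: int) -> str:
    """Undo a constant shift by adding the offset back to every codepoint."""
    return "".join(chr(ord(c) + offset) for c in text)
```

Checking "two or three" pairs as suggested is just a matter of passing them all to `infer_shift`; a single disagreeing pair makes it return None.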

markhavemann
8 hours ago, NinKenDo said:

If they have just shifted them over, you could reverse engineer it by just looking at the glyph, finding the relevant Unicode codepoint, and calculating the difference from the codepoint that sits under the glyph in the file. Check against a few characters to be sure, and if two or three map the same way, probably that's what they've done.

That's a good point. I've noticed that copying "的人" pretty consistently gives ".¶", while 的 alone gives "3" and 人 alone gives "+". So there does seem to be some method to the madness, but it's slightly beyond my expertise, unfortunately.

I've uploaded the PDF here if you or anyone else wants to try to crack the code.
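
Plugging the observations from this post into the shift idea is quick arithmetic. This sketch assumes each copied character lines up one-to-one with a glyph, in order; the variable and function names are just for illustration.

```python
# Observations from the post above. Assumption: each copied character
# corresponds to one glyph, in order.
single = {"的": "3", "人": "+"}   # each character copied on its own
paired = {"的": ".", "人": "¶"}   # copied together as "的人"


def offset(real: str, copied: str) -> int:
    """Codepoint distance between the real character and what was copied."""
    return ord(real) - ord(copied)


single_offsets = {ch: offset(ch, cp) for ch, cp in single.items()}
# {'的': 30289, '人': 20111}: different offsets, so not one constant shift

context_dependent = any(single[ch] != paired[ch] for ch in single)
# True: the same glyph copies differently depending on what is selected with it
```

Both results point away from a simple offset: the two characters need different offsets (U+7684 minus "3" versus U+4EBA minus "+"), and 的 copies differently alone than next to 人, so a fix would probably need a per-glyph table rather than one formula.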
