
Online archive of text messages


roddy


Edit: I made it! RandomChineseSMS.com.

Just running this idea up the flag pole to see if anyone salutes.

Signese.com is an online archive of pictures with Chinese characters in them. Between myself and a small number of contributors (thanks Skylee and Co!) it has, over the last few years, accumulated about 1,000 photos. I like the site, if only because stuff like this and this should not go unnoticed. Or this. I know some people find it a nice source of random reading material. Need some help ordering meat on sticks?

I'm toying with the idea of having something similar for text messages. As with Signese.com, contributors could send in anything they want directly from their mobile phone. Messages would become available on the site and via RSS, or possibly be sent out to subscribers via SMS. It could be something funny, fascinating, enigmatic, random.

Would anyone visit that site? Would anyone contribute?

Bear in mind I might never get round to it.

Edited by roddy

  • 2 weeks later...

Actually I wasn't thinking so much of the chain ones, though they could certainly be included. One of the things I like about signese.com is the fact that often the pictures are part of a story. But you don't know the story. There's a bit of an enigma there. Pick the right SMS and you could do the same thing.

你是穿红衣服吗?

我年纪好大了,我可不能不结婚啦,否则嫁不出去了

中午食堂被临时搬到了篮球场上,学生们在篮球场上买饭,向灾民一样,很搞笑。

哎哟,没有位子。。。满满的

Edited by roddy
finally spotted a typo

Just came across this.

Dammit Roddy,

Stop beating me to it. :mrgreen:

We already have a site up and running for it. (for several months now actually)

So there it sits, like an empty comments spam collector racking up hosting fees by the month.

I just need help with technical aspects of it. It is one of several projects that I've been trying to push out that are always held up by technical aspects.

Slightly different though: the goal of the site is to use everyday text messages, as encountered by a foreigner in China, as a way to learn Chinese.

By:

The goal is to get them annotated and have notes for each one (provided by my teachers in their excessive downtime). Discussion by foreigners would also be great.

I have been having one of the employees put together a database, typing them in by hand because we know no other way to do it. We have several hundred typed up already, and the goal is to have at least 1500 shortly (already collected and parsed, with the evil shu1mian4yu3 and non-useful stuff removed).

There are two ways which I am considering doing it:

  • Progression based: only text messages with new characters or grammatical points are introduced later. This would be most beneficial to learners.
  • Everything: this would essentially be a text message soap opera hence the original title of the site "myChineselife"
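A minimal sketch of the progression-based option, assuming "new" is judged by characters only (grammatical points would need manual tagging, which is not modelled here). The function name `progression_order` and the sample messages are made up for illustration.

```python
def progression_order(messages):
    """Keep only messages that introduce at least one character not seen so far.

    Returns a list of (message, new_characters) pairs in the original order.
    """
    seen = set()
    kept = []
    for msg in messages:
        # Only count CJK characters, so punctuation never makes a message look "new".
        chars = {ch for ch in msg if "\u4e00" <= ch <= "\u9fff"}
        new = chars - seen
        if new:
            kept.append((msg, sorted(new)))
            seen |= chars
    return kept


# Hypothetical sample data, not taken from the real message database.
sample = ["你好", "你好吗", "好", "你是穿红衣服吗"]
for msg, new_chars in progression_order(sample):
    print(msg, new_chars)
```

The third sample message is dropped because it adds no new characters. The "everything" option would simply skip this filter and keep chronological order.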

The problem with the second approach is how to group things together. Tags? Threads? String cheese? Chronological as they come in? (Currently the preferred approach and organization.)

Messages have been taken from a learner of an intermediate level in China as he does business and flirts with girls. Accordingly a lot of the messages are strictly kou3yu3 which is exactly why it's important to do the text message database as opposed to any of the other online things which tend to be more shu1mian4yu3. (Chinese refuse to write kou3yu3 except in the context of text messages). Outgoing messages have not been input (don't want to pass on the foreigner's mistakes as knowledge)

I too bounced around the idea of calling for volunteers on the forum or elsewhere to send me text messages, but I worried that this would make it impossible to follow and that different levels would become mixed up. (Not to mention I would have no friggin clue how to set this up conveniently other than to have people forward messages to one of my cell phones and have somebody type it in for me)

I would be happy to put up other types of messages and one-offs (or jokes) if they could somehow be separated out. (Again the technical aspect.) One of the original goals was to create a lexicon based on the text message database, because text messages by their very nature are inherently pragmatic and a great way to introduce characters based on common usage first. We have an informal one for use within the school. One-offs would have to be separated out to avoid skewing things. But the feedback I have gotten on this for several months has been positive and some people thought they would contribute. If I can add in the explanations/notes it would really give it a push, I am sure.

Does anybody who knows anything about content management systems want to give me a hand? I am in Beijing. I tried to do it using Wordpress, but then deleted the data. Like I said, next up is getting the rest of the data input, changing names and so forth. But meanwhile, if someone knows a good way I can structure it, that would be great.
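A rough sketch of how a usage-based lexicon could be started from the message database: count how often each character appears and introduce the most common ones first. The function name and sample messages are hypothetical, and one-offs are assumed to have been filtered out before counting.

```python
from collections import Counter


def char_frequencies(messages):
    """Count how many times each Chinese character appears across all messages."""
    counts = Counter()
    for msg in messages:
        # Skip punctuation, digits and Latin letters; keep CJK characters only.
        counts.update(ch for ch in msg if "\u4e00" <= ch <= "\u9fff")
    return counts


# Hypothetical sample data standing in for the real database.
sample = ["你是穿红衣服吗", "哎哟,没有位子", "你吃了吗"]
freq = char_frequencies(sample)

# Most frequent characters first; these would be introduced earliest.
for ch, n in freq.most_common(5):
    print(ch, n)
```

Counting words instead of characters would need a segmentation step first, which this sketch leaves out.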


  • 1 month later...
(Not to mention I would have no friggin clue how to set this up conveniently other than to have people forward messages to one of my cell phones and have somebody type it in for me)

Many cell phones have software that lets you view your phone's SMS messages on your PC, but this requires opening up each message and copy-pasting the contents. Better yet, you can use an application that lets you archive SMS messages into a database-like file format. I know that Nokia has such a thing; it's called "PC Phone". From there, if you know the file format you can write a tool to parse out individual messages. If not, you will still have to use copy-paste, but since the archive is now on your PC it should be a bit faster than using the viewer.

Either way it beats typing them in manually.


Would anyone visit that site? Would anyone contribute?

Roddy,

I can't contribute because I don't have any friends to send me Chinese text messages, but yes, I would definitely visit a site where I can read Chinese text messages and learn a lot of kouyu (and maybe shumianyu).

Is it up and running now or what?


I would definitely check it out.

Having different sections for different dialects would be great too. When we were in Shanghai, my wife would send me Shanghainese text messages.

For example:

侬吃过了伐?

Receiving text messages from my Dongbei friends or Shanghai friends would always have a special touch.

For example, my Dongbei friends would always start a text with 兄弟。

Anywho, great idea, I wish I was still in China. My mobile phone does not support 汉字 and writing in pinyin is not cool.


Anywho, great idea, I wish I was still in China. My mobile phone does not support 汉字 and writing in pinyin is not cool.

My sentiments exactly. Not having Chinese language text here in the States is so lame. That's why I would definitely visit the site Roddy is proposing.


  • 2 weeks later...
Many cell phones have software that lets you view your phone's SMS messages on your PC, but this requires opening up each message and copy-pasting the contents. Better yet, you can use an application that lets you archive SMS messages into a database-like file format. I know that Nokia has such a thing; it's called "PC Phone". From there, if you know the file format you can write a tool to parse out individual messages. If not, you will still have to use copy-paste, but since the archive is now on your PC it should be a bit faster than using the viewer.

Either way it beats typing them in manually.

Yes, I am aware of this and I have the Nokia phone software suite. It will work if the messages are in English: you have to open them up one by one, as each one is created in a text file. However, for Chinese this has not been possible. I've tried every encoding fixer and program that I can think of.

The phone is a Nokia 3660. Nokia has been no help at all and originally told me that the phone could not be converted to display Chinese characters. A year later the phone crashed and the operating system was switched to allow Chinese characters. The Nokia software still works for backing up and transferring files, but it is not possible to read the text files that it produces.

Apparently, newer versions of the Phone PC software allow the text messages to be browsed on the computer. With the version for that phone this was never the case; it could only back up the messages.

So up until this point I have been getting them typed in by hand and paying for it. It is dreadfully slow obviously.

If anyone can help with this it would really, really be appreciated, because I looked at the website and realized that this has been dragging on for a year and there is a ton of content that is ready to be put up there. I'm holding off on the reposting of data until I can get it worked out. It turned into a mess last time and I need someone who knows what they're doing to help out with the project.

Check out the beta version, very beta, www.mychineselife.com


I just looked at it and it's very nicely done. That is what I think the online archive of text messages should look like.

Nicely done!? Haha. I wish. But thank you. I want it to be so much more, but you can see that the first post was a year ago and absolutely nothing has been done, and it's all for technical reasons. It is driving me insane. I am hesitant to continue posting the data because I'm afraid I will have to redo it again. Even posting the data is taking about seven minutes per post, because automatic blogging tools don't seem to work and the formatting comes out all jacked up when trying to pull it out of the Excel file.

Like I said before, there are several hundred that are already typed up and ready to go. In fact, when I looked at it I realized that about a thousand of them are ready to go. In total I have about 2000 messages. I am trying to avoid having the remaining several hundred typed in by hand = lost money.


Current process:

1. Individual parsing of / sorting of text messages while they are still on the phone. The time is hard to calculate - but it is quite easy to spend hours doing it.

2. Individual messages are typed into the computer by hand (this is the part I also need help with, because the remaining 800 messages are still on the phone). They are put in an Excel file. Date and time of each message are not kept because there is no convenient and effective way to export them from the phone, though I think the data could still be there.

3. Mistakes are noted for each message, and highlighted in red

4. Each message is copied and pasted into a program which then converts it to pinyin. I'm going for 100% accuracy on the pinyin and therefore am using Wenlin.

5. Each message has to be segmented by word first

6. Duo1yin1zi4 ambiguities must then be resolved by hand

7. Pinyin is then recopied back into the Excel file one text message at a time

8. Steps four through seven are repeated for those messages that have mistakes

9. Notes are made for each message if needed

10. English translation is added

11. Tons of crazy data manipulation and Excel file wrangling. HOURS!!!!!

12. The data is converted into column format instead of being horizontally aligned

13. The sound file is uploaded to WordPress

14. Each message is copied to Microsoft Word

15. The data is converted from table format to text

16. The Chinese text itself is copied into another program

17. The other program will generate the word lists

18. The wordlist is then inserted in the appropriate spot in the Word document

19. A couple formatting changes by hand for every message

20. That particular message is copied and pasted into WordPress

21. Two drag operations create the post name

22. Whitespace is then eliminated

23. The link to the sound file is inserted at the bottom

24. The wordlist is changed into bullets otherwise the formatting will be all jacked up

25. Tags are added that were noted in the Excel file and any additional ones needed

26. Posted

This is a simplified process, of course. There are always snags and questions and little miniature discussions which eat away at my time.

I forgot a step: recordings are made, normally somewhere around step nine. Each message is recorded four times (one of the reasons I hired her is the recording-like quality of her voice, and I pay extra for it; I might as well use it).

Another step: the messages are categorized while they are still in the Excel file.

Another step, another edit :mrgreen:: while it is still in column format, things are sorted by name. Then the names are replaced with pseudonyms from a separate spreadsheet.

For anyone who knows what they're doing, I'm sure there are little scripts or some kind of database operation that could make this much faster, but I have no clue.
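As one example of a "little script", the name-replacement step could be sketched like this, assuming the Excel sheet is saved as CSV with a name column and a message column. The column names `name` and `message`, the function name and all sample names are hypothetical.

```python
import csv
import io


def pseudonymize(rows, name_map, name_field="name", text_field="message"):
    """Replace real names with pseudonyms in both the name column and the message text."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        # Names missing from the map are left unchanged so they can be spotted later.
        new_row[name_field] = name_map.get(row[name_field], row[name_field])
        text = row[text_field]
        for real, fake in name_map.items():
            text = text.replace(real, fake)
        new_row[text_field] = text
        cleaned.append(new_row)
    return cleaned


# Stand-ins for the CSV export and the separate pseudonym spreadsheet.
messages_csv = "name,message\n小王,小王明天来吗\n小李,没有位子\n"
name_map = {"小王": "阿明", "小李": "阿华"}

rows = list(csv.DictReader(io.StringIO(messages_csv)))
for row in pseudonymize(rows, name_map):
    print(row["name"], row["message"])
```

Plain string replacement will misfire if a pseudonym contains another person's real name, so the mapping should be checked for that before a bulk run.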


From the look of that list it'd be easier to just have a form where people submit their address, then you send someone round on a bike to deliver a hard copy of the site :shock:

Couple of tools that might come in handy.

Fanfou (Chinese) and Twitter (US/UK and others) are both SMS blogging platforms. You can take an SMS off your phone, forward it to a number and it will appear on your Fanfou page. That page will have an RSS feed, which you could then use to pull the message into other applications. Both sites can also plug into IM applications.

China Mobile's Fetion software is a desktop SMS application. I haven't played with it very much, but it shouldn't be too hard to get it set up so you can forward an SMS from your phone and it'll pop up in an IM box on your computer.
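A small sketch of the "pull the message into other applications" part, assuming the feed is ordinary RSS 2.0 with each forwarded SMS in an item's title. The sample feed below is invented for illustration; the real layout of a Fanfou or Twitter feed may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real run would download the page's RSS URL instead.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>hypothetical SMS feed</title>
    <item>
      <title>你是穿红衣服吗?</title>
      <pubDate>Tue, 25 Dec 2007 12:00:00 +0800</pubDate>
    </item>
    <item>
      <title>哎哟,没有位子。。。满满的</title>
      <pubDate>Tue, 25 Dec 2007 12:05:00 +0800</pubDate>
    </item>
  </channel>
</rss>"""


def messages_from_rss(xml_text):
    """Return (date, message) pairs for every item in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("pubDate", default=""), item.findtext("title", default=""))
        for item in root.iter("item")
    ]


for date, message in messages_from_rss(SAMPLE_FEED):
    print(date, message)
```

A side benefit of this route is that the feed carries a timestamp per message, which the hand-typed process currently loses.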

Don't know what phone you are using, but it may well be possible to just export in bulk - ie this tool for SE phones (haven't actually used it, mind). I'd assume you've looked at that option and given up though. (edit, yes you have, forgot I'd read that. But to be honest I'd look again and again, as that's your major bottleneck.)

On segmentation / pinyin, I'd use Adsotrans over Wenlin - it handles word boundaries and pinyin based on context, while Wenlin just expects you to tell it what to do. Sure, it might not get it 100%, but it'll be close and any errors will be consistent - unlike the ones you are bound to make clicking all those pesky circles in Wenlin.
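To illustrate why word boundaries settle most duo1yin1zi4 questions, here is a toy forward-maximum-matching segmenter with a word-level pinyin lookup. This is not how Adsotrans or Wenlin work internally (I don't know their internals); the tiny lexicon is made up for the example.

```python
# Toy lexicon: word -> pinyin with tone numbers. Purely illustrative.
LEXICON = {
    "我": "wo3",
    "去": "qu4",
    "不": "bu4",
    "行": "xing2",        # default reading when 行 stands alone
    "银行": "yin2hang2",  # 行 is read hang2 inside this word
}


def segment(text, lexicon=LEXICON):
    """Greedy longest-match segmentation; unknown characters become single-character words."""
    max_len = max(len(word) for word in lexicon)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def to_pinyin(text, lexicon=LEXICON):
    """Segment, then look each word up; words without an entry are passed through unchanged."""
    return " ".join(lexicon.get(word, word) for word in segment(text, lexicon))


print(to_pinyin("我去银行"))
print(to_pinyin("不行"))
```

Because 银行 is matched as one word, 行 gets hang2 there and xing2 when it stands alone, with no clicking required. Greedy matching does make mistakes on ambiguous strings, so a review pass is still needed for 100% accuracy.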

Also, there's no need to paste the messages one by one that I can see. Copy an entire column out of Excel, process it however you want to, then paste it back into Excel. The cell boundaries will be converted to, and back from, line breaks with no problem.
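If the processing is done by a script rather than a desktop program, the same whole-column idea looks roughly like this: one cell per line in, one cell per line out, with blank cells kept so the rows stay aligned when pasted back. The function name and the sample transform are placeholders.

```python
def process_column(column_text, transform):
    """Apply transform to every non-empty cell of a column copied out of a spreadsheet."""
    cells = column_text.splitlines()
    # Empty cells stay empty so row N of the output still matches row N of the sheet.
    return "\n".join(transform(cell) if cell else "" for cell in cells)


# A column as it might arrive from the clipboard, with one blank cell in the middle.
column = " 你好 \n\n没有位子\n"
print(process_column(column, str.strip))
```

Any per-message function, such as a pinyin converter, can be passed in place of `str.strip`.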

After that point I'm not entirely sure what you are doing. From stuff I've done with Signese.com though, I'll suggest you look at the following (this gets a bit technical and may require some research, but it sounds like it might be a massive time saver)

Export the posts table from Wordpress via your database admin panel, presumably phpMyAdmin. Get it as a .csv file or similar that you can open up in Excel.

This will look something like this

ID   CAT   DATE       POST_CONTENT
1    5     20071225   "here's your post"

There'll be a lot of other columns, but they should be fairly simple. Ie there'll be one for comments: 0 for none allowed, 1 for allowed, etc.

Now take your Excel file with all your data. You now need to get this set up so you have the HTML you actually want in the post. With Signese, I had something like this, in different Excel columns

IMAGEURL CHINESE

what I needed to do was turn this into something Wordpress would eat, so I added some extra columns and echoed the necessary content through them. Something like

   IMAGEURL

CHINESE

with each set of spaces representing an Excel column.

But you don't want more than one column. So merge them. Thinking about it, Excel should let you do this itself, but I think what I did was copy and paste into Notepad, use find and replace to remove all the tabs, then . . .

copy that chunk of text into the POST_CONTENT column of your other Excel file - the one exported from Wordpress.

This will give you the content you want in the place you want, but you will need to fill in some blanks in the other columns. This can be done by echoing numbers or formulas (for incrementing columns, like ID) down the columns.

Save. Import back into your Wordpress database.
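The merge-and-fill steps could also be scripted instead of done with Notepad and Excel formulas. The sketch below uses the simplified column names from the example above (ID, CAT, DATE, POST_CONTENT); a real Wordpress posts table has different and many more columns, so check your own export first. The HTML template is my assumption about what a post body might look like.

```python
import csv
import html
import io

FIELDS = ["ID", "CAT", "DATE", "POST_CONTENT"]  # simplified, not the real Wordpress schema


def build_rows(items, start_id=1, category=5, date="20071225"):
    """Turn (image_url, chinese) pairs into rows with incrementing IDs and merged HTML content."""
    rows = []
    for offset, (image_url, chinese) in enumerate(items):
        # Assumed template: an image followed by the Chinese text in a paragraph.
        content = '<img src="{}" alt=""><p>{}</p>'.format(
            html.escape(image_url, quote=True), html.escape(chinese)
        )
        rows.append(
            {"ID": start_id + offset, "CAT": category, "DATE": date, "POST_CONTENT": content}
        )
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text ready to be saved and imported."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data; example.com URLs are not real image locations.
items = [("http://example.com/a.jpg", "肉串"), ("http://example.com/b.jpg", "没有位子")]
print(rows_to_csv(build_rows(items, start_id=101)))
```

Start the IDs above the highest one already in the table, and take a database backup before importing anything.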

Without knowing too much about how you are currently working, that's how I would approach it. You may have more issues as you have categories, and Wordpress has a separate posts to categories table you would need to generate.

I'm not sure I'd bother with the word lists. We're not talking about whole books here, it's only an SMS. The word list isn't going to be any shorter than the SMS.


I would probably suggest that you could save a lot of time by asking someone to write a program to automatically post the updates from the Excel file, combined with whatever other files you have. It should be noted that you can have Excel export a comma (or tab) separated file, which could then be combined with the wordlist file, and automatically added to the database. It may not give you perfect formatting, but it would shave a fair amount of time off the process. Assuming you were able to name the sound files in a consistent manner, that could potentially be automated as well.

Doesn't seem that much can be done about the whole messages off a cellphone bit though... My best suggestion there would be to buy a Treo, get a third-party SMS program which supports exporting messages to text files, move messages one-by-one to your SIM card, put it in the Treo, then export them all to text, and finally move those to your computer. After which you should just use the Treo exclusively, as that'd be so much easier.
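A rough sketch of the combining step, assuming a tab-separated export with `id`, `chinese` and `english` columns, a wordlist file with one `id<TAB>word` pair per line, and sound files named after the message id. All of those layouts and the `msg_0001.mp3` naming scheme are invented for illustration.

```python
import csv
import io


def sound_file_name(message_id):
    """Hypothetical naming convention: msg_0001.mp3, msg_0002.mp3, ..."""
    return "msg_{:04d}.mp3".format(message_id)


def combine(messages_tsv, wordlists_tsv):
    """Join the message export with the wordlist file and attach the expected sound file name."""
    wordlists = {}
    for row in csv.reader(io.StringIO(wordlists_tsv), delimiter="\t"):
        if row:  # skip blank lines
            wordlists.setdefault(int(row[0]), []).append(row[1])

    posts = []
    for row in csv.DictReader(io.StringIO(messages_tsv), delimiter="\t"):
        message_id = int(row["id"])
        posts.append(
            {
                "id": message_id,
                "chinese": row["chinese"],
                "english": row["english"],
                "words": wordlists.get(message_id, []),
                "sound": sound_file_name(message_id),
            }
        )
    return posts


# In-memory stand-ins for the two exported files.
messages_tsv = "id\tchinese\tenglish\n1\t没有位子\tThere are no seats\n2\t你好\tHello\n"
wordlists_tsv = "1\t没有\n1\t位子\n"

for post in combine(messages_tsv, wordlists_tsv):
    print(post)
```

Each resulting record holds everything one post needs, so the final step of writing it to the blog database can be a simple loop.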
