
Automatic manga translation/scanlation

Pages (2) [ 1 2 ] Next
Post #513267
Member

8:31 am, Dec 19 2011
Posts: 173


Spurred by the recent closing of yet another scanlation group and the worrisome future of manga scanlation, I've tried to look into this subject.

For the past couple of hours, I've tried to find some kind of working solution to screen grab Japanese characters from an image, unsuccessfully. The idea came from eroge games, where an automated translation method already exists: extracting text from a running game (with Agth) and pasting it into a translator (Atlas).
The horrible mess resulting from that automated translation is often more than adequate to grasp plot-lines and dialogue, though subtleties are often lost. Still, the fact that you're not 100% reliant on translators to enjoy the games makes up for it hugely.

If it wasn't clear already, let me make it simple. You need two parts for this to work:

1. A way to extract the raw Japanese characters from the medium.
2. A way to translate the extracted text.

The more automated the path from step 1 to step 2 is, and the less manual input it needs, the better.
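
A minimal sketch of the two parts above, in Python. The functions `fake_ocr` and `fake_translate` are hypothetical stand-ins made up for illustration; they do not reflect the API of JOCR, Atlas, Google Translate or any other real tool. The only point is that step 1 should feed step 2 with no manual copying in between.

```python
from typing import Callable

# Hypothetical stand-ins: a real setup would call an OCR engine and a
# machine translator here. Neither function reflects any real tool's API.
def fake_ocr(image_path: str) -> str:
    """Pretend to read Japanese text out of an image file."""
    canned = {"page01.png": "こんにちは"}
    return canned.get(image_path, "")

def fake_translate(japanese: str) -> str:
    """Pretend to translate, using a tiny lookup table."""
    glossary = {"こんにちは": "Hello"}
    return glossary.get(japanese, f"[untranslated: {japanese}]")

def read_raw(image_path: str,
             extract: Callable[[str], str],
             translate: Callable[[str], str]) -> str:
    """Step 1 (extract) feeds straight into step 2 (translate)."""
    text = extract(image_path)
    if not text:
        return "[no text recognized]"
    return translate(text)

print(read_raw("page01.png", fake_ocr, fake_translate))
```

Because both steps are passed in as plain functions, either one could be swapped for a different OCR or translator without touching the rest.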

As already mentioned, there are multiple ways to translate the text. Even Google Translate is sufficient. The real problem is finding a way to extract the Japanese characters from the image, and this is where I'm stumped.

So far, I've tried to make this work using an application called JOCR (OCR = optical character recognition). With it, you can capture any image and the program will try to recognize any characters or text in it. To work, JOCR needs Microsoft Office Document Imaging (MODI). MODI is included in MS Office 2003 and 2007, but not in 2010. Also, if you use 2007, it's not installed automatically: you need to manually go into the Control Panel, Add/Remove Programs, select Office, then Change, find MODI under Tools and then install for all computers.

But it's not enough! If you haven't already, you also need to install the Japanese version of the Microsoft Office Multi-Language Pack. Only after selecting Japanese in MODI should JOCR be able to work.

Needless to say, I still can't make it work even after all that, and I don't know why. JOCR refuses to recognize the characters, which is why I'm stuck. It may well be that I failed somewhere along the way, so it may work for you.
One possible reason could be that the raw I used for testing writes the characters vertically, but even that would be fine, since I could just arrange it back manually afterwards, as long as the actual extraction works.
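
The "arrange it back" step can also be scripted. This is only a sketch under a strong assumption: that the OCR read the page row by row and returned an aligned grid of characters, with full-width spaces where a column is shorter. Real OCR output may look quite different. The function name `vertical_to_horizontal` is made up for this example. Vertical Japanese is read top to bottom within a column, and columns go from right to left.

```python
def vertical_to_horizontal(rows: list[str], pad: str = "\u3000") -> str:
    """Rebuild reading order from a grid that was read row by row.

    Assumes each string in `rows` is one horizontal slice of the page,
    and that `pad` (a full-width space) marks empty cells.
    """
    width = max((len(r) for r in rows), default=0)
    # Pad short rows so every row has the same number of cells.
    grid = [r.ljust(width, pad) for r in rows]
    out = []
    # Walk columns right to left, and each column top to bottom.
    for col in range(width - 1, -1, -1):
        for row in grid:
            ch = row[col]
            if not ch.isspace():
                out.append(ch)
    return "".join(out)

# Two columns: the right one says こんにちは, the left one says せかい.
page = ["せこ", "かん", "いに", "\u3000ち", "\u3000は"]
print(vertical_to_horizontal(page))
```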

I've written this partly as a reference for other people interested in the subject, and partly to encourage others to try to find a solution and post it here. I will of course update my post with any significant advancements or working methods.

Just imagine: you have a raw that you desperately want to read, but it may never be scanlated and you don't speak an ounce of Japanese. Wouldn't it be wonderful to just translate it yourself, slowly and with a couple of programs, but still? It could be the future, as fewer and fewer scanlators are active.

Post #513324 - Reply to (#513267) by RilleL
Member

1:54 pm, Dec 19 2011
Posts: 56


Quote from RilleL
Even google translate is sufficient.


No, it's not, and VN machine translations are considered unacceptable by any serious translation group.

Manga is a visual medium: if I look at the pretty pictures I can get an idea of what is going on. A terribly bad Google machine translation would confuse me more than not knowing what the moon runes mean.

________________
What part of "Please do not put in huge images!" did you fail to understand?
Post #513333 - Reply to (#513324) by Drakron
Member

2:34 pm, Dec 19 2011
Posts: 55


Quote from Drakron
Quote from RilleL
Even google translate is sufficient.


No, it's not, and VN machine translations are considered unacceptable by any serious translation group.

Manga is a visual medium: if I look at the pretty pictures I can get an idea of what is going on. A terribly bad Google machine translation would confuse me more than not knowing what the moon runes mean.


Indeed, but it's a good point to start with. When you have three big steps to take (character extraction, translation, character insertion), it's better to take them one at a time. Automatic translation is already a big research subject, so one should probably first focus on something else.

Who knows, maybe Google Translate will one day become so good that it produces non-gibberish text which actually makes sense. ^^"

Post #513351 - Reply to (#513333) by JustPassingBy
Member

4:41 pm, Dec 19 2011
Posts: 56


Quote from JustPassingBy
Indeed, but it's a good point to start with.


The problem is that it just makes editing much longer, because the gibberish pretty much forces the editors to translate the original text to make sense of it. The only potential time saving would be typesetting.

Automatic translators like the Star Trek Universal Translator are still in the realm of science fiction, and creating a program that translates scanned pages but still spews gibberish is not the way to go, since it would only be a translation in the broadest sense of the word.


________________
What part of "Please do not put in huge images!" did you fail to understand?
Member

5:26 pm, Dec 19 2011
Posts: 257


Sounds like about as much effort as just looking up the kanji would be. Which is really the only difficult part of reading raws when you're not fluent.

Learning kana can take anywhere from a day to a few months, depending on how diligent you are. If you can read kana (you could even just use a chart, but that would be reaaally slow going) you can read any manga with furigana by the kanji, as long as you have a dictionary and maybe a grammar site open in another tab.

Quote
The problem is that it just makes editing much longer, because the gibberish pretty much forces the editors to translate the original text to make sense of it. The only potential time saving would be typesetting.


This is mostly the reason why it would be such a pain. Whatever the case, with our current technology you'll only get sensible material from an actual translation. But it's really not that hard - even when I had only just learned kana, I was able to read through raws of Yotsuba!. It was a struggle, sure, but I managed it.

However, the idea of a character extractor would still be very useful for this more traditional method. Like I said, the biggest road-block is kanji, especially if you're reading a manga without furigana. Kanji can be confusing to look up, since most dictionaries use radicals or number of strokes and other things that would generally require you to actually have some knowledge about kanji.

If you could just extract the kanji then copy/paste it into a dictionary, I'm sure many non-Japanese readers or Japanese-learners would have an easier time translating. So I definitely think it's an interesting idea to develop!
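
Once the text exists as characters rather than pixels, pulling out the kanji for a dictionary lookup is the easy part. A small sketch in Python, assuming the text has already been extracted somehow. It only checks the main CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer kanji in the extension blocks would be missed. The function names are made up for this example.

```python
def is_kanji(ch: str) -> bool:
    # Main CJK Unified Ideographs block only; extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"

def unique_kanji(text: str) -> list[str]:
    """Return each kanji once, in order of first appearance."""
    seen: list[str] = []
    for ch in text:
        if is_kanji(ch) and ch not in seen:
            seen.append(ch)
    return seen

# Kana and punctuation are skipped; repeated kanji are listed once.
print(unique_kanji("今日は学校へ行く。明日も学校だ。"))
```

Each character in the resulting list can then be pasted into a dictionary one at a time, with no radical or stroke-count lookup needed.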

Post #513366
Member

6:31 pm, Dec 19 2011
Posts: 838


It'd be kind of easy... if all manga had the same font and the scans had a similar resolution. But different fonts could trip up any program made to recognize the characters, and making one able to read any font... it's a nice dream.

Post #513444
Member

7:02 am, Dec 20 2011
Posts: 173


It seems people have misunderstood me. I did not propose that automated scanlation replace traditional scanlation. The quality is not even close. However, the possibility to read any raw yourself without being reliant on a translator is a huge freedom. It's a method for yourself, not a way to mass produce scanlated manga. Google Translate or Atlas is sufficient in that it's possible to grasp what the text is largely about, which is what I wrote in my first post, not sufficient as in an acceptable translation from a scanlator's point of view.

Also, the fact that it's suggested to just learn the language is a bit laughable to me. If it was so easy and everyone could do it, why scanlate at all.. -.-

While it initially seems troublesome to set up a working method, everyone knows that the beginning is the hardest. Do you remember your first time registering and learning IRC? How easy is it to leech now? Or setting up Agth and Atlas? The eventual benefits are well worth the effort.

Last edited by RilleL at 8:48 am, Dec 20 2011

Post #513446
Member

7:18 am, Dec 20 2011
Posts: 27


There's only one thing that poses a problem with this, and that is, as mentioned, autoTLs. I've seen Google TL do its job well; the problem that makes manga incompatible with autoTLs is that mangaka use almost a whole nother form of Japanese altogether. The register used in manga is so far off the beaten path that it almost never translates correctly, at all. That's why people say to learn it, because currently, there's nothing that can correctly do it aside from the human mind.

Post #513493 - Reply to (#513444) by RilleL
Member

2:11 pm, Dec 20 2011
Posts: 257


Quote from RilleL
It seems people have misunderstood me. I did not propose that automated scanlation replace traditional scanlation. The quality is not even close. However, the possibility to read any raw yourself without being reliant on a translator is a huge freedom. It's a method for yourself, not a way to mass produce scanlated manga. Google Translate or Atlas is sufficient in that it's possible to grasp what the text is largely about, which is what I wrote in my first post, not sufficient as in an acceptable translation from a scanlator's point of view.

Also, the fact that it's suggested to just learn the language is a bit laughable to me. If it was so easy and everyone could do it, why scanlate at all.. -.-


Well, I don't think I misunderstood you. I've tried the method of using Google (I've never used Atlas, maybe it's better?) to read things, and it's really such a pain. I ended up having to look up each word by itself and then look up grammar rules so I could piece them together, because Google just mangles everything. Even "grasp what the text is largely about" is a stretch most of the time, unless every sentence is something like "Hello!" "What?" or "Baka!" And yes, I tried this for personal enjoyment and not mass distribution.

I'm not saying everyone should learn the whole language, just that doing exactly what I said above (looking up words and grammar rules) was easier for me than relying entirely on Google. Learning the language implies memorizing it and being able to translate it in your head. They're different things.

Post #513530
rawr
Member

8:47 pm, Dec 20 2011
Posts: 161


http://www.youtube.com/watch?v=ae01yz5z99E

Now... wait a few decades for them to finish up on Japanese. :P

________________
The Company
Batoto
✯ Sarcastic
 Member

4:43 am, Dec 26 2011
Posts: 597


I understand what you are saying:

Scanlations > autoTL's > RAWS.

With the first option becoming less and less available, I believe that the future of technology will allow autoTL's to improve significantly. Already some programs are becoming smarter by detecting grammatical mistakes in addition to misspellings. Even if someone learns kana, the most difficult part to master in any language is the huge vocabulary they hold, and simply not everyone is committed to doing that.

With some of the remaining scanlators ranging from bad to dictators (e.g., 50+ posts required per chapter for a one-day-long link), this fallback option seems like a good solution that needs a little more work to be in common usage. d('-')b

Post #575091
Member

3:50 am, Oct 28 2012
Posts: 173


So I realise I'm reviving a dinosaur, but I figured I should update this topic.
I recently found out about a JOCR method that actually works. I tested it myself by cropping a sentence from a raw and uploading it, and it worked.
From then on you could copy that sentence and try to use Google Translate or something, but the important part is that for the first time I used an OCR successfully, which makes me really enthusiastic for the future.
The OCR in question is:
http://maggie.ocrgrid.org/nhocr/

Again, if you have a better method please post it, like an OCR able to read top-to-bottom (vertical) text.

Last edited by RilleL at 3:58 am, Oct 28 2012

Post #575413
Member

5:42 pm, Oct 30 2012
Posts: 402


I can only be amazed at the dedication of someone able to read any significant amount of computer translated text (especially Japanese!). Surely even porn isn't that interesting. :)

________________
Active translations list
Completed translations list
Dropped translations
Post #583369 - Reply to (#575413) by cmertb
Member

11:15 am, Jan 8 2013
Posts: 6


Yes it is, if you want to know what the character says.


Post #583370
Member

11:25 am, Jan 8 2013
Posts: 761


In my experience, Japanese text translated by Google Translate usually turns out to be total gibberish, and it's very hard, or impossible, to even get its general meaning.
