Seeking advice on translation software and strategies

Discussion in 'Horological Books' started by Jim Duncan, Mar 23, 2018.

  1. Jim Duncan

    Jim Duncan Registered User
    NAWCC Member

    May 31, 2011
    401
    6
    18
    Central Coast of California
    I have a growing penchant for European clocks, especially the older (<1740) ones. I try to do research on what I find, but most of the books/magazines/articles are in languages other than English. A few times I've spent hours manually translating portions via Google translations. This is hard on the eyes and fingers. And occasionally there is a chuckle at the translation of certain horological terms. Commercial translation services are beyond my budget (2 or 3 pages = $400, etc).

    Has anyone found a decent translation software package that takes scanned or copied text in one language and spits it out in English? I think this falls under the OCR capability (optical character recognition).

    Or, is there a better way outside of OCR?

    Jim
     
  2. zedric

    zedric Registered User

    Aug 8, 2012
    792
    76
    28
    You don’t really need the OCR software to do the translation: if you can get the text OCR’d in French or German, then you can run the text output through Google Translate.

    For what it’s worth, I also do what you are doing now (retype the text) so I’d be interested in any solution you come up with...
     
  3. zedric

    zedric Registered User

    Aug 8, 2012
    792
    76
    28
    If you are looking for info on French clock makers / retailers, there are a heap of old catalogues etc in the French archives, accessible at gallica.fr - but again, mostly in French. But you can access the ocr’d text for these articles with the right search terms, so you can run it through google.
     
  4. Jim Duncan

    Jim Duncan Registered User
    NAWCC Member

    May 31, 2011
    401
    6
    18
    Central Coast of California
    Zedric - Thanks for your comments. I gave gallica.fr a try and searched for "L'Horloge Francaise a Poids" which is a book I check out from the NAWCC library now and then. There were several hits for it, and all were from "Bibliographie de la France" (a listing of books). The actual book itself was not found in my search.

    Using the NAWCC library books is where I need the most help with translations. If I photocopy or scan a page without OCR there is no way to get a translation system (Google) to read the copy/scan. It's all up to manual entry.

    At present it is Dutch and German translations where I need the most help. I've heard that OCR is no miracle system, but I wanted to learn what choices are out there and what are some of the limitations (technical terms, etc?).

    Jim
     
  5. zedric

    zedric Registered User

    Aug 8, 2012
    792
    76
    28
    Ok, so it looks like that book is from 1984 and so still well within copyright. That means that finding legitimate electronic copies on the web is not going to be possible. If you borrow it from the library, you could scan the page and run that through an OCR package that can pick up the French text (this should be OK for good OCR software so long as it knows it is looking for accents), and you will end up with a text file. Then you can copy and paste the text into Google Translate for a rough translation. Having not used any OCR package in the last ten years, I don’t know what is current, but there are surely web sites these days to which you can send scanned pages and get back text?
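
    Raw OCR output usually keeps the printed line breaks and end-of-line hyphens, which tends to confuse a translator. Below is a minimal sketch of a cleanup step between "OCR" and "paste into the translator". It assumes the OCR text is already in a Python string; the function name `clean_ocr_text` is made up for illustration and is not part of any OCR package.

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output so it pastes cleanly into a translator.

    Assumption: paragraphs in the OCR output are separated by blank lines.
    """
    # Normalise Windows/old-Mac line endings to plain "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at the end of a line,
    # e.g. "pen-\ndule" -> "pendule". Caveat: a genuine hyphenated
    # compound broken at a line end will lose its hyphen.
    text = re.sub(r"(\w)-\n[ \t]*(\w)", r"\1\2", text)
    # Split on blank lines, then unwrap each paragraph onto one line.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

    The result still needs a spell check in the source language, since this step only fixes layout and not misread letters or accents.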

    There are also free apps for the iPad - search for scan and translate in the App Store. Haven’t used any, but there is one with a decent review
     
  6. Jim Duncan

    Jim Duncan Registered User
    NAWCC Member

    May 31, 2011
    401
    6
    18
    Central Coast of California
    Zedric - your description is very much what I would expect, although I'd forgotten about the accent issue. I was hoping some of the NAWCC members had already been through the OCR & translation learning curve, but so far we've not heard from one. Perhaps if more of the general audience was asked? I wonder if any of the UK members have a system that works with the continental clock information language exchange?

    Jim
     
  7. zedric

    zedric Registered User

    Aug 8, 2012
    792
    76
    28
    Looks like there are 19 people seeing this thread, but none with better ideas to contribute at this stage - maybe time to re-post in Clocks general and see if that helps?

    I'm sure someone must have used OCR and done something like this before. I think one of those apps for the iPad, phone or the like would be worth exploring - they seem to be aimed at people trying to translate menus, so the vocab may be a bit limited, but I suspect they do the OCR on the device then upload the scanned text to a remote server, which would allow for better translation options.

    I generally don't mind re-typing stuff from books if it is in a foreign language, as it helps a bit in learning the language. But recently I was trying to find information on a French clockmaker, and it turns out his father went to Holland, so a lot of stuff I found was in Dutch. As I have no intention of ever learning Dutch, re-typing that stuff to get it through translation was a bit of a drag.
     
  8. Ralph

    Ralph Registered User
    NAWCC Member Sponsor

    Jan 22, 2002
    4,626
    48
    48
    There are translator applications/apps available for your smart phone and tablets. I had a friend demonstrate one to me years ago. They could only have gotten better.
    Google has one available... your phone's camera inputs the text and it spits out in the desired language..... I think. I haven't personally used it.

    Ralph
     
  9. Jim Duncan

    Jim Duncan Registered User
    NAWCC Member

    May 31, 2011
    401
    6
    18
    Central Coast of California
    Ralph - I went looking for applications like those you mentioned. So far I've just found phone/pad systems that use the iOS system. The ones I read further on use the camera for larger images like road signs or single lines on a food menu to give picture->translate results. I was interested more in scan text->OCR->translate systems for a desktop computer. Haven't found one of those yet, but am still looking. I'm not very computer/iPhone savvy, so this may not be quick.

    Thanks for the encouragement.

    Jim
     
  10. Richard Watkins

    Richard Watkins Registered User
    NAWCC Fellow NAWCC Member

    "Looks like there are 19 people seeing this thread, but none with better ideas to contribute at this stage"

    I have been absent.

    Technical translations (what you want to do) are relatively easy. There are three rules:

    First, know the subject. You haven't a hope in hell of translating a technical book well if you don't have a good knowledge of the subject. There are plenty of examples of books that have been translated by "experts" who are excellent in both languages, the source and English. But their translations are frequently wrong because they cannot use the correct words, let alone the correct interpretation of sentences.

    Second, language skills may not matter. Well, this is the case with French, but there may be problems with other languages. Certainly German offers some problems. But hopefully they can be solved by dictionaries. (My brother is fluent in German, but when I have given him a difficult sentence or two he flounders! So fluency is no guide!)

    Third, Google translate is OK and it is probably the best translation system available now. I used to use Systran (it may not exist now) but its translations were poor. Google will give you the sense in a garbled way, and if you know the subject then it is fairly easy to convert it into good English.

    Personal translations are relatively easy, because you can work around the strange English and get a sense of the meaning. To publish is much harder, because you have to convert the strange use of language into good English.

    The reason why technical translations are relatively easy is because they almost never require a nuanced understanding of language; the language is usually direct. Verb forms are irrelevant, and subtle differences in meaning don't occur. And the translation can be in a completely different style; indeed, often changing the style leads to much better English.
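
    One way to put "know the subject" to work alongside Google is to keep a personal glossary and check each source passage against it before reading the machine output. The sketch below is only an illustration: the three German entries are examples, the names `GLOSSARY` and `flag_terms` are made up, and a real list would have to be built from a horological dictionary.

```python
# Illustrative entries only (assumption): source term -> preferred English.
GLOSSARY = {
    "Unruh": "balance wheel (everyday sense: unrest)",
    "Hemmung": "escapement (everyday sense: inhibition)",
    "Gangreserve": "power reserve",
}

def flag_terms(source_text, glossary=GLOSSARY):
    """Return (term, preferred English) pairs for glossary terms in the text.

    Matching is a case-insensitive substring search, so compounds such as
    "Ankerhemmung" are flagged too. It can also over-flag (for example
    "Unruhe"), which is acceptable for a manual review aid.
    """
    lowered = source_text.lower()
    return [(term, english)
            for term, english in glossary.items()
            if term.lower() in lowered]

if __name__ == "__main__":
    sample = "Die Ankerhemmung und die Unruh sind beschädigt."
    for term, english in flag_terms(sample):
        print(f"{term}: {english}")
```

    The flagged terms show where the machine translation is most likely to have chosen the everyday meaning instead of the horological one.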
     
  11. Jim Duncan

    Jim Duncan Registered User
    NAWCC Member

    May 31, 2011
    401
    6
    18
    Central Coast of California
    Richard - Thanks for the insights. Getting the translations technically correct is a good portion of the challenge. The other part is the "labor" (labour) involved in reading one language, then typing it into Google Translate to get the translated meaning (right or wrong as it may be). I find this data entry phase the most difficult. I was hoping some clever people had tweaked the OCR capabilities such that some software was out there to take over this burdensome work.
    Once OCR had a rough draft of a translation then it would be time for the technical correctness that you discuss. I expect the FBI/CIA and such have proprietary versions, but we're not likely to see a demonstration regarding clocks.
    Wish I had a brother that spoke German, as that and Dutch are the main languages I need to translate from books.

    Jim
     
  12. Richard Watkins

    Richard Watkins Registered User
    NAWCC Fellow NAWCC Member

    OCR is essential. I use ABBYY FineReader to do the OCR. Then I use MS Word, set the language and spell check it.

    There is a very serious problem with Fraktur, the old German script: ABBYY FineReader and other available OCR packages cannot handle it and produce garbage. There is, apparently, a professional OCR system that works with Fraktur, but it is far too expensive to use, so the text has to be changed by hand into ordinary Latin script. A page or two is OK, but a whole book is probably impossible to do this way.
     
  13. Jim Duncan

    Jim Duncan Registered User
    NAWCC Member

    May 31, 2011
    401
    6
    18
    Central Coast of California
    Thought I should reply to this thread before it gets forgotten.

    I looked at ABBYY translation packages but they seemed more Microsoft friendly than Mac (which is what I have). Then my days became full of clock work, which is a nice thing. I plan to look at a sample package when I have the time (and clear head) to jump into software tryouts. If someone else discovers a nice package along the way, well, all the better.

    Jim
     
  14. Dr. Jon

    Dr. Jon Registered User
    NAWCC Member

    Dec 14, 2001
    5,301
    139
    63
    Aerospace Engineer
    New Hampshire
    I do it pretty much the way Richard Watkins does.

    First I scan it, or OCR it if I already have it in digital form such as a download. Then I use MS Word to spell check it, and then I use Google online for a paragraph-by-paragraph translation. That is about as large a piece as I can parse with mechanical translation. Google is pretty clueless about horological terms. I work mostly from French.
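
    The paragraph-by-paragraph step can be sketched in a few lines. This is a rough illustration, assuming the spell-checked text has been saved as plain text with blank lines between paragraphs; `paragraph_batches` and its `max_chars` limit are hypothetical names, and the limit is something to tune to whatever size you find comfortable to check by eye.

```python
def paragraph_batches(text, max_chars=1500):
    """Yield groups of whole paragraphs, each at most max_chars long.

    Paragraphs are never split: one that is longer than max_chars
    on its own is yielded alone.
    """
    batch, size = [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Account for the blank line that joins paragraphs in a batch.
        added = len(para) + (2 if batch else 0)
        if batch and size + added > max_chars:
            yield "\n\n".join(batch)
            batch, size = [], 0
            added = len(para)
        batch.append(para)
        size += added
    if batch:
        yield "\n\n".join(batch)

if __name__ == "__main__":
    sample = "Premier paragraphe.\n\nDeuxième paragraphe.\n\nTroisième."
    for number, chunk in enumerate(paragraph_batches(sample, max_chars=45), 1):
        print(f"--- piece {number} ---\n{chunk}")
```

    Setting `max_chars` very low gives strictly one paragraph per piece, which matches the workflow described above most closely.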

    I use Adobe Acrobat for OCR and have been happy with it. I bought it as part of a bundle with the Fujitsu Scan jet. That entire package cost me less than full Acrobat would have cost.

    Translation of horological technical literature is just hard unless you are very fluent in the original language.
     
