Translating a website: Tool for downloading hundreds of files and counting words
Thread poster: Rajan Chopra

Rajan Chopra | India | Local time: 17:01 | Member since 2008 | English to Hindi + ...
Hi friends,

A translation agency wants me to translate a website that contains dozens of links, and in every link there are many links and sub-links. If I download the files one by one, it will take a great deal of time. Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?

Secondly, there is no problem in counting the words in MS Word files, as I can go to Tools and ascertain the word count, but is there any tool to count words in pdf files, html pages etc., as counting them manually will kill a lot of time?

Thanks in advance for your precious help.

Regards,
Chopra

Laurent KRAULAND (X) | France | Local time: 13:31 | French to German + ...
Unprofessional way of dealing | Dec 5, 2010
Hi langclinic,

there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe.

I would insist on manageable work, i.e. the agency contacts the client and asks them to send the contents they really want to have translated. For you, it will be an insurance against doing too little or too much work.

This being said, I use Anycount to count the words in a PDF file. But the PDF file must be a genuine PDF (like pages created in DTP software or through an office application), not scanned files - in that case, as the file would just be images put in a PDF, you would have to count the words manually too.

Good luck!

Riadh Muslih (X) | Local time: 04:31 | Arabic to English + ...
Laurent KRAULAND wrote: "Hi langclinic, there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe."

I fully agree with Krauland, not only on the point of professionalism, and perhaps copyright, but also because I will not do the client's work. The client must send me what he/she wants me to translate; I should not be fishing for it, with or without pay.

jyuan_us | United States | Local time: 07:31 | Member since 2005 | English to Chinese + ...
I think the question is still relevant and worth looking into | Dec 5, 2010
Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and who doesn't know how to download the files either. In this case, you may have to figure out how to download all the files.

I have a piece of advice | Dec 5, 2010
1. To download a site, you need Teleport Pro. You just give it the URL, and the program does not go beyond the limits you have set (it is very important not to download the external pages that the website links to). Be aware that the program downloads everything (images or whatever) and stores all these files in one separate folder.
2. Fine count is a very powerful tool for counting words in html files, pdf, etc. You just select the folder where your downloaded files are stored, and then select html files only (to add them to the list).
3. You translate the files in TagEditor.
4. You then look through the on-line version of your translation to find any errors, slips of the pen, etc.

That is all. I have successfully translated and localized several sites using this method. Of course, only small-scale websites can be translated in such a way. With a large one, you will be lost in the piles of pages, images, etc. All that takes time (which means money). And frankly speaking, only rather small sites, of individuals or small companies, can be processed in that way. Large companies will of course never ask a single freelancer to translate the whole website.
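For anyone who wants a rough cross-check of step 2 without a dedicated counter, here is a minimal sketch in Python (standard library only). It is not how Fine count or Anycount work; the names TextExtractor, count_words_in_html and count_words_in_folder are made up for this illustration, and it assumes the downloaded pages are UTF-8 and that a "word" is simply a run of non-space characters. It ignores text inside script and style blocks, and it does not count translatable attributes such as alt or title, so treat the result as an estimate only.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text between tags, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words_in_html(markup):
    """Return the number of whitespace-separated words in one HTML string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with a space treats every tag boundary as a word boundary,
    # which is an approximation (e.g. "foo<b>bar</b>" counts as two words).
    return len(" ".join(parser.chunks).split())


def count_words_in_folder(folder):
    """Map each .html/.htm file under `folder` (recursively) to its word count."""
    counts = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            # Assumption: files are UTF-8; undecodable bytes are replaced.
            text = path.read_text(encoding="utf-8", errors="replace")
            counts[str(path)] = count_words_in_html(text)
    return counts
```

Calling count_words_in_folder on the folder where the download tool stored the site returns one count per page, and sum(counts.values()) gives the total. Like the "html files only" selection in step 2, images, PDFs and other file types are skipped.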
[Edited at 2010-12-05 06:52 GMT]

Laurent KRAULAND (X) | France | Local time: 13:31 | French to German + ...
jyuan_us wrote: "Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and who doesn't know how to download the files either. In this case, you may have to figure out how to download all the files."

But a website does not appear ex nihilo somewhere on the Internet. Someone *must* be in possession of the original files. It is like the plague some of us deal with when handling scanned PDFs - you'd be surprised how fast some clients manage to get the originals when you say that processing scanned PDFs comes at a surcharge of X%.

And how does one download Flash-generated content?

Samuel Murray | Netherlands | Local time: 13:31 | Member since 2006 | English to Afrikaans + ...
Three sets of tools | Dec 5, 2010
langclinic wrote: "Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?"

Yes, you need an "offline browser". I recommend Oleg Chernavin's Web Downloader 2.2 (google for webdown.exe and look on abandonware sites).

"Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time."

You can try Anycount: http://www.anycount.com/download.html

Original files | Dec 5, 2010
"Someone must be in possession of the original files". Yes, but they may be server-side scripts querying databases and contain no actual HTML at all. (That still doesn't make it the translator's problem, of course.)

As stated by various previous posters, the right way to do this is to get the files (and instructions!) from the IT guy of the company. If the site is not exclusively made up of static HTML pages, you can't possibly translate it by just trying to download the site "from outside".

If you know that it's all static HTML, it's still better to get the files from the webmaster, but it's possible to grab them from the net. The "right" tools for that are httrack and wget. I use wget; I believe the command to download (mirror) a site is:

wget -m -np -P outputfolder -p http://www.site/address.com

-m: mirror site
-np: no parent folders
-P: specify name of output folder
-p: get page dependencies such as images

Word counts shouldn't be an issue with HTML. You should do HTML with a CAT tool anyway, and your CAT tool will give you a word count.

BTW, both downloading and translating these files take a fair bit of IT knowledge - I'm not sure I myself would take it on without the client's guidance and support.
[Edited at 2010-12-05 12:22 GMT]
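If the mirroring has to be repeated for several sites, the wget call quoted above could be wrapped in a small script. The following Python sketch only assembles and runs that same command with the same four flags; the function names build_mirror_command and mirror_site are hypothetical, and it assumes wget is installed and on the PATH. It is an illustration under those assumptions, not a tested recipe for any particular site.

```python
import shutil
import subprocess


def build_mirror_command(url, output_folder):
    """Return the argument list for the wget mirror command from the post."""
    return [
        "wget",
        "-m",                 # mirror the site
        "-np",                # no parent: stay at or below the start folder
        "-P", output_folder,  # folder in which the downloaded files are saved
        "-p",                 # also fetch page dependencies such as images
        url,
    ]


def mirror_site(url, output_folder):
    """Run wget and return its exit code; raise if wget is not installed."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget was not found on the PATH")
    # check=False: the exit code is returned to the caller instead of
    # raising an exception when wget reports a problem.
    result = subprocess.run(build_mirror_command(url, output_folder), check=False)
    return result.returncode
```

For example, mirror_site("http://www.site/address.com", "outputfolder") would run exactly the command given in the post, with the placeholder address replaced by the real one. Passing the arguments as a list rather than one string avoids quoting problems with unusual URLs or folder names.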
Translator's Abacus | Dec 5, 2010
I looked at "Anycount" and wondered if there was anything similar but free. I came across "Translator's Abacus" at http://www.globalrendering.com/download.html and downloaded it. I've tried it and it seems quite useful.

Webreaper & Anycount | Dec 5, 2010
langclinic wrote: "Hi friends, Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?"

WebReaper 10.0 (Freeware)

"Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time."

Anycount