Hidden Characters in Word Document?
Thread poster: Arabic & More
Arabic & More  Identity Verified
Jordan
Arabic to English
Sep 7, 2013

I am having some trouble regarding the word count for a document I recently translated. According to Word, the document contains about 45,000 words, but the agency that gave me the job says it contains about 28,000. When I asked about the discrepancy, they said that there are hidden characters in the Word document affecting the word count and that they will send me a "clean" version so I can confirm the correct number of words.

Has anyone ever encountered something like this?

Is there a way that I can remove these hidden characters so that I can perform an independent word count?


 
Javier Wasserzug  Identity Verified
United States
Local time: 12:56
English to Spanish
Software Sep 7, 2013

Try this one:
http://ginstrom.com/CountAnything/


 
Tony M
France
Local time: 21:56
Member
French to English
SITE LOCALIZER
One possibility... Sep 7, 2013

Amel Abdullah wrote:
Has anyone ever encountered something like this?


Yes, I have! I once had a document with exactly this 'nearly double' error; and various different ways of counting the words, including CAT-tool analysis, yielded quite widely-varying counts. Never did quite get to the bottom of it...


Is there a way that I can remove these hidden characters so that I can perform an independent word count?


I can't remember if this was the final solution, but one thing you could try would be to 'Save as...' a plain text file, which ought to strip out anything spurious. If nothing else, it might at least reveal what those hidden characters actually are.

[Edited at 2013-09-07 20:15 GMT]
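
For readers who want to see what such hidden characters are once the text has been exported, here is a minimal sketch in Python. It assumes the hidden characters are invisible Unicode format controls (general category `Cf`), which is only one possible cause of a count discrepancy. The function name `find_format_chars` and the sample string are made up for illustration.

```python
import unicodedata
from collections import Counter

def find_format_chars(text):
    """Tally characters in Unicode category Cf (invisible format controls)."""
    return Counter(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in text
        if unicodedata.category(ch) == "Cf"
    )

# Hypothetical sample with a few invisible marks mixed into visible text.
sample = "Hello \u200fworld\u200e and \u202bmore\u202c text"

for label, count in find_format_chars(sample).most_common():
    print(count, label)
```

To run this on a real export, replace `sample` with the contents of the saved text file, opened with whatever encoding was chosen when saving.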


 
gbaydar
Local time: 22:56
English to Turkish
Text boxes and footnotes Sep 7, 2013

Your client must be unchecking the "count text boxes and footnotes"* option in the MS Word word counter, or the other way around.

* Something to that effect; my Word is in Turkish.


 
Joakim Braun  Identity Verified
Sweden
Local time: 21:56
German to Swedish
Deselect "hidden" Sep 8, 2013

If it's as simple as text formatted as "hidden": Select all, Format->Font... and then deselect "Hidden".

Use the "Display invisible characters" command to display (but not word-count) hidden text.

(I don't have the English version of Word, so the commands may be called something else, but I think you'll be able to figure it out.)

---

Sorry, upon re-reading the question this probably isn't a relevant answer!

[Edited at 2013-09-08 08:58 GMT]


 
LilianNekipelov  Identity Verified
United States
Local time: 15:56
Russian to English
I don't know about Word in particular, Sep 8, 2013

however, different programs and CAT tools produce different word counts; the difference is sometimes close to a few thousand words in a 100,000-word document. Excel is especially hard, because you cannot really count the columns separately, unless there is an easy way I don't know about that avoids adding up the rows one by one. These days you have to recount everything yourself if you want a reasonably precise word count.

 
Arabic & More  Identity Verified
Jordan
Arabic to English
TOPIC STARTER
Unicode control characters... Sep 15, 2013

Thanks to those who offered suggestions, both here and via e-mail.

I did not download the program suggested by Javier as I was not sure it was relevant to my issue, but I did try the various other solutions offered...to no avail.

The client eventually sent me the "clean" document and told me the hidden characters were called "Unicode control characters," which are used for bidirectional text control. They are used when both English and Arabic appear in the same line since Arabic is written R to L.

Now that I know what the characters are called, does anyone know how I can remove them when doing my own word count? I am likely to receive more files of this type and would love to be able to do this on my own.

Regarding Excel, I usually copy and paste the material to Word and let Word do the counting. Not sure if there is an easier way.


 
Rolf Keller
Germany
Local time: 21:56
English to German
Directional Unicode characters Sep 15, 2013

Amel Abdullah wrote:

The client eventually sent me the "clean" document and told me the hidden characters were called "Unicode control characters," which are used for bidirectional text control. They are used when both English and Arabic appear in the same line since Arabic is written R to L.

Now that I know what the characters are called, does anyone know how I can remove them when doing my own word count?


Work with a renamed copy; don't touch the original!
In Word, use "Save as/Text only (*.txt)/Coding=Unicode". Open this .txt document in Notepad; it should display all the English and Arabic words correctly. Save the document in Notepad, then reopen it in Word. The directional info should now have been discarded (check by moving through the text with the arrow keys). Some formatting will be discarded as well, but this should not affect the word count.

If the above recipe doesn't work, try saving (in Notepad) with Coding=Arabic.

FWIW: The "directional" Unicode characters are U+200E, U+200F, U+202A, U+202B, U+202C, U+202D and U+202E. Maybe there is a way to replace them with "nothing".
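
One way to replace those seven characters with "nothing" outside of Word is a short script run on the exported text. The following is a minimal sketch in Python, not a tested workflow for this particular file: the helper names `strip_directional` and `count_words` and the sample string are invented for illustration, and the count is a simple whitespace-delimited one, which may not match Word's own counting rules.

```python
# The seven directional characters listed above.
DIRECTIONAL = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e"

# str.translate deletes every character that is mapped to None.
STRIP_TABLE = {ord(ch): None for ch in DIRECTIONAL}

def strip_directional(text):
    """Return text with the directional control characters removed."""
    return text.translate(STRIP_TABLE)

def count_words(text):
    """Whitespace-delimited word count (an assumption, not Word's algorithm)."""
    return len(text.split())

# Hypothetical sample: an Arabic word ("marhaba") wrapped in embedding
# controls, plus two stray marks that stand alone between spaces.
sample = "\u202b\u0645\u0631\u062d\u0628\u0627\u202c \u200e Word \u200f 2013"

print(count_words(sample))                     # stray marks count as "words"
print(count_words(strip_directional(sample)))  # count after stripping
```

In this sample, the two marks that stand alone between spaces are counted as separate tokens until they are stripped. Whether that is what inflated the count in the original document is a guess.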


 

