How to search very large TMX files on a Mac?
Thread poster: wilhelm_zwo (X)
wilhelm_zwo (X)
Netherlands
Local time: 12:16
German to Dutch
Jul 18, 2013

What is a good way to search very large TMX files (3 GB and up) on a Mac, for concordance purposes?

WII


 
Fernando Toledo
Spain
Local time: 12:16
German to Spanish
I use Jul 18, 2013

wilhelm_zwo wrote:

What is a good way to search very large TMX files (3 GB and up) on a Mac, for concordance purposes?

WII


I use TextWrangler for this, but I think any editor will do.


 
Meta Arkadia
Local time: 17:16
English to Indonesian
Nope Jul 18, 2013

Fernando Toledo wrote:
[I use] for this TextWrangler

No you don't. TextWrangler can't open files larger than around 350 MB.

but I think any Editor will do it.


Nope. As far as I can see, only Java-based text editors can open such large files.

Cheers,

Hans


 
Martin Skara, PhD.
Slovakia
Local time: 12:16
French to Slovak
Vim Jul 19, 2013

Try MacVim: http://macvim.org/OSX/index.php

 
wilhelm_zwo (X)
Netherlands
Local time: 12:16
German to Dutch
TOPIC STARTER
UltraEdit vs. TextWrangler Jul 19, 2013

Fernando Toledo wrote:

wilhelm_zwo wrote:

What is a good way to search very large TMX files (3 GB and up) on a Mac, for concordance purposes?


for this TextWrangler but I think any Editor will do it.


I'm sorry, but these DGT TMs are way too large for good old TextWrangler. Luckily, the new UltraEdit 4.1 can handle them. As a matter of fact, its Find in Files function can search all TMs in a folder very quickly, even when they are as gigantic as the DGT. I'm still looking for the optimal solution, though. (Since UE isn't integrated in my CAT tool CafeTran on my Mac.)
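For anyone without UltraEdit, a folder-wide search of this kind can be approximated with a short script. What follows is only a minimal sketch in Python, not how UE does it: the function name find_in_files is made up for illustration, the default encoding is an assumption (pass encoding="utf-16" if your TMX files are stored that way), and it relies on the TMX files containing line breaks, since it reads one line at a time.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_in_files(folder: str, term: str, encoding: str = "utf-8") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing term."""
    needle = term.casefold()
    for path in sorted(Path(folder).glob("*.tmx")):
        # Iterating over the file object reads one line at a time, so memory
        # use stays small as long as the file contains line breaks.
        with open(path, encoding=encoding, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.casefold():
                    yield path.name, number, line.strip()
```

Looping over find_in_files("/path/to/tm/folder", "notification") and printing each result gives a rough concordance list. The match is case-insensitive and works on raw lines, so any markup on the matching line is shown as well.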

Joakim, when will your revamped TMX searcher be relaunched?


 
Meta Arkadia
Local time: 17:16
English to Indonesian
Heap? Jul 19, 2013

Martin Skara, PhD. wrote:
try http://macvim.org/OSX/index.php

MacVim should be able to open large files, but I think it requires increasing the Java heap. I can't do that. It's not in a *.plist or *.config file (as far as I can see), and I stay away from the Terminal if I don't know what I'm doing, which is most of the time.

Advice welcome. Martin?

Cheers,

Hans


 
John Moran
Ireland
Local time: 11:16
German to English
OmegaT Jul 19, 2013

Assuming you have more than 4 GB of RAM, OmegaT has no problems with 3 GB TMs, but you have to tell the Java Virtual Machine to reserve enough memory for the file, as the default is too small.

To do this, go to Applications/Utilities and open Terminal.

Then type:

cd /Applications/OmegaT.app/Contents/

and then

open .

Drag and drop the file "Info.plist" into a text editor (I use TextWrangler).

Look for the VMOptions and change the -Xmx value to something above 3 GB. I have 8 GB of RAM, so I use:


VMOptions
-Xmx6024M

Then create a project with a small dummy file (a docx with one dummy sentence) and place the TMX file in the /tm directory. You can then use Ctrl+F to search the TMX file. The search also uses lemmatisation, so "dog" will find "dogs".
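The -Xmx change itself is a simple text substitution. The sketch below shows it in Python, assuming the VMOptions value is a single string of space-separated JVM flags. The function name set_max_heap is hypothetical, and the code neither reads nor writes Info.plist; it only illustrates what the edited value should look like.

```python
import re


def set_max_heap(vm_options: str, megabytes: int) -> str:
    """Return vm_options with its -Xmx flag set to the given size in MB."""
    new_flag = f"-Xmx{megabytes}M"
    # Match an existing -Xmx flag such as -Xmx512M, -Xmx1024m or -Xmx2g.
    pattern = re.compile(r"-Xmx\d+[kKmMgG]?")
    if pattern.search(vm_options):
        return pattern.sub(new_flag, vm_options)
    # No -Xmx flag present yet: append one.
    return f"{vm_options} {new_flag}".strip()
```

For example, set_max_heap("-Xmx512M", 6024) returns "-Xmx6024M", the value quoted above. Keep the number below the amount of physical RAM in the machine.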


 
Meta Arkadia
Local time: 17:16
English to Indonesian
It's a worry Jul 20, 2013

John Moran wrote:
Assuming you have more than 4GB RAM OmegaT has no problems with 3GB TM's

I'm pretty sure der Wilhelm is well aware of that solution. Like me, he uses CafeTran. Loading and searching large files in CT is no problem, and you can even run two instances of CT at the same time. And if that isn't enough, you can load a huge TMX file as an "external" database, in which case it uses very little RAM.

So let me rephrase Wilhelm's question:

How can I search, and index, large TMX (and other) files on a Mac, outside my CAT tool?

There are two problems with that:

1. You can't open documents (not files in general) exceeding around 350 MB on a Mac with apps that don't run under Java (I don't know if there are other solutions, but I doubt it).
2. Spotlight/SpotInside cannot search TMX files.

So to search those files, you need a Java application. Alternatively, you (still) need a Java application to open the TMX file, convert it to TXT, and split it into files OS X can handle, i.e. smaller than 300 MB, to make them searchable in Spotlight/SpotInside.
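The convert-and-split step can also be sketched with a streaming XML parser instead of an editor. This is a rough Python illustration under several assumptions: the file is well-formed TMX with tu and seg elements, one tab-separated line per translation unit is an acceptable text format, and the 300 MB limit is simply the figure mentioned above. The function name tmx_to_text_chunks is made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def tmx_to_text_chunks(tmx_path: str, out_dir: str, max_bytes: int = 300 * 1024 * 1024) -> List[Path]:
    """Write one line per translation unit, starting a new .txt file
    whenever the next line would push the current file past max_bytes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    handle = None
    size = 0
    # iterparse streams the XML, so the TMX is not loaded in one piece.
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag != "tu":
            continue
        # Collapse whitespace so each unit stays on a single line.
        segments = [" ".join("".join(seg.itertext()).split()) for seg in elem.iter("seg")]
        line = ("\t".join(segments) + "\n").encode("utf-8")
        if handle is None or size + len(line) > max_bytes:
            if handle is not None:
                handle.close()
            path = out / f"{Path(tmx_path).stem}_{len(written) + 1:03d}.txt"
            handle = open(path, "wb")
            written.append(path)
            size = 0
        handle.write(line)
        size += len(line)
        elem.clear()  # drop the text and children of the unit just written
    if handle is not None:
        handle.close()
    return written
```

The output files are UTF-8 text, named after the TMX file with a running number. Text inside inline tags within a segment ends up in the output too, and emptied tu elements stay attached to the tree, so memory use still grows slowly on very large files.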

I still don't know how to do it. I tried Martin's solution (above), but a 1.5 GB TMX file didn't open in MacVim. I tried to increase the Java heap for MacVim, to no avail, mainly because MacVim isn't a Java app.

Der Wilhelm suggested UltraEdit (Java). The new beta can split files, it seems, so that could be a solution. I downloaded the latest build, which can't split files...

I spent so many hours trying to solve the issue that I could have learned the contents of those databases by heart. I'm sick of it. But I'm sure everybody knows we're talking about the EU files (DGT and Eurobook), and I happen to translate EU notifications. What's worse, from two source languages, ENG and GER, into DUT. I need those big files. Searching them, and even auto-assembling from them in CafeTran, is not a problem, but I want to be able to search the DGT/Eurobook files of the other source language. And for non-EU texts, I want to be able to search them without attaching them to my current project.

Cheers,

Hans


[Edited at 2013-07-20 00:44 GMT]

[Edited at 2013-07-21 04:42 GMT]


 
Meta Arkadia
Local time: 17:16
English to Indonesian
Well, integrate it then Jul 20, 2013

wilhelm_zwo wrote:
Since UE isn't integrated in my CAT tool CafeTran on my Mac.

Write an Automator Service to be able to search from within CafeTran (or any other app), or ask the UE developer to write it.

Cheers,

Hans


 
Heartsome Support
Local time: 18:16
Import server-based database Aug 30, 2013

This TMX is too big to open, even with a text editor. In theory, you can import the TMX into a server-based database supported by Heartsome, such as MySQL, PostgreSQL or Oracle, and search it on a Mac. In that case you have to translate your files in Heartsome, because Heartsome does not provide a standalone TM program.
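To illustrate the database idea outside any CAT tool, here is a minimal sketch that uses SQLite (Python's built-in sqlite3 module) as a stand-in for the server databases named above. It is not Heartsome's import routine. The table layout and the function names import_tmx and concordance are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def import_tmx(tmx_path: str, db_path: str) -> int:
    """Load every segment of a TMX file into an SQLite table; return the row count."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS segments (unit INTEGER, lang TEXT, text TEXT)")
    unit = 0
    rows = 0
    # Stream the file unit by unit instead of loading it whole.
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag != "tu":
            continue
        unit += 1
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            conn.execute("INSERT INTO segments VALUES (?, ?, ?)", (unit, lang, text))
            rows += 1
        elem.clear()
    conn.commit()
    conn.close()
    return rows


def concordance(db_path: str, term: str) -> list:
    """Return (lang, text) for all segments of every unit in which term occurs."""
    conn = sqlite3.connect(db_path)
    found = conn.execute(
        "SELECT lang, text FROM segments WHERE unit IN "
        "(SELECT unit FROM segments WHERE text LIKE ?) ORDER BY unit, rowid",
        (f"%{term}%",),
    ).fetchall()
    conn.close()
    return found
```

After a one-time import_tmx run, concordance returns each hit together with the other language versions of the same unit. LIKE treats % and _ in the search term as wildcards and scans the whole table, so on a multi-gigabyte TM an index or full-text search would probably be needed for acceptable speed.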

 

