I have long wanted better access to the content in my Maltego files, and by “better” I mean scripted command-line access. Having recently paid my annual license fee for the 14th time(!), I have almost 2,500 MTGX/MTGL files lurking.
I first mentioned this a week ago in Coding With Claude and yesterday I got it done - basically by having watched Claude struggle, then fail, and avoiding those pitfalls. If there is a solution to a problem, Claude will get there quickly. If there isn’t a solution, Claude will wander in circles, doing the best it can, and finally end up with a fancy framework around the place where the solution would fit.
Thusly leaving you to work out the details on your own. Which is what I quickly did, having first ignored the problem for a week …
Attention Conservation Notice:
Stream of consciousness regarding obscure file formats, network analysis, and what one can/cannot do with AI. Starts esoteric and goes downhill from there.
Simplification:
I knew that Claude couldn’t handle this problem, so I dumbed it down - just asking ChatGPT for a way to identify the software used to create Lucene files, and then a way to query them. Keep in mind current Maltego files end with MTGL and there is an older format MTGX.
I was amazed to find such an ancient Lucene version in use - 5.5.5, the latest maintenance release of the 5.5 line, from October of 2017. I looked at my files and the very last with an MTGX in its name was 2016-02-09. Lucene 5.5.0 was released two weeks after that. Reading between the lines, Maltego shifted to Lucene 5.5, it works great for their purposes, and they’re not inclined to change. The current version of Lucene is five major upgrades ahead and tools for 5.5 are starting to age out of support.
Lucene50SegmentInfo
Mac OS X
java.vendor
Homebrew
java.version
21.0.7
java.vm.version
21.0.7
lucene.version
5.5.5
os.arch
aarch64
java.runtime.version
21.0.7
source
flush
os.version
15.5
timestamp
1748968106106
_3f.si
_3f.cfe
_3f.cfs
Lucene50StoredFieldsFormat.mode
BEST_SPEED
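The listing above reads like the printable strings inside a Lucene segment info (.si) file. With a couple of thousand graphs to check, the same lookup can be scripted. What follows is a minimal sketch, not a real Lucene reader: it assumes the keys and values sit in the file as plain ASCII runs separated by non-printable bytes, and the function names are made up for illustration.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List, Optional

# Runs of 4+ printable ASCII bytes, the same heuristic the Unix `strings` tool uses.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_strings(data: bytes) -> List[str]:
    """Return every run of printable ASCII found in a blob of bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def value_after(strings: List[str], key: str) -> Optional[str]:
    """Return the string that directly follows `key`, or None if the key is absent."""
    for i, s in enumerate(strings[:-1]):
        if s == key:
            return strings[i + 1]
    return None


def lucene_versions(index_dir: Path) -> Dict[str, Optional[str]]:
    """Map each .si file in one index directory to the lucene.version it records."""
    return {
        si.name: value_after(printable_strings(si.read_bytes()), "lucene.version")
        for si in sorted(index_dir.glob("*.si"))
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: pass the path of one unpacked index directory.
    for name, version in lucene_versions(Path(sys.argv[1])).items():
        print(name, version)
```

Because this only scrapes strings, it is a triage tool for sorting files by the Lucene version that wrote them, not something to trust for the actual index contents.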
Having focused on command line options rather than libraries to integrate, I noticed Lucene Query Tool, which has not been updated since … *sparkles* 2016. Perfect. The build was really simple, took all of thirty seconds on my elderly Proxmox system.
This invocation in the Graphs/Graph1 folder of an unpacked Maltego graph dumps all the details on the few entities that are in it. That’s 11,426 lines for the 450 entities that come back from an “L1 machine” aimed at maltego.com.
lqt --index DataEntities -q %all
Here’s what came out for the email associated with the DNS entries for the Maltego domain. I made this file on June 3rd.
valueStr[string]: dns@jomax.net.
displayValueStr[string]: dns@jomax.net.
propText[string]: dns@jomax.net.
weight[int]: 100
hiddenPropText[string]: 2025-06-03 09:26:25.662 -0700
id[string]: 2uwnhtd0eotch
labelReadOnly[bool]: f
type[string]: maltego.EmailAddress
hasAttachments[bool]: f
properties[map]|maltego°automation°dob[map]|hidden[bool]: t
properties[map]|maltego°automation°dob[map]|value:datetime[long]: 1748967985662
properties[map]|maltego°automation°dob[map]|type[string]: datetime
properties[map]|maltego°automation°dob[map]|propertyValueFormat[string]: STRING
properties[map]|email[map]|displayName[string]: Email Address
properties[map]|email[map]|value:string[string]: dns@jomax.net.
properties[map]|email[map]|type[string]: string
properties[map]|email[map]|propertyValueFormat[string]: JSON
__key[long]: -4901947072960886639
__index-date[date]: 1748967985667
__birth-date[date]: 1748967985667
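Each line in that dump follows one pattern: a path of name[type] segments joined by |, then a colon and the value. That is regular enough to fold into a nested dictionary. The sketch below assumes the caller has already split the lqt output into the lines for a single entity (how lqt separates records is not shown here), and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# One path segment looks like name[type], e.g. "email[map]" or "value:string[string]".
SEGMENT = re.compile(r"^(?P<name>.+)\[(?P<type>[^\[\]]+)\]$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert(value, type_name):
    """Turn the text after the colon into a Python value based on its type tag."""
    if type_name in ("int", "long", "date"):
        return int(value)
    if type_name == "bool":
        return value == "t"
    return value


def parse_lqt_record(lines):
    """Build a nested dict from the 'path[type]: value' lines of one entity."""
    record = {}
    for line in lines:
        key, sep, value = line.lstrip().rstrip("\r\n").partition("]: ")
        if not sep:
            continue  # not a field line
        segments = [SEGMENT.match(part) for part in (key + "]").split("|")]
        if not all(segments):
            continue  # malformed path, skip it
        node = record
        for seg in segments[:-1]:
            node = node.setdefault(seg["name"], {})
        leaf = segments[-1]
        node[leaf["name"]] = convert(value, leaf["type"])
    return record


def millis_to_utc(ms):
    """Convert epoch milliseconds (as in value:datetime) to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)
```

As a sanity check on the sample, 1748967985662 milliseconds works out to 16:26:25.662 UTC on 2025-06-03, which is the same instant as the 09:26:25.662 -0700 shown in hiddenPropText.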
Running Aground:
It’s nice to finally be able to see inside Maltego files, but I have to unpack them on macOS, rsync them to Linux, and then run lqt to see the details.
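The unpacking step, at least, is easy to batch. Here is a rough sketch that assumes both MTGL and MTGX files are ordinary zip archives; anything that fails the zip check is skipped rather than guessed at. The function name and folder layout are placeholders, not part of any existing tool.

```python
import zipfile
from pathlib import Path


def unpack_graphs(source_dir, dest_dir):
    """Extract every zip-based .mtgl/.mtgx file under source_dir into its own folder."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    unpacked = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in (".mtgl", ".mtgx"):
            continue
        if not zipfile.is_zipfile(path):
            continue  # not a zip archive, leave it alone
        # Keep the extension in the folder name so a.mtgl and a.mtgx do not collide.
        target = dest_dir / path.name
        with zipfile.ZipFile(path) as archive:
            archive.extractall(target)
        unpacked.append(target)
    return unpacked
```

Running this once and rsyncing the destination folder keeps the per-file manual steps out of the loop.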
Lucene is a search library that is integrated into larger projects - like Elasticsearch and Solr. The command line tools for it are 1) dated, 2) often unsupported, and 3) complex to install, with both ChatGPT and Claude hallucinating badly when presented with this problem.
Again reading between the lines, both of those LLMs have ingested the same set of instructions using a specific version - Lucene 9.1.3. I think they got it from a Stack Exchange post and I don’t think the solution was complete. It appears someone offered a complex, well-documented problem, and then received a couple of suggestions, none of which were complete.
As a longtime Unix user, I’m used to eyeballing partial solutions and picking a path through them. The LLMs are just lying in a plausible fashion when queried about this problem.
Tinkering:
After much fiddling, I can unpack and read an MTGL file and I mastered the older MTGX format, which contains a simple GraphML file, rather than the cluster of Lucene indices in the newer format. There is still a LOT to be done and it’s going to be fragmentary, because it’ll be just command line tools for handling large groups of these files, not some smooth vertically integrated thing.
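For the older format, the standard library is enough. This is a minimal sketch that assumes an MTGX file is a zip archive whose .graphml members use the standard GraphML namespace. It only pulls node ids and edge endpoints and ignores the Maltego-specific payload inside each node. The function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard GraphML namespace; assumed to be the one Maltego writes.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml(mtgx_path):
    """Return {member name: (node ids, (source, target) pairs)} for one MTGX file."""
    graphs = {}
    with zipfile.ZipFile(mtgx_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".graphml"):
                continue
            root = ET.fromstring(archive.read(name))
            nodes = [n.get("id") for n in root.iterfind(".//g:node", GRAPHML_NS)]
            edges = [
                (e.get("source"), e.get("target"))
                for e in root.iterfind(".//g:edge", GRAPHML_NS)
            ]
            graphs[name] = (nodes, edges)
    return graphs
```

Counting nodes and edges per file this way is a cheap first pass over a large pile of old graphs before deciding which ones deserve a closer look.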
I do want to get the MAGA Meltdown graph information into ArangoDB format, with the linked URLs pulled as PDF or something, so it becomes a proper LLM+knowledge graph. That wasn’t in reach yesterday morning, now it most certainly is. The question is how far I’m going to go in smoothing the process. I may just get some minimal bits of code together and leave the Maltego2Arango as a starting point, with the assumption that anyone who needs this sort of thing is going to have a customization job ahead of them.
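To make "ArangoDB format" concrete: a vertex is a JSON document with a _key, and an edge is a document whose _from and _to hold collection/key ids. One document per line (JSONL) is a format arangoimport accepts. The sketch below is not the Maltego2Arango code. The collection name "entities", the input dict fields, and the function names are all assumptions for illustration.

```python
import json


def entity_to_vertex(entity):
    """Map a parsed Maltego entity (assumed keys: id, type, value) to a vertex document."""
    return {"_key": entity["id"], "type": entity["type"], "value": entity["value"]}


def link_to_edge(source_id, target_id, collection="entities"):
    """Build an edge document pointing between two vertices in one collection."""
    return {
        "_from": f"{collection}/{source_id}",
        "_to": f"{collection}/{target_id}",
    }


def to_jsonl(documents):
    """Serialize documents one per line, ready to be written to a .jsonl file."""
    return "".join(json.dumps(doc, ensure_ascii=False) + "\n" for doc in documents)
```

Reusing the Maltego entity id as the _key keeps the mapping reversible, which matters if transforms are later going to pull data back out of ArangoDB.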
What would make more sense, at least for me, is getting some transforms going that can pull data FROM ArangoDB. Once things become bi-directional, then I’d have something. But the problem is monetization … no point to building complex infrastructure if I can’t get paid using it, right?
Conclusion:
Neither Claude nor ChatGPT did very well on this problem, but they were very valuable in their failure modes. I saw what they could not do, and that … put some bounds around where I had to look for answers. It’s like that peer-reviewed science thing - negative results still ARE results, they can keep you out of the weeds.
I have been holding off paying for the Claude Max subscription until I knew I had lots of coding queued up. I’m also dealing with some health stuff that has me moving slowly. I have another doctor’s thing on the 17th and once that’s done I’ll know if I’m going to be able to capitalize on having a $100/month programming assistant active.
Among the videos I’ve seen about AI and coding, a question has come up repeatedly - “If it’s good for development, where’s the FOSS?” I think that … what I see with AI is better for applying existing tooling, rather than creating it from scratch.
So I’ll grind a bit more on this, update the Maltego2Arango repo, and save the next person who comes along a great many hours of digging to get the basics in place.