I read Information Operations Recognition: From Nonlinear Analysis to Decision-Making about three years ago. I found the OSINT of Chapter 2 and the network analysis in Chapter 4 immediately familiar, but I ran aground hard in Chapter 3, Elements of non-linear dynamics for information operations recognition. Way back when, my sophomore year in college, I shared an apartment with three other guys: one chemical engineer, one electrical engineer, and one nuclear engineer. I recognize the math in that chapter as second-year calculus for engineers, but being in computer science I had already made my escape from such things.
I’ve needed a Python-oriented replacement for JGAAP for years. I’ve spent a lot of time with Open Semantic Search, which implicitly meant spending some time with spaCy, and because I was in a mode of learning rather than full-throttle pipeline tuning, I’ve also had some Natural Language Toolkit adventures. But the only graphical Python thing that makes sense is Orange Data Mining, which I first mentioned in Attribution Using Stylometry. This is something I played with years ago, but it didn’t stick for me.
A couple of days ago I had waded through about a dozen of their training videos on statistical matters (the other big gap in my skills) … and then I noticed the four most recent are right in line with what I need to do: compare writing samples.
Longtime readers will recall that I previously wrestled with “brain fog” as a result of Lyme disease and its treatment circa 2007 to 2009. New things that should have taken me a month to reach proficiency would require a year … or two … or maybe they’d just remain out of reach. Switching from Perl to Python was an agonizing drag from mid-2011 to mid-2013. I have started Matt Jackson’s Social and Economic Networks: Models and Analysis at least FIVE TIMES, and every single time, four to six weeks in, I would have a health downturn and stop attending.
But I got my brain back in early June and this change seems to have stuck. There are no words in English to describe the gratitude I feel for this. I can study something for a bit and it starts working for me. WOW.
Visual Programming:
Orange has a visual programming style. Each of the nodes on this graph is a “widget” - loading a file, turning it into a corpus (the collection of documents the NLP widgets work on), looking at it, then this embedding thing is a machine learning function that turns each document into a vector of numbers. The distances widget measures how far apart those vectors are (Euclidean or cosine distance, for example), the clustering bunches up documents whose vectors sit close together, which roughly means similar content, and t-SNE is a dimensionality reduction method I just found that squashes those long vectors down to a 2D scatter plot. This feels a lot like the Unix command line tool chain environment - simple things you can stick together in order to produce solutions to complex problems.
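To convince myself I understand what the widget chain is doing, here is a rough plain-Python sketch of the same shape of pipeline: corpus, vectors, distances, clustering. It is not how Orange implements any of this. I am assuming simple word counts in place of the embedding widget, cosine distance, and single-linkage clustering, and I leave t-SNE out entirely. The four sample sentences and all the function names are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word tokens (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(bags):
    """Pairwise distances between every pair of documents."""
    return [[cosine_distance(a, b) for b in bags] for a in bags]


def single_linkage(dist, n_clusters):
    """Agglomerative clustering: keep merging the two closest clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy corpus: two sentences about a cat, two about markets.
docs = [
    "the cat sat on the mat and the cat slept",
    "the cat slept on the mat all day",
    "stock markets fell as bond yields rose",
    "bond yields rose and stock markets fell sharply",
]

bags = [bag_of_words(d) for d in docs]
clusters = single_linkage(distance_matrix(bags), n_clusters=2)
print(clusters)
```

Each function is one "widget" and the last three lines are the wires between them. Swapping word counts for a real embedding model, or single linkage for another clustering method, would change the results, which is exactly the kind of choice I need to understand before trusting the output.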
And what excited me so much is this - the Spectroscopy section. A portion of that math I only vaguely recognize is available as widgets and there’s a little video training section on using it. Game on!
Or maybe … not so fast!
Danger Zone:
I don’t recall where I first saw Drew Conway’s piece The Data Science Venn Diagram but it immediately stuck with me.
The Danger Zone! haunts me. There were three and four and five credit hour statistics classes for almost every discipline when I was in school. But computer science and industrial engineering had this diminutive practical stats class that was just two credit hours. I don’t recall precisely why I chose that class, perhaps weariness with battling calculus, but that’s what I did. They say we use 10% to 15% of what we study in college and I would put that little stats class at the very top of that heap.
Conclusion:
Maybe, now that I’ve got my brain out of hock, 2024 is a time to shore up my less-than-stellar stats background. The point is this: having those spectroscopy tools in an easy canned format is great … but not if that leads me to a place where I think I know what’s going on, when in fact I’ve committed some sort of grim unwarranted inference.
I make fun of the folks who goof on attribution using the community edition of RiskIQ, see Sovereign Challenge Theropod Stampede for the particulars. I don’t want to be THAT GUY, wading into some IO attribution problem, and promptly shooting myself in both feet.
So it’s a problem … but one that will no longer be torment for me to solve. I just need to spend a little time filling in some gaps.