I noticed Micah Lee announcing his new book, Hacks, Leaks, and Revelations, on Christmas Eve last year. The Kindle edition wasn't available until January, but I ordered it immediately, knowing it was going to be good.
Attention Conservation Notice:
This book IS really good. Pardon me while I gush a bit over it.
Review:
What you will find in this book is someone with skills very similar to mine, only he's steadily been a journalist in this field while I've done … other things. This book is a strong hand extended to help you get on your feet in this field. He's divided the material into five sections.
Sources & Datasets
Tools of the Trade
Python Programming
Structured Data
Case Studies
The first section covers how to go about getting data and how to protect yourself and your sources: basically all of the device-level security and tradecraft you'll need to get started. Amazingly, he had the patience to explain how to do this for Linux, Mac, and Windows(!) throughout the entire book.
Tools of the Trade covers packages you'll want to use, some Docker-based applications, and how to handle the three main types of email mailbox files. Python Programming is enough of an introduction to get you started; for many tasks, all you need is to be handy with the Python package manager and its virtual environments.
Part IV, on Structured Data, covers BlueLeaks, the Parler insurrection data, and EpikFail, which was the dump of all of registrar Epik's systems. I was paid to examine BlueLeaks; I was aware that something was happening with Parler right when it went down; and I managed to avoid the Epik troubles until right at the end, when I attended a group video chat with CEO Rob Monster and Anonymous founder Aubrey Cottle. All three of these leaks produced tabular data, and handling that kind of content takes a scripting language like Python.
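As a rough illustration of what "handling tabular data with Python" means in practice, here is a minimal sketch, not taken from the book, that counts how many rows in a CSV file share each value in one column. The function name count_by_column, the column names, and the sample rows are all hypothetical; real leak data would be read from files on disk rather than an inline string.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how many rows share each value in the given column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical sample rows for illustration only; not from any real dataset.
sample = """agency,state
Agency A,TX
Agency B,TX
Agency C,CA
"""

counts = count_by_column(sample, "state")
print(counts.most_common())
```

Using the standard library's csv and Counter keeps the sketch dependency-free; for larger dumps, a dataframe library or a database would likely be a better fit.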
The case studies in Part V are about the AFLDS horse paste pushers and the famous neo-Nazi Discord chat leaks. These are both datasets I was aware of when they became available, but I've never done anything with either of them.
Conclusion:
Reading this book took me about two hours, and 80% or more of it was just review for me. The things I marked for further attention were:
Dangerzone is a tool for disarming potentially hazardous data files.
Aleph is a Docker collection of tools for handling large volumes of data.
My SQL-fu is not strong, so I’m going to review the Epik section.
I’m not sure what to do with this book. Do I use it as the foundation for Q2 of 2024 and go through it in detail? Do I just cover the three sections I personally feel the need to examine closely? Or should I go through and offer additional exercises/reading for certain areas?
The only area where I am ahead of the book is the adversary-resistant computing and networking material I publish here. The book doesn't cover how to handle live data, what to do with burner phones, or, most of all, the need for fail-closed VPN connections.
Whatever the case, this book is a fantastic resource for those of you who’ve been following along here with an eye on analyzing the take from operations.