Last weekend, while working on Substack Zombie Wave, I noticed the update to the unofficial API for this place, and I was immediately delighted. Substack in 2025 feels like pre-teen Twitter. It's light, it's fast, and you can see as far into the network as you wish.
So I did a simple crawl of my network and then fired up Gephi. I started with my subscriber list, spiraled out three levels, and found 1,974 accounts with 150,000+ links to almost 48,000 other unique Substacks. The larger the text, the more subscribers that Substack has among the 1,974. The colors have no legend; they're auto-generated as a visual cue to the natural subdivisions among the various components of this network.
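A crawl that spirals out a fixed number of levels is a breadth-first search. What follows is a minimal sketch, not the crawler used here: `fetch_subscriptions` is a hypothetical stand-in for whatever call into the unofficial API returns the publications an account subscribes to, and "three levels" is read as fetching the seed accounts plus two rings beyond them.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical: maps an account handle to the publications it subscribes to.
# A real crawler would wrap the unofficial Substack API here.
FetchFn = Callable[[str], Iterable[str]]
Edge = Tuple[str, str]  # (subscriber, publication)


def crawl(seeds: Iterable[str], fetch_subscriptions: FetchFn,
          max_depth: int = 3) -> Tuple[Set[str], Set[Edge]]:
    """Breadth-first crawl outward from the seed accounts.

    Accounts at depth 0 .. max_depth - 1 have their subscriptions fetched;
    publications first reached at depth max_depth are kept only as edge targets.
    Returns (sampled accounts, subscription edges).
    """
    depth: Dict[str, int] = {}
    queue: deque = deque()
    for seed in seeds:
        if seed not in depth:
            depth[seed] = 0
            queue.append(seed)

    edges: Set[Edge] = set()
    while queue:
        account = queue.popleft()
        for pub in fetch_subscriptions(account):
            edges.add((account, pub))
            # Enqueue a newly seen node only if it is still inside the depth limit.
            if pub not in depth and depth[account] + 1 < max_depth:
                depth[pub] = depth[account] + 1
                queue.append(pub)
    return set(depth), edges
```

Because BFS finishes each ring before starting the next, every account gets the smallest depth at which it can be reached, so the depth cutoff is applied consistently.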
Attention Conservation Notice:
The eye candy is in the lede; the rest of this will be hardcore GraphCraft. This may be one of the last English sentences you see here …
History:
My career since the late 1980s has always involved computer networks. I ascended the Cisco career ladder to the Professional certs realm in the late 1990s, then turned my attention to other sorts of networks starting in 2010. I benefited first from Lada Adamic's class on Coursera, then made repeated attempts at Matt Jackson's graduate-level network analysis class.
Over the years I have applied Maltego to technical and detailed social network analysis, Gephi to large-scale conversations, and I've spent time with Sentinel Visualizer for counter-fraud/insurgency/etc. work. Having a computer science background, I expanded into hand-coding network construction in NetworkX for use with Gephi, in order to get the metrics I needed to make sense of what I was seeing.
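Gephi can also be fed without NetworkX at all. Here is a minimal, dependency-free sketch that turns (subscriber, publication) edges into the two tables Gephi's spreadsheet (CSV) import works with: a node table keyed by `Id` with a `Label`, and an edge table with `Source` and `Target`. The `Subscribers` column and the function names are illustrative assumptions; the column counts in-sample subscribers, which is the kind of value the text sizing described at the top reflects.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (subscriber, publication)


def gephi_tables(edges: Iterable[Edge]) -> Tuple[List[list], List[list]]:
    """Build header + data rows for Gephi's spreadsheet import."""
    edge_list = sorted(set(edges))
    # In-degree: how many sampled accounts subscribe to each node.
    subscribers = Counter(pub for _, pub in edge_list)
    nodes = sorted({n for edge in edge_list for n in edge})

    node_rows: List[list] = [["Id", "Label", "Subscribers"]]
    node_rows += [[n, n, subscribers.get(n, 0)] for n in nodes]
    edge_rows: List[list] = [["Source", "Target"]]
    edge_rows += [[src, dst] for src, dst in edge_list]
    return node_rows, edge_rows


def write_csv(path: str, rows: List[list]) -> None:
    """Write rows as UTF-8 CSV (newline='' is what the csv module expects)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Write each table with `write_csv`, import both through the spreadsheet importer as a directed graph, and `Subscribers` should then be available as a numeric attribute for ranking label size in the Appearance panel (set its type in the import dialog if it isn't inferred).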
When I began using Substack in earnest, in the fall of 2023, I rigged up some quick scripts using awk/sed and wget in order to see the limited network in which my account was embedded. Then this nominally beneficial botnet annoyed me, so I started digging, and now here we are.
Methods & Metrics:
The very first thing I did was a simple command-line spider: just get the names of subscribers to a given account. This is the absolute bare minimum, and it produced this set of 150k+ relationships. It is BADLY flawed, missing about half of all subscriptions. It doesn't handle custom domains yet, and I think it's also failing on accounts that accepted a random name and later used their single rebranding.
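Both failure modes, custom domains and one-time renames, come down to several names pointing at one publication, which splits a single node into several. One way to repair that after the fact is to collapse aliases before building the graph. This is a sketch under a loud assumption: `aliases` is a hypothetical lookup from a custom domain or pre-rename handle (lowercased) to a canonical handle, which the crawler would have to collect. Nothing here discovers those mappings.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (subscriber, publication)


def resolve(name: str, aliases: Dict[str, str]) -> str:
    """Follow alias links to a canonical, lowercased handle.

    `aliases` is a hypothetical {alias: canonical} table with lowercase keys.
    Chains (old -> newer -> newest) are followed; a cycle stops the walk.
    """
    key = name.strip().lower()
    seen = set()
    while key in aliases and key not in seen:
        seen.add(key)
        key = aliases[key].strip().lower()
    return key


def canonicalize(edges: Iterable[Edge], aliases: Dict[str, str]) -> Set[Edge]:
    """Rewrite both ends of every edge; edges that become identical merge."""
    return {(resolve(src, aliases), resolve(dst, aliases)) for src, dst in edges}
```

Merging at this stage keeps the alias table as the single place where identity decisions live, so rerunning it after the table improves doesn't require a new crawl.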
While the crawl was running, I got out pad & pencil to sketch the basics of an ArangoDB graph database for this content. It was readily apparent that caching would please both me and Substack. There will also be a strong need for some derived metrics. Gephi is very flexible in this area, but it's clear that some things about Substack will need to be cached for the sake of speed, and others will need to be created.
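The ArangoDB design itself isn't described here, so as a stand-in, this is a small sketch of the caching idea alone: fetched subscription lists are kept in a local JSON file and only refetched once they age out. The file format, the one-week default `max_age_s`, and the injected `fetch` callable are all assumptions for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List


class SubscriptionCache:
    """Disk-backed cache of subscription lists, refetched after max_age_s."""

    def __init__(self, path: str, fetch: Callable[[str], Iterable[str]],
                 max_age_s: float = 7 * 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.path = Path(path)
        self.fetch = fetch          # hypothetical remote lookup, injected
        self.max_age_s = max_age_s
        self.clock = clock          # injectable so expiry can be tested
        self.misses = 0             # how many times the remote source was hit
        self.entries: Dict[str, dict] = (
            json.loads(self.path.read_text(encoding="utf-8"))
            if self.path.exists() else {}
        )

    def get(self, handle: str) -> List[str]:
        """Return cached subscriptions, fetching only if missing or stale."""
        entry = self.entries.get(handle)
        now = self.clock()
        if entry is None or now - entry["fetched_at"] > self.max_age_s:
            self.misses += 1
            entry = {"fetched_at": now, "subs": list(self.fetch(handle))}
            self.entries[handle] = entry
        return entry["subs"]

    def save(self) -> None:
        """Persist all entries so the next run starts warm."""
        self.path.write_text(json.dumps(self.entries), encoding="utf-8")
```

Since `get` takes a handle and returns a list of publications, it has the same shape as a plain fetch function and can sit in front of any crawler without changing it.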
As an example of a synthetic attribute, take a look at the accounts that hold the most subscriptions. The upper bound on reading might be a hundred accounts, and the upper bound on Notes usage might be following a thousand, but some of these accounts are already well past that. Fully a third of the accounts subscribed to, 17,660 in total, have just one subscriber among the 1,974 accounts sampled.
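Counting singletons needs nothing but the subscription edges. A minimal sketch, assuming (subscriber, publication) pairs from the crawl; a tally along these lines over the full edge set is the kind of count behind the 17,660 figure above (17,660 of almost 48,000 is roughly 37%).

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (subscriber, publication)


def singleton_summary(edges: Iterable[Edge]) -> Tuple[Counter, Set[str], float]:
    """Count in-sample subscribers per publication and find singletons.

    Returns (subscribers per publication, publications with exactly one
    in-sample subscriber, singleton share of all publications seen).
    """
    in_sample = Counter(pub for _, pub in set(edges))  # de-duplicate edges first
    singletons = {pub for pub, n in in_sample.items() if n == 1}
    share = len(singletons) / len(in_sample) if in_sample else 0.0
    return in_sample, singletons, share
```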
So we need one or more numbers that express whether an account is a purely human reader, some sort of cyborg, or a purely automated tool. Each has a role to play; what troubles me is automation put to work simulating a human audience. That has wrecked most of the social networks where I've seen it employed: they either greatly curtail API access, as LinkedIn did, or they implode like Xitter.
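As a first, deliberately crude number, subscription volume can be bucketed against the rough ceilings mentioned above: about a hundred for reading, about a thousand for Notes following. The tier names and this mapping are hypothetical; volume alone cannot tell a human from a cyborg, so treat this as one input among several, not a classifier.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (subscriber, publication)


def volume_tier(edges: Iterable[Edge], read_cap: int = 100,
                follow_cap: int = 1000) -> Dict[str, Tuple[int, str]]:
    """Crude per-account tier from subscription count alone (hypothetical).

    'reader'  : within plausible reading volume (<= read_cap)
    'heavy'   : beyond reading, within plausible Notes following (<= follow_cap)
    'suspect' : beyond both ceilings, a candidate for automation
    """
    out_deg = Counter(src for src, _ in set(edges))
    tiers: Dict[str, Tuple[int, str]] = {}
    for account, n in out_deg.items():
        if n <= read_cap:
            tier = "reader"
        elif n <= follow_cap:
            tier = "heavy"
        else:
            tier = "suspect"
        tiers[account] = (n, tier)
    return tiers
```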
See that green blob to the right of thesilencepath, at about 2:30? That account subscribes to 988 Substacks, the vast majority of which are of no interest to the other 1,973 accounts in the sample. That blob is made up of those singletons. When I get a bit further into this process, the ArangoDB database will, as an emergent property of building a well-designed graph, let me see which accounts are doing this. They're clearly not human, apart from a few people with psychiatric issues who relentlessly make piles.
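The "which accounts are doing this" question can already be approximated on the raw edge list, before any database exists. This sketch assumes the same (subscriber, publication) edges and ranks each sampled account by how many singleton publications it alone holds; an account like the one behind the green blob would be expected to surface near the top, with both a high count and a high share.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (subscriber, publication)


def singleton_owners(edges: Iterable[Edge]) -> List[Tuple[str, int, int, float]]:
    """Rank sampled accounts by how many singleton publications they hold.

    A singleton is a publication with exactly one in-sample subscriber.
    Returns (account, singletons held, total subscriptions, singleton share),
    most singletons first, ties broken by account name.
    """
    edge_set = set(edges)
    in_deg = Counter(pub for _, pub in edge_set)
    out_deg = Counter(src for src, _ in edge_set)
    held = Counter(src for src, pub in edge_set if in_deg[pub] == 1)
    rows = [(acct, held[acct], n, held[acct] / n) for acct, n in out_deg.items()]
    return sorted(rows, key=lambda r: (-r[1], r[0]))
```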
Conclusion:
I'm good at the network science angle of this stuff, but I'm not enough of a programmer to expose this to the world. I just asked a programming-focused chat room for some help getting this going.
Much like what is happening, albeit slowly, with Shall We Play A Game?, this is an opening for some analytics of a type that Substack does not provide.
Coda:
More eye candy as I get back up to speed with Gephi after years of not using it.