It really feels like an under-defined task. Do you actually need to see those nodes? At that scale you never want to render 100B of them. Instead you'd need some kind of density aggregation when zoomed out, moving to LoD-style k-d tree partitioning as you zoom in. That's almost the territory of rendering engines like Unreal's Nanite. You can build your own renderer for data like this, but game engines are likely your closest inspiration. Then again, unless you already have x/y coordinates ready (based on the graphviz mention I'm assuming you don't), even laying out the points will be a very heavy task. (The usual iterative force-directed layout would likely take days.)
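To make the zoomed-out half concrete: the usual trick is to never touch individual nodes at render time, just bin precomputed positions into a screen-sized density grid and draw that as a heatmap. A minimal sketch of the idea, assuming the layout already exists on disk as chunked coordinate arrays (the file names here are made up):

    # Sketch: aggregate precomputed node positions into a density grid
    # instead of drawing individual points. Assumes coordinates already
    # exist on disk as chunked float32 arrays (hypothetical file names).
    import glob
    import numpy as np

    W, H = 1920, 1080                             # output resolution
    density = np.zeros((H, W), dtype=np.uint64)

    for path in glob.glob("layout_chunk_*.npy"):  # hypothetical layout chunks
        xy = np.load(path)                        # shape (n, 2), x/y in [0, 1)
        px = (xy[:, 0] * W).astype(np.intp).clip(0, W - 1)
        py = (xy[:, 1] * H).astype(np.intp).clip(0, H - 1)
        np.add.at(density, (py, px), 1)           # count nodes per pixel bucket

    # Log-scale so a few dense clusters don't wash out everything else.
    img = np.log1p(density.astype(np.float64))
    if img.max() > 0:
        img /= img.max()                          # normalized heatmap to draw

The zoomed-in LoD view is the same idea applied per tile of a quadtree/k-d tree, precomputed at a handful of zoom levels.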
But if you were my coworker I'd really press on why you want the visualisation and whether you can get your answers some other way. And whether you can create aggregates of your data that reduce it to thousands of groups instead. Your data is a minimum of ~800GB even if the graph is a single line (position + a 64-bit value encoding each edge, no labels), so you're not doing anything real-time with it anyway.
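One way to read that lower bound, for the best case of a path graph (~one edge per node, each edge a single 64-bit value, no labels or layout yet):

    # Back-of-envelope storage estimate for the best case: a path graph,
    # ~1 edge per node, each edge stored as one 64-bit value, no labels.
    nodes = 100e9
    bytes_per_edge = 8                                 # a single 64-bit value
    print(f"{nodes * bytes_per_edge / 1e9:.0f} GB")    # -> 800 GB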
Truly, 100B nodes need some sort of aggregation to have a chance at being useful. On a side project I've worked on normalizing >300GB semi-structured datasets that I could load into dataframe libraries, and I can't imagine working with a _graph_ of that size. I thought I was a genius when I figured out I could rent cloud computing resources with nearly a terabyte of RAM for less than federal minimum wage. At that scale you quickly realize that your approach to data analysis is really bound by CPU, not RAM. This is where you'd need to brush off your data structures and algorithms books. OP had better be good at graph algorithms.
1) 100B? Try a thousand. Of course context matters, but I think it is common to overestimate the amount of information that can be visually conveyed at once. But it is also common to make errors in aggregation, or errors in how one interprets aggregation.
2) You may be interested in the large body of open source HPC visualization work. LLNL and ORNL are the two dominant labs in that space. Your issue might also be I/O, since you can generate data faster than you can visualize it. One paradigm HPC people use is "in situ" visualization, where you visualize at runtime so that you don't hold back the computation. At this scale, if you're not massively parallelizing your work, then the bottleneck isn't the CPU, it's the thing between the chair and keyboard. The downside of in situ is that you have to hope you're visualizing the right data at the right time. But the paradigm also covers pushing data to another machine that handles the processing/visualization or even the storage (i.e. compute on the fast machine, push data to a machine with lots of memory that handles storage; or, more advanced, one stream to a visualization machine and another to storage). Check out ADIOS2 for the I/O side of this.
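Not ADIOS2 specifically (its docs cover the real API), but the shape of the in situ loop is roughly: the compute side periodically reduces its state to something small and hands it off, and never blocks on the visualization side. A toy, purely illustrative sketch of that pattern:

    # Toy sketch of the in situ pattern: the simulation keeps computing
    # while reduced snapshots are handed to a separate visualization
    # consumer. Everything here is illustrative, not the ADIOS2 API.
    import queue
    import threading
    import numpy as np

    snapshots = queue.Queue(maxsize=4)     # small buffer; drop frames if viz lags

    def compute_loop(steps):
        state = np.random.rand(1_000_000)  # stand-in for the real simulation state
        for step in range(steps):
            state = np.sqrt(state) * 0.99 + 0.01              # fake "timestep"
            if step % 100 == 0:                               # reduce occasionally
                reduced = state.reshape(1000, 1000).mean(axis=1)  # 1e6 -> 1e3 values
                try:
                    snapshots.put_nowait((step, reduced))
                except queue.Full:
                    pass                                      # never stall compute
        snapshots.put((None, None))                           # sentinel: done

    def viz_loop():
        while True:
            step, reduced = snapshots.get()
            if step is None:
                break
            # Real code would render/stream this; here we just summarize it.
            print(f"step {step}: min={reduced.min():.3f} max={reduced.max():.3f}")

    threading.Thread(target=compute_loop, args=(1000,), daemon=True).start()
    viz_loop()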
You're right, but I think that may be what the OP is actually asking for. They talk of "zooming out", but I don't think they mean literally zooming out and seeing all 100B nodes individually on screen at once; I think they mean that some high-level / clustered view is shown to give an overview.
That being the case, I think you're suggesting that this high level summarisation happens as a separate preprocessing step (which I agree with FWIW) whereas I think they're imagining it happening dynamically as part of rendering.
There are about 8 million pixels in 4K, so if you're trying to graph 8 million points you might as well just fill the screen with a single color and call it a day. If you have 8 billion, you can graph about 0.1% of them while filling every single pixel of the screen, but then you're just looking at noise. To be able to show connections between nodes you'd need maybe 9 pixels per node, so that's around 900k nodes you might be able to graph on a 4K screen, assuming a maximum of 8 connections per node and that connected nodes are adjacent. Now you're down to about 0.01% of the graph on your display, and even that isn't very usable; there's not a lot of information you could glean from it.
You could go to 81 pixels per node, connect more nodes to each other, and maybe make some sense of it that way, but then you're only graphing 0.001%, and at that point, what are your selection criteria? The criteria for which nodes you pick would have more impact than how you choose to graph them.
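The back-of-envelope version of that pixel budget (a sketch using the same round numbers as above; OP's 100B nodes makes every fraction ~12x smaller again):

    # Back-of-envelope pixel budget for a 4K display, using the same round
    # numbers as above (8 billion nodes; 100B is ~12x worse again).
    pixels = 3840 * 2160                      # ~8.3M pixels on a 4K screen
    nodes = 8e9

    for px_per_node in (1, 9, 81):            # lone pixel / 3x3 tile / 9x9 tile
        drawable = pixels // px_per_node
        print(f"{px_per_node:>2} px per node: {drawable:>9,} nodes drawable "
              f"= {drawable / nodes:.4%} of the graph")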
It's unclear to me if you're making the same point I'm about to make. So I guess at best it's another point and at worst another framing?
I think the comparison to a 4K image is a great way to explain why you should never do this, specifically because of how hard it gets to tell the difference as resolution increases. The jump from 480p to 720p is quite large, but 4K to 8K is... not. A big part of why high-res images work at all is that the data being visualized is highly structured and neighboring data strongly relates. Maybe OP's graph contains similarly structured cliques, but I doubt it. Realistically, OP should be aiming for ways to convey their data with far fewer than 10k points. Ask yourself: could you tell a picture of a thousand people apart from one of two thousand? Probably not.
What is the average degree of the 100B nodes in this graph? If it's anything north of like...2 (or maybe 1.0000001, or less, unsure), then this sounds about as intractable as "visualizing Facebook friends" (times 30)
Comparing it to a rendering engine is a bit of a cheat, I think, unless the points have some intrinsic 2-D spatial coordinates (and no edges beyond immediate adjacency). You're ultimately viewing a 2-D surface; your brain can kinda infer some 3-D structure from it, but if the whole volume is filled with something more complex than fog, it gets tricky. 4-D? Forget about it. 100-D, as many datasets are? lol.
Having worked in a lab where we often wanted to visualize large graphs without them just devolving into a hairball: you'd need to apply some clustering, but the choice of clustering algorithm has a huge impact on how the whole graph ends up looking, and in some cases it feels like outright deception.
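For what it's worth, the usual shape of that step is: detect communities, then collapse each one into a super-node whose size and edge weights carry the aggregates. A small sketch with networkx (fine for a prototype; at anything like 100B nodes you'd need a distributed equivalent, and swapping the community algorithm will change the picture dramatically):

    # Sketch: collapse a graph into a "supergraph" of communities so the
    # visualization shows clusters instead of a giant hairball.
    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()                     # stand-in for the real graph

    # Any community detection works here; the choice changes the picture a lot.
    communities = community.greedy_modularity_communities(G)
    node_to_cluster = {n: i for i, c in enumerate(communities) for n in c}

    super_g = nx.Graph()
    for i, c in enumerate(communities):
        super_g.add_node(i, size=len(c))           # node weight = cluster size
    for u, v in G.edges():
        cu, cv = node_to_cluster[u], node_to_cluster[v]
        if cu != cv:
            w = super_g.get_edge_data(cu, cv, {"weight": 0})["weight"]
            super_g.add_edge(cu, cv, weight=w + 1) # edge weight = crossing edges

    print(super_g.number_of_nodes(), "clusters,",
          super_g.number_of_edges(), "edges")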
Speaking of Nanite, does anybody know of data visualization tools actually implemented with mesh shaders? I've dabbled with time series data, not graphs, but it feels lonely.
My use case is that I have a graph of flops, latches, buffers, and AND/OR/NOT gates, and I want to visualize how data is changing / getting corrupted as it goes through each of them.
Ok, so you have nice natural boundaries between systems. If you're dealing with something processor-like, you have really good chokepoints where, for example, the ALU / registers / caches connect. The task may be way easier if you deal with one of them at a time. Maybe even abstract away anything less interesting (memory/cache?). Would visualising things per-system work better for you, or maybe visualising which systems get affected instead of specific nodes?
Having the structure of the device available should also help with the layout - you can group the nodes logically into independent boxes instead of trying to auto-layout everything.
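Concretely, graphviz's cluster subgraphs already do this kind of grouping; a hypothetical sketch with the Python graphviz package, with made-up module and gate names standing in for the real hierarchy:

    # Sketch: use graphviz "cluster" subgraphs to group gates by module so
    # the layout follows the design hierarchy instead of auto-laying-out
    # everything. Module/gate names are made up for illustration.
    from graphviz import Digraph

    dot = Digraph("netlist", graph_attr={"rankdir": "LR"})

    modules = {                      # hypothetical hierarchy: module -> gates
        "alu":  ["alu_and0", "alu_or0", "alu_ff0"],
        "regs": ["reg_ff0", "reg_ff1"],
    }
    for name, gates in modules.items():
        # Subgraph names starting with "cluster" get a drawn bounding box.
        with dot.subgraph(name=f"cluster_{name}") as c:
            c.attr(label=name)
            for g in gates:
                c.node(g, shape="box")

    # Edges across clusters are the interesting chokepoints to inspect.
    dot.edge("alu_ff0", "reg_ff0")
    dot.edge("reg_ff1", "alu_and0")

    print(dot.source)                # or dot.render("netlist", format="svg")

At netlist scale you'd only ever feed it one module (or an abstracted supergraph) at a time, but the cluster boxes map nicely onto the design hierarchy.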