To make a long story short, the conclusion of all this was that building the P2P OS distributed server by simply copy-pasting an existing DHT algorithm is a no-go, and this sent me right back to square one: "now what?"
Well, the breaking news story-of-the-day is that i think i found a way to strengthen DHTs just enough to make them cope with the high churn problem. Together with the obfuscated code-based "moving target defense" mechanism, i might now have a complete solution to almost all the potential problems i can foresee at this stage. (There is one more outstanding problem that i'm aware of, namely protecting against DDoS attacks, but apparently there are accessible commercial solutions for this one too; i'll talk about this in another post after i do some more digging.)
Without getting into too many technical details at this point (primarily because all this is still in a preliminary stage, without a single line of code having been written to actually test the algorithms involved), the main ideas for an "improved DHT" are as follows:
- use a "network supervisor" server which, based on its unique global perspective over the network, will be responsible for maintaining a deterministic network topology, all while also keeping the network's critical parameters within acceptable bounds
- add redundancy at the network node level by clustering several routers inside each node: in brief, having several routers inside a node, coupled with a deterministic routing algorithm (as enabled by the deterministic topology of the network), should provide enough resilience to malicious intruders to allow the network to operate properly
At the end of the day, when all pieces are put together the overall picture looks something like this:
So basically this is how far i got: i have this "supervised network" architecture which i think might be a solution for a sufficiently resilient and reliable distributed server, and i have the code obfuscation-based network integrity protection, but now i need to test these thingies the best i can. I definitely won't be able to test a large-scale system anywhere near a real-life scenario until actually deploying it in the wild, but a preliminary validation of its key features taken one by one seems feasible.
PS
The network monitoring/maintenance algorithm, the node insertion/removal procedures, etc, are all pretty messy stuff that i still have to thoroughly double-check before actually diving into writing code -- e.g. here's a sneak preview for how a new node is inserted in, and announces its presence to, the routing ring:
- the blue nodes are "currently" existing nodes positioned in an already-full 2^3-node ring (i.e. 000::, 001::, 010::, 011::, 100::, 101::, 110::, 111:: in the image above, where '::' means all trailing bits are 0)
- the yellow nodes encircled in solid lines are nodes that have already been inserted in the yet-incomplete 2^4-node ring (the yellow nodes are interleaved with the existing 2^3 blue nodes in order to create the new 2^4-node ring)
- the red node is the node that is "currently" being inserted in the routing ring (more specifically, in the yellow nodes "sub-ring" at index 0111::, i.e. in between the [already existing] blue nodes 011:: and 100::)
- the yellow nodes encircled in dashed lines are nodes that will be inserted in the [yet-incomplete] yellow nodes ring after the "current" insertion of the red node is completed
- once the yellow sub-ring is completely populated (i.e. there is a total of 2^4 [yellow and blue] nodes in the routing ring), the routing ring will be expanded to 2^5 nodes by inserting new nodes in between the existing [yellow and blue] nodes of the 2^4-node ring, and so on; i.e. the routing ring always grows by creating a new sub-ring of "empty slots" in between the existing nodes, and incrementally populating said empty slots with new nodes
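
To make the slot numbering above a bit more concrete, here is a minimal Python sketch of it. Nothing here is actual P2P OS code: the function names (`ring_slots`, `sub_ring_slots`, `neighbours`) are made up for illustration, IDs are written as bit strings with the trailing zeros (the '::' part) dropped, and the order in which the empty slots get populated is not modeled at all.

```python
def ring_slots(bits):
    """IDs of all nodes in a full 2**bits-node ring, as bit strings."""
    return [format(i, f"0{bits}b") for i in range(2 ** bits)]


def sub_ring_slots(bits):
    """Empty slots created when a full 2**bits ring grows to 2**(bits+1).

    Assumption: each new slot is an existing ID with a '1' appended, which
    places it halfway between that existing node and its successor.
    """
    return [node_id + "1" for node_id in ring_slots(bits)]


def neighbours(slot_id):
    """The two already-existing nodes on either side of a new slot."""
    bits = len(slot_id) - 1
    pred = int(slot_id[:-1], 2)          # drop the appended '1'
    succ = (pred + 1) % (2 ** bits)      # wrap around the ring
    return format(pred, f"0{bits}b"), format(succ, f"0{bits}b")


if __name__ == "__main__":
    print(ring_slots(3))        # the blue nodes
    print(sub_ring_slots(3))    # the yellow/red slots
    print(neighbours("0111"))   # the red node's neighbours
```

With `bits = 3`, `sub_ring_slots` yields the eight interleaved slots of the new sub-ring, and `neighbours("0111")` returns the blue nodes 011 and 100, matching the red node's position described above.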