P2P OS: Distributed server: any DHT will do, right? Wrong.

After diving into DHTs a while ago, i first thought i had it all figured out: DHT is the name of the game when it comes to distributed servers, or, at the very least, they are an appropriate and mature solution for providing a distributed routing service. And apparently that is indeed the case, but with a caveat: all the common DHT algorithms presented in the literature are highly unreliable in a high-churn rate [residential] end-user-supported P2P network. More specifically, what all common DHT algorithms (that i know of) lack is on one hand enough redundancy to cope with the kind of churn typically found in end-user P2P networks (where users frequently join and leave he network, unlike in a network of long-lived servers), and on the other hand they are not sufficiently resilient to face the kinds of concerted attacks that can be perpetrated in a P2P network by a set of coordinated malicious nodes.

To make the long story short, the conclusion for all this was that building the P2P OS distributed server by simply copy-pasting an existing DHT algorithm is a no-go, and this sent me right back to square one: "now what?"

Well, the breaking news story-of-the-day is that i think i found a way to strengthen DHTs just enough to make them cope with the high churn problem, and, together with the obfuscated code-based "moving target defense" mechanism, i might now have a complete solution to almost all the potential problems i can foresee at this stage (specifically, there is one more problem that i'm aware of that is still outstanding, namely protecting against DDoS attacks, but apparently there are accessible commercial solutions for this one also; i'll talk about this in another post after i'll do some more digging)

Without getting into too many technical details at this point (primarily because all this is still in a preliminary stage, without a single line of code being written to actually test the algorithms involved), the main ideas for an "improved DHT" are as follows:

use a "network supervisor" server which, based on its unique global perspective over the network, will be responsible for maintaining a deterministic network topology, all while also keeping the network's critical parameters within acceptable bounds
add redundancy at the network nodes level by clustering several routers inside each node: in brief, having several routers inside a node, coupled with a deterministic routing algorithm (as enabled by the deterministic topology of the network), should provide a sufficient level of resilience to malicious intruders such as to allow the network to operate properly

Sure enough, the points listed above are just the very top-level adjustments that i'm trying to make to the existing plain-vanilla DHTs, but there are quite a lot of fine points that need to be actually implemented and tested before celebrating, e.g. the iterative routing algorithm with progress monitor at each step in the routing process, having multiple paths from one node to another supported by a backtracking algorithm, node state monitoring and maintenance by the supervisor server, etc - and these are just a few examples of the issues that i am aware of.

At the end of the day, when all pieces are put together the overall picture looks something like this:

So basically this is how far i got: i have this "supervised network" architecture which i think might be a solution for a sufficiently resilient and reliable distributed server, and i have the code obfuscation-based network integrity protection, but now i need to test these thingies the best i can. I definitely won't be able to test a large-scale system anywhere near a real-life scenario until actually deploying it in the wild, but a preliminary validation of its key features taken one by one seems feasible.

PS
The network monitoring/maintenance algorithm, the node insertion/removal procedures, etc, are all pretty messy stuff that i still have to thoroughly double-check before actually diving into writing code -- e.g. here's a sneak preview for how a new node is inserted in, and announces its presence to, the routing ring:

the blue nodes are "currently" existing nodes positioned in an already-full 2³-node ring (i.e. 000::, 001::, 010::, 011::, 100::,, 101::, 110::, 111:: in the image above, where '::' means all trailing bits are 0)
the yellow nodes encircled in solid lines are nodes that have already been inserted in the yet-incomplete 2⁴-node ring (the yellow nodes are interleaved with the existing 2³ blue nodes in order to create the new 2⁴-node ring)
the red node is the node that is "currently" being inserted in the routing ring (more specifically, in the yellow nodes "sub-ring" at index 0111::, i.e. in between the [already existing] blue nodes 011:: and 100::)
the yellow nodes encircled in dashed lines are nodes that will be inserted in the [yet-incomplete] yellow nodes ring after the "current" insertion of the red node is completed
after the yellow sub-ring will be completely populated (i.e. there will be a total of 2⁴ [yellow and blue] nodes in the routing ring), the routing ring will be expanded to 2⁵ nodes by inserting new nodes in between the existing [yellow and blue] nodes of the 2⁴-node ring, a.s.o.; i.e. the routing ring always grows by creating a new sub-ring of "empty slots" in between the existing nodes, and incrementally populating said empty slots with new nodes

P2P OS

Thursday, August 29, 2013

Distributed server: any DHT will do, right? Wrong.

No comments:

Post a Comment