Saturday, December 28, 2013

IPv6 adoption in exponential growth stage

This is the third year in a row that sees IPv6 adoption growing exponentially, with a factor a bit over 2:1 year-on-year (2011 -> 2012 -> 2013). Since we're in the early deployment stage, this trend can be expected to continue for at least several years (and maybe with an even larger factor), such that by the end of 2015 a 10% IPv6 penetration is most likely a conservative estimate.
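As a back-of-the-envelope check of this kind of projection, here is a minimal sketch. The 2.5% starting value for the end of 2013 is a hypothetical round figure chosen only for illustration (it is not a number taken from this post), and the function name is made up.

```cpp
#include <cmath>

// Projects a penetration percentage forward under a constant year-on-year factor.
// startPercent and yearlyFactor are illustrative inputs, not measured data.
double projectPenetration(double startPercent, double yearlyFactor, int years) {
    return startPercent * std::pow(yearlyFactor, years);
}
```

Under those assumed inputs, `projectPenetration(2.5, 2.0, 2)` gives 10, i.e. two more doublings would reach 10% by the end of 2015; any factor above 2 would overshoot that figure.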

The IPv6 penetration, even in low numbers, is important for this project because IPv6 can be used to do the heavy lifting in the routing ring, such that the more IPv6 connections are deployed, the less the bandwidth strain on the routing ring peers.

Monday, December 9, 2013

Cross-platform multi-threaded foundation library

After trying to probe the future for over a year for potential show-stopper problems, a few months ago i basically decided that i gathered enough information to effectively start coding on P2P OS with [what i think is] a pretty good chance of having covered all the major issues that might stop me dead in my tracks, and today is the day i can proudly announce the first working version of a foundation library that i'll be using for all P2P OS development: say hello to my new shiny libposif-0.9

So why another library? Well, to make the long story short, i had three main reasons for this:
  1. i wanted an independent (and free, at least LGPLed) library. Sure enough, there are quite a number of LGPLed libraries out there, but none of them is modular- and minimalistic enough for my taste: what i wanted was a library that has no dependencies on anything else than its host OS' kernel API (and even these dependencies must be fully POSIX-compatible) - or, if it does have some dependencies, those dependencies should be very easy to eliminate
  2. i wanted a cross-platform library. Again, there are many cross-platform libraries in the wild, but they come in bloated bundles from which it's very hard to extract only the modules one actually needs for a specific application
  3. finally, i wanted a standards-based library (except only for a minimalistic POSIX-compatible OS API, which should itself be encapsulated into a cross-platform wrapper): this means e.g. i can use C++11's std::thread but not pthreads, i can use a [platform-abstraction wrapper for] an "mkdir" command but i cannot use a Recycle Bin API, i can use HTML5 for a GUI (or text consoles for text-only UIs) but not graphical widgets or non-standard multi-media components (whether they are wrapped into a cross-platform library or not), etc
So now, with the list of goals out of the way, here's a brief description of what libposif is all about:
  • Messaging-based multi-threaded processing: i called this library module "libposif-mt", and what it does is it allows a program to:
    • organize processing into multiple "Tasks", where each task is a collection of one or more execution "Threads". Each task can start any number of threads, where a libposif "Thread" corresponds to an OS thread (i.e. it's a C++11 std::thread)
    • each "Thread" groups any number of "Automata", where each automaton is an independent state machine that can send messages ("SendMessage()") to other automata (may the destination automaton be part of the same thread, or in a different thread in the same task, or in another task altogether) and is notified via a callback function ("onMessageReceived()") of incoming messages sent by other automata
    And here's a picture of it all:

    To explain why i need this kind of functionality, i'll quote from an interminable doc in which i gathered all the various things that need to be implemented:
    • each router in a node pings all the other routers in its node every ~1 minute, and it updates its own node image and its live routers list
    • if a pinged router does not respond to a ping then another 2 pings are immediately retried at 10s interval, and if 5 successive ping “sequences” (i.e. 1 ping+2 retries) fail on a router, or if a router informs the ping sender that it has left the node, then said router is marked as offline in the ping sender's live router list and it will no further be pinged
    • when a router detects 5+/10 routers in its own node as offline, it sends to the server its list of offline routers with a rate limiting scheme; multiple such messages coming from different routers in a node will eventually trigger a node image update on the server
    The quote above is by no means intended to shed any light on the inner workings of an algorithm, but rather it's meant to show that the algorithm is completely asynchronous (when/then-based instead of if/then), i.e. it begs for independent inter-connected state machines that simply track some state conditions, exchange messages with one another, and change their state when a given situation occurs - and this is exactly what libposif's "Automaton" does.
    • note that the above algorithm's mechanics are very similar in nature to how the data chunks are handled in torrents, i.e. each file in a torrent is processed independently, a file is defined as consisting of blocks which are themselves processed asynchronously, etc, and depending on various [asynchronous] conditions associated with a file, or a block, etc, the torrent client takes a specific course of action
  • Portable file system interface: i called this module "libposif-fs", and it's basically implemented using POSIX functions plus several OS-specific commands which are not defined in POSIX (i didn't want to use boost, it's way too bloated a library for my taste, so i'll just wait for the file system functions to be included into std:: before using them). The big deal about this library module is that i tried to make it [reasonably] safe for program development, i.e. i tried to minimize the risks of seeing my "windows" directory vanish because i'm sending an empty path to a "rmdir" function and the likes
    • in brief, libposif-fs declares a "Sandbox" class which has to be initialized with a "base path", and any and all file operations are methods of a Sandbox object and are confined to the base path (and its sub-directories) of the Sandbox object that they use; equally important, the base path is thoroughly tested against critical system paths and against a user-definable set of "pathNotAllowed()" rules when a Sandbox object is created, such that with a little bit of care (when initializing a Sandbox object's base path) the potential for damaging other applications' files is really slim
  • UDP sockets: this one is unsurprisingly called "libposif-udp", and i used Qt's QUdpSocket library for its cross-platform implementation (this is a good example of using a third-party cross-platform library without critically relying on it because i'm not using Qt's event loop-based signal/slot mechanism or any other fancy stuff - just calling QUdpSocket's plain-vanilla read/write methods)
  • HTTPQueryServer: part of the "libposif-tcp" module, this component is a minimalist HTTP server intended to be run on the localhost and serve as the backend for HTML5-based GUIs (see the HTML browser-based client/server GUI model description here). In order to allow multiple simultaneous Ajax connections from the browser to multiple server sockets on the localhost, the HTTPQueryServer object implements the CORS specification
  • Miscellaneous networking functions: this "libposif-netmisc" module contains a collection of networking functions such as enumerating the local host's IP addresses (IPv4 & IPv6), performing DNS lookup and reverse DNS, etc, and it's implemented using the Qt library (it's just a wrapper over the corresponding QtNetwork functions)
  • UPnP client: "libposif-upnp": i just grabbed miniupnp for this, so not much to say about this one since it's as cross-platform a library as it can get
  • Firewall controller: part of the "libposif-fwl" module, this is a platform-dependent component implemented as a wrapper object over whatever firewall is installed on the system; for the time being i only wrote a netsh wrapper for plain vanilla windows, but it's all dead simple to extend (just add some extra files and configure the library to point to them)
    • just for the purpose of illustration, here's how the library is configured to compile for windows with [a wrapper over] windows' netsh.exe's firewall controller:
      #define libposif_fwlctrl_h\
      #define libposif_fwlctrl_cpp\
      Now suppose i want to add support for a linux firewall controller to libposif: this will involve writing/grabbing a linux firewall controller, packing the source code in a sub-folder e.g. "iptables-ctrl" of the firewall controller's source modules folder "libposif-fwlctrl.src", and then when i need to compile the library for linux (and use this firewall controller implementation) i'll just need to set the firewall #defines in the library configuration file point to this implementation:
      #define libposif_fwlctrl_h\
      #define libposif_fwlctrl_cpp\
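
To make the libposif-mt idea above more concrete, here is a minimal single-threaded sketch of an automaton that follows the quoted ping rules. Only the names SendMessage() and onMessageReceived() come from the description above; their signatures, the Message struct, the Dispatcher and the RouterTracker class are assumptions made up for this illustration, and the real library runs automata inside std::thread-backed "Threads" rather than a single queue. "5 successive failures" is read here as "5 in a row, reset by any successful ping", which is also an assumption.

```cpp
#include <map>
#include <queue>
#include <string>

// Hypothetical message layout; the real libposif message format is not shown in this post.
struct Message {
    int from;          // sender automaton id
    int to;            // destination automaton id
    std::string kind;  // "PingOk", "PingSequenceFailed" or "LeftNode"
};

class Automaton {
public:
    virtual ~Automaton() {}
    virtual void onMessageReceived(const Message& m) = 0;
};

// Single-threaded stand-in for a libposif "Thread": queues messages and delivers them.
class Dispatcher {
public:
    void attach(int id, Automaton* a) { automata_[id] = a; }
    void SendMessage(const Message& m) { queue_.push(m); }
    void run() {  // deliver everything that is currently queued
        while (!queue_.empty()) {
            Message m = queue_.front();
            queue_.pop();
            auto it = automata_.find(m.to);
            if (it != automata_.end()) it->second->onMessageReceived(m);
        }
    }
private:
    std::map<int, Automaton*> automata_;
    std::queue<Message> queue_;
};

// Tracks one remote router: 5 failed ping sequences in a row, or an explicit
// "i left the node" notice, mark it as offline (and it is then no longer tracked).
class RouterTracker : public Automaton {
public:
    void onMessageReceived(const Message& m) override {
        if (offline_) return;
        if (m.kind == "PingOk") {
            failedSequences_ = 0;
        } else if (m.kind == "PingSequenceFailed") {
            if (++failedSequences_ >= 5) offline_ = true;
        } else if (m.kind == "LeftNode") {
            offline_ = true;
        }
    }
    bool isOffline() const { return offline_; }
private:
    int failedSequences_ = 0;
    bool offline_ = false;
};
```

The point of the sketch is the "when/then" shape: the tracker holds no loop of its own, it only changes state when a message arrives.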

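The "Sandbox" confinement idea described for libposif-fs can be sketched as follows. This is not the library's actual interface: the class layout, the resolve() method and the contents of the forbidden-path list are assumptions for illustration, the check is purely lexical (it does not follow symlinks), and only POSIX-style '/' separators are handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-in for the "Sandbox" idea; not libposif-fs's real API.
class Sandbox {
public:
    explicit Sandbox(const std::string& basePath) : base_(normalize(basePath)) {
        // Refuse empty or explicitly forbidden base paths.
        if (base_.empty() || pathNotAllowed(base_))
            throw std::invalid_argument("base path not allowed: " + basePath);
    }

    // Maps a sandbox-relative path to a full path, or throws if it would leave the base path.
    std::string resolve(const std::string& relative) const {
        if (!relative.empty() && relative[0] == '/')
            throw std::invalid_argument("absolute paths are rejected: " + relative);
        const std::string rel = normalize(relative);  // throws on an escaping ".."
        return rel.empty() ? base_ : base_ + "/" + rel;
    }

private:
    // Made-up example rule set; a real list would be longer and user-extensible.
    static bool pathNotAllowed(const std::string& p) {
        const char* forbidden[] = {"/", "/bin", "/etc", "/usr", "/home"};
        for (const char* f : forbidden)
            if (p == f) return true;
        return false;
    }

    // Collapses "." and ".." segments; throws if ".." would climb above the start of the path.
    static std::string normalize(const std::string& path) {
        const bool absolute = !path.empty() && path[0] == '/';
        std::vector<std::string> parts;
        std::string segment;
        for (std::size_t i = 0; i <= path.size(); ++i) {
            if (i == path.size() || path[i] == '/') {
                if (segment == "..") {
                    if (parts.empty())
                        throw std::invalid_argument("path escapes its root: " + path);
                    parts.pop_back();
                } else if (!segment.empty() && segment != ".") {
                    parts.push_back(segment);
                }
                segment.clear();
            } else {
                segment += path[i];
            }
        }
        std::string out = absolute ? "/" : "";
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i > 0) out += "/";
            out += parts[i];
        }
        return out;
    }

    std::string base_;
};
```

With this shape an empty path can never reach a delete-style operation: an empty base path is rejected at construction, and an empty relative path resolves to the base path itself rather than to the file system root.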
So, to summarize, this is where i stand right now: i have a 20-something-page document where i gathered all the most minute details of the algorithms involved in P2P OS (network policies, client, distributed server, software protection, etc), i now have this libposif foundation library to build upon, so i guess the next big thing should be a glorious post about the first piece of code that will actually do something :)

I'll most likely be expanding this library with new functionality over time, but it's probably not worth writing a new post each time i do this (well, unless it's something that i'll deem spectacular enough to warrant a separate post), so i'll just keep silently updating this post in the background as i add new modules and/or features.

In other news, i completely switched to Qt Creator, and after playing around with it for the last several months (and after quite a number of bugs and idiosyncrasies have been fixed during this time) i can now recommend it for any serious cross-platform standards-based development (there still are a few rough edges to be polished here and there, but it's already usable as it is). Here's a glimpse of my new Qt Creator 3.0 desktop in all its glory:

So good bye Borland Builder, you served me well for over 15 years, but the days of closed source software extortion are pretty much over. Nice to meet you Qt, and have a nice life!

Thursday, August 29, 2013

Distributed server: any DHT will do, right? Wrong.

After diving into DHTs a while ago, i first thought i had it all figured out: DHT is the name of the game when it comes to distributed servers, or, at the very least, they are an appropriate and mature solution for providing a distributed routing service. And apparently that is indeed the case, but with a caveat: all the common DHT algorithms presented in the literature are highly unreliable in a high-churn rate [residential] end-user-supported P2P network. More specifically, what all common DHT algorithms (that i know of) lack is on one hand enough redundancy to cope with the kind of churn typically found in end-user P2P networks (where users frequently join and leave the network, unlike in a network of long-lived servers), and on the other hand they are not sufficiently resilient to face the kinds of concerted attacks that can be perpetrated in a P2P network by a set of coordinated malicious nodes.

To make the long story short, the conclusion for all this was that building the P2P OS distributed server by simply copy-pasting an existing DHT algorithm is a no-go, and this sent me right back to square one: "now what?"

Well, the breaking news story-of-the-day is that i think i found a way to strengthen DHTs just enough to make them cope with the high churn problem, and, together with the obfuscated code-based "moving target defense" mechanism, i might now have a complete solution to almost all the potential problems i can foresee at this stage (specifically, there is one more problem that i'm aware of that is still outstanding, namely protecting against DDoS attacks, but apparently there are accessible commercial solutions for this one also; i'll talk about this in another post after i'll do some more digging)

Without getting into too many technical details at this point (primarily because all this is still in a preliminary stage, without a single line of code being written to actually test the algorithms involved), the main ideas for an "improved DHT" are as follows:
  • use a "network supervisor" server which, based on its unique global perspective over the network, will be responsible for maintaining a deterministic network topology, all while also keeping the network's critical parameters within acceptable bounds
  • add redundancy at the network nodes level by clustering several routers inside each node: in brief, having several routers inside a node, coupled with a deterministic routing algorithm (as enabled by the deterministic topology of the network), should provide a sufficient level of resilience to malicious intruders such as to allow the network to operate properly
Sure enough, the points listed above are just the very top-level adjustments that i'm trying to make to the existing plain-vanilla DHTs, but there are quite a lot of fine points that need to be actually implemented and tested before celebrating, e.g. the iterative routing algorithm with progress monitor at each step in the routing process, having multiple paths from one node to another supported by a backtracking algorithm, node state monitoring and maintenance by the supervisor server, etc - and these are just a few examples of the issues that i am aware of.

At the end of the day, when all pieces are put together the overall picture looks something like this:

So basically this is how far i got: i have this "supervised network" architecture which i think might be a solution for a sufficiently resilient and reliable distributed server, and i have the code obfuscation-based network integrity protection, but now i need to test these thingies the best i can. I definitely won't be able to test a large-scale system anywhere near a real-life scenario until actually deploying it in the wild, but a preliminary validation of its key features taken one by one seems feasible.

The network monitoring/maintenance algorithm, the node insertion/removal procedures, etc, are all pretty messy stuff that i still have to thoroughly double-check before actually diving into writing code -- e.g. here's a sneak preview for how a new node is inserted in, and announces its presence to, the routing ring:

  • the blue nodes are "currently" existing nodes positioned in an already-full 2^3-node ring (i.e. 000::, 001::, 010::, 011::, 100::, 101::, 110::, 111:: in the image above, where '::' means all trailing bits are 0)
  • the yellow nodes encircled in solid lines are nodes that have already been inserted in the yet-incomplete 2^4-node ring (the yellow nodes are interleaved with the existing 2^3 blue nodes in order to create the new 2^4-node ring)
  • the red node is the node that is "currently" being inserted in the routing ring (more specifically, in the yellow nodes "sub-ring" at index 0111::, i.e. in between the [already existing] blue nodes 011:: and 100::)
  • the yellow nodes encircled in dashed lines are nodes that will be inserted in the [yet-incomplete] yellow nodes ring after the "current" insertion of the red node is completed
  • after the yellow sub-ring will be completely populated (i.e. there will be a total of 2^4 [yellow and blue] nodes in the routing ring), the routing ring will be expanded to 2^5 nodes by inserting new nodes in between the existing [yellow and blue] nodes of the 2^4-node ring, a.s.o.; i.e. the routing ring always grows by creating a new sub-ring of "empty slots" in between the existing nodes, and incrementally populating said empty slots with new nodes
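
The slot numbering in the list above can be reproduced with a few lines of code. This is only one reading of the description (the function name is made up, and the order in which slots actually get populated is not specified here): when a full ring of 2^bits nodes grows, each new "empty slot" sits halfway between two neighbours, so its ID is an existing ID with one extra '1' bit appended.

```cpp
#include <string>
#include <vector>

// Returns the IDs (as bit strings, trailing zero bits omitted) of the empty slots
// created when a full ring of 2^bits nodes is expanded to 2^(bits+1) nodes.
std::vector<std::string> newSubRingSlots(int bits) {
    std::vector<std::string> slots;
    for (unsigned id = 0; id < (1u << bits); ++id) {
        std::string s;
        for (int b = bits - 1; b >= 0; --b)
            s += ((id >> b) & 1u) ? '1' : '0';
        slots.push_back(s + '1');  // halfway between this node and the next one on the ring
    }
    return slots;
}
```

For the 3-bit ring, the slot derived from node 011:: is 0111::, which is the position of the red node, between 011:: and 100::.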

Thursday, August 1, 2013

Been stuck for several months, but now i might be on to something

As i explained in an earlier post, there are several classes of internet connection that a user may have in the real world, but for the purpose of this discussion we shall simplify the categorization in only two [top-level] "meta-classes":
  • 'good' internet connections: these connections allow a peer to have direct P2P connectivity with any other peer on the network; and
  • 'leech' internet connections: these connections only allow two peers to connect to each other by means of a relaying peer, where said relaying peer must have a 'good' connection in order to be able to act as a relay
As it can be seen, any two peers with 'leech' connections will have to rely on a third-party relaying peer with a 'good' connection in order to be able to connect to each other.
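
Stated as code, the two meta-classes reduce to a very small rule. The enum and function names below are made up for this sketch.

```cpp
// The two top-level connection "meta-classes" used in this discussion.
enum class Connection { Good, Leech };

// True when two peers cannot reach each other directly and need a relaying peer:
// this only happens when both ends are 'leech' connections.
bool needsRelay(Connection a, Connection b) {
    return a == Connection::Leech && b == Connection::Leech;
}

// Only a peer with a 'good' connection can act as a relay.
bool canRelay(Connection c) {
    return c == Connection::Good;
}
```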

In other words, there are real-world objective reasons that will prevent all peers from being equal on the network: 'leeches' will always require assistance from 'good' peers, while they will be truly unable to assist other peers on the network in any way (because of their objectively problematic internet connection)

The problem (that got me stuck for over four months):
In the real-world internet, the ratio between 'good' internet connections and 'leech' connections is (by far) sufficiently high to enable a cooperative self-sustained P2P network, i.e. there are enough 'good' peers that can provide relaying services to the 'leeches' upon request. HOWEVER, the very fact that there is a network contribution disparity between 'good' peers and 'leeches' can motivate some users to commit severe abuses that can ultimately bring down the network (if too many users become abusive): namely, a peer with 'good' connectivity might just decide it doesn't want to serve the network (by providing [bandwidth-consuming] relaying services to the unfortunate 'leeches'), and in order to get away with this unfair behavior all it has to do is to misrepresent its 'good' internet connection as being a 'leech' connection: once successful in misrepresenting itself on the network as 'leech', it will not be requested to provide [relaying] services on the network.

So the problem can now be stated as follows:
how can an open-protocol P2P network be protected against hacked malicious clients which, because the network protocol is open, can be crafted in such a way that they will fully obey the network protocol syntax (and thus will be indistinguishable from genuine clients based solely on their behavior), but they will falsely claim to have 'leech'-type of internet connections that prevent them from actively contributing to the network? In brief, said malicious clients will unfairly use other peers' bandwidth when they'll need it, but will not provide [any] bandwidth of their own to the other peers when they'll be requested to do so, and they will get away with it by falsely claiming that they are sitting behind a problematic type of internet connection which prevents them from being cooperative contributors to the network (when in truth they are purposefully misrepresenting their internet connection's capabilities in order to make unfair use of the network).

The standard solution (which cannot be used):
The standard solution to the problem described above is to make sure that all the peers in the network are running a digitally-signed client program, which client program is a known-good version that a central authority distributes to the peers. However, once we dive into the details of how such a solution can be implemented we get into trouble: specifically, digitally-signed clients cannot be used in the P2P OS ecosystem because this would imply the existence of an [uncompromised] signature-validation DRM running on the peers' computers, which we cannot assume, because if we would make such an assumption we would only shift the problem of “how do we prevent compromised peers” to “how do we prevent compromised DRMs”, i.e. we'd only get right back to square one

A saving idea? (go or no-go, not sure yet):
A new way of protecting a known-good system configuration is the talk of the town these days, namely the "moving target defense" (a.k.a. MTD) [class of] solutions (apparently this concept - as opposed to the underlying techniques - is so new that it didn't even make it in wikipedia at the time i'm writing this), and for the specific case of the P2P network problem as i stated it above (i.e. resilience to maliciously crafted lying peers) the MTD translates into the following:
  1. have a central authority that periodically changes the communication protocol's syntax, then creates a new version of the client program which complies with the new protocol, and finally it broadcasts the new [known-good] version of the client program on the P2P network; in this way, the protocol change will immediately prevent ALL old clients, including the compromised ones, from logging onto the network, and will require each peer to get the new [known-good] version of the client program as distributed by the central authority (i.e. all the maliciously-crafted compromised clients are effectively eliminated from the network immediately after each protocol change)
  2. the protocol changes that are implemented in each new version of the client program will be deeply OBFUSCATED in the client program object code (using all the code obfuscation tricks in the book), with the goal of delaying any [theoretically possible] successful reverse engineering of the new protocol beyond the release of the next protocol update and thus render the [potentially cracked] older protocol(s) unusable on the network
  3. the protocol obfuscator must be automatic and must itself be an open source program, where the only secret component (upon which the entire system security scheme relies) must be the specific [random] strategy that the obfuscator elects to use as it releases each new version of obfuscated clients
As a result, after each protocol update the P2P network will only host known-good versions of clients, and by the time when any protocol reverse engineering effort might be successful, a new protocol update will already have been released, thus preventing any prior-to-the-update reverse-engineered clients from logging onto the network.
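
The update cycle described above can be modelled in a few lines. This is a deliberately simplified sketch: the "epoch" counter merely stands in for "which protocol syntax a client speaks" (in the real scheme the syntax itself changes and is hidden in obfuscated code, so a client cannot simply claim a newer number), and the class, the function names and the day-based estimate are all made up for illustration.

```cpp
#include <cstdint>

// Hypothetical model: each protocol change is identified by an epoch number.
class NetworkGate {
public:
    // The central authority publishes a new protocol; every older client is locked out.
    void rotateProtocol() { ++currentEpoch_; }

    std::uint32_t currentEpoch() const { return currentEpoch_; }

    // A client may log on only if it speaks the current protocol.
    bool mayLogOn(std::uint32_t clientEpoch) const { return clientEpoch == currentEpoch_; }

private:
    std::uint32_t currentEpoch_ = 1;
};

// The scheme only holds while updates are released faster than an attacker can
// reverse engineer the obfuscated client (both values are estimates, in days).
bool updateOutpacesAttacker(double updateIntervalDays, double estimatedCrackDays) {
    return updateIntervalDays < estimatedCrackDays;
}
```

A client cracked during epoch N is useless once the gate has moved to epoch N+1, which is why everything hinges on the second function staying true.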

The work ahead:
As it can be seen from the above description, the dynamic protocol update solution relies on the ability to create and distribute obfuscated program versions at a higher rate than an attacker's ability to create a malicious reverse engineered version of the program. Thus, given a system that uses the dynamic protocol adjustment method (as described above), the network integrity protection problem translates into the following problem:
[how] can a protocol be obfuscated such that the [theoretical] time necessary to crack the obfuscated code, given a known set of resources, exceeds a predefined limit?
Should the protocol obfuscation problem have a solution (probably underpinned by dynamic code obfuscation techniques) then the problem is solved (and i won't mind if it will be an empirical solution for as long as it proves viable in the real world) - so this is what i'm trying to find out now.

A few articles on/related to code and protocol obfuscation:

I also started a discussion on code obfuscation on comp.compilers, feel free to join here:!topic/comp.compilers/ozGK36DRtw8

Sunday, June 2, 2013

Striving for perfection

I ran into yet another unexpected roadblock (pretty nasty stuff btw), i's workin' on it, but ain't gonna whine about all this just yet, so let's take a break for a moment (pun intended :P) and peek at the pros for a change (it's Sunday, what the heck!)

Tuesday, March 19, 2013

A milestone year in the standardization of computing

The time for starting work on the production-quality P2P OS is getting nearer by the day, so i thought it's high-time to look around a bit and try to get a clear picture of what tools and technologies are out there for building true cross-platform applications nowadays, and much to my delight here's what i found:
  1. we now have native threads built right into C++11, so there's no need for third-party libs (with God knows what kinds of portability problems each) any more
  2. since the days of HTML4 already, one could build a full-featured GUI right onto a web page and have it rendered in any compliant browser (Wt is a pretty neat showcase for what can be done this way), except only for the multimedia features which were not included in the HTML4 standard; now the W3C is making really quick progress towards baking exactly these capabilities into the HTML5 standard (via the free VP8 format), plus device I/O and persistent storage (all of these especially driven by WebRTC's needs), such that once work will have been completed on these issues a standard HTML5 web page will be able to deliver a full desktop-like experience right inside a web browser (including smooth 2D graphics editing and animation via SVG and high-quality 3D effects via WebGL)
  3. last but not least, Qt is making steady progress towards becoming a usable C++ cross-platform development tool not only for all the major desktop OSes Win/Mac/Lin, but also for iOS, Android, and BB10
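
As a small illustration of point 1, here is a sketch that uses nothing but the C++11 standard library for threading. The function name and the chunking scheme are made up for this example; with GCC on Linux the build may additionally need the -pthread flag.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums a vector by splitting it across workerCount std::thread workers.
long long parallelSum(const std::vector<int>& data, unsigned workerCount) {
    if (workerCount == 0) workerCount = 1;
    std::vector<long long> partial(workerCount, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + workerCount - 1) / workerCount;
    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, w, begin, end] {
            // Each worker writes only its own slot, so no locking is needed.
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& t : workers) t.join();  // wait for all workers before combining
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```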
So at the end of the day, the world we live in looks a bit like this to me:

Back to the future: the good old tried and tested X client/server model anyone?

Now, while things look pretty neat the way they are already, how about going one step further? Namely, consider you decide to write your apps' UI based on a minimalist restriction of what HTML5 has to offer, e.g. you'll only use a limited set of widgets (say buttons, drop-down lists, and check-boxes), you'll use only an [editable] text area for all your text interactions, a canvas and an image renderer for graphics, and a simple file system API (well, this is prolly "too minimalistic" a set of UI functions, but i'm picking it up here just to get my message across); in this case, once your app is written based solely on (such) a subset of functions, what you'll have at the end of the day is a native app which will require a UI renderer with a very limited set of capabilities, such that not only can you use a standard HTML5 renderer on systems where it is available, but you can also build a minimalist UI renderer of your own (in native code) on systems that do not provide an HTML renderer: more specifically, wherever you'll find a POSIX(-like) system it's very likely you'll be able to get a port of WebKit (or even a full-fledged browser) for a pretty decent price (if not for free altogether), but if you are about to build an entirely new system of your own, then porting the Linux kernel (or writing a new fully POSIX-compliant kernel of your own) just in order to get WebKit running would be a titanic work that no ordinary small(ish) company can put up with all by itself.

And this is where the minimalist UI model comes into play: if you can have the GNU toolchain ported on your system (which should be pretty easy stuff, especially if you use an existing processor architecture - e.g. there's the OpenRISC out there for grabbing, and it comes with GNU toolchain support and all right out of the box), then all you'll have to do is to implement your small(ish) UI spec (e.g. in C++) and compile it with your gcc, and there you go, you'll have your apps truly cross-platform, ready to be deployed as-is both on any (minimalist) systems that implement your UI spec, and also on any systems that have an HTML5 browser:

The no-brainer solution: have your apps truly cross-platform-ready with a minimalist UI spec
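
A minimal sketch of what such a UI spec could look like in C++ follows. Everything in it is hypothetical (the interface, the two widgets, both backends and the sample panel), and the HTML backend does no escaping of labels or ids; it is only meant to show app code written once against a tiny spec and rendered by two different backends.

```cpp
#include <string>

// Hypothetical minimalist UI spec: the app talks only to this interface.
class UiRenderer {
public:
    virtual ~UiRenderer() {}
    virtual std::string button(const std::string& id, const std::string& label) = 0;
    virtual std::string checkBox(const std::string& id, const std::string& label, bool checked) = 0;
};

// Backend for systems that have an HTML5 browser: emits plain markup (no escaping).
class Html5Renderer : public UiRenderer {
public:
    std::string button(const std::string& id, const std::string& label) override {
        return "<button id=\"" + id + "\">" + label + "</button>";
    }
    std::string checkBox(const std::string& id, const std::string& label, bool checked) override {
        return "<label><input type=\"checkbox\" id=\"" + id + "\"" +
               (checked ? " checked" : "") + "> " + label + "</label>";
    }
};

// Backend for a system without a browser: a native renderer, here a text console.
class TextRenderer : public UiRenderer {
public:
    std::string button(const std::string&, const std::string& label) override {
        return "[ " + label + " ]";
    }
    std::string checkBox(const std::string&, const std::string& label, bool checked) override {
        return std::string(checked ? "[x] " : "[ ] ") + label;
    }
};

// App code is written once, against the spec only.
std::string buildSettingsPanel(UiRenderer& ui) {
    return ui.checkBox("upnp", "Enable UPnP", true) + "\n" + ui.button("save", "Save");
}
```

Porting the app to a new system then means writing one more UiRenderer backend, not touching buildSettingsPanel().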

Well, you tell me if i am wrong, but in my view (and given all of the above) this year may well become the most significant milestone in the evolution of standards-based programming after the standardization of ANSI C back in 1989.

Apparently we'll still have to cope with writing BSD sockets wrappers for a while until/if they'll eventually be included in the C++ stdlib, but quite frankly that's a pretty trivial piece of code (given that we have standardized multi-threading baked inside C++) and not much more than a residual problem nowadays.
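
To give an idea of how small such a wrapper can be, here is a sketch of the POSIX side only (a Windows build would need a Winsock branch, which is not shown). The class name and its interface are made up for this example; it binds to the loopback interface on an OS-chosen port and uses a 2-second receive timeout so that a lost datagram cannot block forever.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Thin RAII wrapper over a BSD UDP socket bound to 127.0.0.1 (POSIX only).
class LoopbackUdpSocket {
public:
    LoopbackUdpSocket() : fd_(::socket(AF_INET, SOCK_DGRAM, 0)) {
        if (fd_ < 0) throw std::runtime_error("socket() failed");
        sockaddr_in addr = loopback(0);  // port 0: let the OS pick a free port
        socklen_t len = sizeof(addr);
        timeval timeout = {2, 0};        // receive() gives up after 2 seconds
        if (::setsockopt(fd_, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout)) < 0 ||
            ::bind(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
            ::getsockname(fd_, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
            ::close(fd_);
            throw std::runtime_error("socket setup failed");
        }
        port_ = ntohs(addr.sin_port);
    }
    ~LoopbackUdpSocket() { ::close(fd_); }
    LoopbackUdpSocket(const LoopbackUdpSocket&) = delete;
    LoopbackUdpSocket& operator=(const LoopbackUdpSocket&) = delete;

    std::uint16_t port() const { return port_; }

    // Sends one datagram to another loopback socket identified by its port.
    void sendTo(std::uint16_t port, const std::string& data) {
        sockaddr_in to = loopback(port);
        if (::sendto(fd_, data.data(), data.size(), 0,
                     reinterpret_cast<sockaddr*>(&to), sizeof(to)) < 0)
            throw std::runtime_error("sendto() failed");
    }

    // Receives one datagram (up to 1500 bytes); throws on error or timeout.
    std::string receive() {
        char buffer[1500];
        const ssize_t n = ::recv(fd_, buffer, sizeof(buffer), 0);
        if (n < 0) throw std::runtime_error("recv() failed or timed out");
        return std::string(buffer, static_cast<std::size_t>(n));
    }

private:
    static sockaddr_in loopback(std::uint16_t port) {
        sockaddr_in a = sockaddr_in();
        a.sin_family = AF_INET;
        a.sin_port = htons(port);
        a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        return a;
    }

    int fd_;
    std::uint16_t port_ = 0;
};
```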

Thursday, March 14, 2013

Major breakthrough, project back on track

After over a year of crunching the IPv4 CGN traversal problem at the back of my mind, it finally clicked! Or, more appropriately called, it banged!
In fact, this click (or bang, or whatever else i should call it) is such a major breakthrough, with such immense potential implications, that i have to refrain from saying much about it on this blog before filing a provisional patent; but what i can say though is that i'm 99% confident i found a novel algorithm that can break through all the IPv4 CGN types that are out there in the wild - and this is not just in theory, i actually tested it for over two weeks on all the mobile connections that i could find, on multiple networks, in eight countries around the world (with a few more tests pending at the time of writing). It did take me three weeks to refine the algorithm down to its current details (and it was quite a bumpy ride), but the bottom line is that i now have a working solution for full P2P/IPv4 connectivity, down to the most minute details, so that i no longer have to wait for God knows how many more months (or years?) to see how the IPv6 dust will eventually settle; that is, i can start working on the production-quality implementation of P2P OS right now.

So the next step: full-throttle fund raising campaign for developing the release-version of P2P OS (i.e. with youtube promo, kickstarter project, call-a-friend, and whatever else it will take to get them wheels turning). The stars aligned, the time has come, let the fun begin!

About the patent thingie, that's just for playing safe, be cool :) - this project will be open source after all ("networking for the masses", remember?).

Wednesday, February 20, 2013

WebRTC starts flexing its muscles

During the past two years since i started working on P2P OS there has been some significant progress on the WebRTC project which two years ago looked more like a statement of intent than anything else; and since WebRTC is backed by big players like Google, Mozilla, and Opera (with Microsoft notably missing from this lineup after it flushed $8 billion down the drain last year on Skype and it's now pitifully crying foul and trying to screw things up again), it might eventually turn into a viable P2P solution that could make P2P OS rather redundant. However, while a side-by-side comparison between P2P OS and WebRTC does have some merit, i think the case for which of the two will come on top (in terms of quintessential project goals) is far from being settled at this point in time, mainly because WebRTC pays little to no attention to properly dealing with the very real (and critical) problems that the real-world internet topology of today and tomorrow pose to P2P connectivity: in a nutshell, WebRTC opted for a conservative SIP-like technology wherein it falls back to using a network of dedicated TURN relay servers whenever a direct P2P connection between two nodes cannot be made, with little consideration to the fact that such a relay server network requires some big-pockets "sponsors" that can throw in enough ca$h to keep it up and running (e.g. Viber pumps some $2.5 million a year to keep its rather small-ish network up and running), and i think it's very likely that users will be forced to have/get a Google/Mozilla/Opera/M$/whatever account in order to use the service.

Alternatively, P2P OS aims at creating a self-reliant P2P network which is meant by design to gracefully navigate both the rough waters of the current IPv4 exhaustion and IPv4-to-IPv6 transition, and the promised shiny days of a P2P-friendly IPv6-only world of the decades ahead. Also, the scope of P2P OS is not restricted to point-to-point communication between nodes; instead, its design goal is to provide a generic foundation for content-centric networking (a.k.a. named data) where point-to-point communication is only one of many use case scenarios.

To summarize, while i can see WebRTC as a serious potential competitor to P2P OS, i think abandoning P2P OS because of WebRTC would be premature at this point in time.