Thursday, January 15, 2004
Make it better
Here's an idea. I'm going to try to draw out a sort of peer to peer backbone network protocol for generic use, something that can perform well and self-exist, with no master server, nat transparent, and with swarming (like bittorrent). The idea is to get the highest bw/lowest ping near the top of the mesh-tree, while not loading all of the load on one node. The major concept points are outlined below.
- Bandwidth devoted is in terms of connections. Each connection is assumed to take up roughly 1/2 kb/s, directly. This 1/2 is seen as 1/4 coming up and 1/4 passing through. The bandwidth of a node is only what they can devote, and is only checked at eval time. If a nodes bandwidth changes, this is reflected at eval time. A node should NOT devote more than 50% of its real bw to the network as connections.
- Data passing through a node is for location of wanted content only. Major passings of data should not be pushed through the tree, or else the design collapses.
- Low ping high bw nodes go closer to the root. The root node should have a low ping, but should also have the highest bandwidth possible. Generally, whatever node with a ping lower than 100 to its nearest child and with the highest bandwidth should be the root node. Note that if this appears near the bottom, it will take a while to get moved to the top; but all networks start with two nodes, and so the shifting around should remain balanced.
- Nodes are inserted into place by bandwidth. When a node is first inserted, it does not effectively appear in the network until it is properly placed. First, it exchanges ping timings with its nearest bandwidth matches. These ping timings are used to instant eval where the node should move. If it has too low of a ping (say, it has 50 connection bw, but a ping of 780mS), it is moved down a level, and exchanges ping timings with the lower bandwidth nodes.
- Bandwidth can be retracted on the fly, but it must be done in a gentle fashion for the design to work. When a node wishes to retract bandwidth, it lowers the amoun of bandwidth it is supplying, but continues to allow connections through. It must wait for the eval packet to reach it for its bw restrictions to be activated.
- Client bandwidth can also be retracted. Clients using the node for nat traversal reflection can be notified that they should find another node. Their bandwidth can be constricted, but should not be dropped. Warnings can be issued if the connection will be severed, but the warning must have a duration of 2 minutes before severence occurs.
- Clients behind nats can signal eachother via the network, and then choose a node that is available as a reflection node via the network. Nodes must mark themselves as reflection nodes for this, and the reflection uses part of their non-node-connection bw.
The trick here is that a regular tree won't do it:
o o o o o
Notice that the massive load of about half the network goes onto the root node (at the top; this is an upside down tree). What I want to do is more of a concentric ring tree, or "Cast Tree." A "CAST TREE" is instigated like a cast system; the lower nodes all form a gorup, then the upper ones form their own group, and so on.
| | |
| |-----o----| |
| | | | |
o | o | o
| | | | |
| |-----o----| |
| | |
We call the center cast 0, and each cast going outward goes up one. So, there are 3 casts here: 0 (1 member), 1 (2 members), and 2 (5 members).
The highest in each cast talks to the second highest in the next cast, and so on down. An attempt to maintain as many connections as posible to the next cast (ring) is put into place. Communication inside one's own cast is done by simple relay (it's a gigantic ring). A packet need only go up to the next cast one time, and then is marked as relayed. When a packet reaches a cast, it splits in both directions. When it is sent up (or down) to the next cast, the copy at the node uploading is marked. If the other one is sent upward/downward, it too is marked. One of two things happen with this then: either the node in that cast has not received the packet, and splits and continues the process; or the node has gotten it, and kills it. Packets are all ID'd, so it is actually advantageous for the packet to hit the upper cast in two points and thus cover it in twice the speed (assuming they are at opposite ends of the cast, and the cast is perfectly balanced).
A packet is marked with direction it must go as well. If it came from down, it is marked to go up; if it came from up, it is marked to go down; if it came from the current cast, it is marked to go both ways. It is unmarked after delivery in said direction. So, if the lower node in cast 1 of the above diagram originated a packet, it would go up to cast 0, down to cast 2, and be marked as not needing to go up or down. Then it would go directly to the upper node in cast 1, and stop circulating cast 1. If a packet originates at cast 2 in the bottom left node, then it will be marked for up and down, go right and up, and circulate around until it hits the nodes that both have a copy and is killed.
All packets need a time to live (about 5 minutes should suffice), a creation time, and an ID. The time to live is how long the nodes will hold record of relaying a packet once they get it, so that they can kill any other copies coming to them. The TTL and Creation Time both designate at what point the packet will be killed regardless. The packet ID is generated using the originating node's MAC address and the date and time that the packet is sent, to be universally unique.
This kind of design, once refined down, should create a network scalable enough to handle almost anything. Heirarchial casting will be my design of choice, however; casts can be made of subcasts so that they scale better. This will be done by only allowing a cast ring to get so large, and then stacking the newer nodes on top of the old ring (not at a higher and lower level, but extruding "out" of the screen in the above diagram), thus creating a second ring at the same cast level, which follows the same interface rules with its above and below subcasts as the entire cast follows with the lower and higher casts. The entire cast itself would decide which cast members attatch to the higher and lower cast; but the higher and lower casts will see these as single casts.
What do you think? Refinable basic idea? It should provide high scalability even with many slow nodes on the outer casts (i.e. dial-up), because those casts will be huge, but will not be one gargantuant ring. Passing around a ring of 10 stacked 10 high takes less time than passing around a ring of 100 (9 relays vertical, 4-5 horizontal per ring, so after 14 relays, all 100 would have the data; as opposed to 50 relays in a single cast ring).