User:Conseo-Polyc0l0rNet/Trees of Transactions
A voter's count engine?
Counting is technically the core of every voting mechanism and the realization of the overall process. Traditionally, counting mechanisms are subject to debate, and there is a whole scientific discourse about which counting methods (simple majority rule, 50% rule, Schulze method, Condorcet, 2/3 majority and so on) are appropriate and "fair". The paradox is that the counting method is treated as some objective agreement finder, while the intention of the tool is said to be helping people express their unfiltered opinion. At the very least there is a certain bias towards organisational schemes in most of these debates, because the method is chosen as being objective for all participants.
Votorola has modelled a delegational tree as the basis for consent and does not discard this data structure prematurely in favour of some counting mechanism. In fact, different counting mechanisms can traverse the tree behind one end-candidate and treat different properties of the nodes (voters), e.g. counts as numbers, differently. Since every voter is supposed to be in total control of the node representing her, counting decisions "through" her node can be driven by her.
In programming such an approach can be called lazy. The idea behind laziness is to defer calculating a result until it is needed. A general side effect of laziness is that the original data structure stays intact. It is then always possible to save the complete determining history of a computation as a time-ordered log and to recalculate ("replay") any resulting state later.
Requirements for a voter "programmable" count-engine
Openness to voter-defined counting mechanisms
The term programmable here does not primarily describe an action the voter has to perform (although she could program her node), but rather a perspective of openness and freedom towards different voter expressions in her node. While the consensual outcome dictates that you decide upon one candidate, it does not dictate how you forward your own resources and delegated votes. If conditions have to be fulfilled before votes are drawn, then it is equally open to the voter to accept votes or to ignore them. These decisions can be sophisticated and cannot be determined by the tool.
While in many cases this might not be necessary or even reasonable (e.g. splitting and withholding some votes), in some cases it will be necessary in order to build sophisticated collective plans of resource flow. It also puts the voter in total control and disqualifies "objective" counting methods.
Correctness
Votorola keeps a log of snapshots for all counts, so voters can already check the counting transparently. Voters are also supposed to confirm votes and resources before they are applied to a plan (i.e. when agreement is reached).
Still, supporting different counting methods requires the tooling to have non-conflicting (proven) tree-traversal mechanisms. Not only is this necessary for voters to trust the tool, it is also important in order to avoid deadlocks and subtle inconsistencies in counting.
Defining a functional zipper [1] over the tree would allow very simple, external functions to be defined for node traversal.
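To make this concrete, here is a minimal sketch of such a zipper over the delegation tree in Scala; the Node and Crumb types are invented for illustration and are not Votorola's actual classes.

 // A delegation-tree node: a voter plus the voters delegating to her.
 case class Node(voter: String, children: List[Node])
 // A crumb records everything needed to rebuild the parent when moving back up.
 case class Crumb(parentVoter: String, left: List[Node], right: List[Node])
 // The zipper is the focused subtree plus the path of crumbs back to the root, so
 // external traversal functions stay pure and the original tree is never mutated.
 case class Zipper(focus: Node, path: List[Crumb]) {
   def down: Option[Zipper] = focus.children match {
     case first :: rest => Some(Zipper(first, Crumb(focus.voter, Nil, rest) :: path))
     case Nil           => None
   }
   def up: Option[Zipper] = path match {
     case Crumb(parent, left, right) :: rest =>
       Some(Zipper(Node(parent, left.reverse ::: focus :: right), rest))
     case Nil => None
   }
   def right: Option[Zipper] = path match {
     case Crumb(p, left, next :: rest) :: above =>
       Some(Zipper(next, Crumb(p, focus :: left, rest) :: above))
     case _ => None
   }
 }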
Liveness
Collaborative applications often benefit from quick updates. Vote changes should therefore be available quickly and continuously, so that contributors are encouraged by the very actions of the other voters, if possible.
At the moment Votorola counts at a 4-hour interval, which might be enough for long-term legislative plans, but not for everyday economical organisation. Votorola's current compromise is that your own changes are shown to you immediately.
Historical data
Counting only in fixed intervals works against historical data. If snapshots and vote-logs of a server exist, any historical situation can be reproduced by taking the latest snapshot before the point of interest and applying, from the log, all events between the time of that snapshot and the point of interest.
The problem here is that creating snapshots is expensive, as is replaying a large chunk of the log on top of an older snapshot (which is in fact the same thing as producing a snapshot). Historical data therefore seems to be very expensive. Possible solutions to keep the computational cost low are needed, especially if users are to have easy access to historical data and to how the current situation of the tree developed.
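The replay itself can be sketched as a simple fold over the logged events, assuming invented Snapshot and Transaction types and a generic applyTx step (not the actual Votorola log format).

 // Reproduce any historical state: take the newest snapshot at or before the
 // point of interest and re-apply the logged transactions up to that point.
 case class Transaction(id: Long, time: Long)
 case class Snapshot(time: Long, state: Map[String, List[Long]])
 // Stand-in: the real step links the transaction id to the voter and her candidates.
 def applyTx(state: Map[String, List[Long]], tx: Transaction): Map[String, List[Long]] =
   state
 // Assumes an initial (possibly empty) snapshot exists at time 0.
 def stateAt(pointOfInterest: Long,
             snapshots: List[Snapshot],
             log: List[Transaction]): Map[String, List[Long]] = {
   val base = snapshots.filter(_.time <= pointOfInterest).maxBy(_.time)
   log.filter(tx => tx.time > base.time && tx.time <= pointOfInterest)
      .sortBy(_.time)
      .foldLeft(base.state)(applyTx)
 }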
Effectivity and efficiency
The algorithms should be both effective and efficient, allowing a slower vote-server to run on cheap consumer hardware, while scaling up horizontally both on the server and on the network side. If the algorithms traverse the vote tree atomically, runtime cooperation between many vote-servers over the internet on the same poll might be feasible and no more expensive than calculating only one's own trees.
Fast Parallel counting algorithm
To address the requirements outlined above, the simplest conclusion is to be able to reproduce any point in time. We will deal with the efficiency problem after proposing the basic concept.
Pure counting function
To look at the plain problem, we need to define a counting function which allows us to construct any point in time from the plain data, the user vote logs. To do so we have to build a vote tree in memory and count the transactions on the tree.
To reduce this building problem to its core operation: what we want to achieve for quick, atomic updates to nodes is incremental updates. If we used total counts, then when updating the nodes down to the root (end-candidate) we would have to calculate differences by accessing previous counts. That also basically means we would overwrite data (the sum), which brings all kinds of problems with tree access as well. This complection is resolved by stripping the user action down to the pure information of novelty it contains. Since such a change only needs to reflect the resources that changed, it also contains a lot less data. With incremental transactions on the vote-tree we therefore allow incremental updates to all candidate nodes as well. Logically this simplification will also allow the process to be parallelized further on.
Basic Model
After transaction 3:

 A (1)(2)(3) sum resources: { +2 Votes, +60$ }
  /
 B (2)(3) sum resources: { +1 Votes, +20$ }

Transaction log: (1)(2)(3)(4)(5)(6)

 (4) -> C shifts from <null> to A, resource-difference: { +1 Votes, +100$ }
 (5) -> B, resource-difference: { +10$ }
 (6) -> C shifts from A to B, resource-difference: { +1 Votes, +100$ }

After transaction 6:

 A (1)(2)(3)(4)(5)(6) sum resources: { +3 Votes, +170$ }
  /
 B (2)(3)(5)(6) sum resources: { +2 Votes, +130$ }
  /
 C (4)(6) sum resources: { +1 Votes, +100$ }
Each voter A, B, C only needs to keep track of the transactions that concern him/her. We might split these into different lists (differentiating between personal actions, delegations and removals). Each node only stores a reference to each transaction in its list(s).
To construct the vote count for any voter in the poll at any time, we need indexed (balanced tree) access to the transaction log by transaction time, i.e. the time of creation serialized together with the ids. (Note to myself: this is complecting two concepts and as a consequence doesn't allow earlier events from other servers to be pushed later on; it needs some thought, as changing it changes the incremental nature of the process.) We then calculate the tree up to the transaction id in question and build the count for the voter we are interested in, e.g. B, by summing the list of his/her transactions.
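For illustration, the transaction record and the example above could be sketched in Scala roughly as follows; all names and types are assumptions, not Votorola's real classes.

 // A transaction carries only the novelty it introduces: an optional vote
 // shift and the resource difference, never a recomputed total.
 case class Tx(id: Long,
               voter: String,                                   // acting voter
               shift: Option[(Option[String], Option[String])], // (from, to) candidate
               diff: Map[String, Int])                          // e.g. Votes, $
 // Transactions (4)-(6) of the example, indexed by id so that the log can be
 // cut off at any transaction id to reproduce that point in time.
 val log: Map[Long, Tx] = Map(
   4L -> Tx(4L, "C", Some((None, Some("A"))),      Map("Votes" -> 1, "$" -> 100)),
   5L -> Tx(5L, "B", None,                         Map("$" -> 10)),
   6L -> Tx(6L, "C", Some((Some("A"), Some("B"))), Map("Votes" -> 1, "$" -> 100))
 )
 // Each voter only stores the 8-byte ids of the transactions affecting her,
 // e.g. after transaction 6: A -> (1)...(6), B -> (2)(3)(5)(6), C -> (4)(6).
 val registry: Map[String, List[Long]] = Map(
   "A" -> List(1L, 2L, 3L, 4L, 5L, 6L),
   "B" -> List(2L, 3L, 5L, 6L),
   "C" -> List(4L, 6L)
 )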
"Counting" step (sequentially)
We don't actually count (as in sum up) here, but rather link the transactions to each voter. This allows the data to be processed with different counting/summation algorithms (and to be traversed differently) at any later time, since we don't delete the original raw voter data in the resulting tree, but keep it. Voters are stored in a Map (a persistent Map in memory, some queryable format on disk). This registry is created on the fly through the transactions. In each counting step we
1) enter the tree at the voter triggering the action and add the transaction,
2) go to each of her candidates up the delegation chain and add the transaction,
3) mark the transaction as finished, and
4) pass to the next transaction.
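A possible sketch of this linking step over an immutable registry; the candidateOf lookup and the registry layout are simplifying assumptions.

 // One counting step: link the transaction id to the acting voter and to every
 // candidate up the delegation chain. No sums are computed or overwritten here.
 case class Registry(txRefs: Map[String, List[Long]],
                     candidateOf: Map[String, String])  // voter -> current candidate
 def applyStep(reg: Registry, txId: Long, actingVoter: String): Registry = {
   // Walk from the acting voter up to the end-candidate (the tree has no cycles).
   def chain(v: String): List[String] =
     v :: reg.candidateOf.get(v).map(chain).getOrElse(Nil)
   val updatedRefs = chain(actingVoter).foldLeft(reg.txRefs) { (refs, node) =>
     refs.updated(node, txId :: refs.getOrElse(node, Nil))
   }
   reg.copy(txRefs = updatedRefs)   // the transaction can now be marked as finished
 }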
Complexity
To traverse the tree, the complexity of each step depends on the tree depth. For simplicity we assume a roughly balanced tree, although in extreme cases it can be very poorly balanced. If we can assume that each candidate has on average at least two voters, then this is already enough to constrain the complexity to logarithmic cost.
For a million voters this means log_2(1 000 000)~=20 steps. For a billion voters this means log_2(1 000 000 000)~=30 steps.
(Note: The actual computation of the sum of all items in memory is cheap and can be kept in a current-value cache in the node for the ~95% of requests that ask for the current count.)
(Note: In general this means we can easily lookup a candidate hierarchy in memory.)
Memory cost
We can now take advantage of the incremental nature of the pure process and durably cache the count (state) at any point. This means we can snapshot the tree(s) at fixed, constant intervals, balancing snapshotting cost against runtime memory cost. A replay after some snapshot then also costs at most one interval's worth of transactions. If we assume every reference to a transaction id costs 8 bytes (a long) and that we snapshot every 10000 transactions, then we have:
 For a million voters: 20 * 8 * 10000 ~= 1.6 MB
 For a billion voters: 30 * 8 * 10000 ~= 2.4 MB
We assume each node (voter) is accessible through the key of his/her mailish username. It contains the list(s) of its transaction references. If we assume the three lists mentioned above, we can assume an upper bound of 1 KiB per empty node (including a cached sum and a list of direct voters for the previous snapshot value, as well as for the last read values of these).
 For a million voters: 1 KiB * 1000000 ~= 1 GiB
 For a billion voters: 1 KiB * 1000000000 ~= 1 TiB
And finally, if we assume 1 KiB as an upper bound per transaction record:
 10000 transactions * 1 KiB ~= 10 MiB
We can see that the voter registry totally dominates the problem, as it grows linearly with the number of voters. The algorithm doesn't require all nodes to be in memory to be effective, only those that are affected by the transactions. The pressure on registry access comes rather from the query side: access to the different counts, and the read-access counting algorithms that may run on some revision of the tree. It might be possible to split the counting between a set of machines (and thereby reduce the memory pressure). Some thought on this is probably needed, as GC will hurt even if we had 1 TiB of memory in a single server.
Note: Each snapshot writes to disk the list of all transactions per voter since the last snapshot and saves the snapshot value for the sum (previous snapshot + all intermediate transactions). It also writes a snapshot of the voter registry to disk. With persistent (in-memory) data structures this can be done in parallel in the background without copying (and without locking), the resources being freed automatically through GC after the snapshot.
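With an immutable registry, such a background snapshot can be sketched as capturing the current reference and serializing it asynchronously; Snapshotter and writeToDisk below are placeholders, not existing code.

 import java.util.concurrent.atomic.AtomicReference
 import scala.concurrent.{ExecutionContext, Future}
 // Because the registry is a persistent (immutable) map, a snapshot is just the
 // current reference; writers keep publishing new versions while the old one is
 // serialized in the background, and the old version is later reclaimed by GC.
 object Snapshotter {
   val registry = new AtomicReference(Map.empty[String, List[Long]])
   def writeToDisk(snapshot: Map[String, List[Long]], lastTxId: Long): Unit =
     () // stand-in for serializing the per-voter transaction lists and cached sums
   def snapshot(lastTxId: Long)(implicit ec: ExecutionContext): Future[Unit] = {
     val frozen = registry.get()        // no locking, no copying
     Future(writeToDisk(frozen, lastTxId))
   }
 }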
Performance
Read access
All read access can happen in parallel, since each query time-stamp can be mapped to a discrete revision of the tree. If the query time is older than the last snapshot point, then disk reads might be needed. We assume for now that this is a comparatively rare case and can be done in batches, for example for cached, smoothed timelines in the UI. The cost of each read access of some resource(s) is that of summing all transactions over those resource(s). For an end-candidate this can be in the worst case 10000 floating point operations plus memory access to 10 MiB of transactional data. Memory access being the bottleneck, we can introduce a further cache for the current value, which is produced on the first request and stays valid per transaction-id. Note: Since we calculate this cache lazily, we don't need to hold it in memory for all voters, only for active ones (ones that are queried).
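A sketch of such a lazily filled current-value cache, keyed by the transaction id it is valid for; for brevity only plain resource-differences are summed here, and the treatment of vote-shifts is left out.

 // Cache of the current sum per voter, valid up to a transaction id. It is
 // filled lazily on the first query, so inactive voters never pay for it.
 case class Cached(upToTxId: Long, sum: Map[String, Int])
 class CurrentValueCache(log: Long => Map[String, Int],     // txId -> resource diff
                         refsOf: String => List[Long]) {    // voter -> tx references
   private var cache = Map.empty[String, Cached]
   def currentSum(voter: String, latestTxId: Long): Map[String, Int] =
     cache.get(voter) match {
       case Some(c) if c.upToTxId == latestTxId => c.sum    // the ~95% case
       case _ =>
         val sum = refsOf(voter).filter(_ <= latestTxId).map(log)
           .foldLeft(Map.empty[String, Int]) { (acc, diff) =>
             diff.foldLeft(acc) { case (a, (k, v)) => a.updated(k, a.getOrElse(k, 0) + v) }
           }
         cache += voter -> Cached(latestTxId, sum)
         sum
     }
 }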
Note: Snapshotting only needs read access.
Write access
In general write access only needs to update an atomic (persistent in memory) list and doesn't do any computation. It is hence bound by memory access.
Write access is trickier to parallelize. We have to ensure ACID properties for any read access querying the transactional data (durability is actually achieved by writing the transactions to disk before processing them, and by ACID snapshotting).
If we distinguish between two different types of events, vote-shifts and resource-updates, we can see that resource-updates can flow in parallel as long as they are not yet committed (for read access), provided they are inserted into the list in the correct order (which only costs on average n/2 steps, where n is the number of parallel jobs). At any point we can only expose a continuous list of transactions. Vote-shifts, on the other hand, change the resource flow (moving resources from one place to another) and therefore need a lock on the tree (no resource or other events may be processed, and the tree must be up to date). Each vote-shift in itself costs constant time, triggering a removal of its resources at one point and their addition at another, and allowing the parallel flow of resources to resume immediately afterwards. The locking therefore brings all multithreaded counting to a halt only for a short time.
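One way to sketch this discrimination is a read-write lock, where resource-updates share the lock and vote-shifts take it exclusively; this only illustrates the locking idea and is not a settled design.

 import java.util.concurrent.locks.ReentrantReadWriteLock
 // Resource-updates may flow in parallel, so they take the shared ("read") side
 // of the lock even though they write; a vote-shift needs the tree to be
 // quiescent and up to date, so it takes the exclusive side.
 class WritePath {
   private val lock = new ReentrantReadWriteLock()
   def applyResourceUpdate(run: () => Unit): Unit = {
     lock.readLock().lock()
     try run() finally lock.readLock().unlock()
   }
   def applyVoteShift(run: () => Unit): Unit = {
     lock.writeLock().lock()        // briefly halts all parallel counting
     try run() finally lock.writeLock().unlock()
   }
 }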
We will have to see whether this defeats multithreaded writing, or whether memory access time is the bigger problem and the writer can in fact be single-threaded. This reduction of locking might also be relevant if a distributed algorithm is needed, though.