Hi, I wanted to share a proof-of-capacity scheme that is both simpler and more efficient than the one used by Burstcoin, and which I feel is more straightforward and easier to understand.
Say every miner has a unique public key. Now calculate a list of hashes (using any algorithm) of that public key (or a shortened hash of it) concatenated with a simple counter nonce, e.g.:
Hash(PK||0),
Hash(PK||1),
Hash(PK||2),
Hash(PK||3),
...
Classify these nonces, based on the resulting hashes, into up to 65536 separate files (or "buckets") named '0000'..'ffff' (hex for 0..65535). File '0000' would store all the nonces whose hashes start with '0000', file '0001' all the nonces whose hashes start with '0001', and so on (a short code sketch of this step follows the example below).
The resulting set of files would look something like:
File '0000':
000000f1edfe
000008dfc5a0
0000122c90ed
000054e108d1
0000ababc735
...
File '0001':
0000000063f8
00000005b9c1
00000024e595
00000e64f04b
00000f87f9ee
...
...
File 'ffff':
...
(Note that the buckets contain the nonces, not the hashes themselves. The nonces appear in ascending order because they are generated with a counter and are thus appended in increasing order; the corresponding hashes are not ordered in any way within a bucket.)
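To make the plotting step concrete, here is a minimal sketch in Python. SHA-256, the big-endian nonce encoding, and the plot()/plot_dir names are arbitrary choices of mine for illustration, not part of the scheme:

import hashlib
import os

def plot(pk: bytes, start: int, count: int, plot_dir: str = "plot") -> None:
    # Hash PK||nonce for each counter value and append the 6-byte nonce
    # to the bucket file named after the first two bytes of its hash.
    os.makedirs(plot_dir, exist_ok=True)
    for nonce in range(start, start + count):
        nonce_bytes = nonce.to_bytes(6, "big")   # 6-byte counter nonce
        digest = hashlib.sha256(pk + nonce_bytes).digest()
        bucket = digest[:2].hex()                # '0000'..'ffff'
        # Append-only: an entry is usable as soon as it is written.
        # (A real plotter would buffer entries and batch the writes.)
        with open(os.path.join(plot_dir, bucket), "ab") as f:
            f.write(nonce_bytes)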
Using, say, 6 bytes for each nonce, as shown, allows indexing a potential space of up to 256^6 = ~281 trillion nonces, which at 6 bytes per entry covers up to about 1.7 petabytes of mined data.
This scheme lets a disk-based miner perform 256^2 (65536) times fewer hash operations than a pure processor-only miner when locating a winning nonce: select the appropriate bucket, then test all the nonces in it for a match. That still requires millions of hash operations per lookup for a terabyte-scale mined dataset. However, a simple trick improves this advantage significantly: store one or two additional bytes of the corresponding hash along with the nonce value. Every entry in the bucket would then look like:
[bytes 3 and 4 of the hash][nonce]
Example:
File '0000':
3405, 000000f1edfe
9f29, 000008dfc5a0
a346, 0000122c90ed
1e27, 000054e108d1
5be2, 0000ababc735
...
File '0001':
7ab2, 0000000063f8
1956, 00000005b9c1
323c, 00000024e595
53f3, 00000e64f04b
839c, 00000f87f9ee
...
...
File 'ffff':
...
...
This requires 8 bytes per entry, but now provides an even stronger advantage: the two stored hash bytes act as a cheap filter, so only about 1 in 65536 entries in the selected bucket needs a full hash recomputation, and a processing-only miner would need to calculate 256^4 = ~4 billion times more hashes than a disk-based one, on average (about 9 orders of magnitude more processing power).
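To illustrate the lookup against these 8-byte entries, here is a matching sketch. It assumes the plotting sketch above was changed to write digest[2:4] + nonce_bytes per entry, and the winning condition here (a plain prefix match against some challenge-derived target) is only a placeholder for whatever the protocol would actually specify:

import hashlib
import os

ENTRY_SIZE = 8  # [bytes 3 and 4 of the hash][6-byte nonce]

def lookup(pk: bytes, target: bytes, plot_dir: str = "plot"):
    # target: challenge-derived hash prefix, at least 4 bytes long.
    # Pick the bucket named after the target's first two bytes, then
    # stream it in chunks and test the entries linearly.
    path = os.path.join(plot_dir, target[:2].hex())
    with open(path, "rb") as f:
        while chunk := f.read(ENTRY_SIZE * 65536):
            for i in range(0, len(chunk), ENTRY_SIZE):
                entry = chunk[i:i + ENTRY_SIZE]
                if entry[:2] != target[2:4]:
                    continue  # cheap filter on stored hash bytes 3 and 4
                nonce_bytes = entry[2:]
                # Only ~1 in 256^2 bucket entries reaches this recomputation.
                digest = hashlib.sha256(pk + nonce_bytes).digest()
                if digest.startswith(target):  # placeholder winning condition
                    yield int.from_bytes(nonce_bytes, "big")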
Some notes:
- Since only 8 bytes are stored per entry (regardless of hash length), this scheme is 4 times more space-efficient in terms of storage than the one currently used by Burstcoin (4 times more 'tickets' can be packed into one unit of storage).
- For 1TB of mined data, each bucket would be about 16MB in size (1TB / 65536), so on average only 16MB is read from disk per lookup, and only about 32 hash operations are performed (16MB / 8 bytes = ~2 million entries, of which ~2 million / 65536 ≈ 32 pass the two-byte filter).
- The entries don't need to be sorted in any special order within their bucket. Testing for a match simply uses linear search (streaming the data to memory in chunks), and since individual files are not very large, that is very fast (on the order of several tens to hundreds of milliseconds on a modern hard drive).
- The hash itself can be of any length, e.g. SHA-3 can be used; its SHAKE extendable-output variants even allow digests of arbitrary size, up to several MB.
- Multiple disks can address different ranges of nonces (every disk would have its own set of buckets); see the snippet after these notes.
- The mining process can occur continuously and incrementally in the background. Buckets are append-only files, and entries can be read from them as soon as they are added. This allows the initial mining phase to be relatively long (say, up to several weeks or more) while still providing immediate availability of any tickets generated, and thus potential reward for the miner.
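These last two points map directly onto the plot() sketch from earlier; the mount paths, range sizes, and placeholder key below are made up for illustration:

pk = b"...miner public key bytes..."  # placeholder public key

# Each disk takes its own contiguous nonce range and holds its own
# set of buckets; each call can also run incrementally in the
# background, since the bucket files are append-only.
plot(pk, start=0,     count=2**30, plot_dir="/mnt/disk1/plot")
plot(pk, start=2**30, count=2**30, plot_dir="/mnt/disk2/plot")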
If you've noticed any error(s) in my calculations, please let me know and I'll try to correct them. I'm also interested in hearing about other alternative schemes that have been proposed.