@zero24x the stagger depends on a few factors:
- target file size
- available memory
- IO strategy implemented
granularity = floor ( <target_size> / <memory> )
or, when using gpuplot -buffer
granularity = floor ( <target_size> / <memory> / 2 )
The stagger value directly relates to the amount of memory allocated to the plotting process. Nonces are computed sequentially (each one holds 4096 sequential scoops), but writing 1024 nonces to a file with stagger 1
produces a 256 MiB (1024 * 4096 * 64 Bytes) file that stores the nonces like this:
0,1,2,..,4095 (1024x total)
which is very unfortunate for mining, as a lot of seeks have to be performed.
While mining, we're interested in a specific scoop in all nonces - this hardens Burst against GPU/ASIC attacks.
Therefore, the perfect organization of this example file would be
This organization allows the miner process to read the 1024 scoops needed for the particular block to be solved in one go: a single sequential read of 1024 * 64 Bytes = 64 KiB, as opposed to 1024 separate reads of 64 Bytes with 1023 seeks in between (head movement).
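To make the difference concrete, here is a rough Python sketch. It is my own illustration and not code from any plotter or miner; the function names `scoop_offset` and `contiguous_reads` are made up, and it assumes the file is laid out in groups of `stagger` nonces, each group storing all scoops 0 first, then all scoops 1, and so on.

```python
SCOOP_SIZE = 64           # bytes per scoop
SCOOPS_PER_NONCE = 4096   # scoops per nonce (256 KiB per nonce)

def scoop_offset(nonce_index, scoop, stagger):
    """Byte offset of one scoop, assuming groups of `stagger` nonces (hypothetical helper)."""
    group, pos = divmod(nonce_index, stagger)
    group_start = group * stagger * SCOOPS_PER_NONCE * SCOOP_SIZE
    # Inside a group, the same scoop of all nonces is stored back to back.
    return group_start + (scoop * stagger + pos) * SCOOP_SIZE

def contiguous_reads(nonces, scoop, stagger):
    """Number of separate sequential reads needed to fetch `scoop` from every nonce."""
    offsets = sorted(scoop_offset(n, scoop, stagger) for n in range(nonces))
    reads = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + SCOOP_SIZE:   # gap in the file, so the disk has to seek
            reads += 1
    return reads

print(contiguous_reads(1024, 7, 1))     # 1024 (one read per nonce, 1023 seeks)
print(contiguous_reads(1024, 7, 1024))  # 1 (one sequential 64 KiB read)
```

The scoop number 7 is arbitrary; any scoop from 0 to 4095 gives the same counts.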
So the plotter process computes as many nonces as fit the configured memory limit. Upon writing into the file,
- all scoops 0 are collected and written sequentially,
- repeat for the remaining 4095 scoops.
If you allocated 32 MiB to the plotter (room for 128 nonces of 256 KiB each), the internal structure of the file is:
... (repeated for a total of 8 times).
The plotter can therefore only plot a multiple of 128 nonces (in this example).
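The arithmetic behind the 128 and the 8 can be sketched like this. It is only an illustration of the numbers in this example; `stagger_for_memory` and `groups_in_file` are hypothetical names, not functions of any real plotter.

```python
NONCE_SIZE = 4096 * 64   # 256 KiB per nonce
MIB = 1 << 20

def stagger_for_memory(memory_bytes):
    """How many whole nonces fit into the plotter's memory (hypothetical helper)."""
    return memory_bytes // NONCE_SIZE

def groups_in_file(file_nonces, stagger):
    """How many stagger-sized groups the file consists of."""
    if file_nonces % stagger != 0:
        raise ValueError("nonce count must be a multiple of the stagger")
    return file_nonces // stagger

stagger = stagger_for_memory(32 * MIB)
print(stagger)                        # 128
print(groups_in_file(1024, stagger))  # 8
```

The `ValueError` mirrors the restriction above: a nonce count that is not a multiple of the stagger does not fit this layout.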
gpuplot has another "tweak" - it uses twice the memory for a shadow copy to allow for parallel computing and file I/O.
If your stagger value is less than, say, 131,072 (8 MiB of consecutive data per scoop), your disk may not operate optimally because it spends more time seeking than reading. You then need to optimize your file, that is: reorganize from
But there is a variant on this, and it trades compute power against IO.
Using the gpuplot terminology: you may plot in "direct" or "buffer" mode.
"buffer mode" was example 1 above.
"direct mode" computes scoop 0 for every nonce in the file and writes them all out.
Then it computes scoops 0 to 1 for every nonce, discards scoop 0, and writes all scoops 1 out.
Repeat until finished (4096 scoops).
The problem is that scoops (64 Bytes) cannot be computed "individually" but only as part of a whole nonce (4096 * 64 Bytes).
If you plot a 1 TiB file ( 1 TiB = 4,194,304 nonces ) every single scoop needs 256 MiB ( 1 TiB / 4096 ).
If you assign 4 GiB memory to gpuplot, 16 scoops can be computed in one go ( 4 GiB / 256 MiB ).
No double buffering here, as computing is slow - you throw away ~127 of 128 results - you keep only the requested 16 out of computed 4096 values.
(I think it aborts right after hitting the wanted scoop, hence 127/128 and not 255/256, a 50% speedup).
This scenario only makes sense if your IO is a lot slower than computation (most users : single disk and potent GPU).
The plotter can therefore only plot a multiple of 16 nonces (in this example).
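The direct-mode numbers above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch under the assumptions of this example, not gpuplot's actual logic; `direct_mode_plan` is a made-up name.

```python
from math import ceil

SCOOP_SIZE = 64
SCOOPS_PER_NONCE = 4096
NONCE_SIZE = SCOOP_SIZE * SCOOPS_PER_NONCE   # 256 KiB
GIB = 1 << 30
TIB = 1 << 40

def direct_mode_plan(file_bytes, memory_bytes):
    """Return (nonces, bytes per scoop, scoops per pass, passes) for a direct-mode plot."""
    nonces = file_bytes // NONCE_SIZE
    bytes_per_scoop = nonces * SCOOP_SIZE          # one scoop across all nonces
    scoops_per_pass = memory_bytes // bytes_per_scoop
    if scoops_per_pass == 0:
        raise ValueError("memory too small to hold even one scoop of every nonce")
    passes = ceil(SCOOPS_PER_NONCE / scoops_per_pass)
    return nonces, bytes_per_scoop, scoops_per_pass, passes

print(direct_mode_plan(TIB, 4 * GIB))
# (4194304, 268435456, 16, 256)
```

So with 4 GiB the whole nonce range would have to be recomputed 256 times, which is where the trade of compute power against IO comes from.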
another variant is xplotter, which has a fast mode of pre-allocating a sequential file of the target size on NTFS.
wplot (?) pre-allocated a contiguous file by writing out <target-size> zeroes (to have it physically sequential on disk) and then sought within this file to write a fully optimized file (length==stagger).
It then computes nonces as usual (scoops 0..4095), aggregates them according to available memory, and then writes out a fully optimized file by doing the seeks for perfect placement.
If we still have 4 GiB memory for plotting, the granularity is 256 ( 1 TiB / 4 GiB ).