GPU plot generator v4.1.1 (Win/Linux)





  • @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    I'm currently using 2x 1080 Ti, 2x Xeon E5-2620, 128 GB RAM and 12x 8 TB HDDs. I can get more than 200k nonces/min when plotting to a single Samsung 960 Pro, which should be the maximum those GPUs can calculate. When plotting to those 12 HDDs my speed drops to about 50k nonces/min. It seems like the GPUs are only working when the HDDs have finished writing, and the HDDs are only writing when the GPUs have finished calculating.

    tco42 ~ You mention writing plots to a Samsung SSD. Is that a large-capacity device, and do you then transfer the completed plots over to your 8 TB HDD workers? Also, are you able to combine the 2x GPUs via SLI, and does that work any better? And are you able to plot to several HDDs simultaneously from those powerful video cards, as others suggest, or is that just not viable?

    Thx.



  • @vaxman

    1. Those disks (HGST Deskstar 8 TB) are all connected to a 24-port SAS HBA (https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-24i).
      I tested the maximum speed of the HBA by copying from my first 12 HDDs to the second set of 12 HDDs, which resulted in about 200 MB/s per drive -> 2.4 GB/s total read and write with all HDDs connected to the same HBA.

    2. I'm currently using NTFS.

    @BeholdMiNuggets
    Those plots were just some tests to find the bottleneck that limits my plotting.
    You don't need to connect the GPUs using SLI; there are only minor differences. Those GPUs should be able to push lots of data to the drives (200k nonces/min = 50 GB/min = 20 min/TB), but there is a bottleneck (maybe in the plotter, maybe in Windows) that I'm still trying to find.
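    The arithmetic behind those throughput figures can be checked quickly; this sketch assumes the standard Burst nonce size of 4096 scoops x 64 bytes, which matches the figure quoted later in the thread:

```python
# Sanity check of the quoted throughput figures.
# Assumption: one Burst nonce = 4096 scoops * 64 bytes = 256 KiB.
NONCE_BYTES = 4096 * 64                    # 262,144 bytes per nonce
rate = 200_000                             # nonces per minute

bytes_per_min = rate * NONCE_BYTES
gb_per_min = bytes_per_min / 1e9           # ~52.4 GB/min ("50 GB/min")
min_per_tb = 1e12 / bytes_per_min          # ~19.1 min/TB ("20 min/TB")
print(f"{gb_per_min:.1f} GB/min, {min_per_tb:.1f} min/TB")
```

    So "200k nonces/min = 50 GB/min = 20 min/TB" is a rounded but consistent statement.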



  • @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    1. Those disks (HGST Deskstar 8 TB) are all connected to a 24-port SAS HBA.

    Do you find the speed difference between SAS and SATA makes any difference for Burst mining? And what about the hard-drive cache?

    (https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-24i).

    I only see 12 data slots on that [PCIe x8] card. Do they all double up? I presume you also have a dedicated (rack) server case to house all those 24 drives. Any pictures?

    I tested the maximum speed of the HBA by copying from my first 12 HDDs to the second set of 12 HDDs, which resulted in about 200 MB/s per drive -> 2.4 GB/s total read and write with all HDDs connected to the same HBA.

    I guess that's why you need server CPUs. What's the mobo & PSU?

    2. I'm currently using NTFS.

    Is there a possibly superior option that you're considering?

    Those plots were just some tests to find the bottleneck that limits my plotting. You don't need to connect the GPUs using SLI; there are only minor differences. Those GPUs should be able to push lots of data to the drives (200k nonces/min = 50 GB/min = 20 min/TB). But there is a bottleneck (maybe in the plotter, maybe in Windows) that I'm still trying to find.

    Fortunately, each HDD only has to be plotted once or twice, so it's not that burdensome considering the projected working life of the drives.



  • @BeholdMiNuggets

    @BeholdMiNuggets said in GPU plot generator v4.0.3 (Win/Linux):

    @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    1. Those disks (HGST Deskstar 8 TB) are all connected to a 24-port SAS HBA.

    Do you find the speed difference between SAS and SATA makes any difference for Burst mining? And what about the hard-drive cache?

    (https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-24i).

    I only see 12 data slots on that [PCIe x8] card. Do they all double up? I presume you also have a dedicated (rack) server case to house all those 24 drives. Any pictures?

    SAS or SATA makes no difference in this case, because I use SATA drives and SAS-to-SATA adapter cables.

    There are 6 SAS connectors on the Broadcom card, and each connector carries 4 SAS3 links. My "professional" rack case is self-made :D I will post pictures soon.

    I tested the maximum speed of the HBA by copying from my first 12 HDDs to the second set of 12 HDDs, which resulted in about 200 MB/s per drive -> 2.4 GB/s total read and write with all HDDs connected to the same HBA.

    I guess that's why you need server CPUs. What's the mobo & PSU?

    You can use the same PCIe adapter in any mobo with a PCIe x8 slot; it does not need much CPU power. The main reason to go for Xeons is that I get 2x 40 PCIe lanes, which will be quite useful when I use this server as a workstation/gaming PC once mining is no longer profitable.

    2. I'm currently using NTFS.

    Is there a possibly superior option that you're considering?

    EXT4 could be useful when switching from Windows to Linux.

    Those plots were just some tests to find the bottleneck that limits my plotting. You don't need to connect the GPUs using SLI; there are only minor differences. Those GPUs should be able to push lots of data to the drives (200k nonces/min = 50 GB/min = 20 min/TB). But there is a bottleneck (maybe in the plotter, maybe in Windows) that I'm still trying to find.

    Fortunately, each HDD only has to be plotted once or twice, so it's not that burdensome considering the projected working life of the drives.

    Yeah, that's a big plus. But I still needed about 120 h to plot just 12 drives (96 TB), and this server might soon be upgraded to more than a PB :)



  • I understood you have 24 drives.

    How are they organized?

    • Is it 24 individual filesystems?
    • Or groups of four disks with one filesystem?
      • If so, does the LSI do the striping, or did you use Windows to stripe them?

    You're aiming for a petabyte with Windows, hats off to you!



  • @vaxman
    They are 24 separate volumes mounted inside a single folder (c:\burst\hdd0, c:\burst\hdd1, ...). I also tried mounting them to separate drive letters (d:\, e:\, ...), but that didn't change anything...

    I think I will switch to Linux soon, but Windows was faster to set up... I could not wait any longer to start plotting ;)



  • @tco42 to how many targets are you plotting concurrently (in parallel)?

    Don't tell me you plot one drive and wonder why your bandwidth does not exceed 200 MB/s.



  • @vaxman I made 2 batches of 12 HDDs each... more drives per batch helps a lot, because it doesn't increase the time needed to generate the empty files (about 13-14 h) before the plotting starts in direct mode



  • @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    @vaxman I made 2 batches of 12 HDDs each... more drives per batch helps a lot, because it doesn't increase the time needed to generate the empty files (about 13-14 h) before the plotting starts in direct mode

    Once the pre-allocation is finished, to how many targets are you plotting in parallel - 12?
    Are these SMR drives? Direct mode on SMR may or may not be performant. I don't know the specifics of your setup, but let's play with the numbers:

    1 drive plotting:
    64 GiB of RAM for collecting staggers -> for every 64 GiB "round", your disk does this 4096 times:

    • write 16 MiB
    • seek to the next spot

    When plotting 4 drives in parallel (I never did this myself, just speculating wildly, but at least the numbers should be correct):
    64 GiB of RAM for collecting staggers -> for every 64 GiB "round", each of your 4 disks does this 4096 times:

    • write 4 MiB
    • seek to the next spot

    Plotting 12 in parallel: each of the 12 disks, 4096 times per round:

    • write 1.33 MiB
    • seek

    None of these scenarios is ideal for SMR, as the sequentially written data volume is too low. There is virtually no documentation on the required placement and size of the shingles, but anything that writes "randomly" is just bad for SMR performance.
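    The per-seek write sizes in the three scenarios above all follow from one division: a 64 GiB stagger buffer flushed in 4096 chunks, split evenly across N target drives.

```python
# Per-drive write size per seek for the three scenarios above:
# a 64 GiB stagger buffer flushed in 4096 chunks, divided over N drives.
ROUND_MIB = 64 * 1024                  # 64 GiB buffer, expressed in MiB
CHUNKS = 4096
chunk_mib = ROUND_MIB / CHUNKS         # 16 MiB flushed per chunk in total
for drives in (1, 4, 12):
    print(f"{drives:2d} drives -> {chunk_mib / drives:.2f} MiB per write")
```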

    To pull you over to the Unix side: try a copy-on-write filesystem like ZFS.
    I've plotted 8x 8 TB recently; the SMR drives accepted data at 150+ MB/s, because the writes were strictly sequential.
    I plotted to PMR drives first, then optimized to SMR.

    For that you'd need twice as many PMR disks (or stripes) as GPUs to keep it streaming:

    plot tempA while optimizing tempB to target1
    plot tempB while optimizing tempA to target1

    The most economical way would be (striped) old SATA disks; the fastest would be SSD(s) as the temp target.



  • @vaxman
    Yes, there are 12 targets in parallel.

    They are PMR drives, not SMR.

    I used 96 GB of RAM (8 GB per drive, 2 MB per write). This might be a problem, but 2 MB per seek should be enough to get 800 MB/s / 12 drives = 66 MB/s on a PMR drive...
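    Whether 2 MB per seek can really deliver 66 MB/s per drive depends on seek overhead. A back-of-the-envelope model (the sequential rate and seek time below are assumed round numbers, not measurements) suggests it is plausible on PMR:

```python
# Rough per-drive throughput model: every 2 MB write pays one seek.
# Assumed figures: 200 MB/s sequential rate, ~10 ms average seek.
seq_mb_s = 200.0       # assumed sequential write speed (MB/s)
seek_s = 0.010         # assumed average seek + settle time (s)
write_mb = 2.0         # data written between seeks (MB)

t_per_write = write_mb / seq_mb_s + seek_s   # 0.020 s per 2 MB
effective = write_mb / t_per_write           # ~100 MB/s effective
print(f"~{effective:.0f} MB/s per drive (66 MB/s needed)")
```

    Under these assumptions each drive tops out around 100 MB/s, comfortably above the 66 MB/s target, which supports the suspicion that the bottleneck was elsewhere.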

    If I understand the code correctly and assume equal GPUs and HDDs, the generated data should be evenly distributed between the drives (CommandGenerate.cpp, line 320). I guess there might be a problem, because I sometimes see single drives idle for a longer time; this happens when the GPUs stop calculating new plots. I think this could be caused by Windows and its thread handling, when the right thread does not run fast enough to get to line 404...

    I'm already using ZFS, but only on my small FreeNAS system with 4x 6 TB ;) I really like ZFS, but I'm not sure there is any benefit for mining...



  • @tco42

    the generated data should be evenly distributed

    I didn't read the code (I wouldn't understand much of it, I guess), but are you sure that computing a nonce (4096 * 64 bytes) is context-free?

    Easily checked:
    Make 12 directories on your SSD and plot 12 files of, say, 10 GB each in parallel.
    Then scale down to 8, 6, 4, 2 parallel plots and observe the plot time.
    If it scales linearly, you are good to go as parallel as you like.
    If it gets faster the less you parallelize, the algorithm is not "context-free" and needs more context than fits into local (GPU) memory.

    Optionally adjust the file size so you always plot the same total amount of data.
    Finding the sweet spot...
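    A minimal timing harness for the scaling test above could look like this; the plotter invocation itself is a placeholder (the wrapper script name is hypothetical), so substitute your actual gpuPlotGenerator command line:

```python
# Time N commands running in parallel; report total wall-clock seconds.
import subprocess
import time

def run_parallel(cmds):
    """Launch all commands simultaneously and wait for every one."""
    start = time.time()
    procs = [subprocess.Popen(c) for c in cmds]
    for p in procs:
        p.wait()
    return time.time() - start

# Example sweep (plot_one.sh is a hypothetical stand-in for one
# plotter invocation writing a fixed-size plot file to one directory):
# for n in (12, 8, 6, 4, 2):
#     t = run_parallel([["./plot_one.sh", f"ssd/dir{i}"] for i in range(n)])
#     print(f"{n:2d} parallel plots: {t:.0f} s")
```

    If the reported times stay roughly constant as n drops, the GPU is the bottleneck and parallelism is free; if they shrink, contention is eating into the plot rate.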



  • @vaxman
    I spent the last few days trying your suggestions, but it didn't matter how many plots were written to the SSD; the speed stayed the same (about 200k/min) the whole time. It's the same as plotting to RAM, since it's a quite powerful Samsung 960 Pro with write speeds exceeding 2 GB/s.

    When the next 24 HDDs arrived I tried switching to Linux (Ubuntu 17.04) and the ext4 filesystem, and it seems I was right to blame the problems on Windows.

    • ext4 does not need to create the plot file before starting to write to it; this saves about 12-15 h at the start of the plot
    • Linux seems to handle those threads much better than Windows; with the same settings and the same hardware, my nonces/min stay at around 210-220k
    • it does not matter how many plot files are used (as long as there are more than 5, which are required to reach full write speed); the nonces/min stay above 210k
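    The first bullet's behaviour comes from ext4's extent-based pre-allocation: fallocate() reserves a file's blocks without writing zeros, so "creating" a multi-terabyte plot file is near-instant. A minimal sketch (Linux only; the path and size here are illustrative):

```python
# Pre-allocate a file instantly via fallocate (ext4/XFS on Linux).
# The plot file gets its full size up front without hours of zero-writes.
import os

def preallocate(path, size_bytes):
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)  # near-instant on ext4
    finally:
        os.close(fd)

preallocate("/tmp/plot_demo.bin", 16 * 1024 * 1024)   # 16 MiB demo file
print(os.path.getsize("/tmp/plot_demo.bin"))
```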

    After testing for some hours I completely wiped the new HDDs and started plotting around 10 h ago. Currently 20% is done, and there are less than 40 h remaining while plotting 24x 8 TB.

    -> Switching from Windows (120 h for 12x 8 TB) to Linux (50 h for 24x 8 TB) is a very good choice if you want to plot in direct mode or to many drives at the same time :)
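    The quoted run times work out to roughly a fivefold throughput improvement:

```python
# Throughput comparison from the figures in this post
# (TB = nominal drive terabytes, as used above).
win_tb, win_h = 12 * 8, 120     # Windows run: 96 TB in 120 h
lin_tb, lin_h = 24 * 8, 50      # Linux run: 192 TB in 50 h

win_rate = win_tb / win_h       # 0.8 TB/h
lin_rate = lin_tb / lin_h       # 3.84 TB/h
print(f"Linux run is ~{lin_rate / win_rate:.1f}x faster")
```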



  • I forgot to mention that I'm mining the first 24x 8 TB (NTFS formatted, some 8 TB plot files, some 1 TB plot files) at the same time. Read speeds are around 4 GB/s, round times below 15 s even while plotting.



  • I updated the GPU plot generator (v4.1.1). It now comes with file pre-allocation when launched with admin rights, which greatly speeds things up in terms of IO operations.

    I'm searching for Linux and macOS owners to test it. I'll provide pre-built binaries for these OSes soon to ease things up.



  • Thanks for these updates, cryo! I had been using an older version, which worked well via the buffer method (plots 5 TB in about 12-13 hours), but I was never able to get direct mode to work, which is of course preferable. So I just downloaded 4.1.1 and am eager to try direct mode again to see what happens.

    One question though... I noticed that after downloading both 4.1.1 and 4.1.0 from your GitHub page, the ZIP only contains the .exe file. Is this correct? So the exe is the only file that changed, and if so, I should just replace the old exe with the new one and leave all of the other files (dll, bat, txt) intact? Or do some of these files also need to be replaced? If so, where do I find a download with everything? I'm not sure what version I am using, as I can't see it anywhere.

    Thanks for the help!



  • I changed the build system, so the only files required are the .exe and the [kernel] folder.
    If it doesn't launch due to missing DLLs, install the Microsoft Visual C++ 2015 redistributables.



  • @cryo thanks for the info. I just started a small 500 GB plot using 4.1.1 in direct mode, and it launched fine after I installed the C++ package. However, I wanted to sanity-check the write speed, as it seems very low. I am only getting about 8-12 MB/s in direct mode, compared to 80-120 MB/s using buffer mode in a previous version. Certainly I expected this to be somewhat slower, since it writes plots already optimized, but is 10x slower reasonable? Or have I maybe set something wrong?

    I used the same devices.txt setting (0_0_4096_128_8192) as with buffer mode, and the same 20000 value in my .bat to use ~5 GB of CPU memory. Any tips would be appreciated.



  • Hmmm, and now it has dropped to 2-3 MB/s, which means it will never finish at that rate. It's been stuck there for more than 10 minutes, so there must be something wrong.


  • admin

    @GabryRox In direct mode there is a long delay while it builds the empty file before filling it in - just wait, it'll get faster. As long as it's not an SMR drive...

