I normally run just 2-3 data threads and max intensity for the CPU threads. It reads 2-3 drives at a time at around 90MB/s each, so total throughput tops out a bit under 300MB/s across the array.
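As a rough sanity check on those numbers, here's a quick sketch of the arithmetic. It assumes each data thread keeps one drive busy and that the drives don't slow each other down, which is a simplification. The function name and the 90MB/s figure are just for illustration, so plug in whatever your own drives actually measure.

```python
def aggregate_throughput_mb_s(drives_in_parallel: int, per_drive_mb_s: float) -> float:
    """Best-case combined read speed, assuming the drives scale linearly
    (no controller, backplane, or CPU bottleneck)."""
    return drives_in_parallel * per_drive_mb_s


if __name__ == "__main__":
    # Illustrative per-drive speed taken from the post; measure your own drives.
    per_drive = 90.0
    for threads in (2, 3):
        total = aggregate_throughput_mb_s(threads, per_drive)
        print(f"{threads} data threads: about {total:.0f} MB/s combined")
```

If your measured total comes in well below this best-case figure, something other than the drives themselves is probably holding things back.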
I don't have nearly enough TB to warrant using the GPU to mine. The hard drives are the bottleneck until you get to crazy high amounts of TB.
I'm using nice workstation-class computers, so there are no real CPU, memory, or backplane bottlenecks, at least not at the 30TB I'm at right now.
Could be a LOT of things affecting the throughput, from how you plot the files and what size you make them, to the computer hardware. I'd say try different things until you find ways to improve it. It takes time.