@luxe Thank you for your feedback. Yes I totally agree with you: the more RAM you have the faster it'll be. I reorder the generated plots in the CPU RAM buffer before writing 4096 times to the final file, no matter what the PLOTS_NB is.
Another tweak is to ensure the CPU RAM buffer is envenly dividible by the GPU RAM size. Thus the graphic card won't be put on hold while the writing occurs.
direct mode improvements are on the 4.1+ version. I updated the plotter to v4.1.3 to support CentOS (6 & 7) build (there is even CentOS binaries compiled against CUDA 8.0 SDK on my repository now).
About your mixed AMD/NVidia environment, there is no runtime inter-compatibility between AMD and NVidia cards for now.
Nevertheless, you can build two plotters: one with the
libOpenCL.so from the CUDA SDK, and one with the AMD SDK. You won't be able to plot to one single file with your two cards together but it's not the best choice anyway if you use the