I’m upgrading my OpenGL engine to work in larger batches.
For instanced rendering of dynamic objects, I upload “packed” (32-byte) per-object transform data through an SSBO. I then run a shader that unpacks the transforms and does the matrix multiplications, and finally I draw with instanced rendering. (I know I can do better with indirect draw calls and more aggressive packing; these would be my next steps.)
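To make the 32-byte figure concrete, here is a rough CPU-side sketch of what such a packed transform and its unpacking could look like. The layout (translation, uniform scale, unit quaternion) and the names `PackedTransform` and `unpack` are assumptions for illustration only, not necessarily the layout actually used; the same math would normally live in the unpacking shader.

```cpp
#include <array>

// Hypothetical 32-byte layout: translation + uniform scale, then a rotation quaternion.
struct PackedTransform {
    float px, py, pz;      // translation
    float scale;           // uniform scale (assumed; non-uniform scale would not fit)
    float qx, qy, qz, qw;  // rotation quaternion, assumed to be normalized
};
static_assert(sizeof(PackedTransform) == 32, "one instance must stay 32 bytes");

// Column-major 4x4 matrix, matching OpenGL's default convention.
using Mat4 = std::array<float, 16>;

// Expand one packed transform into a full model matrix (T * R * S).
Mat4 unpack(const PackedTransform& t) {
    const float x = t.qx, y = t.qy, z = t.qz, w = t.qw, s = t.scale;
    Mat4 m{};  // zero-initialized
    // Column 0: rotated and scaled X axis
    m[0] = s * (1.0f - 2.0f * (y * y + z * z));
    m[1] = s * (2.0f * (x * y + z * w));
    m[2] = s * (2.0f * (x * z - y * w));
    // Column 1: rotated and scaled Y axis
    m[4] = s * (2.0f * (x * y - z * w));
    m[5] = s * (1.0f - 2.0f * (x * x + z * z));
    m[6] = s * (2.0f * (y * z + x * w));
    // Column 2: rotated and scaled Z axis
    m[8]  = s * (2.0f * (x * z + y * w));
    m[9]  = s * (2.0f * (y * z - x * w));
    m[10] = s * (1.0f - 2.0f * (x * x + y * y));
    // Column 3: translation
    m[12] = t.px;
    m[13] = t.py;
    m[14] = t.pz;
    m[15] = 1.0f;
    return m;
}
```

With this assumed layout, each instance costs 32 bytes of upload instead of 64 for a full `mat4`, which is why the packing matters when the CPU->GPU transfer is the limiting factor.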
Through experimentation, I found that the bottleneck is the CPU->GPU SSBO upload. I watched the talks by John McDonald and Cass Everitt and I’m …