Hacker News

PhilipTrettner
How Much Linear Memory Access Is Enough? (solidean.com)

PhilipTrettner (OP) 3 days ago

I looked into this because part of our pipeline is forced to be chunked. Most advice I've seen boils down to "more contiguity = better", but without numbers, or at least none that generalize.

My concrete tasks already reach peak performance before 128 kB, and I couldn't find pure processing workloads that benefit significantly beyond a 1 MB chunk size. The code is linked in the post; it would be nice to see results on more systems.
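The measurement pattern being discussed can be sketched as follows. This is not the author's benchmark (which is linked in the post and written as native code); it is a hypothetical Python illustration of the idea: run the same reduction over one large buffer at several chunk sizes and watch where throughput plateaus. Pure Python won't expose cache effects the way native code does, so treat the numbers as illustrative only.

```python
# Sketch only: time a simple reduction (sum of bytes) over a large
# buffer, processed in chunks of varying size. The hypothesis from
# the thread is that throughput plateaus somewhere around 64-128 kB.
import time


def chunked_sum(buf: bytes, chunk_size: int) -> int:
    """Reduce the buffer chunk by chunk; result is independent of chunk size."""
    total = 0
    for off in range(0, len(buf), chunk_size):
        total += sum(buf[off:off + chunk_size])  # per-chunk reduction
    return total


def throughput_mb_s(buf: bytes, chunk_size: int) -> float:
    """Bytes processed per second, in MB/s, for one full pass."""
    start = time.perf_counter()
    chunked_sum(buf, chunk_size)
    elapsed = time.perf_counter() - start
    return len(buf) / elapsed / 1e6


if __name__ == "__main__":
    data = bytes(range(256)) * (16 * 1024)  # 4 MiB test buffer
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        print(f"{size // 1024:5d} kB chunks: {throughput_mb_s(data, size):8.1f} MB/s")
```

The reduction result must be identical for every chunk size; only the timing changes, which is what makes chunk size a pure tuning knob here.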

twoodfin 4 hours ago

Your results match similar analyses of database systems I’ve seen.

64KB-128KB seems like the sweet spot.

_zoltan_ 2 hours ago

is this an attempt at nerd sniping? ;-)

on GPU databases sometimes we go up to the GB range per "item of work" (input permitting) as it's very efficient.

I need to add it to my TODO list to have a look at your github code...

PhilipTrettner (OP) an hour ago

It definitely worked on myself :)

Do have a look; I've tried to keep it small and readable, effectively ~250 LOC.

Also, this is CPU only. I'm not sure what a good GPU version of my benchmark would look like, though. Maybe measuring a "map" rather than a "reduction" like I do on the CPU? We should probably look at common chunking patterns there.
