A brisk, brilliantly coded tutorial on vector quantisation: how far you can push compression on model KV caches and embeddings without breaking what matters. The interactive sliders and diagrams do the teaching before the maths catches up.
Someone’s clearly done this with care. Bravo.