machine-learning

an archive of posts in this category

May 04, 2026	Why my INT4 and INT8 KV cache quantization gave bitwise-identical perplexity
May 03, 2026	T4 GPU + Llama: why your attention OOMs at 16K and the one-line fix