May 04, 2026 Why my INT4 and INT8 KV cache quantization gave bitwise-identical perplexity
May 03, 2026 T4 GPU + Llama: why your attention OOMs at 16K and the one-line fix