Kimi K2.6 benchmarks: strong at coding, still inconclusive for general reasoning