01. LLM Quality
weight 15
How capable is the underlying language model at reasoning, handling follow-ups, and staying accurate?
- 0 / 10: Scripted responses, no real LLM, frequent nonsense.
- 5 / 10: GPT-3.5-class behaviour, often useful but hallucinates.
- 10 / 10: Frontier-model behaviour: nuanced reasoning, tool-use, low hallucination.
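
The rubric gives this criterion a weight of 15 and a 0-to-10 score, but does not say how the two combine. As a minimal sketch, assuming the score scales linearly into the weight (so 10 / 10 earns the full 15), the contribution could be computed as below. The function name `weighted_contribution` and the linear scaling are illustrative assumptions, not something the rubric states.

```python
def weighted_contribution(score: float, weight: float, max_score: float = 10.0) -> float:
    """Scale a rubric score in [0, max_score] to its share of the criterion weight.

    Assumption: linear scaling, so a full score earns the whole weight.
    """
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}, got {score}")
    return weight * score / max_score


# Hypothetical constant mirroring "weight 15" for the LLM Quality criterion.
LLM_QUALITY_WEIGHT = 15

if __name__ == "__main__":
    # A mid-range 5 / 10 earns half of the criterion's weight.
    print(weighted_contribution(5, LLM_QUALITY_WEIGHT))  # 7.5
```

Under this assumption, a product scored 5 / 10 on LLM Quality would contribute 7.5 of the 15 available points.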