* chore: add top p option in llm-tool
* chore: wire up the top p with async generate
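
For context, a minimal sketch of what a top p (nucleus) option typically does at sampling time. This is not the code from this change: the function names `top_p_filter` and `sample_top_p` are hypothetical, and the sketch assumes the model output has already been converted to a plain list of probabilities.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample_top_p(probs, top_p, rng=random):
    """Draw one token index from the top-p filtered distribution."""
    filtered = top_p_filter(probs, top_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
```

With `top_p=1.0` every token stays eligible, so the option can default to plain sampling; lower values cut off the low-probability tail.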
- handle loading models with different names for the safetensors files (gemma)
- handle merge tokens that can't be split
- organize code into Load/Evaluate