* handle partially quantized models
- fixes #53, #69, #71, #74
- added a default prompt of an appropriate form to make testing the models easier
- while working on the model configurations, also added additional stop tokens (#74)
- fixed the repetitionPenalty code (#71)
- removed the async LLM generation path -- maintaining it doubled our work, and it does not match the style used in the example applications
- package generation parameters into a struct
- refactor command line arguments into distinct pieces based on their use
- this will be reusable in the lora commands
- handle loading models whose safetensors files use different names (e.g. Gemma)
- handle merge tokens that can't be split
- organize code into Load/Evaluate
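For the partially quantized model handling above, a minimal sketch of the idea: a layer is quantized only when its dimensions divide evenly into quantization groups, and incompatible layers stay in full precision. The function name and signature here are hypothetical, not the actual implementation:

```swift
// Hypothetical sketch: a layer is only quantizable when its input
// dimension divides evenly into quantization groups; otherwise it is
// left in full precision, yielding a partially quantized model.
func shouldQuantize(inputDim: Int, groupSize: Int) -> Bool {
    inputDim % groupSize == 0
}
```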
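For the additional stop tokens (#74), the generation loop needs to stop on any of a per-model set of token ids in addition to the tokenizer's EOS. A sketch with hypothetical names, not the actual loop:

```swift
// Sketch with hypothetical names: stop generation on the tokenizer's
// EOS id or on any extra stop token id supplied by the model configuration.
func generate(nextToken: () -> Int, eosId: Int, extraStopIds: Set<Int>, maxTokens: Int) -> [Int] {
    var tokens: [Int] = []
    while tokens.count < maxTokens {
        let token = nextToken()
        if token == eosId || extraStopIds.contains(token) { break }
        tokens.append(token)
    }
    return tokens
}
```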
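A common formulation of repetition penalty (the CTRL-style rule) divides positive logits and multiplies negative ones by the penalty; a plain-Swift sketch of that rule, with hypothetical names and not necessarily the exact fix in #71 (the real code operates on MLX arrays):

```swift
// CTRL-style repetition penalty sketch: recently seen tokens have
// positive logits divided and negative logits multiplied by the penalty,
// making them less likely to be sampled again in either case.
func applyRepetitionPenalty(to logits: inout [Float], recentTokens: [Int], penalty: Float) {
    for token in Set(recentTokens) {
        let value = logits[token]
        logits[token] = value < 0 ? value * penalty : value / penalty
    }
}
```

Dividing a negative logit would *raise* the token's probability, which is why the sign check matters.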
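A minimal sketch of what packaging the generation parameters into a struct might look like; the field names and defaults are illustrative assumptions, not the actual API:

```swift
// Illustrative sketch -- fields and defaults are assumptions.
// Grouping the knobs in one value type lets commands accept and
// forward them as a unit.
struct GenerateParameters {
    var temperature: Float = 0.6
    var topP: Float = 0.9
    var repetitionPenalty: Float? = nil
    var repetitionContextSize: Int = 20
    var maxTokens: Int = 256
}
```

Passing one value type around (rather than a growing argument list) is also what makes the refactored command line pieces reusable in the lora commands.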
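For models like Gemma that ship differently named safetensors files, one way to avoid hard-coding a weight file name is to discover the files by extension. A sketch using Foundation (the function name is hypothetical):

```swift
import Foundation

// Sketch (hypothetical name): discover weight files by the .safetensors
// extension instead of assuming a fixed file name, since some models
// (e.g. Gemma) name their weight files differently.
func weightFiles(in directory: URL) throws -> [URL] {
    try FileManager.default
        .contentsOfDirectory(at: directory, includingPropertiesForKeys: nil)
        .filter { $0.pathExtension == "safetensors" }
}
```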