llm-tool
See the various library and tool READMEs in this repository for more background.
Building
Build the llm-tool scheme in Xcode.
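You can also build from the command line with xcodebuild; a rough sketch, assuming the Xcode project at the repository root is named mlx-swift-examples.xcodeproj:
xcodebuild -project mlx-swift-examples.xcodeproj -scheme llm-tool -configuration Release build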
Running (Xcode)
To run this in Xcode, press cmd-opt-r to set the scheme arguments. For example:
--model mlx-community/Mistral-7B-v0.1-hf-4bit-mlx
--prompt "swift programming language"
--max-tokens 50
Then cmd-r to run.
Note: you may be prompted for access to your Documents directory -- this is where the Hugging Face HubApi stores the downloaded files.
The model should be the path of a model repository on the Hugging Face Hub, e.g.:
- mlx-community/Mistral-7B-v0.1-hf-4bit-mlx
- mlx-community/phi-2-hf-4bit-mlx
See the LLM library for more info.
Running (Command Line)
Use the mlx-run script to run the command line tools:
./mlx-run llm-tool --prompt "swift programming language"
By default this will find and run the tool built in the Release configuration. Specify --debug
to find and run the tool built in the Debug configuration.
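For example (assuming mlx-run takes the flag before the tool name):
./mlx-run --debug llm-tool --prompt "swift programming language"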
Troubleshooting
If the program crashes with a very deep stack trace, you may need to build in the Release configuration. This seems to depend on the size of the model.
There are a couple of options:
- build Release
- force the model evaluation to run on the main thread, e.g. using @MainActor (see the sketch at the end of this section)
- build `Cmlx` with optimizations by modifying `mlx/Package.swift` and adding `.unsafeFlags(["-O"])` around line 87
Building in Release (or with optimizations) removes many of the tail calls in the C++ layer that otherwise lead to the stack overflows.
See discussion here: https://github.com/ml-explore/mlx-swift-examples/issues/3
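As a rough illustration of the @MainActor option above, here is a minimal sketch of isolating the evaluation step to the main actor. The function names and body are placeholders for whatever generation call the tool actually makes, not the tool's real API:

```swift
import Foundation

// Placeholder for the real model-evaluation / generation call.
// Marking it @MainActor forces it to run on the main thread, whose larger
// stack tolerates the deep tail-call chains in the C++ layer.
@MainActor
func evaluateModel(prompt: String) -> String {
    // ... perform the MLX evaluation / token generation here ...
    return "generated text for: \(prompt)"
}

func generate() async {
    // Calling from an async context hops onto the main actor automatically.
    let output = await evaluateModel(prompt: "swift programming language")
    print(output)
}
```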