use memory limit API (#13)

* add buffer cache limit

* swift-format

* a more reasonable size

* add memory stats to command line tool, update to final api

* add note about changing models
David Koski
2024-03-05 15:22:12 -08:00
committed by GitHub
parent 430b464c8d
commit 61105bf0c4
3 changed files with 115 additions and 39 deletions


@@ -16,9 +16,25 @@ Some notes about the setup:
- this downloads models from Hugging Face, so LLMEval -> Signing & Capabilities has the "Outgoing Connections (Client)" capability set in the App Sandbox
- LLM models are large, so this uses the Increased Memory Limit entitlement on iOS to allow increased memory limits on devices that have more memory
- `MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)` is used to limit the buffer cache size (see the sketch after this list)
- The Phi2 4-bit model is small enough to run on some iPhone models
    - this can be changed by editing `let modelConfiguration = ModelConfiguration.phi4bit`
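
The cache limit above pairs with the memory limit API that this commit adopts. Below is a minimal sketch of how the calls might be combined before loading a model; `GPU.set(cacheLimit:)` comes from the note above, while `GPU.set(memoryLimit:relaxed:)` and the memory-stat properties are assumptions about the MLX Swift API and may differ by version.

```swift
import MLX

// Keep the Metal buffer cache small so freed buffers are returned promptly.
MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)

// Assumed memory-limit setter: cap overall GPU memory use (here ~2 GB);
// relaxed: true lets allocations exceed the cap rather than fail outright.
MLX.GPU.set(memoryLimit: 2 * 1024 * 1024 * 1024, relaxed: true)

// Assumed stats properties, e.g. for printing memory usage from a command line tool.
print("active: \(MLX.GPU.activeMemory) bytes")
print("cache:  \(MLX.GPU.cacheMemory) bytes")
print("peak:   \(MLX.GPU.peakMemory) bytes")
```
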
### Trying Different Models
The example application uses the Phi2 model by default; see [ContentView.swift](ContentView.swift#L58):
```swift
/// this controls which model loads -- phi4bit is one of the smaller ones so this will fit on
/// more devices
let modelConfiguration = ModelConfiguration.phi4bit
```
There are some pre-configured models in [LLM/Models.swift](../../Libraries/LLM/Models.swift#L62),
and you can load any weights from Hugging Face as long as the model architecture
is defined and you have enough memory.
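
As a hedged sketch of switching models, you can either pick another preset from Models.swift or build a configuration from a Hugging Face repo id. The `ModelConfiguration(id:)` initializer and the repo id shown are assumptions for illustration and may not match the actual API.

```swift
import LLM

// Use one of the presets defined in Libraries/LLM/Models.swift ...
let modelConfiguration = ModelConfiguration.phi4bit

// ... or (assumed initializer) point at any Hugging Face repo whose
// architecture the LLM library implements; the repo id below is only an example.
let custom = ModelConfiguration(id: "mlx-community/Mistral-7B-Instruct-v0.2-4bit")
```
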
### Troubleshooting
If the program crashes with a very deep stack trace, you may need to build