use memory limit API (#13)
* add buffer cache limit
* swift-format
* a more reasonable size
* add memory stats to command line tool, update to final api
* add note about changing models
@@ -73,6 +73,9 @@ class LLMEvaluator {

    func load() async throws -> (LLMModel, LLM.Tokenizer) {
        switch loadState {
        case .idle:
            // limit the buffer cache
            MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)

            let (model, tokenizer) = try await LLM.load(configuration: modelConfiguration) {
                [modelConfiguration] progress in
                DispatchQueue.main.sync {
@@ -80,6 +83,8 @@ class LLMEvaluator {
                    "Downloading \(modelConfiguration.id): \(Int(progress.fractionCompleted * 100))%"
                }
            }
            self.output =
                "Loaded \(modelConfiguration.id). Weights: \(MLX.GPU.activeMemory / 1024 / 1024)M"
            loadState = .loaded(model, tokenizer)
            return (model, tokenizer)
@@ -16,9 +16,25 @@ Some notes about the setup:

- this downloads models from Hugging Face, so LLMEval -> Signing & Capabilities has the "Outgoing Connections (Client)" capability set in the App Sandbox
- LLM models are large, so this uses the Increased Memory Limit entitlement on iOS to allow ... increased memory limits on devices that have more memory
- `MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)` is used to limit the buffer cache size
- the Phi2 4-bit model is small enough to run on some iPhone models
    - this can be changed by editing `let modelConfiguration = ModelConfiguration.phi4bit`
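The memory controls described above can be combined in one place before loading weights. This is an illustrative sketch, not code from the example app: `MLX.GPU.set(cacheLimit:)` and `MLX.GPU.activeMemory` appear in the diff, while `MLX.GPU.set(memoryLimit:relaxed:)` is assumed from the commit title ("use memory limit API").

```swift
import MLX

// Sketch only: constrain GPU memory use before loading a large model.

// Cap the buffer cache at 20 MB, as the example app does. Buffers that
// would otherwise be kept for reuse are freed once the cache is full.
MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)

// Optionally cap total GPU memory as well (assumed API from the commit
// title; `relaxed: true` allows the limit to be exceeded if required).
// MLX.GPU.set(memoryLimit: 1024 * 1024 * 1024, relaxed: true)

// ... load the model here ...

// Report resident weight memory, matching the diff's status line.
print("Weights: \(MLX.GPU.activeMemory / 1024 / 1024)M")
```

The cache limit is set *before* `LLM.load` so that the buffers allocated while loading weights are never retained in the cache.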
### Trying Different Models

The example application uses the Phi2 model by default; see [ContentView.swift](ContentView.swift#L58):

```
/// this controls which model loads -- phi4bit is one of the smaller ones, so this will fit on
/// more devices
let modelConfiguration = ModelConfiguration.phi4bit
```

There are some pre-configured models in [LLM/Models.swift](../../Libraries/LLM/Models.swift#L62),
and you can load any weights from Hugging Face where there is a model architecture defined
and you have enough memory.

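For weights without a preset, a configuration can presumably be built from a Hugging Face repo id directly. The initializer and repo id below are assumptions based on how the presets are defined, not confirmed by this commit:

```swift
// Hypothetical: point at any Hugging Face repo whose architecture is
// supported (the `id:` initializer is assumed from the preset definitions).
let modelConfiguration = ModelConfiguration(id: "mlx-community/phi-2-hf-4bit-mlx")
```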
### Troubleshooting

If the program crashes with a very deep stack trace, you may need to build in the Release configuration.