use memory limit API (#13)
* add buffer cache limit
* swift-format
* a more reasonable size
* add memory stats to command line tool, update to final api
* add note about changing models
@@ -73,6 +73,9 @@ class LLMEvaluator {

    func load() async throws -> (LLMModel, LLM.Tokenizer) {
        switch loadState {
        case .idle:
            // limit the buffer cache
            MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)

            let (model, tokenizer) = try await LLM.load(configuration: modelConfiguration) {
                [modelConfiguration] progress in
                DispatchQueue.main.sync {
@@ -80,6 +83,8 @@ class LLMEvaluator {
                    "Downloading \(modelConfiguration.id): \(Int(progress.fractionCompleted * 100))%"
                }
            }
            self.output =
                "Loaded \(modelConfiguration.id). Weights: \(MLX.GPU.activeMemory / 1024 / 1024)M"
            loadState = .loaded(model, tokenizer)
            return (model, tokenizer)
@@ -16,9 +16,25 @@ Some notes about the setup:

- this downloads models from Hugging Face, so LLMEval -> Signing & Capabilities has the "Outgoing Connections (Client)" capability set in the App Sandbox
- LLM models are large, so this uses the Increased Memory Limit entitlement on iOS to allow ... increased memory limits on devices that have more memory
- `MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)` is used to limit the buffer cache size
- the Phi2 4-bit model is small enough to run on some iPhone models
    - this can be changed by editing `let modelConfiguration = ModelConfiguration.phi4bit`
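The memory controls described above can be combined in one place before loading weights. This is an illustrative sketch, not code from the example app: `MLX.GPU.set(cacheLimit:)` and `MLX.GPU.activeMemory` appear in the diff, while `MLX.GPU.set(memoryLimit:relaxed:)` is assumed from the commit title ("use memory limit API").

```swift
import MLX

// Sketch only: constrain GPU memory use before loading a large model.

// Cap the buffer cache at 20 MB, as the example app does. Buffers that
// would otherwise be kept for reuse are freed once the cache is full.
MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)

// Optionally cap total GPU memory as well (assumed API from the commit
// title; `relaxed: true` allows the limit to be exceeded if required).
// MLX.GPU.set(memoryLimit: 1024 * 1024 * 1024, relaxed: true)

// ... load the model here ...

// Report resident weight memory, matching the diff's status line.
print("Weights: \(MLX.GPU.activeMemory / 1024 / 1024)M")
```

The cache limit is set *before* `LLM.load` so that the buffers allocated while loading weights are never retained in the cache.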
### Trying Different Models

The example application uses the Phi2 model by default; see [ContentView.swift](ContentView.swift#L58):

```
/// this controls which model loads -- phi4bit is one of the smaller ones, so this will fit on
/// more devices
let modelConfiguration = ModelConfiguration.phi4bit
```

There are some pre-configured models in [LLM/Models.swift](../../Libraries/LLM/Models.swift#L62),
and you can load any weights from Hugging Face where there is a model architecture defined
and you have enough memory.

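For weights without a preset, a configuration can presumably be built from a Hugging Face repo id directly. The initializer and repo id below are assumptions based on how the presets are defined, not confirmed by this commit:

```swift
// Hypothetical: point at any Hugging Face repo whose architecture is
// supported (the `id:` initializer is assumed from the preset definitions).
let modelConfiguration = ModelConfiguration(id: "mlx-community/phi-2-hf-4bit-mlx")
```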
### Troubleshooting

If the program crashes with a very deep stack trace, you may need to build in the Release configuration.