use memory limit API (#13)

* add buffer cache limit

* swift-format

* a more reasonable size

* add memory stats to command line tool, update to final api

* add note about changing models
David Koski
2024-03-05 15:22:12 -08:00
committed by GitHub
parent 430b464c8d
commit 61105bf0c4
3 changed files with 115 additions and 39 deletions


@@ -16,9 +16,25 @@ Some notes about the setup:
- this downloads models from Hugging Face, so LLMEval -> Signing & Capabilities has the "Outgoing Connections (Client)" capability set in the App Sandbox
- LLM models are large, so this uses the Increased Memory Limit entitlement on iOS to allow increased memory limits on devices that have more memory
- `MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)` is used to limit the buffer cache size (see the sketch after this list)
- The Phi2 4-bit model is small enough to run on some iPhone models
    - this can be changed by editing `let modelConfiguration = ModelConfiguration.phi4bit`
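
The cache limit above pairs with the memory limit API that this commit adopts. Below is a minimal sketch of how the calls might be combined before loading a model; `GPU.set(cacheLimit:)` comes from the note above, while `GPU.set(memoryLimit:relaxed:)` and the memory-stat properties are assumptions about the MLX Swift API and may differ by version.

```swift
import MLX

// Keep the Metal buffer cache small so freed buffers are returned promptly.
MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)

// Assumed memory-limit setter: cap overall GPU memory use (here ~2 GB);
// relaxed: true lets allocations exceed the cap rather than fail outright.
MLX.GPU.set(memoryLimit: 2 * 1024 * 1024 * 1024, relaxed: true)

// Assumed stats properties, e.g. for printing memory usage from a command line tool.
print("active: \(MLX.GPU.activeMemory) bytes")
print("cache:  \(MLX.GPU.cacheMemory) bytes")
print("peak:   \(MLX.GPU.peakMemory) bytes")
```
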
### Trying Different Models
The example application uses the Phi2 model by default; see [ContentView.swift](ContentView.swift#L58):
```swift
/// this controls which model loads -- phi4bit is one of the smaller ones so this will fit on
/// more devices
let modelConfiguration = ModelConfiguration.phi4bit
```
There are some pre-configured models in [LLM/Models.swift](../../Libraries/LLM/Models.swift#L62),
and you can load any weights from Hugging Face as long as the model architecture
is defined and you have enough memory.
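
As a hedged sketch of switching models, you can either pick another preset from Models.swift or build a configuration from a Hugging Face repo id. The `ModelConfiguration(id:)` initializer and the repo id shown are assumptions for illustration and may not match the actual API.

```swift
import LLM

// Use one of the presets defined in Libraries/LLM/Models.swift ...
let modelConfiguration = ModelConfiguration.phi4bit

// ... or (assumed initializer) point at any Hugging Face repo whose
// architecture the LLM library implements; the repo id below is only an example.
let custom = ModelConfiguration(id: "mlx-community/Mistral-7B-Instruct-v0.2-4bit")
```
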
### Troubleshooting
If the program crashes with a very deep stack trace, you may need to build