LLMEval

An example that:

  • downloads a Hugging Face model (phi-2) and its tokenizer
  • evaluates a prompt
  • displays the output as it generates text

Note: this must be built in the Release configuration, otherwise you will encounter stack overflows.

You will need to set the Team on the LLMEval target in order to build and run on iOS.

Some notes about the setup:

  • this downloads models from Hugging Face, so LLMEval -> Signing & Capabilities has "Outgoing Connections (Client)" enabled in the App Sandbox
  • LLM models are large, so this uses the Increased Memory Limit entitlement on iOS to raise the memory limit on devices that have more memory (both entitlements are sketched after this list)
  • MLX.GPU.set(cacheLimit: 20 * 1024 * 1024) is used to limit the buffer cache size
  • The Phi2 4-bit model is small enough to run on some iPhone models
    • this can be changed by editing let modelConfiguration = ModelConfiguration.phi4bit
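
For reference, the two entitlements read like this in the target's .entitlements file (a minimal sketch using the standard Apple key names; the project's actual file may contain more entries):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <!-- App Sandbox: allow outgoing connections for the model download -->
        <key>com.apple.security.network.client</key>
        <true/>
        <!-- iOS: raise the per-process memory limit on devices that support it -->
        <key>com.apple.developer.kernel.increased-memory-limit</key>
        <true/>
    </dict>
    </plist>

The cache limit is a single call early in the app (the same call quoted above):

    import MLX

    // keep the Metal buffer cache small so peak memory stays predictable
    MLX.GPU.set(cacheLimit: 20 * 1024 * 1024)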

Trying Different Models

The example application uses the Phi2 model by default; see ContentView.swift:

    /// this controls which model loads -- phi4bit is one of the smaller ones so this will fit on
    /// more devices
    let modelConfiguration = ModelConfiguration.phi4bit

There are several pre-configured models in LLM/Models.swift, and you can load any weights from Hugging Face for which a model architecture is defined, provided you have enough memory.
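
Switching presets is a one-line change. A sketch (mistral7B4bit is one of the presets in LLM/Models.swift; the id-based initializer on the commented line mirrors how the presets themselves are defined, but check ModelConfiguration for the exact shape):

    /// swap the preset -- any configuration from LLM/Models.swift works
    let modelConfiguration = ModelConfiguration.mistral7B4bit

    /// or load weights straight from a Hugging Face repo;
    /// the repo id here is illustrative
    // let modelConfiguration = ModelConfiguration(id: "mlx-community/Mistral-7B-v0.1-hf-4bit-mlx")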

Troubleshooting

If the program crashes with a very deep stack trace, you may need to build in the Release configuration. This seems to depend on the size of the model.

There are a couple of options:

  • build Release
  • force the model evaluation to run on the main thread, e.g. using @MainActor (sketched below)
  • build Cmlx with optimizations by modifying mlx/Package.swift and adding .unsafeFlags(["-O3"]) to the Cmlx target settings, around line 87 (also sketched below)
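
A minimal sketch of the @MainActor route (the wrapper function is hypothetical, not the app's actual code). The main thread has a much larger stack than background threads, which is why this sidesteps the overflow:

    import Foundation

    // hypothetical wrapper: hopping to the main actor runs the evaluation
    // on the main thread and its larger stack
    @MainActor
    func evaluateOnMainThread(_ work: () throws -> String) rethrows -> String {
        try work()
    }

For the Package.swift route, the added line sits inside the Cmlx target's C++ settings (surrounding settings elided):

    cxxSettings: [
        // ... existing settings ...
        .unsafeFlags(["-O3"])  // compile the C++ layer with optimizations
    ]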

Building in Release / with optimizations enables tail-call elimination in the C++ layer; without it, those deep chains of calls are what overflow the stack.

See discussion here: https://github.com/ml-explore/mlx-swift-examples/issues/3