implement LoRA / QLoRA (#46)

* implement LoRA / QLoRA
  - example of using MLX to fine-tune an LLM with low rank adaptation (LoRA) for a target task
  - see also https://arxiv.org/abs/2106.09685
  - based on https://github.com/ml-explore/mlx-examples/tree/main/lora
* add some command line flags I found useful during use
  - --quiet -- don't print decorator text, just the generated text
  - --prompt @/tmp/file.txt -- load prompt from file
* user can specify path to model OR model identifier in huggingface
* update mlx-swift reference

Co-authored-by: Ashraful Islam <ashraful.meche@gmail.com>
Co-authored-by: JustinMeans <46542161+JustinMeans@users.noreply.github.com>
2024-04-22 09:30:12 -07:00
parent 7e85eb8b88
commit 6c0b66f90a
32 changed files with 3483 additions and 64 deletions
--- a/Applications/LoRATrainingExample/README.md
+++ b/Applications/LoRATrainingExample/README.md
@@ -0,0 +1,21 @@
+#  LoRATrainingExample
+
+Example application that:
+
+- downloads the `mlx-community/Mistral-7B-v0.1-hf-4bit-mlx` model from Hugging Face
+- loads the train/valid/test data from `$SRCROOT/Data/lora` (the data is copied into the build, but it could just as well be downloaded)
+- adds LoRA adapters and trains the model
+- lets you evaluate a prompt against the model
+
+This roughly equates to the command line example in [Tools/llm-tool](../../Tools/llm-tool) and
+you can read more about LoRA there.
+
+The app evaluates the LoRA-adapted model rather than a fused model. It does not persist
+the LoRA weights or a fused model -- it retrains the adapters each time the program is launched.
+
+### Troubleshooting
+
+The `mlx-community/Mistral-7B-v0.1-hf-4bit-mlx` model requires a little over 4 GB of
+memory to load and train -- this may require ~6 GB of physical RAM.
+
+