* handle partially quantized models -- fix for #53 #71 #69 #74
  - added a default prompt of an appropriate form in order to test the models
  - also added additional stop tokens (#74) while working on the model configuration
  - fixed the repetitionPenalty code (#71); a sketch of the general technique follows below
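For context, a repetition penalty rescales the logits of tokens that have already been generated so they are less likely to be sampled again. The following is a minimal, self-contained sketch of the usual technique; the names are illustrative and this is not the actual code changed in #71.

```swift
import Foundation

// Illustrative sketch only -- not the code changed in #71.
// Rescale the logits of already-generated tokens so they are less
// likely to be sampled again (penalty > 1 discourages repetition).
func applyRepetitionPenalty(
    logits: inout [Float], previousTokens: [Int], penalty: Float
) {
    guard penalty != 1 else { return }
    for token in Set(previousTokens) where logits.indices.contains(token) {
        let value = logits[token]
        // The sign matters: dividing a negative logit by the penalty would
        // *increase* its probability, so negative logits are multiplied instead.
        logits[token] = value < 0 ? value * penalty : value / penalty
    }
}
```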
LoRATrainingExample
Example application that:
- downloads the mlx-community/Mistral-7B-v0.1-hf-4bit-mlx model from huggingface
- loads the train/valid/test data from `$SRCROOT/Data/lora` (this is copied into the build, but you can imagine how it might be downloaded instead)
- adds LoRA adapters and trains the model (see the toy sketch after this list)
- lets you evaluate a prompt against the model
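As referenced in the list above, the heart of the example is the LoRA technique: each adapted linear layer keeps its (possibly quantized) base weight frozen and learns a low-rank update on top of it. Here is a toy, self-contained sketch of that idea using plain Swift arrays rather than the MLX arrays the example actually uses; all names here are illustrative:

```swift
// Toy illustration of a LoRA adapter -- not the example's implementation,
// which operates on MLX arrays and supports quantized base weights.
struct ToyLoRALinear {
    let weight: [[Float]]  // frozen base weight, shape out x inp
    var a: [[Float]]       // trainable down-projection, rank x inp
    var b: [[Float]]       // trainable up-projection, out x rank (init to zero)
    let scale: Float       // blends the low-rank update into the base output

    // y = W x + scale * B (A x); only a and b receive gradient updates.
    func callAsFunction(_ x: [Float]) -> [Float] {
        let base = matVec(weight, x)
        let update = matVec(b, matVec(a, x))
        return zip(base, update).map { $0 + scale * $1 }
    }
}

func matVec(_ m: [[Float]], _ v: [Float]) -> [Float] {
    m.map { row in zip(row, v).map { $0 * $1 }.reduce(0, +) }
}
```

Because `b` starts at zero, the adapted layer initially computes exactly what the base layer did, and training only has to learn the delta.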
This is roughly equivalent to the command line example in `Tools/llm-tool`, and you can read more about LoRA there.
This evaluates the LoRA-adapted model rather than a fused model. It doesn't persist the LoRA weights or the fused model -- the model is retrained each time the program is launched (a sketch of one way to persist the adapters follows below).
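If you did want to avoid retraining on every launch, one option would be to persist the trained adapter matrices and reload them at startup. Here is a hypothetical sketch using the toy adapter above; the real example would need to serialize MLX array weights instead, and none of these names come from the repo:

```swift
import Foundation

// Hypothetical: flatten the trainable matrices and write them to disk.
// Loading would reverse this, using the known rank/in/out dimensions.
func saveAdapters(_ lora: ToyLoRALinear, to url: URL) throws {
    let floats = (lora.a + lora.b).flatMap { $0 }
    let data = floats.withUnsafeBufferPointer { Data(buffer: $0) }
    try data.write(to: url, options: .atomic)
}
```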
Troubleshooting
The mlx-community/Mistral-7B-v0.1-hf-4bit-mlx model requires a little over 4G of memory to load and train -- this may require ~6G of physical RAM.