llm improvements

- document the tokenizer used (https://github.com/huggingface/swift-transformers)
- provide a hook for tokenizer configuration, prompt augmentation
- this isn't as rich as the python equivalents but it helps a little
2024-03-01 14:46:32 -08:00
parent 599661774a
commit 82f6a969d4
8 changed files with 250 additions and 22 deletions
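
As a rough illustration of the "hook for prompt augmentation" idea from the commit message, a per-model hook could be a table of closures keyed by model id. This is only a sketch under assumptions: `PromptHooks`, `register`, and `prepare` are hypothetical names that do not come from Models.swift, and the `Instruct:`/`Output:` template is an illustrative choice, not necessarily what this commit ships.

```swift
import Foundation

// Hypothetical sketch: these names are NOT taken from Models.swift.
struct PromptHooks {
    // Maps a model id to a closure that rewrites the raw prompt.
    private var augmenters: [String: (String) -> String] = [:]

    init() {}

    // Register (or replace) the augmentation closure for one model id.
    mutating func register(modelId: String, augment: @escaping (String) -> String) {
        augmenters[modelId] = augment
    }

    // Apply the registered closure; return the prompt unchanged if none exists.
    func prepare(prompt: String, modelId: String) -> String {
        augmenters[modelId]?(prompt) ?? prompt
    }
}

var hooks = PromptHooks()

// Illustrative template only (assumption): wrap the prompt for an instruct-style model.
hooks.register(modelId: "mlx-community/phi-2-hf-4bit-mlx") { prompt in
    "Instruct: \(prompt)\nOutput:"
}

print(hooks.prepare(prompt: "Say hello", modelId: "mlx-community/phi-2-hf-4bit-mlx"))
print(hooks.prepare(prompt: "Say hello", modelId: "mlx-community/quantized-gemma-2b-it"))
```

Models without a registered closure fall through with the prompt untouched, so adding a new model id never requires a hook up front.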
--- a/Libraries/LLM/README.md
+++ b/Libraries/LLM/README.md
@@ -4,9 +4,22 @@ This is a port of several models from:

 - https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/models/

-You can use this to load models from huggingface, e.g.:
+using the Hugging Face swift transformers package to provide tokenization:

- https://huggingface.co/mlx-community/Mistral-7B-v0.1-hf-4bit-mlx
+https://github.com/huggingface/swift-transformers
+
+The [Models.swift](Models.swift) file provides minor overrides and customization --
+if you require overrides for the tokenizer or prompt customizations they can be
+added there.
+
+This is set up to load models from Hugging Face, e.g. https://huggingface.co/mlx-community
+
+The following models have been tried:
+
+- mlx-community/Mistral-7B-v0.1-hf-4bit-mlx
+- mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX
+- mlx-community/phi-2-hf-4bit-mlx
+- mlx-community/quantized-gemma-2b-it

 Currently supported model types are: