EASIEST Way to Fine-Tune an LLM and Use It With Ollama
FULL TRANSCRIPT
You want to fine-tune your large language model and run it locally on your machine
using Ollama.
Well, in today's video, we're going to do exactly that.
So let's go.
First, for the fun part: finding the right dataset.
Finding the right dataset is so important because when you train
a small large language model with a dataset
that is relevant to the task you're trying to do,
it can actually outperform large models.
What I'm going to be doing today is creating a small, fast LLM
that will generate SQL based off of table data I provide it.
One of the biggest datasets to do this with is called synthetic text to SQL,
which has over 105,000 records split
into columns for the prompt, SQL content, complexity, and more.
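As a sketch, pulling this dataset down with the Hugging Face datasets library might look like the following; the dataset ID used here is an assumption based on the name mentioned above, so check the Hugging Face Hub for the exact identifier:

```python
# Hedged sketch: the dataset ID below is an assumption based on the
# "synthetic text to SQL" name mentioned in the video.
from datasets import load_dataset

dataset = load_dataset("gretelai/synthetic_text_to_sql", split="train")
print(dataset.column_names)  # inspect which columns are actually available
```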
I'm running an NVIDIA 4090 GPU,
so I'm going to be fine-tuning this on my machine using Ubuntu.
If you don't have a GPU, feel free to do this using Google Colab,
which allows you to run training code in the cloud.
The great news is that this project
does not require a lot of complex hardware to get it up and running.
We're going to be using Unsloth,
which allows you to fine-tune a lot of open-source models
really efficiently, with about 80% less memory usage.
And we're going to be using Llama 3.1, which is an LLM intended for commercial and
research use, especially in English, and has really high performance.
Make sure that you have Anaconda
installed on your machine, as well as the CUDA libraries.
I will be using CUDA 12.1 and Python 3.10 for this project.
You want to install
the dependencies required by Unsloth, which you can find in the README.
But for simplicity, here it is.
This creates a new environment for us
and installs the PyTorch CUDA libraries as well as the latest Unsloth.
You'll also want to install Jupyter if it isn't there already,
and then run your Jupyter notebook.
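As a rough sketch, the setup described above might look like the following; the exact package pins and channels are assumptions, so check the Unsloth README for the current recommended commands:

```shell
# Create and activate a fresh conda environment (Python 3.10, as mentioned above)
conda create --name unsloth_env python=3.10 -y
conda activate unsloth_env

# Install PyTorch with CUDA 12.1 support (exact channels per the Unsloth README)
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Install the latest Unsloth, plus Jupyter for the notebook workflow
pip install unsloth jupyter

# Launch the notebook server
jupyter notebook
```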
And now you're done with the setup.
So let's go into the Jupyter notebook and get started.
First, we want to make sure that all the installed requirements are actually there.
If you're using Google Colab, this command will install the packages.
Next we're going to import the FastLanguageModel from Unsloth.
Here we're specifying that we want to use the Llama 3.1 8B model.
We also want to set a max sequence length of 2048 tokens.
This means that the model will only consider up to 2048 tokens
when processing or generating text, where a token can be a word,
subword, character, or even punctuation. And we'll set load_in_4bit to true,
which essentially means we're using fewer bits, as opposed to the typical 16
or 32 bits, to represent the information in the model.
Doing this is going to help you
reduce memory usage and also reduce the load on your machine.
After running this,
you're going to get a cute ASCII image, and that means that your model is loaded.
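A minimal sketch of this loading step, assuming Unsloth's 4-bit Llama 3.1 8B checkpoint name (treat the model_name as an assumption and check Unsloth's model list):

```python
from unsloth import FastLanguageModel

max_seq_length = 2048  # model will consider up to 2048 tokens at a time

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # assumed checkpoint name
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # 4-bit quantization to cut memory usage
)
```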
After this,
we're going to load in the PEFT model, which basically adds LoRA adapters.
If you don't know what these terms mean, that's totally fine.
Basically, the LoRA adapters mean that
we only have to update 1 to 10% of the parameters in this model.
Without them, it means that
we would have to retrain the whole model, not just a small portion,
which takes a lot of time, energy, and even money.
Unsloth provides this here with the recommended settings.
I trust them, so feel free to read each comment.
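The PEFT step might look roughly like this; the rank and target modules below follow the kind of defaults Unsloth commonly recommends, but treat every value as an assumption and defer to the official notebook:

```python
# Hedged sketch of attaching LoRA adapters with Unsloth-style settings.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: small r means only a small fraction of weights are trained
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # further reduces memory during training
    random_state=3407,
)
```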
Now this is where things can get a little bit tricky depending on what data
set you're using.
Each dataset is different from the others,
but they each need to be formatted in the same way so that the large language model
can understand them.
Llama 3 uses Alpaca-style prompts, which look like this.
Now, if you remember our data set, it is not as easy as just plugging it in
and letting it go off to the races.
I have to format my data first before plugging it in.
I'm only interested in the SQL context of my database,
the prompt I will be asking, as well as the generated code and explanation.
So I'm going to update my code to reflect this.
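The formatting step above can be sketched like this; the column names (sql_prompt, sql_context, sql, sql_explanation) are assumptions based on the text-to-SQL dataset described earlier, so adjust them to whatever your dataset actually uses:

```python
# Alpaca-style template, as described above: instruction, input, response.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def format_example(example):
    # NOTE: these column names are assumptions; check dataset.column_names.
    instruction = example["sql_prompt"]
    context = example["sql_context"]          # the table definitions (SQL context)
    response = example["sql"] + "\n\n" + example["sql_explanation"]
    return {"text": alpaca_prompt.format(instruction, context, response)}

# Illustrative record in the assumed shape
sample = {
    "sql_prompt": "List all customers from Canada.",
    "sql_context": "CREATE TABLE customers (id INT, name TEXT, country TEXT);",
    "sql": "SELECT * FROM customers WHERE country = 'Canada';",
    "sql_explanation": "Filters the customers table by the country column.",
}
print(format_example(sample)["text"])
```

With a real dataset object, you would map this over every record (e.g. `dataset.map(format_example)`) to produce the "text" field the trainer reads.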
Now we set up the training module to do supervised fine-tuning.
The SFTTrainer by Hugging Face is what I used.
There are a lot of parameters to use, all of which could be described in their own video.
So, for example, we have max_steps, which tells us how many training steps to perform;
seed is a random number generator seed
we use to be able to reproduce results; and warmup_steps gradually increases
the learning rate over time.
So now that we have everything set up, let's run it.
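Putting the trainer together might look like this sketch, assuming the TRL SFTTrainer and the hyperparameters named above (max_steps, seed, warmup_steps); the remaining values are illustrative assumptions, not the video's exact settings:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,           # the formatted dataset from the previous step
    dataset_text_field="text",       # the field produced by the formatting function
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,              # gradually ramps up the learning rate
        max_steps=60,                # total number of training steps to perform
        learning_rate=2e-4,
        seed=3407,                   # fixed seed so results can be reproduced
        output_dir="outputs",
    ),
)
trainer.train()
```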
And that's it.
Your model has been trained.
Now before we move on,
we actually need
to convert this into the right file type so that we can run it locally
using Ollama.
Luckily, Unsloth has a one-liner we can use to do this.
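The one-liner in question is Unsloth's GGUF export helper; the output name and quantization method below are assumptions, so check the Unsloth docs for the options you want:

```python
# Export the fine-tuned model to GGUF so llama.cpp (and therefore Ollama) can run it.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
```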
After this is done, we only need to do one step to be able to run this with Ollama.
First, open up your terminal.
I'm using the warp terminal here.
Go to the path of where the file is saved.
Then create a file called Modelfile and open it up in your code editor.
This is Ollama's Dockerfile-like configuration format,
where we can create new models with specific parameters.
In our Modelfile, we're going to put a system prompt.
So something like you're an SQL generator that takes a user's query
and gives them helpful SQL to use.
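A minimal Modelfile along those lines might look like this; the GGUF filename is an assumption based on the export step, so point FROM at whatever file the export actually produced:

```
FROM ./model-unsloth.Q4_K_M.gguf

SYSTEM "You are an SQL generator that takes a user's query and gives them helpful SQL to use."
```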
Finally, make sure Ollama is running.
And then we're just going to run this command.
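The command in question is Ollama's model-creation command; the model name sql-generator here is just a placeholder:

```shell
# Build a local Ollama model from the Modelfile in the current directory
ollama create sql-generator -f Modelfile

# Then chat with it
ollama run sql-generator
```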
This command will then read all the items in the Modelfile you just created,
and start using llama.cpp under the hood to make sure that the model runs
on your machine. And congrats!
You can now use your fine-tuned LLM locally,
all with an OpenAI-compatible API and more, in your applications.
If you're
curious to know more about Ollama, we do have a two-minute video
out about everything you need to know about Ollama here.
Otherwise, thank you for watching and I'll see you next time.