This looks fantastic. Will try replacing our current fine-tuned FLAN-UL2 model with this.
I wonder how the devtooling around this will evolve. Seems like a matter of days until someone creates a GUI wrapper around this, and obviates the need to use programmer time for fine-tuning
I'm curious, what are the differences between T5, Flan-T5, and Flan-UL2 for fine-tuning? Does the instruction tuning matter at all, once you're fine-tuning?
Low-rank adaptation (LoRA) ... has some advantages over previous methods:
- It is faster and uses less memory, which means it can run on consumer hardware.
- The output is much smaller (megabytes, not gigabytes).
- You can combine multiple fine-tuned models together at runtime.
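To make the first two points concrete, here's a minimal sketch of what the setup looks like with the Hugging Face peft library; the model path and hyperparameters are placeholders, not the exact alpaca-lora script:

```python
# Minimal LoRA setup sketch (placeholder model path and hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")  # placeholder

config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # which weight matrices get adapters
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()
# Typically reports that only a tiny fraction (on the order of 0.1%) of the
# parameters are trainable, which is where the small output files and lower
# memory pressure come from.
```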
This is great news for my dream of building a fine-tuned interactive messenger that can deliver a message on my behalf, trained on my personality and the information I want to convey.
And then you hook it up to hundreds of dating apps and it just does the boring job of making introductory chat, presenting you at the end with only the women who are interested in a real date.
My friend was doing this with mIRC scripts on our city's DALnet room in 2001. His bot hit up random users, asked a/s/l, and if they were female, it would announce via text-to-speech: "BabyGurl18 is female".
So we'd be watching a movie, he'd have his computer speakers turned up in his room, his bot would make an announcement, and he'd get up and leave, already in conversation.
When you consider what will happen when everyone on dating apps is an LLM, you see the issue: in the end, everyone is "interested" in everyone who swiped right.
Maybe software running on your PC captures everything you type, a voice transcriber isolates your voice specifically and records that, and you've got a dataset that covers a lot of who you are.
Fine-tune a model on that and boom, you're "immortal", and as LLMs get better and better, the fidelity of "you" gets better and better.
No downsides or fun for yourself; the main use I could see for it would be something like being able to "talk" to your great-great-great-grandpa one day.
I suppose that is the one remaining upside for the other person when it comes to immortality. But for the individual, none of the immortality fun remains. You aren't meeting or talking to them as you would in "normal" immortality; you are still dead and don't know if they even exist, and you don't even get to ensure the right things are passed down.
LoRA has actually been around for a little while! I first saw it when it became popular for fine-tuning models quantized down to about 8 bits or so. I'm sure it's doing stuff in the 4-bit range now! :D
I believe it's a core piece of the toolbox required to really push the limits of LLMs, either in original training or in inference, somewhat like batch norm was for convolutional neural networks. I look forward to seeing how this will be applied in the future.
The easiest way to run Alpaca-LoRA locally is with this little-known fork [1] that uses Docker. You'll be up and running in about 20 minutes with pretty much any modern consumer Nvidia GPU.
Hi All, I have a noob question.
I have been reading about Alpaca and Alpaca-LoRA. I have a use case in which I want to fine-tune/train Alpaca-LoRA on a large corpus of books in txt format. I know that for Alpaca the data was in an "Instruction: Prompt" format; however, my text is huge and is not in that format. It's simply a library of books and journal articles. I want to be able to ask a question and have the model answer based on the books I trained it on. I also want to be able to ask general questions, for example which books discussed topic x or y.
I have tried OpenAI's API to create embeddings, but I want to use Alpaca.
Has anybody made a LLaMA/Alpaca Erebus model? I read about them in the oobabooga docs, and a locally-run language model fine-tuned on Literotica could be the funniest thing I've ever seen.
NVIDIA stated recently that GPT bots will become one million times more powerful in ten years. Many people doubted that.
With LoRA, I see a much higher improvement. These guys claim a 10,000× reduction in the number of trainable parameters. A different way to look at it is that with current hardware you can train a model that has 10,000× more parameters. If you add a 100× improvement in hardware over 10 years (not at all unrealistic), that's the million. But we will have significant improvements in training methods too.
You don't need 10,000× more data to train a model with 10,000× more parameters. You can use the same data and the model will perform better. Much better. The problem is that currently it is nearly impossible to run and train a model with that many more parameters.
ChatGPT has nearly 200 billion parameters. For GPT-4 we don't know; there are rumors that it has 100 trillion parameters, but they are probably unfounded. In any case, we've seen how much more powerful GPT-4 is. Imagine a GPT-5 with 1 quadrillion parameters.
And then imagine that after you've trained it to some reasonable level, you "downsample" the parameters, using the SVD approach described in LoRA, and get a GPT-5 with the same 200 billion parameters as ChatGPT, but with many, many times more power, even than GPT-4.
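For what it's worth, here is a toy sketch of what rank-truncating a single weight matrix with SVD actually looks like. This is only an assumption about what "downsample" means here, and it is not what LoRA itself does: LoRA trains a low-rank update on top of frozen weights rather than compressing the pre-trained weights.

```python
import torch

# Toy illustration of low-rank truncation of one weight matrix via SVD.
# NOT LoRA's procedure; it just shows the general "low intrinsic rank" idea.
W = torch.randn(4096, 4096)                 # a dense layer's weights
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

r = 16                                      # keep only the top-r singular directions
W_lowrank = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]

full_params = W.numel()                                     # 4096 * 4096 ≈ 16.8M
lowrank_params = U[:, :r].numel() + r + Vh[:r, :].numel()   # ≈ 131K
print(full_params, lowrank_params)
```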
They have already ingested most of the internet and all popular and semi-popular books. If your plan doesn't involve every person on Earth wearing a GoPro and uploading the footage to OpenAI, you will have difficulty finding 9,999 more internets and Libraries of Congress.
The training process is modifying the network weights. These are usually written to new checkpoint files instead of overwriting the original (because what if the loss is actually worse after an epoch of training?).
But there's nothing stopping inference from occurring on a model that is being trained.
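A toy sketch of that pattern in plain PyTorch (the tiny linear model just stands in for the real network):

```python
import copy
import torch
import torch.nn as nn

# Checkpoint every epoch to a new file instead of overwriting, and run
# inference on a frozen snapshot while training continues.
model = nn.Linear(16, 1)                      # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data, targets = torch.randn(64, 16), torch.randn(64, 1)

best_loss = float("inf")
for epoch in range(3):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), targets)
    loss.backward()
    optimizer.step()

    # New checkpoint file per epoch; old ones stay around in case loss regresses.
    torch.save(model.state_dict(), f"checkpoint_epoch_{epoch}.pt")
    if loss.item() < best_loss:
        best_loss = loss.item()
        torch.save(model.state_dict(), "checkpoint_best.pt")

    # Nothing stops inference on a frozen copy of the in-training weights:
    snapshot = copy.deepcopy(model).eval()
    with torch.no_grad():
        _ = snapshot(torch.randn(1, 16))
```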
LoRA is an alternative to traditional fine tuning (which is usually done on specific layers as you mentioned).
To quote the LoRA paper[1]:
> We hypothesize that the change in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed Low-Rank Adaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural network indirectly by optimizing rank decomposition matrices of the dense layers’ change during adaptation instead, while keeping the pre-trained weights frozen
It's truly revolutionary: it basically lets you create a very small "diff" which you apply to an existing model and it is suddenly fine-tuned. These diff models are very small (5M for example).
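With the Hugging Face peft library, for instance, that "diff" is just a small adapter folder you load on top of the frozen base model. A rough sketch (the paths are placeholders):

```python
# Sketch of applying a LoRA "diff" to a base model with the peft library.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")      # placeholder
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")      # the tiny "diff"

# Optionally fold the low-rank update into the base weights for plain inference:
merged = model.merge_and_unload()
```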
Or “multiplex” fine-tuning with inference, i.e. do fine-tuning for 100 ms, inference for 100 ms, then tuning again… etc.
Btw, is there a way to combine two or more models?
So for example, if I create 5 copies of a model and fine-tune each copy on a different dataset, can the 5 fine-tuned copies be merged together somehow to create a model that has the learning of all 5?
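With LoRA this is at least mechanically possible, because each fine-tune is just a low-rank delta on top of the same frozen base, so you can fold several deltas (optionally weighted) into one copy of the weights; whether the merged model really keeps the learning of all 5 is an empirical question. A conceptual sketch in plain PyTorch, not any particular library's API:

```python
import torch

# Each LoRA fine-tune i contributes a low-rank update delta_i = B_i @ A_i
# on top of the same frozen base weight W. Merging "models" then amounts
# to summing (optionally weighting) the deltas into one weight matrix.
N, r = 4096, 16
W = torch.randn(N, N)                                  # shared frozen base weight

adapters = [(torch.randn(N, r) * 0.01, torch.randn(r, N) * 0.01)
            for _ in range(5)]                         # 5 fine-tuned (B, A) pairs
weights = [0.2] * 5                                    # how much of each to blend in

W_merged = W.clone()
for w, (B, A) in zip(weights, adapters):
    W_merged += w * (B @ A)                            # fold each low-rank delta in
```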
So they use cog before installing it? Apparently this wasn’t proofread.
Also, is it just me, or are there currently more ways to run LLMs on a CPU than on a GPU springing up on GitHub? I have hacked together my own, but my chat UI is awful, so what is the nicest, pre-packaged, CUDA-friendly way to run this now?
How does LoRA save more than 50% of the memory usage? I see that the weight updates have a much lower memory footprint by virtue of being low rank. But you still need the dense weights for the forward pass, don't you?
I'm not an expert, but I believe it only saves memory in the final model, after training is done, by merging the low-rank LoRA adapter matrices with the original weight matrices.
For example, if an original layer has N inputs and N outputs (an NxN weight matrix), LoRA adds a 16xN matrix and an Nx16 matrix alongside it (for rank 16), trains only those new matrices, and finally multiplies the two of them together into an NxN update that gets added back into the original weights, so you end up with a single NxN matrix again.
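That said, there is a training-time saving too: the frozen dense weights still have to be there for the forward pass, but gradients and optimizer state are only kept for the tiny LoRA matrices. Back-of-the-envelope for one layer, ignoring activations (rank 16, N = 4096, Adam-style optimizer state assumed):

```python
# Parameter/tensor counts for one NxN layer with a rank-16 LoRA.
# Assumes an Adam-style optimizer keeping 2 extra tensors per trainable parameter.
N, r = 4096, 16

dense_params = N * N            # 16,777,216 — frozen, needed for the forward pass
lora_params = N * r + r * N     # 131,072 — the only trainable weights (~0.8%)

# Full fine-tuning: weights + gradients + 2 optimizer tensors, all dense.
full_ft_floats = dense_params * (1 + 1 + 2)
# LoRA: dense weights (forward only) + grads/optimizer state for LoRA matrices only.
lora_floats = dense_params + lora_params * (1 + 1 + 2)

print(lora_params / dense_params)    # ~0.0078
print(lora_floats / full_ft_floats)  # ~0.26 — roughly a quarter of the footprint
```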
It feels like I'm living in a cartoon with all these terms:
> In this blog post, we’ll show you how to use LoRA to fine-tune LLaMA using Alpaca training data.