mchiang's comments | Hacker News

Can confirm it doesn't. Many Ollama posts get pushed off the front page too, despite having hundreds of points. Over time I came to understand why. If HN did this for YC companies, it would ruin trust in HN, in YC, and, probably most important to YC companies, in the reputation of the startup itself.


I assume this is what happens when many HN users just flag every AI- and LLM-related post out of sheer frustration with the reality distortion field around this particular topic.


I’m one of the maintainers of Ollama.

It’s amazing to see others build on top of open-source projects. Forks like RamaLama are exactly what open source is all about. Developers with different design philosophies can still collaborate in the open for everyone’s benefit.

Some folks on the Ollama team have contributed directly to the OCI spec, so naturally we started with tools we know best. But we made a conscious decision to deviate because AI models are massive in size - on the order of gigabytes - and we needed performance optimizations that the existing approaches didn’t offer.
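
For context, a model on an OCI-style registry boils down to a small JSON manifest pointing at large content-addressed blobs, and the multi-gigabyte weights live entirely in those blobs. Here is a rough sketch of those shapes in Go, using generic OCI-spec field names rather than Ollama's exact media types or manifest format:

  package main

  import "fmt"

  // Descriptor points at one content-addressed blob (a "layer").
  type Descriptor struct {
      MediaType string `json:"mediaType"`
      Digest    string `json:"digest"`
      Size      int64  `json:"size"`
  }

  // Manifest is the small JSON document a client pulls first; the
  // large model weights live in the layers it references.
  type Manifest struct {
      SchemaVersion int          `json:"schemaVersion"`
      MediaType     string       `json:"mediaType"`
      Config        Descriptor   `json:"config"`
      Layers        []Descriptor `json:"layers"`
  }

  func main() {
      // Illustrative values only; the digest is a placeholder.
      m := Manifest{
          SchemaVersion: 2,
          Layers: []Descriptor{
              {MediaType: "application/vnd.example.model", Digest: "sha256:...", Size: 4 << 30},
          },
      }
      fmt.Println(len(m.Layers), "layer(s)")
  }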

We have not forked llama.cpp. Ollama is written in Go, so naturally we built our own serving layer in server.go. Now we are beginning to hit performance, reliability, and model support limits, which is why we have begun transitioning to Ollama's new engine, which will support multiple engine designs. Ollama is then responsible for portability between the different engines.

I did see the complaint about Ollama not using Jinja templates. Ollama is written in Go. I'm listening, but it seems to me that it makes perfect sense to support Go templates.
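
For anyone who hasn't used them, here is a minimal sketch of what a chat prompt template can look like with Go's standard text/template package. The special tokens and field names below are purely illustrative, not Ollama's actual template schema:

  package main

  import (
      "os"
      "text/template"
  )

  // Chat is the data rendered into the prompt template.
  // Field names are illustrative, not Ollama's exact schema.
  type Chat struct {
      System string
      Prompt string
  }

  func main() {
      // Roughly what a model's prompt template looks like in Go template syntax.
      const tmpl = "{{ if .System }}<|system|>{{ .System }}<|end|>\n{{ end }}<|user|>{{ .Prompt }}<|end|>\n<|assistant|>"

      t := template.Must(template.New("chat").Parse(tmpl))
      _ = t.Execute(os.Stdout, Chat{System: "You are a helpful assistant.", Prompt: "Hello!"})
  }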

We are only a couple of people, building in the open. If this sounds like vendor lock-in, I'm not sure what vendor lock-in is.

You can check the source code: https://github.com/ollama/ollama


These comments would carry more merit if they weren’t coming from the very person who closed this pull request: https://github.com/jmorganca/ollama/pull/395

Those rejected README changes only served to provide greater transparency to would-be users, and here we are a year and a half later with woefully inadequate movement on that front.

I am very glad folks are working on alternatives.


As an outsider (not an OSS maintainer, but a contributor), the decision to decline the merge was, imo, understandable: the maintainer had a strategy and the change didn't fit it. They gave their reasons really nicely, and even made a call to action for PRs placing architecture docs elsewhere. Your response was tonally disparaging, and the subsequent pile-on was counterproductive. All due respect to your experience as a maintainer; in that role, can you imagine seeing a contribution you are not interested in, and declining, forgetting, or being too busy to engage, imagining that it might get dropped or improved while you focus on your own priorities?

Putting myself in your shoes, I can see why you might be annoyed at being ignored. It suggests this change is really important to you, so my question would be: why didn't you follow the maintainer's advice and add architecture docs?


These comments seem reasonable to me. Could you clarify the Ollama maintainers' POV wrt. the recent discussion of Ollama Vulkan support at https://news.ycombinator.com/item?id=42886680 ? Many people seem to be upset that this PR seems to have gotten zero acknowledgment from the Ollama folks, even with so many users being quite interested in it for obvious reasons. (To be clear, I'm not sure that the PR is in a mergeable state as-is, so I would disagree with many of those comments. But this is just my personal POV - and with no statement on the matter from the Ollama maintainers, users will be confused.)

EDIT: I'm seeing a newly added comment in the Vulkan PR GitHub thread, at https://github.com/ollama/ollama/pull/5059#issuecomment-2628... . Quite overdue, but welcome nonetheless!


Since you are one of the maintainers of Ollama, maybe you can help me answer a related question. It is great that the software itself is open source, but hosting the models must cost a fortune. I know this is funded by VC money, yet nowhere on the Ollama website or in the repository is there any mention of this. Why is that?

There isn't an about section, a tiny snippet in a FAQ somewhere, nothing.


We partner with Cloudflare, using R2 to minimize the cost of hosting. Check out their pricing.

The website is so minimal right now because we have been focused on the GitHub repo.


Ollama has the ability to pull models off Hugging Face as well:

https://huggingface.co/docs/hub/en/ollama
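
Per those docs, the command is along the lines of ollama run hf.co/{username}/{repository}, optionally with a quantization tag on the end.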


The 671B model is now available:

4-bit quantized: ollama run deepseek-r1:671b

(400GB+ VRAM/Unified memory required to run this)

https://ollama.com/library/deepseek-r1/tags

The 8-bit quantization is still being uploaded.


Sorry about that. We are currently uploading the 671B MoE R1 model as well. We needed some extra time to validate it on Ollama.


The naming of the models is quite confusing too...


Did you mean the tags, or the specific names of the distilled models?


Ollama provides an API endpoint that now supports tool/function calling by the LLM. Ollama is not a framework itself.

Agent Zero can already use Ollama and alternatives to run the LLMs, and this new feature should enable it to more accurately call tools with models that have tool support built in.
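
For anyone wiring this up by hand, here is a rough sketch in Go of what a tool-calling request against a local Ollama /api/chat endpoint can look like. The model name and the tool schema below are illustrative; check the API docs for the exact fields:

  package main

  import (
      "bytes"
      "encoding/json"
      "fmt"
      "net/http"
  )

  func main() {
      // Illustrative request body; see Ollama's API docs for the exact schema.
      body, _ := json.Marshal(map[string]any{
          "model": "llama3.1", // assumes a tool-capable model is already pulled
          "messages": []map[string]string{
              {"role": "user", "content": "What is the weather in Toronto?"},
          },
          "tools": []map[string]any{
              {
                  "type": "function",
                  "function": map[string]any{
                      "name":        "get_weather",
                      "description": "Get the current weather for a city",
                      "parameters": map[string]any{
                          "type": "object",
                          "properties": map[string]any{
                              "city": map[string]string{"type": "string"},
                          },
                          "required": []string{"city"},
                      },
                  },
              },
          },
          "stream": false,
      })

      resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()

      var out map[string]any
      _ = json.NewDecoder(resp.Body).Decode(&out)
      // A tool-capable model responds with message.tool_calls instead of plain text.
      fmt.Println(out["message"])
  }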


This is cool. I'd like to give it a try. Press a button, and get GPU access to build apps on.


Thank you! Let us know how you get on with it!



While the PRs went in slightly earlier, much of the time was spent on testing the integrations, and working with AMD directly to resolve issues.

There were issues that we resolved prior to cutting the release, many of them reported by the community as well.


Thank you for clarifying and thanks for the great work you do!


There is the Qwen 1.5 model from the Alibaba team.

https://ollama.com/library/qwen

ollama run qwen:0.5b
ollama run qwen:1.8b
ollama run qwen:4b
ollama run qwen:7b
ollama run qwen:14b
ollama run qwen:72b

I would only recommend the smaller parameter sizes if you are fine-tuning with them.

