Hacker News

> people seem to find GPT-4 better for say coding than Gemini Ultra

For my use cases, Gemini Ultra performs significantly better than GPT-4.

My prompts are long and complex, with a paragraph or two about the general objective followed by 15 to 20 numbered requirements. Often I'll include existing functions the new code needs to work with, or functions that must be refactored to handle the new requirements.

I took 20 prompts that I'd run with GPT-4 and fed them to Gemini Ultra. Gemini gave a clearly better result in 16 out of 20 cases.

Where GPT-4 might miss one or two requirements, Gemini usually got them all. Where GPT-4 might require multiple chat turns to point out its errors and omissions and tell it to fix them, Gemini often returned the result I wanted in one shot. Where GPT-4 hallucinated a method that doesn't exist, or had been deprecated years ago, Gemini used correct methods. Where GPT-4 called methods of third-party packages it assumed were installed, Gemini either used native code or explicitly called out the dependency.

Of the 4 out of 20 prompts where Gemini did worse, one was a weird rejection: I'd included an image in the prompt and Gemini refused to work with it because it had unrecognizable human forms in the distance. Another was a simple bash script to split a text file; Gemini came up with a technically correct but complex one-liner, while GPT-4 just used split with simple options to get the same result.
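For that file-splitting case, the simpler approach would have looked something along these lines (a sketch; the original prompt and the exact options from that session weren't shared):

```shell
# Create a sample 2500-line file, then split it into 1000-line pieces
# named part_aa, part_ab, part_ac. The split call is illustrative of the
# "split with simple options" answer described above, not a verbatim output.
seq 1 2500 > input.txt
split -l 1000 input.txt part_
wc -l part_*   # part_aa and part_ab have 1000 lines each, part_ac has 500
```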

For now I subscribe to both. But I'm using Gemini for almost all coding work, only checking in with GPT-4 when Gemini stumbles, which isn't often. If I continue to get solid results I'll drop the GPT-4 subscription.



I have a very similar prompting style to yours and share this experience.

I am an experienced programmer and usually have a fairly exact idea of what I want, so I write detailed requirements and use the models more as typing accelerators.

GPT-4 is useful in this regard, but I also tried about a dozen older prompts on Gemini Advanced/Ultra recently and in every case preferred the Ultra output. The code was usually more complete and prod-ready, with higher sophistication in its construction and somewhat higher density. It was just closer to what I would have hand-written.

It's increasingly clear, though, that LLM use falls into a few distinct modes of end-user behavior: knowledge base vs. reasoning, exploratory vs. completion, instruction following vs. getting suggestions, etc.

For programming I want an obedient instruction-following completer with great reasoning. Gemini Ultra seems to do this better than GPT-4 for me.


It constantly hallucinates APIs for me; I really wonder why people's perceptions are so radically different. For me it's basically unusable for coding. Perhaps I'm getting a cheaper model because I live in a poorer country.


Are you using Gemini Advanced? (The paid tier.) The free one is indeed very bad.


Spent a few hours comparing Gemini Advanced with GPT-4.

Gemini Advanced is nowhere even close to GPT-4, whether for text generation, code generation, or logical reasoning.

Gemini Advanced constantly asks for direction ("What are your thoughts on this approach?") even when creating a short task list of 10 items, and keeps doing so even after being told several times to provide the full list rather than stopping every three or four items to ask. It also constantly gives moral lessons or caps its results with annoying marketing-style comments of the type "Let's make this an awesome product!"

Its code is more generic, and its solutions less sophisticated. In a discussion of options trading strategies, Gemini Advanced got core risk-management strategies wrong and apologized when the errors were pointed out. GPT-4 answered with no errors, and even went into the subtleties of some exotic risk scenarios without mistakes.

Maybe 1.5 will be it, or maybe Google realized this quite quickly and is trying the increased token size as a Hail Mary to catch up. Why release so soon?

Quite curious to try the same prompts on 1.5.


I asked Gemini Advanced, the paid one, to "Write a script to delete some files" and it told me that it couldn't do that because deleting files was unethical. At that point I cancelled my subscription since even GPT-4 with all its problems isn't nearly as broken as Gemini.


If you share your prompt I'm sure people here can help you.

Here's a prompt I used that got a script that not only accomplishes the objective, but even has an option to show what files will be deleted and asks for confirmation before deleting them.

Write a bash script to delete all files with the extension .log in the current directory and all subdirectories of the current directory.
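The script itself wasn't posted, but the behavior described above (previewing what will be deleted before the actual deletion) can be sketched roughly like this, using a hypothetical --dry-run flag in place of an interactive confirmation (the function and flag names are illustrative, not Gemini's verbatim output):

```shell
# Sketch: with --dry-run, only list the matching .log files under the
# current directory; without it, delete them.
delete_logs() {
    if [[ "${1:-}" == "--dry-run" ]]; then
        echo "Would delete:"
        find . -type f -name '*.log' -print
    else
        find . -type f -name '*.log' -delete
    fi
}

# Example: preview first, then delete for real.
delete_logs --dry-run
delete_logs
```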


I’m going to have to try Gemini for code again. It just occurred to me as a Xoogler that if they used Google’s code base as the training data, it’s going to be unbeatable. Now did they do that? No idea, but quality wins over quantity, even with LLMs.


There is no way NTK data is in the training set, and google3 is NTK.


I dunno, leadership is desperate and they can de-NTK if and when they feel like it.


What is “NTK”?


"Need To Know" I.e. data that isn't open within the company.


Almost all of google3 is basically open to all of engineering.


> My prompts are long and complex, with a paragraph or two about the general objective followed by 15 to 20 numbered requirements. Often I'll include existing functions the new code needs to work with, or functions that must be refactored to handle the new requirements.

I guess this is a tough request if you're working on a proprietary code base, but I would love to see some concrete examples of the prompts and the code they produce.

I keep trying this kind of prompting with various LLM tools including GPT-4 (haven't tried Gemini Ultra yet, I admit) and it nearly always takes me longer to explain the detailed requirements and clean up the generated code than it would have taken me to write the code directly.

But plenty of people seem to have an experience more like yours, so I really wonder whether (a) we're just asking it to write very different kinds of code, or (b) I'm bad at writing LLM-friendly requirements.


Not OP, but here is a verbatim prompt I put into these LLMs. I'm learning to make Flutter apps, and I like to try making various UIs so I can learn how to compose things. I agree that Gemini Ultra (aka the paid "Advanced" mode) is definitely better than ChatGPT-4 for this prompt. Mine is a bit terser than OP's huge prompts with numbered requirements, but I still got a valid and meaningful response from Gemini, while GPT-4 told me it was a tricky problem and gave me some generic code snippets that explicitly don't solve the problem asked.

> I'm building a note-taking app in flutter. I want to create a way to link between notes (like a web hyperlink) that opens a different note when a user clicks on it. They should be able to click on the link while editing the note, without having to switch modalities (eg. no edit-save-view flow nor a preview page). How can I accomplish this?

I also included a follow-up prompt after getting the first answer, which again for Gemini was super meaningful, and already included valid code to start with. Gemini also showed me many more projects and examples from the broader internet.

> Can you write a complete Widget that can implement this functionality? Please hard-code the note text below: <redacted from HN since it's long>


This is useful, thanks. Since you're using this for learning, would it be fair to characterize this as asking the LLM to write code you don't already know how to write on your own?

I've definitely had success using LLMs as a learning tool. They hallucinate, but most often the output will at least point me in a useful direction.

But my day-to-day work usually involves non-exploratory coding where I already know exactly how to do what I need. Those are the tasks where I've struggled to find ways to make LLMs save me any time or effort.


> would it be fair to characterize this as asking the LLM to write code you don't already know how to write on your own?

Yea absolutely. I also use it to just write code I understand but am too lazy to write, but it's definitely effective at "show me how this works" type learning too.

> Those are the tasks where I've struggled to find ways to make LLMs save me any time or effort

GitHub Copilot has an IDE integration where it can output directly into your editor. This is great for "// TODO: Unit Test for add(x, y) method when x < 0" and it'll dump out the full test for you.

Similarly useful for things like "write me a method that loops through a sorted list, finds anything with <condition>, applies a transformation, and saves it in a Map". Basically all those random helper methods can be written for you.


That last one is an interesting example. If I needed to do that, I would write something like this (in Kotlin, my daily-driver language):

    fun foo(list: List<Bar>) =
        list.filter { condition(it) }.associateWith { transform(it) }
which would take me less time to write than the prompt would.

However, if I didn't know Kotlin very well, I might have had to go look in the docs to find the associateWith function (or worse, I might not have even thought to look for it) at which point the prompt would have saved me time and taught me that the function exists.


Is there any chance you could share an example of the kind of prompt you're writing?

I'm always reluctant to write long prompts because I often find GPT-4 just doesn't get it, and then I've wasted ten minutes writing a prompt.


How do you interact with Gemini for coding work? I am trying to paste my code into the web interface, and when I hit submit, the interface says "something went wrong" and the code does not appear in the chat window. I signed up for Gemini Advanced and that didn't help. Do you use AI Studio? I am just looking into that now.


I've found Gemini generally equal for the .NET and HTML coding I've been doing.

I've never had Gemini give me a better result than GPT, though, so it does not surpass it for my needs.

The UI is more responsive, though, which is worth something.



