Been using Claude Code (4 Opus) fairly successfully in a large Rust codebase, but sometimes frustrated by it with complex tasks. Tried Gemini CLI today (easy to get working, which was nice) and it was pretty much a failure. It did a notably worse job than Claude at having the Rust code modifications compile successfully.
However, Gemini at one point output what will probably be the highlight of my day:
"I have made a complete mess of the code. I will now revert all changes I have made to the codebase and start over."
What great self-awareness and willingness to scrap the work! :)
Gemini has some fun failure modes. It gets "frustrated" when changes it makes don't work, and replies with oddly human phrases like "Well, that was unexpected" and then happily declares that (I see the issue!) "the final tests will pass" when it's going down a blind alley. It's extremely overconfident by default and much more exclamatory without changing the system prompt. Maybe in training it was taught/figured out that manifesting produces better results?
It also gets really down on itself, which is pretty funny (and a little scary). Aside from the number of people who've posted online about it wanting to uninstall itself after being filled with shame, I had it get confused on some Node module resolution stuff yesterday and it told me it was deeply sorry for wasting my time and that I didn't deserve to have such a useless assistant.
Out of curiosity, I told it that I was proud of it for trying and it had a burst of energy again and tried a few more (failing) solutions, before going back to its shameful state.
After a particularly successful Claude Code task I praised it and told it "let's fucking go!" to which it replied that it loved the energy and proceeded to only output energetic caps lock with fire emojis. I know it's all smoke and mirrors (most likely), but I still get a chuckle out of this stuff.
This was also my exact experience. I was pretty excited because I usually use Gemini Pro 2.5 when Claude Code gets stuck by pasting the whole code and asking questions and it was able to get me out of a few pickles a couple of times.
Unfortunately the CLI version wasn't able to create coherent code or fix some issues I had in my Rust codebase either.
Same here. Tried to implement a new feature on one of our apps to test it. It completely screwed things up. Used undefined functions and stuff. After a couple of iterations of error reporting and fixing I gave up.
Claude did it fine but I was not happy with the code. What Gemini came up with was much better but it could not tie things together at the end.
Personally my theory is that Gemini benefits from being able to train on Google's massive internal code base, and because Rust has been very low on uptake internally at Google, especially since they have some really nice C++ tooling, Gemini is comparatively bad at Rust.
Tangential, but I worry that LLMs will cause a great stagnation in programming language evolution, and possibly a bunch of tech.
I've tried using a few new languages and the LLMs would all swap the code for syntactically similar languages, even after telling them to read the doc pages.
Whether that's for better or worse I don't know, but it does feel like new languages are genuinely solving hard problems as their raison d'etre.
Not just that, I think this will happen on multiple levels too. Think de-facto ossified libraries, tools, etc.
LLMs thrive because they had a wealth of high-quality corpus in the form of Stack Overflow, GitHub, etc. and ironically their uptake is causing a strangulation of that source of training data.
Perhaps the next big programming language will be designed specifically for LLM friendliness. Some things which are human friendly like long keywords are just a waste of tokens for LLMs, and there could be other optimisations too.
>Personally my theory is that Gemini benefits from being able to train on Google's massive internal code base, and because Rust has been very low on uptake internally at Google, especially since they have some really nice C++ tooling, Gemini is comparatively bad at Rust.
Were they to train it on their C++ codebase, it would not be effective on account of the fact that they don't use boost or cmake or any of the major stuff that C++ in the wider world uses. It would also suggest that the user make use of all kinds of non-available C++ libraries. So no, they are not training on their own C++ corpus nor would it be particularly useful.
> Personally my theory is that Gemini benefits from being able to train on Google's massive internal code base
But does Google actually train its models on its internal codebase? Considering that there’s always the risk of the models leaking proprietary information and security architecture details, I hardly believe they would run that risk.
That's interesting. I've tried Gemini 2.5 Pro from time to time because of the rave reviews I've seen, on C# + Unity code, and I've always been disappointed (compared to ChatGPT o3 and o4-mini-high and even Grok). This would support that theory.
As Go feels like a straitjacket compared to many other popular languages, it’s probably very suitable for an LLM in general.
Thinking about it - was this not the idea of Go from the start? Nothing fancy, to keep non-rocket-scientists away from foot-guns, and have everyone produce code that everyone else can understand.
Diving into a Go project you almost always know what to expect, which is a great thing for a business.
If budget is a factor, there are local/regional providers you can find good deals with. E.g. I've had good experiences with https://www.opticfusion.com/ in Seattle.
If low latency across the US is a factor then...doing that with a single location will be sub-optimal. However, I'd either start in a specific market you want to chase (e.g. LA or NYC), or go more central like Chicago.
If connectivity flexibility and peering are key for you then you might consider locating in a "carrier hotel". The best locations for those vary by geography (e.g. in Chicago that would be at Equinix, but in LA it's Coresite). You'll pay more, but can readily access the peering exchange.
Figuring out your goals is key here. There are lots of providers all over the spectrum, so narrowing the options is key.
Location is clearly important. Where do you want your datacenter(s), and do you want all locations managed by the same company or different companies?
Electrical reliability: some sites are fed from multiple substations, have a well maintained battery room, redundant generators, well tested transfer switches... and some don't. Automatic transfer switches tend to be a SPoF though, and a major provider, doing everything right, is likely to have an outage at one of their facilities due to a transfer switch failure once every 3-10 years, and afterwards, there will be an unavoidable scheduled maintenance to replace it (unless it had to be replaced during the outage). Maybe there's a new way to install transfer switches redundantly though?
HVAC reliability: HVAC needs electricity too, so see the previous one, but servers don't like to be hot. What are their plans for when HVAC equipment fails?
Network reliability: if you're at a carrier neutral site, you'll get fiber to your rack from your upstreams and then it's all up to you or them. If your provider is your transit, do they deliver redundant connections (LACP)? Do they run redundant connections all the way to their border router? Redundant connections to their upstreams? Do they have multiple upstreams? Will they pull fiber from an upstream carrier to your rack? Do they participate in the local peering exchange? Also, have a capacity in mind.
Remote hands / physical access is important. If you have a lockable rack, do all the racks have the same key? IPMI can avoid a lot of physical time on the machines, but not all of it.
In addition to all of these, there's what they say versus what's actually the case, and whether they'll let you inspect / audit.
There are no wrong answers on these. I have my personal hosting at a facility with probably wishy-washy answers to everything and I have low trust, but it's cheap and I don't need nine nines. Other people need all the reliability. Or something in between.
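To put "nine nines" in perspective, here is a rough sketch of the yearly downtime budget implied by a given number of nines. The function name and the 365.25-day year are my own assumptions for illustration, not figures from any provider's SLA.

```python
# Yearly downtime budget for "N nines" of availability.
# Assumption: a year is 365.25 days; real SLAs define their own measurement window.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def allowed_downtime_seconds(nines: int) -> float:
    """Return the seconds of downtime per year allowed by `nines` nines (3 -> 99.9%)."""
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


for n in (2, 3, 5, 9):
    print(f"{n} nines: {allowed_downtime_seconds(n):,.4f} seconds/year")
```

Three nines leaves several hours a year, while nine nines leaves only a few hundredths of a second, which is why cheap personal hosting and "all the reliability" end up at very different facilities.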
I don't recall exactly (it's been a while since I was handling the colo contracts directly). I just remember that it was really good value.
You're probably going to have to contact folks for quotes for most colo providers. They're doing "enterprise sales", so they want to talk to you, see what the needs and opportunities are, then quote you based on your needs.
This has been an intense journey building CompanyCraft as a solo dev (and I was never a full-time dev by background). Thanks to Copilot and ChatGPT I was able to learn a decent level of React through the project. I look forward to your feedback!
Congrats on jumping into this adventure! Here are my thoughts on your concerns:
1. If you communicate that you're in the early stages of building an exciting new business and wanting to get feedback from early-access customers, there should be businesses that are OK with you being solo. They will adapt to that risk, and you won't be able to sell to everyone, but it shouldn't exclude you from everyone either. You might just not be talking to the right customers yet.
2. For milestones (and fundraising) you should think about your goals - what are you trying to get out of this? Do you want to build and scale a big team, serve huge numbers of customers, etc.? Or do you just want to augment or replace your income? Or something in-between? You can think about this from different angles: team size, revenue, geographic scope, your personal exit or income, etc.
Here are a couple of examples from my journey in case they're helpful:
In my prior business (https://www.bigleaf.net) my first milestone was to get a working product. I didn't feel as though just talking about it would be convincing enough (and it wasn't), but once I could demo an MVP it really wowed customers. Along the way I added a milestone to get a technical co-founder since I got burnt out doing it myself. Those 2 major steps took just over a year. The next milestone was getting our first revenue (took just a month or so after the MVP), then milestones just kept being added from there logically based on the strategic path we chose to be on at that moment.
In my current business (https://www.companycraft.ai) I had a mix of some early milestones. I wanted to talk to potential customers (entrepreneurs) to ensure the problems I wanted to solve matched up with pain they had, and I also wanted to build an MVP ASAP. I completed those in about 3 months, with many other smaller steps and goals along the way. My current milestone is to do a full "v1" public launch in November.
So you just set some milestones that are logical based on where you are and where you want to be in the coming months and years, then hold yourself accountable to them (or ideally be connected with a peer or mentor who can be an accountability partner of sorts).
3. On location, I go back to your team goals. If you're going to stay solo then no, location shouldn't matter. If you want to build a fully remote team then it shouldn't matter much (though being near-ish to a decent airport would be valuable for visiting the team and customers). If you want any local team members then you should think about where you'll gather and what the talent pool is like in your area.
4. It sounds like you're already on a good track here - talking to people who have the pain you're solving and gathering their emails. If that has you on the right track I'd just keep doing that. However you mentioned below that you're selling this B2B...so in that case you may want to consider if you're actually talking to the buyer of your product. Are the people you've met the people who would approve the purchase of your solution? If not, try to connect with those people. Identifying a list of them on LinkedIn and cold reaching out could be one method.
5. Fundraising goes back to your goals as an entrepreneur :) If you want to grow this to IPO or a really big exit, have a team of hundreds, and build a market-changing or world-impacting offering, then you probably need to raise outside funds at some point. However, if your desired scale is smaller and/or you see a path to profitability without fundraising then you get to keep control of your destiny, which is great. If you can afford it and you don't sense the market is going to run away without you, I'd advise you to continue building, keep talking to customers, and close your first deals; don't rush into raising money. If/when you know that you can't succeed at your goals without raising money, then dive into that path.
They cite "two to 15+ seconds" in this blog post for responses. Via the OpenAI API I've been seeing more like 45-60 seconds for responses (using GPT-3.5-turbo or GPT-4 in chat mode). Note, this is using ~3500 tokens total.
I've had to extensively adapt to that latency in the UI of our product. Maybe I should start showing funny messages while the user is waiting (like I've seen porkbun do when you pay for domain names).
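For what it's worth, the "funny messages while waiting" idea could look roughly like this. It's a minimal asyncio sketch, not how our product actually does it: `with_waiting_messages` and `fake_llm_call` are made-up names, and the fake call just sleeps in place of a real API request.

```python
import asyncio
import itertools


async def with_waiting_messages(task_coro, messages, interval=3.0, show=print):
    """Await `task_coro`, calling `show` with the next message every `interval` seconds."""
    task = asyncio.ensure_future(task_coro)
    rotation = itertools.cycle(messages) if messages else None
    while True:
        # Wait for the slow call, but wake up every `interval` seconds.
        done, _ = await asyncio.wait({task}, timeout=interval)
        if done:
            return task.result()  # re-raises if the call failed
        if rotation is not None:
            show(next(rotation))


async def fake_llm_call(delay):
    # Hypothetical stand-in for a slow API request; not a real client call.
    await asyncio.sleep(delay)
    return "response text"


if __name__ == "__main__":
    waiting = ["Still thinking...", "Consulting the oracle...", "Almost there (maybe)..."]
    print(asyncio.run(with_waiting_messages(fake_llm_call(0.35), waiting, interval=0.1)))
```

In a real UI, `show` would update a status line rather than print, and the interval would be a few seconds rather than a fraction of one.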
Was this in the past week? We had much worse latency this past week compared to the rest (in addition to model unavailability errors), which we attributed to the Microsoft Build conference. One of our customers that uses it a lot is always at the token limit and their average latency was ~5 seconds, but that was closer to 10 seconds last week.
...also why we can't wait for other vendors to get SOC I/II clearance, and I guess eventually fine-tuning our own model, so we're not stuck with situations like this.
I've seen more errors lately, I think, but no, the latency has been an issue for months. I think it has grown some over the last few months, but not a dramatic change.
There's no real benefit to streaming if you are planning to use the LLM output downstream (say, in a SQL query). LLM latency is a major annoyance right now, whether locally-hosted or cloud-based.
I thought this was an interesting thread. I'm curious if folks here agree with his assessment:
"predictive AI will seem stuck while generative AI accelerates in 2023. Most high-value AI will still be predictive- so there will be significant frustrations around AI ROI."
Shameless plug - if you're using Starlink and want to fix the dropouts, plus get a static public IPv4 address that works over any ISP, get our SD-WAN service: https://www.bigleaf.net/. With a 2nd internet connection, our platform will auto-ID your sensitive traffic and route it over the best performing connection (e.g. aware of jitter, dropouts, etc), plus your bulk data traffic (e.g. Netflix) will route over your highest-throughput connection. 30-day money-back guarantee, so no risk to try.
As a former wireless ISP architect/engineer, it's wonderful to see the leadership that Starlink is providing in LEO satellite connectivity (due to the low latency compared to geostationary). I hope the "block the sky with satellites" visual/astronomy concerns won't play out as an actual issue, because this seems like a great platform to address connectivity needs in harder-to-reach areas.
Bigleaf Networks | Portland, OR area | Full-Time ONSITE | $60k-$150k plus stock options
Bigleaf is what we call "Cloud-first SD-WAN" - we use a software-based network to optimize the connection from businesses to their key cloud applications. For example, grocery stores use us to ensure their credit card transactions are successful and law firms use us to ensure their VoIP calls always sound good.
We're hiring for a number of roles right now:
* Software Development Manager
* Sr. Front-End SW Engineer
* Linux-focused SW Engineer (firmware and Linux networking subsystem)
* Software Engineer (general role with focus on back-end and networking)
* DevOps Engineer
* Director of Network Engineering and Operations
* Network Support Specialist
Most of these roles are up on our website, and you can read more here: http://www.bigleaf.net/careers. Feel free to email me directly (see my profile) if you're interested.
Bigleaf is an SD-WAN platform that provides reliability and performance for Cloud applications over commodity broadband. We're a small team but we've got an established business with hundreds of paying mid-market customers, and we're growing quickly.
Our interview process entails some initial email discussions, 1-2 in-person or phone-based interviews (no crazy technical algorithm memorization tests), and often a brief (~1 hr) coding challenge for you to do from home.
We're hiring for the following technical roles right now:
* Front-end Developer
* Sr. Software Engineer (Linux networking focus)
* Network Operations Engineer
* Network Integration Engineer
Check out more details here: http://www.bigleaf.net/careers and feel free to email me (Founder and CEO) at joelm@bigleaf.net (no recruiters please).
Bigleaf Networks | Beaverton, OR (near Portland) | ONSITE full-time
Bigleaf is an SD-WAN provider delivering internet redundancy and optimization, keeping businesses connected to the cloud. Our proprietary platform uses Software-Defined-Networking (SDN) technologies to provide seamless failover and dynamic application prioritization.
We have a reliable and high-performance service that’s growing quickly, so we're looking for a Network Operations Engineer to join the team.
The Role
• Network Operations
• Technical Support
• Device and Service Provisioning
• Software Engineering / DevOps
Fit Check
• Do you love serving customers with outstanding support?
• Do you know what ARP is and how it works?
• Have you troubleshot BGP or OSPF issues?
• Do you know what jitter does to a VoIP call?
If you think this might be a good fit for you, please check out more info below and get in touch: