Wikimedia is moving to Gitlab (mediawiki.org)
889 points by input_sh on Oct 28, 2020 | 390 comments



I think I have some relevant experience here.

We host all of our projects on github:

https://github.com/sqlalchemy/

yet we also use gerrit!

https://gerrit.sqlalchemy.org/

users send us pull requests, and they never have to deal with Gerrit at all. We use a custom integration, the source code to which is here: https://github.com/sqlalchemyorg/publishthing/tree/master/pu... and then we run a mostly bidirectional synchronization between Gerrit and Github pull requests (code changes can move freely from a Github PR -> gerrit, comments and code review comments are posted bidirectionally, and Gerrit status changes are synchronized into the PR). Example: https://github.com/sqlalchemy/sqlalchemy/pull/5662
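To give a feel for the plumbing, here is a simplified sketch, not the actual publishthing code; the "gerrit" remote name is an assumption, while pull/N/head and refs/for/<branch> are the standard GitHub and Gerrit refs:

    import subprocess

    GERRIT_REMOTE = "gerrit"   # assumed name of a remote pointing at the Gerrit server

    def sync_pull_request(payload, repo_dir):
        """Mirror an opened/updated GitHub PR into a Gerrit change.

        `payload` is the JSON body of a GitHub `pull_request` webhook event.
        Pushing to refs/for/<branch> is Gerrit's standard "create or update a
        change for review" ref; a real bridge also has to make sure a
        Change-Id footer is present so repeated pushes update the same change.
        """
        pr = payload["pull_request"]
        base_branch = pr["base"]["ref"]       # e.g. "master"
        pr_number = pr["number"]

        def git(*args):
            subprocess.run(["git", "-C", repo_dir, *args], check=True)

        # fetch the PR head from GitHub's synthetic ref, then push it to Gerrit for review
        git("fetch", "origin", f"pull/{pr_number}/head:pr-{pr_number}")
        git("push", GERRIT_REMOTE, f"pr-{pr_number}:refs/for/{base_branch}")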

I continue to find Gerrit's code review to be vastly better than Github's. Gitlab would take tremendous server resources to run internally and I like Github much better for the front-facing experience.

I wrote about an earlier form of our integration here: https://techspot.zzzeek.org/2016/04/21/gerrit-is-awesome/

to sum up:

1. your project benefits massively by being on Github

2. Gerrit is awesome (for me)

3. gitlab is not very appealing to me UX-wise and self-hosting wise

4. you can still use pull requests from outside users and use gerrit for code reviews.


I have yet to meet anyone who reviews code and uses Gerrit who can name a better solution.

I belonged to a team that used Gerrit for review and hosting; we switched to hosted Gitlab because people missed the "GitHub-like UI" they were used to. It was unanimous that code review on Gerrit was way better:

1. You start by reviewing the commit message, which is the first touch point everyone has with a change.

2. Navigation is done from file to file.

3. On Gerrit there aren't two people commenting the same thing, because:

3.a. Messages from different people on the same line are displayed together, not as different threads.

3.b. The review of a previous version is displayed alongside the next version, so you can continue the same discussion.

I understand that GitHub/GitLab interface is more friendly, but their code-review really stands in the way of producing good software by not favoring good commit messages and long discussions.


> I have yet to meet anyone who reviews code and uses Gerrit who can name a better solution.

What about reviews via patches sent to a mailing list?

I haven't looked into Gerrit for a while, so one question I have is how it handles related commits. The mailing list approach can group them in a single thread tied together by a cover letter message, where each commit, along with its diff against the parent, is a message sent in reply to the cover letter.
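For concreteness, that cover-letter-plus-replies structure is what git's own tooling emits; here is a rough sketch of driving it from Python, assuming a local branch holding the related commits:

    import subprocess

    def make_patch_series(repo_dir, base="origin/master"):
        """Produce a threaded patch series with a cover letter.

        --cover-letter creates the 0/N summary message; --thread makes each
        patch a reply to it, which is what yields the single mail thread
        grouping the related commits described above.
        """
        out = subprocess.run(
            ["git", "-C", repo_dir, "format-patch",
             "--cover-letter", "--thread", f"{base}..HEAD"],
            check=True, capture_output=True, text=True,
        )
        return out.stdout.splitlines()  # one .patch filename per line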


To be polite, I think the target audience of these tools might not include you. While that workflow works for you and, apparently, scales for some very large projects like the Linux kernel, it isn't a good solution for an enormous number of people which is why tools like Github, Gerrit, GitLab, and others exist.


> To be polite, I think the target audience of these tools might not include you.

Insinuating that I'm a special case doesn't add to the discussion.


You aren't a special case, but if you don't see any flaws with your method of code review then I don't think you're the target market for code review tools.


I have no dog in this fight, but you stated:

> it isn't a good solution for an enormous number of people

.. without providing any insight or justification for that belief. So an interesting and productive tangent might be to elucidate what you believe those flaws are?

Unpacking this question might even lead to some good UX ideas that can be applied to today's review systems?


I can offer a perspective from someone who is not a dev or SE.

I honestly never fully understand how to read the maillist patch. The plain text format makes it very hard to understand what's going on. I'm sure I will understand it better or even prefer it if I use it long enough, but then again I can instantly understand code reviews on GitHub/GitLab/Gerrit.


I've never had to use a mailing list patch myself, but from the ones I've glanced at, I have the same problem: formatting and syntax highlighting are absent. It seems like an easy problem to fix with a better viewer, and that might be enough to make it comprehensible.


> I honestly never fully understand how to read the maillist patch. The plain text format makes it very hard to understand what's going on.

That may be due to the settings in your mail client. If it's displaying plain text in a variable width font and/or not applying syntax highlighting in terms of showing added and removed lines in different colors, that would make the diff more difficult to read.

But some mail clients can do that and it makes reading the diff much easier.

> I'm sure I will understand it better or even prefer it if I use it long enough, but then again I can instantly understand code reviews on GitHub/GitLab/Gerrit.

Essentially, code reviews in a mailing list are much like a discussion thread in Hacker News or Reddit where the thread structure is very similar. The only difference is that most mail clients only allow you to display one message at a time.

In my mail client, Thunderbird, you can see the overall thread structure of a patchset discussion [1]. This is the root message for the thread which serves as the cover letter (which is the equivalent of the PR description) [2]. The first commit in the patch is displayed here [3] (note that I have a plugin that enables diff syntax highlighting). The email subject is the commit message title (with the [PATCH v2 1/3] tag prepended to it). The commit message itself is the beginning of the email, and the diff follows.

Unlike Github (and maybe Gitlab), the commit message and diffstat are treated at the same level as the diff itself. That means you can comment on them just like you would on the diff.

Here, you can see Junio C Hamano's comments on the second commit in the patch set [4]. He's commenting on the diffstat line which shows 391 lines added to the builtin/submodule--helper.c file. Further down in the same message [5], he's commenting on the code inline, much like someone would quote a message here on HN and reply inline to multiple sections of it. It's not really that different compared to comments on a diff in Github or Gitlab other than the fact that it's a reply to an email message rather than a web page.

[1] https://i.imgur.com/QmqUWR8.png

[2] https://i.imgur.com/mILREtf.png

[3] https://i.imgur.com/gdoy5zs.png

[4] https://i.imgur.com/BcTdRRe.png

[5] https://i.imgur.com/cCpqsOL.png


I will be honest, I don't even use an email client :/


True, I suppose most people use Gmail or one of the other major email providers through a webmail interface. I haven't been able to get Gmail or Hotmail to display threaded messages the way they're displayed in Thunderbird and they tend to display messages using a variable width font.

In that context, reviewing code would be difficult, if not impossible, to do via email.


That is one advantage of the mail patch approach, though. A lack of vendor lock-in means you can view the patch using whatever application you want. And there's room for a better mail patch viewer, if anyone could be bothered to make one.

I'm guessing one disadvantage is the method of diff is hard-coded into the patch. It would be good to switch to word-diff, or ignore whitespace, but I'd imagine these could be applied as transformations on the generic format.
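As a rough illustration of the kind of client-side transformation I mean, here's a sketch that re-diffs the reconstructed before/after text with whitespace collapsed; a real viewer would map the output back to the original lines, and difflib is just standing in for whatever diff engine the viewer uses:

    import difflib

    def whitespace_insensitive_diff(old_lines, new_lines):
        """Recompute a unified diff that ignores whitespace-only changes.

        The emailed patch itself stays untouched; the viewer re-diffs the
        reconstructed before/after text after collapsing whitespace. The
        output shows the normalized lines, which is good enough to spot
        which hunks are whitespace-only.
        """
        normalize = lambda lines: ["".join(line.split()) + "\n" for line in lines]
        diff = difflib.unified_diff(
            normalize(old_lines), normalize(new_lines),
            fromfile="a/file", tofile="b/file", lineterm="\n",
        )
        return list(diff)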


> I'm guessing one disadvantage is the method of diff is hard-coded into the patch. It would be good to switch to word-diff, or ignore whitespace, but I'd imagine these could be applied as transformations on the generic format.

The plugin I use in Thunderbird can switch between unified, context, and side-by-side diff views based on the same email. Adding the transformations you mentioned could be done.

But one limitation email has compared to other review tools is the lack of ability to expand the view of the context within the client. The only way I could think of is to have git format-patch generate the diff with the entire context included and then have the client limit the display of that context. But that would not have a reasonable fallback for those using clients that aren't capable of, or configured for, doing that.


Which plugin are you using? It seems life-changing to me.


This is the one I'm using: https://github.com/Qeole/colorediffs. I installed it several years ago, so I'm not entirely sure whether you can install it on a current version of Thunderbird, but it is still working with my installation.


I might try it out as well! Thanks for it.


That should be an easy problem to fix:

Just send out email in HTML format with the code portions set to fixed width font and syntax highlighting already applied. Should display just fine in GMail.


> Just send out email in HTML format with the code portions set to fixed width font and syntax highlighting already applied

That won't work because people actually download those email messages and apply them to their local repository with the git am command, and they also expect to be able to reply to the email inline when commenting on a patch. Also, for mail clients that don't support HTML rendering, it would be much more difficult to read or respond to an HTML email.


I would assume you'd send both a plain-text part and an HTML part?
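Presumably something like the following: a sketch of a multipart/alternative message where the plain-text part remains the raw patch and the HTML part carries the highlighting (assumes pygments is installed; whether git am and plain-text clients cope gracefully with the multipart structure is exactly the objection above):

    from email.message import EmailMessage
    from pygments import highlight
    from pygments.lexers import DiffLexer
    from pygments.formatters import HtmlFormatter

    def patch_to_email(patch_text, subject, sender, recipient):
        """Build a multipart/alternative mail: raw patch plus highlighted HTML.

        The plain-text part stays an ordinary git patch; the HTML part is
        just a nicer rendering for clients that prefer it.
        """
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = recipient
        msg.set_content(patch_text)  # text/plain part, the patch itself

        formatter = HtmlFormatter(full=True, noclasses=True)
        html = highlight(patch_text, DiffLexer(), formatter)
        msg.add_alternative(html, subtype="html")
        return msg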


Now you're claiming that I'm making an argument that I never made.

To be clear, the original statement was whether there was a better review tool compared to Gerrit for those who review code and use that tool for that purpose. I responded by suggesting the patch review via mailing list method and asked a follow up question about how Gerrit handled related commits and explained how that case was handled by the mailing list method.

Then you, not the person I originally responded to, decided to interject and, while claiming not to be rude, claimed I was saying something entirely different from what I actually stated.

Personally, I found that very off-putting and extremely rude on your part.


I'm sorry I was rude to you, I didn't mean that and I apologize.

I didn't mean anything more than what I literally wrote, which is I don't think you're the target market for code review tools and your suggestion may not be a good fit for people looking for review tools.


Gerrit handles related commits in a similar way I guess.

A Gerrit changeset is like a GitHub PR: it has many commits, and commits are usually rebased against master; this is crucial to track the review properly over time. (It's very easy to switch to earlier revisions of the same changeset, and you never see external changes.)

Honestly, I don't think Gerrit is anything special, I think it simply has the right approach to development (rebased/ff-only) that enables easy review.


Based on the documentation I read, it looks like Gerrit handles this by grouping changes by topics [1]. I'm not sure whether that can be done automatically when pushing up changes that span multiple commits in a local branch.

If I create a branch with several commits where one commit adds a new method with associated unit tests, and a subsequent commit adds several calls to that new method in the code base (while updating any affected tests), then how would Gerrit handle the ordering of those commits? Even if they're in the same topic, I don't know if there's a way to ensure that the first commit is reachable from the subsequent commit.

[1] https://gerrit-review.googlesource.com/Documentation/intro-u...


I'm a Product Designer at GitLab and I appreciate your feedback.

We are well aware of the advantages that Gerrit has over GitLab and how these are emphasized when teams migrate to GitLab. We are working on improvements that will help mitigate this. Specifically, targeting your points:

1. We're discussing the ability to comment on commit messages: https://gitlab.com/gitlab-org/gitlab/-/issues/19691

2. From version 13.2 you can opt to show one file at a time, in your user preferences: https://gitlab.com/gitlab-org/gitlab/-/issues/222790. We have also listed a number of improvements to this feature in https://gitlab.com/groups/gitlab-org/-/epics/516.

Could you expand on point 3.b? If the commented line hasn't changed from version A to B, you should be seeing the comment in version B. Or maybe you're referring to something else?


Gerrit is the one where each "pull request" has to be a single commit, right?

I'm not particularly happy about GitHub, but I think it's less about GH and more about workflow and source code evolution, and I'm not sure if Gerrit solves anything here.

First, PRs being heavyweight encourages large branches instead of smaller incremental changes. Secondly, a large change (such as a big new feature) ends up living in a branch until it's ready to be released, not until it's reviewed. This means that any code that cannot be immediately merged to master needs to live on that branch for a while, and gets rapidly out of date, requiring constant rebasing to keep it from rotting.

I'd prefer to merge as soon as something is accepted and use master as the main development branch, but that causes challenges. If you've merged something you don't want to release yet, you're faced with having to build release branches through cherry-picking, which can be really difficult or even impossible. You can hide features behind flags, but sometimes a branch is a big risky refactoring or some structural change that isn't providing features that can be isolated. Plus, once you merge something, any following changes, even if unrelated, often end up depending on the things you want to exclude.

I think something like Pijul (with some discipline, like always doing small incremental commits) could make this easier by being able to treat individual commits as moving pieces that can be rearranged for a release, but it wouldn't solve everything.

Any thoughts on this and how Gerrit would fit in?


> Gerrit is the one where each "pull request" has to be a single commit, right?

Yes, but you can have changesets depend on each other, so it's not that big a deal. (But you can end up in rebase hell if you do that.) You also get a version history of all the different versions of your commit.

Anecdotally, during my use of Gerrit, I never really wished to have multiple commits in a single changeset.

> I'd prefer to merge as soon as something is accepted and use master as the main development branch

That's what Wikimedia did, mostly. (There were weekly deployment branches, but it was unusual to have something in master that was reverted out of the deploy branch.) It seemed to mostly work fine AFAIK (of course I wasn't on the team doing deploys; for all I know they might have horror stories).


> Anecdotally, during my use of Gerrit, I never really wished to have multiple commits in a single changeset.

People's tools shape their workflows; projects that use Gerrit tend to do more squashing of commits, because there's much more per-commit overhead. When I encountered Gerrit, I found it really frustrating to work with for this exact reason. Other aspects of it were great, but if you're used to "one logical change per commit" and end up with a dozen commits in a PR, that can be painful with Gerrit.


The whole point of Gerrit is to keep doing "one logical change per commit", but reviewing them individually. Anecdotally, this results in much higher-quality reviews.

You can still group them by "feature" and merge them atomically by using topics.


> # Why

> For the past two years, our developer satisfaction survey has shown that there is some level of dissatisfaction with Gerrit, our code review system. This dissatisfaction is particularly evident for our volunteer communities. The evident dissatisfaction with code review, coupled with an internal review of our CI tooling and practice makes this an opportune moment to revisit our code review choices.

and then further down

> # FAQ

> * Why is GitHub not considered?

> - GitHub would be the first tool required to participate in the Wikimedia technical community that would be non Free Software and non self-hosted.

> - GitHub also does not meet all of our needs; for example, GitHub grants little control of metadata, no influence over privacy policy/data retention, sanctions and bans, little control over backups and data integrity checks, and no long-term guaranteed access to underlying repository settings and configuration.


Wikimedia was already using mirroring to github (however we didn't accept pull requests).

I'm pretty sure most of the anti-gerrit sentiment at wikimedia was about gerrit as a code review tool.

My personal experience with it (as a mediawiki developer) is gerrit has a lot of UI bugs (although it has gotten better). I also suspect it encourages a code review culture that is overly nitpicky and risk averse (but perhaps that is just cultural forces at wikimedia)


I'm not familiar with any UI bugs, but we don't use the "live editing" feature; maybe that's where you had problems. The big issue with Gerrit is on the "getting plugins to work" side of things, as they are kind of ad hoc and almost totally undocumented, and the access model is too complicated, but once that's all working there is no need to deal with it.

As for nitpicky culture, maybe we'd have that problem with Openstack, where there are thousands of developers, but for our projects in SQLAlchemy we're a team of about five people and I'm in more or less a BDFL type of role. To the degree that we are nitpicky about things, it only prevents much bigger problems from happening later. If a review has little things that are bugging me, I'll just fix them myself and push a new change up rather than bothering the contributor with them, which is also something you can't usually do with pull requests.


Wikimedia was using a super old version of Gerrit for a long time. They upgraded recently (although the upgrade happened at roughly the same time as I left my job and took a step back from Wikimedia, so I don't have much experience with the new version).

I think Wikimedia struggles a lot with code review culture in general. Different people have conflicting ideas about what good code looks like. It used to be very nitpicky (I've had code rejected in the past for using (PHP's) intval() instead of casting to int. I've also had code rejected for casting to int instead of using intval().) But that's improved quite a bit with better precommit lint tools. The length of the feedback cycle is very long and sometimes it feels like it's mostly about who you know (e.g. the last patch I submitted was on Sept 9, for a decently serious bug. The first actionable feedback, which was relatively minor things of the form "use a constant named ONE_MINUTE instead of 60", came on Oct 15. That's kind of a long time to wait for code review, imo). Anyways, it's just not fun to contribute when code review is so unpredictable and slow.

Hmm. Guess I got off on a bit of a tangent there. I do think Gerrit has some usability issues, but I think that's hardly the main problem.


Those sound like managerial / organizational / social issues. Technology isn't going to solve those without good guidance and controls for the overall system. Building that up for a very large organization is extremely difficult; I'd not want to have to do that :).


> My personal experience with it (as a mediawiki developer) is gerrit has a lot of UI bugs (although it has gotten better). I also suspect it encourages a code review culture that is overly nitpicky and risk averse (but perhaps that is just cultural forces at wikimedia)

Can strongly recommend removing "-1 code review" and requiring all comments to be resolved instead. Accomplishes the same goal while being more positive.


I wonder how gerrit compares to reviewboard (https://www.reviewboard.org/)


Anyone thinking of moving to their own Gitlab instance with Gitlab CE: either stay on Github or prepare to waste your time dealing with user spam bots that pollute your site's search results.

In other words-- if you want the common use case for a FOSS project:

1. publicly viewable main repository with publicly viewable issue tracker

2. a requirement to log in to view snippets, user profiles, and perhaps even other repos, as enforced by administrator settings (otherwise SEO bots will leverage these features to eat your search results)

3. anyone with an email can sign up to post issues to the main repo's issue tracker

There is no combination of settings in Gitlab CE to achieve this. Any sane approach has to leave out step #2. That means that your Gitlab instance gets hammered with user spam from bots which then get indexed in Google search results for your site.

Worse, Gitlab has no tools to make it easy to remove the user spam (and obviously no tools to prevent it from happening).

Just run a public-facing Gitlab CE instance for a few days. Search for one of the spam snippets you collect, and you'll find results for all the FOSS projects out there running their own Gitlab instances.

I've never seen any solutions offered by Gitlab for this, nor frankly any interest in the myriad bug reports about them addressing this at all.

Edit: typo


Hi! I'm the PM at GitLab who works on Snippets, so thanks for providing this feedback. We do have Recaptcha support which can be configured - are you seeing these kinds of issues with that enabled/configured?

One item that is on the roadmap that is coming and may be of interest is `Optional Admin Approval for local user sign up` - https://gitlab.com/groups/gitlab-org/-/epics/4491.

I'm not in the group working on that, but it does appear to be coming soon and would prevent newly created accounts from doing anything until they're approved.


Hi phikai,

I built a privacy-friendly alternative to ReCaptcha called FriendlyCaptcha [1]. Is there a possibility of seeing this integrated as a more user-friendly alternative?

Happy to chat (e-mail in profile)

[1] https://friendlycaptcha.com/


Man this needs more attention, cool project. I see you tried to submit to HN a couple of times and didn't get traction, that's too bad. Don't give up!


Is the demo somehow tweaked to be less hard?

On my machine it doesn't take any time to solve it and I see no signs of CPU usage. Even trying a couple of times in incognito mode and watching CPU immediately after loading the page for the first time.

On many sites creating a profile takes a few seconds. Loading one of my CPU cores for another 5 seconds doesn't really bother me if I wanted to create massive amounts of profiles/posts. I'll still do over 100 per minute on a standard desktop PC.


The default difficulty is set to a level that makes sense for websites with a varied audience (which includes some ancient browsers on old devices).

The solver runs in WebAssembly and is really really fast (~4M hashes per second) - but not every browser supports WASM yet (around 0.3% don't, empirically). The JS fallback is around 10 times slower (more in 5+ year old browsers) - for those users you want at least a decent solve time too.

For Gitlab's audience the difficulty can probably be increased a lot - it all depends on the website and usecase. I'm sure the JS fallback's performance can be improved (it involves a lot of operations on 64bit ints that need to be represented as two numbers in JS), happy to accept PRs [1] :)

[1]: https://github.com/FriendlyCaptcha/friendly-pow/blob/master/...


What are your thoughts on performing a quick initial test on each client to measure their performance, and then tailoring the puzzle to be difficult enough for each?


Once the spammer figures out what you're doing, he'll just throttle the CPU for the duration of the quick test.

Depending on how smart the test is, just having Date.now() return values with -12000, -11000, -10000 ms offsets for the first few calls might even do it.


That looks cool! Can someone create an issue to add support for this to GitLab? And maybe we can consider switching GitLab.com to this as well.


I'm personally interested in this too so I've created one :D https://gitlab.com/gitlab-org/gitlab/-/issues/273480


Thanks for creating this! I think adding support for this in GitLab is a no-brainer. After that we can consider enabling it for GitLab.com


Hopefully you are successful, but how can this scale? If it takes 5 seconds on a desktop, then a server can solve roughly 500,000 captchas per month. At $5 per month for that server, a spammer can still send 1,000 messages for a cent.


It's not enabled yet in production - but the main mechanism is by increasing the difficulty as more requests are made from an IP in a certain timeframe (it's basically rate limiting at that point). Think: every 3rd request in a minute doubles the difficulty with some cooldown period.
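A minimal sketch of that escalation logic; the window size, base difficulty, and doubling rule below are placeholders for illustration, not FriendlyCaptcha's actual parameters:

    import time
    from collections import defaultdict, deque

    BASE_DIFFICULTY = 1          # placeholder units
    WINDOW_SECONDS = 60
    REQUESTS_PER_DOUBLING = 3    # "every 3rd request in a minute doubles the difficulty"

    _recent = defaultdict(deque)  # ip -> timestamps of recent puzzle requests

    def difficulty_for(ip):
        """Return a puzzle difficulty that doubles as one IP requests more puzzles per minute."""
        now = time.time()
        window = _recent[ip]
        window.append(now)
        # drop requests that fell out of the window (this is the cooldown)
        while window and window[0] < now - WINDOW_SECONDS:
            window.popleft()
        doublings = len(window) // REQUESTS_PER_DOUBLING
        return BASE_DIFFICULTY * (2 ** doublings)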

With that, the cost (and complexity) of an attack can hopefully be in the same ballpark as (or higher than) ReCaptcha - without your end user having to label cars or send data to Google.

But in the end a determined spammer will get through any captcha cheaply (for reference: ReCaptcha solves are sold by the thousands for $1) - we just hope we can do better than ReCAPTCHA, especially UX-wise.


I love this concept of proof-of-work captchas, but there's a growing number of tools and ways to bypass IP blocks via IP rotation [1], especially after the explosion of IaaS providers. How do you intend to tackle this?

[1] Some examples: https://rhinosecuritylabs.com/aws/bypassing-ip-based-blockin... https://oxylabs.io/products/real-time-crawler https://github.com/alex-miller-0/Tor_Crawler https://www.scrapinghub.com/crawlera/


There are free and paid lists of all IP addresses from datacenters, like https://udger.com/resources/datacenter-list. They probably exist specifically for preventing this, so maybe that's an option here.


The obvious follow-up question is how IPv6 impacts this, because I think it's supposed to be easy for someone to get their hands on a decent chunk of IPv6 addresses.

Maybe the difficulty could scale as a property of how similar the IP address is to previously seen addresses... so the addresses in the same /64 block would be very closely related, for example. (I think that's how IPv6 works... but definitely something I haven't researched lately, so I could just sound very confused)


I don't have all the answers yet, but indeed rate limiting a larger block (at least /64), or even at multiple prefix sizes with different weighting makes sense.
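Keying the limiter on prefixes rather than whole addresses is only a few lines with Python's ipaddress module; a sketch, with /64 and /48 as the assumed weighting levels:

    import ipaddress

    def rate_limit_keys(addr):
        """Return the buckets an address should count against.

        IPv4 counts per address; IPv6 counts against both its /64 and its /48,
        so rotating addresses inside one allocation still hits the same buckets.
        """
        ip = ipaddress.ip_address(addr)
        if ip.version == 4:
            return [str(ip)]
        net64 = ipaddress.ip_network(f"{ip}/64", strict=False)
        net48 = ipaddress.ip_network(f"{ip}/48", strict=False)
        return [str(net64), str(net48)]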


So the way this is supposed to work is that providers hand out /48s and each site should be allocated a /64. In practice if you for example rent a VPS, you'll be handed a /64 for it by your service provider from their /48.

I would personally treat any /64 as the same. Depending on your local network setup the second half of the address could be anything and could change frequently. You might also get multiple addresses. Whereas getting a new /64, or /48, requires slightly more effort.

Of course there's a risk you'll block a /64 and that takes out some whole company or whatever, but I've seen that happen to corporate proxies that got flagged as a source of spam as well so this is not an easy problem even without the 2^128 address space.


Your website mentions that friendlycaptcha is open source, but looking at the license in the repository, it is a custom license that can't be defined as open source. Can you change the description to "source available"?


Love to see this. ReCaptcha is nothing short of a menace. I'll take a shot at this for my next project


There doesn't appear to be any discussion on your website or on GitHub about why, to be blunt, this is even a good idea in the first place.

A classic 2004 paper, "Proof-of-Work" Proves Not to Work [0], explained that the fundamental problem with proof-of-work bot filters is that attackers will always be able to solve the cryptographic puzzle faster than legitimate users. A touch of security-through-obscurity can help at the margins, but you chose Blake2b, which is used by cryptocurrencies like Zcash, Siacoin, and Nano [1], and as a result there are optimized GPU algorithms (first Google result [2]) and FPGA designs (one of the top Google results [3]). Have you run the numbers on any of those?

The closest to any discussion of these numbers that I saw was a mention on your website that it may take up to 20s on mobile; for comparison, the much-hated image CAPTCHA takes about 6-12s on average for native English speakers, and 7-14s for non-native speakers [4].

In another comment you bring up the idea of starting with a lower difficulty, and increasing it with repeated requests from the same IP address (IPv4, I assume). Unfortunately, access to unique IPv4 addresses is highly correlated with access to more compute power: laptops and desktops in developed countries are most likely to be in a household with a unique IPv4 address, whereas mobile devices on 4G internet and households in developing countries are more likely to be behind Carrier-Grade NAT [5], where thousands or millions [6] of hosts share a pool of a handful or dozens of IPv4 addresses. (The exact same concern applies to IPv6 /64 prefixes.)

This means that mobile devices will face a "double-jeopardy": your service will present them with higher proof-of-work difficulties because the same IPv4 address is shared by more people, and at the same time, the mobile device solves the proof-of-work slower for the same difficulty than a desktop.

Do you have documented anywhere on your website or GitHub how you address these concerns?

[0]: https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf

[1]: https://en.bitcoinwiki.org/wiki/Blake2b

[2]: https://github.com/zhq1/sgminer-blake2b

[3]: https://xilinx.github.io/Vitis_Libraries/security/2020.1/gui...

[4]: http://theory.stanford.edu/people/jcm/papers/captcha-study-o...

[5]: https://en.wikipedia.org/wiki/Carrier-grade_NAT

[6]: Yes, millions. RFC 6598 reserved a /10 for them, which is 4 million unique IPv4 addresses: https://tools.ietf.org/html/rfc6598


I'm not associated with the project in any way, but your well researched comment did miss at least one important factoid.

This comment:

> The closest to any discussion of these numbers that I saw was a mention that it may take up to 20s on mobile; for comparison, the much-hated image CAPTCHA takes about 6-12s on average for native English speakers, and 7-14s for non-native speakers.

Missed this quote from the website:

> As soon as the user starts filling the form it starts getting solved

> By the time the user is ready to submit, the puzzle is probably already solved.

The time spent solving reCAPTCHA is active user involvement. The time being spent on Friendly Captcha is passive and can overlap with time being spent filling out a form.

"up to 20 seconds" was also seemingly presented as a worst-case scenario. Most users' devices would presumably be faster than that, but I don't know how the author researched that conclusion on how performance scales. Friendly Captcha does report back some information on how long it is taking users to solve the captcha, and it looks like website owners could use that to adjust the difficulty based on the needs of their specific audience and how tolerant they are of untargeted spam.

The stuff you point out about Blake2b seems entirely legitimate, and I wonder if an Argon variant would be more appropriate to avoid specialized hardware being quite so problematic.

Personally, I really like the idea of Friendly Captcha. Certainly, there are problems with any captcha implementation. People can rant for many, many paragraphs about websites that use reCAPTCHA... I'm not surprised to see someone ripping apart a different captcha system. The ideal solution would be for spammers to just stop being so obnoxious... but good luck with that plan.


> The time being spent on Friendly Captcha is passive and can overlap with time being spent filling out a form.

Great point!

> I wonder if an Argon variant would be more appropriate

The creators of Argon2 actually also created a memory-hard proof-of-work function they call MTP (for "Merkle Tree Proof", which is a terrible name, totally un-Googleable; I always have to search for the title of their paper, "Egalitarian Computing"): https://arxiv.org/pdf/1606.03588.pdf

A bug bounty for it was sponsored by Zcoin, which is nice. Zcoin is actually considering moving away from it, but mainly because the proof size of 200kb is prohibitive, which is less of a concern for a captcha system: https://forum.zcoin.io/t/should-we-change-pow-algorithm/477

> I'm not surprised to see someone ripping apart a different captcha system

I really don't mean to rip it apart. I just wanted to see some discussion, any discussion, of the well-known flaws with the idea and what ideas OP has to address them.


It is also important to note that the 6-12 seconds and 7-14 seconds reported in the paper is for the garbled text CAPTCHAs, not for image labeling tasks (fire hydrants, cars, etc).


I'll try to provide my thoughts on each of the issues you've mentioned, let me know if there's something I missed.

On using blake2b: I chose blake2b as I was looking for a hash function that is small in implementation, readily available, and already optimized. With WebAssembly the solver can achieve (close to native) speeds and at least be an order of magnitude or two closer to optimized GPU algorithms.
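For readers who haven't seen this style of captcha: the core is a Hashcash-style search for a nonce whose blake2b hash clears a difficulty target. A toy version (hashlib ships blake2b; the target encoding is simplified relative to the real widget):

    import hashlib
    import os

    def solve(puzzle: bytes, difficulty_bits: int) -> int:
        """Find a nonce such that blake2b(puzzle || nonce) starts with
        `difficulty_bits` zero bits. Verification is a single hash; the cost
        is all on the solver's side, which is the point of proof-of-work."""
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while True:
            digest = hashlib.blake2b(puzzle + nonce.to_bytes(8, "big"),
                                     digest_size=32).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    puzzle = os.urandom(16)
    print(solve(puzzle, 16))   # ~65k hashes on average at 16 bits of difficulty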

As for specialized hardware: image tasks (and even more so audio tasks, which must be present for accessibility reasons) have the same issue in that they can be solved by GPU algorithms (i.e. machine learning, where even a low success rate would already be enough). If you search on GitHub you will find there are more ML captcha-cracking repos than captcha implementations - they are probably even easier to get started with than adapting GPU miner code.

Image/audio captcha vs. ML is an arms race that can be beaten for split seconds of compute (even on CPU) or cheap human labeling: it's just as broken. FriendlyCaptcha optimizes for the end user (privacy + effort + accessibility) by not engaging in the arms race - I think it makes a better trade-off. As the sibling comment pointed out, the captcha solving can happen entirely in the background so that hopefully it doesn't even make the user wait.

As for rate limiting/difficulty adjustment: it's not perfect and it could lead to problems if you share the IP with a spammer (and let's be realistic: even with a million users on one IP there won't be tens of users signing up to some forum per minute). Normal captchas also have problems here, though: users from these locales already get presented with much more difficult and more frequent recaptcha tasks (I also doubt they are localized: American sidewalks are harder to label if you've never seen one in real life). Setting a reasonable upper limit to difficulty may be good enough here.

On not using blake2b: I have considered mutating the hashing algorithm every day randomly to make writing an optimized solver for it all the more difficult - but that would mean one could no longer self-serve the JS+WASM and be done with it. I won't rule it out for FriendlyCaptcha v2 if this does ever become a real problem.

Swapping out the hash function should be easy (the puzzles are versioned to allow for this). If you have a different function in mind and someone implements it in Assemblyscript (so we also have a JS fallback) then we can definitely consider it.


Thanks for your detailed response.

I've seen all the projects claiming to have broken ReCAPTCHA—often using Google's own ML services, hilariously—but it's unclear to me how broken image/audio CAPTCHAs are in practice (and the number of GitHub repos doesn't seem like a good measure to me). If they really are completely broken, then why are they still so widely used? If they really are completely broken by ML, how do human CAPTCHA-solving services stay in business?

> FriendlyCaptcha optimizes for the end user (privacy + effort + accessibility)

Good point. I am concerned though that burning CPU cycles on the proof-of-work uses battery life if the end-user is on mobile, without their getting any choice in the matter. What if, given an informed choice, they would have preferred an image CAPTCHA? (On the other hand, that could use more cellular data. Might be good to run the numbers on this too.)

> even with a million users on one IP there won't be tens of users signing up to some forum per minute

I think this is a bad choice of threat model. "Some forum" would likely be better off with simpler measures, like a hidden honeypot textbox: https://dev.to/felipperegazio/how-to-create-a-simple-honeypo...


Cool project, but I do find it quite ironic that it's named friendly captcha when it's not a captcha.


How would you define "CAPTCHA"?


The original expansion was "Completely Automated Public Turing test to tell Computers and Humans Apart".


CAPTCHA: a computer program or system intended to distinguish human from machine input, typically as a way of thwarting spam and automated extraction of data from websites

I would say this Oxford Languages dictionary definition is close enough.


Really nice! Finally someone is using the blockchain technology in a meaningful way!


This doesn't use a blockchain, it uses a Hashcash-style proof-of-work function (an idea that predates Bitcoin by more than a decade): https://en.wikipedia.org/wiki/Hashcash


Awesome work, I will be giving this a try in my next project


> up to 20 seconds on old smartphones

That sounds like a very battery-unfriendly idea.


It's not perfect, but maxing a single core for 20 seconds on an older smartphone is a necessary evil for this kind of captcha.

The alternative: loading a third party script and multiple images (~2MB) to label for ReCAPTCHA and spending time performing the task also takes some battery (and mental) power.


> We do have Recaptcha support which can be configured - are you seeing these kinds of issues with that enabled/configured?

Thanks, I have used Recaptcha for a long time now. It made no difference.

> One item that is on the roadmap that is coming and may be of interest is `Optional Admin Approval for local user sign up` - https://gitlab.com/groups/gitlab-org/-/epics/4491.

Yes, that would be a very sensible solution and welcome feature for my use case here.

Unfortunately, from the bottom of that issue tracker:

"Yikes. I'm glad we did the further breakdown and pre-work. It's a bit cringeworthy looking back and seeing I estimated a 5"


Hi! I'm a PM at GitLab. Please see my reply above for more details but TL;DR we shipped the first iteration of the `Optional Admin Approval for local user sign up` feature in 13.5. I'd love your feedback! Please comment on the epic if there are other changes for this feature that would help your use case https://gitlab.com/groups/gitlab-org/-/epics/4491


Thanks for the update. I can certainly manage user sign-up from the admin tab for the time being. Once it's hooked into email, I believe that will make things maintainable again for me.

From a UX standpoint it's still sub-par. Someone who wants to report an issue doesn't want to wait an arbitrary amount of time to be allowed to report an issue. They are ready to report it at that moment.

And as an admin, I don't want to have to approve new users on a schedule to ensure the delay is low enough that they are still willing to submit the issue after I approve them. I'd much prefer they go ahead and submit the content, especially so that I can use it in my review of whether to approve the sign up or not.

I seem to remember some pattern in Gitlab where my login period timed out before I finished making a comment. When I logged back in, Gitlab had somehow saved my comment content so that I could then post it for others to see. Is there any way to use that pattern for users who haven't been approved yet? So that they can post content, but with a warning shown to them that other users won't see it until the sign-up is approved.


That's a really interesting idea! Users could have limited interactions with the instance and content queued up until approved by an administrator. I created an issue to capture this. https://gitlab.com/gitlab-org/gitlab/-/issues/273542


Relying on Google's Spying-as-a-Service tooling is not very FOSS at all.

There need to be other ways to reach out to users who block Google.


I immediately back out whenever I encounter Recaptcha.

The other day I was forced to endure it, because I wanted to delete my ancient Minecraft account, since Microsoft pulled a Facebook and is going to require a Microsoft account to play going forward. Without exaggeration, it took me 15 minutes of training Google's surveillance AI (I had to solve it three times) for Recaptcha to let me in. I guess Google really hates me.


Yesterday I spent the longest ever with a recaptcha, about 2-3 minutes, at a frigging checkout page. I decided to endure it just because I really needed that ergonomic kb+mouse combo.

Hopefully they'll let me solve captchas for longer without my getting an RSI.


Are you sure you are human?


I'm human enough, and I've been a licensed driver long enough, to recognize that rumble strips at the side of a road are not crosswalks. But apparently enough bots thought they were that the system is now trained on that 'fact', and I as a human am forced to misidentify rumble strips as crosswalks to pass as human.

It's bizarre.


ReCaptcha also thinks that mailboxes are parking meters, for some reason.



Try reCAPTCHA’s audio version (the headphones icon), it’s much easier than guessing what images it wants you to click (if you speak English, have headphones, and are not hearing-impaired).


This sounds like it has the potential to be a modern version of the credit score: avoid it enough, and you become persona non grata. That is, for more than 15 minutes.


I do the same thing.


You're doing something very wrong if you take 15 minutes to solve these and aren't on Tor. Even on public VPN and Firefox this doesn't happen usually. I know people that pick the wrong options to fuck with their models though, and then go on HN to complain about recaptcha being annoying.


I have similar issues. I do not pick the wrong options. It also doesn't take me too long to solve the captchas, leading to "too many queries from your ip address". This is what internet users deal with when blocking most google services.


Thanks for bringing up this epic in the conversation, phikai. I'm a PM at GitLab for our Auth group and am working on the `Optional Admin Approval for local user sign up` feature. I'm happy to tell y'all that we shipped the first iteration of this in our 13.5 release. You can find more information in our release blog https://about.gitlab.com/releases/2020/10/22/gitlab-13-5-rel... . I've also updated the epic with more information about its current status https://gitlab.com/groups/gitlab-org/-/epics/4491#status-upd....


For this specific case, the Wikimedia Foundation has explicitly stated that "It is the Free Software release of GitLab that runs optional non-free software such as Google Recaptcha to block abuse, which we do not plan to use." So, not incredibly helpful at the moment.

Also, is manual approval for new signups a good idea for a large FOSS project? It seems like a pretty big barrier to legitimate discussion.


We (at torproject.org) also adopted GitLab CE recently and we had to close down registrations because of abuse. Tens (hundreds?) of seemingly fake accounts were created in the two weeks we had registrations open, and we had to go through each one of those to make sure they were legitimate. In our case, snippets were not directly the problem: user profiles were used as spam directly.

We can't use ReCAPTCHA or Akismet for obvious privacy reasons. The new "admin approval" process in 13.5 is interesting, but doesn't work so well for us, because it's hard to judge if an account should be allowed or not.

As a workaround, we implemented a "lobby": a simple Django app that sits in front of gitlab to moderate admissions.

https://gitlab.torproject.org/tpo/tpa/gitlab-lobby/

The idea is people have to provide a reason (free form text field) to justify their account. We'd also like people to be able to file bugs from there directly, in one shot.

We're also thinking of enabling the service desk to have that lower bar for entry, but we're worried about abuse there as well.

Having alternatives to ReCAPTCHA would be quite useful for us as well.


You have to remove incentives. Block the viewing of these snippets by logged-out users by default, require opt-in, and provide a way to whitelist snippets per snippet or per user. Same for user profiles.


I don't think this is targeting human views - it's targeting Google for a SERP (search engine results page) boost.


That's the point. Having a way to disable search engines would also work, but wouldn't be obvious to spammers, so they would still try to spam. Disabling all users by default works to remove the incentive to try.


Is this something that we will have in the CE version (the openly licensed one), or will it only go to the enterprise one?


None of your captcha settings work, not even the invisible captcha setting that requires enabling a feature flag.


Have you thought of the option of disabling links? That would make SEO spam impossible


Just adding the attribute rel="nofollow ugc" to any links in submitted content may be good enough. This tells search engines not to index them, or to tag them as suspicious, allowing them to identify SEO spam more easily. [1]

Having both options would be great.

[1] https://support.google.com/webmasters/answer/96569
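To illustrate what that could look like on the rendering side, here is a naive sketch that stamps the attribute onto anchors in user-submitted HTML (illustrative only, not an existing GitLab hook; a real implementation would go through an HTML sanitizer rather than a regex):

    import re

    def add_ugc_rel(html: str) -> str:
        """Stamp rel="nofollow ugc" onto every anchor in user-generated HTML.

        A naive regex pass for illustration; a sanitizer would merge with any
        existing rel attribute instead of blindly prepending one.
        """
        return re.sub(r"<a\s", '<a rel="nofollow ugc" ', html)

    print(add_ugc_rel('see <a href="https://example.com/spam">here</a>'))
    # see <a rel="nofollow ugc" href="https://example.com/spam">here</a>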


The spam is infuriating (not GitLab's fault, of course). At least on our instance at https://git.cloudron.io, we got massive snippet spam. After we disabled snippets, we got massive spam on the issue tracker (!). The way we "fixed" it was by turning on mandatory 2FA for all users.

As a general lesson, what we learnt is that these are not bots. These are real humans working in some poor country, manually creating accounts (always gmail accounts) and pasting all sorts of random text. Some of these people even set up 2FA and open issues with junk text; it's amazing. Unfortunately, GitLab from what I can tell cannot make issues read-only to non project members (i.e., I only want project members to open issues; others can just read and watch issues).

Currently, our forum spam (https://forum.cloudron.io) is way more than GitLab spam. On the forum, we even have Captcha enabled (something we despise) but even that doesn't help when there are real humans at work.


We had one of the "real humans" write to us (in issues) asking us to leave his spam up for "just a few hours".

We implemented a filter anyway.

(This was not Gitlab, but a specific form on our unique website.)


> asking us to leave his spam up for "just a few hours"

What... why? What is their goal???


Feeding their family?


But like... who's paying for that kind of spam??


In our case, mostly pirate TV streaming services. (Watch NBA games live etc.)


A lot of people unfortunately seem to bite on those "SEO experts" kind of emails. I had a few clients ask me if they should give it a try, since, "why not, it's cheap".


Why are they posting random text in Gitlab?


This is a typical spam profile. Usually they contain links, which search engines follow.

https://forum.cloudron.io/user/cardioaseg


The link contains rel=nofollow.


That doesn't matter; see other comments below on Google's changing treatment of this attribute.

Also you'll find spambots posting on any open form on the internet even if it doesn't do them any good, because much of it is automated, so even if you hide the results the spam will still come in.


I don’t know for sure, but I think our Markdown implementation adds nofollow.


I used to think that spammers would stop if their spamming didn't win them any results. But they don't care. They spread their spam as widely as possible without trying to prune out the places where it does them no good.


I am not entirely sure. See https://forum.cloudron.io/users; if you go to, say, page 10 or so you will see all sorts of nonsense. I am still trying to figure out the best way to fight this spam (because captcha is enabled and required to even create accounts). But these are real people and not bots. I know this because they even post new messages all the time.


Definitely the SEO backlinks: for example, one profile I see is linking to an Indian escort service in the profile.


Maybe GitLab needs an option to disable external linking, and filter any comment that contains an external link automatically


Or a nofollow option (add rel=nofollow)


That's a great idea. We have discussed ways of adding a trust level and enabling this for specific groups. Discourse uses the same system for preventing spam. "Good" bots detect the rel=nofollow and do not come back.

See my proposal here: https://gitlab.com/gitlab-org/gitlab/-/issues/14156#note_258...


Iterating on my original thought, here is a smaller feature request for self-hosted GitLab instances. This can help GitLab.com too: https://gitlab.com/gitlab-org/gitlab/-/issues/273618


I still think an even better path to success is to allow entirely disabling linking for non-admins.

Google no longer treats "nofollow" as strongly as it used to: https://webmasters.googleblog.com/2019/09/evolving-nofollow-...


Thanks for sharing, I have added it to the issue, maybe you want to join the discussion there :) https://gitlab.com/gitlab-org/gitlab/-/issues/273618#note_43...

Just so that I can follow - URLs posted by non-admins should not render as HTML URLs at all? Wouldn't that be quite limiting for OSS project members for example?


My opinion on the topic isn't definitive by any means, but I think a lot of projects would do just fine without allowing arbitrary hyperlinks to be added by non-admins.

I think being able to link to related issues and link into the code is still important, for example.

It's certainly a trade off, but spammers want it to be rendered as a link.


getting that sweet sweet seo backlink juice


Isn’t that why we add rel=nofollow to low friction user submitted links on our platforms?


Google changed the interpretation of those a year ago. https://webmasters.googleblog.com/2019/09/evolving-nofollow-...


> Looking at all the links we encounter can also help us better understand unnatural linking patterns.

It appears as though they want to mark these links in order to prevent inorganic SEO, not help it.


I don't get it. They post all this spam in the hopes that people click on the links therein, thereby boosting the ranking of those sites? Does that actually work at all?


It doesn’t actually require anyone clicking on the links. Google sees inbound links and uses that as a factor when calculating the ranking of the linked page.


I thought that was how it worked like a decade or more ago, but not today.


Regardless of whether it works, people still pay for it. I have a Facebook ad right now that says "Get over 500,000 backlinks for $29.99". No doubt it's someone with a bot that spams comment forms.


A service like Stop Forum Spam might be a solution to this. It checks the IP address and email address and gives each a score based on how likely it is to belong to a spammer.

When they have to set up a new email account and maybe even a new IP address for every few accounts, it gets to be a lot of work soon.

https://www.stopforumspam.com/


How do you know they are real humans? I imagine bots doing 2FA would still be cheaper.


I know this because in our forum we have LOTS of "spam" users - https://forum.cloudron.io/users . These users will go into posts and actually make "helpful" comments. Like, "Oh I tried this solution but I found that my disk my full. Deleting data fixed my problem". It almost seems genuine but they build reputation and once they have some votes, they go back and edit all the comments to have links.


Banning entire countries helps a lot. I don't want to name certain countries, but let's assume it's one where it's common to see human corpses floating on a big river.


That doesn't help narrow it down. I live in Seattle and the first thing I thought of was a popular tiktok of teens finding a corpse in the river last month.


The word you missed was "common."


Apart from the fact that banning an entire country from contributing to their code would be antithetical to the Wikimedia foundation, if you're implying the country which I think you're implying (which is also where I live, btw) you'll:

1. Ban a burgeoning tech industry which has produced over 20 unicorns, receives billions in funding from across the world, and produces world-class tech talent;

2. Ban millions of other OSS developers from contributing; and

3. Just lead to SEO spammers picking out other impoverished countries to spam from, which means you'll finally end up with only people from the "west" being able to contribute in any way.


Many bots are likely still powered under the hood by humans.

On my backlog of projects to do is a browser extension that solves the more obnoxious captchas for me, as I'm regularly behind a VPN and fall into ridiculously long solve loops.

On the most popular API I could find, $10 buys you a shocking number of solves (not that I've tested it yet). It is automatable but ultimately still powered by humans.


It’s incredibly sad how the open web is being destroyed by google’s recaptcha.


Without google's recaptcha, do you think there would be less spam?

Personally, I suspect there would be more without at least some speed bumps to raise the cost of spamming. I would absolutely love for there to be better options than recaptcha that meets the same needs around bot-detection, price, implementation effort, and accessibility. It is, sadly, the best option I've seen on offer.

You're right. The scenario we're in is incredibly sad. It would be wonderful if the individual actors involved had better options to meet their needs.


I'd argue that it's equally sad to see the open web get destroyed by massive DDoS attacks and malicious actors. How would you keep your own website up if it was constantly being attacked?


You're barking up the wrong tree. Bad actors create abuse and spam which they can do because of fundamental weaknesses in the design of the internet. People trying to solve that reality with Recaptcha (and Cloudflare for that matter) aren't the ones destroying the internet.


I don't think it's so much the wrong tree as it is but one tree in a forest to be barking up.

All the maturely developed bot filters frequently throw me in an endless battery of tests that have me giving up in frustration before finally making it through to content I'm requesting.

> aren't the ones destroying the internet

IMO they are every bit as much destroying it as the abusers they're claiming to fend off.


I'm totally in that camp of opinion, although I'll acknowledge the escalating abuses carried out by both "sides."

In the meantime, I hope to have the savviness to program my own way out of unsolvable captchas.


Already exists: https://github.com/dessant/buster

Edit: on re-read, you meant solving using humans. Buster uses speech-to-text APIs to solve.


Every lead to solve (e: "solve", lol gboard) my problems is probably worth a peek. I'll take a look, thank you.


Could add a disallow rule to your robots.txt and advertise the fact on the signup page that search engines won't find this content.


At GitLab Inc. we have a Trust and Safety team https://about.gitlab.com/handbook/engineering/security/opera... that prevents spam.

So far that functionality has lived in separate repositories from the core codebase since few people needed it, the cycle time was quicker, and it is an advantage to not have the spammers see the code.

If there is strong interest in collaborating on this I'm sure they will be happy to engage. I'll ask them how best to structure this.


There's been a gitlab bug open for almost 3 years to stop relying on recaptcha: https://gitlab.com/gitlab-org/gitlab-foss/-/issues/45684. Debian, KDE and Gnome have never wanted to make their users run Google's nonfree javascript blob to contribute on their gitlab instances. There's been interest; Gitlab has done very little about it. Edit: other bugs about this can be found here https://gitlab.com/gitlab-org/gitlab-foss/-/issues/46548


We have a team currently working on improving the detection and mitigation of spam. We continue to look for ways to improve the security and user experience of our product. Our product includes the Akismet spam filter, which you can read more about in our handbook: https://about.gitlab.com/handbook/support/workflows/managing.... Further, Gitlab.com includes the ability to report abuse directly to our trust & safety team here: https://about.gitlab.com/handbook/engineering/security/opera...; however, the report-abuse feature on self-managed instances reports back to the instance admin. We are also currently developing an anti-spam feature intended to further improve spam detection and mitigation. This is set to be enabled on GitLab.com within 3 months.


As mentioned above in the thread, multiple times, maybe a simpler solution to reduce spam is to remove incentives by:

- removing links (rendering them as plain text, forcing users to copy-paste them)
- hiding links from non-registered users (plain text to non-registered users, clickable for registered users)
- blocking links from search engine crawlers (robots.txt / rel=nofollow, etc.).

Maybe these fall into the "for each complex problem there is a simple but wrong solution" category, but it sounds like it's worth a try.


(I already replied on a different thread but this might make more sense)

A service like Stop Forum Spam might be a solution to this. It checks the IP address and email address and gives each a score based on how likely it is to belong to a spammer.

When they have to set up a new email account and maybe even a new IP address for every few accounts, it gets to be a lot of work soon.

https://www.stopforumspam.com/

It has a very simple API and is not that hard to implement (really, I have done it myself :) )
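
For anyone curious, here is a rough sketch of that kind of check in Python (not GitLab's code; the endpoint and field names follow the public SFS usage docs as I remember them, so verify before relying on it):

    import requests

    SFS_API = "https://api.stopforumspam.org/api"

    def looks_like_spammer(ip, email, min_confidence=25.0):
        # Ask Stop Forum Spam about the signup's IP and email address.
        resp = requests.get(
            SFS_API,
            params={"ip": ip, "email": email, "json": "1"},
            timeout=5,
        )
        data = resp.json()
        if not data.get("success"):
            return False  # fail open if the service is unreachable
        for field in ("ip", "email"):
            entry = data.get(field, {})
            if entry.get("appears") and float(entry.get("confidence", 0)) >= min_confidence:
                return True
        return False

    # Call it when the registration form is submitted, e.g.
    # if looks_like_spammer(request_ip, form_email): reject_or_flag_signup()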


Appreciate the response - I'll look into it now


Okay, thank you. I see GitLab is mostly Ruby. Just to give a general idea of the code, this is a simple PHP function that uses it:

https://plugins.trac.wordpress.org/browser/gwolle-gb/trunk/f...

That function can be called when the registration form has been submitted. It will return true or false. Forget about the transient stuff, that is just WordPress caching.

You don't need an API key like with Akismet. You would only need one if you want to add or remove entries from the SFS database. It really is much simpler. Of course you might want to have a checkbox in the settings. But still, in an afternoon you might be able to finish this :)

Wish you the best.


Great suggestion, this looks like a very straightforward service and implementation. All open source as well.


I think the core of this problem is that it is hard to identify whether a user is a bot or a human. I've not seen any elegant free solutions to this.


That is not the core of the problem. Spammers are humans, and sometimes they will solve recaptchas in large quantities to get their spam through. It's about giving administrators a multi-pronged approach to stay ahead of them. For some examples of free solutions see https://www.mediawiki.org/wiki/Manual:Combating_spam. It's even possible to connect SpamAssassin to forms. GitLab needs tools and automation that detect and roll back spam, ban users, and knobs to tune restrictions and rate limits based on how spammers are acting. GitLab Inc. just hasn't seemed to care much about helping people who are trying to use GitLab and keep their software freedom.


I think the focus of our Trust and Safety team has been on GitLab.com and not on all GitLab instances. We'll discuss changing this.


Thank you.


GitLab team member here. We just added a new page to our Handbook where we share approaches to preventing, detecting and mitigating spam on self-managed instances of GitLab. https://about.gitlab.com/handbook/engineering/security/opera...

We want to hear from you! Instructions on how to contact us: https://about.gitlab.com/handbook/engineering/security/opera...


I'm curious about the spamassassin integration. Do you know of any open source projects currently using it for a web application?


I'll be curious to see whether they even use GitLab user auth. For Gerrit (and Phabricator), Wikimedia already requires contributors to have a dev account on Wikimedia's LDAP system: https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_...


Can you say more about how it's a problem if people can view things without logging in? Naively I would have seen that as a plus.


If you allow new users to create user profiles with links, and those user profiles are visible to Google, spammers will create a bunch of new user accounts and fill them with spam links.

The easiest way to prevent this is to block Google from seeing user profiles by requiring login to see the profiles.


> "and those user profiles are visible to Google"

googlebot adheres to robots.txt, right?

in which case couldn't self-hosted gitlab admins add a robots.txt entry for the profile page url?
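
Something like this, say (the paths are illustrative guesses rather than GitLab's actual routes, so check them against your instance):

    User-agent: *
    Disallow: /-/snippets/
    Disallow: /users/

Though as others note below, that only keeps well-behaved crawlers from indexing the spam; it doesn't stop the spam from being posted.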


That requires the spammers to notice that it's blocked in robots.txt, which seems optimistic


There is also the option of adding rel="nofollow ugc" to user-submitted links, which removes the benefits of linking for spammers.

https://support.google.com/webmasters/answer/96569
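
i.e. rendering user-submitted links roughly like this (example.com is a placeholder):

    <a href="https://example.com/spam" rel="nofollow ugc">https://example.com/spam</a>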


Most bots aren't going to bother checking if you've done that, so while they'll not get the expected benefit you'll still get the spam.


Because spammers fill the publicly viewable things-- like snippets and user profiles-- with spam. If it can be viewed without logging in, then they get it indexed with Google and it dilutes the search results.

Nobody views snippets and user profiles as a common part of daily development, so it takes time away from development to investigate those things to prune them. And if you don't prune it fast enough, it gets into the search results at which point it's even more of a pain in the ass to remove (even using Google's webmaster tools).


> Because spammers fill the publicly viewable things-- like snippets and user profiles-- with spam. If it can be viewed without logging in, then they get it indexed with Google and it dilutes the search results.

Isn't then the problem not that it is viewable but that it's not excluded from indexing by robots.txt?


Anything visible without login will be visible to people who want to follow a bare URL to your tracker and visible to a search engine crawler, making it visible to people without a URL who just search for the issue. That is, indeed, a plus.

But even if you require login to post stuff to the issue tracker, creating a login and posting a comment has been trivially automated.

You're no longer running a useful issue tracker, you're running a free ad network: you're hosting a dozen useful issues and a thousand advertisements and blackhat SEO comments for spammers.

If it's not visible to search engines, and your repo doesn't get much traffic, it's not nearly as valuable to spammers. It's kind of like cutting off your nose to spite your face, but those basic economics of cost to spammers, cost to users, value to spammers, and value to users are the only rules that you can really apply when hosting content on the Internet.


> creating a login and posting a comment has been trivially automated

Isn't this what various CAPTCHA tools handle?

You can also require email address validation.


1. Create user account

2. Create spam content (snippets, profile, etc.)

3. Get spam content search indexed

4. Profit


a free user sign up is not going to prevent scraping...


I think the idea is that if you can't view issues without logging in, then google won't index your issues (because it can't view them), so you won't get people spamming in order to get into google


Over the years I've frequently seen Google search results showing things that require login to the indexed site. Has that changed?


I believe that requires the site to give Google's bot a logged-in view rather than something Google does themselves.


It also means 95% of people will stop casually viewing issues.


The problem isn't scraping, it's spam.


Strange, I never saw this behaviour on our GitLab instance, invent.kde.org.


You seem to be using a central login system (https://identity.kde.org/) that requires going to a separate website to create an account which presumably is non-standard enough to throw off most bots.


invent.kde.org uses the nonfree Google reCAPTCHA, which mostly prevents it. Not very nice of KDE to make people run a nonfree software blob in their browser that gives up their freedom, gives up their privacy to Google, and trains Google's proprietary machine learning models.


Where does it use that?


Personally, I also think GitHub's Checks API and its GitHub bots give a much better experience compared to GitLab.

On a daily basis I am confused about how diffs are rendered in GitLab merge requests, as it has a weird way of rendering ${blah} inside strings.

Also, when you want to check if an issue exists in GitLab's own repository, you always end up in a jungle of redirected tickets. Just now I got redirected to three different tickets, as they switched projects or something. Really annoying.


Can you share an example or screenshot of what you mean around the rendering issue in merge requests?

As for the issue redirects - it does stink. It's an artifact of our move to a single code base for CE and EE [1]. A lot of issues have long-standing SEO so the "old" issue often comes up in a Google search.

[1] https://about.gitlab.com/blog/2019/08/23/a-single-codebase-f...


Yes, for example, in today's pull request, I added a new file and now in my Typescript file it renders things like this, which I find confusing: https://imgur.com/a/C82hcMK


In general settings, you can check "Public" under "Restricted visibility levels". According to the blurb, "selected levels cannot be used by non-admin users for groups, projects or snippets".

Is that not what you want with #2?


It's what I want for #2, but it has the unfortunate side-effect of restricting visibility for my main public repo. My #1 goal above is for people to be able to clone from my main repo without logging in.


I see. And as soon as you make the repo public you end up with public issues again, unless you restrict them to project members...

And a workaround of auto-syncing just the code to a public repo, with issues and other features disabled, isn't available natively in CE.


If you can, place your GitLab CE instance behind an LDAP server. Have another site handle signups. (Admittedly, setting up something with LDAP is often a massive pain. I duct-tape around it by using LdapJS on top of a CMS.)

I've had a handful of projects where human spammers will bother to create an account and jump through the hoops, but in the 2-3 years of running a GitLab instance, which has 1300 users, I've only had 2-3 incidents (we keep an eye on recent projects, snippets, etc).


The GitLab LDAP config is pretty easy.


I would encourage folks to look at Gitea.io. I run that on Kubernetes alongside Drone and it basically replicates all the most important parts of GitHub.


You'd think Wikimedia in particular has experience with the issue of spam bots polluting the site's search results.


Is this mainly a concern for the Gitlab issue tracker? Wikimedia will continue to use Phabricator for issue tracking, Gitlab CE will only be used for CI and code review/hosting...


No. Spammers will create repos and user profiles and snippets and anything they can with spam in them.


I would imagine authentication being done through Wikimedia's existing LDAP or MediaWiki solution, and I hope that features that already exist in Phabricator (such as snippets) will be disabled.


Is it possible to configure a robots.txt file to accomplish #2?


No, robots.txt is for well behaving bots like bing bot and google bot, not bots that will spam your forums (and Git repo apparently).


The suggestion, I guess, was that it doesn't stop the spam from being created, but it does stop the spam from ruining your site's reputation in search results.


But the goal is to prevent spam in the first place. I don't think these bots will verify robots.txt to see if the spamming is effective. They just spam anything they can get their hands on.


They probably don't have general code that spams any form; it's more likely that they have code specific to GitLab CE instances that knows to post snippets. If GitLab changes their default configuration so that those snippets are no longer indexed by Google, the spammers are likely to stop using that GitLab CE spamming script after a while.


But if the issue is SEO bots then robots.txt would block the search engines thus meaning the spam content is of no importance (it's effectively private) and doesn't cause issues for the main sites SEO (nor help the spammers).


To be honest, I'm not certain of the purpose of the spam.

Some portion of it would end up in the search results, sure.

But I don't know if there's some secondary benefit to, say, a casino showing a link coming from my site even if my site has a robots.txt saying that the address for that link isn't to be directly indexed.

Is there such a benefit? If not then I'll just set up the robots.txt and observe whether that does indeed solve the problem. But I'd much prefer to just set up the permissions I know I want on my own running instance than spend time making inferences about the reasons bots are abusing my instance's inputs.


That's right. I'm talking about SEO spam. Basically anything that has a url where the content includes input from the user will be spammed.

I'm fine with, say, the spammers hammering the main repo's merge requests and issue tracker. Those are things any healthy project will check regularly-- I'm even fine just pruning the spam there by hand (and historically I haven't gotten a lot there anyway).

But I don't regularly look at the global view of snippets, and I don't want to regularly prune the global user list for SEO spam in the user profiles. There's no good reason most FOSS projects need those things to be publicly viewable, anyway. But AFAICT Gitlab's admin settings only have a single setting that affects all these things across the board. So if you make snippets viewable only to logged in users, then nobody can clone from the main repo without logging in.

It's quite frustrating, and Gitlab shows no interest in disabling or hiding features like snippets and user profiles.


robots.txt is nothing more than a request. It's essentially a "no dog on lawn" sign in the yard of a vacation home.


I wonder how effective QuestyCaptcha would be on GitLab.


Generally speaking, QuestyCaptcha works great if you fly under the radar enough that nobody is putting any effort in, and it's just automated bots. It tends to fall apart the more high profile you are.


Can't you just put the gitlab instance behind an nginx proxy to achieve this? Like, if you are requesting ^/user/$, check for a cookie; if invalid, return 403
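
Something like this, as a rough sketch (untested; the profile path and GitLab's session cookie name are assumptions that would need checking against your version):

    # deny anonymous requests to user profile pages at the proxy
    location ~ ^/users/ {
        if ($cookie__gitlab_session = "") {
            return 403;
        }
        proxy_pass http://gitlab_backend;  # assumed upstream pointing at the GitLab instance
    }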


I honestly can't see why someone would go through the trouble of making sure their instance is correctly configured and available when there are solutions (like GitHub) that just work out of the box.


Control. Github can change the behavior, pricing, availability, or security of their offerings at any time. If they get hacked then you could suffer. GH is also closed source.

For many a self hosted solution is better, despite the costs.


It's not a lot of trouble. To be fair, my GitLab instance isn't set up for large numbers of public contributors. I have a somewhat limited network connection and work on projects that often have large-ish codebases, building Docker images, etc.... and I do 90% of that on my home network (local servers, storage, etc....). So running GitLab locally allows me (and a few other folks) to get all those nice features without relying on the world-facing internet connection and without lots of delays moving large files up and down...


That's the use-case I can understand (if someone has a large number of machines or a fast internal network behind a VPN). My comment was more aimed at public facing projects that are open and accept contributions from anyone.


In the past, I've found Gerrit to be reasonably good. Phabricator, on the other hand, not so much.

Having worked with MediaWiki in the past on CRs, I think this will be a good move to modernize things for them.

When faced with a similar task around the same time at Wikia (now Fandom), we chose GitHub while we were moving off of SVN. I'm glad we did at the time, even without all the additional features GitHub has.

I understand why WMF didn't choose GitHub. Compared to their current stack, Gitlab is going to feel like a serious upgrade.


What issues did you have with Phabricator? I'm maintaining phabricator for the WMF and I'm interested in anything that could improve the user experience.


I was reviewing code in Mercurial's Phabricator and it was awful. The most notable issue was that when a new version was uploaded, the comments stayed on the same line number instead of sticking to the same code.

There were other annoyances but it would move at least to "ok", maybe even "good" if that was fixed.


This is not (and has never been) the behavior of Phabricator.

See <https://secure.phabricator.com/T7447> for discussion of why this feature can never work the way you think it should work in the general case and why I believe other implementations, particularly GitHub's implementation, make the wrong tradeoffs (GitHub simply discards comments it can't find an exact matching line for).

If you believe this feature is possible to implement the way you imagine, I invite you to suggest an implementation. I am confident I can easily provide a counterexample which your implementation gets wrong (by either porting the inline forward to a line a human user would not choose, or by failing to port an inline which is still relevant forward).


I don't need perfect, I just need good. GitHub and GitLab both have good implementations, as does every other good code review system I have used. GitHub annoyingly tries its hardest to hide the "outdated" comments, but GitLab has the option to keep them open (they are no longer visible in the code, but remain on the discussion tab).

So I appreciate your opinion that it is impossible, but as a reviewer I much prefer when the tool tries.


GitHub's implementation does not do what you claim it does. GitHub has no behavior around porting and placing comments (while Phabricator does), GitHub just hides anything it can't place exactly. See my link above for a detailed description of GitHub's very simple implementation. I believe this is absolutely the wrong tradeoff.


I use GitHub every day, I've definitely seen it preserve some comments. Sure, it drops a lot. But I still prefer this to dropping them all, or showing the comments on the wrong lines.


I mean that GitHub does not "try", in the sense of looking at the interdiff, doing fuzzy matching, trying to identify line-by-line similarity, etc. It places comments only if the hunk is exactly unchanged and gives up otherwise.

Phabricator does "try", in the sense that it examines the interdiff and attempts (of course, imperfectly, because no implementation can be perfect) to track line movement across hunk mutations.

My claim is that all comments which GitHub places correctly, Phabricator also places correctly. And some comments which GitHub drops, Phabricator places correctly (on the same line a human would select)! However, some comments which GitHub drops, Phabricator places incorrectly (on a line other than the line a human would select).

So the actual implementation you prefer is not one that tries, but one that doesn't try! Phabricator could have approximately GitHub's behavior by just deleting a bunch of code.

That's perfectly fine: many other users also prefer comments be discarded rather than tracked to a possibly-wrong line, too. I strongly believe this isn't a good behavior for code review software, which is why Phabricator doesn't do it -- but Phabricator puts substantially more effort into trying to track and place comments correctly than GitHub does.


In particular, see <https://secure.phabricator.com/T7447#112231> for a specific example which I believe GitHub's implementation gets egregiously wrong, by silently discarding an inline which is highly relevant to discussing the change.


After reviewing some tools we went with Phabricator ourselves; it's not ideal, but it's open source (read: free, we can't afford a $x / seat license) and self-hosted.


Phab has been great for my side projects.

- You can host Phabricator on a $5/mo VPS and have CPU to spare, whereas GitLab Ruby is a big hog that requires a $20/mo box minimum.

- Able to deploy + configure it within a few hours, even with fancy features like emails via mailgun, using pygmentize to highlight code, and observed + managed repos. Defaults are all reasonable and get you moving quickly. Haven't had to touch config since I set it up.

- The Kanban + stories + PR flow is wholly sufficient. Arcanist grows on you fast. It totally abstracts the PR workflow for most any VCS and can help enforce practices (e.g. reviews, sign-off, merging, etc). "Projects as tags" feels weird at first but ends up giving you fantastic cross-sectional views of your issues.


Phabricator is amazing as an all-in-one solution for a small shop.


I think Phabricator is a really powerful tool for engineering teams, but when you try to do more cross-functional team collaboration, it's not as user-friendly as GitLab.

I used Phabricator at a previous company and miss some functionality, like Phabricator's ability to show issue dependencies in a more intuitive and granular way -- but at that company, we had a lot of trouble getting the Design team to use Phabricator, for example.

As OSS communities continue to onboard newcomers, they're faced with a generation that expects modern interfaces that are user-friendly. Having user-friendly tooling also helps promote diversity of OSS communities since it's easier to onboard people with all sorts of backgrounds, since the technical adoption barrier is lowered.

I think GitLab is a clear winner here since it's user friendly and designed for cross-functional team collaboration (GitLab dog foods their own product in all departments of the team, so you have HR, Marketing, Finance, etc all using it, in addition to the full product teams).

Full disclosure: I work at GitLab as the OSS Program Manager. Part of the reason I joined was because I feel really strongly about GitLab's ability to lower the contribution barrier and get more people involved in OSS.


I'm curious about this because I've wanted to get off Phab as soon as I started using it

- What are your thoughts on "arc"? It seems like a whole can of worms of problems you can run into with basic branch flows. I know teams that have complex branch flows and it is a nightmare. Same for Windows users.

- What do you use for a CI? How well does the integration work for you?

- How is it with tracking conversations on Diffs?

- Any particular plugins or bots for it that help make the difference?


@epage -- Unfortunately I can't speak to the branch flow or bots question. Perhaps someone else here can? Other answers are below.

-- Re: CI -- We dogfood GitLab CI via gitlab.com -- so no need for integrations.

GitLab non-engineering teams use CI all the time because we constantly update the handbook to document all of our work.

I would love to see this practice more often in OSS orgs, and other companies for that matter. Having a handbook-first approach (https://about.gitlab.com/company/culture/all-remote/handbook...) really helps enable remote team collaboration and makes it easier for newcomers to jump in. I think OSS orgs have done a good job of recognizing the importance of documentation for development projects, but there's an opportunity to increase documentation around workflows and community operations.

-- Re: tracking conversations on Diffs -- Admittedly, I don't have a lot of experience with this outside of GitLab. But maybe that's the point. It's easy to chime in on diffs on merge requests on GitLab, and one of my favorite features is "suggesting changes" where you can add in a suggested update to a diff and the author can choose whether or not to apply it.

Here's a link with info about suggested changes: https://docs.gitlab.com/ee/user/discussions/#suggest-changes

It's within the larger doc addressing discussions in GitLab in general: https://docs.gitlab.com/ee/user/discussions/

-- Btw, for anyone interested, here's more info on using GitLab for project management:

- https://about.gitlab.com/solutions/project-management/
- https://www.buggycoder.com/project-management-with-gitlab/
- https://thenewstack.io/gitlab-issue-board-project-management...

I gave a presentation about cross-functional team collaboration using GitLab at GNOME's GUADEC this year. Here are the slides: https://events.gnome.org/event/1/contributions/70/ .. As a program manager, I'm generally really excited about this topic!

-- Some of the features I talk about are not available as part of the Community Edition, but there's the GitLab for Open Source program which gives OSS projects access to our top tiers, plus 50K CI mins per month, for free.

I'm hoping to make the program's page more discoverable, but in the meantime, here's the link: https://about.gitlab.com/solutions/open-source/


What features do you think Gitlab lacks compared to GitHub?

I haven't used the CI/CD features of either, but PR/MR features seem comparable. Is it the advanced workflow stuff and CI/CD integration where GitHub is better? Bots?

I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

But I would be a lot more worried about being locked-in to GitHub than Gitlab.


> I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

A few paragraphs I recently wrote elsewhere:

The entire state of code forges as a general thing in 2020 is all the evidence you could possibly want that version control systems (Git, I'm talking about Git) are themselves massively deficient in design.

I rant about this all the time, but there is an entire class of argument about how & whether to use GitHub / GitLab / Gitea / Phabricator / Gerrit / sourcehut / mailing lists / whatever that would mostly vanish if the underlying data model in the de facto standard was rich enough to support the actual work of software development. Because it's not, we find ourselves in a situation where no widely used DVCS is actually distributed in practice, and the tooling around version control is subject to platform monopolization by untrustworthy actors and competitive moats.

Code review should itself be distributed/federated, but few of the people involved have incentives to make that happen. It's possible something like https://github.com/forgefed/forgefed will eventually get traction, and Git has been dominant for long enough that I wonder all the time when we might see a viable successor that learns from its fundamental mistake. In the meantime we're forced to choose from a frankly pretty terrible lot of options in the broad structural sense.

(For clarity, I'm a WMF employee and am involved in the decision to migrate to GitLab.)


I feel like the git model makes a lot of sense when viewed as an extension to the mailing list code review system. But most people don't want that model. However, trying to fit git to other models is a bit of a round peg in a slightly square hole, imo.


Yeah, from that angle and from the perspective of 2005 it's a reasonable design, and I think what I describe above as a massive deficiency only really becomes visible in the light of everything that's happened since.


To me, it sounds like the issue is that you need a central source of truth that everyone can pull from for their purposes, and distributing the code review part doesn't sound like it'll add much. In the current climate, most anyone requesting code review is probably trying to merge into the main central source of truth anyways, so what actual benefit does it bring to either the maintainers or the contributors?


Version control for a genuinely long-lived project is a problem that often outlasts:

- Dominant version control and code review system(s) / paradigms.

- The current configuration of institutional owners.

- Users' trust in an owner / sponsor / maintainer. (Forks happen for reasons.)

- The involvement of developers who remember why and how decisions were made.

- The trustworthiness of the entities that control services, applications, and network real estate used for development.

Some central source of truth is usually necessary, but maintainers and contributors don't benefit when that source of truth is subject to vendor lock-in or can otherwise only migrate at great cost. For all the collaborative benefit that GitHub has undeniably wrought, platform monopolies are eventually a failure mode for end users, at least as for-profit enterprises. With the exception of the dominant silo vendors, nobody in the ecosystem really benefits from being forced to choose a silo that will be hard (and lossy) to escape later. The silos are engineered to limit mobility and channel interoperability to their own ends, for business reasons that run directly contrary to the interests of their users.

If the protocol at hand were actually up to the task, we'd spend less effort and anxiety on the problems of all the non-protocol platform tooling that's been built up around it.


> an entire class of argument [...] mostly vanish if the underlying data model in the de facto standard was rich enough to support the actual work of software development.

Interesting idea. You think we could develop a unified data model that covers source code, static files, documentation, project management and community management as a single unified thing?

That’s certainly ambitious, and I’d love to see it. For the moment it seem that Git has won for source code (in a pretty crowded field) because just that part was hard and it was a big improvement. The collaboration tools it includes, mostly around email, appear to be inadequate for most projects. So now we see a healthy ecosystem that adds rich collaboration on top of / next to Git.

> no widely used DVCS is actually distributed in practice

I think this is due to economic and social factors rather than technical ones. Fully distributing a Git repo is very doable, but harder to think about than the Github model. Plus you have all the normal P2P problems around who’s online and how good their connection is.

> tooling around version control is subject to platform monopolization

Again, I think this is simply the social network effect more than anything else. Making a website for your project let’s people find it, use it, and contribute to it. The bar to entry is lowered further if it’s a common platform, where people already have accounts and know how it works, and where they can get a consolidated view of all their activity.

Centralized hosting makes even more sense as projects grow and you only want a subset of the code on any given development machine. Eventually big monorepos present serious scaling challenges.

Still... I completely agree that it would be awesome to have a more self-sovereign computing architecture writ large. I’m just pessimistic we can get there from here.


> You think we could develop a unified data model that covers source code, static files, documentation, project management and community management as a single unified thing?

Realistically, not exactly, given how much space some of those things cover.

I do think that entities like code review are as much a part of the history of a project as the deltas to code. Reviews not being first-class objects in the VCS itself has turned out to be a crack into which you can wedge an entire GitHub.

I won't claim I know where best to draw the line here. Better handling of large static files by default and a robust way to model relationships between projects obviously belong within the VCS. On the other hand, relationships modeled in issue tracking systems and the like are also part of the software's history, but past some level of complexity it gets much harder to imagine wedging them into something that you can pass around like you clone a Git repo. All I can really say for sure is that it feels broken that all of this stuff lives in competing application silos.

(As a sidebar: Not that you can't jam things like review data into git-as-data-store. Gerrit does just that. But nobody's going to mistake that for a usable interface to code review.)

Anyhow, I don't think you're wrong about the social & economic factors, but I think a different landscape with less concentration of power could have shaken out if (for example) easy code review had been baked in and host-agnostic early on. Fully p2p architectures aren't feasible, or even necessarily desirable, for a lot of problems - but it shouldn't be too much to ask that things are able to be federated and resistant to capture by a single vendor.

> Still... I completely agree that it would be awesome to have a more self-sovereign computing architecture writ large. I’m just pessimistic we can get there from here.

Yeah, fair enough. I am myself boundlessly pessimistic about the future of computing generally.


> I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

It does include git send-email, and I think Sourcehut’s use of that for issues is nice (and they quote customer claims that “SourceHut mailing lists are the best thing since the invention of reviewing patches.”).


Let's not kid ourselves here, it's because GH is owned by MS.


It's not. This isn't 2000s /. "M$ is teh evil!!!11".

It's because GH is not available as self-hosted open source. Doesn't matter who owns it. GitHub was discussed and rejected by Wikimedia back in 2012 as well, which was before MS bought them.


Gitlab is worse on almost every angle compared to Github

It simply lacks the attention to detail...you can tell that Github walks the extra mile to get the UX right.

We used Gitlab for a year and then migrated to Github...it’s a joy!


Having used both a fair bit, I don't know what you're talking about. If anything my experience has been the opposite. Gitlab had the second mover advantage on a few things, while Github's interface has some weird oddities that seem to stem from the fact that that's how they've always been.


Interesting... we used GHE for 7 years and have now switched to gitlab. Gitlab CI, container repos, and Kubernetes integration has been amazing.


WMF is only replacing Gerrit for now, Phabricator will continue to be the issue tracker.


This is sad. In my experience, Gerrit is a much better code review system than Gitlab merge requests. But it is different from what people are used to.


Probably because you got used to Critique at Google ;)

I agree though. I think the most important thing in a code review system is inline comments in the diff itself, and that’s something you get from Gerrit, Phabricator (Differential), etc. It encourages people to discuss the particulars of a diff. Merge approval can be made contingent on resolving minor issues within a diff. Diffs are also approved on a per-diff basis, and it’s less typical to merge a stack of diffs.

I think the pull request / merge request makes sense with the “trusted lieutenants” development model that the Linux kernel uses, but for other projects you would be more likely to want a work flow where someone submits a single commit/diff and then someone approves it (after comments).

When I review PRs on services like GitHub I very often think, “This should be several different reviews” and the discussion thread in a PR is often not a high-quality discussion. I don’t use GitLab as much but my experience is that it has the same problems. What I would love is to review a stack of commits and approve / make comments on the commits individually.

(For those reading: Mondrian -> Rietveld -> Gerrit, and also Mondrian -> Critique. Mondrian and Critique are internal tools at Google. Phabricator originated at Facebook, which has a lot of ex-Google engineers on staff.)


I don't use Github much, but Gitlab allows for multiple threads in a merge request. These threads may reference diffs/commits, but can also be directed at the merge request in general. Each thread has to be explicitly resolved before merging.


I think the pull request model still makes sense. Of course, if you stick to small changes it tends towards the patch model. However, there are still some cases where two or three commits at once make sense. Even rarer, there are cases where merging a bunch of changes into a "staging" branch before merging to master makes sense. I think this added flexibility is valuable; of course, keeping the "patch style" single-commit review great should probably be the priority.


Do you think that gitlab will ever add inline diff comments? I don't know if it would even be feasible to add to gitlab


More inline than comments on the changes in the MR? Like: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/2...

(there's some old discussion on there, random example)


I guess you can get the hang of Gerrit, but I tried it out a couple of times (I occasionally do work on projects in the Wikimedia git repo) and it's pretty evident that the interface is written by developers with little knowledge of user experience. Moving stuff to GitLab will probably increase the number of volunteer contributions; at least I'm more interested in contributing now.


I really don't mind the Gerrit UX - it seems to be optimized for daily use by programmers, not for onboarding speed. That's a tradeoff I'm very much okay with.


Gerrit is so off-putting to new users that many never get over the learning curve. At Wikimedia we want to be welcoming to new contributors. We also have to consider the on-boarding experience of new staff members, as well as the productivity of staff and long-time contributors. Gerrit satisfies some people who have used it for a long time, but it is almost universally disliked by newcomers. When your users are volunteers, you can't force them to use Gerrit until they get used to it. If the experience is bad enough then they will choose to spend their time on something else instead.


I'm personally somewhat doubtful that that is really the aspect of the new dev experience at wikimedia that is actually turning off newbies.


Curious, did you think of moving also the repos and reviews to Phabricator, as I recall WMF using that for task management?


You're okay with the tool being difficult to use because you already know how to use it.


I'm okay with it being slightly more difficult to get started with, in return for higher productivity in the long run, yes. Right now, after spending a similar amount of time both with GitHub PRs (and Gitlab MRs) and Gerrit, I still find Gerrit much easier and faster to use.


I'm generally okay with that tradeoff too, but we tried it early on at our company and the juice was not worth the squeeze, at least in our case.

Designers and even many developers found it essentially impossible to use and the developers who were reasonably comfortable with it spent way too much time assisting others in attempting to use it.

(fwiw I found myself somewhere in the middle - I like the model and understood the ideas but also found it annoying to work with in practice)


Same. Cool idea in concept. Not something I have enough time to be interested in using heavily.


I'm okay with Vim being slightly harder to learn to use than VS Code. A tougher learning curve in exchange for more powerful tools can be a good tradeoff.


I think this is a good comparison, and exactly shows the problem: Vim is only used by a minority of developers, the majority use some kind of graphical editor (like VS Code). That doesn't mean learning Vim isn't a good tradeoff, it's just not a good tradeoff for the majority of people.


I already know how to use it and I hate how proprietary it is, I have so many other problems on my plate than customizing my Git to work a certain way. I like using the off the shelf tools that work nicely with the normal git workflows using temporary branches for MRs. And a nice UI that anyone can use with minimal effort or training.


Difficult to learn != difficult to use.


I agree with you - for tools I use on a daily basis.

However, since my interests vary considerably, and therefore I dabble with lots of different tools, the difficult-to-learn tools never get enough traction in my limited human memory to get me to the easy-to-use stage.

If a community doesn't want to engage occasional users, it's probably fine (maybe even desirable) to have a higher barrier to entry to make daily use really fast.

If a community benefits meaningfully from occasional users, a high learning barrier may not be a good thing.


"Because it's hard" is a bad reason to shy away from something.


No it's not; you have limited time, and devoting X hours to gain back X/10 hours' worth of productivity in the future is a bad investment. Don't do something hard for the sake of doing it unless the gains outweigh the cost.

https://xkcd.com/1205/


Software should be written for users, not non-users. You'd think this would be self-evident and yet here we are.


How do you then convert a non-user into a user with the least friction?


Force. I for one never looked at Gerrit and thought "I should push this at my employer". I'll probably never use it unless I'm forced to.


Gerrit's UX is full of bugs that get in the way of daily use (maybe improved over time). Things like overriding ctrl-f in the browser but then having the overridden search bar not work, or not being able to effectively type inline comments on mobile.

I don't really think the choices they did make, the ones that work, are really any better for optimized daily use than intuitive choices would have been.

(Yes probably much of this is fixed)


The git-review [1] tool makes it trivial to interface with Gerrit (it's likely packaged for your distro, see [2]). I've found many people struggling with Gerrit don't know about it and it has made their life significantly easier. It handles all the magic of pushing to refs so that you never need to know about it. You drop a .gitreview in your project and then your work-flow is literally

    $ git checkout -b my-feature-branch
    $ emacs ... <edit edit edit>
    $ git add -i ...
    $ git commit
    $ git review
     read reviews, edit
    $ git add -i ...
    $ git commit --amend
    $ git review # push new change revisions
You can download an upstream change to use locally with "git review -d 123456"

[1] https://docs.openstack.org/infra/git-review/ [2] https://www.mediawiki.org/wiki/Gerrit/git-review


Honestly, I find git-review much more annoying than just memorizing `git push origin HEAD:refs/for/master` (if that's too much, it's easy to create a normal alias), and the Gerrit web interface gives you the command to download a specific changeset. git-review tends to break in unclear ways and sometimes does things other than what I expect it to.
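
For example, a one-time alias covers it (the name push-review is made up here, chosen so it doesn't shadow the git-review tool):

    $ git config --global alias.push-review 'push origin HEAD:refs/for/master'
    $ git push-review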


> Moving stuff to Gitlab will probably increase the number of volunteer contributions, at least i'm more interested in contributing more now.

Is there a project out there that originally used something like Gerrit, Phabricator, Reviewboard, or a mailing list that moved to Gitlab or Github where the number of contributions increased after the change?


What advantages do you see in Gerrit? Do they require a lot of experience in order to be realized?


Clear mapping of change request == commit, allowing for easy building of multiple in-flight change requests, rearranging them with git rebase, and updating with a push to refs/for/master. Easy diffing between states of the CR (patch sets), so you can see what changed since the last round of comments, even if it spanned multiple updates. This is the feature I miss the most from other code review systems - being able to easily work on another commit that builds on one that I just sent out for review (even starting review on the new one while the parent still hasn't finished being reviewed!), and rebasing my current change as the parent change gets reviewed/updated.

Possibility to send out a PR for review using just a push (change message in commit, push to refs/for/master%r=foo).
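
For example (the reviewer address and topic name are just placeholders):

    $ git push origin HEAD:refs/for/master%r=alice@example.com,topic=my-feature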

Snappy and compact code review experience (no space wasted for whitespace, avatars, pretty buttons). Full coverage with keyboard shortcuts.

Powerful review rule system based in Prolog, allowing for things like code owners, experimental subdirectories without the need for review, etc.


> Clear mapping of change request == commit,

I was forced to use Gerrit by a client. I could never get the hang of this; I like to do frequent commits on short-lived branches, using vanilla git. I never wanted any more features other than a nice UI to encourage people to review.


The nice UI for review works well exactly because it limits the functionality available to users and enforces a particular commit model. If you don't do that, you get the code review mess that are Github/GitLab PRs/MRs - difficult to tell apart how commits relate to the change and how it progresses through review, because the entire branch history is free-form.


It's possible to allow users to have multiple commits but still show review between the submitted "tip" commits. If you want you don't even need to show the other commits in the UI.


One thing one has to get away from is the idea that one change request == one issue. Multiple small commits as separate change requests for one story are fine as long as they work standalone (which is generally a good idea anyway, e.g. to enable bisecting).


I much prefer the model of force-pushing your development branch to create change sets. It lets you more easily see how development evolves in response to feedback. And the final state of the branch which gets merged leaves all the in-progress work that no one cares about behind only in Gerrit.

With Github/lab's model, if you force push your PR, you lose the ability to view its previous state and diff against that. Alternately, if you just keep adding commits, then the final branch that gets merged (unless you squash) has all the in-progress work which pollutes the repo's history.

Gerrit also has a finer grained permission model, but I don't care as much about that.

Gerrit definitely expects the user to understand how git works conceptually a bit more than Github/lab.


> With Github/lab's model, if you force push your PR, you lose the ability to view its previous state and diff against that.

That's not quite true. Gitlab lets you compare any two "versions" of the force pushed branch.


Thanks. I haven’t used gitlab. I assumed it worked the same as GitHub. That’s good to know.


Yup, GitLab works as expected here. It's always surprised me how quickly the old commit is garbage-collected when you force-push a branch on GitHub. It causes weird errors in CI runs and breaks viewing the old commits.

Seriously I'll pay for those couple of KiB of space, just keep it around. (at least until the PR is closed)


Gitlab DOES let you compare different versions of changesets in a merge request: https://docs.gitlab.com/ee/user/project/merge_requests/versi...


> With Github/lab's model, if you force push your PR, you lose the ability to view its previous state and diff against that.

I'm not sure about Gitlab, but Github has recently added a feature where you can view the diff between the old branch head and new one. But, as far as I'm aware, there's no way to check out the previous branch head from the repo due to a lack of a remote branch pointing to it.

At least git itself provides a range-diff command that allows you do see a diff between the commits between two versions of a given branch.
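
For example, to compare a branch before and after a rewrite, while the old tip is still in the reflog:

    $ git range-diff master my-feature@{1} my-feature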


But you don't force push with Gerrit?

Just update your change set, then push to refs/for/<branch_name> again.


It's been a while since I used it, but I guess the more important point is that it's a rebase-based workflow.

My memory was you had to force push your working branch to refs/for. Thank you for the correction.

I've actually set up and run instances of it at two companies, but as I say, it's been a while.


Yes, I very much agree: the rebase-based workflow is what makes Gerrit superior to all other systems I've tried (of which I find Github and Bitbucket particularly loathsome).

I felt I needed to correct you because with Gerrit you reserve the concept of force pushing for exceptional cases, which I think is the correct mental model. Force pushing should not be done frivolously.


It's effectively similar to a force-push.


I would say it's quite different, since you don't overwrite anything. Old patch sets are still available should you wish to roll back/reference either a particular commit, or a whole chain of commits.


It is a force push under the hood. You’re removing an old commit and creating a new one.


It's not a force push under the hood. Under the hood a new branch is created (first patch set is /refs/changes/nnnn/0, second is /refs/changes/nnnn/1, etc.). From the end user perspective the result is quite plausibly similar to force push.


The branch model must have changed. I used to push directly to a branch that was the change number (not the revision number) and of course had to force because it was the same.

So what happens when you push to changes/nnnn/12 when revision 11 hasn’t been created?


At least the way our installation works we don't push to changes/nnnn/12, rather changes are pushed to refs/for/{branch} (often refs/for/master). This endpoint isn't an actual branch. It triggers gerrit to look up the nnnn for the Change-Id in the commit message and create the appropriate branch for the next patch set.


It's depressing how git tooling has to squish out all the flexibility that makes git attractive in the name of being unsurprising to someone who doesn't understand git.

I was excited to finally bring in git as we start ramping down on a legacy project onto something new. Then I started thinking about the developers that have never touched git and I need to support. I looked at the tools available and what workflows they dictate. Then there's the drive to do something similar to the rest of the company, autonomy only goes so far without a good reason.

Fuck me. I'm going to pick GitLab and hate it.


I had to use Gerrit in a previous job, and _hated_ it. The UX is abysmal. Some folks loved it though, especially engineers working mostly on the backend. People with more of a frontend focus couldn't get past the awful user experience.


Can confirm, it's extremely off-putting to say the least. Having previously used Github, Gitlab, Bitbucket I find Gerrit very unusual. I have to Google on how to do basic stuff.


Why is it sad? They've identified a better experience for their target audience, new community devs; it's just perhaps not better for you, haha.


I've been using Gitlab and honestly it's pretty awesome; using a mono-repo setup and everything just works. The integration of CI jobs is also excellent.


It was interesting to read their reasons for not using GitHub, especially related to having no control over bans or sanctions. Microsoft could pull the rug out from under them at any time if they got pressure from somebody like China.


Interesting to see China used as the example. GitHub recently took action on DMCA requests. Meanwhile, GitHub has been actively used as a safe haven from Chinese censors when it came to the 996 protests and COVID information.


For sure, Github should be lauded for its policy towards China, they have absolutely done the right thing here. Even so, there's no guarantee that that would last forever, so it makes sense for a company like Wikimedia to host this themselves.


[flagged]


I downvoted you because it wasn't the government that filed the takedown. Perhaps you mean "a private company using laws enacted by the US government?"


It's naive to believe that Microsoft would stand up to the CCP and not the US government.


[flagged]


> It's more naive to treat the CCP like some kind of horrible boogeyman

This is a valid point some of the time, but I don't think it really applies here. First off, Wikipedia has had problems with censorship by various states, China being the most notable by a long shot (https://en.wikipedia.org/wiki/Censorship_of_Wikipedia).

It's very much true that the US government (or some other non-China state) could be a threat to Wikipedia in the future, and I'm sure the folks at Wikimedia are aware of that too.


I'm not sure how GitLab is a better option. China could pressure them too.


GitLab is open source and can be self-hosted by Wikimedia.


They're self-hosting.


I was able to see Wikipedia uses something called openstack, but not the details of what infrastructure they host on. Anyone know what facilities these repos will ultimately be served from?


OpenStack is used for Wikimedia Cloud, which is a project to give volunteers compute resources for cool projects (this includes the domains wmcloud.org, toolserver.org and wmflabs.org). Production does not use OpenStack.

Things like code review tools would be hosted in Virginia [eqiad] (with a backup in Texas [codfw]) on hardware owned by the Wikimedia Foundation.

Docs about how gerrit is hosted https://wikitech.wikimedia.org/wiki/Gerrit if you want to know nitty gritty details see also https://github.com/wikimedia/puppet


Wikimedia uses colocated boxes around the world at different providers, and the user facing stuff is backed by Cloudflare.

https://meta.wikimedia.org/wiki/Wikimedia_servers


The user-facing stuff is not backed by Cloudflare. (Cloudflare's Magic Transit has at some points been used to mitigate DDoS attacks at the IP layer, but otherwise Cloudflare is not used. Wikimedia operates its own Varnish servers in Virginia, Texas, San Francisco, Singapore and Amsterdam to do frontend caching.)


Is this GitLab open source, or is it GitLab enterprise on the free plan?

Because they are different products, even if the GitLab marketing pages make it appear as if they are the same.


Community Edition.


Oh I get it now thank you. I had read only the outcome part.


You can self host GitHub too


It's not open source, at best they ship you a locked down VM image to run yourself.

Edit: Confirmed, "GitHub Enterprise is delivered as a virtual appliance that includes all software required to get up and running. The only additional software required is a compatible virtual machine environment." https://enterprise.github.com/faq


This is not true, they publish the code at https://gitlab.com/gitlab-org/gitlab.

If you want to use premium features, you do have to pay to unlock them, and depending on your deployment you would need to get the gitlab-ee image instead of the gitlab-ce one.


We're talking about GitHub.



Which costs $21/user/month versus $0 for the free plan and $4 for the team plan. That's a steep price increase.


If you pay the enterprise fee, which is likely an X amount per seat. They MIGHT give it out for free to Wikimedia as a favor / goodwill, but there will be strings attached.

(I'm not saying Gitlab doesn't have strings attached)


Source? I don't think that's true. Github has on-premises enterprise solutions but one should not confuse that with running free and open software on your own machines - https://enterprise.github.com/faq


I personally don't think anyone would confuse it, but the fact remains, you can self-host GitHub. Maybe it costs, maybe it has strings attached, but ultimately the functionality is there... which makes it true.


Gitlab has no business in China.


It's a shame that Gerrit doesn't get more TLC. Now that GitHub is owned by Microsoft, it would be a good way for Google to combat the ongoing centralization of open source.

Fundamentally, the Gerrit changes model is superior to the pull-request model. I also really like how Gerrit stores all its information (comments, settings and everything else) in git repos.

The issue is everything that surrounds it, speaking as somebody who maintains an instance for a customer. The UI has all sorts of UX problems and takes a while to get into. The notification system is super noisy by default; we had to build a system on top and it's still not as good as GitHub's (which is also bad in its own right).


I'll generally support using gitlab over github on principle, but have found gitlab to be basically useless without javascript enabled while github still kind of works at least for read-only visiting of shared github URLs. Whenever someone shares a gitlab URL with me, it's very rare I can make any use of it without enabling js, it's quite annoying.


If a simple page that works without JavaScript enabled is of interest to you, you might want to check out sourcehut.


If I could compel everyone migrating to gitlab to instead use sourcehut, I would.

Alas this isn't about my personal choice of git hosting, I already have my needs covered with a dedicated server.


One thing stopping me from moving my company's code to GitLab's cloud offering is that the storage sizes for repos are extremely small. I heard from a rep that this will change in November. I'm wondering if this purchase-more-storage change relates to that?


Wikimedia will self-host Gitlab. They can use whatever limits they want.


They are still very limited when it comes to functionality though - only the first column in https://about.gitlab.com/pricing/self-managed/feature-compar... , right?


They probably use the free Open Source option, which gives access to the top tiers for free.

https://about.gitlab.com/solutions/open-source/


That means they are running non-free code, which seems to be one of their main points against Github.


It's community edition.


The community edition doesn't have any tiers or features, those are only in the Enterprise Edition.


This requires asking for a specific number of license seats for your open source project and that's impossible to work with. Every possible user account/contributor takes up a license. How many contributors will I have tomorrow? Don't know. How many spam accounts will I have that waste license seats? Too many, and they're impossible to clean up.


All of those are self-hosted.

"Core" is the free community edition. The others are not open source/free and require payment.

Core is plenty for most use cases. That table makes it look like it has almost no features, but most items in the list are either advanced or pretty niche.


Right, so is Wikimedia going to pay $$$ to GitLab for one of the more advanced licenses?


According to their pricing page: "We provide free Gold and Ultimate licenses to qualifying open source projects and educational institutions. Find out more by visiting our GitLab for Open Source and GitLab for Education program pages."

I'm sure Wikimedia has the dosh to pay for licenses themselves, but it's hard to see how the per/user pricing model would work for any open source project.


No, Wikimedia is going to be using the Community Edition (CE) of GitLab, which is free and open source under an MIT license. This decision and the reasons for it are described in more detail in the FAQ section of the linked article.


My understanding is you can self host and pay a sub and get access to all the other features. I may be wrong.


That's correct - with the GitLab Enterprise Edition you can self-host GitLab and get access to all of the features - both those from the Core open source version as well as our proprietary features.


Yes, but that's only if you self-host the Enterprise Edition which contains non-free code, not the Community Edition which only contains free code.


The limit is 10GB, while GitHub's limit is "ideally less than 1 GB, and less than 5 GB is strongly recommended".

I don't consider 10GB to be "extremely small", especially since that's a larger than usual limit for a hosted solution.


You can buy tons more space for Git LFS on GitHub. 10GB seems to be a hard limit on Gitlab.


The repository storage size is 10GB [1]. I wouldn't consider that "extremely small".

[1]: https://docs.gitlab.com/ee/user/gitlab_com/index.html#accoun...


That's for container registry, packages, code, artifacts, everything. I have a single project in my monorepo that produces a 512MB binary file and stores it as an asset. In 20 CI runs, assuming there was 0 code in repo or anywhere else, we'd use up the entire budget. We make more than 20 commits/day.
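(Rough arithmetic: 20 runs × 512 MB ≈ 10 GB, so artifact retention alone would exhaust the quota before any code, packages, or images are counted.)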


I don't know about those other features, but that definitely does not include the container registry.


They're hosting Gitlab themselves.


Where can one find the repo sizes?


"This raises the question: if Gerrit has identifiable problems, why can't we solve those problems in Gerrit? Gerrit is open source (Apache licensed) software; modifications are a simple matter of programming."

... And nuclear power is a simple matter of splitting atoms.


Things move so fast! git was initially developed on the back of a fag (cigarette) packet by Mr Torvalds and his mates for Linux code development. It seems to work quite well because we have a lot of gits these days.

github or lab or gerrit? I don't have a favourite, but I did notice a lot more coloured lines with gerrit stuff and a feeling of bewilderment. However, I also felt like the adults seemed to know what was going on.

Let's see how this pans out. It'll probably be fine but I suspect we will lose something by creeping towards the "mainstream" and ignoring diversity. That sounds a bit odd for a tool designed for Linux 8)


I need clarification: is this Mediawiki or Wikimedia moving to gitlab? It seems the Wikimedia foundation is moving their code, so why is this on mediawiki.org?


> so why is this on mediawiki.org?

Probably because mediawiki is the main Wikimedia software project.


I'd be interested in the CI aspects of this transition, which seem to be glossed over.

The combination of Gerrit with Jenkins and Jenkins Job Builder blows everything else I've seen out of the water with how easy it is to make both per-patchset and post-integration changes in an infrastructure-as-code manner across multiple repos, once you get over the learning curve.
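For a sense of the shape of that, a minimal Jenkins Job Builder sketch might look roughly like this (project name, job name, and report path are all hypothetical):

    - job-template:
        name: '{name}-unit-tests'
        triggers:
          # run the job on every new patchset uploaded to Gerrit
          - gerrit:
              trigger-on:
                - patchset-created-event
              projects:
                - project-compare-type: PLAIN
                  project-pattern: '{name}'
                  branches:
                    - branch-compare-type: PLAIN
                      branch-pattern: 'master'
        builders:
          - shell: 'make test'
        publishers:
          - junit:
              results: 'results.xml'

    - project:
        name: example-project
        jobs:
          - '{name}-unit-tests'

The same template can then be attached to any number of repos simply by listing it under additional project entries, which is what makes the infrastructure-as-code aspect so pleasant.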


We recently conducted an extensive evaluation of CI options[0], concluding at the time that GitLab could meet our needs but didn't make a whole lot of practical sense unless we also migrated to it for code review. Other considerations (Gerrit user experience and sustainability, onboarding costs, a de facto migration of many projects from our Gerrit instance to GitHub, etc.) led us to re-evaluate whether a migration for code review would make sense, and that's what the decision linked here addresses.

I am on the team that maintains our existing CI system[1] (Zuul, Jenkins, JJB), though I mostly work on other things. While this system is certainly quite powerful, I would not personally describe it as easy. We have a lot of work in front of us in migrating it to GitLab, but so far I've found the experience there quite a bit more pleasant than grepping through JJB definitions and the like.

At any rate, if you're interested in how all of this pans out, we will as ever be doing the work in a very public fashion.

[0]. https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering...

[1]. https://www.mediawiki.org/wiki/Continuous_integration


To clarify, is your GitLab-based job system done with configuration-as-code and does it share definitions of jobs across repos?

The solution we came up with using Gerrit/Jenkins was to have a common test invocation (in our case `make test`) that glossed over all the details of a project's build process and was expected to output test results and coverage in specific formats Jenkins could consume (JUnit/xUnit and Cobertura). We have jobs that run `make test` no matter if the code is C, Go, Python, JavaScript, etc.

This also had the beneficial side effect of lowering the barrier to entry for someone working on any random project - make papers over all those toolchain differences.
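A minimal sketch of that convention (purely illustrative, assuming a Python repo using pytest and pytest-cov; other languages would fill in the same target with their own tools):

    # Every repo exposes the same entry point; CI only ever runs `make test`
    # and collects results.xml (JUnit) and coverage.xml (Cobertura-compatible).
    .PHONY: test
    test:
    	pytest --junitxml=results.xml \
    	       --cov=. --cov-report=xml:coverage.xml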


> To clarify, is your GitLab-based job system done with configuration-as-code and does it share definitions of jobs across repos?

We announced our decision to migrate to GitLab on Monday, so we don't exactly have a GitLab-based job system yet.

Nevertheless, yes, GitLab CI jobs will be defined by files checked into version control, and we'll reuse things where appropriate.


Google's non-free JavaScript reCAPTCHA is an absolute showstopper! At least roll out a custom-made solution if nothing else.


We are not going to use reCAPTCHA.[0]

[0]. https://www.mediawiki.org/wiki/GitLab_consultation/Discussio...


Would it make sense to implement a peer-to-peer protocol for sharing git repositories, similar to torrenting, with the goal of overcoming these kinds of issues?


Since Microsoft is pro open-source now, they probably should move too.


why is this news


Is the deal with Wikimedia the reason GitLab blocked users from Iran?


Microsoft will slowly kill github


Depends. MS used to have its own code hosting service (Codeplex I believe), and it wasn't that bad, but Github was more "social".

I think it's mostly a question of how independent Github will be under MS. Now to be fair, MS has good services. Azure is nice, and office 365 is pretty good.


They might make money off of it, but I believe GP was referring to github's prominence as the home of so many free software projects.


A company moves their code repository to a different provider. Honest question: why is this newsworthy? Am I missing something?


Wikimedia is very cautious in making changes because they take very seriously the values of sustainability and predictability.

That they are moving from Gerrit to Gitlab is a blow against Gerrit and a boon for Gitlab (assuming it goes well).


I don’t know about Wikimedia being super cautious when adopting technologies... I host a MediaWiki instance and there’s been a lot of “not so cautious” tech decisions in the past. Jumping early on the HHVM train (which they eventually had to leave); adopting Lua for wiki modules; developing Parsoid as a Node service (now rewritten in PHP)... None of these was the “safe option”; in some cases it worked out well but in others it didn’t.


None of that was forced on anyone, though. Everything you mention are purely optional components. I think it makes sense for them to explore new technologies like that while still keeping their general requirements conservative.


Yeah, but those components were heavily used on Wikipedia and other Wikimedia websites, and some - especially Lua modules - are fundamental once the wiki grows to a certain size, since wikicode templates are quite limited and suffer from performance problems.


What's wrong with HHVM? Honestly asking, I have no real context beyond knowing it is a FB invention


HHVM got very little community adoption and never had strong promises to stay compatible with the rest of the PHP ecosystem.


Thanks. I'm not a developer so I guess it's relevant, just not for me.


It helps to follow trends. For a while I thought Gitlab was part of GitHub.

When I see enough news about something on HN, I look into it.

I even did a little Rust tutorial recently.

Buzz makes a difference


^F gittorrent.

It is like working in a big corporation. Eventually you will do or say something that won't be liked by a person in power or a competitor, and you will be cancelled. Community and publicly supported projects, and free speech, are only safe from influence on community infrastructure. It's time to start decentralizing.


Canceled by whom? They are self-hosting their own instance of Gitlab CE. If Gitlab the company disappears tomorrow, they can fork the project and continue using it.


I never checked out gitlab, but the name always made it seem like a github copycat. If you want to differentiate your product, why not choose a name that sounds different from your competitor's?

And now that I've finally gone and browsed some repositories, it's clear that they were very much "inspired" by Github's design; it's practically a github clone, at least in that regard (although something is a bit off in the smoothness).

Maybe one day I'll find myself in a similar position and see things differently but this blatant copying always seemed ridiculous to the point where I would feel ashamed to lead this type of strategy.


I don't want to sound pedantic but this comment makes you look extremely ignorant and unaware of this space and its solutions, use cases, etc. Basically you sound completely oblivious of what's going on here.

If today was the first day that you saw GitLab, perhaps you don't have the domain knowledge to make any of these claims. Nothing you said here makes sense. The Git prefix in the name is just a technicality, the same way I could create an operating system called YeahOS and everyone would understand that OS is a suffix that blends into the branding.

But what makes you sound particularly ignorant is calling GitLab a clone of GitHub. They are both git platforms, that's it. By that metric BitBucket, Gitea, Gogs, CodeCommit, Phabricator, etc., are all clones of GitHub.

If anything, recent GitHub functionalities like GitHub Actions are a clone of mature GitLab functionality like GitLab's CI.


>the same way I could create an operating system called YeahOS and everyone would understand that OS is a suffix that blends into the branding

Yet if you named it myOS and copied the look and feel from iOS everyone could see that you created an iOS clone.

That's the only point I was making, but it seems that many feelings were hurt in the process.


It's all fair. Looking back at my comment I sounded a little bit harsh and I apologize for that.

What I was trying to establish is that there's really no solid ground to claim that any Git platform is a copy of another one since they are all essentially productivity and team-work wrappers around Git.

If you're talking about the general information architecture that's how SourceForge was even before Git existed, so hardly an original idea.


>no solid ground to claim that any Git platform is a copy of another

I think that's fairly obvious. I guess I didn't explain clearly enough the point I was trying to make in the original post. I kind of regret writing it now.


I think it's fair to say that a few years ago GitLab was very much inspired by Github's design. However the project has had a focus on adding additional, tightly-integrated features and I'd say in the last couple of years Github has been more inspired by GitLab than the other way around.


GitHub and GitLab, Gitea etc are all centred around git, so including "Git" in the name seems like an obvious, sensible even, idea.

Personally I don't think the GitLab UI looks that similar to GitHub's, and to me it looks and feels kind of clunky, and lacks contrast.


The important difference is that GitLab is open source! That's a bit like criticizing LibreOffice for "copying" MS Office and using the word "Office" in its name.


Hi, Developer Evangelist at GitLab here.

GitLab Core is open source which is the base for the Community Edition. The paid tiers are based on Core and add Enterprise licensed proprietary code, following our open core business model.

More about our Open Source stewardship: https://about.gitlab.com/company/stewardship/


You could not be farther from the truth. GitLab is an improvement on GitHub in every way; they offer a suite of features that GitHub doesn't have natively and only offers through third-party integrations.

Gitlab is not a Github clone.


A few years ago, Gitlab was indeed mostly a clone of Github. The thing is, over those few years I personally think Gitlab became far superior to Github as a "turn key" solution for managing your software. Now, Github is the one "stealing" ideas from Gitlab, with, for example, their Actions.


That two services based on git have git in their names seems reasonable. That's the only commonality... "hub" and "lab" are both one syllable, but very different in sound and connotation.


Both are just fancy wrappers and tools built around Git. It's not surprising that one looks like the other. Given historic exoduses from one, it's only logical the other would build a similar toolset with the intention of allowing for easy migrations between platforms.

While similar, the companies differ in their approaches, and the platforms differ in their pricing (both having impacted the other, which has ultimately helped consumers).

Your argument would hold true of any word processor. I think it's an unfair assessment of two products targeting the same specific users.


People wanted a decentralized GitHub. That's why they built it like GitHub, except decentralized.


If you think Gitlab’s a Github clone you must not have seen Gitea.


I agree that Gitea looks a lot more like Github (but with a dark theme!), and I think this is a good thing. I also like Gitea because it doesn't have reCaptcha or other proprietary components, but it doesn't seem to have the advanced ('Enterprise') features that Gitlab does (I program as a hobby so I've never needed these). When Gitea has grown and is more accepted, I hope the Wikimedia Foundation will consider using it.



