Using GitHub Actions with act to test stuff locally is actually pretty usable. We use on-prem, k8s-hosted runners to get access to servers/clusters with limited internet access; it works great.
But as always with "gitops"-themed tools, I think it's pretty awkward to handle rollbacks. Either you take the stance that master is the source of truth and let the CI/CD tooling revert commits if deployment fails, or you store that state elsewhere and allow e.g. manifests to diverge from git.
Like all terms in the industry, "gitops" gets defined a little differently by everyone.
In our case, we use trunk-based development where "main" is the branch that is approved for release, but it does not necessarily point to what is currently released. Instead, we use git tags for that. On merge to main, we kick off pipeline executions (we use AWS CodePipeline here, but GitHub Actions with a concurrency group set would achieve something similar). Non-prod stages just push lightweight tags in the format `$STAGE-YYYY.MM.DD.hhmm` on successful deployment. Prod stages are similar, but we publish a GitHub release via the GitHub API rather than just pushing a git tag. In the case of rollback, you either push new tags at older commits or roll forward with a revert commit (but still using tags as the tracking mechanism).
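For concreteness, a rough GitHub Actions sketch of the non-prod tagging step (the workflow name, deploy script, and stage name are all made up for illustration):

```yaml
name: deploy-staging
on:
  push:
    branches: [main]
concurrency:
  group: deploy-staging        # serialize deployments, similar to queued pipeline executions
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./scripts/deploy.sh staging   # hypothetical deploy script
      - name: Tag the deployed commit
        run: |
          TAG="staging-$(date -u +%Y.%m.%d.%H%M)"
          git tag "$TAG"
          git push origin "$TAG"
```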
Thanks for sharing! We are a small company and right now we release (from our monorepo) on every PR merge to main. As the engineering team continues growing, we anticipate this approach will become unsustainable soon, and are looking for alternatives.
I wonder if tag-based releases are a widespread industry pattern (especially for growing monorepos)? I'd be curious to learn more about it.
I was honestly shocked that I couldn't find many blog posts or articles about managing this process. The tag-based mechanism was one I came up with on my own because we needed something. We're also a small team of 5 engineers, with many small microservices that each have their own pipelines. The problem with main being the release tracker is that it doesn't work if you have multiple environment stages. Stage-specific branches work, but then you don't get history.
I'm on a team that implemented a hybrid tag-based release system on a monorepo and it is working well.
External releases are built on release branches off of main with their own git tag rules, but I want to touch on internal releases off of main.
Between feature branches and main we have GitHub status checks that gate bad PRs from getting in. Once the PRs are in main, we do a nightly build of the full suite of components and put them through various levels of interop testing. Once our system receives the signal that all builds and interop testing have passed, it applies a Last Known Good (LKG) tag to the git commit the components were built from.
After that, various systems, including artifact links for QA and auto-merge jobs from main, are set to use the LKG tag to ensure they are getting good builds and code.
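A minimal GitHub Actions sketch of that LKG step, assuming the build and interop scripts exist under these made-up names:

```yaml
name: nightly-lkg
on:
  schedule:
    - cron: '0 2 * * *'            # nightly run
jobs:
  lkg:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./ci/build_all.sh       # hypothetical: build the full component suite
      - run: ./ci/interop_tests.sh   # hypothetical: run the interop test levels
      - name: Move the LKG tag to this commit
        run: |
          git tag -f lkg
          git push --force origin lkg
```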
If you're looking for a way to reproduce your CI locally that isn't tied to a particular CI system (but which has a nice integration with GitHub Actions), there's also Toast: https://github.com/stepchowfun/toast
Toast lets you use whatever base image you want (even when running with GitHub Actions), and it has some extra features for local development (e.g., caching, bind mounts, tasks with dependencies between them, etc.).
Huh, this looks crazy similar to something I just started building, down to the YAML, except mine doesn't attempt to containerize. I wonder how much overhead that adds.
Incoming shameless plug: if you don't want to handle hosting the runners yourself, but still want to reap the benefits of having proper hardware (close to the metal), check out BuildJet for GitHub Actions[1] - 2x the speed for half the price. Easy to install and easy to revert.
I've mentioned this before on HN, but `act` was completely unusable for me when I tried it. Tons of errors that required digging deep into the code to understand, missing features… has it improved over time? It seems very useful and I gave it a solid try at the time (maybe a year ago), but just had to give up after a few hours.
edit: I decided to upgrade and give it a try, went from 0.2.21 to 0.2.25 (Mar 30, 2021 → Nov 24, 2021). It's still failing to run a basic "build and test" pipeline, and really not doing anything fancy: apt-get install + make + ./service + ./tests.py, with a Redis sidecar. This seems to be because services are not supported in `act`: https://github.com/nektos/act/issues/173
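For reference, the kind of job I mean looks roughly like this (trimmed, with made-up names), and it's the `services` block that act chokes on:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:                   # the Redis sidecar act can't provide
        image: redis:6
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v2
      - run: sudo apt-get update && sudo apt-get install -y build-essential
      - run: make
      - run: ./service & ./tests.py
```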
I wrote a very small container orchestrator that spawns a runner container when it gets a webhook saying a job was triggered. The container comes up with the environment required to attach to the repo as an ephemeral runner, does the job, then detaches the runner and exits. Very fast.
This was all before I knew about act, and it's about the same speed in my limited testing.
I wish the UI had a left sidebar with a scrollable TOC, so I don't need to jump back to the home page to pick a different subject. I asked the same of gobyexample.com's author and was told they have no intention to change it.
Every CI system in existence is reinventing the exact same wheel: "I want to run some random task" + event hooks + secrets + logs + plugins + integrations. It's so ridiculous that more of them keep being created - and they keep losing functionality. GHA has all these configs in YAML (there's a user-friendly config file...) but doesn't let you run parameterized builds in their web UI? Why?
We shouldn't be writing all these jobs in a format that only works for one CI system. You spend months writing Jenkinsfiles, and then you move to CircleCI and have to rewrite all of them, and then move to Drone and have to rewrite all of them, and then move to CodeBuild/CodePipeline and have to rewrite all of them, and then move to GitHub Actions and have to rewrite all of them. Eventually we'll rewrite them all for something else. And why? To run the same exact tests on a slightly different system.
That's why I like Tekton: it's a CI system for Kubernetes, so you're letting Kubernetes solve all those problems. I have far more faith than in anything else that k8s will still be alive and kicking in the future, and not breaking core things like secrets, logs, and container execution in painful ways.
In general, though, I try to put all the CI work into a simple shell or similar script and then just configure whatever CI system is in use to call it in the appropriate environment (in a container, a bespoke VM, etc.). I agree that putting all kinds of complexity into one particular CI system is just asking for trouble down the road, as it's now basically a hard dependency for your code to be shippable.
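As a sketch of what that looks like in GitHub Actions terms (the script path and image are placeholders), the workflow stays thin and everything interesting lives in the repo:

```yaml
name: ci
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    container: ubuntu:20.04     # whatever environment the script expects
    steps:
      - uses: actions/checkout@v2
      - run: ./ci/run.sh        # all real build/test logic lives here, portable to any CI
```

Switching CI systems then means rewriting a dozen lines of glue, not the build itself.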
Maybe it didn't 6 months ago, but it seems this is an option now, and well-documented.
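I believe the feature being referred to is `workflow_dispatch`, which lets you trigger a run from the web UI with inputs; a minimal sketch (the input name is made up):

```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        default: 'staging'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying to ${{ github.event.inputs.environment }}"
```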
This is a topic of interest for me. I presented on Jenkins and GitOps to an audience at KubeCon who, I suspect, were almost all interested in moving away from Jenkins, or had been told to be interested in switching from Jenkins to something else, and I tried to get the idea across that they probably don't really need to switch the workflow tool even if it's ancient, ...
But maybe they should consider subbing out some of the important fiddly bits underneath it (like, I assume the vast majority of Jenkins users are building images with Docker, and if they're running on Kubernetes, many of them are wondering what they will do, or how long they really have, before they have to start worrying about the deprecation of dockershim and how their lives are going to change when their clusters won't be running Docker under the hood anymore?)
I was actually arguing for a tool like Porter.sh to come in and make the boundaries of "what's in a build" super neat and tidy, organized, but also limited so the next time they feel compelled to switch workflow tooling, it will be a non-issue and can be over and finished inside of a single day's work. The problem isn't that your workflow tool is too old, it's that you've jammed too much arbitrary complexity inside of it, probably because the right abstraction was not made available to you at the time. "Switching off of Jenkins" just means building a second system and it comes with all the baggage of "second system syndrome" to do so.
Sure, it's difficult switching from Jenkins when you've built this gigantic pipeline with 18 branches where 12 of them run in parallel, half of them are configured with different options passed to the docker build tool, half of them must use buildx, and the third half of them are unmaintained so we don't go in there... so put some guard rails up around the hard parts! And get somebody in there to take care of those cobwebs.
I'm glad it's an option now! That was crazy that it took 3 years for this feature to show up, though.
I think we need a few more "12 Factor App"-style guidelines for modern systems. 12 Factor goes a long way to abstract away the tendency for implementation lock-in. But we can probably create a few more guidelines specific to CI so it's portable. Use OAuth, use the same authorization layer as your VCS, tie secrets and artifacts to the VCS repo, make every plugin a Docker container. (I'm stealing these concepts from Drone because it has the best cloud-native design I've ever seen)
The final unsolved bit is how to manage a DAG of jobs and hooks around every event in a portable way. We probably need a universal CI spec and API, but maybe that's too specific.
Also: holy crap, I don't think I've seen Porter.sh before, I love the idea! Definitely going to look into that
A brilliant insight I came across once was that a low barrier to entry can prevent standardisation and quality.
The original example was that in LISP it's so easy to add Object Oriented functionality that it's a mere afternoon's work for a graduate student. For comparison, it took Bjarne Stroustrup several years to extend C with objects to create C++. Hence there are only about 4-5 such languages out there, of which only two are popular.
This means that every C++ program is object oriented in the "same way", whereas random OO LISP modules are incompatible and you can't merge them into one cohesive program or reuse the components nicely.
It's almost trivial to come up with a build system. Like you said, it boils down to not much more than interpreting a simple sequence of steps, typically doing nothing more than triggering shell execution directly or indirectly. A basic build system is a few days' work for a talented programmer! Heck, a mere shell script will do in a pinch...
Notice however how the industry is slowly consolidating on Docker and Kubernetes. I suspect that one reason for this is that these are hard technologies. Not just anyone can "whip up" a container build and orchestration system in a short period of time.
Hence, there's only a handful of container systems commonly used in the wild, which has resulted in standardisation. Precisely the type of standardisation that you (and many others) have been craving.
TL;DR: The most complex system will become the CI system standard to rule them all because it's complex.
ANSI Common Lisp was the first standardized OOP language (1994) with its integrated Common Lisp Object System. The project to specify this object system took several years and a group of core designers, plus a larger feedback group.
The design & feedback was driven together with a portable implementation: Portable Common Loops. The team developed that implementation to validate their design ideas and to get practical feedback from a broader community.
It's pretty terrible if you compare it to CircleCI or GitLab from 4 years ago. I'm a big fan of GitLab; it seems like the only company pushing things forward in an _elegant manner_, and I used it heavily in the startup world. Using GitHub again these days, I cry every time I need to do GHA stuff. The GitHub + CircleCI setup is miles more elegant, as it was 6 years ago.
I like gitlab, but I'm not sure elegant is the word I'd use for them; they have a severe case of wanting to check all the boxes for features, but it can be a little clunky to see how the parts work together. My pick for elegance would be sourcehut. On the other hand, they all seem to work pretty decently and the clunkiness isn't that bad, so I keep using it:)
This makes sense - the main reason GitLab took off is vertical integration with CI/CD, which GitHub is catching up on. GitHub has the long game in mind, and with its size and preponderance of OSS I see it taking over once they innovate to a more usable level.
well, look at Azure DevOps. Everything it does is coming to GitHub, and AZDO will eventually be sunset in favor of GitHub. And AZDO is quite good, imo.
MS moved a lot of (almost all) the Azure DevOps people over to GitHub.
Word of advice: if you are thinking of running self-hosted runners and using Actions for your organisation, do yourself a favour, check back on them in a year or two, and use something like Argo Workflows or Tekton instead for now.
GHA isn't a product designed for private GH organisations; you will find that every much-needed feature for this use case is very low on the GH roadmap.
We run GHA with auto-scaling self-hosted runners at scale (~1000+ runners at peak hours) pretty well for PyTorch, but it is a labor of love and patience.
However, I'd say that GitHub's been pretty receptive to feedback and has actively fixed almost every wall that we've run into (if we haven't been able to fix it for ourselves).
It doesn't matter how smart you are with reusable workflows; you will never get to a truly DRY setup that scales to dozens of repositories.
Another major pain is that we still don't have private actions. The feature was due by the end of 2021 (maybe it is out now, but I checked a couple of days ago).
Setting up runners to look after a pool of repositories needs elevated permissions.
GH offers a way to enforce a list of enabled actions, but this does not work with private binary registries hosting pre-built Docker actions. That list is the only thing that could prevent you from pulling software off the internet at runtime, which means that if you want a decent security posture, all you are left with is referencing actions by their full git SHA.
Many common use cases require hacks, which is fun for a weekend project but isn't great for a large-scale operation. An example is simply running a workflow dynamically targeting the folders containing changes. At the moment you have to create a job, generate a build matrix on the fly, and pass it as input to the matrix of the actual job (roughly as sketched below).
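A hedged sketch of that hack, assuming top-level folders map to services; the diff heuristic and paths are illustrative only:

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      dirs: ${{ steps.diff.outputs.dirs }}
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0               # need history to diff against main
      - id: diff
        run: |
          # collect changed top-level folders as a JSON array
          DIRS=$(git diff --name-only origin/main...HEAD | cut -d/ -f1 | sort -u | jq -R . | jq -cs .)
          echo "::set-output name=dirs::$DIRS"
  build:
    needs: changes
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dir: ${{ fromJson(needs.changes.outputs.dirs) }}
    steps:
      - uses: actions/checkout@v2
      - run: make -C "${{ matrix.dir }}"   # hypothetical per-folder build entry point
```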
> Another major pain is that we still don't have private actions. The feature was due by the end of 2021 (maybe it is out now, but I checked a couple of days ago).
I'm looking forward to this landing too. In the meantime, though, checking out the repository that contains the Actions and referencing a local path works fine so this hasn't really been a blocker for us.
Edit: per sibling comment, it seems that this feature became available in the last few days. Nice!
The only way I'm aware of to use a private action is to clone the repository it's in using a personal access token, and then use a local relative path to the action to run it.
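Roughly like this, as a sketch (the org, repo, secret name, and action path are all placeholders):

```yaml
steps:
  - uses: actions/checkout@v2
  # check out the private repo that contains the actions, using a PAT stored as a secret
  - uses: actions/checkout@v2
    with:
      repository: my-org/private-actions
      token: ${{ secrets.ACTIONS_REPO_PAT }}
      path: .github/private-actions
  # then reference the action by local path instead of owner/repo@ref
  - uses: ./.github/private-actions/setup-thing
```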
I wish more people wrote docs in this fashion. I almost always end up skipping official docs in favor of digging around for blog posts going over code examples on things.
There's also 'Learn X in Y Minutes' (https://learnxinyminutes.com/), which covers a range of different 'X'es. They make it ridiculously easy to get going with a new tool/language, IMO. It's a superb paradigm in general.
I loved the simplicity and directness of the UI too. I especially like the feature where the mouse cursor maps to code blocks; how did you create this? I'd like to use this framework for tech blogs if possible.
> Actions reduce workflow steps by providing reusabe[sic] “code” for common tasks. To run an action, you include the uses keyword pointing to a GitHub repo with the pattern {owner}/{repo}@{ref} or {owner}/{repo}/{path}@{ref} if it’s in a subdirectory. A ref can be a branch, tag, or SHA.
Aside from the typo, I wonder how many packages could be backdoored at once, if an action maintainer went rogue, seeing as there's no pinning for actions by default, and (according to https://github.com/msys2/setup-msys2/blob/main/HACKING.md) moving a tag is the default way to push updates to an action. (Interestingly get-cmake/run-cmake/run-vcpkg are all operated by the same person.)
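The usual mitigation is pinning to a full commit SHA rather than a tag; a sketch with hypothetical names (the SHA below is a placeholder, not a real ref):

```yaml
# a movable tag: whoever controls the action repo can repoint v1 at new code
- uses: some-org/some-action@v1
# pinned to a full commit SHA, which can't be silently moved
- uses: some-org/some-action@5f4c1e0d9a8b7c6e5d4c3b2a1f0e9d8c7b6a5f4c
```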
Nice and simple explanation. I never looked at the docs; I used the GitHub 'workflow' template to automatically create an action for CMake and CTest (C++ code) and just followed the steps without customizing. It just worked out of the box; the autogenerated YAML file is here:
https://github.com/google-research/tiny-differentiable-simul...
I too love "Go by Example" and refer to it often. Makes me want it for all the things.
Shot in the dark: anyone know of one for hot-off-the-presses modern Python (3.10) with typing, akin to the Golang one? All the modern additions really need a comprehensive overview like the one Go by Example somehow manages to provide in a very lightweight style.
None of that is scary. Pretty much all of that advice applies to systems you run internally. I do wish GHA had a solution for secure file injection, which solutions like Jenkins already have, so we didn't need a janky workaround for JSON blobs.
Maybe this thread is the right place to ask a usage question: I have a multiline GitHub Secret which I would like to print out to a `.env` file in a GitHub Action. How can I do that?
My current solution doesn't support spaces within the secret content.
`echo "$ENV_FILE"`, maybe? `echo $ENV_FILE` (without quotes) will split the environment variable by the separators in $IFS and pass each chunk as a separate argument.
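In a workflow step that could look something like this, assuming the secret is named ENV_FILE; exposing it via `env:` and quoting it in the shell should preserve newlines and spaces:

```yaml
- name: Write .env
  env:
    ENV_FILE: ${{ secrets.ENV_FILE }}   # assumed secret name holding the multiline content
  run: echo "$ENV_FILE" > .env
```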
I would be curious to hear how other people handle rollbacks in their gitops setups.