
I've been working 20 hours a week at a major company for the last 7 years and I love it. Full health benefits.

> What kind of role are you in?

SRE/Programmer

> How did you make the part-time arrangement happen?

I accidentally fell into it when I went part time to do a master's degree while considering a career switch. When I decided I didn't want to pursue the other career, I just kept working part time, and my company was fine with it. I work Mondays (7 hours), Tuesdays (7 hours), and Wednesdays (6 hours), and then I have Thursdays through Sundays off. I get paid half, of course, but it's still a good salary (grateful to be a programmer!).

> What’s been the hardest part of working part-time in tech?

Sometimes it's easier for me to take calls on Thursdays or Fridays than to push back (technically I could, but I don't want to jeopardize my part-time status). This is fine since I just make up the time the following week, but it means I can't treat Thursdays and Fridays exactly like a weekend when it comes to scheduling trips, etc.

> Are there specific companies or setups that are more open to part-time roles?

I did it at a huge company, so I think it's possible anywhere, but I already had a great reputation and my company really wanted to keep me; I think it would be harder to do without that. I get the impression that companies generally don't like doing this because of the possible contagion effect of other workers wanting the same, so I generally don't talk about it much at work.

> Any tips, advice, or experiences you can share would be super helpful!

Four days a week part time is much more doable than three. Don't limit yourself to companies/jobs that explicitly advertise part-time work (though those certainly make sense to apply to); instead, make part time a key part of the negotiation process. Consider emphasizing flexibility.


> based on Intel EU stall profiling for hardware profiling

It wasn't clearly defined, but I think EU stall means Execution Unit stall, which is when a GPU "becomes stalled when all of its threads are waiting for results from fixed function units": https://www.intel.com/content/www/us/en/docs/gpa/user-guide/...


1. A classic book is The Mythical Man-Month, which discusses a surgical team approach I find interesting: there's a lead surgeon, and the rest of the team is there to support them.

2. Programmer Anarchy by Fred George on YouTube is an interesting idea.


MMM is a good book, but I like to suggest Weinberg's The Psychology of Computer Programming as a contemporary (to it) take that discusses both how to examine what's happening in teams (effective or not) and what was observed in some teams (effective and not). Like MMM, it received an update later on. Weinberg left his original text essentially intact (I think only light changes to fix printing errors, not changes to the substance) and added commentary to each chapter instead of editing it directly.


> What are the key assets you monitor beyond the basics like CPU, RAM, and disk usage?

* Network I/O is another basic that should be there

* Average disk service time

* Memory is tricky (even MemAvailable can miss important anonymous memory pageouts with a mistuned vm.swappiness), so also monitor swap page-out rates (see the sketch after this list)

* TCP retransmits as a warning sign of network/hardware issues

* UDP & TCP connection counts by state (for TCP: established, time_wait, etc.) broken down by incoming and outgoing

* Per-CPU utilization

* Rates of operating system warnings and errors in the kernel log

* Application average/max response time

* Application throughput (both total and broken down by the error rate, e.g. HTTP response code >= 400)

* Application thread pool utilization

* Rates of application warnings and errors in the application log

* Application up/down with heartbeat

* Per-application & per-thread CPU utilization

* Periodic on-CPU sampling for a short window, rendered as a flame graph

* DNS lookup response times/errors
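
To make a couple of these concrete, below is a minimal sketch (assuming Linux; it reads the standard pswpout counter from /proc/vmstat and RetransSegs from /proc/net/snmp) that samples swap page-out and TCP retransmit rates:

    #!/usr/bin/env python3
    # Minimal sketch: sample swap page-out and TCP retransmit rates
    # from /proc counters (Linux-specific).
    import time

    def read_vmstat():
        # /proc/vmstat is one "name value" pair per line
        with open("/proc/vmstat") as f:
            return {k: int(v) for k, v in (line.split() for line in f)}

    def read_tcp_retrans():
        # /proc/net/snmp has a "Tcp:" header line followed by a values line
        with open("/proc/net/snmp") as f:
            header, values = [l.split() for l in f if l.startswith("Tcp:")]
        return int(values[header.index("RetransSegs")])

    INTERVAL = 10  # seconds between samples
    prev_vm, prev_rt = read_vmstat(), read_tcp_retrans()
    while True:
        time.sleep(INTERVAL)
        vm, rt = read_vmstat(), read_tcp_retrans()
        print("swap page-outs/s: %.1f, tcp retrans/s: %.1f" % (
            (vm["pswpout"] - prev_vm["pswpout"]) / INTERVAL,
            (rt - prev_rt) / INTERVAL))
        prev_vm, prev_rt = vm, rt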

> Do you also keep tabs on network performance, processes, services, or other metrics?

Yes, per-process and over time; these are useful for post-mortem analysis.


Those are some great ideas for Prometheus alert rules, if they aren't already covered here: https://samber.github.io/awesome-prometheus-alerts/


IO wait time for disks is a great one too for catching IO load; `glances` and `atop` do a good job of surfacing it when it's an issue.
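
For reference, iowait can also be computed directly from /proc/stat deltas; a rough sketch, assuming the standard Linux field order on the aggregate cpu line:

    # Rough sketch: iowait percentage from two /proc/stat samples (Linux).
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            # aggregate line: cpu user nice system idle iowait irq softirq steal ...
            return [int(x) for x in f.readline().split()[1:]]

    before = cpu_times()
    time.sleep(5)
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    print("iowait: %.1f%%" % (100.0 * delta[4] / sum(delta)))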


With all that, you might want some good automatic anomaly detection. While at IBM's Watson lab, I worked out something new, gave an invited talk on the work at the NASDAQ server farm, and published it.

With that much monitoring, someone might be interested.


How do you identify a mistuned vm.swappiness?


I rely on a heuristic approach: track the rate of change of key metrics like swap usage, disk I/O, and memory pressure over time. The idea is to calculate these rates at regular intervals and use moving averages to smooth out short-term fluctuations.

By observing trends rather than a static value (a data point at a specific time), you can get a better sense of whether your system is underutilizing or overutilizing swap space. For instance, if swap usage rates are consistently low but memory is under pressure, you might have vm.swappiness set too low. Conversely, if swap I/O is high, it could indicate that swappiness is too high.

This is a poor man’s approach, and there are definitely more sophisticated ways to handle this task, but it’s a quick solution if you just need to get some basic insights without too much work.
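
A minimal version of that heuristic might look like the following sketch (assuming Linux, with swap counters from /proc/vmstat and a simple moving average over the last few sampled rates; the window and interval are arbitrary):

    # Sketch of the moving-average heuristic described above (Linux).
    from collections import deque
    import time

    WINDOW = 6      # samples in the moving average
    INTERVAL = 10   # seconds between samples

    def swap_counters():
        # cumulative pages swapped in/out since boot
        with open("/proc/vmstat") as f:
            stats = dict(line.split() for line in f)
        return int(stats["pswpin"]), int(stats["pswpout"])

    rates = deque(maxlen=WINDOW)
    prev = swap_counters()
    while True:
        time.sleep(INTERVAL)
        cur = swap_counters()
        rates.append([(c - p) / INTERVAL for c, p in zip(cur, prev)])
        prev = cur
        if len(rates) == WINDOW:
            avg_in = sum(r[0] for r in rates) / WINDOW
            avg_out = sum(r[1] for r in rates) / WINDOW
            print("avg swap-in/s: %.1f, avg swap-out/s: %.1f" % (avg_in, avg_out))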


That is a good list; now you just need to prioritize (after finding the ICP).


Before you start adding all of that, make sure you have customers like the parent poster.

For example, I monitor disk space, RAM, and CPU, and that's it for external tooling.

If any of those go above their thresholds, someone will log into the server and use Windows or Linux tooling to check what is going on.

I mostly monitor services' health-check endpoints, i.e. HTTP calls to our own services. That catches when the network is down or the services' response times are shoddy.

So, all in all, not much monitoring of the servers themselves.


I think the hardest part is deciding which gems to use. It's not uncommon to end up with over 50 gems in your Gemfile.

For example, built-in capabilities for authentication are limited: https://github.com/rails/rails/issues/50446

So then do you go with has_secure_password/etc., Devise, rodauth, authentication-zero, or something else? These are big decisions that might then affect other things like authorization, OAuth, passkeys, etc.

And that's authentication & authorization, which is a relatively well-understood and well-maintained area; other areas might have totally unmaintained gems with issues on recent versions of Rails, native-extension compilation problems on more recent operating systems, etc.

A lot of Rails guidance in blog posts and on StackOverflow may be outdated.

This problem is not unique to Rails, and I still think Rails is great and relatively vibrant. Nevertheless, I suggest being very wary of Rails guides, blog posts, and StackOverflow answers that are more than a year old, and doing a careful study and inventory of gems, reviewing their recent usage and activity, before deciding to use them (a quick sketch of that below).
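
For the usage-and-activity review, something like the following sketch can help (it uses the public rubygems.org v1 JSON API; the endpoint and field names are assumptions to verify against the current API docs):

    # Rough sketch: eyeball gem health via the rubygems.org v1 API
    # (endpoint/fields assumed; check the current API docs).
    import json
    import urllib.request

    def gem_info(name):
        url = "https://rubygems.org/api/v1/gems/%s.json" % name
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    for gem in ("devise", "rodauth", "authentication-zero"):
        info = gem_info(gem)
        print("%s: version %s, %s total downloads" % (
            gem, info["version"], info["downloads"]))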


Number of hours per week would be nice to include. I think we'll see a lot more demand for, and supply of, 30- and 20-hour-per-week jobs.


The more common recommendation is to catch what you need to handle in a special way (if anything) and then have a catch-all (or re-throw) for the rest. If you don't need to catch anything specific, then just catch (or re-throw) everything.

Think of exceptions like error codes. Most often one just checks whether there is an error or not. Sometimes, one checks for specific error codes in addition to the general check. It would be rare to check every single error code, though possible.

By this analogy, I think the recommendation to check each type of exception is very uncommon.

Most importantly, make sure you always catch exceptions at some level and handle them somehow (even if it's just logging), and also make sure no exception/error information is lost (e.g. a blank catch block, not logging all exception details, or re-throwing without the inner exception so the original stack is lost).
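
A minimal Python illustration of that pattern (the config-loading function is just a hypothetical example):

    # Illustrative sketch: handle the specific case you care about, then a
    # catch-all that logs full details and re-raises so nothing is lost.
    import json
    import logging

    logger = logging.getLogger(__name__)

    def load_config(path):
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            # a specific, expected case we know how to handle
            logger.warning("config %s missing, using defaults", path)
            return {}
        except Exception:
            # catch-all: log full details (message + traceback) and
            # re-raise so a higher level can decide; never a blank
            # catch block that loses the error
            logger.exception("failed to load config %s", path)
            raise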

