Data engineers don't work with machine learning at all. In fact one of the reaso...

omega3 · on Nov 29, 2022

> Data engineers don't work with machine learning at all.

This very much depends on the company. From experience, DE is used as a catch-all title.

tomrod · on Nov 29, 2022

I haven't come across that. Unless the job title is being polluted like DS was (to include all aspects of data management), DE is specifically about data pipelines and not the models generating those clusters, predictions, or classifications.

jstx1 · on Nov 29, 2022

Data scientist is much more catch-all from what I've seen. But that varies a lot by geography too (for example in the US people very often use DS very differently from how the title is used in the UK).

pinum · on Nov 29, 2022

>in the US people very often use DS very differently from how the title is used in the UK

I haven't heard about this before and now I'm curious. Can you elaborate on the differences?

jstx1 · on Nov 29, 2022

In the US it's more common for a data scientist to be similar to a product analyst or a data analyst, perhaps with better technical skills. In the UK a data scientist is more likely to be someone who is doing applied ML work (other titles for this are ML engineer or applied data scientist).

Obviously it's not a perfectly clean separation, but it's a trend, and people sometimes end up really talking past each other. On r/datascience, which is very US-heavy, people often recommend that beginners not bother with advanced ML and stick to SQL, basic Python and analytics. In the UK data science job market that's outright bad advice (it's fine advice for the UK analytics market, which is a separate thing).

jghn · on Nov 29, 2022

This. "Data Engineering" is pretty far from having a standard definition in the wild. If someone is describing a role as "data engineering" about the only thing you can count on being true is that it involves data.

zhdc1 · on Nov 29, 2022

Somewhat surprised that there's a separate job category for what sounds like large-scale data cleaning and aggregation work (which IMO is 90%+ of the effort involved with data science).

Anyway, I'm going to go back to my 5K+ lines of code for an upcoming conference submission - almost all of which involve data cleaning and aggregation - and think about how I could be making 2x more than I am now.

Thanks Hacker News.

adammarples · on Nov 29, 2022

Depends. If their favourite data engineer says "Oh hey, I can write tensorflow too", then guess who gets the job of "productionizing" their crappy data science notebooks?

jstx1 · on Nov 29, 2022

You have two much more likely options:

1. The person who developed the notebook is responsible for productionizing it. (No, it's not all crappy notebooks, and some data scientists can indeed write high-quality code.)

2. You have someone like an ML engineer whose job it is to do this.

What you're describing seems like the least likely option; at least on the teams I've worked on "I can write tensorflow" would get you nowhere if that's not already a part of your job description.

adammarples · on Nov 29, 2022

Well, anyways that's what I was doing this time last year. Much too small a team to have a dedicated ML engineer though.