Nothing shocking. GUIs for machine learning have existed for decades, for example Weka and RapidMiner. I don't see "drag and drop" in the screenshots, however; it's just awful gadget-journalism lingo, common for a site that reviews gaming mice.
ETL-style dataflow pipelines are more natural for such tasks than imperative programming; it's not just "for people who can't code". RapidMiner is dataflow-based too.
Actual links, if you want to avoid the gadget news website:
https://docs.microsoft.com/en-us/azure/machine-learning/serv...
https://docs.microsoft.com/en-us/azure/machine-learning/serv...
So, I clicked on the link to understand how this was different than their existing drag-and-drop machine learning tool, and...
> This tool, the Azure Machine Learning visual interface, looks suspiciously like the existing Azure ML Studio, Microsoft’s first stab at building a visual machine learning tool. Indeed, the two services look identical. The company never really pushed this service, though, and almost seemed to have forgotten about it despite the fact that it always seemed like a really useful tool for getting started with machine learning.
> Microsoft says this new version combines the best of Azure ML Studio with the Azure Machine Learning service.
> In practice, this means that while the interface is almost identical, the Azure Machine Learning visual interface extends what was possible with ML Studio by running on top of the Azure Machine Learning service and adding that service's security, deployment and life cycle management capabilities.
The problem with this approach is that so much of machine learning is dependent on the datasets you choose to give it. If people need their hand held through setting up a basic neural network, I foresee a lot of garbage in, garbage out.
This sentiment gets expressed every time programming is made more accessible.
It always turns out that the difficulty of "setting up a basic [hello world application]" is entirely unrelated to the essential complexity of the problem space, and attracting a broader range of new users is later viewed as a valuable advance.
Well, for example, if people don't understand the difference between continuous, discrete, and unordered (categorical) values, then their models will have major flaws. Say they are trying to build a NN to predict customer orders by geographic area: if they treat zip codes as continuous or discrete values they're going to get really strange results, because ML is ultimately just interpolation/extrapolation. Idk how well drag and drop can convey those principles.
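To make the zip code point concrete, here's a toy sketch in scikit-learn (made-up numbers and column names, purely illustrative):

```python
# Toy sketch of the zip code pitfall: treating a nominal code as a number
# implies that 94105 is "greater than" 10001, which means nothing here.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "zip":    ["10001", "10001", "94105", "94105", "60601"],
    "orders": [120, 130, 40, 35, 80],
})

# Wrong: zip as a continuous number -> the model interpolates between codes.
bad_X = df[["zip"]].astype(int)
LinearRegression().fit(bad_X, df["orders"])

# Better: zip as a categorical feature -> each area gets its own indicator.
good_X = OneHotEncoder(handle_unknown="ignore").fit_transform(df[["zip"]])
LinearRegression().fit(good_X, df["orders"])
```

A GUI can hide the encoding step entirely, which is exactly the problem.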
I see user skills and the fact that the tool is drag and drop as two different issues, in this context.
A fluent Python developer that doesn't understand basic ML concepts can easily use something like Scikit to code and build "wrong" models. By "basic concepts" I mean standard tasks like data preparation for specific algorithms, sampling, evaluation methods, testing for bias, or just generally how to properly execute most ML tasks like the ones prescribed by something like CRISP-DM.
Someone with basic coding skills - e.g. knows SQL and some imperative programming - but with a solid understanding of ML tasks and how to execute them properly probably has a better chance of coming up with better results than the former, using something like IBM Modeler or RapidMiner.
Note that I'm not saying that a drag and drop tool is superior; you could build a flow-based GUI for Scikit, so a tool like this is always, at most, an interface to some code libraries (Scikit, in this example). Having full access to the actual lib, or just better libs, is likely to be less constraining, and more apt for more sophisticated approaches.
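To make that "interface to a library" point concrete, every box in such a flow roughly corresponds to one step of a code pipeline; a minimal sketch with scikit-learn (built-in dataset, invented box names):

```python
# Sketch of what a boxes-and-arrows flow bottoms out as in scikit-learn:
# each visual node is just a pipeline step, so the GUI is an interface over
# code like this rather than a replacement for understanding it.
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

flow = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # "Clean Missing Data" box
    ("scale",  StandardScaler()),                   # "Normalize Data" box
    ("model",  LogisticRegression(max_iter=1000)),  # "Train Model" box
])

# Evaluation still has to be done properly, whatever the interface looks like.
print(cross_val_score(flow, X, y, cv=5).mean())
```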
Yeah, but... the sentiment exists for a reason. Look, I’m all for growing programming talent, but I’ll leave the toy visual/drag and drop interfaces to those who are interested in using them to build intuition (absolutely nothing wrong with that).
But beyond that, I’m going to use the grown up tools to build real things (and get paid for doing so).
Looks closer to the open source R tool from Revolution Analytics, which they acquired a while back. Either way, node/graph-based ML workflow tools aren't new. SAS has had a pretty good one for well over a decade, although it definitely shows its age these days.
I want to build, train and deploy an ML model in production.
I do know how to program.
I still want this tool. Badly. So badly. Just because I can program doesn't mean I want to use programming to solve every problem. I want simple tools that do the complex things for me for the 90% of cases where they are good enough.
Let me spend my time on the 10% of problems that simple tools can't solve!
ML is so complicated and so black box-y, it's not like it's a todo list or something.
It seems like this would be the worst of both worlds. Too simple to be of any use to any ML engineer, too complicated for the uninitiated, not customizable enough for a domain expert.
That's because it's not programming. ML describes a strategy using machines to derive information from data. You use programming languages to tell machines how to execute on that strategy.
'tricking'. Such typical tech guy condescension. Excel provides a ramp all the way from "put numbers in boxes" through complicated multidimensional calculations to... well game development if you're crazy enough. It's a good tool for the tasks it was designed for, and it can be used by people of all skill levels.
There was a post yesterday about how it is desirable for Fortran to be preserved as opposed to porting Fortran code to C++, because scientists want to focus on expressing the science and math, not on the low level details of passing arrays around and performing safe array access (which Fortran compilers are more strict about and will emit errors for).
I haven't used this tool, but it seems reasonable that the people it might be useful for (regardless of whether Microsoft PR recognizes this or not) are scientists who have an idea for an algorithm but don't want to spend too much time thinking about how to write and deploy Python/C++ code to their clusters.
Sure, it may not seem like that much effort to many programmers, but as the Fortran discussion stressed, just because you can code and think logically, doesn't mean you're a programmer.
People possessing domain and subject knowledge are usually not coders, so it makes perfect sense to give them a tool that eases the transition into using this tech. Make the tools accessible enough that they cater to 80% of use cases and you have already won the battle.
Funny enough, when I saw a demo for this, it was by a programmer, and that was one of his main tasks at his job. I had the same impression, but he actually made a quite sound argument for it. He wasn't a machine learning expert (yet), and it allowed him and his team to quickly construct models and see what the best fit was. He was quite efficient at using it, and showed us how to build and run models in under an hour. There was no code they had to maintain, which was ultimately a lot less work for them.
I think a lot of biologists (such as myself) would find drag and drop ML very useful.
I’m not interested in deploying a service or anything, but it would be a great way to take a first pass at analyzing some of the huge and pretty complex datasets that we generate, like metagenomic DNA sequences of microbiotas that are paired with health related information that could also be fed into the model.
Even just narrowing down a list of potential targets would be pretty darn useful.
In the analytics departments of enterprise-sized companies and organisations sit a lot of mathematicians, economists and statisticians. These are the people who are going to use ML to change the world, because business intelligence, prediction and analytics live in these departments.
Not a lot of these people can program.
I know a lot of programmers who want to work with ML, but in my experience, very few programmers are good enough at math or statistics to do so, and even fewer have the business skills to actually translate their results to management in non-tech based organisations.
I’m sure a lot of programmers will make excellent data-scientists, but I’m not entirely convinced why I would bring ML to my programmers rather than my people who have degrees in applied statistics and organisations.
I work in the public sector. One of the reasons ML hasn't found its golden use case yet is largely that no one has figured out how to use ML in a way that is better than the decades' worth of data-related work we have already done, and part of the reason for this is that the companies who sell ML are programmers. They know how to use ML to identify things, but none of them, not even IBM, seem to know how to use ML for something they can actually get us to buy. And lord knows both sides of the table have tried; I mean, even our political leadership has heard of the ML hype and wants us to use it. So I'm rather hopeful these tools for non-programmers will bring ML into the hands of people who will know what to use it for.
I think there's value in these services for marketers, or target-ers. I see a lot of potential clients that know only Excel/basic SQL and have a large list of customers/orders, or voters and donors. If they could upload it and get back a scored list of top targets or leads, that would add value. They just need to upload that list back to FB or wherever.
I don't see what the market would be for implementing this in production; like you say, making it live in production requires knowing how to code. Perhaps, though, I could see Segment offering a similar tool, which I guess would be 'in production' without code.
I think that’s true regarding where value will be produced, but Microsoft can sell way more subscriptions if it’s point and click and presumably wrong. I will say though that they recently added suggested chart types in excel and they don’t seem half bad.
Nice. Microsoft has been doing some great work the last couple of years with ML - especially for beginners it's great to get started with their tooling. I've always liked the UI and workflow parts in Azure ML.
I still use their free Jupyter Notebooks service also.
By the way, I can see massive adoption of this in consulting. Often we develop 'quick and dirty' models to test variables or do a high-level regression / predictive model. Having this without the need for Python code is extremely useful, especially when you work on short-term, high-burn projects and just need an 80/20 answer.
I wonder who their target market is. ML requires a solid math background and the ability to customize every detail of the process, which drag-and-drop tools never fully provide. I understand that it's always beneficial to have more user-friendly tools, but I still think ML experts - at least those who don't just copy-paste code snippets from SO - would still prefer the more professional R and Python packages.
Maybe Microsoft aims at teaching ML to beginners, which still would be detrimental if they get used to just that.
I had this exact same thought when I read the headline. It seems like MS and others are viewing ML as a similar opportunity to Big Data/BI ten years ago. You saw the "democratization of data" as people with little technical skills could suddenly create analytics dashboards within tools like Tableau.
In my opinion, it's far too easy to make a critical mistake during design/implementation of ML to follow this same path. And what's more, if you mess up making an analytics dashboard, it's usually fairly obvious. In ML, there are MANY ways to mess up a model and you have no easy way to tell.
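One hedged illustration of how silently this goes wrong (synthetic noise data; the leaky version cheerfully reports a good score on data that contains no signal at all):

```python
# One of the MANY silent mistakes: doing feature selection on the full dataset
# before splitting. Nothing crashes; the accuracy just comes out optimistically
# wrong, and nothing in the tool flags it.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10_000))   # pure noise features
y = rng.integers(0, 2, size=100)     # random labels

# Leaky: feature selection sees the labels of the future test rows.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
leaky = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# Correct: split first, select features on the training fold only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
sel = SelectKBest(f_classif, k=20).fit(X_tr, y_tr)
honest = LogisticRegression().fit(sel.transform(X_tr), y_tr).score(sel.transform(X_te), y_te)

print(leaky, honest)  # leaky score is inflated; the honest one is ~chance
```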
If someone doesn't have the technical experience behind creating these models, I would not trust any output they give me from using one of these tools. And if they do have the experience, they would certainly not be choosing to use one of these tools either.
Can you please elaborate more on what kind of critical mistakes a machine can make that someone with a math background would not make?
I am building a competing tool, so I am not affiliated with MS, but I do think that AutoML has value.
Machine learning is different from imperative programming in that most of the "programming" is done by experiments and not with an actual "program", hence there is an opportunity to replace programming with compute. I.e. an AutoML platform can create hundreds of models/pipelines and just try them all.
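A rough sketch of what that looks like (synthetic data and a deliberately tiny search space; real AutoML systems search far larger ones and tune hyperparameters too):

```python
# "Replace programming with compute": enumerate candidate pipelines and let
# cross-validation pick the winner instead of hand-crafting one model.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "logreg":   make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest":   make_pipeline(RandomForestClassifier(n_estimators=200, random_state=0)),
    "boosting": make_pipeline(GradientBoostingClassifier(random_state=0)),
}

scores = {name: cross_val_score(p, X, y, cv=5).mean() for name, p in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```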
Also, why would you trust a model which was created manually over a model which was auto-created?
When a model is created by AutoML it passes the same validation process as a manually created model, so in both cases the quality of the model should be judged independently of the way it was created.
In addition, all models (regardless of how they were created - human / not human) should be monitored for predictive performance. I.e. I will not "trust" any model without continuous verification.
A common error is target leakage. An AutoML system will likely consider the leaked feature a "strong feature". This is where having someone who actually understands the business domain is critical.
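A contrived sketch of that (invented column names; "days_until_cancellation" is only known because the customer already churned, so it encodes the label):

```python
# Target leakage: an AutoML system sees a near-perfect "strong feature";
# a domain expert sees a column that cannot exist at prediction time.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
churned = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, size=n),
    "support_calls": rng.poisson(2, size=n),
    # Leaky feature: only populated for customers who already churned.
    "days_until_cancellation": np.where(churned == 1, rng.integers(1, 30, size=n), -1),
    "churned": churned,
})

X, y = df.drop(columns="churned"), df["churned"]
print(cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean())  # suspiciously near 1.0
```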
There's no question that there's value in AutoML systems, yet most ML production systems I've worked on / seen were way more complex than feature vector -> model -> prediction.
You likely have multiple models, pipelines, normalizations and plain old conditionals. Hard to automate all of this.
Right. I am aiming at the group of companies that have zero data scientists and would like to avoid hiring one. I assume that their use cases are simple/common and can be automated.
Note that automation is not only building the model, but automating the full life cycle - preprocessing, hyperparameter optimization, pipeline deployment and monitoring/retraining.
> "Can you please elaborate more on what kind of critical mistakes a machine can make, while someone with math background would not make.
I am building a competing tool"
the short answer is: go study stats and the fundamentals of ML instead of asking HN to build your product for you.
> "why would you trust a model which was created manually and not a model which was auto created."
one of many reasons: domain knowledge is important, and math alone can't tell you things are muffed up. Contrived example: you build a linear regression model to predict home price, and square footage has a negative coefficient. Math conclusion: bigger house = lower price. Domain knowledge: oh, we are missing a feature, and the model can't tell the difference between city homes and rural ones.
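Putting made-up numbers on that contrived example:

```python
# Omitted-variable sketch: small expensive city homes plus large cheap rural
# homes give square footage a negative coefficient until location is added.
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: [sqft, is_city]
X = np.array([
    [800, 1], [900, 1], [1000, 1],    # small, expensive city homes
    [2500, 0], [2800, 0], [3000, 0],  # large, cheaper rural homes
], dtype=float)
y = np.array([900_000, 950_000, 1_000_000, 400_000, 420_000, 450_000], dtype=float)

only_sqft = LinearRegression().fit(X[:, :1], y)
print(only_sqft.coef_)        # negative: "bigger house = lower price"

with_location = LinearRegression().fit(X, y)
print(with_location.coef_)    # sqft coefficient turns positive once location is in
```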
there is value to AutoML, but there is a lot of room to go horribly wrong
Again, my point is that for a given data set, an AutoML system is much more efficient and radically cheaper than a human modeler.
You are pointing to an area outside the realm of AutoML (feature engineering/generation), which is domain specific. But this was not my original question.
This has nothing to do with feature engineering and generation. I never added or changed any features in the example. It is exactly in the realm of AutoML: you run a model, and because you are missing data, your model makes wrong assumptions.
You could argue (which you didn't) that this would fall under model interpretation, but a model in this example would probably fail to generalize and make bad predictions in the future, i.e. slamming home values because they have large square footage.
>In ML, there are MANY ways to mess up a model and you have no easy way to tell.
What about all those businesspeople who only hire analysts to tell them (and their peers) what they want to hear? Now they can tell themselves what they want to hear, having laundered it through a computer.
A lot of ML problems are already solved fairly well and people want to use them in their products. For example, say you wanted to make a smartphone app that had some kind of image recognition. You aren't trying to invent a new machine learning algorithm. This tool would be very convenient for making an app like that.
You wouldn't trust one of these people to work on your project, would you?
The problem gets worse in unsupervised ML, e.g. cluster analysis. Whatever variables you choose, clustering will give you some results. But only an experienced person can understand what variables to choose for the clustering, how to do it, and what those clusters really mean. You can't just try different things in clustering until it "works", because it always works.
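A quick sketch of what "it always works" means (pure noise in, tidy clusters out; nothing in the output says the groups are meaningless):

```python
# k-means on data with no structure at all still returns k clusters and an
# inertia score; interpretation has to come from the analyst, not the tool.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))  # no clusters exist in this data

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))  # five tidy-looking clusters, every time
print(km.inertia_)              # a score that "works", regardless of meaning
```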
I was taking a look at H2O.ai's autoML dashboard yesterday, which they call Driverless AI. Broader in scope, includes an interpretability feature, and seems a little less white box-ey than what I could see from MS. Plus, great looks. Haven't tried it first-hand, but I did take a mental note.
I think an ideal generalized ML service is more like - you give it a CSV and then add another row with missing column(s) and it guesses what should be in those column(s) along with some human readable explanation of how it got there.
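A stripped-down sketch of that idea (file name and column names are made up, and it assumes the feature columns are numeric and complete):

```python
# Train on the complete rows of a CSV, then guess the missing column for the
# incomplete rows; feature importances are a crude stand-in for an explanation.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("sales.csv")   # hypothetical file
target = "monthly_spend"        # hypothetical column to fill in

known   = df[df[target].notna()]
unknown = df[df[target].isna()]
features = [c for c in df.columns if c != target]

model = RandomForestRegressor(random_state=0).fit(known[features], known[target])
guesses = model.predict(unknown[features])

print(sorted(zip(model.feature_importances_, features), reverse=True)[:3])
```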
So it's like Scratch for ML? You 'draw' a program with an unwieldy graphical interface, and when it inevitably becomes a visual mess of complexity, what can you do?
Machine Learning pipelines are at their core a sequence of data transformations. Having a concrete visualization of said transformations may make developing ML pipelines easier, and also allow for clearer communication between developers (and their managers, and potentially future auditors) as to what components they're working on and how they fit together.
Also, by using input/output blocks (as opposed to generic functions that can potentially access global state), the data dependencies of different components are made explicit, which can make tooling around them easier. (I don't know how strongly this is enforced in this implementation.)
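A small sketch of the input/output-blocks point (file and column names invented): if each step is a plain function from input to output, the wiring between boxes is literally just composition, with no global state to audit.

```python
# Each "block" only sees its input and returns its output, so data
# dependencies are explicit and easy to tool around.
import numpy as np
import pandas as pd

def load(path: str) -> pd.DataFrame:                 # "Import Data" block
    return pd.read_csv(path)

def clean(df: pd.DataFrame) -> pd.DataFrame:         # "Clean Missing Data" block
    return df.dropna(subset=["amount"])

def add_features(df: pd.DataFrame) -> pd.DataFrame:  # "Feature Engineering" block
    return df.assign(log_amount=np.log1p(df["amount"]))

result = add_features(clean(load("orders.csv")))     # hypothetical file/columns
```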
I remember using this in NINETEEN NINETY FIVE (it was called Clementine then).
Interestingly, one of the leads (Rob Milne) sold up (to IBM; a forced sale, I guess, due to a cash squeeze and no investors) and went to Everest, where he got to the bottom of the Hillary Step, had a massive heart attack and died.