I hope I'm not too late to your thread to ask questions. I find your field absolutely fascinating, though I've never taken my interest further than basic undergrad image processing or simple machine learning. I wish I had more time to follow all of my interests.
Anyway, this one particular gray area has been bugging me, and I've been hoping to run into a researcher or someone with the appropriate context to clarify it for me. It's not a technical question, per se--it's a set of questions that build up to an uncertainty related to model training, reuse, and sharing. (I hope I'm asking the relevant questions...) I don't expect answers to all of this, as it would take way too much of anyone's time, but maybe you can pick a few key points and discuss. I'm also really dying to get feedback on the last part as to its feasibility.
1. Do computer vision / modeling researchers typically share common training and evaluation data sets, or is it mostly kept proprietary? If good data sets do exist out in the open, do they undergo continual improvement, or are they frozen? Are these training sets typically amenable to only one type of training--i.e., black box, offline vs. online, etc.? Does it take a lot of practice/skill to know how much to hold back for post-training evaluation? Do you have an estimate for how much time and effort is involved in manually annotating and curating these data sets?
2. Once an algorithm is developed, does a model get trained under different parameters? Does tweaking those parameters lead to vastly different results? Are there typically distinct optima for a given classification task, or can it vary? Does the training set have a big impact on the performance of the algorithm? Which is more important, the training data or the algorithm?
3. Once models undergo offline training, are they fast to run? Is there a typical runtime complexity, or do different types of models operate significantly differently under the hood (by principle or by computational complexity)? Can you run an extremely complicated and robust model with a ton of classification outputs in sub-millisecond time on commodity hardware?
4. Can trained models be packaged and redistributed as a kind of "open source"? Are there any obvious barriers preventing this, such as the existence (or proliferation) of patents in your field? Do computer vision researchers like to share their code / results? If it's not a common practice to share code, would a large number of researchers be downright opposed to having their (patentable) algorithms and (copyrightable?) models made available to others?
5. Are trained models too complicated for the layman to use and produce good, consistent results? (For our purposes, I would consider a layman to be someone with at least some understanding of basic computer science, including exposure to data structures, algorithms, and a little bit of mathematical ability. Perhaps none of this too deep.)
6. Highly related to the last question, would there be a lot of specialized knowledge required to tweak model input parameters? Would these parameters correlate to mathematical operations? Would there be any arcane and seemingly arbitrary weights to adjust that are deeply encoded into the model itself? (Not sure if I'm way out in left field here or not.)
I think what I'm ultimately hitting at is that it would be freaking awesome if there were a good set of robust pre-trained classifier models available for the layman programmer. Models that continually undergo development as improvements are made to training sets, algorithms, the literature--what have you. Please tell me if this kind of thing already exists.
Anyway, I feel like I'm about to go on a long-winded talk about a technology I've been imagining. It's something along the lines of a specification that allows for the broad spectrum sharing of reusable, generic, containerized ML and classifier training results with others. In the world I envision, you would share trained models and import them just as you would code libraries.
Let me reiterate: if this sort of thing happens to exist in the wild already, please point me to it! I can think of tons of uses. :)
Note: replied to my own comment because original text was too long.
Hi, I didn't notice this comment for a long time, but I felt bad not replying to something so long :)
1. Yes, good datasets exist. They grow over time and they are open. Building them takes a lot of time. ILSVRC is a good example. There are many more: Pascal VOC and COCO are probably the best known among them.
2. The optima vary. The training set has a huge impact and is super, super important. It has to be large and varied, otherwise you've lost even before you choose a model.
3. They can be very fast to run. With a good GPU, the most recent convnets can run in a few milliseconds per image.
4. Everything is open and BSD/MIT licensed, no issues at all. Models are often distributed; for example, the ILSVRC 2012 model is included with the Caffe framework.
5. The trained models are TRIVIAL for a layman to use. But they are not trivial to train.
6. Yes, there is a lot of specialized knowledge needed to tweak model params.
The summary is that these models are now trivial to use--you literally give it an image and it gives you very good predictions for what's inside in a few milliseconds (or a few seconds on a CPU rather than a GPU). They are not trivial to train and understand, though. That needs time, practice, and mathematical understanding. Caffe is the best framework to look at. Pretrained models are available for 1000 ILSVRC classes for classification and for 200 ILSVRC classes for detection, but not for many other tasks (e.g. scene classification).
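To give a flavor of the "trivial to use" part, here is a rough sketch of classifying one image with the pretrained ILSVRC model that ships with Caffe, using its Python bindings. The file paths and the image are placeholders standing in for the stock Caffe example files, so treat it as an illustration rather than an exact recipe:

import numpy as np
import caffe

# Placeholder paths for the reference ILSVRC model shipped with Caffe.
model_def = 'models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

# Wrap the pretrained network in Caffe's simple classification interface.
net = caffe.Classifier(model_def, model_weights,
                       mean=np.load('ilsvrc_2012_mean.npy').mean(1).mean(1),
                       channel_swap=(2, 1, 0),   # Caffe expects BGR channel order
                       raw_scale=255,
                       image_dims=(256, 256))

# Load an image and get class probabilities over the 1000 ILSVRC classes.
image = caffe.io.load_image('cat.jpg')
probs = net.predict([image])[0]
print('Top class index:', probs.argmax(), 'probability:', probs.max())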
(Note: Posted as self-reply because original comment was too long)
I picture a website kind of like Github, or perhaps a language package index like npm's. Except this website is concerned with classification instead of code. You would see the following kinds of downloads:
* Highly optimized, performant, pre-trained classifier models.
Perhaps numbering in the thousands if the site were popular.
You'd see a wide variety of classifiers: general ones,
specific things like "dog breeds" and "celebrities" and "car types".
* Common library code capable of running the models against user data.
Available in a variety of languages. The model classifier format would
have to be generic enough that it can run under libraries in any
environment.
There might also be downloads directed at supporting researchers. Stuff like:
* Well put-together training data sets
* Human-curated annotations, categorizations, ontologies, etc.
available as metadata that can be paired with the training sets in any
way desired. Not all of it may be useful for any given classifier.
Do you know if there is already an existing "standard format" for encoding or serializing pre-trained models for the purpose of sharing and exchanging them? There are likely a few algorithm-specific serialization formats for persisting internal graphs and weights and so forth. But some of these results are left trapped entirely within the confines of the internal data structures of a particular implementation...
In any case, I'm not aware of the existence of one universal format encoding all the things. Because why would there be? What would be "standard" about it? The algorithm space is rather wide and different algorithms encode different things, so there wouldn't even be cross-cutting similarities to take advantage of. It would be like inventing a file format for "text files and PNG images and fonts together!". Arbitrary, pointless. An absurd idea.
Assume it's not pointless, though, and start forming a picture of a universal data format or scheme for encoding and sharing all possible training results, irrespective of the source algorithm that produced them or the one required to apply them. We can't just encode the "training result" on its own, because we've already established how useless that would be. Instead, the universal scheme would have to encode at least three things: the computer-generated "training result" (a perception encoding), the predetermined set of possible "classification results", and a "language descriptor" written in an abstract machine language whose task is to bridge from the perception encoding to a classification result when a user input is provided.
descriptor(input, perception encoding) => classification result
Apart from the user input, everything else is encoded into our data serialization format. The "descriptor" would ordinarily have been some C or MATLAB (or whatever) code. It's the part that would have told us "this picture is of a KITTEN" or "this text was written by STEPHEN KING" given all the other inputs. Now it is an encoding of an abstract circuit, state machine, or some other language grammar. Notice also how this has become entirely self-hosting.
If there are other arguments, then,
descriptor(input, A, B, C..., perception encoding) => classification result
where metadata concerning the purpose, names, types, ranges, and defaults for `A, B, C...` is also encoded in the data format. The classification types, ranges, etc. must also be encoded,
classification result ∈ (class P, Q, R...)
Instead of being compiled to a reduced representation and inlined into the body of the "descriptor", the classification set could be provided as a parameter, adding a further degree of indirection. I won't show any further notation.
To quickly summarize: the language descriptor parses the model, accepts an arbitrary input (a classification set, possible parameters, and the subject material), and ultimately produces an output.
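To make that concrete, here's a purely hypothetical sketch of what one such container might hold, written as a Python dictionary for readability. Every field name is made up for illustration; the point is only to show the three pieces described above plus the parameter and class metadata:

# Hypothetical layout for the imagined "universal" classifier container.
# All field names here are invented for illustration.
model_container = {
    "format_version": "0.1",
    "descriptor": {
        # The "language descriptor": an abstract program (bytecode, circuit,
        # state machine...) mapping (input, parameters, perception encoding)
        # to a classification result.
        "language": "abstract-vm-0.1",
        "program": b"...",          # opaque blob, interpreted by a client library
    },
    "perception_encoding": {
        # The raw training result: weights, support vectors, trees, etc.
        "payload": b"...",
    },
    "parameters": [
        # Metadata for the extra arguments A, B, C... in the notation above.
        {"name": "threshold", "type": "float", "range": [0.0, 1.0], "default": 0.5},
    ],
    "classification_set": [
        # The possible classification results P, Q, R...
        {"id": "BOB-ROSS", "label": "Bob Ross painting"},
        {"id": "HAPPY-LITTLE-TREES", "label": "Happy little trees"},
    ],
}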
By now you've likely noticed that all of this mess isn't technically different from stuffing the executable of the classifier program itself into the serialization of the training results. You'd be correct, of course. It might seem arbitrary here, but I think I'll demonstrate a few nifty results later on. Besides, I'm not really suggesting they be contained in the same file.
To make use of these abstractions, there would need to be some client libraries (C++, Java, Python, etc.) provided that make it trivially easy to load and evaluate any of the classifiers from your own code. Since we went to the trouble of encoding the aforementioned "language descriptor" as an abstract grammar, the whole classifier (training and all) can be hosted and run from anywhere there is a client library provided, essentially making the classifier available from any language. What's more is the client libraries would not require constant updates to support new algorithm variations--it's baked into the data format, so we get the capability for free simply by swapping files.
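In Python, a hypothetical client library along these lines might expose little more than a load-and-classify pair of calls. "modelhub" and its functions are invented names, sketched here only to show the intended shape of the API:

# Hypothetical client library; "modelhub" and its API are invented for illustration.
import modelhub

# Load a shared container file. The library interprets the embedded descriptor,
# so new algorithm variations arrive by swapping files, not by updating the library.
model = modelhub.load("bob-ross-paintings-1.0.classifier")

with open("painting.jpg", "rb") as f:
    result = model.classify(f.read(), threshold=0.5)

print(result.top())   # e.g. ("BOB-ROSS", 0.95)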
Another cool thing we could do is define shared sets of "classification results". Instead of defining classes and categories and whatnot on a situation-to-situation basis, perhaps we could draw from a global pool of concepts and ideas pulled from the world. We could impart stable names to as many different things and concepts as possible: classes like "CAR", "PERSON", "BOY", "DOG", "TANK", etc. -- all designed to be globally unique and robust identifiers that can continue to evolve over time without breaking our classifier algorithms. A side benefit is that all classifiers would begin to speak the same shared language. (Granted, not all classification outputs would be amenable to this. Some would. Photos seem like a great use case.)
Now, if we were to build that category database as an ontology database instead... perhaps you could begin to semantically infer things?
{Dexter's Lab} implies {Cartoon}
Might we infer the person who uploaded it is a 90's kid?
Or for a more graph topology-based, semantic kind of result,
{Velociraptor} implies {Dinosaur}
               implies {Predator}
               implies {Extinct Animals}
               implies {Seen on film} {Jurassic Park}
Coincident occurrence with {Person} -> {Man} -> {Sam Neill}
{Sam Neill} was {Seen on film} {Jurassic Park}
We can probably be sure that you're looking at a still from {Jurassic Park} at this point.
I'm not claiming searching the graph like that would be efficient, of course. But if you've got ontological overlap that is cheap to check, it might be fun...
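As a toy illustration of "cheap to check" overlap, here is a minimal sketch with a hand-written, made-up ontology fragment. It walks the implication edges from each detected label and reports the concepts where the chains meet:

# Toy ontology fragment, hard-coded purely for illustration.
ONTOLOGY = {
    "Velociraptor": ["Dinosaur", "Seen on film: Jurassic Park"],
    "Dinosaur": ["Predator", "Extinct Animals"],
    "Sam Neill": ["Man", "Seen on film: Jurassic Park"],
    "Man": ["Person"],
}

def implications(concept, seen=None):
    """Collect everything a concept transitively implies."""
    seen = set() if seen is None else seen
    for parent in ONTOLOGY.get(concept, []):
        if parent not in seen:
            seen.add(parent)
            implications(parent, seen)
    return seen

# Labels produced by classifiers running over the same image.
detected = ["Velociraptor", "Sam Neill"]

# Concepts implied by every detected label at once.
overlap = set.intersection(*(implications(label) for label in detected))
print(overlap)   # {'Seen on film: Jurassic Park'}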
The classifier ontology would be versioned, but probably very slow moving in terms of changes. It might be impractical to package the entire database with an app. With ontologies, you can plug in subsets and wire them to other ones later on.
The pre-trained classifier data can be versioned. Say that someone else did all the work of training it to recognize Bob Ross paintings or whatever.
classifier = new Classifier("algorithm").forModel("BOB-ROSS-PAINTINGS-0.1")
classifier.classify(new Image("http://i.imgur.com..."));
Oops, someone forgot to include happy little trees. But we can fix that,
// Happy Little Trees edition.
classifier = new Classifier("algorithm").forModel("BOB-ROSS-PAINTINGS-1.0")
classifier.classify(new Image("http://i.imgur.com..."));
Produces results:
95% Bob Ross
100% Happy Little Trees
75% Happy Little Clouds
If this kind of tooling and ecosystem existed, do you even know how much fun I could have on Reddit?
But in all seriousness, think of the practicality of reusability. Downloading and running classifiers other people trained, from the language of your choice? That's powerful and empowering. It takes the tech out of the realm of "Google playtoy" and puts it in our collective hands.
Think of the kinds of novel apps that the Average Joe programmer could develop. And if this type of thing truly got the support of the image processing crowd, I can't fathom how much improvement we'd witness on a year-to-year basis.
Does anything like this exist in your field for researchers now? If so, could it be made to be usable by laymen? (Or does it already exist for general audiences? Am I living under a rock?)
If this kind of thing doesn't exist or isn't shared, what steps could be taken toward making something like this a reality? Are there critical gating pieces that need to come together first in order to make all of it work? Or conversely, do you feel strongly that something like this just isn't feasible?
A possible complication in building an "open source" set of classifiers is the deep knowledge required to contribute. And what about patents? It's my understanding that universities like to patent research (e.g. SIFT), and AFAIK there must be broad coverage of this space by the universities. That would be a major setback.
Anyway, I've rambled on far too much. If anyone managed to read all of that, please forgive me for inundating you with such a crazy, ill-informed, and long-winded diatribe. I hope I made sense.
The neural architectures change, so a set of parameters for one network won't run on another. The input image size changes too, which changes the network behind it.
There is not much work on mapping one set of parameters onto another. Transfer learning might have a little applicability, but it's unlikely to help at the bleeding edge of vision research.
Worth noting that a neural network IS a general-purpose function encoding.
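For what it's worth, the common feature-reuse flavor of transfer learning can be sketched in a few lines. This uses torchvision as one possible toolkit (my choice of example, not something mentioned above): the pretrained network is frozen and only a new final layer is trained for a new set of classes.

import torch
import torch.nn as nn
from torchvision import models

# Start from a network pretrained on ILSVRC/ImageNet.
net = models.resnet18(pretrained=True)

# Freeze the pretrained parameters; only the new head will be trained.
for param in net.parameters():
    param.requires_grad = False

# Replace the 1000-class ILSVRC head with a head for, say, 10 new classes.
num_new_classes = 10
net.fc = nn.Linear(net.fc.in_features, num_new_classes)

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# ...followed by an ordinary training loop over the new dataset.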