Kaggle’s Yelp Restaurant Photo Classification Competition, Fast.ai Style (harveynick.com)
73 points by harveynick on June 24, 2018 | 15 comments



Nick (the author) reflects on this: "The data needs to be merged into a format which can be used to train a neural network. Solving this leads to the second, much bigger issue: many of the resulting label to image mappings are inappropriate...".

If one instead uses an RNN (recurrent neural network), particularly with LSTM, then it can take as input the sequence of all photos from a business, and the output of the model would then be the sequence of labels, similar to how translation models work. Of course, another problem could then be that the ordering of labels is unrelated to the order of photos for a business, but there is probably some way to handle this, either by data synthesis (multiple permutations of the training data to ignore ordering of labels) or by sorting the labels in a certain fashion that the model can learn.
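
A minimal sketch of that translation-style framing in PyTorch, assuming each photo has already been reduced to a fixed-size feature vector (all names and sizes here are invented, and label sequences are teacher-forced as one-hot vectors):

    import torch.nn as nn

    # Hypothetical sizes: 512-d photo features, 9 labels plus a "stop" token.
    FEATURE_DIM, HIDDEN_DIM, NUM_LABELS = 512, 256, 9 + 1

    class PhotosToLabels(nn.Module):
        """Encode a variable-length photo sequence, decode a label sequence."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.LSTM(FEATURE_DIM, HIDDEN_DIM, batch_first=True)
            self.decoder = nn.LSTM(NUM_LABELS, HIDDEN_DIM, batch_first=True)
            self.head = nn.Linear(HIDDEN_DIM, NUM_LABELS)

        def forward(self, photo_feats, label_seq_onehot):
            # photo_feats: (batch, num_photos, FEATURE_DIM)
            _, state = self.encoder(photo_feats)            # summarise the business
            out, _ = self.decoder(label_seq_onehot, state)  # teacher-forced labels
            return self.head(out)                           # logits per decoding step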


Hi, Nick here.

That's really interesting. I have another 2000-ish-word part two* I'm hoping to put out around the end of the week with some ideas for other things to try. Most of my ideas for next steps revolved around tweaking the data loader to output composite images, or tweaking the loss function to be more forgiving of false negatives. This is an alternative I hadn't considered.

You would still need 34+ layers of ResNet to detect the features, I think, but perhaps the fully connected classification layers could be replaced with an RNN. You would probably ignore all of the outputs aside from the last one, so that part of the ordering wouldn't matter. Input ordering might have an effect, though.
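
For illustration, a sketch of that variant, assuming the ResNet backbone already yields one feature vector per photo (hypothetical names throughout):

    import torch.nn as nn

    class RNNHead(nn.Module):
        """Replace the FC classification layers with an LSTM head."""
        def __init__(self, feature_dim=512, hidden_dim=256, num_labels=9):
            super().__init__()
            self.rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
            self.classifier = nn.Linear(hidden_dim, num_labels)

        def forward(self, photo_feats):
            # photo_feats: (batch, num_photos, feature_dim), any number of photos
            outputs, _ = self.rnn(photo_feats)
            last = outputs[:, -1, :]          # ignore everything but the final step
            return self.classifier(last)      # multi-label logits, one per label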

Assuming you don't object, I'm going to add this to the "extension ideas" at the end of my part 2 (with a hat tip, of course).

Of course the next problem would be figuring out how to actually accomplish this with the fast.ai library...

* I was originally on the fence about just publishing the entire 5000-ish words as a single post, but Ulysses actually started to have trouble with it and that pushed me over the edge.


The LSTM idea is good, but the ordering problem stems from the LSTM assumption that the input is in some way ordered (i.e. words in a text, or frames in a video).

I would instead try separate inputs for each image, followed by an aggregation layer. I think a maximum-pooling approach makes the most sense for most of the labels: for each restaurant there seem to be a few photos which carry all the information relevant to certain labels. When there are five interior images and one showing a patio, you want the latter to determine the label "outdoor dining".

For other characteristics, such as "elegant" (I'm making up these labels because I read the post yesterday and don't remember the specifics), average pooling might be a better fit.

In fact, a simple concatenation might work well, allowing the model to learn these things on its own. With differing numbers of images per restaurant that wouldn't work, though. Instead, one might try both MAX and AVG pooling, followed by concatenation. Or, if available, play around with other types of averaging, such as trimming outliers.
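
A sketch of that MAX + AVG pooling followed by concatenation, assuming precomputed per-image feature vectors (the names are made up, and this processes one business at a time since image counts vary):

    import torch
    import torch.nn as nn

    class PooledAggregator(nn.Module):
        """Aggregate a variable number of image features into one fixed vector."""
        def __init__(self, feature_dim=512, num_labels=9):
            super().__init__()
            # concatenating max- and average-pooled features doubles the width
            self.classifier = nn.Linear(2 * feature_dim, num_labels)

        def forward(self, image_feats):
            # image_feats: (num_images, feature_dim) for one business
            max_pooled = image_feats.max(dim=0).values   # "any one photo suffices"
            avg_pooled = image_feats.mean(dim=0)         # "overall impression"
            combined = torch.cat([max_pooled, avg_pooled], dim=-1)
            return self.classifier(combined)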

On another point: I was slightly put off by the criticism of the data structure. That structure follows directly from the problem being solved and isn't some oversight making life more complicated than necessary. To work with such data, I tend to throw it into a Rails application and export whatever format I need (although I'd use a Python solution if I were more familiar with one).


Actually the LSTM model doesn't assume anything about ordering itself, but it could probably overfit because of ordering, depending on how much training data is available, and in this case we know that the order of input and output is not directly related. One way to make the LSTM model more robust and less prone to over-fitting would simply be to make a number of copies of each training example, where each copy is mutated by changing the order of the images and tags.
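
A sketch of that mutation step (the data layout, a list of photos plus a list of tags per example, is an assumption):

    import random

    def permuted_copies(photos, tags, n_copies=4, seed=0):
        """Create shuffled copies of one training example so the model
        can't latch onto photo or tag order."""
        rng = random.Random(seed)
        copies = []
        for _ in range(n_copies):
            shuffled_photos = rng.sample(photos, len(photos))
            shuffled_tags = rng.sample(tags, len(tags))
            copies.append((shuffled_photos, shuffled_tags))
        return copies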


Could you clarify what you mean by "the criticism of the data structure"? I don't entirely follow your meaning, but I suspect there might be something in the post I need to update/clarify.


Glad to inspire; go ahead and use the idea. By the way, my full name is Kaveh Hadjari.

Yes, you would still need a good image classifier like ResNet-34 to actually extract usable features, which could then be used as input to an RNN model.

An important point if you actually try to implement such a model: don't train the image classifier together with the RNN. That would probably make training unnecessarily slow, and would end with either not making a dent in the weights of the ResNet or simply overfitting the relatively small and inappropriate training set Yelp provides for this competition.

Instead, use the ResNet in a preprocessing step: transform each image into the activations of a later ResNet layer and use those as the input to the RNN model. Note that when training the RNN you should persist these activations; otherwise you would unnecessarily run forward prop through the ResNet multiple times for each image. This way you get the benefit of the ResNet, which is great at extracting features from images, and the ability to train the RNN model much more quickly.
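
A minimal sketch of that preprocessing step in plain PyTorch (the loader and output path are hypothetical; fast.ai's `precompute=True` option does something analogous for its own classifier head):

    import torch
    from torchvision import models

    # Strip the final FC layer so the network outputs pooled activations instead.
    resnet = models.resnet34(pretrained=True)
    feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
    feature_extractor.eval()

    @torch.no_grad()
    def precompute_features(loader, out_path="photo_features.pt"):
        """Run each photo through the frozen ResNet once, then persist the
        result so the RNN can train on cached activations."""
        feats = []
        for images in loader:                 # loader yields batches of image tensors
            acts = feature_extractor(images)  # (batch, 512, 1, 1) after average pooling
            feats.append(acts.flatten(1))     # (batch, 512)
        torch.save(torch.cat(feats), out_path)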


I'd love to see more articles from fast.ai students covering topics other than image (multi) classification. I've been gradually going through the course, and attempting to apply what I'm learning as I go, but I rarely see good results, especially with structured data.


> "but I rarely see good results, especially with structured data."

What algorithms are you using? I don't know if I am a DS neanderthal, but DNNs are the last thing I try. xgboost and some ensembling with catboost usually get very good results with little effort.


There are lectures that specifically use deep learning (in the form of fastAI's high-level `ColumnarData` API) to get state of the art results on the Rossmann Kaggle competition, but they _do_ borrow a good amount of feature engineering from the third-place winners from that competition.

It might very well be that the DNN sledgehammer is simply the wrong tool for the job in most structured-data cases; if that's so, even an article outlining _why not_ to use FastAI for these scenarios (with actual performance comparisons) would be nice.

That's all I'm looking for -- I should have mentioned that I am almost 100% new to ML/DL/data science & friends.


I wouldn't call the ensembling of xgboost models + catboost "little effort". One thing that is true, however, is that xgboost will usually give better performance without tuning than DNNs without any tuning / hyperparameter search.


Pseudocode here:

    # fixed-weight blend of the two models' predictions
    y_hat_xgboost = xg.predict(x_test)
    y_hat_cat = cat.predict(x_test)
    y_hat = 0.7 * y_hat_xgboost + 0.3 * y_hat_cat

how is that not little effort?


I really think the FastAI course needs to be taken with a heaping dose of salt in the form of "here are all the non-DL solutions to common problems" -- there's a ton of material out there already, but based on anecdotal evidence, there are a lot of folks very new to ML & data science in general that are taking the course (i.e. people like myself).


Have you looked at fast.ai's ML course? It might also be worth a try. http://forums.fast.ai/t/another-treat-early-access-to-intro-...

I'd also recommend the Andrew Ng Coursera course, which I talked about here: https://harveynick.com/2018/04/25/some-notes-on-the-andrew-n...


It can be a bit tricky, there are a lot of gotchas, and it's generally hard to debug if you're new to it.

I've found it helps to walk through and dig into everything the notebooks are doing, how the data loaders work, etc. The Fast.ai library does a lot of automagical self-configuration, which can save time if you're experienced with it and know what to expect, but can burn a lot of time when it does something you don't expect, and don't notice it doing that.


DL for structured data is something I'm interested in as well. There's a fast.ai lecture going over categorical embeddings which is interesting. There's also an Airbnb article on the same topic that shouldn't be hard to find.
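
The core of that categorical-embeddings idea is small; here's a minimal sketch in plain PyTorch (column cardinalities, layer sizes, and names are all invented):

    import torch
    import torch.nn as nn

    class TabularNet(nn.Module):
        """Learn a dense embedding per categorical column, then feed an MLP."""
        def __init__(self, cardinalities=(7, 1115), emb_dim=4, n_cont=3):
            super().__init__()
            self.embeds = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
            self.mlp = nn.Sequential(
                nn.Linear(emb_dim * len(cardinalities) + n_cont, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, cats, conts):
            # cats: (batch, n_categorical) integer codes; conts: (batch, n_cont)
            embedded = [emb(cats[:, i]) for i, emb in enumerate(self.embeds)]
            return self.mlp(torch.cat(embedded + [conts], dim=1))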



