Deep Learning



Currently, deep learning and neural networks applied to tabular data should only be used for time series; they can also do well inside an ensemble model, but they do not perform well as standalone models.


The recent paper out from Google, "Scalable and accurate deep learning with electronic health records", has a notable result in the supplement: regularized logistic regression performs essentially as well as the deep nets.



I've often thought that if your data consists of unstructured, hand-designed features, then DNNs don't make sense and you should use random forests, logistic regression, (naive) Bayes, etc. For some reason this feels like an uncommon perspective.
Of course, structured data like images/time-streams/etc. is great with DNNs, because you can build good priors into your architectures.
When I see people trying multilayered dense networks on problems with 10-15 unstructured features, my first thought is "why?".

I work at a hospital. We joke all the time that we can solve every problem in healthcare with a logistic regression.
It's not really a joke.

Marketing aside, this is not shocking to people who work on these problems.

Or domains like finance where you often have loads of data but an extremely weak signal, drowning in noise. Usually only the extreme regularising properties of restricting yourself to a linear model can prevent just fitting to the noise.
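The regularising effect described above can be sketched directly. Below is a minimal, stdlib-only logistic regression trained by gradient descent on hypothetical synthetic data with a weak signal (only one of ten features is informative); the function names and the penalty strength `lam` are illustrative, not from any of the papers discussed.

```python
import math
import random

def train_logreg(X, y, lam, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression with an L2 penalty lam.
    Returns the learned weight vector (no intercept, for brevity)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

random.seed(0)
# Weak signal drowning in noise: only feature 0 carries any information.
X = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]
y = [1 if xi[0] * 0.3 + random.gauss(0, 1) > 0 else 0 for xi in X]

w_loose = train_logreg(X, y, lam=0.0)  # unregularized
w_tight = train_logreg(X, y, lam=5.0)  # heavily regularized

norm = lambda w: math.sqrt(sum(v * v for v in w))
print(norm(w_tight) < norm(w_loose))  # the penalty shrinks the noise fit
```

Comparing the weight norms shows the point: without the penalty, the nine pure-noise features pick up nonzero weights; with it, the fit is pulled back toward zero, which is exactly the "extreme regularising property" a linear model buys you in a low-signal regime.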

There is also the joke that most classification problems can be solved with a SQL group by.
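The "group by" quip can be made literal: predict the most common label seen for each feature value, i.e. `SELECT plan, mode(churn) FROM t GROUP BY plan`. A toy sketch with entirely hypothetical data:

```python
from collections import Counter, defaultdict

def groupby_classifier(rows, labels):
    """The 'SQL group by' classifier: map each feature value to the
    majority label observed for it in the training data."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[row][label] += 1
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}

# Hypothetical example: predict churn from a single categorical plan type.
plans = ["basic", "basic", "pro", "pro", "pro", "basic"]
churn = [1, 1, 0, 0, 1, 0]
model = groupby_classifier(plans, churn)
print(model)  # {'basic': 1, 'pro': 0}
```

For strongly predictive categorical features this per-group majority vote is a surprisingly hard baseline to beat, which is the whole joke.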

The baseline models are using hand-engineered features. If you have access to, or can create, hand-engineered features for your data, you should absolutely use logistic regression. Deep learning remains interesting in cases where we don't have these kinds of features.

This is disingenuous to say the least. People in decision-making positions across healthcare institutions will think they now need to adopt complicated deep learning architectures and hire deep learning specialists, when instead the message should be "get your data in order and run a simple model that any of your statisticians can easily build".

Reminds me of a result I saw in Nature Biotech a few months ago. In Fig 2(https://www.nature.com/articles/nbt.4061/figures/2), the out of sample performance was higher with regularized regression than with their deep learning model, and this is without feature engineering. It seems strange to me that this got past reviewers.