CATEGORICAL REPARAMETERIZATION WITH GUMBEL SOFTMAX
- Link: https://arxiv.org/pdf/1611.01144v1.pdf
- Proposes a continuous distribution on the simplex that approximates discrete (one-hot) vectors and is differentiable with respect to its parameters, via the reparameterization trick used in VAEs.
- It is used for semi-supervised learning.
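To make the sampling step concrete, here is a minimal NumPy sketch of drawing one relaxed one-hot sample. The function name `gumbel_softmax_sample`, the example probabilities, and the temperatures are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from unnormalized log-probabilities."""
    # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Softmax over the perturbed logits, scaled by the temperature
    y = (logits + g) / temperature
    y = y - y.max()  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.6, 0.3]))  # assumed class probabilities

soft = gumbel_softmax_sample(logits, temperature=1.0, rng=rng)   # smoother sample
sharp = gumbel_softmax_sample(logits, temperature=0.1, rng=rng)  # closer to one-hot
```

Lower temperatures push samples toward the corners of the simplex, and taking the argmax of a sample recovers an ordinary categorical draw (the Gumbel-max trick).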
DEEP UNSUPERVISED LEARNING WITH SPATIAL CONTRASTING
- Learns useful unsupervised image representations by applying a triplet loss to image patches. Each triplet consists of two patches from the same image (the anchor and the positive) and a patch from a different image (the negative). It gives a good boost on CIFAR-10 when used as a pretraining method.
- How would you apply it to a real, large-scale classification problem?
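The triplet construction described above can be sketched as follows. This is only an illustration: `embed` is a hypothetical stand-in for a convolutional encoder, the margin-based loss is the generic triplet form rather than the paper's exact objective, and the patch size and random images are assumptions.

```python
import numpy as np

def random_patch(image, size, rng):
    """Crop a random square patch of side `size` from an H x W x C image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

def embed(patch):
    """Hypothetical encoder: flatten and L2-normalize (a real setup would use a convnet)."""
    v = patch.reshape(-1).astype(np.float64)
    return v / (np.linalg.norm(v) + 1e-12)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on squared distances: pull the positive closer than the negative by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
image_a = rng.random((32, 32, 3))  # stand-ins for two different images
image_b = rng.random((32, 32, 3))

anchor = embed(random_patch(image_a, 8, rng))    # patch from image A
positive = embed(random_patch(image_a, 8, rng))  # another patch from image A
negative = embed(random_patch(image_b, 8, rng))  # patch from image B
loss = triplet_loss(anchor, positive, negative)
```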
UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION
- For a 110-layer ResNet, most of the contribution to gradient updates comes from paths that are 10-34 layers long.
- A ResNet trained with only these effective paths has performance comparable to the full ResNet. This is done by sampling paths with lengths in the effective range for each mini-batch.
- Instead of going deeper, adding more residual connections gives a bigger boost, because residual connections make the network behave like an exponential ensemble of shallow networks.
- Removing a residual block from a ResNet causes a negligible drop in test-time performance, in contrast to VGG and GoogleNet.
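The "exponential ensemble" view can be made concrete by counting paths. If each residual block can be either traversed or skipped, a network with n blocks has 2^n paths, and the number of paths passing through exactly k blocks is binomial. The sketch below assumes 54 residual blocks for the 110-layer network and counts paths only; it does not model how gradient magnitude shrinks with path length, which is what shifts the effective range toward shorter paths.

```python
from math import comb

def path_length_distribution(num_blocks):
    """Fraction of the 2**num_blocks paths that traverse exactly k residual blocks."""
    total = 2 ** num_blocks
    return [comb(num_blocks, k) / total for k in range(num_blocks + 1)]

num_blocks = 54  # assumed number of residual blocks in a 110-layer ResNet
dist = path_length_distribution(num_blocks)

# Average number of blocks a path goes through (half of them, by symmetry)
mean_length = sum(k * p for k, p in enumerate(dist))

# Share of all paths whose length falls in the 10-34 range mentioned above
mass_10_34 = sum(dist[10:35])
```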
The post What I read lately appeared first on A Blog From Human-engineer-being.
Source: Erogol – What I read lately
The paper's main claim is the following: traditional machine learning frameworks (VC dimension, Rademacher complexity, etc.) that try to explain how learning occurs do not explain the success of deep learning models well, and we need more understanding from different perspectives.
The authors rely on the following empirical observations:
- Deep networks can fit any kind of training data, even white-noise instances with random labels. This implies that neural networks have a very large brute-force memorization capacity.
- Explicit regularization techniques (dropout, weight decay, batch norm) improve model generalization, but that does not mean the same network generalizes poorly without them. For instance, an Inception network trained without any explicit technique has an 80.38% top-5 accuracy, whereas the same network achieved 83.6% on the ImageNet challenge with explicit techniques.
- A two-layer network with 2n+d parameters can represent any function on a sample of n points in d dimensions. The authors prove this statement in the appendix. From the empirical standpoint, they report network performance on the MNIST and CIFAR-10 datasets with a two-layer multilayer perceptron.
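The 2n+d parameter count can be illustrated with a small constructive sketch: project the inputs onto a random direction `a` (d weights), place n ReLU thresholds `b` between the sorted projections, and solve a triangular system for n output weights `w`. This follows the general idea of the appendix proof as I understand it; the function names, the random projection, and the toy data are assumptions, and the projections are assumed to be distinct.

```python
import numpy as np

def fit_two_layer_relu(X, y, seed=0):
    """Fit c(x) = sum_j w_j * relu(a.x - b_j) so that c(x_i) = y_i on all n samples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    a = rng.normal(size=d)           # d parameters: random projection direction
    z = X @ a
    order = np.argsort(z)
    z_sorted = z[order]
    # n parameters: thresholds interleaved as b_1 < z_1 < b_2 < z_2 < ... < b_n < z_n
    b = np.empty(n)
    b[0] = z_sorted[0] - 1.0
    b[1:] = (z_sorted[:-1] + z_sorted[1:]) / 2.0
    # Hidden activations form a lower-triangular matrix with a positive diagonal
    A = np.maximum(z_sorted[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])  # n parameters: output weights
    return a, b, w

def predict(X, a, b, w):
    """Evaluate the two-layer ReLU network on the rows of X."""
    return np.maximum((X @ a)[:, None] - b[None, :], 0.0) @ w

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))                   # n = 10 samples in d = 4 dimensions
y = rng.integers(0, 2, size=10).astype(float)  # random labels
a, b, w = fit_two_layer_relu(X, y)
num_params = a.size + b.size + w.size          # 2n + d = 24
fits_exactly = np.allclose(predict(X, a, b, w), y, atol=1e-6)
```

The labels here are random on purpose: the construction interpolates them regardless, which mirrors the memorization observation in the first bullet.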
These observations raise the following questions and conflicts:
- The traditional notion of learning suggests stronger regularization as we use more powerful models. However, a large enough network can memorize any kind of data, even pure random noise. Moreover, without any explicit regularization these models still generalize well on natural datasets. This shows that, contrary to general belief, brute-force memorization can still be a good learning method that yields reasonable generalization at test time.
- Classical approaches are poorly suited to explaining the success of neural networks, and more investigation is needed to understand what is really going on from a theoretical point of view.
- The generalization power of these networks is not really determined by explicit techniques; implicit factors such as the learning method or the model architecture seem to matter more.
- The explanation of generalization needs to be redefined in order to resolve the conflicts described above.
My take: These large models are able to learn any function (and large does not mean deep anymore), and if there is any kind of information match between the training data and the test data, they are able to generalize well too. One possible explanation is to think of these models as an ensemble of many millions of smaller models, selected by the zeroing effect of the activation functions. Thus, such a model can memorize any function due to its size and implied capacity, but it still generalizes well due to this ensembling effect.
The post Paper review – Understanding Deep Learning Requires Rethinking Generalization appeared first on A Blog From Human-engineer-being.
Source: Erogol – Paper review – Understanding Deep Learning Requires Rethinking Generalization
Google’s work in artificial intelligence is impressive. It includes networks of hardware and software that are loosely modeled on the system of neurons in the human brain. By analyzing huge amounts of data, these neural nets can learn all sorts of tasks, and in some cases, as with AlphaGo, they can learn a task so well that they beat humans. They can also do it at a much larger scale.
AI seems to be the future of Google Search and of the technology world in general. This specific method, called deep learning, is reinventing many of the Internet’s most popular and interesting services.
Throughout its conception and growth, Google has relied predominantly on algorithms that follow exact rules set by programmers (think ‘if this then that’ rules). Even with that apparent reassurance of human control, there are still some concerns about the world of machine learning, because even the experts don’t fully understand how neural networks work. However, in recent years great strides have been made in understanding the human brain and thus how neural networks could be wired. If you feed enough photos of a dog into a neural net, it is able to learn to identify a dog. In some cases, a neural net can handle queries better than algorithms hand-coded by humans. Artificial intelligence is the future of Google Search, and that means it will probably be a big influence on everything else.
Based on Google Search advances and AI like AlphaGo, experts expect to see:
- More radical deep learning architectures
- Better integration of symbolic and subsymbolic systems
- Expert dialogue systems
And with AI finally dominating the game of Go:
- Deep learning for more intricate robotic planning and motor control
- High-quality video summarization
- More creative and higher-resolution dreaming
Experts believe these methods can accelerate scientific research itself. The idea of scientists working alongside artificially intelligent systems that can home in on promising areas of research is no longer farfetched. It might happen soon, and 2016 looks like a good year for it.
The post What Will Be The Key Deep Learning Breakthrough in 2016? appeared first on 3Blades.
Source: 3blades – What Will Be The Key Deep Learning Breakthrough in 2016?