Oleg Zabluda's blog
Thursday, June 21, 2018
 
How Can Neural Network Similarity Help Us Understand Training and Generalization?
How Can Neural Network Similarity Help Us Understand Training and Generalization?
"""
In order to solve tasks, deep neural networks (DNNs) progressively transform input data into a sequence of complex representations [...] Understanding these representations is critically important, [...] has proven quite difficult, especially when comparing representations across networks. In a previous post, we outlined the benefits of Canonical Correlation Analysis (CCA) as a tool for understanding and comparing the representations of convolutional neural networks (CNNs), showing that they converge in a bottom-up pattern, with early layers converging to their final representations before later layers over the course of training.

In “Insights on Representational Similarity in Neural Networks with Canonical Correlation” we develop this work further to provide new insights into the representational similarity of CNNs, including differences between networks which memorize (randomized labels) [...] from those which generalize [...] we also extend this method to provide insights into the dynamics of recurrent neural networks (RNNs), [...] Comparing RNNs is difficult in many of the same ways as CNNs, but RNNs present the additional challenge that their representations change over the course of a sequence. This makes CCA, with its helpful invariances, an ideal tool for studying RNNs in addition to CNNs. As such, we have additionally open sourced the code used for applying CCA on neural networks with the hope that will help the research community better understand network dynamics.

We found that groups of different generalizing networks consistently converged to more similar representations (especially in later layers) than groups of memorizing networks (see figure below). At the softmax, which denotes the network’s ultimate prediction, the CCA distance for each group of generalizing and memorizing networks decreases substantially, as the networks in each separate group make similar predictions.

We found that groups of different generalizing networks consistently converged to more similar representations (especially in later layers) than groups of memorizing networks (see figure below). At the softmax, which denotes the network’s ultimate prediction, the CCA distance for each group of generalizing and memorizing networks decreases substantially, as the networks in each separate group make similar predictions.

Perhaps most surprisingly, in later hidden layers, the representational distance between any given pair of memorizing networks was about the same as the representational distance between a memorizing and generalizing network (“Inter” in the plot above), despite the fact that these networks were trained on data with entirely different labels. Intuitively, this result suggests that while there are many different ways to memorize the training data (resulting in greater CCA distances), there are fewer ways to learn generalizable solutions.
[...]
wider networks [...] converge to more similar solutions than narrow networks. We also found that trained networks with identical structures but different learning rates converge to distinct clusters with similar performance, but highly dissimilar representations.
"""
https://ai.googleblog.com/2018/06/how-can-neural-network-similarity-help.html

Labels:


| |

Home

Powered by Blogger