3 March 2018

David Tse recently gave a talk in which he argued that many well-studied machine learning problems have an unexplored information-theoretic angle. For example, instead of maximum likelihood, you can use the maximum entropy principle to fit a probability distribution to samples drawn from it.
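
To make that concrete, here is a minimal sketch (my own toy example, not from the talk) of the maximum-entropy idea: among all distributions on a fixed discrete support whose mean matches the sample mean, the entropy-maximizing one has the exponential-family form p(x) ∝ exp(λx), and fitting it reduces to solving for λ so the model's mean matches the data's.

```python
import numpy as np
from scipy.optimize import brentq

def maxent_fit(samples, support):
    """Fit p(x) ∝ exp(lam * x) on `support` so that E_p[x] equals the sample mean."""
    target_mean = np.mean(samples)

    def model_mean(lam):
        logits = lam * support
        p = np.exp(logits - logits.max())   # stabilized softmax over the support
        p /= p.sum()
        return p @ support

    # Solve for lam such that the model mean matches the sample mean.
    lam = brentq(lambda l: model_mean(l) - target_mean, -50.0, 50.0)
    logits = lam * support
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Example: rolls from a die that seems to favor high faces.
support = np.arange(1, 7, dtype=float)
samples = np.array([6, 5, 6, 4, 6, 5, 3, 6])
print(maxent_fit(samples, support))  # puts more mass on the larger faces
```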

That thought ran through my head as I was reading this new paper for journal club: the authors use mutual information as the criterion for determining which features of an image (or of any input more generally) are most responsible for the prediction made by a trained neural network: https://arxiv.org/pdf/1802.07814.pdf
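
Here is a toy illustration of that criterion, not the paper's actual method (the paper trains an explainer network to maximize a variational lower bound on the mutual information for each individual input). This sketch just ranks features globally by their estimated mutual information with a trained model's predictions, using sklearn's off-the-shelf estimator; the dataset and model are my own placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)   # only features 0 and 3 matter

# Train a small neural network, then explain its predictions rather than the labels.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
preds = model.predict(X)

# Estimate I(X_j ; prediction) for each feature j and rank the features.
mi = mutual_info_classif(X, preds, random_state=0)
ranking = np.argsort(mi)[::-1]
print("features ranked by MI with the model's predictions:", ranking)
```

Features 0 and 3 should come out on top, since they are the only ones the network can have learned to rely on.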

Written on March 3, 2018