Abstract
We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals) while also maximizing information gain or intrinsic value (i.e., reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This formulation is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and of optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as in previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms.
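The decomposition described in the abstract can be illustrated with a minimal numerical sketch. The function and variable names below (`neg_efe`, `Qs_explore`, `gamma`, and so on) are hypothetical and not taken from the paper or from SPM; the sketch assumes a discrete generative model with a likelihood matrix `A` mapping hidden states to outcomes, beliefs `Qs` over states under a policy, and log prior preferences `C` over outcomes. Negative expected free energy is computed as extrinsic value (expected log preference) plus epistemic value (expected information gain), and policies are then scored with a softmax whose precision plays the role described in the abstract.

```python
import numpy as np

def neg_efe(A, Qs, C):
    """Negative expected free energy of a policy, decomposed into
    extrinsic value (expected utility under prior preferences) and
    epistemic value (expected information gain about hidden states)."""
    Qo = A @ Qs                       # predictive distribution over outcomes
    extrinsic = Qo @ C                # expected log prior preference
    epistemic = 0.0
    for o, po in enumerate(Qo):
        if po > 0:
            post = A[o] * Qs / po     # Bayesian posterior Q(s | o)
            # KL divergence from prior to posterior, weighted by P(o)
            epistemic += po * np.sum(
                post * (np.log(post + 1e-16) - np.log(Qs + 1e-16)))
    return extrinsic + epistemic, extrinsic, epistemic

# Two hypothetical belief states over two hidden states and two outcomes
A = np.array([[0.9, 0.1],             # informative likelihood P(o | s)
              [0.1, 0.9]])
C = np.log(np.array([0.8, 0.2]))      # log prior preferences over outcomes
Qs_explore = np.array([0.5, 0.5])     # uncertain beliefs: high epistemic value
Qs_exploit = np.array([0.99, 0.01])   # resolved beliefs: epistemic value near 0

q1, ext1, epi1 = neg_efe(A, Qs_explore, C)
q2, ext2, epi2 = neg_efe(A, Qs_exploit, C)

# Softmax over policy quality, with precision gamma as inverse temperature
gamma = 2.0
q = np.array([q1, q2])
p_policy = np.exp(gamma * q) / np.exp(gamma * q).sum()
```

Note how epistemic value vanishes as beliefs become resolved, so the softmax increasingly selects on extrinsic value alone; this is the exploration-to-exploitation transition the abstract describes.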
Notes
1 Variational free energy was introduced by Richard Feynman to solve inference problems in quantum mechanics and can be regarded as a generalization of thermodynamic free energy. In this paper, free energy refers to variational free energy. We will see later that minimizing free energy (or maximizing negative free energy) corresponds to maximizing expected value.
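As a reminder of the quantity this note refers to, variational free energy for observations $o$, hidden states $s$, and an approximate posterior $q(s)$ can be written using the standard identity (this is generic notation, not the paper's own):

```latex
F = \mathbb{E}_{q(s)}\!\big[\ln q(s) - \ln p(o,s)\big]
  = -\ln p(o) + D_{\mathrm{KL}}\!\big[q(s)\,\big\|\,p(s \mid o)\big]
  \;\geq\; -\ln p(o)
```

Because the KL divergence is non-negative, minimizing $F$ with respect to $q$ both tightens an upper bound on surprise $-\ln p(o)$ and drives $q(s)$ toward the true posterior, which is the sense in which free energy minimization solves inference problems.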
2 Note the dialectic between minimizing the entropy expected in the future and maximizing the entropy of current beliefs, both of which are implicit in minimizing free energy; see Friston et al. (2012), "Perceptions as hypotheses: Saccades as experiments," Front Psychol. 3: 151.
3 For readers interested in technical details, the simulations (and figures) reported in this paper can be reproduced by downloading the academic freeware SPM. Annotated Matlab scripts can then be accessed through a graphical user interface (invoked by typing DEM and selecting “epistemic value”). Please visit http://www.fil.ion.ucl.ac.uk/spm/software/
4 This is a fairly subtle assertion that lies at the heart of active inference. Put simply, agents will adjust their expectations to minimize the free energy associated with any given observations. However, when the agent actively samples observations, it has the opportunity to choose observations that minimize free energy—an opportunity that is only realized when the agent believes this is how it behaves. A more formal proof by reductio ad absurdum—that appeals to random dynamical systems—can be found in Friston and Mathys (2015), "I think therefore I am," in Cognitive Dynamic Systems, S. Haykin (Ed.), IEEE Press: in press. In brief, to exist, an ergodic system must place an upper bound on the entropy of its states, where entropy is the long-term average of surprise. Therefore, any system that does not (believe it will) minimize the long-term average of surprise does not (believe it will) exist.
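The ergodic premise this note appeals to can be stated as a standard identity (again in generic notation): under ergodicity, the entropy of the distribution over states equals the long-term time average of surprise,

```latex
H\big[p(s)\big]
  = \mathbb{E}_{p(s)}\!\big[-\ln p(s)\big]
  = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} -\ln p\big(s(t)\big)\, dt
```

so placing an upper bound on the entropy of states is equivalent to bounding time-averaged surprise, which is the step the reductio in the note relies on.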
5 The values of one half in the first block of the A matrix mean that the agent cannot predict the cue from that location. In other words, there is no precise sensory information and the agent is "in the dark."
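The effect of a block of one-half entries can be checked directly. The sketch below is illustrative only (the matrix `A_dark` is a toy stand-in, not the paper's actual A matrix): when every context produces each cue with equal probability, Bayes' rule leaves beliefs unchanged, so sampling from that location yields zero information gain.

```python
import numpy as np

# Hypothetical likelihood block P(cue | context) for one location:
# rows index cue outcomes, columns index hidden contexts. Uniform
# one-half entries make the cue completely uninformative.
A_dark = np.array([[0.5, 0.5],
                   [0.5, 0.5]])

Qs = np.array([0.5, 0.5])         # prior beliefs about the hidden context
Qo = A_dark @ Qs                  # predictive cue distribution
post = A_dark[0] * Qs / Qo[0]     # posterior beliefs after observing cue 0
# post equals Qs: the observation carries no information about the context
```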
6 For example, we do not have to worry about how the agent learns all possible configurations of the maze.