Cognitive Neuroscience
Current Debates, Research & Reports
Volume 6, 2015 - Issue 4
Discussion Paper

Active inference and epistemic value

Pages 187-214 | Received 23 Oct 2014, Published online: 13 Mar 2015
 

Abstract

We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximizing information gain or intrinsic value (or reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms.
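The decomposition described in the abstract can be illustrated with a minimal sketch for a single time step and discrete states. It assumes that the quality of a policy is the sum of an extrinsic term (the expected log prior preference over outcomes) and an epistemic term (the expected information gain about hidden states, computed here as the mutual information between states and outcomes). The names `policy_quality`, `A_cue`, `A_dark`, `log_C`, and the fixed precision `gamma` are hypothetical choices for illustration; in the article the precision is itself inferred rather than fixed, and this sketch is not the SPM implementation.

```python
import numpy as np

EPS = 1e-16  # guards against log(0) and division by zero


def policy_quality(A, qs, log_C):
    """Return (extrinsic, epistemic) value for one-step predictive beliefs.

    A     : likelihood matrix, A[o, s] = P(o | s); each column sums to 1
    qs    : predicted distribution over hidden states under a policy
    log_C : log prior preferences over outcomes
    """
    qo = A @ qs                          # predicted outcome distribution
    extrinsic = float(qo @ log_C)        # expected log preference (utility)
    joint = A * qs                       # joint[o, s] = P(o | s) * Q(s)
    # Mutual information between states and outcomes, i.e. the expected
    # divergence between posterior and prior beliefs about states.
    ratio = joint / (np.outer(qo, qs) + EPS)
    epistemic = float(np.sum(joint * np.log(ratio + EPS)))
    return extrinsic, epistemic


def softmax(x):
    z = np.exp(x - np.max(x))            # subtract the max for stability
    return z / z.sum()


# Hypothetical two-state example with flat preferences over two outcomes.
qs = np.array([0.5, 0.5])                # uncertain about the hidden state
log_C = np.log(np.array([0.5, 0.5]))     # no outcome is preferred
A_cue = np.eye(2)                        # policy 1: outcome reveals the state
A_dark = np.full((2, 2), 0.5)            # policy 2: outcome is uninformative

values = [policy_quality(A, qs, log_C) for A in (A_cue, A_dark)]
quality = np.array([ext + epi for ext, epi in values])

gamma = 1.0                              # assumed fixed precision
policy_probs = softmax(gamma * quality)  # beliefs about policies
```

With flat preferences the two policies have the same extrinsic value, so the informative policy is favored purely because of its epistemic value (about ln 2 nats here, against roughly zero for the uninformative one). Once beliefs about the state become precise, the epistemic term vanishes for both policies and only preferences distinguish them.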

View correction statement:
Corrigendum

Notes

1 Variational free energy was introduced by Richard Feynman to solve inference problems in quantum mechanics and can be regarded as a generalization of thermodynamic free energy. In this paper, free energy refers to variational free energy. We will see later that minimizing free energy (or maximizing negative free energy) corresponds to maximizing expected value.

2 Note the dialectic between minimizing the entropy expected in the future and maximizing the entropy of current beliefs, which is implicit in minimizing free energy; see Friston et al. (2012), “Perceptions as hypotheses: Saccades as experiments,” Front Psychol. 3: 151.

3 For readers interested in technical details, the simulations (and figures) reported in this paper can be reproduced by downloading the academic freeware SPM. Annotated Matlab scripts can then be accessed through a graphical user interface (invoked by typing DEM and selecting “epistemic value”). Please visit http://www.fil.ion.ucl.ac.uk/spm/software/

4 This is a fairly subtle assertion that lies at the heart of active inference. Put simply, agents will adjust their expectations to minimize the free energy associated with any given observations. However, when the agent actively samples observations, it has the opportunity to choose observations that minimize free energy, an opportunity that is only realized when the agent believes this is how it behaves. A more formal proof by reductio ad absurdum, which appeals to random dynamical systems, can be found in Friston and Mathys (2015), “I think therefore I am,” Cognitive Dynamic Systems, S. Haykin, IEEE Press, in press. In brief, to exist, an ergodic system must place an upper bound on the entropy of its states, where entropy is the long-term average of surprise. Therefore, any system that does not (believe it will) minimize the long-term average of surprise does not (believe it will) exist.

5 The values of one half in the first block of the A matrix mean that the agent cannot predict the cue from that location. In other words, there is no precise sensory information and the agent is “in the dark.”

6 For example, we do not have to worry about how the agent learns all possible configurations of the maze.
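The point made in Note 5 can be checked with a short sketch, assuming a discrete likelihood matrix with entries A[o, s] = P(o | s) and a simple Bayesian update. The function `posterior_over_states` and the matrices `A_dark` and `A_cue` are hypothetical and are not taken from the article's simulations; the 0.9/0.1 values are arbitrary illustrative numbers.

```python
import numpy as np


def posterior_over_states(A, prior, o):
    """Bayes rule for a discrete likelihood A[o, s] = P(o | s)."""
    unnormalised = A[o, :] * prior
    return unnormalised / unnormalised.sum()


prior = np.array([0.5, 0.5])       # hypothetical prior over two contexts
A_dark = np.full((2, 2), 0.5)      # every entry one half: outcomes uninformative
A_cue = np.array([[0.9, 0.1],
                  [0.1, 0.9]])     # hypothetical informative location

post_dark = posterior_over_states(A_dark, prior, o=0)  # beliefs do not move
post_cue = posterior_over_states(A_cue, prior, o=0)    # beliefs sharpen
```

When every likelihood entry is one half, the observation leaves the posterior equal to the prior, which is the sense in which the agent is “in the dark” at that location.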

Additional information

Funding

The Wellcome Trust funded this work. KJF is funded by the Wellcome Trust [088130/Z/09/Z]. DO is funded by the European Community’s Seventh Framework Programme [FP7/2007-2013] project DARWIN [Grant No: FP7-270138]. GP is funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) project Goal-Leaders (Grant No: FP7-ICT-270108) and the HFSP (Grant No: RGY0088/2014).
