## Minimizing surprise

Active Inference is a normative framework to characterize Bayes-optimal behavior and cognition in living organisms. Its normative character is evinced in the idea that all facets of behavior and cognition in living organisms follow a unique imperative: minimizing the surprise of their sensory observations. Surprise has to be interpreted in a technical sense: it measures how much an agent’s current sensory observations differ from its preferred sensory observations—that is, those that preserve its integrity (e.g., for a fish, being in the water).

[1, p.6]

The active inference framework claims that living organisms have internal realizations of generative models which store probability distributions of:

- The organism’s preferred vital parameter values and external states, the *real-world states*. These are the states that are important for the well-being of the organism, such as the right body temperature or the availability of nourishment.
- The observations that the organism expects to make conditional on the real-world states, like the feeling of wetness when walking in the rain.

The ultimate objective of the organism is to minimize surprise emanating from observations. Surprise measures the discrepancy between the organism’s expected observations and the actual observations. Surprise means that the organism is outside its “comfort zone” of expected observations and expected states.

The organism uses its generative model to continuously generate (thus the term *generative* model) observation hypotheses and tests these hypotheses against the actual observations it receives from its body and the outer environment. When a hypothesis and the actual observation don’t match and the hypothesis is not strongly preferred, the organism changes its hypothesis. If it has a strong preference for the hypothesis to be true, it instead takes action to change the real-world state, thereby bringing the actual observations closer to the hypothesis.

Surprise is quantified in terms of the negative logarithm of the probability of an observation as \(-\log p(o)\). The more improbable an observation is according to the generative model, the larger the surprise.
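As a toy numerical illustration (the probabilities below are made up), surprise can be computed directly from an observation’s probability:

```python
import numpy as np

def surprise(p_o: float) -> float:
    """Surprise (surprisal) of an observation with probability p_o."""
    return -np.log(p_o)

# Hypothetical observation probabilities under a fish's generative model.
p_in_water = 0.99   # an expected observation
p_on_land = 0.01    # a very unexpected observation

print(surprise(p_in_water))  # small: ≈ 0.01
print(surprise(p_on_land))   # large: ≈ 4.6
```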

The organism wants to move through life with as few surprises as possible, basically taking the “path of least resistance” where resistance is defined as surprise. It is useful, as a baseline approximation, to think of an organism’s life as an attempt to minimize the integral of free energy over its lifetime. This is similar to the way nature minimizes the integral of the difference between kinetic energy and potential energy over time, obeying the principle of least action [6].

The baseline approximation can be modified by various biases such as *future bias* that puts different values on surprise in the past, present and in the future and *future discounting* beyond a certain time horizon. It is also subject to metabolic and cognitive limitations.

There is a subtle difference between *expected observations* and *preferred observations*. When the organism has made an actual observation, it measures its surprise in relation to *expected* observations. When it is planning for the future, it measures (probable) future observations in relation to *preferred* future observations; it expects things to be in a certain way but may prefer them to be in another way in the future.

Simple organisms that live in predictable environments usually need to take, and are only able to take, one of a few reflex actions to get back to an expected state. The only option a beached fish has is to flap and hope the flapping will take it back to the water. Bacteria follow the nutrient gradient. Simple organisms may not have the imagination to hold preferences, only expectations.

Intelligent animals, most notably (at least some) humans, can do more than flap or follow the nutrient gradient. They have a large repertoire of possible actions for returning to a preferred state and they may have strong preferences. Complex corrective actions may require the organism to temporarily visit some apparently unattractive states and make ugly observations in order to get to a preferred state at the end. A human may need to undergo a root canal to get back to a state without toothache. This is in line with the “minimizing integral of surprise” hypothesis suggested above.

As explained in an earlier post, surprise cannot be calculated analytically. Instead the organism uses a quantity called *variational free energy* (VFE) as a proxy (upper bound) for surprise in perceptual inference and *expected free energy* (EFE) as a proxy for (future) surprise in action inference ^{1}. Variational free energy was introduced in an earlier post. We will dig deeper into both variational free energy and expected free energy in this post.

Free energy is thus the loss function that the organism seeks to minimize at all times to stay within its homeostatic and allostatic states.

## Preference and prejudice

As explained in an earlier post about *perceptual inference*, the organism holds beliefs about the probabilities of real-world states prior to any observations. A slightly derogatory but intuitive term for these beliefs would be *prejudices*. States with high probabilities are called *expected states* or *preferred states* in AIF, depending on the situation. Expected states are shaped by the history of the species and the individual organism.

Prior beliefs are represented by a probability distribution \(p(s)\) where \(s\) represents the state. This probability distribution is often referred to simply as the *prior*. (“Adjusting one’s priors” has become a popular way of saying that one has to challenge one’s deeply held beliefs.)

Like all prejudices the prior beliefs can be true or false but they must on average be adaptive enough. Otherwise the organism would behave in a way that threatens its survival ^{2} ^{3}.

Prior beliefs modulate the organism’s inferred beliefs about the world after an observation has been made according to Bayes’ theorem:

$$p(s \mid o) = \frac{p(o \mid s)p(s)}{p(o)}$$

A sharply peaked \(p(o \mid s)\) is required to shift a strongly held prior belief into a different posterior belief. If a prior belief is held with 100% certainty then no observation, however strong, will change it.
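A minimal sketch of this update, assuming a two-state world and a made-up likelihood, shows how a fully certain prior is immune to evidence:

```python
import numpy as np

def bayes_posterior(prior, likelihood_o):
    """p(s|o) = p(o|s) p(s) / p(o), over a discrete set of states."""
    joint = likelihood_o * prior   # p(o|s) p(s) for the observed o
    return joint / joint.sum()     # normalize by the evidence p(o)

# Two states; the observation strongly favors state 1 (a sharply peaked p(o|s)).
likelihood = np.array([0.01, 0.99])

open_minded_prior = np.array([0.5, 0.5])
dogmatic_prior = np.array([1.0, 0.0])  # 100% certainty in state 0

print(bayes_posterior(open_minded_prior, likelihood))  # → [0.01 0.99]
print(bayes_posterior(dogmatic_prior, likelihood))     # → [1. 0.]: immovable
```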

Just like the organism has expectations about the *current* state of the world, it also has *preferences* about *future* states or, equivalently, about what future observations it finds desirable. When the organism decides what to do next, it runs *simulations* of future states and observations to determine which actions would most likely lead to preferred states or observations. Depending on the complexity of the organism, the simulations may be anything from determining which way the nutrient gradient points to planning a professional career.

In a stable world the preferred observations or preferred states should be the same as the expected observations and expected states respectively. The future body temperature should be in line with the historical body temperature, give or take. The fish both expects and prefers to be in water rather than on land. If this is the case, then the same generative model can guide both perceptual inference and the “simulation” in action inference.

Concretely the prior \(p(s)\) could in stable situations represent both expected states and preferred states. If this is not the case, then the organism must hold two different generative models in its “head”: one for analyzing the current state of affairs and one for planning for the future. Intuitively this seems reasonable at least for humans as we often prefer to be in a state that is different from the state we expect to be in (sometimes to the detriment of our mental health).

The jury seems to still be out regarding whether the brain defines its preferences in terms of observations or states. Both variants can be found in the literature without much motivation for one or the other. The generative model maps between states and observations, and expressions for EFE can be derived both for preferences as states and for preferences as observations ^{4}. When I write “preferred states” below, it should be read as “preferred states or observations”.

## Planning for the future

I predict myself therefore I am.

Anil Seth. Being You.

A sequence of future *actions* is in AIF called a *policy*, \(\pi\). A policy is a sequence of discrete actions ^{5} \(\pi = [a_0, a_1, \ldots a_{T-1}] = a_{0:T-1}\), each with the purpose of moving the organism to a new state.

Each policy will lead to a unique sequence of observations \(o_{1:T}\) and a sequence of associated states \(s_{1:T}\). In the following we will, for convenience and brevity, drop the subscripts in most equations and use the notations \(o\) and \(s\) for the full sequences \(o_{1:T}\) and \(s_{1:T}\) respectively.

The purpose of action inference is to, when action needs to be taken, infer an optimal *probability distribution of policies*, \(\hat q(\pi)\), that *minimizes free energy in the future*. The actual policy to employ can then be sampled from the distribution by for instance choosing the most probable policy (or if one is adventurous, a slightly less probable policy).

Since all the states and observations are in the future when evaluating policies, the policy distribution has to be inferred based on a “simulation” during which each possible policy is evaluated probabilistically with respect to whether it will minimize future surprise or not.

The policy distribution inference can be formulated as an optimization (in this case minimization) problem with *expected free energy* as the loss function.

The active inference framework doesn’t, as far as I know, say anything about where the candidate policies come from, only about how to choose the optimal policy among the candidates.

## Generative model including action

In preparation for understanding action inference, we start by making two enhancements to the generative model for perceptual inference introduced in this post.

First, we don’t just infer the state distribution at the current moment in time (\(t = 0\)) but estimate the distribution of the whole sequence of states leading up to \(t = 0\), \(s_{-T+1:0}\), based on the corresponding sequence of observations \(o_{-T+1:0}\).

We will also include the *actions* that cause the organism to go from one state to the next, \(a_{-T:-1}\). The sequence of actions is a *policy*, \(\pi\).

The above additions mean that \(p(o, s)\) becomes \(p(o, s, \pi)\). This distribution can be envisioned as a three-dimensional volume with the dimensions \(o, s\) and \(\pi\), with a probability value \(p(o, s, \pi)\) at each coordinate \((o, s, \pi)\) of the volume. This means that we can (in principle) look up in the volume, for instance, which policies co-occur with high probability with which sequences of observations and states, and which observations and states are likely to co-occur.

For any probability distribution \(p(o, s, \pi)\), it is true that:

$$p(o) = \sum_{s, \pi} p(o, s, \pi)$$

This is the marginal probability distribution of \(o\). Expressed in terms of surprise this becomes:
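A small numerical sketch of the marginalization, with a made-up joint distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical joint distribution p(o, s, π) over 4 observations,
# 3 states and 2 policies, normalized so that all entries sum to one.
p_joint = rng.random((4, 3, 2))
p_joint /= p_joint.sum()

# Marginal p(o): sum out the states and the policies.
p_o = p_joint.sum(axis=(1, 2))
surprise = -np.log(p_o)  # one surprise value per possible observation

print(p_o.sum())  # a proper marginal distribution sums to one
```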

$$- \log p(o) = - \log \sum_{s, \pi} p(o, s, \pi) = - \log \mathbb E_{q(s, \pi)} \left[\frac{p(o, s, \pi)}{q(s, \pi)}\right] \ \ \ \ (1)$$

In the last expression we have multiplied both the numerator and the denominator by the probability distribution \(q(s, \pi)\), which represents the variational posterior of perceptual inference.

Jensen’s inequality gives:

$$- \log p(o) = - \log \mathbb E_{q(s, \pi)} \left[\frac{p(o, s, \pi)}{q(s, \pi)}\right] \leq - \mathbb E_{q(s, \pi)} \left[\log \frac{p(o, s, \pi)}{q(s, \pi)}\right] = \mathcal F[q; o]$$

where \(\mathcal F\) is the variational free energy at time \(0\), introduced in an earlier post and further explained here, but now based on whole sequences and taking actions into account for inferring the distribution of states.

Formally \(\mathcal F\) is a *functional* (function of a function) of the variational distribution \(q\), parametrized by the observations \(o\) (and by \(\pi\) in the policy-conditioned form below). The variational distribution is the “variable” that is modified to find the minimum value of \(\mathcal F\) in perceptual inference.
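The bound can be checked numerically on a toy model (all distributions below are random placeholders, and policies are omitted to keep the check minimal):

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical discrete generative model p(o, s) with 4 observations
# and 3 states.
p_joint = rng.random((4, 3))
p_joint /= p_joint.sum()

o = 2                                  # the outcome actually observed
surprise = -np.log(p_joint[o].sum())   # -log p(o)

# An arbitrary (not optimized) variational distribution q(s).
q = rng.random(3)
q /= q.sum()

# F = E_q[log q(s) - log p(o, s)]
F = np.sum(q * (np.log(q) - np.log(p_joint[o])))

print(F >= surprise)  # → True for any q, by Jensen's inequality
```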

In accordance with the earlier post, for a certain policy \(\pi\) we get:

$$\mathcal F[q; o] = \mathbb{E}_{q(s, \pi)} [\log q(s, \pi) - \log p(o, s, \pi)] \Rightarrow \ \ \ \ (2)$$

$$\mathcal F[q; o, \pi] = \mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o, s \mid \pi)] = \ \ \ \ (3)$$

$$\mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o \mid s) - \log p(s \mid \pi)] =$$

$$\sum_s q(s \mid \pi) \log q(s \mid \pi) - \sum_s q(s \mid \pi) \log p(o \mid s) - \sum_s q(s \mid \pi) \log p(s \mid \pi) =$$

$$D_{KL}[q(s \mid \pi) \mid \mid p(s \mid \pi)] - \mathbb E_{q(s \mid \pi)}[\log p(o \mid s)]$$

Above we have taken into account that the likelihood \(p(o \mid s)\) does not depend on \(\pi\). It describes the sensory system and is therefore relatively stable.

Factorizing the joint distribution the other way around gives:

$$\mathcal F[q; o, \pi] = \mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(o, s \mid \pi)] =$$

$$\mathbb{E}_{q(s \mid \pi)} [\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log p(o \mid \pi)] =$$

$$\sum_s q(s \mid \pi) \log q(s \mid \pi) - \sum_s q(s \mid \pi) \log p(s \mid o, \pi) - \sum_s q(s \mid \pi) \log p(o \mid \pi) =$$

$$D_{KL}[q(s \mid \pi) \mid \mid p(s \mid o, \pi)] - \log p(o \mid \pi)$$

Since a Kullback-Leibler divergence is never negative, this shows in another way that free energy is always at least as large as the surprise (here the surprise is conditioned on \(\pi\); different policies yield different surprises).
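This decomposition, and with it the bound, can be verified numerically on a toy joint distribution (made-up numbers, one fixed policy):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical p(o, s) for one fixed policy, and an arbitrary q(s).
p_joint = rng.random((4, 3))
p_joint /= p_joint.sum()
o = 0
q = rng.random(3)
q /= q.sum()

F = np.sum(q * (np.log(q) - np.log(p_joint[o])))  # direct definition of F

p_o = p_joint[o].sum()          # evidence p(o)
p_s_post = p_joint[o] / p_o     # exact posterior p(s|o)
kl = np.sum(q * (np.log(q) - np.log(p_s_post)))

print(np.isclose(F, kl - np.log(p_o)))  # → True: F = KL + surprise
print(kl >= 0)                          # → True, hence F ≥ -log p(o)
```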

In this enhanced model the prior distribution of states in the current moment depends on the policy that was executed leading up to the current moment. If the last thing we did was to jump into the lake we would expect a wet state. Had we chickened out we would expect to stay dry. A prior informed by the last action (and potentially all actions before that) would in turn modulate the approximate posterior; it would be much more likely that we were actually (in the posterior sense) wet if we had taken that jump. The last action of the policy would precede the observation so it could inform the posterior even before any observation was made.

Since we assume that the generative model is available for the organism for calculating free energy it follows that \(p(o \mid s)\) and \(p(s \mid \pi)\) are also available for computation ^{6}.

## Inferring \(\hat q(\pi)\)

As mentioned above, the purpose of action inference is to, when action needs to be taken, infer an optimal *probability distribution of policies*, \(\hat q(\pi)\), that *minimizes free energy in the future*. The actual policy can then be sampled from the distribution by for instance choosing the most probable policy.

To move from looking into the past to looking into the future we need to make further modifications to the free energy to transform it to *expected* free energy.

First we replace \(p(o, s, \pi)\) with \(\tilde p(o, s, \pi)\). The first distribution is the generative model that is valid up until the current moment when doing perceptual inference. The second distribution is valid for the planning horizon into the future from the current moment.

The two distributions may or may not be the same. \(p(o, s, \pi)\) defines *expected* states and observations whereas \(\tilde p(o, s, \pi)\) defines *preferred* states and observations. Sometimes we may prefer something we don’t actually expect.

The second modification is based on the fact that we don’t know exactly which observations we will run into in the future, so we must work with probabilities, more concretely with a distribution of observations. This means that equation \((2)\) is transformed into an expectation over \(s, \pi\) *and* \(o\); \(\mathcal F[q; o]\), the “retrospective” free energy, is replaced by \(\mathcal G[q]\), the “prospective” free energy ^{7}.

$$\mathcal G[q] = \mathbb{E}_{q(s, o, \pi)} [\log q(s, \pi) - \log \tilde p(o, s, \pi)] = \ \ \ \ (4)$$

$$\mathbb{E}_{q(\pi)} \left[\log q(\pi) + \mathbb{E}_{q(s, o \mid \pi)} \left[ \log q(s \mid \pi) - \log \tilde p(o, s \mid \pi) \right]\right]=$$

$$\mathbb{E}_{q(\pi)} \left[\log q(\pi) + \mathcal G[q; \pi]\right] \ \ \ \ (5)$$

where

$$\mathcal G[q; \pi] = \mathbb{E}_{q(s, o \mid \pi)}\left[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi) \right]$$

Starting from \((5)\) we get:

$$\mathcal G[q] = \mathbb{E}_{q(\pi)} \left[\log q(\pi) + \mathcal G[q; \pi]\right] =$$

$$\sum_{\pi} q(\pi)\left(\log q(\pi) - \log e^{- \mathcal G[q; \pi]}\right) =$$

$$\sum_{\pi} q(\pi)\left(\log q(\pi) - \log \frac{e^{- \mathcal G[q; \pi]} \sum_{\pi'} e^{- \mathcal G[q; \pi']}}{\sum_{\pi'} e^{- \mathcal G[q; \pi']}}\right) =$$

$$\sum_{\pi} q(\pi)\left(\log q(\pi) - \log \frac{e^{- \mathcal G[q; \pi]}}{\sum_{\pi'} e^{- \mathcal G[q; \pi']}}\right) - \sum_{\pi} q(\pi) \log \sum_{\pi'} e^{- \mathcal G[q; \pi']} =$$

$$\sum_{\pi} q(\pi)\left(\log q(\pi) - \log \frac{e^{- \mathcal G[q; \pi]}}{\sum_{\pi'} e^{- \mathcal G[q; \pi']}}\right) - \log \sum_{\pi'} e^{- \mathcal G[q; \pi']} =$$

$$\sum_{\pi} q(\pi)\left(\log q(\pi) - \log \sigma (- \mathcal G[q; \pi])\right) - C$$

Remember that the objective was to find the \(\hat q(\pi)\) that minimizes \(\mathcal G[q]\). \(\sigma\) is the softmax function, often used at the output of artificial neural network classifiers. It normalizes any array of values so that they sum to one, effectively turning the values into a probability distribution.

\(C\) is a quantity that doesn’t depend on \(q(\pi)\), only on the total set of candidate policies, which remains unchanged during action inference. Since both \(q(\pi)\) and \(\sigma (- \mathcal G[q; \pi])\) are proper probability distributions we can express \(\mathcal G[q]\) as a KL-divergence:

$$\mathcal G[q] = D_{KL}\left[q(\pi) \mid \mid \sigma (- \mathcal G[q; \pi])\right] - C$$

This expression is trivially minimized if we set \(\hat q(\pi) =\sigma (- \mathcal G[q; \pi])\), making the two distributions of the divergence equal.
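A minimal sketch of this final step, with made-up EFE values for three candidate policies:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: turns any vector into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical expected free energies G[q; π] for three candidate policies.
G = np.array([2.0, 0.5, 4.0])

q_pi = softmax(-G)  # q̂(π) = σ(-G[q; π]); sums to one

print(q_pi.argmax())  # → 1: the policy with the lowest EFE is the most probable
```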

## Shapes of \(\mathcal G[q; \pi]\)

To find \(\hat q(\pi)\) we thus have to find the expected free energy \(\mathcal G[q; \pi]\) for each candidate policy.

### Preferences as a distribution of states

This section outlines a derivation of \(\mathcal G[q; \pi]\) in the case when the organism’s preferences are expressed in terms of preferred states.

$$\mathcal G[q; \pi] = \mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi)] =$$

$$\mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log p(o \mid s) - \log \tilde p(s \mid \pi)] \ \ \ \ (6)$$

\(\log p(o \mid s)\) characterizes the sensory system and therefore does not depend on \(\pi\). It is also assumed to be the same in the future as in the past. These two assertions imply that \(\tilde p(o, s \mid \pi) = p(o \mid s) \tilde p(s \mid \pi)\).

\(\tilde p(s \mid \pi)\) represents the *preferred (target) distribution of states*; we minimize \(\mathcal G[q; \pi]\) under the assumption that this distribution is true. The preferred distribution is the same regardless of policy so we can set \(\tilde p(s \mid \pi) = \tilde p(s)\). This means that \(\tilde p(o, s \mid \pi) = p(o \mid s) \tilde p(s) = \tilde p(o, s)\) [3, p.453].

Also, \(q(o, s \mid \pi) = p(o \mid s)q(s \mid \pi)\) where \(q(s \mid \pi)\) is the variational posterior inferred during the “simulation”.

Equation \((6)\) therefore becomes:

$$\mathcal G[q; \pi] = \mathbb E_{p(o \mid s)q(s \mid \pi)}[\log q(s \mid \pi) - \log p(o \mid s) - \log \tilde p(s)] =$$

$$\sum_{o, s} p(o \mid s)q(s \mid \pi)[\log q(s \mid \pi) - \log p(o \mid s) - \log \tilde p(s)] =$$

$$\sum_s \left(\sum_o p(o \mid s)q(s \mid \pi)[\log q(s \mid \pi) - \log \tilde p(s)] - \sum_o q(s \mid \pi)p(o \mid s)\log p(o \mid s)\right) =$$

$$\sum_s \left(q(s \mid \pi)[\log q(s \mid \pi) - \log \tilde p(s)]\sum_o p(o \mid s) - q(s \mid \pi)\sum_o p(o \mid s)\log p(o \mid s)\right)$$

$$\sum_o p(o \mid s) = 1 \Rightarrow$$

$$\mathcal G[q; \pi] = \sum_s \left(q(s \mid \pi)[\log q(s \mid \pi) - \log \tilde p(s)] - q(s \mid \pi)\sum_o p(o \mid s)\log p(o \mid s)\right) =$$

$$D_{KL}[q(s \mid \pi) \mid \mid \tilde p(s)] + \mathbb E_{q(s \mid \pi)}[\mathbb H[p(o \mid s)]]$$
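The first term is a divergence between predicted and preferred states and the second an expected entropy. A toy numerical sketch, where the likelihood and all distributions are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

n_s, n_o = 3, 4

# Hypothetical likelihood p(o|s): each column is a distribution over observations.
A = rng.random((n_o, n_s))
A /= A.sum(axis=0)

q_s = rng.random(n_s); q_s /= q_s.sum()           # predicted q(s|π) under a policy
p_pref = rng.random(n_s); p_pref /= p_pref.sum()  # preferred states p̃(s)

risk = np.sum(q_s * (np.log(q_s) - np.log(p_pref)))  # D_KL[q(s|π) || p̃(s)]
H = -np.sum(A * np.log(A), axis=0)                   # entropy H[p(o|s)] per state
ambiguity = np.sum(q_s * H)                          # E_q(s|π)[H[p(o|s)]]

G = risk + ambiguity
print(risk >= 0, ambiguity >= 0)  # both terms are nonnegative
```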

### Preferences as a distribution of observations

This section outlines a derivation of \(\mathcal G[q; \pi]\) in the case when the organism’s preferences are expressed in terms of preferred observations.

$$\mathcal G[q; \pi] = \mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi)] =$$

$$\mathbb E_{q(o, s \mid \pi)}[\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log \tilde p(o)] =$$

$$\sum_{o, s} q(o, s \mid \pi)[\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log \tilde p(o)]$$

Bayes’ theorem gives:

$$p(s \mid o, \pi) = \frac{p(o \mid s, \pi)p(s \mid \pi)}{p(o \mid \pi)}$$

Also, as stated above, the likelihood \(p(o \mid s)\) doesn’t depend on \(\pi\). We get:

$$p(s \mid o, \pi) = \frac{p(o \mid s)p(s \mid \pi)}{p(o \mid \pi)}$$

If we additionally assume that the variational prior over states matches the generative one, \(q(s \mid \pi) \approx p(s \mid \pi)\) (so that \(q(o \mid \pi) \approx p(o \mid \pi)\)), then \(\log q(s \mid \pi) - \log p(s \mid o, \pi) \approx \log q(o \mid \pi) - \log p(o \mid s)\), and:

$$\mathcal G[q; \pi] = \sum_{o, s} q(o, s \mid \pi)[\log q(o \mid \pi) - \log p(o \mid s) - \log \tilde p(o)] =$$

$$\sum_{o, s} q(o, s \mid \pi)[\log q(o \mid \pi) - \log \tilde p(o)] - \sum_{o, s} q(o, s \mid \pi)[\log p(o \mid s)]$$

$$ q(o, s \mid \pi) = q(o \mid \pi)p(s \mid o, \pi) \Rightarrow$$

$$\mathcal G[q; \pi] = \sum_o \left(\sum_s p(s \mid o, \pi)q(o \mid \pi) [\log q(o \mid \pi) - \log \tilde p(o)]\right) - \sum_s \left(\sum_o p(o \mid s)q(s \mid \pi) \log p(o \mid s)\right) =$$

$$\sum_o \left(q(o \mid \pi) [\log q(o \mid \pi) - \log \tilde p(o)] \sum_s p(s \mid o, \pi)\right) - \sum_s q(s \mid \pi) \left(\sum_o p(o \mid s) \log p(o \mid s)\right)$$

$$ \sum_s p(s \mid o, \pi) = 1 \Rightarrow$$

$$\mathcal G[q; \pi] = D_{KL}[q(o \mid \pi) \mid \mid \tilde p(o)] + \mathbb E_{q(s \mid \pi)} \mathbb H[p(o \mid s)] $$

Note that the two expressions for \(\mathcal G[q; \pi]\) don’t necessarily yield exactly the same answer as they start from different premises. It can in fact be proven that [1, p.251]:

$$D_{KL}[q(s \mid \pi) \mid \mid \tilde p(s)] + \mathbb E_{q(s \mid \pi)}[\mathbb H[p(o \mid s)]] \geq$$

$$D_{KL}[q(o \mid \pi) \mid \mid \tilde p(o)] + \mathbb E_{q(s \mid \pi)} \mathbb H[p(o \mid s)]$$
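The inequality can be checked numerically, under the assumption (made here only for the sketch) that the preferred observation distribution \(\tilde p(o)\) is induced from \(\tilde p(s)\) through the same likelihood:

```python
import numpy as np

rng = np.random.default_rng(4)

n_s, n_o = 3, 4
A = rng.random((n_o, n_s)); A /= A.sum(axis=0)          # likelihood p(o|s)
q_s = rng.random(n_s); q_s /= q_s.sum()                 # q(s|π)
p_pref_s = rng.random(n_s); p_pref_s /= p_pref_s.sum()  # preferred states p̃(s)

q_o = A @ q_s            # predicted observations q(o|π)
p_pref_o = A @ p_pref_s  # preferred observations induced by the same likelihood

kl_states = np.sum(q_s * (np.log(q_s) - np.log(p_pref_s)))
kl_obs = np.sum(q_o * (np.log(q_o) - np.log(p_pref_o)))

# Pushing both distributions through the same channel p(o|s) can only
# shrink the KL divergence (the data-processing inequality); the ambiguity
# terms on both sides are identical and cancel out of the comparison.
print(kl_states >= kl_obs)  # → True
```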

## Whence \(\mathcal G[q; \pi]\)

We are not done yet though. We still need to calculate each \(\mathcal G[q; \pi]\). Remember that \(o = o_{1:T}\) and \(s = s_{1:T}\) are *sequences* of observations and corresponding states. To simplify the inference of \(\mathcal G[q; \pi]\) we will make a couple of assumptions [3, p.453]:

$$q(s_{1:T} | \pi) \approx \prod_{t=1}^{T} q(s_t | \pi)$$

And

$$p(o_{1:T}, s_{1:T} | \pi) \approx \prod_{t=1}^{T} p(o_t, s_t | \pi)$$

These simplifications rely on a *mean field approximation* which assumes that all dependencies between the observations and states in a sequence are captured by the parameter \(\pi\) so that the distribution can be expressed as a product of independent distributions.

$$\mathcal G[q; \pi] = \mathbb{E}_{q(s, o \mid \pi)} \left[\log q(s \mid \pi) - \log \tilde p(o, s \mid \pi)\right] =$$

$$\mathbb{E}_{q(s, o \mid \pi)} \left[\sum_{t=1}^T \left(\log q(s_t \mid \pi) - \log \tilde p(o_t, s_t \mid \pi)\right) \right] =$$

$$\sum_{t=1}^T \mathbb{E}_{q(s, o \mid \pi)} \left[\log q(s_t \mid \pi) - \log \tilde p(o_t, s_t \mid \pi) \right] = $$

$$\sum_{t=1}^T \mathcal G[q; \pi, t]$$

where

$$\mathcal G[q; \pi, t] = \mathbb{E}_{q(s_t, o_t \mid \pi)} \left[\log q(s_t \mid \pi) - \log \tilde p(o_t, s_t \mid \pi)\right]$$

We can thus find \(\mathcal G[q; \pi]\) by calculating \(\mathcal G[q; \pi, t]\) for each time step in the planning horizon and taking the sum of all values.
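Putting the pieces together, a toy end-to-end sketch of action inference in the preferences-as-states form (everything here is a random placeholder; a real model would roll a transition model forward to obtain \(q(s_t \mid \pi)\)):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy dimensions (all numbers hypothetical): 3 states, 4 observations,
# 2 candidate policies, planning horizon T = 3.
n_s, n_o, n_pi, T = 3, 4, 2, 3

A = rng.random((n_o, n_s)); A /= A.sum(axis=0)    # likelihood p(o|s)
p_pref = rng.random(n_s); p_pref /= p_pref.sum()  # preferred states p̃(s)
H = -np.sum(A * np.log(A), axis=0)                # ambiguity H[p(o|s)] per state

# Predicted q(s_t|π) for each policy and time step (random placeholders).
q = rng.random((n_pi, T, n_s))
q /= q.sum(axis=2, keepdims=True)

def efe(q_states):
    """G[q; π] = Σ_t (risk_t + ambiguity_t)."""
    return sum(np.sum(q_t * (np.log(q_t) - np.log(p_pref))) + np.sum(q_t * H)
               for q_t in q_states)

G = np.array([efe(q[i]) for i in range(n_pi)])
q_policy = np.exp(-G) / np.exp(-G).sum()  # q̂(π) = σ(-G[q; π])
print(q_policy)  # the lower-EFE policy gets the higher probability
```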

## Links

[1] Thomas Parr, Giovanni Pezzulo, Karl J. Friston. Active Inference. MIT Press Direct.

[2] Ryan Smith, Karl J. Friston, Christopher J. Whyte. A step-by-step tutorial on active inference and its application to empirical data. Journal of Mathematical Psychology. Volume 107. 2022.

[3] Beren Millidge, Alexander Tschantz, Christopher L. Buckley. Whence the Expected Free Energy. Neural Computation 33, 447–482 (2021).

[4] Stephen Francis Mann, Ross Pain, Michael D. Kirchhoff. Free energy: a user’s guide. Biology & Philosophy (2022) 37: 33.

[5] Carol Tavris on mistakes, justification, and cognitive dissonance. Sean Carroll’s Mindscape: science, society, philosophy, culture, arts, and ideas. Podcast (Spotify link).

[6] The principle of least action. Feynman Lectures.

1. The term *active inference* is used to describe both the whole framework and the action-oriented part of the framework. I find this confusing. Active *inference* seems to refer to the fact that the policy (sequence of actions) can be *inferred* through an inference algorithm resembling the one used to infer the posterior in perceptual inference. To avoid overloading the term active inference I will call the action-oriented part of active inference *action inference* for now. ↩︎
2. An interesting question is how off one’s priors can be before they become detrimental. Looking at today’s rampant political lies, conspiracy theories and extreme religious beliefs, quite a lot it seems. ↩︎
3. Humans sometimes end up in maladaptive or even self-destructive states, so their preferences can be poorly calibrated. ↩︎
4. Intuitively (and speculatively) I believe I think in terms of states when I plan actions consciously, like when I (a long time ago) planned my education. I wanted to end up in a state of competence and knowledge, which is a very abstract state that is not easy to characterize with an observation. When I jump my horse, I, on the other hand, see myself on the other side of the fence after the jump (as opposed to on the ground in front of the fence). A physical and to a degree unconscious preference like that may be more observation-based. There is most likely a hierarchy of representations from, e.g., raw retinal input to abstract states, so maybe the difference between states and observations is not so clear-cut. ↩︎
5. It is possible to build a continuous model of active inference. I start with the discrete variant as it is somewhat more intuitive. ↩︎
6. It is tempting to use the term “known quantity” but that might lead one to believe that the generative model is cognitively known to the organism, which is probably not the case. It suffices that the organism can use the generative model for its unconscious inference of the approximate posterior. ↩︎
7. We can derive \(\mathcal G[q]\) from \(\mathcal F[q; o]\) by taking the expectation of \(\mathcal F[q; o]\) over \(q(o \mid s, \pi) = p(o \mid s)\). (We cannot take the expectation over \(q(o \mid \pi)\) as this would ignore the fact that the observations and the states are correlated.) \(\mathcal G[q] = \mathbb{E}_{p(o \mid s)q(s, \pi)} [\log q(s, \pi) - \log p(o, s, \pi)]\). ↩︎