While the goal of perception is to infer a representation of a real-world state from observations, the AIF literature posits that the goal of all actions an agent plans and takes is to minimize surprise. One definition of surprise is how much an agent’s current sensory observations differ from its preferred sensory observations [1, p.6]. Taking actions to minimize surprise secures the allostasis and integrity of the organism; it keeps the organism alive.
Agents have varying complexity with respect to their ability to minimize surprise. Simple organisms live in the present, constantly following a “surprise gradient”, e.g., the concentration of nutrients, while more complex organisms have the ability to run simulations of different strategies (“policies”) in their heads before choosing one that minimizes the surprise over a longer period of time. Complex organisms can therefore perform actions of varying complexity ranging from keeping the right body temperature to getting an education.
In this post I try to, with less than stellar success it turns out, unpack how surprise is represented in the formalism of AIF.
The dimly said
While I’m intrigued by the active inference framework, reading the textbook [1] and papers on the topic is an intellectual, epistemological, and ontological challenge.
Some questions that I have so far failed to find definite answers to in the literature are:
- Are preferences expressed in terms of observations or states?
- Which probability distribution in the framework represents the preferences?
- Is “expected” equal to “preferred”?
Two possible but somewhat contradictory explanations are presented in [1] and [2], which are so far my two main sources for this odyssey into the active inference framework. Below I will try to describe both explanations. I’m afraid this post turns into a critique of the AIF literature rather than something very illuminating.
First explanation
The first explanation comes from [1]:
This speaks to the fact that in Active Inference, the notion of marginal probabilities or surprise (e.g., about body temperature) has a meaning that goes beyond standard Bayesian treatments to absorb notions like homeostatic and allostatic set-points. Technically, Active Inference agents come equipped with models that assign high marginal probabilities to the states they prefer to visit or the observations they prefer to obtain. For a fish, this means a high marginal likelihood for being in water.
[1, p.26]. My emphasis.
Marginal probability is introduced like this:
The first, which we refer to simply as surprise, is the negative log evidence, where evidence is the marginal probability of observations.
[1, p.18]. My emphasis. The excerpt is from a paragraph explaining the two concepts of surprise and Bayesian surprise.
The first quote above explicitly indicates that a preferred body temperature is somehow encoded into the marginal probability, so that “surprising” body temperatures can be detected from it. It also explicitly states that agents come equipped with models “that assign high marginal probabilities to the states they prefer to visit or the observations they prefer to obtain”. This seems to indicate that we can express preferences either in terms of states or in terms of observations.
With \(p(o) = \sum_i p(o \mid s_i)p(s_i)\) (which is a marginalization over the states) there are only two conceivable ways to encode preferences (like the fish’s preference for water) into \(p(o)\). We can either encode them in the likelihoods \(p(o \mid s_i)\) or in the priors \(p(s_i)\).
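To make this concrete for myself, here is a minimal numerical sketch of the marginalization and the resulting surprise in a toy “fish” world. The state labels, the numbers, and the choice to put the fish’s preference for water into the prior are my own illustration, not something taken from [1] or [2].

```python
import numpy as np

# Toy "fish" model: two hidden states and two observations.
states = ["in_water", "on_land"]
observations = ["wet", "dry"]

# Likelihoods p(o|s): rows follow `states`, columns follow `observations`.
likelihood = np.array([
    [0.95, 0.05],  # in_water -> mostly "wet"
    [0.05, 0.95],  # on_land  -> mostly "dry"
])

# Prior p(s): a fish whose model strongly "expects" to be in water.
prior = np.array([0.99, 0.01])

# Evidence (marginal probability of observations): p(o) = sum_i p(o|s_i) p(s_i)
evidence = prior @ likelihood

# Surprise: negative log evidence, -log p(o)
surprise = -np.log(evidence)

for o, p_o, s in zip(observations, evidence, surprise):
    print(f"p({o}) = {p_o:.3f}, surprise = {s:.2f}")
# p(wet) = 0.941, surprise = 0.06
# p(dry) = 0.059, surprise = 2.83
# With the preference for water encoded in the prior, a "dry"
# observation is highly surprising -- the reading of [1] above.
```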
The likelihoods map states to observations and are probably hard-coded at several levels of the brain’s perceptual apparatus. We will not easily change our basic idea of what a cat looks like.
The more likely candidates are the priors. They already encode what we expect, and they enter as “weights” in the evidence. This is supported by the following quote:
… action minimizes free energy (and surprise) by changing the world to make it more compatible with your beliefs and goals.
[1, p.9].
This seems to relate surprise not to observations but to beliefs, which are associated with states according to the following:
In general, the concept of a ‘state’ is abstract and can refer to anything one might have a belief about.
[2, p.3]. My italics.
Note also that the quote somehow seems to equate beliefs and goals, the “is” and the “ought”.
Second explanation
The AIF tutorial [2] seems to contradict the account in [1] when introducing action planning:
Therefore, we will instead represent this distribution as \(\log p(o \mid \mathbb{C})\), where the variable \(\mathbb{C}\) denotes the agent’s preferences (Parr, Pezzulo, & Friston, 2022). In this distribution, observations with higher probabilities are treated as more rewarding. Note that this is distinct from priors over states, \(p(s)\), which encode beliefs about the true states of the world (i.e., irrespective of what is preferred).
[2, p.6]. My emphasis and LaTeX notation.
Here \(p(s)\) is explicitly said not to encode any preferences (only “beliefs”). Instead, preferences are assumed to be explicitly encoded by the quantity \(\mathbb{C}\). The tutorial [2] refers to this quantity only as something that “encodes preferences”, while leaving its ontological status (class) undefined.
That \(p(s)\) encodes beliefs also requires some explanation. My hypothesis is that a “belief” is a mental state that in some meaningful way represents a real-world state.
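To illustrate how I read this second explanation, here is a small sketch where the preferences live in a separate distribution over observations rather than in the prior over states. The variable names and numbers are my own, loosely following the notation of [2].

```python
import numpy as np

# Observations as in the earlier sketch: ["wet", "dry"].

# p(s): beliefs about which state the world is actually in,
# with no preference baked in (the second explanation).
prior_over_states = np.array([0.5, 0.5])

# p(o|C): the observations the agent would prefer to obtain.
preferred_obs = np.array([0.99, 0.01])  # strongly prefers "wet"

# log p(o|C): the quantity used later when scoring policies;
# dispreferred observations get large negative values.
log_C = np.log(preferred_obs)
print(log_C)  # approximately [-0.01, -4.61]
```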
Later the textbook [1] chimes in saying about \(p(o \mid \mathbb{C})\):
Note that pragmatic value [ \(\log p(o \mid \mathbb{C})\) ] emerges as a prior belief about observations, where the C-parameter includes preferences. The (potentially unintuitive) link between prior beliefs and preferences is unpacked in chapter 7; for now, we note that this term can be treated as an expected utility or value, under the assumption that valuable outcomes are the kinds of outcomes that characterize each agent (e.g., a body temperature of 37°C).
[1, p.34].
Here too, \(\mathbb{C}\) is said to represent “preferences”, and \(\log p(o \mid \mathbb{C})\) is said to represent a “prior belief”. Yet [2] posits that what we have beliefs about are states and (presumably) not observations, provided that states and observations are indeed different concepts.
The quote seems to put the preferred body temperature into \(\mathbb{C}\) rather than into \(p(s)\).
The distribution \(p(o \mid \mathbb{C})\) is introduced as an input to policy evaluation, i.e., the planning of actions, so it is by nature forward-looking. It is never mentioned in [1] in conjunction with perceptual inference and is thus not suggested to play a role in the actual appraisal of surprise, only in its forward-looking, “vicarious” appraisal.
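As far as I can tell, this is how the pragmatic, forward-looking part of policy evaluation is computed in practice: the observations predicted under each candidate policy are scored against the log preferences \(\log p(o \mid \mathbb{C})\). The policy names and numbers below are my own toy continuation of the fish example, a sketch of the conventions in [2] rather than a verbatim reproduction.

```python
import numpy as np

# Predicted observation distributions q(o|pi) for two candidate
# policies, over the observations ["wet", "dry"]. Policy names and
# numbers are my own illustration.
q_o = {
    "stay_in_water": np.array([0.95, 0.05]),
    "jump_on_land":  np.array([0.10, 0.90]),
}

# Preferences over observations, log p(o|C), as before.
log_C = np.log(np.array([0.99, 0.01]))

def pragmatic_value(q_o_pi, log_C):
    """E_{q(o|pi)}[log p(o|C)]: expected log preference of the
    observations a policy is predicted to produce (higher is better)."""
    return float(q_o_pi @ log_C)

for name, dist in q_o.items():
    print(name, round(pragmatic_value(dist, log_C), 2))
# stay_in_water ≈ -0.24, jump_on_land ≈ -4.15: the policy expected
# to yield preferred observations scores much better.
```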
To throw in some more ambiguity, in chapter 3 the textbook [1] states:
If one defines these preferred states as expected states, then one can say that living organisms must minimize the surprise of their sensory observations
[1, p.59].
This suggests that preferences are defined in terms of states rather than observations. Unless “states” here in fact refers to observations, but that would imply a truly confused ontology, since the same quote also mentions “sensory observations”. The quote also equates preferred states with expected states, which makes poor intuitive sense. What I expect is not always what I prefer; the “is” is not necessarily the “ought”.
Residual confusion
Exactly how preferred observations or states, and thus surprise, are defined in AIF has been difficult for me to understand ever since I started studying the topic. As illustrated above, the literature is neither transparent nor consistent on this point. A big part of the problem is that the literature doesn’t use a well-defined ontology and that there seem to be unstated assumptions lurking in the background.
Reading sources other than the two quoted above leads me to think that one can define preferences either as a distribution over preferred observations or as a distribution over preferred states. This would also seem logical, since there is a clear mathematical relationship between observations and states.
The hypothesis I’m going to evaluate going forward is that expected states and observations are encoded in the generative model as priors over states, whereas preferences are actually encoded outside of the generative model and only used when evaluating future actions. This seems somewhat consistent with the otherwise strange move in [2, p.10], where equation 12 starts off a derivation with what looks like the generative model \(p(o, s \mid \pi)\) 1 and then, in its third line, replaces the evidence of the generative model with \(p(o \mid \mathbb{C})\).
An alternative would be that there are two different models, generative or otherwise: one used for perceptual inference and one used to express preferences; an “is” model and an “ought” model. If so, the derivation referred to above jumps from one model to the other without much explanation other than that “this is a central move within active inference”. I understand that I should be humble as a beginner in this area, but this “move” looks to me like a category error.
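For reference, this is my reconstruction of the kind of step I’m referring to, in simplified notation; it is not a verbatim copy of equation 12 in [2]:

\[
\begin{aligned}
G(\pi) &= \mathbb{E}_{q(o, s \mid \pi)}\big[\log q(s \mid \pi) - \log p(o, s \mid \pi)\big] \\
&= \mathbb{E}_{q(o, s \mid \pi)}\big[\log q(s \mid \pi) - \log p(s \mid o, \pi) - \log p(o \mid \pi)\big] \\
&\approx \mathbb{E}_{q(o, s \mid \pi)}\big[\log q(s \mid \pi) - \log p(s \mid o, \pi)\big] - \mathbb{E}_{q(o \mid \pi)}\big[\log p(o \mid \mathbb{C})\big]
\end{aligned}
\]

The last line is the “move”: the evidence \(p(o \mid \pi)\) of the generative model is swapped for the preference distribution \(p(o \mid \mathbb{C})\), which turns a statement about what is expected into one about what is preferred.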
To be continued…
Links
[1] Thomas Parr, Giovanni Pezzulo, Karl J. Friston. Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press, 2022.
[2] Ryan Smith, Karl J. Friston, Christopher J. Whyte. A step-by-step tutorial on active inference and its application to empirical data. Journal of Mathematical Psychology. Volume 107. 2022.
[3] Beren Millidge, Alexander Tschantz, Christopher L. Buckley. Whence the Expected Free Energy. Neural Computation 33, 447–482 (2021).
- [3, p.453] doesn’t have the dependence on \(\pi\) in the corresponding equation. It seems to me that \(\pi\) does modify \(p(s)\), the states we expect to end up in depending on our action, which in turn modifies the distribution \(p(s, o)\). If you jump off the diving tower you will expect very different states from what you would expect if you have second thoughts and climb back down. ↩︎