Conditionalization, A New Argument For
Bas C. van Fraassen
Probabilism in epistemology does not have to be of the Bayesian variety. The probabilist represents a person's opinion as a probability function; the Bayesian adds that rational change of opinion must take the form of conditionalizing on new evidence. I will argue that this is the correct procedure under certain special conditions. Those special conditions are important, and instantiated for example in scientific experimentation, but hardly universal. My argument will be related to the much-maligned Reflection Principle (van Fraassen 1984, 1995), and partly inspired by the work of Brian Skyrms (1987).
1. Modeling opinion: how we change our minds
On a very simple-minded view of opinion, we have at each moment a set of full beliefs, which we update by adding our new evidence. Both the prior beliefs and the new evidence are propositions, the latter coming in a steady stream. The set of full beliefs in question constitutes our explicit opinion; whatever follows from them logically constitutes our implicit opinion. There is a second operation besides adding evidence which is more a matter of book-keeping: the process of transferring some beliefs from the implicit column to the explicit column -- i.e. enlarging the set of explicit full beliefs by means of purely logical deduction. This operation we can without loss of generality think of as simple repeated application of Modus Ponens.
Not much less simple-minded is the core model of the orthodox Bayesians. Our opinion consists of a probability function; new evidence consists of propositions to which we give probability 1. Updating can no longer be simply adding, but there is a Modus-Ponens-like operation, Conditionalization, which counts as purely logical updating. If P is my prior opinion, and E my new evidence with P(E) > 0, then P' = P( |E), defined as P( &E)/P(E), is my new (posterior) opinion -- the change of P into P' is called conditionalization on E. (I will leave aside how the distinction between explicit and implicit opinion can also be added in here.)
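A minimal sketch of conditionalization, assuming (beyond the text) that opinion is defined on a finite set of possible worlds, with a probability function given as a dict from worlds to weights and a proposition given as a set of worlds. The helper names `prob` and `conditionalize` are hypothetical; exact fractions avoid rounding noise.

```python
from fractions import Fraction

def prob(P, E):
    """P(E): the total weight P assigns to the worlds in proposition E."""
    return sum((P[w] for w in E), Fraction(0))

def conditionalize(P, E):
    """Return P( |E), i.e. X -> P(X&E)/P(E), as weights on worlds."""
    pE = prob(P, E)
    if pE == 0:
        raise ValueError("P(E) = 0: conditionalization on E is undefined")
    # Worlds outside E drop to 0; worlds inside E are rescaled by 1/P(E).
    return {w: (P[w] / pE if w in E else Fraction(0)) for w in P}

# Hypothetical prior over three worlds, and new evidence E = {a, b}.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
E = {"a", "b"}
P_post = conditionalize(P, E)
print(P_post["a"], P_post["b"], P_post["c"])  # 2/3 1/3 0
print(prob(P_post, E))                         # 1: the evidence now has probability 1
```

The posterior keeps the prior's ratios among worlds inside E and gives E probability 1, which is all the definition P( &E)/P(E) requires.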
There are many criticisms of these simple-minded views. But there are also supporting arguments. A central sort of supporting argument attempts to establish this:
Conditionalization is the only admissible form of updating if the second model is assumed correct in other respects.
How shall we view those arguments? I view them as follows: there are special conditions under which they lead to the correct conclusion. In these circumstances it is accurate to represent a certain aspect of someone's prior opinion, and of the way he or she takes in new evidence, in the above manner. In addition, these special conditions also entail that on this occasion, the updating of the prior opinion must be by conditionalization on that new evidence.
The emphasis is on "special". Of course, some aspects of those conditions may be quite general aspects of the human condition, and the rest may be characteristic of a specially important sort of situation. In fact, one instance of this special sort of situation has, I think, tacitly guided certain branches of modern epistemology: the well-designed scientific experiment or controlled observation.
What I shall do here is add another argument for Conditionalization, but one which is explicitly premised on that sort of situation, and on a certain view of what is crucial to it. Some of what is crucial to it is, I think, also crucial to all rational management of opinion -- some, but not all (van Fraassen, 1989, Chapter Seven).
2. A controlled epistemic situation
Imagine that we are waiting to observe the outcome of some process and that we are quite sure that we know what the possible outcomes are. Imagine in addition that we know exactly how we shall update our current prior opinion P in response to this outcome. Finally, imagine that the resulting posterior opinions P' are (in this context) uniquely characterized by the outcome which they take to be the actual one, and to which they accordingly give probability 1. In the case of an experiment, for example, the i-th posterior opinion is the one that gives probability 1 to the i-th possible outcome of the experiment.
What I call here our prior and posterior opinion are really a small part of our opinion, restricted to propositions about the process in question, and perhaps some hypotheses which we are testing in this situation. I have now included some clauses assuming that we currently have opinions about our own opinions (notably, what our future opinions can be at the end of the process). But while these remarks are an important part of the characterization of this situation (as I see it), they will not have any role in our central argument. So the propositions in the domain of P are restricted to ones solely about the process observed and hypotheses about it, excluding propositions about the observer. I make these initial suppositions somewhat more precise as follows:
There is a partition {Bi: i in I} such that the possible posteriors form a set {P'i: i in I} with P'i(Bi) = 1.
By a partition I mean an exhaustive set of mutually exclusive propositions. Again, these propositions could plausibly include the distinctive information that the experiment had its i-th possible outcome. Note then that these posteriors are orthogonal to each other: each gives 1 to a proposition to which all the others give 0.
I will omit the annoying "i in I" whenever only one index set is being used; I will for the time being assume that all index sets are finite.
3. An epistemic principle
We must now discuss how the posteriors in this situation are to be related to the prior. This cannot be a discussion of how things actually happen in the world, since some experimenters are morons, some take drugs, and some suffer strokes while observing and reasoning. But we can ask how the posteriors are to be related to the prior in a case of unobstructed rational management of opinion -- the sort of thing that I hope goes on when I am trying to balance my bank book and when Millikan measures the charge of the electron, for example.
Here the informal remarks about the observer's own prevision of how he will update his opinion play the central motivating role. In the case of rational management, taken to be unobstructed -- I leave other cases aside for now -- this prevision can only derive from a conscious commitment to certain policies or intentions that characterize this observer's view of how that is to be done responsibly. We may not be able to restrict these policies very far a priori. But I have argued on earlier occasions (1984, 1995) that they must meet a certain minimal condition:
General Reflection Principle: the prior opinion must fall within the range spanned by the foreseen possible posterior opinions.
Just a brief aside: the Reflection Principle itself follows from this General Reflection Principle only if we assume that our future opinion is itself represented by a random variable -- that the language includes resources for describing and attributing future opinion to oneself. That is not being assumed here, so we shall be relying on something weaker than the original Reflection Principle.
Under opinion I include here the subjective probabilities assigned to propositions, but also (following the historical precedent already set by Pascal, Fermat, and Huyghens) the expectation values for measurable quantities ('random variables', rv) deriving from those subjective probabilities. I will restrict discussion here to simple random variables: quantities with only a finite range of possible values. A simple rv is thus a quantity f for which there is a partition {Fj: j in J}, with J a finite set of numbers, such that f takes value j if and only if Fj is the case. The expectation value of f depends on the probability function P; it is the sum of the terms jP(Fj) for j in J, and I shall denote it ExpP(f). Thus the General Reflection Principle takes the more precise form:
General Reflection Principle: for each rv f, the prior expectation value ExpP(f) lies within the interval spanned by the foreseen possible posterior expectation values {ExpP'i(f), i in I}.
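The two definitions can be made concrete in a short sketch, again assuming a finite set of worlds, an rv given as a dict from worlds to its values, and hypothetical helper names. Because the principle quantifies over every rv, checking a finite list of test rvs can only refute it for a given set of posteriors, never establish it.

```python
from fractions import Fraction
from collections import defaultdict

def expectation(P, f):
    """Exp_P(f) = sum of j * P(F_j), where F_j = {w : f(w) = j}."""
    cells = defaultdict(set)          # the partition {F_j} induced by f
    for w, value in f.items():
        cells[value].add(w)
    return sum((j * sum((P[w] for w in Fj), Fraction(0)) for j, Fj in cells.items()),
               Fraction(0))

def reflection_holds(P, posteriors, test_rvs):
    """General Reflection Principle, checked only on the supplied rvs: Exp_P(f)
    must lie in the interval spanned by the posterior expectation values of f."""
    for f in test_rvs:
        values = [expectation(Q, f) for Q in posteriors]
        if not min(values) <= expectation(P, f) <= max(values):
            return False
    return True

P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
Q1 = {"a": Fraction(2, 3), "b": Fraction(1, 3), "c": Fraction(0)}  # P( |{a, b})
Q2 = {"a": Fraction(0), "b": Fraction(0), "c": Fraction(1)}        # P( |{c})
f = {"a": 3, "b": 3, "c": -1}            # hypothetical simple rv
indicator_a = {"a": 1, "b": 0, "c": 0}   # the expectation of an indicator is a probability
print(expectation(P, f))                                # 2
print(reflection_holds(P, [Q1, Q2], [f, indicator_a]))  # True
```

Indicator rvs show why the expectation form includes the probability form: the expectation of the indicator of A is just P(A).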
4. What happens if we Conditionalize?
Before trying to show that Conditionalization is incumbent on us in the above sort of basic epistemic situation, let us look at what happens if one does Conditionalize.
Conditionalization is really orthogonal decomposition. That is, if {Em: m in M} is a partition, we can exhibit an orthogonal decomposition of P as follows:
$$P = \sum_{m \in M} P(E_m)\, P(\cdot \mid E_m)$$
The components P( |Em) are mutually orthogonal probability functions; they are the Conditionalizations of P on the propositions Em. The sum is a weighted sum (a convex combination, a mixture) in that the coefficients P(Em) are non-negative and sum to 1.
Thus the rule to update by Conditionalization says that in the basic epistemic situation I described, the orthogonal family {P'i} must be exactly the family {P( |Bi)}.
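The decomposition can be checked numerically in a sketch under the same finite-world assumptions (helper names hypothetical). Cells with probability 0 are skipped, since the conditionalization on them is undefined and their coefficient would be 0 anyway.

```python
from fractions import Fraction

def prob(P, E):
    return sum((P[w] for w in E), Fraction(0))

def decompose(P, partition):
    """Pairs (P(E_m), P( |E_m)) for every cell E_m with P(E_m) > 0."""
    parts = []
    for Em in partition:
        weight = prob(P, Em)
        if weight > 0:
            component = {w: (P[w] / weight if w in Em else Fraction(0)) for w in P}
            parts.append((weight, component))
    return parts

def mix(parts, worlds):
    """The convex combination sum_m c_m Q_m, computed world by world."""
    return {w: sum((c * Q[w] for c, Q in parts), Fraction(0)) for w in worlds}

P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
partition = [{"a", "b"}, {"c"}]
parts = decompose(P, partition)
print([str(c) for c, _ in parts])   # ['3/4', '1/4']
print(mix(parts, P) == P)           # True: the prior is the mixture of its conditionalizations
```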
The General Reflection Principle is certainly obeyed when someone follows the rule of Conditionalization. The reason is that a convex combination always lies within the interval spanned by its components. This is the elementary but infinitely useful:
Mixing Principle: if P is a mixture of {Qi:i in I} then for all A, P(A) is in the interval [{Qi(A):i in I}].
(I use square bracket notation for an interval spanned by a set: the span [X] of set X is the smallest closed convex set that contains all members of X.) The Mixing Principle derives of course from the simpler principle that a convex combination of numbers lies in the interval spanned by its components.
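The Mixing Principle can be checked exhaustively on a small space. The sketch below is an illustration under the same assumptions (finite worlds, hypothetical helper names, coefficients chosen arbitrarily): it mixes two functions and checks the principle on all 2^3 = 8 propositions.

```python
from fractions import Fraction
from itertools import combinations

def prob(P, A):
    return sum((P[w] for w in A), Fraction(0))

def propositions(worlds):
    """Every subset of a finite set of worlds, i.e. every proposition of the space."""
    ws = sorted(worlds)
    return [set(c) for k in range(len(ws) + 1) for c in combinations(ws, k)]

def mixture(coeffs, components):
    """sum_i c_i Q_i; assumes non-negative coefficients summing to 1 and shared worlds."""
    return {w: sum((c * Q[w] for c, Q in zip(coeffs, components)), Fraction(0))
            for w in components[0]}

def mixing_principle_holds(P, components):
    """True iff P(A) lies in [{Q_i(A)}] for every proposition A."""
    for A in propositions(P):
        values = [prob(Q, A) for Q in components]
        if not min(values) <= prob(P, A) <= max(values):
            return False
    return True

Q1 = {"a": Fraction(2, 5), "b": Fraction(3, 5), "c": Fraction(0)}
Q2 = {"a": Fraction(0), "b": Fraction(0), "c": Fraction(1)}
P = mixture([Fraction(1, 3), Fraction(2, 3)], [Q1, Q2])
print(mixing_principle_holds(P, [Q1, Q2]))   # True
```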
So we have now seen that if the rule of Conditionalization is obeyed, then both our basic principles are satisfied:
a) the possible posteriors form an orthogonal set indexed by a partition of propositions to which they give 1,
b) the General Reflection Principle holds.
But the rule of Conditionalization was meant by its advocates to be universal, not just applicable in those 'controlled epistemic situations' which are or resemble well-designed experimental set-ups of a certain sort. I will investigate here only the question whether, if our two basic principles are satisfied -- which I think is typical of such situations but not at all the case in general -- the updating must be by Conditionalization.
5. Some caveats about mixing
It may already look as if we are but steps away from proving that updating must be by Conditionalization, given the General Reflection Principle. But there are obstacles. The most important of these is that the converse of the Mixing Principle does not hold. This we can see from a simple counterexample.
Let's take a three-point space generated by a partition of three propositions, {A, B, C}. Then there are exactly 8 propositions, including the tautology T and the self-contradiction F. I will tabulate the remaining six propositions and the probabilities assigned to them by three probability functions:
Function  A     B     C     AvB   AvC   BvC
--------------------------------------------
q1        0.4   0.6   0     1     0.4   0.6
q2        0     0     1     0     1     1
p         0.2   0.2   0.6   0.4   0.8   0.8
Notice that q1 and q2 are orthogonal to each other and that for all propositions E in the relevant domain, p(E) lies in the interval spanned by q1(E) and q2(E). But any mixture cq1 + (1-c)q2 gives A the value 0.4c and B the value 0.6c, and so gives A and B either different values or zero. So p, which gives each of them 0.2, is not a mixture of those two functions.
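The table's claims can be checked mechanically. In this sketch (an illustration, not part of the original argument) the worlds are the atoms A, B, C, so a set such as {"A", "B"} stands for AvB, and the decimal entries are written as exact fractions.

```python
from fractions import Fraction as F
from itertools import combinations

q1 = {"A": F(2, 5), "B": F(3, 5), "C": F(0)}
q2 = {"A": F(0), "B": F(0), "C": F(1)}
p = {"A": F(1, 5), "B": F(1, 5), "C": F(3, 5)}

def prob(P, X):
    return sum((P[w] for w in X), F(0))

props = [set(c) for k in range(4) for c in combinations("ABC", k)]   # all 8 propositions

# p(E) lies between q1(E) and q2(E) for every proposition E ...
pointwise = all(min(prob(q1, E), prob(q2, E)) <= prob(p, E) <= max(prob(q1, E), prob(q2, E))
                for E in props)

# ... but any mixture c*q1 + (1-c)*q2 gives A the value c*q1(A), so c is forced by p(A),
# and the one remaining candidate differs from p (it gives B 3/10 rather than 1/5).
c = p["A"] / q1["A"]
candidate = {w: c * q1[w] + (1 - c) * q2[w] for w in q1}
print(pointwise, candidate == p)   # True False
```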
I should add one more caveat, to fix our thinking about convex sets and combinations, though it won't play an important role here. If all components assign the same odds or conditional probability to a pair of propositions, so will the mixture. But a comparison between conditional probabilities (or between odds) on which all the components agree need not hold for the mixture ("Simpson's paradox"): every component may make X more probable given Y than given not-Y, while the mixture makes it less probable. Thus the General Reflection Principle should also not be thought of as extending too far: it pertains to the basic sort of opinion representable directly within the discourse of expectation values.
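A hypothetical illustration of Simpson's paradox with exact numbers invented for the example (not data). Worlds are triples (g, y, x): the group g, whether Y holds, and whether X holds. The two components are orthogonal (each lives on one group); each makes X more probable given Y than given not-Y, while their 50/50 mixture reverses the comparison.

```python
from fractions import Fraction as F

def prob(P, pred):
    """P of the proposition picked out by pred (a test on worlds)."""
    return sum((v for w, v in P.items() if pred(w)), F(0))

def cond(P, pred_x, pred_y):
    """P(X|Y) = P(X&Y)/P(Y); assumes P(Y) > 0."""
    return prob(P, lambda w: pred_x(w) and pred_y(w)) / prob(P, pred_y)

Q1 = {(1, True, True): F(9, 100), (1, True, False): F(1, 100),
      (1, False, True): F(80, 100), (1, False, False): F(10, 100)}
Q2 = {(2, True, True): F(30, 100), (2, True, False): F(60, 100),
      (2, False, True): F(3, 100), (2, False, False): F(7, 100)}
# The 50/50 mixture; worlds missing from a component get probability 0 there.
P = {w: v / 2 for Q in (Q1, Q2) for w, v in Q.items()}

X = lambda w: w[2]
Y = lambda w: w[1]
not_Y = lambda w: not w[1]
for name, Q in (("Q1", Q1), ("Q2", Q2), ("P", P)):
    print(name, cond(Q, X, Y), cond(Q, X, not_Y))
# Q1 9/10 8/9
# Q2 1/3 3/10
# P 39/100 83/100
```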
I said that conditionalization is just orthogonal decomposition, but I only showed above that the conditionalizations on a partition form an orthogonal decomposition. The converse holds too. From here on I will take it for granted that {Bi: i in I} is a partition, and rely on this in statement and proof of results.
T1. If {Qi: i in I} is an orthogonal family of probability functions with Qi(Bi) = 1 then P is a mixture of {Qi} if and only if P( |Bi) = Qi for all i in I such that P(Bi) > 0.
We already saw that P is a mixture of its conditionalizations on the partition. To complete the proof, assume that
$$P = \sum_{i \in I} c_i Q_i$$
and that the numbers ci are non-negative and sum to 1. (If some of these coefficients are zero, the proof is easily amended.) Then for all A which are part of Bi, P(A) = ciQi(A), since each Qj with j ≠ i gives probability 1 to Bj, hence 0 to Bi and so to A. Since Bi is part of Bi and Qi(Bi) = 1, this implies that ci = P(Bi). But if that number is not 0 then P(A) = P(Bi)P(A|Bi), so P(A|Bi) = Qi(A). This argument is general for all parts A of Bi and for all indices i in I; and since P( |Bi) and Qi both give probability 1 to Bi, agreement on the parts of Bi suffices. So P( |Bi) = Qi for all i in I for which P(Bi) > 0.
6. Identifying all the mixtures of an orthogonal family
Expectation values combine linearly; they 'mix' along with their probability functions. Thus:
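Lemma. If P is the mixture of {Qi: i in I} with coefficients ci, then for every rv f,
$$\mathrm{Exp}_P(f) = \sum_{i \in I} c_i\, \mathrm{Exp}_{Q_i}(f).$$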
We do not require here that the Qi are orthogonal, nor that the rv are simple. For simple rv, all that concern us here, the lemma follows at once from the definition of expectation.
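For simple rv the derivation takes one line. Writing the mixture as P = Σ ciQi and using the definition of expectation from Section 3 (a worked equation added for illustration):

$$\mathrm{Exp}_P(f) = \sum_{j \in J} j\,P(F_j) = \sum_{j \in J} j \sum_{i \in I} c_i\, Q_i(F_j) = \sum_{i \in I} c_i \sum_{j \in J} j\, Q_i(F_j) = \sum_{i \in I} c_i\, \mathrm{Exp}_{Q_i}(f).$$

The only step beyond the definitions is exchanging the order of two finite sums.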
T2. If {Qi: i in I} is an orthogonal family of probability functions with Qi(Bi) = 1 then P is a mixture of {Qi} if and only if for all simple rv f, ExpP(f) is in the span of {ExpQi(f): i in I}.
Assuming the antecedent, the implication in the "only if" direction follows at once from the Lemma and the mixing principle for ordinary convex combinations of numbers.
Assume the antecedent and also that for all simple rv f, ExpP(f) is in the span of {ExpQi(f): i in I}. To prove that P is a mixture of the Qi we see by T1 that it will suffice to prove that P( |Bi) = Qi whenever P(Bi) > 0.
To do this let us introduce a bit of notation for certain rv that we can associate with the members of our partition.
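Definition. For a proposition A and a number r, let gi be the simple rv which takes value 1-r if A&Bi is the case, value -r if Bi-A is the case, and value 0 under all other conditions. (The notation suppresses the dependence of gi on A and r.)
Lemma. For any probability function P, if P(Bi) > 0 then P(A|Bi) = r iff ExpP(gi) = 0.
For in that case ExpP(gi) = (1-r)P(A&Bi) - rP(Bi-A) = P(A&Bi) - r[P(A&Bi) + P(Bi-A)] = P(A&Bi) - rP(Bi), which equals 0 iff P(A&Bi) = rP(Bi), that is, iff r = P(A|Bi).
To continue the argument for the theorem, fix an index k with P(Bk) > 0, suppose A is part of Bk, and choose r = Qk(A). If j ≠ k then A&Bk and Bk-A are both parts of Bk, which receives probability 0 from Qj, so ExpQj(gk) = 0. If j = k then that expectation value equals Qk(A&Bk) - rQk(Bk) = Qk(A) - r, since A is part of Bk and Qk(Bk) = 1. But we chose r = Qk(A), so once again ExpQk(gk) = 0.
Thus the expectation value of gk is in this case 0 for all the Qj. But we have assumed that the expectation value of any simple rv for P is in the span of those for the Qj, and the span of {0} is {0}. Therefore ExpP(gk) = 0 also. But then, by the Lemma just proved, P(A|Bk) = Qk(A). Since this holds for every part A of Bk, and both P( |Bk) and Qk give probability 1 to Bk, it follows that P( |Bk) = Qk, as was required to be shown.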
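The proof also shows what goes wrong in the counterexample of Section 5. In this sketch (finite worlds A, B, C as before; the helper `probe_rv` is a hypothetical name for the test rv defined in the proof: 1-r on A&Bk, -r on Bk-A, 0 elsewhere), take k = 1 with B1 = AvB, the proposition A, and r = q1(A). Both q1 and q2 give the rv expectation 0, while p gives it 1/25, so p violates the General Reflection Principle in its expectation-value form even though it passes on every proposition.

```python
from fractions import Fraction as F

def expectation(P, f):
    """Exp_P(f) for a simple rv given world by world (equal to sum_j j P(F_j))."""
    return sum((P[w] * f[w] for w in P), F(0))

def probe_rv(A, Bk, r, worlds):
    """The proof's test rv: 1 - r on A&Bk, -r on Bk - A, and 0 elsewhere."""
    return {w: (1 - r if (w in A and w in Bk) else -r if w in Bk else F(0)) for w in worlds}

# The functions of the Section 5 table; q1 gives 1 to B1 = AvB, q2 gives 1 to C.
q1 = {"A": F(2, 5), "B": F(3, 5), "C": F(0)}
q2 = {"A": F(0), "B": F(0), "C": F(1)}
p = {"A": F(1, 5), "B": F(1, 5), "C": F(3, 5)}
B1 = {"A", "B"}
A = {"A"}          # a part of B1
r = q1["A"]        # r = q1(A), as in the proof
g = probe_rv(A, B1, r, p)
print(expectation(q1, g), expectation(q2, g), expectation(p, g))   # 0 0 1/25
```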
7. Conditionalization rationally required in this case
We go back now to the basic epistemic situation in which the foreseen possible posteriors form an orthogonal family {P'i} with P'i(Bi) = 1. If the person involved there adheres to the General Reflection Principle, we conclude by T2 that his or her prior is a mixture of those possible posteriors, and then by T1 that each possible posterior P'i is the conditionalization P( |Bi) of the prior on the corresponding 'outcome' of the process under observation, at least whenever the prior gives that outcome positive probability. That is, s/he follows the rule of Conditionalization in situations of this sort.
Bibliography
Skyrms, B.: 1987, "Dynamic coherence and probability kinematics", Philosophy of Science 54, 1-20.
van Fraassen, B.: 1984, "Belief and the Will", Journal of Philosophy 81, 235-256.
van Fraassen, B.: 1989, Laws and Symmetry, Oxford: Oxford University Press.
van Fraassen, B.: 1995, "Belief and the problem of Ulysses and the Sirens", Philosophical Studies 77, 7-37.
Princeton University