
David M. Rogers · Thomas L. Beck · Susan B. Rempe

Information Theory and Statistical Mechanics Revisited


Abstract The statistical mechanics of Gibbs is a juxtaposition of subjective, probabilistic ideas on the one hand and objective, mechanical ideas on the other. From the mechanics point of view, the term ‘statistical mechanics’ implies that to solve physical problems, we must first acknowledge a degree of uncertainty as to the experimental conditions. Turning this problem around, it also appears that the purely statistical arguments are incapable of yielding any physical insight unless some mechanical information is first assumed. In this paper, we follow the path set out by Jaynes[25], including elements added subsequently to that original work, to explore the consequences of the purely statistical point of view. Because of the amount of material on this subject, we have found that an ordered presentation, emphasizing the logical and mathematical foundations, removes ambiguities and difficulties associated with new applications. In particular, we show how standard methods in the equilibrium theory could have been derived simply from a description of the available problem information. In addition, our presentation leads to novel insights into questions associated with symmetry and non-equilibrium statistical mechanics. Two surprising consequences to be explored in further work are that (in)distinguishability factors are automatically predicted from the problem formulation and that a quantity related to the thermodynamic entropy production is found by considering information loss in non-equilibrium processes. Using the problem of ion channel thermodynamics as an example, we illustrate the idea of building up complexity by successively adding information to create progressively more complex descriptions of a physical system. Our result is that such statistical mechanical descriptions can be used to create transparent, computable, experimentally-relevant models that may be informed by more detailed atomistic simulations. 
We also derive a theory for the kinetic behavior of this system, identifying the nonequilibrium ‘process’ free energy functional. The Gibbs relation for this functional is a fluctuation-dissipation theorem applicable arbitrarily far from equilibrium, that captures the effect of non-local and time-dependent behavior from transient driving forces. Based on this work, it is clear that statistical mechanics is a general tool for constructing the relationships between constraints on system information.

Keywords predictive statistical mechanics · maximum entropy · likelihood · probability · information entropy

PACS 65.20.De · 89.70.Cf · 05.70.-a · 05.70.Ln · 05.40.-a · 82.20.Uv · 02.50.Ey · 05.10.Gg · 82.39.Wj

David M. Rogers · Susan B. Rempe
Center for Biological and Materials Sciences, MS 0895
Sandia National Laboratories, Albuquerque, New Mexico 87185, USA
E-mail: dmroge@sandia.gov E-mail: slrempe@sandia.gov

Thomas L. Beck
Department of Chemistry
University of Cincinnati, Cincinnati, Ohio 45221-0172
E-mail: thomas.beck@uc.edu

## 1 Introduction

If the foundation of thermodynamics is to be built on processes existing in the physical world, then the whole structure of the theory will be subject to constant revision as new physics is discovered. This, however, has not proven to be the case. Rather, as new mechanistic information is added, statistical mechanics persists in an identical form, with changes only in the meaning attached to system states and measurement outcomes. It follows that, as in the case of the geometry of Euclid, statistical mechanics does not describe objects actually existing in the physical world, but rather idealizations of them[55]. This distinction immediately explains why the structure of statistical mechanics has persisted throughout the developments of the last century. Because its basic axioms are conventions chosen to be logically consistent and in agreement with our intuition, the mathematical form of the theory operates as a device for carrying out extended logic.

There have, to date, been many examples of using logical inference for framing statistical mechanical questions. Perhaps the most widely known is in the gradual shift in the conceptualization of an “ensemble.” Early ideas, associated with the names of Maxwell, Boltzmann, and others, were based on physically realizable systems with many weakly interacting particles, i.e. gases. The theory simply stated that an examination of all the particles at a single instant revealed the statistical properties of the ensemble. Gibbs [17] adapted the concept to systems that may contain strong internal interactions, e.g. solids or condensed phases, by imagining the ensemble as an infinite number of physical replicas of the system. His subjective conceptualization can be seen in his definition of the laws of thermodynamics as expressing “the approximate and probable behavior of systems of a great number of particles, or, more precisely, ... for such systems as they appear to beings who have not the fineness of perception to enable them to appreciate quantities of the order of magnitude of those which relate to single particles, and who cannot repeat their experiments often enough to obtain any but the most probable results.” It was immediately clear that for developing a subjective, formal treatment of the probability distribution over phase and its consequences, ‘hypotheses concerning the constitution of matter’ would not be required except in working out special cases.

The shift toward a subjective interpretation occurred only gradually because of the combination of Gibbs’ modest personality[48] and a dispute between Gibbs and his contemporaries [15], who viewed the physical reason for the weak coupling between ensembles which brought about equilibrium as paramount. Even as Schrödinger [62] presented a maximum entropy derivation of the canonical ensemble similar to the modern treatment of Jaynes[33], he still found it necessary to seek a middle ground by considering such distractions as the physical realizability of infinite heat baths. The work of Jaynes [25,34,33] and others [18] went a long way toward clarifying the situation by making a distinction between the “delusion that an ensemble describes an ‘objectively real’ physical situation” [34] and the subjective question of determining the “agreement between the premises and the conclusions.” [17] However, the philosophical debate over objective vs. subjective interpretations of thermodynamics continues to date [67]. Not surprisingly, attempts to prove ergodicity and convergence to maximum entropy distributions using mechanical arguments show that the most robust route is to introduce some form of uncertainty[71,46].

Perhaps the strongest criticism of this approach is associated with the use of the term, ‘subjective.’ This term seems to imply that the results of the theory cannot be considered as objectively existing in reality. Nevertheless, experiments are able to compare work and heat values to find agreement with thermostatics, provided a given system behaves according to the assumptions. In exactly the same way, Euclid’s geometry is able to deduce physically measurable distances, provided these objects behave as ideal solids. Subjectivity is present in both of these cases because assumptions are always required in order to calculate one quantity from another. The term ‘subjective’ simply acknowledges that this reasoning process proceeds from assumptions derived from experience. Physical predictions of objectively real phenomena can be made from a subjective theory based on assumptions that are objectively correct. However, imputing objectivity to assumptions used to solve a particular problem makes it impossible to conceive of possible changes in prior information and has given rise to some of the most difficult paradoxes in science.

In this paper, we aim to derive the statistical aspect of thermodynamics from the logical foundation given by Jaynes [33] in sufficient detail to present applications to modern problems outside the realm of the equilibrium canonical ensemble. Although some of the most important results of this inquiry have already been presented by Gibbs, we find that a derivation from first principles clarifies the logical foundations of the theory. A similar derivation for the canonical ensemble from first principles[33] exemplifies the generality that such a theory may attain; however, several important questions remain unaddressed. First and foremost, the form of the canonical ensemble must change when new degrees of freedom are added. This addition corresponds to a change in the prior information for the problem, and it is not immediately evident how both problems can be related. We have found that directly attacking this change in the number of possible ‘states’ of a system requires a type of logical relativity theory, which results from rejecting the two-valuedness of elementary hypotheses. We address this theory in Sec. 2 and provide the most important details in the Appendix.

Next, we introduce the partition function and entropy functionals in Sec. 3, from which the elementary theory of statistical mechanics becomes manifest. A relative entropy functional is in most respects simpler than the free energy as it may be deduced from purely statistical considerations. The basis of this functional in information theory shows that thermodynamic states are characterized not by an objective physical situation, but instead by subjective information about the system. Further, logically consistent predictions from statistical mechanics will disagree with the results of experiment whenever physically incorrect assumptions are made about the state of the system. A novel result of this section is that thermodynamic free energies fundamentally express the log-likelihood of a state of knowledge. Although the absolute probability is undefined without specifying all possible states, the likelihood ratios between states may be deduced, and are exactly the exponentials of free energy differences. Although the partition function and entropy are defined as ‘state-functions,’ dependent on a state of knowledge, we justify the fundamental importance of likelihood ratios by confirming that the partition function can be built from a product of such likelihood ratios in any order, leading to the conception of thermodynamic cycles. This result also proves that the resulting probability distribution is a state function, as it should be. The assumptions required for constructing the relationships between informational states—the laws of statistical mechanics—are thus founded on probability theory.

We will develop the ion channel as an example problem for applications (Sec. 4) of the basic set of relations given in Sec. 3. Transmembrane proteins that shuttle solutes between two aqueous/membrane interfaces have drawn the attention of a large crowd of experimental and theoretical investigators.[22] Selective channels and transporters are critical for maintaining living cells in their nonequilibrium state. Similar functionality is a required ingredient of synthetic semi-permeable partitions, used in fuel cells, solute separation, and electrochemical sensing. The operational characteristics of these devices are determined from their response to applied pressure, electric fields, and solute concentration differences. The most easily measured response is ion conduction, available through current measurements that can be carried out on micrometer-sized patches at millisecond resolution.[20,68] Conduction of other species, such as water, as well as structural changes in the channel and surrounding interface regions are also important, but less accessible. The most easily accessible theoretical descriptions of channel behavior center around the structural properties of the equilibrium state and its propensity for ion occupancy under no external bias (in non-conducting conditions).[61] Instead of presenting a patchwork of accumulated techniques in statistical mechanics, in this article we present a top-down view by successively adding mechanistic information to predict these propensities. This allows a construction of the simplest possible physical interpretation of channel behavior, but uses a statistical mechanics capable of deriving all the complexities of atomistic and quantum-mechanical systems. Because no net currents are present at equilibrium[30], the fluxes in these systems must be analyzed using a nonequilibrium theory.

The usual Kolmogorov definition of probability and the Boltzmann factor are developed in Sec. 4.1 and 4.2. The latter is derived by a maximum relative entropy argument along a path in a thermodynamic cycle. For the special case of maximum relative entropies, we find that the entropy increments add to the total information entropy of each state, proving path independence of the entropy for this case. These two cases generate the usual equilibrium thermodynamics approach without the necessity of assuming extensivity. We will show that this approach can be used directly to give a naïve distribution over ion occupancy states for the channel. We will show later that this distribution is related to a coarse-grained (marginal) distribution at a state of knowledge with more degrees of freedom. However, both states of knowledge correctly employ the rules of statistical mechanics, and their difference lies in the assumed information for the problem.

Adding new coordinates, the opposite of restricting them in Sec. 4.1, leads to the multicanonical ensemble, presented in detail in Sec. 4.3. The new degrees of freedom are termed coarse coordinates. The relationship between canonical and multicanonical ensembles is the usual one. Fixing coarse coordinates within the multicanonical ensemble generates a conditional ensemble. The aggregate probability of the coarse coordinates is related to the potential of mean force as in coarse-graining.[38] Although we could directly add time-dependent states in this section, we have adopted a slower development, adding conformational states of the channel at equilibrium. This allows an intuitive connection to the well-known equilibrium theory, where introduction of interacting systems can change the distribution over the system of interest. The analogous development in Ref. [25] is the derivation of the constant pressure or constant angular momentum ensemble. Instead of describing everything in terms of all atomistic positions and momenta, or all atomic and electronic eigenfunctions, we have shown here that these limits can be approached incrementally as necessary for each application.

In general, adding maximum relative entropy information along with new coordinates, $Y$ , will change the distribution over the previous coordinates, $X$ . In some cases, for example when the marginal distribution over $X$ is experimentally known, this is not desirable. We instead seek a method for inference on $Y$ from a known distribution over $X$ and maximum entropy information. The method for constrained addition is derived in Sec. 4.4 by assuming a probability distribution for ion occupancy states in a channel, and then inferring the distribution of channel conformations. Other important applications of the same theory are possible. In particular, it leads directly to the predictive statistical mechanics[30] of dynamic processes arbitrarily far from thermodynamic equilibrium. The non-equilibrium process entropy (caliber) and free energy functionals follow naturally from a specification of non-equilibrium states as trajectories. The Gibbs relations for these functionals lead trivially to generalized fluctuation-dissipation theorems. Although based on ideas originally from Jaynes[27, 28], the work presented in this section differs in an important respect. By fixing the distribution at an initial time and then maximizing the step-wise transition entropies we arrive at a non-anticipating process. Previous presentations[27, 47, 14] utilize anticipating conditions, resulting in the possibility of non-physical influences from forces which may be exerted on the system at future times. Removing this shortcoming brings us closer to an original idea by Jaynes[26] and gives path probabilities immediately recognizable as a canonical form for forward transition processes. Information loss in discarding the starting distribution in favor of the final distribution leads to a quantity analogous to the thermodynamic entropy production. 
This entropy has a great advantage over other formulations[12, 66] in that it does not explicitly require definition of a ‘steady-state.’ Such a state may not be unique (as in the case for the Liouville equation) or even exist, e.g. transient processes like evaporation in an open system. It is our hope that this new development will complete the statistical foundations of thermodynamics by providing a basis for the second law of thermodynamics in information theory already hinted at in the problem of Maxwell’s demon.[41]

2 Logical Foundations

Jaynes[33] presents a cogent interpretation of probability theory as a method for conducting logical inference in the presence of uncertainty. This interpretation is based on Pólya’s qualitative conditions for plausible reasoning in mathematics[56] combined with the consistency theorems of Cox and Aczél[10, 2] deduced by consideration of the associativity equation. Requiring our system for assigning plausibilities to be associative, such that adding information in any order leads to the same probability assignment, it is possible to deduce the product rule

$P(AB|C) = P(A|BC)P(B|C) = P(B|AC)P(A|C), \quad (1)$

for which the right equality is Bayes’ theorem. The symbols, $A$ , $B$ , and $C$ stand for hypotheses, or logical propositions, and the symbols on the right of the $|$ represent given information, or assumptions. In this paper, we denote propositions using Greek or capital letters. This distinction is necessary to allow for propositions that represent coordinates, i.e.

$X$ : Some property of the system is described by the number $x$ .

Propositions always appear inside the probability symbol and follow the Boolean algebra, where multiplication denotes a logical ‘and,’ while addition represents a logical ‘or.’ We refer the reader to the first few paragraphs of the Appendix for the necessary notation. Of particular importance is the relation $AB = A$ when $A \Rightarrow B$ , used extensively to replace $X$ with $X\mathbb{S}_x$ . We also omit the prior information $I$ for clarity in some instances, although it is always to be assumed in the formulas presented here.
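The product rule of Eq. 1 and the relation $AB = A$ can be checked numerically on a small example. The sketch below is only illustrative: the joint table `joint` and the six-outcome space are made-up assumptions, not quantities from this paper.

```python
import numpy as np

# Hypothetical joint table P(AB|C) for two binary propositions.
# Rows index A (false, true); columns index B (false, true).
joint = np.array([[0.30, 0.20],
                  [0.10, 0.40]])

p_a = joint.sum(axis=1)              # P(A|C), marginal over B
p_b = joint.sum(axis=0)              # P(B|C), marginal over A
p_a_given_b = joint / p_b            # P(A|BC): each column divided by P(B|C)
p_b_given_a = joint / p_a[:, None]   # P(B|AC): each row divided by P(A|C)

# Both factorizations in Eq. 1 must reproduce the joint table.
via_b = p_a_given_b * p_b
via_a = p_b_given_a * p_a[:, None]
product_rule_holds = np.allclose(via_b, joint) and np.allclose(via_a, joint)

# AB = A when A implies B: represent propositions as sets of outcomes.
outcomes = np.arange(6)
a = outcomes < 2                     # A is true on outcomes {0, 1}
b = outcomes < 4                     # B is true on outcomes {0, 1, 2, 3}
implication_holds = np.array_equal(a & b, a)
```

Equating the two factorizations and dividing by $P(B|C)$ gives Bayes' theorem in its usual form.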

An immediate question occurs as to how probabilities may be assigned in the first place. The most appealing answer is to employ the principle of indifference (termed $I$ ), which states that, for any number of possible outcomes, each is equally probable. However, this again begs the question as to the definition of the hypothesis space. Is any assignment possible in the absence of this knowledge? We assume that some assignment is possible, and state it as $P(A|I) = \text{constant}$ for hypotheses, $A$ , that are ‘of the same type.’ We provide a formal justification for this process in the Appendix, and note that it extends some amount of inference to statements that are undecidable, but does not affect the conditional assignments, $P(C|AI)$ , when $A$ says something about $C$ .

The ability to reason in an un-defined hypothesis space has some interesting consequences for the principle of complementarity.[31] Suppose an infant is entertained by a screen that shows one color, $x_1, x_2$ , or $x_3$ and plays one sound, $y_1, y_2$ , or $y_3$ at every moment. However, $X_1$ never occurs simultaneously with $Y_1$ and so on for $X_2$ and $Y_2, X_3$ and $Y_3$ . As these ideas are being learned, it appears that $X_1$ and $Y_1$ are mutually contradictory circumstances, and it should be possible for the infant to express some idea of the validity of that statement, $X_1 \oplus Y_1$ . Upon their first encounter with both $X_1 Y_1$ in the real world, the infant may be genuinely surprised. Moreover, if $X_1$ represents the statement, ‘color $x_1$ is present,’ and the child believes all colors are mutually exclusive ( $\oplus(X_1, X_2, X_3)$ ), then $\bar{X}_1$ can be represented equally well with, ‘any color other than $x_1$ is present.’ It may thus be ontologically true that both $X_1 \bar{X}_1$ for a complex scene.[19, 7] Following the argument of Poincaré for establishing real numbers[55], suppose the screen is divided into smaller and smaller segments, and each time the colors are found to be mutually exclusive in each segment. Then the act of defining, by recursion, a continuous space on which colors are mutually exclusive will lead to an idea in contradiction with the nature of light (but not with our measuring apparatus). The situation is immediately seen to be similar to the double-slit experiment, where we must abandon our notion that a particle at position one, $X_1$ , and at position two, $X_2$ , are mutually exclusive, and instead replace it with $X_1 + X_2 + (X_1 \oplus X_2)C$ , where $C$ denotes ‘a measurement has been taken to determine $x$ .’ These considerations relax the ‘logical consistency’ restrictions on what questions may be asked of the quantum theory.[52]

The problem of assigning relative probabilities to hypotheses concerning which sets of events may be possible is considered at length in the Appendix. Using the principle of indifference, the main result is that the probability of a set of independent events, $\Omega$ , is proportional to the number of events, $|\Omega|$ , so that

$\begin{aligned} P(x|\Omega I) &= \frac{1}{|\Omega|}, x \in \Omega \\ &= \frac{P(x|I)}{P(\Omega|I)} = \frac{\text{const.}}{P(\Omega|I)} \\ &\Rightarrow P(\Omega|I) \equiv |\Omega| P(\varphi|I), \end{aligned} \quad (2)$

where $\varphi$ stands for some elementary hypothesis. This result elegantly sweeps questions due to symmetry under the rug, and these will be given more consideration in a subsequent paper. The discussion in the Appendix already shows that this development provides a method for dealing with symmetric hypotheses in a very simple format, fundamentally based on the principle of indifference. The appropriate ‘(in)distinguishability factors’ are derived as a result of its use in Sec. 4.1. From a statistical mechanics viewpoint, the principle of indifference then provides partition functions, $Z[\Omega] = P(\Omega|I) / P(\varphi|I)$ , consistent with completely ‘entropic’ systems. We show next that conventional partition functions may be obtained from these by moving constraints on average values to the left-side of the probability symbol as well.
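For a finite set of events, Eq. 2 reduces to simple counting. The following minimal sketch assumes a finite $\Omega$ represented as a Python set; the function names and the labels `x1` through `x6` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def indifference(omega):
    """Uniform assignment P(x | Omega I) = 1/|Omega| over a finite set."""
    n = len(omega)
    return {x: Fraction(1, n) for x in omega}

def entropic_partition_function(omega):
    """Z[Omega] = P(Omega|I) / P(phi|I), which Eq. 2 equates with |Omega|."""
    return len(omega)

small = {"x1", "x2", "x3"}
large = small | {"x4", "x5", "x6"}

p_small = indifference(small)

# The likelihood ratio between the two hypothesis sets is a ratio of counts.
ratio = Fraction(entropic_partition_function(large),
                 entropic_partition_function(small))
```

Here `ratio` expresses $P(\Omega_\text{large}|I)/P(\Omega_\text{small}|I)$ without ever fixing the value of $P(\varphi|I)$ , since the elementary probability cancels.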

3 Minimal Relations of Statistical Mechanics

Because we are deriving purely statistical relationships, the only things we are able to compare are states of knowledge. There may be several convenient computational notations or methods for solving the resulting equations, but it will not be necessary to read these as implying physically existing quantities or mechanisms. It is not the physical, causal mechanisms themselves, but rather theories about them that are the subject of the reasoning process. These theories may appear as propositions to be mutually compared or as given information for solving certain inference problems. However, unless physics appears in this way, it can have no influence on the solution of the logical problem. Mechanics enters because the set of coordinates and constraints relevant to any given hypothesis must be found from mechanical insight, and the answers resulting from statistics will depend non-trivially on this input.

We claim that the state functions of statistical mechanics, the partition function and entropy functional, can be derived by successive addition or replacement of problem information. The first process can always be carried out, with a corresponding change in the probability distribution for system states via re-weighting the probability distribution from the previous state. In general, this process is uni-directional. The second process, replacing information, can only be carried out directly via re-weighting in certain circumstances.

To develop our notation, we represent each state of knowledge by a set, $\mathcal{C} = \{A\}$ , of propositions or informational constraints, $A$ . Important types of propositions include system coordinates, energy assignments, and statistical weights. As we will see, the latter amount to propositions of the type “There is a physical mechanism increasing the likelihood of state $x_1$ over $x_2$ by some amount,” and are closely aligned with energy assignments and the translation of problems with pre-specified coordinates to problems with pre-specified physical forces. In addition to these, we also permit statements defining the set of coordinates relevant to deciding a given proposition (the problem phase space), as well as the symmetries these propositions obey. Although it may seem peculiar from a mechanistic point of view, any problem constraints can appear either as given information or as objects to be compared. We might also be able to remove the distinction between coordinates, energy assignments, and definitions of phase space to form so-called generalized ensemble or coarse-grained systems.

We consider the process of adding information $A$ to some known state, $\mathcal{C}$ , which we denote by $\mathcal{C} \rightarrow A\mathcal{C}$ . The first quantity of interest is the probability $P(A|\mathcal{C}I)$ , which we compare to an alternative process, $\mathcal{C} \rightarrow \Phi\mathcal{C}$ . It is convenient to assume the existence of a null hypothesis, $\Phi$ , that is un-decidable from any other information. Formally,

$P(\Phi|\mathcal{C}I) = P(\Phi|\mathcal{C}'I) = P(\Phi|I) \quad \forall \mathcal{C}, \mathcal{C}'. \quad (3)$

Now divide the set of propositions, $\mathcal{C}'$ (appearing above), into two sets, $\mathcal{D}$ and $\mathcal{C}$ . From the two equivalent ways of composing $P(\mathcal{D}\Phi|\mathcal{C}I)$ using Bayes' theorem, it is easy to see that the above is true if and only if $\Phi$ is irrelevant to conclusions about $\mathcal{D}$ .

$P(\mathcal{D}|\Phi\mathcal{C}I) = P(\mathcal{D}|\mathcal{C}I)$

We compute the relative likelihood,

$Z[A\mathcal{C}]/Z[\mathcal{C}] \equiv \frac{P(A|\mathcal{C}I)}{P(\Phi|\mathcal{C}I)} = \frac{P(A\mathcal{C}|I)/P(\Phi|I)}{P(\mathcal{C}|I)}, \quad (4)$

using

$\frac{P(A|\mathcal{C}I)}{P(\Phi|\mathcal{C}I)} = \sum_{\{X\}} \frac{P(A|X\mathcal{C}I)}{P(\Phi|X\mathcal{C}I)} P(X|\mathcal{C}I). \quad (5)$

The summation set, $\{X\}$ , should include any system states relevant to deciding the plausibility of $\mathcal{C}$ or $A$ . To see this, assume that the states relevant to deciding $\mathcal{C}$ or $A$ are collected in the space $\mathbb{S}_X$ . Then write $\{X\} = \mathbb{S}_X \times \mathbb{S}_i$ , where $X_i \in \mathbb{S}_i$ are irrelevant to $A$ and $\mathcal{C}$ so that $P(X|A\mathcal{C}I) = P(X_{A\mathcal{C}}X_i|A\mathcal{C}I) = P(X_{A\mathcal{C}}|A\mathcal{C}I)P(X_i|I)$ . The sum in Eq. 5 factors into

$\sum_{X_{A\mathcal{C}} \in \mathbb{S}_X} \sum_{X_i \in \mathbb{S}_i} \frac{P(X_{A\mathcal{C}}X_i|A\mathcal{C}I)}{P(\Phi|\mathcal{C}I)} = \sum_{X_{A\mathcal{C}} \in \mathbb{S}_X} \frac{P(A|X_{A\mathcal{C}}\mathcal{C}I)}{P(\Phi|X_{A\mathcal{C}}\mathcal{C}I)} P(X_{A\mathcal{C}}|\mathcal{C}I).$
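The independence of the relative likelihood from irrelevant coordinates can be illustrated numerically. In this sketch the distributions and the weights $P(A|X\mathcal{C}I)/P(\Phi|X\mathcal{C}I)$ are arbitrary made-up values; the only structural assumption is the one stated above, namely that the weights depend on $X_{A\mathcal{C}}$ alone and that the irrelevant coordinates are distributed independently.

```python
import numpy as np

rng = np.random.default_rng(0)

p_relevant = rng.dirichlet(np.ones(3))    # P(X_AC | C I) over S_X
p_irrelevant = rng.dirichlet(np.ones(4))  # P(X_i | I) over S_i

# Hypothetical weights P(A|X C I) / P(Phi|X C I); they depend on X_AC only.
weights = np.array([2.0, 0.5, 1.0])

# Joint distribution over S_X x S_i under the independence assumption.
joint = np.outer(p_relevant, p_irrelevant)

# Eq. 5 summed over the full product space ...
ratio_full = np.sum(weights[:, None] * joint)
# ... and over the relevant subspace only.
ratio_reduced = np.sum(weights * p_relevant)
```

The two values agree because the sum over $X_i$ contributes only the normalization of $P(X_i|I)$ , which is one.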

If we are to use information $A$ as an assumption, it should come from known experimental data on the system. In order to establish $A$ , we may therefore tabulate frequencies for $X \in \mathbb{S}_X$ . If $A\mathcal{C}$ turned out to be true, scientists basing their conclusions only on $\mathcal{C}$ would be increasingly surprised (or skeptical if the report is second-hand) at the evidence collected after $N$ trials. This is because the probability of these results given $\mathcal{C}\mathbb{S}_X$ would be (from the multinomial distribution),

$\begin{aligned} P(\{X\}_1^N \sim A\mathcal{C}|\mathcal{C}\mathbb{S}_X) &= N! \prod_{X_i \in \mathbb{S}_X} P(X_i|\mathcal{C}\mathbb{S}_X)^{n_i} / n_i! \\ &\rightarrow e^{N\mathcal{H}[A\mathcal{C}\mathbb{S}_X|\mathcal{C}\mathbb{S}_X]} \\ \mathcal{H}[A|B] &\equiv - \sum_{X_i \in \mathbb{S}_X} P(X_i|A\mathbb{S}_X) \ln \left[ \frac{P(X_i|A\mathbb{S}_X)}{P(X_i|B\mathbb{S}_X)} \right]. \end{aligned} \quad (6)$
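The limit in Eq. 6 can be examined numerically by comparing the per-trial log-probability of the multinomial with the relative entropy. This is a rough sketch with made-up three-state distributions (`p_a` standing in for $P(X_i|A\mathcal{C}\mathbb{S}_X)$ and `p_c` for $P(X_i|\mathcal{C}\mathbb{S}_X)$ ), and it assumes the observed counts match the frequencies predicted by $A\mathcal{C}$ exactly.

```python
import math

def log_multinomial(counts, probs):
    """ln of N! * prod_i probs[i]**counts[i] / counts[i]!  (first line of Eq. 6)."""
    n = sum(counts)
    total = math.lgamma(n + 1)
    for n_i, p_i in zip(counts, probs):
        total += n_i * math.log(p_i) - math.lgamma(n_i + 1)
    return total

def relative_entropy(p_a, p_b):
    """H[A|B] = -sum_i p_a[i] * ln(p_a[i] / p_b[i]); always <= 0."""
    return -sum(a * math.log(a / b) for a, b in zip(p_a, p_b) if a > 0)

p_a = [0.5, 0.3, 0.2]          # hypothetical frequencies implied by A C
p_c = [1 / 3, 1 / 3, 1 / 3]    # hypothetical assignment based on C alone

h = relative_entropy(p_a, p_c)

# Per-trial log-probability for increasing N.
rates = {}
for n_trials in (10, 1000, 100000):
    counts = [round(n_trials * a) for a in p_a]
    rates[n_trials] = log_multinomial(counts, p_c) / n_trials
```

The entries of `rates` approach `h` from below as $N$ grows, with a correction of order $\ln N / N$ coming from Stirling's approximation.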

According to $\mathcal{C}$ , the likelihood of such a set of observations decreases exponentially with $N$ . This is a condensed version of the Wallis derivation for the entropy, presented in more detail in Ref. [33]. The limit taken in the second equation is as $N \rightarrow \infty$ , which is appropriate for assessing such a set of hypothetical observations or second-hand reports. Evidently, the Kullback-Leibler divergence, $-\mathcal{H} \geq 0$ , represents the value of the information $A\mathcal{C}$ (or difference of opinion) to an observer who has already accepted $\mathcal{C}\mathbb{S}_X$ . The relative information entropy, $\mathcal{H}$ , reaches its maximum, zero, when the new information does not alter the distribution. For any reasonable comparison to be made, the distributions must be compared over the same set, $\mathbb{S}_X$ , which should include any observational information that $A$ or $B$ may predict. As in the case for the free energy difference, above, the relative entropy is independent of the distribution over irrelevant variables, $X_i \in \mathbb{S}_i$ . This happens here because the probability assignments are identical over the subspace $X|X_i$ for each $X_i$ .

### 3.1 Transitivity

The above arguments showed how to add information incrementally. If the starting point is taken to be $I$ , it is then possible to assign a rank to all states,

$Z[\mathcal{C}] \equiv \sum_{\{X\}} \frac{P(\mathcal{C}|XI)}{P(\Phi|XI)^{|\mathcal{C}|}} P(X|I), \quad (7)$

where $|\mathcal{C}|$ denotes the size of the set, $\mathcal{C}$ , and $Z[\Phi] = 1$ . And we may compare any two states using

$\frac{P(\mathcal{C}|I)}{P(\mathcal{D}|I)} = \frac{Z[\mathcal{C}]}{Z[\mathcal{D}]}. \quad (8)$

To show that this can be computed using successive addition of information and Eq. 4, it is necessary to prove that any order of information addition leads to (7). The proof is a direct consequence of Bayes' theorem (1).

$\begin{aligned} P(AB|I) &= P(A|I) P(B|AI) \\ &= P(A|I) \sum_{X \in \mathbb{S}_x} P(BX|AI) \\ &= P(A|I) \sum_{X \in \mathbb{S}_x} P(B|XAI) P(X|AI) \end{aligned}$

The last formula is exactly the form of Eqns. 4 and 5 (with the normalization removed) and the above derivation is symmetric in $A$ and $B$ .
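The symmetry in $A$ and $B$ can be demonstrated by adding the two pieces of information in either order. The sketch below uses made-up numbers and, as a simplifying assumption not required by the derivation, takes $A$ and $B$ to be conditionally independent given $X$ , so that $P(B|XAI) = P(B|XI)$ . The helper `add_information` is a hypothetical name.

```python
import numpy as np

def add_information(likelihood, dist):
    """Add one proposition by re-weighting.

    Returns P(new | old), i.e. the sum over X of likelihood * dist,
    and the updated distribution over X.
    """
    evidence = float(np.sum(likelihood * dist))
    return evidence, likelihood * dist / evidence

prior = np.array([0.1, 0.2, 0.3, 0.4])      # P(X|I)
like_a = np.array([0.9, 0.5, 0.2, 0.1])     # P(A|XI)
like_b = np.array([0.3, 0.8, 0.6, 0.05])    # P(B|XI), assumed equal to P(B|XAI)

# Path 1: I -> A -> AB
p_a, post_a = add_information(like_a, prior)
p_b_given_a, post_ab = add_information(like_b, post_a)

# Path 2: I -> B -> AB
p_b, post_b = add_information(like_b, prior)
p_a_given_b, post_ba = add_information(like_a, post_b)

a_then_b = p_a * p_b_given_a    # P(A|I) P(B|AI)
b_then_a = p_b * p_a_given_b    # P(B|I) P(A|BI)
```

Both paths give the same $P(AB|I)$ and the same final distribution over $X$ , which is the path independence that a thermodynamic cycle requires.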

The relative entropy cannot be so defined, since it compares distributions. However, the entropy with respect to a complete space,

$\mathcal{H}[\mathcal{C}\mathbb{S}_x|\mathbb{S}_x], \quad (9)$

can be compared among all states which depend only on the space $\mathbb{S}_x$ . In the sections below, it will be shown that adding maximum entropy-type information, $B$ , makes $\mathcal{H}[AB\mathbb{S}_x|\mathbb{S}_x] = \mathcal{H}[AB\mathbb{S}_x|A\mathbb{S}_x] + \mathcal{H}[A\mathbb{S}_x|\mathbb{S}_x]$ . However, this is not necessarily true when Eq. 21 does not hold.

3.2 Inference

The above concepts may be solidified using the inference process as an example. Given a model, $M$ , for how data may be generated, we may use any prior information or symmetries of the problem to write down a prior state of knowledge, $\mathbb{S}_\theta M$ . The prior distribution over the parameter space, $P(\theta|\mathbb{S}_\theta M)$ , is then given by the free energy for the process $\mathbb{S}_\theta M \rightarrow \theta\mathbb{S}_\theta M = \theta M$ . Next, some data, $D_1$ , is collected and the state of knowledge updated to $D_1\mathbb{S}_\theta M$ . The free energy for $D_1\mathbb{S}_\theta M \rightarrow \theta D_1 M$ now gives the posterior distribution. Bayes' theorem appears as the thermodynamic cycle identity between the free energy for $D_1\mathbb{S}_\theta M \rightarrow \theta D_1 M$ and $D_1\mathbb{S}_\theta M \rightarrow \mathbb{S}_\theta M \rightarrow \theta M \rightarrow \theta D_1 M$

$\frac{Z[\theta D_1 M]}{Z[D_1\mathbb{S}_\theta M]} = \left( \frac{Z[D_1\mathbb{S}_\theta M]}{Z[\mathbb{S}_\theta M]} \right)^{-1} \frac{Z[\theta M]}{Z[\mathbb{S}_\theta M]} \frac{Z[\theta D_1 M]}{Z[\theta M]}$
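The cycle identity can be checked on a small discrete example. The sketch below is only illustrative: the three candidate coin biases, the uniform prior, and the data counts are assumptions introduced here, not part of the ion channel model. Each factor in the product corresponds to one leg of the three-step path (inverse evidence, prior, likelihood).

```python
from fractions import Fraction

# Hypothetical parameter space S_theta: three candidate coin biases.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

# Uniform prior P(theta | S_theta M), i.e. Z[theta M] / Z[S_theta M].
prior = {t: Fraction(1, 3) for t in thetas}

def likelihood(theta, heads, tails):
    """P(D_1 | theta M) for one fixed sequence with the given counts."""
    return theta**heads * (1 - theta)**tails

heads, tails = 3, 1  # illustrative data D_1

# Evidence P(D_1 | S_theta M), i.e. Z[D_1 S_theta M] / Z[S_theta M].
evidence = sum(prior[t] * likelihood(t, heads, tails) for t in thetas)

# Three-step path: (evidence)^-1 * prior * likelihood gives the posterior,
# i.e. Z[theta D_1 M] / Z[D_1 S_theta M].
posterior = {
    t: (1 / evidence) * prior[t] * likelihood(t, heads, tails) for t in thetas
}
```

Exact rational arithmetic is used so that the normalization of the posterior can be verified without rounding error.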

Interestingly, inference using Bayes' theorem is increasingly being used to estimate the probabilities of free energies from computational sampling experiments[64,54,58], and this can be generalized to estimating free energy functionals.[23,59]

The relative entropy between $\mathbb{S}_\theta M$ and $D_1\mathbb{S}_\theta M$ measures how informative $D_1$ is in determining $\theta$ , while $\mathcal{H}[D_1 D_2\mathbb{S}_\theta|D_1\mathbb{S}_\theta]$ shows the amount of information that $D_2$ conveys once $D_1$ is known.

3.3 Replacement of Information

The second type of process is that of completely replacing information. For this case, we may apply all the formulas of the previous section, and compare $\mathcal{C} \rightarrow A\mathcal{C}$ with $\mathcal{C} \rightarrow B\mathcal{C}$ . However, instead of computing each of these separately, we wish to directly compare the total likelihood between $A$ and $B$ . We thus fix either situation, and use

$1 = \sum_X P(X|A\mathcal{C}) = \frac{P(B|\mathcal{C})}{P(A|\mathcal{C})} \sum_X \frac{P(A|X\mathcal{C})}{P(B|X\mathcal{C})} P(X|B\mathcal{C}). \quad (10)$

This shows that the distribution over $X|A$ can be transformed from $X|B$ via re-weighting, although this is known to be computationally inefficient.[65] However, if there is an $X$ for which $P(X|B\mathcal{C})$ is zero, but for which $P(A|X\mathcal{C}) = \frac{P(X|A\mathcal{C})P(A|\mathcal{C})}{P(X|\mathcal{C})}$ is non-zero, then the above expression cannot be evaluated. Therefore, if $B$ contains a restriction on the set of allowable $X$ , then this restricts mutual comparison among $A, B$ . Likelihood ratios can only be computed directly using Eq. 10 if $P(X|B\mathcal{C})$ is nonzero on a smaller space $\Omega \subseteq \{X\}$ than $\{X\}$ on which $P(X|A\mathcal{C})$ is nonzero. To some extent, this caveat explains the computational problems involved with re-weighting samples.[43]

Fig. 1 Reaction diagram showing system states as nodes. Two constraints, $\mathcal{S}_x$ , defining a coordinate space, and $\Omega$ , defining some further restriction are illustrated here. $F$ and $G$ are average value constraints, and their relative likelihoods can be calculated using Eq. 12 in either direction. For identical constraints, all hypotheses are completely connected, as shown by the double-headed, dark arrows. Restrictions such as $\mathcal{S}_x$ or $\Omega$ limit the set of propositions that can be directly compared without knowledge of $P(\Omega|M)/P(\Phi|M)$ , and only one comparison direction is allowed, illustrated by the grey, dotted arrows.

Propositions defined inside $\Omega$ can still be compared against one another, and their likelihoods computed from either the null hypothesis, $\Phi$ , or a new null hypothesis, $\Phi\Omega$ , defined relating only to $X$ allowed by $\Omega$ . Addition of the information, $B = B\Omega$ , to a thermodynamic state can be represented using a commutation diagram (Fig. 1), where paths represent step-wise addition of constraints / hypotheses. Completely commuting classes share an underlying definition of coordinate space. Whenever information of the type $\Omega$ is added, it directly bears on subsequent propositions. Paths adding $B\Omega$ will therefore restrict the set of subsequent questions that may be asked without knowledge of $P(\Omega|\mathcal{C})$ . These paths are therefore represented by a directed edge, branching from the above completely connected graph. The commutation diagram terminology is justified by noting that the multiplicative functions, (11), transforming one probability distribution into another arrive at the same distribution function for any ‘allowed’ path.

Because the free energy formula (5) is simply Eq. 10 for the special case $B = \Phi$ , it is convenient to define

$w_{B \rightarrow A}(X\mathcal{C}) \equiv \frac{P(A|X\mathcal{C})}{P(B|X\mathcal{C})}, \quad (11)$

so that free energy differences can be expressed more simply as

$\frac{P(A|\mathcal{C})}{P(B|\mathcal{C})} = \langle w_{B \rightarrow A} | B \rangle. \quad (12)$

As their name implies, these are weights,

$\begin{aligned} P(X|AI) &= \frac{w_{B \rightarrow A}(X) P(X|BI)}{\sum_{X \in \mathbb{S}_x} w_{B \rightarrow A}(X) P(X|BI)} \\ \langle f(x) | A \rangle &= \frac{\langle w_{B \rightarrow A} f(x) | B \rangle}{\langle w_{B \rightarrow A} | B \rangle}. \end{aligned} \quad (13)$

It must be understood that the re-weighting is only valid when $w_{B \rightarrow A} < \infty$ .
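A minimal numerical sketch of Eqs. 12 and 13 on a four-state space follows. The reference distribution `p_B` and the Boltzmann-like weights `w` are assumed values chosen only for illustration. The average of the weights under $B$ gives the likelihood ratio, and the re-weighted average of an observable reproduces the average taken directly under $A$.

```python
import math

states = [0, 1, 2, 3]
p_B = [0.4, 0.3, 0.2, 0.1]                  # P(X | B I), assumed for illustration
w = [math.exp(-0.5 * x) for x in states]    # w_{B->A}(X), assumed finite weights

def average(values, probs):
    """Expectation of `values` under the discrete distribution `probs`."""
    return sum(v * p for v, p in zip(values, probs))

# Eq. 12: the likelihood ratio P(A|C)/P(B|C) is the mean weight under B.
ratio = average(w, p_B)

# Eq. 13, first line: re-weighted and normalized distribution P(X | A I).
p_A = [wi * pi / ratio for wi, pi in zip(w, p_B)]

# Eq. 13, second line: an observable averaged directly and by re-weighting.
f = [x**2 for x in states]
direct = average(f, p_A)
reweighted = average([wi * fi for wi, fi in zip(w, f)], p_B) / ratio
```

If any state with non-zero probability under $A$ had zero probability under $B$, the corresponding weight would be infinite and neither line of Eq. 13 could be evaluated from samples of $B$.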

4 Specific Applications

The formulas derived in the last section are well-known relations in statistical mechanics. When Boltzmann factors are inserted for the weights $w_{A \rightarrow B}(x)$ , Eq. 12 generates free energy perturbation and umbrella sampling formulas[9] and unambiguously identifies $P(X|A)$ . However, a few important differences from the standard development can be noticed in the above. First, the commutativity of thermodynamic cycles is perhaps not as widely appreciated as it should be. Although it is well known that $Z[\mathcal{C}]$ is a state function, because of its definition in Eq. 7, this shows that a sum of relative free energy differences around any closed loop of a thermodynamic cycle totals to zero with the caveat that Eq. 12 may only be applied from a larger phase space to a smaller. The same is not true of relative entropies (6), which give a sum dependent on the path taken. Instead, it is necessary to define $\mathcal{H}[\mathcal{C}\mathbb{S}_x|\mathbb{S}_x]$ as the state function. Also, the entropy definition of Eq. 6 is independent of changes in phase-space volume because $P(X|AI)$ transforms the same way as $P(X|BI)$ for an injective change of variables $X \rightarrow Y$ .

The physical problem of determining $P(A|X\mathcal{C}) / P(B|X\mathcal{C})$ has not yet been addressed. Because this function can be expressed as a ratio, we need only specify $w_{\Phi \rightarrow A}(X\mathcal{C})$ . Extending the concept of the partition function (Eq. 7), the weights can be interpreted as $w_{\Phi \rightarrow A}(X\mathcal{C}) = Z[AX\mathcal{C}] / Z[X\mathcal{C}]$ . We will present arguments for defining this function for several different types of problems, and find that the standard Boltzmann-factor form, $e^{-\beta_A U_A(x)}$ , is not a universal answer. The general idea will be to find a minimal set of relevant information $XY$ , implied by $X\mathcal{C}$ so that $A$ is conditionally independent from $\mathcal{C}$ when $XY$ is known, simplifying the weight to $w_{\Phi \rightarrow A}(X\mathcal{C}) = w_A(XY)$ . Comparing $w_A$ for different $XY$ then suggests an appropriate relative weight. Specific problems relating to changes in the symmetries of phase space will be addressed in a separate paper.

4.1 Constraints on Phase Space

A simple type of constraint is one that limits hypothesis space.

$\Omega$ : The set of allowed states is limited to those in which $\mathcal{C}$ is a member of the set, $\Omega$ .

This type of constraint can be used to limit investigations to interesting, or highly probable configurations as well as formulate decision problems. Adding $\Omega$ to a state results in a normalization

$P(\mathcal{C}|\Omega A) = \frac{P(\mathcal{C}|A) I(\mathcal{C} \in \Omega)}{\sum_{\{\mathcal{C}\}} P(\mathcal{C}|A) I(\mathcal{C} \in \Omega)}, \quad (14)$

where the indicator function, $I(\cdot)$ is one when the condition is satisfied, and zero otherwise.

Given some $\mathcal{C}$ , two constraints that both allow $\mathcal{C}$ should be equally likely, leading to the assignment

$w_{\Omega}(A|\mathcal{C}) = \frac{P(\Omega|\mathcal{C}A)}{P(\Phi|\mathcal{C}A)} = I(\mathcal{C} \in \Omega), \quad (15)$

for any $A$ that does not specifically reference $\Omega$ .

The free energy of the constrained system is the denominator of Eq. 14

$Z[A|\Omega]/Z[A] = \sum_{\{\mathcal{C}\}} P(\mathcal{C}|A) I(\mathcal{C} \in \Omega). \quad (16)$

This is consistent with the assignment in the appendix derived for the case $A = \emptyset$ (Eq. 2).
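Eqs. 14 and 16 amount to masking a distribution with the indicator function and renormalizing. The following sketch uses a hypothetical four-state distribution and a hypothetical constraint set; the helper name `constrain` is introduced here for illustration only.

```python
from fractions import Fraction

def constrain(p, omega):
    """Apply a phase-space constraint Omega to a discrete distribution.

    Returns the constrained distribution of Eq. 14 and its normalization,
    the partition function ratio of Eq. 16.
    """
    mass = sum(prob for state, prob in p.items() if state in omega)
    if mass == 0:
        raise ValueError("Omega excludes every state with non-zero probability")
    constrained = {
        state: (prob / mass if state in omega else Fraction(0))
        for state, prob in p.items()
    }
    return constrained, mass

# Hypothetical prior state of knowledge P(C | A) and constraint set Omega.
p_A = {"a": Fraction(1, 2), "b": Fraction(1, 4),
       "c": Fraction(1, 8), "d": Fraction(1, 8)}
omega = {"b", "c"}

p_constrained, z_ratio = constrain(p_A, omega)
```

The returned ratio is the total probability that the unconstrained system assigns to $\Omega$, which is why it can be read as an un-normalized likelihood for the constraint.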

Given two constraints, we can use Bayes' theorem to show

$P(\Omega_1|\Omega_2FI) = \frac{P(F\Omega_1\Omega_2|I)}{P(F\Omega_2|I)} = \frac{P(F|\Omega_1\Omega_2I)}{P(F|\Omega_2I)} P(\Omega_1|\Omega_2I), \quad (17)$

a simple theorem relating free energies in successively constrained spaces. If $F$ gives no information deciding whether $\mathcal{C}$ satisfies both $\Omega_1\Omega_2$ vs. only $\Omega_2$ , then the constrained likelihoods, $P(F|\Omega I)$ , should be equal. Because the principle of indifference gives $P(\mathcal{C}|\Omega I) = \frac{1}{|\Omega|}$ , and we have shown that $P(\Omega|I) = \text{const.} \times |\Omega|$ , it is possible to not only compare energetic hypotheses, such as $F$ vs. $G$ , but also constraints on phase space. Eq. 17 also shows that once information of this type ( $\Omega$ ) has been moved to the right-hand side, then we will not be able to eliminate it using Eq. 10. Rather, once $\Omega$ has been assumed, then subsequent addition of information will have to include $\Omega$ as part of $\mathcal{C}$ on the right-hand side of Eq. 5. To remove this information and get $P(F|I)$ would require $P(\Omega|FI)$ .

As an example, we consider the multi-ion binding site at a $K^+$ -ion channel selectivity filter (Fig. 2)[1]. Four cationic binding sites are distinguished, and it is assumed that the channel presents a high enough energetic penalty to exclude the possibility of anion occupancy. We do not expect multiple ion occupancy of the same site to be possible (or highly probable) because of mutual electrostatic repulsion and geometric features of the channel. This leads us to the fermion-like default statistics,

$\begin{aligned} \mathbb{S}_x = & \oplus[N_0, N_1 \cdot \oplus(X_1, X_2, X_3, X_4), N_2 \cdot \oplus(X_1X_2, X_1X_3, X_1X_4, X_2X_3, X_2X_4, X_3X_4), \\ & N_3 \cdot \oplus(X_2X_3X_4, X_1X_3X_4, X_1X_2X_4, X_1X_2X_3), N_4X_1X_2X_3X_4], \end{aligned} \quad (18)$

where $n$ particles may occupy $k$ states in $\binom{k}{n}$ ways for a total of $2^k$ elementary states of the system. In the absence of any other information, each state is equally likely.

$P(NX|\mathbb{S}_xI) = \frac{I(NX \in \mathbb{S}_x)}{2^4} \quad (19)$

This probability distribution factors into a product of independent distributions for each site, with equal probability for occupied and unoccupied states. The distribution is shown for reference in Fig. 3a.

The partition function is the number of states, $Z[\mathbb{S}_x] = 2^4$ (Eq. 16). Using the same equation, the partition function of a constrained system, for example at fixed $N$ , is $Z[N\mathbb{S}_x] = Z[\mathbb{S}_x] \sum_{X|N} P(NX|\mathbb{S}_xI) = \binom{k}{n}$ . The much debated 'degeneracy factor' for particle counting has already crept in as a consequence of the definition (18), since in the limit $k \gg n$ , $\binom{k}{n} \rightarrow k^n/n!$ . In the following discussion we will successively incorporate mechanical information including the average system energy, and mutual interactions between the ions and the channel.
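The state counting can be reproduced by direct enumeration. In this sketch each elementary state is represented as a tuple of four occupancy bits for sites S1 to S4; that representation is an assumption of the example, not notation from the text.

```python
from itertools import product
from math import comb

K_SITES = 4

# Each elementary state is a tuple of occupancies (0 or 1) for S1..S4.
states = list(product((0, 1), repeat=K_SITES))
Z_total = len(states)  # number of elementary states, 2^4

def Z_fixed_N(n):
    """Count the states with exactly n occupied sites."""
    return sum(1 for s in states if sum(s) == n)

counts = {n: Z_fixed_N(n) for n in range(K_SITES + 1)}

# Dilute limit: binom(k, n) approaches k^n / n! when k >> n.
k, n = 10_000, 3
dilute_ratio = comb(k, n) / (k**n / 6)  # 6 = 3!
```

The counts at fixed $N$ sum to the total number of states, and the ratio in the last line approaches one as $k$ grows at fixed $n$.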

4.2 Addition of Maximum Entropy-Type Information

The significance of $-\ln Z$ as the (non-dimensional) Gibbs free energy should be immediately recognized. To show this formally, define a hypothesis, $F$ , as

$F$ : The probability distribution of the system, given that $F$ is accepted, is the most likely observational distribution that obeys $\langle f(x)|FA \rangle = F$ for any $A$ .

Fig. 2 KcsA ion channel selectivity filter in its biological orientation (intracellular solution below) showing ion binding sites S1-S4. For visual clarity, two of the four identical monomer units are not shown. Physiological conventions for the potential difference, $\Delta V$ , and direction of outward positive current ( $g$ ) are indicated.

Rather than specifying an absolute state, this information is phrased in terms of the change in the probability distribution from an initial state before $F$ has been accepted. Representing this prior state of knowledge by $A$ , then according to the argument above, the least surprising distribution given information $\langle f(x)|FA \rangle$ is the maximum entropy distribution. This distribution should satisfy the mathematical condition,

$P(X|FA) = \operatorname{argmax} \mathcal{H}[FA|A] \text{ s.t. } \langle f(x)|FA \rangle = F. \quad (20)$

The unique solution to this condition is[33]

$P(X|FA) = \frac{P(X|A) \frac{dx_A}{dx_{FA}} e^{-\lambda f(x)}}{\int P(X|A) \frac{dx_A}{dx_{FA}} e^{-\lambda f(x)} dx_{FA}}, \quad (21)$

for some $\lambda(A)$ , proving that the hypothesis $F$ (Eq. 20) is logically equivalent to assuming the probability assignment of Eq. 21. The Jacobian $\frac{dx_A}{dx_{FA}}$ has been explicitly shown in this equation because of the importance of continuous functions in thermodynamics. In a discrete setting, it has the effect of dividing $P(X|A)$ to maintain its normalization. At this solution, the value of $\mathcal{H}$ is

$\mathcal{H}_{\max}[FA|A] = \lambda F + \ln \int P(X|A) \frac{dx_A}{dx_{FA}} e^{-\lambda f(x)} dx_{FA} \quad (22)$
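For a discrete space with unit Jacobian, Eqs. 21 and 22 can be verified numerically. The sketch below assumes a uniform prior over four states, an illustrative observable `f`, and a target average `F_target`; the multiplier is found by simple bisection, which is one possible solver rather than a method prescribed by the text.

```python
import math

def tilt(prior, f, lam):
    """Discrete form of Eq. 21 with unit Jacobian: returns (P(X|FA), normalizer)."""
    weights = [p * math.exp(-lam * fx) for p, fx in zip(prior, f)]
    z = sum(weights)
    return [w / z for w in weights], z

def mean_f(prior, f, lam):
    """Average of f under the tilted distribution."""
    post, _ = tilt(prior, f, lam)
    return sum(p * fx for p, fx in zip(post, f))

def solve_lambda(prior, f, target, lo=-50.0, hi=50.0):
    """Bisection on lambda; the average of f decreases as lambda increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_f(prior, f, mid) > target:
            lo = mid   # average still too large: need a larger multiplier
        else:
            hi = mid
    return 0.5 * (lo + hi)

prior = [0.25, 0.25, 0.25, 0.25]   # P(X | A), assumed uniform
f = [0.0, 1.0, 2.0, 3.0]           # illustrative observable f(x)
F_target = 1.0                     # assumed constraint value

lam = solve_lambda(prior, f, F_target)
posterior, z = tilt(prior, f, lam)

# Relative entropy of the solution, in the sign convention used here:
# sum over X of P(X|FA) ln [P(X|A) / P(X|FA)].
H = sum(p * math.log(q / p) for p, q in zip(posterior, prior))
```

With these inputs the constraint is satisfied to solver precision and `H` agrees with $\lambda F + \ln Z$, the discrete counterpart of Eq. 22.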

According to Bayes' theorem,

$P(F|XA) = \frac{P(X|FA) dx_{FA} P(F|A)}{P(X|A) dx_A} = \frac{P(F|A) e^{-\lambda f(x)}}{\int P(X|A) \frac{dx_A}{dx_{FA}} e^{-\lambda f(x)} dx_{FA}} \quad (23)$

To find the probability of $F$ from a given $X$ , we consider two cases. First, assume $X$ (and $I$ ) constitute the only data relevant to deciding the plausibility of $F$ . Then $P(F|XAI) = P(F|XI)$ and the terms involving $A$ must evaluate to a constant in the above, so that

$P(F|XAI) = \text{const}(I) e^{-\lambda f(x)} \quad \text{case 1.} \quad (24)$

According to the principle of indifference, the leading constant must not depend on $F$ , and thus is present to remind us that we are only able to compute likelihood ratios. We thus have the Boltzmann weight

$w_F(XA) = \frac{P(F|XA)}{P(\Phi|A)} = e^{-\lambda f(x)}. \quad (25)$

In the second case, we may split $A$ into two pieces of information, $B$ , determining some weighting over a set of hypotheses of which $F$ is a member, and other information, $A'|B$ , irrelevant to $F$ when $X$ is known. Obviously, maximum-relative entropy hypotheses fall into $A'$ , since they are making statements about $X$ and not other hypotheses. Therefore,

$P(F|XA'BI) = \text{const}(F;B) e^{-\lambda f(x)} \quad \text{case 2}. \quad (26)$

The information, $B$ , thus functions as a nuisance parameter[33] because different assumptions lead to different assignments of plausibility among the $F$ in the class that $B$ affects, and we have $w_F(XA) = L(F;A)e^{-\lambda f(x)}$ . Because this type of information leads naturally to consideration of alternate classes of hypotheses, we recognize this dividing information to be associated with a set, $\Omega$ , of hypothesis space. If $B$ re-weights relative likelihoods among alternate $F \in \Omega$ , then $F$ has effectively become a coordinate and $B$ an energy-type constraint. If $B$ re-weights all $F \in \Omega$ by the same amount, then its effect is to shift $P(\Omega)$ . We therefore arrive at the diagram picture of Fig. 1. Relative likelihoods between nodes can be computed via Eq. 10 (case 1) or Eq. 5 (case 2). Subgraphs of this structure represent thermodynamic cycles.

Sequentially using the maximum-relative entropy hypothesis, $F$ , requires special consideration of the order in which information is added. For this type of constraint, the probability distribution is found to be independent of the order of information addition. This can be verified by recursion, writing the result of applying Eq. 21 twice. Surprisingly, the relative entropies add to the state function Eq. 9. Starting from $\mathbb{S}_x$ and moving to $F\mathbb{S}_x$ gives

$\mathcal{H}[F\mathbb{S}_x|\mathbb{S}_x] = \sum_{X \in \mathbb{S}_x} P(X|F\mathbb{S}_x) \ln \frac{P(X|\mathbb{S}_x)}{P(X|F\mathbb{S}_x)}$

Adding $F\mathbb{S}_x \rightarrow FG\mathbb{S}_x$ gives

$\begin{aligned} \mathcal{H}[FG\mathbb{S}_x|F\mathbb{S}_x] &= \sum_{X \in \mathbb{S}_x} P(X|FG\mathbb{S}_x) \ln \frac{P(X|F\mathbb{S}_x)}{P(X|FG\mathbb{S}_x)} \\ &= \mathcal{H}[FG\mathbb{S}_x|\mathbb{S}_x] + \sum_{X \in \mathbb{S}_x} P(X|FG\mathbb{S}_x) \ln \frac{P(X|F\mathbb{S}_x)}{P(X|\mathbb{S}_x)} \\ &= \mathcal{H}[FG\mathbb{S}_x|\mathbb{S}_x] - \lambda F - \ln \frac{P(F|\mathbb{S}_x)}{P(\Phi|\mathbb{S}_x)} \\ &= \mathcal{H}[FG\mathbb{S}_x|\mathbb{S}_x] - \mathcal{H}_{\max}[F\mathbb{S}_x|\mathbb{S}_x]. \end{aligned}$

Therefore, when $F$ is a maximum-relative entropy hypothesis,

$\mathcal{H}_{\max}[FGA|A] = \mathcal{H}_{\max}[FGA|FA] + \mathcal{H}_{\max}[FA|A]. \quad (27)$

Jaynes[33] has used the functional Eq. 6 and Eq. 7 to derive a host of general relations for maximum entropy constraints including the computation of averages,

$\langle f_j(x)|\mathcal{C} \rangle = - \frac{\partial \ln Z[\{F_j\}\mathbb{S}_x]}{\partial \lambda_j},$

and the (Legendre transform of the) first law of thermodynamics

$d(-\ln Z[\{F_j\}\mathbb{S}_x]) = \sum_j \langle f_j(x)|\mathcal{C} \rangle d\lambda_j - \frac{\partial \ln Z}{\partial |\mathbb{S}_x|} d|\mathbb{S}_x|, \quad (28)$

from which the Gibbs relations,

$\frac{\partial^2 \ln Z[\mathcal{C}]}{\partial \lambda_i \partial \lambda_j} = \langle f_i(x) f_j(x)|\mathcal{C} \rangle - \langle f_i(x)|\mathcal{C} \rangle \langle f_j(x)|\mathcal{C} \rangle = - \frac{\partial \langle f_j(x)|\mathcal{C} \rangle}{\partial \lambda_i} = - \frac{\partial \langle f_i(x)|\mathcal{C} \rangle}{\partial \lambda_j}, \quad (29)$

may be found. We find that it is more appropriate to phrase these relationships in terms of Legendre transforms of the entropy functional, $\mathcal{F} \equiv \sum_j \lambda_j \langle f_j(x) \rangle - \mathcal{H}$ for the specific problems considered in Sec. 4.4. This distinction was unnecessary before because $\mathcal{F} = -\ln Z$ for distributions derived strictly from constraints of the maximum-entropy form.

A central maximum entropy constraint in statistical mechanics is a constraint on average energy. We label this constraint by $\beta$ . A simplified energy function is constructed for the ion channel system by including a mutual Coulomb repulsion between the ions, constrained to the vertical axis and spaced at $3.5\text{\AA}$ . We also assume a simple stabilization energy for each ion from the protein, $E^0 \approx -115$ kcal/mol. Abbreviating $NX$ to $X$ , the energy function is

$E(X) = E(n, x) = \frac{1}{2} \sum_{i \neq j} \frac{q^2}{4\pi\epsilon_0 |x_i - x_j|} I(X_i X_j) + \sum_i E^0 I(X_i). \quad (30)$

Placing this constraint on the average system energy at constant $N$ leads to the well-known canonical distribution with partition function

$\begin{aligned} Z[N\beta\mathbb{S}_x] &= \frac{P(N\beta\mathbb{S}_x|I)}{P(\Phi|I)P(\varphi|I)} \\ &= Z[N\mathbb{S}_x] \sum_X P(X|N\mathbb{S}_x) w_\beta(NX) = \binom{4}{n} \sum_{X|N} \binom{4}{n}^{-1} e^{-\beta E(X)}. \end{aligned}$

Here, it can be seen that the probability for $N\mathbb{S}_x$ , $\binom{4}{n} P(\varphi|I)$ (51), cancels in the expression so that the increment $Z[N\beta\mathbb{S}_x]/Z[N\mathbb{S}_x]$ is an average according to Eq. 5. Removing the constraint on $N$ also leads to the multicanonical ensemble in the same way, viz. $Z[\beta\mathbb{S}_x] = \sum_N Z[N\beta\mathbb{S}_x]$ (Eq. 16), $P(N|\beta\mathbb{S}_x) = Z[N\beta\mathbb{S}_x]/Z[\beta\mathbb{S}_x]$ .
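The canonical sum can be evaluated explicitly for the sixteen occupancy states. The sketch below follows Eq. 30 with unit charges and the site spacing and stabilization energy quoted above. The Coulomb constant in kcal Å/mol, the absence of dielectric screening, and the value of $\beta$ (taken near room temperature) are assumptions of this example, so the numbers are indicative only.

```python
import math
from itertools import product

COULOMB = 332.0637    # kcal*Angstrom/(mol*e^2); assumed unscreened unit charges
SPACING = 3.5         # Angstrom between adjacent sites (from the text)
E0 = -115.0           # kcal/mol stabilization energy per ion (from the text)
BETA = 1.0 / 0.5925   # mol/kcal; assumed value of 1/(k_B T) near 298 K

STATES = list(product((0, 1), repeat=4))  # occupancy bits for sites S1..S4

def energy(state):
    """Eq. 30: pairwise repulsion between occupied sites plus stabilization."""
    occupied = [i for i, occ in enumerate(state) if occ]
    repulsion = sum(
        COULOMB / (SPACING * (j - i))
        for a, i in enumerate(occupied)
        for j in occupied[a + 1:]       # each unordered pair counted once
    )
    return repulsion + E0 * len(occupied)

def Z_canonical(n, beta=BETA):
    """Sum of Boltzmann weights over states with exactly n ions."""
    return sum(math.exp(-beta * energy(s)) for s in STATES if sum(s) == n)

def occupancy_given_N(n, beta=BETA):
    """Canonical distribution P(X | N beta S_x) over states with n ions."""
    z = Z_canonical(n, beta)
    return {s: math.exp(-beta * energy(s)) / z for s in STATES if sum(s) == n}

p_two_ions = occupancy_given_N(2)
most_probable = max(p_two_ions, key=p_two_ions.get)
```

Setting `beta=0.0` recovers the pure counting result $\binom{4}{n}$, and at fixed $N = 2$ the largest weight falls on the state with the two ions at the outermost sites.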

In either case, we can assign the parameter $\beta$ the meaning of, “there exists a physical mechanism that decreases the likelihood of the system being in a high-energy state.” To separate these energy states, we introduce a constraint on the energy, denoted by $E$ . Thus, if a system were allowed to choose its own energy state¹, the force would bias this choice according to $P(E\beta|A)/P(E\Phi|A) = e^{-\beta E}$ . We can set this bias, $\beta$ , to give a reference system with known properties by exactly balancing its internal tendency toward higher energy, $P(E+dE|A)/P(E|A)e^{-\beta dE} = 1$ . This implies that $\beta$ should solve $\beta = \frac{\partial}{\partial E} \ln Z[EA]$ for a reference system with known energy, for example a thermometer in which energy is easily measured by size expansion. Because our reference thermometer is constantly exchanging energy with the environment, we usually observe its average energy, and $\beta$ should be chosen such that $\langle E|\beta A \rangle = -\frac{\partial}{\partial \beta} \ln Z[\beta A]$ . The difference between these values (maximum vs. average energy) is important for small systems, but becomes negligible in the limit of large system sizes.[8] Using either of these forces in the present system mimics the effect of allowing energy exchange between the thermometer at this state and the system. This explains the convention of identifying temperature with the dilation of a thermometer and its connection to the statical force, $\beta$ .

Another constraint we may add is the inclusion of an external force on the total number of ions, $\mu$ . Because the $n$ ions are more likely to choose an environment with lower energy, $-\mu n$ , this changes the probability of ion occupancy by $\frac{P(\mu|N)}{P(\Phi|N)} = e^{\beta \mu n}$ . The multiplier $\beta$ appears because we want to express $\mu$ in energy units. Just as above, we can choose the chemical potential, $\mu$ , to give a reference system with known properties by balancing its internal energy change on ion addition using the choice $(\beta\mu) = -\ln \frac{Z[(N+1)A]}{Z[NA]}$ .[5]

We can mimic the effect of allowing $K^+$ transfer from a bulk 100 mM KCl solution to the present system (with the corresponding $Cl^-$ moved to a similar environment and its contribution neglected) by choosing $\mu_{K^+} = -81 + \beta^{-1} \ln 0.1$ kcal/mol.[16] Without the constraint on $N$ , the system was effectively allowed to exchange particles with vacuum. The combination of both constraints, which we refer to as $F = \beta\mu$ , is shown in panel (c) of Fig. 3. The preference for the separated state ( $X_1 X_4$ ) in this model shows the effect of mutual ion repulsion.

¹ Alternatively, to avoid anthropomorphic terminology, if the system energy is not constrained and we compare the maximum entropy $P(E|A)$ .

Fig. 3 Ion occupancy distribution in successively complex models.

4.3 Addition of Variables (Generalized Ensemble Methods)

The theoretical background in Sec. 3 allows us to go further than the most common relations of thermodynamics summarized in the last two subsections. In particular, the choice of coordinate space, $\mathbb{S}_x$ , is no different than any other constraint except that it is almost never moved to the left-hand side to form quantities such as $P(F\mathbb{S}_x|I)$ and comparisons between states are carried out almost exclusively with a fixed $\mathbb{S}_x$ . The addition of coordinates is associated with the transition from canonical to multicanonical ensembles. It has served as the starting point for some very difficult reading in thermodynamics textbooks involving over/under counting and (in)distinguishability arguments. Our definition of $P(\Omega|I)$ (2) counts each ‘state of knowledge’ once, and thus directly accounts for (in)distinguishability factors. As will be shown in a subsequent paper, this result does not require input from quantum mechanics other than a specification of the allowed states of the system.

Since the rules have already been given above, we proceed to an example, addition of protein-ion interactions by assuming a set of protein conformational states. This leads to the conception of a generalized ensemble. A simplistic example is provided by assuming (in addition to an open state, $O$ ) two ‘C-type’ inactivated states in which a pinching motion of the pore prevents occupancy at site 2 (state $I_1$ ) or sites 2 and 3 (state $I_2$ )[13]. These states are assumed to be mutually exclusive and exhaustive, so that all conformational states, $Y$ , are members of the space $\Omega = \oplus(O, I_1, I_2)$ . Before any coupling is assumed, the total number of occupancy states, $|\mathbb{S}_x|$ , is multiplied $|\Omega|$ times to create the product space, $\Omega \times \mathbb{S}_x$ . When the conformational state is known, $\Omega$ is irrelevant, and we can intuitively use the knowledge of its coupling to $X$ (denoted by $G$ ) to guess a form for $P(X|YFG\mathbb{S}_x)$ . This is an instance where intuition runs ahead of logical reasoning, and it is difficult to see the logical steps required to arrive at this result.

Because $X$ is coupled to $Y$ , the conformation, $Y$ , is also coupled to the occupancy state, $X$ , and it is necessary to know the full distribution, $P(XY|F\mathbb{S}_x\Omega)$ . This can be rationally arrived at using our thermodynamic diagram (Fig. 1). We could first add a non-interacting space for $\Omega$ to get $P(XY|F\mathbb{S}_x\Omega) = P(X|F\mathbb{S}_x)P(Y|\Omega)$ ( $F\mathbb{S}_x \rightarrow F\mathbb{S}_x\Omega$ ) and then add information on their coupling, $F\mathbb{S}_x\Omega \rightarrow FG\mathbb{S}_x\Omega$ . The distribution of $X$ changes when $G\Omega$ is known, since $G$ places constraints on both $X$ and $Y$ .

$P(X|YFG\mathbb{S}_x) = \frac{P(XG|YF\mathbb{S}_x)}{P(G|YF\mathbb{S}_x)} = \frac{P(XG|YFG\mathbb{S}_x)}{\sum_{X \in \mathbb{S}_x} P(XG|YFG\mathbb{S}_x)} \quad (31)$

Is there more to learn from this result? In Sec. 3 we showed that any order of adding the information leads to equivalent results, as long as free energy differences are computed in the direction of increasing constraints. Given information $FG\Omega\mathbb{S}_x$ (or $FGY\mathbb{S}_x$ ), we are able to write down the distribution for $XY$ simply by maximizing the entropy $\mathcal{H}[FG\Omega\mathbb{S}_x|\Omega\mathbb{S}_x]$ (or $\mathcal{H}[FGY\mathbb{S}_x|\mathbb{S}_x]$ ). These generate conditional distributions given information of the type: ‘the system is in a given coarse state.’ The unanswered question is what the distribution over the coarse states looks like. To answer this, we consider the process $F\mathbb{S}_x \rightarrow F\mathbb{S}_x\Omega$ . The mutually exclusive and exhaustive condition, $\Omega$ , defines a space for the coarse coordinates, $Y$ . However, without this space, we may still calculate $F\mathbb{S}_x \rightarrow YFG\mathbb{S}_x$ ,

$\begin{aligned} \frac{Z[FGY\mathbb{S}_x]}{Z[F\mathbb{S}_x]} &= \frac{P(FGY\mathbb{S}_x|I)}{P(\Phi|I)P(F\mathbb{S}_x|I)} \\ &= \frac{P(GY|F\mathbb{S}_x I)}{P(\Phi|I)} = \sum_{X \in \mathbb{S}_x} \frac{P(GY|FXI)}{P(\Phi|I)} P(X|F\mathbb{S}_x). \end{aligned}$

This could also have been arrived at through the intermediary path $F\mathbb{S}_x \rightarrow YF\mathbb{S}_x \rightarrow YFG\mathbb{S}_x$ . The probability for $Y$ in some mutually exclusive and exhaustive set is a sum of these

$\begin{aligned} \frac{Z[FG\Omega\mathbb{S}_x]}{Z[F\mathbb{S}_x]} &= \frac{P(FG\Omega\mathbb{S}_x|I)}{P(\Phi|I)P(F\mathbb{S}_x|I)} \\ &= \sum_{Y \in \Omega} \frac{P(GY|F\mathbb{S}_x I)}{P(\Phi|I)} = \sum_{Y \in \Omega} \frac{Z[FGY\mathbb{S}_x]}{Z[F\mathbb{S}_x]}. \end{aligned}$

We find again that the partition function of Eq. 7 has a direct probability interpretation as an un-normalized probability.

This idea forms the basis for understanding the free energy difference as a log-likelihood ratio between two Hamiltonians as expressed by Eq. 8 and for extending a canonical ensemble into a multi-canonical one. To perform the extension, define some space over which a previously fixed parameter may vary, and then integrate the partition function over this space. Given a set of mutually exclusive and exhaustive coarse states, we may write down the micro/multi split using

$P(XY|F\mathbb{S}_x\Omega) = P(X|YFG\mathbb{S}_x)P(Y|F\mathbb{S}_x\Omega) \quad (32)$

and the coarse probabilities using either of

$P(Y|F\mathbb{S}_x\Omega) = \frac{\sum_{X \in \mathbb{S}_x} P(XYFG|\mathbb{S}_x\Omega)}{\sum_{Y \in \Omega} \sum_{X \in \mathbb{S}_x} P(XYFG|\mathbb{S}_x\Omega)} \quad (33)$

$= \frac{Z[YFG\mathbb{S}_x]}{\sum_{Y \in \Omega} Z[YFG\mathbb{S}_x]}. \quad (34)$

The denominators of the second and third expressions correspond to the free energies for processes $\mathbb{S}_x\Omega \rightarrow F\mathbb{S}_x\Omega$ , and $\Phi \rightarrow F\mathbb{S}_x\Omega$ , respectively. This argument holds when $Y$ denotes any type of constraint, and the generalized ensemble method is an example of the above when $Y$ are alternate Hamiltonians.[21,45]

As an aside, the interpretation of Eq. 6 given in the introduction implies that the relative entropy addition $\mathbb{S}_x \rightarrow \mathbb{S}_x\Omega$ (as well as $F\mathbb{S}_x \rightarrow F\mathbb{S}_x\Omega$ ) is zero. This is a reasonable result in the following sense. If some distribution over $X \in \mathbb{S}_x$ is assumed, and new observations of a coordinate, $Y$ , became available that were nevertheless completely random, then $F\mathbb{S}_x\Omega$ does not have any additional informational value, relative to $F\mathbb{S}_x$ . This is contrary to the behavior of the thermodynamic entropy because the thermodynamic entropy increases whenever states are added to the system, even if they are irrelevant, leading to nonzero entropy for nuclear spin systems at zero Kelvin. Instead of this behavior, it seems preferable to define the entropy relative to the completely uniform distribution, as we have done here. In this case, the probability for occupying degenerate (but distinguishable) states increases because of the counting conventions of the free energy functional.

Incorporating the conformational state information, $G\Omega$ , into the ion channel system leads to the results shown in panels (b, no energetic constraint) and (d, constrained chemical potential and energy) of Fig. 3. Because fewer states are available to the system in conformations $I_1$ and $I_2$ , they appear less often. Colloquially, they are said to be entropically un-favorable. In our derivation, this entropy decrease came about from adding information $G$ . This result could have been derived either as a consequence of formally reducing the number of occupancy states (as we have done) or by assuming a very large energy for un-allowed occupancies at $I_1$ and $I_2$ . The statement, ‘ $I_2$ is entropically unfavorable’ is therefore expressing the fact that the accessible volume for $X$ has decreased from some previously available volume upon changing $\Omega$ to $I_2$ or upon adding information $I_2G$ . The conventional thermodynamic entropy implicitly defines this previously available volume, regardless of whether such a state physically exists. This dependence is made explicit in the present definition of a relative entropy.
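The coarse-state probabilities of Eq. 34 can be illustrated for the case with no energetic constraint by counting the occupancy states that each conformation allows. The sketch assumes that the coupling $G$ acts purely as an indicator that forbids occupancy of the pinched sites, and it encodes sites by zero-based index; both are choices made for this example.

```python
from fractions import Fraction
from itertools import product

STATES = list(product((0, 1), repeat=4))  # occupancy bits for sites S1..S4

# Zero-based indices of sites that each conformation closes (assumed coupling G):
# O blocks nothing, I1 blocks site 2, I2 blocks sites 2 and 3.
BLOCKED = {"O": (), "I1": (1,), "I2": (1, 2)}

def allowed(state, conformation):
    """True when no blocked site is occupied in this conformation."""
    return all(state[i] == 0 for i in BLOCKED[conformation])

# Partition function of each coarse state: the number of allowed occupancy states.
Z = {y: sum(1 for s in STATES if allowed(s, y)) for y in BLOCKED}

# Eq. 34: coarse probabilities are normalized partition functions.
total = sum(Z.values())
P_Y = {y: Fraction(z, total) for y, z in Z.items()}
```

Under these assumptions the open state carries four times the weight of $I_2$, which is the counting content of the statement that the inactivated conformations are entropically unfavorable.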

4.4 Conditional Maximum-Entropy Information

If, instead of the energy function assumed for $F$ in the above example, we had assumed some experimentally known probability distribution over $X$ , then adding information $G$ becomes qualitatively different. In order to not interfere with the distribution over $X$ , the information $F$ must take priority over any other constraints we may add to the problem. However, this does not prevent us from coupling $Y$ to $X$ using the conventional maximum-relative entropy hypothesis,

$G$ : The probability of $XY$ , given that $G$ is accepted, is the most likely observational distribution that obeys $\langle g(y;x)|AXG \rangle = G(X)$ for any $AX$ .

This is because the entropy functional decomposes as

$\begin{aligned} \mathcal{H}[AG\mathbb{S}_x\Omega|A\mathbb{S}_x\Omega] &= \sum_{XY} P(XY|AG\mathbb{S}_x\Omega) \ln \frac{P(XY|A\mathbb{S}_x\Omega)}{P(XY|AG\mathbb{S}_x\Omega)} \\ &= \sum_{XY} P(XY|AG\mathbb{S}_x\Omega) \left[ \ln \frac{P(X|A\mathbb{S}_x\Omega)}{P(X|AG\mathbb{S}_x\Omega)} + \ln \frac{P(Y|AX\Omega)}{P(Y|AGX\Omega)} \right] \\ &= \mathcal{H}_X[AG\mathbb{S}_x\Omega|A\mathbb{S}_x] + \sum_X P(X|AG\mathbb{S}_x\Omega) \mathcal{H}_Y[AGX\Omega|AX\Omega]. \end{aligned} \quad (35)$

The sums in this section are all taken to be over $X \in \mathbb{S}_x$ and $Y \in \Omega$ without loss of generality since we choose $\mathbb{S}_x \times \Omega$ to be the set of all $XY$ relevant to deciding $A$ or $G$ . The last term in the expansion above is a conditional entropy, which is a functional of $P(Y|AGX\Omega)$ and depends on $X$ . Because each conditional distribution can be chosen independently from the others and from $P(X|AG\mathbb{S}_x\Omega)$ , the entropy of each one is independently maximized when $\mathcal{H}[AG\mathbb{S}_x\Omega|A\mathbb{S}_x]$ is maximum. However, the presence of $Y$ allows $\mathcal{H}_X[AG\mathbb{S}_x\Omega|A\mathbb{S}_x]$ to differ from $\mathcal{H}_X[A\mathbb{S}_x|A\mathbb{S}_x] = 0$ , since $P(X|AG\mathbb{S}_x\Omega) = \sum_Y P(XY|AG\mathbb{S}_x\Omega)$ . For these two to be equal in general requires that $P(X|AG\mathbb{S}_x\Omega) = P(X|A\mathbb{S}_x)$ – i.e. that the distribution of $X$ is not dependent on the information $G\Omega$ when $A$ is present.

Because we want to specify the marginal distribution of $X$ directly, it is convenient to denote this information as the compound hypothesis,

$F_X$ : The probability distribution of $X$ is determined by information $F_X$ and unchanged by information $G\Omega$ .

When this hypothesis is in place, we will have $P(X|F_XG\mathbb{S}_x\Omega) = P(X|F_X\mathbb{S}_x)$ . Bayes’ theorem says that we must also have $P(G\Omega|XF_X\mathbb{S}_x) = P(G\Omega|F_X\mathbb{S}_x)$ , implying $w_{G\Omega}(F_XX) = 1$ . Effectively, the $Y$ have become ‘imaginary states’ to the system in the sense that there is no free energy change for $F_X\mathbb{S}_x \rightarrow F_XG\mathbb{S}_x\Omega$ . Although there is no change to $\mathcal{H}_X$ or the distribution of $X$ , maximizing (35) results in

$P(Y|F_XGX\Omega) = \frac{P(Y|F_XX\Omega)e^{-\lambda g(y;x)}}{\sum_{Y \in \Omega} P(Y|F_XX\Omega)e^{-\lambda g(y;x)}}, \quad (36)$

an expression reminiscent of the transition probability for a Markov process. The conditional entropy is

$\begin{aligned} \mathcal{H}_Y[F_XGX\Omega|F_XX\Omega] &= \sum_{Y \in \Omega} P(Y|F_XGX\Omega) \ln \frac{P(Y|F_XX\Omega)}{P(Y|F_XGX\Omega)} \\ &= \langle \lambda g(y;x) | F_XGX\Omega \rangle + \ln \sum_{Y \in \Omega} P(Y|F_XX\Omega) e^{-\lambda(x)g(y;x)}, \end{aligned}$

and we define as usual

$w_G(XY) = \frac{P(G(X)|XYI)}{P(\Phi|XYI)} = e^{-\lambda g(y;x)}.$

These considerations are sufficient to fill out the thermodynamic cycle when $F_X$ is assumed, as has been done in the left half of Fig. 4.
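As an illustration, Eq. (36) and the conditional entropy that follows it can be evaluated numerically. The sketch below is a minimal example under stated assumptions: the two occupancy states, the three conformations, the uniform prior, the cost table, the coupling function `g`, and the multiplier `lam` are all hypothetical and are not taken from the ion channel data.

```python
import math

def conditional_maxent(prior_y_given_x, g, lam):
    """Return P(Y|X) proportional to prior(Y|X) * exp(-lam * g(y, x)), as in Eq. (36)."""
    posterior = {}
    for x, prior_row in prior_y_given_x.items():
        weights = {y: p * math.exp(-lam * g(y, x)) for y, p in prior_row.items()}
        z = sum(weights.values())  # normalization is computed separately for each x
        posterior[x] = {y: w / z for y, w in weights.items()}
    return posterior

def conditional_entropy(post_row, prior_row):
    """Relative entropy H_Y = sum_Y P ln(prior / P), which is never positive."""
    return sum(p * math.log(prior_row[y] / p) for y, p in post_row.items() if p > 0)

# Hypothetical toy problem: two occupancy states X, three conformations Y.
prior = {
    "x0": {"open": 1 / 3, "I1": 1 / 3, "I2": 1 / 3},
    "x1": {"open": 1 / 3, "I1": 1 / 3, "I2": 1 / 3},
}
cost = {"open": 0.0, "I1": 1.0, "I2": 2.0}

def g(y, x):
    # Illustrative coupling: the conformational cost is doubled in state x1.
    return cost[y] * (2.0 if x == "x1" else 1.0)

lam = 1.5
post = conditional_maxent(prior, g, lam)
for x in prior:
    mean_g = sum(p * g(y, x) for y, p in post[x].items())
    log_z = math.log(sum(p * math.exp(-lam * g(y, x)) for y, p in prior[x].items()))
    h = conditional_entropy(post[x], prior[x])
    # The two printed numbers agree: H_Y = lam * <g> + ln Z for each x.
    print(x, h, lam * mean_g + log_z)
```

Because the normalization is carried out for each $X$ separately, the marginal distribution of $X$ is untouched by this step, which is the content of hypothesis $F_X$ .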

Fig. 4 Reaction diagram for adding conditional maximum entropy information. Partition functions, determined by likelihood ratios for each transition, are written out for each state. For the ‘forward’ process $F_X S_x \rightarrow F_X G S_x \Omega$ , there is a ‘reverse’ process $F_Z S_x \rightarrow F_Z G^* S_x \Omega$ signifying the dual maximum conditional entropy problem.

Imposing the distribution among ion occupancy states given in Ref. [1] (shown for reference in Fig. 3f) as $F_X$ , application of this procedure to determine the conformational equilibrium shows that the channel is almost always in the open state due to the high probability for occupancy of S2. The probabilities for $I_1$ and $I_2$ are $2.3 \cdot 10^{-4}$ and $8 \cdot 10^{-6}$ . Although $X|F_X$ is independent from $G\Omega$ , knowledge of $Y$ is still informative for $X$ , as

$P(X|YF_XG\mathbb{S}_x) = \frac{P(X|F_X\mathbb{S}_x)P(Y|F_XGX\Omega)}{\sum_{X'} P(X'|F_X\mathbb{S}_x)P(Y|F_XGX'\Omega)}. \quad (37)$

Using this method of inference, the occupancy distribution in the open state is shown in Fig. 3e. There is a very slight increase in occupancy at S2 and a decrease at S3, but the effect is small because the open structure is dominant. Note that our assumption that the free energies of Ref. [1] are averages over the conformational states was chosen for illustration and may be incorrect.
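The inversion in Eq. (37) is an ordinary application of Bayes' theorem and can be sketched in a few lines. The state labels and probabilities below are hypothetical placeholders chosen for illustration; they are not the values of Ref. [1].

```python
def posterior_x_given_y(p_x, p_y_given_x, y):
    """Eq. (37): P(X|Y) = P(X) P(Y|X) / sum_X' P(X') P(Y|X')."""
    joint = {x: p_x[x] * p_y_given_x[x][y] for x in p_x}
    norm = sum(joint.values())
    return {x: v / norm for x, v in joint.items()}

# Hypothetical numbers, not taken from Ref. [1].
p_x = {"S2_occupied": 0.7, "S3_occupied": 0.3}
p_y_given_x = {
    "S2_occupied": {"open": 0.999, "inactivated": 0.001},
    "S3_occupied": {"open": 0.990, "inactivated": 0.010},
}

p_x_open = posterior_x_given_y(p_x, p_y_given_x, "open")
p_x_inact = posterior_x_given_y(p_x, p_y_given_x, "inactivated")
print(p_x_open)
print(p_x_inact)
```

With these assumed inputs, conditioning on the dominant open conformation shifts the occupancy distribution only slightly, while conditioning on the rare conformation shifts it strongly.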

We argue that addition of this type of conditional information is central to non-equilibrium statistical mechanics. To derive an ensemble of trajectories, we add all possible transitions, $Y$ , originating from each state, $X$ . The initial state and its transitions are linked by some information, $G$ , which determines the distribution of $Y$ given $X$ . This constraint determines a maximum entropy transition probability density, as considered in differential form in Refs. [6,40]. The hypothesis $F_X$ states that what we know about the starting distribution is completely determined by $F_X$ and not by any possible, but unknown, future events. It is required for the process to be non-anticipating in the sense that no information about processes we may carry out in the future ( $G\Omega$ ) is available from $X$ .

There is a great deal of literature on methods for non-equilibrium statistical mechanics. Because this paper is intended to show a new way of approaching problems, we will confine ourselves to deriving two main results of the non-equilibrium theory. The first is the more recent development of fluctuation formulas for irreversible entropy. Fig. 4 displays the duality between fixing $F_X$ at the initial time and fixing its propagated distribution $F_Z$ . In setting up an inference problem for $Y$ starting from $F_X G X$ , the distribution of $Y$ is given by (36). If this distribution is used to determine $F_Z$ using $P(Z|F_X G \mathbb{S}_x) = \sum_{XY} P(Y|F_X G X \Omega) P(X|F_X \mathbb{S}_x) I(Z = Z(Y))$ , some information loss occurs when $F_X$ is discarded and only $F_Z$ and information constraining the transitions between states, $G$ , are retained. Assuming the transitions, $Y$ , specify both end-points $X, Z$ , the distribution of $Y$ carries the complete information for this process. Using the information loss metric[3,33],

$\begin{aligned} L &= -\mathcal{H}[F_X G \mathbb{S}_x \Omega | F_Z G^* \mathbb{S}_x \Omega] \\ &= \sum_Y P(Y|F_X G \mathbb{S}_x \Omega) \ln \frac{P(Y|F_X G \mathbb{S}_x \Omega)}{P(Y|F_Z G^* \mathbb{S}_x \Omega)} \\ &= \left\langle \ln \frac{P(Y|F_X G X \Omega)}{P(Y|F_Z G^* Z \Omega)} + \ln \frac{P(X|F_X \mathbb{S}_x)}{P(Z|F_Z \mathbb{S}_x)} \right\rangle \\ &= \mathcal{H}_Z[F_Z \mathbb{S}_x | \mathbb{S}_x] - \mathcal{H}_X[F_X \mathbb{S}_x | \mathbb{S}_x] + \left\langle \ln \frac{P(Y|F_X G X \Omega)}{P(Y|F_Z G^* Z \Omega)} \right\rangle. \end{aligned} \quad (38)$

The averaging is taken in the forward direction, and so $L \geq 0$ evidently represents the amount by which the real distribution $F_X G \rightarrow XY F_X G$ contains information not present in a distribution guessed from $F_Z G^*$ . Note that if $G$ allows only one-to-one $XZ$ , the transitions are deterministic, and zero information is lost. More generally, if forward and backward inference directions yield the same joint distribution so that $F_X G = F_Z G^*$ , then there is no way to discern the direction of time's arrow and no information is dissipated.
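A small numerical check of Eq. (38) may help fix ideas. The sketch below assumes a three-state system, a uniform reference distribution (so that differences of $\mathcal{H}$ reduce to differences of Shannon entropies), and a single step in which each transition $Y$ is identified with the pair $(X, Z)$ . The kernels `T` and `perm`, and the choice of reusing `T` as the reverse guess, are hypothetical.

```python
import math

def info_loss(p_x, T, R):
    """L = sum_{x,z} P(x) T[x][z] ln( P(x) T[x][z] / (P(z) R[z][x]) ), cf. Eq. (38)."""
    n = len(p_x)
    p_z = [sum(p_x[x] * T[x][z] for x in range(n)) for z in range(n)]
    loss = 0.0
    for x in range(n):
        for z in range(n):
            fwd = p_x[x] * T[x][z]
            if fwd > 0.0:
                loss += fwd * math.log(fwd / (p_z[z] * R[z][x]))
    return loss, p_z

def shannon(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

p_x = [0.7, 0.2, 0.1]

# Stochastic forward kernel; the reverse guess reuses the same kernel (one possible G*).
T = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
loss_stochastic, p_z = info_loss(p_x, T, T)

# Last line of Eq. (38): entropy change of the marginals plus the averaged kernel log-ratio.
flux_term = sum(p_x[x] * T[x][z] * math.log(T[x][z] / T[z][x])
                for x in range(3) for z in range(3))
decomposed = shannon(p_z) - shannon(p_x) + flux_term

# One-to-one (permutation) dynamics with the inverse map as the reverse guess.
perm = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
perm_inv = [[perm[z][x] for z in range(3)] for x in range(3)]
loss_deterministic, _ = info_loss(p_x, perm, perm_inv)

print(loss_stochastic, decomposed, loss_deterministic)
```

In this toy setting the direct and decomposed forms of $L$ agree, $L$ is non-negative for the stochastic kernel, and it vanishes for the one-to-one map.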

The above relations are purely statistical, and have been stated in terms of maximum entropy constraints for forward, $G$ , and reverse, $G^*$ , inference problems. They are generally valid for any choice of $G^*$ . In derivations of the fluctuation theorem,[37] a particular choice of $G^*$ is made corresponding to time-reversed equations of motion. The statistical perspective expressed here shows that this operation is confined to the choice for $G^*$ , and provides a suggestion as to the informational role of time-reversal. For example, the forward constraints are consistent with the Langevin equation,

$P(Z|X) \propto e^{-(\Delta p - F_X)^2/2\sigma^2 - (\Delta p - F)v_X \beta/2},$

so that the momentum change ( $\Delta p = p_Z - p_X$ ) is normally distributed about $F - \gamma v$ to yield a Boltzmann distribution. The correct choice of $G^*$ is given by changing $\beta$ to $-\beta$ in the above equation. The equation for Brownian motion can be similarly derived by constraining $\Delta x^2$ with $\sigma^{-2}/2$ and $-\Delta x F/2$ with $\beta$ . In both of these equations, the same set of forward transitions is used for $G^*$ , but the signs of the Lagrange multipliers constraining the fluxes are reversed. We can thus intuitively see that reversing the sign of externally applied forces gives the correct fluctuation theorems using the information loss metric (Eq. 38). This relation is valid in transient stochastic dynamics, and allows for entropy to increase both by increasing the entropy of the distribution (first part of Eq. 38) and by the presence of irreversible fluxes (last term of Eq. 38). Such an informational perspective is required for understanding entropy increase for processes which do not have time-reversal symmetry, but nevertheless have well-defined and reproducible behavior.

Retaining only information about the end-points of a path $\Gamma = X_1 X_2 \dots X_N$ , from $F_1$ to $F_N$ , we denote $\Gamma_i = X_1 \dots X_i$ and $\Gamma^i = X_i \dots X_N$ . We also assume constant $\mathbb{S}_x$ and conditional independence, $P(X_{i+1}|G\Gamma_i F_1) = P(X_{i+1}|G\Gamma_i)$ . If the transitions are known from $\Gamma$ , the total dissipation is

$dS/k_B \equiv L = \mathcal{H}_N[F_N \mathbb{S}_x | \mathbb{S}_x] - \mathcal{H}_1[F_1 \mathbb{S}_x | \mathbb{S}_x] + \left\langle \sum_{i=1}^{N-1} \ln \frac{P(X_{i+1}|G\Gamma_i \mathbb{S}_x)}{P(X_i|G^* \Gamma^{i+1} \mathbb{S}_x)} \right\rangle, \quad (39)$

where $k_B$ is the Boltzmann constant. This path functional is in agreement with the thermodynamic entropy production given by the ratios of forward and reverse path probabilities[11,24,37] as well as an expression for entropy production deduced from mechanical considerations[6] when $\ln P(X_{i+1}|G\Gamma_i\mathbb{S}_x)/P(X_i|G^*\Gamma^{i+1}\mathbb{S}_x) = -\lambda g(x_{i+1}, x_i)$ , with $g$ a generalized flux. We have derived this result from the direction of information propagation,[26] and no special treatment has been given to the multiplier, $\beta$ , defining the externally applied temperature. This derivation also avoids the complications associated with defining a steady-state. A curious feature is that it does not make specific reference to heat. This may be explained by noting that the transitions associated with fluxes, $g$ , are probabilistic and represent interaction with an external system. These transitions may add or remove energy from our system, while the external system remains at a fixed thermostatic temperature state, $\beta_{\text{ext}}^{-1}$ . We then define the heat injected from the environment as the net energy gain, $\beta_{\text{ext}} dQ = \langle \lambda g(x_{i+1}; x_i) \rangle$ . This identifies (39) with the Clausius form for the second law,[50,29,32]

$dS/k_B = dS_{\text{int}}/k_B - \beta_{\text{ext}} dQ \geq 0. \quad (40)$

The above claims relating transition probabilities to fluxes can be established for the Langevin and Brownian equations, and have been more thoroughly explored in a manuscript devoted to nonequilibrium problems[60].

The next result will be a derivation of the fluctuation-dissipation theorems from the Gibbs relations. Because our free energy for the process $A = F_{X_1}\mathbb{S}_{X_1}G_{12}\mathbb{S}_{X_2}G_{123}\dots$ is simply the free energy for $F_{X_1}\mathbb{S}_{X_1}$ , we must find an alternate free energy functional. The ‘caliber’ function of Jaynes,

$\mathcal{H}[A\mathbb{S}_\Gamma|\mathbb{S}_\Gamma] = \sum_{\Gamma} P(\Gamma|A\mathbb{S}_\Gamma) \ln \frac{P(\Gamma|\mathbb{S}_\Gamma)}{P(\Gamma|A\mathbb{S}_\Gamma)} \quad (41)$

lends itself to the task by defining the Legendre transform

$\begin{aligned} \mathcal{F}[\lambda] &\equiv \sum_{i=1}^N \langle \lambda_i g_i(\Gamma_i) | A\mathbb{S}_\Gamma \rangle - \mathcal{H}[A\mathbb{S}_\Gamma|\mathbb{S}_\Gamma] \\ &= - \sum_{\Gamma} P(\Gamma|A\mathbb{S}_\Gamma) \ln \prod_{i=1}^N \langle e^{-\lambda_i g_i(X_i; \Gamma_{i-1})} | \Gamma_{i-1} \rangle \\ &= \sum_{i=1}^N \langle -\ln Z[G_i \Gamma_{i-1} \mathbb{S}_x] | A\mathbb{S}_\Gamma \rangle. \end{aligned} \quad (42)$

The first derivatives generate a ‘first law’ for non-equilibrium processes,

$\begin{aligned} \frac{\partial \mathcal{F}}{\partial \lambda_i} &= \langle g_i(\Gamma_i) \rangle \\ d\mathcal{F} &= \sum_i \langle g_i(\Gamma_i) \rangle d\lambda_i. \end{aligned} \quad (43)$

This is a path average conditional on $A\mathbb{S}_\Gamma$ , but this notation has been suppressed for clarity. The second derivatives are the Green-Kubo formulae

$\frac{\partial^2 \mathcal{F}}{\partial \lambda_i \partial \lambda_j} = - \langle \delta g_i(\Gamma_i) \delta g_j(\Gamma_j) \rangle \quad (44)$
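The derivative relations (43) and (44) can be checked by finite differences on a simplified analogue. The sketch below is not the full path functional of Eq. (42): it assumes a single distribution over four states with two constraints, so that $\mathcal{F} = -\ln \langle e^{-\lambda_1 g_1 - \lambda_2 g_2} \rangle$ over a reference distribution. The reference distribution `p0`, the observables `g1` and `g2`, and the multiplier values are hypothetical.

```python
import math

p0 = [0.25, 0.25, 0.25, 0.25]   # reference distribution (assumed uniform)
g1 = [0.0, 1.0, 2.0, 3.0]       # hypothetical flux-like observables
g2 = [1.0, 0.0, 1.0, 0.0]

def free_energy(l1, l2):
    """F = -ln sum_s p0(s) exp(-l1*g1(s) - l2*g2(s))."""
    return -math.log(sum(p * math.exp(-l1 * a - l2 * b)
                         for p, a, b in zip(p0, g1, g2)))

def averages(l1, l2):
    """Mean of g1, mean of g2, and their covariance in the constrained distribution."""
    w = [p * math.exp(-l1 * a - l2 * b) for p, a, b in zip(p0, g1, g2)]
    z = sum(w)
    prob = [v / z for v in w]
    m1 = sum(q * a for q, a in zip(prob, g1))
    m2 = sum(q * b for q, b in zip(prob, g2))
    cov = sum(q * (a - m1) * (b - m2) for q, a, b in zip(prob, g1, g2))
    return m1, m2, cov

l1, l2, h = 0.4, -0.3, 1e-3
m1, m2, cov12 = averages(l1, l2)

# Central finite differences of F with respect to the multipliers.
dF_dl1 = (free_energy(l1 + h, l2) - free_energy(l1 - h, l2)) / (2 * h)
d2F_dl1dl2 = (free_energy(l1 + h, l2 + h) - free_energy(l1 + h, l2 - h)
              - free_energy(l1 - h, l2 + h) + free_energy(l1 - h, l2 - h)) / (4 * h * h)

print(dF_dl1, m1)           # first derivative compared with <g1>
print(d2F_dl1dl2, -cov12)   # mixed second derivative compared with -<dg1 dg2>
```

The same cumulant-generating structure is what makes the second derivatives of the path free energy into correlation functions of the fluxes.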

For the ion channel example we have been developing, a completely new set of constraints must be developed for transitions between states. For the forward problem, we are given $X_i$ as well as some set of feasible transitions, $Y|X_i$ , from state $i$ . Because the probabilities of inactivated states are negligible, we consider only the open channel state, and single-jump transitions as shown in Fig. 2 of Ref. [1]. Five transitions from each state are possible, corresponding to doing nothing, or all sites moving up or down by the addition of a $K^+$ or a water at the appropriate end.

In order to produce a system that conserves energy, we place a constraint on the energy change at each step:

$P(Y|X_i \beta' \Omega) = \frac{e^{-\beta(E(X_{i+1}) - E(X_i))}}{\sum_{Y|X_i} e^{-\beta(E(X_{i+1}(Y)) - E(X_i))}} \quad (45)$

This amounts to a stochastic addition of energy to the system in the amount of $\langle dE|X_i \rangle = -\frac{\partial Z[X_i \beta' \Omega]}{\partial \beta'}$ . The steady-state distribution will differ from the canonical distribution in general because the normalization constant, $Z[X_i \beta' \Omega]$ , depends on $X_i$ . This difference has come about because of the addition of information limiting which transitions are possible. If all states were available during each transition, the normalization constant would again be independent of $X_i$ and we would recover the canonical distribution. For the Langevin and Brownian equations with uniform applied temperature, the canonical distribution is also obtained because the normalization constant is independent of $X_i$ .
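The effect of restricting the allowed transitions in Eq. (45) can be seen in a toy model. The sketch below assumes four states with hypothetical energies, compares a restricted neighbor list (stay or hop to an adjacent state) with an unrestricted one, and obtains the steady state by simple power iteration. It is not the ion channel transition set of Ref. [1].

```python
import math

def transition_matrix(energies, neighbors, beta):
    """Eq. (45)-style kernel: T[i][j] proportional to exp(-beta*(E_j - E_i)) over allowed j."""
    n = len(energies)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w = {j: math.exp(-beta * (energies[j] - energies[i])) for j in neighbors[i]}
        z_i = sum(w.values())  # normalization depends on the starting state i
        for j, v in w.items():
            T[i][j] = v / z_i
    return T

def steady_state(T, iters=5000):
    """Power iteration from the uniform distribution."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

def canonical(energies, beta):
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [v / z for v in w]

energies = [0.0, 1.0, 0.5, 2.0]  # hypothetical state energies
beta = 1.0

# Restricted moves: stay put or hop to an adjacent state on a line.
line = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
# Unrestricted moves: every state is reachable in one step.
full = {i: [0, 1, 2, 3] for i in range(4)}

p_line = steady_state(transition_matrix(energies, line, beta))
p_full = steady_state(transition_matrix(energies, full, beta))
p_can = canonical(energies, beta)
print(p_line)
print(p_full)
print(p_can)
```

In this toy model the unrestricted kernel reproduces the canonical distribution, while the restricted kernel relaxes to a steady state weighted by $e^{-\beta E_i}$ times the sum of $e^{-\beta E_j}$ over the states $j$ reachable from $i$ , which differs from the canonical form whenever that sum varies with $i$ .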

Because transitions are not generally spontaneous, but may have an energy barrier, we add another constraint, $\beta' E^\dagger$ , directly on the number of transitions per time-step, $\tau$ ,

$P(Y|X_i\beta'E^\dagger\Omega) = \frac{P(Y|X_i\beta'\Omega)e^{-\beta E^\dagger I(Y)/\tau}}{\sum_{Y|X_i} P(Y|X_i\beta'\Omega)e^{-\beta E^\dagger I(Y)/\tau}}. \quad (46)$

These barriers could, of course, be made to depend arbitrarily on the transition, $Y$ , but for simplicity we assume that they are present only when a transition occurs and uniformly equal to the sum of 2 ps kcal/mol. The stochastic process specified by these two formulas has the identity matrix as the small time-step limit, and an equilibrium-like distribution as the large step limit. The energy barrier assumption differs from the usual rate equation formulation, since the Chapman-Kolmogorov equation no longer holds. Instead, the behavior of the above system is dependent on the time-scale studied, reminiscent of fractal kinetic models.[42] Note also that $E^\dagger$ may be a function of the time-step, $\tau$ , to give a specified average number of transitions to recover a Markov model. Because this is a novel kinetic model, it remains to be seen how well these two constraints reproduce actual dynamics; however, the form of this equation matches well the nonlinearity near $t = 0$ in exact transition probabilities computed for the Müller-Brown potential surface (Fig. 4 of Ref. [70]), while variations in the surface chosen to divide states can be mimicked by changes in $E^\dagger$ .
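The two limits of Eq. (46) can be illustrated on a single row of the transition matrix. In the sketch below, $I(Y)$ is taken to be 1 for any move away from the starting state and 0 for staying put, following the simplifying assumption stated above. The barrier-free row, the dimensionless values of `beta` and `e_dagger`, and the two time-steps are hypothetical.

```python
import math

def add_barrier(row, i, beta, e_dagger, tau):
    """Eq. (46)-style reweighting of one row P(Y|X_i): penalize any move away from state i."""
    w = [p * math.exp(-beta * e_dagger * (0.0 if j == i else 1.0) / tau)
         for j, p in enumerate(row)]
    z = sum(w)
    return [v / z for v in w]

# Hypothetical barrier-free row for starting state i = 0 (stay, hop to 1, hop to 2).
row = [0.5, 0.3, 0.2]
beta, e_dagger = 1.0, 2.0

short = add_barrier(row, 0, beta, e_dagger, tau=0.01)  # small time-step
long_ = add_barrier(row, 0, beta, e_dagger, tau=1e6)   # large time-step
print(short)
print(long_)
```

For the small time-step the row approaches the corresponding row of the identity matrix, and for the large time-step it approaches the barrier-free row, in line with the limits described in the text.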

To finish our specification of non-equilibrium jump processes, we specify the forces on spontaneous ion creation and annihilation. Removing the possibility of a change in ion number unless it either enters or exits through an end of the channel, we can then specify the external force, $\mu$ , acting on these special events using the same type of energy constraint (and assuming for simplicity the same energy barrier) as above. This leads to

$P(Y|X_i A \mu) = \frac{P(Y|X_i A) e^{\beta \mu_{\text{int}} dN_{\text{int}}(Y) + \beta \mu_{\text{ext}} dN_{\text{ext}}(Y)}}{\sum_{Y|X_i} P(Y|X_i A) e^{\beta \mu_{\text{int}} dN_{\text{int}}(Y) + \beta \mu_{\text{ext}} dN_{\text{ext}}(Y)}}, \quad (47)$

with $dN_{\text{int}}$ and $dN_{\text{ext}}$ representing the number of ions added to the system ( $\pm 1$ ) from the internal and external solutions, respectively. The form of this transition probability is similar to that of a recent paper on currents in boundary driven Kawasaki dynamics,[4] which were also analyzed using a cumulant-generating function similar to Eq. 42.

An outward-driving voltage can be added to the system by imposing an external field, increasing the likelihood for transitions moving ions outward by an amount $e^{\beta \Delta V g(Y)}$ . The function $g(Y) = \sum_j I(X_j \leftarrow X_{j+1})$ counts the number of ions taking a step outward during transition $Y$ , consistent with the sign convention of Fig. 2. For ion movements internal to the channel, this has an equivalent effect on the path distribution as applying an energy constraint $e^{\beta \sum_j V_j I(X_j)}$ ( $I(\cdot)$ is the indicator function). These constraints provide a complete kinetic model for our ion channel in arbitrary solution conditions and driving voltages.

The steady-state ion occupancies at zero applied voltage and $\mu$ identical to that for (e) and (f) of Fig. 3 are plotted in panel (g). The steady-state distribution is slightly altered from the local equilibrium prediction of (e). This happens despite the fact that the transition probability obeys detailed balance with respect to the steady-state, and exactly five transitions lead into each ion occupancy state. The reason is that the transition probability is normalized by a different value for the forward and reverse transitions.

As a final note, the current can be calculated as a perturbation from a steady-state using Eq. 44

$\begin{aligned} \langle g(t) \rangle_{\beta \Delta V'} &= - \left. \frac{\partial \mathcal{F}[\beta \Delta V]}{\partial \beta \Delta V(t)} \right|_{\beta \Delta V'} \\ &\approx - \left. \frac{\partial \mathcal{F}[\beta \Delta V]}{\partial \beta \Delta V(t)} \right|_{\beta \Delta V} - \sum_{i \leq t} \frac{\partial^2 \mathcal{F}}{\partial \beta \Delta V(t) \partial \beta \Delta V(i)} \beta (\Delta V'(i) - \Delta V(i)) \\ &= \langle g(t) \rangle_{\Delta V} + \beta \sum_{i \leq t} \langle \delta g_i \delta g_t \rangle_{\Delta V} (\Delta V'(i) - \Delta V(i)). \end{aligned} \quad (48)$

This gives the time-dependent linear response for small changes in the holding potential. The conductance near the resting potential is the time-integral of the steady-state current auto-correlation function (at zero average current), in accordance with Onsager's phenomenological equation[53]. The negative sign comes about because of the positive sign of the constraint ( $\beta\Delta V$ ). At other voltages, this integral is the slope of the current/voltage curve. The presence of an additive constant explains why Onsager reciprocity only holds near equilibrium, where the fluxes are zero. Other Legendre transforms of Eq. 42 lead to relationships at fixed current, etc. as in the usual theory.[8]

Fig. 5 Current-voltage plot calculated using the free energies from Fig. 2 of Ref. [1] along with the assumptions listed in the text. The voltage plotted in this figure is the sum of the five voltage steps between S0-S5. The integrated autocorrelation function is shown using tangent lines according to the FDT (Eq. 48). Although the reversal potential shifts are physically reasonable, inward rectification (opposite known channel behavior) is observed.

The current-voltage characteristics calculated for this system are shown in Fig. 5. The fluctuation-dissipation theorem (Eq. 48) gives the slope of the current-voltage curve, and is plotted as a tangent line at each data point. Noticeable deviations occur at positive voltages due to numerical error in calculating the steady-state flux and long-timescale behavior of the autocorrelation function. This has been traced to very long relaxation times ( $O(10^5)$ steps) for the current, which is in turn due to the low transition probabilities between conduction states with high free energy barriers. The set of energy barriers used leads to larger current magnitudes at hyperpolarized voltages (inward-rectifying behavior), inconsistent with the known operation of the channel. It is of interest to more accurately model the transition energy barriers and determine whether the time-dependence of dwell times for individual states is adequately represented by equations of the present, maximum entropy, form.

5 Conclusions

This work has attempted to formulate the purely statistical content of statistical mechanics in terms of the Bayesian probability theory of Jaynes[33]. From this perspective, thermodynamics is a tool for understanding experimental information and its consequences. In the process, it has become clear that the principles and mathematical methods are of much more general applicability than conventional arguments would lead one to suppose[25] and that a large number of advanced concepts and methods can be synthesized in this way.

Entropy has been defined from the perspective of information theory, representing the (negative) information content of distributions. Because the entropy is maximized upon adding average value information, its first derivative with respect to variations in the distribution is zero. The first law of thermodynamics expressed in Eq. 28 is a direct consequence of this observation. We have shown that the process of adding average value information while maximizing the relative information entropy at each step is transitive. Therefore, adding a series of such constraints in any order will lead to the same distribution, with the sum of the information increments adding to the same value for all paths. Because the entropy was defined only relative to a reference distribution, the information increments are zero whenever the distribution is unchanged by maximizing entropy. Had this relative form been used to define the thermodynamic entropy, the third law of thermodynamics would not require special treatment of nuclear spin multiplicity at zero Kelvin.

Thermostatic partition functions, $Z[A\mathbb{S}_x]$ , have likewise been identified as expressing relative probabilities. Changes in this function correspond to changes in information, and can be understood as a subjective probability assignment determining relative likelihoods between allowed alternative states of the system. Before specifying a set of alternate constraints ( $\Omega$ ) among which the system may choose to reach statistical equilibrium, the partition function can only take this relative form, as in ( $F \rightarrow G$ ) of Fig. 1. Once a complete set of constraints is specified, then the partition function decides the relative probability of each state within $\Omega$ , and it is possible to say (Eq. 34) that the probability of state $A$ , divided by the probability of $\Omega$ , is the probability of $A$ given that $A$ is in the set $\Omega$ . This interpretation of the partition function leads naturally to multicanonical ensemble and umbrella sampling methods[9].

Comparisons between states of knowledge can be done using these functions, and the picture presented here does not require the specification of a complete set of all possible states of knowledge. Instead, the relations of Sec. 3 give a basic, consistent set of equations for defining the changes between these states. This set already justifies the appearance of (in)distinguishability factors in the partition function, as shown in § 4.1. We have provided a justification for the common indicator function, $w_{\Omega}(\mathcal{C})$ (15), for comparing purely entropic changes in phase space, as well as the Boltzmann factor (25), for comparing changes in maximum entropy information $P(F|\mathbb{S}_x)/P(G|\mathbb{S}_x) = Z[F\mathbb{S}_x]/Z[G\mathbb{S}_x]$ . We have also shown two more advanced examples, generating the multicanonical ensemble in § 4.3 and a conditional maximum entropy in § 4.4. These are related to the first two examples as marginal distributions are related to conditional ones.

The concept of building up thermodynamic equations of state by adding system information is important for developing multi-scale understanding of large physical systems. Because this approach is based on using well-defined system states at each step, the predictions of the coarse-grained theory may be compared with a fully atomistic (or ab-initio electronic) molecular dynamics simulation or coarse-grained Monte-Carlo sampling. At such levels, the number of states will be greatly increased to include coordinates and momenta of all particles, with a change in the energy function to a more accurate approximation. Because this level of description quickly becomes computationally intractable, the approximate potential of mean force derived from high-level considerations may be useful for locating important states for detailed study, deriving stochastic boundary conditions, and applying force or energy biasing sampling techniques.

As is now well known, the statistical machinery outlined here is generally applicable to problems where there is uncertainty. It can be used equally well in reasoning about equilibrium and coarse-graining problems as well as non-equilibrium processes. Starting with a ‘trajectory space’ and adding information on allowed transitions as well as expectation values of fluxes between states leads to a state of knowledge about the process. In such a process, the ability to directly write down the equilibrium distribution (a long-sought goal[57, 51]) disappears in the same way a marginal distribution over coarse-grained variables cannot be directly produced from an equilibrium distribution over all atomistic coordinates and momenta. Instead, the transition distribution can be directly written, and the transient fluxes and eventual steady-state (if it exists) become path averages. A consideration of the information loss for stochastic processes leads to a formula similar to the second law of thermodynamics (39), applicable arbitrarily far from equilibrium. The information entropy functional of the path probability given in Sec. 3 takes on the definition of Jaynes’ ‘caliber,’[28] while its Legendre transform (42) is a path free energy functional whose Gibbs relations easily generate Green-Kubo type fluctuation-dissipation theorems.[27, 28, 47, 14] We emphasize that these formulas are not required to be extensive or local,[35, 36, 39] avoid the necessity of defining a steady-state,[12, 66] and are independent of how we define fluxes so that we do not have to immediately write down hydrodynamic equations.[44] The present work has given a necessary statistical foundation for extending these results by carrying over modern equilibrium techniques such as the evaluation of free energy differences[63], and coordinate/path re-weighting techniques[69, 49].
These formulas achieve Jaynes’ goal of providing a “foundation for the predictive aspect of statistical mechanics, in which a single basic principle and method applies to all cases, equilibrium or otherwise.”[26] They imbue non-equilibrium and transient dynamic problems with the same structure as the equilibrium thermodynamics given by Gibbs[17], and open the door for a new understanding of processes far from equilibrium.

Acknowledgements This work was supported, in part, by Sandia’s LDRD program, and, in part, by the National Institutes of Health through the NIH Road Map for Medical Research. Sandia National Laboratories is a multi-program laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

A Formal Derivation of Ratios for Undefined Quantities of Probability

There are several axiomatic foundations for probability theory, and the system of Kolmogorov is perhaps the most widely taught and well-known. This system begins by assuming a space of elementary states, $\mathbb{S}_y = \{Y_1, Y_2, \dots\}$ . We assume that $Y_i$ are mutually exclusive and exhaustive. In this case, a probability distribution function can be defined as a measure of a set $P(X|\mathbb{S}_y) = \sum_{Y_i \in X} P(Y_i|\mathbb{S}_y)$ , with $P(\mathbb{S}_y|\mathbb{S}_y) = 1$ and $P(Y_i|\mathbb{S}_y) \geq 0 \forall Y_i \in \mathbb{S}_y$ . Any possible subset of $\mathbb{S}_y$ then defines an aggregate ‘state.’ Because they are mutually exclusive, separate elementary hypotheses cannot be combined using the ‘and’ operation, only states, such that the probability of state one and state two is the probability of their intersection, $X_1 \cap X_2$ . We have not yet made clear how this structure is related to logical inference.

Formal logic is concerned with proving logical statements from given assumptions. Both the assumptions and the statements to be proven can be stated in the form of logical sentences. Each sentence makes assertions about elementary hypotheses using some combination of the logical operators. For this paper, we assume the Boolean algebra, including the ‘or’ operation (+), as in $X_1 + X_2 \equiv X_1 \text{ or } X_2$ , the ‘and’ operation (*), as in $X_1 X_2 \equiv X_1 \text{ and } X_2$ , and the negation, $\bar{X}_1 = \text{not } X_1$ . Logical statements can be assigned a probability by defining states, $X$ , as hypotheses of the form ‘the system is in state $X$ .’ Each logical sentence then maps to a set of states by substituting union for ‘or’, intersection for ‘and’, and set complementation for ‘not.’ Assuming all statements are either true or false, aggregate operations may be defined from these, for example the mutually exclusive statement $A \oplus B = A\bar{B} + \bar{A}B$ and the implication $A \Rightarrow B = \bar{A} + B$ . The probability of a given statement can then be determined from the probability of the set it implies. In order to make logical inferences from an assumed logical sentence, $\Gamma$ , we require the definition of a conditional probability. This is found by changing the space $\mathbb{S}_y$ to $\mathbb{S}_\Gamma$ via setting $P(Y_i|\Gamma\mathbb{S}_y) = 0 \forall Y_i \notin \mathbb{S}_\Gamma$ and then re-normalizing, resulting in

$P(X|\Gamma\mathbb{S}_y) = \frac{P(X \cap \mathbb{S}_\Gamma|\mathbb{S}_y)}{P(\mathbb{S}_\Gamma|\mathbb{S}_y)}. \quad (49)$
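The restrict-and-renormalize construction of Eq. (49) can be written out directly with sets. The sketch below assumes a hypothetical elementary space (a fair six-sided die) and uses exact rational arithmetic; the event names are illustrative only.

```python
from fractions import Fraction

def prob(event, space_probs):
    """Measure of a set: sum of the elementary probabilities of its members."""
    return sum((space_probs[y] for y in event), Fraction(0))

def conditional(event, given, space_probs):
    """Eq. (49): P(X | Gamma S_y) = P(X intersect S_Gamma | S_y) / P(S_Gamma | S_y)."""
    return prob(event & given, space_probs) / prob(given, space_probs)

# Hypothetical space: a fair six-sided die.
space = {y: Fraction(1, 6) for y in range(1, 7)}
even = {2, 4, 6}
at_least_four = {4, 5, 6}

p = conditional(even, at_least_four, space)
print(p)  # 2/3
```

Note that every quantity here is defined only after the full space `space` has been written down, which is the restriction discussed next.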

Although very easy to present, this system is unnecessarily restrictive for two reasons. The first is that it requires the definition of a complete space, $\mathbb{S}_y$ , at the outset, and provides no means of reasoning by which this space can later be enlarged to add new hypotheses. As we have seen, inference can then only take place by successively reducing this space to smaller regions. By analogy, the process of statistical mechanics would therefore have to begin by assuming a multicanonical ensemble along with its mutually exclusive and exhaustive coordinates, and then derive successively constrained systems. Although a valid derivation can be produced this way, it appears to deny us the ability to define an isolated physical system. The second reason is related to this point. Physically, we would like to begin with the idea of an isolated system and then successively build in more complexity as relevant dynamic variables are discovered. The Kolmogorov system does not provide a means of reasoning about a hypothesis without first defining its ‘space.’ Instead of adding prior information to the right of the conditional sign, we would like to build up a complete picture of a physical system by successively moving prior information over to the left.

This point was considered by Jaynes[33], who showed that a probability theory ‘without bounds’ could be derived from three desiderata for assigning plausibilities to logical statements. The first was that degrees of plausibility be represented by positive, real numbers. The second and third require that all available prior information is used and that equivalent states of knowledge and reasoning processes lead to identical results. From these desiderata, the product rule (Bayes’ theorem, $P(AB|C) = P(A|BC)P(B|C) = P(B|AC)P(A|C)$ ), may be deduced. Here, the only requirement is that A, B, and C represent information and that the postulate, C, does not contradict itself. It is therefore unnecessary to define a space in which A must exist in order to determine its plausibility from C.

One point is worth noting. A logical sentence of the form $\Gamma = A(B+C)(A \oplus D)$ immediately implies A as well as denies D, and provides some information about the statements B and C. However, it does not contain any information whatsoever about an unrelated proposition, F. With some thought, it can be seen that assignment of plausibilities based on a logical sentence must fall into one of four classes: true ( $P(A|\Gamma) = 1$ ), possible ( $P(B|\Gamma)$ ), undecidable ( $P(F|\Gamma)$ ), or impossible ( $P(D|\Gamma) = 0$ ). Probability theory is chiefly concerned with propositions that are possible. However, the product rule also applies to situations in which a proposition is undecidable.
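The four classes can be illustrated by enumerating the ordinary two-valued truth table of $\Gamma = A(B+C)(A \oplus D)$. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name `classify` is hypothetical, and a proposition is treated as undecidable simply when it does not appear among the literals of $\Gamma$.

```python
from itertools import product

def classify(gamma, gamma_vars, prop):
    """Classify one literal given gamma, using a classical truth table."""
    if prop not in gamma_vars:
        return "undecidable"  # assumption: gamma says nothing about absent literals
    models = [dict(zip(gamma_vars, vals))
              for vals in product([False, True], repeat=len(gamma_vars))]
    models = [m for m in models if gamma(m)]  # keep assignments that satisfy gamma
    if not models:
        raise ValueError("gamma is contradictory")
    values = {m[prop] for m in models}
    if values == {True}:
        return "true"
    if values == {False}:
        return "impossible"
    return "possible"

def gamma(m):
    """Gamma = A (B + C) (A xor D)."""
    return m["A"] and (m["B"] or m["C"]) and (m["A"] != m["D"])

names = ["A", "B", "C", "D"]
result = {p: classify(gamma, names, p) for p in ["A", "B", "C", "D", "F"]}
print(result)
# {'A': 'true', 'B': 'possible', 'C': 'possible', 'D': 'impossible', 'F': 'undecidable'}
```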

What has been said serves to illustrate the difficulty of reasoning without assuming a set of mutually exclusive and exhaustive alternatives. To demonstrate this concretely, we will attempt to assign probabilities to a general logical statement, $\psi$ , assuming only the principle of indifference, I, and possibly another logical statement, $\Gamma$ . Obviously, the plausibility will be one if $\Gamma \Rightarrow \psi$ and zero if $\Gamma \psi$ constitutes a contradiction. The other situations are shown in the following example.

Consider the meeting of two gamblers who have, by means unspecified, come into possession of a Stern-Gerlach magnet. Being as they are, they decide to place wagers on measurements of a beta ( $\beta^-$ ) decay process. In order to decide the winner, both agree on the same method of classifying the measurement outcome. The most apparent measurement would be whether or not the following event occurred.

A: An electron is observed in the time interval $t, t + dt$ .

However, they find themselves unable to assign $P(A|I)$ because of a large amount of uncertainty about the physics of the experiment. Then one of them notices that the device can tell them not only if an electron has been observed, but also if it has positive or negative spin. This changes their prior information for the problem, since they recognize that there are now two elementary hypotheses: A, observed with spin-up, or B, observed with spin-down. They are believed to be mutually exclusive, so that they know $(A \oplus B)$ . According to the principle of indifference,

$P(A|(A \oplus B)I) = \frac{1}{2}. \quad (50)$

As their previous state of knowledge was unable to distinguish between these two events, it implicitly combined both of these two elementary hypotheses into a single event, which they held to be un-assignable. However, it seems that the principle of indifference should have some bearing on the question of $P(A|I)$ , since in Aristotelian logic, A must always be either true or false. Representing this two-valued foundation of Aristotelian logic as L, Cox has derived the sum rule, $P(A|LI) + P(\bar{A}|LI) = 1$ . In this case, assuming L is equivalent to assuming A and $\bar{A}$ are mutually exclusive events and that one must occur, a situation represented by (50). In the case of Aristotelian logic, then, any reasoning on a proposition, A, on the left side of $(A|J)$ , must be preceded by assuming A and $\bar{A}$ are mutually exclusive and exhaustive on the right side. An inability to assign $P(A|I)$ based only on I would then amount to some system of logic that does not begin by assuming L. This logical complication in part explains why the problem has not yet been directly discussed, as it requires us to reason about statements which are usually considered axiomatic using Aristotelian logic, $L$ .

Is such a system possible, and if so, does it serve any useful purpose? Jaynes considered this relaxation to be required for reasoning about more vaguely defined propositions such as whether a defendant did or did not exercise reasonable judgment in a medical malpractice suit. A corollary of the present question is the construction of a non-Euclidean geometry, which is indeed possible when one does away with the assumption that through one point, only one parallel may be drawn to a given straight line[55]. We argue by analogy that the product rule is more fundamental than the sum rule, and that the most important use of removing the rule ( $A \oplus \bar{A}$ ) is to make explicit the assumptions on how logical propositions must inter-relate. For example, if $A$ represents the proposition that a defendant exercised reasonable judgment, both $A$ and $\bar{A}$ may be held to be absolutely true, but for different choices made by the defendant. In order to make definite conclusions, however, it will be necessary to define a set of mutually exclusive hypotheses – for example by enumerating individual actions and measurable ethical standards. Once a set of mutually exclusive hypotheses is defined, a problem of deciding plausibilities in the absence of this assumption may be reduced to one in Aristotelian form.

In addition to assuming the product rule, it will be necessary to define a set of operations in a reduced Boolean algebra where the plausibility of $A\bar{A}$ may be nonzero. The contradiction in this statement disappears when $\bar{A}$ is defined to be a new proposition, say $B$ , independent of $A$ unless some prior information is present relating the two. It would seem that by thus removing the operation of negation, a reduction to Aristotelian logic is always possible. More precisely, the proposed system of logic, $\phi L$ contains the conjunction and disjunction in the usual sense, but not negation. In order to equate $A$ and $B$ in the Aristotelian sense, we must then know that $A$ and $B$ are mutually exclusive and that one or the other is always true. Because of this property, statements in $\phi L$ cannot be disproven unless some relations between them are first assumed.

We thus add the further relations, ‘ $\oplus$ ’ to mean that two propositions are mutually exclusive and exhaustive, ‘ $\Rightarrow$ ’ to mean that the left proposition is logically equivalent to the conjunction (i.e. $A(A \Rightarrow B) \Leftrightarrow AB(A \Rightarrow B)$ ) and ‘ $\Leftrightarrow$ ’ to mean that two propositions are logically identical. The Aristotelian expansions, $A \Leftrightarrow B = AB + \bar{A}\bar{B}$ and $A \oplus B = A\bar{B} + \bar{A}B$ may not hold in general, and in their place, $\Rightarrow$ and $\Leftrightarrow$ define the set of substitution rules which may be used. The principle of contradiction is thus $P(AB|A \oplus B) = 0$ . No contradiction can be deduced without the mutual exclusivity clause, thus $P(\bar{A}|A \Leftrightarrow \bar{A}(\phi L))$ is undefined, whereas $P(\bar{A}|(A \Leftrightarrow \bar{A})(A + \bar{A})) = 1$ . It is also evident that $A \Leftrightarrow A$ is always to be assumed. From the product rule, $P(A|C) = P(AA|C) = P(A|AC)P(A|C)$ , so that the syllogism is likewise reduced to $P(A|AC) = 1$ , irrespective of whether or not $C$ contains $\bar{A}$ .² At this point, all that can be said of the disjunction is that $A + A$ is equivalent to $A$ , $A \Rightarrow A + B$ and that $P(A + B|C) \geq P(A|C)$ .

Note that $\phi L$ is consistent, since any sentence, $\psi$ , in $\phi L$ can be converted to one, $\psi'$ , in $L$ by symbolically re-labeling elementary propositions such that $\psi$ is true if and only if $\psi'$ is true and $\psi$ is reducible to a contradiction if and only if $\psi'$ is so. The construction of $\psi'$ may be accomplished simply by replacing all negated elementary propositions (only individual literals may be negated in $\phi L$ ) with new elementary propositions. The rules of Aristotelian logic for this sentence are in one to one correspondence with those of $\phi L$ . Note that distributing any negations for an expression in $L$ and adding to this expression an Aristotelian clause, $(A \oplus \bar{A})$ , for each elementary proposition that appears constitutes the reverse transformation.
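The re-labeling step can be sketched on a toy sentence representation. Everything here is a hypothetical illustration: sentences are nested tuples such as `("and", ...)`, `("or", ...)`, and `("not", "A")`, and the fresh proposition for a negated literal `A` is named `A_bar` on the assumption that this name is not already in use.

```python
def to_aristotelian(sentence):
    """Replace each negated literal ("not", "A") with a new elementary proposition "A_bar"."""
    if isinstance(sentence, str):
        return sentence  # a plain literal is left unchanged
    op, *args = sentence
    if op == "not":
        (lit,) = args
        if not isinstance(lit, str):
            raise ValueError("only individual literals may be negated")
        return lit + "_bar"
    return (op, *[to_aristotelian(a) for a in args])

# psi = A (not-A + B), written as a nested tuple
psi = ("and", "A", ("or", ("not", "A"), "B"))
psi_prime = to_aristotelian(psi)
print(psi_prime)  # ('and', 'A', ('or', 'A_bar', 'B'))
```

The reverse transformation described in the text would additionally attach a clause relating `A` and `A_bar` as mutually exclusive and exhaustive; that step is not shown here.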

We now show that it is admissible to use the product rule to completely expand Eq. 50 for comparison to $P(A|I)$ .

$\frac{1}{2} = P(A|(A \oplus B)I) = \frac{P(A(A \oplus B)|I)}{P(A \oplus B|I)} = \frac{P(A|I)}{P(A \oplus B|I)}$

By defining a set of possible assignments, assuming $A \oplus B$ reduces statements about $A$ and/or $B$ to Aristotelian form. From this example it is evident that the only information required to assign a probability using the principle of indifference is the number of elementary hypotheses which may be measured. For $A(A \oplus B)$ , there is only one hypothesis, and it is formally undecidable. On the contrary, $A \oplus B$ implies that there are two possibilities, since there are two elementary hypotheses $A$ or $B$ making this expression true. Therefore, we define the principle of indifference as one basing its determination of plausibility completely on the number of distinct truth assignments which may confirm a logical expression in $\phi L$ . In the case of only one, undecidable assignment (represented as $\phi$ ), the principle of indifference gives an unknown constant.

$P(A|I) = \text{const.} \equiv P(\phi|I) \quad (51)$

According to the product rule, the principle of indifference must therefore assign a likelihood to compound propositions as $P(A \oplus B|I) = 2P(A|I) = 2P(\phi|I)$ .

In general a logical sentence, $\Gamma$ , represents some assumptions on certain, contradictory, possible, and identical hypotheses. Writing the set of literals contained by $\Gamma$ as $x_1, x_2, \dots, x_n$ , a basic set of hypotheses, $\mathbb{S}_\psi$ , consists of the $2^n - 1$ conjunctions from all possible (non-null) combinations of the $x_i$ . However, only some subset, $\Omega$ , of $\mathbb{S}_\psi$ will be possible given $\Gamma$ . We may use the principle of indifference to assign likelihoods over this space as

$P(\psi|\Gamma I) = \frac{1}{|\Omega(\Gamma)|}, \quad \psi \in \Omega \subseteq \mathbb{S}_\psi$

² However, it does not make sense to admit logically contradictory prior information such as $AB(A \oplus B)$ .

Next, we rigorously define $\Omega$ as the set of all conjunctions of $x_i$ (i.e. $\psi \in \mathbb{S}_\psi$ ) that raise the status of $\Gamma$ to certainty ( $\psi \Rightarrow \Gamma$ so that $P(\Gamma|\psi I) = 1$ ). Other $\psi'$ will either be undecidable, with no relevance to $\Gamma$ , or contradictory ( $P(\Gamma|\psi' I) = P(\psi'|\Gamma I) = 0$ ); neither will contribute to Eq. 5. Since each literal is represented as a unique $x_i$ , we may visualize the set of conjunctions in the usual sense of a truth table, where false is taken to mean ‘not present.’ The product rule is now sufficient to show that

$\begin{aligned} P(\psi|\Gamma I) &= \frac{1}{|\Omega|}, \quad \psi \in \Omega(\Gamma) \\ &= \frac{P(\Gamma\psi|I)}{P(\Gamma|I)} = \frac{P(\Gamma|\psi I)P(\psi|I)}{P(\Gamma|I)} \\ &= \frac{P(\phi|I)}{P(\Gamma|I)} \\ &\Rightarrow P(\Gamma|I) = |\Omega|P(\phi|I). \end{aligned}$

We note that $\Omega$ often takes the form of a product space, e.g. for $\Gamma = (\oplus(A, B, C, \dots))(\oplus(A', B', C', \dots))(\dots)$ . In this expression, we have defined $\oplus(\cdot)$ as expanding to a conjunction of exclusive-ors on all pairwise combinations in $\cdot$ , so that only one expression from the argument set may be true at once. When $\Omega$ is explicitly present as an assumption, then we may define each element of $\Omega$ as an elementary state to give the Kolmogorov system of probability, in which the sum rule,

$P(\Omega'|\Omega I) = \sum_{\psi \in \Omega'} P(\psi|\Omega I) = \sum_{\psi \in \Omega'} \frac{P(\psi|I)}{P(\Omega|I)}, \Omega' \subseteq \Omega, \quad (52)$

becomes valid. However, it should be noted that the set $\Omega$ is not itself elementary, but instead constructed from elementary hypotheses of the form ‘ $x_1$ is true,’ etc. If the prior information, $\Gamma$ , does not state that $x_1$ and $x_2$ are mutually exclusive, for example if $\Gamma = x_1 + x_2 = (x_1 + x_2)(x_1 + x_2) = x_1 + x_2 + x_1x_2$ , then $\Omega = \{x_1, x_2, x_1x_2\}$ . In the present paper, we use $\Omega$ instead of $\Gamma$ , since we have not proved that the reverse mapping $\Omega \rightarrow \Gamma$ is unique.
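The construction of $\Omega$ and the resulting assignments can be sketched by brute-force enumeration. This is an illustration under stated assumptions rather than a definitive implementation: each conjunction $\psi$ is stored as the set of literals present in it, absent literals are read as ‘not present,’ $\oplus(\cdot)$ is implemented as ‘exactly one argument present’ (following its stated intent), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def basic_hypotheses(literals):
    """All 2**n - 1 non-null conjunctions, each a frozenset of present literals."""
    return [frozenset(c)
            for r in range(1, len(literals) + 1)
            for c in combinations(literals, r)]

def omega(gamma, literals):
    """Conjunctions psi that raise gamma to certainty."""
    return [psi for psi in basic_hypotheses(literals) if gamma(psi)]

def indifference(gamma, literals):
    """P(psi | Gamma I) = 1/|Omega| for every psi in Omega."""
    om = omega(gamma, literals)
    return {psi: Fraction(1, len(om)) for psi in om}

def sum_rule(subset, assignment):
    """P(Omega' | Omega I) as a sum over the elementary states in Omega' (Eq. 52)."""
    return sum(assignment[psi] for psi in subset)

def exactly_one(psi, group):
    """Assumed reading of the n-ary exclusive-or: one and only one member is present."""
    return sum(lit in psi for lit in group) == 1

def either(psi):      # Gamma = x1 + x2
    return "x1" in psi or "x2" in psi

def exclusive(psi):   # Gamma = x1 (+) x2
    return exactly_one(psi, ["x1", "x2"])

lits = ["x1", "x2"]
p_or = indifference(either, lits)
p_xor = indifference(exclusive, lits)
print(len(p_or), len(p_xor))                                          # 3 2
print(p_xor[frozenset({"x1"})])                                       # 1/2
print(sum_rule([frozenset({"x1"}), frozenset({"x1", "x2"})], p_or))   # 2/3

# Product space: one choice from (A, B, C) times one choice from (A', B').
groups = [["A", "B", "C"], ["A'", "B'"]]
product_lits = [lit for g in groups for lit in g]

def product_gamma(psi):
    return all(exactly_one(psi, g) for g in groups)

n_product = len(omega(product_gamma, product_lits))
print(n_product)  # 6
```

The counts 3 and 2 reproduce $\Omega = \{x_1, x_2, x_1x_2\}$ for $\Gamma = x_1 + x_2$ and the assignment of Eq. 50 for $A \oplus B$, while the product-space case gives $|\Omega| = 3 \times 2$.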

References

Åqvist, J., Luzhkov, V.: Ion permeation mechanism of the potassium channel. Nature 404(6780), 881–884 (2000). DOI 10.1038/35009114. URL http://dx.doi.org/10.1038/35009114
Aczél, J.: A Short Course on Functional Equations and their Applications. D. Reidel, Dordrecht (1987)
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: International Symposium on Information Theory, 2^nd, pp. 267–281 (1973)
Baiesi, M., Maes, C., Netočný, K.: Computation of current cumulants for small nonequilibrium systems. Journal of Statistical Physics 135(1), 57–75 (2009). URL http://dx.doi.org/10.1007/s10955-009-9723-3
Beck, T.L., Paulaitis, M.E., Pratt, L.R.: The Potential Distribution Theorem and Models of Molecular Solutions. Cambridge, New York (2006)
Bergmann, P.G., Lebowitz, J.L.: New approach to nonequilibrium processes. Phys. Rev. 99(2), 578–587 (1955). DOI 10.1103/PhysRev.99.578
Bimbó, K., Dunn, J.M.: Four-valued logic. Notre Dame J. Formal Logic 42(3), 171–192 (2001)
Callen, H.B.: Thermodynamics and an Introduction to Thermostatics. Wiley (1985). 2^nd ed.
Chipot, C., Pohorille, A. (eds.): Free Energy Calculations. Springer (2007)
Cox, R.T.: The algebra of probable inference. Johns Hopkins Univ. Press, Baltimore, MD (1961)
Crooks, G.E.: Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 60(3), 2721–2726 (1999). DOI 10.1103/PhysRevE.60.2721
Crooks, G.E.: Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E 61(3), 2361–2366 (2000). DOI 10.1103/PhysRevE.61.2361
Cuello, L.G., Jogini, V., Cortes, D.M., Perozo, E.: Structural mechanism of C-type inactivation in K⁺ channels. Nature 466(7303), 203–208 (2010). DOI 10.1038/nature09153. URL http://dx.doi.org/10.1038/nature09153
Dewar, R.: Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states. J. Phys. A: Math. Gen. 36, 631–641 (2003)
Ehrenfest, P., Ehrenfest, T.: The conceptual foundations of the statistical approach in mechanics. Cornell University Press, Ithaca NY (1959). English translation of Encykl. Math. Wiss. 1912. by M. J. Moravcsik
Friedman, H.L., Krishnan, C.V.: Thermodynamics of ion hydration. In: F. Franks (ed.) Water: A Comprehensive Treatise. Plenum Press, New York (1973)
Gibbs, J.W.: Elementary principles in statistical mechanics. C. Scribner’s sons (1902)
Grandy, W.T.: Foundations of Statistical Mechanics. Kluwer, Boston (1987)
Hamblin, C.L.: One-valued logic. The Philosophical Quarterly 17(66), 38–45 (1967)
Hamill, O.P., Marty, A., Neher, E., Sakmann, B., Sigworth, F.J.: Improved patch-clamp techniques for high-resolution current recording from cells and cell-free membrane patches. Pflügers Arch. Eur. J. Physiol. 391(2), 85–100 (1981). URL http://dx.doi.org/10.1007/BF00656997
Hansmann, U.H.E., Okamoto, Y.: Monte carlo simulations in generalized ensemble: Multicanonical algorithm versus simulated tempering. Phys. Rev. E 54(5), 5863–5865 (1996). DOI 10.1103/PhysRevE.54.5863
Hille, B.: Ion Channels of Excitable Membranes. Sinauer (2001). Third Ed.
Hummer, G.: Position-dependent diffusion coefficients and free energies from Bayesian analysis of equilibrium and replica molecular dynamics simulations. New J. Phys. 7, 34 (2005). DOI 10.1088/1367-2630/7/1/034
Jarzynski, C.: Rare events and the convergence of exponentially averaged work values. Phys. Rev. E 73, 046,105 (2006)
Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620–630 (1957). DOI 10.1103/PhysRev.106.620
Jaynes, E.T.: Information theory and statistical mechanics. II. Phys. Rev. 108(2), 171–190 (1957). DOI 10.1103/PhysRev.108.171
Jaynes, E.T.: Where do we stand on maximum entropy? In: R.D. Levine, M. Tribus (eds.) The Maximum Entropy Formalism, p. 498. M.I.T Press, Cambridge (1979)
Jaynes, E.T.: The minimum entropy production principle. Ann. Rev. Phys. Chem. 31, 579–601 (1980)
Jaynes, E.T.: The evolution of carnot's principle. In: EMBO Workshop on Maximum-Entropy Methods, vol. 1, pp. 267–282 (1984). Reprinted by Ericksen & Smith in 1988.
Jaynes, E.T.: Predictive statistical mechanics. In: G.T. Moore, M.O. Scully (eds.) Frontiers of Nonequilibrium Statistical Physics, p. 33. Plenum Press, New York (1986)
Jaynes, E.T.: Clearing up mysteries - the original goal. In: J. Skilling (ed.) Maximum-Entropy and Bayesian Methods, p. 1. Kluwer, Dordrecht (1989)
Jaynes, E.T.: The Gibbs Paradox. In: C.R. Smith, G.J. Erickson, P.O. Neudorfer (eds.) Maximum Entropy and Bayesian Methods, pp. 1–22. Kluwer, Dordrecht (1992)
Jaynes, E.T.: Probability Theory: the Logic of Science. Cambridge, Cambridge (2003)
Jaynes, E.T., Rosenkrantz, R.D.: Papers on Probability, Statistics and Statistical Physics. Kluwer, Boston (1989)
Jou, D., Casas-Vázquez, J., Lebon, G.: Extended irreversible thermodynamics. Rep. Prog. Phys. 51, 1105–1179 (1988)
Jou, D., Casas-Vázquez, J., Lebon, G.: Extended irreversible thermodynamics revisited. Rep. Prog. Phys. 62, 1035–1142 (1999)
Kawai, R., Parrondo, J.M.R., Van den Broeck, C.: Dissipation: The phase-space perspective. Phys. Rev. Lett. 98(8), 080,602 (2007). DOI 10.1103/PhysRevLett.98.080602
Kirkwood, J.G.: Statistical mechanics of fluid mixtures. J. Chem. Phys. 3(5), 300–313 (1935). DOI 10.1063/1.1749657. URL http://link.aip.org/link/?JCP/3/300/1
Kjelstrup, S., Bedeaux, D.: Non-Equilibrium Thermodynamics Of Heterogeneous Systems (Series on Advances in Statistical Mechanics). World Scientific (2008)
Lebowitz, J.L.: Stationary nonequilibrium gibbsian ensembles. Phys. Rev. 114(5), 1192–1202 (1959). DOI 10.1103/PhysRev.114.1192
Leff, H.S., Rex, A.F.: Maxwell's demon 2: entropy, classical and quantum information, computing. IOP Publishing, London (2003)
Liebovitch, L.S., Fischbarg, J., Koniarek, J.P.: Ion channel kinetics: a model based on fractal scaling rather than multistate Markov processes. Math. Biosci. 84(1), 37–68 (1987). DOI 10.1016/0025-5564(87)90042-3. URL http://www.sciencedirect.com/science/article/B6VHX-45F51WT-47/2/1748db2fd83bd3a67841eb30f9b348df
Lu, N., Kofke, D.A.: Accuracy of free-energy perturbation calculations in molecular simulation. I. Modeling. J. Chem. Phys. 114(17), 7303–7311 (2001). DOI 10.1063/1.1359181. URL http://link.aip.org/link/?JCP/114/7303/1
Luzzi, R., Áurea R. Vasconcellos, ao Ramos, J.G.: Predictive statistical mechanics: a nonequilibrium ensemble formalism. Kluwer, Dordrecht (2002)
Lyman, E., Ytreberg, F.M., Zuckerman, D.M.: Resolution exchange simulation. Phys. Rev. Lett. 96, 028,105 (2006)
Mackey, M.C.: The dynamic origin of increasing entropy. Rev. Mod. Phys. 61(4), 981 (1989). DOI 10.1103/RevModPhys.61.981
Maes, C.: The fluctuation theorem as a gibbs property. J. Stat. Phys. 95(1/2), 367–392 (1999). DOI 10.1023/A:1004541830999
Mehra, J.: Josiah Willard Gibbs and the foundations of statistical mechanics. Foundations of Physics 28(12) (1998)
Minh, D.D.L., Chodera, J.D.: Optimal estimators and asymptotic variances for nonequilibrium path-ensemble averages. The Journal of Chemical Physics 131(13), 134,110 (2009). URL http://dx.doi.org/10.1063/1.3242285
von Neumann, J.: Mathematical Foundations of Quantum Mechanics. Princeton Univ. Press (1996). Translated by Robert T. Beyer
Niven, R.K.: Steady state of a dissipative flow-controlled system and the maximum entropy production principle. Phys. Rev. E 80(2), 021,113 (2009). DOI 10.1103/PhysRevE.80.021113
Omnès, R.: Logical reformulation of quantum mechanics. i. foundations. Journal of Statistical Physics 53, 893–932 (1988). URL http://dx.doi.org/10.1007/BF01014230. 10.1007/BF01014230
Onsager, L.: Reciprocal relations in irreversible processes. I. Phys. Rev. 37(4), 405–426 (1931). DOI 10.1103/PhysRev.37.405
Pohorille, A., Darve, E.: A Bayesian approach to calculating free energies in chemical and biological systems. AIP Conference Proceedings 872(1), 23–30 (2006). DOI 10.1063/1.2423257. URL http://link.aip.org/link/?APC/872/23/1
Poincaré, H.: Science and hypothesis. C. Scribner's Sons, New York (1907). English translation by Sir Joseph Lamor
Pólya, G.: Mathematics and Plausible Reasoning. Princeton Univ. Press (1954). 2 vols
Robin, W.A.: Non-equilibrium thermodynamics. J. Phys. A: Math. Gen. 23, 2065–2085 (1990)
Rogers, D.M., Beck, T.L.: Modeling molecular and ionic absolute solvation free energies with quasi-chemical theory bounds. J. Chem. Phys. 129(13), 134505 (2008). DOI 10.1063/1.2985613. URL http://link.aip.org/link/?JCP/129/134505/1
Rogers, D.M., Beck, T.L.: Resolution and scale independent function matching using a string energy penalized spline prior. ArXiv e-print (2010)
Rogers, D.M., Rempe, S.B.: A first and second law for nonequilibrium thermodynamics: Maximum entropy derivation of the fluctuation-dissipation theorem and entropy production functionals (2011). Submitted, eprint:255413
Roux, B., Allen, T., Bernèche, S., Im, W.: Theoretical and computational models of biological ion channels. Quarterly Rev. Biophys. 37(01), 15–103 (2004). DOI 10.1017/S0033583504003968
Schrödinger, E.: Statistical thermodynamics. Cambridge (1967)
Shirts, M.R., Chodera, J.D.: Statistically optimal analysis of samples from multiple equilibrium states. The Journal of Chemical Physics 129(12), 124,105 (2008). URL http://dx.doi.org/10.1063/1.2978177
Sriraman, S., Kevrekidis, I.G., Hummer, G.: Coarse master equation from Bayesian analysis of replica molecular dynamics simulations. J. Phys. Chem. B 109(14), 6479–6484 (2005). DOI 10.1021/jp046448u. URL http://pubs.acs.org/doi/abs/10.1021/jp046448u. PMID: 16851726
Torrie, G.M., Valleau, J.P.: Nonphysical sampling distributions in monte carlo free-energy estimation: Umbrella sampling. J. Comput. Phys. 23(2), 187–199 (1977). DOI 10.1016/0021-9991(77)90121-8. URL http://www.sciencedirect.com/science/article/B6WHY-4DDR2HH-3V/2/0884c987f3a8b098b688e68aa847371f
Trepagnier, E.H., Jarzynski, C., Ritort, F., Crooks, G.E., Bustamante, C.J., Liphardt, J.: Experimental test of hatano and sasa's nonequilibrium steady-state equality. Proc. Nat. Acad. Sci. USA 101(42), 15,038–15,041 (2004)
van Kampen, N.G.: Stochastic Processes in Physics and Chemistry. Elsevier, Amsterdam (2007). 3^rd Ed.
Wonderlin, W., Finkel, A., French, R.: Optimizing planar lipid bilayer single-channel recordings for high resolution with rapid voltage steps. Biophys. J. 58(2), 289–297 (1990). DOI 10.1016/S0006-3495(90)82376-6. URL http://www.sciencedirect.com/science/article/B94RW-4V8S1XS-1/2/bfea0f3a475894b091e6d33a4ea759b0
Ytreberg, F.M., Zuckerman, D.M.: Single-ensemble nonequilibrium path-sampling estimates of free energy differences. J. Chem. Phys. 120, 10,876 (2004). DOI 10.1063/1.1760511. Note: JCP 121,5022(2004) corrects the Metropolis criterion in the text above Eq. 9.
Zuckerman, D.M., Woolf, T.B.: Dynamic reaction paths and rates through importance-sampled stochastic dynamics. J. Chem. Phys. 111(21), 9475–9484 (1999). DOI 10.1063/1.480278. URL http://link.aip.org/link/?JCP/111/9475/1
Zwanzig, R.: Ensemble method in the theory of irreversibility. J. Chem. Phys. 33(5), 1338–1341 (1960). DOI 10.1063/1.1731409. URL http://link.aip.org/link/?JCP/33/1338/1
