Abstract
We construct a thermodynamic potential that can guide training of a generative model defined on a set of binary degrees of freedom. We argue that upon reduction in description, so as to make the generative model computationally manageable, the potential develops multiple minima. This is mirrored by the emergence of multiple minima in the free energy proper of the generative model itself. The variety of training samples that employ binary degrees of freedom is ordinarily much lower than the size 2 of the full phase space. The nonrepresented configurations, we argue, should be thought of as comprising a high-temperature phase separated by an extensive energy gap from the configurations composing the training set. Thus, training amounts to sampling a free energy surface in the form of a library of distinct bound states, each of which breaks ergodicity. The ergodicity breaking prevents escape into the near continuum of states comprising the high-temperature phase; thus, it is necessary for proper functionality. It may, however, have the side effect of limiting access to patterns that were underrepresented in the training set. At the same time, the ergodicity breaking within the library complicates both learning and retrieval. As a remedy, one may concurrently employ multiple generative models—up to one model per free energy minimum.
1 Motivation
Training sets and empirical data alike are often processed using representations that do not have an obvious physical meaning or are not optimized for the specific application computation-wise. Of particular interest are binary representations of information as would be pertinent to digital computation. Not only do the values of the binary variables depend on the detailed digitization recipe, but the number of training samples will usually be vastly smaller than the size of the full phase space available, in principle, to binary variables. Given this, one is justified in asking whether a reduced description exists that uses a relatively small number of variables and parameters to efficiently document the empirically relevant configurations. At the same time, it is desirable for the reduced description to be robust with respect to the choice of a discretization procedure that is used to present the original data set.
The problem of finding reduced descriptions is relevant for all fields of knowledge, of course, and is quite difficult in general. For example, the state of an equilibrated collection of particles is unambiguously specified by the expectation value of local density in a broad range of temperature and pressure (Evans, 1979). Particles exchange places on timescales comparable to or shorter than typical vibrational times, implying it is unnecessary to keep track of the myriad coordinates of individual particles. The equilibrium density profile is a unique, slowly varying function of just three spatial coordinates. Yet under certain conditions, the translational symmetry becomes broken: One may no longer speak of an equilibrium density profile that is unique or smooth. Instead, one must keep track of a large collection of distinct, rapidly varying density profiles, each of which corresponds to a metastable solid; these profiles can be regarded as equilibrated with respect to particles’ vibrations but not translations. For instance, in a glassy melt (Lubchenko, 2015; Lubchenko & Wolynes, 2007), the number of alternative, metastable structures scales exponentially with the system size while the free energy surface becomes a vastly degenerate landscape that breaks ergodicity.
Here we address the problem of finding reduced descriptions in the context of machine learning. The complete description in the present setup is realized, by construction, through a generative model that is a universal approximator to an arbitrary digital data set. That is, one can always choose such values for the model’s parameters that the model will eventually have generated any given ensemble of distinct binary sequences of length $N$. The generative model is in the form of an Ising spin-based energy function, each spin representing a binary number. Ising spin-based generative models have been employed for decades (Hopfield, 1982; Laydevant et al., 2024; Mohseni et al., 2022), of course. The present energy function has the functional form of the higher-order Boltzmann machine (Sejnowski, 1986) and generally contains every possible combination of the spins. The learning rules are, however, different in that the coupling constants are deterministically expressed through the log weights of individual sequences in the ensemble we want to reproduce. The retrieval is performed by Gibbs-sampling the Boltzmann distribution of the resulting energy function at a nonvanishing temperature.
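The deterministic link between log weights and couplings can be sketched concretely. In the toy below, the energy of each configuration is set to $-T\ln w_j$ and then decomposed over all products of spins using the orthogonality of parity functions; the function names, the sign convention, and the choice $T = 1$ are illustrative assumptions rather than the paper's equation 2.11 verbatim.

```python
import itertools, math

def couplings_from_weights(weights, T=1.0):
    """Invert E(j) = -T ln w_j into one coupling per spin subset A, using
    the orthogonality of the parity functions prod_{i in A} s_i.
    `weights` must map every (+1/-1) tuple of length n to a positive weight."""
    n = len(next(iter(weights)))
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    J = {}
    for A in subsets:
        acc = 0.0
        for s, w in weights.items():
            parity = 1
            for i in A:
                parity *= s[i]
            acc += (-T * math.log(w)) * parity
        J[A] = acc / len(weights)        # len(weights) == 2**n by assumption
    return J

def energy(J, s):
    """Reassemble E(s) = sum_A J_A prod_{i in A} s_i."""
    total = 0.0
    for A, jA in J.items():
        parity = 1
        for i in A:
            parity *= s[i]
        total += jA * parity
    return total
```

The round trip $E(s) = -T\ln w(s)$ is exact whenever all $2^N$ weights are supplied, which is precisely why such a construction is a universal approximator, and also why it is impractical at scale.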
This study begins by constructing effective thermodynamic potentials whose arguments are the parameters of the complete generative model. Each of these potentials is uniquely minimized by the optimal values of the coupling constants—in the complete description—and can be generalized so as to reflect correlations among distinct subensembles. We note that effective thermodynamic potentials for a variety of generative models have been considered in the past. These are exemplified by the Helmholtz machine (Dayan et al., 1995) or the loss function for the restricted Boltzmann machine (Montúfar, 2018), among others.
Specific inquiries during retrieval in the present generative model are made by imposing a constraint of user’s choice. Of particular interest are constraints in the form of an additive contribution to the energy function that can stabilize a particular combination of the spins. This is analogous to how in a particle system, one can use an external—or “source”—field to stabilize a desired density profile (Lubchenko, 2015; Evans, 1979). If the system is ergodic, one may then use a Legendre transform to obtain a free energy as a function of the density profile. The latter procedure is a way to obtain a description in terms of variables of interest. At the same time, it represents a type of coarse graining. Likewise, here we employ appropriate source fields to produce a description in terms of variables of interest. The new degrees of freedom reflect weighted averages of the original spin degrees; thus their energetics are governed by a free energy. We show that when the source fields are turned off, this free energy becomes the thermodynamic potential for the coupling constants. Thus, learning and retrieval, respectively, can be thought of as minimizations on a conjoint free energy surface.
The total number of the coupling constants in the complete description, $2^N$, becomes impractically large already for trivial applications, which then prompts one to ask whether the description can be reduced in some controlled way. The most direct way to reduce the description is to simply omit some terms from the energy function; the number of such terms increases combinatorially with the order of the interaction. We show that following a reduction in description, however, the free energy will increase nonuniformly over the phase space so as to develop multiple minima of comparable depth. By co-opting known results from statistical mechanics, we argue that depending on the application, the number of minima could be so large as to scale exponentially with the number of variables.
The multiplicity of minima makes the choice of an optimal description ambiguous. Conversely, a successful reduction in description (Merchan & Nemenman, 2016; Davtyan et al., 2012) implies that the underlying interactions are low rank.
Using the conjoint property of the free energy of learning and retrieval, respectively, we provide an explicit, rather general illustration of how the appearance of multiple minima in the learning potential also signals a breaking of ergodicity in the phase space spanned by the actual degrees of freedom of the model. Hereby the phase space of the spin system becomes fragmented into regions separated by free energy barriers (Goldenfeld, 1992). Consequently, escape rates from any given free energy minimum can become very low because they scale exponentially with the temperature and the parameters of the model (Chan & Lubchenko, 2015). Ergodicity breaking has been observed in restricted Boltzmann machines (Béreux et al., 2023; Decelle & Furtlehner, 2021), thus suggesting the latter machines represent reduced descriptions.
The ergodicity breaking implies that quantities such as the energy and entropy, among others, are no longer state functions; instead, they could at best be thought of as multivalued functions. Hence the notions of free energy and entropy—as well as the associated probability distribution—all become ill defined. Indeed, having just two alternative free energy minima in a physical system already corresponds to a coexistence of two distinct physical phases. Consider, for example, water near liquid-vapor coexistence at the standard pressure. The entropy of the system can vary by about ten $k_B$ per particle, depending on the phase, according to the venerable Trouton’s rule for the entropy of boiling (Berry et al., 1980). Thus, the entropy and the free energy, as well as many other thermodynamic quantities, are poorly defined. The uncertainty in the density of the system is particularly dramatic, about three orders of magnitude (Lubchenko, 2020).
As a potential remedy for the fragmentation of the space of the coupling constants caused by the reduction in description, one may employ a separate generative model for an individual minimum, or phase. Such individual models would each be ergodic and can be thought of as a Helmholtz machine (Dayan et al., 1995), since free energies can be defined unambiguously for a single-phase system. Running the machine would then be much like consulting multiple experts at the same time. Such multiexpert inquiries may in fact be unavoidable when distinct models employ sufficiently dissimilar interactions. Consider liquid-to-solid transitions as an example. Not only are such transitions intrinsically discontinuous (Lubchenko, 2015), but practical descriptions of the two respective phases involve altogether different variables (Lubchenko, 2017), as alluded to above.
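The one-model-per-minimum remedy can be sketched as a mixture of ergodic experts. The free-energy-weighted selection below is a hypothetical protocol, with `free_energies` and `samplers` standing in for per-phase quantities the text does not spell out in this excerpt.

```python
import math, random

def sample_mixture(free_energies, samplers, T=1.0, seed=0):
    """Pick one ergodic expert with probability proportional to exp(-F_k/T),
    then draw a configuration from it. `samplers[k]` is any callable that
    returns a configuration from phase k (a stand-in for a per-phase model)."""
    rng = random.Random(seed)
    ws = [math.exp(-F / T) for F in free_energies]
    Z = sum(ws)
    r, acc = rng.random() * Z, 0.0
    for w_k, draw in zip(ws, samplers):
        acc += w_k
        if r <= acc:
            return draw()
    return samplers[-1]()     # numerical-roundoff fallback
```

Each expert, being confined to a single free energy minimum, can sample ergodically; the mixture step restores access to all phases without requiring activated barrier crossings.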
Since the number of possible configurations of $N$ bits, $2^N$, becomes huge even for modest values of $N$, the vast majority of all possible configurations of the system are automatically missing from the training set. Because of their thermodynamically large entropy, we argue, these nonrepresented configurations can overwhelm the output of the machine. To avoid instabilities of this sort, the nonrepresented states must be placed at higher energies than the represented configurations; furthermore, the two sets of configurations should be separated by an extensive energy gap. Thus, the missing configurations comprise a higher-temperature phase; this also implies a breaking of ergodicity. The robustness of the generative model, then, comes down to preventing a transition toward this high-temperature phase. At a fixed temperature, such stability is achieved by parameterizing the spectrum of the nonrepresented states so as to make the energy gap sufficiently large. Conversely, at a fixed value of the gap, the retrieval temperature must be set below the transition temperature between the low-entropy and high-entropy phases, respectively. This will, however, limit one’s ability to retrieve those configurations that were relatively underrepresented in the training set.
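The gap argument can be made quantitative with a two-level toy spectrum: represented states at energy zero with log-multiplicity set by the number of trained patterns, and nonrepresented states lifted by an extensive gap with log-multiplicity close to $N\ln 2$. All numbers below are illustrative assumptions.

```python
import math

def transition_temperature(gap, log_mult_high, log_mult_low):
    """Free energies of a two-level toy spectrum cross where
    E_low - T*S_low = E_low + gap - T*S_high, that is, at
    T_c = gap / (S_high - S_low), with entropies as log multiplicities."""
    return gap / (log_mult_high - log_mult_low)

# Illustrative numbers: N = 20 bits, 100 trained patterns at energy 0,
# the remaining ~2^20 configurations lifted by an extensive gap of 10.
N, n_patterns, gap = 20, 100, 10.0
Tc = transition_temperature(gap, N * math.log(2), math.log(n_patterns))
# Retrieval stays in the low-temperature (trained) phase as long as T < Tc;
# enlarging the gap, or lowering T, stabilizes the library of bound states.
```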
The article is organized as follows. In section 2, we construct a conjoint free energy surface whose arguments are the parameters of the generative model and, at the same time, coarse-grained values of the original degrees of freedom the complete description operates on. The formalism allows one to standardize weights of individual configurations in a data set as a means to implement calibration of sensors as well as manage bias or duplication in the data, if any. In section 3, we find ergodicity breaking in the simplest possible realization of a reduced description. We make a connection with the ergodicity breaking during the physical phenomenon of phase coexistence and discuss implications for machine learning. Section 4 provides a thermodynamically consistent treatment of incomplete data sets and argues that knowledge can be thought of as a library of bound states whose free energy is lower than the free energy of the nonrepresented states. Section 5 provides a summary and some perspective.
2 Thermodynamics of Learning and Retrieval
2.1 Setup of the Generative Model
the summation being over the points comprising the Hamming space. We will consistently use Latin indices to label points in the Hamming space, as well as any other quantities assigned to those points.
By construction, the weight of configuration $j$ is intended to specify what a chemist would call its “mole fraction.” The weight may or may not be associated with a probability, depending on the context; the weights are assigned according to a user-defined convention, as would be mole fractions in chemistry, where one must explicitly specify what is meant by a “species,” “particle,” and so on. We set aside until section 4 the obvious issue that even for a very modestly sized system, obtaining or storing $2^N$-worth of quantities is impractical. For now, we simply assume that for configurations not represented in the data set, the respective weights are assigned some values of one’s choice. In any event, the quantities $n_j$ do not have to be integer. For instance, pretend we are teaching a machine, by example, the behavior of the inverter gate. Four distinct configurations are possible in principle: (1) ↑↑, (2) ↑↓, (3) ↓↑, and (4) ↓↓, where one arrow stands for the input bit and the other arrow for the output bit. For concreteness, let us set $n_j$ at the number of times configuration $j$ was presented in the set. Suppose the training set presents these four configurations with some given counts. It will be useful to regard the count assigned to a nonrepresented configuration as an adjustable parameter, even if we eventually adopt for it a fixed value that is very small relative to the rest of the $n_j$’s.
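A minimal sketch of this bookkeeping, assuming configurations are tuples of $\pm 1$ and that every nonrepresented configuration receives a small, adjustable floor weight (the helper name and the numbers are hypothetical):

```python
import itertools

def weight_table(counts, n_bits, floor):
    """Weights over all 2^n configurations: observed counts where given, a
    small adjustable floor elsewhere, normalized to mole fractions.
    Configurations are (+1/-1) tuples; counts need not be integers."""
    table = {s: counts.get(s, floor)
             for s in itertools.product((-1, 1), repeat=n_bits)}
    total = sum(table.values())
    return {s: w / total for s, w in table.items()}

# Inverter-gate example: only the two correct configurations are observed.
fractions = weight_table({(1, -1): 9, (-1, 1): 9}, n_bits=2, floor=0.1)
```

Shrinking `floor` pushes the nonrepresented configurations to ever higher energies, anticipating the gap construction of section 4.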
as “weighted sums,” or “averages,” or “expectation values.”
It is not immediately obvious how much the coefficients from equation 2.11—and hence the generative model itself—would vary among data sets originating from distinct sources or experiments that one may nonetheless deem qualitatively similar or even equivalent. In the happy event that the variation is indeed small, such robustness could be thought of, by analogy with thermodynamics, as the couplings being subject to a smoothly varying free-energy surface. With the aim of constructing such a free-energy surface, we next discuss a variety of ways to connect our generative model with a data set and, conversely, how to retrieve patterns learned by the model.
2.2 Data Retrieval and Calibration: Learning Rules
Retrieval in the setup above is performed by Gibbs sampling, at a fixed temperature, the energy function obtained with the help of equation 2.6, after subtracting from it a suitable reference quantity. If parallel tempering is used, the temperature entering the energy function itself is fixed at the target temperature, of course.
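As a concrete stand-in for the retrieval step, here is a single-spin heat-bath Gibbs sampler for an energy function restricted, for brevity, to fields and pairwise couplings; the interface is an illustrative assumption, not the paper's implementation.

```python
import math, random

def gibbs_sample(J, h, T, steps, seed=0, record_from=None):
    """Heat-bath Gibbs sampling of E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i.
    `J` maps ordered index pairs (i, j), i < j, to couplings; `h` lists fields.
    Returns the configurations recorded after `record_from` steps."""
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    trace = []
    for t in range(steps):
        i = rng.randrange(n)
        # Local field on spin i from its neighbors plus the external field.
        field = h[i] + sum(J.get((min(i, j), max(i, j)), 0.0) * s[j]
                           for j in range(n) if j != i)
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field / T))
        s[i] = 1 if rng.random() < p_up else -1
        if record_from is not None and t >= record_from:
            trace.append(tuple(s))
    return trace
```

For a strongly coupled pair at low temperature, the recorded configurations are overwhelmingly aligned, as the Boltzmann distribution dictates.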
The standard value serves to calibrate the weight of a given configuration, if presented, in the data set; the standard values are nonvanishing, by construction. Let us provide several examples of where such calibration is useful or even necessary: (1) If distinct configurations are detected using separate sensors, the outputs of the latter sensors must be calibrated against each other. To drive this point home, imagine that two or more detectors operate using different physical phenomena. For instance, one may determine temperature by using a known equation of state for a material or by spectroscopic means, among many others. Outputs of distinct detectors can be and must be mutually calibrated where the respective ranges of detection overlap. (2) Calibration may be necessary to manage duplication in the data set or a bias, if any, in the acquisition or publication of the data (see Beker et al., 2022, for a discussion of such a bias in the context of using machine learning to predict optimal reaction conditions). (3) One can think of the corresponding quantity as the intrinsic entropy of a given state, according to the second equality in equation 2.12. The latter entropy could, for instance, reflect the log-number of states of a hidden degree of freedom when the visible variables happen to be in the configuration in question. (4) Variation with respect to the “local” energy difference is analogous to the material derivative, whereby the local reference energy can be thought of as specifying context. This is useful if the inputs exhibit correlations and/or one wishes to parameterize differences among distinct inputs (see appendix A). There we also compare the present use of standard values to that in chemistry. Finally, one may choose to regard the standard values as inherently distributed, in a Bayesian spirit.
According to equation 2.12, the “local” energy deviations reflect how much the coupling constants would have to be perturbed from their standard values to shift the distribution from its standard form. Suppose, for the sake of argument, the latter shift is due to incoming additional data and that we set the weight $n_j$ at the number of instances of configuration $j$. Then equation 2.11 effectively prescribes a set of learning rules. For instance, a two-body coupling constant will be modified in proportion to the changes in the logarithms $\ln n_j$. We see this learning rule is similar to, but distinct from, the venerable Hebbian rule, in which instances would instead add up cumulatively. Here, in contrast, the coupling constants are weighted by the log-number of instances of reinforcement.
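The contrast with the Hebbian rule can be made concrete on the two-spin example. In the sketch below, a pair coupling proportional to $\sum_j s_1 s_2 \ln n_j$ (one possible sign convention, not equation 2.11 verbatim) changes by $(\ln 10)/4$ when a configuration is reinforced tenfold, rather than by the 81 extra increments a cumulative Hebbian update would accrue.

```python
import math

# Hypothetical counts for the four configurations of two spins (s1, s2).
old = {(1, 1): 1.0, (1, -1): 9.0, (-1, 1): 9.0, (-1, -1): 1.0}
new = dict(old)
new[(1, -1)] = 90.0          # this configuration reinforced tenfold

def pair_coupling(counts, T=1.0):
    # J_12 proportional to sum_j s1 s2 ln n_j: log-weighted, not cumulative.
    return T / 4.0 * sum(s[0] * s[1] * math.log(n) for s, n in counts.items())

dJ = pair_coupling(new) - pair_coupling(old)
# |dJ| = (ln 10)/4: a tenfold reinforcement shifts the coupling by a fixed
# logarithmic amount, regardless of how many raw instances arrived.
```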
2.3 Free Energy of Learning
According to equations 2.11 and 2.12, there is a one-to-one correspondence between the weights and the coupling constants. Thus equation 2.11 can be viewed as the solution of a minimization problem in which one minimizes the thermodynamic potential with respect to the coupling constants while keeping exactly one coupling constant fixed. (It is most convenient to fix the zeroth-order coupling, which simply specifies an overall multiplicative factor for the numbers $n_j$ and, hence, does not affect the weights.) In this sense, one can think of the potential as being able to guide a learning process in principle.
What is the meaning of the distribution of the weights that one may nominally associate with the potential? Analogous distributions arise in thermodynamics as a result of the canonical construction. Hereby one effectively considers an infinite ensemble of distinct but physically equivalent replicas of the system (McQuarrie, 1973). The width of the distribution should then be formally thought of as reflecting variations of the weights among distinct replicas. To generate such ensembles in practice, one can, for instance, break up a very large data set into smaller—but still large—partial data sets. In some cases, data may exhibit correlations due to implicit or hidden variables, such as the time and place of collection. If so, breaking data sets into pertinent subsets may reveal correlations among fluctuations of the weights—from subset to subset—that are not necessarily captured by the ideal mixture-like expression, equation 2.29. Adopting a particular calibration scheme for the inputs of the sensors may also introduce correlations among weight fluctuations. We outline, in appendix A, a possible way to modify the free energy so as to account for the latter correlations.
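The replica construction by data splitting can be sketched directly: partition the samples, measure per-subset weights, and inspect their spread. The function name and the subset count are illustrative assumptions.

```python
import random
from collections import Counter

def weight_fluctuations(samples, n_subsets=10, seed=0):
    """Split a data set into equal subsets and report, per configuration,
    the mean and variance of its empirical weight across subsets."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    size = len(samples) // n_subsets
    per_subset = []
    for b in range(n_subsets):
        chunk = samples[b * size:(b + 1) * size]
        counts = Counter(chunk)
        per_subset.append({k: v / len(chunk) for k, v in counts.items()})
    stats = {}
    for k in set().union(*per_subset):
        ws = [d.get(k, 0.0) for d in per_subset]
        mean = sum(ws) / len(ws)
        var = sum((w - mean) ** 2 for w in ws) / len(ws)
        stats[k] = (mean, var)
    return stats
```

Covariances between the weights of distinct configurations, computed the same way, would flag the hidden-variable correlations discussed in the text.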
2.4 Coarse-Graining, Choice of Description, and Inquiries
Equations 2.48 through 2.52 thus complete the formal task of building a conjoint thermodynamic potential for the coupling constants, on the one hand, and for the expectation values of the degrees of freedom, on the other hand. The thermodynamic potentials and can be thought of as constrained free energies of the system, where the constraint is due to deviations from the standard model. We see that the constrained versions of the Helmholtz and Gibbs energies are in the same relation to each other as their unconstrained counterparts.
Relation 2.42 can be profitably thought of as specifying stationary points of a tilted free energy surface as a function of the magnetizations, where the source fields are treated as constants representing externally imposed fields. If there is only one stationary point, it corresponds to a unique minimum; the depth of the minimum is then equal to the equilibrium Gibbs energy. If, however, a portion of the surface happens to exhibit a negative curvature, then there is a way to tilt the latter surface so that it will now exhibit more than one minimum. If such additional minima are in fact present—implying equation 2.42 has multiple solutions—one is no longer able to unambiguously define the Gibbs energy or other thermodynamic quantities including the entropy and energy. Consequently, the coarse-graining scheme itself becomes ambiguous. These notions apply equally to the constrained free energies, per equations 2.50 to 2.52, which is central to the discussion of reduced descriptions in section 3.
Equations 2.40 and 2.37 also embody a particular way to implement specific inquiries in the present formalism. Suppose, for the sake of argument, that we wish to recover a pattern from a cue in the form of a subset of spins, each fixed in a certain direction, similar to how one would retrieve in the Hopfield network (Hopfield, 1982). This would formally correspond to setting the corresponding source fields to large positive or negative values, the two options corresponding to the spin polarized up and down, respectively. More generally, the setup in equation 2.37 allows one to impose a broad variety of constraints, whose rigidity can be tuned by varying the magnitude of the pertinent source field.
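A minimal sketch of such a source-field inquiry, assuming the cue is encoded as per-spin fields added to an arbitrary base energy function (names are hypothetical):

```python
def tilted_energy(energy, fields):
    """Add a source-field term -sum_i h_i s_i to an energy function to impose
    a cue; a large |h_i| effectively clamps spin i, Hopfield-style, while a
    moderate h_i only biases it, giving the constraint tunable rigidity."""
    def tilted(s):
        return energy(s) - sum(h * si for h, si in zip(fields, s))
    return tilted

# Clamp the first spin up while leaving the second free (flat base energy).
E = tilted_energy(lambda s: 0.0, [5.0, 0.0])
```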
3 Reduced Descriptions Break Ergodicity
The number of coupling constants one can employ in practice is much less than the $2^N$ parameters constituting the full description from equation 2.11. This implies that practical descriptions are reduced essentially by construction. Here we argue that the task of finding such reduced, practical descriptions is, however, intrinsically ambiguous. The section is organized as follows. First, we attempt to emulate a very simple data set using a trial generative model that is missing a key interaction from the actual model underlying the data set. We will observe that in contrast with the complete description, the reduced free energy surface is no longer single-minimum but instead develops competing minima. We next show that the Helmholtz free energy of the reduced model also develops competing minima; these minima correspond exactly to the minima of the learning potential. This connection is then used to show that reduced descriptions transiently break ergodicity, which will act to stymie both training and retrieval. A potential remedy will be proposed.
The free energy surface from equation 2.29, as a function of the coupling constants, for the standard model from equation 3.3. Two select cross-sections are shown: one contains the global minimum of the surface, while the other corresponds to a reduced description in which the two-body interaction is missing.
(a, c) Contour plots corresponding to the two slices from Figure 1. (b) A special intermediate situation, where the surface just begins to develop two distinct minima.
Because of the combinatorial multiplicity of high-rank couplings, we are particularly interested in trial generative models that employ the smallest possible number of high-rank couplings. Thus, for the two spins, we focus on the cross-section of the overall free energy surface in which the two-body coupling is set to zero. As can be seen in Figure 1, the function is minimized not at a unique set of the two remaining variables but instead is bistable. The two minima are exactly degenerate. This means the optimal choice of the remaining coupling constants is no longer obvious. Indeed, already an infinitesimal change in the coupling constants of the standard model from equation 3.3 can result in a discrete change in the position of the lowest minimum.
Suppose, for the sake of argument, that we simplemindedly accept one of the two minima as the result of training and ignore the other minimum. This will stabilize one of the correct configurations of the gate, but it will also make the remaining correct configuration of the gate the least likely one out of the four available configurations during retrieval. As yet another candidate strategy, suppose we accept not the location of an individual minimum as the result of training but, instead, use a Boltzmann-weighted average over the minima. For the generative model, equation 3.3, this would result in accepting the average of the two degenerate minima as the result of training, a nonsensical outcome.
If one sets all the coupling constants of order two and higher to zero in the generative model, equation 2.6, in order to reduce the description or otherwise, the latter model becomes equivalent to the mean-field ansatz, equation 3.4. This circumstance allows us to use the same notation for the thermodynamic potential viewed as a function of either the coupling constants, on the one hand, or the magnetizations, on the other hand.
Dashed lines: a slice of the free energy surface from equation 3.9 for three select values of the temperature. Solid lines: the respective values of the exact Helmholtz energy for the energy function, equation 3.1.
The two minima on the mean-field curve below the critical temperature in Figure 3 are precisely equivalent to the two minima of the potential in the corresponding plane in Figure 1, because the generative model, equation 2.6, with the higher-order couplings set to zero is equivalent to the mean-field ansatz, equation 3.4, as already alluded to.
According to the discussion following equation 2.52, we do not expect the thermodynamics of the setup in equation 3.9 to be well defined below the critical point, where the surface 3.9 has more than one minimum. In physical terms, the nominal expectation value of the magnetization—which remains vanishing even below the critical point—now becomes decoupled from the set of its typical values and, thus, is no longer descriptive of the microscopic behavior of the system. In formal terms, the corresponding Gibbs energy, the entropy, and the energy all become poorly defined and can be thought of, at best, as multivalued under limited circumstances. We flesh out these notions in what follows; the detailed calculations are relegated to appendix B.
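The structure just described can be reproduced with the standard Curie-Weiss mean-field free energy per spin as a stand-in for the surface of equation 3.9; the parameter values below are illustrative. Below the critical temperature the surface develops two minima; above it, a single one.

```python
import math

def mf_free_energy(m, J=1.0, T=0.8, h=0.0):
    """Curie-Weiss mean-field free energy per spin,
    f(m) = -J m^2/2 - h m - T s(m), with s(m) the binary mixing entropy.
    A generic stand-in for the reduced description's surface."""
    entropy = -((1 + m) / 2 * math.log((1 + m) / 2) +
                (1 - m) / 2 * math.log((1 - m) / 2))
    return -J * m * m / 2 - h * m - T * entropy

def count_minima(T, J=1.0, n=2001):
    """Count strict local minima of f(m) on a fine grid of m in (-1, 1)."""
    ms = [-0.999 + 1.998 * i / (n - 1) for i in range(n)]
    f = [mf_free_energy(m, J, T) for m in ms]
    return sum(1 for i in range(1, n - 1)
               if f[i] < f[i - 1] and f[i] < f[i + 1])
```

With the critical temperature at $T_c = J$, a scan of `count_minima` across $T$ locates the bifurcation at which the single minimum splits in two, the situation depicted in Figure 2b.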
Having two distinct minima on the Helmholtz curve implies the Gibbs energy is no longer well defined. Indeed, on the one hand, the typical values of the magnetization are nonvanishing below the critical point already at vanishing source field. On the other hand, the average magnetization should be zero, at vanishing field, by symmetry. This seeming paradox is resolved by noting that symmetry-breaking fields acting on individual spins do not have to be externally imposed but could emerge internally (Goldenfeld, 1992; Lubchenko, 2015); such fields originate from the surrounding spins and come about self-consistently. One may then define a multivalued Gibbs energy, the number of branches determined by the degree of degeneracy of the Helmholtz energy. The two branches of the Gibbs energy—one per each minimum of the Helmholtz energy—are shown in Figure 4 and labeled as pertinent to the right-hand-side and left-hand-side minimum, respectively. Each of the two branches thus corresponds to a restricted Gibbs energy computed for a single phase. Indeed, the differential relation in equation 2.52 can sample the Helmholtz energy only within an individual minimum. Sampling the phase space within an individual minimum can be thought of as vibrational relaxation, a relatively fast process (He & Lubchenko, 2023). In contrast, crossing over to the other minimum requires a discontinuous transition across the mechanically unstable region delineated by the spinodals (see Figures 5 and 9). In terms of kinetics, such discontinuous transitions are slow because they require activation (He & Lubchenko, 2023; see also below).
The two branches of the restricted Gibbs energy, labeled by the corresponding minima. The equilibrium Gibbs energy from equation 3.11 is shown for select system sizes; the corresponding curves are masked by the mean-field curves.
The magnetization, per spin, as a function of the source field below the critical point. The mean-field curves correspond to the two minima (red and blue) and to the mechanically unstable region separating the spinodals (pink); portions of this curve are either metastable or unstable. The equilibrium values for the energy function, equation 3.10, are given for several system sizes.
The equilibrium Helmholtz free energy for the energy function from equation 3.10, compared with the actual free energy cost to have a given magnetization and with the mean-field Helmholtz free energy. The horizontal axis refers to the magnetization along the same direction as in Figure 3.
(a) The internal energy and (b) the entropy. The mean-field energy and entropy corresponding to the right-hand-side and left-hand-side minimum of the free energy in Figure 6 are labeled with superscripts. The equilibrium energy and entropy are labeled using the system size they were computed for. (The curves are masked by the mean-field curves.)
Parametric graph of the entropy as a function of energy from Figure 11. The dots show the model, discrete energy distributions that were used to compute the restricted and the equilibrium quantities, respectively. The blue and orange dots show the locations corresponding to a common temperature. The location, energy-wise, of the false states was parameterized so that they are in equilibrium with the true states at that temperature.
(a) Signs of the exact polarizations as functions of the source field for the energy function, equation 3.1, shown using colors (red, blue, and purple areas). The counterparts of the region boundaries pertaining to the likeliest mean-field magnetizations are shown using dashed black lines. The line connecting the two yellow dots corresponds to a discontinuity in the most probable value of the magnetization and, as such, is a phase boundary. (b) The magnetization as a function of the source field. The solid and dashed lines correspond to the stable and metastable minimum, respectively, of the mean-field free energy surface.
We see the equilibrium free energies each provide a lower bound on the restricted free energies—convex-up for Gibbs and convex-down for Helmholtz—as they should (Landau & Lifshitz, 1980). Thus, when the distribution of the magnetization becomes bimodal, the equilibrium Helmholtz energy no longer reflects the actual distribution. Instead, the central portion of the equilibrium Helmholtz energy becomes a shallow bottom that largely interpolates between the two minima of the restricted Helmholtz energy . The mean-field free energy , on the other hand, has a qualitatively correct form and, we see, tends asymptotically to the actual free energy cost— for large , as it should. Transitions between the two free energy minima of are accompanied by an extensive change of total magnetization and thus can be induced by varying the external field by a small amount that scales as ; this notion is consistent with Figure 5. Consequently, the variation of the equilibrium Helmholtz energy, per particle, along its shallow bottom scales as with the system size . Thus, it vanishes in the thermodynamic limit . Consequently, the equilibrium susceptibility becomes poorly defined even though the susceptibilities of individual phases each remain well defined and finite.
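The bimodality of the restricted Helmholtz energy below the critical point can be illustrated with a minimal Curie-Weiss-type sketch; the per-spin free energy form, the coupling J = 1, and the temperatures used below are illustrative assumptions, not parameters of equation 3.1:

```python
import numpy as np

def mean_field_free_energy(m, J=1.0, T=0.8):
    """Per-spin restricted (mean-field) Helmholtz energy at fixed
    magnetization m: interaction energy minus T times the mixing
    entropy of up/down spins.  J and T are illustrative values."""
    p = (1.0 + m) / 2.0
    s = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return -0.5 * J * m**2 - T * s

m = np.linspace(-0.999, 0.999, 2001)

def count_minima(T):
    """Count strict interior minima of the free energy on the grid."""
    f = mean_field_free_energy(m, T=T)
    interior = (f[1:-1] < f[:-2]) & (f[1:-1] < f[2:])
    return int(interior.sum())

# Two minima below the mean-field critical point T_c = J, one above.
print(count_minima(0.8), count_minima(1.2))
```

Below T_c the two minima are separated by a barrier at m = 0, which is why the equilibrium (convex) free energy develops the shallow interpolating bottom described above.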
Thus, when ergodicity is broken, the notions of free energy and thermodynamic quantities become poorly defined (see the discussion following equation 2.52). We show the restricted and equilibrium values of the internal energy and the entropy in Figures 7a and 7b, respectively. These graphs directly convey that already for a generative model as simple as the one in equation 3.1, a reduced description does not allow for a Helmholtz machine (Dayan et al., 1995), since the basic thermodynamic quantities, such as the energy and entropy, cannot be properly defined.
In a finite system, the barriers separating distinct free energy minima are finite, even if tall, implying that ergodicity is broken only transiently. Ergodicity is restored on timescales that grow exponentially with the parameters of the problem and can become very long already for modestly sized systems and/or following small temperature variations. We illustrate these notions in appendix C. There we also demonstrate that transitions among distinct free energy minima are rare events that occur via nongeneric sequences of individual spin flips.
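A quick Metropolis sketch conveys how long-lived this trapping is. The fully connected ferromagnet, its size, the temperature, and the run length below are illustrative choices, not the simulation of appendix C:

```python
import math
import random

def metropolis_run(N=40, T=0.5, steps=20000, seed=0):
    """Metropolis dynamics for a fully connected ferromagnet with
    energy E = -(1/N) * sum_{i<j} s_i s_j; illustrative parameters."""
    rng = random.Random(seed)
    s = [1] * N                 # start inside the all-up minimum
    M = N                       # total magnetization
    for _ in range(steps):
        i = rng.randrange(N)
        dE = 2.0 * s[i] * (M - s[i]) / N   # cost of flipping spin i
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            M -= 2 * s[i]
            s[i] = -s[i]
    return M / N

# Well below the transition, the run stays trapped in the starting
# minimum: the magnetization never changes sign over the whole run.
print(metropolis_run())
```

Reversing the magnetization requires crossing a barrier that is extensive in N, so the escape time grows exponentially with system size, consistent with the timescale argument above.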
Below the critical point, the surface has two minima separated by a finite barrier. Thus, the machine will be able to reproduce only the patterns belonging to the now-reduced phase space. Specifically which minimum will be found during training is generally a matter of chance; the odds are system specific. For instance, setting at a small, positive value, in the generative model, equation 3.3, will create a bias toward the , minimum of the surface. One can use symmetry considerations (see appendix D) to make a general case that the reduced free energy surface will consist of isolated minima whose curvature is nonvanishing in all directions. This means that the bound states corresponding to the latter minima are compact, shape-wise, in the phase space.
The two minima on the surface , for the model 3.1, correspond to the two “correct” states of the inverter gate, and , respectively. Both minima must be sampled in order for the machine to work adequately. Owing to the one-to-one correspondence between the minima of and , respectively, the two minima on the surface must be sampled for the machine to work properly. Sampling multimodal distributions can become computationally unwieldy already for a modest number of variables. Thus, to efficiently sample multiple free energy minima that are separated by barriers, it may be necessary to employ a separate generative model—call it an “expert”—for each individual minimum. For the underlying model 3.1, one such expert in the form of fields and in equation 3.4, can be used to retrieve the state of the gate, as already mentioned. The other model—in the form of fields and —will retrieve the state.
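As a toy illustration of the expert-per-minimum idea (the configurations and weights below are invented for illustration, not the fields of equation 3.4), each expert samples only its own bound state of the inverter gate:

```python
import random

# Hypothetical per-minimum experts for a two-spin inverter gate.
# The configurations and their weights are invented for illustration.
experts = {
    "up":   {(+1, -1): 0.95, (+1, +1): 0.05},   # bound state around (+, -)
    "down": {(-1, +1): 0.95, (-1, -1): 0.05},   # bound state around (-, +)
}

def sample(expert_name, rng):
    """Draw one configuration from the chosen expert's distribution."""
    states = list(experts[expert_name])
    weights = [experts[expert_name][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

rng = random.Random(1)
print(sample("up", rng), sample("down", rng))
```

Each expert's distribution is unimodal and hence easy to sample; the multimodality of the full model is handled by the choice of expert rather than by barrier crossing.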
The requisite number of experts is thus bounded from above by how many minima free energy landscapes will acquire as a result of reduction in description. Actual physical systems provide insight as to how large this number can be. Note that lowering the temperature in a physical system raises the free energy and thus corresponds to a reduction in description. A variety of physical scenarios can unfold as a set of interacting degrees of freedom is cooled. For instance, translationally invariant Ising-spin models with ferromagnetic, short-range coupling will exhibit two minima below the ergodicity breaking transition, irrespective of the system size. The worst-case scenario is arguably represented by some disordered spin systems, such as the mean-field Potts glasses (Kirkpatrick & Wolynes, 1987), in which the number of minima can scale exponentially with the system size in the broken-ergodicity regime. The disorder can also be self-generated—as in actual finite-dimensional glassy liquids—whereby the free energy surface becomes exponentially degenerate below the dynamical crossover (Lubchenko, 2015; Lubchenko & Wolynes, 2007). In the latter case, a reduced description has in fact been achieved (Singh et al., 1985; Baus & Colot, 1986; Rabochiy & Lubchenko, 2012; Lubchenko, 2015) using a trial density profile of a glassy liquid in the form of a sum of narrow gaussian peaks, each centered at a distinct corner of an aperiodic lattice. The number of distinct, comparably stable aperiodic structures self-consistently turns out to scale exponentially with the system size, consistent with experiment (Xia & Wolynes, 2000; Lubchenko, 2015; Lubchenko & Wolynes, 2007). Each such lattice is mechanically stable and corresponds to a distinct expert.
We provide an elementary illustration in appendix D that the greater the reduction in the description, the more experts are needed for proper functionality. Conversely, one may say that an individual expert operating on a progressively naive model will retrieve a smaller subset of the correct configurations. In any event, we have described elsewhere (He, 2022) a protocol to train sets of experts specifically for binary data sets; these findings are also to be presented in a forthcoming submission.
4 Knowledge as a Library of Bound States
The number of spins in a description depends on the resolution. Even at relatively low resolutions and, hence, modest values of , the number of configurations represented in a realistic data set will be much less than the size of the phase space available to binary variables. Consequently, the vast majority of the energies in the full description (see equation 2.11) are undetermined. Here we argue that the energies of nonrepresented configurations must be explicitly parameterized and that such parameterizations, as well as retrieval protocols, must satisfy rigid constraints in order to avoid instabilities toward learning or retrieval of nonrepresented configurations. One may think of nonrepresented configurations as false positives.
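The disparity is easy to quantify; the spin count N = 100 and the data-set size below are illustrative assumptions, not values from the text:

```python
# Illustrative comparison of the binary phase space against a
# realistic data set; N = 100 and the data-set size are assumptions.
N = 100
phase_space = 2**N             # roughly 1.3e30 configurations
dataset = 10**6                # a generously large training set
print(phase_space // dataset)  # configurations per training sample
```

Even a million-sample data set thus pins down only a vanishing fraction of the energies, leaving the rest to be parameterized explicitly.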
The contributions of the nonrepresented configurations to the sums in equation 2.11 are indeterminate, as already alluded to. Simply omitting the latter contributions from the sums in equation 2.11 would be formally equivalent to assigning a fixed, vanishing value to their energies , irrespective of the detailed parameterization of the represented states. Instead, we take a general approach whereby we treat the energies of the nonrepresented states as adjustable parameters; their values, then, will be chosen so as to satisfy the constraint in equation 4.3.
Equation 4.15 dictates that the canonical energy of the false states in a robust description must be separated by a nonvanishing, extensive gap from the canonical energy of the true states. Indeed, for , the right-hand side of equation 4.15 is numerically close to . We note that for data set parameterizations obeying condition 4.15, the standard entropy, equation 2.19, is dominated by the represented states, implying the present treatment is internally consistent.
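A two-level toy spectrum makes the role of the extensive gap concrete; the state counts, the spin number N, and the gap per spin below are illustrative assumptions, not the parameterization of equation 4.15:

```python
import math

def p_false(T, N=20, K=100, eps=1.0):
    """Probability that a two-level toy model occupies the 'false'
    (nonrepresented) states: K true states at energy 0 versus
    2**N - K false states lifted by an extensive gap eps*N.
    N, K, and eps are illustrative choices, not values from the text."""
    n_false = 2**N - K
    w_false = n_false * math.exp(-eps * N / T)
    return w_false / (K + w_false)

# Below the transition the extensive gap overwhelms the entropy of
# the false states; above it, their sheer multiplicity wins.
for T in (1.0, 2.0, 3.0, 5.0):
    print(T, p_false(T))
```

Because both the gap and the entropy of the false states scale linearly with N, the crossover temperature remains finite in the thermodynamic limit, consistent with the transition described in the text.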
The thermodynamic singularity at defines a dichotomic distinction between the true and false states, even if their respective spectra overlap. Because of its high entropy, the high- phase can be thought of as corresponding to a generic mixture of components. The low- phase, on the other hand, is by construction a mixture with a data set-specific composition, whereby some components are represented much more than others. Thus, at the temperature , the composition of the mixture undergoes an intrinsically discrete change, consistent with the transition exhibiting a latent heat.
The states within the energy gap are strictly inaccessible in equilibrium because at any temperature, there is at least one state whose free energy is lower. This implies that a subset of the true states, those at , could not be observed in equilibrium. This high-energy flank of the true states corresponds, in the calibration 4.5, to low frequency, relatively underrepresented configurations from the data set. Although the latter states can be observed, in principle, at temperatures above —as we illustrate in Figure 12b in appendix E—the true states, as a whole, are, however, only metastable at such temperatures. If kept at , the machine will spontaneously transition into the false states—and thus will begin to produce nonrepresented configurations. The machine is thermodynamically unlikely to retrieve the represented configurations again unless the temperature is lowered below . Thus, we view the robustness of retrieval as conditional on the machine being confined to the portion of the phase space pertaining to the true states, which implies a breaking of ergodicity.
Relaxation profiles for equilibrium Monte Carlo simulations. (a) Single-spin autocorrelation function averaged over individual spins and over . The horizontal dashed line indicates the squared location of an individual minimum of , the limiting height of the plateau when the escape time from an individual minimum diverges. (b) Multispin autocorrelation function .
Canonical entropy and energy as functions of temperature for the discrete energy distribution shown as dots in Figure 8. The restricted, single-phase averages over the lower-energy block are labeled with T and those over the higher-energy block with F. The full equilibrium quantities are labeled with TF.
(a) Thermodynamic potentials , where is an externally fixed temperature, for the true and false states. The restricted entropies are the same as in Figure 8. (b) An expanded view of the low- phase to illustrate that in order to sample “true” states above , one must raise the temperature above and vice versa for the states with . The main graph shows that at , the “true” states are, however, only metastable.
Thus, the totality of the true states can be thought of as a collection—or library (Lubchenko & Wolynes, 2004)—of bound states centered at patterns from the data set. The bounding potential is a free energy since it has an entropic component. The spectrum of the nonrepresented states must be properly parameterized to avoid escape toward those states; such an escape may well occur in an abrupt, avalanche-like fashion owing to the transition being discontinuous. This notion is consistent with an analysis of US Supreme Court data, due to Gresele and Marsili (2017), who used a spin-based generative model of the type considered here. The latter study associates frequencies of configurations with Boltzmann weights. The nonrepresented states—called “unobserved” in Gresele and Marsili (2017)—are omitted from the sums in the expressions for the coupling constants. According to the discussion in the beginning of this section, omitting unobserved states amounts to pinning their energies at ; the latter value happens to be greater than the energies assigned to the observed states. Gresele and Marsili (2017) find that including in the data set configurations whose weight is lower than a certain threshold value causes an instability toward faulty retrieval. Our study suggests that those low-weight states may well be sufficiently close, energy-wise, to the nonrepresented states so as to substantially stabilize them.
The stability criterion, equation 4.16, must be satisfied by descriptions irrespective of whether they employ full or reduced sets of coupling constants. This amounts to an additional constraint when optimizing with respect to the values of the coupling constants and/or the number of experts in a reduced description. These conclusions are consistent with earlier studies of associative memory Hamiltonians (AMH) for protein folding. AMH-based generative models have been used to predict three-dimensional structures of native folds of proteins since the 1980s (Friedrichs & Wolynes, 1989; Davtyan et al., 2012). The native structures are extracted from protein-structure databases, while the nonnative states are emulated by placing residues generically within correct structures. The coupling constants are then determined by maximizing the energy gap separating the native and nonnative states, respectively, relative to the width of the spectrum of the nonnative states.
Finally, we briefly comment on data sets and/or parameterizations thereof where the distribution is multimodal. In such cases, the true states themselves may be best thought of as a collection of distinct phases whose stability relative to each other and to the false states will depend on temperature. When distributed, the numbers can be profitably thought of as numerical labels. See Marsili et al. (2013), Haimovici and Marsili (2015), and Cubero et al. (2019) for an in-depth discussion of such labeling schemes.
5 Summary and Conclusion
We have considered acquisition of knowledge in the form of a generative model that operates on binary variables. The generative model assigns an energy value to each of the configurations of such binary variables. The expression for the energy has the functional form of a high-order Boltzmann machine (Sejnowski, 1986) but employs a non-Hebbian training protocol. We explicitly calibrate the weights of individual configurations within the data set and assign separate energy references to individual configurations of the machine. Thus, we explicitly treat learning as contextual.
We have built a conjoint free energy surface that can guide, in principle, both training and retrieval. The free energy is a function of the coupling constants; it is uniquely minimized by some complete set of coupling constants whose size can be as large as . In practice, the set of the coupling constants must be reduced in size from the value of , with high-order couplings likelier to be removed because of their multiplicity. The resulting free energy can be thought of as a cross-section of the original surface along a submanifold of the original space of the coupling constants. We find that when evaluated within such submanifolds, the free energy has not one but multiple minima. The latter degeneracy is connected to the degeneracy of the free energy as a function of coarse-grained degrees of freedom. This apparent connection between the free energies for learning and retrieval, respectively, can be traced to bounds on the free energy derived early on by Gibbs. We have seen that reduction in description can be thought of as a mean-field approximation.
We have argued that in a consistent treatment, one must explicitly parameterize the energies of configurations not represented in a data set; they cannot simply be regarded as indeterminate. Because of their huge number, the nonrepresented configurations must be destabilized, energy-wise, by an extensive amount relative to the represented states. Thus, the nonrepresented configurations comprise a distinct, high-temperature phase.
Reduction in description leads to ergodicity breaking, which plays a dual role in acquisition of knowledge. Learned patterns can be thought of as a library of bound states centered at configurations from the data set. All other patterns comprise what is essentially a high-entropy continuum. Ergodicity breaking prevents escape into the continuum and, hence, is essential to discriminating correct patterns. At the same time, kinetic barriers separating distinct free-energy minima in the low-temperature phase will result in kinetic bottlenecks for both learning and retrieval. The latter aspect of the ergodicity breaking is detrimental; it also seems to be an inherent feature of contextual learning. To mitigate these detrimental effects, one may have to resort to using separate generative models for distinct free energy minima. The number of minima—and hence the demand for more experts—will typically increase with the degree of reduction in description. It is conceivable that the thermodynamic potentials considered here can be employed to rate the veracity of an individual expert by using the depth of the corresponding free energy minimum as a metric.
The above notions appear to be consistent, for instance, with the limited success machine learning–based force fields have had in predicting structures of inorganic solids. Indeed, the bonding preferences for most elements and the bond orders tend to switch among several, discretely different patterns, in a fashion similar to that shown in Figure 5. An example important in applications is the competition between the tetrahedral and octahedral bonding patterns in intermetallic alloys (Zhugayevych & Lubchenko, 2010). This near discreteness results from cooperative processes (Golden et al., 2017) that are similar to the processes causing the ergodicity breaking we discussed here. Thus, we anticipate that each distinct bonding pattern will require a separate generative model. For instance, for three atoms and two generative models per atom, there would be eight different potential outcomes for the ground state. Furthermore, the analysis in section 4 suggests that successful generative models would have to be trained on at least two sets of structures: One set corresponds to low-energy ordered structures, the other set to high-energy liquid structures. We do not, however, have access to liquid structures at present, which one might view as a fundamentally difficult aspect of the problem of predicting the structure of inorganic solids. This important problem (Maddox, 1988) remains unsolved.
Our work has focused on thermodynamic aspects of learning and retrieval. The corresponding kinetics are highly sensitive to the detailed form of the generative model and the sampling moves. A generic sampling move, during retrieval, will place the system into one of the false states essentially always, because of their multiplicity. The probability of sampling out of a false state into a true state is very low, roughly , corresponding to an entropic free energy barrier . This entropic bottleneck is analogous to Levinthal’s paradox of protein folding (Levinthal, 1969; Wolynes, 1997a), according to which an unfolded protein should never find a native state, even if the latter is thermodynamically favorable. Levinthal’s paradox is, however, resolved by noting that the landscape of the nonnative states of actual proteins is not flat. Instead, its overall shape is funnel-like and minimally frustrated notwithstanding some amount of roughness (Bryngelson et al., 1995; Onuchic et al., 1997; Wolynes, 1997b), a notion at the heart of spectrum parameterization in associative-memory Hamiltonians (Davtyan et al., 2012). Likewise, it appears that machine learning will work well only if the data set itself allows for a funneled free energy landscape or a combination of a modest number of such funnels.
Appendix A: Connection with Chemistry and Generalization of for Correlations in Data Sets
The notions of calibration put forth in section 2 can be compared with how one counts states in thermochemistry. In chemical contexts, one may use the semiclassical result for the density of states to normalize the partition function for a set of classical particles (Landau & Lifshitz, 1980). Thereby, one counts configurations in the convention that there is one state per the volume formed by the thermal de Broglie wavelength of the particle. Since the latter wavelength is inversely proportional to the square root of the mass, one obtains that per every state of the Ne atom, there are states of the Ar atom. But this becomes an entirely moot point in the classical regime, in which all thermodynamic properties of the system are strictly independent of mass! Consistent with this, the entropy of a classical system can be defined only up to an additive constant, and so only entropy differences are meaningful. The relevant length scale for counting states in a classical gas is the typical spacing between like particles since a particle identity can be established only if no other particles of the same species are nearby (Lubchenko, 2015, 2020). Thus, the chemist ties standard states to densities at the onset of the calculation; equation 2.12 has a similar purpose.
The coefficients in the quadratic expansion in equation A.3 imply nontrivial correlations among fluctuations of the weights. One may imagine how such correlations can arise owing to intrinsic uncertainties in calibrating detectors. Consider, for instance, Shockley’s setup of a self-guiding missile or face-recognition device (Brock, 2021), in which the image collected by the device’s camera is passed through a film containing the image of the intended target. The accuracy of the aim is assessed by measuring the intensity of the light that has passed through the film. One must set a separate intensity standard for each individual target or even the very same target depending on the lighting conditions. A universal device, capable of processing multiple images and/or lighting conditions, would then require a floating calibration scheme for the input. Ideally the intensity standard should vary smoothly with variations in the image. Incidentally, “elastic” algorithms to align or match images have been discussed since the early 1980s (Burr, 1981; Moshfeghi, 1991). When alignment of two images requires deletions or insertions, the latter may be thought of as “lattice defects,” by analogy with continuum mechanics.
The standard values are set by the calibration convention for the outputs. The choice of the source fields remains flexible and, hence, can be used to implement a particular floating-calibration scheme for the inputs. The reference values should be equal to each other within an error that, ideally, increases smoothly with the difference between two supposedly similar images, according to an adopted similarity criterion. (One hopes that the cost of the defects, if any, does not overwhelm the cost stemming from purely elastic distortion of defectless portions of the image.) Thus, roughly, , consistently for all pairs that are neighbors in the Hamming space of configurations, and is some judiciously chosen cutoff distance. The latter convention is analogous to the setup of a scalar field theory defined on a discrete lattice (Itzykson & Zuber, 2012); hereby the standard is the field itself, while the lattice points comprise the (-dimensional) Hamming space of the configurations represented in the data set.
The standard potentials and of two configurations that are farther apart than the cutoff distance are still correlated, but indirectly, through chains of neighbors (in the Hamming space). The degree of correlation is problem specific. In continuum mechanics contexts, the average in equation A.9 tends to a steady value in dimensions three and higher but diverges logarithmically and linearly with the distance in two and one spatial dimensions, respectively, or in any dimensions when the shear modulus vanishes (Landau & Lifshitz, 1980). In any case, we conclude that owing to intrinsic uncertainties in input calibration, the coupling constants will be nonvanishing. Finally we note that the local nature of the standard state is formally analogous to the locality of gauge fields in field theory (Itzykson & Zuber, 2012) or, for instance, of the Berry phase in quantum mechanics (Berry, 1984). Variations in such gauge fields amount to long-range interactions among local degrees of freedom. Specifically in elastic continua, whether degenerate or not, such interactions are of the dipole-dipole variety (Bevzenko & Lubchenko, 2009, 2014) but can be screened in the presence of fluidity (Lemaître et al., 2021).
Appendix B: The Two-Spin Generative Model: Calculations
When the Helmholtz energy has two minima, equation B.1 has not one but three solutions for a vanishing external field : Two solutions correspond to the minima themselves and one solution to the saddle point separating the minima. The two minima and the saddle point all lie within the slice from Figure 3. It will suffice for our purposes—and will simplify the prose a great deal—to work along the latter slice. We use and as our variables. Below the critical point, the curve exhibits two inflection points , whereby . These points—conventionally called the “spinodals”—delineate the stability limits, since the susceptibility is negative between the spinodals: . We denote the locations of the spinodals pertaining to the positive and negative minimum as and , respectively. When , equation B.1 will have three solutions within an interval of nonvanishing width, whose left-hand-side and right-hand-side boundaries are determined by solving equation B.1 with set at and , respectively. This is illustrated in Figure 5 with the curves. Of the three solutions of equation B.1, we will consider the two stable solutions pertaining to the minima. Hereby the magnetization is subject to a bimodal distribution, the more likely mode corresponding to the deeper minimum of .
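The spinodal construction can be sketched numerically for a Curie-Weiss magnet; the self-consistency equation m = tanh((J*m + h)/T) and the parameter values below are illustrative stand-ins for equation B.1, not the two-spin model itself:

```python
import numpy as np

J, T = 1.0, 0.8   # illustrative coupling and temperature, below T_c = J

def field_of_m(m):
    """Invert the mean-field equation m = tanh((J*m + h)/T) for h."""
    return T * np.arctanh(m) - J * m

# Spinodals: where the inverse susceptibility dh/dm = T/(1 - m^2) - J
# vanishes; between them the susceptibility is negative.
m_sp = np.sqrt(1.0 - T / J)
h_sp = field_of_m(m_sp)

def solutions(h):
    """All magnetizations solving the self-consistency equation at field h."""
    m = np.linspace(-0.999, 0.999, 200000)  # even count avoids m = 0 exactly
    g = field_of_m(m) - h
    return m[:-1][g[:-1] * g[1:] < 0]       # sign changes bracket the roots

# Three solutions (two minima plus the saddle) for small fields;
# a single solution once |h| exceeds the spinodal field.
print(m_sp, len(solutions(0.0)), len(solutions(2 * abs(h_sp))))
```

The interval of fields admitting three solutions closes at the spinodals, matching the description of the bimodal magnetization distribution above.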
To elucidate the nature of the mean-field constraint—which causes ergodicity breaking at sufficiently low temperatures—we first juxtapose, in Figure 9a, the signs of the magnetizations as functions of the source fields for the exact and mean-field solution, respectively, of the generative model, equation 3.1. For the exact solution, we color-code the regions of positive and negative with red and blue, respectively. The purple areas, then, show where the exact values of the two typical polarizations and , respectively, have the same sign. For the mean-field solution, possible values of the magnetizations are determined by the positions, in the plane, of the minima of the tilted free energy surface , where the fields are treated as constants (see the discussion following equation 2.52). We show, in Figure 9a, the signs for the likelier magnetization pattern—the one corresponding to the deeper minimum of the tilted surface. The corresponding boundaries are shown using dashed lines; we see they lie rather close to their exact counterparts. Unlike the signs, the magnitudes of the exact and mean-field magnetizations, respectively, show a qualitatively different behavior except when the source fields are large enough for the (tilted) mean-field free energy surface to exhibit just one minimum. (We reiterate that the exact free energy always has just one minimum.) Consequently, the likeliest values of mean-field magnetizations experience a discontinuity when the two competing minima of the free energy surface are exactly degenerate. Conditions for such degeneracy are met along a substantial segment of the line shown in Figure 9a as the straight dashed line connecting the two yellow dots. The latter segment thus represents a phase boundary. An elementary calculation shows that the ends of the latter phase boundary are located at , where and , while the mean-field lines are given by functions , .
The distinct difference of the likeliest magnetizations on the two opposite sides of the phase boundary—the component shown in Figure 9b—is an instance of hysteresis, a classic signature of broken ergodicity. The discontinuity across the phase boundary implies that a substantial region of phase space around the origin is strictly avoided for sufficiently large values of the coupling as a result of the mean-field constraint.
Appendix C: Ergodicity Breaking Is Transient. It Is Restored via Rare, Cooperative Processes
Appendix D: Ergodicity Breaking Causes the Parameter Manifold to Fractionalize into a Set of Fragments That Are Compact
Below the critical point, the minima of the surface from equation 3.9 are strictly degenerate when and vanish. The latter degeneracy is dictated by the invariance of the product with respect to flipping the two spins at the same time. Yet the latter symmetry has another consequence: that the average magnetizations must all vanish at all temperatures. At the same time, the operation is intrinsically discrete. Thus, the symmetry breaking signaled by the emergence of the two minima in —each of which corresponds to nonvanishing magnetizations —must be also of the discrete kind. Consequently, no Goldstone modes (Goldstone et al., 1962) appear as a result of the symmetry breaking; the newly emerged minima of the free energy must be separated by a barrier. The barrier can be made small near criticality, but criticality is ordinarily observed only within a manifold of vanishing volume in the phase space and thus is rare.
When present, Goldstone modes imply the free energy minima, below the symmetry breaking, have a vanishing curvature along one or more directions in the order-parameter space. We see such noncompact free energy minima would be untypical for binary data sets, thus indicating the ergodicity breaking is of the harshest type possible.
For many problems, considering digitized data sets as binary is arguably gratuitous, especially when the underlying problem is continuous. In such cases, Goldstone modes would be absent nonetheless. This lack of Goldstone modes can be viewed, rather generally, as a consequence of the symmetry of the contextual ensemble 2.12 with respect to the gauge transformation , . When gauged symmetries are broken, candidate Goldstone excitations do become gapped (Anderson, 1984; Itzykson & Zuber, 2012). Now, a distinct variety of low-frequency modes can arise in models with translationally invariant, short-range forces when two or more phases coexist and are in near equilibrium with each other. Here the system breaks up into regions, each occupied by an individual phase, a phenomenon called “spinodal decomposition” (Goldenfeld, 1992; Bray, 1994). Interfaces separating the latter regions can often deform and move about with relative ease. We view such situations as coincidental because conditions for phase equilibrium could be fulfilled only within a manifold of vanishing volume in the phase space.
Appendix E: Patterns versus Generic Configurations: A Thermodynamic View
The restricted entropy corresponding to equations 4.7, 4.11, and 4.12 can be parameterized to be a strictly convex-up function by construction. Assume for now that the entropy of the true states, computed using equations 4.6, 4.11, and 4.12, is also a strictly convex-up function. (We use distinct variables for the energies of the true and false states because their respective spectra generally overlap.) When the energy gap is at its lowest allowed value, corresponding to the equality in equation 4.15, the two phases are in mutual equilibrium. Indeed, the slopes of the two entropy branches, and hence the internal temperatures of the two phases, then match when each entropy is evaluated at its respective equilibrium energy. Note that equation 4.15, at equality, expresses the familiar double-tangent construction for phase equilibrium in the microcanonical ensemble (Lubchenko, 2008, 2020).
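The double-tangent construction can be sketched numerically. The square-root entropy branches below, together with the entropy offset `s0` and energy gap `D`, are invented for illustration and are not the spectra of equations 4.6 and 4.7:

```python
import numpy as np

# A minimal sketch of the microcanonical double-tangent construction,
# with made-up entropy branches (illustrative assumptions only):
#   true states:  S_t(E) = 2*sqrt(E)
#   false states: S_f(E) = s0 + 2*sqrt(E - D),  for E > D
# The slopes dS/dE match at E_f = E_t + D; requiring the tangent line
# to be common then fixes the coexistence temperature T_c = D / s0.
s0, D = 1.0, 2.0

St = lambda E: 2.0 * np.sqrt(E)
Sf = lambda E: s0 + 2.0 * np.sqrt(E - D)

Tc = D / s0        # coexistence temperature for these toy branches
Et = Tc**2         # energy at which dS_t/dE = 1/Tc
Ef = Et + D        # energy at which dS_f/dE = 1/Tc

# The tangent line touching S_t at Et must also touch S_f at Ef:
tangent_at_Ef = St(Et) + (Ef - Et) / Tc
```

Here `tangent_at_Ef` coincides with `Sf(Ef)`, which is precisely the statement of mutual equilibrium of the two phases at the borderline gap.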
The two circles in Figure 8 correspond to the points of tangency. The mutual arrangement of the true and false states in Figure 8 represents the borderline case: moving the false states by any amount toward lower energies, while keeping everything else fixed, would make them more stable than the true states. Conversely, one is allowed to use a gap that is greater than the one we used in Figure 8. For this reason, the constraint 4.15 takes the form of an inequality. Now suppose that one has settled on a specific spectrum for the false states that satisfies the constraint 4.15. The two phases will be at equilibrium at the temperature given by equation 4.17. Because of the discrete change of the entropy at the transition, the transition is discontinuous and, furthermore, exhibits a latent heat equal to the transition temperature times the entropy change.
Criterion 4.16 can be lucidly visualized by plotting, on the same graph, the thermodynamic potential $E - TS(E)$ for each of the individual phases, where the temperature $T$ is now regarded as a fixed, externally imposed parameter. For a single-phase system, the potential $E - TS(E)$, as a function of the energy $E$, is uniquely minimized at the energy at which $\partial S/\partial E = 1/T$; hereby the internal temperature becomes equal to the imposed temperature $T$ (Lubchenko, 2020). The depth of the minimum is equal to the equilibrium Helmholtz energy. During phase coexistence, there is a separate minimum for each phase; the deepest minimum corresponds to the stable phase. We observe directly in Figure 12a that below the transition temperature the true states are more stable than the false states, and vice versa above it.
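The construction can be sketched as follows; the entropy branches, offset `s0`, and gap `D` are illustrative assumptions rather than the spectra behind Figure 12a:

```python
import numpy as np

# Minimize the potential Phi(E) = E - T*S(E) over E separately for each
# phase; the deeper minimum identifies the stable phase at temperature T.
# The entropy branches below are toy assumptions, not the paper's spectra.
s0, D = 1.0, 2.0           # assumed entropy offset and energy gap
Tc = D / s0                # coexistence temperature for this toy model

St = lambda E: 2.0 * np.sqrt(E)
Sf = lambda E: s0 + 2.0 * np.sqrt(E - D)

def phi_min(S, E_grid, T):
    """Depth of the minimum of E - T*S(E), i.e. the Helmholtz energy."""
    return np.min(E_grid - T * S(E_grid))

E_t = np.linspace(1e-6, 50.0, 200001)
E_f = np.linspace(D + 1e-6, 50.0, 200001)

# True phase deeper (more stable) below Tc, false phase deeper above Tc:
below = phi_min(St, E_t, 0.8 * Tc) < phi_min(Sf, E_f, 0.8 * Tc)
above = phi_min(St, E_t, 1.2 * Tc) > phi_min(Sf, E_f, 1.2 * Tc)
```

The crossing of the two minimum depths at `Tc` reproduces, in miniature, the stability exchange seen in Figure 12a.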
The equilibrium energy and entropy correspond to the full partition function from equation 4.13; they are shown in Figures 8 and 11 with the dashed line. We observe that both quantities undergo an abrupt variation within a narrow temperature interval already at modest system sizes. It is straightforward to show that in the thermodynamic limit, the equilibrium energy and entropy develop a strict discontinuity at the transition temperature, consistent with Figure 11, while the parametric equilibrium entropy asymptotically tends, within the gap, to the common tangent to the respective entropies of the pure phases, consistent with Figure 8. In any event, the equilibrium entropy, a convex-up envelope of the two restricted entropies, does not pertain to either one of the individual phases during phase coexistence. In summary, neither the entropy nor the energy can be regarded as a single-valued state function during phase coexistence, analogously to the discussion of ergodicity breaking in section 3.
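The sharpening of the crossover with system size can be checked with a toy two-phase partition function; the per-site energies and entropies below are assumptions for illustration, not equation 4.13:

```python
import numpy as np

# Toy two-phase partition function with extensive energies and entropies:
#   Z = exp(N*(s_t - e_t/T)) + exp(N*(s_f - e_f/T)).
# The equilibrium energy per site crosses over from e_t to e_f around
# Tc = (e_f - e_t)/(s_f - s_t); the width of the crossover shrinks as
# 1/N, approaching a strict discontinuity in the thermodynamic limit.
e_t, s_t = 0.0, 0.0        # assumed per-site values for the true phase
e_f, s_f = 1.0, 0.5        # assumed per-site values for the false phase
Tc = (e_f - e_t) / (s_f - s_t)

def energy_per_site(T, N):
    # Log-weights of the two phases; a stable softmax gives their
    # equilibrium probabilities.
    a = N * (s_t - e_t / T)
    b = N * (s_f - e_f / T)
    m = max(a, b)
    w_t, w_f = np.exp(a - m), np.exp(b - m)
    return (w_t * e_t + w_f * e_f) / (w_t + w_f)

def crossover_width(N, lo=1.0, hi=3.0):
    # Temperature interval over which the energy rises from 10% to 90%
    # of the way between e_t and e_f.
    Ts = np.linspace(lo, hi, 20001)
    e = np.array([energy_per_site(T, N) for T in Ts])
    return Ts[np.searchsorted(e, 0.9)] - Ts[np.searchsorted(e, 0.1)]
```

At the transition temperature the two phases carry equal weight, and increasing `N` by an order of magnitude narrows the crossover window by roughly the same factor.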
Acknowledgments
V.L. thanks Vitali Khvatkov, Rolf M. Olsen, and Michael Tuvim for inspiring conversations. We gratefully acknowledge the support of NSF grants CHE-1465125 and CHE-1956389, the Welch Foundation grant E-1765, and a grant from the Texas Center for Superconductivity at the University of Houston. We gratefully acknowledge the use of the Carya/Opuntia/Sabine Cluster and the advanced support from the Research Computing Data Core at the University of Houston acquired through NSF Award Number ACI-1531814.