The Convergence of Bird Flocking *

BERNARD CHAZELLE

Abstract

We bound the time it takes for a group of birds to reach steady state in a standard flocking model. We prove that (i) within single exponential time fragmentation ceases and each bird settles on a fixed flying direction; (ii) the flocking network converges only after a number of steps that is an iterated exponential of height logarithmic in the number of birds. We also prove the highly surprising result that this bound is optimal. The model directs the birds to adjust their velocities repeatedly by averaging them with their neighbors within a fixed radius. The model is deterministic, but we show that it can tolerate a reasonable amount of stochastic or even adversarial noise. Our methods are highly general and we speculate that the results extend to a wider class of models based on undirected flocking networks, whether defined metrically or topologically. This work introduces new techniques of broader interest, including the flight net, the iterated spectral shift, and a certain residue-clearing argument in circuit complexity.

1 Introduction

What do migrating geese, flocking cranes, bait balls of fish, prey-predator systems, and synchronously flashing fireflies have in common? All of them are instances of natural algorithms, ie, algorithms designed by evolution over millions of years. By and large, their study has been the purview of dynamical systems theory within the fields of zoology, ecology, evolutionary biology, etc. The main purpose of this work is to show how combinatorial and algorithmic tools from computer science might be of benefit to the study of natural algorithms—in particular, in the context of collective animal behavior [20]. We consider a classical open question in bird flocking: bounding the convergence time in a standard neighbor-based


*A preliminary version of this work appeared in Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA09), January 4–6, 2009, 422–431. This work was supported in part by NSF grant CCF-0634958 and NSF CCF-0832797. Categories and Subject Descriptors: F.2.0 [Analysis of Algorithms and Problem Complexity]: General. General Terms: Algorithms, Theory. Additional Key Words and Phrases: Natural Algorithms, Dynamical Systems.

Department of Computer Science, Princeton University, chazelle@cs.princeton.edu

model. We give a tight bound on the number of discrete steps required for a group of $n$ birds to reach steady state. We prove that, within time exponential in $n$ , fragmentation ceases and each bird settles on a fixed flying direction. We also show that the flocking network converges after a number of steps that never exceeds an iterated exponential of height logarithmic in $n$ . Furthermore, we show that this exotic bound is in fact optimal. If we view the set of birds as a distributed computing system, our work establishes a tight bound on the maximum execution time. Holding for a large family of flocking mechanisms, it should be thought of as a busy beaver type result—or perhaps busy goose.

The bound is obtained by investigating an intriguing “spectral shift” process, which could be of independent interest. In the model, birds forever adjust their velocities at discrete time steps by averaging them with their neighbors flying within a fixed distance. The model is deterministic but we show that it tolerates a reasonable amount of stochastic or even adversarial noise. While, for concreteness, we settle on a specific geometric model, our methods are quite general and we suspect the results can be extended to a large class of flocking models, including topological networks [1]. The only serious limitation is that the flocking network must be undirected: this rules out models where one bird can process information from another one while flying in its “blind spot.”

Bird flocking has received considerable attention in the scientific and engineering literature, including the now-classical Boids model of Reynolds [21, 26–28]. Close scrutiny has been given to leaderless models where birds update their velocities by averaging them out over their nearest neighbors. Two other rules are often added: one to prevent birds from colliding; the other to keep them together. Velocity averaging is the most general and fundamental rule and, understandably, has received the most attention. Computer simulations support the intuitive belief that, by repeated averaging, each bird should eventually converge to a fixed speed and heading. This has been proven theoretically, but how long it takes for the system to converge had remained an open problem. The existential question (does the system converge?) has been settled in many different ways, and it is useful to review the history briefly.

A “recurrent connectivity” assumption stipulates that, over any time interval of a fixed length, every pair of birds should be able to communicate with each other, directly or indirectly via other birds. Jadbabaie, Lin, and Morse [9] proved the first of several convergence results under that assumption (or related ones [16, 17, 23, 27]). Several authors extended these results to variable-length intervals [8, 13, 15]. They established that the bird group always ends up as a collection of separate flocks (perhaps only one), each one converging toward its own speed and heading. Some authors have shown how to do away with the recurrent connectivity assumption by changing the model suitably. Tahbaz-Salehi and Jadbabaie [24], for example, assume that the birds fly on the surface of a torus. Cucker and Smale [7] use a broadcast model that extends a bird’s influence to the entire group while scaling it down as a function of distance. In a similar vein, Ji and Egerstedt [10] introduce a hysteresis rule to ensure that connectivity increases over time. Tang and Guo [25] prove convergence in a high-density probabilistic model. Recent work [1] suggests a “topological” rule for linking birds: a bird is influenced by a fixed number of its neighbors instead of all neighbors within a fixed distance. Whether the criteria are metric or topological, the bulk of work on leaderless flocking has assumed neighbor-based consensus rules. We are not aware of any bounds on the convergence time.

Our model is a variant of the one proposed by Cucker and Smale [7], which is itself a holonomic variant of the classical Vicsek model [29]. Given $n$ birds $\mathcal{B}_1, \dots, \mathcal{B}_n$ , represented at time $t$ by points $x_1(t), \dots, x_n(t)$ in $E^3$ , the flocking network $G_t$ has a vertex for each bird and an edge between any two of them within distance 1 of each other. By convention, $G_t$ has no self-loops. The connected components of $G_t$ are the flocks of the system. If $d_i(t)$ denotes the number of birds adjacent to $\mathcal{B}_i$ at time $t$ , the total number of birds within the closed unit disk centered at $\mathcal{B}_i$ is precisely $d_i(t) + 1$ .
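The unit-distance network and its flocks are easy to compute directly. Below is a minimal sketch (not from the paper; the helper names `flocking_network` and `flocks` are chosen here for illustration) that builds $G_t$ from the positions and extracts its connected components:

```python
import numpy as np

def flocking_network(x):
    """Adjacency matrix of the unit-distance graph on bird positions x (n-by-d)."""
    n = len(x)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(x[i] - x[j]) <= 1.0:  # edge iff within distance 1
                A[i, j] = A[j, i] = 1               # no self-loops, by convention
    return A

def flocks(A):
    """Connected components of the network (the flocks), via depth-first search."""
    n = len(A)
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(v for v in range(n) if A[u, v])
        comps.append(sorted(comp))
    return comps
```

Note that, consistent with the definition above, the number of birds in the closed unit disk centered at $\mathcal{B}_i$ equals the degree `A.sum(axis=1)[i]` plus one.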

Figure 1: Each bird updates its velocity by averaging it with those of its neighbors within a unit-radius circle.

The Model. The input consists of the initial position $x(0)$ and velocity $v(1)$ . Both vectors belong to $\mathbb{R}^{dn}$ , for any fixed $d \geq 1$ . For $t \geq 1$ and $1 \leq i \leq n$ ,

$$x_i(t) = x_i(t-1) + v_i(t),$$

where

$$v_i(t+1) - v_i(t) = c_i(t) \sum_{(i,j) \in G_t} (v_j(t) - v_i(t)).$$


1 We denote the coordinates of a vector $x(t)$ by $x_i(t)$ and the elements of a matrix $X(t)$ (resp. $X_t$) by $x_{ij}(t)$ (resp. $(X_t)_{ij}$).

The self-confidence coefficients $c_i(t)$ , so named because they tell us how much a bird is influenced by its neighbors, are normalized so that $0 < c_i(t)d_i(t) < 1$ . (See “Discussion” section below for an intriguing interpretation of these constraints.) We assume that $c_i(t)$ may vary only when $G_t$ does; in other words, while all neighborly relations remain the same, so do the self-confidence coefficients. A natural choice of coefficients is the one used in the classical Vicsek model [29]: $c_i(t) = (d_i(t) + 1)^{-1}$ , but we do not make this restrictive assumption here.
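One step of the averaging rule can be sketched in a few lines (an illustrative helper, not code from the paper; the function name `average_step` is chosen here, and the Vicsek coefficients are used in the test only as one admissible choice):

```python
import numpy as np

def average_step(v, A, c):
    """One velocity update: v_i(t+1) = v_i(t) + c_i * sum over neighbors j of (v_j - v_i).

    v: length-n array of (scalar) velocities; A: adjacency matrix of G_t;
    c: length-n array of self-confidence coefficients with 0 < c_i * d_i < 1."""
    n = len(v)
    v_new = v.astype(float).copy()
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                v_new[i] += c[i] * (v[j] - v[i])
    return v_new
```

For a 3-bird path with velocities $(0, 1, 2)$ and Vicsek coefficients, one step moves the endpoint velocities halfway toward the middle bird's, while the middle bird's velocity is unchanged by symmetry.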

The model captures the simple intuition that, in an effort to reach consensus by local means, each bird should adjust its velocity at each step so as to be a weighted average of those of its neighbors. A mechanical interpretation sees in the difference $v_i(t+1) - v_i(t)$ the discrete analogue of the bird’s acceleration, so that, by Newton’s Law, $F = ma$ , a bird is subject to a force that grows in proportion to the differences with its neighbors. A more useful take on the model is to view it as a diffusion process: more precisely, as the discrete version of the heat equation

$$\frac{\partial v}{\partial t} = -C_t L_t v,$$

where the Laplacian $L_t$ of the flocking network $G_t$ is defined by:

$$(L_t)_{ij} = \begin{cases} d_i(t) & \text{if } i = j; \\ -1 & \text{if } (i, j) \in G_t; \\ 0 & \text{else,} \end{cases}$$

and $C_t = \text{diag } c(t)$ is the self-confidence matrix. Thus we express the dynamics of the system as

$$v(t+1) - v(t) = -C_t L_t v(t).$$

This is correct in one dimension. To deal with birds in $d$ -space, we use a standard tensor lift. Here is how we do it. We form the velocity vector $v(t)$ by stacking $v_1(t), \dots, v_n(t)$ together into one big column vector of dimension $dn$ . Given a matrix $A$ , the product2 $(A \otimes I_d)v(t)$ interlaces into one vector the $d$ vectors obtained by multiplying $A$ by the vector formed by the $k$ -th coordinate of each $v_i(t)$ , for $k = 1, \dots, d$ . The heat equation would now be written as

$$v(t+1) = (P(t) \otimes I_d)v(t),$$

where $P(t) = I_n - C_t L_t$ . One can check directly that the transition matrix $P(t)$ is row-stochastic. In the case of a 3-node path, for example, $P(t)$ has the form:

$$\begin{pmatrix} 1 - c_1(t) & c_1(t) & 0 \\ c_2(t) & 1 - 2c_2(t) & c_2(t) \\ 0 & c_3(t) & 1 - c_3(t) \end{pmatrix}.$$


2 The Kronecker product $A \otimes B$ of two matrices $A$ and $B$ is the matrix we get if we replace each $a_{ij}$ by the block $a_{ij}B$. Formally, if $A$ is $m$-by-$n$ and $B$ is $p$-by-$q$, then $A \otimes B$ is the $mp$-by-$nq$ matrix $C$ such that $c_{ip+j,kq+l} = a_{i,k}b_{j,l}$. We will often use, with no further mention, the tensor identity $(A \otimes B)(C \otimes D) = AC \otimes BD$.

Figure 2: A 3-node flock with the transitions of the middle node indicated by curved arrows.
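As a quick sanity check (a numpy sketch, assuming the Vicsek coefficients $c_i = (d_i(t)+1)^{-1}$ as one admissible choice; not code from the paper), one can assemble $P(t) = I_n - C_t L_t$ for the 3-node path, verify that it is row-stochastic, and form the tensor lift $P(t) \otimes I_d$:

```python
import numpy as np

# 3-node path: graph Laplacian L_t and self-confidence matrix C_t = diag(c)
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
c = np.array([1/2, 1/3, 1/2])        # Vicsek choice: c_i = 1/(d_i + 1)
P = np.eye(3) - np.diag(c) @ L       # transition matrix P(t) = I_n - C_t L_t

d = 2                                # ambient dimension
Pd = np.kron(P, np.eye(d))           # tensor lift (P ⊗ I_d) acting on stacked velocities
```

With these coefficients, `P` matches the displayed 3-node matrix with $c_1 = c_3 = \frac{1}{2}$ and $c_2 = \frac{1}{3}$, and every row of both `P` and `Pd` sums to 1.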

The dynamics of flocking is captured by the two equations of motion: For any $t \geq 1$ ,

$$\begin{cases} x(t) = x(t-1) + v(t); \\ v(t+1) = (P(t) \otimes I_d)v(t). \end{cases} \qquad (1)$$

For tie-breaking purposes, we inject a tiny amount of hysteresis into the system. As we discuss below, this is necessary for convergence. Intuitively, the rule prevents edges of the flocking network from breaking because of microscopic changes. Formally, an edge $(i, j)$ of $G_t$ remains in $G_{t+1}$ if the distance between $\mathcal{B}_i$ and $\mathcal{B}_j$ changes by less than $\varepsilon_h > 0$ between times $t$ and $t+1$ . We choose $\varepsilon_h$ to be exponentially small for illustrative purposes only; in fact, virtually any hysteresis rule would work.
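The two equations of motion together with the hysteresis tie-break can be sketched as follows (an illustrative implementation, not from the paper; the function name `step`, the Vicsek coefficients, and the default `eps_h` value are all choices made here):

```python
import numpy as np

def step(x, v, A_prev, eps_h=1e-9):
    """One step of the equations of motion (1), with the hysteresis tie-break.

    x, v: n-by-d position and velocity arrays at time t-1 and t; A_prev: adjacency
    of G_{t-1}. An old edge survives if the pair's distance changed by less than eps_h."""
    x_new = x + v                              # x(t) = x(t-1) + v(t)
    n = len(x)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            dist_new = np.linalg.norm(x_new[i] - x_new[j])
            keep = dist_new <= 1.0
            if A_prev[i, j]:                   # hysteresis: ignore microscopic changes
                dist_old = np.linalg.norm(x[i] - x[j])
                keep = keep or abs(dist_new - dist_old) < eps_h
            A[i, j] = A[j, i] = int(keep)
    # v(t+1) = (P(t) ⊗ I_d) v(t), here with Vicsek coefficients c_i = 1/(d_i + 1)
    deg = A.sum(axis=1)
    c = 1.0 / (deg + 1)
    v_new = v + c[:, None] * (A @ v - deg[:, None] * v)
    return x_new, v_new, A
```

The last line is the coordinate-wise form of the tensor-lifted update: `A @ v` sums neighbor velocities, so each row computes $v_i + c_i \sum_{(i,j) \in G_t} (v_j - v_i)$.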

The Results. To express our main result, we need to define the fourth level of the Ackermann hierarchy, the so-called “tower-of-twos” function: $2 \uparrow\uparrow 1 = 2$ and, for $n > 1$ , $2 \uparrow\uparrow n = 2^{2 \uparrow\uparrow (n-1)}$ . The bird group is said to have reached steady state when its flocking network no longer changes. All the results below hold in any fixed dimension $d \geq 1$ .
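The tower-of-twos function grows explosively; a direct transcription of the definition makes the first few values concrete:

```python
def tower(n):
    """The tower-of-twos 2↑↑n: 2↑↑1 = 2 and 2↑↑n = 2^(2↑↑(n-1)) for n > 1."""
    return 2 if n == 1 else 2 ** tower(n - 1)
```

Thus `tower(1)` through `tower(4)` give 2, 4, 16, 65536, and `tower(5)` already has nearly twenty thousand decimal digits.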

  • A group of $n$ birds reaches steady state in fewer than $2 \uparrow\uparrow (4 \log n)$ steps. The maximum number of switches in the flocking network of $n$ birds is at most $n^{O(n^3)}$. The limit configuration of each bird $\mathcal{B}_i$ is of the form $a_i + b_i t$, where $a_i, b_i$ are $d$-dimensional rational vectors. After the fragmentation breakpoint $t_f = n^{O(n^3)}$, network edges can only appear and never vanish.
  • There exists an initial configuration of $n$ birds that requires more than $2 \uparrow\uparrow \log \frac{n}{2}$ steps before reaching steady state. The lower bound holds both with and without hysteresis.

Past the fragmentation breakpoint, the direction of each bird is essentially fixed, so $n^{O(n^3)}$ is effectively the bound for physical convergence. (Of course, damped local oscillations typically go on forever.) Combinatorial convergence is another matter altogether. It might take an extraordinarily long time before the network stops switching. The true height of the tower-of-twos is actually a little less than $4 \log n$: the factor 4 can be replaced by $(\log x_0)^{-1} \approx 3.912$, where $x_0$ is the unique real root of $x^5 - x^2 - 1$.

Figure 3: Flocks cease to lose edges after the fragmentation breakpoint $t_f$ and can only gain new ones. The network reaches steady state after a tower-of-twos of height logarithmic in the number of birds.

How many bits? The self-confidence matrices $C_t$ are rational with $O(\log n)$ bits per entry. The bound on the maximum number of network switches holds even if the inputs are arbitrary real numbers. Obviously, there is no hope of bounding the convergence time if two birds can be initialized to fly almost parallel to each other; bounding the representation size of the input is therefore necessary. The initial position and velocity of each bird are encoded as rationals over $p$ bits. Our results hold for virtually any value of $p$. The dependency on $p$ begins to show only for $p \geq n^3$, so this is what we shall assume when proving the upper bound on the convergence time. Keep in mind that $p$ is only an upper bound: the actual number of bits need not be that large. In fact, the lower bound requires only $\log n$ bits per bird. All computation is exact. The upper bound3 of $2 \uparrow \uparrow (4 \log n)$ is extremely robust, and holds for essentially any conceivable input bit-size and hysteresis rule.

3 Logarithms to the base 2 are written as $\log$ while the natural variety is denoted by $\ln$. For convenience we assume throughout this paper that $n$, the number of birds, is large enough. To handle small bird groups, of course, we can always add fictitious birds that never interact with anyone.

Is the lower bound pathological? Surprisingly, the answer is no. As we mentioned, initial conditions require only $p = O(\log n)$ bits per bird. Our construction ensures that the hysteresis rule never kicks in, so the lower bound holds whether the model includes hysteresis or not. The flocks used for the construction are single paths, and the matrix $P(t)$ corresponds to a lazy random walk with probability $\frac{1}{3}$ of staying in place. The lower bound holds in any dimension $d > 0$. Here are the initial positions and velocities for $d = 1$:

$$\begin{cases} x(0) = \left(0, \tfrac{2}{3}, 2, \tfrac{8}{3}, \dots, 2l, 2l + \tfrac{2}{3}, \dots, n-2, n - \tfrac{4}{3}\right)^T; \\ v(1) = \big(\underbrace{n^{-11}, 0, -n^{-11}, 0, n^{-11}, 0, \dots, n^{-11}, 0, -n^{-11}, 0}_{n}\big)^T. \end{cases}$$
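These initial conditions are straightforward to generate (a sketch assuming $n$ even, with the sign of each velocity pair alternating as in the displayed vector; the helper name `initial_conditions` is chosen here):

```python
def initial_conditions(n):
    """Lower-bound initial data for d = 1 (n even), following the displayed formulas:
    positions 2l and 2l + 2/3 for l = 0, ..., n/2 - 1, and velocities alternating
    n^-11, 0, -n^-11, 0, ..."""
    x0, v1 = [], []
    for l in range(n // 2):
        x0 += [2 * l, 2 * l + 2 / 3]
        s = 1 if l % 2 == 0 else -1
        v1 += [s * n ** -11, 0.0]
    return x0, v1
```

For $n = 4$ this yields $x(0) = (0, \frac{2}{3}, 2, \frac{8}{3})$ and $v(1) = (4^{-11}, 0, -4^{-11}, 0)$, and the last pair of positions is indeed $(n-2, n-\frac{4}{3})$.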

Flocking obeys two symmetries: one translational; the other kinetic (or “relativistic,” as a physicist might say). The absolute positioning of the birds is irrelevant and adding a fixed vector to each bird’s velocity has no effect on flocking. In other words, one cannot infer velocity from observing the evolution of the flocks. Indeed, only differences between velocities are meaningful. This invariance under translation in velocity space implies that slow convergence cannot be caused by forcing birds to slow down. In fact, one can trivially ensure that no bird speed falls below any desired threshold. The lower bound relies on creating small angles, not low speeds. (Thus, in particular, the issue of stalling does not arise.) To simplify the lower bound proof, we allow a small amount of noise into the system. Within the next $n^{O(1)}$ steps following any network switch, the velocity of an $m$ -bird flock may be multiplied by $I_m \otimes \hat{\alpha}$ , where $\hat{\alpha}$ is the diagonal matrix with $\alpha = (\alpha_1, \dots, \alpha_d)$ along the diagonal and rational $|\alpha_i| \leq 1$ encoded over $O(\log n)$ -bits. The noise-free case corresponds to $\alpha_i = 1$ . The perturbed velocity at time $t$ should not differ from the original one by more than $\delta_t = \frac{\log t}{t} e^{O(n^3)}$ but we allow a number of perturbations as large as $e^{O(n^3)}$ . This noise model could be enriched considerably without affecting the convergence bounds, but our choice was guided by simplicity. Note that some restrictions are necessary for convergence; trivially, noise must be bounded past the last switch since two flocks flying parallel to each other could otherwise be forced to merge arbitrarily far into the future. Switching to a noisy model has two benefits: one is a more general result, since the same upper bound on the convergence time holds whether the noise is turned on or off; the other is a simpler lower bound proof. 
It allows us to keep the initial conditions extremely simple. We use only $\log n$ perturbations and $\delta_t \approx 1/t$ , so noise is not germane to the tower-of-twos growth.

Why hysteresis? Network convergence easily implies velocity convergence, but the converse is not true: velocities might reach steady state while the network does not. Indeed, in §3.2, we specify a group of birds that alternates forever between one and two flocks without ever converging. This is an interesting but somewhat peripheral issue that is best bypassed, as is done in [10], by injecting a minute amount of hysteresis into the system. Whatever one’s rule—and, as we mentioned earlier, almost any rule would work—it must be sound, meaning that any two birds at distance ever so slightly away from 1 should have the correct pairing status. Note that soundness does not follow immediately from our definition of hysteresis. This will need to be verified. By construction, we know that any two birds within unit distance of each other at time $t$ are always joined by an edge of the flocking network $G_t$. We will show that, if we set $\varepsilon_h = n^{-bn^3}$ for a large enough constant $b$, then no two birds at distance greater than $1 + \sqrt{\varepsilon_h}$ are ever adjacent in $G_t$.

How robust are the bounds? The tower-of-twos bound continues to hold (almost) regardless of which hysteresis rule we adopt and how many input bits we allow. The assumption $\varepsilon_h = n^{-bn^3}$ is introduced for notational convenience; for example, it allows us to express soundness very simply by saying that no two birds at distance greater than $1 + \sqrt{\varepsilon_h}$ should ever be joined by an edge of the network. Without this assumption, the bounds are more complicated. For the interested reader, here is what happens to the number $N(n)$ of network switches and the fragmentation breakpoint $t_f$, ie, the time after which flocks can only merge:

$$\begin{cases} N(n) = n^{O(n^3)} \left(p + \log \tfrac{1}{\varepsilon_h}\right)^{n-1}; \\ t_f = \tfrac{1}{\varepsilon_h}\, n^{O(n^3)}\, 2^{O(p)} \left(p + \log \tfrac{1}{\varepsilon_h}\right)^{n}. \end{cases}$$

Discussion. How relevant are this paper’s results? Why are they technically difficult? We address these two points briefly. Our bounds obviously say nothing about physical birds in the real world. They merely highlight the exotic behavior of the mathematical models. Although we focus on a Cucker-Smale variant, we believe that the bounds hold for a much wider variety of neighbor-based models. We introduce new techniques that are likely to be of further interest. The most promising seems to be the notion of a “virtual bird” flying back in time. We design a structure, the flight net, that combines both kinetic and positional information in a way that allows us to use both the geometry and the algebra of the problem at the same time. Perhaps the most intriguing part of this work is the identification of a curious phenomenon, which we call the (iterated) spectral shift.

Self-confidence leads to an interesting phenomenon: too much of it prevents consensus, but so does too little. Harmony in a group seems to require a minimum amount of self-confidence among its members; both extreme selfishness and excessive altruism get in the way of group cohesion. Self-confidence provides a retention mechanism necessary for reaching agreement. The coefficient $c_i(t)d_i(t)$ represents how much a bird lets itself be influenced by its neighbors. By requiring that it be less than 1, we enforce a certain amount of self-confidence for each bird. This idea is not new and can be found in [8, 14, 15].

Besides noise and hysteresis, our model differs from Cucker-Smale [7] in two other ways. One is that our flocking networks are not complete graphs: they undergo discontinuous transitions, which create the piecewise linearity of the system. Another difference is that the transition matrices of our model are not symmetric. This greatly limits the usefulness of linear algebra. The reason why might not be obvious, so here is some quick intuition. Cucker and Smale diagonalize the Laplacian and note that, since only differences are of interest, the vectors might as well be assumed to lie in the space $\mathbf{1}^\perp$. Not only is that space invariant under the Laplacian but it contracts at an exponential rate set by the Fiedler number (the second eigenvalue). From this, a quadratic Lyapunov function quickly emerges, namely the energy $v^T L_t v$ of the system. When the graph is connected, the Fiedler number is bounded away from 0 by an inverse polynomial, so differences between velocities decay to 0 at a rate of $2^{-tn^{-c}}$ for some constant $c > 0$.

In the nonsymmetric case (ours), this approach is doomed. If, by chance, all the transition matrices had the same left eigenvectors, then the variance of the time-dependent Markov chain sampled at the (common) stationary distribution would in fact be a valid Lyapunov function, but that assumption is completely unrealistic. In fact, it has been proven [9, 19] that the dynamical systems under consideration do not admit of any suitable quadratic Lyapunov function for $n \geq 8$ . Worse, as was shown by Olshevsky and Tsitsiklis [19], there is not even any hope of finding something weaker, such as a nonzero positive semidefinite matrix $\Lambda$ satisfying, for any allowable transition $v(t) \rightarrow v(t+1)$ ,

$$\begin{cases} \Lambda \mathbf{1} = 0; \\ v(t+1)^T \Lambda v(t+1) \leq v(t)^T \Lambda v(t). \end{cases}$$

Our transition matrices are diagonalizable, but the right eigenspace for the subdominant eigenvalues is not orthogonal to $\mathbf{1}$ and the maps might not even be globally nonexpansive: for example, the stochastic matrix

$$\frac{1}{15} \begin{pmatrix} 12 & 3 \\ 10 & 5 \end{pmatrix}$$

has the two eigenvalues 1 and 0.133; yet it stretches the unit vector $(1, 0)$ to one of length 1.041. Linear algebra alone seems unable to prove convergence. The rationality of limit configurations is not entirely obvious. In fact, the iterated spectral shift is reminiscent of lacunary-series constructions of transcendental numbers, which is not the most auspicious setting for proving rationality. This work draws from many areas of mathematics and computer science, including Markov chains, nonnegative matrices, algebraic graph theory, elimination theory, combinatorics, harmonic analysis, circuit complexity, computational geometry, and of course linear algebra.
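These numerical claims are easy to verify directly (a quick numpy check, written here as a sketch rather than an excerpt of the paper):

```python
import numpy as np

A = np.array([[12.0, 3.0],
              [10.0, 5.0]]) / 15.0
row_sums = A.sum(axis=1)                   # both rows sum to 1: A is row-stochastic
eig = sorted(np.linalg.eigvals(A).real)    # eigenvalues 2/15 ≈ 0.133 and 1
stretch = np.linalg.norm(A @ np.array([1.0, 0.0]))  # ≈ 1.041 > 1: not nonexpansive
```

The image of the unit vector $(1, 0)$ is $(\frac{12}{15}, \frac{10}{15})$, whose Euclidean length exceeds 1 even though both eigenvalues lie in $[0, 1]$: contraction of the spectrum does not imply contraction of the map.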

2 A Bird's Eye View of the Proof

To establish a tight bound on the convergence time, we break down the proof into four parts, each one using a distinct set of ideas. We briefly discuss each one in turn. The first step is to bound the number of network switches while ignoring all time considerations. This decoupling allows us to treat the problem as purely one of information transfer. In one step a bird influences each one of its neighbors by forcing its velocity into the computation of these neighbors' new velocities. This influence propagates to other birds in subsequent steps in a manner we can easily trace by following the appropriate edges along the time-dependent flocking network. Because of self-confidence, each bird influences itself constantly. It follows that once a bird influences another one (directly or indirectly via other birds) it does so forever, even if the two birds find themselves forever confined to distinct connected components. For this reason, influence alone is a concept of limited usefulness. We need another analytical tool: refreshed influence. Suppose that, at time $t_0$, $\mathcal{B}_1$ claims influence on $\mathcal{B}_2$. As we just observed, this claim will hold for all $t > t_0$. But suppose that we “reboot” the system at time $t_0 + 1$ and declare all influences void. We may now ask if $\mathcal{B}_1$ will again claim influence on $\mathcal{B}_2$ at some time $t > t_0$ in the future: in other words, whether a chain of edges will over time transfer information again from $\mathcal{B}_1$ to $\mathcal{B}_2$ after $t_0$. If yes, we then speak of refreshed influence. Suppose now that $\mathcal{B}_1$ exerts refreshed influence on $\mathcal{B}_2$ infinitely often: we call such influence recurrent. Although influence is not a symmetric relation, it is an easy exercise to prove that recurrent influence is.

Figure 4: Each bird is influenced by the one pointing to it. If this chain of influence occurs repeatedly (not necessarily with the same set of intermediate birds), a backward sphere of influence centered at the end of the chain will begin to propagate backwards and eventually reach the first bird in the chain.

This appears to be a principle of general interest. If political conversations consist of many two-way communications between pairs of people, with the pairs changing over time, then the only way $A$ can influence $B$ repeatedly is if it is itself influenced by $B$ repeatedly. What makes this fact interesting is that it holds even if $A$ and $B$ never exchange opinions directly with each other and only a single pairwise communication occurs at any given time. Self-confidence plays an important role in this phenomenon. It provides information retention that prevents agents from being influenced by their own opinions in periodic fashion. In fixed networks, this avoids the classical oscillation issue for random walks in bipartite graphs.

In time-dependent networks, the role of self-confidence is more subtle. To understand it, one must first remember one fundamental difference between fixed directed and undirected consensus networks (ie, where at each step, the opinion at each node $v$ is averaged over the opinions linked to by the edges from $v$ ). In a fixed directed network, the fraction of an agent's opinion that is measurable at some other node of the network might be exponentially small in the time elapsed since that opinion was expressed. This cannot happen in undirected networks: any fraction of an opinion is either 0 or bounded from below independently of time. Time-dependent undirected networks, on the other hand, are expressive enough to (essentially) simulate fixed directed ones: time, indeed, can be used to break edge symmetry. The benefits of undirectedness are thus lost, and time-dependent undirected consensus networks can behave much like fixed directed ones—see [5, 6] for an application of this principle to interactive proof systems; in particular, they can witness exponential opinion propagation decay. Adding self-confidence magically prevents such decay. The idea would appear to warrant special scrutiny outside of its native habitat of computer science and control theory.

How many switches? Suppose that $\mathcal{B}_1$ exerts recurrent influence on $\mathcal{B}_2$ . We show that, at some point, both birds will join a connected component of the flocking network and remain there forever. How many switches can occur before that event? Let $V_1$ be the set of birds influenced by $\mathcal{B}_1$ . As soon as everyone in $V_1$ has been influenced by $\mathcal{B}_1$ , let's "reboot" the system and define $V_2$ to be the new set of birds with refreshed influence from $\mathcal{B}_1$ . Obviously $V_1 \supseteq V_2$ . Repeating this process leads to an infinite nested sequence

$$V_1 \supseteq V_2 \supseteq V_3 \supseteq \cdots \supseteq V_\infty,$$

where $V_\infty$ contains at least the two birds $\mathcal{B}_1$ and $\mathcal{B}_2$. Let $T_k$ be the formation time of $V_k$ and let $\delta_k$ be the difference in velocity between the two birds at time $T_k$. We wish we could claim a uniform bound, $|\delta_k|_2 < (1 - \varepsilon)|\delta_{k-1}|_2$, for some fixed $\varepsilon > 0$ independent of the time difference $T_k - T_{k-1}$. Indeed, this would show that, for $k$ large enough, the two velocities are close enough for the hysteresis rule to kick in and keep the two birds together in the same flock forever. Of course, since the two birds need not be adjacent, this argument should be extended to all pairs of birds in $V_\infty$. While the inequality $|\delta_k|_2 < (1 - \varepsilon)|\delta_{k-1}|_2$ is too much to ask for, we show that $|\delta_k|_2 \leq \zeta_k$, where $\zeta_k < (1 - \varepsilon)\zeta_{k-1}$. In other words, the velocity difference between $\mathcal{B}_1$ and $\mathcal{B}_2$ may not shrink monotonically, but it is bounded by a function that does. The uniformity of the shrinking, which is crucial, depends critically on self-confidence and the retention mechanism it implies. Technically, this translates into a uniform lower bound on the nonzero entries of products of stochastic matrices. This allows us to rescue the previous argument and bound the value of $k$ such that $V_k = V_\infty$. To bound the number of switches before time $T_k$, we need to find how many of them can take place between a reboot at $T_{j-1}$ and the formation of $V_j$. The key observation is that $V_j$ is formed by a growth process of smaller flocks (ie, all of them of size less than $n$): we can therefore set up a recurrence relation and bound the number of switches inductively.

  • How much time between switches? Flock behavior between switches is linear, so spectral analysis provides most of the tools we need to bound the inter-switch time. At time $t$, the number of bits needed to encode the velocity is (roughly) $O(t)$. This means that, in the worst case, two birds can fly either parallel to each other or at an angle of at least $e^{-O(t)}$. From this we can infer that, should the birds want to be joined together in the flocking network after time $t$, this union must happen within a period of $e^{O(t)}$. Things are more complex if the stationary velocities of the two flocks are parallel. We need to use root separation bounds for various extension fields formed by adjoining to the rationals all the relevant eigen-information. Intuitively, the question we must answer is how long one must wait for a system of damped oscillators to cross a given real semi-algebraic set with known parameters. All of these techniques alone can only yield a convergence time bound in the form of a tower-of-twos of height exponential in $n$. To bring the height down to logarithmic requires two distinct ideas from computational geometry and circuit complexity.

  • How to bring the height down to linear? So far, we have only used combinatorics, algebraic graph theory, linear algebra, and elimination theory. We use algorithmic ideas from convex geometry to reduce the height to linear. We lift the birds into 4 dimensions (or $d + 1$ in general) by making time into one of the dimensions. We then prove that, after exponential time, birds can only fly almost radially (ie, along a line passing through the origin). This implies that, after a certain time threshold, flocks can only merge and never fragment again. From that point on, reducing the height of the tower-of-twos to linear is easy. Our geometric investigation introduces the key idea of a virtual bird. The stochastic transitions have a simple geometric interpretation in terms of new velocities lying in the convex hulls of previous ones. This allows us to build an exponential-size flight net consisting of convex structures through which all bird trajectories can be monitored. A useful device is to picture the birds flying back in time with exactly one of them carrying a baton. When a bird is adjacent to another one in a flock, it may choose to pass its baton. The trajectory of the baton is identified as that of a virtual bird. Because of the inherent nondeterminism of the process, we may then ask the question: is there always a virtual bird trajectory that follows a near-straight line? The answer, obviously negative in the case of actual birds, turns out to be yes. This is the benefit of virtuality. This fact has numerous geometric consequences bearing on the angular flight motion of the real birds.

  • How to bring the height down to logarithmic? It is not so easy to build intuition for the logarithmic height of the tower-of-twos.4 A circuit complexity framework helps to explain the residue-clearing phenomenon behind it. To get a tower-of-twos requires an iterated spectral shift. When two flocks meet, energy must be transferred from the high-frequency range down to the lowest mode in the power spectrum. This process builds a residue: informally, think of it, for the purpose of intuition, as residual heat generated by the transfer. This heat needs to be evacuated to make room for further spectral shifts. The required cooling requires free energy in the form of previously created spectral shifts. This leads to an inductive process that limits any causal chain of spectral shifts to logarithmic length. The details are technical, and the best way to build one's intuition is to digest the lower bound first.

  • How to prove the optimality of the logarithmic height? The starting configuration is surprisingly simple. The $n$ birds stand on a wire and fly off together at various angles. The initial conditions require only $O(\log n)$ bits per bird. The $n$ birds meet in groups of 2, 4, 8, etc, forming a balanced binary tree. Every “collision” witnesses a spectral shift that creates flying directions that are increasingly parallel; hence the longer waits between collisions. To simplify the calculations, we use the noisy model to flip flocks occasionally in order to reverse their flying directions along the $X$ -axis. This occurs only $\log n$ times and can be fully accounted for by the model we use for the upper bound. Because the flocks are simple paths, we can use harmonic analysis for cyclic groups to help us resolve all questions about their power spectra.

3 The Upper Bound

We begin with a few opening observations in §3.1. We explore both the algebraic and geometric aspects of flocking in §3.2. We establish a crude convergence bound in §3.3, which gives us a glimpse of the spectral shift. An in-depth study of its combinatorial aspects is undertaken in §3.4, from which a tight upper bound follows. We shall always assume that $\mathfrak{p} \geq n^3$. To highlight the robustness of the bounds, we leave both $\mathfrak{p}$ and $\varepsilon_h$ as parameters throughout much of our discussion, thus making it easier to calculate convergence times for arbitrary settings. For convenience and clarity, we adopt the default settings below in §3.4 (but not before). One should keep in mind that virtually any assignment of parameters would still produce a tower-of-twos. Let $b$ denote a large enough constant:

$$\text{DEFAULT SETTINGS} \quad \begin{cases} \mathfrak{p} = n^3; \\ \varepsilon_h = n^{-bn^3}. \end{cases} \quad (2)$$


4 As a personal aside, let me say that I acquired that intuition only after I had established the matching lower bound. For this reason, I recommend reading the lower bound section before the final part of the upper bound proof.

Recall that $\mathfrak{p}$ and $\varepsilon_h$ denote, respectively, the input bit-size and the hysteresis parameter. With these settings, the fragmentation breakpoint and the maximum switch count are both $n^{O(n^3)}$.

3.1 Preliminaries

We establish a few useful facts about the growth of the coordinates over time. It is useful to treat coordinates as integers, which we can do by expressing them as fractions sharing the same denominator. For example, the initial positions and velocities can be expressed either as $\mathfrak{p}$ -bit rationals or, more usefully, as $O(\mathfrak{p}n)$ -bit CD-rationals, ie, rationals of the form $p_i/q$ , with the common denominator $q$ . We mention some important properties of such representations. We will also introduce some of the combinatorial tools needed to measure ergodicity. The objective is to predict how fast backward products of stochastic matrices tend to rank-one matrices. We treat the general case in this section and investigate the time-invariant case in the next.

Numerical Complexity. The footprint of a matrix $A$ is the matrix $\underline{A}$ derived from $A$ by replacing each nonzero entry by 1. For $t \geq s$ , we use $P(t, s)$ as shorthand for $P(t)P(t-1)\cdots P(s)$ . Note that, in the absence of noise, the fundamental equation (1) can be rewritten as

$$v(t+1) = (P(t, 1) \otimes I_d)v(1).$$

A bird may influence another one over a period of time without the converse being true; in other words, the matrices $P(t, s)$ and $\underline{P}(t, s)$ are in general not symmetric; the exception is $\underline{P}(t)$, which not only is symmetric but has its diagonal full of ones. Because of this last property, $\underline{P}(t, s)$ can never lose any 1 as $t$ grows, or, to put it differently, the corresponding graph can never lose an edge. Before we get to the structural properties of $P(t, s)$, we need to answer two basic questions: how small can the nonzero entries be and how many bits do we need to represent them? As was shown in [8, 14], nonzero elements of $P(t, s)$ can be bounded uniformly, ie, independently of $t$. Note that this relies critically on the positivity of the diagonals. Indeed, without the condition $c_i(t)d_i(t) < 1$, we could choose $P(t) = A$ for even $t$ and $P(t) = B$ for odd $t$, where

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad B = \frac{1}{2} \begin{pmatrix} 0 & 2 & 0 \\ 1 & 0 & 1 \\ 0 & 2 & 0 \end{pmatrix}.$$

For even $t > 0$ ,

$$P(t, 1) = (AB)^{t/2} = \begin{pmatrix} 2^{-t/2} & 1 - 2^{1-t/2} & 2^{-t/2} \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

To understand this process, think of a triangle with a distinguished vertex called the halver. Each vertex holds an amount of money. At odd steps, the halver splits its amount in half and passes on each half to its neighbor; the other vertices, meanwhile, pass on their full amount to the halver. The total amount of money in the system remains the same. At the following (even) step, the role of halver is handed to another vertex (which one does not matter); and the process repeats itself. This alternate sequence of halving and relabeling steps produces an exponential decay. If each vertex is prohibited from passing its full amount, however, then money travels while leaving a "trace" behind. As we prove below, exponential decay becomes impossible. This prohibition is the equivalent of the positive self-confidence built into bird flocking.
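The halver example is easy to check numerically. Below is a minimal NumPy sketch (an illustration, not part of the proof) that confirms the closed form for $(AB)^{t/2}$ and the exponential decay of the smallest positive entry:

```python
import numpy as np

# Transition matrices from the text: A relabels the halver (swaps vertices
# 1 and 2), B lets vertex 2 halve its amount while the others pass theirs on.
A = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.]])
B = 0.5 * np.array([[0., 2., 0.],
                    [1., 0., 1.],
                    [0., 2., 0.]])

def backward_product(t):
    """P(t, 1) = (AB)^(t/2) for even t."""
    return np.linalg.matrix_power(A @ B, t // 2)

def closed_form(t):
    """The closed form claimed in the text for even t > 0."""
    h = 2.0 ** (-t // 2)
    return np.array([[h, 1 - 2 * h, h],
                     [0., 1., 0.],
                     [0., 1., 0.]])

for t in (2, 10, 40):
    P = backward_product(t)
    assert np.allclose(P, closed_form(t))
    # the smallest positive entry decays like 2^(-t/2): no uniform lower bound
    assert abs(P[P > 0].min() - 2.0 ** (-t // 2)) < 1e-15
```

Without the positive-diagonal condition, the smallest positive entry of the backward product thus vanishes exponentially fast, which is exactly what Lemma 3.1 rules out.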

LEMMA 3.1. For any $1 \leq s \leq t$ , the elements of $P(t, s)$ are CD-rationals over $O((t - s + 1)n \log n)$ bits. The nonzero elements are in $n^{-O(n^2)}$ .

Proof. Each row of $P(t)$ contains rationals with the same $O(\log n)$ -bit denominator, so the matrix $P(t)$ can be written as $N^{-1}$ times an integer matrix, where both $N$ and the matrix elements are encoded over $O(n \log n)$ bits. Each element of $P(t, s)$ is a product of $t - s + 1$ such matrices; hence a matrix with $O((t - s + 1)n \log n)$ -bit integer elements divided by a common $O((t - s + 1)n \log n)$ -bit integer. For the second part of the lemma, we use arguments from [8, 14]. Recall that $P(t) = I_n - C_t L_t$ , where $C_t$ is a diagonal matrix of positive rationals encoded over $O(\log n)$ bits, so the case $t = s$ is obvious. Let $\rho(t, s)$ be the smallest positive element of $P(t, s)$ and suppose that $t > s$ .

We begin with a few words of intuition. Because $P(t, s) = P(t)P(t - 1, s)$, a nonzero entry $p_{ij}(t, s)$ is the expected value of $p_{kj}(t - 1, s)$, for a random $k$ adjacent to $i$ in $\underline{P}(t)$, or, to be more precise, in the graph induced by the nonzero elements of that matrix. If, for all such $k$, $p_{kj}(t - 1, s) > 0$, then $p_{ij}(t, s)$, being an average of positive numbers, is at least $\rho(t - 1, s)$, and we are done. On the other hand, having some $p_{kj}(t - 1, s)$ equal to 0 means that the edge $(k, j)$ is missing from the "graph" $\underline{P}(t - 1, s)$. If we now consider the 2-edge path formed by $(k, i)$ in $\underline{P}(t)$ and $(i, j)$ in $\underline{P}(t - 1, s)$, we conclude that at least one of $(i, j)$ or $(k, j)$ is a brand-new edge in $\underline{P}(t, s)$. We then use the fact that such events happen rarely.

  • Suppose that $p_{kj}(t - 1, s) > 0$ for each $i, j, k$ such that $p_{ij}(t, s)p_{ik}(t) > 0$. Then, for any $p_{ij}(t, s) > 0$, by stochasticity,

$$p_{ij}(t, s) = \sum_k p_{ik}(t)p_{kj}(t - 1, s) \geq \left( \sum_k p_{ik}(t) \right) \rho(t - 1, s) = \rho(t - 1, s).$$

It follows that $\rho(t, s) \geq \rho(t - 1, s)$ .

  • Assume now that $p_{ij}(t, s)p_{ik}(t) > 0$ and $p_{kj}(t - 1, s) = 0$ for some $i, j, k$. Since $p_{ij}(t, s)$ is positive, so is $p_{il}(t)p_{lj}(t - 1, s)$ for some $l$; hence $p_{ij}(t, s) \geq p_{il}(t)p_{lj}(t - 1, s) \geq \rho(t - 1, s)n^{-O(1)}$. We show that this drop coincides with the gain of a 1 in $\underline{P}(t, s)$. The footprint of $P(t)$ is symmetric, so $p_{ki}(t) > 0$ and hence

$$p_{kj}(t, s) = \sum_l p_{kl}(t)p_{lj}(t-1, s) \geq p_{ki}(t)p_{ij}(t-1, s) \geq n^{-O(1)}p_{ij}(t-1, s).$$

We distinguish between two cases. If $p_{ij}(t-1, s)$ is positive, then so is $p_{kj}(t, s)$ . Since $p_{kj}(t-1, s) = 0$ , the matrix $P(t, s)$ has at least one more positive entry than $P(t-1, s)$ ; recall that no entry can become null as we go from $P(t-1, s)$ to $P(t, s)$ . On the other hand, if $p_{ij}(t-1, s) = 0$ , our assumption that $p_{ij}(t, s) > 0$ leads us to the same conclusion. In both cases, $P(t, s)$ differs from $P(t-1, s)$ in at least one place: this cannot happen more than $n^2$ times.

If we fix $s$ then $\rho(t, s) \geq \rho(t-1, s)$ for all but at most $n^2$ values of $t$ . For the others, as we saw earlier, $p_{ij}(t, s) \geq \rho(t-1, s)n^{-O(1)}$ ; hence $\rho(t, s) \geq \rho(t-1, s)n^{-O(1)}$ . $\square$
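The two cases of the proof can be watched in action on random matrices with the right structure (row-stochastic, symmetric footprint, positive diagonal). A NumPy sketch, where `random_flock_matrix` is an illustrative stand-in for the averaging matrices $P(t)$, not an actual flocking matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flock_matrix(n):
    """Row-stochastic matrix with symmetric footprint and positive diagonal."""
    G = np.triu(rng.random((n, n)) < 0.4, 1)
    G = G | G.T | np.eye(n, dtype=bool)       # symmetric, ones on the diagonal
    W = np.where(G, rng.random((n, n)) + 0.1, 0.0)
    return W / W.sum(axis=1, keepdims=True)

n, steps = 6, 60
prod = random_flock_matrix(n)
gains = 0
for _ in range(steps):
    old = prod
    prod = random_flock_matrix(n) @ prod      # backward product: new factor on the left
    assert np.all((old > 0) <= (prod > 0))    # the footprint never loses a 1
    if (prod > 0).sum() > (old > 0).sum():
        gains += 1                            # the footprint gained a 1 ...
    else:                                     # ... otherwise rho cannot drop
        assert prod[prod > 0].min() >= old[old > 0].min() * (1 - 1e-9)
assert gains <= n * n                         # at most n^2 footprint changes
```

The loop checks exactly the dichotomy of the proof: the smallest positive entry can decrease only at a step where the footprint acquires a new 1, and that can happen at most $n^2$ times.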

The coordinates of $v(1)$ and $x(0)$ can be expressed as CD-rationals over $O(\mathfrak{p}n)$ bits. By the previous lemma, this implies that, in the noise-free case, for $t > 1$ , $v(t) = (P(t-1, 1) \otimes I_d)v(1)$ is a vector with CD-rational coordinates over $O(tn \log n + \mathfrak{p}n)$ bits. The equation of motion (1) yields

$$x(t) = x(0) + \left( (P(t-1, 1) + \cdots + P(1, 1) + I_n) \otimes I_d \right) v(1).$$

Note that $P(t-1, 1) = N^{-1}Q$, where $Q$ is an integer matrix with $O(tn \log n)$-bit integer elements and $N$ is an $O(tn \log n)$-bit integer. The other matrices are subproducts of $P(t-1, 1) = P(t-1) \cdots P(1)$, so we can also express them in this fashion for the same value of $N$. It follows that $v(t)$ and $x(t)$ have CD-rational coordinates over $O(tn \log n + \mathfrak{p}n)$ bits. Adding noise makes no difference asymptotically. Indeed, bringing all the coordinates of the scaling vectors $\alpha$ into CD-rational form adds only $O(n \log n)$ bits to the velocities at each step.

LEMMA 3.2. For any $t \geq 1$ , the vectors $v(t)$ and $x(t)$ have CD-rational coordinates over $O(tn \log n + \mathfrak{p}n)$ bits.

The $\ell_\infty$ norm of the velocity vector never grows, as the transition matrices only average out its coordinates and the noise factors are bounded by 1: since $\mathfrak{p} \geq n^3$, it follows that, for any $t \geq 1$,

$$\|v(t)\|_2 = 2^{O(\mathfrak{p})}. \quad (3)$$
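The non-growth claim is simply the fact that each averaging step replaces every coordinate by a convex combination of the old coordinates. A quick numerical check (random row-stochastic matrices, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

v = rng.normal(size=12)                   # an arbitrary starting velocity vector
start = np.abs(v).max()                   # its l_infinity norm
for _ in range(100):
    P = rng.random((12, 12))
    P /= P.sum(axis=1, keepdims=True)     # a row-stochastic averaging step
    v = P @ v                             # each new coordinate is a convex
    assert np.abs(v).max() <= start + 1e-9   # combination of the old ones
```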

Ergodicity. Ignoring noise, the fundamental motion equation (1) gives the position of the birds at time $t > 1$ as $x(t) = x(0) + (P^*(t-1) \otimes I_d)v(1)$ , where

$$P^*(t) = P(1) + P(2)P(1) + P(3)P(2)P(1) + \cdots + P(t) \cdots P(2)P(1).$$

Products of the form $P(t) \cdots P(1)$ appear in many applications [22], including the use of colored random walks in space-bounded interactive proof systems [5,6]. One important difference is that random walks correspond to products that grow by multiplication from the right while the dynamics of bird flocking is associated with backward products: the transition matrices evolve by multiplication from the left. This changes the nature of ergodicity. Intuitively, one would expect (if all goes well) that these products should look increasingly like rank-one matrices. But can the rows continue to vary widely forever, albeit in lockstep (weak ergodicity), or do they converge to a fixed vector (strong ergodicity)? The two notions are equivalent for backward products but not for the forward kind [22]. Here is an intuitive explanation. Backward products keep averaging the rows, so their entries themselves tend to converge: geometrically, the convex hull of the points formed by the rows keeps shrinking. Forward products lack this notion of averaging. For a simple illustration of the difference, consider the three stochastic matrices:

$$A = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \qquad B = \frac{1}{2} \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix} \qquad C = \frac{1}{4} \begin{pmatrix} 3 & 1 \\ 3 & 1 \end{pmatrix}.$$

Backward products are given by the simple formula,

$$\underbrace{\cdots ABABABAB}_{n} = C,$$

for all $n > 1$ . On the other hand, the forward product tends to a rank-one matrix but never converges:

$$\underbrace{ABABABAB\cdots}_{n} = \begin{cases} C & \text{even } n > 1; \\ A & \text{odd } n > 0. \end{cases}$$
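The backward/forward contrast is easy to reproduce. A short NumPy check of both claims for the matrices $A$, $B$, $C$ above:

```python
import numpy as np

A = 0.5  * np.array([[1., 1.], [1., 1.]])
B = 0.5  * np.array([[2., 0.], [1., 1.]])
C = 0.25 * np.array([[3., 1.], [3., 1.]])

# Backward product ...ABAB: new factors multiply from the LEFT.  After the
# first two factors the product equals the rank-one matrix C and stays there.
back = B.copy()
for M in (A, B, A, B, A, B, A):
    back = M @ back
    assert np.allclose(back, C)          # C is absorbing: AC = BC = C

# Forward product ABAB...: new factors multiply from the RIGHT.  The product
# is rank one but oscillates between C (even length) and A (odd length).
fwd = A.copy()
history = [fwd.copy()]
for M in (B, A, B, A, B):
    fwd = fwd @ M
    history.append(fwd.copy())
assert np.allclose(history[1], C) and np.allclose(history[3], C)   # even lengths
assert np.allclose(history[2], A) and np.allclose(history[4], A)   # odd lengths
```

The backward product converges (strong ergodicity); the forward product is rank-one in the limit yet never settles, which is exactly the weak/strong distinction above.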

Figure 5: Premultiplying a matrix, whose rows are shown as points, by a stochastic matrix $P(t)$ shrinks its convex hull.

As we just mentioned, the key to ergodicity for backward products resides in the convex hull of the rows. We introduce a family of metrics to measure its "shrinkage." For any $p > 1$, let $\tau_p(A)$, the ergodicity coefficient of $A$, denote the $\ell_p$-diameter of the convex hull formed by the rows of a matrix $A$, ie,

$$\tau_p(A) = \max_{i,j} \|A_{i*} - A_{j*}\|_p,$$

where $A_{i*}$ denotes the $i$ -th row of $A$ . From the fact that $\ell_p$ is a metric space for $p > 1$ , it follows by convexity that the diameter is always achieved at vertices of the convex hull. We extend the definition to $p = 1$ but, for reasons soon to be apparent, it is important to keep the coefficients between 0 and 1, so we divide the diameter by two, ie,

$$\tau_1(A) = \frac{1}{2} \max_{i,j} \sum_k |a_{ik} - a_{jk}|.$$

To understand why $\tau_p(A)$ relates to ergodicity, assume that $A$ is row-stochastic. We observe then that

$$0 \leq \tau_1(A) = 1 - \min_{i,j} \sum_k \min\{a_{ik}, a_{jk}\} \leq 1.$$

This follows from the fact that the distance $|a - b|$ between two numbers $a, b$ is twice the difference between their average and the smaller one. There are many fascinating relations between these diameters [22]. For our purposes, the following submultiplicativity result will suffice [14].5

LEMMA 3.3. Given two row-stochastic matrices $A, B$ that can be multiplied,

$$\tau_2(AB) \leq \tau_1(A)\tau_2(B).$$

Proof. Fix the two rows $i, j$ that define $\tau_2(AB)$, and let $\alpha = 1 - \sum_k \min\{a_{ik}, a_{jk}\}$. Note that $0 \leq \alpha \leq \tau_1(A)$. If $\alpha = 0$, then $A_{i*} = A_{j*}$ and $\tau_2(AB) = 0$, so the lemma holds trivially. Assuming, therefore, that $\alpha > 0$, we derive

$$\begin{aligned} \tau_2(AB) &= \left\| \sum_k a_{ik} B_{k*} - \sum_k a_{jk} B_{k*} \right\|_2 \\ &= \left\| \sum_k (a_{ik} - \min\{a_{ik}, a_{jk}\}) B_{k*} - \sum_k (a_{jk} - \min\{a_{ik}, a_{jk}\}) B_{k*} \right\|_2 \\ &\leq \tau_1(A) \left\| \frac{1}{\alpha} \sum_k (a_{ik} - \min\{a_{ik}, a_{jk}\}) B_{k*} - \frac{1}{\alpha} \sum_k (a_{jk} - \min\{a_{ik}, a_{jk}\}) B_{k*} \right\|_2. \end{aligned}$$

Observe now that the coefficients $\alpha^{-1}(a_{ik} - \min\{a_{ik}, a_{jk}\})$ are nonnegative and sum up to 1, so the corresponding sum is a convex combination of the rows of $B$. The same is true of the other sum; so, by convexity, the distance between any two of them cannot exceed $\tau_2(B)$. $\square$
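Both the identity for $\tau_1$ and Lemma 3.3 can be sanity-checked numerically. A NumPy sketch, with `tau` computing the $\ell_p$-diameter of the rows (halved for $p = 1$, as in the definitions above):

```python
import numpy as np
from itertools import combinations

def tau(A, p):
    """Ergodicity coefficient: l_p-diameter of the rows (halved for p = 1)."""
    d = max(np.linalg.norm(A[i] - A[j], ord=p)
            for i, j in combinations(range(A.shape[0]), 2))
    return d / 2 if p == 1 else d

rng = np.random.default_rng(2)
for _ in range(200):
    A = rng.random((4, 4)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((4, 5)); B /= B.sum(axis=1, keepdims=True)
    # the identity tau_1(A) = 1 - min_{i,j} sum_k min(a_ik, a_jk)
    alt = 1 - min(np.minimum(A[i], A[j]).sum()
                  for i, j in combinations(range(4), 2))
    assert abs(tau(A, 1) - alt) < 1e-12
    # Lemma 3.3: tau_2(AB) <= tau_1(A) * tau_2(B)
    assert tau(A @ B, 2) <= tau(A, 1) * tau(B, 2) + 1e-12
```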


5 Submultiplicativity is not true for $\tau_2$ in general. First, to make the notion meaningful, we would need to normalize it and use $\hat{\tau}_2 = \tau_2/\sqrt{2}$ instead, to ensure that $\hat{\tau}_2(A) \leq 1$ for any stochastic $A$. Unfortunately, $\hat{\tau}_2$ is not submultiplicative, as we easily check by considering a regular random walk $A$ on $K_{2,2}$ and checking that $\hat{\tau}_2(A^2) > \hat{\tau}_2(A)^2$.

Figure 6: $\tau_p(A)$ is the $\ell_p$-diameter of the convex hull of the rows of $A$.

Displacement. For future use, we mention an elementary relation between bird distance and velocity. The relative displacement between two birds $\mathcal{B}_i$ and $\mathcal{B}_j$ is defined as $\Delta_{ij}(t) = |\text{DIST}_t(\mathcal{B}_i, \mathcal{B}_j) - \text{DIST}_{t-1}(\mathcal{B}_i, \mathcal{B}_j)|$, where the distance between two birds is denoted by $\text{DIST}_t(\mathcal{B}_i, \mathcal{B}_j) = \|x_i(t) - x_j(t)\|_2$.

LEMMA 3.4. For $t \geq 1$, $\Delta_{ij}(t) \leq \|v_i(t) - v_j(t)\|_2$.

Proof. By the triangle inequality,

$$\|x_i(t) - x_j(t)\|_2 \leq \|x_i(t-1) - x_j(t-1)\|_2 + \|x_i(t) - x_i(t-1) - (x_j(t) - x_j(t-1))\|_2.$$

Reversing the roles of $t$ and $t-1$ gives us a similar inequality, from which we find that

$$|\text{DIST}_t(\mathcal{B}_i, \mathcal{B}_j) - \text{DIST}_{t-1}(\mathcal{B}_i, \mathcal{B}_j)| \leq \|x_i(t) - x_i(t-1) - (x_j(t) - x_j(t-1))\|_2.$$

Since $x_i(t) - x_i(t-1) = v_i(t)$, the right-hand side is precisely $\|v_i(t) - v_j(t)\|_2$. $\square$
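Lemma 3.4 is a pure triangle-inequality fact and can be checked on random data; a small sketch using the motion rule $x(t) = x(t-1) + v(t)$:

```python
import numpy as np

rng = np.random.default_rng(3)

def displacement_bounded(x_prev, v):
    """One motion step x(t) = x(t-1) + v(t) for two birds; check Lemma 3.4."""
    x = x_prev + v
    delta = abs(np.linalg.norm(x[0] - x[1]) - np.linalg.norm(x_prev[0] - x_prev[1]))
    return delta <= np.linalg.norm(v[0] - v[1]) + 1e-12

# random positions and velocities in R^3 (illustrative data only)
assert all(displacement_bounded(rng.normal(size=(2, 3)), rng.normal(size=(2, 3)))
           for _ in range(1000))
```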

3.2 The Algebra and Geometry of Flocking

To separate the investigation of network switches from the time analysis is one of the key ideas of our method. Our first task, therefore, is to bound the number of times the flocking network can change, while ignoring how long it takes. Next, we investigate the special case of time-invariant networks. In the worst case, the pre-convergence flying time vastly exceeds the number of network switches, so it is quite intuitive that a time-invariant analysis should be critical. Our next task is then to prove the rationality of the limit configuration. We also show why the hysteresis rule is sound. We follow this with an in-depth study of the convex geometry of flocking. We define the flight net, and with it derive what is arguably our most versatile analytical tool: a mathematical statement that captures the intuition that flocks that hope to meet in the future must match their velocities more and more closely over time. To do this we introduce the key concept of a virtual bird, which is a bird that can switch identities with its neighbors nondeterministically.

Counting Network Switches. Let $N(n)$ be the maximum number of switches in the flocking network, ie, the number of times $t$ such that $P(t) \neq P(t+1)$. Obviously, $N(1) = 0$; note that, by our requirement that $C_t$ may vary only when $G_t$ does, we could use footprints equivalently in the definition. For the sake of our inductive argument, we need a uniform bound on $N(n)$ over all initial conditions. Specifically, we define $N(n)$ as the largest number of switches of an $n$-bird flocking network, given arbitrary initial conditions: for the purpose of bounding $N(n)$, $x(0)$ and $v(1)$ are any real vectors, with $\|v(1)\|_2 = 2^{O(\mathfrak{p})}$. This involves building a quantitative framework around the existential analyses of [8, 13–15]. We now prove the network switching bound claimed in the "Results" section of §1.

LEMMA 3.5. The maximum number $N(n)$ of switches in the flocking network is bounded by $n^{O(n^3)}(\mathfrak{p} + \log \frac{1}{\varepsilon_h})^{n-1}$.

COROLLARY 3.6. Under the default settings (2), $N(n) = n^{O(n^3)}$ .

Proof of Lemma 3.5. We begin with the noise-free model. Fix $s > 0$ once and for all. For $t > s$, let $N(t, s)$ be the number of network changes between times $s$ and $t$, ie, the number of integers $u$ ($s < u \leq t$) such that $\underline{P}(u) \neq \underline{P}(u-1)$. Since the diagonal of each $P(t)$ is positive, $\underline{P}(t, s)$ can never lose a 1 as $t$ grows, so there exists a smallest $T_1$ such that $\underline{P}(t, s) = \underline{P}(T_1, s)$ for all $t > T_1$. Consider the first column and let $n_0 < \dots < n_{l_1} \leq n$ be its successive Hamming weights (ie, number of ones); because $p_{11}(s) \neq 0$, $n_0 \geq 1$. We define $t_k$ as the smallest $t \geq s$ such that the first column of $P(t, s)$ acquires weight $n_k$. Note that $t_0 = s$ and $t_{l_1} \leq T_1$. How large can $N(t_{k+1}, t_k)$ be, for $0 \leq k < l_1$? Let $F$ denote the subgraph of $G_{t_k+1}$ consisting of the connected components (ie, flocks) that include the $n_k$ birds indexed by the first column of $\underline{P}(t_k, s)$. Intuitively, at time $t_k + 1$, bird $\mathcal{B}_1$ can claim it has had influence over the $n_k$ birds since time $t_0$. At time $t_k + 2$, this influence will spread further to the neighbors of these $n_k$ birds in $F$. Note that having been influenced by $\mathcal{B}_1$ in the past does not imply connectivity among the $n_k$ birds.

  • If $F$ contains more than $n_k$ birds then, at time $t_k + 1$, at least one of these extra birds, $\mathcal{B}_i$, is adjacent in $G_{t_k+1}$ to one of the $n_k$ birds, say, $\mathcal{B}_j$. Then, $p_{ij}(t_k + 1) > 0$ and $p_{j1}(t_k, s) > 0$; hence $p_{i1}(t_k + 1, s) \geq p_{ij}(t_k + 1)p_{j1}(t_k, s) > 0$. Since $\mathcal{B}_i$ is not one of the $n_k$ birds, $p_{i1}(t_k, s) = 0$ and the first column of $\underline{P}(t, s)$ acquires a new 1 between $t_k$ and $t_k + 1$. This implies that $t_{k+1} = t_k + 1$ and $N(t_{k+1}, t_k) \leq 1$.

  • Assume now that $F$ has exactly $n_k$ vertices. The flocking network $G_{t_k+1}$ consists of a set of flocks totalling $n_k$ birds and a separate set of flocks including the $n - n_k$ others. The next $N(n_k) + N(n - n_k) + 1$ network switches must include one between the two sets, since by then we must run out of allowable "intra-switches." It follows by monotonicity of $N(n)$ that

$$N(t_{k+1}, t_k) \leq 1 + N(n_k) + N(n - n_k) \leq 2N(n - 1) + 1.$$


Figure 7: The white birds have all been influenced by $\mathcal{B}_1$ : on the left, they propagate that influence at the next step; on the right, they have to wait for flocks to join together before the influence of $\mathcal{B}_1$ can expand further.

In both cases, $N(t_{k+1}, t_k) \leq 2N(n - 1) + 1$ , so summing over all $0 \leq k < l_1$ ,

$$N(t_{l_1}, s) = \sum_{k=0}^{l_1-1} N(t_{k+1}, t_k) \leq 2nN(n - 1) + n.$$

Of course, there is nothing special about bird $\mathcal{B}_1$ . We can apply the same argument for each column and conclude that the time $T_1$ when the matrix $\underline{P}(t, s)$ has finally stabilized satisfies

$$N(T_1, s) \leq 2nN(n - 1) + n. \quad (4)$$

The index set $V_1$ corresponding to the ones in the first column of $\underline{P}(T_1, s)$ is called the first stabilizer. For $t > T_1$ , no edge of $G_t$ can join $V_1$ to its complement, since this would immediately add more ones to the first column of $\underline{P}(t, s)$ . This means that $\mathcal{B}_1$ can no longer hope to influence any bird outside of $V_1$ past time $T_1$ .

Relabel the rows and columns so that all the ones in $\underline{P}(T_1, s)$'s first column appear on top. Then, for any $t > T_1$, $P(t)$ is a 2-block diagonal matrix with the top left block, indexed by $V_1 \times V_1$, providing the transitions among the vertices of $V_1$ at time $t$. This is a restatement of our observation regarding $G_t$ and $V_1$. Here is why. Since the footprint of $P(t)$ is symmetric, it suffices to consider the consequence of a nonzero, nondiagonal entry in $P(t)$, ie, $p_{ij}(t) > 0$, with $i \notin V_1$ and $j \in V_1$. This would imply that

$$p_{i1}(t, s) \geq p_{ij}(t)p_{j1}(t-1, s) > 0,$$

and hence that $i \in V_1$, a contradiction. Being 2-block diagonal is invariant under composition, so $P(t, T_1 + 1)$ is also a matrix of that type. Let $A|_{V \times W}$ denote the submatrix of $A$ with rows indexed by $V$ and columns by $W$. Writing $V_0 = \{1, \dots, n\}$, for $t > T_1$,

$$P|_{V_1 \times V_0}(t, s) = P|_{V_1 \times V_1}(t, T_1 + 1)\, P|_{V_1 \times V_0}(T_1, s).$$

By setting $s$ to $T_1 + 1$ we can repeat the same argument, the only difference being that the transition matrices are now $|V_1|$-by-$|V_1|$. This leads to the second stabilizer $V_2 \subseteq V_1$, which, by relabeling, can be assumed to index the top of the subsequent matrices. We define $T_2$ as the smallest integer such that $\underline{P}|_{V_1 \times V_1}(t, T_1 + 1) = \underline{P}|_{V_1 \times V_1}(T_2, T_1 + 1)$ for all $t > T_2$. The set $V_2$ indexes the ones in the first column of $\underline{P}|_{V_1 \times V_1}(T_2, T_1 + 1)$. Iterating in this fashion leads to an infinite sequence of times $T_1 < T_2 < \dots$ and stabilizers $V_1 \supseteq V_2 \supseteq \dots$ such that, for any $t > T_k$,

$$P|_{V_k \times V_0}(t, s) = P|_{V_k \times V_k}(t, T_k + 1)\, P|_{V_k \times V_{k-1}}(T_k, T_{k-1} + 1) \cdots P|_{V_2 \times V_1}(T_2, T_1 + 1)\, P|_{V_1 \times V_0}(T_1, T_0 + 1),$$

where $P|_{V_i \times V_{i-1}}(T_i, T_{i-1} + 1)$ is a $|V_i|$-by-$|V_{i-1}|$ matrix and $T_0 = s - 1$. The stabilizers are the sets under refreshed influence from $\mathcal{B}_1$. We illustrate this decomposition below:

$$A = \frac{1}{2} \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} \quad B = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \quad C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Consider the word $M = CB^3CABABA$ . The matrix $M|_{V_6 \times V_0}$ is factored as

$$C|_{V_6 \times V_5}\, B|_{V_5 \times V_4}\, B|_{V_4 \times V_3}\, (BC)|_{V_3 \times V_2}\, (AB)|_{V_2 \times V_1}\, (ABA)|_{V_1 \times V_0},$$

where $V_0 = V_1 = V_2 = \{1, 2, 3\}$, $V_3 = V_4 = V_5 = \{1, 2\}$ and $V_6 = \{1\}$. The factorization looks like this:

$$M|_{V_6 \times V_0} = \begin{pmatrix} 1 & 0 \end{pmatrix} \cdot \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} \cdot \frac{1}{4} \begin{pmatrix} 2 & 2 & 0 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \end{pmatrix} \cdot \frac{1}{8} \begin{pmatrix} 4 & 2 & 2 \\ 2 & 3 & 3 \\ 2 & 3 & 3 \end{pmatrix},$$

with the infinite nested sequence

$$V_1 = \{1, 2, 3\} \supseteq \{1, 2, 3\} \supseteq \{1, 2\} \supseteq \{1, 2\} \supseteq \{1, 2\} \supseteq \{1\} \supseteq \{1\} \supseteq \{1\} \cdots$$

What is the benefit of rewriting the top rows of $P(t, s)$ in such a complicated manner? The first column of each $P|_{V_i \times V_{i-1}}(T_i, T_{i-1} + 1)$ consists entirely of positive entries, so the submultiplicativity of the ergodicity coefficients implies rapid convergence of the products toward a rank-one matrix. This has bearing on the relative displacement of birds and groupings into flocks. By Lemma 3.1, each entry in the first column of each $P|_{V_i \times V_{i-1}}(T_i, T_{i-1} + 1)$ is at least $n^{-O(n^2)}$, so half the $\ell_1$-distance between any two rows is at most $1 - n^{-O(n^2)} \leq e^{-n^{-O(n^2)}}$; therefore

$$\tau_1(P|_{V_i \times V_{i-1}}(T_i, T_{i-1} + 1)) \leq e^{-n^{-O(n^2)}}.$$

Lemma 3.3 implies that $\tau_2(A) \leq \tau_1(A)\tau_2(I) \leq \sqrt{2}\tau_1(A)$ , and

$$\begin{aligned} \tau_2(P|_{V_k \times V_0}(t, s)) &\leq \sqrt{2}\,\tau_1(P|_{V_k \times V_k}(t, T_k + 1)) \prod_{i=1}^k \tau_1(P|_{V_i \times V_{i-1}}(T_i, T_{i-1} + 1)) \\ &\leq \sqrt{2}\,e^{-kn^{-O(n^2)}}. \end{aligned} \quad (5)$$
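The worked example above (the word $M = CB^3CABABA$ and its factorization through the stabilizers) can be verified numerically; a NumPy sketch:

```python
import numpy as np

A = 0.5 * np.array([[2., 0., 0.], [0., 1., 1.], [0., 1., 1.]])
B = 0.5 * np.array([[1., 1., 0.], [1., 1., 0.], [0., 0., 2.]])
C = np.eye(3)

# The word M = C B^3 C A B A B A, read as a backward product
# (rightmost factor applied first, i.e. P(1) = A, ..., P(10) = C).
M = C @ B @ B @ B @ C @ A @ B @ A @ B @ A

# Stabilizers from the example: V0 = V1 = V2 = {1,2,3}, V3 = V4 = V5 = {1,2},
# V6 = {1}.  M restricted to V6 x V0 is the first row of M.
factors = [
    np.array([[1., 0.]]),                          # C on V6 x V5
    0.5 * np.array([[1., 1.], [1., 1.]]),          # B on V5 x V4
    0.5 * np.array([[1., 1.], [1., 1.]]),          # B on V4 x V3
    0.5 * np.array([[1., 1., 0.], [1., 1., 0.]]),  # (BC) on V3 x V2
    A @ B,                                         # (AB) on V2 x V1 (full matrix)
    A @ B @ A,                                     # (ABA) on V1 x V0 (full matrix)
]
prod = factors[0]
for F in factors[1:]:
    prod = prod @ F
assert np.allclose(prod, M[:1, :])                 # the factorization is exact
```

The check works because, past each stabilization time, the matrices are block diagonal with respect to the current stabilizer, so restricting each factor commutes with taking the product.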

Let $\chi(i, j)$ denote the $n$ -dimensional vector with all coordinates equal to 0, except for $\chi(i, j)_i = 1$ and $\chi(i, j)_j = -1$ . Note that

$$v_i(t) - v_j(t) = ((\chi(i, j)P(t-1, 1)) \otimes I_d)v(1);$$

therefore, by Cauchy-Schwarz and (3),

$$\|v_i(t) - v_j(t)\|_2 \leq \sqrt{d}\,\tau_2(P(t-1, 1))\|v(1)\|_2 \leq \tau_2(P(t-1, 1))\,2^{O(\mathfrak{p})}. \quad (6)$$

If we restrict $i, j$ to $V_k$, we can replace $P(t-1, 1)$ by $P|_{V_k \times V_0}(t-1, 1)$ and write

$$\|v_i(t) - v_j(t)\|_2 \leq \tau_2(P|_{V_k \times V_0}(t-1, 1))\,2^{O(\mathfrak{p})}.$$

Setting $k = n^{b_0 n^2} \lceil \mathfrak{p} + \log \frac{1}{\varepsilon_h} \rceil$ for a large enough integer constant $b_0 > 0$, we derive from (5) that, for any $t > T_k + 1$,

$$\max_{i, j \in V_k} \|v_i(t) - v_j(t)\|_2 \leq e^{-kn^{-O(n^2)} + O(\mathfrak{p})} < \varepsilon_h. \quad (7)$$

By Lemma 3.4, it then follows that $\Delta_{ij}(t) < \varepsilon_h$ . By the hysteresis rule, this means that if birds $\mathcal{B}_i$ and $\mathcal{B}_j$ are joined after time $T_k + 1$ , they will always remain so. This leaves at most $\binom{|V_k|}{2}$ extra network changes (final pairings), so the total number is conservatively bounded by

$$N(T_k, T_{k-1}) + \cdots + N(T_1, 1) + \binom{|V_k|}{2}.$$

But (4) holds for any pair $(T_i, T_{i-1} + 1)$ , so

$$N(n) < k(2nN(n-1) + n) + n^2.$$

Since $N(1) = 0$, for all $n > 1$,

$$N(n) = n^{O(n^3)} \left(\mathfrak{p} + \log \frac{1}{\varepsilon_h}\right)^{n-1}.$$

There is a technical subtlety we need to address. In the inductive step defining $N(n-1)$, and more generally $N(n')$ for $n' < n$, the initial conditions and element sizes of the transition matrices should be treated as global parameters: they depend on $n$, not $n'$. In fact, it is safe to treat $n$ as a fixed parameter everywhere, except in the recurrence (4). The key observation is that, as $n'$ decreases, the bounds provided by (5) and by the setting $k = n^{b_0 n^2} (\mathfrak{p} + \log \frac{1}{\varepsilon_h})$ remain valid and, in fact, become increasingly conservative. The noise is handled by reapplying the bound after each of the $e^{O(n^3)}$ perturbations. $\square$

Figure 8: The arborescence of birds separating into groups.

Remark 2.1. The rationality of positions and velocities was never used in the proof. The only requirement is that the initial velocities of the birds should have Euclidean norm in $2^{O(\mathfrak{p})}$ .

Remark 2.2. The nested sequence $V_1 \supseteq V_2 \supseteq \dots$ is infinite but the number of different subsets obviously is not. The smallest stabilizer $V_i$, denoted $V_{k_1}$ to indicate its relation to $\mathcal{B}_1$, cannot be empty since a bird influences itself forever; hence $1 \in V_{k_1}$. If $|V_{k_1}| > 1$, then $\mathcal{B}_1$ influences all the birds in $V_{k_1}$ recurrently, ie, infinitely often. In fact, this is true not just of $\mathcal{B}_1$ but of all of $V_{k_1}$, all of whose birds influence all others in that set recurrently. The sets $V_{k_1}, \dots, V_{k_n}$ are therefore pairwise disjoint or equal. This implies a partition of the bird set into recurrently self-influencing classes. One can model the process leading to it as an arborescence whose root corresponds to the first time the set of $n$ birds is split into two subsets that will no longer influence each other. Iterating in this fashion produces a tree whose leaves are associated with the disjoint $V_{k_j}$'s. Note that the stabilizers $V_1, V_2$, etc, are specific to $\mathcal{B}_1$ and their counterparts for $\mathcal{B}_2$ might partly overlap with them (except for the last one); therefore, the path in the tree toward the leaf labeled $V_{k_1}$ cannot be inferred directly from the stabilizers.

Time-Invariant Flocking. Birds are expected to spend most of their time flying in fixed flocks. We investigate this case separately. The benefit is to derive a convergence time that is exponentially faster than in the general case. In this section, $G_t = G$ is time-invariant; for notational convenience, we assume there is a single flock, ie, $G_t$ is connected. The flocking is noise-free. We can express the stochastic matrix $P$ as $I_n - CL$ . The corresponding Markov chain is reversible and, because of connectivity, irreducible. The diagonal being nonzero, it is aperiodic, hence ergodic. The transition matrix $P$ has the simple dominant eigenvalue 1 with right and left eigenvectors $\mathbf{1}$ and

$$\pi = \frac{1}{\text{tr } C^{-1}} C^{-1} \mathbf{1},$$

respectively. Lack of symmetry does not keep $P$ from being diagonalizable, though it denies us eigenvector orthogonality. Define

$$M = C^{-1/2} P C^{1/2} = C^{-1/2} (I_n - CL) C^{1/2} = I_n - C^{1/2} L C^{1/2}. \quad (8)$$

Being symmetric, $M$ can be diagonalized as $\sum_{k=1}^n \lambda_k u_k u_k^T$ , where the $u_k$ 's are orthonormal eigenvectors and the eigenvalues are real. It follows that $P$ can be diagonalized as well, with the same eigenvalues. By Perron-Frobenius and standard properties of ergodic walks [4, 22], $1 = \lambda_1 > \lambda_2 \geq \dots \geq \lambda_n > -1$ and $u_1 = (\sqrt{\pi_1}, \dots, \sqrt{\pi_n})^T$ . Since $\sum_k u_k u_k^T = I_n$ , the following identity holds for all nonnegative $s$ , including $s = 0$ (for which we must assume that $0^0 = 1$ ):

$$P^s = C^{1/2} M^s C^{-1/2} = \mathbf{1} \pi^T + \sum_{k=2}^n \lambda_k^s C^{1/2} u_k u_k^T C^{-1/2}. \quad (9)$$

The left and right eigenvectors of $P$ for $\lambda_k$ are given (in column form) by $C^{-1/2} u_k$ and $C^{1/2} u_k$ and, together, form inverse matrices; in general, neither group forms an orthogonal basis. We can bound the second largest eigenvalue by using standard algebraic graph theory. We include a proof for completeness.
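To make the symmetrization concrete, here is a minimal numerical sketch, with NumPy, of (8) and (9) on a hypothetical 4-bird network; the adjacency matrix and the diagonal of $C$ are made-up values, chosen only so that $P = I_n - CL$ is stochastic.

```python
import numpy as np

# Hypothetical connected 4-bird network; Laplacian L = D - A.
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
L = np.diag(A.sum(1)) - A
c = np.array([0.2, 0.25, 0.2, 0.5])            # diagonal of C (made up)
P = np.eye(4) - np.diag(c) @ L
assert np.allclose(P.sum(1), 1) and (P >= 0).all()   # P is stochastic

# (8): M = C^{-1/2} P C^{1/2} is symmetric, so P is diagonalizable with real spectrum.
M = np.diag(c**-0.5) @ P @ np.diag(c**0.5)
assert np.allclose(M, M.T)
lam, U = np.linalg.eigh(M)                     # eigenvalues ascending, U orthonormal

# Dominant eigenvalue 1 with u_1 = (sqrt(pi_1), ..., sqrt(pi_n))^T.
pi = (1/c) / (1/c).sum()                       # stationary distribution
u1 = U[:, -1] * np.sign(U[0, -1])              # fix the sign of the Perron vector
assert np.isclose(lam[-1], 1) and np.allclose(u1, np.sqrt(pi))

# (9): P^s = 1 pi^T + sum_{k>1} lam_k^s C^{1/2} u_k u_k^T C^{-1/2}, checked for s = 3.
s = 3
rhs = np.outer(np.ones(4), pi) + sum(
    lam[k]**s * np.outer(np.sqrt(c) * U[:, k], U[:, k] / np.sqrt(c)) for k in range(3))
assert np.allclose(np.linalg.matrix_power(P, s), rhs)
```

Note that `eigh` returns eigenvalues in ascending order, so the dominant eigenvalue 1 sits at index $-1$ rather than index 1 as in the text's ordering $\lambda_1 > \lambda_2 \geq \dots$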

LEMMA 3.7. *If $\mu \stackrel{\text{def}}{=} \max_{k>1} |\lambda_k|$ , then $\mu \leq 1 - n^{-O(1)}$ .*

Proof. By the $O(\log n)$ -bit encoding of $C$ , each diagonal entry of $P$ is at least $n^{-b}$ , for some constant $b$.6 The matrix $(1 - \frac{1}{2}n^{-b})^{-1}(P - \frac{1}{2}n^{-b}I_n)$ is stochastic and all of its eigenvalues lie in $[-1, 1]$ . It follows that $\lambda_k \geq n^{-O(1)} - 1$ , for any $k > 1$ . Observe now that $1 - \lambda_2$ is the smallest positive eigenvalue of the normalized Laplacian $C^{1/2}LC^{1/2}$ . The simplicity of the eigenvalue 0 (by connectivity) implies that any eigenvector of the normalized Laplacian corresponding to a nonzero eigenvalue is normal to $C^{-1/2}\mathbf{1}$ ; therefore, by Courant-Fischer,

$$1 - \lambda_2 = \min \left\{ x^T C^{1/2} L C^{1/2} x : \mathbf{1}^T C^{-1/2} x = 0 \text{ and } \|x\|_2 = 1 \right\}.$$

Write $y = C^{1/2}x$ and express the system in the equivalent form: $1 - \lambda_2 = \min y^T L y$ , subject to (i) $\mathbf{1}^T C^{-1} y = 0$ and (ii) $\|C^{-1/2}y\|_2 = 1$ . By using ideas from [4, 12], we argue that, by (i), $y_m \leq 0$ for some index $m$ and, by (ii), $y_M \geq (\text{tr } C^{-1})^{-1/2}$ for some index $M$ . Since $G$ is connected, there exists a path $\mathcal{M}$ of length at most $n$ joining nodes $m$ and $M$ . Thus, by Cauchy-Schwarz, the solution $y$ of the system satisfies:

$$\begin{aligned} 1 - \lambda_2 = y^T L y &= \sum_{(i,j) \in G} (y_i - y_j)^2 \geq \sum_{(i,j) \in \mathcal{M}} (y_i - y_j)^2 \geq \frac{1}{n} \left( \sum_{(i,j) \in \mathcal{M}} |y_i - y_j| \right)^2 \\ &\geq \frac{1}{n} |y_M - y_m|^2 \geq \frac{1}{n(\text{tr } C^{-1})} = n^{-O(1)}. \qquad \square \end{aligned}$$

By (9), for all $i, j, s > 0$ , $(P^s)_{ij} \geq \pi_j - \sum_{k>1} |\lambda_k|^s \sqrt{c_i/c_j} |(u_k)_i (u_k)_j| \geq \pi_j - n^{O(1)} \mu^s$ . A similar derivation gives us the corresponding upper bound; so,7 by Lemma 3.7,

$$\|P^s - \mathbf{1}\pi^T\|_F \leq e^{-sn^{-O(1)} + O(\log n)}. \quad (10)$$

Similarly, for $s > n^{c_0}$ , for a constant $c_0$ large enough,

$$\begin{aligned} \tau_1(P^s) &= 1 - \min_{i,j} \sum_{k=1}^n \min \{ (P^s)_{ik}, (P^s)_{jk} \} \\ &\leq 1 - \sum_{k=1}^n (\pi_k - n^{O(1)} e^{-sn^{-O(1)}}) = n^{O(1)} e^{-sn^{-O(1)}} < \frac{1}{2}. \end{aligned} \quad (11)$$
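The decay in (10) and the drop of $\tau_1(P^s)$ below $\frac{1}{2}$ in (11) can both be observed numerically. A sketch on the same hypothetical 4-bird network as above; the constant 5 below stands in for the $n^{O(1)}$ factor and is valid for this particular instance, not in general.

```python
import numpy as np

# Hypothetical 4-bird network, P = I - CL stochastic, pi stationary (made-up values).
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
L = np.diag(A.sum(1)) - A
c = np.array([0.2, 0.25, 0.2, 0.5])
P = np.eye(4) - np.diag(c) @ L
pi = (1/c) / (1/c).sum()

lam = np.linalg.eigvalsh(np.diag(c**-0.5) @ P @ np.diag(c**0.5))
mu = max(abs(lam[:-1]))                  # mu = max_{k>1} |lam_k| < 1 (Lemma 3.7)
assert mu < 1

# (10): ||P^s - 1 pi^T||_F decays like mu^s up to a poly(n) factor (here, 5).
Pi = np.outer(np.ones(4), pi)
for s in (1, 10, 40):
    err = np.linalg.norm(np.linalg.matrix_power(P, s) - Pi)
    assert err <= 5 * mu**s

# (11): the contraction coefficient tau_1(P^s) is below 1/2 for s large enough.
Ps = np.linalg.matrix_power(P, 40)
tau1 = 1 - min(np.minimum(Ps[i], Ps[j]).sum() for i in range(4) for j in range(4))
assert tau1 < 0.5
```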

Given a vector $\xi$ in $\mathbb{R}^n$ , consider the random variable $X$ formed by picking the $i$ th coordinate of $\xi$ with probability $\pi_i$ . As claimed in the introduction, the variance of $X$ is a quadratic Lyapunov function. This is both well known and intuitively obvious since we are sampling from the stationary distribution of an ergodic Markov chain and then taking one “mixing” step: the standard deviation decreases at a rate given by the Fiedler value. As was observed in [19], because the random variable involves only $\pi$ and not $P$ , any flock switching that keeps the graph connected with the same stationary distribution admits a common quadratic Lyapunov function. If $\xi = \mathbf{1}$ , then obviously, $\mathbf{var}X = 0$ . We now show that the variance decays exponentially fast.

6 To simplify the notation, constants such as $b$ and $c$ are reused frequently in the text, with their values depending on the context.

7 The Frobenius norm $\|M\|_F$ of a matrix is the Euclidean norm of the vector formed by its elements. The property we will use most often is a direct consequence of Cauchy-Schwarz, $\|Mu\|_2 \leq \|M\|_F \|u\|_2$ , and more generally the submultiplicativity of the norm.

LEMMA 3.8. $\mathbf{var}(PX) \leq \mu^2(\mathbf{var}X)$ .

Proof. For any $\xi$ , the vector $y = (I_n - \mathbf{1}\pi^T)\xi$ is such that $C^{-1/2}y$ is orthogonal to $u_1 = (\sqrt{\pi_1}, \dots, \sqrt{\pi_n})^T$ . Therefore the latter lies in the contractive eigenspace of $M$ and

$$\|M(C^{-1/2}y)\|_2 \leq \mu\|C^{-1/2}y\|_2;$$

hence, by (8),

$$\begin{aligned} (Py)^T C^{-1}(Py) &= (y^T C^{-1/2})(C^{1/2} P^T C^{-1/2})(C^{-1/2} P C^{1/2})(C^{-1/2} y) \\ &= \|M C^{-1/2} y\|_2^2 \leq \mu^2 \|C^{-1/2} y\|_2^2. \end{aligned}$$

As a result,

$$(Py)^T C^{-1}(Py) \leq \mu^2 y^T C^{-1} y.$$

Since $\pi = (\text{tr } C^{-1})^{-1} C^{-1} \mathbf{1}$ ,

$$\mathbf{var}X = \sum_{i=1}^n \pi_i \left( \xi_i - \sum_{j=1}^n \pi_j \xi_j \right)^2 = \xi^T (I_n - \pi \mathbf{1}^T) \frac{C^{-1}}{\text{tr } C^{-1}} (I_n - \mathbf{1} \pi^T) \xi = y^T \frac{C^{-1}}{\text{tr } C^{-1}} y.$$

Because $P$ commutes with $I_n - \mathbf{1} \pi^T$ ,

$$\mathbf{var}(PX) = (Py)^T \frac{C^{-1}}{\text{tr } C^{-1}} (Py) \leq \mu^2 (\mathbf{var}X),$$

and $\mathbf{var}X$ is the desired Lyapunov function. $\square$

What both (11) and Lemma 3.8 indicate is that convergence for a time-invariant flock evolves as $e^{-tn^{-O(1)}}$ , whereas in general the best we can do is invoke (5) and hope for a convergence speed of the form $e^{-tn^{-O(n^2)}}$ , which is exponentially slower.
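Lemma 3.8 is easy to test empirically: sample random vectors $\xi$ and check that one averaging step contracts the $\pi$-weighted variance by at least $\mu^2$. A sketch on the same hypothetical network as before:

```python
import numpy as np

# Hypothetical 4-bird network (made-up values), as in the earlier sketches.
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
L = np.diag(A.sum(1)) - A
c = np.array([0.2, 0.25, 0.2, 0.5])
P = np.eye(4) - np.diag(c) @ L
pi = (1/c) / (1/c).sum()
lam = np.linalg.eigvalsh(np.diag(c**-0.5) @ P @ np.diag(c**0.5))
mu = max(abs(lam[:-1]))                  # second-largest eigenvalue magnitude

def var_pi(xi):
    """Variance of X: the i-th coordinate of xi, picked with probability pi_i."""
    return pi @ (xi - pi @ xi) ** 2

# Lemma 3.8: var(PX) <= mu^2 var(X), for every xi.
rng = np.random.default_rng(0)
for _ in range(100):
    xi = rng.normal(size=4)
    assert var_pi(P @ xi) <= mu**2 * var_pi(xi) + 1e-12
```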

The Rationality of Limit Configurations. The locations of the birds remain rational at all times. Does this mean that their limit configurations are rational as well? We prove that this is, indeed, the case. We do not do this simply out of curiosity: it will be needed for the analysis of convergence. We cover the case of a time-invariant connected network here and postpone the general case for later. For $t > 0$ , we define

$$\Gamma_t = -\mathbf{1}\pi^T t + \sum_{s=0}^{t-1} P^s. \quad (12)$$

It is immediate that $\Gamma_t$ converges to some matrix $\Gamma$ , as $t$ goes to infinity. Indeed, by (9),

$$\Gamma = \sum_{s \geq 0} (P^s - \mathbf{1}\pi^T) = \sum_{k > 1} \frac{1}{1-\lambda_k} C^{1/2} u_k u_k^T C^{-1/2}.$$

What is perhaps less obvious is why the limit is rational. We begin with a simple characterization of $\Gamma$ , which we derive by classical arguments about fundamental matrices for Markov chains [11]. We also provide a more ad hoc characterization (Lemma 3.10) that will make later bound estimations somewhat easier.

LEMMA 3.9. As $t \rightarrow \infty$ , $\Gamma_t$ converges to $\Gamma = -\mathbf{1}\pi^T + (I_n - P + \mathbf{1}\pi^T)^{-1}$ .

Proof. Because $\mathbf{1}$ and $\pi$ are respectively right and left eigenvectors of $P$ for the eigenvalue 1, for any integer $s > 0$ ,

$$(P - \mathbf{1}\pi^T)^s = P^s - \mathbf{1}\pi^T. \quad (13)$$

This follows from the identity

$$\begin{aligned} (P - \mathbf{1}\pi^T)^s &= P^s + \sum_{k=0}^{s-1} (-1)^{s-k} \binom{s}{k} P^k (\mathbf{1}\pi^T)^{s-k} \\ &= P^s + (\mathbf{1}\pi^T) \sum_{k=0}^{s-1} (-1)^{s-k} \binom{s}{k} = P^s - \mathbf{1}\pi^T. \end{aligned}$$

And so, for $t > 1$ ,

$$\Gamma_t + \mathbf{1}\pi^T = I_n + \sum_{s=1}^{t-1} (P^s - \mathbf{1}\pi^T) = \sum_{s=0}^{t-1} (P - \mathbf{1}\pi^T)^s.$$

Pre-multiplying this identity by the “denominator” that we expect from the geometric sum, ie, $I_n - P + \mathbf{1}\pi^T$ , we simplify the telescoping sum, using (13) again,

$$\begin{aligned} (I_n - P + \mathbf{1}\pi^T)(\Gamma_t + \mathbf{1}\pi^T) &= (I_n - P + \mathbf{1}\pi^T) \sum_{s=0}^{t-1} (P - \mathbf{1}\pi^T)^s \\ &= I_n - (P - \mathbf{1}\pi^T)^t = I_n - (P^t - \mathbf{1}\pi^T). \end{aligned}$$

By (9), $P^t$ converges to $\mathbf{1}\pi^T$ as $t$ goes to infinity, so $(I_n - P + \mathbf{1}\pi^T)(\Gamma_t + \mathbf{1}\pi^T)$ converges to the identity. This implies that, for $t$ large enough, the matrix cannot be singular and, hence, neither can $I_n - P + \mathbf{1}\pi^T$ . This allows us to write:

$$\Gamma + \mathbf{1}\pi^T = (I_n - P + \mathbf{1}\pi^T)^{-1}. \qquad \square$$
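The closed form of Lemma 3.9 can be checked against the partial sums (12) directly. A numerical sketch, again on the hypothetical 4-bird network used above:

```python
import numpy as np

# Hypothetical 4-bird network (made-up values), as before.
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
L = np.diag(A.sum(1)) - A
c = np.array([0.2, 0.25, 0.2, 0.5])
P = np.eye(4) - np.diag(c) @ L
pi = (1/c) / (1/c).sum()
Pi = np.outer(np.ones(4), pi)

# Lemma 3.9: Gamma = -1 pi^T + (I - P + 1 pi^T)^{-1}.
Gamma = -Pi + np.linalg.inv(np.eye(4) - P + Pi)

# (12): Gamma_t = -t 1 pi^T + sum_{s<t} P^s converges to Gamma geometrically.
t = 300
Gamma_t, Ps = -t * Pi, np.eye(4)
for _ in range(t):
    Gamma_t += Ps
    Ps = Ps @ P
assert np.allclose(Gamma_t, Gamma, atol=1e-9)
assert np.allclose(Gamma @ np.ones(4), 0)    # 1 lies in the kernel of Gamma
```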

There is another characterization of $\Gamma$ without $\pi$ in the inverse matrix. We use the notation $(Y | y)$ to refer to the $n$ -by- $n$ matrix derived from $Y$ by replacing its last column with the vector $y$ .

LEMMA 3.10. $\Gamma = (I_n - \mathbf{1}\pi^T | \mathbf{0}) (I_n - P | \mathbf{1})^{-1}$ .

Proof. Since $\pi$ is a left eigenvector of $P$ for 1, $\mathbf{1}\pi^T(I_n - P) = 0$ ; hence, for $t > 0$ ,

$$I_n - P^t = (I_n + P + \cdots + P^{t-1})(I_n - P) = (\Gamma_t + \mathbf{1}\pi^T t)(I_n - P) = \Gamma_t(I_n - P).$$

As $t \rightarrow \infty$ , $P^t \rightarrow \mathbf{1}\pi^T$ ; therefore $\Gamma(I_n - P) = I_n - \mathbf{1}\pi^T$ . Since $\mathbf{1}$ lies in the kernel of $\Gamma_t$ , and hence of $\Gamma$ , the latter matrix satisfies the relation

$$\Gamma(I_n - P | \mathbf{1}) = (I_n - \mathbf{1}\pi^T | \mathbf{0}). \quad (14)$$

The simplicity of $P$ 's dominant eigenvalue 1 implies that $I_n - P$ has rank $n - 1$ . Since $\mathbf{1} \in \ker(I_n - P)$ , the last column of $I_n - P$ is the negative sum of the others; so, to achieve that rank, the first $n - 1$ columns of $I_n - P$ must be independent. Note that the vector $\mathbf{1}$ is not in the space they span: if, indeed, it were, we would have $\mathbf{1} = (I_n - P)y$ for some $y \in \mathbb{R}^n$ . Since $\pi^T(I_n - P) = 0$ , this would imply that $1 = \pi^T \mathbf{1} = \pi^T(I_n - P)y = 0$ , a contradiction. This shows that $(I_n - P | \mathbf{1})$ is of full rank, which, by (14), completes the proof. □
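The two characterizations of $\Gamma$ (Lemmas 3.9 and 3.10) agree, which is easy to verify numerically, along with the identity $\pi^T = (0, \dots, 0, 1)(I_n - P | \mathbf{1})^{-1}$ used later. A sketch on the same hypothetical network:

```python
import numpy as np

# Hypothetical 4-bird network (made-up values), as before.
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
L = np.diag(A.sum(1)) - A
c = np.array([0.2, 0.25, 0.2, 0.5])
P = np.eye(4) - np.diag(c) @ L
pi = (1/c) / (1/c).sum()
Pi = np.outer(np.ones(4), pi)

# Lemma 3.10: Gamma = (I - 1 pi^T | 0)(I - P | 1)^{-1}, where (Y | y)
# replaces the last column of Y with y.
Y = np.eye(4) - Pi
Y[:, -1] = 0
B = np.eye(4) - P
B[:, -1] = 1
Gamma_alt = Y @ np.linalg.inv(B)

# Lemma 3.9 for comparison.
Gamma = -Pi + np.linalg.inv(np.eye(4) - P + Pi)
assert np.allclose(Gamma_alt, Gamma)

# pi^T is the last row of (I - P | 1)^{-1}, i.e. pi^T = e_n^T (I - P | 1)^{-1}.
assert np.allclose(np.linalg.inv(B)[-1], pi)
```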

The motion equation (1) becomes, for $t \geq 1$ ,

$$x(t) = x(0) + \left( \sum_{s=0}^{t-1} P^s \otimes I_d \right) v(1) \quad (15)$$

or, equivalently, by (12),

$$x(t) = x(0) + t((\mathbf{1}\pi^T) \otimes I_d)v(1) + (\Gamma_t \otimes I_d)v(1). \quad (16)$$

We call $\mathbf{m}_\pi[x(t)] = (\pi^T \otimes I_d)x(t)$ the mass center of the flock and the vector $\mathbf{m}_\pi[v(1)]$ its stationary velocity. The latter is the first spectral (vector) coefficient of the velocity. In our lower bound, we will make it the first Fourier coefficient of the dynamical system. The mass center drifts in space at constant speed along a fixed line in $d$ -space: Indeed, $\pi^T \Gamma_t = 0$ , so by (16),

$$\mathbf{m}_\pi[x(t)] = \mathbf{m}_\pi[x(0)] + t\mathbf{m}_\pi[v(1)]$$

and

$$x(t) = \underbrace{x(0)}_{\text{start}} + \underbrace{t(\mathbf{1} \otimes I_d)\mathbf{m}_\pi[v(1)]}_{\text{linear drift}} + \underbrace{(\Gamma_t \otimes I_d)v(1)}_{\text{damped oscillator}}. \quad (17)$$

The oscillations are damped at a rate of $e^{-tn^{-O(1)}}$ . (We use the term not in the “harmonic” sense but by reference to the negative eigenvalues that might cause actual oscillations.) Moving the origin to the mass center of the birds, we express $x(t)$ , relative to this moving frame, as

$$x^r(t) = x(t) - (\mathbf{1} \otimes I_d) \mathbf{m}_\pi[x(t)];$$

therefore, by simple tensor manipulation,

$$x(t) = x^r(t) + ((\mathbf{1}\pi^T) \otimes I_d)x(0) + t((\mathbf{1}\pi^T) \otimes I_d)v(1); \quad (18)$$

and, by (16),

$$x^r(t) = x(t) - ((\mathbf{1}\pi^T) \otimes I_d)x(t) = ((I_n - \mathbf{1}\pi^T) \otimes I_d)x(0) + (\Gamma_t \otimes I_d)v(1)$$

and, by Lemma 3.9,

LEMMA 3.11. If $G$ is connected, the relative flocking configuration $x^r(t)$ converges to the limit

$$x^r = ((I_n - \mathbf{1}\pi^T) \otimes I_d)x(0) + (\Gamma \otimes I_d)v(1).$$

The mass center of the configuration moves in $\mathbb{R}^d$ at constant speed in a fixed direction.
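Lemma 3.11 can be checked by simulating the dynamics directly: run the velocity-averaging updates $v(t+1) = Pv(t)$, $x(t+1) = x(t) + v(t+1)$, and compare the relative configuration against the closed-form limit. A sketch with $d = 2$ on the same hypothetical network; positions and velocities are random made-up data.

```python
import numpy as np

# Hypothetical 4-bird network (made-up values), as before; d = 2 dimensions.
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
L = np.diag(A.sum(1)) - A
c = np.array([0.2, 0.25, 0.2, 0.5])
P = np.eye(4) - np.diag(c) @ L
pi = (1/c) / (1/c).sum()
Pi = np.outer(np.ones(4), pi)
Gamma = -Pi + np.linalg.inv(np.eye(4) - P + Pi)

rng = np.random.default_rng(1)
x0 = rng.normal(size=(4, 2))            # rows = birds, columns = coordinates
v1 = rng.normal(size=(4, 2))            # initial velocities v(1)

x, v, t = x0.copy(), v1.copy(), 300
for _ in range(t):                      # x(s) = x(s-1) + v(s); v(s+1) = P v(s)
    x = x + v
    v = P @ v

# Relative configuration x^r(t) vs. the limit of Lemma 3.11.
xr_t = x - np.outer(np.ones(4), pi @ x)
xr_limit = (np.eye(4) - Pi) @ x0 + Gamma @ v1
assert np.allclose(xr_t, xr_limit, atol=1e-8)

# The mass center drifts linearly: m_pi[x(t)] = m_pi[x(0)] + t m_pi[v(1)].
assert np.allclose(pi @ x, pi @ x0 + t * (pi @ v1))
```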

LEMMA 3.12. The elements of $\Gamma$ and the coordinates of the limit configuration $x^r$ are CD-rationals over $O(n \log n)$ and $O(n \log n + \mathfrak{p}n)$ bits, respectively.

Proof. Let $C_b$ denote the $O(n \log n)$ -bit long product of all the denominators in the diagonal matrix $C$ . The determinant of $(CL | \mathbf{1})$ can be expressed as $C_b^{-1}$ times the determinant $N$ of an $n$ -by- $n$ matrix with $O(\log n)$ -bit integer elements. By the Hadamard bound [30], $N$ is an $O(n \log n)$ -bit integer. For the same reason, each element of $\text{adj}(CL | \mathbf{1})$ is also the product of $C_b^{-1}$ with an $O(n \log n)$ -bit integer; therefore,

$$(I_n - P | \mathbf{1})^{-1} = (CL | \mathbf{1})^{-1} = \frac{\text{adj}(CL | \mathbf{1})}{\det(CL | \mathbf{1})}$$

is of the form $N^{-1}$ times an $O(n \log n)$ -bit integer matrix (since the two appearances of $C_b^{-1}$ cancel out). The same is true of $(I_n - \mathbf{1}\pi^T | \mathbf{0})$ : this is because, trivially, $\pi^T = (0, \dots, 0, 1)(I_n - P | \mathbf{1})^{-1}$ . Therefore, both $(I_n - \mathbf{1}\pi^T | \mathbf{0})$ and $(I_n - P | \mathbf{1})^{-1}$ are matrices with CD-rational coordinates over $O(n \log n)$ bits. Lemma 3.11, with the formulation of Lemma 3.10 for $\Gamma$ , completes the proof. $\square$

This implies that $x(t)$ tends toward $a + bt$ , where $a, b$ are rational vectors. Since the number of switches and perturbations is finite, this proves the rationality claim made in §1. $\square$
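The rationality claim of Lemma 3.12 can also be verified in exact arithmetic: with $C$ rational, the formula of Lemma 3.10 yields $\Gamma$ through a single rational matrix inversion. A minimal sketch using Python's `fractions`, on a hypothetical 4-bird network with made-up rational entries for $C$:

```python
from fractions import Fraction as F

n = 4
A = [[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]]          # hypothetical network
deg = [sum(row) for row in A]
c = [F(1,5), F(1,4), F(1,5), F(1,2)]                   # rational diagonal of C
L = [[F(deg[i]) if i == j else F(-A[i][j]) for j in range(n)] for i in range(n)]
P = [[F(i == j) - c[i]*L[i][j] for j in range(n)] for i in range(n)]
trCinv = sum(1/ci for ci in c)
pi = [(1/ci)/trCinv for ci in c]                       # stationary distribution

def inv(M):
    """Exact Gauss-Jordan inverse over the rationals."""
    m = [row[:] + [F(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f*b for a, b in zip(m[r], m[col])]
    return [row[n:] for row in m]

# Lemma 3.10: Gamma = (I - 1 pi^T | 0)(I - P | 1)^{-1}, exactly rational.
B = [[F(i == j) - P[i][j] if j < n-1 else F(1) for j in range(n)] for i in range(n)]
Y = [[F(i == j) - pi[j] if j < n-1 else F(0) for j in range(n)] for i in range(n)]
Binv = inv(B)
Gamma = [[sum(Y[i][k]*Binv[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]

assert all(sum(row) == 0 for row in Gamma)             # Gamma 1 = 0, exactly
assert all(sum(pi[i]*Gamma[i][j] for i in range(n)) == 0 for j in range(n))
```

Every entry of `Gamma` is a `Fraction`, confirming, for this instance, that the limit configuration is rational with denominators controlled by those of $C$.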
