Computing Homology - ANU

Chapter 3

Computing Homology 3.1

Introduction

We now turn our attention to the more difficult problem of deducing the homology of a compact metric space from a finite amount of data. The homology groups of a space characterize the number and type of holes in that space and therefore give a fundamental description of its structure. This type of information is used, for example, in understanding the structure of attractors from embedded time series data [58, 60], or for determining similarities between proteins in molecular biology [13]. The computability of homology groups from a given triangulation is well-known and the algorithm uses simple linear algebra [61]. This algorithm has extremely poor numerical behavior, however, so the study of computational homology remains an active area of research. In many applications knowledge of the entire group structure is unnecessary — all that is needed is the rank of the homology group, i.e., the Betti number. This information can be computed indirectly from a triangulation; many algorithms exist [11, 14, 26, 37]. The extraction of homology from data involves the additional problem of generating a triangulation or other regular cell-complex that reflects the topology of the underlying space. There are many different approaches to this [18, 37, 60]. Our goal is to develop computational techniques that allow us to extrapolate the homology of an underlying compact space, , given only a finite approximation, . We assume that approximates in a metric sense, i.e., that every point of is within distance of and vice versa. As in Chapter 2, the basic trick is to coarse-grain the data at a sequence of resolutions that tend to zero. Of course, the extrapolation will always be constrained by the accuracy of the data — this is measured by , the cutoff resolution. We model the coarse-graining of a set, , by closed -neighborhoods:1

Our main contribution to the literature on computational homology is sound mathematical foundations for relating the homology of the -neighborhoods of to the homology of . Our multiresolution approach has the additional advantage of being applicable to fractal sets. We begin this chapter with an overview of the relevant concepts from homology theory in Section 3.2. The section starts by describing simplicial homology theory. This theory is based on finite triangulations so is readily adapted to computer implementation. Fractal sets do not

1

"!

This is a different coarse-graining procedure to that in Chapter 2 — the resolution parameter there is related to the distance between points. A set is -connected in the sense of Chapter 2 if its -neighborhood is connected.

39

have finite triangulations, so a more general homology theory is needed. The appropriate forˇ ˇ mulation is Cech homology. The basis of Cech homology is an inverse system of finite triangulations that approximate a space. This idea is extended by shape theory which considers inverse systems of approximating spaces in a more general class. Our sequence of -neighborhoods fits this framework. The central results of this chapter are given in Section 3.3. This section describes the inverse and the corresponding inverse systems of homology groups. system of -neighborhoods of We would like to quantify the topological structure by computing Betti numbers as functions of . There is a problem with this however, since the -neighborhood can have holes that do not exist in . We resolve this problem by introducing the concept of persistent Betti that correspond to a hole in . When number, which counts the number of holes in has fractal structure, it is possible to see unbounded growth in the persistent Betti numbers as . We characterize this growth by assuming an asymptotic power law. The next part of this section derives formal relationships between the -neighborhoods of and a finite pointset approximation. For the finite approximations, we show that it is possible to reduce the computation of -persistent Betti numbers to linear algebra. In Section 3.4, we turn to the practical problem of how to implement these ideas computationally. As mentioned above, there are a number of existing approaches for generating triangulations and computing Betti numbers. The one that is closest to our needs is due to Edelsbrunner et al. [11, 18]. Their algorithms are based on subcomplexes of the Delaunay triangulation called alpha shapes. We describe their approach in some detail since we use the NCSA implementation of these algorithms to generate data in Section 3.5. We also give a brief overview of some other computational homology algorithms. In the final part of this section, we outline a multiresolution approach to computational homology that may be a more efficient implementation of our ideas. We use the Sierpinski triangle relatives as test examples again in Section 3.5. Since we have not yet implemented algorithms for computing the persistent Betti numbers, we give data for the regular Betti numbers. This distinguishes between the connected Sierpinski triangle and the simply connected relative. The examples demonstrate that the regular Betti numbers can be misleading and that the persistent Betti numbers are necessary for a proper characterization of the underlying topology. The material in Sections 3.3–3.5 is published in [69].

$#&%

3.2

An overview of homology theory

Homology theory is a branch of algebraic topology that attempts to distinguish between spaces by constructing algebraic invariants that reflect the connectivity properties of the space. The field has it origins in the work of Poincaré. In Section 3.2.1 we review the basic definitions for simplicial homology — a theory based on triangulations of spaces. Simplicial homology lends itself to computational implementation because triangulations of data are common numerical constructions and there is a well defined algorithm for computing homology groups from a given triangulation. Fractals typically require infinitely many simplices in their triangulations. This means the groups associated with a fractal should also be infinite, but this is not possible with simplicial homology. A different approach is needed, therefore, to describe spaces with infinitely detailed structure. The appropriate formulation of limit for this situation is given by the machinery of inverse systems, which we describe in Section 3.2.3. An inverse limit ˇ system is then used to define Cech homology in Section 3.2.4. We also give a brief outline of 40

c

c

a

b

a

b

[abc] = [bca] = [cab]

[bac] = [acb]=[cba]

σ

−σ

Figure 3.1: Two triangles (2-simplices) with opposite orientations. . h(c)

c h

. h(b)

. h(a)

, ' ()-. , ( * -

a

b

%

, *' -

' )(

, Figure 3.2: A triangulation of the circle. The simplicial complex contains the -simplices and , and the -simplices , and . The homeomorphism from the complex to the circle is denoted by .

*

+

/

shape theory, another branch of algebraic topology. The basic result of shape theory is that any compact metric space can be approximated by an inverse system of polyhedra. We use this idea to formulate the mathematical foundations for our approach to computational homology.

3.2.1

Simplicial homology

There are a number of different, but equivalent, formulations of homology theory. The simplest to understand is simplicial homology. This theory is based on triangulations of topological spaces (simplicial complexes). Singular homology is a more general technique that uses maps of simplices into a general topological space. Another common approach uses CW-complexes that are built from general -dimensional cells rather than simplices. This is probably the most popular tool in current research, and there is a good introduction to algebraic topology from this perspective by Hatcher [32]. We focus on simplicial homology, since it is the easiest to adapt for implementation on a computer. The notation we use in this section is based on that of Munkres [61].

0

Simplicial complexes

1 89: > : @ 3 D ?BAC + 8: > 3 1

243 1 0

1657+ %

The basic building block is an oriented -simplex, — the convex hull of geometrically independent points, , with . For example, a -simplex is just a point, a -simplex is a line segment, a -simplex a triangle, and a -simplex is a tetrahedron. We write to denote a -simplex and its vertices. The ordering of the vertices 41

E

defines an orientation of the simplex. This orientation is chosen arbitrarily but is fixed. Any even permutation of the vertices in a simplex gives another simplex with the same orientation, while an odd permutation gives a simplex with negative orientation; see Figure 3.1. An abstract simplicial complex, , is a collection of oriented simplices with the property that a non-empty intersection of two simplices in must itself be a simplex in . If the simplicial for some ; certain complexes with complex is finite then it can always be embedded in infinitely many simplices can also be embedded in finite-dimensional space (see Figure 3.10 for an example). Such an embedded complex is a geometric realization of . The subset of occupied by the geometric complex is denoted by and is called a polytope or polyhedron. When a topological space is homeomorphic to a polytope, , it is called a triangulated space, and the simplicial complex is a triangulation of . For example, a circle is homeomorphic to the boundary of a triangle, so the three vertices and three -simplices, are a triangulation of the circle; see Figure 3.2. A complete characterization of the class of topological spaces that have a triangulation is not known.

F

F

AGC

F

F ' )( *

F

, ' :( -. , ( * -. , *' -

F

0

F

AGC

+

Homology groups We now define the group structures associated with a space, , that is triangulated by a finite simplicial complex, . Although the triangulation of a space is not unique, the homology groups for any triangulation of the same space are identical; this makes simplicial homology welldefined. The essential ingredients for constructing the homology groups are the free groups of -simplices sums of -simplices, and the boundary operator that maps a -simplex to the in its boundary. The set of all -simplices from form the basis of a free group called the th chain group, . The group operation is an additive one; recall that is just with the opposite orientation, so this defines the inverse elements. In general, a -chain is the formal sum of a finite number of oriented -simplices: . The coefficients, , are elements of ˇ another group that is typically the integers: , but can be any abelian group. For Cech homology and our computational work we use rational or real coefficients. When , the boundary operator, , maps a -simplex onto a sum of the -simplices in its boundary. If is a -simplex, we have

F

1

K L 3

1IHJ+

1

1

F

H6243 1

243

1

O MPO ' O 2 O3 *3 N ' ' RO QTS K # K ; V 1 W U + X 1 1$H\+ 243 3Z, ]Y 8: > 3=:[ 3 - 1 ^ 3 O 89 b a O : (3.1) X 3 2 3 O`_ 8 HR+ , >> >> 3

89 >> c a O >> : - represents the 1HP+ -simplex obtained by deleting the vertex O . where , 3 The action of the boundary operator onO general 1 -chains is obtained by linear extension from O O I O O M M ' 243 ' X 3 24O3 . We drop the subscript from the its action on the 1 -simplices: X 3 boundary operator when the dimension is understood. 8 There are two ways to define the boundary operator X on % -chains. The first approach is to 8 8 d make X * % for all % -chains. With this definition, the dimension of the resulting homology group counts the number of path-connected components of a space. An alternative definition is 8 M O ' O 2 O8 is zero to say that the boundary of a -chain when the integer coefficients % * M O ' O % — this is called augmented homology.only The dimension of the augmented add to zero: 1

homology group is one less than the number of path-connected components. We use the first definition, because it links back to the work in Chapter 2, where we counted the number of -connected components.

42

c

c

0 a

b

a

b

Figure 3.3: The boundary of a 2-simplex is the sum of its three edges and the boundary of this 1-chain is zero.

, ' ) (= * -

As an example, consider the simplicial complex consisting of a triangle and all its edges is and vertices, as shown in Figure 3.3. The boundary of the -simplex

D

X , ' ( * -e , ( * - Hf, 'g* - 5h, ' ()-.

+ X , ( * - i H , 'j* - 5h, ' ()-k *lH ( H *lH' 5 . ( H ' d %c

and the boundary of this -chain is:

This illustrates the fundamental property of the boundary operator, namely that

X 3 m X 3>n ; %c (3.2) The proof is straightforward, and involves some combinatorics; see [35]. K that have important geometric interpretations. The We now consider two subgroups of 3 first subgroup consists of 1 -chains that map to zero under the boundary operator. This group is the cycle group, denoted o — it is the kernel of X and its elements are called 1 -cycles. The 3 3 second is the group of 1 -chains that bound a 1R5P+ -chain. This is the boundary group, p — it 3 ; . It follows from (3.2) that every boundary is a cycle, i.e. p is a subgroup is the image of X > 3 n 3 of o . 3 Since p ?qo 3 , we can form the quotient group, r 3 o 3ts p 3 . This is the precisely the 3 homology group. The elements of r are equivalence classes of 1 -cycles that do not bound any 3 characterizes — this is how homology 1wu; 5v w+ x chain ; wx Q holes. Formally, two 1 -cycles Q o are in the same equivalence class if 1 w -dimensional H 3 p 3 . Such cycles are w said to be 3 - Q r for the equivalence3 class 3 3 homologous. We write , w of cycles homologous to . 3 3 3 The homology groups of a finite simplicial complex are finitely generated abelian groups, so the following theorem tells us about their general structure.

Theorem 6 (Munkres, Thm 4.3). If G is a finitely generated abelian group then it is isomorphic to the following direct sum:

S |T}>}>}|7S |~S ; |P}>}>}|7S y{z @ (3.3) s= s= S TheS number of copies of the integer group is O called the Betti number . The cyclic O are called groups the torsion subgroups and the are the torsion coefficients. The torsion I O = s x which divides and so on. The coefficients have the property that + and ; divides torsion coefficients of r F measure the twistedness of the space in some sense. The Betti 3 is denoted . For 1BU+ , is the number nonnumber of the 1 th homology group r 3 3 3 equivalent non-bounding 1 -cycles and this can be interpreted as the number of 1 -dimensional 8 holes. As we mentioned earlier, counts the number of path-connected components of F . The Betti numbers are therefore exactly the type of information we seek. 43

a

b

c

f

d

a

g

d

h e

i

a

c

b

(a)

r 8R S r ; S|iS

j

e

a

(b)

r x S

Figure 3.4: (a) The torus has two non-equivalent, non-bounding 1-cycles as shown. The homology groups are , , and . (b) A triangulation of the torus. The left and right edges of the rectangle are identified, and the top and bottom edges. Examples Two simple examples are the circle, Figure 3.2, and the torus, Figure 3.4. The circle is homeomorphic to the triangle, and the chain groups have the following bases:

K x %j K ; Y , ' (:-. , ( * -. , 'j* - K 8 Y ' ) ( =* Y ()- ( There is a single generator for the 1-cycles, , ' 5v, * H, 'g*

and it is not boundary of a 2-chain. The circle is path-connected and there are no 2-simplices, so the homology groups are:

r 8; SS r Z r x %j

, ' )( - ~ 5 , ( * - 5~, *' -

, 'c - 5~, t- 5~, ' 2

The torus is triangulated by the simplicial complex in Figure 3.4(b). It has two nonand . These correspond to the loops in homologous 1-cycles, Figure 3.4(a). There is also a non-bounding 2-cycle, equal to the sum of all the 2-simplices. The homology groups are therefore:

r 68 S r ;l SS |7S r x I r %j

Smith normal form There is a well defined algorithm for computing the homology groups of a given simplicial complex. This algorithm is based on finding the Smith normal form (SNF) for a matrix representation of the boundary operators. Recall that the oriented -simplices form a basis for the th chain group, . This means it is possible to represent the boundary operator, ,

K

1

3

44

1; K K X Z3 Y 3 # =3 [

% + HI+ 3

3

;

3

by a matrix with entries in . We denote the SNF of the boundary matrix by . If is the number of -simplices then has columns and rows. The algorithm to reduce an integer matrix to SNF is similar to Gaussian elimination, but at all stages the entries remain integers. The resulting matrix has the form:

1

3

3=[

% "(

(>;

p 3 where p . . 3 N (3.4) . 3 b % O ( (; ( ( The nonzero entries satisfy UW+ and divides x , divides , and so on. For a full description of the basic algorithm see Munkres [61]. ; The SNF matrices for X and X give a complete characterization of the 1 th homology O > 3 n 3 ( ; that are greater are the diagonal entries, , of group, r . The torsion coefficients of r 3 3 > 3 n than one. The rank of the cycle group, o , is the number of zero columns of , i.e., 3 . 3 ; ,3 i.e., 3 ; .HThe The rank of the boundary group, p , is the number of non-zero rows of 3 3>n 3>n 1 th Betti number is therefore 3 P)t b¡¢ o 3 H )t b¡¢ p 3 d 3 H 3 H 3>n ; Bases for o and p 3 3 (and hence r 3 ) are determined by the row operations used in the SNF reduction. There are two practical problems with the algorithm for reducing a matrix to SNF as it is described in Munkres [61]. First, the time-cost of the algorithm is of a high polynomial degree in the number of simplices; second, the entries of the intermediate matrices typically become extremely large and create numerical problems. Devising algorithms that overcome these problems is an area of active research. When only the Betti numbers are required, it is possible to do better. In fact, if we construct the homology groups over the rationals, rather than the integers, then we need only apply Gaussian elimination to diagonalize the boundary operator matrices — a process that requires on the order of arithmetic operations. Doing this means we lose all information about the torsion, however. We discuss some other approaches to computational homology in Section 3.4.

04

3.2.2

The role of homotopy in homology

The study of homotopy leads to a substantial branch of algebraic topology. In this section we give some elementary definitions that are necessary for considering equivalence classes of maps between spaces and the corresponding homomorphisms of homology groups. For more details, see Munkres [61] or Hocking and Young [35]. Homotopy equivalence Two maps or two spaces are homotopy equivalent if there is a continuous deformation from one to the other. This type of equivalence is usually easy to visualize and gives us a powerful tool for computing homology groups. We start by defining a homotopy between two continuous maps. and be any topological spaces. Two maps are homotopic if their Let images, and , can be continuously deformed into one another. Formally, are homotopic if there is a mapping such that for each , and . The map is called a homotopy between and , and is the

£

£ ¥ ¨§G¤ , , © + ®¯¥ / 3 - if ,/ 89: >> / 3 - Q É /Í 2 3 d % otherwise Q K È . This definition is extended linearly to all chains * 3 3 The crucial property of the chain map /Í is that it commutes with the boundary operator, i.e., X /ÎÍ * 3 /Í X * 3 . This implies that /ÎÍ maps cycles in o 3 È to cycles in o 3 É and also boundaries to boundaries. This in turn means that /Í induces a homomorphism of the homology È #¦r É . This homomorphism is defined on equivalence classes of 1 groups, /ÎÏ Q r Y 3 cycles , w r 3 È by: /Î3 Ï , w 3 -Ð ,/Í w 3 - . See [35] for a proof that this definition satisfies 3 the properties of a homomorphism. È # È is the identity map, then /Ï is the identity homoIn the special case that / Y È # É is a homeomorphism of the underlying spaces, then /Ï is an morphism. If / Y isomorphism of the corresponding homology groups. For a continuous map, ¤ , any simplicial approximation to ¤ induces the same group homomorphism, denoted ¤tÏ . The result we need for Sections 3.2.4 and 3.3 is that homotopic maps

induce identical homomorphisms.

¤ ¥ Y È # É

¤ iis¥ homotopic to ¥ ¤Ï Ï.

are two continuous maps and that Theorem 8. Suppose Then they induce the same homomorphism of the homology groups, i.e.,

.

This is a standard result in homology; see [61] for a proof.

3.2.3

Inverse systems

In order to generalize homology to spaces with infinite structure we need to use a limit. There are two ways to generalize the notion of limit to general index sets and spaces: direct and inverse limit systems. Our problem involves the latter and we describe the associated definitions below; see Hocking and Young [35] or Spanier [77] for more details. An inverse system of topological spaces consists of collection of spaces, , indexed by a directed set , and continuous mappings , for each pair . The maps are called bonding morphisms and must satisfy the following two conditions.

ÒÓÕÔ

Ö Ñ>× Y × # Ñ Ö ÑÕÑI +ÇÚ the identity map on ÑÎ Ö >Ñ × Ö ×=Û Ö ÑÕÛg

for any choice of

47

Ñ

Ø ÔTÙ

and

Ô Ù Ü Ô Ø f

(3.5)

(3.6)

Ý ÞLÑb Ö Ñ>×]:Ò ß ·`àâáã Ý

LÑÎ Ö Ñ>×Î

Notice that these maps act against the order relation, which is why the system is an “inverse” one. We write for the inverse system, or when the index set is clear from context. The inverse limit space, , is the subspace of consisting of all “preorbits” in . That is,

Ý

äÓå Ñ

ß á·`à ã Ý ½ ]Ñc ]Ñ Q Ñ and Ö Ñ>×] × dP ]Ñ for Ø ÔiÙ (3.7) × ß ·æàáã Ý¸# × , are the continuous maps Ö × ]ÑcIÌ × . In fact, the The projections, Ö Y ·`àâáã Ý inverse limit space is defined up to an isomorphism: any space that is homeomorphic to ß is also an inverse limit of Ý . Ò We can often simplify the inverse limit calculation by using a subset of theQ index set, Ê ? Ò . The subset must be cofinal for this restriction to work, i.e., given any Ù Ò , there is an Q Ò Ê with Ø ÔiÙ . We then have the following theorem [52]. element Ø Ò Ò LÑb Ö Ñ>×]:Ò and LÑb Ö Ñ>×]:Ò Ê Theorem 9. If Ê is a cofinal subset of , then the inverse systems have isomorphic limits — i.e., their inverse limit spaces are homeomorphic.

8 , % + - è;Z , % ; -é , x +

ç

As an example of an inverse system, suppose the index set is the non-negative integers, , and let , and be the level- approximation to the middlethird Cantor set (c.f., Section 2.2). Since , for all , the bonding morphisms are just inclusion maps:

Ö 3ê¿Y êÓë #

; ?3 >3 n 3 3

1

when

1

ì 1]

It is easy to show that these maps satisfy the two conditions (3.5) and (3.6). The inverse system is represented by the diagram:

}>}>}Õí H]# îbï

; >} }>} x Hí=# ð è; Hí # ï 8 3=[ such that Q for all 1 , and Ö u The inverse limit space consists of sequences 3 . It3 follows that a sequence 3ê ê

. Since the projections are inclusion maps, this3 means

ñ ê 3 3 in the inverse limit space just repeats the same point, and this point must be in the Cantor set. íH]# 3

Therefore, the inverse limit space is homeomorphic to the middle-third Cantor set. The concept of inverse limit system also exists for the category of groups. In this case the bonding morphisms are group homomorphisms and the inverse limit is a subgroup of the direct of the groups in the inverse system. We use this in the next section when we desum ˇ fine Cech homology by relating an inverse system of topological spaces and the corresponding inverse system of homology groups.

ò å y Ñ

3.2.4

ˇ Cech homology

ˇ Cech homology is a general homology theory that can capture the infinitely detailed structure of a fractal. It agrees with simplicial homology on finite simplicial complexes. For more background material, see Hocking and Young [35] or Mardesic and Segal [52]. The nerve of a cover

ó L

ˇ The Cech approach is based on a way to generate simplicial complexes by taking the nerve of a cover. Given a compact Hausdorff space, , let denote the family of all finite open 48

ô Q ó L

coverings of (a covering, , is a collection of open sets whose union contains ). We construct an abstract simplicial complex, called the nerve of the cover, by associating each open set, with a vertex, also labelled , in the complex. An edge exists between two vertices and if and only if the corresponding open sets intersect. Higher-dimensional are included if the intersection is non-empty. Note that although simplices the space could be low dimensional, the nerve of a given covering could contain many highdimensional simplices. In fact, given any finite simplicial complex and a compact, perfect, Hausdorff space , it is possible to construct a covering of whose nerve is isomorphic to [35]. This ambiguity is removed by considering an inverse limit system. A partial order relation, , that makes a directed set is given by the notion of refinement. The cover is a refinement of , denoted , if for any set , there is a set such that . The family of all covers with this ordering is the index set for the ˇ Cech inverse system; we now define the bonding morphisms. If refines , we define a projection map by taking the image of a set , , to be any fixed element such that . The definition of the projection map is not unique since there may be more than one choice of for a given . However, any two choices of projection from into are homotopic. Therefore, we define the bonding morphisms to be homotopy equivalence classes of projections , and these “H-maps” then is referred to as the satisfy the conditions (3.5) and (3.6). The inverse system ˇ Cech system. Next, we consider the corresponding inverse system of homology groups.

õ Q ô õ ö8 ,õ > >õ 3

ø

õ Q ô

ø Ö ú¢û ö

õ

O ÷ Ò3 _ 8 õ

È

Ô

ô

ó L

öù?Áõ

ô

ø

ö Q ø

ø Ô ô

Öcú¢û Y ~ ø #¯ô öÁ?fõ

õ Q ô

ô

È

ö Q ø

õ

ö

ô Ö ú¢Ö ú¢û û ó L

ˇ The Cech homology groups

ô

y

The nerve of a finite cover is a simplicial complex, so we can compute its homology groups by the usual techniques. There is one slight difference, however, the coefficient group must be more general than the integers; for example, we could use the rational or real numbers. The homology groups of the nerves with coefficients in are denoted . The projection maps are identified with simplicial maps of the nerves of and . They therefore induce . Theorem 8 implies homomorphisms on the homology groups: that two elements of the homotopy class of induce the same homomorphism on the hoˇ is called the th Cech homology mology groups. The inverse system ˇ system. Finally, the th Cech homology group with coefficients in is denoted and ˇ is defined to be the inverse limit of the th Cech homology system. in for some Geometrically, suppose there is a non-bounding -cycle . Then this -cycle exists in the limit only if given any refinement of , there is a -cycle whose image under the projection homomorphism is , i.e., . ˇ is overwhelmingly large for compuThe Cech system for all finite covers of the space tational purposes; this is why cofinal subsets are useful. A subcollection, , is cofinal in if given any cover , there is a cover such that is a refinement of . As an example, if we have a compact space , let be a finite cover of by open balls with diameter . The collection of all such coverings for forms ˇ a cofinal subset of . We can now set up a Cech system for this subfamily of coverings and compute the corresponding inverse system of homology groups. By Theorem 9, the inverse ˇ limit of this system is isomorphic to the inverse limit of the full Cech system. Thus, using a ˇ cofinal family of coverings greatly reduces the amount of work needed to “compute” the Cech homology groups. The use of cofinal families is also important in our computational work.

y

Ö ú¢û

1

ów LvQ y 1 3Ê r 3 ø ó L ô

ó L

Ö 4ú û Ï Y r r ô y Ö ú¢: û Öjú¢û 3 1

ô Q ó Lv +s0

49

3 ø Ïó 1

r 3 ô y y #¬ô r ô ø y 3 Lv 1 y ür 3 L y Q w r ô y ô 3 øw 3 ô w Ê d 1 w Ö Ï 4 ú û 3 3 3 ó Ê L ?Áó L Q L ø óýÊ ø ?PAþ ô C 0 Q~S

C C

A

B

A B

Figure 3.6: The level-1 cover of the Sierpinski triangle and the nerve of this cover. The higher level covers repeated this pattern on smaller scales.

Figure 3.7: Nerves of the level-3, level-2, and level-1 covers of the Sierpinski triangle. The arrows show the action of the projection maps on the circled triples of points. An example

ÿ

ˇ We use a cofinal sequence of covers of the Sierpinski triangle, , to compute its Cech homology groups. This example is considered further in Section 3.5. , let be a cofinal sequence of covers of with the following For properties. The sets in each cover are identical open convex regions with different centers. There are such open sets in such that for each there are exactly three sets that map into , and each point of the Sierpinski triangle belongs to at most two of the sets in . See Figure 3.6 for an illustration of the level-1 cover, and its nerve. The last condition implies that the nerve of has only 0-simplices and 1-simplices. The nerves for , , and are given in Figure 3.7. This figure also shows the action of the bonding morphisms, , which are defined by the inclusion of the three sets into . The simple, one-dimensional nature of the nerves means that we can write down the homology groups by inspection. The zeroth and second homology groups are the same for all

Ä % +D O E õ O n ; ö O n ;= O n ; Q

ô ; ô õ O n ; ö O n ;= O n ; Q

E >>

ô O ô O

õ OQ ô O

ô On ; O õO ô x OO ;ô O ; Ö O ; n Y ô nOGQ Â # Oô O ô n õ ô

ô O

50

ÿ

ô ;

c

p

12

d1 d2 a

b Figure 3.8: The projection map from

ô x

into

ô ;.

cc

dc ac

p23

d2

d3

ca

c

bc

cb

a da aa

b

db ba

ab

bb

Figure 3.9: The projection map from

51

ô

into

ô x.

Ä:

z y r 98 ô OO yy W z j% r x ô

Ä % + D > > % + D >> for Ä

for

The first order homology groups have the following pattern:

r ;; ô 8=; yy r ô r ; ô x y r ; ô y r ;= ô O y

z j% z{y z{y | y | y | y y z{y ; O ; z{y C where 0 O ^ [ E 3 + E O \ H + _8 D 3 consider the action of the inclusion maps on the first-order homology only, starting with ;Ö x Ï We ; ô ; y is the loop ; , . The action of this map is shown in Figure 3.8. The basis for r y ; ) = ( : ; ô x is given by)(=the * and a basis for r four loops, ' . Since Ö x always maps three x * , are each collapsed to a single point. The action of points into one, the corner loops, ' ;Ö x Ï r ; ô x y #Àr ; ô ;= y is therefore Y ' ( H]#À% H]#À% % ; x * H]H]#À # ; A matrix representation of Ö x Ï relative to the bases above is ; x % % % +d ;= ô y # r ; ô x y , see r The above action is repeated in a threefold way for Ö x Ï ;= ô y as follows: Y Figure 3.9. We order the basis for r ' )( * : ' >> : ' >> : : þ The action of Ö x Ï is H]#À' H]# ( H]#À* H]# x y ; All other loops in r ô map to %j . The matrix for this map is: % % % + % % % + x % % % + + 52

1 n

( - ,1)

(0,1)

(1,1)

... 1 1 (- , - ) n

2

(0,0)

(1,0)

ür ;ý y

r ; %

ˇ Figure 3.10: The Warsaw circle has trivial simplicial homology group, , but its Cech homology group is . It has the same “shape” as the circle because they are both inverse limits of a sequence of annuli.

Ö O O n ; Ï Y r ; ô O n ;= y # r ; ô O y we O ; O O O [ ; O O ; [ O ; O n [ + ;= y is ˇ This means that the first Cech homology group for the Sierpinski triangle, r ü ÿ y ; O consist of sequences, w , an infinitely generated abelian group. The elements of r ü ÿ O \ O Q O O O O of elements, w r ;= ô y , such that Ö n ; Ï w n ;\ w for all Ä + D E >> . For example, the Sierpinski triangle O would be represented by the sequence O ¬cthe ;=: x central : >>hole , andin the ¬ % ' : >> . Smaller holes have ' -loop of ô x by ' This pattern continues for higher levels, and for have the recursive form for the matrix

a longer initial string of zeros. ˇ The next section generalizes Cech homology by considering inverse limit systems of approximating spaces other than nerves of covers.

3.2.5

Shape theory

Shape theory generalizes the concept of homotopy equivalence by considering inverse systems of “nice” approximating spaces. As an example, the unit circle has the same shape as the Warsaw circle of Figure 3.10, because both are inverse limits of sequences of annuli, although they have different homotopy and simplicial homology groups. The essential result we use from shape theory is that every compact metric space is homeomorphic to the limit of an inverse system in the category of finite polyhedra and homotopy ˇ equivalence classes of maps (H-maps). The Cech system for a compact space is an example ˇ of such an inverse system. Shape theory generalizes the Cech approach by allowing the approximating spaces to be homotopy equivalent to a finite polyhedron. The following theorem is a compilation of results from Mardesic and Segal [52, App.1]; it characterizes spaces that are homotopy equivalent to finite polyhedra. Theorem 10. A topological space with the homotopy type of a compact CW-complex or a compact ANR is homotopy equivalent to a compact polyhedron. 53

CW-complexes are general cell-complexes, and ANR stands for absolute neighborhood retract; see [35] for definitions. ANRs are an important class of space; they have many useful properties — one of these is that the identity map extends to a neighborhood of the space (this is the neighborhood retract part of ANR). The unit cube in and the -sphere are examples of ANRs. The algebraic side of shape theory associates an inverse system of homology groups with ˇ the inverse system of finite polyhedra, in much the same manner as Cech homology. That is, if is an inverse system of polyhedra, then is the corresponding inverse system of homology groups. A third result we need from shape theory is:

AGC

dÑb Ö Ñ>×

dÑb Ö Ñ>×

0

r dÑc: Ö >Ñ × Ï 3

¿ÑÎÕÑ>×Î

and are inverse systems of polyhedra (or ANRs or CWTheorem 11. If complexes) and H-maps whose inverse limit spaces are homotopy equivalent, then the corresponding inverse systems of homology groups have isomorphic limits. ˇ For a compact space, , the Cech system is an inverse system of polyhedra. The above theorem therefore implies that any other inverse system of polyhedra yields an inverse system ˇ of homology groups whose limit isomorphic to Cech homology. This is the sense in which ˇ shape theory generalizes Cech homology. In the following section we consider an inverse system of closed -neighborhoods and inclusion maps for a compact space .

3.3

Foundations for computing homology

In this section we develop theoretical foundations for understanding how homology groups computed from data relate to the homology of the space they approximate. The setting for our analysis is as follows. We assume that the underlying space, , is a compact subset of a metric , and that the finite set of points, , approximates in a metric sense, i.e., space, each point of is within distance of some point in and vice versa. In other words, is the Hausdorff distance between and : . In a given application, this assumption must be justified by physical or numerical arguments; we give a number of different examples in Chapter 4. A small value of implies is a good approximation to . Typically, depends on the number of points in , as we demonstrated with examples in Chapter 2, and more points naturally result in a smaller value of . In some applications, could represent the magnitude of noise present in the data, or a discretization error. To give the compact space, , and its finite approximation, , comparable topological structure, we form their closed -neighborhoods:

W:Î

f? L ¨ l

G Q 9 and G Q ] Roughly speaking, and are within , their -neighborhoods should have similar .since properties for We make this precise in Section 3.3.4 using an inverse system framework converges to the homology of as â# % in from shape theory. Since the homology of an inverse limit sense, we hope to extrapolate the homology of from the homology of the -neighborhoods of the data, , for . Of course, extrapolation is never guaranteed to give

the correct answer, and we are always restricted by the inherent accuracy of the finite data.

3.3.1

The inverse system of ! -neighborhoods

G @ 9]

We begin by describing the inverse system of -neighborhoods for the underlying compact set . The spaces for the inverse system are the closed -neighborhoods, , 54

Figure 3.11: A segment of Antoine’s necklace. This picture is from Eric Weisstein’s World of Mathematics [85]. http://mathworld.wolfram.com/AntoinesNecklace.html

8 % k ÙTÔ i Ù Ö Ñ Y Ñ #

Ñ ?

Ùf #°% Ö Y #

indexed by #" . Since we are interested in the limit as , the order relation is an when . Since when , the bonding morphisms are inverted one: simply inclusion maps , which easily satisfy the conditions (3.5) and (3.6). The inverse limit space is homeomorphic to and the projections are again inclusion maps. Each -neighborhood of is an ANR and therefore has the homotopy type of a finite polyhedron. This means the th simplicial homology groups are well-defined for . The inclusion maps induce homomorphisms on the homology groups in the for these induced standard way described in Section 3.2.2. We write homomorphisms. The homology groups, together with the inclusion-induced homomorphisms, . Results from shape theory show that the yield inverse systems of groups, denoted by ˇ inverse limit of is isomorphic to the th Cech homology group ; for details see [52, p.121]. , For computational purposes, we typically use a cofinal sequence of -neighborhoods, where is a decreasing sequence of -values with .

1 Ö Ñ Y Ñ #

r 3 Ý

; Ui x Uh>>

3.3.2

Persistent Betti numbers

)t b¡ r L: 3

%

r 3 L: Ö Ñ Ï Y r 3 L Ñc #Ãr 3 L) r 3 Ý 1 r ü 3 L O #À%

3 L ± L : L 3 # 3 Ó #% ,

We would like to quantify the structure of by looking at the Betti numbers as . In general though, it is not the case that as i.e.,

Ó#°%

·`à L :& % ß ·æ$¿à 8 ) ß $¿8 3 3

¼

¼ 8

As an example, consider Antoine’s necklace, Figure 3.11. This Cantor set, , is constructed , by taking the intersection of a sequence of nested and linked solid tori. The zeroth level, is a single solid torus, which is homotopy equivalent to a circle and therefore has first Betti number, . A chain of ' linked solid tori is embedded in to give the first level approximation, . This process is repeated inside each one of these tori so that the th term in the sequence, , consists of ' links. Antoine’s necklace is then . It is possible to choose a sequence of -values so that . We then have ' )( . However, . The problem stems from ignoring the role of the since the limit is a Cantor set, bonding morphisms. We now describe how to incorporate this information. The essential point is that we only want to count holes in an -neighborhood that are generated by a hole in the underlying space, not holes that are caused after fattening the set to its

; ¼ 8 ; + ¼O ¼ ¼

O

¼ 8

z O ; ¼ ¼ ý % ¼

÷~¼ O O ¼ ; ¼ O u #

55

Ä

Xλ

Xε

Figure 3.12: An illustration of the difference between persistent and non-persistent holes. The underlying space, , is made of an ’O’ and a ’C’ that touch at a point. In the larger neighborhood on the right, the hole from the ’O’ has a preimage in the smaller -neighborhood. . The hole from the ’C’ is not the image of a hole in , or

Ñ

Ù

-neighborhood. This idea is illustrated in Figure 3.12. The smaller Ù -neighborhood captures ; the homology of the underlying space, which has a single hole (i.e., + ). For the larger -value, the neighborhood has two holes; one corresponds to a genuine hole inL,) and the other that are the does not. In terms of homology, we want to count only those cycles in r 3 LÑc , for all Ù "i . image of some cycle in r Ù 3 - Q r L) persists More formally, for "i , we say that an equivalence class of cyclesQ , w LÑb if it is in the image of the bonding homomorphism , w - Ö Ñ Ï r 3 LÑc . The in r 3 3 that persist in Ñ is therefore just the rank of the image subgroup: number of holes in Ñ L dT)t b¡¢ Ñ LÑc (3.8) 3 Ö Ï r 3 Since we want to know the topology of , we are also interested in the quantity 8 (3.9) 3 L dT t b¡254 ÿ O O ß $+·`à * 3ß 254 E HiO + ; s D >ß 254ÐE (3.16) ß3254 + s ß3254D s ß>254ÐD ¾

Not surprisingly, this number is the same as the similarity dimension. 57

3.3.4

Finite approximations

L ¨ u

We now analyze the finite approximation, , and its relationship to the compact set . As described in the introduction to this section, we assume that the Hausdorff distance, @? and ? . We keep the notation, , i.e., that , for the inverse systems of -neighborhoods of and write for the system of neighborhoods of . The inclusions of into ? and into @? give a formal relationship between these inverse systems and allow us to derive bounds on the persistent Betti numbers of in terms of -persistent Betti numbers of . @? and ? , we have the following inclusion maps for any Since ,

?q

U Ù ~ 5 D

h?

?f

\?

L) Ö Ñ

) ÕÑ

Ä Y # n

?

and

ì Y ¬ # n t?

%

With appropriate bonding morphisms, this gives us a commuting diagram for any :

Xλ

Sλ−ρ

Ù UB

and

Xε

Sλ+ρ

Sε−ρ

Sε+ρ (3.17)

L) has a preimage in r LÑc , there must be an This diagram implies that if an element of r 3 a hole that persists in ? with a preimage in r 3 Ñ ? . Similarly, if ? has element of r [ a corresponding hole in3 n that persists in Ñ .n In terms 3 be Ñ ? , there must of Betti numbers, [ Ù Ù we have that for U~ and 6U Ñ ? 5\ D ,u Ñ L:ý Ñ ? 3[ n? 3 3n [? (3.18) Ñ ) We can swap the roles of and in (3.17) to obtain analogous bounds on 3 : Ñ ? L Ð Ñ ý Ñ ? L 3[ 3n (3.19) n? 3 [? ÙV? , we can only hope to get information about from holes in that Since maps into A persist in @? . Setting in (3.19), we for 6U~ , 8 LhaveÐthat ? ) (3.20) ? n 3 8 L B. 3 Thus, if 3 ? : C0 as â# % and is small enough, we should see at least that order of growth in . 3 In one sense, is the optimal resolution for coarse-graining the data to estimate the topolog ical structure of the underlying space. This could imply that ? is the best approximation 3 Lv , which would render our multiresolution approach redundant. to For very simple spaces 3 this may be the case. However, we are interested in more complicated settings. In general, the inverse system of -neighborhoods and the persistent Betti numbers offer two advantages. First, the cutoff resolution is typically not known in advance and must be determined from the data; examining at many -values helps us estimate . Second, computing the Betti numbers 3 @ ? at a single resolution does not distinguish between holes due to the topology of and ? holes in ? induced by the geometry. The sequence of persistent Betti numbers ? from which to extrapolate topological information 3 for about give a more accurate basis than @ 3 (c.f., the example of Antoine’s necklace). Such an extrapolation must always be given with respect to the cutoff resolution , however, since it is possible that the topological properties of change at resolutions below . 58

3.3.5

Computing persistent Betti numbers

3? :f t b¡D "Ï r 3 .

Õ Ï

? @? We want to compute the number of holes in that persist in @? , that is, This definition is in terms of the rank of a linear operator, ? , on the two quotient spaces, @? and . Below, we derive an equivalent formula that is given in terms of the linear map on the chain groups, ? . Recall that the -simplices form a basis for @? the th chain group, so it is easy to write down a matrix representation for this map. To simplify notation a little, we write:

r 3 1

r 3 )

Í K # K ) Y 3 3 K

K ?

Ï o ?o E p? p r ? o ? s p ? r o sp

K ): 3

for the chain group for

Í

for the chain map

1

K A ? : 3 ? Í

for the homology homomorphism

Õ ? Ï

for the cycle groups for the boundary groups, and and for the homology groups.

Ï r ? r

y

F? Our starting point is the image group, . One of the fundamental theorems about homomorphisms of groups [34] is that the image of a group, , under a homomorphism, G , is 5H isomorphic to the quotient, G . Thus,

y ¡ > s

Ï rF? z r ? s ¡5H>@ Ï= ¡5H>I Ï , contains all elements of F The kernel, r ? that map to the zero element of r . That is, ¡5H>J Ï , w - Q r ? , Í w - , % - Q r But a cycle is homologous to % if and only if it is in the boundary group, so ¡ H>J Ï z o ?DK Í [ ; p 5

Now, recall from Section 3.2.2 that a homomorphism induced by a simplicial map commutes with the boundary operator. It follows that cycles map to cycles and boundaries to boundaries; i.e.,

Í oL? f ? o and Í p? ?\p ; ; p ? ? Í [ p ) , so p?Z?hoL? K Í [ p : . It follows that the The second expression implies that quotient ; )- z ; )z 5H>J z ¡ Ï r ? r ? s Ï ,Lo ? s p ? s ,Lo ? K Í [ p o ? s ,oL? K Í [ p E

Finally, this means that the persistent Betti number is

3 dT)t Î¡ , o ? - H )t b¡ ,o ?

?DK

; Í[ p

(3.21)

In terms of actual computation, then, we can find the persistent Betti number from the dimensions of null spaces and ranges of matrix representations for the boundary operator and the inclusion maps. Specifically, is the dimension of the null space of the matrix for

)t b¡ o 3

59

X 3 Y K 3 M # spaces: :Í , ß`ß X

K

; . For the second term in (3.21) we need to find the intersection of two 3=[- , and p , which is the range of X ; K ; # K . Finding 3 3>n Y 3>n 3 the intersection of two linear subspaces requires some information about their bases, and is 04

therefore a more difficult problem than computing dimensions. The standard algorithms for time, where is the largest dimension of the determining intersections typically run in N matrices involved [29]. As we discuss in the following section, we have not yet implemented these algorithms. Instead, for the examples of Section 3.5 we use pre-existing software for computing the regular Betti numbers from subsets of a triangulation described in Section 3.4.1. These examples illustrate why the regular Betti numbers are insufficient for extrapolation.

3.4

0

Implementation

Our goal is to take a finite cloud of points as input, and compute persistent Betti numbers as a function of a resolution parameter. There are four parts to the overall process:

1. For a sequence of -values, generate simplicial complexes that triangulate the -neighborhoods of the data.

2. Estimate the cutoff resolution .

3? , for . 4. If appropriate, compute the growth rate, - . 3 3. Compute persistent Betti numbers,

This is essentially the same approach we used in Chapter 2 in computing the number of connected components. For that case, however, a simplicial complex is unnecessary — all the information about -connected components of is encoded in the Euclidean minimal spanning tree of the data. In the present context, steps 1 and 3 are the most computationally intensive. The two steps are also closely related; efficient algorithms for computing Betti numbers make explicit use of the data structures involved in building the complexes. Thus, given an -neighborhood, , the first problem is to generate a simplicial complex, , whose underlying space is at least homotopy equivalent to . Since we are interested in the inverse system of -neighborhoods, we . In order to have inclusion maps need simplicial complexes for a sequence of numbers PO that are well defined, we need to be either a subcomplex or a subdivision of when " . A subcomplex approach to this problem due to Edelsbrunner et al. [18], is described in detail in Section 3.4.1. This group has also developed a fast incremental algorithm for computing or . We use their implementations for the examples in Betti numbers of complexes in Section 3.5. We use the same criterion for Step 2 that we derived in Chapter 2 for approximations to perfect spaces. Since a perfect space has no isolated points, we estimate the cutoff resolution as the largest value of for which has at least one isolated point. This underestimates , but the examples in Section 3.5 show it to be a reasonably good the value for which @?/Q approximation. Recall that isolated points are straightforward to detect numerically — a point is -isolated if the distance from it to every other point in the set is greater than . The computation of growth rates in Step 4 is straightforward once the persistent Betti numbers are found.

F

F

A x

O #¸%

Ae

F

60

O i ê

(a)

(b)

Figure 3.13: (a) The Voronoi diagram. (b) The Delaunay triangulation.

3.4.1

Alpha shapes

As we mentioned above, alpha shapes are a sequence of simplicial complexes derived from the Voronoi diagram and the Delaunay triangulation (DT) — fundamental constructions in computational geometry. Alpha shapes were first defined by Edelsbrunner et al. for a finite set of points in the plane [17] and later generalized to higher dimensions [18]. The Voronoi Diagram and the Delaunay Triangulation The Voronoi diagram of a finite set of points represents the region of influence of each point; an example is given in Figure 3.13(a). The graph has a wide range of applications in computational geometry (finding nearest neighbors, for example), in biology (assigning areas of influence to individual trees), and modelling crystal growth, to name a few. The Delaunay complex is closely related to the Voronoi diagram, and is equally important in computational geometry. We give the basic definitions of these graphs here. For more details about their properties, and algorithms for computing them, see [67] , we define the Voronoi cell, , of to be the Given a finite set of points, set of all points closer to than any other point in :

ö Ö

W? Aþ

Ö ö Ö G Q A þ 9 Ö Ð\] g: ö Ö Ð A þ

ö Ö ö g

for

Ö Q

Q HVÖ

(3.22)

The collection of Voronoi cells is the Voronoi diagram. The cells are closed convex regions, +W % V K and the union R 5SUT . If , then their intersection is a subset of the hyperplane that is perpendicular to, and bisects the edge ; the interiors of the Voronoi cells are disjoint. The Delaunay complex, X , is defined to be the geometric dual of the Voronoi diagram; see Figure 3.13(b) for an example in the plane. The geometric dual is a similar construction to the nerve of a cover: if the cells have non-empty intersection, then the convex hull of , is their dual. The nerve of , however, is the -simplex

í

Ö 8 > > Ö 3

Ö

ö Ö =8 : > > ö Ö 3 8 : ö Ö > > ö Ö 3 61

1

d

d c

c a a

b b

' ) (= * :

are not in general position because they simultaneously lie Figure 3.14: The points, on a circle. The four Voronoi cells intersect at the center of this circle which means the Delaunay complex contains the quadrilateral . The nerve of the Voronoi complex contains the tetrahedron .

, ' ( * t-

' (*

, Ö 8 > > Ö 3 - — the difference is illustrated in Figure 3.14. In this example, four cells intersect at a point, so the geometric dual is a quadrilateral, whereas the nerve contains a tetrahedron. This situation is a degenerate one; it only occurs when four points simultaneously lie on a circle. To exclude this degeneracy, the points in are typically assumed to be in general position. This imposes the condition that no points in lie on the boundary of a -sphere (we discuss the practicality of this assumption later). When satisfies the general position condition, a point in can belong to at most different Voronoi cells and it follows that the Delaunay complex is a simplicial complex. This is why the Delaunay complex is usually referred to as the Delaunay triangulation. Note that the underlying space of the Delaunay complex, X , is the convex hull of . A consequence of the above discussion is that when the points are in general position, the Delaunay triangulation is a geometric realization of the nerve of the Voronoi diagram. This correspondence is used to define the alpha complexes.

57D 5T+

AGþ

Alpha complexes We now describe how Edelsbrunner introduces the resolution parameter Y into the Voronoi diagram and Delaunay triangulation. This operation generates a sequence of simplicial complexes that triangulate the Y -neighborhoods of the data set . The parameter Y is exactly the same as our parameter ; we switch to Y in this section to be consistent with the original papers. The closed Y -neighborhood @Z is just the union of all closed balls of radius Y with centers in :

Z \[ p Z Ö : í5SUT

where

p Z Ö d ] Ö GY

These Y -balls therefore form a cover of Z . We could take the nerve of this cover, but the corresponding simplicial complex is likely to have simplices of a higher dimension than the 62

Figure 3.15: The union of alpha balls is partitioned by the Voronoi cells (in blue). The alpha complex (in orange) is a subset of the Delaunay triangulation that is homotopy equivalent to the Y -neighborhood.

ambient space of . A better approach is to take the intersection of the Y -balls with the Voronoi cells:

ö Z Ö ö Ö K p Z Ö

This gives us a cover of Z by closed convex regions, called the alpha diagram:

Z

[

í5SUT

ö Z Ö

If the points of are in general position, then the nerve of the alpha diagram is a subset of the Delaunay triangulation, called the alpha complex, Z . This subset relation holds since there is &] % V K K a -simplex in the alpha complex only if Z . This implies Z ^] % V K K , so the simplex is in X . See Figure 3.15 for an illustration of the above constructions. We now give some properties of the alpha complexes. The most important for our purposes is that the underlying space of the alpha complex, _Z , is homotopy equivalent to the Y -neighborhood, Z . This means the two spaces have isomorphic homology groups (recall Theorem 7), and therefore the same Betti numbers. A direct proof of this property is given by Edelsbrunner in [16]. The construction of the alpha complexes implies a sequential ordering of simplices that is very useful in developing efficient algorithms. First, by the same argument that Z `X , we see that if Ya"]Y , then Z Z5b . Also, since the Delaunay triangulation is finite, there are only a finite number of distinct alpha complexes:

F

8 }>}>} ö Ö 3 8, Ö 9 >> Ö - ö Ö 3

1 8 >} }>} , Ö 89 >> Ö 3 ö Ö ö Ö3

F

Ê

F ?

F ?iF

I F Zdc F Z ï > > F V

63

Zde

X

8 % ì

8 fV F 2 8 2 ;= cî >ï > 2 F H F V

These are ordered by increasing Y , and by convention, Y and . This ordering of the alpha complexes induces an ordering of the simplices in X . If hg b is the g b ordered list of simplices in Zdi , then g b are the simplices in Z i Zdi , listed in order of increasing dimension. This sequential ordering of simplices is called a filter and the corresponding sequence of complexes is a filtration. The algorithm for determining this ordering is based on computing the radius of the circumsphere of each simplex in X ; see [18] for details. Delfinado and Edelsbrunner’s algorithm [11] for computing the Betti numbers of a simplicial complex relies on this sequential ordering of simplices. The theoretical underpinnings of their algorithm come from central results in homology theory such as the Mayer-Vietoris sequence and Poincaré duality. The Betti numbers are computed incrementally as each simplex is added to the complex. This process depends on a test to determine whether the new simplex belongs to a -cycle of the new complex. There are efficient algorithms for testing 1-cycles, and homology-cohomology duality theorems transform the -cycles into 1-cocycles that are equally easy to test for. However, there is no test for other -cycles, so Delfinado and or . Edelsbrunner’s algorithm applies only to subcomplexes of

F

2 n ; > > 2 nbê

1

i H + 1 A x Ae

Remarks

Ax

The NCSA ftp site provides software that implements all of the above alpha shape constructions and [1]. We generate the Betti number data for the examples in Section 3.5 using this in software. The NCSA alpha shape software requires the input data, , to be in integer format. This reduces some of the standard problems with data structures in computational geometry. The implementation uses a technique of simulated perturbations to cope with any degeneracies, so the restriction that the points of be in general position is removed [18]. Complexity bounds for the algorithms involved are at worst quadratic in the number of points, , for both time and storage. The Delaunay triangulation of a set of points in the plane can be found in N >254 time and N storage. For a subset of , the NCSA software uses an incremental flip algorithm which builds the complex in N time and storage. The simplices of the Delaunay triangulation are then sorted by the radius of their circumsphere; this process requires time, where is the number of simplices. The incremental algorithm for computN >254 ing the Betti number takes N jY to find all Betti numbers for all the alpha complexes of a subset of , and is slightly faster for subsets of . The function Y is the inverse of Ackermann’s function (which is defined by repeated exponentiation) and it therefore has extremely slow growth. This time estimate is common in algorithms involving set operations; see [9]. The NCSA alpha shape implementation goes a long way towards carrying out our desired program. It is not clear how to easily incorporate the computation of persistent Betti numbers. It is possible that an incremental algorithm for finding the persistent Betti numbers exists, as is the case for the regular Betti numbers. However, finding the persistent Betti numbers requires some explicit information about the cycles and boundaries, and the alpha shape algorithm does not generate or record this information. This problem clearly requires further work. Another drawback of the alpha shape algorithm is a large degree of redundancy in the triangulations for the type of data we are interested in. The Delaunay triangulation builds simplices that involve every single data point, and this generates a much finer complex than is neces. This redundancy is likely to occur whenever we construct as sary for resolutions 3O a subcomplex of for " . In Section 3.4.3, we discuss a possible alternative approach to building complexes on multiple scales based on subdivisions. In the following section, we outline some other fast algorithms for computing Betti numbers of a single complex.

Ae

0¿ß Ð0

hß Ð AG

0

0

0 x A

Ax

F

F §U O B ê

64

3.4.2

Other algorithms for computational homology

The development of fast algorithms for computational homology is an active area of research. Many different approaches exist for various specific applications, mostly for complexes in . To the best of our knowledge, the alpha shape algorithm is the only one that gives information at multiple resolutions. In this section, we give a very brief overview of some of the recent literature. Dey and Guha [14] describe an efficient algorithm for computing Betti numbers and geometric representations of the non-bounding cycles from 3-manifolds in . They then generalize this to include any simplicial complex in through a process of thickening. They are interested in applications to solid modelling, molecular biology and computer-aided manufacturing. This algorithm finds the Betti numbers with time and storage complexity that are linear in the size of the complex, and the generators for the first and second homology groups with a cost of order N in time, where is the maximum genus of the boundary surfaces. In [37] Kalies et al. develop an algorithm for computing the Betti numbers from cubical cell complexes. Their approach is based on a local reduction of the complex to simpler form. In this reduction is a homotopy equivalence and the result of the reductions is a minimal complex consisting of loops. The number of loops gives the first Betti number. In higher dimensions, the reduced complex may not be minimal and the reduction steps are no longer simple homotopies. operations, Kalies et al. conjecture that the Betti numbers can be computed with N 3254 where is the number of cubes in the complex. This code was developed for applications in dynamical systems — specifically, for computing the Conley index of isolating neighborhoods of invariant sets for flows generated by ODEs; see [58] for an example. The generation of cubical covers of such sets is part of the GAIO (global analysis of invariant objects) project [12] and is an efficient way to represent successive approximations to attractors or unstable manifolds, for example. Finally, we describe an approach due to Friedman [26] which computes the Betti numbers of arbitrary simplicial complexes in . This method is based on a fundamental result of Hodge theory which says that the homology groups (with real or rational coefficients) are isomorphic to the null space of a Laplacian operator on the chain complexes. The Laplacian, k is formed from the boundary operators and their transposes:

Ae

Ae

Ae

0 x ¥

¥

A x

0¿ß 0

0

Aþ

k

1

3

K K 3ZY 3 # 3

X ;X Ï ; ~ Ï 3>n 3 n 5 X 3 X 3

3

Hodge theory implies that the th Betti number is the dimension of the null space of k . For simplicial complexes, Friedman constructs a matrix representation of the Laplacian directly, is positive semidefinite without using the boundary operators explicitly. The matrix for k and symmetric, and typically quite sparse, so it is amenable to fast algorithms for computing ranks and null spaces. To compute the dimension of the null space, Friedman makes a careful application of the power method for finding eigenvalues and eigenvectors. The power method can be inaccurate for large matrices with repeated eigenvalues; Friedman’s method attempts to rigorously verify the correctness of the computed Betti numbers. This algorithm holds for simplicial complexes of any finite dimension and the running time is approximately quadratic in the number of simplices. See [26] for detailed complexity bounds.

3

3.4.3

A better way?

The two main drawbacks to the alpha shape implementation are (1) that the simplicial complexes are finer than they need to be, and (2) the fast algorithm for computing Betti numbers 65

Ax

A

only holds in and . The excessive number of simplices in the alpha complexes means that computing the persistent Betti numbers from the formula in Section 3.3.5 is unrealistic. Any subcomplex approach to generating a sequence of complexes at different resolutions will encounter the same problem of having an unnecessarily large number of simplices at coarser resolutions. The remainder of this section sketches an alternative approach to generating complexes at multiple resolutions based on subdivisions. The advantage of subdivisions is that complexes at coarse resolutions have fewer simplices than those at fine resolutions. Subdivision of simplicial complexes is a common process, both in homology theory and in computational geometry. For applications to general scattered point data, however, cubical complexes are a more natural construction. The cubical complexes we have in mind are essentially regular meshes in . The level-0 complex is a single cube Q , with side . This cube is then subdivided into equal cubes of side , and the level-1 complex, , consists of those cubes that contain a point from . This process is repeated on the cubes of to get , and so on. The sequence of complexes can be organized into a tree structure; such a multiresolution cubical complex is usually referred to as a quadtree in , and an octree .

AGC

* F; ; F Ae

DC

Fx

F 8

*sD

Ax

This sequence of cubical complexes does not have as close a correspondence with -neighborhoods as the alpha complexes do, so we need to slightly modify the inverse systems of Section 3.3.1. , it is possible to construct a sequence of cubical complexes as Given a compact space , of such a cubical complex still we described in the previous section. The underlying space, in the Hausdorff metric; and for , has the homotopy type of a finite polyhedron; is a refinement of . These properties should be enough to show that the resulting inverse ˇ system of homology groups is isomorphic to Cech homology.

?iAGC

FO

F O #

Fê

F O

Ä ì

Another substantial difference between the subdivision and subcomplex approaches is in the chain maps induced by inclusion on the underlying spaces. For the sequence of alpha complexes, these chain maps are one-to-one inclusion maps on the complexes. For the cubical . The chain maps are no complexes, if is a subdivision of , we still have that longer one-to-one, however, since each cube in is subdivided into cubes, and all of these are possibly in . The inclusion-induced chain map therefore maps all of these cubes onto the larger one. This has implications for the numerical implementation of computing persistent Betti numbers.

FO

FO

Fê

Fê

F O ? F ê D C

Í

Of all the fast Betti number computations described earlier, it seems that Friedman’s approach using the Laplacian has the most potential for adaptation to our proposed subcomplex approach. The isomorphism between homology groups and the null space of a Laplacian matrix suggests that in order to compute persistent Betti numbers, we need only find intersections of null spaces of the appropriate Laplacian matrices. This is an easier numerical linear algebra problem than the one implied by the formula in Section 3.3.5. There are still problems to be worked through here. In particular, how to build the Laplacian matrix from cubical complexes (rather than simplicial ones), and how to incorporate the inclusion-induced chain maps between the Laplacian null spaces. The above ideas are just one possible direction for the development of more efficient algorithms to compute persistent Betti numbers from data. This an open problem that needs a substantial amount of further work. 66

8

;

Table 3.1: Computed values of - and - . The results are slopes of linear least-squares fits to the data; error bounds are estimated by varying the scaling range. The exact values for the 3254 l