
INFORMS Journal on Computing Vol. 8, No. 3, Summer 1996

© 1996 INFORMS

Use of Representative Operation Counts in Computational Testing of Algorithms

RAVINDRA K. AHUJA
JAMES B. ORLIN

Department of Industrial and Management Engineering, Indian Institute of Technology, Kanpur 208 016, India; Email: [email protected]

Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139; Email: [email protected] (Received: July 1992; revised: August 1993; accepted: July 1995)

In the mathematical programming literature, researchers have conducted a large number of computational studies to assess the empirical behavior of various algorithms and have utilized CPU time as the primary measure of performance. CPU time has the following drawbacks as a measure of an algorithm's performance: it is implementation dependent, hard to replicate, and limited in the insight it provides into an algorithm's behavior. In this paper, we illustrate the notion of representative operation counts that can complement the conventional CPU time analysis and can help us (i) to identify the asymptotic bottleneck operations in an algorithm, (ii) to estimate an algorithm's running time for different problem sizes, and (iii) to obtain a fairer comparison of several algorithms. These concepts are easily incorporated into empirical studies and often yield valuable insights into an algorithm's behavior.

  The purpose of mathematical programming is insight, not numbers. -- Geoffrion

  Hundreds of books catalog optimization algorithms of one kind or another. In addition, hundreds of research papers address applications of optimization algorithms. And yet, experimentation in optimization is a topic that has not received nearly the attention it merits. -- Golden, Assad, Wasil, and Baker

A primary focus of the field of mathematical programming is the development of the most "efficient" algorithm for a variety of optimization problems. The notion of efficiency involves all the various computing resources needed for executing an algorithm. However, time is often the dominant computing resource in practice, and computational time is often used as the primary measure for assessing algorithmic efficiency. One important way to measure the computational time of an algorithm is through its worst-case analysis. Worst-case analysis provides upper bounds on the number of steps that a given algorithm can take on any problem instance. For a variety of reasons, worst-case analysis of algorithms is a popular method for judging algorithmic efficiency, and this approach has stimulated considerable research. However, worst-case analysis can be overly pessimistic, and exceedingly rare "pathological" instances often determine the performance of an algorithm. Since the empirical behavior of an algorithm is often much better than suggested by its worst-case analysis, researchers typically rely upon the empirical testing of an algorithm to assess its performance in practice.

In the operations research literature, researchers have conducted a large number of computational studies to assess the empirical behavior of various algorithms and to determine the best algorithm in practice. A typical study tests more than one algorithm for a specific optimization problem and generally consists of the following steps: (i) the writing of a computer program (often in FORTRAN) for each algorithm to be tested; (ii) the use of pseudo-random problem generators to generate random problem instances with selected combinations of input size parameters (for example, the number of variables and constraints) and/or the use of some real-life instances of the problem; (iii) the measurement of the CPU times of the different computer programs on these problem instances; and (iv) an assessment of which (if any) of the algorithms takes the least amount of CPU time and is thus the "winner." If different algorithms are faster for different input size parameters, the study reports this fact as well.

We suggest here that the existing literature on computational testing over-relies on CPU time as the primary measure of performance. CPU time depends upon subtle details of the computational environment and the test problems, such as: (i) the chosen programming language, compiler, and computer; (ii) the implementation style and skill of the programmer; (iii) combinations of input size parameters; and (iv) the particular programming environment (for example, the use of the computer system by other users). Because of these multiple sources of variability, CPU times are often difficult to replicate, which is a serious drawback for scholarly research. Also, because CPU time is an aggregate measure of empirical performance, it is more difficult to obtain detailed insight into an algorithm's behavior using CPU time measurements only. In particular, a typical CPU time analysis does not identify the algorithm's bottleneck operations. Identifying the bottleneck operations of an algorithm can provide useful guidelines for where to direct future efforts to understand and subsequently improve an algorithm.

Subject classifications: Analysis of algorithms: Analyzing empirical behavior of algorithms. Networks/graphs: Testing of network simplex algorithm.


The spirit of worst-case analysis is to identify theoretical bottlenecks in the performance of any algorithm and to provide upper bounds on the computation counts of these bottleneck operations as a measure of the algorithm's overall behavior. Borrowing this point of view for computational testing, we might attempt to measure the empirical performance of an algorithm (or its computer implementation) by counting the number of times the algorithm executes each of these bottleneck operations while solving each instance of the problem. That is, we would conduct computer experiments to obtain an actual count of the bottleneck operations instead of providing a theoretical upper bound on this number. This approach suggests that in analyzing the empirical behavior of an algorithm, we need not count the number of times it executes each of the (possibly, thousands of) lines of code, but instead we can focus on a relatively small number of lines that are "summary measures" of the algorithm's empirical behavior. Even for relatively complex algorithms, we have generally found that one needs to keep track of the computation counts of at most three or four operations or lines of code. For example, we show in Section 3 that in the network simplex algorithm for the minimum cost flow problem, the following three operations dominate all other operations: (i) scanning arcs in identifying the arc to enter the basis; (ii) scanning arcs in the pivot cycles; and (iii) examining nodes in the subtrees whose potentials change. Even though any implementation of the network simplex algorithm might contain hundreds of lines of code, we need to keep track of only three fundamental operations in order to identify the bottleneck operations and to estimate the running time within a constant factor.

We will soon formalize this notion of "representative operation counts" in computational testing; however, let us first summarize some of the advantages of keeping track of computation counts, in addition to CPU time, over the more traditional approach that relies only on a CPU time analysis.

1. Representative operation counts facilitate the identification of the asymptotic bottleneck operations of the algorithm, that is, the operations that progressively take a larger share of the computational time as the problem size increases. Improvements in asymptotic bottleneck operations have the maximum impact on the running time of the algorithm for sufficiently large problem sizes.

2. Representative operation counts are data that can provide additional guidance and insight into the comparison of algorithms that are run on different computers, and even permit us to compare algorithms implemented using different computer languages.

3. Representative operation counts aid in the determination of lower and upper bounds on the asymptotic growth rate of the computation time as a function of the problem size.

4. The total of the operation counts of the representative operations is within a constant factor of the CPU time. Furthermore, one can use statistical methodologies to estimate the CPU time on a computer as a linear function of the representative operation counts. We refer to this estimate of the CPU time as the virtual CPU time. The virtual CPU time permits researchers to carry out experiments on different computers, but estimate the running times as if all of the experiments had been carried out on a common computer.

This type of asymptotic empirical analysis complements worst-case analysis, which is so popular in the operations research literature. Representative operation counts are data that can help to identify the actual empirical behavior of the algorithm as the problem size grows larger. Just as worst-case analysis ignores constant factors in the running time, the empirical analyses described here using representative operation counts ignore constant factors in the running time and focus on the dominant term in the computations. This paper describes the underlying ideas for this type of empirical analysis and illustrates it using a variant of the well-known network simplex algorithm for solving the minimum cost flow problem.

1. Related Research

This paper focuses on using representative operation counts that aid in analyzing the empirical behavior of an algorithm. The idea of using representative operation counts in the empirical analysis of algorithms is quite old, and probably dates back to the origins of computational experiments. Nevertheless, the literature on computational testing for mathematical programming has traditionally used CPU time as its primary measure of computational effort, and has used representative operation counts in a rather limited way. For example, most computational experiments on the network simplex algorithm have kept track of the number of pivots, but have not kept track in any systematic way of the work per pivot. Although there is a long history of work involving computation counts, this paper is most inspired by the thesis of McGeoch.[24] McGeoch articulates in the following quote why algorithms, and not programs, should be the primary object of study:

  By maintaining abstraction in the cost function (number of comparisons, say, rather than running time), one obtains results that are implementation independent and therefore useful in a variety of situations. Abstraction can produce deeper understanding of underlying mechanisms and discovery of algorithmic paradigms. Mathematical models of algorithms and input allow consistent and well-defined manipulation of parameters, provide a standard vocabulary for communicating results, and promote generalization of algorithms and analysis techniques.

McGeoch[24] uses computation counts instead of CPU times. We propose to follow more the model of Knuth, who uses computation counts in conjunction with CPU times. For example, Knuth describes an analysis of Quicksort in which he predicts, based on first principles, that the total running time of Quicksort applied to an instance I is 24A + 11B + 4C + 3D + 8E + 7N + 9S, where A is the number of partitioning stages in instance I, B is the number of exchanges in instance I, and so on.
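Knuth's coefficients refer to a particular machine-level implementation, so they are not reproduced here. The fragment below is only our own illustration, in Python, of collecting analogous counts (partitioning stages, comparisons, and exchanges) from a textbook Quicksort; the counter names and the Lomuto partitioning scheme are our choices, not something taken from the paper.

    import random

    counts = {"partitions": 0, "comparisons": 0, "exchanges": 0}

    def quicksort(a, lo=0, hi=None):
        # Sort a[lo:hi+1] in place while counting representative operations.
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        counts["partitions"] += 1              # one partitioning stage
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            counts["comparisons"] += 1         # key comparison against the pivot
            if a[j] <= pivot:
                if i != j:
                    counts["exchanges"] += 1   # element exchange
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        counts["exchanges"] += 1
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)

    data = [random.randrange(10**6) for _ in range(10**4)]
    quicksort(data)
    print(counts)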


Another outstanding paper in computational testing using counts to predict running times of Quicksort is due to Sedgewick.[27] Another example is the paper by Shier[28] analyzing Floyd's algorithm. These detailed predictions of the CPU time from computation counts are subtle. They are concerned with predicting the CPU time precisely as a function of the computation counts. As such, these papers go substantially beyond the scope of what is presented here. However, they are also difficult to apply to algorithms developed today. The analysis assumes that one knows precise characteristics of the computer program (for example, its expression in assembly or machine language) as well as detailed characteristics of the computer. Even if one does know these features of the computer and the computer program, this type of prediction is no longer valid because of the complex caching schemes currently in use.

Here we attempt to achieve a balance between the approaches of McGeoch and Knuth. The notion of representative operation counts allows us to estimate the running time of an algorithm within a constant factor and, at the same time, allows us to restrict attention to a fairly small subset of operation counts. For example, to evaluate the Quicksort algorithm, it would suffice to keep track of only one computation count, that is, the number of comparisons. Since there is typically a very small set of representative operations, even for relatively complex algorithms, an analysis using the counts of these representative operations is not burdensome for researchers. Moreover, it often leads to simpler analyses and to clearer insights into the asymptotic behavior.

We do not claim originality for the idea of using computation counts in empirical analyses. For example, Johnson, Bentley,[2] and others have used computation counts to analyze the empirical behavior of algorithms. A good exposition of counting representative operations appears in the thesis by Lee. Nevertheless, the literature on computational testing is still dominated by the CPU time analysis. This paper serves three purposes: (i) it attempts to formalize the usage of computational counting while simultaneously popularizing the notion of representative operation counts; (ii) it discusses methods for identifying asymptotic bottleneck operations and comparing two algorithms; and (iii) it presents a detailed empirical analysis of the well-known network simplex algorithm.

This paper focuses on empirical running time as measured both by representative operation counts and by CPU times. It is beyond the scope of this paper to treat other important measures of performance for an algorithm such as: (i) ease of implementation, (ii) robustness, (iii) reliability, and (iv) accuracy of the solutions. We refer the readers of this paper to the papers by Crowder and Saunders,[11] Hoffman and Jackson,[16] and Greenberg,[14] each of which discusses one or more of these measures of performance.

Reporting computational results is an integral part of any empirical study of algorithms. The most comprehensive references on the reporting of computational experiments in mathematical programming are the papers by Crowder, Dembo, and Mulvey.[9, 10] They provide guidelines for what should be reported in a research paper, as well as provide help on how to conduct appropriate computational experiments.


Jackson and Mulvey[18] have summarized the reporting of computational experiments within the mathematical programming literature, largely detailing how poor the reporting had been up to that time. More recently, Jackson, Boggs, Nash, and Powell[17] have provided updated guidelines for conducting and reporting computational experiments.

Often, statistical methodologies can provide analysis and insight that are unavailable through other means. Some papers on computational testing that provide details on statistical methodologies are due to Bland and Jensen,[4] and Golden, Assad, Wasil, and Baker.[13] (The paper by Golden, Assad, Wasil, and Baker[13] nicely illustrates a variety of other important topics such as comparing different algorithms using multiple criteria and the different issues that one faces in developing a system for commercial use.) Moreover, statistical methodologies such as variance reduction can often increase the power of the analysis and reduce the number of experiments needed to obtain conclusive results. We refer the reader to McGeoch[25] for illustrations of variance reduction as well as for other pointers to the literature.

Algorithm animation is another technique for gaining insight into an algorithm. Brown[6] provides a comprehensive treatment of this topic. Algorithm animation techniques view the progress of an algorithm on a single instance as a sequence of snapshots. For example, consider the greedy algorithm for finding a minimum spanning tree joining n points in the plane. This algorithm creates the minimum spanning tree by adding one arc at a time, maintaining a forest at each intermediate stage until it ultimately creates the minimum cost spanning tree. Using animation, we could construct a sequence of figures showing the forest at intermediate stages. Viewing a motion picture consisting of a series of these snapshots might reveal additional insights into the algorithm. Alternatively, we could see how the total length of the forest increases as a function of computer time, or we could see how the sizes of the components increase as a function of computer time. The papers by Bentley and Kernighan,[3] and Bentley[2] discuss a simple language that facilitates the construction of these animations.

2. Representative Operation Counts

In order to formalize our notion of counting the operations performed by an algorithm, let us first consider what a computer does in executing a program. Suppose that we have a computer program for solving some problem, and that I is an instance of the problem. The computer program consists of a finite number of lines, say K, of computer code, say a_1, a_2, ..., a_K. Each line of code gives either one instruction or a small number of instructions to the computer. The instruction might tell the computer to carry out an arithmetic operation or to move data from one memory location to another; or it might be a control instruction informing the computer which line of code it should execute next. For convenience, we assume that the program is written so that each line of the code gives O(1) instructions to the computer, and that each instruction requires O(1) units of time to execute.


We assume that the fastest computer operation requires 1 time unit to execute. The assumption that each operation executed by the computer requires a comparable amount of time seems reasonable in practice, because the time requirements of these operations are typically within a small constant factor. The preceding discussion implies that executing any line of code requires O(1) time units, and at least 1 time unit. Therefore, each line of code requires Θ(1) time units, since its execution time is bounded from both above and below by a constant number of units. (If f: Z+ -> R+ and g: Z+ -> R+, we say that f(n) = Θ(g(n)) if c_1 g(n) <= f(n) <= c_2 g(n) for all n > n_0 and some constants c_1 > 0, c_2 > 0, n_0 > 0.) For a given instance I of the problem, let α_k(I), for k = 1 to K, be the number of times that the computer executes line k of this computer program. Let CPU(I) denote the CPU time of the computer program on instance I. The preceding discussion implies the following lemma.

Lemma 1. CPU(I) = Θ(Σ_{k=1}^{K} α_k(I)).

Lemma 1 states that we can estimate the running time of an algorithm to within a constant factor by counting the number of times it executes each line of code. However, counting each line of code is unnecessarily burdensome. As we shall soon see, it really suffices to count a relatively small number of lines of code. For example, consider the following fragment of code:

    for i := 1 to 3 do
    begin
        A(i) := A(i) + 1;
        B(i) := B(i) + 2;
        C(i) := C(i) + 3;
    end;

We need not count the number of times the algorithm executes the statement "B(i) := B(i) + 2," since it executes this statement whenever it executes the statement "A(i) := A(i) + 1." Similarly, we need not count the number of times the algorithm executes the statement "C(i) := C(i) + 3." Moreover, it appears that regardless of the size of the problem solved, the algorithm modifies each of A, B, and C three times during the execution of the "for loop." So, in this case, it would suffice to keep track of just the number of times that the algorithm executes the for loop.

We now formalize the notion we have been suggesting, that is, keeping track of a small number of lines of code. Let S denote a subset of {1, ..., K}, and let α_S denote the set {α_i : i in S}. We say that α_S is a representative set of lines of code of a program if, for some constant c > 0,

    α_i(I) <= c Σ_{k in S} α_k(I)

for every instance I of the problem and for every line a_i of code. In other words, the time spent in executing line a_i is dominated (up to a constant) by the time spent in executing the lines of code in S. We do not require a set of representative lines of code to be minimal. In fact, S = {1, ..., K} is a set of representative lines of code. The following is an immediate consequence of Lemma 1.

Lemma 2. Let α_S be a representative set of lines of code. Then CPU(I) = Θ(Σ_{k in S} α_k(I)).

The methodology requires the user to identify a set of representative lines of code, and during the empirical investigations to keep track of the execution counts of these lines. We often use the expression "a set of representative operations" rather than "a representative set of lines of code," and we refer to "the computation counts for a set of representative lines of code" more briefly as the representative operation counts. In the next section, we discuss various uses of representative operation counts. We point out that in addition to noting representative operation counts for the various instances solved, we might count additional operations that might be helpful in assessing an algorithm's behavior. For example, in the network simplex algorithm, for each instance solved we might count the major iterations that the algorithm performs, such as the number of pivots.

At first glance, one might suspect that determining a small representative set of operations can be difficult. But our experience suggests that it is generally quite easy to determine small representative sets, and often there are a wide number of alternative valid choices. We now give examples of representative sets for several popular algorithms. We emphasize that the representative sets we indicate for the algorithms are not unique. In some cases, we have suggested additional operation counts that might be of value in empirical investigations. Most of the algorithms analyzed here are network flow algorithms because they are simple and illustrative; however, the discussion that follows requires some familiarity with network notation and network flow algorithms. For additional details of these network flow algorithms, we refer the reader to Ahuja, Magnanti, and Orlin.[1]

We now provide some notation. We consider a directed network G = (N, A), where N represents the node set and A denotes the arc set. Let n = |N| and m = |A|. Each arc (i, j) has an associated cost c_ij and an associated capacity u_ij.
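Before turning to the examples, note that the bookkeeping the methodology asks for is easy to add to an implementation. The fragment below is a minimal sketch of our own (the function and counter names are hypothetical placeholders, not code from the paper): a counter is incremented at each representative line, and the counts are reported alongside the measured CPU time.

    import time
    from collections import Counter

    ops = Counter()          # representative operation counts, keyed by name

    def solve(instance):
        # Placeholder for an algorithm; the representative lines simply
        # increment the corresponding counter when they execute.
        for x in instance:
            ops["scan"] += 1             # a representative line of code
            if x % 2 == 0:
                ops["update"] += 1       # another representative line

    start = time.process_time()
    solve(range(100000))
    cpu = time.process_time() - start
    print(dict(ops), "CPU seconds:", round(cpu, 4))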


Dijkstra's algorithm for the shortest path problem (Version I). In the shortest path problem, we wish to obtain the shortest path distance from a specified source node s to every other node in the network G, where c_ij is the cost of the arc (i, j). Dijkstra's algorithm (see, for example, Lawler) can be used to solve this problem provided that all arc costs are nonnegative. The algorithm maintains a distance label d(i) with each node i (such that d(s) = 0), which is the cost of some path from the source node s to node i. Dijkstra's algorithm maintains three sets of nodes: permanently labeled, temporarily labeled, and unseen. Each permanently labeled node i has a distance label d(i) which is the minimum cost of a path from node s to node i. Each temporarily labeled node i has a finite distance label d(i) which is the minimum cost of a path from node s to node i with the property that all internal nodes of the path are permanently labeled. At each iteration, by scanning all temporarily labeled nodes, the algorithm identifies the temporarily labeled node i whose distance label is minimum (node selection step). The algorithm then makes node i permanent and scans the arcs in its adjacency list A(i) to update the distance labels of adjacent nodes (distance update step). The algorithm terminates when all nodes are permanently labeled. (We assume here without loss of generality that each node j is reachable from s. To satisfy this assumption, artificial arcs (s, j) of large cost can be added to the network.) One set of representative operations for Dijkstra's algorithm consists of: (i) scanning of nodes during the node selection step; and (ii) scanning of arcs to update distance labels. The number of operations performed in the distance update step is Σ_{i in N} |A(i)| = O(m), but the number of temporarily labeled nodes examined in the node selection step is problem dependent. Therefore, the representative operation counts in this version of Dijkstra's algorithm are the number of arcs (that is, m) and the number of temporarily labeled nodes examined.

Dijkstra's algorithm for the shortest path problem (Version II). In this version of Dijkstra's algorithm, we store the temporarily labeled nodes in a binary heap. In this case, each node selection step and each distance update step requires a heap update, which can be decomposed into several siftup or siftdown operations, each requiring O(1) time (see, for example, Cormen, Leiserson, and Rivest[8]). The number of sift operations per heap update is problem dependent, but is bounded by log n in the worst case. Hence one set of representative operations in this version of Dijkstra's algorithm consists of the arc scans (there are m of them) and the sift operations. To gain additional insight, one could also keep track of the number of heap updates and the average number of nodes in the heap.

Label correcting algorithm for the shortest path problem. Label correcting algorithms are an important class of algorithms for solving the single-source shortest path problem in networks with arbitrary arc costs and no negative cost cycles. The generic label correcting algorithm maintains a distance label d(i) for each node i. If these distance labels satisfy the optimality conditions d(j) <= d(i) + c_ij for each arc (i, j) in A, then they represent the shortest path distances; otherwise they are non-optimal. The generic label correcting algorithm also maintains a set of nodes, denoted by LIST, satisfying the invariant that if an arc (i, j) violates its optimality condition, then node i is in LIST. LIST can be stored as a stack or as a queue. At each iteration, the label correcting algorithm removes a node i from LIST, sequentially scans each arc (i, j) in its adjacency list A(i), and if the arc (i, j) satisfies d(j) > d(i) + c_ij, it updates d(j) := d(i) + c_ij and adds node j to LIST if it is not already in LIST. The algorithm terminates when LIST is empty. The generic label correcting algorithm performs the following operations repeatedly: (i) it removes nodes from LIST (removals); (ii) it scans the adjacency lists of nodes removed from LIST (arc scans); and (iii) it adds nodes to LIST (additions). Using the facts that the number of removals equals the number of additions, and that whenever a node i is added we must scan every arc (i, j) in A(i), we conclude that one set of representative operations consists of the singleton operation of scanning arcs. In addition, it might be useful to keep track of the number of additions to LIST.
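To make the bookkeeping for Version I of Dijkstra's algorithm concrete, the following is a small sketch of our own of an array-based implementation that records the two representative operation counts described above. The graph representation (adjacency lists of (node, cost) pairs) and the variable names are assumptions of the sketch, not something prescribed by the paper.

    INF = float("inf")

    def dijkstra_v1(n, adj, s):
        # adj[i] is a list of (j, cost) pairs with nonnegative costs.
        d = [INF] * n
        d[s] = 0
        permanent = [False] * n
        counts = {"nodes_examined": 0, "arc_scans": 0}
        for _ in range(n):
            # Node selection step: scan all temporarily labeled nodes.
            best, best_d = -1, INF
            for i in range(n):
                if not permanent[i] and d[i] < INF:
                    counts["nodes_examined"] += 1     # representative operation (i)
                    if d[i] < best_d:
                        best, best_d = i, d[i]
            if best == -1:
                break
            permanent[best] = True
            # Distance update step: scan the adjacency list of the new permanent node.
            for j, c in adj[best]:
                counts["arc_scans"] += 1              # representative operation (ii)
                if d[best] + c < d[j]:
                    d[j] = d[best] + c
        return d, counts

    adj = [[(1, 2), (2, 4)], [(2, 1), (3, 7)], [(3, 3)], []]
    print(dijkstra_v1(4, adj, 0))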


Labeling algorithm for the maximum flow problem. The labeling algorithm starts with a zero flow and proceeds by identifying augmenting paths from the source node s to the sink node t and sending flows along these paths. (An augmenting path is a directed path along which a positive amount of flow can be sent from the source node to the sink node.) The algorithm identifies an augmenting path using a search method that works in the following manner. The search method maintains a set, LIST, of labeled nodes (that is, the nodes reachable from the source) that have not been examined. Initially, LIST contains the single node s. Subsequently, at each iteration, the method removes a node i from LIST, scans arcs in its adjacency list A(i) to label more (unlabeled) nodes, and adds them to LIST. The search method terminates when either the sink node t is labeled or LIST becomes empty. In the former case, we have identified an augmenting path; in this case, we augment the maximum possible flow along this path, unlabel all nodes, and reapply the search method. In the latter case, the current flow is maximum and the labeling algorithm terminates. The labeling algorithm performs the following operations repeatedly: (i) removing nodes from LIST; (ii) scanning arcs in the adjacency lists of nodes removed from LIST; (iii) adding nodes to LIST; (iv) sending flows along augmenting paths; and (v) unlabeling the labeled nodes. We assume here that node removals from and insertions into LIST take O(1) time, as does checking whether a node is in LIST. The O(1) running time is easily accomplished by implementing LIST as a doubly linked list. One set of representative operations for the labeling algorithm consists of the following: scanning arcs while examining labeled nodes. This operation dominates all other operations that the labeling algorithm performs. In addition to keeping track of the number of arc scans, one may want to count the number of augmentations. This additional information might provide insight into comparing the labeling algorithm to other maximum flow algorithms, and can help assess in practice whether the bottleneck is caused by too many augmentations or too much time per augmentation.

Bubble sort algorithm. The bubble sort algorithm is a well-known algorithm that sorts the elements of an array a of n elements in nondecreasing order. The algorithm proceeds by performing passes over the array a. In the l-th pass, it compares the two elements a(k) and a(k + 1) for each 1 <= k <= n - l, one by one, and if the elements are out of order (that is, a(k) > a(k + 1)), it swaps their positions. The algorithm terminates when, during an entire pass, no two elements are found to be out of order. One set of representative operations in this algorithm consists of checking whether adjacent elements are out of order. In addition, one may want to keep track of the number of passes performed by the bubble sort algorithm.
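A counter-instrumented bubble sort might look as follows; this is a sketch of our own, recording the representative operation (adjacent comparisons) together with the number of passes.

    import random

    def bubble_sort(a):
        # Sorts a in nondecreasing order while counting comparisons and passes.
        counts = {"comparisons": 0, "passes": 0}
        length = len(a)
        swapped = True
        while swapped:
            counts["passes"] += 1
            swapped = False
            for k in range(length - 1):
                counts["comparisons"] += 1      # representative operation
                if a[k] > a[k + 1]:
                    a[k], a[k + 1] = a[k + 1], a[k]
                    swapped = True
            length -= 1                          # the last element of each pass is in place
        return counts

    data = [random.randrange(1000) for _ in range(500)]
    print(bubble_sort(data))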


Network simplex algorithm. The network simplex algorithm is a well-known algorithm to solve the minimum cost flow problem. The network simplex algorithm is an adaptation of the revised simplex algorithm for linear programming and maintains a basic feasible solution at each step, which corresponds to a spanning tree in the underlying network. The network simplex algorithm maintains a spanning tree using tree indices. A common set of tree indices includes the predecessor, the depth, and the thread; however, the comments below concerning representative operation counts are valid for a wide range of different sets of tree indices. For a basic feasible solution with a corresponding tree T, the algorithm scans through the arc list to identify a nontree arc (k, l) violating its optimality conditions. Adding arc (k, l) to the current spanning tree creates a unique cycle W. The algorithm identifies this cycle and augments the maximum possible flow along this cycle. Doing so saturates at least one arc, say (p, q), in the cycle W. The algorithm replaces the arc (p, q) by the arc (k, l) in T, and updates the node potentials (or simplex multipliers) of a subtree T2 of T and the tree indices. The network simplex algorithm repeats this process until all arcs satisfy the optimality conditions. The following is a set of representative operations for the network simplex algorithm: (i) the scanning of arcs to identify entering arcs; (ii) the scanning of arcs in the pivot cycles; and (iii) the identification of nodes in the subtrees whose potentials change. The network simplex algorithm performs many other operations. We do not need to keep track of these operations because the representative operations dominate them. For example, the algorithm performs some initialization operations that we have ignored. Furthermore, updating all the tree indices in a pivot (predecessor, depth, and thread) requires O(|W| + |T2|) time, which is dominated by (ii) and (iii). This set of representative operations is strikingly compact considering the fact that implementations of the network simplex algorithm are typically quite intricate. In addition to these representative counts, we might keep track of other operation counts. For example, we would most likely wish to keep track of the number of pivots. Moreover, if we were concerned about the effects of degeneracy, we would also keep track of the number of degenerate pivots performed by the algorithm.

3. Using Computation Counts to Analyze the Network Simplex Algorithm

In this section, we illustrate the use of representative operation counts using the network simplex algorithm for the minimum cost flow problem. We provide experiments based upon the network simplex algorithm using the first eligible arc pivot rule for selecting the entering arc. The first eligible arc pivot rule scans the arc list in a wraparound fashion and selects the first arc violating its optimality condition as the entering arc. This version of the simplex algorithm is not the most efficient in practice; we have analyzed it because it provides a good illustration of the use of representative operation counts. The code for the network simplex algorithm was developed by the authors. To generate minimum cost flow problems, we used the well-known network generator NETGEN developed by Klingman, Napier, and Stutz.[20] For additional codes and computational testing of the network simplex algorithm, we refer readers to the studies of Glover, Karney, and Klingman,[12] Bradley, Brown, and Graves,[5] Mulvey,[26] Grigoriadis,[15] and Chang and Chen.[7]

The computational time that the network simplex algorithm requires to solve a minimum cost flow problem depends on the test problems. For NETGEN, we need to specify a number of different parameters, including the following: the number of nodes, the number of arcs, the magnitudes of the cost and capacity data, and the number of supply nodes. To illustrate the approach discussed in this paper, we have chosen to focus on the number of nodes and the number of arcs. We conducted experiments on networks with 1,000, 2,000, 4,000, 6,000, and 8,000 nodes. For each choice of n nodes, we created networks with 2n, 4n, 6n, 8n, and 10n arcs. Setting the average node degree d = m/n, we thus considered networks with d = 2, 4, 6, 8, and 10. The largest network size we considered had 8,000 nodes and 80,000 arcs. For each specific setting of n and m, we solved five different problems generated from NETGEN and computed the averages over these five problem instances. Because the averages have a lower variability of outcomes than the individual tests, the resulting graphs and charts reveal patterns more clearly. We chose to take the averages of the data of five instances so that the patterns are highlighted in the graphs; for other studies, one can also employ more sophisticated statistical methodologies to address issues of variability.

For each problem that we solved, we noted the following values:

(i) α_E: the number of arcs scanned in selecting an entering arc (summed over all pivots);
(ii) α_W: the number of arcs of the pivot cycles (summed over all pivots);
(iii) α_p: the number of node potentials modified (summed over all pivots);
(iv) p: the number of pivots;
(v) p̄: the number of degenerate pivots; and
(vi) the CPU time in seconds to execute the algorithm (times noted on an HP 9000/850 computer under a multiprogramming and multisharing environment).

We computed the averages of these values over the five problem instances for each parameter setting. Table I gives these averages for the network simplex algorithm with the first eligible arc pivot rule.

Table I. Computation Counts for the Network Simplex Algorithm with the First Eligible Arc Pivot Rule

No.  n     m      d    α_E        α_W         α_p          p        p̄/p   α_E+α_W+α_p    CPU (sec.)
1    1000  2000   2    41590      183540      237599       4721     0.79   462729         4.46
2    2000  4000   2    129125     831593      1356773      15358    0.81   2317491        21.62
3    4000  8000   2    406115     3989374     8319325      51936    0.82   12714814       119.70
4    6000  12000  2    863573     10770317    25045320     116303   0.85   36679210       345.64
5    8000  16000  2    1453942    21438864    54392393     197005   0.85   77285199       699.88
6    1000  4000   4    87883      284830      403699       10275    0.84   776412         7.31
7    2000  8000   4    248760     1101807     2090021      29879    0.85   3440588        31.26
8    4000  16000  4    765758     5171901     12341060     9985     0.86   18278719       160.70
9    6000  24000  4    1475919    12277824    33993925     197858   0.87   47747668       420.96
10   8000  32000  4    2419120    24614630    77921501     335296   0.87   104955251      853.21
11   1000  6000   6    141475     351763      518265       14372    0.85   1011503        9.33
12   2000  12000  6    379171     1253222     2353123      38837    0.87   3985516        35.42
13   4000  24000  6    1152766    5655431     15215392     124249   0.87   22023589       188.55
14   6000  36000  6    2329115    14249457    42841133     259742   0.88   59419705       505.79
15   8000  48000  6    3639711    26157289    90280080     416123   0.88   120077080      978.57
16   1000  8000   8    207114     380015      598636       16586    0.85   1185765        10.70
17   2000  16000  8    529747     1452365     3012190      47116    0.87   4994302        42.61
18   4000  32000  8    1534355    5919861     13833227     139434   0.87   21287443       200.67
19   6000  48000  8    2918853    13917221    48131763     273089   0.87   64967837       518.08
20   8000  64000  8    4532492    25791836    99057203     437558   0.88   129381531      1033.05
21   1000  10000  10   266541     410304      739070       18634    0.83   1415915        12.39
22   2000  20000  10   710258     1555660     3305765      53425    0.87   5571683        48.08
23   4000  40000  10   1975386    5844629     17922439     145303   0.86   25742454       208.30
24   6000  60000  10   3902416    15193494    48615853     312355   0.88   67711763       584.34
25   8000  80000  10   5942548    26724636    103238910    487624   0.89   135906094      1064.45

Identifying Asymptotic Bottleneck Operations

We consider an operation to be a bottleneck operation for an algorithm if the operation consumes a significant percentage of the execution time on at least some fraction of the problems tested. We refer to an operation as an asymptotic nonbottleneck operation if its share in the computational time becomes smaller and approaches zero as the problem size increases. Otherwise, we refer to an operation as an asymptotic bottleneck operation. The asymptotic bottleneck operations are important because they determine the running times of the algorithm for sufficiently large problem sizes. Earlier, we have shown that the representative operation counts provide both upper and lower bounds on the number of operations an algorithm performs; therefore, the set of representative operations must contain at least one asymptotic bottleneck operation.

There is no rigorous method for determining asymptotic bottleneck operations using computational testing, unless we are willing to base the determination on further assumptions about the behavior of an algorithm on large problems. After all, it is theoretically possible that an algorithm behaves one way for problems of sufficiently large size and totally differently for small-size problems. Nevertheless, there are procedures that seem quite effective in practice at determining asymptotic bottlenecks, even if they are not totally rigorous. We now illustrate a graphical approach for identifying an asymptotic bottleneck operation for the network simplex algorithm. Let α_S(I) = α_E(I) + α_W(I) + α_p(I) denote the sum of the representative operation counts. A simple but effective approach is to plot α_E(I)/α_S(I), α_W(I)/α_S(I), and α_p(I)/α_S(I) for increasingly larger problem instances I and look for a trend.

Figure 1 gives these plots for the network simplex algorithm with the first eligible pivot rule. In these plots, we use the number of nodes as a surrogate for the problem size and give a plot for each different degree d = m/n. This choice helps us visualize the effect of n and m on the growth in the representative operation counts. These plots suggest that updating node potentials is an asymptotic bottleneck operation in the algorithm, and suggest that determining the entering arc and sending flow around the cycle are asymptotic nonbottleneck operations.

[Figure 1. Identifying asymptotic bottleneck operations. (a) Plot of the ratio of arc scan operations to the total number of operations. (b) Plot of the ratio of flow update operations to the total number of operations. (c) Plot of the ratio of potential update operations to the total number of operations.]
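For illustration, the sketch below (ours, not the authors' code) reproduces this kind of ratio plot for the d = 2 rows of Table I only; the output file name and plotting choices are our own assumptions.

    import numpy as np
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    # Averages for the d = 2 instances, copied from Table I.
    n       = np.array([1000, 2000, 4000, 6000, 8000])
    alpha_E = np.array([41590, 129125, 406115, 863573, 1453942], dtype=float)
    alpha_W = np.array([183540, 831593, 3989374, 10770317, 21438864], dtype=float)
    alpha_p = np.array([237599, 1356773, 8319325, 25045320, 54392393], dtype=float)

    alpha_S = alpha_E + alpha_W + alpha_p     # total representative operation count

    for name, a in [("arc scans", alpha_E),
                    ("cycle arcs", alpha_W),
                    ("potential updates", alpha_p)]:
        plt.plot(n, a / alpha_S, marker="o", label=name)

    plt.xlabel("number of nodes n (d = 2)")
    plt.ylabel("share of total representative count")
    plt.legend()
    plt.savefig("bottleneck_shares.png")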

Estimating Growth Rates of Bottleneck Operations

How can one estimate the growth in the computational time (that is, the CPU time) as the problem size increases? Instead of directly estimating the growth in the computational time, we estimate the growth in the asymptotic bottleneck operations. There is no need to consider the nonbottleneck operations in estimating the asymptotic running time. The CPU time is approximately a linear function of the terms α_i(I), for i = 1 to K, and thus includes information from many of the nonbottleneck operations. For this reason, the CPU time is a much noisier estimate of the asymptotic running time. We now describe a simple approach for estimating the growth rate of the counts of a bottleneck operation, which in our illustration is α_p, the number of potential updates. Our approach consists of the following steps.

Step 1. Determine an appropriate functional form for estimating the count for the operation. In our example, we choose the functional form to be c_1 n^α d^β for some choice of constants c_1, α, and β.

Step 2. Use multiple regression to estimate the values of c_1, α, and β. Obtain statistics (such as R², adjusted R², standard errors, etc.) in order to evaluate the quality of the fit.

Step 3. Use graphics to help verify that the estimate is an upper bound on the asymptotic growth rate. For example, a useful visual check can be obtained by plotting α_p(I)/(c_1 n^α d^β).

To perform the regression analysis for the network simplex algorithm, we took the logarithm of the estimated function α_p = c_1 n^α d^β, which gave us log(α_p) = log c_1 + α log n + β log d. The multiple regression analysis yielded log c_1 = -7.31, α = 2.49, and β = 0.50. (In our regression, we took logarithms to the base 2.) The R² value of this fit was 0.998, and the adjusted R² value was 0.997, indicating that the linear fit was indeed very good. The standard errors were also fairly small. We provide summary statistics of the regression analysis in Table II. Thus the regression analysis suggests that α_p can be estimated quite accurately by the function α_p = 0.0063 n^2.5 d^0.5, which is equivalent to α_p = 0.0063 n^2 m^0.5 (because c_1 = 2^-7.31 = 0.0063). (The difference between n^2.49 and n^2.5 is insignificant for the problem sizes tested, and one can construct plausible scenarios under which 2.5 would be the most likely exponent.) In Figure 2, we plot the ratio 0.0063 n^2.5 d^0.5/α_p. The figure provides additional insight into the goodness of the fit obtained by the regression analysis. For example, the function 0.0063 n^2.5 d^0.5 appears to underestimate the growth rate of the potential updates for d = 2, because the curve 0.0063 n^2.5 d^0.5/α_p is strictly decreasing for d = 2.

We note that regression analysis, while providing useful insights, is not fully applicable to our situation. First of all, we are interested in the asymptotic analysis, which requires an extension of the curve to problem sizes significantly beyond those tested. Secondly, we are interested in the worst-case analysis (and, thus, upper bounds) rather than in best fits. For example, for the function f(x) = x sin x, there is no good linear fit; however, the bound f(x) <= x is excellent. Although the regression analysis is not fully applicable, the fit is quite good in our case, as indicated by an R² value very close to 1.
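A minimal sketch of Steps 1 and 2, assuming the per-instance averages are available as arrays and using base-2 logarithms as in the text, is shown below. The five rows are taken from Table I merely to make the fragment runnable; a real analysis would use all 25 averages.

    import numpy as np

    # (n, d, alpha_p) averages from Table I (rows 1, 7, 13, 19, 25 only).
    data = [(1000, 2, 237599.0), (2000, 4, 2090021.0), (4000, 6, 15215392.0),
            (6000, 8, 48131763.0), (8000, 10, 103238910.0)]

    n  = np.array([row[0] for row in data], dtype=float)
    d  = np.array([row[1] for row in data], dtype=float)
    ap = np.array([row[2] for row in data], dtype=float)

    # Fit log2(alpha_p) = log2(c1) + a*log2(n) + b*log2(d) by least squares.
    X = np.column_stack([np.ones_like(n), np.log2(n), np.log2(d)])
    coef, *_ = np.linalg.lstsq(X, np.log2(ap), rcond=None)
    log_c1, a, b = coef
    print("c1 = %.4f, alpha = %.2f, beta = %.2f" % (2.0 ** log_c1, a, b))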

Table II. Regression Analysis for Estimating the Log of the Number of Potential Updates

Multiple R: 0.99882478   R Square: 0.99765094   Adjusted R Square: 0.99743739   Standard Error: 0.14228152   Observations: 25

Analysis of Variance
              df   Sum of Squares   Mean Square
Regression    2    189.149          94.574
Residual      22   0.445            0.020
Total         24   189.594

              Coefficients   Standard Error
Intercept     -7.31          0.317
log(nodes)    2.49           0.026
log(degree)   0.50           0.035

[Figure 2. Determining asymptotic growth rates of the bottleneck operations (the ratio 0.0063 n^2.5 d^0.5/α_p plotted against n for each degree d).]

Comparing Two Algorithms

Suppose that we want to compare two different algorithms A1 and A2 for solving the same problem and are interested in knowing which algorithm performs better asymptotically. In general, one can apply the same methodology used for assessing asymptotic bottleneck operations to compare different algorithms. Let I denote an instance of the problem, and let α1(I) and α2(I) be the sums of the counts of the representative operations performed by the algorithms A1 and A2 on instance I. Let X_n(I) = α1(I)/α2(I) be the ratio of the computation counts, where |I| = n.

We define X_n to be the random variable with values X_n(I), where I is selected from a given probability distribution of problems of size n. We say that the algorithm A1 is asymptotically superior to the algorithm A2 on the given probability distribution if, for all ε > 0,

    lim_{n -> ∞} Prob(X_n < ε) > 1 - ε.

To formally determine the asymptotically superior algorithm is a difficult problem, well beyond the scope of this paper. Here we suggest that a useful first step is to plot the ratio of α1(I) and α2(I) for some randomly selected instances and look for trends as |I| increases in size.

Virtual Running Times

In the introduction, we mentioned some difficulties that arise when we use CPU time as the only empirical measure for computational investigations. Here we mention yet one more disadvantage. CPU time can force unnecessary rigidity in the testing of an algorithm. To collect the CPU times for an algorithm, we should, in principle, conduct all of our tests on the same machine using the same compiler. Moreover, if the machine is a time-sharing machine, then one should ideally solve all test problems when the machine has a similar workload. Moreover, the CPU times for large instances might be dominated by paging, that is, the memory management of the secondary storage space, and CPU times are influenced by caching as well.

We can overcome some of these drawbacks by using virtual time instead of CPU time. (We point out that virtual time is not directly related to virtual memory.) The virtual running time of an algorithm is a linear estimate of its CPU time obtained by using its representative operation counts. The term "virtual time" appears in Brown,[6] and we use the term in a similar way to the way in which Brown used it. The virtual running time V(I) of the network simplex algorithm to solve an instance I is given by the linear function

    V(I) = c_2 α_E(I) + c_3 α_W(I) + c_4 α_p(I),

where c_2, c_3, and c_4 are constants selected so that V(I) is the best possible estimate of the algorithm's actual running time CPU(I) on instance I. One plausible way to determine the constants c_2, c_3, and c_4 is to use (multiple) regression analysis. In our regression analysis, we generated each of the points (CPU(I), α_E(I), α_W(I), α_p(I)) by taking an average of five problem instances; we used the averages reported in Table I and found the best linear fit to these 5-point averages. Using regression analysis, we found that for the network simplex algorithm with the first eligible pivot rule, the virtual running time of an instance I is as follows:

    V(I) = (1.07 α_E(I) + 1.98 α_W(I) + 0.46 α_p(I))/100,000.

In Table III, we provide summary statistics of the regression analysis, and in Figure 3, we plot the estimated CPU time divided by the real CPU time. The estimates seem to be excellent for larger values of n, and moderately good for lower values of n. The maximum relative error in estimation was 16%. One explanation for the larger relative errors for smaller values of n is the nature of least squares regression. The five data points with the highest relative error corresponded to problems with 1,000 nodes, and these five points all had the smallest computation times. Recall that the computation time is the divisor in the plotted ratios.
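The fit itself is a small least-squares computation. The sketch below is our own, not the authors' code: it regresses the CPU time on the three representative counts without an intercept, using five rows of Table I; a real fit would use all 25 averages, and the resulting coefficients need not match those reported above.

    import numpy as np

    # Counts and CPU seconds for Table I rows 1, 7, 13, 19, 25.
    alpha_E = np.array([41590.0, 248760.0, 1152766.0, 2918853.0, 5942548.0])
    alpha_W = np.array([183540.0, 1101807.0, 5655431.0, 13917221.0, 26724636.0])
    alpha_p = np.array([237599.0, 2090021.0, 15215392.0, 48131763.0, 103238910.0])
    cpu     = np.array([4.46, 31.26, 188.55, 518.08, 1064.45])

    # CPU(I) ~ c2*alpha_E(I) + c3*alpha_W(I) + c4*alpha_p(I), no intercept.
    X = np.column_stack([alpha_E, alpha_W, alpha_p])
    c, *_ = np.linalg.lstsq(X, cpu, rcond=None)
    print("fitted coefficients:", c)

    virtual = X @ c            # virtual running times for the same instances
    print(list(zip(virtual.round(2), cpu)))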


Table III. Regression Statistics for Estimating the CPU Time

Multiple R: 0.99973806   R Square: 0.99947619   Adjusted R Square: 0.95397403   Standard Error: 8.55919711   Observations: 25

Analysis of Variance
              df   Sum of Squares   Mean Square
Regression    3    3075320.430      1025106.810
Residual      22   1611.717         73.260
Total         25   3076932.140

              Coefficients   Standard Error
Intercept     0              N/A
Arc scans     1.066E-05      2.7135E-06
Cycle nodes   1.9757E-05     9.327E-07
Tree nodes    4.6487E-06     3.3261E-07

[Figure 3. Determining how well the virtual running time estimates the CPU time for the stated problems (the estimated CPU time divided by the actual CPU time, plotted against n for each degree).]

[Figure 4. Identifying the proportion of the virtual running time accounted for by the potential updates.]

In fact, the residuals for these five points (the absolute error in estimation) were among the lowest. (It is plausible that a small constant term should have been included in the regression; however, when the regression was rerun allowing for a small constant term, the relative errors in estimating the CPU time for problems with 1,000 nodes were even larger than given above.)

Using virtual running times has several advantages. First, the virtual running time helps us to assess the proportion of time that an algorithm spends on different representative operations. As an immediate consequence, it helps us to identify not only the asymptotic bottleneck operation, but also the bottleneck operations for different sizes of problem instances. For example, we estimated the virtual running time of the network simplex algorithm as V(I) = (1.07 α_E(I) + 1.98 α_W(I) + 0.46 α_p(I))/100,000. To estimate the proportion of time spent on potential updates, in Figure 4 we plot 0.46 α_p(I)/(1.07 α_E(I) + 1.98 α_W(I) + 0.46 α_p(I)). As we can see, for small problems, updating the node potentials is not the bottleneck operation; as n increases, however, the proportion of the time taken by the potential updates increases as well.
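As a worked example of this proportion (our own arithmetic, using the fitted coefficients above and the largest instance in Table I, n = 8000 and d = 10):

    alpha_E, alpha_W, alpha_p = 5942548.0, 26724636.0, 103238910.0
    share = 0.46 * alpha_p / (1.07 * alpha_E + 1.98 * alpha_W + 0.46 * alpha_p)
    print(round(share, 2))   # potential updates account for roughly 0.44 of V(I) here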

A second advantage of the virtual running time is that it is particularly well suited for situations in which the testing is carried out on more than one computer. This situation might arise for several reasons. For example, several users might be conducting computational experiments at different sites, or a single user might wish to conduct additional tests after upgrading from an old computer. This situation is very common in the research literature because authors often conduct additional experiments at the suggestion of a referee or editor. When one moves from one computer system to another, the representative operation counts remain unchanged. Using the representative operation counts of the previous study and the constants obtained for the new computer system (obtained in the regression estimate for the virtual running time), one can obtain the virtual running times for all problems of the previous study measured in terms of the new computer system.

[Figure 5. Estimating the growth rate of the number of pivots (the predicted number of pivots divided by the actual number of pivots, plotted against n for each degree).]

it permits us to estimate the running time exclusive of the time spent on paging. When a computer program executes a large program whose data do not fully fit into the computer's primary memory (for example, RAM), it stores part of the data in the secondary memory (for example, disk) with higher retrieval times. As a result, large programs run slower. In the virtual running time analysis, if we evaluate the constants using small-size problems (that fully execute in the primary memory), then the virtual running times for large-size problems would be as though the program is entirely run in the primary memory. Thus, virtual running time can be used to estimate the running time of an algorithm on a computer with sufficiently large primary memory. A natural question is whether the constants used in the virtual running time are robust, that is, do they give an accurate estimate of the CPU times for all possible problem inputs. Again, our use of virtual CPU times as stated here is not fully rigorous, but our limited experience so far has suggested that they do seem to be robust. In practice, one can also measure the robustness of the constants using statistical analysis. Additional Insight into an Algorithm We can also use computation counts to gain additional insight into an algorithm. For example, we can use the preceding analysis to estimate the number of pivots performed by the network simplex algorithm. In the linear programming literature, it is widely believed that for most pivoting rules, the number of pivots is typically proportional to the number of constraints and rarely more than 3 times the number of constraints. As in the case of estimating the number- of potential updates, we ran a linear regression estimating the number of pivots as f(n, d) = c,nadO. This gave uldg(p) = log c5 + a log n + log d. The multiple regression analysis yielded log c = -4.64, a = 1.65, and 3 = 0.69. Exponentiating both sides of the equality gives an estimate of the number of pivots as .04 n '

65

d

69

.

We plotted the predicted number of pivots divided by the actual number of pivots in Figure 5. The regression statistics

I

-F

are given in Table IV. Although the regression gives the best overall fit for the number of pivots, the prediction for d = 2 is asymptotically an underestimate, and the plot for d = 10 suggests that the prediction for d = 10 is an overestimate. One might conjecture from these plots that the dependence of the number of pivots on the average degree d and the number of nodes n is more complicated than the functional form suggests.

One might also be interested in determining what percentage of pivots is degenerate as the network size grows. Figure 6 gives the ratio of the number of degenerate pivots to the total number of pivots. We observe that the ratio varies between 79% and 89% and that the proportion of degenerate pivots increases as the problem size increases.

We might plot a few more graphs to gain additional insight into the network simplex algorithm. For example, we might plot the average size of the tree whose node potentials change during the pivot operation (that is, the total number of potential updates divided by the number of pivots). This plot would give a good estimate of the running time per pivot, since potential updating is the bottleneck operation, at least for the first eligible arc pivot rule on the problems generated by NETGEN. In addition, one might pose other questions such as the following: How quickly does the network simplex algorithm converge to the optimal objective function value? Does the algorithm quickly obtain a solution with a near-optimal objective function value and then slow down, or does it slowly approach the optimal objective function value and then converge rapidly? We could, in principle, provide a partial answer to these questions by selecting a few sufficiently large instances and plotting the objective function values as a function of the number of pivots or as a function of the virtual CPU time.
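The raw material for such plots is just a handful of counters maintained during the run. One possible instrumentation sketch (in Python, with hypothetical class and field names, not tied to any particular simplex implementation) is:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SimplexTrace:
    """Representative operation counts and traces gathered during a
    (hypothetical) network simplex run, sufficient for plots like
    Figures 5 and 6 and for an objective-value convergence curve."""
    pivots: int = 0
    degenerate_pivots: int = 0
    potential_updates: int = 0            # total nodes whose potentials change
    objective_values: List[float] = field(default_factory=list)

    def record_pivot(self, flow_change: float, subtree_size: int,
                     objective: float) -> None:
        self.pivots += 1
        if flow_change == 0:              # a degenerate pivot moves no flow
            self.degenerate_pivots += 1
        self.potential_updates += subtree_size
        self.objective_values.append(objective)

    def degenerate_fraction(self) -> float:
        return self.degenerate_pivots / self.pivots if self.pivots else 0.0

    def avg_potential_updates_per_pivot(self) -> float:
        return self.potential_updates / self.pivots if self.pivots else 0.0
```

Calling record_pivot once per pivot inside the main loop is enough to compute the degenerate-pivot fraction, the average tree size per pivot, and the objective-value trajectory after the run completes.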


Table IV. Regression Statistics for Estimating the Number of Pivots

    Multiple R            0.99709
    R Square              0.99419
    Adjusted R Square     0.99366
    Standard Error        0.15372
    Observations          25

    Analysis of Variance
                   df    Sum of Squares    Mean Square
    Regression      2            88.913         44.456
    Residual       22             0.520          0.024
    Total          24            89.433

                   Coefficients    Standard Error
    Intercept            -4.636             0.343
    log(nodes)            1.651             0.028
    log(degree)           0.688             0.038
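The summary statistics reported in Table IV follow from the same least-squares fit by standard formulas. The sketch below (hypothetical names; numpy assumed) computes the R-square values, the residual standard error, the analysis-of-variance decomposition, and the coefficient standard errors for a design matrix X whose first column is the intercept:

```python
import numpy as np

def regression_summary(X, y):
    """Ordinary least squares summary for y ~ X (X includes the intercept column)."""
    n, k = X.shape                              # observations, fitted parameters
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta

    ss_total = np.sum((y - y.mean()) ** 2)
    ss_resid = np.sum(residuals ** 2)
    ss_regr = ss_total - ss_resid

    r_square = ss_regr / ss_total
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k)
    std_error = np.sqrt(ss_resid / (n - k))     # residual standard error

    # Coefficient standard errors from the usual OLS covariance estimate
    cov = std_error ** 2 * np.linalg.inv(X.T @ X)
    coef_se = np.sqrt(np.diag(cov))

    return {
        "multiple_r": np.sqrt(r_square),
        "r_square": r_square,
        "adj_r_square": adj_r_square,
        "std_error": std_error,
        "anova": {"df": (k - 1, n - k),
                  "ss": (ss_regr, ss_resid),
                  "ms": (ss_regr / (k - 1), ss_resid / (n - k))},
        "coefficients": beta,
        "coef_std_errors": coef_se,
    }
```

With 25 observations and three fitted parameters, the regression and residual degrees of freedom are 2 and 22, matching the analysis-of-variance rows in Table IV.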

Figure 6. Occurrence of degenerate pivots. (Ratio of degenerate pivots to total pivots, between roughly 0.80 and 0.90, plotted against the number of nodes from 2,000 to 8,000.)

4. Limitations

The methodology discussed in this paper is broadly applicable to sequential algorithms. We can always determine a representative set of operations, since the set of all operations is itself a representative set. In addition, our experience suggests that there is typically a very small representative set of operations. Consequently, our methodology can be used to obtain valuable information about an algorithm's behavior and the comparative performance of different algorithms. However, our methodology has some limitations, and one should be cautious in drawing conclusions. Here we list some of the important limitations.

1. One cannot draw conclusions about asymptotic performance without a well-developed theory of the growth in the computational time. Even if the growth in the computational time is well behaved and well characterized empirically for n < 100,000 (or any other large number), it is possible in principle that the growth in the computational time could be fundamentally different for values of n larger than 100,000.

2. Our methodology is not appropriate for analyzing parallel algorithms. Although we can keep track of the number of representative operations in a parallel algorithm, these numbers cannot necessarily be used to estimate CPU time because many of these operations are performed in parallel. Communication overhead is often the bottleneck in parallel algorithms, and it is quite difficult to estimate this overhead using only computation counts.

In addition to these limitations, there are a number of issues that we have either not treated or have treated in a very limited way. For example, many algorithms and heuristics are not guaranteed to produce an optimal solution. In this case, one may be very interested in the deviation of the solution from optimality; that is, one may be interested in the absolute error, the relative error, or both. We have also not touched much on statistics or on the selection of test problems.

5. Summary

Most iterative algorithms for solving optimization problems repetitively perform some basic operations. For almost all algorithms, we can decompose these basic operations into fundamental operations so that the algorithm executes each fundamental operation in O(1) time. An algorithm typically performs a large number of fundamental operations. We refer to a subset of fundamental operations as a set of representative operations if, for every possible problem instance, the sum of the counts of the representative operations provides an upper bound (to within a multiplicative constant) on the number of operations that the algorithm performs. We have shown that representative operation counts can provide valuable information about an algorithm's behavior that is not captured by CPU time alone. In many cases, representative operation counts facilitate: (i) determining the asymptotic growth rate of the running time of an algorithm independent of its computing environment; (ii) assessing the time an algorithm spends on different basic operations; (iii) comparing two algorithms executed on different computers; and (iv) estimating the running time of an algorithm on a computer different from the one carrying out the experiments.

The simple methodologies that we have presented for conducting empirical analysis of an algorithm using representative operation counts do not provide rigorous guarantees; nevertheless, they often provide considerable insight into an algorithm's behavior, and they typically yield far more insight than can be obtained through analyzing CPU times alone. We also anticipate that researchers will be able to draw upon a number of concepts and tools from statistics to strengthen the analysis and add rigor where appropriate. The ideas we have outlined in this paper apply to most network algorithms and should apply as well to optimization algorithms for problems arising in other application domains.
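As a minimal sketch of points (iii) and (iv), assuming hypothetical array names and the numpy library, per-operation constants calibrated from CPU times on one machine can be reused to predict virtual running times from representative operation counts gathered on other problems or other machines:

```python
import numpy as np

def calibrate_constants(op_counts, cpu_times):
    """Estimate constants c_i so that sum_i c_i * alpha_i(I) approximates CPU time.

    op_counts: (instances x operations) matrix of representative operation counts.
    cpu_times: measured CPU times on the calibration machine (small instances).
    """
    A = np.asarray(op_counts, dtype=float)
    t = np.asarray(cpu_times, dtype=float)
    c, *_ = np.linalg.lstsq(A, t, rcond=None)
    return np.clip(c, 0.0, None)   # crude guard: clamp small negative estimates to zero

def virtual_running_time(op_counts, constants):
    """Virtual CPU time of each instance as a weighted sum of its operation counts."""
    return np.asarray(op_counts, dtype=float) @ np.asarray(constants, dtype=float)

# Hypothetical usage: constants fit on small instances that fit in primary memory,
# then applied to the operation counts of much larger instances or of runs on
# another computer.
# c = calibrate_constants(small_counts, small_cpu_times)
# predicted_times = virtual_running_time(large_counts, c)
```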

Acknowledgments

This paper is largely excerpted from a chapter in our book, Ahuja, Magnanti, and Orlin.[1] We thank the referees and the associate editor for their insightful and perceptive comments, which greatly helped us to improve the presentation. We also thank Professor Roy Welsch of MIT for his help on issues related to the use of statistics. This research was supported by the Air Force Office of Scientific Research Grant AFOSR-88-0088 as well as grants from UPS and Prime Computer Corporation.

References

1. R.K. AHUJA, T.L. MAGNANTI and J.B. ORLIN, 1993. Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ.
2. J.L. BENTLEY, 1990. Experiments on Geometric Traveling Salesman Heuristics, Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, 91-99.
3. J.L. BENTLEY and B.W. KERNIGHAN, 1990. A System for Algorithm Animation: Tutorial and Algorithm Animation, Unix Research System Paper, 10th Ed., Vol. II, Saunders College Publishing, 451-475.
4. R.G. BLAND and D.L. JENSEN, 1992. On the Computational Behavior of a Polynomial-Time Network Flow Algorithm, Mathematical Programming 54, 1-39.
5. G. BRADLEY, G. BROWN and G. GRAVES, 1977. Design and Implementation of Large Scale Primal Transshipment Algorithms, Management Science 21, 1-38.
6. M.H. BROWN, 1988. Algorithm Animation, MIT Press, Cambridge, MA.
7. M.D. CHANG and C.J. CHEN, 1989. An Improved Primal Simplex Variant for Pure Processing Networks, ACM Transactions on Mathematical Software 15, 64-78.
8. T.H. CORMEN, C.E. LEISERSON and R.L. RIVEST, 1990. Introduction to Algorithms, MIT Press and McGraw-Hill, New York.
9. H.P. CROWDER, R.S. DEMBO and J.M. MULVEY, 1978. Reporting Computational Experiments in Mathematical Programming, Mathematical Programming 15, 316-329.
10. H.P. CROWDER, R.S. DEMBO and J.M. MULVEY, 1979. On Reporting Computational Experiments with Mathematical Software, ACM Transactions on Mathematical Software 5, 193-203.
11. H.P. CROWDER and P.B. SAUNDERS, 1980. Results of a Survey on MP Performance Indicators, COAL Newsletter, January, 2-6.
12. F. GLOVER, D. KARNEY and D. KLINGMAN, 1974. Implementation and Computational Comparisons of Primal, Dual and Primal-Dual Computer Codes for Minimum Cost Network Flow Problems, Networks 4, 191-212.
13. B.L. GOLDEN, A.A. ASSAD, E.A. WASIL and E. BAKER, 1986. Experimentation in Optimization, European Journal of Operational Research 27, 1-16.
14. H. GREENBERG, 1990. Computational Testing: Why, How, and How Much, ORSA Journal on Computing 2, 94-97.
15. M.D. GRIGORIADIS, 1986. An Efficient Implementation of the Network Simplex Method, Mathematical Programming Study 26, 83-111.
16. K.L. HOFFMAN and R.H.F. JACKSON, 1982. In Pursuit of a Methodology for Testing Mathematical Programming Software, in J.M. Mulvey (ed.), Evaluating Mathematical Programming Techniques, Lecture Notes in Economics and Mathematical Systems 199, Springer-Verlag, New York, 177-199.
17. R.H.F. JACKSON, P.T. BOGGS, S.G. NASH and S. POWELL, 1989. Report of the Ad Hoc Committee to Revise the Guidelines for Reporting Computational Experiments in Mathematical Programming, COAL Newsletter No. 18, 3-14.
18. R.H.F. JACKSON and J.M. MULVEY, 1978. A Critical Review of Comparisons of Mathematical Programming Algorithms and Software (1953-1977), Journal of Research of the National Bureau of Standards 83, 563-584.
19. D.S. JOHNSON, 1990. Local Optimization and the Traveling Salesman Problem, Proceedings of the 17th Colloquium on Automata, Languages and Programming, Springer-Verlag, Berlin, 446-461.
20. D. KLINGMAN, A. NAPIER and J. STUTZ, 1974. NETGEN: A Program for Generating Large Scale Capacitated Assignment, Transportation, and Minimum Cost Flow Network Problems, Management Science 20, 814-821.
21. D.E. KNUTH, 1973. The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley, Reading, MA.
22. E.L. LAWLER, 1976. Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and Winston, New York.
23. Y. LEE, 1993. Computational Analysis of Network Optimization Algorithms, Ph.D. Dissertation, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA.
24. C.C. McGEOCH, 1986. Experimental Analysis of Algorithms, Ph.D. Dissertation, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
25. C.C. McGEOCH, 1992. Analysis of Algorithms by Simulation: Variance Reduction Techniques and Simulation Speedups, ACM Computing Surveys 24, 195-212.
26. J. MULVEY, 1978. Pivot Strategies for Primal-Simplex Network Codes, Journal of the ACM 25, 266-270.
27. R. SEDGEWICK, 1977. The Analysis of Quicksort Programs, Acta Informatica 7, 327-355.
28. D.R. SHIER, 1981. A Computational Study of Floyd's Algorithm, Computers & Operations Research 8, 275-293.
