Table of Contents ***************** Uniformity Testing 1 Introduction 1.1 How to Read a TexiWEB 1.2 Coding Style 1.3 License 2 Uniformity Testing 2.1 External Interface: `uniformity.h' 2.1.1 Low-Level Routines 2.1.2 Options 2.2 Implementation: `uniformity.c' 2.2.1 High-Level Test Function 2.2.2 Convex Hull Insideness Testing Function 2.2.3 MST Test Run Function 2.2.4 Results Calculation Function 2.2.5 Options Initialization Function 3 Minimum Spanning Tree Computation 3.1 Class Structure 3.2 Prim's Algorithm 3.3 Class Implementations 4 Priority Queues 4.1 Class Structure 4.2 Binary Heap 4.2.1 Creation 4.2.2 Destruction 4.2.3 Minimum Value Extraction 4.2.4 Decreasing Keys 4.2.5 Count 4.2.6 Key Lookup 4.2.7 Verification 4.2.8 Dumping 4.2.9 Class Implementation 4.3 Fibonacci Heap 4.3.1 Creation 4.3.2 Destruction 4.3.3 Minimum Value Extraction 4.3.4 Decreasing Keys 4.3.5 Count 4.3.6 Key Lookup 4.3.7 Verification 4.3.8 Dumping 4.3.9 Class Implementation 5 Other Classes 5.1 Memory Allocation 5.2 Random Number Generator 5.3 Point Set 6 Discussion 6.1 Programs for Testing 6.1.1 Convex Hull Estimation 6.1.2 Minimum Spanning Trees 6.1.3 Uniformity Tester 6.2 Testing Results 6.2.1 Correctness 6.2.1.1 Known Sampling Window 6.2.1.2 Unknown Sampling Window 6.2.2 Normal Distributions 6.2.3 Testing Separation 6.3 Sources of Error 6.3.1 Bugs in MST Implementation 6.3.2 Bugs in Convex Hull Insideness Test 6.3.3 Imprecise Arithmetic 6.4 Performance 6.5 Future Work Appendix A References Appendix B Index Uniformity Testing ****************** 1 Introduction ************** This report presents a library for comparing the distribution of multidimensional data against a uniform distribution, using the minimum spanning tree-based approach described in *Note Smith 1984::. The complete source code for the library is included, along with discussion of the rationale behind its structure. Finally, its performance is analyzed, and possibilities for further refinements are noted. The audience for this document is twofold. First, it is intended as a class semester report. For this purpose, the first part of the next chapter (*note Uniformity Testing::) and the final chapter (*note Discussion::) are probably the most interesting parts. A secondary purpose for this document is to present the source code for the library as a literate program, as described below. For this reason, some algorithms, such as those for maintaining a binary heap, are described in more depth than would otherwise be necessary. 1.1 How to Read a TexiWEB ========================= The code in this report is written in ANSI/ISO C89 using TexiWEB, a literate programming system. Literate programming is a philosophy that regards software as a form of literature. The ideas behind literate programming have been around for a long time, but the term itself was invented by famous computer scientist Donald Knuth in 1984, who wrote two of his most famous programs (TeX and METAFONT) with a literate programming system of his own design. That system, called WEB, inspired the form and much of the syntax of TexiWEB. A TexiWEB document is a C program that has been cut into sections, rearranged, and annotated, with the goal to make the program as a whole as comprehensible as possible to a reader who starts at the beginning and reads the entire program in order. To this end, each section of a TexiWEB program is assigned both a number and a name. Section numbers are assigned sequentially, starting from 1 with the first section, and they are used for cross-references between sections. Section names are English words or phrases assigned by the TexiWEB program's author to describe the function of the corresponding code. Here's a sample TexiWEB section: 19. = for (i = 0; i < hash->m; i++) hash->entry[i] = NULL; You should notice how the first line gives the section's number, followed by its name and again its number within angle brackets. Code segments often contain references to other code segments, shown as a section name and number within angle brackets. These act something like macros, in that they stand for the corresponding replacement text. For instance, consider the following segment: 15. = hash->m = 13; <*Note Clear hash table entries:: 19> This means that the code for `Clear hash table entries' should be inserted as part of `Initialize hash table'. Since the name of a section explains what it does, it's often unnecessary to know anything more. But if you do want more detail, the section number 19 in <*Note Clear hash table entries:: 19> can easily be used to find the full text and annotations for `Clear hash table entries'. In addition, at the bottom of section 19 you will find a note reading `This code is included in segment *Note 15: Initialize hash table.', making it easy to move back to section 15 that includes it. It is possible to have multiple sections with the same name. When a name that corresponds to multiple sections is referenced, code from all the sections with that name is substituted, in order of appearance. For instance, the following shows another line of code in `Initialize hash table': 16. += hash->n = 0; When multiple sections have the same name, those sections end with a note listing the numbers of all other same-named sections. In this case, the note would read `See also segment *Note 15: Initialize hash table.'. When a TexiWEB program is converted to C, conversion conceptually begins from sections whose names are file names; e.g., <`foo.c' 37>. Within these sections, all of the section references are expanded, then any references within those sections are expanded, and so on. When expansion is complete, the specified files are written out. Finally, another useful resource in reading a TexiWEB is the index, which contains an entry for the points of declaration of every section name, function, type, structure, union, global variable, and macro. Declarations within functions are not indexed. (This section is adapted from "Reading the Code," *Note Pfaff 2000::. It was inspired by "How to read a WEB," *Note Knuth 1992::.) 1.2 Coding Style ================ The source code here complies to the requirements imposed by ANSI/ISO C89 and ISO/IEC C99. Only features present in C89 are used, but the code is forward-compatible with C99. Most of the GNU Coding Standards are followed. Indentation style is an exception: in print, to save vertical space, K&R indentation style is used instead of GNU style. The code here also conforms to Ben Pfaff's published personal coding standards. References: *Note ISO 1990::; *Note ISO 1999::; *Note FSF 2001::, section "Writing C"; *Note Pfaff 2001::. 1.3 License =========== The library code itself is under a liberal BSD-style license: 1. = /* Copyright (c) 2001 Ben Pfaff . All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer. - Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution. There is NO WARRANTY, not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ This code is included in segments *Note 3: uniformityh, *Note 9: uniformityc, *Note 26: mst-primc, *Note 34: pqh, *Note 35: pq-bin-heapc, *Note 50: pq-fib-heapc, *Note 68: mem-stdc, *Note 70: rng-stdc, and *Note 72: set-rectc. TexiWEB and some of the library's test code is derived from GNU libavl. This code falls under the GNU General Public License: 2. = /* Copyright (C) 2001 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. The author may be contacted at on the Internet, or as Ben Pfaff, 12167 Airport Rd, DeWitt MI 48820, USA through more mundane means. */ 2 Uniformity Testing ******************** The main function of the library presented in this report is to compare the distribution of points in a multidimensional space against a uniform distribution. The library takes as input a set p of n points in d dimensions each and returns as output a single real number that signifies how uniformly the points are distributed in d-space. The algorithm used for uniformity testing is based on a minimum spanning tree. It can be outlined as follows: 1. Generate a set q of n points uniformly distributed over the approximate convex hull of p. 2. Find a minimum spanning tree (MST) for the set of points containing both p and q. 3. Calculate test statistic t, based on the MST results, as well as its expected value e and variance var given that p is uniformly distributed. 4. Compute the normal significance s of the difference between t and e. The smaller the value of s determined in the final step, the more likely it is that the set p is not representative of a uniform distribution. Typically a small threshold value such as 0.05 or 0.02 is compared to s in order to make the final decision on uniformity. This chapter presents the external interface to the uniformity testing routines as well as their high-level implementation. References: *Note Smith 1984::, pp. 76-77. 2.1 External Interface: `uniformity.h' ====================================== The entire external interface to the uniformity testing routines is in `uniformity.h'. All externally visible identifiers in this file and others are prefixed by unf_ to avoid polluting global namespace. This is the only header file that needs to be included to use this library. Here is the header file's high-level outline: 3. <`uniformity.h' 3> = <*Note BSD License:: 1> #ifndef UNIFORMITY_H #define UNIFORMITY_H #include <*Note Test options:: 6> <*Note High-level uniformity test routine prototype:: 4> <*Note Low-level uniformity test routines prototypes:: 5> #endif /* uniformity.h */ Most of the time, only one function in this file, unf_test(), will be of interest. It performs all the steps needed for a MST-based uniformity test, using a straightforward interface. The function takes a set of points p and returns a value representing how uniformly the points are distributed. It has the following parameters: struct unf_options *options A set of configuration options, which may be set to NULL if no special configuration is desired. const double *P The set of points to test, a total of n * d doubles. The first d doubles make up the first point, the second d doubles make up the second point, and so on. int n The number of points in p. int d The number of dimensions in the points in p. unf_test() returns a number between 0 and 1, where 0 is very non-uniform and 1 is completely uniform. 4. = /* High-level routine. */ double unf_test (const struct unf_options *, const double *, int, int); This code is included in segment *Note 3: uniformityh. 2.1.1 Low-Level Routines ------------------------ Three low-level routines are also provided for use by external programs: unf_inside_hull() Tests whether a provided point is within the approximate convex hull of a set of points. unf_run_mst() Finds the MST of a set of points and calculates statistics based on the results. unf_calc_results() Calculates the significance of test results. These functions will be discussed in more detail and implemented later in this chapter. Here are their prototypes: 5. = /* Low-level routines. */ int unf_inside_hull (const double *, int, int, const double *, double *); int unf_run_mst (struct unf_mem *, struct unf_mst *, int *, int *, const double *, int, int); double unf_calc_results (int, int, int); This code is included in segment *Note 3: uniformityh. 2.1.2 Options ------------- Occasionally, it may be desirable to fine-tune the operation of unf_test(). This possibility is supported by the "hooks" provided by struct unf_options, defined as follows: 6. = /* Options. */ /* A set of options for unf_test(). */ struct unf_options { struct unf_mem *unf_mem; /* Memory allocator. */ struct unf_rng *unf_rng; /* Random number generator. */ struct unf_set *unf_set; /* Point set. */ struct unf_mst *unf_mst; /* Minimum spanning tree. */ }; See also segments *Note 7: Test options-2, *Note 8: Test options-3, *Note 25: Test options-4, *Note 67: Test options-5, *Note 69: Test options-6, and *Note 71: Test options-7. This code is included in segment *Note 3: uniformityh. To use this structure, declare an instance of it, initialize its members to their defaults, then set as desired any members that should not take their default values. This technique provides for the best forward-compatibility should a new version of this library be released with additional options. Two methods are provided to initialize members of struct unf_options to their defaults. At declaration time, macro UNF_DEFAULTS may be used as an initializer. Alternatively, a previously declared option structure may be initialized with unf_init_options(): 7. += /* Default options. */ #define UNF_DEFAULTS {NULL, NULL, NULL, NULL} void unf_init_options (struct unf_options *); See also segments *Note 6: Test options, *Note 8: Test options-3, *Note 25: Test options-4, *Note 67: Test options-5, *Note 69: Test options-6, and *Note 71: Test options-7. This code is included in segment *Note 3: uniformityh. Each of the members of struct unf_options points to an object-oriented "class" structure. These classes are used during uniformity testing for various tasks: memory allocation, random number generation, point selection, and minimum spanning tree calculation. The defaults are generally reasonable, but these hooks are provided to allow for substitution of better alternatives. Later chapters will discuss these classes in detail. For now, it is sufficient to list the implementations provided here: 8. += /* Provided class implementations. */ extern struct unf_mem unf_mem_malloc; extern struct unf_set unf_set_rectangular; extern struct unf_rng unf_rng_mt; extern struct unf_rng unf_rng_system; extern struct unf_mst unf_mst_prim_binary; extern struct unf_mst unf_mst_prim_fibonacci; See also segments *Note 6: Test options, *Note 7: Test options-2, *Note 25: Test options-4, *Note 67: Test options-5, *Note 69: Test options-6, and *Note 71: Test options-7. This code is included in segment *Note 3: uniformityh. 2.2 Implementation: `uniformity.c' ================================== The functions declared in `uniformity.h' are implemented in `uniformity.c', which begins as follows: 9. <`uniformity.c' 9> = <*Note BSD License:: 1> #include #include #include #include #include ``uniformity.h'' See also segments *Note 10: uniformityc-2, *Note 15: uniformityc-3, *Note 16: uniformityc-4, *Note 17: uniformityc-5, *Note 22: uniformityc-6, and *Note 24: uniformityc-7. 2.2.1 High-Level Test Function ------------------------------ Here is the outline of the unf_test() function: 10. <`uniformity.c' 9> += /* Tests for uniformity of p, a set of n points in d dimensions. p is an array of n * d doubles, where the first point is stored in the first d elements, the second in the next d elements, and so on. The contents in user_options, if non-NULL, are used to control the testing procedure. Returns a confidence level between 0 and 1, 1 indicating that the set is perfectly uniform and 0 indicating that the set is perfectly clustered. Returns -1 in the event of a memory allocation error. */ double unf_test (const struct unf_options *user_options, const double *p, int n, int d) { struct unf_options options; /* Local copy of options. */ double *r; /* Stores p plus generated points. */ assert (p != NULL && n > 0 && d > 0); <*Note Set up |options| for uniformity test:: 11> <*Note Allocate memory for uniformity test:: 12> <*Note Generate points for uniformity test:: 13> <*Note Run MST calculate results and return:: 14> } See also segments *Note 9: uniformityc, *Note 15: uniformityc-3, *Note 16: uniformityc-4, *Note 17: uniformityc-5, *Note 22: uniformityc-6, and *Note 24: uniformityc-7. The first step is to set up the user_options argument. If it is NULL, we need to replace it entirely by the defaults. Otherwise, we make a copy and check member by member for nulls: 11. = { static const struct unf_options default_options = { &unf_mem_malloc, &unf_rng_system, &unf_set_rectangular, &unf_mst_prim_binary, }; if (user_options == NULL) options = default_options; else { options = *user_options; if (options.unf_mem == NULL) options.unf_mem = default_options.unf_mem; if (options.unf_rng == NULL) options.unf_rng = default_options.unf_rng; if (options.unf_set == NULL) options.unf_set = default_options.unf_set; if (options.unf_mst == NULL) options.unf_mst = default_options.unf_mst; } } This code is included in segment *Note 10: uniformityc. The second step is to allocate memory, using the memory allocator in options.unf_mem. We need storage for (2 * n) points of d dimensions for storing the points from p and an equal number of generated points, plus space for 2 more such points for temporary storage. In order to save on calls for memory allocation, we consolidate all this temporary storage into a single block and allocate it all at once. 12. = r = options.unf_mem->unf_alloc (sizeof *r * ((2 + n * 2) * d)); if (r == NULL) return -1.0; This code is included in segment *Note 10: uniformityc. The third step is to generate the points in r on which to perform the MST. The first n of these points are copied from p. It would be faster here to avoid the copy, passing instead a pair of arrays to the MST calculation routine, but it would both complicate and slow down the MST routine itself, so such a change would probably not be worthwhile. The remaining n of the points are generated randomly within the approximate convex hull of p. This is done using a rejection procedure: points are picked randomly from a conveniently shaped volume that contains the convex hull (for instance, a hyper-rectangle or hypersphere), then points that do not fall within the approximate convex hull are discarded. The routine for testing whether a point falls within the approximate convex hull of p, unf_inside_hull(), uses the storage allocated earlier for an additional pair of points. Notice that this is one place where being able to use C99 features would be handy. Variable Length Arrays (VLAs) from C99 would allow these additional points to be declared as local variables with no need to pre-allocate and pass them down, improving modularity. 13. = { double *const tmp = r + (n * 2) * d; struct unf_a_set *s; int i; s = options.unf_set->unf_create (options.unf_mem, p, n, d); if (s == NULL) { options.unf_mem->unf_free (r); return -1.0; } memcpy (r, p, sizeof *r * n * d); for (i = 0; i < n; ) { double *y = r + (n + i) * d; options.unf_set->unf_random (s, options.unf_rng, y); if (unf_inside_hull (p, n, d, y, tmp)) i++; } options.unf_set->unf_discard (s); } This code is included in segment *Note 10: uniformityc. Finally, we run the MST and calculate and return the results: 14. = { int c, t; int okay; okay = unf_run_mst (options.unf_mem, options.unf_mst, &c, &t, r, n, d); options.unf_mem->unf_free (r); return okay ? unf_calc_results (c, t, n) : -1; } This code is included in segment *Note 10: uniformityc. 2.2.2 Convex Hull Insideness Testing Function --------------------------------------------- Function unf_inside_hull() determines whether a given point y is within the approximate convex hull of a set of points p, using the algorithm specified. This algorithm first calculates the vector __ \ n_star_est = (1/n) > (p[i] - y) * length(p[i] - y))**(d + 1) /_ i=1...n where, as usual, n is the number of points and d the number of dimensions. This vector is an estimate of n_star, a vector for which the dot product n_star * (p[i] - y) > 0 is true for all i = 1, 2, ..., n, which is true if and only if n_star is within the convex hull of p. The test calculates this dot product for all values of i and reports that y is within p's convex hull only if the result is always positive. The first part of the implementation is a helper function to calculate n_star_est. This function, calc_n_star_est(), iterates over the values of i. For each i, it measures the magnitude of the vector difference p[i] - y and calculates a scalar factor representing the denominator in the equation above. The vector difference, times factor, is added to the running sum for n_star_est. Finally, the 1/n factor is applied, and the function returns. 15. <`uniformity.c' 9> += /* Calculates vector n_star_est at point y within points p. tmp is used for temporary storage. Its contents are destroyed. There are d elements in each of n_star, tmp, and y, and n * d elements in p. */ static void calc_n_star_est (double *n_star_est, double *tmp, const double *p, int n, int d, const double *y) { int i; assert (n_star_est != NULL && tmp != NULL && p != NULL && n > 0 && d > 0 && y != NULL); for (i = 0; i < d; i++) n_star_est[i] = 0.0; for (i = 0; i < n; i++) { double length = 0.0; double factor; int j; for (j = 0; j < d; j++) { double diff = tmp[j] = p[j] - y[j]; length += diff * diff; } factor = 1.0 / pow (length, d + 1.0); for (j = 0; j < d; j++) n_star_est[j] += tmp[j] * factor; p += d; } for (i = 0; i < d; i++) n_star_est[i] /= n; } See also segments *Note 9: uniformityc, *Note 10: uniformityc-2, *Note 16: uniformityc-4, *Note 17: uniformityc-5, *Note 22: uniformityc-6, and *Note 24: uniformityc-7. The unf_inside_hull() function itself is simple. It calls calc_n_star_est() to calculate n_star_est, then calculates the dot product dot of n_star_est with the vector difference p[i] - y for each possible value of i. If dot is positive every time, then y is within the approximate convex hull of p; otherwise, it is not. 16. <`uniformity.c' 9> += /* Returns nonzero only if point y is within the (approximate) convex hull of the n points in array p. Each point has d dimensions, and tmp must point to a modifiable scratch array with room for 2 * d doubles. */ int unf_inside_hull (const double *p, int n, int d, const double *y, double *tmp) { double *n_star_est; int i; assert (p != NULL && n > 0 && d > 0 && y != NULL && tmp != NULL); n_star_est = tmp + d; calc_n_star_est (n_star_est, tmp, p, n, d, y); for (i = 0; i < n; i++) { double dot; int j; dot = 0.0; for (j = 0; j < d; j++) dot += (*p++ - y[j]) * n_star_est[j]; if (dot <= 0.0) return 1; } return 0; } See also segments *Note 9: uniformityc, *Note 10: uniformityc-2, *Note 15: uniformityc-3, *Note 17: uniformityc-5, *Note 22: uniformityc-6, and *Note 24: uniformityc-7. References: *Note Smith 1984::, pp. 75-76. 2.2.3 MST Test Run Function --------------------------- The unf_run_mst() function actually runs the MST of the pooled original and generated points, then generates the test statistics c and t based on the results. It works like this: 17. <`uniformity.c' 9> += /* Runs the MST-based test for uniformity on the 2 * n points in r, of d dimensions each. The first n of the points should be the points to test, the last n the randomly generated points within the approximate convex hull of those points. *c and *t receive on output the C and T statistics. Returns nonzero if successful, zero if a memory allocation error occurred. */ int unf_run_mst (struct unf_mem *mem_class, struct unf_mst *mst_class, int *c, int *t, const double *r, int n, int d) { unf_mst_result *mst; int i; assert (mem_class != NULL && mst_class != NULL && c != NULL && t != NULL && r != NULL && n > 0 && d > 0); <*Note Find MST of pooled points:: 18> <*Note Calculate T statistic:: 19> <*Note Calculate C statistic:: 20> <*Note Free memory and return:: 21> } See also segments *Note 9: uniformityc, *Note 10: uniformityc-2, *Note 15: uniformityc-3, *Note 16: uniformityc-4, *Note 22: uniformityc-6, and *Note 24: uniformityc-7. We delegate the computation of the MST to a hook function. The function takes a memory allocator class and the set of points as its arguments. It returns a pointer to 2 * n - 1 pairs of ints, where each pair designates the indexes of two elements of r connected within the MST. 18. = mst = mst_class->unf_run (mem_class, r, 2 * n, d); if (mst == NULL) return 0; This code is included in segment *Note 17: uniformityc. The T statistic is the number of edges in the MST that join a point in the original set of points p with a point in the set of randomly generated points. The original points are the first n in r, the generated ones are the remainder, so this is simple. 19. = *t = 0; for (i = 0; i < 2 * n - 1; i++) if ((mst[i][0] < n) != (mst[i][1] < n)) (*t)++; This code is included in segment *Note 17: uniformityc. The C statistic is the number of pairs of edges in the MST that share a node in common. 20. = *c = 0; for (i = 0; i < 2 * n - 1; i++) { int j; for (j = i + 1; j < 2 * n - 1; j++) if (mst[i][0] == mst[j][0] || mst[i][0] == mst[j][1] || mst[i][1] == mst[j][0] || mst[i][1] == mst[j][1]) (*c)++; } This code is included in segment *Note 17: uniformityc. Finally, we finish up by freeing the memory allocated for the MST and returning a successful value. 21. = mem_class->unf_free (mst); return 1; This code is included in segment *Note 17: uniformityc. 2.2.4 Results Calculation Function ---------------------------------- The final step in uniformity testing is to analyze the results based on the MST from the previous step, using c and t computed there. Qualitatively, if the test data are uniformly distributed, then we expect there to be about as many edges joining points within the two groups of points (the test set and the generated set) as edges between the two groups. But if the test data are clustered, then we expect most edges to join points within a group, with a few edges joining the two groups. This notion is quantifiable. The expected or mean value E[t] of t, given that the data are uniformly distributed, is n, the number of test points, with variance _ _ n ! c - 2n + 2 ! Var[t!c] = ------ ! n - 1 - ---------- ! , 2n - 1 !_ 2n - 3 _! and the distribution of t conditioned on c is asymptotically normal in n. We treat t as a normal variate and convert it to a z-score using the formula z = (t - E[t]) / sqrt(Var[t!c]) The significance s is then found using the cumulative distribution function P(z) of the standard normal distribution Z(z), / z s = P(z) = ! Z(x) dx, Z(z) = 1 / (2 * pi) * exp(-0.5 * x**2). / -oo The significance is returned to the caller, which can then apply its own criteria to make the final decision about uniformity. Typically, the caller classifies as uniform samples with s > s_0 and as non-uniform samples with s <= s_0, where s_0 is a small threshold value such as 0.02 or 0.05. The implementation is straightforward. To find the results, calculate the mean and variance of t conditioned on c, then report the significance of the resulting z-score: 22. <`uniformity.c' 9> += <*Note Significance of a normal variate:: 23> /* Returns uniformity significance level between 0 and 1. c and t are the values returned by unf_run_mst(), n the size of the test set. */ double unf_calc_results (int c, int t, int n) { double mean = n; double var = ((n / (2.0 * n - 1.0)) * (n - 1.0 - (c - 2.0 * n + 2.0) / (2.0 * n - 3.0))); double z = (t - mean) / sqrt (var); return normal_sig (z); } See also segments *Note 9: uniformityc, *Note 10: uniformityc-2, *Note 15: uniformityc-3, *Note 16: uniformityc-4, *Note 17: uniformityc-5, and *Note 24: uniformityc-7. The code above makes use of a routine normal_sig() to compute the significance. The significance P(z) of a standard normal variate z >= 0 can be calculated using the formula P(z) = 1 - Z(z)[(a_1)t + (a_2)t**2 + (a_3)t**3] + epsilon(x), t = 1/(1 + px) where a_1, a_2, and a_3 are constants and e(x) < .00001 is an error term. For z < 0, we can use the identity P(x) = 1 - P(-x). The implementation in C is as follows: 23. = /* Returns the significance of normal variate x. */ static double normal_sig (double x) { const double a1 = 0.4361836; const double a2 = -0.1201676; const double a3 = 0.9372980; const double p = 0.33267; const double pi = 3.14159265358979323846; double y = fabs (x); double t = 1.0 / (1.0 + p * y); double t2 = t * t; double t3 = t2 * t; double z = 1.0 / sqrt (2.0 * pi) / exp (0.5 * y * y); double s = z * (a1 * t + a2 * t2 + a3 * t3); return x >= 0 ? 1.0 - s : s; } This code is included in segment *Note 22: uniformityc. References: *Note Smith 1984::, pp. 76; *Note Abramowitz 1972::, pp. 931-932; *Note Zwillinger 1996::, pp. 583-584. 2.2.5 Options Initialization Function ------------------------------------- 24. <`uniformity.c' 9> += /* Initializes options to the defaults, as an alternative to using unf_defaults as an initializer. */ void unf_init_options (struct unf_options *options) { assert (options != NULL); options->unf_mem = NULL; options->unf_rng = NULL; options->unf_set = NULL; options->unf_mst = NULL; } See also segments *Note 9: uniformityc, *Note 10: uniformityc-2, *Note 15: uniformityc-3, *Note 16: uniformityc-4, *Note 17: uniformityc-5, and *Note 22: uniformityc-6. 3 Minimum Spanning Tree Computation *********************************** Computing the MST of the set of pooled points is the most interesting part of the implementation of the uniformity test. It is also the part that offers the most possibilities. Several methods for computing a minimum spanning tree are suitable. The following table lists the ones considered, along with their asymptotic time requirements for a complete graph with n vertices: Method Best/Avg. Time Worst Time Kruskal's algorithm O(n**2 lg n) O(n**2 lg n) Prim's algorithm with binary O(n**2 lg n) O(n**2 lg n) heap Prim's algorithm with Fibonacci O(n**2) O(n**2) heap Bentley's single-fragment O(n lg n) O(n**2 lg n) algorithm Bentley's multifragment O(n lg n) (unknown) algorithm Each of these algorithms has its own advantages and disadvantages in this context. Kruskal's algorithm is simple to implement, but slow. Prim's algorithm requires a priority queue. If a binary heap is used for the priority queue, then it has the same asymptotic speed as Kruskal's algorithm. Substituting a Fibonacci heap speeds up processing considerably. Bentley's single-fragment algorithm is faster than any other algorithm in the average case, but in the worst case it degrades to worse than Prim's using a Fibonacci heap. Unfortunately, this worst case occurs when the points are clustered, one of the situations of interest to us. Bentley's multifragment algorithm may remedy this problem, but its worst-case time bound is in fact unknown. Prim's algorithm was chosen for implementation because it has the best worst-case behavior of any of the algorithms. Both binary heap and Fibonacci heap variants are implemented. The object-oriented architecture used allows different algorithms, such as Bentley's, to be substituted later or even chosen dynamically at runtime. References: *Note Fredman 1987::; *Note Cormen 1990::, chapters 21 and 24; *Note Bentley 1978::. 3.1 Class Structure =================== The "class" structure that allows alternative MST implementations to be used is struct unf_mst. Its only member is pointer to function unf_run. This function takes as arguments a memory allocator class, a pointer to a set of points, and counts of the points and the number of dimensions that they contain. It returns a pointer to pairs of vertex indexes indicating the edges in the MST. 25. += typedef int unf_mst_result[2]; struct unf_mst { unf_mst_result *(*unf_run) (struct unf_mem *, const double *, int, int); }; See also segments *Note 6: Test options, *Note 7: Test options-2, *Note 8: Test options-3, *Note 67: Test options-5, *Note 69: Test options-6, and *Note 71: Test options-7. This code is included in segment *Note 3: uniformityh. The Prim's algorithm MST implemented here uses an additional class structure struct pq_class, which represents a particular type of priority queue. This structure will be presented later, along with the priority queue implementations themselves. 3.2 Prim's Algorithm ==================== Prim's algorithm for computing a minimum spanning tree is implemented by `mst-prim.c'. It begins as follows: 26. <`mst-prim.c' 26> = <*Note BSD License:: 1> #include #include #include #include ``pq.h'' #include ``uniformity.h'' See also segments *Note 27: mst-primc-2, *Note 28: mst-primc-3, and *Note 33: mst-primc-4. The first piece of code is a helper function to calculate the Euclidean distance between two points: 27. <`mst-prim.c' 26> += /* Calculates and returns the distance between points a and b, each in d dimensions. */ static double calc_distance (const double *a, const double *b, int d) { double sum = 0.0; assert (a != NULL && b != NULL && d >= 0); while (d--) { double diff = *a++ - *b++; sum += diff * diff; } return sum; } See also segments *Note 26: mst-primc, *Note 28: mst-primc-3, and *Note 33: mst-primc-4. The MST function itself is next. Its outline looks like this: 28. <`mst-prim.c' 26> += /* Calculates the MST of the n d-dimensional points at p. Uses mem for memory allocation and an instance of pq_class as a priority queue. */ static unf_mst_result * calc_mst (struct pq_class *pq_class, struct unf_mem *mem, const double *p, int n, int d) { struct pq *pq; unf_mst_result *mst; int *edges; assert (pq_class != NULL && mem != NULL && p != NULL && n > 0 && d > 0); <*Note Memory allocation for Prim MST:: 29> <*Note Execute Prim MST:: 30> <*Note Record results of Prim MST:: 31> <*Note Clean up and return from Prim MST:: 32> } See also segments *Note 26: mst-primc, *Note 27: mst-primc-2, and *Note 33: mst-primc-4. The first step is memory allocation. We need to create a priority queue, temporary space used for storing edges, and space in which to return the result to the user: 29. = pq = pq_class->create (mem, n); mst = mem->unf_alloc (sizeof *mst * (n - 1)); edges = mem->unf_alloc (sizeof *edges * n); if (pq == NULL || mst == NULL || edges == NULL) { if (pq != NULL) pq_class->discard (pq); mem->unf_free (mst); mem->unf_free (edges); return NULL; } This code is included in segment *Note 28: mst-primc. The actual computation of the MST is the second step. When this step is complete, array edges[] implicitly stores the minimum spanning tree: for each i in 1...n, there is an edge between vertices i and edges[i]. The first element, edges[0], is not used. 30. = pq_class->decrease_key (pq, 0, 0); while (pq_class->count (pq) > 0) { int u, v; u = pq_class->extract_min (pq); for (v = 0; v < n; v++) if (u != v) { double key = pq_class->key (pq, v); if (key != -DBL_MAX) { double cost = calc_distance (p + d * u, p + d * v, d); if (cost < key) { edges[v] = u; pq_class->decrease_key (pq, v, cost); } } } } This code is included in segment *Note 28: mst-primc. After executing the MST, we translate the edges[] array into the form expected by the caller. 31. = { int i; for (i = 1; i < n; i++) { mst[i - 1][0] = i; mst[i - 1][1] = edges[i]; } } This code is included in segment *Note 28: mst-primc. Finally, we free temporary data and return to the caller. 32. = pq_class->discard (pq); mem->unf_free (edges); return mst; This code is included in segment *Note 28: mst-primc. References: *Note Cormen 1990::, section 24.2. 3.3 Class Implementations ========================= The final task is to declare class structures for the two MSTs defined here. This requires a wrapper function for each MST, because calc_mst() requires one more argument than does struct unf_mst's unf_run member. In all, we have the following: 33. <`mst-prim.c' 26> += /* Prim's Algorithm MST with binary heap priority queue. */ static unf_mst_result * calc_mst_prim_bin_heap (struct unf_mem *mem, const double *p, int n, int d) { extern struct pq_class unf_pq_bin_heap; return calc_mst (&unf_pq_bin_heap, mem, p, n, d); } struct unf_mst unf_mst_prim_binary = { calc_mst_prim_bin_heap, }; /* Prim's Algorithm MST with Fibonacci heap priority queue. */ static unf_mst_result * calc_mst_prim_fib_heap (struct unf_mem *mem, const double *p, int n, int d) { extern struct pq_class unf_pq_fib_heap; return calc_mst (&unf_pq_fib_heap, mem, p, n, d); } struct unf_mst unf_mst_prim_fibonacci = { calc_mst_prim_fib_heap, }; See also segments *Note 26: mst-primc, *Note 27: mst-primc-2, and *Note 28: mst-primc-3. 4 Priority Queues ***************** Prim's algorithm for computing a minimum spanning tree requires the use of a priority queue to determine the order of insertion of edges into the tree. This chapter presents two different priority queues for this purpose. The first is based on simple binary heaps, the second on more complicated and asymptotically faster Fibonacci heaps. 4.1 Class Structure =================== In order to make it easy to switch among implementatations, the priority queue is defined in <*Note pqh:: 34> as a "class" structure of function pointers. Notice that because <*Note pqh:: 34> is not intended to be included by programs using the uniformity-testing library, only by <*Note mst-primc:: 26>, it does not restrict itself to the unf_ namespace: 34. <`pq.h' 34> = <*Note BSD License:: 1> #ifndef PQ_H #define PQ_H 1 struct unf_mem; struct pq_class { struct pq *(*create) (struct unf_mem *mem, int n); void (*discard) (struct pq *pq); int (*extract_min) (struct pq *pq); void (*decrease_key) (struct pq *pq, int vertex, double key); int (*count) (const struct pq *pq); double (*key) (const struct pq *pq, int vertex); void (*verify) (const struct pq *pq); void (*dump) (const struct pq *pq); }; #endif /* pq.h */ The abstract model behind struct pq_class is a first-in-smallest-out priority queue. The items in the queue have two parts: an integer "vertex" number between 0 and n - 1, where n is the maximum capacity of the queue, and a "key" stored as a floating-point number. The items are ordered based on the key field. An instance of struct pq_class must implement these abstract operations: struct pq *create (struct unf_mem *mem, int n); Creates and returns a new priority queue containing n items initially, all of which initially have key +oo. Any needed memory allocation for the priority queue must be done with the functions in mem. Returns a null pointer if the priority queue cannot be created. void discard (struct pq *pq); Destroys priority queue pq and frees any associated memory. int extract_min (struct pq *pq); Deletes the item within pq having minimum key and returns its vertex. If pq is empty, behavior is undefined. void decrease_key (struct pq *pq, int vertex, double key); Finds the item in pq with vertex vertex and changes its key to key. Behavior is undefined if pq does not contain an item with vertex vertex or if key is greater than that item's current key. int count (const struct pq *pq); Returns the number of items remaining in pq. This will always be less than or equal to n passed to create(). double key (const struct pq *pq, int vertex); Returns the key of the item in pq with vertex vertex. Returns -DBL_MAX if there is no vertex vertex in pq. Behavior is undefined if vertex is not in the valid range. void verify (const struct pq *pq); (For debugging purposes.) Verifies that pq's state is internally consistent. void dump (const struct pq *pq); (For debugging purposes.) Prints a human-readable representation of pq's internal state to stdout. 4.2 Binary Heap =============== File <*Note pq-bin-heapc:: 35> implements binary heaps. It begins like this: 35. <`pq-bin-heap.c' 35> = <*Note BSD License:: 1> #include #include #include #include ``pq.h'' #include ``uniformity.h'' See also segments *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. The class of binary heaps is a subset of the class of binary trees. A binary tree is a binary heap if: 1. For every node except the root, the key value in its parent node is less or equal than its own key value (the "heap property"); _and_ 2. The binary tree is "nearly complete," meaning that every level in the tree is filled, except possibly for the bottommost, which must be filled in left-to-right order without gaps. It is not necessary, but here we will also assume that key values in heap nodes are distinct. Here are two examples of heaps, with the key values shown inside the node circles: 2 _.-' \ 1 4 5 / `_ _.' `_ ^ 2 6 7 11 6 8 ^ / / \ / 3 5 7 9 10 12 A heap can be conveniently represented as an array, with no need for pointers, by writing down its elements from left to right, top to bottom. The two heaps pictured above would have the following representations as arrays: (2, 4, 5, 7, 11, 6, 8, 9, 10, 12) (1, 2, 6, 3, 5, 7) The first element of the array, at index 1, is the heap's root. To get from the i'th element to its left child, simply multiply i by 2; to get to its right child, multiply by 2 and add 1; to get to its parent, divide by 2 and discard any remainder. If the result is between 1 and the number of nodes in the heap, then it is the index of the desired node; otherwise, there is no such node in the heap. We can easily implement these "movement" operations in C: 36. <`pq-bin-heap.c' 35> += /* Heap movement operations. */ #define parent(i) ((i) / 2) /* Get node i's parent. */ #define left(i) ((i) * 2) /* Get node i's left child. */ #define right(i) (left (i) + 1) /* Get node i's right child. */ See also segments *Note 35: pq-bin-heapc, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. Our binary heap priority queue structure based on these ideas looks like this: 37. <`pq-bin-heap.c' 35> += /* Binary heap priority queue. */ struct pq { struct unf_mem *mem; /* Memory allocator. */ int n; /* Number of items in tree. */ int m; /* Maximum number of items in tree. */ /* Heap contents. */ double *key; /* Keys. */ int *vertex; /* Satellite information (vertex number). */ /* Reverse index. */ int *index; /* Heap index by vertex number; 0 = not present. */ }; See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. Most of this structure should be self-explanatory. Member key points to the heap itself; vertex is a parallel array that contains the companion vertex values. Both of these arrays have an extra unused element at the beginning, which allows for 1-based indexing and simplifies the movement macros above.(1) Member index is used to store the indexes into key and vertex that contain particular vertex values. This array is necessary for fast lookup of particular vertices, since there is no quick way to search a heap. The following sections implement the abstract operations from struct pq_class on binary heaps. References: *Note Cormen 1990::, section 7.1; *Note Knuth 1973::, section 5.2.3. ---------- Footnotes ---------- (1) On many architectures some pointer trickery could be used instead, but this is technically undefined according to *Note ISO 1990::, section 6.3.6. See *Note Summit 1999::, question 6.17, for more information. 4.2.1 Creation -------------- Creation and initialization of the heap is straightforward: 38. <`pq-bin-heap.c' 35> += static void pq_discard (struct pq *pq); static struct pq * pq_create (struct unf_mem *mem, int n) { struct pq *pq; int i; /* Allocate memory for heap itself. */ assert (n >= 0); pq = mem->unf_alloc (sizeof *pq); if (pq == NULL) return NULL; /* Allocate memory for heap's members. */ pq->mem = mem; pq->key = mem->unf_alloc (sizeof *pq->key * (n + 1)); pq->vertex = mem->unf_alloc (sizeof *pq->vertex * (n + 1)); pq->index = mem->unf_alloc (sizeof *pq->index * (n + 1)); if (pq->key == NULL || pq->vertex == NULL || pq->index == NULL) { pq_discard (pq); return NULL; } /* Initialize heap. */ pq->n = pq->m = n; for (i = 1; i <= n; i++) { pq->key[i] = DBL_MAX; pq->vertex[i] = i; pq->index[i] = i; } return pq; } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. 4.2.2 Destruction ----------------- Destruction simply discards the memory that was allocated at creation time: 39. <`pq-bin-heap.c' 35> += static void pq_discard (struct pq *pq) { assert (pq != NULL); pq->mem->unf_free (pq->key); pq->mem->unf_free (pq->vertex); pq->mem->unf_free (pq->index); pq->mem->unf_free (pq); } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. 4.2.3 Minimum Value Extraction ------------------------------ Extracting the minimum value from a binary heap can be divided into three steps. 40. <`pq-bin-heap.c' 35> += static int pq_extract_min (struct pq *pq) { int min; assert (pq != NULL && pq->n > 0); <*Note Save minimum value of binary heap:: 41> <*Note Delete root from binary heap:: 42> <*Note Restore binary heap property:: 43> return min - 1; } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. The minimum value in a binary heap is the root, so finding the minimum value is easy: 41. = min = pq->vertex[1]; This code is included in segment *Note 40: pq-bin-heapc. The interesting part is deleting the root node and rearranging the resultant binary tree so that it is still a heap. The successful approach is to move the last (n'th) node in the heap into the root position, then move down the tree swapping nodes until the heap property is satisfied. Here's an example of the effects of heap deletion, with the original heap at the left and the heap after at the right and the intermediate steps in between: 1 7 2 2 / `_ / \ / \ / \ 2 6 => 2 6 => 7 6 => 3 6 ^ / ^ ^ ^ 3 5 7 3 5 3 5 7 5 We start by deleting the minimum node from the reverse index, moving the last node to the root, and decrementing the tree's node count: 42. = assert (min >= 1 && min <= pq->m); pq->index[min] = 0; pq->vertex[1] = pq->vertex[pq->n]; pq->key[1] = pq->key[pq->n]; pq->index[pq->vertex[1]] = 1; pq->n--; This code is included in segment *Note 40: pq-bin-heapc. To restore the heap property, we start from the root and move down the tree, swapping the current node with the smallest of its children. If the current node is smaller than its children, we can stop: 43. = { int i = 1; for (;;) { /* Find index of smallest of i or its children as smallest. */ int l = left (i); int r = right (i); int smallest = i; if (l <= pq->n && pq->key[l] < pq->key[smallest]) smallest = l; if (r <= pq->n && pq->key[r] < pq->key[smallest]) smallest = r; if (smallest == i) break; /* Swap nodes i and smallest. */ { int t_vertex = pq->vertex[i]; double t_key = pq->key[i]; pq->vertex[i] = pq->vertex[smallest]; pq->key[i] = pq->key[smallest]; pq->vertex[smallest] = t_vertex; pq->key[smallest] = t_key; pq->index[pq->vertex[i]] = i; pq->index[pq->vertex[smallest]] = smallest; } i = smallest; } } This code is included in segment *Note 40: pq-bin-heapc. References: *Note Cormen 1990::, section 7.5; *Note Knuth 1973::, section 5.2.3. 4.2.4 Decreasing Keys --------------------- To decrease the key value of a heap item, we cause it to "bubble up" the tree until its parent has a smaller value than it does or it reaches the root. We can always do this because we replace smaller values by larger values, so the proper relationship between the keys replaced and their children's keys is maintained. In the example below, the key of the node labeled 5 is decreased to 0: 1 1 1 0 / `_ / `_ / `_ / `_ 2 6 => 2 6 => 1 6 => 1 6 ^ / ^ / ^ / ^ / 3 5 7 3 2 7 3 2 7 3 2 7 No further discussion should be necessary: 44. <`pq-bin-heap.c' 35> += static void pq_decrease_key (struct pq *pq, int vertex, double key) { int i, p; assert (pq != NULL); assert (vertex >= 0 && vertex < pq->m); assert (pq->index[vertex + 1] != 0); assert (key <= pq->key[pq->index[vertex + 1]]); for (i = pq->index[vertex + 1]; i > 1; i = p) { p = parent (i); if (pq->key[p] < key) break; pq->key[i] = pq->key[p]; pq->vertex[i] = pq->vertex[p]; pq->index[pq->vertex[i]] = i; } pq->key[i] = key; pq->vertex[i] = vertex + 1; pq->index[pq->vertex[i]] = i; } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. References: *Note Cormen 1990::, section 7.5, exercise 4. 4.2.5 Count ----------- The count function just returns struct pq's n value: 45. <`pq-bin-heap.c' 35> += static int pq_count (const struct pq *pq) { assert (pq != NULL); return pq->n; } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. 4.2.6 Key Lookup ---------------- The index array is designed for key lookup, making this function easy: 46. <`pq-bin-heap.c' 35> += static double pq_key (const struct pq *pq, int vertex) { int i; assert (pq != NULL && vertex >= 0 && vertex < pq->m); i = pq->index[vertex + 1]; return i != 0 ? pq->key[i] : -DBL_MAX; } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 47: pq-bin-heapc-10, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. 4.2.7 Verification ------------------ The verification function checks that the heap property is satisfied and that the index array is properly set up: 47. <`pq-bin-heap.c' 35> += static void pq_verify (const struct pq *pq) { int i; int error = 0; assert (pq != NULL); assert (pq->m > 0); assert (pq->n <= pq->m); for (i = 2; i <= pq->n; i++) { int parent = parent (i); if (pq->key[parent] > pq->key[i]) { printf ("Heap property for %d:%d violated: %g > %g\n", parent, i, pq->key[parent], pq->key[i]); error = 1; } } for (i = 1; i <= pq->n; i++) if (pq->index[i] != 0 && pq->vertex[pq->index[i]] != i) { printf ("index for %d points to position %d, which contains %d\n", i, pq->index[i], pq->vertex[pq->index[i]]); error = 1; } assert (error == 0); } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 48: pq-bin-heapc-11, and *Note 49: pq-bin-heapc-12. 4.2.8 Dumping ------------- The dump function prints out an array representation of the binary heap to stdout. Vertex numbers and (finite) keys are printed separated by colons. 48. <`pq-bin-heap.c' 35> += static void pq_dump (const struct pq *pq) { int i; for (i = 1; i <= pq->n; i++) { printf ("%d", pq->vertex[i]); if (pq->key[i] != DBL_MAX) printf (":%g", pq->key[i]); putchar (' '); } putchar ('\n'); } See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, and *Note 49: pq-bin-heapc-12. 4.2.9 Class Implementation -------------------------- The class implementation of the binary heap is the final declaration in the file: 49. <`pq-bin-heap.c' 35> += struct pq_class unf_pq_bin_heap = { pq_create, pq_discard, pq_extract_min, pq_decrease_key, pq_count, pq_key, pq_verify, pq_dump, }; See also segments *Note 35: pq-bin-heapc, *Note 36: pq-bin-heapc-2, *Note 37: pq-bin-heapc-3, *Note 38: pq-bin-heapc-4, *Note 39: pq-bin-heapc-5, *Note 40: pq-bin-heapc-6, *Note 44: pq-bin-heapc-7, *Note 45: pq-bin-heapc-8, *Note 46: pq-bin-heapc-9, *Note 47: pq-bin-heapc-10, and *Note 48: pq-bin-heapc-11. 4.3 Fibonacci Heap ================== File <*Note pq-fib-heapc:: 50> implements Fibonacci heaps. It begins this way: 50. <`pq-fib-heap.c' 50> = <*Note BSD License:: 1> #include #include #include #include ``pq.h'' #include ``uniformity.h'' See also segments *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. A Fibonacci heap is a complicated data structure, quite a bit more involved to explain than a simple binary heap. As a result, only the implementation will be presented here. For a complete description of the principles behind a Fibonacci heap, consult one of the references. The Fibonacci heap priority queue structure is defined this way: 51. <`pq-fib-heap.c' 50> += /* Fibonacci heap priority queue. */ struct pq { int n; /* Number of items in heap. */ int m; /* Maximum number of items in heap. */ struct unf_mem *mem; /* Memory allocator. */ struct node *min; /* Minimum node. */ struct node **index; /* Index by vertex number to get node. */ struct node **rank; /* Used during extract_min(). */ struct node *prealloc; /* Memory allocated for nodes. */ }; See also segments *Note 50: pq-fib-heapc, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. struct node has this form: 52. <`pq-fib-heap.c' 50> += /* Fibonacci heap node. */ struct node { int vertex; /* Vertex value. */ double key; /* Key value. */ int rank; /* Number of direct children. */ int mark; /* 1=marked, 0=unmarked. */ struct node *parent; /* Parent node, NULL if a root. */ struct node *child; /* One of our children, NULL if terminal. */ struct node *next, *prev; /* Next and previous in circular list. */ }; See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. The last bit before the code itself is a set of prototypes for internal functions: 53. <`pq-fib-heap.c' 50> += static void insert (struct pq *pq, int vertex, double key); static void detach (struct node *node); static void add_root (struct pq *pq, struct node *root); static void add_child (struct node *parent, struct node *child); See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. References: *Note Cormen 1990::, chapter 21; *Note Fredman 1987::; *Note Boyer 1997::. 4.3.1 Creation -------------- Creating a Fibonacci tree is a relatively simple matter of allocating memory and initializing all the fields of struct pq properly, then inserting all the nodes with initial keys of +oo: 54. <`pq-fib-heap.c' 50> += static struct pq * pq_create (struct unf_mem *mem, int n) { struct pq *pq; int i; assert (n >= 0); pq = mem->unf_alloc (sizeof *pq); if (pq == NULL) return NULL; pq->mem = mem; pq->n = 0; pq->m = n; pq->min = NULL; pq->index = pq->mem->unf_alloc (sizeof *pq->index * n); pq->rank = pq->mem->unf_alloc (sizeof *pq->rank * n); pq->prealloc = pq->mem->unf_alloc (sizeof *pq->prealloc * n); if (pq->index == NULL || pq->prealloc == NULL) { mem->unf_free (pq); return NULL; } for (i = 0; i < n; i++) { pq->index[i] = NULL; pq->rank[i] = NULL; } for (i = 0; i < n; i++) insert (pq, i, DBL_MAX); return pq; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. The insert() helper function for pq_create() inserts the specified vertex/key pair into the root list, using the (n + 1)th element of prealloc to do it: 55. <`pq-fib-heap.c' 50> += static void insert (struct pq *pq, int vertex, double key) { struct node *node; assert (pq != NULL); assert (vertex >= 0 && vertex < pq->m); assert (pq->index[vertex] == NULL); pq->index[vertex] = node = pq->prealloc + pq->n++; node->vertex = vertex; node->key = key; node->rank = 1; node->mark = 0; node->parent = node->child = NULL; add_root (pq, node); if (node->key < pq->min->key) pq->min = node; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. 4.3.2 Destruction ----------------- Destroying a Fibonacci heap is easy: 56. <`pq-fib-heap.c' 50> += static void pq_discard (struct pq *pq) { assert (pq != NULL); pq->mem->unf_free (pq->index); pq->mem->unf_free (pq->prealloc); pq->mem->unf_free (pq->rank); pq->mem->unf_free (pq); } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. Notice that because all of the nodes were allocated at once from the prealloc array, there is no need to free each of them separately, nor would it be correct to do so. 4.3.3 Minimum Value Extraction ------------------------------ Extracting the minimum value from a Fibonacci heap is complicated. Consult one of the references for details. Armed with one of those, the comments in the code below should be sufficient explanation: 57. <`pq-fib-heap.c' 50> += static int pq_extract_min (struct pq *pq) { struct node *min; assert (pq != NULL); assert (pq->n > 0); /* Grab the minimum value and remove it from the heap. */ min = pq->min; pq->index[min->vertex] = NULL; pq->n--; pq->min = min->next; if (pq->min == min) pq->min = NULL; detach (min); /* Add all of the children of the former minimum as roots. */ for (;;) { struct node *child = min->child; if (child == NULL) break; detach (child); add_root (pq, child); } /* If the heap is now empty, return early. */ if (pq->min == NULL) return min->vertex; /* Go through all the root nodes and put them into rank[], linking together the ones with equal rank fields. */ { struct node *w = pq->min; do { struct node *x = w; w = w->next; if (w == pq->min) w = NULL; for (;;) { struct node *y = pq->rank[x->rank]; if (y == NULL || y == x) break; pq->rank[x->rank] = NULL; if (x->key > y->key) { struct node *t = x; x = y; y = t; } if (pq->min == y) pq->min = y->next; detach (y); add_child (x, y); y->mark = 0; } pq->rank[x->rank] = x; } while (w != NULL); } /* Go through the root nodes and unset the corresponding rank[] elements. Pick the next pq->min while we're at it. */ { struct node *x; x = pq->min; for (;;) { pq->rank[x->rank] = NULL; x = x->next; if (x == pq->min) break; if (x->key < pq->min->key) pq->min = x; } } return min->vertex; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. Function pq_extract_min() makes use of three helper functions. The first of these is detach(), which removes takes a specified node out of its parent's list of children: 58. <`pq-fib-heap.c' 50> += /* Removes node from the heap, making it parentless. */ static void detach (struct node *node) { assert (node->next != NULL && node->prev != NULL); node->next->prev = node->prev; node->prev->next = node->next; if (node->parent != NULL) { if (node->parent->child == node) { node->parent->child = node->next; if (node->parent->child == node) node->parent->child = NULL; } node->parent->rank--; node->parent = NULL; } #ifndef NDEBUG node->next = node->prev = NULL; #endif } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. The first of these adds a specified node to the priority queue's root list. This node must be parentless, meaning that it should earlier have been detached with detach(): 59. <`pq-fib-heap.c' 50> += /* Adds root as a root of pq. pq->min is changed only if it is NULL, so the caller is responsible for updating it if it should decrease. */ static void add_root (struct pq *pq, struct node *root) { assert (root->parent == NULL); if (pq->min == NULL) { pq->min = root; root->prev = root->next = root; } else { root->prev = pq->min; root->next = pq->min->next; pq->min->next->prev = root; pq->min->next = root; } } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. The second helper function adds a node to an arbitrary parent node's child list. Again, child must be parentless: 60. <`pq-fib-heap.c' 50> += /* Adds child as a child of parent. */ static void add_child (struct node *parent, struct node *child) { assert (parent != NULL && child != NULL); assert (child->parent == NULL); child->parent = parent; if (parent->child == NULL) { child->next = child->prev = child; parent->child = child; } else { child->prev = parent->child; child->next = parent->child->next; parent->child->next->prev = child; parent->child->next = child; } parent->rank++; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. 4.3.4 Decreasing Keys --------------------- The function for decreasing keys in a Fibonacci tree is not as complicated as for extracting the minimum key, but it is still nontrivial, mostly due to the need for "cascading cuts" in some circumstances. Again, consult the references for full explanation: 61. <`pq-fib-heap.c' 50> += static void pq_decrease_key (struct pq *pq, int vertex, double key) { struct node *x, *y; assert (pq != NULL); assert (vertex >= 0 && vertex < pq->m); assert (pq->index[vertex] != NULL); assert (key <= pq->index[vertex]->key); /* Find the node in question. */ x = pq->index[vertex]; x->key = key; y = x->parent; /* Make cascading cuts as necessary. */ if (y != NULL && x->key < y->key) for (;;) { struct node *z; assert (x->parent != NULL); assert (x->parent == y); assert (pq->min != x); detach (x); add_root (pq, x); x->mark = 0; z = y->parent; if (z == NULL) break; if (y->mark == 0) { y->mark = 1; break; } else { x = y; y = z; } } /* Actually decrease the key's value. */ if (key < pq->min->key) pq->min = pq->index[vertex]; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. 4.3.5 Count ----------- The count is stored in struct pq, just as for the binary heap implementation: 62. <`pq-fib-heap.c' 50> += static int pq_count (const struct pq *pq) { assert (pq != NULL); return pq->n; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. 4.3.6 Key Lookup ---------------- Looking up a key is just a matter of consulting the index: 63. <`pq-fib-heap.c' 50> += static double pq_key (const struct pq *pq, int vertex) { struct node *node; assert (pq != NULL); assert (vertex >= 0 && vertex < pq->m); node = pq->index[vertex]; return node != NULL ? node->key : -DBL_MAX; } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 64: pq-fib-heapc-15, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. 4.3.7 Verification ------------------ Extensive verification of the Fibonacci heap's structure can be done: 64. <`pq-fib-heap.c' 50> += static void verify_recursive (const struct pq *pq, const struct node *first, const struct node *parent, const struct node **tmp) { const struct node *iter; int count; if (first == NULL) return; /* Check that the forward links are circular. */ iter = first; count = 0; do { assert (iter->parent == parent); assert (count < pq->m); tmp[count++] = iter; iter = iter->next; } while (iter != first); /* Check that the backward links are circular, and that they are the same ones as when we go forward. */ iter = first; do { iter = iter->prev; assert (count > 0); count--; assert (tmp[count] == iter); } while (iter != first); assert (count == 0); /* Perform verification recursively on child nodes. */ iter = first; do { verify_recursive (pq, iter->child, iter, tmp); iter = iter->next; } while (iter != first); } static void pq_verify (const struct pq *pq) { const struct node **tmp; assert (pq != NULL); assert (pq->n >= 0 && pq->n <= pq->m); tmp = pq->mem->unf_alloc (sizeof *tmp * pq->n); if (pq->n != 0 && tmp == NULL) { printf ("virtual memory exhausted\n"); abort (); } verify_recursive (pq, pq->min, NULL, tmp); pq->mem->unf_free (tmp); } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 65: pq-fib-heapc-16, and *Note 66: pq-fib-heapc-17. 4.3.8 Dumping ------------- We dump out the full multilevel recursive structure of the Fibonacci tree: 65. <`pq-fib-heap.c' 50> += static void dump_recursive (const struct node *p) { const struct node *q = p; putchar ('('); do { if (q != p) fputs (", ", stdout); printf (q->key == DBL_MAX ? "+oo" : "%g", q->key); if (q->child != NULL) dump_recursive (q->child); q = q->next; } while (q != p); putchar (')'); } static void pq_dump (const struct pq *pq) { dump_recursive (pq->min); } See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, and *Note 66: pq-fib-heapc-17. 4.3.9 Class Implementation -------------------------- The class implementation of the Fibonacci heap is the final declaration in the file: 66. <`pq-fib-heap.c' 50> += struct pq_class unf_pq_fib_heap = { pq_create, pq_discard, pq_extract_min, pq_decrease_key, pq_count, pq_key, pq_verify, pq_dump, }; See also segments *Note 50: pq-fib-heapc, *Note 51: pq-fib-heapc-2, *Note 52: pq-fib-heapc-3, *Note 53: pq-fib-heapc-4, *Note 54: pq-fib-heapc-5, *Note 55: pq-fib-heapc-6, *Note 56: pq-fib-heapc-7, *Note 57: pq-fib-heapc-8, *Note 58: pq-fib-heapc-9, *Note 59: pq-fib-heapc-10, *Note 60: pq-fib-heapc-11, *Note 61: pq-fib-heapc-12, *Note 62: pq-fib-heapc-13, *Note 63: pq-fib-heapc-14, *Note 64: pq-fib-heapc-15, and *Note 65: pq-fib-heapc-16. 5 Other Classes *************** This chapter contains descriptions and implementations of the three classes that have not been treated yet: memory allocation, random number generation, and point set handling. 5.1 Memory Allocation ===================== struct unf_mem provides a "class" structure for memory allocators within the uniformity testing library. It is declared as follows: 67. += /* Memory allocator class. */ struct unf_mem { void *(*unf_alloc) (size_t); void (*unf_free) (void *); }; See also segments *Note 6: Test options, *Note 7: Test options-2, *Note 8: Test options-3, *Note 25: Test options-4, *Note 69: Test options-6, and *Note 71: Test options-7. This code is included in segment *Note 3: uniformityh. In an implementation, unf_alloc must be a function that takes a count of bytes to allocate and returns a pointer to the allocated memory if successful or NULL if no memory is available. Companion member unf_free must point to a function that frees memory allocated with unf_alloc. File <*Note mem-stdc:: 68> provides an implementation of the memory allocator class that uses the standard C malloc() and free() functions: 68. <`mem-std.c' 68> = <*Note BSD License:: 1> #include #include ``uniformity.h'' static void * standard_alloc (size_t n) { return malloc (n); } static void standard_free (void *p) { free (p); } struct unf_mem unf_mem_malloc = { standard_alloc, standard_free }; 5.2 Random Number Generator =========================== The class interface for random number generation contains one important member unf_get, which, when called, must return a pseudo-random number in the closed range [0, 1]. An additional member unf_seed is provided for seeding the generator for the convenience of calling code, but it is not used by the uniformity testing library itself. 69. += struct unf_rng { void (*unf_seed) (unsigned long); double (*unf_get) (void); }; See also segments *Note 6: Test options, *Note 7: Test options-2, *Note 8: Test options-3, *Note 25: Test options-4, *Note 67: Test options-5, and *Note 71: Test options-7. This code is included in segment *Note 3: uniformityh. File <*Note rng-stdc:: 70> uses the standard C pseudo-random number generator (PRNG) rand() for random number generation. This is not a good choice on many systems, where rand() may have very poor randomness properties. 70. <`rng-std.c' 70> = <*Note BSD License:: 1> #include #include ``uniformity.h'' static void rng_seed (unsigned long seed) { srand (seed); } static double rng_get (void) { return (double) rand () / RAND_MAX; } struct unf_rng unf_rng_system = { rng_seed, rng_get }; An additional random number generator is provided in `rng-mt.c'. It is based on the fast and high-quality Mersenne Twister random number generator. References: *Note Matsumoto 1997::. 5.3 Point Set ============= The purpose of the point set class is to represent an infinite set of points that includes the convex hull of a provided finite collection of points, but may include other points as well. For instance, it could be implemented as the smallest hypersphere or hyper-rectangle containing all the provided points. The class is represented as struct unf_set: 71. += /* Point set. */ struct unf_set { struct unf_a_set *(*unf_create) (struct unf_mem *, const double *, int n, int d); void (*unf_discard) (struct unf_a_set *); void (*unf_random) (const struct unf_a_set *, struct unf_rng *, double *); }; See also segments *Note 6: Test options, *Note 7: Test options-2, *Note 8: Test options-3, *Note 25: Test options-4, *Note 67: Test options-5, and *Note 69: Test options-6. This code is included in segment *Note 3: uniformityh. The members of struct unf_set are abstractly described as follows: struct unf_a_set *unf_create (struct unf_mem *mem, const double *p, int n, int d) Creates and returns a new point set that includes the n d-dimensional points at p. Memory allocation is performed using mem. Returns NULL if unsuccessful due to lack of memory or for some other reason. void unf_discard (struct unf_a_set *set); Destroys point set set. void unf_random (const struct unf_a_set *set, struct unf_rng *rng, double *p); Selects a random point from a uniform distribution over the set of points represented by set. Pseudo-random number generator rng is used in selection of the point. The point is stored into the vector at p, which must have room for d elements, d being the same value passed to unf_create(). The implementation provided here in <*Note set-rectc:: 72> uses a minimum-size hyper-rectangle to surround the points. The implementation begins with the definition of struct unf_a_set: 72. <`set-rect.c' 72> = <*Note BSD License:: 1> #include #include ``uniformity.h'' /* Hyper-rectangular point set. */ struct unf_a_set { struct unf_mem mem; /* Memory allocator. */ int d; /* Number of dimensions. */ double *min; /* Minimum values for each of d dimensions. */ double *max; /* Maximum values for each of d dimensions. */ }; See also segments *Note 73: set-rectc-2, *Note 74: set-rectc-3, *Note 75: set-rectc-4, and *Note 76: set-rectc-5. To create a set, we allocate memory for it, then iterate through the set of points provided setting up the min and max arrays: 73. <`set-rect.c' 72> += struct unf_a_set * set_create (struct unf_mem *mem, const double *p, int n, int d) { struct unf_a_set *set; int i; assert (p != NULL && n > 0 && d > 0); /* Memory allocation. */ set = mem->unf_alloc (sizeof *set); if (set == NULL) return NULL; set->mem = *mem; set->d = d; set->min = mem->unf_alloc (sizeof *set->min * d * 2); if (set->min == NULL) { mem->unf_free (set->min); mem->unf_free (set); return NULL; } set->max = set->min + d; /* Construct hyper-rectangle. */ for (i = 0; i < d; i++) set->min[i] = set->max[i] = *p++; for (i = 1; i < n; i++) { int j; for (j = 0; j < d; j++) { if (*p < set->min[j]) set->min[j] = *p; else if (*p > set->max[j]) set->max[j] = *p; p++; } } return set; } See also segments *Note 72: set-rectc, *Note 74: set-rectc-3, *Note 75: set-rectc-4, and *Note 76: set-rectc-5. Destroying a set is just a matter of freeing its memory: 74. <`set-rect.c' 72> += void set_discard (struct unf_a_set *set) { assert (set != NULL); set->mem.unf_free (set->min); set->mem.unf_free (set); } See also segments *Note 72: set-rectc, *Note 73: set-rectc-2, *Note 75: set-rectc-4, and *Note 76: set-rectc-5. To generate a random point in a hyper-rectangle, we generate an uniform random variate along each axis independently: 75. <`set-rect.c' 72> += void set_random (const struct unf_a_set *set, struct unf_rng *rng, double *v) { int i; for (i = 0; i < set->d; i++) v[i] = set->min[i] + rng->unf_get () * (set->max[i] - set->min[i]); } See also segments *Note 72: set-rectc, *Note 73: set-rectc-2, *Note 74: set-rectc-3, and *Note 76: set-rectc-5. Finally, we declare the class structure implementation: 76. <`set-rect.c' 72> += struct unf_set unf_set_rectangular = { set_create, set_discard, set_random, }; See also segments *Note 72: set-rectc, *Note 73: set-rectc-2, *Note 74: set-rectc-3, and *Note 75: set-rectc-4. 6 Discussion ************ This chapter discusses the library for uniformity testing. It begins with a description of the provided programs for testing. Sample results from these programs are then displayed and discussed. Next, implications for the correctness of the library and its components are suggested. The library's performance is analyzed. Finally, some directions for further work are mentioned. 6.1 Programs for Testing ======================== This section briefly discusses the programs for testing the uniformity testing library as a whole or in part. The source code for these programs is not presented due to its length and tedium. 6.1.1 Convex Hull Estimation ---------------------------- The first step in the uniformity testing algorithm is to select a random set of points within the approximate convex hull of the points to be tested. It is worthwhile to test that these points are generated properly. The `hull-test' program tests the convex hull estimation visually. It picks an initial set of points within a triangular area, then generates another set in their approximate convex hull using unf_inside_hull(). It prints a graphical representation of both sets of points to stdout in a format suitable for the `jgraph' graphing utility. The user can then judge for himself whether the points were correctly generated. 6.1.2 Minimum Spanning Trees ---------------------------- The code for priority queues for the Prim MST is somewhat complex, and therefore error-prone. A simple test to make sure that the priority queues are probably implemented correctly is helpful. This is what the `pq-test' program does. When run without command-line arguments, `pq-test' will perform its test 16 times for a priority queue of 128 items and print a success or failure message for each test. To see a list of other options, use the command `pq-test -h'. The minimum spanning tree algorithm as a whole is also worth testing. The `mst-test' program does this. It works similarly to the `hull-test' program, generating a random set of points, calculating their MST, and producing as output a graphical representation of the MST. 6.1.3 Uniformity Tester ----------------------- Testing the uniformity testing library itself is important, too. The `unf-test' program does this. Each time it is run it generates a set of points randomly within user-specified parameters, then runs a uniformity test on it and reports the results. The user can specify several parameters for the test using command-line arguments: `-d DIMS' `--dims=DIMS' d, the number of dimensions. `-c COUNT' `--count=COUNT' n, the number of points to test. `-a AVG' `--avg=AVG' Optionally, a number of calls to unf_test() over which to average s. By default only one call is used per data set. `-S SIG' `--sig=SIG' The minimum value of s that is considered "uniform," .05 by default. Increasing this value will decrease the chance of non-uniform data sets considered uniform and increase the chance of uniform data sets considered non-uniform. `-D DIST' `--dist=DIST' The distribution from which the test points are picked. Argument DIST may be one of the following: `uniform' The uniform distribution on [0, 1]. `decreasing' The distribution of decreasing values such that X1 > X2 > ... > Xd. The first variate X1 is picked uniformly between 0 and 1, the next variate X2 between 0 and X1, and so on. This distribution is so non-uniform that it should essentially _never_ be detected as uniform. `normal' The multivariate normal distribution with mean at the origin and identity covariance matrix. `-m MST' `--mst=MST' Selects the minimum spanning tree algorithm to use, one of: `prim-bin' Prim's algorithm with a binary heap priority queue. `prim-fib' Prim's algorithm with a Fibonacci heap priority queue. `-p PROXIMITY' `--prox=PROXIMITY' The default behavior is to select all of the test points from a single distribution of the type specified. When this option is specified along with a positive value for PROXIMITY, the data points are instead divided into two equal-size groups, the means of which are displaced apart a distance of PROXIMITY. `-r RUNS' `--runs=RUNS' Perform the specified tests on RUNS sets of data. The default is 16. `-s SEED' `--seed=SEED' Use SEED as the initial random number seed for the first run. The seed is incremented for each succeeding run. The default is based on the current time. `-h' `--help' Outputs a brief help screen and exits. `-V' `--version' Outputs version and licensing information and exits. These features will be used extensively to produce data for the discussion below. 6.2 Testing Results =================== The test programs above can be used to analyze the correctness and performance of the uniformity testing library. This section presents results obtained using them along with a discussion of these results' meaning. 6.2.1 Correctness ----------------- Correctness of the testing algorithm can be tested both when the sampling window is known and when it is unknown. The ordinary `unf-test' program can be used for the latter; the former can be tested the same way by temporarily commenting out the line if (unf_inside_hull (p, n, d, y, tmp)) in <*Note Generate points for uniformity test:: 13> and recompiling.(1) Tables below show the results of uniformity testing for cells with various dimensions d and point counts n. Each cell is of the form (R(0.05), R(0.02)), where R(s) indicates the percentage of uniform distributions wrongly detected as non-uniform at the given level s. Each entry corresponds to 10,000 runs. ---------- Footnotes ---------- (1) The proper way to do this would be to avoid use of unf_test(), using the low-level functions prototyped in <*Note uniformityh:: 3> instead. This is not done in `unf-test' because one of its purposes is to ensure that unf_test() works. 6.2.1.1 Known Sampling Window ............................. This table shows the results of uniformity testing for uniform samples when the sampling window is known: d 2 5 10 50 (4.3,1.5) (4.2,1.7) (4.2,1.6) n 100 (5.0,1.9) (5.2,2.0) (4.6,2.0) 200 (4.8,2.0) (4.5,2.1) (5.0,1.9) The expected values for R(0.05) and R(0.02) are 5 and 2, respectively. Considering each uniformity test R(s) as a binomial trial with probability of success p = s and selecting a significance level alpha = 0.05, the allowed intervals to accept the hypothesis p = p_hat are 4.57 < R(0.05) < 5.43 and 1.73 < R(0.02) < 2.27. For n = 50, few of the values in the table are in the range for acceptance, but for n = 100 and n = 200 most of them are. This indicates that the algorithm is probably correct. A later section will discuss possible sources of error. In the original paper, all of the values are within the range to allow acceptance, but only 100 runs were made per entry, instead of 10,000, presumably because of time constraints. When only 100 runs are made, it is easy to accidentally get results that are all within the acceptable range. References: *Note Smith 1984::, table I; *Note Zwillinger 1996::, sections 7.9.1 and 7.12.2. 6.2.1.2 Unknown Sampling Window ............................... When the sampling window is unknown, the results are like this: d 2 5 10 50 (4.2,1.6) (2.3,0.9) (0.5,0.1) n 100 (5.3,1.8) (3.0,1.0) (0.7,0.2) 200 (4.6,1.9) (3.3,1.1) (0.5,0.2) For small d, entries are in the same general range as for the known sampling window data, and as d increases, they become more conservative. This is the expected behavior according to the references. References: *Note Smith 1984::, table IV. 6.2.2 Normal Distributions -------------------------- We can run the uniformity test on points selected from a normal distribution. If this is done for several values of n with d = 5 and a desired 5% false detection rate, this graph results: (Refer to high-quality output for graph.) In words, for small n, uniform and normal distributions are almost indistinguishable. As n increases, the distribution is detected more and more often as non-uniform, until the time that n = 200 is reached, at which it is essentially always found to be non-uniform. 6.2.3 Testing Separation ------------------------ An additional interesting test is to divide the test points into two sets and separate their means by increasing distances. The graphs below show the results for (a) normal distributions and (b) uniform distributions, each with a total of n = 50 points. Note that the graphs are nearly identical. (Refer to high-quality output for graph.) 6.3 Sources of Error ==================== Based on results above, the code implementing the uniformity testing algorithm seems to work properly. This section discusses possible sources of error that were considered during design and implementation but have not already been discussed. 6.3.1 Bugs in MST Implementation -------------------------------- There are two programs to test the minimum spanning tree algorithms: `pq-test' for the priority queues and `mst-test' for the spanning tree itself. Testing the queue implementations for obvious flaws by running the `pq-test' program with several sets of parameters does not reveal any problems in either priority queue implementation. This does not mean that none exist, but it is a good sign. Running `mst-test' shows that it produces correct minimum spanning trees on test data. (Refer to a high-quality output version of this report for examples of `mst-test' output.) 6.3.2 Bugs in Convex Hull Insideness Test ----------------------------------------- The test whether a point is inside the approximate convex hull of the set of points is suspicious from the beginning because of the way that the equation for the test is written in both the dissertation and the paper describing the algorithm, using a subscript `2' in a confusing way. Presumably a superscript `2' was intended; it was interpreted this way in the implementation. A search of the literature did not turn up use of any similar tests, so no comparisons could be made. Several possible variants of this test were tried, but none produced better results. If the `hull-test' program is used to visually test the convex hull estimation, the results are disappointing. (Refer to the a high-quality output version of this report for an example.) The original points are marked with circles, the ones generated within the "approximate convex hull" with crosses. Notice how far away from the triangular border some of the generated points fall. None of generated points in a similar illustration in the original paper fall as far away as these. References: *Note Smith 1984::, figure 3. 6.3.3 Imprecise Arithmetic -------------------------- The uniformity testing library uses double for all of its floating-point calculations. It is possible that additional precision might increase accuracy in calculations. To experiment with this idea, all double (64-bit IEEE 754) quantities were replaced by long double (80-bit IEEE 754) quantities. In addition, sums were performed both using Kahan summation and in least-to-greatest order (by performing a sort operation). There was no noticeable change in behavior from these increases in precision. In conclusion, for the current data sets, arithmetic precision is not a limiting factor. References: *Note Goldberg 1991::. 6.4 Performance =============== The two significant steps in the uniformity test whose performance depend on n and d are the generation of random points and calculation of the minimum spanning tree. The first of these has asymptotic performance O(n**2), the latter O(n**2 lg n) for Prim's algorithm with a binary heap or O(n**2) with a Fibonacci heap. When `unf-test' is compiled with GCC 2.95.4 prerelease 20010319 using options `-O3 -DNDEBUG' and run on a Pentium II/233 MHz, tests with various n and d using the binary heap priority queue, run 50 times each, take the times shown in the table below, in seconds: d 2 5 10 50 0.3 0.4 0.7 n 100 1.2 1.4 2.2 200 4.3 5.1 8.0 400 17.2 20.5 30.2 800 69.1 85.2 116.6 Note that doubling n causes the time to quadruple, almost exactly, so the practical performance of the code for n in these ranges is O(n**2) under these conditions. When a Fibonacci heap is used instead, the times _increase_ slightly: d 2 5 10 50 0.4 0.4 0.7 n 100 1.2 1.4 2.3 200 4.6 5.5 8.2 400 18.4 21.7 31.1 800 72.5 86.8 119.6 This effect is presumably due to the increased complexity of a Fibonacci heap. According to *Note Cormen 1990::, chapter 21, this is not unexpected: From a practical point of view, however, the constant factors and programming complexity of Fibonacci heaps make them less desirable than ordinary binary (or k-ary) heaps for most applications. Thus, Fibonacci heaps are predominantly of theoretical interest. 6.5 Future Work =============== Currently the code for the uniformity testing library is very general, because it was written with little idea of how it was going to be used. Specializing it for particular situations may be one task for future work. For instance, some applications may prefer the use of a bounding hypersphere instead of the provided hyper-rectangle implementation. There is much potential for optimization. Profiling shows that generation and testing of points and the minimum spanning tree computation take roughly equal amounts of time, so improvement in either one would improve performance. A few minutes work on the implementation of the Prim MST in particular might help. Some experimentation shows that a 10% improvement there, by modifying its interface to the priority queue, is fairly simple to achieve, and a larger improvement might not be much more difficult. Replacement of Prim's algorithm by one of Bentley's algorithms might reap much greater benefits. This document could also use some elaboration for the benefit of less knowledgeable readers. For instance, a better explanation of Prim's algorithm for constructing MST and for Fibonacci heap algorithms would be helpful. Appendix A References ********************* [Abramowitz 1972]. Abramowitz, M., and I. A. Stegun, _Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables_. U. S. Govt. Print. Off., Washington, D.C., 1972. [Bentley 1978]. Bentley, J. L., and J. H. Friedman, "Fast Algorithms for Constructing Minimal Spanning Trees in Coordinate Spaces," _IEEE Trans. on Comp._ 2 (1984), pp. 97-105. [Boyer 1997]. Boyer, J., "Algorithm Alley: The Fibonacci Heap," _Dr. Dobb's Journal_, Jan. 1997, `http://www.ddj.com/articles/1997/9701/9701o/9701o.htm?topic=algorithms'. [Cormen 1990]. Cormen, C. H., C. E. Leiserson, and R. L. Rivest, _Introduction to Algorithms_. McGraw-Hill, 1990. ISBN 0-262-03141-8. [FSF 2001]. Free Software Foundation, _GNU Coding Standards_, edition 23 Mar 2001. `http://www.gnu.org/prep/standards_toc.html'. [Goldberg 1991]. Goldberg, D., "What Every Computer Scientist Should Know About Floating-Point Arithmetic," _ACM Computing Surveys_, 23 (1991), pp. 5-48. [ISO 1990]. International Organization for Standardization, _ANSI/ISO 9899-1990: American National Standard for Programming Languages--C_, 1990. Reprinted in _The Annotated ANSI C Standard_, ISBN 0-07-881952-0. [ISO 1999]. International Organization for Standardization, _International Standard ISO/IEC 9899:1999(E): Programming Languages--C_, 1999. [Knuth 1992]. Knuth, D. E., _Literate Programming_, CSLI Lecture Notes Number 27. Center for the Study of Language and Information, Leland Stanford Junior University, 1992. ISBN 0-9370-7380-6. [Knuth 1973]. Knuth, D. E., _The Art of Computer Programming, Volume 3: Sorting and Searching_. Addison-Wesley, 1973. ISBN 0-201-03803-X. 2nd ed.: ISBN 0-201-89685-0, 1998. [Matsumoto 1997]. Matsumoto, M., and T. Nishimura, "Mersenne Twister" pseudo-random number generator. `http://www.math.keio.ac.jp/~matumoto/emt.html'. [Pfaff 2000]. Pfaff, B. L., _`libavl' library for manipulation of binary trees_, version 2.0 ALPHA, edition 13 Dec 2000. `http://www.msu.edu/~pfaffben/avl/'. [Pfaff 2001]. Pfaff, B. L., _Personal Coding Standards_, version 1.2, edition 5 Jan 2001. `http://www.msu.edu/~pfaffben/writings/'. [Smith 1984]. Smith, S. P., and A. K. Jain, "Testing for Uniformity in Multidimensional Data," _IEEE Trans. on Pattern Analysis and Machine Intelligence_ 6 (1984), pp. 73-81. [Summit 1999]. Summit, S., _comp.lang.c Answers to Frequently Asked Questions_, version 3.5. `http://www.eskimo.com/~scs/C-faq/top.html'. ISBN 0-201-84519-9. [Fredman 1987]. Fredman, L. F., and R. E.: Tarjan, "Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms," _J. ACM_, 3 (1987), pp. 596-615. [Zwillinger 1996]. Zwillinger, D., _CRC Standard Mathematical Tables and Formulae_, 30th ed. CRC Press, Boca Raton, 1996. ISBN 0-8493-2479-3. Appendix B Index **************** add_child function: See 4.3.3. add_root function: See 4.3.3. allocate memory for uniformity test: See 2.2.1. BSD License: See 1.3. calc_mst function: See 3.2. calc_n_star_est function: See 2.2.2. calculate C statistic: See 2.2.3. calculate T statistic: See 2.2.3. clean up and return from Prim MST: See 3.2. delete root from binary heap: See 4.2.3. detach function: See 4.3.3. dump_recursive function: See 4.3.8. execute Prim MST: See 3.2. find MST of pooled points: See 2.2.3. free memory and return: See 2.2.3. generate points for uniformity test: See 2.2.1. GNU GPL: See 1.3. high-level uniformity test routine prototype: See 2.1. insert function: See 4.3.1. left macro: See 4.2. low-level uniformity test routines prototypes: See 2.1.1. mem-std.c: See 5.1. memory allocation for Prim MST: See 3.2. mst-prim.c <1>: See 3.3. mst-prim.c: See 3.2. node structure: See 4.3. normal_sig function: See 2.2.4. parent macro: See 4.2. pq structure <1>: See 4.3. pq structure: See 4.2. pq-bin-heap.c <1>: See 4.2.9. pq-bin-heap.c <2>: See 4.2.8. pq-bin-heap.c <3>: See 4.2.7. pq-bin-heap.c <4>: See 4.2.6. pq-bin-heap.c <5>: See 4.2.5. pq-bin-heap.c <6>: See 4.2.4. pq-bin-heap.c <7>: See 4.2.3. pq-bin-heap.c <8>: See 4.2.2. pq-bin-heap.c <9>: See 4.2.1. pq-bin-heap.c: See 4.2. pq-fib-heap.c <1>: See 4.3.9. pq-fib-heap.c <2>: See 4.3.8. pq-fib-heap.c <3>: See 4.3.7. pq-fib-heap.c <4>: See 4.3.6. pq-fib-heap.c <5>: See 4.3.5. pq-fib-heap.c <6>: See 4.3.4. pq-fib-heap.c <7>: See 4.3.3. pq-fib-heap.c <8>: See 4.3.2. pq-fib-heap.c <9>: See 4.3.1. pq-fib-heap.c: See 4.3. pq.h: See 4.1. pq_class structure: See 4.1. pq_count function <1>: See 4.3.5. pq_count function: See 4.2.5. pq_create function <1>: See 4.3.1. pq_create function: See 4.2.1. pq_decrease_key function <1>: See 4.3.4. pq_decrease_key function: See 4.2.4. pq_discard function <1>: See 4.3.2. pq_discard function: See 4.2.2. pq_dump function <1>: See 4.3.8. pq_dump function: See 4.2.8. pq_extract_min function <1>: See 4.3.3. pq_extract_min function: See 4.2.3. PQ_H macro: See 4.1. pq_key function <1>: See 4.3.6. pq_key function: See 4.2.6. pq_verify function <1>: See 4.3.7. pq_verify function: See 4.2.7. record results of Prim MST: See 3.2. restore binary heap property: See 4.2.3. right macro: See 4.2. rng-std.c: See 5.2. rng_get function: See 5.2. rng_seed function: See 5.2. run MST, calculate results, and return: See 2.2.1. save minimum value of binary heap: See 4.2.3. set up options for uniformity test: See 2.2.1. set-rect.c: See 5.3. set_create function: See 5.3. set_discard function: See 5.3. set_random function: See 5.3. significance of a normal variate: See 2.2.4. standard_alloc function: See 5.1. standard_free function: See 5.1. test options <1>: See 5.3. test options <2>: See 5.2. test options <3>: See 5.1. test options <4>: See 3.1. test options: See 2.1.2. unf_a_set structure: See 5.3. unf_calc_results function: See 2.2.4. UNF_DEFAULTS macro: See 2.1.2. unf_init_options function: See 2.2.5. unf_inside_hull function: See 2.2.2. unf_mem structure: See 5.1. unf_mem_malloc variable: See 2.1.2. unf_mst structure: See 3.1. unf_mst_prim_binary variable: See 2.1.2. unf_mst_prim_fibonacci variable: See 2.1.2. unf_mst_result type: See 3.1. unf_options structure: See 2.1.2. unf_rng structure: See 5.2. unf_rng_mt variable: See 2.1.2. unf_rng_system variable: See 2.1.2. unf_set structure: See 5.3. unf_set_rectangular variable: See 2.1.2. unf_test function: See 2.2.1. uniformity.c <1>: See 2.2.5. uniformity.c <2>: See 2.2.4. uniformity.c <3>: See 2.2.3. uniformity.c <4>: See 2.2.2. uniformity.c <5>: See 2.2.1. uniformity.c: See 2.2. uniformity.h: See 2.1. UNIFORMITY_H macro: See 2.1. verify_recursive function: See 4.3.7.