- Preface
- 1. Introduction
- 2. The Bayes error
- 3. Inequalities and alternate distance measures
- 4. Linear discrimination
- 5. Nearest neighbor rules
- 6. Consistency
- 7. Slow rates of convergence
- 8. Error estimation
- 9. The regular histogram rule
- 10. Kernel rules
- 11. Consistency of the k-nearest neighbor rule
- 12. Vapnik-Chervonenkis theory
- 13. Combinatorial aspects of Vapnik-Chervonenkis theory
- 14. Lower bounds for empirical classifier selection
- 15. The maximum likelihood principle
- 16. Parametric classification
- 17. Generalized linear discrimination
- 18. Complexity regularization
- 19. Condensed and edited nearest neighbor rules
- 20. Tree classifiers
- 21. Data-dependent partitioning
- 22. Splitting the data
- 23. The resubstitution estimate
- 24. Deleted estimates of the error probability
- 25. Automatic kernel rules
- 26. Automatic nearest neighbor rules
- 27. Hypercubes and discrete spaces
- 28. Epsilon entropy and totally bounded sets
- 29. Uniform laws of large numbers
- 30. Neural networks
- 31. Other error estimates
- 32. Feature extraction
- Appendix
- List of notations
- Bibliography
- References

Note: Please let us know if you find errors. Do not complain about the figures, though: many were changed by Springer's editors without consulting us.

2. The Bayes error

2.2 A simple example

2.3 Another simple example

2.4 Other formulas for the Bayes risk

2.5 Plug-in decisions

2.6 Bayes error versus dimension

Problems and exercises

3. Inequalities and alternate distance measures

3.2 The Kolmogorov variational distance

3.3 The nearest neighbor error

3.4 The Bhattacharyya affinity

3.5 Entropy

3.6 Jeffreys' divergence

3.7 F-errors

3.8 The Mahalanobis distance

3.9 f-divergences

Problems and exercises

4. Linear discrimination

4.2 Linear discriminants

4.3 The Fisher linear discriminant

4.4 The normal distribution

4.5 Empirical risk minimization

4.6 Minimizing other criteria

Problems and exercises

5. Nearest neighbor rules

5.2 Notation and simple asymptotics

5.3 Proof of Stone's lemma

5.4 The asymptotic probability of error

5.5 The asymptotic error probability of weighted nearest neighbor rules

5.6 k-nearest neighbor rules: even k

5.7 Inequalities for the probability of error

5.8 Behavior when L* is small

5.9 Nearest neighbor rules when L*=0

5.10 Admissibility of the nearest neighbor rule

5.11 The (k,l)-nearest neighbor rule

Problems and exercises

6. Consistency

6.2 Classification and regression estimation

6.3 Partitioning rules

6.4 The histogram rule

6.5 Stone's theorem

6.6 The k-nearest neighbor rule

6.7 Classification is easier than regression function estimation

6.8 Smart rules

Problems and exercises

7. Slow rates of convergence

7.2 Slow rates

Problems and exercises

8. Error estimation

8.2 Hoeffding's inequality

8.3 Error estimation without testing data

8.4 Selecting classifiers

8.5 Estimating the Bayes error

Problems and exercises

9. The regular histogram rule

9.2 Strong universal consistency

Problems and exercises

10. Kernel rules

10.2 Proof of the consistency theorem

10.3 Potential function rules

Problems and exercises

11. Consistency of the k-nearest neighbor rule

11.2 Breaking distance ties

11.3 Recursive methods

11.4 Scale-invariant rules

11.5 Weighted nearest neighbor rules

11.6 Rotation-invariant rules

11.7 Relabeling rules

Problems and exercises

12. Vapnik-Chervonenkis theory

12.3 The Glivenko-Cantelli theorem

12.4 Uniform deviations of relative frequencies from probabilities

12.5 Classifier selection

12.6 Sample complexity

12.7 The zero-error case

12.8 Extensions

Problems and exercises

13. Combinatorial aspects of Vapnik-Chervonenkis theory

13.2 Shatter coefficients of some classes

13.3 Linear and generalized linear discrimination rules

13.4 Convex sets and monotone layers

Problems and exercises

14. Lower bounds for empirical classifier selection

14.2 The case L_C=0

14.3 Classes with infinite VC dimension

14.4 The case L_C>0

14.5 Sample complexity

Problems and exercises

15. The maximum likelihood principle

15.2 The maximum likelihood method: regression format

15.3 Consistency

15.4 Examples

15.5 Classical maximum likelihood: distribution format

Problems and exercises

16. Parametric classification

16.2 Standard plug-in rules

16.3 Minimum distance estimates

16.4 Empirical error minimization

Problems and exercises

17. Generalized linear discrimination

17.2 Generalized linear classification

Problems and exercises

18. Complexity regularization

18.2 Poor approximation properties of VC-classes

18.3 Simple empirical covering

Problems and exercises

19. Condensed and edited nearest neighbor rules

19.2 Edited nearest neighbor rules

19.3 Sieves, prototypes

Problems and exercises

20. Tree classifiers

20.2 Trees with the X-property

20.3 Balanced search trees

20.4 Binary search trees

20.5 The chronological k-d tree

20.6 The deep k-d tree

20.7 Quadtrees

20.8 Best possible perpendicular splits

20.9 Splitting criteria based on impurity functions

20.10 A consistent splitting criterion

20.11 BSP trees

20.12 Primitive selection

20.13 Constructing consistent tree classifiers

20.14 A greedy classifier

Problems and exercises

21. Data-dependent partitioning

21.2 A Vapnik-Chervonenkis inequality for partitions

21.3 Consistency

21.4 Statistically equivalent blocks

21.5 Partitioning rules based on clustering

21.6 Data-based scaling

21.7 Classification trees

Problems and exercises

22. Splitting the data

22.2 Consistency and asymptotic optimality

22.3 Nearest neighbor rules with automatic scaling

22.4 Classification based on clustering

22.5 Statistically equivalent blocks

22.6 Binary tree classifiers

Problems and exercises

23. The resubstitution estimate

23.2 Histogram rules

23.3 Data-based histograms and rule selection

Problems and exercises

24. Deleted estimates of the error probability

24.2 A general upper bound for deleted estimates

24.3 Nearest neighbor rules

24.4 Kernel rules

24.5 Histogram rules

Problems and exercises

25. Automatic kernel rules

25.2 Data splitting

25.3 Kernel complexity

25.4 Multiparameter kernel rules

25.5 Kernels of infinite complexity

25.6 On minimizing the apparent error rate

25.7 Minimizing the deleted estimate

25.8 Sieve methods

25.9 Squared error minimization

Problems and exercises

26. Automatic nearest neighbor rules

26.2 Data splitting

26.3 Data splitting for weighted NN rules

26.4 Reference data and data splitting

26.5 Variable metric NN rules

26.6 Selection of k based on the deleted estimate

Problems and exercises

27. Hypercubes and discrete spaces

27.2 Quantization

27.3 Independent components

27.4 Boolean classifiers

27.5 Series methods for the hypercube

27.6 Maximum likelihood

27.7 Kernel methods

Problems and exercises

28. Epsilon entropy and totally bounded sets

28.2 Examples: totally bounded classes

28.3 Skeleton estimates

28.4 Rate of convergence

Problems and exercises

29. Uniform laws of large numbers

29.2 Uniform deviations of averages from expectations

29.3 Empirical squared error minimization

29.4 Proof of the Theorem of Vapnik and Chervonenkis

29.5 Covering numbers and shatter coefficients

29.6 Generalized linear classification

Problems and exercises

30. Neural networks

30.2 Arrangements

30.3 Approximation by neural networks

30.4 VC dimension

30.5 L_1 error minimization

30.6 The Adaline and Padaline

30.7 Polynomial networks

30.8 Kolmogorov-Lorentz networks and additive models

30.9 Projection pursuit

30.10 Radial basis function networks

Problems and exercises

31. Other error estimates

31.2 Posterior probability estimates

31.3 Rotation estimate

31.4 Bootstrap

Problems and exercises

32. Feature extraction

32.2 Transformations with small distortion

32.3 Admissible and sufficient transformations

Problems and exercises

Appendix

A.2 The Lebesgue integral

A.3 Denseness results

A.4 Probability

A.5 Inequalities

A.6 Convergence of random variables

A.7 Conditional expectation

A.8 The binomial distribution

A.9 The hypergeometric distribution

A.10 The multinomial distribution

A.11 The exponential and gamma distributions

A.12 The multivariate normal distribution