A Probabilistic Theory of Pattern Recognition

by

Luc Devroye, László Györfi and Gábor Lugosi

Springer-Verlag, New York, 1996

ISBN 0-387-94618-7

CONTENTS

Note: Please let us know if you find errors. Do not complain about the figures, though: many were changed by Springer's editors without consulting us.

Preface

1 Introduction

2 The Bayes error

2.1 The Bayes problem
2.2 A simple example
2.3 Another simple example
2.4 Other formulas for the Bayes risk
2.5 Plug-in decisions
2.6 Bayes error versus dimension
Problems and exercises

3 Inequalities and alternate distance measures

3.1 Measuring discriminatory information
3.2 The Kolmogorov variational distance
3.3 The nearest neighbor error
3.4 The Bhattacharyya affinity
3.5 Entropy
3.6 Jeffreys' divergence
3.7 F-errors
3.8 The Mahalanobis distance
3.9 f-divergences
Problems and exercises

4 Linear discrimination

4.1 Univariate discrimination and Stoller splits
4.2 Linear discriminants
4.3 The Fisher linear discriminant
4.4 The normal distribution
4.5 Empirical risk minimization
4.6 Minimizing other criteria
Problems and exercises

5 Nearest neighbor rules

5.1 Introduction
5.2 Notation and simple asymptotics
5.3 Proof of Stone's lemma
5.4 The asymptotic probability of error
5.5 The asymptotic error probability of weighted nearest neighbor rules
5.6 k-nearest neighbor rules: even k
5.7 Inequalities for the probability of error
5.8 Behavior when L* is small
5.9 Nearest neighbor rules when L*=0
5.10 Admissibility of the nearest neighbor rule
5.11 The (k,l)-nearest neighbor rule
Problems and exercises

6 Consistency

6.1 Universal consistency
6.2 Classification and regression estimation
6.3 Partitioning rules
6.4 The histogram rule
6.5 Stone's theorem
6.6 The k-nearest neighbor rule
6.7 Classification is easier than regression function estimation
6.8 Smart rules
Problems and exercises

7 Slow rates of convergence

7.1 Finite training sequence
7.2 Slow rates
Problems and exercises

8 Error estimation

8.1 Error counting
8.2 Hoeffding's inequality
8.3 Error estimation without testing data
8.4 Selecting classifiers
8.5 Estimating the Bayes error
Problems and exercises

9 The regular histogram rule

9.1 The method of bounded differences
9.2 Strong universal consistency
Problems and exercises

10 Kernel rules

10.1 Consistency
10.2 Proof of the consistency theorem
10.3 Potential function rules
Problems and exercises

11 Consistency of the k-nearest neighbor rule

11.1 Strong consistency
11.2 Breaking distance ties
11.3 Recursive methods
11.4 Scale-invariant rules
11.5 Weighted nearest neighbor rules
11.6 Rotation-invariant rules
11.7 Relabeling rules
Problems and exercises

12 Vapnik-Chervonenkis theory

12.1 Empirical error minimization
12.2 Fingering
12.3 The Glivenko-Cantelli theorem
12.4 Uniform deviations of relative frequencies from probabilities
12.5 Classifier selection
12.6 Sample complexity
12.7 The zero-error case
12.8 Extensions
Problems and exercises

13 Combinatorial aspects of Vapnik-Chervonenkis theory

13.1 Shatter coefficients and VC dimension
13.2 Shatter coefficients of some classes
13.3 Linear and generalized linear discrimination rules
13.4 Convex sets and monotone layers
Problems and exercises

14 Lower bounds for empirical classifier selection

14.1 Minimax lower bounds
14.2 The case L_C=0
14.3 Classes with infinite VC dimension
14.4 The case L_C>0
14.5 Sample complexity
Problems and exercises

15 The maximum likelihood principle

15.1 Maximum likelihood: the formats
15.2 The maximum likelihood method: regression format
15.3 Consistency
15.4 Examples
15.5 Classical maximum likelihood: distribution format
Problems and exercises

16 Parametric classification

16.1 Example: exponential families
16.2 Standard plug-in rules
16.3 Minimum distance estimates
16.4 Empirical error minimization
Problems and exercises

17 Generalized linear discrimination

17.1 Fourier series classification
17.2 Generalized linear classification
Problems and exercises

18 Complexity regularization

18.1 Structural risk minimization
18.2 Poor approximation properties of VC-classes
18.3 Simple empirical covering
Problems and exercises

19 Condensed and edited nearest neighbor rules

19.1 Condensed nearest neighbor rules
19.2 Edited nearest neighbor rules
19.3 Sieves, prototypes
Problems and exercises

20 Tree classifiers

20.1 Invariance
20.2 Trees with the X-property
20.3 Balanced search trees
20.4 Binary search trees
20.5 The chronological k-d tree
20.6 The deep k-d tree
20.7 Quadtrees
20.8 Best possible perpendicular splits
20.9 Splitting criteria based on impurity functions
20.10 A consistent splitting criterion
20.11 BSP trees
20.12 Primitive selection
20.13 Constructing consistent tree classifiers
20.14 A greedy classifier
Problems and exercises

21 Data-dependent partitioning

21.1 Introduction
21.2 A Vapnik-Chervonenkis inequality for partitions
21.3 Consistency
21.4 Statistically equivalent blocks
21.5 Partitioning rules based on clustering
21.6 Data-based scaling
21.7 Classification trees
Problems and exercises

22 Splitting the data

22.1 The holdout estimate
22.2 Consistency and asymptotic optimality
22.3 Nearest neighbor rules with automatic scaling
22.4 Classification based on clustering
22.5 Statistically equivalent blocks
22.6 Binary tree classifiers
Problems and exercises

23 The resubstitution estimate

23.1 The resubstitution estimate
23.2 Histogram rules
23.3 Data-based histograms and rule selection
Problems and exercises

24 Deleted estimates of the error probability

24.1 A general lower bound
24.2 A general upper bound for deleted estimates
24.3 Nearest neighbor rules
24.4 Kernel rules
24.5 Histogram rules
Problems and exercises

25 Automatic kernel rules

25.1 Consistency
25.2 Data splitting
25.3 Kernel complexity
25.4 Multiparameter kernel rules
25.5 Kernels of infinite complexity
25.6 On minimizing the apparent error rate
25.7 Minimizing the deleted estimate
25.8 Sieve methods
25.9 Squared error minimization
Problems and exercises

26 Automatic nearest neighbor rules

26.1 Consistency
26.2 Data splitting
26.3 Data splitting for weighted NN rules
26.4 Reference data and data splitting
26.5 Variable metric NN rules
26.6 Selection of k based on the deleted estimate
Problems and exercises

27 Hypercubes and discrete spaces

27.1 Multinomial discrimination
27.2 Quantization
27.3 Independent components
27.4 Boolean classifiers
27.5 Series methods for the hypercube
27.6 Maximum likelihood
27.7 Kernel methods
Problems and exercises

28 Epsilon entropy and totally bounded sets

28.1 Definitions
28.2 Examples: totally bounded classes
28.3 Skeleton estimates
28.4 Rate of convergence
Problems and exercises

29 Uniform laws of large numbers

29.1 Minimizing the empirical squared error
29.2 Uniform deviations of averages from expectations
29.3 Empirical squared error minimization
29.4 Proof of the theorem of Vapnik and Chervonenkis
29.5 Covering numbers and shatter coefficients
29.6 Generalized linear classification
Problems and exercises

30 Neural networks

30.1 Multilayer perceptrons
30.2 Arrangements
30.3 Approximation by neural networks
30.4 VC dimension
30.5 L_1 error minimization
30.6 The Adaline and Padaline
30.7 Polynomial networks
30.8 Kolmogorov-Lorentz networks and additive models
30.9 Projection pursuit
30.10 Radial basis function networks
Problems and exercises

31 Other error estimates

31.1 Smoothing the error count
31.2 Posterior probability estimates
31.3 Rotation estimate
31.4 Bootstrap
Problems and exercises

32 Feature extraction

32.1 Dimensionality reduction
32.2 Transformations with small distortion
32.3 Admissible and sufficient transformations
Problems and exercises

Appendix

A.1 Basics of measure theory
A.2 The Lebesgue integral
A.3 Denseness results
A.4 Probability
A.5 Inequalities
A.6 Convergence of random variables
A.7 Conditional expectation
A.8 The binomial distribution
A.9 The hypergeometric distribution
A.10 The multinomial distribution
A.11 The exponential and gamma distributions
A.12 The multivariate normal distribution

List of notations

References