- Preface
- 1. Introduction
- 2. The Bayes error
- 3. Inequalities and alternate distance measures
- 4. Linear discrimination
- 5. Nearest neighbor rules
- 6. Consistency
- 7. Slow rates of convergence
- 8. Error estimation
- 9. The regular histogram rule
- 10. Kernel rules
- 11. Consistency of the k-nearest neighbor rule
- 12. Vapnik-Chervonenkis theory
- 13. Combinatorial aspects of Vapnik-Chervonenkis theory
- 14. Lower bounds for empirical classifier selection
- 15. The maximum likelihood principle
- 16. Parametric classification
- 17. Generalized linear discrimination
- 18. Complexity regularization
- 19. Condensed and edited nearest neighbor rules
- 20. Tree classifiers
- 21. Data-dependent partitioning
- 22. Splitting the data
- 23. The resubstitution estimate
- 24. Deleted estimates of the error probability
- 25. Automatic kernel rules
- 26. Automatic nearest neighbor rules
- 27. Hypercubes and discrete spaces
- 28. Epsilon entropy and totally bounded sets
- 29. Uniform laws of large numbers
- 30. Neural networks
- 31. Other error estimates
- 32. Feature extraction
- Appendix
- List of notations
- Bibliography
- References

Note: Please let us know if you find errors. Do not complain about the figures, though: many were changed by Springer's editors without consulting us.

2. The Bayes error

2.2 A simple example

2.3 Another simple example

2.4 Other formulas for the Bayes risk

2.5 Plug-in decisions

2.6 Bayes error versus dimension

Problems and exercises

3. Inequalities and alternate distance measures

3.2 The Kolmogorov variational distance

3.3 The nearest neighbor error

3.4 The Bhattacharyya affinity

3.5 Entropy

3.6 Jeffreys' divergence

3.7 F-errors

3.8 The Mahalanobis distance

3.9 f-divergences

Problems and exercises

4. Linear discrimination

4.2 Linear discriminants

4.3 The Fisher linear discriminant

4.4 The normal distribution

4.5 Empirical risk minimization

4.6 Minimizing other criteria

Problems and exercises

5. Nearest neighbor rules

5.2 Notation and simple asymptotics

5.3 Proof of Stone's lemma

5.4 The asymptotic probability of error

5.5 The asymptotic error probability of weighted nearest neighbor rules

5.6 k-nearest neighbor rules: even k

5.7 Inequalities for the probability of error

5.8 Behavior when L* is small

5.9 Nearest neighbor rules when L*=0

5.10 Admissibility of the nearest neighbor rule

5.11 The (k,l)-nearest neighbor rule

Problems and exercises

6. Consistency

6.2 Classification and regression estimation

6.3 Partitioning rules

6.4 The histogram rule

6.5 Stone's theorem

6.6 The k-nearest neighbor rule

6.7 Classification is easier than regression function estimation

6.8 Smart rules

Problems and exercises

7. Slow rates of convergence

7.2 Slow rates

Problems and exercises

8. Error estimation

8.2 Hoeffding's inequality

8.3 Error estimation without testing data

8.4 Selecting classifiers

8.5 Estimating the Bayes error

Problems and exercises

9. The regular histogram rule

9.2 Strong universal consistency

Problems and exercises

10. Kernel rules

10.2 Proof of the consistency theorem

10.3 Potential function rules

Problems and exercises

11. Consistency of the k-nearest neighbor rule

11.2 Breaking distance ties

11.3 Recursive methods

11.4 Scale-invariant rules

11.5 Weighted nearest neighbor rules

11.6 Rotation-invariant rules

11.7 Relabeling rules

Problems and exercises

12. Vapnik-Chervonenkis theory

12.3 The Glivenko-Cantelli theorem

12.4 Uniform deviations of relative frequencies from probabilities

12.5 Classifier selection

12.6 Sample complexity

12.7 The zero-error case

12.8 Extensions

Problems and exercises

13. Combinatorial aspects of Vapnik-Chervonenkis theory

13.2 Shatter coefficients of some classes

13.3 Linear and generalized linear discrimination rules

13.4 Convex sets and monotone layers

Problems and exercises

14. Lower bounds for empirical classifier selection

14.2 The case L_C=0

14.3 Classes with infinite VC dimension

14.4 The case L_C>0

14.5 Sample complexity

Problems and exercises

15. The maximum likelihood principle

15.2 The maximum likelihood method: regression format

15.3 Consistency

15.4 Examples

15.5 Classical maximum likelihood: distribution format

Problems and exercises

16. Parametric classification

16.2 Standard plug-in rules

16.3 Minimum distance estimates

16.4 Empirical error minimization

Problems and exercises

17. Generalized linear discrimination

17.2 Generalized linear classification

Problems and exercises

18. Complexity regularization

18.2 Poor approximation properties of VC-classes

18.3 Simple empirical covering

Problems and exercises

19. Condensed and edited nearest neighbor rules

19.2 Edited nearest neighbor rules

19.3 Sieves, prototypes

Problems and exercises

20. Tree classifiers

20.2 Trees with the X-property

20.3 Balanced search trees

20.4 Binary search trees

20.5 The chronological k-d tree

20.6 The deep k-d tree

20.7 Quadtrees

20.8 Best possible perpendicular splits

20.9 Splitting criteria based on impurity functions

20.10 A consistent splitting criterion

20.11 BSP trees

20.12 Primitive selection

20.13 Constructing consistent tree classifiers

20.14 A greedy classifier

Problems and exercises

21. Data-dependent partitioning

21.2 A Vapnik-Chervonenkis inequality for partitions

21.3 Consistency

21.4 Statistically equivalent blocks

21.5 Partitioning rules based on clustering

21.6 Data-based scaling

21.7 Classification trees

Problems and exercises

22. Splitting the data

22.2 Consistency and asymptotic optimality

22.3 Nearest neighbor rules with automatic scaling

22.4 Classification based on clustering

22.5 Statistically equivalent blocks

22.6 Binary tree classifiers

Problems and exercises

23. The resubstitution estimate

23.2 Histogram rules

23.3 Data-based histograms and rule selection

Problems and exercises

24. Deleted estimates of the error probability

24.2 A general upper bound for deleted estimates

24.3 Nearest neighbor rules

24.4 Kernel rules

24.5 Histogram rules

Problems and exercises

25. Automatic kernel rules

25.2 Data splitting

25.3 Kernel complexity

25.4 Multiparameter kernel rules

25.5 Kernels of infinite complexity

25.6 On minimizing the apparent error rate

25.7 Minimizing the deleted estimate

25.8 Sieve methods

25.9 Squared error minimization

Problems and exercises

26. Automatic nearest neighbor rules

26.2 Data splitting

26.3 Data splitting for weighted NN rules

26.4 Reference data and data splitting

26.5 Variable metric NN rules

26.6 Selection of k based on the deleted estimate

Problems and exercises

27. Hypercubes and discrete spaces

27.2 Quantization

27.3 Independent components

27.4 Boolean classifiers

27.5 Series methods for the hypercube

27.6 Maximum likelihood

27.7 Kernel methods

Problems and exercises

28. Epsilon entropy and totally bounded sets

28.2 Examples: totally bounded classes

28.3 Skeleton estimates

28.4 Rate of convergence

Problems and exercises

29. Uniform laws of large numbers

29.2 Uniform deviations of averages from expectations

29.3 Empirical squared error minimization

29.4 Proof of the Theorem of Vapnik and Chervonenkis

29.5 Covering numbers and shatter coefficients

29.6 Generalized linear classification

Problems and exercises

30. Neural networks

30.2 Arrangements

30.3 Approximation by neural networks

30.4 VC dimension

30.5 L_1 error minimization

30.6 The Adaline and Padaline

30.7 Polynomial networks

30.8 Kolmogorov-Lorentz networks and additive models

30.9 Projection pursuit

30.10 Radial basis function networks

Problems and exercises

31. Other error estimates

31.2 Posterior probability estimates

31.3 Rotation estimate

31.4 Bootstrap

Problems and exercises

32. Feature extraction

32.2 Transformations with small distortion

32.3 Admissible and sufficient transformations

Problems and exercises

Appendix

A.2 The Lebesgue integral

A.3 Denseness results

A.4 Probability

A.5 Inequalities

A.6 Convergence of random variables

A.7 Conditional expectation

A.8 The binomial distribution

A.9 The hypergeometric distribution

A.10 The multinomial distribution

A.11 The exponential and gamma distributions

A.12 The multivariate normal distribution