## Kernel Density Estimates

A weighted univariate kernel density estimate involves a variable
*X* and a weight variable *W*. Let (*X*_{i},*W*_{i}), *i* = 1,2, ... ,*n*
denote a sample of *X* and *W* of size *n*. The weighted kernel
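density estimate of *f*(*x*), the density of *X*, is as follows:

$$\hat{f}(x) = \frac{1}{\sum_{i=1}^{n} W_i} \sum_{i=1}^{n} W_i \, \varphi_h(x - X_i)$$

where *h* is the bandwidth and

$$\varphi_h(x) = \frac{1}{\sqrt{2\pi}\, h} \exp\left( -\frac{x^2}{2h^2} \right)$$

is the standard normal density rescaled by the bandwidth.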
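
As an illustration only, the weighted estimate can be sketched in Python
with NumPy. The function name `weighted_kde` and its arguments are
hypothetical, chosen for this sketch; it assumes a Gaussian kernel
rescaled by a single bandwidth *h* and weights normalized by their sum.

```python
import numpy as np

def weighted_kde(x_grid, sample, weights, h):
    """Evaluate sum_i W_i * phi_h(x - X_i) / sum_i W_i at each point of x_grid."""
    x_grid = np.asarray(x_grid, dtype=float)
    sample = np.asarray(sample, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Pairwise differences x - X_i, shape (len(x_grid), n).
    diffs = x_grid[:, None] - sample[None, :]
    # Rescaled normal density: exp(-u^2 / (2 h^2)) / (sqrt(2 pi) h).
    kernel = np.exp(-0.5 * (diffs / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)
    # Weighted average of the kernel contributions.
    return kernel @ weights / weights.sum()

# Example: three observations, the middle one counted twice as heavily.
grid = np.linspace(-10.0, 10.0, 2001)
density = weighted_kde(grid, [0.0, 1.0, 3.0], [1.0, 2.0, 1.0], h=0.5)
```

Because each kernel term integrates to one, the estimate integrates to
one for any positive weights.
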
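If $h \to 0$ and $nh \to \infty$ as $n \to \infty$, then the optimal
bandwidth is

$$h_{\mathrm{AMISE}} = \left[ \frac{1}{2\sqrt{\pi}\, n \int \left( f''(x) \right)^2 dx} \right]^{1/5}$$

This optimal value is unknown because it depends on the second
derivative of the unknown density *f*, and so approximation methods are
required. For a derivation and discussion of these results, refer to
Silverman (1986, Chapter 3) and Jones, Marron, and Sheather (1996).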
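
One simple approximation, shown here only as an illustration and not
necessarily the method used by any particular procedure, is a normal
reference rule: replace the unknown *f* with a normal density whose
standard deviation is estimated from the data. The sketch assumes the
optimal bandwidth has the standard Gaussian-kernel form
$[1 / (2\sqrt{\pi}\, n \int (f'')^2)]^{1/5}$ and ignores weights. For a
normal density with standard deviation $\sigma$,
$\int (f'')^2 = 3 / (8\sqrt{\pi}\,\sigma^5)$, so the rule reduces to
$(4/3)^{1/5} \sigma n^{-1/5} \approx 1.06\, \sigma n^{-1/5}$. All function
names are hypothetical.

```python
import numpy as np

def h_amise(n, roughness):
    """[1 / (2 sqrt(pi) n R)]^(1/5), where R is the integral of f''(x)^2."""
    return (1.0 / (2.0 * np.sqrt(np.pi) * n * roughness)) ** 0.2

def normal_roughness(sigma):
    """Integral of f''(x)^2 when f is the N(0, sigma^2) density."""
    return 3.0 / (8.0 * np.sqrt(np.pi) * sigma ** 5)

def normal_reference_bandwidth(sample):
    """Plug the sample standard deviation into the normal-density roughness."""
    sample = np.asarray(sample, dtype=float)
    sigma = sample.std(ddof=1)  # assumption: unweighted sample standard deviation
    return h_amise(sample.size, normal_roughness(sigma))

# Example with a simulated sample.
rng = np.random.default_rng(0)
h = normal_reference_bandwidth(rng.normal(loc=0.0, scale=2.0, size=200))
```
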
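
For the bivariate case, let **X** = (*X*,*Y*) be a bivariate random
element taking values in $\mathbb{R}^2$ with joint density function
$f(x,y)$, and let **X**_{i} = (*X*_{i},*Y*_{i}), *i* = 1,2, ... , *n* be a sample of size *n* drawn from this
distribution. The kernel density estimate of *f*(*x*,*y*) based on this
sample is

$$\hat{f}(x,y) = \frac{1}{n} \sum_{i=1}^{n} \varphi_{\mathbf{h}}(x - X_i,\, y - Y_i) = \frac{1}{n h_X h_Y} \sum_{i=1}^{n} \varphi\left( \frac{x - X_i}{h_X},\, \frac{y - Y_i}{h_Y} \right)$$

where $(x,y) \in \mathbb{R}^2$, *h*_{X}>0 and *h*_{Y}>0 are the
bandwidths and $\varphi_{\mathbf{h}}(x,y)$ is the rescaled normal
density:

$$\varphi_{\mathbf{h}}(x,y) = \frac{1}{h_X h_Y}\, \varphi\left( \frac{x}{h_X},\, \frac{y}{h_Y} \right)$$

where $\varphi(x,y)$ is the standard normal density function:

$$\varphi(x,y) = \frac{1}{2\pi} \exp\left( -\frac{x^2 + y^2}{2} \right)$$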
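
A minimal sketch of the bivariate estimate, assuming a product Gaussian
kernel with separate bandwidths for each coordinate. The function name
`bivariate_kde` and its argument names are hypothetical; the evaluation
points are passed as paired arrays `x` and `y` of equal length.

```python
import numpy as np

def bivariate_kde(x, y, xs, ys, h_x, h_y):
    """(1 / (n h_X h_Y)) * sum_i phi((x - X_i)/h_X, (y - Y_i)/h_Y) at paired points."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Standardized differences, shape (number of points, n).
    u = (x[:, None] - xs[None, :]) / h_x
    v = (y[:, None] - ys[None, :]) / h_y
    # Standard bivariate normal density evaluated at (u, v).
    phi = np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi)
    return phi.sum(axis=1) / (xs.size * h_x * h_y)

# Example: evaluate on a regular grid by flattening a meshgrid.
gx, gy = np.meshgrid(np.linspace(-8.0, 8.0, 161), np.linspace(-8.0, 8.0, 161))
density = bivariate_kde(gx.ravel(), gy.ravel(), [0.0, 1.0], [0.0, -1.0], 0.7, 1.2)
density = density.reshape(gx.shape)
```
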

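Under mild regularity assumptions about *f*(*x*,*y*), the mean
integrated squared error (MISE) of $\hat{f}(x,y)$ is

$$\mathrm{MISE}(h_X,h_Y) = E \int\!\!\int \left( \hat{f} - f \right)^2 dx\,dy = \frac{1}{4\pi n h_X h_Y} + \frac{h_X^4}{4}\,\psi_X + \frac{h_X^2 h_Y^2}{2}\,\psi_{XY} + \frac{h_Y^4}{4}\,\psi_Y + o\left( h_X^4 + h_Y^4 + \frac{1}{n h_X h_Y} \right)$$

as $h_X \to 0$, $h_Y \to 0$, and $n h_X h_Y \to \infty$, where

$$\psi_X = \int\!\!\int \left( \frac{\partial^2 f}{\partial x^2} \right)^2 dx\,dy, \quad \psi_Y = \int\!\!\int \left( \frac{\partial^2 f}{\partial y^2} \right)^2 dx\,dy, \quad \psi_{XY} = \int\!\!\int \frac{\partial^2 f}{\partial x^2}\, \frac{\partial^2 f}{\partial y^2}\, dx\,dy$$

Now set

$$\mathrm{AMISE}(h_X,h_Y) = \frac{1}{4\pi n h_X h_Y} + \frac{h_X^4}{4}\,\psi_X + \frac{h_X^2 h_Y^2}{2}\,\psi_{XY} + \frac{h_Y^4}{4}\,\psi_Y$$

which is the asymptotic mean integrated squared error. For fixed
*n*, this has minimum at (*h*_{AMISE_X}, *h*_{AMISE_Y}) defined as

$$h_{\mathrm{AMISE}_X} = \left[ \frac{\psi_Y^{3/4}}{4\pi n\, \psi_X^{3/4} \left( \psi_X^{1/2} \psi_Y^{1/2} + \psi_{XY} \right)} \right]^{1/6}$$

and

$$h_{\mathrm{AMISE}_Y} = \left[ \frac{\psi_X^{3/4}}{4\pi n\, \psi_Y^{3/4} \left( \psi_X^{1/2} \psi_Y^{1/2} + \psi_{XY} \right)} \right]^{1/6}$$
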
These are the optimal asymptotic bandwidths in the sense that they
minimize the asymptotic MISE. However, as in the univariate case, these
expressions contain the second derivatives of the unknown density
*f* being estimated, and so approximations are required. Refer to
Wand and Jones (1993) for further details.
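
The minimization can be checked numerically. The sketch below assumes
the asymptotic MISE has the standard form for a product Gaussian kernel
with bandwidths $h_X$ and $h_Y$, including the cross term in
$h_X^2 h_Y^2$; the names `psi_x`, `psi_y`, and `psi_xy` are hypothetical
labels for the integrals of the squared second partial derivative in
*x*, the squared second partial derivative in *y*, and their product.
As an illustration (not necessarily the approximation recommended by
Wand and Jones), the unknown density is replaced by an uncorrelated
bivariate normal density, for which the closed-form minimizer reduces
to $\sigma_X n^{-1/6}$ and $\sigma_Y n^{-1/6}$.

```python
import numpy as np

def amise(h_x, h_y, n, psi_x, psi_y, psi_xy):
    """Assumed asymptotic MISE for a product Gaussian kernel."""
    return (1.0 / (4.0 * np.pi * n * h_x * h_y)
            + 0.25 * h_x ** 4 * psi_x
            + 0.5 * h_x ** 2 * h_y ** 2 * psi_xy
            + 0.25 * h_y ** 4 * psi_y)

def amise_bandwidths(n, psi_x, psi_y, psi_xy):
    """Closed-form minimizer of amise() over h_x > 0 and h_y > 0."""
    common = 4.0 * np.pi * n * (np.sqrt(psi_x * psi_y) + psi_xy)
    h_x = ((psi_y / psi_x) ** 0.75 / common) ** (1.0 / 6.0)
    h_y = ((psi_x / psi_y) ** 0.75 / common) ** (1.0 / 6.0)
    return h_x, h_y

def normal_reference_functionals(sigma_x, sigma_y):
    """psi values when f is an uncorrelated bivariate normal density."""
    c = 16.0 * np.pi
    psi_x = 3.0 / (c * sigma_x ** 5 * sigma_y)
    psi_y = 3.0 / (c * sigma_x * sigma_y ** 5)
    psi_xy = 1.0 / (c * sigma_x ** 3 * sigma_y ** 3)
    return psi_x, psi_y, psi_xy

# Example: n = 500 observations, standard deviations 2.0 and 0.5.
psi = normal_reference_functionals(2.0, 0.5)
h_x, h_y = amise_bandwidths(500, *psi)
```

In practice the standard deviations would themselves be estimated from
the sample, so the resulting bandwidths are only as good as the normal
reference assumption.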

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.