In this post, I describe some thoughts on (the proofs and related ideas behind) the Weierstrass approximation theorem.

**1. Revisited/filtered thoughts after first trying to prove it myself**

First instinct is to use Lagrange interpolation. Runge’s phenomenon says equally spaced nodes are bad for this. More generally, even smarter choices like Chebyshev nodes can fail for general continuous functions (no fixed array of nodes works for every continuous function — this is Faber’s theorem). See comments here for some intuition: high degree means greater oscillations in between nodes, as we’ve only controlled the nodes perfectly and it’s thus hard to bound stuff between nodes. (On the other hand, I don’t see good intuition a priori why something like Chebyshev nodes shouldn’t work; **it’s just more plausible that it won’t work** than for a “smoother/more-averaged-out” approximation. In fact Wikipedia says all absolutely continuous functions are handled fine by Chebyshev nodes, so… .)
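Here’s a quick sketch of that contrast (mine, not from the linked discussion), interpolating the classic Runge example $f(x) = 1/(1+25x^2)$ at equispaced vs. Chebyshev nodes; scipy’s `BarycentricInterpolator` is used just to do the interpolation stably:

```python
# Sketch: Runge's phenomenon for equispaced nodes vs. Chebyshev nodes,
# on the classic example f(x) = 1/(1 + 25 x^2) over [-1, 1].
import numpy as np
from scipy.interpolate import BarycentricInterpolator

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
xs = np.linspace(-1.0, 1.0, 4001)                 # grid for estimating the sup-norm error

for n in (10, 20, 40):
    equi = np.linspace(-1.0, 1.0, n + 1)
    cheb = np.cos(np.arange(n + 1) * np.pi / n)   # Chebyshev points (second kind)
    err_equi = np.max(np.abs(BarycentricInterpolator(equi, f(equi))(xs) - f(xs)))
    err_cheb = np.max(np.abs(BarycentricInterpolator(cheb, f(cheb))(xs) - f(xs)))
    print(f"n={n:2d}  equispaced: {err_equi:.2e}   Chebyshev: {err_cheb:.2e}")
# Equispaced errors blow up with n; Chebyshev errors shrink here (this f is smooth,
# so Chebyshev interpolation converges -- the subtlety is about general continuous f).
```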

Let’s do a continuous/smoother version of Lagrange interpolation instead. Analogous to Lagrange interpolation polynomials $\ell_i$ (with $\ell_i(x_i) = 1$ and $\ell_i(x_j) = 0$ for $j \ne i$), we want “interpolation kernels” $K_y(x)$ that peak when $x = \sigma(y)$ for some bijection $\sigma$ (and quickly decay as $x$ leaves $\sigma(y)$); for simplicity we just use $\sigma = \mathrm{id}$ and $K_y(x) = Q_n(x - y)$, but perhaps in other contexts we’d need to do something more complicated? We’ll then take $P_n(x) = \int f(y)\,Q_n(x - y)\,dy$; this is apparently related to blurring (convolution); actually, this has some issues with endpoints, but if we first reduce to $f$ continuous on $[0,1]$ with $f(0) = f(1) = 0$, extended by zero (or write out stuff more carefully), then we’re good to go. Anyway $Q_n(t) = c_n(1 - t^2)^n$ works here, where $n$ is a positive integer and $c_n$ is just a scale factor to make $\int_{-1}^{1} Q_n = 1$, with an easy bound of $c_n < \sqrt{n}$ (so $Q_n(t) \le \sqrt{n}(1 - \delta^2)^n \to 0$ uniformly on $\delta \le |t| \le 1$). The rest is not hard, looking at $|P_n(x) - f(x)| \le \int |f(x - t) - f(x)|\,Q_n(t)\,dt$ and using a uniform continuity bound to bound $|f(x - t) - f(x)|$ when $|t|$ is small. (We need uniform continuity since the same bound will apply independently of $x$.)
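To make the convolution concrete, here’s a small numerical sketch (mine; `landau_approx` and the test function are just placeholders), assuming we’ve already reduced to $f$ continuous on $[0,1]$ with $f(0) = f(1) = 0$:

```python
# Sketch: P_n(x) = \int_0^1 f(y) * c_n * (1 - (x - y)^2)^n dy, evaluated numerically,
# assuming f is continuous on [0, 1] with f(0) = f(1) = 0 (extended by zero).
import numpy as np

def landau_approx(f, n, xs, m=20000):
    ys = (np.arange(m) + 0.5) / m                  # midpoint grid on [0, 1]
    ts = np.linspace(-1.0, 1.0, 2 * m + 1)
    c_n = m / np.sum((1.0 - ts**2) ** n)           # normalizes \int_{-1}^{1} Q_n to ~1
    fy = f(ys)
    out = np.empty_like(xs)
    for i, x in enumerate(xs):
        out[i] = np.sum(fy * c_n * (1.0 - (x - ys) ** 2) ** n) / m
    return out

f = lambda x: np.sin(np.pi * x)                    # f(0) = f(1) = 0
xs = np.linspace(0.0, 1.0, 401)
for n in (10, 50, 200):
    print(n, np.max(np.abs(landau_approx(f, n, xs) - f(xs))))
# the sup-norm error shrinks as n grows (the kernel's width is roughly n^{-1/2})
```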

**2. Probabilistic proofs**

The probabilistic proof with Bernstein polynomials $B_n f(x) = \sum_{k=0}^{n} f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$ is interesting, and I don’t have great intuition for why this works while the Lagrange interpolation doesn’t, except that the term $\binom{n}{k}x^k(1-x)^{n-k}$ has a “smooth/well-behaved contribution” that peaks when $x \approx k/n$. (Compared with Lagrange, where the contribution of $\ell_i$ is $1$ at $x_i$, $0$ at the other nodes, and perhaps unpredictable elsewhere. That probably makes stuff between the nodes particularly hard to control, given that our main bound will probably come from uniform continuity of $f$ around $x$.)
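For reference, $B_n f(x)$ is exactly $\mathbb{E}[f(S_n/n)]$ with $S_n \sim \mathrm{Binomial}(n, x)$, and a few lines of numpy (mine, not from the post) show the slow-but-uniform convergence:

```python
# Sketch: Bernstein polynomial B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k),
# i.e. E[f(S_n / n)] for S_n ~ Binomial(n, x).
import numpy as np
from math import comb

def bernstein(f, n, xs):
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    xs = np.asarray(xs, dtype=float)
    basis = binom * xs[:, None] ** k * (1.0 - xs[:, None]) ** (n - k)
    return basis @ f(k / n)

f = lambda x: np.abs(x - 0.5)                      # continuous but not smooth
xs = np.linspace(0.0, 1.0, 501)
for n in (10, 100, 1000):
    print(n, np.max(np.abs(bernstein(f, n, xs) - f(xs))))
# uniform error -> 0, though slowly (roughly n^{-1/2} near the kink)
```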

**3. A more careful approach (which extends to other contexts, like trigonometric polynomials; Stone’s generalization)**

Another approach: it suffices to approximate the absolute value function on $[-1, 1]$, by the following two links. (Roughly: once polynomials can approximate $|x|$, they can approximate $\max(f, g) = \frac{f + g + |f - g|}{2}$ and $\min(f, g)$, and from there piecewise-linear functions and hence arbitrary continuous functions; this is also the key lattice trick in Stone’s generalization.)
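A tiny sketch (mine) of why this reduction helps in practice: given any decent polynomial approximation $q$ of $|x|$ on $[-1,1]$, the combination $(f + g + q(f - g))/2$ approximates $\max(f, g)$ wherever $|f - g| \le 1$:

```python
# Sketch: a polynomial approximation q of |x| on [-1, 1] turns into an
# approximation of max(f, g) via max(f, g) = (f + g + |f - g|) / 2.
import numpy as np

ts = np.linspace(-1.0, 1.0, 2001)
q = np.polynomial.Chebyshev.fit(ts, np.abs(ts), deg=30)   # any decent poly approx of |x|

xs = np.linspace(0.0, 1.0, 501)
f, g = xs, 1.0 - xs                                       # here |f - g| <= 1 on [0, 1]
approx_max = (f + g + q(f - g)) / 2
print(np.max(np.abs(approx_max - np.maximum(f, g))))      # small (limited by how good q is)
```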

By the polynomial interpolation Wikipedia link above though, we should be able to use Chebyshev nodes to do this. Fix $n$. Let $x_k = \cos(k\pi/n)$ for $0 \le k \le n$, so $|x_k| = x_k$ for $k \le n/2$ and $|x_k| = -x_k$ for $k \ge n/2$. Then $W(x) := \prod_{k=0}^{n}(x - x_k) = 2^{1-n}(x^2 - 1)U_{n-1}(x)$ with $U_{n-1}(\cos\theta) = \sin(n\theta)/\sin\theta$, so $|W(\cos\theta)| = 2^{1-n}|\sin\theta\,\sin(n\theta)| \le 2^{1-n}$ on $[-1, 1]$.

But (by differentiating or using roots of unity) we have $W'(x_k) = 2^{1-n}\,n\,(-1)^k$ if $0 < k < n$, and $W'(x_k) = 2^{1-n}\,2n\,(-1)^k$ otherwise. So letting $p$ be the degree-$\le n$ interpolation of $|x|$ at the $x_k$, and noting that $x - p(x)$ has degree $\le n$ and vanishes at every node with $x_k \ge 0$, we have

$$x - p(x) = \sum_{k > n/2} 2x_k\,\frac{W(x)}{W'(x_k)\,(x - x_k)} = \frac{W(x)}{2^{1-n}\,n} \sum_{k > n/2} \epsilon_k\,(-1)^k\,\frac{2x_k}{x - x_k}, \qquad \epsilon_k = \begin{cases} 1 & k < n, \\ \tfrac12 & k = n. \end{cases}$$

For $k > n/2$ we have $x_k < 0$, so for fixed $x \in [0, 1]$, the factor $\frac{2x_k}{x - x_k}$ is a negative number with increasing magnitude as $k$ increases in $(n/2, n]$. So (thanks to the $(-1)^k$) we have an alternating series, which bounds stuff by

$$|x - p(x)| \le \frac{|W(x)|}{2^{1-n}\,n}\left(\frac{1}{1 + x} + 2\right) \le \frac{3}{n}$$

for $x \in [0, 1]$ (the $k = n$ term contributes at most $\frac{1}{1+x} \le 1$, and the remaining alternating sum is bounded by its largest term, which is at most $2$). Of course by symmetry we get the same bound for $x \in [-1, 0]$, so $\big|\,|x| - p(x)\,\big| \le 3/n$ for all $x \in [-1, 1]$, and we’re done.

(Actually note that these are Chebyshev nodes of the *second* kind, but at least intuitively, it shouldn’t make a huge difference…)
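As a sanity check of the $O(1/n)$ bound (my sketch, again leaning on scipy’s `BarycentricInterpolator`):

```python
# Sanity check: interpolate |x| at x_k = cos(k*pi/n) and measure the sup-norm error.
import numpy as np
from scipy.interpolate import BarycentricInterpolator

xs = np.linspace(-1.0, 1.0, 20001)
for n in (8, 16, 32, 64, 128):
    nodes = np.cos(np.arange(n + 1) * np.pi / n)   # Chebyshev points of the second kind
    p = BarycentricInterpolator(nodes, np.abs(nodes))
    err = np.max(np.abs(p(xs) - np.abs(xs)))
    print(f"n={n:3d}   sup error = {err:.4f}   n * error = {n * err:.3f}")
# n * error stays bounded, consistent with the O(1/n) bound above
```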

**4. More related reading**

- Wikipedia on approximation theory, and a MathOverflow question on approximating polynomials related to Hermite interpolation.
- Chebyshev equioscillation theorem, and theorem/reference about how well Chebyshev can do in general: see here.
- Rational functions are way better at approximating: link.

After taking 18.103 (http://math.mit.edu/~lguth/Math103.html): I think the Fourier analysis perspective is pretty nice (possibly even better), using a “good” kernel like the Fejér kernel (http://blogs.ethz.ch/kowalski/smoothing-sums-wiki-page/).
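Roughly what that looks like numerically (my sketch: Fourier coefficients by a Riemann sum, then the Fejér/Cesàro means $\sigma_N f = \sum_{|k| < N} (1 - |k|/N)\,\hat f(k)\,e^{ik\theta}$):

```python
# Sketch: Fejer (Cesaro) means of the Fourier series converge uniformly
# to a continuous 2*pi-periodic f (Fejer's theorem).
import numpy as np

def fejer_mean(f, N, thetas, m=8192):
    grid = 2.0 * np.pi * np.arange(m) / m
    fg = f(grid)
    ks = np.arange(-N + 1, N)
    # Fourier coefficients c_k = (1/2pi) \int_0^{2pi} f(t) e^{-ikt} dt, by a Riemann sum
    c = np.array([np.mean(fg * np.exp(-1j * k * grid)) for k in ks])
    w = 1.0 - np.abs(ks) / N                      # Fejer weights
    phases = np.exp(1j * np.outer(thetas, ks))    # e^{i k theta}
    return np.real(phases @ (w * c))

f = lambda t: np.abs(np.sin(t)) + 0.3 * np.cos(3 * t)   # continuous, 2*pi-periodic
thetas = np.linspace(0.0, 2.0 * np.pi, 1001)
for N in (8, 32, 128):
    print(N, np.max(np.abs(fejer_mean(f, N, thetas) - f(thetas))))
# the sup-norm error decreases with N
```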
