# Weierstrass approximation theorem

In this post, I describe some thoughts on (the proofs and related ideas behind) the Weierstrass approximation theorem.

1. Revisited/filtered thoughts after first trying to prove it myself

First instinct is to use Lagrange interpolation. Runge’s phenomenon says equally spaced nodes are bad for this, and even smarter choices like Chebyshev nodes are not guaranteed to work for every continuous function. See comments here for some intuition: high degree means greater oscillation between nodes, since we’ve only controlled the values at the nodes perfectly and it’s thus hard to bound anything in between. (On the other hand, I don’t see good a priori intuition for why something like Chebyshev nodes shouldn’t work; it’s just more plausible that they fail than that a “smoother/more-averaged-out” approximation does. In fact the Wikipedia article says Chebyshev nodes work for all absolutely continuous functions, so… .)

Let’s do a continuous/smoother version of Lagrange interpolation instead. Analogous to Lagrange interpolation polynomials (${p_k(x)}$ with ${p_k(x_k) = 1}$ and ${p_k(x_j) = 0}$ for ${j\ne k}$), we want interpolation polynomials ${p_t(x)}$ such that ${p_t(x) \approx 1}$ when ${x \approx x_t}$ for some bijection ${t\rightarrow x_t}$ (and ${p_t(x)}$ decays quickly as ${x}$ leaves ${x_t}$); for simplicity we just use ${x_t := t}$, though perhaps other contexts would require something more complicated. We’ll then take ${P(x) = \int_0^1 f(t) p_t(x)\, dt}$; this is essentially a convolution, i.e. blurring. This has some issues at the endpoints, but if we first reduce to the case ${f(0) = f(1) = 0}$ (or write things out more carefully) then we’re good to go. Here ${p_t(x) = c_n(1-(x-t)^2)^n}$ works, where ${n}$ is a fixed positive integer and ${c_n}$ is a normalizing constant chosen so that ${c_n \int_{-1}^1 (1-u^2)^n \,du = 1}$; an easy bound is ${c_n < \sqrt{n}}$. The rest is not hard: look at ${\int_{-1+x}^{1+x} f(x)p_t(x)\, dt - \int_0^1 f(t)p_t(x)\, dt}$ and use a uniform continuity bound on ${\lvert{f(t)-f(x)}\rvert}$ when ${\lvert t-x\rvert}$ is small. (We need uniform continuity since the same bound must hold independently of ${x}$.)
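
A quick numerical sketch of this construction (a minimal illustration only; the test function, grid sizes, and trapezoid helper are my own choices, not from the post):

```python
import numpy as np

def trap(y, x):
    """Trapezoid rule, written out so we don't depend on a numpy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def landau_approx(f, n, xs, m=20001):
    """P(x) = int_0^1 f(t) p_t(x) dt with p_t(x) = c_n (1 - (x - t)^2)^n,
    where c_n normalizes the kernel: c_n * int_{-1}^{1} (1 - u^2)^n du = 1."""
    u = np.linspace(-1.0, 1.0, m)
    c_n = 1.0 / trap((1.0 - u**2) ** n, u)
    t = np.linspace(0.0, 1.0, m)
    ft = f(t)
    return np.array([trap(ft * c_n * (1.0 - (x - t) ** 2) ** n, t) for x in xs])

# choose f with f(0) = f(1) = 0, sidestepping the endpoint issue mentioned above
f = lambda t: t * (1.0 - t) * np.sin(5.0 * t)
xs = np.linspace(0.0, 1.0, 201)
err = np.max(np.abs(landau_approx(f, 60, xs) - f(xs)))   # uniform error on [0,1]
```

The uniform error shrinks as ${n}$ grows, though slowly: the kernel has width on the order of ${1/\sqrt{n}}$, so this proof trades sharpness for simplicity.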

2. Probabilistic proofs

The probabilistic proof with Bernstein polynomials is interesting, and I don’t have great intuition for why it works while Lagrange interpolation doesn’t, except that each ${f(i/n) [\binom{n}{i} x^i(1-x)^{n-i}]}$ term contributes a smooth, well-behaved bump that peaks when ${x\approx i/n}$. (Compare with Lagrange, where the contribution is ${1}$ at ${x_i}$, ${0}$ at the ${x_{j\ne i}}$, and potentially unpredictable elsewhere. That probably makes the region around ${[x_{i-1},x_{i+1}]}$ particularly hard to control, given that our main bound will probably come from uniform continuity of ${f}$ near ${x_i}$.)
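
As a sanity check, here is a minimal implementation of the Bernstein polynomial ${B_n f(x) = \sum_i f(i/n)\binom{n}{i}x^i(1-x)^{n-i}}$ (the test function and the degrees are arbitrary choices of mine):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """B_n f(x) = E[f(S/n)] for S ~ Binomial(n, x): an average of sampled
    values of f weighted by smooth bumps peaking near x = i/n."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(n + 1):
        out += f(i / n) * comb(n, i) * x**i * (1.0 - x) ** (n - i)
    return out

f = lambda t: np.abs(t - 0.5)          # continuous but not differentiable
xs = np.linspace(0.0, 1.0, 101)
errs = [float(np.max(np.abs(bernstein(f, n, xs) - f(xs)))) for n in (10, 40, 160)]
```

The uniform error decays like ${1/\sqrt{n}}$ here, consistent with the binomial standard deviation ${\sqrt{x(1-x)/n}}$ driving the probabilistic argument.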

3. A more careful approach (which extends to other contexts, like trigonometric polynomials; Stone’s generalization)

Another approach: it suffices to approximate the absolute value function on ${[-1,1]}$. (This is the standard reduction behind Stone’s theorem: once ${\lvert x\rvert}$ is approximable, so are ${\max}$ and ${\min}$ via ${\max(a,b) = \frac{a+b+\lvert a-b\rvert}{2}}$, and a lattice argument finishes.)

By the polynomial interpolation Wikipedia link above though, we should be able to use Chebyshev nodes to do this. Fix ${n\ge1}$. Let ${t_k = \cos{k\pi /n}}$ for ${0\le k\le n}$, so ${t_k \ge 0}$ for ${k \le n/2}$ and ${t_k \le 0}$ for ${k \ge n/2}$. Then ${g(x) := \prod_{k=0}^{n} (x-t_k) = \frac{\sqrt{x^2-1}}{2^n}[(x+\sqrt{x^2-1})^n - (x-\sqrt{x^2-1})^n]}$ with ${g(\cos\alpha) = 2^{-n}(i\sin\alpha)(2i\sin(n\alpha))}$, so ${\lvert{g(x)}\rvert \le 2^{-(n-1)}}$ on ${[-1,1]}$.
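
A quick numerical check of this closed form (the degree ${n=7}$ and the grid are arbitrary choices of mine): writing ${x = \cos\alpha}$, the formula above simplifies to ${g(\cos\alpha) = -2^{-(n-1)}\sin\alpha\,\sin(n\alpha)}$.

```python
import numpy as np

n = 7
t = np.cos(np.arange(n + 1) * np.pi / n)       # nodes t_k = cos(k pi / n)
xs = np.linspace(-1.0, 1.0, 2001)

# g(x) = prod_k (x - t_k), computed straight from the roots
g = np.prod(xs[:, None] - t[None, :], axis=1)

# closed form: with x = cos(a), g = -2^{-(n-1)} sin(a) sin(n a)
a = np.arccos(xs)
closed = -(2.0 ** -(n - 1)) * np.sin(a) * np.sin(n * a)

max_diff = float(np.max(np.abs(g - closed)))   # should be near machine epsilon
max_g = float(np.max(np.abs(g)))               # should be <= 2^{-(n-1)}
```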

But (by differentiating or using roots of unity) we have ${\prod_{j\ne k}(t_k-t_j) = (-1)^k n2^{-(n-1)}}$ if ${0 < k < n}$, and ${\prod_{j\ne k}(t_k-t_j) = (-1)^k n 2^{-(n-2)}}$ for ${k \in \{0, n\}}$. So letting ${P_n(x) = \sum_{k=0}^{n} \lvert{t_k}\rvert \prod_{j\ne k} \frac{x-t_j}{t_k-t_j}}$ and noting that ${x = \sum_{k=0}^{n} t_k \prod_{j\ne k} \frac{x-t_j}{t_k-t_j}}$ (interpolation at ${n+1 \ge 2}$ points reproduces the degree-${1}$ polynomial ${x}$ exactly), we have

$\displaystyle \frac{x - P_n(x)}{2} = g(x)\left(\frac{t_n}{x-t_n}\frac{2^{n-2}}{(-1)^n n} + \sum_{n/2 < k \le n-1} \frac{t_k}{x-t_k}\frac{2^{n-1}}{(-1)^k n}\right).$

For ${k>n/2}$ we have ${t_k<0}$, so for fixed ${x\ge0}$, each ${\frac{t_k}{x-t_k}}$ lies in ${[-1,0)}$, with magnitude increasing as ${k}$ increases in ${(n/2,n]}$. Together with the ${(-1)^k}$ factors, the sum over ${n/2 < k \le n-1}$ is alternating, so it is bounded in absolute value by its largest term, ${2^{n-1}/n}$; adding the ${k=n}$ term (of magnitude at most ${2^{n-2}/n}$) bounds everything by

$\displaystyle \frac{\lvert{x-P_n(x)}\rvert}{2} \le \lvert{g(x)}\rvert \frac{2^{n-2}}{n} (1 + 2\cdot 1) \le \frac{3}{2n}$

for ${x\ge0}$. Of course by symmetry we get the same bound for ${x\le0}$, so ${\lvert P_n(x) - \lvert{x}\rvert \rvert \le 3/n}$ for all ${x\in[-1,1]}$, and we’re done.

(Actually note that these are Chebyshev nodes of the second kind, but at least intuitively, it shouldn’t make a huge difference…)
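
Putting the section together, here is a hedged numerical check of the ${3/n}$ bound, evaluating ${P_n}$ directly in Lagrange form (the grid and the degrees tested are my own choices):

```python
import numpy as np

def P(n, x):
    """Lagrange-interpolate |t| at the nodes t_k = cos(k pi / n), k = 0..n."""
    t = np.cos(np.arange(n + 1) * np.pi / n)
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for k in range(n + 1):
        others = np.delete(t, k)
        # basis polynomial L_k(x) = prod_{j != k} (x - t_j) / (t_k - t_j)
        Lk = np.prod((x[:, None] - others[None, :]) / (t[k] - others), axis=1)
        out += abs(t[k]) * Lk
    return out

xs = np.linspace(-1.0, 1.0, 1001)
errors = {n: float(np.max(np.abs(P(n, xs) - np.abs(xs)))) for n in (4, 8, 16, 32)}
```

In practice the observed uniform error sits comfortably inside the ${3/n}$ guarantee from the argument above.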