Weierstrass approximation theorem

In this post, I describe some thoughts on (the proofs and related ideas behind) the Weierstrass approximation theorem.

1. Revisited/filtered thoughts after first trying to prove it myself

First instinct is to use Lagrange interpolation. Runge’s phenomenon says equally spaced nodes are bad for this, and even smarter choices like Chebyshev nodes fail for some merely continuous functions. See comments here for some intuition: high degree means greater oscillation between nodes, as we’ve only controlled the values at the nodes perfectly and it’s thus hard to bound anything in between. (On the other hand, I don’t see good a priori intuition for why something like Chebyshev nodes shouldn’t work; it’s just more plausible that they fail than that a “smoother/more-averaged-out” approximation does. In fact Wikipedia says Chebyshev interpolation works for all absolutely continuous functions, so… .)

Let’s do a continuous/smoother version of Lagrange interpolation instead. Analogous to Lagrange interpolation polynomials ({p_k(x)} with {p_k(x_k) = 1} and {p_k(x_j) = 0} for {j\ne k}), we want interpolation polynomials {p_t(x)} such that {p_t(x) \approx 1} when {x \approx x_t} for some bijection {t\rightarrow x_t} (and {p_t(x)} decays quickly as {x} leaves {x_t}); for simplicity we just use {x_t := t}, though perhaps in other contexts we’d need something more complicated. We then take {P(x) = \int_0^1 f(t) p_t(x) dt}; this is essentially a convolution, i.e. blurring. There are some issues at the endpoints, but if we first reduce to the case {f(0) = f(1) = 0} (or write things out more carefully) then we’re good to go. Here {p_t(x) = c_n(1-(x-t)^2)^n} works, where {n} is a fixed positive integer and {c_n} is a scale factor making {c_n \int_{-1}^1 (1-u^2)^n du = 1}, with the easy bound {c_n < \sqrt{n}}. The rest is not hard: compare {\int_{-1+x}^{1+x} f(x)p_t(x) dt} with {\int_0^1 f(t)p_t(x) dt}, using a uniform continuity bound on {\lvert{f(t)-f(x)}\rvert} when {t-x} is small. (We need uniform continuity since the same bound must hold independently of {x}.)
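As a sanity check, here is a small numerical sketch of this construction (the kernel {c_n(1-(x-t)^2)^n} is sometimes called the Landau kernel). The test function, grid, and parameters below are my own choices for illustration, and the integrals are approximated by midpoint Riemann sums, so this only approximately evaluates the polynomial {P}:

```python
def landau_approx(f, n, xs, steps=2000):
    # Normalize the kernel: c_n * \int_{-1}^{1} (1 - u^2)^n du = 1
    # (midpoint Riemann sum stands in for the exact integral).
    du = 2.0 / steps
    mass = sum((1 - (-1 + (i + 0.5) * du) ** 2) ** n for i in range(steps)) * du
    c_n = 1.0 / mass
    # P(x) = c_n \int_0^1 f(t) (1 - (x - t)^2)^n dt, a polynomial of degree 2n in x.
    dt = 1.0 / steps
    return [c_n * sum(f((i + 0.5) * dt) * (1 - (x - (i + 0.5) * dt) ** 2) ** n * dt
                      for i in range(steps))
            for x in xs]

f = lambda t: t * (1 - t)                     # continuous with f(0) = f(1) = 0
xs = [i / 20 for i in range(21)]
errs = [max(abs(p - f(x)) for x, p in zip(xs, landau_approx(f, n, xs)))
        for n in (5, 20, 80)]
print([round(e, 4) for e in errs])            # uniform error shrinks as n grows
```

The error decays slowly here (the kernel’s width is only on the order of {1/\sqrt{n}}), which matches the uniform-continuity-based bound rather than any fast rate.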

2. Probabilistic proofs

The probabilistic proof with Bernstein polynomials is interesting, and I don’t have great intuition for why it works while Lagrange interpolation doesn’t, except that each {f(i/n) [\binom{n}{i} x^i(1-x)^{n-i}]} term makes a smooth, well-behaved contribution that peaks when {x\approx i/n}. (Compare with Lagrange, where the contribution is {1} at {x_i}, {0} at the {x_{j\ne i}}, and perhaps unpredictable elsewhere. That probably makes the region around {[x_{i-1},x_{i+1}]} particularly hard to control, given that our main bound will probably come from uniform continuity of {f} around {x_i}.)
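To make the probabilistic reading concrete: {B_nf(x) := \sum_i f(i/n)\binom{n}{i}x^i(1-x)^{n-i}} is exactly {\mathbb{E}[f(S_n/n)]} for {S_n \sim \mathrm{Binomial}(n,x)}, which concentrates near {x}. A minimal sketch (the test function and parameters are my own choices for illustration):

```python
from math import comb

def bernstein(f, n, x):
    # B_n f(x) = sum_{i=0}^{n} f(i/n) * C(n, i) * x^i * (1-x)^(n-i)
    #          = E[f(S_n / n)] for S_n ~ Binomial(n, x).
    return sum(f(i / n) * comb(n, i) * x**i * (1 - x) ** (n - i)
               for i in range(n + 1))

f = lambda x: abs(x - 0.5)                    # continuous but not smooth at 1/2
xs = [i / 50 for i in range(51)]
errs = [max(abs(bernstein(f, n, x) - f(x)) for x in xs) for n in (10, 100, 1000)]
print([round(e, 4) for e in errs])            # roughly O(1/sqrt(n)) at the kink
```

At the kink the error is governed by {\mathbb{E}\lvert S_n/n - 1/2\rvert \approx \tfrac12\sqrt{2/(\pi n)}}, so Bernstein approximation is slow but very robust, consistent with the “smooth/well-behaved contribution” picture above.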

3. A more careful approach (which extends to other contexts, like trigonometric polynomials; Stone’s generalization)

Another approach: it suffices to approximate the absolute value function on {[-1,1]}, by the following two links.

By the polynomial interpolation Wikipedia link above, though, we should be able to use Chebyshev nodes to do this. Fix {n\ge1}. Let {t_k = \cos(k\pi /n)} for {0\le k\le n}, so {t_k \ge 0} for {k \le n/2} and {t_k \le 0} for {k \ge n/2}. Then {g(x) := \prod_{k=0}^{n} (x-t_k) = \frac{\sqrt{x^2-1}}{2^n}[(x+\sqrt{x^2-1})^n - (x-\sqrt{x^2-1})^n]}, with {g(\cos\alpha) = 2^{-n}(i\sin\alpha)(2i\sin(n\alpha))}, so {\lvert{g(x)}\rvert \le 2^{-(n-1)}} on {[-1,1]}.

Now (by differentiating {g} or using roots of unity) we have {\prod_{j\ne k}(t_k-t_j) = (-1)^k n2^{-(n-1)}} if {0<k<n}, and {\prod_{j\ne k}(t_k-t_j) = (-1)^k n 2^{-(n-2)}} otherwise. So letting {P_n(x) = \sum_{k=0}^{n} \lvert{t_k}\rvert \prod_{j\ne k} \frac{x-t_j}{t_k-t_j}} and noting that {x = \sum_{k=0}^{n} t_k \prod_{j\ne k} \frac{x-t_j}{t_k-t_j}} (interpolation reproduces degree-{1} polynomials exactly), we have

\displaystyle \frac{x - P_n(x)}{2} = g(x)\left(\frac{t_n}{x-t_n}\frac{2^{n-2}}{(-1)^n n} + \sum_{k > n/2}^{n-1} \frac{t_k}{x-t_k}\frac{2^{n-1}}{(-1)^k n}\right).

For {k>n/2} we have {t_k<0}, so for fixed {x\ge0} we have {-1 \le \frac{t_k}{x-t_k} < 0}, a negative quantity whose magnitude increases as {k} increases in {(n/2,n]}. So the sum above is an alternating series (the signs {(-1)^k} alternate), which lets us bound everything by

\displaystyle \frac{\lvert{x-P_n(x)}\rvert}{2} \le \lvert{g(x)}\rvert \frac{2^{n-2}}{n} (1 + 2\cdot 1) \le \frac{3}{2n}

for {x\ge0}. Of course by symmetry we get the same bound for {x\le0}, so {\lvert P_n(x) - \lvert{x}\rvert \rvert \le 3/n} for all {x\in[-1,1]}, and we’re done.

(Actually note that these are Chebyshev nodes of the second kind, but at least intuitively, it shouldn’t make a huge difference…)
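A quick numerical check of the bound above (node set, sample grid, and tolerance are my own choices; the interpolant is evaluated in the plain Lagrange product form, which is numerically adequate at these sizes):

```python
from math import cos, pi

def cheb_interp_abs(ts, x):
    # Evaluate the Lagrange interpolant of |t| at the nodes ts:
    # P_n(x) = sum_k |t_k| * prod_{j != k} (x - t_j) / (t_k - t_j).
    total = 0.0
    for k, tk in enumerate(ts):
        lk = 1.0
        for j, tj in enumerate(ts):
            if j != k:
                lk *= (x - tj) / (tk - tj)
        total += abs(tk) * lk
    return total

errs = {}
for n in (8, 32, 64):
    ts = [cos(k * pi / n) for k in range(n + 1)]   # t_k = cos(k*pi/n), k = 0..n
    xs = [-1 + i / 200 for i in range(401)]
    errs[n] = max(abs(cheb_interp_abs(ts, x) - abs(x)) for x in xs)
    assert errs[n] <= 3 / n + 1e-9                 # the claimed |P_n(x) - |x|| <= 3/n
    print(n, round(errs[n], 5), "<=", round(3 / n, 5))
```

The observed error sits comfortably inside {3/n}, suggesting the alternating-series step above is fairly lossy but harmless.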

4. More related reading

  1. Wikipedia on approximation theory, and a MathOverflow question on approximating polynomials related to Hermite interpolation.
  2. Chebyshev equioscillation theorem, and theorem/reference about how well Chebyshev can do in general: see here.
  3. Rational functions are way better at approximating: link.
