Topics in Analysis

Warning: there are almost surely errors in this document.


Lecture 1; Introduction and Proof Techniques

Wednesday, January 20th


Proof Techinques

A proof is a formal argument that would convince your enemy. Types of proofs:

  1. Direct
  2. Indirect (assume false, and follow logic until arriving at nonsense)
  3. Induction

The direct and indirect methods have been known since antiquity.

Direct Proofs

Example: Direct Proof

Statement: For all \(x>0\), \(x + \frac{1}{x} \geq 2\).

Example: Flawed Proof

If \(n\) is the largest possible integer, then \(n=1\). Proof. Let \(n\) be the largest possible integer. The square of a \(n\) is greater than or equal to \(n\). However, because \(n\) is the largest possible integer, we then have \(n^2 = n\). It follows that:

\begin{align} n^2 &= n \\ n &= 1 \\ \end{align}

The flaw in the reasoning lies in the statement of what we wish to prove. We decompose the statement into two parts:

  1. There is a largest possible integer.
  2. That integer \(n=1\).

The first part is nonsense. ex false sequitur quodlibet.

Indirect Proofs


  1. We want to show (p).
  2. We assume (p) to be false.
  3. Attempt to derive nonsense.
Example: Indirect Proof

History. This proof was originally presented by Hippasus, a disciple of Pythagorus. Unable to accept that not all numbers can be expressed as integer relations, Pythagorus murdered Hippasus.

We want to show that two is irrational. In order words, \(\sqrt{2} \neq \frac{a}{b}\) for \(a, b \in \mathbb{N}\).

Proof by contradiction.

Step 1. Assume there exists \(a, b \in \mathbb{N}\) such that \(\sqrt{2} = \frac{a}{b}\).

\begin{align} 2 &= \frac{a^2}{b^2} \\ 2b^2 &= a^2 \\ \end{align}

Step 2. We can then assume that either a or b is not divisible by two. If both of them are, then simplify \(\frac{a}{b}\) and start again from the beginning.

Step 3. Note that even numbers have even squares, and odd numbers have odd squares. Because \(a^2 = 2b^2\), \(a^2\) must be even, implying that \(a\) is even as well.

Step 4. From (Step 2), \(b\) must be odd.

Step 5. From (Step 3), we know that \(a\) is even. We rewrite \(a = 2(\frac{a}{2})\), where \(\frac{a}{2}\) is an integer.

Step 6.

\begin{align} 2b^2 &= \left(2\left(\frac{a}{2}\right)\right)^2 \\ &= 4 \left(\frac{a}{2}\right) ^2 \\ b^2 &= 2 \left(\frac{a}{2}\right) ^2 \\ \end{align}

Step 7. Thus, \(b^2\) is even. \(\square\)

Example: Flawed Indirect Proof

Theorem: All numbers are definable in under even words.

Proof: Suppose not. There exists some number \(n\) that cannot be defined in under eleven words.

Let n = smallest number not definable with less than eleven words.

The smallest of these numbers was defined in under eleven words, therefore none exist. If any existed, it would be the smallest by default.

The error in this proof lies in "definable". It is not well-specified, making the entire proof bullshit. The "theorem" is meaningless.

To learn more, read about the verification principle, which was emerged in 1929.

Lecture 2; Induction, function classes, infinity

Friday, January 22nd


An Aside: Epimenides Paradox

Epimenides, a man from Crete, stated: "All Cretans are liars." The issue of his statement being self-referential is immediately clear. Centuries later, mathematics would wrestle with this idea after the introduction of set theory.

It is worth noting, however, that in the case of Epimenides, the paradox is resolved in Titus 1:12-13.

12 One of themselves, a prophet of their own, said "Cretans are always liars, evil beasts, lazy gluttons."
13 This testimony is true. [...]

Proofs: A Review

Recall the three types of proofs introduced last lecture:

Mathematical Induction

P(n) denotes a statement about a number p. For example:

P(n): n is divisible by 2.

Suppose we want to prove a statement about all n. Induction is an excellent tool for this task. There are two major steps:

  1. Base case: Show P(1) is true
  2. Inductive step: Show that \(\forall n \in \mathbb{N}\), \(P(n) \Rightarrow P(n+1)\)

This allows us to start from a simple case and move forward. More importantly, we are allowed to assume truth for some n, and move forward just one step to n+1.

Example: Bernoulli's Inequality

Bernoulli's Inequality places an upper bound on \((1+x)^n\).

\[ (1+x)^n \geq 1 + nx \]

Claim: P(n): \(\forall\) \(x \geq 0\), \((1+x)^n \geq 1 + nx\).

Base case. For all \(x \geq 0\), \((1+x)^1 \geq 1 + nx\).

Inductive step. Assuming P(n) is true, show that this implies P(n+1) is also true.

\begin{align} (1+x)^{n+1} &= (1+x)(1+x)^n \\ &\geq (1+x)(1+nx) \\ &= 1+x+nx+nx^2 \\ &\geq 1+(n+1)x \qquad\qquad\qquad\qquad \square \end{align}

Note that in the third line of the proof, we used the fact that \(nx^2\) is always possible for \(n \in \mathbb{N}\).

Functions and function classes

Definition: A function is an assignment rule from set to set; \(f: A \rightarrow B\), for every \(a \in A\), \(f(a) \in B\).


\begin{align} f(\mathbb{R})\,\,\, &\rightarrow \mathbb{R} \\ f(\mathbb{R}^n) &\rightarrow \mathbb{R} \\ f(\mathbb{C})\,\,\, &\rightarrow \mathbb{C} \\ f(\mathbb{R}^2) &\rightarrow \{0, 1\} \\ \end{align}

There are three main function classes:

Definition: injective. A function \(f\) is injective if:

\[ a_1 \neq a_2 \Rightarrow f(a_1) \neq f(a_2) \]

For example, \(x \rightarrow x^3\) is injective while \(x \rightarrow x^4\) is not.

Definition: surjective. A function \(f\) is surjective if for every \(b \in B\) \(\exists\) \(a \in A\) such that \(f(a) = b\). That is, "you cover everything".

Definition: bijective. A function is bijective if it is both injective and surjective.


The notion of "infinity" does not exist in math; it is simply shorthand for "larger than any number.

For example, saying that there are "infinitely many primes" simply means that for any finite number \(n \in \mathbb{N}\), it can be shown that there are \(n+1\) primes.

Remark. If I have two finite sets A, B, then they have the same number of elements if and only if there is a bijection.

Definition. Two sets have the same cardinality if there exists a bijection \(f: A \rightarrow B\). Note that here we used "if" instead of "iff" because it is a definition.

Galileo's Squares

Galileo wrote a list of positive integers and their squares. Note that there is a bijection between these two sets.

\[ f: \mathbb{N} \rightarrow \{n^2 | n \in \mathbb{N}\} \]

However, the set of squares is a strict subset of the set of natural numbers. So how could be the same size?

To answer this question, we must look towards Georg Cantor, who single-handedly established set theory.

Lecture 3; Countably infinite sets


Infinity and sets

Definition. Two sets A, B have the same cardinality if there exists a bijection \(f: A \rightarrow B\).

Definition. A set is countable if it has the same cardinality as \(\mathbb{N} = \{1, 2, ...\}\).


Note that the last three examples are strict subsets of the natural numbers, yet they have the same cardinality as the natural numbers.

Remark. The last two examples comprise Hilbert's Hotel.

Hilbert's Hotel

Mappings of countable sets

Statement. If A, B are both countable, then there exists a bijection between the two \(h: A \rightarrow B\).

insert drawing number 1 from notes here

Define a bijection \(f: A \rightarrow \mathbb{N}\) to the natural numbers. In doing so, we have 'numbered' each element of the set \(A\). Then, define a bijection \(g: B \rightarrow \mathbb{N}\) which maps the set \(B\) to the natural numbers. By doing this, we can refer to the elements in \(A\) and \(B\) as, for example, 'the element mapped to 17'. With these two functions, we can map from set \(A\) to \(B\) with \(g^{-1} \circ f\), where \(g^{-1}: \mathbb{N} \rightarrow B\) mapps from the natural numbers to \(B\).

Remark. Note that like \(g\), \(g^{-1}\) is also bijective.

In summary:

General remarks
Higher dimensions

Sets occupying higher dimensions are countable as well, even though the infinity is of "higher dimensionality". Take \(\mathbb{Z}^2\), for example.

Claim: There is a bijection between \(\mathbb{Z}^2\) and \(\mathbb{N}\).

To prove this is true, we just need to create a mapping rule. Do so by starting at the origin (or any arbitrary point) and draw a spiral outwards.

Smallest infinity

Are countable sets the smallest type of infinity can find?

Theoroem. If \(S\) is a countable set, and \(T \subset S\) is infinite, then there is a bijection \(f: S \rightarrow T\). Thus, they are the same "size" of infinity.

Seeing this is easy. Create a mapping \(g: S \rightarrow \mathbb{N}\). Use \(g^{-1}\) to map the elements of \(S\) to the natural numbers. Then, create a new mapping \(h\) from the natural numbers to \(T \subset S\).

Cantor's Diagonal Argument

Claim: The set of real numbers \((0, 1)\) is not countable.

Proof. By way of contradiction, assume that there exists a bijective mapping \(f: \mathbb{N} \rightarrow (0, 1)\).

If it were bijective, we would be able to list every real number. We just need to find something not in the listing.

Remark. Decimal expansions are not unique (why is this important?)

Cantor's argument: create a list of real numbers. Then, go down along the diagonal (take the number in the tenth's place for the first number, hundredth's place for the second number, ...). Replace every digit along the diagonal with another digit that is not 9. Use these digits to create a real number.

This new number is not in the list. It can't be the first, can't be the second... \(\square\)

As we can see, we now have two types of infinity:

Is there an infinity "in between"? The answer to this question is independent of math; the question is ill-posed. Paul Cohen received the Field's Medal in 1966 for this result.

Lecture 4; Set theory, sequences

Wednesday, January 27th

Cantor's diagonal argument, rehashed

Here we present a proof that follows the style of Cantor's diagonal argument from last lecture, but leads to a different result.

Theorem. If \(A\) is a countable set, then the set of all subsets is uncountable.

Proof. Assume, without loss of generality, that \(A = \mathbb{N} = \{1, 2, 3, ...\}\). If A is not the set of natural numbers, you can always create a bijection \(f: A \rightarrow \mathbb{N}\), as \(A\) is countable. Now, list the subsets of \(A\). A few are given below:

Subsets \ Digit 1 2 3 4 5 6 Identifier
1 x o x o x o Even integers
2 x o o x o x Prime numbers
3 x x o x x o Multiples of 3
4 o o o x o x Fibonacci numbers

In the above table, 'o' indicates that the number belongs to that subset, and 'x' indicates that it does not belong to that subset. Now, we will build a new subset.

Step 1. We assume, by way of contradiction, that the set of all subsets is actually countable.

Step 2. Then, we can put them into a list, like the table above. Every conceivable subset should be in this list.

Step 3. We now build a specific subset of the integers \(S\). How? \(n \in S\) if and only if \(n \notin A_n\). For example, we would not include 3 is \(S\) because 3 is in \(A_3\).

Step 4. \(S\) is somewhere in the list, because every conceivable subset is in the list.

Step 5. However, \(S\) cannot be in position \(k\), because if the kth number of the subset in position k is in \(A_k\), then it is not in \(S\), and vice-versa.

Step 6. \(S\) is not in the list.

This argument of "going down the diagonal" is sometimes referred to as "condensation of singularities".

This concludes Stefan's introduction to set theory.

Mathematical Foundations of Calculus: Sequences

What are sequences? Consider this:

\[ a_n: \{1, 2, 3, ...\} \rightarrow \mathbb{R} \]

So, a sequence is just a mapping of the natural numbers to the real numbers.


The German textbook from which Stefan learned convergence termed it "the second most important invention, after soap".

Definition. A sequence converges if:

\[ \lim_{n \rightarrow \infty} a_n = a \]

But what does this mean? Put another way, a sequence converges if for every \(\varepsilon > 0\) there exists \(N \in \mathbb{N}\) such that for all \(n \geq N\), \(|a_n - a| < \varepsilon\). Basically, for any arbitrarily small number that you pick, there is some term in the sequence after which the successive terms in the sequence are close to that number.


Why do we do this rather than comparing digits? Presumably, we can say that numbers are close if they have a very long succession of digits that are the same. Consider the following:

\[ \lim_{n \rightarrow \infty} 1 + \frac{1}{n} = 1 \]

Our definition of convergence allows us to show this easily. Observe:

For an arbitrarily small \(\varepsilon > 0\), if \(n \geq N\), then we have:

\begin{align} |(1+\frac{1}{n}) - 1| &< \varepsilon \\ \frac{1}{n} &< \varepsilon \\ \frac{1}{\varepsilon} &< n \end{align}

The statement is true, and we can even say:

\[ N = \left\lceil \frac{1}{\varepsilon} \right\rceil > \frac{1}{\varepsilon} \]

This \(n\) does the job. Often, we don't need to find \(n\), we just need to show its existence.


Suppose we have a sequence where \(a_1 = 5\); \(a_{n+1} = a_n \cdot q\); for \(|q| < 1\).

Claim. \(a_n = 5q^{n-1}\)

This claim can be proven by induction.

Base case. \(a_1 = 5\)

Inductive step.

\[ a_{n+1} = a_n q = (5q^{n-1})q = 5q^n \qquad \qquad \qquad \square \]

For every \(\varepsilon > 0\) there exists an \(N \in \mathbb{N}\) such that for all \(n \geq N\),

\begin{align} |5q^{n-1} - 0| &< \varepsilon \\ 5|q|^{n-1} &< \varepsilon \\ |q|^{n-1} &< \frac{\varepsilon}{5} \\ (n-1)\log(|q|) &< \log(\frac{\varepsilon}{5}) \\ n-1 &< \frac{\log(\frac{\varepsilon}{5})}{\log(|q|)} \end{align}

Definition. A sequence "diverges" if it does not converge. Some examples include \(a_n: (-1)^n\) and \(a_n: \mathbb{I}\{\text{n is prime}\}\).

Definition. A sequence "goes to infinity" or "diverges to infinity" if for any arbitrarily large integer \(M \in \mathbb{N}\) there exists some index \(N \in \mathbb{N}\) such that for all \(n > N\), \(a_n > M\).

Theorem. If you have a sequence of real numbers that converges to a \(a_n \rightarrow a\), then the sequence is bounded, which means that there exists an \(M\) such that \(|a_n < M\).


Pick \(\varepsilon = 1\). Because the sequence converges, there exists \(N \in \mathbb{N}\) such that for all \(n \geq N\), \(|a_n - a| < 1\).

\begin{align} |a_n - 1| &< 1 \\ \Rightarrow a_n &< a + 1 \\ \Rightarrow a_n &> a - 1 \end{align}

Outside this bound \((a-1, a+1)\), they may just around like crazy, but it is still a finite sequence. The statement holds for \(M = \max\{|a_1|, |a_2|, ..., |a_N|, |a-1|, |a+1|\}\).


Consider \(a_n = \frac{5}{n}\).

Since the sequence converges to zero, pick \(\varepsilon = 1\). We know that after a while, the sequence is trapped, but not at first.

So, we take the max over the absolute values to get a bound for all \(n\).


What about \(a_n = \frac{1}{n-1}\)? It goes to zero, but it isn't bounded!

That's because it is not a sequence. A sequence maps from the natural numbers to the real numbers, but \(a_n\) is not well-defined for \(a_1\). Infinity is not a real number.

Lecture 5; Convergence

Friday, January 29th

Recall that a sequence is a mapping \(a_n : \mathbb{N} \rightarrow \mathbb{R}\). The limits we saw in last lecture are very helpful for proofs.

Convergence of the sum

Theorem. If two sequences both converge, then the sequence of the sum of the terms converges. More concisely, if \(a_n \rightarrow a\) and \(b_n \rightarrow b\), then \(a_n + b_n \rightarrow a + b\).

Proof. The basic notion of this proof: if two real numbers are really close to two corresponding numbers, then their sum is really close to the sum of those two corresponding numbers.

We know that because \(a_n \rightarrow a\), for all \(\varepsilon > 0\), there exists an \(N_a\) such that for all \(n \geq N_a\), \(|a_{N_a} - a| \leq \frac{\varepsilon}{2}\). We can say the same about the existence of an \(N_b\).

We want to discern something about \(|a_n + b_n - (a + b)|\). We begin by rewriting it:

\begin{align} | (a_n + b_n) - (a + b) | &= |(a_n - a) + (b_n - b)| \\ &\leq |a_n - a| + |b_n - b| \\ \end{align}

And we know that the first term in the last statement is less than \(\frac{\varepsilon}{2}\) if \(n > N_a\), and the second term is less than \(\frac{\varepsilon}{2}\) if \(n > N_b\). As long as \(n\) is larger than both these numbers, we are good. If \(n\geq N_a \and n \geq N_b\), then \(|a_n + b_n - (a + b)| \leq \varepsilon\).

Sandwich lemma

Theorem. Suppose we have two sequences that converge to the same limit, and there is another sequence between the two:

  1. \(a_n \rightarrow a\), \(b_n \rightarrow a\)
  2. \(a_n \leq c_n \leq b_n\)

Then we know that the sequence between the two also converges to the same limit: \(c_n \rightarrow a\).

This is especially useful if we want to, for example, show that \(1 + \frac{\cos(n^5) e^{-1}}{n^2 + \sqrt{\pi}}\) converges. We can bound it from below with \(1\) and bound it from above with \(a + \frac{1}{n^2}\).

Proof. We must verify that \(c_n\) converges.

For all \(\varepsilon > 0\), we have to find an \(N\) such that for all \(n \geq N\), \(|c_n - a| \leq \varepsilon\). We can find an \(N_a\) such that for all \(n \geq N_a\), \(|a_{N_a} - a | \leq \varepsilon\), and a corresponding \(N_b\). So, the \(N\) that we wish to find is \(N = \max\{N_a, N_b\}\).

We know that \(b_n \geq a_n\). So, they are always trapped in some interval, and jumping around, but so that \(b_n\) is greater than or equal to \(a_n\).

Suppose by way of contradiction that \(n > \max\{N_a, N_b\}\), but \(c_n < a - \varepsilon\). Then proposition (2) says that \(a_n \leq c_n < a - \varepsilon\).

\begin{align} a_n &< a - \varepsilon \\ a_n - a &> \varepsilon \\ |a_n - a | &> \varepsilon \end{align}

This says that \(a_n\) is more that \(\varepsilon\) away from \(a\) for some \(n \geq N_c \geq N_a\), but \(N_a\) is the number after which \(a_n\) is always within \(\varepsilon\) of \(a\). Contradiction. \(\square\)

Two-stage Markov chains

This is an interesting example of sequences. Suppose you check somebody every minute, and see whether or not they are currently on a phone call. Keep a log. So, if x denotes not in a call and o denotes in a phone call, we might have:


This can be model by a fair coin toss, but it would not be very accurate. We could take the overall proportion of times they are in a phone call when we check, and model it as a Bernoulli(p). Still, this would not be accurate. As we see above, the states tend to cluster together. That's because if you're not in a call, you it's relatively rare that you enter a call; and if you're in a call, you tend to stay in the call. So the states tend to appear like this:

Where, perhaps, we could have \(p\) close to 1, and \(q\) high, but not as close -- perhaps 0.85.

It happens that this is rather good at modeling the weather and traversal through the internet. It also gives rise to very nice sequences.

Convergence of Markov chains


  1. We start at stage x.
  2. Let \(p_n\) be the likelihood of not being on the phone at minute \(n\), and let \(q_n\) be the likelihood of being on the phone at minute \(n\).
Iteration p q
1 \(1\) \(0\)
2 \(p\) \(1-p\)
3 \(p_2p + q_2(1-q)\) $p_2(1-p) + q_2q

At iteration 4, the same reasoning applies. So, in general, we can say by induction that:

At this point, we should check that \(p_n + q_n = 1\), to ensure that our equations are correct. The proof (by induction) plays out beautifully, and is left as an exercise for the reader.

Finally, we want to see whether this series (\(p_n\)) converges. What is \(\lim_{n \rightarrow \infty} p_n\)?

Lecture 6; Markov chains, square roots

Monday, January 29th

Markov chains, continued

Recall the two-state Markov chain from last class. We want to prove some sort of convergence. Recall the definition of convergence: if \(a_n \rightarrow a\), then \(\lim_{n \rightarrow \infty} a_n - a = 0\).

Assuming \(p_n\) and \(q_n\) converge, we can compute:

\[ \lim_{n \rightarrow \infty} p_{n+1} - p_n = 0 \] \[ \lim_{n \rightarrow \infty} q_{n+1} - q_n = 0 \] \[ \begin{bmatrix} p_{n+1} \\ q_{n+1} \end{bmatrix} - \begin{bmatrix} p_{n} \\ q_{n} \end{bmatrix} = \begin{bmatrix} 1-q & p \\ q & 1-p \end{bmatrix} \begin{bmatrix} p_n \\ q_n \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

Square roots

Consider computing the following: $\( \sqrt{1 + \sqrt{1 + \sqrt{1 + ...}}} \)$

We can view this as a sequence, where \(x_1 = 1\) and \(x_{n+1} = \sqrt{1 + \sqrt{x_n}}\). Assuming the limit exists, we can compute it.

\(x_n\) and \(x_{n+1}\) should be close to each other. The only possible limit would satisfy: $\(x = \sqrt{1+\sqrt{x}}\)$

Aside. Convergence is like asking for directions to Toads. The limit is the place where people stop telling you to "go over there", because you have arrived.

Remark. When considering two-state Markov chains, the only possible limit is at the eigenvector with eigenvalue 1.

More square roots

Consider computing \(\sqrt{a}\) for some \(a > 0\). A nice method is to observe that a square of area \(a\) has sides of length \(\sqrt{a}\).

Therefore, a reasonable way of computing \(\sqrt{a}\) is to guess a value \(x_n\), compute \(\frac{a}{x_n}\), and update our estimate of \(sqrt{a}\) to be \(x_{n+1} = \frac{x_n + \frac{a}{x_n}}{2}\). This is known as Heron's method. As it turns out, this sequence converges to \(\sqrt{a}\), and it does so extremely quickly. Every iteration of the algorithm doubles the number of digits of precision.


If there is a limit to the sequence \(x_n\) defined above, it is \(sqrt{a}\). Furthermore, it satisfies:

\begin{align} x &= \frac{x + \frac{a}{x}}{2} \\ \Leftrightarrow 2x &= x^2 + a \\ \Leftrightarrow x^2 &= a \end{align}

Now, we show two elementary facts. First, the sequence will converge from above. That is, \(x_{n+1} \geq \sqrt{a}\).

\begin{align} x_{n+1} = \frac{x_n + \frac{a}{x_n}}{2} &\geq \sqrt{a} \\ x^2_n + a &\geq 2\sqrt{a}x_n \\ (x_n \sqrt{a})^2 &\geq 0 \end{align}

The second elementary fact is that if \(x_n > \sqrt{a}, then \)x_{n+1} < x_n$.

\begin{align} x_{n+1} = \frac{x_n + \frac{a}{x_n}}{2} &< x_n \\ x^2_n + a &< 2x^2_n \\ a &< x^2_n \\ x_n &> \sqrt{a} \end{align}

Thus, the sequence has the following properties:

  1. \(\forall\) \(n\in\mathbb{N}\), \(x_n \geq \sqrt{a}\)
  2. \(\forall\) \(n\in\mathbb{N}\), \(x_{n+1} \leq x_n\)

So, does it converge?

Yes, but the proof is difficult. We must assume an axiom about the real numbers.

Remark. When we have an estimate \(x_n\), we know that \(a \leq x_m \leq x_n\) for \(m > n\). However, we cannot be sure that the sequence converges to \(a\) rather than some other number within those bounds. To find it, we can conduct binary search.

Lecture 7; Cauchy sequences, series

Wednesday, February 3rd

Theorem. If a sequence \(x_n\) satisfies:

  1. \(x_{n+1} \leq x_n\) (always decreasing
  2. \(c \leq x_n\) (bounded from below by some constant)

then \(x_n\) converges.

Cauchy sequences

Definition. \(a_n: \mathbb{N} \rightarrow \mathbb{R}\) is Cauchy if for all \(\varepsilon > 0\) there exists an \(N\) such that for all \(n, m > N\), \(|a_n - a_m| < \varepsilon\). Note that this is a statement about the sequence, not the limit.


  1. Cauchy sequences converge.
  2. If a sequence converges, it is Cauchy.

We will prove the second statement first, directly.

Suppose \(a_n\) converges. Then, for all \(\varepsilon > 0\), there exists an \(N\) such that for all \(n > N\), \(|a_n - a| < \varepsilon\). In plain English, this means that for every \(\varepsilon\), we can cut off an initial part of he sequence and have the remainder \(a_{n+1}, a_{n+2}, ...\) be in the interval \((a - \varepsilon, a + \varepsilon)\). Because they are all in there, each point is close to the other points.

\begin{align} |a_n - a_m| = |a_n - a + a - a_m| &\leq |a_n - a| + |a - a_m| \\ &\leq \varepsilon + \varepsilon \\ &\leq 2\varepsilon \end{align}

What about a proof of statement one? This proof is hard. It requires the Axiom of Completeness, which depends on how you define the real numbers. The book does not touch it.

Proof of Theorem. Show that \(x_n\) is Cauchy. We use iterative refinement. At each point in time, we have an upper bound (the value of the sequence at that time) and a lower bound (constant \(c\)). So, we divide that interval into two parts. There are two cases:

  1. If there is an element of the sequence in the left part, all but finitely many are in there.
  2. There is no element in the left part.

Then, we repeat this process. This is sufficient to verify the Cauchy property. After repeating this process a few times, you have a tiny interval that has all the future elements inside.

Remark. A Cauchy sequence from elements in \(\mathbb{Q}\) need not have a limit in \(\mathbb{Q}\). Take, for example, this sequence, whose limit is \(\sqrt{2}\): \(\frac{1}{1}, \frac{14}{10}, \frac{141}{100}, ...\)


If we have \(a_n\), a sequence of real numbers, then we have have a series \(b_n = \sum_{k=1}^n a_k\). Note that this is just a special type of sequence.

Theorem. If \(\sum_{n=1}^\infty a_n\) converges, then \(a_n \rightarrow 0\).

Proof. If \(b_n \rightarrow b\), then \(b_{n + 1} - b_n \rightarrow 0\). Then, we have:

\[ b_{n+1} - b_n = \sum_{k=1}^{n+1} a_k - \sum_{k=1}^n a_k = a_n+1 \]

Note that we are adding up infinitely many things to get a finite sum.

Divergence of the harmonic series

Oresme gave an extremely elegant proof of the divergence of the harmonic series. Note that the first term, \(1 \geq \frac{1}{2}\). Then, note that the second term, \(\frac{1}{2} \geq \frac{1}{2}\). The third and fourth terms, \(\frac{1}{3} + \frac{1}{4} \geq \frac{1}{2}\). As you go down the series, you find infinitely many one-halves, so the sum goes to infinity.

More interesting series

Leibniz also gave a very elegant proof:

\[ \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + ... = 1 \]

How? Note that this series can equivalently be written as:

\[ \left(\frac{1}{1} - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + ... \]

More mathematically, we would like to show that:

\[ \sum_{k=1}^n \frac{1}{k(k+1)} = 1 - \frac{1}{n+1} \]

which may be done by induction.

Lecture 8; Supremum and Infimum

Friday, February 5th Theorem. There are infinitely many prime numbers.

Proof. Suppose not. Then we can take the largest prime number and call it \(p_n\). Compute the product of all the prime numbers, and then add 1 to that product.

\[ P = p_1 p_2 p_3 ... p_n + 1 \]

Now, there are two cases for \(P\). If it is prime, then \(P\) is a prime that was not on our list. If \(P\) is not prime, then it is divisible by some prime \(p'\). However, \(p'\) is not any of the \(p_k\) for \(1 \leq k \leq n\), otherwise \(p'\) would be a divisor of \(1\), which is impossible. Either way, our original list was not comprehensive. Contradiction. \(\square\)

Proof. Suppose not. List the prime numbers \(p_1, ..., p_n\).

\begin{align} \infty > \prod_{k = 1}^n \frac{1}{1 - \frac{1}{p_k}} &= \prod_{k = 1}^n\left(1 + \frac{1}{p_k} + \frac{1}{p_k^2} + \frac{1}{p_k^3} + ... \right) \\ &= \left(1 + \frac{1}{2} + \frac{1}{4} + ...\right) \left(1 + \frac{1}{3} + \frac{1}{9} + ... \right) \left(1 + \frac{1}{5} + \frac {1}{25} + ...\right) ... \\ &= 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + ... \end{align}

The last line is the harmonic sequence, which diverges to infinity. Therefore, \(p_n\) has to be infinite.

Supremum and Infimum

Suppose we have \(A \in \mathbb{R}\). A number \(b\) is an upper bound for a set \(A\) if \(x \leq b\) for all \(x \in A\). \(b_0\) is the least upper bound of \(A \) if \(b_0\) is an upper bound and \(b_0 \leq b\) for any other upper bound \(b\). Then we may say that \(b_0 := \sup A\).


Suppose we have \(A = \{0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, ... \}\). For all \(a \in A\), \(a \leq 1\).

Remark. One is the smallest upper bound.

Suppose \(1 - \varepsilon\) were an upper bound as well. For all \(n\),

\begin{align} \frac{n}{n + 1} &\leq 1 - \varepsilon \\ 1 - \frac{n}{n + 1} &\leq 1 - \varepsilon \\ \varepsilon &\leq \frac{n}{n + 1} \end{align}

This fails for \(n\) big enough.


Theorem. If \(A\) is a set bounded above (\(\exists c \in \mathbb{R}\) \(\forall\) \(a \in A\) such that \(a \leq c\)), then \(\sup A\) exists. The argument here is very similar to the proof of the convergence of Cauchy sequences. Essentially, we take a point \(a_1 \in A\), and divide the interval between \(a_1\) and \(c\) in half. If there is an element in the right-hand interval, continue the process in that interval. If there are no elements, continue the search in the left-hand interval.


Consider the set \(A = \{0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, ...\}\). We know that for all \(a \in A\), \(a \leq 1\). We claim that this is the smallest upper bound.

Proof. By contradiction.

Suppose that \(1 - \epsilon\) is a smaller upper bound. This would mean that for all \(n\), we would have:

\begin{align} \frac{n}{n + 1} &\leq 1 - \varepsilon \\ 1 - \frac{n}{n+1} &\leq 1 - \varepsilon \\ \varepsilon &\leq \frac{n}{n + 1} \end{align}

And this fails for \(n\) big enough.

Lecture 9; Supremum, infimum, Bolzano-Weierstrass

Monday, February 8th

Supremum and Infimum, continued

Last time, we established an upper bound \(c\) for a set \(A\) by specifying that if we had a set \(A \subset R\), then for all \(a \in A\), we have \(a \leq c\).

Definition: Supremum. A supremum is the smallest upper bound. This replaces the maximum for infinite sets.

We require the concept of a supremum because with an infinite set, there is no largest element. Consider the set \(A = \{\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, ...\}\). In this case, we have \(\sup A = 1\).

Definition: Infimum. An infimum is the largest lower bound.

Claim. If we have \(A \subset \mathbb{R}\), and \(-A = \{-a: a \in A\}\), then \(-\inf A = \sup (-A)\).

Just draw a diagram. We reverse the order of the points. The great thing about this proof is that if we have proofs for the supremum, the same follows for the infimum.

Theorem. For \(A, B \in \mathbb{R}\), for all \(a \in A\), \(b \in B\), if we have \(a \leq b\), then \(\sup(A) \leq \inf(B)\).

Proof. From our assumption, we have for all \(a, b\), \(a \leq b\).

Let us fix \(b^\star \in B\). That means that for all \(a \in A\), \(a \leq b^\star\).

That means that \(b^\star\) is an upper bound for \(A\). Then, this upper bound \(b^\star\) is by definition greater than or equal to the smallest upper bound. So, \(\sup(A) \leq b^\star\).

Now, note that the choice of \(b^\star\) was arbitrary. This argument works for all \(b^\star\). So, the supremum of \(A\) is a lower bound for \(B\); hence, it is certainly smaller than the largest possible lower bound. \(\square\)

Bolzano-Weierstrass Theorem

Recall that a sequence is a mapping \(a: \mathbb{N} \rightarrow \mathbb{R}\). Let us define a subsequence \(a_{n_k}\).

\(n_k = n(k): \mathbb{N} \rightarrow \mathbb{N}\) is a mapping that is strictly increasing. It essentially provides a rule for selecting which elements of the sequence \(a_n\) to include in the subsequence \(a_{n_k}\), and does so by specifing the indices of the elements we want.

For example, suppose \(a_n = n\). Then, the subsequence \(a_{n^2}\) would have elements \(1, 4, 9, 16, ...\).

Theorem: Bolzano-Weierstrass. If \(a_n\) is a sequence in a bounded interval, then there exists a convergent subsequence.

Suppose we had the sequence \(a_n\) with elements \(\frac{1}{4}, \frac{3}{4}, \frac{1}{4}, \frac{3}{4}, ...\). Clearly, it does not converge. However, it is bounded! So, there is a convergent subsequence, and we indeed find one: \(a_{2n - 1}\) converges to \(\frac{1}{4}\).


Step 1. Take a fixed number of initial elements, say 5. We consider \(a_1, a_2, a_3, a_4, a_5\) explicitly. They do not converge, Now, we bisect the interval in which they exist.

Step 2. Now, at least one of the two intervals has infinitely many terms of the sequence.

Step 3. Pick the interval that has infinitely more terms. Now, pick the next 5 elements after \(a_5\) that exist in that interval.

Step 4. Repeat steps 2 and 3.

This construct produces a Cauchy sequence, which we know converge. After some number of iterations, all the elements will be in tiny intervals (see the formal definition of a Cauchy sequence).

An interesting application

Let us consider a bounded sequence derived from \(\pi\). Consider:

\begin{align} \pi &= 3.1415926535... x_1 &= 0.1415926535... x_2 &= 0.415926535... x_3 &= 0.15926535... \end{align}

So, our rule is \(x_1 = \pi - 3\), and \(x_n = 10x_{n - 1} - \lfloor 10x_{n-1} \rfloor\).

The Bolzano-Weierstrass theorem implies that there exists a subsequence such that \(x_{n_k}\) converges. So, there exists arbirtrarily long string combinations that repeat infinitely many times in \(\pi\).

Lecture 10; Continuity

Wednesday, February 10th

Continuity. What is it? A property of a function? Not quite. It is a local property.

Definitions of continuity

Sequence definition. We may say \(f: \mathbb{R} \rightarrow \mathbb{R}\) is continuous in \(b^\star\) if for all sequences \(x_n \rightarrow b^\star\), it is true that \(f(x_n) \rightarrow f(b^\star)\).

Epsilon-Delta definition. We may say that \(f\) is continuous in \(b^\star\) if for all \(\varepsilon > 0\) there exists a \(\delta\) such that \(|x - b^\star| < \delta\) implies \(|f(x) - f(b^\star)| < \varepsilon\).

These definitions are useful in different contexts, so it's nice to have them both.

A strange example

Let us consider a strange example of continuous functions. Suppose we have a function \(f: \mathbb{R} \rightarrow \mathbb{R}\).

\begin{align} f(x) &= \begin{cases} 0 &\text{if } x = 0 \\ 0 &\text{if } x \text{ irrational} \\ \frac{1}{|q|} &\text{if } x = \frac{p}{q} \text{ where } \gcd(p, q) = 1 \end{cases} \end{align}

Note that we may say that \(|f(x)| \leq |x|\). The interesting part is that this function is continuous in zero.

We use the fact that \(0 \leq |f(x)| \leq |x|\), and the \(\varepsilon\)-\(\delta\) definition of continuity.

Proof. Choose \(\delta = \varepsilon\). Suppose \(|x| < \varepsilon\). Then, we have \(|f(x)| \leq |x| < \varepsilon\). It follows that \(|f(x) - f(0)| < \varepsilon\). This concludes our proof. \(\square\)

Polynomials are continuous

We now present some lemmas that allow us to conclude that polynomials are continuous.

Lemma A. If \(f, g\) are continuous in \(b^\star\), then so is \(f + g\).

Proof. Suppose \(x_n \rightarrow b^\star\). Then we have \(f(x_n) \rightarrow f(b^\star)\), and \(g(x_n) \rightarrow g(b^\star)\). It follows, then, from the algebraic limit theorems that \(f(x_n) + g(x_n) \rightarrow f(b^\star) + g(b^\star)\). \(\square\)

Lemma B. If \(f, g\) are continuous in \(b^\star\), then so is \(f \cdot g\).

Proof. Suppose \(x_n \rightarrow b^\star\). Then we have \(f(x_n) \rightarrow f(b^\star)\), and \(g(x_n) \rightarrow g(b^\star)\). It follows, then, from the algebraic limit theorems that \(f(x_n) \cdot g(x_n) \rightarrow f(b^\star) \cdot g(b^\star)\). \(\square\)

Lemma C. If a function is continuous in \(b^\star\) and \(c \in \mathbb{R}\) is some constant, then \(c \cdot f\) is also continuous in \(b^\star\).

Proof. \(\lim c\cdot a_n = c\lim a_n \rightarrow c \cdot f(b^\star)\)

Lemma D. The function \(f(x) = x\) is continous.

Proof. Suppose \(x_n \rightarrow b^\star\). Then \(f(x_n) \rightarrow f(b^\star)\). That's all Stefan wrote. Eh.

Theorem. Polynomials are continous.

Proof. This follows from Lemmas A through D. Using Lemma A, we can treat each of the terms separately. Using Lemma B, we can treat terms raised to a power greater than 1 by decomposing them into a product of order-one terms. The coefficients are covered by Lemma C, and the order-one terms are covered by Lemma D. \(\square\)

Lecture 11; Continuity, continued, intermediate value property, Borsuk-Ulam Theorem

Friday, February 12th

Continuity of \(e^x\)

This is an example where the \(\varepsilon\)-\(\delta\) definition of a limit is helpful. We claim that \(e^x\) is continous. We will use the following definition of \(e^x\):

\[ e^x := \sum_{k = 0}^\infty \frac{x^k}{k!} = 1 + x + \frac{x^2}{2} + ... \]

Because polynomials are continuous, each term of the sum converges independently. We may hope that \(e^{x^\star} = 1 + x^\star + \frac{x^{2\star}}{2} + ...\) converges, but this proof does not work. The sum of infinitely many continuous functions need not be continuous. Why not?

Let us consider the following function:

\begin{align} f_n(x) &= \begin{cases} x^n &\text{ if} 0 \leq x \leq 1 \\ 1 &\text{ if} x \geq 1 \\ 0 &\text{ otherwise} \end{cases} \end{align}

Now, let us consider \(g_n(x) = f_{x + 1}(x) - f_n(x)\). This function is continuous.

Now, let us make the following non-rigorous statement:

\[ f_1(x) + \sum_{n = 1}^\infty g_n(x) \text{"}=\text{"} f_\infty(x) \]

This function \(f_{\infty}(x)\) is no longer continuous. We put quotation marks around the equals sign because it is ill-defined. The sum of infinite polynomials does not work as an argument.

Fortunately, we may turn to the \(\varepsilon\)-\(\delta\) definition of continuity.


Our proof consists of two main steps:

  1. Demonstrate continuity around \(x = 0\).
  2. Use the identity \(e^{x + h} - e^x = e^x(e^h - 1)\).

Continuity everywhere else follows.

\begin{align} |e^h - 1| &= \left|h + \frac{h^2}{2} + \frac{h^3}{6} + ... \right| \\ &\leq |h| + \left|\frac{h^2}{2}\right| + ... \\ &\leq |h| + |h|^2 + ... \\ &= \frac{|h|}{1 - |h|} \\ &\leq 2|h| \end{align}

Assume \(|h| \leq \frac{1}{2}\), if we want \(|e^h - 1| < \varepsilon\), just take \(\delta \leq \frac{\varepsilon}{2}\).

So, if we would like to have \(|e^{x^\star}(e^h - 1)| \leq \varepsilon\), we simply need to select \(h < \frac{\varepsilon}{2e^{x^\star}}\). \(\square\)


The previous proof on the continuity of \(e^x\) may have been rather confusing. Here's a way of understanding it, and the \(\varepsilon\)-\(\delta\) criterion for continuity, more intuitively.

Alice and Bob are walking down the street, pondering the continuity of \(e^x\). Bob says, if we use the identity \(e^{x + h} - e^x = e^x(e^h - 1)\), then proving continuity at \(x = 0\) means that we have proven continuity everywhere. To help Bob get started, Alice says, "I will pick a \(\varepsilon\), any \(\varepsilon > 0\), and you must be able to supply me with a \(\delta\) which ensures that if \(h\) is within \(\delta\) of \(0\), then \(e^h - 1\) is within \(\varepsilon\) of \(0\)."

Bob assents. He says, "ah, I see. If that's the case, then I should set about finding a \(\delta\) that bounds \(h\) properly. I will work backwards. I know that \(|e^h - 1| < \varepsilon\) must be true, so I need to take this information into account to find an \(h\) that ensures this to be true." He spends a while thinking. "Okay," he says. "I've discovered that if we constrain \(|h| \leq \frac{1}{2}\), then we can guarantee \(|e^h - 1| \leq 2|h|\).

"Nice!" Alice responded. "With that bound on \(|e^h - 1|\), if you can ensure that \(2|h| < \varepsilon\), then you have can guarantee that \(|e^h - 1| < \varepsilon\) as well! All you need to do is specify \(\delta = \frac{\varepsilon}{2}\); if you do that, then you will have \(|h|\) small enough."

Some theorems on bounded functions

Theorem. A continuous function on an interval \([a, b]\) is bounded from above and below. We will only show that \(f\) is bounded from above, as the argument for bounding from below is symmetric (take \(-f\)).

Proof. Suppose not. Then for every point in the function you name, you can always choose a point that is bigger. Then, there exists \(x_n \in [a, b]\) such that \(f(x_n) > n\). This is a sequence in \([a, b]\). By the Bolzano-Weierstrass Theorem, there exists a convergent subsequence \(x_{n_k} \rightarrow c\). The continuity of \(f\) implies \(f(x_{n_k} \rightarrow c\). However, this does not work, because \(f(c)\) is a fixed number. But, \(f(x_{n_k})\) is a sequence that keeps increasing. We have \(f(x_{n_k}) > n_k \rightarrow \infty\). It cannot converge. Contradiction. \(\square\)

Discussion. But what about \(f(x) = \frac{1}{x}\) on \((0, 1]\)? This is a continuous function on an inteval, but it is not bounded from above. Clearly, we need the endpoints (zero is excluded here). But why? It turns out that the subsequence may converge to something that's not in the interval. In fact, the sequence itself converges \(x_n \rightarrow \frac{1}{n}\). That's why you need the endpoints.

Theorem. A continuous function on [a, b] assumes its \(S = \sup \{f(x) | a \leq x \leq b\}\). ("Assumes" here means that it takes that value.)

Proof. We want to show that for an arbitrary \(\varepsilon > 0\), there exists a \(a \leq x^\star \leq b\) that fulfills \(f(x^\star) \geq S - \varepsilon\). If this were not true, then \(S - \varepsilon\) would be the supremum. Recall that a supremum is the smallest upper bound possible.

Now, we find a sequence \(x_n\) in \([a, b]\) such that \(f(x_n) \geq S - \frac{1}{n}\). Then, we have \(S \geq f(x_n) \geq S - \frac{1}{n}\).

We then apply the Bolzano-Weierstrass Theorem. There is a subsequence \(x_{n_k}\) that converges to \(c\). We now have \(S \geq f(x_{n_k}) \geq S - \frac{1}{n_k}\), and \(f(x_{n_k}) \rightarrow c\).

By the sandwich lemma, we may conclude that \(f(c) = \lim_{k \rightarrow \infty} f(x_{k_k}) = S\). \(\square\)

Intermediate Value Property

This is a property for continuous functions \(f\) on the interval \([a, b]\).

Statement. If we have \(f(a) < y < f(b)\), then there exists \(a \leq x \leq b\) for which \(f(x) = y\).

Remark. We will not prove this, but the number of solutions is always odd.

This property is useful for showing the existence of solutions. For example, suppose wish to know whether or not \(\cos(x) = x\) has a solution.

We define a function \(h(x) = \cos(x) - x\). Observe that \(h(0) = 1 > 0\), and that \(h(2\pi) = 1 - 2\pi < -5\). By the Intermediate Value Property, there exists an \(x^\star\) such that \(\cos(x^\star) = x^\star\).

Borsuk-Ulam Theorem

Here we present a special case of the theorem, which is possible through the intermediate value property.

Theorem. There exists at any time two points on the equator that are antipodal with the same temperature. This can also be said for barometric pressure, humidity -- any continuous function. So, if you had a ring, and a continuous function on \(2\mathbb{D} = \{x: |x| = 1\}\), then the equation \(f(x) = f(-x)\) always has a solution. (The ring in this case is the unit circle.)

Proof. Suppose not. Pick any number \(x^\star\). Then \(f(x^\star) \neq f(-x^\star)\).

Now, consider \(h(x) = f(x) - f(-x)\). This function is continous on the boundary of the unit disk. Then, we have:

\begin{align} h(x^\star) &= f(x^\star) - f(-x^\star) \neq 0 \\ h(-x^\star) &= f(-x^\star) - f(x^\star) = -h(x^\star) \end{align}

Observe that because when \(x^\star\) went to \(-x^\star\), \(h(x^\star)\) went to \(-h(x^\star)\), then the function must have crossed zero at some point. Contradiction. \(\square\)

Remark. This problem depends on its geometry. It's a beautiful instance in which geometry and analysis meet. Though left unproven here, it generalizes to higher dimensions.

Lecture 12; Intermediate value theorem, uniform continuity

Monday, February 15th

Intermediate value theorem

Recall from last class the Intermediate Value Theorem. It is useful for studying systems which are highly nonlinear. A nice application of it is the Borsuk-Ulam Theorem from last class. Today, we will prove the intermediate value theorem.

Motivation. Suppose we have a function \(f(x) = x^2 - 2\), where we know \(f(0) < 0\), and \(f(2) > 0\). Then, we wish to show that there exeists an \(x \in \mathbb{R}\), \(0 < x < 2\), such that \(f(2) = 0\).


Step 1. Consider the set \(S = \{a \leq x \leq b: f(x) < y\}\). We know that is not empty, because we have at least \(a \in S\).

Step 2. Observe that the set \(S\) is bounded from above by \(b\). Therefore, a finite smallest upper bound \(c = \sup S\) exists.

Step 3. The existence of \(c\) implies that for all \(\varepsilon > 0\), there exists \(h \in S\) such that \(c - \varepsilon < h < c\). This just says that there are elements near \(c\), because it is the supremum.

Step 4. This allows us to construct a sequence \(x_n \in S\), \(x_n \rightarrow c\), by choosing subsequently smaller \(\varepsilon\).

Step 5. We know that \(f(x_n) < y\) for every \(n\).

Step 6. \(\lim_{n\rightarrow \infty} f(x_n) = f(c)\). This is just the definition of continuity.

Step 7. \(f(c) \leq y\).

Now, suppose by way of contradiction that \(f(c) < y\). By the \(\varepsilon\)-\(\delta\) definition of continuity, there exists a \(\delta > 0\) such that \(f(z) < y\) for all \(z \in (c - \delta, c + \delta)\). This contradicts \(c = \sup s\) beacuse it implies there are elements in \(S\) (the entire \((c - \delta, c + \delta)\)) that are bigger than \(c\). Basically, we can have elements \(\delta\) away from \(c\) in either direction.

Thus, we may conclude that \(f(c) = y\). \(\square\)

Uniform continuity

Recall the definition of continuity on a domain \(D\), where \(D\) be \((a, b)\), \([a, b]\), or \(\mathbb{R}\). We may say that a function is continuous on a domain \(D\) if, for all \(x \in D\), and given any arbitrarily small \(\varepsilon > 0\), there exists and we may find a \(\delta > 0\) such that for all \(y\), we have \(|x - y| < \delta\) implies \(|f(x) - f(y)| < \varepsilon\).

Definition: Uniform continuity. A function \(f\) is uniformly continuous on a domain \(D\) if for all \(\varepsilon > 0\) there exists a \(\delta > 0\) such that for all \(x \in D\) and for all \(y \in D\), we have \(|x - y| < \delta\) implies \(|f(x) - f(y)| < \varepsilon\).

The difference between continuity and uniform continuity is subtle, but it exists. In normal continuity, the choice of \(\delta\) depends on the \(x\) at which we wish to prove continuity. In uniform continuity, the choice of \(\delta\) is made regardless of which \(x\) at which we wish to prove continuity, because it must hold true for all \(x\).

In short, uniform continuity says that a function does not go crazy.


Suppose we wish to prove that \(f(x) = x^2\) is uniformly continuous.

We want to be able to select a \(\delta > 0\) such that if \(|x^* - x| < \delta\), the following holds for any \(\varepsilon < 0\).

\begin{align} |f(x^*)- f(x)| &= |x^{*^2} - x^2| \\ &= |x^* + x||x^* - x| \\ &< \varepsilon \end{align}

But at this point, we observe that if we want the above to hold, then we must set \(|x^* - x| \leq \frac{\varepsilon}{|x^* + x|}\). As we can see, the requisite \(\delta\) does depend on the choice of \(x\). Therefore, \(f(x) = x^2\) is not uniformly continuous.

More remarks on uniform continuity

Theorem. If \(f:[a, b] \rightarrow \mathbb{R}\) is continuous, then it is uniformly continuous.

Proof. Suppose not. Then there exists \(\varepsilon > 0\) such that for all \(\delta > 0\), there exists \(x, y \in [a, b]\), we have \(|x - y| < \delta\) and \(|f(x) - f(y)| > \varepsilon\).

We want to fail to prove this claim, and we will do so by leveraging the fact that it must be true for all \(\delta\).

Imagine we have \(\varepsilon = 1\). We will choose subsequently smaller \(delta > 0\) with the rule that \(\delta_n = 2^{-n}\). For each subsequent \(\delta\), the distance \(\varepsilon\) is the same. Once we can prove that there are \(\delta > 0\) such that the jump between \(f(x)\) and \(f(y)\) is smaller than \(\varepsilon\), then we are done.

Now, we employ the Bolzano-Weierstrass Theorem! We have \(x_{n_k} \rightarrow c\). Then \(|x_{n_k} - y_{n_k}| \leq \frac{1}{2^{n_k}}\), so we also have \(y_{n_k} \rightarrow c\). This implies that \(f(x_{n_k}) \rightarrow f(c)\) and \(f(y_{n_k}) \rightarrow f(c)\). We thus have \(\lim_{k \rightarrow \infty} |f(x_{n_k}) - f(y_{n_k})| = 0 \geq \varepsilon\), which violates the claim that it is actually supposed to be \(\geq \varepsilon > 0\). Contradiction. \(\square\)

Lecture 13; Lipschitz continuity, Riemann sums

Wednesday, February 17th

Lipschitz continuity

Definition. A function \(f: \mathbb{R} \rightarrow \mathbb{R}\) is Lipschitz continuous with Lipschitz constant \(\mathcal{L}\) (\(0 \leq \mathcal{L} < \infty\)) if, for all \(x, y \in \mathbb{R}\), \(|f(x) - f(y)| \leq \mathcal{L}|x - y|\).

Here are some examples of Lipschitz continuous functions:

\(f\) \(\mathcal{L}\)
\(f(x) = x\) \(1\)
\(f(x) = \sin(2x)\) \(2\)
\(f(x) = x^2\) not Lipschitz continuous

Theorem. Lipchitz continuity implies uniform continuity.

Proof. \(\delta = \frac{\varepsilon}{\mathcal{L}}\).

Note that the concept of Lipschitz continuity is philosophically very similar to differentiation. Think of it as the following: \(\frac{|f(x) - f(y)|}{x - y} \leq L\).

In fact, we know that if \(\lim_{y \rightarrow x} \frac{|f(x) - f(y)|}{x - y}\) exists, then \(f'(x) \leq L\).

Integration: an introduction

We know that at the Italians and at least one Babylonian knew about integration. What type of integrals are out there?

We begin by focusing on the Riemann integral.

Riemann integral

We are familiar with the idea that \(\int_a^b f(x) dx\) denote the area under the curve \(f(x)\) between \(a\) and \(b\). A natural way to approximate this number is by splitting this interval into smaller partitions \(a = x_0 < a_1 < ... < x_n = b\). We may now define upper and lower sums:

\begin{align} L_P &= \sum_{k = 1}^n (x_k - x_{k - 1})\inf\{f(x) | x_{k - 1} \leq x \leq x_k\} \\ U_P &= \sum_{k = 1}^n (x_k - x_{k - 1})\sup\{f(x) | x_{k - 1} \leq x \leq x_k\} \end{align}

Lemma. Suppose \(P = \{\text{some points}\}\) and \(Q = P \cup \{\text{some more points}\}\). Then we find that \(L_Q \geq L_P\) and \(U_Q \leq U_P\).

Proof. There are two cases here:

  1. We do not add a point; that is \(Q = P\). Then the computation is the same.
  2. We add one or more points. Then we make it slightly more precise.

It can be shown that if \(A \subset B\), then \(\inf\{f(x) | x \in A\} \geq \inf\{f(x) | x \in B\}\).

Lemma. Let \(P\) and \(Q\) be arbitrary partitions. Then, \(L_P \leq U_Q\).

Proof. Here, we use the previous lemma.

\[ L_P \leq L_{P \cup Q} \leq U_{P \cup Q} \leq U_Q \]

If the sum exists, the lower sum is always lower than than the area. That's why this is so useful.

This implies that \(\sup\{L_P | P \text{ partition}\} \leq \inf\{U_Q | Q \text{ partition}\}\). Recall that if \(a \leq b\) for all \(a \in A, b \in B\), then \(\sup A \leq \inf B\).

Definition. If for two arbitrary partitions \(P\) and \(Q\), \(\sup\{L_P|P \text{ partition}\} = \inf\{U_Q|Q \text{ partition}\}\), then \(f\) is Riemann integrable, on \([a, b]\), and \(\int_a^b f(x) dx = \inf(U_Q) = \sup(L_P)\). Stefan calls this the "magic number in the middle". Note that in this definition, there is no notion of area.

Example: Not Riemann integrable
\begin{align} f(x) = \begin{cases} 1 &\text{ if } x \in \mathbb{R} \setminus \mathbb{Q} \\ 0 &\text{ if } x \in \mathbb{Q} \end{cases} \end{align}

Suppose we consider \(\int_0^1 f(x) dx\). The computation for all arbitrary partitions \(P, Q\) would be \(L_P = 0\), \(U_Q = 1\). For every small interval, there is an irrational number and a rational number, causing this behavior. But the integral should exist, with the value of 1. Although this function is not Riemann integrable, it is integrable. To integrate it, we should look towards Lebesegue integration.

Example: Riemann integrable

Suppose we consider \(\int_0^1 x dx\). We know that the answer is \(\frac{1}{2}\), but is it Riemann integrable?

\[ \sum_{k = 1}^n \frac{k}{n^2} = \frac{1}{n^2} \sum_{k = 1}^n k \]

Lecture 15; Riemann sums, improper integrals

Monday, February 22nd

Riemann sums, reviewed

Recall the definition of upper and lower sums from last lecture. If we have:

\[ \sup_p \texttt{lower sum} = \inf_p \texttt{upper sum} \]

Then the value on either side is the Riemann integral.

Theorem. Continuous functions on closed intervals are Riemann integrable.

Theorem. It does not matter which partitions you take, as long as the largest element goes to zero.

Theorem. If \(f: [a, b] \rightarrow \mathbb{R}\) is monotone and bounded, then it is Riemann integrable.

Let us reflect on that last theorem. Suppose we had a function that was monotone and bounded, but not continuous. Normally, this would be bad, because the upper and lower do not match; however, it is okay, but the width goes to "zero". If it is bounded and monotone, it cannot have many jumps. On the other hand, if it were not monotone, it could have a wild number of jumps.

Improper integrals

Suppose you wanted to integrate \(\int_0^1 \frac{1}{\sqrt{x}} dx\). With our definition of Riemann integrals, we cannot do this, since for the leftmost interval, the infimum would never equal the supremum. We would always get infinity at zero for \(\texttt{Upper} = \sum_{k = 1}^n (\sup_{I_k} f) |I_k|\).

Instead, we can do the following:

\begin{align} \int_\varepsilon^1 \frac{1}{\sqrt{x}}dx &= 2\sqrt{x}|_\varepsilon^1 \\ &= 2 - 2\sqrt{\varepsilon} \\ &\overset{\varepsilon \rightarrow 0}{\longrightarrow} 2 \end{align}

Definition. If \(f: (a, b] \rightarrow \mathbb{R}\) and \(\lim_{\varepsilon \rightarrow 0+} \int_{a + \varepsilon}^b f(x) dx\) exists, then we may define:

\[ \int_a^b f(x) dx = \lim_{\varepsilon \rightarrow 0+}^b f(x) dx \]
\begin{align} \int_{\varepsilon}^1 \frac{1}{x^\alpha} dx &= \frac{x^{-\alpha + 1}}{1 - \alpha}\rvert_\varepsilon^1 \\ &= \frac{1}{1 - \alpha} - \frac{1}{1 - \alpha}\frac{1}{\varepsilon^{\alpha - 1}} \end{align}

If \(\alpha < 1\), then \(\int_0^1 \frac{1}{x^\alpha} dx = \frac{1}{1 - \alpha}\). If \(\alpha = 1\), then \(\int_\varepsilon^1 \frac{1}{x^\alpha} dx = \log(x)\rvert_\varepsilon^1 = -\log(\varepsilon)\), which is bad as \(\varepsilon\) approaches zero.

Lecture 16; Improper integrals, cont., differentiability

Last lecture, we considered the statement \(\int_0^1 \frac{1}{\sqrt{x}} dx = 2\), which is a true statement, although the integral on the left cannot be computed via a Riemann integral. To resolve this, we introduced the concept of an improper integral.

Now, suppose you wanted \(\int_0^1 \frac{\cos(x)}{\sqrt{x}} dx\). How would you show that the limit \(\lim_{\varepsilon \rightarrow 0+} \frac{\cos(x)}{\sqrt{x}}dx\) exists?

Proof. We argue that the limit exists via continuity.

Saying that the \(\lim_{\varepsilon \rightarrow 0+} \int_\varepsilon^1 f(x) dx\) exists is equivalent to saying that \(\int_{\varepsilon}^1 f(x) dx\) is continuous in \(\varepsilon = 0\). This is now a function \(F(\varepsilon)\) of \(\varepsilon\).

We use a one-sided form of continuity, which may be succinctly stated as: for all \(\varepsilon > 0\), there exists a \(\delta > 0\) such that for all \(a, b \in (0, \delta)\), \(\left|\int_a^1 f(x) dx - \int_b^1f(x)dx\right| < \varepsilon\), which may be rewritten as \(\left|\int_a^b f(x) dx \right| < \varepsilon\).


Consider \(\int_0^1 \frac{\cos(x)}{\sqrt{x}} dx\). We wish to show that for all \(\varepsilon > 0\), there exists a \(\delta > 0\) such that for all \(a < b \in (0, \delta)\), \(\left|\int_a^b \frac{\cos(x)}{\sqrt{x}} dx \right| < \varepsilon\).

\begin{align} \left| \int_a^b \frac{\cos(x)}{\sqrt{x}} dx \right| &\leq \int_a^b \frac{|\cos(X)|}{\sqrt{x}}dx \\ &\leq \int_a^b \frac{1}{\sqrt{x}} \\ &= 2\sqrt{b} - 2\sqrt{a} \\ &\leq 2\sqrt{b} \\ &\leq 2\sqrt{\delta} \\ &< \varepsilon \end{align}

We don't need to compute \(\delta\), we just need to make sure that it is small enough.


Consider \(\int_0^1 \frac{1}{x^2}dx\). This is not Riemann integrable, as you may not write down partitions from one to infinity.

However, if we rewrite it with a limit, we are able to derive an expression that is Riemann integrable.

\begin{align} \int_1^\infty \frac{1}{x^2}dx &= \lim_{N \rightarrow \infty} \int_1^N \frac{1}{x^2}dx \\ &= \lim_{N \rightarrow \infty} \frac{-1}{N} + 1 \\ &= 1 \end{align}

As we can see, the improper integral works.

Proof. For all \(\varepsilon > 0\), there exists an \(M\) such that for all \(a, b \in (M, \infty)\), \(\lvert \int_a^b f(x) dx \rvert < \varepsilon\).

As an aside, here is one type of integral that Riemann can do, but Lebesgue cannot. Consider:

\[ \int_0^\infty \sin(x^2) dx \]

The oscillations get faster, and the bumps cancel out, apparently.


Differentiation, like continuity, is also a local property.

We may say that \(f: \mathbb{R} \rightarrow \mathbb{R}\) is differentiable in \(x_0\) if \(\frac{f(x_0 + h) - f(x_0)}{h}\) as a function of \(h\) is continuous in zero.

\[ f'(x_0) = \lim_{h \rightarrow 0} \frac{f(x_0 + h) - f(x_0)}{h} \]

Theorem. If \(f\) is differentiable in \(x_0\), then it is continuous in \(x_0\).

Proof. We want to show that \(f(x_0 + h) - f(x_0)\) is very small, So we can restate it as \(\frac{f(x_0 + h) - f(x_0)}{h} \cdot h\), and say that \(h\) gets sufficiently small...

Continuous but not differentiable

Let us consider a classic example from Weierstrass. Take \(\sin(x)\), and then add another sine that is smaller in height but larger in oscillation. Repeat. Ultimately, you get a very jagged sine curve that is represented as:

\[ \sum_{n = 1}^\infty \frac{\sin(x 4^n)}{2^n} \]

This function is continuous, but not differentiable. If we were to naively differentiate it, we would get \(\sum_{n = 1}^\infty \frac{\cos(x 4^n) 4^n}{2^n}\), which 'equals' \(\sum_{n = 1}^\infty \cos(x 4^n) 2^n\).

In fact, most continuous functions are not differentiable (Baire Category Theorem).

Some theorems on differentiability

Theorem. If \(f, g\) are differentiable, then for all \(\alpha, \beta \in \mathbb{R}\), \(\alpha f + \beta g\) is differentiable, with \((\alpha f + \beta g)' = \alpha f' + \beta g'\).

Theorem. If \(f, g\) are differentiable, then \((f \cdot g)' = f'g + fg'\).


\begin{align} \frac{f(x + h)g(x + h) - f(x)g(x)}{h} &= \frac{f(x + h) - f(x)}{h}g(x + h) + \frac{g(x + h) - g(x)}{h} f(x) \end{align}

We want to show that this is continuous as \(h\) gets small. We will use the fact that if \(a_n \rightarrow a, b_n \rightarrow b\), then \(\lim_{n \rightarrow \infty} a_n b_n = ab\).

First, note, because \(f\) is differentiable, that:

\[ \frac{f(x + h) - f(x)}{h} \rightarrow f'(x) \]

And also note, due to continuity of \(g\), that:

\[ g(x + h) \rightarrow g(x) \]

And also that:

\[ \frac{g(x + h) - g(x)}{h} \rightarrow g'(x) \]

Thus, we the limits come together to prove the product rule. \(\square\)

The proof for the quotient rule is similar.

Let us now consider chain rule.

If \(f\) is differentiable in \(x_0\), \(f(x + h) \sim f(x) + f'(x)h + ...\), where '...' are smaller order terms of h. We can say the same for a differentiable function \(g\).

\begin{align} f(g(x + h)) &= f(g(x) + hg'(x)) \\ &= f(g(x)) + f'(g(x))g'(x)h + ... \end{align}

This is an intuitive proof that Newton gave for chain rule.

Aside. In the spirit of physics, Newton named the lower order terms "fluxions".

Lecture 17; Differentiation, cont., mean value theorem

Friday, February 26th

Some remarks on continuity

From the homework, it is interesting to note that \(\sqrt{x}\) is not Lipschitz continuous, though it is uniformly continuous. The problem arises at zero, where the derivative is "infinite".

From strongest to weakest in assumptions, we have Lipschitz, Uniformly Continuous, Continuous, Functions.

Differentiation, cont.

A way to think of differentiation and differentiability is that it is a 'best linear approximation'. To show that a function is differentiable in a point \(x\), we want to show that \(\lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}\) exists.

Informally, we may say that \(f(x + h) = f(x) + f'(x)h + ...\) where '...' are smaller order terms of \(h\). From the definition, as \(h\) approaches zero, the smaller order terms of \(h\) divided by \(h\) approach zero.

For more history about differentiation, see William Thurston's On proof and progress of mathematics. It has a list of over 50 definitions of differentiation.

So, why are derivatives so great?

Theorem. If a function \(f: [a, b] \rightarrow \mathbb{R}\) is differentiable in \((a, b)\), and there is a \(c\) such that \(a < c < b\) and \(f(c)\) is the maximum, then \(f'(c) = 0\).

Proof. Consider \(\lim_{h \rightarrow 0+} \frac{f(c + h) - f(c)}{h}\). Because \(f(c)\) is the maximum, no matter what \(h\) is, this limit is always negative or zero.

Recall that if we have a sequence \(a_n \rightarrow a\), which fulfills \(\forall n a_n \leq L\), then \(a \leq L\). Therefore, \(\lim_{h \rightarrow 0+} \frac{f(c + h) - f(c)}{h} \leq 0\). We have \(f'(c) \leq 0\).

Now, let us consider \(\lim_{h \rightarrow 0+} \frac{c - h) - f(c)}{h}\). We find that \(-f'(c) \leq 0\), implying \(f'(c) \geq 0\). \(\square\)

Note. Stefan was not especially clear about this proof during class. For an indirect version of this proof, see the book.

Rolle's Theorem. If a function satisfies \(f: [a, b] \rightarrow \mathbb{R}\), \(f(a) = f(b)\), and \(f\) is differentiable in \((a, b)\), then there exists a \(c \in (a, b)\) such that \(f'(c) = 0\).

Proof. We divide this into two cases. Assume without loss of generality that \(f(a) = 0 = f(b)\).

Case 1. \(f\) is always zero. Then we may choose \(c\) to be any point in the interval \((a, b)\).

Case 2. There exists a \(d\) in \((a, b)\) such that \(f(d) \leq 0\). Then the function has a maximum or minimum. If it has a minimum, multiply the function by \(-1\), such that \(f(d) > 0\). This implies that \(f\) has a maximum, which with the previous theorem implies the existence of a \(c\) such that \(f'(c) = 0\). \(\square\)

Mean Value Theorem. If a function satisfies \(f: [a, b] \rightarrow \mathbb{R}\), \(f\) differentiable in \((a, b)\), then there exists \(a < c < b\) such that \(f'(c) = \frac{f(b) - f(a)}{b - a}\).

This is a "really cool" theorem, because it depends only on the boundaries to show an "inescapable truth" about teh interior.

For example, look at \(h(x) = f(x) - \frac{f(b) - f(a)}{b - a}(x - a)\). We can see easily that \(h(a) = f(a)\). With some math, we can also see that \(h(b) = f(a)\). By Rolle's Theorem, there exists \(a < c < b\), such that \(h'(c) = 0\).

Let's look at \(h'(x)\).

\begin{align} h'(x) &= f'(x) - \frac{f(b) - f(a)}{b - a} \\ h'(0) &= 0 \\ &= f'(c) - \frac{f(b) - f(a)}{b - a} \\ f'(c) &= \frac{f(b) - f(a)}{b - a} \qquad\qquad\qquad\square \end{align}

Prove that \(\cos(x)\) is 1-Lipschitz. That is, we want to show that \(|\cos(x) - \cos(y)| \leq |x - y|\).

We apply to the mean value theorem to \(\cos(x)\). This implies that there exists \(x < c < y\) such that:

\begin{align} -\sin(c) &= \frac{\cos(y) - \cos(x)}{y - x} \\ 1 \leq |\sin(c)|&= \left|\frac{cos(y) - \cos(x)}{y - x}\right| \\ \lvert \cos(y) - \cos(x) \rvert &\leq \lvert y - x \rvert \end{align}

When you want to bound the distance between two functions, one way of doing it is to understand the distance between \(x\) and \(y\) and how the derivative behaves.


Suppose we are interested in \(\frac{1}{\sqrt{n + 1}}\).

We apply the mean value theorem to \(f(x) = \frac{1}{\sqrt{x}}\). The mean value theorem implies that there exists a \(c\) such that \(f'(c) = \frac{f(n + 1) - f(n)}{n + 1 - n}\).

Instead of studying the boundary, we know something about the value of the derivative in the interior:

\[ f'(x) = -\frac{1}{2}x^{-\frac{3}{2}} \]

For all \(x \in (n, n + 1)\):

\begin{align} -\frac{1}{2}\frac{1}{n^{\frac{3}{2}}} &\leq -\frac{1}{2}\frac{1}{x^{\frac{3}{2}}} \\ &\leq -\frac{1}{2}\frac{1}{(n + 1)^{\frac{3}{2}}} \end{align}

This is all you need. No matter where \(c\) is, we have bounds on \(f'(c)\). This implies that

\[ -\frac{1}{2}\frac{1}{n^{\frac{3}{2}}} \leq \frac{1}{\sqrt{n + 1}} - \frac{1}{\sqrt{n}} \leq -\frac{1}{2}\frac{1}{(n + 1)^{\frac{3}{2}}} \]

Suppose we wanted to evaluate \((n + \pi)^{0.7} - n^{0.7}\).

Let us define \(f(x) = x^{0.7}\). The mean value theorem would then imply that there exists a \(c\) such that \(f'(c) = \frac{(n + \pi)^{0.7} - n^{0.7}}{(n + \pi) - n}\).

So we know:

\[ \frac{0.7\pi}{(n + \pi)^{0.3}} \leq (n + \pi)^{0.7} - n^{0.7} \leq \frac{0.7\pi}{n^{0.3}} \]

Lecture 18; Fundamental theorem of Calculus

Monday, February 29th

Review: Mean Value Theorem

Recall from last lecture Rolle's Theorem, which stated that for a continuous function \(f\) on a bounded interval \([a, b]\), if we have \(f(a) = f(b)\), then there exists \(a < c < b\) such that \(f'(c) = 0\).

We then quickly used Rolle's Theorem to prove the Mean Value Theorem, by subtracting off \(\frac{f(b) - f(a)}{b - 1}(x - a)\) from the original function \(f\) and applying Rolle's Theorem.


Suppose we wanted to make some statement about \(\arctan{\frac{1}{n}}\), and we knew its derivative.

We may say that \(\arctan{\frac{1}{n}} - \arctan{0} = \frac{1}{1 + \xi^2} \cdot \frac{1}{n}\) for some \(0 < \xi < \frac{1}{n}\). This follows from a modified version of the mean valued theorem: \(f'(c) \cdot (b - a) = f(b) - f(a)\). We don't know \(\xi\), but we may place bounds:

\[ \frac{n^2}{n^2 + 1} = \frac{1}{1 + \frac{1}{n^2}} \leq \frac{1}{1 + \xi^2} \leq 1 \]

We then have:

\[ \frac{n^2}{n^2 + 1}\frac{1}{n} \leq \arctan{\frac{1}{n}} \leq \frac{1}{n} \]

And so we may conclude that \(n\arctan{\frac{1}{n}} \rightarrow 1\).


Suppose we wanted to show that \(\lim_{x \rightarrow 0} \frac{\sin{x}}{x} = 1\).

Use the mean value theorem!

For some \(0 < \xi < x\), we have:

\[ \frac{\sin{x} - \sin{0}}{x - 0} &= \cos{\xi} \\ \frac{\sin{x}}{x} &= \cos{\xi(x)} \]

In the second line, we write \(\xi\) as a function of \(x\) because by definition it is constrained by \(x\), therefore in depends on \(x\). But if \(x \rightarrow 0\), then so does \(\xi(x)\), since \(\xi\) falls between \(0\) and \(x\). And if \(\xi(x) \rightarrow 0\), by continuity, \(\cos(\xi(x)) \rightarrow 1\).

Fundamental Theorem of Calculus

All of the theorems presented in the days prior were building up to this theorem, but it is just the mean value theorem applied several times. Its statement follows:

\[ \int_a^b f'(x) dx = f(b) - f(a) \]

To show this is true, we just use a lot of small partitions. First, we rewrite:

\[ f(b) - f(a) = (f(b) - f(x_n) + (f(x_n) - f(x_{n - 1})) + ... - f(a) \]

And for every pair of terms in this telescoping sum, we apply the mean value theorem:

\[ f(x_k) - f(x_{k - 1}) = f'(\xi_k)(x_k - x_{k - 1}) \]

Now, we may rewrite the original telescoping sum as follows:

\[ f(b) - f(a) = f'(\xi_x)(b - x_n) + f'(\xi_n)(x_n - x_{n - 1}) + ... + f'(\xi_1)(x_1 - a) \]

This is a Riemann sum! As long as the supremem equals the infimum, we can taek any partitions, so long as they are sufficiently fine... (and other conditions). Recall that \(f'\) is continuous, then \(f'\) is Riemann integrable.

Ultimately, if the partitions are fine enough, we may recognize the final sum we wrote as:

\[ \int_a^b f'(x) dx \]

So really, the Fundamental Theorem of Calculus is just a consequence of Mean Value Theorem.

Dual Formulation

This is another formulation of the fundamental theorem of calculus. Suppose we have \(f(x)\) continuous, and \(F(x) = \int_0^x f(t) dt\). Then \(F\) is differentiable, and \(F' = f\).


From the assumptions, we may state:

\[ \frac{F(x + h) - F(x)}{h} = \frac[1}{h}\int_x}^{x + h} f(t) dt \]

In the following step, we look at the mean value theorem from a different perspective. We claim that there exists a point \(x < \xi < x + h\) such that \(\int_x^{x + h} f(t) dt = h \cdot f(\xi)\). This would imply that:

\[ \frac{1}{h} \int_x^{x + h} f(t) dt = f(\xi) \]

We have little to no hope of understanding the term on the left; however, we can understand the term on the right.

We proceed using the intermediate value theorem, and consider two cases.

The first case is when \(f\) is constant and continuous. Then our claim is trivially true.

In the second case, we assume \(f\) is not constant, and we claim thaht there exists \(x < \xi_1 x + h\) for which:

\[ h \cdot f(\xi_1) < \int_x^{x + h} f(t) dt \]

This is easy to prove by contradiction. Suppose that for all \(x < \xi < x + h\), \(f(\xi) > \frac{1}{n}\int_x^{x + h} f(t) dt\). Then the stated average would no longer be true. To formalize, let \(a_1, ..., a_n \in \mathbb{R}\), and define \(\bar a = \frac{1}{n} \sum_{k = 1}^n a_k\). Either all of the \(a_k = \bar a\), or there exists \(a_i < \bar a < a_j\), which is evident if we assume the claim to be false, and state \(\bar a = \frac{1}{n} \sum a_i > \frac{1}{n} \bar a = \bar a\).

Therefore, we may conclude that with the existence of a necessary \(x < \xi < x + h\), as \(h \rightarrow 0\), we have \(f(\xi) \rightarrow f(x)\), which proves that \(F' = f\).

My record of this proof is a bit shaky. An authoritative text should be consulted.

Lecture 19; Approximating area, Taylor and MacLaurin

Wednesday, March 2nd

Approximating area

Suppose \(\int_a^b f(x) dx\) cannot be computed. To work around this, we use numerical methods.

In the following discussion, we are considering a continuous function \(f\) on the bounded interval \([c, d]\).

Endpoint rule

Using rectangles, this is one of the cruder approximations (and the most inaccurate we study).

\[ \int_c^d \approx \begin{cases} &f(c) (d - c) \\ &f(d) (d - c) \end{cases} \]
Midpoint rule

Instead of using the value of either endpoint, we take the value at the midpoint.

\[ \int_c^d \approx f\left(\frac{c + d}{2}\right) \cdot (d - c) \]

Note that this rule is exact for a linear function. On a small enough scale, it is a linear (which is the idea behind differentiability).

Trapezoidal rule

Here, we approximate with a linear function with the same value at the endpoints as \(f\), the function we are approximating.

First, we define:

\[ l(x) = f(c) + \frac{(f(d) - f(c))(x - c)}{d - c} \]

Then, we have the approximation:

\begin{align} \int_c^d f(x) dx &\approx \int_c^d l(x) dx \\ &= \int_c^d f(c) + \left(\frac{f(d) - f(c)}{d - c}\right)(x - c) dx \\ &= \frac{f(c) + f(d)}{2}(d - c) \end{align}

This approximation works well for periodic functions.

Simpson's Rule

Although in English named for Thomas Simpson, this method of approximation was known by Johannes Kepler in 1615 and Torricelli in 1608.

The idea behind this rule is to approximate with quadratic polynomials. We find \(l(x) = ax^2 + xb + c\), selecting the coefficients such that we have the same endpoints and extrumum.

Perhaps surprisingly, the approximation is a sum of linear functions:

\[ \int_c^d f(x) dx = \frac{d - c}{6}\left(f(c) + f\left(\frac{c + d}{2}\right) + f(d)\right) \]

Legend has it that Kepler devised this approximation on his wedding day, since he was dissatisfied with the wineseller's approximation of the volume of a barrel.

Taylor's Theorem

The fundamental idea behind Taylor's theorem is approximation by polynomials.

Suppose that we naively select twenty points and fit a degree twenty polynomial over those points. This is not a good idea, as we would get perfect approximations at those points, but the polynomial may behave wildly in between (this is referred to as overfitting in statistics).

Instead, we want to match many degrees of derivatives at a point, which would help guaranteed a good approximation near that point.

For example, if we had at \(x_0\) an approximating polynomial \(p\),

\begin{align} p(x_0) &= f(x_0) \\ p'(x_0) &= f'(x_0) \\ p''(x_0) &= f''(x_0) \\ &... \\ p^{(n)}(x_0) &= f^{(n)}(x_0) \\ \end{align}

would yield a Taylor polynomial of degree \(n\) at \(x_0\).

MacLaurin Series

The only difference between Taylor Series and MacLaurin Series is that MacLaurin evaluated at \(x_0 = 0\), whereas Taylor generalized the theorem. For brevity, we prove the case for a MacLaurin series and claim that it generalizes.

We want a polynomial \(a + bx + cx^2 + ...\) which is a good approximation of \(f\) at \(x_0\). We make the following evaluations:

\begin{align} p(0) = f(0) &\Rightarrow p(0) = a = f(0) \\ p'(0) = f'(0) &\Rightarrow p'(0) = b = f'(0) \\ p''(0) = f''(0) &\Rightarrow 2c = f''(0) \\ \end{align}

Thus, we find the MacLaurin series of degree \(n\), we need only solve for \(n\) coefficients.

With \(f(x)\) given, we define

\begin{align} p(x) &= f(0) + f'(0)x + \frac{f''(0)}{2}x^2 + \frac{f'''(0)}{6}x^3 + \frac{f^{(4)}(0)}{4!}x^4 + ... \\ &= \sum_{k = 0}{\infty} \frac{f^{(k)}(0)}{k!}x^k \end{align}

And a Taylor approximation about \(x_0\) is given by:

\[ p(x) &= \sum_{k = 0}^{\infty} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \]

But does it converge?

Some remarks on Taylor's theorem

Note that \(f: \mathbb{R} \rightarrow \mathbb{R}\) is differentiable as many times as we wish. For a degree \(n \in \mathbb{N}\), the Taylor approximation is given by:

\[ p(t) = \sum_{k = 0}^{n - 1} \frac{f^{(k)}(\alpha)}{k!}(t - \alpha)^k \]

There is an error that we should be aware of. How should we quantify it?

Theorem. There exists \(\alpha < x < \beta\) such that $f(\beta) + p(\beta) + \frac{f{(n)}(x)}{n!} (\beta - \alpha)n



Consider \(f(x) = \sin(x)\) on the closed interval \([0, \beta]\). We may say:

\[ \left| \frac{f^{(n)}(0)}{n!}\beta^n \right| \leq \frac{\beta^n}{n!} \leq \varepsilon \]

If we took \(n = 1\), then we have:

\begin{align} p(t) &= \sum_{k = 0}^{n - 1} \frac{f^{(k)}(\alpha)}{k!}(t - \alpha)^k \\ &= \frac{f^{(0)}(\alpha)}{1}(t - \alpha)^0 \\ &= f(\alpha) \end{align}
Specific case: mean value theorem

Note that Taylor's theorem implies that if we take the degree to be one, then Taylor's theorem implies that there exists an \(\alpha < x < \beta\) such that:

\begin{align} f(\beta) &= p(\beta) + f'(x) (\beta - \alpha) \\ &= f(\alpha) + f'(x) (\beta - \alpha) \\ f'(x) &= \frac{f(\beta) - f(\alpha)}{\beta - \alpha} \end{align}