2 General Concepts

For PDEs of order greater than one, there does not exists a general theory. Over the time there have been discovered different methods to solve several PDEs, in particular those PDEs which show up in physics. Afterwards these methods were extended to larger and larger classes of PDEs. It turned out that the successful methods of solving PDEs differ from each other substantially. As a result there does not exists one unified theory of PDEs, but there exist several islands of well understood families of PDEs inside the large set of all PDEs. It was Jacobi who formulated in his lectures on Dynamics in the years 1842–43 the following general recipe:

“The main obstacle for the integration of a given differential equations lies in the definition of adapted variables, for which there is no general rule. For this reason we should reverse the direction of our investigation and should endeavour to find, for a successful substitution, other problems which might be solved by the same.”

The strategy is to determine for any successful method all PDEs which can be solved by this method. We have seen that the method of characteristics is a more-or-less general method to solve first order PDEs. Now we investigate the second order PDEs. In this lecture we consider only second order linear PDEs. A general second order linear PDE has the following form

By Schwarz’s Theorem for twice differentiable

u

this expression does not change if we replace

a_{𝑖𝑗}

\frac{1}{2} (a_{𝑖𝑗} + a_{𝑗𝑖})

. So we always assume that

a_{𝑖𝑗}

is symmetric.

It is a theorem of linear algebra that a real symmetric matrix can be diagonalised (by an orthogonal matrix even) and the eigenvalues are all real. We use the signs of these eigenvalues as our classification.

This is not a complete classification, in the sense that there are PDEs which are not of these three types. For example, the

a

matrix of a PDE may have several positive, zero, and negative eigenvalues. If

a

is not constant, then of course the eigenvalues are functions of

x

and may be different at different points. An operator might only be elliptic on a subdomain of

Ω

. And this definition only applies to second order linear PDEs. None-the-less we will concentrate on PDEs of these three types; this is an introductory course.

Elliptic PDEs. As we have just stated, for an elliptic PDE the eigenvalues of

a

all have the same sign. Any solution of

𝐿𝑢 = 0

is also a solution of

- 𝐿𝑢 = 0

, so we can always arrange for the eigenvalues to be positive. There is an equivalent condition that is more often taken as the definition of ellipticity: the operator

L

is elliptic if and only if

To see this equivalence, write

a = O^{T} 𝐷𝑂

for

O

an orthogonal matrix and

D = diag (λ_{1}, \dots, λ_{n})

the eigenvalues. Observe

a

satisfies (2.1) if and only if

D

does. Now let

w = e_{j}

be the

j

th standard basis vector. This yields

w^{T} 𝐷𝑤 = λ_{j}

. Hence (2.1) holds if and only if all the eigenvalues are positive.

Now we consider some concrete examples. If the matrix

a_{𝑖𝑗}

is the identity matrix and

b = 0, c = 0

, then this is the

Laplace equation.

△ u : = \frac{\partial^{2} u}{\partial x_{1}^{2}} + \dots + \frac{\partial^{2} u}{\partial x_{n}^{2}} = 0 .

Solutions of the Laplace equation are called harmonic functions. In Chapter 3 we present several tools which establish many properties of these harmonic functions. It turns out that many properties of the harmonic functions also apply to general solutions of

𝐿𝑢 = 0

, if the matrix

a_{𝑖𝑗}

is positive (or negative) definite. There has been done a lot of work to extend these tools to larger and larger classes of PDEs. One of the results is that the influence of the higher order derivatives on the properties of solutions is much more important than the influence of the lower order derivatives. We offer another lecture which presents many of these tools for such elliptic second order PDEs.

There are also non-linear PDEs to which these methods of elliptic PDEs apply. An important example whose investigation played a major role in the development of the elliptic theory is the

Minimal surface equation.

\nabla \cdot \frac{\nabla u}{\sqrt{1 + | \nabla u |^{2}}} = 0, u : Ω \to ℝ, Ω \subset ℝ^{n}

open.
The graphs of solutions describe so called minimal surfaces. The area of such hypersurfaces in

ℝ^{n + 1}

does not change with respect to infinitesimal variations. Soap bubbles are examples of such minimal surfaces. The boundary value problem of the minimal surface equation is called Plateau’s problem. For the first proof of the existence of solutions of this Plateau problem in the 1930s, Jesse Douglas received the first Field’s Medal. In this non-linear second order PDE the coefficients of the second derivatives also depend on the solution. A lot of work has been done to extend the tools of elliptic theory to elliptic PDEs whose coefficients belong to larger and larger functions spaces. This development induced the introduction of many new function spaces. In Section 2.4 we shall introduce the so called space of distributions. Many of the more advanced functions spaces are special subsets of the distributions.

Parabolic PDEs. For these linear PDEs the matrix

a_{𝑖𝑗}

considered as a symmetric bilinear form is only semi-definite and they belong to the boundary of the class of elliptic PDEs. Most of the methods of elliptic PDEs have an extension to this limiting case. So these limiting cases together with the class of elliptic PDEs form some extended class of elliptic PDEs. In spite of the deep relationship to the elliptic PDEs these equations have their own label. The simplest example is the:

Heat equation.

\dot{u} - △ u = 0 .

These parabolic PDEs describe diffusion processes. These are processes which level inhomogeneities of some quantity by some flow along the negative gradient of the quantity. A typical example for this quantity is the temperature from which the name for the heat equation originates. Many stochastic processes have this property. So the theory of parabolic PDEs has a deep relationship to the theory of stochastic processes. In this lecture we present in Chapter 4 this simplest example of linear parabolic PDE. We shall see how the tools for the Laplace equation can be applied in modified form to this heat equation. In case of the parabolic PDEs there also exists a non-linear example from the geometric analysis, whose investigation played a major role for the development of the elliptic theory (the tensor fields

g

and

R

are defined below):

Ricci Flow.

ġ_{𝑖𝑗} = - 2 R_{𝑖𝑗} .

This PDE describes a diffusion-like process on Riemannian manifolds. It levels the inhomogeneities of the metric, namely the Riemannian metric

g

. In the long run the corresponding Riemannian manifolds converge to metric spaces with large symmetry groups. Richard Hamilton proposed (in the 1970s) a program that aims to prove the geometrization conjecture of Thurston with the help of these PDEs. It states that every three-dimensional manifold can be split into parts, which can be endowed with an Riemannian metric such that the isometry group acts transitively. This conjecture implies the Poincare conjecture, which states that every simply connected compact manifold is the 3-sphere. Hamilton tries to control the long time limit of the Ricci flow on a general 3-dimensional Riemannian manifold. In 2003 the Russian mathematician Grisha Perelman published on the internet three articles which overcome the last obstacle of this program. This lead to the first proof of one of the Millennium Problems of the American Mathematical Society and was a great success of geometric analysis.

Hyperbolic PDEs. Besides the elliptic PDEs (including the limiting cases) the other important class of linear PDEs are called hyperbolic. In this case the matrix

a_{𝑖𝑗}

has one eigenvalue of opposite sign than all other eigenvalues. The simplest example is the

Wave equation.

\frac{\partial^{2} u}{\partial t^{2}} - △ u = 0 .

In Chapter 5 we present the methods how to solve this equation. We shall see that it describes the propagation of waves with constant finite speed. The solutions of general hyperbolic equations are similar to the solutions of this case, and many tools can be generalised to all hyperbolic PDEs. The investigation of these PDEs depend on the understanding of all trajectories, which propagate by the given speed. It was motivated by the theory of the electrodynamic fields, whose main system of PDEs are the

Maxwell equations.

\begin{aligned} Ė - \nabla \times B & = - 4 𝜋𝑗 & Ḃ + \nabla \times E & = 0 \\ \nabla \cdot E & = 4 𝜋𝜌 & \nabla \cdot B & = 0 . \end{aligned}

In this theory there is given a distribution of charges

ρ

and currents

j

on space time

ℝ \times ℝ^{3}

. The unknown functions are the electric magnetic fields

E

and

B

, which describe the electrodynamic forces induced by the given distributions of charges and currents

ρ

and

j

. The conservation of charge is formulated in the same way as in the scalar conservation law. So the change of the total charge contained in a spatial domain is described by the flux of the current through the boundary of the domain. By the divergence theorem this means that distributions of charge

ρ

and currents

j

obey

Again there exists a non-linear version which stimulated the development of the theory:

Einsteins field equations of general relativity.

R_{𝑖𝑗} - \frac{1}{2} g_{𝑖𝑗} R = κ T_{𝑖𝑗} .

Here for a given distribution of masses the energy stress tensor and the space time metric

g_{𝑖𝑗}

are the unknown functions. This metric is a symmetric bilinear form with one positive and three negative eigenvalues on the tangent space of space time. The corresponding Ricci curvature is denoted by

R_{𝑖𝑗}

and the scalar curvature by

R

Integrable Systems with Lax operators. Finally I want to mention a smaller class of PDEs, which are the main objects of my research. They are non-linear PDEs which describe an evolution with respect to time which is very stable. This means that the solutions have in a specific sense a maximal number of conserved quantities. The theory of integrable systems belongs to the field of Hamiltonian mechanics, which originated from Newtons description of the motion of the planets. The Scottish Lord John Scott Russell got very excited in 1934 about the observation of an solitary wave in a Scottish channel and published a “Report on Waves”. This report was quite influential. The two Dutch mathematicians Korteweg and De Vries translated his observation into a PDE describing the profile of water waves travelling along the channel:

Korteweg-de-Vries equation.

4 \dot{u} - 6 u \frac{∂𝑢}{∂𝑥} - \frac{\partial^{3} u}{\partial x^{3}} = 0 .

First by numerical experiments in the 1950s with the first computers and latter in the 1970s by mathematical theory, the solutions of this PDE were shown to have exactly the properties which made Lord Russell so exited: they describe waves which propagate through each other without changing their shape. This lead to the discovery of an hidden relation of the theory of integrable systems with the theory of Riemann surfaces, which is another field with a long history. A major step towards the discovery of this relation was the observation of Peter Lax that this equation can be written as

2.2 The Questions

In addition to the types of PDEs we will study, let us explain some of the questions that we are interested in answering. Broadly speaking they are existence, uniqueness, and regularity.

The first two you probably have some experience with, so let us begin with the third. The regularity of a solution of a differential equation refers to its properties. Most often this is its differentiability, for example, twice continuously differentiable. But it can also be about boundedness, integrability, or the growth/decay rate of the function. These regularity properties are usually expressed in the form of a function space, e.g.

C^{2}

Many types of regularity can be ordered in a hierarchy. Later in this chapter we will introduce distributions. We say have the lowest regularity. Then comes the measurable functions. Some measurable functions are locally integrable,

L_{loc}^{1}

. A function

f

belongs to

L_{loc}^{1}

if for every compact set

K

in its domain

| f |_{K} |

has a finite integral. You might also know the Lebesgue norms. The elements of

L p_{loc}

describe ever smaller families of functions, whose regularity increase with

p \in [1, \infty]

. The next smallest class are Sobolev functions whose

k

-th order partial derivatives belong to

L p_{loc}

. The regularity further increases for the functions in

C^{k}

. Finally we end with the smooth functions and the analytic functions with the highest regularity.

Consider the analogy to the algebraic equation

x^{2} + 1 = 0

. This has no solutions

x \in ℝ

but two solutions

x \in ℂ

. Likewise, the number of solutions to a PDE can change depending on which functions we are considering. To be concrete, perhaps there are many

C^{2}

solutions, but no bounded

C^{2}

solutions. There is usually a natural level of regularity to require of a solution: the solution to a second order PDE should be twice differentiable. But as we have seen in the previous chapter, it is sometimes necessary to change the meaning of ‘solution’ and consider ‘non-differentiable solutions’ (weak solutions) to a PDE, even if that sounds like a contradiction.

Sometimes allowing lower regularity doesn’t increase the number of solutions of a differential equation. A classic example from ODE theory is

u^{'} = u

. Suppose

u

is a differentiable solution to this equation. But then

u^{'} = u

tells us that

u^{'}

is equal to a differentiable function. That means that

u^{'}

is differentiable, i.e. that

u

is twice differentiable. Repeating this argument, we see that

u

is infinitely many times differentiable (smooth). We say that the solution has higher regularity than expected. The Laplace and heat equations both exhibit highly regular solutions.

A problem is a differential equation on some domain together with some additional (non PDE) conditions. A typical ODE has infinitely many solutions, but a typical partial differential equations has an infinite dimensional space of solutions. The idea is to give the right additional conditions so that the problem has a unique solution. A solution of an ordinary differential equations of

m

-th order is in many cases uniquely determined by fixing the values of the first

m

derivatives at

t = 0

. For partial differential equations the solutions are functions on higher dimensional domains

Ω \subset ℝ^{n}

. A natural condition is the specification of the values of the solution and some of its derivatives on the boundary of the domain or on a hypersurface within the domain. The search for solutions which obey this further specification are called boundary value problems. When one of the variables represents time and we give conditions at

t = 0

, naturally we call this an initial value problem.

We are most interested in problems that are well-posed. This means that (1) the problem has a solution, (2) the solution is unique, and (3) the solution depends continuously on the data. We have to balance the choice of regularity and how much data we give so that the problem has a unique solution, but not too much that it doesn’t have any solution. If we determine all possible boundary values that have solutions, then the space of solutions is completely parameterised. Again to give an ODE analogy, the solutions of

u^{'} = u

are

u (t) = A e^{t}

, so

A \in ℝ

parameterises the solutions.

Finally, there is the question of existence. This is perhaps the most fundamental question, because it is about the definition of ‘a solution’. We have already mentioned how this is affected by regularity (including weak solutions) and boundary value conditions. But proving the existence of solutions of PDEs is in general much more difficult than for ODEs, and there are not too many general theorems that we have. In fact, there is a famous example of a simple-looking PDE that does not have any solutions, not even locally. This example is a simplification by Nirenberg of an example of H. Lewy: there is no solution

u : Ω \to ℂ

on any open subset

Ω \subset ℝ^{2}

where

f

is a specially constructed smooth function. Notice in particular that this is first order PDE, with analytic coefficients and a smooth inhomogeneous term. In previous years we gave a proof of this, but it requires certain facts from complex analysis (aka Funktionstheorie) that many students didn’t understand. Interested students may ask me for a copy of the old script. Instead we will give a different example of a PDE without a solution in Section 3.5 using the techniques that fit the themes of this course.

2.3 Divergence Theorem

In this section we present a generalisation of the fundamental theorem of calculus to higher dimensions, namely the divergence theorem. This theorem has many important consequences. In this section we present two: First we generalise partial integration to higher dimensions. Second we explain in which sense the higher dimensional scalar conservation law describes a conserved quantity.

The divergence theorem is a statement about the integral over a submanifold of

ℝ^{n}

, so naturally we should define submanifolds and their integrals.

Definition 2.2. The graph of a function $λ : U \subset ℝ^{k} \to ℝ^{n - k}$ is the $k$ -dimensional subset

graph (λ) = {(x, y) \in U \times ℝ^{n - k} ∣ y = λ (x)} \subset ℝ^{n} .

We allow reordering of the components of $ℝ^{n}$ . By this we mean that both ${(x, y, x^{2} + y^{2}) \in ℝ^{3}}$ and ${(x, x^{2} + z^{2}, z) \in ℝ^{3}}$ are graphs, for example.

A subset $A \subset ℝ^{n}$ is called a $k$ -dimensional submanifold if it is a $k$ -dimensional graph locally. That means for every point $x \in A$ there exists an open $O_{x} \subset ℝ^{n}$ such that $A_{x} : = A \cap O_{x}$ is a $k$ -dimensional graph.

We say that a graph or submanifold is $C^{l}$ (or some other regularity) if it is locally the graph of a function $λ \in C^{l}$ .

The classic, and for us important, example of a submanifold that is not a graph globally is the circle

𝕊^{1} = {(x, y) \in ℝ^{2} ∣ x^{2} + y^{2} = 1}

. This is because

y = \pm \sqrt{1 - x^{2}}

is not a function. However it can written as the union of four graphs

For practical calculation it is not always the best idea to reduce a submanifold to graphs. Often a parameterisation can cover more of the submanifold, which means less work.

The Jacobian of

Φ

is an

n \times k

matrix, so its rank cannot be greater than

k

. Thus a regular parameterisation is also called full-rank. A graph is a special type of parameterisation, one where

k

of the components of

Φ

are just the

k

input variables. In other words

Φ (x) = (x, λ (x))

, or some rearrangement. This is always a regular parameterisation, because

Φ^{'}

contains the

k \times k

identity matrix. For an example of a non-regular parameterisation, consider the parameterisation

(x, y) \mapsto (x, 0, 0)

of the

x

-axis in

ℝ^{3}

. We see that

y

is not really playing any role and the submanifold is only one-dimensional. This is the reason we should consider regular parameterisations.

The symbol

d σ

can be given a formal meaning, but for us it is just a reminder that it is a ‘submanifold integral’ and not an integral on a subset of

ℝ^{n}

in the usual sense. The

k

-dimensional parallelotope spanned by the

k

column vectors of a

n \times k

-matrix

A

has the volume

\sqrt{\det (A^{T} A)}

. The motivation for the

\sqrt{\det}

factor in the definition of the integral is that it measures the distortion of the parameterisation. This value turns out to be independent of the choice of regular parameterisation of

A

Proof. $Φ (x) = (x, λ (x))$ is a regular parameterisation (without loss of generality we can relabel the coordinates to achieve this form). Suppose that we have another regular parameterisation $Ψ : V \to A$ . Then define $Υ = Φ^{- 1} \circ Ψ : V \to U$ . We claim that $Υ$ is continuously differentiable. This is not so clear, because $Φ^{- 1}$ is only defined on $A$ , not on a euclidean space, and so we can’t apply the chain rule directly. However, let $Π : ℝ^{n} \to ℝ^{k}$ be the projection $Π (x, y) = x$ . Clearly $Π \circ Φ (x) = Π (x, λ (x)) = x$ . This shows that $Φ^{- 1} = Π |_{A}$ . Therefore $Υ = Π \circ Ψ$ is another formula for $Υ$ . Now we can apply the chain rule and conclude $Υ$ is continuously differentiable.

Now we can carry out an computation that connects the two integrals

\begin{array}{l} \int_{V} f \circ Ψ \times \sqrt{\det ({(Ψ^{'})}^{T} Ψ^{'})} d μ_{ℝ^{k}} \\ = \int_{V} f \circ Φ \circ Υ \sqrt{\det ({({(Φ \circ Υ)}^{'})}^{T} {(Φ \circ Υ)}^{'})} d μ_{ℝ^{k}} \\ = \int_{V} (f \circ Φ \sqrt{\det ({(Φ^{'})}^{T} Φ^{'})}) \circ Υ | \det Υ^{'} | d μ_{ℝ^{k}} \\ = \int_{U} f \circ Φ \sqrt{\det ({(Φ^{'})}^{T} Φ^{'})} d μ_{ℝ^{k}} . \end{array}

In the last step we applied the transformation formula of Jacobi. This shows that using any parameterisation gives the same result as using the graph parameterisation. □

This is a very practical definition in that it gives you a concrete integral to compute. However many submanifolds that we want to consider cannot be covered by a single parameterisation. The typical example is the sphere: any open set

U \subset ℝ^{k}

is not compact and the sphere is compact, so there cannot exist a homeomorphism

Φ

between them. However if we use two parameterisations, then each can cover a part of the sphere and together they can cover the whole sphere. The trouble is now these

A_{i}

can overlap, so if we just integrate in each parameterisation then we will ‘double-count’ the points of

A

. The answer to this is an elegant theoretical tool, but one that is not practically useful: a so called partitions of unity.

For every countable family of open subsets of

ℝ^{n}

there exists a smooth partition of unity. A proof can be found in many textbooks and in Prof Schmidt’s script of the lecture Analysis II.

The idea of this definition is that we can write

f (x) = 1 \times f (x) = \sum_{i} h_{i} (x) f (x)

. Then each function

h_{i} f

is zero outside of

A_{i}

so it is only necessary to integrate it on

A_{i}

, not on all of

A

. We assumed that

A

was compact so that the sum is finite and we avoid any issues of convergence. The restriction that

A

is compact is not necessary, but then one must deal with the convergence issues.

Proof. Suppose that we have two covers of parameterising sets $A = \cup_{i} A_{i} = \cup_{j} B_{j}$ and correspondingly two partitions of unity $h_{i}$ and $g_{j}$ . Define a new cover $C_{i, j} = A_{i} \cap B_{j}$ . It has a partition of unity $h_{i} g_{j}$ . Each set $C_{i, j}$ can be parameterised by restricting the parameterisation $Φ$ of $A_{i}$ to $Φ^{- 1} [A_{i} \cap B_{j}]$ . Observe

\sum_{i} \int_{A_{i}} h_{i} f d σ = \sum_{i} \int_{A_{i}} (\sum_{j} g_{j}) h_{i} f d σ = \sum_{i, j} \int_{A_{i}} g_{j} h_{i} f d σ = \sum_{i, j} \int_{C_{i, j}} g_{j} h_{i} f d σ .

The same calculation holds for the integral $\sum_{j} \int_{B_{j}} g_{j} f d σ$ , showing that the two are equal. We have already seen that if we use two different parameterisations for the same set that the integral has the same value. Therefore we have shown that definition is independent of parameterisation and partition of unity. □

Integrals over submanifolds have many of the same properties as the usual integral. This is because ‘under the hood’ it is a the usual integral with a correction factor. An important property that does not carry over is the change of variables formula. Only certain changes of variables preserve the correction factor. These properties will be proved in the tutorials.

We are almost ready to state the divergence theorem. In the divergence theorem we work with a bounded and open set

Ω \subset ℝ

. In general such sets can have very complicated boundaries, for example fractals. We will require

\partial Ω

to be a

(n - 1)

-dimensional submanifold. The idea of requiring

Ω

to be bounded is that

\partial Ω

is compact. This idea appears in the proof when we say

\bar{Ω}

is compact.

There are three more formulas that we will need. First, just in case you missed the first tutorial, for a vector valued function

F

, the divergence of

F

\nabla \cdot F = \partial_{1} F_{1} + \dots + \partial_{n} F_{n}

. The

\cdot

is meant to remind you of the formula for the dot product; it is not actually a dot product. Second, because a submanifold is locally a graph, it is possible to understand its geometry. In the situation of the theorem

λ : U \subset ℝ^{n - 1} \to ℝ

is a scalar valued function. Then

e_{i} + \partial_{i} λ e_{n}

are tangent vectors to the submanifold and so

{(- \nabla λ, 1)}^{T}

is perpendicular to the submanifold. Thus the unit length normal vector is

We see that it a smooth vector field, well-defined up to a choice of sign. The last formula is a simplification of the distortion factor for the graph parameterisation

Φ (v) = {(v, λ (v))}^{T}

. We have

Φ^{'} = {(I_{n - 1} | \nabla λ)}^{T}

where

I_{n - 1}

is the

(n - 1) \times (n - 1)

identity matrix. The following calculation makes use of the Weinstein–Aronszajn identity

Proof. First we construct a cover for $\bar{Ω}$ . Because the boundary is a $(n - 1)$ -dimensional submanifold for every point of $\partial Ω$ we know that there is an open set $O$ such that $\partial Ω \cap O = graph (λ)$ . By shrinking $O$ , we can assume that $O$ is a cube. Let the cover be $Ω$ and all these cubes. Due to the compactness of $\bar{Ω}$ we can find a finite subcover and choose a subordinate partition of unity. This decomposes $F$ into a finite sum $\sum h_{l} F$ . By linearity it suffices to show the statement for any $G = h_{l} F$ individually.

This leads to two cases. For the first case, suppose we have a term $G = h_{l} F$ corresponding to $Ω$ . In particular $G$ and $G^{'} = {(h_{l} F)}^{'}$ are zero on $\partial Ω$ . The right hand side of the divergence theorem is zero. By defining $G$ to be zero outside of $Ω$ we can extend $G$ to a continuously differentiable function on $ℝ^{n}$ . Choose a cube ${[- R, R]}^{n}$ which contains $Ω$ . By Fubini we may integrate the $i$ -th term of $\nabla \cdot G = \partial_{1} G_{1} + \dots, + \partial_{n} G_{n}$ first over $d x_{i}$ . Due to the fundamental theorem of calculus this integral is the difference of the values of $G$ at two boundary points and vanishes. For example

\begin{array}{l} \int_{Ω} \partial_{1} G_{1} & = \int_{{[- R, R]}^{n}} \partial_{1} G_{1} d x = \int_{{[- R, R]}^{n - 1}} (\int_{- R}^{R} \partial_{1} G_{1} d x_{1}) d x_{2} \dots d x_{n} \\ = \int_{{[- R, R]}^{n - 1}} [G_{1}]_{- R}^{R} d^{n - 1} x = \int_{{[- R, R]}^{n - 1}} [0 - 0] d^{n - 1} x = 0 . \end{array}

This shows that the left side of the divergence theorem also vanishes.

In the second case, we have a term $G = h_{l} F$ that corresponds to a set that covers the boundary. We assumed the set was a cube, so write it as $U \times (a, b)$ . Relabel the coordinates so that the boundary is a graph $x_{n} = λ (x)$ and $\bar{Ω} \cap (U \times (a, b)) = {(\begin{matrix} u \\ y \end{matrix}) ∣ y \leq λ (u)}$ . We use the variables $u = (u_{1}, \dots, u_{n - 1}) = (x_{1}, \dots, x_{n - 1})$ and $y = x_{n}$ for convenience. We may assume that $G$ and $G^{'}$ are zero on $∂𝑈 \times (a, b)$ and $U \times {a}$ , but not that it is zero on $\partial Ω$ . This is because $\partial Ω$ is ‘inside’ the cube $U \times (a, b)$ and we only know that $h_{l}$ vanishes on the outside of the cube.

Again, we handle the terms of $\nabla \cdot G = \partial_{1} G_{1} + \dots + \partial_{n} G_{n}$ one at a time. Suppose $1 \leq i < n$ and consider the function

u \mapsto \int_{a}^{λ (u)} G_{i} (u, y) d y .

It vanishes for $u \in ∂𝑈$ as does its derivative

\frac{\partial}{\partial x_{i}} \int_{a}^{λ (u)} G_{i} (u, y) d y = \frac{∂𝜆 (u)}{\partial x_{i}} G_{i} (u, λ (u)) + \int_{a}^{λ (u)} \frac{\partial G_{i} (u, y)}{\partial x_{i}} d y .

Applying the same argument as in the first case, we see that the integral of $\partial_{i}$ -derivative over $U$ vanishes. Therefore

\begin{array}{l} \int_{\bar{Ω} \cap (U \times (a, b))} \frac{\partial G_{i} (u, y)}{\partial x_{i}} d μ = \int_{U} \int_{a}^{λ (u)} \frac{\partial G_{i} (u, y)}{\partial x_{i}} d y dn - 1 u = - \int_{U} \frac{∂𝜆 (u)}{\partial x_{i}} G_{i} (u, λ (u)) dn - 1 u \\ = \int_{U} G_{i} (u, λ (u)) N_{i} \sqrt{1 + | \nabla λ |^{2}} dn - 1 u = \int_{A^{'}} G_{i} N_{i} d σ . \end{array}

Note that the signs required us to use the outward-pointing normal, which in this case means that the last component of the vector $N$ is positive.

For $i = n$ , we can just use the fundamental theorem of calculus on the inner integral

\begin{array}{l} \int_{\bar{Ω} \cap (U \times (a, b))} \frac{\partial G_{n} (u, y)}{\partial x_{n}} d μ = \int_{U} \int_{a}^{λ (u)} \frac{\partial G_{n} (u, y)}{\partial x_{n}} d y d^{n - 1} u = \int_{U} G_{n} (u, λ (u)) d^{n - 1} u \\ = \int_{U} G_{n} (u, λ (u)) N_{n} \sqrt{1 + | \nabla λ |^{2}} d^{n - 1} u = \int_{A^{'}} G_{n} N_{n} d σ \end{array}

Summing these terms together proves the theorem. □

We consider now some special cases of the theorem that occur over and over in practice. For a scalar valued function

f

the divergence theorem implies for all

i = 1, \dots, n

For two functions

f

and

g

whose product vanishes on the boundary

\partial Ω

and satisfies the corresponding assumptions of the divergence theorem we obtain by the Leibniz rule

This is called integration by parts. Inductively we get for any multi-index

γ

As a second application of the divergence theorem we can generalise the idea of the scalar conservation law to vector-valued functions. For any continuously differentiable function

F : ℝ \to ℝ^{n}

we call

a conservation law. For open and bounded

Ω \subset ℝ^{n}

with

n - 1

-dimensional submanifold

\partial Ω

ℝ^{n}

we obtain

This is the meaning of a conservation law: the change of the integral of

u (\cdot, t)

over

Ω \subset ℝ^{n}

is equal to the integral of the flux

- F (u (\cdot, t)) \cdot N

through the boundary

\partial Ω

This idea also gives the following cute trick to calculate the surface area of a ball in relation to its volume. Let the volume of the

n

-dimensional unit ball be

ω_{n}

. By scaling, the volume of the ball

B (0, r)

ω_{n} r^{n}

. Let

σ_{n} (r)

denote the area of

∂𝐵 (0, r) \subset ℝ^{n}

. The divergence of

x \mapsto x

n

, so by the divergence theorem we have

The way that the divergence theorem relates an integral over a set to an integral over its boundary can be used to decompose the set into layers. The typical example, and the only relevant example for us, is that a ball

B (0, R)

can be thought of as spheres

∂𝐵 (0, r)

for

0 < r < R

(the origin is measure zero and can be ignored). The layers are the level-sets of a function

ℓ

. For the ball

ℓ (x) = | x |

, which has

| \nabla ℓ | = 1

so the formula below simplifies further. There will also be an exercise that proves this formula for the ball directly from the definition of the submanifold integral.

Proof. Let’s make some additional definitions to simplify the working. The gradient of a function is always normal to its level set pointing in the direction of increase, so $N = | \nabla ℓ |^{- 1} \nabla ℓ$ is the outward pointing unit normal of $\partial Ω_{t}$ . Define $F = 𝑓𝑁$ , a vector valued function. In particular $F \cdot N = 𝑓𝑁 \cdot N = f$ .

Now we can begin. By the product rule for divergence

\begin{array}{l} \nabla \cdot ((T - ℓ) F) & = \nabla (T - ℓ) \cdot F + (T - ℓ) \nabla \cdot F \\ = - \nabla ℓ \cdot 𝑓𝑁 + (T - ℓ) \nabla \cdot F \\ = - f | \nabla ℓ | + (T - ℓ) \nabla \cdot F . \end{array}

Rearranging this and applying the divergence theorem shows

\begin{array}{l} \int_{Ω_{T}} f | \nabla ℓ | d x & = \int_{Ω_{T}} (T - ℓ) \nabla \cdot F d x - \int_{Ω_{T}} \nabla \cdot ((T - ℓ) F) d x \\ = \int_{Ω_{T}} (T - ℓ) \nabla \cdot F d x - \int_{\partial Ω_{T}} (T - ℓ) F \cdot N d x . \end{array}

On the boundary of $Ω_{T}$ the function $ℓ = T$ , since the set $Ω_{T} = {ℓ < T}$ by definition. Hence $T - ℓ = 0$ on $\partial Ω_{T}$ and the second term on the right is zero. The next step is a ‘magic trick’:

\begin{array}{l} \int_{Ω_{T}} [T - ℓ (x)] \nabla \cdot F (x) d x & = \int_{Ω_{T}} [\int_{ℓ (x)}^{T} 1 d t] \nabla \cdot F d x = \int_{Ω_{T}} \int_{ℓ (x)}^{T} \nabla \cdot F d t d x . \end{array}

We want to apply Fubini’s theorem to change the order of this double integral. But the limits of the inner integral depends on the variable of the outer integral, so first we use an indicator function to make them independent

\begin{array}{l} \int_{Ω_{T}} \int_{ℓ (x)}^{T} \nabla \cdot F d t d x & = \int_{Ω_{T}} \int_{0}^{T} χ_{{t \geq ℓ (x)}} \nabla \cdot F d t d x = \int_{0}^{T} \int_{Ω_{T}} χ_{{t \geq ℓ (x)}} \nabla \cdot F d x d t \\ = \int_{0}^{T} \int_{Ω_{t}} \nabla \cdot F d x d t . \end{array}

We apply the divergence theorem one more time to get the result.

\int_{0}^{T} \int_{Ω_{t}} \nabla \cdot F d x d t = \int_{0}^{T} \int_{\partial Ω_{t}} F \cdot N d σ d t = \int_{0}^{T} \int_{\partial Ω_{t}} f d σ d t . □

2.4 Distributions

For the transport equation we developed a solution that also seems to make sense when it is not differentiable. For the scalar conservation law we saw that there were in some situations no solutions, except if we generalised the notion of solution to include discontinuous functions. The lesson we draw from these examples is that the existence and uniqueness of solutions depends on the notion of solution we use. In order to say that these solutions solve the PDE, clearly all partial derivatives of a solution which occur in the partial differential equation have to exist. The trick is to come up with a new notion of partial derivative and interpret the PDE to be about these new derivatives.

In this section we introduce distributions (also called generalised functions) and a corresponding notion of differentiation. This notion is ‘backwards compatible’: if a differentiable function is considered as a distribution, the two types of derivatives are equal. Remarkably distributions can always be differentiated and indeed they can be differentiated infinitely many times. For this achievement we have to pay a price: these distributions cannot be multiplied with each other in general. Linear partial differential equations extend to well defined equations on such distributions. Distributions solving the linear partial differential equations are called weak solutions or solutions in the sense of distributions. There exist other notions of weak solutions which also apply to non-linear partial differential equations. The most prominent example is the notion of a Sobolev function, which are introduced in the course “Partial Differential Equations”, the sequel to this course. But Sobolev functions can be understood as a special type of distribution, so even if one is interested in Sobolev functions it is helpful to start with distributions.

First we need to define a special class of very well behaved functions. The support

supp f

of a function

f

is the closure of

{x ∣ f (x) \neq 0}

. On an open set

Ω \subseteq ℝ^{n}

let

C_{0}^{\infty} (Ω)

denote the algebra of smooth functions whose support is a compact subset of

Ω

. We call these test functions and say they have compact support in

Ω

, symbolically

supp f ⋐ Ω

Within the set of test functions there are a special families that we will often use called a mollifier or approximate identity. This is a family of non-negative test functions

{(λ_{𝜖})}_{𝜖 > 0}

with

supp λ_{𝜖} = \bar{B (0, 𝜖)}

and

\int λ_{𝜖} d μ = 1

. We construct a prototype: the function

is a smooth function on

ℝ^{n}

, has support

\bar{B (0, 1)}

, and is non-negative. By the way, this example shows that test functions actually exist. We can choose the constant

C

such that its integral is

1

. By rescaling

x

and

λ

we obtain

which has the required properties. This particular example of a mollifier is called the standard mollifier, but for our purposes it does not matter which mollifier we use. Any such family is called an approximate identity because of the following property. Take any continuous function

f

Ω

and suppose

0 \in Ω

. By continuity

f

is approximately equal to

f (0)

on a sufficiently small ball

B (0, 𝜖)

. Therefore

In fact, as we will prove in the next lemma, this approximation becomes an equality in the limit

𝜖 ↓ 0

Proof. Choose a compact subset of $Ω$ . There is an $𝜖$ such that for any point $x$ in the compact set the ball $B (x, 𝜖)$ lies in $Ω$ . For this $𝜖$ or smaller we have

| f_{𝜖} (x) - f (x) | = | \int_{Ω} λ_{𝜖} (x - y) (f (y) - f (x)) dn y | \leq \sup_{y \in B (x, 𝜖)} | f (y) - f (x) | .

On compact sets, continuous functions are uniformly continuous. This shows the uniform convergence $\lim_{𝜖↓ 0} f_{𝜖} = f$ .

Observe that if $f$ is smooth, then we can compute the derivatives of $f_{𝜖}$ in the following way. Choose any point $x_{0} \in Ω$ and let $𝜖$ be small enough that $B (x_{0}, 2 𝜖) \subset Ω$ . Then for all points $x \in B (x_{0}, 𝜖)$

f_{𝜖} (x) = \int_{B (x, 𝜖)} f (y) λ_{𝜖} (x - y) dn y = \int_{B (0, 𝜖)} f (x - z) λ_{𝜖} (z) dn z .

Therefore $\partial^{α} f_{𝜖} = {(\partial^{α} f)}_{𝜖}$ and the same convergence argument carries over to all partial derivatives of $f$ . □

The formula we see in the definition of

f_{𝜖}

turns out to be useful. We use it to define a type of product operator on

C_{0}^{\infty} (ℝ^{n})

: the convolution

This product is bilinear, commutative, and associative (Exercise). One advantage of the convolution compared to pointwise multiplication is that it behaves nicely with differentiation. There is no Leibniz rule, rather

Furthermore convolution is well-behaved with respect to integral norms, which is useful in more advanced theory. We can consider the simplest case, where integral of

f * g

is the product of the integrals of

f

and

g

. This follows by noticing that the coordinate transformation

z = y - x, y = y

is volume preserving, thus

Proof. The proof is just a sequence of coordinate transformations. We begin with the definition and make the euclidean motion $y = 𝑂𝑧 + b$

(f * g) (a + b + 𝑂𝑥) = \int_{ℝ^{n}} f (a + b + 𝑂𝑥 - y) g (y) d y = \int_{ℝ^{n}} f (a + O (x - z)) g (b + 𝑂𝑧) d z .

It is important to see here that $d y = d z$ since $O$ is orthogonal. Now we use the orthogonal properties of $f$ and $g$ to continue

= \int_{ℝ^{n}} f (a + x - z) g (b + z) d z = \int_{ℝ^{n}} f (a + x - y^{'} + b) g (y^{'}) d y^{'} = (f * g) (a + b + x) . □

Now it is time to introduce distributions. We have seen in the previous lemma that the operation of integrating a continuous function against a test function somehow retains the information of the function. In this spirit each

f \in L 1_{loc} (Ω)

defines a linear map

We will see that the information of

f

is also retained in this linear form. The idea of distributions is consider not just functions integrated against test functions, but all linear forms

F

acting on

𝒞_{0}^{\infty} (Ω)

There is a technical matter to discuss at this point. The set of test functions should be given a different topology than the norm topology of

C^{\infty} (Ω)

with the supremum norm, but this other topology is tricky and only used in a few places in this course. We use the notation

𝒟 (Ω)

for the set of test functions equipped with this topology. Instead of explaining the topology in full detail, let us give the criterion for when a sequence of test functions converges. We define for any compact subset

K \subset Ω

and every multi-index

α

the following seminorm:

We say that

f_{n} \to f

𝒟 (Ω)

if there is a compact subset

K \subset Ω

such that the supports of every

f_{n}

and

f

are contained in

K

and that

∥ f_{n} - f ∥_{K, α}

for every multi-index

α

(including

α = 0

). This is a much stronger condition that convergence with respect to

∥ \cdot ∥_{\infty} = ∥ \cdot ∥_{Ω, 0}

The

𝒟^{'}

for distributions indicates (for the correctly defined topology) that they are the dual space of

𝒟

. Concretely the continuity condition yields the following convergence property for distributions: if

ϕ_{n} \to ϕ

𝒟 (Ω)

then the values

F (ϕ_{n})

converges to

F (ϕ)

. Similarly, a sequence of distribution

F_{n}

converges to

F

F_{n} (ϕ) \to F (ϕ)

for all test functions

ϕ

As previously mentioned, any

f \in L 1_{loc} (Ω)

defines in a canonical way a distribution

F_{f}

. Let us verify now that it really meets the definition of distribution. For any compact subset

K \subset Ω

and

ϕ \in 𝒟 (Ω)

with support

K

we have

Let us give another example of a distribution, one that does not correspond to an element of

L 1_{loc} (ℝ^{n})

Intuitively (and we will prove rigorously soon) any corresponding

f \in L 1_{loc} (ℝ^{n})

would vanish on

ℝ^{n} ∖ {0}

and would have a total integral one. Since

{0}

has measure zero such a function does not exist. Distributions that come from

L 1_{loc} (Ω)

functions are called regular, and those that don’t are non-regular. This distribution is called Dirac’s

δ

-function. We can also show that it is the limit of the sequence of distributions corresponding to the mollifier

λ_{𝜖}

We now return to the question of whether the distribution

F_{f}

retains the information of

f

. The answer is yes.

Proof. It suffices to prove the local statement for $f \in L 1 (Ω)$ . We extend $f$ to $ℝ^{n}$ by setting $f$ on $ℝ^{n} ∖ Ω$ equal to zero. The extended function is also denoted by $f$ and belongs to $f \in L 1 (ℝ^{n})$ . For a mollifier ${(λ_{𝜖})}_{𝜖 > 0}$ we have

\begin{array}{l} ∥ λ_{𝜖} * f - f ∥_{1} & = \int_{ℝ^{n}} | \int_{B (0, 𝜖)} λ_{𝜖} (y) f (x - y) dn y - f (x) | dn x \\ \leq \int_{B (0, 𝜖)} \int_{ℝ^{n}} λ_{𝜖} (y) | f (x - y) - f (x) | dn x dn y \leq \sup_{y \in B (0, 𝜖)} ∥ f (\cdot - y) - f ∥_{1} . \end{array}

If $f$ is the characteristic function of a rectangle, then the supremum on the right hand side converges to zero for $𝜖 ↓ 0$ . Due to the triangle inequality the same holds for step functions, i.e. finite linear combinations of such functions. Since step functions are dense in $L 1 (ℝ^{n})$ for each $f \in L 1 (ℝ^{n})$ this supremum becomes arbitrary small for sufficiently small $𝜖$ . Hence the family of functions ${(λ_{𝜖} * f)}_{𝜖 > 0}$ converges in $L 1 (ℝ^{n})$ in the limit $𝜖 ↓ 0$ to $f$ .

Moreover, the functions $λ_{𝜖} * f$ are non-negative. This is because the mollifiers are non-negative and we can write the convolution as the action of $F_{f}$ on a test function

(λ_{𝜖} * f) (x) = \int_{ℝ^{n}} λ_{𝜖} (x - y) f (y) dn y = F_{f} (λ_{𝜖} (x - \cdot)) \geq 0

using the assumption on $F_{f}$ .

So it remains to show that a limit in $L 1$ of a sequence of non-negative functions is also non-negative. In particular there exists a sequence ${(𝜖_{n})}_{n \in ℕ}$ which converges to zero, with $∥ f_{n} - f ∥_{1} \leq 2^{- n}$ for all $n \in ℕ$ for $f_{n} = λ_{𝜖_{n}} * f$ . This ensures that the series $\sum_{n \in ℕ} | f_{n} - f |$ converges in $L 1 (ℝ^{n})$ . So for almost every point $x$ the series $\sum_{n \in ℕ} | f_{n} (x) - f (x) |$ is finite, and in particular the tail of the series converges to zero. In other words $\lim_{n \to \infty} f_{n} (x) = f (x)$ . This indeed shows that $f$ is a.e. non-negative.

In particular, if $f$ belongs to the kernel of $f \mapsto F_{f}$ , then both $f$ and $- f$ are almost everywhere non-negative. So $f$ vanishes almost everywhere. □

Two definitions for functions carry over naturally to distributions. If

Ω^{'} \subset Ω

then every test function on

Ω^{'}

extends to a test function on

Ω

. In this way we can think of any distribution on

Ω

as a distribution on

Ω^{'}

, which we call the restriction. For regular distributions, this is really the restriction of functions. Using restriction we can give a definition of support. The complement of the support of a distribution is the union of all sets on which the restriction vanishes. In symbols

The support of the delta distribution is

{0}

, and the support of the distribution of a continuous function is its support in the normal sense.

We want to define as many operations on distributions as possible, in such a way that they extend operations on functions. Restriction and support are two examples where this is clear. The general strategy for making such definitions is to compare

F_{f}

F_{𝐴𝑓}

where

A

is the operation. If we can write the relation in a way that only depends on the distribution and not directly on the function, then it is suitable to make a generalised definitions. Let us consider the case of multiplication by a smooth function

g \in C^{\infty} (Ω)

. Then for a regular distribution

The product of a distribution with a function

g \in C^{\infty} (Ω)

is defined as

This product makes the embedding

C^{\infty} (Ω) ↪ 𝒟^{'} (Ω)

to a homomorphism of modules over the algebra

C^{\infty} (Ω)

. However, the product of a distribution with a non-smooth functions is not defined because then

𝑔𝜙

is not a test function.

So we come to the most important operation on distributions. If

f

has a derivative, then by integration by parts we obtain

Consequently for any distribution

F \in 𝒟^{'} (Ω)

we define the partial derivatives as

Here we see the advantage of choosing smooth test functions: test functions are always differentiable and so distributions have infinitely many derivatives. These two operations we have just defined, multiplication with a smooth function and partial differentiation, define new distributions. Clearly these new distributions are linear. We should check that they also obey the continuity condition, but we will skip this formality.

We also want to extend convolution to distributions. In order to extend it to a product between a smooth function and a distribution we calculate:

where

(𝖯 g) (z) : = g (- z)

is the point-reflection operator. Therefore we define for

g \in C_{0}^{\infty} (ℝ^{n})

and

F \in 𝒟^{'} (ℝ^{n})

Not only is this a well-defined distribution, the result of convolution is in fact always a regular distribution that corresponds to a smooth function!

Proof. First we show that the function defined in the lemma exists and is smooth. The support of $𝖳_{x} 𝖯 g$ is ${y \in ℝ^{n} ∣ x - y \in supp (g)} = x - supp (g)$ . Hence for every $x$ the value $F (𝖳_{x} 𝖯 g)$ is well defined for $F \in 𝒟^{'} (Ω)$ . Since continuous functions are uniformly continuous on compact sets, the map $x \mapsto 𝖳_{x} 𝖯 g$ is continuous with respect to the seminorms $∥ \cdot ∥_{K, 0}$ . Furthermore, the same holds for the seminorms $∥ \cdot ∥_{K, α}$ since $\frac{𝖳_{x + 𝜖h} - 𝖳_{x}}{𝜖} g = 𝖳_{x} \frac{𝖳_{𝜖h} - 1 l}{𝜖} g$ converges in the limit $𝜖 \to 0$ for all $g \in C_{0}^{\infty} (ℝ^{n})$ uniformly on $ℝ^{n}$ to $𝖳_{x} (\sum_{i = 1}^{n} - h_{i} \partial_{i} g)$ . This shows $x \mapsto F (𝖳_{x} 𝖯 g) \in C^{\infty} (ℝ^{n})$ for $F \in 𝒟^{'} (ℝ^{n})$ .

Next we show this smooth function corresponds to the distribution $g * F$ we defined immediately before the lemma. For any $ϕ \in 𝒟 (ℝ^{n})$ appropriate Riemann sums define a sequence of finite linear combinations of functions in ${𝖳_{x} 𝖯 g \in C_{0}^{\infty} (ℝ^{n}) ∣ x \in supp (ϕ)}$ , which converges with respect to $∥ \cdot ∥_{K, α}$ to $\int_{ℝ^{n}} 𝖳_{x} 𝖯 𝑔𝜙 (x) dn x$ . Hence the linearity and continuity of $F$ gives

\int_{ℝ^{n}} (g * F) (x) ϕ (x) dn x = \int_{ℝ^{n}} F (𝖳_{x} 𝖯 g) ϕ (x) dn x = F (\int_{ℝ^{n}} 𝖳_{x} 𝖯 𝑔𝜙 (x) dn x) = F (𝖯 g * ϕ) .

Finally, we consider the support. If $F (𝖳_{x} 𝖯 g) \neq 0$ , then $g (x - y) \neq 0$ for an element $y \in supp F$ . Hence $x = y + (x - y) \subset supp F + supp g$ and $supp (x \mapsto F (𝖳_{x} 𝖯 g)) \subset supp F + supp g$ . □

This Lemma implies that even the convolution of a distribution

F \in 𝒟^{'} (ℝ^{n})

with a distribution

G \in 𝒟^{'} (ℝ^{n})

with compact support

supp G

is a well defined distribution:

In particular, we can convolve any distribution with the

δ

-distribution. Remarkably this returns the same distribution, i.e.

F * δ = F

(Exercise). We say that

δ

is the identity element or neutral element of convolution.

Further details of the theory of distributions can be found in the short and lucid first chapter of the book of Lars Hörmander: “Linear Partial Differential Operators”.

Chapter 2
General Concepts

2.1 Types of Second Order PDEs

2.2 The Questions

2.3 Divergence Theorem

2.4 Distributions

Chapter 2General Concepts

2.1 Types of Second Order PDEs

2.2 The Questions

2.3 Divergence Theorem

2.4 Distributions

Chapter 2
General Concepts