For PDEs of order greater than one, there does not exist a general theory. Over time, different methods have been discovered to solve several PDEs, in particular those PDEs which show up in physics. Afterwards these methods were extended to larger and larger classes of PDEs. It turned out that the successful methods of solving PDEs differ from each other substantially. As a result there does not exist one unified theory of PDEs. Instead there exist several islands of well understood families of PDEs inside the large set of all PDEs. It was Jacobi who formulated the following general recipe in his lectures on Dynamics in the years 1842-43:
"The main obstacle for the integration of a given differential equation lies in the definition of adapted variables, for which there is no general rule. For this reason we should reverse the direction of our investigation and should endeavour to find, for a successful substitution, other problems which might be solved by the same."
The strategy is to determine, for any successful method, all PDEs which can be solved by this method. We have seen that the method of characteristics is a more-or-less general method to solve first order PDEs. Now we investigate second order PDEs. In this lecture we consider only second order linear PDEs. A general second order linear PDE has the following form
$$\sum_{i,j=1}^n a_{ij}(x)\frac{\partial^2 u(x)}{\partial x_i\,\partial x_j}+\sum_{i=1}^n b_i(x)\frac{\partial u(x)}{\partial x_i}+c(x)u(x)=f(x).$$
By Schwarz's theorem, for twice continuously differentiable $u$ this expression does not change if we replace $a_{ij}$ by $\frac12(a_{ij}+a_{ji})$. So we always assume that the matrix $A=(a_{ij})$ is symmetric.
It is a theorem of linear algebra that a real symmetric matrix can be diagonalised (even by an orthogonal matrix) and that its eigenvalues are all real. We use the signs of these eigenvalues for our classification.
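Definition 2.1. On an open domain $\Omega\subset\mathbb{R}^n$ a partial differential operator
$$L=\sum_{i,j=1}^n a_{ij}(x)\frac{\partial^2}{\partial x_i\,\partial x_j}+\sum_{i=1}^n b_i(x)\frac{\partial}{\partial x_i}+c(x)$$
with symmetric coefficients $a_{ij}=a_{ji}$ is called
elliptic, if for all $x\in\Omega$ all eigenvalues of $A(x)=(a_{ij}(x))$ are non-zero and have the same sign;
parabolic, if for all $x\in\Omega$ exactly one eigenvalue of $A(x)$ vanishes and all other eigenvalues have the same sign;
hyperbolic, if for all $x\in\Omega$ all eigenvalues of $A(x)$ are non-zero and exactly one of them has the opposite sign to all the others.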
This is not a complete classification, in the sense that there are PDEs which are of none of these three types. For example, the matrix $A$ of a PDE may have several positive, several zero, and several negative eigenvalues. If $A$ is not constant, then of course the eigenvalues are functions of $x$ and may be different at different points. An operator might only be elliptic on a subdomain of $\Omega$. Moreover, this definition only applies to second order linear PDEs. Nonetheless we will concentrate on PDEs of these three types; this is an introductory course.
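As a small illustration of the eigenvalue-sign classification, here is a sketch in Python (assuming NumPy is available) that classifies a constant symmetric coefficient matrix $A$. The function name `classify`, the tolerance `tol`, and the exact parabolic rule used here (exactly one zero eigenvalue, all others of one sign) are our own choices for illustration; for variable coefficients one would apply it at each point $x$.

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify a constant symmetric coefficient matrix A by its eigenvalue signs."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("the coefficient matrix must be symmetric")
    eigenvalues = np.linalg.eigvalsh(A)   # real eigenvalues of a symmetric matrix
    n = len(eigenvalues)
    pos = int(np.sum(eigenvalues > tol))
    neg = int(np.sum(eigenvalues < -tol))
    zero = n - pos - neg
    if pos == n or neg == n:
        return "elliptic"
    if zero == 1 and (pos == n - 1 or neg == n - 1):
        return "parabolic"
    if zero == 0 and (pos == 1 or neg == 1):
        return "hyperbolic"
    return "none of the three types"

# Variables ordered as (x1, x2, t):
print(classify(np.eye(3)))                   # -> elliptic   (Laplace equation)
print(classify(np.diag([1.0, 1.0, 0.0])))    # -> parabolic  (heat equation u_t = u_x1x1 + u_x2x2)
print(classify(np.diag([-1.0, -1.0, 1.0])))  # -> hyperbolic (wave equation u_tt - u_x1x1 - u_x2x2 = 0)
```

A matrix such as $\operatorname{diag}(1,1,-1,-1)$ falls outside all three classes, matching the remark that the classification is not complete.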
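Elliptic PDEs. As we have just stated, for an elliptic PDE the eigenvalues of $A(x)$ all have the same sign. Any solution of $Lu=f$ is also a solution of $(-L)u=-f$, so we can always arrange for the eigenvalues to be positive. There is an equivalent condition that is more often taken as the definition of ellipticity: the operator $L$ is elliptic if and only if
$$\xi^TA(x)\xi=\sum_{i,j=1}^n a_{ij}(x)\xi_i\xi_j>0\quad\text{for all }x\in\Omega\text{ and all }\xi\in\mathbb{R}^n\setminus\{0\}.\tag{2.1}$$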
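To see this equivalence, fix $x$ and write $A(x)=O^TDO$ for an orthogonal matrix $O$ and the diagonal matrix $D=\operatorname{diag}(\lambda_1,\dots,\lambda_n)$ of eigenvalues. Observe
$$\xi^TA(x)\xi=(O\xi)^TD(O\xi)\quad\text{and}\quad O\xi\neq0\iff\xi\neq0,$$
so $A(x)$ satisfies (2.1) if and only if $D$ does. Now let $e_i$ be the $i$th standard basis vector. This yields $e_i^TDe_i=\lambda_i$, so (2.1) for $D$ forces every $\lambda_i>0$. Conversely, if all $\lambda_i>0$, then $\eta^TD\eta=\sum_i\lambda_i\eta_i^2>0$ for every $\eta\neq0$. Hence (2.1) holds if and only if all the eigenvalues are positive.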
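The equivalence can also be observed numerically. The following sketch (Python with NumPy; the helper name `min_quadratic_form` and the random sampling approach are illustrative choices, not part of the text) builds $A=O^TDO$ from a random orthogonal $O$ and checks that the smallest value of $\xi^TA\xi$ over many sampled unit vectors lies close to the smallest eigenvalue, so it is positive exactly when all eigenvalues are.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_quadratic_form(A, samples=200_000):
    """Smallest sampled value of xi^T A xi over random unit vectors xi."""
    n = A.shape[0]
    xi = rng.standard_normal((samples, n))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)   # project onto the unit sphere
    return float(np.min(np.einsum("si,ij,sj->s", xi, A, xi)))

# A random orthogonal matrix O from a QR decomposition
O, _ = np.linalg.qr(rng.standard_normal((3, 3)))

A_pos = O.T @ np.diag([0.5, 2.0, 3.0]) @ O    # all eigenvalues positive
A_ind = O.T @ np.diag([-1.0, 2.0, 3.0]) @ O   # one negative eigenvalue

print(min_quadratic_form(A_pos))  # close to the smallest eigenvalue 0.5
print(min_quadratic_form(A_ind))  # close to -1.0, hence negative
```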
Now we consider some concrete examples. If the matrix $A$ is the identity matrix and $b=0$, $c=0$, $f=0$, then this is the
Laplace equation. 
$$\Delta u=\sum_{i=1}^n\frac{\partial^2u}{\partial x_i^2}=0.$$
Solutions of the Laplace equation are called harmonic functions. In Chapter 3 we present several tools which establish many properties of these harmonic functions. It turns out that many properties of harmonic functions also apply to general solutions of $Lu=0$, if the matrix $A$ is positive (or negative) definite. A lot of work has been done to extend these tools to larger and larger classes of PDEs. One of the results is that the influence of the highest order derivatives on the properties of solutions is much more important than the influence of the lower order derivatives. We offer another lecture which presents many of these tools for such elliptic second order PDEs.
There are also non-linear PDEs to which these methods of elliptic PDEs apply. An important example whose investigation played a major role in the development of the elliptic theory is the
Minimal surface equation. 
$$\sum_{i,j=1}^n\left(\big(1+|\nabla u|^2\big)\delta_{ij}-\frac{\partial u}{\partial x_i}\frac{\partial u}{\partial x_j}\right)\frac{\partial^2u}{\partial x_i\,\partial x_j}=0,\qquad u:\Omega\to\mathbb{R},\ \Omega\subset\mathbb{R}^n\text{ open}.$$
The graphs of solutions describe so-called minimal surfaces. The area of such hypersurfaces in $\mathbb{R}^{n+1}$ does not change under infinitesimal variations. Soap films spanning a wire frame are examples of such minimal surfaces. The boundary value problem of the minimal surface equation is called Plateau's problem. For the first proof of the existence of solutions of this Plateau problem in the 1930s, Jesse Douglas received one of the first two Fields Medals (awarded in 1936). In this non-linear second order PDE the coefficients of the second derivatives also depend on the solution. A lot of work has been done to extend the tools of elliptic theory to elliptic PDEs whose coefficients belong to larger and larger function spaces. This development led to the introduction of many new function spaces. In Section 2.4 we shall introduce the so-called space of distributions. Many of the more advanced function spaces are special subsets of the distributions.
Parabolic PDEs. For these linear PDEs the matrix $A$, considered as a symmetric bilinear form, is only semi-definite, and they belong to the boundary of the class of elliptic PDEs. Most of the methods for elliptic PDEs have an extension to this limiting case. So these limiting cases together with the class of elliptic PDEs form an extended class of elliptic PDEs. In spite of the deep relationship to the elliptic PDEs these equations have their own label. The simplest example is the
Heat equation. 
$$\frac{\partial u}{\partial t}(x,t)=\Delta u(x,t)=\sum_{i=1}^n\frac{\partial^2u}{\partial x_i^2}(x,t).$$
These parabolic PDEs describe diffusion processes. These are processes which level out inhomogeneities of some quantity by a flow along the negative gradient of the quantity. A typical example of such a quantity is the temperature, from which the name of the heat equation originates. Many stochastic processes have this property, so the theory of parabolic PDEs has a deep relationship to the theory of stochastic processes. In Chapter 4 of this lecture we present this simplest example of a linear parabolic PDE. We shall see how the tools for the Laplace equation can be applied in modified form to the heat equation. For parabolic PDEs there also exists a non-linear example from geometric analysis, whose investigation played a major role in the development of the elliptic theory (the tensor fields $g_{ij}$ and $R_{ij}$ are defined below):
Ricci Flow. 
$$\frac{\partial g_{ij}}{\partial t}=-2R_{ij}.$$
This PDE describes a diffusion-like process on Riemannian manifolds. It levels out the inhomogeneities of the metric, namely the Riemannian metric $g_{ij}$. In the long run the corresponding Riemannian manifolds converge to metric spaces with large symmetry groups. Richard Hamilton proposed (in the 1980s) a program that aims to prove Thurston's geometrization conjecture with the help of this PDE. The conjecture states that every closed three-dimensional manifold can be cut into pieces, each of which carries a Riemannian metric whose lift to the universal cover has a transitive isometry group. This conjecture implies the Poincaré conjecture, which states that every simply connected closed three-dimensional manifold is homeomorphic to the 3-sphere. Hamilton's idea was to control the long time limit of the Ricci flow on a general three-dimensional Riemannian manifold. In 2002 and 2003 the Russian mathematician Grisha Perelman published on the internet three articles which overcame the last obstacles of this program. This led to the first solution of one of the Millennium Prize Problems of the Clay Mathematics Institute and was a great success of geometric analysis.
Hyperbolic PDEs. Besides the elliptic PDEs (including the limiting cases), the other important class of linear PDEs is the class of hyperbolic PDEs. In this case the matrix $A$ has one eigenvalue whose sign is opposite to that of all other eigenvalues. The simplest example is the
Wave equation. 
$$\frac{\partial^2u}{\partial t^2}(x,t)-\Delta u(x,t)=0.$$
In Chapter 5 we present methods to solve this equation. We shall see that it describes the propagation of waves with constant finite speed. The solutions of general hyperbolic equations are similar to the solutions in this case, and many tools can be generalised to all hyperbolic PDEs. The investigation of these PDEs depends on the understanding of all trajectories which propagate with the given speed. It was motivated by the theory of electrodynamic fields, whose main system of PDEs are the
Maxwell equations. 
$$\nabla\cdot E=\rho,\qquad\nabla\times B-\frac{\partial E}{\partial t}=j,\qquad\nabla\cdot B=0,\qquad\nabla\times E+\frac{\partial B}{\partial t}=0\qquad\text{(in suitable units)}.$$
In this theory a distribution of charges $\rho$ and currents $j$ on space-time $\mathbb{R}^3\times\mathbb{R}$ is given. The unknown functions are the electric and magnetic fields $E$ and $B$, which describe the electrodynamic forces induced by the given distributions of charges and currents $\rho$ and $j$. The conservation of charge is formulated in the same way as the scalar conservation law: the change of the total charge contained in a spatial domain is described by the flux of the current through the boundary of the domain. By the divergence theorem this means that the distributions of charge $\rho$ and currents $j$ obey
$$\frac{\partial\rho}{\partial t}+\nabla\cdot j=0.$$
Again there exists a non-linear version which stimulated the development of the theory:
Einstein's field equations of general relativity.
Here, for a given distribution of masses, the energy-stress tensor $T_{ij}$ and the space-time metric $g_{ij}$ are the unknown functions. This metric is a symmetric bilinear form with one positive and three negative eigenvalues on the tangent space of space-time. The corresponding Ricci curvature is denoted by $R_{ij}$ and the scalar curvature by $R$:
$$R_{ij}-\frac12R\,g_{ij}=\frac{8\pi G}{c^4}T_{ij},$$
where $G$ is the gravitational constant and $c$ the speed of light.
Integrable Systems with Lax operators. Finally I want to mention a smaller class of PDEs, which are the main objects of my research. They are non-linear PDEs which describe a very stable evolution in time. This means that the solutions have, in a specific sense, a maximal number of conserved quantities. The theory of integrable systems belongs to the field of Hamiltonian mechanics, which originated from Newton's description of the motion of the planets. The Scottish engineer John Scott Russell got very excited in 1834 about the observation of a solitary wave in a Scottish canal and published a "Report on Waves". This report was quite influential. The two Dutch mathematicians Korteweg and de Vries translated his observation into a PDE describing the profile of water waves travelling along the canal:
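Korteweg-de-Vries equation. 
$$\frac{\partial u}{\partial t}=6u\frac{\partial u}{\partial x}-\frac{\partial^3u}{\partial x^3}.$$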
First by numerical experiments in the 1960s with early computers, and later by mathematical theory, the solutions of this PDE were shown to have exactly the properties which made Russell so excited: they describe waves which propagate through each other without changing their shape. This led to the discovery of a hidden relation between the theory of integrable systems and the theory of Riemann surfaces, which is another field with a long history. A major step towards the discovery of this relation was the observation of Peter Lax that this equation can be written as
$$\frac{\partial L}{\partial t}=[P,L]=PL-LP,\qquad L=-\frac{\partial^2}{\partial x^2}+u,\qquad P=-4\frac{\partial^3}{\partial x^3}+6u\frac{\partial}{\partial x}+3\frac{\partial u}{\partial x}.$$
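The shape-preserving waves mentioned above can be checked numerically. The sketch below (Python with NumPy) assumes the normalisation $u_t=6uu_x-u_{xxx}$ of the Korteweg-de Vries equation and the travelling-wave formula $u(x,t)=-\frac c2\operatorname{sech}^2\big(\frac{\sqrt c}2(x-ct)\big)$, a standard one-soliton solution for this sign convention. It evaluates the residual of the equation with central finite differences; the function names and step size `h` are illustrative choices.

```python
import numpy as np

def soliton(x, t, c=1.0):
    """One-soliton u(x,t) = -(c/2) sech^2(sqrt(c)/2 (x - c t)), speed c > 0."""
    return -0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t)) ** 2

def kdv_residual(x, t, c=1.0, h=1e-3):
    """Finite-difference residual of u_t - (6 u u_x - u_xxx) for the soliton."""
    u = lambda xx, tt: soliton(xx, tt, c)
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h**3)
    return u_t - (6 * u(x, t) * u_x - u_xxx)

x = np.linspace(-10.0, 10.0, 401)
print(np.max(np.abs(kdv_residual(x, 0.7, 2.0))))  # small, limited by the step h
# The profile is only translated: u(x + c t, t) equals u(x, 0).
print(np.allclose(soliton(x + 2.0 * 0.7, 0.7, 2.0), soliton(x, 0.0, 2.0)))  # True
```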
In addition to the types of PDEs we will study, let us explain some of the questions that we are interested in answering. Broadly speaking they are existence, uniqueness, and regularity.
The first two you probably have some experience with, so let us begin with the third. The regularity of a solution of a differential equation refers to its smoothness and related properties. Most often this is its differentiability, for example being twice continuously differentiable. But it can also be about boundedness, integrability, or the growth or decay rate of the function. These regularity properties are usually expressed in the form of a function space, e.g. $C^2(\Omega)$.
Many types of regularity can be ordered in a hierarchy. Later in this chapter we will introduce distributions; we say they have the lowest regularity. Then come the measurable functions. Some measurable functions are locally integrable, $f\in L^1_{\mathrm{loc}}(\Omega)$. A function $f$ belongs to $L^1_{\mathrm{loc}}(\Omega)$ if $|f|$ has a finite integral over every compact subset of its domain. You might also know the Lebesgue spaces $L^p$. The spaces $L^p_{\mathrm{loc}}(\Omega)$ describe ever smaller families of functions, whose regularity increases with $p\in[1,\infty]$. Next come the Sobolev functions in $W^{k,p}(\Omega)$, whose partial derivatives up to order $k$ belong to $L^p(\Omega)$. The regularity further increases for the functions in $C^k(\Omega)$. Finally we end with the smooth functions $C^\infty(\Omega)$ and the analytic functions $C^\omega(\Omega)$, which have the highest regularity.
Consider the analogy to the algebraic equation $x^2=2$. This has no solutions $x\in\mathbb{Q}$ but two solutions $x=\pm\sqrt2\in\mathbb{R}$. Likewise, the number of solutions to a PDE can change depending on which functions we are considering. To be concrete, perhaps there are many solutions, but no bounded solutions. There is usually a natural level of regularity to require of a solution: the solution to a second order PDE should be twice differentiable. But as we have seen in the previous chapter, it is sometimes necessary to change the meaning of "solution" and consider "non-differentiable solutions" (weak solutions) of a PDE, even if that sounds like a contradiction.
Sometimes allowing lower regularity doesn't increase the number of solutions of a differential equation. A classic example from ODE theory is $f'=f$. Suppose $f$ is a differentiable solution to this equation. Then $f'=f$ tells us that $f'$ is equal to a differentiable function. That means that $f'$ is differentiable, i.e. that $f$ is twice differentiable. Repeating this argument, we see that $f$ is infinitely many times differentiable (smooth). We say that the solution has higher regularity than expected. The Laplace and heat equations both exhibit highly regular solutions.
A problem is a differential equation on some domain together with some additional (non-PDE) conditions. A typical ODE has infinitely many solutions, which form a finite-dimensional family, whereas a typical partial differential equation has an infinite-dimensional space of solutions. The idea is to give the right additional conditions so that the problem has a unique solution. A solution of an ordinary differential equation of $n$-th order is in many cases uniquely determined by fixing the values of the solution and its first $n-1$ derivatives at a point $t_0$. For partial differential equations the solutions are functions on higher dimensional domains $\Omega\subset\mathbb{R}^n$. A natural condition is the specification of the values of the solution and some of its derivatives on the boundary of the domain or on a hypersurface within the domain. The problems of finding solutions which obey such further specifications are called boundary value problems. When one of the variables represents time and we give conditions at $t=0$, naturally we call this an initial value problem.
We are most interested in problems that are well-posed. This means that (1) the problem has a solution, (2) the solution is unique, and (3) the solution depends continuously on the data. We have to balance the choice of regularity and the amount of data: we need enough data that the solution is unique, but not so much that no solution exists. If we determine all possible boundary values that have solutions, then the space of solutions is completely parameterised. Again to give an ODE analogy, the solutions of $f'=f$ are $f(x)=f(0)e^x$, so $f(0)\in\mathbb{R}$ parameterises the solutions.
Finally, there is the question of existence. This is perhaps the most fundamental question, because it is about the definition of "a solution". We have already mentioned how this is affected by regularity (including weak solutions) and boundary value conditions. But proving the existence of solutions of PDEs is in general much more difficult than for ODEs, and there are only a few general theorems. In fact, there is a famous example of a simple-looking PDE that does not have any solutions, not even locally. This example is a simplification by Nirenberg of an example of H. Lewy: there is no solution on any open subset of
where the right-hand side is a specially constructed smooth function. Notice in particular that this is a first order PDE, with analytic coefficients and a smooth inhomogeneous term. In previous years we gave a proof of this, but it requires certain facts from complex analysis (aka Funktionstheorie) that many students didn't understand. Interested students may ask me for a copy of the old script. Instead we will give a different example of a PDE without a solution in Section 3.5, using techniques that fit the themes of this course.
In this section we present a generalisation of the fundamental theorem of calculus to higher dimensions, namely the divergence theorem. This theorem has many important consequences. In this section we present two: First we generalise partial integration to higher dimensions. Second we explain in which sense the higher dimensional scalar conservation law describes a conserved quantity.
The divergence theorem is a statement about the integral over a submanifold of $\mathbb{R}^n$, so naturally we should first define submanifolds and their integrals.
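Definition 2.2. The graph of a function $g:\Omega\to\mathbb{R}^{n-k}$ on an open set $\Omega\subset\mathbb{R}^k$ is the $k$-dimensional subset
$$\{(x,g(x)):x\in\Omega\}\subset\mathbb{R}^n.$$
We allow reordering of the components of $(x,g(x))$. By this we mean that both $\{(x,g(x)):x\in\Omega\}$ and $\{(g(x),x):x\in\Omega\}$ are graphs, for example.
A subset $M\subset\mathbb{R}^n$ is called a $k$-dimensional submanifold if it is locally a $k$-dimensional graph. That means for every point $p\in M$ there exists an open set $U\subset\mathbb{R}^n$ with $p\in U$ such that $M\cap U$ is a $k$-dimensional graph.
We say that a graph or submanifold is $C^1$ (or has some other regularity) if it is locally the graph of a function $g\in C^1$.
The classic, and for us important, example of a submanifold that is not a graph globally is the circle $S^1=\{(x,y)\in\mathbb{R}^2:x^2+y^2=1\}$. This is because $x\mapsto\pm\sqrt{1-x^2}$ is not a function. However it can be written as the union of four graphs
$$S^1=\{(x,\sqrt{1-x^2})\}\cup\{(x,-\sqrt{1-x^2})\}\cup\{(\sqrt{1-y^2},y)\}\cup\{(-\sqrt{1-y^2},y)\},\qquad x,y\in(-1,1).$$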
For practical calculation it is not always the best idea to reduce a submanifold to graphs. Often a parameterisation can cover more of the submanifold, which means less work.
Definition 2.3. A continuously differentiable injection $\varphi:U\to M$ from an open set $U\subset\mathbb{R}^k$ onto $M\subset\mathbb{R}^n$ is called a parameterisation of $M$. It is called regular if the Jacobian $D\varphi$ has rank $k$ at every point of $U$.
The Jacobian of $\varphi$ is an $n\times k$ matrix, so its rank cannot be greater than $k$. Thus a regular parameterisation is also called full-rank. A graph is a special type of parameterisation, one where $k$ of the components of $\varphi$ are just the input variables. In other words $\varphi(x)=(x,g(x))$, or some rearrangement. This is always a regular parameterisation, because $D\varphi$ contains the $k\times k$ identity matrix. For an example of a non-regular parameterisation, consider the parameterisation of the -axis in . We see that is not really playing any role and the submanifold is only one-dimensional. This is the reason we should consider regular parameterisations.
Definition 2.4. Let $M\subset\mathbb{R}^n$ be a subset with a regular parameterisation $\varphi:U\to M$, $U\subset\mathbb{R}^k$ open, and let $f$ be a continuous function on $M$. We define
$$\int_Mf\,d\sigma:=\int_Uf(\varphi(u))\sqrt{\det\big(D\varphi(u)^TD\varphi(u)\big)}\,du.$$
The symbol $d\sigma$ can be given a formal meaning, but for us it is just a reminder that this is a "submanifold integral" and not an integral over a subset of $\mathbb{R}^n$ in the usual sense. The $k$-dimensional parallelotope spanned by the column vectors of an $n\times k$-matrix $B$ has the volume $\sqrt{\det(B^TB)}$. The motivation for the factor $\sqrt{\det(D\varphi^TD\varphi)}$ in the definition of the integral is that it measures the distortion of the parameterisation. This value turns out to be independent of the choice of regular parameterisation of $M$.
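Proof. Suppose that $M$ is the graph of $g:V\to\mathbb{R}^{n-k}$, so that $\psi(x)=(x,g(x))$, $x\in V$, is a regular parameterisation (without loss of generality we can relabel the coordinates to achieve this form). Suppose that we have another regular parameterisation $\varphi:U\to M$. Then define $h=\psi^{-1}\circ\varphi:U\to V$. We claim that $h$ is continuously differentiable. This is not so clear, because $\psi^{-1}$ is only defined on $M$, not on a euclidean space, and so we can't apply the chain rule directly. However, let $\pi:\mathbb{R}^n\to\mathbb{R}^k$ be the projection $\pi(x,y)=x$. Clearly $\pi\circ\psi=\mathrm{id}_V$. This shows that $\psi^{-1}=\pi|_M$. Therefore $h=\pi\circ\varphi$ is another formula for $h$. Now we can apply the chain rule and conclude that $h$ is continuously differentiable.
Now we can carry out a computation that connects the two integrals. Since $\varphi=\psi\circ h$, the chain rule gives $D\varphi=D\psi(h)\,Dh$, where $Dh$ is a $k\times k$ matrix, which is invertible because $D\varphi$ has rank $k$. Hence
$$\int_Uf(\varphi(u))\sqrt{\det\big(D\varphi^TD\varphi\big)}\,du=\int_Uf(\psi(h(u)))\sqrt{\det\big(D\psi^TD\psi\big)(h(u))}\,|\det Dh(u)|\,du=\int_Vf(\psi(x))\sqrt{\det\big(D\psi^TD\psi\big)(x)}\,dx.$$
In the last step we applied the transformation formula of Jacobi (the change of variables formula). This shows that using any parameterisation gives the same result as using the graph parameterisation. □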
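The definition and the independence of the parameterisation can be tried out numerically. The sketch below (Python with NumPy) evaluates $\int_Mf\,d\sigma$ with the distortion factor $\sqrt{\det(D\varphi^TD\varphi)}$ for two-dimensional rectangular parameter domains, using a midpoint rule and central finite differences for $D\varphi$; these numerical choices and the helper names are ours. It compares two parameterisations of the spherical cap $\{x\in S^2:x_3>\frac12\}$: spherical angles, and the graph of $\sqrt{1-x_1^2-x_2^2}$ written in polar coordinates. The exact area is $2\pi(1-\cos\frac\pi3)=\pi$, and the exact integral of $f(x)=x_3$ is $\frac{3\pi}4$.

```python
import numpy as np

def surface_integral(phi, f, u_range, v_range, n=400, h=1e-6):
    """Midpoint-rule approximation of the integral of f over phi(U), U a rectangle."""
    (a, b), (c, d) = u_range, v_range
    u = a + (np.arange(n) + 0.5) * (b - a) / n
    v = c + (np.arange(n) + 0.5) * (d - c) / n
    U, V = np.meshgrid(u, v, indexing="ij")
    # Columns of the Jacobian D(phi) by central differences, shape (3, n, n)
    phi_u = (phi(U + h, V) - phi(U - h, V)) / (2 * h)
    phi_v = (phi(U, V + h) - phi(U, V - h)) / (2 * h)
    # Entries of the Gram matrix D(phi)^T D(phi) and the square root of its determinant
    E = np.sum(phi_u * phi_u, axis=0)
    F = np.sum(phi_u * phi_v, axis=0)
    G = np.sum(phi_v * phi_v, axis=0)
    factor = np.sqrt(E * G - F**2)
    values = f(*phi(U, V)) * factor
    return float(values.sum() * (b - a) * (d - c) / n**2)

def spherical(theta, p):
    return np.array([np.sin(theta) * np.cos(p), np.sin(theta) * np.sin(p), np.cos(theta)])

def graph_polar(r, p):
    return np.array([r * np.cos(p), r * np.sin(p), np.sqrt(1.0 - r**2)])

one = lambda x, y, z: np.ones_like(x)
height = lambda x, y, z: z

cap_angles = ((0.0, np.pi / 3), (0.0, 2 * np.pi))
cap_graph = ((0.0, np.sin(np.pi / 3)), (0.0, 2 * np.pi))

print(surface_integral(spherical, one, *cap_angles), np.pi)   # both close to pi
print(surface_integral(graph_polar, one, *cap_graph), np.pi)
print(surface_integral(spherical, height, *cap_angles),
      surface_integral(graph_polar, height, *cap_graph))      # both close to 3*pi/4
```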
This is a very practical definition in that it gives you a concrete integral to compute. However many submanifolds that we want to consider cannot be covered by a single parameterisation. The typical example is the sphere: a non-empty open subset of $\mathbb{R}^k$ is not compact, whereas the sphere is compact, so there cannot exist a homeomorphism between them. However if we use two parameterisations, then each can cover a part of the sphere and together they can cover the whole sphere. The trouble is that these parts can overlap, so if we just integrate in each parameterisation then we will "double-count" the points in the overlap. The answer to this is an elegant theoretical tool, but one that is not practically useful: a so-called partition of unity.
Definition 2.6. (Partition of Unity) Let $M\subset\mathbb{R}^n$ be covered by a countable family $(O_i)_{i\in\mathbb{N}}$ of open subsets of $\mathbb{R}^n$, i.e. $M\subset\bigcup_iO_i$. A smooth partition of unity subordinate to this cover is a countable family $(\chi_i)_{i\in\mathbb{N}}$ of smooth functions with the following properties:
Each $x\in M$ has a neighbourhood on which all but finitely many $\chi_i$ vanish identically.
For all $x\in M$ we have $\sum_i\chi_i(x)=1$.
Each $\chi_i$ vanishes outside of $O_i$.
For every countable family of open subsets of $\mathbb{R}^n$ covering $M$ there exists a subordinate smooth partition of unity. A proof can be found in many textbooks and in Prof Schmidt's script of the lecture Analysis II.
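Definition 2.7. Let $M\subset\mathbb{R}^n$ be a compact $k$-dimensional submanifold. Because $M$ is compact and every point of $M$ has an open neighbourhood $O\subset\mathbb{R}^n$ such that $M\cap O$ has a regular parameterisation, only finitely many parametrisations are needed to cover it, say on $O_1,\dots,O_m$. Choose a partition of unity $\chi_1,\dots,\chi_m$ subordinate to $O_1,\dots,O_m$. Let $f$ be a continuous function on $M$. We define
$$\int_Mf\,d\sigma:=\sum_{i=1}^m\int_{M\cap O_i}\chi_if\,d\sigma.$$
The idea of this definition is that we can write $f=\sum_i\chi_if$. Then each function $\chi_if$ is zero outside of $O_i$, so it is only necessary to integrate it on $M\cap O_i$, not on all of $M$. We assumed that $M$ was compact so that the sum is finite and we avoid any issues of convergence. The restriction that $M$ is compact is not necessary, but then one must deal with the convergence issues.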
Lemma 2.8. The integral neither depends on the choice of the partition of unity nor on the choice of the parametrizations.
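Proof. Suppose that we have two covers of $M$ by parameterising sets $O_1,\dots,O_m$ and $O'_1,\dots,O'_l$ and correspondingly two partitions of unity $(\chi_i)$ and $(\chi'_j)$. Define a new cover by the sets $O_i\cap O'_j$. It has the partition of unity $(\chi_i\chi'_j)$. Each set $M\cap O_i\cap O'_j$ can be parameterised by restricting the parameterisation of $M\cap O_i$. Observe
$$\sum_{i=1}^m\int_{M\cap O_i}\chi_if\,d\sigma=\sum_{i=1}^m\int_{M\cap O_i}\sum_{j=1}^l\chi'_j\chi_if\,d\sigma=\sum_{i=1}^m\sum_{j=1}^l\int_{M\cap O_i\cap O'_j}\chi_i\chi'_jf\,d\sigma.$$
The same calculation holds for the integral $\sum_j\int_{M\cap O'_j}\chi'_jf\,d\sigma$, showing that the two are equal. We have already seen that if we use two different parameterisations for the same set, then the integral has the same value. Therefore we have shown that the definition is independent of the parameterisation and of the partition of unity. □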
Integrals over submanifolds have many of the same properties as the usual integral. This is because "under the hood" it is the usual integral with a correction factor. An important property that does not carry over is the change of variables formula: only certain changes of variables preserve the correction factor. These properties will be proved in the tutorials.
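Lemma 2.9. The following properties hold for continuous functions $f,g$ on $M$ and $\lambda,\mu\in\mathbb{R}$.
Linearity: $\int_M(\lambda f+\mu g)\,d\sigma=\lambda\int_Mf\,d\sigma+\mu\int_Mg\,d\sigma$.
Order Preserving: if $f\le g$ on $M$ then $\int_Mf\,d\sigma\le\int_Mg\,d\sigma$.
Triangle Inequality: $\left|\int_Mf\,d\sigma\right|\le\int_M|f|\,d\sigma$.
Transformation: If $T$ is a euclidean motion (translation, reflection, rotation) and $r>0$ is a scaling factor, then $\int_{rT(M)}f\,d\sigma=r^k\int_Mf(rT(x))\,d\sigma(x)$. □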
We are almost ready to state the divergence theorem. In the divergence theorem we work with a bounded and open set $\Omega\subset\mathbb{R}^n$. In general such sets can have very complicated boundaries, for example fractals. We will require $\partial\Omega$ to be an $(n-1)$-dimensional submanifold. The point of requiring $\Omega$ to be bounded is that then $\overline\Omega$ and $\partial\Omega$ are compact. This is used in the proof when we say that $\overline\Omega$ is compact.
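There are three more formulas that we will need. First, just in case you missed the first tutorial, for a vector valued function $F=(F_1,\dots,F_n)$ the divergence of $F$ is $\nabla\cdot F=\sum_{i=1}^n\frac{\partial F_i}{\partial x_i}$. The $\cdot$ is meant to remind you of the formula for the dot product; it is not actually a dot product. Second, because a submanifold is locally a graph, it is possible to understand its geometry. In the situation of the theorem, $\partial\Omega$ is locally a graph $x_n=g(x')$ with $x'=(x_1,\dots,x_{n-1})$, where $g$ is a scalar valued function. Then the vectors $(e_i,\frac{\partial g}{\partial x_i})$, $i=1,\dots,n-1$, are tangent vectors to the submanifold, and so $(-\nabla g,1)$ is perpendicular to the submanifold. Thus the unit length normal vector is
$$\nu=\pm\frac{(-\nabla g,1)}{\sqrt{1+|\nabla g|^2}}.$$
We see that it is a continuous vector field (smooth if $g$ is smooth), well-defined up to a choice of sign. The last formula is a simplification of the distortion factor for the graph parameterisation $\psi(x')=(x',g(x'))$. We have $D\psi^TD\psi=\mathbb{1}+\nabla g\,\nabla g^T$, where $\mathbb{1}$ is the identity matrix. The following calculation makes use of the Weinstein-Aronszajn identity $\det(\mathbb{1}_m+AB)=\det(\mathbb{1}_k+BA)$ for $A\in\mathbb{R}^{m\times k}$ and $B\in\mathbb{R}^{k\times m}$:
$$\sqrt{\det\big(D\psi^TD\psi\big)}=\sqrt{\det\big(\mathbb{1}_{n-1}+\nabla g\,\nabla g^T\big)}=\sqrt{1+\nabla g^T\nabla g}=\sqrt{1+|\nabla g|^2}.$$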
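A quick numerical sanity check of the Weinstein-Aronszajn identity and of the resulting distortion factor $\sqrt{1+|\nabla g|^2}$ for a graph (Python with NumPy). The random matrices and the vector `grad_g` stand in for arbitrary data and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# det(1_m + A B) = det(1_k + B A) for A of shape (m, k) and B of shape (k, m)
m, k = 5, 2
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, m))
lhs = np.linalg.det(np.eye(m) + A @ B)
rhs = np.linalg.det(np.eye(k) + B @ A)
print(lhs, rhs)  # equal up to rounding

# Graph parameterisation psi(x') = (x', g(x')) in R^5 over R^4:
# its Jacobian stacks the 4x4 identity on top of the row grad g^T.
grad_g = rng.standard_normal(4)          # gradient of g at some point (arbitrary data)
D_psi = np.vstack([np.eye(4), grad_g])   # shape (5, 4)
gram = D_psi.T @ D_psi                   # equals 1 + grad_g grad_g^T
print(np.sqrt(np.linalg.det(gram)), np.sqrt(1.0 + grad_g @ grad_g))
```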
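Theorem 2.10. (Divergence Theorem) Let $\Omega\subset\mathbb{R}^n$ be bounded and open with $\partial\Omega$ being a $C^1$ $(n-1)$-dimensional submanifold of $\mathbb{R}^n$. Let $F:\overline\Omega\to\mathbb{R}^n$ be continuous and differentiable on $\Omega$ such that its first partial derivatives extend continuously to $\overline\Omega$. Then we have
$$\int_\Omega\nabla\cdot F\,dx=\int_{\partial\Omega}F\cdot\nu\,d\sigma,$$
where $\nu$ is the outward-pointing normal.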
Proof. First we construct a cover of $\overline\Omega$. Because the boundary is an $(n-1)$-dimensional submanifold, for every point $p\in\partial\Omega$ there is an open set $Q_p\ni p$ such that $\partial\Omega\cap Q_p$ is a graph. By shrinking $Q_p$ we can assume that it is a cube. Let the cover consist of $\Omega$ and all these cubes. Due to the compactness of $\overline\Omega$ we can find a finite subcover and choose a subordinate partition of unity $\chi_0,\chi_1,\dots,\chi_m$, where $\chi_0$ belongs to $\Omega$. This decomposes $F$ into a finite sum $F=\sum_{l=0}^m\chi_lF$. By linearity it suffices to show the statement for each $\chi_lF$ individually.
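This leads to two cases. For the first case, suppose we have the term $\chi_0F$ corresponding to $\Omega$. In particular $\chi_0$ and $\chi_0F$ are zero on $\partial\Omega$, so the right hand side of the divergence theorem is zero. By defining $\chi_0F$ to be zero outside of $\Omega$ we can extend it to a continuously differentiable function on $\mathbb{R}^n$. Choose a cube $[-R,R]^n$ which contains $\overline\Omega$. By Fubini we may integrate the $i$-th term of $\nabla\cdot(\chi_0F)$ first over $x_i$. Due to the fundamental theorem of calculus this integral is the difference of the values of $\chi_0F_i$ at two boundary points of the cube and vanishes. For example
$$\int_{-R}^R\frac{\partial(\chi_0F_1)}{\partial x_1}(x_1,x_2,\dots,x_n)\,dx_1=(\chi_0F_1)(R,x_2,\dots,x_n)-(\chi_0F_1)(-R,x_2,\dots,x_n)=0.$$
This shows that the left side of the divergence theorem also vanishes.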
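In the second case, we have a term $\chi F$ (with $\chi=\chi_l$ for some $l\ge1$) that corresponds to a cube covering part of the boundary. Write the cube as $Q=Q'\times(a,b)$ with a cube $Q'\subset\mathbb{R}^{n-1}$. Relabel the coordinates so that the boundary is the graph $\partial\Omega\cap Q=\{(x',g(x')):x'\in Q'\}$ and $\Omega\cap Q=\{(x',x_n)\in Q:x_n<g(x')\}$. We use the variables $x'=(x_1,\dots,x_{n-1})$ and $x_n$ for convenience. We may assume that $\chi$ and $\chi F$ are zero on the boundary of the cube, but not that $\chi F$ is zero on the graph of $g$. This is because the graph lies "inside" the cube and we only know that $\chi$ vanishes outside of the cube.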
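Again, we handle the terms of $\nabla\cdot(\chi F)$ one at a time. Suppose $i<n$ and consider the function
$$G_i(x')=\int_a^{g(x')}(\chi F_i)(x',x_n)\,dx_n.$$
It vanishes for $x'\in\partial Q'$, and its derivative is
$$\frac{\partial G_i}{\partial x_i}(x')=\int_a^{g(x')}\frac{\partial(\chi F_i)}{\partial x_i}(x',x_n)\,dx_n+(\chi F_i)(x',g(x'))\frac{\partial g}{\partial x_i}(x').$$
Applying the same argument as in the first case, we see that the integral of the $x_i$-derivative of $G_i$ over $Q'$ vanishes. Therefore
$$\int_{\Omega\cap Q}\frac{\partial(\chi F_i)}{\partial x_i}\,dx=-\int_{Q'}(\chi F_i)(x',g(x'))\frac{\partial g}{\partial x_i}(x')\,dx'=\int_{\partial\Omega\cap Q}\chi F_i\,\nu_i\,d\sigma,$$
since $\nu_i=-\frac{\partial g}{\partial x_i}\big/\sqrt{1+|\nabla g|^2}$ and $d\sigma=\sqrt{1+|\nabla g|^2}\,dx'$ for the graph parameterisation.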
Note that the signs required us to use the outward-pointing normal, which in this case means that the last component of the vector $\nu$ is positive.
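For $i=n$, we can just use the fundamental theorem of calculus on the inner integral, since $\chi F_n$ vanishes at $x_n=a$:
$$\int_{\Omega\cap Q}\frac{\partial(\chi F_n)}{\partial x_n}\,dx=\int_{Q'}\int_a^{g(x')}\frac{\partial(\chi F_n)}{\partial x_n}(x',x_n)\,dx_n\,dx'=\int_{Q'}(\chi F_n)(x',g(x'))\,dx'=\int_{\partial\Omega\cap Q}\chi F_n\,\nu_n\,d\sigma.$$
Summing these terms together proves the theorem. □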
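To see the theorem at work in a concrete case, here is a numerical check (Python with NumPy) on the unit disk $\Omega=B_1(0)\subset\mathbb{R}^2$ with the vector field $F(x,y)=(x^3,y)$, an arbitrary choice for illustration. Both sides equal $\frac{7\pi}4$. The left side is computed in polar coordinates with a midpoint rule; the right side uses $\nu(x)=x$ and the parameterisation $(\cos t,\sin t)$ of the circle, for which $d\sigma=dt$.

```python
import numpy as np

def F(x, y):
    return x**3, y

def div_F(x, y):
    return 3 * x**2 + 1

n = 800
r = (np.arange(n) + 0.5) / n                  # midpoints of (0, 1)
t = (np.arange(n) + 0.5) * 2 * np.pi / n      # midpoints of (0, 2 pi)
R, T = np.meshgrid(r, t, indexing="ij")

# Left side: integral of div F over the disk, with polar Jacobian r
lhs = np.sum(div_F(R * np.cos(T), R * np.sin(T)) * R) * (1 / n) * (2 * np.pi / n)

# Right side: outward flux of F through the unit circle, nu = (cos t, sin t)
Fx, Fy = F(np.cos(t), np.sin(t))
rhs = np.sum(Fx * np.cos(t) + Fy * np.sin(t)) * (2 * np.pi / n)

print(lhs, rhs, 7 * np.pi / 4)   # all three approximately equal
```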
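We now consider some special cases of the theorem that occur over and over in practice. For a scalar valued function $f$ the divergence theorem, applied to $F=fe_i$, implies for all $i=1,\dots,n$
$$\int_\Omega\frac{\partial f}{\partial x_i}\,dx=\int_{\partial\Omega}f\,\nu_i\,d\sigma.$$
For two functions $f$ and $g$ whose product vanishes on the boundary and satisfies the corresponding assumptions of the divergence theorem we obtain by the Leibniz rule
$$\int_\Omega\frac{\partial f}{\partial x_i}\,g\,dx=-\int_\Omega f\,\frac{\partial g}{\partial x_i}\,dx.$$
This is called integration by parts. Inductively we get for any multi-index $\alpha$, provided the boundary terms vanish at every step,
$$\int_\Omega(\partial^\alpha f)\,g\,dx=(-1)^{|\alpha|}\int_\Omega f\,\partial^\alpha g\,dx.$$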
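Here is a one-dimensional numerical check of integration by parts for $|\alpha|=1$ and $|\alpha|=2$ (Python with NumPy), on $\Omega=(0,1)$ with $f(x)=e^x$ and $g(x)=x^2(1-x)^2$. These functions are our own example: $g$ and $g'$ vanish at both end points, so all boundary terms drop out. Integrals are approximated by a midpoint rule.

```python
import numpy as np

n = 200_000
x = (np.arange(n) + 0.5) / n        # midpoints of (0, 1)
dx = 1.0 / n

f = np.exp(x)                       # f = f' = f'' = e^x
g = x**2 * (1 - x) ** 2             # g(0) = g(1) = g'(0) = g'(1) = 0
dg = 2 * x * (1 - x) ** 2 - 2 * x**2 * (1 - x)
ddg = 2 * (1 - x) ** 2 - 8 * x * (1 - x) + 2 * x**2

# |alpha| = 1:  int f' g = - int f g'   (here f' = f)
first = (np.sum(f * g) * dx, -np.sum(f * dg) * dx)
# |alpha| = 2:  int f'' g = (+1) int f g''   (here f'' = f)
second = (np.sum(f * g) * dx, np.sum(f * ddg) * dx)
print(first, second)                # each pair agrees to many digits
```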
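As a second application of the divergence theorem we can generalise the idea of the scalar conservation law to higher dimensions, where the flux is vector-valued. For any continuously differentiable function $F:\mathbb{R}\to\mathbb{R}^n$ we call
$$\frac{\partial u}{\partial t}+\nabla\cdot F(u)=0$$
a conservation law. For $\Omega$ open and bounded with $\partial\Omega$ an $(n-1)$-dimensional submanifold of $\mathbb{R}^n$ we obtain
$$\frac{d}{dt}\int_\Omega u\,dx=\int_\Omega\frac{\partial u}{\partial t}\,dx=-\int_\Omega\nabla\cdot F(u)\,dx=-\int_{\partial\Omega}F(u)\cdot\nu\,d\sigma.$$
This is the meaning of a conservation law: the change of the integral of $u$ over $\Omega$ is equal to the negative of the outward flux of $F(u)$ through the boundary $\partial\Omega$.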
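A discrete counterpart of this balance law appears in finite volume schemes, which are not discussed in the text; we use one here only as an illustration. In the sketch below (Python with NumPy) the flux $F(u)=u^2/2$ (Burgers' equation in one dimension), the Rusanov numerical flux, and the fixed boundary states are all our own assumptions. Because each cell is updated by the difference of the fluxes through its two faces, the interior fluxes cancel in the sum, and the change of the total mass equals the net flux through the two boundary points, up to rounding.

```python
import numpy as np

def flux(u):
    """Burgers flux F(u) = u^2 / 2 (an assumed example)."""
    return 0.5 * u**2

def rusanov(ul, ur):
    """Rusanov numerical flux at an interface with left/right states ul, ur."""
    speed = np.maximum(np.abs(ul), np.abs(ur))
    return 0.5 * (flux(ul) + flux(ur)) - 0.5 * speed * (ur - ul)

def step(u, dx, dt, u_left, u_right):
    """One conservative update; returns new cell averages and the two boundary fluxes."""
    ext = np.concatenate(([u_left], u, [u_right]))   # ghost cells at both ends
    face = rusanov(ext[:-1], ext[1:])                 # len(u) + 1 face fluxes
    return u - dt / dx * (face[1:] - face[:-1]), face[0], face[-1]

n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
u = np.exp(-100 * (x - 0.3) ** 2)
dt = 0.4 * dx                         # CFL condition, since |u| <= 1

balance = []
for _ in range(100):
    mass_before = u.sum() * dx
    u, flux_in, flux_out = step(u, dx, dt, u_left=0.5, u_right=0.0)
    change = u.sum() * dx - mass_before
    balance.append(change - (-dt * (flux_out - flux_in)))

print(max(abs(b) for b in balance))   # of the order of rounding errors
```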
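This idea also gives the following cute trick to calculate the surface area of a ball in relation to its volume. Let the volume of the $n$-dimensional unit ball $B_1(0)$ be $\omega_n$. By scaling, the volume of the ball $B_r(0)$ is $\omega_nr^n$. Let $\sigma_n$ denote the area of $\partial B_1(0)$. The divergence of $F(x)=x$ is $n$, and on $\partial B_1(0)$ the outward normal is $\nu(x)=x$, so by the divergence theorem we have
$$n\omega_n=\int_{B_1(0)}\nabla\cdot x\,dx=\int_{\partial B_1(0)}x\cdot x\,d\sigma=\sigma_n.$$
In summary $\sigma_n=n\omega_n$.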
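With $\omega_n$ the volume of the unit ball and $\sigma_n$ the area of the unit sphere in $\mathbb{R}^n$, the relation $\sigma_n=n\omega_n$ can be compared with the well-known closed forms $\omega_n=\pi^{n/2}/\Gamma(\frac n2+1)$ and $\sigma_n=2\pi^{n/2}/\Gamma(\frac n2)$, which are not derived in this text. A short Python check using only the standard library:

```python
import math

def ball_volume(n):
    """Volume omega_n of the n-dimensional unit ball."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def sphere_area(n):
    """Area sigma_n of the unit sphere in R^n (the boundary of the unit ball)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

for n in range(1, 7):
    print(n, ball_volume(n), sphere_area(n), n * ball_volume(n))
```

For $n=2$ this gives $\omega_2=\pi$ and $\sigma_2=2\pi$, and for $n=3$ it gives $\omega_3=\frac{4\pi}3$ and $\sigma_3=4\pi$.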
The way that the divergence theorem relates an integral over a set to an integral over its boundary can be used to decompose the set into layers. The typical example, and the only relevant example for us, is that a ball $B_R(0)$ can be thought of as the union of the spheres $\partial B_r(0)$ for $0<r<R$ (the origin has measure zero and can be ignored). The layers are the level sets of a function $f$. For the ball we take $f(x)=|x|$, which has $|\nabla f|=1$, so the formula below simplifies further. There will also be an exercise that proves this formula for the ball directly from the definition of the submanifold integral.
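Corollary 2.11 (Co-area Formula). Let $f$ be a sufficiently smooth non-negative function whose gradient does not vanish on $\{f>0\}$. Suppose for every $r>0$ that $\Omega_r=\{x:f(x)<r\}$ is a domain to which the divergence theorem applies. For any sufficiently smooth $g$ and any $R>0$,
$$\int_{\Omega_R}g\,dx=\int_0^R\int_{\partial\Omega_r}\frac{g}{|\nabla f|}\,d\sigma\,dr.$$
Proof. Let's make some additional definitions to simplify the working. The gradient of a function is always normal to its level sets, pointing in the direction of increase, so $\nu=\nabla f/|\nabla f|$ is the outward pointing unit normal of $\partial\Omega_r$. Define $G=g\,\nabla f/|\nabla f|^2$, a vector valued function. In particular $G\cdot\nabla f=g$ and $G\cdot\nu=g/|\nabla f|$.
Now we can begin. By the product rule for divergence
$$\nabla\cdot\big((f-R)G\big)=\nabla f\cdot G+(f-R)\nabla\cdot G=g+(f-R)\nabla\cdot G.$$
Rearranging this and applying the divergence theorem shows
$$\int_{\Omega_R}g\,dx=\int_{\Omega_R}(R-f)\nabla\cdot G\,dx+\int_{\partial\Omega_R}(f-R)\,G\cdot\nu\,d\sigma.$$
On the boundary of $\Omega_R$ the function $f-R$ vanishes, since $\Omega_R=\{f<R\}$ by definition. Hence $(f-R)\,G\cdot\nu=0$ on $\partial\Omega_R$ and the second term on the right is zero. The next step is a "magic trick":
$$\int_{\Omega_R}(R-f(x))\,\nabla\cdot G(x)\,dx=\int_{\Omega_R}\int_{f(x)}^R\nabla\cdot G(x)\,dr\,dx.$$
We want to apply Fubini's theorem to change the order of this double integral. But the limits of the inner integral depend on the variable of the outer integral, so first we use an indicator function to make them independent:
$$\int_{\Omega_R}\int_{f(x)}^R\nabla\cdot G(x)\,dr\,dx=\int_{\Omega_R}\int_0^R\mathbb{1}_{\{f(x)<r\}}\,\nabla\cdot G(x)\,dr\,dx=\int_0^R\int_{\Omega_r}\nabla\cdot G\,dx\,dr.$$
We apply the divergence theorem one more time to get the result:
$$\int_0^R\int_{\Omega_r}\nabla\cdot G\,dx\,dr=\int_0^R\int_{\partial\Omega_r}G\cdot\nu\,d\sigma\,dr=\int_0^R\int_{\partial\Omega_r}\frac{g}{|\nabla f|}\,d\sigma\,dr.\qquad\square$$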
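A numerical illustration of the co-area formula with level sets that are not spheres (Python with NumPy). We take $f(x,y)=x^2+4y^2$ in $\mathbb{R}^2$, so $\Omega_r$ is an ellipse with semi-axes $\sqrt r$ and $\sqrt r/2$ and area $\frac{\pi r}2$; this example and the helper names are ours. The inner integral over $\partial\Omega_r$ uses the parameterisation $(\sqrt r\cos t,\frac{\sqrt r}2\sin t)$ with a midpoint rule in $t$, and the outer integral a midpoint rule in $r$. For $R=2$ the left side equals $\pi$ both for $g=1$ (the area of $\Omega_2$) and for $g=f$ (since $\int_{\Omega_R}f\,dx=\frac{\pi R^2}4$).

```python
import numpy as np

def f(x, y):
    return x**2 + 4 * y**2

def grad_f_norm(x, y):
    return np.sqrt((2 * x) ** 2 + (8 * y) ** 2)

def level_set_integral(r, g, m=2000):
    """Integral of g / |grad f| over the ellipse {f = r} (midpoint rule in t)."""
    t = (np.arange(m) + 0.5) * 2 * np.pi / m
    x, y = np.sqrt(r) * np.cos(t), 0.5 * np.sqrt(r) * np.sin(t)
    # speed of the parameterisation, so that d sigma = speed * dt
    speed = np.sqrt((np.sqrt(r) * np.sin(t)) ** 2 + (0.5 * np.sqrt(r) * np.cos(t)) ** 2)
    return np.sum(g(x, y) / grad_f_norm(x, y) * speed) * 2 * np.pi / m

def coarea_rhs(R, g, k=400):
    """Right side of the co-area formula, with a midpoint rule in r."""
    r = (np.arange(k) + 0.5) * R / k
    return sum(level_set_integral(ri, g) for ri in r) * R / k

one = lambda x, y: np.ones_like(x)
print(coarea_rhs(2.0, one), np.pi)   # area of {x^2 + 4y^2 < 2}
print(coarea_rhs(2.0, f), np.pi)     # integral of f over the same ellipse
```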
For the transport equation we developed a solution that also seems to make sense when it is not differentiable. For the scalar conservation law we saw that there were in some situations no solutions, except if we generalised the notion of solution to include discontinuous functions. The lesson we draw from these examples is that the existence and uniqueness of solutions depends on the notion of solution we use. In order to say that these solutions solve the PDE, clearly all partial derivatives of a solution which occur in the partial differential equation have to exist. The trick is to come up with a new notion of partial derivative and interpret the PDE to be about these new derivatives.
In this section we introduce distributions (also called generalised functions) and a corresponding notion of differentiation. This notion is "backwards compatible": if a differentiable function is considered as a distribution, the two types of derivatives are equal. Remarkably, distributions can always be differentiated, and indeed they can be differentiated infinitely many times. For this achievement we have to pay a price: distributions cannot be multiplied with each other in general. Linear partial differential equations extend to well defined equations on such distributions. Distributions solving linear partial differential equations are called weak solutions or solutions in the sense of distributions. There exist other notions of weak solutions which also apply to non-linear partial differential equations. The most prominent example is the notion of Sobolev functions, which are introduced in the course "Partial Differential Equations", the sequel to this course. But Sobolev functions can be understood as a special type of distribution, so even if one is interested in Sobolev functions it is helpful to start with distributions.
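First we need to define a special class of very well behaved functions. The support $\operatorname{supp}\varphi$ of a function $\varphi$ is the closure of $\{x:\varphi(x)\neq0\}$. On an open set $\Omega\subset\mathbb{R}^n$ let $C_c^\infty(\Omega)$ denote the algebra of smooth functions whose support is a compact subset of $\Omega$. We call these test functions and say they have compact support in $\Omega$, symbolically $\operatorname{supp}\varphi\Subset\Omega$.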
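Within the set of test functions there are special families that we will often use, called mollifiers or approximate identities. A mollifier is a family $(\varphi_\varepsilon)_{\varepsilon>0}$ of non-negative test functions with $\operatorname{supp}\varphi_\varepsilon\subset\overline{B_\varepsilon(0)}$ and $\int_{\mathbb{R}^n}\varphi_\varepsilon\,dx=1$. We construct a prototype: the function
$$\varphi(x)=\begin{cases}C\exp\left(\dfrac{1}{|x|^2-1}\right)&\text{for }|x|<1,\\0&\text{for }|x|\ge1,\end{cases}$$
is a smooth function on $\mathbb{R}^n$, has support $\overline{B_1(0)}$, and is non-negative. By the way, this example shows that test functions actually exist. We can choose the constant $C>0$ such that its integral is $1$. By rescaling $x\mapsto x/\varepsilon$ and multiplying by $\varepsilon^{-n}$ we obtain
$$\varphi_\varepsilon(x)=\varepsilon^{-n}\varphi(x/\varepsilon),$$
which has the required properties. This particular example of a mollifier is called the standard mollifier, but for our purposes it does not matter which mollifier we use. Any such family is called an approximate identity because of the following property. Take any continuous function $f$ on $\Omega$ and suppose $B_\varepsilon(x)\subset\Omega$. By continuity $f(y)$ is approximately equal to $f(x)$ for $y$ in a sufficiently small ball $B_\varepsilon(x)$. Therefore
$$\int_\Omega f(y)\varphi_\varepsilon(x-y)\,dy\approx f(x)\int_\Omega\varphi_\varepsilon(x-y)\,dy=f(x).$$
In fact, as we will prove in the next lemma, this approximation becomes an equality in the limit $\varepsilon\to0$.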
Lemma 2.12. Let and be a mollifier. The family of smooth functions
converges uniformly on any compact subset of to as . For smooth functions the same holds for all derivatives of .
Proof. Choose a compact subset of . There is an such that for any point in the compact set the ball lies in . For this or smaller we have
On compact sets, continuous functions are uniformly continuous. This shows the uniform convergence .
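Observe that if $f$ is smooth, then we can compute the derivatives of $f_\epsilon$ in the following way. Choose any point $x\in\Omega$ and let $\epsilon$ be small enough that $\overline{B(x,2\epsilon)}\subset\Omega$. Then for all points $z\in B(x,\epsilon)$ the substitution $y\mapsto z-y$ gives
$$\partial^\alpha f_\epsilon(z)=\partial_z^\alpha\int_{B(0,\epsilon)}\phi_\epsilon(y)f(z-y)\,dy=\int_{B(0,\epsilon)}\phi_\epsilon(y)\,(\partial^\alpha f)(z-y)\,dy.$$
Therefore $\partial^\alpha f_\epsilon=(\partial^\alpha f)_\epsilon$, and the same convergence argument carries over to all partial derivatives of $f$. □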
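The following minimal sketch (again $n=1$, pure Python, helper names of our own choosing) checks both claims of Lemma 2.12 numerically: for the continuous but non-smooth $f(x)=|x|$ the sup-error on $K=[-1,1]$ shrinks with $\epsilon$, and for the smooth $f=\sin$ a finite-difference derivative of $f_\epsilon$ matches $(\partial f)_\epsilon=(\cos)_\epsilon$. The grid maximum is only a stand-in for the true supremum.

```python
import math

def bump(t):
    # Prototype exp(-1/(1-t^2)) on (-1, 1), zero elsewhere (n = 1).
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

N = 4000
NODES = [-1.0 + (k + 0.5) * 2.0 / N for k in range(N)]
RAW = [bump(t) for t in NODES]
S = sum(RAW)
WEIGHTS = [r / S for r in RAW]   # discrete version of phi(t) dt, normalised to sum 1

def mollify(f, eps, x):
    # f_eps(x) = int phi_eps(z) f(x - z) dz, written with z = eps * t.
    return sum(w * f(x - eps * t) for w, t in zip(WEIGHTS, NODES))

K = [i / 50.0 - 1.0 for i in range(101)]   # grid on the compact set [-1, 1]; contains 0

def sup_error(f, eps):
    # Grid approximation of sup_{x in K} |f_eps(x) - f(x)|.
    return max(abs(mollify(f, eps, x) - f(x)) for x in K)

EPS = (0.4, 0.2, 0.1)
errors = [sup_error(abs, eps) for eps in EPS]

# Derivative check for the smooth f = sin: (f_eps)' should equal (f')_eps = (cos)_eps.
h = 1e-5
lhs = (mollify(math.sin, 0.2, 0.3 + h) - mollify(math.sin, 0.2, 0.3 - h)) / (2 * h)
rhs = mollify(math.cos, 0.2, 0.3)
print(errors, lhs, rhs)
```

For $|x|$ the error is attained at $x=0$ and stays below $\epsilon$, in line with the bound $\sup_{|y-x|\leq\epsilon}|f(y)-f(x)|\leq\epsilon$ from the proof.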
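The formula we see in the definition of $f_\epsilon$ turns out to be useful. We use it to define a type of product operation on $C_0^\infty(\mathbb{R}^n)$, the convolution
$$(f*g)(x)=\int_{\mathbb{R}^n}f(x-y)\,g(y)\,dy.$$
In this notation $f_\epsilon=\phi_\epsilon*f$ whenever $f$ is defined on all of $\mathbb{R}^n$.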
This product is bilinear, commutative, and associative (Exercise). One advantage of convolution compared to pointwise multiplication is that it behaves nicely with differentiation. There is no Leibniz rule; rather,
$$\partial^\alpha(f*g)=(\partial^\alpha f)*g=f*(\partial^\alpha g).$$
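Furthermore, convolution is well-behaved with respect to integral norms, which is useful in more advanced theory. We consider only the simplest case: the integral of $f*g$ is the product of the integrals of $f$ and $g$. This follows by noticing that the coordinate transformation $(x,y)\mapsto(x-y,y)$ is volume preserving, thus
$$\int_{\mathbb{R}^n}(f*g)(x)\,dx=\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}f(x-y)\,g(y)\,dy\,dx=\int_{\mathbb{R}^n}f(z)\,dz\int_{\mathbb{R}^n}g(y)\,dy.$$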
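A fully discrete analogue makes this algebra visible. Replacing integrals by sums over integer grid points, the convolution of finite sequences satisfies three of the properties above exactly: commutativity, “sum of $f*g$ equals product of sums”, and “the derivative falls on one factor”, with a backward difference in place of $\partial$. This is a sketch of the discrete counterpart, not of the continuous operator; `convolve` and `diff` are hypothetical helper names.

```python
def convolve(f, g):
    # Full discrete convolution (f*g)[k] = sum_j f[k-j] g[j],
    # a Riemann-sum analogue of int f(x - y) g(y) dy.
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def diff(f):
    # Backward difference with zero padding: (Df)[k] = f[k] - f[k-1], k = 0..len(f).
    padded = [0.0] + list(f) + [0.0]
    return [padded[k + 1] - padded[k] for k in range(len(f) + 1)]

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]
print(convolve(f, g))   # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Here the sum of `convolve(f, g)` is $9 = 6\cdot 1.5$, and `diff(convolve(f, g))` coincides with `convolve(diff(f), g)`.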
Finally, we include a lemma that will be needed later.
Lemma 2.13. Suppose that $f$ and $g$ are rotationally symmetric about $x_0$ and $y_0$ respectively. For $f$ this means that $f(x_0+A(x-x_0))=f(x)$ for every orthogonal transformation $A$ and every $x$, and similarly for $g$. Then the convolution $f*g$ is rotationally symmetric about $x_0+y_0$.
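Proof. The proof is just a sequence of coordinate transformations. Let $A$ be orthogonal. We begin with the definition and make the Euclidean motion $y=y_0+A(z-y_0)$:
$$(f*g)\bigl(x_0+y_0+A(x-x_0-y_0)\bigr)=\int f\bigl(x_0+y_0+A(x-x_0-y_0)-y\bigr)\,g(y)\,dy=\int f\bigl(x_0+A(x-z-x_0)\bigr)\,g\bigl(y_0+A(z-y_0)\bigr)\,dz.$$
It is important to see here that $dy=dz$, since $|\det A|=1$ for orthogonal $A$. Now we use the rotational symmetry of $f$ and $g$ to continue:
$$\int f\bigl(x_0+A(x-z-x_0)\bigr)\,g\bigl(y_0+A(z-y_0)\bigr)\,dz=\int f(x-z)\,g(z)\,dz=(f*g)(x).\qquad\square$$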
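To see Lemma 2.13 at work, the sketch below evaluates $(f*g)(p)$ by a two-dimensional midpoint rule for two bump functions that are radial about different centres $a$ and $b$ (our own hypothetical choices), at several points $p$ on a circle around $a+b$. By the lemma the values should agree up to quadrature error. It is a numerical sanity check under these assumptions, not a proof.

```python
import math

def radial_bump(center, radius):
    # Hypothetical example: a smooth function depending only on |x - center|.
    cx, cy = center
    def h(x, y):
        s2 = ((x - cx) ** 2 + (y - cy) ** 2) / radius ** 2
        return math.exp(-1.0 / (1.0 - s2)) if s2 < 1.0 else 0.0
    return h

def convolve_at(f, g, g_center, g_radius, px, py, m=100):
    # Midpoint rule for (f*g)(p) = int f(p - y) g(y) dy over the square containing supp g.
    h = 2.0 * g_radius / m
    x0, y0 = g_center[0] - g_radius, g_center[1] - g_radius
    total = 0.0
    for i in range(m):
        yx = x0 + (i + 0.5) * h
        for j in range(m):
            yy = y0 + (j + 0.5) * h
            total += f(px - yx, py - yy) * g(yx, yy)
    return total * h * h

a, b = (0.5, -0.2), (-1.0, 0.4)
f = radial_bump(a, 1.0)
g = radial_bump(b, 0.7)
c = (a[0] + b[0], a[1] + b[1])   # predicted centre of symmetry x0 + y0
values = [convolve_at(f, g, b, 0.7, c[0] + 0.8 * math.cos(t), c[1] + 0.8 * math.sin(t))
          for t in (0.0, 1.0, 2.0, 3.0, 4.5)]
print(values)
```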
Now it is time to introduce distributions. We have seen in Lemma 2.12 that integrating a continuous function against test functions somehow retains the information of the function. In this spirit each $f\in L^1_{\mathrm{loc}}(\Omega)$ defines a linear map
$$F_f:C_0^\infty(\Omega)\to\mathbb{R},\qquad\phi\mapsto\int_\Omega f(x)\,\phi(x)\,dx.$$
We will see that the information of $f$ is also retained in this linear form. The idea of distributions is to consider not just functions integrated against test functions, but all (suitably continuous) linear forms acting on $C_0^\infty(\Omega)$.
There is a technical matter to discuss at this point. The set of test functions should be given a topology different from the norm topology of $C_0^\infty(\Omega)$ with the supremum norm, but this other topology is tricky and only used in a few places in this course. We use the notation $\mathcal{D}(\Omega)$ for the set of test functions equipped with this topology. Instead of explaining the topology in full detail, let us give the criterion for when a sequence of test functions converges. We define for any compact subset $K\subset\Omega$ and every multi-index $\alpha$ the following seminorm:
$$\|\phi\|_{K,\alpha}=\sup_{x\in K}|\partial^\alpha\phi(x)|.$$
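We say that $\phi_k\to\phi$ in $\mathcal{D}(\Omega)$ if there is a compact subset $K\subset\Omega$ such that the supports of every $\phi_k$ and of $\phi$ are contained in $K$ and $\|\phi_k-\phi\|_{K,\alpha}\to 0$ for every multi-index $\alpha$ (including $\alpha=0$). This is a much stronger condition than convergence with respect to the supremum norm.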
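The gap between uniform convergence and convergence in $\mathcal{D}(\Omega)$ can be seen numerically. Take the hypothetical sequence $\phi_k(x)=k^{-1}\sin(k^2x)\,\phi(x)$, where $\phi$ is the unnormalised bump supported in $K=[-1,1]$. All supports lie in $K$ and $\|\phi_k\|_{K,0}\leq e^{-1}/k\to 0$, yet $\phi_k'(0)=k\,\phi(0)=k/e$, so $\|\phi_k\|_{K,1}$ blows up and the sequence does not converge in $\mathcal{D}$. The sketch approximates the seminorms by maxima over a grid.

```python
import math

def bump(x):
    # exp(-1/(1-x^2)) on (-1, 1), zero elsewhere; its support is K = [-1, 1].
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def bump_prime(x):
    # Derivative of bump: bump(x) * (-2x / (1 - x^2)^2) inside (-1, 1).
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1.0 else 0.0

def phi_k(k, x):
    # Hypothetical sequence phi_k(x) = sin(k^2 x) * bump(x) / k.
    return math.sin(k * k * x) * bump(x) / k

def phi_k_prime(k, x):
    # Product rule: k cos(k^2 x) bump(x) + sin(k^2 x) bump'(x) / k.
    return k * math.cos(k * k * x) * bump(x) + math.sin(k * k * x) * bump_prime(x) / k

GRID = [j / 10000.0 - 1.0 for j in range(20001)]   # grid on K, contains x = 0

def seminorm(func, k):
    # Grid approximation of sup_{x in K} |func(k, x)|.
    return max(abs(func(k, x)) for x in GRID)

for k in (1, 10, 100):
    # ||phi_k||_{K,0} shrinks like 1/k while ||phi_k||_{K,1} grows like k.
    print(k, seminorm(phi_k, k), seminorm(phi_k_prime, k))
```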
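Definition 2.14. On an open subset $\Omega\subseteq\mathbb{R}^n$ the space of distributions $\mathcal{D}'(\Omega)$ is defined as the vector space of all linear maps $F:\mathcal{D}(\Omega)\to\mathbb{R}$ which are continuous with respect to the seminorms $\|\cdot\|_{K,\alpha}$; i.e. for each compact $K\subset\Omega$ there exist finitely many multi-indices $\alpha_1,\dots,\alpha_m$ and constants $C_1,\dots,C_m$ such that the following inequality holds for all test functions $\phi$ with support in $K$:
$$|F(\phi)|\leq\sum_{j=1}^m C_j\,\|\phi\|_{K,\alpha_j}.$$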
The prime in $\mathcal{D}'(\Omega)$ indicates that (for the correctly defined topology) distributions form the dual space of $\mathcal{D}(\Omega)$. Concretely, the continuity condition yields the following convergence property for distributions: if $\phi_k\to\phi$ in $\mathcal{D}(\Omega)$, then the values $F(\phi_k)$ converge to $F(\phi)$. Similarly, a sequence of distributions $F_k$ converges to $F$ if $F_k(\phi)\to F(\phi)$ for all test functions $\phi$.
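As previously mentioned, any $f\in L^1_{\mathrm{loc}}(\Omega)$ defines in a canonical way a distribution $F_f(\phi)=\int_\Omega f\phi\,dx$. Let us verify that it really meets the definition of a distribution. For any compact subset $K\subset\Omega$ and any $\phi$ with support in $K$ we have
$$|F_f(\phi)|=\left|\int_K f\phi\,dx\right|\leq\left(\int_K|f|\,dx\right)\|\phi\|_{K,0}.$$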
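In code, a distribution can be modelled as a function that takes a test function and returns a number. The minimal sketch below (one dimension, $K=[-1,1]$, helper names of our own) builds $F_f$ for the Heaviside function $f=\mathbf{1}_{(0,\infty)}$, which is locally integrable but not continuous, and checks linearity and the estimate $|F_f(\phi)|\leq\bigl(\int_K|f|\bigr)\|\phi\|_{K,0}$ for one test function. Integrals are approximated by a midpoint rule, so identities hold only up to rounding.

```python
import math

def bump(x):
    # A test function supported in K = [-1, 1].
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(f, a, b, n=20000):
    # Composite midpoint rule for int_a^b f(x) dx.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def regular_distribution(f, a=-1.0, b=1.0):
    # F_f(phi) = int f phi dx, valid for test functions supported in K = [a, b].
    return lambda phi: integrate(lambda x: f(x) * phi(x), a, b)

heaviside = lambda x: 1.0 if x > 0 else 0.0   # in L^1_loc but not continuous
F = regular_distribution(heaviside)
psi = lambda x: x * bump(x)                    # a second test function

print(F(bump), F(psi))
```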
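Let us give another example of a distribution on $\mathbb{R}^n$, one that does not correspond to an element of $L^1_{\mathrm{loc}}(\mathbb{R}^n)$:
$$\delta:\mathcal{D}(\mathbb{R}^n)\to\mathbb{R},\qquad\phi\mapsto\phi(0).$$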
Intuitively (and we will prove this rigorously soon), any corresponding $f$ would vanish on $\mathbb{R}^n\setminus\{0\}$ and would have total integral one. Since $\{0\}$ has measure zero, such a function does not exist. Distributions that come from functions are called regular, and those that don't are called non-regular. This distribution is called Dirac's $\delta$-function. We can also show that it is the limit, as $\epsilon\to 0$, of the distributions $F_{\phi_\epsilon}$ corresponding to the mollifier $\phi_\epsilon$.
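A quick numerical check of the last claim, assuming $n=1$ and the standard mollifier: for the hypothetical test function $\phi(x)=(1+x)\,e^{-1/(1-x^2/4)}$ on $(-2,2)$, the values $F_{\phi_\epsilon}(\phi)=\int\phi_\epsilon\phi\,dx$ approach $\delta(\phi)=\phi(0)$ as $\epsilon$ decreases.

```python
import math

def bump(x):
    # exp(-1/(1-x^2)) on (-1, 1), zero elsewhere.
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

N = 4000
NODES = [-1.0 + (k + 0.5) * 2.0 / N for k in range(N)]
C = 1.0 / (sum(bump(t) for t in NODES) * 2.0 / N)   # normalising constant (midpoint rule)

def F_mollifier(eps, phi):
    # F_{phi_eps}(phi) = int phi_eps(x) phi(x) dx, substituting x = eps * t.
    return sum(C * bump(t) * phi(eps * t) for t in NODES) * 2.0 / N

def delta(phi):
    # Dirac's delta: evaluation at 0.
    return phi(0.0)

phi = lambda x: (1.0 + x) * bump(x / 2.0)   # hypothetical test function, support [-2, 2]
for eps in (0.5, 0.1, 0.02):
    print(eps, F_mollifier(eps, phi), delta(phi))
```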
We now return to the question of whether the distribution $F_f$ retains the information of $f$. The answer is yes.
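Lemma 2.15 (Fundamental Lemma of the Calculus of Variations). If $f\in L^1_{\mathrm{loc}}(\Omega)$ obeys $F_f(\phi)\geq 0$ for all non-negative test functions $\phi$, then $f$ is non-negative almost everywhere. In particular, the map $L^1_{\mathrm{loc}}(\Omega)\to\mathcal{D}'(\Omega)$, $f\mapsto F_f$, is injective.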
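Proof. It suffices to prove the local statement for $\chi f$, where $\chi\in C_0^\infty(\Omega)$ is an arbitrary non-negative test function: for every compact subset of $\Omega$ there is such a $\chi$ which equals $1$ on it. We extend $\chi f$ to $\mathbb{R}^n$ by setting it equal to zero on $\mathbb{R}^n\setminus\Omega$. The extended function is also denoted by $\chi f$ and belongs to $L^1(\mathbb{R}^n)$. For a mollifier $\phi_\epsilon$ and any $h\in L^1(\mathbb{R}^n)$ we have
$$\|\phi_\epsilon*h-h\|_{L^1}\leq\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}\phi_\epsilon(y)\,|h(x-y)-h(x)|\,dy\,dx\leq\sup_{|y|\leq\epsilon}\int_{\mathbb{R}^n}|h(x-y)-h(x)|\,dx.$$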
If $h$ is the characteristic function of a rectangle, then the supremum on the right-hand side converges to zero as $\epsilon\to 0$. Due to the triangle inequality the same holds for step functions, i.e. finite linear combinations of such functions. Since step functions are dense in $L^1(\mathbb{R}^n)$, for each $h\in L^1(\mathbb{R}^n)$ this supremum becomes arbitrarily small for sufficiently small $\epsilon$. Hence the family of functions $\phi_\epsilon*h$ converges in $L^1(\mathbb{R}^n)$ to $h$ in the limit $\epsilon\to 0$.
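Moreover, the functions $\phi_\epsilon*(\chi f)$ are non-negative. This is because the mollifiers are non-negative and we can write the convolution as the action of $F_f$ on a non-negative test function:
$$\bigl(\phi_\epsilon*(\chi f)\bigr)(x)=\int_\Omega f(y)\,\chi(y)\,\phi_\epsilon(x-y)\,dy=F_f\bigl(\chi\,\phi_\epsilon(x-\cdot)\bigr)\geq 0,$$
using the assumption on $f$.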
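So it remains to show that an $L^1$-limit $h$ of non-negative functions $h_\epsilon$ is also non-negative. For this we choose a sequence $\epsilon_k$ converging to zero with $\|h_{\epsilon_k}-h\|_{L^1}\leq 2^{-k}$ for all $k\in\mathbb{N}$. This ensures that the series $\sum_k|h_{\epsilon_k}-h|$ converges in $L^1(\mathbb{R}^n)$. So for almost every point $x$ the series $\sum_k|h_{\epsilon_k}(x)-h(x)|$ is finite, and in particular its terms converge to zero. In other words, $h_{\epsilon_k}(x)\to h(x)$ for almost every $x$. Since all $h_{\epsilon_k}(x)\geq 0$, this shows that $h$ is a.e. non-negative. Applied to $h=\chi f$, we obtain $\chi f\geq 0$ a.e. for every non-negative $\chi\in C_0^\infty(\Omega)$, and hence $f\geq 0$ a.e.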
In particular, if $f$ belongs to the kernel of $f\mapsto F_f$, then both $f$ and $-f$ are almost everywhere non-negative. So $f$ vanishes almost everywhere. □
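The $L^1$ estimate from the proof can be tested on the simplest step function. For $h=\mathbf{1}_{[0,1]}$ in one dimension one has $\int|h(x-y)-h(x)|\,dx=2|y|$ for $|y|\leq 1$, so the proof gives $\|\phi_\epsilon*h-h\|_{L^1}\leq 2\epsilon$. The sketch below (pure Python, standard mollifier, Riemann sums in place of integrals; all names are ours) computes the left-hand side for a few values of $\epsilon$.

```python
import math

def bump(t):
    # exp(-1/(1-t^2)) on (-1, 1), zero elsewhere.
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

NZ = 200
T = [-1.0 + (k + 0.5) * 2.0 / NZ for k in range(NZ)]
RAW = [bump(t) for t in T]
S = sum(RAW)
W = [r / S for r in RAW]   # discrete weights summing to 1 (normalised mollifier)

def h(x):
    # Characteristic function of [0, 1], an element of L^1(R).
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def mollified(eps, x):
    # (phi_eps * h)(x) = int phi_eps(z) h(x - z) dz with z = eps * t.
    return sum(w * h(x - eps * t) for w, t in zip(W, T))

def l1_error(eps, a=-1.0, b=2.0, n=1200):
    # Midpoint-rule approximation of ||phi_eps * h - h||_{L^1}.
    dx = (b - a) / n
    return dx * sum(abs(mollified(eps, a + (k + 0.5) * dx) - h(a + (k + 0.5) * dx))
                    for k in range(n))

EPS = (0.2, 0.1, 0.05)
errors = [l1_error(eps) for eps in EPS]
print(errors)
```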
Two definitions for functions carry over naturally to distributions. If $\Omega'\subseteq\Omega$ is open, then every test function on $\Omega'$ extends (by zero) to a test function on $\Omega$. In this way we can think of any distribution $F$ on $\Omega$ as a distribution $F|_{\Omega'}$ on $\Omega'$, which we call the restriction. For regular distributions this is really the restriction of functions. Using restriction we can give a definition of support. The complement of the support of a distribution is the union of all open sets on which the restriction vanishes. In symbols
$$\operatorname{supp}F=\Omega\setminus\bigcup\{\Omega'\subseteq\Omega\text{ open}:F|_{\Omega'}=0\}.$$
The support of the delta distribution is $\{0\}$, and the support of the distribution $F_f$ of a continuous function $f$ is its support in the usual sense.
We want to define as many operations on distributions as possible, in such a way that they extend operations on functions. Restriction and support are two examples where this is clear. The general strategy for making such definitions is to compare $F_{Tf}$ with $F_f$, where $T$ is the operation. If we can write the relation in a way that depends only on the distribution $F_f$ and not directly on the function $f$, then it is suitable for a generalised definition. Let us consider the case of multiplication by a smooth function $g\in C^\infty(\Omega)$. Then for a regular distribution
$$F_{gf}(\phi)=\int_\Omega g\,f\,\phi\,dx=F_f(g\phi).$$
The product of a distribution $F\in\mathcal{D}'(\Omega)$ with a function $g\in C^\infty(\Omega)$ is defined as
$$(gF)(\phi)=F(g\phi).$$
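This product makes the embedding $L^1_{\mathrm{loc}}(\Omega)\to\mathcal{D}'(\Omega)$, $f\mapsto F_f$, a homomorphism of modules over the algebra $C^\infty(\Omega)$. However, the product of a distribution with a non-smooth function $g$ is not defined in general, because then $g\phi$ is not a test function.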
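So we come to the most important operation on distributions. If $f\in C^1(\Omega)$, then by integration by parts (there are no boundary terms, since $\phi$ has compact support) we obtain
$$F_{\partial_i f}(\phi)=\int_\Omega\partial_i f\,\phi\,dx=-\int_\Omega f\,\partial_i\phi\,dx=-F_f(\partial_i\phi).$$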
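Consequently, for any distribution $F$ we define the partial derivatives as
$$(\partial_i F)(\phi)=-F(\partial_i\phi),\qquad\text{and more generally}\qquad(\partial^\alpha F)(\phi)=(-1)^{|\alpha|}F(\partial^\alpha\phi).$$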
Here we see the advantage of choosing smooth test functions: test functions can always be differentiated, and so distributions have infinitely many derivatives. The two operations we have just defined, multiplication by a smooth function and partial differentiation, produce new distributions. Clearly these new maps are linear. We should also check that they obey the continuity condition, but we will skip this formality.
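The definition $(\partial F)(\phi)=-F(\partial\phi)$ translates directly into code once distributions are modelled as functions acting on test functions. The minimal sketch below ($n=1$, midpoint-rule integrals, central differences for $\phi'$; all helper names are ours) checks two classical examples: the derivative of the Heaviside function $H$ is $\delta$, and the second derivative of $|x|$ is $2\delta$.

```python
import math

def bump(x):
    # exp(-1/(1-x^2)) on (-1, 1), zero elsewhere.
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(f, a, b, n=20000):
    # Composite midpoint rule for int_a^b f(x) dx.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def d_dx(phi, h=1e-4):
    # Central-difference approximation of phi' (accurate since phi is smooth).
    return lambda x: (phi(x + h) - phi(x - h)) / (2.0 * h)

def regular(f, a=-3.0, b=3.0):
    # F_f(phi) = int f phi dx, for test functions supported in [a, b].
    return lambda phi: integrate(lambda x: f(x) * phi(x), a, b)

def derivative(F):
    # Distributional derivative: (dF)(phi) = -F(phi').
    return lambda phi: -F(d_dx(phi))

H = regular(lambda x: 1.0 if x > 0 else 0.0)   # Heaviside function
A = regular(abs)                               # |x|
phi = lambda x: (1.0 + x) * bump(x / 2.0)      # hypothetical test function, support [-2, 2]

print(derivative(H)(phi), phi(0.0))                      # dH = delta
print(derivative(derivative(A))(phi), 2.0 * phi(0.0))    # d^2|x| = 2 delta
```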
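We also want to extend convolution to distributions. In order to extend it to a product between a test function $\psi\in C_0^\infty(\mathbb{R}^n)$ and a distribution, we calculate for $f\in L^1_{\mathrm{loc}}(\mathbb{R}^n)$:
$$F_{\psi*f}(\phi)=\int\!\!\int\psi(x-y)f(y)\,dy\,\phi(x)\,dx=\int f(y)\int\check\psi(y-x)\,\phi(x)\,dx\,dy=F_f(\check\psi*\phi),$$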
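where $\check\psi(x)=\psi(-x)$ denotes the point reflection. Therefore we define for $\psi\in C_0^\infty(\mathbb{R}^n)$ and $F\in\mathcal{D}'(\mathbb{R}^n)$
$$(\psi*F)(\phi)=F(\check\psi*\phi).$$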
Not only is this a well-defined distribution, the result of convolution is in fact always a regular distribution that corresponds to a smooth function!
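Lemma 2.16. The convolution $\psi*F$ of a test function $\psi\in C_0^\infty(\mathbb{R}^n)$ with a distribution $F\in\mathcal{D}'(\mathbb{R}^n)$ belongs to $C^\infty(\mathbb{R}^n)$. It is the function
$$(\psi*F)(x)=F(\tau_x\check\psi),$$
where $(\tau_x\phi)(y)=\phi(y-x)$ is the translation operator. The support of $\psi*F$ is contained in the pointwise sum $\operatorname{supp}\psi+\operatorname{supp}F$.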
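Proof. First we show that the function defined in the lemma exists and is smooth. The support of $\tau_x\check\psi$ is $x-\operatorname{supp}\psi$, which is compact. Hence for every $x\in\mathbb{R}^n$ the value $F(\tau_x\check\psi)$ is well defined, and for $x$ in a bounded set all these supports are contained in one compact set $K$. Since continuous functions are uniformly continuous on compact sets, the map $x\mapsto\tau_x\check\psi$ is continuous with respect to all seminorms $\|\cdot\|_{K,\alpha}$, so $\psi*F$ is continuous. Furthermore, the difference quotients $t^{-1}\bigl(\tau_{x+te_i}\check\psi-\tau_x\check\psi\bigr)$ converge in the limit $t\to 0$, uniformly on $K$ together with all their derivatives, to the function $y\mapsto(\partial_i\psi)(x-y)$. By linearity and continuity of $F$ this shows $\partial_i(\psi*F)=(\partial_i\psi)*F$, and by induction $\psi*F\in C^k(\mathbb{R}^n)$ for every $k$.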
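Next we show that this smooth function corresponds to the distribution we defined immediately before the lemma. For any $\phi\in C_0^\infty(\mathbb{R}^n)$, appropriate Riemann sums of the integral $\check\psi*\phi=\int\phi(x)\,\tau_x\check\psi\,dx$ define a sequence of finite linear combinations of the functions $\tau_x\check\psi$, which converges in $\mathcal{D}(\mathbb{R}^n)$ to $\check\psi*\phi$. Hence the linearity and continuity of $F$ give
$$F(\check\psi*\phi)=\int_{\mathbb{R}^n}\phi(x)\,F(\tau_x\check\psi)\,dx.$$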
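Finally, we consider the support. If $x\notin\operatorname{supp}\psi+\operatorname{supp}F$, then $x-y\notin\operatorname{supp}\psi$ for every element $y\in\operatorname{supp}F$. Hence $\operatorname{supp}(\tau_x\check\psi)=x-\operatorname{supp}\psi$ is disjoint from $\operatorname{supp}F$, and $(\psi*F)(x)=F(\tau_x\check\psi)=0$. Since $\operatorname{supp}\psi$ is compact and $\operatorname{supp}F$ is closed, their sum is closed, so $\operatorname{supp}(\psi*F)\subseteq\operatorname{supp}\psi+\operatorname{supp}F$. □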
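Lemma 2.16 gives a concrete recipe, $(\psi*F)(x)=F(\tau_x\check\psi)$, which the following minimal sketch implements for three distributions on $\mathbb{R}$: $\delta$; its derivative $\delta'$, with $\delta'(\phi)=-\phi'(0)$ from the definition of the distributional derivative and $\phi'(0)$ approximated by a central difference; and $F_H$ for the Heaviside function $H=\mathbf{1}_{(0,\infty)}$, with the integral truncated to $[0,10]$ (an assumption that is only valid for test functions vanishing beyond $10$). One expects $\psi*\delta=\psi$, $\psi*\delta'=\psi'$ and $(\psi*F_H)(x)=\int_{-\infty}^x\psi$, which vanishes for $x<-1$ when $\operatorname{supp}\psi\subseteq[-1,1]$, in line with the support statement.

```python
import math

def bump(x):
    # Test function psi with support [-1, 1].
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(f, a, b, n=4000):
    # Composite midpoint rule for int_a^b f(x) dx.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def reflect(psi):
    # Point reflection: psi_check(y) = psi(-y).
    return lambda y: psi(-y)

def translate(x, phi):
    # Translation: (tau_x phi)(y) = phi(y - x).
    return lambda y: phi(y - x)

def convolve(psi, F):
    # Lemma 2.16: psi * F is the function x -> F(tau_x psi_check).
    return lambda x: F(translate(x, reflect(psi)))

delta = lambda phi: phi(0.0)
delta_prime = lambda phi, h=1e-5: -(phi(h) - phi(-h)) / (2.0 * h)   # -phi'(0)
heaviside = lambda phi: integrate(phi, 0.0, 10.0)   # F_H, assuming phi vanishes beyond 10

psi = bump
print(convolve(psi, delta)(0.3), convolve(psi, delta_prime)(0.3), convolve(psi, heaviside)(0.5))
```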
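This lemma implies that even the convolution of a distribution $F\in\mathcal{D}'(\mathbb{R}^n)$ with a distribution $G$ with compact support is a well-defined distribution:
$$(G*F)(\phi)=F(\phi*\check G),\qquad\text{where }\check G(\phi)=G(\check\phi).$$
Indeed, by Lemma 2.16 the function $\phi*\check G$ is smooth with support in $\operatorname{supp}\phi-\operatorname{supp}G$, which is compact, so it is a test function.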
In particular, we can convolve any distribution with the $\delta$-distribution. Remarkably, this returns the same distribution, i.e. $\delta*F=F$ (Exercise). We say that $\delta$ is the identity element or neutral element of convolution.
Further details of the theory of distributions can be found in the short and lucid first chapter of the book by Lars Hörmander: “Linear Partial Differential Operators”.