Introduction to multivariable calculus
This is a guide to multivariable calculus from its fundamentals. We will cover differentiation of multivariable functions, the gradient, divergence, and curl operators, as well as integration in multiple dimensions, on a beginner's level. It is highly recommended to look over this guide before (or at the same time as) learning any advanced topics in math and physics.
These notes are shared with the permission of Dr. Elizabeth Brown of Rensselaer Polytechnic Institute, to whom much appreciation is given.
Multivariable functions
A multivariable function is a function of two or more variables. For instance, a function $f(x, y)$ of two variables and a function $g(x, y, z)$ of three variables can both be considered multivariable functions. Meanwhile, a function $h(x)$ of just one variable is not a multivariable function.
The set of points on which a function of $n$ variables is defined is its domain, which contains the valid inputs to the function. The set of values that can be obtained by evaluating the function is its range, which contains the valid outputs of the function.
As an example, consider the function $f(x, y) = \sqrt{4 - x^2 - y^2}$, which can be rewritten $f(x, y) = \sqrt{4 - (x^2 + y^2)}$. Such a function is not defined for $4 - (x^2 + y^2) < 0$, which can also be written $x^2 + y^2 > 4$. Therefore its domain is $D = \{(x, y) : x^2 + y^2 \leq 4\}$. Visually speaking, this is all points in the XY plane inside or on a circle of radius 2, because $\sqrt{4} = 2$. Meanwhile, due to its restricted domain, $f$ can only output values between 2 (for $x = y = 0$) and 0 (for $x^2 + y^2 = 4$). Therefore the range is $[0, 2]$.
We may plot a function of two variables in 3D space, such that $z = f(x, y)$. From doing so, we create a surface. This is usually done by computer, as shown below:

(Figure: surface plot of $z = f(x, y)$, created with Math3D.)
Instead of a surface plot, we may alternatively make various 2D representations of functions of 2 variables. To do so, we first find the horizontal traces (also called isolines), which are all curves in which $f(x, y) = c$ where $c$ is a constant. This results in an implicit equation that can be solved to yield the equation of a curve in one variable. Doing this for equally spaced values of $c$ results in a contour map.
We can also plot values of $f$ for constant $x$ or constant $y$ to find vertical traces, curves of constant value in one plane. So as a summary:
- Horizontal traces are curves of constant height, found by solving $f(x, y) = c$ where $c$ is a specified constant
- Vertical traces are curves of constant $x$ or constant $y$, found by solving $z = f(a, y)$ or $z = f(x, b)$
- Level curves result from finding multiple horizontal traces for equally-spaced values of $c$
- Contour maps (occasionally called isoline plots) are the result of level curves drawn on the XY plane (a plotting sketch follows this list)
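As an illustration, here is a minimal Python sketch (using numpy and matplotlib, with the example function from earlier as the assumed surface) that draws level curves for equally spaced values of $c$ to produce a contour map:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample the function on a grid of (x, y) points
x = np.linspace(-3, 3, 200)
y = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, y)
# Example function from earlier; clamp at zero to stay inside the domain
Z = np.sqrt(np.maximum(4 - X**2 - Y**2, 0))

# Draw level curves f(x, y) = c for equally spaced values of c
levels = np.linspace(0, 2, 9)
cs = plt.contour(X, Y, Z, levels=levels)
plt.clabel(cs, inline=True, fontsize=8)  # label each isoline with its c value
plt.gca().set_aspect("equal")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Contour map of f(x, y) = sqrt(4 - x^2 - y^2)")
plt.show()
```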
What about functions of more than two variables? With few exceptions, functions of three variables cannot be directly plotted in 3D space because they would require 4D space to plot. However, we can plot level surfaces (also called isosurfaces), which are partially-transparent surfaces of constant value $f(x, y, z) = c$, just like level curves.
Functions of more than three variables cannot be plotted at all through conventional means, but specialized data visualization methods can be used to visualize them. However, even if visualization is not possible, the methods of calculus can still be used to analyze them.
Partial derivatives
Partial derivatives are the equivalent of ordinary derivatives, but for multivariable functions. They are found by treating all variables, other than the one being differentiated with respect to, as constant. Formally, they are defined using limits just as ordinary derivatives are, and notated with the script $\partial$ rather than the normal $d$. For a function $f(x, y)$, the respective definitions of the partial derivatives with respect to $x$ and $y$ are:

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}, \qquad \frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}$$
The geometric interpretation is that $\partial f / \partial x$ is the derivative with respect to $x$ of the trace curve in the XZ plane, and $\partial f / \partial y$ is the derivative with respect to $y$ of the trace curve in the YZ plane.
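To make the limit definition concrete, here is a minimal numerical sketch that approximates both partial derivatives by a difference quotient with a small but finite $h$; the example function and the step size are choices made for this sketch:

```python
import numpy as np

def f(x, y):
    return np.sqrt(4 - x**2 - y**2)  # example function from earlier

def partial_x(f, x, y, h=1e-6):
    # Difference quotient in x only: y is held fixed, mirroring the limit definition
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    # Difference quotient in y only: x is held fixed
    return (f(x, y + h) - f(x, y)) / h

# At (1, 1) the exact values are df/dx = -x / sqrt(4 - x^2 - y^2) = -1/sqrt(2),
# and by symmetry the same for df/dy
print(partial_x(f, 1.0, 1.0))  # approximately -0.7071
print(partial_y(f, 1.0, 1.0))  # approximately -0.7071
```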
Partial derivative notation
The Leibniz-style notation $\frac{\partial f}{\partial x}$ can alternatively be written $\partial f / \partial x$, and second derivatives are also possible, such as $\frac{\partial^2 f}{\partial x^2}$ for the second partial derivative with respect to $x$. We may also consider cases in which we take the partial derivative with respect to one variable, then another, and in such cases we read from right to left. As an example, $\frac{\partial^2 f}{\partial y\, \partial x}$ means to take the partial derivative with respect to $x$ first, and then with respect to $y$ after that. A way to remember this is that you're applying nested derivative operators, that is, $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$, and therefore if the brackets are removed, the differentiated variables go from right to left.
There is an alternate notation for partial derivatives, known as the prime notation. In this notation, $f_x$ denotes the partial derivative with respect to $x$, and analogously $f_y$ denotes the partial derivative with respect to $y$. Unlike Leibniz notation, prime partial derivative notation reads from left to right. That is, $f_{xy}$ means to take the partial derivative with respect to $x$ first and then with respect to $y$.
In addition, especially in advanced physics, partial derivatives can be denoted by the shorthand $\partial_x f$ and $\partial_y f$. Then we may write $\partial_x^2 f$ for the second partial derivative with respect to the variable $x$, or $\partial_x^n f$ for the nth partial derivative with respect to the variable $x$. This is not as common in mathematics and generally will not be used here.
There are situations in which one notation is preferred over the other for convenience. The choice ultimately does not matter as they do not change the underlying mathematics.
A worked example of taking partial derivatives
Consider the function $f(x, y) = y e^x + y^2$. We wish to compute the (first) partial derivatives with respect to $x$ and $y$. We may use the sum/difference rule in differentiation to first split the derivative term by term:

$$\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\left(y e^x\right) + \frac{\partial}{\partial x}\left(y^2\right)$$
Then we may begin to take the partial derivatives with respect to each variable. For the term $y e^x$, we treat $y$ as if it were a constant, so we can factor it out via the constant coefficient rule, resulting in $y \frac{\partial}{\partial x} e^x$. We may then differentiate as usual, resulting in $y e^x$ (as the derivative of $e^x$ is itself). For the term $y^2$, we notice that it does not depend on $x$, therefore its partial derivative with respect to $x$ is zero. Following the same technique for the partial derivative with respect to $y$, the solutions are given by:

$$\frac{\partial f}{\partial x} = y e^x, \qquad \frac{\partial f}{\partial y} = e^x + 2y$$
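A computer algebra system can verify a computation like this. The sketch below uses Python's sympy with the example function above (itself a reconstruction, so treat it as illustrative); sympy's `diff` treats all other symbols as constants, exactly like a partial derivative:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = y * sp.exp(x) + y**2  # the example function from the worked problem

# Differentiating with respect to one symbol holds the others constant
print(sp.diff(f, x))  # y*exp(x)
print(sp.diff(f, y))  # exp(x) + 2*y
```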
The gradient operator
In many cases, we wish to find a vector containing all the partial derivatives of a function. The operator that does this is called the gradient operator, written $\nabla$. Applying the gradient operator to a function $f(x, y)$ gives:

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$
The gradient vector $\nabla f$ points in the direction of steepest ascent of $f$.
The product rule for partial derivatives
The product rule holds true for partial derivatives with minimal modifications. For a product of two functions $f(x, y)$ and $g(x, y)$, the product rule takes the form:

$$\frac{\partial}{\partial x}(f g) = \frac{\partial f}{\partial x}\, g + f\, \frac{\partial g}{\partial x}$$
The chain rule for partial derivatives
The chain rule for taking derivatives of composite functions also applies to partial derivatives. However, an important thing to remember is that you must sum the contributions of all intermediary variables. For instance, consider the function $f(u, v)$ where $u$ and $v$ are functions of $x$ and $y$, that is, $u = u(x, y)$ and $v = v(x, y)$. Then the partial derivatives of $f$ with respect to $x$ and $y$ are respectively given by:

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}, \qquad \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}$$
Notice that for both partial derivatives we must include all the "middle variables" ($u$ and $v$ in each case) that $f$ depends on.
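As a quick sanity check, the following sympy sketch (with hypothetical choices of $u$, $v$, and $f$ made for this example) confirms that differentiating the composed expression directly agrees with the chain-rule sum over the middle variables:

```python
import sympy as sp

x, y = sp.symbols("x y")
u = x + y            # hypothetical intermediate variable u(x, y)
v = x * y            # hypothetical intermediate variable v(x, y)
f = u**2 + sp.sin(v) # f(u, v) composed with u and v

# Direct differentiation of the composed expression...
direct = sp.diff(f, x)

# ...matches the chain-rule sum over the middle variables u and v
u_s, v_s = sp.symbols("u v")
f_uv = u_s**2 + sp.sin(v_s)
chain = sp.diff(f_uv, u_s) * sp.diff(u, x) + sp.diff(f_uv, v_s) * sp.diff(v, x)
chain = chain.subs({u_s: u, v_s: v})

print(sp.simplify(direct - chain))  # 0: the two results agree
```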
The chain rule for paths
Consider a path $\mathbf{r}(t) = \langle x(t), y(t) \rangle$. A function that depends on the path, that is, $f(\mathbf{r}(t))$, may be differentiated with respect to $t$ via $\frac{df}{dt} = \nabla f \cdot \mathbf{r}'(t)$, as the expansion becomes:

$$\frac{d}{dt} f(\mathbf{r}(t)) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = \nabla f \cdot \mathbf{r}'(t)$$
Clairaut's Theorem
A very useful tip is that taking partial derivatives is commutative for any function with continuous second derivatives: the order in which you take mixed partial derivatives does not matter:

$$\frac{\partial^2 f}{\partial x\, \partial y} = \frac{\partial^2 f}{\partial y\, \partial x}$$
In prime notation (remember that prime notation goes left to right, unlike Leibniz notation which goes from right to left) we can write the same fact as $f_{xy} = f_{yx}$. In addition, this holds true in the general case for any number of mixed partial derivatives. Sometimes it is easier to differentiate with respect to one variable than another, so this rule can become very useful.
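The following short sympy sketch illustrates Clairaut's theorem on an arbitrary smooth test function (chosen only for this sketch) by comparing the two orders of differentiation:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**3 * y**2 + sp.exp(x * y)  # an arbitrary smooth test function

fxy = sp.diff(f, x, y)  # differentiate with respect to x, then y
fyx = sp.diff(f, y, x)  # differentiate with respect to y, then x

print(sp.simplify(fxy - fyx))  # 0: the mixed partials agree
```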
Applications of partial derivatives
Linear approximations
A complicated function $f(x, y)$ may be approximated in the neighborhood of a point $(a, b)$ by a tangent plane, just like a function of one variable can be approximated in the neighborhood of a point by a tangent line. To compute the tangent plane, we first define two tangent vectors along the trace curves as follows:

$$\mathbf{u} = \langle 1, 0, f_x(a, b) \rangle, \qquad \mathbf{v} = \langle 0, 1, f_y(a, b) \rangle$$
The normal vector becomes $\mathbf{n} = \mathbf{u} \times \mathbf{v}$, which can be found by calculating the below determinant:

$$\mathbf{n} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & f_x(a, b) \\ 0 & 1 & f_y(a, b) \end{vmatrix}$$
Therefore, the normal vector is given by:

$$\mathbf{n} = \langle -f_x(a, b),\; -f_y(a, b),\; 1 \rangle$$
And the equation of the tangent plane is given by $z = L(x, y)$, where $L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)$, assuming that $f$ is smooth (and thus continuous). We may also write the equation of the tangent plane as an implicit equation as $f_x(a, b)(x - a) + f_y(a, b)(y - b) - \big(z - f(a, b)\big) = 0$.
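Here is a small sympy sketch of the tangent-plane construction, using the example function from earlier and a hypothetical point of tangency $(a, b) = (1, 1)$:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.sqrt(4 - x**2 - y**2)  # example function from earlier
a, b = 1, 1                   # hypothetical point of tangency

fx = sp.diff(f, x).subs({x: a, y: b})  # f_x(a, b)
fy = sp.diff(f, y).subs({x: a, y: b})  # f_y(a, b)
f0 = f.subs({x: a, y: b})              # f(a, b)

# z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)
tangent_plane = f0 + fx * (x - a) + fy * (y - b)
print(sp.simplify(tangent_plane))
```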
When do we actually use the linear approximation? We use it in many applications in physics where we only consider local variations about a point of a function of several variables. For instance, such an approximation may be used to calculate the value of the pressure near a point from a known pressure at that point, given the pressure as a function of position. In addition, linearity and linearizing functions are very important for solving differential equations in mathematics and physics, where a linear approximation makes the problem tractable so that an analytical (exact) solution may be found within a local region.
More on linear approximations
The total change in a function from $(a, b)$ to $(a + \Delta x,\, b + \Delta y)$ is given by the linear approximation shown below:

$$\Delta f \approx f_x(a, b)\, \Delta x + f_y(a, b)\, \Delta y$$
The infinitesimal version becomes:

$$df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy$$
For a point near a known value of a function, at which the derivatives of the function are also known, this can help us evaluate the function. This is very useful for evaluating square roots (for instance, evaluating $\sqrt{4.1}$ by expanding about $x = 4$, as we know the value of $\sqrt{4}$), trigonometric functions, and other transcendentals.
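As a worked sketch of this idea, the code below (with a hypothetical function and base point chosen for illustration) uses the linear approximation to estimate a nearby value and compares it against the exact result:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.sqrt(x**2 + y**2)  # hypothetical function: distance from the origin
a, b = 3, 4               # base point where f(3, 4) = 5 exactly

fx = sp.diff(f, x).subs({x: a, y: b})  # f_x(3, 4) = 3/5
fy = sp.diff(f, y).subs({x: a, y: b})  # f_y(3, 4) = 4/5

# Estimate f(3.02, 3.96) using df = f_x dx + f_y dy
dx, dy = 0.02, -0.04
estimate = 5 + fx * dx + fy * dy
print(float(estimate))                    # 4.98
print(float(f.subs({x: 3.02, y: 3.96})))  # approximately 4.9802 (exact value)
```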
Surfaces formed by a function of two variables
We may write a function of two variables as a surface with the explicit equation $z = f(x, y)$, or the implicit equation $F(x, y, z) = z - f(x, y) = 0$. The surface is distinct from the function; it is the shape in 3D space produced by the function.
We must distinguish carefully between the gradient of a function and the gradient of the surface formed by a function. The gradient of a function takes the form:

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$$
As such, it is a 2D vector (only $x$ and $y$ components, with no $z$-component) that points in the direction of the greatest increase of $f$.
By contrast, the gradient of the surface formed by a function, defined by $F(x, y, z) = z - f(x, y)$, has the different form:

$$\nabla F = \left\langle -\frac{\partial f}{\partial x},\; -\frac{\partial f}{\partial y},\; 1 \right\rangle$$
The gradient of the surface is a 3D vector (it has all three of the $x$, $y$, and $z$ components) and is always normal to the tangent plane at a point on the surface.
Optimization in multiple variables
We would like to be able to optimize multivariable functions, just like we can optimize functions of a single variable. To do so, we can use the generalization of Fermat's theorem. But let's first review its simpler case, for functions of one variable; it is given as follows:
Fermat's theorem in one variable: If a function $f(x)$ has a local extremum (maximum or minimum) at $x = c$, then $f'(c) = 0$ or $f'(c)$ does not exist. We call $c$ a critical point of $f$.
Be careful! Note that the converse is not true: $f'(c) = 0$ or $f'(c)$ failing to exist does not necessarily mean that there is a local extremum at $c$.
In similar fashion to single-variable calculus, we can define what a local minimum and local maximum mean for a function of several variables. The formal definitions are as follows, although they are not necessary for actually finding the local maxima and minima:
Definition: A local minimum of a function of several variables is a point $P$ such that $f(\mathbf{x}) \geq f(P)$ for all $\mathbf{x}$ in a region $R \subseteq D$, where $D$ is the domain of the function, and $\mathbf{x} = (x_1, x_2, \dots, x_n)$ is a shorthand notation for the function's arguments.
Definition: A local maximum of $f$ is a point $P$ such that $f(\mathbf{x}) \leq f(P)$ for all $\mathbf{x}$ in a region $R$. Global minima and maxima have almost exactly the same definition, other than the fact that $R$ must extend to the bounds of the domain.
Intuitively, if we draw out a multivariable function as a surface, then the minima look just like valleys, and the maxima look just like hilltops. To find the maxima and minima (and saddle points), we can use the generalization of Fermat's theorem:
Fermat's theorem in several variables: If a function of several variables has a local extremum at $P$, then $\nabla f(P) = \mathbf{0}$ or $\nabla f(P)$ does not exist. Thus we say $P$ is a critical point.
But simply finding the critical point(s) is insufficient for determining local extrema. We must either test all points within a radial region (e.g. open disk for a function of two variables) around the point (just like the first derivative test) or we must apply the second derivative test.
The second derivative test determines whether a critical point is a local minimum, local maximum, or saddle point. To state the second derivative test, we must first define a function $D(a, b)$ that is given by the following:

$$D(a, b) = f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2$$
Here $D$ is called the discriminant; it arises from the second-order (quadratic) approximation of the function centered at $(a, b)$, and is the determinant of the Hessian, a matrix made of all the second partial derivatives of a function. The conditions are as follows:
- If $D(a, b) > 0$, then $(a, b)$ is either a local maximum or minimum
- If, in addition, $f_{xx}(a, b) > 0$ (the second partial derivative with respect to $x$ evaluated at $(a, b)$ is greater than zero), then the function is concave up at $(a, b)$ and thus $(a, b)$ is a local minimum
- If, in addition, $f_{xx}(a, b) < 0$ (the second partial derivative with respect to $x$ evaluated at $(a, b)$ is less than zero), then the function is concave down at $(a, b)$ and thus $(a, b)$ is a local maximum
- If $D(a, b) < 0$, then $(a, b)$ is a saddle point (inflection point)
- If $D(a, b) = 0$, then the test is inconclusive and another test must be used instead (a code sketch of the full test follows this list)
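A minimal sympy implementation of the full procedure (find the critical points, compute the discriminant, and classify each point) is sketched below; the test function is a hypothetical choice:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**3 - 3*x + y**2  # hypothetical test function

# Critical points: solve grad f = 0
crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)

# Discriminant D = f_xx * f_yy - f_xy^2 (determinant of the Hessian)
D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2

for pt in crit:
    d = D.subs(pt)
    fxx = sp.diff(f, x, 2).subs(pt)
    if d > 0:
        kind = "local minimum" if fxx > 0 else "local maximum"
    elif d < 0:
        kind = "saddle point"
    else:
        kind = "inconclusive"
    print(pt, kind)
# Expected output: (-1, 0) is a saddle point, (1, 0) is a local minimum
```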
Multivariable extreme value theorem
For functions of several variables, the extreme value theorem from single-variable calculus takes a more general form:
Extreme value theorem in several variables: Given a continuous function $f$ on a closed and bounded domain $D$, with absolute maximum located at $P_{\max}$ and absolute minimum located at $P_{\min}$ on $D$, the extreme values of $f$ are $f(P_{\max})$ and $f(P_{\min})$. That is to say, there are three conditions that must be satisfied to guarantee the existence of at least one minimum and one maximum: continuity, boundedness, and a closed domain. If any of the three are not satisfied, the extreme value theorem does not hold.
To locate the absolute extrema guaranteed by the extreme value theorem, it is necessary to find all critical points of $f$ and evaluate $f$ at each of them; the critical point that yields the largest value of $f$ is a candidate for the absolute maximum, and the critical point that yields the smallest value of $f$ is a candidate for the absolute minimum. Remember that the absolute maximum and absolute minimum need not occur at a critical point; they may instead occur at boundary points. Therefore, comparing against the values of $f$ on the boundary is essential to finding the true absolute maximum and minimum.
Optimization via Lagrange multipliers
In the restricted case where a constraint is applied to a multivariable function, we may use an alternative method, called the method of Lagrange multipliers, for optimization. This may be, for instance, a multivariable function $f(x, y)$ subject to the following condition:

$$g(x, y) = 0$$
A feature of this system is that we may treat the constraint curve $g(x, y) = 0$ as a level curve of $g$. The gradient of $f$ evaluated at a point always follows the direction of maximum increase of $f$, and is normal to the level curves of $f$; likewise, the gradient of $g$ is by definition perpendicular to the constraint curve. At a constrained extremum, the level curve of $f$ runs tangent to the constraint curve, meaning the gradient of $f$ is parallel to the gradient of $g$, and thus the two gradients must be scalar multiples of each other. Thus we have the Lagrange condition:

$$\nabla f = \lambda \nabla g$$
By expanding the gradients in the case of a function of two variables (this may be generalized to functions of any number of variables), we have a series of three simultaneous equations that we may solve to find the critical points:

$$f_x = \lambda g_x, \qquad f_y = \lambda g_y, \qquad g(x, y) = 0$$
All solution points that satisfy the Lagrange system of equations are critical points. We may then compare all critical points to obtain the absolute minimum and maximum.
As an example, let us find the absolute minimum and maximum of $f(x, y) = x^2 + 2y^2$ on the unit circle, that is, where $x^2 + y^2 = 1$, following the equation of a circle placed in the form $g(x, y) = x^2 + y^2 - 1 = 0$. To do so, we find that the partial derivatives are $f_x = 2x$, $f_y = 4y$, $g_x = 2x$, and $g_y = 2y$. After substitution into the Lagrange multiplier equations previously stated, we find:

$$2x = 2\lambda x, \qquad 4y = 2\lambda y, \qquad x^2 + y^2 = 1$$
Thus from the first equation we have $x = 0$ or $\lambda = 1$, and from the second we have $y = 0$ or $\lambda = 2$. We then substitute $x = 0$ into the constraint equation, then substitute $y = 0$ into the constraint equation, to get all solutions. In this way, we find that $x = 0, y = \pm 1$ or $y = 0, x = \pm 1$. Thus, we have the four critical points $(\pm 1, 0)$ and $(0, \pm 1)$.
We must now evaluate $f$ at the critical points. Doing so, we have $f(\pm 1, 0) = 1$ and $f(0, \pm 1) = 2$. Therefore, the maximum value of $f$ given the constraint is $2$, and the minimum value of $f$ given the constraint is $1$.
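We can verify this example by solving the Lagrange system directly with sympy; the code below assumes the reconstructed example function $f(x, y) = x^2 + 2y^2$:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 + 2*y**2    # function from the example (as reconstructed)
g = x**2 + y**2 - 1  # unit-circle constraint g(x, y) = 0

# Lagrange system: f_x = lam*g_x, f_y = lam*g_y, g = 0
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 0)]
solutions = sp.solve(eqs, [x, y, lam], dict=True)

for s in solutions:
    print((s[x], s[y]), "f =", f.subs(s))
# Critical points (+-1, 0) give f = 1 (minimum); (0, +-1) give f = 2 (maximum)
```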
This demonstrative example is the simplest case; there may be more than one constraint, and a function of more than two variables. Thus the general Lagrange multiplier equations, for a function $f(x_1, \dots, x_n)$ subject to constraints $g_1 = 0, \dots, g_k = 0$, become:

$$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2 + \dots + \lambda_k \nabla g_k$$
Multiple integration
In single-variable calculus, we integrate to find the sum of a continuous quantity, such as the area under a curve, the displacement from a velocity-time graph, or the surface area of a surface of rotation. In multivariable calculus, we generalize this concept to multiple dimensions, where we can define several types of multivariate integrals.
Double integrals
A double integral, the first type of multivariate integral, is used in any situation where one needs to perform integration over an area. For instance, one can find the volume under the surface $z = f(x, y)$ by integrating the function over the domain $D$:

$$V = \iint_D f(x, y)\, dA$$
Or the volume between two surfaces $z = f(x, y)$ and $z = g(x, y)$, where $f(x, y) \geq g(x, y)$ for all $(x, y) \in D$, as:

$$V = \iint_D \big( f(x, y) - g(x, y) \big)\, dA$$
One may also find the area of an irregular region $D$ by performing the double integral shown:

$$A = \iint_D dA$$
To evaluate a double integral, we must first check that the domain is closed (i.e. has a continuous boundary) and bounded (i.e. not infinite). Then we describe the domain as either a vertically-simple or a horizontally-simple region. A vertically-simple region is a region between constant vertical bounds (bounds at constant x-values), of the form $a \leq x \leq b$, $g_1(x) \leq y \leq g_2(x)$. For instance, the region bounded by $1 \leq x \leq 2$, $0 \leq y \leq x^2$ is a vertically-simple region, because its vertical boundaries $x = 1$ and $x = 2$ are at constant x-values. However, a region whose left or right boundary is a curve rather than a vertical line is not vertically-simple. It may be helpful to visualize a vertically-simple region as a region between two "walls" on the grid.
Meanwhile, a horizontally-simple region is a region between constant horizontal bounds (bounds at constant y-values), of the form $c \leq y \leq d$, $h_1(y) \leq x \leq h_2(y)$. For instance, the region bounded by $0 \leq y \leq 1$, $y^2 \leq x \leq y$ is a horizontally-simple region, because its horizontal boundaries $y = 0$ and $y = 1$ are constant; a region whose top or bottom boundary is not at a constant y-value is, however, not horizontally-simple.
In the special case that the region is both vertically and horizontally simple (a rectangle), the region is often notated $D = [a, b] \times [c, d]$. This means that the integral must be performed with bounds in $x$ of $[a, b]$ and bounds in $y$ of $[c, d]$.
Integration is usually done after a region is checked to be either vertically-simple or horizontally-simple, as otherwise the integral cannot be computed directly. For non-simple regions, it is necessary to divide the region into several vertically-simple and horizontally-simple subregions, after finding the intersection points, and then integrate each piece.
Fubini's theorem
After preparing and analyzing the domain, we may use Fubini's theorem to solve a double integral. Fubini's theorem says that a double integral over the rectangular region $D = [a, b] \times [c, d]$ can be evaluated in either of the below ways:

$$\iint_D f(x, y)\, dA = \int_a^b \int_c^d f(x, y)\, dy\, dx = \int_c^d \int_a^b f(x, y)\, dx\, dy$$
Notice how the bounds "sandwich" the integrand inside, and how we can change the order of integration by switching the bounds. Evaluating the resulting iterated integral then becomes a matter of taking the partial integral (integrating while treating all other variables as constant) multiple times until the final answer is reached.
It is also possible for double integrals to have variable bounds. However, note that the bounds and the integration variable must be different. For instance, these two forms are valid:

$$\int_a^b \int_{g_1(x)}^{g_2(x)} f(x, y)\, dy\, dx, \qquad \int_c^d \int_{h_1(y)}^{h_2(y)} f(x, y)\, dx\, dy$$
For the left integral, the variable bounds are in $x$ and the inner integral is taken with respect to $y$; for the right integral, the variable bounds are in $y$ and the inner integral is taken with respect to $x$. Thus both are perfectly valid. However, the below integral would be invalid:

$$\int_a^b \int_{g_1(y)}^{g_2(y)} f(x, y)\, dy\, dx$$
This is not possible because an integral cannot be taken over bounds in the same variable as the integration variable. Therefore, the order of integration and the bounds must be carefully chosen to ensure that the integral makes sense. A careful choice of the bounds, which may include switching the order of integration, may make a very complex or even impossible integral tractable.
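To see Fubini's theorem with variable bounds in action, here is a small sympy sketch that integrates a sample integrand over the same triangular region, described first as vertically-simple and then as horizontally-simple; both the region and the integrand are hypothetical choices:

```python
import sympy as sp

x, y = sp.symbols("x y")

# Triangular region described as vertically simple: 0 <= y <= x for 0 <= x <= 1
inner = sp.integrate(x * y, (y, 0, x))  # partial integral in y, x held constant
result = sp.integrate(inner, (x, 0, 1))
print(result)   # 1/8

# Same region described as horizontally simple: y <= x <= 1 for 0 <= y <= 1
result2 = sp.integrate(sp.integrate(x * y, (x, y, 1)), (y, 0, 1))
print(result2)  # 1/8, as Fubini's theorem guarantees
```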
Double integrals in polar coordinates
We often want to compute a double integral in polar coordinates when we spot an integral over a function that exhibits radial symmetry - for instance, $f(x, y) = e^{-(x^2 + y^2)}$, which is equivalent to $e^{-r^2}$. In this case, we may use the polar coordinate transformation $x = r\cos\theta$, $y = r\sin\theta$, $dA = r\, dr\, d\theta$ to evaluate the double integral:

$$\iint_D f(x, y)\, dA = \int_{\theta_1}^{\theta_2} \int_{r_1}^{r_2} f(r\cos\theta,\, r\sin\theta)\, r\, dr\, d\theta$$
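As a brief sketch of the payoff, the sympy snippet below integrates the radially symmetric function $e^{-(x^2 + y^2)}$ over the unit disk by working directly in polar coordinates (the region is a choice made for this example):

```python
import sympy as sp

r, theta = sp.symbols("r theta", nonnegative=True)

# Integrand exp(-(x^2 + y^2)) becomes exp(-r^2) in polar coordinates,
# and the area element dA becomes r dr dtheta
integrand = sp.exp(-r**2) * r
result = sp.integrate(integrand, (r, 0, 1), (theta, 0, 2 * sp.pi))
print(result)  # pi - pi*exp(-1), i.e. pi*(1 - 1/e)
```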
Triple integrals
We may extend the same concept of a double integral to integrating over a 3D region (volume) in space for a function of three variables. This gives us a triple integral. The triple integral can be used to find the volume of a 3D region $E$:

$$V = \iiint_E dV$$
It can also be used to find the total of some quantity $f(x, y, z)$ distributed throughout space, for which we write:

$$\iiint_E f(x, y, z)\, dV$$
A triple integral is evaluated in much the same way as a double integral, using Fubini's theorem. That is, for a triple integral defined over the box $E = [a, b] \times [c, d] \times [p, q]$ we have:

$$\iiint_E f\, dV = \int_a^b \int_c^d \int_p^q f(x, y, z)\, dz\, dy\, dx$$
Or any of the different orderings possible by switching the order of integration. A triple integral may sometimes be simplified if we know that the bounds in $z$ are $z_1(x, y)$ and $z_2(x, y)$, where $z_1(x, y) \leq z \leq z_2(x, y)$. In this case, we say that the domain is a z-simple region and we may rewrite the triple integral in terms of a double integral as follows:

$$\iiint_E f\, dV = \iint_D \left[ \int_{z_1(x, y)}^{z_2(x, y)} f(x, y, z)\, dz \right] dA$$
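Here is a short sympy sketch of a z-simple triple integral; the region (the unit tetrahedron) is a hypothetical choice that makes the answer easy to check:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")

# Volume under the plane z = 1 - x - y over the triangle 0 <= y <= 1 - x, 0 <= x <= 1
# (a z-simple region with z_1 = 0 and z_2 = 1 - x - y)
V = sp.integrate(1, (z, 0, 1 - x - y), (y, 0, 1 - x), (x, 0, 1))
print(V)  # 1/6, the volume of the unit tetrahedron
```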
Triple integrals in cylindrical and spherical coordinates
We may define alternate coordinate systems in addition to Cartesian coordinates. For instance, we can use spherical coordinates $(r, \theta, \phi)$, where $r$ is the radial distance, $\theta$ is the polar angle measuring rotation up and down relative to the XY plane, and $\phi$ is the azimuthal angle measuring the rotation around the XY plane. We often also refer to the XY plane as the equatorial plane. A diagram of the spherical coordinate system is shown as follows:
Credit: Wikipedia
Note: this is the physics convention. Mathematicians often use the alternate convention $(\rho, \theta, \phi)$ rather than $(r, \theta, \phi)$, in which $\theta$ and $\phi$ are swapped. Additionally, the letter $\phi$ is often written as $\varphi$ (they are both the same Greek letter, just rendered in different styles).
The coordinate transformations from $(r, \theta, \phi)$ to $(x, y, z)$ are as follows:

$$x = r \sin\theta \cos\phi, \qquad y = r \sin\theta \sin\phi, \qquad z = r \cos\theta$$
A volume element in spherical coordinates (using the physics convention) would be an infinitesimal cube with side lengths $dr$, $r\, d\theta$, and $r \sin\theta\, d\phi$. So the volume element is $dV = r^2 \sin\theta\, dr\, d\theta\, d\phi$.
We may also use a cylindrical coordinate system, where we use the coordinates $(r, \theta, z)$. In this case, the coordinate transforms would be:

$$x = r \cos\theta, \qquad y = r \sin\theta, \qquad z = z$$
A volume element in cylindrical coordinates - which can be thought of as an infinitesimal volume cube, again - would then have the side lengths $dr$, $r\, d\theta$, and $dz$. So the volume element is $dV = r\, dr\, d\theta\, dz$.
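As a check on the spherical volume element, the sympy sketch below recovers the familiar volume of a sphere of radius $R$:

```python
import sympy as sp

r, theta, phi = sp.symbols("r theta phi", nonnegative=True)
R = sp.symbols("R", positive=True)  # sphere radius

# Volume of a sphere via the spherical volume element dV = r^2 sin(theta) dr dtheta dphi
V = sp.integrate(r**2 * sp.sin(theta),
                 (r, 0, R), (theta, 0, sp.pi), (phi, 0, 2 * sp.pi))
print(V)  # 4*pi*R**3/3
```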
Vector multivariable calculus
Recall that the gradient operator is the vector of derivative operators on a multivariable scalar function:

$$\nabla = \left\langle \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right\rangle$$
We can now define a vector multivariable function $\mathbf{F}(x, y, z) = \langle F_x, F_y, F_z \rangle$. This is often known as a vector field. A vector field is a quantity that extends across all space and returns a vector for every point in space. We may now define several differential operators on vector fields.
First, we have the divergence, given by:

$$\operatorname{div} \mathbf{F} = \nabla \cdot \mathbf{F}$$
The notation is suggestive of the actual definition of the divergence, the dot product of the gradient operator with the vector field:

$$\nabla \cdot \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}$$
The divergence can be interpreted as a measure of the spread of a vector field. A positive divergence means that the vector field is spreading outwards; a negative divergence means that the vector field is contracting inwards. A zero divergence means that the vector field is neither spreading nor contracting (or that the amounts of spreading and contracting cancel each other out).
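A short sympy sketch, using a hypothetical radially expanding field, shows a positive divergence:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")

# A hypothetical radially expanding field F = <x, y, z>
F = [x, y, z]
div_F = sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)
print(div_F)  # 3: positive everywhere, so the field spreads outward
```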
Second, we have the curl, given by:

$$\operatorname{curl} \mathbf{F} = \nabla \times \mathbf{F}$$
The curl in three dimensions is defined by:

$$\nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_x & F_y & F_z \end{vmatrix} = \left\langle \frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z},\; \frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x},\; \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y} \right\rangle$$
In two dimensions this reduces to the scalar quantity:

$$\nabla \times \mathbf{F} = \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}$$
The interpretation of the curl is that it describes the rotational tendency of a vector field. A nonzero curl means that a vector field will tend to make an object placed within it spin (rotate), with the direction of the rotation given by the right-hand rule applied to the curl vector. A zero curl means that the vector field is irrotational.
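Similarly, the sketch below computes the curl of a hypothetical swirling field from the component formula above:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")

# A hypothetical rotating field F = <-y, x, 0> (counterclockwise swirl about the z-axis)
Fx, Fy, Fz = -y, x, sp.Integer(0)

curl = [sp.diff(Fz, y) - sp.diff(Fy, z),
        sp.diff(Fx, z) - sp.diff(Fz, x),
        sp.diff(Fy, x) - sp.diff(Fx, y)]
print(curl)  # [0, 0, 2]: nonzero curl along z, so the field is rotational
```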
A vector field can be described as either conservative or non-conservative. If a vector field has a nonzero curl, it is guaranteed to be non-conservative (but zero curl does not always imply a field is conservative). A conservative field can be written in the form $\mathbf{F} = \nabla \phi$, where $\phi$ is called a potential field.
The descriptive term conservative arises from physics, where conservative vector fields obey the conservation of energy (i.e. that energy can never be created or destroyed). All fundamental fields of nature are conservative, but some non-fundamental fields are non-conservative.
Line integrals
We have already seen integration over an area in the form of area integrals, which we evaluated as double integrals. We may also define integration over a curve. Where is this useful? We can use a scalar line integral for calculating some quantity of a curved thin object, such as a wire, a spring, or a cable; for instance, the total charge from the charge density, total mass from the mass density, etc. of a wire. Line integrals are also used extensively in physics for defining the energy transferred by a force along a path, among numerous other applications.
Note for the advanced reader: In addition, in complex analysis, line integrals have very specific uses that are important for contour integration, which is important for evaluating some very complex integrals, especially those that appear in quantum field theory.
We denote a line integral over a scalar function $f$ as:

$$\int_C f\, ds = \int_a^b f(\mathbf{r}(t))\, \|\mathbf{r}'(t)\|\, dt$$
Here, $ds$ is the line element (a very tiny amount of arc length) and $C$ is the curve that is integrated over. Recall that $\|\mathbf{r}'(t)\|$ is the speed, given by:

$$\|\mathbf{r}'(t)\| = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2}$$
A line integral over a vector-valued function can similarly be defined, only with a dot product rather than a regular product:

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt$$
We must parametrize $f$ into $f(\mathbf{r}(t))$ (and similarly parametrize $\mathbf{F}$ into $\mathbf{F}(\mathbf{r}(t))$ for vector-valued functions) to actually evaluate line integrals. That is to say, we must convert the equation of a curve to a parametric equation. We can then evaluate the line integral to find its solution. For instance, if we were performing a line integral over a circular loop (a common problem in the physics of current-carrying wires), we may choose the parametrization:

$$\mathbf{r}(t) = \langle R \cos t,\; R \sin t \rangle, \qquad 0 \leq t \leq 2\pi$$
For which we may then find $\mathbf{r}'(t) = \langle -R \sin t,\; R \cos t \rangle$ and $\|\mathbf{r}'(t)\| = R$. In a similar fashion, we may define a 3D helix:

$$\mathbf{r}(t) = \langle \cos t,\; \sin t,\; t \rangle$$
Or a quadratic curve:

$$\mathbf{r}(t) = \langle t,\; t^2 \rangle$$
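Putting the pieces together, here is a sympy sketch that evaluates a scalar line integral over the circular loop parametrization above, with a hypothetical integrand $f(x, y) = x^2 + y^2$:

```python
import sympy as sp

t = sp.symbols("t")
R = sp.symbols("R", positive=True)

# Parametrize the circular loop r(t) = <R cos t, R sin t>, 0 <= t <= 2*pi
x = R * sp.cos(t)
y = R * sp.sin(t)

# Speed ||r'(t)||, which simplifies to R
speed = sp.simplify(sp.sqrt(sp.diff(x, t)**2 + sp.diff(y, t)**2))

# Hypothetical scalar integrand f(x, y) = x^2 + y^2, which equals R^2 on the loop
f = sp.simplify(x**2 + y**2)

result = sp.integrate(f * speed, (t, 0, 2 * sp.pi))
print(result)  # 2*pi*R**3
```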
Concluding notes
While it may seem like we have covered a lot, our exploration of multivariable calculus is in fact only the beginning. Multivariable calculus encompasses a very broad range of topics, and generalizes naturally to vector calculus, as well as more advanced applications of calculus, such as tensor calculus and the calculus of variations, as well as the study of partial differential equations. For those readers interested in more advanced topics, please feel free to read the vector calculus & advanced topics in calculus guide as well as the introductory guide to partial differential equations to continue your journey of learning calculus.