PETSc: An overview for PISM users {#petscID}
=====

## Preamble

In v0.6, because of abstractions, especially in IceModelVec and its derived classes, programmers building derived classes of IceModel now need only minimal knowledge of PETSc, and especially of PETSc DMs.

What *is* important to know is that solving the SSA in parallel is relatively deep technology. Very specifically, because effective parallel iterative linear algebra requires preconditioning, *the solution of the SSA stress balance depends on the number of processors*. This will also be true of high-resolution parallel solutions of all other stress balances that include membrane stresses (Blatter, Stokes).

## Introduction

The PETSc library [@ref petsc-user-ref] provides essential support for distributed arrays and linear solvers in a parallel computing environment. "PETSc" stands for "Portable, Extensible Toolkit for Scientific Computation." It is a suite of data structures and routines in C for the scalable parallel solution of partial differential equations and their scientific applications. Large parts of PETSc relate especially to finite-difference-type regular, rectangular grids, but PETSc has been used for unstructured grids as well. PETSc employs the MPI standard for all message-passing communication, but it is deliberately at a higher level of abstraction than MPI. We find that using PETSc protects the programmer from explicit consideration of message passing about 90% of the time.

Documentation for PETSc is at http://www.mcs.anl.gov/petsc/. PISM is a C++ program and PETSc is a C library; all PISM calls to PETSc use the PETSc C API, not the C++ API.

### PETSc types

The most important variables deep inside PISM have these PETSc types:

- `DM`
- `Vec`
- `Mat`
- `KSP`

In fact, declaring a variable of one of these types merely declares a pointer, because they are abstract data types. Variables must be created with calls to functions like `DMDACreate2d()`, `VecCreate()`, etc., and destroyed when no longer needed. These concepts are largely buried inside IceGrid (especially the DM type) and IceModelVec (especially the Vec type), but the Mat and KSP types are exposed in the IceModel code which solves the SSA.

### Distributed arrays and vectors

PETSc has an abstract data type called a Distributed Array, which is a kind of Distributed Mesh (type `DM`). `DM`s contain information about the grid and the stencil. They set up *ghosted* values which duplicate the values on neighboring processors, so that communication can be done in big batches.

Vectors (type `Vec`) are created *using* a `DM`, with `DMCreateLocalVector()` and similar procedures. These vectors are distributed across the processors according to the information in the `DM`. Just for emphasis: *PISM distributes itself across the processors by calling PETSc procedures which create a `DM`*.

There are two parameters of note when creating a `DMDA`: *stencil type* and *stencil width*. The stencil types are `DMDA_STENCIL_STAR` and `DMDA_STENCIL_BOX`. They are generalizations of the five-point and nine-point stencils typical of two-dimensional discretizations of second-order partial derivatives. Using `DMDA_STENCIL_STAR` means that ghosted points are needed only along the coordinate axes, while `DMDA_STENCIL_BOX` indicates that ghosted points are needed in the box-shaped %region surrounding each point, and thus surrounding the portion owned by each processor. (On processor *N*, information owned by a neighboring processor but duplicated onto *N* is called *ghosted*. A picture is worth a thousand words here, so find the appropriate picture in the PETSc manual!)
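To make the `DMDA` parameters concrete, here is a minimal, self-contained sketch of creating a two-dimensional `DMDA` and a local `Vec` on it. This is not PISM's actual grid set-up, which lives in IceGrid; it assumes a PETSc 3.5-era C API (earlier versions spell the boundary types `DMDA_BOUNDARY_*`, and recent versions additionally require `DMSetUp()`), and the 61x61 grid is an arbitrary placeholder.

    #include <petscdmda.h>

    int main(int argc, char **argv) {
      DM  da;
      Vec v;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* periodic boundaries, BOX-type stencil, 61x61 global grid,
         1 degree of freedom per node, stencil width 1;
         PETSC_DECIDE lets PETSc choose the processor layout */
      DMDACreate2d(PETSC_COMM_WORLD,
                   DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC,
                   DMDA_STENCIL_BOX,
                   61, 61, PETSC_DECIDE, PETSC_DECIDE,
                   1, 1, NULL, NULL, &da);

      DMCreateLocalVector(da, &v);   /* "local": includes space for ghosts */

      /* ... access the owned portion with DMDAVecGetArray(), etc. ... */

      VecDestroy(&v);
      DMDestroy(&da);
      PetscFinalize();
      return 0;
    }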
We only need `BOX`-style stencils when gradient terms must be evaluated on a staggered grid (@f$h@f$ in SIA regions and @f$\bar{u},\bar{v}@f$ in the computation of effective viscosity in SSA regions). Keeping all other two-dimensional vectors on a `STAR`-type stencil would reduce the necessary communication slightly, but for now all two-dimensional arrays are based on `BOX`.

The stencil width indicates how many points in each direction will be needed. We never need a stencil width greater than 1.

The three-dimensional distributed arrays are distributed across processors in the same way as the two-dimensional arrays, in the sense that *each column of the ice (or bedrock, in the thermal model) is owned by exactly one processor*. From the point of view of parallelization, our problem is two-dimensional. All three-dimensional arrays have stencil type `STAR` in the horizontal directions.

One point of confusion (we admit ...) is the redefinition of the @f$x,y,z@f$ axes. Contrary to the PETSc default, our @f$z@f$ axis changes most rapidly through memory while the @f$x@f$ axis changes most slowly. The desirable consequence, however, is that our C arrays are addressed as `u[i][j][k]`, where `i,j,k` are the coordinate indices in the directions @f$x,y,z@f$, respectively.

### Accessing the processor's part of a DM-managed Vec

`DM`-based vectors must be accessed through a call to `DMDAVecGetArray()` before the access and a call to `DMDAVecRestoreArray()` after it. These calls just give one a valid pointer to the actual memory underlying the abstract data type `Vec`. The pointer has type `double**` for a two-dimensional array and type `double***` for a three-dimensional array, so the result can be addressed using normal (to the extent that C is "normal" in this regard ...) multidimensional array indexing. Furthermore, the indices themselves allow one to pretend that the given processor is addressing the entire array, even though only a chunk is stored on the local processor. *No message passing occurs here.* Instead, the crucial idea is that the `DM` knows the valid index ranges for each processor. That's why essentially all loops over two-dimensional arrays in PISM look like this:

    for (PetscInt i = grid.xs; i < grid.xs + grid.xm; ++i) {
      for (PetscInt j = grid.ys; j < grid.ys + grid.ym; ++j) {
        // ... vH(i,j) ...
      }
    }

The integers `grid.xs`, `grid.xm`, `grid.ys`, `grid.ym` are just the numbers that come from a call to `DMDAGetLocalInfo()` just after the `DM` is created, but normally a programmer just uses the values stored in IceGrid.

### "Local" versus "global" Vecs

PETSc `DM`-based vectors can be "local" or "global". (This weird PETSc language should probably be translated as "has ghosts" versus "sans ghosts".) "Local" vectors include space for the ghosted points. That is, when `DMDAVecGetArray()` is called, the resulting array can be indexed on all of the local portion owned by the processor, *plus* the ghosted points. The processor owns *copies* of the values at grid points owned by the neighboring processors, according to the stencil type and stencil width in the `DM`. The processor should treat these ghost values as "read-only", because of what happens at the communication stage (which we describe in a moment).

*Example.* We create a local `Vec` `v` which we want to have a `STAR`-type stencil.
Assuming `da` is a `DM` created with `DMDA_STENCIL_STAR`, we do this to create `v` and to get an array pointer `arr` into the local portion of it:

    DMCreateLocalVector(da, &v);
    DMDAVecGetArray(da, v, &arr);

Now the processor can access `arr[i][j]` for every

    grid.xs <= i <= grid.xs + grid.xm - 1

and

    grid.ys <= j <= grid.ys + grid.ym - 1

But this is just the regular (owned by the processor) portion. In addition, all of these are valid for every `i` and `j` in the above ranges:

    arr[i+1][j], arr[i-1][j], arr[i][j+1], arr[i][j-1]

(*If* the `da` were of type `DMDA_STENCIL_BOX` then, in addition,

    arr[i+1][j+1], arr[i-1][j+1], arr[i+1][j-1], arr[i-1][j-1]

would all be valid.)

Once the processor is done modifying its portion of `v` (i.e. not including the ghosts, although it may read them), we want to update the ghost values so that all processors agree on the values at those grid points:

    DMDAVecRestoreArray(da, v, &arr);
    DMDALocalToLocalBegin(da, v, INSERT_VALUES, v);
    DMDALocalToLocalEnd(da, v, INSERT_VALUES, v);

(Note that we release the array pointer before the communication stage, although that may not be essential.) `DMDALocalToLocalBegin()` and then `DMDALocalToLocalEnd()` are called to update (communicate) the ghosted values before the next stage at which they will be needed. This pair of commands performs a stage of actual (MPI) message passing. Only enough values are passed around to update the ghosts. The size of the messages is relatively small, and *latency is more important (in slowing performance) than bandwidth*. The significance of `STAR` versus `BOX` is not in the size of the extra memory, but in the extra messages which must be passed to the "diagonal" neighboring processors.

Global vectors do not hold ghosted values. Certain operations are more appropriate to this kind of vector, like viewing arrays in graphics windows. (Of course, viewing an array distributed across all the processors requires message passing. I think PETSc sends all the values to processor 0 to display them.) In any case, when ghosts are not needed there is no need to allocate space for them, and a "global" `Vec` is appropriate. Local vectors typically need to be mapped to global vectors before viewing them or using them in a linear system, but this is an entirely "local" operation which does not require message passing (`DMLocalToGlobalBegin/End()`).

At risk of repetition, most of the above distinction between "local" and "global" is buried in the IceModelVec abstraction.

### Solving linear systems

PETSc is designed for solving large, sparse systems in parallel. Iterative methods are the name of the game, especially Krylov subspace methods such as conjugate gradients and GMRES [@ref TrefethenBau, @ref Saad]. For consistency, all methods use the same Krylov-subspace-method-with-preconditioner `KSP` interface, even though some of them are really direct methods. The place in PISM where this is already used is the solution of the linearized SSA velocity problem. See the documentation for SSAFD and SSAFEM in this *Manual*.

One starts by declaring an object of type `KSP`; in PISM this is done in IceModel::createVecs(). Various options can be set by the program, but if the program calls `KSPSetFromOptions()` before any linear systems are solved, which PISM does, then user options like `-ksp_type`, `-pc_type`, and `-ksp_rtol` can be read to control (respectively) which Krylov method is used, which preconditioner is used, and what relative tolerance will be used as the convergence criterion of the Krylov solver.
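For orientation, the overall `Mat`/`KSP` workflow looks roughly like the following self-contained sketch. It is not PISM's SSA code: the tridiagonal system is purely illustrative, and the sketch assumes a PETSc 3.5-era C API (in earlier versions `KSPSetOperators()` takes an extra `MatStructure` argument).

    #include <petscksp.h>

    int main(int argc, char **argv) {
      Mat      A;
      Vec      x, b;
      KSP      ksp;
      PetscInt n = 100, Istart, Iend, i;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* a sparse n-by-n matrix; each processor owns a range of rows */
      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
      MatSetFromOptions(A);
      MatSetUp(A);

      /* set entries row by row; values are cached until assembly */
      MatGetOwnershipRange(A, &Istart, &Iend);
      for (i = Istart; i < Iend; i++) {
        PetscInt    ncols = 0, cols[3];
        PetscScalar vals[3];
        if (i > 0)     { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
        cols[ncols] = i; vals[ncols] = 2.0; ncols++;
        if (i < n - 1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
        MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);  /* communicate cached values */
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      VecCreate(PETSC_COMM_WORLD, &b);
      VecSetSizes(b, PETSC_DECIDE, n);
      VecSetFromOptions(b);
      VecDuplicate(b, &x);
      VecSet(b, 1.0);

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A);   /* same matrix builds the preconditioner */
      KSPSetFromOptions(ksp);       /* honors -ksp_type, -pc_type, -ksp_rtol */
      KSPSolve(ksp, b, x);

      KSPDestroy(&ksp);
      MatDestroy(&A);
      VecDestroy(&b);
      VecDestroy(&x);
      PetscFinalize();
      return 0;
    }

Running such an executable with, for example, `-ksp_type cg -pc_type bjacobi -ksp_rtol 1e-8` changes the Krylov method, the preconditioner, and the tolerance without recompiling.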
The default is GMRES(30) with ILU preconditioning. (As a general mechanism, PETSc has an options database which holds command-line options. All PISM options use this database. See IceModel::setFromOptions().)

To solve the system, a matrix must be attached to the `KSP`. The first time `KSPSolve()` is called, the matrix is factored by the preconditioner; that factorization is reused when the system is solved with additional right-hand sides. In the case of PISM and the solution of the balance-of-momentum equations for the SSA, the matrix changes because the equations are nonlinear (the effective viscosity depends on the strain rates).

The default matrix format is similar to the *Matlab* `sparse` format. Each processor owns a range of rows. Elements of matrices and vectors can be set using `MatSetValues()` and `VecSetValues()`. These routines use a "global" indexing which is not based on the regular grid and stencil choices in the `DM`. One can set any value on any processor, but then a lot of message passing has to occur; fortunately PETSc manages all of that. Values are cached until one calls `MatAssemblyBegin()` followed by `MatAssemblyEnd()` to communicate them.

### Additional PETSc utility functions

Quite elaborate error tracing and performance monitoring are possible with PETSc. All functions return `PetscErrorCode`, which should be checked by the macro `CHKERRQ()`. Therefore runtime errors normally print sufficient traceback information. If this information is not present (because the error did not get traced back), you may need to use a debugger, which is accessible with the (PETSc) command-line options `-start_in_debugger` and `-on_error_attach_debugger`. The option `-log_summary` is useful for getting some diagnostics written to the terminal.

The `PetscViewer` interface allows PETSc objects to be displayed. One can "view" a `Vec` in a graphical (X) window, but the "view" can also mean saving the `Vec` to a binary file on disk, writing it as plain text to the terminal (standard out), or even sending it to a running instance of *Matlab*. In PISM the X views are available using the `-view_map` option; see the *User's Manual*.

When viewing `Vec`s graphically in multiprocessor jobs, the display may have to be set on the command line, for instance as `-display :0` or similar; this must be given as the final option. For example,

    mpiexec -n 2 pisms -view_map thk -display :0

allows a two-processor run to view the ice thickness.
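To close, here is a minimal, self-contained sketch of the `PetscErrorCode`/`CHKERRQ()` pattern and the `PetscViewer` interface described above, again assuming a PETSc 3.5-era C API; the vector contents and the file name `example.petsc` are purely illustrative.

    #include <petscvec.h>
    #include <petscviewer.h>

    int main(int argc, char **argv) {
      PetscErrorCode ierr;      /* every PETSc call returns one of these */
      Vec            v;
      PetscViewer    viewer;

      PetscInitialize(&argc, &argv, NULL, NULL);

      ierr = VecCreate(PETSC_COMM_WORLD, &v);CHKERRQ(ierr);
      ierr = VecSetSizes(v, PETSC_DECIDE, 100);CHKERRQ(ierr);
      ierr = VecSetFromOptions(v);CHKERRQ(ierr);
      ierr = VecSet(v, 1.0);CHKERRQ(ierr);

      /* "view" as plain text to the terminal (standard out) ... */
      ierr = VecView(v, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

      /* ... and save to a binary file on disk (file name is illustrative) */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "example.petsc",
                                   FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
      ierr = VecView(v, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

      ierr = VecDestroy(&v);CHKERRQ(ierr);
      PetscFinalize();
      return 0;
    }

Running a sketch like this under `mpiexec` with options such as `-log_summary` or `-on_error_attach_debugger` exercises the monitoring and error-tracing machinery mentioned above.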