Modern Computational Science - Summer School

The first week of the School will be mainly devoted to "Fundamentals" (Part I), while "Special Topics" will be dealt with in week two (Part II). Hands-on lab exercises will complement the lectures.

For dates and times, please refer to the Schedule.


Part I: Fundamentals

Basic Algorithms and Data Structures I - II

Michael Sonnenschein, Ute Vogel, University of Oldenburg

Adequate data structures and efficient algorithms are fundamental for good solutions of computational problems. Therefore, in the first part, we introduce some basic data structures of computer science (lists, trees and graphs) together with their efficient organisation for solving common computational problems. In the second part, we discuss general techniques for designing efficient algorithms and for analysing their complexity and performance. We illustrate these techniques with standard sorting and searching algorithms. Moreover, we will also see examples of computational problems that cannot be solved by any known efficient algorithm.
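
As a small illustration of how the organisation of data affects algorithmic cost (a minimal Python sketch, not taken from the lecture material): searching a sorted array by binary search needs O(log n) comparisons, whereas a linear scan of an unsorted list needs O(n).

    # Binary search in a sorted list: O(log n) comparisons.
    def binary_search(sorted_items, target):
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_items[mid] == target:
                return mid                  # index of the target
            elif sorted_items[mid] < target:
                lo = mid + 1                # target can only be in the right half
            else:
                hi = mid - 1                # target can only be in the left half
        return -1                           # target not present

    print(binary_search([1, 3, 5, 8, 13, 21], 8))   # -> 3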


Software Engineering for Scientists

Helmut G. Katzgraber, Texas A&M University and ETH Zürich

Writing complex computer programs to study scientific problems requires careful planning and an in-depth knowledge of programming languages and tools, as well as of analysis and visualization programs. In this lecture, the importance of using the right tool for the right problem is emphasized. Furthermore, the core concepts needed to work in a Unix environment are presented. Common tools to organize computer programs (CVS and make), as well as to debug (gdb, valgrind) and improve them (gprof, compiler flags), are discussed, followed by simple data reduction strategies and visualization tools. Finally, some scientific libraries such as Boost, GSL, LEDA and Numerical Recipes are outlined.


Random number generation

Helmut G. Katzgraber, Texas A&M University and ETH Zürich

Random numbers play a crucial role in science and industry. Many numerical methods require the use of random numbers, in particular the Monte Carlo method. Therefore it is of paramount importance to have efficient random number generators. The differences, advantages and disadvantages of true and pseudo random number generators are discussed with an emphasis on the intrinsic details of modern and fast pseudo random number generators. Furthermore, standard tests to verify the quality of the random numbers produced by a given generator are outlined. Finally, standard scientific libraries with built-in generators are presented, as well as different approaches to generate non-uniform random numbers. Potential problems that one might encounter when using large parallel machines are discussed.
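
A minimal Python sketch (illustrative only, not the generators discussed in the lecture) contrasting a toy linear congruential generator with a modern library generator, and showing inverse-transform sampling as one way to obtain non-uniform random numbers:

    import numpy as np

    # Toy linear congruential generator: x_{n+1} = (a*x_n + c) mod m.
    # Fast, but fails modern statistical tests; shown only for illustration.
    def lcg(seed, a=1664525, c=1013904223, m=2**32):
        x = seed
        while True:
            x = (a * x + c) % m
            yield x / m                      # pseudo-random number in [0, 1)

    g = lcg(seed=42)
    uniforms = [next(g) for _ in range(5)]

    # Modern library generator (NumPy's default, PCG64) for real simulations.
    rng = np.random.default_rng(seed=42)

    # Non-uniform numbers by inverse-transform sampling:
    # if U is uniform on (0,1), then -ln(1-U)/lam is exponentially distributed.
    lam = 2.0
    exponentials = -np.log(1.0 - rng.random(1000)) / lam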


Heuristic Optimisation

Michael Sonnenschein, Ute Vogel, University of Oldenburg

To solve an optimisation problem, the best solution in a space of feasible solutions must be determined. Often, the algorithms for computing this best solution are too complex to be run in acceptable time. In this case, heuristic algorithms are used, which deliver good solutions in acceptable time but cannot guarantee finding the optimal solution. We introduce the heuristic optimisation method Tabu Search and possible applications. Additionally, we will take a brief look at Genetic Algorithms and Ant Colony Optimisation.
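
A minimal sketch of the Tabu Search idea in Python (the toy problem and parameter names are illustrative assumptions, not taken from the lecture): moves back to recently visited solutions are temporarily forbidden via a tabu list, which lets the search escape local optima.

    import random

    def tabu_search(objective, n_bits, n_iters=200, tabu_tenure=5):
        """Toy tabu search over bit strings; minimises objective()."""
        current = [random.randint(0, 1) for _ in range(n_bits)]
        best, best_val = current[:], objective(current)
        tabu = {}                                  # bit index -> iteration until which it is tabu
        for it in range(n_iters):
            candidates = []
            for i in range(n_bits):                # all single-bit-flip neighbours
                neighbour = current[:]
                neighbour[i] ^= 1
                val = objective(neighbour)
                # accept non-tabu moves, or tabu moves that beat the best (aspiration)
                if tabu.get(i, -1) < it or val < best_val:
                    candidates.append((val, i, neighbour))
            if not candidates:
                continue
            val, i, current = min(candidates)      # best admissible neighbour
            tabu[i] = it + tabu_tenure             # forbid flipping bit i again for a while
            if val < best_val:
                best, best_val = current[:], val
        return best, best_val

    # Toy objective: number of ones (optimum is the all-zero string).
    print(tabu_search(lambda bits: sum(bits), n_bits=20))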


Exploratory Data Analysis I-II

Thomas Kneib, University of Oldenburg

Exploratory data analysis falls between the two classical branches of statistics: descriptive statistics, which describes observed data without referring to the underlying data-generating mechanism, and inferential statistics, which provides methods for drawing conclusions about that mechanism. While the former does not make use of any stochastic assumptions about the data, the latter relies on the formulation of statistical models in terms of suitable probability distributions. Exploratory data analysis bridges this gap by providing statistical tools that are often based on descriptive statistics but already hint at inferential procedures. In particular, exploratory data analysis is often used to generate hypotheses about features of the data-generating mechanism or about relations between variables.
In the first lecture, we will introduce methods for describing the distribution of univariate variables. These either concentrate on specific features such as location or variability, or aim at describing the complete distribution. The second lecture presents ways to study interrelations between variables. A particular focus will be on graphical methods, but we will also sketch certain association measures such as correlation coefficients.
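
A brief Python/NumPy sketch of the kind of quantities involved (synthetic data, purely illustrative): location and spread of one variable, a histogram as a description of the whole distribution, and the correlation coefficient as an association measure.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10.0, scale=2.0, size=500)        # synthetic sample
    y = 0.5 * x + rng.normal(scale=1.0, size=500)        # second, related variable

    # Location and variability of a single variable
    print(np.mean(x), np.median(x))                      # location
    print(np.std(x, ddof=1), np.percentile(x, [25, 75])) # spread and quartiles

    # Describing the complete distribution: histogram counts
    counts, bin_edges = np.histogram(x, bins=20)

    # Association between two variables: Pearson correlation coefficient
    print(np.corrcoef(x, y)[0, 1])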


Semiparametric Regression

Thomas Kneib, University of Oldenburg

Regression models form one of the most important model classes of applied statistics when the interest lies in analysing the impact of covariates on a response variable. While parametric models rely on pre-specified functional forms and therefore require strong assumptions about the relation between covariates and response, semiparametric extensions allow for a purely data-driven investigation of these relations. In this lecture, we will present some of the basic ideas underlying semiparametric regression in general and penalised spline smoothing in particular. We will also sketch extensions to bivariate semiparametric regression and spatial smoothing.
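
The following Python/NumPy sketch illustrates one simple variant of penalised spline smoothing (a truncated-line basis with a ridge penalty; the lecture may use other bases such as B-splines with difference penalties, so take the details as assumptions):

    import numpy as np

    def penalized_spline_fit(x, y, n_knots=20, lam=1.0):
        """Penalised spline with a truncated-line basis and ridge penalty lam."""
        knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
        # Design matrix: intercept, linear term, and truncated lines (x - knot)_+
        X = np.column_stack([np.ones_like(x), x] +
                            [np.maximum(x - k, 0.0) for k in knots])
        # Penalise only the truncated-line coefficients (controls smoothness)
        D = np.diag([0.0, 0.0] + [1.0] * n_knots)
        beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
        return X @ beta                                   # fitted values

    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
    fitted = penalized_spline_fit(x, y, lam=0.1)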


Introduction to Monte Carlo Methods I - II

Helmut G. Katzgraber, Texas A&M University and ETH Zürich

Monte Carlo methods play an important role in scientific computation, especially when problems have a vast phase space. In this lecture an introduction to the Monte Carlo method is given. Concepts such as Markov chains, detailed balance, critical slowing down, and ergodicity, as well as the Metropolis algorithm are explained. The Monte Carlo method is illustrated by numerically studying the critical behavior of the two-dimensional Ising ferromagnet using finite-size scaling methods.
In the second part of the lecture, advanced Monte Carlo methods are described (parallel tempering Monte Carlo) and illustrated with nontrivial models from the physics of glassy systems. Special attention is paid to the equilibration of the simulations, as well as to the treatment of autocorrelation effects.
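
A compact Python/NumPy sketch of the single-spin-flip Metropolis algorithm for the two-dimensional Ising ferromagnet (lattice size, temperature and sweep count are illustrative choices, not the production settings used in the lecture):

    import numpy as np

    def metropolis_ising(L=16, T=2.27, n_sweeps=1000, rng=None):
        """Metropolis simulation of the 2D Ising ferromagnet (J = 1, k_B = 1)."""
        rng = rng or np.random.default_rng(0)
        spins = rng.choice([-1, 1], size=(L, L))
        for _ in range(n_sweeps):
            for _ in range(L * L):                 # one sweep = L*L attempted flips
                i, j = rng.integers(L, size=2)
                # Energy change of flipping spin (i, j), periodic boundaries
                nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
                      spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
                dE = 2 * spins[i, j] * nn
                # Metropolis rule: accept if dE <= 0, else with probability exp(-dE/T)
                if dE <= 0 or rng.random() < np.exp(-dE / T):
                    spins[i, j] *= -1
        return spins

    spins = metropolis_ising()
    print(abs(spins.mean()))                       # magnetisation per spin near T_c ~ 2.269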


Efficient approximation of solutions of nonlinear
partial differential equations using amplitude equations I - II

Hannes Uecker, University of Oldenburg

Despite the steady increase in computing power, many nonlinear partial differential equations still cannot be solved by brute-force numerical methods in acceptable time. However, the method of multiple scales, which results in so-called amplitude equations, can often be used to approximate physically or technologically interesting solutions with high accuracy. We explain this method using a number of examples, including pattern-forming systems from mathematical biology, nonlinear wave equations used to model pulse propagation in optical fibers, and kinetic equations like the Vlasov-Poisson system from plasma physics.
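
As a standard textbook illustration of the idea (not necessarily one of the examples treated in the lecture): for the Swift-Hohenberg equation ∂_t u = ε^2 u - (1 + ∂_x^2)^2 u - u^3 slightly above onset, the multiple-scales ansatz u(x,t) ≈ ε A(X,T) e^{ix} + c.c., with slow variables X = εx and T = ε^2 t, leads at leading order to the real Ginzburg-Landau amplitude equation ∂_T A = A + 4 ∂_X^2 A - 3 |A|^2 A. The amplitude equation describes the slow modulation of the basic pattern and can be analysed and solved numerically far more cheaply than the original system.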



Part II: Special Topics

Hydro- and Sediment Dynamics I-II

Hans-Jörg Wolff, Karsten Lettmann, University of Oldenburg

Over the last decade the area-specific hydrodynamics and the suspended particulate matter (SPM) dynamics in the East-Frisian Wadden Sea have been extensively investigated under the umbrella of the DFG Research Group on BioGeoChemistry of Tidal Flats at the University of Oldenburg. High-resolution modelling efforts in combination with high-resolution measurements at the Wadden Sea time series station and novel theoretical approaches have revealed various new and exciting insights into this complex system, thus significantly contributing to the current state of research. The available suite of numerical models integrates worldwide experience with innovative concepts in numerical modelling, such as state-of-the-art parameterizations of physical processes and novel concepts of suspended particulate matter transport. These models have been widely used for various research applications in the Wadden Sea and the German Bight at different horizontal resolutions. Combined with sophisticated measurements using ADCP and satellite data, these models enable a reliable state estimate for hydrodynamics and sediment dynamics. The response of the Wadden Sea system to atmospheric and hydrological impacts (storms and sea-level rise) will be highlighted in these lectures, along with the basic underlying physics and model details.
A new model approach using a finite-volume method on unstructured grids is discussed in a descriptive manner, showing the underlying ideas, the model characteristics, the necessary grid generation and some results from benchmarking the model on various computer architectures. Scientific results from an application to the waste water dispersal close to the city of Wilhelmshaven give first insights into the model performance in the area of the East Frisian coast.



Foundations of Quantum Chemistry

Rainer Koch, University of Oldenburg

Quantum mechanics and its applications to chemistry have grown into a powerful tool helping a wide variety of scientists understand chemical problems. This introductory lecture to quantum chemistry will cover the theory of quantum mechanics, starting with the molecular Hamiltonian and the Schrödinger equation. Terms such as Born-Oppenheimer approximation, SCF, LCAO, and variation principle are explained on the way to Hartree-Fock theory as the fundamental concept. Additionally, an overview of basis set families and their differences is given. Configuration interaction will be introduced as an approach to treat electron correlation and electronically excited states. A brief overview of various methods used in theoretical chemistry is given at the end of the lecture.
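
Two compact formulas may help to fix these concepts (standard textbook material, not specific to this lecture): the variation principle states that for any normalised trial wavefunction Φ the energy expectation value E[Φ] = <Φ|H|Φ> can never fall below the exact ground-state energy, so improving the trial function can only lower the energy towards the true value. Expanding each molecular orbital as a linear combination of atomic orbitals (LCAO), φ_i = Σ_μ c_μi χ_μ, turns the Hartree-Fock problem into the Roothaan-Hall matrix equations F C = S C ε, where F is the Fock matrix, S the overlap matrix of the basis functions, C the orbital coefficients and ε the orbital energies; since F itself depends on C, the equations are solved iteratively until self-consistency is reached (SCF).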



Surface Photochemistry from First Principles

Thorsten Klüner, University of Oldenburg

Photodesorption of small molecules from surfaces is one of the most fundamental processes in surface photochemistry. Despite its apparent simplicity, a microscopic understanding beyond a qualitative picture still poses a true challenge for theory. While the dynamics of nuclear motion can be treated on various levels of sophistication, all approaches suffer from the lack of sufficiently accurate potential energy surfaces, in particular for electronically excited states involved in the desorption scenario.
In the last decade, we have developed a systematic and accurate methodology to reliably calculate accurate ground and excited state potential energy surfaces (PES) for different adsorbate-substrate systems. These potential energy surfaces serve as a prerequisite for subsequent quantum dynamical wave packet calculations, which allow for a direct simulation of experimentally observable quantities such as velocity distributions.
In this contribution, I will focus on recent results obtained for photodesorption of NO and CO from a NiO(100) surface. In contrast to previous studies, we were able to construct highly accurate potential energy surfaces based on correlated quantum chemical calculations (CASPT-2/CCSD(T)). Despite the enormous computational cost, this level of theory turns out to be crucial, since less sophisticated approaches such as density functional theory (DFT) cannot even provide a reliable description of ground state properties, not to mention electronically excited states [1].
These potential energy surfaces were used in subsequent wave packet studies which reveal new desorption mechanisms. In the NO/NiO(100) case, we observed an anti-Antoniewicz scenario in which the wave packet is initially repelled from the surface but eventually reaches a turning point before it is back-scattered from the surface. State-resolved desorption velocity distributions have been calculated, and the results are in good agreement with experimental findings [2].
In the CO/NiO(100) system, we observe the formation of a genuine covalent bond upon photoexcitation for the first time. As demonstrated in the current study, this chemical bond formation is the crucial step in the desorption mechanism for this system [3,4]. Again, our results are in good agreement with recent experiments.

References

  1. I. Mehdaoui, T. Klüner, J. Phys. Chem. A 111, 13233 (2007): Bonding of CO and NO to NiO(100): A strategy for obtaining accurate adsorption energies.
  2. I. Mehdaoui, D. Kröner, M. Pykavy, H.-J. Freund, T. Klüner, Phys. Chem. Chem. Phys. 8, 1584 (2006): Photo-induced desorption of NO from NiO(100): Calculation of the four-dimensional potential energy surfaces and systematic wave packet studies.
  3. I. Mehdaoui, T. Klüner, Phys. Rev. Lett. 98, 037601 (2007): Understanding Surface Photochemistry from First Principles: The Case of CO-NiO(100).
  4. I. Mehdaoui, T. Klüner, Phys. Chem. Chem. Phys. 10, 4559 (2008): New mechanistic insight into electronically excited CO-NiO(100): A quantum dynamical analysis.


An introduction to phylogenetic analysis

Olaf R.P. Bininda-Emonds, University of Oldenburg

Biological species do not represent independent units of analysis because of the pattern of descent with modification through evolution. Although this fact has been appreciated since at least Darwin, it is only in the past 15 years that biologists have increasingly recognized the need to account for evolutionary relatedness in their analyses. However, phylogeny transcends pure biology to play an important role in related fields. For instance, alignment programs like Clustal rely on phylogenetic information in constructing their guide trees. Unfortunately, finding the optimal phylogeny for a given data set, like most other interesting problems in computational biology, is an NP-hard problem, because the number of potential solutions ("tree space") grows super-exponentially with the number of taxa in the analysis. With the rapid growth in phylogenomic data and in the scope of the analyses, phylogenetic inference has also garnered increasing attention from computer scientists eager to find fast solutions to this difficult problem.
In this talk, I introduce the problem of phylogenetic inference in general and discuss several optimization criteria (parsimony, likelihood, and distance methods) and the strengths and weaknesses of each for building evolutionary trees.
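
To make the super-exponential growth of tree space concrete: the number of distinct unrooted binary trees on n labelled taxa is (2n-5)!! = 3 · 5 · ... · (2n-5). A minimal Python sketch (illustrative, not taken from the lecture):

    def num_unrooted_trees(n_taxa):
        """Number of unrooted binary trees on n_taxa labelled taxa: (2n-5)!!"""
        count = 1
        for k in range(3, 2 * n_taxa - 4, 2):      # 3, 5, ..., 2n-5
            count *= k
        return count

    for n in (5, 10, 20, 50):
        print(n, num_unrooted_trees(n))            # 15, ~2.0e6, ~2.2e20, ~2.8e74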


Molecular homology and sequence alignment

Olaf R.P. Bininda-Emonds, University of Oldenburg

Phylogenetic analysis depends crucially on the data underlying the analysis, with issues of data quality being but one aspect of the problem. A second, slightly less appreciated aspect is that of data comparability and specifically the evolutionary homology (= derivation from a common shared ancestry) of the individual characters being examined. The goal of sequence alignment is to establish the positional homology of the individual elements (nucleotide bases or amino acid residues) of two or more molecular sequences (DNA or proteins, respectively). Like many other problems in computational biology, the simultaneous alignment of three or more sequences is known to be NP-hard. Fortunately, many fast and reasonable heuristics for this problem exist.
In this talk, I first examine the concept of molecular homology before describing efficient algorithms for the alignment of pairs of sequences. I then expand on this to discuss potential solutions and strategies to tackle the problem of multiple sequence alignment.
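
A minimal Python sketch of pairwise global alignment by dynamic programming (Needleman-Wunsch; the scoring values are illustrative assumptions):

    def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
        """Global alignment score of sequences a and b in O(len(a)*len(b)) time."""
        n, m = len(a), len(b)
        # score[i][j] = best score for aligning a[:i] with b[:j]
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap                        # a aligned against gaps only
        for j in range(1, m + 1):
            score[0][j] = j * gap                        # b aligned against gaps only
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                score[i][j] = max(diag,                  # align a[i-1] with b[j-1]
                                  score[i - 1][j] + gap, # gap in b
                                  score[i][j - 1] + gap) # gap in a
        return score[n][m]

    print(needleman_wunsch("GATTACA", "GCATGCU"))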


Maximum Likelihood Inference of Phylogenetic Trees:
Heuristics, Algorithms, and Parallelism by Example of RAxML

Alexandros Stamatakis, TU München

Following the general introduction to the field by Olaf Bininda-Emonds, I will address implementation issues and potential solutions by example of RAxML, a real-world open-source program for phylogenetic inference.
In the first part of this lecture, I will address the design of search heuristics, the design of search convergence criteria, methods for fast computation of support values, and efficient algorithms for computing the topological distance between trees.
In the second part of the lecture I will describe how to parallelize tree searches and the Maximum Likelihood function on a variety of parallel architectures ranging from FPGAs to the IBM BlueGene, using, for instance, Pthreads, OpenMP and MPI.
I will also address solutions for load-balancing problems in the phylogenetic likelihood function and discuss issues regarding single- versus double-precision floating-point arithmetic and SSE3-based vectorization.
PDFs of papers and the RAxML source code are available at wwwkramer.in.tum.de/exelixis/.
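
As a small illustration of the topological tree distance mentioned above (a sketch under the assumption that each tree has already been reduced to its set of non-trivial bipartitions, each represented by the taxa on one fixed side of an internal edge): the unweighted Robinson-Foulds distance is simply the size of the symmetric difference of the two bipartition sets.

    def robinson_foulds(bipartitions_a, bipartitions_b):
        """Unweighted Robinson-Foulds distance between two trees given as sets of
        non-trivial bipartitions (frozensets of the taxa on the side that contains
        a fixed reference taxon)."""
        return len(bipartitions_a ^ bipartitions_b)

    # Toy example with four taxa A-D: the two trees differ in their single internal edge.
    tree1 = {frozenset({"A", "B"})}        # ((A,B),(C,D))
    tree2 = {frozenset({"A", "C"})}        # ((A,C),(B,D))
    print(robinson_foulds(tree1, tree2))   # -> 2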


Intermediate Disturbance Hypothesis - the random way out

Jan Freund, University of Oldenburg

When organisms compete in a non-fluctuating environment, basic models of population dynamics suggest that superior competitors ultimately drive inferior competitors to extinction, and the species number declines - a process called competitive exclusion. To reconcile the competitive exclusion principle with the observed coexistence of many species, Connell (1978) proposed the Intermediate Disturbance Hypothesis (IDH), according to which disturbances of intermediate strength and frequency act to maintain high levels of biodiversity. Since realistic environmental fluctuations are random in nature, a model approach to the IDH must involve concepts of stochastic processes.
In this lecture we will give a short introduction to stochastic dynamics and illustrate the IDH via stochastic simulations of two competing species. A generalisation to more species and diverse types of randomness will be left to a hands-on session.
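
A minimal Python/NumPy sketch in this spirit (a toy two-species Lotka-Volterra competition model with multiplicative environmental noise, integrated with the Euler-Maruyama scheme; all parameter values are illustrative assumptions, not the model used in the lecture or the hands-on session):

    import numpy as np

    def competition_with_disturbance(n_steps=20000, dt=0.01, sigma=0.3, rng=None):
        """Two competing species subject to random environmental disturbances."""
        rng = rng or np.random.default_rng(0)
        r = np.array([1.0, 0.8])                 # intrinsic growth rates
        a = np.array([[1.0, 1.2],                # competition coefficients
                      [0.9, 1.0]])
        n = np.array([0.5, 0.5])                 # initial densities
        traj = np.empty((n_steps, 2))
        for t in range(n_steps):
            growth = r * n * (1.0 - a @ n)                 # deterministic competition
            noise = sigma * n * rng.normal(size=2)         # random "disturbance"
            n = np.maximum(n + growth * dt + noise * np.sqrt(dt), 0.0)
            traj[t] = n
        return traj

    traj = competition_with_disturbance()
    print(traj[-1])                              # densities at the end of the run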


References
  1. Connell, J.H. (1978) Diversity in tropical rain forests and coral reefs. Science 199: 1302-1310
  2. Shea, K., Roxburgh, S.H., and Rauschert, E.S.J. (2004) Moving from pattern to process: coexistence mechanisms under intermediate disturbance regimes. Ecology Letters 7: 491-508
  3. Mannella, R. (2000) A Gentle Introduction to the Integration of Stochastic Differential Equations. In: Stochastic Processes in Physics, Chemistry, and Biology. Edited by J.A. Freund, Th. Pöschel, Lecture Notes in Physics, vol. 557, pp. 353-364. Springer, Berlin.


Paradox of plankton - the chaotic way out

Ulrike Feudel, University of Oldenburg

The life of a species is determined by its growth and death, which depend on environmental conditions. In particular, the availability of nutrients is essential for the survival of a species. However, nutrient availability does not depend only on what nature offers in a certain region but is largely determined by the presence of other species feeding on the same nutrients. Hence, competition between species plays an important role in the time evolution of a community of species in an ecosystem. One of the most famous principles in theoretical ecology is the competitive exclusion principle, which states that in equilibrium the number of species that can coexist cannot be larger than the number of limiting resources. This is sometimes called the "survival of the fittest", because the best-adapted species, which is considered to be the fittest, would win the competition.
However, observations of plankton species in the ocean show that thousands of species can coexist on a rather small number of limiting resources, such as nitrogen, phosphorus and light. This so-called paradox of plankton illustrates that the competitive exclusion principle of theoretical ecology is not fulfilled in nature, though it is often confirmed in laboratory experiments. There have been several attempts to resolve this paradox, including spatial and temporal changes in environmental conditions. We illustrate another way of resolving the paradox of plankton by showing that coexistence of many species competing for a small number of limiting resources becomes possible if the time evolution of the species does not reach an equilibrium, but the species coexist in a chaotic manner, so that the available nutrients can be shared in a more efficient way.
The lecture not only explains the paradox of plankton but also offers an introduction to the basic principles of chaos theory needed to understand its application to an important problem in ecology.
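
As a first taste of the chaos-theory part (a toy Python sketch using the logistic map, not the plankton competition model itself): a positive Lyapunov exponent quantifies the exponential sensitivity to initial conditions that characterises chaotic dynamics.

    import numpy as np

    def lyapunov_logistic(r, n_iter=100000, x0=0.4):
        """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)."""
        x, lyap = x0, 0.0
        for _ in range(n_iter):
            lyap += np.log(abs(r * (1.0 - 2.0 * x)))   # log |f'(x)| along the orbit
            x = r * x * (1.0 - x)
        return lyap / n_iter

    print(lyapunov_logistic(3.5))   # periodic regime: negative exponent
    print(lyapunov_logistic(4.0))   # chaotic regime: positive exponent (about ln 2)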


The unified theory of biodiversity - the neutral way out

Bernd Blasius, University of Oldenburg

The unified theory of biodiversity is a simple mathematical model for explaining the diversity and relative abundance of species in "neutral" ecological communities. Here, neutrality refers to the assumption of per capita ecological equivalence among all individuals of all species, and thus it stands in striking contrast to the belief of many ecologists that different species behave in different ways from one another. Nevertheless, neutral theories have been applied successfully to describe empirical patterns in diverse ecosystems, giving rise to the controversial debate of "neutrality vs the niche" in modern ecology.
This lecture will provide a gentle introduction to neutral models. Starting from Hubbell's first formulation, we will discuss the calculation of species abundance distributions, the comparison with field data and several modern developments of the theory.
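
A minimal Python/NumPy sketch of the neutrality assumption in action (a zero-sum local community without immigration or speciation; community size and step counts are illustrative assumptions, not Hubbell's full model):

    import numpy as np

    def neutral_drift(J=500, S=20, n_steps=200000, rng=None):
        """Zero-sum ecological drift: at each step one random individual dies and is
        replaced by the offspring of another random individual; all individuals are
        treated identically regardless of species (per capita equivalence)."""
        rng = rng or np.random.default_rng(0)
        community = rng.integers(S, size=J)          # species label of each individual
        for _ in range(n_steps):
            dead, parent = rng.integers(J, size=2)
            community[dead] = community[parent]
        labels, abundances = np.unique(community, return_counts=True)
        return abundances                            # species abundance distribution

    print(sorted(neutral_drift(), reverse=True))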


Computational Fluid Dynamics I - III

Joachim Peinke, University of Oldenburg

In this contribution, we want to discuss some aspects of how to solve the fundamental equations of fluid mechanics. Fluid problems are commonly described by the Navier-Stokes equation, i.e. in terms of a partial differential equation. Although this equation is already an approximation, solving it still causes serious problems. We want to focus on methods to solve the Navier-Stokes equation for technical applications. First we will discuss the principles of different approximations. Then we will discuss some applications related to wind energy.
In addition, we will introduce the participants to an open-source software package for solving the Navier-Stokes equation, a so-called CFD (computational fluid dynamics) solver.
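
To give a flavour of the numerical issues involved (a toy Python/NumPy sketch, not the CFD solver used in the lab): the one-dimensional viscous Burgers equation u_t + u u_x = nu u_xx shares the nonlinear advection/diffusion balance of the Navier-Stokes equation and can be integrated with a simple explicit finite-difference scheme.

    import numpy as np

    def burgers_1d(nx=200, nt=1000, nu=0.1, L=2 * np.pi):
        """Explicit finite-difference integration of u_t + u*u_x = nu*u_xx
        on a periodic domain (central differences in space, forward Euler in time)."""
        dx = L / nx
        dt = 0.2 * dx**2 / nu                       # small time step for stability
        x = np.arange(nx) * dx
        u = np.sin(x) + 0.5                         # initial velocity profile
        for _ in range(nt):
            up = np.roll(u, -1)                     # u[i+1] with periodic wrap-around
            um = np.roll(u, 1)                      # u[i-1]
            u = (u - dt * u * (up - um) / (2 * dx)
                   + dt * nu * (up - 2 * u + um) / dx**2)
        return x, u

    x, u = burgers_1d()
    print(u.min(), u.max())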