Mechanisms for log normal concentration distributions in the environment

Log normal concentrations in the environment

Empirical concentration distributions in the environment vary widely, ranging from familiar parametric forms, e.g., normal and log-normal distributions, to more complex shapes^9,10,11,12. Overall, the concentration distribution reflects the different processes that govern environmental fate, e.g., emissions/formation, transport processes, source mixing, sinks and dependences on other variables and processes. In principle, we may therefore extract mechanistic insights into the lifecycle of an environmental component from analysis of the concentration distribution. Although such deconvolution is intractable in the general case, there are specific cases where certain distributions have known mechanistic origins. For instance, source mixing in the environment tends to drive the distribution towards normality.

In this paper we present a model for how log-normal distributions may emerge in the environment (Fig. 1A). This topic has been addressed in a few earlier studies^6,7,8. A common argument in these is the ‘Multiplicative Central Limit Theorem’ (or ‘Gibrat’s law’), which applies to processes that involve the product of many random variables. Taking the logarithm of a product of many positive random variables gives a sum of many random variables, to which the Central Limit Theorem applies, predicting a normal distribution. Back-transforming the logarithm yields a log-normal distribution. Although this heuristic argument does not provide specific physico-chemical mechanisms for how the multiplication of multiple random variables may commonly arise in the environment, it does provide an intuitive framework.
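The heuristic argument above can be illustrated numerically. The following is a minimal sketch, not taken from the earlier studies: the factor distribution (uniform between 0.5 and 1.5), the number of factors and the helper names `skewness` and `product_samples` are all illustrative assumptions.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def product_samples(n_factors, n_samples, rng):
    """Draw n_samples products of n_factors positive random factors.

    The uniform(0.5, 1.5) factor distribution is an arbitrary illustrative choice.
    """
    return [
        math.prod(rng.uniform(0.5, 1.5) for _ in range(n_factors))
        for _ in range(n_samples)
    ]


rng = random.Random(1)
products = product_samples(n_factors=50, n_samples=5000, rng=rng)
log_products = [math.log(p) for p in products]

# The raw products are strongly right-skewed; their logarithms are close to symmetric.
print("skewness of products:     ", round(skewness(products), 2))
print("skewness of log(products):", round(skewness(log_products), 2))
```

Near-zero skewness of the log-transformed products is consistent with, but does not by itself prove, approximate normality of the logarithm.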

The present contribution is based on a physico-chemical first-order exponential kinetics model (Fig. 1B). Starting with an ordinary differential equation (Eq. 1), we introduce a stochastically variable rate, resulting in a stochastic differential equation (Eq. 2). By solving the corresponding Fokker–Planck equation (FPE) (see Theory section and SI for details) we derive the log-normal distribution (Eq. 3), under certain assumptions about the stochastic variability. Since exponential kinetics are commonly observed for many processes in the environment, this model provides a potential explanation for the relative ubiquity of log-normal distributions in the environment.
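As a numerical counterpart to this derivation, the sketch below simulates first-order decay with a fluctuating rate. Eqs. 1 and 2 are not reproduced in this section, so the forms used here are assumptions: Eq. 1 is taken as dC/dt = -kC, and the rate is taken to be constant within each time step and equal to μ_k plus a discretised white-noise term. The names `sigma_k`, `c0`, `t_end` and `simulate_log_concentration`, and all parameter values, are hypothetical. The noise convention of the full model (Itô or Stratonovich) may shift the drift relative to this sketch.

```python
import math
import random
import statistics


def simulate_log_concentration(mu_k, sigma_k, c0, t_end, n_steps, rng):
    """Return ln C(t_end) for one realisation of dC/dt = -k(t) C.

    Assumption: within each step the rate is constant and equal to
    mu_k + sigma_k * xi / sqrt(dt), with xi a standard normal draw
    (a discretised white-noise fluctuation around mu_k).
    """
    dt = t_end / n_steps
    log_c = math.log(c0)
    for _ in range(n_steps):
        k = mu_k + sigma_k * rng.gauss(0.0, 1.0) / math.sqrt(dt)
        log_c -= k * dt  # exact decay over one step with constant k
    return log_c


rng = random.Random(42)
mu_k, sigma_k, c0, t_end = 0.5, 0.3, 1.0, 4.0  # illustrative values only
logs = [
    simulate_log_concentration(mu_k, sigma_k, c0, t_end, 200, rng)
    for _ in range(2000)
]

# Under these assumptions ln C(t) is normal with
# mean ln(c0) - mu_k * t and standard deviation sigma_k * sqrt(t).
print("mean of ln C:", statistics.fmean(logs), "expected:", math.log(c0) - mu_k * t_end)
print("sd of ln C:  ", statistics.stdev(logs), "expected:", sigma_k * math.sqrt(t_end))
```

Because ln C at a fixed time is a sum of independent normal increments in this sketch, C itself is log-normal across the ensemble of realisations.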

Observations of log-normal-like concentration distributions in the environment include a wide array of components (e.g., aerosols; bulk organic carbon; heavy metals; organic pollutants; minerals; trace gases; inorganic ions; humic matter; biomarkers; radionuclides; microplastics; pharmaceuticals and pesticides) in various environmental sub-compartments (e.g., groundwater; watersheds; urban air; precipitation; rocks; stormwater; indoor air; soils; wastewater; marine sediments; landfills; lake sediments; peat; glaciers; background air; biota and humans)^{2,3,12,13,14,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38}. However, even though log-normality is relatively common in the environment, we emphasize that it is by no means universally observed^10,11,12: see discussion in the next section.

Ascribing a data set to a specific parametric form, e.g., log-normal, may provide mechanistic insights, but also typically simplifies statistical treatment¹¹. Among multiple approaches, a straightforward way to test for log-normality is to log-transform the data and then apply one of the many well-established tests for normality, e.g., the Shapiro–Wilk test^9,39,40. Specific aspects of the analysis of log-normal data sets, e.g., the impact of non-detects, data averaging or data sampling strategies, have been discussed in some detail in the literature^12,41,42,43. However, even though we may attribute a certain data set to a parametric distribution, there are many empirical situations where the data appear to follow a complex or unknown distribution. Can we identify some general principles that may help disentangle observed variability, even beyond common parametric forms?
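The log-transform-then-test approach can be sketched as follows. To stay within the Python standard library, this example swaps in the Jarque–Bera test (based on skewness and kurtosis) in place of the Shapiro–Wilk test named above; it is a different test, and its asymptotic p-value is only approximate for small samples. The synthetic data and the function names `jarque_bera` and `lognormality_check` are illustrative assumptions.

```python
import math
import random
import statistics


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    The statistic is asymptotically chi-square with 2 degrees of freedom,
    whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    skew = sum(x ** 3 for x in z) / n
    excess_kurt = sum(x ** 4 for x in z) / n - 3.0
    stat = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return stat, math.exp(-stat / 2.0)


def lognormality_check(concentrations):
    """Log-transform strictly positive data, then test the logs for normality."""
    logs = [math.log(c) for c in concentrations]
    return jarque_bera(logs)


rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]  # synthetic data

print("raw data (stat, p):", jarque_bera(sample))
print("log data (stat, p):", lognormality_check(sample))
```

The check requires strictly positive values, so zeros or non-detects would need separate handling before the log-transform.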

Three mechanisms influencing concentration distributions

Any environmental system is governed by a multitude of different processes, occurring simultaneously. Fortunately, some mechanisms appear to be more prevalent than others, so we can make simplifications that allow modeling, e.g., the model for log-normality presented here. To attempt a more general description of concentration distributions, it may prove useful to consider a few different principal mechanisms. Here we explore three such mechanisms:

Kinetics

The model presented here is a common first-order approximation of kinetics, e.g., of source and sink processes (Fig. 2A–B). In reality, dynamical systems may be highly complex, beyond the present formulation with a constant deterministic part (μ_k, Eq. 2), with multiple coupled components, non-linearities, feedbacks and heterogeneous interactions. How such more complex kinetics influence concentration distributions is a topic for further scientific discovery.

Figure 2

Numerical simulations (random sampling) of time-series concentration data and corresponding distributions, illustrating how different processes may influence concentration distributions. Panels (A)–(B): Log-normally distributed data (Eq. 3), e.g., influenced by exponential kinetics. Panels (C)–(D): Data reflecting random fluctuation between three different log-normally distributed states (green, blue and red), yielding a multi-modal (mixture) distribution. Panels (E)–(F): Random mixing of the three distributions from panels (C)–(D), yielding a convolution distribution, tending towards a normal distribution. Panels (G)–(H): The log-normal distribution of panels (A)–(B), modulated by an oscillatory function (here a sine function), illustrating the impact of, e.g., diurnal or seasonal cycles on observed distributions. The simulations were conducted using Matlab (ver. R2019b).


Mixing

The concentration of a component in the environment is typically the sum of emissions from many sources. If these combine randomly, the concentration distribution approaches normality (Fig. 2E–F). However, this represents one limit of mixing, where the resulting distribution at each time point is the sum distribution (convolution) of many variables. The other limit is where the contributions from different sources are separated, such that the concentration at any given measurement point (e.g., time point) reflects one individual source (Fig. 2C–D). An example could be an atmospheric measurement site located between two cities: depending on wind direction, each time point is dominated by one city or the other. In this limit the concentration data will tend to a multi-modal (or ‘mixture’) distribution.
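The two limits of mixing can be contrasted in a short simulation. This is a sketch under stated assumptions, not a reproduction of the Matlab simulations behind Fig. 2: the number of sources, the log-normal parameters and the function names `summed_sources` and `switching_sources` are all hypothetical choices.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def summed_sources(n_sources, n_points, rng):
    """Each observation is the sum of n_sources independent log-normal sources."""
    return [
        sum(rng.lognormvariate(0.0, 0.5) for _ in range(n_sources))
        for _ in range(n_points)
    ]


def switching_sources(locations, n_points, rng):
    """Each observation comes from one randomly chosen log-normal source."""
    return [
        rng.lognormvariate(rng.choice(locations), 0.2)
        for _ in range(n_points)
    ]


rng = random.Random(4)
single = summed_sources(1, 5000, rng)
summed = summed_sources(50, 5000, rng)
# Summing many sources reduces the skewness, i.e. moves towards normality.
print("skewness, one source: ", round(skewness(single), 2))
print("skewness, 50 sources: ", round(skewness(summed), 2))

switched = switching_sources([0.0, 2.0, 4.0], 5000, rng)
# Count observations whose log falls between the first two modes.
between = sum(1 for v in switched if 0.8 < math.log(v) < 1.2)
print("observations between modes:", between, "of", len(switched))
```

In the summed case the skewness shrinks as sources are added, whereas in the switching case the observations stay clustered around the individual source modes, with very few values in between.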

External variables

In the model described in the Theory section, the non-stochastic part of Eq. (2) is a constant (μ_k). However, this parameter may also vary with external parameters, e.g., state variables. For instance, diurnal or seasonal variations of temperature, pressure or sunlight, including phase transitions, may strongly influence concentration distributions (Fig. 2G–H).
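The effect of an oscillating external variable can be sketched numerically. The assumption here, which is not taken from the Theory section, is that the external variable shifts the location of ln C sinusoidally in time; the function name `modulated_log_series`, the 24-step period and the amplitude are illustrative only.

```python
import math
import random
import statistics


def modulated_log_series(n_points, period, amplitude, sigma, rng):
    """Return ln C for a log-normal series whose location oscillates in time.

    Assumption: the external variable shifts the location of ln C as
    amplitude * sin(2 * pi * t / period); this functional form is illustrative.
    """
    return [
        amplitude * math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, sigma)
        for t in range(n_points)
    ]


rng = random.Random(2)
steady = modulated_log_series(4800, 24, 0.0, 0.3, rng)
cycled = modulated_log_series(4800, 24, 1.0, 0.3, rng)

# Over whole periods the oscillation adds amplitude**2 / 2 to the variance of ln C.
print("variance without cycle:", statistics.pvariance(steady))
print("variance with cycle:   ", statistics.pvariance(cycled))
```

The pooled distribution is therefore broader than the underlying log-normal, even though the distribution at any fixed phase of the cycle is unchanged in shape.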

These three classes of mechanisms influence the lifecycle of a component in the environment, but their relative importance for the shape of the concentration distribution may be highly variable. In certain instances it may be possible to ascribe a parametric function to the data, e.g., a normal distribution for well-mixed sources or a log-normal distribution suggesting a kinetic domain. However, for situations with more complex, multi-modal or broad distributions, both statistical treatment and interpretation are more challenging. For instance, quite similar-looking distributions may emerge from rather different mechanisms, e.g., dependence on an externally oscillating parameter or a stochastically jumping mixture distribution (e.g., compare Fig. 2C–D with Fig. 2G–H).

Outlook

The shape of empirical concentration distributions is determined by the lifecycle dynamics and thereby contains information about environmental fate. In this paper we show that log-normality suggests influence by first-order exponential kinetics (Eqs. 1 and 5; Fig. 1A–B). Reflecting the overall non-equilibrium state of environmental systems, there are a number of different processes/mechanisms that may exhibit exponential kinetics and thereby drive concentrations towards log-normality, e.g., emissions/chemical formation (e.g., of primary or secondary pollutants), degradation/decomposition (e.g., chemical reactions or deposition), kinetic transfer between different pools/layers/reservoirs (e.g., between the troposphere and the stratosphere) or kinetic partitioning (e.g., between gas and liquid phases). Some of these processes, e.g., sink kinetics, are active throughout the lifecycle of a component, and thereby continuously push concentrations towards log-normality.

Overall, the implications of log-normal concentration distributions span a broad spectrum of potential applications, ranging from data analysis methodologies, sampling strategies, emissions estimation, source apportionment calculations, modelling of chemical fate and estimation of toxic exposure to future climate scenarios^{9,25,29,41,42,43,44,45,46,47,48,49,50,51,52}. Given the general mathematical formulation of the model (Eq. 2), we note that its applicability also extends to log-normal concentration distributions beyond environmental systems^53,54.

A specific example where mechanistic insights may be derived from analysis of concentration distributions is the estimation of lifetimes (τ = 1/k, Eq. 1)^55,56. Our present findings suggest that observation of log-normally distributed concentration data, perhaps especially at remote or receptor sites, may indicate sink kinetics, and may therefore provide insights into the sink rate (Eqs. 4a–b and 5). Another specific situation where concentration distribution analysis is of importance is the common challenge of trend detection in environmental data¹¹. For certain time-series data, log-transformation is commonly used prior to statistical analysis^51,57. The present analysis may provide a physico-chemical motivation for such transformations, potentially aiding interpretation. A third specific example concerns the analysis of ratios of diagnostic markers, which are common tools to assess sources and processes across Earth and Environmental sciences. Examples include ratios of chemical markers and isotope signatures, where the latter are commonly reported as concentration ratios. The ratio of two log-normal random variables is another log-normal variable⁵⁸. We may then predict that if the overall concentrations are log-normal, then, e.g., isotope signatures should also be log-normal, with potential implications for data analysis methodology and interpretation.
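The statement about ratios follows from ln(X/Y) = ln X − ln Y, a difference of two normal variables. The sketch below checks this numerically for the simplest case, assuming the two log-normal variables are independent (an assumption added here for illustration; the parameter values and the function name `log_ratio_samples` are likewise hypothetical).

```python
import math
import random
import statistics


def log_ratio_samples(mu_x, sigma_x, mu_y, sigma_y, n, rng):
    """Return ln(X / Y) for independent log-normal X and Y."""
    return [
        math.log(rng.lognormvariate(mu_x, sigma_x) / rng.lognormvariate(mu_y, sigma_y))
        for _ in range(n)
    ]


rng = random.Random(9)
logs = log_ratio_samples(1.0, 0.4, 0.2, 0.3, 5000, rng)

# ln(X / Y) = ln X - ln Y, so for independent X and Y it is normal with
# mean mu_x - mu_y and variance sigma_x**2 + sigma_y**2.
print("mean:", statistics.fmean(logs), "expected:", 1.0 - 0.2)
print("sd:  ", statistics.stdev(logs), "expected:", math.hypot(0.4, 0.3))
```

For correlated concentrations the ratio remains log-normal when the two logarithms are jointly normal, but the variance then also depends on their covariance.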

Finally, we note that rates in environmental systems are often proportional to concentrations, e.g., of reactants. As an example, the sink rate for CO is proportional to the OH concentration (see Theory). Consequently, if the concentration is log-normal, so is the rate. Furthermore, fluxes (e.g., emission or sink fluxes) are often defined as the product of a rate and a concentration (or at least the amount, by some proportional measure). Taken together, this suggests that log-normal concentration distributions may imply log-normal distributions also for rates and/or fluxes for certain systems, with implications for, e.g., emission estimation or box models^47,49,59.

All in all, this paper presents a mechanistic model for how log-normal concentrations may emerge in the environment, which in turn suggests an explanation for their relative abundance.
