5.3. Insurance Probability

Objectives: AEP and OEP points; theory of the insurance charge; adjusting severity distributions to achieve selected loss picks by layer in an excess of loss program; theory of the Tweedie distribution; when is severity relevant?

Audience: Actuaries at the Associate or Fellow level.

Prerequisites: Individual risk and retro rating; GLM modeling; Tweedie distributions.

See also: Individual Risk Pricing, Reinsurance Pricing, The tweedie Keyword.

Contents.

5.3.1. Helpful References

  • Klugman et al. [2019]

  • Panjer and Willmot [1992]

  • Mildenhall and Major [2022]

  • Woo [2002]

5.3.2. Occurrence and Aggregate Probable Maximal Loss

5.3.2.1. Probable maximal loss (PML)

Probable maximal loss or PML and the related maximum foreseeable loss (MFL) originated in fire underwriting in the early 1900s. The PML estimates the largest loss that a building is likely to suffer from a single fire if all critical protection systems function as expected. The MFL estimates the largest fire loss likely to occur if loss-suppression systems fail. For a large office building, the PML could be a total loss to 4 to 6 floors, and the MFL could be a total loss within four walls, assuming a single structure burns down. McGuinness [1969] discusses PMLs.

Today, PML is used to quantify potential catastrophe losses. Catastrophe risk is typically managed using reinsurance purchased on an occurrence basis and covering all losses from a single event. Therefore insurers are interested in the annual frequency of events greater than an attachment threshold, leading to the occurrence PML, now known as occurrence exceeding probabilities.

5.3.2.2. Occurrence Exceeding Probability (OEP)

To describe occurrence PMLs, we need to specify the stochastic model used to generate events. It is standard to use a homogeneous Poisson process, with a constant event intensity \(\lambda\) per year. The number of events in time \(t\) has a Poisson distribution with mean \(\lambda t\). If \(X\) is the severity distribution (size of loss conditional on an event) then the number of events per year above size \(x\) has Poisson distribution with mean \(\lambda S(x)\). Therefore the probability of one or more events causing loss \(x\) or more is 1 minus the probability that a Poisson\((\lambda S(x))\) random variable equals zero, which equals \(1-e^{-\lambda S(x)}\). The \(n\) year occurrence PML, \(\mathsf{PML}_{n, \lambda}(X)=\mathsf{PML}_{n, \lambda}\), is the smallest loss \(x\) so that the probability of one or more events causing a loss of \(x\) or more in a year is at least \(1/n\). It can be determined by solving \(1-e^{-\lambda S(\mathsf{PML}_{n, \lambda})}=1/n\), giving

\[\begin{split}S(\mathsf{PML}_{n, \lambda})&=\frac{1}{\lambda}\log\left( \frac{n}{n-1}\right) \\ \implies \mathsf{PML}_{n, \lambda} &= q_X\left( 1 -\frac{1}{\lambda}\log\left( \frac{n}{n-1}\right) \right)\end{split}\]

(if \(S(x)=s\) then \(F(x)=1-s\) and \(x=q_X(1-s)=\mathsf{VaR}_{1-s}(X)\)). Thus, the occurrence PML is a quantile of severity at an adjusted probability level, where the adjustment depends on \(\lambda\).

Converting to non-exceedance probabilities, if \(p=1-1/n\) (close to 1) then \(n/(n-1)=1/p\) and we obtain a relationship between the occurrence PML and severity VaR:

\[\mathsf{PML}_{n, \lambda} = q_X\left( 1 +\frac{\log(p)}{\lambda} \right) =\mathsf{VaR}_{1+\log(p)/\lambda}(X)\]

Catastrophe models output a sample of \(N\) loss events, each with an associated annual frequency \(\lambda_i\) and an expected loss \(x_i\), \(i=1,\dots,N\). Each event is assumed to have a Poisson occurrence frequency distribution. The associated severity distribution is concentrated on the set \(\{x_1,\dots,x_N\}\) with \(\mathsf{Pr}(X=x_i)=\lambda_i/\lambda\), where \(\lambda=\sum_i \lambda_i\) is the expected annual event frequency. It is customary to fit or smooth \(X\) to get a continuous distribution, resulting in unique quantiles.
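The occurrence PML calculation can be sketched in a few lines of numpy for a small, hypothetical event loss table; the event losses and frequencies below are purely illustrative.

import numpy as np

# hypothetical event loss table: event loss x_i with annual frequency lam_i (illustrative values)
x = np.array([10., 50., 200., 500.])
lam_i = np.array([0.10, 0.02, 0.005, 0.001])
lam = lam_i.sum()                                  # expected annual event frequency

# discrete severity: Pr(X = x_i) = lam_i / lam; sort losses and build the cdf
order = np.argsort(x)
x, F = x[order], np.cumsum(lam_i[order] / lam)

def occurrence_pml(n):
    # n-year occurrence PML = q_X(1 - log(n/(n-1)) / lam): smallest x_i at that probability level
    q = 1 - np.log(n / (n - 1)) / lam
    return x[np.searchsorted(F, q)]

print(occurrence_pml(100), occurrence_pml(250))    # 50.0, 200.0 for these illustrative inputs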

5.3.2.3. Return Periods

VaR points are often quoted by return period, such as a 100 or 250 year loss, rather than by probability level. By definition, the exceedance probability \(\mathsf{Pr}(X > \mathsf{VaR}_p(X))\) of \(p\)-VaR is less than or equal to \(1-p\), meaning at most a \(1-p\) probability per year. If years are independent, then the average waiting time to an exceedance is at least \(1/(1-p)\). (The waiting time has a geometric distribution with success probability \(q=1-p\). The average wait time is \(q + 2pq + 3p^2q+\cdots=q(1+2p+3p^2+\cdots)=1/q\).)

Standard return periods and their probability representation are shown below.

VaR threshold \(p\) | Exceedance probability \(1-p\) | Return period \(1/(1-p)\) | Applications
0.99 | 0.01 | 100 years |
0.995 | 0.005 | 200 years | Solvency II
0.996 | 0.004 | 250 years | AM Best, S&P, RBC
0.999 | 0.001 | 1,000 years |

In a Poisson model, the waiting time between events with a frequency of \(\lambda\) has an exponential distribution with mean \(1/\lambda\). Thus, an event with frequency 0.01 is often quoted as having a 100 year return period. Notice, however, the distinction between the chance of no events in a year and the waiting time until the next event. If \(\lambda\) is large, say 12 (one event per month on average), the chance of no events in a year equals \(\exp(-12)=6.1\times 10^{-6}\), whereas the return period is only one month. For small \(\lambda\) there is very little difference between the two since the probability of one or more events equals \(1-\exp(-\lambda)\approx \lambda\).

To reiterate the definition above, when \(X\) represents aggregate annual losses, the statement \(x=\mathsf{VaR}_{0.99}(X)\), \(p=0.99\) means

  • \(x\) is the smallest loss for which \(X\le x\) with an annual probability of at least \(0.99\), or

  • \(x\) is the smallest loss with an annual probability at most \(0.01\) of being exceeded.

5.3.2.4. Aggregate Exceeding Probability (AEP)

Severity VaR (quantile) and occurrence PML are distinct but related concepts. However, aggregate PML or aggregate exceeding probability is often used as a synonym for aggregate VaR, i.e., VaR of the aggregate loss distribution.

Let \(A\) equal the annual aggregate loss random variable. \(A\) has a compound Poisson distribution with expected annual frequency \(\lambda\) and severity random variable \(X\). \(X\) is usually thick tailed. Then, as we explain shortly,

\[\mathsf{VaR}_p(A) \approx \mathsf{VaR}_{1-(1-p)/\lambda}(X).\]

This equation is a relationship between aggregate and severity VaRs.

We can sometimes estimate aggregate VaRs in terms of occurrence PMLs with no simulation. For large \(n\) and a thick-tailed \(X\), occurrence PMLs and aggregate VaRs contain the same information—there is no more information in the aggregate, as is sometimes suggested. The approximation follows from the equation

\[\mathsf{Pr}(X_1+\cdots +X_n >x) \sim n\mathsf{Pr}(X>x)\ \text{as}\ x\to\infty\]

for all \(n\), which holds when \(X\) is sufficiently thick tailed (subexponential). See Embrechts et al. [1997], Corollary 1.3.2 for the details. Averaging over the Poisson claim count gives \(\mathsf{Pr}(A>x)\approx \lambda\mathsf{Pr}(X>x)=\lambda S(x)\) for large \(x\); setting the right side equal to \(1-p\) and solving for \(x\) gives the approximation.
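A quick simulation sketch, with a Poisson frequency and a (shifted) Pareto severity whose parameters are purely illustrative, compares the simulated aggregate VaR with the severity VaR on the right-hand side of the approximation.

import numpy as np

rng = np.random.default_rng(1)
lam, alpha, years = 2.0, 1.8, 250_000      # frequency, Pareto shape, simulated years (illustrative)

# simulate annual aggregates A = X_1 + ... + X_N with S_X(x) = x**(-alpha) for x >= 1
N = rng.poisson(lam, years)
A = np.array([(rng.pareto(alpha, k) + 1).sum() for k in N])

p = 0.999
agg_var = np.quantile(A, p)                # aggregate VaR by simulation
sev_var = ((1 - p) / lam) ** (-1 / alpha)  # VaR_{1-(1-p)/lam}(X) in closed form
print(agg_var, sev_var)                    # similar magnitudes at high return periods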

5.3.3. Self-Insurance Plan Stop-Loss Insurance

Self-insurance plans often purchase per occurrence (specific) insurance, to limit the amount from any one loss that flows into the plan, and aggregate stop-loss insurance, to limit their aggregate liability over all occurrences in a year. Retro rating plans need to estimate the insurance charge for the aggregate cover. It is a function of the expected loss, the specific loss limit, and the aggregate retention. They sometimes also want to know the insurance savings, a credit for losses below a minimum. Tables tabulating insurance savings and charges are called Table L (California) or Table M (rest of the US). The two differ in the denominator: limited or unlimited losses.

Let \(X\) denote unlimited severity, \(N\) annual frequency, \(l\) the occurrence limit and \(a\) the aggregate retention of limited losses. The distribution of gross aggregate losses is given by

\[A_g := X_1 + \cdots + X_N.\]

Aggregate losses retained by the plan, reflecting the specific but not the aggregate insurance, are a function of \(l\) and of \(n:=\mathsf E[N]\), the expected ground-up claim count, with distribution

\[A(n, l) := (X_1 \wedge l) + \cdots + (X_N \wedge l).\]

Aggregate limits are expressed in terms of the entry ratio \(r\), which we define as the ratio

\[r = \frac{a}{\mathsf E[A(n,l)]}\]

of the aggregate limit to expected losses net of specific insurance. Therefore, the aggregate retention equals

\[a = r\mathsf E[A(n, l)] = rn\mathsf E[X_1 \wedge l].\]

The insurance charge is

\[\begin{split}\phi(r):&= \frac{\mathsf E\left[(A(n, l) - r\mathsf E[A(n,l)])^+\right]}{\mathsf E[A(n,l)]} \\ &=\frac{\left(\mathsf E\left[A(n, l) \mid A(n, l) > r\mathsf E[A(n,l)]\right] - r\mathsf E[A(n,l)]\right)\, S_{A(n, l)}(r\mathsf E[A(n,l)])}{\mathsf E[A(n,l)]},\end{split}\]

where \(S_{A(n, l)}(\cdot)\) is the survival function of \(A(n,l)\). The aggregate protection loss cost equals \(\phi(r)\mathsf E[A(n,l)]\). The insurance savings equals

\[\begin{split}\psi(r):&= \frac{\mathsf E\left[(r\mathsf E[A(n,l)] - A(n, l))^+\right]}{\mathsf E[A(n,l)]} \\ &= \frac{\left(r\mathsf E[A(n,l)] - \mathsf E\left[A(n, l) \mid A(n, l) \le r\mathsf E[A(n,l)]\right]\right)\, F_{A(n, l)}(r\mathsf E[A(n,l)])}{\mathsf E[A(n,l)]},\end{split}\]

where \(F_{A(n, l)}(\cdot)\) is the cdf of \(A(n,l)\).

With this notation, a retro program with maximum entry ratio \(r_1\) and minimum \(r_0\) has a net insurance charge (ignoring expenses and the loss conversion factor) equal to

\[(\phi(r_1) - \psi(r_0)) n\mathsf E[X_1 \wedge l].\]

The charge and savings are illustrated below. Losses are scaled by expected (limited) losses in the figure, so the area under the blue curve equals 1. The graph is a Lee diagram, plotting \(x\) against \(F(x)\).

In [1]: from aggregate.extensions.figures import savings_charge

In [2]: savings_charge();
../_images/ir_savings_exp.png

The figure makes the put-call parity relationship (savings plus one equals entry ratio plus charge) obvious:

\[\psi(r) + 1 = r + \phi(r).\]

Remember \(r\) is the area under the horizontal line because the width of the plot equals 1. Taking \(r=1\) in put-call parity shows that \(\psi(1)=\phi(1)\): at expected losses, the savings equals the charge.
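The charge and savings are also easy to estimate by simulation. The sketch below uses a Poisson frequency with a lognormal severity (all parameters illustrative) and confirms put-call parity numerically.

import numpy as np

rng = np.random.default_rng(2)
sims, n, l = 100_000, 50, 100_000          # simulations, expected claim count, occurrence limit (illustrative)
mu, sigma = 9.0, 1.75                      # lognormal severity parameters (illustrative)

# aggregate losses net of specific insurance: sum of X_i limited to l
N = rng.poisson(n, sims)
A = np.array([np.minimum(rng.lognormal(mu, sigma, k), l).sum() for k in N])
EA = A.mean()

def charge(r):                             # phi(r) = E[(A - r E[A])^+] / E[A]
    return np.maximum(A - r * EA, 0).mean() / EA

def savings(r):                            # psi(r) = E[(r E[A] - A)^+] / E[A]
    return np.maximum(r * EA - A, 0).mean() / EA

r = 1.2
print(savings(r) + 1, r + charge(r))       # equal: put-call parity
print(charge(1.0), savings(1.0))           # equal at r = 1

For the empirical distribution the parity holds exactly by construction, since the difference of the two positive parts telescopes to \(\mathsf E[A] - r\mathsf E[A]\).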

5.3.4. Adjusting Layer Loss Picks

Reinsurance actuaries apply experience and exposure rating to excess of loss programs. Experience rating trends and develops layer losses to estimate loss costs. Exposure rating starts with a (ground-up) severity curve; in the US, these are often published by a rating agency (ISO, NCCI). It then applies a limits profile and uses differences of ILFs with a ground-up loss ratio to estimate layer losses. The actuary then selects a loss cost by layer based on the two methods. When the selection differs from the exposure rate, the actuary no longer has a well-defined stochastic model for the business. In this section we show how to adjust the severity curve to match the selected loss picks by layer. The adjusted curve can then be used in a stochastic model that replicates the layer loss selections.

Layer severity equals the integral of the survival function over the layer, and layer expected losses equal layer frequency times layer severity. The easiest way to adjust a single layer is to scale its frequency. That simple approach fails when there are multiple layers, because higher layer frequency impacts lower layers. We are led to adjust the survival function in each layer to hit all the selected layer loss picks. The method described next creates a legitimate, non-increasing survival function and retains its continuity properties whenever possible. It is easy to select inconsistent layer losses that produce negative probabilities or probabilities greater than 1; when such inconsistencies occur the selections must be altered.

Here is the layer adjustment process. Adjustments to higher layers impact all lower layers because they change the probability of limit losses. The approach is to start from the top-most layer, determine its adjustment, and then take the impact of that adjustment into account in the next layer down, and so forth. The adjusted severity curve maintains the shape of the original curve, and its continuity properties, conditional on a loss in each layer.

To make these ideas rigorous requires a surprising amount of notation. Define

  • Specify layer attachment points \(0=a_0 < a_1 < a_2 < \cdots < a_n\) and corresponding layer limits \(y_i = a_i - a_{i-1}\) for \(i=1,2, \dots, n\). The layers are \(y_i\) excess \(a_{i-1}\).

  • \(l_i = \mathsf{LEV}(a_i) - \mathsf{LEV}(a_{i-1}) = \int_{a_{i-1}}^{a_i} S(x)dx = \mathsf{E}[ (X-a_{i-1})^+ \wedge y_i ]\) equals the unconditional expected layer loss (per ground-up claim).

  • \(p_i = \Pr(a_{i-1} < X \le a_i) = S(a_{i-1}) - S(a_i)\) equals the probability of a loss in the layer, excluding the mass at the limit.

  • \(e_i = y_iS(a_i)\) equals the part of \(l_i\) from full limit losses.

  • \(f_i = a_{i-1}p_i\)

  • \(m_i = \int_{a_{i-1}}^{a_i} xdF(x) - f_i = \int_{a_{i-1}}^{a_i} (x-a_{i-1})dF(x) = l_i - e_i\) equals the part of \(l_i\) from losses in the layer.

  • \(t_i\) are selected unconditional expected losses by layer. \(t_i=l_i\) results in no adjustment. \(t_i\) is computed by dividing the layer loss pick by the expected number of ground-up claims.

Integration by parts gives

\[\begin{split}\int_{a_{i-1}}^{a_i} S(x)dx &= xS(x)\,\big\vert_{a_{i-1}}^{a_i} + \int_{a_{i-1}}^{a_i} x dF(x) \\ &= y_iS(a_i) + \int_{a_{i-1}}^{a_i} (x - a_{i-1}) dF(x) \\ &= e_i + m_i.\end{split}\]

These quantities are illustrated in the next figure.

In [1]: from aggregate.extensions.figures import adjusting_layer_losses

In [2]: adjusting_layer_losses();
../_images/picks.png

There is no adjustment to \(S\) for \(x\ge a_n\). In the top layer, adjust to \(\tilde S(x) = S(a_n) + w_n(S(x) - S(a_n))\), so

\[\begin{split}t_n &= \int_{a_{n-1}}^{a_n} \tilde S(x)dx \\ &= S(a_n)y_n + w_n(l_n - e_n) \\ &= \omega_n y_n + w_nm_n \\ \implies w_n &= \frac{t_n - \omega_n y_n}{m_n},\end{split}\]

where \(\omega_n=S(a_n)\). Set \(\omega_i = \omega_{i+1} + w_{i+1} p_{i+1}\) and \(\tilde S(x) = \omega_i + w_i(S(x) - S(a_i))\) in the \(i\)th layer, \(a_{i-1} < x \le a_i\). We can compute all the weights by proceeding down the tower:

\[\begin{split}t_i &= \int_{a_{i-1}}^{a_i} \tilde S(x)dx \\ &= \omega_i y_i + w_i(l_i - e_i) \\ \implies w_i &= \frac{t_i - \omega_i y_i}{m_i}.\end{split}\]

\(\tilde S\) is continuous if \(S\) is, because of the definition of \(\omega_i\) at the layer boundaries. When \(x=a_{i-1}\), \(\tilde S(a_{i-1}) = \omega_i + w_i(S(a_{i-1}) - S(a_i)) = \omega_i + w_ip_i = \omega_{i-1}\), the constant used in layer \(i-1\).

The function utilities.picks_work computes the adjusted severity. In debug mode, it returns useful layer information. A severity can be adjusted on-the-fly by Aggregate using the picks keyword after the severity specification and before any occurrence reinsurance.
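The sketch below implements the same top-down recursion directly with numpy and scipy for a lognormal severity. It is an independent illustration of the formulas above, not a call into utilities.picks_work, and the attachments and selected picks are illustrative.

import numpy as np
from scipy import stats
from scipy.integrate import quad

sev = stats.lognorm(1.5, scale=50)                 # ground-up severity (illustrative)
a = np.array([0., 100., 250., 500.])               # attachments a_0 < a_1 < ... < a_n
S, y = sev.sf, np.diff(a)                          # survival function and layer limits y_i

l = np.array([quad(S, lo, hi)[0] for lo, hi in zip(a[:-1], a[1:])])   # layer severities l_i
p = S(a[:-1]) - S(a[1:])                           # probability of a loss in the layer
e = y * S(a[1:])                                   # full-limit part of l_i
m = l - e                                          # in-layer part of l_i

t = l * np.array([1.0, 1.1, 0.9])                  # selected picks: +10% layer 2, -10% layer 3 (illustrative)

# work down the tower: omega_n = S(a_n), w_i = (t_i - omega_i y_i) / m_i, omega_{i-1} = omega_i + w_i p_i
w, omega = np.zeros_like(y), np.zeros_like(y)
om = S(a[-1])
for i in range(len(y) - 1, -1, -1):
    omega[i] = om
    w[i] = (t[i] - om * y[i]) / m[i]
    om += w[i] * p[i]

print(w, omega)   # adjusted survival on (a[i], a[i+1]] is omega[i] + w[i] * (S(x) - S(a[i+1]))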

5.3.5. The Tweedie Distribution

The Tweedie distribution is a Poisson mixture of gammas. It is an exponential family distribution [Jørgensen, 1997]. Tweedie distributions are a suitable model for pure premiums and are used as unit distributions in GLMs [McCullagh and Nelder, 2019]. Tweedie distributions do not have a closed form density, but estimating the density is easy using aggregate.

The Tweedie family of distributions is a three-parameter exponential family. A variable \(X \sim \mathrm{Tw}_p(\mu, \sigma^2)\) when \(\mathsf E[X] = \mu\) and \(\mathsf{Var}(X) = \sigma^2 \mu^p\). Here \(p\) is a shape (power) parameter and \(\sigma^2>0\) is a scale parameter called the dispersion; the range \(1 \le p \le 2\) covers the cases of insurance interest.

A Tweedie with \(1<p<2\) is a compound Poisson distribution with gamma distributed severities. The limit when \(p=1\) is an over-dispersed Poisson and when \(p=2\) is a gamma. More generally: \(\mathsf{Tw}_0(\mu,\sigma^2)\) is normal \((\mu, \sigma^2)\), \(\mathsf{Tw}_1(\mu, \sigma^2)\) is over-dispersed Poisson \(\sigma^2\mathsf{Po}(\mu/\sigma^2)\), and \(\mathsf{Tw}_2(\mu,\sigma^2)\) is a gamma with CV \(\sigma\).

Let \(\mathsf{Ga}(\alpha, \beta)\) denote a gamma with shape \(\alpha\) and scale \(\beta\), with density \(f(x;\alpha,\beta)=x^{\alpha-1} e^{-x/\beta} / (\beta^\alpha \Gamma(\alpha))\). It has mean \(\alpha\beta\), variance \(\alpha\beta^2\), expected square \(\alpha(\alpha+1)\beta^2\), and coefficient of variation \(1/\sqrt\alpha\). We can define an alternative parameterization \(\mathsf{Tw}^*(\lambda, \alpha, \beta) = \mathsf{CP}(\lambda, \mathsf{Ga}(\alpha,\beta))\) as a compound Poisson of gammas, with expected frequency \(\lambda\).

The dictionary between the two parameterizations relies on the relation between the two shape parameters \(\alpha\) and \(p\) given by

\[\alpha = \frac{2-p}{p-1}, \qquad p = \frac{2+\alpha}{1+\alpha}.\]

Starting from \(\mathrm{Tw}_p(\mu, \sigma^2)\): \(\lambda = \displaystyle\frac{\mu^{2-p}}{(2-p)\sigma^2}\) and \(\beta = (p-1)\sigma^2\mu^{p-1} = \mu /(\lambda \alpha)\).

Starting from \(\mathsf{Tw}^*(\lambda, \alpha, \beta)\): \(\mu = \lambda \alpha \beta\) and \(\sigma^2 = \lambda \alpha(\alpha + 1)\beta^2 / \mu^p\), by equating expressions for the variance.

It is easy to convert from the gamma mean \(m\) and CV \(\nu\) to \(\alpha=1/\nu^2\) and \(\beta = m/\alpha\). Remember, scipy.stats scale equals \(\beta\).
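The parameter dictionary is easy to code. The helper functions below are a standalone sketch (not part of aggregate) that converts between the two forms and round-trips illustrative values.

def tweedie_to_cp(mu, sigma2, p):
    """Tw_p(mu, sigma2) -> compound Poisson-gamma (lam, alpha, beta), beta a scale parameter."""
    lam = mu ** (2 - p) / ((2 - p) * sigma2)
    alpha = (2 - p) / (p - 1)
    beta = mu / (lam * alpha)          # equals (p - 1) * sigma2 * mu ** (p - 1)
    return lam, alpha, beta

def cp_to_tweedie(lam, alpha, beta):
    """Compound Poisson-gamma (lam, alpha, beta) -> Tw_p(mu, sigma2)."""
    p = (2 + alpha) / (1 + alpha)
    mu = lam * alpha * beta
    sigma2 = lam * alpha * (alpha + 1) * beta ** 2 / mu ** p
    return mu, sigma2, p

lam, alpha, beta = tweedie_to_cp(10, 0.4, 1.5)   # illustrative mean, dispersion, and power
print(lam, alpha, beta)
print(cp_to_tweedie(lam, alpha, beta))           # recovers (10.0, 0.4, 1.5)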

Tweedie distributions are mixed: they have a probability mass of \(p_0 =e^{-\lambda}\) at 0 and are continuous on \((0, \infty)\).

Jørgensen calls \(\mathsf{Tw}^*(\lambda, \alpha, \beta)\) the additive form of the model because

\[\sum_i \mathsf{Tw}^*(\lambda_i, \alpha, \beta) = \mathsf{Tw}^*\left(\sum_i \lambda_i, \alpha, \beta\right).\]

He calls \(\mathsf{Tw}_p(\mu, \sigma^2)\) the reproductive exponential dispersion model. If \(X_i\sim \mathsf{Tw}_p(\mu, \sigma^2/w_i)\) then

\[\frac{1}{w}\sum_i w_i X_i \sim \mathsf{Tw}_p\left(\mu, \frac{\sigma^2}{w}\right)\]

where \(w = \sum_i w_i\). The weights \(w_i\) represent volume in cell \(i\) and \(X_i\) represents the pure premium in cell \(i\). The sum on the left is the total pure premium.

The next diagram shows how the Tweedie family fits within the broader power variance exponential family of distributions. See the blog post The Tweedie-Power Variance Function Family for more details.

In [1]: from aggregate.extensions.figures import power_variance_family

In [2]: power_variance_family()
../_images/tweedie_powervariance.png

5.3.6. Excess Frequency Distributions

Given a ground-up claim count distribution \(N\), what is the distribution of the number of claims exceeding a certain threshold? We assume that severities are independent and identically distributed and that the probability of exceeding the threshold is \(q\). Define an indicator variable \(I\) which takes the value 0 if the claim is below the threshold and the value 1 if it exceeds the threshold. Thus \(\mathsf{Pr}(I=0)=p=1-q\) and \(\mathsf{Pr}(I=1)=q\). Let \(M_N\) be the moment generating function of \(N\) and let \(N'\) be the number of claims in excess of the threshold. By definition we can express \(N'\) as an aggregate

\[N'=I_1 + \cdots + I_N.\]

Thus the moment generating function of \(N'\) is

\[\begin{split}M_{N'}(\zeta) &=M_N(\log(M_I(\zeta))) \\ &=M_N(\log(p+qe^{\zeta}))\end{split}\]

Using indicator variables \(I\) is called \(p\)-thinning by Grandell [1997].

Here are some examples.

Let \(N\) be Poisson with mean \(n\). Then

\[M_{N'}(\zeta) = \exp(n(p+qe^{\zeta}-1)) = \exp(qn(e^{\zeta}-1))\]

so \(N'\) is also Poisson with mean \(qn\)—the simplest possible result.

Next let \(N\) be a \(G\)-mixed Poisson. Thus

\[\begin{split}M_{N'}(\zeta) &= M_N(\log(p+qe^{\zeta})) \\ &= M_G(n(p+qe^{\zeta}-1)) \\ &= M_G(nq(e^{\zeta}-1)).\end{split}\]

Hence \(N'\) is also a \(G\)-mixed Poisson with lower underlying claim count \(nq\) in place of \(n\).

In particular, if \(N\) has a negative binomial distribution with parameters \(P\) and \(c\) (mean \(P/c\), \(Q=1+P\), moment generating function \(M_N(\zeta)=(Q-Pe^{\zeta})^{-1/c}\)), then \(N'\) has parameters \(qP\) and \(c\). If \(N\) has a Poisson-inverse Gaussian distribution with parameters \(\mu\) and \(\beta\), so

\[M_N(\zeta)=\exp\left(-\frac{\mu}{\beta}\left(\sqrt{1+2\beta(1-e^{\zeta})}-1\right)\right),\]

then \(N'\) is also Poisson-inverse Gaussian with parameters \(\mu q\) and \(\beta q\).

In all cases the variance of \(N'\) is lower than the variance of \(N\) and \(N'\) is closer to Poisson than \(N\) in the sense that the variance to mean ratio has decreased. For the general \(G\)-mixed Poisson the ratio of variance to mean decreases from \(1+cn\) to \(1+cqn\). As \(q\to 0\) the variance to mean ratio approaches \(1\) and \(N'\) approaches a Poisson distribution. The fact that \(N'\) becomes Poisson is called the law of small numbers.
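The sketch below simulates \(p\)-thinning for a gamma-mixed Poisson (negative binomial) claim count with illustrative parameters and shows the variance-to-mean ratio falling from \(1+cn\) toward 1.

import numpy as np

rng = np.random.default_rng(3)
n, c, q, sims = 100, 0.3, 0.05, 500_000    # expected claims, mixing variance, exceedance probability (illustrative)

G = rng.gamma(1 / c, c, sims)              # mixing distribution: mean 1, variance c
N = rng.poisson(n * G)                     # gamma-mixed Poisson, i.e. negative binomial
Nprime = rng.binomial(N, q)                # p-thinning: keep each claim independently with probability q

print(N.mean(), N.var() / N.mean())                  # about 100 and 1 + c n = 31
print(Nprime.mean(), Nprime.var() / Nprime.mean())   # about 5 and 1 + c q n = 2.5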

5.3.7. When Is Severity Irrelevant?

In some cases the actual form of the severity distribution is essentially irrelevant to the shape of the aggregate distribution. Consider an aggregate with a \(G\)-mixed Poisson frequency distribution. If the expected claim count \(n\) is large and if the severity is tame (roughly tame means bounded or has a log concave density; a policy with a limit has a tame severity; unlimited workers compensation or cat losses may not be tame) then particulars of the severity distribution diversify away in the aggregate. Moreover, the variability from the Poisson claim count component also diversifies away, and the shape of the aggregate distribution converges to the shape of the frequency mixing distribution \(G\). Another way of saying the same thing is that the normalized distribution of aggregate losses (aggregate losses divided by expected aggregate losses) converges in distribution to \(G\).

We can prove these assertions using moment generating functions. Let \(X_n\) be a sequence of random variables with distribution functions \(F_n\) and let \(X\) be another random variable with distribution \(F\). If \(F_n(x)\to F(x)\) as \(n\to\infty\) for every point of continuity of \(F\), then we say \(F_n\) converges weakly to \(F\) and that \(X_n\) converges in distribution to \(X\).

Convergence in distribution is a relatively weak form of convergence. A stronger form is convergence in probability, which means for all \(\epsilon>0\) \(\mathsf{Pr}(|X_n-X|>\epsilon)\to 0\) as \(n\to\infty\). If \(X_n\) converges to \(X\) in probability then \(X_n\) also converges to \(X\) in distribution. The converse is false. For example, let \(X_n=Y\) and \(X=1-Y\) be binomial 0/1 random variables with \(\mathsf{Pr}(Y=1)=\mathsf{Pr}(X=1)=1/2\). Then \(X_n\) converges to \(X\) in distribution. However, since \(\mathsf{Pr}(|X-Y|=1)=1\), \(X_n\) does not converge to \(X\) in probability.

It is a fact that \(X_n\) converges in distribution to \(X\) if the MGFs \(M_n\) of \(X_n\) converge to the MGF \(M\) of \(X\) for all \(t\): \(M_n(t)\to M(t)\) as \(n\to\infty\). See Feller [1971] for more details. We can now prove the following result.

Proposition. Let \(N\) be a \(G\)-mixed Poisson distribution with mean \(n\), where \(G\) has mean 1 and variance \(c\), and let \(X\) be an independent severity with mean \(x\) and coefficient of variation \(\gamma\), so \(\mathsf E[X^2]=x^2(1+\gamma^2)\). Let \(A=X_1+\cdots+X_N\) and \(a=nx\). Then \(A/a\) converges in distribution to \(G\), so

\[\mathsf{Pr}(A/a < \alpha) \to \mathsf{Pr}(G < \alpha)\]

as \(n\to\infty\). Hence

\[\sigma(A/a) = \sqrt{c + \frac{x(1+\gamma^2)}{a}}\to\sqrt{c}.\]

Proof. We know

\[M_A(\zeta)= M_G(n(M_X(\zeta)-1))\]

and so using Taylor’s expansion we can write

\[\begin{split}\lim_{n\to\infty} M_{A/a}(\zeta) &= \lim_{n\to\infty} M_A(\zeta/a) \\ &= \lim_{n\to\infty} M_G(n(M_X(\zeta/nx)-1)) \\ &= \lim_{n\to\infty} M_G(n(M_X'(0)\zeta/nx+R(\zeta/nx))) \\ &= \lim_{n\to\infty} M_G(\zeta+nR(\zeta/nx)) \\ &= M_G(\zeta)\end{split}\]

for some remainder function \(R(t)=O(t^2)\). Note that the assumptions on the mean and variance of \(X\) guarantee \(M_X'(0)=x=\mathsf{E}[X]\) and that the remainder term in Taylor’s expansion actually is \(O(t^2)\). The second part is trivial.

The proposition implies that if the frequency distribution is actually Poisson, so the mixing distribution \(G=1\) with probability 1, then the loss ratio distribution of a very large book will tend to the distribution concentrated at its expected value, hence the expression that “with no parameter risk the process risk completely diversifies away.”

The next figure illustrates the proposition, showing how aggregates change shape as expected counts increase.

In [1]: from aggregate.extensions import mixing_convergence

In [2]: mixing_convergence(0.25, 0.5)
../_images/tr_prob_convg.png

On the top, \(G=1\) and the claim count is Poisson. Here the scaled distributions get more and more concentrated about the expected value (scaled to 1.0). Notice that the density peaks (left) are getting further apart as the claim count increases. The distribution (right) is converging to a Dirac delta step function at 1.

On the bottom, \(G\) has a gamma distribution with variance \(0.0625\) (asymptotic CV of 25%). The density peaks are getting closer, converging to the mixing gamma. The scaled aggregate distributions converge to \(G\) (thick line, right).

It is also interesting to compute the correlation between \(A\) and \(G\). We have

\[\begin{split}\mathsf{cov}(A,G) &= \mathsf{E}[AG]-\mathsf{E}[A]\mathsf{E}[G] \\ &= \mathsf{E}\mathsf{E}[AG \mid G] - nx \\ &= \mathsf{E}[nxG^2] - nx \\ &= nxc,\end{split}\]

and therefore

\[\mathsf{corr}(A,G)=\frac{nxc}{\sqrt{nx^2(1+\gamma^2) + cn^2x^2}\,\sqrt{c}}\to 1\]

as \(n\to\infty\).
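The convergence (and the formula for \(\sigma(A/a)\)) can be checked by simulation. The sketch below uses gamma mixing with \(c = 0.0625\), matching the figure, and a lognormal severity chosen only for illustration.

import numpy as np

rng = np.random.default_rng(4)
c, sims = 0.0625, 20_000                  # mixing variance (asymptotic CV 25%) and simulation count

for n in [10, 100, 1000]:
    G = rng.gamma(1 / c, c, sims)         # mixing distribution G: mean 1, variance c
    N = rng.poisson(n * G)
    A = np.array([rng.lognormal(0, 1, k).sum() for k in N])
    a = n * np.exp(0.5)                   # expected aggregate a = n E[X]; E[lognormal(0,1)] = exp(1/2)
    print(n, (A / a).std())               # decreases toward sqrt(c) = 0.25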

The proposition shows that in some situations severity is irrelevant to large books of business. However, it is easy to think of examples where severity is very important, even for large books of business. For example, severity becomes important in excess of loss reinsurance when it is not clear whether a loss distribution effectively exposes an excess layer. There, the difference in severity curves can amount to the difference between substantial loss exposure and none. The proposition does not say that any uncertainty surrounding the severity distribution diversifies away; it is only true when the severity distribution is known with certainty. As is often the case with risk management metrics, great care needs to be taken when applying general statements to particular situations!