5.5. Distortions and Spectral Risk Measures

Objectives: Introduce distortion functions and spectral risk measures.

Audience: Readers looking for a deeper technical understanding.

Prerequisites: Knowledge of probability and calculus; insurance terminology.

See also: Distortion Class Calculations.

Contents:

5.5.1. Helpful References

  • Mildenhall and Major [2022]

  • The text in this section is derived from Major and Mildenhall [2020].

  • Mildenhall [2022]

5.5.2. Distortion Function and Spectral Risk Measures

We define SRMs and recall results describing their different representations. By De Waegenaere et al. [2003] SRMs are consistent with general equilibrium and so it makes sense to consider them as pricing functionals. The SRM is interpreted as the (ask) price for an insurer-written risk transfer.

Definition. A distortion function is an increasing concave function \(g:[0,1]\to [0,1]\) satisfying \(g(0)=0\) and \(g(1)=1\).

A spectral risk measure \(\rho_g\) associated with a distortion \(g\) acts on a non-negative random variable \(X\) as

\[\rho_g(X) = \int_0^\infty g(S(x))dx.\]

The simplest distortion is the identity \(g(s)=s\). Then \(\rho_g(X)=\mathsf E[X]\) from the integration-by-parts identity

\[\int_0^\infty S(x)\,dx = \int_0^\infty x\,dF(x).\]

Other well-known distortions include the proportional hazard \(g(s)=s^r\) for \(0<r\le 1\), its dual \(g(s)=1-(1-s)^r\) for \(r\ge 1\), and the Wang transform \(g(s)=\Phi(\Phi^{-1}(s)+\lambda)\) for \(\lambda \ge 0\), Wang [1995].

Since \(g\) is concave, \(g(s)=g((1-s)\cdot 0 + s\cdot 1)\ge (1-s)g(0) + sg(1)=s\) for all \(0\le s\le 1\), showing \(\rho_g\) adds a non-negative margin.
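These distortions are easy to compute directly. A minimal sketch (the function names and default parameters are our own choices, not the aggregate API):

```python
import numpy as np
from scipy.stats import norm

def proportional_hazard(s, r=0.5):
    # g(s) = s^r for 0 < r <= 1
    return s ** r

def dual_moment(s, r=2.0):
    # g(s) = 1 - (1 - s)^r for r >= 1
    return 1 - (1 - s) ** r

def wang(s, lam=0.25):
    # g(s) = Phi(Phi^{-1}(s) + lambda) for lambda >= 0
    return norm.cdf(norm.ppf(s) + lam)

s = np.linspace(0, 1, 101)
for g in (proportional_hazard, dual_moment, wang):
    gs = g(s)
    assert gs[0] == 0 and abs(gs[-1] - 1) < 1e-12   # g(0) = 0, g(1) = 1
    assert np.all(gs >= s - 1e-12)                  # concavity: g(s) >= s, a margin
```

Each function maps survival probabilities to risk-adjusted probabilities, lying on or above the diagonal.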

Going forward, \(g\) is a distortion and \(\rho\) is its associated distortion risk measure. We interpret \(\rho\) as a pricing functional and refer to \(\rho(X)\) as the price or premium for insurance on \(X\).

SRMs are translation invariant, monotonic, subadditive, and positive homogeneous, and hence coherent, Acerbi [2002]. In addition, they are law invariant and comonotonic additive. In fact, all such functionals are SRMs. As well as having these properties, SRMs are powerful because we have a complete understanding of their representation and structure, which we summarize in the following theorem.

Theorem. Subject to \(\rho\) satisfying certain continuity assumptions, the following are equivalent.

  1. \(\rho\) is a law invariant, coherent, comonotonic additive risk measure.

  2. \(\rho=\rho_g\) for a concave distortion \(g\).

  3. \(\rho\) has a representation as a weighted average of TVaRs for a measure \(\mu\) on \([0,1]\): \(\rho(X)=\int_0^1 \mathsf{TVaR}_p(X)\mu(dp)\).

  4. \(\rho(X)=\max_{\mathsf Q\in\mathscr{Q}} \mathsf E_{\mathsf Q}[X]\) where \(\mathscr{Q}\) is the set of (finitely) additive measures with \(\mathsf Q(A)\le g(\mathsf P(A))\) for all measurable \(A\).

  5. \(\rho(X)=\max_{\mathsf Z\in\mathscr{Z}} \mathsf E[XZ]\) where \(\mathscr{Z}\) is the set of positive functions on \(\Omega\) satisfying \(\int_p^1 q_Z(t)dt \le g(1-p)\), and \(q_Z\) is the quantile function of \(Z\).

The Theorem combines results from Föllmer and Schied [2011] (4.79, 4.80, 4.93, 4.94, 4.95), Delbaen [2000], Kusuoka [2001], and Carlier and Dana [2003]. It requires that \(\rho\) is continuous from above to rule out the possibility \(\rho=\sup\). In certain situations, the \(\sup\) risk measure applied to an unbounded random variable can only be represented as a \(\sup\) over a set of test measures and not a max. Note that the roles of continuity from above and from below are swapped relative to Föllmer and Schied [2011] because they use the asset, negative is bad, sign convention whereas we use the actuarial, positive is bad, convention.

The relationship between \(\mu\) and \(g\) is given by Föllmer and Schied [2011] 4.69 and 4.70. The elements of \(\mathscr Z\) are the Radon-Nikodym derivatives of the measures in \(\mathscr Q\).
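As a sanity check on the TVaR representation, a short sketch (distribution and parameters our own) confirming that the distortion \(g(s)=\min(s/(1-p),1)\) reproduces \(\mathsf{TVaR}_p\) for an exponential risk, where the closed form is \(q_p+\mu\):

```python
import numpy as np

p = 0.99
mu = 1.0
q_p = -mu * np.log(1 - p)              # VaR_p of an exponential(mu)

# left Riemann sum of g(S(x)) over a fine grid, as in the text
xs = np.linspace(0, 60, 600_001)
S = np.exp(-xs / mu)
gS = np.minimum(S / (1 - p), 1.0)      # the p-TVaR distortion
rho = (gS[:-1] * np.diff(xs)).sum()

# for an exponential, TVaR_p = q_p + mu
assert abs(rho - (q_p + mu)) < 1e-3
```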

The next four sections introduce the idea of layer densities and prove that SRM premium can be allocated to each policy in a natural and unique way.

5.5.3. Layer Densities

Risk is often tranched into layers that are then insured and priced separately. Meyers [1996] describes layering in the context of liability increased limits factors and Culp and O'Donnell [2009], Mango et al. [2013] in the context of excess of loss reinsurance.

Define a layer \(y\) excess \(x\) by its payout function \(I_{(x,x+y]}(X):=(X-x)^+\wedge y\). The expected layer loss is

\[\begin{split}\mathsf E[I_{(x,x+y]}(X)] &= \int_x^{x+y} (t-x)dF(t) + yS(x+y) \\ &= \int_x^{x+y} t dF(t) + tS(t)\vert_x^{x+y} \\ &= \int_x^{x+y} S(t)\, dt.\end{split}\]

Based on this equation, Wang [1996] points out that \(S\) can be interpreted as the layer loss (net premium) density. Specifically, \(S\) is the layer loss density in the sense that \(S(x)=d/dx(\mathsf E[I_{(0, x]}(X)])\) is the marginal rate of increase in expected losses in the layer at \(x\). We use density in this sense to define premium, margin and equity densities, in addition to loss density.

Clearly \(S(x)\) equals the expected loss to the cover \(1_{\{X>x\}}\). By scaling, \(S(x)dx\) is close to the expected loss for \(I_{(x, x+dx]}\) when \(dx\) is very small; Bodoff [2007] calls these infinitesimal layers.
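A quick Monte Carlo sketch of the layer-density identity (the distribution and layer choices here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.lognormal(3, 1, 1_000_000)          # simulated losses
att, width = 20.0, 30.0                      # layer: 30 xs 20

# expected layer loss, E[(X - att)^+ ^ width]
layer_loss = np.minimum(np.maximum(xs - att, 0.0), width).mean()

# integral of the empirical survival function across the layer
grid = np.linspace(att, att + width, 3001)
xs_sorted = np.sort(xs)
S = 1 - np.searchsorted(xs_sorted, grid) / xs.size
integral = ((S[:-1] + S[1:]) / 2 * np.diff(grid)).sum()  # trapezoid rule

assert abs(layer_loss - integral) < 0.05     # the two agree
```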

Wang [1996] goes on to interpret

\[\int_x^{x+y} g(S(t))\,dt\]

as the layer premium and hence \(g(S(x))\) as the layer premium density. We write \(P(x):=g(S(x))\) for the premium density.

We can decompose \(X\) into a sum of thin layers. All these layers are comonotonic with one another and with \(X\), resulting in an additive decomposition of \(\rho(X)\), since \(\rho\) is comonotonic additive. The decomposition mirrors the definition of \(\rho\) as an integral.

The amount of assets \(a\) available to pay claims determines the quality of insurance, and premium and expected losses are functions of \(a\). Premiums are well-known to be sensitive to the insurer’s asset resources and solvency, Phillips et al. [1998]. Assets may be infinite, implying unlimited cover. When \(a\) is finite there is usually some chance of default. Using the layer density view, define expected loss \(\bar S\) and premium \(\bar P\) functions as

\[\begin{split}\bar S(a) &= \mathsf E[X\wedge a]=\int_0^a S(x)\,dx \\ \bar P(a) &= \rho(X\wedge a) = \int_0^\infty g(S_{X\wedge a}(x))\,dx \\ &=\int_0^a g(S_{X}(x))dx.\end{split}\]

Margin is \(\bar M(a):=\bar P(a)-\bar S(a)\) and margin density is \(M(a)=d\bar M(a)/da\). Assets are funded by premium and equity \(\bar Q(a):=a-\bar P(a)\). Again \(Q(a)=d\bar Q/da = 1-P(a)\). Together \(S\), \(M\), and \(Q\) give the split of layer funding between expected loss, margin and equity. Layers up to \(a\) are, by definition, fully collateralized. Thus \(\rho(X\wedge a)\) is the premium for a defaultable cover on \(X\) supported by assets \(a\), whereas \(\rho(X)\) is the premium for an unlimited, default-free cover.
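A minimal numeric sketch of these cumulative functions, assuming an exponential risk and a proportional hazard distortion (both our illustrative choices, with closed forms available for checking):

```python
import numpy as np

mu, r, a = 1.0, 0.75, 3.0                  # exponential mean, PH exponent, assets
xs = np.linspace(0, a, 300_001)
S = np.exp(-xs / mu)                       # loss density (layer sense)
gS = S ** r                                # premium density g(S)
dx = np.diff(xs)

S_bar = (S[:-1] * dx).sum()                # E[X ^ a]
P_bar = (gS[:-1] * dx).sum()               # rho(X ^ a)
M_bar = P_bar - S_bar                      # margin
Q_bar = a - P_bar                          # supporting equity

# closed forms: S_bar = mu(1 - e^{-a/mu}), P_bar = (mu/r)(1 - e^{-ra/mu})
assert abs(S_bar - mu * (1 - np.exp(-a / mu))) < 1e-4
assert abs(P_bar - mu / r * (1 - np.exp(-r * a / mu))) < 1e-4
assert M_bar > 0 and Q_bar > 0
```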

The layer density view is consistent with more standard approaches to pricing. If \(X\) is a Bernoulli risk with \(\Pr(X=1)=s\) and expected loss cost \(s\), then \(\rho(X)=g(s)\) can be regarded as pricing a unit width layer with attachment probability \(s\). In an intermediated context, the funding constraint requires layers to be fully collateralized by premium plus equity—without such funding the insurance would not be credible since the insurer has no other source of funds.

Given \(g\) we can compute insurance market statistics for each layer. The loss, premium, margin, and equity densities are \(s\), \(g(s)\), \(g(s)-s\) and \(1-g(s)\). The layer loss ratio is \(s/g(s)\) and \((g(s)-s)/(1-g(s))\) is the layer return on equity. These quantities are illustrated in the next figure for a typical distortion function. The corresponding statistics for ground-up covers can be computed by integrating densities.

In [1]: from aggregate.extensions.pir_figures import fig_10_3

In [2]: fig_10_3()
../_images/dist_g_fig.png
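The layer statistics can also be tabulated directly from \(g\); a short sketch using an illustrative proportional hazard distortion (our choice, not the figure's):

```python
# Layer funding statistics implied by g(s) = s^0.7
g = lambda s: s ** 0.7

for s in (0.5, 0.1, 0.01):        # layer attachment probabilities
    premium = g(s)                # premium density
    margin = premium - s          # margin density
    equity = 1.0 - premium        # equity density
    lr = s / premium              # layer loss ratio
    roe = margin / equity         # layer return on equity
    # higher layers (smaller s) run lower loss ratios and use more equity
    assert 0 < lr < 1 and roe > 0
```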

For an insured risk, we regard the margin as compensation for ambiguity aversion and associated winner’s curse drag. Both of these effects are correlated with risk, so the margin is hard to distinguish from a risk load, but the rationale is different. Again, recall that although \(\rho\) is non-additive and appears to charge for diversifiable risk, De Waegenaere et al. [2003] assures us the pricing is consistent with a general equilibrium.

The layer density is distinct from models that vary the volume of each unit in a homogeneous portfolio model. Our portfolio is static. By varying assets we are implicitly varying the quality of insurance.

5.5.4. Portfolio Pricing with Spectral Risk Measures

Taken as read: a painful discussion that markets set prices, not actuaries and models. Pricing here means valuing according to some model. For actuaries, valuation is a term of art: to a life actuary it means reserving. Pricing actuaries understand that they are just determining a model value. Thus we will refer to the model price.

Several methods apply a distortion \(g\) to price by computing

\[\rho_g(X) = \int g(S(t))dt\]

notably:

  1. Aggregate: price, apply_distortion

  2. Portfolio: price, apply_distortion, (called by analyze distortion)

  3. Distortion: price, price2

  4. Working by hand using density_df.p_total.

All of these methods use the same approach: the integral is approximated as a left Riemann sum:

\[\int_0^\infty g(S(t))\,dt \approx \sum_{k=0}^{n} g(S(kb))\,b\]

The implementation computes

  • S as 1 - p_total.cumsum(),

  • gS = d.g(S), and

  • (gS.loc[:a - bs] * np.diff(S.index)).sum() or .cumsum().iloc[-1].

The p_total.cumsum() idiom automatically accounts for the case where the output distribution is not normalized (sums to \(<1\)). Using sum vs. cumsum is usually an O(1e-16) difference. These methods all use the forward difference of \(dt\) and match against the unlagged values of S or gS (per PIR p. 272-3). The Aggregate method prepends 0 and then computes a cumsum, so the a index gives the right value. Remember, pandas.Series.loc[:a] includes the element with index a (whereas iloc[:n] does not). When a is given, the series includes a (based on .loc[:a]) and the last value is dropped from the sum product.
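A self-contained sketch of the same idiom on a toy discrete distribution (the distribution, distortion, and variable names here are ours, not library output):

```python
import numpy as np
import pandas as pd

bs = 0.5
loss = np.arange(0, 20, bs)                      # bucketed outcomes
p_total = pd.Series(np.exp(-loss / 4), index=loss)
p_total /= p_total.sum()                         # normalize the toy density
g = lambda s: 1 - (1 - s) ** 2                   # dual distortion, r = 2

S = 1 - p_total.cumsum()                         # survival function
gS = g(S)

a = 10.0
# left Riemann sum over layers below a; .loc[:a] would include the
# bucket at a, so .loc[:a - bs] drops it, per the idiom in the text
price = gS.loc[:a - bs].sum() * bs
mean = S.loc[:a - bs].sum() * bs                 # E[X ^ a] the same way
assert price >= mean                             # distortion adds margin
```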

The next block of code provides a reconciliation of methods. Build an aggregate and put it in a Portfolio object to expose calibrate_distortions.

In [3]: from aggregate import Portfolio, build, qd

In [4]: import pandas as pd

In [5]: a = build('agg CommAuto '
   ...:           '10 claims '
   ...:           '10000 xs 0 '
   ...:           'sev lognorm 50 cv 4 '
   ...:           'poisson')
   ...: 

In [6]: qd(a)

       E[X] Est E[X]    Err E[X]   CV(X) Est CV(X)  Skew(X) Est Skew(X)
X                                                                      
Freq     10                      0.31623            0.31623            
Sev  49.804   49.803 -4.6559e-06  3.5917    3.5918   20.434      20.434
Agg  498.04   498.03 -4.6559e-06   1.179     1.179   6.0196      6.0195
log2 = 16, bandwidth = 1/2, validation: not unreasonable.

In [7]: pa = Portfolio('test', [a])

In [8]: pa.update(log2=16, bs=1/4)

In [9]: qd(pa)

                E[X] Est E[X]    Err E[X]   CV(X) Est CV(X)  Skew(X) Est Skew(X)
unit     X                                                                      
CommAuto Freq     10                      0.31623            0.31623            
         Sev  49.804   49.804 -3.0452e-07  3.5917    3.5918   20.434      20.434
         Agg  498.04   498.04 -1.8636e-06   1.179     1.179   6.0196      6.0168
total    Freq     10                      0.31623            0.31623            
         Sev  49.804   49.804 -3.0452e-07  3.5917             20.434            
         Agg  498.04   498.03  -1.655e-05   1.179    1.1788   6.0196      6.0109
log2 = 16, bandwidth = 1/4, validation: fails agg mean error >> sev, possible aliasing; try larger bs.

Determine distortion parameters to achieve a 10% return at 99 percentile capital, and display them. Pull out the achieved pricing.

In [10]: pa.calibrate_distortions(ROEs=[0.1], Ps=[0.99], strict='ordered');

In [11]: d = pa.dists['dual']

In [12]: qd(pa.distortion_df)

                               S      L      P      PQ      Q  COC   param       error
a      LR       method                                                                
2.745k 700.875m ccoc   0.0099992 482.03 687.76 0.33431 2057.2  0.1     0.1           0
                ph     0.0099992 482.03 687.76 0.33431 2057.2  0.1 0.68741  1.7053e-12
                wang   0.0099992 482.03 687.76 0.33431 2057.2  0.1 0.43983   2.558e-11
                dual   0.0099992 482.03 687.76 0.33431 2057.2  0.1  1.9436 -1.1369e-13
                tvar   0.0099992 482.03 687.76 0.33431 2057.2  0.1 0.39096  8.1372e-06

In [13]: f"Exact premium {pa.distortion_df.iloc[0, 2]:.15f}"
Out[13]: 'Exact premium 687.755319984361222'

Compute pricing in the four ways described above.

In [14]: dm = pa.price(.99, d)

In [15]: f'Exact value {dm.price:.15f}'
Out[15]: 'Exact value 687.755319984360540'

In [16]: bit = a.density_df[['loss', 'p_total', 'S']]

In [17]: bit['aS'] = 1 - bit.p_total.cumsum()

In [18]: bit['gS'] = d.g(bit.S)

In [19]: bit['gaS'] = d.g(bit.aS)

In [20]: test = pd.Series((d.price(bit.loc[:a.q(0.99), 'p_total'], kind='both')[-1],
   ....:                   d.price(a.density_df.p_total, a.q(0.99), kind='both')[-1],
   ....:                   d.price2(bit.p_total).loc[a.q(0.99)].ask, \
   ....:                   d.price2(bit.p_total, a.q(0.99)).ask.iloc[0],
   ....:                   a.price(0.99, d).iloc[0, 1],
   ....:                   dm.price,
   ....:                   bit.loc[:a.q(0.99)-a.bs, 'gS'].sum() * a.bs,
   ....:                   bit.loc[:a.q(0.99)-a.bs, 'gS'].cumsum().iloc[-1] * a.bs,
   ....:                   bit.loc[:a.q(0.99)-a.bs, 'gaS'].sum() * a.bs,
   ....:                   bit.loc[:a.q(0.99)-a.bs, 'gaS'].cumsum().iloc[-1] * a.bs),
   ....:           index=['distortion.price',
   ....:                  'distortion.price with a',
   ....:                  'distortion.price2, find a',
   ....:                  'distortion.price2(a)',
   ....:                  'Aggregate.price',
   ....:                  'Portfolio.price',
   ....:                  'bit sum',
   ....:                  'bit cumsum',
   ....:                  'bit sum alt S',
   ....:                  'bit cumsum alt S'
   ....:                 ])
   ....: 

Display the results and the relative difference to the largest price.

In [21]: qd(test.sort_values(),
   ....:    float_format=lambda x: f'{x:.15f}')
   ....: 

distortion.price2(a)        687.755319984360540
distortion.price2, find a   687.755319984360540
Portfolio.price             687.755319984360540
Aggregate.price             687.755319984360540
bit cumsum                  687.755319984360540
bit cumsum alt S            687.755319984360540
distortion.price            687.755319984361108
distortion.price with a     687.755319984361108
bit sum                     687.755319984361108
bit sum alt S               687.755319984361108

In [22]: qd(test.sort_values() / test.sort_values().iloc[-1] - 1,
   ....:    float_format=lambda x: f'{x:.6e}')
   ....: 

distortion.price2(a)        -7.771561e-16
distortion.price2, find a   -7.771561e-16
Portfolio.price             -7.771561e-16
Aggregate.price             -7.771561e-16
bit cumsum                  -7.771561e-16
bit cumsum alt S            -7.771561e-16
distortion.price             0.000000e+00
distortion.price with a      0.000000e+00
bit sum                      0.000000e+00
bit sum alt S                0.000000e+00

5.5.5. The Equal Priority Default Rule

If assets are finite and the provider has limited liability, we need to determine policy-level cash flows in default states before we can determine the fair market value of insurance. The most common way to do this is using equal priority in default.

Under limited liability, total losses are split between provider payments and provider default as

\[X = X\wedge a + (X-a)^+.\]

Next, actual payments \(X\wedge a\) must be allocated to each policy.

\(X_i\) is the amount promised to \(i\) by their insurance contract. Promises are limited by policy provisions but are not limited by provider assets. At the policy level, equal priority implies the payments made to, and default borne by, policy \(i\) are split as

\[\begin{split}X_i &= X_i \frac{X\wedge a}{X} + X_i \frac{(X-a)^+}{X} \\ &= (\text{payments to policy $i$}) + (\text{default borne by policy $i$}).\end{split}\]

Therefore the payments made to policy \(i\) are given by

\[\begin{split}X_i(a) := X_i \frac{X\wedge a}{X} = \begin{cases} X_i & X \le a \\ X_i\dfrac{a}{X} & X > a. \end{cases}\label{eq:equal-priority}\end{split}\]

\(X_i(a)\) is the amount actually paid to policy \(i\). It depends on \(a\), \(X\) and \(X_i\). The dependence on \(X\) is critical. It is responsible for almost all the theoretical complexity of insurance pricing.

It is worth reiterating that with this definition \(\sum_i X_i(a)=X\wedge a\).
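A small simulation sketch of the equal priority split (the setup and names are ours):

```python
import numpy as np

def equal_priority_payments(losses, a):
    """Split each unit's promised loss under equal priority with assets a.

    losses: array of shape (n_units, n_sims)."""
    x = losses.sum(axis=0)                       # total promised loss X
    # pro-rata factor (X ^ a) / X; guard the X = 0 case
    factor = np.where(x > 0, np.minimum(x, a) / np.where(x > 0, x, 1.0), 0.0)
    return losses * factor

rng = np.random.default_rng(0)
losses = rng.exponential(1.0, size=(3, 100_000))
a = 5.0
paid = equal_priority_payments(losses, a)
x = losses.sum(axis=0)

# payments by unit sum to total capped losses, X ^ a
assert np.allclose(paid.sum(axis=0), np.minimum(x, a))
```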

Example.

Here is an example illustrating the effect of equal priority. Consider a certain loss \(X_0=1000\) and \(X_1\) given by a lognormal with mean 1000 and coefficient of variation 2.0. Prudence requires losses be backed by assets equal to the 0.9 quantile. On a stand-alone basis \(X_0\) is backed by \(a_0=1000\) and is risk-free. \(X_1\) is backed by \(a_1=2272\) and the recovery is subject to a considerable haircut, since \(\mathsf E[X_1\wedge 2272] = 732.3\). If these risks are pooled, the pool must hold \(a=a_0+a_1\) for the same level of prudence. When \(X_1\le a_1\) both units are paid in full. But when \(X_1 > a_1\), \(X_0\) receives \(1000(a/(1000+X_1))\) and \(X_1\) receives the remaining \(X_1(a/(1000+X_1))\). Payment to both units is pro rated down by the same factor \(a/(1000+X_1)\)—hence the name equal priority. In the pooled case, the expected recovery to \(X_0\) is 967.5 and 764.8 to \(X_1\). Pooling and equal priority result in a transfer of 32.5 from \(X_0\) to \(X_1\). This example shows what can occur when a thin tailed unit pools with a thick tailed one under a weak capital standard with equal priority. We shall see how pricing compensates for these loss payment transfers, with \(X_1\) paying a positive margin and \(X_0\) a negative one. The calculations are performed in aggregate as follows. First, set up the Portfolio:

In [23]: from aggregate import build, qd

In [24]: port = build('port Dist:EqPri '
   ....:              'agg A 1 claim dsev [1000] fixed '
   ....:              'agg B 1 claim sev lognorm 1000 cv 2 fixed',
   ....:             bs=4)
   ....: 

In [25]: qd(port)

            E[X] Est E[X]    Err E[X]  CV(X) Est CV(X) Skew(X) Est Skew(X)
unit  X                                                                   
A     Freq     1                           0                              
      Sev   1000     1000           0      0         0                    
      Agg   1000     1000           0      0         0                    
B     Freq     1                           0                              
      Sev   1000   999.91 -8.6294e-05      2    1.9921      14      12.417
      Agg   1000   999.91 -8.6294e-05      2    1.9921      14      12.417
total Freq     2                           0                              
      Sev   1000   999.96 -4.3147e-05 1.4142            19.799            
      Agg   2000   1999.9 -4.3673e-05      1   0.99599      14       12.41
log2 = 16, bandwidth = 4, validation: fails sev cv, agg cv.

var_dict() returns the 90th percentile points by unit and in total.

In [26]: port.var_dict(.9)
Out[26]: {'A': 1000.0, 'B': 2272.0, 'total': 3272.0}

Extract the relevant fields from density_df for the allocated loss recoveries. The first block shows standalone, the second pooled.

In [27]: qd(port.density_df.filter(regex='S|lev_[ABt]').loc[[port.B.q(0.9)]])

             S  lev_total  lev_A  lev_B
2.272k 0.20463       1589   1000 732.35

In [28]: qd(port.density_df.filter(regex='S|exa_[ABt]').loc[[port.q(0.9)]])

              S  exa_total  exa_A  exa_B
3.272k 0.099939     1732.4 967.51 764.85

5.5.6. Expected Loss Payments at Different Asset Levels

Expected losses paid to policy \(i\) are \(\bar S_i(a) := \mathsf E[X_i(a)]\). \(\bar S_i(a)\) can be computed, conditioning on \(X\), as

\[\bar S_i(a) = \mathsf E[\mathsf E[X_i(a)\mid X]] = \mathsf E[X_i \mid X \le a]F(a) + a\mathsf E\left[ \frac{X_i}{X}\mid X>a \right]S(a).\]
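This conditioning identity is easy to verify by simulation (the lognormal units and asset level are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x0 = rng.lognormal(0, 0.5, n)            # unit 0 promised losses
x1 = rng.lognormal(0, 1.0, n)            # unit 1 promised losses
x = x0 + x1
a = np.quantile(x, 0.9)                  # assets at the 0.9 quantile

# equal priority payment to unit 0: X_0 (X ^ a) / X
paid0 = x0 * np.minimum(x, a) / x
lhs = paid0.mean()                       # E[X_0(a)]

# E[X_0; X <= a] + a E[X_0 / X; X > a], the conditioning decomposition
rhs = (x0 * (x <= a)).mean() + a * (x0 / x * (x > a)).mean()
assert abs(lhs - rhs) < 1e-8
```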

Because of its importance in allocating losses, define

\[\alpha_i(a) := \mathsf E[X_i/X\mid X> a].\]

The value \(\alpha_i(x)\) is the expected proportion of recoveries by unit \(i\) in the layer at \(x\). Since total assets available to pay losses always equals the layer width, and the chance the layer attaches is \(S(x)\), it is intuitively clear \(\alpha_i(x)S(x)\) is the loss density for unit \(i\), that is, the derivative of \(\bar S_i(x)\) with respect to \(x\). We now show this rigorously.

Proposition. Expected losses to policy \(i\) under equal priority, when total losses are supported by assets \(a\), is given by

\[\label{eq:alpha-S} \bar S_i(a) =\mathsf E[X_i(a)] = \int_0^a \alpha_i(x)S(x)dx\]

and so the policy loss density at \(x\) is \(S_i(x):=\alpha_i(x)S(x)\).

Proof. By the definition of conditional expectation, \(\alpha_i(a)S(a)=\mathsf E[(X_i/X)1_{X>a}]\). Conditioning on \(X\), using the tower property, and taking out the functions of \(X\) on the right shows

\[\alpha_i(a)S(a)=\mathsf E[\mathsf E[(X_i/X) 1_{X>a}\mid X]]=\int_a^\infty \mathsf E[X_i \mid X=x]\dfrac{f(x)}{x}dx\]

and therefore

\[\frac{d}{da}(\alpha_i(a)S(a)) = -\mathsf E[X_i \mid X=a]\dfrac{f(a)}{a}.\]

Now we can use integration by parts to compute

\[\begin{split}\int_0^a \alpha_i(x)S(x)\,dx &= x\alpha_i(x)S(x)\Big\vert_0^a + \int_0^a x\,\mathsf E[X_i \mid X=x]\dfrac{f(x)}{x}\,dx\\ &= a\alpha_i(a)S(a) + E[X_i \mid X\le a]F(a) \\ &= \bar S_i(a).\end{split}\]

Therefore the policy \(i\) loss density in the asset layer at \(a\), i.e. the derivative of \(\bar S_i(a)\) with respect to \(a\), is \(S_{i}(a)=\alpha_i(a) S(a)\) as required.

Note that \(S_i\) is not the survival function of \(X_i(a)\) nor of \(X_i\).

5.5.7. The Natural Allocation Premium

Premium under \(\rho\) is given by \(\int_0^a g(S)\). We can interpret \(g(S(a))\) as the portfolio premium density in the layer at \(a\). We now consider the premium and premium density for each policy.

Using integration by parts we can express the price of an unlimited cover on \(X\) as

\[\label{eq:nat1} \rho(X)=\int_0^\infty g(S(x))\,dx = \int_0^\infty xg'(S(x))f(x)\,dx = \mathsf E[Xg'(S(X))].\]

It is important that this integral is over all \(x\ge 0\) so the boundary term \(xg(S(x))\vert_0^\infty\) disappears. The formula makes sense because a concave distortion is continuous on \((0,1]\) and has at most countably many points where it is not differentiable (it has a kink). In total these points have measure zero, Borwein and Vanderwerff [2010], and we can ignore them in the integral. For more details see Dhaene et al. [2012].
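A Monte Carlo sketch of this identity for an exponential risk and proportional hazard distortion (our parameter choices), where \(\rho_g(X)=\int_0^\infty e^{-rx}\,dx=1/r\):

```python
import numpy as np

rng = np.random.default_rng(7)
r = 0.75                            # PH exponent, g(s) = s^r
x = rng.exponential(1.0, 2_000_000)
S_x = np.exp(-x)                    # S(X), uniform on (0, 1)
z = r * S_x ** (r - 1)              # Z = g'(S(X)), g'(s) = r s^{r-1}

assert abs(z.mean() - 1) < 0.01     # Z is a density: E[Z] = 1
# E[X g'(S(X))] matches rho_g(X) = 1/r
assert abs((x * z).mean() - 1 / r) < 0.02
```

Note that \(Z\) is comonotonic with \(X\): the largest losses get the largest weights.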

Combining the integral and the properties of a distortion function, \(g'(S(X))\) is the Radon-Nikodym derivative of a measure \(\mathsf Q\) with \(\rho(X)=\mathsf E_{\mathsf Q}[X]\). In fact, \(\mathsf E_{\mathsf Q}[Y]=\mathsf E[Yg'(S(X))]\) for all random variables \(Y\). In general, any non-negative function \(Z\) (measure \(\mathsf Q\)) with \(\mathsf E[Z]=1\) and \(\rho(X)=\mathsf E[XZ]\) (\(=\mathsf E_{\mathsf Q}[X]\)) is called a contact function (subgradient) for \(\rho\) at \(X\), see Shapiro et al. [2009]. Thus \(g'(S(X))\) is a contact function for \(\rho\) at \(X\). The name subgradient comes from the fact that \(\rho(X+Y)\ge \mathsf E_{\mathsf Q}[X+Y] = \rho(X) + \mathsf E_{\mathsf Q}[Y]\), by the representation theorem. The set of subgradients is called the subdifferential of \(\rho\) at \(X\). If there is a unique subgradient then \(\rho\) is differentiable. Delbaen [2000] Theorem 17 shows that subgradients are contact functions.

We can interpret \(g'(S(X))\) as a state price density specific to \(X\), suggesting that \(\mathsf E[X_ig'(S(X))]\) gives the value of the cash flows to policy \(i\). This motivates the following definition.

Definition. For \(X=\sum_i X_i\) with \(\mathsf Q\in\mathcal Q\) so that \(\rho(X)=\mathsf E_{\mathsf Q}[X]\), the natural allocation premium to policy \(X_j\) as part of the portfolio \(X\) is \(\mathsf E_{\mathsf Q}[X_j]\). It is denoted \(\rho_X(X_j)\).

The natural allocation premium is a standard approach, appearing in Delbaen [2000], Venter et al. [2006] and Tsanakas and Barnett [2003] for example. It has many desirable properties. Delbaen shows it is a fair allocation in the sense of fuzzy games and that it has a directional derivative, marginal interpretation when \(\rho\) is differentiable. It is consistent with Jouini and Kallal [2001] and Campi et al. [2013], which show the rational price of \(X\) in a market with frictions must be computed by state prices that are anti-comonotonic with \(X\). In our application the signs are reversed: \(g'(S(X))\) and \(X\) are comonotonic.

The choice \(g'(S(X))\) is economically meaningful because it weights the largest outcomes of \(X\) the most, which is appropriate from a social, regulatory and investor perspective. It is also the only choice of weights that works for all levels of assets. Since investors stand ready to write any layer at the price determined by \(g\), their solution must work for all \(a\).

However, there are two technical issues with the proposed natural allocation. First, unlike prior works, we are allocating the premium for \(X\wedge a\), not \(X\), a problem also considered in Major [2018]. Second, \(\mathsf Q\) may not be unique. In general, uniqueness fails for capped variables like \(X\wedge a\). Both issues are surmountable for an SRM, resulting in a unique, well-defined natural allocation. For a non-comonotonic additive risk measure this is not the case.

It is helpful to define the premium, risk adjusted, analog of the \(\alpha_i\) as

\[\label{eq:beta-def} \beta_i(a) := \mathsf E_{\mathsf Q}[(X_i/X) \mid X > a].\]

\(\beta_i(a)\) is the value of the recoveries paid to unit \(i\) by a policy paying 1 in the states \(\{ X>a \}\), i.e. an allocation of the premium for \(1_{X>a}\). By the properties of conditional expectations, we have

\[\label{eq:beta-cond} \beta_i(a) = \frac{\mathsf E[(X_i/X) Z\mid X > a]}{\mathsf E[Z\mid X > a]}.\]

The denominator equals \(\mathsf Q(X>a)/\mathsf P(X>a)\). Remember that while \(\mathsf E_{\mathsf Q}[X]=\mathsf E[XZ]\), for conditional expectations \(\mathsf E_{\mathsf Q}[X\mid \mathcal F]=\mathsf E[XZ\mid \mathcal F]/\mathsf E[Z\mid \mathcal F]\), see Föllmer and Schied [2011], Proposition A.12.
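A simulation sketch of the denominator identity \(\mathsf E[Z\mid X>a]=g(S(a))/S(a)\), again for an exponential risk and proportional hazard distortion (our choices):

```python
import numpy as np

rng = np.random.default_rng(11)
r = 0.75                               # PH exponent
x = rng.exponential(1.0, 1_000_000)
z = r * np.exp(-x) ** (r - 1)          # Z = g'(S(X))

a = 2.0
tail = x > a
lhs = z[tail].mean()                   # E[Z | X > a], MC estimate
rhs = np.exp(-a) ** r / np.exp(-a)     # g(S(a)) / S(a) = Q(X>a)/P(X>a)
assert abs(lhs - rhs) < 0.05
```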

To compute \(\alpha_i\) and \(\beta_i\) we use a third function,

\[\label{eq:kappa-def} \kappa_i(x):= \mathsf E[X_i \mid X=x],\]

the conditional expectation of loss by policy, given the total loss.

Theorem. Let \(\mathsf Q\in \mathcal Q\) be the measure with Radon-Nikodym derivative \(Z=g'(S_X(X))\). Then:

  1. \(\mathsf E[X_i \mid X=x]=\mathsf E_{\mathsf Q}[X_i \mid X=x]\).

  2. \(\beta_i\) can be computed from \(\kappa_i\) as

\[\beta_i(a)= \frac{1}{\mathsf Q(X>a)}\int_a^\infty \dfrac{\kappa_i(x)}{x} g'(S(x))f(x)\, dx. \label{eq:beta-easy}\]
  3. The natural allocation premium for policy \(i\) under equal priority when total losses are supported by assets \(a\), \(\bar P_i(a):=\rho_{X\wedge a}(X_i(a))\), is given by

\[\begin{split}\bar P_i(a) &= \mathsf E_{\mathsf Q}[X_i \mid {X\le a}](1-g(S(a))) + a\mathsf E_{\mathsf Q}[X_i/X \mid {X > a}]g(S(a)) \label{eq:pibar-main} \\ &=\mathsf E[X_iZ\mid X\le a](1-S(a)) + a\mathsf E[(X_i/X)Z\mid X>a]S(a).\end{split}\]
  4. The policy \(i\) premium density equals

\[P_i(a)=\beta_i(a)g(S(a)). \label{eq:beta-gS}\]

It is important to know when the natural allocation premium is unique, which is the case when \(Z\) is the only contact function. If \(X\) has a strictly increasing quantile function, or is injective, then \(\mathsf Q\) is unique, given by \(g'(S(X))\), and hence \(X\) measurable, see [Carlier and Dana, 2003] and Marinacci and Montrucchio [2004]. More generally, we can replace \(\mathsf Q\) with its expectation given \(X\) to make a canonical choice, resulting in the linear natural allocation [Cherny and Orlov, 2011].

The problem that can occur when \(\mathsf Q\) is not unique, but that can be circumvented when \(\rho\) is an SRM, can be illustrated as follows. Suppose \(\rho\) is given by \(p\)-TVaR. The measure \(\mathsf{Q}\) weights the worst \(1-p\) proportion of outcomes of \(X\) by a factor of \((1-p)^{-1}\) and ignores the others. Suppose \(a\) is chosen as \(p'\)-VaR for a lower threshold \(p'<p\). Let \(X_a=X\wedge a\) be capped insured losses and \(C=\{X_a=a\}\). By definition \(\Pr(C)\ge 1-p'>1-p\). Pick any \(A\subset C\) of measure \(1-p\), so that \(\rho(X_a)=\mathsf E[X_a\mid A]=a\). Let \(\psi\) be a measure preserving transformation of \(\Omega\) that acts non-trivially on \(C\) but trivially off \(C\). Then \(\mathsf{Q}'=\mathsf Q\psi\) will satisfy \(\mathsf E_{\mathsf{Q}'}[X_a]=\mathsf E_{\mathsf{Q}}[X_a\psi^{-1}]=\rho(X_a)\) but in general \(\mathsf E_{\mathsf{Q}'}[X]<\rho(X)\). The natural allocation with respect to \(\mathsf{Q}'\) will be different from that for \(\mathsf{Q}\). The theorem isolates a specific \(\mathsf Q\) to obtain a unique answer. The same idea applies to \(\mathsf Q\) from other, non-TVaR, \(\rho\): you can always shuffle part of the contact function within \(C\) to generate non-unique allocations. See Mildenhall and Major [2022] Example 239 for an illustration.

When \(\mathsf Q\) is \(X\) measurable, then \(\mathsf E_{\mathsf Q}[X_i \mid X]=\mathsf E[X_i \mid X]\), which enables explicit calculation. In this case there is no risk adjusted version of \(\kappa_i\). If \(\mathsf Q\) is not \(X\) measurable, then there can be risk adjusted \(\kappa_i\) because

\[\mathsf E[X_i Z \mid X] \not= \mathsf E[X_i \mid X] \mathsf E[Z \mid X].\]

The proof writes the price of a limited liability cover as the price of default-free protection minus the value of the default put. This is the standard starting point for allocation in a perfect competitive market taken by Phillips et al. [1998], Myers and Read Jr. [2001], Sherris [2006], and Ibragimov et al. [2010]. They then allocate the default put rather than the value of insurance payments directly.

To recap: the premium formulas have been derived assuming capital is provided at a cost \(g\) and there is equal priority by unit. The formulas are computationally tractable (see implementation in Portfolio Class Calculations) and require only that \(X\) have an increasing quantile function or that \(g'(S(X))\) be used as the risk adjustment, but make no other assumptions. There is no need to assume the \(X_i\) are independent. They produce an entirely general, canonical determination of premium in the presence of shared costly capital. This result extends Grundl and Schmeiser [2007], who pointed out that with an additive pricing functional there is no need to allocate capital in order to price, to the situation of a non-additive SRM pricing functional.

5.5.8. Properties of Alpha, Beta, and Kappa

In this section we explore properties of \(\alpha_i\), \(\beta_i\), and \(\kappa_i\), and show how they interact to determine premiums by unit via the natural allocation.

For a measurable \(h\), \(\mathsf E[X_ih(X)]=\mathsf E[\kappa_i(X)h(X)]\) by the tower property. This simple observation results in huge simplifications. In general, \(\mathsf E[X_ih(X)]\) requires knowing the full bivariate distribution of \(X_i\) and \(X\). Using \(\kappa_i\) reduces it to a one dimensional problem. This is true even if the \(X_i\) are correlated. The \(\kappa_i\) functions can be estimated from data using regression and they provide an alternative way to model correlations.

Despite their central role, the \(\kappa_i\) functions are probably unfamiliar so we begin by giving several examples to illustrate how they behave. In general, they are non-linear and usually, but not always, increasing.

5.5.8.1. Examples of \(\kappa\) functions

  1. If \(Y_i\) are independent and identically distributed and \(X_n=Y_1+\cdots +Y_n\) then \(\mathsf E[X_m\mid X_{m+n}=x]=mx/(m+n)\) for \(m\ge 1, n\ge 0\). This is clear when \(m=1\): by symmetry the functions \(\mathsf E[Y_i\mid X_n]\) are identical across \(i=1,\ldots,n\) and they sum to \(x\), so each equals \(x/n\). The general case follows because conditional expectations are linear. In this case \(\kappa_i(x)=mx/(m+n)\) is a line through the origin.

  2. If \(X_i\) are multivariate normal then \(\kappa_i\) are straight lines, given by the usual least-squares fits

    \[\kappa_i(x)= \mathsf E[X_i] + \frac{\mathsf{cov}(X_i,X)}{\mathsf{var}(X)}(x-\mathsf E[X]).\]

    This example is familiar from the securities market line and the CAPM analysis of stock returns. If \(X_i\) are iid it reduces to the previous example because the slope is \(1/n\).

  3. If \(X_i\), \(i=1,2\), are compound Poisson with the same severity distribution then \(\kappa_i\) are again lines through the origin. Suppose \(X_i\) has expected claim count \(\lambda_i\). Write the conditional expectation as an integral, expand the density of the compound Poisson by conditioning on the claim count, and then swap the sum and integral to see that \(\kappa_1(x)=\mathsf E[X_1\mid X_1 + X_2=x]=x\,\mathsf E[N(\lambda_1)/(N(\lambda_1)+N(\lambda_2))]\) where \(N(\lambda)\) are independent Poisson with mean \(\lambda\). This example generalizes the iid case. Further conditioning on a common mixing variable extends the result to mixed Poisson frequencies where each aggregate can have a separate or shared mixing distribution. The common severity is essential. The result means that if a line of business is defined to be a group of policies that shares the same severity distribution, then premiums for policies within the line will have rates proportional to their expected claim counts.

  4. A theorem of Efron says that if \(X_i\) are independent and have log-concave densities then all \(\kappa_i\) are non-decreasing, Saumard and Wellner [2014]. The multivariate normal example is a special case of Efron’s theorem.
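The multivariate normal example can be verified numerically. The sketch below compares binned conditional means of \(X_1\) given \(X=x\) with the least-squares line; the bivariate normal moments are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000

# hypothetical bivariate normal units
mu = np.array([10.0, 20.0])
cov = np.array([[4.0, 1.5], [1.5, 9.0]])
x1, x2 = rng.multivariate_normal(mu, cov, size=n).T
x = x1 + x2

# theoretical kappa_1 slope: cov(X1, X) / var(X)
slope = (cov[0, 0] + cov[0, 1]) / cov.sum()

# empirical kappa_1 by binned conditional means on interior quantile bins
edges = np.quantile(x, np.linspace(0.01, 0.99, 50))
idx = np.searchsorted(edges, x) - 1
mask = (idx >= 0) & (idx < 49)
counts = np.bincount(idx[mask], minlength=49)
centers = np.bincount(idx[mask], weights=x[mask], minlength=49) / counts
emp = np.bincount(idx[mask], weights=x1[mask], minlength=49) / counts
theory = mu[0] + slope * (centers - mu.sum())
print(np.max(np.abs(emp - theory)))
```

The maximum discrepancy across bins is pure sampling noise, consistent with \(\kappa_1\) being exactly the least-squares line in the normal case.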

Denuit and Dhaene [2012] define an ex post risk sharing rule called the conditional mean risk allocation by taking \(\kappa_i(x)\) to be the allocation to policy \(i\) when \(X=x\). A series of recent papers, see Denuit and Robert [2020] and references therein, considers the properties of the conditional mean risk allocation focusing on its use in peer-to-peer insurance and the case when \(\kappa_i(x)\) is linear in \(x\).

5.5.9. Properties of the Natural Allocation

We now explore margin, equity, and return in total and by policy. We begin by considering them in total.

By definition the average return with assets \(a\) is

\[\label{eq:avg-roe} \bar\iota(a) := \frac{\bar M(a)}{\bar Q(a)}\]

where margin \(\bar M\) and equity \(\bar Q\) are the total margin and capital functions defined above.

The last formula has important implications. It tells us the investor-priced expected return varies with the level of assets. For most distortions the return decreases as capital increases. In contrast, standard RAROC models use a fixed average cost of capital regardless of the overall asset level, Tasche [1999]. CAPM or the Fama-French three-factor model are often used to estimate the average return, with a typical range of 7 to 20 percent, Cummins and Phillips [2005]. A common question from working actuaries performing capital allocation concerns so-called excess capital: capital on the balance sheet beyond what is required by regulators, rating agencies, or managerial prudence. Our model addresses this concern directly: higher layers of capital are cheaper, but not free.

The varying returns may seem inconsistent with Miller-Modigliani. But that theorem says the cost of funding a given amount of capital is independent of how it is split between debt and equity; it does not say the average cost is constant as the amount of capital varies.
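The dependence of the average return on the asset level can be illustrated with a short sketch. The Pareto-type survival function and the proportional hazard parameter below are hypothetical choices, not taken from the text; the computation just integrates the layer margin and equity densities and forms their ratio.

```python
import numpy as np

# hypothetical total-loss model: Pareto-type survival and PH distortion
def S(x):
    return (1.0 + x) ** -3.0

def g(s):
    return s ** 0.6

xs = np.linspace(0.0, 50.0, 200_001)
dx = xs[1] - xs[0]
s = S(xs)
M = g(s) - s                 # layer margin density M(a) = g(S(a)) - S(a)
Q = 1.0 - g(s)               # layer equity density Q(a) = 1 - g(S(a))
Mbar = np.cumsum(M) * dx     # total margin up to asset level a
Qbar = np.cumsum(Q) * dx     # total equity up to asset level a
with np.errstate(invalid="ignore"):
    iota = Mbar / Qbar       # average return at asset level a
print(iota[4000], iota[40000], iota[200000])
```

The average return falls steadily as assets grow: low asset levels earn close to the maximum layer return \((1-r)/r\) for \(g(s)=s^r\), while well capitalized levels earn much less, which is the "cheaper but not free" behavior described above.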

5.5.9.1. No-Undercut and Positive Margin for Independent Risks

The natural allocation has two desirable properties. It is always less than the stand-alone premium, meaning it satisfies the no-undercut condition of Denault [2001], and it produces non-negative margins for independent risks.

Proposition. Let \(X=\sum_{i=1}^n X_i\), \(X_i\) non-negative and independent, and let \(g\) be a distortion. Then

  1. the natural allocation is never greater than the stand-alone premium, and

  2. the natural allocation to every \(X_i\) contains a non-negative margin.

Since \(\bar P_i = \mathsf E[\kappa_i(X)g'(S(X))]\) and \(\mathsf E[g'(S(X))]=1\), the natural allocation contains a non-negative margin whenever \(\kappa_i(X)\) and \(g'(S(X))\) are comonotonic, and hence whenever \(\kappa_i\) is increasing, or whenever \(\kappa_i(X)\) and \(X\) are positively correlated. A policy \(i^*\) with increasing \(\kappa_{i^*}\) is a capacity-consuming line that always has a positive margin. However, it can occur that no \(\kappa_i\) is increasing.

5.5.9.2. Policy Level Properties, Varying with Asset Level

We start with a corollary which gives a nicely symmetric and computationally tractable expression for the natural margin allocation in the case of finite assets.

Corollary. The margin density for unit \(i\) at asset level \(a\) is given by

\[\label{eq:coc-by-line} M_i(a) =\beta_i(a)g(S(a)) - \alpha_i(a)S(a).\]

Proof. We can compute the margin \(\bar M_i(a)\) embedded in \(\bar P_i(a)\) by unit as

\[\begin{split}\bar M_i(a)=& \bar P_i(a) - \bar L_i(a) \nonumber \\ =& \int_0^a \beta_i(x)g(S(x)) - \alpha_i(x)S(x)\,dx. \label{eq:margin-by-line}\end{split}\]

Differentiating we get the margin density for unit \(i\) at \(a\) expressed in terms of \(\alpha_i\) and \(\beta_i\) as shown.

Margin in the current context is the cost of capital, thus this is an important result. It allows us to compute economic value by unit and to assess static portfolio performance by unit—one of the motivations for performing capital allocation in the first place. In many ways it is also a good place to stop. Remember these results only assume we are using a distortion risk measure and have equal priority in default. We are in a static model, so questions of portfolio homogeneity are irrelevant. We are not assuming \(X_i\) are independent.
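The corollary's formula is easy to evaluate by simulation. The sketch below computes \(\alpha_i\), \(\beta_i\), and the unit margin densities at a single asset level for two hypothetical lognormal units (all parameters illustrative), and checks the identities \(\sum_i \alpha_i = \sum_i \beta_i = 1\) and \(\sum_i M_i(a) = g(S(a)) - S(a)\).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x1 = rng.lognormal(0.0, 0.4, n)       # thin tailed unit (hypothetical)
x2 = rng.lognormal(0.0, 1.2, n)       # thick tailed unit (hypothetical)
x = x1 + x2

r = 0.7                               # PH distortion g(s) = s**r
order = np.argsort(x)
s_emp = np.empty(n)
s_emp[order] = 1.0 - (np.arange(1, n + 1) - 0.5) / n
z = r * s_emp ** (r - 1)              # risk adjustment Z = g'(S(X))

a = np.quantile(x, 0.99)              # asset level under study
tail = x > a
S = tail.mean()                       # S(a)
gS = np.mean(z * tail)                # g(S(a)) = E[Z 1{X > a}]
alpha = np.array([np.mean(xi / x * tail) for xi in (x1, x2)]) / S
beta = np.array([np.mean(xi / x * z * tail) for xi in (x1, x2)]) / gS
M = beta * gS - alpha * S             # margin density by unit
print(alpha, beta, M)
```

At this high asset level the thick-tailed unit carries a clearly positive margin density; whether the thin unit's margin is positive or negative depends on \(\beta_1/\alpha_1\) versus the layer loss ratio, as the text discusses next.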

What can we say about margins by unit? Since \(g\) is increasing and concave, \(P(a)=g(S(a))\ge S(a)\) for all \(a\ge 0\). Thus all asset layers contain a non-negative total margin density. The situation is different by unit, where we can see

\[M_i(a) \ge 0 \iff \beta_i(a)g(S(a)) - \alpha_i(a)S(a)\ge 0 \iff \frac{\beta_i(a)}{\alpha_i(a)} \ge \frac{S(a)}{g(S(a))}.\]

The unit layer margin density is positive when \(\beta_i/\alpha_i\) is greater than the all-unit layer loss ratio. Since the loss ratio is \(\le 1\) there must be a positive layer margin density whenever \(\beta_i(a)/\alpha_i(a) > 1\). But when \(\beta_i(a)/\alpha_i(a) < 1\) it is possible the unit has a negative margin density. How can that occur and why does it make sense? To explore this we look at the shape of \(\alpha\) and \(\beta\) in more detail.

It is important to remember why the Proposition does not apply: it assumes unlimited cover, whereas here \(a<\infty\). With finite capital there are potential transfers between units caused by their behavior in default that overwhelm the positive margin implied by the proposition. Also note the proposition cannot be applied to \(X\wedge a=\sum_i X_i(a)\) because the unit payments are no longer independent.

In general we can make two predictions about margins.

Prediction 1: Lines where \(\alpha_i(x)\) or \(\kappa_i(x)/x\) increase with \(x\) will always have a positive margin.

Prediction 2: A log-concave (thin-tailed) unit aggregated with a non-log-concave (thick-tailed) unit can have a negative margin, especially for lower asset layers.

Prediction 1 follows because the risk adjustment puts more weight on \(X_i/X\) for larger \(X\) and so \(\beta_i(x)/\alpha_i(x)> 1 > S(x) / g(S(x))\). Recall the risk adjustment is comonotonic with total losses \(X\).

A thin-tailed unit aggregated with thick-tailed units will have \(\alpha_i(x)\) decreasing with \(x\). Now the risk adjustment produces \(\beta_i(x)<\alpha_i(x)\) and it is possible that \(\beta_i(x)/\alpha_i(x)<S(x)/g(S(x))\). In most cases \(\alpha_i(x)\) approaches \(\mathsf E[X_i]/x\) and \(\beta_i(x)/\alpha_i(x)\) increases with \(x\), while the layer loss ratio decreases (so the layer margin increases), and the thin unit eventually gets a positive margin. Whether the thin unit has a positive total margin \(\bar M_i(a)>0\) depends on the particulars of the units and the level of assets \(a\). A negative margin is more likely for less well capitalized insurers, which makes sense because default states are more material and such insurers have a lower overall dollar cost of capital. In the independent case, as \(a\to\infty\), the Proposition guarantees eventually positive margins for all units.

These results are reasonable. Under limited liability, if assets and liabilities are pooled then the thick tailed unit benefits from pooling with the thin one because pooling increases the assets available to pay losses when needed. Equal priority transfers wealth from thin to thick in states of the world where thick has a bad event. But because thick dominates the total, the total losses are bad when thick is bad. The negative margin compensates the thin-tailed unit for transfers.

Another interesting situation occurs for asset levels within attritional loss layers. Most realistic insured loss portfolios are quite skewed and never experience very low loss ratios. For low loss layers, \(S(x)\) is close to 1 and the layer at \(x\) is funded almost entirely by expected losses; the margin and equity density components are nearly zero. Since the sum of margin densities over component units equals the total margin density, when the total is zero either all unit margins are also zero or some are positive and some are negative. For the reasons noted above, thin-tailed units get the negative margin: thick-tailed units compensate them for the improved cover the thick-tailed units obtain by pooling.

In conclusion, the natural margin by unit reflects the relative consumption of assets by layer, Mango [2005]. Low layers are less ambiguous to the provider and have a lower margin relative to expected loss. Higher layers are more ambiguous and have lower loss ratios. High-risk units consume more high-layer assets and hence have a lower loss ratio. For independent units with no default the margin is always positive. But there is a confounding effect when default is possible. Because more volatile units are more likely to cause default, there is a wealth transfer to them. The natural premium allocation compensates low-risk policies for this transfer, which can result in negative margins in some cases.

5.5.10. The Natural Allocation of Equity

Although we have a margin by unit, we cannot compute return by unit, or allocate frictional costs of capital, because we still lack an equity allocation, a problem we now address.

Definition. The natural allocation of equity to unit \(i\) is given by

\[Q_i(a) = \frac{\beta_i(a)g(S(a)) - \alpha_i(a)S(a)}{g(S(a))- S(a)} \times (1-g(S(a))). \label{eq:main-alloc}\]

Why is this allocation natural? In total the layer return at \(a\) is

\[\iota(a) := \frac{M(a)}{Q(a)} = \frac{P(a) - S(a)}{1-P(a)} = \frac{g(S(a)) - S(a)}{1- g(S(a))}.\]

We claim that for a law invariant pricing measure the layer return must be the same for all units. Law invariance implies the risk measure is only concerned with the attachment probability of the layer at \(a\), and not with the cause of loss within the layer. If return within a layer varied by unit then the risk measure could not be law invariant.

We can now compute capital by layer by unit, by solving for the unknown equity density \(Q_i(a)\) via

\[\iota(a) = \frac{M(a)}{Q(a)} = \frac{M_i(a)}{Q_i(a)}\implies Q_i(a) = \frac{M_i(a)}{\iota(a)}.\]

Substituting for layer return and unit margin gives the result.

Since \(1-g(S(a))\) is the proportion of capital in the layer at \(a\), the main allocation result says the allocation to unit \(i\) is given by the nicely symmetric expression

\[\label{eq:q-formula} \frac{\beta_i(a)g(S(a)) - \alpha_i(a)S(a)}{g(S(a))- S(a)}.\]

To determine total capital by unit we integrate the equity density

\[\bar Q_i(a) := \int_0^a Q_i(x) dx.\]

And finally we can determine the average return to unit \(i\) at asset level \(a\)

\[\label{eq:avg-roe-by-unit} \bar\iota_i(a) = \frac{\bar M_i(a)}{\bar Q_i(a)}.\]

The average return will generally vary by unit and by asset level \(a\). Although the return within each layer is the same for all units, the margin, the proportion of capital, and the proportion attributable to each unit all vary by \(a\). Therefore average returns vary by unit and by \(a\). This is in stark contrast to the standard industry approach, which uses the same return for each unit and, implicitly, for all \(a\). How these quantities vary by unit is complicated. Academic approaches have emphasized the possibility that returns vary by unit, but struggled with parameterization, Myers and Cohn [1987].

This formula shows the average return by unit is an \(M_i\)-weighted harmonic mean of the layer returns given by the distortion \(g\), viz

\[\frac{1}{\bar\iota_i(a)} = \int_0^a \frac{1}{\iota(x)}\frac{M_i(x)}{\bar M_i(a)}\,dx.\]

The harmonic mean solves the problem that the return for lower layers of assets is potentially infinite (when \(g'(1)=0\)). The infinities do not matter: at lower asset layers there is little or no equity and the layer is fully funded by the loss component of premium. When so funded, there is no margin and so the infinite return gets zero weight. In this instance, the sense of the problem dictates that \(0\times\infty=0\): with no initial capital there is no final capital regardless of the return.
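The equity allocation can be illustrated at a single asset layer by simulation. The two lognormal units and the proportional hazard parameter below are hypothetical; the check confirms that the unit equity densities \(Q_i(a)=M_i(a)/\iota(a)\) add up to the total layer equity \(1-g(S(a))\).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400_000
x1 = rng.lognormal(0.0, 0.4, n)        # hypothetical units
x2 = rng.lognormal(0.0, 1.2, n)
x = x1 + x2

r = 0.7                                # PH distortion g(s) = s**r
order = np.argsort(x)
s_emp = np.empty(n)
s_emp[order] = 1.0 - (np.arange(1, n + 1) - 0.5) / n
z = r * s_emp ** (r - 1)               # risk adjustment g'(S(X))

a = np.quantile(x, 0.995)              # asset layer under study
tail = x > a
S = tail.mean()
gS = np.mean(z * tail)                 # g(S(a))
iota = (gS - S) / (1.0 - gS)           # layer return, identical across units

# margin density by unit: M_i = beta_i g(S(a)) - alpha_i S(a)
M = np.array([np.mean(xi / x * z * tail) - np.mean(xi / x * tail)
              for xi in (x1, x2)])
Q = M / iota                           # natural allocation of layer equity
print(Q, 1.0 - gS)
```

Repeating this over a grid of asset levels and integrating gives \(\bar Q_i(a)\), \(\bar M_i(a)\), and hence the unit average returns \(\bar\iota_i(a)\) of the main text.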

5.5.11. Appendix: Notation and Conventions

The insurer is a one-period stock company with finite assets and limited liability. At \(t=0\) it sells its residual value to investors to raise equity. At \(t=1\) it pays claims up to the amount of assets available. If assets are insufficient to pay claims it defaults. If there are excess assets they are returned to investors.

Total insured loss, or total risk, is described by a random variable \(X\ge 0\). \(X\) reflects policy limits but is not limited by provider assets. \(X=\sum_i X_i\) describes the split of losses by policy. \(F\), \(S\), \(f\), and \(q\) are the distribution, survival, density, and (lower) quantile functions of \(X\). Subscripts are used to disambiguate, e.g., \(S_{X_i}\) is the survival function of \(X_i\). \(X\wedge a\) denotes \(\min(X,a)\) and \(X^+=\max(X,0)\).

The letters \(S\), \(P\), \(M\) and \(Q\) refer to expected loss, premium, margin and equity, and \(a\) refers to assets. The value of survival function \(S(x)\) is the loss cost of the insurance paying \(1_{\{X>x\}}\), so the two uses of \(S\) are consistent. Premium equals expected loss plus margin; assets equal premium plus equity. All these quantities are functions of assets underlying the insurance.

We use the actuarial sign convention: large positive values are bad. Our concern is with quantiles \(q(p)\) for \(p\) near 1. Distortions are usually stated in reversed terms, with \(g(s)\) for small \(s=1-p\) corresponding to bad outcomes. As far as possible we use \(p\) when values close to 1 are bad and \(s\) when small values are bad.

Tail value at risk is defined for \(0\le p<1\) by

\[\mathsf{TVaR}_p(X) = \frac{1}{1-p}\int_p^1 q(t)dt.\]
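A minimal empirical estimator replaces the integral with a tail average; the exponential example below is purely illustrative (for an exponential with mean 1, \(\mathsf{TVaR}_p = -\log(1-p) + 1\)).

```python
import numpy as np

def tvar(samples, p):
    """Approximate TVaR_p as the average of outcomes at or above q(p)."""
    q = np.quantile(samples, p)
    return samples[samples >= q].mean()

rng = np.random.default_rng(0)
xs = rng.exponential(scale=1.0, size=1_000_000)
print(tvar(xs, 0.99))
```

This tail-average form agrees with the integral definition for continuous distributions; for discrete or heavily tied samples the integral definition should be evaluated directly from the quantile function.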

Prices exclude all expenses. The risk free interest rate is zero. These are standard simplifying assumptions, e.g. Ibragimov et al. [2010].

The terminology describing risk measures is standard, and follows Föllmer and Schied [2011]. We work on a standard probability space, Svindland [2010], Appendix. It can be taken as \(\Omega=[0,1]\), with the Borel sigma-algebra and \(\mathsf P\) Lebesgue measure. The indicator function on a set \(A\) is \(1_A\), meaning \(1_A(x)=1\) if \(x\in A\) and \(1_A(x)=0\) otherwise.