5.5. Distortions and Spectral Risk Measures
Objectives: Introduce distortion functions and spectral risk measures.
Audience: Readers looking for a deeper technical understanding.
Prerequisites: Knowledge of probability and calculus; insurance terminology.
See also: Distortion Class Calculations.
Contents:
5.5.1. Helpful References
5.5.2. Distortion Function and Spectral Risk Measures
We define SRMs and recall results describing their different representations. By De Waegenaere et al. [2003], SRMs are consistent with general equilibrium, so it makes sense to consider them as pricing functionals. The SRM is interpreted as the (ask) price for an insurer-written risk transfer.
Definition. A distortion function is an increasing concave function \(g:[0,1]\to [0,1]\) satisfying \(g(0)=0\) and \(g(1)=1\).
A spectral risk measure \(\rho_g\) associated with a distortion \(g\) acts on a non-negative random variable \(X\) as \[\rho_g(X) := \int_0^\infty g(S(x))\,dx.\]
The simplest distortion is the identity \(g(s)=s\). Then \(\rho_g(X)=\mathsf E[X]\) from the integration-by-parts identity \[\mathsf E[X]=\int_0^\infty S(x)\,dx.\]
Other well-known distortions include the proportional hazard \(g(s)=s^r\) for \(0<r\le 1\), its dual \(g(s)=1-(1-s)^r\) for \(r\ge 1\), and the Wang transform \(g(s)=\Phi(\Phi^{-1}(s)+\lambda)\) for \(\lambda \ge 0\), Wang [1995].
Since \(g\) is concave, \(g(s)\ge (1-s)g(0) + sg(1)=s\) for all \(0\le s\le 1\), showing \(\rho_g\) adds a non-negative margin.
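The three distortions above are simple to implement. A minimal sketch (the function names are ours, not from any library) confirming each lies above the diagonal, i.e. adds a non-negative margin:

```python
import numpy as np
from scipy.stats import norm

def ph(s, r):
    """Proportional hazard distortion g(s) = s^r, 0 < r <= 1."""
    return s ** r

def dual(s, r):
    """Dual moment distortion g(s) = 1 - (1 - s)^r, r >= 1."""
    return 1 - (1 - s) ** r

def wang(s, lam):
    """Wang transform g(s) = Phi(Phi^{-1}(s) + lambda)."""
    return norm.cdf(norm.ppf(s) + lam)

s = np.linspace(0, 1, 101)
# each distortion fixes 0 and 1 and, being concave, lies above the
# diagonal, so g(s) >= s: a non-negative margin in every layer
for g in (lambda s: ph(s, 0.5), lambda s: dual(s, 2), lambda s: wang(s, 0.25)):
    assert np.all(g(s) >= s - 1e-12)
```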
Going forward, \(g\) is a distortion and \(\rho\) is its associated distortion risk measure. We interpret \(\rho\) as a pricing functional and refer to \(\rho(X)\) as the price or premium for insurance on \(X\).
SRMs are translation invariant, monotonic, subadditive, and positive homogeneous, and hence coherent, Acerbi [2002]. In addition, they are law invariant and comonotonic additive. In fact, all such functionals are SRMs. As well as having these properties, SRMs are powerful because we have a complete understanding of their representation and structure, which we summarize in the following theorem.
Theorem. Subject to \(\rho\) satisfying certain continuity assumptions, the following are equivalent.
\(\rho\) is a law invariant, coherent, comonotonic additive risk measure.
\(\rho=\rho_g\) for a concave distortion \(g\).
\(\rho\) has a representation as a weighted average of TVaRs for a measure \(\mu\) on \([0,1]\): \(\rho(X)=\int_0^1 \mathsf{TVaR}_p(X)\mu(dp)\).
\(\rho(X)=\max_{\mathsf Q\in\mathscr{Q}} \mathsf E_{\mathsf Q}[X]\) where \(\mathscr{Q}\) is the set of (finitely) additive measures with \(\mathsf Q(A)\le g(\mathsf P(A))\) for all measurable \(A\).
\(\rho(X)=\max_{\mathsf Z\in\mathscr{Z}} \mathsf E[XZ]\) where \(\mathscr{Z}\) is the set of positive functions on \(\Omega\) satisfying \(\int_p^1 q_Z(t)dt \le g(1-p)\), and \(q_Z\) is the quantile function of \(Z\).
The Theorem combines results from Föllmer and Schied [2011] (4.79, 4.80, 4.93, 4.94, 4.95), Delbaen [2000], Kusuoka [2001], and Carlier and Dana [2003]. It requires that \(\rho\) is continuous from above to rule out the possibility \(\rho=\sup\). In certain situations, the \(\sup\) risk measure applied to an unbounded random variable can only be represented as a \(\sup\) over a set of test measures and not a max. Note that the roles of continuity from above and from below are swapped relative to Föllmer and Schied [2011] because they use the asset (negative is bad) sign convention whereas we use the actuarial (positive is bad) convention.
The relationship between \(\mu\) and \(g\) is given by Föllmer and Schied [2011] 4.69 and 4.70. The elements of \(\mathscr Z\) are the Radon-Nikodym derivatives of the measures in \(\mathscr Q\).
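As a concrete instance of the theorem, \(\mathsf{TVaR}_p\) is itself the SRM for the concave distortion \(g(s)=\min(s/(1-p),1)\). A quick numerical check on a discrete example (illustrative code, not the library's implementation):

```python
import numpy as np

def srm_price(xs, probs, g):
    """rho_g(X) = integral of g(S(x)) dx for a discrete distribution on
    sorted non-negative points xs with probabilities probs."""
    xs = np.asarray(xs, dtype=float)
    S_right = 1 - np.cumsum(probs)        # survival just past each point
    widths = np.diff(np.r_[0.0, xs])      # widths of layers between points
    S_layer = np.r_[1.0, S_right[:-1]]    # survival within each layer
    return np.sum(g(S_layer) * widths)

xs = np.arange(10.0, 101.0, 10.0)         # uniform on 10, 20, ..., 100
probs = np.full(10, 0.1)
p = 0.8
g_tvar = lambda s: np.minimum(s / (1 - p), 1.0)   # TVaR_p distortion

# the SRM with the TVaR distortion reproduces TVaR_p: the mean of the
# worst 20% of outcomes, (90 + 100) / 2 = 95
assert abs(srm_price(xs, probs, g_tvar) - 95.0) < 1e-9
# and the identity distortion reproduces the mean, 55
assert abs(srm_price(xs, probs, lambda s: s) - 55.0) < 1e-9
```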
The next four sections introduce the idea of layer densities and prove that SRM premium can be allocated to policy in a natural and unique way.
5.5.3. Layer Densities
Risk is often tranched into layers that are then insured and priced separately. Meyers [1996] describes layering in the context of liability increased limits factors and Culp and O'Donnell [2009], Mango et al. [2013] in the context of excess of loss reinsurance.
Define a layer \(y\) excess \(x\) by its payout function \(I_{(x,x+y]}(X):=(X-x)^+\wedge y\). The expected layer loss is \[\mathsf E[I_{(x,x+y]}(X)] = \int_x^{x+y} S(t)\,dt.\]
Based on this equation, Wang [1996] points out that \(S\) can be interpreted as the layer loss (net premium) density. Specifically, \(S\) is the layer loss density in the sense that \(S(x)=d/dx(\mathsf E[I_{(0, x]}(X)])\) is the marginal rate of increase in expected losses in the layer at \(x\). We use density in this sense to define premium, margin and equity densities, in addition to loss density.
Clearly \(S(x)\) equals the expected loss to the cover \(1_{\{X>x\}}\). By scaling, \(S(x)dx\) is close to the expected loss for \(I_{(x, x+dx]}\) when \(dx\) is very small; Bodoff [2007] calls these infinitesimal layers.
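The layer identity \(\mathsf E[I_{(x,x+y]}(X)]=\int_x^{x+y}S(t)\,dt\) is easy to confirm numerically. A sketch using scipy with illustrative parameters:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

X = stats.lognorm(1.0, scale=100.0)   # illustrative loss distribution
x, y = 100.0, 200.0                   # layer 200 xs 100

# expected layer loss two ways: integrate the survival function across
# the layer, and integrate the payout (X - x)^+ ^ y directly
via_S = quad(lambda t: X.sf(t), x, x + y)[0]
via_payout = (quad(lambda t: (t - x) * X.pdf(t), x, x + y)[0]
              + y * X.sf(x + y))
assert abs(via_S - via_payout) < 1e-6
```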
Wang [1996] goes on to interpret \[\int_x^{x+y} g(S(t))\,dt\] as the layer premium and hence \(g(S(x))\) as the layer premium density. We write \(P(x):=g(S(x))\) for the premium density.
We can decompose \(X\) into a sum of thin layers. All these layers are comonotonic with one another and with \(X\), resulting in an additive decomposition of \(\rho(X)\), since \(\rho\) is comonotonic additive. The decomposition mirrors the definition of \(\rho\) as an integral.
The amount of assets \(a\) available to pay claims determines the quality of insurance, and premium and expected losses are functions of \(a\). Premiums are well-known to be sensitive to the insurer’s asset resources and solvency, Phillips et al. [1998]. Assets may be infinite, implying unlimited cover. When \(a\) is finite there is usually some chance of default. Using the layer density view, define expected loss \(\bar S\) and premium \(\bar P\) functions as \[\bar S(a) := \int_0^a S(x)\,dx = \mathsf E[X\wedge a], \qquad \bar P(a) := \int_0^a g(S(x))\,dx = \rho(X\wedge a).\]
Margin is \(\bar M(a):=\bar P(a)-\bar S(a)\) and margin density is \(M(a)=d\bar M(a)/da\). Assets are funded by premium and equity \(\bar Q(a):=a-\bar P(a)\). Again \(Q(a)=d\bar Q/da = 1-P(a)\). Together \(S\), \(M\), and \(Q\) give the split of layer funding between expected loss, margin and equity. Layers up to \(a\) are, by definition, fully collateralized. Thus \(\rho(X\wedge a)\) is the premium for a defaultable cover on \(X\) supported by assets \(a\), whereas \(\rho(X)\) is the premium for an unlimited, default-free cover.
The layer density view is consistent with more standard approaches to pricing. If \(X\) is a Bernoulli risk with \(\Pr(X=1)=s\) and expected loss cost \(s\), then \(\rho(X)=g(s)\) can be regarded as pricing a unit width layer with attachment probability \(s\). In an intermediated context, the funding constraint requires layers to be fully collateralized by premium plus equity—without such funding the insurance would not be credible since the insurer has no other source of funds.
Given \(g\) we can compute insurance market statistics for each layer. The loss, premium, margin, and equity densities are \(s\), \(g(s)\), \(g(s)-s\) and \(1-g(s)\). The layer loss ratio is \(s/g(s)\) and \((g(s)-s)/(1-g(s))\) is the layer return on equity. These quantities are illustrated in the next figure for a typical distortion function. The corresponding statistics for ground-up covers can be computed by integrating densities.
In [1]: from aggregate.extensions.pir_figures import fig_10_3
In [2]: fig_10_3()
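For instance, with the proportional hazard distortion \(g(s)=s^{0.5}\) at a layer with attachment probability \(s=0.04\) (hypothetical round numbers), the layer statistics work out as follows:

```python
# layer funding split under the PH distortion g(s) = s^0.5 at a layer
# with attachment probability s = 0.04 (hypothetical round numbers)
s = 0.04
g_s = s ** 0.5                # premium density: 0.2
margin = g_s - s              # margin density: 0.16
equity = 1 - g_s              # equity density: 0.8
loss_ratio = s / g_s          # layer loss ratio: 0.2
layer_roe = margin / equity   # layer return on equity: 0.16 / 0.8 = 0.2
```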
For an insured risk, we regard the margin as compensation for ambiguity aversion and associated winner’s curse drag. Both of these effects are correlated with risk, so the margin is hard to distinguish from a risk load, but the rationale is different. Again, recall, although \(\rho\) is non-additive and appears to charge for diversifiable risk, De Waegenaere et al. [2003] assures us the pricing is consistent with a general equilibrium.
The layer density is distinct from models that vary the volume of each unit in a homogeneous portfolio model. Our portfolio is static. By varying assets we are implicitly varying the quality of insurance.
5.5.4. Portfolio Pricing with Spectral Risk Measures
Taken as read: a painful discussion that markets set prices, not actuaries and models. Pricing here means valuing according to some model. For actuaries, valuation is a term of art: to a life actuary it means reserving. Pricing actuaries understand that they are just determining a model value. Thus we will refer to the model price.
Several methods apply a distortion \(g\) to price by computing \[\rho(X\wedge a)=\int_0^a g(S(x))\,dx,\] notably:
Aggregate.price and Aggregate.apply_distortion
Portfolio.price and Portfolio.apply_distortion (called by analyze_distortion)
Distortion.price and Distortion.price2
Working by hand using density_df.p_total.
All of these methods use the same approach: the integral is approximated as a left Riemann sum, \[\int_0^a g(S(x))\,dx \approx \sum_k g(S(x_k))\,\Delta x.\] The implementation computes S as 1 - p_total.cumsum(), gS = d.g(S), and then (gS.loc[:a - bs] * np.diff(S.index)).sum() or .cumsum().iloc[-1].
The p_total.cumsum() idiom automatically accounts for the case where the output distribution is not normalized (sums to \(<1\)).
Using sum vs. cumsum is usually an O(1e-16) difference. These methods all use the forward difference of \(dt\) and match against the unlagged values of S or gS (per PIR p. 272-3). The Aggregate method prepends 0 and then computes a cumsum, so the a index gives the right value. Remember, pandas.Series.loc[:a] includes the element with index a (whereas iloc[:n] does not).
When a is given, the series includes a (based on .loc[:a]) and the last value is dropped from the sum product.
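Stripped of the library machinery, the idiom can be sketched directly in pandas. Here p_total is a made-up discrete aggregate and g an illustrative PH distortion, not objects from aggregate:

```python
import numpy as np
import pandas as pd

# a made-up discrete aggregate on a bucket grid (not library output)
bs = 0.5
loss = np.arange(0, 20, bs)
p_total = pd.Series(np.exp(-loss / 5.0), index=loss)
p_total /= p_total.sum()                 # normalize to a distribution

g = lambda s: s ** 0.5                   # illustrative PH distortion
S = 1 - p_total.cumsum()                 # survival right of each bucket
gS = g(S.clip(lower=0))                  # clip float rounding below zero
a = 10.0                                 # asset level on the grid

# left Riemann sum: g(S) over buckets [0, a - bs], times bucket width
price = gS.loc[:a - bs].sum() * bs
el = S.clip(lower=0).loc[:a - bs].sum() * bs
assert el <= price <= a                  # g(s) >= s, so premium >= loss
```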
The next block of code provides a reconciliation of methods. Build an aggregate and put it in a Portfolio object to expose calibrate_distortions.
In [3]: from aggregate import Portfolio, build, qd
In [4]: import pandas as pd
In [5]: a = build('agg CommAuto '
...: '10 claims '
...: '10000 xs 0 '
...: 'sev lognorm 50 cv 4 '
...: 'poisson')
...:
In [6]: qd(a)
E[X] Est E[X] Err E[X] CV(X) Est CV(X) Skew(X) Est Skew(X)
X
Freq 10 0.31623 0.31623
Sev 49.804 49.803 -4.6559e-06 3.5917 3.5918 20.434 20.434
Agg 498.04 498.03 -4.6559e-06 1.179 1.179 6.0196 6.0195
log2 = 16, bandwidth = 1/2, validation: not unreasonable.
In [7]: pa = Portfolio('test', [a])
In [8]: pa.update(log2=16, bs=1/4)
In [9]: qd(pa)
E[X] Est E[X] Err E[X] CV(X) Est CV(X) Skew(X) Est Skew(X)
unit X
CommAuto Freq 10 0.31623 0.31623
Sev 49.804 49.804 -3.0452e-07 3.5917 3.5918 20.434 20.434
Agg 498.04 498.04 -1.8636e-06 1.179 1.179 6.0196 6.0168
total Freq 10 0.31623 0.31623
Sev 49.804 49.804 -3.0452e-07 3.5917 20.434
Agg 498.04 498.03 -1.655e-05 1.179 1.1788 6.0196 6.0109
log2 = 16, bandwidth = 1/4, validation: fails agg mean error >> sev, possible aliasing; try larger bs.
Determine distortion parameters to achieve a 10% return at the 99th percentile capital level, and display them. Pull out the achieved pricing.
In [10]: pa.calibrate_distortions(ROEs=[0.1], Ps=[0.99], strict='ordered');
In [11]: d = pa.dists['dual']
In [12]: qd(pa.distortion_df)
S L P PQ Q COC param error
a LR method
2.745k 700.875m ccoc 0.0099992 482.03 687.76 0.33431 2057.2 0.1 0.1 0
ph 0.0099992 482.03 687.76 0.33431 2057.2 0.1 0.68741 1.7053e-12
wang 0.0099992 482.03 687.76 0.33431 2057.2 0.1 0.43983 2.558e-11
dual 0.0099992 482.03 687.76 0.33431 2057.2 0.1 1.9436 -1.1369e-13
tvar 0.0099992 482.03 687.76 0.33431 2057.2 0.1 0.39096 8.1372e-06
In [13]: f"Exact premium {pa.distortion_df.iloc[0, 2]:.15f}"
Out[13]: 'Exact premium 687.755319984361222'
Compute pricing in the four ways described above.
In [14]: dm = pa.price(.99, d)
In [15]: f'Exact value {dm.price:.15f}'
Out[15]: 'Exact value 687.755319984360540'
In [16]: bit = a.density_df[['loss', 'p_total', 'S']]
In [17]: bit['aS'] = 1 - bit.p_total.cumsum()
In [18]: bit['gS'] = d.g(bit.S)
In [19]: bit['gaS'] = d.g(bit.aS)
In [20]: test = pd.Series((d.price(bit.loc[:a.q(0.99), 'p_total'], kind='both')[-1],
....: d.price(a.density_df.p_total, a.q(0.99), kind='both')[-1],
....: d.price2(bit.p_total).loc[a.q(0.99)].ask, \
....: d.price2(bit.p_total, a.q(0.99)).ask.iloc[0],
....: a.price(0.99, d).iloc[0, 1],
....: dm.price,
....: bit.loc[:a.q(0.99)-a.bs, 'gS'].sum() * a.bs,
....: bit.loc[:a.q(0.99)-a.bs, 'gS'].cumsum().iloc[-1] * a.bs,
....: bit.loc[:a.q(0.99)-a.bs, 'gaS'].sum() * a.bs,
....: bit.loc[:a.q(0.99)-a.bs, 'gaS'].cumsum().iloc[-1] * a.bs),
....: index=['distortion.price',
....: 'distortion.price with a',
....: 'distortion.price2, find a',
....: 'distortion.price2(a)',
....: 'Aggregate.price',
....: 'Portfolio.price',
....: 'bit sum',
....: 'bit cumsum',
....: 'bit sum alt S',
....: 'bit cumsum alt S'
....: ])
....:
Display the results and the relative difference to the largest price.
In [21]: qd(test.sort_values(),
....: float_format=lambda x: f'{x:.15f}')
....:
distortion.price2(a) 687.755319984360540
distortion.price2, find a 687.755319984360540
Portfolio.price 687.755319984360540
Aggregate.price 687.755319984360540
bit cumsum 687.755319984360540
bit cumsum alt S 687.755319984360540
distortion.price 687.755319984361108
distortion.price with a 687.755319984361108
bit sum 687.755319984361108
bit sum alt S 687.755319984361108
In [22]: qd(test.sort_values() / test.sort_values().iloc[-1] - 1,
....: float_format=lambda x: f'{x:.6e}')
....:
distortion.price2(a) -7.771561e-16
distortion.price2, find a -7.771561e-16
Portfolio.price -7.771561e-16
Aggregate.price -7.771561e-16
bit cumsum -7.771561e-16
bit cumsum alt S -7.771561e-16
distortion.price 0.000000e+00
distortion.price with a 0.000000e+00
bit sum 0.000000e+00
bit sum alt S 0.000000e+00
5.5.5. The Equal Priority Default Rule
If assets are finite and the provider has limited liability we need to determine policy-level cash flows in default states before we can determine the fair market value of insurance. The most common way to do this is using equal priority in default.
Under limited liability, total losses are split between provider payments and provider default as \[X = X\wedge a + (X-a)^+.\]
Next, actual payments \(X\wedge a\) must be allocated to each policy.
\(X_i\) is the amount promised to \(i\) by their insurance contract. Promises are limited by policy provisions but are not limited by provider assets. At the policy level, equal priority implies the payments made to, and default borne by, policy \(i\) are split as \[X_i = \frac{X_i}{X}\,(X\wedge a) + \frac{X_i}{X}\,(X-a)^+.\]
Therefore the payments made to policy \(i\) are given by \[X_i(a) := \frac{X_i}{X}\,(X\wedge a).\]
\(X_i(a)\) is the amount actually paid to policy \(i\). It depends on \(a\), \(X\) and \(X_i\). The dependence on \(X\) is critical. It is responsible for almost all the theoretical complexity of insurance pricing.
It is worth reiterating that with this definition \(\sum_i X_i(a)=X\wedge a\).
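In code, equal priority is a pro rata scaling of each policy's promise in default states. A minimal sketch with made-up sample losses:

```python
import numpy as np

def equal_priority(xs, a):
    """Payments X_i(a) under equal priority: each policy receives its
    promise times the pro rata factor (X ^ a) / X.
    xs: array of shape (n_policies, n_sims) of promised losses."""
    X = xs.sum(axis=0)                   # total promised loss by scenario
    factor = np.where(X > 0, np.minimum(X, a) / np.where(X > 0, X, 1.0), 0.0)
    return xs * factor

rng = np.random.default_rng(42)
xs = rng.lognormal(mean=1.0, sigma=1.0, size=(3, 10_000))
a = 30.0
paid = equal_priority(xs, a)
# payments by policy always sum to total limited losses X ^ a
assert np.allclose(paid.sum(axis=0), np.minimum(xs.sum(axis=0), a))
```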
Example.
Here is an example illustrating the effect of equal priority. Consider a certain loss \(X_0=1000\) and \(X_1\) given by a lognormal with mean 1000 and coefficient of variation 2.0. Prudence requires losses be backed by assets equal to the 0.9 quantile. On a stand-alone basis \(X_0\) is backed by \(a_0=1000\) and is risk-free. \(X_1\) is backed by \(a_1=2272\) and the recovery is subject to a considerable haircut, since \(\mathsf E[X_1\wedge 2272] = 732.3\). If these risks are pooled, the pool must hold \(a=a_0+a_1\) for the same level of prudence. When \(X_1\le a_1\) both units are paid in full. But when \(X_1 > a_1\), \(X_0\) receives \(1000(a/(1000+X_1))\) and \(X_1\) receives the remaining \(X_1(a/(1000+X_1))\). Payment to both units is pro rated down by the same factor \(a/(1000+X_1)\)—hence the name equal priority. In the pooled case, the expected recovery to \(X_0\) is 967.5 and 764.8 to \(X_1\). Pooling and equal priority result in a transfer of 32.5 from \(X_0\) to \(X_1\). This example shows what can occur when a thin tailed unit pools with a thick tailed one under a weak capital standard with equal priority. We shall see how pricing compensates for these loss payment transfers, with \(X_1\) paying a positive margin and \(X_0\) a negative one. The calculations are performed in aggregate as follows. First, set up the Portfolio:
In [23]: from aggregate import build, qd
In [24]: port = build('port Dist:EqPri '
....: 'agg A 1 claim dsev [1000] fixed '
....: 'agg B 1 claim sev lognorm 1000 cv 2 fixed',
....: bs=4)
....:
In [25]: qd(port)
E[X] Est E[X] Err E[X] CV(X) Est CV(X) Skew(X) Est Skew(X)
unit X
A Freq 1 0
Sev 1000 1000 0 0 0
Agg 1000 1000 0 0 0
B Freq 1 0
Sev 1000 999.91 -8.6294e-05 2 1.9921 14 12.417
Agg 1000 999.91 -8.6294e-05 2 1.9921 14 12.417
total Freq 2 0
Sev 1000 999.96 -4.3147e-05 1.4142 19.799
Agg 2000 1999.9 -4.3673e-05 1 0.99599 14 12.41
log2 = 16, bandwidth = 4, validation: fails sev cv, agg cv.
var_dict() returns the 90th percentile points by unit and in total.
In [26]: port.var_dict(.9)
Out[26]: {'A': 1000.0, 'B': 2272.0, 'total': 3272.0}
Extract the relevant fields from density_df for the allocated loss recoveries. The first block shows standalone, the second pooled.
In [27]: qd(port.density_df.filter(regex='S|lev_[ABt]').loc[[port.B.q(0.9)]])
S lev_total lev_A lev_B
2.272k 0.20463 1589 1000 732.35
In [28]: qd(port.density_df.filter(regex='S|exa_[ABt]').loc[[port.q(0.9)]])
S exa_total exa_A exa_B
3.272k 0.099939 1732.4 967.51 764.85
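The same numbers can be approximated by brute-force simulation, independent of aggregate. The tolerances reflect Monte Carlo and discretization error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
sigma = np.sqrt(np.log(1 + 2.0 ** 2))    # lognormal mean 1000, cv 2
mu = np.log(1000.0) - sigma ** 2 / 2
x1 = rng.lognormal(mu, sigma, n)
x0 = np.full(n, 1000.0)

# standalone: X1 backed by a1 = 2272 takes a haircut, E[X1 ^ a1] ~ 732
a1 = 2272.0
standalone = np.minimum(x1, a1).mean()

# pooled with X0 = 1000 at a = 3272 under equal priority
a = 3272.0
X = x0 + x1
factor = np.minimum(X, a) / X
rec0, rec1 = (x0 * factor).mean(), (x1 * factor).mean()

assert abs(standalone - 732.3) < 5
assert abs(rec0 - 967.5) < 5 and abs(rec1 - 764.8) < 5
```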
5.5.6. Expected Loss Payments at Different Asset Levels
Expected losses paid to policy \(i\) are \(\bar S_i(a) := \mathsf E[X_i(a)]\). \(\bar S_i(a)\) can be computed, conditioning on \(X\), as \[\bar S_i(a) = \mathsf E[X_i 1_{X\le a}] + a\,\mathsf E\left[\frac{X_i}{X}\,1_{X>a}\right].\]
Because of its importance in allocating losses, define \[\alpha_i(x) := \mathsf E\left[\frac{X_i}{X}\,\middle|\, X>x\right].\]
The value \(\alpha_i(x)\) is the expected proportion of recoveries by unit \(i\) in the layer at \(x\). Since total assets available to pay losses always equals the layer width, and the chance the layer attaches is \(S(x)\), it is intuitively clear \(\alpha_i(x)S(x)\) is the loss density for unit \(i\), that is, the derivative of \(\bar S_i(x)\) with respect to \(x\). We now show this rigorously.
Proposition. Expected losses to policy \(i\) under equal priority, when total losses are supported by assets \(a\), are given by \[\bar S_i(a) = \int_0^a \alpha_i(x)S(x)\,dx,\]
and so the policy loss density at \(x\) is \(S_i(x):=\alpha_i(x)S(x)\).
Proof. By the definition of conditional expectation, \(\alpha_i(a)S(a)=\mathsf E[(X_i/X)1_{X>a}]\). Conditioning on \(X\), using the tower property, and taking out the functions of \(X\) on the right shows \[\alpha_i(a)S(a) = \mathsf E\left[\mathsf E[X_i\mid X]\,\frac{1}{X}\,1_{X>a}\right] = \int_a^\infty \mathsf E[X_i\mid X=x]\,\frac{dF(x)}{x},\]
and therefore \[\bar S_i(a) = \mathsf E[X_i 1_{X\le a}] + a\,\alpha_i(a)S(a) = \int_0^a \mathsf E[X_i\mid X=x]\,dF(x) + a\int_a^\infty \mathsf E[X_i\mid X=x]\,\frac{dF(x)}{x}.\]
Now we can use integration by parts to show that this expression equals \(\int_0^a \alpha_i(x)S(x)\,dx\).
Therefore the policy \(i\) loss density in the asset layer at \(a\), i.e. the derivative of the expression for \(\bar S_i(a)\) with respect to \(a\), is \(S_{i}(a)=\alpha_i(a) S(a)\) as required.
Note that \(S_i\) is not the survival function of \(X_i(a)\) nor of \(X_i\).
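The proposition can be checked by simulation: integrating \(\alpha_i(x)S(x)\) over layers recovers the expected payment to the unit. A sketch with two made-up independent units (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x0 = rng.gamma(10.0, 10.0, n)          # thin tailed unit, mean 100
x1 = rng.lognormal(3.6, 1.2, n)        # thick tailed unit
X = x0 + x1
a = np.quantile(X, 0.95)

# direct: expected equal priority payment to unit 0 with assets a
direct = (x0 * np.minimum(X, a) / X).mean()

# layered: alpha_0(x) S(x) = E[(X_0 / X) 1_{X > x}], integrated over [0, a)
grid = np.linspace(0.0, a, 1001)[:-1]
dx = a / 1000
ratio = x0 / X
layered = sum((ratio * (X > x)).mean() for x in grid) * dx

assert abs(layered - direct) / direct < 0.01
```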
5.5.8. Properties of Alpha, Beta, and Kappa
In this section we explore properties of \(\alpha_i\), \(\beta_i\), and \(\kappa_i\), and show how they interact to determine premiums by unit via the natural allocation.
For a measurable \(h\), \(\mathsf E[X_ih(X)]=\mathsf E[\kappa_i(X)h(X)]\) by the tower property. This simple observation results in huge simplifications. In general, \(\mathsf E[X_ih(X)]\) requires knowing the full bivariate distribution of \(X_i\) and \(X\). Using \(\kappa_i\) reduces it to a one dimensional problem. This is true even if the \(X_i\) are correlated. The \(\kappa_i\) functions can be estimated from data using regression and they provide an alternative way to model correlations.
Despite their central role, the \(\kappa_i\) functions are probably unfamiliar so we begin by giving several examples to illustrate how they behave. In general, they are non-linear and usually, but not always, increasing.
5.5.8.1. Examples of \(\kappa\) functions
If \(Y_i\) are independent and identically distributed and \(X_n=Y_1+\cdots +Y_n\) then \(\mathsf E[X_m\mid X_{m+n}=x]=mx/(m+n)\) for \(m\ge 1, n\ge 0\). This is obvious when \(m=1\) because, by symmetry, the functions \(\mathsf E[Y_i\mid X_n]\) are identical across \(i=1,\ldots,n\) and sum to \(x\). The general case follows because conditional expectations are linear. In this case \(\kappa_i(x)=mx/(m+n)\) is a line through the origin.
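A quick simulation confirms the linear conditional mean: because \(\kappa\) is a line through the origin, the least-squares slope through the origin must be \(m/(m+n)\) (illustrative parameters):

```python
import numpy as np

# for iid Y, E[X_m | X_{m+n} = x] = m x / (m + n), so the least squares
# line through the origin has slope m / (m + n)
rng = np.random.default_rng(3)
m, n = 2, 3
y = rng.exponential(1.0, size=(m + n, 1_000_000))
x_m = y[:m].sum(axis=0)                # X_m = Y_1 + ... + Y_m
x_mn = y.sum(axis=0)                   # X_{m+n}

slope = (x_m * x_mn).mean() / (x_mn ** 2).mean()
assert abs(slope - m / (m + n)) < 0.005
```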
If \(X_i\) are multivariate normal then \(\kappa_i\) are straight lines, given by the usual least-squares fits
\[\kappa_i(x)= \mathsf E[X_i] + \frac{\mathsf{cov}(X_i,X)}{\mathsf{var}(X)}(x-\mathsf E[X]).\] This example is familiar from the securities market line and the CAPM analysis of stock returns. If \(X_i\) are iid it reduces to the previous example because the slope is \(1/n\).
If \(X_i\), \(i=1,2\), are compound Poisson with the same severity distribution then \(\kappa_i\) are again lines through the origin. Suppose \(X_i\) has expected claim count \(\lambda_i\). Write the conditional expectation as an integral, expand the density of the compound Poisson by conditioning on the claim count, and then swap the sum and integral to see that \(\kappa_1(x)=\mathsf E[X_1\mid X_1 + X_2=x]=x\,\mathsf E[N(\lambda_1)/(N(\lambda_1)+N(\lambda_2))]\) where \(N(\lambda)\) are independent Poisson with mean \(\lambda\). This example generalizes the iid case. Further conditioning on a common mixing variable extends the result to mixed Poisson frequencies where each aggregate can have a separate or shared mixing distribution. The common severity is essential. The result means that if a line of business is defined to be a group of policies that shares the same severity distribution, then premiums for policies within the line will have rates proportional to their expected claim counts.
A theorem of Efron says that if \(X_i\) are independent and have log-concave densities then all \(\kappa_i\) are non-decreasing, Saumard and Wellner [2014]. The multivariate normal example is a special case of Efron’s theorem.
Denuit and Dhaene [2012] define an ex post risk sharing rule called the conditional mean risk allocation by taking \(\kappa_i(x)\) to be the allocation to policy \(i\) when \(X=x\). A series of recent papers, see Denuit and Robert [2020] and references therein, considers the properties of the conditional mean risk allocation focusing on its use in peer-to-peer insurance and the case when \(\kappa_i(x)\) is linear in \(x\).
5.5.9. Properties of the Natural Allocation
We now explore margin, equity, and return in total and by policy. We begin by considering them in total.
By definition the average return with assets \(a\) is \[\bar\iota(a) := \frac{\bar M(a)}{\bar Q(a)},\]
where margin \(\bar M\) and equity \(\bar Q\) are the total margin and capital functions defined above.
The last formula has important implications. It tells us the investor-priced expected return varies with the level of assets. For most distortions, return decreases with increasing capital. In contrast, the standard RAROC models use a fixed average cost of capital, regardless of the overall asset level, Tasche [1999]. CAPM or the Fama-French three factor model are often used to estimate the average return, with a typical range of 7 to 20 percent, Cummins and Phillips [2005]. A common question from working actuaries performing capital allocation concerns so-called excess capital, when the balance sheet contains more capital than is required by regulators, rating agencies, or managerial prudence. Our model addresses this concern: higher layers of capital are cheaper, but not free.
The varying returns may seem inconsistent with Modigliani-Miller. But that theorem says the cost of funding a given amount of capital is independent of how it is split between debt and equity; it does not say the average cost is constant as the amount of capital varies.
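The decreasing pattern is easy to see for a concrete distortion. For the proportional hazard with \(r=1/2\), the layer return on equity \((g(S(x))-S(x))/(1-g(S(x)))\) simplifies algebraically to \(\sqrt{S(x)}\), which falls as assets rise (the loss distribution is illustrative):

```python
import numpy as np
from scipy import stats

g = lambda s: s ** 0.5                   # PH distortion, r = 0.5
X = stats.lognorm(1.0, scale=1000.0)     # illustrative loss distribution

x = np.linspace(100.0, 10_000.0, 50)     # asset layers
s = X.sf(x)
iota = (g(s) - s) / (1 - g(s))           # layer return (margin / equity)
# for this distortion iota = sqrt(S(x)): higher layers of capital are
# strictly cheaper
assert np.all(np.diff(iota) < 0)
```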
5.5.9.1. No-Undercut and Positive Margin for Independent Risks
The natural allocation has two desirable properties. It is always less than the stand-alone premium, meaning it satisfies the no-undercut condition of Denault [2001], and it produces non-negative margins for independent risks.
Proposition. Let \(X=\sum_{i=1}^n X_i\), \(X_i\) non-negative and independent, and let \(g\) be a distortion. Then
the natural allocation is never greater than the stand-alone premium, and
the natural allocation to every \(X_i\) contains a non-negative margin.
Since \(\bar P_i = \mathsf E[\kappa_i(X)g'(S(X))]\) we see the margin is non-negative if \(\kappa_i(X)\) and \(g'(S(X))\) are comonotonic, and hence if \(\kappa_i\) is increasing, or if \(\kappa_i(X)\) and \(X\) are positively correlated (recall \(\mathsf E[g'(S(X))]=1\)). A policy \(i^*\) with increasing \(\kappa_{i^*}\) is a capacity consuming line that always has a positive margin. However, it can occur that no \(\kappa_i\) is increasing.
5.5.9.2. Policy Level Properties, Varying with Asset Level
We start with a corollary which gives a nicely symmetric and computationally tractable expression for the natural margin allocation in the case of finite assets.
Corollary. The margin density for unit \(i\) at asset level \(a\) is given by \[M_i(a) = \beta_i(a)\,g(S(a)) - \alpha_i(a)\,S(a).\]
Proof. We can compute the margin \(\bar M_i(a)\) in \(\bar P_i(a)\) by line as \[\bar M_i(a) = \bar P_i(a) - \bar S_i(a) = \int_0^a \beta_i(x)\,g(S(x)) - \alpha_i(x)\,S(x)\,dx.\]
Differentiating we get the margin density for unit \(i\) at \(a\) expressed in terms of \(\alpha_i\) and \(\beta_i\) as shown.
Margin in the current context is the cost of capital, so this is an important result. It allows us to compute economic value by unit and to assess static portfolio performance by unit—one of the motivations for performing capital allocation in the first place. In many ways it is also a good place to stop. Remember these results only assume we are using a distortion risk measure and have equal priority in default. We are in a static model, so questions of portfolio homogeneity are irrelevant. We are not assuming \(X_i\) are independent.
What can we say about margins by unit? Since \(g\) is increasing and concave, \(P(a)=g(S(a))\ge S(a)\) for all \(a\ge 0\). Thus all asset layers contain a non-negative total margin density. It is a different situation by unit, where we can see \[M_i(a) = \alpha_i(a)\,g(S(a))\left(\frac{\beta_i(a)}{\alpha_i(a)} - \frac{S(a)}{g(S(a))}\right).\]
The unit layer margin density is positive when \(\beta_i/\alpha_i\) is greater than the all-unit layer loss ratio. Since the loss ratio is \(\le 1\) there must be a positive layer margin density whenever \(\beta_i(a)/\alpha_i(a) > 1\). But when \(\beta_i(a)/\alpha_i(a) < 1\) it is possible the unit has a negative margin density. How can that occur and why does it make sense? To explore this we look at the shape of \(\alpha\) and \(\beta\) in more detail.
It is important to remember why the Proposition does not apply: it assumes unlimited cover, whereas here \(a<\infty\). With finite capital there are potential transfers between units caused by their behavior in default that overwhelm the positive margin implied by the proposition. Also note the proposition cannot be applied to \(X\wedge a=\sum_i X_i(a)\) because the unit payments are no longer independent.
In general we can make two predictions about margins.
Prediction 1: Lines where \(\alpha_i(x)\) or \(\kappa_i(x)/x\) increase with \(x\) will always have a positive margin.
Prediction 2: A log-concave (thin tailed) unit aggregated with a non-log-concave (thick tailed) unit can have a negative margin, especially for lower asset layers.
Prediction 1 follows because the risk adjustment puts more weight on \(X_i/X\) for larger \(X\) and so \(\beta_i(x)/\alpha_i(x)> 1 > S(x) / g(S(x))\). Recall the risk adjustment is comonotonic with total losses \(X\).
A thin tailed unit aggregated with thick tailed units will have \(\alpha_i(x)\) decreasing with \(x\). Now the risk adjustment will produce \(\beta_i(x)<\alpha_i(x)\) and it is possible that \(\beta_i(x)/\alpha_i(x)<S(x)/g(S(x))\). In most cases, \(\alpha_i(x)\) approaches \(\mathsf E[X_i]/x\) and \(\beta_i(x)/\alpha_i(x)\) increases with \(x\), while the layer loss ratio decreases—and margin increases—and the thin unit will eventually get a positive margin. Whether or not the thin unit has a positive total margin \(\bar M_i(a)>0\) depends on the particulars of the units and the level of assets \(a\). A negative margin is more likely for less well capitalized insurers, which makes sense because default states are more material and they have a lower overall dollar cost of capital. In the independent case, as \(a\to\infty\) the proposition guarantees eventually positive margins for all units.
These results are reasonable. Under limited liability, if assets and liabilities are pooled then the thick tailed unit benefits from pooling with the thin one because pooling increases the assets available to pay losses when needed. Equal priority transfers wealth from thin to thick in states of the world where thick has a bad event. But because thick dominates the total, the total losses are bad when thick is bad. The negative margin compensates the thin-tailed unit for transfers.
Another interesting situation occurs for asset levels within attritional loss layers. Most realistic insured loss portfolios are quite skewed and never experience very low loss ratios. For low loss layers, \(S(x)\) is close to 1 and the layer at \(x\) is funded almost entirely by expected losses; the margin and equity density components are nearly zero. Since the sum of margin densities over component units equals the total margin density, when the total is zero it necessarily follows that either all unit margins are also zero or that some are positive and some are negative. For the reasons noted above, thin tailed units get the negative margin as thick tailed units compensate them for the improved cover the thick tail units obtain by pooling.
In conclusion, the natural margin by unit reflects the relative consumption of assets by layer, Mango [2005]. Low layers are less ambiguous to the provider and have a lower margin relative to expected loss. Higher layers are more ambiguous and have lower loss ratios. High risk units consume more higher layer assets and hence have a lower loss ratio. For independent units with no default the margin is always positive. But there is a confounding effect when default is possible. Because more volatile units are more likely to cause default, there is a wealth transfer to them. The natural premium allocation compensates low risk policies for this transfer, which can result in negative margins in some cases.
5.5.10. The Natural Allocation of Equity
Although we have a margin by unit, we cannot compute return by unit, or allocate frictional costs of capital, because we still lack an equity allocation, a problem we now address.
Definition. The natural allocation of equity to unit \(i\) is given by \[Q_i(a) := \frac{M_i(a)}{M(a)}\,Q(a).\]
Why is this allocation natural? In total the layer return at \(a\) is \[\iota(a) := \frac{M(a)}{Q(a)} = \frac{g(S(a)) - S(a)}{1 - g(S(a))}.\]
We claim that for a law invariant pricing measure the layer return must be the same for all units. Law invariance implies the risk measure is only concerned with the attachment probability of the layer at \(a\), and not with the cause of loss within the layer. If return within a layer varied by unit then the risk measure could not be law invariant.
We can now compute capital by layer by unit, by solving for the unknown equity density \(Q_i(a)\) via \[\frac{M_i(a)}{Q_i(a)} = \iota(a) = \frac{M(a)}{Q(a)}.\]
Substituting for layer return and unit margin gives the result.
Since \(1-g(S(a))\) is the proportion of capital in the layer at \(a\), the main allocation result says the allocation to unit \(i\) is given by the nicely symmetric expression \[Q_i(a) = (1 - g(S(a)))\,\frac{\beta_i(a)\,g(S(a)) - \alpha_i(a)\,S(a)}{g(S(a)) - S(a)}.\]
To determine total capital by unit we integrate the equity density: \[\bar Q_i(a) = \int_0^a Q_i(x)\,dx.\]
And finally we can determine the average return to unit \(i\) at asset level \(a\): \[\bar\iota_i(a) = \frac{\bar M_i(a)}{\bar Q_i(a)}.\]
The average return will generally vary by unit and by asset level \(a\). Although the return within each layer is the same for all units, the margin, the proportion of capital, and the proportion attributable to each unit all vary by \(a\). Therefore average returns will vary by unit and \(a\). This is in stark contrast to the standard industry approach, which uses the same return for each unit and implicitly all \(a\). How these quantities vary by unit is complicated. Academic approaches emphasized the possibility that returns vary by unit, but struggled with parameterization, Myers and Cohn [1987].
This formula shows the average return by unit is an \(M_i\)-weighted harmonic mean of the layer returns given by the distortion \(g\), viz \[\frac{1}{\bar\iota_i(a)} = \int_0^a \frac{1}{\iota(x)}\,\frac{M_i(x)}{\bar M_i(a)}\,dx.\]
The harmonic mean solves the problem that the return for lower layers of assets is potentially infinite (when \(g'(1)=0\)). The infinities do not matter: at lower asset layers there is little or no equity and the layer is fully funded by the loss component of premium. When so funded, there is no margin and so the infinite return gets zero weight. In this instance, the sense of the problem dictates that \(0\times\infty=0\): with no initial capital there is no final capital regardless of the return.
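Numerically, with \(g'(1)=0\) (e.g. the dual distortion with \(r=2\)) the lowest layers show unbounded returns carrying vanishing margin weight, yet the average return stays finite. A sketch with an illustrative lognormal:

```python
import numpy as np
from scipy import stats

g = lambda s: 1 - (1 - s) ** 2           # dual distortion, r = 2, g'(1) = 0
X = stats.lognorm(1.0, scale=1000.0)     # illustrative loss distribution

a = 5000.0
x = np.linspace(1.0, a, 100_000)
dx = x[1] - x[0]
s = X.sf(x)
M = s * (1 - s)                          # margin density g(s) - s = s(1 - s)
Q = (1 - s) ** 2                         # equity density 1 - g(s) = (1 - s)^2
iota = M / Q                             # layer return s / (1 - s), unbounded

avg = (M.sum() * dx) / (Q.sum() * dx)    # average return M-bar / Q-bar
# infinite low-layer returns carry zero margin weight: finite average
assert np.isfinite(avg) and iota[0] > 1000 * avg
```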
5.5.11. Appendix: Notation and Conventions
An insurer has finite assets and limited liability and is a one-period stock company. At \(t=0\) it sells its residual value to investors to raise equity. At time one it pays claims up to the amount of assets available. If assets are insufficient to pay claims it defaults. If there are excess assets they are returned to investors.
Total insured loss, or total risk, is described by a random variable \(X\ge 0\). \(X\) reflects policy limits but is not limited by provider assets. \(X=\sum_i X_i\) describes the split of losses by policy. \(F\), \(S\), \(f\), and \(q\) are the distribution, survival, density, and (lower) quantile functions of \(X\). Subscripts are used to disambiguate, e.g., \(S_{X_i}\) is the survival function of \(X_i\). \(X\wedge a\) denotes \(\min(X,a)\) and \(X^+=\max(X,0)\).
The letters \(S\), \(P\), \(M\) and \(Q\) refer to expected loss, premium, margin and equity, and \(a\) refers to assets. The value of survival function \(S(x)\) is the loss cost of the insurance paying \(1_{\{X>x\}}\), so the two uses of \(S\) are consistent. Premium equals expected loss plus margin; assets equal premium plus equity. All these quantities are functions of assets underlying the insurance.
We use the actuarial sign convention: large positive values are bad. Our concern is with quantiles \(q(p)\) for \(p\) near 1. Distortions are usually reversed, with \(g(s)\) for small \(s=1-p\) corresponding to bad outcomes. As far as possible we will use \(p\) in the context \(p\) close to 1 is bad and \(s\) when small \(s\) is bad.
Tail value at risk is defined for \(0\le p<1\) by \[\mathsf{TVaR}_p(X) := \frac{1}{1-p}\int_p^1 q(t)\,dt.\]
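For an empirical sample this reduces to averaging the worst \(1-p\) fraction of outcomes. A one-line illustration with made-up data:

```python
import numpy as np

# ten equally likely outcomes 1, 2, ..., 10; TVaR_0.8 is the average of
# the worst 20%, i.e. of {9, 10}
x = np.arange(1.0, 11.0)
p = 0.8
tvar = np.sort(x)[int(len(x) * p):].mean()   # 9.5
```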
Prices exclude all expenses. The risk free interest rate is zero. These are standard simplifying assumptions, e.g. Ibragimov et al. [2010].
The terminology describing risk measures is standard, and follows Föllmer and Schied [2011]. We work on a standard probability space, Svindland [2010], Appendix. It can be taken as \(\Omega=[0,1]\), with the Borel sigma-algebra and \(\mathsf P\) Lebesgue measure. The indicator function on a set \(A\) is \(1_A\), meaning \(1_A(x)=1\) if \(x\in A\) and \(1_A(x)=0\) otherwise.