2.2. Actuarial Student
Objectives: Introduce the aggregate library for working with aggregate probability distributions in the context of actuarial society exams (SOA exam STAM, CAS exam MAS-I, or IFoA CS-2) and university courses in (short-term) actuarial modeling.
Audience: Actuarial science university students and actuarial analysts.
Prerequisites: Familiarity with aggregate probability distributions as covered on actuarial society exams and basic insurance terminology from insurance company operations.
See also: Student for a more basic introduction; User Guides for other applications.
Contents:
2.2.1. Realistic Insurance Example
Assumptions. You are given the following information about a book of trucking liability insurance business.
Premium equals 2000 and the expected loss ratio equals 67.5%.
Ground-up severity has been fit to a lognormal distribution with a mean of 100 and CV (coefficient of variation) of 1.75.
All policies have a limit of 1000 with no deductible or retention.
Frequency is modeled using a Poisson distribution.
You model aggregate losses using the collective risk model.
Questions. Model aggregate losses using the collective risk model and compute the following:
The expected insured severity and expected claim count.
The aggregate expected value, standard deviation, CV, and skewness.
The probability aggregate losses exceed the premium.
The probability aggregate losses exceed 2500.
The expected value of aggregate losses limited to 2500.
The expected policyholder deficit in excess of 2500.
Answers.
Build an aggregate object using a simple DecL program.
The dataframe a01.describe
gives the answers to questions 1 and 2. It is printed and formatted automatically by qd(a01)
. Note the validation report in the last line.
In [1]: from aggregate import build, qd
In [2]: a01 = build('agg Actuary:01 '
...: '2000 premium at 0.675 lr '
...: '1000 xs 0 '
...: 'sev lognorm 100 cv 1.75 '
...: 'poisson')
...:
In [3]: qd(a01)
E[X] Est E[X] Err E[X] CV(X) Est CV(X) Skew(X) Est Skew(X)
X
Freq 13.948 0.26776 0.26776
Sev 96.788 96.788 8.8983e-11 1.4339 1.4339 3.5491 3.5491
Agg 1350 1350 -6.8412e-11 0.46809 0.46809 0.88368 0.88368
log2 = 16, bandwidth = 1/8, validation: not unreasonable.
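As a cross-check outside the aggregate library, the limited severity mean and the implied claim count shown in the table can be recomputed directly with scipy, assuming the lognormal (mean 100, CV 1.75) limited to the 1000 xs 0 layer and expected losses of 2000 × 0.675 = 1350:

```python
# Cross-check of the Sev and Freq means in the table above using scipy only.
# Assumptions: ground-up lognormal with mean 100 and CV 1.75, limited to 1000;
# expected aggregate losses = premium * loss ratio = 2000 * 0.675 = 1350.
import numpy as np
from scipy import stats

mean, cv, limit = 100.0, 1.75, 1000.0
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(mean) - sigma ** 2 / 2

# standard lognormal limited expected value:
# E[X ^ L] = E[X] * Phi((ln L - mu - sigma^2) / sigma) + L * S(L)
fz = stats.lognorm(sigma, scale=np.exp(mu))
lev = (mean * stats.norm.cdf((np.log(limit) - mu - sigma ** 2) / sigma)
       + limit * fz.sf(limit))
claim_count = 2000 * 0.675 / lev

print(f'limited severity mean = {lev:.3f}')    # about 96.788
print(f'expected claim count  = {claim_count:.3f}')  # about 13.948
```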
The survival function a01.sf
answers questions 3 and 4. qd
is used to print with reasonable defaults. The dataframe a01.density_df
contains limited expected values (lev), the expected policyholder deficit (epd), and other quantities, indexed by loss level. Querying it answers questions 5 and 6.
In [4]: qd(a01.sf(2000), a01.sf(2500))
0.14971
0.053466
In [5]: qd(a01.density_df.loc[[2500], ['F', 'S', 'lev', 'epd']])
F S lev epd
loss
2500.0 0.94653 0.053466 1327.7 0.016541
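The answers above can also be sanity-checked by brute force. The following Monte Carlo sketch (plain numpy, not part of the aggregate library; the Poisson claim rate is backed out from expected losses and the limited severity mean, and sampling error applies) simulates the collective risk model directly:

```python
# Monte Carlo sanity check of the aggregate answers above.
# Assumptions: Poisson frequency, lognormal(mean 100, CV 1.75) severity
# limited to 1000, expected aggregate losses 2000 * 0.675 = 1350.
import numpy as np
from scipy import stats

mean, cv, limit, el = 100.0, 1.75, 1000.0, 2000 * 0.675
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(mean) - sigma ** 2 / 2

# limited severity mean and implied Poisson claim rate
fz = stats.lognorm(sigma, scale=np.exp(mu))
lev = (mean * stats.norm.cdf((np.log(limit) - mu - sigma ** 2) / sigma)
       + limit * fz.sf(limit))
lam = el / lev

rng = np.random.default_rng(2024)
n_sims = 200_000
counts = rng.poisson(lam, n_sims)
severities = np.minimum(rng.lognormal(mu, sigma, counts.sum()), limit)
# sum each simulated year's claims; bincount handles zero-claim years correctly
agg = np.bincount(np.repeat(np.arange(n_sims), counts),
                  weights=severities, minlength=n_sims)

print(agg.mean())                    # near 1350
print(agg.std() / agg.mean())        # near 0.468
print((agg > 2000).mean())           # near 0.1497
print((agg > 2500).mean())           # near 0.0535
print(np.minimum(agg, 2500).mean())  # near 1327.7
```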
2.2.2. College and Exam Questions
College courses and the early actuarial exams often ask purely technical questions. Using the assumptions from the Realistic Insurance Example, answer the following.
Compute the severity lognormal parameters mu and sigma.
Compute the expected insured severity and expected claim count.
Compute the probability the aggregate exceeds the premium using the following matched moment approximations:
Normal
Gamma
Lognormal
Shifted gamma
Shifted lognormal
Using the aggregate and a lognormal approximation, compute:
The probability losses exceed 2500.
The expected value of losses limited to 2500.
The expected value of losses in excess of 2500.
The code below provides all the answers. mu_sigma_from_mean_cv
computes the lognormal parameters from a mean and CV, one of the most frequently written helper functions in actuarial science! Start by applying it to the given severity parameters, mean 100 and CV 1.75, to answer question 1.
In [6]: from aggregate import mu_sigma_from_mean_cv
In [7]: import pandas as pd
In [8]: print([round(x, 5) for x in mu_sigma_from_mean_cv(100, 1.75)])
[3.90427, 1.18398]
The function a01.approximate
parameterizes all the requested matched moment approximations, returning frozen scipy.stats
distribution objects that expose cdf
and sf
methods. The Aggregate
object a01
also has cdf
and sf
methods. Using these, we can assemble a dataframe to answer question 3.
In [9]: fz = a01.approximate('all')
In [10]: fz['agg'] = a01
In [11]: df = pd.DataFrame({k: v.sf(2000) for k, v in fz.items()}.items(),
....: columns=['Approximation', 'Value']
....: ).set_index("Approximation")
....:
In [12]: df['Error'] = df.Value / df.loc['agg', 'Value'] - 1
In [13]: qd(df.sort_values('Value'))
Value Error
Approximation
lognorm 0.13445 -0.1019
slognorm 0.14456 -0.03437
gamma 0.14689 -0.018844
sgamma 0.14745 -0.015088
agg 0.14971 0
norm 0.15183 0.014183
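For the unshifted approximations, matching moments means choosing parameters so the approximating distribution reproduces the aggregate mean and CV (the shifted variants also match skewness). A minimal sketch for the normal and gamma rows, assuming the aggregate moments 1350 and 0.46809 from the qd(a01) table:

```python
# Matched moment approximations: fix parameters so the approximating
# distribution has mean 1350 and CV 0.46809 (table values), then evaluate
# the survival probability at the 2000 premium.
from scipy import stats

m, cv = 1350.0, 0.46809
sd = m * cv

# normal: matches mean and standard deviation directly
norm_sf = stats.norm(m, sd).sf(2000)

# gamma: mean = shape * scale and CV^2 = 1 / shape
shape = 1 / cv ** 2
gamma_sf = stats.gamma(shape, scale=m / shape).sf(2000)

print(f'normal: {norm_sf:.5f}')   # about 0.1518
print(f'gamma:  {gamma_sf:.5f}')  # about 0.1469
```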
The function lognorm_lev
computes limited expected values for the lognormal. It is used to assemble a dataframe to answer question 4. In this case, the lognormal approximation EPD is more than a third higher than the more accurate estimate provided by aggregate
.
In [14]: from aggregate import lognorm_lev
In [15]: mu, sigma = mu_sigma_from_mean_cv(a01.agg_m, a01.agg_cv)
In [16]: lev = lognorm_lev(mu, sigma, 1, 2500)
In [17]: lev_agg = a01.density_df.loc[2500, 'lev']
In [18]: default = a01.agg_m - lev
In [19]: epd = default / a01.est_m
In [20]: default_agg = a01.est_m - lev_agg
In [21]: bit = pd.DataFrame((lev, default, lev_agg, default_agg, epd, default_agg / a01.agg_m),
....: index=pd.Index(['Lognorm LEV', 'Lognorm Default', 'Agg LEV',
....: 'Agg Default', 'Lognorm EPD', 'Agg EPD'],
....: name='Item'),
....: columns=['Value'])
....:
In [22]: qd(bit)
Value
Item
Lognorm LEV 1319.5
Lognorm Default 30.495
Agg LEV 1327.7
Agg Default 22.331
Lognorm EPD 0.022589
Agg EPD 0.016541
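The lognormal limited expected value used above has a closed form. A scipy-only sketch (illustrative, not the library's implementation) reproduces the Lognorm LEV and Lognorm EPD rows from the aggregate mean 1350 and CV 0.46809, as reported in the qd(a01) table:

```python
# Closed-form lognormal limited expected value, reproducing the
# 'Lognorm LEV' and 'Lognorm EPD' rows above from the aggregate
# mean 1350 and CV 0.46809 (table values, so rounded).
import numpy as np
from scipy import stats

m, cv, a = 1350.0, 0.46809, 2500.0
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(m) - sigma ** 2 / 2

# E[X ^ a] = E[X] * Phi((ln a - mu - sigma^2) / sigma) + a * S(a)
fz = stats.lognorm(sigma, scale=np.exp(mu))
lev = (m * stats.norm.cdf((np.log(a) - mu - sigma ** 2) / sigma)
       + a * fz.sf(a))
epd = (m - lev) / m

print(f'lev = {lev:.1f}, epd = {epd:.5f}')  # about 1319.5 and 0.02259
```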
2.2.3. Advantages of Modeling with Aggregate Distributions
Aggregate distributions provide a powerful modeling paradigm. They separate the analysis of frequency and severity, so different datasets can be used for each. KPW (Klugman, Panjer, and Willmot, Loss Models) list seven advantages.
Only the expected claim count changes with volume. The severity distribution is a characteristic of the line of business.
Inflation impacts ground-up severity but not claim count. The situation is more complicated when limits and deductibles apply.
Coverage terms impact occurrence limits and deductibles, which affect ground-up severity.
The impact on claims frequencies of changing deductibles is better understood.
Severity curves can be estimated from homogeneous data. Kaplan-Meier and related methods can adjust for censoring and truncation caused by limits and deductibles.
Retained, insured, ceded, and net losses can be modeled consistently.
Understanding properties of frequency and severity separately illuminates the shape of the aggregate.
2.2.4. Summary of Objects Created by DecL
Objects created by build()
in this guide.
In [23]: from aggregate import pprint_ex
In [24]: for n, r in build.qlist('^Actuary:').iterrows():
....: pprint_ex(r.program, split=20)
....: