2.2. Actuarial Student
Objectives: Introduce the aggregate library for working with aggregate probability distributions in the context of actuarial society exams (SOA exam STAM, CAS exam MAS-I, or IFoA CS-2) and university courses in (short-term) actuarial modeling.
Audience: Actuarial science university students and actuarial analysts.
Prerequisites: Familiarity with aggregate probability distributions as covered on actuarial society exams and basic insurance terminology from insurance company operations.
See also: Student for a more basic introduction; User Guides for other applications.
Contents:
2.2.1. Realistic Insurance Example
Assumptions. You are given the following information about a book of trucking liability insurance business.
Premium equals 2000 and the expected loss ratio equals 67.5%.
Ground-up severity has been fit to a lognormal distribution with a mean of 100 and CV (coefficient of variation) of 1.75.
All policies have a limit of 1000 with no deductible or retention.
Frequency is modeled using a Poisson distribution.
You model aggregate losses using the collective risk model.
Questions. Model aggregate losses using the collective risk model and compute the following:
The expected insured severity and expected claim count.
The aggregate expected value, standard deviation, CV, and skewness.
The probability aggregate losses exceed the premium.
The probability aggregate losses exceed 2500.
The expected value of aggregate losses limited to 2500.
The expected policyholder deficit in excess of 2500.
Answers.
Build an aggregate object using a simple DecL program.
The dataframe a01.describe
gives the answers to questions 1 and 2. It is printed and formatted automatically by qd(a01)
. Note the validation report in the last line.
In [1]: from aggregate import build, qd
In [2]: a01 = build('agg Actuary:01 '
...: '2000 premium at 0.675 lr '
...: '1000 xs 0 '
...: 'sev lognorm 100 cv 1.75 '
...: 'poisson')
...:
In [3]: qd(a01)
E[X] Est E[X] Err E[X] CV(X) Est CV(X) Skew(X) Est Skew(X)
X
Freq 13.948 0.26776 0.26776
Sev 96.788 96.788 8.8983e-11 1.4339 1.4339 3.5491 3.5491
Agg 1350 1350 -6.8412e-11 0.46809 0.46809 0.88368 0.88368
log2 = 16, bandwidth = 1/8, validation: not unreasonable.
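As a cross-check outside the aggregate library, the limited severity mean and the implied claim count shown in the table can be recomputed directly with scipy, assuming the lognormal (mean 100, CV 1.75) limited to the 1000 xs 0 layer and expected losses of 2000 × 0.675 = 1350:

```python
# Cross-check of the Sev and Freq means in the table above using scipy only.
# Assumptions: ground-up lognormal with mean 100 and CV 1.75, limited to 1000;
# expected aggregate losses = premium * loss ratio = 2000 * 0.675 = 1350.
import numpy as np
from scipy import stats

mean, cv, limit = 100.0, 1.75, 1000.0
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(mean) - sigma ** 2 / 2

# standard lognormal limited expected value:
# E[X ^ L] = E[X] * Phi((ln L - mu - sigma^2) / sigma) + L * S(L)
fz = stats.lognorm(sigma, scale=np.exp(mu))
lev = (mean * stats.norm.cdf((np.log(limit) - mu - sigma ** 2) / sigma)
       + limit * fz.sf(limit))
claim_count = 2000 * 0.675 / lev

print(f'limited severity mean = {lev:.3f}')    # about 96.788
print(f'expected claim count  = {claim_count:.3f}')  # about 13.948
```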
The survival function a01.sf
answers questions 3 and 4. qd
is used to print with reasonable defaults. The dataframe a01.density_df
contains limited expected values (lev), the expected policyholder deficit (epd), and other quantities, indexed by loss level. Querying it answers questions 5 and 6.
In [4]: qd(a01.sf(2000), a01.sf(2500))
0.14971
0.053466
In [5]: qd(a01.density_df.loc[[2500], ['F', 'S', 'lev', 'epd']])
F S lev epd
loss
2500.0 0.94653 0.053466 1327.7 0.016541
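The answers above can also be sanity-checked by brute force. The following Monte Carlo sketch (plain numpy, not part of the aggregate library; the Poisson claim rate is backed out from expected losses and the limited severity mean, and sampling error applies) simulates the collective risk model directly:

```python
# Monte Carlo sanity check of the aggregate answers above.
# Assumptions: Poisson frequency, lognormal(mean 100, CV 1.75) severity
# limited to 1000, expected aggregate losses 2000 * 0.675 = 1350.
import numpy as np
from scipy import stats

mean, cv, limit, el = 100.0, 1.75, 1000.0, 2000 * 0.675
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(mean) - sigma ** 2 / 2

# limited severity mean and implied Poisson claim rate
fz = stats.lognorm(sigma, scale=np.exp(mu))
lev = (mean * stats.norm.cdf((np.log(limit) - mu - sigma ** 2) / sigma)
       + limit * fz.sf(limit))
lam = el / lev

rng = np.random.default_rng(2024)
n_sims = 200_000
counts = rng.poisson(lam, n_sims)
severities = np.minimum(rng.lognormal(mu, sigma, counts.sum()), limit)
# sum each simulated year's claims; bincount handles zero-claim years correctly
agg = np.bincount(np.repeat(np.arange(n_sims), counts),
                  weights=severities, minlength=n_sims)

print(agg.mean())                    # near 1350
print(agg.std() / agg.mean())        # near 0.468
print((agg > 2000).mean())           # near 0.1497
print((agg > 2500).mean())           # near 0.0535
print(np.minimum(agg, 2500).mean())  # near 1327.7
```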
2.2.2. College and Exam Questions
College courses and the early actuarial exams often ask purely technical questions. Using the assumptions from the Realistic Insurance Example, answer the following.
Compute the severity lognormal parameters mu and sigma.
Compute the expected insured severity and expected claim count.
Compute the probability the aggregate exceeds the premium using the following matched moment approximations:
Normal
Gamma
Lognormal
Shifted gamma
Shifted lognormal
Using the aggregate and a lognormal approximation, compute:
The probability losses exceed 2500.
The expected value of losses limited to 2500.
The expected value of losses in excess of 2500.
The code below provides all the answers. mu_sigma_from_mean_cv
computes the lognormal parameters from a mean and CV, one of the most frequently written helper functions in actuarial science! Start by applying it to the given severity parameters, mean 100 and CV 1.75, to answer question 1.
In [6]: from aggregate import mu_sigma_from_mean_cv
In [7]: import pandas as pd
In [8]: print([round(x, 5) for x in mu_sigma_from_mean_cv(100, 1.75)])
[3.90427, 1.18398]
The function a01.approximate
parameterizes all the requested matched moment approximations, returning frozen scipy.stats
distribution objects that expose cdf
and sf
methods. The Aggregate
object a01
also has cdf
and sf
methods. Using these, we can assemble a dataframe to answer question 3.
In [9]: fz = a01.approximate('all')
In [10]: fz['agg'] = a01
In [11]: df = pd.DataFrame({k: v.sf(2000) for k, v in fz.items()}.items(),
....: columns=['Approximation', 'Value']
....: ).set_index("Approximation")
....:
In [12]: df['Error'] = df.Value / df.loc['agg', 'Value'] - 1
In [13]: qd(df.sort_values('Value'))
Value Error
Approximation
lognorm 0.13445 -0.1019
slognorm 0.14456 -0.03437
gamma 0.14689 -0.018844
sgamma 0.14745 -0.015088
agg 0.14971 0
norm 0.15183 0.014183
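For the unshifted approximations, matching moments means choosing parameters so the approximating distribution reproduces the aggregate mean and CV (the shifted variants also match skewness). A minimal sketch for the normal and gamma rows, assuming the aggregate moments 1350 and 0.46809 from the qd(a01) table:

```python
# Matched moment approximations: fix parameters so the approximating
# distribution has mean 1350 and CV 0.46809 (table values), then evaluate
# the survival probability at the 2000 premium.
from scipy import stats

m, cv = 1350.0, 0.46809
sd = m * cv

# normal: matches mean and standard deviation directly
norm_sf = stats.norm(m, sd).sf(2000)

# gamma: mean = shape * scale and CV^2 = 1 / shape
shape = 1 / cv ** 2
gamma_sf = stats.gamma(shape, scale=m / shape).sf(2000)

print(f'normal: {norm_sf:.5f}')   # about 0.1518
print(f'gamma:  {gamma_sf:.5f}')  # about 0.1469
```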
The function lognorm_lev
computes limited expected values for the lognormal. It is used to assemble a dataframe to answer question 4. In this case, the lognormal approximation EPD is more than a third higher than the more accurate estimate provided by aggregate
.
In [14]: from aggregate import lognorm_lev
In [15]: mu, sigma = mu_sigma_from_mean_cv(a01.agg_m, a01.agg_cv)
In [16]: lev = lognorm_lev(mu, sigma, 1, 2500)
In [17]: lev_agg = a01.density_df.loc[2500, 'lev']
In [18]: default = a01.agg_m - lev
In [19]: epd = default / a01.est_m
In [20]: default_agg = a01.est_m - lev_agg
In [21]: bit = pd.DataFrame((lev, default, lev_agg, default_agg, epd, default_agg / a01.agg_m),
....: index=pd.Index(['Lognorm LEV', 'Lognorm Default', 'Agg LEV',
....: 'Agg Default', 'Lognorm EPD', 'Agg EPD'],
....: name='Item'),
....: columns=['Value'])
....:
In [22]: qd(bit)
Value
Item
Lognorm LEV 1319.5
Lognorm Default 30.495
Agg LEV 1327.7
Agg Default 22.331
Lognorm EPD 0.022589
Agg EPD 0.016541
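The lognormal limited expected value used above has a closed form. A scipy-only sketch (illustrative, not the library's implementation) reproduces the Lognorm LEV and Lognorm EPD rows from the aggregate mean 1350 and CV 0.46809, as reported in the qd(a01) table:

```python
# Closed-form lognormal limited expected value, reproducing the
# 'Lognorm LEV' and 'Lognorm EPD' rows above from the aggregate
# mean 1350 and CV 0.46809 (table values, so rounded).
import numpy as np
from scipy import stats

m, cv, a = 1350.0, 0.46809, 2500.0
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(m) - sigma ** 2 / 2

# E[X ^ a] = E[X] * Phi((ln a - mu - sigma^2) / sigma) + a * S(a)
fz = stats.lognorm(sigma, scale=np.exp(mu))
lev = (m * stats.norm.cdf((np.log(a) - mu - sigma ** 2) / sigma)
       + a * fz.sf(a))
epd = (m - lev) / m

print(f'lev = {lev:.1f}, epd = {epd:.5f}')  # about 1319.5 and 0.02259
```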
2.2.3. Advantages of Modeling with Aggregate Distributions
Aggregate distributions provide a powerful modeling paradigm. They separate the analysis of frequency and severity, so different datasets can be used for each. KPW (Klugman, Panjer, and Willmot, Loss Models) list seven advantages.
Only the expected claim count changes with volume. The severity distribution is a characteristic of the line of business.
Inflation impacts ground-up severity but not claim count. The situation is more complicated when limits and deductibles apply.
Coverage terms impact occurrence limits and deductibles, which affect ground-up severity.
The impact on claims frequencies of changing deductibles is better understood.
Severity curves can be estimated from homogeneous data. Kaplan-Meier and related methods can adjust for censoring and truncation caused by limits and deductibles.
Retained, insured, ceded, and net losses can be modeled consistently.
Understanding properties of frequency and severity separately illuminates the shape of the aggregate.
2.2.4. Summary of Objects Created by DecL
Objects created by build()
in this guide.
In [23]: from aggregate import pprint_ex
In [24]: for n, r in build.qlist('^Actuary:').iterrows():
....: pprint_ex(r.program, split=20)
....: