3.3. Distributions Module

3.3.1. Frequency Class

class aggregate.distributions.Frequency(freq_name, freq_a, freq_b, freq_zm, freq_p0)[source]

Manages Frequency distributions: creates moment function and MGF.

  • freq_moms(n): returns EN, EN^2 and EN^3 when EN=n

  • freq_pgf(n, z): returns the probability generating function evaluated at z (equivalently, the MGF at log z) when EN=n

Frequency distributions are either non-mixture types or mixture types.

Non-Mixture Frequency Types

  • fixed: no parameters

  • bernoulli: exp_en interpreted as a probability, must be < 1

  • binomial: Binomial(n/p, p) where p = freq_a, and n = exp_en

  • poisson: Poisson(n)

  • geometric: geometric(1/(n + 1)), supported on 0, 1, 2, …

  • logarithmic: logarithmic(theta), supported on 1, 2, …; theta solved numerically

  • neymana: Neyman A, a Po(n/freq_a) stopped sum of Po(freq_a); freq_a = “eggs per cluster”

  • negbin: freq_a is the variance multiplier, the ratio of variance to mean (see the check after this list)

  • pascal: (generalized) Pascal-Poisson distribution, a Poisson stopped sum of negative binomials; exp_en gives the overall claim count. freq_a is the CV of the frequency distribution and freq_b is the number of claimants per claim (or claims per occurrence). Hence, the Poisson component has mean exp_en / freq_b and the number of claims per occurrence has mean freq_b. This parameterization may not be ideal(!).
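For example, with a degenerate severity of 1 the aggregate equals the frequency, so the negbin variance multiplier can be read off the reported statistics. A minimal sketch in the style of the code proofs below (parameter values are illustrative):

from aggregate import build, qd

vm = 3    # variance multiplier: variance / mean of the frequency
nb = build(f'agg VarMult 10 claims dsev[1] negbin {vm}')
qd(nb)
# with mean 10, the reported CV should be close to (vm / 10) ** 0.5 = 0.5477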

Mixture Frequency Types

These distributions are G-mixed Poisson, so N | G ~ Poisson(n G). They are labelled by the name of the mixing distribution or the common name for the resulting frequency distribution. See Panjer and Willmot or JKK.

In all cases freq_a is the CV of the mixing distribution, which corresponds to the asymptotic CV of the frequency distribution and of any aggregate when the severity has a variance. See the code check after the list below.

  • gamma: negative binomial, freq_a = cv of gamma distribution

  • delaporte: shifted gamma, freq_a = cv of mixing distribution, freq_b = proportion of certain claims = shift. freq_b must be between 0 and 1.

  • ig: inverse gaussian, freq_a = cv of mixing distribution

  • sig: shifted inverse gaussian, freq_a = cv of mixing distribution, freq_b = proportion of certain claims = shift. freq_b must be between 0 and 1.

  • sichel: generalized inverse gaussian mixing distribution, freq_a = cv of mixing distribution and freq_b = lambda value. The beta and mu parameters are solved to match moments. Note lambda = -0.5 corresponds to inverse gaussian and 0.5 to reciprocal inverse gaussian. Other special cases are available.

  • sichel.gamma: generalized inverse gaussian mixture where the parameters match the moments of a delaporte distribution with given freq_a and freq_b

  • sichel.ig: generalized inverse gaussian mixture where the parameters match the moments of a shifted inverse gaussian distribution with given freq_a and freq_b. This parameterization has poor numerical stability and may fail.

  • beta: beta mixing with freq_a = CV where beta is supported on the interval [0, freq_b]. This method should be used carefully. It has poor numerical stability and can produce bizarre aggregates when the alpha or beta parameters are < 1 (so there is a mode at 0 or freq_b).
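As a check on the gamma case: a gamma mixing CV of c applied to a Poisson with mean n produces a negative binomial with variance multiplier 1 + c^2 n. A minimal sketch in the style of the code proofs below (the mixed gamma DecL clause and parameter values are assumptions for illustration):

from aggregate import build, qd
import pandas as pd

n, c = 10, 0.5             # mean claim count and mixing CV (illustrative)
vm = 1 + c**2 * n          # implied negative binomial variance multiplier

mixed = build(f'agg GammaMix {n} claims dsev[1] mixed gamma {c}')
nb = build(f'agg NegBin {n} claims dsev[1] negbin {vm}')
qd(mixed)
qd(nb)

df = pd.concat((mixed.density_df.p_total, nb.density_df.p_total), axis=1)
df.columns = ['mixed_gamma', 'negbin']
assert (df.mixed_gamma - df.negbin).abs().max() < 1e-5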

Code proof for Neyman A:

from aggregate import build, qd
import pandas as pd
mean = 10
eggs_per_cluster = 4
neya = build(f'agg Neya {mean} claims dsev[1] neymana {eggs_per_cluster}')
qd(neya)

po = build(f'agg Po4 {eggs_per_cluster} claims dsev[1] poisson')
po_pmf = po.density_df.query('p_total > 1e-13').p_total

byhand = build(f'agg ByHand {mean / eggs_per_cluster} claims dsev {list(po_pmf.index)} {po_pmf.values} poisson')
qd(byhand)

df = pd.concat((neya.density_df.p_total, byhand.density_df.p_total), axis=1)
df.columns = ['neya', 'byhand']
df['err'] = df.neya - df.byhand
assert df.err.abs().max() < 1e-5
df.head(40)

Code proof for Pascal:

from aggregate import build, qd
import pandas as pd
mean = 10
claims_per_occ = 1.24
overall_cv = 1.255
pascal = build(f'agg PascalEg {mean} claims dsev[1] pascal {overall_cv} {claims_per_occ}', log2=16)
qd(pascal)

c = (mean * overall_cv**2 - 1 - claims_per_occ) / claims_per_occ
th = claims_per_occ * c
a = 1 / c
# from form of nb pgf identify r = a and beta = theta, mean is rb, var is rb(1+b)
nb = build(f'agg NB {claims_per_occ} claims dsev[1] negbin {th + 1}', log2=16)
nb_pmf = nb.density_df.query('p_total > 1e-13').p_total
qd(nb)

byhand = build(f'agg ByHand {mean / claims_per_occ} claims dsev {list(nb_pmf.index)} {nb_pmf.values} poisson', log2=16)
qd(byhand)

df = pd.concat((pascal.density_df.p_total, byhand.density_df.p_total), axis=1)
df.columns = ['pascal', 'byhand']
df['err'] = df.pascal - df.byhand
assert df.err.abs().max() < 1e-5
df.head(40)
Parameters
  • freq_name – name of the frequency distribution, poisson, geometric, etc.

  • freq_a

  • freq_b

__init__(freq_name, freq_a, freq_b, freq_zm, freq_p0)[source]

Creates the freq_pgf and moment function:

  • moment function(n) returns EN, EN^2, EN^3 when EN=n.

  • freq_pgf(n, z) is the probability generating function evaluated at z (the MGF evaluated at log(z)) when EN=n

Parameters
  • freq_name – name of the frequency distribution, poisson, geometric, etc.

  • freq_a

  • freq_b

  • freq_zm – True if zero modified, default False

  • freq_p0 – modified p0, probability of zero claims
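A minimal sketch of the moment function, assuming Frequency can be instantiated standalone with the signature above (values are illustrative):

import numpy as np
from aggregate.distributions import Frequency

freq = Frequency('poisson', 0, 0, False, np.nan)   # unmodified Poisson
en, en2, en3 = freq.freq_moms(10)
# Poisson non-central moments: E[N] = n, E[N^2] = n + n^2, E[N^3] = n^3 + 3n^2 + n
assert np.allclose([en, en2, en3], [10, 110, 1310])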

3.3.2. Severity Class

class aggregate.distributions.Severity(sev_name, exp_attachment=None, exp_limit=inf, sev_mean=0, sev_cv=0, sev_a=nan, sev_b=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, name='', note='')[source]
__init__(sev_name, exp_attachment=None, exp_limit=inf, sev_mean=0, sev_cv=0, sev_a=nan, sev_b=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, name='', note='')[source]

A continuous random variable; subclasses scipy.stats.rv_continuous, adding layer and attachment functionality. It overrides

  • cdf

  • pdf

  • isf

  • ppf

  • sf

  • stats

See scipy.stats continuous rvs for more details about available distributions. The following distributions with two shape parameters are supported:

  • Burr (burr)

  • Generalized Pareto (genpareto)

  • Generalized gamma (gengamma)

It is easy to add others in the code below. With two shape parameters the mean cv input format is not available.

See code in Problems and Solutions to extract distributions from scipy stats by introspection.

Parameters
  • sev_name – scipy.stats continuous distribution name | (c|d)histogram (continuous or discrete) | fixed

  • exp_attachment – None if layer is missing, distinct from 0; if a = 0 then losses are conditional on X>a, if a = None then losses are conditional on X>=0

  • exp_limit

  • sev_mean

  • sev_cv

  • sev_a – first shape parameter

  • sev_b – second shape parameter (e.g., beta)

  • sev_loc – scipy.stats location parameter

  • sev_scale – scipy.stats scale parameter

  • sev_xs – for fixed or histogram classes

  • sev_ps

  • sev_wt – this is not used directly; but it is convenient to pass it in and ignore it because sevs are implicitly created with sev_wt=1.

  • sev_conditional – conditional or unconditional; for severities use conditional

  • name – name of the severity object

  • note – optional note.
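A minimal usage sketch, assuming a lognormal severity in a 1M xs 1M layer (all parameter values are illustrative):

from aggregate.distributions import Severity

# ground-up lognormal with mean 500,000 and cv 2, in a 1M xs 1M layer
sev = Severity('lognorm', exp_attachment=1e6, exp_limit=1e6,
               sev_mean=500e3, sev_cv=2)
print(sev.moms())       # first three moments of the layer severity
print(sev.sf(0.5e6))    # survival function reflects the layer and attachment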

cv_to_shape(cv, hint=1)[source]

Create a frozen object of type dist_name with the given cv. The lognormal, gamma, inverse gamma and inverse gaussian distributions are solved analytically. Other distributions are solved numerically and may be unstable.

Parameters
  • cv

  • hint

Returns

mean_to_scale(shape, mean, loc=0)[source]

Adjust the scale to achieve the desired mean. Return a frozen instance.

Parameters
  • shape

  • mean

  • loc – location parameter (note: location is added to the mean…)

Returns

moms()[source]

Revised moments for the Severity class. We want the moments of the layer loss

\(X(a, d) = \min(d - a, (X - a)^+)\),

the layer from attachment a to detachment d, so that

\(E[X(a,d)^n] = \int_a^d (x-a)^n f(x)\,dx + (d-a)^n S(d)\).

Substitute x = q(p), F(x) = p, f(x)dx = dp, for n = 1, 2, 3:

\(E[X(a,d)^n] = \int_{F(a)}^{F(d)} (q(p)-a)^n\,dp + (d-a)^n S(d)\).

The base computation is \(\int_{F(a)}^{F(d)} q(p)^n\,dp\). These are the exi values below. They are then adjusted (expanding \((q(p)-a)^n\) binomially) to create the moments needed.

The old moments routine tried to compute integrals of the form int S(x)dx, but that is over a large, non-compact domain and did not work well. With 0.9.3 old_moms was removed. The old_moms code did this:

ex1 = safe_integrate(lambda x: self.fz.sf(x), 1)
ex2 = safe_integrate(lambda x: 2 * (x - self.attachment) * self.fz.sf(x), 2)
ex3 = safe_integrate(lambda x: 3 * (x - self.attachment) ** 2 * self.fz.sf(x), 3)

Test examples

import numpy as np
import pandas as pd
import scipy.stats as ss
import aggregate as agg

# `moms` refers to the revised moment routine (this method), patched onto the object below
def test(mu, sigma, a, y):
    global moms
    import types
    # analytic moments with no layer or attachment
    fz = ss.lognorm(sigma, scale=np.exp(mu))
    tv = np.array([np.exp(k*mu + k * k * sigma**2/2) for k in range(1,4)])

    # old method
    s = agg.Severity('lognorm', sev_a=sigma, sev_scale=np.exp(mu),
                     exp_attachment=a, exp_limit=y)
    est = np.array(s.old_moms())

    # swap out moment routine
    setattr(s, moms.__name__, types.MethodType(moms, s))
    ans = np.array(s.moms())

    # summarize and report
    sg = f'Example: mu={mu}  sigma={sigma}  a={a}  y={y}'
    print(f'{sg}\n{"="*len(sg)}')
    print(pd.DataFrame({'new_ans' : ans, 'old_ans': est,
                        'err': ans/est-1, 'no_la_analytic' : tv}))


test(8.7, .5, 0, np.inf)
test(8.7, 2.5, 0, np.inf)
test(8.7, 2.5, 10e6, 200e6)

Example: mu=8.7, sigma=0.5, a=0, y=inf

        new_ans       old_ans           err  no_la_analytic
0  6.802191e+03  6.802191e+03  3.918843e-11    6.802191e+03
1  5.941160e+07  5.941160e+07  3.161149e-09    5.941160e+07
2  6.662961e+11  6.662961e+11  2.377354e-08    6.662961e+11

Example: mu=8.7 sigma=2.5 a=0 y=inf (here the old method failed)

        new_ans       old_ans           err  no_la_analytic
0  1.366256e+05  1.366257e+05 -6.942541e-08    1.366257e+05
1  9.663487e+12  1.124575e+11  8.493016e+01    9.669522e+12
2  2.720128e+23  7.597127e+19  3.579469e+03    3.545017e+23

Example: mu=8.7 sigma=2.5 a=10000000.0 y=200000000.0

        new_ans       old_ans           err  no_la_analytic
0  1.692484e+07  1.692484e+07  2.620126e-14    1.366257e+05
1  1.180294e+15  1.180294e+15  5.242473e-13    9.669522e+12
2  1.538310e+23  1.538310e+23  9.814372e-14    3.545017e+23

The numerical issues are very sensitive; the implementation goes for a compromise between speed and accuracy. This is only an issue for very thick-tailed distributions with no upper limit, which is not a realistic situation. Here is a tester program for two common cases:

import numpy as np
from aggregate import Severity, logger_level

logger_level(30)  # warnings only
for sh, dist in zip([1,2,3,4, 3.5,2.5,1.5,.5], ['lognorm']*3 + ['pareto']*4):
    s = Severity(dist, sev_a=sh, sev_scale=1, exp_attachment=0)
    print(dist,sh, s.moms())
    if dist == 'lognorm':
        print('actual', [(n, np.exp(n*n*sh*sh/2)) for n in range(1,4)])
Returns

vector of moments. np.nan signals an unreliable but finite value. np.inf is correct: the moment does not exist.

plot(n=100, axd=None, figsize=(7.0, 4.9), layout='AB\nCD')[source]

Quick plot, updated for 0.9.3 with mosaic and no grid lines. (F(x), x) plot replaced with log density plot.

Parameters
  • n – number of points to plot.

  • axd – axis dictionary; if None, create a new figure. Must have keys ‘A’, ‘B’, ‘C’, ‘D’.

  • figsize – (width, height) in inches.

  • layout – the subplot_mosaic layout of the figure. Default is ‘AB\nCD’.

Returns

3.3.3. Aggregate Class

class aggregate.distributions.Aggregate(name, exp_el=0, exp_premium=0, exp_lr=0, exp_en=0, exp_attachment=None, exp_limit=inf, sev_name='', sev_a=nan, sev_b=0, sev_mean=0, sev_cv=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, sev_pick_attachments=None, sev_pick_losses=None, occ_reins=None, occ_kind='', freq_name='', freq_a=0, freq_b=0, freq_zm=False, freq_p0=nan, agg_reins=None, agg_kind='', note='')[source]
__init__(name, exp_el=0, exp_premium=0, exp_lr=0, exp_en=0, exp_attachment=None, exp_limit=inf, sev_name='', sev_a=nan, sev_b=0, sev_mean=0, sev_cv=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, sev_pick_attachments=None, sev_pick_losses=None, occ_reins=None, occ_kind='', freq_name='', freq_a=0, freq_b=0, freq_zm=False, freq_p0=nan, agg_reins=None, agg_kind='', note='')[source]

The Aggregate distribution class manages creation and calculation of aggregate distributions. It allows for very flexible creation of Aggregate distributions. Severity can express a limit profile, a mixed severity or both. Mixed frequency types share a mixing distribution across all broadcast terms to ensure an appropriate inter-class correlation.

Parameters
  • name – name of the aggregate

  • exp_el – expected loss or vector

  • exp_premium – premium volume or vector (requires loss ratio)

  • exp_lr – loss ratio or vector (requires premium)

  • exp_en – expected claim count per segment (self.n = total claim count)

  • exp_attachment – occurrence attachment; None indicates no limit clause, which is treated differently from an attachment of zero.

  • exp_limit – occurrence limit

  • sev_name – severity name or sev.BUILTIN_SEV or meta.var agg or port or similar or vector or matrix

  • sev_a – scipy stats shape parameter

  • sev_b – scipy stats shape parameter

  • sev_mean – average (unlimited) severity

  • sev_cv – unlimited severity coefficient of variation

  • sev_loc – scipy stats location parameter

  • sev_scale – scipy stats scale parameter

  • sev_xs – xs and ps must be provided if sev_name is (c|d)histogram, xs are the bucket break points

  • sev_ps – ps are the probability densities within each bucket; if buckets equal size no adjustments needed

  • sev_wt – weight for mixed distribution

  • sev_lb – lower bound for severity (length of sev_lb must equal length of sev_ub and weights)

  • sev_ub – upper bound for severity

  • sev_conditional – if True, severity is conditional, else unconditional.

  • sev_pick_attachments – if not None, a list of attachment points to define picks

  • sev_pick_losses – if not None, a list of losses by layer

  • occ_reins – layers: share po layer xs attach or XXXX

  • occ_kind – ceded to or net of

  • freq_name – name of frequency distribution

  • freq_a – cv of freq dist mixing distribution

  • freq_b – claims per occurrence (delaporte or sig), scale of beta or lambda (Sichel)

  • freq_zm – True/False zero modified flag

  • freq_p0 – if freq_zm, provides the modified value of p0; default is nan

  • agg_reins – layers

  • agg_kind – ceded to or net of

  • note – note, enclosed in {}

_apply_reins_work(reins_list, base_density, debug=False)[source]

Actually do the work. Called by apply_reins and reins_audit_df. Only needs self to get limits, which it must guess without q (not computed at this stage). Does not need to know if occ or agg reins, only that the correct base_density is supplied.

Parameters
  • reins_list

  • kind – occ or agg, for debug plotting

  • debug

Returns

ceder, netter,

_reins_audit_df_work(kind='occ')[source]

Apply each reinsurance layer separately and aggregate loss and other stats.

_repr_html_()[source]

For IPython.display

aggregate_error_analysis(log2, bs2_from=None, **kwargs)[source]

Analysis of aggregate error across a range of bucket sizes. If bs2_from is None, use recommend_bucket plus/minus 3. Note: if the distribution does not have a second moment, you must enter bs2_from.

Parameters
  • log2

  • bs2_from – lower bound on bs to use, in log2 terms; estimate using recommend_bucket if not input.

  • kwargs – passed to update

apply_agg_reins(debug=False, padding=1, tilt_vector=None)[source]

Apply the entire agg reins structure and save output. For by layer detail create reins_audit_df. Makes sev_density_gross, sev_density_net and sev_density_ceded, and updates sev_density to the requested view.

Not reflected in statistics df.

Returns

apply_distortion(dist)[source]

Apply distortion to the aggregate density and append as the exag column to density_df.

TODO: check consistency with other implementations.

Parameters

dist – the distortion to apply

Returns

apply_occ_reins(debug=False)[source]

Apply the entire occurrence reinsurance structure and save output. For by-layer detail create reins_audit_df. Makes sev_density_gross, sev_density_net and sev_density_ceded, and updates sev_density to the requested view.

Not reflected in statistics df.

Parameters

debug – More verbose.

Returns

approximate(approx_type='slognorm', output='scipy')[source]

Create an approximation to self using method of moments matching.

Compared to Portfolio.approximate, which returns a single-severity, fixed-frequency agg, this returns a scipy distribution by default.

Use case: exam questions with the normal approximation!

Parameters
  • approx_type – norm, lognorm, slognorm (shifted lognormal), gamma, sgamma. If ‘all’ then returns a dictionary of each approx.

  • output – ‘scipy’: frozen scipy.stats continuous rv object; ‘sev_decl’: DecL program for the severity (to substitute into an agg; no name); ‘sev_kwargs’: dictionary of parameters to create a Severity; ‘agg_decl’: DecL program agg T 1 claim sev_decl fixed; any other string: created Aggregate object

Returns

as above.
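A minimal usage sketch (the DecL program and parameter values are illustrative):

from aggregate import build, qd

a = build('agg ApproxEg 10 claims sev lognorm 10 cv 0.5 poisson')
fz = a.approximate('slognorm')       # frozen scipy.stats object (default output)
print(fz.stats(moments='mvs'))       # compare with the statistics shown by qd(a)
qd(a)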

cdf(x, kind='previous')[source]

Return cumulative probability distribution at x using kind interpolation.

2022-10 change: kind introduced; default was linear

Parameters

x – loss size

Returns

cramer_lundberg(rho, cap=0, excess=0, stop_loss=0, kind='index', padding=1)

Return the Pollaczek-Khinchine capital function relating surplus to the eventual probability of ruin. Assumes frequency is Poisson.

See Embrechts, Kluppelberg, Mikosch 1.2, page 28 Formula 1.11

TODO: Should return a named tuple.

Parameters
  • rho – rho = prem / loss - 1 is the margin-to-loss ratio

  • cap – cap = cap severity at cap, which replaces severity with X | X <= cap

  • excess – excess = replace severity with X | X > cap (i.e. no shifting)

  • stop_loss – stop_loss = apply stop loss reinsurance to cap, so X > stop_loss replaced with Pr(X > stop_loss) mass

  • kind

  • padding – for update (the frequency tends to be high, so more padding may be needed)

Returns

ruin vector as pd.Series and function to lookup (no interpolation if kind==index; else interp) capitals
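A sketch of a typical call (the DecL program and margin are illustrative; per the description above, the result contains the ruin vector and a capital lookup function):

from aggregate import build

a = build('agg RuinEg 10 claims sev lognorm 1 cv 0.75 poisson')
ans = a.cramer_lundberg(rho=0.2)    # 20% margin-to-loss ratio
print(ans)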

property density_df

Create and return the density_df data frame. A read only property, though if you write d = a.density_df you can obviously edit d. Some duplication of columns (p and p_total) to ensure consistency with Portfolio.

Returns

DataFrame similar to Portfolio.density_df.

property describe

Theoretic and empirical stats. Used in _repr_html_.

discretize(sev_calc, discretization_calc, normalize)[source]

Discretize the severity distributions and weight.

sev_calc describes how the severity is discretized; see Discretizing the Severity Distribution. The options are discrete=round, forward, backward or moment.

sev_calc='continuous' (same as forward, kept for backwards compatibility) is used when you think of the resulting distribution as continuous across the buckets (which we generally don’t). The buckets are not shifted and so \(Pr(X=b_i) = Pr( b_{i-1} < X \le b_i)\). Note that \(b_{i-1}=-bs/2\) is prepended.

We use the discretized distribution as though it is fully discrete and only takes values at the bucket points. Hence, we should use sev_calc=’discrete’. The buckets are shifted left by half a bucket, so \(Pr(X=b_i) = Pr( b_i - b/2 < X \le b_i + b/2)\).

The other wrinkle is the right-hand end of the range. If we extend to np.inf then we ensure the probabilities sum to 1, but that method introduces a probability mass in the last bucket that is often not desirable (we expect to see a smooth continuous distribution, and we get a mass). The alternative is to use an endpoint one bucket beyond the last, which avoids this problem but can leave the probabilities short. We opt for the latter and normalize (rescale).

discretization_calc controls whether individual probabilities are computed using backward differences of the survival function, forward differences of the distribution function, or both. The former is most accurate in the right tail and the latter in the left tail of the distribution. We are usually concerned with the right tail, so prefer survival. Using both takes the greater of the two estimates, giving the best of both worlds (underflow makes the distribution zero in the right tail and the survival zero in the left tail, so the maximum gives the best estimate) at the expense of computing time.

Sensible defaults: sev_calc=discrete, discretization_calc=survival, normalize=True.

Parameters
  • sev_calc – discrete=round, forward, backward, or continuous and method becomes discrete otherwise

  • discretization_calc – survival, distribution or both; in addition the method then becomes survival

  • normalize – if True, normalize the severity so sum probs = 1. This is generally what you want; but when dealing with thick tailed distributions it can be helpful to turn it off.

Returns
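A minimal sketch comparing sev_calc options through update (the DecL program, bucket size and displayed statistics are illustrative):

from aggregate import build, qd

a = build('agg DiscEg 5 claims sev lognorm 10 cv 1 poisson', update=False)
for sev_calc in ['discrete', 'forward', 'backward']:
    a.update(log2=16, bs=1/32, sev_calc=sev_calc)
    print(sev_calc)
    qd(a)    # compare empirical aggregate statistics under each scheme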

easy_update(log2=16, bs=0, recommend_p=0.99999, debug=False, **kwargs)

Convenience function, delegates to update_work. Avoids having to pass xs. Also aliased as easy_update for backward compatibility.

Parameters
  • log2

  • bs

  • recommend_p – p value passed to recommend_bucket. If > 1 converted to 1 - 10**-p in rec bucket.

  • debug

  • kwargs – passed through to update

Returns

entropy_fit(n_moments, tol=1e-10, verbose=False)[source]

Find the maximum entropy fit to the aggregate based on matching n_moments moments. The constant is added (sum of probabilities constraint), so for two moments there are n_const = 3 constraints.

Based on discussions with, and R code from, Jon Evans

Run

ans = obj.entropy_fit(2)
ans['ans_df'].plot()

to compare the fits.

Parameters
  • n_moments – number of moments to match

  • tol

  • verbose

Returns

explain_validation()[source]

Explain validation result. Validation computed if needed.

freq_pmf(log2)[source]

Return the frequency probability mass function (pmf) computed using 2**log2 buckets. Uses self.en to compute the expected frequency. The Frequency does not know the expected claim count, so this is a method of Aggregate.

html_info_blob()[source]

Text top of _repr_html_

json()[source]

Write spec to json string.

Returns

limits(stat='range', kind='linear', zero_mass='include')[source]

Suggest sensible plotting limits for kind=range, density, etc., same as Portfolio.

Should optionally return a locator for plots?

Called by plotting routines. Single point of failure!

Must work without q function when not computed (apply_reins_work for occ reins; then use report_ser instead).

Parameters
  • stat – range or density (for y axis)

  • kind – linear or log (this is the y-axis, not log of range…that is rarely plotted)

  • zero_mass – include exclude, for densities

Returns

more(regex)[source]

More information about methods and properties matching regex

pdf(x)[source]

Probability density function, assuming a continuous approximation of the bucketed density.

Parameters

x

Returns

picks(attachments, layer_loss_picks, debug=False)[source]

Adjust the computed severity to hit picks targets in layers defined by the attachments. Delegates work to utilities.picks_work. See that function for details.

plot(axd=None, xmax=0, **kwargs)[source]

Basic plot with severity and aggregate, linear and log plots and Lee plot.

Parameters
  • xmax – Enter a “hint” for the xmax scale. E.g., if plotting gross and net you want all on the same scale. Only used on linear scales?

  • axd

  • kwargs – passed to make_mosaic_figure

Returns

pmf(x)[source]

Probability mass function, treating the aggregate as discrete. x must be in the index (?)

pollaczeck_khinchine(rho, cap=0, excess=0, stop_loss=0, kind='index', padding=1)[source]

Return the Pollaczek-Khinchine capital function relating surplus to the eventual probability of ruin. Assumes frequency is Poisson.

See Embrechts, Kluppelberg, Mikosch 1.2, page 28 Formula 1.11

TODO: Should return a named tuple.

Parameters
  • rho – rho = prem / loss - 1 is the margin-to-loss ratio

  • cap – cap = cap severity at cap, which replaces severity with X | X <= cap

  • excess – excess = replace severity with X | X > cap (i.e. no shifting)

  • stop_loss – stop_loss = apply stop loss reinsurance to cap, so X > stop_loss replaced with Pr(X > stop_loss) mass

  • kind

  • padding – for update (the frequency tends to be high, so more padding may be needed)

Returns

ruin vector as pd.Series and function to lookup (no interpolation if kind==index; else interp) capitals

ppf(p, kind='lower')

Return quantile function of density_df.p_total.

Definition 2.1 (Quantiles)
  • x(α) = q_α(X) = inf{x ∈ R : P[X ≤ x] ≥ α} is the lower α-quantile of X.

  • x(α) = q^α(X) = inf{x ∈ R : P[X ≤ x] > α} is the upper α-quantile of X.

kind=='middle' has been removed.

Parameters
  • p

  • kind – ‘lower’ or ‘upper’.

Returns

property pprogram

pretty print the program

property pprogram_html

pretty print the program to html

price(p, g, kind='var')[source]

Price using regulatory and pricing g functions, mirroring Portfolio.price. Unlike Portfolio, cannot calibrate. Applying specified Distortions only. If calibration is needed, embed Aggregate in a one-line Portfolio object.

Compute E_price(X ∧ E_reg(X)) (∧ = min), where E_price uses the pricing distortion and E_reg uses the regulatory distortion.

Regulatory capital distortion is applied on an unlimited basis; reg_g can be:

  • if input < 1, it is interpreted as a p value and used to determine VaR capital

  • if input > 1, it is a directly input capital number

  • a dictionary: a Distortion spec {name = dist name | var, shape = p value}, a distortion used directly

pricing_g is a spec {name = ph|wang, shape = ...}; lr or roe are not allowed (they require calibration).

if ly, must include ro in spec

Parameters
  • p – a distortion function spec or just a number; if >1 assets, if <1 a prob converted to quantile

  • kind – var lower upper tvar

  • g – pricing distortion function

Returns
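A hedged usage sketch (the DecL program and distortion parameters are illustrative assumptions):

from aggregate import build, Distortion

a = build('agg PriceEg 10 claims sev lognorm 10 cv 0.5 poisson')
# regulatory capital at the 0.999 VaR, priced with a proportional hazard distortion
ans = a.price(0.999, Distortion('ph', 0.7), kind='var')
print(ans)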

q(p, kind='lower')[source]

Return quantile function of density_df.p_total.

Definition 2.1 (Quantiles)
  • x(α) = q_α(X) = inf{x ∈ R : P[X ≤ x] ≥ α} is the lower α-quantile of X.

  • x(α) = q^α(X) = inf{x ∈ R : P[X ≤ x] > α} is the upper α-quantile of X.

kind=='middle' has been removed.

Parameters
  • p

  • kind – ‘lower’ or ‘upper’.

Returns
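A minimal check of the lower and upper quantile definitions (the DecL program is illustrative):

from aggregate import build

a = build('agg QuantEg 10 claims sev lognorm 10 cv 1 poisson')
p = 0.99
ql, qu = a.q(p, 'lower'), a.q(p, 'upper')
print(ql, qu, a.cdf(ql))   # expect ql <= qu and cdf(ql) >= p per the definition
assert ql <= qu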

q_sev(p)[source]

Compute quantile of severity distribution, returning element in the index. Very similar code to q, but only lower quantiles.

Parameters

p

Returns

recommend_bucket(log2=10, p=0.99999, verbose=False)[source]

Recommend a bucket size given 2**N buckets. Not rounded.

For thick tailed distributions need higher p, try p=1-1e-8.

If no second moment, throws a ValueError. You just can’t guess in that situation.

Parameters
  • log2 – log2 of number of buckets. log2=10 is default.

  • p – percentile to use to determine the needed range. Default is RECOMMEND_P. If > 1, converted to 1 - 10**-p.

  • verbose – print out recommended bucket sizes for 2**n for n in {log2, 16, 13, 10}

Returns

property reinsurance_audit_df

Create and return the _reins_audit_df data frame. Read only property.

Returns

reinsurance_description(kind='both', width=0)[source]

Text description of the reinsurance.

Parameters
  • kind – both, occ, or agg

  • width – width of text for textwrap.fill; omitted if width==0

property reinsurance_df

Version of density_df tailored to reinsurance. Several cases

  • occ program only: agg_density_.. is recomputed manually for all three outcomes

  • agg program only: sev_density_… not set for gcn

  • both programs: agg is gcn for the agg program applied to the requested occ output

See _apply_reins_work.

reinsurance_kinds()[source]

Text description of the kinds of reinsurance applied: None, Occurrence, Aggregate, both.

Returns

property reinsurance_occ_layer_df

How losses are layered by the occurrence reinsurance. Expected loss, CV layer loss, and expected counts to layers.

reinsurance_occ_plot(axs=None)[source]

Plots for occurrence reinsurance: occurrence log density and aggregate quantile plot.

property reinsurance_report_df

Create and return a dataframe with the reinsurance report. TODO: sort out the overlap with reinsurance_audit_df (occ and agg). What this property adds is the ceded/net of occurrence aggregates before application of the agg reinsurance. The pure occ and agg parts are in reinsurance_audit_df.

property report_df

Report created on the fly to audit creation of the object. There were some bad choices of columns in audit_df, but it is [maybe] embedded in other code. E.g., the use of _1 vs _m for the mean is inconsistent.

Returns

rescale(scale, kind='homog')[source]

Return a rescaled Aggregate object - used to compute derivatives.

All multiplications need to be safe because, with an array specification, there may be an array that is not a numpy array.

TODO: have the parser return numpy arrays, not lists!

Parameters
  • scale – amount of scale

  • kind – homog or inhomog

Returns

sample(n, replace=True)[source]

Draw a sample of n items from the aggregate distribution. Wrapper around pd.DataFrame.sample.

property sev

Make exact sf, cdf and pdfs and store in namedtuple for use as sev.cdf etc.

severity_error_analysis(sev_calc='round', discretization_calc='survival', normalize=True)[source]

Analysis of severity component errors, uses the current bs in self. Gives detailed, component by component, error analysis of severities. Includes discretization error (bs large relative to mean) and truncation error (tail integral large).

Total S shows the aggregate not severity. Generally about self.n * (1 - sum_p) (per Feller).

sf(x)[source]

Return survival function using linear interpolation.

Parameters

x – loss size

Returns

snap(x)[source]

Snap value x to the index of density_df, i.e., as a multiple of self.bs.

Parameters

x

Returns

property spec

Get the dictionary specification, but treat as a read only property

Returns

property spec_ex

All relevant info.

Returns

property statistics

Pandas series of theoretic frequency, severity, and aggregate 1st, 2nd, and 3rd moments. Mean, cv, and skewness.

Returns

tvar(p, kind='')[source]

Updated June 2023, 0.13.0

Compute the tail value at risk at threshold p

Definition 2.6 (Tail mean and Expected Shortfall). Assume E[X−] < ∞. Then x̄(α) = TM_α(X) = α^{−1} E[X 1_{X ≤ x(α)}] + x(α)(α − P[X ≤ x(α)]) is the tail mean of X at level α. Acerbi and Tasche (2002).

We are interested in the right-hand exceedance version [?? note > vs ≥]: α^{−1} E[X 1_{X > x(α)}] + x(α)(P[X ≤ x(α)] − α).

McNeil et al., pp. 66-70; this follows from the definition of ES as an integral of the quantile function.

q is the exact quantile (most of the time); q1 is the smallest index element (bucket multiple) greater than or equal to q.

The tvar integral is \(\int_p^1 q(s)\,ds = \int_q^\infty x f(x)\,dx = q + \int_q^\infty S(x)\,dx\); we use the last form. np.trapz approximates the integral, and the missing piece between q and q1 is approximated as a trapezoid too.

Parameters
  • p

  • kind

Returns
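A quick consistency sketch (the DecL program is illustrative): the tail value at risk should dominate the quantile at the same level.

from aggregate import build

a = build('agg TvarEg 10 claims sev lognorm 10 cv 1 poisson')
for p in [0.9, 0.99, 0.999]:
    print(p, a.q(p), a.tvar(p))
    assert a.tvar(p) >= a.q(p)    # TVaR is at least VaR at every level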

tvar_sev(p)[source]

TVaR of severity - now available for free!

added June 2023

update(log2=16, bs=0, recommend_p=0.99999, debug=False, **kwargs)[source]

Convenience function, delegates to update_work. Avoids having to pass xs. Also aliased as easy_update for backward compatibility.

Parameters
  • log2

  • bs

  • recommend_p – p value passed to recommend_bucket. If > 1 converted to 1 - 10**-p in rec bucket.

  • debug

  • kwargs – passed through to update

Returns

update_work(xs, padding=1, tilt_vector=None, approximation='exact', sev_calc='discrete', discretization_calc='survival', normalize=True, force_severity=False, debug=False)[source]

Compute a discrete approximation to the aggregate density.

See discretize for sev_calc, discretization_calc and normalize.

Quick simple test with log2=13 update took 5.69 ms and _eff took 2.11 ms. So quicker but not an issue unless you are doing many buckets or aggs.

Parameters
  • xs – range of x values used to discretize

  • padding – for FFT calculation

  • tilt_vector – tilt_vector = np.exp(tilt_amount * np.arange(N)), N=2**log2, and tilt_amount * N < 20 recommended

  • approximation – ‘exact’ = perform frequency / severity convolution using FFTs. ‘slognorm’ or ‘sgamma’ use a shifted lognormal or shifted gamma approximation.

  • sev_calc – discrete=round, forward, backward, or continuous and method becomes discrete otherwise

  • discretization_calc – survival, distribution or both; in addition the method then becomes survival

  • normalize – if True, normalize the severity so sum probs = 1. This is generally what you want; but when dealing with thick tailed distributions it can be helpful to turn it off.

  • force_severity – make severities even if using approximation, for plotting

  • debug – run reinsurance in debug mode if True.

Returns

property valid

Check if the model appears valid. An answer of True means the model is “not unreasonable”. It does not guarantee the model is valid. On the other hand, False means it is definitely suspect. (The interpretation is similar to the null hypothesis in a statistical test). Called and reported automatically by qd for Aggregate objects.

Checks the relative errors (from self.describe) for:

  • severity mean < eps

  • severity cv < 10 * eps

  • severity skew < 100 * eps (skewness is more difficult to estimate)

  • aggregate mean < eps and < 2 * severity mean relative error (larger values indicate possibility of aliasing and that bs is too small).

  • aggregate cv < 10 * eps

  • aggregate skew < 100 * eps

The default uses eps = 1e-4 relative error. This can be changed by setting the validation_eps variable.

The test is only applied to CV and skewness when they are > 0.

Run with logger level 20 (info) for more information on failures.

A Type 1 error (rejecting a valid model) is more likely than a Type 2 error (failing to reject an invalid one).

Returns

True (interpreted as not unreasonable) if all tests are passed, else False.
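A short sketch (the DecL program and coarse bucket count are illustrative) showing the validation flag and its explanation:

from aggregate import build

# deliberately coarse update so the empirical moments may not reconcile
a = build('agg ValidEg 10 claims sev lognorm 10 cv 3 poisson', log2=8)
print(a.valid)
print(a.explain_validation())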

var_dict(p, kind='lower', snap=False)[source]

Make a dictionary of values at risk for the line, mirroring Portfolio.var_dict. Here it just marshals calls to the appropriate var or tvar function.

No epd. Allows the price function to run consistently with Portfolio version.

Example Use:

for p, arg in zip([.996, .996, .996, .985, .01], ['var', 'lower', 'upper', 'tvar', 'epd']):
    print(port.var_dict(p, arg,  snap=True))
Parameters
  • p

  • kind – var (defaults to lower), upper, lower, tvar

  • snap – snap tvars to index

Returns