3.3. Distributions Module
3.3.1. Frequency Class
- class aggregate.distributions.Frequency(freq_name, freq_a, freq_b, freq_zm, freq_p0)[source]
Manages Frequency distributions: creates the moment function and the probability generating function (pgf).
freq_moms(n): returns E[N], E[N^2] and E[N^3] when E[N] = n.
freq_pgf(n, z): returns the pgf P(z) = E[z^N] evaluated at z when E[N] = n.
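As a quick orientation, the first three raw moments of a Poisson with mean n are n, n + n^2, and n + 3n^2 + n^3. An illustrative check against scipy (not part of the class API):

    import scipy.stats as ss

    n = 10
    fz = ss.poisson(n)
    # raw moments E[N], E[N^2], E[N^3] of a Poisson(n)
    expected = [n, n + n**2, n + 3 * n**2 + n**3]
    computed = [fz.moment(k) for k in (1, 2, 3)]
    assert all(abs(a - b) < 1e-6 for a, b in zip(expected, computed))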
Frequency distributions are either non-mixture types or mixture types.
Non-Mixture Frequency Types
fixed: no parameters
bernoulli: exp_en interpreted as a probability, must be < 1
binomial: Binomial(n/p, p) where p = freq_a and n = exp_en
poisson: Poisson(n)
geometric: geometric(1/(n + 1)), supported on 0, 1, 2, …
logarithmic: logarithmic(theta), supported on 1, 2, …; theta solved numerically
neymana: Po(n/freq_a) stopped sum of Po(freq_a); freq_a = "eggs per cluster"
negbin: freq_a is the variance multiplier, the ratio of variance to mean
pascal: (generalized) Pascal-Poisson distribution, a Poisson stopped sum of negative binomials; exp_en gives the overall claim count, freq_a is the CV of the frequency distribution, and freq_b is the number of claimants per claim (or claims per occurrence). Hence the Poisson component has mean exp_en / freq_b and the number of claims per occurrence has mean freq_b. This parameterization may not be ideal(!).
Mixture Frequency Types
These distributions are G-mixed Poisson, so N | G ~ Poisson(n G). They are labelled by the name of the mixing distribution or the common name for the resulting frequency distribution. See Panjer and Willmot or JKK.
In all cases freq_a is the CV of the mixing distribution, which equals the asymptotic CV of the frequency distribution and of any aggregate when the severity has a variance (a numerical sketch of this claim follows the list below).
gamma: negative binomial, freq_a = CV of the gamma distribution
delaporte: shifted gamma, freq_a = CV of the mixing distribution, freq_b = proportion of certain claims = shift; freq_b must be between 0 and 1
ig: inverse Gaussian, freq_a = CV of the mixing distribution
sig: shifted inverse Gaussian, freq_a = CV of the mixing distribution, freq_b = proportion of certain claims = shift; freq_b must be between 0 and 1
sichel: generalized inverse Gaussian mixing distribution, freq_a = CV of the mixing distribution and freq_b = lambda value; the beta and mu parameters are solved to match moments. Note lambda = -0.5 corresponds to inverse Gaussian and 0.5 to reciprocal inverse Gaussian; other special cases are available
sichel.gamma: generalized inverse Gaussian mixture where the parameters match the moments of a Delaporte distribution with given freq_a and freq_b
sichel.ig: generalized inverse Gaussian mixture where the parameters match the moments of a shifted inverse Gaussian distribution with given freq_a and freq_b; this parameterization has poor numerical stability and may fail
beta: beta mixing with freq_a = CV, where the beta is supported on the interval [0, freq_b]. This method should be used carefully: it has poor numerical stability and can produce bizarre aggregates when the alpha or beta parameters are < 1 (so there is a mode at 0 or freq_b)
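To see the asymptotic CV claim numerically: if N | G ~ Poisson(nG) with E[G] = 1 and CV(G) = c, then Var(N) = n + c^2 n^2, so CV(N) = sqrt(1/n + c^2), which tends to c as n grows. A simulation sketch (illustrative only, not the library's internals):

    import numpy as np

    rng = np.random.default_rng(42)
    c = 0.5                       # CV of the gamma mixing distribution (freq_a)
    for n in [10, 100, 10_000]:
        # gamma with mean 1 and cv c has shape 1/c^2 and scale c^2
        g = rng.gamma(1 / c**2, c**2, size=500_000)
        N = rng.poisson(n * g)
        print(n, N.std() / N.mean(), np.sqrt(1 / n + c**2))  # simulated vs exact CV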
Code proof for Neyman A:
    from aggregate import build, qd
    import pandas as pd

    mean = 10
    eggs_per_cluster = 4
    neya = build(f'agg Neya {mean} claims dsev[1] neymana {eggs_per_cluster}')
    qd(neya)
    po = build(f'agg Po4 {eggs_per_cluster} claims dsev[1] poisson')
    po_pmf = po.density_df.query('p_total > 1e-13').p_total
    byhand = build(f'agg ByHand {mean / eggs_per_cluster} claims '
                   f'dsev {list(po_pmf.index)} {po_pmf.values} poisson')
    qd(byhand)
    df = pd.concat((neya.density_df.p_total, byhand.density_df.p_total), axis=1)
    df.columns = ['neya', 'byhand']
    df['err'] = df.neya - df.byhand
    assert df.err.abs().max() < 1e-5
    df.head(40)
Code proof for Pascal:
    from aggregate import build, qd
    import pandas as pd

    mean = 10
    claims_per_occ = 1.24
    overall_cv = 1.255
    pascal = build(f'agg PascalEg {mean} claims dsev[1] '
                   f'pascal {overall_cv} {claims_per_occ}', log2=16)
    qd(pascal)
    c = (mean * overall_cv**2 - 1 - claims_per_occ) / claims_per_occ
    th = claims_per_occ * c
    a = 1 / c
    # from the form of the nb pgf identify r = a and beta = theta;
    # mean is r * beta, variance is r * beta * (1 + beta)
    nb = build(f'agg NB {claims_per_occ} claims dsev[1] negbin {th + 1}', log2=16)
    nb_pmf = nb.density_df.query('p_total > 1e-13').p_total
    qd(nb)
    byhand = build(f'agg ByHand {mean / claims_per_occ} claims '
                   f'dsev {list(nb_pmf.index)} {nb_pmf.values} poisson', log2=16)
    qd(byhand)
    df = pd.concat((pascal.density_df.p_total, byhand.density_df.p_total), axis=1)
    df.columns = ['pascal', 'byhand']
    df['err'] = df.pascal - df.byhand
    assert df.err.abs().max() < 1e-5
    df.head(40)
- Parameters
freq_name – name of the frequency distribution: poisson, geometric, etc.
freq_a – first frequency shape parameter, e.g., the CV of the mixing distribution for mixed types
freq_b – second frequency shape parameter, e.g., claims per occurrence (delaporte or sig) or lambda (sichel)
- __init__(freq_name, freq_a, freq_b, freq_zm, freq_p0)[source]
Creates the freq_pgf and moment functions:
moment function(n): returns E[N], E[N^2], E[N^3] when E[N] = n.
freq_pgf(n, z): the probability generating function P(z) = E[z^N] evaluated at z when E[N] = n.
- Parameters
freq_name – name of the frequency distribution: poisson, geometric, etc.
freq_a – first frequency shape parameter, e.g., the CV of the mixing distribution for mixed types
freq_b – second frequency shape parameter, e.g., claims per occurrence (delaporte or sig) or lambda (sichel)
freq_zm – True if the distribution is zero modified; default False
freq_p0 – modified value of p0, the probability of zero claims
3.3.2. Severity Class
- class aggregate.distributions.Severity(sev_name, exp_attachment=None, exp_limit=inf, sev_mean=0, sev_cv=0, sev_a=nan, sev_b=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, name='', note='')[source]
- __init__(sev_name, exp_attachment=None, exp_limit=inf, sev_mean=0, sev_cv=0, sev_a=nan, sev_b=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, name='', note='')[source]
A continuous random variable. Subclasses scipy.stats.rv_continuous, adding layer and attachment functionality. It overrides:

cdf
pdf
isf
ppf
sf
stats
See scipy.stats continuous rvs for more details about available distributions. The following distributions with two shape parameters are supported:
Burr (burr)
Generalized Pareto (genpareto)
Generalized gamma (gengamma)

It is easy to add others in the code below. With two shape parameters, the mean-cv input format is not available.
See the code in Problems and Solutions to extract distributions from scipy.stats by introspection.
- Parameters
sev_name – scipy.stats continuous distribution name | (c|d)histogram, continuous or discrete | fixed
exp_attachment – None if the layer attachment is missing, distinct from 0; if a = 0 then losses are conditional on X > a, if a = None then losses are conditional on X >= 0
exp_limit – occurrence limit
sev_mean – average (unlimited) severity
sev_cv – unlimited severity coefficient of variation
sev_a – first shape parameter
sev_b – second shape parameter (e.g., beta)
sev_loc – scipy.stats location parameter
sev_scale – scipy.stats scale parameter
sev_xs – breakpoints for the fixed or histogram classes
sev_ps – probabilities for the fixed or histogram classes
sev_wt – not used directly, but convenient to pass in and ignore because severities are implicitly created with sev_wt=1
sev_conditional – conditional or unconditional; for severities use conditional
name – name of the severity object
note – optional note.
- cv_to_shape(cv, hint=1)[source]
Create a frozen object of type dist_name with the given cv. The lognormal, gamma, inverse gamma, and inverse Gaussian distributions are solved analytically; other distributions are solved numerically, which may be unstable.
- Parameters
cv –
hint –
- Returns
- mean_to_scale(shape, mean, loc=0)[source]
Adjust the scale to achieve the desired mean. Return a frozen instance.
- Parameters
shape –
mean –
loc – location parameter (note: location is added to the mean…)
- Returns
- moms()[source]
Revised moments for the Severity class. Computes the moments of the layer variable, where a is the attachment and d the detachment (attachment plus limit):

\(X(a,d) = \min(d - a, (X - a)^+)\)

\(E[X(a,d)^n] = \int_a^d (x-a)^n f(x)\,dx + (d-a)^n S(d)\)

Substitute x = q(p), so that F(x) = p and f(x)dx = dp, for n = 1, 2, 3:

\(E[X(a,d)^n] = \int_{F(a)}^{F(d)} (q(p)-a)^n\,dp + (d-a)^n S(d)\)

The base computation is \(\int_{F(a)}^{F(d)} q(p)^n\,dp\). These are the exi below; they are then adjusted to create the moments needed.
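The quantile-space integral is easy to evaluate directly with scipy quadrature. A minimal sketch for a lognormal layer, where fz, a, and d are illustrative local names (a is the attachment, d the detachment):

    import scipy.stats as ss
    from scipy.integrate import quad

    fz = ss.lognorm(1.25)          # sigma = 1.25, scale = 1
    a, d = 1.0, 20.0               # attachment and detachment

    def layer_moment(n):
        # E[min(d-a, (X-a)+)^n] = int_{F(a)}^{F(d)} (q(p) - a)^n dp + (d-a)^n S(d)
        integral, _ = quad(lambda p: (fz.ppf(p) - a) ** n, fz.cdf(a), fz.cdf(d))
        return integral + (d - a) ** n * fz.sf(d)

    print([layer_moment(n) for n in (1, 2, 3)])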
Old moments tried to compute \(\int S(x)\,dx\), but that is over a large, non-compact domain and did not work so well. With 0.9.3, old_moms was removed. The old_moms code did this:

    ex1 = safe_integrate(lambda x: self.fz.sf(x), 1)
    ex2 = safe_integrate(lambda x: 2 * (x - self.attachment) * self.fz.sf(x), 2)
    ex3 = safe_integrate(lambda x: 3 * (x - self.attachment) ** 2 * self.fz.sf(x), 3)
Test examples
    import types
    import numpy as np
    import pandas as pd
    import scipy.stats as ss
    import aggregate as agg

    def test(mu, sigma, a, y):
        # assumes a revised, free-standing moms function is defined in the session
        global moms
        # analytic moments with no layer or attachment
        fz = ss.lognorm(sigma, scale=np.exp(mu))
        tv = np.array([np.exp(k * mu + k * k * sigma**2 / 2) for k in range(1, 4)])
        # old method
        s = agg.Severity('lognorm', sev_a=sigma, sev_scale=np.exp(mu),
                         exp_attachment=a, exp_limit=y)
        est = np.array(s.old_moms())
        # swap out the moment routine
        setattr(s, moms.__name__, types.MethodType(moms, s))
        ans = np.array(s.moms())
        # summarize and report
        sg = f'Example: mu={mu} sigma={sigma} a={a} y={y}'
        print(f'{sg}\n{"=" * len(sg)}')
        print(pd.DataFrame({'new_ans': ans, 'old_ans': est,
                            'err': ans / est - 1, 'no_la_analytic': tv}))

    test(8.7, .5, 0, np.inf)
    test(8.7, 2.5, 0, np.inf)
    test(8.7, 2.5, 10e6, 200e6)
Example: mu=8.7 sigma=0.5 a=0 y=inf

            new_ans       old_ans           err  no_la_analytic
    0  6.802191e+03  6.802191e+03  3.918843e-11    6.802191e+03
    1  5.941160e+07  5.941160e+07  3.161149e-09    5.941160e+07
    2  6.662961e+11  6.662961e+11  2.377354e-08    6.662961e+11

Example: mu=8.7 sigma=2.5 a=0 y=inf (here the old method failed)

            new_ans       old_ans           err  no_la_analytic
    0  1.366256e+05  1.366257e+05 -6.942541e-08    1.366257e+05
    1  9.663487e+12  1.124575e+11  8.493016e+01    9.669522e+12
    2  2.720128e+23  7.597127e+19  3.579469e+03    3.545017e+23

Example: mu=8.7 sigma=2.5 a=10000000.0 y=200000000.0

            new_ans       old_ans           err  no_la_analytic
    0  1.692484e+07  1.692484e+07  2.620126e-14    1.366257e+05
    1  1.180294e+15  1.180294e+15  5.242473e-13    9.669522e+12
    2  1.538310e+23  1.538310e+23  9.814372e-14    3.545017e+23
The numerical issues are very sensitive, so we go for a compromise between speed and accuracy. This is only an issue for very thick-tailed distributions with no upper limit, which is not a realistic situation. Here is a tester program for two common cases:
    import numpy as np
    from aggregate import Severity
    from aggregate.utilities import logger_level   # import locations assumed

    logger_level(30)   # quieter logging; see what is going on
    for sh, dist in zip([1, 2, 3, 4, 3.5, 2.5, 1.5, .5],
                        ['lognorm'] * 3 + ['pareto'] * 4):
        s = Severity(dist, sev_a=sh, sev_scale=1, exp_attachment=0)
        print(dist, sh, s.moms())
        if dist == 'lognorm':
            print('actual', [(n, np.exp(n * n * sh * sh / 2)) for n in range(1, 4)])
- Returns
vector of moments. np.nan signals an unreliable but finite value; np.inf is correct, the moment does not exist.
- plot(n=100, axd=None, figsize=(7.0, 4.9), layout='AB\nCD')[source]
Quick plot, updated for 0.9.3 with mosaic and no grid lines. The (F(x), x) plot is replaced with a log density plot.
- Parameters
n – number of points to plot.
axd – axis dictionary; if None, create a new figure. Must have keys 'A', 'B', 'C', 'D'.
figsize – (width, height) in inches.
layout – the subplot_mosaic layout of the figure. Default is 'AB\nCD'.
- Returns
3.3.3. Aggregate Class
- class aggregate.distributions.Aggregate(name, exp_el=0, exp_premium=0, exp_lr=0, exp_en=0, exp_attachment=None, exp_limit=inf, sev_name='', sev_a=nan, sev_b=0, sev_mean=0, sev_cv=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, sev_pick_attachments=None, sev_pick_losses=None, occ_reins=None, occ_kind='', freq_name='', freq_a=0, freq_b=0, freq_zm=False, freq_p0=nan, agg_reins=None, agg_kind='', note='')[source]
- __init__(name, exp_el=0, exp_premium=0, exp_lr=0, exp_en=0, exp_attachment=None, exp_limit=inf, sev_name='', sev_a=nan, sev_b=0, sev_mean=0, sev_cv=0, sev_loc=0, sev_scale=0, sev_xs=None, sev_ps=None, sev_wt=1, sev_lb=0, sev_ub=inf, sev_conditional=True, sev_pick_attachments=None, sev_pick_losses=None, occ_reins=None, occ_kind='', freq_name='', freq_a=0, freq_b=0, freq_zm=False, freq_p0=nan, agg_reins=None, agg_kind='', note='')[source]
The Aggregate distribution class manages the creation and calculation of aggregate distributions. It allows very flexible specification: the severity can express a limit profile, a mixed severity, or both, and mixed frequency types share a mixing distribution across all broadcast terms to ensure an appropriate inter-class correlation.
- Parameters
name – name of the aggregate
exp_el – expected loss or vector
exp_premium – premium volume or vector (requires loss ratio)
exp_lr – loss ratio or vector (requires premium)
exp_en – expected claim count per segment (self.n = total claim count)
exp_attachment – occurrence attachment; None indicates no limit clause, which is treated differently from an attachment of zero
exp_limit – occurrence limit
sev_name – severity name or sev.BUILTIN_SEV or meta.var agg or port or similar or vector or matrix
sev_a – scipy stats shape parameter
sev_b – scipy stats shape parameter
sev_mean – average (unlimited) severity
sev_cv – unlimited severity coefficient of variation
sev_loc – scipy stats location parameter
sev_scale – scipy stats scale parameter
sev_xs – xs and ps must be provided if sev_name is (c|d)histogram, xs are the bucket break points
sev_ps – ps are the probability densities within each bucket; if the buckets are of equal size, no adjustments are needed
sev_wt – weight for mixed distribution
sev_lb – lower bound for severity (length of sev_lb must equal length of sev_ub and weights)
sev_ub – upper bound for severity
sev_conditional – if True, severity is conditional, else unconditional.
sev_pick_attachments – if not None, a list of attachment points to define picks
sev_pick_losses – if not None, a list of losses by layer
occ_reins – layers: share po layer xs attach or XXXX
occ_kind – ceded to or net of
freq_name – name of frequency distribution
freq_a – cv of freq dist mixing distribution
freq_b – claims per occurrence (delaporte or sig), scale of beta or lambda (Sichel)
freq_zm – True/False zero modified flag
freq_p0 – if freq_zm, provides the modified value of p0; default is nan
agg_reins – layers
agg_kind – ceded to or net of
note – note, enclosed in {}
- _apply_reins_work(reins_list, base_density, debug=False)[source]
Actually do the work; called by apply_reins and reins_audit_df. Only needs self to get the limits, which it must guess without q (not computed at this stage). Does not need to know whether this is occ or agg reins, only that the correct base_density is supplied.
- Parameters
reins_list –
kind – occ or agg, for debug plotting
debug –
- Returns
ceder, netter,
- _reins_audit_df_work(kind='occ')[source]
Apply each reinsurance layer separately and aggregate loss and other stats.
- aggregate_error_analysis(log2, bs2_from=None, **kwargs)[source]
Analysis of aggregate error across a range of bucket sizes. If bs2_from is None, uses recommend_bucket plus/minus 3. Note: if the distribution does not have a second moment, you must enter bs2_from.
- Parameters
log2 –
bs2_from – lower bound on bs to use, in log2 terms; estimated using recommend_bucket if not input.
kwargs – passed to update.
- apply_agg_reins(debug=False, padding=1, tilt_vector=None)[source]
Apply the entire agg reins structure and save the output. For by-layer detail, create reins_audit_df. Makes sev_density_gross, sev_density_net, and sev_density_ceded, and updates sev_density to the requested view.
Not reflected in statistics df.
- Returns
- apply_distortion(dist)[source]
Apply a distortion to the aggregate density and append it as the exag column to density_df. TODO: check consistency with other implementations.
- Parameters
dist – the distortion to apply.
- Returns
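The general pattern is to integrate the distorted survival function. A minimal sketch, assuming exag(a) equals the risk-adjusted limited expected value \(\int_0^a g(S(x))\,dx\) (a simplification of the actual column logic):

    import numpy as np

    def exag_sketch(S, bs, g):
        # exag(a) = int_0^a g(S(x)) dx, as a cumulative sum on the bucket grid
        return np.cumsum(g(np.asarray(S))) * bs

    # e.g., a proportional hazard distortion g(s) = s**0.7
    print(exag_sketch([1.0, 0.6, 0.3, 0.1], bs=1.0, g=lambda s: s ** 0.7))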
- apply_occ_reins(debug=False)[source]
Apply the entire occ reins structure and save the output. For by-layer detail, create reins_audit_df. Makes sev_density_gross, sev_density_net, and sev_density_ceded, and updates sev_density to the requested view.
Not reflected in statistics df.
- Parameters
debug – More verbose.
- Returns
- approximate(approx_type='slognorm', output='scipy')[source]
Create an approximation to self using moment matching.
Compared to Portfolio.approximate, which returns a single-severity, fixed-frequency agg, this returns a scipy distribution by default.
Use case: exam questions with the normal approximation!
- Parameters
approx_type – norm, lognorm, slognorm (shifted lognormal), gamma, or sgamma (shifted gamma). If 'all', returns a dictionary of each approximation.
output –
scipy: frozen scipy.stats continuous rv object
sev_decl: DecL program for the severity (to substitute into an agg; no name)
sev_kwargs: dictionary of parameters to create a Severity
agg_decl: DecL program, agg T 1 claim sev_decl fixed
any other string: created Aggregate object
- Returns
as above.
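For reference, the moment matching behind the sgamma option is elementary: a shifted gamma has skewness 2/sqrt(alpha), which determines the shape; the scale and shift then follow from the variance and mean. A sketch (the library's own parameterization may differ in detail):

    import numpy as np
    import scipy.stats as ss

    def sgamma_match(m, cv, skew):
        # shifted gamma X = shift + Gamma(alpha, scale), with skew = 2 / sqrt(alpha)
        alpha = 4 / skew**2
        scale = np.sqrt((m * cv) ** 2 / alpha)
        shift = m - alpha * scale
        return ss.gamma(alpha, loc=shift, scale=scale)

    fz = sgamma_match(100, 0.25, 1.0)
    print(fz.mean(), fz.std() / fz.mean(), fz.stats('s'))   # 100, 0.25, 1.0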
- cdf(x, kind='previous')[source]
Return the cumulative probability distribution at x using kind interpolation.
2022-10 change: kind was introduced; the default was previously linear.
- Parameters
x – loss size
- Returns
- cramer_lundberg(rho, cap=0, excess=0, stop_loss=0, kind='index', padding=1)
Return the Pollaczek-Khinchine capital function relating surplus to the eventual probability of ruin. Assumes frequency is Poisson.
See Embrechts, Klüppelberg, Mikosch, Section 1.2, page 28, Formula 1.11.
TODO: Should return a named tuple.
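For reference, the Pollaczek-Khinchine formula expresses the survival probability as a compound geometric sum of convolutions of the integrated-tail distribution \(F_I(x) = \frac{1}{E[X]}\int_0^x S(u)\,du\) (stated here from memory of EKM; check the cited formula for exact notation):

\[ 1 - \psi(u) = \frac{\rho}{1+\rho} \sum_{n=0}^{\infty} (1+\rho)^{-n} F_I^{n*}(u), \]

where \(\psi(u)\) is the probability of eventual ruin starting from surplus u.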
- Parameters
rho – rho = premium / loss - 1 is the margin-to-loss ratio
cap – cap the severity at cap, which replaces the severity with X | X <= cap
excess – replace the severity with X | X > cap (i.e., no shifting)
stop_loss – apply stop-loss reinsurance at stop_loss, so X > stop_loss is replaced with a mass of Pr(X > stop_loss)
kind –
padding – for update (the frequency tends to be high, so more padding may be needed)
- Returns
ruin vector as pd.Series and a function to look up capitals (no interpolation if kind == 'index'; otherwise interpolated)
- property density_df
Create and return the density_df data frame. A read-only property, though if you write d = a.density_df you can of course edit d. Some columns are duplicated (p and p_total) to ensure consistency with Portfolio.
- Returns
DataFrame similar to Portfolio.density_df.
- property describe
Theoretic and empirical stats. Used in _repr_html_.
- discretize(sev_calc, discretization_calc, normalize)[source]
Discretize the severity distributions and weight them.

sev_calc describes how the severity is discretized; see `Discretizing the Severity Distribution`_. The options are discrete (=round), forward, backward, or moment.

sev_calc='continuous' (same as forward, kept for backwards compatibility) is used when you think of the resulting distribution as continuous across the buckets (which we generally don't). The buckets are not shifted, so \(Pr(X=b_i) = Pr(b_{i-1} < X \le b_i)\). Note that the point \(-bs/2\) is prepended to the bucket points.

We use the discretized distribution as though it is fully discrete and only takes values at the bucket points. Hence, we should use sev_calc='discrete'. The buckets are shifted left by half a bucket, so \(Pr(X=b_i) = Pr(b_i - bs/2 < X \le b_i + bs/2)\).

The other wrinkle is the right-hand end of the range. If we extend to np.inf, we ensure the probabilities sum to 1, but that method introduces a probability mass in the last bucket that is often undesirable (we expect to see a smooth continuous distribution, and we get a mass). The alternative is to use an endpoint one bucket beyond the last, which avoids this problem but can leave the probabilities short. We opt for the latter and normalize (rescale).

discretization_calc controls whether individual probabilities are computed using backward differences of the survival function, forward differences of the distribution function, or both. The former is most accurate in the right tail and the latter in the left tail. We are usually concerned with the right tail, so prefer survival. Using both takes the greater of the two estimates, giving the best of both worlds (underflow makes the distribution function zero in the right tail and the survival function zero in the left tail, so the maximum gives the best estimate), at the expense of computing time.

Sensible defaults: sev_calc='discrete', discretization_calc='survival', normalize=True. A minimal sketch of this default combination follows.
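Illustrative only, for a single severity (the method itself handles weights, limits, and many more cases):

    import numpy as np
    import scipy.stats as ss

    fz = ss.lognorm(0.8)
    bs, n = 0.1, 2**10
    xs = np.arange(n) * bs
    # rounding method: Pr(X = b_i) = Pr(b_i - bs/2 < X <= b_i + bs/2),
    # computed by backward differences of the survival function
    ps = -np.diff(fz.sf(np.hstack((-bs / 2, xs + bs / 2))))
    ps = ps / ps.sum()                    # normalize; endpoint one bucket beyond the last
    print(ps.sum(), ps @ xs, fz.mean())   # discretized mean vs true mean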
- Parameters
sev_calc – discrete=round, forward, backward, or continuous and method becomes discrete otherwise
discretization_calc – survival, distribution or both; in addition the method then becomes survival
normalize – if True, normalize the severity so sum probs = 1. This is generally what you want; but when dealing with thick tailed distributions it can be helpful to turn it off.
- Returns
- easy_update(log2=16, bs=0, recommend_p=0.99999, debug=False, **kwargs)
Convenience function, delegates to update_work. Avoids having to pass xs. Also aliased as easy_update for backward compatibility.
- Parameters
log2 –
bs –
recommend_p – p value passed to recommend_bucket. If > 1, converted to 1 - 10**-p in recommend_bucket.
debug –
kwargs – passed through to update
- Returns
- entropy_fit(n_moments, tol=1e-10, verbose=False)[source]
Find the maximum entropy fit to the aggregate based on matching n_moments moments. The constant (sum of probabilities constraint) is added, so for two moments there are n_const = 3 constraints.
Based on discussions with, and R code from, Jon Evans
Run

    ans = obj.entropy_fit(2)
    ans['ans_df'].plot()

to compare the fits.
- Parameters
n_moments – number of moments to match
tol –
verbose –
- Returns
- freq_pmf(log2)[source]
Return the frequency probability mass function (pmf) computed using 2**log2 buckets. Uses self.en to compute the expected frequency. The Frequency class does not know the expected claim count, so this is a method of Aggregate.
- limits(stat='range', kind='linear', zero_mass='include')[source]
Suggest sensible plotting limits for stat=range, density, etc.; same as Portfolio.
Should optionally return a locator for plots?
Called by plotting routines. Single point of failure!
Must work without the q function when it is not computed (e.g., in apply_reins_work for occ reins; then use report_ser instead).
- Parameters
stat – range or density (for y axis)
kind – linear or log (this is the y-axis, not log of range…that is rarely plotted)
zero_mass – include exclude, for densities
- Returns
- pdf(x)[source]
Probability density function, assuming a continuous approximation of the bucketed density.
- Parameters
x –
- Returns
- picks(attachments, layer_loss_picks, debug=False)[source]
Adjust the computed severity to hit pick targets in the layers defined by the attachments. Delegates the work to utilities.picks_work; see that function for details.
- plot(axd=None, xmax=0, **kwargs)[source]
Basic plot with severity and aggregate, linear and log plots and Lee plot.
- Parameters
xmax – Enter a “hint” for the xmax scale. E.g., if plotting gross and net you want all on the same scale. Only used on linear scales?
axd –
kwargs – passed to make_mosaic_figure
- Returns
- pmf(x)[source]
Probability mass function, treating the aggregate as discrete; x must be in the index (?).
- pollaczeck_khinchine(rho, cap=0, excess=0, stop_loss=0, kind='index', padding=1)[source]
Return the Pollaczek-Khinchine capital function relating surplus to the eventual probability of ruin. Assumes frequency is Poisson.
See Embrechts, Klüppelberg, Mikosch, Section 1.2, page 28, Formula 1.11 (reproduced under cramer_lundberg above).
TODO: Should return a named tuple.
- Parameters
rho – rho = premium / loss - 1 is the margin-to-loss ratio
cap – cap the severity at cap, which replaces the severity with X | X <= cap
excess – replace the severity with X | X > cap (i.e., no shifting)
stop_loss – apply stop-loss reinsurance at stop_loss, so X > stop_loss is replaced with a mass of Pr(X > stop_loss)
kind –
padding – for update (the frequency tends to be high, so more padding may be needed)
- Returns
ruin vector as pd.Series and a function to look up capitals (no interpolation if kind == 'index'; otherwise interpolated)
- ppf(p, kind='lower')
Return quantile function of density_df.p_total.
Definition 2.1 (Quantiles). \(x(\alpha) = q_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] \ge \alpha\}\) is the lower α-quantile of X; \(x^+(\alpha) = q^+_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] > \alpha\}\) is the upper α-quantile of X.
kind=='middle' has been removed.
- Parameters
p –
kind – ‘lower’ or ‘upper’.
- Returns
- property pprogram
Pretty print the program.
- property pprogram_html
Pretty print the program to HTML.
- price(p, g, kind='var')[source]
Price using regulatory and pricing g functions, mirroring Portfolio.price. Unlike Portfolio, it cannot calibrate; it applies specified Distortions only. If calibration is needed, embed the Aggregate in a one-line Portfolio object.
Compute \(E_{price}(X \wedge E_{reg}(X))\), where E_price uses the pricing distortion and E_reg uses the regulatory distortion.
The regulatory capital distortion is applied on an unlimited basis. reg_g can be:
if the input is < 1, it is interpreted as a p value and used to determine VaR capital
if the input is > 1, it is a directly input capital number
a dictionary: a Distortion spec {name = dist name | var, shape = p value}, a distortion used directly
pricing_g is a spec {name = ph|wang, shape = ...}; lr or roe are not allowed (they require calibration). If name = ly, the spec must include ro.
- Parameters
p – a distortion function spec or just a number; if > 1, interpreted as assets, if < 1, a probability converted to a quantile
kind – var lower upper tvar
g – pricing distortion function
- Returns
- q(p, kind='lower')[source]
Return quantile function of density_df.p_total.
Definition 2.1 (Quantiles). \(x(\alpha) = q_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] \ge \alpha\}\) is the lower α-quantile of X; \(x^+(\alpha) = q^+_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] > \alpha\}\) is the upper α-quantile of X.
kind=='middle' has been removed. (A sketch of the lower/upper logic follows this entry.)
- Parameters
p –
kind – ‘lower’ or ‘upper’.
- Returns
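A minimal sketch of the lower/upper quantile logic on a discrete distribution (illustrative, not the library's implementation):

    import numpy as np

    def q_sketch(xs, ps, p, kind='lower'):
        # lower: smallest x with F(x) >= p; upper: smallest x with F(x) > p
        F = np.cumsum(ps)
        side = 'left' if kind == 'lower' else 'right'
        return xs[np.searchsorted(F, p, side=side)]

    xs = np.array([0., 1., 2., 3.])
    ps = np.array([0.1, 0.4, 0.4, 0.1])
    print(q_sketch(xs, ps, 0.5, 'lower'), q_sketch(xs, ps, 0.5, 'upper'))  # 1.0 2.0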
- q_sev(p)[source]
Compute the quantile of the severity distribution, returning an element in the index. Very similar code to q, but only lower quantiles.
- Parameters
p –
- Returns
- recommend_bucket(log2=10, p=0.99999, verbose=False)[source]
Recommend a bucket size given 2**log2 buckets. Not rounded.
For thick-tailed distributions, use a higher p; try p = 1 - 1e-8.
If there is no second moment, throws a ValueError: you just can't guess in that situation.
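The underlying idea, as a hedged sketch: choose bs so that 2**log2 buckets span the aggregate's p-quantile, estimated here from a lognormal matched to the mean and CV (a hypothetical simplification; the actual routine differs in detail):

    import numpy as np
    import scipy.stats as ss

    def recommend_bs_sketch(mean, cv, log2=10, p=0.99999):
        # lognormal matched to (mean, cv); its p-quantile sets the required range
        sigma = np.sqrt(np.log(1 + cv**2))
        fz = ss.lognorm(sigma, scale=mean / np.exp(sigma**2 / 2))
        return fz.isf(1 - p) / 2**log2

    print(recommend_bs_sketch(100, 2.0))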
- Parameters
log2 – log2 of number of buckets. log2=10 is default.
p – percentile used to determine the needed range. Default is RECOMMEND_P. If > 1, converted to 1 - 10**-p.
verbose – print out recommended bucket sizes for 2**n for n in {log2, 16, 13, 10}
- Returns
- property reinsurance_audit_df
Create and return the _reins_audit_df data frame. Read-only property.
- Returns
- reinsurance_description(kind='both', width=0)[source]
Text description of the reinsurance.
- Parameters
kind – both, occ, or agg
width – width of text for textwrap.fill; omitted if width==0
- property reinsurance_df
Version of density_df tailored to reinsurance. Several cases:
occ program only: agg_density_... is recomputed manually for all three outcomes
agg program only: sev_density_... is not set for gcn
both programs: agg is gcn for the agg program applied to the requested occ output
See _apply_reins_work.
- reinsurance_kinds()[source]
Text description of the kinds of reinsurance applied: None, Occurrence, Aggregate, or both.
- Returns
- property reinsurance_occ_layer_df
How losses are layered by the occurrence reinsurance: expected loss, CV of layer loss, and expected counts to layers.
- reinsurance_occ_plot(axs=None)[source]
Plots for occurrence reinsurance: occurrence log density and aggregate quantile plot.
- property reinsurance_report_df
Create and return a dataframe with the reinsurance report. TODO: sort out the overlap with reinsurance_audit_df (occ and agg). What this function adds is the ceded/net of occ aggregates before application of the agg reinsurance; the pure occ and agg parts are in reinsurance_audit_df.
- property report_df
Report created on the fly to audit the creation of the object. There were some bad choices of columns in audit_df, but they may be embedded in other code; e.g., the use of _1 vs _m for the mean is inconsistent.
- Returns
- rescale(scale, kind='homog')[source]
Return a rescaled Aggregate object; used to compute derivatives.
All multiplications need to be safe because, owing to the array specification, there may be an array that is not a numpy array.
TODO: have the parser return numpy arrays, not lists!
- Parameters
scale – amount of scale
kind – homog or inhomog
- Returns
- sample(n, replace=True)[source]
Draw a sample of n items from the aggregate distribution. Wrapper around pd.DataFrame.sample.
- property sev
Make exact sf, cdf and pdf functions and store them in a namedtuple for use as sev.cdf, etc.
- severity_error_analysis(sev_calc='round', discretization_calc='survival', normalize=True)[source]
Analysis of severity component errors; uses the current bs in self. Gives a detailed, component-by-component error analysis of the severities. Includes discretization error (bs large relative to the mean) and truncation error (tail integral large).
Total S shows the aggregate, not the severity. Generally about self.n * (1 - sum_p) (per Feller).
- sf(x)[source]
Return survival function using linear interpolation.
- Parameters
x – loss size
- Returns
- snap(x)[source]
Snap value x to the index of density_df, i.e., as a multiple of self.bs.
- Parameters
x –
- Returns
- property spec
Get the dictionary specification, but treat it as a read-only property.
- Returns
- property spec_ex
All relevant info.
- Returns
- property statistics
Pandas series of theoretic frequency, severity, and aggregate first, second, and third moments, and mean, CV, and skewness.
- Returns
- tvar(p, kind='')[source]
Updated June 2023, 0.13.0.
Compute the tail value at risk at threshold p.
Definition 2.6 (Tail mean and Expected Shortfall). Assume \(E[X^-] < \infty\). Then \(\bar x(\alpha) = TM_\alpha(X) = \alpha^{-1}\left(E[X 1_{\{X \le x(\alpha)\}}] + x(\alpha)(\alpha - P[X \le x(\alpha)])\right)\) is the α-tail mean of X at level α. Acerbi and Tasche (2002).
We are interested in the right-hand exceedance version [note > vs ≥]: \((1-\alpha)^{-1}\left(E[X 1_{\{X > x(\alpha)\}}] + x(\alpha)(P[X \le x(\alpha)] - \alpha)\right)\).
See McNeil et al., pp. 66-70; this follows from the definition of ES as an integral of the quantile function.
q is the exact quantile (most of the time); q1 is the smallest index element (bucket multiple) greater than or equal to q.
The tvar integral is \(\int_p^1 q(s)\,ds = \int_q^\infty x f(x)\,dx = q(1-p) + \int_q^\infty S(x)\,dx\), so \(TVaR_p = q + (1-p)^{-1}\int_q^\infty S(x)\,dx\); we use the last approach. np.trapz approximates the integral, and the missing piece between q and q1 is approximated as a trapezoid too.
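A minimal sketch of the integral approach just described, on a discrete bucket grid (it omits the q-to-q1 trapezoid correction the method applies):

    import numpy as np

    def tvar_sketch(xs, ps, p):
        # TVaR_p = q(p) + (1/(1-p)) * int_{q(p)}^inf S(x) dx, integral via np.trapz
        F = np.cumsum(ps)
        S = 1 - F
        q = xs[np.searchsorted(F, p)]       # lower quantile
        mask = xs >= q
        return q + np.trapz(S[mask], xs[mask]) / (1 - p)

    # check against Exponential(1), where TVaR_p = -log(1 - p) + 1
    xs = np.arange(4000) * 0.01
    ps = np.diff(1 - np.exp(-np.hstack((xs, xs[-1] + 0.01))))
    print(tvar_sketch(xs, ps, 0.99), -np.log(0.01) + 1)     # both approx 5.6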
- Parameters
p –
kind –
- Returns
- update(log2=16, bs=0, recommend_p=0.99999, debug=False, **kwargs)[source]
Convenience function, delegates to update_work. Avoids having to pass xs. Also aliased as easy_update for backward compatibility.
- Parameters
log2 –
bs –
recommend_p – p value passed to recommend_bucket. If > 1, converted to 1 - 10**-p in recommend_bucket.
debug –
kwargs – passed through to update
- Returns
- update_work(xs, padding=1, tilt_vector=None, approximation='exact', sev_calc='discrete', discretization_calc='survival', normalize=True, force_severity=False, debug=False)[source]
Compute a discrete approximation to the aggregate density.
See discretize for sev_calc, discretization_calc, and normalize.
A quick test with log2=13: update took 5.69 ms and _eff took 2.11 ms. So quicker, but not an issue unless you are doing many buckets or aggs.
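The approximation='exact' path is standard FFT convolution: transform the discretized severity, apply the frequency pgf to the transform, and invert. A minimal sketch for Poisson frequency (the implementation handles general frequencies, tilting, and more):

    import numpy as np

    def poisson_agg_fft(sev_ps, en, log2=16, padding=1):
        # pad to 2**(log2 + padding) buckets to reduce FFT wrap-around (aliasing)
        n = 1 << (log2 + padding)
        z = np.fft.rfft(sev_ps, n)
        agg = np.fft.irfft(np.exp(en * (z - 1)), n)   # Poisson pgf: exp(en (z - 1))
        return agg[: 1 << log2]

    sev = np.zeros(16)
    sev[1] = 1.0                                      # severity fixed at one bucket
    print(poisson_agg_fft(sev, en=2, log2=4)[:4])     # approx Poisson(2) pmf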
- Parameters
xs – range of x values used to discretize
padding – for FFT calculation
tilt_vector – tilt_vector = np.exp(tilt_amount * np.arange(N)), N=2**log2, and tilt_amount * N < 20 recommended
approximation – ‘exact’ = perform frequency / severity convolution using FFTs. ‘slognorm’ or ‘sgamma’ use a shifted lognormal or shifted gamma approximation.
sev_calc – discrete=round, forward, backward, or continuous and method becomes discrete otherwise
discretization_calc – survival, distribution or both; in addition the method then becomes survival
normalize – if True, normalize the severity so sum probs = 1. This is generally what you want; but when dealing with thick tailed distributions it can be helpful to turn it off.
force_severity – make severities even if using approximation, for plotting
debug – run reinsurance in debug mode if True.
- Returns
- property valid
Check whether the model appears valid. An answer of True means the model is "not unreasonable"; it does not guarantee the model is valid. On the other hand, False means it is definitely suspect. (The interpretation is similar to the null hypothesis in a statistical test.) Called and reported automatically by qd for Aggregate objects.
Checks the relative errors (from self.describe) for:
severity mean < eps
severity cv < 10 * eps
severity skew < 100 * eps (skewness is more difficult to estimate)
aggregate mean < eps and < 2 * severity mean relative error (larger values indicate the possibility of aliasing and that bs is too small)
aggregate cv < 10 * eps
aggregate skew < 100 * eps
The default uses eps = 1e-4 relative error. This can be changed by setting the validation_eps variable.
The CV and skewness tests are only applied when those statistics are > 0.
Run with logger level 20 (info) for more information on failures.
A Type 1 error (rejecting a valid model) is more likely than a Type 2 error (failing to reject an invalid one).
- Returns
True (interpreted as not unreasonable) if all tests are passed, else False.
- var_dict(p, kind='lower', snap=False)[source]
Make a dictionary of values at risk for the line, mirroring Portfolio.var_dict. It just marshals calls to the appropriate var or tvar function.
No epd. Allows the price function to run consistently with the Portfolio version.
Example use:

    for p, arg in zip([.996, .996, .996, .985, .01],
                      ['var', 'lower', 'upper', 'tvar', 'epd']):
        print(port.var_dict(p, arg, snap=True))
- Parameters
p –
kind – var (defaults to lower), upper, lower, tvar
snap – snap tvars to index
- Returns