3.5. Utilities

3.5.1. Moment Aggregator Class

class aggregate.utilities.MomentAggregator(freq_moms=None)[source]

Purely accumulates moments. Used by Portfolio. Not frequency aware. Makes report_ser df and statistics_df.

Internal variables: agg, sev, freq, tot = running total; 1, 2, 3 = noncentral moments, E(X^k)

Parameters

freq_moms – function of one variable returning first three noncentral moments of the underlying frequency distribution

__init__(freq_moms=None)[source]
add_f1s(f1, s1, s2, s3)[source]

accumulate new moments defined by f1 and s1, s2, s3; fills in f2, f3 based on the stored frequency distribution

used by Aggregate

compute agg for the latest values

Parameters
  • f1

  • s1

  • s2

  • s3

Returns

add_fs(f1, f2, f3, s1, s2, s3)[source]

accumulate new moments defined by f and s

used by Portfolio

compute agg for the latest values

Parameters
  • f1

  • f2

  • f3

  • s1

  • s2

  • s3

Returns

add_fs2(f1, vf, s1, vs)[source]

accumulate based on first two moments entered as mean and variance - this is how questions are generally written.

static agg_from_fs(f1, f2, f3, s1, s2, s3)[source]

aggregate moments from freq and sev component moments; see the sketch below

Parameters
  • f1

  • f2

  • f3

  • s1

  • s2

  • s3

Returns
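
The compound-distribution identities behind this calculation can be written out directly. The helper below is an illustrative sketch of those standard formulas (hypothetical name, not the library source):

def agg_noncentral_moments(f1, f2, f3, s1, s2, s3):
    """Noncentral moments of A = X_1 + ... + X_N from freq (f) and sev (s) noncentral moments."""
    # factorial moments of N
    fm2 = f2 - f1                  # E[N(N-1)]
    fm3 = f3 - 3 * f2 + 2 * f1     # E[N(N-1)(N-2)]
    a1 = f1 * s1
    a2 = f1 * s2 + fm2 * s1 ** 2
    a3 = f1 * s3 + 3 * fm2 * s1 * s2 + fm3 * s1 ** 3
    return a1, a2, a3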

static agg_from_fs2(f1, vf, s1, vs)[source]

aggregate moments from freq and sev E(X) and Var(X)

Parameters
  • f1

  • vf

  • s1

  • vs

Returns

static column_names()[source]

list of the moment and statistics_df names for freq x sev = agg

Returns

static cumulate_moments(m1, m2, m3, n1, n2, n3)[source]

Moments of the sum of independent variables

Parameters
  • m1 – 1st moment, E(X)

  • m2 – 2nd moment, E(X^2)

  • m3 – 3rd moment, E(X^3)

  • n1

  • n2

  • n3

Returns
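
For reference, a minimal sketch of the underlying identity (illustrative only; the helper name is hypothetical): the noncentral moments of X + Y for independent X and Y follow from the binomial expansion.

def sum_noncentral_moments(m1, m2, m3, n1, n2, n3):
    # E[(X+Y)^k] for independent X (moments m) and Y (moments n)
    t1 = m1 + n1
    t2 = m2 + 2 * m1 * n1 + n2
    t3 = m3 + 3 * m2 * n1 + 3 * m1 * n2 + n3
    return t1, t2, t3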

get_fsa_stats(total, remix=False)[source]

get the current freq x sev = agg statistics_df and moments. total = True: use the total, else the current values. remix = True (total only): re-compute the freq statistics_df based on the total frequency.

Parameters
  • total – binary

  • remix – combine all sevs and recompute the freq moments from total freq

Returns

moments(mom_type, total=True)[source]

vector of the moments; convenience function

Parameters
  • mom_type

  • total

Returns

moments_to_mcvsk(mom_type, total=True)[source]

convert noncentral moments into mean, cv and skewness; mom_type = agg | freq | sev | mix; delegates the work

Parameters
  • mom_type

  • total

Returns

static static_moments_to_mcvsk(ex1, ex2, ex3)[source]

returns mean, cv and skewness from non-central moments

Parameters
  • ex1

  • ex2

  • ex3

Returns
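
The conversion uses the standard central-moment identities. A minimal sketch (illustrative, not the library source):

import numpy as np

def mcvsk(ex1, ex2, ex3):
    # mean, cv, skew from noncentral moments E[X], E[X^2], E[X^3]
    var = ex2 - ex1 ** 2
    sd = np.sqrt(var)
    cv = sd / ex1
    skew = (ex3 - 3 * ex1 * ex2 + 2 * ex1 ** 3) / sd ** 3
    return ex1, cv, skew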

stats_series(name, limit, pvalue, remix)[source]

combine elements into a reporting series; handles order, index names etc. in one place

Parameters
  • name – series name

  • limit

  • pvalue

  • remix – when called from Aggregate use remix=True to collect mix terms; from Portfolio use remix=False

Returns

3.5.2. Moment Wrangler Class

class aggregate.utilities.MomentWrangler[source]

Conversion between central, noncentral and factorial moments

Input any one and ask for any translation.

Stores moments as noncentral internally

__init__()[source]
_make_factorial()[source]

add factorial from central
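
The three moment systems are linked by simple identities. A hedged sketch of the noncentral-to-central and noncentral-to-factorial translations (hypothetical helper, not the class internals):

def noncentral_to_central_and_factorial(m1, m2, m3):
    # central moments
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    # factorial moments E[X], E[X(X-1)], E[X(X-1)(X-2)]
    fm1 = m1
    fm2 = m2 - m1
    fm3 = m3 - 3 * m2 + 2 * m1
    return (m1, c2, c3), (fm1, fm2, fm3)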

3.5.3. Axis Manager Class

class aggregate.utilities.AxisManager(n, figsize=None, height=2, aspect=1, nr=5)[source]

Manages creation of a grid of axes for plotting. Allows pandas plot and matplotlib to plot to same set of axes.

Always created and managed through axiter_factory function

Parameters
  • n – number of plots in grid

  • figsize

  • height – height of individual plot

  • aspect – aspect ratio of individual plot

  • nr – number of plots per row

__init__(n, figsize=None, height=2, aspect=1, nr=5)[source]
dimensions()[source]

return dimensions (width and height) of current layout

Returns

static good_grid(n, c=4)[source]

Good layout for n plots.

grid(size=0)[source]

return a block of axes suitable for pandas; if size=0 return all the axes

Parameters

size

Returns

grid_size(n, subgrid=False)[source]

appropriate grid size given class parameters

Parameters
  • n

  • subgrid – call is for a subgrid, no special treatment for 6 and 8

Returns

static make_figure(n, aspect=1.5, **kwargs)[source]

make the figure and iterator

static print_fig(n, aspect=1.5)[source]

printout code to insert (TODO copy to clipboard!)

static size_figure(r, c, aspect=1.5)[source]

reasonable figure size for n plots

tidy()[source]

delete unused axes to tidy up a plot

Returns

static tidy_up(f, ax)[source]

delete unused frames out of a figure

3.5.4. Utilities Module

class aggregate.utilities.Answer(**kwargs)[source]
__init__(**kwargs)[source]

Generic answer wrapping class with plotting

Parameters

kwargs – key=value to wrap

_repr_html_()

List elements

list()[source]

List elements

static nice(x)[source]

return a nice rep of x

summary()[source]

just print out the dataframes, horizontal or vertical as appropriate, with reasonable styling

class aggregate.utilities.GCN(gross, net, ceded)
_asdict()

Return a new dict which maps field names to their values.

classmethod _make(iterable)

Make a new GCN object from a sequence or iterable

_replace(**kwds)

Return a new GCN object replacing specified fields with new values

ceded

Alias for field number 2

gross

Alias for field number 0

net

Alias for field number 1

class aggregate.utilities.GreatFormatter(sci=True, power_range=(-3, 3), offset=True, mathText=False)[source]
__init__(sci=True, power_range=(-3, 3), offset=True, mathText=False)[source]
aggregate.utilities.approximate_work(m, cv, skew, name, agg_str, note, approx_type, output)[source]

Does the work for Portfolio.approximate and Aggregate.approximate. See their documentation.

Parameters

output – determines the return type:
  • scipy - frozen scipy.stats continuous rv object

  • sev_decl - DecL program for severity (to substitute into an agg; no name)

  • sev_kwargs - dictionary of parameters to create a Severity

  • agg_decl - DecL program agg T 1 claim sev_decl fixed

  • any other string - created Aggregate object

aggregate.utilities.axiter_factory(axiter, n, figsize=None, height=2, aspect=1, nr=5)[source]

axiter = check_axiter(axiter, …) to allow chaining TODO can this be done in the class somehow?

Parameters
  • axiter

  • n

  • figsize

  • height

  • aspect

  • nr

Returns

aggregate.utilities.block_iman_conover(unit_losses, intra_unit_corrs, inter_unit_corr, as_frame=False)[source]

Apply Iman Conover to the unit loss blocks in unit_losses with correlation matrices in intra_unit_corrs.

Then determine the ordering for the unit totals with correlation inter_unit_corr.

Re-order each unit, row by row, so that the totals have the desired correlation structure, while leaving the intra-unit correlation unchanged.

unit_losses = [np.arrays or pd.Series] of losses by subunit within units, without totals

len(unit_losses) == len(intra_unit_corrs)

For simplicity, all use a normal copula; others can be added later if required.

No totals input or output anywhere.

If as_frame is True, a dataframe version is returned for auditing.

Here is some tester code, using great.test_df to make random unit losses. Vary num_units and num_sims as required.

def bic_tester(num_units=3, num_sims=10000):
    from aggregate import random_corr_matrix
    # from great import test_df

    # create samples
    R = range(num_units)
    unit_losses = [test_df(num_sims, 3 + i) for i in R]
    totals = [u.sum(1) for u in unit_losses]

    # manual dataframe to check against
    manual = pd.concat(unit_losses + totals, keys=[f'Unit_{i}' for i in R] + ['Total' for i in R], axis=1)

    # for input to method
    unit_losses = [i.to_numpy() for i in unit_losses]
    totals = [i.to_numpy() for i in totals]

    # make corrs
    intra_unit_corrs = [random_corr_matrix(i.shape[1], p=.5, positive=True) for i in unit_losses]
    inter_unit_corr = random_corr_matrix(len(totals), p=1, positive=True)

    # apply method
    bic = block_iman_conover(unit_losses, intra_unit_corrs, inter_unit_corr, True)

    # extract frame answer, put col names back
    bic.frame.columns = manual.columns
    dm = bic.frame

    # achieved corr
    for i, target in zip(dm.columns.levels[0], intra_unit_corrs + [inter_unit_corr]):
        print(i)
        print((dm[i].corr() - target).abs().max().max())
        # print(dm[i].corr() - target)

    # total corr across subunits
    display(dm.drop(columns=['Total']).corr())

    return manual, bic, intra_unit_corrs, inter_unit_corr

manual, bic, intra, inter = bic_tester(3, 10000)
aggregate.utilities.easy_formatter(ax, which, kind, places=None, power_range=(-3, 3), sep='', unit='', sci=True, mathText=False, offset=True)[source]

set which axis (x, y, b, both) to kind = sci, eng, nice; nice = engineering but uses e-3, e-6 etc.; see docs for ScalarFormatter and EngFormatter

aggregate.utilities.estimate_agg_percentile(m, cv, skew, p=0.999)[source]

Come up with an estimate of the tail of the distribution based on the three-parameter fits, lognormal and gamma.

Updated Nov 2022 with a way to estimate p based on lognormal results: how far in the tail you need to go to get an accurate estimate of the mean. See 2_x_approximation_error in the help.

Retain p param for backwards compatibility.

Parameters
  • m

  • cv

  • skew

  • p – if > 1 converted to 1 - 10**-n

Returns

aggregate.utilities.explain_validation(rv)[source]

Explain the validation result rv. Don’t over-report: if you fail CV you don’t need to be told you fail Skew too.

aggregate.utilities.frequency_examples(n, ν, f, κ, sichel_case, log2, xmax=500)[source]

Illustrate different frequency distributions and frequency moment calculations.

sichel_case = gamma | ig | ‘’

Sample call:

df, ans = frequency_examples(n=100, ν=0.45, f=0.5, κ=1.25,
                             sichel_case='', log2=16, xmax=2500)
Parameters
  • n – E(N) = expected claim count

  • ν – CV(mixing) = asymptotic CV of any compound aggregate whose severity has a second moment

  • f – proportion of certain claims, 0 <= f < 1; higher f corresponds to greater skewness

  • κ – (kappa) claims per occurrence

  • sichel_case – gamma, ig or ‘’

  • xmax

aggregate.utilities.friendly(df)[source]

Attempt to format df “nicely”, in a user-friendly manner. Not designed for big dataframes!

Parameters

df

Returns

aggregate.utilities.ft(z, padding, tilt)[source]

fft with padding and tilt. padding = n makes the vector 2^n times as long: n=1 doubles (default), n=2 quadruples. tilt is passed in as the tilting vector or None; easier for the caller to have a single instance

Parameters
  • z

  • padding – = 1 doubles

  • tilt – vector of tilt values

Returns
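
As an illustration of what padding and tilting mean, here is a hedged sketch assuming a real FFT is used (not the library implementation):

import numpy as np

def ft_sketch(z, padding=1, tilt=None):
    # exponential tilting damps wrap-around (aliasing) error
    if tilt is not None:
        z = z * tilt
    # padding = n makes the vector 2^n times as long by appending zeros
    z = np.hstack((z, np.zeros((2 ** padding - 1) * len(z))))
    return np.fft.rfft(z)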

aggregate.utilities.gamma_fit(m, cv)[source]
aggregate.utilities.get_fmts(df)[source]

reasonable formats for a styler

Parameters

df

Returns

aggregate.utilities.html_title(txt, n=1, title_case=True)[source]
Parameters
  • txt

  • n

  • title_case

Returns

aggregate.utilities.ic_noise(n, d)[source]

Implements steps 1-6 of the Iman-Conover method (see iman_conover). This is the bottleneck function, therefore cache it. It handles the true-up of the random sample to ensure it is exactly independent.

Parameters
  • n – rows

  • d – columns

Returns

aggregate.utilities.ic_rank(N)[source]

rankdata function: assign ranks to data, dealing with ties appropriately; works by column; N is a numpy array

aggregate.utilities.ic_reorder(ranks, samples)[source]

put samples into the order determined by the ranks array, which is calibrated to the reference distribution; makes space for the answer

aggregate.utilities.ic_t_noise(n, d, dof)[source]

as above using multivariate t distribution noise

aggregate.utilities.ift(z, padding, tilt)[source]

ift that strips out padding and adjusts for tilt

Parameters
  • z

  • padding

  • tilt

Returns

aggregate.utilities.iman_conover(marginals, desired_correlation, dof=0, add_total=True)[source]

Perform Iman Conover shuffling on the input marginals to achieve desired_correlation. desired_correlation must be positive definite and of the correct size. The result has the same rank correlation as a reference sample with the desired linear correlation. Thus, the process relies on linear and rank correlation (for the reference and the input sample) being close.

if dof==0 use normal scores; else use multivariate t with dof degrees of freedom

Sample code:

import numpy as np
import pandas as pd
import scipy.stats as ss

n = 100
df = pd.DataFrame({f'line_{i}': ss.lognorm(.1 + .2*np.random.rand(),
                scale=10000).rvs(n) for i in range(3)})
desired = np.array([[1, -.3, 0], [-.3, 1, .8], [0, .8, 1]])
print(desired)
# check it is a corr matrix
np.linalg.cholesky(desired)

df2 = iman_conover(df, desired)
df2.corr()
df_scatter(df2)

Iman Conover Method

Make rank order the same as a reference sample with desired correlation structure.

Reference sample usually chosen as multivariate normal because it is easy and flexible, but you can use any reference, e.g. copula based.

The old @Risk software used Iman Conover.

Input: matrix \(\mathbf{X}\) of marginals and desired correlation matrix \(\mathbf{S}\)

1. Make one column of scores \(a_i=\Phi^{-1}(i/(n+1))\) for \(i=1,\dots,n\) and rescale to have standard deviation one.

2. Copy the scores \(r\) times to make the score matrix \(\mathbf{M}\).

3. Randomly permute the entries in each column of \(\mathbf{M}\).

4. Compute the correlation matrix \(n^{-1}\mathbf{M}^t\mathbf{M}\) of the sample scores \(\mathbf{M}\).

5. Compute the Choleski decomposition \(n^{-1}\mathbf{M}^t\mathbf{M}=\mathbf{E}\mathbf{E}^t\) of the score correlation matrix.

6. Compute \(\mathbf{M}' = \mathbf{M}(\mathbf{E}^t)^{-1}\), which is exactly uncorrelated.

7. Compute the Choleski decomposition \(\mathbf{S}=\mathbf{C}\mathbf{C}^t\) of the desired correlation matrix \(\mathbf{S}\).

8. Compute \(\mathbf{T}=\mathbf{M}'\mathbf{C}^t\). The matrix \(\mathbf{T}\) has exactly the desired correlation structure.

9. Let \(\mathbf{Y}\) be the input matrix \(\mathbf{X}\) with each column reordered to have exactly the same rank ordering as the corresponding column of \(\mathbf{T}\).

Relies on the fact that rank (Spearman) and linear (Pearson) correlation are approximately the same.

aggregate.utilities.integral_by_doubling(func, x0, err=1e-08)[source]

Compute \(\int_{x_0}^\infty f\) as the sum

\[\int_{x_0}^\infty f = \sum_{n \ge 0} \int_{2^nx_0}^{2^{n+1}x_0} f\]

Caller should check the integral actually converges.

Parameters
  • func – function to be integrated.

  • x0 – starting x value

  • err – desired accuracy: stop when incremental integral is <= err.
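
For intuition, here is a hedged sketch of the doubling scheme (the library's quadrature details may differ; the helper name is hypothetical):

import numpy as np
from scipy.integrate import quad

def integral_by_doubling_sketch(func, x0, err=1e-8):
    # sum integrals over [x0, 2 x0], [2 x0, 4 x0], ... until the increment is below err
    total, a = 0.0, x0
    while True:
        inc, _ = quad(func, a, 2 * a)
        total += inc
        if inc <= err:
            return total
        a *= 2

# example: integral of exp(-x) from 1 to infinity is exp(-1)
print(integral_by_doubling_sketch(lambda x: np.exp(-x), 1.0))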

aggregate.utilities.introspect(ob)[source]

Discover the non-private methods and properties of an object ob. Returns a pandas DataFrame.

aggregate.utilities.kaplan_meier(df, loss='loss', closed='closed')[source]

Compute Kaplan Meier Product limit estimator based on a sample of losses in the dataframe df. For each loss you know the current evaluation in column loss and a 0/1 indicator for open/closed in closed.

The output dataframe has columns

  • index x_i, size of loss

  • open - the number of open events of size x_i (open claim with this size)

  • closed - the number closed at size x_i

  • events - total number of events of size x_i

  • n - number at risk at x_i

  • s - probability of surviving past x_i = 1 - closed / n

  • pl - cumulative probability of surviving past x_i

See ipython workbook kaplan_meier.ipynb for a check against lifelines and some kaggle data (telco customer churn, https://www.kaggle.com/datasets/blastchar/telco-customer-churn?resource=download and https://towardsdatascience.com/introduction-to-survival-analysis-the-kaplan-meier-estimator-94ec5812a97a).

Parameters
  • df – dataframe of data

  • loss – column containing loss amount data

  • closed – column indicating if the observation is a closed claim (1) or open (0)

Returns

dataframe as described above
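
The product-limit recursion described above can be written in a few lines of pandas. This is a hedged sketch for intuition only (column handling in the library version may differ):

import pandas as pd

def kaplan_meier_sketch(df, loss='loss', closed='closed'):
    # events per distinct loss size
    g = df.groupby(loss)[closed].agg(closed='sum', events='count')
    g['open'] = g['events'] - g['closed']
    # number at risk at x_i: all observations with loss >= x_i
    g['n'] = g['events'][::-1].cumsum()[::-1]
    # conditional survival past x_i and its cumulative product
    g['s'] = 1 - g['closed'] / g['n']
    g['pl'] = g['s'].cumprod()
    return g

dfx = pd.DataFrame({'loss': [10, 10, 20, 30, 30, 50],
                    'closed': [1, 0, 1, 1, 0, 1]})
print(kaplan_meier_sketch(dfx))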

aggregate.utilities.kaplan_meier_np(loss, closed)[source]

Feeder to kaplan_meier where loss is np array of loss amounts and closed a same sized array of 0=open, 1=closed indicators.

aggregate.utilities.knobble_fonts(color=False)[source]

Not sure we should get into this…

See FigureManager in Great or common.py

https://matplotlib.org/3.1.1/tutorials/intermediate/color_cycle.html

https://matplotlib.org/3.1.1/users/dflt_style_changes.html#colors-in-default-property-cycle

https://matplotlib.org/2.0.2/examples/color/colormaps_reference.html

https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/linestyles.html

https://stackoverflow.com/questions/22408237/named-colors-in-matplotlib

aggregate.utilities.ln_fit(m, cv)

lognormal parameters

aggregate.utilities.log_test()[source]

Issue logs at each level

aggregate.utilities.logarithmic_theta(mean)[source]

Solve for theta parameter given mean, see JKK p. 288

aggregate.utilities.logger_level(level=30, name='aggregate', verbose=False)[source]

Code from common.py

Change the logger level for all loggers containing name. Changing it for EVERY logger is a really bad idea: you get endless debug info out of matplotlib find_font, for example.

FWIW, to list all loggers:

loggers = [logging.getLogger()]  # get the root logger
loggers = loggers + [logging.getLogger(name) for name in logging.root.manager.loggerDict]
loggers
Parameters

level

Returns

aggregate.utilities.lognorm_approx(ser)[source]

Lognormal approximation to series, index = loss values, values = density.

aggregate.utilities.lognorm_lev(mu, sigma, n, limit)[source]

return E(min(X, limit)^n) for the lognormal using an exact calculation; currently only for n=1, 2. See the sketch below the parameter list.

Parameters
  • mu

  • sigma

  • n

  • limit

Returns
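
For n=1, 2 the limited expected values have closed forms. A hedged sketch of the standard formulas (illustrative only; assumes the usual parameterization with shape sigma and scale exp(mu), as in scipy.stats.lognorm):

import numpy as np
from scipy.stats import norm

def lognorm_lev_sketch(mu, sigma, n, limit):
    # E[min(X, limit)^n] = E[X^n; X < limit] + limit^n * P(X >= limit)
    d = (np.log(limit) - mu) / sigma
    partial = np.exp(n * mu + n ** 2 * sigma ** 2 / 2) * norm.cdf(d - n * sigma)
    return partial + limit ** n * norm.sf(d)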

aggregate.utilities.make_ceder_netter(reins_list, debug=False)[source]

Build the netter and ceder functions. It is applied to occ_reins and agg_reins, so should be stand-alone.

The reinsurance functions are piecewise linear functions from 0 to inf which kinks as needed to express the ceded loss as a function of subject (gross) loss.

For example, if reins_list = [(1, 10, 10), (0.5, 30, 20)] the program is 10 x 10 and 15 part of 30 x 20 (share=0.5). This requires nodes at 0, 10, 20, 50, and inf.

It is easiest to make the ceder function. Ceded loss at subject loss x equals the sum of the limits below x plus the cession to the layer in which x lies. The variable base keeps track of the layer, h of the sum (height) of lower layers. xs tracks the knot points, ys the values.

Break (xs)   Ceded (ys)
     0            0
    10            0
    20           10
    50           25
   inf           25

For example:

%%sf 1 2

c, n, x, y = make_ceder_netter([(1, 10, 10), (0.5, 30, 20), (.25, np.inf, 50)], debug=True)

xs = np.linspace(0,250, 251)
ys = c(xs)

ax0.plot(xs, ys)
ax0.plot(xs, xs, ':C7')
ax0.set(title='ceded')

ax1.plot(xs, xs-ys)
ax1.plot(xs, xs, 'C7:')
ax1.set(title='net')
Parameters
  • reins_list – a list of (share of, limit, attach), e.g., (0.5, 3, 2) means 50% share of 3x2 or, equivalently, 1.5 part of 3 x 2. It is better to store share rather than part because it still works if limit == inf.

  • debug – if True, return layer function xs and ys in addition to the interpolation functions.

Returns

netter and ceder functions; optionally debug information.

aggregate.utilities.make_corr_matrix(vine_spec)[source]

Make a correlation matrix from a vine specification, https://en.wikipedia.org/wiki/Vine_copula.

A vine specification is:

row 0: correl of X0...Xn-1 with X0
row 1: correl of X1....Xn-1 with X1 given X0
row 2: correl of X2....Xn-1 with X2 given X0, X1
etc.

For example

vs = np.array([[1,.2,.2,.2,.2],
               [0,1,.3,.3,.3],
               [0,0,1,.4, .4],
               [0,0,0,1,.5],
               [0,0,0,0,1]])
make_corr_matrix(vs)

The key fact is the partial correlation formula

\[\rho(X,Y|Z) = \frac{(\rho(X,Y) - \rho(X,Z)\rho(Y,Z))}{\sqrt{(1-\rho(X,Z)^2)(1-\rho(Y,Z)^2)}}\]

and therefore

\[\rho(X,Y) = \rho(X,Z)\rho(Y,Z) + \rho(X,Y|Z) \sqrt{(1-\rho(X,Z)^2)(1-\rho(Y,Z)^2)}\]

see https://en.wikipedia.org/wiki/Partial_correlation#Using_recursive_formula.

aggregate.utilities.make_mosaic_figure(mosaic, figsize=None, w=3.5, h=2.45, xfmt='great', yfmt='great', places=None, power_range=(-3, 3), sep='', unit='', sci=True, mathText=False, offset=True, return_array=False)[source]

make a mosaic of axes and apply formats to the x and y axes; default engineering format; default w x h per subplot

xfmt='d' for default axis formatting, n=nice, e=engineering, s=scientific, g=great; great = engineering with power-of-three exponents

if return_array then the returns are more comparable with the old axiter_factory

aggregate.utilities.make_var_tvar(ser)[source]

Make var (lower quantile), upper quantile, and tvar functions from a pd.Series ser, which has index given by losses and p_total values.

ser must have a unique monotonic increasing index and all p_totals > 0.

Such a series comes from a.density_df.query('p_total > 0').p_total, for example.

Tested using numpy vs pd.Series lookup functions, and this version is much faster. See var_tvar_test_suite function below for testers (obviously run before this code was integrated).

Changed in v. 0.13.0

aggregate.utilities.moms_analytic(fz, limit, attachment, n, analytic=True)[source]

Return the moments \(E[((X-attachment)^+ \wedge limit)^m]\) for m = 1, 2, …, n.

To check:

# fz = ss.lognorm(1.24)
fz = ss.gamma(6.234, scale=100)
# fz = ss.pareto(3.4234, scale=100, loc=-100)

a1 = moms_analytic(fz, 50, 1234, 3)
a2 = moms_analytic(fz, 50, 1234, 3, False)
a1, a2, a1-a2, (a1-a2) / a1
Parameters
  • fz – frozen scipy.stats instance

  • limit – double, limit (layer width)

  • attachment – double, attachment point

  • n – int, power

  • analytic – if True use analytic formula, else numerical integrals

aggregate.utilities.more(self, regex)[source]

Investigate self for matches to the regex. If callable, try calling with no args, else display.

aggregate.utilities.mu_sigma_from_mean_cv(m, cv)[source]

lognormal parameters
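
The method-of-moments relations are simple; a minimal sketch (assuming the usual convention of shape sigma and scale exp(mu), as in scipy.stats.lognorm):

import numpy as np

m, cv = 100.0, 0.5
sigma = np.sqrt(np.log(1 + cv ** 2))
mu = np.log(m) - sigma ** 2 / 2
# check: scipy.stats.lognorm(sigma, scale=np.exp(mu)).stats('mv') recovers m and (m * cv)**2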

aggregate.utilities.mv(x, y=None)[source]

Nice display of mean and variance for Aggregate or Portfolios or entered values.

R style function, no return value.

Parameters
  • x – Aggregate or Portfolio or float

  • y – float, if x is a float

Returns

None

aggregate.utilities.nice_multiple(mx)[source]

Suggest a nice multiple for an axis with scale 0 to mx. Used by the MultipleLocator in discrete plots, where you want an integer multiple. Return 0 to let matplotlib figure the answer. Real issue is stopping multiples like 2.5.

Parameters

mx

Returns

aggregate.utilities.parse_note(txt)[source]

Extract kwargs from txt note. Recognizes bs, log2, padding, normalize, recommend_p. CSS format: split on ; and then look for k=v pairs. bs can be entered as 1/32 etc.

Parameters

txt – input text

Return value

dictionary of keyword: typed value

aggregate.utilities.parse_note_ex(txt, log2, bs, recommend_p, kwargs)[source]

Avoid duplication: this is how the function is used in Underwriter.build.

aggregate.utilities.partial_e(sev_name, fz, a, n)[source]

Compute the partial expected value of fz. Computing moments is a bottleneck, so you want analytic computation for the most commonly used types.

Exponential (for mixed exponentials) is implemented separately from gamma even though it is a special case.

Returns values for k=0,…,n as a np.array.

To do: beta? weibull? Burr? invgamma, etc.

Parameters
  • sev_name – scipy.stats name for distribution

  • fz – frozen scipy.stats instance

  • a – double, limit for integral

  • n – int, power

Returns

partial expected value

aggregate.utilities.partial_e_numeric(fz, a, n)[source]

Simple numerical integration version of partial_e for auditing purposes.

aggregate.utilities.picks_work(attachments, layer_loss_picks, xs, sev_density, n=1, sf=None, debug=False)[source]

Adjust the layer unconditional expected losses to target. You need \(\int xf(x)dx\), but that is fraught when f is a mixed distribution. So we only use the \(\int S\) version. fz was initially a frozen continuous distribution, but was adjusted to an sf function, dropping the need for a pdf function.

See notes for how the parts are defined. Notice that:

np.allclose(p.layers.v - p.layers.f, p.layers.l - p.layers.e)

is true.

Parameters
  • attachments – array of layer attachment points, in ascending order (bottom to top). a[0]>0

  • layer_loss_picks – Target means. If len(layer_loss_picks)==len(attachments) then the bottom layer, 0 to a[0], is added. Can be input as unconditional layer severity (i.e., \(\mathbb{E}[(X-a)^+\wedge y]\)) or as the layer loss pick (i.e., \(\mathbb{E}[(X-a)^+\wedge y]\times n\) where n is the number of ground-up (to the insurer) claims. Multiplying and dividing by \(S(a)\) shows this equals conditional severity in the layer times the number of claims in the layer.) Actuaries usually estimate the loss pick to the layer in pricing. When called from Aggregate the number of ground up claims is known.

  • en – ground-up expected claims. Target is divided by en.

  • xs – x values for discretization

  • sev_density – Series of existing severity density from Aggregate.

  • sf – cdf function for the severity distribution.

  • debug – if True, return debug information (layers, density with adjusted probs, audit of layer expected values).

aggregate.utilities.pprint(txt)[source]

Simple text version of pprint with line breaking

aggregate.utilities.pprint_ex(txt, split=0, html=False)[source]

Try to format an agg program. This is difficult because of dfreq and dsev, optional reinsurance, etc. Go for a simple approach of removing unnecessary spacing and removing notes. Notes can be accessed from the spec that is always to hand.

For long programs use split=60 or so, they are split at appropriate points.

Best to use html = True to get colorization.

Parameters
  • txt – program text input

  • split – if > 0 split lines at this length

  • html – if True return html (via pygments) , else return text

aggregate.utilities.qd(*argv, accuracy=3, align=True, trim=True, ff=None, **kwargs)[source]

Endless quest for a robust display format!

Quick display (qd) a list of objects. Dataframes handled in text with reasonable defaults. For use in documentation.

Parameters
  • argv – list of objects to print

  • accuracy – number of decimal places to display

  • align – if True, align columns at decimal point (sEngFormatter)

  • trim – if True, trim trailing zeros (sEngFormatter)

  • ff – if not None, use this function to format floats, or ‘basic’, or ‘binary’

  • kwargs – passed to pd.DataFrame.to_string for dataframes only, e.g., pass a dict of formatters by column

aggregate.utilities.qdp(df)[source]

Quick describe with nice percentiles and cv for a dataframe.

aggregate.utilities.random_corr_matrix(n, p=1, positive=False)[source]

make a random correlation matrix

smaller p results in more extreme correlation 0 < p <= 1

Eg

rcm = random_corr_matrix(5, .8)
rcm
np.linalg.cholesky(rcm)

positive=True for all entries to be positive

aggregate.utilities.rearrangement_algorithm_max_VaR(df, p=0, tau=0.001, max_n_iter=100)[source]

Implementation of the Rearrangement Algorithm (RA). Determines the worst p-VaR rearrangement of the input variables.

For loss random variables p is usually close to 1.

Embrechts, Paul, Giovanni Puccetti, and Ludger Ruschendorf, 2013, Model uncertainty and VaR aggregation, Journal of Banking and Finance 37, 2750–2764.

Worst-Case VaR

Worst value at risk arrangement of marginals.

See Actuarial Review article.

Worst TVaR / Variance arrangement of bivariate data = pair best with worst, second best with second worst, etc., called the countermonotonic arrangement.

More than 2 marginals: can’t make everything negatively correlated with everything else. If \(X\) and \(Y\) are negatively correlated and \(Y\) and \(Z\) are negatively correlated then \(X\) and \(Z\) will be positively correlated.

Next best attempt: make \(X\) countermonotonic to \(Y+Z\), \(Y\) to \(X+Z\) and \(Z\) to \(X+Y\). Basis of rearrangement algorithm.

The Rearrangement Algorithm

  1. Randomly permute each column of \(X\), the \(N\times d\) matrix of top \(1-p\) observations

  2. Loop

    • Create a new matrix \(Y\) as follows. For column \(j=1,\dots,d\)

      • Create a temporary matrix \(V_j\) by deleting the \(j\)th column of \(X\)

      • Create a column vector \(v\) whose \(i\)th element equals the sum of the elements in the \(i\)th row of \(V_j\)

      • Set the \(j\)th column of \(Y\) equal to the \(j\)th column of \(X\) arranged to have the opposite order to \(v\), i.e. the largest element in the \(j\)th column of \(X\) is placed in the row of \(Y\) corresponding to the smallest element in \(v\), the second largest with second smallest, etc.

    • Compute \(y\), the \(N\times 1\) vector with \(i\)th element equal to the sum of the elements in the \(i\)th row of \(Y\) and let \(y^*=\min(y)\) be the smallest element of \(y\) and compute \(x^*\) from \(X\) similarly

    • If \(y^*-x^* \ge \epsilon\) then set \(X=Y\) and repeat the loop

    • If \(y^*-x^* < \epsilon\) then break from the loop

  3. The arrangement \(Y\) is an approximation to the worst \(\text{VaR}_p\) arrangement of \(X\).

Parameters
  • df – Input DataFrame containing samples from each marginal. RA will only combine the top 1-p proportion of values from each marginal.

  • p – If p==0 assume df has already been truncated to the top p values (for each marginal). Otherwise truncate each at int((1-p) * len(df)).

  • tau – simulation tolerance

  • max_n_iter – maximum number of iterations to attempt

Returns

the top 1-p values of the rearranged DataFrame

aggregate.utilities.round_bucket(bs)[source]

Compute a decent rounded bucket from an input float bs.

if bs > 1 round to 2, 5, 10, ...

elif bs < 1 find the smallest power of two greater than bs

Test cases:

test_cases = [1, 1.1, 2, 2.5, 4, 5, 5.5, 8.7, 9.9, 10, 13,
              15, 20, 50, 100, 99, 101, 200, 250, 400, 457,
                500, 750, 1000, 2412, 12323, 57000, 119000,
                1e6, 1e9, 1e12, 1e15, 1e18, 1e21]
for i in test_cases:
    print(i, round_bucket(i))
for i in test_cases:
    print(1/i, round_bucket(1/i))
class aggregate.utilities.sEngFormatter(accuracy, min_prefix=-6, max_prefix=12, align=True, trim=True)[source]

Formats float values according to engineering format inside a range of exponents, and standard scientific notation outside.

Uses the same number of significant digits throughout. Optionally aligns at decimal point. That takes up more horizontal space but produces easier to read output.

Based on matplotlib.ticker.EngFormatter and pandas EngFormatter. Converts to scientific notation outside the (smaller) range of prefixes.

Testers:

sef1 = sEngFormatter(accuracy=5, min_prefix=0, max_prefix=12, align=True, trim=True)
sef2 = sEngFormatter(accuracy=5, min_prefix=0, max_prefix=12, align=False, trim=True)
sef3 = sEngFormatter(accuracy=5, min_prefix=0, max_prefix=12, align=True, trim=False)
sef4 = sEngFormatter(accuracy=5, min_prefix=0, max_prefix=12, align=False, trim=False)
test = [1.234 * 10**n for n in range(-20,20)]
test = [-i for i in test] + test
for sef in [sef1, sef2, sef3, sef4]:
    print('\n'.join([sef(i) for i in test]))

print('\n\n')
print('===============')

test = [1.234 * 10**n + 3e-16 for n in range(-20,20)]
test = [-i for i in test] + test
for sef in [sef1, sef2, sef3, sef4]:
    print('\n'.join([sef(i) for i in test]))

__init__(accuracy, min_prefix=-6, max_prefix=12, align=True, trim=True)[source]
remove_trailing_zeros(str_x, x, dps)[source]

Remove trailing zeros from a string representation str_x of a number x. The number of decimal places is dps. If the number is in scientific notation then there is no change. Eg with dps == 3, 1.2000 becomes 1.2, 1.000 becomes 1, but 1.200 when x=1.20000001 is unchanged.

aggregate.utilities.sensible_jump(n, desired_rows=20)[source]

return a sensible jump size to output desired_rows given input of n

Parameters
  • n

  • desired_rows

Returns

aggregate.utilities.sgamma_fit(m, cv, skew)[source]

method of moments shifted gamma fit matching given mean, cv and skewness

Parameters
  • m

  • cv

  • skew

Returns

aggregate.utilities.show_fig(f, format='svg', **kwargs)[source]

Save a figure so it can be placed precisely in output. Used by Underwriter.show to interleave tables and plots.

Parameters
  • f – a plt.Figure

  • format – svg or png

  • kwargs – passed to savefig

aggregate.utilities.sln_fit(m, cv, skew)[source]

method of moments shifted lognormal fit matching given mean, cv and skewness

Parameters
  • m

  • cv

  • skew

Returns

aggregate.utilities.style_df(df)[source]

Style a df similar to pricinginsurancerisk.com styles.

graph background color is B4C3DC and figure (paler) background is F1F8F#

Dropped row lines; bold level0, caption

Parameters

df

Returns

styled dataframe

aggregate.utilities.subsets(x)[source]

all non-empty subsets of x, an iterable

aggregate.utilities.suptitle_and_tight(title, **kwargs)[source]

deal with tight layout when there is a suptitle

Parameters

title

Returns

aggregate.utilities.test_var_tvar(program, bs=0, n_ps=1025, normalize=False, speed_test=False, log2=16)[source]

Run a test suite of programs against new var and tvar functions compared to old aggregate.Aggregate versions

Suggestion:

args = [
    ('agg T dfreq [1,2,4,8] dsev [1]', 0, 1025, False),
    ('agg T dfreq [1:8] dsev [2:3]', 0, 1025, False),
    ('agg D dfreq [1:6] dsev [1]', 0, 7, False),
    ('agg T2 10 claims 100 x 0 sev lognorm 10 cv 1 poisson ', 1/64, 1025, False),
    ('agg T2 10 claims sev lognorm 10 cv 4 poisson ', 1/16, 1025, False),
    ('agg T2 10 claims sev lognorm 10 cv 4 mixed gamma 1.2 ', 1/8, 1025, False),
    ('agg Dice dfreq [1] dsev [1:6]', 0, 7, False)
]

from IPython.display import display, Markdown
for t in args:
    display(Markdown(f'## {t[0]}'))
    bs = t[1]
    test_var_tvar(*t, speed_test=False, log2=16 if bs != 1/8 else 20)

Expected output to show that new functions are 3+ orders of magnitude faster, and agree with the old functions. q and q_upper agree everywhere except the jumps.

aggregate.utilities.tweedie_convert(*, p=None, μ=None, σ2=None, λ=None, α=None, β=None, m=None, cv=None)[source]

Translate between Tweedie parameters. Input p, μ, σ2 or λ, α, β or λ, m, cv. Remaining parameters are computed and returned in pandas Series.

p, μ, σ2 are the reproductive parameters: μ is the mean and the variance equals σ2 μ^p. λ, α, β are the additive parameters: λαβ is the mean and λα(α + 1)β^2 is the variance (α is the gamma shape and β is the scale). λ, m, cv specify the compound Poisson with expected claim count λ and gamma with mean m and cv.

In addition, returns p0, the probability mass at 0.

aggregate.utilities.tweedie_density(x, *, p=None, μ=None, σ2=None, λ=None, α=None, β=None, m=None, cv=None)[source]

Exact density of Tweedie distribution from series expansion. Use any parameterization and convert between them with Tweedie convert. Coded for clarity and flexibility not speed. See tweedie_convert for parameterization.

aggregate.utilities.xsden_to_meancv(xs, den)[source]

Compute mean and cv from xs and density.

Consider adding: np.nan_to_num(den)

Note: cannot rely on pd.Series[-1] to work; it depends on the index. xs could be an index.
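
A minimal sketch of the calculation (illustrative only; the library handles normalization and indexing details):

import numpy as np

xs = np.arange(5)
den = np.array([.1, .2, .4, .2, .1])   # probability masses
ex1 = np.sum(xs * den)
ex2 = np.sum(xs ** 2 * den)
mean, cv = ex1, np.sqrt(ex2 - ex1 ** 2) / ex1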

aggregate.utilities.xsden_to_meancvskew(xs, den)[source]

Compute mean, cv and skewness from xs and density

Consider adding: np.nan_to_num(den)

Parameters
  • xs

  • den

Returns

3.5.5. Constants

class aggregate.constants.Validation(value)[source]

An enumeration.