2.4.11. The `tweedie` Keyword

Prerequisites: Tweedie distribution from GLMs. Use of build.

The aggregate language keyword tweedie makes it easy to build Tweedie distributions. It uses reproductive parameters \(\mu, p, \sigma^2\) (mean, power, and dispersion), since these are most natural for GLM modeling.

Example.

The keyword is used as follows to produce \(\mathsf{Tw}_{1.05}(2, 5)\), mean 2, \(p=1.05\), and dispersion 5.

In [1]: from aggregate import build, qd, mv

In [2]: a15 = build('agg DecL:15 tweedie 2 1.05 5')

In [3]: qd(a15)

        E[X] Est E[X]    Err E[X]   CV(X) Est CV(X)  Skew(X) Est Skew(X)
X                                                                       
Freq 0.40671                        1.568              1.568            
Sev   4.9175   4.9175 -2.3315e-15 0.22942   0.22942  0.45883     0.45883
Agg        2        2 -2.2204e-15  1.6088    1.6088   1.6892      1.6892
log2 = 16, bandwidth = 1/512, validation: not unreasonable.

In [4]: mv(a15)
mean     = 2
variance = 10.35265
std dev  = 3.21755

In [5]: print(f'Expected variance = disp x mean ** p = {2**1.05 * 5:.5f}')
Expected variance = disp x mean ** p = 10.35265

Inspecting the (non-trivial parts of the) specification shows the parser converts it into the additive form:

In [6]: {k: v for k,v in a15.spec.items()
   ...:     if v!=0 and v is not None and v!=''}
   ...: 
Out[6]: 
{'name': 'DecL:15',
 'exp_en': 0.4067100332315139,
 'exp_limit': inf,
 'sev_name': 'gamma',
 'sev_a': 18.999999999999982,
 'sev_scale': 0.2588162309603446,
 'sev_wt': 1,
 'sev_ub': inf,
 'sev_conditional': True,
 'freq_name': 'poisson',
 'freq_p0': nan,
 'note': 'Tw(p=1.05, μ=2.0, σ^2=5.0) --> CP(λ= 0.40671, ga(α=19, β=0.25881623), scale=0.25881623'}

The note shows the compound Poisson specification.

The helper function tweedie_convert translates between parameterizations. The scale (dispersion) parameter \(\sigma^2\) has offsetting effects: higher \(\sigma^2\) results in a lower claim count, a higher gamma mean, and a more skewed aggregate distribution with a bigger mass at zero.

Example.

The code below shows the three Tweedie representations, starting with the easiest to interpret.

In [7]: from aggregate import tweedie_convert

In [8]: import pandas as pd

In [9]: p = 1.005; μ = 1; σ2 = 0.1;                    \
   ...: m0 = tweedie_convert(p=p, μ=μ, σ2=σ2);         \
   ...: λ = μ**(2-p) / ((2-p) * σ2);                   \
   ...: α = (2 - p) / (p - 1);                         \
   ...: β = μ / (λ * α);                               \
   ...: tw_cv = σ2**.5 * μ**(p/2-1);                   \
   ...: sev_m = α *  β;                                \
   ...: sev_cv = α**-0.5;                              \
   ...: m1 = tweedie_convert(λ=λ, m=sev_m, cv=sev_cv); \
   ...: m2 = tweedie_convert(λ=λ, α=α, β=β);
   ...: 

In [10]: assert np.allclose(m0, m1, m2)

In [11]: temp = pd.concat((m0, m1, m2), axis=1); \
   ....: temp.columns = ['mean p disp', 'lambda sev m cv', 'lambda shape scale'];
   ....: 

In [12]: with pd.option_context('display.float_format', lambda x: f'{x:.12g}'):
   ....:     print(temp)
   ....: 
             mean p disp   lambda sev m cv  lambda shape scale
μ                      1                 1                   1
p                  1.005             1.005               1.005
σ^2                  0.1               0.1                 0.1
λ          10.0502512563     10.0502512563       10.0502512563
α                    199               199                 199
β                 0.0005            0.0005              0.0005
tw_cv     0.316227766017    0.316227766017      0.316227766017
sev_m             0.0995            0.0995              0.0995
sev_cv   0.0708881205008   0.0708881205008     0.0708881205008
p0     4.31748997327e-05 4.31748997327e-05   4.31748997327e-05

Three different ways of specifying the same Tweedie distribution.

In [13]: program = f'''
   ....: agg DecL:16 {λ} claims sev gamma {sev_m:.8g} cv {sev_cv} poisson
   ....: agg DecL:17 {λ} claims sev {β:.4g} * gamma {α:.4g} poisson
   ....: agg DecL:18 tweedie {μ} {p} {σ2}
   ....: '''
   ....: 

In [14]: tweedies = build(program)

In [15]: for a in tweedies:
   ....:     print(a.program)
   ....:     qd(a.object.describe)
   ....:     print()
   ....: 
agg DecL:16 10.050251256281404 claims sev gamma 0.0995 cv 0.07088812050083283 poisson

       E[X]  Est E[X]    Err E[X]    CV(X)  Est CV(X)  Err CV(X)  Skew(X)  Est Skew(X)
X                                                                                     
Freq  10.05       NaN         NaN  0.31544        NaN        NaN  0.31544          NaN
Sev  0.0995    0.0995 -8.2379e-14 0.070888   0.070888   3.12e-06  0.14178      0.14177
Agg       1         1 -1.1036e-13  0.31623    0.31623 1.5599e-08  0.31781      0.31781

agg DecL:17 10.050251256281404 claims sev 0.0005 * gamma 199 poisson

       E[X]  Est E[X]    Err E[X]    CV(X)  Est CV(X)  Err CV(X)  Skew(X)  Est Skew(X)
X                                                                                     
Freq  10.05       NaN         NaN  0.31544        NaN        NaN  0.31544          NaN
Sev  0.0995    0.0995 -1.0825e-13 0.070888   0.070888   3.12e-06  0.14178      0.14177
Agg       1         1 -1.3756e-13  0.31623    0.31623 1.5599e-08  0.31781      0.31781

agg DecL:18 tweedie 1 1.005 0.1

       E[X]  Est E[X]    Err E[X]    CV(X)  Est CV(X)  Err CV(X)  Skew(X)  Est Skew(X)
X                                                                                     
Freq  10.05       NaN         NaN  0.31544        NaN        NaN  0.31544          NaN
Sev  0.0995    0.0995 -8.6597e-14 0.070888   0.070888   3.12e-06  0.14178      0.14177
Agg       1         1  -1.148e-13  0.31623    0.31623 1.5599e-08  0.31781      0.31781

Convert from reproductive form:

In [16]: tweedie_convert(p=1.05, μ=2, σ2=5)
Out[16]: 
μ          2.000000
p          1.050000
σ^2        5.000000
λ          0.406710
α         19.000000
β          0.258816
tw_cv      1.608777
sev_m      4.917508
sev_cv     0.229416
p0         0.665837
dtype: float64

Convert from additive form:

In [17]: tweedie_convert(λ=0.406710033, m=4.917508388, cv=0.229415734)
Out[17]: 
μ          2.000000
p          1.050000
σ^2        5.000000
λ          0.406710
α         19.000000
β          0.258816
tw_cv      1.608777
sev_m      4.917508
sev_cv     0.229416
p0         0.665837
dtype: float64

Build a Tweedie using reproductive parameters, p, mu, sigma2.

In [18]: a19 = build('agg DecL:19 tweedie 2 1.05 5')

In [19]: a19.plot()

In [20]: qd(a19)

        E[X] Est E[X]    Err E[X]   CV(X) Est CV(X)  Skew(X) Est Skew(X)
X                                                                       
Freq 0.40671                        1.568              1.568            
Sev   4.9175   4.9175 -2.3315e-15 0.22942   0.22942  0.45883     0.45883
Agg        2        2 -2.2204e-15  1.6088    1.6088   1.6892      1.6892
log2 = 16, bandwidth = 1/512, validation: not unreasonable.

In [21]: print(a19.spec)
{'name': 'DecL:19', 'exp_el': 0, 'exp_premium': 0, 'exp_lr': 0, 'exp_en': 0.4067100332315139, 'exp_attachment': None, 'exp_limit': inf, 'sev_name': 'gamma', 'sev_a': 18.999999999999982, 'sev_b': 0, 'sev_mean': 0, 'sev_cv': 0, 'sev_loc': 0, 'sev_scale': 0.2588162309603446, 'sev_xs': None, 'sev_ps': None, 'sev_wt': 1, 'sev_lb': 0, 'sev_ub': inf, 'sev_conditional': True, 'sev_pick_attachments': None, 'sev_pick_losses': None, 'occ_reins': None, 'occ_kind': '', 'freq_name': 'poisson', 'freq_a': 0, 'freq_b': 0, 'freq_zm': False, 'freq_p0': nan, 'agg_reins': None, 'agg_kind': '', 'note': 'Tw(p=1.05, μ=2.0, σ^2=5.0) --> CP(λ= 0.40671, ga(α=19, β=0.25881623), scale=0.25881623'}

In [22]: print(a19.cdf(0), np.exp(-.40671))
0.6658372329829731 0.6658372551097528

Example.

When p is close to 1, the Tweedie approaches a Poisson. Here mean = 10 and sigma2 = 1, so the distribution is not over-dispersed. The gamma severity has mean 1 and a very small CV; it acts like degenerate distribution at 1.

In [23]: a20 = build('agg DecL:20 tweedie 10 1.0001 1')

In [24]: a20.plot()

In [25]: qd(a20)

       E[X] Est E[X]   Err E[X]    CV(X) Est CV(X)  Skew(X) Est Skew(X)
X                                                                      
Freq 9.9987                      0.31625            0.31625            
Sev  1.0001   1.0001 1.9973e-11 0.010001  0.010004 0.019989    0.019977
Agg      10       10  1.997e-11  0.31626   0.31626   0.3163      0.3163
log2 = 16, bandwidth = 1/1024, validation: not unreasonable.

In [26]: tweedie_convert(p=1.0001, μ=10, σ2=1)
Out[26]: 
μ           10.000000
p            1.000100
σ^2          1.000000
λ            9.998698
α         9999.000000
β            0.000100
tw_cv        0.316264
sev_m        1.000130
sev_cv       0.010001
p0           0.000045
dtype: float64

Example.

When p is close to 2, the Tweedie approaches a Gamma. Here mean = 10, and sigma2=0.04. The variance equals sigma2 mu^2, so CV = sigma = 0.2

In [27]: a21 = build('agg DecL:21 tweedie 10 1.999 0.04', log2=16, bs=1/256)

In [28]: a21.plot()

In [29]: qd(a21)

           E[X]   Est E[X]   Err E[X]     CV(X) Est CV(X)   Skew(X) Est Skew(X)
X                                                                              
Freq      25058                       0.0063173           0.0063173            
Sev  0.00039908 0.00039773 -0.0033706    31.607    31.715    63.214       63.21
Agg          10     9.9663 -0.0033706   0.19977   0.20045   0.39934     0.39932
log2 = 16, bandwidth = 1/256, validation: fails sev mean, agg mean.

Build the same distribution explicitly from gamma severities. Here the gamma is built using mean and CV or shape and scale.

In [30]: tc = tweedie_convert(p=1.9999, μ=10, σ2=.04)

In [31]: print(tc)
μ             10.000000
p              1.999900
σ^2            0.040000
λ         250057.571255
α              0.000100
β              0.399868
tw_cv          0.199977
sev_m          0.000040
sev_cv        99.995000
p0             0.000000
dtype: float64

In [32]: m, cv = tc['μ'], tc['tw_cv']

In [33]: print(m, cv)
10.0 0.19997697547449372

In [34]: g = build(f'sev g gamma {m} cv {cv}')

In [35]: sh = cv ** -2; sc = m / sh

In [36]: print(sc, sh)
0.39990790719926267 25.005757125520617

In [37]: g2 = build(f'sev g2 {sc} * gamma {sh}')

In [38]: print(g2.stats(), g.stats())
(9.999999999999968, 3.999079071994487) (10.000000000000108, 3.999079071991673)

2.4.11.1. Analytic Error Analysis

There is a series expansion for the pdf of a Tweedie computed by conditioning on the number of claims and using that a convolution of gammas with the same scale parameter is again gamma. For a Tweedie with expected frequency \(\lambda\), gamma shape \(\alpha\) and scale \(\beta\), it is given by

\[f(x) = \sum_{n \ge 1} e^{-\lambda}\frac{\lambda^n}{n!}\frac{x^{n\alpha-1}e^{-x/\beta}}{\Gamma(n\alpha)\beta^{{n\alpha}}}\]

for \(x>0\) and \(f(x)=\exp(-\lambda)\). The exact function shows the FFT method is very accurate.

In [39]: from aggregate import tweedie_convert, build, qd

In [40]: from scipy.special import loggamma

In [41]: import matplotlib.pyplot as plt

In [42]: import numpy as np

In [43]: from pandas import option_context

In [44]: a = build('agg Tw tweedie 10 1.01 1')

In [45]: qd(a)

       E[X] Est E[X]    Err E[X]   CV(X) Est CV(X)  Skew(X) Est Skew(X)
X                                                                      
Freq 9.8711                      0.31829            0.31829            
Sev  1.0131   1.0131 -1.9318e-14  0.1005    0.1005  0.20101     0.20101
Agg      10       10 -2.2427e-14 0.31989   0.31989  0.32309     0.32309
log2 = 16, bandwidth = 1/1024, validation: not unreasonable.

In [46]: a.plot()

A Tweedie with \(p\) close to 1 approximates a Poisson. Its gamma severity is very peaked around its mean (high \(\alpha\) and offsetting small \(\beta\)).

The next function provides a transparent, if inefficient, implementation of the Tweedie density.

In [47]: def tweedie_density(x, mean, p, disp):
   ....:     pars = tweedie_convert(p=p, μ=mean, σ2=disp)
   ....:     λ = pars['λ']
   ....:     α = pars['α']
   ....:     β = pars['β']
   ....:     if x == 0:
   ....:         return np.exp(-λ)
   ....:     logl = np.log(λ)
   ....:     logx = np.log(x)
   ....:     logb = np.log(β)
   ....:     logbase = -λ
   ....:     log_term = 100
   ....:     const = -λ - x / β
   ....:     ans = 0.0
   ....:     for n in range(1, 2000): #while log_term > -20:
   ....:         log_term = (const  +
   ....:                     + n * logl  +
   ....:                     + (n * α - 1) * logx +
   ....:                     - loggamma(n+1) +
   ....:                     - loggamma(n * α) +
   ....:                     - n * α * logb)
   ....:         ans += np.exp(log_term)
   ....:         if n > 20 and log_term < -227:
   ....:             break
   ....:     return ans
   ....: 

The following graphs show that the FFT approximation is excellent, across a wide range, just as its good moment-matching performance suggests it would be.

In [48]: bit = a.density_df.loc[5:a.q(0.99):256, ['p']]

In [49]: bit['exact'] = [tweedie_density(i, 10, 1.01, 1) for i in bit.index]

In [50]: bit['p'] /= a.bs

In [51]: fig, axs = plt.subplots(1, 2, figsize=(2 * 3.5, 2.45), constrained_layout=True, squeeze=True)

In [52]: ax0, ax1 = axs.flat

In [53]: bit.plot(ax=ax0);

In [54]: ax0.set(ylabel='density');

In [55]: bit['err'] = bit.p / bit.exact - 1

In [56]: bit.err.plot(ax=ax1);

In [57]: ax1.set(ylabel='relative error', ylim=[-1e-5, 1e-5]);

2.4.11. The tweedie Keyword

2.4.11.1. Analytic Error Analysis

2.4.11. The `tweedie` Keyword