.. _2_x_samples: .. NEEDS WORK Working With Samples ==================== **Objectives:** How to sample from :mod:`aggregate` and how to a build a :class:`Portfolio` from a sample. Inducing correlation in a sample using the Iman-Conover algorithm and determining the worst-VaR rearrangement using the rearrangement algorithm. **Audience:** Planning and strategy, ERM, capital modeling, risk management actuaries. **Prerequisites:** DecL, aggregate distributions, risk measures. **See also:** :doc:`../5_technical_guides/5_x_samples`, :doc:`../5_technical_guides/5_x_iman_conover`, :doc:`../5_technical_guides/5_x_rearrangement_algorithm`. **Contents:** #. :ref:`Helpful References` #. :ref:`samp samp` #. :ref:`samp ic` #. :ref:`samp ra` #. :ref:`samp summary` Helpful References -------------------- * :cite:t:`PIR` chapter 14 and 15 * :cite:t:`Puccetti2012` * :cite:t:`Conover1999` * :cite:t:`Mildenhall2005a` * Vitale IC proof in dependency book .. See examples in /TELOS/Blog/agg/examples/IC_and_rearrangement.ipynb. Samples and Densities ----------------------- Use case: make realistic marginal distributions with ``aggregate`` that reflect the underlying frequency and severity (rather than defaulting to a lognormal determined by a CV assumption) and then use a sample in your simulation model. .. _samp samp: Samples from :mod:`aggregate` Object ------------------------------------- The method :meth:`sample` draws a sample from an :class:`Aggregate` or :class:`Portfolio` class object. Both cases work by applying ``pandas.DataFrame.sample`` to the object's ``density_df`` dataframe. **Examples.** 1. A sample from an :class:`Aggregate`. Set up a simple lognormal distribution, modeled as an aggregate with trivial frequency. .. ipython:: python :okwarning: from aggregate import build, qd, set_seed a01 = build('agg Samp:01 ' '1 claim ' 'sev lognorm 10 cv .4 ' 'fixed' , bs=1/512) qd(a01) Apply :meth:`sample` and display the results. .. ipython:: python :okwarning: set_seed(102) df = a01.sample(10**5) fc = lambda x: f'{x:8.2f}' qd(df.head(), float_format=fc) The sample histogram and the computed pmf are close. The pmf is adjusted to the resolution of the histogram. .. ipython:: python :okwarning: fig, ax = plt.subplots(1, 1, figsize=(3.5, 2.45), constrained_layout=True) xm = a01.q(0.999) df.hist(bins=np.arange(xm), ec='w', lw=.25, density=True, ax=ax, grid=False); (a01.density_df.loc[:xm, 'p_total'] / a01.bs).plot(ax=ax); @savefig samp_agg_hist.png scale=20 ax.set(title='Sample and aggregate pmf', ylabel='pmf'); 2. A sample from a :class:`Portfolio` produces a multivariate distribution. Setup a simple :class:`Portfolio` with three lognormal marginals. .. ipython:: python :okwarning: from aggregate.utilities import qdp from pandas.plotting import scatter_matrix p02 = build('port Samp:02 ' 'agg A 1 claim sev lognorm 10 cv .2 fixed ' 'agg B 1 claim sev lognorm 15 cv .5 fixed ' 'agg C 1 claim sev lognorm 5 cv .8 fixed ' , bs=1/128) qd(p02) Apply :meth:`sample` to produce a sample with no correlation. Here are the first few values. .. ipython:: python :okwarning: df = p02.sample(10**4) qd(df.head(), float_format=fc) :meth:`qdp` prints the pandas ``describe`` statistics dataframe for a dataframe, and adds the CV. .. ipython:: python :okwarning: qdp(df) The sample is independent, with correlations close to zero, as expected. .. ipython:: python :okwarning: abc = ['A', 'B', 'C'] qd(df[abc].corr()) The scatterplot is consistent with independent marginals. .. ipython:: python :okwarning: @savefig sample_corr1.png scale=20 scatter_matrix(df[abc], grid=False, figsize=(6, 6), diagonal='hist', hist_kwds={'density': True, 'bins': 25, 'lw': .25, 'ec': 'w'}, s=1, marker='.'); 3. Pass a correlation matrix to :meth:`sample` to draw a correlated sample. Correlation is induced using the Iman-Conover algorithm. The function :meth:`random_corr_matrix` creates a random correlation matrix using vines. The second parameter controls the average correlation. This example includes high positive correlation. .. ipython:: python :okwarning: from aggregate import random_corr_matrix rcm = random_corr_matrix(3, .6, True) rcm Re-sample with target correlation ``rcm``. The achieved correlation is reasonably close to the requested ``rcm``. .. ipython:: python :okwarning: df2 = p02.sample(10**4, desired_correlation=rcm) qd(df2.iloc[:, :3].corr('pearson')) The scatterplot now shows correlated marginals. The histograms are unchanged. .. ipython:: python :okwarning: df2['total'] = df2.sum(1) @savefig sample_corr2.png scale=20 scatter_matrix(df2[abc], grid=False, figsize=(6, 6), diagonal='hist', hist_kwds={'density': True, 'bins': 25, 'lw': .25, 'ec': 'w'}, s=1, marker='.'); The sample uses a different random state and produces a different draw. Comparing ``qdp`` output is one way to see if 10000 simulations is adequate. In this case there is good agreement. .. ipython:: python :okwarning: qdp(df2) .. _samp ic: Applying the Iman-Conover Algorithm --------------------------------------- The method :meth:`sample` automatically applies the Iman-Conover algorithm (described in :doc:`../5_technical_guides/5_x_iman_conover`). It is also easy to apply Iman-Conover to a dataframe using the method :meth:`aggregate.utilities.iman_conover`. It reorders the input dataframe to have the same rank correlation as a multivariate normal reference sample with the desired linear correlation. Optionally, a multivariate t-distribution can be used as the reference. **Examples.** Apply Iman-Conover to the sample ``df`` with target the correlation ``rcm``, reusing the variables created in the previous section. The achieved correlation is close to that requested, as shown in the last two blocks. .. ipython:: python :okwarning: from aggregate import iman_conover import pandas as pd ans = iman_conover(df[abc], rcm, add_total=False) qd(pd.DataFrame(rcm, index=abc, columns=abc)) qd(ans.corr()) Setting the argument ``dof`` uses a t-copula reference with ``dof`` degrees of freedom. The t-copula with low degrees of freedom can produce pinched multivariate distributions. Use with caution. .. ipython:: python :okwarning: ans = iman_conover(df[abc], rcm, dof=2, add_total=False) qd(ans.corr()) @savefig sample_corrt.png scale=20 scatter_matrix(ans, grid=False, figsize=(6, 6), diagonal='hist', hist_kwds={'density': True, 'bins': 25, 'lw': .25, 'ec': 'w'}, s=1, marker='.'); ===== See WP REF for ways to apply Iman-Conover with different reference distributions. **Details.** Creating the independent scores for Iman-Conover is quite time consuming. They are cached for a given sample size. Second and subsequent calls are far quicker (an order of magnitude) than the first call. .. _samp ra: Applying the Re-Arrangement Algorithm --------------------------------------- The method :meth:`rearrangement_algorithm_max_VaR` implements the re-arrangement algorithm described in :ref:`../5_technical_guides/5_x_rearrangement_algorithm`. It returns only the tail of the re-arrangement, since values below the requested percentile are irrelevant. Apply to ``df`` and request 0.999-VaR. The marginals are the 10 largest values. The algorithm permutes them to balance large and small observations. .. ipython:: python :okwarning: from aggregate import rearrangement_algorithm_max_VaR ans = rearrangement_algorithm_max_VaR(df.iloc[:, :3], .999) qd(ans, float_format=fc) Here are the stand-alone ``sa`` VaRs by marginal, in total for ``df``, in total for the correlated ``df2``, and the re-arrangement solutions ``ra`` for a range of different percentiles. The column ``comon total`` shows VaR for the comonotonic sum of the marginals (which equals the largest TVaR and variance re-arrangement). .. ipython:: python :okwarning: ps = [9000, 9500, 9900, 9960, 9990, 9999] sa = pd.concat([df[c].sort_values().reset_index(drop=True).iloc[ps] for c in df] +[df2.rename(columns={'total':'corr total'})['corr total'].\ sort_values().reset_index(drop=True).iloc[ps]], axis=1) sa['comon total'] = sa[abc].sum(1) ra = pd.concat([rearrangement_algorithm_max_VaR(df.iloc[:, :3], p/10000).iloc[0] for p in ps], axis=1, keys=ps).T exhibit = pd.concat([sa, ra], axis=1, keys=['stand-alone', 're-arrangement']) exhibit.index = [f'{x/10000:.2%}' for x in exhibit.index] exhibit.index.name = 'percentile' qd(exhibit, float_format=fc) See also :ref:`ra worked example`. .. _samp sample to portfolio: Creating a :class:`Portfolio` From a Sample --------------------------------------------- A :class:`Portfolio` can be created from an existing sample by passing in a dataframe rather than a list of aggregates. This approach is useful when another model has created the sample, but the user wants to access other ``aggregate`` functionality. Each marginal in the sample is created as a ``dsev`` with the sampled outcomes. The ``p_total`` column used to set scenario probabilities if its is input, otherwise each scenario is treated as equally likely. The :class:`Portfolio` ignores any the correlation structure of the sample; the marginals are treated as independent, but see :ref:`samp switcheroo` for a way around this assumption. **Example.** Create a simple discrete sample from a three unit portfolio. .. ipython:: python :okwarning: sample = pd.DataFrame( {'A': [20, 22, 24, 6, 5, 6, 7, 8, 21, 3], 'B': [20, 18, 16, 14, 12, 10, 8, 6, 4, 2], 'C': [0, 0, 0, 0, 0, 0, 0, 0, 20, 40]}) qd(sample) Pass to :class:`Portfolio` to create with these marginals. In this case, treat the marginals as discrete and update with ``bs=1``. .. ipython:: python :okwarning: from aggregate import Portfolio p03 = Portfolio('Samp:03', sample) p03.update(bs=1, log2=8) qd(p03) The univariate statistics for each marginal are the same as the sample input, but because they added independently, the totals differ. The sample has negative correlation and a lower CV. .. ipython:: python :okwarning: sample['total'] = sample.sum(1) qdp(sample) The :class:`Portfolio` total is a convolution of the input marginals and includes all possible combinations added independently. The figure plots the distribution functions. .. ipython:: python :okwarning: ax = p03.density_df.filter(regex='p_[ABCt]').cumsum().plot( drawstyle='steps-post', lw=1, figsize=(3.5, 2.45)) ax.plot(np.hstack((0, sample.total.sort_values())), np.linspace(0, 1, 11), drawstyle='steps-post', lw=2, label='dependent'); ax.set(xlim=[-2, 90]); @savefig samp_port_samp.png scale=20 ax.legend(loc='lower right'); .. _samp switcheroo: Using Samples and the Switcheroo Trick --------------------------------------- :class:`Portfolio` objects created from a sample ignore the dependency structure; the ``aggregate`` convolution algorithm always assumes independence. It is highly desirable to retain the sample's dependency structure. Many calculations rely only on :math:`\mathsf E[X_i\mid X]` and not the input densities per se. Thus, we reflect dependency if we alter the values :math:`\mathsf E[X_i\mid X]` based on a sample and recompute everything that depends on them. The method :meth:`Portfolio.add_exa_sample` implements this idea. **Example.** ``sample`` was chosen to have lots of ties - different ways of obtaining the same total outcome. .. ipython:: python :okwarning: qd(sample) Apply ``add_exa_sample`` to the ``sample`` dataframe and look at the outcomes with positive probability. When a total outcome can occur in multiple ways, ``exeqa_i`` gives the average value of unit ``i``. The function is applied to a copy of the original :class:`Portfolio` object because it invalidates various internal states. The output dataframe is indexed by total loss. Notice that rows sum to the correct total. .. ipython:: python :okwarning: p03sw = Portfolio('Samp:03sw', sample) p03sw.update(bs=1, log2=8) df = p03sw.add_exa_sample(sample) qd(df.query('p_total > 0').filter(regex='p_total|exeqa_[ABC]')) Swap the ``density_df`` dataframe --- the **switcheroo trick**. .. ipython:: python :okwarning: p03sw.density_df = df See the function ``Portfolio.create_from_sample`` for a single step create from sample, update, add exa calc, and switcheroo. Most :class:`Portfolio` spectral functions depend only on marginal conditional expectations. Applying these functions through ``p03sw`` reflects dependencies. Calibrate some distortions to a 15% return. The maximum loss is only 45, so use a 1-VaR, no default capital standard. .. ipython:: python :okwarning: p03sw.calibrate_distortions(ROEs=[0.15], Ps=[1], strict='ordered'); qd(p03sw.distortion_df) Apply the PH and dual to the independent and dependent portfolios. Asset level 45 is the 0.861 percentile of the independent. .. ipython:: python :okwarning: d1 = p03sw.dists['ph']; d2 = p03sw.dists['dual'] for d in [d1, d2]: print(d.name) print('='*74) pr = p03.price(1, d) pr45 = p03.price(.861, d) prsw = p03sw.price(1, d) a = pd.concat((pr.df, pr45.df, prsw.df), keys=['pr', 'pr45', 'prsw']) qd(a, float_format=lambda x: f'{x:7.3f}') .. There's a sneaky but effective way to add correlation. The idea is: * Make a portfolio with independent lines as usual * Pull a sample from each unit * Shuffle the sample to induce the correlation you want using Iman-Conover. You don't have to use a normal copula. * (Sneaky part): recompute :math:`\mathsf E[X_i \mid X]` functions with those from the sample. From there, you can compute everything you need to use the natural allocation because it works on the conditional expectations, not the actual sample. I call it the switcheroo operation. .. _samp summary: Summary of Objects Created by DecL ------------------------------------- Objects created by :meth:`build` in this guide. Objects created directly by class constructors are not entered into the knowledge database. .. ipython:: python :okwarning: :okexcept: from aggregate import pprint_ex for n, r in build.qlist('^Samp:').iterrows(): pprint_ex(r.program, split=20) .. ipython:: python :suppress: plt.close('all')