5.9. Working With Samples

Objectives: Describe working with samples including the switcheroo trick, the Iman-Conover algorithm and the re-arrangement algorithm.

Audience: Advanced users and programmers.

Prerequisites: DecL, general use of aggregate, probability, measures of association, multivariate distributions, matrix algebra.

See also: The Iman-Conover Method, The Rearrangement Algorithm, Working With Samples, API Reference, and A Ten Minute Guide to aggregate.

Contents:

5.9.1. Helpful References

  • Conover [1999]

  • Mildenhall [2005]

  • Puccetti and Ruschendorf [2012]

  • Embrechts et al. [2013]

  • Mildenhall and Major [2022], Section 4.2.5.

5.9.2. Using Samples and The Switcheroo Trick

Todo

Documentation to follow.

5.9.3. The Iman-Conover Method

5.9.3.1. Basic Idea

Here is the basic idea of the Iman-Conover method. Given samples of \(n\) values from two known marginal distributions \(X\) and \(Y\) and a desired correlation \(\rho\) between them, re-order the samples to have the same rank order as a reference distribution, of size \(n\times 2\), with linear correlation \(\rho\). Since linear correlation and rank correlation are typically close, the re-ordered output will have approximately the desired correlation structure. What makes the IC method work so effectively is the existence of easy algorithms to determine samples from reference distributions with prescribed linear correlation structures.

5.9.3.2. Theoretical Derivation

Suppose that \(\mathsf{M}\) is an \(n\) element sample from an \(r\) dimensional multivariate distribution, so \(\mathsf{M}\) is an \(n\times r\) matrix. Assume that the columns of \(\mathsf{M}\) are uncorrelated, have mean zero, and standard deviation one. Let \(\mathsf{M}'\) denote the transpose of \(\mathsf{M}\). These assumptions imply that the correlation matrix of the sample \(\mathsf{M}\) can be computed as \(n^{-1}\mathsf{M}'\mathsf{M}\), and because the columns are uncorrelated with unit standard deviation, \(n^{-1}\mathsf{M}'\mathsf{M}=\mathsf{id}\). (There is no need to scale the covariance matrix by the row and column standard deviations because they are all one. In general \(n^{-1}\mathsf{M}'\mathsf{M}\) is the covariance matrix of \(\mathsf{M}\).)

Let \(\mathsf{S}\) be a correlation matrix, i.e. \(\mathsf{S}\) is a positive semi-definite symmetric matrix with 1’s on the diagonal and all elements \(\le 1\) in absolute value. In order to rule out linearly dependent variables assume \(\mathsf{S}\) is positive definite. These assumptions ensure \(\mathsf{S}\) has a Choleski decomposition

\[\mathsf{S}=\mathsf{C}'\mathsf{C}\]

for some upper triangular matrix \(\mathsf{C}\); see Golub or Press et al. Set \(\mathsf{T}=\mathsf{M}\mathsf{C}\). The columns of \(\mathsf{T}\) still have mean zero, because they are linear combinations of the columns of \(\mathsf{M}\), which have zero mean by assumption. It is less obvious, but still true, that the columns of \(\mathsf{T}\) also have standard deviation one. To see why, remember that the covariance matrix of \(\mathsf{T}\) is

\[n^{-1}\mathsf{T}'\mathsf{T}=n^{-1}\mathsf{C}'\mathsf{M}'\mathsf{M}\mathsf{C}=\mathsf{C}'\mathsf{C}=\mathsf{S},\label{coolA}\]

since \(n^{-1}\mathsf{M}'\mathsf{M}=\mathsf{id}\) is the identity by assumption. Now \(\mathsf{S}\) is actually the correlation matrix too because the diagonal is scaled to one, so the covariance and correlation matrices coincide. The process of converting \(\mathsf{M}\), which is easy to simulate, into \(\mathsf{T}\), which has the desired correlation structure \(\mathsf{S}\), is the theoretical basis of the IC method.
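
As a quick numerical check of this construction, here is a minimal numpy sketch (the particular \(\mathsf{S}\), sample size, and seed are illustrative, not taken from the text): columns that are approximately uncorrelated with mean zero and standard deviation one are multiplied by the Choleski root of \(\mathsf{S}\), and the resulting correlation is close to \(\mathsf{S}\).

import numpy as np

rng = np.random.default_rng(1)
n, r = 100_000, 3

# target correlation matrix (illustrative)
S = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])

# M: columns approximately uncorrelated with mean zero and standard deviation one
M = rng.standard_normal((n, r))
M = (M - M.mean(axis=0)) / M.std(axis=0)

C = np.linalg.cholesky(S).T      # upper triangular C with S = C'C
T = M @ C

print(np.corrcoef(T, rowvar=False).round(3))   # approximately S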

It is important to note that estimates of correlation matrices, depending on how they are constructed, need not have the mathematical properties of a correlation matrix. Therefore, when trying to use an estimate of a correlation matrix in an algorithm, such as the Iman-Conover, which actually requires a proper correlation matrix as input, it may be necessary to check the input matrix does have the correct mathematical properties.
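
A minimal sketch of such a check, using a hypothetical helper name: the properties needed are symmetry, a unit diagonal, and positive definiteness, the last of which can be tested by attempting a Choleski factorization.

import numpy as np

def is_valid_correlation_matrix(S, tol=1e-10):
    """Hypothetical helper: check the properties the IC method requires."""
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T, atol=tol):           # symmetric
        return False
    if not np.allclose(np.diag(S), 1.0, atol=tol):  # unit diagonal
        return False
    try:                                            # positive definite
        np.linalg.cholesky(S)
    except np.linalg.LinAlgError:
        return False
    return True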

Next we discuss how to make \(n\times r\) matrices \(\mathsf{M}\), with independent, mean zero columns. The basic idea is to take \(n\) numbers \(a_1,\dots,a_n\) with \(\sum_i a_i=0\) and \(n^{-1}\sum_i a_i^2=1\), use them to form one \(n\times 1\) column of \(\mathsf{M}\), and then to copy it \(r\) times. Finally randomly permute the entries in each column to make them independent as columns of random variables. Iman and Conover call the \(a_i\) “scores”. They discuss several possible definitions for the scores, including scaled versions of \(a_i=i\) (ranks) and \(a_i\) uniformly distributed. They note that the shape of the output multivariate distribution depends on the scores. All of the examples in their paper use normal scores. We will discuss normal scores here, and consider alternatives in Section 1.4.1.

Given that the scores will be based on normal random variables, we can either simulate \(n\) random standard normal variables and then shift and re-scale to ensure mean zero and standard deviation one, or we can use a stratified sample from the standard normal, \(a_i=\Phi^{-1}(i/(n+1))\). By construction, the stratified sample has mean zero which is an advantage. Also, by symmetry, using the stratified sample halves the number of calls to \(\Phi^{-1}\). For these two reasons we prefer it in the algorithm below.
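
In code, the stratified normal scores take two lines with scipy (a sketch; the rescaling divides by the population standard deviation, and the mean is already zero by symmetry).

import numpy as np
from scipy.stats import norm

n = 20
a = norm.ppf(np.arange(1, n + 1) / (n + 1))   # stratified sample from the standard normal
a = a / a.std()                               # rescale so the standard deviation is one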

The correlation matrix of \(\mathsf{M}\), constructed by randomly permuting the scores in each column, will only be approximately equal to \(\mathsf{id}\) because of random simulation error. To correct for this slight error, Iman and Conover add a further adjustment to their algorithm. Let \(\mathsf{EE}=n^{-1}\mathsf{M}'\mathsf{M}\) be the actual correlation matrix of \(\mathsf{M}\), let \(\mathsf{EE}=\mathsf{F}'\mathsf{F}\) be its Choleski decomposition, and define \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}\). The columns of \(\mathsf{T}\) have mean zero, and the covariance matrix of \(\mathsf{T}\) is

\[\begin{split}\begin{aligned} n^{-1}\mathsf{T}'\mathsf{T} &= n^{-1}\mathsf{C}'\mathsf{F}'^{-1}\mathsf{M}'\mathsf{M}\mathsf{F}^{-1}\mathsf{C} \notag \\ &= \mathsf{C}'\mathsf{F}'^{-1}\mathsf{EE}\mathsf{F}^{-1}\mathsf{C} \notag \\ &= \mathsf{C}'\mathsf{F}'^{-1}\mathsf{F}'\mathsf{F}\mathsf{F}^{-1}\mathsf{C} \notag \\ &= \mathsf{C}' \mathsf{C} \notag \\ &= \mathsf{S},\label{icCorr}\end{aligned}\end{split}\]

and hence \(\mathsf{T}\) has correlation matrix exactly equal to \(\mathsf{S}\), as desired. If \(\mathsf{EE}\) is singular then the column shuffle needs to be repeated.

Now that the reference distribution \(\mathsf{T}\) with exact correlation structure \(\mathsf{S}\) is in hand, all that remains to complete the IC method is to re-order each column of the input distribution \(\mathsf{X}\) to have the same rank order as the corresponding column of \(\mathsf{T}\).

5.9.3.3. Algorithm

Here is a more algorithmic description of the IC method. The description uses normal scores and the Choleski method to determine the reference distribution. As we discussed above, it is possible to make other choices in place of these; they are discussed in Section 1.4. We will actually present two versions of the core algorithm. The first, called the “Simple Algorithm”, deals with the various matrix operations at a high level. The second, the “Detailed Algorithm”, takes a more sophisticated approach to the matrix operations, including referencing appropriate Lapack routines. Lapack is a standard set of linear algebra functions. Software vendors provide very high performance implementations of Lapack, many of which are used in CPU benchmarks. Several free Windows implementations are available on the web. The reader should study the simple algorithm first to understand what is going on in the IC method. To code a high performance implementation you should follow the steps outlined in the detailed algorithm. Both algorithms have the same inputs and outputs.

Algorithm Input: An \(n \times r\) matrix \(\mathsf{X}\) consisting of \(n\) samples from each of \(r\) marginal distributions, and a desired correlation matrix \(\mathsf{S}\).

The IC method does not address how the columns of \(\mathsf{X}\) are determined. It is presumed that the reader has sampled from the appropriate distributions in some intelligent manner. The matrix \(\mathsf{S}\) must be a correlation matrix for linearly independent random variables, so it must be symmetric and positive definite. If \(\mathsf{S}\) is not symmetric positive semi-definite the algorithm will fail at the Choleski decomposition step. The output is a matrix \(\mathsf{Y}\) each of whose columns is a permutation of the corresponding column of \(\mathsf{X}\) and whose approximate correlation matrix is \(\mathsf{S}\).

Simple Algorithm Steps:

  1. Make one column of scores \(a_i=\Phi^{-1}(i/(n+1))\) for \(i=1,\dots,n\) and rescale to have standard deviation one.

  2. Copy the scores \(r\) times to make the score matrix \(\mathsf{M}\).

  3. Randomly permute the entries in each column of \(\mathsf{M}\).

  4. Compute the correlation matrix \(\mathsf{EE}=n^{-1}\mathsf{M}'\mathsf{M}\) of \(\mathsf{M}\).

  5. Compute the Choleski decomposition \(\mathsf{EE}=\mathsf{F}'\mathsf{F}\) of \(\mathsf{EE}\).

  6. Compute the Choleski decomposition \(\mathsf{S}=\mathsf{C}'\mathsf{C}\) of the desired correlation matrix \(\mathsf{S}\).

  7. Compute \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}\). The matrix \(\mathsf{T}\) has exactly the desired correlation structure.

  8. Let \(\mathsf{Y}\) be the input matrix \(\mathsf{X}\) with each column reordered to have exactly the same rank ordering as the corresponding column of \(\mathsf{T}\).
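
The following sketch implements steps 1 to 8 in numpy and scipy. The function name iman_conover and the coding choices are illustrative; this is not the implementation shipped with aggregate.

import numpy as np
from scipy.stats import norm, rankdata

def iman_conover(X, S, rng=None):
    """Minimal sketch of the simple algorithm above (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, r = X.shape

    # steps 1-2: one column of rescaled normal scores, copied r times
    a = norm.ppf(np.arange(1, n + 1) / (n + 1))
    a = a / a.std()
    M = np.tile(a[:, None], (1, r))

    # step 3: randomly permute the entries in each column
    for j in range(r):
        rng.shuffle(M[:, j])

    # steps 4-6: Choleski roots of the actual and target correlation matrices
    EE = (M.T @ M) / n
    F = np.linalg.cholesky(EE).T      # EE = F'F, F upper triangular
    C = np.linalg.cholesky(S).T       # S  = C'C, C upper triangular

    # step 7: reference scores T = M F^{-1} C, with correlation exactly S
    T = M @ np.linalg.solve(F, C)

    # step 8: re-order each column of X to share the rank order of T
    Y = np.empty_like(X)
    for j in range(r):
        ranks = rankdata(T[:, j], method='ordinal').astype(int) - 1
        Y[:, j] = np.sort(X[:, j])[ranks]
    return Y

Using np.linalg.solve to form \(\mathsf{F}^{-1}\mathsf{C}\) avoids an explicit matrix inversion, mirroring the triangular solve recommended in the detailed algorithm that follows.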

Detailed Algorithm Steps:

  1. Compute the Choleski decomposition of \(\mathsf{S}\), \(\mathsf{S}=\mathsf{C}'\mathsf{C}\), with \(\mathsf{C}\) upper triangular. If the Choleski algorithm fails then \(\mathsf{S}\) is not a valid correlation matrix; flag an error and exit. Checking \(\mathsf{S}\) is a correlation matrix in this first step avoids performing wasted calculations and allows the routine to exit as quickly as possible. Also check that all the diagonal entries of \(\mathsf{S}\) are 1 so \(\mathsf{S}\) has full rank. Again flag an error and exit if not. The Lapack routine DPOTRF can be used to compute the Choleski decomposition. In the absence of Lapack, \(\mathsf{C}=(c_{ij})\) can be computed recursively using

    \[c_{ij}=\frac{s_{ij}-\sum_{k=1}^{j-1} c_{ik}c_{jk}}{\sqrt{1-\sum_{k=1}^{j-1} c_{jk}^2}}\label{chol}\]

    for \(1\le i\le j\le n\); the formula is valid since all the diagonal elements of \(\mathsf{S}\) equal one. The empty sum \(\sum_0^0=0\), for \(j>i\) the denominator equals \(c_{ii}\), and the elements of \(\mathsf{C}\) should be calculated from left to right, top to bottom. See Wang or Herzog.

  2. Let \(m=\lfloor n/2\rfloor\) be the largest integer less than or equal to \(n/2\) and \(v_i=\Phi^{-1}(i/(2m+1))\) for \(i=1,\dots,m\).

  3. If \(n\) is odd set

    \[\mathsf{v}=(v_m,v_{m-1},\dots,v_1,0,-v_1,\dots,-v_m)\]

    and if \(n\) is even set

    \[\mathsf{v}=(v_m,v_{m-1},\dots,v_1,-v_1,\dots,-v_m).\]

    Here we have chosen to use normal scores. Other distributions could be used in place of the normal, as discussed in Section 1.4.1. Also note that taking advantage of the symmetry of the normal distribution halves the number of calls to \(\Phi^{-1}\), which is relatively computationally expensive. If multiple calls will be made to the IC algorithm then store \(\mathsf{v}\) for use in future calls.

  4. Form the \(n\times r\) score matrix \(\mathsf{M}\) from \(r\) copies of the scores vector \(\mathsf{v}\).

  5. Compute \(m_{xx}=n^{-1}\sum_i v_i^2\), the variance of \(\mathsf{v}\). Note that \(\sum_i v_i=0\) by construction.

  6. Randomly shuffle columns \(2,\dots,r\) of the score matrix.

  7. Compute the correlation matrix \(\mathsf{EE}\) of the shuffled score matrix \(\mathsf{M}\). Each column of \(\mathsf{M}\) has mean zero, by construction, and variance \(m_{xx}\). The correlation matrix is obtained by dividing each element of \(n^{-1}\mathsf{M}'\mathsf{M}\) by \(m_{xx}\). The matrix product can be computed using the Lapack routine DGEMM. If \(\mathsf{EE}\) is singular repeat step 6.

  8. Determine the Choleski decomposition \(\mathsf{EE}=\mathsf{F}'\mathsf{F}\) of \(\mathsf{EE}\) using the Lapack routine DPOTRF. Because \(\mathsf{EE}\) is a correlation matrix it must be symmetric and positive definite and so is guaranteed to have a Choleski root.

  9. Compute \(\mathsf{F}^{-1}\mathsf{C}\) using the Lapack routine DTRTRS to solve the linear equation \(\mathsf{F}\mathsf{A}=\mathsf{C}\) for \(\mathsf{A}\). Solving the linear equation avoids a time consuming matrix inversion and multiplication. The routine DTRTRS is optimized for upper triangular input matrices.

  10. Compute the correlated scores \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}=\mathsf{M}\mathsf{A}\) using DGEMM. The matrix \(\mathsf{T}\) has exactly the desired correlation structure.

  11. Compute the ranks of the elements of \(\mathsf{T}\). Ranks are computed by indexing the columns of \(\mathsf{T}\) as described in Section 8.4 of Press et al. Let \(r(k)\) denote the index of the \(k\)th ranked element of \(\mathsf{T}\).

  12. Let \(\mathsf{Y}\) be the \(n\times r\) matrix with \(i\)th column equal to the \(i\)th column of the input matrix \(\mathsf{X}\) given the same rank order as \(\mathsf{T}\). The re-ordering is performed using the ranks computed in the previous step. First sort the input columns into ascending order if they are not already sorted and then set \(\mathsf{Y}_{i,k}=\mathsf{X}_{i,r(k)}\).

The output of the algorithm is a matrix \(\mathsf{Y}\) each of whose columns is a permutation of the corresponding column of the input matrix \(\mathsf{X}\). The rank correlation matrix of \(\mathsf{Y}\) is identical to that of a multivariate distribution with correlation matrix \(\mathsf{S}\).
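
For the linear algebra in steps 8 to 10 of the detailed algorithm, scipy exposes the LAPACK functionality directly; a small self-contained sketch (sample size, dimension, and \(\mathsf{S}\) are illustrative):

import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.stats import norm

rng = np.random.default_rng(0)
n, r = 1000, 3
S = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])

# stratified, rescaled normal scores copied r times; shuffle columns 2,...,r
v = norm.ppf(np.arange(1, n + 1) / (n + 1))
v = v / v.std()
M = np.tile(v[:, None], (1, r))
for j in range(1, r):
    rng.shuffle(M[:, j])

EE = (M.T @ M) / n                        # correlation matrix of the shuffled scores
F = cholesky(EE, lower=False)             # EE = F'F   (DPOTRF)
C = cholesky(S, lower=False)              # S  = C'C   (DPOTRF)
A = solve_triangular(F, C, lower=False)   # solves F A = C  (DTRTRS), no explicit inverse
T = M @ A                                 # correlated scores with correlation exactly S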

5.9.3.4. Simple Example of Iman-Conover

Having explained the IC method, we now give a simple example to explicitly show all the details. The example will work with \(n=20\) samples and \(r=4\) different marginals. The marginals are samples from four lognormal distributions, with parameters \(\mu=12,11,10,10\) and \(\sigma=0.15,0.25,0.35,0.25\). The input matrix is

\[\begin{split}\mathsf{X}= \begin{pmatrix} 123,567 & 44,770 & 15,934 & 13,273 \\ 126,109 & 45,191 & 16,839 & 15,406 \\ 138,713 & 47,453 & 17,233 & 16,706 \\ 139,016 & 47,941 & 17,265 & 16,891 \\ 152,213 & 49,345 & 17,620 & 18,821 \\ 153,224 & 49,420 & 17,859 & 19,569 \\ 153,407 & 50,686 & 20,804 & 20,166 \\ 155,716 & 52,931 & 21,110 & 20,796 \\ 155,780 & 54,010 & 22,728 & 20,968 \\ 161,678 & 57,346 & 24,072 & 21,178 \\ 161,805 & 57,685 & 25,198 & 23,236 \\ 167,447 & 57,698 & 25,393 & 23,375 \\ 170,737 & 58,380 & 30,357 & 24,019 \\ 171,592 & 60,948 & 30,779 & 24,785 \\ 178,881 & 66,972 & 32,634 & 25,000 \\ 181,678 & 68,053 & 33,117 & 26,754 \\ 184,381 & 70,592 & 35,248 & 27,079 \\ 206,940 & 72,243 & 36,656 & 30,136 \\ 217,092 & 86,685 & 38,483 & 30,757 \\ 240,935 & 87,138 & 39,483 & 35,108 \end{pmatrix}.\end{split}\]

Note that the marginals are all sorted in ascending order. The algorithm does not actually require pre-sorting the marginals but it simplifies the last step.

The desired target correlation matrix is

\[\begin{split}\mathsf{S}= \begin{pmatrix} 1.000 & 0.800 & 0.400 & 0.000\\ 0.800 & 1.000 & 0.300 & -0.200\\ 0.400 & 0.300 & 1.000 & 0.100\\ 0.000 & -0.200 & 0.100 & 1.000 \end{pmatrix}.\end{split}\]

The Choleski decomposition of \(\mathsf{S}\) is

\[\begin{split}\mathsf{C}= \begin{pmatrix} 1.000 & 0.800 & 0.400 & 0.000\\ 0.000 & 0.600 & -0.033 & -0.333\\ 0.000 & 0.000 & 0.916 & 0.097\\ 0.000 & 0.000 & 0.000 & 0.938 \end{pmatrix}.\end{split}\]

Now we make the score matrix. The basic scores are \(\Phi^{-1}(i/21)\), for \(i=1,\dots,20\). We divide these by \(0.868674836252965\), their standard deviation, to get a vector \(\mathsf{v}\) with standard deviation one. Then we combine four \(\mathsf{v}\)’s and shuffle randomly to get

\[\begin{split}\mathsf{M}= \begin{pmatrix} -1.92062 & 1.22896 & -1.00860 & -0.49584 \\ -1.50709 & -1.50709 & -1.50709 & 0.82015 \\ -1.22896 & 1.92062 & 0.82015 & -0.65151 \\ -1.00860 & -0.20723 & 1.00860 & -1.00860 \\ -0.82015 & 0.82015 & 0.34878 & 1.92062 \\ -0.65151 & -1.22896 & -0.65151 & 0.20723 \\ -0.49584 & -0.65151 & 1.22896 & -0.34878 \\ -0.34878 & -0.49584 & -0.49584 & -0.06874 \\ -0.20723 & -1.00860 & 0.20723 & 0.65151 \\ -0.06874 & 0.49584 & 0.06874 & -1.22896 \\ 0.06874 & -0.34878 & -1.22896 & 0.49584 \\ 0.20723 & 0.34878 & 0.65151 & 0.34878 \\ 0.34878 & -0.06874 & -0.20723 & 1.22896 \\ 0.49584 & -1.92062 & -0.82015 & -0.20723 \\ 0.65151 & 0.20723 & 1.92062 & -1.92062 \\ 0.82015 & 1.00860 & 1.50709 & 1.50709 \\ 1.00860 & -0.82015 & -1.92062 & 1.00860 \\ 1.22896 & 1.50709 & 0.49584 & -1.50709 \\ 1.50709 & 0.06874 & -0.06874 & 0.06874 \\ 1.92062 & 0.65151 & -0.34878 & -0.82015 \end{pmatrix}.\end{split}\]

As described in Section 1.1, the columns of \(\mathsf{M}\) are approximately uncorrelated. In fact \(\mathsf{M}\) has covariance matrix

\[\begin{split}\mathsf{EE}= \begin{pmatrix} 1.0000 & 0.0486 & 0.0898 & -0.0960 \\ 0.0486 & 1.0000 & 0.4504 & -0.2408 \\ 0.0898 & 0.4504 & 1.0000 & -0.3192 \\ -0.0960 & -0.2408 & -0.3192 & 1.0000 \end{pmatrix}\end{split}\]

and \(\mathsf{EE}\) has Choleski decomposition

\[\begin{split}\mathsf{F}= \begin{pmatrix} 1.0000 & 0.0486 & 0.0898 & -0.0960\\ 0.0000 & 0.9988 & 0.4466 & -0.2364\\ 0.0000 & 0.0000 & 0.8902 & -0.2303\\ 0.0000 & 0.0000 & 0.0000 & 0.9391 \end{pmatrix}.\end{split}\]

Thus \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}\) is given by

\[\begin{split}\mathsf{T}= \begin{pmatrix} -1.92062 & -0.74213 & -2.28105 & -1.33232 \\ -1.50709 & -2.06697 & -1.30678 & 0.54577 \\ -1.22896 & 0.20646 & -0.51141 & -0.94465 \\ -1.00860 & -0.90190 & 0.80546 & -0.65873 \\ -0.82015 & -0.13949 & -0.31782 & 1.76960 \\ -0.65151 & -1.24043 & -0.27999 & 0.23988 \\ -0.49584 & -0.77356 & 1.42145 & 0.23611 \\ -0.34878 & -0.56670 & -0.38117 & -0.14744 \\ -0.20723 & -0.76560 & 0.64214 & 0.97494 \\ -0.06874 & 0.24487 & -0.19673 & -1.33695 \\ 0.06874 & -0.15653 & -1.06954 & 0.14015 \\ 0.20723 & 0.36925 & 0.56694 & 0.51206 \\ 0.34878 & 0.22754 & -0.06362 & 1.19551 \\ 0.49584 & -0.77154 & 0.26828 & 0.03168 \\ 0.65151 & 0.62666 & 2.08987 & -1.21744 \\ 0.82015 & 1.23804 & 1.32493 & 1.85680 \\ 1.00860 & 0.28474 & -1.23688 & 0.59246 \\ 1.22896 & 1.85260 & 0.17411 & -1.62428 \\ 1.50709 & 1.20294 & 0.39517 & 0.13931 \\ 1.92062 & 1.87175 & -0.04335 & -0.97245 \end{pmatrix}.\end{split}\]

An easy calculation will verify that \(\mathsf{T}\) has correlation matrix \(\mathsf{S}\), as required.

To complete the IC method we must re-order each column of \(\mathsf{X}\) to have the same rank order as \(\mathsf{T}\). The first column does not change because it is already in ascending order. In the second column, the first element of \(\mathsf{Y}\) must be the 14th element of \(\mathsf{X}\), the second the 20th, the third the 10th, and so on. The ranks for the last three columns are the transpose of

\[\begin{split}\begin{pmatrix} 14 & 20 & 10 & 18 & 11 & 19 & 17 & 13 & 15 & 8 & 12 & 6 & 9 & 16 & 5 & 3 & 7 & 2 & 4 & 1\\ 20 & 19 & 16 & 4 & 14 & 13 & 2 & 15 & 5 & 12 & 17 & 6 & 11 & 8 & 1 & 3 & 18 & 9 & 7 & 10\\ 18 & 6 & 15 & 14 & 2 & 8 & 9 & 13 & 4 & 19 & 10 & 7 & 3 & 12 & 17 & 1 & 5 & 20 & 11 & 16 \end{pmatrix}\end{split}\]

and the resulting re-ordering of \(\mathsf{X}\) is

\[\begin{split}\mathsf{Y}= \begin{pmatrix} 123,567 & 50,686 & 15,934 & 16,706 \\ 126,109 & 44,770 & 16,839 & 25,000 \\ 138,713 & 57,685 & 17,620 & 19,569 \\ 139,016 & 47,453 & 35,248 & 20,166 \\ 152,213 & 57,346 & 20,804 & 30,757 \\ 153,224 & 45,191 & 21,110 & 24,019 \\ 153,407 & 47,941 & 38,483 & 23,375 \\ 155,716 & 52,931 & 17,859 & 20,796 \\ 155,780 & 49,420 & 33,117 & 27,079 \\ 161,678 & 58,380 & 22,728 & 15,406 \\ 161,805 & 54,010 & 17,265 & 23,236 \\ 167,447 & 66,972 & 32,634 & 24,785 \\ 170,737 & 57,698 & 24,072 & 30,136 \\ 171,592 & 49,345 & 30,357 & 20,968 \\ 178,881 & 68,053 & 39,483 & 16,891 \\ 181,678 & 72,243 & 36,656 & 35,108 \\ 184,381 & 60,948 & 17,233 & 26,754 \\ 206,940 & 86,685 & 25,393 & 13,273 \\ 217,092 & 70,592 & 30,779 & 21,178 \\ 240,935 & 87,138 & 25,198 & 18,821 \end{pmatrix}.\end{split}\]

The rank correlation matrix of \(\mathsf{Y}\) is exactly \(\mathsf{S}\). The actual linear correlation is only approximately equal to \(\mathsf{S}\). The achieved value is

\[\begin{split}\begin{pmatrix} 1.00 & 0.85 & 0.26 & -0.11 \\ 0.85 & 1.00 & 0.19 & -0.20 \\ 0.26 & 0.19 & 1.00 & 0.10 \\ -0.11 & -0.20 & 0.10 & 1.00 \end{pmatrix},\end{split}\]

a fairly creditable performance given the input correlation matrix and the very small number of samples \(n=20\). When used with larger sized samples the IC method typically produces a very close approximation to the required correlation matrix, especially when the marginal distributions are reasonably symmetric.
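
As a rough cross-check of this example, the sketch below reuses the hypothetical iman_conover helper given after the simple algorithm, draws a fresh 20 by 4 lognormal sample with the same parameters, and prints the achieved rank and linear correlations; the exact numbers depend on the seed.

import numpy as np
from scipy.stats import spearmanr

S = np.array([[1.0, 0.8, 0.4, 0.0],
              [0.8, 1.0, 0.3, -0.2],
              [0.4, 0.3, 1.0, 0.1],
              [0.0, -0.2, 0.1, 1.0]])

rng = np.random.default_rng(2023)
# fresh 20 x 4 lognormal sample, mu = 12, 11, 10, 10 and sigma = 0.15, 0.25, 0.35, 0.25
X = np.exp(rng.normal([12, 11, 10, 10], [0.15, 0.25, 0.35, 0.25], size=(20, 4)))

Y = iman_conover(X, S, rng=rng)
rho_rank, _ = spearmanr(Y)
print(np.round(rho_rank, 2))                      # rank correlation, close to S
print(np.round(np.corrcoef(Y, rowvar=False), 2))  # linear correlation, approximately S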

5.9.3.5. Extensions of Iman-Conover

Following through the explanation of the IC method shows that it relies on a choice of multivariate reference distribution. A straightforward method to compute a reference is to use the Choleski decomposition method Equation ([coolA]) applied to certain independent scores. The example in Section 1.3 used normal scores. However nothing prevents us from using other distributions for the scores provided they are suitably normalized to have mean zero and standard deviation one.

Another approach to IC is to use a completely different multivariate distribution as reference. There are several other families of multivariate distributions, including the elliptically contoured distribution family (which includes the normal and \(t\) as special cases) and the multivariate Laplace distribution, which are easy to simulate from. Changing scores is actually an example of changing the reference distribution; however, for the examples we consider the exact form of the new reference is unknown.

5.9.3.6. Alternative Scores

The choice of score distribution has a profound effect on the multivariate distribution output by the IC method. The basic Iman-Conover algorithm uses normally distributed scores. We now show the impact of using exponentially and uniformly distributed scores.
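
Alternative score vectors are easy to construct; a sketch that standardizes exponential and uniform scores the same way as the normal scores (sample size illustrative):

import numpy as np
from scipy.stats import expon, uniform

n = 1000
u = np.arange(1, n + 1) / (n + 1)

exp_scores = expon.ppf(u)                 # stratified exponential scores
uni_scores = uniform.ppf(u)               # stratified uniform scores
exp_scores = (exp_scores - exp_scores.mean()) / exp_scores.std()
uni_scores = (uni_scores - uni_scores.mean()) / uni_scores.std()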

The next figure shows three bivariate distributions with identical marginal distributions (shown in the lower right hand plot), the same correlation coefficient of \(0.643\pm 0.003\), but using normal scores (top left), exponential scores (top right) and uniform scores (lower left). The input correlation to the IC method was 0.65 in all three cases and there are 1000 pairs in each plot. Here the IC method produced bivariate distributions with actual correlation coefficients extremely close to the requested value.

The normal scores produce the most natural looking bivariate distribution, with approximately elliptical contours. The bivariate distributions with uniform or exponential scores look unnatural, but it is important to remember that if all you know about the bivariate distribution are the marginals and correlation coefficient all three outcomes are possible.

../_images/scores.png

Bivariate distributions with normal, uniform and exponential scores.

A companion figure (not shown here) plots the distribution of the sum of the two marginals for each of the three bivariate distributions and for independent marginals. The sum with exponential scores has a higher kurtosis (is more peaked) than with normal scores. As expected, all three dependent sums have visibly thicker tails than the independent sum.

Iman and Conover considered various different score distributions in their paper. They preferred normal scores as giving more natural looking, elliptical contours. Certainly, the contours produced using exponential or uniform scores appear unnatural. If nothing else they provide a sobering reminder that knowing the marginal distributions and correlation coefficient of a bivariate distribution does not come close to fully specifying it!

5.9.3.7. Multivariate Reference Distributions

The IC method needs some reference multivariate distribution to determine an appropriate rank ordering for the input marginals. So far we have discussed using the Choleski decomposition trick in order to determine a multivariate normal reference distribution. However, any distribution can be used as reference provided it has the desired correlation structure. Multivariate distributions that are closely related by formula to the multivariate normal, such as elliptically contoured distributions and asymmetric Laplace distributions, can be simulated using the Choleski trick.

Elliptically contoured distributions are a family which extends the normal. For a more detailed discussion see Fang and Zhang. The multivariate \(t\)-distribution and symmetric Laplace distributions are in the elliptically contoured family. Elliptically contoured distributions have characteristic functions of the form

\[\Phi(\mathsf{t})=\exp(i\mathsf{t}'\mathsf{m})\phi(\mathsf{t}'\mathsf{S}\mathsf{t})\]

for some \(\phi:\mathsf{R}\to\mathsf{R}\), where \(\mathsf{m}\) is an \(r\times 1\) vector of means and \(\mathsf{S}\) is an \(r\times r\) covariance matrix (nonnegative definite and symmetric). In one dimension the elliptically contoured distributions coincide with the symmetric distributions. The covariance is \(\mathsf{S}\), if it is defined.

If \(\mathsf{S}\) has rank \(r\) then an elliptically contoured distribution \(\mathsf{x}\) has a stochastic representation

\[\mathsf{x}=\mathsf{m} + R\mathsf{T}' \mathsf{u}^{(r)}\]

where \(\mathsf{T}\) is the Choleski decomposition of \(\mathsf{S}\), so \(\mathsf{S}=\mathsf{T}'\mathsf{T}\), \(\mathsf{u}^{(r)}\) is a uniform distribution on the sphere in \(\mathsf{R}^r\), and \(R\) is a scale factor independent of \(\mathsf{u}^{(r)}\). The idea here should be clear: pick a direction on the sphere, adjust by \(\mathsf{T}\), scale by a distance \(R\) and finally translate by the means \(\mathsf{m}\). A uniform distribution on a sphere can be created as \(\mathsf{x}/\Vert \mathsf{x}\Vert\) where \(\mathsf{x}\) has a multivariate normal distribution with identity covariance matrix. (By definition, \(\Vert \mathsf{x}\Vert^2=\sum_i x_i^2\) has a \(\chi^2_r\) distribution.) Uniform vectors \(\mathsf{u}^{(r)}\) can also be created by applying a random orthogonal matrix to a fixed vector \((1,0,\dots,0)\) on the sphere. Diaconis describes a method for producing random orthogonal matrices.
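
A sketch of this stochastic representation (dimension, sample size, and \(\mathsf{S}\) illustrative): uniform directions come from normalized standard normal vectors, and choosing \(R^2\sim\chi^2_r\) recovers the multivariate normal.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
r, n = 3, 10_000

# uniform points on the unit sphere in R^r: normalize standard normal vectors
z = rng.standard_normal((n, r))
u = z / np.linalg.norm(z, axis=1, keepdims=True)

# with R^2 ~ chi-squared(r), independent of u, this recovers N(0, S)
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
T = np.linalg.cholesky(S).T                        # S = T'T
R = np.sqrt(chi2(df=r).rvs(size=n, random_state=rng))
x = R[:, None] * (u @ T)                           # add a mean vector m for a general location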

The multivariate \(t\)-distribution with \(\nu\) degrees of freedom, which generates the \(t\)-copula, has a stochastic representation

\[\mathsf{x}=\mathsf{m} + \frac{\sqrt{\nu}}{\sqrt{S}}\mathsf{z}\label{tsim}\]

where \(S\sim \chi^2_{\nu}\) and \(\mathsf{z}\) is multivariate normal with means zero and covariance matrix \(\mathsf{S}\). Thus one can easily simulate from the multivariate \(t\) by first simulating multivariate normals and then simulating an independent \(S\) and multiplying.
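
A sketch of this two-step simulation with illustrative \(\nu\) and \(\mathsf{S}\):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
nu, n = 2, 10_000
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])

z = rng.standard_normal((n, 2)) @ np.linalg.cholesky(S).T   # multivariate normal, covariance S
w = chi2(df=nu).rvs(size=n, random_state=rng)               # independent chi-squared draws
x = np.sqrt(nu / w)[:, None] * z                            # multivariate t; add m if needed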

The multivariate Laplace distribution is discussed in Kotz, Kozubowski and Podgorski. It comes in two flavors: symmetric and asymmetric. The symmetric distribution is also an elliptically contoured distribution. It has characteristic function of the form

\[\Phi(\mathsf{t})=\frac{1}{1+ \mathsf{t}'\mathsf{S}\mathsf{t} / 2}\label{symLaplace}\]

where \(\mathsf{S}\) is the covariance matrix. To simulate, use the fact that \(\sqrt{W}\mathsf{X}\) has a symmetric Laplace distribution if \(W\) is exponential and \(\mathsf{X}\) a multivariate normal with covariance matrix \(\mathsf{S}\).

The multivariate asymmetric Laplace distribution has characteristic function

\[\Psi(\mathsf{t})=\frac{1}{1+\mathsf{t}'\mathsf{S}\mathsf{t}/2 - i\mathsf{m}'\mathsf{t}}.\label{asymLaplace}\]

To simulate from the asymmetric Laplace distribution, use the fact that

\[\mathsf{m} W + \sqrt{W}\mathsf{X} \label{aslsim}\]

has an asymmetric Laplace distribution if \(W\) is exponential and \(\mathsf{X}\) a multivariate normal with covariance matrix \(\mathsf{S}\) and means zero. The asymmetric Laplace is not an elliptically contoured distribution.
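
Both flavors are equally easy to simulate; a sketch with illustrative \(\mathsf{S}\) and \(\mathsf{m}\):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
m = np.array([0.5, -0.25])        # drift vector for the asymmetric case (illustrative)

W = rng.exponential(1.0, size=n)                            # exponential mixing variable
Z = rng.standard_normal((n, 2)) @ np.linalg.cholesky(S).T   # N(0, S) rows

sym_laplace = np.sqrt(W)[:, None] * Z                       # symmetric multivariate Laplace
asym_laplace = m * W[:, None] + np.sqrt(W)[:, None] * Z     # asymmetric multivariate Laplace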

The next figure compares IC samples produced using a normal copula to those produced with a \(t\)-copula. In both cases the marginals are normally distributed with mean zero and unit standard deviation. The \(t\)-copula has \(\nu=2\) degrees of freedom. In both figures the marginals are uncorrelated, but in the right the marginals are not independent. The \(t\)-copula has pinched tails, similar to Venter’s Heavy Right Tailed copulas.

../_images/t-norm.png

IC samples produced from the same marginal and correlation matrix using the normal and \(t\) copula reference distributions.

5.9.3.8. Algorithms for Extended Methods

In Section 1.4.2 we described how the IC method can be extended by using different reference multivariate distributions. It is easy to change the IC algorithm to incorporate different reference distributions for \(t\)-copulas and asymmetric Laplace distributions. Follow the detailed algorithm to step 10. Then use the stochastic representation to simulate from the scaling distribution for each row and multiply each component by the resulting number, resulting in an adjusted \(\mathsf{T}\) matrix. Then complete steps 11 and 12 of the detailed algorithm.
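
In code the change is essentially one extra scaling line before the ranking step; a sketch with a stand-in for the correlated normal scores (all values illustrative):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, nu = 1000, 2
T = rng.standard_normal((n, 3))                  # stand-in for the correlated normal scores
w = chi2(df=nu).rvs(size=n, random_state=rng)    # one chi-squared draw per simulation row
T_t = np.sqrt(nu / w)[:, None] * T               # t-distributed reference scores
# now compute ranks of T_t and re-order the input sample (steps 11 and 12)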

5.9.3.9. Comparison With the Normal Copula Method

By the normal copula method we mean the following algorithm, described in Wang or Herzog.

Algorithm Input: A set of correlated risks \((X_1,\dots,X_r)\) with marginal cumulative distribution functions \(F_i\) and Kendall’s tau \(\tau_{ij}=\tau(X_i,X_j)\) or rank correlation coefficients \(r(X_i,X_j)\).

  1. Convert Kendall’s tau or rank correlation coefficient to correlation using

    \[\rho_{ij}=\sin(\pi\tau_{ij}/2)=2\sin(\pi r_{ij}/6)\]

    and construct the Choleski decomposition \(\mathsf{S}=\mathsf{C}'\mathsf{C}\) of \(\mathsf{S}=(\rho_{ij})\).

  2. Generate \(r\) standard normal variables \(\mathsf{Y}=(Y_1,\dots,Y_r)\).

  3. Set \(\mathsf{Z}=\mathsf{Y}\mathsf{C}\).

  4. Set \(u_i=\Phi(Z_i)\) for \(i=1,\dots,r\).

  5. Set \(X_i=F_i^{-1}(u_i)\).

The vectors \((X_1,\dots,X_r)\) form a sample from a multivariate distribution with prescribed correlation structure and marginals \(F_i\).
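
A sketch of steps 2 to 5 (the helper name and the two lognormal marginals are illustrative; step 1, converting Kendall's tau or rank correlation to \(\rho\), is assumed to have been done already):

import numpy as np
from scipy.stats import norm, lognorm

def normal_copula_sample(marginal_ppfs, rho, n, rng=None):
    """Sketch of the normal copula method above (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    C = np.linalg.cholesky(rho).T                       # rho = C'C, C upper triangular
    Y = rng.standard_normal((n, len(marginal_ppfs)))    # step 2: standard normals
    Z = Y @ C                                           # step 3: correlated normals
    U = norm.cdf(Z)                                     # step 4: uniforms
    return np.column_stack([ppf(U[:, j])                # step 5: invert each marginal
                            for j, ppf in enumerate(marginal_ppfs)])

# example with two lognormal marginals (parameters illustrative)
rho = np.array([[1.0, 0.6],
                [0.6, 1.0]])
sample = normal_copula_sample([lognorm(0.25, scale=np.exp(11)).ppf,
                               lognorm(0.35, scale=np.exp(10)).ppf], rho, 10_000)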

The Normal Copula method works because of the following theorem from Wang.

[wangThm] Assume that \((Z_1,\dots,Z_k)\) have a multivariate normal joint probability density function given by

\[f(z_1,\dots,z_k)=\frac{1}{\sqrt{(2\pi)^k|\Sigma|}}\exp(-\mathsf{z}'\Sigma^{-1}\mathsf{z}/2),\]

\(\mathsf{z}=(z_1,\dots,z_k)\), with correlation coefficients \(\Sigma_{ij}=\rho_{ij}=\rho(Z_i,Z_j)\). Let \(H(z_1,\dots,z_k)\) be their joint cumulative distribution function. Then

\[C(u_1,\dots,u_k)=H(\Phi^{-1}(u_1),\dots,\Phi^{-1}(u_k))\]

defines a multivariate uniform cumulative distribution function called the normal copula.

For any set of given marginal cumulative distribution functions \(F_1,\dots,F_k\), the set of variables

\[\label{ncm} X_1=F_1^{-1}(\Phi(Z_1)),\dots,X_k=F_k^{-1}(\Phi(Z_k))\]

have a joint cumulative function

\[F_{X_1,\dots,X_k}(x_1,\dots,x_k)=H(\Phi^{-1}(F_1(x_1)),\dots, \Phi^{-1}(F_k(x_k)))\]

with marginal cumulative distribution functions \(F_1,\dots,F_k\). The multivariate variables \((X_1,\dots,X_k)\) have Kendall’s tau

\[\tau(X_i,X_j)=\tau(Z_i,Z_j)=\frac{2}{\pi}\arcsin(\rho_{ij})\]

and Spearman’s rank correlation coefficients

\[\text{rkCorr}(X_i,X_j)=\text{rkCorr}(Z_i,Z_j)=\frac{6}{\pi}\arcsin(\rho_{ij}/2)\]

In the normal copula method we simulate from \(H\) and then invert using ([ncm]). In the IC method with normal scores we produce a sample from \(H\) such that \(\Phi(z_i)\) are equally spaced between zero and one and then, rather than invert the distribution functions, we make the \(j\)th order statistic from the input sample correspond to \(\Phi(z)=j/(n+1)\) where the input has \(n\) observations. Because the \(j\)th order statistic of a sample of \(n\) observations from a distribution \(F\) approximates \(F^{-1}(j/(n+1))\) we see the normal copula and IC methods are doing essentially the same thing.

While the normal copula method and the IC method are confusingly similar there are some important differences to bear in mind. Comparing and contrasting the two methods should help clarify how the two algorithms are different.

  1. Wang [1998] shows the normal copula method corresponds to the IC method when the latter is computed using normal scores and the Choleski trick.

  2. The IC method works on a given sample of marginal distributions. The normal copula method generates the sample by inverting the distribution function of each marginal as part of the simulation process.

  3. Through the use of scores the IC method relies on a stratified sample of normal variables. The normal copula method could use a similar stratified sample, or it could sample randomly from the base normals. Conversely, a purely random sample of normals could be used in the IC method.

  4. Only the IC method has an adjustment to ensure that the reference multivariate distribution has exactly the required correlation structure.

  5. IC method samples have rank correlation exactly equal to a sample from a reference distribution with the correct linear correlation. Normal copula samples have approximately correct linear and rank correlations.

  6. An IC method sample must be taken in its entirety to be used correctly. The number of output points is fixed by the number of input points, and the sample is computed in its entirety in one step. Some IC tools (@Risk, SCARE) produce output which is in a particular order. Thus, if you sample the \(n\)th observation from multiple simulations, or take the first \(n\) samples, you will not get a random sample from the desired distribution. However, if you select random rows from multiple simulations (or, equivalently, if you randomly permute the rows output prior to selecting the \(n\)th) then you will obtain the desired random sample. It is important to be aware of these issues before using canned software routines.

  7. The normal copula method produces simulations one at a time, and at each iteration the resulting sample is a sample from the required multivariate distribution. That is, output from the algorithm can be partitioned and used in pieces.

In summary, remember that these differences can have material practical consequences and it is important not to misuse IC method samples.

5.9.3.10. Theoretical Underpinnings of the Iman-Conover Method

The theoretical foundations of the Iman-Conover method are elegantly justified by Vitale’s theorem (Vitale [1990]). We will state Vitale’s theorem, explain its relationship to the IC method, and sketch the proof. The result should give a level of comfort to practitioners using a simulation approach to modeling multivariate distributions. It is not necessary to follow the details laid out here in order to understand and use the IC method, so the uninterested reader can skip the rest of the section. The presentation we give follows Vitale’s original paper closely.

Functional dependence and independence between two random variables are clearly opposite ends of the dependence spectrum. It is therefore surprising that Vitale’s Theorem says that any bivariate distribution \((U,V)\) can be approximated arbitrarily closely by a functionally dependent pair \((U,TU)\) for a suitable transformation \(T\).

In order to explain the set up of Vitale’s theorem we need to introduce some notation. Let \(n\) be a power of 2. An interval of the form \(((j-1)/n,j/n)\) for some \(n\ge 1\) and \(1\le j\le n\) is called a dyadic interval of rank \(n\). An invertible (Borel) measure-preserving map which maps by translation on each dyadic interval of rank \(n\) is called a permutation of rank \(n\). Such a \(T\) just permutes the dyadic intervals, so there is a natural correspondence between permutations of \(n\) elements and transformations \(T\). If the permutation of dyadic intervals has a single cycle (has order \(n\) in the symmetric group) then \(T\) is called a cyclic permutation.

Theorem. (Vitale) Let \(U\) and \(V\) be uniformly distributed variables. There is a sequence of cyclic permutations \(T_1,T_2,\dots\) such that \((U,T_nU)\) converges in distribution to \((U,V)\) as \(n\to \infty\).

Recall convergence in distribution means that the distribution function of \((U,T_nU)\) tends to that of \((U,V)\) at all points of continuity as \(n\to\infty\).

The proof of Vitale’s theorem is quite instructive and so we give a detailed sketch.

The proof is in two parts. The first constructs a sequence of arbitrary permutations \(T_n\) with the desired property. The second part shows it can be approximated with cyclic permutations. We skip the second refinement.

Divide the square \([0,1]\times [0,1]\) into sub-squares. We will find a permutation \(T\) such that the distributions of \((U,V)\) and \((U,TU)\) coincide on sub-squares. Reducing the size of the sub-squares will prove the result.

Fix \(n\), a power of two. Let \(I_j=((j-1)/n,j/n)\), \(j=1,\dots,n\). We will find an invertible permutation \(T\) such that

\[\Pr(U\in I_j,TU\in I_k)=\Pr(U\in I_j,V\in I_k):=p_{jk}\]

for \(j,k=1,\dots,n\). Define

\[\begin{split}\begin{aligned} I_{j1}&=((j-1)/n, (j-1)/n+p_{j1}) \\ I_{j2}&=((j-1)/n +p_{j1}, (j-1)/n+p_{j1}+p_{j2}) \\ && \cdots \\ I_{jn}&=((j-1)/n +p_{j1}+\cdots+p_{j,n-1}, j/n)\end{aligned}\end{split}\]

and

\[\begin{split}\begin{aligned} \tilde I_{j1}&=((j-1)/n, (j-1)/n+p_{1j}) \\ \tilde I_{j2}&=((j-1)/n +p_{1j}, (j-1)/n+p_{1j}+p_{2j}) \\ && \cdots \\ \tilde I_{jn}&=((j-1)/n +p_{1j}+\cdots+p_{n-1,j}, j/n).\end{aligned}\end{split}\]

By construction the measure of \(I_{jk}\) equals the measure of \(\tilde I_{kj}\). The invertible map \(T\) which sends each \(I_{jk}\) to \(\tilde I_{kj}\) by translation is the map we need because

\[\begin{split}\begin{aligned} \Pr(U\in I_j, T(U)\in I_k) &= \Pr(U\in I_j, U \in T^{-1}(I_k)) \\ &= \Pr(U\in I_j \cap T^{-1}(\bigcup_l \tilde I_{kl})) \\ &= \Pr(U\in \bigcup_l I_j \cap I_{lk}) \\ &= \Pr(U\in I_{jk}) \\ &= p_{jk},\end{aligned}\end{split}\]

since the only \(I_{lk}\) which intersects \(I_j\) is \(I_{jk}\) by construction, and \(U\) is uniform. The transformation \(T\) is illustrated schematically in Table 1 for \(n=3\). The fact that 3 is not a power of 2 does not invalidate the schematic!

\[\begin{split}\small \begin{matrix} \begin{array}{c || c||c|c|c||c|c|c||c|c|c||} \hline & \tilde I_{33} & & & & & & & & & p_{33} \\ I_3 & \tilde I_{32} & & & & & & p_{23} & & & \\ & \tilde I_{31} & & & p_{13} & & & & & & \\ \hline \hline & \tilde I_{23} & & & & & & & & p_{32} & \\ I_2 & \tilde I_{22} & & & & & p_{22} & & & & \\ & \tilde I_{21} & & p_{12} & & & & & & & \\ \hline \hline & \tilde I_{13} & & & & & & & p_{31} & & \\ I_1 & \tilde I_{12} & & & & p_{21} & & & & & \\ & \tilde I_{11} & p_{11} & & & & & & & & \\ \hline \hline & & I_{11} & I_{12} & I_{13} & I_{21} & I_{22} & I_{23} & I_{31} & I_{32} & I_{33} \\ \hline \hline & & & I_1 & & & I_2 & & & I_3 \\ \end{array} \end{matrix}\end{split}\]

If each \(p_{jk}\) is a dyadic rational then \(T\) is a permutation of the interval. If not then we approximate and use some more heavy duty results (a 1946 theorem of Birkhoff on representation by convex combinations of permutation matrices) to complete the proof.

Vitale’s theorem can be extended to non-uniform distributions.

Corollary. (Vitale) Let \(U\) and \(V\) be arbitrary random variables. There is a sequence of functions \(S_1,S_2,\dots\) such that \((U,S_n U)\) converges in distribution to \((U,V)\) as \(n\to\infty\).

Let \(F\) be the distribution function of \(U\) and \(G\) that of \(V\). Then \(F(U)\) and \(G(V)\) are uniformly distributed. Apply Vitale’s theorem to get a sequence of functions \(T_n\). Then \(S_n=G^{-1}T_nF\) is the required transformation.

5.9.4. The Rearrangement Algorithm

The Rearrangement Algorithm (RA) is a practical and straightforward method to determine the worst-VaR sum. The RA works by iteratively making each marginal crossed (counter-monotonic) with the sum of the other marginal distributions. It is easy to program and suitable for problems involving hundreds of variables and millions of simulations.

The Rearrangement Algorithm was introduced in Puccetti and Ruschendorf [2012] and subsequently improved in Embrechts et al. [2013].

Algorithm Input: input samples arranged in a matrix \(\tilde X = (x_{ij})\) with \(i=1,\dots, M\) rows corresponding to the simulations and \(j=1,\dots, d\) columns corresponding to the different marginals; the VaR probability parameter \(p\); and an accuracy threshold \(\epsilon>0\) specifying the convergence criterion.

Algorithm Steps

  1. Sort each column of \(\tilde X\) in descending order.

  2. Set \(N := \lceil (1-p)M \rceil\).

  3. Create matrix \(X\) as the \(N\times d\) submatrix of the top \(N\) rows of \(\tilde X\).

  4. Randomly permute rows within each column of \(X\).

  5. Do Loop

    • Create a new matrix \(Y\) as follows. For column \(j=1,\dots,d\):

      • Create a temporary matrix \(V_j\) by deleting the \(j\)th column of \(X\)

      • Create a column vector \(v\) whose \(i\)th element equals the sum of the elements in the \(i\)th row of \(V_j\)

      • Set the \(j\)th column of \(Y\) equal to the \(j\)th column of \(X\) arranged to have the opposite order to \(v\), i.e., the largest element in the \(j\)th column of \(X\) is placed in the row of \(Y\) corresponding to the smallest element in \(v\), the second largest with second smallest, etc.

    • Compute \(y\), the \(N\times 1\) vector with \(i\)th element equal to the sum of the elements in the \(i\)th row of \(Y\).

    • Compute \(x\) from \(X\) similarly.

    • Compute \(y^{\ast}:=\min(y)\), the smallest element of \(y\).

    • Compute \(x^{\ast}:=\min(x)\).

    • If \(y^{\ast}-x^{\ast} \ge \epsilon\) then set \(X:=Y\) and repeat the loop.

    • If \(y^{\ast}-x^{\ast} < \epsilon\) then break from the loop.

  6. Stack \(Y\) on top of the \((M-N)\times d\) submatrix of \(M-N\) bottom rows of \(\tilde X\).

  7. Output: The result approximates the worst \(\mathsf{VaR}_p\) arrangement of \(\tilde X\).

Only the top \(N\) values need special treatment; all the smaller values can be combined arbitrarily because they aren’t included in the worst-VaR rearrangement. Given that \(X\) consists of the worst \(1-p\) proportion of each marginal, the required estimated \(\mathsf{VaR}_p\) is the least row sum of \(Y\), that is \(y^{\ast}\). In implementation, \(x^{\ast}\) can be carried forward as the \(y^{\ast}\) from the previous iteration and not recomputed. The statistics \(x^{\ast}\) and \(y^{\ast}\) can be replaced with the variance of the row-sums of \(X\) and \(Y\) and yield essentially the same results.

Embrechts et al. [2013] report that, while there is no analytic proof that the algorithm always works, it performs very well in examples and in tests where the answer can be computed analytically.
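
A compact numpy sketch of the loop described above (the helper name is illustrative; aggregate provides rearrangement_algorithm_max_VaR, which is used in the code further below):

import numpy as np

def rearrangement_algorithm(samples, p=0.99, eps=1e-6, max_iter=100, rng=None):
    """Sketch of the RA described above (hypothetical helper, not the aggregate code).
    samples is an M x d array of simulations; returns the estimated worst VaR_p."""
    rng = np.random.default_rng(rng)
    X_tilde = np.sort(np.asarray(samples, dtype=float), axis=0)[::-1]   # step 1: descending
    M, d = X_tilde.shape
    N = int(np.ceil((1 - p) * M))                                       # step 2
    X = X_tilde[:N].copy()                                              # step 3: top N rows
    for j in range(d):                                                  # step 4: random start
        rng.shuffle(X[:, j])
    x_star = X.sum(axis=1).min()
    for _ in range(max_iter):                                           # step 5: main loop
        Y = X.copy()
        for j in range(d):
            v = np.delete(X, j, axis=1).sum(axis=1)           # row sums excluding column j
            Y[np.argsort(v), j] = np.sort(X[:, j])[::-1]      # order column j opposite to v
        y_star = Y.sum(axis=1).min()
        if y_star - x_star < eps:
            break
        X, x_star = Y, y_star
    return y_star                                             # estimated worst VaR_p

The function expects the full sample matrix; steps 1 to 3 select the tail block internally.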

5.9.4.1. Worked Example

Setup. Compute the worst \(\mathsf{VaR}_{0.99}\) of the sum of three lognormal distributions with mean 10 and coefficients of variation 1, 2, and 3 by applying the Rearrangement Algorithm to a stratified sample of \(N=40\) observations at and above the 99th percentile for the matrix \(X\).

Solution. The table below shows the input and output of the Rearrangement Algorithm.

Starting \(X\) is shown in the first three columns \(x_0, x_1, x_2\). The column Sum shows the row sums \(x_0+x_1+x_2\) corresponding to a comonotonic ordering. These four columns are all sorted in ascending order. The right-hand three columns, \(s_0, s_1, s_2\) are the output, with row sum given in the Max VaR column. The worst-case \(\text{VaR}_{0.99}\) is the minimum of the last column, 352.8. It is 45 percent greater than the additive VaR of 242.5. Only a sample from each marginal’s largest 1 percent values is shown since smaller values are irrelevant to the calculation.

    x_0     x_1     x_2     Sum     s_0     s_1     s_2 Max VaR
   49.0    85.6   107.9   242.5    87.1   124.6   141.1   352.8
   49.4    86.6   109.5   245.6    70.8   113.6   169.3   353.7
   49.9    87.7   111.2   248.8    98.8   127.9   127.4   354.1
   50.3    88.9   112.9   252.1    79.9   118.8   155.5   354.1
   50.7    90.0   114.7   255.5    83.1   107.1   164.3   354.5
   51.2    91.3   116.6   259.1    92.1   139.7   122.8   354.6
   51.6    92.6   118.6   262.8    67.7   135.4   151.5   354.7
   52.1    93.9   120.6   266.6   108.8   116.1   129.8   354.7
   52.6    95.3   122.8   270.7    62.8   105.1   186.9   354.8
   53.2    96.7   125.0   274.9    63.9   170.6   120.6   355.0
   53.7    98.3   127.4   279.3    69.2   111.3   174.6   355.1
   54.3    99.9   129.8   284.0    72.7   144.5   138.1   355.3
   54.9   101.5   132.4   288.8    59.9   101.5   194.1   355.5
   55.5   103.3   135.2   293.9   127.5   103.3   125.0   355.8
   56.1   105.1   138.1   299.3    60.8   162.6   132.4   355.9
   56.8   107.1   141.1   305.0    66.3   109.1   180.5   355.9
   57.5   109.1   144.4   311.1    61.8   149.8   144.4   356.0
   58.3   111.3   147.9   317.5    65.0   155.8   135.2   356.0
   59.1   113.6   151.5   324.3    74.8   121.6   159.7   356.1
   59.9   116.1   155.5   331.5    77.1   131.5   147.9   356.5
   60.8   118.8   159.7   339.3    59.1   179.9   118.6   357.5
   61.8   121.6   164.3   347.7    58.3    99.9   202.0   360.1
   62.8   124.6   169.3   356.7    57.5   191.1   116.6   365.3
   63.9   127.9   174.6   366.4    56.8    98.3   210.9   366.0
   65.0   131.5   180.5   377.0    56.1    96.7   221.0   373.9
   66.3   135.4   186.9   388.7    55.5   205.1   114.7   375.4
   67.7   139.7   194.1   401.5    54.9    95.3   232.7   382.9
   69.2   144.5   202.0   415.7    54.3   223.3   112.9   390.5
   70.8   149.8   210.9   431.6    53.7    93.9   246.3   393.9
   72.7   155.8   221.0   449.5    53.2    92.6   262.5   408.2
   74.8   162.6   232.7   470.1    52.6   248.7   111.2   412.5
   77.1   170.6   246.3   494.0    52.1    91.3   282.3   417.7
   79.9   179.9   262.5   522.3    51.2   288.1   109.5   448.8
   83.1   191.1   282.3   556.5    51.6    90.0   307.2   448.9
   87.1   205.1   307.2   599.4    50.7    88.9   340.0   479.6
   92.1   223.3   340.0   655.5    50.3    87.7   386.6   524.6
   98.8   248.7   386.6   734.1    49.9   366.9   107.9   524.7
  108.8   288.1   461.1   858.0    49.4    86.6   461.1   597.2
  127.5   366.9   615.7  1110.1    49.0    85.6   615.7   750.3

The table illustrates that the worst-case VaR may be substantially higher than when the marginals are perfectly correlated, here 45 percent higher at 352.8 vs. 242.5. The form of the output columns shows the two-part structure. There is a series of values up to about 356 involving moderate-sized losses from each marginal with approximately the same total. The larger values of the rearrangement are formed from a large value from one marginal combined with smaller values from the other two.

The Sum entry of 366.4 marks the point at which the comonotonic sum of the marginals starts to exceed the worst 0.99-VaR arrangement.

Performing the same calculation with \(N=1000\) samples from the largest 1 percent of each marginal produces an estimated worst VaR of 360.5.

The following code replicates this calculation in aggregate. The answer depends on the random seed used and so differs slightly from the table above.

In [1]: import aggregate as agg

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: import scipy.stats as ss

In [5]: ps = np.linspace(0.99, 1, 40, endpoint=False)

In [6]: params = {i: agg.mu_sigma_from_mean_cv(10, i) for i in [1,2,3]}

In [7]: df = pd.DataFrame({f'x_{i}': ss.lognorm(params[i][1],
   ...:    scale=np.exp(params[i][0])).isf(1-ps)
   ...:    for i in [1,2,3]}, index=ps)
   ...: 

In [8]: df_ra = agg.rearrangement_algorithm_max_VaR(df)

In [9]: agg.qd(df_ra, float_format=lambda x: f'{x:.1f}', max_rows=100)

     x_1   x_2   x_3  total
30  75.3 120.1 157.6  352.9
11  69.7 117.4 166.5  353.6
25  68.2 151.4 134.3  353.8
6   87.6 146.0 120.2  353.9
39  77.7 122.9 153.6  354.2
24  66.8 141.2 146.4  354.4
3   63.2 129.3 161.9  354.4
34  71.3 106.3 176.9  354.6
7   60.4 172.3 122.3  355.0
21  99.4 112.6 143.1  355.0
23  73.2 110.4 171.5  355.0
18 128.2 102.7 124.5  355.4
37  64.3 108.3 182.8  355.5
10  61.3 157.4 137.0  355.7
19 109.5 114.9 131.6  356.0
17  62.2 104.5 189.3  356.0
4   80.4 126.0 149.9  356.3
16  83.7 132.9 140.0  356.6
15  65.5 164.3 126.8  356.6
28  59.5 101.0 196.5  357.1
0   92.6 136.9 129.1  358.7
31  58.7 181.7 118.3  358.7
29  58.0  99.4 204.6  361.9
1   57.3 193.0 116.4  366.7
20  56.6  97.9 213.6  368.0
13  55.9  96.4 223.8  376.1
38  55.3 207.1 114.5  377.0
2   54.7  95.0 235.6  385.3
9   54.1 225.5 112.8  392.4
14  53.6  93.7 249.3  396.5
36  53.0  92.4 265.7  411.1
33  52.5 251.0 111.1  414.6
26  52.0  91.1 285.6  428.8
8   51.6 290.7 109.5  451.8
27  51.1  89.9 310.8  451.8
5   50.7  88.8 343.9  483.4
35  50.3 370.1 107.9  528.2
32  49.8  87.7 391.0  528.5
12  49.4  86.6 466.1  602.1
22  49.0  85.6 622.1  756.7

There are several important points to note about the Rearrangement Algorithm output and the failure of subadditivity it induces. They mirror the case \(d=2\).

  • The dependence structure does not have right tail dependence.

  • In the worked example table, the comonotonic sum is greater than the maximum VaR sum for the top 40 percent of observations, those with comonotonic sum of 366.4 and above. The algorithm output is tailored to a specific value of \(p\) and does not work for other \(p\)s. It produces relatively thinner tails for higher values of \(p\) than the comonotonic copula.

  • The algorithm works for any non-trivial marginal distributions—it is universal.

  • The implied dependence structure specifies only how the larger values of each marginal are related; any dependence structure can be used for values below \(\mathsf{VaR}_p\).

The Rearrangement Algorithm gives a definitive answer to the question “Just how bad could things get?” and perhaps provides a better base against which to measure diversification effect than either independence or the comonotonic copula. While the multivariate structure it reveals is odd and specific to \(p\), it is not wholly improbable. It pinpoints a worst-case driven by a combination of moderately severe, but not extreme, tail event outcomes. Anyone who remembers watching their investment portfolio during a financial crisis has seen that behavior before! It is a valuable additional feature for any risk aggregation software.