The Iman-Conover Method

Basic Idea

Here is the basic idea of the Iman-Conover method. Given samples of \(n\) values from two known marginal distributions \(X\) and \(Y\) and a desired correlation \(\rho\) between them, re-order the samples to have the same rank order as an \(n\times 2\) reference sample with linear correlation \(\rho\). Since linear correlation and rank correlation are typically close, the re-ordered output will have approximately the desired correlation structure. What makes the IC method work so effectively is the existence of simple algorithms to generate samples from reference distributions with prescribed linear correlation structures.

Theoretical Derivation

Suppose that \(\mathsf{M}\) is an \(n\) element sample from an \(r\) dimensional multivariate distribution, so \(\mathsf{M}\) is an \(n\times r\) matrix. Assume that the columns of \(\mathsf{M}\) are uncorrelated, have mean zero, and standard deviation one. Let \(\mathsf{M}'\) denote the transpose of \(\mathsf{M}\). These assumptions imply that the correlation matrix of the sample \(\mathsf{M}\) can be computed as \(n^{-1}\mathsf{M}'\mathsf{M}\), and because the columns are uncorrelated, \(n^{-1}\mathsf{M}'\mathsf{M}=\mathsf{id}\). (There is no need to scale the covariance matrix by the row and column standard deviations because they are all one. In general \(n^{-1}\mathsf{M}'\mathsf{M}\) is the covariance matrix of \(\mathsf{M}\).)

Let \(\mathsf{S}\) be a correlation matrix, i.e. \(\mathsf{S}\) is a positive semi-definite symmetric matrix with 1’s on the diagonal and all elements \(\le 1\) in absolute value. In order to rule out linearly dependent variables assume \(\mathsf{S}\) is positive definite. These assumptions ensure \(\mathsf{S}\) has a Choleski decomposition

\[\mathsf{S}=\mathsf{C}'\mathsf{C}\label{coolA}\]

for some upper triangular matrix \(\mathsf{C}\); see Golub or Press et al. Set \(\mathsf{T}=\mathsf{M}\mathsf{C}\). The columns of \(\mathsf{T}\) still have mean zero, because they are linear combinations of the columns of \(\mathsf{M}\), which have mean zero by assumption. It is less obvious, but still true, that the columns of \(\mathsf{T}\) also have standard deviation one. To see why, recall that the covariance matrix of \(\mathsf{T}\) is

\[n^{-1}\mathsf{T}'\mathsf{T}=n^{-1}\mathsf{C}'\mathsf{M}'\mathsf{M}\mathsf{C}=\mathsf{C}'\mathsf{C}=\mathsf{S},\]

since \(n^{-1}\mathsf{M}'\mathsf{M}=\mathsf{id}\) is the identity by assumption. Now \(\mathsf{S}\) is actually the correlation matrix too, because its diagonal is scaled to one, so the covariance and correlation matrices coincide. The process of converting \(\mathsf{M}\), which is easy to simulate, into \(\mathsf{T}\), which has the desired correlation structure \(\mathsf{S}\), is the theoretical basis of the IC method.
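This Choleski trick is easy to check numerically. The following sketch (Python with numpy; the target matrix is an arbitrary illustration) builds an approximately uncorrelated \(\mathsf{M}\) and verifies that \(\mathsf{T}=\mathsf{M}\mathsf{C}\) has correlation close to \(\mathsf{S}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 100_000, 3

# An illustrative target correlation matrix S and its Choleski factor C,
# with S = C'C.  numpy returns the lower-triangular factor L = C'.
S = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
C = np.linalg.cholesky(S).T

# M: n draws of r independent, mean-zero, unit-variance columns.
M = rng.standard_normal((n, r))

# T = MC then has correlation matrix approximately S.
T = M @ C
sample_corr = np.corrcoef(T, rowvar=False)
```

The correlation is only approximate here because a random \(\mathsf{M}\) is only approximately uncorrelated; the adjustment below makes it exact.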

It is important to note that estimates of correlation matrices, depending on how they are constructed, need not have the mathematical properties of a correlation matrix. Therefore, when trying to use an estimate of a correlation matrix in an algorithm, such as the Iman-Conover, which actually requires a proper correlation matrix as input, it may be necessary to check the input matrix does have the correct mathematical properties.
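Such a check can be sketched as follows (Python with numpy; the function name is ours). A failed Choleski attempt is a convenient test for positive definiteness:

```python
import numpy as np

def is_valid_correlation_matrix(S, tol=1e-12):
    """Check the properties the IC method needs: square, symmetric,
    unit diagonal, and positive definite (tested via a Choleski attempt)."""
    S = np.asarray(S, dtype=float)
    if S.ndim != 2 or S.shape[0] != S.shape[1]:
        return False
    if not np.allclose(S, S.T, atol=tol):
        return False
    if not np.allclose(np.diag(S), 1.0, atol=tol):
        return False
    try:
        np.linalg.cholesky(S)   # raises LinAlgError iff S is not positive definite
    except np.linalg.LinAlgError:
        return False
    return True
```

For example, the symmetric unit-diagonal matrix with off-diagonal entries \(0.9, -0.9, 0.9\) is rejected because it is not positive semi-definite.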

Next we discuss how to make \(n\times r\) matrices \(\mathsf{M}\), with independent, mean zero columns. The basic idea is to take \(n\) numbers \(a_1,\dots,a_n\) with \(\sum_i a_i=0\) and \(n^{-1}\sum_i a_i^2=1\), use them to form one \(n\times 1\) column of \(\mathsf{M}\), and then to copy it \(r\) times. Finally randomly permute the entries in each column to make them independent as columns of random variables. Iman and Conover call the \(a_i\) “scores”. They discuss several possible definitions for the scores, including scaled versions of \(a_i=i\) (ranks) and \(a_i\) uniformly distributed. They note that the shape of the output multivariate distribution depends on the scores. All of the examples in their paper use normal scores. We will discuss normal scores here, and consider alternatives in Section 1.4.1.

Given that the scores will be based on normal random variables, we can either simulate \(n\) random standard normal variables and then shift and re-scale to ensure mean zero and standard deviation one, or we can use a stratified sample from the standard normal, \(a_i=\Phi^{-1}(i/(n+1))\). By construction, the stratified sample has mean zero which is an advantage. Also, by symmetry, using the stratified sample halves the number of calls to \(\Phi^{-1}\). For these two reasons we prefer it in the algorithm below.
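The stratified scores can be sketched using only the Python standard library (`statistics.NormalDist`; the helper name is ours):

```python
from statistics import NormalDist

def normal_scores(n):
    """Stratified standard-normal scores a_i = Phi^{-1}(i/(n+1)),
    rescaled to have standard deviation exactly one (the mean is
    already zero by symmetry)."""
    inv = NormalDist().inv_cdf
    a = [inv(i / (n + 1)) for i in range(1, n + 1)]
    sd = (sum(x * x for x in a) / n) ** 0.5   # population sd; mean is zero
    return [x / sd for x in a]

scores = normal_scores(20)
```

For \(n=20\) the raw scores have standard deviation about \(0.8687\), matching the rescaling constant in the worked example of Section 1.3.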

The correlation matrix of \(\mathsf{M}\), constructed by randomly permuting the scores in each column, will only be approximately equal to \(\mathsf{id}\) because of random simulation error. To correct for this slight error, Iman and Conover make a further adjustment in their algorithm. Let \(\mathsf{EE}=n^{-1}\mathsf{M}'\mathsf{M}\) be the actual correlation matrix of \(\mathsf{M}\), let \(\mathsf{EE}=\mathsf{F}'\mathsf{F}\) be the Choleski decomposition of \(\mathsf{EE}\), and define \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}\). The columns of \(\mathsf{T}\) have mean zero, and the covariance matrix of \(\mathsf{T}\) is

\[\begin{split}\begin{aligned} n^{-1}\mathsf{T}'\mathsf{T} &= n^{-1}\mathsf{C}'\mathsf{F}'^{-1}\mathsf{M}'\mathsf{M}\mathsf{F}^{-1}\mathsf{C} \notag \\ &= \mathsf{C}'\mathsf{F}'^{-1}\mathsf{EE}\mathsf{F}^{-1}\mathsf{C} \notag \\ &= \mathsf{C}'\mathsf{F}'^{-1}\mathsf{F}'\mathsf{F}\mathsf{F}^{-1}\mathsf{C} \notag \\ &= \mathsf{C}' \mathsf{C} \notag \\ &= \mathsf{S},\label{icCorr}\end{aligned}\end{split}\]

and hence \(\mathsf{T}\) has correlation matrix exactly equal to \(\mathsf{S}\), as desired. If \(\mathsf{EE}\) is singular then the column shuffle needs to be repeated.

Now that the reference distribution \(\mathsf{T}\) with exact correlation structure \(\mathsf{S}\) is in hand, all that remains to complete the IC method is to re-order each column of the input distribution \(\mathsf{X}\) to have the same rank order as the corresponding column of \(\mathsf{T}\).
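The re-ordering step can be sketched as (Python with numpy; the helper name is ours):

```python
import numpy as np

def reorder_like(x, t):
    """Return the sample x permuted to have the same rank order as the
    reference column t -- the final re-ordering step of the IC method."""
    ranks = np.argsort(np.argsort(np.asarray(t)))  # rank of each entry of t
    # place the k-th smallest value of x wherever t holds its k-th smallest value
    return np.sort(np.asarray(x))[ranks]
```

For example, `reorder_like([10, 20, 30, 40], [0.5, -1.2, 2.0, 0.1])` returns `[30, 10, 40, 20]`: the output is a permutation of the input with exactly the rank order of the reference.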


Here is a more algorithmic description of the IC method. The description uses normal scores and the Choleski method to determine the reference distribution. As discussed above, it is possible to make other choices in place of these; they are discussed in Section 1.4. We will actually present two versions of the core algorithm. The first, called the “Simple Algorithm”, deals with the various matrix operations at a high level. The second, the “Detailed Algorithm”, takes a more sophisticated approach to the matrix operations, including referencing appropriate Lapack routines. Lapack is a standard set of linear algebra functions; software vendors provide very high performance implementations, many of which are used in CPU benchmarks, and several free Windows implementations are available on the web. The reader should study the simple algorithm first to understand what is going on in the IC method, and follow the steps outlined in the detailed algorithm to code a high performance implementation. Both algorithms have the same inputs and outputs.

An \(n \times r\) matrix \(\mathsf{X}\) consisting of \(n\) samples from each of \(r\) marginal distributions, and a desired correlation matrix \(\mathsf{S}\).

The IC method does not address how the columns of \(\mathsf{X}\) are determined. It is presumed that the reader has sampled from the appropriate distributions in some intelligent manner. The matrix \(\mathsf{S}\) must be a correlation matrix for linearly independent random variables, so it must be symmetric and positive definite. If \(\mathsf{S}\) is not symmetric and positive definite the algorithm will fail at the Choleski decomposition step. The output is a matrix \(\mathsf{Y}\) each of whose columns is a permutation of the corresponding column of \(\mathsf{X}\) and whose approximate correlation matrix is \(\mathsf{S}\). Here is the simple algorithm.

  1. Make one column of scores \(a_i=\Phi^{-1}(i/(n+1))\) for \(i=1,\dots,n\) and rescale to have standard deviation one.

  2. Copy the scores \(r\) times to make the score matrix \(\mathsf{M}\).

  3. Randomly permute the entries in each column of \(\mathsf{M}\).

  4. Compute the correlation matrix \(\mathsf{EE}=n^{-1}\mathsf{M}'\mathsf{M}\) of \(\mathsf{M}\).

  5. Compute the Choleski decomposition \(\mathsf{EE}=\mathsf{F}'\mathsf{F}\) of \(\mathsf{EE}\).

  6. Compute the Choleski decomposition \(\mathsf{S}=\mathsf{C}'\mathsf{C}\) of the desired correlation matrix \(\mathsf{S}\).

  7. Compute \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}\). The matrix \(\mathsf{T}\) has exactly the desired correlation structure.

  8. Let \(\mathsf{Y}\) be the input matrix \(\mathsf{X}\) with each column reordered to have exactly the same rank ordering as the corresponding column of \(\mathsf{T}\).
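The eight steps above can be sketched in Python with numpy (a minimal illustration, not a tuned implementation; the function name is ours):

```python
import numpy as np
from statistics import NormalDist

def iman_conover(X, S, rng=None):
    """Simple IC algorithm: re-order the columns of X (n samples of r
    marginals) to induce approximately the correlation matrix S."""
    rng = np.random.default_rng(rng)
    n, r = X.shape
    # 1. normal scores rescaled to standard deviation one
    a = np.array([NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)])
    a /= np.sqrt(a @ a / n)
    # 2.-3. copy the scores r times, shuffling each column independently
    M = np.column_stack([rng.permutation(a) for _ in range(r)])
    # 4.-5. actual correlation matrix of M and its Choleski factor EE = F'F
    E = (M.T @ M) / n
    F = np.linalg.cholesky(E).T
    # 6. Choleski factor of the target, S = C'C
    C = np.linalg.cholesky(S).T
    # 7. T = M F^{-1} C has correlation matrix exactly S
    T = M @ np.linalg.solve(F, C)
    # 8. re-order each column of X to the rank order of the same column of T
    Y = np.empty_like(X)
    for j in range(r):
        Y[:, j] = np.sort(X[:, j])[np.argsort(np.argsort(T[:, j]))]
    return Y
```

Each output column is a permutation of the corresponding input column, and the rank correlation of the output tracks that of the exactly-correlated reference \(\mathsf{T}\).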

Here is the detailed algorithm.

  1. Compute the Choleski decomposition of \(\mathsf{S}\), \(\mathsf{S}=\mathsf{C}'\mathsf{C}\), with \(\mathsf{C}\) upper triangular. If the Choleski algorithm fails then \(\mathsf{S}\) is not a valid correlation matrix; flag an error and exit. Checking \(\mathsf{S}\) in this first step avoids performing wasted calculations and allows the routine to exit as quickly as possible. Also check that all the diagonal entries of \(\mathsf{S}\) are 1, and again flag an error and exit if not. The Lapack routine DPOTRF can be used to compute the Choleski decomposition. In the absence of Lapack, the lower triangular factor \(\mathsf{C}'=(c_{ij})\) can be computed recursively using

    \[c_{ij}=\frac{s_{ij}-\sum_{k=1}^{j-1} c_{ik}c_{jk}}{\sqrt{1-\sum_{k=1}^{j-1} c_{jk}^2}}\label{chol}\]

    for \(1\le j\le i\le n\), where the empty sum \(\sum_0^0=0\). The denominator takes this simple form because all the diagonal elements of \(\mathsf{S}\) equal one; for \(i>j\) it equals \(c_{jj}\). The elements should be calculated from left to right, top to bottom. See Wang or Herzog.

  2. Let \(m=\lfloor n/2\rfloor\) be the largest integer less than or equal to \(n/2\) and let \(v_i=\Phi^{-1}(i/(n+1))\) for \(i=1,\dots,m\).

  3. If \(n\) is odd set

    \[\mathsf{v}=(v_1,\dots,v_m,0,-v_m,\dots,-v_1)'\]

    and if \(n\) is even set

    \[\mathsf{v}=(v_1,\dots,v_m,-v_m,\dots,-v_1)'.\]

    Here we have chosen to use normal scores. Other distributions could be used in place of the normal, as discussed in Section 1.4.1. Also note that taking advantage of the symmetry of the normal distribution halves the number of calls to \(\Phi^{-1}\), which is relatively computationally expensive. If multiple calls will be made to the IC algorithm then store \(\mathsf{v}\) for use in future calls.

  4. Form the \(n\times r\) score matrix \(\mathsf{M}\) from \(r\) copies of the scores vector \(\mathsf{v}\).

  5. Compute \(m_{xx}=n^{-1}\sum_i v_i^2\), the variance of \(\mathsf{v}\). Note that \(\sum_i v_i=0\) by construction.

  6. Randomly shuffle columns \(2,\dots,r\) of the score matrix \(\mathsf{M}\).

  7. Compute the correlation matrix \(\mathsf{EE}\) of the shuffled score matrix \(\mathsf{M}\). Each column of \(\mathsf{M}\) has mean zero, by construction, and variance \(m_{xx}\). The correlation matrix is obtained by dividing each element of \(n^{-1}\mathsf{M}'\mathsf{M}\) by \(m_{xx}\). The matrix product can be computed using the Lapack routine DGEMM. If \(\mathsf{EE}\) is singular repeat step 6.

  8. Determine the Choleski decomposition \(\mathsf{EE}=\mathsf{F}'\mathsf{F}\) of \(\mathsf{EE}\) using the Lapack routine DPOTRF. Because \(\mathsf{EE}\) is a non-singular correlation matrix it is symmetric and positive definite and so is guaranteed to have a Choleski root.

  9. Compute \(\mathsf{F}^{-1}\mathsf{C}\) using the Lapack routine DTRTRS to solve the linear equation \(\mathsf{F}\mathsf{A}=\mathsf{C}\) for \(\mathsf{A}\). Solving the linear equation avoids a time-consuming matrix inversion and multiplication. The routine DTRTRS is optimized for upper triangular input matrices.

  10. Compute the correlated scores \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}=\mathsf{M}\mathsf{A}\) using DGEMM. The matrix \(\mathsf{T}\) has exactly the desired correlation structure.

  11. Compute the ranks of the elements of \(\mathsf{T}\). Ranks are computed by indexing the columns of \(\mathsf{T}\) as described in Section 8.4 of Press et al. Let \(r(k)\) denote the index of the \(k\)th-ranked element of each column of \(\mathsf{T}\).

  12. Let \(\mathsf{Y}\) be the \(n\times r\) matrix whose \(j\)th column is the \(j\)th column of the input matrix \(\mathsf{X}\) re-ordered to have the same rank order as the \(j\)th column of \(\mathsf{T}\). The re-ordering is performed using the ranks computed in the previous step: first sort each input column into ascending order if it is not already sorted, and then set \(\mathsf{Y}_{r(k),j}=\mathsf{X}_{k,j}\).

The output of the algorithm is a matrix \(\mathsf{Y}\) each of whose columns is a permutation of the corresponding column of the input matrix \(\mathsf{X}\). The rank correlation matrix of \(\mathsf{Y}\) is identical to the rank correlation matrix of \(\mathsf{T}\), a sample from a multivariate distribution with linear correlation matrix \(\mathsf{S}\).
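For readers without Lapack, the Choleski recursion quoted in the detailed algorithm can be sketched in plain Python (the helper name is ours). It returns the lower triangular factor, whose transpose is the upper triangular \(\mathsf{C}\) of the text:

```python
def choleski_corr(S):
    """Choleski factorization of a correlation matrix, S = L L', using
    the recursion from the detailed algorithm.  Unit diagonal assumed,
    so the diagonal entries are sqrt(1 - sum of squared row entries).
    Returns the lower triangular factor L; the text's upper triangular
    factor is C = L'."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):              # top to bottom
        for j in range(i + 1):      # left to right
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("S is not positive definite")
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return L
```

Applied to the \(4\times 4\) target matrix of the worked example below, it reproduces (transposed) the factor shown there.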

Simple Example of Iman-Conover

Having explained the IC method, we now give a simple example to explicitly show all the details. The example will work with \(n=20\) samples and \(r=4\) different marginals. The marginals are samples from four lognormal distributions, with parameters \(\mu=12,11,10,10\) and \(\sigma=0.15,0.25,0.35,0.25\). The input matrix is

\[\begin{split}\mathsf{X}= \begin{pmatrix} 123,567 & 44,770 & 15,934 & 13,273 \\ 126,109 & 45,191 & 16,839 & 15,406 \\ 138,713 & 47,453 & 17,233 & 16,706 \\ 139,016 & 47,941 & 17,265 & 16,891 \\ 152,213 & 49,345 & 17,620 & 18,821 \\ 153,224 & 49,420 & 17,859 & 19,569 \\ 153,407 & 50,686 & 20,804 & 20,166 \\ 155,716 & 52,931 & 21,110 & 20,796 \\ 155,780 & 54,010 & 22,728 & 20,968 \\ 161,678 & 57,346 & 24,072 & 21,178 \\ 161,805 & 57,685 & 25,198 & 23,236 \\ 167,447 & 57,698 & 25,393 & 23,375 \\ 170,737 & 58,380 & 30,357 & 24,019 \\ 171,592 & 60,948 & 30,779 & 24,785 \\ 178,881 & 66,972 & 32,634 & 25,000 \\ 181,678 & 68,053 & 33,117 & 26,754 \\ 184,381 & 70,592 & 35,248 & 27,079 \\ 206,940 & 72,243 & 36,656 & 30,136 \\ 217,092 & 86,685 & 38,483 & 30,757 \\ 240,935 & 87,138 & 39,483 & 35,108 \end{pmatrix}.\end{split}\]

Note that the marginals are all sorted in ascending order. The algorithm does not actually require pre-sorting the marginals but it simplifies the last step.

The desired target correlation matrix is

\[\begin{split}\mathsf{S}= \begin{pmatrix} 1.000 & 0.800 & 0.400 & 0.000\\ 0.800 & 1.000 & 0.300 & -0.200\\ 0.400 & 0.300 & 1.000 & 0.100\\ 0.000 & -0.200 & 0.100 & 1.000 \end{pmatrix}.\end{split}\]

The Choleski decomposition of \(\mathsf{S}\) is

\[\begin{split}\mathsf{C}= \begin{pmatrix} 1.000 & 0.800 & 0.400 & 0.000\\ 0.000 & 0.600 & -0.033 & -0.333\\ 0.000 & 0.000 & 0.916 & 0.097\\ 0.000 & 0.000 & 0.000 & 0.938 \end{pmatrix}.\end{split}\]

Now we make the score matrix. The basic scores are \(\Phi^{-1}(i/21)\), for \(i=1,\dots,20\). Their standard deviation is \(0.868674836252965\); dividing by it gives a vector \(\mathsf{v}\) with standard deviation one. Then we combine four \(\mathsf{v}\)’s and shuffle randomly to get

\[\begin{split}\mathsf{M}= \begin{pmatrix} -1.92062 & 1.22896 & -1.00860 & -0.49584 \\ -1.50709 & -1.50709 & -1.50709 & 0.82015 \\ -1.22896 & 1.92062 & 0.82015 & -0.65151 \\ -1.00860 & -0.20723 & 1.00860 & -1.00860 \\ -0.82015 & 0.82015 & 0.34878 & 1.92062 \\ -0.65151 & -1.22896 & -0.65151 & 0.20723 \\ -0.49584 & -0.65151 & 1.22896 & -0.34878 \\ -0.34878 & -0.49584 & -0.49584 & -0.06874 \\ -0.20723 & -1.00860 & 0.20723 & 0.65151 \\ -0.06874 & 0.49584 & 0.06874 & -1.22896 \\ 0.06874 & -0.34878 & -1.22896 & 0.49584 \\ 0.20723 & 0.34878 & 0.65151 & 0.34878 \\ 0.34878 & -0.06874 & -0.20723 & 1.22896 \\ 0.49584 & -1.92062 & -0.82015 & -0.20723 \\ 0.65151 & 0.20723 & 1.92062 & -1.92062 \\ 0.82015 & 1.00860 & 1.50709 & 1.50709 \\ 1.00860 & -0.82015 & -1.92062 & 1.00860 \\ 1.22896 & 1.50709 & 0.49584 & -1.50709 \\ 1.50709 & 0.06874 & -0.06874 & 0.06874 \\ 1.92062 & 0.65151 & -0.34878 & -0.82015 \end{pmatrix}.\end{split}\]

As described in Section 1.1, the columns of \(\mathsf{M}\) are approximately uncorrelated. In fact \(\mathsf{M}\) has correlation matrix

\[\begin{split}\mathsf{EE}= \begin{pmatrix} 1.0000 & 0.0486 & 0.0898 & -0.0960 \\ 0.0486 & 1.0000 & 0.4504 & -0.2408 \\ 0.0898 & 0.4504 & 1.0000 & -0.3192 \\ -0.0960 & -0.2408 & -0.3192 & 1.0000 \end{pmatrix}\end{split}\]

and \(\mathsf{EE}\) has Choleski decomposition

\[\begin{split}\mathsf{F}= \begin{pmatrix} 1.0000 & 0.0486 & 0.0898 & -0.0960\\ 0.0000 & 0.9988 & 0.4466 & -0.2364\\ 0.0000 & 0.0000 & 0.8902 & -0.2303\\ 0.0000 & 0.0000 & 0.0000 & 0.9391 \end{pmatrix}.\end{split}\]

Thus \(\mathsf{T}=\mathsf{M}\mathsf{F}^{-1}\mathsf{C}\) is given by

\[\begin{split}\mathsf{T}= \begin{pmatrix} -1.92062 & -0.74213 & -2.28105 & -1.33232 \\ -1.50709 & -2.06697 & -1.30678 & 0.54577 \\ -1.22896 & 0.20646 & -0.51141 & -0.94465 \\ -1.00860 & -0.90190 & 0.80546 & -0.65873 \\ -0.82015 & -0.13949 & -0.31782 & 1.76960 \\ -0.65151 & -1.24043 & -0.27999 & 0.23988 \\ -0.49584 & -0.77356 & 1.42145 & 0.23611 \\ -0.34878 & -0.56670 & -0.38117 & -0.14744 \\ -0.20723 & -0.76560 & 0.64214 & 0.97494 \\ -0.06874 & 0.24487 & -0.19673 & -1.33695 \\ 0.06874 & -0.15653 & -1.06954 & 0.14015 \\ 0.20723 & 0.36925 & 0.56694 & 0.51206 \\ 0.34878 & 0.22754 & -0.06362 & 1.19551 \\ 0.49584 & -0.77154 & 0.26828 & 0.03168 \\ 0.65151 & 0.62666 & 2.08987 & -1.21744 \\ 0.82015 & 1.23804 & 1.32493 & 1.85680 \\ 1.00860 & 0.28474 & -1.23688 & 0.59246 \\ 1.22896 & 1.85260 & 0.17411 & -1.62428 \\ 1.50709 & 1.20294 & 0.39517 & 0.13931 \\ 1.92062 & 1.87175 & -0.04335 & -0.97245 \end{pmatrix}.\end{split}\]

An easy calculation will verify that \(\mathsf{T}\) has correlation matrix \(\mathsf{S}\), as required.

To complete the IC method we must re-order each column of \(\mathsf{X}\) to have the same rank order as the corresponding column of \(\mathsf{T}\). The first column does not change because it is already in ascending order. In the second column, the first element of \(\mathsf{Y}\) must be the 14th element of \(\mathsf{X}\), the second the 20th, the third the 10th, and so on. The orderings of the last three columns are given by the transpose of

\[\begin{split}\begin{pmatrix} 14 & 20 & 10 & 18 & 11 & 19 & 17 & 13 & 15 & 8 & 12 & 6 & 9 & 16 & 5 & 3 & 7 & 2 & 4 & 1\\ 20 & 19 & 16 & 4 & 14 & 13 & 2 & 15 & 5 & 12 & 17 & 6 & 11 & 8 & 1 & 3 & 18 & 9 & 7 & 10\\ 18 & 6 & 15 & 14 & 2 & 8 & 9 & 13 & 4 & 19 & 10 & 7 & 3 & 12 & 17 & 1 & 5 & 20 & 11 & 16 \end{pmatrix}\end{split}\]

and the resulting re-ordering of \(\mathsf{X}\) is

\[\begin{split}\mathsf{Y}= \begin{pmatrix} 123,567 & 50,686 & 15,934 & 16,706 \\ 126,109 & 44,770 & 16,839 & 25,000 \\ 138,713 & 57,685 & 17,620 & 19,569 \\ 139,016 & 47,453 & 35,248 & 20,166 \\ 152,213 & 57,346 & 20,804 & 30,757 \\ 153,224 & 45,191 & 21,110 & 24,019 \\ 153,407 & 47,941 & 38,483 & 23,375 \\ 155,716 & 52,931 & 17,859 & 20,796 \\ 155,780 & 49,420 & 33,117 & 27,079 \\ 161,678 & 58,380 & 22,728 & 15,406 \\ 161,805 & 54,010 & 17,265 & 23,236 \\ 167,447 & 66,972 & 32,634 & 24,785 \\ 170,737 & 57,698 & 24,072 & 30,136 \\ 171,592 & 49,345 & 30,357 & 20,968 \\ 178,881 & 68,053 & 39,483 & 16,891 \\ 181,678 & 72,243 & 36,656 & 35,108 \\ 184,381 & 60,948 & 17,233 & 26,754 \\ 206,940 & 86,685 & 25,393 & 13,273 \\ 217,092 & 70,592 & 30,779 & 21,178 \\ 240,935 & 87,138 & 25,198 & 18,821 \end{pmatrix}.\end{split}\]

The rank correlation matrix of \(\mathsf{Y}\) is exactly \(\mathsf{S}\). The actual linear correlation is only approximately equal to \(\mathsf{S}\). The achieved value is

\[\begin{split}\begin{pmatrix} 1.00 & 0.85 & 0.26 & -0.11 \\ 0.85 & 1.00 & 0.19 & -0.20 \\ 0.26 & 0.19 & 1.00 & 0.10 \\ -0.11 & -0.20 & 0.10 & 1.00 \end{pmatrix},\end{split}\]

a fairly creditable performance given the input correlation matrix and the very small number of samples \(n=20\). When used with larger sized samples the IC method typically produces a very close approximation to the required correlation matrix, especially when the marginal distributions are reasonably symmetric.

Extensions of Iman-Conover

Following through the explanation of the IC method shows that it relies on a choice of multivariate reference distribution. A straightforward method to compute a reference is to use the Choleski decomposition method Equation ([coolA]) applied to certain independent scores. The example in Section 1.3 used normal scores. However nothing prevents us from using other distributions for the scores provided they are suitably normalized to have mean zero and standard deviation one.

Another approach to IC is to use a completely different multivariate distribution as reference. There are several other families of multivariate distributions that are easy to simulate from, including the elliptically contoured family (which includes the normal and \(t\) as special cases) and the multivariate Laplace distribution. Changing scores is actually an example of changing the reference distribution; however, for the examples we consider the exact form of the new reference is unknown.

Alternative Scores

The choice of score distribution has a profound effect on the multivariate distribution output by the IC method. The basic Iman-Conover algorithm uses normally distributed scores. We now show the impact of using exponentially and uniformly distributed scores.

The next figure shows three bivariate distributions with identical marginal distributions (shown in the lower right hand plot), the same correlation coefficient of \(0.643\pm 0.003\), but using normal scores (top left), exponential scores (top right) and uniform scores (lower left). The input correlation to the IC method was 0.65 in all three cases and there are 1000 pairs in each plot. Here the IC method produced bivariate distributions with actual correlation coefficient extremely close to the requested value.

The normal scores produce the most natural looking bivariate distribution, with approximately elliptical contours. The bivariate distributions with uniform or exponential scores look unnatural, but it is important to remember that if all you know about the bivariate distribution are the marginals and correlation coefficient all three outcomes are possible.


Bivariate distributions with normal, uniform and exponential scores.

The next figure shows the distribution of the sum of the two marginals for each of the three bivariate distributions and for independent marginals. The sum with exponential scores has a higher kurtosis (is more peaked) than with normal scores. As expected, all three dependent sums have visibly thicker tails than the independent sum.

Iman and Conover considered various different score distributions in their paper. They preferred normal scores as giving more natural looking, elliptical contours. Certainly, the contours produced using exponential or uniform scores appear unnatural. If nothing else they provide a sobering reminder that knowing the marginal distributions and correlation coefficient of a bivariate distribution does not come close to fully specifying it!
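One way to build such alternative score vectors, normalized to mean zero and standard deviation one as the method requires (a sketch; the helper name and the stratified-quantile construction are ours):

```python
import math
from statistics import NormalDist

def scores(n, dist="normal"):
    """Mean-zero, unit-standard-deviation score vectors built from
    stratified quantiles of a chosen base distribution."""
    u = [i / (n + 1) for i in range(1, n + 1)]        # stratified uniforms
    if dist == "normal":
        a = [NormalDist().inv_cdf(p) for p in u]
    elif dist == "exponential":
        a = [-math.log(1.0 - p) for p in u]            # Exp(1) quantiles
    elif dist == "uniform":
        a = list(u)
    else:
        raise ValueError(dist)
    mean = sum(a) / n
    a = [x - mean for x in a]                          # centre at zero
    sd = math.sqrt(sum(x * x for x in a) / n)
    return [x / sd for x in a]                         # rescale to sd one
```

Substituting one of these vectors for the normal scores in the score-matrix step changes the shape of the joint output, as the figures illustrate.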

Multivariate Reference Distributions

The IC method needs some reference multivariate distribution to determine an appropriate rank ordering for the input marginals. So far we have discussed using the Choleski decomposition trick in order to determine a multivariate normal reference distribution. However, any distribution can be used as reference provided it has the desired correlation structure. Multivariate distributions that are closely related by formula to the multivariate normal, such as elliptically contoured distributions and asymmetric Laplace distributions, can be simulated using the Choleski trick.

Elliptically contoured distributions are a family which extends the normal. For a more detailed discussion see Fang and Zhang. The multivariate \(t\)-distribution and symmetric Laplace distributions are in the elliptically contoured family. Elliptically contoured distributions have characteristic functions of the form

\[\Phi(\mathsf{t})=\exp(i\mathsf{t}'\mathsf{m})\,\phi(\mathsf{t}'\mathsf{S}\mathsf{t})\]

for some \(\phi:\mathsf{R}\to\mathsf{R}\), where \(\mathsf{m}\) is an \(r\times 1\) vector of means and \(\mathsf{S}\) is an \(r\times r\) covariance matrix (nonnegative definite and symmetric). In one dimension the elliptically contoured distributions coincide with the symmetric distributions. The covariance is \(\mathsf{S}\), if it is defined.

If \(\mathsf{S}\) has rank \(r\) then an elliptically contoured distribution \(\mathsf{x}\) has a stochastic representation

\[\mathsf{x}=\mathsf{m} + R\mathsf{T}' \mathsf{u}^{(r)}\]

where \(\mathsf{T}\) is the Choleski decomposition of \(\mathsf{S}\), so \(\mathsf{S}=\mathsf{T}'\mathsf{T}\), \(\mathsf{u}^{(r)}\) is a uniform distribution on the sphere in \(\mathsf{R}^r\), and \(R\) is a scale factor independent of \(\mathsf{u}^{(r)}\). The idea here should be clear: pick a direction on the sphere, adjust by \(\mathsf{T}\), scale by a distance \(R\) and finally translate by the means \(\mathsf{m}\). A uniform distribution on a sphere can be created as \(\mathsf{x}/\Vert \mathsf{x}\Vert\) where \(\mathsf{x}\) has a multivariate normal distribution with identity covariance matrix. (Since \(\mathsf{x}\) is standard normal, \(\Vert \mathsf{x}\Vert^2=\sum_i x_i^2\) has a \(\chi^2_r\) distribution.) Uniform vectors \(\mathsf{u}^{(r)}\) can also be created by applying a random orthogonal matrix to a fixed vector \((1,0,\dots,0)\) on the sphere. Diaconis describes a method for producing random orthogonal matrices.
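The normalized-normal construction of \(\mathsf{u}^{(r)}\) can be sketched as (Python with numpy; the helper name is ours):

```python
import numpy as np

def uniform_on_sphere(r, size, rng=None):
    """Uniform draws on the unit sphere in R^r: normalize standard
    normal vectors, as described in the text."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal((size, r))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

u = uniform_on_sphere(3, 1000, rng=0)
```

Combined with an independent radius \(R\) and the factor \(\mathsf{T}\), this gives the stochastic representation above.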

The multivariate \(t\)-distribution with \(\nu\) degrees of freedom has the stochastic representation

\[\mathsf{x}=\mathsf{m} + \frac{\sqrt{\nu}}{\sqrt{S}}\mathsf{z}\label{tsim}\]

where \(S\sim \chi^2_{\nu}\) and \(\mathsf{z}\) is multivariate normal with means zero and covariance matrix \(\mathsf{S}\). Thus one can easily simulate from the multivariate \(t\) by first simulating multivariate normals and then simulating an independent \(S\) and multiplying.
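This representation translates directly into code (Python with numpy; the helper name is ours; the scalar \(S\sim\chi^2_\nu\) of the text is distinct from the covariance matrix):

```python
import numpy as np

def multivariate_t(m, S, nu, size, rng=None):
    """Multivariate t via the stochastic representation above: a
    multivariate normal z with covariance S, scaled by sqrt(nu / chi2_nu)."""
    rng = np.random.default_rng(rng)
    z = rng.multivariate_normal(np.zeros(len(m)), S, size=size)
    chi2 = rng.chisquare(nu, size=size)   # the scalar chi-square of the text
    return np.asarray(m) + np.sqrt(nu / chi2)[:, None] * z

x = multivariate_t([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], nu=5, size=20000, rng=1)
```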

The multivariate Laplace distribution is discussed in Kotz, Kozubowski and Podgorski. It comes in two flavors: symmetric and asymmetric. The symmetric distribution is also an elliptically contoured distribution. It has characteristic function of the form

\[\Phi(\mathsf{t})=\frac{1}{1+ \mathsf{t}'\mathsf{S}\mathsf{t} / 2}\label{symLaplace}\]

where \(\mathsf{S}\) is the covariance matrix. To simulate, use the fact that \(\sqrt{W}\mathsf{X}\) has a symmetric Laplace distribution if \(W\) is exponential and \(\mathsf{X}\) a multivariate normal with covariance matrix \(\mathsf{S}\).

The multivariate asymmetric Laplace distribution has characteristic function

\[\Psi(\mathsf{t})=\frac{1}{1+\mathsf{t}'\mathsf{S}\mathsf{t}/2 - i\mathsf{m}'\mathsf{t}}.\label{asymLaplace}\]

To simulate from the asymmetric Laplace, use the fact that

\[\mathsf{m} W + \sqrt{W}\mathsf{X} \label{aslsim}\]

has an asymmetric Laplace distribution if \(W\) is exponential and \(\mathsf{X}\) a multivariate normal with covariance matrix \(\mathsf{S}\) and means zero. The asymmetric Laplace is not an elliptically contoured distribution.
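Both Laplace flavors can be simulated with one helper (Python with numpy; the name is ours): \(\mathsf{m}=0\) gives the symmetric distribution, \(\mathsf{m}\ne 0\) the asymmetric one.

```python
import numpy as np

def multivariate_laplace(m, S, size, rng=None):
    """Multivariate Laplace via the representation above: m*W + sqrt(W)*X
    with W ~ Exp(1) and X multivariate normal with covariance S and mean
    zero.  m = 0 gives the symmetric flavor."""
    rng = np.random.default_rng(rng)
    m = np.asarray(m, dtype=float)
    W = rng.exponential(size=size)
    X = rng.multivariate_normal(np.zeros(len(m)), S, size=size)
    return m * W[:, None] + np.sqrt(W)[:, None] * X

sym = multivariate_laplace([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], 50000, rng=2)
```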

The next figure compares IC samples produced using a normal copula to those produced with a \(t\)-copula. In both cases the marginals are normally distributed with mean zero and unit standard deviation. The \(t\)-copula has \(\nu=2\) degrees of freedom. In both figures the marginals are uncorrelated, but in the right the marginals are not independent. The \(t\)-copula has pinched tails, similar to Venter’s Heavy Right Tailed copulas.


IC samples produced from the same marginal and correlation matrix using the normal and \(t\) copula reference distributions.

Algorithms for Extended Methods

In Section 1.4.2 we described how the IC method can be extended by using different reference multivariate distributions. It is easy to change the IC algorithm to incorporate different reference distributions for \(t\)-copulas and asymmetric Laplace distributions. Follow the detailed algorithm to step 10. Then use the stochastic representation to simulate from the scaling distribution for each row and multiply each component by the resulting number, resulting in an adjusted \(\mathsf{T}\) matrix. Then complete steps 11 and 12 of the detailed algorithm.
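For the \(t\) reference, the row-scaling adjustment can be sketched as follows (Python with numpy; the helper name is ours). Once the correlated scores \(\mathsf{T}\) have been computed, each row is multiplied by an independent draw from the scaling distribution before ranking and re-ordering:

```python
import numpy as np

def t_reference_adjustment(T, nu, rng=None):
    """Turn a normal-scores reference T into a t-copula reference:
    scale each row by an independent sqrt(nu / chi2_nu) draw, then
    proceed with the usual ranking and re-ordering steps."""
    rng = np.random.default_rng(rng)
    chi2 = rng.chisquare(nu, size=T.shape[0])
    return T * np.sqrt(nu / chi2)[:, None]
```

The scaling is positive and common to every entry in a row, so it leaves within-row proportions unchanged while fattening the tails across rows.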

Comparison With the Normal Copula Method

By the normal copula method we mean the following algorithm, described in Wang or Herzog.

A set of correlated risks \((X_1,\dots,X_r)\) with marginal cumulative distribution functions \(F_i\) and Kendall’s tau \(\tau_{ij}=\tau(X_i,X_j)\) or rank correlation coefficients \(r(X_i,X_j)\).

  1. Convert Kendall’s tau or rank correlation coefficient to correlation using

    \[\rho_{ij}=\sin(\pi\tau_{ij}/2)=2\sin(\pi r_{ij}/6)\]

    and construct the Choleski decomposition \(\mathsf{S}=\mathsf{C}'\mathsf{C}\) of \(\mathsf{S}=(\rho_{ij})\).

  2. Generate \(r\) standard normal variables \(\mathsf{Y}=(Y_1,\dots,Y_r)\).

  3. Set \(\mathsf{Z}=\mathsf{Y}\mathsf{C}\).

  4. Set \(u_i=\Phi(Z_i)\) for \(i=1,\dots,r\).

  5. Set \(X_i=F_i^{-1}(u_i)\).

The vectors \((X_1,\dots,X_r)\) form a sample from a multivariate distribution with prescribed correlation structure and marginals \(F_i\).
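The five steps can be sketched as (Python with numpy and the standard library; the function and parameter names are ours):

```python
import numpy as np
from statistics import NormalDist

def normal_copula_sample(F_invs, S, size, rng=None):
    """Normal copula algorithm above: correlated normals -> uniforms
    via Phi -> marginals via the inverse cdfs in F_invs (callables)."""
    rng = np.random.default_rng(rng)
    C = np.linalg.cholesky(np.asarray(S)).T       # S = C'C, C upper triangular
    Y = rng.standard_normal((size, len(F_invs)))  # independent standard normals
    Z = Y @ C                                     # correlated normals
    U = np.vectorize(NormalDist().cdf)(Z)         # uniforms u_i = Phi(Z_i)
    return np.column_stack([np.vectorize(F)(U[:, i]) for i, F in enumerate(F_invs)])
```

Unlike the IC method, each output row here is an independent draw from the target multivariate distribution.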

The Normal Copula method works because of the following theorem from Wang.

[wangThm] Assume that \((Z_1,\dots,Z_k)\) have a multivariate normal joint probability density function given by

\[f(z_1,\dots,z_k)=\frac{1}{\sqrt{(2\pi)^k|\Sigma|}}\exp(-\tfrac{1}{2}\mathsf{z}'\Sigma^{-1}\mathsf{z}),\]

where \(\mathsf{z}=(z_1,\dots,z_k)'\) and \(\Sigma\) is the matrix of correlation coefficients \(\Sigma_{ij}=\rho_{ij}=\rho(Z_i,Z_j)\). Let \(H(z_1,\dots,z_k)\) be their joint cumulative distribution function. Then

\[C(u_1,\dots,u_k)=H(\Phi^{-1}(u_1),\dots,\Phi^{-1}(u_k))\]

defines a multivariate uniform cumulative distribution function called the normal copula.

For any set of given marginal cumulative distribution functions \(F_1,\dots,F_k\), the set of variables

\[\label{ncm} X_1=F_1^{-1}(\Phi(Z_1)),\dots,X_k=F_k^{-1}(\Phi(Z_k))\]

have a joint cumulative function

\[F_{X_1,\dots,X_k}(x_1,\dots,x_k)=H(\Phi^{-1}(F_1(x_1)),\dots, \Phi^{-1}(F_k(x_k)))\]

with marginal cumulative distribution functions \(F_1,\dots,F_k\). The multivariate variables \((X_1,\dots,X_k)\) have Kendall’s tau

\[\tau(X_i,X_j)=\frac{2}{\pi}\arcsin(\rho_{ij})\]

and Spearman’s rank correlation coefficients

\[r(X_i,X_j)=\frac{6}{\pi}\arcsin(\rho_{ij}/2).\]
In the normal copula method we simulate from \(H\) and then invert using ([ncm]). In the IC method with normal scores we produce a sample from \(H\) such that \(\Phi(z_i)\) are equally spaced between zero and one and then, rather than invert the distribution functions, we make the \(j\)th order statistic from the input sample correspond to \(\Phi(z)=j/(n+1)\) where the input has \(n\) observations. Because the \(j\)th order statistic of a sample of \(n\) observations from a distribution \(F\) approximates \(F^{-1}(j/(n+1))\) we see the normal copula and IC methods are doing essentially the same thing.

While the normal copula method and the IC method are confusingly similar there are some important differences to bear in mind. Comparing and contrasting the two methods should help clarify how the two algorithms are different.

  1. Wang [1998] shows the normal copula method corresponds to the IC method when the latter is computed using normal scores and the Choleski trick.

  2. The IC method works on a given sample of marginal distributions. The normal copula method generates the sample by inverting the distribution function of each marginal as part of the simulation process.

  3. Through the use of scores, the IC method relies on a stratified sample of normal variables. The normal copula method could use a similar stratification, or it could sample randomly from the base normals. Conversely, a random sample of normals could be used in the IC method.

  4. Only the IC method has an adjustment to ensure that the reference multivariate distribution has exactly the required correlation structure.

  5. IC method samples have rank correlation exactly equal to a sample from a reference distribution with the correct linear correlation. Normal copula samples have approximately correct linear and rank correlations.

  6. An IC method sample must be taken in its entirety to be used correctly. The number of output points is fixed by the number of input points, and the sample is computed in its entirety in one step. Some IC tools (@Risk, SCARE) produce output which is in a particular order. Thus, if you sample the \(n\)th observation from multiple simulations, or take the first \(n\) samples, you will not get a random sample from the desired distribution. However, if you select random rows from multiple simulations (or, equivalently, if you randomly permute the rows output prior to selecting the \(n\)th) then you will obtain the desired random sample. It is important to be aware of these issues before using canned software routines.

  7. The normal copula method produces simulations one at a time, and at each iteration the resulting sample is a sample from the required multivariate distribution. That is, output from the algorithm can be partitioned and used in pieces.

In summary, these differences can have material practical consequences, and it is important not to misuse IC method samples.
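The ordering pitfall in items 6 and 7 can be shown with a toy sketch; the setup below is hypothetical (a two-column "output" deliberately sorted on its first column, standing in for canned IC software), not actual @Risk or SCARE behavior.

```python
import random

# Toy illustration of the sub-sampling pitfall: if the rows of an IC-style
# output are in a systematic order, the first k rows are not a random
# subsample; permuting the rows first fixes this.
random.seed(42)

# toy "IC-style output": 1000 pairs returned sorted on the first column,
# a deliberately non-random row order
sample = sorted((random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000))

first_100 = [x for x, _ in sample[:100]]   # biased: only the smallest values
rows = list(sample)
random.shuffle(rows)                       # permute rows before sub-sampling
shuffled_100 = [x for x, _ in rows[:100]]

biased_mean = sum(first_100) / 100         # far below the true mean of 0
fair_mean = sum(shuffled_100) / 100        # close to the true mean of 0
```

The shuffled subsample behaves like a random draw from the joint distribution; the unshuffled one does not.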

Theoretical Underpinnings of the Iman-Conover Method

The theoretical foundations of the Iman-Conover method are elegantly justified by Vitale’s theorem (Vitale [1990]). We will state Vitale’s theorem, explain its relationship to the IC method, and sketch the proof. The result should give a level of comfort to practitioners using a simulation approach to modeling multivariate distributions. It is not necessary to follow the details laid out here in order to understand and use the IC method, so the uninterested reader can skip the rest of the section. The presentation we give follows Vitale’s original paper closely.

Functional dependence and independence between two random variables are clearly opposite ends of the dependence spectrum. It is therefore surprising that Vitale’s Theorem says that any bivariate distribution \((U,V)\) can be approximated arbitrarily closely by a functionally dependent pair \((U,TU)\) for a suitable transformation \(T\).

In order to explain the setup of Vitale’s theorem we need to introduce some notation. Let \(n\) be a power of 2. An interval of the form \(((j-1)/n,j/n)\) for \(1\le j\le n\) is called a dyadic interval of rank \(n\). An invertible (Borel) measure-preserving map \(T\) which acts by translation on each dyadic interval of rank \(n\) is called a permutation of rank \(n\). Such a \(T\) just permutes the dyadic intervals, so there is a natural correspondence between permutations of \(n\) elements and such transformations \(T\). If the permutation of dyadic intervals consists of a single cycle (has order \(n\) in the symmetric group) then \(T\) is called a cyclic permutation.
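As a concrete illustration, a permutation of rank \(n\) can be sketched in a few lines of Python; `sigma` and the helper name are illustrative, not from the source.

```python
# A "permutation of rank n": T translates each dyadic interval
# ((j-1)/n, j/n) onto another one, following a permutation sigma of the
# (0-indexed) interval indices.
def rank_n_permutation(sigma):
    n = len(sigma)
    def T(u):
        j = min(int(u * n), n - 1)        # interval containing u (0-indexed)
        return u + (sigma[j] - j) / n     # translate interval j onto sigma[j]
    return T

# a cyclic permutation of rank 4: interval 0 -> 1 -> 2 -> 3 -> 0
T = rank_n_permutation([1, 2, 3, 0])
```

For example, \(u=0.1\) lies in the first interval and is translated up by \(1/4\) to \(0.35\), while \(u=0.9\) lies in the last interval and is translated down by \(3/4\) to \(0.15\); the map is invertible and measure-preserving.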

Theorem. (Vitale) Let \(U\) and \(V\) be uniformly distributed random variables. There is a sequence of cyclic permutations \(T_1,T_2,\dots\) such that \((U,T_nU)\) converges in distribution to \((U,V)\) as \(n\to \infty\).

Recall convergence in distribution means that the distribution function of \((U,T_nU)\) tends to that of \((U,V)\) at all points of continuity as \(n\to\infty\).

The proof of Vitale’s theorem is quite instructive and so we give a detailed sketch.

The proof is in two parts. The first constructs a sequence of arbitrary permutations \(T_n\) with the desired property. The second shows that each \(T_n\) can be approximated by cyclic permutations. We skip the second refinement.

Divide the square \([0,1]\times [0,1]\) into sub-squares. We will find a permutation \(T\) such that the distributions of \((U,V)\) and \((U,TU)\) coincide on sub-squares. Reducing the size of the sub-squares will prove the result.

Fix \(n\), a power of two. Let \(I_j=((j-1)/n,j/n)\), \(j=1,\dots,n\). We will find an invertible permutation \(T\) such that

\[\Pr(U\in I_j,TU\in I_k)=\Pr(U\in I_j,V\in I_k):=p_{jk}\]

for \(j,k=1,\dots,n\). Define

\[\begin{split}\begin{aligned} I_{j1}&=((j-1)/n,\ (j-1)/n+p_{j1}) \\ I_{j2}&=((j-1)/n+p_{j1},\ (j-1)/n+p_{j1}+p_{j2}) \\ &\;\;\vdots \\ I_{jn}&=((j-1)/n+p_{j1}+\cdots+p_{j,n-1},\ j/n)\end{aligned}\end{split}\]

and

\[\begin{split}\begin{aligned} \tilde I_{j1}&=((j-1)/n,\ (j-1)/n+p_{1j}) \\ \tilde I_{j2}&=((j-1)/n+p_{1j},\ (j-1)/n+p_{1j}+p_{2j}) \\ &\;\;\vdots \\ \tilde I_{jn}&=((j-1)/n+p_{1j}+\cdots+p_{n-1,j},\ j/n).\end{aligned}\end{split}\]

By construction the measure of \(I_{jk}\) equals the measure of \(\tilde I_{kj}\). The invertible map \(T\) which sends each \(I_{jk}\) to \(\tilde I_{kj}\) by translation is the map we need because

\[\begin{split}\begin{aligned} \Pr(U\in I_j, T(U)\in I_k) &= \Pr(U\in I_j, U \in T^{-1}(I_k)) \\ &= \Pr(U\in I_j \cap T^{-1}(\bigcup_l \tilde I_{kl})) \\ &= \Pr(U\in \bigcup_l I_j \cap I_{lk}) \\ &= \Pr(U\in I_{jk}) \\ &= p_{jk},\end{aligned}\end{split}\]

since the only \(I_{lk}\) which intersects \(I_j\) is \(I_{jk}\) by construction, and \(U\) is uniform. The transformation \(T\) is illustrated schematically in Table 1 for \(n=3\). The fact that 3 is not a power of 2 does not invalidate the schematic!

\[\begin{split}\small \begin{matrix} \begin{array}{c || c||c|c|c||c|c|c||c|c|c||} \hline & \tilde I_{33} & & & & & & & & & p_{33} \\ I_3 & \tilde I_{32} & & & & & & p_{23} & & & \\ & \tilde I_{31} & & & p_{13} & & & & & & \\ \hline \hline & \tilde I_{23} & & & & & & & & p_{32} & \\ I_2 & \tilde I_{22} & & & & & p_{22} & & & & \\ & \tilde I_{21} & & p_{12} & & & & & & & \\ \hline \hline & \tilde I_{13} & & & & & & & p_{31} & & \\ I_1 & \tilde I_{12} & & & & p_{21} & & & & & \\ & \tilde I_{11} & p_{11} & & & & & & & & \\ \hline \hline & & I_{11} & I_{12} & I_{13} & I_{21} & I_{22} & I_{23} & I_{31} & I_{32} & I_{33} \\ \hline \hline & & & I_1 & & & I_2 & & & I_3 \\ \end{array} \end{matrix}\end{split}\]

If each \(p_{jk}\) is a dyadic rational then \(T\) is a permutation of the interval. If not, then we approximate and use some heavier-duty results (a 1946 theorem of Birkhoff on representing doubly stochastic matrices as convex combinations of permutation matrices) to complete the proof.
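The interval construction in the proof can be checked numerically. The sketch below (illustrative values, not from the source) builds \(T\) for \(n=2\) and verifies that \((U,TU)\) reproduces the prescribed cell probabilities.

```python
# Given cell probabilities p[j][k] = Pr(U in I_j, V in I_k), build T sending
# each sub-interval I_{jk} onto tilde-I_{kj} by translation, then check that
# (U, TU) reproduces the cell probabilities.
def partition(j, weights, n):
    """Sub-intervals of I_{j+1} = (j/n, (j+1)/n) with the given lengths."""
    lo, out = j / n, []
    for w in weights:
        out.append((lo, lo + w))
        lo += w
    return out

n = 2
p = [[0.3, 0.2],      # rows and columns sum to 1/n = 0.5,
     [0.2, 0.3]]      # so both marginals are uniform

I = [partition(j, p[j], n) for j in range(n)]                   # I[j][k] = I_{jk}
Itil = [partition(k, [p[j][k] for j in range(n)], n)            # Itil[k][j]
        for k in range(n)]                                      #   = tilde-I_{kj}

def T(u):
    for j in range(n):
        for k in range(n):
            a, b = I[j][k]
            if a <= u < b:
                return Itil[k][j][0] + (u - a)  # translate I_{jk} onto tilde-I_{kj}
    raise ValueError("u outside [0, 1)")

# estimate Pr(U in I_j, TU in I_k) on a fine grid of u values
N = 100000
count = [[0] * n for _ in range(n)]
for i in range(N):
    u = (i + 0.5) / N
    count[int(u * n)][int(T(u) * n)] += 1
est = [[c / N for c in row] for row in count]
```

The estimated cell probabilities `est` match `p` to grid accuracy, as the proof requires.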

Vitale’s theorem can be extended to non-uniform distributions.

Corollary. (Vitale) Let \(U\) and \(V\) be arbitrary random variables. There is a sequence of functions \(S_1,S_2,\dots\) such that \((U,S_n U)\) converges in distribution to \((U,V)\) as \(n\to\infty\).

Let \(F\) be the distribution function of \(U\) and \(G\) that of \(V\), and suppose both are continuous. Then \(F(U)\) and \(G(V)\) are uniformly distributed. Apply Vitale’s theorem to get a sequence of permutations \(T_n\). Then \(S_n=G^{-1}T_nF\) is the required sequence of transformations.
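A minimal sketch of the corollary's transformation \(S=G^{-1}TF\), taking \(F=G\) to be the Exponential(1) distribution function and \(T\) the rank-2 permutation that swaps the two halves of \((0,1)\); all names are illustrative.

```python
import math

def F(x):
    return 1 - math.exp(-x)        # Exponential(1) distribution function

def G_inv(prob):
    return -math.log(1 - prob)     # its inverse (here G = F)

def T(u):
    return u + 0.5 if u < 0.5 else u - 0.5   # swap (0, 1/2) and (1/2, 1)

def S(x):
    return G_inv(T(F(x)))          # the corollary's transformation S = G^-1 T F

# If X ~ Exponential(1) then F(X) is uniform, T preserves uniformity, and
# G^-1 maps uniform back to Exponential(1), so S(X) ~ Exponential(1) too.
# Check on a quantile grid: the sorted transformed values match the
# sorted original quantiles.
N = 1000
grid = [(i + 0.5) / N for i in range(N)]
transformed = sorted(S(G_inv(u)) for u in grid)
original = sorted(G_inv(u) for u in grid)
```

\(S\) is a deterministic function, yet \((X, SX)\) has exponential marginals in both coordinates, which is the point of the corollary.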