# Statistics

Smile provides many statistical functions for describing and analyzing data.

## Basic Statistic Functions

Use the following functions to calculate the descriptive statistics for your data: `sum`, `mean`, `median`, `q1`, `q3`, `variance`, `sd`, `mad` (median absolute deviation), `min`, `max`, `whichMin`, `whichMax`, etc.

``````
smile> val x = Array(1.0, 2.0, 3.0, 4.0)
x: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

smile> mean(x)
res1: Double = 2.5

smile> sd(x)
res2: Double = 1.2909944487358054
``````
``````
smile> import static smile.math.MathEx.*

smile> import smile.stat.*

smile> double[] x = {1.0, 2.0, 3.0, 4.0}
x ==> double[4] { 1.0, 2.0, 3.0, 4.0 }

smile> mean(x)
$4 ==> 2.5

smile> sd(x)
$5 ==> 1.2909944487358054
``````
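To make the definitions concrete, the following plain-Java sketch recomputes the mean, standard deviation, and median by hand. It does not use Smile; the class `DescriptiveSketch` and its methods are hypothetical names chosen for illustration. The value of `sd(x)` in the sessions above is consistent with the sample standard deviation, which divides by n - 1, so the sketch assumes that convention.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of Smile.
class DescriptiveSketch {
    /** Arithmetic mean. */
    static double mean(double[] x) {
        return Arrays.stream(x).sum() / x.length;
    }

    /** Sample variance, assuming the n - 1 denominator. */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** Sample standard deviation. */
    static double sd(double[] x) {
        return Math.sqrt(variance(x));
    }

    /** Median of a copy of the data, so the input array is left untouched. */
    static double median(double[] x) {
        double[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        System.out.println(mean(x));   // 2.5
        System.out.println(sd(x));     // sqrt(5/3), about 1.29099
        System.out.println(median(x)); // 2.5
    }
}
```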

## Distributions

Probability distributions are theoretical distributions based on assumptions about a source population. The distributions assign probability to the event that a random variable has a specific, discrete value, or falls within a specified range of continuous values.

All univariate distributions in Smile implement the interface `smile.stat.distribution.Distribution`. Smile supports the Bernoulli, beta, binomial, χ2, exponential, F, gamma, Gaussian, geometric, hypergeometric, logistic, log-normal, negative binomial, Poisson, shifted geometric, t, and Weibull distributions. In addition, the multivariate Gaussian distribution is supported. Smile also supports finite mixture models and can estimate exponential family mixture models from data.

A `Distribution` object can be created with given parameters, or by estimating the parameters from a data set with the static method `fit`. With a `Distribution` object, we can access its parameters, mean, variance, standard deviation, and entropy; generate a random number following the distribution (`rand`); and evaluate its probability density function (the method `p`) or cumulative distribution function (`cdf`). The inverse of `cdf` is `quantile`. We can also calculate the likelihood or log-likelihood of a sample set.

``````
smile> val e = new ExponentialDistribution(1.0)
e: smile.stat.distribution.ExponentialDistribution = Exponential Distribution(1.0000)

smile> e.mean
res3: Double = 1.0

smile> e.variance
res4: Double = 1.0

smile> e.sd
res5: Double = 1.0

smile> e.entropy
res6: Double = 1.0

// generate a random number
smile> e.rand
res7: Double = 0.3155668608029686

// PDF
smile> e.p(2.0)
res8: Double = 0.1353352832366127

smile> e.cdf(2.0)
res9: Double = 0.8646647167633873

smile> e.quantile(0.1)
res10: Double = 0.10536051565782628

smile> e.logLikelihood(Array(1.0, 1.1, 0.9, 1.5))
res12: Double = -4.5

// estimate a distribution from data
smile> val e = ExponentialDistribution.fit(Array(1.0, 1.1, 0.9, 1.5, 1.8, 1.9, 2.0, 0.5))
e: smile.stat.distribution.ExponentialDistribution = Exponential Distribution(0.7477)
``````
``````
smile> import smile.stat.distribution.*

smile> var e = new ExponentialDistribution(1.0)
e ==> Exponential Distribution(1.0000)

smile> e.mean()
$8 ==> 1.0

smile> e.variance()
$9 ==> 1.0

smile> e.sd()
$10 ==> 1.0

smile> e.entropy()
$11 ==> 1.0

smile> e.rand()
$12 ==> 0.38422274023788616

smile> e.p(2)
$13 ==> 0.1353352832366127

smile> e.cdf(2)
$14 ==> 0.8646647167633873

smile> e.quantile(0.1)
$15 ==> 0.10536051565782628

smile> double[] samples = {1.0, 1.1, 0.9, 1.5}
samples ==> double[4] { 1.0, 1.1, 0.9, 1.5 }

smile> e.logLikelihood(samples)
$17 ==> -4.5

smile> double[] data = {1.0, 1.1, 0.9, 1.5, 1.8, 1.9, 2.0, 0.5}
data ==> double[8] { 1.0, 1.1, 0.9, 1.5, 1.8, 1.9, 2.0, 0.5 }

smile> var d = ExponentialDistribution.fit(data)
d ==> Exponential Distribution(0.7477)
``````
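The numbers in the exponential example follow from closed-form formulas, which the plain-Java sketch below spells out. It is independent of Smile: `ExponentialSketch` is a hypothetical class, and the sketch assumes the constructor argument is the rate λ, which is consistent with the outputs above (for example, `fit` returning the reciprocal of the sample mean).

```java
// Hypothetical class, not part of Smile; lambda is assumed to be the rate.
class ExponentialSketch {
    final double lambda;

    ExponentialSketch(double lambda) {
        if (lambda <= 0) throw new IllegalArgumentException("rate must be positive");
        this.lambda = lambda;
    }

    /** Density: lambda * exp(-lambda * x) for x >= 0. */
    double p(double x) {
        return x < 0 ? 0.0 : lambda * Math.exp(-lambda * x);
    }

    /** Cumulative distribution function: 1 - exp(-lambda * x). */
    double cdf(double x) {
        return x < 0 ? 0.0 : 1.0 - Math.exp(-lambda * x);
    }

    /** Inverse of cdf: -ln(1 - q) / lambda. */
    double quantile(double q) {
        return -Math.log(1.0 - q) / lambda;
    }

    /** Log-likelihood: n * ln(lambda) - lambda * sum(x). */
    double logLikelihood(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return x.length * Math.log(lambda) - lambda * sum;
    }

    /** Maximum likelihood estimate: the rate is the reciprocal of the sample mean. */
    static ExponentialSketch fit(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return new ExponentialSketch(x.length / sum);
    }

    public static void main(String[] args) {
        ExponentialSketch e = new ExponentialSketch(1.0);
        System.out.println(e.p(2.0));
        System.out.println(e.cdf(2.0));
        System.out.println(e.quantile(0.1));
        System.out.println(fit(new double[]{1.0, 1.1, 0.9, 1.5, 1.8, 1.9, 2.0, 0.5}).lambda);
    }
}
```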

Below is a more advanced example that estimates a mixture of Gaussian, exponential, and gamma distributions. The estimate depends on the random samples and on the initial guess, so it varies between runs; it can be quite accurate even for this complicated case.

``````
smile> val gaussian = new GaussianDistribution(-2.0, 1.0)
smile> val exp = new ExponentialDistribution(0.8)
smile> val gamma = new GammaDistribution(2.0, 3.0)

// generate the samples
smile> val data = Array.fill(500)(gaussian.rand()) ++ Array.fill(500)(exp.rand()) ++ Array.fill(1000)(gamma.rand())

// define the initial guess of the components in the mixture model
smile> val a = new Mixture.Component(0.3, new GaussianDistribution(0.0, 1.0))
smile> val b = new Mixture.Component(0.3, new ExponentialDistribution(1.0))
smile> val c = new Mixture.Component(0.4, new GammaDistribution(1.0, 2.0))

// estimate the model
smile> val mixture = ExponentialFamilyMixture.fit(data, a, b, c)
mixture: smile.stat.distribution.ExponentialFamilyMixture = Mixture[3]:{ (Gaussian Distribution(-2.0135, 0.9953):0.2478) (Exponential Distribution(0.7676):0.2882) (Gamma Distribution(2.7008, 2.4051):0.4640)}
``````
``````
smile> var gaussian = new GaussianDistribution(-2.0, 1.0)
gaussian ==> Gaussian Distribution(-2.0000, 1.0000)

smile> var exp = new ExponentialDistribution(0.8)
exp ==> Exponential Distribution(0.8000)

smile> var gamma = new GammaDistribution(2.0, 3.0)
gamma ==> Gamma Distribution(3.0000, 2.0000)

smile> import java.util.stream.*

smile> var data = DoubleStream.concat(
DoubleStream.concat(
DoubleStream.generate(gaussian::rand).limit(500),
DoubleStream.generate(exp::rand).limit(500)),
DoubleStream.generate(gamma::rand).limit(1000)).toArray()
data ==> double[2000] { -2.396693222610913, -3.11796309434 ... 0928, 2.995037488374675, 1

smile> var a = new Mixture.Component(0.3, new GaussianDistribution(0.0, 1.0))
a ==> smile.stat.distribution.Mixture$Component@32a068d1

smile> var b = new Mixture.Component(0.3, new ExponentialDistribution(1.0))
b ==> smile.stat.distribution.Mixture$Component@365c30cc

smile> var c = new Mixture.Component(0.4, new GammaDistribution(1.0, 2.0))
c ==> smile.stat.distribution.Mixture$Component@4148db48

smile> var mixture = ExponentialFamilyMixture.fit(data, a, b, c)
mixture ==> Mixture(3)[0.31 x Gaussian Distribution(-1.3630, 1.5056) + 0.17 x Exponential Distribution(0.5566) + 0.52 x Gamma Distribution(3.7170, 1.5014)]
``````

If the distribution family is not known, nonparametric methods such as kernel density estimation can be used. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. It is also known as the Parzen window method.

``````
smile> val k = new KernelDensity(data)
k: smile.stat.distribution.KernelDensity = smile.stat.distribution.KernelDensity@69724abb

smile> k.p(1.0)
res2: Double = 0.11397721599552492

smile> mixture.p(1.0)
res3: Double = 0.1272572973513569
``````
``````
smile> var k = new KernelDensity(data)
k ==> smile.stat.distribution.KernelDensity@146044d7

smile> k.p(1)
$32 ==> 0.11955905354604122

smile> mixture.p(1)
$33 ==> 0.14009430199392497
``````
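The idea behind kernel density estimation can be sketched in a few lines of plain Java. The sketch assumes a Gaussian kernel and takes the bandwidth `h` as an explicit argument; how Smile's `KernelDensity` picks its bandwidth is not reproduced here, and `KdeSketch` is a hypothetical class used only for illustration.

```java
// Hypothetical class, not part of Smile. Assumes a Gaussian kernel.
class KdeSketch {
    private final double[] data;
    private final double h; // bandwidth, chosen by the caller in this sketch

    KdeSketch(double[] data, double h) {
        if (h <= 0) throw new IllegalArgumentException("bandwidth must be positive");
        this.data = data.clone();
        this.h = h;
    }

    /** Density estimate at x: (1 / (n h)) * sum of K((x - x_i) / h), K = standard normal pdf. */
    double p(double x) {
        double sum = 0.0;
        for (double xi : data) {
            double z = (x - xi) / h;
            sum += Math.exp(-0.5 * z * z);
        }
        return sum / (data.length * h * Math.sqrt(2.0 * Math.PI));
    }

    public static void main(String[] args) {
        KdeSketch k = new KdeSketch(new double[]{-1.0, 0.0, 0.5, 2.0}, 0.8);
        System.out.println(k.p(0.0));
    }
}
```

A smaller bandwidth follows the samples more closely, while a larger one gives a smoother estimate.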

## Hypothesis Test

A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study (not controlled). In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.

### χ2 Test

#### One-Sample Test

Given an array containing the observed numbers of events (`bins` below), an array containing the expected probabilities of the events (`prob`), and the number of constraints (normally one), the test compares the observed counts with the expected ones. A small p-value indicates a significant difference between the distributions.

``````
smile> val bins = Array(20, 22, 13, 22, 10, 13)
bins: Array[Int] = Array(20, 22, 13, 22, 10, 13)

smile> val prob = Array(1.0/6, 1.0/6, 1.0/6, 1.0/6, 1.0/6, 1.0/6)
prob: Array[Double] = Array(0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666)

smile> chisqtest(bins, prob)
res8: stat.hypothesis.ChiSqTest = One Sample Chi-squared Test(t = 8.3600, df = 5.000, p-value = 0.137480)
``````
``````
smile> import smile.stat.hypothesis.*

smile> int[] bins = {20, 22, 13, 22, 10, 13}
bins ==> int[6] { 20, 22, 13, 22, 10, 13 }

smile> double[] prob = {1.0/6, 1.0/6, 1.0/6, 1.0/6, 1.0/6, 1.0/6}
prob ==> double[6] { 0.16666666666666666, 0.16666666666666 ... 666, 0.16666666666666666 }

smile> ChiSqTest.test(bins, prob)
$37 ==> One Sample Chi-squared Test(t = 8.3600, df = 5.000, p-value = 0.137480)
``````
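The test statistic reported above can be reproduced by hand: sum (O - E)² / E over the bins, where E is the total count times the expected probability. The plain-Java sketch below does this without Smile; `ChiSqOneSampleSketch` is a hypothetical class, and the p-value (which needs the χ2 cumulative distribution function) is left out.

```java
// Hypothetical class, not part of Smile. Computes the statistic and df only.
class ChiSqOneSampleSketch {
    /** Sum over bins of (observed - expected)^2 / expected. */
    static double statistic(int[] bins, double[] prob) {
        int n = 0;
        for (int b : bins) n += b;
        double chisq = 0.0;
        for (int i = 0; i < bins.length; i++) {
            double expected = n * prob[i];
            double d = bins[i] - expected;
            chisq += d * d / expected;
        }
        return chisq;
    }

    /** Degrees of freedom: number of bins minus number of constraints. */
    static int df(int[] bins, int constraints) {
        return bins.length - constraints;
    }

    public static void main(String[] args) {
        int[] bins = {20, 22, 13, 22, 10, 13};
        double[] prob = {1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6};
        System.out.println(statistic(bins, prob)); // about 8.36, as in the session above
        System.out.println(df(bins, 1));           // 5
    }
}
```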

#### Two-Sample Test

Given two arrays of binned data (`bins1` and `bins2` below) and one constraint, the two-sample χ2 test compares the two sets of counts. A small p-value indicates a significant difference between the two distributions.

``````
smile> val bins1 = Array(8, 13, 16, 10, 3)
bins1: Array[Int] = Array(8, 13, 16, 10, 3)

smile> val bins2 = Array(4,  9, 14, 16, 7)
bins2: Array[Int] = Array(4, 9, 14, 16, 7)

smile> chisqtest2(bins1, bins2)
res11: stat.hypothesis.ChiSqTest = Two Sample Chi-squared Test(t = 5.1786, df = 4.000, p-value = 0.269462)
``````
``````
smile> int[] bins1 = {8, 13, 16, 10, 3}
bins1 ==> int[5] { 8, 13, 16, 10, 3 }

smile> int[] bins2 = {4,  9, 14, 16, 7}
bins2 ==> int[5] { 4, 9, 14, 16, 7 }

smile> ChiSqTest.test(bins1, bins2)
$40 ==> Two Sample Chi-squared Test(t = 5.1786, df = 4.000, p-value = 0.269462)
``````
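A common textbook form of the two-sample statistic is the sum over bins of (√(S/R)·R_i - √(R/S)·S_i)² / (R_i + S_i), where R and S are the totals of the two samples. The sketch below implements that form in plain Java; it reproduces the statistic shown above, but it is an assumption that Smile uses exactly this formula, and `ChiSqTwoSampleSketch` is a hypothetical class. Skipping bins that are empty in both samples is also an assumption of the sketch.

```java
// Hypothetical class, not part of Smile. Textbook two-sample chi-squared statistic.
class ChiSqTwoSampleSketch {
    static double statistic(int[] bins1, int[] bins2) {
        double r = 0.0, s = 0.0;
        for (int b : bins1) r += b;
        for (int b : bins2) s += b;
        double scale1 = Math.sqrt(s / r); // rescales the first sample
        double scale2 = Math.sqrt(r / s); // rescales the second sample
        double chisq = 0.0;
        for (int i = 0; i < bins1.length; i++) {
            int total = bins1[i] + bins2[i];
            if (total == 0) continue; // a bin empty in both samples contributes nothing
            double d = scale1 * bins1[i] - scale2 * bins2[i];
            chisq += d * d / total;
        }
        return chisq;
    }

    /** Non-empty bins minus the number of constraints. */
    static int df(int[] bins1, int[] bins2, int constraints) {
        int used = 0;
        for (int i = 0; i < bins1.length; i++) {
            if (bins1[i] + bins2[i] > 0) used++;
        }
        return used - constraints;
    }

    public static void main(String[] args) {
        int[] bins1 = {8, 13, 16, 10, 3};
        int[] bins2 = {4, 9, 14, 16, 7};
        System.out.println(statistic(bins1, bins2)); // about 5.1786
        System.out.println(df(bins1, bins2, 1));     // 4
    }
}
```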

#### Independence Test

The independence test operates on a two-dimensional contingency table given as an array of integer arrays. The rows of the contingency table are labeled by the values of one nominal variable, the columns are labeled by the values of the other nominal variable, and the entries are non-negative integers giving the number of observed events for each combination of row and column. A continuity correction is applied when computing the test statistic for 2x2 tables: one half is subtracted from all |O-E| differences. The correlation coefficient is calculated as Cramér's V.

``````
smile> val x = Array(Array(12, 7), Array(5, 7))
x: Array[Array[Int]] = Array(Array(12, 7), Array(5, 7))

smile> chisqtest(x)
res13: stat.hypothesis.ChiSqTest = Pearson's Chi-squared Test(t = 0.6411, df = 1.000, p-value = 0.423305)
``````
``````
smile> int[][] x = { {12, 7}, {5, 7} }
x ==> int[2][] { int[2] { 12, 7 }, int[2] { 5, 7 } }

smile> ChiSqTest.test(x)
$42 ==> Pearson's Chi-squared Test(t = 0.6411, df = 1.000, p-value = 0.423305)
``````
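The continuity correction described above is easy to check by hand. In the plain-Java sketch below, the expected count of each cell is (row total × column total) / grand total, and for 2x2 tables one half is subtracted from each |O - E| before squaring. `ChiSqIndependenceSketch` is a hypothetical class, not Smile's implementation; the p-value and Cramér's V are left out.

```java
// Hypothetical class, not part of Smile. Statistic and df only.
class ChiSqIndependenceSketch {
    static double statistic(int[][] table) {
        int rows = table.length, cols = table[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double n = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += table[i][j];
                colSum[j] += table[i][j];
                n += table[i][j];
            }
        }
        boolean correct = rows == 2 && cols == 2; // continuity correction for 2x2 tables only
        double chisq = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / n;
                double d = Math.abs(table[i][j] - expected);
                if (correct) d -= 0.5;
                chisq += d * d / expected;
            }
        }
        return chisq;
    }

    /** Degrees of freedom: (rows - 1) * (columns - 1). */
    static int df(int[][] table) {
        return (table.length - 1) * (table[0].length - 1);
    }

    public static void main(String[] args) {
        int[][] x = {{12, 7}, {5, 7}};
        System.out.println(statistic(x)); // about 0.6411
        System.out.println(df(x));        // 1
    }
}
```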

### F Test

The F test checks whether the arrays x and y have significantly different variances. A small p-value indicates that the two arrays have significantly different variances.

``````
smile> val x = Array(0.48074284, -0.52975023, 1.28590721, 0.63456079, -0.41761197, 2.76072411,
1.30321095, -1.16454533, 2.27210509, 1.46394553, -0.31713164, 1.26247543,
2.65886430, 0.40773450, 1.18055440, -0.39611251, 2.13557687, 0.40878860,
1.28461394, -0.02906355)
x: Array[Double] = Array(
0.48074284,
-0.52975023,
1.28590721,
0.63456079,
-0.41761197,
2.76072411,
1.30321095,
-1.16454533,
2.27210509,
1.46394553,
-0.31713164,
1.26247543,
2.6588643,
0.4077345,
1.1805544,
-0.39611251,
2.13557687,
0.4087886,
1.28461394,
-0.02906355
)

smile> val y = Array(1.7495879, 1.9359727, 3.1294928, 0.0861894, 2.1643415, 0.1913219,
-0.3947444, 1.6910837, 1.1548294, 0.2763955, 0.4794719, 3.1805501,
1.5700497, 2.6860190, -0.4410879, 1.8900183, 1.3422381, -0.1701592)
y: Array[Double] = Array(
1.7495879,
1.9359727,
3.1294928,
0.0861894,
2.1643415,
0.1913219,
-0.3947444,
1.6910837,
1.1548294,
0.2763955,
0.4794719,
3.1805501,
1.5700497,
2.686019,
-0.4410879,
1.8900183,
1.3422381,
-0.1701592
)

smile> ftest(x, y)
res16: stat.hypothesis.FTest = F-test(f = 1.0958, df1 = 17, df2 = 19, p-value = 0.841464)

smile> val z = Array(0.6621329, 0.4688975, -0.1553013, 0.4564548, 2.2776146, 2.1543678,
2.8555142, 1.5852899, 0.9091290, 1.6060025, 1.0111968, 1.2479493,
0.9407034, 1.7167572, 0.5380608, 2.1290007, 1.8695506, 1.2139096)
z: Array[Double] = Array(
0.6621329,
0.4688975,
-0.1553013,
0.4564548,
2.2776146,
2.1543678,
2.8555142,
1.5852899,
0.909129,
1.6060025,
1.0111968,
1.2479493,
0.9407034,
1.7167572,
0.5380608,
2.1290007,
1.8695506,
1.2139096
)

smile> ftest(x, z)
res18: stat.hypothesis.FTest = F-test(f = 2.0460, df1 = 19, df2 = 17, p-value = 0.143778)
``````
``````
smile> double[] x = {0.48074284, -0.52975023, 1.28590721, 0.63456079, -0.41761197, 2.76072411,
1.30321095, -1.16454533, 2.27210509, 1.46394553, -0.31713164, 1.26247543,
2.65886430, 0.40773450, 1.18055440, -0.39611251, 2.13557687, 0.40878860,
1.28461394, -0.02906355}
x ==> double[20] { 0.48074284, -0.52975023, 1.28590721, ...  1.28461394, -0.02906355 }

smile> double[] y = {1.7495879, 1.9359727, 3.1294928, 0.0861894, 2.1643415, 0.1913219,
-0.3947444, 1.6910837, 1.1548294, 0.2763955, 0.4794719, 3.1805501,
1.5700497, 2.6860190, -0.4410879, 1.8900183, 1.3422381, -0.1701592}
y ==> double[18] { 1.7495879, 1.9359727, 3.1294928, 0.0 ... 3, 1.3422381, -0.1701592 }

smile> FTest.test(x, y)
$45 ==> F-test(f = 1.0958, df1 = 17, df2 = 19, p-value = 0.841464)

smile> double[] z = {0.6621329, 0.4688975, -0.1553013, 0.4564548, 2.2776146, 2.1543678,
2.8555142, 1.5852899, 0.9091290, 1.6060025, 1.0111968, 1.2479493,
0.9407034, 1.7167572, 0.5380608, 2.1290007, 1.8695506, 1.2139096}
z ==> double[18] { 0.6621329, 0.4688975, -0.1553013, 0. ... 07, 1.8695506, 1.2139096 }

smile> FTest.test(x, z)
$47 ==> F-test(f = 2.0460, df1 = 19, df2 = 17, p-value = 0.143778)
``````
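The F statistic is the ratio of the two sample variances. The degrees of freedom in the outputs above (`df1 = 17` for `x` against `y`, but `df1 = 19` for `x` against `z`) are consistent with placing the larger variance in the numerator, and the plain-Java sketch below assumes that convention. `FTestSketch` is a hypothetical class; the p-value, which needs the F distribution, is not computed.

```java
// Hypothetical class, not part of Smile. Assumes the larger variance goes in the numerator.
class FTestSketch {
    /** Sample variance with the n - 1 denominator. */
    static double variance(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        return ss / (x.length - 1);
    }

    /** Returns {f, df1, df2}. */
    static double[] statistic(double[] x, double[] y) {
        double vx = variance(x), vy = variance(y);
        if (vx >= vy) {
            return new double[]{vx / vy, x.length - 1, y.length - 1};
        }
        return new double[]{vy / vx, y.length - 1, x.length - 1};
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};        // variance 5/3
        double[] y = {2.0, 4.0, 6.0, 8.0, 10.0};  // variance 10
        double[] r = statistic(x, y);
        System.out.println("f = " + r[0] + ", df1 = " + r[1] + ", df2 = " + r[2]);
    }
}
```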

### t Test

#### One-Sample Test

The one-sample t-test checks whether the mean of a normally distributed population equals the value specified in the null hypothesis. A small p-value indicates that the mean of the array differs significantly from that value.

``````
smile> ttest(x, 1.0)
res19: stat.hypothesis.TTest = One Sample t-test(t = -0.6641, df = 19.000, p-value = 0.514609)

smile> ttest(x, 1.1)
res20: stat.hypothesis.TTest = One Sample t-test(t = -1.0648, df = 19.000, p-value = 0.300300)
``````
``````
smile> TTest.test(x, 1.0)
$48 ==> One Sample t-test(t = -0.6641, df = 19.000, p-value = 0.514609)

smile> TTest.test(x, 1.1)
$49 ==> One Sample t-test(t = -1.0648, df = 19.000, p-value = 0.300300)
``````
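The one-sample statistic is t = (mean - μ0) / (s / √n) with n - 1 degrees of freedom, where s is the sample standard deviation. A minimal plain-Java sketch, independent of Smile, is shown here; `OneSampleTSketch` is a hypothetical class and the p-value from the t distribution is omitted.

```java
// Hypothetical class, not part of Smile. Returns the statistic and df only.
class OneSampleTSketch {
    /** Returns {t, df} for the null hypothesis that the population mean equals mu0. */
    static double[] statistic(double[] x, double mu0) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1));      // sample standard deviation
        double t = (mean - mu0) / (sd / Math.sqrt(n));
        return new double[]{t, n - 1};
    }

    public static void main(String[] args) {
        double[] r = statistic(new double[]{1.0, 2.0, 3.0, 4.0}, 2.0);
        System.out.println("t = " + r[0] + ", df = " + r[1]);
    }
}
```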

#### Paired Two-Sample Test

Given the paired arrays x and y, which must have the same length, test if they have significantly different means. A small p-value indicates that the two arrays have significantly different means.

``````
smile> ttest(y, z)
res21: stat.hypothesis.TTest = Paired t-test(t = -0.1502, df = 17.000, p-value = 0.882382)
``````
``````
smile> TTest.testPaired(y, z)
$53 ==> Paired t-test(t = -0.1502, df = 17.000, p-value = 0.882382)
``````
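A paired test reduces to a one-sample test on the element-wise differences: t = mean(d) / (sd(d) / √n) with n - 1 degrees of freedom. The plain-Java sketch below illustrates this; `PairedTSketch` is a hypothetical class that does not use Smile and does not compute the p-value.

```java
// Hypothetical class, not part of Smile. Returns the statistic and df only.
class PairedTSketch {
    /** Returns {t, df} computed from the differences x[i] - y[i]. */
    static double[] statistic(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("paired samples must have equal length");
        }
        int n = x.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) mean += x[i] - y[i];
        mean /= n;
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double d = x[i] - y[i] - mean;
            ss += d * d;
        }
        double sd = Math.sqrt(ss / (n - 1)); // standard deviation of the differences
        return new double[]{mean / (sd / Math.sqrt(n)), n - 1};
    }

    public static void main(String[] args) {
        double[] r = statistic(new double[]{2.0, 4.0, 6.0, 8.0}, new double[]{1.0, 2.0, 3.0, 4.0});
        System.out.println("t = " + r[0] + ", df = " + r[1]);
    }
}
```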

#### Independent (Unpaired) Two-Sample Test

Test if the arrays x and y have significantly different means. A small p-value indicates that the two arrays have significantly different means. If the parameter equalVariance is true, the data arrays are assumed to be drawn from populations with the same true variance. Otherwise, the data arrays are allowed to be drawn from populations with unequal variances.

``````
smile> ttest2(x, y)
res22: stat.hypothesis.TTest = Unequal Variance Two Sample t-test(t = -1.1219, df = 35.167, p-value = 0.269491)

smile> ttest2(x, y, true)
res23: stat.hypothesis.TTest = Equal Variance Two Sample t-test(t = -1.1247, df = 36.000, p-value = 0.268153)

smile> ttest2(x, z)
res24: stat.hypothesis.TTest = Unequal Variance Two Sample t-test(t = -1.5180, df = 34.025, p-value = 0.138243)

smile> ttest2(x, z, true)
res25: stat.hypothesis.TTest = Equal Variance Two Sample t-test(t = -1.4901, df = 36.000, p-value = 0.144906)
``````
``````
smile> TTest.test(x, y, false)
$54 ==> Unequal Variance Two Sample t-test(t = -1.1219, df = 35.167, p-value = 0.269491)

smile> TTest.test(x, y, true)
$55 ==> Equal Variance Two Sample t-test(t = -1.1247, df = 36.000, p-value = 0.268153)

smile> TTest.test(x, z, false)
$56 ==> Unequal Variance Two Sample t-test(t = -1.5180, df = 34.025, p-value = 0.138243)

smile> TTest.test(x, z, true)
$57 ==> Equal Variance Two Sample t-test(t = -1.4901, df = 36.000, p-value = 0.144906)
``````
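The two variants differ in how the standard error and the degrees of freedom are computed. The plain-Java sketch below uses the standard textbook formulas: Welch's statistic with the Welch-Satterthwaite degrees of freedom for unequal variances, and the pooled-variance statistic with n1 + n2 - 2 degrees of freedom for equal variances. The fractional and integer `df` values in the outputs above are consistent with these formulas, but the sketch is not Smile's code, and `TwoSampleTSketch` is a hypothetical class.

```java
// Hypothetical class, not part of Smile. Returns the statistic and df only.
class TwoSampleTSketch {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    static double variance(double[] x) {
        double m = mean(x), ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** Unequal variances (Welch): returns {t, df}. */
    static double[] welch(double[] x, double[] y) {
        double a = variance(x) / x.length;
        double b = variance(y) / y.length;
        double t = (mean(x) - mean(y)) / Math.sqrt(a + b);
        double df = (a + b) * (a + b)
                  / (a * a / (x.length - 1) + b * b / (y.length - 1));
        return new double[]{t, df};
    }

    /** Equal variances (pooled): returns {t, df}. */
    static double[] pooled(double[] x, double[] y) {
        int n1 = x.length, n2 = y.length;
        double df = n1 + n2 - 2;
        double sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df;
        double t = (mean(x) - mean(y)) / Math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2));
        return new double[]{t, df};
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        double[] w = welch(x, y);
        double[] p = pooled(x, y);
        System.out.println("Welch:  t = " + w[0] + ", df = " + w[1]);
        System.out.println("Pooled: t = " + p[0] + ", df = " + p[1]);
    }
}
```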

### Kolmogorov–Smirnov Test

#### One-Sample Test

The one-sample K–S test checks the null hypothesis that the data set x is drawn from the given distribution. A small p-value shows that the cumulative distribution function of x is significantly different from the given distribution. The array x is modified by being sorted into ascending order.

``````
smile> val x = Array(
0.53236606, -1.36750258, -1.47239199, -0.12517888, -1.24040594, 1.90357309,
-0.54429527, 2.22084140, -1.17209146, -0.68824211, -1.75068914, 0.48505896,
2.75342248, -0.90675303, -1.05971929, 0.49922388, -1.23214498, 0.79284888,
0.85309580, 0.17903487, 0.39894754, -0.52744720, 0.08516943, -1.93817962,
0.25042913, -0.56311389, -1.08608388, 0.11912253, 2.87961007, -0.72674865,
1.11510699, 0.39970074, 0.50060532, -0.82531807, 0.14715616, -0.96133601,
-0.95699473, -0.71471097, -0.50443258, 0.31690224, 0.04325009, 0.85316056,
0.83602606, 1.46678847, 0.46891827, 0.69968175, 0.97864326, 0.66985742,
-0.20922486, -0.15265994)
x: Array[Double] = Array(
0.53236606,
-1.36750258,
-1.47239199,
-0.12517888,
-1.24040594,
1.90357309,
-0.54429527,
2.2208414,
-1.17209146,
-0.68824211,
-1.75068914,
0.48505896,
2.75342248,
-0.90675303,
-1.05971929,
0.49922388,
-1.23214498,
0.79284888,
0.8530958,
0.17903487,
0.39894754,
-0.5274472,
0.08516943,
-1.93817962,
...

smile> kstest(x, new GaussianDistribution(0, 1))
res27: stat.hypothesis.KSTest = Gaussian Distribution(0.0000, 1.0000) Kolmogorov-Smirnov Test(d = 0.0930, p-value = 0.759824)
``````
``````
smile> double[] x = {0.53236606, -1.36750258, -1.47239199, -0.12517888, -1.24040594, 1.90357309,
-0.54429527, 2.22084140, -1.17209146, -0.68824211, -1.75068914, 0.48505896,
2.75342248, -0.90675303, -1.05971929, 0.49922388, -1.23214498, 0.79284888,
0.85309580, 0.17903487, 0.39894754, -0.52744720, 0.08516943, -1.93817962,
0.25042913, -0.56311389, -1.08608388, 0.11912253, 2.87961007, -0.72674865,
1.11510699, 0.39970074, 0.50060532, -0.82531807, 0.14715616, -0.96133601,
-0.95699473, -0.71471097, -0.50443258, 0.31690224, 0.04325009, 0.85316056,
0.83602606, 1.46678847, 0.46891827, 0.69968175, 0.97864326, 0.66985742,
-0.20922486, -0.15265994}
x ==> double[50] { 0.53236606, -1.36750258, -1.47239199 ... -0.20922486, -0.15265994 }

smile> KSTest.test(x, new GaussianDistribution(0, 1))
$59 ==> Gaussian Distribution(0.0000, 1.0000) Kolmogorov-Smirnov Test(d = 0.0930, p-value = 0.759824)
``````
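The statistic `d` is the largest vertical distance between the empirical distribution function of the sample and the reference cumulative distribution function. The plain-Java sketch below computes it for any CDF passed as a function; `KsOneSampleSketch` is a hypothetical class, it sorts a copy rather than the input array, and it does not compute the p-value.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical class, not part of Smile. Computes the K-S distance only.
class KsOneSampleSketch {
    static double statistic(double[] x, DoubleUnaryOperator cdf) {
        double[] s = x.clone(); // sort a copy; the caller's array is not modified
        Arrays.sort(s);
        int n = s.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(s[i]);
            double below = (double) i / n;       // empirical CDF just before s[i]
            double above = (double) (i + 1) / n; // empirical CDF at s[i]
            d = Math.max(d, Math.max(Math.abs(f - below), Math.abs(above - f)));
        }
        return d;
    }

    public static void main(String[] args) {
        // Compare three points with the uniform distribution on [0, 1], whose CDF is F(v) = v.
        double[] x = {0.25, 0.5, 0.75};
        System.out.println(statistic(x, v -> v)); // 0.25
    }
}
```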

#### Two-Sample Test

The two-sample K–S test checks the null hypothesis that the two data sets are drawn from the same distribution. A small p-value shows that the cumulative distribution function of x is significantly different from that of y. The arrays x and y are modified by being sorted into ascending order.

``````
smile> val y = Array(
0.95791391, 0.16203847, 0.56622013, 0.39252941, 0.99126354, 0.65639108,
0.07903248, 0.84124582, 0.76718719, 0.80756577, 0.12263981, 0.84733360,
0.85190907, 0.77896244, 0.84915723, 0.78225903, 0.95788055, 0.01849366,
0.21000365, 0.97951772, 0.60078520, 0.80534223, 0.77144013, 0.28495121,
0.41300867, 0.51547517, 0.78775718, 0.07564151, 0.82871088, 0.83988694)
y: Array[Double] = Array(
0.95791391,
0.16203847,
0.56622013,
0.39252941,
0.99126354,
0.65639108,
0.07903248,
0.84124582,
0.76718719,
0.80756577,
0.12263981,
0.8473336,
0.85190907,
0.77896244,
0.84915723,
0.78225903,
0.95788055,
0.01849366,
0.21000365,
0.97951772,
0.6007852,
0.80534223,
0.77144013,
0.28495121,
...

smile> kstest(x, y)
res29: stat.hypothesis.KSTest = Two Sample Kolmogorov-Smirnov Test(d = 0.4600, p-value = 0.000416466)
``````
``````
smile> double[] y = {0.95791391, 0.16203847, 0.56622013, 0.39252941, 0.99126354, 0.65639108,
0.07903248, 0.84124582, 0.76718719, 0.80756577, 0.12263981, 0.84733360,
0.85190907, 0.77896244, 0.84915723, 0.78225903, 0.95788055, 0.01849366,
0.21000365, 0.97951772, 0.60078520, 0.80534223, 0.77144013, 0.28495121,
0.41300867, 0.51547517, 0.78775718, 0.07564151, 0.82871088, 0.83988694}
y ==> double[30] { 0.95791391, 0.16203847, 0.56622013,  ... , 0.82871088, 0.83988694 }

smile> KSTest.test(x, y)
$61 ==> Two Sample Kolmogorov-Smirnov Test(d = 0.4600, p-value = 0.000416466)
``````
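For two samples, `d` is the largest distance between the two empirical distribution functions. One way to compute it is to walk through both sorted arrays in step, as in the plain-Java sketch below. `KsTwoSampleSketch` is a hypothetical class; it sorts copies of the inputs and leaves the p-value out.

```java
import java.util.Arrays;

// Hypothetical class, not part of Smile. Computes the two-sample K-S distance only.
class KsTwoSampleSketch {
    static double statistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            double v = Math.min(a[i], b[j]);
            // Advance both empirical CDFs past every observation equal to v.
            while (i < a.length && a[i] <= v) i++;
            while (j < b.length && b[j] <= v) j++;
            double fx = (double) i / a.length;
            double fy = (double) j / b.length;
            d = Math.max(d, Math.abs(fx - fy));
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(statistic(new double[]{1, 2, 3}, new double[]{4, 5, 6})); // 1.0
        System.out.println(statistic(new double[]{1, 3}, new double[]{2, 4}));       // 0.5
    }
}
```

Once one array is exhausted its empirical CDF is 1, so the distance can only shrink from there and the loop can stop.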

### Correlation Test

#### Pearson Correlation

A t-test is used to establish whether the correlation coefficient is significantly different from zero and, hence, whether there is evidence of an association between the two variables. The underlying assumption is that the data are a random sample from a normal distribution. If this is not true, it is better to use Spearman's coefficient of rank correlation (for non-parametric variables).

``````
smile> val x = Array(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
x: Array[Double] = Array(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)

smile> val y  = Array(2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
y: Array[Double] = Array(2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)

smile> pearsontest(x, y)
res32: stat.hypothesis.CorTest = Pearson Correlation Test(cor = 0.57, t = 1.8411, df = 7.000, p-value = 0.108173)
``````
``````
smile> double[] x = {44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1}
x ==> double[9] { 44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1 }

smile> double[] y = {2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8}
y ==> double[9] { 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8 }

smile> CorTest.pearson(x, y)
$64 ==> Pearson Correlation Test(cor = 0.57, t = 1.8411, df = 7.000, p-value = 0.108173)
``````
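The reported `t` follows from the correlation coefficient r through t = r·√((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The plain-Java sketch below computes both values; `PearsonSketch` is a hypothetical class, independent of Smile, and the p-value is not computed.

```java
// Hypothetical class, not part of Smile.
class PearsonSketch {
    /** Pearson correlation coefficient of x and y. */
    static double cor(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** t statistic with n - 2 degrees of freedom. */
    static double tStatistic(double r, int n) {
        return r * Math.sqrt((n - 2) / (1.0 - r * r));
    }

    public static void main(String[] args) {
        double[] x = {44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1};
        double[] y = {2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8};
        double r = cor(x, y);
        System.out.println("cor = " + r + ", t = " + tStatistic(r, x.length));
    }
}
```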

#### Spearman Rank Correlation

The Spearman rank correlation coefficient is a form of the Pearson coefficient computed on the data converted to rankings (i.e., when variables are ordinal). It can be used for non-parametric data, where the Pearson coefficient is not appropriate.

The raw scores are converted to ranks and the differences between the ranks of each observation on the two variables are calculated.

The p-value is calculated by approximation, which is good for n > 10.

``````
smile> spearmantest(x, y)
res33: stat.hypothesis.CorTest = Spearman Correlation Test(cor = 0.60, t = 1.9843, df = 7.000, p-value = 0.0876228)
``````
``````
smile> CorTest.spearman(x, y)
$65 ==> Spearman Correlation Test(cor = 0.60, t = 1.9843, df = 7.000, p-value = 0.0876228)
``````
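Converting scores to ranks and then applying the Pearson formula can be sketched in plain Java as follows. Ties receive the average of the ranks they span, which is a common convention and an assumption here. `SpearmanSketch` is a hypothetical class, not Smile's implementation.

```java
import java.util.Arrays;

// Hypothetical class, not part of Smile. Ties get average ranks (an assumption).
class SpearmanSketch {
    /** 1-based ranks of x; tied values share the average of their ranks. */
    static double[] ranks(double[] x) {
        int n = x.length;
        Integer[] idx = new Integer[n];
        for (int k = 0; k < n; k++) idx[k] = k;
        Arrays.sort(idx, (p, q) -> Double.compare(x[p], x[q]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && x[idx[j + 1]] == x[idx[i]]) j++;
            double avg = 0.5 * (i + j) + 1.0; // average of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** Spearman's rho: the Pearson correlation of the two rank vectors. */
    static double cor(double[] x, double[] y) {
        double[] rx = ranks(x), ry = ranks(y);
        int n = rx.length;
        double mean = (n + 1) / 2.0; // mean of the ranks 1..n, also with averaged ties
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int k = 0; k < n; k++) {
            double dx = rx[k] - mean, dy = ry[k] - mean;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1};
        double[] y = {2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8};
        System.out.println(cor(x, y)); // about 0.6, as in the session above
    }
}
```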

#### Kendall Rank Correlation

The Kendall Tau Rank Correlation Coefficient is used to measure the degree of correspondence between sets of rankings where the measures are not equidistant. It is used with non-parametric data. The p-value is calculated by approximation, which is good for n > 10.

``````
smile> kendalltest(x, y)
res34: stat.hypothesis.CorTest = Kendall Correlation Test(cor = 0.44, t = 1.6681, df = 0.000, p-value = 0.0952928)
``````
``````
smile> CorTest.kendall(x, y)
$66 ==> Kendall Correlation Test(cor = 0.44, t = 1.6681, df = 0.000, p-value = 0.0952928)
``````
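Kendall's tau counts concordant and discordant pairs. The plain-Java sketch below uses a textbook form that also handles ties, together with the usual normal approximation, whose variance is (4n + 10) / (9n(n - 1)). The values it produces for the data above agree with the reported `cor` and `t`, but it is an assumption that Smile computes them in exactly this way; `KendallSketch` is a hypothetical class.

```java
// Hypothetical class, not part of Smile. Textbook tau with a normal approximation.
class KendallSketch {
    static double tau(double[] x, double[] y) {
        int n = x.length;
        long n1 = 0, n2 = 0; // pairs not tied in x, pairs not tied in y
        long s = 0;          // concordant minus discordant pairs
        for (int j = 0; j < n - 1; j++) {
            for (int k = j + 1; k < n; k++) {
                double a1 = x[j] - x[k];
                double a2 = y[j] - y[k];
                double aa = a1 * a2;
                if (aa != 0.0) {
                    n1++;
                    n2++;
                    s += aa > 0.0 ? 1 : -1;
                } else {
                    if (a1 != 0.0) n1++;
                    if (a2 != 0.0) n2++;
                }
            }
        }
        return s / Math.sqrt((double) n1 * n2);
    }

    /** Approximate z-score of tau under the null hypothesis of no association. */
    static double zScore(double tau, int n) {
        double var = (4.0 * n + 10.0) / (9.0 * n * (n - 1.0));
        return tau / Math.sqrt(var);
    }

    public static void main(String[] args) {
        double[] x = {44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1};
        double[] y = {2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8};
        double t = tau(x, y);
        System.out.println("tau = " + t + ", z = " + zScore(t, x.length));
    }
}
```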