Package smile.stat

Interface Sampling


public interface Sampling
Random sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.
  • Method Summary

    Static Methods
    Modifier and Type   Method                                       Description
    static int[][]      latin(int n, int d)                          Latin hypercube sampling.
    static int[]        random(int n, double subsample)              Simple random sampling.
    static int[][]      strata(int[] category)                       Returns the strata of samples as a two-dimensional array.
    static int[]        stratify(int[] category, double subsample)   Stratified sampling from a population which can be partitioned into subpopulations.
  • Method Details

    • random

      static int[] random(int n, double subsample)
      Simple random sampling. All samples have an equal probability of being selected.
      Parameters:
      n - the number of samples in the population.
      subsample - the sampling rate. Samples are drawn with replacement if it is 1.0.
      Returns:
      the indices of selected samples.
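
      As an illustration of the two drawing modes, here is a minimal sketch in plain Java. It is not the library's implementation. The class name SimpleRandomSketch and both helper methods are hypothetical, and the sketch assumes that a rate below 1.0 means drawing round(n * rate) distinct indices without replacement, which the description above does not state explicitly.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not part of smile.stat.
final class SimpleRandomSketch {

    /** Draws round(n * rate) distinct indices from 0..n-1 (assumed behavior for rate < 1.0). */
    static int[] sampleWithoutReplacement(int n, double rate, Random rng) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        // Fisher-Yates shuffle: every ordering of the indices is equally likely.
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int t = perm[i];
            perm[i] = perm[j];
            perm[j] = t;
        }
        int m = (int) Math.round(n * rate);
        // The first m entries of a uniformly shuffled array form a simple random sample.
        return Arrays.copyOf(perm, m);
    }

    /** Draws n indices from 0..n-1 with replacement, so an index may repeat. */
    static int[] sampleWithReplacement(int n, Random rng) {
        int[] samples = new int[n];
        for (int i = 0; i < n; i++) samples[i] = rng.nextInt(n);
        return samples;
    }
}
```

      Passing a seeded Random makes a draw reproducible, which is convenient in tests.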
    • strata

      static int[][] strata(int[] category)
      Returns the strata of samples as a two-dimensional array. Each row holds the sample indices of one stratum.
      Parameters:
      category - the strata labels.
      Returns:
      the strata of samples as a two-dimensional array. Each row holds the sample indices of one stratum.
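
      The grouping described here can be sketched in plain Java as follows. This is an illustration under assumptions, not the library's code: the class StrataSketch and the method groupByLabel are hypothetical, and ordering the rows by ascending label value is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of smile.stat.
final class StrataSketch {

    /** Groups sample indices by their stratum label; one row per distinct label. */
    static int[][] groupByLabel(int[] category) {
        // TreeMap keeps the labels sorted, so the row order is deterministic.
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < category.length; i++) {
            groups.computeIfAbsent(category[i], k -> new ArrayList<>()).add(i);
        }
        int[][] strata = new int[groups.size()][];
        int row = 0;
        for (List<Integer> members : groups.values()) {
            strata[row++] = members.stream().mapToInt(Integer::intValue).toArray();
        }
        return strata;
    }
}
```

      For the labels {0, 1, 0, 2, 1}, this sketch yields the rows {0, 2}, {1, 4} and {3}.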
    • stratify

      static int[] stratify(int[] category, double subsample)
      Stratified sampling from a population which can be partitioned into subpopulations. In statistical surveys, when subpopulations within an overall population vary, it could be advantageous to sample each subpopulation (stratum) independently.

      Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should define a partition of the population. That is, it should be collectively exhaustive and mutually exclusive: every element in the population must be assigned to one and only one stratum. Then simple random sampling is applied within each stratum.

      Parameters:
      category - the strata labels.
      subsample - the sampling rate. Samples are drawn with replacement if it is 1.0.
      Returns:
      the indices of selected samples.
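
      A minimal sketch of the procedure described above (partition by label, then simple random sampling within each stratum) might look like this. It is not the library's implementation: the class StratifySketch and its sample method are hypothetical, and the sketch assumes sampling without replacement with round(size * rate) draws per stratum. The with-replacement case for a rate of 1.0 is not covered here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical illustration class; not part of smile.stat.
final class StratifySketch {

    /** Samples each stratum independently at the same rate, without replacement. */
    static int[] sample(int[] category, double rate, Random rng) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]: " + rate);
        }
        // Step 1: partition the indices so that each one belongs to exactly one stratum.
        Map<Integer, List<Integer>> strata = new TreeMap<>();
        for (int i = 0; i < category.length; i++) {
            strata.computeIfAbsent(category[i], k -> new ArrayList<>()).add(i);
        }
        // Step 2: simple random sampling within each stratum.
        List<Integer> selected = new ArrayList<>();
        for (List<Integer> stratum : strata.values()) {
            Collections.shuffle(stratum, rng);
            int m = (int) Math.round(stratum.size() * rate);
            selected.addAll(stratum.subList(0, m));
        }
        return selected.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

      Because every stratum is sampled at the same rate, the class proportions of the selected indices track those of the population, up to rounding.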
    • latin

      static int[][] latin(int n, int d)
      Latin hypercube sampling. LHS generates a near-random sample of parameter values from a multidimensional distribution. The sampling method is often used in Monte Carlo simulations.

      A Latin hypercube is an n-by-d matrix, each column of which is a permutation of 1, 2, ..., n. When sampling a function of d variables, the range of each variable is divided into n equally probable intervals. n sample points are then placed to satisfy the Latin hypercube requirements; this forces the number of divisions, n, to be equal for each variable.

      This sampling scheme does not require more samples for more dimensions (variables); this independence is one of the main advantages of this sampling scheme. Another advantage is that random samples can be taken one at a time, remembering which samples were taken so far.

      Because the component samples are randomly paired, an LHS is not unique; for n divisions and d variables there are (n!)^(d-1) possible combinations. With this in mind, improved LHS algorithms iterate to determine optimal pairings according to some specified criteria, such as reduced correlation among the terms or enhanced space-filling properties.

      A randomly generated Latin Hypercube may be quite structured: the design may not have good univariate projection uniformity or the different columns might be highly correlated. Several criteria such as maximin distance and minimum correlation have been proposed to address these issues.

      Parameters:
      n - the number of divisions, also the number of samples.
      d - the dimensionality, i.e. the number of variables.
      Returns:
      the Latin hypercube as an n-by-d matrix.
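
      The construction described above (each column an independent random permutation of 1, 2, ..., n) can be sketched in plain Java as follows. This is an illustration, not the library's code: the class LatinHypercubeSketch and both of its methods are hypothetical, and the toUnitInterval helper assumes each variable is rescaled to [0, 1) with n equal-width intervals.

```java
import java.util.Random;

// Hypothetical illustration class; not part of smile.stat.
final class LatinHypercubeSketch {

    /** Builds an n-by-d matrix whose columns are independent random permutations of 1..n. */
    static int[][] latin(int n, int d, Random rng) {
        int[][] cube = new int[n][d];
        for (int j = 0; j < d; j++) {
            int[] perm = new int[n];
            for (int i = 0; i < n; i++) perm[i] = i + 1;
            // Fisher-Yates shuffle of the interval indices for variable j.
            for (int i = n - 1; i > 0; i--) {
                int k = rng.nextInt(i + 1);
                int t = perm[i];
                perm[i] = perm[k];
                perm[k] = t;
            }
            // Row i is the i-th sample point; column j holds its interval index for variable j.
            for (int i = 0; i < n; i++) cube[i][j] = perm[i];
        }
        return cube;
    }

    /** Maps interval index v in 1..n to a random point in the v-th of n equal-width intervals of [0, 1). */
    static double toUnitInterval(int v, int n, Random rng) {
        return (v - 1 + rng.nextDouble()) / n;
    }
}
```

      Since every column is a permutation, each of the n intervals of each variable is hit exactly once, whatever the value of d.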