Package smile.math

Interface Histogram


public interface Histogram
Histogram utilities. A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable.

There is no "best" number of bins, and different bin sizes can reveal different features of the data. Depending on the actual data distribution and the goals of the analysis, different bin widths may be appropriate, so experimentation is usually needed to determine an appropriate width.

Note that this interface only provides tools for choosing the bin width or the number of bins and for counting frequencies. It does NOT provide plotting services.

  • Method Summary

    Static Methods
    Modifier and Type
    Method
    Description
    static int
    bins(double[] x, double h)
    Returns the number of bins for a data set based on a suggested bin width h.
    static int
    bins(int n)
    Returns the number of bins by square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
    static double[]
    breaks(double[] x, double h)
    Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
    static double[]
    breaks(double[] x, int k)
    Returns the breakpoints between histogram cells for a dataset.
    static double[]
    breaks(double min, double max, double h)
    Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
    static double[]
    breaks(double min, double max, int k)
    Returns the breakpoints between histogram cells for a given range.
    static double[][]
    of(double[] data)
    Generate the histogram of given data.
    static double[][]
    of(double[] data, double[] breaks)
    Generate the histogram of k bins defined by the given breakpoints.
    static double[][]
    of(double[] data, int k)
    Generate the histogram of k bins.
    static double[][]
    of(float[] data)
    Generate the histogram of given data.
    static double[][]
    of(float[] data, float[] breaks)
    Generate the histogram of k bins defined by the given breakpoints.
    static double[][]
    of(float[] data, int k)
    Generate the histogram of k bins.
    static double[][]
    of(int[] data)
    Generate the histogram of given data.
    static double[][]
    of(int[] data, double[] breaks)
    Generate the histogram of k bins defined by the given breakpoints.
    static double[][]
    of(int[] data, int k)
    Generate the histogram of k bins.
    static int
    scott(double[] x)
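    Returns the number of bins by Scott's rule, which sets the bin width to h = 3.5 * σ / n^(1/3), where σ is the standard deviation and n is the number of data points.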
    static int
    sturges(int n)
    Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).
  • Method Details

    • of

      static double[][] of(int[] data)
      Generate the histogram of given data. The number of bins k is decided by square-root choice.
      Parameters:
      data - the data points.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(float[] data)
      Generate the histogram of given data. The number of bins k is decided by square-root choice.
      Parameters:
      data - the data points.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(double[] data)
      Generate the histogram of given data. The number of bins k is decided by square-root choice.
      Parameters:
      data - the data points.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(int[] data, int k)
      Generate the histogram of k bins.
      Parameters:
      data - the data points.
      k - the number of bins.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(int[] data, double[] breaks)
      Generate the histogram of k bins defined by the given breakpoints.
      Parameters:
      data - the data points.
      breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(float[] data, int k)
      Generate the histogram of k bins.
      Parameters:
      data - the data points.
      k - the number of bins.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(float[] data, float[] breaks)
      Generate the histogram of k bins defined by the given breakpoints.
      Parameters:
      data - the data points.
      breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(double[] data, int k)
      Generate the histogram of k bins.
      Parameters:
      data - the data points.
      k - the number of bins.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
    • of

      static double[][] of(double[] data, double[] breaks)
      Generate the histogram of k bins defined by the given breakpoints.
      Parameters:
      data - the data points.
      breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
      Returns:
      a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
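The 3-by-k layout described above can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the class name `HistogramSketch` is hypothetical, and the edge handling (bins are half-open `[lower, upper)`, the last bin also includes its upper bound, out-of-range and NaN values are skipped) is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of smile.math.
final class HistogramSketch {
    /**
     * Counts data points per bin for k+1 ascending breakpoints.
     * Assumption: bins are [lower, upper), except the last bin,
     * which also includes its upper bound.
     */
    static double[][] of(double[] data, double[] breaks) {
        int k = breaks.length - 1;
        if (k < 1) {
            throw new IllegalArgumentException("need at least two breakpoints");
        }
        double[][] hist = new double[3][k];
        for (int i = 0; i < k; i++) {
            hist[0][i] = breaks[i];      // row 0: lower bounds
            hist[1][i] = breaks[i + 1];  // row 1: upper bounds
        }
        for (double x : data) {
            // Assumption: NaN and out-of-range values are ignored.
            if (Double.isNaN(x) || x < breaks[0] || x > breaks[k]) continue;
            int j = Arrays.binarySearch(breaks, x);
            // A negative result encodes -(insertionPoint) - 1.
            if (j < 0) j = -j - 2;
            // A value equal to the last breakpoint goes into the last bin.
            if (j >= k) j = k - 1;
            hist[2][j]++;                // row 2: frequency counts
        }
        return hist;
    }

    public static void main(String[] args) {
        double[] data = {0.5, 1.5, 1.7, 2.9, 3.0};
        double[] breaks = {0, 1, 2, 3};
        double[][] hist = of(data, breaks);
        System.out.println(Arrays.toString(hist[2])); // prints [1.0, 2.0, 2.0]
    }
}
```

Each column of the result describes one bin, so column i reads as (lower bound, upper bound, count).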
    • breaks

      static double[] breaks(double[] x, double h)
      Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
      Parameters:
      x - the data set.
      h - the bin width.
      Returns:
      the breakpoints between histogram cells
    • breaks

      static double[] breaks(double min, double max, double h)
      Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
      Parameters:
      min - the lower bound of bins.
      max - the upper bound of bins.
      h - the bin width.
      Returns:
      the breakpoints between histogram cells
    • breaks

      static double[] breaks(double[] x, int k)
      Returns the breakpoints between histogram cells for a dataset.
      Parameters:
      x - the data set.
      k - the number of bins.
      Returns:
      the breakpoints between histogram cells
    • breaks

      static double[] breaks(double min, double max, int k)
      Returns the breakpoints between histogram cells for a given range.
      Parameters:
      min - the lower bound of bins.
      max - the upper bound of bins.
      k - the number of bins.
      Returns:
      the breakpoints between histogram cells
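As a rough illustration of what k bins over a range look like, the sketch below produces k+1 equally spaced breakpoints from min to max. Equal spacing and the class name `BreaksSketch` are assumptions for this example; the library may place its breakpoints differently.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of smile.math.
final class BreaksSketch {
    /** Returns k+1 equally spaced breakpoints covering [min, max] (assumed layout). */
    static double[] breaks(double min, double max, int k) {
        if (k < 1 || !(max > min)) {
            throw new IllegalArgumentException("require k >= 1 and max > min");
        }
        double width = (max - min) / k;
        double[] breaks = new double[k + 1];
        for (int i = 0; i < k; i++) {
            breaks[i] = min + i * width;
        }
        // Pin the last breakpoint to max to avoid floating-point drift.
        breaks[k] = max;
        return breaks;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(breaks(0.0, 10.0, 5)));
        // prints [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
    }
}
```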
    • bins

      static int bins(double[] x, double h)
      Returns the number of bins for a data set based on a suggested bin width h.
      Parameters:
      x - the data set.
      h - the bin width.
      Returns:
      the number of bins k = ceil((max - min) / h)
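The formula k = ceil((max - min) / h) can be worked through with a short standalone sketch. The class name `BinWidthSketch` is hypothetical and the input validation is an assumption; only the formula itself comes from the description above.

```java
// Hypothetical illustration class; not part of smile.math.
final class BinWidthSketch {
    /** Computes k = ceil((max - min) / h) for the given data and bin width. */
    static int bins(double[] x, double h) {
        if (x.length == 0 || !(h > 0)) {
            throw new IllegalArgumentException("require non-empty data and h > 0");
        }
        double min = x[0];
        double max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (int) Math.ceil((max - min) / h);
    }

    public static void main(String[] args) {
        double[] x = {0.0, 1.0, 2.5, 10.0};
        // The range is 10 and h is 3, so k = ceil(3.33...) = 4.
        System.out.println(bins(x, 3.0)); // prints 4
    }
}
```

Note that when every value in the data set is the same, this formula evaluates to 0; how that case should be handled is left open here.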
    • bins

      static int bins(int n)
      Returns the number of bins by square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
      Parameters:
      n - the number of data points.
      Returns:
      the number of bins
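A minimal sketch of the square-root rule follows. The description above does not say how a non-integer square root is rounded, so rounding up is an assumption here, as is the class name `SquareRootRuleSketch`; the library may round differently or enforce a minimum.

```java
// Hypothetical illustration class; not part of smile.math.
final class SquareRootRuleSketch {
    /** Square-root rule: k = sqrt(n), rounded up (the rounding is an assumption). */
    static int bins(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("require n >= 1");
        }
        return (int) Math.ceil(Math.sqrt(n));
    }

    public static void main(String[] args) {
        System.out.println(bins(100)); // prints 10
        System.out.println(bins(50));  // prints 8, since sqrt(50) is about 7.07
    }
}
```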
    • sturges

      static int sturges(int n)
      Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).
      Parameters:
      n - the number of data points.
      Returns:
      the number of bins
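Sturges' rule can be evaluated without floating-point arithmetic, because ceil(log2(n) + 1) equals ceil(log2(n)) + 1, and ceil(log2(n)) is the bit length of n - 1 for n >= 1. The sketch below uses that identity; the class name `SturgesSketch` is hypothetical and this is not necessarily how the library computes the value.

```java
// Hypothetical illustration class; not part of smile.math.
final class SturgesSketch {
    /** Sturges' rule: k = ceil(log2(n) + 1), computed with integer arithmetic. */
    static int sturges(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("require n >= 1");
        }
        // The bit length of (n - 1) equals ceil(log2(n)) for n >= 1.
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(n - 1);
        return ceilLog2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(sturges(100));  // prints 8
        System.out.println(sturges(1000)); // prints 11
    }
}
```

The integer form avoids the rounding noise that `Math.log(n) / Math.log(2)` can introduce when n is an exact power of two.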
    • scott

      static int scott(double[] x)
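      Returns the number of bins by Scott's rule, which sets the bin width to h = 3.5 * σ / n^(1/3), where σ is the standard deviation and n is the number of data points.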
      Parameters:
      x - the data set.
      Returns:
      the number of bins
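Scott's rule yields a bin width, so a bin count still has to be derived from it. The sketch below computes h = 3.5 * σ / n^(1/3) and then converts it with k = ceil((max - min) / h), the formula given for bins(double[] x, double h). Using the sample standard deviation (n - 1 denominator), returning a single bin for constant data, and the class name `ScottSketch` are all assumptions made for illustration.

```java
// Hypothetical illustration class; not part of smile.math.
final class ScottSketch {
    /** Scott's bin width h = 3.5 * sd / cbrt(n), using the sample standard deviation. */
    static double width(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("require at least two data points");
        }
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double sumSq = 0.0;
        for (double v : x) sumSq += (v - mean) * (v - mean);
        double sd = Math.sqrt(sumSq / (n - 1)); // assumption: sample standard deviation
        return 3.5 * sd / Math.cbrt(n);
    }

    /** Converts Scott's width into a bin count k = ceil((max - min) / h). */
    static int bins(double[] x) {
        double h = width(x);
        if (h == 0.0) return 1; // assumption: constant data gets a single bin
        double min = x[0];
        double max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (int) Math.ceil((max - min) / h);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        // sd = sqrt(6), cbrt(8) = 2, so h is about 4.29 and k = ceil(7 / 4.29) = 2.
        System.out.println(width(x));
        System.out.println(bins(x)); // prints 2
    }
}
```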