Package smile.math
Interface Histogram
public interface Histogram
Histogram utilities. A histogram is a graphical display of tabulated
frequencies, shown as bars. It shows what proportion of cases fall into
each of several categories: it is a form of data binning. The categories
are usually specified as nonoverlapping intervals of some variable.
There is no "best" number of bins, and different bin sizes can reveal different features of the data. Depending on the actual data distribution and the goals of the analysis, different bin widths may be appropriate, so experimentation is usually needed to determine an appropriate width.
Note that this interface only provides tools for choosing the bin width or the number of bins and for counting frequencies. It does NOT provide plotting services.

Method Summary
Modifier and Type   Method                                    Description
static int          bins(double[] x, double h)                Returns the number of bins for a dataset based on a suggested bin width h.
static int          bins(int n)                               Returns the number of bins by the square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
static double[]     breaks(double[] x, double h)              Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
static double[]     breaks(double[] x, int k)                 Returns the breakpoints between histogram cells for a dataset.
static double[]     breaks(double min, double max, double h)  Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
static double[]     breaks(double min, double max, int k)     Returns the breakpoints between histogram cells for a given range.
static double[][]   of(double[] data)                         Generate the histogram of the given data.
static double[][]   of(double[] data, double[] breaks)        Generate the histogram of k bins.
static double[][]   of(double[] data, int k)                  Generate the histogram of k bins.
static double[][]   of(float[] data)                          Generate the histogram of the given data.
static double[][]   of(float[] data, float[] breaks)          Generate the histogram of k bins.
static double[][]   of(float[] data, int k)                   Generate the histogram of k bins.
static double[][]   of(int[] data)                            Generate the histogram of the given data.
static double[][]   of(int[] data, double[] breaks)           Generate the histogram of k bins.
static double[][]   of(int[] data, int k)                     Generate the histogram of k bins.
static int          scott(double[] x)                         Returns the number of bins by Scott's rule h = 3.5 * σ / (n^{1/3}).
static int          sturges(int n)                            Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).

Method Details

of
static double[][] of(int[] data)
Generate the histogram of the given data. The number of bins k is decided by the square-root rule.
Parameters:
data - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(float[] data)
Generate the histogram of the given data. The number of bins k is decided by the square-root rule.
Parameters:
data - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(double[] data)
Generate the histogram of the given data. The number of bins k is decided by the square-root rule.
Parameters:
data - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(int[] data, int k)
Generate the histogram of k bins.
Parameters:
data - the data points.
k - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(int[] data, double[] breaks)
Generate the histogram of k bins, where k is one less than the number of breakpoints.
Parameters:
data - the data points.
breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(float[] data, int k)
Generate the histogram of k bins.
Parameters:
data - the data points.
k - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(float[] data, float[] breaks)
Generate the histogram of k bins, where k is one less than the number of breakpoints.
Parameters:
data - the data points.
breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.

of
static double[][] of(double[] data, int k)
Generate the histogram of k bins.
Parameters:
data - the data points.
k - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.
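To make the 3-by-k layout concrete, here is a minimal standalone sketch of equal-width binning. It is not the library's implementation: the class name `EqualWidthHistogramSketch` is hypothetical, and the boundary handling (half-open bins, with the maximum placed in the last bin) is an assumption that this page does not confirm.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of smile.math. */
final class EqualWidthHistogramSketch {
    /** Returns a 3-by-k array: row 0 lower bounds, row 1 upper bounds, row 2 counts. */
    static double[][] histogram(double[] data, int k) {
        if (data.length == 0 || k <= 0) {
            throw new IllegalArgumentException("need non-empty data and k > 0");
        }
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / k;

        double[][] hist = new double[3][k];
        for (int i = 0; i < k; i++) {
            hist[0][i] = min + i * width;        // lower bound of bin i
            hist[1][i] = min + (i + 1) * width;  // upper bound of bin i
        }
        for (double v : data) {
            // Assumption: bins are [lower, upper); the maximum falls into the last bin.
            int i = width > 0 ? (int) ((v - min) / width) : 0;
            if (i >= k) i = k - 1;
            hist[2][i]++;
        }
        return hist;
    }

    public static void main(String[] args) {
        double[][] h = histogram(new double[] {1, 2, 2, 3, 5, 8, 9, 10}, 3);
        System.out.println("lower:  " + Arrays.toString(h[0]));
        System.out.println("upper:  " + Arrays.toString(h[1]));
        System.out.println("counts: " + Arrays.toString(h[2]));
    }
}
```

Storing the counts as doubles mirrors the `double[][]` return type documented here, so all three rows fit in one array.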

of
static double[][] of(double[] data, double[] breaks)
Generate the histogram of k bins, where k is one less than the number of breakpoints.
Parameters:
data - the data points.
breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, the second row the upper bounds, and the third row the frequency counts.
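Counting against explicit breakpoints can be sketched with a binary search over the ascending `breaks` array. This is an illustration under stated assumptions, not the library's code: the class `BreaksHistogramSketch` is hypothetical, the breakpoints are assumed strictly ascending, values outside the range are skipped, and a value equal to the last breakpoint is counted in the last bin.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of smile.math. */
final class BreaksHistogramSketch {
    /** Returns a 3-by-k array for k+1 strictly ascending breakpoints. */
    static double[][] histogram(double[] data, double[] breaks) {
        int k = breaks.length - 1;
        if (k < 1) {
            throw new IllegalArgumentException("need at least two breakpoints");
        }
        double[][] hist = new double[3][k];
        for (int i = 0; i < k; i++) {
            hist[0][i] = breaks[i];      // lower bound of bin i
            hist[1][i] = breaks[i + 1];  // upper bound of bin i
        }
        for (double v : data) {
            // Assumption: values outside [breaks[0], breaks[k]] are ignored.
            if (v < breaks[0] || v > breaks[k]) continue;
            int pos = Arrays.binarySearch(breaks, v);
            // Exact hit on breaks[j] maps to bin j; otherwise use (insertion point - 1).
            int i = pos >= 0 ? pos : -pos - 2;
            if (i >= k) i = k - 1;  // v equals the last breakpoint
            hist[2][i]++;
        }
        return hist;
    }

    public static void main(String[] args) {
        double[] breaks = {0, 1, 2, 5};
        double[][] h = histogram(new double[] {0.5, 1, 1.5, 3, 5, 6, -1}, breaks);
        System.out.println("counts: " + Arrays.toString(h[2]));
    }
}
```

Unequal bin widths work without changes, since each value is located by search rather than by dividing by a fixed width.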

breaks
static double[] breaks(double[] x, double h)
Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
Parameters:
x - the data set.
h - the bin width.
Returns:
the breakpoints between histogram cells.

breaks
static double[] breaks(double min, double max, double h)
Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
Parameters:
min - the lower bound of bins.
max - the upper bound of bins.
h - the bin width.
Returns:
the breakpoints between histogram cells.
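One way to turn a suggested width into breakpoints is to take k = ceil((max - min) / h) bins and step from `min` in increments of h. The sketch below assumes the bins are anchored at `min`, so the last breakpoint may lie beyond `max`; the library may place the bins differently (for example, centered on the range). The class name `WidthBreaksSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of smile.math. */
final class WidthBreaksSketch {
    /** Returns k+1 breakpoints of width h starting at min, with k = ceil((max - min) / h). */
    static double[] breaks(double min, double max, double h) {
        if (h <= 0 || max < min) {
            throw new IllegalArgumentException("need h > 0 and min <= max");
        }
        // At least one bin, even when min == max.
        int k = Math.max(1, (int) Math.ceil((max - min) / h));
        double[] b = new double[k + 1];
        for (int i = 0; i <= k; i++) {
            // Assumption: the first bin starts exactly at min.
            b[i] = min + i * h;
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(breaks(0, 10, 4)));
    }
}
```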

breaks
static double[] breaks(double[] x, int k)
Returns the breakpoints between histogram cells for a dataset.
Parameters:
x - the data set.
k - the number of bins.
Returns:
the breakpoints between histogram cells.

breaks
static double[] breaks(double min, double max, int k)
Returns the breakpoints between histogram cells for a given range.
Parameters:
min - the lower bound of bins.
max - the upper bound of bins.
k - the number of bins.
Returns:
the breakpoints between histogram cells.
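For a fixed number of bins, evenly spaced breakpoints follow from the width (max - min) / k. The following is a small sketch of that arithmetic, not the library's code; the class name `EvenBreaksSketch` is hypothetical, and pinning the last breakpoint to `max` is a choice made here to avoid rounding drift.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of smile.math. */
final class EvenBreaksSketch {
    /** Returns k+1 evenly spaced breakpoints covering [min, max]. */
    static double[] breaks(double min, double max, int k) {
        if (k <= 0 || max < min) {
            throw new IllegalArgumentException("need k > 0 and min <= max");
        }
        double width = (max - min) / k;
        double[] b = new double[k + 1];
        for (int i = 0; i < k; i++) {
            b[i] = min + i * width;
        }
        b[k] = max;  // set exactly, so rounding cannot push it off the range
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(breaks(0, 1, 4)));
    }
}
```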

bins
static int bins(double[] x, double h)
Returns the number of bins for a dataset based on a suggested bin width h.
Parameters:
x - the data set.
h - the bin width.
Returns:
the number of bins k = ceil((max - min) / h).

bins
static int bins(int n)
Returns the number of bins by the square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
Parameters:
n - the number of data points.
Returns:
the number of bins.
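The square-root rule can be sketched in a few lines. This page does not say how the square root is rounded or whether a minimum bin count applies, so rounding up is an assumption here, and `SquareRootRuleSketch` is a hypothetical name.

```java
/** Hypothetical helper for illustration; not part of smile.math. */
final class SquareRootRuleSketch {
    /** Returns ceil(sqrt(n)); rounding up is an assumption, not documented behavior. */
    static int bins(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("need n > 0");
        }
        return (int) Math.ceil(Math.sqrt(n));
    }

    public static void main(String[] args) {
        System.out.println(bins(100));
        System.out.println(bins(50));
    }
}
```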

sturges
static int sturges(int n)
Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).
Parameters:
n - the number of data points.
Returns:
the number of bins.
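As a worked example of the formula k = ceil(log2(n) + 1), the sketch below evaluates it with integer arithmetic, which avoids floating-point error when n is an exact power of two. The class name `SturgesRuleSketch` is hypothetical and the code is an illustration, not the library's implementation.

```java
/** Hypothetical helper for illustration; not part of smile.math. */
final class SturgesRuleSketch {
    /** Returns ceil(log2(n) + 1) for n >= 1. */
    static int bins(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("need n > 0");
        }
        // ceil(log2(n)) equals the bit length of (n - 1):
        // 32 - numberOfLeadingZeros(n - 1), which is 0 for n == 1.
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(n - 1);
        return ceilLog2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(bins(8));     // log2(8) + 1 = 4
        System.out.println(bins(1000));  // log2(1000) is about 9.97, so k = 11
    }
}
```

Because the rule grows only logarithmically, it yields far fewer bins than the square-root rule on large samples: 11 bins for n = 1000, compared with about 32.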

scott
static int scott(double[] x)
Returns the number of bins by Scott's rule h = 3.5 * σ / (n^{1/3}), where σ is the standard deviation of the data and n is the number of data points.
Parameters:
x - the data set.
Returns:
the number of bins.
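Scott's rule gives a bin width rather than a bin count, so a conversion step is needed. The sketch below assumes the sample standard deviation (n - 1 denominator) for σ and converts the width to a count with k = ceil((max - min) / h); both choices are assumptions, and `ScottRuleSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of smile.math. */
final class ScottRuleSketch {
    /** Scott's bin width h = 3.5 * sigma / n^(1/3), with sigma the sample standard deviation. */
    static double binWidth(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two data points");
        }
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sigma = Math.sqrt(ss / (n - 1));  // assumption: n - 1 denominator
        return 3.5 * sigma / Math.cbrt(n);
    }

    /** Converts the width to a bin count: k = ceil((max - min) / h), at least 1. */
    static int bins(double[] x) {
        double h = binWidth(x);
        if (h == 0) return 1;  // all values identical
        double min = Arrays.stream(x).min().getAsDouble();
        double max = Arrays.stream(x).max().getAsDouble();
        return Math.max(1, (int) Math.ceil((max - min) / h));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println("h = " + binWidth(x) + ", k = " + bins(x));
    }
}
```

For the eight points above, σ = sqrt(6) and n^{1/3} = 2, so h is about 4.29 and the range of 7 is covered by 2 bins.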
