Package smile.math
Interface Histogram
public interface Histogram
Histogram utilities. A histogram is a graphical display of tabulated
frequencies, shown as bars. It shows what proportion of cases fall into
each of several categories: it is a form of data binning. The categories
are usually specified as non-overlapping intervals of some variable.
There is no "best" number of bins, and different bin sizes can reveal different features of the data. Depending on the actual data distribution and the goals of the analysis, different bin widths may be appropriate, so experimentation is usually needed to determine an appropriate width.
Note that this interface only provides tools for choosing the bin width or the number of bins and for counting frequencies. It does NOT provide plotting services.
-
Method Summary
Modifier and Type    Method    Description
static int    bins(double[] x, double h)    Returns the number of bins for a dataset based on a suggested bin width h.
static int    bins(int n)    Returns the number of bins by the square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
static double[]    breaks(double[] x, double h)    Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
static double[]    breaks(double[] x, int k)    Returns the breakpoints between histogram cells for a dataset.
static double[]    breaks(double min, double max, double h)    Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
static double[]    breaks(double min, double max, int k)    Returns the breakpoints between histogram cells for a given range.
static double[][]    of(double[] data)    Generate the histogram of the given data.
static double[][]    of(double[] data, double[] breaks)    Generate the histogram of k bins defined by the given breakpoints.
static double[][]    of(double[] data, int k)    Generate the histogram of k bins.
static double[][]    of(float[] data)    Generate the histogram of the given data.
static double[][]    of(float[] data, float[] breaks)    Generate the histogram of k bins defined by the given breakpoints.
static double[][]    of(float[] data, int k)    Generate the histogram of k bins.
static double[][]    of(int[] data)    Generate the histogram of the given data.
static double[][]    of(int[] data, double[] breaks)    Generate the histogram of k bins defined by the given breakpoints.
static double[][]    of(int[] data, int k)    Generate the histogram of k bins.
static int    scott(double[] x)    Returns the number of bins by Scott's rule, which sets the bin width h = 3.5 * σ / n^(1/3).
static int    sturges(int n)    Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).
-
Method Details
-
of
static double[][] of(int[] data)
Generate the histogram of the given data. The number of bins k is decided by the square-root rule.
Parameters:
data - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(float[] data)
Generate the histogram of the given data. The number of bins k is decided by the square-root rule.
Parameters:
data - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(double[] data)
Generate the histogram of the given data. The number of bins k is decided by the square-root rule.
Parameters:
data - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(int[] data, int k)
Generate the histogram of k bins.
Parameters:
data - the data points.
k - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(int[] data, double[] breaks)
Generate the histogram of k bins defined by the given breakpoints.
Parameters:
data - the data points.
breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(float[] data, int k)
Generate the histogram of k bins.
Parameters:
data - the data points.
k - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(float[] data, float[] breaks)
Generate the histogram of k bins defined by the given breakpoints.
Parameters:
data - the data points.
breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
-
of
static double[][] of(double[] data, int k)
Generate the histogram of k bins.
Parameters:
data - the data points.
k - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
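To make the 3-by-k return layout concrete, the following is a minimal sketch that bins a double[] into k bins. It is not the library's implementation: the class name EqualWidthHistogramSketch is hypothetical, and both the equal-width bins over [min, max] and the placement of the maximum value in the last bin are assumptions.

```java
// Hypothetical sketch; not the smile.math.Histogram implementation.
class EqualWidthHistogramSketch {
    /** Returns a 3-by-k array: row 0 = lower bounds, row 1 = upper bounds, row 2 = counts. */
    static double[][] of(double[] data, int k) {
        if (data.length == 0 || k < 1) {
            throw new IllegalArgumentException("need at least one data point and one bin");
        }
        double min = data[0], max = data[0];
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / k; // assumption: equal-width bins over [min, max]
        double[][] hist = new double[3][k];
        for (int i = 0; i < k; i++) {
            hist[0][i] = min + i * width;
            hist[1][i] = min + (i + 1) * width;
        }
        for (double v : data) {
            // If all values are identical the width is zero; count everything in bin 0.
            int i = width > 0 ? (int) ((v - min) / width) : 0;
            if (i >= k) i = k - 1; // assumption: the maximum falls in the last bin
            hist[2][i]++;
        }
        return hist;
    }

    public static void main(String[] args) {
        double[][] h = of(new double[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 10}, 5);
        for (double[] row : h) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Each column of the result describes one bin, so hist[0][i], hist[1][i], and hist[2][i] are read together.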
-
of
static double[][] of(double[] data, double[] breaks)
Generate the histogram of k bins defined by the given breakpoints.
Parameters:
data - the data points.
breaks - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, second row the upper bounds, and third row the frequency counts.
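The relationship between k+1 breakpoints and k bins can be sketched as follows. This is an illustration only, with a hypothetical class name (BreaksHistogramSketch). The boundary handling is an assumption, since the documentation above does not specify it: each bin is treated as half-open [lower, upper), the last bin also includes its upper bound, and values outside the breakpoints are skipped.

```java
// Hypothetical sketch; boundary conventions are assumptions, not the library's documented behavior.
class BreaksHistogramSketch {
    /** Counts data into the k bins delimited by k+1 ascending breakpoints. */
    static double[][] of(double[] data, double[] breaks) {
        int k = breaks.length - 1;
        if (k < 1) {
            throw new IllegalArgumentException("need at least two breakpoints");
        }
        double[][] hist = new double[3][k];
        for (int i = 0; i < k; i++) {
            hist[0][i] = breaks[i];     // lower bound of bin i
            hist[1][i] = breaks[i + 1]; // upper bound of bin i
        }
        for (double v : data) {
            // Assumption: NaN and out-of-range values are ignored.
            if (Double.isNaN(v) || v < breaks[0] || v > breaks[k]) continue;
            int bin = k - 1; // a value equal to the last breakpoint goes in the last bin
            for (int i = 0; i < k; i++) {
                if (v < breaks[i + 1]) {
                    bin = i;
                    break;
                }
            }
            hist[2][bin]++;
        }
        return hist;
    }

    public static void main(String[] args) {
        double[][] h = of(new double[] {1, 2, 2, 3, 5, 8, 9}, new double[] {0, 3, 6, 9});
        System.out.println(java.util.Arrays.toString(h[2]));
    }
}
```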
-
breaks
static double[] breaks(double[] x, double h)
Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
Parameters:
x - the data set.
h - the bin width.
Returns:
the breakpoints between histogram cells.
-
breaks
static double[] breaks(double min, double max, double h)
Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
Parameters:
min - the lower bound of bins.
max - the upper bound of bins.
h - the bin width.
Returns:
the breakpoints between histogram cells.
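One plausible way to turn a range and a bin width into breakpoints is sketched below. The class name WidthBreaksSketch is hypothetical. Starting at min and stepping by h, so that the last breakpoint may lie beyond max, is an assumption; the library may position the breakpoints differently (for example, centered on the range).

```java
// Hypothetical sketch; the placement of breakpoints is an assumption.
class WidthBreaksSketch {
    /** Returns k+1 breakpoints spaced h apart, where k = ceil((max - min) / h). */
    static double[] breaks(double min, double max, double h) {
        if (h <= 0.0 || max <= min) {
            throw new IllegalArgumentException("require h > 0 and max > min");
        }
        int k = (int) Math.ceil((max - min) / h);
        double[] b = new double[k + 1];
        for (int i = 0; i <= k; i++) {
            b[i] = min + i * h; // assumption: the first breakpoint sits exactly at min
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(breaks(0.0, 10.0, 3.0)));
    }
}
```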
-
breaks
static double[] breaks(double[] x, int k)
Returns the breakpoints between histogram cells for a dataset.
Parameters:
x - the data set.
k - the number of bins.
Returns:
the breakpoints between histogram cells.
-
breaks
static double[] breaks(double min, double max, int k)
Returns the breakpoints between histogram cells for a given range.
Parameters:
min - the lower bound of bins.
max - the upper bound of bins.
k - the number of bins.
Returns:
the breakpoints between histogram cells.
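For a fixed number of bins, k bins over a range need k+1 breakpoints. The sketch below assumes the breakpoints are evenly spaced between min and max, which the documentation above does not state explicitly. The class name EvenBreaksSketch is hypothetical.

```java
// Hypothetical sketch; evenly spaced breakpoints are an assumption.
class EvenBreaksSketch {
    /** Returns k+1 evenly spaced breakpoints from min to max inclusive. */
    static double[] breaks(double min, double max, int k) {
        if (k < 1 || max <= min) {
            throw new IllegalArgumentException("require k >= 1 and max > min");
        }
        double width = (max - min) / k;
        double[] b = new double[k + 1];
        for (int i = 0; i < k; i++) {
            b[i] = min + i * width;
        }
        b[k] = max; // set the last breakpoint directly to avoid rounding drift
        return b;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(breaks(0.0, 10.0, 4)));
    }
}
```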
-
bins
static int bins(double[] x, double h)
Returns the number of bins for a dataset based on a suggested bin width h.
Parameters:
x - the data set.
h - the bin width.
Returns:
the number of bins k = ceil((max - min) / h), where min and max are the smallest and largest values in x.
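The formula k = ceil((max - min) / h) can be worked through in a few lines. This is a sketch with a hypothetical class name (BinCountSketch), not the library's code.

```java
// Hypothetical sketch of k = ceil((max - min) / h).
class BinCountSketch {
    static int bins(double[] x, double h) {
        if (x.length == 0 || h <= 0.0) {
            throw new IllegalArgumentException("require non-empty data and h > 0");
        }
        double min = x[0], max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (int) Math.ceil((max - min) / h);
    }

    public static void main(String[] args) {
        // Range is 10 and h is 3, so 10 / 3 = 3.33... rounds up to 4 bins.
        System.out.println(bins(new double[] {0.0, 2.5, 10.0}, 3.0));
    }
}
```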
-
bins
static int bins(int n)
Returns the number of bins by the square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
Parameters:
n - the number of data points.
Returns:
the number of bins.
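A minimal sketch of the square-root rule follows. The documentation above does not say how a non-integer square root is rounded or whether a minimum bin count is enforced, so rounding up is an assumption here, and the class name SqrtRuleSketch is hypothetical.

```java
// Hypothetical sketch; rounding up is an assumption.
class SqrtRuleSketch {
    static int bins(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("require n >= 1");
        }
        return (int) Math.ceil(Math.sqrt(n));
    }

    public static void main(String[] args) {
        System.out.println(bins(100)); // sqrt(100) = 10
        System.out.println(bins(50));  // sqrt(50) = 7.07..., rounded up to 8
    }
}
```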
-
sturges
static int sturges(int n)
Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).
Parameters:
n - the number of data points.
Returns:
the number of bins.
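Sturges' rule as stated, k = ceil(log2(n) + 1), can be evaluated directly. The sketch below applies only that formula; whether the library adds further adjustments (such as a minimum number of bins) is not stated in this documentation. The class name SturgesSketch is hypothetical.

```java
// Hypothetical sketch of k = ceil(log2(n) + 1).
class SturgesSketch {
    static int sturges(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("require n >= 1");
        }
        double log2n = Math.log(n) / Math.log(2.0);
        return (int) Math.ceil(log2n + 1);
    }

    public static void main(String[] args) {
        System.out.println(sturges(100));  // log2(100) is about 6.64, so k = 8
        System.out.println(sturges(1000)); // log2(1000) is about 9.97, so k = 11
    }
}
```

Because the bin count grows only logarithmically in n, this rule yields far fewer bins than the square-root rule for large samples.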
-
scott
static int scott(double[] x)
Returns the number of bins by Scott's rule, which sets the bin width h = 3.5 * σ / n^(1/3), where σ is the standard deviation of x and n is the number of data points.
Parameters:
x - the data set.
Returns:
the number of bins.
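Scott's rule produces a bin width, which then has to be converted into a bin count. The sketch below computes h = 3.5 * σ / n^(1/3) and then k = ceil((max - min) / h). It rests on two assumptions not confirmed by this documentation: σ is the sample standard deviation (n - 1 denominator), and h is used without further rounding. The class name ScottSketch and the helper width are hypothetical.

```java
// Hypothetical sketch; the choice of sample standard deviation is an assumption.
class ScottSketch {
    /** Bin width by Scott's rule: h = 3.5 * sd / n^(1/3). */
    static double width(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("require at least two data points");
        }
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1)); // assumption: sample standard deviation
        return 3.5 * sd / Math.cbrt(n);
    }

    /** Number of bins implied by Scott's width: k = ceil((max - min) / h). */
    static int scott(double[] x) {
        double h = width(x);
        if (h == 0.0) return 1; // all values identical: one bin suffices
        double min = x[0], max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (int) Math.ceil((max - min) / h);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(width(x));
        System.out.println(scott(x));
    }
}
```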
-