Package smile.math

# Interface Histogram

public interface Histogram
Histogram utilities. A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable.

There is no "best" number of bins, and different bin sizes can reveal different features of the data. Depending on the actual data distribution and the goals of the analysis, different bin widths may be appropriate, so experimentation is usually needed to determine an appropriate width.

Note that this interface only provides tools for choosing the bin width or the number of bins and for counting frequencies. It does NOT provide plotting services.

## Method Summary

Static Methods

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `static int` | `bins(double[] x, double h)` | Returns the number of bins for a data set based on a suggested bin width h. |
| `static int` | `bins(int n)` | Returns the number of bins by the square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others). |
| `static double[]` | `breaks(double[] x, double h)` | Returns the breakpoints between histogram cells for a data set based on a suggested bin width h. |
| `static double[]` | `breaks(double[] x, int k)` | Returns the breakpoints between histogram cells for a data set. |
| `static double[]` | `breaks(double min, double max, double h)` | Returns the breakpoints between histogram cells for a given range based on a suggested bin width h. |
| `static double[]` | `breaks(double min, double max, int k)` | Returns the breakpoints between histogram cells for a given range. |
| `static double[][]` | `of(double[] data)` | Generates the histogram of the given data. |
| `static double[][]` | `of(double[] data, double[] breaks)` | Generates the histogram of k bins delimited by the given breakpoints. |
| `static double[][]` | `of(double[] data, int k)` | Generates the histogram of k bins. |
| `static double[][]` | `of(float[] data)` | Generates the histogram of the given data. |
| `static double[][]` | `of(float[] data, float[] breaks)` | Generates the histogram of k bins delimited by the given breakpoints. |
| `static double[][]` | `of(float[] data, int k)` | Generates the histogram of k bins. |
| `static double[][]` | `of(int[] data)` | Generates the histogram of the given data. |
| `static double[][]` | `of(int[] data, double[] breaks)` | Generates the histogram of k bins delimited by the given breakpoints. |
| `static double[][]` | `of(int[] data, int k)` | Generates the histogram of k bins. |
| `static int` | `scott(double[] x)` | Returns the number of bins by Scott's rule, which sets the bin width to h = 3.5 * σ / n^(1/3). |
| `static int` | `sturges(int n)` | Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1). |
## Method Details

### of

static double[][] of(int[] data)
Generates the histogram of the given data. The number of bins k is chosen by the square-root rule.
Parameters:
`data` - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(float[] data)
Generates the histogram of the given data. The number of bins k is chosen by the square-root rule.
Parameters:
`data` - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(double[] data)
Generates the histogram of the given data. The number of bins k is chosen by the square-root rule.
Parameters:
`data` - the data points.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(int[] data, int k)
Generates the histogram of k bins.
Parameters:
`data` - the data points.
`k` - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(int[] data, double[] breaks)
Generates the histogram of k bins delimited by the given breakpoints.
Parameters:
`data` - the data points.
`breaks` - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(float[] data, int k)
Generates the histogram of k bins.
Parameters:
`data` - the data points.
`k` - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(float[] data, float[] breaks)
Generates the histogram of k bins delimited by the given breakpoints.
Parameters:
`data` - the data points.
`breaks` - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
### of

static double[][] of(double[] data, int k)
Generates the histogram of k bins.
Parameters:
`data` - the data points.
`k` - the number of bins.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
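The 3-by-k return layout can be illustrated with a minimal sketch. This is not Smile's implementation: the class `EqualWidthHistogramSketch` and its `histogram` method are hypothetical names, and the sketch assumes equal-width bins spanning the minimum and maximum of the data, a maximum value that is counted in the last bin, and input without NaN values.

```java
import java.util.Arrays;

/** Hypothetical sketch of a k-bin histogram; not Smile's implementation. */
final class EqualWidthHistogramSketch {
    /** Returns a 3-by-k array: lower bounds, upper bounds, frequency counts. */
    static double[][] histogram(double[] data, int k) {
        if (data.length == 0 || k < 1) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        double min = data[0], max = data[0];
        for (double d : data) {
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        // Assumption: k equal-width bins covering [min, max].
        double width = (max - min) / k;
        double[][] h = new double[3][k];
        for (int i = 0; i < k; i++) {
            h[0][i] = min + i * width;        // lower bound of bin i
            h[1][i] = min + (i + 1) * width;  // upper bound of bin i
        }
        h[1][k - 1] = max; // avoid rounding drift on the last upper bound
        for (double d : data) {
            int bin = width > 0 ? (int) ((d - min) / width) : 0;
            if (bin >= k) bin = k - 1; // the maximum falls into the last bin
            h[2][bin]++;
        }
        return h;
    }

    public static void main(String[] args) {
        double[][] h = histogram(new double[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 10}, 5);
        System.out.println(Arrays.toString(h[2])); // prints [2.0, 2.0, 2.0, 2.0, 2.0]
    }
}
```

Storing the counts as `double` values keeps all three rows in one `double[][]`, which matches the return type documented above.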
### of

static double[][] of(double[] data, double[] breaks)
Generates the histogram of k bins delimited by the given breakpoints.
Parameters:
`data` - the data points.
`breaks` - an array of size k+1 giving the breakpoints between histogram cells. Must be in ascending order.
Returns:
a 3-by-k array whose first row holds the lower bounds of the bins, whose second row holds the upper bounds, and whose third row holds the frequency counts.
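Counting against explicit breakpoints can be sketched with a binary search over the `breaks` array. The code below is an illustration under stated assumptions, not the library's code: the names `BreaksHistogramSketch` and `histogram` are hypothetical, the breakpoints are assumed to be strictly ascending, each bin is treated as half-open except the last (which also includes its upper bound), and values outside the breakpoint range are skipped.

```java
import java.util.Arrays;

/** Hypothetical sketch of counting into bins given by breakpoints. */
final class BreaksHistogramSketch {
    /** Returns a 3-by-k array for the k = breaks.length - 1 bins. */
    static double[][] histogram(double[] data, double[] breaks) {
        int k = breaks.length - 1;
        if (k < 1) {
            throw new IllegalArgumentException("need at least two breakpoints");
        }
        double[][] h = new double[3][k];
        for (int i = 0; i < k; i++) {
            h[0][i] = breaks[i];      // lower bound of bin i
            h[1][i] = breaks[i + 1];  // upper bound of bin i
        }
        for (double d : data) {
            // Assumption: NaN and out-of-range values are ignored.
            if (Double.isNaN(d) || d < breaks[0] || d > breaks[k]) continue;
            int j = Arrays.binarySearch(breaks, d);
            // Exact hit on breaks[j]: bin j (or the last bin for the final break).
            // Otherwise j = -(insertionPoint) - 1, and the bin is insertionPoint - 1.
            int bin = j >= 0 ? Math.min(j, k - 1) : -j - 2;
            h[2][bin]++;
        }
        return h;
    }

    public static void main(String[] args) {
        double[][] h = histogram(new double[] {1, 2, 2, 3, 5}, new double[] {1, 3, 5});
        System.out.println(Arrays.toString(h[2])); // prints [3.0, 2.0]
    }
}
```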
### breaks

static double[] breaks(double[] x, double h)
Returns the breakpoints between histogram cells for a dataset based on a suggested bin width h.
Parameters:
`x` - the data set.
`h` - the bin width.
Returns:
the breakpoints between histogram cells
### breaks

static double[] breaks(double min, double max, double h)
Returns the breakpoints between histogram cells for a given range based on a suggested bin width h.
Parameters:
`min` - the lower bound of bins.
`max` - the upper bound of bins.
`h` - the bin width.
Returns:
the breakpoints between histogram cells
### breaks

static double[] breaks(double[] x, int k)
Returns the breakpoints between histogram cells for a dataset.
Parameters:
`x` - the data set.
`k` - the number of bins.
Returns:
the breakpoints between histogram cells
### breaks

static double[] breaks(double min, double max, int k)
Returns the breakpoints between histogram cells for a given range.
Parameters:
`min` - the lower bound of bins.
`max` - the upper bound of bins.
`k` - the number of bins.
Returns:
the breakpoints between histogram cells
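As an illustration of what k bins over a range imply, the sketch below produces k+1 evenly spaced breakpoints. It assumes the first breakpoint equals `min` and the last equals `max`; the library may pad or adjust the range differently. The class name `EvenBreaksSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of evenly spaced breakpoints; not Smile's implementation. */
final class EvenBreaksSketch {
    /** Returns k + 1 breakpoints that split [min, max] into k equal-width cells. */
    static double[] breaks(double min, double max, int k) {
        if (k < 1 || !(max > min)) {
            throw new IllegalArgumentException("need k >= 1 and max > min");
        }
        double width = (max - min) / k;
        double[] b = new double[k + 1];
        for (int i = 0; i < k; i++) {
            b[i] = min + i * width;
        }
        b[k] = max; // set the last breakpoint exactly to avoid rounding drift
        return b;
    }

    public static void main(String[] args) {
        // prints [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
        System.out.println(Arrays.toString(breaks(0, 10, 5)));
    }
}
```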
### bins

static int bins(double[] x, double h)
Returns the number of bins for a data set based on a suggested bin width h.
Parameters:
`x` - the data set.
`h` - the bin width.
Returns:
the number of bins k = ceil((max - min) / h)
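The formula k = ceil((max - min) / h) can be worked through in a few lines. This is a sketch with the hypothetical class name `BinWidthSketch`. Returning at least one bin when all values are equal is an assumption of the sketch, not documented behavior.

```java
/** Hypothetical sketch of k = ceil((max - min) / h); not Smile's implementation. */
final class BinWidthSketch {
    /** Returns the number of bins of width h needed to cover the range of x. */
    static int bins(double[] x, double h) {
        if (x.length == 0 || !(h > 0)) {
            throw new IllegalArgumentException("need data and a positive bin width");
        }
        double min = x[0], max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        int k = (int) Math.ceil((max - min) / h);
        return Math.max(k, 1); // assumption: constant data still gets one bin
    }

    public static void main(String[] args) {
        // Range is 10 - 0 = 10, so width 3 needs ceil(3.33...) = 4 bins.
        System.out.println(bins(new double[] {0, 1, 2.5, 10}, 3.0)); // prints 4
    }
}
```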
### bins

static int bins(int n)
Returns the number of bins by square-root rule, which takes the square root of the number of data points in the sample (used by Excel histograms and many others).
Parameters:
`n` - the number of data points.
Returns:
the number of bins
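A plain reading of the square-root rule is sketched below. Rounding the square root up is an assumption of this sketch. The library may truncate instead or enforce a minimum number of bins. The class name `SquareRootRuleSketch` is hypothetical.

```java
/** Hypothetical sketch of the square-root rule; not Smile's implementation. */
final class SquareRootRuleSketch {
    /** Returns ceil(sqrt(n)) for n data points. */
    static int bins(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one data point");
        }
        // Assumption: round up so that every sample size gets at least one bin.
        return (int) Math.ceil(Math.sqrt(n));
    }

    public static void main(String[] args) {
        System.out.println(bins(100)); // prints 10
    }
}
```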
### sturges

static int sturges(int n)
Returns the number of bins by Sturges' rule k = ceil(log2(n) + 1).
Parameters:
`n` - the number of data points.
Returns:
the number of bins
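Sturges' rule k = ceil(log2(n) + 1) can be evaluated without floating-point logarithms, since ceil(log2(n)) for a positive integer n is the bit length of n - 1. The sketch below uses that identity. The class name `SturgesRuleSketch` is hypothetical, and the library may apply additional limits that this sketch does not.

```java
/** Hypothetical sketch of Sturges' rule; not Smile's implementation. */
final class SturgesRuleSketch {
    /** Returns ceil(log2(n) + 1) for n data points, using integer arithmetic. */
    static int sturges(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one data point");
        }
        // ceil(log2(n)) equals the number of bits needed to represent n - 1.
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(n - 1);
        return ceilLog2 + 1;
    }

    public static void main(String[] args) {
        // log2(100) is about 6.64, so k = ceil(7.64) = 8.
        System.out.println(sturges(100)); // prints 8
    }
}
```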
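### scott

static int scott(double[] x)
Returns the number of bins by Scott's rule, which sets the bin width to h = 3.5 * σ / n^(1/3), where σ is the standard deviation of the data and n is the number of data points.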
Parameters:
`x` - the data set.
Returns:
the number of bins
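Scott's rule yields a bin width, so a second step is needed to arrive at a bin count. The sketch below makes that step explicit under stated assumptions: σ is taken as the sample standard deviation (divisor n - 1), and the width is converted to a count with k = ceil((max - min) / h), the formula documented for `bins(double[] x, double h)`. The class and method names are hypothetical, and the library's rounding may differ.

```java
/** Hypothetical sketch of Scott's rule; not Smile's implementation. */
final class ScottRuleSketch {
    /** Returns the bin width h = 3.5 * sd / n^(1/3). */
    static double width(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two data points");
        }
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1)); // assumption: sample standard deviation
        return 3.5 * sd / Math.cbrt(n);
    }

    /** Returns the number of bins of Scott's width needed to cover the range of x. */
    static int bins(double[] x) {
        double h = width(x);
        double min = x[0], max = x[0];
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (!(h > 0)) return 1; // constant data: fall back to a single bin
        return Math.max(1, (int) Math.ceil((max - min) / h));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(width(x)); // roughly 4.29
        System.out.println(bins(x));  // prints 2
    }
}
```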