Packages

  • package root

    Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance.

    Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

  • package smile
  • package association

    Frequent item set mining and association rule mining. Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. Let I = {i_1, i_2, ..., i_n} be a set of n binary attributes called items. Let D = {t_1, t_2, ..., t_m} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. An association rule is defined as an implication of the form X ⇒ Y where X, Y ⊆ I and X ∩ Y = Ø. The item sets X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively. The support supp(X) of an item set X is defined as the proportion of transactions in the database that contain the item set. Note that the support of an association rule X ⇒ Y is supp(X ∪ Y). The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

    For example, the rule {onions, potatoes} ⇒ {burger} found in the sales data of a supermarket would indicate that a customer who buys onions and potatoes together is also likely to buy burgers. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements.
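
    The definitions of support and confidence above can be sketched directly in Scala. This is a minimal illustration on an in-memory database, not Smile's implementation; the object `RuleMetrics` and its method names are assumptions made for this example.

```scala
object RuleMetrics {
  type Transaction = Set[String]

  // supp(X): fraction of transactions that contain every item of X.
  def support(db: Seq[Transaction], items: Set[String]): Double =
    if (db.isEmpty) 0.0 else db.count(t => items.subsetOf(t)).toDouble / db.size

  // conf(X => Y) = supp(X ∪ Y) / supp(X); only defined when supp(X) > 0.
  def confidence(db: Seq[Transaction], lhs: Set[String], rhs: Set[String]): Double = {
    val s = support(db, lhs)
    require(s > 0.0, "the antecedent never occurs in the database")
    support(db, lhs ++ rhs) / s
  }

  def main(args: Array[String]): Unit = {
    // Toy sales data (made up for illustration).
    val db = Seq(
      Set("onions", "potatoes", "burger"),
      Set("onions", "potatoes"),
      Set("onions", "potatoes", "burger", "beer"),
      Set("milk", "bread")
    )
    println(support(db, Set("onions", "potatoes")))                   // 3 of 4 transactions
    println(confidence(db, Set("onions", "potatoes"), Set("burger"))) // 2 of those 3
  }
}
```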

    Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split up into two separate steps:

    • First, minimum support is applied to find all frequent item sets in a database (i.e. frequent item set mining).
    • Second, these frequent item sets and the minimum confidence constraint are used to form rules.

    Finding all frequent item sets in a database is difficult since it involves searching all possible item sets (item combinations). The set of possible item sets is the power set over I (the set of items) and has size 2^n - 1 (excluding the empty set, which is not a valid item set). Although the size of the power set grows exponentially in the number of items n in I, efficient search is possible using the downward-closure property of support (also called anti-monotonicity), which guarantees that all subsets of a frequent item set are frequent and thus that all supersets of an infrequent item set are infrequent.

    In practice, we may consider only the frequent item sets with the largest number of items, bypassing all their sub item sets. An item set is maximal frequent if none of its immediate supersets is frequent.

    For a maximal frequent item set, we know that all its sub item sets are frequent, but we do not know their actual supports, which are needed to derive the association rules within the item set. If the final goal is association rule mining, we would rather discover closed frequent item sets. An item set is closed if none of its immediate supersets has the same support as the item set.
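
    To make the two notions concrete, the sketch below filters a complete collection of frequent item sets (with their support counts) down to the maximal and the closed ones. It assumes the input already contains every frequent item set, so checking frequent supersets is sufficient; `ItemSetKinds` and its methods are hypothetical names, not part of Smile.

```scala
object ItemSetKinds {
  type ItemSet = Set[String]

  // Maximal: no frequent proper superset exists.
  def maximal(frequent: Map[ItemSet, Int]): Set[ItemSet] =
    frequent.keySet.filter { s =>
      !frequent.keySet.exists(t => s != t && s.subsetOf(t))
    }

  // Closed: no proper superset has the same support count.
  def closed(frequent: Map[ItemSet, Int]): Set[ItemSet] =
    frequent.keySet.filter { s =>
      !frequent.exists { case (t, c) => s != t && s.subsetOf(t) && c == frequent(s) }
    }

  def main(args: Array[String]): Unit = {
    // All frequent item sets of {a,b,c}, {a,b,c}, {a,b} with minimum count 2.
    val frequent: Map[ItemSet, Int] = Map(
      Set("a") -> 3, Set("b") -> 3, Set("c") -> 2,
      Set("a", "b") -> 3, Set("a", "c") -> 2, Set("b", "c") -> 2,
      Set("a", "b", "c") -> 2
    )
    println(maximal(frequent))
    println(closed(frequent))
  }
}
```

    In this toy input the only maximal item set is {a, b, c}, while {a, b} is additionally closed because its support (3) is strictly larger than that of its superset.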

    Some well-known algorithms for frequent item set mining are Apriori, Eclat and FP-growth. Apriori is the best-known algorithm for mining association rules. It uses a breadth-first search strategy to count the support of item sets and a candidate generation function that exploits the downward-closure property of support. Eclat is a depth-first search algorithm using set intersection.
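
    The level-wise (breadth-first) idea behind Apriori can be sketched as follows. This is a simplified teaching version that rescans the database for every candidate; it is not Smile's implementation, and `AprioriSketch.frequentItemSets` is a name invented for this example.

```scala
object AprioriSketch {
  type ItemSet = Set[String]

  // Returns every item set contained in at least `minCount` transactions.
  def frequentItemSets(db: Seq[ItemSet], minCount: Int): Map[ItemSet, Int] = {
    def count(c: ItemSet): Int = db.count(t => c.subsetOf(t))

    // Level 1: frequent single items.
    var level: Map[ItemSet, Int] =
      db.flatten.distinct.map(i => Set(i)).map(s => s -> count(s)).filter(_._2 >= minCount).toMap
    var all = level

    while (level.nonEmpty) {
      val prev = level.keySet
      val candidates = for {
        a <- prev
        b <- prev
        c = a ++ b
        // Join: keep unions that are exactly one item larger.
        if c.size == a.size + 1
        // Prune: by downward closure, every subset one item smaller must be frequent.
        if c.forall(i => prev.contains(c - i))
      } yield c
      level = candidates.map(c => c -> count(c)).filter(_._2 >= minCount).toMap
      all = all ++ level
    }
    all
  }

  def main(args: Array[String]): Unit = {
    val db = Seq(Set("a", "b", "c"), Set("a", "b", "c"), Set("a", "b"))
    frequentItemSets(db, 2).foreach(println)
  }
}
```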

    FP-growth (frequent pattern growth) uses an extended prefix-tree (FP-tree) structure to store the database in a compressed form. FP-growth adopts a divide-and-conquer approach to decompose both the mining tasks and the databases. It uses a pattern fragment growth method to avoid the costly process of candidate generation and testing used by Apriori.

    References:
    • R. Agrawal, T. Imielinski and A. Swami. Mining Association Rules Between Sets of Items in Large Databases, SIGMOD, 207-216, 1993.
    • Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. VLDB, 487-499, 1994.
    • Mohammed J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3):372-390, 2000.
    • Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery 8:53-87, 2004.
  • package cas

    Computer algebra system. A computer algebra system (CAS) has the ability to manipulate mathematical expressions in a way similar to the traditional manual computations of mathematicians and scientists.

    The symbolic manipulations supported include:

    • simplification to a smaller expression or some standard form, including automatic simplification with assumptions and simplification with constraints
    • substitution of symbols or numeric values for certain expressions
    • change of form of expressions: expanding products and powers, partial and full factorization, rewriting as partial fractions, constraint satisfaction, rewriting trigonometric functions as exponentials, transforming logic expressions, etc.
    • partial and total differentiation
    • matrix operations including products, inverses, etc.
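
    As an illustration of three of the manipulations listed above (differentiation, simplification and substitution), here is a deliberately tiny expression tree. It is a sketch under its own assumptions and does not reflect the types or API of the `smile.cas` package; all names in `TinyCas` are hypothetical.

```scala
object TinyCas {
  sealed trait Expr
  case class Const(v: Double) extends Expr
  case class Sym(name: String) extends Expr
  case class Add(a: Expr, b: Expr) extends Expr
  case class Mul(a: Expr, b: Expr) extends Expr

  // Partial derivative with respect to symbol `x` (sum rule and product rule).
  def d(e: Expr, x: String): Expr = e match {
    case Const(_)  => Const(0.0)
    case Sym(n)    => if (n == x) Const(1.0) else Const(0.0)
    case Add(a, b) => Add(d(a, x), d(b, x))
    case Mul(a, b) => Add(Mul(d(a, x), b), Mul(a, d(b, x)))
  }

  // Bottom-up simplification using a few algebraic identities.
  def simplify(e: Expr): Expr = e match {
    case Add(a, b) => (simplify(a), simplify(b)) match {
      case (Const(0.0), r)      => r
      case (l, Const(0.0))      => l
      case (Const(p), Const(q)) => Const(p + q)
      case (l, r)               => Add(l, r)
    }
    case Mul(a, b) => (simplify(a), simplify(b)) match {
      case (Const(0.0), _) | (_, Const(0.0)) => Const(0.0)
      case (Const(1.0), r)      => r
      case (l, Const(1.0))      => l
      case (Const(p), Const(q)) => Const(p * q)
      case (l, r)               => Mul(l, r)
    }
    case other => other
  }

  // Substitution of numeric values for symbols.
  def eval(e: Expr, env: Map[String, Double]): Double = e match {
    case Const(v)  => v
    case Sym(n)    => env(n)
    case Add(a, b) => eval(a, env) + eval(b, env)
    case Mul(a, b) => eval(a, env) * eval(b, env)
  }

  def main(args: Array[String]): Unit = {
    val e = Add(Mul(Sym("x"), Sym("x")), Mul(Const(3.0), Sym("x"))) // x*x + 3*x
    println(simplify(d(e, "x")))                                    // (x + x) + 3 as a tree
  }
}
```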
  • package classification

    Classification algorithms. In machine learning and pattern recognition, classification refers to an algorithmic procedure for assigning a given input object into one of a given number of categories. The input object is formally termed an instance, and the categories are termed classes.

    The instance is usually described by a vector of features, which together constitute a description of all known characteristics of the instance. Typically, features are either categorical (also known as nominal, i.e. consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g. "large", "medium" or "small"), integer-valued (e.g. a count of the number of occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure).

    Classification normally refers to a supervised procedure, i.e. a procedure that produces an inferred function to predict the output value of new instances based on a training set of pairs consisting of an input object and a desired output value. The inferred function is called a classifier if the output is discrete or a regression function if the output is continuous.

    The inferred function should predict the correct output value for any valid input object. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

    A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems. The most widely used learning algorithms are AdaBoost and gradient boosting, support vector machines, linear regression, linear discriminant analysis, logistic regression, naive Bayes, decision trees, k-nearest neighbor algorithm, and neural networks (multilayer perceptron).

    If the feature vectors include features of many different kinds (discrete, discrete ordered, counts, continuous values), some algorithms cannot be easily applied. Many algorithms, including linear regression, logistic regression, neural networks, and nearest neighbor methods, require that the input features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval). Methods that employ a distance function, such as nearest neighbor methods and support vector machines with Gaussian kernels, are particularly sensitive to this. An advantage of decision trees (and boosting algorithms based on decision trees) is that they easily handle heterogeneous data.
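
    A minimal sketch of the rescaling mentioned above, mapping each numerical feature (column) linearly onto [-1, 1]. It is plain Scala written for illustration and makes no claim about Smile's own feature transformation classes; `FeatureScaling.scale` is a hypothetical helper.

```scala
object FeatureScaling {
  // Rescales every column to [-1, 1]; a constant column is mapped to 0.
  def scale(data: Array[Array[Double]]): Array[Array[Double]] = {
    require(data.nonEmpty, "need at least one row")
    val cols = data.head.indices
    val lo = cols.map(j => data.map(_(j)).min)
    val hi = cols.map(j => data.map(_(j)).max)
    data.map { row =>
      cols.map { j =>
        val range = hi(j) - lo(j)
        if (range == 0.0) 0.0 else 2.0 * (row(j) - lo(j)) / range - 1.0
      }.toArray
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Array(Array(0.0, 10.0), Array(5.0, 20.0), Array(10.0, 30.0))
    scale(x).foreach(r => println(r.mkString(", ")))
  }
}
```

    In practice the minimum and maximum should be estimated on the training data only and then reused for new instances, so that test data does not leak into the transformation.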

    If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization.

    If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, linear support vector machines, naive Bayes) generally perform well. However, if there are complex interactions among features, then algorithms such as nonlinear support vector machines, decision trees and neural networks work better. Linear methods can also be applied, but the engineer must manually specify the interactions when using them.

    There are several major issues to consider in supervised learning:

    • Features: The accuracy of the inferred function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality; but should contain enough information to accurately predict the output. There are many algorithms for feature selection that seek to identify the relevant features and discard the irrelevant ones. More generally, dimensionality reduction may seek to map the input data into a lower dimensional space prior to running the supervised learning algorithm.
    • Overfitting: Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model which has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. The potential for overfitting depends not only on the number of parameters and data but also the conformability of the model structure with the data shape, and the magnitude of model error compared to the expected level of noise or error in the data. In order to avoid overfitting, it is necessary to use additional techniques (e.g. cross-validation, regularization, early stopping, pruning, Bayesian priors on parameters or model comparison), that can indicate when further training is not resulting in better generalization. The basis of some techniques is either (1) to explicitly penalize overly complex models, or (2) to test the model's ability to generalize by evaluating its performance on a set of data not used for training, which is assumed to approximate the typical unseen data that a model will encounter.
    • Regularization: Regularization involves introducing additional information in order to solve an ill-posed problem or to prevent over-fitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm. A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.
    • Bias-variance tradeoff: Mean squared error (MSE) can be broken down into two components, variance and squared bias, known as the bias-variance decomposition. Minimizing the MSE therefore requires keeping both the bias and the variance small. This is not trivial, because reducing one typically increases the other; hence there is a tradeoff between bias and variance.
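
    The bias-variance decomposition mentioned in the last item, MSE = Bias^2 + Variance for an estimator of a fixed quantity, can be checked numerically. The sketch below takes a collection of estimates (for example, from models trained on different samples) and the true value; `BiasVariance.decompose` is an illustrative name, and the variance is the population variance of the estimates.

```scala
object BiasVariance {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Returns (mse, squared bias, variance) of the estimates around `truth`.
  def decompose(estimates: Seq[Double], truth: Double): (Double, Double, Double) = {
    require(estimates.nonEmpty, "need at least one estimate")
    val m = mean(estimates)
    val bias = m - truth
    val variance = mean(estimates.map(e => (e - m) * (e - m)))
    val mse = mean(estimates.map(e => (e - truth) * (e - truth)))
    (mse, bias * bias, variance)
  }

  def main(args: Array[String]): Unit = {
    val (mse, bias2, variance) = decompose(Seq(1.0, 2.0, 3.0), truth = 1.0)
    // The two printed numbers agree up to floating-point rounding.
    println(mse)
    println(bias2 + variance)
  }
}
```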
  • package clustering

    Clustering analysis. Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields.

    Hierarchical algorithms find successive clusters using previously established clusters. These algorithms usually are either agglomerative ("bottom-up") or divisive ("top-down"). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
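
    The agglomerative ("bottom-up") scheme can be sketched in a few lines for one-dimensional points with single linkage, where the distance between two clusters is the smallest distance between their members. This is a quadratic-time illustration, not Smile's hierarchical clustering; `SingleLinkage.cluster` is a hypothetical function.

```scala
object SingleLinkage {
  // Starts with one cluster per distinct point and repeatedly merges the two
  // closest clusters until only k clusters remain.
  def cluster(points: Seq[Double], k: Int): Set[Set[Double]] = {
    require(k >= 1 && k <= points.distinct.size, "k must be between 1 and the number of distinct points")
    var clusters: List[Set[Double]] = points.distinct.map(p => Set(p)).toList

    // Single linkage: minimum pairwise distance between members.
    def dist(a: Set[Double], b: Set[Double]): Double =
      (for (x <- a; y <- b) yield math.abs(x - y)).min

    while (clusters.size > k) {
      val pairs = for {
        (a, i) <- clusters.zipWithIndex
        (b, j) <- clusters.zipWithIndex
        if i < j
      } yield (a, b)
      val (a, b) = pairs.minBy { case (p, q) => dist(p, q) }
      clusters = (a ++ b) :: clusters.filter(c => c != a && c != b)
    }
    clusters.toSet
  }

  def main(args: Array[String]): Unit =
    println(cluster(Seq(1.0, 1.5, 5.0, 5.2, 9.0), k = 3))
}
```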

    Partitional algorithms typically determine all clusters at once, but can also be used as divisive algorithms in the hierarchical clustering. Many partitional clustering algorithms require the specification of the number of clusters to produce in the input data set, prior to execution of the algorithm. Barring knowledge of the proper value beforehand, the appropriate value must be determined, a problem on its own for which a number of techniques have been developed.

    Density-based clustering algorithms are devised to discover arbitrary-shaped clusters. In this approach, a cluster is regarded as a region in which the density of data objects exceeds a threshold.

    Subspace clustering methods look for clusters that can only be seen in a particular projection (subspace, manifold) of the data. These methods can thus ignore irrelevant attributes. The general problem is also known as correlation clustering, while the special case of axis-parallel subspaces is also known as two-way clustering, co-clustering or biclustering in bioinformatics: in these methods not only the objects are clustered but also the features of the objects, i.e., if the data is represented in a data matrix, the rows and columns are clustered simultaneously. However, they usually do not work with arbitrary feature combinations as general subspace methods do.

  • package data

    Data manipulation functions.
  • package feature
  • package manifold

    Manifold learning finds a low-dimensional basis for describing high-dimensional data. Manifold learning is a popular approach to nonlinear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high; though each data point consists of perhaps thousands of features, it may be described as a function of only a few underlying parameters. That is, the data points are actually samples from a low-dimensional manifold that is embedded in a high-dimensional space. Manifold learning algorithms attempt to uncover these parameters in order to find a low-dimensional representation of the data.

    Some prominent approaches are locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and LTSA. These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data, and can be viewed as defining a graph-based kernel for Kernel PCA. More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel using semidefinite programming. The most prominent example of such a technique is maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space), while maximizing the distances between points that are not nearest neighbors.

    An alternative approach to neighborhood preservation is through the minimization of a cost function that measures differences between distances in the input and output spaces. Important examples of such techniques include classical multidimensional scaling (which is identical to PCA), Isomap (which uses geodesic distances in the data space), diffusion maps (which use diffusion distances in the data space), t-SNE (which minimizes the divergence between distributions over pairs of points), and curvilinear component analysis.

  • package math

    Mathematical and statistical functions.
  • package distance

    Distance functions.

  • package nlp

    Natural language processing.
  • package plot

    Data visualization.
  • package regression

    Regression analysis. Regression analysis includes any technique for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables. The estimation target is therefore a function of the independent variables, called the regression function. Regression analysis is widely used for prediction and forecasting.
  • package sequence

    Sequence labeling algorithms.
  • package util

    Utility functions.
  • package validation

    Model validation.
  • package wavelet

    A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. Like the fast Fourier transform (FFT), the discrete wavelet transform (DWT) is a fast, linear operation that operates on a data vector whose length is an integer power of 2, transforming it into a numerically different vector of the same length. The wavelet transform is invertible and in fact orthogonal. Both FFT and DWT can be viewed as a rotation in function space.
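
    The properties stated above (power-of-2 length, invertibility, orthogonality) can be seen in the simplest wavelet, the Haar wavelet. The following is a self-contained sketch of an orthonormal Haar DWT and its inverse; it is not the implementation in the `smile.wavelet` package, and `HaarWavelet` is a name chosen for this example.

```scala
object HaarWavelet {
  private val s = math.sqrt(2.0)

  private def isPowerOfTwo(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  // Full decomposition: repeatedly split the leading block into
  // smooth (scaled sum) and detail (scaled difference) coefficients.
  def forward(x: Array[Double]): Array[Double] = {
    require(isPowerOfTwo(x.length), "length must be a power of 2")
    val a = x.clone()
    var len = a.length
    while (len > 1) {
      val half = len / 2
      val tmp = new Array[Double](len)
      for (i <- 0 until half) {
        tmp(i) = (a(2 * i) + a(2 * i + 1)) / s
        tmp(half + i) = (a(2 * i) - a(2 * i + 1)) / s
      }
      Array.copy(tmp, 0, a, 0, len)
      len = half
    }
    a
  }

  // Inverse transform: undo the levels in the opposite order.
  def inverse(w: Array[Double]): Array[Double] = {
    require(isPowerOfTwo(w.length), "length must be a power of 2")
    val a = w.clone()
    var len = 2
    while (len <= a.length) {
      val half = len / 2
      val tmp = new Array[Double](len)
      for (i <- 0 until half) {
        tmp(2 * i) = (a(i) + a(half + i)) / s
        tmp(2 * i + 1) = (a(i) - a(half + i)) / s
      }
      Array.copy(tmp, 0, a, 0, len)
      len *= 2
    }
    a
  }

  def main(args: Array[String]): Unit = {
    val x = Array(4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0)
    val w = forward(x)
    println(w.mkString(", "))
    println(inverse(w).mkString(", ")) // recovers x up to rounding
  }
}
```

    Because the transform is orthogonal, the sum of squares of the coefficients equals the sum of squares of the input, which is a convenient sanity check.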


package math

Mathematical and statistical functions.

Package Members

  1. package distance

    Distance functions.

Type Members

  1. case class AbsMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  2. case class AbsVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  3. case class AcosMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  4. case class AcosVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  5. case class AsinMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  6. case class AsinVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  7. case class AtanMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  8. case class AtanVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  9. case class Ax(A: MatrixExpression, x: VectorExpression) extends VectorExpression with Product with Serializable
  10. case class CbrtMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  11. case class CbrtVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  12. case class CeilMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  13. case class CeilVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  14. case class ExpMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  15. case class ExpVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  16. case class Expm1Matrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  17. case class Expm1Vector(x: VectorExpression) extends VectorExpression with Product with Serializable
  18. case class FloorMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  19. case class FloorVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  20. case class Log10Matrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  21. case class Log10Vector(x: VectorExpression) extends VectorExpression with Product with Serializable
  22. case class Log1pMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  23. case class Log1pVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  24. case class Log2Matrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  25. case class Log2Vector(x: VectorExpression) extends VectorExpression with Product with Serializable
  26. case class LogMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  27. case class LogVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  28. case class MatrixAddMatrix(A: MatrixExpression, B: MatrixExpression) extends MatrixExpression with Product with Serializable
  29. case class MatrixAddValue(A: MatrixExpression, x: Double) extends MatrixExpression with Product with Serializable
  30. case class MatrixDivMatrix(A: MatrixExpression, B: MatrixExpression) extends MatrixExpression with Product with Serializable
  31. case class MatrixDivValue(A: MatrixExpression, x: Double) extends MatrixExpression with Product with Serializable
  32. sealed trait MatrixExpression extends AnyRef
  33. case class MatrixLift(A: Matrix) extends MatrixExpression with Product with Serializable
  34. case class MatrixMulMatrix(A: MatrixExpression, B: MatrixExpression) extends MatrixExpression with Product with Serializable
  35. case class MatrixMulValue(A: MatrixExpression, x: Double) extends MatrixExpression with Product with Serializable
  36. case class MatrixMultiplication(A: MatrixExpression, B: MatrixExpression) extends MatrixExpression with Product with Serializable
  37. case class MatrixMultiplicationChain(A: Seq[MatrixExpression]) extends MatrixExpression with Product with Serializable
  38. class MatrixOrderOptimization extends LazyLogging

    Optimizes the order of a matrix multiplication chain. Matrix multiplication is associative, so every parenthesization of a chain yields the same product. The computational cost, however, depends on the order in which the products are evaluated.
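
    The effect of evaluation order on cost can be illustrated with the textbook dynamic program for the matrix chain problem. The sketch only computes the minimum number of scalar multiplications and is not taken from `MatrixOrderOptimization`; `MatrixChain.minCost` and the `dims` convention are assumptions of this example.

```scala
object MatrixChain {
  // The i-th matrix of the chain has shape dims(i) x dims(i + 1).
  // Returns the minimum number of scalar multiplications over all parenthesizations.
  def minCost(dims: Array[Int]): Long = {
    require(dims.length >= 2, "need at least one matrix")
    val n = dims.length - 1
    // cost(i)(j): cheapest way to multiply matrices i..j (inclusive).
    val cost = Array.ofDim[Long](n, n)
    for (len <- 2 to n; i <- 0 to n - len) {
      val j = i + len - 1
      cost(i)(j) = (i until j).map { k =>
        // Split into (i..k) and (k+1..j), then multiply the two results.
        cost(i)(k) + cost(k + 1)(j) + dims(i).toLong * dims(k + 1) * dims(j + 1)
      }.min
    }
    cost(0)(n - 1)
  }

  def main(args: Array[String]): Unit = {
    // A: 10x100, B: 100x5, C: 5x50.
    // (AB)C costs 10*100*5 + 10*5*50 = 7500, A(BC) costs 100*5*50 + 10*100*50 = 75000.
    println(minCost(Array(10, 100, 5, 50))) // 7500
  }
}
```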

  39. case class MatrixSubMatrix(A: MatrixExpression, B: MatrixExpression) extends MatrixExpression with Product with Serializable
  40. case class MatrixSubValue(A: MatrixExpression, x: Double) extends MatrixExpression with Product with Serializable
  41. case class MatrixTranspose(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  42. case class RoundMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  43. case class RoundVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  44. case class SinMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  45. case class SinVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  46. case class Slice(start: Int, end: Int, step: Int = 1) extends Product with Serializable

    Python-like slicing.
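
    For readers unfamiliar with the convention, the sketch below reproduces how Python resolves `a[start:end:step]` for a positive step: the end is exclusive, negative indices count from the end, and out-of-range bounds are clipped. Whether Smile's `Slice` follows every one of these rules is not stated here, so treat this as an illustration of the Python behavior only; `PySlice.indices` is a hypothetical helper.

```scala
object PySlice {
  // Indices selected by Python's a[start:end:step] on a sequence of length n (step > 0).
  def indices(n: Int, start: Int, end: Int, step: Int = 1): Range = {
    require(n >= 0 && step > 0, "this sketch only handles positive steps")
    // Negative positions count from the end; everything is clipped to [0, n].
    def norm(i: Int): Int = if (i < 0) math.max(i + n, 0) else math.min(i, n)
    norm(start) until norm(end) by step
  }

  def main(args: Array[String]): Unit = {
    println(indices(10, 1, 7, 2).toList) // List(1, 3, 5)
    println(indices(10, -3, 10).toList)  // List(7, 8, 9)
  }
}
```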

  47. case class SqrtMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  48. case class SqrtVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  49. case class TanMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  50. case class TanVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  51. case class TanhMatrix(A: MatrixExpression) extends MatrixExpression with Product with Serializable
  52. case class TanhVector(x: VectorExpression) extends VectorExpression with Product with Serializable
  53. case class ValueAddMatrix(x: Double, A: MatrixExpression) extends MatrixExpression with Product with Serializable
  54. case class ValueAddVector(y: Double, x: VectorExpression) extends VectorExpression with Product with Serializable
  55. case class ValueDivMatrix(x: Double, A: MatrixExpression) extends MatrixExpression with Product with Serializable
  56. case class ValueDivVector(y: Double, x: VectorExpression) extends VectorExpression with Product with Serializable
  57. case class ValueMulMatrix(x: Double, A: MatrixExpression) extends MatrixExpression with Product with Serializable
  58. case class ValueMulVector(y: Double, x: VectorExpression) extends VectorExpression with Product with Serializable
  59. case class ValueSubMatrix(x: Double, A: MatrixExpression) extends MatrixExpression with Product with Serializable
  60. case class ValueSubVector(y: Double, x: VectorExpression) extends VectorExpression with Product with Serializable
  61. case class VectorAddValue(x: VectorExpression, y: Double) extends VectorExpression with Product with Serializable
  62. case class VectorAddVector(x: VectorExpression, y: VectorExpression) extends VectorExpression with Product with Serializable
  63. case class VectorDivValue(x: VectorExpression, y: Double) extends VectorExpression with Product with Serializable
  64. case class VectorDivVector(x: VectorExpression, y: VectorExpression) extends VectorExpression with Product with Serializable
  65. sealed trait VectorExpression extends AnyRef

    Vector Expression.

  66. case class VectorLift(x: Array[Double]) extends VectorExpression with Product with Serializable
  67. case class VectorMulValue(x: VectorExpression, y: Double) extends VectorExpression with Product with Serializable
  68. case class VectorMulVector(x: VectorExpression, y: VectorExpression) extends VectorExpression with Product with Serializable
  69. case class VectorSubValue(x: VectorExpression, y: Double) extends VectorExpression with Product with Serializable
  70. case class VectorSubVector(x: VectorExpression, y: VectorExpression) extends VectorExpression with Product with Serializable
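
    The type members above form an expression tree: each operation is a case class that wraps its operands instead of computing a result immediately. The following miniature shows the general pattern, assuming lazy element-wise evaluation; it uses its own hypothetical names (`VExpr`, `Lift`, `AddVV`, `MulVS`, `AbsV`) and does not claim to mirror how Smile evaluates `VectorExpression`.

```scala
object LazyVectorExpr {
  sealed trait VExpr {
    def length: Int
    def apply(i: Int): Double
    def +(that: VExpr): VExpr = AddVV(this, that)
    def *(c: Double): VExpr = MulVS(this, c)
    // Materializes the whole tree in one pass, without intermediate arrays.
    def toArray: Array[Double] = Array.tabulate(length)(i => apply(i))
  }

  // Leaf node: wraps a concrete array.
  case class Lift(x: Array[Double]) extends VExpr {
    def length: Int = x.length
    def apply(i: Int): Double = x(i)
  }

  case class AddVV(a: VExpr, b: VExpr) extends VExpr {
    require(a.length == b.length, "vectors must have the same length")
    def length: Int = a.length
    def apply(i: Int): Double = a(i) + b(i)
  }

  case class MulVS(a: VExpr, c: Double) extends VExpr {
    def length: Int = a.length
    def apply(i: Int): Double = a(i) * c
  }

  case class AbsV(a: VExpr) extends VExpr {
    def length: Int = a.length
    def apply(i: Int): Double = math.abs(a(i))
  }

  def main(args: Array[String]): Unit = {
    val e = AbsV((Lift(Array(1.0, -2.0)) + Lift(Array(-3.0, 4.0))) * 2.0)
    println(e)                        // the unevaluated tree
    println(e.toArray.mkString(", ")) // evaluated only here
  }
}
```

    A design like this lets an optimizer inspect or rewrite the tree (for example, reordering a multiplication chain) before any numbers are computed.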

Value Members

  1. def abs(x: MatrixExpression): AbsMatrix
  2. def abs(x: VectorExpression): AbsVector
  3. def acos(x: MatrixExpression): AcosMatrix
  4. def acos(x: VectorExpression): AcosVector
  5. implicit def array2Matrix(data: Array[Array[Double]]): Matrix
  6. implicit def array2Matrix(data: Array[Double]): Matrix
  7. implicit def array2VectorExpression(x: Array[Double]): VectorLift
  8. def asin(x: MatrixExpression): AsinMatrix
  9. def asin(x: VectorExpression): AsinVector
  10. def atan(x: MatrixExpression): AtanMatrix
  11. def atan(x: VectorExpression): AtanVector
  12. def beta(x: Double, y: Double): Double

    The beta function, also called the Euler integral of the first kind.

    B(x, y) = \int_0^1 t^{x-1} (1-t)^{y-1} \, dt

    for x, y > 0, where the integration is over [0, 1]. The beta function is symmetric, i.e. B(x, y) = B(y, x).
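
    The integral definition can be checked numerically. The sketch below applies composite Simpson's rule and is restricted to x, y >= 1 so that the integrand stays bounded on [0, 1]; it is an illustration only and is not how `beta` is computed in Smile. `BetaSketch` and the parameter `n` (number of subintervals) are assumptions of this example.

```scala
object BetaSketch {
  // Approximates B(x, y) as the integral of t^(x-1) * (1-t)^(y-1) over [0, 1].
  def beta(x: Double, y: Double, n: Int = 1000): Double = {
    require(x >= 1.0 && y >= 1.0, "this sketch needs a bounded integrand")
    require(n > 0 && n % 2 == 0, "Simpson's rule needs an even number of subintervals")
    def f(t: Double): Double = math.pow(t, x - 1.0) * math.pow(1.0 - t, y - 1.0)
    val h = 1.0 / n
    // Simpson weights: 4 for odd interior nodes, 2 for even interior nodes.
    val inner = (1 until n).map(i => (if (i % 2 == 1) 4 else 2) * f(i * h)).sum
    (f(0.0) + inner + f(1.0)) * h / 3.0
  }

  def main(args: Array[String]): Unit = {
    println(beta(2.0, 3.0)) // exact value is 1/12
    println(beta(3.0, 2.0)) // symmetric: same value
  }
}
```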

  13. def cbrt(x: MatrixExpression): CbrtMatrix
  14. def cbrt(x: VectorExpression): CbrtVector
  15. def ceil(x: MatrixExpression): CeilMatrix
  16. def ceil(x: VectorExpression): CeilVector
  17. def chisqtest(table: Array[Array[Int]]): ChiSqTest

    Given a two-dimensional contingency table in the form of an array of integers, returns the chi-square test for independence. The rows of the contingency table are labeled by the values of one nominal variable, the columns are labeled by the values of the other nominal variable, and the entries are non-negative integers giving the number of observed events for each combination of row and column. A continuity correction is applied when computing the test statistic for 2x2 tables: one half is subtracted from all |O-E| differences. The correlation coefficient is calculated as Cramer's V.
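
    A sketch of the test statistic described above, with expected counts E = (row total × column total) / N and the one-half continuity correction for 2x2 tables. It computes only the statistic and the degrees of freedom (no p-value or Cramer's V), assumes every row and column total is positive, and is not Smile's `ChiSqTest` code; `ChiSquareSketch.statistic` is a hypothetical name.

```scala
object ChiSquareSketch {
  // Returns (chi-square statistic, degrees of freedom) for a rectangular table.
  def statistic(table: Array[Array[Int]]): (Double, Int) = {
    require(table.nonEmpty && table.forall(_.length == table(0).length), "table must be rectangular")
    val rowSums = table.map(_.sum.toDouble)
    val colSums = table.transpose.map(_.sum.toDouble)
    val n = rowSums.sum
    // Continuity correction applies to 2x2 tables only.
    val correct = table.length == 2 && table(0).length == 2
    val chi2 = (for {
      i <- table.indices
      j <- table(i).indices
    } yield {
      val expected = rowSums(i) * colSums(j) / n
      val diff = math.abs(table(i)(j) - expected)
      val d = if (correct) diff - 0.5 else diff
      d * d / expected
    }).sum
    (chi2, (table.length - 1) * (table(0).length - 1))
  }

  def main(args: Array[String]): Unit = {
    println(statistic(Array(Array(10, 20, 30), Array(20, 20, 20))))
    println(statistic(Array(Array(10, 20), Array(20, 10))))
  }
}
```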

  18. def chisqtest(x: Array[Int], prob: Array[Double], constraints: Int = 1): ChiSqTest

    One-sample chisq test. Given the array x containing the observed numbers of events, an array prob containing the expected probabilities of events, and the number of constraints (normally one), a small p-value indicates a significant difference between the distributions.

  19. def chisqtest2(x: Array[Int], y: Array[Int], constraints: Int = 1): ChiSqTest

    Two-sample chisq test. Given the arrays x and y, containing two sets of binned data, and the number of constraints (normally one), a small p-value indicates a significant difference between the two distributions.

  20. def cholesky(A: MatrixExpression): Cholesky

    Cholesky decomposition.

  21. def cholesky(A: Matrix): Cholesky

    Cholesky decomposition.

  22. def cholesky(A: Array[Array[Double]]): Cholesky

    Cholesky decomposition.

  23. def det(A: MatrixExpression): Double

    Returns the determinant of the matrix.

  24. def det(A: Matrix): Double

    Returns the determinant of the matrix.

  25. def diag(A: Matrix): Array[Double]

    Returns the diagonal elements of the matrix.

  26. def digamma(x: Double): Double

    The digamma function is defined as the logarithmic derivative of the gamma function.

  27. def eig(A: MatrixExpression): EVD

    Returns the eigenvalues.

  28. def eig(A: Matrix): EVD

    Returns the eigenvalues.

  29. def eig(A: Array[Array[Double]]): EVD

    Returns the eigenvalues.

  30. def eigen(A: IMatrix, k: Int): EVD

    Returns the eigenvectors corresponding to the k largest eigenvalues.

  31. def eigen(A: MatrixExpression): EVD

    Eigen decomposition.

  32. def eigen(A: Matrix): EVD

    Eigen decomposition.

  33. def eigen(A: Array[Array[Double]]): EVD

    Eigen decomposition.

  34. def erf(x: Double): Double

    The error function (also called the Gauss error function) is a special function of sigmoid shape which occurs in probability, statistics, materials science, and partial differential equations. It is defined as:

    \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt

    The complementary error function, denoted erfc, is defined as erfc(x) = 1 - erf(x). The error function and complementary error function are special cases of the incomplete gamma function.
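    The relationship between erf and erfc can be checked numerically. The sketch below assumes the standard normalized definition, erf(x) = 2/sqrt(pi) times the integral of exp(-t^2) from 0 to x, and evaluates the integral with composite Simpson's rule. It is an illustration only (the object name `ErfSketch` and the step count are assumptions) and is far slower than a production routine.

```scala
object ErfSketch {
  /** erf(x) by composite Simpson's rule with n subintervals (n must be even). */
  def erf(x: Double, n: Int = 1000): Double = {
    require(n > 0 && n % 2 == 0, "n must be a positive even number")
    val h = x / n
    def f(t: Double): Double = math.exp(-t * t)
    var sum = f(0.0) + f(x)
    for (i <- 1 until n) {
      // Simpson weights: 4 for odd interior nodes, 2 for even ones.
      sum += (if (i % 2 == 1) 4.0 else 2.0) * f(i * h)
    }
    2.0 / math.sqrt(math.Pi) * sum * h / 3.0
  }

  /** Complementary error function, erfc(x) = 1 - erf(x). */
  def erfc(x: Double): Double = 1.0 - erf(x)
}
```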

  35. def erfc(x: Double): Double

    The complementary error function.

  36. def erfcc(x: Double): Double

    The complementary error function with fractional error everywhere less than 1.2 × 10^-7. This concise routine is faster than erfc.
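    A routine with this error bound is commonly written as a Chebyshev-fitted polynomial inside an exponential. The sketch below uses the widely published coefficients for that approximation; that Smile uses these same coefficients is an assumption, and the object name `ErfccSketch` is hypothetical.

```scala
object ErfccSketch {
  // Polynomial coefficients, lowest order first (assumed, from the
  // widely published Chebyshev fit with ~1.2e-7 fractional error).
  private val coef = Array(
    -1.26551223, 1.00002368, 0.37409196, 0.09678418, -0.18628806,
    0.27886807, -1.13520398, 1.48851587, -0.82215223, 0.17087277)

  def erfcc(x: Double): Double = {
    val z = math.abs(x)
    val t = 1.0 / (1.0 + 0.5 * z)
    // Horner evaluation: c0 + t*(c1 + t*(c2 + ...)).
    val poly = coef.foldRight(0.0)((c, acc) => c + t * acc)
    val ans = t * math.exp(-z * z + poly)
    // Use the symmetry erfc(-x) = 2 - erfc(x) for negative arguments.
    if (x >= 0.0) ans else 2.0 - ans
  }
}
```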

  37. def exp(x: MatrixExpression): ExpMatrix
  38. def exp(x: VectorExpression): ExpVector
  39. def expm1(x: MatrixExpression): Expm1Matrix
  40. def expm1(x: VectorExpression): Expm1Vector
  41. def eye(m: Int, n: Int): Matrix

    Returns an m-by-n identity matrix.

  42. def eye(n: Int): Matrix

    Returns an n-by-n identity matrix.

  43. def floor(x: MatrixExpression): FloorMatrix
  44. def floor(x: VectorExpression): FloorVector
  45. def ftest(x: Array[Double], y: Array[Double]): FTest

    Test if the arrays x and y have significantly different variances. Small values of p-value indicate that the two arrays have significantly different variances.
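    The quantity behind this test is the ratio of the two sample variances. A minimal sketch, assuming the usual convention of placing the larger variance in the numerator (the object name `FTestSketch` is hypothetical and the p-value step is omitted):

```scala
object FTestSketch {
  /** Unbiased sample variance (divides by n - 1). */
  def variance(a: Array[Double]): Double = {
    val mean = a.sum / a.length
    a.map(v => (v - mean) * (v - mean)).sum / (a.length - 1)
  }

  /** Returns (F statistic, numerator df, denominator df). */
  def statistic(x: Array[Double], y: Array[Double]): (Double, Int, Int) = {
    val vx = variance(x)
    val vy = variance(y)
    // Put the larger variance on top so that F >= 1.
    if (vx > vy) (vx / vy, x.length - 1, y.length - 1)
    else (vy / vx, y.length - 1, x.length - 1)
  }
}
```

    For x = (1, 2, 3, 4, 5) and y = (2, 4, 6, 8, 10) the variances are 2.5 and 10, giving F = 4 with (4, 4) degrees of freedom.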

  46. def gamma(x: Double): Double

    Gamma function. Lanczos approximation (6 terms).

  47. def inv(A: MatrixExpression): Matrix

    Returns the inverse of matrix.

  48. def inv(A: Matrix): Matrix

    Returns the inverse of matrix.

  49. def inverf(p: Double): Double

    The inverse error function.

  50. def inverfc(p: Double): Double

    The inverse complementary error function.

  51. def kendalltest(x: Array[Double], y: Array[Double]): CorTest

    Kendall rank correlation test. The Kendall Tau Rank Correlation Coefficient is used to measure the degree of correspondence between sets of rankings where the measures are not equidistant. It is used with non-parametric data. The p-value is calculated by approximation, which is good for n > 10.
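    To make the coefficient concrete, the sketch below counts concordant and discordant pairs and normalizes by the number of pairs that are untied in each variable. This tie handling is one common convention and is an assumption about the library's behavior; the object name `KendallSketch` is hypothetical and no p-value is computed.

```scala
object KendallSketch {
  /** Kendall's tau with ties excluded from the normalization. */
  def tau(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "x and y must have the same length")
    var score = 0 // concordant minus discordant pairs
    var n1 = 0L   // pairs not tied in x
    var n2 = 0L   // pairs not tied in y
    for (j <- 0 until x.length - 1; k <- j + 1 until x.length) {
      val a1 = x(j) - x(k)
      val a2 = y(j) - y(k)
      val aa = a1 * a2
      if (aa != 0.0) {
        n1 += 1; n2 += 1
        score += (if (aa > 0.0) 1 else -1)
      } else {
        if (a1 != 0.0) n1 += 1
        if (a2 != 0.0) n2 += 1
      }
    }
    score / (math.sqrt(n1.toDouble) * math.sqrt(n2.toDouble))
  }
}
```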

  52. def kstest(x: Array[Double], y: Array[Double]): KSTest

    The two-sample KS test for the null hypothesis that the data sets are drawn from the same distribution. Small values of p-value show that the cumulative distribution function of x is significantly different from that of y. The arrays x and y are modified by being sorted into ascending order.
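    The KS statistic is the largest absolute gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic, and it works on sorted copies so the caller's arrays are left untouched (unlike the method documented here, which sorts its arguments in place). The object name `KsSketch` is hypothetical.

```scala
object KsSketch {
  /** Maximum distance between the empirical CDFs of x and y. */
  def statistic(x: Array[Double], y: Array[Double]): Double = {
    val a = x.sorted // sorted copies; inputs are not modified
    val b = y.sorted
    var i = 0
    var j = 0
    var fa = 0.0 // empirical CDF of x at the current point
    var fb = 0.0 // empirical CDF of y at the current point
    var d = 0.0
    while (i < a.length && j < b.length) {
      val da = a(i)
      val db = b(j)
      if (da <= db) { i += 1; fa = i.toDouble / a.length }
      if (db <= da) { j += 1; fb = j.toDouble / b.length }
      d = math.max(d, math.abs(fa - fb))
    }
    d
  }
}
```

    If the in-place sorting of the documented method is a concern, passing `x.clone()` and `y.clone()` keeps the original order intact.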

  53. def kstest(x: Array[Double], y: Distribution): KSTest

    The one-sample KS test for the null hypothesis that the data set x is drawn from the given distribution. Small values of p-value show that the cumulative distribution function of x is significantly different from the given distribution. The array x is modified by being sorted into ascending order.

  54. def lgamma(x: Double): Double

    Log of the gamma function. Lanczos approximation (6 terms).
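    A six-term Lanczos approximation is often written with the coefficients below. That the library uses exactly these constants is an assumption; the sketch is only meant to show the shape of the computation for x > 0, and the object name `LanczosGammaSketch` is hypothetical.

```scala
object LanczosGammaSketch {
  // Six Lanczos coefficients (assumed; a widely published set).
  private val cof = Array(
    76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5)

  /** Natural log of the gamma function for x > 0. */
  def lgamma(x: Double): Double = {
    require(x > 0.0, "x must be positive")
    var tmp = x + 5.5
    tmp -= (x + 0.5) * math.log(tmp)
    var ser = 1.000000000190015
    var y = x
    for (c <- cof) {
      y += 1.0
      ser += c / y
    }
    // 2.5066282746310005 is sqrt(2 * pi).
    -tmp + math.log(2.5066282746310005 * ser / x)
  }

  /** Gamma function via exp(lgamma(x)); overflows for large x. */
  def gamma(x: Double): Double = math.exp(lgamma(x))
}
```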

  55. def log(x: MatrixExpression): LogMatrix
  56. def log(x: VectorExpression): LogVector
  57. def log10(x: MatrixExpression): Log10Matrix
  58. def log10(x: VectorExpression): Log10Vector
  59. def log1p(x: MatrixExpression): Log1pMatrix
  60. def log1p(x: VectorExpression): Log1pVector
  61. def log2(x: MatrixExpression): Log2Matrix
  62. def log2(x: VectorExpression): Log2Vector
  63. def lu(A: MatrixExpression): LU

    LU decomposition.

  64. def lu(A: Matrix): LU

    LU decomposition.

  65. def lu(A: Array[Array[Double]]): LU

    LU decomposition.

  66. implicit def matrix2MatrixExpression(x: Matrix): MatrixLift
  67. implicit def matrixExpression2Array(exp: MatrixExpression): Matrix
  68. implicit def matrixOps(matrix: Matrix): MatrixOps
  69. def ones(m: Int, n: Int): Matrix

    Returns an m-by-n matrix of all ones.

  70. def ones(n: Int): Matrix

    Returns an n-by-n matrix of all ones.

  71. def pearsontest(x: Array[Double], y: Array[Double]): CorTest

    Pearson correlation coefficient test.

  72. implicit def pimpArray2D(data: Array[Array[Double]]): PimpedArray2D
  73. implicit def pimpDouble(x: Double): PimpedDouble
  74. implicit def pimpDoubleArray(data: Array[Double]): PimpedDoubleArray
  75. implicit def pimpInt(x: Int): PimpedInt
  76. implicit def pimpIntArray(data: Array[Int]): PimpedArray[Int]
  77. def qr(A: MatrixExpression): QR

    QR decomposition.

  78. def qr(A: Matrix): QR

    QR decomposition.

  79. def qr(A: Array[Array[Double]]): QR

    QR decomposition.

  80. def rand(m: Int, n: Int, lo: Double = 0.0, hi: Double = 1.0): Matrix

    Returns an m-by-n matrix of uniformly distributed random numbers.

  81. def randn(m: Int, n: Int, mu: Double = 0.0, sigma: Double = 1.0): Matrix

    Returns an m-by-n matrix of normally distributed random numbers.

  82. def rank(A: MatrixExpression): Int

    Returns the rank of matrix.

  83. def rank(A: Matrix): Int

    Returns the rank of matrix.

  84. def round(x: MatrixExpression): RoundMatrix
  85. def round(x: VectorExpression): RoundVector
  86. def sin(x: MatrixExpression): SinMatrix
  87. def sin(x: VectorExpression): SinVector
  88. def spearmantest(x: Array[Double], y: Array[Double]): CorTest

    Spearman rank correlation coefficient test. The Spearman rank correlation coefficient is the Pearson coefficient computed on the data converted to rankings (i.e., when the variables are ordinal). It can be used when the data are non-parametric and the Pearson coefficient is therefore not appropriate.

    The raw scores are converted to ranks and the differences between the ranks of each observation on the two variables are calculated.

    The p-value is calculated by approximation, which is good for n > 10.
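    The rank conversion can be sketched as follows: each array is replaced by its ranks (tied values share the average rank, which is an assumed convention), and the Pearson coefficient is then computed on the ranks. The object name `SpearmanSketch` is hypothetical and the p-value is not computed.

```scala
object SpearmanSketch {
  /** 1-based ranks; tied values receive the average of their ranks. */
  def ranks(a: Array[Double]): Array[Double] = {
    val order = a.indices.sortBy(a(_))
    val r = new Array[Double](a.length)
    var i = 0
    while (i < order.length) {
      var j = i
      while (j + 1 < order.length && a(order(j + 1)) == a(order(i))) j += 1
      val avg = (i + j) / 2.0 + 1.0
      for (k <- i to j) r(order(k)) = avg
      i = j + 1
    }
    r
  }

  /** Pearson correlation coefficient. */
  def pearson(x: Array[Double], y: Array[Double]): Double = {
    val mx = x.sum / x.length
    val my = y.sum / y.length
    var sxy = 0.0; var sxx = 0.0; var syy = 0.0
    for (i <- x.indices) {
      val dx = x(i) - mx
      val dy = y(i) - my
      sxy += dx * dy; sxx += dx * dx; syy += dy * dy
    }
    sxy / math.sqrt(sxx * syy)
  }

  /** Spearman coefficient: Pearson on the ranks. */
  def spearman(x: Array[Double], y: Array[Double]): Double =
    pearson(ranks(x), ranks(y))
}
```

    A monotone but nonlinear relationship such as y = x^2 on positive x yields a Spearman coefficient of 1 even though the Pearson coefficient on the raw values is below 1.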

  89. def sqrt(x: MatrixExpression): SqrtMatrix
  90. def sqrt(x: VectorExpression): SqrtVector
  91. def svd(A: IMatrix, k: Int): SVD

    Returns the singular vectors corresponding to the k largest singular values.

  92. def svd(A: MatrixExpression): SVD

    Singular value decomposition.

  93. def svd(A: Matrix): SVD

    Singular value decomposition.

  94. def svd(A: Array[Array[Double]]): SVD

    Singular value decomposition.

  95. def tan(x: MatrixExpression): TanMatrix
  96. def tan(x: VectorExpression): TanVector
  97. def tanh(x: MatrixExpression): TanhMatrix
  98. def tanh(x: VectorExpression): TanhVector
  99. def trace(A: Matrix): Double

    Returns the trace of matrix.

  100. def ttest(x: Array[Double], y: Array[Double]): TTest

    Given the paired arrays x and y, test if they have significantly different means. Small values of p-value indicate that the two arrays have significantly different means.

  101. def ttest(x: Array[Double], mean: Double): TTest

    Independent one-sample t-test of whether the mean of a normally distributed population equals the value specified in the null hypothesis. A small p-value indicates that the mean of the array differs significantly from that value.

  102. def ttest2(x: Array[Double], y: Array[Double], equalVariance: Boolean = false): TTest

    Test if the arrays x and y have significantly different means. Small values of p-value indicate that the two arrays have significantly different means.

    equalVariance

    true if the data arrays are assumed to be drawn from populations with the same true variance; otherwise, the data arrays are allowed to be drawn from populations with unequal variances.
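    The effect of equalVariance can be illustrated with the two standard formulas: the pooled-variance (Student) statistic when the variances are assumed equal, and the Welch statistic with Welch-Satterthwaite degrees of freedom otherwise. That the library uses these exact formulas is an assumption; the object name `TTest2Sketch` is hypothetical and the p-value step is omitted.

```scala
object TTest2Sketch {
  /** Returns (mean, unbiased sample variance). */
  private def meanVar(a: Array[Double]): (Double, Double) = {
    val m = a.sum / a.length
    (m, a.map(v => (v - m) * (v - m)).sum / (a.length - 1))
  }

  /** Returns (t statistic, degrees of freedom). */
  def statistic(x: Array[Double], y: Array[Double],
                equalVariance: Boolean = false): (Double, Double) = {
    val (mx, vx) = meanVar(x)
    val (my, vy) = meanVar(y)
    val nx = x.length.toDouble
    val ny = y.length.toDouble
    if (equalVariance) {
      // Student's t-test with pooled variance.
      val df = nx + ny - 2.0
      val pooled = ((nx - 1.0) * vx + (ny - 1.0) * vy) / df
      ((mx - my) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny)), df)
    } else {
      // Welch's t-test with Welch-Satterthwaite degrees of freedom.
      val sx = vx / nx
      val sy = vy / ny
      val df = (sx + sy) * (sx + sy) /
        (sx * sx / (nx - 1.0) + sy * sy / (ny - 1.0))
      ((mx - my) / math.sqrt(sx + sy), df)
    }
  }
}
```

    With equal sample sizes the two statistics coincide and only the degrees of freedom differ; with unequal sizes and variances both values generally change.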

  103. implicit def vectorExpression2Array(exp: VectorExpression): Array[Double]
  104. def zeros(m: Int, n: Int): Matrix

    Returns an m-by-n zero matrix.

  105. def zeros(n: Int): Matrix

    Returns an n-by-n zero matrix.
