Package smile.data

Class SparseDataset<T>

java.lang.Object
smile.data.SimpleDataset<SparseArray,T>
smile.data.SparseDataset<T>
Type Parameters:
T - the target type.
All Implemented Interfaces:
Iterable<SampleInstance<SparseArray,T>>, Dataset<SparseArray,T>

public class SparseDataset<T> extends SimpleDataset<SparseArray,T>
List of Lists sparse matrix format. LIL stores one list per row, where each entry stores a column index and value. Typically, these entries are kept sorted by column index for faster lookup. This format is good for incremental matrix construction.

LIL is typically used only to construct the matrix. Once constructed, the matrix is usually converted to a format that is more efficient for matrix operations, such as the Harwell-Boeing column-compressed sparse matrix format.
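The List of Lists layout described above can be sketched in plain Java. This is a minimal illustration, not Smile's implementation: the class name LilSketch, its Entry type, and the set method are hypothetical names introduced here, and only the standard library is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative List-of-Lists (LIL) store; hypothetical, not part of Smile. */
final class LilSketch {
    /** One stored entry: a column index and its value. */
    static final class Entry {
        final int col;
        final double value;
        Entry(int col, double value) { this.col = col; this.value = value; }
    }

    /** One list of entries per row, each kept sorted by column index. */
    private final List<List<Entry>> rows = new ArrayList<>();

    /** Sets entry (i, j), growing the row list on demand. */
    void set(int i, int j, double v) {
        while (rows.size() <= i) rows.add(new ArrayList<>());
        List<Entry> row = rows.get(i);
        int k = 0;
        while (k < row.size() && row.get(k).col < j) k++;
        if (k < row.size() && row.get(k).col == j) {
            row.set(k, new Entry(j, v));   // overwrite the existing entry
        } else {
            row.add(k, new Entry(j, v));   // insert at the sorted position
        }
    }

    /** Returns the value at (i, j), or 0.0 when no entry is stored. */
    double get(int i, int j) {
        if (i < 0 || i >= rows.size()) return 0.0;
        for (Entry e : rows.get(i)) {
            if (e.col == j) return e.value;
            if (e.col > j) break;          // sorted, so j cannot appear later
        }
        return 0.0;
    }

    /** Returns the total number of stored entries. */
    int nz() {
        int n = 0;
        for (List<Entry> row : rows) n += row.size();
        return n;
    }
}
```

Appending or overwriting an entry touches only one row's list, which is why the layout suits incremental construction; column-oriented operations are slow because every row must be scanned.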

  • Constructor Details

  • Method Details

    • nz

      public int nz()
      Returns the number of nonzero entries.
      Returns:
      the number of nonzero entries.
    • nz

      public int nz(int j)
      Returns the number of nonzero entries in column j.
      Parameters:
      j - the column index.
      Returns:
      the number of nonzero entries in column j.
    • nrow

      public int nrow()
      Returns the number of rows.
      Returns:
      the number of rows.
    • ncol

      public int ncol()
      Returns the number of columns.
      Returns:
      the number of columns.
    • get

      public double get(int i, int j)
      Returns the value at entry (i, j).
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
    • unitize

      public void unitize()
      Unitizes each row so that its L2 norm is 1.
    • unitize1

      public void unitize1()
      Unitizes each row so that its L1 norm is 1.
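As a rough illustration of what the two unitize methods compute for a single row, the sketch below scales the stored values of one row by its L2 or L1 norm. The class UnitizeSketch is hypothetical and works on a plain double[] rather than a SparseArray; leaving an all-zero row unchanged is an assumption of this sketch, not documented behavior.

```java
/** Illustrative row normalization; hypothetical, not part of Smile. */
final class UnitizeSketch {
    /** Scales x in place so that sqrt(sum of x[k]^2) is 1. */
    static void unitize(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v * v;
        double norm = Math.sqrt(sum);
        if (norm == 0.0) return;           // assumption: zero rows are left as-is
        for (int k = 0; k < x.length; k++) x[k] /= norm;
    }

    /** Scales x in place so that sum of |x[k]| is 1. */
    static void unitize1(double[] x) {
        double norm = 0.0;
        for (double v : x) norm += Math.abs(v);
        if (norm == 0.0) return;           // assumption: zero rows are left as-is
        for (int k = 0; k < x.length; k++) x[k] /= norm;
    }
}
```

Because zero entries contribute nothing to either norm, only the stored nonzero values of a row need to be visited.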
    • toMatrix

      public SparseMatrix toMatrix()
      Converts the dataset into the Harwell-Boeing column-compressed sparse matrix format.
      Returns:
      the sparse matrix.
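The row-list to column-compressed conversion can be sketched as a counting pass followed by a scatter pass. This is an illustration under stated assumptions, not Smile's code: CscSketch, fromRows, and the array names colPtr, rowIdx, and values are hypothetical, and each row is given as parallel arrays of column indices and values.

```java
/** Illustrative column-compressed (CSC) container; hypothetical, not part of Smile. */
final class CscSketch {
    final int[] colPtr;    // colPtr[j]..colPtr[j+1]-1 index the entries of column j
    final int[] rowIdx;    // row index of each stored entry
    final double[] values; // value of each stored entry

    private CscSketch(int[] colPtr, int[] rowIdx, double[] values) {
        this.colPtr = colPtr;
        this.rowIdx = rowIdx;
        this.values = values;
    }

    /** Builds CSC arrays from per-row column indices and values. */
    static CscSketch fromRows(int[][] cols, double[][] vals, int ncol) {
        int[] colPtr = new int[ncol + 1];
        // Pass 1: count the entries in each column.
        for (int[] row : cols) {
            for (int j : row) colPtr[j + 1]++;
        }
        // Prefix sum turns the counts into column start offsets.
        for (int j = 0; j < ncol; j++) colPtr[j + 1] += colPtr[j];

        int nz = colPtr[ncol];
        int[] rowIdx = new int[nz];
        double[] values = new double[nz];
        int[] next = colPtr.clone();       // next free slot in each column

        // Pass 2: scatter entries; rows are visited in order, so the
        // row indices inside each column come out sorted.
        for (int i = 0; i < cols.length; i++) {
            for (int k = 0; k < cols[i].length; k++) {
                int pos = next[cols[i][k]]++;
                rowIdx[pos] = i;
                values[pos] = vals[i][k];
            }
        }
        return new CscSketch(colPtr, rowIdx, values);
    }
}
```

In this layout the number of nonzero entries in column j is colPtr[j + 1] - colPtr[j], which makes per-column queries constant time.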
    • of

      public static SparseDataset<Void> of(SparseArray[] data)
      Returns a default implementation of SparseDataset without targets.
      Parameters:
      data - sparse arrays.
      Returns:
      the sparse dataset.
    • of

      public static SparseDataset<Void> of(SparseArray[] data, int ncol)
      Returns a default implementation of SparseDataset without targets.
      Parameters:
      data - sparse arrays.
      ncol - the number of columns.
      Returns:
      the sparse dataset.
    • from

      public static SparseDataset<Void> from(Path path) throws IOException, ParseException
      Parses a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples.
      Parameters:
      path - the input file path.
      Returns:
      the sparse dataset.
      Throws:
      IOException - when fails to read file.
      ParseException - when fails to parse data.
    • from

      public static SparseDataset<Void> from(Path path, int arrayIndexOrigin) throws IOException, ParseException
      Reads a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples:
       instanceID attributeID value
       instanceID attributeID value
       instanceID attributeID value
       instanceID attributeID value
       ...
       instanceID attributeID value
       instanceID attributeID value
       instanceID attributeID value
       
      Ideally, the entries are sorted (by row index, then column index) to improve random access times. This format is good for incremental matrix construction.

      In addition, there may be a header line

       D W N   // The number of rows, columns and nonzero entries.
       
      or 3 header lines
       D    // The number of rows
       W    // The number of columns
       N    // The total number of nonzero entries in the dataset.
       
      Parameters:
      path - the input file path.
      arrayIndexOrigin - the starting index of arrays. By default, it is 0 as in C/C++ and Java, but it can be set to 1 to parse data produced by other programming languages such as Fortran.
      Returns:
      the sparse dataset.
      Throws:
      IOException - if stream to file cannot be read or closed.
      ParseException - if an index is not an integer or the value is not a double.
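To make the file layout and the role of arrayIndexOrigin concrete, here is a small stand-alone parser sketch. It is not Smile's reader: CoordinateSketch and parse are hypothetical names, the input is a list of lines rather than a Path, and only the three-line header variant is recognized (a first line holding a single token). The one-line "D W N" header is deliberately not handled here because it looks like a data triple.

```java
import java.text.ParseException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative coordinate-triple parser; hypothetical, not part of Smile. */
final class CoordinateSketch {
    /** Returns row index -> (column index -> value), both zero-based and sorted. */
    static Map<Integer, Map<Integer, Double>> parse(List<String> lines, int arrayIndexOrigin)
            throws ParseException {
        Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();
        int start = 0;
        if (!lines.isEmpty()) {
            String first = lines.get(0).trim();
            // Assumption: a single token on the first line means the
            // three-line header (D, W, N), which this sketch skips.
            if (!first.isEmpty() && first.split("\\s+").length == 1) start = 3;
        }
        for (int n = start; n < lines.size(); n++) {
            String line = lines.get(n).trim();
            if (line.isEmpty()) continue;
            String[] t = line.split("\\s+");
            // The ParseException offset carries the line number in this sketch.
            if (t.length != 3) throw new ParseException("Expected 3 fields: " + line, n);
            try {
                // Shift indices so that stored rows and columns start at 0.
                int i = Integer.parseInt(t[0]) - arrayIndexOrigin;
                int j = Integer.parseInt(t[1]) - arrayIndexOrigin;
                double v = Double.parseDouble(t[2]);
                rows.computeIfAbsent(i, key -> new TreeMap<>()).put(j, v);
            } catch (NumberFormatException e) {
                throw new ParseException("Invalid number: " + line, n);
            }
        }
        return rows;
    }
}
```

With arrayIndexOrigin set to 1, a line such as "1 3 2.0" ends up at zero-based row 0, column 2; with the default of 0 the same line would map to row 1, column 3.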