Package smile.data

Interface SparseDataset<T>

Type Parameters:
T - the target type.
All Superinterfaces:
Dataset<SparseArray,T>, Iterable<SampleInstance<SparseArray,T>>

public interface SparseDataset<T> extends Dataset<SparseArray,T>
List of Lists sparse matrix format. LIL stores one list per row, where each entry stores a column index and value. Typically, these entries are kept sorted by column index for faster lookup. This format is good for incremental matrix construction.

LIL is typically used to construct the matrix. Once the matrix is constructed, it is typically converted to a format, such as Harwell-Boeing column-compressed sparse matrix format, which is more efficient for matrix operations.
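As an illustration of the list-of-lists idea described above, here is a minimal sketch that keeps one sorted map per row, keyed by column index. The class `LilSketch` and its methods are hypothetical names chosen for this example. It is not the implementation behind SparseDataset and uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical LIL container for illustration; not part of Smile. */
final class LilSketch {
    // One sorted map per row: column index -> nonzero value.
    private final List<TreeMap<Integer, Double>> rows = new ArrayList<>();
    private int ncol = 0;

    /** Sets entry (i, j), growing the row list as needed. Zeros are not stored. */
    void set(int i, int j, double value) {
        while (rows.size() <= i) {
            rows.add(new TreeMap<>());
        }
        if (value == 0.0) {
            rows.get(i).remove(j);
        } else {
            rows.get(i).put(j, value);
        }
        ncol = Math.max(ncol, j + 1);
    }

    /** Returns the value at (i, j), or 0 if no entry is stored there. */
    double get(int i, int j) {
        if (i < 0 || i >= rows.size()) {
            return 0.0;
        }
        return rows.get(i).getOrDefault(j, 0.0);
    }

    /** Returns the total number of stored (nonzero) entries. */
    int nz() {
        int n = 0;
        for (TreeMap<Integer, Double> row : rows) {
            n += row.size();
        }
        return n;
    }

    int nrow() { return rows.size(); }

    int ncol() { return ncol; }

    public static void main(String[] args) {
        LilSketch m = new LilSketch();
        m.set(0, 2, 1.5);   // entries may arrive in any order
        m.set(2, 0, -3.0);
        m.set(0, 0, 4.0);
        System.out.println(m.nrow() + " x " + m.ncol() + ", nz = " + m.nz());
    }
}
```

Because each row is an independent sorted map, entries can be inserted in any order at low cost, which is why this layout suits incremental construction better than a compressed format.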

  • Method Details

    • nz

      int nz()
      Returns the number of nonzero entries.
      Returns:
      the number of nonzero entries.
    • nz

      int nz(int j)
      Returns the number of nonzero entries in column j.
      Parameters:
      j - the column index.
      Returns:
      the number of nonzero entries in column j.
    • nrow

      default int nrow()
      Returns the number of rows.
      Returns:
      the number of rows.
    • ncol

      int ncol()
      Returns the number of columns.
      Returns:
      the number of columns.
    • get

      default double get(int i, int j)
      Returns the value at entry (i, j).
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
    • unitize

      default void unitize()
      Unitizes each row so that its L2 norm is 1.
    • unitize1

      default void unitize1()
      Unitizes each row so that its L1 norm is 1.
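The two normalizations can be sketched on a single row's stored values. Only the nonzero values matter, since zeros contribute nothing to either norm. The class `UnitizeSketch` is a hypothetical helper for illustration, and leaving an all-zero row unchanged is an assumption of this sketch, not documented behavior of SparseDataset.

```java
/** Hypothetical helper for illustration; not part of Smile. */
final class UnitizeSketch {
    /** Scales the values in place so that their L2 norm is 1. */
    static void unitize(double[] values) {
        double sumOfSquares = 0.0;
        for (double v : values) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            return; // assumption: an all-zero row is left as is
        }
        for (int k = 0; k < values.length; k++) {
            values[k] /= norm;
        }
    }

    /** Scales the values in place so that their L1 norm is 1. */
    static void unitize1(double[] values) {
        double norm = 0.0;
        for (double v : values) {
            norm += Math.abs(v);
        }
        if (norm == 0.0) {
            return; // assumption: an all-zero row is left as is
        }
        for (int k = 0; k < values.length; k++) {
            values[k] /= norm;
        }
    }

    public static void main(String[] args) {
        double[] a = {3.0, 4.0};
        unitize(a);   // L2 norm of (3, 4) is 5
        double[] b = {1.0, -3.0};
        unitize1(b);  // L1 norm of (1, -3) is 4
        System.out.println(java.util.Arrays.toString(a));
        System.out.println(java.util.Arrays.toString(b));
    }
}
```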
    • toMatrix

      default SparseMatrix toMatrix()
      Converts the dataset into Harwell-Boeing column-compressed sparse matrix format.
      Returns:
      the sparse matrix.
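To make the target layout concrete, the following sketch converts row-wise (column index, value) lists into the three arrays of a column-compressed matrix: column pointers, row indices, and values. The class `CscSketch` and its field names are hypothetical. This shows the general technique, not the code behind toMatrix().

```java
import java.util.Arrays;

/** Hypothetical column-compressed container for illustration; not part of Smile. */
final class CscSketch {
    final int[] colPtr;   // colPtr[j]..colPtr[j+1]-1 index the entries of column j
    final int[] rowIdx;   // row index of each stored entry
    final double[] vals;  // value of each stored entry

    private CscSketch(int[] colPtr, int[] rowIdx, double[] vals) {
        this.colPtr = colPtr;
        this.rowIdx = rowIdx;
        this.vals = vals;
    }

    /**
     * Builds the compressed arrays from row-wise data.
     * cols[i][k] is the column of the k-th entry in row i, values[i][k] its value.
     */
    static CscSketch fromRows(int[][] cols, double[][] values, int ncol) {
        // Pass 1: count the entries in each column.
        int[] colPtr = new int[ncol + 1];
        int nz = 0;
        for (int[] row : cols) {
            for (int j : row) {
                colPtr[j + 1]++;
                nz++;
            }
        }
        // Prefix sum turns the counts into column start offsets.
        for (int j = 0; j < ncol; j++) {
            colPtr[j + 1] += colPtr[j];
        }
        // Pass 2: scatter each entry into the next free slot of its column.
        int[] rowIdx = new int[nz];
        double[] vals = new double[nz];
        int[] next = Arrays.copyOf(colPtr, ncol);
        for (int i = 0; i < cols.length; i++) {
            for (int k = 0; k < cols[i].length; k++) {
                int pos = next[cols[i][k]]++;
                rowIdx[pos] = i;
                vals[pos] = values[i][k];
            }
        }
        return new CscSketch(colPtr, rowIdx, vals);
    }

    public static void main(String[] args) {
        int[][] cols = {{0, 2}, {1}, {0, 2}};
        double[][] values = {{1, 2}, {3}, {4, 5}};
        CscSketch m = fromRows(cols, values, 3);
        System.out.println(Arrays.toString(m.colPtr));
        System.out.println(Arrays.toString(m.rowIdx));
        System.out.println(Arrays.toString(m.vals));
    }
}
```

Visiting rows in increasing order means the row indices within each column come out sorted without an extra sort step.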
    • of

      static <T> SparseDataset<T> of(Collection<SampleInstance<SparseArray,T>> data)
      Returns a default implementation of SparseDataset.
      Type Parameters:
      T - the target type.
      Parameters:
      data - the sample instances (sparse arrays with their targets).
      Returns:
      the sparse dataset.
    • of

      static <T> SparseDataset<T> of(Collection<SampleInstance<SparseArray,T>> data, int ncol)
      Returns a default implementation of SparseDataset.
      Type Parameters:
      T - the target type.
      Parameters:
      data - the sample instances (sparse arrays with their targets).
      ncol - the number of columns.
      Returns:
      the sparse dataset.
    • of

      static SparseDataset<Void> of(SparseArray[] data)
      Returns a default implementation of SparseDataset without targets.
      Parameters:
      data - sparse arrays.
      Returns:
      the sparse dataset.
    • of

      static SparseDataset<Void> of(SparseArray[] data, int ncol)
      Returns a default implementation of SparseDataset without targets.
      Parameters:
      data - sparse arrays.
      ncol - the number of columns.
      Returns:
      the sparse dataset.
    • of

      static SparseDataset<Void> of(Stream<SparseArray> data)
      Returns a default implementation of SparseDataset.
      Parameters:
      data - sparse arrays.
      Returns:
      the sparse dataset.
    • from

      static SparseDataset<Void> from(Path path) throws IOException, ParseException
      Parses a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples.
      Parameters:
      path - the input file path.
      Returns:
      the sparse dataset.
      Throws:
      IOException - if the file cannot be read.
      ParseException - if the data cannot be parsed.
    • from

      static SparseDataset<Void> from(Path path, int arrayIndexOrigin) throws IOException, ParseException
      Reads a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples:
       instanceID attributeID value
       instanceID attributeID value
       instanceID attributeID value
       instanceID attributeID value
       ...
       instanceID attributeID value
       instanceID attributeID value
       instanceID attributeID value
       
      Ideally, the entries are sorted (by row index, then column index) to improve random access times. This format is good for incremental matrix construction.

      In addition, there may be a single header line

       D W N   // The number of rows, columns and nonzero entries.
       
      or 3 header lines
       D    // The number of rows
       W    // The number of columns
       N    // The total number of nonzero entries in the dataset.
       
      Parameters:
      path - the input file path.
      arrayIndexOrigin - the starting index of arrays. The default is 0, as in C/C++ and Java, but it may be 1 to parse data produced by languages such as Fortran.
      Returns:
      the sparse dataset.
      Throws:
      IOException - if the file stream cannot be read or closed.
      ParseException - if an index is not an integer or the value is not a double.
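The role of arrayIndexOrigin can be sketched with a small parser for (row, column, value) lines. The class `CoordinateParserSketch` is hypothetical and simplified: it reads lines from memory rather than a Path, assumes no header lines, and throws IllegalArgumentException rather than ParseException. It is not the parser used by from().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical coordinate-triple parser for illustration; not part of Smile. */
final class CoordinateParserSketch {
    /**
     * Parses "row column value" lines into one sorted map per row.
     * Indices in the input start at arrayIndexOrigin and are shifted to 0-based.
     * Assumption: the input has no header lines.
     */
    static List<TreeMap<Integer, Double>> parse(List<String> lines, int arrayIndexOrigin) {
        List<TreeMap<Integer, Double>> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            if (tokens.length != 3) {
                throw new IllegalArgumentException("Expected 3 tokens: " + line);
            }
            // NumberFormatException (an IllegalArgumentException) signals bad numbers.
            int i = Integer.parseInt(tokens[0]) - arrayIndexOrigin;
            int j = Integer.parseInt(tokens[1]) - arrayIndexOrigin;
            double value = Double.parseDouble(tokens[2]);
            if (i < 0 || j < 0) {
                throw new IllegalArgumentException("Index below origin: " + line);
            }
            while (rows.size() <= i) {
                rows.add(new TreeMap<>());
            }
            rows.get(i).put(j, value);
        }
        return rows;
    }

    public static void main(String[] args) {
        // 1-based indices, as a Fortran program might write them.
        List<String> lines = List.of("1 1 2.5", "1 3 -1.0", "3 2 4.0");
        System.out.println(parse(lines, 1));
    }
}
```

With the API documented here, the corresponding call for a 1-based file would pass 1 as the second argument of from(Path, int).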