Package smile.data
Class SparseDataset<T>
- Type Parameters:
  T - the target type.
- All Implemented Interfaces:
  Iterable<SampleInstance<SparseArray,T>>, Dataset<SparseArray,T>
List of Lists (LIL) sparse matrix format. LIL stores one list per row, where each entry holds a column index and a value. The entries are usually kept sorted by column index for faster lookup.
This format is well suited to incremental matrix construction.
LIL is typically used only to build the matrix. Once construction is finished, the matrix is usually converted to a format that is more efficient for matrix operations, such as the Harwell-Boeing column-compressed sparse matrix format.
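A minimal sketch of the LIL layout described above, written in plain Java rather than against Smile's own classes. `LilRow` and `LilMatrix` are hypothetical names; each row keeps parallel lists of column indices and values sorted by column, so `set` can locate its insertion point with a binary search and entries can arrive in any order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical LIL row: column indices kept sorted, values stored in parallel. */
final class LilRow {
    final List<Integer> cols = new ArrayList<>();
    final List<Double> vals = new ArrayList<>();

    /** Sets (col, value), overwriting an existing entry or inserting in sorted position. */
    void set(int col, double value) {
        int k = Collections.binarySearch(cols, col);
        if (k >= 0) {
            vals.set(k, value);          // column already present: overwrite its value
        } else {
            int at = -k - 1;             // insertion point that keeps cols sorted
            cols.add(at, col);
            vals.add(at, value);
        }
    }
}

/** Hypothetical LIL matrix: one LilRow per row, grown on demand. */
final class LilMatrix {
    final List<LilRow> rows = new ArrayList<>();
    int ncol = 0;

    void set(int i, int j, double value) {
        while (rows.size() <= i) {
            rows.add(new LilRow());      // append empty rows up to row i
        }
        rows.get(i).set(j, value);
        ncol = Math.max(ncol, j + 1);    // track the widest column seen so far
    }

    int nrow() {
        return rows.size();
    }
}
```

For example, calling `set(0, 3, 1.5)` and then `set(0, 1, 2.0)` leaves row 0 with columns `[1, 3]`, the sorted order that fast lookup relies on.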
Constructor Summary
- SparseDataset(Collection<SampleInstance<SparseArray, T>> data)
  Constructor.
- SparseDataset(Collection<SampleInstance<SparseArray, T>> data, int ncol)
  Constructor.
Method Summary
- static SparseDataset<Void> from(Path path)
  Parses a sparse dataset in coordinate triple tuple list format.
- static SparseDataset<Void> from(Path path, int arrayIndexOrigin)
  Reads a sparse dataset in coordinate triple tuple list format.
- double get(int i, int j)
  Returns the value at entry (i, j).
- int ncol()
  Returns the number of columns.
- int nrow()
  Returns the number of rows.
- int nz()
  Returns the number of nonzero entries.
- int nz(int j)
  Returns the number of nonzero entries in column j.
- static SparseDataset<Void> of(SparseArray[] data)
  Returns a default implementation of SparseDataset without targets.
- static SparseDataset<Void> of(SparseArray[] data, int ncol)
  Returns a default implementation of SparseDataset without targets.
- toMatrix()
  Converts the dataset into Harwell-Boeing column-compressed sparse matrix format.
- void unitize()
  Normalizes each row so that its L2 norm is 1.
- void unitize1()
  Normalizes each row so that its L1 norm is 1.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator

Constructor Details

SparseDataset
Constructor.
- Parameters:
  data - the sample instances.

SparseDataset
Constructor.
- Parameters:
  data - the sample instances.
  ncol - the number of columns.

Method Details

nz
public int nz()
Returns the number of nonzero entries.
- Returns:
  the number of nonzero entries.

nz
public int nz(int j)
Returns the number of nonzero entries in column j.
- Parameters:
  j - the column index.
- Returns:
  the number of nonzero entries in column j.
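One way to compute both counts, assuming (hypothetically) that each row is stored as an `int[]` of column indices with no duplicate column within a row: `nz()` is the total number of stored entries, and `nz(j)` for every column can be tallied in a single pass over the rows. `NonzeroCounts`, `total`, and `columnCounts` are illustrative names, not part of the SparseDataset API.

```java
/** Hypothetical helpers over rows stored as arrays of column indices. */
final class NonzeroCounts {
    /** Total number of stored (nonzero) entries across all rows, as nz() reports. */
    static int total(int[][] rowIndices) {
        int nz = 0;
        for (int[] row : rowIndices) {
            nz += row.length;
        }
        return nz;
    }

    /** counts[j] is the number of nonzero entries in column j, as nz(j) reports. */
    static int[] columnCounts(int[][] rowIndices, int ncol) {
        int[] counts = new int[ncol];
        for (int[] row : rowIndices) {
            for (int j : row) {
                counts[j]++;             // each stored index contributes one entry to its column
            }
        }
        return counts;
    }
}
```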

nrow
public int nrow()
Returns the number of rows.
- Returns:
  the number of rows.

ncol
public int ncol()
Returns the number of columns.
- Returns:
  the number of columns.

get
public double get(int i, int j)
Returns the value at entry (i, j).
- Parameters:
  i - the row index.
  j - the column index.
- Returns:
  the cell value.
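Because LIL entries are usually kept sorted by column index, looking up entry (i, j) can be a binary search within row i. This sketch assumes (hypothetically) that each row is stored as a sorted `int[]` of column indices with a parallel `double[]` of values; `SparseLookup` is an illustrative name, and entries that are not stored read as zero.

```java
import java.util.Arrays;

/** Hypothetical lookup over rows stored as sorted index arrays with parallel value arrays. */
final class SparseLookup {
    static double get(int[][] index, double[][] value, int i, int j) {
        int k = Arrays.binarySearch(index[i], j);   // O(log n) in the number of entries of row i
        return k >= 0 ? value[i][k] : 0.0;          // entries not stored are implicit zeros
    }
}
```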

unitize
public void unitize()
Normalizes each row so that its L2 norm is 1.

unitize1
public void unitize1()
Normalizes each row so that its L1 norm is 1.
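A sketch of what the two normalizations compute for a single row, assuming its stored values sit in a `double[]`: only stored values contribute to either norm, since implicit zeros add nothing. `RowNorms` is a hypothetical name, and leaving an all-zero row unchanged is an assumption of this sketch rather than documented behavior of `unitize` or `unitize1`.

```java
/** Hypothetical in-place normalizations over the stored (nonzero) values of one row. */
final class RowNorms {
    /** Scales values so that the L2 norm, sqrt(sum of x^2), equals 1. */
    static void unitize(double[] values) {
        double sum = 0.0;
        for (double x : values) {
            sum += x * x;
        }
        scale(values, Math.sqrt(sum));
    }

    /** Scales values so that the L1 norm, sum of |x|, equals 1. */
    static void unitize1(double[] values) {
        double sum = 0.0;
        for (double x : values) {
            sum += Math.abs(x);
        }
        scale(values, sum);
    }

    private static void scale(double[] values, double norm) {
        if (norm == 0.0) {
            return;                      // assumption: leave an all-zero row as is
        }
        for (int k = 0; k < values.length; k++) {
            values[k] /= norm;
        }
    }
}
```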

toMatrix
Converts the dataset into Harwell-Boeing column-compressed sparse matrix format.
- Returns:
  the sparse matrix.
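The conversion from row lists to a column-compressed layout can be sketched in two passes: count the nonzeros per column to build the column pointer array, then scatter each (row, value) pair into its column's slots. This is the standard compressed sparse column (CSC) construction, the storage scheme behind the Harwell-Boeing column-compressed format, shown in plain Java. `CscMatrix`, `fromRows`, and the array names are hypothetical, and the input is assumed to be rows with parallel index and value arrays. Scanning rows in increasing order leaves the row indices within each column sorted.

```java
import java.util.Arrays;

/** Hypothetical column-compressed storage: column j occupies [colPtr[j], colPtr[j + 1]). */
final class CscMatrix {
    final int nrow;
    final int ncol;
    final int[] colPtr;
    final int[] rowIndex;
    final double[] values;

    private CscMatrix(int nrow, int ncol, int[] colPtr, int[] rowIndex, double[] values) {
        this.nrow = nrow;
        this.ncol = ncol;
        this.colPtr = colPtr;
        this.rowIndex = rowIndex;
        this.values = values;
    }

    static CscMatrix fromRows(int[][] index, double[][] value, int ncol) {
        int nrow = index.length;
        // Pass 1: count nonzeros per column, shifted by one to prepare the prefix sum.
        int[] colPtr = new int[ncol + 1];
        for (int[] row : index) {
            for (int j : row) {
                colPtr[j + 1]++;
            }
        }
        for (int j = 0; j < ncol; j++) {
            colPtr[j + 1] += colPtr[j];           // prefix sum: start offset of each column
        }
        int nz = colPtr[ncol];
        int[] rowIndex = new int[nz];
        double[] values = new double[nz];
        // Pass 2: scatter entries; next[j] is the next free slot in column j.
        int[] next = Arrays.copyOf(colPtr, ncol);
        for (int i = 0; i < nrow; i++) {
            for (int k = 0; k < index[i].length; k++) {
                int slot = next[index[i][k]]++;
                rowIndex[slot] = i;
                values[slot] = value[i][k];
            }
        }
        return new CscMatrix(nrow, ncol, colPtr, rowIndex, values);
    }
}
```

For rows with columns `{0, 2}`, `{2}`, and `{1}`, the column pointers come out as `[0, 1, 2, 4]`.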

of
public static SparseDataset<Void> of(SparseArray[] data)
Returns a default implementation of SparseDataset without targets.
- Parameters:
  data - the sparse arrays.
- Returns:
  the sparse dataset.

of
public static SparseDataset<Void> of(SparseArray[] data, int ncol)
Returns a default implementation of SparseDataset without targets.
- Parameters:
  data - the sparse arrays.
  ncol - the number of columns.
- Returns:
  the sparse dataset.

from
public static SparseDataset<Void> from(Path path) throws IOException, ParseException
Parses a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples.
- Parameters:
  path - the input file path.
- Returns:
  the sparse dataset.
- Throws:
  IOException - if the file cannot be read.
  ParseException - if the data cannot be parsed.

from
public static SparseDataset<Void> from(Path path, int arrayIndexOrigin) throws IOException, ParseException
Reads a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples, one per line:
    instanceID attributeID value
    instanceID attributeID value
    instanceID attributeID value
    ...
    instanceID attributeID value
Ideally, the entries are sorted by row index and then by column index to improve random access times. This format is good for incremental matrix construction.
In addition, the file may start with a single header line
    D W N    // the number of rows, columns, and nonzero entries
or with three header lines
    D    // the number of rows
    W    // the number of columns
    N    // the total number of nonzero entries in the dataset
- Parameters:
  path - the input file path.
  arrayIndexOrigin - the starting index of arrays. By default it is 0, as in C/C++ and Java, but it can be set to 1 to parse data produced by other programming languages such as Fortran.
- Returns:
  the sparse dataset.
- Throws:
  IOException - if the stream to the file cannot be read or closed.
  ParseException - if an index is not an integer or a value is not a double.
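A hedged sketch of parsing the coordinate format above; it is not Smile's implementation. `Coo`, `CoordinateParser`, and the `headerLines` parameter are hypothetical, and the sketch takes the header layout (0, 1, or 3 lines) as an explicit argument instead of detecting it, because a one-line `D W N` header has the same shape as a data tuple. Indices are shifted by `arrayIndexOrigin` so that a 1-based file yields 0-based rows and columns; without a header, the dimensions are taken as the largest index plus one. The line number is passed as the `ParseException` error offset, which is a convention of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical result: zero-based coordinate triples plus the matrix dimensions. */
final class Coo {
    final List<int[]> cells = new ArrayList<>();    // each entry is {row, col}
    final List<Double> values = new ArrayList<>();
    int nrow;
    int ncol;
}

final class CoordinateParser {
    /** headerLines: 0 (no header), 1 ("D W N"), or 3 (D, W, N on separate lines). */
    static Coo parse(Reader in, int arrayIndexOrigin, int headerLines)
            throws IOException, ParseException {
        Coo coo = new Coo();
        List<String> header = new ArrayList<>();
        int headerSeen = 0;
        int lineNo = 0;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            line = line.trim();
            if (line.isEmpty()) {
                continue;                                   // tolerate blank lines
            }
            String[] t = line.split("\\s+");
            if (headerSeen < headerLines) {
                header.addAll(Arrays.asList(t));            // collect the D, W, N tokens
                headerSeen++;
                continue;
            }
            if (t.length != 3) {
                throw new ParseException("line " + lineNo + ": expected 3 fields", lineNo);
            }
            int i = parseInt(t[0], lineNo) - arrayIndexOrigin;   // shift to 0-based
            int j = parseInt(t[1], lineNo) - arrayIndexOrigin;
            if (i < 0 || j < 0) {
                throw new ParseException("line " + lineNo + ": index below origin", lineNo);
            }
            double x;
            try {
                x = Double.parseDouble(t[2]);
            } catch (NumberFormatException e) {
                throw new ParseException("line " + lineNo + ": value is not a double", lineNo);
            }
            coo.cells.add(new int[] {i, j});
            coo.values.add(x);
            coo.nrow = Math.max(coo.nrow, i + 1);           // without a header: max index + 1
            coo.ncol = Math.max(coo.ncol, j + 1);
        }
        if (headerLines > 0) {
            if (header.size() != 3) {
                throw new ParseException("header must contain D, W and N", 0);
            }
            coo.nrow = parseInt(header.get(0), 0);
            coo.ncol = parseInt(header.get(1), 0);
            if (parseInt(header.get(2), 0) != coo.values.size()) {
                throw new ParseException("header N does not match the entry count", 0);
            }
        }
        return coo;
    }

    private static int parseInt(String s, int lineNo) throws ParseException {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new ParseException("line " + lineNo + ": not an integer: " + s, lineNo);
        }
    }
}
```

For instance, with `arrayIndexOrigin = 1` the tuple `3 4 -2.0` is stored at zero-based position (2, 3).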