Value-Compressed Sparse Column (VCSC) Class Reference
- class PyVSparse.VCSC[source]
- _COOconstruct(moduleName, spmat)[source]
Private helper function to construct a VCSC matrix from a scipy.sparse COO matrix. In C++, the constructor expects std::tuple<indexT, indexT, T> for each non-zero element.
C++ declaration: template <typename T2, typename indexT2> VCSC(std::vector<std::tuple<indexT2, indexT2, T2>>& entries, uint64_t num_rows, uint32_t num_cols, uint32_t nnz);
- Parameters:
moduleName (str) – The name of the module to construct the VCSC matrix from
spmat (sp.sparse.coo_matrix) – The input matrix
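As a rough illustration of the triplet layout described above, the following pure-Python sketch collects the non-zero entries of a small dense matrix as tuples. The helper name `to_triplets` and the (row, column, value) ordering are assumptions made for illustration; this is not PyVSparse code.

```python
def to_triplets(dense):
    """Collect the non-zero entries of a dense list-of-lists as (row, col, value)."""
    entries = []
    for r, row in enumerate(dense):
        for c, value in enumerate(row):
            if value != 0:
                entries.append((r, c, value))
    return entries


dense = [
    [1, 0, 2],
    [0, 0, 3],
]
entries = to_triplets(dense)

# The C++ constructor quoted above also takes the dimensions and the non-zero count.
num_rows, num_cols, nnz = len(dense), len(dense[0]), len(entries)
print(entries, num_rows, num_cols, nnz)
```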
- _CSconstruct(moduleName, spmat)[source]
Private helper function to construct a VCSC matrix from a scipy.sparse CSC or CSR matrix. This uses the Eigen::SparseMatrix<T> constructor in C++. Pybind11 handles the conversion.
A pure Python implementation of a CSR/CSC matrix could be used to make a VCSC matrix, but that is not implemented at this time. The C++ backend should support it though.
- Parameters:
moduleName (str) – The name of the module to construct the VCSC matrix from
spmat (Union[sp.sparse.csc_matrix, sp.sparse.csr_matrix]) – The input matrix
- __eq__(other)[source]
Compares the matrix to another VCSC matrix
- Parameters:
other (VCSC) – The matrix to compare to
- Returns:
True if the matrices are equal, False otherwise
- Return type:
bool
- __getitem__(key)[source]
Random access operator for VCSC.
As of right now, this only supports random access of a single element.
- Parameters:
key (int) – The index of the element to access
- Returns:
The value of the element at the index
- Return type:
any
- __imul__(other)[source]
In-place multiplication of the matrix by a scalar
- Parameters:
other (Union[int, float]) – The value or object to multiply the matrix by
- Returns:
The matrix multiplied by the input
- Return type:
- Raises:
TypeError – If the input is not a scalar or numpy array
- __init__(spmat, indexType=numpy.uint32, order='col')[source]
Value-Compressed Sparse Column is a read-only sparse matrix format for redundant data without compromising speed. See README.md for more information.
While the name is indicative of the storage order, the matrix can be stored in either column-major or row-major order.
This class can be constructed from any of the following:
1.) scipy.sparse.csc_matrix
2.) scipy.sparse.csr_matrix
3.) scipy.sparse.coo_matrix
4.) PyVSparse.VCSC
5.) A .vcsc file (written by VCSC.write() or a C++ program that uses VCSC.write()).
The user can specify the index type of the matrix. The default is np.uint32. The Python version is limited to unsigned integers because there is no advantage to using signed integers. The choice of index type should not affect performance, correctness, or which features are available. The only difference should be the memory consumption of the matrix, unless you attempt to store more than the index type can represent, e.g. storing 256 rows/cols with a np.uint8 index type.
The user can also specify the storage order of the matrix. The default is “Col” for column-major order.
- Note: Because of how indices are stored in VCSC, cache misses are more common. For a very redundant matrix, VCSC will be just as fast as, and in some cases faster than, a CSC matrix because caching is possible. However, this is based on a naive implementation of matrix operations, so matrix multiplication will be faster for SciPy matrices that use BLAS.
VCSC is faster than IVCSC because indices are byte-aligned, but does not offer the same level of compression.
Coefficient-wise operations may be much faster because fewer values are stored.
- Parameters:
spmat (Union[sp.sparse.csc_matrix, sp.sparse.csr_matrix, sp.sparse.coo_matrix, PyVSparse.VCSC, str]) – The input matrix or .vcsc filename
indexType (np.dtype) – The index type of the matrix. The default is np.uint32
order (str) – The storage order of the matrix. The default is “Col” for column-major order. “Row” can also be specified for row-major order. Capitalization does not matter.
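The index-type limit mentioned above can be sketched in plain Python. The helper names `max_unsigned` and `smallest_index_bits` are hypothetical and not part of PyVSparse. Which quantity the library actually stores in the index type (largest index, dimension, or non-zero count) is not stated here, so the sketch only checks a single number against the range of each unsigned width.

```python
def max_unsigned(bits):
    """Largest value an unsigned integer of the given bit width can hold."""
    return (1 << bits) - 1


def smallest_index_bits(largest_value):
    """Pick the narrowest of 8/16/32/64 bits that can represent largest_value."""
    for bits in (8, 16, 32, 64):
        if largest_value <= max_unsigned(bits):
            return bits
    raise OverflowError("value does not fit in 64 bits")


print(max_unsigned(8))           # 255
print(smallest_index_bits(255))  # 8
print(smallest_index_bits(256))  # 16
```

A narrower index type only saves memory while every stored number still fits. Past that point a wider type is needed.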
- __iter__(index)[source]
Returns an iterator for the matrix
- Returns:
An iterator for the matrix
- Return type:
any
- __matmul__(other)[source]
Multiplication of the matrix by a dense matrix.
The matrix returned will be a dense numpy matrix or vector.
- Parameters:
other (Union[np.ndarray]) – The object to multiply the matrix by
- Returns:
The matrix multiplied by the input
- Return type:
union[np.ndarray]
- Raises:
TypeError – If the input is not a numpy array
- __mul__(other)[source]
Multiplication of the matrix by a scalar.
If the input is a scalar, then the matrix returned will be a VCSC matrix.
- Parameters:
other (Union[int, float]) – The value or object to multiply the matrix by
- Returns:
The matrix multiplied by the input
- Return type:
union[VCSC]
- Raises:
TypeError – If the input is not a scalar
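To illustrate why scalar multiplication can be cheap on value-compressed data, here is a minimal sketch that assumes a column is held as three lists: unique values, the count of each value, and the positions of each occurrence. That layout and the function name `scale_column` are assumptions for illustration, not the library's internals.

```python
def scale_column(values, counts, indices, scalar):
    """Return a scaled copy of a value-compressed column.

    Only the unique values are touched; counts and indices are copied as-is.
    """
    if not isinstance(scalar, (int, float)):
        raise TypeError("scalar must be an int or float")
    return [v * scalar for v in values], list(counts), list(indices)


# One column with three 3s (positions 0, 2, 5) and one 7 (position 3).
values, counts, indices = [3, 7], [3, 1], [0, 2, 5, 3]
scaled_values, scaled_counts, scaled_indices = scale_column(values, counts, indices, 2)
print(scaled_values)  # [6, 14]
```

Four stored entries are scaled with two multiplications, because each distinct value is stored once.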
- __ne__(other)[source]
Compares the matrix to another VCSC matrix
- Parameters:
other (VCSC) – The matrix to compare to
- Returns:
True if the matrices are not equal, False otherwise
- Return type:
bool
- __radd__(other)[source]
Coefficient-wise addition of the matrix and a dense matrix.
Because VCSC is read-only, the matrix returned will be a dense numpy matrix or vector. This is a right addition operation: A + VCSC
- Parameters:
other (Union[np.ndarray, int, float]) – The object to add to the matrix
- Returns:
The matrix added to the input
- Return type:
union[np.ndarray]
- Raises:
TypeError – If the input is not a numpy array, int, or float
- __rsub__(other)[source]
Coefficient-wise subtraction of the matrix from a dense matrix.
Because VCSC is read-only, the matrix returned will be a dense numpy matrix or vector. This is a right subtraction operation: A - VCSC
- Parameters:
other (Union[np.ndarray, int, float]) – The object to subtract from the matrix
- Returns:
The matrix subtracted from the input
- Return type:
union[np.ndarray]
- Raises:
TypeError – If the input is not a numpy array, int, or float
- __rtruediv__(other)[source]
Coefficient-wise division of the matrix by a dense matrix.
Because VCSC is read-only, the matrix returned will be a dense numpy matrix or vector. This is a right division operation: A / VCSC
- Parameters:
other (Union[int, float]) – The object to divide the matrix by
- Returns:
The matrix divided by the input
- Return type:
union[np.ndarray]
- Raises:
TypeError – If the input is not a scalar
- _npzConstruct(moduleName, secondary='csc')[source]
This is a wrapper function to construct a VCSC matrix from a .npz file. This still creates a scipy.sparse matrix, but allows for a convenient way to read the file.
- Parameters:
moduleName (str) – The npz file name
secondary (str) – The fallback format to convert the matrix to if the format stored in the npz file has no available constructor, e.g. BSR -> CSC. The default is “csc”
- Raises:
TypeError – If the secondary format is not a valid format.
- append(matrix)[source]
Appends a matrix to the current matrix
The appended matrix must be of the same type or a scipy.sparse.csc_matrix/csr_matrix depending on the storage order of the current matrix. For a column-major matrix, the appended matrix will be appended to the end of the columns. For a row-major matrix, the appended matrix will be appended to the end of the rows.
- Parameters:
matrix (Union[VCSC, sp.sparse.csc_matrix, sp.sparse.csr_matrix]) – The matrix to append
- Raises:
TypeError – If the input matrix is not a supported type of matrix
- Return type:
None
- getCounts(outerIndex)[source]
Returns the number of non-zero elements in a column or row depending on storage order.
- For example, if the matrix is:
[1]
[1]
[2]
Then the list [1, 2] will be returned
- Note: Whether the counts are from a column or row depends on order of the matrix.
A matrix stored in column-major order will return the counts of a column.
- Parameters:
outerIndex (int) – The index of the column or row to get the counts of
- Returns:
A list containing the counts of the column or row
- Return type:
list[Union[np.uint8, np.uint16, np.uint32, np.uint64]]
- Raises:
IndexError – If the provided index is out of range
- getIndices(outerIndex)[source]
Returns the indices of a column or row depending on storage order.
- Note: Whether the indices are from a column or row depends on order of the matrix.
A matrix stored in column-major order will return the indices of a column.
- Parameters:
outerIndex (int) – The index of the column or row to get the indices of
- Returns:
A list containing the indices of the column or row
- Return type:
list
- Raises:
IndexError – If the provided index is out of range
- getNumIndices(outerIndex)[source]
Returns the number of unique values in a column or row depending on storage order.
- Note: Whether the number of indices are from a column or row depends on order of the matrix.
A matrix stored in column-major order will return the number of indices of a column.
- Parameters:
outerIndex (int) – The index of the column or row to get the number of indices of
- Returns:
A list containing the number of indices of each column or row
- Return type:
list
- Raises:
IndexError – If the provided index is out of range
- getValues(outerIndex)[source]
Returns the unique values of a column or row depending on storage order.
- Note: Whether the values are from a column or row depends on order of the matrix.
A matrix stored in column-major order will return the values of a column.
- Parameters:
outerIndex (int) – The index of the column or row to get the values of
- Returns:
A list containing the unique values of the column or row
- Return type:
list
- Raises:
IndexError – If the provided index is out of range
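The accessors getValues, getCounts, and getIndices can be pictured with a small pure-Python sketch of value compression for one column. It assumes the column is stored as unique values, the number of occurrences of each value, and the positions grouped by value. The helper `compress_column` and the ascending ordering of values are assumptions for illustration, so the order the library returns may differ.

```python
def compress_column(column):
    """Group the non-zero entries of one dense column by value."""
    positions = {}
    for i, v in enumerate(column):
        if v != 0:
            positions.setdefault(v, []).append(i)
    values = sorted(positions)                            # unique non-zero values
    counts = [len(positions[v]) for v in values]          # occurrences of each value
    indices = [i for v in values for i in positions[v]]   # positions, grouped by value
    return values, counts, indices


column = [3, 0, 3, 7, 0, 3]
values, counts, indices = compress_column(column)
print(values)   # [3, 7]
print(counts)   # [3, 1]
print(indices)  # [0, 2, 5, 3]
```

The more often a value repeats within a column, the fewer values need to be stored relative to the number of non-zeros.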
- max(axis=None)[source]
On axis=None, returns the maximum of all elements in the matrix
If axis=0, returns the maximum of each column
If axis=1, returns the maximum of each row
- Parameters:
axis (int) – The axis to find the maximum along. The default is None, which finds the maximum of all elements in the matrix.
- Returns:
The maximum of the matrix or the maximum of each row/column
- Return type:
Union[np.int64, np.double, np.ndarray]
- min(axis=None)[source]
On axis=None, returns the minimum of all nonzero elements in the matrix
If axis=0, returns the nonzero minimum of each column
If axis=1, returns the nonzero minimum of each row
- Note: because of the way the matrix is stored, minimums that are zero are very expensive to compute.
There are a few exceptions:
- If a row/column is all zeros, then the minimum will be zero.
- If axis=None, then the minimum will be zero if nnz < rows * cols.
- Parameters:
axis (int) – The axis to find the minimum along. The default is None, which finds the minimum of all nonzero elements in the matrix.
- Returns:
The minimum of the matrix or the minimum of each row/column
- Return type:
Union[np.int64, np.double, np.ndarray]
- norm()[source]
Returns the Frobenius norm of the matrix
- Returns:
The Frobenius norm of the matrix
- Return type:
np.double
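The Frobenius norm is the square root of the sum of squared entries. The sketch below shows how it could be computed from unique values and their counts alone, without visiting individual positions. The (values, counts) per-column layout and the function name `frobenius_norm` are assumptions for illustration, not the library's implementation.

```python
import math


def frobenius_norm(columns):
    """Frobenius norm from per-column (values, counts) pairs."""
    total = 0.0
    for values, counts in columns:
        for v, n in zip(values, counts):
            total += n * v * v  # a value occurring n times contributes n * v^2
    return math.sqrt(total)


# Column 0 holds a single 3; column 1 holds the value 2 four times.
columns = [([3], [1]), ([2], [4])]
result = frobenius_norm(columns)
print(result)  # 5.0, since 3^2 + 4 * 2^2 = 25
```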
- read(filename)[source]
Function to read a VCSC formatted matrix from a file. This function should automatically determine the template type of the matrix.
This can also read a .npz file, but it must be of a CSC, CSR, or COO format.
- Parameters:
filename (str) – The name of the file to read from
- shape()[source]
Returns the shape of the matrix as a tuple (rows, cols)
- Returns:
The shape of the matrix
- Return type:
tuple[numpy.uint32, numpy.uint32]
- slice(start, end)[source]
Returns a slice of the matrix.
Currently, only slicing by storage order is supported. For example, if the matrix is stored in column-major order, then the returned matrix will be a slice of the columns.
- Parameters:
start (int) – The start index of the slice
end (int) – The end index of the slice
- Returns:
The slice of the matrix
- Return type:
- sum(axis=None)[source]
On axis=None, returns the sum of all elements in the matrix
If axis=0, returns the sum of each column
If axis=1, returns the sum of each row
Note: Sum is either int64 or a double
- Parameters:
axis (int) – The axis to sum along. The default is None, which sums all elements in the matrix.
- Returns:
The sum of the matrix or the sum of each row/column
- Return type:
Union[np.int64, np.double, np.ndarray]
- Raises:
ValueError – If the axis is not 0, 1, or None
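The axis convention can be sketched on a list of (row, column, value) triplets. The function `sparse_sum` below is a hypothetical stand-in that mirrors the documented behaviour (axis=None, 0, or 1, with ValueError otherwise). It returns plain Python numbers and lists rather than NumPy types.

```python
def sparse_sum(entries, shape, axis=None):
    """Sum (row, col, value) triplets over the whole matrix, columns, or rows."""
    rows, cols = shape
    if axis is None:
        return sum(v for _, _, v in entries)
    if axis == 0:
        out = [0] * cols  # one total per column
        for _, c, v in entries:
            out[c] += v
        return out
    if axis == 1:
        out = [0] * rows  # one total per row
        for r, _, v in entries:
            out[r] += v
        return out
    raise ValueError("axis must be 0, 1, or None")


# 2 x 3 matrix [[1, 0, 2], [0, 0, 3]] as triplets.
entries = [(0, 0, 1), (0, 2, 2), (1, 2, 3)]
print(sparse_sum(entries, (2, 3)))          # 6
print(sparse_sum(entries, (2, 3), axis=0))  # [1, 0, 5]
print(sparse_sum(entries, (2, 3), axis=1))  # [3, 3]
```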
- tocsc()[source]
Converts the matrix to a scipy.sparse.csc_matrix
- Note: This is a copy. This does not destroy the original matrix.
If the storage order of the VCSC matrix is row-major, then a csr_matrix will be created and converted to a csc_matrix.
- Returns:
The matrix in csc format
- Return type:
sp.sparse.csc_matrix
- tocsr()[source]
Converts the matrix to a scipy.sparse.csr_matrix
- Note: This is a copy. This does not destroy the original matrix.
If the storage order of the VCSC matrix is column-major, then a csc_matrix will be created and converted to a csr_matrix.
- Returns:
The matrix in scipy.sparse.csr_matrix format
- Return type:
sp.sparse.csr_matrix
- trace()[source]
Returns the sum of all elements along the diagonal.
Throws ValueError if matrix is not square.
Note: Sum is either int64 or a double.
- Returns:
The sum of the diagonal
- Return type:
Union[np.int64, np.double]
- Raises:
ValueError – If the matrix is not square
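As a minimal sketch of the documented behaviour, the trace over (row, column, value) triplets is the sum of the entries whose row and column match, with a ValueError for non-square shapes. The function `sparse_trace` is a hypothetical illustration, not the library's code.

```python
def sparse_trace(entries, shape):
    """Sum the diagonal of a matrix given as (row, col, value) triplets."""
    rows, cols = shape
    if rows != cols:
        raise ValueError("trace is only defined for square matrices")
    return sum(v for r, c, v in entries if r == c)


# 2 x 2 matrix [[1, 9], [0, 5]] as triplets.
entries = [(0, 0, 1), (0, 1, 9), (1, 1, 5)]
diagonal_sum = sparse_trace(entries, (2, 2))
print(diagonal_sum)  # 6
```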
- transpose(inplace=True)[source]
Transposes the matrix.
- Note: This is a very slow operation. It is recommended to use the transpose() function from another matrix format instead.
Nothing is returned if the operation is in place.
Memory usage will change after this operation.
- Parameters:
inplace (bool) – Whether to transpose the matrix in place. The default is True
- Returns:
The transposed VCSC matrix
- Return type: