Value-Compressed Sparse Column (VCSC) Class Reference

class PyVSparse.VCSC
_COOconstruct(moduleName, spmat)

Private helper function to construct a VCSC matrix from a scipy.sparse COO matrix. In C++, the constructor expects std::tuple<indexT, indexT, T> for each non-zero element.

C++ declaration: template <typename T2, typename indexT2> VCSC(std::vector<std::tuple<indexT2, indexT2, T2>>& entries, uint64_t num_rows, uint32_t num_cols, uint32_t nnz);

Parameters:
  • moduleName (str) – The name of the module to construct the VCSC matrix from

  • spmat (sp.sparse.coo_matrix) – The input matrix
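To show the shape of the data this helper hands to C++, here is a minimal pure-Python sketch that zips the three parallel COO arrays into one (row, col, value) tuple per nonzero. With SciPy, those arrays are a coo_matrix's .row, .col and .data attributes. The helper name coo_to_entries is hypothetical and not part of PyVSparse. The (row, col, value) ordering inside each tuple is an assumption based on the C++ declaration above.

```python
from typing import List, Sequence, Tuple


def coo_to_entries(
    rows: Sequence[int], cols: Sequence[int], data: Sequence[float]
) -> List[Tuple[int, int, float]]:
    """Zip parallel COO arrays into (row, col, value) triplets.

    Hypothetical helper: the (row, col, value) order is assumed to match the
    std::tuple<indexT, indexT, T> entries expected by the C++ constructor.
    """
    if not (len(rows) == len(cols) == len(data)):
        raise ValueError("COO arrays must have the same length")
    return [(int(r), int(c), v) for r, c, v in zip(rows, cols, data)]


# A 3 x 2 matrix with three nonzeros; with SciPy these lists would come
# from a coo_matrix's .row, .col and .data attributes.
entries = coo_to_entries([0, 2, 1], [0, 0, 1], [5.0, 5.0, 7.0])
num_rows, num_cols, nnz = 3, 2, len(entries)
print(entries)  # [(0, 0, 5.0), (2, 0, 5.0), (1, 1, 7.0)]
```

The dimensions and nnz travel separately, matching the num_rows, num_cols and nnz arguments of the C++ constructor.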

_CSconstruct(moduleName, spmat)

Private helper function to construct a VCSC matrix from a scipy.sparse CSC or CSR matrix. This uses the Eigen::SparseMatrix<T> constructor in C++. Pybind11 handles the conversion.

Constructing a VCSC matrix from a pure-Python CSR/CSC representation is not implemented at this time, although the C++ backend should support it.

Parameters:
  • moduleName (str) – The name of the module to construct the VCSC matrix from

  • spmat (Union[sp.sparse.csc_matrix, sp.sparse.csr_matrix]) – The input matrix

__eq__(other)

Compares the matrix to another VCSC matrix

Parameters:

other (VCSC) – The matrix to compare to

Returns:

True if the matrices are equal, False otherwise

Return type:

bool

__getitem__(key)

Random access operator for VCSC.

Currently, only access to a single element is supported.

Parameters:

key (int) – The index of the element to access

Returns:

The value of the element at the index

Return type:

any

__imul__(other)

In-place multiplication of the matrix by a scalar

Parameters:

other (Union[int, float]) – The value or object to multiply the matrix by

Returns:

The matrix multiplied by the input

Return type:

VCSC

Raises:

TypeError – If the input is not a scalar or numpy array

__init__(spmat, indexType=numpy.uint32, order='col')

Value-Compressed Sparse Column (VCSC) is a read-only sparse matrix format that compresses redundant data without compromising speed. See README.md for more information.

While the name suggests column-major storage, the matrix can be stored in either column-major or row-major order.

This class can be constructed from any of the following:

  • scipy.sparse.csc_matrix

  • scipy.sparse.csr_matrix

  • scipy.sparse.coo_matrix

  • PyVSparse.VCSC

  • A .vcsc file (written by VCSC.write() or by a C++ program that uses VCSC.write())

The user can specify the index type of the matrix. The default is np.uint32. The Python version is limited to unsigned integers, because signed integers offer no advantage. The choice of index type should not affect performance, correctness, or which features are available. The only difference SHOULD be the memory consumption of the matrix, unless you attempt to store more than the integer limit of the index type, e.g. a matrix with 256 rows/cols and an np.uint8 index type (whose maximum value is 255).
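The limit in the last sentence can be made concrete. An unsigned type with b bits holds values up to 2^b - 1, so np.uint8 stops at 255. The sketch below picks the narrowest unsigned type whose limit covers both dimensions. It is a hypothetical helper, not part of PyVSparse. It also assumes, conservatively, that the dimensions themselves must fit in the type, not just the largest index (which is one smaller). The returned strings are the NumPy dtype names, e.g. "uint8" for np.uint8.

```python
# Maximum representable value for each unsigned index type (2**bits - 1).
UNSIGNED_LIMITS = {
    "uint8": 2**8 - 1,
    "uint16": 2**16 - 1,
    "uint32": 2**32 - 1,
    "uint64": 2**64 - 1,
}


def smallest_index_type(num_rows: int, num_cols: int) -> str:
    """Return the narrowest unsigned type that can hold both dimensions.

    Hypothetical helper: assumes the dimensions themselves (not only the
    largest index) must be representable in the index type.
    """
    largest = max(num_rows, num_cols)
    for name, limit in UNSIGNED_LIMITS.items():  # dicts keep insertion order
        if largest <= limit:
            return name
    raise OverflowError("dimension exceeds the uint64 limit")


print(smallest_index_type(256, 10))  # uint16, since 256 > 255
```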

The user can also specify the storage order of the matrix. The default is “Col” for column-major order.

Note: Because of how indices are stored in VCSC, cache misses are more common. For a highly redundant matrix, however, VCSC can be just as fast as, and in some cases faster than, a CSC matrix, because the few unique values can stay in cache. The matrix operations are implemented naively, however, so matrix multiplication will be faster for SciPy matrices that use BLAS.

VCSC is faster than IVCSC because its indices are byte-aligned, but it does not offer the same level of compression.

Coefficient-wise operations may be much faster, because they act on each unique value once rather than on every nonzero element.
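A minimal pure-Python sketch of why coefficient-wise operations can be cheaper. A column is grouped as {unique value: row indices}, and scaling it costs one multiplication per unique value instead of one per nonzero. The dict layout and the function names are hypothetical illustrations of the idea. They are not the C++ data structures PyVSparse actually uses.

```python
from collections import defaultdict
from typing import Dict, List


def compress_column(column: List[float]) -> Dict[float, List[int]]:
    """Group the row indices of a dense column by their nonzero value."""
    groups: Dict[float, List[int]] = defaultdict(list)
    for row, value in enumerate(column):
        if value != 0:
            groups[value].append(row)
    return dict(groups)


def scale_column(compressed: Dict[float, List[int]], scalar: float) -> Dict[float, List[int]]:
    """Multiply every stored value by a scalar: one multiply per unique value."""
    if scalar == 0:
        return {}  # every entry becomes an implicit zero
    # A nonzero scalar keeps distinct values distinct (exactly so for integers).
    return {value * scalar: list(rows) for value, rows in compressed.items()}


column = [3, 0, 3, 3, 7]
compressed = compress_column(column)  # {3: [0, 2, 3], 7: [4]}
scaled = scale_column(compressed, 2)  # {6: [0, 2, 3], 14: [4]}
print(len(compressed), "multiplications instead of", sum(len(r) for r in compressed.values()))
# prints: 2 multiplications instead of 4
```

The more often a value repeats within a column, the larger the saving.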

Parameters:
  • spmat (Union[sp.sparse.csc_matrix, sp.sparse.csr_matrix, sp.sparse.coo_matrix, PyVSparse.VCSC, str]) – The input matrix or .vcsc filename

  • indexType (np.dtype) – The index type of the matrix. The default is np.uint32

  • order (str) – The storage order of the matrix. The default is “Col” for column-major order. “Row” can also be specified for row-major order. Capitalization does not matter.

__iter__(index)

Returns an iterator for the matrix

Returns:

An iterator for the matrix

Return type:

any

__matmul__(other)

Multiplication of the matrix by a dense matrix.

The matrix returned will be a dense numpy matrix or vector.
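This sketch shows how a naive product with a dense vector can use the value-compressed layout. For column j, each unique value v is multiplied by x[j] once, and that product is added to every row where v occurs. The layout (one {value: row indices} dict per column) and the name matvec are hypothetical. The real backend is C++, and its loop structure is not documented here. A 2-D operand would be handled one column at a time.

```python
from typing import Dict, List

# Hypothetical value-compressed layout: one dict per column mapping each
# unique nonzero value to the row indices where it occurs.
CompressedColumns = List[Dict[float, List[int]]]


def matvec(columns: CompressedColumns, num_rows: int, x: List[float]) -> List[float]:
    """Naive dense y = A @ x for a column-major value-compressed matrix A."""
    if len(x) != len(columns):
        raise ValueError("x must have one entry per column")
    y = [0.0] * num_rows
    for j, column in enumerate(columns):
        for value, rows in column.items():
            scaled = value * x[j]  # one multiply per unique value in column j
            for i in rows:
                y[i] += scaled
    return y


# A = [[1, 0],
#      [1, 2],
#      [0, 2]]
columns = [{1: [0, 1]}, {2: [1, 2]}]
print(matvec(columns, 3, [10, 100]))  # [10.0, 210.0, 200.0]
```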

Parameters:

other (Union[np.ndarray]) – The object to multiply the matrix by

Returns:

The matrix multiplied by the input

Return type:

Union[np.ndarray]

Raises:

TypeError – If the input is not a numpy array

__mul__(other)

Multiplication of the matrix by a scalar.

If the input is a scalar, then the matrix returned will be a VCSC matrix.

Parameters:

other (Union[int, float]) – The value or object to multiply the matrix by

Returns:

The matrix multiplied by the input

Return type:

Union[VCSC]

Raises:

TypeError – If the input is not a scalar

__ne__(other)

Compares the matrix to another VCSC matrix

Parameters:

other (VCSC) – The matrix to compare to

Returns:

True if the matrices are not equal, False otherwise

Return type:

bool

__radd__(other)

Coefficient-wise addition of a dense matrix to the matrix.

Because VCSC is read-only, the result is a dense numpy matrix or vector. This is a right-addition operation: A + VCSC

Parameters:

other (Union[np.ndarray, int, float]) – The object to add to the matrix

Returns:

The matrix added to the input

Return type:

Union[np.ndarray]

Raises:

TypeError – If the input is not a numpy array, int, or float

__rsub__(other)

Coefficient-wise subtraction of the matrix from a dense matrix.

Because VCSC is read-only, the result is a dense numpy matrix or vector. This is a right-subtraction operation: A - VCSC

Parameters:

other (Union[np.ndarray, int, float]) – The object to subtract the matrix from

Returns:

The matrix subtracted from the input

Return type:

Union[np.ndarray]

Raises:

TypeError – If the input is not a numpy array, int, or float

__rtruediv__(other)

Coefficient-wise division of a dense matrix by the matrix.

Because VCSC is read-only, the result is a dense numpy matrix or vector. This is a right-division operation: A / VCSC

Parameters:

other (Union[int, float]) – The object to be divided by the matrix

Returns:

The input divided coefficient-wise by the matrix

Return type:

Union[np.ndarray]

Raises:

TypeError – If the input is not a scalar

_npzConstruct(moduleName, secondary='csc')

A wrapper function that constructs a VCSC matrix from a .npz file. It still creates an intermediate scipy.sparse matrix, but provides a convenient way to read the file.

Parameters:
  • moduleName (str) – The npz file name

  • secondary (str) – The format to convert the matrix to if the .npz file holds a format that has no available constructor, e.g. BSR to CSC. The default is “csc”

Raises:

TypeError – If the secondary format is not a valid format.

append(matrix)

Appends a matrix to the current matrix

The appended matrix must be a VCSC matrix or a scipy.sparse matrix matching the storage order of the current matrix (csc_matrix for column-major, csr_matrix for row-major). For a column-major matrix, the columns of the appended matrix are added after the existing columns. For a row-major matrix, its rows are added after the existing rows.
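The effect on shape can be sketched with plain lists, where each inner list is one outer vector: a column for column-major storage, a row for row-major storage. Appending simply places the new outer vectors after the existing ones, and the inner dimension must match. The helper append_outer is hypothetical. For simplicity it returns a new list, whereas append() modifies the matrix in place and returns None.

```python
from typing import List


def append_outer(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Append b's outer vectors (columns if column-major, rows if row-major) after a's."""
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("inner dimensions must match")
    return a + b


# Column-major storage: each inner list is one column of a 2-row matrix.
a = [[1, 2], [3, 4]]  # columns [1, 2] and [3, 4]
b = [[5, 6]]          # one more column
combined = append_outer(a, b)
print(combined)  # [[1, 2], [3, 4], [5, 6]]
# The result has 3 columns of 2 rows each, i.e. a 2 x 3 matrix.
```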

Parameters:

matrix (Union[VCSC, sp.sparse.csc_matrix, sp.sparse.csr_matrix]) – The matrix to append

Raises:

TypeError – If the input matrix is not a supported type of matrix

Return type:

None

byteSize()

Returns the memory consumption of the matrix in bytes

Return type:

np.uint64

fromVCSC(spmat)

Copy constructor for VCSC

Parameters:

spmat (VCSC) – The input VCSC matrix

getCounts(outerIndex)

Returns, for each unique value in a column or row (depending on storage order), the number of times that value occurs.

For example, if the matrix is:

[1]
[1]
[2]

then the returned list holds one count per unique value: 2 for the value 1 and 1 for the value 2.

Note: Whether the counts are from a column or row depends on the storage order of the matrix. A matrix stored in column-major order will return the counts of a column.
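The relationship between the unique values, their counts, and their indices can be sketched in pure Python for the one-column example above. The three lists are parallel, and the counts sum to the number of nonzeros in the column. The function name is hypothetical. The sketch lists values in order of first appearance, which is an assumption; the order PyVSparse uses is not documented here.

```python
from typing import Dict, List, Tuple


def value_counts_indices(column: List[float]) -> Tuple[List[float], List[int], List[List[int]]]:
    """Return parallel lists: unique nonzero values, their counts, and their row indices."""
    values: List[float] = []
    indices: List[List[int]] = []
    position: Dict[float, int] = {}  # value -> slot in the parallel lists
    for row, value in enumerate(column):
        if value == 0:
            continue  # zeros are implicit and not stored
        if value not in position:
            position[value] = len(values)
            values.append(value)
            indices.append([])
        indices[position[value]].append(row)
    counts = [len(rows) for rows in indices]
    return values, counts, indices


values, counts, indices = value_counts_indices([1, 1, 2])
print(values, counts, indices)  # [1, 2] [2, 1] [[0, 1], [2]]
```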

Parameters:

outerIndex (int) – The index of the column or row to get the counts of

Returns:

A list containing the counts of the column or row

Return type:

list[Union[np.uint8, np.uint16, np.uint32, np.uint64]]

Raises:

IndexError – If the provided index is out of range

getIndices(outerIndex)

Returns the indices of a column or row depending on storage order.

Note: Whether the indices are from a column or row depends on the storage order of the matrix. A matrix stored in column-major order will return the indices of a column.

Parameters:

outerIndex (int) – The index of the column or row to get the indices of

Returns:

A list containing the indices of the column or row

Return type:

list

Raises:

IndexError – If the provided index is out of range

getNumIndices(outerIndex)

Returns the number of unique values in a column or row depending on storage order.

Note: Whether the number of indices is from a column or row depends on the storage order of the matrix. A matrix stored in column-major order will return the number of indices of a column.

Parameters:

outerIndex (int) – The index of the column or row to get the number of indices of

Returns:

A list containing the number of indices of each column or row

Return type:

list

Raises:

IndexError – If the provided index is out of range

getValues(outerIndex)

Returns the unique values of a column or row depending on storage order.

Note: Whether the values are from a column or row depends on the storage order of the matrix. A matrix stored in column-major order will return the values of a column.

Parameters:

outerIndex (int) – The index of the column or row to get the values of

Returns:

A list containing the unique values of the column or row

Return type:

list

Raises:

IndexError – If the provided index is out of range

max(axis=None)

On axis=None, returns the maximum of all elements in the matrix

If axis=0, returns the maximum of each column

If axis=1, returns the maximum of each row
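The axis convention matches NumPy's: axis=0 reduces down each column, and axis=1 reduces across each row. The sketch below spells this out on a small dense matrix stored as a list of rows. It only illustrates the convention. The helper is hypothetical, and how VCSC handles implicit zeros internally is not shown.

```python
from typing import List, Optional, Union


def dense_max(matrix: List[List[float]], axis: Optional[int] = None) -> Union[float, List[float]]:
    """Reference semantics for max(axis) on a dense matrix given as a list of rows."""
    if axis is None:
        return max(max(row) for row in matrix)
    if axis == 0:  # one result per column
        return [max(col) for col in zip(*matrix)]
    if axis == 1:  # one result per row
        return [max(row) for row in matrix]
    raise ValueError("axis must be 0, 1, or None")


m = [[1, 0, 4],
     [3, 2, 0]]
print(dense_max(m), dense_max(m, axis=0), dense_max(m, axis=1))  # 4 [3, 2, 4] [4, 3]
```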

Parameters:

axis (int) – The axis to find the maximum along. The default is None, which finds the maximum of all elements in the matrix.

Returns:

The maximum of the matrix or the maximum of each row/column

Return type:

Union[np.int64, np.double, np.ndarray]

min(axis=None)

On axis=None, returns the minimum of all nonzero elements in the matrix

If axis=0, returns the nonzero minimum of each column

If axis=1, returns the nonzero minimum of each row

Note: Because of the way the matrix is stored, minimums that are zero are very expensive to compute, so implicit zeros are ignored. There are a few exceptions:

  • If a row/column is all zeros, then its minimum will be zero.

  • If axis=None, then the minimum will be zero if nnz < rows * cols, i.e. if the matrix contains at least one zero.
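The rules above can be followed literally on a small dense matrix stored as a list of rows. Per row or column, the result is the nonzero minimum, or zero if that vector is all zeros. With axis=None, the result is zero whenever nnz < rows * cols. This hypothetical helper mirrors the documented behavior; it is not PyVSparse's implementation. The examples use nonnegative values only.

```python
from typing import List, Optional, Union


def documented_min(matrix: List[List[float]], axis: Optional[int] = None) -> Union[float, List[float]]:
    """Apply the documented min() rules to a dense matrix given as a list of rows."""
    if axis is None:
        nonzeros = [v for row in matrix for v in row if v != 0]
        total = len(matrix) * len(matrix[0])
        # Exception: the overall minimum is zero whenever nnz < rows * cols.
        return 0 if len(nonzeros) < total else min(nonzeros)
    if axis == 0:
        vectors = [list(col) for col in zip(*matrix)]  # columns
    elif axis == 1:
        vectors = [list(row) for row in matrix]  # rows
    else:
        raise ValueError("axis must be 0, 1, or None")
    result = []
    for vec in vectors:
        nonzeros = [v for v in vec if v != 0]
        # Exception: an all-zero row/column has minimum zero.
        result.append(min(nonzeros) if nonzeros else 0)
    return result


m = [[5, 0, 0],
     [2, 3, 0]]
print(documented_min(m), documented_min(m, axis=0), documented_min(m, axis=1))  # 0 [2, 3, 0] [5, 2]
```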

Parameters:

axis (int) – The axis to find the minimum along. The default is None, which finds the minimum of all nonzero elements in the matrix.

Returns:

The minimum of the matrix or the minimum of each row/column

Return type:

Union[np.int64, np.double, np.ndarray]

norm()

Returns the Frobenius norm of the matrix
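The Frobenius norm is the square root of the sum of squares of all entries. With value compression, a value v that occurs c times in a column contributes c * v^2. That allows one multiply-add per unique value rather than one square per nonzero. Whether norm() computes it this way is not stated here. The {value: count} layout and the helper name are hypothetical.

```python
import math
from typing import Dict, List


def frobenius_norm(columns: List[Dict[float, int]]) -> float:
    """Frobenius norm from per-column {unique value: count} maps."""
    total = 0.0
    for column in columns:
        for value, count in column.items():
            total += count * value * value  # c * v**2 replaces c separate squares
    return math.sqrt(total)


# Matrix [[3, 0],
#         [3, 4]]: column 0 holds the value 3 twice, column 1 holds 4 once.
columns = [{3: 2}, {4: 1}]
print(frobenius_norm(columns))  # sqrt(2 * 9 + 16) = sqrt(34)
```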

Returns:

The Frobenius norm of the matrix

Return type:

np.double

read(filename)

Reads a VCSC-formatted matrix from a file. This function should automatically determine the template type of the matrix.

It can also read a .npz file, provided the matrix in it is stored in CSC, CSR, or COO format.

Parameters:

filename (str) – The name of the file to read from

shape()

Returns the shape of the matrix as a tuple (rows, cols)

Returns:

The shape of the matrix

Return type:

tuple[numpy.uint32, numpy.uint32]

slice(start, end)

Returns a slice of the matrix.

Currently, only slicing along the storage order is supported. For example, if the matrix is stored in column-major order, the returned matrix will be a slice of the columns.

Parameters:
  • start (int) – The start index of the slice

  • end (int) – The end index of the slice

Returns:

The slice of the matrix

Return type:

VCSC

sum(axis=None)

On axis=None, returns the sum of all elements in the matrix

If axis=0, returns the sum of each column

If axis=1, returns the sum of each row

Note: The sum is either an int64 or a double.

Parameters:

axis (int) – The axis to sum along. The default is None, which sums all elements in the matrix.

Returns:

The sum of the matrix or the sum of each row/column

Return type:

Union[np.int64, np.double, np.ndarray]

Raises:

ValueError – If the axis is not 0, 1, or None

tocsc()

Converts the matrix to a scipy.sparse.csc_matrix

Note: This is a copy. This does not destroy the original matrix.

If the VCSC matrix is stored in row-major order, then a csr_matrix will be created and converted to a csc_matrix.

Returns:

The matrix in csc format

Return type:

sp.sparse.csc_matrix

tocsr()

Converts the matrix to a scipy.sparse.csr_matrix

Note: This is a copy. This does not destroy the original matrix.

If the VCSC matrix is stored in column-major order, then a csc_matrix will be created and converted to a csr_matrix.

Returns:

The matrix in scipy.sparse.csr_matrix format

Return type:

sp.sparse.csr_matrix

trace()

Returns the sum of all elements along the diagonal.

Throws ValueError if matrix is not square.

Note: The sum is either an int64 or a double.
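For reference, the trace is the sum of the entries A[i][i], and it is only defined for square matrices. That is why a non-square input raises ValueError. The plain-Python helper below is a hypothetical illustration of that definition, not the library's implementation.

```python
from typing import List


def trace(matrix: List[List[float]]) -> float:
    """Sum of the diagonal of a square matrix given as a list of rows."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("trace requires a square matrix")
    return sum(matrix[i][i] for i in range(n))


print(trace([[1, 2],
             [3, 4]]))  # 5
```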

Returns:

The sum of the diagonal

Return type:

Union[np.int64, np.double]

Raises:

ValueError – If the matrix is not square

transpose(inplace=True)

Transposes the matrix.

Note: This is a very slow operation. Converting to another matrix format and using its transpose() function is recommended instead.

Nothing is returned if the operation is in place.

Memory usage will generally change after this operation, because the compression depends on how values are grouped along the storage order.

Parameters:

inplace (bool) – Whether to transpose the matrix in place. The default is True

Returns:

The transposed VCSC matrix (when inplace=False)

Return type:

VCSC

vectorLength(vector)

Returns the Euclidean length (L2 norm) of the vector at the given index
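The Euclidean length of a vector is the square root of the sum of its squared entries. Given a vector's unique values and their counts, in the form getValues() and getCounts() describe, the length is sqrt(sum(count * value^2)). This sketch is a hypothetical helper that illustrates the arithmetic. Whether vectorLength() is computed this way is not documented here.

```python
import math
from typing import Sequence


def vector_length(values: Sequence[float], counts: Sequence[int]) -> float:
    """Euclidean length of one column/row from its unique values and their counts."""
    if len(values) != len(counts):
        raise ValueError("values and counts must be parallel")
    return math.sqrt(sum(c * v * v for v, c in zip(values, counts)))


# The column [1, 1, 2] has unique values [1, 2] with counts [2, 1].
print(vector_length([1, 2], [2, 1]))  # sqrt(1 + 1 + 4) = sqrt(6)
```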

Parameters:

vector (int) – The index of the vector to find the euclidean length of

Returns:

The euclidean length of the vector

Return type:

np.double

Raises:

IndexError – If the vector index is out of range

write(filename)

Writes the matrix to a file. If the file name doesn’t include .vcsc, it will be appended.
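The file-name rule can be sketched as a one-line check. This assumes a plain substring test, as the wording "doesn't include .vcsc" suggests. An implementation that checks only the suffix would treat a name like "a.vcsc.bak" differently. The helper name is hypothetical.

```python
def with_vcsc_extension(filename: str) -> str:
    """Append '.vcsc' unless the name already contains it (substring check assumed)."""
    return filename if ".vcsc" in filename else filename + ".vcsc"


print(with_vcsc_extension("matrix"))       # matrix.vcsc
print(with_vcsc_extension("data/m.vcsc"))  # data/m.vcsc
```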

Parameters:

filename (str) – The name of the file to write to

Return type:

None