What is "colptr" in Julia and its counterpart in Python? - python

I'm trying to rewrite some Julia code in Python, and I came across colptr, a field attached to a sparse matrix. I searched for it but I still don't understand what it is.
Could someone explain what it is and what its counterpart is in Python 3? Thank you in advance.
[edit]
this is from Julia's references
struct SparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
m::Int # Number of rows
n::Int # Number of columns
colptr::Vector{Ti} # Column j is in colptr[j]:(colptr[j+1]-1)
rowval::Vector{Ti} # Row indices of stored values
nzval::Vector{Tv} # Stored values, typically nonzeros
For instance, does A.colptr[j] refer to all the elements of the j-th column of the CSC matrix A?
I tried to figure it out by running the simple code below,
A = sparse([1, 1, 2, 3], [1, 3, 2, 3], [0, 1, 2, 0])
for i=1:4
println(A.colptr[i])
end
and the result was
1
2
3
5
I still have no idea why the result looks like this. The explanation says
Ti is the integer type for storing column pointers

You're looking at a compressed sparse column (CSC) representation of a matrix. Instead of, for example, storing all the values of a matrix in sequence in memory, this allows only storing nonzero values. For example, the matrix
5 0 0
6 0 7
1 0 3
0 2 0
can either be stored in memory as the column major sequence 5 6 1 0 0 0 0 2 0 7 3 0 or you could do something smarter.
If you only store the column major sequence of nonzero elements, you end up with a much shorter list: 5 6 1 2 7 3! But now you need a way to map these values back to their locations in the matrix. You need a column index and a row index. Thus, we have two more lists:
The row indices for each stored value can also be stored in a one-to-one fashion: 1 2 3 4 2 3.
Now, I could store the column indices in a similar one-to-one fashion: 1 1 1 2 3 3. Were I to do so, this would be the sparse coordinate (COO) format. But note that there's a lot of redundant information here: look at all those repeated values! The common CSC format compresses this further. I already know I have three columns; I could simply store where each column starts. This is the colptr: it has one value per column and points to where that column starts. Thus instead of storing six values, it only needs to store three: column one starts at index 1 (of course), column two starts at index 4, and column three starts at index 5. It turns out to make life a bit easier if we also store a final fourth value representing one past the end (here 7), because then we can describe the nonzero values in a particular column simply by saying that the stored values in column j can be found in nzval[colptr[j]:colptr[j+1]-1], with their row indices in rowval over the same range.
The typical Python equivalent is in SciPy: scipy.sparse.csc_matrix. Simply substitute colptr -> indptr, rowval -> indices, nzval -> data, and accommodate the 0-based indexing.
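As a minimal sketch of the layout described above, the following pure-Python function builds the three CSC arrays for the example matrix. The helper names dense_to_csc and column_entries are hypothetical (they are not part of Julia or SciPy); the arrays use 0-based indices and borrow SciPy's attribute names indptr, indices and data. Adding 1 to every pointer recovers the Julia colptr values.

```python
def dense_to_csc(dense):
    """Build 0-based CSC arrays from a dense matrix given as a list of rows.

    Hypothetical helper for illustration; names mirror scipy.sparse.csc_matrix.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    indptr = [0]   # column j occupies positions indptr[j] .. indptr[j + 1] - 1
    indices = []   # row index of each stored value
    data = []      # stored nonzero values, column by column
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                indices.append(i)
                data.append(dense[i][j])
        indptr.append(len(data))  # one past the end of column j
    return indptr, indices, data


matrix = [
    [5, 0, 0],
    [6, 0, 7],
    [1, 0, 3],
    [0, 2, 0],
]
indptr, indices, data = dense_to_csc(matrix)


def column_entries(j):
    """Return (row, value) pairs stored for 0-based column j."""
    start, stop = indptr[j], indptr[j + 1]
    return list(zip(indices[start:stop], data[start:stop]))


print(indptr)                    # 0-based column pointers
print([p + 1 for p in indptr])   # the same pointers in Julia's 1-based form
print(column_entries(2))         # entries of the third column
```

The Python slice indptr[j]:indptr[j + 1] excludes its stop index, which is why no "-1" is needed, unlike Julia's inclusive range colptr[j]:(colptr[j+1]-1).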

Related

The minimum number of subarrays that should be incremented to form a target array

Suppose we have an array called target with positive values. Now consider an array initial of the same size with all zeros. We have to find the minimum number of operations required to generate the target array from initial, where one operation is: select any subarray of initial and increment each of its values by one.
So, if the input is target = [2,3,4,3,2],
then starting from [0,0,0,0,0], the first pass selects the subarray from index 0 to 4 and increases it by 1, so the array becomes [1,1,1,1,1]. We then select index 0 to 4 again to make it [2,2,2,2,2], then select index 1 to 3 and increase, so the array becomes [2,3,3,3,2], and finally select index 2 to make the array [2,3,4,3,2], which is the same as target. So the output will be:
0 4
0 4
1 3
2 2
I am having trouble finding a good algorithm for this problem. At first, I thought it might be better to reduce the target array to the zero array, but this does not change the algorithm I need to find. Any help or hint would be appreciated.
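As a rough sketch, the four operations listed in the question can be replayed to confirm that they produce the target. The second function is an assumption that is not part of the question: a commonly used greedy count in which every rise from one element to the next needs that many new subarrays to start there. Both function names are hypothetical.

```python
def apply_increments(size, operations):
    """Replay (start, end) inclusive increments on an all-zero array."""
    arr = [0] * size
    for start, end in operations:
        for i in range(start, end + 1):
            arr[i] += 1
    return arr


def min_operations(target):
    """Count operations with a greedy argument (assumed approach, not from the question).

    Whenever a value exceeds its left neighbour, the difference is the number
    of new subarrays that must start at that position.
    """
    count = 0
    previous = 0
    for value in target:
        if value > previous:
            count += value - previous
        previous = value
    return count


target = [2, 3, 4, 3, 2]
operations = [(0, 4), (0, 4), (1, 3), (2, 2)]
print(apply_increments(len(target), operations) == target)
print(min_operations(target))
```

For this target the greedy count agrees with the four operations shown in the question.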

Printing a Python Array

I have the array below, which represents a matrix of 20 columns x 10 rows.
What I am trying to do is get the value in the third position after I provide the column and row values. For example, if I type in the values 3 and 0, I expect to get 183 as the answer. I used the print command as follows: print(matrix[3][0][I don't know]), but I get either an out-of-range error or undesired results.
I also organized the data as matrix[[[0],[0],[180]], [[1],[0],[181]], [[2],[0],[182]],... without much success.
I have the matrix data in a CSV file, so I can format it accordingly if the problem is the way I am presenting the data.
Can someone please take a look at this code and direct me? Thanks
matrix =[]
matrix =[
['0','0','180'],
['1','0','181'],
['2','0','182'],
['3','0','183'],
['4','0','184'],
['5','0','185'],
['6','0','186'],
['7','0','187'],
['18','0','198']]
print(matrix[?][?][value])
Your matrix here is 9 x 3 (9 rows of 3 elements each).
If you want the 185, it's in the 6th row, 3rd column, so the indexes are 5 and 2 respectively.
matrix[5][2] will give the result; there is no need for a third bracket.
Basically, to access an element you write [rowNumber][colNumber]. The first bracket gives you whatever is at that position of the outer array (a 2D array is just an array of arrays), so you get a 1D array with 3 elements; the second bracket is the index of the element within that 1D array.
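A minimal sketch of both access styles, using the data from the question. The dictionary named lookup is a hypothetical addition: it assumes each inner list is [column, row, value] and lets you ask for a value by (column, row) instead of by position in the list.

```python
matrix = [
    ['0', '0', '180'],
    ['1', '0', '181'],
    ['2', '0', '182'],
    ['3', '0', '183'],
    ['4', '0', '184'],
    ['5', '0', '185'],
    ['6', '0', '186'],
    ['7', '0', '187'],
    ['18', '0', '198'],
]

# Positional access: 6th inner list, 3rd element (still a string).
by_position = matrix[5][2]
print(by_position)

# Hypothetical lookup table keyed by (column, row), with values as integers.
lookup = {(int(col), int(row)): int(value) for col, row, value in matrix}
print(lookup[(3, 0)])
```

The dictionary also handles gaps in the data, such as the jump from column 7 to column 18, where positional indexing would no longer line up with the column number.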

A loop to iteratively arrive at rowsums = colsums for i = j

I do apologize, my question is going to be wordy because I'm just at a loss on how to even start coding this. Pseudo-code answers are highly appreciated, if only to allow me to understand how to solve this (then I can write some actual code and come back for help if necessary).
My problem isn't so much the code as it is understanding the logic I need (which is arguably the harder part of programming).
An informal explanation of my problem is that I want to change a matrix A (which happens to be sparse) such that the row sums are equal to the column sums. I can do this by adding to A a matrix AS where S is a matrix of scales.
Formally, I want to find an S matrix such that (A + AS)ONESn = T and (t(A) + t(A)S)ONESn = T where ONESn is a vector of ones that creates T, the vector of row sums.
The vector T is set in stone as it were, it is the current column sums and is the target for the row sums.
I think the way I want to solve this is for each row i and column j where i = j I want to find the row sum and compute how far it is from the target. Then I want to change each element of that row such that the row sum equals the target (or is at least "close enough" where I can set the "close enough").
However, this is subject to the condition that the sum of column j must equal the target as well.
How can I design the logic so that I can start with say column 1 and row 1, figure out the values in row 1 and then figure out the values of column 1 subject to the first entry of column 1 being "fixed" by the earlier procedure.
Following that, row 2 should have its first value "fixed" by the above, and similarly the programme needs to figure out column 2 with fixed values for the first two entries now.
And so on until you get to the final column and row
I have tried programming a gradient descent but got stuck on how to make the gradient descent for the columns depend on the gradient descent for the rows iteratively.
I've also worked this out by hand (for a 2x2 matrix), I can figure out the answer but I'm not sure how I managed to do so which is why I'm struggling to code it.
Suppose A is a 2x2 matrix of [1, 2, 3, 4] (filled column by column). Row sums are [4, 6]. Column sums are [3, 7].
1 3 | 4
2 4 | 6
___
3 7
if I add the matrix S = [1, 0, -1, 0]
1 -1
0 0
I get A + S = [2, 2, 2, 4] which has row sums [4, 6].
2 2 | 4
2 4 | 6
___
4 6
Expected results are a matrix (A + AS) such that the row sums equal the column sums.
Or an error message saying "does not converge"
You have some matrix A and you need to add another matrix S so that the resulting matrix M has the same row sums as column sums. This means:
A + S = M # For M row sums = column sums
So what you need to do is find S. You can simply rearrange the equation to
S = M - A
Now you can pick any matrix with equal row sums and column sums for M, and you get S. Once you have S you can compute
A + S = M.
This means that for every matrix A you can add another matrix S so that the resulting matrix M has row sums = column sums. Hence, you will never get the message "does not converge".
Here is some R code:
A <- matrix(rnorm(4), ncol= 2)
M <- matrix(c(2,2,2,4), ncol= 2)
S <- M - A
rowSums(A+S) == colSums(A+S)
TRUE TRUE
Or, more general:
row_col_num <- 5 # number of columns and rows
A <- matrix(rnorm(row_col_num *row_col_num ), ncol= row_col_num )
M <- matrix(rep(1, row_col_num *row_col_num ), ncol= row_col_num )
S <- M - A
rowSums(A+S) == colSums(A+S)
TRUE TRUE TRUE TRUE TRUE
The resulting matrix A+S is always as you set M. So I am not sure what this is for. If you need to know how to find S, where A+S gives you a matrix M with row sums= column sums, this is how you can do it.
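For readers following along in Python, here is a minimal sketch of the same idea using plain lists, with the 2x2 matrices from the question instead of random values. The helper names are hypothetical, and M is simply chosen by hand to have equal row and column sums.

```python
def row_sums(m):
    return [sum(row) for row in m]


def col_sums(m):
    return [sum(col) for col in zip(*m)]


def subtract(x, y):
    """Element-wise x - y for two equally sized matrices."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def add(x, y):
    """Element-wise x + y for two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


A = [[1, 3],
     [2, 4]]   # row sums [4, 6], column sums [3, 7]
M = [[2, 2],
     [2, 4]]   # chosen so that row sums equal column sums

S = subtract(M, A)   # S = M - A
result = add(A, S)   # equals M by construction

print(S)
print(row_sums(result), col_sums(result))
```

With these inputs S comes out as the same matrix the question found by hand.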

Summing array values over a specific range with numpy

So I am trying to get the sum over a specific range of values in a text file using:
np.sum(d[a:b])
I am using a text file with 10000 entries. I know that indexing always starts at zero. My range is quite large, i.e. index 200-555 (including 200 and 555). Just for testing, I tried summing over a small range:
In [17]: np.sum(d[1:4])
Out[17]: 50.164228
But the above code summed from the 2nd entry (labeled index 1 by Python) through the entry at index 3. The numbers are: (0 -> 13.024), 1 -> 17.4529, 2 -> 16.9382, 3 -> 15.7731, (4 -> 11.7589), 5 -> 14.5178.
Index zero is shown just for reference. The sum ignored index 4 (11.7589). Why?
When using range indexing in Python, the second index (the 4 in your case) is not an inclusive index. By specifying [1:4], you're summing the elements from index 1 up to but not including index 4. Specify 5 as the second index if you want to include the element at index 4.
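A short sketch of the half-open rule using the six values quoted in the question. The array named big is a stand-in for the 10000-entry file, which is not available here, so it simply holds the integers 0 to 9999.

```python
import numpy as np

d = np.array([13.024, 17.4529, 16.9382, 15.7731, 11.7589, 14.5178])

sum_excluding_4 = np.sum(d[1:4])   # indices 1, 2, 3
sum_including_4 = np.sum(d[1:5])   # indices 1, 2, 3, 4

# Stand-in for the real data: to include both index 200 and index 555,
# the stop value has to be 556.
big = np.arange(10000)
selected = big[200:556]

print(sum_excluding_4, sum_including_4)
print(selected[0], selected[-1], len(selected))
```

In general, an inclusive range a..b is written d[a:b + 1].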

Slicing arrays in Numpy / Scipy

I have an array like:
a = array([[1,2,3],[3,4,5],[4,5,6]])
What's the most efficient way to slice a 3x2 array out of this that has only the last two columns of "a"?
i.e.
array([[2,3],[4,5],[5,6]]) in this case.
Two-dimensional numpy arrays are usually indexed using a[i,j] (rather than a[i][j]), but you can use the same slicing notation with numpy arrays and matrices as you can with ordinary Python sequences (just put the slices in a single []):
>>> from numpy import array
>>> a = array([[1,2,3],[3,4,5],[4,5,6]])
>>> a[:,1:]
array([[2, 3],
       [4, 5],
       [5, 6]])
Is this what you're looking for?
a[:,1:]
To quote the documentation, the basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step (k ≠ 0).
Now if i is not given, it defaults to 0 if k > 0. Otherwise i defaults to n - 1 for k < 0 (where n is the length of the array).
If j is not given, it defaults to n (the length of the array) for k > 0, and to -n - 1 for k < 0.
That's for a one-dimensional array.
Now a two-dimensional array is a different beast. The slicing syntax for that is a[rowrange, columnrange].
So if you want all the rows, but just the last two columns, like in your case, you do:
a[0:3, 1:3]
Here, "0:3" means all the rows from row 0 up to, but not including, row 3, and "1:3" means all the columns from column 1 up to, but not including, column 3.
Now you may be wondering: you have only 3 columns, so if the numbering started from 1, "1:3" would have to return 3 columns, right? i.e. column 1, column 2, column 3.
That is the tricky part of this syntax. The first column is actually column 0, and the stop index is excluded. So when you say "1:3", you are actually saying give me column 1 and column 2, which are the last two columns you want. (There actually is no column 3.)
Now if you don't know how long your matrix is or if you want all the rows, you can just leave that part empty.
i.e.
a[:, 1:3]
Same goes for columns also. i.e if you wanted say, all the columns but just the first row, you would write
a[0:1,:]
Now, the reason the above answer a[:,1:] works is that when you say "1:" for columns, it means give me everything from column 1 onward (i.e. skipping column 0) through the end of the columns. An empty stop means 'until the end'.
By now you must realize that each side of the comma follows the one-dimensional rules I first mentioned above. For example, if you want to specify your rows using step sizes, you can write
a[::2,1]
Which in your case would return
array([2, 5])
i.e. a[::2,1] reads as: give me every other row, starting with the topmost, and give me only the 2nd column.
This took me some time to figure out. So pasting it here, just in case it helps someone.
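The slices discussed in this thread can be checked with a short sketch like the one below. The variable names are only illustrative, and the last slice (a[::2, 1:]) is an extra variant added here for comparison.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 4, 5],
              [4, 5, 6]])

last_two_cols = a[:, 1:]          # all rows, columns 1 and 2
same_with_bounds = a[0:3, 1:3]    # explicit bounds, same result
first_row = a[0:1, :]             # 2-D result with a single row
every_other_row_col1 = a[::2, 1]  # rows 0 and 2, column 1 only (1-D result)
every_other_row_tail = a[::2, 1:] # rows 0 and 2, columns 1 and 2 (2-D result)

print(last_two_cols)
print(first_row.shape)
print(every_other_row_col1)
print(every_other_row_tail)
```

Note the difference between an integer index and a slice: a[::2, 1] drops the column axis and gives a 1-D array, while a[::2, 1:] keeps it.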
