Summing array values over a specific range with numpy

So I am trying to get the sum over a specific range of values in a text file using:
np.sum(d[a:b])
I am using a text file with 10000 entries. I know that indexing always starts at zero. My range is quite large, e.g. indices 200-555 (including both 200 and 555). As a test, I tried summing over a small range:
In [17]: np.sum(d[1:4])
Out[17]: 50.164228
But the above code summed from the 2nd element (labeled index 1 by Python) through the 4th (index 3). The numbers are:
0 -> 13.024, 1 -> 17.4529, 2 -> 16.9382, 3 -> 15.7731, 4 -> 11.7589, 5 -> 14.5178
Index 0 is listed just for reference. The sum ignored the element at index 4 (11.7589). Why?

When using range indexing in Python, the second index (the 4 in your case) is not an inclusive index. By specifying [1:4], you're summing the elements from index 1 up to but not including index 4. Specify 5 as the second index if you want to include the element at index 4.
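As a small illustration of the half-open slice, here is a sketch using the six sample values from the question. The helper name `inclusive_sum` is made up for this example and is not part of NumPy.

```python
import numpy as np

# Sample values copied from the question (indices 0 through 5).
d = np.array([13.024, 17.4529, 16.9382, 15.7731, 11.7589, 14.5178])


def inclusive_sum(values, first, last):
    """Sum values[first] through values[last], including both ends."""
    # The slice stop is exclusive, so add 1 to include `last`.
    return np.sum(values[first:last + 1])


exclusive = np.sum(d[1:4])          # covers indices 1, 2, 3
inclusive = inclusive_sum(d, 1, 4)  # covers indices 1, 2, 3, 4

print(exclusive, inclusive)
```

By the same rule, summing indices 200 through 555 inclusive would be written as np.sum(d[200:556]).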

Related

Kind of Skewnorm in Python - datapoints

I would like to generate a list of random values with its highest value at a given index and decreasing values towards the start and end of the list. E.g. if I have a one-dimensional, 10-element list and I'm at index 3, then this would be my highest value, and the other values would decrease towards index 0 and index 9. So the two primary parameters would be the list length and the index of the top value. It would also be nice to control the range of the random values and the mean of the list.
Does anyone know a function (or combination of functions) from numpy / scipy etc. that would satisfy this case? I was looking at numpy's different kinds of norm functions, but they are not what I'm looking for.

Find Sign when Sign Changes in Pandas Column while Ignoring Zeros using Vectorization

I'm trying to find a vectorized way of determining the first instance where my column of data has a sign change. I looked at this question and it gets close to what I want, except it evaluates my first zeros as true. I'm open to different solutions including changing how the data is set up in the first place. I'll detail what I'm doing below.
I have two columns, let's call them positive and negative, that look at a third column. The third column has values in the range [-5, 5]. When this column is in [3, 5], my positive column gets a +1 on that same row; all other rows are 0 in that column. Likewise, when the third column is in [-5, -3], my negative column gets a -1 in that row; all other rows are 0.
I combine these columns into one column. You can conceptualize this as 'turn machine on, keep it on/off, turn it off, keep it on/off, turn machine on ... etc.' The problem I'm having is that my combined column looks something like below:
pos = [1,1,1,0, 0, 0,0,0,0,0,1, 0,1]
neg = [0,0,0,0,-1,-1,0,0,0,0,0,-1,0]
com = [1,1,1,0,-1,-1,0,0,0,0,1,-1,1]
# Below is what I want to have as the final column.
cor = [1,0,0,0,-1, 0,0,0,0,0,1,-1,1]
The problem with what I've linked is that it gets close, but it evaluates the first 0 as a sign change as well. Zeros should be ignored. I tried a few things, but they seem to create new errors. For the sake of completeness, this is what the linked code outputs:
lnk = [True,False,False,True,True,False,True,False,False,False,True,True,True]
As you can see, it handles the 1 and -1 values that don't flip correctly, but it flags the zeros as flips. I'm not sure if I should change how the combined column is made, change the logic that creates the component columns, or both. The big thing is that I need to vectorize this code for performance reasons.
Any help would be greatly appreciated!

Let's suppose your dataframe is named df with columns pos and neg. Then you can try something like the following:
df.loc[:, "switch_pos"] = (np.diff(df.pos, prepend=0) > 0) * 1
df.loc[:, "switch_neg"] = (np.diff(df.neg, prepend=0) < 0) * (-1)
You can then combine your two switch columns, for example by adding them.
Explanation
np.diff gives you the difference row by row. For the pos column it yields 1 where the value goes from 0 to 1 and -1 where it goes from 1 to 0. Considering your desired output, you want to keep only the 0-to-1 transitions, which is why you keep only the entries greater than zero. For the neg column, the transition from 0 to -1 produces a difference of -1, so there you keep only the entries less than zero.
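The np.diff idea can be checked against the sample lists from the question. The sketch below uses plain NumPy arrays instead of DataFrame columns to stay self-contained, and the variable names are chosen for this example. Note that the neg column is compared with < 0, because a step from 0 to -1 has a negative difference.

```python
import numpy as np

# Sample columns copied from the question.
pos = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1])
neg = np.array([0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, -1, 0])

# prepend=0 treats the row before the first one as 0, so a leading
# 1 (or -1) also counts as a switch.
switch_pos = (np.diff(pos, prepend=0) > 0) * 1     # 0 -> 1 transitions
switch_neg = (np.diff(neg, prepend=0) < 0) * (-1)  # 0 -> -1 transitions

# The two switch columns are never non-zero on the same row here,
# so adding them merges them into one column.
cor = switch_pos + switch_neg
print(cor.tolist())
```

For these inputs the result matches the cor list given in the question.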

Finding the index of the maximum number in a python matrix which includes strings

I understand that
np.argmax(np.max(x, axis=1))
returns the index of the row that contains the maximum value and
np.argmax(np.max(x, axis=0))
returns the index of the column that contains the maximum value.
But what if the matrix contained strings? How can I change the code so that it still finds the index of the largest value?
Also (if there's no way to do what I previously asked for), can I change the code so that the operation is only carried out on a sub-section of the matrix, for instance, on the bottom right '2x2' sub-matrix in this example:
array = [['D','F','J'],
         ['K',3,4],
         ['B',3,1]]
where the sub-matrix is
[[3,4],
 [3,1]]

Can you try first converting the column to a string dtype? If you take the min/max of a string column, it should use the string values for the minimum/maximum.

Although not efficient, this could be one way to find the index of the maximum number in the original matrix by using slices:
newmax = 0
newmaxrow = 0
newmaxcolumn = 0
# Bottom-right sub-matrix: skip row 0 and column 0, which hold the strings.
sub = [row[1:] for row in array[1:]]
for r, row in enumerate(sub):
    for c, num in enumerate(row):
        if num > newmax:
            newmax = num
            # +1 maps sub-matrix indices back to the original matrix.
            newmaxrow = r + 1
            newmaxcolumn = c + 1
Note: this method would not work if the largest number lies within row 0 or column 0.
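The same sub-matrix search can also be sketched with NumPy slicing. This is one possible approach rather than the answerer's method, and it assumes the bottom-right block contains only numbers; the variable names are chosen for this example.

```python
import numpy as np

array = [['D', 'F', 'J'],
         ['K', 3, 4],
         ['B', 3, 1]]

# dtype=object keeps strings and numbers as they are instead of
# converting every entry to a string.
arr = np.array(array, dtype=object)

# Bottom-right block: drop row 0 and column 0, then convert to floats.
sub = arr[1:, 1:].astype(float)

# argmax works on the flattened block; unravel_index turns the flat
# position back into (row, column) within the block.
sub_row, sub_col = np.unravel_index(np.argmax(sub), sub.shape)

# Shift by 1 to get indices in the original matrix.
max_row, max_col = int(sub_row) + 1, int(sub_col) + 1
print(max_row, max_col, arr[max_row, max_col])
```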

Finding index by iterating over each row of matrix

I have a numpy array 'A' of size 5000x10. I also have another number 'Num'. I want to apply the following to each row of A:
import numpy as np
np.max(np.where(Num > A[0,:]))
Is there a more Pythonic way than writing a for loop for the above?

You could use argmax -
A.shape[1] - 1 - (Num > A)[:,::-1].argmax(1)
Alternatively with cumsum and argmax -
(Num > A).cumsum(1).argmax(1)
Explanation: With np.max(np.where(..)), we are basically looking for the last occurrence of a match along each row of the comparison.
For the same, we can use argmax. But argmax on a boolean array gives us the first occurrence and not the last one. So, one trick is to perform the comparison, flip the columns with [:,::-1] and then use argmax. The resulting column indices are then subtracted from the number of columns minus one to trace them back to the original order.
The second approach is very similar to a related post, so quoting from it:
One of the uses of argmax is to get the ID of the first occurrence of the max element along an axis in an array. So, we get the cumsum along the rows and get the first max ID, which represents the last non-zero element. This is because cumsum on the leftover elements won't increase the sum value after that last non-zero element.
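To check that both vectorized expressions agree with the per-row loop, here is a small sketch on a made-up 3x4 array. The values of A and Num are illustrative only, and every row is chosen to contain at least one match.

```python
import numpy as np

# Illustrative data (not from the question).
A = np.array([[1, 5, 2, 7],
              [0, 3, 9, 4],
              [8, 1, 1, 1]])
Num = 4

# Reference: the original per-row expression in a loop.
looped = np.array([np.max(np.where(Num > A[i, :])) for i in range(A.shape[0])])

# Approach 1: flip the columns, take the first match, map the index back.
flipped = A.shape[1] - 1 - (Num > A)[:, ::-1].argmax(1)

# Approach 2: the running count of matches first reaches its maximum
# at the last match in each row.
cumulative = (Num > A).cumsum(1).argmax(1)

print(looped, flipped, cumulative)
```

One difference to keep in mind: if a row has no element smaller than Num, the loop version raises an error on the empty result, while the first approach returns the last column index and the second returns 0, so such rows need separate handling.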
