Numpy list comprehension iterating over 2D array - python

I have a bit of code that loads a long (100k-1M) set of lines; each line has an index in the first column followed by 18 values, for a total of 19 floats per line. All of this is put into a numpy array.
I need to do some simple processing on the matrix to keep the index column and get out 1s and 0s depending on conditions of whether values are positive or negative, but the criterion varies as the columns are sequential pairs of values with different reference values.
The code below goes through the columns 2-19 first by evens then odds to check the values, and then creates a temporary list to put into the array I want to have at the end.
I know there's a simpler way to do this, with list comprehension and possibly lambda, but I'm not proficient enough with this to figure it out. So I'm hoping someone can help me reduce the length of this code into something more compact. More efficient would be great too, but I know that the compact methods don't always increase efficiency. It will however help me better understand list comprehension, with and without numpy.
Sample values for reference:
0.000 72.250 -158.622 86.575 -151.153 85.807 -149.803 84.285 -143.701 77.723 -160.471 96.587 -144.020 75.827 -157.071 87.629 -148.856 100.814 -140.488
10.000 56.224 -174.351 108.309 -154.148 68.564 -155.721 83.634 -132.836 75.030 -177.971 100.623 -146.616 61.856 -150.885 92.147 -150.124 91.841 -153.112
20.000 53.357 -153.537 58.190 -160.235 77.575 176.257 93.771 -150.549 77.789 -161.534 103.589 -146.363 73.623 -159.441 99.315 -129.663 92.842 -138.736
And here is the code snippet:
import numpy

datain = numpy.loadtxt('testfile.txt')  # load data
dataout = numpy.zeros(datain.shape)  # initialize empty processing array
dataout[:, 0] = datain[:, 0]  # assign time values from input data to processing array
dataarray = numpy.zeros(len(datain[0]))
phit = numpy.zeros((len(dataarray) - 1) // 2)  # integer division: numpy.zeros needs an int size
psit = numpy.zeros((len(dataarray) - 1) // 2)
for i in range(len(datain)):
    dataarray = numpy.copy(datain[i])
    phit[:] = dataarray[1::2]
    psit[:] = dataarray[2::2]
    temp = []
    for j in range(len(phit)):
        if phit[j] < 0:
            temp.append(1)
        else:
            temp.append(0)
        if psit[j] > 0:
            temp.append(1)
        else:
            temp.append(0)
    dataout[i][1:] = temp
Thanks in advance, I know there's a fair number of questions on these topics here; unfortunately I couldn't find one that helped me get to a solution.

As @abarnert mentioned, the solution here is not to write better loops, but (since you're using Numpy) to not loop in Python at all by understanding how to use Numpy in more advanced ways.
What you have is a matrix like
[ [idx, v0a, v0b, v1a, v1b, ... ], ... ]
And you want a matrix that's basically
[ [idx, 1 if v0a < 0 else 0, 1 if v0b > 0 else 0, ... ], ... ]
We're going to do this in two steps: first, we'll transform the matrix slightly so that the comparisons are all the same; second, we'll apply the comparison in-place.
The only difference between how we handle "even" and "odd" columns is that one is being checked for <0, the other >0. If we modify the second group of columns by multiplying them by -1, then these comparisons both become simply <0:
datain[:, 2::2] *= -1
Now we just want to know, for every value (besides the first column), is that value <0. This is super easy:
datain[:, 1:] < 0
This returns a matrix of boolean values, where each value represents whether or not the corresponding cell in datain[:, 1:] was less than 0. You want these as integers, 1 for True and 0 for False; it turns out, when we assign these boolean values back into our original array (which contains floats), numpy will cast the bools into floats automatically; True will get cast to 1.0, and False will get cast to 0.0.
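A small sketch of that casting behaviour on made-up values (the arrays below are illustrative, not from the question): the comparison yields a boolean array, assigning it into a float array stores 1.0/0.0, and `.astype(int)` gives integer 0/1 if that is preferred.

```python
import numpy as np

# Illustrative values only
vals = np.array([[-1.5, 2.0, 0.0],
                 [3.0, -0.1, -7.0]])

mask = vals < 0              # boolean array with the same shape as vals
out = np.zeros(vals.shape)   # float64 target array
out[:, :] = mask             # True is stored as 1.0, False as 0.0

as_int = mask.astype(int)    # explicit integer 0/1, if you prefer ints
print(mask)
print(out)
print(as_int)
```

Note that 0.0 is not less than 0, so it maps to 0.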
If you don't want to throw away your original data, simply copy it off first. Here's the complete code:
import numpy as np

# If you want to preserve your old data, create a copy for us to modify
dataout = np.array(datain)
# Flip the sign of the even columns, then assign the 0/1 values into the copy
dataout[:, 2::2] *= -1
dataout[:, 1:] = dataout[:, 1:] < 0
For the sample input you provided, the copy looks like this after the sign flip:
array([[ 0. , 72.25 , 158.622, 86.575, 151.153, 85.807,
149.803, 84.285, 143.701, 77.723, 160.471, 96.587,
144.02 , 75.827, 157.071, 87.629, 148.856, 100.814,
140.488],
[ 10. , 56.224, 174.351, 108.309, 154.148, 68.564,
155.721, 83.634, 132.836, 75.03 , 177.971, 100.623,
146.616, 61.856, 150.885, 92.147, 150.124, 91.841,
153.112],
[ 20. , 53.357, 153.537, 58.19 , 160.235, 77.575,
-176.257, 93.771, 150.549, 77.789, 161.534, 103.589,
146.363, 73.623, 159.441, 99.315, 129.663, 92.842,
138.736]])
This code ends up with the following final result:
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.],
[10., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.],
[20., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.]])
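The same two steps can be run end to end on the three sample rows from the question. This sketch reads them with `np.loadtxt` from an `io.StringIO` buffer (standing in for the real file, which is an assumption for the example) and does the comparison on the copy, so `datain` is left untouched.

```python
import io

import numpy as np

# The three sample rows from the question, standing in for testfile.txt
SAMPLE = """\
0.000 72.250 -158.622 86.575 -151.153 85.807 -149.803 84.285 -143.701 77.723 -160.471 96.587 -144.020 75.827 -157.071 87.629 -148.856 100.814 -140.488
10.000 56.224 -174.351 108.309 -154.148 68.564 -155.721 83.634 -132.836 75.030 -177.971 100.623 -146.616 61.856 -150.885 92.147 -150.124 91.841 -153.112
20.000 53.357 -153.537 58.190 -160.235 77.575 176.257 93.771 -150.549 77.789 -161.534 103.589 -146.363 73.623 -159.441 99.315 -129.663 92.842 -138.736
"""

datain = np.loadtxt(io.StringIO(SAMPLE))  # shape (3, 19)

dataout = np.array(datain)              # copy, so datain keeps the raw values
dataout[:, 2::2] *= -1                  # "> 0" columns become "< 0" columns
dataout[:, 1:] = dataout[:, 1:] < 0     # compare the (sign-flipped) copy

print(dataout)
```

Only the 176.257 entry in the third row becomes 1; the index column is kept as-is.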

Thanks to abarnert for pointing me in the right direction with this; the solution is pretty simple.
import numpy

datain = numpy.loadtxt('testfile.txt')  # load data
dataout = numpy.empty(datain.shape, dtype=int)  # initialize empty processing array
dataout[:, 0] = datain[:, 0]  # assign time values (truncated to int because of dtype=int)
dataout[:, 1::2] = datain[:, 1::2] < 0
dataout[:, 2::2] = datain[:, 2::2] > 0
That's it! Much shorter, much more readable, and gets me the values I want.
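One way to gain confidence in the slicing version is to compare it against the original loop on random data. In this sketch, `loop_version` is a condensed copy of the question's loop and `vectorized_version` follows the self-answer, except that it uses a float output array instead of `dtype=int` so non-integer time values in column 0 are not truncated (a deliberate deviation, labelled as such). Both function names are hypothetical.

```python
import numpy as np


def loop_version(datain):
    """Condensed copy of the question's row-by-row loop, for comparison."""
    dataout = np.zeros(datain.shape)
    dataout[:, 0] = datain[:, 0]
    for i in range(len(datain)):
        phit = datain[i, 1::2]
        psit = datain[i, 2::2]
        temp = []
        for j in range(len(phit)):
            temp.append(1 if phit[j] < 0 else 0)
            temp.append(1 if psit[j] > 0 else 0)
        dataout[i, 1:] = temp
    return dataout


def vectorized_version(datain):
    """Slicing approach from the self-answer, with a float output array."""
    dataout = np.empty(datain.shape)
    dataout[:, 0] = datain[:, 0]
    dataout[:, 1::2] = datain[:, 1::2] < 0   # odd columns: 1 if negative
    dataout[:, 2::2] = datain[:, 2::2] > 0   # even columns: 1 if positive
    return dataout


rng = np.random.default_rng(0)
data = rng.uniform(-180, 180, size=(1000, 19))  # random angles, 19 columns like the question
data[:, 0] = np.arange(1000) * 10.0             # index/time column
print(np.array_equal(loop_version(data), vectorized_version(data)))
```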

Related

How to convert an upper triangular matrix to reduced row echelon form without using a for loop in python?

I am trying to solve a system of linear equations from an upper triangular augmented matrix, but I'm having trouble figuring out how to code a for loop to compute the reduced row echelon form of the matrix (i.e. all ones on the diagonal, values in the last column, and zeroes in all other entries).
This is the matrix (this is my first time asking a question on here, so pardon the formatting I'm just copying and pasting from my code):
aug = np.array([[1., 0.5, 2., 0.5, -2.],
[0., 1., -2.8, -1., 3.6],
[0., 0., 1., 0., 2.],
[0., 0., 0., 1., 1.]])
Here is what I've tried. I assume I have to run through the rows of the matrix backwards and apply the operation to each, but what I have below is not producing the correct answer. I tried to make the range run from the first row to the third row, with the built-in reversed() function to run through that range backwards.
for i in reversed(range(0,aug.shape[0]-1)):
    aug[i] = aug[i] - aug[i+1]*aug[i,i+1]
return aug
I know the answer should return the last column as:
[ 2 -1 -2 1]
I want to do this using a for loop, not using numpy. If anyone could help me figure out my problem, it would be much appreciated.
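A minimal sketch of the backward pass with plain Python loops (NumPy is only used for storage), assuming a square system whose diagonal entries are all non-zero, as in the matrix above. Each pivot row is scaled so its pivot is 1, and then a multiple of it is subtracted from every row above it. The loop in the question only eliminates the entry immediately to the right of each diagonal element, so entries further right (for example aug[0, 2] and aug[1, 3]) survive. The function name `upper_to_rref` is hypothetical.

```python
import numpy as np


def upper_to_rref(aug):
    """Reduce an upper-triangular augmented matrix to reduced row echelon form.

    Assumes a square system with non-zero diagonal entries.
    Works on a float copy, so the caller's array is not modified.
    """
    aug = np.array(aug, dtype=float)
    n = aug.shape[0]
    for p in range(n - 1, -1, -1):           # pivots from the last row up to the first
        aug[p] = aug[p] / aug[p, p]          # make the pivot equal to 1
        for r in range(p):                   # every row above the pivot
            aug[r] = aug[r] - aug[r, p] * aug[p]  # zero out column p in row r
    return aug


aug = np.array([[1., 0.5, 2., 0.5, -2.],
                [0., 1., -2.8, -1., 3.6],
                [0., 0., 1., 0., 2.],
                [0., 0., 0., 1., 1.]])
print(upper_to_rref(aug)[:, -1])
```

For the matrix exactly as pasted, this gives approximately [-11.6, 10.2, 2, 1]. That differs from the expected column quoted in the question (the third row alone forces x3 = 2), so it is worth double-checking the pasted matrix against the expected values.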

How to implement a certain operation on some specific bins of histogram?

In the following code, hist holds the counts of a histogram. I want to apply a certain operation only to the bins whose counts are greater than zero and greater than 1. But when I save the files and print E_bin, it also prints empty arrays, because the loop also visits bins where hist is zero. How can I overcome this problem and save only those files where the hist values are greater than 0 and 1?
hist: [3., 0., 0., 0., 0., 0., 1., 2., 0., 3.]
for j in range(len(hist)):
    val = hist[j]
    E_bin = []
    for k in range(len(w)):
        if j < len(hist)-1 and val > 0 and bin_edges[j] <= w[k] < bin_edges[j+1]:
            E_bin.append(w[k])
        elif j == len(hist)-1 and val > 0 and bin_edges[j] <= w[k] <= bin_edges[j+1]:
            E_bin.append(w[k])
    E_bin = np.array(E_bin)
    print("E_bin: ", E_bin)
    np.save("./InputData/Samples/Sample_%s_bin_%s" % (i, j), E_bin)
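Use numpy.nonzero to get the indices of the non-zero values and apply your logic only to those bins:
for j in np.nonzero(hist)[0]:  # np.nonzero returns a tuple of index arrays; [0] holds the bin indices
    ...  # rest of the code
For bins with more than one count, use np.nonzero(hist > 1)[0] instead.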
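A sketch of the whole task built on that idea, assuming (hypothetically) that hist and bin_edges come from np.histogram(w, bins=...). The helper name `values_in_nonempty_bins`, its `min_count` parameter and the sample data are illustrative, not from the question. np.flatnonzero(hist >= min_count) yields only the bins that pass the count test, so empty bins are never visited and no empty arrays are produced.

```python
import numpy as np


def values_in_nonempty_bins(w, bins=10, min_count=1):
    """Map bin index -> values of w in that bin, for bins with count >= min_count.

    Uses np.histogram's convention: every bin is half-open [left, right)
    except the last one, which also includes its right edge.
    """
    w = np.asarray(w, dtype=float)
    hist, bin_edges = np.histogram(w, bins=bins)
    last = len(hist) - 1
    result = {}
    for j in np.flatnonzero(hist >= min_count):  # skip bins that fail the count test
        lo, hi = bin_edges[j], bin_edges[j + 1]
        if j < last:
            in_bin = (w >= lo) & (w < hi)
        else:
            in_bin = (w >= lo) & (w <= hi)  # last bin is closed on the right
        result[int(j)] = w[in_bin]
    return result


w = np.array([0.1, 0.2, 0.3, 5.0, 9.9, 10.0])  # hypothetical sample data
for j, E_bin in values_in_nonempty_bins(w, bins=10).items():
    print("bin", j, "E_bin:", E_bin)
    # np.save(...) would go here, once per non-empty bin
```

Passing min_count=2 keeps only bins with more than one count.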

How can I change multiple values at once in pandas dataframe, using arrays as indices that vary in length?

I want to change a number of values in my pandas dataframe, where the indices that are indicating the columns may vary in size.
I need something that is faster than a for-loop, because it will be done on a lot of rows, and this turned out to be too slow.
As a simple example, consider this
df = pd.DataFrame(np.zeros((5,5)))
Now, I want to change some of the values in this dataframe to 1. For example, if I want to change the values in the second and fifth rows for the first two columns, but change all the values in the fourth row, I want something like this to work:
col_indices = np.array([np.arange(2), np.arange(5), np.arange(2)], dtype=object)  # ragged, so dtype=object
row_indices = np.array([1, 3, 4])
df.loc[row_indices, col_indices] = 1
However, this does not work (I suspect it fails because the shape of the data being selected does not conform to a dataframe).
Is there any more flexible way of indexing without having to loop over rows etc.?
A solution that works only for range-like arrays (as above) would also work for my current problem, but a general answer would also be nice.
Thanks for any help!
IIUC, here's one approach. Define the column indices as the number of leading columns in which you want to insert 1s instead, along with the rows where you want to insert them:
col_indices = np.array([2,5,2])
row_indices = np.array([1,3,4])
arr = df.values
And use advanced indexing to set the cells of interest to 1:
arr[row_indices] = np.arange(arr.shape[1]) < col_indices[:, None]
array([[0., 0., 0., 0., 0.],
[1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1.],
[1., 1., 0., 0., 0.]])
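For the general case asked about (an arbitrary, variable-length list of columns per row), one option is to build a boolean mask with NumPy and pass it to DataFrame.mask, which replaces values wherever the mask is True and returns a new frame. This is a sketch: `n_cols` and `col_lists` are illustrative names, and the mask is positional, which matches labels here because df uses the default RangeIndex.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((5, 5)))
row_indices = np.array([1, 3, 4])

# Range-like case: set the first n_cols[k] columns of row row_indices[k].
n_cols = np.array([2, 5, 2])
mask = np.zeros(df.shape, dtype=bool)
mask[row_indices] = np.arange(df.shape[1]) < n_cols[:, None]
df_range = df.mask(mask, 1.0)

# General case: an arbitrary list of column positions for each row.
col_lists = [[0, 1], [0, 1, 2, 3, 4], [0, 1]]
rows = np.repeat(row_indices, [len(c) for c in col_lists])  # one row index per column index
cols = np.concatenate(col_lists)
mask_general = np.zeros(df.shape, dtype=bool)
mask_general[rows, cols] = True          # paired fancy indexing: (rows[k], cols[k])
df_general = df.mask(mask_general, 1.0)

print(df_general)
```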

How to multiply diagonal elements by each other using numpy?

For the purpose of this exercise, let's consider a matrix whose element m_{i, j} is given by the rule m_{i, j} = i*j if i == j, and 0 otherwise.
Is there an easy "numpy" way of calculating such a matrix without having to resort to if statements checking for the indices?
You can use the numpy function diag to construct a diagonal matrix if you give it the intended diagonal as a 1D array as input.
So you just need to create that diagonal, like [i**2 for i in range(N)] with N the dimension of the matrix.
You could use the identity matrix given by numpy.identity(n) and then multiply it by an n-dimensional vector.
Assuming you have a square matrix, you can do this:
import numpy as np
ary = np.zeros((4, 4))
_ = [ary.__setitem__((i, i), i**2) for i in range(ary.shape[0])]
print(repr(ary))
# array([[0., 0., 0., 0.],
# [0., 1., 0., 0.],
# [0., 0., 4., 0.],
# [0., 0., 0., 9.]])
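The np.diag and np.identity suggestions above can be sketched as follows, together with a direct translation of the rule that uses np.indices and np.where to evaluate m_{i, j} = i*j only where i == j, with no Python if statements. N = 4 is an arbitrary example size.

```python
import numpy as np

N = 4  # example size
idx = np.arange(N)

# np.diag builds a 2-D matrix from a 1-D array placed on the diagonal.
m_diag = np.diag(idx ** 2)

# Broadcasting: identity(N) * v scales column j by v[j], leaving v on the diagonal.
m_ident = np.identity(N) * idx ** 2

# Direct translation of the rule: i*j where i == j, 0 elsewhere.
i, j = np.indices((N, N))
m_rule = np.where(i == j, i * j, 0)

print(m_rule)
```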

How can I translate a MATLAB cell in Python 3?

Just to give you some context:
I have to translate some MATLAB code into Python 3, but I've run into a little problem here.
Matlab:
for i = 1:num_nodes
    for j = 1:num_nodes
        K{i,j} = zeros(3,3);
    end
end
Which I translated into:
k_topology = [[]]
for i in range(x):
    for i in range(x):
        k_topology[[i][j]].extend(np.zeros(3,3))
Also, further in the Matlab code there's a third loop:
for k = 1:3
    K{i,j}(k,k) = -1;
end
Which also kind of... Upsets me?
The fact is, I don't really see how I can translate this kind of variable into Python. Also, I guess my Python code is kind of "broken" (and I'm not really asking any of you to improve it), so I'm just asking: what is the best way to translate Matlab's cell into Python?
I finally found something apparently simple to translate this, using list comprehension - according to kazemakase's answer. The actual Python code is now looking like this:
k_topology = [[np.zeros((3,3)) for j in range(self.get_nb_nodes_from_network())]\
for i in range(self.get_nb_nodes_from_network())]
And it looks something like this in the output:
[[array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])], ..., [array(...)]]
(There's really too many values to paste it here, but I think you got it.)
The first question you need to ask is "what is a Matlab cell and what could be a suitable corresponding Python type?"
If I remember correctly from my bad old Matlab days, a cell is sort of a container that holds content of mixed types. It is something like a dynamically typed array or matrix. It is multidimensionally indexed.
Python is dynamically typed, so basically any Python container can fulfill this function. Lists in Python are indexed, so nested lists could work, but they are somewhat weird to set up and access:
K = [[None] * num_nodes for _ in range(num_nodes)]
K[i][j] # need two indices to access elements of a nested list.
For this particular scenario, a dictionary mirrors the Matlab syntax better. Although a dictionary takes only one key, we can exploit the fact that tuples can be written without parentheses and that dictionaries accept tuples as keys:
import numpy as np

K = {}
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = np.zeros((3, 3))
        for k in range(3):  # Python indices are zero-based: 0..2 instead of 1:3
            K[i, j][k, k] = -1
While the dictionary is syntactically more concise, element access is potentially less performant than with nested lists; nested lists, on the other hand, look less like the Matlab code. The choice depends on whether performance or similarity to the original code matters more. If performance is an issue, there are many more things to consider anyway. In summary: there is no one best way to do it.
Since the OP explicitly asked not to improve the code, I explicitly ask him/her to ignore this part of the answer.
A better way to build these matrices (zeros everywhere except -1 on the diagonal) is to use np.eye instead of looping over the diagonal elements.
K = {}
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = -np.eye(3)
Also, nested lists can be constructed without (much) prior initialization, if that is the preferred approach:
K = []
for i in range(num_nodes):
    K.append([])
    for j in range(num_nodes):
        K[-1].append(-np.eye(3))
Now, for the peace of my soul, let me provide feedback on the OP's code:
k_topology = [[]]
for i in range(x):
    for i in range(x):
        k_topology[[i][j]].extend(np.zeros(3,3))
This has nothing to do with the original Matlab code (different variable names)
Both loops use i. j is never defined.
[[i][j]] builds a list with one element i and tries to take the jth element. If j is ever something other than 0 this will cause an error.
list.extend appends all elements of the argument individually to the list, in this case the individual rows. list.append would be correct here, since the whole 3x3 matrix should be appended as one element of K.
np.zeros(3, 3) should be np.zeros((3, 3)) (assuming np is an alias for numpy), because the function takes the shape as its first argument, not as multiple arguments.
Using the Octave/scipy save/loadmat that I demonstrated in the linked post:
In an Octave session
>> num_nodes=3
num_nodes = 3
>> num_nodes=3;
>> K=cell(num_nodes, num_nodes);
>> for i = 1:num_nodes
for j = 1:num_nodes
K{i,j} = zeros(2,2);
end
end
>> K
K =
{
[1,1] =
0 0
0 0
[2,1] =
0 0
0 0
etc
Access one cell:
>> K{1,2}
ans =
0 0
0 0
Access one element of one cell:
>> K{1,2}(1,1)
ans = 0
>> save -7 kfile.mat K
In Python
In [31]: from scipy import io
In [32]: data = io.loadmat('kfile.mat')
In [34]: data
Out[34]:
{'K': array([[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])],
[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])],
[array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.]])]], dtype=object),
'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file, written by Octave 4.0.0, 2017-02-15 19:05:44 UTC',
'__version__': '1.0'}
In [35]: data['K'].shape
Out[35]: (3, 3)
In [36]: data['K'][0,0].shape
Out[36]: (2, 2)
In [37]: data['K'][0,0][0,0]
Out[37]: 0.0
loadmat treats a cell as a 2d object-dtype array, while regular matrices become 2d numeric arrays. Object arrays are, in many ways, like nested Python lists.
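Since loadmat maps a cell to an object-dtype array, the same structure can also be built directly in NumPy, as a sketch. num_nodes = 3 is an example value, and each slot holds a 3x3 matrix with -1 on the diagonal, matching what the Matlab loops in the question produce. Indexing is two-level like K{i,j}(k,k), but zero-based.

```python
import numpy as np

num_nodes = 3  # example size

# An object-dtype array: each slot can hold any Python object, like a Matlab cell.
K = np.empty((num_nodes, num_nodes), dtype=object)
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = -np.eye(3)  # zeros(3,3) with -1 on the diagonal

# Two-level indexing, like K{1,2}(3,3) in Matlab but zero-based:
print(K[0, 1][2, 2])
```

Each assignment stores a separate array object, so modifying K[0, 1] does not affect K[1, 0].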
