I have the following code
l = len(time) #time is a 300 element list
ll = len(sample) #sample has 3 sublists each with 300 elements
w, h = ll, l
Matrix = [[0 for x in range(w)] for y in range(h)]
for n in range(0,l):
    for m in range(0,ll):
        x=sample[m]
        Matrix[m][n]= x
When I run the code to fill the matrix, I get an error message saying "list index out of range". I put in a print statement to see where the error happens, and the index goes out of range when m=0 and n=3.
From what I understand, the fourth line of the code initializes a 3x300 matrix, so why does it go out of range at [0][3]?
You need to change Matrix[m][n]= x to Matrix[n][m]= x
The indexing of nested lists happens from the outside in. So for your code, you'll probably want:
Matrix[n][m] = x
If you prefer the other order, you can build the matrix differently (swap w and h in the list comprehensions).
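A minimal sketch of the outside-in rule, using small stand-in sizes (3 series of 5 samples rather than 3 x 300) and made-up data. The names `time`, `sample`, and `matrix` are illustrative, and the sketch assumes the intent was to store the single value `sample[m][n]` in each cell:

```python
# Stand-in data: 3 series with 5 time steps each (the question uses 3 x 300).
time = list(range(5))
sample = [[10 * m + n for n in range(len(time))] for m in range(3)]

h, w = len(time), len(sample)  # h = 5 rows, w = 3 columns

# The OUTER comprehension builds the rows, so this is an h x w (5 x 3) matrix.
matrix = [[0 for _ in range(w)] for _ in range(h)]

for n in range(h):        # n indexes the outer list (rows)
    for m in range(w):    # m indexes the inner lists (columns)
        matrix[n][m] = sample[m][n]

print(len(matrix), len(matrix[0]))  # 5 3
```

The first index always selects from the outer list, so it must stay below the number of rows (`h`), and the second must stay below the number of columns (`w`).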
Note that if you're going to be doing mathematical operations with this matrix, you may want to be using numpy arrays instead of Python lists. They're almost certainly going to be much more efficient at doing math operations than anything you can write yourself in pure Python.
Note that indexing in nested lists in Python happens from outside in, and so you'll have to change the order in which you index into your array, as follows:
Matrix[n][m] = x
For mathematical operations and matrix manipulations, NumPy two-dimensional arrays are almost always a better choice. You can read more about them in the NumPy documentation.
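As a rough illustration of the NumPy route, the same 300 x 3 layout can be obtained by converting the nested list and transposing it. The data below is made up and much smaller than in the question:

```python
import numpy as np

# Made-up stand-in for the question's data: 3 series, 5 samples each.
sample = [[10 * m + n for n in range(5)] for m in range(3)]

arr = np.array(sample)  # shape (3, 5): one row per series
matrix = arr.T          # shape (5, 3): one row per time step

print(arr.shape, matrix.shape)  # (3, 5) (5, 3)
print(matrix[4, 2])             # 24
```

With an array, both indices go inside a single pair of brackets, and the transpose avoids the explicit double loop.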
I have a function which outputs 2 arrays, let's call them X_i and Y_i, which are two N x 1 arrays, where N is the number of points. By using multiprocessing's pool.apply_async, I was able to parallelize this function, which gave me my results in a HUGE list. The results are structured as a list of M values, where each value is a list containing X_i and Y_i. So in summary, I have a huge list of M smaller lists, each containing the two arrays X_i and Y_i.
Now I want to append all the X_i's into one array called X and all the Y_i's into one called Y. What is the most efficient way to do this? I'm looking for some sort of parallel algorithm. Order does NOT matter!
So far I have just a simple for loop that separates this massive data array:
X = np.zeros((N,1))
Y = np.zeros((N,1))
for i in range(len(results)):
    X = np.append(results[i][0].reshape(N,1), X, axis=1)
    Y = np.append(results[i][1].reshape(N,1), Y, axis=1)
I found this algorithm to be rather slow, so I need to speed it up! Thanks!
You should provide a simple scenario of your problem: break it down and give us a simple input and output, as all these variables and text make it a bit confusing. Maybe this can help:
You can unpack the lists, grab the ones you need by index, append one list to your new empty X[] and the other to Y[], and at the end get the arrays out of the lists and merge them into your new N-dimensional array or into a new list.
data = [[[1,2],[3,4]],[[4,5],[6,7]]]  # renamed from "list" to avoid shadowing the built-in
sub_pre = []
flat_list = []
for sublist in data:
    sub_pre.append(sublist)
    for item in sublist:
        flat_list.append(item)
print(data)
print(flat_list)
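If the goal is to separate the first and second element of every pair rather than to flatten everything, `zip(*...)` is one possible shortcut. This is a different technique from the loop in the answer, shown on toy data of the same shape; `results`, `xs`, and `ys` are illustrative names:

```python
# Toy stand-in: each entry is a pair [X_i, Y_i].
results = [[[1, 2], [3, 4]], [[4, 5], [6, 7]]]

# zip(*results) groups all first elements together and all second elements together.
xs, ys = zip(*results)

print(list(xs))  # [[1, 2], [4, 5]]
print(list(ys))  # [[3, 4], [6, 7]]
```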
Thanks to @JonSG for the brilliant insight. This kind of regrouping can be sped up with array manipulation. With most parallel-processing packages, a function that returns multiple arrays will most likely have its outputs collected into one huge list. Here I have a list called results, which contains M smaller lists of two N x 1 arrays.
To unpack the main list and collect all the X_i and Y_i into their own X and Y arrays, respectively, you can do this:
# np.shape(results) is (M, 2, N)
X = np.array(results)[:,0,:]
Y = np.array(results)[:,1,:]
This gave me a 100x speed increase!
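A small sketch of the same slicing idea on made-up data, assuming each X_i and Y_i really has shape (N, 1) as described in the question. In that case `np.array(results)` has shape (M, 2, N, 1), so one extra index is needed; the sizes and the random data are placeholders:

```python
import numpy as np

M, N = 4, 3  # small stand-in sizes
rng = np.random.default_rng(0)

# Hypothetical stand-in for the pool output: M pairs of (N, 1) arrays.
results = [[rng.random((N, 1)), rng.random((N, 1))] for _ in range(M)]

stacked = np.array(results)    # shape (M, 2, N, 1)
X = stacked[:, 0, :, 0].T      # shape (N, M): column i holds X_i
Y = stacked[:, 1, :, 0].T      # shape (N, M): column i holds Y_i

# np.hstack on the individual arrays builds the same layout directly.
X_alt = np.hstack([pair[0] for pair in results])

print(X.shape, Y.shape)  # (3, 4) (3, 4)
```

Either way, the arrays are allocated once instead of being copied on every `np.append` call, which is where the loop version loses most of its time.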
I'll preface this with saying that I'm new to Python, but not new to OOP.
I'm using numpy.where to find the indices in n arrays at which a particular condition is met, specifically if the value in the array is greater than x.
What I want to do is find the indices at which all n arrays meet that condition, so that in each array, at index y, the element is greater than x.
n0[y] > x
n1[y] > x
n2[y] > x
n3[y] > x
For example, if my arrays after using numpy.where were:
a = [0,1,2,3,4,5,6,7,8,9,10]
b = [0,2,4,6,8,10,12,14,16,18,20]
c = [0,2,3,5,7,11,13,17,19,23]
d = [0,1,2,3,5,8,13,21,34,55]
I want to get the output
[0,2]
I found the function numpy.isin, which seems to do what I want for just two arrays. I don't know how to go about expanding this to more than two arrays and am not sure if it's possible.
Here's the start of my code, in which I generate the indices meeting my criteria:
n = np.empty([0])
n = np.append(n,np.where(sensor[i] > x)[0])
I'm a little stuck. I know I could create a new array with the same number of indices as my original arrays and set its values to true or false, but that would not be very efficient, and my original arrays are 25k+ elements long.
To find the intersection of n different arrays, first convert them all to sets. Then it is possible to apply set.intersection(). For the example with a, b, c and d, simply do:
set.intersection(*map(set, [a,b,c,d]))
This will result in a set {0, 2}.
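For comparison, here are two NumPy-only sketches of the same task. The first chains `np.intersect1d` over the index lists from the question. The second skips the index lists and combines the boolean masks directly; it assumes the sensor readings can be stacked into one 2-D array, and `sensor` and `x` hold made-up values:

```python
from functools import reduce
import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
c = [0, 2, 3, 5, 7, 11, 13, 17, 19, 23]
d = [0, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# Pairwise intersection, folded over all the index lists.
common = reduce(np.intersect1d, [a, b, c, d])
print(common)  # [0 2]

# Mask approach: one row per sensor, one column per sample (made-up values).
sensor = np.array([[5, 1, 7, 9],
                   [6, 8, 2, 9],
                   [7, 0, 9, 9]])
x = 4
idx = np.flatnonzero((sensor > x).all(axis=0))
print(idx)  # [0 3]
```

A boolean mask over 25k elements is small, so the mask approach is usually not the inefficiency the question worries about.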
I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
[Image: the mathematical expression]
I was wondering whether there is an efficient way to first find those indices i, j that give a non-zero element after the multiplication of the matrices, so that I avoid the quite slow 'for loops'.
Current working code
import numpy as np
# Starting with 4 random matrices
A = np.random.randint(0,2,size=(50,50))
B = np.random.randint(0,2,size=(50,50))
C = np.random.randint(0,2,size=(50,50))
D = np.random.randint(0,2,size=(50,50))
indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        if A[i,j] != 0:
            for k in range(B.shape[1]):
                if B[j,k] != 0:
                    for l in range(C.shape[1]):
                        if A[i,j]*B[j,k]*C[k,l]*D[l,i] != 0:
                            indices.append((i,j,k,l))
print(indices)
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the for-loops. In order to find all the indices i, j you need to loop through all the elements, which defeats the purpose of this check. Therefore, you should go ahead and use simple elementwise array multiplication and the dot product in NumPy; it should be quite fast, with the for-loops taken care of by NumPy.
However, if you plan on using a Python loop then the answer is YES, you can avoid them by using numpy, using the following pseudo-code (=hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to identify the elements of A that are nonzero using the two loops built into numpy (so they are quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A, dtype=float) # float dtype, assuming F returns real values
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
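For the index search in the question itself, one possible loop-free sketch is to broadcast the four nonzero masks against each other and read the surviving positions with `np.argwhere`. The matrices are kept small here because the intermediate mask has n**4 entries; the size `n` and the random seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # kept small: the broadcast mask has n**4 entries
A, B, C, D = (rng.integers(0, 2, size=(n, n)) for _ in range(4))

# mask[i, j, k, l] is True exactly when A[i,j]*B[j,k]*C[k,l]*D[l,i] != 0.
mask = (
    (A != 0)[:, :, None, None]
    & (B != 0)[None, :, :, None]
    & (C != 0)[None, None, :, :]
    & (D.T != 0)[:, None, None, :]  # D.T[i, l] == D[l, i]
)

# argwhere lists positions in row-major order, matching the nested loops.
indices = [tuple(ix) for ix in np.argwhere(mask).tolist()]
print(len(indices))
```

For the 50 x 50 matrices in the question the mask holds 6.25 million booleans, which is still manageable; for much larger matrices the memory cost of this approach grows quickly.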
I am working on some molecular dynamics using Python, and the arrays tend to get pretty large. It would be helpful to have a quick check to see if certain vectors appear in the arrays.
After searching for a way to do this, I was surprised to see that this question doesn't seem to come up.
In particular, if I have something like
import numpy as np
y = [[1,2,3], [1,3,2]]
x = np.array([[1,2,3],[3,2,1],[2,3,1],[10,5,6]])
and I want to see if the specific vectors from y are present in x (not just the elements), how would I do so?
Using something like
for i in y:
    if i in x:
        print(i)
will simply print every vector in y that matches some row of x in at least one position, not just the vectors that appear in x as whole rows.
Thoughts?
If you want to check if ALL vectors in y are present in the array, you could try:
import numpy as np
y = [[1,2,3], [1,3,2]]
x = np.array([[1,2,3],[3,2,1],[2,3,1],[10,5,6]])
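all((x == row).all(axis=1).any() for row in y)  # "row in x" alone is True on any single matching element
# False, because [1, 3, 2] is not a row of x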
You don't explicitly give your expected output, but I infer that you want to see only [1, 2, 3] as the output from this program.
You get that output if you make x merely another list, rather than a NumPy array.
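A short sketch of that observation: with plain nested lists, the `in` operator compares whole sublists, so only exact row matches are reported.

```python
y = [[1, 2, 3], [1, 3, 2]]
x = [[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]]  # a plain list, not an array

# List membership uses ==, which compares the sublists element by element.
found = [row for row in y if row in x]
print(found)  # [[1, 2, 3]]
```

This is a linear scan per lookup, so it may be slow for very large arrays.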
The best strategy will depend on sizes and numbers. A quick solution is
[np.where(np.all(x==row, axis=-1))[0] for row in y]
# [array([0]), array([], dtype=int64)]
The result list gives for each row in y a possibly empty array of positions in x where the row occurs.
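If only a yes/no answer per row is needed, the comparison can be done for all rows of y at once with broadcasting. This is a sketch on the question's data; note that the intermediate boolean array has shape (len(y), len(x), 3), so memory use grows with both inputs:

```python
import numpy as np

y = np.array([[1, 2, 3], [1, 3, 2]])
x = np.array([[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]])

# Compare every row of y with every row of x, then require all 3 components to match.
matches = (y[:, None, :] == x[None, :, :]).all(axis=-1)  # shape (2, 4)
present = matches.any(axis=1)                            # one flag per row of y

print(present)     # [ True False]
print(y[present])  # [[1 2 3]]
```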
I am assigning values to a numpy array by looking up values in other numpy arrays. These arrays have potentially different indices. Here is an example:
import numpy as np
A=1; B=2; C=3; D=4; E=5
X = np.random.normal(0,1,(A,B,C,E))
Y = np.random.normal(0,1,(A,B,D))
Z = np.random.normal(0,1,(A,C))
Result = np.zeros((A,B,C,D,E))
for a in range(A):
    for b in range(B):
        for c in range(C):
            for d in range(D):
                for e in range(E):
                    Result[a,b,c,d,e] = Z[a,c] + Y[a,b,d] + X[a,b,c,e]
What is the best way to optimize this code? I can remove the E for loop using Result[a,b,c,d,:] = Z[a,c] + Y[a,b,d] + X[a,b,c,:]. But then how to remove the rest of the loops? I was also thinking that I could manipulate X,Y,Z before assignment so it merges easily with the dimensions of Result. There must be more elegant ways. Thanks for tips.
Here's one way:
Result = Z[:,None,:,None,None] + Y[:,:,None,:,None] + X[:,:,:,None,:]
To produce this vectorized version, all I did was replace the various indices into X, Y, and Z with full a,b,c,d,e-style indexing, inserting None where missing indices were found. For example, Y[a,b,d] becomes Y[a,b,None,d,None], which vectorizes into Y[:,:,None,:,None].
In NumPy, indexing with None (an alias for np.newaxis) tells the array to behave as if it had an additional axis of length 1. This doesn't change the size of the array, but it does change how operations are broadcast, which is what we need here. Check out the NumPy broadcasting docs for more info.
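A quick way to convince yourself that the broadcast expression matches the loops is to compute both on the question's shapes and compare. The random data and the seed below are arbitrary:

```python
import numpy as np

A, B, C, D, E = 1, 2, 3, 4, 5
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (A, B, C, E))
Y = rng.normal(0, 1, (A, B, D))
Z = rng.normal(0, 1, (A, C))

# Each operand gets length-1 axes where it has no index of its own.
vectorized = (
    Z[:, None, :, None, None]    # shape (A, 1, C, 1, 1)
    + Y[:, :, None, :, None]     # shape (A, B, 1, D, 1)
    + X[:, :, :, None, :]        # shape (A, B, C, 1, E)
)

# Reference result from explicit loops (the e loop is already sliced away).
looped = np.zeros((A, B, C, D, E))
for a in range(A):
    for b in range(B):
        for c in range(C):
            for d in range(D):
                looped[a, b, c, d, :] = Z[a, c] + Y[a, b, d] + X[a, b, c, :]

print(vectorized.shape)                 # (1, 2, 3, 4, 5)
print(np.allclose(vectorized, looped))  # True
```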