I have a NumPy array
X = np.array([[1,2,3,4],[5,6,7,8]])
Is there a way to slice each subarray in X with different begin and end indices?
Something like this:
# np.eg_slice(X, [[begin, end], [begin, end]])
np.eg_slice(X, [[1,3],[0,2]])
>>> array([[2,3],[5,6]])
I am currently using linspace to generate every index value and storing them all in an array, which is not memory-efficient.
We can do it by creating a mask based on the start and end indices of the slice array.
Inputs:
X = np.array([[1,2,3,4],[5,6,7,8]])
indices = np.array([[1,3],[0,2]])
Split the indices to get the start and end indices for each row,
start,end = np.hsplit(indices,2)
#start: ([[1],[0]]),
#end: ([[3],[2]])
Mask creation,
jj = np.arange(X.shape[1])[None, :]
mask = (jj >= start) & (jj < end)
# [[False,  True,  True, False],
#  [ True,  True, False, False]]
Get the values,
X[mask].reshape(-1, 2)  # reshape based on the slice length (all slices must have the same length)
# [[2, 3],
#  [5, 6]]
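As a sketch of how the steps above fit together, the mask approach can be wrapped in a small function. The name slice_rows is hypothetical (it is not a NumPy function), and the sketch assumes every [begin, end] pair spans the same number of columns, since otherwise the result cannot be reshaped into a rectangular array.

```python
import numpy as np

def slice_rows(X, indices):
    """Return X[i, begin_i:end_i] for every row i (hypothetical helper).

    Assumes all (begin, end) pairs span the same number of columns.
    """
    X = np.asarray(X)
    indices = np.asarray(indices)
    start, end = indices[:, :1], indices[:, 1:]   # column vectors, shape (rows, 1)
    cols = np.arange(X.shape[1])[None, :]         # row vector, shape (1, columns)
    mask = (cols >= start) & (cols < end)         # broadcasts to X.shape
    width = int(end[0, 0] - start[0, 0])          # common slice length
    return X[mask].reshape(X.shape[0], width)

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
result = slice_rows(X, [[1, 3], [0, 2]])
print(result)
```

Note that the mask has the same shape as X, so its memory cost scales with the size of X rather than with the number of selected elements.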
I have an array which I want to use boolean indexing on, with multiple index arrays, each producing a different array. Example:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
Should return something along the lines of:
[[2,3], [1]]
I assume that since the number of cells containing True can vary between masks, I cannot expect the result to reside in a 2D NumPy array, but I'm still hoping for something more elegant than iterating over the masks and appending the result of indexing w by the i-th mask in b to a list.
Am I missing a better option?
Edit: The next step I want to do afterwards is to sum each of the arrays returned by w[b], returning a list of scalars. If that somehow makes the problem easier, I'd love to know as well.
Assuming you want a list of numpy arrays you can simply use a comprehension:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
[w[mask] for mask in b]  # 'mask' rather than 'bool', to avoid shadowing the built-in
# [array([2, 3]), array([1])]
If your goal is just the sum of the masked values, you can use:
np.sum(w*b) # 6
or
np.sum(w*b, axis=1) # array([5, 1])
# or b @ w
…since False times your number will be 0 and therefore won't affect the sum.
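For illustration, here is a minimal sketch that computes the per-mask sums both ways, once through the list comprehension and once through multiplication, so the two can be compared. The variable names per_mask, sums_loop and sums_vectorized are made up for this example.

```python
import numpy as np

w = np.array([1, 2, 3])
b = np.array([[False, True, True], [True, False, False]])

# Ragged result: one array per mask, lengths may differ.
per_mask = [w[mask] for mask in b]
sums_loop = [int(part.sum()) for part in per_mask]

# Vectorized: False counts as 0, so masked-out entries drop out of the sum.
sums_vectorized = (w * b).sum(axis=1)

print(sums_loop)
print(sums_vectorized)
```

The vectorized form avoids building the intermediate ragged list, which is useful when only the sums are needed.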
Try this:
[w[x] for x in b]
Hope this helps.
I have a 2D list of booleans. I want to select a random index from the list where the value is False. For example, given the following list:
[[True, False, False],
[True, True, True],
[False, True, True]]
The valid choices would be: [0, 1], [0, 2], and [2, 0].
I could keep a list of valid indices and then use random.choice to select from it, but it seems unpythonic to keep a variable and update it every time the underlying list changes for only this one purpose.
Bonus points if your answer runs quickly.
We can use a one-liner like:
import numpy as np
from random import choice
choice(np.argwhere(~a))
Here a is a NumPy array of booleans.
This works as follows: ~a negates the elements of the array. np.argwhere then constructs a k×2 array in which every row holds the two indices (one per dimension) of an element of a that is False.
With choice(..) we thus select a random row. We cannot use that row directly to access the element, however, so we convert it with the tuple(..) constructor:
>>> tuple(choice(np.argwhere(~a)))
(2, 0)
You can thus fetch the element then with:
t = tuple(choice(np.argwhere(~a)))
a[t]
But of course, it is not a surprise that:
>>> t = tuple(choice(np.argwhere(~a)))
>>> a[t]
False
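A self-contained sketch of the same idea, starting from the nested list in the question, might look like this. It converts the list to an array first (the ~ operator does not work on nested lists) and, as an assumption of this sketch, picks the row with random.randrange rather than passing the array to random.choice, so that it does not depend on how choice treats non-list sequences.

```python
import random
import numpy as np

grid = [[True, False, False],
        [True, True, True],
        [False, True, True]]
a = np.array(grid)                  # ~ needs a boolean array, not a nested list

false_positions = np.argwhere(~a)   # shape (k, 2): one (row, col) pair per False cell
pick = false_positions[random.randrange(len(false_positions))]
t = tuple(int(v) for v in pick)     # plain Python ints, usable as an index
print(t, a[t])
```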
My non-numpy version:
result = random.choice([
    (i, j)
    for i in range(len(a))
    for j in range(len(a[i]))
    if not a[i][j]])
Like Willem's np version, this generates a list of valid tuples and invokes random.choice() to pick one.
Alternatively, if you hate seeing range(len(...)) as much as I do, here is an enumerate() version:
result = random.choice([
    (i, j)
    for i, row in enumerate(a)
    for j, cell in enumerate(row)
    if not cell])
Assuming you don't want to use numpy.
import random
matrix = [[True, False, False],
          [True, True, True],
          [False, True, True]]
valid_choices = [(i, j) for i, x in enumerate(matrix) for j, y in enumerate(x) if not y]
random.choice(valid_choices)
You can change the condition in the list comprehension (if not y) to suit your needs. This returns the randomly selected coordinate. You could instead change the value part of the comprehension, (i, j) in this case, to y so that it returns the value itself, though that is redundant here because it would always be False.
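To make the point about swapping the condition concrete, here is a small sketch that takes the condition as a parameter. The function name random_index_where and its predicate argument are hypothetical, introduced only for this example.

```python
import random

def random_index_where(matrix, predicate):
    """Pick a random (row, col) whose cell satisfies predicate (hypothetical helper)."""
    candidates = [(i, j)
                  for i, row in enumerate(matrix)
                  for j, cell in enumerate(row)
                  if predicate(cell)]
    return random.choice(candidates)  # raises IndexError if nothing matches

matrix = [[True, False, False],
          [True, True, True],
          [False, True, True]]

picked_false = random_index_where(matrix, lambda cell: not cell)
picked_true = random_index_where(matrix, lambda cell: cell)
print(picked_false, picked_true)
```

The candidate list is rebuilt on every call, so nothing has to be kept in sync when the underlying list changes, at the cost of scanning the whole matrix each time.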
I'm fairly new to NumPy, and also not the most experienced Python programmer,
so please excuse me if this seems trivial to you ;)
I am writing a script to extract specific data out of several molecular-dynamics simulations.
Therefore I read data out of some files and modify and truncate them to a uniform length
and add everything together row-wise, to form a 2D-array for each simulation run.
These arrays are appended to each other, so that I ultimately get a 3D-Array, where each slice along the z-Axis would represent a dataset of a specific simulation run.
The goal is to later on do easy manipulation, e.g. averaging over all simulation runs.
This is just to give you the basic idea of what is done:
import numpy as np
A = np.zeros((2000), dtype=bool)
A = A.reshape((1, 2000))
# Appending different rows to form a '2D-Matrix',
# this is the actual data per simulation run
for i in range(1, 103):
    B = np.zeros((2000), dtype=bool)
    B = B.reshape((1, 2000))
    A = np.concatenate((A, B), axis=0)
print(A.shape)
# >>> (103, 2000)
C = np.expand_dims(A, axis=2)
A = np.expand_dims(A, axis=2)
print(A.shape)
# >>> (103, 2000, 1)
# Appending different '2D-Matrices' to form a 3D array,
# each slice along the z-Axis representing one simulation run
for i in range(1, 50):
    A = np.concatenate((A, C), axis=2)
print(A.shape)
# >>> (103, 2000, 50)
So far so good, now to the actual question:
In one 2D-array, each row represents a different set of interacting atom-pairs.
Later on I want to create subsets of the array depending on different criteria, e.g. 'show me all pairs where the distance x is 10 < x <= 20'.
So when I add the rows together in the first loop above, I want to attach an index, consisting of a set of ints, to each row.
The data of atom pairs is there anyway, at the moment I'm just not including it in the ndarray.
I was thinking of a tuple, so that my 2D-Array would look like
[ [('int' a,'int' b), [False,True,False,...]],
[('int' a,'int' d), [True, False, True...]],
...
]
Or something like that
[ [['int' a], ['int' b], [False,True,False,...]],
[['int' a], ['int' d], [True, False, True...]],
...
]
Can you think of another or easier approach for this kind of filtering?
I'm not quite sure if I'm on the right track here and it doesn't seem to be very straight-forward to have different datatypes in an array like that.
Also note that all indexes are ordered the same way in each 2D array, because I sort them (at the moment based on a string) and add np.zeros() rows for pairs that only occur in other simulation runs.
Maybe a Lookup-table is the right approach?
Thanks a lot!
Update/Answer:
Sorry, I know the question was a little too specific and bloated with
code that wasn't relevant to the question.
I answered the question myself, and for the sake of documentation you can find the answer below. It is specific, but maybe it helps someone handle their indexing in NumPy.
Short, general answer:
I basically just created a look-up table as a Python list and did a very simple NumPy indexing operation, selecting rows with a list of indices (called mask below):
A = [[[1, 2],
      [3, 4],
      [5, 6]],
     [[7, 8],
      [9, 10],
      [11, 12]]]
A = np.asarray(A)
# selects only rows 1 and 2 from each 2D array
mask = [1,2]
B = A[ : , mask, : ]
Which gives for B:
[[[ 3  4]
  [ 5  6]]
 [[ 9 10]
  [11 12]]]
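As a side note on the selection above, a list of integer positions and a boolean mask over the middle axis pick the same rows. The following sketch shows both; the names rows, keep and B_bool are chosen for this example only.

```python
import numpy as np

# Same values as the nested list above: two 2D arrays of shape (3, 2).
A = np.arange(1, 13).reshape(2, 3, 2)

rows = [1, 2]                          # integer positions along axis 1
B = A[:, rows, :]

keep = np.array([False, True, True])   # boolean mask, one entry per row
B_bool = A[:, keep, :]

print(B.shape)
print(np.array_equal(B, B_bool))
```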
Complete answer, specific for my question above:
This is my 2D array:
A = [[True, False, False, False, False],
     [False, True, False, False, False],
     [False, False, True, False, False]]
A = np.asarray(A)
The rows are indexed by tuples (this is due to my specific problem),
e.g.:
lut = [(1,2),(3,4),(3,5)]
Append other 2D array to form a 3D array:
C = np.expand_dims(A, axis=0)
A = np.expand_dims(A, axis=0)
A = np.concatenate((A, C), axis=0)
This is the 3D Array A:
[[[ True False False False False]
  [False  True False False False]
  [False False  True False False]]
 [[ True False False False False]
  [False  True False False False]
  [False False  True False False]]]
Selecting rows, which contain "3" in the Look-up-Table
mask = [i for i, v in enumerate(lut) if 3 in v]
> [1, 2]
Applying mask to the 3D-array:
B = A[ : , mask, : ]
Now B is the 3D array A after selection:
[[[False  True False False False]
  [False False  True False False]]
 [[False  True False False False]
  [False False  True False False]]]
To keep track of the new indices of B:
create a new Look-up-Table for further computation:
newLut = [v for i, v in enumerate(lut) if i in mask]
>[(3, 4), (3, 5)]
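The whole procedure can be condensed into one runnable sketch. The helper name select_pairs is hypothetical, and np.eye and np.stack are used here only as a compact way to rebuild the example data from above.

```python
import numpy as np

lut = [(1, 2), (3, 4), (3, 5)]            # one atom pair per row

run = np.eye(3, 5, dtype=bool)            # same 3x5 pattern as the 2D array above
A = np.stack([run, run], axis=0)          # two simulation runs, shape (2, 3, 5)

def select_pairs(data, lut, atom):
    """Keep only the rows whose pair contains atom (hypothetical helper)."""
    rows = [i for i, pair in enumerate(lut) if atom in pair]
    return data[:, rows, :], [lut[i] for i in rows]

B, new_lut = select_pairs(A, lut, 3)
print(B.shape)
print(new_lut)
```

Returning the reduced look-up table together with the array keeps row positions and pair labels in sync for further filtering.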
I am trying to take a set of arrays and convert them into a matrix that will essentially be an indicator matrix for a set of items.
I currently have an array of N items
A_ = [A,B,C,D,E,...,Y,Z]
In addition, I have S arrays (currently stored in an array), each holding a subset of the items in A_.
B_ = [A,B,C,Z]
C_ = [A,B]
D_ = [D,Y,Z]
The array they are stored in is structured like so:
X = [B_,C_,D_]
I would like to convert the data into an indicator matrix for easier operation. It would ideally look like this (an S x N matrix, with one row per subset and one column per item):
[1,1,1,0,...,0,1]
[1,1,0,0,...,0,0]
[0,0,0,1,...,1,1]
I know how I could use a for loop to iterate through this and create the matrix but I was wondering if there is a more efficient/syntactically simple way of going about this.
A concise way would be to use a list comprehension.
# Create a list containing the alphabet using a list comprehension
A_ = [chr(i) for i in range(65,91)]
# A list containing two sub-lists with some letters
M = [["A","B","C","Z"],["A","B","G"]]
# Nested list comprehension to convert character matrix
# into matrix of indicator vectors
I_M = [[1 if char in sublist else 0 for char in A_] for sublist in M]
The last line is a bit dense if you aren't familiar with comprehensions, but it's not too tricky once you take it apart. The inner part...
[1 if char in sublist else 0 for char in A_]
is a list comprehension in itself, which creates a list containing 1s for all characters (char) in A_ that are also found in sublist, and 0s for characters not found in sublist.
The outer bit...
[ ... for sublist in M]
simply runs the inner list comprehension for each sublist found in M, resulting in a list of all the sublists created by the inner list comprehension stored in I_M.
Edit:
While I tried to keep this example simple, it is worth noting (as DSM and jterrace point out) that testing membership in a plain list is O(N) in the length of the list. Converting each sublist to a hash-based structure such as a set would speed up the checks for large sublists.
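A minimal sketch of that set-based variant, under the same example data, could look like this. The names alphabet, subset_sets and indicator are illustrative only.

```python
import string

alphabet = list(string.ascii_uppercase)           # 'A' .. 'Z'
subsets = [["A", "B", "C", "Z"], ["A", "B", "G"]]

# Convert each sublist once; membership tests on a set are O(1) on average.
subset_sets = [set(sub) for sub in subsets]

indicator = [[1 if char in sub else 0 for char in alphabet]
             for sub in subset_sets]

for row in indicator:
    print(row)
```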
Using numpy:
>>> import numpy as np
>>> A_ = np.array(['A','B','C','D','E','Y','Z'])
>>> B_ = np.array(['A','B','C','Z'])
>>> C_ = np.array(['A','B'])
>>> D_ = np.array(['D','Y','Z'])
>>> X = [B_,C_,D_]
>>> matrix = np.array([np.in1d(A_, x) for x in X])
>>> matrix.shape
(3, 7)
>>> matrix
array([[ True, True, True, False, False, False, True],
[ True, True, False, False, False, False, False],
[False, False, False, True, False, True, True]], dtype=bool)
This is O(NS).
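For reference, the same construction can be sketched with np.isin, which newer NumPy releases recommend in place of np.in1d. The variable names items, subsets and indicator are illustrative, and the final cast to int is only there to get the 0/1 matrix asked for in the question.

```python
import numpy as np

items = np.array(['A', 'B', 'C', 'D', 'E', 'Y', 'Z'])
subsets = [np.array(['A', 'B', 'C', 'Z']),
           np.array(['A', 'B']),
           np.array(['D', 'Y', 'Z'])]

# One boolean row per subset: True where the item occurs in that subset.
indicator = np.array([np.isin(items, sub) for sub in subsets]).astype(int)
print(indicator)
```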