I'm new to Python and there is something I'm not sure how to do. I have the following matrices:
A=[[1,1,1],[1,1,1],[1,1,1]]
B=[[False,True,False],[True,False,True],[False,True,False]]
I would like to use B to transform A into the following matrix:
A=[[0,1,0],[1,0,1],[0,1,0]]
I'm sure it's quite simple but, as I said, I'm new to Python, so I'd appreciate it if you could tell me how to do that.
Many thanks
Your best bet for this is to use numpy:
import numpy as np
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
mask = np.array([[False, True, False],
                 [True, False, True],
                 [False, True, False]])
filtered_data = data * mask
which results in filtered_data of:
array([[0, 2, 0],
       [4, 0, 6],
       [0, 8, 0]])
Without numpy you can do it with a nested list comprehension, but I'm sure you'll agree the numpy solution is much clearer if it's an option:
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
mask = [[False, True, False],
        [True, False, True],
        [False, True, False]]
filtered_data = [[data_elem if mask_elem else 0
                  for data_elem, mask_elem in zip(data_row, mask_row)]
                 for data_row, mask_row in zip(data, mask)]
which gives you filtered_data equal to
[[0, 2, 0], [4, 0, 6], [0, 8, 0]]
Using enumerate
Ex:
A=[[1,1,1],[1,1,1],[1,1,1]]
B=[[False,True,False],[True,False,True],[False,True,False]]
for ind, val in enumerate(B):
    for sub_ind, sub_val in enumerate(val):
        A[ind][sub_ind] = int(sub_val)
print(A)
Output:
[[0, 1, 0], [1, 0, 1], [0, 1, 0]]
You could just do
[ [int(y) for y in x] for x in B ]
This works because int() on a Boolean gives 0 or 1:
int(False) --> 0
int(True) --> 1
With numpy.multiply you'll get what you want:
import numpy as np
A=[[1,1,1],[1,1,1],[1,1,1]]
B=[[False,True,False],[True,False,True],[False,True,False]]
np.multiply(A, B)
#array([[0, 1, 0],
#       [1, 0, 1],
#       [0, 1, 0]])
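Plain * gives the same result once A and B are NumPy arrays, since booleans are cast to 0/1 during element-wise multiplication; a minimal equivalent sketch:
import numpy as np
A = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
B = np.array([[False, True, False], [True, False, True], [False, True, False]])
print(A * B)  # booleans act as 0/1, same result as np.multiply(A, B)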
Since you asked for A to be modified, here's a solution that doesn't create a new list but modifies A in place. It uses zip and enumerate:
A=[[1,1,1],[1,1,1],[1,1,1]]
B=[[False,True,False],[True,False,True],[False,True,False]]
for x, y in zip(A, B):
    for x1, y1 in zip(enumerate(x), y):
        x[x1[0]] = int(y1)
print(A)
Output:
[[0, 1, 0], [1, 0, 1], [0, 1, 0]]
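A slightly more direct way to get the same in-place effect is slice assignment; a small sketch:
for row_a, row_b in zip(A, B):
    row_a[:] = [int(flag) for flag in row_b]  # overwrite the row contents in place
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]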
If you want to modify A using the flags in B, you can do it like this:
A = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
B = [[False, True, False], [True, False, True], [False, True, False]]
C = [[int(A_el == B_el) for A_el, B_el in zip(A_ar, B_ar)] for A_ar, B_ar in zip(A, B)]
Output:
[[0, 1, 0], [1, 0, 1], [0, 1, 0]]
You can also iterate using indices:
C = [[int(A[i][j] == B[i][j]) for j in range(len(A[0]))] for i in range(len(A))]
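Note that int(A_el == B_el) only gives the right result here because every element of A is 1 and 1 == True in Python. A sketch of a mask-based variant that also works for arbitrary values in A:
C = [[a if b else 0 for a, b in zip(A_row, B_row)] for A_row, B_row in zip(A, B)]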
Try this:
A=[[1,1,1],[1,1,1],[1,1,1]]
B=[[False,True,False],[True,False,True],[False,True,False]]
X = [[x and y for x,y in zip(a,b)] for a,b in zip(A,B)]
C = [ [int(x) for x in c] for c in X ]
print(C)
Output:
[[0, 1, 0], [1, 0, 1], [0, 1, 0]]
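The two comprehensions can also be fused into a single pass; a minor variant:
C = [[int(x and y) for x, y in zip(a, b)] for a, b in zip(A, B)]
print(C)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]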
Basic double for loop:
for i in range(len(A)):
    for j in range(len(A[0])):
        A[i][j] = int(B[i][j]) * A[i][j]
print(A)
output:
[[0, 1, 0], [1, 0, 1], [0, 1, 0]]
Example with a different A:
A=[[1,1,1],[1,1,1],[0,0,0]]
B=[[False,True,False],[True,False,True],[False,True,False]]
for i in range(len(A)):
    for j in range(len(A[0])):
        A[i][j] = int(B[i][j]) * A[i][j]
print(A)
Output:
[[0, 1, 0], [1, 0, 1], [0, 0, 0]]
I have the following array in python:
a = np.array([[1,1,1],[1,1,1],[1,1,1]])
and the following index array:
b = np.array([0,1,2])
I want to index a using b such that I can subtract 1 from the matching row/column and get the following result:
[[0,1,1],[1,0,1],[1,1,0]]
I can do it using loops, wanted to know if there was a "non-loop" way of doing it.
for i in range(len(b)):
    a[i][b[i]] = a[i][b[i]] - 1
It looks like there is some confusion about how to handle this.
You want simple fancy indexing:
a[np.arange(len(a)), b] -= 1
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
Output for b = np.array([2,0,1])
array([[1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])
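As a quick illustration of what the fancy index expands to (a sketch; the rows helper name is introduced here for clarity):
import numpy as np
a = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
b = np.array([0, 1, 2])
rows = np.arange(len(a))  # [0, 1, 2], paired element-wise with b
a[rows, b] -= 1           # decrements a[0, b[0]], a[1, b[1]], a[2, b[2]]
print(a)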
Your code produces output as follows:
a = np.array([[1,1,1],[1,1,1],[1,1,1]])
b = np.array([0,1,2])
for i in range(len(b)):
    a[i][b[i]] = a[i][b[i]] - 1
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
This can be done in a non-loopy way as follows:
a[np.arange(len(b)),b] -= 1
print(a)
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
I have an array like this and I have to find the distance between each pair of points. How can I do this in Python with NumPy?
array([[  8139, 112607],
       [  8139, 115665],
       [  8132, 126563],
       [  8193, 113938],
       [  8193, 123714],
       [  8156, 120291],
       [  8373, 125253],
       [  8400, 131442],
       [  8400, 136354],
       [  8401, 129352],
       [  8439, 129909],
       [  8430, 135706],
       [  8430, 146359],
       [  8429, 139089],
       [  8429, 133243]])
Let's reduce this problem to 4 points:
points = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714]])
In general, you need to do 2 steps:
Build the indices of the pairs of points you want to take.
Apply np.hypot to those pairs.
TL;DR
Making indices of points
There are many ways to create the pairs of indices for each pair of points you'd like to take. But where do they come from? In every case it's a good idea to start building them from an adjacency matrix.
Case 1
In the most common way you can start from building it like so:
adjacency = np.ones(shape=(len(points), len(points)), dtype=bool)
>>> adjacency
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]
It corresponds to the indices you need to take, like so:
adjacency_idx_view = np.transpose(np.nonzero(adjacency))
for n in adjacency_idx_view.reshape(len(points), len(points), 2):
    print(n.tolist())
[[0, 0], [1, 0], [2, 0], [3, 0]]
[[0, 1], [1, 1], [2, 1], [3, 1]]
[[0, 2], [1, 2], [2, 2], [3, 2]]
[[0, 3], [1, 3], [2, 3], [3, 3]]
And this is how you collect them:
x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2],
       [3, 3]], dtype=int64)
It can also be done manually, as in Corralien's answer:
x = np.repeat(np.arange(len(points)), len(points))
y = np.tile(np.arange(len(points)), len(points))
Case 2
In the previous case every pair of points appears twice, and pairs of a point with itself are included. A better option is to omit this excess data and take only the pairs where the index of the first point is less than the index of the second:
adjacency = np.less.outer(np.arange(len(points)), np.arange(len(points)))
>>> print(adjacency)
[[False  True  True  True]
 [False False  True  True]
 [False False False  True]
 [False False False False]]
x, y = np.nonzero(adjacency)
This is not widely used, although it is what lies under the hood of np.triu_indices. Hence, as an alternative, we could use:
x, y = np.triu_indices(len(points), 1)
And this results in:
>>> np.transpose([x, y])
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 3],
       [2, 3]])
Case 3
You could also omit only the pairs of duplicated points and keep the pairs with the points swapped. As in Case 1, this costs 2x the memory and computation time, so I'll leave it for demonstration purposes only:
adjacency = ~np.identity(len(points), dtype=bool)
>>> adjacency
array([[False,  True,  True,  True],
       [ True, False,  True,  True],
       [ True,  True, False,  True],
       [ True,  True,  True, False]])
x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2]], dtype=int64)
I'll leave making x and y manually (without masking) as an exercise for the others.
Apply np.hypot
Instead of np.sqrt(np.sum((a - b) ** 2, axis=1)) you could do np.hypot(*np.transpose(a - b)), since np.hypot takes the x and y components separately. I'll take my Case 2 as my index generator:
def distance(points):
    x, y = np.triu_indices(len(points), 1)
    x_coord, y_coord = np.transpose(points[x] - points[y])
    return np.hypot(x_coord, y_coord)
>>> distance(points)
array([10898.00224812,  1727.84403231,  8049.18113848, 12625.14736548,
        2849.65296133,  9776.        ])
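As a side note, if SciPy is available, its pdist computes the same condensed set of pairwise distances, in the same pair order as np.triu_indices(len(points), 1); a minimal sketch:
from scipy.spatial.distance import pdist
d = pdist(points)  # Euclidean by default, one value per (i, j) pair with i < j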
You can use np.repeat and np.tile to create all combinations then compute the euclidean distance:
xy = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714],
               [8156, 120291], [8373, 125253], [8400, 131442], [8400, 136354],
               [8401, 129352], [8439, 129909], [8430, 135706], [8430, 146359],
               [8429, 139089], [8429, 133243]])
a = np.repeat(xy, len(xy), axis=0)
b = np.tile(xy, [len(xy), 1])
d = np.sqrt(np.sum((a - b) ** 2, axis=1))
The shape of d is (196,), which is the 14 × 14 pairwise distances flattened.
Update
but I have to do it in a function.
def distance(xy):
    a = np.repeat(xy, len(xy), axis=0)
    b = np.tile(xy, [len(xy), 1])
    return np.sqrt(np.sum((a - b) ** 2, axis=1))
d = distance(xy)
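If the square form is more convenient, the flat result can be reshaped; a small follow-up sketch:
D = distance(xy).reshape(len(xy), len(xy))  # D[i, j] is the distance between points i and j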
I have a list of unique rows and another larger array of data (called test_rows in example). I was wondering if there was a faster way to get the location of each unique row in the data. The fastest way that I could come up with is...
import numpy
uniq_rows = numpy.array([[0, 1, 0],
                         [1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 1]])
test_rows = numpy.array([[0, 1, 1],
                         [0, 1, 0],
                         [0, 0, 0],
                         [1, 1, 0],
                         [0, 1, 0],
                         [0, 1, 1],
                         [0, 1, 1],
                         [1, 1, 1],
                         [1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 0],
                         [0, 0, 0],
                         [1, 1, 0]])
# this gives me the indexes of each group of unique rows
for row in uniq_rows.tolist():
    print(row, numpy.where((test_rows == row).all(axis=1))[0])
This prints...
[0, 1, 0] [ 1 4 10]
[1, 1, 0] [ 3 8 12]
[1, 1, 1] [7 9]
[0, 1, 1] [0 5 6]
Is there a better or more numpythonic (not sure if that word exists) way to do this? I was searching for a numpy group function but could not find one. Basically, for any incoming dataset I need the fastest way to get the locations of each unique row in that dataset. The incoming dataset will not always contain every unique row, nor the same number of rows.
EDIT:
This is just a simple example. In my application the numbers would not be just zeros and ones; they could be anywhere from 0 to 32000. uniq_rows could have between 4 and 128 rows, and test_rows could have hundreds of thousands.
Numpy
From version 1.13 of numpy you can use numpy.unique with an axis argument, like np.unique(test_rows, return_counts=True, return_index=True, axis=0)
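To recover the index groups themselves, return_inverse is the useful flag; a minimal sketch using the test_rows array from the question:
import numpy as np
uniq, inv = np.unique(test_rows, return_inverse=True, axis=0)
groups = {tuple(row): np.flatnonzero(inv == k) for k, row in enumerate(uniq)}
# {(0, 0, 0): array([ 2, 11]), (0, 1, 0): array([ 1,  4, 10]), ...}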
Pandas
import pandas as pd
df = pd.DataFrame(test_rows)
uniq = pd.DataFrame(uniq_rows)
uniq
   0  1  2
0  0  1  0
1  1  1  0
2  1  1  1
3  0  1  1
Or you could generate the unique rows automatically from the incoming DataFrame
uniq_generated = df.drop_duplicates().reset_index(drop=True)
yields
   0  1  2
0  0  1  1
1  0  1  0
2  0  0  0
3  1  1  0
4  1  1  1
and then look for it
d = dict()
for idx, row in uniq.iterrows():
    d[idx] = df.index[(df == row).all(axis=1)].values
This is about the same speed as your where method:
d
{0: array([ 1,  4, 10], dtype=int64),
 1: array([ 3,  8, 12], dtype=int64),
 2: array([7, 9], dtype=int64),
 3: array([0, 5, 6], dtype=int64)}
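As a side note, pandas can also build these index groups directly via groupby; a sketch:
groups = df.groupby(df.columns.tolist()).indices
# {(0, 0, 0): array([ 2, 11], dtype=int64), (0, 1, 0): array([ 1,  4, 10], dtype=int64), ...}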
There are a lot of solutions here, but I'm adding one with vanilla numpy. In most cases numpy will be faster than list comprehensions and dictionaries, although the array broadcasting may cause memory to be an issue if large arrays are used.
np.where((uniq_rows[:, None, :] == test_rows).all(2))
Wonderfully simple, eh? This returns a tuple of unique-row indices and the corresponding test-row indices.
(array([0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]),
 array([ 1,  4, 10,  3,  8, 12,  7,  9,  0,  5,  6]))
How it works:
(uniq_rows[:, None, :] == test_rows)
Uses array broadcasting to compare each row of test_rows against each row of uniq_rows, element by element. This results in a 4x13x3 boolean array. all(2) then determines which rows are fully equal (all three comparisons returned True). Finally, where returns the indices of these rows.
With the np.unique from v1.13 (downloaded from the source link on the latest documentation, https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py#L112-L247)
In [157]: aset.unique(test_rows, axis=0,return_inverse=True,return_index=True)
Out[157]:
(array([[0, 0, 0],
        [0, 1, 0],
        [0, 1, 1],
        [1, 1, 0],
        [1, 1, 1]]),
 array([2, 1, 0, 3, 7], dtype=int32),
 array([2, 1, 0, 3, 1, 2, 2, 4, 3, 4, 1, 0, 3], dtype=int32))
In [158]: a,b,c=_
In [159]: c
Out[159]: array([2, 1, 0, 3, 1, 2, 2, 4, 3, 4, 1, 0, 3], dtype=int32)
In [164]: from collections import defaultdict
In [165]: dd = defaultdict(list)
In [166]: for i,v in enumerate(c):
     ...:     dd[v].append(i)
     ...:
In [167]: dd
Out[167]:
defaultdict(list,
            {0: [2, 11],
             1: [1, 4, 10],
             2: [0, 5, 6],
             3: [3, 8, 12],
             4: [7, 9]})
or indexing the dictionary with the unique rows (as hashable tuple):
In [170]: dd = defaultdict(list)
In [171]: for i,v in enumerate(c):
     ...:     dd[tuple(a[v])].append(i)
     ...:
In [172]: dd
Out[172]:
defaultdict(list,
            {(0, 0, 0): [2, 11],
             (0, 1, 0): [1, 4, 10],
             (0, 1, 1): [0, 5, 6],
             (1, 1, 0): [3, 8, 12],
             (1, 1, 1): [7, 9]})
This will do the job:
import numpy as np
uniq_rows = np.array([[0, 1, 0],
                      [1, 1, 0],
                      [1, 1, 1],
                      [0, 1, 1]])
test_rows = np.array([[0, 1, 1],
                      [0, 1, 0],
                      [0, 0, 0],
                      [1, 1, 0],
                      [0, 1, 0],
                      [0, 1, 1],
                      [0, 1, 1],
                      [1, 1, 1],
                      [1, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0],
                      [0, 0, 0],
                      [1, 1, 0]])
indices = np.where(np.sum(np.abs(np.repeat(uniq_rows, len(test_rows), axis=0)
                                 - np.tile(test_rows, (len(uniq_rows), 1))), axis=1) == 0)[0]
loc = indices // len(test_rows)
indices = indices - loc * len(test_rows)
res = [[] for i in range(len(uniq_rows))]
for i in range(len(indices)):
    res[loc[i]].append(indices[i])
print(res)
[[1, 4, 10], [3, 8, 12], [7, 9], [0, 5, 6]]
This will work in all cases, including those in which not all the rows in uniq_rows are present in test_rows. However, if you somehow know ahead of time that all of them are present, you could replace the part
res = [[] for i in range(len(uniq_rows))]
for i in range(len(indices)):
    res[loc[i]].append(indices[i])
with just the line:
res=np.split(indices,np.where(np.diff(loc)>0)[0]+1)
Thus avoiding loops entirely.
Not very 'numpythonic', but for a bit of an upfront cost we can make a dict with tuples of your rows as keys and lists of indices as values:
test_rowsdict = {}
for i, j in enumerate(test_rows):
    test_rowsdict.setdefault(tuple(j), []).append(i)
test_rowsdict
{(0, 0, 0): [2, 11],
 (0, 1, 0): [1, 4, 10],
 (0, 1, 1): [0, 5, 6],
 (1, 1, 0): [3, 8, 12],
 (1, 1, 1): [7, 9]}
Then you can filter based on your uniq_rows, with a fast dict lookup: test_rowsdict[tuple(row)]:
out = []
for i in uniq_rows:
    out.append((i, test_rowsdict.get(tuple(i), [])))
For your data, I get 16 µs for just the lookup, and 66 µs for building and looking up, versus 95 µs for your np.where solution.
Approach #1
Here's one approach; not sure about the level of "NumPythonic-ness", though, for such a tricky problem -
def get1Ds(a, b):  # get 1D views of each row from the two inputs
    # check that casting to void will create equal-size elements
    assert a.shape[1:] == b.shape[1:]
    assert a.dtype == b.dtype
    # compute dtype
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    # convert to 1d void arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    a_void = a.reshape(a.shape[0], -1).view(void_dt).ravel()
    b_void = b.reshape(b.shape[0], -1).view(void_dt).ravel()
    return a_void, b_void
def matching_row_indices(uniq_rows, test_rows):
    A, B = get1Ds(uniq_rows, test_rows)
    validA_mask = np.in1d(A, B)
    sidx_A = A.argsort()
    validA_mask = validA_mask[sidx_A]
    sidx = B.argsort()
    sortedB = B[sidx]
    split_idx = np.flatnonzero(sortedB[1:] != sortedB[:-1]) + 1
    all_split_indx = np.split(sidx, split_idx)
    match_mask = np.in1d(B, A)[sidx]
    valid_mask = np.logical_or.reduceat(match_mask, np.r_[0, split_idx])
    locations = [e for i, e in enumerate(all_split_indx) if valid_mask[i]]
    return uniq_rows[sidx_A[validA_mask]], locations
Scope for improvement (on performance):
np.split could be replaced by a for-loop for splitting using slicing.
np.r_ could be replaced by np.concatenate.
Sample run -
In [331]: unq_rows, idx = matching_row_indices(uniq_rows, test_rows)
In [332]: unq_rows
Out[332]:
array([[0, 1, 0],
       [0, 1, 1],
       [1, 1, 0],
       [1, 1, 1]])
In [333]: idx
Out[333]: [array([ 1,  4, 10]), array([0, 5, 6]), array([ 3,  8, 12]), array([7, 9])]
Approach #2
Another approach, which avoids the setup overhead of the previous one while reusing get1Ds from it, would be -
A, B = get1Ds(uniq_rows, test_rows)
idx_group = []
for row in A:
    idx_group.append(np.flatnonzero(B == row))
The numpy_indexed package (disclaimer: I am its author) was created to solve problems of this kind in an elegant and efficient manner:
import numpy_indexed as npi
indices = np.arange(len(test_rows))
unique_test_rows, index_groups = npi.group_by(test_rows, indices)
If you don't care about the indices of all rows, but only those present in uniq_rows, npi has a bunch of simple ways of tackling that problem too, e.g.:
subset_indices = npi.indices(unique_test_rows, uniq_rows)
As a sidenote; it might be useful to take a look at the examples in the npi library; in my experience, most of the time people ask a question of this kind, these grouped indices are just a means to an end, and not the endgoal of the computation. Chances are that using the functionality in npi you can reach that end goal more efficiently, without ever explicitly computing those indices. Do you care to give some more background to your problem?
EDIT: if your arrays are indeed this big, and always consist of a small number of columns with binary values, wrapping them with the following encoding might boost efficiency a lot further still:
def encode(rows):
    return (rows * [[2**i for i in range(rows.shape[1])]]).sum(axis=1, dtype=np.uint8)
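A hypothetical usage, reusing the group_by call from above on the packed 1-D codes instead of the 2-D rows:
codes = encode(test_rows)  # each binary row packed into a single uint8
unique_codes, index_groups = npi.group_by(codes, np.arange(len(test_rows)))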
I created a list to represent a 2-dim matrix:
mylist = []
while (some condition):
    x1 = ...
    x2 = ...
    mylist.append([x1, x2])
I would like to test whether each entry in the second column of the matrix is bigger than 0.45, but I'm running into some difficulty:
>>> mylist
[[1, 2], [1, -3], [-1, -2], [-1, 2], [0, 0], [0, 1], [0, -1]]
>>> mylist[][1] > 0.4
File "<stdin>", line 1
mylist[][1] > 0.4
^
SyntaxError: invalid syntax
>>> mylist[:,1] > 0.4
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
Given that mylist is a list of sublists, how can I refer to the second components of all its sublists at once?
Is it a good idea to use a list to represent the 2-dim matrix? I chose it only because the size of the matrix is determined dynamically. What would you recommend?
Thanks!
Use all() like this:
>>> lst = [[1, 2], [1, -3], [-1, -2], [-1, 2], [0, 0], [0, 1], [0, -1]]
>>> all(x > 0.45 for _, x in lst)
False
If you need a list of booleans then use a list comprehension:
>>> [x > 0.45 for _, x in lst]
[True, False, False, True, False, True, False]
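Equivalently, if you prefer indexing to tuple unpacking:
>>> [row[1] > 0.45 for row in lst]
[True, False, False, True, False, True, False]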
mylist[][1] is invalid syntax, but if you can use NumPy then you can do something like:
In [1]: arr = np.array([[1, 2], [1, -3], [-1, -2], [-1, 2], [0, 0], [0, 1], [0, -1]])
In [2]: all(arr[:,1] > 0.45)
Out[2]: False
In [4]: arr[:,1] > .45
Out[4]: array([ True, False, False, True, False, True, False], dtype=bool)
@Aशwini चhaudhary's solution is fantastic if you continue to use lists.
I would suggest you use numpy though as it can provide significant speed increases through vectorised functions, especially when working with larger datasets.
import numpy as np
mylist = [[1, 2], [1, -3], [-1, -2], [-1, 2], [0, 0], [0, 1], [0, -1]]
myarray = np.array(mylist)
# Look at all "rows" (chosen by :) and the 2nd "column" (given by 1).
print(myarray[:,1]>0.45)
# [ True False False True False True False]
I have to analyze a square 2D numpy array LL for values which are symmetric (LL[i,j] == LL[j,i]) and not zero.
Is there a faster, more "array-like" way to do this without loops?
Is there an easy way to store the indices of the values for later use, without creating an array and appending the tuple of indices on every iteration?
Here is my classical looping approach to storing the indices:
IdxArray = np.empty((0, 2), dtype=int)  # array to store the indices
for i in range(len(LL)):
    for j in range(i + 1, len(LL)):
        if LL[i, j] != 0.0:
            if LL[i, j] == LL[j, i]:
                IdxArray = np.vstack((IdxArray, [i, j]))
later use the indices:
for idx in IdxArray:
    P = LL[idx[0], idx[1]] * (TT[idx[0]] - TT[idx[1]])
    ...
>>> a = numpy.matrix('5 2; 5 4')
>>> b = numpy.matrix('1 2; 3 4')
>>> a.T == b.T
matrix([[False, False],
        [ True,  True]], dtype=bool)
>>> a == a.T
matrix([[ True, False],
        [False,  True]], dtype=bool)
>>> numpy.nonzero(a == a.T)
(matrix([[0, 1]]), matrix([[0, 1]]))
How about this:
a = np.array([[1,0,3,4],[0,5,4,6],[7,4,4,5],[3,4,5,6]])
np.fill_diagonal(a, 0) # changes original array, must be careful
overlap = (a == a.T) * a
indices = np.argwhere(overlap != 0)
Result:
>>> a
array([[0, 0, 3, 4],
       [0, 0, 4, 6],
       [7, 4, 0, 5],
       [3, 4, 5, 0]])
>>> overlap
array([[0, 0, 0, 0],
       [0, 0, 4, 0],
       [0, 4, 0, 5],
       [0, 0, 5, 0]])
>>> indices
array([[1, 2],
       [2, 1],
       [2, 3],
       [3, 2]])
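Since overlap is symmetric, each qualifying pair appears twice ([1, 2] and [2, 1]). If you only want each pair once, a small follow-up sketch keeps just the upper triangle:
pairs = np.argwhere(np.triu(overlap, k=1) != 0)
# array([[1, 2],
#        [2, 3]])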