I have a structure like this:
data = [[2,5,6,9,12,45,32] , [43,23,12,76,845,1] ,[65,23,1,54,22,123] ,
[323,23,412,656,2,3] , [8,5,3,9,12,45,32] , [60,23,12,76,845,1] ,
[5,23,1,54,22,123] , [35,2,12,56,22,34] ]
and I want to order these lists based on another list that holds the positions:
order = [5,4,1,3,0,6,7, 2]
The result would be:
data_ordered = [[60,23,12,76,845,1],[8,5,3,9,12,45,32], [43,23,12,76,845,1],
[323,23,412,656,2,3] , [2,5,6,9,12,45,32] , [5,23,1,54,22,123] ,
[35,2,12,56,22,34] ,[65,23,1,54,22,123] ]
Any idea?
data_ordered = [ data[i] for i in order]
Pretty basic list comprehension.
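If it helps, the comprehension can be wrapped in a small helper that also checks the index list. This is only a sketch: the name reorder and the assumption that order must be a full permutation of the positions are mine, not part of the question.

```python
def reorder(items, order):
    """Return the elements of items rearranged according to order."""
    # Assumption: order should mention every position exactly once.
    if sorted(order) != list(range(len(items))):
        raise ValueError("order must be a permutation of range(len(items))")
    return [items[i] for i in order]


data = [[2, 5, 6], [43, 23], [65, 23, 1]]
print(reorder(data, [2, 0, 1]))
```

Drop the check if repeated or partial index lists are acceptable in your case; the comprehension itself handles them fine.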
import numpy as np
# dtype=object is needed because the sublists have different lengths
data_ordered = np.array(data, dtype=object)[np.array(order)].tolist()
That is all it takes. A full example is given below:
import numpy as np
data = [[2,5,6,9,12,45,32] , [43,23,12,76,845,1] ,[65,23,1,54,22,123] ,
[323,23,412,656,2,3] , [8,5,3,9,12,45,32] , [60,23,12,76,845,1] ,
[5,23,1,54,22,123] , [35,2,12,56,22,34] ]
order = [5,4,1,3,0,6,7, 2]
# dtype=object is needed because the sublists have different lengths
data_ordered = np.array(data, dtype=object)[np.array(order)].tolist()
print(data_ordered)
Output is
[[60, 23, 12, 76, 845, 1], [8, 5, 3, 9, 12, 45, 32], [43, 23, 12, 76, 845, 1], [323, 23, 412, 656, 2, 3], [2, 5, 6, 9, 12, 45, 32], [5, 23, 1, 54, 22, 123], [35, 2, 12, 56, 22, 34], [65, 23, 1, 54, 22, 123]]
Use numpy to solve it.
I have a tensor and want to apply a dictionary to it.
I am doing instance segmentation with 44 classes, but I am trying to merge them into 15.
My data is in a TFRecord file and I don't want to create a new one every time I change the classes, so I am trying to remap the labels in the parser.
test=tf.random.uniform(shape=(120,120), minval=0, maxval=43, dtype=tf.int32)
dict2={0:0,
1:0,
2:1,
3:1,
4:1,
5:1,
6:2 ,
7:2,
8:2,
9:3,
10:3,
11:4,
12:4,
13:4,
14:5,
15:5,
16:5,
17:6,
18:7,
19:7,
20:7,
21:7,
22:8,
23:8,
24:8,
25:9,
26:9,
27:9,
28:9,
29:10,
30:10,
31:10,
32:10,
33:10,
34:11,
35:11,
36:12,
37:12,
38:12,
39:13,
40:13,
41:14,
42:14,
43:14
}
test2=tf.vectorized_map(dict2.get,test.ref())
Error
ValueError: Attempt to convert a value (<Reference wrapping <tf.Tensor: shape=(120, 120), dtype=int32, numpy=
array([[36, 21, 34, ..., 7, 0, 8],
[36, 8, 32, ..., 15, 22, 35],
[30, 37, 10, ..., 26, 3, 39],
...,
[37, 6, 14, ..., 20, 36, 31],
[34, 11, 36, ..., 8, 0, 0],
[37, 5, 25, ..., 36, 32, 24]])>>) with an unsupported type (<class 'tensorflow.python.util.object_identity.Reference'>) to a Tensor.
This solved it:
keys = list(dict2.keys())
values = [dict2[k] for k in keys]
table = tf.lookup.StaticHashTable(
tf.lookup.KeyValueTensorInitializer(keys, values),
default_value=-1)
test2=table.lookup(test)
I have a value X of type ndarray with shape (40000, 2).
The second column of X contains lists of 50 numbers.
Example:
[17, [1, 2, 3, ...]],
[39, [44, 45, 45, ...]], ...
I want to convert it to an ndarray of shape (40000, 51):
the first column stays the same,
and every element of the list goes into its own column.
for my example:
[17, 1, 2, 3, ....],
[39, 44, 45, 45, ...]
How can I do it?
np.hstack((arr[:,0].reshape(-1,1), np.array(arr[:,1].tolist())))
Example:
>>> arr
array([[75, list([90, 39, 63])],
[20, list([82, 92, 22])],
[80, list([12, 6, 89])],
[79, list([11, 96, 74])],
[96, list([26, 37, 65])]], dtype=object)
>>> np.hstack((arr[:,0].reshape(-1,1),np.array(arr[:,1].tolist()))).astype(int)
array([[75, 90, 39, 63],
[20, 82, 92, 22],
[80, 12, 6, 89],
[79, 11, 96, 74],
[96, 26, 37, 65]])
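For readers who want to try this end to end, here is a minimal sketch that builds a small object array of the same layout and flattens it. The sample values and variable names are made up for illustration, and np.column_stack is used as an equivalent alternative to the hstack/reshape combination above.

```python
import numpy as np

# Hypothetical sample data: a scalar plus a fixed-length list per row.
rows = [[17, [1, 2, 3]], [39, [44, 45, 45]]]

# Build the (n, 2) object array element by element so the lists stay intact.
arr = np.empty((len(rows), 2), dtype=object)
for i, (head, tail) in enumerate(rows):
    arr[i, 0] = head
    arr[i, 1] = tail

# column_stack treats the 1-D first column as a single column.
flat = np.column_stack((arr[:, 0].astype(int), np.array(arr[:, 1].tolist())))
print(flat.shape)
print(flat)
```

This only works when every list in the second column has the same length; otherwise the inner np.array call cannot form a rectangular block.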
You can do this for each row of your ndarray. Here is an example:
# X = [39, [44, 45, 45, ...]]
newX = numpy.empty(51)
newX[0] = X[0]  # the first element
# now the rest of the elements, starting at position 1
i = 1
for e in X[1]:
    newX[i] = e
    i = i + 1
You can wrap this process in a function and apply it like this:
newArray = numpy.empty((40000, 51))
i = 0
for x in oldArray:
    Process(newArray[i], x)  # Process fills one output row from one input row, as above
    i = i + 1
I defined the source array (with shorter lists in column 1) as:
X = np.array([[17, [1, 2, 3, 4]], [39, [44, 45, 45, 46]]], dtype=object)
To do your task, define the following function:
def myExplode(row):
    tbl = [row[0]]
    tbl.extend(row[1])
    return tbl
Then apply it to each row:
np.apply_along_axis(myExplode, axis=1, arr=X)
The result is:
array([[17, 1, 2, 3, 4],
[39, 44, 45, 45, 46]])
Consider the two toy arrays below:
import numpy as np
k = np.random.randint(1, 25, (5, 2, 3))
l = np.random.randint(25, 50, (7, 3))
In [27]: k
Out[27]:
array([[[14, 15, 24],
[21, 24, 5]],
[[22, 19, 9],
[21, 1, 11]],
[[ 1, 23, 5],
[16, 14, 2]],
[[ 7, 3, 16],
[23, 2, 8]],
[[12, 24, 4],
[ 2, 15, 20]]])
In [28]: l
Out[28]:
array([[47, 31, 42],
[28, 27, 26],
[45, 32, 49],
[29, 34, 32],
[40, 36, 25],
[44, 27, 31],
[27, 35, 26]])
I can get the multiplicative sum that I am interested in as follows:
f = np.array([np.sum( k * x, axis = 2) for x in l])
In [29]: f
Out[29]:
array([[[2131, 1941],
[2001, 1480],
[ 970, 1270],
[1094, 1479],
[1476, 1399]],
[[1421, 1366],
[1363, 901],
[ 779, 878],
[ 693, 906],
[1088, 981]],
[[2286, 1958],
[2039, 1516],
[1026, 1266],
[1195, 1491],
[1504, 1550]],
[[1684, 1585],
[1572, 995],
[ 971, 1004],
[ 817, 991],
[1292, 1208]],
[[1700, 1829],
[1789, 1151],
[ 993, 1194],
[ 788, 1192],
[1444, 1120]],
[[1765, 1727],
[1760, 1292],
[ 820, 1144],
[ 885, 1314],
[1300, 1113]],
[[1527, 1537],
[1493, 888],
[ 962, 974],
[ 710, 899],
[1268, 1099]]])
How can I calculate this sum without resorting to comprehension?
This is a good use case for np.einsum:
np.einsum('ijk,lk->lij', k, l)
list_comp = np.array([np.sum( k * x, axis = 2) for x in l])
np.allclose(np.einsum('ijk,lk->lij', k, l), list_comp)
# True
Or using broadcasting:
(l[:,None,None]*k).sum(-1)
Although, from a quick check on timings, np.einsum runs about 3 times faster.
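One plausible contributor to that gap is that the broadcasting version first materializes a 4-D temporary and only then sums over it, whereas einsum contracts the shared axis directly. The sketch below reuses the shapes from the question to make the intermediate visible; the seed and variable names are arbitrary choices for illustration, and no timing claim is implied.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
k = rng.integers(1, 25, (5, 2, 3))
l = rng.integers(25, 50, (7, 3))

# Broadcasting: (7, 1, 1, 3) * (5, 2, 3) -> temporary of shape (7, 5, 2, 3)
temp = l[:, None, None] * k
via_broadcast = temp.sum(-1)

# einsum: sums over the shared last axis k without the explicit temporary
via_einsum = np.einsum('ijk,lk->lij', k, l)

print(temp.shape, via_broadcast.shape, via_einsum.shape)
print(np.array_equal(via_broadcast, via_einsum))
```

For arrays this small the difference hardly matters; it becomes relevant when the product of all four dimensions no longer fits comfortably in memory.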
You can also do that with np.tensordot:
import numpy as np
np.random.seed(0)
k = np.random.randint(1, 25, (5, 2, 3))
l = np.random.randint(25, 50, (7, 3))
f = np.tensordot(l, k, [-1, -1])
f_comp = np.array([np.sum(k * x, axis=2) for x in l])
print(np.allclose(f, f_comp))
# True
I have a Pandas DataFrame with a MultiIndex on its columns (let's say 3 levels):
MultiIndex(levels=[['BA-10.0', 'BA-2.5', ..., 'p'], ['41B004', '41B005', ..., 'T1M003', 'T1M011'], [25, 26, ..., 276, 277]],
labels=[[0, 0, 0, ..., 18, 19, 19], [4, 5, 6,..., 14, 12, 13], [24, 33, 47, ..., 114, 107, 113]],
names=['measurandkey', 'sitekey', 'channelid'])
When I iterate over the first level and yield subsets of the DataFrame:
def cluster(df):
    for key in df.columns.levels[0]:
        yield df[key]
for subdf in cluster(df):
    print(subdf.columns)
The columns index has lost its first level, but the MultiIndex still references all the other keys in the sub-levels, even those that are missing from the subset:
MultiIndex(levels=[['41B004', '41B005', '41B006', '41B008', '41B011', '41MEU1', '41N043', '41R001', '41R002', '41R012', '41WOL1', '41WOL2', 'T1M001', 'T1M003', 'T1M011'], [25, 26, 27, 28, 30, 31, 32, 3, ....
labels=[[4, 5, 6, 7, 9, 10], [24, 33, 47, 61, 83, 98]],
names=['sitekey', 'channelid'])
How can I force subdf to have its columns MultiIndex updated with only keys that are present?
import pandas as pd

def cluster(df):
    for key in df.columns.levels[0]:
        d = df[key]
        # rebuild the MultiIndex from the tuples that are actually present
        d.columns = pd.MultiIndex.from_tuples(d.columns.to_series())
        yield d
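An alternative worth considering is MultiIndex.remove_unused_levels, which trims the levels to the keys actually in use and keeps the level names. The following is a minimal sketch on a made-up three-level frame; the column keys and values are invented for illustration, only the level names come from the question.

```python
import pandas as pd

# Hypothetical toy frame with the same level names as in the question.
cols = pd.MultiIndex.from_tuples(
    [("BA", "s1", 1), ("BA", "s2", 2), ("p", "s3", 3)],
    names=["measurandkey", "sitekey", "channelid"],
)
df = pd.DataFrame([[10, 20, 30]], columns=cols)


def cluster(df):
    for key in df.columns.levels[0]:
        d = df[key].copy()
        # Drop level entries that no column of this subset uses.
        d.columns = d.columns.remove_unused_levels()
        yield key, d


for key, subdf in cluster(df):
    print(key, [list(level) for level in subdf.columns.levels])
```

The .copy() is there so that assigning new columns does not touch a view of the original frame; whether it is needed depends on your pandas version and settings.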
In Python 2.7, using numpy or any other means, if I had an array of any size and wanted to exclude certain values and output the new array, how would I do that? Here is what I would like:
[(1,2,3),
 (4,5,6),
 (7,8,9)]
then exclude [4,2,9] to make the array
[(1,5,3),
 (7,8,6)]
I would always be excluding data of the same length as the row length, and always only one entry per column. [(1,5,3)] would be another example of data I would want to exclude. So every time I loop the function, it reduces the number of rows in the array by one. I imagine I have to use a masked array, or convert my mask to a masked array and subtract the two, then maybe condense the output, but I have no idea how. Thanks for your time.
You can do it very efficiently if you transform your 2-D array into an unraveled 1-D array. Then you repeat the array holding the elements to be excluded, called e, so that you can do an element-wise comparison:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
e = [1, 5, 3]
ar = a.T.ravel()
er = np.repeat(e, a.shape[0])
ans = ar[er != ar].reshape(a.shape[1], a.shape[0]-1).T
But it will only work if each element in e matches exactly one row of a in its column.
EDIT:
As suggested by @Jaime, you can avoid the ravel() and get the same result directly:
ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0]-1).T
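Because the reshape fails, or silently misaligns values, when an element of e matches zero or several entries in its column, it can be worth checking that condition first. Here is a minimal sketch of such a guard; the function name exclude_per_column and the choice to raise ValueError are my own assumptions, not part of the answer above.

```python
import numpy as np


def exclude_per_column(a, e):
    """Remove e[j] from column j of a, requiring exactly one match per column."""
    a = np.asarray(a)
    keep = a != np.asarray(e)          # True where the value is kept
    matches = (~keep).sum(axis=0)      # number of hits in each column
    if not np.all(matches == 1):
        raise ValueError("each element of e must match exactly once in its column")
    # Same expression as above: select column by column, then restore layout.
    return a.T[keep.T].reshape(a.shape[1], a.shape[0] - 1).T


a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(exclude_per_column(a, [4, 2, 9]))
```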
To exclude vector e from matrix a:
import numpy as np
a = np.array([(1,2,3), (4,5,6), (7,8,9)])
e = [4,2,9]
# for each column j, keep the entries that differ from e[j]
print(np.array([[i for i in a.transpose()[j] if i != e[j]]
                for j in range(len(e))]).transpose())
This would take some work to generalize, but here's something that can handle 2-d cases of the kind you describe. If passed unexpected input, this won't notice and will generate strange results, but it's at least a starting point:
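import numpy

def columnwise_compress(a, values):
    a_shape = a.shape
    a_trans_flat = a.transpose().reshape(-1)
    # drop every element that appears anywhere in values
    compressed = a_trans_flat[~numpy.in1d(a_trans_flat, values)]
    # one row per original column, each one element shorter, then transpose back
    return compressed.reshape((a_shape[1], a_shape[0] - 1)).transpose()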
Tested:
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [4, 2, 9])
array([[1, 5, 3],
[7, 8, 6]])
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [1, 5, 3])
array([[4, 2, 6],
[7, 8, 9]])
The difficulty is that you're asking for "compression" of a kind that numpy.compress doesn't do (removing different values for each column or row) and you're asking for compression along columns instead of rows. Compressing along rows is easier because it moves along the natural order of the values in memory; you might consider working with transposed arrays for that reason. If you want to do that, things become a bit simpler:
>>> a = numpy.array([[1, 4, 7],
... [2, 5, 8],
... [3, 6, 9]])
>>> a[~numpy.in1d(a, [4, 2, 9]).reshape(3, 3)].reshape(3, 2)
array([[1, 7],
[5, 8],
[3, 6]])
You'll still need to handle shape parameters intelligently if you do it this way, but it will still be simpler. Also, this assumes there are no duplicates in the original array; if there are, this could generate wrong results. Saullo's excellent answer partially avoids the problem, but any value-based approach isn't guaranteed to work unless you're certain that there aren't duplicate values in the columns.
In the spirit of @SaulloCastro's answer, but handling multiple occurrences of items, you can remove the first occurrence in each column by doing the following:
def delete_skew_row(a, b):
    rows, cols = a.shape
    row_to_remove = np.argmax(a == b, axis=0)
    items_to_remove = np.ravel_multi_index((row_to_remove,
                                            np.arange(cols)),
                                           a.shape, order='F')
    ret = np.delete(a.T, items_to_remove)
    return np.ascontiguousarray(ret.reshape(cols, rows-1).T)
rows, cols = 5, 10
a = np.random.randint(100, size=(rows, cols))
b = np.random.randint(rows, size=(cols,))
b = a[b, np.arange(cols)]
>>> a
array([[50, 46, 85, 82, 27, 41, 45, 27, 17, 26],
[92, 35, 14, 34, 48, 27, 63, 58, 14, 18],
[90, 91, 39, 19, 90, 29, 67, 52, 68, 69],
[10, 99, 33, 58, 46, 71, 43, 23, 58, 49],
[92, 81, 64, 77, 61, 99, 40, 49, 49, 87]])
>>> b
array([92, 81, 14, 82, 46, 29, 67, 58, 14, 69])
>>> delete_skew_row(a, b)
array([[50, 46, 85, 34, 27, 41, 45, 27, 17, 26],
[90, 35, 39, 19, 48, 27, 63, 52, 68, 18],
[10, 91, 33, 58, 90, 71, 43, 23, 58, 49],
[92, 99, 64, 77, 61, 99, 40, 49, 49, 87]])