I have a DataFrame where each cell of every column holds a single array, and I need to explode these into multiple rows. The arrays are nested twice and vary in length (each contains 6 to 12 arrays).
I'm trying to either:
explode the top level of each nested array, or
explode everything and have a MultiIndex.
Example data:
I have tried various methods found here on SO using the built-in explode function, but haven't been able to produce anything that
My example df:
import pandas as pd
df = pd.DataFrame({"k_6_cluster":[[[1,2,3],[4,5,6],[7,8,9]],[[1,2,3],[4,5,6],[7,8,9]]],"k_7_cluster":[[[10,20,30],[40,50,60],[70,80,90]],[[10,20,30],[40,50,60],[70,80,90]]]})
print(df)
k_6_cluster k_7_cluster
0 [[1, 2, 3], [4, 5, 6], [7, 8, 9]] [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
1 [[1, 2, 3], [4, 5, 6], [7, 8, 9]] [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
The following lines of code explode the top level of each nested array:
list_cols = df.columns
exploded = [df[col].explode() for col in list_cols]
out_df = pd.DataFrame(dict(zip(list_cols, exploded)))
print(out_df)
k_6_cluster k_7_cluster
0 [1, 2, 3] [10, 20, 30]
0 [4, 5, 6] [40, 50, 60]
0 [7, 8, 9] [70, 80, 90]
1 [1, 2, 3] [10, 20, 30]
1 [4, 5, 6] [40, 50, 60]
1 [7, 8, 9] [70, 80, 90]
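For both goals in the question, here is a minimal sketch. It assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns, and it assumes that within each row every column's list has the same number of elements (otherwise pandas raises a ValueError and the per-column approach above is needed). The index-level names row, outer and inner are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "k_6_cluster": [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]],
    "k_7_cluster": [[[10, 20, 30], [40, 50, 60], [70, 80, 90]],
                    [[10, 20, 30], [40, 50, 60], [70, 80, 90]]],
})
cols = list(df.columns)

# Option 1: explode the top level of all columns in a single call
# (needs pandas >= 1.3; lists in the same row must have equal lengths)
top = df.explode(cols)

# Option 2: explode both levels, recording each position as an extra index level
outer = df.explode(cols)
# cumcount numbers the sub-lists 0, 1, 2, ... within each original row
outer = outer.set_index(outer.groupby(level=0).cumcount().to_numpy(), append=True)
full = outer.explode(cols)
# number the scalars 0, 1, 2, ... within each (row, outer) pair
full = full.set_index(full.groupby(level=[0, 1]).cumcount().to_numpy(), append=True)
full.index.names = ["row", "outer", "inner"]  # hypothetical level names

print(top)
print(full.head(6))
```

With the MultiIndex in place, a single value can be looked up as full.loc[(row, outer, inner), column].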
Related
Assume I have the following NumPy arrays with different numbers of rows but the same number of columns:
a=np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
b=np.array([[1, 2, 3],
[4, 5, 6]])
I want to combine them into the following:
result=np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90],
[1, 2, 3],
[4, 5, 6]])
Here's what I do using a for loop, but I don't like it. Is there a more Pythonic way to do this?
c=[a,b]
num_row=sum(x.shape[0] for x in c)
num_col=a.shape[1] # or b.shape[1]
result=np.zeros((num_row,num_col), dtype=a.dtype) # keep the integer dtype of the inputs
k=0
for s in c:
    for i in s:
        result[k]=i
        k+=1
result=
array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90],
[1, 2, 3],
[4, 5, 6]])
Use numpy.concatenate(); this is exactly what it is for.
import numpy as np
a=np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
b=np.array([[1, 2, 3],
[4, 5, 6]])
result = np.concatenate((a, b), axis=0)
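As a small supplement, np.vstack gives the same result for 2-D inputs, and concatenate accepts any sequence of arrays with axis=0 as the default. The sketch below just checks that the two spellings agree on the arrays from the question.

```python
import numpy as np

a = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
b = np.array([[1, 2, 3],
              [4, 5, 6]])

# vstack stacks arrays row-wise; for 2-D inputs it matches concatenate(axis=0)
stacked = np.vstack((a, b))
# axis=0 is the default, and the sequence may hold any number of arrays
joined = np.concatenate([a, b])
print(stacked.shape)  # (5, 3)
```

Both calls only require the arrays to agree on every dimension except the one being joined, so here the column counts must match.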
In my opinion, the most "Pythonic" way is to use a built-in function or an existing package rather than writing a bunch of code. Writing everything from scratch is for C developers.
I have these values in a pandas DataFrame column:
col1
[[1,2],[3,4],[5,6],[7,8],[9,10],[11,12]]
[[13,14],[15,16],[17,18],[19,20],[21,22],[23,24]]
For each row, I want to split the six pairs into two new columns: one list holding the first element of every pair and one list holding the second.
These are the columns I want to get:
col2 col3
[1,3,5,7,9,11] [2,4,6,8,10,12]
[13,15,17,19,21,23] [14,16,18,20,22,24]
You can use a list comprehension and the DataFrame constructor:
df[['col2', 'col3']] = pd.DataFrame([list(map(list, zip(*l))) for l in df['col1']])
Another approach with numpy:
a = np.dstack(df['col1'].to_numpy())
df['col2'] = a[:,0].T.tolist()
df['col3'] = a[:,1].T.tolist()
Output:
col1 col2 col3
0 [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]] [1, 3, 5, 7, 9, 11] [2, 4, 6, 8, 10, 12]
1 [[13, 14], [15, 16], [17, 18], [19, 20], [21, 22], [23, 24]] [13, 15, 17, 19, 21, 23] [14, 16, 18, 20, 22, 24]
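A third variant, sketched here under the assumption that every row holds the same number of pairs (otherwise np.array cannot build a regular 3-D array), stacks the column into one array of shape (rows, pairs, 2) and slices the last axis.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18], [19, 20], [21, 22], [23, 24]],
]})

# Shape (2, 6, 2): rows x pairs x (first, second); assumes equal pair counts per row
arr = np.array(df["col1"].tolist())
# [:, :, 0] keeps the first element of every pair, row by row
df["col2"] = arr[:, :, 0].tolist()
df["col3"] = arr[:, :, 1].tolist()
print(df[["col2", "col3"]])
```

Compared with np.dstack, this keeps the row axis first, so no transpose is needed.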
How do I extract non-consecutive rows and columns from a matrix?
For example, in this matrix, how do I extract rows 1, 2 and 4 with columns 1, 2 and 4 (counting from 0)?
import numpy as np
a = np.matrix([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[18, 19, 20, 21, 22]])
So the new matrix should be:
b = np.matrix([[7, 8, 10],
               [12, 13, 15],
               [19, 20, 22]])
In the doc section linked by hpaulj, see the example that starts with "From a 4x3 array the corner elements should be selected using advanced indexing."
Specifically, see the paragraph that starts "This broadcasting can also be achieved using the function ix_:".
In your case the rows are [1, 2, 4], and the columns are the same, so:
rows = np.array([1, 2, 4], dtype=np.intp)
columns = np.array([1, 2, 4], dtype=np.intp)
b = a[np.ix_(rows, columns)]
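To show why np.ix_ is needed, the sketch below compares it with two alternatives on the matrix from the question. It uses a plain ndarray instead of np.matrix, since the NumPy documentation no longer recommends the matrix class; indexing behaves the same way here.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [18, 19, 20, 21, 22]])
rows = [1, 2, 4]
cols = [1, 2, 4]

# np.ix_ builds index arrays of shape (3, 1) and (1, 3) that broadcast to a 3x3 block
b = a[np.ix_(rows, cols)]

# Equivalent two-step version: take the rows first, then the columns
b2 = a[rows][:, cols]

# Without ix_, the two lists are paired element-wise,
# selecting a[1, 1], a[2, 2], a[4, 4] rather than a block
paired = a[rows, cols]
print(b)
print(paired)  # [ 7 13 22]
```

The two-step version makes an intermediate copy of the selected rows, while np.ix_ does the selection in one indexing operation.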
import numpy as np
A = np.array(
[ [ [45, 12, 4], [45, 13, 5], [46, 12, 6] ],
[ [46, 14, 4], [45, 14, 5], [46, 11, 5] ],
[ [47, 13, 2], [48, 15, 5], [52, 15, 1] ] ])
print(A[1:3, 0:2])
Please explain this; I have been struggling to understand it.
When you index a 3D array this way, what you are actually asking for is a slice of each nesting level of the array:
A[1:3, 0:2, 0:3]
# ↑↑↑
# Of the outer array (the outer []), take elements 1 (inclusive) to 3 (exclusive).
# Mind that counting starts at 0, so this is the second and third line in your example
A[1:3, 0:2, 0:3]
# ↑↑↑
# Out of the second level array, take the elements 0 (inclusive) to 2 (exclusive).
# This is the first and the second group of three numbers each
A[1:3, 0:2, 0:3]
# ↑↑↑
# This you did not specify, but it is added automatically
# Of the third level arrays, take element 0 (inclusive) to 3 (exclusive)
# Those arrays only have 3 numbers each, so they are left untouched.
In [483]: A = np.array(
...: [ [ [45, 12, 4], [45, 13, 5], [46, 12, 6] ],
...: [ [46, 14, 4], [45, 14, 5], [46, 11, 5] ],
...: [ [47, 13, 2], [48, 15, 5], [52, 15, 1] ] ])
The whole 3D array. If you need to put names on the dimensions, I'd suggest 'plane', 'row' and 'column':
In [484]: A
Out[484]:
array([[[45, 12, 4],
[45, 13, 5],
[46, 12, 6]],
[[46, 14, 4],
[45, 14, 5],
[46, 11, 5]],
[[47, 13, 2],
[48, 15, 5],
[52, 15, 1]]])
In [485]: A.shape
Out[485]: (3, 3, 3)
Taking a slice on the first dimension (the last 2 planes):
In [486]: A[1:3]
Out[486]:
array([[[46, 14, 4],
[45, 14, 5],
[46, 11, 5]],
[[47, 13, 2],
[48, 15, 5],
[52, 15, 1]]])
Taking 2 rows from each of those planes:
In [487]: A[1:3, 0:2]
Out[487]:
array([[[46, 14, 4],
[45, 14, 5]],
[[47, 13, 2],
[48, 15, 5]]])
The last dimension (columns) is left whole; this is equivalent to A[1:3, 0:2, :] (trailing slices are filled in automatically).
3D slicing works just like 1D and 2D slicing (and 4D, etc.). There's nothing special about 3D.
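The same points can be checked directly. This sketch confirms that the omitted trailing slice is implied and that applying the slices one axis at a time gives the same block.

```python
import numpy as np

A = np.array(
    [[[45, 12, 4], [45, 13, 5], [46, 12, 6]],
     [[46, 14, 4], [45, 14, 5], [46, 11, 5]],
     [[47, 13, 2], [48, 15, 5], [52, 15, 1]]])

part = A[1:3, 0:2]
# An omitted trailing dimension behaves like a full slice ':'
assert np.array_equal(part, A[1:3, 0:2, :])
print(part.shape)  # (2, 2, 3): 2 planes, 2 rows, 3 columns

# The same result built one axis at a time
step = A[1:3]        # planes 1 and 2 -> shape (2, 3, 3)
step = step[:, 0:2]  # rows 0 and 1 of each plane -> shape (2, 2, 3)
```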
I have this 2D array:
import numpy as np
R = int(input("Enter the number of rows:"))  # 4
C = int(input("Enter the number of columns:"))  # 5
randnums= np.random.randint(1,100, size=(R,C))
print(randnums)
[[98 25 33 9 41]
[67 32 67 27 85]
[38 79 52 40 58]
[84 76 44 9 2]]
Now I want to search for an element and have the output be its row and column, for example:
enter number to search : 40
Number 40 found in row 3 column 4
enter number : 100
number not found
Something like this?
Thanks in advance.
l = [[98, 25, 33, 9, 41],
[67, 32, 67, 27, 85],
[38, 79, 52, 40, 58],
[84, 76, 44, 9, 2]]
def fnd(l,value):
    for i,v in enumerate(l):
        if value in v:
            return {'row':i+1,'col':v.index(value)+1}
    return {'row':-1,'col':-1}
print(fnd(l,40))
{'row': 3, 'col': 4}
If the number of columns is fixed, as in the example, you can search with the code below.
a = [[98, 25, 33, 9, 41],
[67, 32, 67, 27, 85],
[38, 79, 52, 40, 58],
[84, 76, 44, 9, 2]]
a_ind = [v for row in a for v in row]  # Flatten a into a 1D list so it can be searched with .index()
def find(x):
    i = a_ind.index(x)  # Raises ValueError if x is not present
    return i // 5 + 1, i % 5 + 1  # Here 5 is the number of columns
print(find(98), find(58), find(40))
#Output
(1, 1) (3, 5) (3, 4)
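Both list-based answers stop at the first match, and the second one assumes a fixed column count. As a sketch of a pure-Python alternative, the hypothetical helper find_all below reports every 1-based position, works with rows of different lengths, and returns an empty list when the value is absent.

```python
def find_all(grid, value):
    """Return every 1-based (row, col) position of value in a list of lists."""
    # Scans every cell, so duplicates are reported too
    return [(r + 1, c + 1)
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if v == value]

a = [[98, 25, 33, 9, 41],
     [67, 32, 67, 27, 85],
     [38, 79, 52, 40, 58],
     [84, 76, 44, 9, 2]]
print(find_all(a, 9))    # [(1, 4), (4, 4)]
print(find_all(a, 100))  # []
```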
You can use the numpy.where function:
r = np.random.randint(1,10, size=(5,5))
# array([[6, 5, 3, 1, 8],
# [3, 9, 7, 5, 6],
# [6, 2, 5, 5, 8],
# [1, 5, 1, 1, 1],
# [1, 6, 5, 8, 6]])
s = np.where(r == 8)
# the first array is row indices, second is column indices
# (array([0, 2, 4], dtype=int64), array([4, 4, 3], dtype=int64))
s = np.array(np.where(r == 8)).T
# transpose to get 2d array of indices
# array([[0, 4],
# [2, 4],
# [4, 3]], dtype=int64)
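Putting this together with the output format from the question, here is a minimal sketch. It uses np.argwhere, which returns the same pairs as the transposed np.where result, and converts them to 1-based row and column numbers. The helper name find_value is hypothetical, and the input() prompts are replaced by a fixed array and fixed targets so the example runs on its own.

```python
import numpy as np

def find_value(arr, value):
    """Return 1-based (row, column) pairs where arr == value."""
    # argwhere yields one 0-based [row, col] pair per match, in row-major order
    return [(int(r) + 1, int(c) + 1) for r, c in np.argwhere(arr == value)]

randnums = np.array([[98, 25, 33, 9, 41],
                     [67, 32, 67, 27, 85],
                     [38, 79, 52, 40, 58],
                     [84, 76, 44, 9, 2]])

for target in (40, 100):
    hits = find_value(randnums, target)
    if hits:
        for row, col in hits:
            print(f"Number {target} found in row {row} column {col}")
    else:
        print("number not found")
```

In an interactive script, target would come from int(input("enter number to search : ")) instead of the fixed tuple.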