Python pandas: exploding multiple varying-length dataframe columns containing arrays

I have a dataframe that contains a single array per column, which I need to explode into multiple rows per column. The arrays are nested (twice) and of varying length (each contains 6-12 arrays).
I'm trying to either:
explode the top level of each nested array, or
explode everything and have a multi-index.
I have tried various methods found here on SO using the built-in explode function, but haven't been able to produce anything that works.

My example df:
import pandas as pd
df = pd.DataFrame({"k_6_cluster":[[[1,2,3],[4,5,6],[7,8,9]],[[1,2,3],[4,5,6],[7,8,9]]],"k_7_cluster":[[[10,20,30],[40,50,60],[70,80,90]],[[10,20,30],[40,50,60],[70,80,90]]]})
print(df)
k_6_cluster k_7_cluster
0 [[1, 2, 3], [4, 5, 6], [7, 8, 9]] [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
1 [[1, 2, 3], [4, 5, 6], [7, 8, 9]] [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
The following lines of code explode the top level of each nested array:
list_cols = df.columns
exploded = [df[col].explode() for col in list_cols]
out_df = pd.DataFrame(dict(zip(list_cols, exploded)))
print(out_df)
k_6_cluster k_7_cluster
0 [1, 2, 3] [10, 20, 30]
0 [4, 5, 6] [40, 50, 60]
0 [7, 8, 9] [70, 80, 90]
1 [1, 2, 3] [10, 20, 30]
1 [4, 5, 6] [40, 50, 60]
1 [7, 8, 9] [70, 80, 90]
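For the second option in the question (exploding everything and keeping a multi-index), one possible sketch is below. It assumes pandas 1.3 or newer, where DataFrame.explode accepts a list of columns, and it assumes that within each row the columns hold the same number of elements at every nesting level (otherwise explode raises a ValueError). The index level names row, cluster and item are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "k_6_cluster": [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]],
    "k_7_cluster": [[[10, 20, 30], [40, 50, 60], [70, 80, 90]],
                    [[10, 20, 30], [40, 50, 60], [70, 80, 90]]],
})
cols = list(df.columns)

# Step 1: explode the top level of every column at once.
top = df.explode(cols)

# Record the position of each inner list within its original row
# as a second index level.
top = top.set_index(top.groupby(level=0).cumcount().to_numpy(), append=True)

# Step 2: explode the inner lists and record the position of each scalar
# as a third index level.
full = top.explode(cols)
full = full.set_index(full.groupby(level=[0, 1]).cumcount().to_numpy(), append=True)
full.index.names = ["row", "cluster", "item"]  # hypothetical level names

print(full.head())
```

Each value can then be addressed by its three positions, for example full.loc[(0, 1, 2), "k_6_cluster"] for the third item of the second inner list in the first row.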

Related

Pythonic method for stacking np.array's of different row length

Assume I have the following NumPy arrays with different numbers of rows but the same number of columns:
a=np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
b=np.array([[1, 2, 3],
[4, 5, 6]])
I want to combine them to get the following:
result=np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90],
[1, 2, 3],
[4, 5, 6]])
Here's what I do using a for loop, but I don't like it. Is there a Pythonic way to do this?
c=[a,b]
num_row=sum([x.shape[0] for x in c])
num_col=a.shape[1] # or b.shape[1]
result=np.zeros((num_row,num_col))
k=0
for s in c:
    for i in s:
        result[k]=i
        k+=1
result=
array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90],
[1, 2, 3],
[4, 5, 6]])
Use numpy.concatenate(); this is its exact purpose.
import numpy as np
a=np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
b=np.array([[1, 2, 3],
[4, 5, 6]])
result = np.concatenate((a, b), axis=0)
In my opinion, the most "Pythonic" way is to use a builtin or package rather than writing a bunch of code. Writing everything from scratch is for C developers.
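As a small sketch building on the question's own list c, the arrays can be passed to np.concatenate directly as a list, which also covers more than two arrays. np.vstack is shown as an equivalent alternative for this 2-D case. Note that, unlike the loop version that fills an np.zeros array of floats, both calls keep the integer dtype of the inputs.

```python
import numpy as np

a = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
b = np.array([[1, 2, 3],
              [4, 5, 6]])

c = [a, b]  # any number of arrays with the same number of columns

result = np.concatenate(c, axis=0)  # stack along the row axis
stacked = np.vstack(c)              # same result for 2-D inputs

print(result.shape)                     # (5, 3)
print(np.array_equal(result, stacked))  # True
```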

How can I separate tuples into columns in a Pandas DataFrame?

I have these values in a pandas DataFrame column:
col1
[[1,2],[3,4],[5,6],[7,8],[9,10],[11,12]]
[[13,14],[15,16],[17,18],[19,20],[21,22],[23,24]]
For each row, I want to collect the first element of each of the 6 pairs into a list in one new column, and the second elements into a list in another.
These are the columns that I want to get:
col2 col3
[1,3,5,7,9,11] [2,4,6,8,10,12]
[13,15,17,19,21,23] [14,16,18,20,22,24]
You can use a list comprehension and the DataFrame constructor:
df[['col2', 'col3']] = pd.DataFrame([list(map(list, zip(*l))) for l in df['col1']])
Another approach with numpy:
a = np.dstack(df['col1'].to_numpy())
df['col2'] = a[:,0].T.tolist()
df['col3'] = a[:,1].T.tolist()
Output:
col1 col2 col3
0 [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]] [1, 3, 5, 7, 9, 11] [2, 4, 6, 8, 10, 12]
1 [[13, 14], [15, 16], [17, 18], [19, 20], [21, 22], [23, 24]] [13, 15, 17, 19, 21, 23] [14, 16, 18, 20, 22, 24]
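A self-contained variant of the NumPy idea is sketched below. It assumes every row holds the same number of pairs, so the column converts cleanly to a regular 3-D array; with ragged rows the list-comprehension approach above is the safer choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18], [19, 20], [21, 22], [23, 24]],
]})

# Shape (rows, pairs, 2): here (2, 6, 2).
arr = np.array(df["col1"].tolist())

# The last axis selects the first or second element of every pair.
df["col2"] = arr[:, :, 0].tolist()
df["col3"] = arr[:, :, 1].tolist()

print(df[["col2", "col3"]])
```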

How to extract non-consecutive rows and columns of a matrix?

How do I extract rows and columns from a matrix that are not consecutive?
For example, in this matrix, how do I extract rows 1, 2 and 4 with columns 1, 2 and 4?
import numpy as np
a = np.matrix([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[18, 19, 20, 21, 22]])
So the new matrix should be:
b = ([[7, 8 , 10],
[12, 13, 15],
[19, 20, 22]])
In the doc section linked by hpaulj, see the example starting with "From a 4x3 array the corner elements should be selected using advanced indexing".
Specifically, see the paragraph that starts "This broadcasting can also be achieved using the function ix_".
In your case, the rows are [1, 2, 4] and the columns are the same, so:
rows = np.array([1, 2, 4], dtype=np.intp)
columns = np.array([1, 2, 4], dtype=np.intp)
b = a[np.ix_(rows, columns)]
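Put together as a runnable sketch, using a plain np.array instead of np.matrix (an assumption made here for simplicity; the indexing itself is the same):

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [18, 19, 20, 21, 22]])

rows = [1, 2, 4]
columns = [1, 2, 4]

# np.ix_ builds index arrays that broadcast to every (row, column) pair.
b = a[np.ix_(rows, columns)]

# Equivalent two-step form: pick the rows first, then the columns.
b_two_step = a[rows][:, columns]

print(b)
```

Note that a[rows, columns] without np.ix_ would pair the indices element by element and return only the three values a[1, 1], a[2, 2] and a[4, 4].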

Could someone please explain 3-D array slicing?

import numpy as np
A = np.array(
[ [ [45, 12, 4], [45, 13, 5], [46, 12, 6] ],
[ [46, 14, 4], [45, 14, 5], [46, 11, 5] ],
[ [47, 13, 2], [48, 15, 5], [52, 15, 1] ] ])
print(A[1:3, 0:2])
Please explain this. I have been struggling to understand it.
When accessing a 3D array this way, what you are actually asking for is to cut a part out of each nesting level of those arrays:
A[1:3, 0:2, 0:3]
# ↑↑↑
# Of the outer array (the outer []), take elements 1 (inclusive) to 3 (exclusive).
# Mind that counting starts at 0, so this is the second and third line in your example
A[1:3, 0:2, 0:3]
#      ↑↑↑
# Out of the second level array, take the elements 0 (inclusive) to 2 (exclusive).
# This is the first and the second group of three numbers each
A[1:3, 0:2, 0:3]
#           ↑↑↑
# This you did not specify, but it is added automatically
# Of the third level arrays, take element 0 (inclusive) to 3 (exclusive)
# Those arrays only have 3 numbers each, so they are left untouched.
In [483]: A = np.array(
...: [ [ [45, 12, 4], [45, 13, 5], [46, 12, 6] ],
...: [ [46, 14, 4], [45, 14, 5], [46, 11, 5] ],
...: [ [47, 13, 2], [48, 15, 5], [52, 15, 1] ] ])
The whole 3d array. If you need to put names on the dimensions, I'd suggest 'plane', 'row' and 'column':
In [484]: A
Out[484]:
array([[[45, 12, 4],
[45, 13, 5],
[46, 12, 6]],
[[46, 14, 4],
[45, 14, 5],
[46, 11, 5]],
[[47, 13, 2],
[48, 15, 5],
[52, 15, 1]]])
In [485]: A.shape
Out[485]: (3, 3, 3)
Taking a slice on the first dimension (the last 2 planes):
In [486]: A[1:3]
Out[486]:
array([[[46, 14, 4],
[45, 14, 5],
[46, 11, 5]],
[[47, 13, 2],
[48, 15, 5],
[52, 15, 1]]])
Taking 2 rows from each of those planes:
In [487]: A[1:3, 0:2]
Out[487]:
array([[[46, 14, 4],
[45, 14, 5]],
[[47, 13, 2],
[48, 15, 5]]])
The last dimension, columns, is left whole, the equivalent of A[1:3, 0:2, :] (trailing slices are automatic).
3D slicing is just the same as 1d and 2d (and 4d etc). There's nothing special or really different about 3d.
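The points above can be checked with a short sketch: the two-index slice and the explicit three-index slice select the same data, and adding an integer index on the last axis drops that dimension.

```python
import numpy as np

A = np.array(
    [[[45, 12, 4], [45, 13, 5], [46, 12, 6]],
     [[46, 14, 4], [45, 14, 5], [46, 11, 5]],
     [[47, 13, 2], [48, 15, 5], [52, 15, 1]]])

sub = A[1:3, 0:2]            # last 2 planes, first 2 rows, all columns
explicit = A[1:3, 0:2, :]    # the same selection with the trailing slice spelled out

print(sub.shape)                      # (2, 2, 3)
print(np.array_equal(sub, explicit))  # True

# An integer instead of a slice removes that axis from the result.
first_col = A[1:3, 0:2, 0]
print(first_col.shape)                # (2, 2)
```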

How to find the row and column of an element in a 2D array in Python?

I have this 2D array:
import numpy as np
R = int(input("Enter the number of rows:")) # 4
C = int(input("Enter the number of columns:")) # 5
randnums= np.random.randint(1,100, size=(R,C))
print(randnums)
[[98 25 33 9 41]
[67 32 67 27 85]
[38 79 52 40 58]
[84 76 44 9 2]]
Now, what I want is to search for an element and have the output be its row and column.
Example:
enter number to search : 40
Number 40 found in row 3 column 4
enter number : 100
number not found
Something like this?
Thanks in advance.
l = [[98, 25, 33, 9, 41],
[67, 32, 67, 27, 85],
[38, 79, 52, 40, 58],
[84, 76, 44, 9, 2]]
def fnd(l,value):
    # Return the 1-based row and column of the first match, or -1/-1 if absent
    for i,v in enumerate(l):
        if value in v:
            return {'row':i+1,'col':v.index(value)+1}
    return {'row':-1,'col':-1}
print(fnd(l,40))
{'row': 3, 'col': 4}
If the number of columns is constant, as shown in the example, you can search using the code below.
a = [[98, 25, 33, 9, 41],
[67, 32, 67, 27, 85],
[38, 79, 52, 40, 58],
[84, 76, 44, 9, 2]]
a_ind = [p[x] for p in a for x in range(len(p))] # Construct 1d array as index for a to search efficiently.
def find(x):
    return a_ind.index(x) // 5 + 1, a_ind.index(x) % 5 + 1 # Here 5 is the number of columns
print(find(98), find(58), find(40))
#Output
(1, 1) (3, 5) (3, 4)
You can use the numpy.where function.
r = np.random.randint(1,10, size=(5,5))
# array([[6, 5, 3, 1, 8],
# [3, 9, 7, 5, 6],
# [6, 2, 5, 5, 8],
# [1, 5, 1, 1, 1],
# [1, 6, 5, 8, 6]])
s = np.where(r == 8)
# the first array is row indices, second is column indices
# (array([0, 2, 4], dtype=int64), array([4, 4, 3], dtype=int64))
s = np.array(np.where(r == 8)).T
# transpose to get 2d array of indices
# array([[0, 4],
# [2, 4],
# [4, 3]], dtype=int64)
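To get output in the shape the question asks for, a minimal sketch using np.argwhere (which returns the same row/column pairs as the transposed np.where result) could look like this. The helper name find_positions is hypothetical, and positions are reported 1-based to match the question's example.

```python
import numpy as np

def find_positions(arr, value):
    """Return 1-based (row, column) pairs for every cell equal to value."""
    hits = np.argwhere(arr == value)  # one [row, col] pair per match
    return [(int(r) + 1, int(c) + 1) for r, c in hits]

randnums = np.array([[98, 25, 33, 9, 41],
                     [67, 32, 67, 27, 85],
                     [38, 79, 52, 40, 58],
                     [84, 76, 44, 9, 2]])

for target in (40, 100):
    positions = find_positions(randnums, target)
    if positions:
        for row, col in positions:
            print(f"Number {target} found in row {row} column {col}")
    else:
        print("number not found")
```

Unlike the list-based versions, this reports every occurrence, so a value that appears more than once (such as 9 here) yields several positions.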
