I want to expand columns of lists into separate scalar columns, using Python and pandas.
Let's suppose that I have the following:
x_position y_position
0 [4, 2, 6] [1, 2, 9]
1 [1, 7] [3, 5]
and in the end I want to have the following:
x_position y_position new_0_0 new_0_1 new_1_0 new_1_1 new_2_0 new_2_1
0 [4, 2, 6] [1, 2, 9] 4 1 2 2 6 9
1 [1, 7] [3, 5] 1 3 7 5 NaN NaN
The new columns don't have to be named new_0_0; 0_0 or anything else is fine, to be honest.
Secondly, it would be good if the code also worked for more list columns, e.g. with a z_position column too.
What is the most efficient way to do this?
Use a list comprehension with the DataFrame constructor and concat, sort by the second level of the columns' MultiIndex with DataFrame.sort_index, and finally flatten the MultiIndex:
print(df)
x_position y_position z_position
0 [4, 2, 6] [1, 2, 9] [4, 8, 9]
1 [1, 7] [3, 5] [1, 3]
# Build one DataFrame per list column, so each list position becomes a column
comp = [pd.DataFrame(df[x].tolist()) for x in df.columns]
# Concatenate side by side, then sort so columns are grouped by list position
df1 = pd.concat(comp, axis=1, keys=range(len(df.columns))).sort_index(axis=1, level=1)
# Flatten the MultiIndex into names like new_<position>_<source column number>
df1.columns = [f'new_{b}_{a}' for a, b in df1.columns]
print(df1)
new_0_0 new_0_1 new_0_2 new_1_0 new_1_1 new_1_2 new_2_0 new_2_1 \
0 4 1 4 2 2 8 6.0 9.0
1 1 3 1 7 5 3 NaN NaN
new_2_2
0 9.0
1 NaN
print(df.join(df1))
x_position y_position z_position new_0_0 new_0_1 new_0_2 new_1_0 \
0 [4, 2, 6] [1, 2, 9] [4, 8, 9] 4 1 4 2
1 [1, 7] [3, 5] [1, 3] 1 3 1 7
new_1_1 new_1_2 new_2_0 new_2_1 new_2_2
0 2 8 6.0 9.0 9.0
1 5 3 NaN NaN NaN
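If you need this as a reusable step for arbitrary list columns, here is a minimal sketch wrapping the same idea in a helper (the name expand_list_columns is mine, not a pandas API):
import pandas as pd

def expand_list_columns(df):
    # One DataFrame per list column; reuse the original index so join aligns
    parts = [pd.DataFrame(df[c].tolist(), index=df.index) for c in df.columns]
    out = pd.concat(parts, axis=1, keys=range(len(df.columns)))
    # Group columns by list position, then flatten the MultiIndex names
    out = out.sort_index(axis=1, level=1)
    out.columns = [f'new_{pos}_{col}' for col, pos in out.columns]
    return df.join(out)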
I have created a pandas dataframe using this code:
import numpy as np
import pandas as pd
ds = {'col1': [1,2,3,3,3,6,7,8,9,10]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
col1
0 1
1 2
2 3
3 3
4 3
5 6
6 7
7 8
8 9
9 10
I need to create a field called col2 that contains, for each record, a list of the last 3 elements of col1 up to and including that record. So, the resulting dataframe would look like this:
   col1        col2
0     1         [1]
1     2      [1, 2]
2     3   [1, 2, 3]
3     3   [2, 3, 3]
4     3   [3, 3, 3]
5     6   [3, 3, 6]
6     7   [3, 6, 7]
7     8   [6, 7, 8]
8     9   [7, 8, 9]
9    10  [8, 9, 10]
Does anyone know how to do this?
Here is a solution using rolling and a list comprehension:
# Iterating a Rolling object yields one window (a Series) per row
df['col2'] = [x.tolist() for x in df['col1'].rolling(3)]
col1 col2
0 1 [1]
1 2 [1, 2]
2 3 [1, 2, 3]
3 3 [2, 3, 3]
4 3 [3, 3, 3]
5 6 [3, 3, 6]
6 7 [3, 6, 7]
7 8 [6, 7, 8]
8 9 [7, 8, 9]
9 10 [8, 9, 10]
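Note that iterating a Rolling object yields growing windows for the first rows (pandas does not pad them), which is why the leading lists are shorter than 3. A quick check on the first few rows:
for w in df['col1'].head(4).rolling(3):
    print(w.tolist())
# [1]
# [1, 2]
# [1, 2, 3]
# [2, 3, 3]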
Use a list comprehension:
N = 3
l = df['col1'].tolist()
# Slice the N values ending at each position; max(0, ...) clamps the start
# so the first rows get shorter lists instead of wrapping around
df['col2'] = [l[max(0, i - N + 1):i + 1] for i in range(df.shape[0])]
Output:
col1 col2
0 1 [1]
1 2 [1, 2]
2 3 [1, 2, 3]
3 3 [2, 3, 3]
4 3 [3, 3, 3]
5 6 [3, 3, 6]
6 7 [3, 6, 7]
7 8 [6, 7, 8]
8 9 [7, 8, 9]
9 10 [8, 9, 10]
Having seen the other answers, I realize mine is pretty clumsy.
Anyway, here it is.
import pandas as pd
ds = {'col1': [1,2,3,3,3,6,7,8,9,10]}
df = pd.DataFrame(data=ds)
# Shift twice so each row also sees the two previous values
df['col2'] = df['col1'].shift(1)
df['col3'] = df['col2'].shift(1)
# Join the three values per row, skipping the leading NaNs
df['col4'] = (df[['col3','col2','col1']]
              .apply(lambda x: ','.join(x.dropna().astype(str)), axis=1))
The last column contains the result, although as a comma-separated string rather than an actual list:
col1 col4
0 1 1.0
1 2 1.0,2.0
2 3 1.0,2.0,3.0
3 3 2.0,3.0,3.0
4 3 3.0,3.0,3.0
5 6 3.0,3.0,6.0
6 7 3.0,6.0,7.0
7 8 6.0,7.0,8.0
8 9 7.0,8.0,9.0
9 10 8.0,9.0,10.0
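If an actual list is wanted instead of a string, the same row-wise apply can collect the values directly (a sketch along the same lines, reusing the shifted col2/col3 from above):
# Collect the non-NaN window values per row as ints rather than joining strings
df['col4'] = (df[['col3', 'col2', 'col1']]
              .apply(lambda x: x.dropna().astype(int).tolist(), axis=1))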
lastThree = []
for x in range(len(df)):
    # Clamp the window start at 0 so the first rows don't wrap around
    # to the end of the frame via negative iloc indexing
    start = max(0, x - 2)
    lastThree.append(df['col1'].iloc[start:x + 1].tolist())
df['col2'] = lastThree
I have two dataframes: one has a column containing lists of values, and the other has single values.
I want to keep a row of the main df if any of the values in the second df appears in that row's list.
Code:
import pandas as pd
A = pd.DataFrame({'index':[0,1,2,3,4], 'vals':[[1,2],[5,4],[7,1,26],['-'],[9,8,5]]})
B = pd.DataFrame({'index':[4,7], 'val':[1,8]})
print(A)
print(B)
print(B['val'].isin(A['vals']))  # Will not work since it compares each element to a whole list
# Expected result:
result = pd.DataFrame({'index':[0,2,4], 'vals':[[1,2],[7,1,26],[9,8,5]]})
Dataframe A:
index  vals
0      [1, 2]
1      [5, 4]
2      [7, 1, 26]
3      [-]
4      [9, 8, 5]
Dataframe B:
index  val
4      1
7      8
Result:
index  vals
0      [1, 2]
2      [7, 1, 26]
4      [9, 8, 5]
You can explode your vals column, then check which exploded values appear in B:
>>> A.loc[A['vals'].explode().isin(B['val']).loc[lambda x: x].index]
index vals
0 0 [1, 2]
2 2 [7, 1, 26]
4 4 [9, 8, 5]
Detail about explode:
>>> A['vals'].explode()
0 1
0 2
1 5
1 4
2 7 # not in B -|
2 1 # in B | -> keep index 2
2 26 # not in B -|
3 -
4 9
4 8
4 5
Name: vals, dtype: object
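If you prefer a plain boolean mask over indexing back by label, the same explode logic can be reduced per row (a sketch, equivalent in result):
# A row qualifies if any of its exploded values appears in B
mask = A['vals'].explode().isin(B['val']).groupby(level=0).any()
result = A[mask]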
You can use:
# Precompute B's values as a set, then keep rows whose list intersects it
bvals = set(B['val'])
mask = A['vals'].apply(lambda a: bool(set(a) & bvals))
result = A[mask]
print(result)
Output:
index vals
0 0 [1, 2]
2 2 [7, 1, 26]
4 4 [9, 8, 5]
I'm looking to find the minimum value across level 1 of a MultiIndex ('time' in this example), but I'd like to retain all the other labels of the index.
import numpy as np
import pandas as pd
stack = [
[0, 1, 1, 5],
[0, 1, 2, 6],
[0, 1, 3, 2],
[0, 2, 3, 4],
[0, 2, 2, 5],
[0, 3, 2, 1],
[1, 1, 0, 5],
[1, 1, 2, 6],
[1, 1, 3, 7],
[1, 2, 2, 8],
[1, 2, 3, 9],
[2, 1, 7, 1],
[2, 1, 8, 3],
[2, 2, 3, 4],
[2, 2, 8, 1],
]
df = pd.DataFrame(stack)
df.columns = ['self', 'time', 'other', 'value']
df.set_index(['self', 'time', 'other'], inplace=True)
df.groupby(level=1).min() doesn't return the correct values:
value
time
1 1
2 1
3 1
Doing something like df.groupby(level=[0,1,2]).min() returns the original dataframe unchanged.
I swear I used to be able to do this by calling .min(level=1), but it now gives deprecation notices telling me to use the groupby form above, and the result seems different from what I remember. Am I stupid?
original:
value
self time other
0 1 1 5
2 6
3 2 #<-- min row
2 3 4 #<-- min row
2 5
3 2 1 #<-- min row
1 1 0 5 #<-- min row
2 6
3 7
2 2 8 #<-- min row
3 9
2 1 7 1 #<-- min row
8 3
2 3 4
8 1 #<-- min row
desired result:
value
self time other
0 1 3 2
2 3 4
3 2 1
1 1 0 5
2 2 8
2 1 7 1
2 8 1
Group by your first 2 levels, then use idxmin instead of min to get the full index of each minimal row. Finally, use loc to filter your original dataframe:
out = df.loc[df.groupby(level=['self', 'time'])['value'].idxmin()]
print(out)
# Output
value
self time other
0 1 3 2
2 3 4
3 2 1
1 1 0 5
2 2 8
2 1 7 1
2 8 1
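What makes the loc lookup work is that idxmin on the grouped column returns the full MultiIndex tuple of each minimal row; printing it should show something like:
print(df.groupby(level=['self', 'time'])['value'].idxmin())
# self  time
# 0     1       (0, 1, 3)
#       2       (0, 2, 3)
#       3       (0, 3, 2)
# 1     1       (1, 1, 0)
#       2       (1, 2, 2)
# 2     1       (2, 1, 7)
#       2       (2, 2, 8)
# Name: value, dtype: object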
Why not just groupby the first two index levels, rather than all three? Note that this drops the other level, so use the idxmin approach above if you need to keep it:
out = df.groupby(level=[0,1]).min()
Output:
>>> out
value
self time
0 1 2
2 4
3 1
1 1 5
2 8
2 1 1
2 1
I have a dataframe in Python:
import pandas as pd
d = {'name': ['a', 'b', 'c', 'd', 'e'],
     'location1': [1, 2, 3, 8, 6],
     'location2': [2, 1, 4, 6, 8]}
df = pd.DataFrame(data=d)
df is as follows:
name location1 location2
0 a 1 2
1 b 2 1
2 c 3 4
3 d 8 6
4 e 6 8
I am trying to obtain a dataframe like:
name loc
0 a [1, 2]
1 b [2, 1]
2 c [3, 4]
3 d [8, 6]
4 e [6, 8]
How can I do this conversion efficiently?
Here are some suggestions.
Listification and Assignment
# pandas >= 0.24
df['loc'] = df[['location1', 'location2']].to_numpy().tolist()
# pandas < 0.24
df['loc'] = df[['location1', 'location2']].values.tolist()
df
name location1 location2 loc
0 a 1 2 [1, 2]
1 b 2 1 [2, 1]
2 c 3 4 [3, 4]
3 d 8 6 [8, 6]
4 e 6 8 [6, 8]
Remove the columns using drop (note axis=1 must be passed as a keyword in recent pandas):
(df.drop(['location1', 'location2'], axis=1)
   .assign(loc=df[['location1', 'location2']].to_numpy().tolist()))
name loc
0 a [1, 2]
1 b [2, 1]
2 c [3, 4]
3 d [8, 6]
4 e [6, 8]
zip with pop using List Comprehension
df['loc'] = [[x, y] for x, y in zip(df.pop('location1'), df.pop('location2'))]
# or
df['loc'] = [*map(list, zip(df.pop('location1'), df.pop('location2')))]
df
name loc
0 a [1, 2]
1 b [2, 1]
2 c [3, 4]
3 d [8, 6]
4 e [6, 8]
pop destructively removes the columns, so you get to assign and clean up in a single step. Note that this also means the snippet can only be run once, since the source columns are gone afterwards.
In Python, I have a dataframe and want to divide numbers within the same column (for example, divide 6 by 9 in column c).
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),columns=['a', 'b', 'c'])
df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9
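A minimal sketch, assuming the intent is to divide each value by the value in the row below it (so 6 is divided by 9 in column c); the column name c_div_next is my own:
# shift(-1) moves column c up one row, pairing each value with the next one
df2['c_div_next'] = df2['c'] / df2['c'].shift(-1)
print(df2)
#    a  b  c  c_div_next
# 0  1  2  3    0.500000
# 1  4  5  6    0.666667
# 2  7  8  9         NaN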