I have created a pandas dataframe using this code:
import numpy as np
import pandas as pd
ds = {'col1': [1,2,3,3,3,6,7,8,9,10]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
col1
0 1
1 2
2 3
3 3
4 3
5 6
6 7
7 8
8 9
9 10
I need to create a column called col2 that contains, for each row, a list of the last 3 elements of col1 up to and including that row (so the first rows get shorter lists).
Does anyone know how to do it by any chance?
Here is a solution using rolling and a list comprehension:
df['col2'] = [x.tolist() for x in df['col1'].rolling(3)]
col1 col2
0 1 [1]
1 2 [1, 2]
2 3 [1, 2, 3]
3 3 [2, 3, 3]
4 3 [3, 3, 3]
5 6 [3, 3, 6]
6 7 [3, 6, 7]
7 8 [6, 7, 8]
8 9 [7, 8, 9]
9 10 [8, 9, 10]
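If you need a different window size, the same rolling pattern generalizes; a small sketch (the helper name and the n parameter are mine, not from the answer):

```python
import pandas as pd

def last_n_as_list(s, n):
    # Iterating a Rolling object yields each window as a Series;
    # the first windows are shorter because fewer rows precede them.
    return [w.tolist() for w in s.rolling(n)]

df = pd.DataFrame({'col1': [1, 2, 3, 3, 3, 6, 7, 8, 9, 10]})
df['col2'] = last_n_as_list(df['col1'], 3)
print(df)
```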
Use a list comprehension:
N = 3
l = df['col1'].tolist()
df['col2'] = [l[max(0,i-N+1):i+1] for i in range(df.shape[0])]
Output:
col1 col2
0 1 [1]
1 2 [1, 2]
2 3 [1, 2, 3]
3 3 [2, 3, 3]
4 3 [3, 3, 3]
5 6 [3, 3, 6]
6 7 [3, 6, 7]
7 8 [6, 7, 8]
8 9 [7, 8, 9]
9 10 [8, 9, 10]
Upon seeing the other answers, I realize mine is pretty naive. Anyway, here it is.
import pandas as pd
ds = {'col1': [1,2,3,3,3,6,7,8,9,10]}
df = pd.DataFrame(data=ds)
df['col2'] = df['col1'].shift(1)
df['col3'] = df['col2'].shift(1)
df['col4'] = (df[['col3','col2','col1']]
.apply(lambda x:','.join(x.dropna().astype(str)),axis=1)
)
The last column contains the values joined as a comma-separated string (not an actual list).
col1 col4
0 1 1.0
1 2 1.0,2.0
2 3 1.0,2.0,3.0
3 3 2.0,3.0,3.0
4 3 3.0,3.0,3.0
5 6 3.0,3.0,6.0
6 7 3.0,6.0,7.0
7 8 6.0,7.0,8.0
8 9 7.0,8.0,9.0
9 10 8.0,9.0,10.0
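A variant of the same shift idea that yields actual lists instead of strings (my sketch, not the original answer):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 3, 3, 6, 7, 8, 9, 10]})
s1 = df['col1'].shift(1)
s2 = df['col1'].shift(2)
# Collect the shifted values per row, dropping the NaNs that shift
# introduces in the first rows, and cast back to int
df['col2'] = [[int(v) for v in (a, b, c) if pd.notna(v)]
              for a, b, c in zip(s2, s1, df['col1'])]
print(df)
```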
lastThree = []
for x in range(len(df)):
    # max(0, ...) stops negative positions from wrapping around to the end of the frame
    lastThree.append(df['col1'].iloc[max(0, x - 2):x + 1].tolist())
df['col2'] = lastThree
I have two dataframes, one has a column that contains a list of values and the other one has some values.
I want to filter the main df if one of the values in the second df exists in the main df column.
Code:
import pandas as pd
A = pd.DataFrame({'index':[0,1,2,3,4], 'vals':[[1,2],[5,4],[7,1,26],['-'],[9,8,5]]})
B = pd.DataFrame({'index':[4,7], 'val':[1,8]})
print(A)
print(B)
print(B['val'].isin(A['vals'])) # Will not work since it compares each element to a whole list
result = pd.DataFrame({'index':[0,2,4], 'vals':[[1,2],[7,1,26],[9,8,5]]})
Dataframe A
index  vals
0      [1, 2]
1      [5, 4]
2      [7, 1, 26]
3      [-]
4      [9, 8, 5]

Dataframe B
index  val
4      1
7      8

Result
index  vals
0      [1, 2]
2      [7, 1, 26]
4      [9, 8, 5]
You can explode your vals column then compute the intersection:
>>> A.loc[A['vals'].explode().isin(B['val']).loc[lambda x: x].index]
index vals
0 0 [1, 2]
2 2 [7, 1, 26]
4 4 [9, 8, 5]
Detail about explode:
>>> A['vals'].explode()
0 1
0 2
1 5
1 4
2 7 # not in B -|
2 1 # in B | -> keep index 2
2 26 # not in B -|
3 -
4 9
4 8
4 5
Name: vals, dtype: object
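Put together as a self-contained sketch of the explode approach (the keep name is mine; .unique() guards against a row matching more than one value in B):

```python
import pandas as pd

A = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                  'vals': [[1, 2], [5, 4], [7, 1, 26], ['-'], [9, 8, 5]]})
B = pd.DataFrame({'index': [4, 7], 'val': [1, 8]})

# explode repeats each row label once per list element, so after the
# membership test the surviving labels identify the rows to keep
keep = A['vals'].explode().isin(B['val'])
result = A.loc[keep[keep].index.unique()]
print(result)
```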
You can use:
# keep rows whose list shares at least one value with B
mask = A['vals'].apply(lambda a: bool(set(a) & set(B['val'])))
result = A[mask]
print(result)
Output:
index vals
0 0 [1, 2]
2 2 [7, 1, 26]
4 4 [9, 8, 5]
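An alternative worth considering: build one set from B's values and filter with set.isdisjoint, which avoids rebuilding the set on every row (a sketch, using the same data):

```python
import pandas as pd

A = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                  'vals': [[1, 2], [5, 4], [7, 1, 26], ['-'], [9, 8, 5]]})
B = pd.DataFrame({'index': [4, 7], 'val': [1, 8]})

b_vals = set(B['val'])  # built once, not once per row
# isdisjoint is True when a row shares nothing with b_vals, so negate it
result = A[~A['vals'].map(b_vals.isdisjoint)]
print(result)
```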
I would like to extend an existing pandas DataFrame and fill the new column successively:
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': [7, 8, 9, 10, 11, 12]})
df['col3'] = pd.Series(['a' for x in df[:3]])
df['col3'] = pd.Series(['b' for x in df[3:4]])
df['col3'] = pd.Series(['c' for x in df[4:]])
I would expect a result as follows:
col1 col2 col3
0 1 7 a
1 2 8 a
2 3 9 a
3 4 10 b
4 5 11 c
5 6 12 c
However, my code fails and I get:
col1 col2 col3
0 1 7 a
1 2 8 a
2 3 9 NaN
3 4 10 NaN
4 5 11 NaN
5 6 12 NaN
What is wrong?
Use the loc accessor:
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': [7, 8, 9, 10, 11, 12]})
df.loc[:2,'col3'] = 'a'
df.loc[3,'col3'] = 'b'
df.loc[4:,'col3'] = 'c'
df
   col1  col2 col3
0     1     7    a
1     2     8    a
2     3     9    a
3     4    10    b
4     5    11    c
5     6    12    c
As @Amirhossein Kiani and @Emma note in the comments, you're never using df itself to compute the new values, so there is no need to slice it. Since you can assign a list to a DataFrame column, the following suffices:
df['col3'] = ['a'] * 3 + ['b'] + ['c'] * (len(df) - 4)
You can also use numpy.select to assign values. The idea is to build a list of boolean conditions over index ranges and pick the matching value for each row: if the index is less than 3, select 'a'; if it equals 3 (which is what between(3, 4, inclusive='left') checks), select 'b'; otherwise fall back to the default 'c'.
import numpy as np
df['col3'] = np.select([df.index<3, df.index.to_series().between(3, 4, inclusive='left')], ['a','b'], 'c')
Output:
col1 col2 col3
0 1 7 a
1 2 8 a
2 3 9 a
3 4 10 b
4 5 11 c
5 6 12 c
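The np.select approach as a runnable sketch (I've simplified between(3, 4, inclusive='left') to a plain equality, which is equivalent here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': [7, 8, 9, 10, 11, 12]})
# The first matching condition wins; 'c' is the default for all other rows
df['col3'] = np.select([df.index < 3, df.index == 3], ['a', 'b'], default='c')
print(df)
```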
Every time you do something like df['col3'] = pd.Series(['a' for x in df[:3]]), you replace the whole col3 column with that new Series, so only the last assignment survives. Worse, iterating a DataFrame yields its column names, not its rows, so the comprehension doesn't even produce one value per row. One alternative is to build the new column separately, then assign it to the df once.
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': [7, 8, 9, 10, 11, 12]})
new_col = ['a' for _ in range(3)] + ['b'] + ['c' for _ in range(4, len(df))]
df['col3'] = pd.Series(new_col)
I am trying to create a DataFrame like this:
column_names= ["a", "b", "c"]
vals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(vals, columns=column_names)
Which results in the following DataFrame:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
I suppose this is the expected result. However, I am trying to achieve this result:
a b c
0 1 4 7
1 2 5 8
2 3 6 9
Where each nested list in vals corresponds to a whole column instead of a row.
Is there a way to get the above DataFrame without changing the way the data is passed to the constructor? Or even a method I can call to transpose the DataFrame?
Just zip it:
df = pd.DataFrame(dict(zip(column_names, vals)))
Outputs:
a b c
0 1 4 7
1 2 5 8
2 3 6 9
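To see why this works: zip pairs each name with one inner list, so the dict maps column names to column data, which is exactly the mapping form the DataFrame constructor expects. A quick sketch:

```python
import pandas as pd

column_names = ["a", "b", "c"]
vals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

mapping = dict(zip(column_names, vals))
# Each inner list is now the data for one column rather than one row
df = pd.DataFrame(mapping)
print(df)
```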
Do the column naming in a separate step -
column_names= ["a", "b", "c"]
vals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(vals).T
df.columns = column_names
a b c
0 1 4 7
1 2 5 8
2 3 6 9
Or if you can use numpy, you can do it in one step -
import numpy as np
column_names= ["a", "b", "c"]
vals = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(vals.T, columns=column_names)
print(df)
a b c
0 1 4 7
1 2 5 8
2 3 6 9
Use transpose (df.T):
In [3397]: df = df.T.reset_index(drop=True)
In [3398]: df.columns = column_names
In [3399]: df
Out[3399]:
a b c
0 1 4 7
1 2 5 8
2 3 6 9
If you are using the constructor, the simplest approach is zip with list unpacking via *:
column_names= ["a", "b", "c"]
vals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(zip(*vals), columns=column_names)
print (df)
a b c
0 1 4 7
1 2 5 8
2 3 6 9
Solutions if the DataFrame was already created:
df = pd.DataFrame(vals, columns=column_names)
Use DataFrame.T and reassign columns with index values:
df1 = df.T
df1.columns, df1.index = df1.index, df1.columns
print (df1)
a b c
0 1 4 7
1 2 5 8
2 3 6 9
One line solution with transpose, DataFrame.set_axis and DataFrame.reset_index:
df1 = df.T.set_axis(column_names, axis=1).reset_index(drop=True)
print (df1)
a b c
0 1 4 7
1 2 5 8
2 3 6 9
Or transpose only the numpy array, thank you @Henry Yik:
df.loc[:] = df.T.to_numpy()
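Note that this in-place trick only works when the frame is square (as many rows as columns), since the transposed values must fit the original shape; a sketch of the failure mode:

```python
import pandas as pd

square = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list("abc"))
square.loc[:] = square.T.to_numpy()  # fine: a 3x3 transpose is still 3x3

rect = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list("abc"))
try:
    rect.loc[:] = rect.T.to_numpy()  # 3x2 values cannot fill a 2x3 frame
    shape_error = False
except ValueError:
    shape_error = True
print("raised ValueError:", shape_error)
```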
I want to do this with python and pandas.
Let's suppose that I have the following:
x_position y_position
0 [4, 2, 6] [1, 2, 9]
1 [1, 7] [3, 5]
and I finally want to have the following:
x_position y_position new_0_0 new_0_1 new_1_0 new_1_1 new_2_0 new_2_1
0 [4, 2, 6] [1, 2, 9] 4 1 2 2 6 9
1 [1, 7] [3, 5] 1 3 7 5 NaN NaN
It is not necessary that the new columns have names such as new_0_0; it can be 0_0 or even anything to be honest.
Secondly, it would be good if your code can work for more columns with lists e.g. with a z_position column too.
What is the most efficient way to do this?
Use a list comprehension with the DataFrame constructor and concat, sort by the second level of the resulting MultiIndex in columns with DataFrame.sort_index, and finally flatten the MultiIndex:
print (df)
x_position y_position z_position
0 [4, 2, 6] [1, 2, 9] [4,8,9]
1 [1, 7] [3, 5] [1,3]
comp = [pd.DataFrame(df[x].tolist()) for x in df.columns]
df1 = pd.concat(comp, axis=1, keys=range(len(df.columns))).sort_index(axis=1, level=1)
df1.columns = [f'new_{b}_{a}' for a, b in df1.columns]
print (df1)
new_0_0 new_0_1 new_0_2 new_1_0 new_1_1 new_1_2 new_2_0 new_2_1 \
0 4 1 4 2 2 8 6.0 9.0
1 1 3 1 7 5 3 NaN NaN
new_2_2
0 9.0
1 NaN
print (df.join(df1))
x_position y_position z_position new_0_0 new_0_1 new_0_2 new_1_0 \
0 [4, 2, 6] [1, 2, 9] [4, 8, 9] 4 1 4 2
1 [1, 7] [3, 5] [1, 3] 1 3 1 7
new_1_1 new_1_2 new_2_0 new_2_1 new_2_2
0 2 8 6.0 9.0 9.0
1 5 3 NaN NaN NaN
I have a dataframe in Python:
import pandas as pd
d = {'name': ['a', 'b', 'c', 'd', 'e'],
     'location1': [1, 2, 3, 8, 6],
     'location2': [2, 1, 4, 6, 8]}
df = pd.DataFrame(data=d)
df is as follow:
name location1 location2
0 a 1 2
1 b 2 1
2 c 3 4
3 d 8 6
4 e 6 8
I try to obtain a dataframe as:
name loc
0 a [1, 2]
1 b [2, 1]
2 c [3, 4]
3 d [8, 6]
4 e [6, 8]
How can I efficiently convert that?
Here are some suggestions.
Listification and Assignment
# pandas >= 0.24
df['loc'] = df[['location1', 'location2']].to_numpy().tolist()
# pandas < 0.24
df['loc'] = df[['location1', 'location2']].values.tolist()
df
name location1 location2 loc
0 a 1 2 [1, 2]
1 b 2 1 [2, 1]
2 c 3 4 [3, 4]
3 d 8 6 [8, 6]
4 e 6 8 [6, 8]
Remove the columns using drop.
(df.drop(['location1', 'location2'], axis=1)
 .assign(loc=df[['location1', 'location2']].to_numpy().tolist()))
name loc
0 a [1, 2]
1 b [2, 1]
2 c [3, 4]
3 d [8, 6]
4 e [6, 8]
zip with pop using List Comprehension
df['loc'] = [[x, y] for x, y in zip(df.pop('location1'), df.pop('location2'))]
# or
df['loc'] = [*map(list, zip(df.pop('location1'), df.pop('location2')))]
df
name loc
0 a [1, 2]
1 b [2, 1]
2 c [3, 4]
3 d [8, 6]
4 e [6, 8]
pop destructively removes the columns, so you get to assign and clean up in a single step.
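If you'd rather keep the original columns, a non-destructive alternative (my sketch, not part of the answer above) builds the lists with apply:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                   'location1': [1, 2, 3, 8, 6],
                   'location2': [2, 1, 4, 6, 8]})

# axis=1 passes each row of the selected columns to list()
df['loc'] = df[['location1', 'location2']].apply(list, axis=1)
print(df[['name', 'loc']])
```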