python pandas DataFrame - assign a list to multiple cells - python

I have a DataFrame like
name col1 col2
a aa 123
a bb 123
b aa 234
and a list
[1, 2, 3]
I want to replace the col2 of every row with col1 = 'aa' with the list like
name col1 col2
a aa [1, 2, 3]
a bb 123
b aa [1, 2, 3]
I tried something like
df.loc[df[col1] == 'aa', col2] = [1, 2, 3]
but it gives me the error:
ValueError: could not broadcast input array from shape (xx,) into shape (yy,)
How should I get around this?

Make it simple. np.where should do. Code below
df['col2']=np.where(df['col1']=='aa', str(lst), df['col2'])
Alternatively use pd.Series with list locked in double brackects
df['col2']=np.where(df['col1']=='aa', pd.Series([lst]), df['col2'])

import pandas as pd
df = pd.DataFrame({"name":["a","a","b"],"col1":["aa","bb","aa"],"col2":[123,123,234]})
l = [1,2,3]
df["col2"] = df.apply(lambda x: l if x.col1 == "aa" else x.col2, axis =1)
df

A list comprehension with an if/else should work
df['col2'] = [x['col2'] if x['col1'] != 'aa' else [1,2,3] for ind,x in df.iterrows()]

It will be safe to do with for loop
df.col2 = df.col2.astype(object)
for x in df.index:
if df.at[x,'col1'] == 'aa':
df.at[x,'col2'] = [1,2,3]
df
name col1 col2
0 a aa [1, 2, 3]
1 a bb 123
2 b aa [1, 2, 3]

You can also use:
data = {'aa':[1,2,3]}
df['col2'] = np.where(df['col1'] == 'aa', df['col1'].map(data), df['col2'])
You should use this with care, as doing this will change list to both locations:
df['col2'].loc[0].append(5)
print(df)
#OUTPUT
name col1 col2
0 a aa [1, 2, 3, 5]
1 a bb 123
2 b aa [1, 2, 3, 5]
But this is fine:
df = df.loc[1:]
print(df)
#OUTPUT
name col1 col2
1 a bb 123
2 b aa [1, 2, 3]

Related

Pandas Dataframe add element to a list in a cell

I am trying something like this:
List append in pandas cell
But the problem is the post is old and everything is deprecated and should not be used anymore.
d = {'col1': ['TEST', 'TEST'], 'col2': [[1, 2], [1, 2]], 'col3': [35, 89]}
df = pd.DataFrame(data=d)
col1
col2
col3
TEST
[1, 2, 3]
35
TEST
[1, 2, 3]
89
My Dataframe looks like this, were there is the col2 is the one I am interested in. I need to add [0,0] to the lists in col2 for every row in the DataFrame. My real DataFrame is of dynamic shape so I cant just set every cell on its own.
End result should look like this:
col1
col2
col3
TEST
[1, 2, 3, 0, 0]
35
TEST
[1, 2, 3, 0, 0]
89
I fooled around with df.apply and df.assign but I can't seem to get it to work.
I tried:
df['col2'] += [0, 0]
df = df.col2.apply(lambda x: x.append([0,0]))
Which returns a Series that looks nothing like i need it
df = df.assign(new_column = lambda x: x + list([0, 0))
Not sure if this is the best way to go but, option 2 works with a little modification
import pandas as pd
d = {'col1': ['TEST', 'TEST'], 'col2': [[1, 2], [1, 2]], 'col3': [35, 89]}
df = pd.DataFrame(data=d)
df["col2"] = df["col2"].apply(lambda x: x + [0,0])
print(df)
Firstly, if you want to add all members of an iterable to a list use .extend instead of .append. This doesn't work because the method works inplace and doesn't return anything so "col2" values become None, so use list summation instead. Finally, you want to assign your modified column to the original DataFrame, not override it (this is the reason for the Series return)
One idea is use list comprehension:
df["col2"] = [x + [0,0] for x in df["col2"]]
print (df)
col1 col2 col3
0 TEST [1, 2, 0, 0] 35
1 TEST [1, 2, 0, 0] 89
for val in df['col2']:
val.append(0)

Combine lists from several columns into one nested list pandas

Here is my dataframe:
| col1 | col2 | col3 |
----------------------------------
[1,2,3,4] | [1,2,3,4] | [1,2,3,4]
I also have this function:
def joiner(col1,col2,col3):
snip = []
snip.append(col1)
snip.append(col2)
snip.append(col3)
return snip
I want to call this on each of the columns and assign it to a new column.
My end goal would be something like this:
| col1 | col2 | col3 | col4
------------------------------------------------------------------
[1,2,3,4] | [1,2,3,4] | [1,2,3,4] | [[1,2,3,4],[1,2,3,4],[1,2,3,4]]
Just .apply list on axis=1, it'll create lists for each rows
>>> df['col4'] = df.apply(list, axis=1)
OUTPUT:
col1 col2 col3 col4
0 [1, 2, 3, 4] [1, 2, 3, 4] [1, 2, 3, 4] [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
You can just do
df['col'] = df.values.tolist()

combine two column and create new column using pandas library

df = pd.read_csv("school_data.csv")
col1 col2
0 [1,2,3] [4,5,6]
1 [0,5,3] [6,2,5]
want o/p
col1 col2 col3
0 [1,2,3] [4,5,6] [1,2,3,4,5,6]
1 [0,5,3] [6,2,5] [0,5,3,6,2,5]
col1 and col2 value are unique,
using pandas
Simplest way would be to do this:
df['col3'] = df['col1'] + df['col2']
Example:
import pandas as pd
row1 = [[1,2,3], [4,5,6]]
row2 = [[0,5,3], [6,2,5]]
df = pd.DataFrame(data=[row1, row2], columns=['col1', 'col2'])
df['col3'] = df['col1'] + df['col2']
print(df)
Output:
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [0, 5, 3] [6, 2, 5] [0, 5, 3, 6, 2, 5]
You can use apply function on more than one column at once, like this:
def func(x):
return x['col1'] + x['col2']
df['col3'] = df[['col1','col2']].apply(func, axis=1)
Why not do a simple df['col1'] + df['col2']?
Assume col1 has list but in str types. In that case you can always modify func to:
def func(x):
return x['col1'][1:-1].split(',') + x['col2']

Create new pandas column with same list as every row?

I would like to create a new column in a dataframe that has a list at every row. I'm looking for something that will accomplish the following:
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col] = list_
A B new_col
0 1 x [1,2,3]
1 2 y [1,2,3]
2 3 z [1,2,3]
Does anyone know how to accomplish this? Thank you!
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = [list_]*len(df)
Output:
A B new_col
0 1 x [1, 2, 3]
1 2 y [1, 2, 3]
2 3 z [1, 2, 3]
Tip: list as a variable name is not advised. list is a built in type like str, int etc.
df['new_col'] = pd.Series([mylist for x in range(len(df.index))])
("list" is a terrible variable name, so I'm using "mylist" in this example).
df['new_col'] = [[1,2,3] for j in range(df.shape[0])]

how does pandas.Series.str.get method work?

I have a pandas.Series named matches like this:
When I called pandas.Series.str.get method on it, it returns a new Series with its values all NaN:
I have read the document pandas.Series.str.get, but still can't understand it.
It return second element from iterable, it is same as str[1]:
df = pd.DataFrame({"A": [[1,2,3], [0,1,3]], "B":['aswed','yuio']})
print (df)
A B
0 [1, 2, 3] aswed
1 [0, 1, 3] yuio
df['C'] = df['A'].str.get(1)
df['C1'] = df['A'].str[1]
df['D'] = df['B'].str.get(1)
df['D1'] = df['B'].str[1]
print (df)
A B C C1 D D1
0 [1, 2, 3] aswed 2 2 s s
1 [0, 1, 3] yuio 1 1 u u

Categories

Resources