When I run the following code I get a KeyError: ('a', 'occurred at index a'). How can I apply this function, or something similar, over the DataFrame without encountering this issue?
Running Python 3.6, pandas v0.22.0.
import numpy as np
import pandas as pd

def add(a, b):
    return a + b

df = pd.DataFrame(np.random.randn(3, 3),
                  columns=['a', 'b', 'c'])
df.apply(lambda x: add(x['a'], x['c']))
I think you need the parameter axis=1 so apply processes by rows:
axis: {0 or 'index', 1 or 'columns'}, default 0
0 or index: apply function to each column
1 or columns: apply function to each row
df = df.apply(lambda x: add(x['a'], x['c']), axis=1)
print (df)
0 -0.802652
1 0.145142
2 -1.160743
dtype: float64
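If it helps to see the difference, here is a small sketch (with a made-up frame) contrasting the two axis values:

```python
import pandas as pd

# Hypothetical frame just to illustrate the axis parameter
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis=0 (default): the function receives each *column* as a Series
col_sums = df.apply(lambda col: col.sum())          # a -> 3, b -> 30

# axis=1: the function receives each *row* as a Series,
# so row['a'] and row['b'] are valid lookups
row_sums = df.apply(lambda row: row['a'] + row['b'], axis=1)

print(col_sums.tolist())  # [3, 30]
print(row_sums.tolist())  # [11, 22]
```

With the default axis=0, each column Series is indexed by the row labels, so `x['a']` raises the KeyError you saw.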
You don't even need apply; you can add the columns directly. The output will be a Series either way:
df = df['a'] + df['c']
for example:
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
df = df['a'] + df['c']
print(df)
# 0 6
# 1 8
# dtype: int64
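One hedge worth knowing: plain + propagates NaN. If your columns can contain missing values, Series.add accepts a fill_value (sketch below with an invented frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'c': [5.0, 6.0]})

# Plain + propagates NaN; Series.add can substitute a fill value instead
plain = df['a'] + df['c']                    # second row becomes NaN
filled = df['a'].add(df['c'], fill_value=0)  # treats the NaN as 0 -> 6.0

print(plain.tolist())   # [6.0, nan]
print(filled.tolist())  # [6.0, 6.0]
```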
You can try this:
import numpy as np
import pandas as pd

def add(df):
    return df.a + df.b

df = pd.DataFrame(np.random.randn(3, 3),
                  columns=['a', 'b', 'c'])
df.apply(add, axis=1)
where of course you can substitute any function that takes the columns of df as inputs.
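For instance, a hypothetical substitute (the function name and the weighting are invented for illustration):

```python
import pandas as pd

# Any row-wise function of the columns works in place of add
def weighted(row):
    return 2 * row.a - row.c

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [0.0, 0.0], 'c': [3.0, 4.0]})
result = df.apply(weighted, axis=1)
print(result.tolist())  # [-1.0, 0.0]
```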
Related
When I use the DataFrame.assign() method in my own function foobar, it has no effect on the "global" DataFrame.
#!/usr/bin/env python3
import pandas as pd

def foobar(df):
    # has no effect on the "global" df
    df.assign(Z=lambda x: x.A + x.B)
    return df

data = {'A': range(3),
        'B': range(3)}
df = pd.DataFrame(data)
df = foobar(df)
# There is no 'Z' column in this df
print(df)
The resulting output:
A B
0 0 0
1 1 1
2 2 2
I assume this has something to do with the difference between views and copies in Pandas. But I am not sure how to handle this the right and elegant Pandas way.
Pandas assign returns a new DataFrame, so you need to assign the result back to df. Try this:
def foobar(df):
    df = df.assign(Z=lambda x: x.A + x.B)
    return df
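For completeness, a sketch of both idioms: rebinding the assign result versus assigning a column in place, which does mutate the frame the caller passed in (the column names Z and Z2 are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': range(3), 'B': range(3)})

# 1) keep the functional style: rebind the result of assign
df = df.assign(Z=lambda x: x.A + x.B)

# 2) or mutate the frame directly inside the function
def foobar(frame):
    frame['Z2'] = frame['A'] + frame['B']  # modifies the caller's object
    return frame

foobar(df)
print(df['Z'].tolist())   # [0, 2, 4]
print(df['Z2'].tolist())  # [0, 2, 4]
```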
I want to concatenate items that are stored in list format in a dataframe.
I have the data frame below; when I print DataFrame.head(), it shows:
A B
1 [1,2,3,4]
2 [5,6,7,8]
Expected result (convert from a list to a string separated by commas):
A B
1 1,2,3,4
2 5,6,7,8
You could do:
import pandas as pd

data = [[1, [1, 2, 3, 4]],
        [2, [5, 6, 7, 8]]]
df = pd.DataFrame(data=data, columns=['A', 'B'])
df['B'] = [','.join(map(str, lst)) for lst in df.B]
print(df.head(2))
Output
A B
0 1 1,2,3,4
1 2 5,6,7,8
You can use the map or apply methods for this:
import pandas as pd

data = [[1, [1, 2, 3, 4]],
        [2, [5, 6, 7, 8]]]
df = pd.DataFrame(data=data, columns=['A', 'B'])
df['B'] = df['B'].map(lambda x: ",".join(map(str, x)))
# or
# df['B'] = df['B'].apply(lambda x: ",".join(map(str, x)))
print(df.head(2))
df = pd.DataFrame([['1',[1,2,3,4]],['2',[5,6,7,8]]], columns=list('AB'))
This is a generic way to convert lists to strings. In your example the lists hold ints, but they could hold any type that can be represented as a string; join the elements with ','.join(map(str, a_list)). Then iterate through the rows of the specific column that contains the lists you want to join:
for i, row in df.iterrows():
    df.loc[i, 'B'] = ','.join(map(str, row['B']))
I have a dataframe with a lot of columns using the suffix '_o'. Is there a way to drop all columns whose labels end with '_o'?
In this post I've seen a way to drop columns that start with something using the filter function. But how do I drop the ones that end with something?
Pandonic
df = df.loc[:, ~df.columns.str.endswith('_o')]
df = df[df.columns[~df.columns.str.endswith('_o')]]
List comprehensions
df = df[[x for x in df if not x.endswith('_o')]]
df = df.drop([x for x in df if x.endswith('_o')], axis=1)
To use df.filter() properly here, you could give it a regex with a negative lookbehind:
>>> df = pd.DataFrame({'a': [1, 2], 'a_o': [2, 3], 'o_b': [4, 5]})
>>> df.filter(regex=r'.*(?<!_o)$')
a o_b
0 1 4
1 2 5
This can be done by re-assigning the dataframe with only the needed columns
df = df.iloc[:, [not o.endswith('_o') for o in df.columns]]
I am facing an issue where passing a numpy array to a DataFrame without column names initializes it properly, whereas if I pass column names, the DataFrame is empty.
x = np.array([(1, '1'), (2, '2')], dtype = 'i4,S1')
df = pd.DataFrame(x)
In []: df
Out[]:
f0 f1
0 1 1
1 2 2
df2 = pd.DataFrame(x, columns=['a', 'b'])
In []: df2
Out[]:
Empty DataFrame
Columns: [a, b]
Index: []
I think you need to specify the column names in the dtype parameter; see DataFrame from structured or record array:
x = np.array([(1, '1'), (2, '2')], dtype=[('a', 'i4'),('b', 'S1')])
df2 = pd.DataFrame(x)
print (df2)
a b
0 1 b'1'
1 2 b'2'
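Note the b'1' in the output above: under Python 3 the 'S1' field comes back as bytes. A hedged follow-up, either decoding the column afterwards or declaring a unicode field ('U1') from the start:

```python
import numpy as np
import pandas as pd

x = np.array([(1, '1'), (2, '2')], dtype=[('a', 'i4'), ('b', 'S1')])
df2 = pd.DataFrame(x)

# 'S1' is a bytes dtype under Python 3, so decode to get plain strings
df2['b'] = df2['b'].str.decode('utf-8')
print(df2['b'].tolist())  # ['1', '2']

# Alternatively declare a unicode field ('U1') and skip the decode
y = np.array([(1, '1'), (2, '2')], dtype=[('a', 'i4'), ('b', 'U1')])
print(pd.DataFrame(y)['b'].tolist())  # ['1', '2']
```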
Another solution without parameter dtype:
x = np.array([(1, '1'), (2, '2')])
df2 = pd.DataFrame(x, columns=['a', 'b'])
print (df2)
a b
0 1 1
1 2 2
It's the dtype param; without specifying it, this works as expected.
See the example at documentation DataFrame
import numpy as np
import pandas as pd

x = np.array([(1, "11"), (2, "22")])
df = pd.DataFrame(x)
print(df)

df2 = pd.DataFrame(x, columns=['a', 'b'])
print(df2)
I have a problem with replacing text in a df. I tried to use the df.replace() function but in my case it failed. Here is my example:
df = pd.DataFrame({'col_a':['A', 'B', 'C'], 'col_b':['_world1_', '-world1_', '*world1_']})
df = df.replace(to_replace='world1', value='world2')
Unfortunately this code doesn't change anything; I still have world1 in my df.
Does anyone have any suggestions?
Use vectorised str.replace to replace string matches in your text:
In [245]:
df = pd.DataFrame({'col_a':['A', 'B', 'C'], 'col_b':['_world1_', '-world1_', '*world1_']})
df['col_b'] = df['col_b'].str.replace('world1', 'world2')
df
Out[245]:
col_a col_b
0 A _world2_
1 B -world2_
2 C *world2_
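One caveat: in old pandas versions str.replace treats the pattern as a regex by default, while recent versions default to regex=False. Passing the flag explicitly keeps the behaviour stable across versions (sketch):

```python
import pandas as pd

df = pd.DataFrame({'col_b': ['_world1_', '-world1_', '*world1_']})

# Explicit regex=False: the pattern is a plain substring, so a literal
# '*' in the data needs no escaping and the flag is unambiguous
df['col_b'] = df['col_b'].str.replace('world1', 'world2', regex=False)
print(df['col_b'].tolist())  # ['_world2_', '-world2_', '*world2_']
```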
The value you want to replace does not exist: df.replace matches entire cell values by default, and no cell equals 'world1' exactly. Replacing a full cell value works:

import pandas as pd

df = pd.DataFrame({'col_a': ['A', 'B', 'C'], 'col_b': ['_world1_', '-world1_', '*world1_']})
print(df)
df = df.replace(to_replace='*world1_', value='world2')
print(df)
Here you go:
df.col_b = df.apply(lambda x: x.col_b.replace('world1', 'world2'), axis=1)
In [13]: df
Out[13]:
col_a col_b
0 A _world2_
1 B -world2_
2 C *world2_
There could be many more options; however, the replace function you are referring to can also be used with a regex:
In [21]: df.replace('(world1)','world2',regex=True)
Out[21]:
col_a col_b
0 A _world2_
1 B -world2_
2 C *world2_
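The regex form also accepts a dict, which is handy for several substitutions in one call (the second pattern here is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col_a': ['A', 'B'], 'col_b': ['_world1_', '-world1_']})

# Map several regex patterns to replacements in one call
df = df.replace({'world1': 'world2', r'^-': '+'}, regex=True)
print(df['col_b'].tolist())  # ['_world2_', '+world2_']
```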