I am trying to write a function that performs operations on the columns of a data frame, but it raises an error: inside lambda x: x.variable, the name variable is taken literally instead of being replaced by the value it holds. How do I make x use the value stored in variable?
import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
def example(dataFrame, variable):
    dataFrame = dataFrame.assign(newColumn=lambda x: x.variable**2)
example(df, 'col1')
AttributeError: 'DataFrame' object has no attribute 'variable'
How can I fix this?
Change x.variable to x[variable] inside your lambda:
def example(dataFrame, variable):
    dataFrame = dataFrame.assign(newColumn=lambda x: x[variable]**2)
    return dataFrame  # you probably want to return the dataFrame as well
example(df, 'col1')
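Attribute access such as df.column_name (here x.variable) looks for a column that is literally named "variable"; it never looks at the value stored in the Python variable. Your data frame has no column with that name, hence the AttributeError. Bracket indexing, df[name] (here x[variable]), evaluates the variable first and uses its value ('col1') as the column name.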
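As a minimal sketch of the fix, the function below takes the column name as a string and uses bracket indexing inside the lambda. The names add_square, data_frame and column_name are made up for this illustration and are not from the question.

```python
import pandas as pd

def add_square(data_frame, column_name):
    # Bracket indexing uses the *value* of column_name ('col1'),
    # whereas x.column_name would look for a column literally
    # called "column_name".
    return data_frame.assign(newColumn=lambda x: x[column_name] ** 2)

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
squared = add_square(df, 'col1')
print(squared)
```

Note that assign returns a new data frame, so df itself is left without the newColumn column unless you reassign the result.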
Related
Please be patient I am new to Python and Pandas.
I have a lot of pandas dataframes, but some are duplicates. So I wrote a function that checks whether two dataframes are equal; if they are, one of them is deleted:
def check_eq(df1, df2):
    if df1.equals(df2):
        del[df2]
        print("Deleted %s" % (df_name))
The function works, but I would like the variable "df_name" to hold the name of the dataframe as a string.
I don't understand how to do this: the parameters df1 and df2 are dataframe objects, so how can I get their names at run time if I want to print them?
Thanks in advance.
What you are trying to use is an f-string.
def check_eq(df1, df2):
    if df1.equals(df2):
        del[df2]
        print(f"Deleted {df2.name}")
This version will not work as written, though: df2 is deleted right before its name attribute is read, so df2 is unbound by the time print is called.
Instead try this:
def check_eq(df1, df2):
    if df1.equals(df2):
        print(f"Deleted {df2.name}")
        del df2
Now, do note that your usage of 'del' is also not correct. I assume you want to delete the second dataframe in your code. However, del only removes the local name df2 inside the scope of the check_eq function; the dataframe in the calling code still exists. You should familiarize yourself with the concept of scope first: https://www.w3schools.com/python/python_scope.asp
The code I used:
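import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)
# .name is not a built-in DataFrame attribute; it is set by hand here
df1.name = 'dataframe1'
df2.name = 'dataframe2'
def check_eq(df1, df2):
    if df1.equals(df2):
        print(f"Deleted {df2.name}")
check_eq(df1, df2)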
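Since a dataframe does not know the name of the variable it is bound to, and del inside a function cannot remove the caller's variable, one possible alternative is to keep the dataframes in a dictionary keyed by name. The sketch below is only an illustration of that idea; the function drop_duplicate_frames and the dictionary keys are hypothetical names, not part of the question.

```python
import pandas as pd

def drop_duplicate_frames(frames):
    """Return a new dict that keeps only the first of each set of equal frames."""
    kept = {}
    for name, frame in frames.items():
        # Compare against every frame already kept
        if any(frame.equals(other) for other in kept.values()):
            print(f"Deleted {name}")
        else:
            kept[name] = frame
    return kept

d = {'col1': [1, 2], 'col2': [3, 4]}
frames = {
    'dataframe1': pd.DataFrame(data=d),
    'dataframe2': pd.DataFrame(data=d),  # duplicate of dataframe1
    'dataframe3': pd.DataFrame({'col1': [5, 6], 'col2': [7, 8]}),
}
unique_frames = drop_duplicate_frames(frames)
```

The name is now simply the dictionary key, and "deleting" a duplicate means leaving it out of the returned dictionary, so no scope issues arise.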
I am trying to figure out a way to convert a column in a data frame that is currently a list to a set.
#Converting column from a list to a set
df['Column2']=s.apply(lambda x: [x])
The error I am getting is shown below:
NameError: name 's' is not defined
Hope this helps. It uses a lambda to go over the column and wrap each value in a set.
import pandas as pd
df = pd.DataFrame({'Column1':[1,2,3],'Column2':[4,5,6]})
df['Column1'] = df['Column1'].apply(lambda x: {x})
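The snippet above wraps each scalar value in a one-element set. If the cells of the column already hold lists, as the question suggests, a possible variant is to pass the built-in set to apply so each list is converted. The sample data below is invented for illustration.

```python
import pandas as pd

# Assumed example data: Column2 holds a list in every row
df = pd.DataFrame({'Column1': [1, 2], 'Column2': [[4, 5, 5], [6, 7]]})

# Convert each list to a set; duplicates inside a list are dropped
df['Column2'] = df['Column2'].apply(set)
print(df)
```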
I'd like to create a new column B by applying a function on each row of column A, which is of data type object and filled with list data, in dataframe DF without changing the values of column A.
def f(i):
    if(type(i) is list):
        for j in range(0,len(i)):
            i[j]+=1
    else:
        i+=1
    return i
df = pd.DataFrame([1,1],columns=['A'])
df['A']=df['A'].astype(object)
df.at[[0,1],'A']=[1,2]
df['B']=df['A'].apply(lambda x: f(x))
Unfortunately the following happens: df['B'] = function(df['A']), but also df['A'] = function(df['A']).
Please note: df['A'] is a list, dtype is object (o).
To be clear: I want column A to remain as original. Can anyone tell me how to achieve this?
You want to use apply on column A:
df['B'] = df['A'].apply(function)
This calls the function on each value in A.
Essentially you are using the apply method of the Series object. More info:
pandas.Series.apply
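# Note: DataFrame.copy() does not recursively copy Python objects such as lists
# stored in an object column, so the function must not mutate its argument.
df2 = df.copy()
df['B'] = df2.apply(lambda row: function(row['A']), axis=1)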
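Column A changes because the function in the question modifies the list in place (i[j] += 1), and the list stored in A is the same object that is passed to the function. A minimal sketch of one way around this is to build and return a new list instead. The name increment and the sample data are assumptions made for this example.

```python
import pandas as pd

def increment(value):
    # Build a new list rather than changing the one stored in column A
    if isinstance(value, list):
        return [item + 1 for item in value]
    return value + 1

# Assumed example data: an object column holding a list and a plain integer
df = pd.DataFrame({'A': [[1, 2], 5]})
df['B'] = df['A'].apply(increment)
print(df)
```

Because increment never mutates its argument, no copy of the data frame is needed and column A keeps its original values.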
Please help me write a custom pandas function. I am confused about how to return specific row and column values as custom results. I want to return the column means without using slicing or predefined functions such as np.mean, and the only parameter passed to the custom function should be the dataset 'df'.
In plain terms, I want the function col_mean() to return the means of columns ['A','B'] when given the dataset "df", without using pandas slicing or predefined functions like mean/np.mean.
Below is my dataset; please show me the logic for computing the column means.
df = pd.DataFrame({'A': [10,20,30], 'B': [20, 30, 10]})
def col_men(df):
    means=[0 for i in range(df.shape[1])]
    for k in range(df.shape[1]):
        col_values=[row[k] for row in df]
        means[k]=sum(col_values)/float(len(df))
    return means
Instead of using range(df.shape[1]) use enumerate(df.columns), so you keep both name and position:
df = pd.DataFrame({'A': [10,20,30], 'B': [20, 30, 10]})
def col_men(df):
    means=[0 for i in range(df.shape[1])]
    for index, k in enumerate(df.columns):
        col_values=[row for row in df[k]]
        means[index]=sum(col_values)/len(df)
    return means
col_men(df)
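Iterating over a data frame directly yields its column names, which is why row[k] in the question's version does not return cell values. As a hedged alternative sketch, DataFrame.items() yields (column name, column values) pairs, so the means can be collected in a dictionary keyed by column name using only the built-in sum and len. The function name col_means is made up for this example.

```python
import pandas as pd

def col_means(df):
    # df.items() yields (column name, Series) pairs, one per column
    return {name: sum(values) / len(values) for name, values in df.items()}

df = pd.DataFrame({'A': [10, 20, 30], 'B': [20, 30, 10]})
means = col_means(df)
print(means)
```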
Here is my attempt.
The Dataframe I have now has a column which will decide how to deal with the features.
For example, df has two columns, DATA and TYPE. The TYPE has three classes: S1,S2 and S3. And I will define three different function based on the different type of samples.
#### S1
def f_s1(data):
    result = data+1
    return result
#### S2
def f_s2(data):
    result = data+2
    return result
#### S3
def f_s3(data):
    result = data+3
    return result
I also created a mapping dictionary:
f_map= {'S1':f_s1,'S2':f_s2, 'S3': f_s3}
Then, I use the pandas Series.map method to map these functions to the type of each sample.
df['result'] = df['TYPE'].map(f_map)(df['DATA'])
But it didn't work; it raised TypeError: 'Series' object is not callable.
Any advice would be appreciated!
df['TYPE'].map(f_map) creates a Series of functions, and a Series itself cannot be called, which is why you get the TypeError. If you want to apply each function to the corresponding value of your data column, one option is to use the zip() function as follows:
df['result'] = [func(data) for func, data in zip(df['TYPE'].map(f_map), df['DATA'])]
df
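For reference, here is a self-contained sketch of the zip approach that can be run as is. The sample data frame is an assumption (the question does not show its data), and the three functions are compact stand-ins for the ones defined in the question.

```python
import pandas as pd

def f_s1(data):
    return data + 1

def f_s2(data):
    return data + 2

def f_s3(data):
    return data + 3

f_map = {'S1': f_s1, 'S2': f_s2, 'S3': f_s3}

# Assumed sample data
df = pd.DataFrame({'TYPE': ['S1', 'S2', 'S3', 'S1'], 'DATA': [1, 1, 1, 1]})

# One function per row, paired with that row's DATA value
funcs = df['TYPE'].map(f_map)
df['result'] = [func(value) for func, value in zip(funcs, df['DATA'])]
print(df)
```

A TYPE value that is missing from f_map would map to NaN, and calling it would fail, so this sketch assumes every type has an entry.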
Alternatively, you can group by TYPE and then apply the function for each type (or group) to the DATA column of that group. This assumes your predefined functions use vectorized operations and therefore accept a Series as their parameter:
df = pd.DataFrame({'TYPE':['S1', 'S2', 'S3', 'S1'], 'DATA':[1, 1, 1, 1]})
df['result'] = (df.groupby('TYPE').apply(lambda g: f_map.get(g['TYPE'].iloc[0])(g['DATA']))
.reset_index(level = 0, drop = True))
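The same per-group idea can also be sketched with an explicit loop over the groups, which avoids relying on how groupby.apply handles the grouping column (that behaviour may differ between pandas versions). The lambdas below are stand-ins for f_s1, f_s2 and f_s3, and the sample data is the same assumed frame as above.

```python
import pandas as pd

# Stand-ins for the three vectorized functions from the question
f_map = {
    'S1': lambda data: data + 1,
    'S2': lambda data: data + 2,
    'S3': lambda data: data + 3,
}

df = pd.DataFrame({'TYPE': ['S1', 'S2', 'S3', 'S1'], 'DATA': [1, 1, 1, 1]})

# Apply each type's function to the whole DATA column of its group
parts = [f_map[key](group['DATA']) for key, group in df.groupby('TYPE')]

# Each part keeps its original row index, so assignment aligns the rows
df['result'] = pd.concat(parts)
print(df)
```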