Blank input to a defined function in Python

I have a defined function to slice a dataframe and do some analysis, like this:
    def df_slice(startrow, endrow):
        # do something...
        newdf = df[startrow:endrow]
        # do something...
        return newdf
Normally, to analyze the first few rows of a df, I can just use
    df1 = df_slice(0, 10)
But what if I wish to slice the last 5 rows of the dataframe, so that inside the function it becomes
    newdf = df[-5:]
I cannot use df1 = df_slice(-5, '') or just leave it blank like df1 = df_slice(-5,).
What should I do?

Found the answer: just pass None as the parameter.

    df[startrow, endrow]
is equivalent to
    df.__getitem__((startrow, endrow))
And
    df[startrow:]
is equivalent to
    df.__getitem__(slice(startrow, None))
Here is a sample snippet:
    class MyCollection:
        def __getitem__(self, item):
            return item

    my_collection = MyCollection()
    print(my_collection[1, 2])
    print(my_collection[1:])
Output:
    (1, 2)
    slice(1, None, None)  # start, stop, step
As you can see, an omitted slice element means None. So you can call
    newdf = df_slice(-5, None)
reference: https://docs.python.org/3/library/functions.html#slice
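Putting the answer together, a complete df_slice might look like this (a sketch, not the asker's exact code: the frame is passed in explicitly, and positional slicing via iloc is assumed since the question slices by row number):

```python
import pandas as pd

def df_slice(df, startrow, endrow):
    # do something...
    # slice() accepts None for either bound, just like df.iloc[startrow:endrow]
    newdf = df.iloc[slice(startrow, endrow)]
    # do something...
    return newdf

df = pd.DataFrame({"x": range(20)})

first_ten = df_slice(df, 0, 10)     # rows 0..9
last_five = df_slice(df, -5, None)  # last 5 rows, i.e. df[-5:]
```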

Related

if statement and call function for dataframe

I know how to apply an IF condition in a Pandas DataFrame. link
However, my question is how to do the following:
    if (df[df['col1'] == 0]):
        sys.path.append("/desktop/folder/")
        import self_module as sm
        df = sm.call_function(df)
What I really want to do is: when the value in col1 equals 0, call call_function().
    def call_function(ds):
        ds['new_age'] = (ds['age'] * 0.012345678901).round(12)
        return ds
I provide the simple example above for call_function().
Since your function interacts with multiple columns and returns a whole data frame, run the conditional logic inside the method:
    def call_function(ds):
        ds['new_age'] = np.nan
        ds.loc[ds['col'] == 0, 'new_age'] = ds['age'].mul(0.012345678901).round(12)
        return ds

    df = call_function(df)
If you are unable to modify the function, run the method on splits of the data frame and concat or append them together. Any new columns in the other split will have their values filled with NaN.
    def call_function(ds):
        ds['new_age'] = (ds['age'] * 0.012345678901).round(12)
        return ds

    df = pd.concat([call_function(df[df['col'] == 0].copy()),
                    df[df['col'] != 0].copy()])
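A minimal runnable version of the first approach, using hypothetical columns col and age as in the example above:

```python
import numpy as np
import pandas as pd

def call_function(ds):
    # default to NaN, then fill only the rows where col == 0
    ds['new_age'] = np.nan
    ds.loc[ds['col'] == 0, 'new_age'] = ds['age'].mul(0.012345678901).round(12)
    return ds

df = pd.DataFrame({'col': [0, 1, 0], 'age': [10, 20, 30]})
df = call_function(df)
```

Rows where col is not 0 keep NaN in new_age, so no if statement around the call is needed.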

How to apply a function in Pandas to a cell in every row where a different cell in that same row meets a condition?

I am trying to use the pandas string method "str.zfill" to add leading zeros to a cell in the same column for every row in the dataframe where another cell in that row meets a certain condition. So for any given row in my DataFrame "excodes", when the value in column "LOB_SVC_CD" is "MTG", apply the str.zfill(5) method to the cell in column "PS_CD". When the value in "LOB_SVC_CD" is not "MTG" leave the value in "PS_CD" as is.
I've tried a few custom functions, "np.where" and a few apply/map lambdas. I'm getting errors on all of them.
#Custom Function
    def add_zero(column):
        if excodes.loc[excodes.LOB_SVC_CD == 'MTG']:
            excodes.PS_CD.str.zfill(5)
        else:
            return excodes.PS_CD

    excodes['code'] = excodes.apply(add_zero)

#Custom Function with For Loop
    def add_zero2(column):
        code = []
        for row(i) in column:
            if excodes.LOB_SVC_CD == 'MTG':
                code.append(excodes.PS_CD.str.zfill(5))
            else:
                code.append(excodes.PS_CD)
        excodes['Code'] = code

    excodes['code'] = excodes.apply(add_zero)

#np.Where
    mask = excodes[excodes.LOB_SVC_CD == 'MTG']
    excodes['code'] = pd.DataFrame[np.where(mask, excodes.PS_CD.str.zfill(5), excodes.PS_CD)]

#Lambda
    excodes['code'] = excodes['LOB_SVC_CD'].map(lambda x: excodes.PS_CD.str.zfill(5)) if x[excodes.LOB_SVC_CD == 'MTG'] else excodes.PS_CD)

#Assign with a "Where"
    excodes.assign((excodes.PS_CD.str.zfill(5)).where(excodes.LOB_SVC_CD == 'MTG'))
Expected results will be either:
a new column called "code" in which all values from "PS_CD" are given leading zeroes in rows where excodes.LOB_SVC_CD == 'MTG', or
leading zeroes added to the values in excodes["PS_CD"] when the row's excodes['LOB_SVC_CD'] == 'MTG'
These are the error messages I'm getting on each of the approaches I've tried:
#Custom Function:
"ValueError: ('The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index PS_CD')"
# Custom Function with For Loop:
"SyntaxError: can't assign to function call"
#np.Where:
"ValueError: operands could not be broadcast together with shapes (152,7) (720,) (720,)"
#Apply Lambda:
"string indices must be integers"
#Assign with a "Where":
"TypeError: assign() takes 1 positional argument but 2 were given"
This seems to work :)
    # Ensure the data in the PS_CD are strings
    data["PS_CD"] = data["PS_CD"].astype(str)

    # Iterate over all rows
    for index in data.index:
        # If the LOB_SVC_CD is "MTG"
        if data.loc[index, "LOB_SVC_CD"] == "MTG":
            # Apply zfill(5) to the PS_CD on the same row (index)
            data.loc[index, "PS_CD"] = data.loc[index, "PS_CD"].zfill(5)

    # Print the result
    print(data)
Alternative way (maybe a bit more Python-ish) :)
    # Ensure the data in the PS_CD are strings
    data["PS_CD"] = data["PS_CD"].astype(str)

    # Custom function for applying the zfill
    def my_zfill(x, y):
        return y.zfill(5) if x == "MTG" else y

    # Iterate over the data, applying the custom function on each row
    data["PS_CD"] = pd.Series([my_zfill(x, y) for x, y in zip(data["LOB_SVC_CD"], data["PS_CD"])])
My take:
    >>> import pandas
    >>> df = pandas.DataFrame(data = [['123', 'MTG'], ['321', 'CLOC']], columns = ['PS_CD', 'LOB_SVC_CD'])
    >>> df
      PS_CD LOB_SVC_CD
    0   123        MTG
    1   321       CLOC
    >>>
    >>> df['PS_CD'] = df.apply(lambda row: row['PS_CD'].zfill(5) if row['LOB_SVC_CD'] == 'MTG' else row['PS_CD'], axis='columns')
    >>> df
       PS_CD LOB_SVC_CD
    0  00123        MTG
    1    321       CLOC
The lambda returns a value for every row: the zfilled PS_CD if LOB_SVC_CD is MTG, else the original PS_CD.
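The same result can also be had without apply, using a boolean mask and the vectorized Series.str.zfill (a sketch using the two-row frame from the example above):

```python
import pandas as pd

df = pd.DataFrame({'PS_CD': ['123', '321'], 'LOB_SVC_CD': ['MTG', 'CLOC']})

# keep PS_CD as-is where the condition fails, zfill it where it holds
mask = df['LOB_SVC_CD'] == 'MTG'
df['code'] = df['PS_CD'].where(~mask, df['PS_CD'].str.zfill(5))
```

Series.where keeps the original value wherever its condition is True and takes the replacement elsewhere, which avoids the row-by-row apply entirely.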

issue in writing function to filter rows data frame

I am writing a function that will serve as filter for rows that I wanted to use.
The sample data frame is as follows:
    df = pd.DataFrame()
    df['Xstart'] = [1, 2.5, 3, 4, 5]
    df['Xend'] = [6, 8, 9, 10, 12]
    df['Ystart'] = [0, 1, 2, 3, 4]
    df['Yend'] = [6, 8, 9, 10, 12]
    df['GW'] = [1, 1, 2, 3, 4]

    def filter(data, Game_week):
        pass_data = data[(data['GW'] == Game_week)]
When I call the function filter as follows, I get an error:
    df1 = filter(df, 1)
The error message is
    AttributeError: 'NoneType' object has no attribute 'head'
but when I use a manual filter, it works:
    pass_data = df[(df['GW'] == [1])]
This is my first issue.
My second issue is that I want to filter rows with multiple GW values (1, 2, 3, etc.). I can do that manually as follows:
    pass_data = df[(df['GW'] == [1]) | (df['GW'] == [2]) | (df['GW'] == [3])]
But if I want the function to take a list input like [1, 2, 3], how can I write the function such that I can input a range of 1 to 3?
Could anyone please advise?
Thanks,
Zep
Use isin to pass a list of values instead of a scalar. Also, filter is an existing built-in function in Python, so it is better to change the function name:
    def filter_vals(data, Game_week):
        return data[data['GW'].isin(Game_week)]

    df1 = filter_vals(df, range(1, 4))
Because you don't return from the function, it returns None, not the desired dataframe. So do this (note also that no parentheses are needed inside data[...]):
    def filter(data, Game_week):
        return data[data['GW'] == Game_week]
Also, isin may well be better:
    def filter(data, Game_week):
        return data[data['GW'].isin(Game_week)]
Use return to return data from the function for the first part. For the second, use:
    def filter(data, Game_week):
        return data[data['GW'].isin(Game_week)]
Now apply the filter function:
    df1 = filter(df, [1, 2])
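Combining the fix with the sample frame from the question, a minimal end-to-end sketch:

```python
import pandas as pd

df = pd.DataFrame({'Xstart': [1, 2.5, 3, 4, 5],
                   'Xend':   [6, 8, 9, 10, 12],
                   'GW':     [1, 1, 2, 3, 4]})

def filter_vals(data, game_weeks):
    # isin accepts any iterable, so a list or a range both work
    return data[data['GW'].isin(game_weeks)]

df1 = filter_vals(df, range(1, 4))  # game weeks 1, 2 and 3
```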

Is there an "Identity" filter in pandas

I have a function that takes in some complex parameters and is expected to return a filter to be used on a pandas dataframe.
    filters = build_filters(df, ...)
    filtered_df = df[filters]
For example, if the dataframe has series Gender and Age, build_filters could return (df.Gender == 'M') & (df.Age == 100).
If, however, build_filters determines that there should be no filters applied, is there anything that I can return (i.e. the "identity filter") that will result in df not being filtered?
I've tried the obvious things like None, True, and even a generator that returns True for every call to next().
The closest I've come is
    operator.ne(df.ix[:, 0], nan)
which I think is silly and likely to cause bugs I can't yet foresee.
You can return slice(None). Here's a trivial demonstration:
    df = pd.DataFrame([[1, 2, 3]])

    df2 = df[slice(None)]  # equivalent to df2 = df[:]
    df2[0] = -1

    assert df.equals(df2)
Alternatively, use pd.DataFrame.pipe and return df if no filters need to be applied:
    def apply_filters(df):
        # some logic
        if not filter_flag:
            return df
        else:
            # mask = ....
            return df[mask]

    filtered_df = df.pipe(apply_filters)
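A small sketch tying both ideas together (build_filters and its gender parameter are hypothetical names for illustration, not an API from the question):

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['M', 'F', 'M'], 'Age': [100, 50, 30]})

def build_filters(df, gender=None):
    # hypothetical filter builder
    if gender is None:
        # the "identity filter": df[slice(None)] is the same as df[:]
        return slice(None)
    return df['Gender'] == gender

all_rows = df[build_filters(df)]      # no filtering applied
men = df[build_filters(df, 'M')]      # boolean-mask filtering
```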

Changing self.variables inside __exit__ method of Context Managers

First things first: the title is unclear, but nothing better sprang to mind, so I'll elaborate on the problem in more detail.
I've found myself repeating this routine a lot with pandas dataframes: I need to work for a while with only part (some columns) of a DataFrame, and later I want to add those columns back. Then an idea came to my mind: context managers. But I am unable to come up with the correct implementation (if there is one).
    import pandas as pd
    import numpy as np

    class ProtectColumns:
        def __init__(self, df, protect_cols=[]):
            self.protect_cols = protect_cols
            # preserve a copy of the part we want to protect
            self.protected_df = df[protect_cols].copy(deep=True)
            # create self.df with only the part we want to work on
            self.df = df[[x for x in df.columns if x not in protect_cols]]

        def __enter__(self):
            # return self, or maybe only self.df?
            return self

        def __exit__(self, *args, **kwargs):
            # btw. do I need *args and **kwargs here?
            # append the preserved data back to the original, now changed
            self.df[self.protect_cols] = self.protected_df

    if __name__ == '__main__':
        # testing
        # create a random DataFrame
        df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))
        # unnecessary step
        df = df.applymap(lambda x: int(100 * x))
        # show it
        print(df)
        # work without cols A and B
        with ProtectColumns(df, ["A", "B"]) as PC:
            # make everything 0
            PC.df = PC.df.applymap(lambda x: 0)
        # this prints the expected output
        print(PC.df)
However, say I don't want to use PC.df from then on, but df. I could just do df = PC.df, or make a copy inside the with block or after it. But is it possible to handle this inside, e.g., the __exit__ method?
    # unchanged df
    print(df)

    with ProtectColumns(df, list("AB")) as PC:
        PC.applymap(somefunction)

    # df is now changed
    print(df)
Thanks for any ideas!
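One possible answer, sketched rather than definitive: keep a reference to the original frame and, inside __exit__, write the working columns back into it in place, so the caller's df reflects the changes after the with block. Column assignment on the original frame mutates it in place, which is what makes this work:

```python
import numpy as np
import pandas as pd

class ProtectColumns:
    def __init__(self, df, protect_cols=()):
        self.original = df  # a reference, not a copy
        work_cols = [c for c in df.columns if c not in protect_cols]
        self.df = df[work_cols].copy()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # write the (possibly reassigned) working part back into the
        # original frame in place; the protected columns were never touched
        self.original[self.df.columns] = self.df
        return False  # don't suppress exceptions

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
with ProtectColumns(df, ["A", "B"]) as pc:
    pc.df = pc.df * 0  # zero out the working columns C and D

# df itself now has C and D zeroed, A and B untouched
```

The trade-off is that the frame passed in is mutated, which may be surprising; returning a new frame from __exit__ is not possible, since its return value only controls exception suppression.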
