Optimize the code for dataframes in python

Below is the code that runs checks on two columns. I know this isn't the proper way to work with two columns of a dataframe, and I was hoping for help doing it in a better way:
for i in range(len(df)):
    if df['Current_Value'][i].lower() == 'false' or df['Current_Value'][i].lower() == '0' and df['_Value'][i].lower() == 'false' or df['_Value'][i].lower() == '0':
        df['CHECK'][i] = True
    elif df['Current_Value'][i].lower() == 'true' or df['Current_Value'][i].lower() == '1' and df['_Value'][i].lower() == 'true' or df['_Value'][i].lower() == '1':
        df['CHECK'][i] = True
    elif df['Current_Value'][i].lower() in df['_Value'][i].lower():
        df['CHECK'][i] = True
    else:
        df['CHECK'][i] = False

A row-wise function applied with df.apply is a cleaner fit for this kind of check. Although you haven't provided a sample dataset, you could do something like this.
First define the function:
def fill_check_column(current_value, value):
    # Lowercase each value once instead of on every comparison
    current_value = current_value.lower()
    value = value.lower()
    if current_value in ['false', '0'] and value in ['false', '0']:
        return True
    elif current_value in ['true', '1'] and value in ['true', '1']:
        return True
    elif current_value in value:
        return True
    else:
        return False
Then use it on the data frame:
df['CHECK'] = df.apply(lambda row: fill_check_column(current_value=row['Current_Value'],
                                                     value=row['_Value']),
                       axis=1)
You could also improve fill_check_column so that each check is made only once.
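As one possible way to go further, the same three checks can be expressed on whole columns instead of row by row. This is only a sketch: it assumes both columns hold strings with no missing values, and the helper name add_check_column and the sample data are made up for illustration.

```python
import pandas as pd

FALSY = {"false", "0"}
TRUTHY = {"true", "1"}

def add_check_column(df):
    """Return a copy of df with a boolean CHECK column (hypothetical helper)."""
    # Lowercase each column once, up front.
    cur = df["Current_Value"].str.lower()
    val = df["_Value"].str.lower()
    both_false = cur.isin(FALSY) & val.isin(FALSY)
    both_true = cur.isin(TRUTHY) & val.isin(TRUTHY)
    # The substring test needs both values of a row, so pair them up explicitly.
    contained = pd.Series([c in v for c, v in zip(cur, val)], index=df.index)
    out = df.copy()
    out["CHECK"] = both_false | both_true | contained
    return out

# Made-up sample data, only to exercise the three branches and the fallback.
sample = pd.DataFrame({
    "Current_Value": ["False", "1", "abc", "x"],
    "_Value": ["0", "TRUE", "xabcx", "y"],
})
result = add_check_column(sample)
```

Returning a copy rather than assigning through df['CHECK'][i] also avoids chained assignment.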

Related

Apply result to dataset after df.iterrows

import numpy as np
import pandas as pd
df = pd.read_csv('./test22.csv')
df.head(5)
df = df.replace(np.nan, None)
for index, col in df.iterrows():
    # Extract only if date1 happened earlier than date2
    load = 'No'
    if col['date1'] == None or col['date2'] == None:
        load = 'yes'
    elif int(str(col['date1'])[:4]) >= int(str(col['date2'])[:4]) and \
            (len(str(col['date1'])) == 4 or len(str(col['date2'])) == 4):
        load = 'yes'
    elif int(str(col['date1'])[:6]) >= int(str(col['date2'])[:6]) and \
            (len(str(col['date1'])) == 6 or len(str(col['date2'])) == 6):
        load = 'yes'
    elif int(str(col['date1'])[:8]) >= int(str(col['date2'])[:8]):
        load = 'yes'
df.head(5)
The loop above computes load for each row with iterrows, but the result is never written back, so the actual dataset does not change. I want the result to be reflected in the dataset.
How can I apply it to the actual dataset?

Replace your for loop with a function that returns a boolean. You can then use df.apply to apply it to every row and filter your dataframe by the result:
def should_load(x):
    if x['date1'] is None or x['date2'] is None:
        return True
    elif int(str(x['date1'])[:4]) >= int(str(x['date2'])[:4]) and \
            (len(str(x['date1'])) == 4 or len(str(x['date2'])) == 4):
        return True
    elif int(str(x['date1'])[:6]) >= int(str(x['date2'])[:6]) and \
            (len(str(x['date1'])) == 6 or len(str(x['date2'])) == 6):
        return True
    elif int(str(x['date1'])[:8]) >= int(str(x['date2'])[:8]):
        return True
    return False
df[df.apply(should_load, axis=1)].head(5)
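If the result should also live in the dataset itself, one option is to store the output of apply as a new column and filter on that. The sketch below is illustrative only: the predicate is a simplified stand-in (it compares the two dates at the precision of the shorter one, which is not identical to the function above), the dates are assumed to be stored as strings, and the sample rows are invented.

```python
import pandas as pd

def should_load(row):
    """Simplified, hypothetical predicate: True when date1 >= date2 or a date is missing."""
    d1, d2 = row["date1"], row["date2"]
    if pd.isna(d1) or pd.isna(d2):
        return True
    # Compare only as many leading digits as the shorter date provides.
    n = min(len(d1), len(d2))
    return int(d1[:n]) >= int(d2[:n])

# Invented sample data with dates as YYYY, YYYYMM or YYYYMMDD strings.
df = pd.DataFrame({
    "date1": ["2020", "20190101", None, "201905"],
    "date2": ["2019", "20200101", "2020", "20190601"],
})

# Write the result back as a column so it stays with the dataset.
df["load"] = df.apply(should_load, axis=1).map({True: "yes", False: "No"})

# Filter on the stored column whenever only the loadable rows are needed.
loaded = df[df["load"] == "yes"]
```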

how to solve without loop or recursions, puzzling?

I want to write a function that takes a sorted list as input and reports whether it contains two equal elements. I can't use any loops or recursion. Is there any way to do this?
def continuity(lst):
    if len(lst)==1:
        return 'no'
    elif lst[0]==lst[1]:
        return 'yes'
    else:
        return continuity(lst[1:])
This is what I did, but it uses recursion.

def continuity(lst):
    # A set drops duplicates, so a shorter set means some element repeats
    if len(lst) > 1 and len(set(lst)) != len(lst):
        return 'yes'
    else:
        return 'no'

Create a set from the list and then compare the lengths:
def continuity(lst):
    lst_set = set(lst)
    # Fewer unique elements than elements means at least one duplicate
    return 'yes' if len(lst_set) != len(lst) else 'no'
Using Counter:
from collections import Counter
def continuity(lst):
    # Any count above 1 means that element appears at least twice
    occurrences = Counter(lst).values()
    return 'yes' if max(occurrences, default=0) > 1 else 'no'
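Since the question states that the list is sorted, another option is to compare each element with its neighbour, because equal elements must then be adjacent. This is a sketch with a made-up name, continuity_sorted, and whether map and any count as "no loops" depends on how the exercise defines a loop.

```python
from operator import eq

def continuity_sorted(lst):
    # In a sorted list equal elements sit next to each other, so pairing each
    # element with its successor is enough; map() stops at the shorter input.
    return 'yes' if any(map(eq, lst, lst[1:])) else 'no'
```

Unlike the set-based versions, this also works when the elements are not hashable, but it relies on the input really being sorted.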

Conditional column in DataFrame: where's the mistake?

Let's start with a Pandas DataFrame df with numerical columns pS, pS0 and pE:
import pandas as pd
df = pd.DataFrame([[0.1,0.2,0.7],[0.3,0.6,0.1],[0.9,0.1,0.0]],
                  columns=['pS','pE','pS0'])
We want to build a column indicating which of the three columns dominates in each row. I achieved it this way:
def class_morph(x):
    y = [x['pE'],x['pS'],x['pS0']]
    y.sort(reverse=True)
    if (y[0] == y[1]):
        return 'U'
    elif (x['pE'] == y[0]):
        return 'E'
    elif (x['pS'] == y[0]):
        return 'S'
    elif (x['pS0'] == y[0]):
        return 'S0'
df['Morph'] = df.apply(class_morph, axis=1)
Which gives the correct result:
But my initial attempt was the following:
import numpy as np
def class_morph(x):
    if (x['pE'] > np.max(x['pS'],x['pS0'])):
        return 'E'
    elif (x['pS'] > np.max(x['pE'],x['pS0'])):
        return 'S'
    elif (x['pS0'] > np.max(x['pS'],x['pE'])):
        return 'S0'
    else:
        return 'U'
Which returned something wrong:
Could somebody explain the mistake in my first attempt?
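One possible explanation, offered as a hedged reading rather than a confirmed answer: np.max reduces a single array, and its second positional parameter is axis, so np.max(a, b) does not compare a with b. Python's built-in max takes the two values directly. A minimal sketch of the first attempt rewritten with the built-in max, using the same sample frame:

```python
import pandas as pd

df = pd.DataFrame([[0.1, 0.2, 0.7], [0.3, 0.6, 0.1], [0.9, 0.1, 0.0]],
                  columns=['pS', 'pE', 'pS0'])

def class_morph(x):
    # Built-in max compares its arguments; a tie for the top value
    # satisfies none of the strict comparisons and falls through to 'U'.
    if x['pE'] > max(x['pS'], x['pS0']):
        return 'E'
    elif x['pS'] > max(x['pE'], x['pS0']):
        return 'S'
    elif x['pS0'] > max(x['pS'], x['pE']):
        return 'S0'
    else:
        return 'U'

df['Morph'] = df.apply(class_morph, axis=1)
```

For whole columns at once, np.maximum(a, b) is the element-wise counterpart.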

python - form check doesn't work

I have a problem. I want to create a form in Python. If something is wrong, I want an alert window (showwarning()). Otherwise it should write 'TRUE' to the command line.
The problem is that I get an alert window every time, regardless of whether the form is filled out correctly.
Can somebody help me with this problem?
Code:
""" Variables """
inputError_1 = bool(0)
inputError_2 = bool(0)
inputError_3 = bool(0)
valueCheck = bool(0)
""" Check-Button """
def Check():
    if len(nameOne.get()) == 0:
        inputError_1 == TRUE
    elif len(nameTwo.get()) == 0:
        inputError_2 == TRUE
    elif len(comment.get(INSERT)) == 0:
        inputError_3 == TRUE
    else:
        valueCheck = bool(1)
    if inputError_1 == FALSE or inputError_2 == FALSE or inputError_3 == FALSE:
        showwarning()
    else:
        print 'TRUE'

I think that you can do this in a simpler way:
def check():
    if len(nameOne.get()) == 0 or len(nameTwo.get()) == 0 or len(comment.get(INSERT)) == 0:
        showwarning()
    else:
        print('True')
check()
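A likely reason the original always warned, stated with some caution since the full program isn't shown: inputError_1 == TRUE is a comparison, not an assignment, so the flags stay false, and the final test then fires whenever any flag equals FALSE. The sketch below keeps the validation separate from the widgets so it can be tried on plain strings. The names find_empty_fields and check and the warn parameter are hypothetical, and treating whitespace-only text as empty is an added assumption.

```python
def find_empty_fields(fields):
    """Return the names of fields whose text is empty or only whitespace."""
    return [name for name, text in fields.items() if not text.strip()]

def check(fields, warn=print):
    """Warn about empty fields; print 'True' and return True when all are filled."""
    missing = find_empty_fields(fields)
    if missing:
        # warn is any callable that accepts a message string.
        warn("Please fill in: " + ", ".join(missing))
        return False
    print("True")
    return True
```

In the form itself, the dictionary would be built from the widget contents, for example the values returned by nameOne.get() and nameTwo.get().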

Python accumulative conditional logic

I am struggling with the logic for the problem below and I know that even if I nailed the logic, I would probably implement it clumsily, so any advice would be awesome.
I have a dictionary representing a file:
the_file = {'Filename':'x:\\myfile.doc','Modified':datetime(2012,2,3),'Size':32412}
I have a list of filters and I want to filter the file dictionary to determine a match.
filters = [
    {'Key':'Filename','Criteria':'Contains','Value':'my'},
    {'Key':'Filename','Criteria':'Does not end with','Value':'-M.txt'},
    {'Key':'Modified','Criteria':'After','Value':datetime(2012,1,1)}
]
My best attempt at creating a function to do this (which doesn't work):
def is_asset(the_file, filters):
    match = False
    for f in filters:
        if f['Key'] == u'Filename':
            if f['Criteria'] == u'Contains':
                if f['Value'] in the_file['Filename']:
                    match = True
            elif f['Criteria'] == u'Starts with':
                if the_file['Filename'].startswith(f['Value']):
                    match = True
            elif f['Criteria'] == u'Ends with':
                if the_file['Filename'].endswith(f['Value']):
                    match = True
            elif not f['Criteria'] == u'Does not end with':
                if the_file['Filename'].endswith(f['Value']):
                    match = False
            elif f['Criteria'] == u'Equals':
                if os.path.basename(the_file['Filename']) == f['Value']:
                    match = True
            elif f['Criteria'] == u'Does not contain':
                if f['Value'] in the_file['Filename']:
                    match = False
        elif f['Key'] == u'Modified':
            mtime = int(os.path.getmtime(the_file['Filename']))
            if f['Criteria'] == u'Before':
                if f['Value'] > datetime.fromtimestamp(mtime):
                    the_file['Modified'] = mtime
                    match = True
            elif f['Criteria'] == u'After':
                if f['Value'] < datetime.fromtimestamp(mtime):
                    the_file['Modified'] = mtime
                    match = True
        elif f['Key'] == u'Size':
            size = long(os.path.getsize(the_file['Filename']))
            if f['Criteria'] == u'Bigger':
                if f['Value'] < size:
                    the_file['Size'] = size
                    match = True
            elif f['Value'] > size:
                the_file['Size'] = size
                match = True
    if match:
        return the_file

Instead of trying to do it in one megafunction, break it down into smaller steps.
import operator
filenamecondmap = {
    u'Contains': operator.contains,
    u'Does not end with': lambda x, y: not x.endswith(y),
    ...
}
...
condmap = {
    u'Filename': filenamecondmap,
    u'Modified': modifiedcondmap,
    ...
}
And then walk the structures until you reach the condition function, then call it.
condmap[u'Filename'][u'Contains'](the_file['Filename'], 'my')
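To make the lookup-table idea concrete, here is a small runnable sketch using the file dictionary and filters from the question. It rests on assumptions the question doesn't settle: it treats the filters as cumulative, so every filter must pass, it compares against the values stored in the dictionary rather than reading the file system, and it covers only the Filename and Modified keys. The upper-case table names are made up for illustration.

```python
import operator
from datetime import datetime

# Each criteria name maps to a function taking (file_value, filter_value).
FILENAME_CONDITIONS = {
    "Contains": operator.contains,
    "Does not contain": lambda name, text: text not in name,
    "Starts with": str.startswith,
    "Ends with": str.endswith,
    "Does not end with": lambda name, text: not name.endswith(text),
}
MODIFIED_CONDITIONS = {
    "Before": operator.lt,
    "After": operator.gt,
}
CONDITIONS = {"Filename": FILENAME_CONDITIONS, "Modified": MODIFIED_CONDITIONS}

def is_asset(the_file, filters):
    """Return True only if the file satisfies every filter (assumed AND semantics)."""
    return all(
        CONDITIONS[f["Key"]][f["Criteria"]](the_file[f["Key"]], f["Value"])
        for f in filters
    )

the_file = {"Filename": "x:\\myfile.doc", "Modified": datetime(2012, 2, 3), "Size": 32412}
filters = [
    {"Key": "Filename", "Criteria": "Contains", "Value": "my"},
    {"Key": "Filename", "Criteria": "Does not end with", "Value": "-M.txt"},
    {"Key": "Modified", "Criteria": "After", "Value": datetime(2012, 1, 1)},
]
matched = is_asset(the_file, filters)
```

Swapping all for any would give "at least one filter matches" instead.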

You can simply use functions as your 'criteria'. This makes it much simpler, as you don't have to write a big if-else ladder. Something like this:
def contains(value, sub):
    return sub in value
def after(value, dt):
    return value > dt
filters = [
    {'Key': 'Filename', 'Criteria': contains, 'Value': 'my'},
    {'Key': 'Modified', 'Criteria': after, 'Value': datetime(2012,1,1)}
]
def is_asset(the_file, filters):
    # Returns True as soon as any one filter matches
    for f in filters:
        if f['Criteria'](the_file[f['Key']], f['Value']):
            return True
    return False
