I'm feeling quite stuck at the moment. I've been attempting to get my function to run some code once a specific name has been passed in.
I feel as if the code I am putting together is quite clumsy, and I'm trying to find some help on what I am doing wrong. Any input is appreciated, and I apologise for my messy code.
I simply want my function argument (cond) to run the matching check when it is either inexpensive, large_screen or apple_product. Thanks for all your help :)
def satisfies(product, cond):
    inexpensive = (4, '<=', 1000)
    large_screen = (2, '>=', 6.3)
    apple_product = (1, '==', 'Apple')
    conditions = (inexpensive, large_screen, apple_product)
    if cond == conditions[0]:
        if cond[1] == '<=':
            return True if cond[2] <= product[4] else False
        elif cond[1] == '<=':
            return True if cond[2] <= product[4] else False
        elif cond[1] == '==':
            return True if cond[2] == product[4] else False
    if cond == conditions[1]:
        if cond[1] == '<=':
            return True if cond[2] <= product[4] else False
        elif cond[1] == '<=':
            return True if cond[2] <= product[4] else False
        elif cond[1] == '==':
            return True if cond[2] == product[4] else False
    if cond == conditions[2]:
        if cond[1] == '==':
            return True if cond[2] == product[1] else False
Input:  a product feature list (product) and a condition (cond) as specified above.
Output: True if cond holds for the product, otherwise False.
Calling satisfies(['Nova 5T', 'Huawei', 6.26, 3750, 497], inexpensive) should return True.
You can simplify your code by:
splitting it into two functions: one that checks a single condition, and another that checks many conditions using a combinator function (the default is all, for "AND"; if you want, you can pass combinator=any for an "OR"-style query);
using the operator module, which contains predicate functions for the various operators, together with a mapping from your operator strings to those functions.
import operator

operators = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

def check_condition(product, cond):
    field, op, value = cond  # "op", not "operator", to avoid shadowing the module
    return operators[op](product[field], value)

def check_conditions(product, conditions, combinator=all):
    return combinator(
        check_condition(product, cond)
        for cond in conditions
    )

def check_products_conditions(products, conditions, combinator=all):
    return [
        product
        for product in products
        if check_conditions(product, conditions, combinator=combinator)
    ]
inexpensive = (4, "<=", 1000)
large_screen = (2, ">=", 6.3)
apple_product = (1, "==", "Apple")
product = ['Nova 5T', 'Huawei', 6.26, 3750, 497]
# Check single condition
check_condition(
    product,
    inexpensive,
)

# Check multiple conditions
check_conditions(
    product,
    conditions=(
        inexpensive,
        large_screen,
        apple_product,
    ),
)

# Check multiple products against multiple conditions
matching_products = check_products_conditions(
    products=[product, product, product],
    conditions=(
        inexpensive,
        large_screen,
        apple_product,
    ),
)
I have a few Python strings parsed from a file, where there are several hundred conditions that I need to evaluate. How would you recommend I evaluate these conditions? For example...
["20", "<=", "17.5"] # false
["15", ">=", "18.5"] # false
["20", "==", "20"] # true
["beta", "==", "beta"] # true
["beta", "!=", "beta"] # false
I wasn't sure if there were any tricks to resolving these expressions, or if I should do some sort of if/else like...
op = parts[1]
if op == '<=':
    return parts[0] <= parts[2]
elif op == '>=':
    return parts[0] >= parts[2]
elif op == '==':
    return parts[0] == parts[2]
elif op == '!=':
    return parts[0] != parts[2]
You could use the eval() function:
parts = ["20", "<=", "17.5"]
print(eval(' '.join(parts)))  # False
Be aware that eval() executes arbitrary Python code, so only use it on input you trust.
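If the file contents aren't fully trusted, a mapping through the operator module avoids eval() entirely. A minimal sketch, assuming each operand is either a numeric literal or a plain string (the helper names here are illustrative, not from the question):

```python
import operator
from ast import literal_eval

ops = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
       "!=": operator.ne, "<": operator.lt, ">": operator.gt}

def to_value(token):
    # turn "20" into 20 and "17.5" into 17.5; leave "beta" as a string
    try:
        return literal_eval(token)
    except (ValueError, SyntaxError):
        return token

def evaluate(parts):
    left, op, right = parts
    return ops[op](to_value(left), to_value(right))

evaluate(["20", "<=", "17.5"])   # → False
evaluate(["beta", "==", "beta"]) # → True
```

Converting the operands with literal_eval means "20" <= "17.5" is compared numerically rather than lexicographically, which string comparison would get wrong.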
from typing import Union

import pandas as pd

def __link_price(row: pd.Series) -> Union[None, float]:
    if (row['fund'] == 'A') and (row['share_class'] == 'X'):
        return df_hist.loc[row['date'], 'AA']
    elif (row['fund'] == 'A') and (row['share_class'] == 'Y'):
        return df_hist.loc[row['date'], 'AB']
    elif (row['fund'] == 'B') and (row['share_class'] == 'X'):
        return df_hist.loc[row['date'], 'BA']
    elif (row['fund'] == 'B') and (row['share_class'] == 'Y'):
        return df_hist.loc[row['date'], 'BB']
    elif (row['fund'] == 'C') and (row['share_class'] == 'X'):
        return df_hist.loc[row['date'], 'CA']
    elif (row['fund'] == 'C') and (row['share_class'] == 'Y'):
        return df_hist.loc[row['date'], 'CB']
    else:
        return 0

df.loc[:, 'price'] = df.apply(__link_price, axis=1).values
df has 10,000+ rows, so this code is taking a long time. In addition, for each row I'm doing a df_hist.loc call to get the value.
I'm trying to speed up this section of code, and the one option I've found so far is:
df.loc[:, 'price'] = df.apply(__link_price, axis=1, raw=True).values
But this forces me to use index-based selection for row instead of label-based selection:
if (row[0] == 'A') and (row[1] == 'X')
which reduces the readability of the code.
I'm looking for an approach that both speeds up the code and keeps it readable.
In Python there is a cost for every attribute or item lookup and every function call, and there is no compiler optimizing things for you.
Here are some general recommendations:
Try creating a column that combines fund and share_class without using Python functions, and then merge it with df_hist:
# convert history from 'wide' format into 'long' format
hist = df_hist.set_index("date").stack()

prices = (
    # create a key column for the join
    df.assign(key=df["fund"] + df["share_class"].replace({"X": "A", "Y": "B"}))
    .set_index(["date", "key"])
    .join(hist)  # join by index
)
If it's not trivial to create a key column, minimize attribute lookups inside the apply function:
def __link_price(row):
    date, fund, share_class = row[["date", "fund", "share_class"]]
    if fund == 'A' and share_class == 'X':
        return df_hist.loc[date, 'AA']
    ...
Optimize the if conditions. For example, in the case where (row['fund'] == 'C') and (row['share_class'] == 'Y') you currently have to check 6 conditions. With a dict lookup you can reduce that number to 1:
fund_and_share_class_to_key = {
    ("A", "X"): "AA",
    ("A", "Y"): "AB",
    ...
}
key = fund_and_share_class_to_key.get((fund, share_class))
return df_hist.loc[date, key] if key is not None else 0
Pandas itself is pretty slow for non-vectorized and non-arithmetic operations. In your case it's better to use standard Python dicts for faster lookups.
# small benchmark
df = pd.DataFrame({"value": [4, 5, 6]})
d = df.to_dict(orient="index")
%timeit df.loc[1, "value"]  # 8.7 µs
%timeit d[1]["value"]       # 50 ns; ~170 times faster
# convert the dataframe into a dict with the format:
# {<date>: {"AA": <value>, ...}}
history = df_hist.set_index("date").to_dict(orient="index")

def __link_price(row):
    ...
    price = history.get(date, {}).get(key, 0)
    return price
It should be faster to pass history as an apply argument rather than looking it up in the enclosing scope. It also makes the code cleaner.
def __link_price(row, history):
    ...

df.apply(__link_price, args=(history,), axis=1)
To summarize, a faster function would be something like this:
history = df_hist.set_index("date").to_dict(orient="index")

# we don't need to create the mapping on every __link_price call
fund_and_share_class_to_key = {
    ("A", "X"): "AA",
    ("A", "Y"): "AB",
    ...
}

def __link_price(row, history, fund_and_share_class_to_key):
    date, fund, share_class = row[["date", "fund", "share_class"]]
    key = fund_and_share_class_to_key.get((fund, share_class))
    return history.get(date, {}).get(key, 0)

df.apply(__link_price, args=(history, fund_and_share_class_to_key), axis=1)
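Putting the pieces together, here is a minimal end-to-end sketch of the summarized approach. The toy data and the function name link_price are made up for illustration; only the fund/share_class columns and the wide df_hist layout are taken from the question:

```python
import pandas as pd

# toy frames; column layouts follow the question, values are invented
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01"],
    "fund": ["A", "B"],
    "share_class": ["X", "Y"],
})
df_hist = pd.DataFrame({
    "date": ["2021-01-01"],
    "AA": [1.0], "AB": [2.0], "BA": [3.0], "BB": [4.0],
})

# build the fast lookup structures once, outside the per-row function
history = df_hist.set_index("date").to_dict(orient="index")
fund_and_share_class_to_key = {
    ("A", "X"): "AA", ("A", "Y"): "AB",
    ("B", "X"): "BA", ("B", "Y"): "BB",
}

def link_price(row, history, key_map):
    # two dict lookups instead of six comparisons and a df_hist.loc call
    key = key_map.get((row["fund"], row["share_class"]))
    return history.get(row["date"], {}).get(key, 0)

df["price"] = df.apply(link_price, args=(history, fund_and_share_class_to_key), axis=1)
```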
Below is the code for checks on two columns. I know this isn't the proper way of doing it on two columns of a dataframe, but I was hoping to get help doing it in a better way.
for i in range(len(df)):
    if df['Current_Value'][i].lower() == 'false' or df['Current_Value'][i].lower() == '0' and df['_Value'][i].lower() == 'false' or df['_Value'][i].lower() == '0':
        df['CHECK'][i] = True
    elif df['Current_Value'][i].lower() == 'true' or df['Current_Value'][i].lower() == '1' and df['_Value'][i].lower() == 'true' or df['_Value'][i].lower() == '1':
        df['CHECK'][i] = True
    elif df['Current_Value'][i].lower() in df['_Value'][i].lower():
        df['CHECK'][i] = True
    else:
        df['CHECK'][i] = False
You should use a helper function applied row-wise for such a check. Although you haven't provided us with a sample dataset, you could do something like this:
First define the helper function:
def fill_check_column(current_value, value):
    # precompute these, so they are calculated only once
    current_value = current_value.lower()
    value = value.lower()
    if current_value in ['false', '0'] and value in ['false', '0']:
        return True
    elif current_value in ['true', '1'] and value in ['true', '1']:
        return True
    elif current_value in value:
        return True
    else:
        return False
Then use it on the data frame:
df['Check'] = df.apply(lambda row: fill_check_column(current_value=row['Current_Value'],
                                                     value=row['_Value']),
                       axis=1)
You could also improve fill_check_column so that each check is made only once.
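For speed, the same logic can also be vectorized so the string comparisons run column-wise instead of row-by-row. A sketch with made-up sample data (only the column names Current_Value, _Value and CHECK come from the question):

```python
import pandas as pd

# invented sample data; column names are taken from the question
df = pd.DataFrame({
    "Current_Value": ["False", "true", "abc", "xyz"],
    "_Value": ["0", "1", "zabcz", "no"],
})

# lower-case each column once, as whole Series
cv = df["Current_Value"].str.lower()
v = df["_Value"].str.lower()

falsy, truthy = ["false", "0"], ["true", "1"]

df["CHECK"] = (
    (cv.isin(falsy) & v.isin(falsy))        # both "false-like"
    | (cv.isin(truthy) & v.isin(truthy))    # both "true-like"
    | [c in val for c, val in zip(cv, v)]   # row-wise substring check
)
```

Only the substring test still iterates in Python; the boolean membership checks are done by pandas in bulk.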
If we have the combinators AND / OR in a condition string such as
A != B AND C > 100
how do I parse this string and evaluate the result, so that with A = "foo", B = "foo", C = 99 the condition evaluates to false?
I use the operator library to do the evaluation with a naive scan, but when we see AND we need to finish the evaluation on both sides before applying the AND operator. Is there a better way to do the parsing and evaluation?
import operator

ops = {
    "AND": operator.and_,
    "OR": operator.or_,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge
}
result = False
contents = string.split(" ")
for i, content in enumerate(contents):
    if content in ops:
        a = contents[i - 1]
        b = contents[i + 1]
        result = ops[content](a, b)
        contents[i + 1] = result

if result is True:
    print("It is true")
I am not parsing the actual variables in this, but it should be able to do what you are asking for. It first evaluates everything at priority 0, then evaluates priority 1 based on those results, and so on. I have only tested it with two priority levels, but I believe it should work for any number. Here's how it functions:
['A', '!=', 'A', 'AND', 'B', '==', 'B']
[False, 'AND', True]
[False]
Here's the actual code:
import operator

ops = {
    "AND": operator.and_,
    "OR": operator.or_,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge
}

prior = {
    "AND": 1,
    "OR": 1,
    "==": 0,
    "!=": 0,
    "<": 0,
    "<=": 0,
    ">": 0,
    ">=": 0
}
maxPrior = 1

def parseEval(string):
    content = string.split()
    for priorMode in range(maxPrior + 1):
        print(content)
        subParse = []
        skip = False
        for ind, cont in enumerate(content):
            if skip:
                # this token was already consumed as the right operand
                skip = False
                continue
            if cont in ops and prior[cont] <= priorMode:
                # replace "a op b" with its evaluated result
                subParse[-1] = ops[cont](subParse[-1], content[ind + 1])
                skip = True
            else:
                subParse.append(cont)
        content = subParse
    print(content)
    return content[0]
parseEval("A != A OR B == B")
If you want, you can define ops and prior inside the function. Also, I should have given the priority levels a better name, as "prior" reads like another word; sorry if that caused confusion. If you have any questions please let me know, I'm happy to help!
I am struggling with the logic for the problem below and I know that even if I nailed the logic, I would probably implement it clumsily, so any advice would be awesome.
I have a dictionary representing a file:
the_file = {'Filename':'x:\\myfile.doc','Modified':datetime(2012,2,3),'Size':32412}
I have a list of filters and I want to filter the file dictionary to determine a match.
filters = [
{'Key':'Filename','Criteria':'Contains','Value':'my'},
{'Key':'Filename','Criteria':'Does not end with','Value':'-M.txt'},
{'Key':'Modified','Criteria':'After','Value':datetime(2012,1,1)}
]
My best attempt at creating a function to do this (which doesn't work):
def is_asset(the_file, filters):
    match = False
    for f in filters:
        if f['Key'] == u'Filename':
            if f['Criteria'] == u'Contains':
                if f['Value'] in the_file['Filename']:
                    match = True
            elif f['Criteria'] == u'Starts with':
                if the_file['Filename'].startswith(f['Value']):
                    match = True
            elif f['Criteria'] == u'Ends with':
                if the_file['Filename'].endswith(f['Value']):
                    match = True
            elif not f['Criteria'] == u'Does not end with':
                if the_file['Filename'].endswith(f['Value']):
                    match = False
            elif f['Criteria'] == u'Equals':
                if os.path.basename(the_file['Filename']) == f['Value']:
                    match = True
            elif f['Criteria'] == u'Does not contain':
                if f['Value'] in the_file['Filename']:
                    match = False
        elif f['Key'] == u'Modified':
            mtime = int(os.path.getmtime(the_file['Filename']))
            if f['Criteria'] == u'Before':
                if f['Value'] > datetime.fromtimestamp(mtime):
                    the_file['Modified'] = mtime
                    match = True
            elif f['Criteria'] == u'After':
                if f['Value'] < datetime.fromtimestamp(mtime):
                    the_file['Modified'] = mtime
                    match = True
        elif f['Key'] == u'Size':
            size = long(os.path.getsize(the_file['Filename']))
            if f['Criteria'] == u'Bigger':
                if f['Value'] < size:
                    the_file['Size'] = size
                    match = True
            elif f['Value'] > size:
                the_file['Size'] = size
                match = True
    if match:
        return the_file
Instead of trying to do it in one mega-function, break it down into smaller steps.
filenamecondmap = {
    u'Contains': operator.contains,
    u'Does not end with': lambda x, y: not x.endswith(y),
    ...
}
...
condmap = {
    u'Filename': filenamecondmap,
    u'Modified': modifiedcondmap,
    ...
}
Then walk the structures until you have the conditional, and execute it:
condmap[u'Filename'][u'Contains'](the_file['Filename'], 'my')
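For completeness, here is a runnable sketch of this two-level mapping with a few criteria filled in. The elided entries follow the same pattern, and the is_asset wrapper assumes AND semantics across filters, which is one reasonable reading of the question:

```python
import operator
from datetime import datetime

# a few criteria as an illustration; the rest follow the same pattern
filenamecondmap = {
    u'Contains': operator.contains,
    u'Does not end with': lambda x, y: not x.endswith(y),
    u'Starts with': lambda x, y: x.startswith(y),
}
modifiedcondmap = {
    u'After': operator.gt,
    u'Before': operator.lt,
}
condmap = {
    u'Filename': filenamecondmap,
    u'Modified': modifiedcondmap,
}

def is_asset(the_file, filters):
    # AND semantics: every filter must hold for the file
    return all(
        condmap[f['Key']][f['Criteria']](the_file[f['Key']], f['Value'])
        for f in filters
    )

the_file = {'Filename': 'x:\\myfile.doc', 'Modified': datetime(2012, 2, 3)}
filters = [
    {'Key': 'Filename', 'Criteria': u'Contains', 'Value': 'my'},
    {'Key': 'Filename', 'Criteria': u'Does not end with', 'Value': '-M.txt'},
    {'Key': 'Modified', 'Criteria': u'After', 'Value': datetime(2012, 1, 1)},
]
```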
You can simply use functions as your 'Criteria'. This makes it much simpler, as you don't have to write a big if/else ladder. Something like this:
def contains(value, sub):
    return sub in value

def after(value, dt):
    return value > dt

filters = [
    {'Key': 'Filename', 'Criteria': contains, 'Value': 'my'},
    {'Key': 'Modified', 'Criteria': after, 'Value': datetime(2012, 1, 1)}
]

for f in filters:
    if f['Criteria'](the_file[f['Key']], f['Value']):
        return True