Group variable based on list with strings - Python


My Python code doesn't return what I expect, and I hope you can help me.
I have a dataset that consists of a long list of venues in a city and a column with the venue category type (e.g. 'italian restaurant'). Now I'd like to add a column to my dataframe with the broader category group ('eating'), based on lists of strings to search for.
I hoped to see the following outputs in my dataframe with these four example venue categories:
'italian restaurant' → 'eating'
'bed & breakfast' → 'sleeping'
'museum of modern art' → 'sightseeing'
'gym' → 'other'
I tried to solve it with the following:
sleeping = ['bed','hostel','hotel']
eating = ['bar','bistro','cafe','pub','restaurant']
sightseeing = ['museum','theater','zoo']
def catgroup(cat):
    for cat in df['venue_cat']:
        if any(s in cat for s in sleeping):
            return 'sleeping'
        elif any(s in cat for s in eating):
            return 'eating'
        elif any(s in cat for s in sightseeing):
            return 'sightseeing'
        else:
            return 'other'
Followed by
df['cat_group'] = df['venue_cat'].apply(catgroup)
Unfortunately, all venues return the same category from the first elif statement: eating.
I know it's the first elif statement because if I change the order of the elifs (eating vs sightseeing) I only get: sightseeing
Would love to hear your solutions to this issue, because I just don't see it

Remove the for loop in the function, so that:
def catgroup(cat):
    # cat is a single value from df['venue_cat'], passed in by apply
    if any(s in cat for s in sleeping):
        return 'sleeping'
    elif any(s in cat for s in eating):
        return 'eating'
    elif any(s in cat for s in sightseeing):
        return 'sightseeing'
    else:
        return 'other'
df['cat_group'] = df['venue_cat'].apply(catgroup)
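As a minimal sketch of the same idea without the repeated elif branches, the keyword lists from the question can be kept in one ordered mapping. The names GROUPS and cat_group are hypothetical, and a plain list stands in for the dataframe column; with pandas the function would be passed to apply as above.

```python
# Keyword lists from the question, kept in one ordered mapping.
# GROUPS and cat_group are hypothetical names used for illustration.
GROUPS = {
    'sleeping': ['bed', 'hostel', 'hotel'],
    'eating': ['bar', 'bistro', 'cafe', 'pub', 'restaurant'],
    'sightseeing': ['museum', 'theater', 'zoo'],
}

def cat_group(cat):
    """Return the first group that has a keyword occurring in cat."""
    for group, keywords in GROUPS.items():
        if any(keyword in cat for keyword in keywords):
            return group
    return 'other'

# A plain list stands in for df['venue_cat'].
venues = ['italian restaurant', 'bed & breakfast', 'museum of modern art', 'gym']
groups = [cat_group(v) for v in venues]
print(groups)
```

Groups are checked in the order they appear in the mapping, so a category that matches two groups gets the first one, just like the if/elif chain.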

You are probably not using the cat parameter of the function correctly. I would expect apply to call the function multiple times (once per row), so the cat parameter already contains the value you want to check (or a single-element array with that value). By looping over the df inside the function and returning on the first iteration, you base the result on the first row of the whole dataframe and return the same value for every call to your function.
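The effect can be reproduced without pandas. In this small sketch (first_only is a hypothetical name, and a list stands in for the dataframe column), the loop variable shadows the argument and the function returns during the first iteration, so every call yields the first element:

```python
venues = ['italian restaurant', 'bed & breakfast', 'gym']

def first_only(cat):
    # The loop variable reuses the name cat, so the argument is ignored,
    # and return leaves the function during the first iteration.
    for cat in venues:
        return cat

results = [first_only(v) for v in venues]
print(results)
```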
Within the function, your code is similar to a switch statement (which Python doesn't have but can easily be simulated).
To simulate a plain vanilla switch statement I usually define a helper function like this:
def switch(v): yield lambda *c:v in c
Which is used in a one pass for-in statement:
def describe(x):
    for case in switch(x):
        if case(1): return "one"
        if case(2,4): return "even"
        if case(3): return "three"
describe(3)  # "three"
In this case the comparison condition is a little different and would benefit from using a regular expression instead of 's in cat'. So let's define a wordSwitch() helper function that looks for whole-word patterns:
import re
def wordSwitch(v): yield lambda *c: any(re.search(r'\b(' + re.escape(w) + r')\b', v) for w in c)
Your code could then look like this:
def catGroup(cat):
    for case in wordSwitch(cat): # could need to be cat[0]
        if case(*sleeping): return "sleeping"
        if case(*eating): return "eating"
        if case(*sightseeing): return "sightseeing"
    return "other"
Note that, although I'm not familiar with .apply(), I believe it receives the field (or row) value directly, so you don't need to (and probably must not) get data from df['..']. You should try printing the value of cat that the function receives to be sure.
You could also place the word lists directly in the case() parts:
def catGroup(cat):
    for case in wordSwitch(cat):
        if case('bed','hostel','hotel'): return "sleeping"
        if case('bar','bistro','cafe','pub','restaurant'): return "eating"
        if case('museum','theater','zoo'): return "sightseeing"
    return "other"
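To illustrate why whole-word matching can matter here, this sketch compares a plain substring test with a word-boundary regular expression. The helper name has_word and the sample strings are made up for the example.

```python
import re

def has_word(word, text):
    """True if word occurs in text as a whole word."""
    # re.escape guards against keywords containing regex metacharacters.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text) is not None

substring_hit = 'bar' in 'barber shop'       # substring test matches
word_hit = has_word('bar', 'barber shop')    # whole-word test does not
real_hit = has_word('bar', 'wine bar')       # whole word is still found
print(substring_hit, word_hit, real_hit)
```

The trade-off is that whole-word matching also rejects plurals such as 'restaurants', so the keyword lists may need both forms.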

Related

DataFrame Panda: if-elif.. if block is not picked up

CSV is formatted as:
Dataframe is:
I am trying to implement if conditions, but the else block is always executed and the outcome is always "Value3". Where am I going wrong?

Add strip as given below:
def validate(row):
    if row['TRANSACTION DESC'].strip()=='JWPFMAIN':
        val="Value1"
    elif row['TRANSACTION CD'].strip()=='':
        val="Value2"
    else:
        val="Value3"
    return val
dfwithcolumns['Status'] = dfwithcolumns.apply(validate, axis=1)

Try to use elif instead of the second if. Otherwise, if the first condition is true but the second is false, val is overwritten with "Value3". Also check whether the second comparison should be against a space, because the value could also be an empty string ''.
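A small sketch of both suggestions together, using plain dictionaries in place of dataframe rows. The sample values are invented, since the original CSV is not shown.

```python
def validate(row):
    # Strip surrounding whitespace once, then compare.
    desc = row['TRANSACTION DESC'].strip()
    code = row['TRANSACTION CD'].strip()
    if desc == 'JWPFMAIN':
        return 'Value1'
    elif code == '':   # covers both '' and whitespace-only values
        return 'Value2'
    else:
        return 'Value3'

# Invented rows; with pandas each row would arrive via apply(validate, axis=1).
rows = [
    {'TRANSACTION DESC': 'JWPFMAIN  ', 'TRANSACTION CD': 'A1'},
    {'TRANSACTION DESC': 'OTHER', 'TRANSACTION CD': '   '},
    {'TRANSACTION DESC': 'OTHER', 'TRANSACTION CD': 'B2'},
]
statuses = [validate(r) for r in rows]
print(statuses)
```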

How to pass a "Take All" parameter in pandas loc filter condition?

I have a function with a parameter (in this case: "department") that filters specific data out of my dataset (df.loc[df['A'] == department]). In one case, I want to use this same function, but instead of filtering the data I want to get all of it.
Is there a way to pass a parameter which would result in something like
df.loc[df['A'] == *] or
df.loc[df['A'] == %]
# Write the data to the table
def table_creation(table, department, status):
    def condition_to_value(df, kpi):
        performance_indicator = df.loc[(df['A'] == department) & (df['C'] == kpi) & (df['B'] == status), 'D'].values[0]
        return performance_indicator

One way I can think of: instead of using df['A'] == 'department', you can use df['A'].isin(['department']). The two yield the same result.
Once you do that, you can pass the "Take All" parameter like so:
df['A'].isin(df['A'].unique())
where df['A'].unique() is an array of all the unique values in this column, so the expression is True for every row.
Or you can pass multiple parameters like so:
df['A'].isin(['department', 'string_2', 'string_3'])

Building over Newskooler's answer, as you know the name of the column you'll be searching over, you could add his solution inside the function and process '*' accordingly.
It would look something like this:
# Write the data to the table
def table_creation(table, department, status):
    def condition_to_value(df, kpi):
        # use '*' to identify all departments
        if isinstance(department, str) and department == '*':
            departments = df['A'].unique()
        # make the function work with string or list inputs
        elif isinstance(department, str):
            departments = [department]
        else:
            departments = department
        # notice the addition of the isin as recommended by Newskooler
        performance_indicator = df.loc[(df['A'].isin(departments)) & (df['C'] == kpi) & (df['B'] == status), 'D'].values[0]
        return performance_indicator
I realize there are missing parts here, as there are in the initial question, but these changes should work without changing how you call your function now, and they include the benefits listed in the previous answer.

I don't think you can do it by passing a parameter like in an SQL query. You'll have to re-write your function a bit to take this condition into consideration.
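One possible way to rewrite the function, sketched under assumptions: the column names A to D come from the question, while the function name select_values, the use of None as the "take all" marker, and the sample data are invented for illustration. The department condition is simply left out of the mask when no department is given.

```python
import pandas as pd

# Invented sample data using the question's column names.
df = pd.DataFrame({
    'A': ['sales', 'sales', 'it'],
    'B': ['open', 'open', 'open'],
    'C': ['kpi1', 'kpi2', 'kpi1'],
    'D': [10, 20, 30],
})

def select_values(df, kpi, status, department=None):
    """Return column D for the matching rows; None means all departments."""
    mask = (df['C'] == kpi) & (df['B'] == status)
    if department is not None:
        # Accept a single string or a list of departments.
        departments = [department] if isinstance(department, str) else list(department)
        mask = mask & df['A'].isin(departments)
    return df.loc[mask, 'D']

all_kpi1 = select_values(df, 'kpi1', 'open').tolist()
sales_kpi1 = select_values(df, 'kpi1', 'open', 'sales').tolist()
print(all_kpi1, sales_kpi1)
```

Skipping the condition avoids comparing every row against the full list of unique values, which is what the isin(df['A'].unique()) variant does.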

Substitute for case statement in Python checking for one of three substrings in a string

Consider the following string:
payments = 'cash:yes,square_cash:no,venmo:no'
If "cash:yes" is found in the string, I would like to return "Cash". If "square_cash:yes" is found in the string, I'd like to return "Square Cash", and so on.
I think I'm close, but can't quite figure it out. Here's my code:
payments = 'cash:yes,square_cash:no,venmo:no'
def get_payment_type(x):
    return {
        x.find('cash:yes') !=-1: 'Cash',
        x.find('square_cash:yes') !=-1: 'Square Cash',
        x.find('venmo:yes') !=-1: 'Venmo'
    }.get(x, 'not found') # default if x not found
return {'payment_used': get_payment_type(payments) }
This always returns "not found", so I know my syntax is off, just not sure where.

Your specific error here is in what you look up.
dict.get looks up a key, and the key comes first in the dict syntax:
{"key": "value"}
In your dictionary the keys are the results of the comparisons (True or False), but you look up the string x, which is never one of those keys, so you always get the default.
However, I would recommend a number of changes:
Use if/else and return instead of trying to be clever with dicts. It is much easier to read.
Use x in y instead of y.find(x) != -1.
Instead of matching strings, a more robust, nicer and more general method is to parse the string into a dictionary.
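Here is an example using if/else instead. Note that "square_cash:yes" is checked first, because "cash:yes" is a substring of it:
def get_payment_type(payments):
    if "square_cash:yes" in payments:
        return "Square Cash"
    elif "cash:yes" in payments:
        return "Cash"
    elif "venmo:yes" in payments:
        return "Venmo"
    else:
        return "not found"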
Here is a quick sketch of how parsing this into a dictionary could look:
result = {}
for element in payments.split(","):
    key, value = element.split(":")
    result[key] = value
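Building on that sketch, the parsed dictionary can drive the lookup directly. This is one possible way to finish it; the LABELS mapping and its display names are assumptions based on the outputs the question asks for.

```python
# Display names for each payment key (assumed from the question's wording).
LABELS = {'cash': 'Cash', 'square_cash': 'Square Cash', 'venmo': 'Venmo'}

def get_payment_type(payments):
    """Return the label of the first payment type marked 'yes'."""
    # 'cash:yes,venmo:no' -> {'cash': 'yes', 'venmo': 'no'}
    parsed = dict(item.split(':', 1) for item in payments.split(','))
    for key, label in LABELS.items():
        if parsed.get(key) == 'yes':
            return label
    return 'not found'

first = get_payment_type('cash:yes,square_cash:no,venmo:no')
second = get_payment_type('cash:no,square_cash:yes,venmo:no')
third = get_payment_type('cash:no,square_cash:no,venmo:no')
print(first, second, third)
```

Because the keys are compared exactly, 'cash' can no longer be confused with 'square_cash', so the order of the checks stops mattering.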

To handle arbitrary input, use regex to capture the desired payment type and then if the type exists in the full payment string, capitalize the parts found by re.findall:
import re
payment_types = {'cash:yes,square_cash:no,venmo:no':"cash:yes", 'cash:yes,square_cash:yes,venmo:no':"square_cash:yes"}
final_type = {a:' '.join(i.capitalize() for i in re.findall('^[a-zA-Z_]+(?=:)', b)[0].split('_')) if b in a else None for a, b in payment_types.items()}
Output:
{'cash:yes,square_cash:yes,venmo:no': 'Square Cash', 'cash:yes,square_cash:no,venmo:no': 'Cash'}

Split the string on commas, and then you can use in to check whether a string is in the resulting list.
Ex:
payments = 'cash:no,square_cash:yes,venmo:no'.split(",")
if "cash:yes" in payments:
    print('Cash')
elif "square_cash:yes" in payments:
    print("Square Cash")
elif "venmo:yes" in payments:
    print("Venmo")
else:
    print("not found")
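The reason splitting helps can be shown in a few lines: in on a string is a substring test, while in on a list compares whole elements. The variable names are only for illustration.

```python
text = 'cash:no,square_cash:yes,venmo:no'
items = text.split(',')

# Substring test: 'cash:yes' is found inside 'square_cash:yes'.
in_text = 'cash:yes' in text
# List membership: only an element equal to 'cash:yes' would match.
in_items = 'cash:yes' in items
print(in_text, in_items)
```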

Is it possible to find out which condition is being met in a multi condition if then statement?

I'm working on a python/beautifulsoup web scraper. I'm searching for certain keywords, my statement looks like this:
if 'tribe' in entry or 'ai1ec' in entry or 'tfly' in entry:
    print('Plugin Found!')
    rating = easy
    sheet.cell(row=i, column=12).value = rating
What I would like to do is find out which one of those keywords is making the statement true. My first instinct would be to write a nested loop to check but I wasn't sure if there was a way to capture the value that makes the statement true that would involve less code?

I would use a generator expression passed to next with a default value. If the expression doesn't find anything, next returns the default value; otherwise it returns the first match and stops there (a bit like any, but it keeps the result).
cases = ['tribe','ai1ec','tfly']
entry = 'iiii ai1ec rrrr'
p = next((x for x in cases if x in entry),None)
if p is not None:
    print('Plugin Found!',p)

[EDIT: changed to only find first name]
for name in ('tribe', 'ai1ec', 'tfly'):
    if name in entry:
        print('Name =', name)
        print('Plugin Found!')
        rating = easy
        sheet.cell(row=i, column=12).value = rating
        break
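If more than one keyword can occur in the same entry, a list comprehension collects all of them, while next still gives just the first. A minimal sketch; the sample entry string is invented.

```python
keywords = ('tribe', 'ai1ec', 'tfly')
entry = 'class="tribe-events tfly-menu"'   # invented sample text

# Every keyword that occurs in the entry, in keyword order.
matches = [k for k in keywords if k in entry]
# Only the first one, or None when nothing matches.
first = next((k for k in keywords if k in entry), None)

if matches:
    print('Plugin Found!', matches)
```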

Check if Dictionary Values exist in a another Dictionary in Python

I am trying to compare values from 2 dictionaries in Python. I want to know if a value from one dictionary exists anywhere in another dictionary. Here is what I have so far. If it exists I want to return True, else False.
The code I have is close, but not working right.
I'm using VS2012 with Python Plugin
I'm passing both Dictionary items into the functions.
def NameExists(best_guess, line):
    return all (line in best_guess.values() #Getting Generator Exit Error here on values
                for value in line['full name'])
Also, I want to see if there are duplicates within best_guess itself.
def CheckDuplicates(best_guess, line):
    if len(set(best_guess.values())) != len(best_guess):
        return True
    else:
        return False

Since the error is about a generator exit, I guess you are using Python 3.x. There, best_guess.values() returns a view object rather than a list, and the generator expression passed to all stops at the first value in line['full name'] for which no match is found.
Also, I think using all is incorrect if you want to know whether any value exists (though I'm not sure from which dictionary).
You can use something like the following, provided line is the second dictionary:
def NameExists(best_guess, line):
    vals = set(best_guess.values())
    return bool(set(line.values()).intersection(vals))

The syntax in NameExists seems wrong: you aren't using value, and the test should be value in ..., not line in .... In Python 3.x (you are using it, aren't you?) best_guess.values() returns a view, so converting it to a set once makes the repeated in checks cheap. I believe this is what you meant:
def NameExists(best_guess, line):
    vals = set(best_guess.values())
    return all(value in vals for value in line['full name'])
And the CheckDuplicates function can be written in a shorter way like this:
def CheckDuplicates(best_guess, line):
    return len(set(best_guess.values())) != len(best_guess)
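A runnable sketch of both checks follows. The shape of the data is an assumption, since the question does not show it: here line['full name'] is a single string and best_guess maps arbitrary keys to names. The function names and sample values are hypothetical.

```python
from collections import Counter

# Invented sample data; the real dictionaries are not shown in the question.
best_guess = {'a': 'John Smith', 'b': 'Jane Doe', 'c': 'John Smith'}
line = {'full name': 'Jane Doe', 'city': 'Oslo'}

def name_exists(best_guess, line):
    """True if line's full name is one of best_guess's values."""
    return line['full name'] in set(best_guess.values())

def duplicate_values(best_guess):
    """Return the values that occur more than once in best_guess."""
    counts = Counter(best_guess.values())
    return [value for value, n in counts.items() if n > 1]

exists = name_exists(best_guess, line)
dupes = duplicate_values(best_guess)
print(exists, dupes)
```

Returning the duplicated values instead of a bare True/False also shows which names collide; bool(dupes) gives the same answer as CheckDuplicates. Both functions need hashable values, as does the set-based version above.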
