CSV is formatted as:
Dataframe is:
I am trying to implement an if/elif/else condition, but it always executes the else block, so the outcome is always "Value3". Where am I going wrong?
Add strip as given below:
def validate(row):
    if row['TRANSACTION DESC'].strip() == 'JWPFMAIN':
        val = "Value1"
    elif row['TRANSACTION CD'].strip() == '':
        val = "Value2"
    else:
        val = "Value3"
    return val

dfwithcolumns['Status'] = dfwithcolumns.apply(validate, axis=1)
Try using elif instead of a second if. With two separate if statements, when the first condition is true but the second is false, the else attached to the second if overwrites val with "Value3". Also check whether the value tested in the second condition is really a single space; it could also be an empty string ''.
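To make the effect of strip() concrete, here is a minimal sketch that uses plain dicts as stand-ins for DataFrame rows. The sample values are made up for illustration and are not taken from the original CSV.

```python
def validate(row):
    # strip() removes leading/trailing whitespace before comparing
    if row['TRANSACTION DESC'].strip() == 'JWPFMAIN':
        return "Value1"
    elif row['TRANSACTION CD'].strip() == '':
        return "Value2"
    return "Value3"

# Hypothetical rows; note the padded values that would defeat a plain ==
rows = [
    {'TRANSACTION DESC': 'JWPFMAIN  ', 'TRANSACTION CD': 'A1'},
    {'TRANSACTION DESC': 'OTHER', 'TRANSACTION CD': '   '},
    {'TRANSACTION DESC': 'OTHER', 'TRANSACTION CD': 'B2'},
]

for row in rows:
    # repr() makes the hidden whitespace visible
    print(repr(row['TRANSACTION DESC']), '->', validate(row))
```

Printing values with repr() is a quick way to check whether padding is the reason a comparison keeps falling through to the else branch.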
If I have a list of strings such as the following:
"apple.test.banana", "test.example","example.example.test".
Is there a way to return only "test.banana" and "example.test"?
I need to check and see if there are two dots, and if there are, return only the value described above.
I attempted to use:
string = "apple.test.banana"
dot_count = 0
for i in string:
    if i == ".":
        dot_count = dot_count + 1
if dot_count > 1:
    string.split(".")[1]
But this appears to only return the string "test".
Any advice would be greatly appreciated. Thank you.
You are completely right, except for the last line, which should say '.'.join(string.split(".")[1:]).
Also, instead of the for loop, you can just use .count(): dot_count = string.count('.') (this doesn't affect anything, just makes your code easier to read)
So the program becomes:
string = "apple.test.banana"
dot_count = string.count('.')
if dot_count > 1:
    print('.'.join(string.split(".")[1:]))
Which outputs: test.banana
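Since the question starts from a list of strings, the same idea can be sketched for the whole list. The helper name drop_first_segment is hypothetical, and split(".", 1) is used here as a shortcut that splits only at the first dot.

```python
def drop_first_segment(names):
    """Keep only names with at least two dots, minus their first segment."""
    return [
        name.split(".", 1)[1]      # everything after the first dot
        for name in names
        if name.count(".") > 1     # skip names with fewer than two dots
    ]

names = ["apple.test.banana", "test.example", "example.example.test"]
print(drop_first_segment(names))  # ['test.banana', 'example.test']
```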
I'm not sure I understand what they're asking me to do here, so this is my attempt at doing it.
a = 'Swim'
b = 'Run'
if a != b:
    my_boolean = a != b
    print(my_boolean)
The exercise is only asking you to save the value of 'a != b' in a variable.
However, this should help you understand the code better: you should evaluate 'a != b' only once and then use 'my_boolean' every time you need it. Your code only ever prints True, because 'my_boolean' is assigned inside the if block, which runs only when the values differ. Try this:
a = 'Swim'
b = 'Run'
my_boolean = a != b
print(my_boolean)
if my_boolean:
    print('printing result: ' + str(my_boolean))
Let's get through it one-by-one. You are trying to compare two strings. If they are exactly the same you should get True otherwise you get False. Doing so would result in a Boolean or bool value.
So, your solution is inverted. It is expected to be:
a='Swim'
b='Run'
my_boolean = (a==b) # This value is boolean already. You can remove the brackets too. I put them for clarity
print (str(my_boolean)) # It works without saying str(), but I did that to show you that the value is converted from the type `bool` into the type String `str`.
My python code doesn't return what I expect it to do and I hope you can help me.
I have a dataset that consists of a long list of venues in a city and a column with the venue category type (e.g. 'italian restaurant'). Now I'd like to make an additional column to my dataframe with the broader category group ('eating') based on a list with strings to search on.
I hoped to see the following outputs in my dataframe with these four example venue categories:
'italian restaurant' → 'eating'
'bed & breakfast' → 'sleeping'
'museum of modern art' → 'sightseeing'
'gym' → 'other'
I tried to solve it with the following:
sleeping = ['bed','hostel','hotel']
eating = ['bar','bistro','cafe','pub','restaurant']
sightseeing = ['museum','theater','zoo']

def catgroup(cat):
    for cat in df['venue_cat']:
        if any(s in cat for s in sleeping):
            return 'sleeping'
        elif any(s in cat for s in eating):
            return 'eating'
        elif any(s in cat for s in sightseeing):
            return 'sightseeing'
        else:
            return 'other'
Followed by
df['cat_group'] = df['venue_cat'].apply(catgroup)
Unfortunately, all venues return the same category from the first elif statement: eating.
I know it's the first elif statement because if I change the order of the elifs (eating vs sightseeing) I only get: sightseeing
Would love to hear your solutions to this issue, because I just don't see it
Remove the for loop in the function, so that:
def catgroup(cat):
    # cat is the single value that apply passes in for one row
    if any(s in cat for s in sleeping):
        return 'sleeping'
    elif any(s in cat for s in eating):
        return 'eating'
    elif any(s in cat for s in sightseeing):
        return 'sightseeing'
    else:
        return 'other'

df['cat_group'] = df['venue_cat'].apply(catgroup)
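To see why dropping the loop works, here is a small sketch that avoids pandas altogether: a list comprehension stands in for Series.apply and calls the function once per value, using the four example categories from the question.

```python
sleeping = ['bed', 'hostel', 'hotel']
eating = ['bar', 'bistro', 'cafe', 'pub', 'restaurant']
sightseeing = ['museum', 'theater', 'zoo']

def catgroup(cat):
    # cat is one category string, not the whole column
    if any(s in cat for s in sleeping):
        return 'sleeping'
    elif any(s in cat for s in eating):
        return 'eating'
    elif any(s in cat for s in sightseeing):
        return 'sightseeing'
    return 'other'

venue_cats = ['italian restaurant', 'bed & breakfast',
              'museum of modern art', 'gym']

# Stand-in for df['venue_cat'].apply(catgroup): one call per value
groups = [catgroup(cat) for cat in venue_cats]
print(groups)  # ['eating', 'sleeping', 'sightseeing', 'other']
```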
You are probably not using the cat parameter of the function correctly. apply calls the function once per element, so the cat parameter already contains the value you want to check. By looping over df['venue_cat'] inside the function, you return on the first iteration, so the result is always based on the first row of the dataframe and every call returns the same value.
Within the function, your code is similar to a switch statement (which Python did not have before the match statement was added in 3.10, but which can easily be simulated).
To simulate a plain vanilla switch statement I usually define a helper function like this:
def switch(v): yield lambda *c:v in c
Which is used in a one pass for-in statement:
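def describe(x):  # wrapper added because return is only valid inside a function
    for case in switch(x):
        if case(1): return "one"
        if case(2,4): return "even"
        if case(3): return "three"

print(describe(3))  # three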
In this case the comparison condition is a little different and would benefit from using a regular expression instead of 's in cat'. So let's define a wordSwitch() helper function that looks for whole-word patterns:
import re
# both pattern pieces must be raw strings, otherwise '\b' is a backspace character
def wordSwitch(v): yield lambda *c: any(re.search(r'\b(' + w + r')\b', v) for w in c)
Your code could then look like this:
def catGroup(cat):
    for case in wordSwitch(cat): # could need to be cat[0]
        if case(*sleeping): return "sleeping"
        if case(*eating): return "eating"
        if case(*sightseeing): return "sightseeing"
    return "other"
Note that, although I'm not familiar with .apply(), I believe it receives the field (or row) value directly, so you don't need to (and probably must not) get data from df['..']. You should try printing the value of cat that the function receives to be sure.
You could also place the word lists directly in the case() parts:
for case in wordSwitch(cat):
    if case('bed','hostel','hotel'): return "sleeping"
    if case('bar','bistro','cafe','pub','restaurant'): return "eating"
    if case('museum','theater','zoo'): return "sightseeing"
return "other"
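As an alternative sketch to the switch helper, the keyword lists could live in a dict and be matched with precompiled whole-word regular expressions. The names GROUPS, PATTERNS and cat_group are hypothetical; the word lists are the ones from the question. This relies on dicts preserving insertion order (Python 3.7+), so groups are checked in the order they are written.

```python
import re

GROUPS = {
    "sleeping": ["bed", "hostel", "hotel"],
    "eating": ["bar", "bistro", "cafe", "pub", "restaurant"],
    "sightseeing": ["museum", "theater", "zoo"],
}

# One compiled pattern per group, e.g. r"\b(?:bed|hostel|hotel)\b"
PATTERNS = {
    group: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    for group, words in GROUPS.items()
}

def cat_group(cat):
    # Return the first group whose pattern matches a whole word in cat
    for group, pattern in PATTERNS.items():
        if pattern.search(cat):
            return group
    return "other"

print(cat_group("italian restaurant"))  # eating
print(cat_group("barbershop"))          # other ('bar' is not a whole word here)
```

Whole-word matching avoids false positives such as 'bar' inside 'barbershop', which the plain 's in cat' test would accept.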
I have a question:
How do I compare a variable that holds a string with a decimal prefix in Python?
Example:
a = "2.11.22-abc-def-ghi"
if a == (2.11*) :
    print "ok"
I want it to compare only the leading "2.11" (the first two numeric components) and ignore the rest of the value. How can I do that?
Thanks
Here's the most direct answer to your question, I think...a way to code what your pseudocode is getting at:
a = "2.11.22-abc-def-ghi"
if a.startswith("2.11"):
    print("ok")
If you want to grab the numeric value off the front, turn it into a true number, and use that in a comparison, no matter what the specific value, you could do this:
import re

a = "2.11.22-abc-def-ghi"
m = re.match(r"(\d+\.\d+).*", a)
if m:
    f = float(m.group(1))
    if f == 2.11:
        print("ok")
If you want to compare part of a string, you can always slice it with the syntax str[start_index: end_index] and then compare the slice. Please note the start_index is inclusive and end_index is exclusive. For example
name = "Eric Johnson"
name[0:3]  # value of the slice is "Eri", not "Eric"
In your case, you can do:
if a[0:4] == "2.11":
    # stuff next
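One caveat with the approaches above: a prefix test for "2.11" also accepts "2.110", and a float treats "2.9" as larger than "2.11". If the string is a dotted version number (an assumption; the question does not say so), a sketch that compares integer components avoids both issues. The helper name version_prefix is hypothetical.

```python
import re

def version_prefix(text, parts=2):
    """Return the first `parts` numeric components as a tuple of ints."""
    m = re.match(r"\d+(?:\.\d+)*", text)  # leading run like 2.11.22
    if not m:
        return None
    numbers = [int(p) for p in m.group(0).split(".")]
    return tuple(numbers[:parts])

a = "2.11.22-abc-def-ghi"
if version_prefix(a) == (2, 11):
    print("ok")
```

Tuples of ints also order correctly, so (2, 9) < (2, 11) holds where the float comparison 2.9 < 2.11 does not.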
I am trying to compare values from 2 Dictionaries in Python. I want to know if a value from one Dictionary exists anywhere in another Dictionary. Here is what I have so far. If it exists I want to return True, else False.
The code I have is close, but not working right.
I'm using VS2012 with Python Plugin
I'm passing both Dictionary items into the functions.
def NameExists(best_guess, line):
    return all (line in best_guess.values() #Getting Generator Exit Error here on values
                for value in line['full name'])
Also, I want to see if there are duplicates within best_guess itself.
def CheckDuplicates(best_guess, line):
    if len(set(best_guess.values())) != len(best_guess):
        return True
    else:
        return False
As the error is about generator exit, I guess you are using Python 3.x. There, best_guess.values() returns a view object rather than a list. The GeneratorExit most likely comes from the generator expression inside all(): all() stops at the first False, the unfinished generator is then closed, and a debugger may report that as an exception even though it is harmless.
Also, I guess using all is incorrect if you are looking for any value to exist (I am not sure from which dictionary, though).
You can use something like the following, provided line is the second dictionary:
def NameExists(best_guess, line):
    vals = set(best_guess.values())
    return bool(set(line.values()).intersection(vals))
The syntax in NameExists seems wrong: you aren't using value inside the generator expression, so every iteration tests line itself. In Python 3.x (you are using it, aren't you?) best_guess.values() returns a view rather than a list, so it is worth converting it to a set once to keep the repeated in tests cheap. I believe this is what you meant:
def NameExists(best_guess, line):
    vals = set(best_guess.values())
    return all(value in vals for value in line['full name'])
And the CheckDuplicates function can be written in a shorter way like this:
def CheckDuplicates(best_guess, line):
    return len(set(best_guess.values())) != len(best_guess)
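If it is also useful to know which values are duplicated, a rough sketch with collections.Counter could look like this. The function names and the sample dictionaries are made up for illustration, and the values are assumed to be hashable (strings, numbers, tuples).

```python
from collections import Counter

def duplicate_values(d):
    """Return the values that occur more than once in dict d."""
    counts = Counter(d.values())
    return [value for value, n in counts.items() if n > 1]

def shares_any_value(first, second):
    """True if at least one value of `first` also appears in `second`."""
    return not set(first.values()).isdisjoint(second.values())

# Hypothetical data, not from the original question
best_guess = {"a": "Ann Lee", "b": "Bob Ray", "c": "Ann Lee"}
line = {"full name": "Bob Ray", "city": "Oslo"}

print(duplicate_values(best_guess))        # ['Ann Lee']
print(shares_any_value(best_guess, line))  # True
```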