How to use Series.apply() to create a conditional pandas Series? - python

I am trying to create a new column in my df using numerical data from another column. I attempted using a for loop and a series of if statements to categorize the numerical data into strings that I want to now use to create the new column. The following data is from the WNBA 2010-2011 dataset about the players.
def clean(col):
    for xp in col:
        if xp < 1:
            print('Rookie')
        elif ((xp >= 1) and (xp <= 3)):
            print('Little experience')
        elif ((xp >= 4) and (xp <= 5)):
            print('Experienced')
        elif ((xp > 5) and (xp < 10)):
            print('Very experienced')
        elif (xp > 10):
            print("Veteran")
I tried using series.apply() and series.map() but both of these return a new column called XP as follows
XP = df.Experience.apply(clean)
df['XP'] = XP
However, when I checked the dtypes it says that the newly created column is a NONETYPE object. Is this because I am using the print function in the for loop as opposed to manipulating the actual value? If so what should I do to return the string values specified?
Thanks in advance for the help.

import pandas as pd
df = pd.DataFrame({'xp': [0, 2, 4, 6, 20, '4']})
I put a string ('4') in the data because you had the type error.
def clean(str_xp):
    # Convert first so that string values such as '4' are handled too
    xp = int(str_xp)
    if xp < 1:
        return 'Rookie'
    elif (xp >= 1) and (xp <= 3):
        return 'Little experience'
    elif (xp >= 4) and (xp <= 5):
        return 'Experienced'
    elif (xp > 5) and (xp < 10):
        return 'Very experienced'
    elif xp > 10:
        return 'Veteran'
df['rank'] = df['xp'].apply(clean)
df returns:
   xp               rank
0   0             Rookie
1   2  Little experience
2   4        Experienced
3   6   Very experienced
4  20            Veteran
5   4        Experienced

That's because your function doesn't return anything (so it returns None by default). You need to replace those print statements with return.
Also, you don't need to loop over the column in your function: apply calls the function once per element for you. Try this:
def clean(xp):
    if xp < 1:
        return 'Rookie'
    elif (xp >= 1) and (xp <= 3):
        return 'Little experience'
    elif (xp >= 4) and (xp <= 5):
        return 'Experienced'
    elif (xp > 5) and (xp < 10):
        return 'Very experienced'
    elif xp > 10:
        return 'Veteran'
df['XP'] = df.Experience.apply(clean)
Bear in mind also that, the way your inequalities are currently written, your function will return None if xp == 10.
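A minimal sketch of that fix with the gap closed. The sample `Experience` values are made up for illustration (they are not from the WNBA dataset), and treating exactly 10 years as 'Veteran' is an assumption, since the original thresholds leave that case undefined.

```python
import pandas as pd

def clean(xp):
    # Return (not print) the label. Each branch only needs an upper bound,
    # because earlier branches have already excluded the smaller values.
    if xp < 1:
        return 'Rookie'
    elif xp <= 3:
        return 'Little experience'
    elif xp <= 5:
        return 'Experienced'
    elif xp < 10:
        return 'Very experienced'
    else:
        # Assumption: 10 or more years counts as a veteran.
        return 'Veteran'

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({'Experience': [0, 2, 4, 6, 10, 20]})
df['XP'] = df['Experience'].apply(clean)
print(df)
```

Because the branches are chained on upper bounds only, non-integer values such as 3.5 also get a label instead of falling between two ranges.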

Related

How to understand complex lists in python

Sorry this will be a very basic question, I am learning python.
I went through a coding exercise to calculate bmi and went for a straightforward way:
def bmi(weight, height):
    bmi = weight / height ** 2
    if bmi <= 18.5:
        return "Underweight"
    elif bmi <= 25:
        return "Normal"
    elif bmi <= 30:
        return "Overweight"
    else:
        return "Obese"
However, in the exercise solutions I also see this one:
def bmi(weight, height):
    b = weight / height ** 2
    return ['Underweight', 'Normal', 'Overweight', 'Obese'][(b > 30) + (b > 25) + (b > 18.5)]
I want to understand what this double/back-to-back list is where they've got [items][conditions] but I can't find the name of it to learn about it - what is the name for this? Is it a part of list comprehensions?
Look at this line carefully:
['Underweight', 'Normal', 'Overweight', 'Obese'][(b > 30) + (b > 25) + (b > 18.5)]
This is ordinary list indexing: the expression (b > 30) + (b > 25) + (b > 18.5) computes the index into the list ['Underweight', 'Normal', 'Overweight', 'Obese']. Say b > 30: then all three conditions are true. Each True counts as 1 when added, so the sum is 3 and the element at index 3, 'Obese', is returned. The other cases work the same way.
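To see the index being built, here is a small sketch that prints the sum of the three comparisons next to the label it selects. The names `labels` and `bmi_label` are made up for this illustration.

```python
labels = ['Underweight', 'Normal', 'Overweight', 'Obese']

def bmi_label(b):
    # Each comparison is a bool; True counts as 1 and False as 0 when added.
    index = (b > 30) + (b > 25) + (b > 18.5)
    return labels[index]

for b in (17.0, 22.0, 27.0, 35.0):
    print(b, (b > 30) + (b > 25) + (b > 18.5), bmi_label(b))
```

This works because bool is a subclass of int in Python, so adding comparison results is well defined.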

Python - assigning categories based on value range

Solved
I am trying to figure out why my solution is wrong. I wrote two; the second is correct. I like the first better, because the intervals are more manageable and pleasing for my bad programming brain. Quite frankly, I am kind of lost as to what happens from pH 8 (Neutral) and up in solution 2. I would therefore like to keep working in the style of solution 1 for future assignments rather than that of solution 2. However, solution 1 is wrong: it returns weakly acidic when it should return strongly acidic. Why is that, and how can it be fixed?
Def: assigning category based on pH (2.3)
pH      Category
0–2     Strongly acidic
3–5     Weakly acidic
6–8     Neutral
9–11    Weakly basic
12–14   Strongly basic
Anything else falls in "pH out of range".
# Solution 1 (the style I prefer, but wrong)
def pH2Category(pH):
    if pH < 0 or pH > 14:
        category = "pH out of range"
    elif (pH >=0) and (pH < 3):
        category = "Strongly acidic"
    elif (pH >=3) and (pH < 6):
        category = "Weakly acidic"
    elif (pH >=5) and (pH < 9):
        category = "Neutral"
    elif (pH >=9) and (pH < 12):
        category = "Weakly basic"
    else:
        category = "Strongly basic"
    return category
print(pH2Category(2.3))
# Solution 2 (correct)
def pH2Category(pH):
    if pH < 0 or pH > 14:
        category = "pH out of range"
    elif pH < 3:
        category = "Strongly acidic"
    elif pH < 6:
        category = "Weakly acidic"
    elif pH <= 8:
        category = "Neutral"
    elif pH <= 11:
        category = "Weakly basic"
    else:
        category = "Strongly basic"
    return category
print(pH2Category(2.3))
If you want to stick to a pattern similar to what you know, the following should work:
def pH2Category(pH):
    if pH < 0 or pH > 14:
        return "pH out of range"
    elif (pH >= 0) and (pH < 3):
        return "Strongly acidic"
    elif (pH >= 3) and (pH < 6):
        return "Weakly acidic"
    elif (pH >= 6) and (pH < 9):
        return "Neutral"
    elif (pH >= 9) and (pH < 12):
        return "Weakly basic"
    else:
        return "Strongly basic"
print(pH2Category(2.3))
Please note that a) the above returns 'pH out of range' for out-of-range values, so if you only want that printed and not returned, replace the return in the first if with a print statement, and b) you had overlapping ranges between 5 and 6, as @quqa mentioned in the comments, so I fixed that.
There are many ways to do this, but here is one that can help you get started with Python and make your life easier:
# dictionary of (min_value, max_value): 'description of pH'
# just fill it out for your needs
table = {
    (0, 3): 'Strongly Acidic',
    (3, 6): 'Weakly Acidic',
    (6, 9): 'Neutral',
    # ...
}
def check_ph(ph):
    for ph_range, ph_description in table.items():
        # for every (range, description) pair in the table
        if ph_range[0] <= ph < ph_range[1]:
            # ph is greater than or equal to min_value
            # and less than max_value
            print(ph_description)
            # return the pH description
            return ph_description
    # if the value is outside the table range just print it out
    print('ph out of range')
If you want something that scales, you should consider a Segment Tree (logarithmic-time lookups).
Doing wasteful comparisons is cheap for something with only a few categories, but even with your 5 categories you could be doing 10 comparisons when 3-4 would do.
https://en.wikipedia.org/wiki/Segment_tree
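For sorted, non-overlapping ranges like these, a plain binary search over the lower bounds already gives logarithmic lookups. The sketch below uses the standard-library bisect module rather than a segment tree. The half-open boundaries (0, 3, 6, 9, 12) follow the answers above, and the names `BOUNDS`, `LABELS` and `ph_category` are made up for illustration.

```python
from bisect import bisect_right

# Lower bound of each category, in ascending order.
BOUNDS = [0, 3, 6, 9, 12]
LABELS = ['Strongly acidic', 'Weakly acidic', 'Neutral',
          'Weakly basic', 'Strongly basic']

def ph_category(ph):
    if ph < 0 or ph > 14:
        return 'pH out of range'
    # bisect_right counts how many lower bounds are <= ph;
    # subtracting 1 gives the index of the matching category.
    return LABELS[bisect_right(BOUNDS, ph) - 1]

print(ph_category(2.3))
print(ph_category(7))
print(ph_category(15))
```

With only five categories the speed difference is negligible; the main benefit here is that the thresholds live in one list instead of being spread over several conditions.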
You might be interested in portion, a library I wrote that provides structure and operations for intervals. It's distributed on PyPI, so you can easily install it with pip install portion.
Among other things, it provides an IntervalDict class that acts like a Python dict but accepts ranges as keys.
Applied on your example:
>>> import portion as P
>>> d = P.IntervalDict()
>>> d[P.open(-P.inf, P.inf)] = 'pH out of range'
>>> d[P.closed(0, 2)] = 'Strongly acidic'
>>> d[P.closed(3, 5)] = 'Weakly acidic'
>>> ...
>>> d[1]
'Strongly acidic'
>>> d[300]
'pH out of range'

Python - problem with changing values to groups

I have a dataset that has different attributes. One of these attributes is temperature. My temperature range is from about -30 to about 30 degrees. I want to do a machine learning study, and I want to group the temperature into different groups. The principle is: -30 and below: 0, -30 to -10: 1, and so on. I wrote the code below, but it doesn't work the way I want it to. The data type is int32; I converted it from float64.
dane = [treningowy_df]
for zbior in dane:
    zbior['temperatura'] = zbior['temperatura'].astype(int)
    zbior.loc[zbior['temperatura'] <= -30, 'temperatura'] = 0
    zbior.loc[(zbior['temperatura'] > -30) & (zbior['temperatura'] <= -10), 'temperatura'] = 1
    zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
    zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
    zbior.loc[(zbior['temperatura'] > 10) & (zbior['temperatura'] <= 20), 'temperatura'] = 4
    zbior.loc[(zbior['temperatura'] > 20) & (zbior['temperatura'] <= 30), 'temperatura'] = 5
    zbior.loc[zbior['temperatura'] > 30, 'temperatura'] = 6
For example: before the code is executed, record 1 has a temperature of -3, and after the code is applied, record 1 has a temperature of 3. Why? A record with a temperature of 22 before the change has 5 after the change, i.e. that assignment was executed correctly.
It looks like you're manipulating a dataframe. Have you tried using the apply function?
Personally, I would go about it like this (in fact, with a new column).
1. Write a function to process the value
def _check_temperature_range(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    # so on and so forth...
2. Apply the function to the column of the dataframe
df[new_column] = df[column].apply(_check_temperature_range)
The results should then appear in new_column, or in the old column if you assign back to the same column.
I think your code is being applied multiple times to the same row.
With your example, for the first row:
temp = -3 gives 2
but then temp = 2 gives 3
So I recommend creating a new column in your dataframe.
I believe it has to do with the sequence of your code.
A record with temperature -3, gets assigned as 2 -
zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
Then in the next line, it is found again as being between 0 and 10, and so assigned again as 3 -
zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
One solution is to assign a number that doesn't make you "jump" a category.
So, for -3, I'd assign 0 so it sticks around.
After that you can do another pass, and change to the actual numbers you wanted, eg 0->3 etc.
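The overwrite effect can be reproduced on a tiny, made-up column. The sketch also shows one way to avoid it without a second pass: evaluate every condition against an untouched copy of the original values. The column name temperatura comes from the question; the sample values and the variable names `buggy`, `fixed` and `original` are assumptions for illustration.

```python
import pandas as pd

zbior = pd.DataFrame({'temperatura': [-3, 22]})

# In-place updates: the second line re-tests values the first line just wrote.
buggy = zbior.copy()
buggy.loc[(buggy['temperatura'] > -10) & (buggy['temperatura'] <= 0), 'temperatura'] = 2
buggy.loc[(buggy['temperatura'] > 0) & (buggy['temperatura'] <= 10), 'temperatura'] = 3
print(buggy['temperatura'].tolist())  # -3 became 2, then 2 became 3

# Fix: build every mask from the original values, then write into the column.
original = zbior['temperatura'].copy()
fixed = zbior.copy()
fixed.loc[(original > -10) & (original <= 0), 'temperatura'] = 2
fixed.loc[(original > 0) & (original <= 10), 'temperatura'] = 3
fixed.loc[(original > 20) & (original <= 30), 'temperatura'] = 5
print(fixed['temperatura'].tolist())
```

Only three of the seven ranges are written out here to keep the sketch short; the remaining ones would follow the same pattern.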
If zbior is a pandas.DataFrame, you can use the map method of the column:
def my_func(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    elif x <= 0:
        return 2
    elif x <= 10:
        return 3
    elif x <= 20:
        return 4
    elif x <= 30:
        return 5
    else:
        return 6
zbior.temperatura = zbior.temperatura.map(my_func)
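pandas also has a binning helper, pd.cut, that can express the same thresholds as a list of edges. This is a sketch of an alternative, not what the answers above used; the sample temperatures and the names `temps`, `edges` and `groups` are made up.

```python
import numpy as np
import pandas as pd

temps = pd.Series([-35, -30, -3, 0, 7, 22, 31])

# Bin edges; with right=True (the default) each bin is (left, right].
edges = [-np.inf, -30, -10, 0, 10, 20, 30, np.inf]

# labels=False returns the integer position of the bin instead of an Interval.
groups = pd.cut(temps, bins=edges, labels=False)
print(groups.tolist())
```

Since every group is computed from the original values in a single call, there is no risk of a value being re-classified by a later assignment.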

Python - How to use a float in a "if x in range" statement

I'm trying to write an if statement that takes a float as a range.
Thank you.
x = 8.2
if x in range(0, 4.4):
    print('one')
if x in range(4.5, 8):
    print('two')
if x in range(8.1, 9.9):
    print('three')
if x > 10:
    print('four')
I have also tried this to no avail
if x > 0 <= 4.4
Use
if 0 < x <= 4.4:
where x is in the middle. It's equivalent to
if 0 < x and x <= 4.4:
range is not suitable for that task.
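A short sketch of both points: range() rejects float bounds outright, while a chained comparison checks both ends of the interval. The helper name `in_interval` is made up for this illustration.

```python
x = 8.2

# range() only accepts integers, so a float bound raises TypeError.
try:
    range(0, 4.4)
except TypeError as exc:
    print('range failed:', exc)

# A chained comparison tests both bounds of the interval at once.
def in_interval(value, low, high):
    return low < value <= high

print(in_interval(x, 0, 4.4))
print(in_interval(x, 8.1, 9.9))
```

Even with integer bounds, `x in range(...)` only matches whole numbers, so it would still miss a value like 8.2.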
You don't need range(). Just use comparisons, and use elif so that the ranges are mutually exclusive.
if x < 4.5:
    print('one')
elif x < 8:
    print('two')
elif x < 10:
    print('three')
else:
    print('four')
This also solves the problem that you had gaps between your ranges.
x = 8.3
if 0 <= x <= 4.4:
    print('one')
if 4.5 <= x <= 8:
    print('two')
if 8.1 <= x <= 9.9:
    print('three')
if x > 10:
    print('four')

Python - Multiple Elif Statements Being Skipped Over in If Statement

I'm trying to get my program to return different statements based on the hour the user inputs. If I input a number for the hour that falls within the first two statements ((hours < 6) and (hours <= 10) or (hours >= 6)), it returns the correct string, but if I input anything greater than 10 for the hour, it doesn't return the intended string for that hour; it keeps returning the second string.
Any help is appreciated!
Here's my program:
https://i.stack.imgur.com/uQzBi.png
def food(hours, boolean):
    if boolean == "True" or boolean == "true":
        if (hours < 6):
            return "no food"
        elif (hours <= 10) or (hours >= 6):
            return "breakfast, marmalade"
        elif (hours <= 15) or (hours >= 11):
            return "lunch, true,dessert"
        elif (hours < 22) or (hours >= 15):
            return "dinner, dessert"
        else:
            return "no food"
    else:
        if (hours < 6):
            return "no food"
        elif (hours <= 10) or (hours >= 6):
            return "breakfast,coffee"
        elif (hours <= 15) or (hours >= 11):
            return "lunch, false"
        elif (hours < 22) or (hours >= 15):
            return "dinner"
        else:
            return "no food"
x = food(15, "true")
print(x)
You should be using “and” instead of “or”. Anything > 10 will also be >= 6 so the second condition always matches.
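A quick check of the two versions of the breakfast condition makes the difference visible. The function names are made up for this illustration.

```python
def breakfast_with_or(hours):
    # True whenever either half holds, which covers every number.
    return (hours <= 10) or (hours >= 6)

def breakfast_with_and(hours):
    # True only when both halves hold, i.e. 6 <= hours <= 10.
    return (hours >= 6) and (hours <= 10)

for hours in (3, 8, 15, 23):
    print(hours, breakfast_with_or(hours), breakfast_with_and(hours))
```

In the original function an hour such as 3 never reaches this test because the earlier `hours < 6` branch returns first, but every hour of 6 or more stops at the breakfast branch.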
Python has the boolean values True and False, so there is no need to use the strings 'True' or 'False'. You can also use the power of if-elif-else logic: Python evaluates the conditions from top to bottom and stops at the first one that is met. Your function can be rewritten like this:
def food(hour, boolean):
    '''Food
    Takes in hour as int and boolean as bool
    E.g. x = food(15, True)
    # TODO:
    Ensure that input data types are correct.
    '''
    if boolean:
        if hour >= 22 or hour < 6:
            return 'no food'
        elif hour >= 15:
            return 'dinner, dessert'
        elif hour >= 11:
            return 'lunch, true,dessert'
        elif hour >= 6:
            return 'breakfast, marmalade'
        else:
            raise ValueError('something wrong')
    else:
        if hour >= 22 or hour < 6:
            return 'no food'
        elif hour >= 15:
            return 'dinner'
        elif hour >= 11:
            return 'lunch, false'
        elif hour >= 6:
            return 'breakfast, coffee'
        else:
            raise ValueError('something wrong')
x = food(15, True)
print(x)
Looks like the first elif statement is your problem. You should use and instead of or. By using or, anything >= 6 will return breakfast marmalade, not just anything between 6 and 10.
Welcome to StackOverflow! As mentioned in the other answers, using 'and' instead of 'or' would solve your issue. However, it is redundant to include more than one condition for each meal if they are all sequential, because by writing:
if (hours < 6):
    return "no food"
you are already saying that the value is returned only if the hour input is less than 6, hence only values of 6 or more make it to the next elif statement.
Do let me know if I misunderstood something about your program's use case that required you to write the code as such!
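Following that observation, each branch can test just one upper bound. This is a sketch, not the asker's final code: the flag is assumed to be a real bool, the parameter name `with_extras` is made up, and hour 15 is assumed to count as lunch because the `hours <= 15` test comes first in the original.

```python
def food(hours, with_extras):
    # Bounds are checked in ascending order, so each branch only needs
    # an upper limit; earlier branches have ruled out smaller values.
    if hours < 6:
        return "no food"
    elif hours <= 10:
        return "breakfast, marmalade" if with_extras else "breakfast,coffee"
    elif hours <= 15:
        return "lunch, true,dessert" if with_extras else "lunch, false"
    elif hours < 22:
        return "dinner, dessert" if with_extras else "dinner"
    else:
        return "no food"

print(food(15, True))
print(food(21, False))
```

Folding the flag into each return keeps the hour thresholds in one place, so changing a boundary means editing a single line.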
