Would Python interpret this:
if hour < 7 and hour > 0 or hour > 20 and hour < 23:
the same as
if 7 > hour > 0 or 23 > hour > 20 (this one is just the usual mathematical inequality)
If not, what should I write to express this inequality in Python?
You can use comparison chaining.
if (0 < hour < 7) or (20 < hour < 23):
    # do stuff
(Parentheses for emphasis.)
You can always use parentheses to be sure:
if (hour < 7 and hour > 0) or (hour > 20 and hour < 23):
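To check that the two spellings really agree, here is a small sketch that compares them over a range of hours. The function names chained and unchained are made up for this illustration. It relies on the fact that and binds more tightly than or, so the original unparenthesized condition already groups the way the parenthesized one does.

```python
def chained(hour):
    # Comparison chaining: 0 < hour < 7 means (0 < hour) and (hour < 7)
    return (0 < hour < 7) or (20 < hour < 23)


def unchained(hour):
    # `and` has higher precedence than `or`, so this groups as
    # (hour < 7 and hour > 0) or (hour > 20 and hour < 23)
    return hour < 7 and hour > 0 or hour > 20 and hour < 23


# Collect every hour for which the two versions disagree
mismatches = [h for h in range(-1, 25) if chained(h) != unchained(h)]
print(mismatches)
```

The list of mismatches comes out empty, so the parentheses are only there for readability.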
Imagine I have a column days:
days
190
567
55
I want to create a new column based on the condition that
df['new_colum'] =
if day < 180:
print(y)
elif( days >180 & < 365):
print(d)
else:
print(h)
How do I do this in Python, and is there an alternative to the if condition?
You can use df.apply:
df['new_col'] = df.days.apply(lambda x: y if x < 180 else (d if 180 < x < 365 else h))
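As an alternative to apply, a vectorized version could be sketched with np.select. This assumes the labels are the strings 'y', 'd' and 'h' (the question never says what y, d and h are), and it mirrors the lambda above, so a value of exactly 180 falls through to the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"days": [190, 567, 55]})

# Conditions are checked in order; the first one that is True wins
conditions = [
    df["days"] < 180,
    (df["days"] > 180) & (df["days"] < 365),
]
labels = ["y", "d"]  # assumed string labels, for illustration only

# Rows matching no condition get the default label
df["new_col"] = np.select(conditions, labels, default="h")
print(df)
```

Note that each comparison is wrapped in parentheses before combining with &, since & binds more tightly than < and >.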
day = 1
month = 3
for x in range(3):
    while day <= 31:
        print(str(month)+"/"+str(day)+"/2019'")
        day += 1
    month += 1
I am trying to print the first 31 days of each month from March to May. (I know April has 30 days, but I am not concerned with that.)
The while loop works, printing out the first 31 days in March. However, the code does not loop two more times and increment the month to print out the 31 days for April and May.
I am used to Java for loops and I am not familiar with iterating over an undeclared variable.
You need to reset day back to 1 inside the outer loop; otherwise day is still 32 the second time through, and the while condition is false immediately. You can do this by moving the assignment inside the loop:
month = 3
for x in range(3):
    day = 1
    while day <= 31:
        print(str(month)+"/"+str(day)+"/2019'")
        day += 1
    month += 1
Having said that, it's easier just to use for loops:
for month in range(3, 6):
    for day in range(1, 32):
        print(f"{month}/{day}/2019")
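The question says the 30-day April does not matter, but if it ever does, the standard library can supply the real month lengths. A minimal sketch, where dates_for_months is a hypothetical helper name and the dates are collected into a list instead of printed:

```python
import calendar


def dates_for_months(year, first_month, last_month):
    """Return month/day/year strings for every real day in the given months."""
    dates = []
    for month in range(first_month, last_month + 1):
        # monthrange returns (weekday of the 1st, number of days in the month)
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            dates.append(f"{month}/{day}/{year}")
    return dates


dates = dates_for_months(2019, 3, 5)
print(dates[0], dates[-1], len(dates))
```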
The only problem is that the variable day is not being reset once a new month starts. You need to move the assignment day = 1 inside the for loop:
month = 3
for x in range(3):
    day = 1
    while day <= 31:
        print(str(month)+"/"+str(day)+"/2019'")
        day += 1
    month += 1
I'm trying to create a boolean mask with multiple conditions on a DatetimeIndex. Here is my example:
df = pd.DataFrame(index=pd.date_range('2020-05-24', '2020-05-26', freq='1H', closed='left'))
mybool = np.logical_and(df.index.weekday < 5 , df.index.hour > 7 , df.index.hour < 20)
So mybool should be True for Mon-Friday for 12 hrs from 8am to 8pm. However this returns true on Mon-Fri from 8am to midnight. So it looks like the first two conditions are picked up but the third is not. But this also returns no error.
Chain your conditions with bitwise ands:
(df.index.weekday < 5) & (df.index.hour > 7) & (df.index.hour < 20)
Note that np.logical_and expects only two input arrays x1, x2. An alternative would be to use np.logical_and.reduce on a list of conditions:
np.logical_and.reduce([df.index.weekday < 5, df.index.hour > 7, df.index.hour < 20])
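The reason no error is raised is that the third positional argument of a NumPy ufunc such as np.logical_and is out, so the third condition is used as an output buffer, not as an input. The sketch below uses small hand-made arrays (weekday and hour are illustrative stand-ins for df.index.weekday and df.index.hour) to show that the two suggested forms agree, and that the two-input call reproduces the behaviour described in the question.

```python
import numpy as np

# Stand-ins for df.index.weekday and df.index.hour
weekday = np.array([0, 0, 0, 5])
hour = np.array([9, 21, 6, 9])

conds = [weekday < 5, hour > 7, hour < 20]

# Both of these combine all three conditions
mask_and = conds[0] & conds[1] & conds[2]
mask_reduce = np.logical_and.reduce(conds)

# Only the first two conditions: what the original call effectively computed
mask_two = np.logical_and(conds[0], conds[1])

print(mask_and)
print(mask_two)
```

The second element (a weekday at 21:00) is True in mask_two but False in the full mask, which matches the "8am to midnight" symptom.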
I am following the suggestions in "pandas create new column based on values from other columns" but am still getting an error. Basically, my pandas dataframe has many columns and I want to group the dataframe based on a new categorical column whose value depends on two existing columns (AMP, Time).
df
df['Time'] = pd.to_datetime(df['Time'])
#making sure Time column read from the csv file is time object
import datetime as dt
day_1 = dt.date.today()
day_2 = dt.date.today() - dt.timedelta(days = 1)
def f(row):
    if (row['AMP'] > 100) & (row['Time'] > day_1):
        val = 'new_positives'
    elif (row['AMP'] > 100) & (day_2 <= row['Time'] <= day_1):
        val = 'rec_positives'
    elif (row['AMP'] > 100 & row['Time'] < day_2):
        val = 'old_positives'
    else:
        val = 'old_negatives'
    return val
df['GRP'] = df.apply(f, axis=1) #this gives the following error:
TypeError: ("Cannot compare type 'Timestamp' with type 'date'", 'occurred at index 0')
df[(df['AMP'] > 100) & (df['Time'] > day_1)] #this works fine
df[(df['AMP'] > 100) & (day_2 <= df['Time'] <= day_1)] #this works fine
df[(df['AMP'] > 100) & (df['Time'] < day_2)] #this works fine
#df = df.groupby('GRP')
I am able to select the proper sub-dataframes based on the conditions specified above, but when I apply the above function on each row, I get the error. What is the correct approach to group the dataframe based on the conditions listed?
EDIT:
Unfortunately, I cannot provide a sample of my dataframe. However, here is a simple dataframe that gives an error of the same type:
import numpy as np
import pandas as pd
mydf = pd.DataFrame({'a':np.arange(10),
'b':np.random.rand(10)})
def f1(row):
    if row['a'] < 5 & row['b'] < 0.5:
        value = 'less'
    elif row['a'] < 5 & row['b'] > 0.5:
        value = 'more'
    else:
        value = 'same'
    return value
mydf['GRP'] = mydf.apply(f1, axis=1)
TypeError: ("unsupported operand type(s) for &: 'int' and 'float'", 'occurred at index 0')
EDIT 2:
As suggested below, enclosing the comparison operator with parentheses did the trick for the cooked up example. This problem is solved.
However, I am still getting the same error in my real example. By the way, if I use the column 'AMP' with another column in my table, then everything works and I am able to create df['GRP'] by applying the function f to each row. This shows the problem is related to using df['Time']. But then why am I able to select df[(df['AMP'] > 100) & (df['Time'] > day_1)]? Why would this work in this context, but not when the condition appears in a function?
Based on your error message and example, there are two things to fix. One is to adjust parentheses for operator precedence in your final elif statement. The other is to avoid mixing datetime.date and Timestamp objects.
Fix 1: change this:
elif (row['AMP'] > 100 & row['Time'] < day_2):
to this:
elif (row['AMP'] > 100) & (row['Time'] < day_2):
These two lines are different because the bitwise & operator takes precedence over the < and > comparison operators, so python attempts to evaluate 100 & row['Time']. A full list of Python operator precedence is here: https://docs.python.org/3/reference/expressions.html#operator-precedence
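The precedence issue can be reproduced without pandas. In this sketch the values a and b are arbitrary scalars chosen for illustration. For plain scalars, as in a row-wise function, the logical and operator is also an option and does not have the precedence problem.

```python
a, b = 3, 0.25

try:
    # Parsed as a < (5 & b) < 0.5, so Python tries 5 & 0.25 first
    a < 5 & b < 0.5
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Parenthesized: bitwise AND of two booleans
with_parens = (a < 5) & (b < 0.5)

# Logical `and` binds more loosely than the comparisons
with_and = a < 5 and b < 0.5

print(error_message)
print(with_parens, with_and)
```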
Fix 2: Change these 3 lines:
import datetime as dt
day_1 = dt.date.today()
day_2 = dt.date.today() - dt.timedelta(days = 1)
to these 2 lines:
day_1 = pd.to_datetime('today')
day_2 = day_1 - pd.DateOffset(days=1)
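Putting both fixes together, a row-wise version might look like the sketch below. It is only an illustration: the date is fixed so the result is reproducible, the sample AMP and Time values are invented, and classify is a hypothetical helper that takes the two values directly. The key point is that both thresholds are pandas Timestamps, so they compare cleanly with the values in the Time column.

```python
import datetime as dt

import pandas as pd

today = dt.date(2018, 10, 23)          # fixed date, for illustration only
day_1 = pd.Timestamp(today)            # convert the date to a Timestamp
day_2 = day_1 - pd.Timedelta(days=1)


def classify(amp, when):
    # Plain `and`/`if` is fine here because amp and when are scalars
    if amp <= 100:
        return 'old_negatives'
    if when > day_1:
        return 'new_positives'
    if day_2 <= when <= day_1:
        return 'rec_positives'
    return 'old_positives'


# Invented sample data
df = pd.DataFrame({
    'AMP': [150, 150, 150, 50],
    'Time': pd.to_datetime(['2018-10-24', '2018-10-22',
                            '2018-10-01', '2018-10-24']),
})
df['GRP'] = df.apply(lambda row: classify(row['AMP'], row['Time']), axis=1)
print(df)
```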
Some parentheses need to be added in the if-statements:
import numpy as np
import pandas as pd
mydf = pd.DataFrame({'a':np.arange(10),
'b':np.random.rand(10)})
def f1(row):
    if (row['a'] < 5) & (row['b'] < 0.5):
        value = 'less'
    elif (row['a'] < 5) & (row['b'] > 0.5):
        value = 'more'
    else:
        value = 'same'
    return value
mydf['GRP'] = mydf.apply(f1, axis=1)
If you don't need to use a custom function, then you can use multiple masks (somewhat similar to this SO post)
For the Time column, I used the code below. My guess is that you were comparing Time column values against objects that did not have a compatible type.
import datetime as dt
mydf['Time'] = pd.date_range(start='10/14/2018', end=dt.date.today())
day_1 = pd.to_datetime(dt.date.today())
day_2 = day_1 - pd.DateOffset(days = 1)
Here is the raw data
mydf
a b Time
0 0 0.550149 2018-10-14
1 1 0.889209 2018-10-15
2 2 0.845740 2018-10-16
3 3 0.340310 2018-10-17
4 4 0.613575 2018-10-18
5 5 0.229802 2018-10-19
6 6 0.013724 2018-10-20
7 7 0.810413 2018-10-21
8 8 0.897373 2018-10-22
9 9 0.175050 2018-10-23
One approach involves using masks for columns
# Append new column
mydf['GRP'] = 'same'
# Use masks to change values in new column
mydf.loc[(mydf['a'] < 5) & (mydf['b'] < 0.5) & (mydf['Time'] < day_2), 'GRP'] = 'less'
mydf.loc[(mydf['a'] < 5) & (mydf['b'] > 0.5) & (mydf['Time'] > day_1), 'GRP'] = 'more'
mydf
a b Time GRP
0 0 0.550149 2018-10-14 same
1 1 0.889209 2018-10-15 same
2 2 0.845740 2018-10-16 same
3 3 0.340310 2018-10-17 less
4 4 0.613575 2018-10-18 same
5 5 0.229802 2018-10-19 same
6 6 0.013724 2018-10-20 same
7 7 0.810413 2018-10-21 same
8 8 0.897373 2018-10-22 same
9 9 0.175050 2018-10-23 same
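Since the original goal was to group the dataframe by the new column, the mask approach can be followed directly by a groupby. The sketch below uses a tiny invented dataframe and a fixed day_1 so that the outcome is reproducible; it is not the data shown above.

```python
import pandas as pd

# Invented sample data, one row per expected group
df = pd.DataFrame({
    "a": [0, 1, 6],
    "b": [0.2, 0.9, 0.1],
    "Time": pd.to_datetime(["2018-10-14", "2018-10-25", "2018-10-14"]),
})
day_1 = pd.Timestamp("2018-10-23")     # fixed date, for illustration only
day_2 = day_1 - pd.DateOffset(days=1)

# Default label, then overwrite the rows selected by each mask
df["GRP"] = "same"
df.loc[(df["a"] < 5) & (df["b"] < 0.5) & (df["Time"] < day_2), "GRP"] = "less"
df.loc[(df["a"] < 5) & (df["b"] > 0.5) & (df["Time"] > day_1), "GRP"] = "more"

# Group on the new categorical column
counts = df.groupby("GRP").size().to_dict()
print(counts)
```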
Another approach is to set a, b and Time as a multi-index and use index-based masks to set values
mydf.set_index(['a','b','Time'], inplace=True)
# Get Index level values
a = mydf.index.get_level_values('a')
b = mydf.index.get_level_values('b')
t = mydf.index.get_level_values('Time')
# Apply index-based masks
mydf['GRP'] = 'same'
mydf.loc[(a < 5) & (b < 0.5) & (t < day_2), 'GRP'] = 'less'
mydf.loc[(a < 5) & (b > 0.5) & (t > day_1), 'GRP'] = 'more'
mydf.reset_index(drop=False, inplace=True)
mydf
a b Time GRP
0 0 0.550149 2018-10-14 same
1 1 0.889209 2018-10-15 same
2 2 0.845740 2018-10-16 same
3 3 0.340310 2018-10-17 less
4 4 0.613575 2018-10-18 same
5 5 0.229802 2018-10-19 same
6 6 0.013724 2018-10-20 same
7 7 0.810413 2018-10-21 same
8 8 0.897373 2018-10-22 same
9 9 0.175050 2018-10-23 same
Source to filter by datetime and create a range of dates.
There is an excellent example here; it is very useful, and you could apply filters after the groupby. It is a way to do this without using masks.
def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'
In [6]: grouped = df.groupby(get_letter_type, axis=1)
https://pandas.pydata.org/pandas-docs/version/0.22/groupby.html
I am trying to create a function in python which will display the date. So I can see the program run, I have set one day to five seconds, so every five seconds it will become the next 'day' and it will print the date.
I know there is already a built-in function for displaying a date; however, I am very new to Python and I am trying to improve my skills (so excuse my poor coding).
I have set the starting date to the first of January, 2000.
Here is my code:
import time
def showDate():
    year = 00
    month = 1
    day = 1
    oneDay = 5
    longMonths = [1, 3, 5, 7, 8, 10, 12]
    shortMonths = [4, 6, 9, 11]
    while True:
        time.sleep(1)
        oneDay = oneDay - 1
        if oneDay == 0:
            if month in longMonths:
                if day > 31:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if month == 2:
                if day > 28:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if month in shortMonths:
                if day > 30:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if day == 31 and month == 12:
                year = year + 1
            print(str(day) + '/' + str(month) + '/' + str(year))
            oneDay = 5
showDate()
However, when I try to run the program, this is the output I get:
>>>
0/3/0
0/5/0
0/7/0
0/8/0
0/10/0
0/12/0
0/13/0
0/13/0
0/13/0
I don't know why this is happening, could someone please suggest a solution?
There's no possible path through your code where day gets incremented.
I think you are actually confused between > and <: you check if day is greater than 31 or 28, which it never is. I think you mean if day < 31: and so on.
First of all, it's easier to just call time.sleep(5) instead of looping over time.sleep(1) five times. It's also better to have a single list with the number of days in each month, not two lists of long and short months. Also, your while loop is currently infinite; is that intentional?
Anyway, your main problem was comparing day > 31, but there are lots of things that can be improved. As I said, I'm removing oneDay and just calling sleep(5), as it's cleaner, and using one daysInMonths list.
import time
def showDate():
    year = 00
    month = 1
    day = 1
    daysInMonths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
Now you can have only one if check about if the day has reached the end of a month, like this:
    while True:
        time.sleep(5)
        if day < daysInMonths[month-1]:
            day += 1
This will check the index of the list for the current month. It uses -1 because lists begin at index 0, and your months begin at 1. (ie. the months run from 1-12 but the list's indices are 0-11). Also I used the += operator, which is basically short hand for var = var + something. It works the same and looks neater.
This test encompasses all months, and then the alternative scenario is that you need to increment the month. I recommend in this block that you first check if the month is 12 and then increment the year from there. Also you should be setting day and month back to 1, since that was their starting value. If it's not the end of the year, increment the month and set day back to 1.
        else:
            if month == 12:
                year += 1
                day = 1
                month = 1
            else:
                month += 1
                day = 1
        print("{}/{}/{}".format(day, month, year))
I also used the str.format syntax for neatness. With format, each {} in the string is replaced by the corresponding argument you pass in. It makes it easier to lay out how the string should actually look, and it converts the values to strings implicitly.
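To check the rollover logic without waiting five seconds per day, the same rules can be pulled into a small pure function. This is only a sketch: next_date and DAYS_IN_MONTH are names invented here, and like the code above it ignores leap years.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def next_date(day, month, year):
    """Return the (day, month, year) that follows the given date."""
    if day < DAYS_IN_MONTH[month - 1]:   # months are 1-12, list indices 0-11
        return day + 1, month, year
    if month == 12:                      # end of December: start a new year
        return 1, 1, year + 1
    return 1, month + 1, year            # end of any other month


# Step through one full (non-leap) year starting at 1/1/0
date = (1, 1, 0)
seen = []
for _ in range(365):
    seen.append(date)
    date = next_date(*date)

print("{}/{}/{}".format(*seen[-1]))
print("{}/{}/{}".format(*date))
```

Inside the original loop, the body would then reduce to sleeping, calling next_date, and printing the result.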
Try this.
The day comparisons should be <, not >. When going to the next month, I set the day to 1, because there is no day 0 in the calendar. And I use elif for the subsequent month tests, because all the cases are exclusive.
import time
def showDate():
    year = 00
    month = 1
    day = 1
    oneDay = 5
    longMonths = [1, 3, 5, 7, 8, 10, 12]
    shortMonths = [4, 6, 9, 11]
    while True:
        time.sleep(1)
        oneDay = oneDay - 1
        if oneDay == 0:
            if month in longMonths:
                if day < 31:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            elif month == 2:
                if day < 28:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            elif month in shortMonths:
                if day < 30:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            # After 31 December the month counter reaches 13: start a new year
            if month > 12:
                year = year + 1
                month = 1
            print(str(day) + '/' + str(month) + '/' + str(year))
            oneDay = 5