I have to implement the equivalent of the following SQL in a Pandas dataframe:
select * from table where ISNULL(date, GETDATE()) >= as_of_date
Basically, I want to select the rows where the value of date is greater than or equal to as_of_date. There are some rows where date is null, and in those cases I want to select the row only if as_of_date is less than or equal to today's date.
Is there a way to do this in Pandas?
You might need:
from datetime import date
df[df.date.fillna(date.today()) >= as_of_date]
You also need to make sure the date column and as_of_date are both datetime objects; if not, use pd.to_datetime() to convert:
df['date'] = pd.to_datetime(df.date)
as_of_date = pd.to_datetime(as_of_date)
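Putting both pieces together, here is a minimal end-to-end sketch of the ISNULL(date, GETDATE()) >= as_of_date filter (the sample data below is made up):
import pandas as pd

# hypothetical data: one past date, one future date, one null
df = pd.DataFrame({'date': pd.to_datetime(['2020-01-01', '2030-01-01', None])})
as_of_date = pd.Timestamp('2024-06-01')

# nulls are replaced with today's date before the comparison,
# mirroring ISNULL(date, GETDATE())
today = pd.Timestamp.today().normalize()
result = df[df['date'].fillna(today) >= as_of_date]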
df[(df['date'] >= as_of_date) | (df['date'].isna() & (pd.Timestamp(as_of_date) <= pd.Timestamp.today()))]
Note that df['date'] == None does not match nulls in pandas; use .isna() instead. This is just an example; if you provide your actual code and df, I can help in greater detail.
So I have this data frame that has many columns, but I'm only interested in the data spanning from, say, 01/01/2009 to 01/01/2019, so I want to keep all the data in that range and get rid of everything else.
Assuming the date column is named date_column:
df_new = df[(df['date_column'] > '01/01/2009') & (df['date_column'] <= '01/01/2019')]
print(df_new)
If they're correctly formatted:
df_new = df[df['date_col'].between('2009-01-01', '2019-01-01')]
This will work for any date format, dd-mm-yyyy or yyyy-mm-dd:
df[(pd.to_datetime(df['Date']).dt.year >= 2009) & (pd.to_datetime(df['Date']).dt.year <= 2019)]
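One caveat to that claim: pd.to_datetime assumes month-first parsing by default, so for dd-mm-yyyy strings you may want dayfirst=True or an explicit format. A small sketch with made-up values:
import pandas as pd

df = pd.DataFrame({'Date': ['15-03-2010', '01-07-2015', '20-12-2020']})
dates = pd.to_datetime(df['Date'], dayfirst=True)  # or format='%d-%m-%Y'
df_new = df[(dates.dt.year >= 2009) & (dates.dt.year <= 2019)]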
df1 = df1[df1['TIME STAMP'].between('2021-01-27 00:00:00', '2021-10-10 23:59:59')]
The above code selects rows between two specific dates, and it works fine. I want to select from a start date to the end (infinity / the last date of the dataframe), or some option to select by a from date only.
You can use comparison operators between timestamps:
import pandas as pd
df1 = df1[df1['TIME STAMP'] >= pd.Timestamp('2021-01-27 00:00:00')]
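The same pattern covers the other direction if you ever need only an end date (column name taken from your snippet):
df1 = df1[df1['TIME STAMP'] <= pd.Timestamp('2021-10-10 23:59:59')]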
How do you pull a certain date from a datetime column? I have been using this:
df.loc[(df['column name'] == 'date')]
But it cannot find the date although it is in the df.
Your datetime column probably has finer granularity than just the date (year, month, day); the default in pandas is nanoseconds (ns), but it could also be just seconds in your case, depending on the data source. You can see the dtype by accessing df.column.dtype yourself, and that also helps with the case where your column isn't actually of datetime dtype, in which case you need to cast it to datetime first.
And '2001-12-15' is not equal to '2001-12-15 18:36:45.2242', nor to '2001-12-15 18:36:45'.
If you only need dates, set the datetime column to just the date like this, using the .dt accessor for datetime segments:
df['column name'] = df['column name'].dt.date
Then you'll be able to match on it:
df.loc[df['column name'] == date(2001, 12, 15)]  # compare against a datetime.date, not a string
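A minimal runnable sketch of the whole flow (column name and values are made up):
import pandas as pd
from datetime import date

df = pd.DataFrame({'ts': pd.to_datetime(['2001-12-15 18:36:45', '2001-12-16 09:00:00'])})

# reduce the timestamps to plain dates, then compare against a datetime.date
matches = df[df['ts'].dt.date == date(2001, 12, 15)]
Alternatively, df['ts'].dt.normalize() == '2001-12-15' keeps the datetime dtype while zeroing out the time part.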
I have a list of dates in a DF that have been converted to a YYYY-MM format and need to select a range. This is what I'm trying:
#create dataframe
data = ['2016-01','2016-02','2016-09','2016-10','2016-11','2017-04','2017-05','2017-06','2017-07','2017-08']
df = pd.DataFrame(data, columns=['date'])
#lookup range
df[df["date"].isin(pd.date_range('2016-01', '2016-06'))]
It doesn't seem to be working because the date column is no longer a datetime column. The format has to be in YYYY-MM. So I guess the question is, how can I make a datetime column with YYYY-MM? Can someone please help?
Thanks.
You do not need an actual datetime-type column or query values for this to work. Keep it simple:
df[df.date.between('2016-01', '2016-06')]
That gives:
date
0 2016-01
1 2016-02
It works because ISO 8601 date strings sort lexicographically in chronological order: '2016-06' comes after '2016-05', and so on.
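If you do still want a real datetime column (the question's other ask), pd.to_datetime takes an explicit format; a quick sketch, where date_dt is just an illustrative name:
df['date_dt'] = pd.to_datetime(df['date'], format='%Y-%m')
df[df['date_dt'].between('2016-01-01', '2016-06-30')]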
I have a dataframe with columns customerId, amount, and date. The date range of this dataframe is 1/1/2016 to 9/9/2017. I am trying to find the top 10,000 customers, determined by the total amount of money they have spent in the year 2016. I was going to sort the amount column in descending order and then filter the date column to just 2016 using:
mask = (df['date'] >= '1/1/2016') & (df['date'] <'1/1/2017')
There has to be a smarter way to do this. I am new to coding, so any help would be appreciated, thanks!
Maybe you can try converting the column to datetime first:
df['date'] = pd.to_datetime(df['date'])
#then filter by year
mask = df['date'].apply(lambda x: x.year == 2016)
@A-Za-z's answer is more concise, but in case the column isn't already of datetime type, you can convert it with pd.to_datetime.
You can use the .dt accessor, given that the date column is a pandas datetime. Otherwise, convert it to datetime first:
df.date = pd.to_datetime(df.date)
df[df.date.dt.year == 2016]
This should give you the required rows. If you can post a sample dataset, it would be easier to test.
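Once the 2016 rows are isolated, getting the top customers by total spend is one groupby away. A sketch of the full flow, using the column names from the question:
df['date'] = pd.to_datetime(df['date'])
df_2016 = df[df['date'].dt.year == 2016]

# total 2016 spend per customer, keeping the 10,000 largest
top_customers = df_2016.groupby('customerId')['amount'].sum().nlargest(10000)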