Trouble selecting records based on date in python/pandas - python

I have a column in my dataframe df_event called 'ProgramDateTime'. I've successfully created separate columns for EventDate and EventTime based on this using df_event.ProgramDateTime.dt.date and df_event.ProgramDateTime.dt.time.
My problem occurs when I try to select records between two dates. I seem to run into all kinds of type errors whatever I try.
I'm a relatively new Python/pandas user, just recently familiar with dataframes. I'm using Python3.7.
I have tried using strptime and even just trying to select records based on the original column ProgramDateTime.
I'm also writing this code in Sublime Text
import pandas as pd, numpy as np
from datetime import datetime
File_Path = 'path'
Event_csv = 'file.csv'
df_event = pd.read_csv(File_Path+Event_csv)
# Indicate analysis period
StartDate = datetime.strptime('2018-08-08', '%Y-%m-%d')
EndDate = datetime.strptime('2019-07-01', '%Y-%m-%d')
# Change appropriate column in Events dataframe to make sure in Datetime format.
df_event['ProgramDateTime'] = pd.to_datetime(df_event['ProgramDateTime'])
#Create separate columns for Event Date and Time in dataframe
df_event['EventDate'], df_event['EventTime'] = df_event.ProgramDateTime.dt.date, df_event.ProgramDateTime.dt.time
# Create dataframe of programs occurring only during analysis period
df_event_ap = df_event[df_event['EventDate']>=StartDate and df_event['EventDate']<=EndDate]
print(df_event_ap.dtypes)
print(df_event_ap.head(11))
I expect to see a new dataframe, df_event_ap, containing only those records that are between StartDate and EndDate.
Instead, the problem happens just as Python is supposed to select the records (the line of code underneath the last comment (#) line).
I get this error:
TypeError: can't compare datetime.datetime to datetime.date

The first thing I can spot is df_event['EventDate']. Because .dt.date returns Python datetime.date objects, the column has to be converted back into datetime format before it can be compared with StartDate and EndDate:
df_event['EventDate'] = pd.to_datetime(df_event['EventDate'])
Then do:
from datetime import datetime
StartDate = datetime.strptime('2018-08-08', '%Y-%m-%d')
EndDate = datetime.strptime('2019-07-01', '%Y-%m-%d')
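Now that StartDate, EndDate and df_event['EventDate'] are all in the same format, combine the two conditions with the element-wise & operator, wrapping each comparison in parentheses (Python's and does not work on a pandas Series):
df_event_ap = df_event[(df_event['EventDate']>=StartDate) & (df_event['EventDate']<=EndDate)]
You will now get both outputs: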
Output for first print statement:
print(df_event_ap.dtypes)
ProgramDateTime datetime64[ns]
EventDate datetime64[ns]
EventTime object
dtype: object
Output for second print statement:
print(df_event_ap.head(11))
ProgramDateTime EventDate EventTime
0 2018-12-20 12:46:52 2018-12-20 12:46:52
2 2018-12-25 12:46:52 2018-12-25 12:46:52
4 2018-11-20 12:46:52 2018-11-20 12:46:52
5 2018-12-10 12:46:52 2018-12-10 12:46:52
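As a self-contained sketch of the same idea, the example below keeps the date column as a pandas datetime with dt.normalize() instead of dt.date, and filters with Series.between, which includes both endpoints by default. The sample rows and the snake_case variable names are invented for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample rows standing in for the CSV file
df_event = pd.DataFrame({
    "ProgramDateTime": pd.to_datetime([
        "2018-07-01 09:00:00",
        "2018-12-20 12:46:52",
        "2019-06-30 18:30:00",
        "2019-08-15 08:15:00",
    ])
})

# Analysis period as pandas Timestamps, so every operand has a datetime type
start_date = pd.Timestamp("2018-08-08")
end_date = pd.Timestamp("2019-07-01")

# normalize() sets the time to midnight but keeps the column as a pandas datetime,
# unlike dt.date, which produces plain Python date objects (object dtype)
df_event["EventDate"] = df_event["ProgramDateTime"].dt.normalize()

# between() is inclusive on both ends by default
df_event_ap = df_event[df_event["EventDate"].between(start_date, end_date)]
print(df_event_ap)
```

Only the rows dated 2018-12-20 and 2019-06-30 fall inside the period, so those are the two rows kept.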

Related

How to convert data which is in timedelta format into datetime?

|Event Date|startTime|
|----------|---------|
|2022-11-23|0 days 08:30:00|
When I was trying to load data from a SQL table into a dataframe, using variables from columns of another dataframe,
it came out like this. I want only the time, 08:30:00. What should I do to get the required output?
The output I need looks like this:
|Event Date|startTime|
|----------|---------|
|2022-11-23|08:30:00|
I tried
sql['startTime']=pd.to_datetime(df1['startTime']).dt.time
but it shows this error:
TypeError: <class 'datetime.time'> is not convertible to datetime
I searched for a solution but did not find anything useful. I only came across a question about the opposite situation, which did not contain useful information for my case.
Add the timedelta to a datetime, then you have a time component. Ex:
import pandas as pd
df = pd.DataFrame({"Event Date": ["2022-11-23"],
"startTime": ["0 days 08:30:00"]})
# ensure correct datatypes; that might be not necessary in your case
df["Event Date"] = pd.to_datetime(df["Event Date"])
df["startTime"] = pd.to_timedelta(df["startTime"])
# create a datetime Series that includes the timedelta
df["startDateTime"] = df["Event Date"] + df["startTime"]
df["startDateTime"].dt.time
0 08:30:00
Name: startDateTime, dtype: object
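If no date column is available, the same trick should work with any anchor date, since only the time-of-day part is kept afterwards. This is a minimal sketch under that assumption; the sample values, the anchor date, and the variable names are hypothetical, and it assumes every timedelta is shorter than 24 hours.

```python
import datetime
import pandas as pd

# Hypothetical timedelta column, similar to what the question shows
start = pd.to_timedelta(pd.Series(["0 days 08:30:00", "0 days 17:05:30"]))

# Any date can serve as the anchor; it is discarded by .dt.time below
anchor = pd.Timestamp("1970-01-01")

# Timestamp + timedelta Series gives a datetime Series, which has a .dt.time accessor
start_time = (anchor + start).dt.time
print(start_time.tolist())
```

The result is a column of Python datetime.time objects (object dtype), the same type the answer above ends up with.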

In Python, Excel, pandas: I need to move rows based on date to a new sheet, but I'm getting a date error

I have an Excel report that is run weekly and that we modify manually. I have created a Python script to remove unwanted data and format it. Now I have a request to copy data that is 1 year or older from today, 2 years, 3 years and so on. I'm not a programmer, so bear with me.
I import the Excel file:
excel_workbook = 'Excel.xlsx'
sheet1 = pd.read_excel(excel_workbook, sheet_name='Sheet1', keep_default_na= False, index_col=0,
parse_dates=['DT RECD'])
Then, when I try to set up the dates, I get this:
today = pd.datetime.now().date()
print(today)
oneYear = today - pd.Timedelta.days(365) #timedelta(days=365)
print(oneYear)
twoYear = today - pd.Timedelta.days(730) #timedelta(days=730)
print(twoYear)
errors:
pandas.core.arrays.datetimelike.InvalidComparison: 730 days, 0:00:00
raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
TypeError: Invalid comparison between dtype=datetime64[ns] and timedelta
How do I match dates so I can state something like OverOneYear = <= oneYear and >= twoYear on column 'DT RECD'?
So I have made these changes:
today = datetime.date.today()
oneYear = today - pd.DateOffset(years=1)
twoYear = today - pd.DateOffset(years=2)
Output:
2021-03-01
2020-03-01 00:00:00
2019-03-01 00:00:00
Original timestamp format:
MM/DD/YYYY
After the changes:
2020-03-01 00:00:00
Sample Data set
Dataset
Trying to filter
YearOne[YearOne['DT RECD'].between(oneYear, twoYear)]
print(YearOne)
No error, but the date 2021-03-01 (is this a result of parse_dates=['DT RECD'] on import?) does not match 2020-03-01 00:00:00. I do not need the 00:00:00; if I can drop it, I think it will filter just fine.
It is in datetime64[ns]
sheet1.dtypes
FINISH object
LENGTH object
Qty on Hand int64
Unit Cost float64
DT RECD datetime64[ns]
PRODUCED object
Status object
You can replace your setting of oneYear and twoYear as follows:
oneYear = today - pd.offsets.DateOffset(years=1)
twoYear = today - pd.offsets.DateOffset(years=2)
If you want to select the rows whose dates in column 'DT RECD' fall between one and two years ago, you can use the following (assuming df is the name of the DataFrame). Note that between expects the earlier bound first:
df_selected = df[df['DT RECD'].between(twoYear, oneYear)] # extract into new variable
print(df_selected) # view the selected result
Edit:
If you want to keep the dates in datetime.date format instead of Timestamp format, add the following 2 lines in addition to the above:
oneYear = oneYear.to_pydatetime().date()
twoYear = twoYear.to_pydatetime().date()
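A small end-to-end sketch of the filter, assuming the bounds are kept as pandas Timestamps so they compare cleanly against a datetime64 column. The sample dates, the fixed value of today, and the snake_case names are hypothetical; in practice today would come from something like pd.Timestamp.today().normalize().

```python
import pandas as pd

# Fixed "today" so the example is reproducible (hypothetical value)
today = pd.Timestamp("2021-03-01")

# Hypothetical stand-in for the sheet read from Excel
df = pd.DataFrame({
    "DT RECD": pd.to_datetime(
        ["2021-01-15", "2020-02-10", "2019-06-30", "2018-12-01"]
    )
})

one_year = today - pd.DateOffset(years=1)  # 2020-03-01
two_year = today - pd.DateOffset(years=2)  # 2019-03-01

# between(left, right) keeps rows with left <= value <= right,
# so the older bound (two_year) has to come first
over_one_year = df[df["DT RECD"].between(two_year, one_year)]
print(over_one_year)
```

With these sample rows, only 2020-02-10 and 2019-06-30 are more than one year but less than two years old.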

How to change the data type of a date column from object to datetime in Python?

In a train data set, the datetime column has dtype object. The first row of this column is: 2009-06-15 17:26:21 UTC. I tried splitting the data:
train['Date'] = train['pickup_datetime'].str.slice(0,11)
train['Time'] = test['pickup_datetime'].str.slice(11,19)
That way I could split the date and time into two variables and change them to the datetime data type. I tried a lot of methods but could not get the result.
train['Date']=pd.to_datetime(train['Date'], format='%Y-%b-%d')
I also tried splitting the date, time and UTC:
train['DateTime'] = pd.to_datetime(train['DateTime'])
Please suggest code for this. I am a beginner.
Thanks in advance
I would try the following
import pandas as pd
#create some random dates matching your formatting
df = pd.DataFrame({"date": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]})
#convert to datetime objects
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dt.date) #returns the date part without tz information
print(df["date"].dt.time) #returns the time part
Output:
0 2009-06-15
1 2010-08-16
Name: date, dtype: object
0 17:26:21
1 19:26:21
Name: date, dtype: object
For further information feel free to consult the docs:
dt.date
dt.time
For your particular case:
#convert to datetime object
df['pickup_datetime']= pd.to_datetime(df['pickup_datetime'])
# separate date and time
df['Date'] = df['pickup_datetime'].dt.date
df['Time'] = df['pickup_datetime'].dt.time
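One likely reason the attempt in the question failed is the format string: %b expects an abbreviated month name such as "Jun", while the data has a numeric month, which is %m. The sketch below passes an explicit format instead of relying on automatic parsing. The format string and the utc=True choice are assumptions based on the single sample row shown in the question.

```python
import pandas as pd

# Sample rows matching the formatting shown in the question
train = pd.DataFrame({
    "pickup_datetime": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]
})

# %m is the numeric month; the trailing " UTC" is matched as literal text,
# and utc=True marks the parsed values as UTC
train["pickup_datetime"] = pd.to_datetime(
    train["pickup_datetime"], format="%Y-%m-%d %H:%M:%S UTC", utc=True
)

# Split into separate date and time columns (plain Python objects)
train["Date"] = train["pickup_datetime"].dt.date
train["Time"] = train["pickup_datetime"].dt.time
print(train)
```

An explicit format also makes pandas raise an error on rows that do not match, which can help when the column contains unexpected values.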

How to import date column from csv in python in format d/m/y

I have a data sheet in which issue_d is a date column having values stored in a format - 11-Dec. On clicking any cell of the column, date is coming as 12/11/2018.
But while reading the csv file, issue_d is getting imported as 11-Dec. Year is not getting imported.
How do I get the issue_d column in format- d/m/y?
Code I tried:
import pandas
data=pandas.read_csv('Project_data.csv')
print(data)
checking issue_d column: data['issue_d']
result :
0 11-Dec
1 11-Dec
2 11-Dec
expected:
0 11-Dec-2018
1 11-Dec-2018
2 11-Dec-201
You can append the year to the column and then use to_datetime:
df['issue_d'] = pd.to_datetime(df['issue_d'] + '-2018')
print (df)
issue_d
0 2018-12-11
1 2018-12-11
2 2018-12-11
A more 'controllable' way of getting the data is to first get the datetime from the data frame as normal, and then convert it:
dt = dt.strftime('%Y-%m-%d')
In this case, you'd put %d in front. strftime is a useful technique because it gives the most control over how a datetime variable is converted to a string.
After you do this, you can slice out each individual month, day, and year, and then use
strftime("%B")
to get the string-name of the month (e.g. "February").
Good Luck!
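Putting the two answers together, a minimal sketch could parse the column with an explicit format and then render it as d/m/y text. The year 2018 is an assumption taken from the question, since the CSV itself does not store it, and %b assumes English month abbreviations.

```python
import pandas as pd

# Values as they arrive from the CSV in the question
data = pd.DataFrame({"issue_d": ["11-Dec", "11-Dec", "11-Dec"]})

# Append the assumed year, then parse day, abbreviated month name, and year
parsed = pd.to_datetime(data["issue_d"] + "-2018", format="%d-%b-%Y")

# strftime turns the datetimes back into strings in d/m/y order
data["issue_d"] = parsed.dt.strftime("%d/%m/%Y")
print(data)
```

After strftime the column holds strings, not datetimes, so it is usually best to keep the parsed datetime column for calculations and format it only for display or export.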

How do I change the Date but not the Time of a Timestamp within a dataframe column?

Python 3.6.0
I am importing a file with Unix timestamps.
I’m converting them to Pandas datetime and rounding to 10 minutes (12:00, 12:10, 12:20,…)
The data is collected from within a specified time period, but from different dates.
For our analysis, we want to change all dates to the same dates before doing a resampling.
At present we have a reduce_to_date that is the target for all dates.
current_date = pd.to_datetime('2017-04-05') #This will later be dynamic
reduce_to_date = current_date - pd.DateOffset(days=7)
I’ve tried to find an easy way to change the date in a series without changing the time.
I was trying to avoid lengthy conversions with .strftime().
One method that I've almost settled on is to add the difference between reduce_to_date and df['Timestamp'] to df['Timestamp']. However, I was trying to use the .date() function, and that only works on a single element, not on the series.
This works for a single element:
passed_df['Timestamp'][0] = passed_df['Timestamp'][0] + (reduce_to_date.date() - passed_df['Timestamp'][0].date())
This does not work on the whole series:
passed_df['Timestamp'][:] = passed_df['Timestamp'][:] + (reduce_to_date.date() - passed_df['Timestamp'][:].date())
AttributeError: 'Series' object has no attribute 'date'
I can use a loop:
x=1
for line in passed_df['Timestamp']:
    passed_df['Timestamp'][x] = line + (reduce_to_date.date() - line.date())
    x+=1
But this throws a warning:
C:\Users\elx65i5\Documents\Lightweight Logging\newmain.py:60: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The goal is to have all dates the same, but leave the original time.
If we can simply specify the replacement date, that’s great.
If we can use mathematics and change each date according to a time delta, equally as great.
Can we accomplish this in a vectorized fashion without using .strftime() or a lengthy procedure?
If I understand correctly, you can simply subtract an offset
passed_df['Timestamp'] -= pd.offsets.Day(7)
demo
passed_df=pd.DataFrame(dict(
Timestamp=pd.to_datetime(['2017-04-05 15:21:03', '2017-04-05 19:10:52'])
))
# Make sure your `Timestamp` column is datetime.
# Mine is because I constructed it that way.
# Use
# passed_df['Timestamp'] = pd.to_datetime(passed_df['Timestamp'])
passed_df['Timestamp'] -= pd.offsets.Day(7)
print(passed_df)
Timestamp
0 2017-03-29 15:21:03
1 2017-03-29 19:10:52
using strftime
Though this is not ideal, I wanted to make a point that you absolutely can use strftime. When your column is datetime, you can use strftime via the dt date accessor with dt.strftime. You can create a dynamic column where you specify the target date like this:
pd.to_datetime(passed_df.Timestamp.dt.strftime('{} %H:%M:%S'.format('2017-03-29')))
0 2017-03-29 15:21:03
1 2017-03-29 19:10:52
Name: Timestamp, dtype: datetime64[ns]
I think you need to convert df['Timestamp'].dt.date with to_datetime, because the output of dt.date is a Python date object, not a pandas datetime object:
df=pd.DataFrame({'Timestamp':pd.to_datetime(['2017-04-05 15:21:03','2017-04-05 19:10:52'])})
print (df)
Timestamp
0 2017-04-05 15:21:03
1 2017-04-05 19:10:52
current_date = pd.to_datetime('2017-04-05')
reduce_to_date = current_date - pd.DateOffset(days=7)
df['Timestamp'] = reduce_to_date + (df['Timestamp'] - pd.to_datetime(df['Timestamp'].dt.date))
print (df)
Timestamp
0 2017-03-29 15:21:03
1 2017-03-29 19:10:52
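When the rows come from different dates, subtracting a fixed offset no longer lands them all on the same day. A vectorized sketch for that case is to take each timestamp's time of day as a timedelta and add it to the target date. The sample rows below are hypothetical, and dt.normalize() is used here in place of the dt.date round trip.

```python
import pandas as pd

# Hypothetical rows collected on three different dates
passed_df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2017-04-03 12:00:00",
        "2017-04-04 12:10:00",
        "2017-04-05 12:20:00",
    ])
})

current_date = pd.to_datetime("2017-04-05")
reduce_to_date = current_date - pd.DateOffset(days=7)  # 2017-03-29

# Time of day as a timedelta: each timestamp minus its own midnight
time_of_day = passed_df["Timestamp"] - passed_df["Timestamp"].dt.normalize()

# Same target date for every row, original time of day preserved
passed_df["Timestamp"] = reduce_to_date.normalize() + time_of_day
print(passed_df)
```

Assigning the whole column at once also avoids the element-by-element writes that triggered the SettingWithCopyWarning in the loop version.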
