Pandas datetime inconsistent format - python

I hope someone can help me with the following:
I'm trying to convert my data to daily averages using:
df['timestamp'] = pd.to_datetime(df['Datum WSM-09'])
df_daily_avg = df.groupby(pd.Grouper(freq='D', key='timestamp')).mean()
df['Datum WSM-09'] looks like this:
0 6-3-2020 12:30
1 6-3-2020 12:40
2 6-3-2020 12:50
3 6-3-2020 13:00
4 6-3-2020 13:10
...
106785 18-3-2022 02:00
106786 18-3-2022 02:10
106787 18-3-2022 02:20
106788 18-3-2022 02:30
106789 18-3-2022 02:40
Name: Datum WSM-09, Length: 106790, dtype: object
However, after executing the first line, the data under "timestamp" is inconsistent. The last rows (18 March 2022) come out correct, but the first ones should be 2020-03-06 12:30 and so on: the month and the day are switched around.
Many thanks

Try using the "dayfirst" option:
df['timestamp'] = pd.to_datetime(df['Datum WSM-09'], dayfirst=True)

In https://xkcd.com/1179 Randall Munroe explains
that "you're doing it Wrong."
Your source column is apparently object / text.
The March 18th timestamps are unambiguous,
as there are fewer than 18 months in the year.
The ambiguous March 6th timestamps make the hair
on the back of the black cat stand on end.
You neglected to specify a timestamp format,
given that the source column is ambiguously formatted.
Please RTFM:
format : str, default None
The strftime to parse time, e.g. "%d/%m/%Y". Note that "%f" will parse all the way up to nanoseconds. See strftime documentation for more information on choices.
By not passing format at all, you accepted the default of None,
which is not a good match to your business needs.
I don't know what all of your input data looks like,
but perhaps %d-%m-%Y %H:%M would better
match your needs.
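A minimal sketch of the whole pipeline with an explicit format, using a few rows from the question plus a hypothetical numeric column `value` (the real measurement columns aren't shown). Selecting the numeric column before `.mean()` also avoids trouble with the text column, since recent pandas versions no longer silently drop non-numeric columns. Days without readings produce empty bins, which `dropna()` removes here.

```python
import pandas as pd

# A few rows in the question's day-month-year layout; "value" is a hypothetical measurement column
df = pd.DataFrame({
    "Datum WSM-09": ["6-3-2020 12:30", "6-3-2020 12:40", "18-3-2022 02:00", "18-3-2022 02:10"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Explicit format: day, then month, then 4-digit year, so nothing is guessed
df["timestamp"] = pd.to_datetime(df["Datum WSM-09"], format="%d-%m-%Y %H:%M")

# Daily averages of the numeric column; empty days in between come out as NaN and are dropped
daily = df.groupby(pd.Grouper(key="timestamp", freq="D"))["value"].mean().dropna()
print(daily)
```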

Related

How to formulate logic that will parse month-day date pair and append year based on previous value in Pandas row

Hello and thanks for taking a moment to read my issue. I currently have a column or series of data within a Pandas dataframe that I am attempting to parse into a proper YYYY-MM-DD (%Y-%m-%d %H:%M) type format. The problem is this data does not contain a year on its own.
cur_date is what I currently have to work with.
cur_date
Jan-20 14:05
Jan-4 05:07
Dec-31 12:07
Apr-12 20:54
Jan-21 06:12
Nov-3 04:10
Feb-5 11:45
Jan-7 07:09
Dec-3 12:11
req_date is what I am aiming to achieve.
req_date
2023-01-20 14:05
2023-01-04 05:07
2022-12-31 12:07
2022-04-12 20:54
2022-01-21 06:12
2021-11-03 04:10
2021-02-05 11:45
2021-01-07 07:09
2020-12-03 12:11
I am aware of writing something like the following df['cur_date'] = pd.to_datetime(df['cur_date'], format='%b-%d %H:%M') but this will not allow me to append a descending year to the individual row.
I tried various packages, one being dateparser which has some options to handle incomplete dates such as the settings={'PREFER_DATES_FROM': 'past'} setting but this does not have the capability to look back at a previous value and interpret the date as I am looking for.
I hope this code works for you :)
Note: when two consecutive epoch values are equal, it's up to you whether to change the year or not.
import time

current_year = 2023
last = {"ly": current_year, "epoch": 0}

def set_year(tt):
    epoch = time.mktime(tt)
    # The first row keeps current_year (alternatively, compare against the current time)
    if epoch > last["epoch"] and last["epoch"] != 0:
        last["ly"] -= 1
    last["epoch"] = epoch
    return str(last["ly"])

def transform_func(x):
    # Parse with a constant year so consecutive rows can be compared
    time_tup = time.strptime(f"{current_year}-" + x, "%Y-%b-%d %H:%M")
    time_format = time.strftime("%m-%d %H:%M", time_tup)
    ly = set_year(time_tup)
    return f"{ly}-{time_format}"

df["req_date"] = df["cur_date"].transform(transform_func)
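For comparison, a vectorized sketch of the same idea using only pandas, assuming (as in the question) that rows run from newest to oldest and that the newest row belongs to 2023. Every row is parsed into a placeholder leap year (2000, so a "Feb-29" entry can still be compared); whenever a row lands later in that placeholder year than the row above it, the data has wrapped back into the previous year, and a cumulative sum counts those wraps.

```python
import pandas as pd

df = pd.DataFrame({"cur_date": ["Jan-20 14:05", "Jan-4 05:07", "Dec-31 12:07", "Apr-12 20:54",
                                "Jan-21 06:12", "Nov-3 04:10", "Feb-5 11:45", "Jan-7 07:09",
                                "Dec-3 12:11"]})
CURRENT_YEAR = 2023  # assumed year of the first (newest) row

# Parse every row into the same placeholder year, only to compare positions within a year
parsed = pd.to_datetime("2000-" + df["cur_date"], format="%Y-%b-%d %H:%M")

# Rows are newest-first, so a jump forward within the year means we crossed into the previous year
rollover = parsed.diff() > pd.Timedelta(0)
year = CURRENT_YEAR - rollover.cumsum()

# Rebuild full timestamps from the computed year and the parsed month/day/time
df["req_date"] = pd.to_datetime(year.astype(str) + "-" + parsed.dt.strftime("%m-%d %H:%M"),
                                format="%Y-%m-%d %H:%M")
print(df)
```

As in the answer above, rows with identical timestamps keep the same year, because only a strictly positive difference counts as a rollover.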

Turning hours, minutes and seconds (HH:MM:SS AM/PM) into 24h time in pandas [duplicate]

I have a Time column consisting of data of type 'str' in the following format:
1 5:21:26 PM
2 5:21:58 PM
3 5:22:22 PM
4 5:22:36 PM
5 7:18:16 PM
I'm trying to convert it into a 24 Hour format that'll look like this:
1 17:21:26
2 17:21:58
3 17:22:22
4 17:22:36
5 19:18:16
I followed the solution presented in a similar question here using the code
df['Time'] = pd.to_datetime(df['Time']).strftime('%H:%M:%S')
Which throws up a frustrating error OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 19:30:37
I've also checked that no observation for 7:30:37 PM (19:30:37) exists in my data frame. The above method works for the rest of the data; only this one value fails.
Are there any ways to override this error or any alternatives to convert the data in the column?
Please Advise.
Use the format parameter with %I:%M:%S %p; here %I is the hour in 12-hour format and %p matches AM or PM:
df['Time'] = pd.to_datetime(df['Time'], format='%I:%M:%S %p').dt.strftime('%H:%M:%S')
print (df)
Time
1 17:21:26
2 17:21:58
3 17:22:22
4 17:22:36
5 19:18:16
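If one stray value is what triggers the error, `errors='coerce'` turns unparseable entries into NaT instead of raising, which makes them easy to locate. A small sketch; the "n/a" row is a hypothetical bad entry, not something from the question's data:

```python
import pandas as pd

# Sample times from the question plus one hypothetical malformed entry
df = pd.DataFrame({"Time": ["5:21:26 PM", "5:21:58 PM", "5:22:22 PM", "5:22:36 PM", "7:18:16 PM", "n/a"]},
                  index=range(1, 7))

# Unparseable strings become NaT instead of raising
parsed = pd.to_datetime(df["Time"], format="%I:%M:%S %p", errors="coerce")

# Show the offending rows so they can be inspected or fixed at the source
bad_rows = df.loc[parsed.isna(), "Time"]
print(bad_rows)

df["Time"] = parsed.dt.strftime("%H:%M:%S")  # NaT rows end up as missing values
print(df)
```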

dataframe datetimeindex changes

I have a dataframe with a date column. I want to turn this date column into my index. When I change the date column into pd.to_datetime(df['Date'], errors='raise', dayfirst=True) I get:
df1.head()
Out[60]:
Date Open High Low Close Volume Market Cap
0 2018-03-14 0.789569 0.799080 0.676010 0.701902 479149000 30865600000
1 2018-03-13 0.798451 0.805729 0.778471 0.789711 279679000 31213000000
2 2018-12-03 0.832127 0.838328 0.787882 0.801048 355031000 32529500000
3 2018-11-03 0.795765 0.840407 0.775737 0.831122 472972000 31108000000
4 2018-10-03 0.854872 0.860443 0.793736 0.796627 402670000 33418600000
The format of Date is originally the string dd-mm-yyyy, but as you can see, the transformation to datetime messes things up from the 2nd row on. How can I get consistent datetimes?
Edit: I think I solved it. Using the answers below about format, I found out the error was in a package that I used to generate the data (cryptocmd). I changed the format to %Y-%m-%d in the utils script of the package and now it seems to work fine.
According to the docs:
dayfirst : boolean, default False
Specify a date parse order if arg is str or its list-likes. If True,
parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10.
Warning: dayfirst=True is not strict, but will prefer to parse with
day first (this is a known bug, based on dateutil behavior).
The important part is the warning that dayfirst=True is not strict. Since you apparently know that your format is "dd-mm-yyyy", you should specify it explicitly:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y', errors='raise')
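A short sketch with a few date strings in the question's dd-mm-yyyy layout (the price columns are left out): with an explicit format, 12-03-2018 can only be read as 12 March, and the parsed column can then become the index.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["14-03-2018", "13-03-2018", "12-03-2018", "11-03-2018", "10-03-2018"]})

# The explicit format leaves no room for month/day guessing
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y", errors="raise")

# Use the parsed dates as a DatetimeIndex
df = df.set_index("Date")
print(df.index)
```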

Adjust datetime in Pandas to get CustomBusinessWeek

I have a long series of daily stock prices and I am trying to get weekly prices to do some calculations. I have been reading the documentation and I see you can set offsets to get a specific day of the week, which is what I want. This is the code (assume stock is part of a loop I am running):
df_clean_BW['WEEKLY_PricesFriday'] = stock.resample('W-FRI').last()
But for the US stock market, many Fridays are holidays, so I looked into adjusting for US calendar holidays. This is the code I was using:
from pandas.tseries.offsets import CustomBusinessDay
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
But I don't know how to combine the two so that if a Friday is a holiday, the day prior (the Thursday) is taken instead. Something like this, but it throws an error:
df_clean_BW['WEEKLY_PricesFriday'] = stock.resample('W-FRI' & bday_us).last()
I have a long list of dates, so I don't want to create a list of exception days because that would be too long. Here is an example of the output I want. In this case Jan 1, 2016 was a Friday (and a holiday), so I want to take December 31, 2015 instead. This must be a common request for anyone who looks at stock data, but I can't figure out a way to do it.
Date Price Week Price
12/30/2015 103.3227
12/31/2015 101.3394
1/4/2016 101.426 101.3394 << Take 12/31 as 1.1 is holiday
1/5/2016 98.8844
1/6/2016 96.9492
1/7/2016 92.8575
1/8/2016 93.3485 93.3485
First generate your array of Fridays including holidays. Then use np.busday_offset() to offset them like this:
np.busday_offset(fridays, 0, roll='backward', busdaycal=bday_us.calendar)
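A sketch of that approach using the prices from the question's example table. `pd.date_range(..., freq="W-FRI")` produces the Fridays, `np.busday_offset` with `roll="backward"` moves any holiday Friday to the previous business day, and `reindex` then picks the price on that day (a day missing from the data would give NaN). Note that `USFederalHolidayCalendar` only approximates exchange holidays: Good Friday, for example, is an NYSE holiday but not a federal one.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Daily closes from the question's example table
stock = pd.Series(
    [103.3227, 101.3394, 101.426, 98.8844, 96.9492, 92.8575, 93.3485],
    index=pd.to_datetime(["2015-12-30", "2015-12-31", "2016-01-04", "2016-01-05",
                          "2016-01-06", "2016-01-07", "2016-01-08"]),
)

# Every Friday in the range, holidays included
fridays = pd.date_range(stock.index.min(), stock.index.max(), freq="W-FRI")

# Roll holiday Fridays back to the previous business day (busday_offset works at day resolution)
trading_days = np.busday_offset(fridays.values.astype("datetime64[D]"), 0,
                                roll="backward", busdaycal=bday_us.calendar)

# Look up the price on each adjusted day
weekly = stock.reindex(pd.DatetimeIndex(trading_days.astype("datetime64[ns]")))
print(weekly)
```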

Trouble with converting date format in python using datetime.strptime

Hi everyone, I have code like this to calculate the exact day six months back, but unfortunately it prints in yyyy-mm-dd format and I want dd/mm/yy. How do I do it? I tried to convert it, but it doesn't work. What's wrong with my code?
import datetime
six_months = str(datetime.date.today() - datetime.timedelta(6*365/12-1))
datetime.datetime.strptime(six_months, '%Y-%m-%d').strftime('%d/%m/%y')
expected output=04/02/2017
current output=2017-02-04
The code is fine; you just forgot to save the result of
datetime.datetime.strptime(six_months, '%Y-%m-%d').strftime('%d/%m/%y')
to any variable (or print it). Neither call changes anything in place: strptime returns a new datetime and strftime returns a new string, so the formatted result is simply discarded.
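A quick way to see the fix, with today's date pinned to 4 August 2017 (an assumed date matching the question's output) so the result is reproducible: keep the formatted string in a variable and use that.

```python
import datetime

today = datetime.date(2017, 8, 4)  # pinned instead of date.today() for a reproducible result

# Same computation as the question; subtracting a timedelta from a date ignores the fractional day
six_months = str(today - datetime.timedelta(6 * 365 / 12 - 1))

# Keep the returned string; strftime does not modify six_months
formatted = datetime.datetime.strptime(six_months, "%Y-%m-%d").strftime("%d/%m/%y")
print(formatted)  # 04/02/17
```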
I reckon your algorithm for computing 6 months back doesn't correspond to the real-world understanding of that phrase. Six months back from 4 August is 4 February and your computation gives the right answer for that. But six months back from 4 September is 4 March, and your computation gives the answer 7 March.
Your code also unnecessarily formats the computed date to a string, and then has to parse the string back into a date to get the dd/mm/yy format you want.
import datetime
from dateutil.relativedelta import relativedelta
six_months = datetime.date.today() + relativedelta(months=-6)
print (f"{six_months:%d/%m/%y}")
Output (until tomorrow) is
04/02/17
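If you'd rather not depend on dateutil, a stdlib-only sketch of calendar-month subtraction could look like this; `months_back` is a hypothetical helper, not part of any library. Like `relativedelta`, it clamps the day to the length of the target month, so 31 August minus six months gives 28 February.

```python
import calendar
import datetime


def months_back(d: datetime.date, months: int) -> datetime.date:
    """Return the date `months` calendar months before d (hypothetical helper)."""
    # Count months from year 0 so that year boundaries are handled by divmod
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day to the last day of the target month (e.g. 31 -> 28/29 in February)
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(d.day, last_day))


print(f"{months_back(datetime.date(2017, 9, 4), 6):%d/%m/%y}")  # 04/03/17
```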
