Aligning datetime formats for comparison - python

I'm having trouble aligning two different dates. I have an Excel import which I turn into a DateTime in pandas, and I would like to compare this DateTime with the current DateTime. The trouble is in the formatting of the imported DateTime.
Excel format of the date:
2020-07-06 16:06:00 (which is yyyy-dd-mm hh:mm:ss)
When I add the DateTime to my DataFrame it creates the datatype Object. After I convert it with pd.to_datetime it creates the format yyyy-mm-dd hh:mm:ss. It seems that the month and the day are getting mixed up.
Example code:
import pandas as pd
df = pd.read_excel('my path')
df['Arrival'] = pd.to_datetime(df['Arrival'], format='%Y-%d-%m %H:%M:%S')
print(df.dtypes)
Expected result:
2020-06-07 16:06:00
Actual result:
2020-07-06 16:06:00
How do I resolve this?
Gr,
Sempah

An ISO-8601 date/time is always yyyy-MM-dd, not yyyy-dd-MM. You've got the month and date positions switched around.
While localized date/time strings are inconsistent about the order of month and date, this particular format where the year comes first always starts with the biggest units (years) and decreases in unit size going right (month, date, hour, etc.)
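To make the point above concrete, here is a minimal sketch using the sample value from the question. The variable names are made up for illustration. It shows that the format string only controls how the text is read; a parsed datetime is always displayed as yyyy-mm-dd, and once parsed it can be compared directly with the current time.

```python
import pandas as pd

# Sample value from the question; the Series itself is only an illustration.
raw = pd.Series(["2020-07-06 16:06:00"])

# Read as year-month-day (what pandas displays by default).
month_first = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

# Read as year-day-month, as in the question's format string.
day_first = pd.to_datetime(raw, format="%Y-%d-%m %H:%M:%S")

print(month_first.iloc[0])  # displayed as 2020-07-06 16:06:00 (6 July)
print(day_first.iloc[0])    # displayed as 2020-06-07 16:06:00 (7 June)

# Comparing with the current time works on the parsed values,
# regardless of how they are displayed.
is_past = month_first < pd.Timestamp.now()
print(is_past.iloc[0])
```

If the Excel cell is a real date cell, read_excel will usually hand back datetimes already, in which case no format string is needed at all.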

It's solved. I think that I misunderstood the results. It was already working without my knowledge. Thanks for the help anyway.

Related

Issue while converting pandas to datetime

I'm converting string to datetime datatype using pandas,
here is my snippet,
df[col] = pd.to_datetime(df[col], format='%H%M%S%d%m%Y', errors='coerce')
input :
col
00000001011970
00000001011970
...
00000001011970
output:
col
1970-01-01
1970-01-01
...
1970-01-01 00:00:00
The output consists of dates and dates with time.
I need the output as date with time.
Please help me find out where I am going wrong.
The time is there. It just so happens, because it's midnight, 00:00:00, it is not showing explicitly.
You can see it's there with, e.g.,
df[col].dt.minute
which will give a Series of 0's.
To print out the time explicitly, you could use
df[col].dt.strftime('%H:%M:%S')
Alter the format as you see fit.
Keep in mind that the visual output of anything in Pandas (or computers in general) does not have to be exactly what is stored. It is up to the programmer to format the output into what they want. Calculations on the variables still use all the (invisible) information.
Just like the other answer suggested time is there, but since it's midnight 00:00:00, it's not showing explicitly. To print out the date with time you can try this :
df[col] = pd.to_datetime(df[col], format='%H%M%S%d%m%Y', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S')
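A small sketch of both answers, using the input values from the question (the variable names are only for illustration). One thing worth noting: .dt.strftime returns strings, so it is probably best used for display only, while the datetime column is kept for calculations.

```python
import pandas as pd

# Input values as shown in the question.
col = pd.Series(["00000001011970", "00000001011970"])

# Parse: HHMMSS followed by DDMMYYYY.
parsed = pd.to_datetime(col, format="%H%M%S%d%m%Y", errors="coerce")

# The time component is stored even when the display omits it.
print(parsed.dtype)    # a datetime64 dtype
print(parsed.dt.hour)  # all zeros, i.e. midnight

# Formatting for display produces strings, not datetimes.
formatted = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted.iloc[0])
```

Keeping `parsed` as the stored column and using `formatted` only when printing or exporting avoids losing the ability to do date arithmetic.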

convert time to UTC in pandas

I have multiple csv files, I've set DateTime as the index.
df6.set_index("gmtime", inplace=True)
#correct the underscores in old datetime format
df6.index = [" ".join( str(val).split("_")) for val in df6.index]
df6.index = pd.to_datetime(df6.index)
The time was put in GMT, but I think it's been saved as BST (British summertime) when I set the clock for raspberry pi.
I want to shift the time one hour backwards. When I use
df6.tz_convert(pytz.timezone('utc'))
it gives me the error below, as it assumes that the time is correct.
Cannot convert tz-naive timestamps, use tz_localize to localize
How can I shift the time by one hour?
Given a column that contains date/time info as string, you would convert to datetime, localize to a time zone (here: Europe/London), then convert to UTC. You can do that before you set as index.
Ex:
import pandas as pd
dti = pd.to_datetime(["2021-09-01"]).tz_localize("Europe/London").tz_convert("UTC")
print(dti) # notice 1 hour shift:
# DatetimeIndex(['2021-08-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
Note: setting a time zone means that DST is accounted for, i.e. here, during winter you'd have UTC+0 and during summer UTC+1.
To add to FObersteiner's response (sorry,new user, can't comment on answers yet):
I've noticed that in all the real world situations I've run across it (with full dataframes or pandas series instead of just a single date), .tz_localize() and .tz_convert() need to be called slightly differently.
What's worked for me is
df['column'] = pd.to_datetime(df['column']).dt.tz_localize('Europe/London').dt.tz_convert('UTC')
Without the .dt, I get "index is not a valid DatetimeIndex or PeriodIndex."
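Putting the two answers together, a sketch of the whole flow might look like the following. The sample timestamps and the "value" column are invented for illustration; only the "gmtime" column name and the underscore separator come from the question. On a column (Series) the time zone methods go through .dt; on a DatetimeIndex they are called directly.

```python
import pandas as pd

# Hypothetical sample data with the underscore-separated timestamps
# described in the question.
df = pd.DataFrame(
    {
        "gmtime": ["2021-09-01_12:00:00", "2021-12-01_12:00:00"],
        "value": [1, 2],
    }
)

# Replace the underscore, then parse to (tz-naive) datetimes.
df["gmtime"] = pd.to_datetime(df["gmtime"].str.replace("_", " "))

# On a column, use the .dt accessor: declare the local zone, then convert.
df["gmtime"] = df["gmtime"].dt.tz_localize("Europe/London").dt.tz_convert("UTC")

# Set the index afterwards; it is now a tz-aware DatetimeIndex.
df = df.set_index("gmtime")
print(df.index)
```

With these sample values, the September timestamp moves back one hour (summer time) while the December one is unchanged, which is the DST behaviour mentioned in the note above.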

Pandas reads date from CSV incorrectly

I am very new to Python, and finding it very frustrating.
I have a CSV that I am importing, but it's reading the date column incorrectly.
In the Month column, I have the 1st of each month - so it should read (yyyy-mm-dd):
2020-01-01
2020-02-01
2020-03-01
etc
However, it's reading it as (yyyy-dd-mm)
2020-01-01
2020-01-02
2020-01-03
etc
I've tried several conversion functions from stackoverflow as well as other websites, but they either just don't work, or do nothing.
My import is as follows:
try:
    collections_data = pd.read_csv('./monthly_collections.csv')
    print("Collections Data imported successfully.")
except Exception as e:
    print("Error importing Collections Data!")
I have tried the parse_dates parameter on the import, but it doesn't help.
If I then try this:
temp = pd.to_datetime(collections_data['Collections Month'], format='%m/%d/%Y')
temp
then I get
which you can see, it is reading the months as the days - in other words, it is showing individual days of the month, instead of the 1st day of each month.
I'd greatly appreciate some help to get these dates corrected, as I need to do some date calculations on them, and also join two tables based on this date - which is going to be my next problem.
Kind Regards
Inferring date format
Some dates are ambiguous, while others aren't. Consider these dates:
2020-27-01
2020-12-14
2020-01-02
11-10-12
In examples #1 & #2 we can easily infer the date format. In example #1, the first four digits have to be the year (there's no 2020th month or 2020th day of a month), the following two digits have to be the day of the month (there's no 27th month and we already have year information) and the last two digits are the month (we already have year and day of month information). We can use a similar approach for example #2.
In example #3, is that the first day of the second month, or the second day of the first month? It's impossible to tell without more information. If, for instance, we had the following sequence of dates: '2020-22-01', '2020-25-01', '2020-01-02', it would be reasonable to infer that '2020-01-02' refers to the first day of the second month, otherwise we would not be able to parse the previous two dates.
In example #4, it's impossible to infer the date format. Any pair of digits would make sense as a year, month or day. (With pandas.read_csv() you can make use of the dayfirst kwarg; pandas.to_datetime() accepts dayfirst and yearfirst, or you can explicitly declare your date format with pandas.to_datetime(some_column, format=...).)
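The reasoning above can be sketched with the standard library alone. The helper name fits and the list of candidate formats are assumptions made for this illustration; the four sample dates are the ones listed above. The idea is simply to try each candidate format and see which ones accept the string.

```python
from datetime import datetime


def fits(text, fmt):
    """Return True if `text` can be parsed with the strptime format `fmt`."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# Candidate formats to test; chosen for illustration only.
four_digit_year = ["%Y-%m-%d", "%Y-%d-%m"]
two_digit_year = ["%y-%m-%d", "%d-%m-%y", "%m-%d-%y"]

for text in ["2020-27-01", "2020-12-14", "2020-01-02"]:
    matches = [fmt for fmt in four_digit_year if fits(text, fmt)]
    print(text, "->", matches)

# Every two-digit-year candidate accepts this one, so nothing can be inferred.
print("11-10-12", "->", [fmt for fmt in two_digit_year if fits("11-10-12", fmt)])
```

A date is unambiguous only when exactly one candidate format accepts it; when several do, the format has to come from outside knowledge about the data.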
Your problem
Your dates are ambiguous: from what you've included in your question, it is not possible to infer whether they are in a day-first format (dd-mm) or a month-first format (mm-dd). pandas defaults to dayfirst=False, so an ambiguous date such as 01/02/2020 is read month first, as the second day of the first month, unless you specify otherwise. See pandas.read_csv().
dayfirst : bool, default False
DD/MM format dates, international and European format.
Above means that in order to parse 01/02 (DD/MM), 2020/02/01 (iso/international format) or 01/02/2020 (European format) as the first day of the second month you will need to specify pandas.read_csv(somefile.csv, ... dayfirst=True).
I've tried several conversion functions from stackoverflow as well as other websites, but they either just don't work, or do nothing.
You haven't provided the code that you've used that didn't work, nor the code which you used which parsed your dates as month first. If you include an example of what you actually tried I can make a specific comment.
In your question you say that your date format is in (yyyy-mm-dd) but you passed format='%m/%d/%Y' and in your screenshots you have '/' and '-' as your separator in different places. So I'm not sure what your original dates look like.
What you passed to the format kwarg means the first two digits are zero-padded months (e.g. 04) followed by a '/', then zero-padded days, followed by '/' and then the year as yyyy. If what you wrote at the beginning of your question is correct, you should have passed format='%Y-%m-%d' (see the strftime format codes).
Try https://towardsdatascience.com/4-tricks-you-should-know-to-parse-date-columns-with-pandas-read-csv-27355bb2ad0e
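Essentially, try the dayfirst optional input for the read_csv function. It only affects columns that are actually parsed as dates, so combine it with parse_dates.
You would set it to True and have
collections_data = pd.read_csv('./monthly_collections.csv', parse_dates=['Collections Month'], dayfirst=True)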
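Since the original file isn't shown, here is a sketch with made-up CSV content (the "Amount" column and the values are assumptions; only the "Collections Month" column name comes from the question). It reads the column as plain text first and then parses it with an explicit format, which removes any guessing about day and month order.

```python
import io

import pandas as pd

# Hypothetical file content: first of February and first of March 2020,
# written day first.
csv_text = "Collections Month,Amount\n01/02/2020,100\n01/03/2020,200\n"

raw = pd.read_csv(io.StringIO(csv_text))

# Month-first reading: gives 2 and 3 January.
month_first = pd.to_datetime(raw["Collections Month"], format="%m/%d/%Y")

# Day-first reading: gives 1 February and 1 March.
day_first = pd.to_datetime(raw["Collections Month"], format="%d/%m/%Y")

print(month_first.dt.strftime("%d %B %Y").tolist())
print(day_first.dt.strftime("%d %B %Y").tolist())
```

An explicit format also has the advantage that a value which does not match raises an error instead of being silently misread.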

Convert column to datetime format, in a leap year

I'm new to Python and programming in general, so I wasn't able to figure out the following: I have a dataframe named ozon, for which column 1 is the time stamp in mm-dd format. Now I want to change that column to a datetime format using the following code:
ozon[1] = pd.to_datetime(ozon[1], format='%m-%d')
Now this is giving me the following error: ValueError: day is out of range for month.
I think it has to do with the fact that it's a leap year, so it doesn't recognize February 29 as a valid date. How can I overcome this error? And could I also add a year to the timestamp (2020)?
Thanks so much in advance!
Add year to column and also to format:
ozon[1] = pd.to_datetime(ozon[1] + '-2000', format='%m-%d-%Y')
If still not working because some values are not valid add errors='coerce' parameter:
ozon[1] = pd.to_datetime(ozon[1] + '-2000', format='%m-%d-%Y', errors='coerce')
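Some background on why this works: when the format contains no year, the parser fills in a default year of 1900, which is not a leap year, so 02-29 is rejected. Appending any leap year fixes that. The sketch below uses 2020, as asked in the question, instead of the 2000 used above; the sample values are made up, and the column is assumed to hold strings.

```python
import pandas as pd

# Hypothetical sample data: column label 1 holds mm-dd strings.
ozon = pd.DataFrame({1: ["02-28", "02-29", "03-01"]})

# Append the year to each string, then parse with a matching format.
ozon[1] = pd.to_datetime(ozon[1] + "-2020", format="%m-%d-%Y")

print(ozon[1])
```

If the column was read as something other than strings, converting it first with .astype(str) may be needed before the concatenation.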

How to convert date to datetime?

I have this type of date '20181115 0756' and also, in a different dataframe, this format '2018-11-15'. I would like to know if there is any way to convert it to datetime without the hours and minutes.
date['DATE']= pd.to_datetime(date.DATE)
This converts it to 2018-11-15 00:00:00 and I'd like to avoid that.
What I'm trying to do is to calculate the time difference between the dates in the two dataframes that I have.
Thank you in advance
You can use the following code
date['DATE'] = pd.to_datetime(date['DATE'], errors='coerce').dt.date
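For the stated goal of computing differences between the two dataframes, an alternative sketch is to keep both columns as datetimes and drop the time of day with .dt.normalize(), rather than converting to date objects. The frame names and the sample rows here are assumptions for illustration; only the two date layouts come from the question.

```python
import pandas as pd

# Hypothetical frames using the two layouts from the question.
with_time = pd.DataFrame({"DATE": ["20181115 0756", "20181120 1830"]})
date_only = pd.DataFrame({"DATE": ["2018-11-15", "2018-11-18"]})

# Parse with explicit formats; normalize() resets the time to midnight
# while keeping the datetime64 dtype.
with_time["DATE"] = pd.to_datetime(
    with_time["DATE"], format="%Y%m%d %H%M"
).dt.normalize()
date_only["DATE"] = pd.to_datetime(date_only["DATE"], format="%Y-%m-%d")

# Row-by-row difference, as whole days.
diff = with_time["DATE"] - date_only["DATE"]
print(diff.dt.days.tolist())
```

The midnight time may still be hidden or shown depending on how the column is printed, but the subtraction gives whole-day differences either way.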
