I'm new to Python and programming in general, so I wasn't able to figure out the following: I have a dataframe named ozon, for which column 1 is the time stamp in mm-dd format. Now I want to change that column to a datetime format using the following code:
ozon[1] = pd.to_datetime(ozon[1], format='%m-%d')
Now this is giving me the following error: ValueError: day is out of range for month.
I think it has to do with the fact that it's a leap year, so it doesn't recognize February 29 as a valid date. How can I overcome this error? And could I also add a year to the timestamp (2020)?
Thanks so much in advance!
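Add a year to the column values and to the format. Without a year, parsing defaults to 1900, which is not a leap year, so 02-29 is rejected. Any leap year works; use '-2020' instead of '-2000' if you want 2020: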
ozon[1] = pd.to_datetime(ozon[1] + '-2000', format='%m-%d-%Y')
If it still fails because some values are invalid, add the errors='coerce' parameter so those values become NaT:
ozon[1] = pd.to_datetime(ozon[1] + '-2000', format='%m-%d-%Y', errors='coerce')
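A minimal sketch of the fix on a toy frame. It assumes column 1 holds 'mm-dd' strings; the sample values are made up for illustration. It first shows the failure without a year, then applies the fix with '-2020' as asked:

```python
import pandas as pd

# Hypothetical data: column 1 holds 'mm-dd' strings, including a leap day
ozon = pd.DataFrame({0: [10.5, 12.1, 9.8], 1: ['02-28', '02-29', '03-01']})

# Without a year, parsing defaults to 1900 (not a leap year), so 02-29 is rejected
try:
    pd.to_datetime(ozon[1], format='%m-%d')
    failed_without_year = False
except ValueError:
    failed_without_year = True

# Append a leap year to the values and add %Y to the format
ozon[1] = pd.to_datetime(ozon[1] + '-2020', format='%m-%d-%Y')

print(failed_without_year)                       # True
print(ozon[1].dt.strftime('%Y-%m-%d').tolist())  # ['2020-02-28', '2020-02-29', '2020-03-01']
```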
I'm having trouble aligning two different dates. I have an Excel import, which I turn into a datetime in pandas, and I would like to compare it with the current datetime. The trouble is in the formatting of the imported datetime.
Excel format of the date:
2020-07-06 16:06:00 (which is yyyy-dd-mm hh:mm:ss)
When I add the datetime to my DataFrame, its dtype is object. After I convert it with pd.to_datetime, it is shown as yyyy-mm-dd hh:mm:ss. It seems that the month and the day are getting mixed up.
Example code:
df = pd.read_excel('my path')
df['Arrival'] = pd.to_datetime(df['Arrival'], format='%Y-%d-%m %H:%M:%S')
print(df.dtypes)
Expected result:
2020-06-07 16:06:00
Actual result:
2020-07-06 16:06:00
How do I resolve this?
Regards,
Sempah
An ISO-8601 date/time is always yyyy-MM-dd, not yyyy-dd-MM. You've got the month and day positions switched around.
While localized date/time strings are inconsistent about the order of month and day, this particular format, where the year comes first, always starts with the biggest unit (year) and decreases in unit size going right (month, day, hour, etc.). pandas also displays datetimes in this order, regardless of the format the input was parsed with.
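A small sketch of what the format string does, assuming the Excel value reaches pandas as a plain string (the one-row frame is invented from the value in the question). The format reads 07 as the day and 06 as the month, and the parsed Timestamp is then printed in ISO order:

```python
import pandas as pd

# Assumed input: the value arrives as a string in yyyy-dd-mm order
df = pd.DataFrame({'Arrival': ['2020-07-06 16:06:00']})
df['Arrival'] = pd.to_datetime(df['Arrival'], format='%Y-%d-%m %H:%M:%S')

ts = df['Arrival'].iloc[0]
print(ts.month, ts.day)  # 6 7, i.e. June 7th
print(ts)                # 2020-06-07 16:06:00, displayed as yyyy-mm-dd
```

So a displayed 2020-06-07 after parsing means the day/month swap was applied correctly; the display order itself is always year-month-day.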
It's solved. I think I misunderstood the results; it was already working without my knowing. Thanks for the help anyway.
I have created a DataFrame from an API extract; one of its fields is DATETIME, in the format YYYY-MM-DD HH:MM:SS. I am trying to convert it to a datetime type using the following command:
df_weather['DATETIME'] = pd.to_datetime(df_weather.DATETIME)
But I am getting errors, the last line of which is:
ValueError: ('Unknown string format:', '2018-01-01 0.00.00')
Is the problem that the Hours field is showing only 1 digit instead of a zero-padded value for values less than 10? If yes, how to correct that?
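If not, what could the problem be and how do I resolve it?
This error is not caused by the single-digit hour. It is caused by the time format: your values use dots as separators (0.00.00) instead of colons, so the parser does not recognize them and you need to pass the format explicitly.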
You can use this:
df_weather['DATETIME'] = pd.to_datetime(df_weather.DATETIME,format="%Y-%m-%d %H.%M.%S")
For the format codes, see https://strftime.org/
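A quick check of that format on two sample values in the dotted style from the error message (the second value is invented). %H accepts both the single-digit and the two-digit hour:

```python
import pandas as pd

# Sample values in the dotted time format from the error message
df_weather = pd.DataFrame({'DATETIME': ['2018-01-01 0.00.00', '2018-01-01 13.30.00']})

# The dots must appear in the format as well; %H also accepts a single-digit hour
df_weather['DATETIME'] = pd.to_datetime(df_weather.DATETIME, format="%Y-%m-%d %H.%M.%S")

print(df_weather['DATETIME'].tolist())
```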
My goal is to convert a Period column to datetime.
If Life Was Easy:
master_df = master_df['Month'].to_datetime()
Back Story:
I built a new DataFrame that originally summed the monthly totals and made a 'Month' column by converting a timestamp to a period. Now I want to convert that period back to a timestamp so that I can create plots with matplotlib.
I have tried the following:
Reading the docs for Period.to_timestamp.
Converting to a string and then back to datetime. It still keeps the period issue and won't convert.
Following a couple of similar questions on Stack Overflow, but I could not get it to work.
A simple goal would be to plot the following:
plot.bar(m_totals['Month'], m_totals['Showroom Visits']);
This is the error I get if I try to use a period dtype in my charts:
ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value.
This often happens if you pass a non-datetime value to an axis that has datetime units.
Additional Material:
Code I used to create the Month column (where period issue was created):
master_df['Month'] = master_df['Entry Date'].dt.to_period('M')
Code I used to group into monthly totals:
m_sums = master_df.groupby(['DealerName','Month']).sum().drop(columns={'Avg. Response Time','Closing Percent'})
m_means = master_df.groupby(['DealerName','Month']).mean()
m_means = m_means[['Avg. Response Time','Closing Percent']]
m_totals = m_sums.join(m_means)
m_totals.reset_index(inplace=True)
m_totals
Resulting DataFrame:
I was able to cast the Period column to string and then to datetime; I just could not go straight from Period to datetime.
m_totals['Month'] = m_totals['Month'].astype(str)
m_totals['Month'] = pd.to_datetime(m_totals['Month'])
m_totals.dtypes
I wish I had not been downvoted for not providing the entire DataFrame.
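First convert it to str, then to datetime:
import pandas as pd

index = pd.period_range(start='1949-01', periods=144, freq='M')
type(index)  # PeriodIndex

# changing period to date
index = index.astype(str)
index = pd.to_datetime(index)

# df is an existing DataFrame with 144 rows (one per month)
df.set_index(index, inplace=True)
type(df.index)  # DatetimeIndex
df.info()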
Another potential solution is to use to_timestamp. For example: m_totals['Month'] = m_totals['Month'].dt.to_timestamp()
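A minimal sketch comparing the string route and to_timestamp on an invented frame (the column names mirror the question; the data is made up). Both give the first day of each month at midnight as a datetime64 column, which can then go on a date axis:

```python
import pandas as pd

# Invented data shaped like the question's master_df
master_df = pd.DataFrame({
    'Entry Date': pd.to_datetime(['2020-01-05', '2020-01-20', '2020-02-11']),
    'Showroom Visits': [3, 5, 2],
})
master_df['Month'] = master_df['Entry Date'].dt.to_period('M')
m_totals = master_df.groupby('Month', as_index=False)['Showroom Visits'].sum()

# Route 1: Period -> str -> datetime
via_str = pd.to_datetime(m_totals['Month'].astype(str))

# Route 2: Period -> Timestamp directly (start of each period by default)
via_ts = m_totals['Month'].dt.to_timestamp()

print(via_ts.tolist() == via_str.tolist())  # True
```

The to_timestamp route avoids the round trip through strings, so no date format has to be re-inferred.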
I have an Excel file with a date column. Is there a way to change the date format to MM/DD/YYYY and create one more column with the quarter and year? I am very new to Python and I would really appreciate it if you could help me with this one. Thanks!
Current format
Date format: Jan 1, 2016
Desired outcome
Date format: 01/01/2016
One additional column with something like "Q1-2016"
Python's datetime module's got you covered. For input:
from datetime import datetime
myDate = datetime.strptime("Jan 1, 2016", "%b %d, %Y")
And for output:
print(myDate.strftime("%m/%d/%Y"))
Getting the quarter is a little harder, but you can derive it from myDate.month. See also the Python datetime reference.
For example, using integer division on the zero-based month so that January to March is Q1, April to June is Q2, etc.:
print("Q%d-%d" % ((myDate.month - 1) // 3 + 1, myDate.year))
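If the dates are already in a pandas DataFrame (for example after read_excel), the same idea can be applied to the whole column. This sketch swaps in pandas' .dt.quarter for the manual month arithmetic and assumes the column holds strings like 'Jan 1, 2016'; the column names 'Date', 'Date_fmt' and 'Quarter' are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical column of dates as they appear in the Excel file
df = pd.DataFrame({'Date': ['Jan 1, 2016', 'Apr 15, 2016', 'Dec 31, 2017']})

dates = pd.to_datetime(df['Date'], format='%b %d, %Y')

# MM/DD/YYYY strings (strftime returns text, not datetimes)
df['Date_fmt'] = dates.dt.strftime('%m/%d/%Y')

# "Q1-2016" style labels built from the quarter and the year
df['Quarter'] = 'Q' + dates.dt.quarter.astype(str) + '-' + dates.dt.year.astype(str)

print(df)
```

Keeping the parsed dates in a separate variable means the datetime values stay available for sorting or filtering, while the formatted column is only for display.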
I am trying to convert a DataFrame column with a date and timestamp to a year-weeknumber format, e.g. 01-05-2017 03:44 = 2017-1. This is pretty easy; however, I am stuck on dates that fall in a new year but whose week number is still the last week of the previous year.
I did the following:
df['WEEK_NUMBER'] = df.date.dt.year.astype(str).str.cat(df.date.dt.week.astype(str), sep='-')
where df['date'] is a very large column of dates and times spanning multiple years.
A date which gives a problem is for example:
Timestamp('2017-01-01 02:11:27')
The output for my code will be 2017-52, while it should be 2016-52. Since the data covers multiple years, and weeknumbers and their corresponding dates change every year, I cannot simply subtract a few days.
Does anybody have an idea of how to fix this? Thanks!
Replace df.date.dt.year with this:
(df.date.dt.year- ((df.date.dt.week>50) & (df.date.dt.month==1)))
Basically, this subtracts 1 from the year if the week number is greater than 50 and the month is January.
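The adjustment above covers early-January dates that belong to the previous year's last week, but the reverse case also occurs: 2018-12-31 falls in ISO week 1 of 2019. A sketch using Series.dt.isocalendar() (available since pandas 1.1, which also deprecated .dt.week), which returns the ISO year and week together and so handles both edges. It assumes df['date'] is already datetime64; the three sample timestamps are chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '2017-01-01 02:11:27',  # Sunday: ISO week 52 of 2016
    '2017-01-05 03:44:00',  # Thursday: ISO week 1 of 2017
    '2018-12-31 10:00:00',  # Monday: ISO week 1 of 2019
])})

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns
iso = df['date'].dt.isocalendar()
df['WEEK_NUMBER'] = iso['year'].astype(str) + '-' + iso['week'].astype(str)

print(df['WEEK_NUMBER'].tolist())  # ['2016-52', '2017-1', '2019-1']
```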