Formatting datetime in dataframe - python

I have a dataframe with a number of different dates as its index:
2005-01-02
2005-01-03
2005-01-04
2005-01-04
...
2014-12-30
2014-12-31
I want to format them as MM-DD without changing the type to string. Can someone help me with that? A second question: if I do that, can I still use dt.dayofyear?

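Simple way (works only if the index already holds strings, since a DatetimeIndex has no .str accessor)
df.index.str[5:]
More common way (works on a DatetimeIndex)
df.index.strftime('%m-%d')
Note that both return strings. A datetime value always carries a year, so an MM-DD display needs a string; keep the original DatetimeIndex if you still need df.index.dayofyear.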
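A minimal sketch of how the two questions fit together, assuming the index is a DatetimeIndex. The frame and the `value` column are made up for illustration. The MM-DD labels are kept as a separate string index while the day of year is read from the untouched datetime index (on an index the attribute is used directly, without `.dt`).

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex, mirroring the dates in the question
idx = pd.to_datetime(["2005-01-02", "2005-01-03", "2014-12-31"])
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# strftime returns an Index of strings, not datetimes
labels = df.index.strftime("%m-%d")

# dayofyear is available on the original DatetimeIndex
day_of_year = df.index.dayofyear

print(list(labels))
print(list(day_of_year))
```

Once the index itself is replaced by the strings, `dayofyear` is no longer available, so it is safer to store the labels in a separate column or variable.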

Related

pandas dataframe datetime - convert string to datetime offset

I have a dataframe like:
This time
Time difference
2000-01-01 00:00:00
-3:00
2000-03-01 05:00:00
-5:00
...
...
2000-01-24 16:10:00
-7:00
I'd like to convert the 2nd column (-3:00 means minus 3 hours) from string into something like a time offset that I can use directly in operations with the 1st column (which is already datetime64[ns]).
I thought there was supposed to be something in pd that does it but couldn't find anything straightforward. Does anyone have any clue?
You can use pd.to_timedelta:
df['Time difference'] = pd.to_timedelta(df['Time difference']+':00')
Note: I appended ':00' because pd.to_timedelta expects strings in "hh:mm:ss" format by default.
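As a quick check that the converted column can be combined with the datetime column, here is a small sketch using two rows from the question. The `Adjusted` column name is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "This time": pd.to_datetime(["2000-01-01 00:00:00", "2000-03-01 05:00:00"]),
    "Time difference": ["-3:00", "-5:00"],
})

# Append seconds so the strings read as hh:mm:ss, then convert to timedelta
df["Time difference"] = pd.to_timedelta(df["Time difference"] + ":00")

# Timedeltas add directly to datetime64[ns] values
df["Adjusted"] = df["This time"] + df["Time difference"]
print(df)
```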

how to change timestamp column order in python?

I would like to change the order of my columns, but the column names are timestamps.
How can I change the order of timestamp columns?
Here is an example of the data I have.
It is in a dataframe, and the packages I am using are pandas and numpy.
properties 2020-11-28 03:00:00 2020-12-26 02:00:00 2020-12-12 01:00:00
Percent 76.5 77.62 71.89
Power 718.828 717.949 718.828
I used the code below to change the column order, but I got this error message:
KeyError: 'value not in index'
total_top4 = tot_top4[['THING DESCRIPTION','2020-11-28 03:00:00', '2020-12-12 01:00:00','2020-12-26 02:00:00']]
total_top4
Can someone please tell me how to change the order of columns whose names are timestamps?
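Assigning to the columns attribute only relabels the columns in place; it does not move the data underneath them, so it would not reorder anything.
The KeyError suggests that at least one label in your list does not match the actual column labels. Two likely causes, judging from your sample: the first column appears as properties rather than THING DESCRIPTION, and the date columns may be Timestamp objects rather than strings. Assuming tot_top4 is your dataframe, check tot_top4.columns, then select using the labels exactly as they appear there, for example:
total_top4 = tot_top4[['properties', pd.Timestamp('2020-11-28 03:00:00'), pd.Timestamp('2020-12-12 01:00:00'), pd.Timestamp('2020-12-26 02:00:00')]]
Please try it and let me know if this helps.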
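A minimal sketch of reordering by selection, under the assumption that the date columns are labeled with pandas Timestamp objects and the first column is named `properties` as in the sample. The frame is rebuilt here from the sample values; the real data may differ.

```python
import pandas as pd

# Rebuild the sample, assuming Timestamp column labels (not strings)
df = pd.DataFrame({
    "properties": ["Percent", "Power"],
    pd.Timestamp("2020-11-28 03:00:00"): [76.5, 718.828],
    pd.Timestamp("2020-12-26 02:00:00"): [77.62, 717.949],
    pd.Timestamp("2020-12-12 01:00:00"): [71.89, 718.828],
})

# Sort only the Timestamp labels, then put the text column first
ts_cols = sorted(c for c in df.columns if isinstance(c, pd.Timestamp))
ordered = df[["properties"] + ts_cols]
print(ordered)
```

Selecting with a list of labels returns a new frame with the columns in that order, and each column keeps its own data.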

Pandas: How to group by time for each value in other column

I have a dataframe like so:
column-one column-two column-3 column-4 column-5 date
aaa qqq cat1 dsj dak 2010-01-01 20:00:00
ooo www cat2 fnk qwe 2011-01-02 19:00:00
oll wee cat2 fek wqw 2011-03-02 22:00:00
Column-3 contains the categories in the dataframe. There are approximately 10-12 distinct categories. For each category, I am trying to count the number of times it occurs in each time period (hour, date, etc.) of the 'date' column. Ultimately I want to graph the results for each category individually and also store the results in a dataframe.
This problem has stumped me for quite a while. If anyone has any suggestions, or needs any more information, please let me know. Thanks!
I think you might be looking for this?
df.groupby(['date', 'column-3']).size()
It's a little difficult to understand your question. This answer responds to your comment to @Sina Shabani. If you want this information for only one category at a time, you'd use:
col_val_i_want = 'cat1' # Define what you want
mask = df['column-3'].eq(col_val_i_want) # Create a filter
df[mask].groupby('date').count() # Group by and get the count
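If the goal is a count per category per time bucket, one possible sketch is to floor the timestamps to the bucket size before grouping and then pivot the categories into columns. The sample rows and the hourly bucket are assumptions for illustration; any other frequency string can be passed to `floor`.

```python
import pandas as pd

# Hypothetical sample in the shape of the question's data
df = pd.DataFrame({
    "column-3": ["cat1", "cat2", "cat2", "cat2"],
    "date": pd.to_datetime([
        "2010-01-01 20:00:00", "2011-01-02 19:00:00",
        "2011-03-02 22:00:00", "2011-03-02 22:30:00",
    ]),
})

# Bucket timestamps by hour, count rows per (hour, category),
# then spread the categories into one column each
counts = (
    df.groupby([df["date"].dt.floor("h"), "column-3"])
      .size()
      .unstack(fill_value=0)
)
print(counts)
```

The result is an ordinary dataframe with one column per category, so a single category can be plotted with something like `counts["cat2"].plot()`.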

Pandas idxmax() doesn't work on Series grouped by time periods which contain NaNs

I have a Series that contains scalar values indexed by day over several years. For some years there are no data.
2014-10-07 5036.883410
2013-10-11 5007.515654
2013-10-27 5020.184053
2014-09-12 5082.379630
2014-10-14 5032.669801
2014-10-30 5033.276159
2016-10-03 5046.921912
2016-10-19 5141.861889
2017-10-06 5266.138810
From this I want to get
1. the maximum for each year
2. the day of the maximum for each year
For those years where there are no data, there should be a NaN.
To resolve 1. the following works:
import pandas as pd
import numpy as np
data= pd.Series( index=pd.DatetimeIndex(['2014-10-07', '2013-10-11', '2013-10-27', '2014-09-12', '2014-10-14', '2014-10-30', '2016-10-03', '2016-10-19', '2017-10-06'], dtype='datetime64[ns]', name='time', freq=None), data=np.array([5036.88341035, 5007.51565355, 5020.18405295, 5082.37963023, 5032.66980146, 5033.27615931, 5046.92191246, 5141.86188915, 5266.1388102 ]))
# get maximum of each year
data.resample('A').max()
However, I tried different options to get the index of the date with the maximum, but they all failed:
data.resample('A').idxmax()
This raises the following AttributeError:
AttributeError: 'DatetimeIndexResampler' object has no attribute 'idxmax'
Then I tried the following:
data.groupby(pd.TimeGrouper('A')).idxmax()
but this gave a ValueError without any further detail.
I then found this workaround:
data.groupby(pd.TimeGrouper('A')).agg( lambda x : x.idxmax() )
but it did not work either for temporally grouped data:
ValueError: attempt to get argmax of an empty sequence
Apparently the reported bug has not been fixed yet and the suggested workaround for categorical data does not seem to work for temporally grouped/resampled data.
Can anyone provide a suitable workaround for this case or maybe an entirely different (and efficient) solution approach to the above problem?
Thanks in advance!
The problem is that you have no records during 2015, but a time period for 2015 is created since it is inside your years' range. You need to manually process this case:
data.resample('A').agg(
    lambda x : np.nan if x.count() == 0 else x.idxmax()
)
Output:
time
2013-12-31 2013-10-27
2014-12-31 2014-09-12
2015-12-31 NaT
2016-12-31 2016-10-19
2017-12-31 2017-10-06
Freq: A-DEC, dtype: datetime64[ns]
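An alternative sketch, not taken from the answer above: group by the calendar year of the index instead of resampling, so that empty years never form a group, and then reindex over the full year range to put the gaps back as NaN and NaT. It uses the data from the question. The result is keyed by integer year rather than by year-end date.

```python
import pandas as pd

data = pd.Series(
    [5036.88341035, 5007.51565355, 5020.18405295, 5082.37963023,
     5032.66980146, 5033.27615931, 5046.92191246, 5141.86188915,
     5266.1388102],
    index=pd.DatetimeIndex(
        ["2014-10-07", "2013-10-11", "2013-10-27", "2014-09-12",
         "2014-10-14", "2014-10-30", "2016-10-03", "2016-10-19",
         "2017-10-06"],
        name="time",
    ),
)

# Every year between the first and last observation, including gaps
years = list(range(data.index.year.min(), data.index.year.max() + 1))

# Only years that actually occur form groups, so idxmax never sees an empty one
by_year = data.groupby(data.index.year)

yearly_max = by_year.max().reindex(years)        # NaN for missing years
yearly_argmax = by_year.idxmax().reindex(years)  # NaT for missing years
print(yearly_max)
print(yearly_argmax)
```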

Convert time format in pandas

I have a string in the format 2014-12-08 09:30:00.066000 that I want to convert to a datetime. I also want it to be less granular, with a resolution of one second, for example:
2014-12-08 09:30:00.066000 to 2014-12-08 09:30:00
I am trying to use pd.to_datetime function but it's not working for me. Anyone know how to do this? Thanks!
See this:
How to round a Pandas `DatetimeIndex`?
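import pandas as pd
def to_the_second(ts):
    # ts.value is nanoseconds since the epoch; round to the nearest whole second
    return pd.Timestamp(int(round(ts.value, -9)))
# the column must hold datetimes, so convert the strings first
df['My_Date_Column'] = pd.to_datetime(df['My_Date_Column']).apply(to_the_second)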
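A shorter sketch using the vectorized `.dt` accessor, assuming the column holds strings like the one in the question. The second sample value is made up to show the difference between truncating and rounding.

```python
import pandas as pd

df = pd.DataFrame({
    "My_Date_Column": ["2014-12-08 09:30:00.066000",
                       "2014-12-08 09:30:01.900000"],
})

# Parse the strings into datetime64[ns] values
df["My_Date_Column"] = pd.to_datetime(df["My_Date_Column"])

# floor drops the sub-second part; round goes to the nearest second
truncated = df["My_Date_Column"].dt.floor("s")
rounded = df["My_Date_Column"].dt.round("s")
print(truncated)
print(rounded)
```

Use `floor` if fractions of a second should simply be discarded, and `round` if 09:30:01.9 should become 09:30:02.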
