How to change timestamp column order in Python?

I would like to change the order of the columns, but the column names are timestamps.
How can I reorder the timestamp columns?
Here is an example of the data I have.
It is in a DataFrame, and the packages I am using are pandas and NumPy.
properties  2020-11-28 03:00:00  2020-12-26 02:00:00  2020-12-12 01:00:00
Percent     76.5                 77.62                71.89
Power       718.828              717.949              718.828
I used the code below to change the order of the columns, but I got this error message:
KeyError: 'value not in index'
total_top4 = tot_top4[['THING DESCRIPTION','2020-11-28 03:00:00', '2020-12-12 01:00:00','2020-12-26 02:00:00']]
total_top4
Can someone please tell me how to change the order of columns whose names are timestamps?

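Try setting the column labels through the columns attribute.
I am assuming total_top4 is your DataFrame.
total_top4.columns=['THING DESCRIPTION','2020-11-28 03:00:00', '2020-12-12 01:00:00','2020-12-26 02:00:00']
Note that this only relabels the columns in their current positions and does not move the data underneath, so the list must have one entry per existing column.
Please try it and let me know if this helps. Thanks!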
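A possible cause of the KeyError, assuming the labels were parsed as dates: the column labels may be pandas Timestamp objects rather than strings, so a string such as '2020-11-28 03:00:00' matches none of them (printing tot_top4.columns would confirm this). It is also worth checking that 'THING DESCRIPTION' really is a column name. The sketch below rebuilds a hypothetical frame shaped like the sample data and reorders it by sorting the Timestamp labels; the names df, ts_cols, other_cols and reordered are made up for illustration.

```python
import pandas as pd

# Hypothetical frame mirroring the question; the date labels are Timestamps, not strings.
df = pd.DataFrame(
    {
        "properties": ["Percent", "Power"],
        pd.Timestamp("2020-11-28 03:00:00"): [76.5, 718.828],
        pd.Timestamp("2020-12-26 02:00:00"): [77.62, 717.949],
        pd.Timestamp("2020-12-12 01:00:00"): [71.89, 718.828],
    }
)

# Split the labels into timestamp columns (sorted chronologically) and everything else.
ts_cols = sorted(c for c in df.columns if isinstance(c, pd.Timestamp))
other_cols = [c for c in df.columns if not isinstance(c, pd.Timestamp)]

# Selecting with a list of labels returns a new frame in that column order.
reordered = df[other_cols + ts_cols]
print(reordered)
```

Selecting with a list of labels moves the data together with the labels, which is the difference from assigning to .columns.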

Related

Sorting dataframe rows by Day of Date wise

I have built my dataframe, but I want to sort it by date. For example, I want the data for 02.01.2016 to come right after 01.01.2016.
df_data_2311 = df_data_231.groupby('Date').agg({'Wind Offshore in [MW]': ['sum']})
df_data_2311 = pd.DataFrame(df_data_2311)
After running this, I got the below output. This dataframe has 2192 rows.
            Wind Offshore in [MW]
                              sum
Date
01.01.2016                5249.75
01.01.2017               12941.75
01.01.2018               19020.00
01.01.2019               13723.00
01.01.2020               17246.25
...                           ...
31.12.2017               21322.50
31.12.2018               13951.75
31.12.2019               21457.25
31.12.2020               16491.25
31.12.2021               35683.25
Kindly let me know how I can sort this data chronologically by date.
You can use the sort_values function in pandas.
df_data_2311.sort_values(by=["Date"])
However, to sort by the Date column you first need to call reset_index() on the grouped dataframe and then convert the date values to datetime with pandas.to_datetime. Otherwise the dates are compared as strings, which is why 01.01.2017 appears right after 01.01.2016.
df_data_2311 = df_data_231.groupby('Date').agg({'Wind Offshore in [MW]': ['sum']}).reset_index()
df_data_2311["Date"] = pd.to_datetime(df_data_2311["Date"], format="%d.%m.%Y")
df_data_2311 = df_data_2311.sort_values(by=["Date"])
I recommend reviewing the pandas docs.
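As a minimal sketch of the same idea, assuming the Date strings follow the DD.MM.YYYY layout shown in the question: parsing the dates before grouping lets groupby order the keys chronologically, and selecting the single column before sum() avoids the two-level column header. The sample values and the names raw and daily are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the question's DD.MM.YYYY layout.
raw = pd.DataFrame(
    {
        "Date": ["01.01.2017", "02.01.2016", "01.01.2016", "02.01.2016"],
        "Wind Offshore in [MW]": [10.0, 2.5, 1.0, 3.5],
    }
)

# Parse first so that grouping and sorting are chronological, not alphabetical.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d.%m.%Y")

# groupby sorts its keys by default, so the result comes out in date order.
daily = raw.groupby("Date")["Wind Offshore in [MW]"].sum().reset_index()
print(daily)
```

If the original DD.MM.YYYY text is needed for display, daily["Date"].dt.strftime("%d.%m.%Y") converts it back after sorting.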

How can I calculate the number of days between two dates with different format in Python?

I have a pandas dataframe with a column of orderdates formatted like this: 2019-12-26.
However, when I take the max of this column it gives 2019-12-12, while it is actually 2019-12-26. I suspect this is because my date format is Dutch and the max() function assumes the 'American' format (correct me if I'm wrong).
This means that my calculations aren't correct.
How can I change the way the function calculates? Or, if that's not possible, how can I change the format of my date column so the calculations are correct?
[In] df['orderdate'] = df['orderdate'].astype('datetime64[ns]')
print(df["orderdate"].max())
[Out] 2019-12-12 00:00:00
Thank you!
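One thing worth checking, stated as an assumption because the raw data is not shown: if the source strings are actually day-first (for example 26-12-2019), a conversion that guesses the layout can swap day and month for ambiguous values. The sketch below uses hypothetical sample strings and a hypothetical orders frame, and passes the layout explicitly to pd.to_datetime so nothing is guessed.

```python
import pandas as pd

# Hypothetical day-first strings; the real column layout is an assumption.
orders = pd.DataFrame({"orderdate": ["26-12-2019", "12-12-2019", "03-11-2019"]})

# An explicit format removes any day/month guessing during parsing.
orders["orderdate"] = pd.to_datetime(orders["orderdate"], format="%d-%m-%Y")

latest = orders["orderdate"].max()
print(latest)
```

Once the column holds real datetimes, max() compares points in time, and the display format no longer affects the result.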

Filtering out improperly formatted datetime values in Python DataFrame

I have a DataFrame with one column storing the date.
However, some of these dates are properly formatted datetime values like '2018-12-24 17:00:00', while others are not and are stored like '20181225'.
When I tried to plot these using plotly, the improperly formatted values got turned into EPOCH dates, which is a problem.
Is there any way I can get a copy of the DataFrame with only those rows with properly formatted dates?
I tried using
clean_dict= dailySum_df.where(dailySum_df[isinstance(dailySum_df['time'],datetime.datetime)])
but it doesn't work because of the 'Array conditional must be same shape as self' error.
dailySum_df = pd.DataFrame(list(cursors['dailySum']))
trace = go.Scatter(
    x=dailySum_df['time'],
    y=dailySum_df['countMessageIn']
)
data = [trace]
py.plot(data, filename='basic-line')
Apply dateutil.parser to each value:
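import dateutil.parser as dparser
import pandas as pd
def myparser(x):
    # Return a datetime for parseable input, or None so the row can be dropped.
    try:
        return dparser.parse(x)
    except (ValueError, OverflowError, TypeError):
        return None
df = pd.DataFrame({'time': ['2018-12-24 17:00:00', '20181225', 'no date at all'], 'countMessageIn': [1, 2, 3]})
df['time'] = df['time'].apply(myparser)
# Keep only the rows where parsing succeeded.
df = df[df['time'].notnull()]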
Input:
time countMessageIn
0 2018-12-24 17:00:00 1
1 20181225 2
2 no date at all 3
Output:
time countMessageIn
0 2018-12-24 17:00:00 1
1 2018-12-25 00:00:00 2
Unlike Gustavo's solution this can handle rows with no recognizable date at all and it filters out such rows as required by your question.
If your original time column may contain other text besides the dates themselves, pass fuzzy=True to dparser.parse.
Try parsing the dates column of your dataframe using dateutil.parser.parse and Pandas apply function.
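If only rows in the exact 'YYYY-MM-DD HH:MM:SS' layout should survive, a pandas-only variant is possible. This is a sketch, assuming that layout is what counts as properly formatted: pd.to_datetime with an explicit format and errors='coerce' turns every non-matching value into NaT, which dropna then removes. Unlike the dateutil approach above, '20181225' is discarded rather than repaired. The names df, parsed and clean are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": ["2018-12-24 17:00:00", "20181225", "no date at all"],
        "countMessageIn": [1, 2, 3],
    }
)

# Values that do not match the format exactly become NaT instead of raising.
parsed = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Replace the column with the parsed values and drop the rows that failed.
clean = df.assign(time=parsed).dropna(subset=["time"])
print(clean)
```

The original df is left untouched, so the rejected rows can still be inspected afterwards.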

How to convert date to datetime?

I have dates in this format, '20181115 0756', and in a different dataframe in this format, '2018-11-15'. I would like to know if there is any way to convert them to datetime without the hours and minutes.
date['DATE']= pd.to_datetime(date.DATE)
This converts it to 2018-11-15 00:00:00, and I'd like to avoid that.
What I am trying to do is calculate the time difference between the dates in the two dataframes that I have.
Thank you in advance
You can use the following code, which keeps only the date part of each value:
date['DATE'] = pd.to_datetime(date['DATE'], errors='coerce').dt.date
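Since the goal is the difference between the two date columns, it may be simpler to keep both as datetime64 and only drop the time of day. This is a sketch under the assumption that the first frame uses the '20181115 0756' layout and the second uses '2018-11-15'; the frames a and b and their values are hypothetical. dt.normalize() resets the time to midnight, so the subtraction yields whole days.

```python
import pandas as pd

# Hypothetical frames, one per date layout mentioned in the question.
a = pd.DataFrame({"DATE": ["20181115 0756", "20181120 1830"]})
b = pd.DataFrame({"DATE": ["2018-11-10", "2018-11-25"]})

# Parse each layout explicitly; normalize() sets the time of day to 00:00:00.
a["DATE"] = pd.to_datetime(a["DATE"], format="%Y%m%d %H%M").dt.normalize()
b["DATE"] = pd.to_datetime(b["DATE"], format="%Y-%m-%d")

# Row-wise difference as a number of days.
diff_days = (a["DATE"] - b["DATE"]).dt.days
print(diff_days)
```

Keeping the datetime64 dtype means the .dt accessor and vectorised subtraction still work, whereas .dt.date produces plain Python date objects.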

Change all NaT values in Pandas dataframe to Timedelta 00:00:00

I have a dataframe in pandas, and one column, "timeOff", has some NaT values.
All I want to do is change all the NaT values to Timedelta values of '00:00:00'.
This is my current output:
Output with NaT values
I have tried to run this line of code:
replaceNaT = pd.to_timedelta('00:00:00')
print(replaceNaT)
startEndEventsDataframe['timeOff'] = np.where(pd.isnull(startEndEventsDataframe['timeOff']) == True, replaceNaT, startEndEventsDataframe['timeOff'])
But this destroys all the values in my dataframe column, as seen below:
After running code from above
I would like all the values that are not NaT to remain unchanged, and all values that are NaT to become Timedelta values of '00:00:00'.
Thanks for the help.
As it turns out, I figured it out on my own, but I am posting the solution for anybody who might need it in the future.
I got rid of "replaceNaT" and simply wrote 0 where NaT was found. I guess Timedeltas are stored as integers in the smallest unit of time they resolve, and are only converted to their familiar form when they are displayed?
Anyway, here is the code change that worked for me:
startEndEventsDataframe['timeOff'] = np.where(pd.isnull(startEndEventsDataframe['timeOff']) == True, 0, startEndEventsDataframe['timeOff'])
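An alternative sketch that stays inside pandas, assuming timeOff has a timedelta64 dtype: Series.fillna with pd.Timedelta(0) replaces only the NaT entries and leaves the other values alone. The events frame and its values are hypothetical stand-ins for startEndEventsDataframe.

```python
import pandas as pd

# Hypothetical column with one missing duration (None becomes NaT).
events = pd.DataFrame({"timeOff": pd.to_timedelta(["00:05:00", None, "01:30:00"])})

# Only the NaT entries are replaced; existing durations are kept as they are.
events["timeOff"] = events["timeOff"].fillna(pd.Timedelta(0))
print(events)
```

Because the fill value is itself a Timedelta, the column keeps its timedelta dtype and no conversion through NumPy is needed.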
