Pandas fillna on datetime object - python

I'm trying to run fillna on a column of type datetime64[ns]. When I run something like:
df['date'].fillna(datetime("2000-01-01"))
I get:
TypeError: an integer is required
Any way around this?

This should work in 0.12 and 0.13 (just released).
@DSM points out that datetimes are constructed like datetime.datetime(2012, 1, 1),
so the error comes from failing to construct the value that you are passing to fillna; fillna itself is never reached.
Note that a Timestamp WILL parse the string.
In [3]: s = Series(date_range('20130101',periods=10))
In [4]: s.iloc[3] = pd.NaT
In [5]: s.iloc[7] = pd.NaT
In [6]: s
Out[6]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 NaT
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 NaT
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
dtype: datetime64[ns]
A Timestamp works as the fill value (a datetime.datetime will work as well):
In [7]: s.fillna(Timestamp('20120101'))
Out[7]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2012-01-01 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2012-01-01 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
dtype: datetime64[ns]
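To make the difference concrete, here is a minimal sketch (the series and the fill date are made up for illustration) showing that both a Timestamp and a correctly constructed datetime.datetime work as fill values, while datetime.datetime called with a string raises before fillna is reached:

```python
import datetime

import pandas as pd

# Small illustrative series with one missing value.
s = pd.Series(pd.date_range("2013-01-01", periods=4))
s.iloc[1] = pd.NaT

# pd.Timestamp parses the string for you.
filled_ts = s.fillna(pd.Timestamp("2000-01-01"))

# datetime.datetime needs integer year, month, day.
filled_dt = s.fillna(datetime.datetime(2000, 1, 1))

# Passing a string to datetime.datetime fails on its own,
# which is the TypeError seen in the question.
error_message = None
try:
    datetime.datetime("2000-01-01")
except TypeError as exc:
    error_message = str(exc)
```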

Right now, df['date'].fillna(pd.Timestamp("20210730")) works in pandas 1.3.1

This example works with dynamic data, if you want to replace NaT values in a column with values from another datetime column.
df['column_with_NaT'] = df['column_with_NaT'].fillna(df['dt_column_with_thesame_index'])
It worked for me when I had updated some rows in a datetime column, the rows that were not updated had NaT values, and I needed to carry over the old series data. The code above resolved my problem. (Assigning the result back is safer than calling fillna with inplace=True on a selected column, which recent pandas versions warn about.)
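A small sketch of filling NaT from another column, assuming both columns share the same index. The column names "updated" and "previous" are hypothetical:

```python
import pandas as pd

# "updated" has a gap; "previous" holds the old values on the same index.
df = pd.DataFrame({
    "updated": pd.to_datetime(["2021-01-01", None, "2021-01-03"]),
    "previous": pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-03"]),
})

# fillna aligns on the index and only touches rows where "updated" is NaT.
df["updated"] = df["updated"].fillna(df["previous"])
```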

Related

How to append hour:min:sec to the DateTime in pandas Dataframe

I have following dataframe, where date was set as the index col,
date        renormalized
2017-01-01  6
2017-01-08  5
2017-01-15  3
2017-01-22  3
2017-01-29  3
I want to append 00:00:00 to each of the datetime in the index column, make it like
date                 renormalized
2017-01-01 00:00:00  6
2017-01-08 00:00:00  5
2017-01-15 00:00:00  3
2017-01-22 00:00:00  3
2017-01-29 00:00:00  3
I am stuck and have not found a way to do this. It would be great if anyone could help.
Thanks
AL
When your time is 0 for all instances, pandas doesn't show the time by default (although it's a Timestamp class, so it has the time!). Probably your data is already normalized, and you can perform delta time operations as usual.
You can see a target observation with df.index[0] for instance, or take a look at all the times with df.index.time.
You can use DatetimeIndex.strftime
df.index = pd.to_datetime(df.index).strftime('%Y-%m-%d %H:%M:%S')
print(df)
renormalized
date
2017-01-01 00:00:00 6
2017-01-08 00:00:00 5
2017-01-15 00:00:00 3
2017-01-22 00:00:00 3
2017-01-29 00:00:00 3
Or, if the index holds plain strings rather than datetimes, you can simply concatenate:
df.index = df.index + ' 00:00:00'
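The sketch below, built on a made-up three-row frame, illustrates both points from the answers: the midnight time is already stored in a DatetimeIndex even though it is not displayed, and strftime turns the index into text rather than datetimes:

```python
import pandas as pd

idx = pd.to_datetime(["2017-01-01", "2017-01-08", "2017-01-15"])
df = pd.DataFrame({"renormalized": [6, 5, 3]}, index=idx)
df.index.name = "date"

# Each entry is a full Timestamp; the time part is 00:00:00.
first = df.index[0]

# strftime returns an index of strings, so datetime operations
# (resampling, time deltas) no longer apply to the result.
as_text = df.index.strftime("%Y-%m-%d %H:%M:%S")
```

If you only need the time visible when printing, formatting at display time and keeping the DatetimeIndex is usually the less invasive choice.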

Assign first element of groupby to a column yields NaN

Why does this not work out?
I get the right results if I just print it out, but if I use the same to assign it to the df column, I get Nan values...
print(df.groupby('cumsum').first()['Date'])
cumsum
1 2021-01-05 11:00:00
2 2021-01-06 08:00:00
3 2021-01-06 10:00:00
4 2021-01-06 13:00:00
5 2021-01-06 14:00:00
...
557 2021-08-08 08:00:00
558 2021-08-08 09:00:00
559 2021-08-08 11:00:00
560 2021-08-08 13:00:00
561 2021-08-08 18:00:00
Name: Date, Length: 561, dtype: datetime64[ns]
vs
df["Date_First"] = df.groupby('cumsum').first()['Date']
Date
2021-01-01 00:00:00 NaT
2021-01-01 01:00:00 NaT
2021-01-01 02:00:00 NaT
2021-01-01 03:00:00 NaT
2021-01-01 04:00:00 NaT
..
2021-08-08 14:00:00 NaT
2021-08-08 15:00:00 NaT
2021-08-08 16:00:00 NaT
2021-08-08 17:00:00 NaT
2021-08-08 18:00:00 NaT
Name: Date_Last, Length: 5268, dtype: datetime64[ns]
What happens here?
I used an example from here, but I want to get the first elements.
https://www.codeforests.com/2021/03/30/group-consecutive-rows-in-pandas/
What happens here?
If you use:
print(df.groupby('cumsum')['Date'].first())
#print(df.groupby('cumsum').first()['Date'])
the output is aggregated by the column cumsum with the aggregation function first.
The index of the result therefore holds the unique values of cumsum. When you assign it to a new column, that index does not match the original index, so the output is all missing values (NaT here).
The solution is to use GroupBy.transform, which repeats the aggregated values in a Series (column) of the same size as the original DataFrame, so the index is the same as the original and the assignment works:
df["Date_First"] = df.groupby('cumsum')['Date'].transform("first")
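A minimal sketch of the index mismatch, using a made-up six-row frame with a datetime index like the one in the question. The column name "Misaligned" is only for illustration:

```python
import pandas as pd

dates = pd.date_range("2021-01-01", periods=6, freq="D")
df = pd.DataFrame({"Date": dates, "cumsum": [1, 1, 2, 2, 2, 3]}, index=dates)

# Aggregation: one row per group, indexed by the group key (1, 2, 3).
agg = df.groupby("cumsum")["Date"].first()

# Assigning it aligns on index labels; none of 1, 2, 3 match the dates.
df["Misaligned"] = agg

# transform keeps the original index and broadcasts each group's value.
df["Date_First"] = df.groupby("cumsum")["Date"].transform("first")
```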

Remove data timestamp and get data only every hours python

I have a bunch of timestamp data in a csv file like this:
2012-01-01 00:00:00, data
2012-01-01 00:01:00, data
2012-01-01 00:02:00, data
...
2012-01-01 00:59:00, data
2012-01-01 01:00:00, data
2012-01-01 01:01:00, data
I want to drop the per-minute rows and keep only the hourly ones in Python, like the following:
2012-01-01 00:00:00, data
2012-01-01 01:00:00, data
2012-01-01 02:00:00, data
Could any one help me? Thank you.
I believe you need to use pandas resample; here is an example of how it is used to achieve the output you desire. Keep in mind, however, that since resampling is a frequency conversion, you must pass a function that says how the other columns should behave (summing all values that fall in the new time frame, calculating an average, calculating the difference, etc.); otherwise you get back a DatetimeIndexResampler object rather than data. Here is an example:
import pandas as pd
index = pd.date_range('1/1/2000', periods=9, freq='40T')
series = pd.Series(range(9),index=index)
print(series)
Output:
2000-01-01 00:00:00 0
2000-01-01 00:40:00 1
2000-01-01 01:20:00 2
2000-01-01 02:00:00 3
2000-01-01 02:40:00 4
2000-01-01 03:20:00 5
2000-01-01 04:00:00 6
2000-01-01 04:40:00 7
2000-01-01 05:20:00 8
Applying resample hourly without passing the aggregation function:
print(series.resample('H'))
Output:
DatetimeIndexResampler [freq=<Hour>, axis=0, closed=left, label=left, convention=start, base=0]
After passing .sum():
print(series.resample('H').sum())
Output:
2000-01-01 00:00:00 1
2000-01-01 01:00:00 2
2000-01-01 02:00:00 7
2000-01-01 03:00:00 5
2000-01-01 04:00:00 13
2000-01-01 05:00:00 8
Freq: H, dtype: int64
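Since the question asks to keep the rows that fall exactly on the hour rather than to aggregate, here is a hedged sketch of two ways to do that on made-up per-minute data. The variable names are illustrative, and it assumes the timestamps have already been parsed into a DatetimeIndex:

```python
import pandas as pd

# Two hours of per-minute data, values 0..120.
index = pd.date_range("2012-01-01", periods=121, freq="min")
series = pd.Series(range(121), index=index)

# Option 1: boolean filter, keeps only rows stamped exactly on the hour.
on_the_hour = series[series.index.minute == 0]

# Option 2: resample into hourly bins and take the first row of each bin.
hourly_first = series.resample("60min").first()
```

The two agree when every hour has a row at minute 0. If some on-the-hour rows are missing, the filter drops that hour while resample returns the first available row in the bin.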

Get weekday/day-of-week for Datetime column of DataFrame

I have a DataFrame df like the following (excerpt, 'Timestamp' are the index):
Timestamp Value
2012-06-01 00:00:00 100
2012-06-01 00:15:00 150
2012-06-01 00:30:00 120
2012-06-01 01:00:00 220
2012-06-01 01:15:00 80
...and so on.
I need a new column df['weekday'] with the respective weekday/day-of-week of the timestamps.
How can I get this?
Use the new dt.dayofweek property:
In [2]:
df['weekday'] = df['Timestamp'].dt.dayofweek
df
Out[2]:
Timestamp Value weekday
0 2012-06-01 00:00:00 100 4
1 2012-06-01 00:15:00 150 4
2 2012-06-01 00:30:00 120 4
3 2012-06-01 01:00:00 220 4
4 2012-06-01 01:15:00 80 4
In the situation where the Timestamp is your index you need to reset the index and then call the dt.dayofweek property:
In [14]:
df = df.reset_index()
df['weekday'] = df['Timestamp'].dt.dayofweek
df
Out[14]:
Timestamp Value weekday
0 2012-06-01 00:00:00 100 4
1 2012-06-01 00:15:00 150 4
2 2012-06-01 00:30:00 120 4
3 2012-06-01 01:00:00 220 4
4 2012-06-01 01:15:00 80 4
If you instead build a Series from the index to avoid resetting it, you get NaN values, and the same happens if you call dt.dayofweek on the result of reset_index without assigning that result back to df. In both cases the new Series has a default integer index (0, 1, 2, ...), which does not align with the Timestamp index of df on assignment:
In [16]:
df['weekday'] = pd.Series(df.index).dt.dayofweek
df
Out[16]:
Value weekday
Timestamp
2012-06-01 00:00:00 100 NaN
2012-06-01 00:15:00 150 NaN
2012-06-01 00:30:00 120 NaN
2012-06-01 01:00:00 220 NaN
2012-06-01 01:15:00 80 NaN
In [17]:
df['weekday'] = df.reset_index()['Timestamp'].dt.dayofweek
df
Out[17]:
Value weekday
Timestamp
2012-06-01 00:00:00 100 NaN
2012-06-01 00:15:00 150 NaN
2012-06-01 00:30:00 120 NaN
2012-06-01 01:00:00 220 NaN
2012-06-01 01:15:00 80 NaN
EDIT
As pointed out to me by user @joris, you can just access the weekday attribute of the index, so the following will work and is more compact:
df['Weekday'] = df.index.weekday
If the Timestamp column is a datetime value, then you can just use:
df['weekday'] = df['Timestamp'].apply(lambda x: x.weekday())
or
df['weekday'] = pd.to_datetime(df['Timestamp']).apply(lambda x: x.weekday())
You can get the day name this way (calling day_name on the index directly avoids the index misalignment that pd.Series(df.index) would cause):
df['weekday'] = df.index.day_name()
In case somebody else has the same issue with a multi-indexed dataframe, here is what solved it for me, based on @joris's solution:
df['Weekday'] = df.index.get_level_values(1).weekday
For me the date was at get_level_values(1) rather than get_level_values(0); the latter would be the right choice if the date were the outer index level.
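Note that dt.dayofweek is not deprecated; what pandas 1.1.0 deprecated was dt.week and dt.weekofyear, in favour of dt.isocalendar().week. So
df['weekday'] = df['Timestamp'].dt.dayofweek
from @EdChum and @Artyom Krivolapov still works. If you want the ISO weekday instead, you can use:
df['weekday'] = df['Timestamp'].dt.isocalendar().day
Be aware that the two number the days differently: dayofweek runs from 0 (Monday) to 6 (Sunday), while isocalendar().day runs from 1 (Monday) to 7 (Sunday).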
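As a quick comparison of the options in this thread, the sketch below uses three made-up timestamps on a datetime index and computes the weekday three ways. The column names "iso_day" and "day_name" are illustrative:

```python
import pandas as pd

idx = pd.to_datetime(["2012-06-01 00:00", "2012-06-02 00:15", "2012-06-04 00:30"])
idx.name = "Timestamp"
df = pd.DataFrame({"Value": [100, 150, 120]}, index=idx)

# Monday=0 ... Sunday=6, read straight from the index (no reset needed).
df["weekday"] = df.index.dayofweek

# ISO numbering: Monday=1 ... Sunday=7. isocalendar() keeps the same
# index, so the assignment aligns correctly.
df["iso_day"] = df.index.isocalendar()["day"]

# Human-readable names.
df["day_name"] = df.index.day_name()
```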

Convert column of date objects in Pandas DataFrame to strings

How can I convert a column of datetime64 objects to strings that would read
01-11-2013 for today's date of November 1?
I have tried
df['DateStr'] = df['DateObj'].strftime('%d%m%Y')
but I get this error
AttributeError: 'Series' object has no attribute 'strftime'
As of version 0.17.0, you can format with the dt accessor (use '%d-%m-%Y' if you want the dashes shown in the question):
df['DateStr'] = df['DateObj'].dt.strftime('%d%m%Y')
In [6]: df = DataFrame(dict(A = date_range('20130101',periods=10)))
In [7]: df
Out[7]:
A
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
In [8]: df['A'].apply(lambda x: x.strftime('%d%m%Y'))
Out[8]:
0 01012013
1 02012013
2 03012013
3 04012013
4 05012013
5 06012013
6 07012013
7 08012013
8 09012013
9 10012013
Name: A, dtype: object
It works directly if you first set the column as the index, because you are then calling strftime on a DatetimeIndex and not on a Series:
df = df.set_index('DateObj').copy()
df['DateStr'] = df.index.strftime('%d%m%Y')
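For the exact output asked for in the question (day-month-year separated by dashes), a minimal sketch on two made-up dates, reusing the question's column names:

```python
import pandas as pd

df = pd.DataFrame({"DateObj": pd.to_datetime(["2013-11-01", "2013-11-02"])})

# The .dt accessor exposes strftime on a datetime Series;
# the dashes go directly into the format string.
df["DateStr"] = df["DateObj"].dt.strftime("%d-%m-%Y")
```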
