I have a Series containing datetime64[ns] elements called series, and would like to increment the months. I thought the following would work fine, but it doesn't:
series.dt.month += 1
The error is
ValueError: modifications to a property of a datetimelike object are not supported. Change values on the original.
Is there a simple way to achieve this without needing to redefine things?
First, I created an example column of datetimes:
import datetime
t = [datetime.datetime(2015,4,18,23,33,58),datetime.datetime(2015,4,19,14,32,8),datetime.datetime(2015,4,20,18,42,44),datetime.datetime(2015,4,20,21,41,19)]
import pandas as pd
df = pd.DataFrame(t,columns=['Date'])
Timeseries:
df
Out[]:
Date
0 2015-04-18 23:33:58
1 2015-04-19 14:32:08
2 2015-04-20 18:42:44
3 2015-04-20 21:41:19
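Now for the increment: you can add a pd.DateOffset. Note that days=30 adds a fixed 30 days, which lands on the same day of the next month here only because April has 30 days; pd.DateOffset(months=1) shifts by a calendar month instead.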
df['Date']+pd.DateOffset(days=30)
Output:
df['Date']+pd.DateOffset(days=30)
Out[66]:
0 2015-05-18 23:33:58
1 2015-05-19 14:32:08
2 2015-05-20 18:42:44
3 2015-05-20 21:41:19
Name: Date, dtype: datetime64[ns]
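Since the question asks for a month increment rather than a fixed number of days, here is a minimal sketch using pd.DateOffset(months=1). The sample values are made up for illustration; the second one is chosen to show that a day that does not exist in the target month is clipped to the month's last day.

```python
import pandas as pd

# Illustrative data: one ordinary date and one end-of-month date.
series = pd.Series(pd.to_datetime(["2015-04-18 23:33:58", "2015-01-31 08:00:00"]))

# Adding the offset returns a new Series; the original is left unchanged.
shifted = series + pd.DateOffset(months=1)
print(shifted)
```

Assign the result back (series = series + pd.DateOffset(months=1)) if you want to replace the original values.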
I would like to (manually) create in Python a dataframe with daily dates (in column 'date') as per below code.
But the code does not produce the daily dates in the correct format; the dates are lost entirely (the desired format is shown below).
Could you please advise how I can correct the code so that the 'date' column is entered in the desired format?
Thanks in advance!
------------------------------------------------------
desired format for date column
2021-03-22 3
2021-04-07 3
2021-04-18 3
2021-05-12 0
------------------------------------------------------
df1 = pd.DataFrame({"date": [2021-3-22, 2021-4-7, 2021-4-18, 2021-5-12],
"x": [3, 3, 3, 0 ]})
df1
date x
0 1996 3
1 2010 3
2 1999 3
3 2004 0
Python wants to interpret the numbers in the sequence 2021-3-22 as a series of mathematical operations 2021 minus 3 minus 22.
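A quick check of that arithmetic, as a small sketch (the variable names are only for illustration):

```python
# Without quotes, Python evaluates the expression 2021 - 3 - 22.
unquoted = 2021-3-22
# With quotes, the characters are kept as text.
quoted = "2021-3-22"

print(unquoted, type(unquoted))  # 1996 <class 'int'>
print(quoted, type(quoted))      # 2021-3-22 <class 'str'>
```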
If you want each item stored as a string that resembles a date, you need to write it as a string literal (str) by enclosing it in quotes, as shown below.
import pandas as pd
df1 = pd.DataFrame({"date": ['2021-3-22', '2021-4-7', '2021-4-18', '2021-5-12'],
"x": [3, 3, 3, 0 ]})
The results for the date column, shown here, indicate that the column contains elements of the object datatype, which encompasses str in pandas. Notice that the strings were stored exactly as written (2021-3-22 instead of 2021-03-22).
0 2021-3-22
1 2021-4-7
2 2021-4-18
3 2021-5-12
Name: date, dtype: object
If, however, you actually want them stored as datetime objects so that you can do datetime manipulations on them (e.g. determine the number of days between two dates, or filter by a specific month or year), then you need to convert the values to datetime objects.
This technique will do that:
df1['date'] = pd.to_datetime(df1['date'])
The results of this conversion are Pandas datetime objects which enable nanosecond precision (I differentiate this from Python datetime objects which are limited to microsecond precision).
0 2021-03-22
1 2021-04-07
2 2021-04-18
3 2021-05-12
Name: date, dtype: datetime64[ns]
Notice the displayed results are now formatted just as you would expect of datetimes (2021-03-22 instead of 2021-3-22).
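As a sketch of the kind of manipulation the conversion makes possible, the following computes the number of days between consecutive dates and filters the rows to a single month. The names gaps and april are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"],
                    "x": [3, 3, 3, 0]})
df1["date"] = pd.to_datetime(df1["date"])

# Days elapsed since the previous row (the first row has no predecessor, so it is NaN).
gaps = df1["date"].diff().dt.days
print(gaps)

# Keep only the rows whose date falls in April.
april = df1[df1["date"].dt.month == 4]
print(april)
```

Neither operation would work on the string version of the column, because the .dt accessor is only available for datetime-like values.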
Alternatively, you can create the column as datetimes from the start by passing the strings to pd.to_datetime (see the pandas.to_datetime documentation for more details):
df1 = pd.DataFrame({"date": pd.to_datetime(["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"]),
"x": [3, 3, 3, 0 ]})
FWIW, I often use pd.read_csv(io.StringIO(text)) to copy/paste tabular-looking data into a DataFrame (for example, from SO questions).
Example:
import io
import re
import pandas as pd
def df_read(txt, **kwargs):
    txt = '\n'.join([s.strip() for s in txt.splitlines()])
    return pd.read_csv(io.StringIO(re.sub(r' +', '\t', txt)), sep='\t', **kwargs)
txt = """
date value
2021-03-22 3
2021-04-07 3
2021-04-18 3
2021-05-12 0
"""
df = df_read(txt, parse_dates=['date'])
>>> df
date value
0 2021-03-22 3
1 2021-04-07 3
2 2021-04-18 3
3 2021-05-12 0
>>> df.dtypes
date datetime64[ns]
value int64
dtype: object
Sorry if this question has been asked before, but I can't seem to find one that describes my current issue.
Basically, I have a large climate dataset that is not bound to "real" dates. The dataset starts at "year one" and goes to "year 9999". These dates are stored as strings such as Jan-01, Feb-01, Mar-01 etc., where the number indicates the year. When trying to convert this column to datetime objects, I get an out-of-range error. (My reading suggests this is due to a 64-bit limit on the range of datetime timestamps that can be represented.)
What is a good way to work around this problem/process the date information so I can effectively plot the associated data vs these dates, over this ~10,000 year period?
Thanks
The cftime library was created specifically for this purpose, and xarray has a convenient xr.cftime_range function that makes creating such a range easy:
In [3]: import xarray as xr, pandas as pd
In [4]: date_range = xr.cftime_range('0001-01-01', '9999-01-01', freq='D')
In [5]: type(date_range)
Out[5]: xarray.coding.cftimeindex.CFTimeIndex
This creates a CFTimeIndex object which plays nicely with pandas:
In [8]: df = pd.DataFrame({"date": date_range, "vals": range(len(date_range))})
In [9]: df
Out[9]:
date vals
0 0001-01-01 00:00:00 0
1 0001-01-02 00:00:00 1
2 0001-01-03 00:00:00 2
3 0001-01-04 00:00:00 3
4 0001-01-05 00:00:00 4
... ... ...
3651692 9998-12-28 00:00:00 3651692
3651693 9998-12-29 00:00:00 3651693
3651694 9998-12-30 00:00:00 3651694
3651695 9998-12-31 00:00:00 3651695
3651696 9999-01-01 00:00:00 3651696
[3651697 rows x 2 columns]
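The range above is generated rather than read from the data. For the strings described in the question (Jan-01, Feb-01, ...), here is a rough sketch of one way to parse them using only the standard library, which is a different technique from the cftime approach in this answer. It assumes the part after the hyphen is the year and the part before it is an English three-letter month abbreviation; parse_model_date is a hypothetical helper, not part of any library. Python's own datetime.date covers years 1 through 9999, so it does not hit the 64-bit nanosecond limit.

```python
import datetime

# English month abbreviations mapped to month numbers (assumed input format).
MONTHS = {abbr: number for number, abbr in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def parse_model_date(label):
    """Turn a label such as 'Feb-01' into the first day of that month and year."""
    month_abbr, year = label.split("-")
    return datetime.date(int(year), MONTHS[month_abbr], 1)

labels = ["Jan-01", "Feb-01", "Dec-9999"]
dates = [parse_model_date(label) for label in labels]
print(dates)
```

The resulting objects can be stored in an object-dtype column; whether they are convenient for plotting depends on the plotting library, which is where cftime and xarray tend to help.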
I have a pandas series like the one below:
import pandas as pd
import numpy as np
s = pd.Series(np.array([20201018, 20201019, 20201020]), index = [0, 1, 2])
s = pd.to_datetime(s, format='%Y%m%d')
print(s)
0 2020-10-18
1 2020-10-19
2 2020-10-20
dtype: datetime64[ns]
I want to check if say the date 2020-10-18 is present in the series. If I do the below I get false.
date = pd.to_datetime(20201018, format='%Y%m%d')
print(date in s)
I guess this is due to the series containing the date in the type datetime64[ns] while the object I created is of type 'pandas._libs.tslibs.timestamps.Timestamp'. How can I go about checking if a date is present in such a series?
Actually date in s will check for date in s.index. For example:
0 in s
returns True since s.index is [0,1,2].
For this case, use comparison:
s.eq(date).any()
or, for several dates, use isin:
s.isin([date1, date2]).any()
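Putting the pieces together, a small self-contained sketch using the series from the question. The name other is an illustrative second date that is not in the series.

```python
import pandas as pd

s = pd.to_datetime(pd.Series([20201018, 20201019, 20201020]), format="%Y%m%d")
date = pd.to_datetime(20201018, format="%Y%m%d")
other = pd.Timestamp("2021-01-01")  # illustrative date that is absent from s

in_index = date in s                     # looks in the index (0, 1, 2), not the values
in_values = s.eq(date).any()             # compares against the values
any_of_several = s.isin([date, other]).any()

print(in_index, in_values, any_of_several)
```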
I have a csv that looks like this
time,result
1308959819,1
1379259923,2
1318632821,3
1375216682,2
1335930758,4
The times are Unix timestamps. I want to extract the hour from each time and group the data by those values.
I tried
times = pd.to_datetime(df.time, unit='s')
or even
times = pd.DataFrame(pd.to_datetime(df.time, unit='s'))
but in both cases I got an error with
times.hour
>>>AttributeError: 'DataFrame' object has no attribute 'hour'
You're getting that error because Series and DataFrames don't have hour attributes. You can access the information you want using the .dt convenience accessor (docs here):
>>> times = pd.to_datetime(df.time, unit='s')
>>> times
0 2011-06-24 23:56:59
1 2013-09-15 15:45:23
2 2011-10-14 22:53:41
3 2013-07-30 20:38:02
4 2012-05-02 03:52:38
Name: time, dtype: datetime64[ns]
>>> times.dt
<pandas.tseries.common.DatetimeProperties object at 0xb5de94c>
>>> times.dt.hour
0 23
1 15
2 22
3 20
4 3
dtype: int64
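To finish the task from the question, the extracted hours can be used directly as a grouping key. This is a minimal sketch that inlines the sample CSV from the question; summing result is an arbitrary choice of aggregation, and the names hours and by_hour are illustrative.

```python
import io
import pandas as pd

csv_text = """time,result
1308959819,1
1379259923,2
1318632821,3
1375216682,2
1335930758,4
"""
df = pd.read_csv(io.StringIO(csv_text))

# Hour of day (UTC) for each Unix timestamp, renamed so it does not clash with the 'time' column.
hours = pd.to_datetime(df["time"], unit="s").dt.hour.rename("hour")

# Group the rows by that hour and aggregate the 'result' column.
by_hour = df.groupby(hours)["result"].sum()
print(by_hour)
```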
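You can also use the built-in datetime class, applied to each timestamp in turn (fromtimestamp takes a single number, not a whole column).
import datetime
# your code here
# convert each Unix timestamp to a UTC datetime and take its hour
hours = df.time.apply(lambda ts: datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc).hour)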
Objective:
To create an Index that accommodates a pre-existing set of price data from a csv file. I can build an index using list comprehensions. If it's done in that way, the construction would give me a filtered list of length 86,772--when run over 1/3/2007-8/30/2012 for 42 times (i.e. 10 minute intervals). However, my data of prices coming from the csv is length: 62,034. Observe that the difference in length is due to data cleaning issues.
That said, I am not sure how to overcome the apparent mismatch between the real data and this pre-built (list comp) dataframe.
Attempt:
Am I using the first two lines incorrectly?
data=pd.read_csv('___.csv', parse_dates={'datetime':[0,1]}).set_index('datetime')
dt_index = pd.DatetimeIndex([datetime.combine(i.date,i.time) for i in data.index])
ts = pd.Series(data.prices.values, dt_index)
Questions:
As I understand it, I should use 'combine' since I want the index construction to be completely informed by my csv file. And, 'combine' returns a new datetime object whose date components are equal to the given date object’s, and whose time components are equal to the given time object’s.
When I parse_dates, is it lumping the time and date together and considering it to be a 'date'?
Is there a better way to achieve the stated objective?
Traceback Error:
AttributeError: 'unicode' object has no attribute 'date'
You can write this neatly as follows:
ts = df1.prices
Here's an example:
In [1]: df = pd.read_csv('prices.csv',
                         parse_dates={'datetime': [0,1]}).set_index('datetime')
In [2]: df # dataframe
Out[2]:
prices duty
datetime
2012-11-12 10:00:00 1 0
2012-12-12 10:00:00 2 0
2012-12-12 11:00:00 3 1
In [3]: df.prices # timeseries
Out[3]:
datetime
2012-11-12 10:00:00 1
2012-12-12 10:00:00 2
2012-12-12 11:00:00 3
Name: prices
In [4]: ts = df.prices
You can groupby date like so (similar to this example from the docs):
In [5]: key = lambda x: x.date()
In [6]: df.groupby(key).sum()
Out[6]:
prices duty
2012-11-12 1 0
2012-12-12 5 1
In [7]: ts.groupby(key).sum()
Out[7]:
2012-11-12 1
2012-12-12 5
Where prices.csv contains:
date,time,prices,duty
11/12/2012,10:00,1,0
12/12/2012,10:00,2,0
12/12/2012,11:00,3,1
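For reference, here is a self-contained sketch of the same idea that does not rely on the dict form of parse_dates. It inlines the contents of prices.csv, builds the index by joining the date and time columns explicitly, and assumes the dates are month-first, as in the output above. The name daily is introduced for illustration.

```python
import io
import pandas as pd

csv_text = """date,time,prices,duty
11/12/2012,10:00,1,0
12/12/2012,10:00,2,0
12/12/2012,11:00,3,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Join the two text columns and parse them with an explicit month-first format.
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M")
df = df.drop(columns=["date", "time"]).set_index("datetime")

ts = df["prices"]                           # time series indexed by the parsed datetimes
daily = ts.groupby(ts.index.date).sum()     # total price per calendar day
print(daily)
```

Because the index is built from the file itself, its length always matches the cleaned data, so there is no separate pre-built index to reconcile.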