I am having trouble understanding the difference between a PeriodIndex and a DatetimeIndex, and when to use which. In particular, it always seemed more natural to me to use Periods as opposed to Timestamps, but recently I discovered that Timestamps seem to provide the same indexing capability, can be used with TimeGrouper, and also work better with Matplotlib's date functionality. So I am wondering: is there ever a reason to use Periods (a PeriodIndex)?
Periods can be used to check whether a specific event occurs within a certain period. Basically, a Period represents an interval while a Timestamp represents a point in time.
import pandas as pd

# For example, this will return True since the period spans one full day ('D'),
# so the timestamp falls inside it. This test cannot be done with a Timestamp alone.
p = pd.Period('2017-06-13')
test = pd.Timestamp('2017-06-13 22:11')
p.start_time < test < p.end_time
I believe the simplest way to decide between Periods and Timestamps is to ask whether the attributes of a Period or those of a Timestamp are needed in your code.
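For example (a quick illustrative sketch), a Period exposes interval attributes that a Timestamp does not, while a Timestamp pins down an exact instant:
import pandas as pd

p = pd.Period('2017-06-13', freq='D')
ts = pd.Timestamp('2017-06-13 22:11')

p.start_time, p.end_time, p.freq   # boundaries and frequency of the interval
ts.value                           # the exact instant, as nanoseconds since the epoch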
Related
I have a large list of timestamps in nanoseconds (which can easily be converted to milliseconds). I now want to create a DatetimeIndex from these timestamps. Yet simply passing
timestamps = [3377536510631, 3377556564631, 3377576837400, 3377596513631, ...]
dti = pd.DatetimeIndex(timestamps)
yields dates in 1970, yet they should be in 2017. Dividing them by a million to get milliseconds gives the same result. It seems the input isn't what is expected, but I don't know how to set the input or the parameters correctly.
Your timestamps probably have a false starting time (a wrong offset). This usually happens if the time is not set correctly on a measurement device. If you cold-start the measurement, it will probably start at timestamp 0, which is 1970-01-01.
If you know the exact date and time the measurement was started, simply subtract the .min() value from the timestamp column and add the timestamp of the actual start time to the result.
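A minimal sketch of that correction, assuming the raw values are nanosecond counters and the real start time of the measurement is known (both the list and the start time below are made up):
import pandas as pd

timestamps = [3377536510631, 3377556564631, 3377576837400, 3377596513631]  # raw counters in ns
actual_start = pd.Timestamp('2017-06-13 12:00:00')  # assumed known start of the measurement

raw = pd.to_datetime(timestamps, unit='ns')   # interpreted relative to 1970-01-01
dti = (raw - raw.min()) + actual_start        # shift so the first sample lands at the real start time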
I have a simple offset question that I cannot seem to find the answer to in previous posts. I am trying to group by weeks, but the default df.groupby(pd.TimeGrouper('1W')) gives me groups starting on Sunday.
Say, for instance, I want the groups to start on Tuesday. I naively tried adding pd.DateOffset(days=2) as an additional argument, but that did not seem to work.
Offset strings can include a component that specifies the weekday on which the weekly period is anchored.
In your case, you want W-TUE:
df.groupby(pd.TimeGrouper('W-TUE'))
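Note that pd.TimeGrouper has since been deprecated (and later removed) in favour of pd.Grouper, which accepts the same frequency string. A small self-contained sketch (the DataFrame below is made-up sample data):
import pandas as pd
import numpy as np

# Illustrative frame with a daily DatetimeIndex
df = pd.DataFrame({'value': np.arange(14)},
                  index=pd.date_range('2017-06-01', periods=14, freq='D'))

# Weekly buckets anchored on Tuesday
weekly = df.groupby(pd.Grouper(freq='W-TUE'))['value'].sum()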
I'm using the arrow module to handle datetime objects in Python. If I get the current time like this:
now = arrow.now()
...how do I increment it by one day?
Update as of 2020-07-28
Increment the day
now.shift(days=1)
Decrement the day
now.shift(days=-1)
Original Answer
DEPRECATED as of 2019-08-09
https://arrow.readthedocs.io/en/stable/releases.html
0.14.5 (2019-08-09) [CHANGE] Removed deprecated replace shift functionality. Users looking to pass plural properties to the replace function to shift values should use shift instead.
0.9.0 (2016-11-27) [FIX] Separate replace & shift functions
Increment the day
now.replace(days=1)
Decrement the day
now.replace(days=-1)
I highly recommend the docs.
The docs state that shift is to be used for adding offsets:
now.shift(days=1)
The replace method with arguments like days, hours, minutes, etc. seems to work just as shift does, though replace also has day, hour, minute, etc. arguments that replace the value in the given field with the provided value.
In any case, I think e.g. now.shift(hours=-1) is much clearer than now.replace.
See the documentation:
now = arrow.now()
oneDayFromNow = now.replace(days=+1)  # plural keywords in replace() were removed in arrow 0.14.5; use now.shift(days=1) in current versions
Is there a way to specify a datetime.date without a day, like this:
datetime.date(year=1900, month=1, day=None)
I have a dataset with dates that are not fully specified (sometimes only the year and the month are given). I want to represent that with a datetime.date without resorting to tricks.
"Beautiful is better than ugly. Explicit is better than implicit..." - Python's Philosophy
You cannot do that with the built-in datetime.date. Its constructor has no signature that lets you leave out the day value, perhaps because a date without a day of the month is naturally an incomplete date in real life.
Additionally, since the day input is an integer it must have a value, and a default of 0 would misrepresent the day (even if the internal date arithmetic could work around it), as days in real life start at 1. In short, datetime.date does a pretty good job in terms of safety of use, consistent with the "explicit is better than implicit" philosophy, by not letting you call it without specifying the day, that is, by stating in the function signature exactly what is required, as every good programmer would do.
But you could create your own function wrapper whenever you find that annoying or unnecessary.
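For instance, a minimal wrapper that always fills in day=1 (the name monthdate is just illustrative):
import datetime

def monthdate(year, month):
    # Fall back to the first of the month when only year and month are known
    return datetime.date(year, month, 1)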
Edit:
or using Python's own wrapper:
import functools

monthdate = functools.partial(datetime.date, day=1)  # edit by ShadowRanger
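Either way, the call site looks the same:
monthdate(2016, 2)  # -> datetime.date(2016, 2, 1)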
To me, the simplest practice is to use the current built-in with the day value set to 1.
datetime.date(1900, 1, 1)
It is only a short ', 1' that needs to be added.
datetime.date represents a day in the Gregorian calendar. It is immutable, and therefore all values must be known at the instant it is created. You can't omit the day if you use the constructor explicitly.
I have a dataset with dates that are not fully specified
datetime.strptime() provides the default values if necessary:
>>> from datetime import datetime
>>> datetime.strptime('2016-02', '%Y-%m').date()
datetime.date(2016, 2, 1)
I want to create a custom index based on daily dates, such as:
a = pd.bdate_range('1990-01-01', freq='D', periods=10)
This will create an index with various Timestamp objects:
>>> a[0]
Timestamp('1990-01-01 00:00:00', offset='D')
Unfortunately, the Timestamp class seems to initialize the underlying numpy.datetime64 objects every single time with an [ns] flag, i.e. enabling granularity down to nanoseconds.
This is total overkill for my data, which only requires daily granularity. Not only that, but allowing for this much granularity restricts the data to start no earlier than late 1677 (e.g. Timestamp('1677-01-01') will fail). The solution would be to somehow set a flag that determines which datetime64 resolution the Timestamp object should use, e.g. something like:
Timestamp('1990-01-01', dtype='datetime64[d]')
and ideally bdate_range or date_range should have a similar flag that one can set, in order to create a whole index of adequately formatted Timestamps.
So, long story short: is it possible in pandas to create some type of index (e.g. a DatetimeIndex, or maybe a DateIndex?) that is specifically suited to handling daily data only?
Thank you for your replies.
I believe the internals of DatetimeIndex are closely tied to nanosecond resolution, so I don't think there's much that can be done there.
But, as recommended in the "caveats" section of the documentation, a PeriodIndex can be used to represent dates outside the bounds of the nanosecond resolution.
In [147]: a = pd.period_range('1990-01-01', freq='D', periods=10)
In [148]: a[0]
Out[148]: Period('1990-01-01', 'D')
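And unlike a nanosecond-resolution Timestamp, a daily Period has no trouble with dates before 1678 (a quick sketch; the exact repr may differ between pandas versions):
In [149]: pd.period_range('1500-01-01', freq='D', periods=3)
Out[149]: PeriodIndex(['1500-01-01', '1500-01-02', '1500-01-03'], dtype='period[D]', freq='D')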