pandas DatetimeIndex from timestamp - python

I have a large list of timestamps in nanoseconds (they can easily be converted to milliseconds). I now want to make an instance of DatetimeIndex from these timestamps. Yet simply passing
timestamps = [3377536510631, 3377556564631, 3377576837400, 3377596513631, ...]
dti = DatetimeIndex(timestamps)
yields dates in 1970, while they should be in 2017. Dividing the values by a million to get milliseconds gives the same result. It seems the input isn't what DatetimeIndex expects, but I don't know how to adjust the input or set the parameters correctly.

Your timestamps probably have a false starting time (wrong offset). This usually happens if the time is not set correctly on the measurement device. If you cold-start the measurement, it will probably start at timestamp 0, which is 01/01/1970.
If you know the exact date and time the measurement was started, simply subtract the .min() value from the timestamp column and add the timestamp of the actual start time to the result.
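As a minimal sketch of that approach (the start time below is made up; the raw values are treated as nanosecond counter readings):
import pandas as pd

# raw counter values in nanoseconds with an arbitrary (wrong) origin
timestamps = pd.Series([3377536510631, 3377556564631, 3377576837400, 3377596513631])

# hypothetical wall-clock time at which the measurement actually started
actual_start = pd.Timestamp('2017-06-13 12:00:00')

# shift so the first sample sits at the true start time, then build the index
offset_ns = timestamps - timestamps.min()
dti = pd.DatetimeIndex(actual_start + pd.to_timedelta(offset_ns, unit='ns'))
print(dti)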

Related

Cumulative time strings to pandas datetime format

I am trying to work with some time series data that are in cumulative hours; however, I am having trouble getting the times to convert to datetime correctly.
csv format
cumulative_time,temperature
01:03:10,30,
02:03:10,31,
...
22:03:10,30,
23:03:10,29,
24:03:09,29,
25:03:09,25,
etc
df['cumulative_time'] = pd.to_datetime(df['cumulative_time'], format='%H:%M:%S').dt.time
keeps yielding the error:
time data '24:03:09' does not match format '%H:%M:%S'
Any thoughts on how to convert just times to datetime format, especially if the hours exceed 24 hours?
You probably want the pd.to_timedelta function instead.
A "datetime" is a point in time, eg. "at 3pm in the afternoon"; it's complaining about "24:03:09" because that's 0:03:09 the next day.
A "timedelta" is an amount of elapsed time.

Is there an easy way to plot and manipulate time duration (hours/minutes/seconds) data in Python? NOT datetime data

I'm working with some video game speedrunning (basically, races where people try to beat a game as fast as they can) data, and I have many different run timings in HH:MM:SS format. I know it's possible to convert to seconds, but I want to keep in this format for the purposes of making the axes on any graphs easy to read.
I have all the data in a data frame already and tried converting the timing data to datetime format, with format = '%H:%M:%S', but it just uses this as the time on 1900-01-01.
data=[['Aggy','01:02:32'], ['Kirby','01:04:54'],['Sally','01:06:04']]
df=pd.DataFrame(data, columns=['Runner','Time'])
df['Time']=pd.to_datetime(df['Time'], format='%H:%M:%S')
I thought specifying the format as just hours/minutes/seconds would strip away any date, but when I print the head of my dataframe, the time data now shows as 1900-01-01 01:02:32, for example: 1:02:32 AM on January 1st, 1900. I want Python to recognize 1:02:32 as a duration of time, not as a datetime. What's the best way to go about this?
The format argument defines the format of the input date, not the format of the resulting datetime object (reference).
For your needs you can either use the H:M:S part of the datetime, or use the to_timedelta method.
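A minimal sketch of the to_timedelta route, using the data from the question:
import pandas as pd

data = [['Aggy', '01:02:32'], ['Kirby', '01:04:54'], ['Sally', '01:06:04']]
df = pd.DataFrame(data, columns=['Runner', 'Time'])

# parse the run times as durations rather than points in time
df['Time'] = pd.to_timedelta(df['Time'])
print(df)
#   Runner            Time
# 0   Aggy 0 days 01:02:32
# 1  Kirby 0 days 01:04:54
# 2  Sally 0 days 01:06:04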

Python - Pandas - Difference between timestamps and period range

I am having trouble understanding the difference between a PeriodIndex and a DatetimeIndex, and when to use which. In particular, it has always seemed more natural to me to use Periods as opposed to Timestamps, but recently I discovered that Timestamps seem to provide the same indexing capability, can be used with the timegrouper, and also work better with Matplotlib's date functionality. So I am wondering if there is ever a reason to use Periods (a PeriodIndex)?
Periods can be used to check whether a specific event occurs within a certain period. Basically, a Period represents an interval while a Timestamp represents a point in time.
# For example, this returns True since the period spans one day. This test cannot be done with a Timestamp alone.
p = pd.Period('2017-06-13')
test = pd.Timestamp('2017-06-13 22:11')
p.start_time < test < p.end_time
I believe the simplest way to decide between Periods and Timestamps is to ask whether your code needs the attributes of a Period or those of a Timestamp.
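As a quick illustration of that attribute difference (the sample values are arbitrary):
p = pd.Period('2017-06', freq='M')
print(p.start_time, p.end_time)   # the interval the Period spans
ts = pd.Timestamp('2017-06-13 22:11')
print(ts.dayofweek, ts.tz)        # point-in-time attributes; there is no interval to query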

Python: creating list of timestamps by minute

I am trying to figure out the best way to create a list of timestamps in Python, where the values for the items in the list increment by one minute. The timestamps would be by minute and would cover the previous 24 hours. I need to create timestamps in the format "MM/dd/yyyy HH:mm:ss", or at least containing all of those fields. The timestamps will be the axis for a graph of data that I am collecting.
Calculating the times alone isn't too bad, as I could just get the current time, convert it to seconds, and change the value by one minute very easily. However, I am kind of stuck on figuring out the date aspect of it without having to do a lot of checking, which doesn't feel very Pythonic.
Is there an easier way to do this? For example, in JavaScript, you can get a Date() object, and simply subtract one minute from the value and JS will take care of figuring out if any of the other fields need to change and how they need to change.
datetime is the way to go; you might want to check out This Blog.
import datetime

now = datetime.datetime.now()
print(now)
print(now.ctime())
print(now.isoformat())
print(now.strftime("%Y%m%dT%H%M%S"))
This would output
2003-08-05 21:36:11.590000
Tue Aug 5 21:36:11 2003
2003-08-05T21:36:11.590000
20030805T213611
You can also do subtraction with datetime and timedelta objects:
now = datetime.datetime.now()
minute = datetime.timedelta(days=0, seconds=60, microseconds=0)
print(now - minute)
would output
2015-07-06 10:12:02.349574
You are looking for datetime and timedelta objects. See the docs.
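Putting it together for the task in the question, a minimal sketch (the strftime pattern is an assumption based on the requested "MM/dd/yyyy HH:mm:ss" format):
import datetime

now = datetime.datetime.now()
# one timestamp per minute for the previous 24 hours, oldest first
timestamps = [(now - datetime.timedelta(minutes=i)).strftime("%m/%d/%Y %H:%M:%S")
              for i in range(24 * 60, 0, -1)]
print(timestamps[:3])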

DatetimeIndex for daily data only in pandas

I want to create a custom index based on daily dates, such as:
a = bdate_range('1990-01-01', freq='D', periods=10)
This will create an index with various Timestamp objects:
>>> a[0]
Timestamp('1990-01-01 00:00:00', offset='D')
Unfortunately the Timestamp class seems to initialize the underlying numpy.datetime64 objects every single time with an [ns] flag, i.e. enabling a granularity down to nanoseconds.
This is total overkill for my data, which requires only daily granularity. Not only that, but allowing for this much granularity restricts the data to start only after the year 1678! (i.e. Timestamp('1677-01-01') will fail). The solution should be that one can somehow set a flag determining which datetime64 resolution the Timestamp object should use, e.g. something like:
Timestamp('1990-01-01', dtype='datetime64[d]')
and ideally bdate_range or date_range should have a similar flag that one can set, in order to create a whole index of adequately formatted Timestamps.
So, long story short: is it possible in pandas to create some type of index (e.g. DatetimeIndex, or maybe DateIndex?) that is specifically suited to handle daily data only?
Thank you for your replies
I believe the internals of DatetimeIndex are closely tied to a nanosecond resolution, so I don't think there's much that can be done there.
But, as recommended in the "caveats" section of the documentation, a PeriodIndex can be used to represent dates outside the nanosecond resolution.
In [147]: a = pd.period_range('1990-01-01', freq='D', periods=10)
In [148]: a[0]
Out[148]: Period('1990-01-01', 'D')
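As a quick check of the range advantage, a daily PeriodIndex can also be built for dates well before the 1678 limit mentioned in the question (the date below is arbitrary; the exact repr may vary by pandas version):
In [149]: pd.period_range('1500-01-01', freq='D', periods=3)
Out[149]: PeriodIndex(['1500-01-01', '1500-01-02', '1500-01-03'], dtype='period[D]', freq='D')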
