I have imported some timestamps into a Pandas frame (via a MongoDB client). They have microsecond precision, and I'd like a way to round them to the day. I've seen a previous question about using np.round while converting to ints and back again, but this doesn't work (I tried inlining a division by 3600 x 24 x 100000, but that didn't work).
I have this rather plain version, but it seems REALLY inefficient. What am I missing in either to_datetime or the np.round example?
df['doa'] = df['doa'].map(lambda x: x.strftime("%Y-%m-%d"))
df['doa'] = pd.to_datetime(df['doa'])
Note, these are not INDEXES, so I can't use the frequency trick.
There's this feature request, which suggests there's no good way:
ENH: add rounding method to DatetimeIndex/TimedeltaIndex
However, the article I found also shows this approach for minutes, which I might be able to modify:
pd.DatetimeIndex(((dti.asi8/(1e9*60)).round()*1e9*60).astype(np.int64))
Rounding Pandas Timestamp to minutes
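For what it's worth, the same integer trick looks like it adapts to days by swapping in the number of nanoseconds per day (a sketch, assuming df['doa'] already holds datetime64 values; note that newer pandas releases eventually grew Series.dt.round and Series.dt.floor, so on a recent version df['doa'].dt.floor('D') may be all you need):
import numpy as np
import pandas as pd

NS_PER_DAY = 24 * 60 * 60 * 10**9  # nanoseconds in one day

# Round each timestamp to the nearest day via the underlying int64 values.
dti = pd.DatetimeIndex(df['doa'])
df['doa'] = pd.DatetimeIndex(
    ((dti.asi8 / NS_PER_DAY).round() * NS_PER_DAY).astype(np.int64)
)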
I am having trouble understanding the difference between a PeriodIndex and a DatetimeIndex, and when to use which. In particular, it has always seemed more natural to me to use Periods rather than Timestamps, but I recently discovered that Timestamps seem to provide the same indexing capability, can be used with the TimeGrouper, and also work better with Matplotlib's date functionality. So I am wondering: is there ever a reason to use Periods (a PeriodIndex)?
Periods can be used to check whether a specific event occurs within a certain period. Basically, a Period represents an interval while a Timestamp represents a point in time.
# This returns True, since the period spans a whole day; the same
# test cannot be done with a bare Timestamp.
p = pd.Period('2017-06-13')
test = pd.Timestamp('2017-06-13 22:11')
p.start_time < test < p.end_time
I believe the simplest way to decide between the two is to ask whether your code needs the attributes of a Period or those of a Timestamp.
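To make that concrete, here's a rough comparison of what each type exposes (the values in the comments are what I'd expect, not verified output):
p = pd.Period('2017-06', freq='M')
p.start_time  # Timestamp('2017-06-01 00:00:00')
p.end_time    # Timestamp('2017-06-30 23:59:59.999999999')

t = pd.Timestamp('2017-06-13 22:11')
t.dayofweek   # 1 (Tuesday)
t.tz          # None; Timestamps can be timezone-aware, Periods cannot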
I'm importing data from an Excel spreadsheet into Python. My dates are coming through in a bizarre format that I am not familiar with and cannot parse.
in Excel (displayed as 7/31/2015):
42216
after I import it:
u'/Date(1438318800000-0500)/'
Two questions:
1. What format is this, and how might I parse it into something more intuitive and easier to read?
2. Is there a robust, swiss-army-knife-esque way to convert dates without specifying the input format?
Timezones necessarily make this more complex, so let's ignore them...
As @SteJ remarked, what you get is (close to) the time since 1 January 1970, i.e. Unix time; the /Date(...)/ wrapper is the ASP.NET JSON date serialization format. See the Wikipedia article on Unix time for how it's normally used. Oddly, the string you get has a timezone offset (-0500, EST in North America) attached, which makes no sense if it's proper Unix time (which is always in UTC), but we'll pass on that...
Assuming you can reduce it to a number (sans timezone), the conversion into something sensible in Python is really straightforward. Note the reduction in precision: your original number is the number of milliseconds since the epoch, rather than the standard number of seconds since the epoch:
from datetime import datetime

time_stamp = 1438318800  # whole seconds since the epoch
time_stamp_dt = datetime.fromtimestamp(time_stamp)
You can then get time_stamp_dt into any format you think best using strftime, e.g., time_stamp_dt.strftime('%m/%d/%Y'), which pretty much gives you what you started with.
Now, assuming that the format of the string you provided is fairly regular, we can extract the relevant time quite simply like this:
s = '/Date(1438318800000-0500)/'
time_stamp = int(s[6:16])  # first 10 digits: whole seconds (drops the milliseconds and the offset)
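Putting the pieces together, and keeping the milliseconds this time (a sketch; note that fromtimestamp uses your local timezone, which we're still ignoring):
from datetime import datetime

s = '/Date(1438318800000-0500)/'
millis = int(s[6:19])                         # all 13 digits: milliseconds since the epoch
dt = datetime.fromtimestamp(millis / 1000.0)  # fromtimestamp expects seconds
print(dt.strftime('%m/%d/%Y'))                # 07/31/2015 (in US Eastern, at least)
As for the swiss-army-knife part of the question: dateutil.parser.parse copes with a wide range of date strings without an explicit format, but it won't handle this /Date(...)/ wrapper on its own.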
I have about 800,000 rows of data in a DataFrame, and one column, df['Date'], is a date-and-time string in the form 'YYYY-MM-DD HH:MM:SS.fff' with no timezone information. However, I know they are in the New_York timezone and they need to be converted to CET. Now I have two methods to get the job done:
method 1 (very slow for sure):
df['Date'].apply(lambda x: timezone('America/New_York')\
    .localize(datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))\
    .astimezone(timezone('CET')))
method 2:
df.index = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f')
df.index = df.index.tz_localize('America/New_York').tz_convert('CET')
I am just wondering if there are any better ways to do it, or any potential pitfalls of the methods I listed? Thanks!
Also, I would like to shift all timestamps by a fixed amount of time, such as 1 ms (timedelta(0, 0, 1000)). How can I implement that using method 2?
Method 2 is definitely the best way of doing this.
However, it occurs to me that you are converting these dates after you have loaded the data.
It is much faster (not to mention cleaner) to parse dates while loading the file than to change them afterwards.
If your data is loaded from a CSV file using the pandas.read_csv() function, for instance, then you can use the parse_dates= and date_parser= options. You can pass your lambda function directly as date_parser= and set parse_dates= to a list of your date columns, like this:
pd.read_csv('myfile.csv', parse_dates=['Date'],
            date_parser=lambda x: timezone('America/New_York')
            .localize(datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))
            .astimezone(timezone('CET')))
This should work and will probably be the fastest option.
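As for the follow-up about shifting every timestamp by a fixed amount: with method 2 the index is a DatetimeIndex, so adding a Timedelta should do it (a sketch):
import pandas as pd

# shift every timestamp forward by 1 ms
df.index = df.index + pd.Timedelta(milliseconds=1)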
I have a Google App Engine datetime property which I populate with x.date = datetime.datetime.now(). I do a lot of comparisons between dates, and after much debugging it turned out my client device sends dates with less precision than a Python datetime, which caused a terrible mess.
Here is what Python generates with datetime.datetime.now():
2012-08-28 21:36:13.158497
but what I want is 2012-08-28 21:36:13.158000 (notice the three zeros at the end).
How can I achieve this? (keep in mind, I'm not trying to format strings or anything. I want to format a date object.)
I guess one way would be to format it into a string with the desired precision, like this:
dateString = date.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
and then parse it back into a date object, but there's got to be a better way.
dt = dt.replace(microsecond=(dt.microsecond // 1000) * 1000)  # integer division; use // in Python 3
This will truncate the last 3 digits. Proper rounding is a little more complicated due to the possibility that it might round to 1000000 microseconds.
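If you do want true rounding, one way to sidestep that carry problem is to add half a millisecond first and then truncate (a sketch; round_to_millis is just an illustrative name):
import datetime

def round_to_millis(dt):
    # Adding 500 microseconds may carry into the seconds (or minutes, ...),
    # which timedelta arithmetic handles for us; then truncate as above.
    dt += datetime.timedelta(microseconds=500)
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)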
I am trying to show some clever "time since" labels for posts on my site ("seconds since, hours since, weeks since, etc."), and I'm using the datetime.timedelta difference between utcnow and the UTC date stored in the database for a post.
Looks like, according to the docs, I have to use the days attribute AND the seconds attribute to get the fancy date strings I want.
Can't I just get the value of the entire difference in whatever time unit I want? Am I missing something?
It would be perfect if I could just get the entire difference in seconds.
It seems that Python 2.7 introduced a total_seconds() method, which is what you're looking for, I believe!
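So on 2.7 and later this is a one-liner (earlier and later are just placeholder datetimes):
delta = later - earlier          # difference of two datetime objects
seconds = delta.total_seconds()  # float; accounts for days and microseconds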
You can compute the difference in seconds yourself:
total_seconds = delta.days * 86400 + delta.seconds  # ignores delta.microseconds
No, you're not "missing something". It doesn't provide deltas in seconds.
It would be perfect if I could just get the entire difference in seconds.
Then a plain old Unix timestamp, as provided by the 'time' module, may be more to your taste.
I personally have yet to be convinced by a lot of what's in 'datetime'.
Like bobince said, you could use timestamps, like this:
# assuming ts1 and ts2 are the two datetime objects
from time import mktime
# timetuple() drops microseconds, so the difference comes out in whole seconds
mktime(ts1.timetuple()) - mktime(ts2.timetuple())
Although I would think this is even uglier than just calculating the seconds from the timedelta object...