How to remove milliseconds from a datetime object in a pymongo aggregation - python

I am trying to remove the milliseconds from a datetime object in the aggregation stage of a pymongo query.
I'm grouping on an ID that the documents share, and I also need the date in the group key.
Grouping on the raw date doesn't work because some documents differ by a millisecond, so most of the documents fall into the same group while one or two get left out by the millisecond difference.
Edit: I've converted the date to a string and that seems to have worked. How do I convert it back to a date within the aggregation? $dateFromString is not working.
Is there a way that I can remove the milliseconds so the matching dates all correspond to the same ID?
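A minimal sketch of one way to do this, assuming MongoDB 3.6+ for $dateFromString and hypothetical field names timestamp and group_id: format the date to a string with no fractional seconds and parse it straight back, so the group key is truncated to whole seconds (on MongoDB 5.0+, $dateTrunc with unit: "second" is a more direct route):
from pymongo import MongoClient

client = MongoClient()               # assumes a local mongod
coll = client["mydb"]["readings"]    # hypothetical database/collection names

# Round-trip the datetime through a string without fractional seconds,
# then group on the truncated value together with the shared ID.
pipeline = [
    {"$addFields": {
        "ts_second": {
            "$dateFromString": {
                "dateString": {
                    "$dateToString": {
                        "format": "%Y-%m-%dT%H:%M:%S",
                        "date": "$timestamp",
                    }
                }
            }
        }
    }},
    {"$group": {
        "_id": {"group_id": "$group_id", "second": "$ts_second"},
        "count": {"$sum": 1},
    }},
]
results = list(coll.aggregate(pipeline))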

Related

Converting numpy64 objects to Pandas datetime

Question is pretty self-explanatory. I am finding that pd.to_datetime isn't changing anything about the object type, and using pd.Timestamp() directly is bombing out.
Before this is marked as a duplicate of Converting between datetime, Timestamp and datetime64: I am struggling with changing an entire column of a dataframe, not just one datetime object. Perhaps that was covered there, but I didn't see it in the top answer.
I will add that my error occurs when I try to get unique values from the dataframe's column. Is using unique converting the dtype to something unwanted?
The method you mentioned, pandas.to_datetime(), works on scalars, Series and whole DataFrames, so:
dataFrame['column_date_converted'] = pd.to_datetime(dataFrame['column_to_convert'])
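For example, a quick self-contained sketch with made-up data, reusing the column names above:
import pandas as pd

dataFrame = pd.DataFrame({'column_to_convert': ['2017-08-30', '2017-08-31']})
dataFrame['column_date_converted'] = pd.to_datetime(dataFrame['column_to_convert'])
print(dataFrame['column_date_converted'].dtype)  # datetime64[ns]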

Python Pandas to_datetime Out of bounds nanosecond timestamp on a pandas.datetime

I am using Python 2 (I am behind in migrating my code), so perhaps this issue has gone away.
Using pandas, I can create a datetime like this:
import pandas as pd
big_date = pd.datetime(9999, 12, 31)
print big_date
9999-12-31 00:00:00
big_date2 = pd.to_datetime(big_date)
. . .
Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I understand the reason for the error in that there are obviously too many nanoseconds in a date that big. I also know that big_date2 = pd.to_datetime(big_date, errors='ignore') would work. However, in my situation, I have a column of what are supposed to be dates (read from SQL server) and I do indeed want it to change invalid data/dates to NaT. In effect, I was using pd.to_datetime as a validity check. To Pandas, on the one hand, 9999-12-31 is a valid date, and on the other, it's not. That means I can't use it and have had to come up with something else.
I've played around with the arguments in pandas to_datetime and not been able to solve this.
I've looked at other questions/problems of this nature, and not found an answer.
I had a similar issue and was able to find a solution.
I have a pandas dataframe with one column that contains a datetime (retrieved from a database table where the column was a DateTime2 data type), but I need to be able to represent dates that are further in the future than the Timestamp.max value.
Fortunately, I didn't need to worry about the time part of the datetime column - it was actually always 00:00:00 (I didn't create the database design and, yes, it probably should have been a Date data type and not a DateTime2 data type). So I was able to get round the issue by converting the pandas dataframe column to just a date type. For example:
import datetime

for i, row in df.iterrows():
    df.set_value(i, 'DateColumn', datetime.datetime(9999, 12, 31).date())
sets all of the values in the column to the date 9999-12-31 and you don't receive any errors when using this column anymore.
So, if you can afford to lose the time part of the date you are trying to use you can work round the limitation of the datetime values in the dataframe by converting to a date.
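A self-contained sketch of that workaround, with a made-up frame standing in for the data read from SQL Server (note that recent pandas can also coerce out-of-bounds values to NaT with errors='coerce', which restores the validity-check use case from the question):
import datetime
import pandas as pd

df = pd.DataFrame({'DateColumn': [None, None]})
df['DateColumn'] = datetime.date(9999, 12, 31)   # plain date objects are not bound by Timestamp.max
print(df['DateColumn'].iloc[0])                  # 9999-12-31
print(pd.to_datetime(datetime.datetime(9999, 12, 31), errors='coerce'))  # NaT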

SQLite greater-than comparison returns equal values as well

I am using SQLite with python. I have a database with two fields (timestamp, reading).
The timestamp is an ISO 8601 string formatted like this: "YYYY-MM-DD HH:MM:SS.SSSSSS".
When I run this SQL query:
SELECT timestamp, value FROM 'readings' WHERE timestamp > datetime('2017-08-30 14:19:28.684314')
I get all the appropriate readings where the timestamp is after the date provided, but I also get the reading from the exact datetime I passed in (in the example: '2017-08-30 14:19:28.684314').
My question is: why is the greater-than comparison operator behaving like a greater-than-or-equal-to operator?
SQLite does not have a separate data type for timestamps.
datetime() returns just a string in SQLite's default format:
> select datetime('2017-08-30 14:19:28.684314');
2017-08-30 14:19:28
This does not include milliseconds. So the comparison ends up between a string with milliseconds against a string without milliseconds; the first one is larger because (after the first 19 characters are equal) it has more characters.
Calling datetime() on both values removes the milliseconds from both values.
It might be a better idea to call datetime() on neither value and to compare them directly.
I solved the problem. I will detail it here in case it is helpful to someone else.
The issue was with my query. SQLite does not have a native type for dates or datetimes.
My old query:
SELECT timestamp, value FROM 'readings' WHERE timestamp > datetime('2017-08-30 14:19:28.684314')
was implicitly relying on SQLite to figure out that the timestamp field was a datetime; SQLite stores such values as TEXT internally.
When I modified my query to the following:
SELECT timestamp, value FROM 'readings' WHERE datetime(timestamp) > datetime('2017-08-30 14:19:28.684314')
I started to get the results that I was expecting.
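A small self-contained sqlite3 session demonstrating both behaviours, with made-up readings:
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE readings (timestamp TEXT, value REAL)")
conn.execute("INSERT INTO readings VALUES ('2017-08-30 14:19:28.684314', 1.0)")
conn.execute("INSERT INTO readings VALUES ('2017-08-30 14:19:29.000000', 2.0)")

# datetime() truncates the cutoff to whole seconds, so the raw-string
# comparison also matches the boundary row (both rows returned):
print(conn.execute("SELECT timestamp, value FROM readings "
                   "WHERE timestamp > datetime('2017-08-30 14:19:28.684314')").fetchall())

# Applying datetime() to both sides compares like with like (one row):
print(conn.execute("SELECT timestamp, value FROM readings "
                   "WHERE datetime(timestamp) > datetime('2017-08-30 14:19:28.684314')").fetchall())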

Python: creating list of timestamps by minute

I am trying to figure out the best way to create a list of timestamps in Python, where the values for the items in the list increment by one minute. The timestamps would be by minute, and would cover the previous 24 hours. I need timestamps of the format "MM/dd/yyyy HH:mm:ss", or at least containing all of those measures. The timestamps will be an axis for a graph of data that I am collecting.
Calculating the times alone isn't too bad, as I could just get the current time, convert it to seconds, and change the value by one minute very easily. However, I am kind of stuck on figuring out the date aspect of it without having to do a lot of checking, which doesn't feel very Pythonic.
Is there an easier way to do this? For example, in JavaScript, you can get a Date() object, and simply subtract one minute from the value and JS will take care of figuring out if any of the other fields need to change and how they need to change.
datetime is the way to go; you might want to check out this blog post.
import datetime

now = datetime.datetime.now()
print(now)
print(now.ctime())
print(now.isoformat())
print(now.strftime("%Y%m%dT%H%M%S"))
This would output
2003-08-05 21:36:11.590000
Tue Aug 5 21:36:11 2003
2003-08-05T21:36:11.590000
20030805T213611
You can also do subtraction with datetime and timedelta objects
now = datetime.datetime.now()
minute = datetime.timedelta(seconds=60)
print(now - minute)
would output
2015-07-06 10:12:02.349574
You are looking for datetime and timedelta objects. See the docs.
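Putting the pieces together for the original question, a minimal sketch that builds one timestamp per minute for the previous 24 hours in the requested MM/dd/yyyy HH:mm:ss format:
import datetime

# Round down to the current minute, then step back one minute at a time.
now = datetime.datetime.now().replace(second=0, microsecond=0)
timestamps = [(now - datetime.timedelta(minutes=m)).strftime('%m/%d/%Y %H:%M:%S')
              for m in range(24 * 60, 0, -1)]
print(len(timestamps))                 # 1440 entries, oldest first
print(timestamps[0], timestamps[-1])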

Working with dates in pandas - remove unseen characters in datetime and convert to string

I am using pandas to import data: dfST = pd.read_csv( ... , parse_dates={'timestamp': ['date']})
In my csv, date is in the format YYYY/MM/DD, which is all I need - there is no time. I have several data sets that I need to compare for membership. When I convert these 'timestamp' values to a string, sometimes I get something like this:
'1977-07-31T00:00:00.000000000Z'
which I understand is a datetime including milliseconds and a timezone. Is there any way to suppress the addition of the extraneous time on import? If not, I need to exclude it somehow.
dfST.timestamp[1]
Out[138]: Timestamp('1977-07-31 00:00:00')
I have tried formatting it, which seemed to work until I called the formatted values:
dfSTdate = pd.to_datetime(dfST.timestamp, format="%Y-%m-%d")
dfSTdate.head()
Out[123]:
0 1977-07-31
1 1977-07-31
Name: timestamp, dtype: datetime64[ns]
But no... when I test the value of this I also get the time:
dfSTdate[1]
Out[124]: Timestamp('1977-07-31 00:00:00')
When I convert this to an array, the time is included with the millisecond and the timezone, which really messes my comparisons up.
test97=np.array(dfSTdate)
test97[1]
Out[136]: numpy.datetime64('1977-07-30T20:00:00.000000000-0400')
How can I get rid of the time?!?
Ultimately I wish to compare membership among data sets using numpy.in1d with date as a string ('YYYY-MM-DD') as one part of the comparison
This is due to the way datetime values are stored in pandas: using the numpy datetime64[ns] dtype, so datetime values are always stored at nanosecond resolution. Even if you only have a date, it will be converted to a timestamp with a zero time at nanosecond resolution. This is just the pandas implementation.
The unexpected results you see when printing the values are just due to how these objects are displayed in the Python console (their representation), not their actual value.
If you print a single value, you get the pandas Timestamp representation:
Timestamp('1977-07-31 00:00:00')
So you get the seconds here as well, just because this is the default representation.
If you convert it to an array and then print it, you get the standard numpy representation:
numpy.datetime64('1977-07-30T20:00:00.000000000-0400')
This is indeed a very misleading representation: just for printing in the console, numpy converts the value to your local timezone. But this doesn't change the actual value; it's just confusing printing.
That is the background; now to answer your question: how do you get rid of the time?
That depends on your goal. Do you really want to convert it to a string? Or do you just dislike the repr?
If you just want to work with the datetime values, you don't need to get rid of it.
If you want to convert the values to strings, you can apply strftime (df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d'))). Or, if it is to write them as strings to csv, use the date_format keyword in to_csv.
If you really want a 'date', you can use the datetime.date type (the standard python type) in a DataFrame column. You can convert your existing column with pd.DatetimeIndex(dfST['timestamp']).date. But personally I don't think this has many advantages.
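A short sketch of the string route for the membership comparison from the question, using .dt.strftime (equivalent to the apply call above) on made-up data:
import numpy as np
import pandas as pd

dfST = pd.DataFrame({'timestamp': pd.to_datetime(['1977-07-31', '1980-01-15'])})
as_strings = dfST['timestamp'].dt.strftime('%Y-%m-%d').to_numpy()
print(np.in1d(as_strings, ['1977-07-31']))   # [ True False]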
