I have an array of datetimes that I need to convert to a list of datetimes. My array looks like this:
import numpy as np
my_array = np.array(['2017-06-28T22:47:51.213500000', '2017-06-28T22:48:37.570900000',
'2017-06-28T22:49:46.736800000', '2017-06-28T22:50:41.866800000',
'2017-06-28T22:51:17.024100000', '2017-06-28T22:51:24.038300000'], dtype='datetime64[ns]')
my_list = my_array.tolist()
I need a list of datetime values, but when I do my_array.tolist(), I get a list of numerical time stamps:
[1498690071213500000,
1498690117570900000,
1498690186736800000,
1498690241866800000,
1498690277024100000,
1498690284038300000]
My question is: how do I preserve the datetime format when going from an array to a list, or how do I convert the list of timestamps to a list of datetime values?
NumPy can't convert instances of 'datetime64[ns]' to Python datetime.datetime instances, because datetime instances do not support nanosecond resolution.
If you cast the array to 'datetime64[us]', so the timestamps have only microsecond resolution, then the .tolist() method will give you datetime.datetime instances:
In [25]: my_array
Out[25]:
array(['2017-06-28T22:47:51.213500000', '2017-06-28T22:48:37.570900000',
'2017-06-28T22:49:46.736800000', '2017-06-28T22:50:41.866800000',
'2017-06-28T22:51:17.024100000', '2017-06-28T22:51:24.038300000'],
dtype='datetime64[ns]')
In [26]: my_array.astype('datetime64[us]').tolist()
Out[26]:
[datetime.datetime(2017, 6, 28, 22, 47, 51, 213500),
datetime.datetime(2017, 6, 28, 22, 48, 37, 570900),
datetime.datetime(2017, 6, 28, 22, 49, 46, 736800),
datetime.datetime(2017, 6, 28, 22, 50, 41, 866800),
datetime.datetime(2017, 6, 28, 22, 51, 17, 24100),
datetime.datetime(2017, 6, 28, 22, 51, 24, 38300)]
Explicitly casting the numpy.ndarray as a native Python list will preserve the contents as numpy.datetime64 objects:
>>> list(my_array)
[numpy.datetime64('2017-06-28T22:47:51.213500000'),
numpy.datetime64('2017-06-28T22:48:37.570900000'),
numpy.datetime64('2017-06-28T22:49:46.736800000'),
numpy.datetime64('2017-06-28T22:50:41.866800000'),
numpy.datetime64('2017-06-28T22:51:17.024100000'),
numpy.datetime64('2017-06-28T22:51:24.038300000')]
However, if you want to go back from an integer timestamp to a numpy.datetime64 object, the numbers produced here by numpy.ndarray.tolist are in nanoseconds, so you could also use a list comprehension like the following:
>>> [np.datetime64(x, "ns") for x in my_list]
[numpy.datetime64('2017-06-28T22:47:51.213500000'),
numpy.datetime64('2017-06-28T22:48:37.570900000'),
numpy.datetime64('2017-06-28T22:49:46.736800000'),
numpy.datetime64('2017-06-28T22:50:41.866800000'),
numpy.datetime64('2017-06-28T22:51:17.024100000'),
numpy.datetime64('2017-06-28T22:51:24.038300000')]
And if you want the final result as a Python datetime.datetime object instead of a numpy.datetime64 object, you can use a method like this (adjusted as needed for locality):
>>> from datetime import datetime
>>> list(map(datetime.utcfromtimestamp, my_array.astype(np.uint64) / 1e9))
[datetime.datetime(2017, 6, 28, 22, 47, 51, 213500),
datetime.datetime(2017, 6, 28, 22, 48, 37, 570900),
datetime.datetime(2017, 6, 28, 22, 49, 46, 736800),
datetime.datetime(2017, 6, 28, 22, 50, 41, 866800),
datetime.datetime(2017, 6, 28, 22, 51, 17, 24100),
datetime.datetime(2017, 6, 28, 22, 51, 24, 38300)]
Edit: Warren Weckesser's answer provides a more straightforward approach to go from a numpy.datetime64[ns] array to a list of Python datetime.datetime objects than is described here.
Try
# convert to string type first
my_list = my_array.astype(str).tolist()
my_list
# ['2017-06-28T22:47:51.213500000', '2017-06-28T22:48:37.570900000', '2017-06-28T22:49:46.736800000', '2017-06-28T22:50:41.866800000', '2017-06-28T22:51:17.024100000', '2017-06-28T22:51:24.038300000']
The other answers provide more straightforward ways, but for completeness you can call datetime.datetime.fromtimestamp in a loop:
from datetime import datetime
[datetime.fromtimestamp(x) for x in my_array.astype(object)/1e9]
#[datetime.datetime(2017, 6, 28, 15, 47, 51, 213500),
# datetime.datetime(2017, 6, 28, 15, 48, 37, 570900),
# datetime.datetime(2017, 6, 28, 15, 49, 46, 736800),
# datetime.datetime(2017, 6, 28, 15, 50, 41, 866800),
# datetime.datetime(2017, 6, 28, 15, 51, 17, 24100),
# datetime.datetime(2017, 6, 28, 15, 51, 24, 38300)]
Related
I have the following pandas dataframe
import pandas as pd
import datetime
foo = pd.DataFrame({'id': [1,2], 'time' :['[datetime.datetime(2021, 10, 20, 14, 29, 51), datetime.datetime(2021, 10, 20, 14, 46, 8)]', '[datetime.datetime(2021, 10, 20, 15, 0, 44), datetime.datetime(2021, 10, 20, 16, 13, 42)]']})
foo
id time
0 1 [datetime.datetime(2021, 10, 20, 14, 29, 51), datetime.datetime(2021, 10, 20, 14, 46, 8)]
1 2 [datetime.datetime(2021, 10, 20, 15, 0, 44), datetime.datetime(2021, 10, 20, 16, 13, 42)]
I would like to transform each element of the lists in the time column to a string with the format '%Y/%m/%d %H:%M:%S'
I know I can do this:
t = datetime.datetime(2021, 10, 20, 14, 29, 51)
t.strftime('%Y/%m/%d %H:%M:%S')
to yield the value '2021/10/20 14:29:51',
but I do not know how to do this operation for every string element of each list in the time column.
Any help?
You just need to use a list comprehension inside apply, after converting the string lists to actual lists with eval (only use eval on trusted data, since it executes arbitrary code):
foo.time.apply(lambda str_list: [item.strftime('%Y/%m/%d %H:%M:%S') for item in eval(str_list)])
You can separate the lists into rows first with explode and then use the dt accessor in pandas. This assumes the time column holds actual lists of datetime objects; after explode the column has object dtype, so convert it with pd.to_datetime before using .dt:
(foo
 .explode('time')
 .assign(time=lambda x: pd.to_datetime(x.time).dt.strftime('%Y/%m/%d %H:%M:%S'))
)
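As a runnable sketch of the explode route, with the assumption that the time column is built as actual lists of datetime objects rather than their string form:

```python
import datetime
import pandas as pd

# assumes the time column holds real lists of datetime objects
foo = pd.DataFrame({'id': [1, 2],
                    'time': [[datetime.datetime(2021, 10, 20, 14, 29, 51),
                              datetime.datetime(2021, 10, 20, 14, 46, 8)],
                             [datetime.datetime(2021, 10, 20, 15, 0, 44),
                              datetime.datetime(2021, 10, 20, 16, 13, 42)]]})

out = (foo
       .explode('time')
       # the exploded column has object dtype, so convert before using .dt
       .assign(time=lambda x: pd.to_datetime(x.time).dt.strftime('%Y/%m/%d %H:%M:%S')))
```

Each list element becomes its own row, so the result has one formatted string per datetime.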
Assume this is my list of datetime stamps:
[datetime.datetime(2017, 11, 17, 9, 33, 11), datetime.datetime(2017, 11, 17, 9, 33, 36), datetime.datetime(2017, 11, 17, 9, 33, 48)]
A lot of examples have been given for converting these values to epoch, but they treat the values as GMT.
How do we convert them to epoch as local time?
To put it more simply: the usual epoch conversion converts the given date to epoch assuming the datetime is in GMT, but the given datetime is local time!
>>> x = datetime.datetime(2017, 11, 17, 9, 33, 36)
>>> x.timestamp()
1510882416.0
>>> x.ctime()
'Fri Nov 17 09:33:36 2017'
You can use this:
l = [datetime.datetime(2017, 11, 17, 9, 33, 11), datetime.datetime(2017, 11, 17, 9, 33, 36), datetime.datetime(2017, 11, 17, 9, 33, 48)]
epo = [x.strftime('%s') for x in l]
print(epo)
# ['1510898591', '1510898616', '1510898628']
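Note that strftime('%s') is not portable; it relies on the platform's C strftime and is unavailable on Windows. As a portable sketch, datetime.timestamp() interprets a naive datetime as local time, which is the behaviour asked for here:

```python
import datetime

l = [datetime.datetime(2017, 11, 17, 9, 33, 11),
     datetime.datetime(2017, 11, 17, 9, 33, 36),
     datetime.datetime(2017, 11, 17, 9, 33, 48)]

# timestamp() treats a naive datetime as local time, so this
# yields local-time epoch seconds on any platform
epo = [int(x.timestamp()) for x in l]
```

The exact values depend on the machine's timezone, but the differences between entries do not.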
You can try the arrow library:
import arrow
import datetime

x = datetime.datetime(2017, 11, 17, 9, 33, 36)
time = arrow.get(x, 'local').shift(hours=-6)  # change hours to what you want
print(time)
>> 2017-11-17T03:33:36+00:00
time = time.format('YYYY-MM-DD HH:mm:ss')
print(time)
>> 2017-11-17 03:33:36
I am analyzing a .csv file whose first column contains datetimes in the format "2016-09-15T00:00:13", and I want to convert this to a standard Python datetime object. I can change the format for one date, but I cannot do it for the whole column.
My code that I am using:
import numpy
import dateutil.parser
mydate = dateutil.parser.parse(numpy.mydata[1:,0])
print(mydate)
I am getting the error:
'module' object has no attribute 'mydata'
Here is the column for which I want the format to be changed.
print(mydata[1:,0])
['2016-09-15T00:00:13'
'2016-09-15T00:00:38'
'2016-09-15T00:00:53'
...,
'2016-09-15T23:59:28'
'2016-09-15T23:59:37'
'2016-09-15T23:59:52']
from datetime import datetime

date_objects = []
for date in mydata:
    date_objects.append(datetime.strptime(date, '%Y-%m-%dT%H:%M:%S'))
Here's a link to the method I'm using. That same link also lists the format arguments.
Oh and about the
'module' object has no attribute 'mydata'
You call numpy.mydata, which is a reference to a "mydata" attribute of the numpy module you imported. The problem is that "mydata" is just one of your variables, not something included with numpy.
Unless you have a compelling reason to avoid it, pandas is the way to go with this kind of analysis. You can simply do
import pandas
df = pandas.read_csv('myfile.csv', index_col=0, parse_dates=True)
This treats the first column as the index and parses the dates in it, which is probably what you want.
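If the data is already loaded as strings, pandas can also parse the column directly with pd.to_datetime (a sketch using values copied from the question's column):

```python
import pandas as pd

# sample values copied from the question's column
col = pd.Series(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T23:59:52'])
parsed = pd.to_datetime(col)
# parsed is now a datetime64 Series, so the .dt accessor works
```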
Assuming you've dealt with that numpy.mydata[1:,0] attribute error
Your data looks like:
In [268]: mydata=['2016-09-15T00:00:13' ,
...: '2016-09-15T00:00:38' ,
...: '2016-09-15T00:00:53' ,
...: '2016-09-15T23:59:28' ,
...: '2016-09-15T23:59:37' ,
...: '2016-09-15T23:59:52']
or, in array form, it is a 1d array of strings
In [269]: mydata=np.array(mydata)
In [270]: mydata
Out[270]:
array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53',
'2016-09-15T23:59:28', '2016-09-15T23:59:37', '2016-09-15T23:59:52'],
dtype='<U19')
numpy has its own datetime dtype, stored as a 64-bit integer, which can be used numerically. Your dates readily convert to it with astype (your format is standard):
In [271]: D = mydata.astype(np.datetime64)

In [272]: D
Out[272]:
array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53',
'2016-09-15T23:59:28', '2016-09-15T23:59:37', '2016-09-15T23:59:52'],
dtype='datetime64[s]')
tolist converts this array to a list - and the dates to datetime objects:
In [274]: D.tolist()
Out[274]:
[datetime.datetime(2016, 9, 15, 0, 0, 13),
datetime.datetime(2016, 9, 15, 0, 0, 38),
datetime.datetime(2016, 9, 15, 0, 0, 53),
datetime.datetime(2016, 9, 15, 23, 59, 28),
datetime.datetime(2016, 9, 15, 23, 59, 37),
datetime.datetime(2016, 9, 15, 23, 59, 52)]
which could be turned back into an array of dtype object:
In [275]: np.array(D.tolist())
Out[275]:
array([datetime.datetime(2016, 9, 15, 0, 0, 13),
datetime.datetime(2016, 9, 15, 0, 0, 38),
datetime.datetime(2016, 9, 15, 0, 0, 53),
datetime.datetime(2016, 9, 15, 23, 59, 28),
datetime.datetime(2016, 9, 15, 23, 59, 37),
datetime.datetime(2016, 9, 15, 23, 59, 52)], dtype=object)
These objects can't be used in numeric array calculations; the list would be just as useful.
If your string format wasn't standard you'd have to use the datetime parser in a list comprehension as #staples shows.
range(5, 15) [1, 1, 5, 6, 10, 10, 10, 11, 17, 28]
range(6, 24) [4, 10, 10, 10, 15, 16, 18, 20, 24, 30]
range(7, 41) [9, 18, 19, 23, 23, 26, 28, 40, 42, 44]
range(11, 49) [9, 23, 24, 27, 29, 31, 43, 44, 45, 45]
range(38, 50) [1, 40, 41, 42, 44, 48, 49, 49, 49, 50]
I get the above output from a print command inside a function. What I really want is a single combined list per line, for example in the top line 5,6,7...15,1,1,5,6 etc.
The output range comes from
range_draws=range(int(lower),int(upper))
which I naively thought would give a range. The other numbers come from a sliced list.
Could someone help me to get the desired result.
The range() function returns a special range object to save on memory (no need to keep all the numbers in memory when only the start, end and step size will do). Cast it to a list to 'expand' it:
list(yourrange) + otherlist
To quote the documentation:
The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).
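For the first line of the output, a minimal sketch of that combination (the sliced-list values are copied from the question):

```python
lower, upper = 5, 15
sliced = [1, 1, 5, 6, 10, 10, 10, 11, 17, 28]  # the sliced list from the output above

# range() is half-open, so range(5, 15) yields 5..14
combined = list(range(lower, upper)) + sliced
```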
I have a huge number of entries; each one is a float. These data x are accessible through an iterator. I need to classify all the entries using selections like 10<y<=20, 20<y<=50, ..., where the y are data from another iterable. The number of entries is much larger than the number of selections. At the end I want a dictionary like:
{ 0: [all events with 10<x<=20],
1: [all events with 20<x<=50], ... }
or something similar. For example I'm doing:
for x, y in itertools.izip(variable_values, binning_values):
    thebin = binner_function(y)
    self.data[tuple(thebin)].append(x)
in general y is multidimensional.
This is very slow. Is there a faster solution, for example with numpy? I think the problem comes from the list.append method I'm using, not from the binner_function.
A fast way to get the assignments in numpy is using np.digitize:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html
You'd still have to split the resulting assignments up into groups. If x or y is multidimensional, you will have to flatten the arrays first. You could then get the unique bin assignments, and iterate over those in conjunction with np.where to split the assignments up into groups. This will probably be faster if the number of bins is much smaller than the number of elements that need to be binned.
As a somewhat trivial example that you will need to tweak/elaborate on for your particular problem (but which is hopefully enough to get you started with a numpy solution):
In [1]: import numpy as np
In [2]: x = np.random.normal(size=(50,))
In [3]: b = np.linspace(-20,20,50)
In [4]: assign = np.digitize(x,b)
In [5]: assign
Out[5]:
array([23, 25, 25, 25, 24, 26, 24, 26, 23, 24, 25, 23, 26, 25, 27, 25, 25,
25, 25, 26, 26, 25, 25, 26, 24, 23, 25, 26, 26, 24, 24, 26, 27, 24,
25, 24, 23, 23, 26, 25, 24, 25, 25, 27, 26, 25, 27, 26, 26, 24])
In [6]: uid = np.unique(assign)
In [7]: adict = {}
In [8]: for ii in uid:
   ...:     adict[ii] = np.where(assign == ii)[0]
   ...:
In [9]: adict
Out[9]:
{23: array([ 0, 8, 11, 25, 36, 37]),
24: array([ 4, 6, 9, 24, 29, 30, 33, 35, 40, 49]),
25: array([ 1, 2, 3, 10, 13, 15, 16, 17, 18, 21, 22, 26, 34, 39, 41, 42, 45]),
26: array([ 5, 7, 12, 19, 20, 23, 27, 28, 31, 38, 44, 47, 48]),
27: array([14, 32, 43, 46])}
For dealing with flattening and then unflattening numpy arrays, see:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.unravel_index.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel_multi_index.html
np.searchsorted is your friend. As noted in another answer on the same topic, it's currently a good bit faster than digitize and does the same job.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html
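A minimal sketch of the searchsorted route, using made-up values and edges that match the 10<y<=20, 20<y<=50 selections from the question:

```python
import numpy as np

y = np.array([12.0, 25.0, 18.0, 47.0, 20.0])  # made-up binning values
x = np.array([0, 1, 2, 3, 4])                 # made-up entries to classify
edges = np.array([10, 20, 50])                # edges for 10<y<=20 and 20<y<=50

# side='left' puts a value equal to an edge into the lower bin,
# matching the half-open intervals 10 < y <= 20 and 20 < y <= 50
assign = np.searchsorted(edges, y, side='left')

# group the entries by bin index, as in the digitize-based answer
groups = {int(b): x[assign == b].tolist() for b in np.unique(assign)}
```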