I am analyzing a .csv file whose first column holds datetimes in the format "2016-09-15T00:00:13", and I want to convert them to standard Python datetime objects. I can convert a single date, but I cannot do it for the whole column.
My code that I am using:
import numpy
import dateutil.parser
mydate = dateutil.parser.parse(numpy.mydata[1:,0])
print(mydate)
I am getting the error:
'module' object has no attribute 'mydata'
Here is the column for which I want the format to be changed.
print(mydata[1:,0])
['2016-09-15T00:00:13'
'2016-09-15T00:00:38'
'2016-09-15T00:00:53'
...,
'2016-09-15T23:59:28'
'2016-09-15T23:59:37'
'2016-09-15T23:59:52']
from datetime import datetime
# iterate over the date column only (first column, skipping the header row)
for date in mydata[1:, 0]:
    date_object = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S')
Here's a link to the method I'm using. That same link also lists the format arguments.
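The loop above overwrites date_object on every pass, so only the last value survives. A minimal sketch of collecting every parsed value with a list comprehension follows; the short mydata list here is a stand-in for the real column, not the asker's actual array.

```python
from datetime import datetime

# Stand-in for the first CSV column (assumed to hold ISO-like strings).
mydata = ['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T23:59:52']

# Parse every string and keep all results in a list.
dates = [datetime.strptime(s, '%Y-%m-%dT%H:%M:%S') for s in mydata]
print(dates[0])  # 2016-09-15 00:00:13
```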
Oh and about the
'module' object has no attribute 'mydata'
You call numpy.mydata, which refers to a "mydata" attribute of the numpy module you imported. The problem is that "mydata" is one of your own variables, not something included with numpy.
Unless you have a compelling reason to avoid it, pandas is the way to go with this kind of analysis. You can simply do
import pandas
df = pandas.read_csv('myfile.csv', index_col=0, parse_dates=True)
This uses the first column as the index column (index_col=0) and parses the dates in it (parse_dates=True only tries to parse the index). This is probably what you want.
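As a rough illustration of the pandas route, the sketch below reads a small in-memory CSV instead of a file. The column names "timestamp" and "value" are made up for the example; index_col=0 is passed explicitly so that the first column becomes the index that parse_dates=True then parses.

```python
import io
import pandas as pd

# Hypothetical CSV content; "timestamp" and "value" are made-up column names.
csv_text = "timestamp,value\n2016-09-15T00:00:13,1\n2016-09-15T00:00:38,2\n"

# index_col=0 makes the first column the index; parse_dates=True parses that index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(df.index.dtype)

# Convert the parsed index to plain Python datetime objects if they are needed.
py_dates = df.index.to_pydatetime().tolist()
print(py_dates[0])
```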
Assuming you've dealt with that numpy.mydata[1:,0] attribute error
Your data looks like:
In [268]: mydata=['2016-09-15T00:00:13' ,
...: '2016-09-15T00:00:38' ,
...: '2016-09-15T00:00:53' ,
...: '2016-09-15T23:59:28' ,
...: '2016-09-15T23:59:37' ,
...: '2016-09-15T23:59:52']
or in array form it is a 1d array of strings
In [269]: mydata=np.array(mydata)
In [270]: mydata
Out[270]:
array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53',
'2016-09-15T23:59:28', '2016-09-15T23:59:37', '2016-09-15T23:59:52'],
dtype='<U19')
numpy has a version of datetime that stores as a 64-bit integer, and can be used numerically. Your dates readily convert to that with astype (your format is standard):
In [271]: mydata.astype(np.datetime64)
Out[271]:
array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53',
'2016-09-15T23:59:28', '2016-09-15T23:59:37', '2016-09-15T23:59:52'],
dtype='datetime64[s]')
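To show what "used numerically" can mean in practice, here is a small sketch that computes the gaps between consecutive timestamps. The variable names D and gaps are chosen for the example, and the array is a shortened copy of the data above.

```python
import numpy as np

mydata = np.array(['2016-09-15T00:00:13', '2016-09-15T00:00:38', '2016-09-15T00:00:53'])
D = mydata.astype('datetime64[s]')

# Differences between consecutive timestamps come back as timedelta64 values.
gaps = np.diff(D)

# Cast to plain integers to get the number of seconds between samples.
print(gaps.astype('int64').tolist())  # [25, 15]
```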
With the converted array saved as D (D = mydata.astype(np.datetime64)), tolist converts this array to a list, and the dates to datetime objects:
In [274]: D.tolist()
Out[274]:
[datetime.datetime(2016, 9, 15, 0, 0, 13),
datetime.datetime(2016, 9, 15, 0, 0, 38),
datetime.datetime(2016, 9, 15, 0, 0, 53),
datetime.datetime(2016, 9, 15, 23, 59, 28),
datetime.datetime(2016, 9, 15, 23, 59, 37),
datetime.datetime(2016, 9, 15, 23, 59, 52)]
which could be turned back into an array of dtype object:
In [275]: np.array(D.tolist())
Out[275]:
array([datetime.datetime(2016, 9, 15, 0, 0, 13),
datetime.datetime(2016, 9, 15, 0, 0, 38),
datetime.datetime(2016, 9, 15, 0, 0, 53),
datetime.datetime(2016, 9, 15, 23, 59, 28),
datetime.datetime(2016, 9, 15, 23, 59, 37),
datetime.datetime(2016, 9, 15, 23, 59, 52)], dtype=object)
These objects can't be used in vectorized array calculations the way datetime64 values can, so the list would be just as useful.
If your string format wasn't standard you'd have to use the datetime parser in a list comprehension as @staples shows.
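For a non-standard format, a sketch along those lines might look like the following. The day/month/year strings are invented for illustration; the parsed datetime objects are then handed to numpy to get a datetime64 array.

```python
from datetime import datetime
import numpy as np

# Hypothetical non-ISO strings (day/month/year with a space separator).
raw = ['15/09/2016 00:00:13', '15/09/2016 23:59:52']

# astype(np.datetime64) would not understand this layout, so parse explicitly.
parsed = [datetime.strptime(s, '%d/%m/%Y %H:%M:%S') for s in raw]

# numpy accepts datetime objects when the target dtype is given.
arr = np.array(parsed, dtype='datetime64[s]')
print(arr)
```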
I have the following pandas dataframe
import pandas as pd
import datetime
foo = pd.DataFrame({'id': [1,2], 'time' :['[datetime.datetime(2021, 10, 20, 14, 29, 51), datetime.datetime(2021, 10, 20, 14, 46, 8)]', '[datetime.datetime(2021, 10, 20, 15, 0, 44), datetime.datetime(2021, 10, 20, 16, 13, 42)]']})
foo
id time
0 1 [datetime.datetime(2021, 10, 20, 14, 29, 51), datetime.datetime(2021, 10, 20, 14, 46, 8)]
1 2 [datetime.datetime(2021, 10, 20, 15, 0, 44), datetime.datetime(2021, 10, 20, 16, 13, 42)]
I would like to transform each element of the lists in the time column to a string with the format '%Y/%m/%d %H:%M:%S'
I know I can do this:
t = datetime.datetime(2021, 10, 20, 14, 29, 51)
t.strftime('%Y/%m/%d %H:%M:%S')
to yield the value '2021/10/20 14:29:51',
but I do not know how to do this operation for every string element of each list in the time column.
Any help ?
You just need to use list comprehension inside apply after converting string lists to actual lists with eval:
foo.time.apply(lambda str_list: [item.strftime('%Y/%m/%d %H:%M:%S') for item in eval(str_list)])
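Because eval runs whatever text it is given, it may be worth narrowing what the evaluated string can reach. The sketch below, shown on a single cell rather than the whole DataFrame, exposes only the datetime module and hides the builtins; format_cell is a hypothetical helper name, and this reduces but does not remove the risk of evaluating untrusted input.

```python
import datetime

# One cell of the 'time' column: a string that looks like a list of datetimes.
cell = ('[datetime.datetime(2021, 10, 20, 14, 29, 51), '
        'datetime.datetime(2021, 10, 20, 14, 46, 8)]')

def format_cell(str_list, fmt='%Y/%m/%d %H:%M:%S'):
    # Only the datetime module is visible to the evaluated expression.
    items = eval(str_list, {'__builtins__': {}, 'datetime': datetime})
    return [item.strftime(fmt) for item in items]

print(format_cell(cell))
# ['2021/10/20 14:29:51', '2021/10/20 14:46:08']
```

With the DataFrame from the question, the same helper could be passed to apply, as in foo.time.apply(format_cell).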
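If the column holds actual lists of datetime objects (rather than their string form), you can separate the lists into rows first with explode, convert the resulting object column with pd.to_datetime, and then use the dt accessor in pandas:
(foo
    .explode('time')
    .assign(time=lambda x: pd.to_datetime(x.time).dt.strftime('%Y/%m/%d %H:%M:%S'))
)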
I have an array of datetimes that I need to convert to a list of datetimes. My array looks like this:
import numpy as np
my_array = np.array(['2017-06-28T22:47:51.213500000', '2017-06-28T22:48:37.570900000',
'2017-06-28T22:49:46.736800000', '2017-06-28T22:50:41.866800000',
'2017-06-28T22:51:17.024100000', '2017-06-28T22:51:24.038300000'], dtype='datetime64[ns]')
my_list = my_array.tolist()
I need a list of datetime values, but when I do my_array.tolist(), I get a list of numerical time stamps:
[1498690071213500000,
1498690117570900000,
1498690186736800000,
1498690241866800000,
1498690277024100000,
1498690284038300000]
My question is how do I preserve the datetime format when going from an array to a list, or how do I convert the list of time stamps to a list of datetime values?
NumPy can't convert instances of 'datetime64[ns]' to Python datetime.datetime instances, because datetime instances do not support nanosecond resolution.
If you cast the array to 'datetime64[us]', so the timestamps have only microsecond resolution, then the .tolist() method will give you datetime.datetime instances:
In [25]: my_array
Out[25]:
array(['2017-06-28T22:47:51.213500000', '2017-06-28T22:48:37.570900000',
'2017-06-28T22:49:46.736800000', '2017-06-28T22:50:41.866800000',
'2017-06-28T22:51:17.024100000', '2017-06-28T22:51:24.038300000'],
dtype='datetime64[ns]')
In [26]: my_array.astype('datetime64[us]').tolist()
Out[26]:
[datetime.datetime(2017, 6, 28, 22, 47, 51, 213500),
datetime.datetime(2017, 6, 28, 22, 48, 37, 570900),
datetime.datetime(2017, 6, 28, 22, 49, 46, 736800),
datetime.datetime(2017, 6, 28, 22, 50, 41, 866800),
datetime.datetime(2017, 6, 28, 22, 51, 17, 24100),
datetime.datetime(2017, 6, 28, 22, 51, 24, 38300)]
Explicitly casting the numpy.ndarray as a native Python list will preserve the contents as numpy.datetime64 objects:
>>> list(my_array)
[numpy.datetime64('2017-06-28T22:47:51.213500000'),
numpy.datetime64('2017-06-28T22:48:37.570900000'),
numpy.datetime64('2017-06-28T22:49:46.736800000'),
numpy.datetime64('2017-06-28T22:50:41.866800000'),
numpy.datetime64('2017-06-28T22:51:17.024100000'),
numpy.datetime64('2017-06-28T22:51:24.038300000')]
However, if you wanted to go back from an integer timestamp to a numpy.datetime64 object, the number given here by numpy.ndarray.tolist is given in nanosecond format, so you could also use a list comprehension like the following:
>>> [np.datetime64(x, "ns") for x in my_list]
[numpy.datetime64('2017-06-28T22:47:51.213500000'),
numpy.datetime64('2017-06-28T22:48:37.570900000'),
numpy.datetime64('2017-06-28T22:49:46.736800000'),
numpy.datetime64('2017-06-28T22:50:41.866800000'),
numpy.datetime64('2017-06-28T22:51:17.024100000'),
numpy.datetime64('2017-06-28T22:51:24.038300000')]
And if you want the final result as a Python datetime.datetime object instead of a numpy.datetime64 object, you can use a method like this (adjusted as needed for locality):
>>> from datetime import datetime
>>> list(map(datetime.utcfromtimestamp, my_array.astype(np.uint64) / 1e9))
[datetime.datetime(2017, 6, 28, 22, 47, 51, 213500),
datetime.datetime(2017, 6, 28, 22, 48, 37, 570900),
datetime.datetime(2017, 6, 28, 22, 49, 46, 736800),
datetime.datetime(2017, 6, 28, 22, 50, 41, 866800),
datetime.datetime(2017, 6, 28, 22, 51, 17, 24100),
datetime.datetime(2017, 6, 28, 22, 51, 24, 38300)]
Edit: Warren Weckesser's answer provides a more straightforward approach to go from a numpy.datetime64[ns] array to a list of Python datetime.datetime objects than is described here.
Try
# convert to string type first
my_list = my_array.astype(str).tolist()
my_list
# ['2017-06-28T22:47:51.213500000', '2017-06-28T22:48:37.570900000', '2017-06-28T22:49:46.736800000', '2017-06-28T22:50:41.866800000', '2017-06-28T22:51:17.024100000', '2017-06-28T22:51:24.038300000']
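The other answers provide more straightforward ways, but for completeness, you can call datetime.datetime.fromtimestamp in a loop. Note that fromtimestamp returns local time, which is why the hours below differ from the UTC values in the array: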
from datetime import datetime
[datetime.fromtimestamp(x) for x in my_array.astype(object)/1e9]
#[datetime.datetime(2017, 6, 28, 15, 47, 51, 213500),
# datetime.datetime(2017, 6, 28, 15, 48, 37, 570900),
# datetime.datetime(2017, 6, 28, 15, 49, 46, 736800),
# datetime.datetime(2017, 6, 28, 15, 50, 41, 866800),
# datetime.datetime(2017, 6, 28, 15, 51, 17, 24100),
# datetime.datetime(2017, 6, 28, 15, 51, 24, 38300)]
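If the shift to local time is not wanted, one option is to pass an explicit time zone. The sketch below assumes the nanosecond integers shown earlier in the question and uses stamps_ns and utc_dates as example names; it yields timezone-aware datetime objects in UTC.

```python
from datetime import datetime, timezone

# Nanosecond timestamps as returned by tolist() on a datetime64[ns] array.
stamps_ns = [1498690071213500000, 1498690117570900000]

# Passing tz=timezone.utc avoids the shift to the machine's local time zone.
utc_dates = [datetime.fromtimestamp(ns / 1e9, tz=timezone.utc) for ns in stamps_ns]
print(utc_dates[0].isoformat())
```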
I have an array of unixtime timestamps. How do I convert that using
datetime.utcfromtimestamp().strftime("%Y-%M-%D %H:%M:%S")
? My array is saved under "time". How do I utilize that array in this conversion?
Assuming your times are of the format datetime, you can loop through the list and convert each one.
Here is a quick example:
import datetime
time = []
for i in range(10):
    time.append(datetime.datetime.now())
print(time) # output: [datetime.datetime(2020, 7, 8, 10, 7, 4, 314614), datetime.datetime(2020, 7, 8, 10, 7, 4, 314622)....
formattedTime = []
for t in time:
    formattedTime.append(t.strftime('%Y-%m-%d %H:%M:%S'))
print(formattedTime) # output: ['2020-07-08 10:07:04', '2020-07-08 10:07:04', ....
# the update to my answer: parse the formatted strings back into datetime objects
newTimes = []
for date_time_str in formattedTime:
    newTimes.append(datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S'))
print(newTimes) # output: [datetime.datetime(2020, 7, 8, 18, 56, 47), datetime.datetime(2020, 7, 8, 18, 56, 47), datetime.datetime(2020, 7, 8, 18, 56, 47),...]
Let me know if you have more questions.
I am also attaching this article for datetime which I found really helpful.
Here is the example in repl
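Since the question is about Unix timestamps rather than datetime objects, here is a hedged sketch of that case. The two integer values are invented stand-ins for the asker's "time" array, assumed to be in seconds. Note the lowercase %m and %d in the format string: %M means minutes, and %D is not a portable directive.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the asker's "time" array of Unix timestamps (seconds).
time = [1594202824, 1594202884]

# Convert each timestamp to a UTC datetime, then format it as a string.
formatted = [
    datetime.fromtimestamp(t, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
    for t in time
]
print(formatted)  # ['2020-07-08 10:07:04', '2020-07-08 10:08:04']
```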
I am trying to code a function called days15(). The function will be passed an argument called ‘myDateStr’. myDateStr is a string representation of a date in the form 20170817 (that is, YearMonthDay). The code in the function will create a datetime object from the string, then create a timedelta object with a length of 1 day. Then it will use a list comprehension to produce a list of 15 datetime objects, starting with the date that is passed to the function.
the function should return the following list.
[datetime.datetime(2017, 8, 17, 0, 0), datetime.datetime(2017, 8, 18, 0, 0), datetime.datetime(2017, 8, 19, 0, 0), datetime.datetime(2017, 8, 20, 0, 0), datetime.datetime(2017, 8, 21, 0, 0), datetime.datetime(2017, 8, 22, 0, 0), datetime.datetime(2017, 8, 23, 0, 0), datetime.datetime(2017, 8, 24, 0, 0), datetime.datetime(2017, 8, 25, 0, 0), datetime.datetime(2017, 8, 26, 0, 0), datetime.datetime(2017, 8, 27, 0, 0), datetime.datetime(2017, 8, 28, 0, 0), datetime.datetime(2017, 8, 29, 0, 0), datetime.datetime(2017, 8, 30, 0, 0), datetime.datetime(2017, 8, 31, 0, 0)]
I am stuck on the code. I have started with the below. Please help. Thanks
from datetime import datetime, timedelta
myDateStr = '20170817'
def days15(myDateStr):
Pandas will help you in converting strings to datetime, so first you need to import it:
from datetime import datetime, timedelta
import pandas as pd
myDateStr = '20170817'
Then you can initialize an empty list that you'll later append to:
datelist = []
And then you write a function:
def days15(myDateStr):
    # converting to datetime (pandas returns a Timestamp, a datetime subclass)
    date = pd.to_datetime(myDateStr)
    # loop to create 15 datetimes
    for i in range(15):
        newdate = date + timedelta(days=i)
        # adding new dates to the list
        datelist.append(newdate)
    return datelist
and then you can call your function and get a list of 15 datetimes:
days15(myDateStr)
As you said, there will be two steps to implement: firstly, convert the string date to a datetime object and secondly, iterate over the next 15 days using timedelta, with a list comprehension or a simple loop.
from datetime import datetime, timedelta
myDateStr = '20170817'
# Parse the string and return a datetime object
def getDateTime(date):
    return datetime(int(date[:4]), int(date[4:6]), int(date[6:]))
# Iterate over the timedelta added to the starting date
def days15(myDateStr):
    return [getDateTime(myDateStr) + timedelta(days=x) for x in range(15)]
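An alternative sketch that follows the wording of the assignment more literally: it parses the string with strptime instead of slicing, and builds an explicit one-day timedelta. The names start and one_day are chosen for the example.

```python
from datetime import datetime, timedelta

def days15(myDateStr):
    # '%Y%m%d' matches the compact YearMonthDay form, e.g. '20170817'.
    start = datetime.strptime(myDateStr, '%Y%m%d')
    one_day = timedelta(days=1)
    # 15 consecutive dates, beginning with the start date itself.
    return [start + i * one_day for i in range(15)]

result = days15('20170817')
print(result[0], result[-1])  # 2017-08-17 00:00:00 2017-08-31 00:00:00
```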
When creating a pandas dataframe object (python 2.7.9, pandas 0.16.2), the first datetime field gets automatically converted into a pandas timestamp. Why? Is it possible to prevent this so as to keep the field in the original type?
Please see code below:
import numpy as np
import datetime
import pandas
create a dict:
x = {'cusip': np.array(['10553M10', '67085120', '67085140'], dtype='|S8'),
'vstart':np.array([datetime.datetime(2001, 11, 16, 0, 0),
datetime.datetime(2012, 2, 28, 0, 0), datetime.datetime(2014, 12, 22, 0, 0)],
dtype=object),
'vstop': np.array([datetime.datetime(2012, 2, 28, 0, 0),
datetime.datetime(2014, 12, 22, 0, 0), datetime.datetime(9999, 12, 31, 0, 0)],
dtype=object),
'id': np.array(['EQ0000000000041095', 'EQ0000000000041095', 'EQ0000000000041095'],
dtype='|S18')}
So, the vstart and vstop keys are datetime so far. However, after:
df = pandas.DataFrame(data = x)
the vstart becomes a pandas Timestamp automatically while vstop remains a datetime
type(df.vstart[0])
#class 'pandas.tslib.Timestamp'
type(df.vstop[0])
#type 'datetime.datetime'
I don't understand why the first datetime column that the constructor comes across gets converted to Timestamp by pandas. And how to tell pandas to keep the data types as they are. Can you help? Thank you.
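Actually, I've noticed something in your data: this has nothing to do with which date column comes first. Your vstop column contains a datetime with the value dt.datetime(9999, 12, 31, 0, 0), which lies outside the range that a nanosecond-resolution pandas Timestamp can represent (roughly the years 1677 to 2262), so that column is left as plain datetime objects. If you change the year on this date to a normal year, like 2020 for example, both columns will be treated the same.
Just note that I'm importing the datetime module as dt (and pandas as pd).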
x = {'cusip': np.array(['10553M10', '67085120', '67085140'], dtype='|S8'),
'vstop': np.array([dt.datetime(2012, 2, 28, 0, 0), dt.datetime(2014, 12, 22, 0, 0), dt.datetime(2020, 12, 31, 0, 0)], dtype=object),
'vstart': np.array([dt.datetime(2001, 11, 16, 0, 0),dt.datetime(2012, 2, 28, 0, 0), dt.datetime(2014, 12, 22, 0, 0)], dtype=object),
'id': np.array(['EQ0000000000041095', 'EQ0000000000041095', 'EQ0000000000041095'], dtype='|S18')}
In [27]:
df = pd.DataFrame(x)
df
Out[27]:
cusip id vstart vstop
10553M10 EQ0000000000041095 2001-11-16 2012-02-28
67085120 EQ0000000000041095 2012-02-28 2014-12-22
67085140 EQ0000000000041095 2014-12-22 2020-12-31
In [25]:
type(df.vstart[0])
Out[25]:
pandas.tslib.Timestamp
In [26]:
type(df.vstop[0])
Out[26]:
pandas.tslib.Timestamp
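The year 9999 matters because a timestamp stored as a 64-bit count of nanoseconds, which is how datetime64[ns] values work, runs out of range in 2262. The sketch below uses numpy alone to show that upper limit; it illustrates the storage limit and is not a reproduction of what the pandas constructor does internally.

```python
import numpy as np

# Largest value representable as a 64-bit count of nanoseconds since 1970.
upper = np.datetime64(np.iinfo(np.int64).max, 'ns')
print(upper)  # 2262-04-11T23:47:16.854775807

# The year 9999 is far beyond that limit, so it cannot be stored at ns resolution.
limit_year = int(str(upper)[:4])
print(9999 > limit_year)  # True
```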