converting a string to np.array with datetime64, NOT using Pandas - python

I'm looking for a way to convert dates given in the format YYYYmmdd to an np.array with dtype='datetime64'. The dates are stored in another np.array but with dtype='float64'.
I am looking for a way to achieve this by avoiding Pandas!
I already tried something similar as suggested in this answer but the author states that "[...] if (the date format) was in ISO 8601 you could parse it directly using numpy, [...]".
As the date format in my case is YYYYmmdd which IS(?) ISO 8601 it should be somehow possible to parse it directly using numpy. But I don't know how as I am a total beginner in python and coding in general.
I really try to avoid Pandas because I don't want to bloat my script when there is a way to get the task done by using the modules I am already using. I also read it would decrease the speed here.

If no one else comes up with something more built-in, here is a pedestrian method:
>>> dates
array([19700101., 19700102., 19700103., 19700104., 19700105., 19700106.,
19700107., 19700108., 19700109., 19700110., 19700111., 19700112.,
19700113., 19700114.])
>>> y, m, d = dates.astype(int) // np.c_[[10000, 100, 1]] % np.c_[[10000, 100, 100]]
>>> y.astype('U4').astype('M8') + (m-1).astype('m8[M]') + (d-1).astype('m8[D]')
array(['1970-01-01', '1970-01-02', '1970-01-03', '1970-01-04',
'1970-01-05', '1970-01-06', '1970-01-07', '1970-01-08',
'1970-01-09', '1970-01-10', '1970-01-11', '1970-01-12',
'1970-01-13', '1970-01-14'], dtype='datetime64[D]')
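The same conversion can also be sketched by going through hyphenated date strings, which NumPy parses directly. This is only a minimal sketch: the sample values and variable names are made up for illustration, and it assumes every entry is a valid eight-digit YYYYmmdd number.

```python
import numpy as np

# Hypothetical sample input: dates stored as float64 in YYYYmmdd form.
dates = np.array([19700101.0, 19700102.0, 20201231.0])

# Drop the float representation, then turn each number into an 8-character string.
digits = dates.astype(np.int64).astype("U8")

# Insert hyphens so each entry becomes the ISO 8601 extended form YYYY-mm-dd.
iso = [f"{s[:4]}-{s[4:6]}-{s[6:]}" for s in digits]

# NumPy parses hyphenated ISO 8601 date strings directly.
result = np.array(iso, dtype="datetime64[D]")
print(result)
```

The list comprehension runs in Python rather than in vectorised NumPy code, so for very large arrays the arithmetic approach above may be faster.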

You can go via the python datetime module.
from datetime import datetime
import numpy as np
datestrings = np.array(["18930201", "19840404"])
dtarray = np.array([datetime.strptime(d, "%Y%m%d") for d in datestrings], dtype="datetime64[D]")
print(dtarray)
# out: ['1893-02-01' '1984-04-04'] datetime64[D]
Since the real question seems to be how to get the given strings into the matplotlib date format, you can use matplotlib.dates.datestr2num:
import numpy as np
from matplotlib import dates as mdates
datestrings = np.array(["18930201", "19840404"])
mpldates = mdates.datestr2num(datestrings)
print(mpldates)
# out: [691071. 724370.]
# (days counted from Matplotlib's pre-3.3 default epoch; newer versions use 1970-01-01 as the epoch and print different numbers)

Related

Issues with converting date time to proper format- Columns must be same length as key

I'm doing some data analysis on a dataset (https://www.kaggle.com/sudalairajkumar/covid19-in-usa) and I'm trying to convert the date and time column (lastModified) to the proper datetime format. When I first tried it, it returned an error:
ValueError: hour must be in 0..23
so I tried doing this -
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), format='%d/%m/%Y') +
                           pd.to_timedelta(data_df.pop('time') + ':00'))
This gives an error - Columns must be same length as key
I understand this means that both columns I'm splitting aren't the same size. How do I resolve this issue? I'm relatively new to Python, so please explain in an easy-to-understand manner. Thanks very much.
This is my whole code-
import pandas as pd
dataset_url = 'https://www.kaggle.com/sudalairajkumar/covid19-in-usa'
import opendatasets as od
od.download(dataset_url)
data_dir = './covid19-in-usa'
import os
os.listdir(data_dir)
data_df = pd.read_csv('./covid19-in-usa/us_covid19_daily.csv')
data_df
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), format='%d/%m/%Y') +
                           pd.to_timedelta(data_df.pop('time') + ':00'))
Looks like lastModified is in ISO format. I have used something like below to convert iso date string:
from dateutil import parser
from datetime import datetime
...
timestamp = parser.isoparse(lastModified).timestamp()
dt = datetime.fromtimestamp(timestamp)
...
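If installing dateutil is not an option, a similar result can be sketched with the standard library alone. The sample value and its exact layout (a literal "T" separator and a trailing "Z") are assumptions for illustration; the real column may look different, in which case the format string would need adjusting.

```python
from datetime import datetime

# Hypothetical sample value; the real lastModified strings may differ.
last_modified = "2020-12-06T13:45:00Z"

# strptime needs an explicit format; "T" and "Z" are matched as literal characters.
dt = datetime.strptime(last_modified, "%Y-%m-%dT%H:%M:%SZ")

print(dt)  # 2020-12-06 13:45:00
```

Unlike parser.isoparse, this returns a naive datetime: the "Z" is matched but not interpreted as UTC.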
On this line:
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
In order to do this assignment, the number of columns on both sides of the = must be the same. split can output multiple columns, but it will only do this if it finds the character it's looking for to split on. By default, it splits by whitespace. There is no whitespace in the date column, and therefore it will not split. You can read the documentation for this here.
For that reason, this line should be like this, so it splits on the T:
data_df[['date','time']] = data_df['lastModified'].str.split('T', expand=True)
But the solution posted by @southiejoe is likely to be more reliable. These timestamps are in a standard format; parsing them is a previously-solved problem.
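The difference between the two split calls can be seen on a tiny frame. The two sample strings below are made up for illustration and only assume that the column holds values with a "T" between date and time and no whitespace.

```python
import pandas as pd

# Hypothetical stand-in for the real data set.
df = pd.DataFrame({"lastModified": ["2020-12-06T00:00:00Z", "2020-12-05T12:30:00Z"]})

# Default split is on whitespace; there is none, so only one column comes back.
no_split = df["lastModified"].str.split(expand=True)
print(no_split.shape)  # (2, 1)

# Splitting on "T" yields two columns, matching the two keys on the left-hand side.
split_t = df["lastModified"].str.split(pat="T", expand=True)
print(split_t.shape)  # (2, 2)

df[["date", "time"]] = split_t
print(df.columns.tolist())  # ['lastModified', 'date', 'time']
```

Assigning the one-column result to two keys is what triggers "Columns must be same length as key".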
You need these libraries
#import
from dateutil import parser
from datetime import datetime
Then try writing something similar to convert the date and time column. This way the columns should be the same length as the key:
# parse the ISO 8601 string and get a POSIX timestamp (seconds since the epoch)
timestamp = parser.isoparse(lastModified).timestamp()
# convert the POSIX timestamp to a datetime object in local time
data = datetime.fromtimestamp(timestamp)

How to add datetimes in Python?

I have a program that generates datetime values in several formats, like the ones below.
1 day, 21:21:00.561566
11:19:26.056148
They may also come in month or year format, and I want to know whether there is any way to add up all the times that I get from the program.
- 1 day, 21:21:00.561566 is the string representation of a datetime.timedelta object. If you need to parse from string to timedelta, pandas has a suitable method. There are other third party parsers; I'm just using this one since pandas is quite common.
import pandas as pd
td = pd.to_timedelta('- 11:19:26.056148')
# Timedelta('-1 days +12:40:33.943852')
td.total_seconds()
# -40766.056148
If you need to find the sum of multiple timedelta values, you can sum up their total_seconds and convert them back to timedelta:
td_strings = ['- 1 day, 21:21:00.561566', '- 11:19:26.056148']
td_sum = pd.Timedelta(seconds=sum([pd.to_timedelta(s).total_seconds() for s in td_strings]))
td_sum
# Timedelta('-1 days +10:01:34.505418')
...or leverage some tools from the Python standard lib:
from functools import reduce
from operator import add
td_sum = reduce(add, map(pd.to_timedelta, td_strings))
# Timedelta('-1 days +10:01:34.505418')
td_sum.total_seconds()
# -50305.494582
You can subtract date time like here to find how far apart these two times are:
https://stackoverflow.com/a/1345852/2415706
Adding two dates doesn't really make any sense though. Like, if you try to add Jan 1st of 2020 to Jan 1st of 1995, what are you expecting?
You can use the datetime.timedelta class for this purpose.
You can find the documentation here.
You will need to parse your string and build a timedelta object.
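A minimal standard-library sketch of that parsing step is given below. The helper name parse_timedelta and the regular expression are hypothetical, and the sketch only assumes strings shaped like the two examples in the question (an optional "N day(s), " prefix followed by H:MM:SS with an optional fraction).

```python
import re
from datetime import timedelta

# Matches e.g. "1 day, 21:21:00.561566" or "11:19:26.056148".
_PATTERN = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)"
    r"(?:\.(?P<frac>\d{1,6}))?$"
)

def parse_timedelta(text):
    """Hypothetical helper: build a timedelta from str(timedelta)-style text."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised timedelta string: {text!r}")
    parts = match.groupdict()
    return timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"]),
        minutes=int(parts["minutes"]),
        seconds=int(parts["seconds"]),
        # Pad the fraction to six digits so it can be read as microseconds.
        microseconds=int((parts["frac"] or "0").ljust(6, "0")),
    )

values = ["1 day, 21:21:00.561566", "11:19:26.056148"]

# sum() needs a timedelta start value; the default start of 0 would raise TypeError.
total = sum((parse_timedelta(v) for v in values), timedelta())
print(total)  # 2 days, 8:40:26.617714
```

timedelta has no month or year unit, so inputs expressed in months or years would need a separate convention and are not handled here.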

Converting to_datetime but keeping original time

I am trying to convert string to Datetime- but the conversion adds 5 hours to the original time. How do I convert but keep the time as is?
>>> import pandas as pd
>>> t = pd.to_datetime("2016-09-21 08:56:29-05:00", format='%Y-%m-%d %H:%M:%S')
>>> t
Timestamp('2016-09-21 13:56:29')
The conversion doesn't add 5 hours to the original time. Pandas just detects that your datetime is timezone-aware and converts it to naive UTC. But it's still the same datetime.
If you want a localized Timestamp instance, use Timestamp.tz_localize() to make t a timezone-aware UTC timestamp, and then use the Timestamp.tz_convert() method to convert to UTC-0500:
>>> import pandas as pd
>>> import pytz
>>> t = pd.to_datetime("2016-09-21 08:56:29-05:00", format='%Y-%m-%d %H:%M:%S')
>>> t
Timestamp('2016-09-21 13:56:29')
>>> t.tz_localize(pytz.utc).tz_convert(pytz.timezone('America/Chicago'))
Timestamp('2016-09-21 08:56:29-0500', tz='America/Chicago')
To achieve what you want, you can remove the "-05:00" from the end of your time string "2016-09-21 08:56:29-05:00".
However, Erik Cederstrand is correct in explaining that pandas is not modifying the time, it's simply displaying it in a different format.
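That both representations describe the same instant can be checked with the standard library alone. This is a small sketch using the string from the question; the variable names are illustrative.

```python
from datetime import datetime, timezone

# Parse the string including its UTC offset (Python 3.7+).
aware = datetime.fromisoformat("2016-09-21 08:56:29-05:00")

# The same instant expressed in UTC: the clock reading changes, the moment does not.
as_utc = aware.astimezone(timezone.utc)
print(as_utc)           # 2016-09-21 13:56:29+00:00
print(aware == as_utc)  # True

# Keep the original wall-clock time and drop the offset instead of converting.
naive_local = aware.replace(tzinfo=None)
print(naive_local)      # 2016-09-21 08:56:29
```

Dropping the offset loses information, so naive_local can no longer be compared safely with timestamps from other time zones.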

Can't call strftime on numpy.datetime64, no definition

I have a datetime64 t that I'd like to represent as a string.
When I call strftime like this t.strftime('%Y.%m.%d') I get this error:
AttributeError: 'numpy.datetime64' object has no attribute 'strftime'
What am I missing? I am using Python 3.4.2 and Numpy 1.9.1
Importing a data structures library like pandas to accomplish type conversion feels like overkill to me. You can achieve the same thing with the standard datetime module:
import numpy as np
import datetime
t = np.datetime64('2017-10-26')
t = t.astype(datetime.datetime)
timestring = t.strftime('%Y.%m.%d')
Use this code:
import pandas as pd
t = pd.to_datetime(str(t))
timestring = t.strftime('%Y.%m.%d')
This is the simplest way:
t.item().strftime('%Y.%m.%d')
item() gives you a Python native datetime object, on which all the usual methods are available.
If your goal is only to represent t as a string, the simplest solution is str(t). If you want it in a specific format, you should use one of the solutions above.
One caveat is that np.datetime64 can have different amounts of precision. If t has nanosecond precision, user 12321's solution will still work, but apteryx's and John Zwinck's solutions won't, because t.astype(datetime.datetime) and t.item() return an int:
import datetime
import numpy as np
print('second precision')
t = np.datetime64('2000-01-01 00:00:00')
print(t)
print(t.astype(datetime.datetime))
print(t.item())
print('microsecond precision')
t = np.datetime64('2000-01-01 00:00:00.0000')
print(t)
print(t.astype(datetime.datetime))
print(t.item())
print('nanosecond precision')
t = np.datetime64('2000-01-01 00:00:00.0000000')
print(t)
print(t.astype(datetime.datetime))
print(t.item())
import pandas as pd
print(pd.to_datetime(str(t)))
second precision
2000-01-01T00:00:00
2000-01-01 00:00:00
2000-01-01 00:00:00
microsecond precision
2000-01-01T00:00:00.000000
2000-01-01 00:00:00
2000-01-01 00:00:00
nanosecond precision
2000-01-01T00:00:00.000000000
946684800000000000
946684800000000000
2000-01-01 00:00:00
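One way around the nanosecond case without pandas is to cast to microsecond precision before calling item(). This is a minimal sketch; it assumes that discarding anything finer than a microsecond is acceptable.

```python
import numpy as np

t = np.datetime64("2000-01-01T00:00:00.000000000")  # nanosecond precision

# item() on a nanosecond value yields an int, so cast to microseconds first;
# at that precision item() returns a datetime.datetime.
as_datetime = t.astype("datetime64[us]").item()

print(as_datetime.strftime("%Y.%m.%d"))  # 2000.01.01
```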
For those who might stumble upon this: numpy now has a numpy.datetime_as_string function. The only caveat is that it is designed for arrays rather than individual values. I would argue, however, that this is still a better solution than having to use another library just to do the conversion.
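A short sketch of numpy.datetime_as_string follows. The function produces ISO 8601 strings rather than taking a strftime-style format, so the dotted layout from the question is obtained here with a plain string replacement; the sample dates are illustrative.

```python
import numpy as np

times = np.array(["2017-10-26", "2000-01-01"], dtype="datetime64[D]")

# ISO 8601 strings, one per element.
iso_strings = np.datetime_as_string(times)
print(iso_strings)  # ['2017-10-26' '2000-01-01']

# Swap the separators to get the %Y.%m.%d layout.
dotted = np.char.replace(iso_strings, "-", ".")
print(dotted)       # ['2017.10.26' '2000.01.01']
```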
It might help to convert the datetime object to a string and split it at the 'T' that separates the date from the time, as shown below:
import numpy as np
dtObj = np.datetime64('2011-08-01T00:00:00.000000000')
dtString = str(dtObj).split('T')[0]
print(dtString)
# out: 2011-08-01

How to set a variable to be "Today's" date in Python/Pandas

I am trying to set a variable to equal today's date.
I looked this up and found a related article:
Set today date as default value in the model
However, this didn't particularly answer my question.
I used the suggested:
dt.date.today
But after
import datetime as dt
date = dt.date.today
print date
<built-in method today of type object at 0x000000001E2658B0>
Df['Date'] = date
I didn't get what I actually wanted, which is a clean date format of today's date in Month/Day/Year.
How can I create a variable of today's day in order for me to input that variable in a DataFrame?
You mention you are using Pandas (in your title). If so, there is no need to use an external library, you can just use to_datetime
>>> pandas.to_datetime('today').normalize()
Timestamp('2015-10-14 00:00:00')
This will always return today's date at midnight, irrespective of the actual time, and can be directly used in pandas to do comparisons etc. Pandas always includes 00:00:00 in its datetimes.
Replacing today with now would give you the date in UTC instead of local time; note that in neither case is the tzinfo (timezone) added.
In pandas versions prior to 0.23.x, normalize may not have been necessary to remove the non-midnight timestamp.
If you want a string mm/dd/yyyy instead of the datetime object, you can use strftime (string format time):
>>> dt.datetime.today().strftime("%m/%d/%Y")
# ^ note parentheses
'02/12/2014'
Using pandas: pd.Timestamp("today").strftime("%m/%d/%Y")
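Putting these pieces together for the DataFrame from the question might look like the sketch below. The frame contents and the column names value, Date and DateText are made up for illustration.

```python
import datetime as dt
import pandas as pd

# Hypothetical frame standing in for the asker's Df.
df = pd.DataFrame({"value": [1, 2, 3]})

today = dt.date.today()  # note the parentheses: this calls the method

# A real datetime column (midnight of today), usable for comparisons.
df["Date"] = pd.Timestamp(today)

# A plain text column in Month/Day/Year form, for display only.
df["DateText"] = today.strftime("%m/%d/%Y")

print(df)
```

Keeping the datetime column alongside the text column avoids having to parse the string again for later date arithmetic.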
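pd.Timestamp.now().strftime("%d/%m/%Y")
This will give output such as '11/02/2019'. Note that the older pd.datetime alias is no longer available in recent pandas versions.
You can add the time if you want:
pd.Timestamp.now().strftime("%d/%m/%Y %I:%M:%S")
This will give output such as '11/02/2019 11:08:26'.
See the strftime format codes for the full list of directives.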
You can also look into pandas.Timestamp, which includes methods like .now and .today.
Unlike pandas.to_datetime('now'), pandas.Timestamp.now() won't default to UTC:
import pandas as pd
pd.Timestamp.now() # will return California time
# Timestamp('2018-12-19 09:17:07.693648')
pd.to_datetime('now') # will return UTC time
# Timestamp('2018-12-19 17:17:08')
I had the same problem and tried many things, but this is what finally worked:
import time
print (time.strftime("%d/%m/%Y"))
Simply use pd.Timestamp.now(). For example:
input: pd.Timestamp.now()
output: Timestamp('2022-01-12 14:43:05.521896')
If all you want is Timestamp('2022-01-12') with nothing after the date, you can use replace to zero out the hour, minute, second and microsecond:
input: pd.Timestamp.now().replace(hour=0, minute=0, second=0, microsecond=0)
output: Timestamp('2022-01-12 00:00:00')
That looks rather complicated, though. A simpler way is to use normalize:
input: pd.Timestamp.now().normalize()
output: Timestamp('2022-01-12 00:00:00')
Easy solution in Python3+:
import time
todaysdate = time.strftime("%d/%m/%Y")
# with '.' instead of '/'
todaysdate = time.strftime("%d.%m.%Y")
import datetime
import pandas as pd
def today_date():
    '''
    utils:
    get the datetime of today
    '''
    date = datetime.datetime.now().date()
    date = pd.to_datetime(date)
    return date
Df['Date'] = today_date()
This can be safely used in pandas DataFrames.
There are already quite a few good answers, but to answer the more general question about "any" period:
Use the function for time periods in pandas. For Day, use 'D', for month 'M' etc.:
>pd.Timestamp.now().to_period('D')
Period('2021-03-26', 'D')
>p = pd.Timestamp.now().to_period('D')
>p.to_timestamp().strftime("%Y-%m-%d")
'2021-03-26'
note: If you need to consider UTC, you can use: pd.Timestamp.utcnow().tz_localize(None).to_period('D')...
From your solution that you have you can use:
import pandas as pd
pd.to_datetime(date)
using the date variable that you use
