Timedelta time difference expressed as float variable - python

I have data in a pandas dataframe that is marked by timestamps as datetime objects. I would like to make a graph that treats time as a continuous quantity. My idea was to subtract the first timestamp from the others (shown here for the second entry)
xhertz_df.loc[1]['Dates']-xhertz_df.loc[0]['Dates']
to get the time passed since the first measurement. Which gives 350 days 08:27:51 as a timedelta object. So far so good.
This might be a duplicate, but I have not found the solution here so far. Is there a way to quickly transform this object into a number of, e.g., minutes, seconds or hours? I know I could extract the individual days, hours and minutes and make a tedious calculation to get it. But is there a built-in way to just turn this object into what I want?
Something like
timedelta.tominutes
that gives it back as a float of minutes, would be great.

If all you want is a float representation, maybe as simple as:
float_index = pd.Index(xhertz_df['Dates'].values.astype(float))

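In pandas, Timestamp and Timedelta columns are stored internally as numpy datetime64[ns] and timedelta64[ns] respectively, that is, as an integer number of nanoseconds.
So it is trivial to convert a Timedelta column to a number of minutes:
(xhertz_df['Dates'] - xhertz_df.loc[0, 'Dates']).astype('int64') / 60000000000
A single Timedelta, like the difference in the question, has no astype; use its value attribute (also in nanoseconds) instead:
(xhertz_df.loc[1, 'Dates'] - xhertz_df.loc[0, 'Dates']).value / 60000000000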
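
A minimal sketch of the built-in conversions, assuming pandas is imported as pd. The Timedelta value is the one reported in the question; the small `dates` series is made up for illustration.

```python
import pandas as pd

# The difference reported in the question
td = pd.Timedelta("350 days 08:27:51")

# Option 1: total_seconds() returns a float; divide to get other units
minutes = td.total_seconds() / 60
hours = td.total_seconds() / 3600

# Option 2: dividing by a unit Timedelta gives a float in that unit
minutes_alt = td / pd.Timedelta(minutes=1)

# For a whole column: subtract the first timestamp, then use the .dt accessor
dates = pd.Series(pd.to_datetime([
    "2020-01-01 00:00:00",
    "2020-01-01 00:30:00",
    "2020-01-02 00:00:00",
]))  # hypothetical sample data
elapsed_minutes = (dates - dates.iloc[0]).dt.total_seconds() / 60

print(minutes)
print(elapsed_minutes.tolist())
```

Both options avoid hard-coding the nanosecond factor, which may be easier to read when the result is only needed for plotting.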

Here is a way to do so with timestamp():
Two examples for converting and one for the diff
import datetime as dt
import time
# current date and time
now = dt.datetime.now()
timestamp1 = dt.datetime.timestamp(now)
print("timestamp1 =", timestamp1)
time.sleep(4)
now = dt.datetime.now()
timestamp2 = dt.datetime.timestamp(now)
print("timestamp2 =", timestamp2)
print(timestamp2 - timestamp1)
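
The same idea works without going through epoch timestamps, since subtracting two datetime objects gives a timedelta directly. This is a small sketch using only the standard library; the two datetimes are invented for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2021, 1, 1, 8, 0, 0)     # hypothetical first measurement
later = datetime(2021, 1, 1, 9, 30, 15)   # hypothetical later measurement

delta = later - start                      # a datetime.timedelta
seconds = delta.total_seconds()            # float number of seconds
minutes = delta / timedelta(minutes=1)     # float number of minutes

print(seconds, minutes)
```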

Related

How do i convert duration column into hours and minutes?

I'm currently sitting in Jupyter Notebook on a dataset that has a duration column that looks like this;
I still feel like a newbie at programming, so I'm not sure how to convert this data so it can be visualized in graphs in Jupyter. Right now it's all just strings in the column.
Does anyone know how to do this properly?
Thank you!
Assuming each time in your data is a string and assuming the formats are all as shown then you could use a parser after a little massaging of the data:
from dateutil import parser
s = "1 hour 35 mins"
print(s)
s = s.replace('min', 'minute')
time = parser.parse(s).time()
print(time)
This is somewhat less flexible than the answer from @Jimpsoni, which captures the two numbers, but it will work on your data and on variations such as "1h 35m". If your data is in a list then you can loop through it; if it is in a pandas Series then you could write a function and use .apply to convert the values in the series.
You could loop through your dataset, extract the numbers from the strings, and then turn those numbers into timedelta objects. Here is one example from your dataset.
from datetime import timedelta
import re
string = "1 hour 35 mins" # Example from dataset
# Extract numbers with regex
numbers = list(map(int, re.findall(r'\d+', string)))
# Create timedelta object from those numbers
if len(numbers) < 2:
    time = timedelta(minutes=numbers[0])
else:
    time = timedelta(hours=numbers[0], minutes=numbers[1])
print(time) # -> prints 1:35:00
More about timedelta objects here.
The best way to loop through your dataset depends on what form your data is in, but this is an example of how you would do it for one instance of the data.
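
The regex approach above could be wrapped in a function and applied to every entry, roughly as sketched here. The function name `parse_duration` and the sample list are made up for illustration, and the sketch assumes a lone number always means minutes (so a string like "2 hours" on its own would be misread).

```python
import re
from datetime import timedelta

def parse_duration(text):
    """Convert strings such as '1 hour 35 mins' or '45 mins' to a timedelta."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if not numbers:
        raise ValueError(f"no number found in {text!r}")
    if len(numbers) == 1:
        # Assumption: a single number always means minutes
        return timedelta(minutes=numbers[0])
    return timedelta(hours=numbers[0], minutes=numbers[1])

durations = ["1 hour 35 mins", "45 mins", "2 hours 5 mins"]  # hypothetical sample
as_minutes = [parse_duration(d).total_seconds() / 60 for d in durations]

print(as_minutes)
```

With a pandas Series, the same function could be passed to .apply, and the resulting float minutes can be plotted directly.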

convert time to UTC in pandas

I have multiple csv files, I've set DateTime as the index.
df6.set_index("gmtime", inplace=True)
#correct the underscores in old datetime format
df6.index = [" ".join( str(val).split("_")) for val in df6.index]
df6.index = pd.to_datetime(df6.index)
The time was put in GMT, but I think it's been saved as BST (British summertime) when I set the clock for raspberry pi.
I want to shift the time one hour backwards. When I use
df6.tz_convert(pytz.timezone('utc'))
it gives me below error as it assumes that the time is correct.
Cannot convert tz-naive timestamps, use tz_localize to localize
How can I shift the time by one hour?
Given a column that contains date/time info as string, you would convert to datetime, localize to a time zone (here: Europe/London), then convert to UTC. You can do that before you set as index.
Ex:
import pandas as pd
dti = pd.to_datetime(["2021-09-01"]).tz_localize("Europe/London").tz_convert("UTC")
print(dti) # notice 1 hour shift:
# DatetimeIndex(['2021-08-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
Note: setting a time zone means that DST is accounted for, i.e. here, during winter you'd have UTC+0 and during summer UTC+1.
To add to FObersteiner's response (sorry, new user, can't comment on answers yet):
I've noticed that in all the real world situations I've run across it (with full dataframes or pandas series instead of just a single date), .tz_localize() and .tz_convert() need to be called slightly differently.
What's worked for me is
df['column'] = pd.to_datetime(df['column']).dt.tz_localize('Europe/London').dt.tz_convert('UTC')
Without the .dt, I get "index is not a valid DatetimeIndex or PeriodIndex."
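
For the case in the question, where the datetimes are already the index, the methods can be called on the index itself. This is a sketch with an invented two-row frame, assuming the naive times were recorded in UK local time.

```python
import pandas as pd

# Hypothetical frame with a naive DatetimeIndex recorded in UK local time
df = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2021-07-01 12:00:00", "2021-01-01 12:00:00"]),
)

# A DatetimeIndex has tz_localize/tz_convert directly, so no .dt is needed
df.index = df.index.tz_localize("Europe/London").tz_convert("UTC")

print(df)
```

The summer entry moves back by one hour while the winter entry is unchanged, which is the DST-aware behaviour described above.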

calculate the difference of two timestamp columns [duplicate]

I have a dataset like this:
data = pd.DataFrame({'order_date-time':['2017-09-13 08:59:02', '2017-06-28 11:52:20', '2018-05-18 10:25:53', '2017-08-01 18:38:42', '2017-08-10 21:48:40','2017-07-27 15:11:51',
'2018-03-18 21:00:44','2017-08-05 16:59:05', '2017-08-05 16:59:05','2017-06-05 12:22:19'],
'delivery_date_time':['2017-09-20 23:43:48', '2017-07-13 20:39:29','2018-06-04 18:34:26','2017-08-09 21:26:33','2017-08-24 20:04:21','2017-08-31 20:19:52',
'2018-03-28 21:57:44','2017-08-14 18:13:03','2017-08-14 18:13:03','2017-06-26 13:52:03']})
I want to calculate the time differences between these dates as the number of days and add it to the table as the delivery delay column. But I need to include both day and time for this calculation
for example, if the difference is 7 days 14:44:46 we can round this to 7 days.
from datetime import datetime
datetime.strptime(date_string, format)
You could use this to convert each string to a datetime object, store it in a variable, and then calculate the difference.
Visit https://www.journaldev.com/23365/python-string-to-datetime-strptime/
Python's datetime library is good to work with individual timestamps. If you have your data in a pandas DataFrame as in your case, however, you should use pandas datetime functionality.
To convert a column with timestamps from strings to proper datetime format you can use pandas.to_datetime() (note that the order column in the question is named 'order_date-time', with a hyphen):
data['order_date_time'] = pd.to_datetime(data['order_date-time'], format="%Y-%m-%d %H:%M:%S")
data['delivery_date_time'] = pd.to_datetime(data['delivery_date_time'], format="%Y-%m-%d %H:%M:%S")
The format argument is optional, but I think it is a good idea to always use it to make sure your datetime format is not "interpreted" incorrectly. It also makes the process much faster on large data-sets.
Once you have the columns in a datetime format you can simply calculate the timedelta between them:
data['delay'] = data['delivery_date_time'] - data['order_date_time']
Finally, if you want whole days, pandas has methods for this as well:
data['approx_delay'] = data['delay'].dt.floor('D')
where the extra dt gives access to datetime-specific methods and floor takes a frequency as its argument, here set to one day with 'D'. Note that dt.round('D') rounds to the nearest day, so 7 days 14:44:46 would become 8 days rather than the 7 days asked for; dt.floor('D') truncates instead, and data['delay'].dt.days gives the same result as a plain integer.
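
Putting the steps together, a sketch of the whole calculation on the first two rows of the question's data might look as follows. It assumes consistently named columns and that partial days should be dropped, as in the question's example; the `delay_days` and `delay_days_float` column names are made up.

```python
import pandas as pd

# First two rows of the question's data, with consistent column names
data = pd.DataFrame({
    "order_date_time": ["2017-09-13 08:59:02", "2017-06-28 11:52:20"],
    "delivery_date_time": ["2017-09-20 23:43:48", "2017-07-13 20:39:29"],
})

fmt = "%Y-%m-%d %H:%M:%S"
for col in ["order_date_time", "delivery_date_time"]:
    data[col] = pd.to_datetime(data[col], format=fmt)

# Timedelta column, then two numeric views of it
data["delay"] = data["delivery_date_time"] - data["order_date_time"]
data["delay_days"] = data["delay"].dt.days                            # whole days, truncated
data["delay_days_float"] = data["delay"].dt.total_seconds() / 86400   # fractional days

print(data[["delay", "delay_days"]])
```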

Convert timezone of np.datetime64 without loss of precision

I have a DataFrame, one of whose columns is of type datetime64[ns]. These represent times in "Europe/London" timezone, and are on nanosecond-level of precision. (The data is coming from an external system)
I need to convert these to datetime64[ns] entries that represent UTC time instead. In other words, shift each entry by 0 or 1 hour, depending on whether it falls during summer time or not.
What is the best way of doing this?
Unfortunately, I couldn't find any timezone support baked into np.datetime64. At the same time, I can't just directly convert to/work with datetime.datetime objects, as that'd mean loss of precision. The only thing I could think of so far is converting np.datetime64 to datetime.datetime, adjusting timezones, getting some sort of timedelta between unadjusted and adjusted datetime.datetime, and then apply that timedelta back to np.datetime64. Sounds like a lot of hoops to jump through though, for something which I'm hoping can be done more easily?
Thanks!
It appears pandas has some built-in support for this, using the dt accessor:
import pandas as pd
import numpy as np
dt_arr = np.array(['2019-05-01T12:00:00.000000010',
'2019-05-01T12:00:00.000000100',],
dtype='datetime64[ns]')
df = pd.DataFrame(dt_arr)
# Represent naive datetimes as London time
df[0] = df[0].dt.tz_localize('Europe/London')
# Convert to UTC
df[0] = df[0].dt.tz_convert("UTC")
print(df)
# 0
# 0 2019-05-01 11:00:00.000000010+00:00
# 1 2019-05-01 11:00:00.000000100+00:00
Assuming you are starting with some ISO 8601 strings in your np.datetime64[ns], you can use dt.tz_localize to assign a time zone to them, then dt.tz_convert to convert them into another time zone.
I will warn though that if they came in as integers like 1556708400000000010, there's a good chance that they already represent UTC, since timestamps given in seconds or nanoseconds are usually Unix epoch times, which are independent of the time zone they were recorded in (it's a number of seconds/nanoseconds after the Unix epoch, not a civil time).
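
To illustrate that caveat, here is a small sketch of the integer case, under the assumption that the value really is nanoseconds since the Unix epoch. The variable names are made up.

```python
import pandas as pd

ns_since_epoch = 1556708400000000010  # the example integer from above

# Epoch values are UTC by definition, so attach UTC rather than a local zone
ts_utc = pd.Timestamp(ns_since_epoch, unit="ns", tz="UTC")

# Same instant, shown as London wall-clock time
ts_london = ts_utc.tz_convert("Europe/London")

print(ts_utc)
print(ts_london)
```

Localizing such a value to Europe/London and then converting to UTC would shift it by an extra hour in summer, so it is worth checking how the external system encodes its times first.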

How can I convert a timestamp string of the form "%d%H%MZ" to a datetime object?

I have timestamp strings of the form "091250Z", where the first two numbers are the date and the last four numbers are the hours and minutes. The "Z" indicates UTC. Assuming the timestamp corresponds to the current month and year, how can this string be converted reliably to a datetime object?
I have the parsing to a datetime sorted, but the task quickly becomes nontrivial when going further and I'm not sure how to proceed:
datetime.strptime("091250Z", "%d%H%MZ")
What you need is to replace the year and month of your existing datetime object.
your_datetime_obj = datetime.strptime("091250Z", "%d%H%MZ")
new_datetime_obj = your_datetime_obj.replace(year=datetime.now().year, month=datetime.now().month)
Like this? You've basically already done it, you just needed to assign it to a variable (note that the day directive is %d, not %m)
from datetime import datetime
dt = datetime.strptime('091250Z', '%d%H%MZ')
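
Combining both answers, a hedged sketch of a helper could look like this. The name `parse_day_time_z` and the `now` parameter are made up; it reads the clock once in UTC, since the "Z" indicates UTC, and attaches the UTC time zone to the result.

```python
from datetime import datetime, timezone

def parse_day_time_z(stamp, now=None):
    """Parse 'DDHHMMZ' as a UTC datetime in the year and month of `now`."""
    if now is None:
        now = datetime.now(timezone.utc)  # read the clock once
    parsed = datetime.strptime(stamp, "%d%H%MZ")
    # replace() raises ValueError if the day does not exist in that month
    return parsed.replace(year=now.year, month=now.month, tzinfo=timezone.utc)

reference = datetime(2021, 9, 15, tzinfo=timezone.utc)  # hypothetical "current" time
result = parse_day_time_z("091250Z", now=reference)

print(result)
```

Passing `now` explicitly makes the behaviour reproducible; around a month boundary the "current month" assumption itself may be wrong, which this sketch does not try to resolve.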
