convert time to UTC in pandas - python

I have multiple CSV files, and I've set the DateTime column as the index.
df6.set_index("gmtime", inplace=True)
#correct the underscores in old datetime format
df6.index = [" ".join( str(val).split("_")) for val in df6.index]
df6.index = pd.to_datetime(df6.index)
The time was meant to be in GMT, but I think it was saved as BST (British Summer Time) when I set the clock on the Raspberry Pi.
I want to shift the time back by one hour. When I use
df6.tz_convert(pytz.timezone('utc'))
it gives me the error below, because the index has no time zone attached (it is tz-naive):
Cannot convert tz-naive timestamps, use tz_localize to localize
How can I shift the time back by one hour?

Given a column that contains date/time info as strings, you would convert it to datetime, localize it to a time zone (here: Europe/London), then convert to UTC. You can do that before you set it as the index.
Ex:
import pandas as pd
dti = pd.to_datetime(["2021-09-01"]).tz_localize("Europe/London").tz_convert("UTC")
print(dti) # notice 1 hour shift:
# DatetimeIndex(['2021-08-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
Note: setting a time zone means that DST is accounted for, i.e. here, during winter you'd have UTC+0 and during summer UTC+1.
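Since the question's timestamps are already the index, here is a minimal sketch that localizes the index directly; the sample data and the name `naive_index` are hypothetical. `ambiguous="NaT"` and `nonexistent="shift_forward"` are assumed choices for the repeated and skipped hours around the clock changes (the default, `"raise"`, raises an error there). The last step shows a plain one-hour shift, which is only right if the clock was off by exactly one hour all year round.

```python
import pandas as pd

# Hypothetical stand-in for df6: naive index written by a clock set to UK local time
df6 = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2021-01-15 12:00", "2021-07-15 12:00"]),
)
naive_index = df6.index

# A DatetimeIndex has tz_localize/tz_convert itself, so no .dt accessor is needed
df6.index = naive_index.tz_localize(
    "Europe/London",
    ambiguous="NaT",              # repeated hour when clocks go back -> NaT
    nonexistent="shift_forward",  # skipped hour when clocks go forward -> moved forward
).tz_convert("UTC")

# Only if the clock was off by exactly one hour all year round (stays tz-naive):
fixed_shift = naive_index - pd.Timedelta(hours=1)

print(df6.index)    # January entry unchanged (GMT), July entry moved back one hour (BST)
print(fixed_shift)  # both entries moved back one hour
```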

To add to FObersteiner's response (sorry, new user, can't comment on answers yet):
I've noticed that in all the real-world situations I've run into (with full DataFrames or pandas Series instead of just a single date), .tz_localize() and .tz_convert() need to be called slightly differently.
What's worked for me is
df['column'] = pd.to_datetime(df['column']).dt.tz_localize('Europe/London').dt.tz_convert('UTC')
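Without the .dt, I get "index is not a valid DatetimeIndex or PeriodIndex.", because Series.tz_localize() and Series.tz_convert() operate on the Series' index rather than on its values.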
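A small sketch of that difference, using a hypothetical one-column DataFrame with the default RangeIndex: calling `tz_localize` on the Series itself targets the index and fails, while the `.dt` version converts the values.

```python
import pandas as pd

df = pd.DataFrame({"column": ["2021-09-01 10:00", "2021-12-01 10:00"]})
s = pd.to_datetime(df["column"])

# Series.tz_localize acts on the index (here a RangeIndex), not on the values
raised = False
try:
    s.tz_localize("Europe/London")
except TypeError as exc:
    raised = True
    print("without .dt:", exc)

# Series.dt.tz_localize acts on the datetime values themselves
df["column"] = s.dt.tz_localize("Europe/London").dt.tz_convert("UTC")
print(df["column"])  # September (BST) moves back one hour, December (GMT) does not
```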

Related

Change the time in datetime column in pandas to specific time if time entry is before 8am and after 12am

I have two datetime columns in a pandas DataFrame and want to calculate the difference between them to see how long people have been online (so I have login and logout datetime columns). I have entries for all 24 hours of each day, but only want to sum up the time the user has been online between 8am and 12am.
So I want to set all entries of the login column to 8am if the entries are between 12am and 8am, and all entries of the logout column to 12am if the entries were made between 12am and 8am.
How do I only check for the time in a datetime column and then set it accordingly?
First, I don't know whether your time data is already a datetime object or still a string. If it is still a string, you can convert it to datetime with the following code (assuming your first column is named 'time_in', the second 'time_out', and the DataFrame variable is data):
from datetime import datetime
data['time_in'] = data['time_in'].apply(lambda x: datetime.strptime(x, "%I%p"))
data['time_out'] = data['time_out'].apply(lambda x: datetime.strptime(x, "%I%p"))
(One note: since you give your times as an hour plus am/pm, like 8am, I don't include minutes in the format. If you do have minutes, change "%I%p" to "%I:%M%p".) You can find all the format codes here for future use: https://www.programiz.com/python-programming/datetime/strptime
When you print your data at this stage, you will see times like 1900-01-01 18:00:00 (converted from 6pm, for example). Don't worry: this happens because the strings contain no date, so the converter fills in the default date 1900-01-01. Keep that in mind for the next step.
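As an alternative to `.apply` with `strptime`, `pd.to_datetime` accepts the same format string and parses the whole column at once. A sketch with hypothetical sample values; the column names follow the answer.

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({"time_in": ["6am", "9am"], "time_out": ["11pm", "2am"]})

# Vectorized parsing; the strings carry no date, so it defaults to 1900-01-01
data["time_in"] = pd.to_datetime(data["time_in"], format="%I%p")
data["time_out"] = pd.to_datetime(data["time_out"], format="%I%p")
print(data)
```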
Now apply the changes to the DataFrame, using >= so that entries at exactly midnight are included too:
midnight = datetime(year=1900, month=1, day=1, hour=0)
eight_am = datetime(year=1900, month=1, day=1, hour=8)
To change all logins between 12am and 8am to 8am:
data.loc[(data['time_in'] >= midnight) & (data['time_in'] < eight_am), 'time_in'] = eight_am
To change all logouts between 12am and 8am to 12am:
data.loc[(data['time_out'] >= midnight) & (data['time_out'] < eight_am), 'time_out'] = midnight
At this point everything is set. If you want to convert back to strings, use strftime() (or .dt.strftime() on a pandas column), which takes the same format codes as strptime().
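If the columns hold real dates rather than the 1900-01-01 placeholder, comparing against fixed `datetime(1900, ...)` values no longer works. A minimal sketch under that assumption: check the hour with `.dt.hour` and rebuild the replacement from each row's own date with `.dt.normalize()`. The sample data is hypothetical; a logout shortly after midnight is moved back to the midnight that starts that day, i.e. the end of the previous day's online period.

```python
import pandas as pd

# Hypothetical sample data with full dates
data = pd.DataFrame({
    "time_in": pd.to_datetime(["2021-05-03 06:30", "2021-05-03 09:15"]),
    "time_out": pd.to_datetime(["2021-05-03 23:10", "2021-05-04 02:45"]),
})

# Logins before 08:00 become 08:00 on the same date
early_in = data["time_in"].dt.hour < 8
data["time_in"] = data["time_in"].where(
    ~early_in, data["time_in"].dt.normalize() + pd.Timedelta(hours=8)
)

# Logouts between 00:00 and 08:00 become midnight (00:00) of that date
early_out = data["time_out"].dt.hour < 8
data["time_out"] = data["time_out"].where(~early_out, data["time_out"].dt.normalize())

online = data["time_out"] - data["time_in"]
print(online)
print(online.sum())
```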

Timedelta time difference expressed as float variable

I have data in a pandas DataFrame that is labelled with timestamps stored as datetime objects. I would like to make a graph that treats time as a continuous quantity. My idea was to subtract the first timestamp from the others (shown here for the second entry):
xhertz_df.loc[1]['Dates']-xhertz_df.loc[0]['Dates']
to get the time passed since the first measurement. Which gives 350 days 08:27:51 as a timedelta object. So far so good.
This might be a duplicate, but I have not found the solution here so far. Is there a quick way to convert this object to a number of, e.g., minutes, seconds or hours? I know I could extract the individual days, hours and minutes and do a tedious calculation, but is there a built-in way to turn this object into what I want?
Something like
timedelta.tominutes
that gives it back as a float of minutes, would be great.
If all you want is a float representation (nanoseconds since the Unix epoch), it may be as simple as:
float_index = pd.Index(xhertz_df['Dates'].values.astype(float))
In pandas, Timestamp and Timedelta columns are by default stored internally as NumPy datetime64[ns] and timedelta64[ns], that is, as integer numbers of nanoseconds.
So it is trivial to convert a Timedelta column to a number of minutes:
(xhertz_df['Dates'] - xhertz_df['Dates'].iloc[0]).astype('int64') / 60_000_000_000
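pandas also has built-in conversions that avoid the nanosecond arithmetic: `Timedelta.total_seconds()` for a single value, `Series.dt.total_seconds()` for a column, and division by a unit `pd.Timedelta`. A sketch with hypothetical data chosen so that the second entry reproduces the question's 350 days 08:27:51:

```python
import pandas as pd

# Hypothetical stand-in for xhertz_df
xhertz_df = pd.DataFrame({"Dates": pd.to_datetime([
    "2020-01-01 00:00:00", "2020-12-16 08:27:51", "2021-01-01 12:00:00",
])})

# Elapsed time since the first measurement, as a timedelta column
elapsed = xhertz_df["Dates"] - xhertz_df["Dates"].iloc[0]

minutes = elapsed.dt.total_seconds() / 60   # float minutes
hours = elapsed / pd.Timedelta(hours=1)     # float hours

# Single value: difference of two Timestamps is a Timedelta scalar
one = (xhertz_df.loc[1, "Dates"] - xhertz_df.loc[0, "Dates"]).total_seconds() / 60
print(one)  # 504507.85 (up to float rounding)
```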
Here is a way to do it with datetime.timestamp(), which returns seconds since the Unix epoch as a float.
Two examples of converting and one of taking the difference:
import datetime as dt
import time
# current date and time
now = dt.datetime.now()
timestamp1 = dt.datetime.timestamp(now)
print("timestamp1 =", timestamp1)
time.sleep(4)
now = dt.datetime.now()
timestamp2 = dt.datetime.timestamp(now)
print("timestamp2 =", timestamp2)
print(timestamp2 - timestamp1)

Aligning datetime formats for comparison

I'm having trouble aligning two different dates. I have an Excel import that I turn into a DateTime in pandas, and I would like to compare this DateTime with the current DateTime. The trouble is in the formatting of the imported DateTime.
Excel format of the date:
2020-07-06 16:06:00 (which is yyyy-dd-mm hh:mm:ss)
When I add the DateTime to my DataFrame it creates the datatype Object. After I convert it with pd.to_datetime it creates the format yyyy-mm-dd hh:mm:ss. It seems that the month and the day are getting mixed up.
Example code:
df = pd.read_excel('my path')
df['Arrival'] = pd.to_datetime(df['Arrival'], format='%Y-%d-%m %H:%M:%S')
print(df.dtypes)
Expected result:
2020-06-07 16:06:00
Actual result:
2020-07-06 16:06:00
How do I resolve this?
Gr,
Sempah
An ISO-8601 date/time is always yyyy-MM-dd, not yyyy-dd-MM. You've got the month and date positions switched around.
While localized date/time strings are inconsistent about the order of month and date, this particular format, where the year comes first, always starts with the biggest unit (years) and decreases in unit size going right (month, date, hour, etc.).
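A quick way to check what was actually parsed is to look at the `.month` and `.day` attributes instead of the printed form, which pandas always shows in ISO order. A sketch assuming the column arrives as text in yyyy-dd-mm order (the one-row DataFrame is hypothetical). If Excel already stored the cell as a date, `read_excel` typically returns a datetime64 column and there is nothing to reparse.

```python
import pandas as pd

# Hypothetical stand-in for the Excel column, as text in yyyy-dd-mm order
df = pd.DataFrame({"Arrival": ["2020-07-06 16:06:00"]})
df["Arrival"] = pd.to_datetime(df["Arrival"], format="%Y-%d-%m %H:%M:%S")

arrival = df.loc[0, "Arrival"]
print(arrival.month, arrival.day)  # 6 7
print(arrival)                     # 2020-06-07 16:06:00

# The comparison uses the parsed value, not its display format
print(df["Arrival"] < pd.Timestamp.now())
```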
It's solved. I think I misunderstood the results; it was already working without my knowledge. Thanks for the help anyway.

Convert timezone of np.datetime64 without loss of precision

I have a DataFrame, one of whose columns is of type datetime64[ns]. These represent times in "Europe/London" timezone, and are on nanosecond-level of precision. (The data is coming from an external system)
I need to convert these to datetime64[ns] entries that represent UTC time instead. In other words, shift each timestamp back by 0 or 1 hour, depending on whether it falls during summer time or not.
What is the best way of doing this?
Unfortunately, I couldn't find any timezone support baked into np.datetime64. At the same time, I can't just directly convert to/work with datetime.datetime objects, as that'd mean loss of precision. The only thing I could think of so far is converting np.datetime64 to datetime.datetime, adjusting timezones, getting some sort of timedelta between unadjusted and adjusted datetime.datetime, and then apply that timedelta back to np.datetime64. Sounds like a lot of hoops to jump through though, for something which I'm hoping can be done more easily?
Thanks!
It appears pandas has some built-in support for this, using the dt accessor:
import pandas as pd
import numpy as np
dt_arr = np.array(['2019-05-01T12:00:00.000000010',
'2019-05-01T12:00:00.000000100',],
dtype='datetime64[ns]')
df = pd.DataFrame(dt_arr)
# Represent naive datetimes as London time
df[0] = df[0].dt.tz_localize('Europe/London')
# Convert to UTC
df[0] = df[0].dt.tz_convert("UTC")
print(df)
# 0
# 0 2019-05-01 11:00:00.000000010+00:00
# 1 2019-05-01 11:00:00.000000100+00:00
Assuming the naive values in your datetime64[ns] column are London wall-clock times (for example, parsed from ISO 8601 strings without an offset), you can use dt.tz_localize to assign a time zone to them, then dt.tz_convert to convert them into another time zone.
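The question asks for plain datetime64[ns] values that represent UTC, while the result above has dtype datetime64[ns, UTC]. A sketch of dropping the zone afterwards with `.dt.tz_localize(None)`, which keeps the UTC wall time and the nanoseconds; the sample values (one winter, one summer) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical naive London wall-clock times with nanosecond precision
s = pd.Series(np.array(["2019-01-15T12:00:00.000000001",
                        "2019-05-01T12:00:00.000000010"], dtype="datetime64[ns]"))

utc_naive = (s.dt.tz_localize("Europe/London")  # attach London time (GMT or BST)
              .dt.tz_convert("UTC")             # shift to UTC
              .dt.tz_localize(None))            # drop the zone, keep UTC wall time

print(utc_naive)
print(utc_naive.dtype)  # datetime64[ns]
```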
I will warn though that if they came in as integers like 1556708400000000010, there's a good chance that they already represent UTC, since timestamps given in seconds or nanoseconds are usually Unix epoch times, which are independent of the time zone they were recorded in (it's a number of seconds/nanoseconds after the Unix epoch, not a civil time).
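If the data does arrive as integer epoch nanoseconds, a sketch of reading it as UTC directly, using the integer from the warning above; `unit="ns"` and `utc=True` are standard `pd.to_datetime` parameters.

```python
import pandas as pd

# Integer nanoseconds since the Unix epoch: already UTC, no localizing needed
ts = pd.to_datetime(1556708400000000010, unit="ns", utc=True)
print(ts)                              # 2019-05-01 11:00:00.000000010+00:00

# Convert to London time only for display, if needed
print(ts.tz_convert("Europe/London"))  # 2019-05-01 12:00:00.000000010+01:00
```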

How can I convert a timestamp string of the form "%d%H%MZ" to a datetime object?

I have timestamp strings of the form "091250Z", where the first two numbers are the date and the last four numbers are the hours and minutes. The "Z" indicates UTC. Assuming the timestamp corresponds to the current month and year, how can this string be converted reliably to a datetime object?
I have the parsing to a timedelta sorted, but the task quickly becomes nontrivial when going further and I'm not sure how to proceed:
datetime.strptime("091250Z", "%d%H%MZ")
What you need is to replace the year and month of your existing datetime object.
from datetime import datetime
now = datetime.now()
your_datetime_obj = datetime.strptime("091250Z", "%d%H%MZ")
new_datetime_obj = your_datetime_obj.replace(year=now.year, month=now.month)
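One caveat with `replace()`: the first parse happens in January 1900, so a day like 31 is accepted and then `replace(month=...)` raises `ValueError` in a shorter month. A sketch that instead prepends the current UTC year and month before parsing, so the day is validated against the real month, and attaches UTC for the "Z". The helper name `parse_dhm_z` and its `now` parameter are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_dhm_z(stamp: str, now: Optional[datetime] = None) -> datetime:
    """Parse 'DDHHMMZ' as an aware UTC datetime in the current UTC year and month."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Prepend year and month so strptime checks the day against the actual month
    full = f"{now.year:04d}{now.month:02d}{stamp}"
    return datetime.strptime(full, "%Y%m%d%H%MZ").replace(tzinfo=timezone.utc)


print(parse_dhm_z("091250Z", now=datetime(2021, 9, 20, tzinfo=timezone.utc)))
# 2021-09-09 12:50:00+00:00
```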
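Like this? You've basically already done it; you just needed to assign it to a variable. Note that the day is %d (%m would be the month):
from datetime import datetime
dt = datetime.strptime('091250Z', '%d%H%MZ')
The year and month still default to 1900 and January, so you would still combine this with replace() as above.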
