Select rows in pandas matching time condition - python

I have a csv with data like this:
[id names timestamp is_valid]
[1 name:surname 2016-06-09 23:29:50.083093 True]
I need to select rows that satisfy two conditions: is_valid is True, and at least 24 hours have passed since timestamp. For the row above, is_valid is True, so it passes once the current time reaches 2016-06-10 23:29:50.083093.
How can I achieve this? I know how to apply the first condition:
from datetime import datetime, timedelta
import pandas as pd
from dateutil import parser
df=pd.read_csv('acc.csv')
user=(df[df['is_valid']==True])
I could also print each timestamp, parse it and compare it with datetime.now(), but that is definitely a terrible way to do it.

Try this:
import pandas as pd
df = pd.read_csv('acc.csv')
# parse the timestamp column into a DatetimeIndex
tidx = pd.to_datetime(df['timestamp'].values)
# pd.datetime is no longer available in recent pandas; pd.Timestamp.now() replaces it
past_24 = (pd.Timestamp.now() - tidx).total_seconds() > 60 * 60 * 24
# keep rows that are valid and older than 24 hours
user = df[df['is_valid'] & past_24]

Related

Select rows based on datetime columns of the same month

I would like to compare two datetime columns in pandas and select the rows where they do not fall in the same month. I cannot find a good source for working with datetime fields in this way. I attempted the standard gdf.loc[(gdf[Field1] != gdf[Field2])], but when I add .month, .dt.month, or pd.to_datetime/pd.to_datetimeIndex, it gives me an error saying the Series has no such attribute. The two columns are datetime objects.
import geopandas as gpd, pandas as pd, datetime
Field1 = 'TrackStartTime'
Field2 = 'TrackEndTime'
gdf = gpd.read_file('Tracks.gpkg', driver='GPKG', layer='segments')
gdf.loc[~(gdf[Field1] == gdf[Field2])]
print(gdf[[Field1, Field2]])
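One possible approach, sketched on a plain pandas DataFrame with made-up sample rows (the real data would come from the GeoPackage layer, and a GeoDataFrame should behave the same way for these columns). It assumes that the "no attribute" error comes from the columns being stored as object rather than datetime64, which is only a guess, so both columns are converted explicitly first:

```python
import pandas as pd

# Hypothetical sample data standing in for the 'segments' layer
df = pd.DataFrame({
    "TrackStartTime": ["2021-01-31 23:50:00", "2021-02-10 08:00:00", "2020-03-05 12:00:00"],
    "TrackEndTime":   ["2021-02-01 00:10:00", "2021-02-10 09:30:00", "2021-03-05 12:00:00"],
})

# The .dt accessor only exists on datetime64 columns, so convert first
for col in ("TrackStartTime", "TrackEndTime"):
    df[col] = pd.to_datetime(df[col])

start, end = df["TrackStartTime"], df["TrackEndTime"]

# Rows whose calendar month number differs (ignores the year)
diff_month = df.loc[start.dt.month != end.dt.month]

# Stricter variant: the same month in different years also counts as different
diff_year_month = df.loc[start.dt.to_period("M") != end.dt.to_period("M")]

print(diff_month)
print(diff_year_month)
```

Which variant is appropriate depends on whether March 2020 and March 2021 should count as "the same month".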

timestamp in this format : '2022-03-17T19:38:48.331000Z'

I want to transform a date which has the following format "2022-03-17T19:38:48.331000Z"
in order to know if it would give me valuable information.
import numpy as np
import pandas as pd
import requests, json
from pandas import json_normalize
from datetime import datetime
from datetime import timezone
!pip3 install zulu
input: column_timestamp
id timestamp
ed25291d0f5edd91615d154f243f82f9 2022-03-18T07:33:36.882000Z
e02c5db9e6f6fca078798c9b2d486a81 2022-03-18T07:33:36.945000Z
f8756b6af18c2fedd8a295040279aecc 2022-03-18T07:33:37.549000Z
...
from datetime import datetime
from datetime import timezone
!pip3 install zulu
time = []
for i in range(505):
    dt = zulu.parse(column_timestamp["timestamp"][i])
    dt.format('%m/%d/%y %H:%M:%S %z')
    time.append(dt)
    i = +1
time_df = pd.DataFrame(time)
time_df
output:
0
0 2022-03-18 07:33:36.882000+00:00
1 2022-03-18 07:33:36.945000+00:00
2 2022-03-18 07:33:37.549000+00:00
3 2022-03-18 07:33:37.550000+00:00
4 2022-03-18 07:33:37.552000+00:00
... ...
I want to know whether this is correct, and also how to split this dataframe into separate columns:
Date
Hour
Minute
Seconds
I also want to make sure I am doing the conversion correctly for a value like:
'2022-03-18T07:33:36.746000Z'
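A minimal sketch of one way to do this with pandas alone, without zulu. The trailing Z in this ISO 8601 format means UTC, and pd.to_datetime with utc=True parses the whole column at once. The two sample rows are copied from the input above, and the output column names follow the list in the question:

```python
import pandas as pd

# Two sample rows taken from the question's input
column_timestamp = pd.DataFrame({
    "id": ["ed25291d0f5edd91615d154f243f82f9", "e02c5db9e6f6fca078798c9b2d486a81"],
    "timestamp": ["2022-03-18T07:33:36.882000Z", "2022-03-18T07:33:36.945000Z"],
})

# Parse the whole column; the result is a timezone-aware (UTC) datetime Series
ts = pd.to_datetime(column_timestamp["timestamp"], utc=True)

# Split into separate columns using the .dt accessor
parts = pd.DataFrame({
    "Date": ts.dt.date,
    "Hour": ts.dt.hour,
    "Minute": ts.dt.minute,
    "Seconds": ts.dt.second,
})

print(parts)
```

The vectorised call avoids the explicit loop over 505 rows and the hard-coded row count.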

Lack of desired output

In the code below, I am trying to get data for a specified date only.
It perfectly works for the shown code.
But if I change the date to 26-12-2020, it results in data of both 26-12-2020 and 27-12-2020.
import csv
import datetime
import os
import pandas as pd
import xlsxwriter
import numpy as np
from datetime import date
import datetime
import calendar
rdate = "27-12-2020"  # quoted: unquoted, this would be integer arithmetic (27 - 12 - 2020)
data= pd.read_excel(r'C:/Clover Workspace/NPS/Customer Feedback-28-12-2020.xlsx')
data.drop(columns=['User ID','Comments','Purpose ID'],inplace= True, axis=1)
df = pd.DataFrame(data, columns=['Name','Rating','Date','Store','Feedback choice'])
df['Date'] = pd.to_datetime(data['Date'])
df= df[df['Date'].ge("27-12-2020")]
How can I generate the output only for the specified date, irrespective of the date on the excel sheet name?
The problem is here:
df= df[df['Date'].ge("27-12-2020")]
.ge means "greater than or equal to", so when you put in 26-12-2020 you get both 26-12-2020 and 27-12-2020. Try using .eq instead:
df= df[df['Date'].eq("26-12-2020")]
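One caveat worth sketching: .eq compares full timestamps, so if the Date column carries a time of day (an assumption here, since the spreadsheet contents are not shown), only rows at exactly midnight would match. Normalising each value to midnight before comparing sidesteps that. The sample rows below are made up for illustration:

```python
import pandas as pd

# Hypothetical feedback rows with a time-of-day component
df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Date": pd.to_datetime([
        "2020-12-26 09:15:00",
        "2020-12-26 18:40:00",
        "2020-12-27 10:05:00",
    ]),
})

# An explicit Timestamp avoids any day-first / month-first ambiguity in a date string
target = pd.Timestamp(year=2020, month=12, day=26)

# Compare on the calendar day only: normalize() sets each time to 00:00:00
same_day = df[df["Date"].dt.normalize() == target]

# Plain .eq matches nothing here, because no row is exactly at midnight
exact = df[df["Date"].eq(target)]

print(same_day)
```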

tzinfo in Pandas and datetime seems to be different. Is there a workaround?

I am trying to find the time difference between two datetimes. One is created with datetime and the other is read from a CSV file into a dataframe.
The CSV file:
,Timestamp,Value
1,2020-04-21 00:46:23,24.965867802122457
Actual code:
import pandas as pd
import numpy as np
from datetime import datetime, timezone
EPOCH = datetime.utcfromtimestamp(0).replace(tzinfo=timezone.utc)
df = pd.read_csv('./Out/bottom_clamp_pressure.csv', index_col = 0, header = 0)
df['Timestamp'] = df['Timestamp'].apply(pd.to_datetime, utc = True)
print(EPOCH)
print(df.loc[1, 'Timestamp'])
# Output:
# 1970-01-01 00:00:00+00:00
# 2020-04-21 00:46:23+00:00
print(EPOCH.tzinfo)
print(df.loc[1, 'Timestamp'].tzinfo)
# Output:
# UTC
# UTC
print(EPOCH.tzinfo == df.loc[1, 'Timestamp'].tzinfo)
# Output:
# False
print(df.loc[1, 'Timestamp'] - EPOCH)
# Output:
# TypeError: Timestamp subtraction must have the same timezones or no timezones
As you can see in the output above, both dates seem to have the UTC timezone, yet the two timezone objects do not compare equal and subtracting the dates does not work. Is there a workaround that would let me get the subtraction result?
Thanks!
In the pandas version used here, UTC is represented by pytz's timezone object, which does not compare equal to the one used by the datetime module from the Python standard library:
from datetime import datetime, timezone
import pandas as pd
import pytz
s = '2020-04-21 00:46:23'
t = pd.to_datetime(s, utc=True)
t.tzinfo
# <UTC>
d = datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
d.tzinfo
# datetime.timezone.utc
t.tzinfo == d.tzinfo
# False
d = d.replace(tzinfo=pytz.utc)
t.tzinfo == d.tzinfo
# True
So a solution could be to use
EPOCH = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
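An alternative sketch that avoids mixing timezone implementations altogether: build the epoch as a pandas Timestamp too, so both operands get their tzinfo from pandas. The variable names are illustrative, and the sample value is the one from the CSV above:

```python
import pandas as pd

# Both values are created by pandas, so their UTC tzinfo objects are consistent
epoch = pd.Timestamp("1970-01-01", tz="UTC")
ts = pd.to_datetime("2020-04-21 00:46:23", utc=True)

# Subtraction yields a pandas Timedelta
delta = ts - epoch
seconds = delta.total_seconds()

print(delta)
print(seconds)
```

This should not depend on which UTC object a given pandas release uses internally, since no standard-library tzinfo is involved.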

Date difference in hours (Excel data import)?

I need to calculate the difference in hours between two dates (format: year-month-dayTHH:MM:SS; I could also transform the data to year-month-day HH:MM:SS) from a huge Excel file. What is the most efficient way to do it in Python? I have tried a datetime/time object (TypeError: expected string or buffer), Timestamp (ValueError) and DataFrame (does not give the result in hours).
Excel File:
Order_Date Received_Customer Column3
2000-10-06T13:00:58 2000-11-06T13:00:58 1
2000-10-21T15:40:15 2000-12-27T10:09:29 2
2000-10-23T10:09:29 2000-10-26T10:09:29 3
..... ....
datetime/time object code (TypeError: expected string or buffer):
import pandas as pd
import time as t
data=pd.read_excel('/path/file.xlsx')
s1 = (data,['Order_Date'])
s2 = (data,['Received_Customer'])
s1Time = t.strptime(s1, "%Y:%m:%d:%H:%M:%S")
s2Time = t.strptime(s2, "%Y:%m:%d:%H:%M:%S")
deltaInHours = (t.mktime(s2Time) - t.mktime(s1Time))
print deltaInHours, "hours"
Timestamp (ValueError) code:
import pandas as pd
import datetime as dt
data=pd.read_excel('/path/file.xlsx')
df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df.to = [pd.Timestamp('Order_Date')]
df.fr = [pd.Timestamp('Received_Customer')]
(df.fr-df.to).astype('timedelta64[h]')
DataFrame (does not return the desired result)
import pandas as pd
data=pd.read_excel('/path/file.xlsx')
df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['Received_Customer'] = pd.to_datetime(df['Received_Customer'])
answer = df.dropna()['Order_Date'] - df.dropna()['Received_Customer']
answer.astype('timedelta64[h]')
print(answer)
Output:
0 24 days 16:38:07
1 0 days 00:00:00
2 20 days 12:39:52
dtype: timedelta64[ns]
Should be something like this:
0 592 hour
1 0 hour
2 492 hour
Is there another way to convert timedelta64[ns] into hours than answer.astype('timedelta64[h]')?
In each of your attempts you mixed up data types and methods. I do not have the time to explain each mistake explicitly, but I want to help by providing a (probably non-optimal) solution.
I built the solution from your previous attempts and combined it with knowledge from other questions such as:
Convert a timedelta to days, hours and minutes
Get total number of hours from a Pandas Timedelta?
Note that I used Python 3. I hope my solution points you in the right direction. Here it is:
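import pandas as pd
import numpy as np
# read the sheet; the variable name must match the one used below
data = pd.read_excel('C:\\Users\\nrieble\\Desktop\\check.xlsx',header=0)
# parse both columns, skipping cells too short to hold a date
start = [pd.to_datetime(e) for e in data['Order_Date'] if len(str(e))>4]
end = [pd.to_datetime(e) for e in data['Received_Customer'] if len(str(e))>4]
# element-wise difference: received minus ordered
delta = np.asarray(end)-np.asarray(start)
# dividing by a one-hour timedelta gives the duration in hours as a float
deltainhours = [e/np.timedelta64(1, 'h') for e in delta]
print (deltainhours, "hours")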
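For comparison, a vectorised sketch that stays inside pandas and does not rely on astype('timedelta64[h]'). It uses the three sample rows from the question in place of the Excel file; the hours and whole_hours column names are made up for illustration:

```python
import pandas as pd

# Sample rows from the question, standing in for pd.read_excel(...)
df = pd.DataFrame({
    "Order_Date": ["2000-10-06T13:00:58", "2000-10-21T15:40:15", "2000-10-23T10:09:29"],
    "Received_Customer": ["2000-11-06T13:00:58", "2000-12-27T10:09:29", "2000-10-26T10:09:29"],
})
df["Order_Date"] = pd.to_datetime(df["Order_Date"])
df["Received_Customer"] = pd.to_datetime(df["Received_Customer"])

# Subtracting two datetime columns gives a timedelta64[ns] Series
elapsed = df["Received_Customer"] - df["Order_Date"]

# Dividing by a one-hour Timedelta gives fractional hours as floats
df["hours"] = elapsed / pd.Timedelta(hours=1)

# Floor division gives whole hours as integers
df["whole_hours"] = elapsed // pd.Timedelta(hours=1)

print(df[["hours", "whole_hours"]])
```

Operating on whole columns avoids the Python-level list comprehensions, which may matter for a large file.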
