I want to integrate my code with a Python API. This is my code so far:
# Install required library
!pip install xlrd
import pandas as pd
from datetime import time, timedelta, datetime
import openpyxl
import math
!pip install pytanggalmerah
from pytanggalmerah import TanggalMerah
# Mount google drive
from google.colab import drive
drive.mount('/content/drive')
# Read the Excel file
path = '/content/drive/MyDrive/Colab Notebooks/Book2.xls'
df = pd.read_excel(path)
# Convert the 'Tgl/Waktu' column to datetime format
df['Tgl/Waktu'] = pd.to_datetime(df['Tgl/Waktu'])
# Extract the date and time from the 'Tgl/Waktu' column
df['Date'] = df['Tgl/Waktu'].dt.date
a = df['Date'].drop_duplicates()
print(a)
With that code, the output is:
0 2022-12-17
2 2022-12-19
4 2022-12-20
6 2022-12-21
8 2022-12-22
10 2022-12-23
Name: Date, dtype: object
For the API I will use pytanggalmerah, which needs its input in this form:
t.set_date("2019", "02", "05") #the order is Year, Month, Date
t.check()
How do I convert my date objects into strings, and then loop over those strings to check whether each one returns true or false?
How do I integrate the two?
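You can use a list comprehension:
# .dt.date yields datetime.date objects, which expose .year, .month and .day
dates = [(str(x.year), str(x.month), str(x.day)) for x in df["Tgl/Waktu"].dt.date.unique().tolist()]
t = TanggalMerah()
for date in dates:
    year, month, day = date
    t.set_date(year, month, day)
    print(date, t.check())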
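The example call in the question passes zero-padded strings ("02", "05"), while str(x.month) gives "2" for February. Whether pytanggalmerah needs the padding is an assumption here, but strftime produces the padded form either way. A minimal sketch using only the standard library; to_ymd_strings is a hypothetical helper name, and the sample dates are copied from the output printed in the question:

```python
from datetime import date


def to_ymd_strings(d):
    """Return (year, month, day) as zero-padded strings, e.g. ("2019", "02", "05")."""
    return d.strftime("%Y"), d.strftime("%m"), d.strftime("%d")


# Sample values copied from the printed output in the question.
unique_dates = [date(2022, 12, 17), date(2022, 12, 19), date(2022, 12, 20)]

for d in unique_dates:
    year, month, day = to_ymd_strings(d)
    # With the real library, t.set_date(year, month, day) and t.check() would go here.
    print(year, month, day)
```

With the DataFrame from the question, the same helper could be applied to each value of df["Date"].drop_duplicates().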
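For the follow-up question about an is_holiday column:
import numpy as np
# holiday_data is not defined in this answer; it is expected to hold "Year", "Month" and "Date" columns
holidays = pd.DataFrame(holiday_data).rename(columns={"Date": "Day"})
cols = ["Year", "Month", "Day"]
holidays = holidays.assign(Date=pd.to_datetime(holidays[cols]).dt.date).drop(columns=cols)
# Compare calendar dates with calendar dates; a timestamp with a time of day never equals a plain date
df["is_holiday"] = df["Tgl/Waktu"].dt.date.isin(holidays["Date"].to_list())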
print(df)
I want to transform a date in the format "2022-03-17T19:38:48.331000Z"
to find out whether it gives me valuable information.
import numpy as np
import pandas as pd
import requests, json
from pandas import json_normalize
from datetime import datetime
from datetime import timezone
!pip3 install zulu
input: column_timestamp
id timestamp
ed25291d0f5edd91615d154f243f82f9 2022-03-18T07:33:36.882000Z
e02c5db9e6f6fca078798c9b2d486a81 2022-03-18T07:33:36.945000Z
f8756b6af18c2fedd8a295040279aecc 2022-03-18T07:33:37.549000Z
...
import zulu  # installed above with pip3
# Parse every ISO 8601 timestamp in the column; the trailing "Z" means UTC
time = []
for ts in column_timestamp["timestamp"]:
    dt = zulu.parse(ts)
    time.append(dt)
time_df = pd.DataFrame(time)
time_df
output:
0
0 2022-03-18 07:33:36.882000+00:00
1 2022-03-18 07:33:36.945000+00:00
2 2022-03-18 07:33:37.549000+00:00
3 2022-03-18 07:33:37.550000+00:00
4 2022-03-18 07:33:37.552000+00:00
... ...
I want to know whether this is correct, and I also want to split this DataFrame into separate columns:
Date
Hour
Minute
Seconds
I also want to make sure I am converting this format correctly:
'2022-03-18T07:33:36.746000Z'
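One possible way to do both steps with pandas alone, without zulu, is sketched below. It assumes the timestamps are ISO 8601 strings ending in "Z" as shown in the question; the sample rows are copied from there, and the output column names follow the list above.

```python
import pandas as pd

# Sample rows copied from the question.
column_timestamp = pd.DataFrame({
    "id": [
        "ed25291d0f5edd91615d154f243f82f9",
        "e02c5db9e6f6fca078798c9b2d486a81",
        "f8756b6af18c2fedd8a295040279aecc",
    ],
    "timestamp": [
        "2022-03-18T07:33:36.882000Z",
        "2022-03-18T07:33:36.945000Z",
        "2022-03-18T07:33:37.549000Z",
    ],
})

# utc=True keeps the meaning of the trailing "Z" (UTC) in the parsed values.
parsed = pd.to_datetime(column_timestamp["timestamp"], utc=True)

# The .dt accessor exposes each component of the parsed timestamps.
time_df = pd.DataFrame({
    "Date": parsed.dt.date,
    "Hour": parsed.dt.hour,
    "Minute": parsed.dt.minute,
    "Seconds": parsed.dt.second,
})
print(time_df)
```

Keeping the parsed column timezone-aware, rather than formatting it to text first, leaves the values usable for sorting and arithmetic later.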
In the code below, I am trying to get data for one specified date only.
It works perfectly with the code as shown.
But if I change the date to 26-12-2020, I get the data for both 26-12-2020 and 27-12-2020.
import csv
import datetime
import os
import pandas as pd
import xlsxwriter
import numpy as np
from datetime import date
import calendar
rdate = "27-12-2020"  # quoted: unquoted, 27-12-2020 is a subtraction that evaluates to -2005
data= pd.read_excel(r'C:/Clover Workspace/NPS/Customer Feedback-28-12-2020.xlsx')
data.drop(columns=['User ID','Comments','Purpose ID'],inplace= True, axis=1)
df = pd.DataFrame(data, columns=['Name','Rating','Date','Store','Feedback choice'])
df['Date'] = pd.to_datetime(data['Date'])
df= df[df['Date'].ge("27-12-2020")]
How can I generate the output only for the specified date, irrespective of the date on the excel sheet name?
The problem is this line:
df= df[df['Date'].ge("27-12-2020")]
.ge means "greater than or equal to", so when you pass 26-12-2020 you get both days. Try .eq instead:
df= df[df['Date'].eq("26-12-2020")]
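Note that .eq compares full timestamps, so it only matches rows whose time is exactly midnight. If the Date column also carries a time of day (an assumption, since the Excel file is not shown), normalizing the column first compares by calendar day. A minimal sketch with made-up sample rows:

```python
import pandas as pd

# Made-up rows standing in for the Excel data, including a time of day.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Date": pd.to_datetime([
        "2020-12-26 09:15:00",
        "2020-12-26 18:40:00",
        "2020-12-27 10:05:00",
    ]),
})

# Building the target date from explicit parts avoids day-first/month-first ambiguity.
rdate = pd.Timestamp(year=2020, month=12, day=26)

# normalize() resets the time part to midnight, so rows are compared by day only.
one_day = df[df["Date"].dt.normalize() == rdate]
print(one_day)
```

Because rdate is set independently, the result does not depend on the date in the Excel file name.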
I am trying to find the time difference between two datetimes. One is created with datetime and the other is read from a CSV file into a DataFrame.
The CSV file:
,Timestamp,Value
1,2020-04-21 00:46:23,24.965867802122457
Actual code:
import pandas as pd
import numpy as np
from datetime import datetime, timezone
EPOCH = datetime.utcfromtimestamp(0).replace(tzinfo=timezone.utc)
df = pd.read_csv('./Out/bottom_clamp_pressure.csv', index_col = 0, header = 0)
df['Timestamp'] = df['Timestamp'].apply(pd.to_datetime, utc = True)
print(EPOCH)
print(df.loc[1, 'Timestamp'])
# Output:
# 1970-01-01 00:00:00+00:00
# 2020-04-21 00:46:23+00:00
print(EPOCH.tzinfo)
print(df.loc[1, 'Timestamp'].tzinfo)
# Output:
# UTC
# UTC
print(EPOCH.tzinfo == df.loc[1, 'Timestamp'].tzinfo)
# Output:
# False
print(df.loc[1, 'Timestamp'] - EPOCH)
# Output:
# TypeError: Timestamp subtraction must have the same timezones or no timezones
As you can see in the output above, both dates appear to be in the UTC timezone, yet the two timezone objects do not compare equal and the subtraction fails. Is there a workaround that lets me get the subtraction result?
Thanks!
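pandas uses pytz's timezone model for UTC (at least in the versions that raise this error; newer releases may use datetime.timezone.utc instead), and pytz's UTC object does not compare equal to the one used by the datetime module from the Python standard library: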
from datetime import datetime, timezone
import pandas as pd
import pytz
s = '2020-04-21 00:46:23'
t = pd.to_datetime(s, utc=True)
t.tzinfo
# <UTC>
d = datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
d.tzinfo
# datetime.timezone.utc
t.tzinfo == d.tzinfo
# False
d = d.replace(tzinfo=pytz.utc)
t.tzinfo == d.tzinfo
# True
So a solution could be to use
EPOCH = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
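Another option is to build the epoch with pandas itself, so both operands get their UTC object from the same library. This is a sketch of that alternative rather than part of the original answer; the timestamp is the one from the CSV sample in the question.

```python
import pandas as pd

# Both values are created by pandas, so they share the same UTC timezone object.
EPOCH = pd.Timestamp("1970-01-01", tz="UTC")
ts = pd.to_datetime("2020-04-21 00:46:23", utc=True)

delta = ts - EPOCH  # a pandas Timedelta
print(delta)
print(delta.total_seconds())
```

The same EPOCH can be subtracted from the whole column at once, for example df['Timestamp'] - EPOCH, once the column has been parsed with utc=True.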
I have the date 2019-11-20, which falls in week 47 of the calendar year. My Excel document says the same. However, when I compute it in Python I get week 46. My code is below, but I cannot see what is wrong with it. I tried splitting the column into separate date and time columns, but I still get the same result. The local time on my laptop is correct. Thanks in advance for your help!
Here is my code:
import pandas as pd
from datetime import datetime
import numpy as np
import re
df = pd.read_csv (r'C:\Users\user\document.csv')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['startedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['startedAt'] = df['startedAt'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M:%S'))
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+01:00',value=r'')
df['endedAt'].replace(regex=True,inplace=True,to_replace=r'\+02:00',value=r'')
df['endedAt'] = pd.to_datetime(df['endedAt'], format='%Y-%m-%d')
df['startedAt'] = pd.to_datetime(df['startedAt'])
df['Date_started'] = df['startedAt'].dt.strftime('%d/%m/%Y')
df['Time_started'] = df['startedAt'].dt.strftime('%H:%M:%S')
df['Date_started'] = pd.to_datetime(df['Date_started'], errors='coerce')
df['week'] = df['Date_started'].dt.strftime('%U')
print(df)
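A likely cause is the '%U' directive: it counts weeks starting on Sunday and puts the days before the first Sunday of the year into week 00, whereas ISO 8601 weeks start on Monday and are numbered from 01. The sketch below shows both numberings for the date in the question using only the standard library; that Excel is reporting the ISO-style number here is an assumption.

```python
from datetime import date

d = date(2019, 11, 20)

# %U: Sunday-based week of the year; days before the first Sunday are week 00.
week_sunday_based = d.strftime("%U")

# ISO 8601 week number: weeks start on Monday, week 1 contains the first Thursday.
week_iso = d.isocalendar()[1]

print(week_sunday_based, week_iso)
```

In pandas, the ISO week of a datetime column is available through Series.dt.isocalendar().week, which could replace the strftime('%U') call.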
I am using the following code to generate a date series:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import calendar
from datetime import datetime
from itertools import cycle, islice
month_input = "Jan"
year_input = 2018
month_start= str(month_input)
year_start = int(year_input)
start = pd.to_datetime(f'{month_start}{year_start}', format='%b%Y')
end = pd.to_datetime(f'{month_input}{year_start + 1}', format='%b%Y') - pd.Timedelta('1d') # Generating Date Range for an Year
daily_series_cal = pd.DataFrame({'Date': pd.date_range(start, end)})
When I run:
print(daily_series_cal["Date"][0])
the output is:
2018-01-01 00:00:00
How can I change the format of the whole column to 01/01/2018, i.e. mm/dd/yyyy?
This is possible with DatetimeIndex.strftime, but the values are then strings, not datetimes:
daily_series_cal = pd.DataFrame({'Date': pd.date_range(start, end).strftime('%m/%d/%Y')})
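If the datetimes are still needed for filtering or arithmetic, one option is to keep the original column and put the formatted text in a second one. A minimal sketch; the column name Date_str is a hypothetical choice, and the start and end values mirror the question's inputs:

```python
import pandas as pd

# Same range as in the question: every day of 2018.
start = pd.to_datetime("Jan2018", format="%b%Y")
end = pd.to_datetime("Jan2019", format="%b%Y") - pd.Timedelta("1d")

daily_series_cal = pd.DataFrame({"Date": pd.date_range(start, end)})

# "Date" stays datetime64; "Date_str" holds the mm/dd/yyyy text for display.
daily_series_cal["Date_str"] = daily_series_cal["Date"].dt.strftime("%m/%d/%Y")
print(daily_series_cal.head())
```

The string column is for presentation only; sorting or date comparisons should still use the Date column.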