Expanding Records Based On Date Range Pandas - python

I am attempting to expand the records in a data frame between two dates. Given an input file with a single entry for each record, I want to expand each record into multiple rows based on its exposure date range.
Here is an example of the input:
Here is an example of the desired expanded output:
Based on other examples and documentation online, I expanded the data frame on a 6-month frequency to get two records for each year, then corrected the dates around each record's birthday, using a counter to tell the before-birthday row from the after-birthday row.
df_expand['DATE'] = [pd.date_range(s, e, freq='6M') for s, e in
zip(pd.to_datetime(df_expand['Exposure Start']),
pd.to_datetime(df_expand['Exposure Stop']))]
df_expand = df_expand.explode('DATE').drop(['Exposure Start', 'Exposure Stop'], axis=1)
df_merged['counter'] = range(len(df_merged))
df_merged['start end'] = np.where(df_merged['counter'] % 2 != 0, 1, 0)
df_merged['DoB Year'] = df_merged['DoB'].dt.year
df_merged['DoB Month'] = df_merged['DoB'].dt.month
df_merged['DoB Day'] = df_merged['DoB'].dt.day
df_merged.loc[df_merged['start end'] == 0, 'Exposure Start'] = '1/1/'+ df_merged['Calendar Year'].astype(str)
df_merged.loc[df_merged['start end'] == 1, 'Exposure Start'] = df_merged['DoB Month'].astype(str) + '/' + (df_merged['DoB Day'].astype(int)+1).astype(str) + '/' + df_merged['Calendar Year'].astype(str)
df_merged.loc[df_merged['start end'] == 0, 'Exposure Stop'] = df_merged['DoB Month'].astype(str) + '/' + df_merged['DoB Day'].astype(str) + '/' + df_merged['Calendar Year'].astype(str)
df_merged.loc[df_merged['start end'] == 1, 'Exposure Stop'] = '12/31/'+ df_merged['Calendar Year'].astype(str)
This solution is clearly not elegant, and while it worked originally for my proof of concept, it is now running into issues with edge cases involving rules for the Exposure Start.
Study years are split into 2 separate periods, around the record's birthday.
The initial exposure begins 1/1 of the study year (or, the date that the record enters the study, whichever comes later) and goes through the day before the birthday (or non-death exit date, if that comes sooner).
The 2nd period goes from the birthday to the end of the calendar year (or non-death exit date, if that comes sooner). Where a death is observed, exposure is continued through the next birthday.
An iterative solution is probably better suited, but this was the documentation and guidance I received.
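The rules above can be read as a small iterative routine. The following is only a sketch in plain Python under stated assumptions: `split_exposure` and `birthday_in` are hypothetical names, a 29 February birthday is mapped to 28 February in non-leap years, and the death rule (continuing exposure through the next birthday) is not handled.

```python
from datetime import date, timedelta


def birthday_in(year, dob):
    """Return the birthday falling in `year`.

    Assumption: a 29 February birthday is treated as 28 February
    in non-leap years.
    """
    try:
        return dob.replace(year=year)
    except ValueError:
        return date(year, 2, 28)


def split_exposure(dob, start, stop):
    """Split [start, stop] into per-year periods around the birthday.

    Returns a list of (period_start, period_end) tuples, both inclusive.
    The death extension described in the rules is NOT implemented here.
    """
    periods = []
    for year in range(start.year, stop.year + 1):
        bday = birthday_in(year, dob)
        # Clip the calendar year to the record's own entry and exit dates.
        year_start = max(date(year, 1, 1), start)
        year_end = min(date(year, 12, 31), stop)

        # First period: through the day before the birthday.
        first_end = min(bday - timedelta(days=1), year_end)
        if year_start <= first_end:
            periods.append((year_start, first_end))

        # Second period: from the birthday to the end of the year.
        second_start = max(bday, year_start)
        if second_start <= year_end:
            periods.append((second_start, year_end))
    return periods
```

Each tuple could then become one row of the expanded data frame, for example by building a list of dictionaries per record and passing it to `pd.DataFrame`.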

df_merged = pd.read_excel("inputdatawithtestcase.xlsx")
df_merged['DATE'] = [pd.date_range(s, e, freq='6M') for s, e in
zip(pd.to_datetime(df_merged['Exposure Start']),
pd.to_datetime(df_merged['Exposure Stop']))]
df_merged = df_merged.explode('DATE')
df_merged['counter'] = range(len(df_merged))
df_merged['start end'] = np.where(df_merged['counter'] % 2 != 0, 1, 0)
df_merged['DoB Year'] = df_merged['DoB'].dt.year
df_merged['DoB Month'] = df_merged['DoB'].dt.month
df_merged['DoB Day'] = df_merged['DoB'].dt.day
df_merged = df_merged.reset_index()
df_merged["Exposure Start month"] = df_merged["Exposure Start"].dt.month
df_merged["Exposure Start day"] = df_merged["Exposure Start"].dt.day
df_merged["new_perfect_year"] = df_merged["DATE"].dt.year
df_merged["start end"].loc[3]
Last_column = []
second_last_column = []
for a in range(len(df_merged)):
if a>=1:
if df_merged["DoB Year"].loc[a] != match_date:
count = 0
if (df_merged["Exposure Start day"].loc[a] == 1) & (df_merged["Exposure Start month"].loc[a] == 1):
if df_merged["Exposure Start day"].loc[a] == 1:
if df_merged["start end"].loc[a]== 0:
date = str(df_merged['Record ID'].loc[a]) + '/1/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
else:
date = str(df_merged['Record ID'].loc[a]) + '/16/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
else:
if df_merged["start end"].loc[a]== 0:
date = str(df_merged['Exposure Start day'].loc[a]) + '/1/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
else:
date = str(df_merged['Record ID'].loc[a]) + '/16/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
elif count == 0:
date = str(df_merged['Exposure Start month'].loc[a]) + "/" +str(df_merged['Exposure Start day'].loc[a]) + "/" + str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
count = count + 1
else:
if df_merged["Exposure Start day"].loc[a] == 1:
if df_merged["start end"].loc[a]== 0:
date = str(df_merged['Record ID'].loc[a]) + '/16/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
else:
date = str(df_merged['Record ID'].loc[a]) + '/1/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
else:
if df_merged["start end"].loc[a]== 0:
date = str(df_merged['Record ID'].loc[a]) + '/16/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
else:
date = str(df_merged['Record ID'].loc[a]) + '/1/'+ str(df_merged['new_perfect_year'].loc[a])
Last_column.append(date)
match_date = df_merged["DoB Year"].loc[a]
for a in range(len(df_merged)):
if (df_merged["Exposure Start day"].loc[a] == 1) & (df_merged["Exposure Start month"].loc[a] == 1):
if df_merged["Exposure Start day"].loc[a] == 1:
if df_merged["start end"].loc[a]== 0:
date = str(df_merged['Record ID'].loc[a]) + '/15/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
date = '12' + '/31/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
if df_merged["start end"].loc[a]== 0:
date = str(df_merged['Exposure Start day'].loc[a]) + '/15/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
date = '12' + '/31/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
if df_merged["Exposure Start day"].loc[a] == 1:
if df_merged["start end"].loc[a]== 0:
date = '12' + '/31/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
date = str(df_merged['Record ID'].loc[a]) + '/15/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
if df_merged["start end"].loc[a]== 0:
date = '12' + '/31/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
else:
date = str(df_merged['Record ID'].loc[a]) + '/15/'+ str(df_merged['new_perfect_year'].loc[a])
second_last_column.append(date)
match_date = df_merged["DoB Year"].loc[a]
last = pd.DataFrame(Last_column, columns = ["Last column"])
last_2 = pd.DataFrame(second_last_column, columns = ["Second Last column"])
final_df = pd.concat([df_merged, last], axis = 1)
final_df = pd.concat([final_df, last_2], axis = 1)
final_df
final_df = final_df[["Record ID", "DoB", "Exposure Start", "Last column", "Second Last column"]]
final_df.to_csv("name_final_this_first.csv")

Related

Advice on combining or coordinating multiple python scripts

I am still somewhat of a Python novice and am working on a raspberrypi based project at the moment.
I have successfully created and tested two (fairly simple) scripts which work well independently.
Script 1: continually scans for specific BLE devices and decodes the manufacturer data from the advert.
Script 2: continually reads data from a html page and decodes.
The raspberrypi passes all of the relevant information onto an HMI via a RS232 serial link. The HMI is limited in that it needs to receive all of the information from both scripts in the same message, so I need to repeatedly send a data string containing all of the info. Script 2 contributes the first two parts of the string; script 1 contributes the remaining 20 parts.
As you can see, the scripts currently have a small section at the end which sends the serial data from that script.
As I want to prevent too much lag between the info being received and processed and cannot predict when the data will be received, I don't think combining into a single large file will work as I will end up potentially missing data (e.g. the BLE data being sent). So I am assuming I need to run both scripts in the background and keep updating the relevant variables internally, then running a third script which reads them and collates them and sends the serial data?
Script 1:
`#! /usr/bin/python3 -u
import serial
import struct
from time import *
from bluepy.btle import Scanner
ser = serial.Serial('/dev/ttyS0',9600) #define serial port (PiZero onboard UART)
coding = "Windows-1252"
SENSORS = {"80:ea:ca:12:23:0b": "Front Left Tyre:" ,"81:ea:ca:22:20:f7" : "Front Right Tyre:", "82:ea:ca:32:24:87" : "Rear Left Tyre:", "83:ea:ca:42:23:07" : "Rear Right Tyre:"}
scanner = Scanner()
#set starting values for all output variables
pres_fl = pres_fr = pres_rl = pres_rr = ""
temp_fl = temp_fr = temp_rl = temp_rr = ""
batt_fl = batt_fr = batt_rl = batt_rr = ""
status_fl = status_fr = status_rl = status_rr = "NONE"
flat_fl = flat_fr = flat_rl = flat_rr = "00"
front_min_pres = 30 # minimum pressure front psi
rear_min_pres = 30 # minimum pressure rear psi
max_temp = 50 # maximum temperature degC
min_batt = 30 # minimum battery level %
max_loss = 5
fl_count = 0
fr_count = 0
rl_count = 0
rr_count = 0
def sort_data(input_data):
#read the relevant bytes from the array to each parameter
id_byte = input_data[2:8]
pres_byte = input_data[8:12]
temp_byte = input_data[12:16]
batt_byte = input_data[16]
flat_byte = input_data[17]
#join the elements of each of the sub-arrays into a string
id_str = ''.join(id_byte)
pres_str = ''.join(pres_byte)
temp_str = ''.join(temp_byte)
#convert strings into bytes from hex
pres_hex = bytes.fromhex(pres_str)
temp_hex = bytes.fromhex(temp_str)
#convert each bytes into integers
pres_int = int.from_bytes(pres_hex,'little')
temp_int = int.from_bytes(temp_hex,'little', signed=True)
batt_int = int(batt_byte, 16)
#convert into strings and do necessary unit conversions
pres = str(round(((pres_int/100000)*14.5),1))
temp = str(round((temp_int/100),1))
batt = str(batt_int)
return (id_str , pres , temp , batt , flat_byte)
def check(input_pres,input_temp,input_batt):
input_pres = float(input_pres)
input_temp = float(input_temp)
input_batt = float(input_batt)
if "Front" in device_location:
if input_pres <= front_min_pres:
status = "LOW PRESSURE"
elif input_temp >= max_temp:
status = "HIGH TEMP"
elif input_batt <= min_batt:
status = "LOW BATTERY"
else:
status = "OKAY"
elif "Rear" in device_location:
if input_pres <= rear_min_pres:
status = "LOW PRESSURE"
elif input_temp >= max_temp:
status = "HIGH TEMP"
elif input_batt <= min_batt:
status = "LOW BATTERY"
else:
status = "OKAY"
return status
while True:
scan_entries = scanner.scan(2.0)
#Scan for listed BLE devices and retrieve manufacturer info
for scan_entry in scan_entries:
if scan_entry.addr in SENSORS:
device_location = SENSORS[scan_entry.addr]
manufacturer_hex = next(value for _, desc, value in
scan_entry.getScanData() if desc == 'Manufacturer')
manufacturer_bytes = bytes.fromhex(manufacturer_hex)
# Move manufacturer info into an array of x00 chunks
chunk_length = 2
chunks = [manufacturer_hex[i:i+chunk_length] for i in range(0, len(manufacturer_hex), chunk_length)] # combine data into an array
id_str , pres , temp , batt , flat_byte = sort_data(chunks)
if flat_byte == "00":
if id_str == "80eaca12230b":
fl_count = 0
position = "FL Tyre"
pres_fl = pres
temp_fl = temp
batt_fl = batt
status_fl = check(pres,temp,batt)
flat_fl = flat_byte
fr_count = fr_count + 1
if fr_count >= max_loss:
status_fr = "LOST"
rl_count = rl_count + 1
if rl_count >= max_loss:
status_rl = "LOST"
rr_count = rr_count + 1
if rr_count >= max_loss:
status_rr = "LOST"
elif id_str == "81eaca2220f7":
fr_count = 0
position = "FR Tyre"
pres_fr = pres
temp_fr = temp
batt_fr = batt
status_fr = check(pres,temp,batt)
flat_fr = flat_byte
fl_count = fl_count + 1
if fl_count >= max_loss:
status_fl = "LOST"
rl_count = rl_count + 1
if rl_count >= max_loss:
status_rl = "LOST"
rr_count = rr_count + 1
if rr_count >= max_loss:
status_rr = "LOST"
elif id_str == "82eaca322487":
rl_count = 0
position = "RL Tyre"
pres_rl = pres
temp_rl = temp
batt_rl = batt
status_rl = check(pres,temp,batt)
flat_rl = flat_byte
fl_count = fl_count + 1
if fl_count >= max_loss:
status_fl = "LOST"
fr_count = fr_count + 1
if fr_count >= max_loss:
status_fr = "LOST"
rr_count = rr_count + 1
if rr_count >= max_loss:
status_rr = "LOST"
elif id_str == "83eaca422307":
rr_count = 0
position = "RR Tyre"
pres_rr = pres
temp_rr = temp
batt_rr = batt
status_rr = check(pres,temp,batt)
flat_rr = flat_byte
fl_count = fl_count + 1
if fl_count >= max_loss:
status_fl = "LOST"
fr_count = fr_count + 1
if fr_count >= max_loss:
status_fr = "LOST"
rl_count = rl_count + 1
if rl_count >= max_loss:
status_rl = "LOST"
trip_ser = bytes("TRIP NAME," + "TRIP VALUE,", coding)
pres_ser = bytes(pres_fl + "," + pres_fr + "," + pres_rl + "," + pres_rr + ",", coding)
temp_ser = bytes(temp_fl + "," + temp_fr + "," + temp_rl + "," + temp_rr + ",", coding)
batt_ser = bytes(batt_fl + "," + batt_fr + "," + batt_rl + "," + batt_rr + ",", coding)
flat_ser = bytes(flat_fl + "," + flat_fr + "," + flat_rl + "," + flat_rr + ",", coding)
status_ser = bytes(status_fl + "," + status_fr + "," + status_rl + "," + status_rr, coding)
terminator = bytes("\r", coding)
textstr = trip_ser + pres_ser + temp_ser + batt_ser + flat_ser + status_ser + terminator
ser.write(textstr)
print(textstr.decode(coding))
print("FL:" + str(fl_count))
print("FR:" + str(fr_count))
print("RL:" + str(rl_count))
print("RR:" + str(rr_count))
sleep(0)
else:
status = "PUNCTURE"
if id_str == "80eaca12230b":
position = "FL Tyre"
pres_fl = pres
temp_fl = temp
batt_fl = batt
status_fl = status
flat_fl = flat_byte
elif id_str == "81eaca2220f7":
position = "FR Tyre"
pres_fr = pres
temp_fr = temp
batt_fr = batt
status_fr = status
flat_fr = flat_byte
elif id_str == "82eaca322487":
position = "RL Tyre"
pres_rl = pres
temp_rl = temp
batt_rl = batt
status_rl = status
flat_rl = flat_byte
elif id_str == "83eaca422307":
position = "RR Tyre"
pres_rr = pres
temp_rr = temp
batt_rr = batt
status_rr = status
flat_rr = flat_byte
trip_ser = bytes(",,", coding) # **These two parts need to come from script 2**
pres_ser = bytes(pres_fl + "," + pres_fr + "," + pres_rl + "," + pres_rr + ",", coding)
temp_ser = bytes(temp_fl + "," + temp_fr + "," + temp_rl + "," + temp_rr + ",", coding)
batt_ser = bytes(batt_fl + "," + batt_fr + "," + batt_rl + "," + batt_rr + ",", coding)
flat_ser = bytes(flat_fl + "," + flat_fr + "," + flat_rl + "," + flat_rr + ",", coding)
status_ser = bytes(status_fl + "," + status_fr + "," + status_rl + "," + status_rr + ",", coding)
terminator = bytes("\r", coding)
textstr = trip_ser + pres_ser + temp_ser + batt_ser + flat_ser + status_ser + terminator
ser.write(textstr)
print(textstr.decode(coding))
`
Script 2
#! /usr/bin/python3 -u
import urllib.request , time , serial
ser = serial.Serial('/dev/ttyS0',9600)
#List of characters to remove from the message
bad_chars = ['{' , '":' , ' "' , '}' , '"']
format = "Windows-1252"
#-----------------Functions----------------------------
#Decode the incoming serial datastream
def decode_serial(input1):
output1 = input1.decode(format).replace('\r','').replace('\n','') #decode the data and remove the \r & \n characters
return (output1)
#Convert fuel economy value to mpg
def convert_economy(input2):
input2 = float(input2)
output2 = round((input2 * 282.481),1)
output2 = str(output2)
return (output2)
#Convert speed value to mph
def convert_speed(input3):
input3 = float(input3)
output3 = round((input3 * 0.6214))
output3 = str(output3)
return (output3)
#Convert KPa value to bar
def convert_pres(input5):
input5 = float(input5)
output5 = (input5 / 100)
output5 = str(output5)
return (output5)
#------------------Main Program----------------------------
while True:
# If there is a connection to the host:
try:
data = urllib.request.urlopen("http://192.168.4.1/readVal").read()
#turn data into a string
decode = data.decode()
#remove bad characters from string
for i in bad_chars:
decode = decode.replace(i,'')
#split into list (comma separated)
list = decode.split(",")
#assign to each type
name_raw = list[0]
value_raw = list[1]
unit_raw = list[2]
#remove superfluous characters
name = name_raw.replace("n","", 1)
value = value_raw.replace("v","")
value = value.replace('<br />'," ")
unit = unit_raw.replace("u","")
#Carry out name , value & unit conversion if Turbo Boost is received
if name == 'Turbo boost':
name = 'Turbo Boost'
unit = 'bar'
value = convert_pres(value)
#If no unit is provided then ignore addition of unit
if unit == '':
name = name
#If unit is HTML temperature character then replace with UTF-8 DegC
elif unit == '°C':
subunit = '\u00b0' + "C"
name = name + ' (' + subunit + ')'
#Carry out unit and value conversion if Speed is received
elif unit == 'km/h':
subunit = 'mph'
name = name + ' (' + subunit + ')'
value = convert_speed(value)
#Carry out unit and value conversion if Consumption is received
elif unit == 'l/100':
subunit = 'mpg'
name = name + ' (' + subunit + ')'
value = convert_economy(value)
#Combine name and unit into name string
else:
name = name + ' (' + unit + ')'
# Combine data to send as bytes via RS232 Comms
textstr = bytes(name + ',' + value + "\r",format) #**This is where script 1 would contribute the remaining parts**
ser.write(textstr)
(EXCUSE THE WONKY INDENTING, pasting in my code from notepad++ didn't work smoothly this time)
I am unsure whether I can create another script just for the serial communication which reads the current variables from the two scripts periodically as they internally change or not? I have only previously defined functions within one script and called them from another or just variables which remain fixed (not constantly changing like these).
My approach would be to use multithreading. It sounds scary, but I think it is the cleanest way to run 2 loops at the same time.
I am quite new to Python, so I can't really give you an exact solution, but I hope the idea helps at least a little.
The idea is to run script 1 and script 2 as secondary threads that append their outputs to lists shared with the main thread.
The main thread then combines the information and sends it to the HMI.
This will work if scripts 1 and 2 produce information at the same frequency. If they don't, you can adjust for that in the main loop.
import threading
data_1 = []
data_2 = []
def script_1():
    while True:
        # do something - your script 1
        textstr = b""  # placeholder: build the real textstr here
        data_1.append(textstr)
def script_2():
    while True:
        # do something - your script 2
        textstr = b""  # placeholder: build the real textstr here
        data_2.append(textstr)
# Create threads
thread_1 = threading.Thread(target=script_1)
thread_2 = threading.Thread(target=script_2)
# Start threads (never ending while loops)
thread_1.start()
thread_2.start()
# Main loop
while True:
    if len(data_1) > 0 and len(data_2) > 0:
        # Combine data and send them to HMI
        combined_data = data_1[0] + data_2[0]
        send_to_HMI(combined_data)  # placeholder for your own serial write
        # Remove sent data from lists
        data_1.pop(0)
        data_2.pop(0)
Also this is probably very bad performance-wise. There are better ways to do this, but this is the only one I know about.
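If the two sources update at different rates, a variation on the same idea is to keep only the latest values from each source behind a lock and let a sender build the full message on a timer. This is just a sketch: `LatestValues`, `send_forever` and the `write` callback are hypothetical names, and the field counts (2 from script 2, 20 from script 1) are taken from the question.

```python
import threading


class LatestValues:
    """Thread-safe holder for the most recent fields from each source."""

    def __init__(self):
        self._lock = threading.Lock()
        self._trip = ["", ""]      # two fields from script 2 (name, value)
        self._tyres = [""] * 20    # twenty fields from script 1

    def set_trip(self, name, value):
        with self._lock:
            self._trip = [name, value]

    def set_tyres(self, fields):
        with self._lock:
            self._tyres = list(fields)

    def message(self):
        """Return all 22 fields as one comma-separated, CR-terminated string."""
        with self._lock:
            return ",".join(self._trip + self._tyres) + "\r"


def send_forever(state, write, stop, period=0.5):
    """Call write() with the combined message every `period` seconds.

    `write` stands in for something like ser.write; `stop` is a
    threading.Event that ends the loop when set.
    """
    while not stop.is_set():
        write(state.message().encode("Windows-1252"))
        stop.wait(period)
```

Each scanning loop would call `set_tyres` or `set_trip` whenever it has new data, and a third thread would run `send_forever` with the serial port's write method, so only one place ever writes to the port.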

Why is my openpyxl code slower than my VBA code?

I have an Excel file of nearly 95880 rows. I wrote a VBA function that runs slowly, so I tried writing a Python script using openpyxl, but it's even slower.
It starts fast, then after 600 rows becomes slower and slower.
The VBA Code is
Option Explicit
Function FTE(Assunzione As Date, Cess As Variant, Data)
Dim myDate As Date
Dim EndDate As Date, EndDate2 As Date
Dim check As Integer
EndDate = Application.WorksheetFunction.EoMonth(Assunzione, 0)
myDate = #1/1/2022#
If Cess = 0 Then
Call Check2(Assunzione, Data, myDate, EndDate, check)
FTE = check
Else:
EndDate2 = Application.WorksheetFunction.EoMonth(Cess, -1)
Call Check1(Assunzione, Cess, Data, myDate, EndDate, EndDate2, check)
FTE = check
End If
End Function
Sub Check1(Assunzione, Cess, Data, myDate, EndDate, EndDate2, check)
Dim Cess1 As Date
Dim gg_lav As Integer, gg_lav2 As Integer
Cess1 = Cess.Value
If Assunzione > Date Then
check = 0
Else
If Month(Assunzione) <= Month(Data) And Year(Assunzione) = 2022 Then
If Assunzione > myDate Then
gg_lav = Application.WorksheetFunction.Days(EndDate, Assunzione) + 1
If gg_lav >= 15 Then
If Month(Data) = (Month(EndDate2) + 1) And Year(Cess1) = 2022 Then
gg_lav2 = Application.WorksheetFunction.Days(Cess1, EndDate2)
If gg_lav2 >= 15 Then
check = 1
Else
check = 0
End If
Else
check = 1
End If
Else
check = 0
End If
Else
check = 1
End If
Else
check = 1
End If
End If
End Sub
Sub Check2(Assunzione, Data, myDate, EndDate, check)
Dim gg_lav As Integer
If Assunzione > Date Then
check = 0
Else
If Month(Assunzione) <= Month(Data) And Year(Assunzione) = 2022 Then
If Assunzione > myDate Then
gg_lav = Application.WorksheetFunction.Days(EndDate, Assunzione) + 1
If gg_lav >= 15 Then
check = 1
Else
check = 0
End If
Else
check = 1
End If
Else
check = 1
End If
End If
End Sub
and my openpyxl is:
def check1(a,d,c,i):
if ws.cell(row=i,column=a).value > ws.cell(row=i,column=d).value:
return 0
else:
if ws.cell(row=i,column=a).value.month == ws.cell(row=i,column=d).value.month and ws.cell(row=i,column=a).value.year == 2022:
EndDate = date(ws.cell(row=i,column=a).value.year, ws.cell(row=i,column=a).value.month,
calendar.monthrange(ws.cell(row=i,column=a).value.year,
ws.cell(row=i,column=a).value.month)[1])
gg_lav = (EndDate - datetime.date(ws.cell(row=i,column=a).value)).days
if gg_lav >= 15:
EndDate2 = date(ws.cell(row=i,column=c).value.year,ws.cell(row=i,column=c).value.month-1,
calendar.monthrange(ws.cell(row=i,column=c).value.year,
ws.cell(row=i,column=c).value.month-1)[1])
if ws.cell(row=i,column=d).value.month == EndDate2.month and ws.cell(row=i,column=c).value.year == 2022:
gg_lav2 = (datetime.date(ws.cell(row=i,column=c).value)-EndDate2).days
if gg_lav2 >= 15:
return 1
else:
return 0
else:
return 1
else:
return 0
else:
return 1
def check2(a,d,i):
if ws.cell(row=i,column=a).value > ws.cell(row=i,column=d).value:
return 0
else:
if ws.cell(row=i,column=a).value.month == ws.cell(row=i,column=d).value.month and ws.cell(row=i,column=a).value.year == 2022:
EndDate = date(ws.cell(row=i,column=a).value.year, ws.cell(row=i,column=a).value.month,
calendar.monthrange(ws.cell(row=i,column=a).value.year,
ws.cell(row=i,column=a).value.month)[1])
gg_lav = (EndDate - datetime.date(ws.cell(row=i,column=a).value)).days
if gg_lav >= 15:
return 1
else:
return 0
else:
return 1
wb1 = Workbook()
ws1 = wb1.create_sheet()
for i in range(2,95882):
if ws.cell(row = i, column = c).value is None:
ws1.cell(row = i, column = 1, value = check2(a, d, i))
else:
ws1.cell(row = i, column = 1, value = check1(a, d, c, i))
What am I doing wrong? Should I use another library, or am I making the code needlessly memory-consuming?
Thank you very much for any help!
Update: I think the problem was with openpyxl. First I tried reducing the number of observations from 95K to almost 5K, but it still took two and a half hours to complete the task.
So I used numpy and it took 55 seconds. Yeah, that's the difference in processing speed.
Here is the code:
with open('data.csv','r') as f:
data = list(csv.reader(f,delimiter =';'))
arr = np.array(data)
arr = np.resize(arr,(4797,13))
Of course, I had to change the code in this section:
a = 3
d = 0
c = 4
def check1(a,d,c,i):
if int(arr[i][a]) > int(arr[i][d]):
return 0
else:
za = datetime.fromordinal((int(arr[i][a]) + 693594))
zd = datetime.fromordinal((int(arr[i][d]) + 693594))
da = date(za.year, za.month, za.day)
dd = date(zd.year, zd.month, zd.day)
if za.month == zd.month and za.year + 1899 == 2022:
EndDate = date(za.year, za.month,
calendar.monthrange(za.year,
za.month)[1])
gg_lav = (EndDate - da).days
if gg_lav >= 15:
zc = datetime.fromordinal((int(arr[i][c]) + 693594))
dc = date(zc.year, zc.month, zc.day)
EndDate2 = date(zc.year,zc.month-1,
calendar.monthrange(zc.year,
zc.month-1)[1])
if zd.month == EndDate2.month and zc.year == 2022:
gg_lav2 = (dc-EndDate2).days
if gg_lav2 >= 15:
return 1
else:
return 0
else:
return 1
else:
return 0
else:
return 1
I have omitted the check2 function.
fte = np.array(10)
for i in range(1,4797):
if arr[i][c] == '':
fte = np.append(fte,check2(a,d,i))
else:
fte = np.append(fte,check1(a, d, c, i))
print(i)
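For what it's worth, the `+ 693594` offset in the numpy version corresponds to 30 December 1899 (`date(1899, 12, 30).toordinal()`), the usual epoch for Excel serial dates on or after 1 March 1900. A minimal sketch of the date helpers follows, assuming the cells hold such serials; the function names are hypothetical. `previous_month_end` mirrors `EoMonth(Cess, -1)` from the VBA version and avoids `month - 1` becoming 0 when the date falls in January.

```python
import calendar
from datetime import date, timedelta

# Ordinal of 1899-12-30, the epoch assumed for Excel serial dates here.
EXCEL_EPOCH_ORDINAL = 693594


def excel_serial_to_date(serial):
    """Convert an Excel serial (int or numeric string) to a date."""
    return date.fromordinal(int(serial) + EXCEL_EPOCH_ORDINAL)


def month_end(d):
    """Last day of the month containing d (like EoMonth(d, 0))."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def previous_month_end(d):
    """Last day of the month before d (like EoMonth(d, -1))."""
    return d.replace(day=1) - timedelta(days=1)
```

With these, `gg_lav` becomes `(month_end(da) - da).days`, and `EndDate2` becomes `previous_month_end(dc)`.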

ValueError: time data '' does not match format '%Y-%m-%d %H:%M'

I'm new to coding and can't figure out where my code is breaking. The ValueError keeps coming up, but I can't work out what causes it.
def sunset(date,daycycle):
    sunset_date_time = ''
    year = date.strftime("%Y")
    year_data = daycycle.get(year)
    if(year_data != None):
        month_day = date.strftime("%m-%d")
        result_set = year_data.get(month_day)
        if(result_set != None):
            sunset_time = result_set["sunset"]
            sunset_date_time = year + "-" + month_day + " " + sunset_time
    return datetime.datetime.strptime(sunset_date_time, "%Y-%m-%d %H:%M")
This error is caused by the contents of the variable "sunset_date_time".
When the function reaches the return statement, the string in this variable is not in the "%Y-%m-%d %H:%M" format. In the error message the value is an empty string (''), which means neither lookup assigned it.
To see what the variable actually contains, print the value (or return it from the function) and check the order of year, month, day, hour and minutes.
def sunset(date,daycycle):
    sunset_date_time = ''
    year = date.strftime("%Y")
    year_data = daycycle.get(year)
    if(year_data != None):
        month_day = date.strftime("%m-%d")
        result_set = year_data.get(month_day)
        if(result_set != None):
            sunset_time = result_set["sunset"]
            sunset_date_time = year + "-" + month_day + " " + sunset_time
    print(sunset_date_time)
    """
    or return sunset_date_time
    """

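Since the error message shows an empty string, one plausible cause is that the year or the month-day key is missing from `daycycle`, so the string is never filled in before `strptime` runs. A minimal sketch of a variant that returns `None` in that case is below; `sunset_or_none` is a hypothetical name, and the nested dictionary layout is assumed from the question's code.

```python
import datetime


def sunset_or_none(date, daycycle):
    """Return the sunset datetime for `date`, or None if no data exists."""
    year_data = daycycle.get(date.strftime("%Y"))
    if year_data is None:
        return None  # no entry for this year
    result_set = year_data.get(date.strftime("%m-%d"))
    if result_set is None:
        return None  # no entry for this day
    text = date.strftime("%Y-%m-%d") + " " + result_set["sunset"]
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M")
```

The caller then has to check for `None` explicitly, which makes the missing-data case visible instead of failing inside `strptime`.
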
Unable to select date from datepicker calendar in selenium using python

I'd like to use python selenium to search at https://book.spicejet.com/Search.aspx
I reviewed this question, but it does not give the answer I am looking for.
I searched for a flight from Kolkata to Goa with 2 adults and 2 infants. When I enter the passenger details, I am unable to select the infants' date of birth.
import time
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import Select
browser = webdriver.Chrome()
booking_url = "https://book.spicejet.com/Search.aspx"
browser.get(booking_url)
departureButton = browser.find_element_by_id("ControlGroupSearchView_AvailabilitySearchInputSearchVieworiginStation1_CTXT").click()
browser.find_element_by_partial_link_text("Kolkata").click()
arivalButton = browser.find_element_by_id("ControlGroupSearchView_AvailabilitySearchInputSearchViewdestinationStation1_CTXT")
arivalButton.click()
time.sleep(.3)
arivalButton.send_keys("Goa")
time.sleep(1)
search_date = "20-September 2019"
dep_date = search_date.split("-")
dep_month = dep_date[1]
dep_day = dep_date[0]
while browser.find_element_by_class_name("ui-datepicker-title").text != dep_month:
browser.find_element_by_css_selector("a[title='Next']").click()
browser.find_element_by_xpath("//table//a[text()='"+dep_day+"']").click()
time.sleep(1)
try:
return_date_close = browser.find_element_by_class_name("date-close").click()
except:
pass
pax_selct = browser.find_element_by_id("divpaxinfo").click()
time.sleep(.2)
# __________Adult number_____________
for i in range(0, 2 - 1):
adults = browser.find_element_by_id("hrefIncAdt")
adults.click()
# ____________Set Num of Children___________________
for i in range(0, 0):
childrens = browser.find_element_by_id("hrefIncChd")
childrens.click()
# ____________Set Num of Infant(s)___________________
for i in range(0, 2):
infants = browser.find_element_by_id("hrefIncInf")
infants.click()
donebttn = browser.find_element_by_id("btnclosepaxoption").click()
searchBtn = browser.find_element_by_class_name("bookbtn").click()
browser.switch_to.default_content()
flightarr = []
tbl_row = browser.find_elements_by_class_name("fare-row")
time_select=3
price_select=1
new_time_serial = 0
tr_cont = 4
for item in tbl_row:
if item.is_displayed():
if new_time_serial == time_select:
col = item
cont = str(tr_cont)
if price_select == 0:
price1 = col.find_element_by_xpath('//*[@id="availabilityTable0"]/tbody/tr['+cont+']/td[3]/p').click()
elif price_select == 1:
price2 = col.find_element_by_xpath('//*[@id="availabilityTable0"]/tbody/tr['+cont+']/td[4]/p').click()
new_time_serial = new_time_serial + 1
tr_cont = tr_cont + 1
time.sleep(1)
cntn_btn = browser.find_element_by_class_name("button-continue").click()
passen_serial = 0
passen_serial_inf = 0
#inf = 1
birth_year = "2017"
birth_month = "Nov"
birth_day = "30"
all_pass_frm = browser.find_element_by_class_name("multicontent")
all_pass_entry = all_pass_frm.find_elements_by_class_name("sectionContent")
for passen in all_pass_entry:
pass_type = passen.find_element_by_class_name("guest-heading").text.split(' ',1)[0]
pass_type2 = passen.find_element_by_class_name("guest-heading").text.split(' ',1)[1]
if pass_type == "Adult":
deg_sel_name = Select(passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_DropDownListTitle_" + str(passen_serial) + ""))
deg_sel_name.select_by_index(1)
first_name_in = passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_TextBoxFirstName_" + str(passen_serial) + "")
first_name_in.send_keys("imam")
last_name_in = passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_TextBoxLastName_" + str(passen_serial) + "")
last_name_in.send_keys("Hossain")
elif pass_type == "Child":
deg_sel_name = Select(passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_DropDownListGender_" + str(passen_serial) + ""))
deg_sel_name.select_by_index(2)
first_name_in = passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_TextBoxFirstName_" + str(passen_serial) + "")
first_name_in.send_keys("Korim")
last_name_in = passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_TextBoxLastName_" + str(passen_serial) + "")
last_name_in.send_keys("Hossain")
elif pass_type == "Infant":
deg_sel_name = Select(passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_DropDownListGender_"+ str(passen_serial_inf) + "_" + str(passen_serial_inf) + ""))
deg_sel_name.select_by_index(2)
first_name_in = passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_TextBoxFirstName_"+ str(passen_serial_inf) + "_" + str(passen_serial_inf) + "")
first_name_in.send_keys("Aqiba")
last_name_in = passen.find_element_by_id("CONTROLGROUPPASSENGER_PassengerInputViewPassengerView_TextBoxLastName_"+ str(passen_serial_inf) + "_" + str(passen_serial_inf) + "")
last_name_in.send_keys("Hassan")
dob = passen.find_element_by_id("inputDateContactInfant" +str(pass_type2)+ "").click()
dob_cal = browser.find_element_by_class_name("datepickerViewYears")
dob_cal_year = dob_cal.find_element_by_class_name("datepickerYears")
inf_birth_year = dob_cal.find_element_by_xpath('.//*[@class="datepickerYears"]/tr/td/a/span[text()="'+birth_year+'"]').click()
inf_birth_mon = dob_cal.find_element_by_xpath('.//*[@class="datepickerMonths"]/tr/td/a/span[text()="'+birth_month+'"]').click()
inf_birth_day = dob_cal.find_element_by_xpath('.//*[@class="datepickerDays"]/tr/td/a/span[text()="'+birth_day+'"]').click()
# inf = inf +1
passen_serial_inf = passen_serial_inf + 1
passen_serial = passen_serial + 1
print("Done")
I tried XPath; it works for the 1st infant but not for the 2nd infant. What should I do now? Is there any way other than XPath? And what can I do when the number of passengers is different?
I have extracted the variable declarations and increments below. There seems to be a mismatch in the declarations that could be causing the problem: don't the counters need to start at the same number? Set them equal and retry.
var declaration
passen_serial_inf = 0
inf = 1
var increments
inf = inf +1
passen_serial_inf = passen_serial_inf + 1
passen_serial = passen_serial + 1

Change to the previous day date in python

I'm working on a Python script that reads the time strings from button objects so I can combine each time with a date and store the results in lists. For example: 29/08/2017 11:30PM, 30/08/2017 12:00AM, 30/08/2017 12:30AM.
When I try this:
if day_date >= 0 and day_date <= 6:
    if epg_time3 == '12:00AM':
        if day_date > 0:
            currentday = datetime.datetime.now() + datetime.timedelta(days = 0)
            nextday = datetime.datetime.now() + datetime.timedelta(days = self.program_day)
            if currentday != nextday:
                epg_time_1 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                epg_time_2 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                epg_time_3 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
            elif currentday == nextday:
                epg_time_1 = datetime.datetime.now() + datetime.timedelta(days = self.program_day)
                epg_time_2 = datetime.datetime.now() + datetime.timedelta(days = self.program_day - 1)
                epg_time_3 = datetime.datetime.now() + datetime.timedelta(days = self.program_day - 1)
It will show the output:
self.epg_time_1
['29/08/2017 11:00PM']
self.epg_time_2
['29/08/2017 11:30PM']
self.epg_time_3
['29/08/2017 12:00AM']
When I call the EPG_Times function again, it shows this output:
self.epg_time_1
['30/08/2017 11:00PM']
self.epg_time_2
['30/08/2017 11:30PM']
self.epg_time_3
['30/08/2017 12:00AM']
It should be:
self.epg_time_1
['30/08/2017 11:00PM']
self.epg_time_2
['30/08/2017 11:30PM']
self.epg_time_3
['31/08/2017 12:00AM']
As you can see, 12:00AM falls on the next day, so I want its date to be the 31st, not the 30th. I changed days = self.program_day + 1 to days = self.program_day - 1, but when epg_time_1, epg_time_2 and epg_time_3 hold the strings 11:00PM, 11:30PM and 12:00AM, the output is still:
self.epg_time_1
['30/08/2017 11:00PM']
self.epg_time_2
['30/08/2017 11:30PM']
self.epg_time_3
['30/08/2017 12:00AM']
Here is the full code:
self.program_day = list()

def EPG_Times(self):
    self.epg_time_1 = list()
    self.epg_time_2 = list()
    self.epg_time_3 = list()
    epg_time1 = str(self.getControl(344).getLabel())
    epg_time2 = str(self.getControl(345).getLabel())
    epg_time3 = str(self.getControl(346).getLabel())
    day_date = self.program_day
    day = ''
    month = ''
    year = ''
    if day_date >= 0 and day_date <= 6:
        if epg_time3 == '12:00AM':
            if day_date == 0:
                if epg_time1 == '12:00AM':
                    epg_time_1 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                    epg_time_2 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                    epg_time_3 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                else:
                    epg_time_1 = datetime.datetime.now() + datetime.timedelta(days = self.program_day)
                    epg_time_2 = datetime.datetime.now() + datetime.timedelta(days = self.program_day)
                    epg_time_3 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
            else:
                currentday = datetime.datetime.now() + datetime.timedelta(days = 0)
                nextday = datetime.datetime.now() + datetime.timedelta(days = self.program_day)
                if currentday != nextday:
                    epg_time_1 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                    epg_time_2 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                    epg_time_3 = datetime.datetime.now() + datetime.timedelta(days = self.program_day + 1)
                elif currentday == nextday:
                    epg_time_1 = datetime.datetime.now() + datetime.timedelta(days = self.program_day)
                    epg_time_2 = datetime.datetime.now() + datetime.timedelta(days = self.program_day - 1)
                    epg_time_3 = datetime.datetime.now() + datetime.timedelta(days = self.program_day - 1)
            epg1_day = epg_time_1.strftime("%d")
            epg1_month = epg_time_1.strftime("%m")
            epg1_year = epg_time_1.strftime("%Y")
            epg2_day = epg_time_2.strftime("%d")
            epg2_month = epg_time_2.strftime("%m")
            epg2_year = epg_time_2.strftime("%Y")
            epg3_day = epg_time_2.strftime("%d")
            epg3_month = epg_time_2.strftime("%m")
            epg3_year = epg_time_2.strftime("%Y")
            half_hour = str(epg1_day + "/" + epg1_month + "/" + epg1_year + " " + epg_time1)
            one_hour = str(epg2_day + "/" + epg2_month + "/" + epg2_year + " " + epg_time2)
            one_hour_half = str(epg3_day + "/" + epg3_month + "/" + epg3_year + " " + epg_time3)
            #Store the times and date in the list....
            self.epg_time_1.append(half_hour)
            self.epg_time_2.append(one_hour)
            self.epg_time_3.append(one_hour_half)
What I expect the code to do is change to the previous day's date for each string that I get from the button objects when I call the EPG_Times(self) function. If epg_time_1 and epg_time_2 show the strings 11:00PM and 11:30PM, I want to set them to 29/08/2017 11:00PM and 29/08/2017 11:30PM. If epg_time_3 shows the string 12:00AM, I want to give it the next day's date: 30/08/2017 12:00AM.
Twenty-four hours later, if epg_time_1 and epg_time_2 show the strings 11:00PM and 11:30PM, I want to set them to 30/08/2017 11:00PM and 30/08/2017 11:30PM. If epg_time_3 shows the string 12:00AM, I want to give it the next day's date: 31/08/2017 12:00AM.
If epg_time_1 and epg_time_2 show the strings 11:30PM and 12:00AM, I want epg_time_1 to take the previous date, 29/08/2017 11:30PM, and epg_time_2 to be 30/08/2017 12:00AM. The result depends on the time and date at which I stored the strings in the list.
Can you please show me an example of how to change a date to the previous day and to the next day in Python?
There's a lot of text in your question, which makes it hard to pinpoint the issue exactly. However, it appears to boil down to adding a variable number of days to a particular date and ensuring that the month is also updated when necessary.
You should use the datetime.datetime.strptime() method to convert your date strings to datetime objects. Once they are datetimes, adding a timedelta is trivial (you use both timedelta and strftime but miss this crucial method), and you can then convert the result back to a string.
import datetime as dt

def add_days(datelist, days_to_add):
    # Convert the strings to datetime objects
    datelist = [dt.datetime.strptime(item, '%d/%m/%Y %I:%M%p')
                for item in datelist]
    # Add a variable number of days (can be negative) to the datetimes
    mod_dates = [item + dt.timedelta(days=days_to_add) for item in datelist]
    # Convert back to strings
    str_dates = [item.strftime('%d/%m/%Y %I:%M%p')
                 for item in mod_dates]
    return str_dates

# This is the list right at the top of your question
a = ['29/08/2017 11:30PM', '30/08/2017 12:00AM', '30/08/2017 12:30AM']
print("Start:")
print(a)
b = add_days(a, 1)
print("After 1 call:")
print(b)
c = add_days(b, 1)
print("After 2 calls:")
print(c)
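The midnight case from the question can be handled with the same strptime and timedelta tools. The sketch below is one possible approach, not the asker's code: attach_dates is a hypothetical helper that takes the time labels in display order plus the date of the first label, and moves to the next day whenever the clock time goes backwards (for example from 11:30PM to 12:00AM). It assumes the labels are in chronological order and span less than 24 hours. Separately, in the question's full code the epg3_day, epg3_month and epg3_year values are formatted from epg_time_2 rather than epg_time_3, which may be worth checking.

```python
import datetime as dt

TIME_FORMAT = '%I:%M%p'
OUTPUT_FORMAT = '%d/%m/%Y %I:%M%p'


def attach_dates(time_labels, first_day):
    """Hypothetical helper: pair each time label with a date.

    The date advances by one day each time the clock time is earlier
    than the previous label, i.e. when the labels cross midnight.
    """
    stamped = []
    current_day = first_day
    previous_clock = None
    for label in time_labels:
        clock = dt.datetime.strptime(label, TIME_FORMAT).time()
        if previous_clock is not None and clock < previous_clock:
            # The clock wrapped around, so this label belongs to the next day.
            current_day += dt.timedelta(days=1)
        previous_clock = clock
        stamped.append(
            dt.datetime.combine(current_day, clock).strftime(OUTPUT_FORMAT))
    return stamped


# Labels as in the question; the starting date is an assumed example.
print(attach_dates(['11:00PM', '11:30PM', '12:00AM'], dt.date(2017, 8, 30)))
```

Because the date arithmetic is done on date objects, a rollover at the end of a month is handled the same way as any other day.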
