How to reduce runtime for for loop statements in Python

I have a program that is taking over 2 minutes to run and I'm not sure what I could do to reduce the runtime. It's most definitely the for loop shown in the code.
"CPU Start Address in decimal"
CPU_START_ADDR = 1644167168 #0x62000000
"Import necessary libraries"
import time
import pandas as pd
import numpy as np
"Creates variable for time to display runtime"
start_time = time.time()
"Create a blank dataframe with necessary column headers"
columnNames = ["Settings 1 Values", "Settings 2 Values", "CPU Address", "FPGA Address", "Delta (Decimal)", "Delta (Pos. Difference)", "Register Name", "R/W"]
output = pd.DataFrame(columns = columnNames)
"Fill values from settings files into output dataframe"
df1 = pd.read_csv("50MHzWholeFPGA.csv")
df2 = pd.read_csv("75MHzWholeFPGA.csv")
spec = pd.read_excel("Mozart FPGA Register Specification.xlsx", skiprows=[0,1,2,3])
output.loc[:, "Settings 1 Values"] = df1.iloc[:, 0]
output.loc[:, "Settings 2 Values"] = df2.iloc[:, 0]
output['Delta (Decimal)'] = output['Settings 2 Values'] - output['Settings 1 Values']
"For loop generates CPU Addresses for all values"
for index, row in output.iterrows():
output.loc[index, 'CPU Address'] = hex(CPU_START_ADDR + (2 * index))[2:]
settingXor = bin(int(output.loc[index, "Settings 1 Values"]) ^ int(output.loc[index, "Settings 2 Values"]))
output.loc[index, "Delta (Pos. Difference)"] = settingXor

Related

Check if the number of slots is > 0 before picking a date and an hour?

I am building a vaccination appointment program that automatically assigns a slot to the user.
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv')
And this code randomly picks a date and an hour from that date in the CSV table we just created:
import pandas
import random
from random import randrange
#randrange randomly picks an index for date and time for the user
random_date = randrange(365)
random_hour = randrange(10)
list = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
hour = random.choice(list)
df = pandas.read_csv('new.csv')
date=df.iloc[random_date][0]
# 1 is subtracted from that cell as 1 slot will be assigned to the user
df.loc[random_date, hour] -= 1
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv',index=False)
print(date)
print(hour)
I need help with making the program check whether the random hour it chose on that date has vacant slots. I can manage the while loops that are needed if the number of vacant slots is 0. And no, I have not tried much, because I have no clue how to do this.
P.S. If you're going to try running the code, please remember to change the save and read location.
Here is how I would do it. I've also cleaned it up a bit.
import random
import pandas as pd

start_date, end_date = '1/1/2022', '31/12/2022'
hours = [f'{hour}:00' for hour in range(8, 18)]

df = pd.DataFrame(
    data=pd.date_range(start_date, end_date),
    columns=['Date/Time']
)
for hour in hours:
    df[hour] = 100

# 1000 simulations
for _ in range(1000):
    random_date, random_hour = random.randrange(365), random.choice(hours)
    # Check if the chosen slot still has vacancies
    if df.at[random_date, random_hour] > 0:
        df.at[random_date, random_hour] -= 1
    else:
        # Pass here, but you can add whatever logic you want,
        # for instance you could give it the next free slot in the same day
        pass

print(df.describe())
import pandas
import random
from random import randrange

# randrange randomly picks an index for date and time for the user
random_date = randrange(365)
# random_hour = randrange(10)  # consider removing this line since it's not used
lista = [  # consider avoiding Python built-in names such as `list`
    "8:00",
    "9:00",
    "10:00",
    "11:00",
    "12:00",
    "13:00",
    "14:00",
    "15:00",
    "16:00",
    "17:00",
]
hour = random.choice(lista)

df = pandas.read_csv("new.csv")
date = df.iloc[random_date][0]
# 1 is subtracted from that cell as 1 slot will be assigned to the user
if df.loc[random_date, hour] > 0:  # here is what you asked for
    df.loc[random_date, hour] -= 1
else:
    print(f"No Vacant Slots in {random_date}, {hour}")
df.to_csv(r"new.csv", index=False)
print(date)
print(hour)
Here's another alternative. I'm not sure you really need the very large and slow-to-load pandas module for this. This does it with plain Python structures. I tried to run the simulation until it got a failure, but with 365,000 open slots, and flushing the database to disk each time, it takes too long. I changed the 100 to 8, just to see it hit a full slot in a reasonable time.
import csv
import datetime
import random

def create():
    start = datetime.date(2022, 1, 1)
    oneday = datetime.timedelta(days=1)
    headers = ["date"] + [f"{i}:00" for i in range(8, 18)]
    data = []
    for _ in range(365):
        data.append([start.strftime("%Y-%m-%d")] + [8] * 10)  # not 100
        start += oneday
    write(headers, data)

def write(headers, rows):
    fcsv = csv.writer(open('data.csv', 'w', newline=''))
    fcsv.writerow(headers)
    fcsv.writerows(rows)

def read():
    days = []
    headers = []
    for row in csv.reader(open('data.csv')):
        if not headers:
            headers = row
        else:
            days.append([row[0]] + list(map(int, row[1:])))
    return headers, days

def choose(headers, days):
    random_date = random.randrange(365)
    random_hour = random.randrange(len(headers) - 1) + 1
    choice = days[random_date][0] + " " + headers[random_hour]
    print("Chose", choice)
    if days[random_date][random_hour]:
        days[random_date][random_hour] -= 1
        write(headers, days)
        return choice
    else:
        print("Randomly chosen slot is full.")
        return None

create()
data = read()
while choose(*data):
    pass

Python Pandas: is highlighting single characters possible?

I am really new to the whole python development and have a question. I would like to achieve the following result:
However, in my research I only found the possibility to change the style for a whole cell.
I am doing a complete character-by-character comparison and would like to colour the individual characters accordingly.
Maybe this can also be done with Python (my VBA script is very slow).
This is my python script till now:
import pandas as pd
import numpy as np

path = "XXXXX"
data = pd.read_csv(path, names=["Dir1", "Dir2", "File1", "File2",
                                "Diff", "Line1", "A", "Line2", "B"], sep="|")
line = 1
for ind in data.index:
    if data.A[ind] == data.B[ind]:
        var_ok = True
    else:
        # Work just with different values
        var_ok = False
        var_length_A = len(str(data.A[ind]))
        var_length_B = len(str(data.B[ind]))
        # check length
        # A is longer
        if var_length_A > var_length_B:
            var_longer = var_length_A
        # B is longer
        elif var_length_A < var_length_B:
            var_longer = var_length_B
        # same length
        else:
            var_longer = var_length_A
        for count in range(var_longer):
            # read one character from each side (slicing replaces VBA's Mid)
            var_sign_A = str(data.A[ind])[count:count + 1]
            var_sign_B = str(data.B[ind])[count:count + 1]
            if var_sign_A != var_sign_B:
                # highlight this character (still to be figured out)
                pass
            else:
                # do nothing
                pass
        print([ind], "|\t", data.A[ind], "|\t", data.B[ind], "|\t",
              var_ok, "|\t", var_length_A, "|\t", var_length_B, "|\t", var_longer)
This is a part from my VBA script:
' If both cells are filled
Else
    counter = 1
    ' Compare character by character
    For counter = counter To leng
        If Mid(Cells(zeile, Spalte1), counter, 1) <> Mid(Cells(zeile, Spalte2), counter, 1) Then
            With Cells(zeile, Spalte2).Characters(start:=counter, Length:=1).Font
                .Color = var2
                .FontStyle = "Fett"
            End With
            With Cells(zeile, Spalte1).Characters(start:=counter, Length:=1).Font
                .Color = var2
                .FontStyle = "Fett"
            End With
        Else
            With Cells(zeile, Spalte2).Characters(start:=counter, Length:=1).Font
                .Color = var1
                .FontStyle = "Standard"
            End With
            With Cells(zeile, Spalte1).Characters(start:=counter, Length:=1).Font
                .Color = var1
                .FontStyle = "Standard"
            End With
        End If
    Next
End If
End If
BR & thank you :)
Marcel
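There is no answer recorded for this one, but as a minimal sketch (assuming the same data frame with columns A and B that the read_csv call above produces), the character positions that differ can be collected in plain Python. Note that pandas' Styler can only style whole cells, so the actual per-character colouring would still need a writer that supports rich text; the diff_positions helper below is hypothetical and not part of the original script:
from itertools import zip_longest
def diff_positions(a, b):
    """Return the 0-based indices at which two strings differ.
    Characters missing from the shorter string count as differences,
    mirroring the VBA loop over the longer length."""
    positions = []
    for i, (ca, cb) in enumerate(zip_longest(str(a), str(b), fillvalue="")):
        if ca != cb:
            positions.append(i)
    return positions
# Collect, per row, which character positions would need highlighting.
highlights = {ind: diff_positions(data.A[ind], data.B[ind]) for ind in data.index}
print(highlights)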

Convert the results of a dictionary to a dataframe

From these commands:
from stackapi import StackAPI
import pandas as pd

lst = ['11786778','12370060']
df = pd.DataFrame(lst)
SITE = StackAPI('stackoverflow', key="xxxx")
results = []
for i in range(1,len(df)):
    SITE.max_pages=10000000
    SITE.page_size=100
    post = SITE.fetch('/users/{ids}/reputation-history', ids=lst[i])
    results.append(post)
The results variable holds the results in JSON format.
How is it possible to convert the results variable to a dataframe with five columns?
reputation_history_type, reputation_change, post_id, creation_date,
user_id
Here, try this:
from stackapi import StackAPI
import pandas as pd

lst = ['11786778','12370060']
SITE = StackAPI('stackoverflow')
results = []
SITE.max_pages=10000000
SITE.page_size=100
for i in lst:
    post = SITE.fetch('/users/{ids}/reputation-history', ids=[i]).get('items')
    results.extend([list(j.values()) for j in post])
df = pd.DataFrame(results, columns = ['reputation_history_type', 'reputation_change', 'post_id', 'creation_date', 'user_id'])
Output :
print(df.head()) gives :
reputation_history_type reputation_change post_id creation_date user_id
0 asker_accepts_answer 2 59126012 1575207944 11786778.0
1 post_undownvoted 2 59118819 1575139301 11786778.0
2 post_upvoted 10 59118819 1575139301 11786778.0
3 post_downvoted -2 59118819 1575139299 11786778.0
4 post_upvoted 10 59110166 1575094452 11786778.0
print(df.tail()) gives :
reputation_history_type reputation_change post_id creation_date user_id
170 post_upvoted 10 58906292 1574036540 12370060.0
171 answer_accepted 15 58896536 1573990105 12370060.0
172 post_upvoted 10 58896044 1573972834 12370060.0
173 post_downvoted 0 58896299 1573948372 12370060.0
174 post_downvoted 0 58896158 1573947435 12370060.0
NOTE:
You can just create a dataframe directly from the result, which will be a list of lists.
You don't need to set SITE.max_pages and SITE.page_size every time you loop through lst.
from stackapi import StackAPI
import pandas as pd

lst = ['11786778', '12370060']
SITE = StackAPI('stackoverflow', key="xxxx")
SITE.max_pages = 10000000
SITE.page_size = 100

results = []
for i in lst:
    post = SITE.fetch('/users/{ids}/reputation-history', ids=i)
    results.append(post)

# Flatten the 'items' list of every response into one row per record
data = []
for item in results:
    for rec in item.get('items', []):
        data.append([rec.get('reputation_history_type'), rec.get('reputation_change'),
                     rec.get('post_id'), rec.get('creation_date'), rec.get('user_id')])

df = pd.DataFrame(data, columns=['reputation_history_type', 'reputation_change',
                                 'post_id', 'creation_date', 'user_id'])
print(df)
Kinda flying blind since I maxed out my Stack Overflow API limit, but this should work:
from stackapi import StackAPI
import pandas as pd
from pandas.io.json import json_normalize  # pd.json_normalize in pandas 1.0+

lst = ['11786778','12370060']
SITE = StackAPI('stackoverflow', key="xxx")
results = []
for ids in lst:
    SITE.max_pages=10000000
    SITE.page_size=100
    post = SITE.fetch('/users/{ids}/reputation-history', ids=ids)
    results.append(json_normalize(post, 'items'))
df = pd.concat(results, ignore_index=True)
json_normalize converts the JSON records into a dataframe.
pd.concat concatenates those dataframes into a single frame.
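To make that flattening step concrete, here is a tiny self-contained illustration with a made-up response dict standing in for what SITE.fetch would return (the field values are copied from the output shown above); in pandas 1.0+ the same function is available as pd.json_normalize:
import pandas as pd
# Hypothetical miniature of a StackAPI response: the records live under 'items'
fake_post = {
    "items": [
        {"reputation_history_type": "post_upvoted", "reputation_change": 10,
         "post_id": 59118819, "creation_date": 1575139301, "user_id": 11786778},
        {"reputation_history_type": "post_downvoted", "reputation_change": -2,
         "post_id": 59118819, "creation_date": 1575139299, "user_id": 11786778},
    ],
    "has_more": False,
}
# record_path='items' pulls the list of dicts out into one row per record
df = pd.json_normalize(fake_post, "items")
print(df)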

How to dynamically print list in Python

We're working on a Python project, where we are retrieving data from our MySQL database, and we're then sending it back to a new table on our database. We have initialized a list,
list_lpn_temp = []
The problem is that the range of this list varies, and therefore we don't always know how many datapoints we will have in our list. We have this code, and this is where the error occurs:
df2 = pd.DataFrame(columns=['first_temp_lpn', 'first_temp_lpn_validated', 'second_temp_lpn', 'second_temp_lpn_validated', 'third_temp_lpn', 'third_temp_lpn_validated'])
df2 = df2.append({'first_temp_lpn' : list_lpn_temp[0][0], 'first_temp_lpn_validated' : list_validated[0], 'second_temp_lpn' : list_lpn_temp[1][0], 'second_temp_lpn_validated' : list_validated[1], 'third_temp_lpn' : list_lpn_temp[2][0], 'third_temp_lpn_validated' : list_validated[2]}, ignore_index=True).round(2)
with engine.connect() as conn, conn.begin():
    df2.to_sql('Raw_Validated', conn, if_exists='append', index=False)
Sometimes it gives us an error saying index out of range, as we sometimes only have 2 values in the list, and therefore list_lpn_temp[2][0] will give us the error. The dream scenario would be if we could somehow send a null, or maybe some text saying that we don't have any value, to our database.
Therefore we need 2 things:
Send data, but where it depends on the size of our list, and is not just set static. For example like this (We need something better than this):
'first_temp_lpn' : list_lpn_temp[0][0]
If we are receiving index out of range, then we still need to send something to the database, as it expects 3x columns of temperature. But as there are no values, we could send a null, and therefore this could be nice to implement. Otherwise we will just get another big issue.
BIGGER PART OF THE CODE
engine = create_engine("mysql://xxx:xxx@localhost/xxx")
conn = engine.connect()
list_lpn_temp = []
index = pd.date_range(start=start_range.min(), end=end_range.max(), freq='20T')
for x in index:
    a_temp = pd.read_sql('SELECT temperature FROM Raw_Data', conn).astype(float).values
    list_lpn_temp.extend(a_temp)
    if len(list_lpn_temp) > max_samples:
        list_lpn_temp.pop(0)
for i in range(len(list_lpn_temp)):
    if -1.5 < 25 - list_lpn_temp[i] < 1.5:
        validated_lpn = 1
        list_validated.append(validated_lpn)
        new_list_lpn_temp.extend(list_lpn_temp[i])
    else:
        validated_lpn = 0
        list_validated.append(validated_lpn)
df2 = pd.DataFrame(columns=['first_temp_lpn', 'first_temp_lpn_validated', 'second_temp_lpn', 'second_temp_lpn_validated', 'third_temp_lpn', 'third_temp_lpn_validated'])
df2 = df2.append({'first_temp_lpn' : list_lpn_temp[0][0], 'first_temp_lpn_validated' : list_validated[0], 'second_temp_lpn' : list_lpn_temp[1][0], 'second_temp_lpn_validated' : list_validated[1], 'third_temp_lpn' : list_lpn_temp[2][0], 'third_temp_lpn_validated' : list_validated[2]}, ignore_index=True).round(2)
with engine.connect() as conn, conn.begin():
    df2.to_sql('Raw_Validated', conn, if_exists='append', index=False)
NEW (KP)
We have a time_start and time_end value, which is formatted to datetime. We want to send it with the temp, so we have tried to modify the df2.append.
lastTime = pd.read_sql('SELECT MAX(timestamp) FROM Raw_Data', conn).astype(str).values.tolist()
firstTime = pd.read_sql('SELECT MIN(timestamp) FROM Raw_Data', conn).astype(str).values.tolist()
firstTime = (pd.to_datetime(firstTime[0]) - datetime.timedelta(minutes=10)).round('20T')
lastTime = (pd.to_datetime(lastTime[0]) - datetime.timedelta(minutes=10)).round('20T')
test = lastTime - datetime.timedelta(minutes=40)
time_start = test.astype(str).values[0]
lastTime = lastTime + datetime.timedelta(minutes=20)
time_end = lastTime.astype(str).values[0]
for name, value, valid in zip(['first', 'second', 'third'], list_lpn_temp, list_validated):
    temp[name + '_temp_lpn'] = value[0]
    temp[name + '_temp_lpn_validated'] = valid
df2 = df2.append({'time_start' : time_start, 'time_end' : time_end}, temp)
print(df2)
But then only datetime is being sent (time_start and time_end)
You can loop over the elements in the list.
Something like:
temp = {}
for name, value, valid in zip(['first', 'second', 'third'], list_lpn_temp, list_validated):
    temp[name + '_temp_lpn'] = value[0]
    temp[name + '_temp_lpn_validated'] = valid
df2 = df2.append(temp, ignore_index=True)
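Since the list can hold fewer readings than the three columns the table expects, a hedged extension of the same idea (assuming the list_lpn_temp, list_validated, time_start and time_end names from the question) is to pad the missing slots with None so the database still receives every column:
from itertools import zip_longest
temp = {'time_start': time_start, 'time_end': time_end}
# zip_longest keeps iterating over the three names even when the data runs out,
# filling the missing readings and validation flags with None.
for name, value, valid in zip_longest(['first', 'second', 'third'],
                                      list_lpn_temp, list_validated,
                                      fillvalue=None):
    temp[name + '_temp_lpn'] = value[0] if value is not None else None
    temp[name + '_temp_lpn_validated'] = valid
df2 = df2.append(temp, ignore_index=True)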

Python 2.7 - manipulate some data from a CSV file

First of all I want to emphasize that I'm a total beginner at Python; I made the code below to manipulate some data from a CSV. I know it's not the prettiest code and I probably could have made it more elegant, but it works, up to a certain point, and that's the reason I opened this question.
import csv
from numpy import interp
from operator import sub
import math
import pandas as pd
from Tkinter import *
import Tkinter as tk
import tkFileDialog as filedialog
root = Tk()
root.withdraw()
filename= filedialog.askopenfilename( initialdir="C:/", title="select file", filetypes=(("CSV files", "*.CSV"), ("all files", "*.*")))
id_uri = []
ore = []
minute = []
zile = []
activi = []
listx = []
listsa = []
list_ore = []
listspi = []
listspf = []
list_min = []
zile_luna = 0
test = []
nume = []
with open(filename) as p, open('activi.csv') as a:
    reader = csv.reader(p, delimiter=',')
    for row in reader:
        id_uri.append(row[0])
        ore.append(row[1])
        minute.append(row[2])
        zile.append(row[3])
    reader = csv.reader(a)
    for row in reader:
        activi.append(row[0])
        nume.append(row[1])
id_uri = map(int, id_uri)
ore = map(float, ore)
minute = map(float, minute)
minute = interp(minute, [0, 60], [0, 100])
ore = ore + minute / 100
zile = map(int, zile)
activi = map(int, activi)
zile_luna = len(set(zile)) + 1
mimin = 0
maxim = 0
def pontaj():
    global listx
    global listsa
    global listspi
    global listspf
    global list_ore
    global list_min
    global maxim
    global minim
    for x in range(3):
        for y in range(len(id_uri)):
            if zile[y] == z:
                if activi[x] == id_uri[y]:
                    listx.append(ore[y])
        minim = min(listx)
        maxim = max(listx)
        listsa.append(maxim - minim)
        listx = []
    listspi = [int(i) for i in listsa]
    listspf = [i % 1 for i in listsa]
    for i in range(len(listspf)):
        listspf[i] = round(listspf[i], 2)
        listspf[i] = listspf[i] * 100
        listspf[i] = interp(listspf[i], [0, 100], [0, 60])
        listspf[i] = int(listspf[i])
    list_ore.append(listspi)
    list_min.append(listspf)
    listsa = []
for z in range(1, zile_luna):
    pontaj()
for sublst in list_ore:
    for item in range(len(sublst)):
        sublst[item] = str(sublst[item])
for sublst in list_min:
    for item in range(len(sublst)):
        sublst[item] = str(sublst[item])
for i in range(len(list_ore)):
    for j in range(len(list_ore[i])):
        list_ore[i][j] = ' '.join(i + ':' + j for i, j in zip(list_ore[i][j], list_min[i][j]))
df = pd.DataFrame(list_ore)
df = df.T
nume = pd.Series(nume)
df['e'] = nume.values
df.to_csv('pontaj.csv', index=False, header=False)
print df
and the CSV file I read all the info from looks like this (employee code, hour, minute, day):
23,5,00,1
23,6,00,1
24,7,00,1
25,8,00,1
24,9,00,1
25,11,00,1
24,7,00,2
25,8,00,2
24,9,00,2
25,11,00,2
23,5,00,4
23,6,00,4
24,7,00,4
25,8,00,4
24,9,00,4
25,11,00,4
I have another CSV file that has the employee code followed by the employee name, like this:
23,aqwe
24,beww
25,cwww
Basically it's an attendance logger, it compares info from one CSV to another, finds the min and max hours in a certain day, subtracts min from max and writes this info in a list that is written to another csv.
Thing is, if all employees attend on a certain day, all goes well: it calculates the attendance hours, puts them in the CSV, all good. But what happens if an employee skips a day? Well, as I found out, it ruins the calculation, because the code requires that all data be consistent and in perfect order.
The data written to the CSV file must finally look like this:
day1 day2 day3
hours hours hours employee_a
hours hours hours employee_b
hours hours hours employee_c
But if one skips a day, the hours get scrambled.
I've tried some different approaches but none worked, and I realize the problem is due to my simple way of thinking, but as I said, I only started with python a few days ago.
Do you have any suggestions on how I could improve the code to take the missed day of a certain employee in consideration and generate the data like so:
day1 day2 day3
1:20 2:30 3:40 employee_a
1:20 2:30 3:40 employee_b
0:0 2:30 3:40 employee_c
Any advice would be appreciated, thanks!
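No answer is recorded here, but as a sketch of one way to make the output robust to skipped days (assuming the two CSV layouts shown above; the attendance.csv name below is hypothetical, standing in for the file picked in the dialog), you could key the records by (employee, day) and fall back to 0:0 for any day an employee has no entries:
import csv
from collections import defaultdict
# Collect all clock times per (employee, day) instead of relying on a fixed order.
hours_by_emp_day = defaultdict(list)
days = set()
with open('attendance.csv') as f:
    for emp, hour, minute, day in csv.reader(f):
        t = int(hour) * 60 + int(minute)   # minutes since midnight
        hours_by_emp_day[(emp, int(day))].append(t)
        days.add(int(day))
names = {}
with open('activi.csv') as f:
    for emp, name in csv.reader(f):
        names[emp] = name
# One row per employee, one column per day; '0:0' when the day was skipped.
rows = []
for emp, name in sorted(names.items()):
    row = []
    for day in sorted(days):
        times = hours_by_emp_day.get((emp, day))
        if times:
            delta = max(times) - min(times)
            row.append('%d:%02d' % (delta // 60, delta % 60))
        else:
            row.append('0:0')
    rows.append(row + [name])
with open('pontaj.csv', 'wb') as f:   # 'wb' is what the csv module expects on Python 2.7
    csv.writer(f).writerows(rows)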
