More efficient way than iterrows - python

I have a DataFrame df with columns action and pointerID. In the code snippet below I iterate through every row with iterrows, which is slow because the DataFrame is large. Is there a more efficient way to do this?
annotated = []
curr_pointers = []
count = 1
for index, row in df.iterrows():
    action = row["action"]
    id = row["pointerID"]
    if action == "ACTION_MOVE":
        annotated.append(curr_pointers[id])
    elif (action == "ACTION_POINTER_DOWN") or (action == "ACTION_DOWN"):
        if row["actionIndex"] != id:
            continue
        if id >= len(curr_pointers):
            curr_pointers.append(count)
        else:
            curr_pointers[id] = count
        annotated.append(count)
        count = count + 1
    elif (action == "ACTION_POINTER_UP") or (action == "ACTION_UP") or (action == "ACTION_CANCEL"):
        if row["actionIndex"] != id:
            continue
        annotated.append(curr_pointers[id])
    else:
        print("{} unknown".format(action))
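Because each row's result depends on state carried over from earlier rows, the loop is hard to vectorize directly. One option that is typically faster is to keep the loop but iterate over plain column values instead of calling iterrows, which builds a Series for every row. The following is a minimal sketch under stated assumptions: the function name annotate_pointers and the sample data are hypothetical, and a dict replaces the curr_pointers list, which behaves the same as long as pointer IDs are first seen in increasing order starting at 0.

```python
def annotate_pointers(actions, pointer_ids, action_indices):
    """Replay the events in order and return one annotation per kept row."""
    down = {"ACTION_POINTER_DOWN", "ACTION_DOWN"}
    up = {"ACTION_POINTER_UP", "ACTION_UP", "ACTION_CANCEL"}
    annotated = []
    curr_pointers = {}  # assumption: dict keyed by pointer id instead of a list
    count = 1
    for action, pid, action_index in zip(actions, pointer_ids, action_indices):
        if action == "ACTION_MOVE":
            annotated.append(curr_pointers[pid])
        elif action in down:
            if action_index != pid:
                continue  # skipped without an annotation, as in the original loop
            curr_pointers[pid] = count
            annotated.append(count)
            count += 1
        elif action in up:
            if action_index != pid:
                continue
            annotated.append(curr_pointers[pid])
        else:
            print("{} unknown".format(action))
    return annotated


# Hypothetical sample standing in for the three DataFrame columns.
actions = ["ACTION_DOWN", "ACTION_MOVE", "ACTION_POINTER_DOWN",
           "ACTION_MOVE", "ACTION_POINTER_UP", "ACTION_UP"]
pointer_ids = [0, 0, 1, 1, 1, 0]
action_indices = [0, 0, 1, 1, 1, 0]

result = annotate_pointers(actions, pointer_ids, action_indices)
print(result)  # [1, 1, 2, 2, 2, 1]
```

With the DataFrame from the question, the call would look like annotate_pointers(df["action"].tolist(), df["pointerID"].tolist(), df["actionIndex"].tolist()). As in the original loop, skipped rows produce no annotation, so the result can be shorter than df.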

Related

I am trying to make a function that counts the occurrences of a letter diagonally, from top left to bottom right of a matrix

This is the main question:
Write a function named find_longest_string(legoString,n) that finds and returns the longest
diagonal string from the upper-right triangle of the matrix representation of the brick placement.
The longest diagonal string is defined as a string that contains the maximum occurrences of a
letter on the same diagonal. In case there are multiple solutions i.e. diagonal strings of the
same maximum lengths, you can return any of the valid solutions. Fig 5 shows a computation
of a valid solution. This function must have two parameters, legoString - a string returned from
the function named “place_random_bricks”, and n - the number of columns on the baseplate.
In this example, the letter G occurs the most.
def find_longest_string(legoString, n):
    list2 = []
    countr = ""
    countb = ""
    countg = ""
    county = ""
    countc = ""
    for i in range(len(legoString)):
        list = []
        if i % n == 0: #guarantees that row will only have required amount of columns(no more no less)
            sub = legoString[i: i + n] #i starts at 0 and so add number of columns for first row and continue
            #list = [] #create empty list each time a row is created
            for j in sub:
                list.append(j)
                list2.append(j)
            answer = ''.join(''.join(tup) for tup in list)
            answer2 = ''.join(''.join(tup) for tup in list)
            print(answer) # prints list before it is reset
            for index in range(len(answer)):
                if answer2[index] == answer[index-1]:
                    if answer2[index] == "R":
                        countr = countr + "R"
                    elif answer2[index] == "B":
                        countb = countb + "B"
                    elif answer2[index] == "G":
                        countg = countg + "G"
                    elif answer2[index] == "Y":
                        county = county + "Y"
                    elif answer2[index] == "C":
                        countc = countc + "C"
    print(countr, countc, county, countb, countg)
    if len(countr) >= len(countb) and len(countg) and len(county) and len(countc):
        return countr
    elif len(countb) >= len(countr) and len(countg) and len(county) and len(countc):
        return countb
    elif len(countg) >= len(countr) and len(countb) and len(county) and len(countc):
        return countg
    elif len(countc) >= len(countr) and len(countg) and len(county) and len(countb):
        return countc
    elif len(county) >= len(countr) and len(countg) and len(countb) and len(countc):
        return county
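Two things stand out in the attempt above. It compares neighbouring letters within a row rather than along a diagonal, and a test such as len(countr) >= len(countb) and len(countg) only compares the first pair; the remaining terms are plain truthiness checks. The following is a rough sketch of one possible reading of the task, not a definitive solution. It assumes that legoString stores the matrix row by row with n columns, that the diagonals of the upper-right triangle start in the first row and run from top left to bottom right, and that the returned string is the most frequent letter of the best diagonal repeated once per occurrence. The sample string is hypothetical.

```python
from collections import Counter


def find_longest_string(legoString, n):
    """Return the most frequent letter of the best diagonal, repeated by its count."""
    rows = len(legoString) // n
    best = ""
    # Each diagonal of the upper-right triangle starts in row 0 at column `start`.
    for start in range(n):
        diagonal = [
            legoString[r * n + start + r]  # cell (r, start + r) in row-major order
            for r in range(rows)
            if start + r < n
        ]
        if not diagonal:
            continue
        letter, occurrences = Counter(diagonal).most_common(1)[0]
        if occurrences > len(best):
            best = letter * occurrences
    return best


# Hypothetical 3x3 baseplate, row by row: GRB / YGR / BYG
print(find_longest_string("GRBYGRBYG", 3))  # GGG
```

Counter.most_common picks the dominant letter on each diagonal, so no per-colour counters or chained comparisons are needed.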

List values change after saving them

I am currently writing an algorithm to solve the eight queens problem.
The algorithm itself is not the problem, though; storing the solutions is.
I am trying to append each solution (a solution is stored as a list of lists) to a list,
but after saving a solution and continuing through the algorithm,
the values of the saved list keep changing.
I suppose the saved list and the one I am changing are somehow still "connected",
but I don't know how.
Does anyone know a solution or a different approach to my saving strategy?
This is the part of the code I am having problems with:
# The Code only gets executed if there is a valid solution.
if found_solution():
    # Saving solution
    solutions.append(board)
    # deleting last Queen
    temp = last_queen.pop()
    board[temp[0]][temp[1]] = "-"
    # going back one Row
    Reihe -= 1
    # return
    return
The board looks like this:
board = [["-","-","-","-"],
         ["-","-","-","-"],
         ["-","-","-","-"],
         ["-","-","-","-"]]
and the solutions list like this:
solutions = []
If anyone wants to take a look at the whole code (there are a few German variable names, though):
board = [["-","-","-","-"],
         ["-","-","-","-"],
         ["-","-","-","-"],
         ["-","-","-","-"]]
Reihe = 0
solutions = []
last_queen = []
def found_solution():
    count = 0
    for i in range(4):
        for j in range(4):
            if board[i][j] == "D":
                count += 1
    if count == 4:
        return True
    return False
def board_clear(temp_Reihe, temp_i):
    #Check Column and Row
    for k in range(4):
        if board[temp_Reihe][k] == "D":
            return False
        if board[k][temp_i] == "D":
            return False
    #Check Diagonals
    #temp_i == x and temp_Reihe == y
    st1_x = temp_i - min(temp_i, temp_Reihe)
    st1_y = temp_Reihe - min(temp_i, temp_Reihe)
    st2_x = temp_i - min(temp_i, 3-temp_Reihe)
    st2_y = temp_Reihe + min(temp_i, 3-temp_Reihe)
    while st1_x < 4 and st1_y < 4:
        if board[st1_y][st1_x] == "D":
            return False
        st1_x += 1
        st1_y += 1
    while st2_x < 4 and st2_y > -1:
        if board[st2_y][st2_x] == "D":
            return False
        st2_x += 1
        st2_y -= 1
    return True
def main():
    global Reihe, board, solutions, last_queen
    if found_solution():
        #Saving solution
        solutions.append(board)
        #deleting last Queen
        temp = last_queen.pop()
        board[temp[0]][temp[1]] = "-"
        #going back one Row
        Reihe -= 1
        return
    for i in range(4):
        #placing Queen if valid spot
        if board_clear(Reihe, i):
            board[Reihe][i] = "D"
            last_queen.append([Reihe,i])
            Reihe += 1
            main()
    #delete last Queen
    temp = last_queen.pop()
    board[temp[0]][temp[1]] = "-"
    #going back one Row
    Reihe -= 1
    return
main()
People have suggested using copy or deepcopy:
import copy
list2 = copy.deepcopy(list1)
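The "connection" between the saved list and the board can be reproduced in a few lines. solutions.append(board) stores a reference to the same nested list, so later changes to board show up in the saved entry as well, while copy.deepcopy stores an independent snapshot. The following is a minimal sketch with a hypothetical 2x2 board, not the full solver.

```python
import copy

board = [["-", "-"], ["-", "-"]]
solutions = []

# Appending the board itself stores a reference, not a snapshot.
solutions.append(board)
# Appending a deep copy stores an independent copy of the nested lists.
solutions.append(copy.deepcopy(board))

board[0][0] = "D"  # a later change, e.g. while backtracking

aliased, snapshot = solutions
print(aliased is board)  # True
print(aliased[0][0])     # D
print(snapshot[0][0])    # -
```

In the solver this would mean calling solutions.append(copy.deepcopy(board)) at the point where the solution is saved. For a list of flat rows such as this board, [row[:] for row in board] would also give an independent copy.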

creating a list in python

I have an input list and apply some if/else logic to it, saving the output in another list. When I check the type of the result, it is reported as "class list". In the next step I need to convert this list to a DataFrame so that I can write a Teradata query on top of it.
Also note that after running this code in one Jupyter cell, when I go to the next cell and query the list by index, I always get the last value in the list.
I need help getting the output as a list instead of a "class list". Also, how do I convert the list/"class list" to a DataFrame?
data = ['login', 'signup', 'account']
for i in range(len(data)):
    source = []
    if data[i] == 'login':
        table = "sales.login_table"
    elif data[i] == 'signup':
        table = "sales.signup_table"
    elif data[i] == 'account':
        table = 'sales.account'
    elif data[i] == 'addcc':
        table = "sales.addcc"
    elif data[i] == 'consolidatedfunding':
        table = 'sales.consolidatedfunding'
    elif data[i] == 'deposit':
        table = 'sales.deposit'
    elif data[i] == 'holdsassessment':
        table = 'sales.holdsassessment'
    elif data[i] == 'onboardinggc':
        table = 'sales.onboardinggc'
    source.append(table)
    print(source)
print(source)
output:
['sales.login_table']
['sales.signup_table']
['sales.account']
print(type(source))
output :
<class 'list'>
You need to declare source = [] outside the for loop; otherwise it is reset to an empty list on every iteration.
source = []
for i in range(len(data)):
    if data[i] == 'login':
        table = "sales.login_table"
    elif data[i] == 'signup':
        table = "sales.signup_table"
    elif data[i] == 'account':
        table = 'sales.account'
    elif data[i] == 'addcc':
        table = "sales.addcc"
    elif data[i] == 'consolidatedfunding':
        table = 'sales.consolidatedfunding'
    elif data[i] == 'deposit':
        table = 'sales.deposit'
    elif data[i] == 'holdsassessment':
        table = 'sales.holdsassessment'
    elif data[i] == 'onboardinggc':
        table = 'sales.onboardinggc'
    source.append(table)
print(source)
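On the "class list" part of the question: `<class 'list'>` is simply how Python prints the type of an ordinary list, so no conversion is needed. As a possible simplification, the if/elif chain can also be written as a dictionary lookup. The following is a small sketch in which the name TABLES is hypothetical and the entries are copied from the code above.

```python
# Lookup table replacing the if/elif chain; entries copied from the code above.
TABLES = {
    "login": "sales.login_table",
    "signup": "sales.signup_table",
    "account": "sales.account",
    "addcc": "sales.addcc",
    "consolidatedfunding": "sales.consolidatedfunding",
    "deposit": "sales.deposit",
    "holdsassessment": "sales.holdsassessment",
    "onboardinggc": "sales.onboardinggc",
}

data = ["login", "signup", "account"]

# Build the whole list in one step, so nothing is reset between iterations.
source = [TABLES[name] for name in data]

print(source)        # ['sales.login_table', 'sales.signup_table', 'sales.account']
print(type(source))  # <class 'list'>
```

With pandas imported as pd, pd.DataFrame({"table": source}) would turn the list into a one-column DataFrame; the column name table is an assumption.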

Apply result to dataset after df.iterrows

import numpy as np
import pandas as pd

df = pd.read_csv('./test22.csv')
df.head(5)
df = df.replace(np.nan, None)
for index,col in df.iterrows():
    # Extract only if date1 happened earlier than date2
    load = 'No'
    if col['date1'] == None or col['date2'] == None:
        load = 'yes'
    elif int(str(col['date1'])[:4]) >= int(str(col['date2'])[:4]) and \
            (len(str(col['date1'])) == 4 or len(str(col['date2'])) == 4):
        load = 'yes'
    elif int(str(col['date1'])[:6]) >= int(str(col['date2'])[:6]) and \
            (len(str(col['date1'])) == 6 or len(str(col['date2'])) == 6):
        load = 'yes'
    elif int(str(col['date1'])[:8]) >= int(str(col['date2'])[:8]):
        load = 'yes'
df.head(5)
After preprocessing the dataset with iterrows as in the code above, the result is not reflected in the actual dataset. I want the result to show up in the dataset itself.
How can I apply it to the actual dataset?
Replace your for loop with a function that returns a boolean, then you can use df.apply to apply it to all rows, and then filter your dataframe by that value:
def should_load(x):
    if x['date1'] == None or x['date2'] == None:
        return True
    elif int(str(x['date1'])[:4]) >= int(str(x['date2'])[:4]) and \
            (len(str(x['date1'])) == 4 or len(str(x['date2'])) == 4):
        return True
    elif int(str(x['date1'])[:6]) >= int(str(x['date2'])[:6]) and \
            (len(str(x['date1'])) == 6 or len(str(x['date2'])) == 6):
        return True
    elif int(str(x['date1'])[:8]) >= int(str(x['date2'])[:8]):
        return True
    return False
df[df.apply(should_load, axis=1)].head(5)
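The three prefix comparisons can also be folded into one. The following is a sketch, not a drop-in replacement: it takes the two values directly instead of a row, and it assumes that the dates are integers or digit strings in YYYY, YYYYMM or YYYYMMDD form and that the intent is to compare both dates at the coarser of their two precisions. Under that assumption it differs from the chain above when date1 is the longer value with the smaller year, for example 201801 against 2019, which the chain reports as True.

```python
def should_load(date1, date2):
    """Return True when a date is missing or date1 is not earlier than date2.

    Assumption: both values are ints or digit strings in YYYY, YYYYMM or
    YYYYMMDD form and are compared at the coarser of the two precisions.
    """
    if date1 is None or date2 is None:
        return True
    s1, s2 = str(date1), str(date2)
    width = min(len(s1), len(s2), 8)  # 4 = year, 6 = month, 8 = day
    return int(s1[:width]) >= int(s2[:width])


print(should_load(None, 2020))          # True
print(should_load(2019, 202001))        # False
print(should_load("20200101", "2019"))  # True
print(should_load(201906, 20190501))    # True
print(should_load(201801, 2019))        # False
```

To keep the result in the dataset itself rather than only filtering on it, the flags can be stored as a column, for example df["load"] = [should_load(a, b) for a, b in zip(df["date1"], df["date2"])], where the column name load follows the variable used in the question.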

Iterating through list (Python)

I have a list that continues like this, and I would like to sort its items based on the first 2 digits of the second part of each item. I'm really rushed right now, so some help would be nice.
collection = ["81851 19AJA01", "68158 17ARB03", "104837 20AAH02",
I tried this and it didn't work. I'm not doing this for a class; I'd really appreciate some help.
for x in collection:
    counter = 0
    i=0
    for y in len(str(x)):
        if (x[i] == '1'):
            counter == 1
        elif (x[i] == '2'):
            counter == 2
        elif x[i] == '0' and counter == 2:
            counter = 2
        elif x[i] == '9' and counter == 1:
            counter = 3
        elif x[i] == '8' and counter == 1:
            counter = 4
        elif x[i] == '7' and counter == 1:
            counter = 5
        i = i + 1
    if (counter==2):
        freshmen.append(x)
    elif (counter==3):
        sophomores.append(x)
    elif (counter==4):
        juniors.append(x)
    elif (counter==5):
        seniors.append(x)
Use the key function to define a custom sorting rule:
In [1]: collection = ["81851 19AJA01", "68158 17ARB03", "104837 20AAH02"]
In [2]: sorted(collection, key=lambda x: int(x.split()[1][:2]))
Out[2]: ['68158 17ARB03', '81851 19AJA01', '104837 20AAH02']
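If the goal is to split the entries into the four lists from the question rather than only sort them, the same two-digit prefix can drive a dictionary lookup. The following is a minimal sketch, assuming the mapping implied by the loop in the question (20 for freshmen, 19 for sophomores, 18 for juniors, 17 for seniors); the names CLASS_BY_YEAR and groups are hypothetical.

```python
# Year prefix -> class, following the mapping implied by the loop in the question.
CLASS_BY_YEAR = {
    "20": "freshmen",
    "19": "sophomores",
    "18": "juniors",
    "17": "seniors",
}

collection = ["81851 19AJA01", "68158 17ARB03", "104837 20AAH02"]

groups = {name: [] for name in CLASS_BY_YEAR.values()}
for entry in collection:
    year = entry.split()[1][:2]  # first two characters of the second part
    if year in CLASS_BY_YEAR:
        groups[CLASS_BY_YEAR[year]].append(entry)

print(groups["freshmen"])    # ['104837 20AAH02']
print(groups["sophomores"])  # ['81851 19AJA01']
print(groups["seniors"])     # ['68158 17ARB03']
```

Entries whose prefix is not in the mapping are left out of every group.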
