Optimize Conditional while loop - python

I have a huge dataframe in Python (over 7 million rows). My general problem is that I need to run over a column and start a new number every time I see a '#' in that column. So the first time I see a '#', the counter becomes 1 and I drop that row; the following rows get the same number until I see the next '#', and so on.
I already have some code in place, but since it is a loop it is super slow!
i=0
j=0
while i <len(data):
    if data.iloc[i][0] == '#':
        j=j+1
        data = data.drop(data.index[i])
    else:
        data.iloc[i][0] = j
        i=i+1
return data

Try with something like this:
m = (data.iloc[:, 0] == '#')
data.iloc[:, 0] = m.cumsum()
data.drop(m.index[m], inplace=True)
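
To see what the mask and the cumulative sum are doing, here is a minimal pure-Python sketch of the same idea, with no pandas involved. The function name `number_blocks` and the sample list are made up for illustration; the list stands in for the first column of `data`.

```python
from itertools import accumulate

def number_blocks(values, marker="#"):
    """Return (block_number, value) pairs with the marker rows removed."""
    # Boolean mask: True where the row is a marker (plays the role of `m`).
    mask = [v == marker for v in values]
    # Running count of markers seen so far (plays the role of `m.cumsum()`).
    block_ids = list(accumulate(int(flag) for flag in mask))
    # Keep only the non-marker rows (plays the role of the final drop).
    return [(b, v) for b, v, is_marker in zip(block_ids, values, mask)
            if not is_marker]

rows = ["#", "a", "b", "#", "c", "#", "d", "e"]
print(number_blocks(rows))
```

Every row is visited once and nothing is deleted mid-iteration, which is why the vectorized version avoids the cost of calling drop inside a loop.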

Related

how can I convert the for-while loop into a for-for loop and achieve the same output?

This code successfully prints a youtube play button
rows = 6
symbol = '\U0001f34a'
for i in range(rows):
    for j in range(i+1):
        print('{'+symbol+'}', end = '')
    print()
for x in range(rows):
    while x < 5:
        print('{'+symbol+'}', end = '')
        x+=1
    print()
I tried to change the while loop into a for loop that prints the same upside-down right triangle, but it doesn't work.
rows = 6
symbol = '\U0001f34a'
for i in range(rows):
    for j in range(i+1):
        print('{'+symbol+'}', end = '')
    print()
for x in range(rows):
    for x in range(5):
        print('{'+symbol+'}', end = '')
    print()
Your second inner loop always iterates over range(5), so every row gets five symbols and you do not get the desired output.
You can use the outer loop variable to set the inner range, but on its own that just repeats the ascending triangle from above. To get descending rows, I reversed the outer range:
for x in range(rows)[::-1]: # Reverse the range
    for y in range(x): # Use 1st loop variable as parameter
        print('{'+symbol+'}', end = '')
    print() # End the current row
Output, with an 'O' since I didn't set up the encoding for your symbol:
{O}
{O}{O}
{O}{O}{O}
{O}{O}{O}{O}
{O}{O}{O}{O}{O}
{O}{O}{O}{O}{O}{O}
{O}{O}{O}{O}{O}
{O}{O}{O}{O}
{O}{O}{O}
{O}{O}
{O}
OK, thanks for the help, guys. I see where I messed up: when converting the while loop to a for loop, I was supposed to derive the inner range from the outer loop's variable. Here is my new code, which works perfectly:
rows = 6
symbol = '\U0001f34a'
for i in range(rows):
    for j in range(i+1):
        print('{'+symbol+'}', end = '')
    print()
for x in range(rows,0,-1):
    for e in range(x,1,-1):
        print('{'+symbol+'}', end = '')
    print()
It was fun to create, so here is my code that works.
The second for loop starts at the maximum length and counts down to 1 with a step of -1, which gives you a decreasing loop:
for i in range(rows):
    print(('{'+symbol+'}')*(i+1)) # parentheses so the whole cell is repeated, not just '}'
for i in range(rows,0,-1):
    print(('{'+symbol+'}')*(i-1))
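
The pattern can also be built as a list of strings, which makes it easy to check before printing. This is only a sketch: the function name `play_button` and the default symbol 'O' are made up here, and unlike the loops above it does not emit the trailing empty row.

```python
def play_button(rows, symbol="O"):
    """Return the lines of the pattern: an ascending then a descending triangle."""
    cell = "{" + symbol + "}"
    # Ascending part: 1, 2, ..., rows cells per line.
    ascending = [cell * n for n in range(1, rows + 1)]
    # Descending part: rows - 1, ..., 1 cells per line.
    descending = [cell * n for n in range(rows - 1, 0, -1)]
    return ascending + descending

print("\n".join(play_button(6)))
```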

Updating Dataframe during Traversal

I'm working with dataframes, and need to delete a few rows as I iterate through them.
A brief overview: I read a row (N), compare it with the next 20 rows (till N+20), and delete a few rows between N and N+20 based on the comparison. I then go back to N+1, and compare that row with the next 20 rows, until N+1+20. I do not want to compare N+1 with the rows I've previously deleted.
However, the deletions are not reflected during the traversal, because I am iterating over the original copy of the dataframe.
Any solutions for this?
df = pd.read_csv(r"C:\snip\test.csv")
index_to_delete = []
for index, row in df.iterrows():
    snip
    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1) #storing indices of rows to delete between N and N+20
    df.loc[index, ['snip1', 'snip2']] = [snip, snip] #updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
From the pandas.DataFrame.iterrows() documentation:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
There are several tricks to solve this problem:
1: Iterate over the length of df instead of iterating over df itself, and skip labels that no longer exist.
for inx in range(len(df)):
    try:
        row = df.loc[inx]
    except KeyError: # this row has already been deleted
        continue
2: Store the deleted indexes and skip them.
df = pd.read_csv(r"C:\snip\test.csv")
all_index_to_delete = []
index_to_delete = []
for index, row in df.iterrows():
    if index in all_index_to_delete:
        continue
    snip
    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1) #storing indices of rows to delete between N and N+20
                all_index_to_delete.append(index + i + 1) #remembering every deleted index so it is skipped later
    df.loc[index, ['snip1', 'snip2']] = [snip, snip] #updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
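
The underlying pattern, comparing each surviving row only with surviving rows ahead of it, can be sketched on a plain list. Everything here is an assumption for illustration: the function name `prune_lookahead`, the `should_delete` predicate (standing in for the elided comparison), and the choice to count the window over surviving rows instead of original index positions.

```python
def prune_lookahead(rows, should_delete, window=20):
    """Compare each surviving row with the next `window` surviving rows.

    `should_delete(current, candidate)` is a hypothetical predicate that
    returns True when `candidate` must be removed.
    """
    kept = list(rows)              # work on a copy; the input is untouched
    n = 0
    while n < len(kept):           # len() is re-evaluated, so deletions are seen
        current = kept[n]
        lookahead = kept[n + 1 : n + 1 + window]
        survivors = [c for c in lookahead if not should_delete(current, c)]
        # Replace the look-ahead slice with only the rows that survived.
        kept[n + 1 : n + 1 + window] = survivors
        n += 1
    return kept

# Example: drop any value equal to the current one within the next 2 rows.
print(prune_lookahead([1, 2, 2, 3, 1, 4], lambda a, b: a == b, window=2))
```

Because the loop is driven by a position and a freshly evaluated length, no iterator is left pointing at rows that have already been removed.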

Increment by one everytime loop runs

How can I make the cell number increase by one every time it loops through all of the sheets? I got it to loop through the different sheets itself but I'm not sure how to add +1 to the cell value.
for sheet in sheetlist:
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    ws2['D4'] = wsX['P6'].value
I'm trying to get just the ['D4'] to change to D5,D6,D7.. etc up to 25 automatically.
No need for counters or clumsy string conversion: openpyxl provides an API for programmatic access.
for idx, sheet in enumerate(sheetlist, start=4):
    wsX = wb[sheet]
    cell = ws2.cell(row=idx, column=4) # column 4 is column D
    cell.value = wsX['P6'].value
for i, sheet in enumerate(sheetlist):
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    cell_no = 'D' + str(i + 4)
    ws2[cell_no] = wsX['P6'].value
Write this outside of the loop:
x = 'D4'
Write this at the end of the loop body, after x has been used:
x = x[0] + str(int(x[1:])+1)
Try this one. It's commented so you can understand what it's doing.
#counter
i = 4
for sheet in sheetlist:
    #stop once D25 has been written
    if i > 25:
        break
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    #dynamic way to get the cell
    cell1 = 'D' + str(i)
    ws2[cell1] = wsX['P6'].value
    #incrementing counter
    i += 1
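
The cell-name arithmetic can be checked on its own, without a workbook. The sketch below does not use openpyxl at all; the function name `target_cells` and its parameters are made up for illustration, and it only computes which cell each sheet would write to.

```python
def target_cells(sheet_names, column="D", first_row=4, last_row=25):
    """Pair each sheet name with the cell that should receive its value."""
    pairs = []
    for row, name in enumerate(sheet_names, start=first_row):
        if row > last_row:      # stop once the last allowed row is filled
            break
        pairs.append((name, f"{column}{row}"))
    return pairs

print(target_cells(["Jan", "Feb", "Mar"]))
```

Each resulting cell name could then be used as the key in an assignment like the ws2[...] lines above.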

Python, append within a loop

So I need to save the results of a loop and I'm having some difficulty. I want to record my results to a new list, but I get "string index out of range" and other errors. The end goal is to record the products of digits 1-5, 2-6, 3-7 etc, eventually keeping the highest product.
def product_of_digits(number):
    d= str(number)
    for integer in d:
        s = 0
        k = []
        while s < (len(d)):
            j = (int(d[s])*int(d[s+1])*int(d[s+2])*int(d[s+3])*int(d[s+4]))
            s += 1
            k.append(j)
        print(k)
product_of_digits(n)
Similar question some time ago. Hi Chauxvive
This happens because s runs all the way to the last index of d, so d[s+1] through d[s+4] read past the end of the string. Instead, you should change your while loop to:
while s < (len(d)-4):
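
Putting the corrected bound together with the stated goal of keeping the highest product, one possible sketch looks like this. The function name `max_window_product` and the `width` parameter are assumptions added for illustration, and `math.prod` requires Python 3.8 or newer.

```python
from math import prod

def max_window_product(number, width=5):
    """Return the largest product of `width` consecutive digits, or None."""
    digits = [int(ch) for ch in str(number)]
    # The last valid start index is len(digits) - width, so no window
    # ever reads past the end of the digit list.
    products = [prod(digits[s:s + width])
                for s in range(len(digits) - width + 1)]
    return max(products) if products else None

print(max_window_product(123456))
```

With the default width of 5, `range(len(digits) - width + 1)` covers the same start positions as `while s < (len(d)-4)`.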

Python trouble exiting a 'while' loop for simple script

I wrote a script to reformat a tab-delimited matrix (with header) into a "long format". See example below. It performs the task correctly but it seems to get stuck in an endless loop...
Example of input:
WHO THING1 THING2
me me1 me2
you you1 you2
Desired output:
me THING1 me1
me THING2 me2
you THING1 you1
you THING2 you2
Here is the code:
import csv
matrix_file = open('path')
matrix_reader = csv.reader(matrix_file, delimiter="\t")
j = 1
while j:
    matrix_file.seek(0)
    rownum = 0
    for i in matrix_reader:
        rownum+=1
        if j == int(len(i)):
            j = False
        elif rownum ==1:
            header = i[j]
        else:
            print i[0], "\t",header, "\t",i[j]
j +=1
I think it has to do with my exit command (j = False). Any ideas?
Edit: Thanks for the suggestions. I think a typo in my initial posting led to some confusion, sorry about that. For now I have employed a simple solution:
valid = True
while valid:
    matrix_file.seek(0)
    rownum = 0
    for i in matrix_reader:
        rownum+=1
        if j == int(len(i)):
            valid = False
        etc, etc, etc...
Your j += 1 is outside the while loop, so j never increases. If len(i) is never less than 2, then you'll have an infinite loop.
But as has been observed, there are other problems with this code. Here's a working version based on your idiom. I would do a lot of things differently, but perhaps you'll find it useful to see how your code could have worked:
j = 1
while j:
    matrix_file.seek(0)
    rownum = 0
    for i in matrix_reader:
        rownum += 1
        if j == len(i) or j == -1:
            j = -1
        elif rownum == 1:
            header = i[j]
        else:
            print i[0], "\t", header, "\t", i[j]
    j += 1
It doesn't print the rows in the order you wanted, but it gets the basics right.
Here's how I would do it instead. I see that this is similar to what Ashwini Chaudhary posted, but a bit more generalized:
import csv
matrix_file = open('path')
matrix_reader = csv.reader(matrix_file, delimiter="\t")
headers = next(matrix_reader, '')
for row in matrix_reader:
    for header, value in zip(headers[1:], row[1:]):
        print row[0], header, value
j += 1 is outside the while loop, as senderle's answer says.
Other improvements:
int(len(i)): just use len(i). len() always returns an int, so there is no need for int() around it.
Use for rownum, i in enumerate(matrix_reader, 1): so there is no need to maintain a separate rownum variable; it is incremented automatically, and starting at 1 keeps the rownum == 1 check working.
EDIT: A working version of your code. I don't think a while loop is needed here; the for loop is sufficient.
import csv
matrix_file = open('data1.csv')
matrix_reader = csv.reader(matrix_file, delimiter="\t")
header=matrix_reader.next()[0].split() #now header is ['WHO', 'THING1', 'THING2']
for i in matrix_reader:
    line=i[0].split()
    print line[0], "\t",header[1], "\t",line[1]
    print line[0], "\t",header[2], "\t",line[2]
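
The snippets in this thread use Python 2 print statements. For readers on Python 3, here is a sketch of the same header-zip approach. The function name `wide_to_long` is made up for illustration, and an in-memory `io.StringIO` stands in for the real file so the example is self-contained.

```python
import csv
import io

def wide_to_long(lines):
    """Yield (row_label, column_header, value) triples from tab-delimited lines."""
    reader = csv.reader(lines, delimiter="\t")
    headers = next(reader, [])          # first line holds the column names
    for row in reader:
        # Pair every data column with its header; column 0 is the row label.
        for header, value in zip(headers[1:], row[1:]):
            yield row[0], header, value

sample = io.StringIO("WHO\tTHING1\tTHING2\nme\tme1\tme2\nyou\tyou1\tyou2\n")
for triple in wide_to_long(sample):
    print("\t".join(triple))
```

The file is read exactly once, so there is no seek(0), no column counter, and no exit condition to get wrong.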
