Shift cells up if entire row is empty in Openpyxl - python

I want an entire row to be removed (with the cells below shifted up) if there are no values anywhere in that row. I'm using Openpyxl.
My code:
for row in range(1, ws1.max_row):
    flag = 0
    for col in range(1, 50):
        if ws1.cell(row, col).value is not None:
            flag = 1
    if flag == 0:
        ws1.delete_rows(row, 1)
The rows are not getting deleted in the above case.
I tried using iter_rows function to do the same and it gives me:
TypeError: '>' not supported between instances of 'tuple' and 'int'
for row in ws1.iter_rows(min_row = 1, max_col=50, max_row = ws1.max_row):
    flag = 0
    for cell in row:
        if cell.value is not None:
            flag = 1
    if flag == 0:
        ws1.delete_rows(row, 1)
Help is appreciated!

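The following is a generic approach to finding and then deleting empty rows. In the iter_rows attempt, row is a tuple of cells rather than a row number, which is most likely why delete_rows raised the TypeError. Deleting while iterating forwards also shifts the remaining rows up, so the row numbers are collected first and then deleted in reverse order.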
empty_rows = []
for idx, row in enumerate(ws.iter_rows(max_col=50), start=1):
    empty = not any(cell.value for cell in row)
    if empty:
        empty_rows.append(idx)
for row_idx in reversed(empty_rows):
    ws.delete_rows(row_idx, 1)
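Why the deletion order matters can be seen without a workbook. The following is a minimal sketch in which a plain Python list stands in for the worksheet rows (an assumption made purely for illustration); the helper name delete_rows_from_list is hypothetical and is not part of openpyxl.

```python
def delete_rows_from_list(rows, indices, reverse):
    """Delete 1-based row numbers from a copy of rows, letting later rows shift up."""
    rows = list(rows)
    for idx in sorted(indices, reverse=reverse):
        # A row number may no longer exist once earlier deletions have shifted rows up.
        if idx <= len(rows):
            del rows[idx - 1]
    return rows

# None marks an empty row; the list is a stand-in for a worksheet.
rows = ["a", None, None, "b", None, "c"]
empty = [i for i, value in enumerate(rows, start=1) if value is None]

forward = delete_rows_from_list(rows, empty, reverse=False)
backward = delete_rows_from_list(rows, empty, reverse=True)

print(forward)   # rows deleted top-down: stale row numbers hit the wrong rows
print(backward)  # rows deleted bottom-up: every stored row number is still valid
```

Deleting from the bottom up never changes the position of a row that is still waiting to be deleted, which is the reason for reversed(empty_rows) in the answer above.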

Thanks to Charlie Clark for the help. Here is a working solution I came up with; let me know if I can make any improvements to it:
i = 1
emptyRows = []
for row in ws1.iter_rows(min_row = 1, max_col=50, max_row = ws1.max_row):
    flag = 0
    for cell in row:
        if cell.value is not None:
            flag = 1
    if flag == 0:
        emptyRows.append(i)
    i += 1
for x in emptyRows:
    ws1.delete_rows(x, 1)
    emptyRows[:] = [y - 1 for y in emptyRows]

Related

ValueError: Row numbers must be between 1 and 1048576

I'm using python openpyxl to extract specific data from an xlsx file to another xlsx. I defined a function which extracts the data I need and then I ran it using a while loop and told it to stop when it finds an empty cell.
But for some reason it gives me this error: Row numbers must be between 1 and 1048576
Here is my code:
x=3; y=2; z=4; i=5
def line():
    c1 = ws1.cell(row = z, column = 1)
    ws2.cell(row = y, column = 1).value = c1.value
    c2 = ws1.cell(row = i, column = 2)
    ws2.cell(row = y, column = 2).value = c2.value
    c3 = ws1.cell(row = i, column = x)
    ws2.cell(row = y, column = 3).value = c3.value
while ws1.cell(row=i, column=x+2).value != "":
    line()
    y+=1
    x+=2
    i+=1
else:
    sys.exit()
What am I doing wrong?
A cell's value is None when there is no data; it is never "". So the loop's stop condition is never met, and the while loop keeps going until the row number exceeds Excel's last row. Change...
while ws1.cell(row=i, column=x+2).value != "":
to
while ws1.cell(row=i, column=x+2).value is not None:
...and the code would run as expected.

Making permanent change in a dataframe using python pandas

I would like to convert my dataframe from one format of values (X:XX:XX:XX) to another (X.X seconds).
Here is what my dataframe looks like:
        Start         End
0  0:00:00:00
1  0:00:00:00  0:07:37:80
2  0:08:08:56  0:08:10:08
3  0:08:13:40
4  0:08:14:00  0:08:14:84
And I would like to convert it to seconds, something like this:
    Start     End
0     0.0
1     0.0  457.80
2  488.56  490.08
3  493.40
4   494.0  494.84
To do that I did:
i = 0
j = 0
while j < 10:
    while i < 10:
        if data.iloc[i, j] != "":
            Value = (int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100)
            NewValue = data.iloc[:, j].replace([data.iloc[i, j]], Value)
            i += 1
        else:
            NewValue = data.iloc[:, j].replace([data.iloc[i, j]], "")
            i += 1
        data.update(NewValue)
    i = 0
    j += 1
But I failed to replace the old values in my dataframe with the new ones permanently. When I do:
print(data)
I still get my old dataframe in the wrong format.
Could someone help me? I tried so hard!
Thank you so much!
You are using pandas.DataFrame.update, which expects a pandas DataFrame as its argument. See the Examples section of the update documentation to really understand what update does: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
If I may suggest a more idiomatic solution: you can map a function directly over all values of a pandas Series.
def parse_timestring(s):
    if s == "":
        return s
    else:
        # weird to use centiseconds and not milliseconds
        # l is a list with [hour, minute, second, cs]
        l = [int(nbr) for nbr in s.split(":")]
        return sum([a*b for a,b in zip(l, (3600, 60, 1, 0.01))])
df["Start"] = df["Start"].map(parse_timestring)
You can remove the if ... else ... from parse_timestring if you first replace all empty strings in your dataframe with NaN values, using df = df.replace("", numpy.nan), and then call df["Start"] = df["Start"].map(parse_timestring, na_action='ignore').
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
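The arithmetic behind the conversion can be checked on its own, without pandas. This is a minimal sketch assuming the H:MM:SS:CC layout described in the question (hours, minutes, seconds, centiseconds); the function name timestring_to_seconds is hypothetical.

```python
def timestring_to_seconds(text):
    """Convert 'H:MM:SS:CC' (hours, minutes, seconds, centiseconds) to seconds."""
    if text == "":
        return text  # keep empty cells empty
    hours, minutes, seconds, centis = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds + centis / 100

# Sample values taken from the question's dataframe, plus one empty cell.
samples = ["0:00:00:00", "0:07:37:80", "0:08:10:08", ""]
converted = [timestring_to_seconds(s) for s in samples]
print(converted)
```

A function like this can be passed to Series.map in the same way as parse_timestring.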
The datetime library is made to deal with such data. You should also use the apply function of pandas to avoid iterating over the dataframe like that.
You should proceed as follows:
from datetime import timedelta
def to_seconds(value):
    # leave empty cells untouched
    if value == "":
        return value
    # the fields are hours:minutes:seconds:centiseconds
    hours, minutes, seconds, centis = (int(part) for part in value.split(':'))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds, milliseconds=centis * 10)
    return delta.total_seconds()
data['Start'] = data['Start'].apply(to_seconds)
data['End'] = data['End'].apply(to_seconds)
Thank you so much for your help.
Your method works. I also found a method using loops.
To summarize, my general problem was that I had an ugly csv file that I wanted to transform into a csv usable for statistics, and I wanted to use Python to do that.
My csv file looked like this:
MiceID = 1  Beginning   End         Type of behavior
0           0:00:00:00              Video start
1           0:00:01:36              grooming type 1
2           0:00:03:18              grooming type 2
3           0:00:06:73  0:00:08:16  grooming type 1
So in my ugly csv file I wrote only the moment each behavior began, without its end, when different types of behavior directly followed each other, and I wrote the end of a behavior only when the mouse stopped grooming altogether, which let me separate sequences of grooming. But this type of csv was not easy to use for statistics.
So I wanted to: 1) convert all my values to seconds to have a correct format; 2) fill the gaps in the End column (a gap has to be filled with the following Beginning value, as the end of a behavior within a sequence is the beginning of the next one); 3) create a column for the duration of each behavior; and finally 4) fill this new column with the durations.
My question was about the first step, but I put the code for each step here separately:
step 1: transform the values into a good format
import pandas as pd
import numpy as np
data = pd.read_csv("D:/Python/TestPythonTraitementDonnéesExcel/RawDataBatch2et3.csv", engine = "python")
data.replace(np.nan, "", inplace = True)
i = 0
j = 0
while j < len(data.columns):
    while i < len(data.index):
        if (":" in data.iloc[i, j]) == True:
            Value = str((int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100))
            data = data.replace([data.iloc[i, j]], Value)
            data.update(data)
            i += 1
        else:
            i += 1
    i = 0
    j += 1
print(data)
step 2: fill the gaps
i = 0
j = 2
while j < len(data.columns):
    while i < len(data.index) - 1:
        if data.iloc[i, j] == "":
            data.iloc[i, j] = data.iloc[i + 1, j - 1]
            data.update(data)
            i += 1
        elif np.all(data.iloc[i:len(data.index), j] == ""):
            break
        else:
            i += 1
    i = 0
    j += 4
print(data)
step 3: create a new column for each mouse:
j = 1
k = 0
while k < len(data.columns) - 1:
    k = (j * 4) + (j - 1)
    data.insert(k, "Duree{}".format(k), "")
    data.update(data)
    j += 1
print(data)
step 4: fill the new columns with the durations
j = 4
i = 0
while j < len(data.columns):
    while i < len(data.index):
        if data.iloc[i, j - 2] != "":
            data.iloc[i, j] = str(float(data.iloc[i, j - 2]) - float(data.iloc[i, j - 3]))
            data.update(data)
            i += 1
        else:
            break
    i = 0
    j += 5
print(data)
And of course, export my new usable dataframe
data.to_csv(r"D:/Python/TestPythonTraitementDonnéesExcel/FichierPropre.csv", index = False, header = True)
Here are the transformations (linked pictures: before step 1, after step 1, after step 2, after step 3, after step 4).
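The gap-filling and duration steps (steps 2 and 4 above) can also be illustrated without index loops over a dataframe. The following is only a sketch on plain Python lists, assuming the values have already been converted to seconds and that a missing End is represented by None; the function names fill_ends and durations are hypothetical, and the numbers come from the sample csv shown earlier.

```python
def fill_ends(beginnings, ends):
    """Fill each missing end (None) with the beginning of the next row."""
    filled = list(ends)
    # The last row has no following row, so it is left as it is.
    for i in range(len(filled) - 1):
        if filled[i] is None:
            filled[i] = beginnings[i + 1]
    return filled

def durations(beginnings, ends):
    """Duration of each behavior; None where the end is still unknown."""
    return [None if end is None else round(end - begin, 2)
            for begin, end in zip(beginnings, ends)]

# Beginning and End columns of the sample file, already in seconds.
beginnings = [0.0, 1.36, 3.18, 6.73]
ends = [None, None, None, 8.16]

filled = fill_ends(beginnings, ends)
lengths = durations(beginnings, filled)
print(filled)
print(lengths)
```

The rounding to two decimals is a design choice made here to hide floating-point noise in the subtraction; it is not something the original code does.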

Python XLRD Get Column Values by Column Names into List of Dictionaries

I have a xlsx file in which the data does not start at first row or column. It looks like this.
Only the column names are known here. The data ends whenever there is "**********" in first column.
I need the output in a list of dictionaries, like below.
'ListOfDict':
[ { 'A':1, 'B':'ABC', 'C':'Very Good', 'D':'Hardware', 'E':200.2 },
{ 'A':2, 'B':'DEF', 'C':'Not so good', 'D':'Software', 'E':100.1}]
I could figure out the column names. But could not get the values. Here is my code.
import xlrd
from itertools import product
wb = xlrd.open_workbook(filename)
ws = wb.sheet_by_index(0)
for row_index, col_index in product(xrange(ws.nrows), xrange(ws.ncols)):
    if ws.cell(row_index, col_index).value == 'A':
        print "({}, {})".format(row_index, col_index)
        break
key1 = [ws.cell(row_index, col_ind).value for col_ind in range(col_index, ws.ncols)]
val = [ws.cell(row_index + i, col_ind).value
       for i in range(row_index + 1, ws.nrows)
       for col_ind in range(col_index, ws.ncols)]
But that gives error "list index out of range"
Please help.
Thank you.
Your problem is that the loop variable i is already the row index, not the offset.
So you simply need to change the row index of the cell to take i:
val = [ws.cell(i, col_ind).value
       for i in range(row_index + 1, ws.nrows)
       for col_ind in range(col_index, ws.ncols)]
Or alternatively fix the creation of the offset:
val = [ws.cell(row_index + i, col_ind).value
       for i in range(1, ws.nrows - row_index)
       for col_ind in range(col_index, ws.ncols)]
What I would do is first find the last row according to your condition. Then, with a nested loop, create the dictionaries. Something like:
import xlrd
from itertools import product
wb = xlrd.open_workbook(filename)
ws = wb.sheet_by_index(0)
# find the row that holds the end marker in the first column
for row_index in xrange(ws.nrows):
    if ws.cell(row_index, 0).value == '**********':
        last_row = row_index
        break
# find the header cell 'A', i.e. the top-left corner of the table
for row_index, col_index in product(xrange(ws.nrows), xrange(ws.ncols)):
    if ws.cell(row_index, col_index).value == 'A':
        first_row, first_col = row_index, col_index
        print "({}, {})".format(row_index, col_index)
        break
list_of_dicts = []
for row in range(first_row+1, last_row):
    row_dict = {}
    for col in range(first_col, ws.ncols):
        key = ws.cell(first_row, col).value
        val = ws.cell(row, col).value
        row_dict[key] = val
    list_of_dicts.append(row_dict)
And in a much shorter, unreadable way (just for fun...):
list_of_dicts = [{ws.cell(first_row, col).value: ws.cell(row, col).value for col in range(first_col, ws.ncols)} for row in range(first_row+1, last_row)]
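The same approach (locate the header cell, locate the end marker, then zip header names with each data row) can be tried out without xlrd. Below is a minimal Python 3 sketch in which a list of lists stands in for the sheet; the grid contents, the function name table_to_dicts and its parameters are assumptions made for illustration.

```python
def table_to_dicts(grid, first_header="A", end_marker="**********"):
    """Turn an offset table inside a grid of cell values into a list of dicts."""
    # Locate the header cell, i.e. the top-left corner of the table.
    first_row = first_col = None
    for r, row in enumerate(grid):
        if first_header in row:
            first_row, first_col = r, row.index(first_header)
            break
    if first_row is None:
        raise ValueError("header cell not found")

    # The data ends at the first row whose first column holds the end marker.
    last_row = len(grid)
    for r in range(first_row + 1, len(grid)):
        if grid[r][0] == end_marker:
            last_row = r
            break

    keys = grid[first_row][first_col:]
    return [dict(zip(keys, grid[r][first_col:]))
            for r in range(first_row + 1, last_row)]

# A stand-in for the sheet: the table starts at row 1, column 1.
grid = [
    ["", "", "", "", "", ""],
    ["", "A", "B", "C", "D", "E"],
    ["", 1, "ABC", "Very Good", "Hardware", 200.2],
    ["", 2, "DEF", "Not so good", "Software", 100.1],
    ["**********", "", "", "", "", ""],
]
list_of_dicts = table_to_dicts(grid)
print(list_of_dicts)
```

With a real sheet, the grid could be built from the cell values row by row before calling the function.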

Increment by one everytime loop runs

How can I make the cell number increase by one every time it loops through all of the sheets? I got it to loop through the different sheets itself but I'm not sure how to add +1 to the cell value.
for sheet in sheetlist:
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    ws2['D4'] = wsX['P6'].value
I'm trying to get just the ['D4'] to change to D5,D6,D7.. etc up to 25 automatically.
No need for counters or clumsy string conversion: openpyxl provides an API for programmatic access.
for idx, sheet in enumerate(sheetlist, start=4):
    wsX = wb[sheet]
    # column 4 is column D; idx runs 4, 5, 6, ... so this writes D4, D5, D6, ...
    cell = ws2.cell(row=idx, column=4)
    cell.value = wsX['P6'].value
for i, sheet in enumerate(sheetlist):
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    cell_no = 'D' + str(i + 4)
    ws2[cell_no] = wsX['P6'].value
Write this outside of the loop:
x = 'D4'
Write this at the end of the loop body:
x = x[0] + str(int(x[1:])+1)
Try this one... it's commented so you can understand what it's doing.
#counter
i = 4
for sheet in sheetlist:
    #stop once D25 has been written
    if i > 25:
        break
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    #dynamic way to get the cell
    cell1 = 'D' + str(i)
    ws2[cell1] = wsX['P6'].value
    #incrementing counter
    i += 1
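The coordinate arithmetic used in the answers above can be checked without a workbook. This is a small sketch, assuming one target cell per sheet starting at D4; the function name target_cells and the sheet names are made up for illustration.

```python
def target_cells(sheet_names, column="D", first_row=4):
    """Pair each sheet name with the coordinate its value should be written to."""
    return [(name, f"{column}{row}")
            for row, name in enumerate(sheet_names, start=first_row)]

pairs = target_cells(["Jan", "Feb", "Mar"])
for name, coordinate in pairs:
    print(name, "->", coordinate)
```

Using enumerate with start=4 keeps the row number tied to the loop itself, so no separate counter has to be incremented by hand.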

How do I put all my looped output in a variable (for generating an output file)? (CSV related)

I am quite new to working with Python, so I hope you can help me out here. I have to write a program that opens a csv file, reads it, and lets you select the columns you want by entering their numbers. Those columns have to be put in a new file. The problem is: after entering which columns I want and typing "X" to start the main part, it generates exactly what I want, but it does so by printing inside a loop, not by printing a variable that contains the result. For the csv writer, though, I need a variable containing it. Any ideas? Here is my code; feel free to ask questions. The csv file looks like this:
john, smith, 37, blue, michigan
tom, miller, 25, orange, new york
jack, o'neill, 40, green, Colorado Springs
...etc
Code is:
import csv
with open("test.csv","r") as t:
    t_read = csv.reader(t, delimiter=",")
    t_list = []
    max_row = 0
    for row in t_read:
        if len(row) != 0:
            if max_row < len(row):
                max_row = len(row)
            t_list = t_list + [row]
            print([row], sep = "\n")
    twrite = csv.writer(t, delimiter = ",")
    tout = []
    counter = 0
    matrix = []
    for i in range(len(t_list)):
        matrix.append([])
    print(len(t_list), max_row, len(matrix), "Rows / Columns / Matrix Dimension")
    eingabe = input("Enter column number you need or X to start generating output: ")
    nr = int(eingabe)
    while type(nr) == int:
        colNr = nr-1
        if max_row > colNr and colNr >= 0:
            nr = int(nr)
            # print (type(nr))
            for i in range(len(t_list)):
                row_A=t_list[i]
                matrix[i].append(row_A[int(colNr)])
                print(row_A[int(colNr)])
            counter = counter +1
            matrix.append([])
        else:
            print("ERROR")
        nr = input("Enter column number you need or X to start generating output: ")
        if nr == "x":
            print("\n"+"Generating Output... " + "\n")
            for row in matrix:
                # Loop over columns.
                for column in row:
                    print(column + " ", end="")
                print(end="\n")
        else:
            nr = int(nr)
    print("\n")
t.close()
Well, you have everything you need in matrix, apart from an erroneous line that adds an unneeded row:
            counter = counter +1
            matrix.append([]) # <= remove this line
        else:
            print("ERROR")
You can then simply do:
        if nr == "x":
            print("\n"+"Generating Output... " + "\n")
            # newline="" lets the csv module control the line endings itself
            with open("testout.csv", "w", newline="") as out:
                wr = csv.writer(out, delimiter=",")
                wr.writerows(matrix)
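The idea of collecting the selected columns in one variable and handing it to writerows in a single call can be sketched end to end with the standard library alone. In this sketch io.StringIO objects stand in for the input and output files, the sample data is shortened and written without spaces after the commas, and the helper name select_columns is hypothetical.

```python
import csv
import io

def select_columns(rows, column_numbers):
    """Reduce each row to the given 1-based column numbers, in that order."""
    return [[row[n - 1] for n in column_numbers] for row in rows]

# Stand-in for open("test.csv"): two data rows and one blank line.
source = io.StringIO(
    "john,smith,37,blue,michigan\n"
    "\n"
    "tom,miller,25,orange,new york\n"
)
rows = [row for row in csv.reader(source) if row]  # skip empty lines

# Keep the first and third column, as if the user had entered 1 and 3.
matrix = select_columns(rows, [1, 3])

# Stand-in for open("testout.csv", "w", newline="").
out = io.StringIO()
csv.writer(out, delimiter=",").writerows(matrix)
output_text = out.getvalue()
print(output_text)
```

Because matrix already holds one list per output row, no printing loop is needed to produce the file.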
