I want to find the rowspan (4 in my case) of the merged cell in the first column that contains "abc", using openpyxl in Python. It's not a specific table; it's somewhere in my Excel sheet, and I am parsing that sheet.
One approach would be to determine if a given cell is within a merged range.
The following script returns the merged range that a given cell belongs to. It first locates the cell containing your text, then looks for a merged range that covers that cell:
    import openpyxl

    def find_cell(ws, text):
        # Return the first cell whose value equals text, or None
        for row in ws.iter_rows():
            for cell in row:
                if cell.value == text:
                    return cell
        return None

    def get_merged_range(ws, coordinate):
        # Current openpyxl releases expose merged ranges as ws.merged_cells.ranges
        # (older releases used ws.merged_cell_ranges)
        for merged_range in ws.merged_cells.ranges:
            if coordinate in merged_range:
                return merged_range
        return None

    wb = openpyxl.load_workbook(filename='input.xlsx')
    ws = wb.active
    found_cell = find_cell(ws, 'abc')
    if found_cell is not None:
        print(get_merged_range(ws, found_cell.coordinate))
If the passed cell is not merged, the function will return None.
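Since the question asks for the row span rather than the range itself, the span can be derived from the range boundaries. This is only a minimal sketch: it assumes the range object exposes min_row and max_row attributes (openpyxl's CellRange objects do), and the names Bounds and row_span are hypothetical stand-ins used for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a merged range; openpyxl's CellRange objects
# expose the same min_row / max_row attributes.
Bounds = namedtuple("Bounds", ["min_col", "min_row", "max_col", "max_row"])


def row_span(merged_range):
    """Return how many rows the range covers, or 1 if the cell is not merged."""
    if merged_range is None:
        return 1
    return merged_range.max_row - merged_range.min_row + 1


# A cell merged across A2:A5 spans four rows.
print(row_span(Bounds(min_col=1, min_row=2, max_col=1, max_row=5)))
```

Passing None (the "not merged" result) gives a span of 1, so the helper can be called directly on whatever the lookup returns.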
I am trying to iterate over an xlsx file and find the cell that contains our company's name using Python. The file consists of 2 or more sheets, and each sheet has information on 6 companies. Each cell I am looking for is formatted as below:
Cell F6 = 1ST(Company_A+Company_B)
Cell G6 = 2ND(Company_C+Company_D)
Cell H6 = 3RD(Company_E+Company_F)
and so on.
I'd like to find the cell that contains Company_A. I have written some code, but I ran into a problem.
The code I have so far is as follows:
    import openpyxl

    bid = openpyxl.load_workbook('C:/Users/User/Desktop/bidding.xlsx', data_only=True)
    for sheet in bid.worksheets:
        for row in sheet.iter_rows():
            for entry in row:
                if entry.value == '1ST(Company_A+Company_B)':
                    print(entry.offset(row=1).value)
                    print(round(entry.offset(row=8).value/100,5))
I can find the value I want, but I want to find the cell without typing out its entire contents.
Because you're using ==, the script only matches cells whose string is exactly equal to the one you typed. Use the in operator instead to test for a substring.
Your code should be:
    import openpyxl

    bid = openpyxl.load_workbook('C:/Users/User/Desktop/bidding.xlsx', data_only=True)
    for sheet in bid.worksheets:
        for row in sheet.iter_rows():
            for entry in row:
                try:
                    if 'Company_A' in entry.value:
                        print(entry.offset(row=1).value)
                        print(round(entry.offset(row=8).value/100,5))
                except (AttributeError, TypeError):
                    # Empty or non-string cells cannot be searched; skip them
                    continue
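An alternative to catching the exceptions is to check the type before the substring test, since empty cells hold None and numeric cells hold numbers. A small sketch on plain values; the helper name contains_text is made up for this example.

```python
def contains_text(value, needle):
    """Return True only when value is a string that contains needle."""
    return isinstance(value, str) and needle in value


# Values as they might come back from cell.value: text, empty, numeric.
samples = ["1ST(Company_A+Company_B)", None, 42, "2ND(Company_C+Company_D)"]
matches = [v for v in samples if contains_text(v, "Company_A")]
print(matches)
```

The explicit check also avoids silently swallowing a TypeError raised by the arithmetic on the offset cell, which the broad except clause would hide.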
I want to find the next empty cell in a specific column and write a value to that cell. I've tried the following:
    for row in sheet['A{}:A{}'.format(sheet.min_row,sheet.max_row)]:
        if row is None:
            sheet.cell(column=1).value = name
        else:
            print ("Cell have data")
But it's not writing data to the next empty cell. How can I fix that?
It's pretty pointless to construct a string with min_row and max_row. You can simply access the whole column:
    from openpyxl import load_workbook

    wb = load_workbook("book.xlsx")
    ws = wb.active
    for cell in ws["A"]:
        if cell.value is None:
            cell.value = "new value2"
            break  # stop after filling the first empty cell
    wb.save("book.xlsx")
But this reads the whole column at once as a tuple. Instead, you can use iter_rows(), which yields one row at a time (iter_cols() is not available in read-only mode):
    from openpyxl import load_workbook

    wb = load_workbook("book.xlsx")
    ws = wb.active
    for row in ws.iter_rows(min_col=1, max_col=1):
        cell = row[0]
        if cell.value is None:
            cell.value = "new value"
            break  # stop after filling the first empty cell
    wb.save("book.xlsx")
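To write only to the next empty cell, the search can stop at the first gap and fall back to the row after the last one when the column has no gaps. The sketch below works on plain column values (as one might collect with [c.value for c in ws["A"]]); next_empty_row is a hypothetical helper, not part of openpyxl.

```python
def next_empty_row(column_values):
    """Return the 1-based row of the first None, or the row after the last."""
    for row_number, value in enumerate(column_values, start=1):
        if value is None:
            return row_number
    return len(column_values) + 1


print(next_empty_row(["a", "b", None, "d"]))  # gap at row 3
print(next_empty_row(["a", "b"]))             # column full: next row is 3
```

The returned number can then be passed to ws.cell(row=..., column=1) to set the value.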
How can I find the number of the last non-empty row of a whole xlsx sheet using Python and openpyxl?
The file can have empty rows between the cells and the empty rows at the end could have had content that has been deleted. Furthermore I don't want to give a specific column, rather check the whole table.
For example the last non-empty row in the picture is row 13.
I know the subject has been extensively discussed but I haven't found an exact solution on the internet.
# Open file with openpyxl
to_be = load_workbook(FILENAME_xlsx)
s = to_be.active
last_empty_row = len(list(s.rows))
print(last_empty_row)
## Output: 13
s.rows is a generator, and the list built from it contains one tuple of cells per row.
If you are looking for the last non-empty row of a whole xlsx sheet using Python and openpyxl, try this:
    import openpyxl

    def last_active_row():
        # input_file and sheet_name are expected to be defined elsewhere
        workbook = openpyxl.load_workbook(input_file)
        wp = workbook[sheet_name]
        # Walk up from the bottom until a row holds at least one value
        for row in range(wp.max_row, 0, -1):
            for col in range(1, wp.max_column + 1):
                if wp.cell(row, col).value is not None:
                    print("The last active row is:", row)
                    return row
        return None  # the sheet has no values at all

    if __name__ == '__main__':
        last_active_row()
This should help.
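The last non-empty row can also be found in a single top-to-bottom pass by remembering the last row that held any value. This is a sketch over plain rows of values, such as those yielded by ws.iter_rows(values_only=True) in recent openpyxl versions (an assumption about the installed version); last_non_empty_row is a hypothetical name.

```python
def last_non_empty_row(rows):
    """Return the 1-based index of the last row with any non-None value, or 0."""
    last = 0
    for index, row in enumerate(rows, start=1):
        if any(value is not None for value in row):
            last = index
    return last


rows = [
    ("header", None),
    (None, None),      # empty row in the middle
    (None, 3.5),
    (None, None),      # trailing row whose content was deleted
]
print(last_non_empty_row(rows))
```

Empty rows in the middle are skipped over, and trailing rows that only hold None do not move the result.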
openpyxl's Worksheet class has the attribute max_row.
I am using openpyxl to attempt to delete rows from a spreadsheet. I understand that there is a function specifically for deleting rows; however, I was trying to overcome this problem without knowledge of that function, and I am now wondering why my method does not work.
To simplify the problem, I set up a spreadsheet and filled it with letters in some of the cells. In this case, the first print(sheet.max_row) printed "9". After setting all the cell values to None, I expected the number of rows to be 0, however, the second print statement printed "9" again.
Is it possible to reduce the row count by setting all the cells in a row to None?
    import openpyxl
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter, column_index_from_string

    spreadsheet = load_workbook(filename = pathToSpreadsheet) #pathToSpreadsheet represents the absolute path I had to the spreadsheet that I created.
    sheet = spreadsheet.active
    print(sheet.max_row) # Printed "9".
    rowCount = sheet.max_row
    columnCount = sheet.max_column
    finalBoundary = get_column_letter(columnCount) + str(rowCount)
    allCellObjects = sheet["A1":finalBoundary]
    for rowOfCells in allCellObjects:
        for cell in rowOfCells:
            cell.value = None
    print(sheet.max_row) # Also printed "9".
Thank you for your time and effort!
Short answer: no.
However, you can look up each cell in the sheet by its coordinate and delete it:
    for rowOfCells in allCellObjects:
        for cell in rowOfCells:
            # Remove the cell itself, not just its value
            del sheet[cell.coordinate]
    print(sheet.max_row)
A slightly more elaborate answer is that a worksheet in openpyxl stores its _cells as a dict with the coordinates as keys. The max_row property is defined as:
    @property
    def max_row(self):
        """The maximum row index containing data (1-based)
        :type: int
        """
        max_row = 1
        if self._cells:
            rows = set(c[0] for c in self._cells)
            max_row = max(rows)
        return max_row
So even if the cell values are None, the keys (coordinates) remain, e.g. _cells = {(1,1):None, (1,2):None, (5,4): None}.
max_row would then still return the largest row component among the keys.
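The effect can be reproduced with an ordinary dict. The following is a simplified model of the behaviour described above, not openpyxl's actual implementation; the standalone max_row function and the sample cells dict are assumptions made for illustration.

```python
def max_row(cells):
    """Mimic the property above: largest row among the stored keys, minimum 1."""
    return max((row for row, _col in cells), default=1)


# Keys are (row, column) tuples, as in the worksheet's internal dict.
cells = {(1, 1): "a", (1, 2): "b", (5, 4): "c"}

for key in cells:
    cells[key] = None          # clearing values keeps the keys
print(max_row(cells))          # still 5

del cells[(5, 4)]              # removing the key is what shrinks the sheet
print(max_row(cells))          # now 1
```

Only removing keys changes the result, which is why setting values to None leaves the row count untouched.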