Finding Range of active/selected cell in Excel using Python and xlwings - python

I am trying to write a simple function in Python (with xlwings) that reads a current 'active' cell value in Excel and then writes that cell value to the cell in the next column along from the active cell.
If I specify the cell using an absolute reference, for example range(3, 2), then everything is ok. However, I can't seem to find the row and column values of whichever cell is selected when the function is run.
I have found a lot of examples where the reference is specified but not where the active cell range can vary depending on the user selection.
I have tried a few ideas. The first option is trying to use the App.selection that I found in the v0.10.0 xlwings documentation but this doesn't seem to return a range reference that can be used - I get an error "Invalid parameter" when trying to retrieve the row from 'cellRange':
def refTest():
    import xlwings as xw
    wb = xw.Book.caller()
    cellRange = xw.App.selection
    rowNum = wb.sheets[0].range(cellRange).row
    colNum = wb.sheets[0].range(cellRange).column
    url = wb.sheets[0].range(rowNum, colNum).value
    wb.sheets[0].range(rowNum, colNum + 1).value = url
The second idea was to try to read the row and column directly from the cell selection but this gives me the error "Property object has no attribute 'row'":
def refTest():
    import xlwings as xw
    wb = xw.Book.caller()
    rowNum = xw.App.selection.row
    colNum = xw.App.selection.column
    url = wb.sheets[0].range(rowNum, colNum).value
    wb.sheets[0].range(rowNum, colNum + 1).value = url
Is it possible to pass the range of the active/selected cell from Excel to Python with xlwings? If anyone is able to shed some light on this then I would really appreciate it.
Thanks!

You have to get the app object from the workbook. You'd only use xw.App directly if you wanted to instantiate a new app. Also, selection returns a Range object, so do this:
cellRange = wb.app.selection
rowNum = cellRange.row
colNum = cellRange.column
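Putting the answer together, the whole task could be sketched as below. This is a minimal sketch rather than the answerer's code: the function name copy_selection_right is made up for illustration, the workbook is passed in as an argument, and a single selected cell is assumed. It uses the Range object's offset method to address the next column instead of rebuilding the reference from row and column numbers.

```python
def copy_selection_right(wb):
    """Copy the selected cell's value into the cell one column to its right.

    `wb` is expected to be an xlwings Book, for example xw.Book.caller().
    The name of this function is hypothetical.
    """
    cell = wb.app.selection                 # a Range object, not an address string
    target = cell.offset(column_offset=1)   # same row, next column along
    target.value = cell.value
    return target
```

From a function triggered out of Excel this would be called as copy_selection_right(xw.Book.caller()).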

Related

openpyxl delete rows from formatted table / error referencing table

I am trying to delete rows from a formatted table in Excel using the delete_rows() method. However, this does not delete the rows but only the content of the cells.
As an info, you can format a range as table using openpyxl as described in the documentation: https://openpyxl.readthedocs.io/en/latest/worksheet_tables.html
I have a formatted table called Table5 in the worksheet Sheet1:
To delete 2 rows from row 5, I use the following commands:
import openpyxl as xl
wb = xl.load_workbook('data.xlsx')
ws = wb['Sheet1']
ws.delete_rows(5, 2) # this is the delete command
wb.save('data.xlsx')
With the command delete_rows(), the range of the formatted table remains till row 6, whereas it shrinks when I delete the rows directly in Excel.
The question: How do I delete properly both the data and the formatted range?
Corollary note:
Also when I insert data, the table range does not expand. For example:
for i in range(4, 9):
    ws.cell(row=i, column=1).value = i - 1
    ws.cell(row=i, column=2).value = (i - 1) * 100
The range stays the same i.e. till row 6, whereas the range expands automatically by inserting data into Excel manually.
If you followed the linked doc, you must have:
added data to the Worksheet
created a Table instance which covers the data (ref=A1:B6)
added the Table instance to the worksheet with ws.add_table.
When you later add or remove rows, this does not affect the table, which remains with ref=A1:B6.
You would need to change the Table instance's ref to fit the new data layout.
# access your table (substitute <Table> with the name of your table)
tab = ws.tables[<Table>]
# change the ref
tab.ref = "A1:B4"
# and save
wb.save('data.xlsx')
NB. The interface to the tables has changed since version 3.0.0; this code was tested with 3.0.10. Please check the documentation for previous/further changes.
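Instead of hard-coding the new ref, it can be derived from the old one. The following is a minimal sketch in plain Python, independent of openpyxl. The helper name resize_ref is hypothetical, and it assumes a simple two-corner ref such as A1:B6 with no $ signs or sheet prefix, where the first row is the header.

```python
import re

# Matches refs of the form <col><row>:<col><row>, e.g. "A1:B6".
_REF = re.compile(r"^([A-Z]+)(\d+):([A-Z]+)(\d+)$")


def resize_ref(ref, row_delta):
    """Return `ref` with its last row moved by `row_delta` (negative shrinks)."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError("unsupported ref: %r" % ref)
    first_col, first_row, last_col, last_row = match.groups()
    new_last_row = int(last_row) + row_delta
    if new_last_row <= int(first_row):
        # Assumption: the first row is a header and one data row must remain.
        raise ValueError("table must keep its header and at least one data row")
    return "%s%s:%s%d" % (first_col, first_row, last_col, new_last_row)


print(resize_ref("A1:B6", -2))  # two rows deleted -> A1:B4
print(resize_ref("A1:B6", 2))   # two rows added   -> A1:B8
```

With the code above this could be applied as tab.ref = resize_ref(tab.ref, -2) right after ws.delete_rows(5, 2).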
In the following, I expand the answer by ljmc (which I have accepted) containing both ways with the newer and the older version of openpyxl.
After upgrading to openpyxl==3.0.10, you change the reference of a formatted table by simply referencing to the table name in the worksheet with the attribute .tables. Example: in the worksheet Sheet1 there is a formatted table named Table5:
import openpyxl as xl
wb = xl.load_workbook('data.xlsx')
ws = wb['Sheet1']
ws.delete_rows(5, 2) # this is the delete command
ws.tables['Table5'].ref = "A1:B4" # change the reference
wb.save('data.xlsx')
As simple as that: because ws.tables['Table5'] contains the formatted table.
Before the upgrade (openpyxl==3.0.0) you reference the formatted table through the ._tables attribute, but instead of the name Table5 you need the index:
ws.delete_rows(5, 2) # this is the delete command
ws._tables[0].ref = "A1:B4" # change the reference
Using [0] works only if you have one formatted table in your worksheet or if Table5 is the first within the table object.
The proper pythonic way is to first determine the index where Table5 is in the table object:
ws.delete_rows(5, 2) # this is the delete command
ix = next(i for i, t in enumerate(ws._tables) if t.displayName == "Table5")
ws._tables[ix].ref = "A1:B4" # change the reference

Can you delete rows in Python with openpyxl by setting all cell values in a row to 'None'?

I am using openpyxl to attempt to delete rows from a spreadsheet. I understand that there is a function specifically for deleting rows. However, I was trying to solve this problem without knowledge of that function, and I am now wondering why my method does not work.
To simplify the problem, I set up a spreadsheet and filled it with letters in some of the cells. In this case, the first print(sheet.max_row) printed "9". After setting all the cell values to None, I expected the number of rows to be 0, however, the second print statement printed "9" again.
Is it possible to reduce the row count by setting all the cells in a row to None?
import openpyxl
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter, column_index_from_string
spreadsheet = load_workbook(filename = pathToSpreadsheet) #pathToSpreadsheet represents the absolute path I had to the spreadsheet that I created.
sheet = spreadsheet.active
print(sheet.max_row) # Printed "9".
rowCount = sheet.max_row
columnCount = sheet.max_column
finalBoundary = get_column_letter(columnCount) + str(rowCount)
allCellObjects = sheet["A1":finalBoundary]
for rowOfCells in allCellObjects:
    for cell in rowOfCells:
        cell.value = None
print(sheet.max_row) # Also printed "9".
Thank you for your time and effort!
Short answer NO.
However, you could access the cell from the sheet with the cell coordinates and delete them.
for rowOfCells in allCellObjects:
    for cell in rowOfCells:
        del sheet[cell.coordinate]
print(sheet.max_row)
A slightly more elaborate answer is that a worksheet in openpyxl stores its _cells as a dict with the coordinates as keys. The max_row property is defined as:
@property
def max_row(self):
    """The maximum row index containing data (1-based)
    :type: int
    """
    max_row = 1
    if self._cells:
        rows = set(c[0] for c in self._cells)
        max_row = max(rows)
    return max_row
So if the cell values were set to None, the keys/coordinates would still remain, e.g. _cells = {(1,1):None, (1,2):None, (5,4): None}.
max_row would then still give us the largest row index, which is the first component of the key.
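The effect can be reproduced without openpyxl. The sketch below is a simplified model, not openpyxl's own code: a plain dict stands in for the worksheet's _cells, and the standalone max_row function mirrors the logic of the property quoted above.

```python
# A plain dict standing in for the worksheet's _cells: (row, column) -> value.
cells = {(1, 1): "a", (1, 2): "b", (5, 4): "c"}


def max_row(cells):
    """Largest row index among the stored keys, or 1 when nothing is stored."""
    if not cells:
        return 1
    return max(row for row, _column in cells)


before = max_row(cells)

# Blanking the values keeps every key in place.
for key in cells:
    cells[key] = None
after_blanking = max_row(cells)

# Removing the key is what actually changes the result.
del cells[(5, 4)]
after_deleting = max_row(cells)

print(before, after_blanking, after_deleting)  # 5 5 1
```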

Write formula to Excel with Python error

I am trying to follow this question to add a formula to my Excel file using Python and the openpyxl package.
That link is what I need for my task.
But in this code:
for i, cellObj in enumerate(Sheet.columns[2], 1):
    cellObj.value = '=IF($A${0}=$B${0}, "Match", "Mismatch")'.format(i)
I get an error at Sheet.columns[2]. Any idea why? I followed the complete code.
I have Python 2.7.13, if that helps with this error.
****UPDATE****
COMPLETE CODE :
import openpyxl
wb = openpyxl.load_workbook('test1.xlsx')
print wb.get_sheet_names()
Sheet = wb.worksheets[0]
for i, cellObj in enumerate(Sheet.columns[2], 1):
    cellObj.value = '=IF($A${0}=$B${0}, "Match", "Mismatch")'.format(i)
Error message:
    for i, cellObj in enumerate(Sheet.columns[2], 1):
TypeError: 'generator' object has no attribute '__getitem__'
ws.columns and ws.rows are properties that return generators, and a generator cannot be indexed. But openpyxl also supports slicing and indexing for rows and columns.
So, ws['C'] will give the cells in the third column.
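The underlying Python behaviour can be shown without openpyxl. In this minimal sketch the columns function is a made-up stand-in for a property that yields one tuple of cells per column; the cell names are plain strings for illustration.

```python
from itertools import islice


def columns():
    """Stand-in for a property that yields one tuple of cells per column."""
    yield ("A1", "A2")
    yield ("B1", "B2")
    yield ("C1", "C2")


try:
    columns()[2]  # generators do not support indexing
except TypeError as exc:
    print("TypeError:", exc)

# Two generic ways to reach the third item of any generator:
third = next(islice(columns(), 2, None))  # skip two items, take the next one
also_third = list(columns())[2]           # or materialise everything first
print(third, also_third)
```

Indexing the worksheet directly, as the answer suggests, avoids both workarounds.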
For other Stack adventurers looking to copy/paste a formula:
# Writing from pandas back to an existing EXCEL workbook
from openpyxl import load_workbook
wb = load_workbook(filename=myfilename, read_only=False, keep_vba=True)
ws = wb['Mysheetname']
# Paste a formula Vlookup! Look at column A, put result in column AC.
for i, cellObj in enumerate(ws['AC'], 1):
    cellObj.value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(i)
One issue, I have a header and the formula overwrites it. Anyone know how to start from row 2?
If you want to start from another row you can either use an if statement to skip the first row, or specify the range in the enumeration. A coded example is below:
wb = load_workbook(filename=myfilename, read_only=False, keep_vba=True)
ws = wb['Mysheetname']
# using an if statement
for i, cellObj in enumerate(ws['AC'], 1):
    if i > 1:
        cellObj.value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(i)
# specifying range, up to max row on worksheet - or you can specify an exact range
for i, cellObj in enumerate(ws['AC2:AC'+str(ws.max_row)], 2):
    cellObj[0].value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(i)
The second method requires you to begin the index at 2. Iterating over a range such as 'AC2:AC10' yields rows, each a tuple of cells, rather than single cell objects, so you need cellObj[0].value to reach the cell in each row.
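The row numbering can be checked separately from the workbook. This is a minimal sketch in plain Python: the helper name formulas_for_rows is hypothetical, and it only builds the formula text for each row so that the header row is never touched.

```python
# The same VLOOKUP template as above; {0} is replaced by the row number.
TEMPLATE = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)"


def formulas_for_rows(first_row, last_row):
    """Map each row number in [first_row, last_row] to its formula text."""
    return {row: TEMPLATE.format(row) for row in range(first_row, last_row + 1)}


# Start at row 2 so that a header in row 1 is left alone.
for row, formula in formulas_for_rows(2, 4).items():
    print(row, formula)
```

Each entry could then be written to the sheet with ws['AC{}'.format(row)] = formula.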
Fortunately, it is now easy to write formulas to specific cells. There are also simpler attributes to use, such as:
wb.sheetnames instead of wb.get_sheet_names()
sheet = wb['SHEET_NAME'] instead of sheet = wb.get_sheet_by_name('SHEET_NAME')
And formulas can be easily inserted with:
sheet['A1'] = '=SUM(1+1)'

Using Excel named ranges in Python with openpyxl

How do I loop through the cells in an Excel named range/defined name and set each cell value within the named range using openpyxl with Python 2.7?
I found the following, but have not managed to get it to work for printing and setting the values of individual cells within the named range.
Read values from named ranges with openpyxl
Here's my code so far, I have put in comments where I am looking to make the changes. Thanks in anticipation.
#accessing a named range called 'metrics'
namedRange = loadedIndividualFile.defined_names['metrics']
#obtaining a generator of (worksheet title, cell range) tuples
generator = namedRange.destinations
#looping through the generator and getting worksheet title, cell range
cells = []
for worksheetTitle, cellRange in generator:
    individualWorksheet = loadedIndividualFile[worksheetTitle]
    #==============================
    #How do I set cell values here?
    # I am looking to print and change each cell value within the defined name range
    #==============================
    print cellRange
    print worksheetTitle
    #theWorksheet = workbook[worksheetTitle]
    #cell = theWorksheet[cellRange]
I managed to resolve it. Perhaps the following will be useful to someone else who is looking to access the values of each cell in a defined name or named range using openpyxl.
import openpyxl
wb = openpyxl.load_workbook('filename.xlsx')
#getting the address
address = list(wb.defined_names['metrics'].destinations)
#removing the $ from the address
for sheetname, cellAddress in address:
    cellAddress = cellAddress.replace('$','')
#looping through each cell address, extracting it from the tuple and printing it out
worksheet = wb[sheetname]
for i in range(0,len(worksheet[cellAddress])):
    for item in worksheet[cellAddress][i]:
        print(item.value)
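The answer above prints the values; setting them follows the same pattern. The sketch below is an assumption-laden illustration, not tested against a particular openpyxl release: the function name fill_named_range is hypothetical, the workbook is passed in as an argument, and it relies only on the calls already used above (defined_names, destinations, and indexing a worksheet with an address).

```python
def fill_named_range(wb, name, value):
    """Set every cell of the defined name `name` to `value`; return the count.

    `wb` is expected to be an openpyxl workbook. The function name is
    hypothetical and chosen for illustration.
    """
    changed = 0
    for sheet_title, cell_range in wb.defined_names[name].destinations:
        worksheet = wb[sheet_title]
        block = worksheet[cell_range.replace("$", "")]
        # A single-cell address yields one cell instead of a tuple of rows.
        rows = ((block,),) if hasattr(block, "value") else block
        for row in rows:
            for cell in row:
                cell.value = value
                changed += 1
    return changed
```

It would be called as fill_named_range(wb, 'metrics', 0), followed by wb.save(...) to persist the change.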

Using generators outside of a loop

Relatively new to python so please excuse the newbie question, but google isn't helpful at this time.
I have 100 very large xlsx files from which I need to extract the first row (specifically cell A2). I found this gem of a tool called openpyxl which will iterate through my data files without loading everything in memory. It uses a generator to get the relevant row on each call.
The thing that I can't get is how to initialize a generator outside of a loop. Right now my code is:
from openpyxl import load_workbook
wb = load_workbook(filename = "merged01.xlsx", use_iterators= True)
sheetName = wb.get_sheet_names()
ws = wb.get_sheet_by_name(name = sheetName[0])
row = ws.iter_rows() #row is a generator
for cell in row:
    break
print (cell[1].internal_value) # A2
But there has to be a better way of doing this such as:
...
row = ws.iter_rows() #row is a generator
cell = row.first # line I'm trying to KISS
print (cell[1].internal_value) # A2
cell = next(row)
The next function retrieves the next value from any iterator.
You're looking for next().
cell = next(row)
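The behaviour of next() can be tried on any generator. In this minimal sketch the iter_rows function is a made-up stand-in for ws.iter_rows() that yields plain tuples of strings instead of cells.

```python
def iter_rows():
    """Stand-in for ws.iter_rows(): yields one tuple of values per row."""
    yield ("A1", "B1")
    yield ("A2", "B2")


rows = iter_rows()
first = next(rows)            # first item, no loop needed
second = next(rows)           # each call advances the generator by one
exhausted = next(rows, None)  # the default is returned instead of raising StopIteration
print(first, second, exhausted)
```

Passing a default as the second argument is useful when a sheet might be empty.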
