openpyxl delete rows from formatted table / error referencing table - python

I am trying to delete rows from a formatted table in Excel using the delete_rows() method. However, this does not delete the rows but only the content of the cells.
For reference, a range can be formatted as a table with openpyxl as described in the documentation: https://openpyxl.readthedocs.io/en/latest/worksheet_tables.html
I have a formatted table called Table5 in the worksheet Sheet1:
To delete 2 rows from row 5, I use the following commands:
import openpyxl as xl
wb = xl.load_workbook('data.xlsx')
ws = wb['Sheet1']
ws.delete_rows(5, 2) # this is the delete command
wb.save('data.xlsx')
With the command delete_rows(), the range of the formatted table remains till row 6, whereas it shrinks when I delete the rows directly in Excel.
The question: How do I properly delete both the data and the formatted range?
Corollary note:
Also when I insert data, the table range does not expand. For example:
for i in range(4, 9):
    ws.cell(row=i, column=1).value = i - 1
    ws.cell(row=i, column=2).value = (i - 1) * 100
The range stays the same i.e. till row 6, whereas the range expands automatically by inserting data into Excel manually.

If you followed the linked doc, you must have:
added data to the Worksheet
created a Table instance which covers the data (ref=A1:B6)
added the Table instance to the worksheet with ws.add_table.
When you later add or remove rows, this does not affect the table, which remains at ref=A1:B6.
You would need to change the Table instance's ref to fit the new data layout.
# access your table (substitute the name of your table for "Table5")
tab = ws.tables["Table5"]
# change the ref
tab.ref = "A1:B4"
# and save
wb.save('data.xlsx')
NB: the interface to tables has changed since version 3.0.0. This code was tested with 3.0.10; please check the documentation for earlier or later changes.
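Hard-coding the new range works for a one-off edit. As a minimal sketch, the helper below recomputes a ref such as "A1:B6" after rows are added or removed at the bottom of the table. `shift_ref_rows` is a hypothetical name, not part of openpyxl, and it assumes a plain ref without a sheet name or `$` signs.

```python
import re


def shift_ref_rows(ref: str, delta: int) -> str:
    """Return ref with its last row moved by delta (negative shrinks)."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported ref: {ref!r}")
    first_col, first_row, last_col, last_row = match.groups()
    new_last_row = int(last_row) + delta
    # Refuse to shrink the range to the header row or less.
    if new_last_row <= int(first_row):
        raise ValueError("table would have no data rows left")
    return f"{first_col}{first_row}:{last_col}{new_last_row}"


# Two rows deleted from a table that covered A1:B6.
print(shift_ref_rows("A1:B6", -2))  # A1:B4
```

With a table object in hand, this could be applied as `tab.ref = shift_ref_rows(tab.ref, -2)` right after `ws.delete_rows(5, 2)`, and with a positive delta after appending rows.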

Below I expand on the answer by ljmc (which I have accepted), showing the approach for both the newer and the older version of openpyxl.
After upgrading to openpyxl==3.0.10, you change the reference of a formatted table by looking it up by name through the worksheet's .tables attribute. Example: in the worksheet Sheet1 there is a formatted table named Table5:
import openpyxl as xl
wb = xl.load_workbook('data.xlsx')
ws = wb['Sheet1']
ws.delete_rows(5, 2) # this is the delete command
ws.tables['Table5'].ref = "A1:B4" # change the reference
wb.save('data.xlsx')
It is as simple as that, because ws.tables['Table5'] is the formatted table.
Before the upgrade (openpyxl==3.0.0) you reach the formatted table through the ._tables attribute, but instead of the name Table5 you need its index:
ws.delete_rows(5, 2) # this is the delete command
ws._tables[0].ref = "A1:B4" # change the reference
Using [0] works only if you have one formatted table in your worksheet or if Table5 is the first entry in ws._tables.
A more robust way is to first determine the index of Table5 in ws._tables:
ws.delete_rows(5, 2) # this is the delete command
ix = next(i for i, t in enumerate(ws._tables) if t.displayName == "Table5")
ws._tables[ix].ref = "A1:B4" # change the reference

Related

Copy pasting a excel column from one excel document to another

I am going crazy here. My code works but the "ws2.cell(row=2, column=2).value = c.value" line is not saving no matter what I do. I keep getting an "int object has no attribute value" error message. Any ideas or suggestions?
import openpyxl as xl;
from openpyxl import load_workbook;
# opens the source excel file
#"C:\Users\wwwya\Desktop\mkPox.xlsx" <-- needs to have double backwords slash for xl to understand
mkPox ="C:\\Users\\wwwya\\Desktop\\mkPox.xlsx"
wbMonkey1 = xl.load_workbook(mkPox)
ws1 = wbMonkey1.worksheets[0]
# opens the destination excel file
mkPaste ="C:\\Users\\wwwya\\Desktop\\mkPaste.xlsx"
wbPaste2 = xl.load_workbook(mkPaste)
ws2 = wbPaste2.active
# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
# copying the cell values from source
# excel file to destination excel file
for row in range(2, mr + 1):
    for column in "B": #Here you can add or reduce the columns
        cell_name = "{}{}".format(column, row)
        c = ws1[cell_name].value # the value of the specific cell
        print(c)
        # writing the read value to destination excel file
        ws2.cell(row=2, column=2).value = c.value
# saving the destination excel file
wbPaste2.save(str(mkPaste))
Your code had a couple of issues around this section
c = ws1[cell_name].value # the value of the specific cell
print(c)
# writing the read value to destination excel file
ws2.cell(row=2, column=2).value = c.value
You had already assigned c the value of the cell with ws1[cell_name].value, so c is a plain value (here an int) equal to the content of that cell and has no .value attribute. When you assign the cell value on the 2nd sheet, you just want the variable c, as @norie indicated.
The next issue in that section is that the row and column for ws2.cell never change. Whatever you write to the 2nd sheet therefore always goes to cell 'B2', which makes the iteration through the 1st sheet a waste of time: only cell 'B2' will have a value, and it will be from the last cell in column 'B' of the 1st sheet.
Also, there is no need to cast with str() in wbPaste2.save(str(mkPaste)), since mkPaste is already a string. The filename itself has to stay: openpyxl's save() requires it, even when saving back to the same file.
The code example below shows how you can simplify the whole operation to a few lines.
Note: the loop uses enumerate to create two variables that update on each loop iteration.
for enum, c in enumerate(ws1['B'][1:], 2):
enum is used as the row position in ws2. The '2' in the enumerate call sets the initial value of enum to 2, so the first row to be written on the 2nd sheet is row 2.
c is the cell object from ws1, column 'B'. The loop starts at the second cell because of the [1:] slice, in line with your code starting the copy from row 2.
There is no need to use intermediary variables: just assign each cell in the 2nd sheet the value of the corresponding cell in the 1st sheet, then save the file.
import openpyxl as xl
# opens the source excel file
mkPox = "C:\\Users\\wwwya\\Desktop\\mkPox.xlsx"
wbMonkey1 = xl.load_workbook(mkPox)
ws1 = wbMonkey1.worksheets[0]
# opens the destination excel file
mkPaste = "C:\\Users\\wwwya\\Desktop\\mkPaste.xlsx"
wbPaste2 = xl.load_workbook(mkPaste)
ws2 = wbPaste2.active
for enum, c in enumerate(ws1['B'][1:], 2):
    ws2.cell(row=enum, column=c.column).value = c.value
# saving the destination excel file
wbPaste2.save(mkPaste)
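To see what the loop iterates over without opening a workbook, a plain list can stand in for column B. This is only an illustration: the strings are made-up cell values, and in the real code each item is a cell object, not a string.

```python
# Stand-in for ws1['B']: the first entry plays the role of the cell in row 1.
column_b = ["header", "first", "second", "third"]

# [1:] skips row 1; the start value 2 makes the counter match the Excel row number.
pairs = list(enumerate(column_b[1:], 2))
print(pairs)  # [(2, 'first'), (3, 'second'), (4, 'third')]
```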

How do I create a table using Openpyxl's table module?

I'm attempting to create a script to process several Excel sheets at once, and one of the steps I'm trying to get Python to handle is creating a table from data passed in a pandas data frame. Creating a table seems pretty straightforward looking at the documentation.
Following the example from here:
# define a table style
mediumstyle = TableStyleInfo(name='TableStyleMedium2', showRowStripes=True)
# create a table
table = Table(displayName='IdlingReport', ref='A1:C35', tableStyleInfo=mediumstyle)
# add the table to the worksheet
sheet2.add_table(table)
# Saving the report
wb.save(openexcel.filename)
print('Report Saved')
However this creates an empty table, instead of using the data present in cells 'A1:C35'. I can't seem to find any examples anywhere that go beyond these steps so any help with what I may be doing wrong is greatly appreciated.
The data in 'A1:C35' is being written to Excel as follows:
while i < len(self.sheets):
    with pd.ExcelWriter(filename, engine='openpyxl') as writer:
        writer.book = excelbook
        writer.sheets = dict((ws.title, ws) for ws in excelbook.worksheets)
        self.df_7.to_excel(writer, self.sheets[i], index=False, header=True, startcol=0, startrow=0)
        writer.save()
    i += 1
The output looks something like this
Time       Location                 Duration
1/01/2019  [-120085722,-254580042]  5 Min
1/02/2019  [-120085722,-254580042]  15 Min
1/02/2019  [-120085722,-254580042]  7 Min
Just to clarify: right now I first write my data frame to Excel and then format the data I've written as a table. Reversing these steps by creating the table first and then writing to Excel fills the table, but gets rid of the formatting (font color, font type, size, etc.). That means I'd have to add an additional step to fix the formatting, which I'd like to avoid if possible.
Your command
# create a table
table = Table(displayName='IdlingReport', ref='A1:C35', tableStyleInfo=mediumstyle)
creates a special Excel object: an empty table with the name IdlingReport.
You probably want something else: to fill a sheet of your Excel workbook with data from a pandas dataframe.
For this purpose there is a function dataframe_to_rows():
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# df is assumed to be an existing pandas DataFrame
wb = Workbook()
ws = wb.active # to rename this sheet: ws.title = "some_name"
# to create a new sheet: ws = wb.create_sheet("some_name")
# index=False matches the to_excel() call in the question
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row) # appends this row after a previous one
wb.save("something.xlsx")
See Working with Pandas Dataframes and Tutorial.
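Once the rows are on the sheet, the table's ref has to cover exactly the header row plus the data rows. Below is a minimal sketch of computing that string from the DataFrame's shape. `column_letter` and `table_ref` are hypothetical helpers written here to stay dependency-free (openpyxl offers `openpyxl.utils.get_column_letter` for the letter conversion), and the sketch assumes the table starts at A1 and the index is not written.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to its Excel letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def table_ref(n_data_rows: int, n_cols: int) -> str:
    """Ref for a table at A1 with one header row and n_data_rows below it."""
    return f"A1:{column_letter(n_cols)}{n_data_rows + 1}"


# 34 data rows and 3 columns, i.e. len(df) == 34 and len(df.columns) == 3.
print(table_ref(34, 3))  # A1:C35
```

The result could then be passed as `ref=` when constructing the `Table`, instead of a hard-coded 'A1:C35'.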

Finding Range of active/selected cell in Excel using Python and xlwings

I am trying to write a simple function in Python (with xlwings) that reads a current 'active' cell value in Excel and then writes that cell value to the cell in the next column along from the active cell.
If I specify the cell using an absolute reference, for example range(3, 2), then everything is OK. However, I can't seem to find the row and column values of whichever cell is selected when the function is run.
I have found a lot of examples where the reference is specified but not where the active cell range can vary depending on the user selection.
I have tried a few ideas. The first option is trying to use the App.selection that I found in the v0.10.0 xlwings documentation but this doesn't seem to return a range reference that can be used - I get an error "Invalid parameter" when trying to retrieve the row from 'cellRange':
def refTest():
    import xlwings as xw
    wb = xw.Book.caller()
    cellRange = xw.App.selection
    rowNum = wb.sheets[0].range(cellRange).row
    colNum = wb.sheets[0].range(cellRange).column
    url = wb.sheets[0].range(rowNum, colNum).value
    wb.sheets[0].range(rowNum, colNum + 1).value = url
The second idea was to try to read the row and column directly from the cell selection but this gives me the error "Property object has no attribute 'row'":
def refTest():
    import xlwings as xw
    wb = xw.Book.caller()
    rowNum = xw.App.selection.row
    colNum = xw.App.selection.column
    url = wb.sheets[0].range(rowNum, colNum).value
    wb.sheets[0].range(rowNum, colNum + 1).value = url
Is it possible to pass the range of the active/selected cell from Excel to Python with xlwings? If anyone is able to shed some light on this then I would really appreciate it.
Thanks!
You have to get the app object from the workbook. You'd only use xw.App directly if you wanted to instantiate a new app. Also, selection returns a Range object, so do this:
cellRange = wb.app.selection
rowNum = cellRange.row
colNum = cellRange.column

How to write data into existing '.xlsx' file which has multiple sheets

I have to update/append data in an existing xlsx file.
The xlsx file contains multiple sheets.
For example, I want to append some data to the existing sheet 'Sheet1'. How do I do this?
To append a new row of data to an existing spreadsheet, you could use the openpyxl module. This will:
Load the existing workbook from the file.
Determine the last row that is in use using ws.max_row (older versions used ws.get_highest_row()).
Add the new row on the next empty row.
Write the updated spreadsheet back to the file.
For example:
import openpyxl
file = 'input.xlsx'
new_row = ['data1', 'data2', 'data3', 'data4']
wb = openpyxl.load_workbook(filename=file)
ws = wb['Sheet1'] # Older method was .get_sheet_by_name('Sheet1')
row = ws.max_row + 1 # Older method was .get_highest_row() + 1
for col, entry in enumerate(new_row, start=1):
    ws.cell(row=row, column=col, value=entry)
wb.save(file)
Note, as can be seen in the docs for XlsxWriter:
XlsxWriter is designed only as a file writer. It cannot read or modify
an existing Excel file.
The openpyxl approach does not require Windows or Excel to be installed, but it has some limitations in how much of the file format it supports.
Try xlwings (currently available from http://xlwings.org). It is suitable for both reading and writing Excel files.
Everything you need is in the quickstart tutorial. Something like this should be what you want.
import xlwings as xw
# Pass the path directly: open(..., "w") would truncate the existing file
wb = xw.Book("FileName.xlsx") # Creates a connection with workbook
xw.Range('A1:D1').value = [1,2,3,4]
Selecting a Sheet
To read and write data on a specific sheet, you can activate the sheet and then call Range('cell_ref').
Sheet('Sheet1').activate();
Using Range to select cells
To select a single cell on the current worksheet
a = xw.Range('A1').value;
xw.Range('A1').value = float(a)+5;
To explicitly select a range of cells
xw.Range('A1:E8').value = [new_cell_values_as_list_of_lists];
xw.Range('Named range').value = [new_cell_values_as_list_of_lists];
To automatically select a contiguous range of populated cells that start from 'A1' and go right and down... until empty cell found.
Range('A1').table.value;
It is also possible to just select a row or column using:
Range('A1').vertical.value;
Range('A1').horizontal.value;
Other methods of creating a range object (from the API doc):
Range('A1')          Range('Sheet1', 'A1')          Range(1, 'A1')
Range('A1:C3')       Range('Sheet1', 'A1:C3')       Range(1, 'A1:C3')
Range((1,2))         Range('Sheet1', (1,2))         Range(1, (1,2))
Range((1,1), (3,3))  Range('Sheet1', (1,1), (3,3))  Range(1, (1,1), (3,3))
Range('NamedRange')  Range('Sheet1', 'NamedRange')  Range(1, 'NamedRange')

In python removing rows from a excel file using xlrd, xlwt, and xlutils

Hello everyone and thank you in advance.
I have a python script where I am opening a template excel file, adding data (while preserving the style) and saving again. I would like to be able to remove rows that I did not edit before saving out the new xls file. My template xls file has a footer so I want to delete the extra rows before the footer.
Here is how I am loading the xls template:
self.inBook = xlrd.open_workbook(file_path, formatting_info=True)
self.outBook = xlutils.copy.copy(self.inBook)
self.outBookCopy = xlutils.copy.copy(self.inBook)
I then write the info to outBook while grabbing the style from outBookCopy and applying it to each row that I modify in outBook.
So how do I delete rows from outBook before writing it? Thanks everyone!
I achieved this using the pandas package:
import pandas as pd
# Read from Excel
xl = pd.ExcelFile("test.xls")
# Parse the first Excel sheet into a DataFrame
dfs = xl.parse(xl.sheet_names[0])
# Update the DataFrame as required
# (here: remove rows with a blank value in the "Name" column;
# pandas reads empty cells as NaN, so both NaN and '' are checked)
dfs = dfs[dfs['Name'].notna() & (dfs['Name'] != '')]
# Write the updated DataFrame back to the Excel sheet
dfs.to_excel("test.xls", sheet_name='Sheet1', index=False)
xlwt does not provide a simple interface for doing this, but I've had success with a somewhat similar problem (inserting multiple copies of a row into a copied workbook) by directly changing the worksheet's rows attribute and the row numbers on the row and cell objects.
The rows attribute is a dict, indexed on row number, so iterating a row range takes a little care and you can't slice it.
Given the number of rows you want to delete and the initial row number of the first row you want to keep, something like this might work:
rows_indices_to_move = range(first_kept_row, worksheet.last_used_row + 1)
max_used_row = 0
for row_index in rows_indices_to_move:
    new_row_number = row_index - number_to_delete
    if row_index in worksheet.rows:
        row = worksheet.rows[row_index]
        row._Row__idx = new_row_number
        for cell in row._Row__cells.values():
            if cell:
                cell.rowx = new_row_number
        worksheet.rows[new_row_number] = row
        max_used_row = new_row_number
    else:
        # There's no row in the block we're trying to slide up at this index, but there might be a row already present to clear out.
        if new_row_number in worksheet.rows:
            del worksheet.rows[new_row_number]
# now delete any remaining rows (rows is a dict, so it can't be sliced)
for stale_row in [r for r in worksheet.rows if r > new_row_number]:
    del worksheet.rows[stale_row]
# and update the internal marker for the last remaining row
if max_used_row:
    worksheet.last_used_row = max_used_row
I expect there are bugs in that code: it's untested and relies on direct manipulation of the underlying data structures, but it should show the general idea. Modify the row and cell objects and adjust the rows dictionary so that the indices are correct.
Do you have merged ranges in the rows you want to delete, or below them? If so you'll also need to run through the worksheet's merged_ranges attribute and update the rows for them. Also, if you have multiple groups of rows to delete you'll need to adjust this answer - this is specific to the case of having a block of rows to delete and shifting everything below up.
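The re-keying idea can be tried on a plain dict before touching xlwt internals. The sketch below is not xlwt code: `delete_row_block` is a hypothetical helper, and the dict values are placeholder strings standing in for row objects.

```python
def delete_row_block(rows: dict, first_deleted: int, count: int) -> dict:
    """Return a new dict with `count` rows removed starting at `first_deleted`.

    Keys are row numbers; rows below the deleted block slide up by `count`.
    """
    shifted = {}
    for row_number in sorted(rows):
        if row_number < first_deleted:
            # Above the deleted block: keep the same row number.
            shifted[row_number] = rows[row_number]
        elif row_number >= first_deleted + count:
            # Below the deleted block: slide up.
            shifted[row_number - count] = rows[row_number]
        # Rows inside the block are dropped.
    return shifted


rows = {0: "header", 1: "a", 2: "b", 3: "c", 5: "footer"}
print(delete_row_block(rows, 2, 2))  # {0: 'header', 1: 'a', 3: 'footer'}
```

Building a new dict avoids mutating the mapping while iterating over it; the xlwt version additionally has to update the row number stored on each row and cell object.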
As a side note - I was able to write text to my worksheet and preserve the predefined style thus:
def write_with_style(ws, row, col, value):
    if ws.rows[row]._Row__cells[col]:
        old_xf_idx = ws.rows[row]._Row__cells[col].xf_idx
        ws.write(row, col, value)
        ws.rows[row]._Row__cells[col].xf_idx = old_xf_idx
    else:
        ws.write(row, col, value)
That might let you skip having two copies of your spreadsheet open at once.
For those of us still stuck with xlrd/xlwt/xlutils, here's a filter you could use:
from xlutils.filter import BaseFilter
class RowFilter(BaseFilter):
    rows_to_exclude: "Iterable[int]"
    _next_output_row: int
    def __init__(
        self,
        rows_to_exclude: "Iterable[int]",
    ):
        self.rows_to_exclude = rows_to_exclude
        self._next_output_row = -1
    def _should_include_row(self, rdrowx):
        return rdrowx not in self.rows_to_exclude
    def row(self, rdrowx, wtrowx):
        if self._should_include_row(rdrowx):
            # Proceed with writing out the row to the output file
            self._next_output_row += 1
            self.next.row(
                rdrowx, self._next_output_row,
            )
    # After `row()` has been called, `cell()` is called for each cell of the row
    def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
        if self._should_include_row(rdrowx):
            self.next.cell(
                rdrowx, rdcolx, self._next_output_row, wtcolx,
            )
Then put it to use with e.g.:
from xlrd import open_workbook
from xlutils.filter import DirectoryWriter, XLRDReader, process
process(
    XLRDReader(open_workbook("input_filename.xls"), "output_filename.xls"),
    RowFilter([3, 4, 5]),
    DirectoryWriter("output_dir"),
)
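The renumbering the filter performs can be checked in isolation. This sketch only mirrors the counter logic of RowFilter on plain integers; `output_row_map` is a hypothetical helper, and the indices are 0-based like the rdrowx values the filter receives.

```python
def output_row_map(n_rows: int, rows_to_exclude) -> dict:
    """Map each kept input row index to its output row index."""
    excluded = set(rows_to_exclude)
    mapping = {}
    next_output_row = -1  # same starting value as RowFilter._next_output_row
    for rdrowx in range(n_rows):
        if rdrowx not in excluded:
            next_output_row += 1
            mapping[rdrowx] = next_output_row
    return mapping


print(output_row_map(8, [3, 4, 5]))  # {0: 0, 1: 1, 2: 2, 6: 3, 7: 4}
```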
