How to properly delete columns from an Excel file using Python?

I want to delete column "K" with this code:
import openpyxl
from openpyxl import load_workbook

cols_list = ['K']

def modify(path):
    wb = load_workbook(path)
    ws = wb.active
    sheet = wb["Sheet1"]
    for i in cols_list:
        # Convert the column letter to its 1-based index
        which_cols = openpyxl.utils.cell.column_index_from_string(i)
        ws.delete_cols(which_cols, 1)
    wb.save("./data_frames_from_xml_modified.xlsx")
The problem is that the modified file doesn't load properly.
When I manually delete column "K" in Excel, I do it this way:
And the rest of the file shifts left properly, without bugs.
But when I execute my code, the result is:
So as you can see, the file was not generated correctly.
How can I delete columns without such a mess in my file?
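For reference, here is a minimal sketch of what delete_cols does on a plain in-memory workbook (the sample values are made up for illustration). Plain cell values shift left as expected. The openpyxl documentation notes that it does not manage dependencies such as formulae, tables or charts when rows or columns are inserted or deleted, so if the original file has any of those to the right of column K, that may be what ends up broken. That is an assumption, since the file itself isn't shown.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter

wb = Workbook()
ws = wb.active

# Fill row 1 with the column letters A..M so the shift is easy to see
for col in range(1, 14):
    ws.cell(row=1, column=col, value=get_column_letter(col))

# Delete one column starting at K
ws.delete_cols(column_index_from_string("K"), 1)

# The value that used to be in L1 now sits in K1
shifted = ws["K1"].value
```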

Related

How to modify xml tables inside excel with openpyxl?

I'm trying to fill a table (inside an xlsx template for Dynamics NAV) with openpyxl, but when I open the file with Excel it prompts an alert: “We found a problem with some content in <Excel_filename>. Do you want us to try recovering the file as much as we can? If you trust the source of this workbook, then click Yes”
Then Excel 'repairs' the file and I can still see the data but the table /xl/tables/table1.xml is gone, and Navision can't accept the file.
This is my code in python:
import openpyxl

# Source workbook: read cached values instead of formulas
wb = openpyxl.load_workbook("data_source.xlsx", data_only=True)
sheet1 = wb.active
# Template workbook that contains the table
wb2 = openpyxl.load_workbook('template.xlsx')
sheet2 = wb2.active
filas = sheet1.max_row
# Copy columns A and B row by row, starting at row 3
for fila in range(3, filas):
    sheet2["A" + str(fila)] = sheet1["A" + str(fila)].value
    sheet2["B" + str(fila)] = sheet1["B" + str(fila)].value
    sheet2["C" + str(fila)] = "FRA"
    sheet2["D" + str(fila)] = "NAC"
wb2.save('tax1.xlsx')
wb2.close()
When I create a table from scratch with the code shown on the official openpyxl site:
https://openpyxl.readthedocs.io/en/latest/worksheet_tables.html#creating-a-table
it works fine only if the table starts from row one (ref="A1:E5").
...but this template has a table that starts from row 3!
So when I try to make the table I need (ref="A3:D6") I get this: 'UserWarning: File may not be readable: column headings must be strings.' and as expected, I get the same alert and the same result when I open it with Excel.
Is there a way to modify/fill a table without corrupting the xlsx file?
Or, as a workaround:
Is there a way to create a table starting at A3 with no errors?
Thanks in advance
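On the workaround: openpyxl emits the "column headings must be strings" warning at save time when a cell in the first row of the table's ref is not a string. One possible cause (an assumption, since the template isn't shown) is that row 3 was empty or held non-string values when the table was added. A minimal sketch with made-up headers and data, writing string headers into row 3 before creating the table:

```python
import io

from openpyxl import Workbook
from openpyxl.worksheet.table import Table

wb = Workbook()
ws = wb.active

# Hypothetical headers and rows, for illustration only
headers = ["Code", "Name", "Type", "Area"]
rows = [
    ["001", "Alpha", "FRA", "NAC"],
    ["002", "Beta", "FRA", "NAC"],
    ["003", "Gamma", "FRA", "NAC"],
]

# The header row is the first row of the table ref (row 3 here)
# and every heading is written as a string
for col, text in enumerate(headers, start=1):
    ws.cell(row=3, column=col, value=str(text))

# Data rows go directly below the header row (rows 4 to 6)
for r, row in enumerate(rows, start=4):
    for c, value in enumerate(row, start=1):
        ws.cell(row=r, column=c, value=value)

table = Table(displayName="Table1", ref="A3:D6")
ws.add_table(table)

# Save to memory here; pass a file name instead to write to disk
buf = io.BytesIO()
wb.save(buf)
```

This only covers creating a new table. Whether it also keeps the table of an existing Dynamics NAV template intact is not something this sketch verifies.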

Insert Pandas Dataframe in Excel sheet between preexisting data

My current project deals with writing data from one Excel file into a specific format chosen by the user. The format is saved in a folder as an Excel file in which the headers and some other text (which will always stay the same) are already present, so the only thing left to do is fill the file with data.
For this I would like to "simply" insert my pandas dataframe at a certain row, so that neither the header nor the footer is overwritten.
Here is an example format:
And this is how I want the result to look:
I already managed to write the data to the file below the header row, but it overwrites the footer. This is the code that does exactly that:
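import shutil
import pandas as pd
from openpyxl import load_workbook

# saveFolder, format_path and df are defined elsewhere
fileName = saveFolder + "test.xlsx"
shutil.copyfile(format_path, fileName)  # work on a copy of the template
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
df.to_excel(writer, sheet_name="Tabelle1", startrow=1)
writer.close()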
If this is not possible the only workaround I can think of is to read the format used, save it in python, write the header in the given format (background colour, fontsize,...), then the data, and then the footer.
However, if I remember correctly when reading text python will not remember which words are written in bold, and which words are normal. If someone, however, knows how to do this, I would also very much appreciate comments that try to solve my issue in that direction.
To preserve the existing format, etc., you will need to insert the data into specific cells, and openpyxl allows you to do that. to_excel() will overwrite the worksheet you are trying to add the data to. There are some gaps in the question, but I will try to answer it as best I can. Below is the code which will:
Assume there is a dataframe existing with a few rows of data
The program will open the template file (like in the screen shot shared)
Add that dataframe (insert rows and add data without header) to the file (screen shot shared)
Save it as a new file
I am assuming there is just one sheet in template and you are writing to that. You can use for loops to add more sheets or new files
Note that the new rows of data will need to be inserted so that the footer will move down and NOT get deleted. The format, color, etc. of the template will remain intact.
dataframe df (including header)
Name Age Nationality
ABC 12 US
DEF 111 UK
GHI 22 India
JKL 49 Japan
Code
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

file = 'inputfile.xlsx'  # Your template file
wb = openpyxl.load_workbook(filename=file)
ws = wb.active  # You can use ws = wb['Sheet1'] if you want a specific sheet
ws.insert_rows(idx=2, amount=len(df))  # Insert as many rows as in df (4 in our case) after row 1
rows = dataframe_to_rows(df, index=False, header=None)
for r_idx, row in enumerate(rows, 2):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)  # Add the data
wb.save('NewFile.xlsx')  # Your output file
Template (inputfile.xlsx)
Output (NewFile.xlsx)
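Since the screenshots are not reproduced here, the following pandas-free sketch shows the same idea on an in-memory workbook. The header and footer contents are hypothetical stand-ins for the template, and the data rows mirror the example dataframe above. It checks that the footer moves down instead of being overwritten.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["Name", "Age", "Nationality"])  # header in row 1 (stand-in for the template)
ws.append(["Footer text"])                 # footer in row 2 (stand-in for the template)

data = [
    ("ABC", 12, "US"),
    ("DEF", 111, "UK"),
    ("GHI", 22, "India"),
    ("JKL", 49, "Japan"),
]

# Make room between header and footer, then fill the new rows
ws.insert_rows(idx=2, amount=len(data))
for r_idx, row in enumerate(data, start=2):
    for c_idx, value in enumerate(row, start=1):
        ws.cell(row=r_idx, column=c_idx, value=value)

footer_row = ws.max_row  # the footer now sits below the inserted data
```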

How to change sheet names without losing graph reference with openpyxl

I'm trying to write some code that will change sheet names in an Excel file based on the data in another Excel file.
At first this worked fine:
import openpyxl

Sheet_names = "A", "B", "C", "D"
wb = openpyxl.load_workbook("file_name.xlsx")
# Sheet1 -> "A", Sheet2 -> "B", and so on
for i in range(1, len(Sheet_names) + 1):
    Name_change = wb["Sheet{}".format(str(i))]
    wb.active = Name_change
    Name_change.title = "{}".format(Sheet_names[i - 1])
wb.save("file_name.xlsx")
wb.close()
But for some reason (I'm unsure what I've changed) a graph within the Excel file isn't updating, so its data references still point to Sheet1, Sheet2, etc.
I'm also getting a warning message that there are external links. I guess it's assuming the sheet references are external? The Excel file comes from a template that's copied across.
Changing it manually isn't an option. Is having openpyxl recreate the graph for every sheet the only option?
Out of ideas, help!

csv module not writing new line

I am working on a script for reading specific cells from an Excel workbook into a list, and then from the list into a CSV. There's also a loop that opens the workbooks in a folder.
My code:
import csv
import openpyxl
import os

path = r'C:\Users.....'  # Folder holding workbooks
workbooks = os.listdir(path)
cell_values = []  # List for storing cell values from worksheets
for workbook in workbooks:  # Workbook iteration
    wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True)  # Open workbook
    sheet = wb.active  # Get sheet
    f = open('../record.csv', 'w', newline='')  # Open the CSV file
    cell_list = ["I9", "AK6", "N35"]  # List of cells to check
    with f:  # CSV writer loop
        record_writer = csv.writer(f)  # Open CSV writer
        for cells in cell_list:  # Loop through cell list to get cell values and write them to the cell_values list
            cell_values.append(sheet[cells].value)  # Append cell values to the cell_values list
        record_writer.writerow(cell_values)  # Write cell_values list to CSV
quit()  # Terminate program after all workbooks in the folder have been analyzed
The output just puts all values on the same line, albeit separated by commas, but it doesn't help me when I go to open my results in Excel if everything is on the same line. When I was using xlrd, the format was vertical but all I had to do was transpose the dataset to be good. But I had to change from xlrd (which was a smart move in general) because it would not read merged cells.
I get this:
4083940,140-21-541,NP,8847060,140-21-736,NP
When I want this
4083940,140-21-541,NP
8847060,140-21-736,NP
Edit - I forgot the "what have I tried" portion of my post. I have tried changing my loops around to avoid overwriting the previous write to the CSV. I have tried clearing the list on each loop to get the script to treat each new entry as a new line. I have tried adding \n in the writer line as I saw in a couple of posts. I have tried to use writerows instead of writerow. I tried A instead of W even though it is a fix and not a solution but that didn't quite work right either.
Your main problem is that cell_values is accumulating the cells from multiple sheets. You need to reset it, like, cell_values = [], for every sheet.
I went back to your original example and:
moved the opening of record.csv up, and placed all the work inside the scope of that file being open and written into
moved cell_values = [] inside your workbook loop
moved cell_list = ["I9", "AK6", "N35"] to the top, because that's really scoped for the entire script, if every workbook has the same cells
removed quit(), it's not necessary at the very end of the script, and in general should probably be avoided: Python exit commands - why so many and when should each be used?
import csv
import openpyxl
import os

path = r'C:\Users.....'  # Folder holding workbooks
workbooks = os.listdir(path)
cell_list = ["I9", "AK6", "N35"]  # List of cells to check
with open('record.csv', 'w', newline='') as f:
    record_writer = csv.writer(f)
    for workbook in workbooks:
        wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True)
        sheet = wb.active
        cell_values = []  # reset for every sheet
        for cells in cell_list:
            cell_values.append(sheet[cells].value)
        # Write one row per sheet
        record_writer.writerow(cell_values)
Also, I can see you're new to the csv module and struggling a little conceptually (since you tried writerow, then writerows, while debugging your code). Python's official documentation for csv doesn't really give practical examples of how to use it. Try reading up here: Writing to a CSV.
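To make the writerow/writerows distinction concrete, here is a small standard-library sketch using the two records from the question. It writes to an in-memory buffer rather than a file: writerow takes one record (a flat list of values) and writes one line, while writerows takes a list of records and writes one line per record.

```python
import csv
import io

records = [
    ["4083940", "140-21-541", "NP"],
    ["8847060", "140-21-736", "NP"],
]

# writerow: call it once per record
one_by_one = io.StringIO(newline="")
writer = csv.writer(one_by_one)
for record in records:
    writer.writerow(record)

# writerows: pass the whole list of records in one call
all_at_once = io.StringIO(newline="")
csv.writer(all_at_once).writerows(records)

lines_a = one_by_one.getvalue().splitlines()
lines_b = all_at_once.getvalue().splitlines()
```

Both buffers end up with the same two lines. Passing one flat list that holds the values of several sheets to writerow is what produces the single long line.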

Excel Error when using pandas and openpyxl: Repaired Part: /xl/worksheets/sheet1.xml part with XML error. HRESULT 0x8000ffff Line 1, column 0

I'm getting an error when opening an Excel file after writing to it. This is what I have so far:
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.worksheet.datavalidation import DataValidation

#locate source document
Path = Path(r'C:\Users\username\Test\EXCEL_Test.xlsx')
# open doc and go to active sheet
wb = load_workbook(filename = Path)
ws = wb.active
#add drop down list to each cell in a certain column
dv_v = DataValidation(type="list", formula1='"Y,N"', allow_blank=True)
for cell in ws['D']:
    cell = ws.add_data_validation(dv_v)
wb.save(Path)
And these are the two errors that come up on opening the Excel file:
First error popup:
"We found a problem with some content in 'EXCEL_Test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
Second error popup:
"Repaired Part: /xl/worksheets/sheet1.xml part with XML error. HRESULT 0x8000ffff Line 1, column 0."
My data validation is not showing up, and the file has the above errors when attempting to open the file to view the openpyxl changes.
Can someone help me find out why these errors appear even though Python finishes with exit code 0, and why the data validation comes up blank in the recovered file?
I think you are using ws.add_data_validation(dv) incorrectly. The validation object is added to the worksheet once, and then the cells it applies to are added to the validation object.
Try doing it like this.
import openpyxl
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation
#locate source document
Path = 'C:/Users/username/test/Excel_test.xlsx'
# open doc and go to active sheet
wb = openpyxl.load_workbook(filename = Path)
ws = wb['Sheet1']
#add drop down list to each cell in a certain column
dv = DataValidation(type="list", formula1='"Y,N"', allow_blank=True)
ws.add_data_validation(dv)
# This is the same as for the whole of column D
dv.add('D1:D1048576')
wb.save(Path)
Take a look at the Docs here: https://openpyxl.readthedocs.io/en/stable/validation.html
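If the drop-down is only wanted on the rows that actually hold data, the range can be bounded with ws.max_row instead of covering the whole column. This is a sketch on an in-memory workbook with made-up sample data, and limiting the range this way is an assumption about what is wanted, not part of the original answer.

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active

# Hypothetical sample data in column A, rows 1 to 5
for row in range(1, 6):
    ws.cell(row=row, column=1, value=f"item {row}")

dv = DataValidation(type="list", formula1='"Y,N"', allow_blank=True)
ws.add_data_validation(dv)   # register the rule once on the worksheet
dv.add(f"D1:D{ws.max_row}")  # apply it only to the rows in use (D1:D5 here)

ws["D3"] = "Y"               # a value allowed by the list
```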
