Append output to an existing Excel File using OpenPyXL - python

Step 1. Take input from an Excel file.
Step 2. Create a WB object.
Step 3. Take data from that WB object and create a list of dictionaries.
Step 4. Manipulate, format, and style the data.
Step 5. Output the data and APPEND it to an existing Excel workbook,
meaning the existing workbook is continually appended with the new data
from Step 1.
I have Steps 1-4 down and can output my data to a NEW workbook, but I am coming up
empty on Step 5.
Any guidance or direction would be appreciated.
### Python 3.X ###
import sys
import time
import openpyxl
from openpyxl.styles import Alignment, Font  # the old Style class no longer exists in current openpyxl
from openpyxl.utils import get_column_letter  # moved from openpyxl.cell to openpyxl.utils
from pathlib import Path
###Double Click a Batch file, CMD opens, Client drags and drops excel File
###to be Formatted
try:
    path = Path(sys.argv[1])  # wrap in Path so .parent and .stem work below
except IndexError:
    path = Path(input('Input file to read: ').strip("'").strip('"'))
output_string = str(Path(path.parent, path.stem + '.NewFormatted.xlsx'))
wb = openpyxl.load_workbook(str(path))
sheet1 = wb.worksheets[0]  # the attribute is "worksheets" (plural)
###Do the Python Dance on Data###
###Style, Font, Alignment applied to sheets###
###Currently Saves output as a NEW excel File in the Directory the original
###file was drag and dropped from
wb.save(output_string)
###Need the output to be appended to an existing Excel file already present
###in a target directory
###Note Formatted Workbook (output) has multiple sheets###

openpyxl will let you edit existing workbooks, including appending data to them. But the methods it provides work on individual items of data such as cells; there are no aggregate functions for copying things like whole worksheets from one workbook to another.
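A minimal sketch of that cell-by-cell approach, assuming you only need the cell values (not the styles) and that each sheet of the formatted output should be appended to a sheet with the same title in the target workbook. The function name `append_workbook` and the paths in the usage line are hypothetical.

```python
from pathlib import Path

import openpyxl


def append_workbook(source_path, target_path):
    """Append the rows of every sheet in source_path to the same-named sheet in target_path.

    Only values are copied; fonts, fills and alignment would have to be copied
    cell by cell as well, since openpyxl has no cross-workbook sheet copy.
    """
    source = openpyxl.load_workbook(source_path)
    target_path = Path(target_path)
    if target_path.exists():
        target = openpyxl.load_workbook(target_path)
    else:
        # No target yet: start an empty workbook and drop its default sheet.
        target = openpyxl.Workbook()
        target.remove(target.active)

    for src_ws in source.worksheets:
        if src_ws.title in target.sheetnames:
            dst_ws = target[src_ws.title]
        else:
            dst_ws = target.create_sheet(src_ws.title)
        for row in src_ws.iter_rows(values_only=True):
            # ws.append() writes to the first row below the sheet's last used row.
            dst_ws.append(row)

    target.save(target_path)
```

Usage (hypothetical paths): `append_workbook(output_string, r"C:\target\Master.xlsx")` after `wb.save(output_string)`. Each run adds the new rows under the existing ones instead of creating a new file.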

Related

csv module not writing new line

I am working on a script that reads specific cells from an Excel workbook into a list, and then from the list into a CSV. There's also a loop that opens each workbook in a folder.
My code:
import csv
import openpyxl
import os
path = r'C:\Users.....' # Folder holding workbooks
workbooks = os.listdir(path)
cell_values = [] # List for storing cell values from worksheets
for workbook in workbooks: # Workbook iteration
    wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True) # Open workbook
    sheet = wb.active # Get sheet
    f = open('../record.csv', 'w', newline='') # Open the CSV file
    cell_list = ["I9", "AK6", "N35"] # List of cells to check
    with f: # CSV writer loop
        record_writer = csv.writer(f) # Open CSV writer
        for cells in cell_list: # Loop through cell list to get cell values and write them to the cell_values list
            cell_values.append(sheet[cells].value) # Append cell values to the cell_values list
        record_writer.writerow(cell_values) # Write cell_values list to CSV
quit() # Terminate program after all workbooks in the folder have been analyzed
The output puts all values on the same line, separated by commas, which doesn't help when I open my results in Excel. When I was using xlrd, the output was vertical and all I had to do was transpose the dataset. But I had to switch from xlrd (which was a smart move in general) because it would not read merged cells.
I get this:
4083940,140-21-541,NP,8847060,140-21-736,NP
When I want this
4083940,140-21-541,NP
8847060,140-21-736,NP
Edit - I forgot the "what have I tried" portion of my post. I have tried rearranging my loops to avoid overwriting the previous write to the CSV. I have tried clearing the list on each loop to get the script to treat each new entry as a new line. I have tried adding \n in the writer line, as I saw in a couple of posts. I have tried using writerows instead of writerow. I also tried opening the file with 'a' instead of 'w', even though that is a workaround rather than a solution, but that didn't quite work right either.
Your main problem is that cell_values is accumulating the cells from multiple sheets. You need to reset it (cell_values = []) for every sheet.
I went back to your original example and:
moved the opening of record.csv up, and placed all the work inside the scope of that file being open and written into
moved cell_values = [] inside your workbook loop
moved cell_list = ["I9", "AK6", "N35"] to the top, because that's really scoped for the entire script, if every workbook has the same cells
removed quit(); it's not necessary at the very end of the script, and in general should probably be avoided (see Python exit commands - why so many and when should each be used?)
import csv
import openpyxl
import os
path = r'C:\Users.....' # Folder holding workbooks
workbooks = os.listdir(path)
cell_list = ["I9", "AK6", "N35"] # List of cells to check
with open('record.csv', 'w', newline='') as f:
    record_writer = csv.writer(f)
    for workbook in workbooks:
        wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True)
        sheet = wb.active
        cell_values = [] # reset for every sheet
        for cells in cell_list:
            cell_values.append(sheet[cells].value)
        # Write one row per sheet
        record_writer.writerow(cell_values)
Also, I can see you're new to the CSV module and struggling a little conceptually (you tried writerow, then writerows, while debugging your code). Python's official documentation for csv doesn't give many practical examples of how to use it. Try reading up here: Writing to a CSV.
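To make the writerow/writerows difference concrete, here is a small sketch that writes to an in-memory buffer instead of record.csv, using the two rows from the question as sample data. Each writerow() call produces exactly one line; writerows() is just a loop of writerow() calls; a single writerow() on an accumulated list reproduces the one-line output from the question.

```python
import csv
import io

# The two rows the question expects to see, one per workbook.
rows = [
    [4083940, "140-21-541", "NP"],
    [8847060, "140-21-736", "NP"],
]


def csv_text(write):
    """Run write(writer) against an in-memory CSV writer and return the text produced."""
    buf = io.StringIO()
    write(csv.writer(buf))
    return buf.getvalue()


def write_per_row(writer):
    for row in rows:
        writer.writerow(row)  # each call ends with the writer's line terminator


def write_all_at_once(writer):
    writer.writerows(rows)  # equivalent to the loop above


def write_accumulated(writer):
    # One call with a single flat list, like the question's growing cell_values.
    writer.writerow([value for row in rows for value in row])


print(csv_text(write_per_row).splitlines())
print(csv_text(write_accumulated).splitlines())
```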

Edit .xlsx with python

I have no idea where to start.
I want to edit something like this:
To this:
I want to save the result in a .txt file.
All I know how to do is open and read the file.
code:
import pandas as pd
file = "myfile.xlsx"
f = pd.read_excel(file)
print(f)
I think the colors in the images show how the code has to work. If not, I'll answer any questions.
My go-to for editing Excel spreadsheets is openpyxl.
I don't believe it can turn .csv or .xlsx/.xlsm files into .txt files directly, but it can read .xlsx/.xlsm files, and you can write the values it reads out to a .csv with the csv module. pandas can read csv files, so you can probably go from there.
Quick example:
from openpyxl import load_workbook
wb = load_workbook("foo.xlsx")
sheet = wb["baz"]
sheet["D5"] = "I'm cell D5"
wb.save("foo.xlsx")  # changes are only written to disk when you save
Use openpyxl, and look at this snippet from the answer to "Get cell color from .xlsx":
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color (in cell A2)
So you'd have to iterate across your columns/rows checking for a colour match; if it's a match, grab the value and apply it to your new sheet.
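A minimal sketch of that loop, assuming the cells you want are marked with a solid fill and that you know the colour as an ARGB hex string (the `TARGET_COLOR` value below, opaque yellow, is a hypothetical example). It collects the matching values and writes them one per line to a .txt file, as the question asks; the demo builds a tiny workbook in memory because the question's workbook isn't available.

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Hypothetical target colour: opaque yellow as an ARGB hex string.
TARGET_COLOR = "FFFFFF00"


def values_with_color(ws, color=TARGET_COLOR):
    """Collect the values of non-empty cells whose fill start colour equals `color`."""
    matches = []
    for row in ws.iter_rows():
        for cell in row:
            # Same attribute as the snippet above; for RGB colours it is the ARGB string.
            if cell.fill.start_color.index == color and cell.value is not None:
                matches.append(cell.value)
    return matches


def save_as_txt(values, txt_path):
    """Write one value per line to a plain .txt file."""
    with open(txt_path, "w", encoding="utf-8") as fh:
        for value in values:
            fh.write(f"{value}\n")


# Small in-memory demo workbook.
wb = Workbook()
ws = wb.active
yellow = PatternFill(start_color=TARGET_COLOR, end_color=TARGET_COLOR, fill_type="solid")
ws["A1"] = "keep me"
ws["A1"].fill = yellow
ws["A2"] = "ignore me"
ws["B3"] = 42
ws["B3"].fill = yellow
print(values_with_color(ws))  # ['keep me', 42]
```

For a real file, load it with `load_workbook(...)`, pick the sheet, and call `save_as_txt(values_with_color(sheet), "result.txt")` (hypothetical file name).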

use data from excel file in another python script

I want to write a Python script that takes data from one Excel file and inputs it into another Excel file to get the output. For example, if I have input.csv, the script takes the data from there, replaces certain cells of output.csv, and gets the value based on the calculation.
import pandas as pd
import numpy as np
data = pd.read_excel("Data.xlsx")
Depth = data["Depth (D):"]
ID = data["Tubing inner diameter (dti):"]
API = data["Oil gravity (API):"]
oilvisc = data["Oil viscosity (cp):"]
This is the script I have currently; these are the inputs.
import xlwt
import xlrd
from xlutils.copy import copy
rb=xlrd.open_workbook("hagedornbrowncorrelation.xls")
wb=copy(rb)
w_sheet=wb.get_sheet(0)
w_sheet.write(4,2,700)
wb.save("hagedornbrowncorrelation.xls")
The workbook "hagedornbrowncorrelation.xls" is my calculator. I am replacing C5 with 700, but when I save it, all the macros and formulas in the workbook go away and it becomes a useless workbook with just numbers.
I have done a similar project with the openpyxl module, whose documentation can be found here:
https://openpyxl.readthedocs.io/en/stable/
Because I built a UI with Tkinter, I used a file dialog to open the file. You may not want to use a global variable like I did; this was a quick hack.
from tkinter import filedialog
def getFilecurrent():
    global path
    # open dialog box to select file
    path = filedialog.askopenfilename(initialdir="/", title="Select file")
Then you can load it using
ref_workbook = openpyxl.load_workbook(path)
Then do your manipulation of the data by selecting the right cell, and remember to select the right worksheet first:
ws = ref_workbook.active  # or ref_workbook["<sheet name>"]
weeklyengagement = ws['B18'].value
Afterwards, load the template workbook that the data will be pasted into, like
template = openpyxl.load_workbook("Section12Grades.xlsx") #Add file name
temp_sheet = template["Sheet1"] #Add Sheet name (get_sheet_by_name is deprecated)
Lastly, you copy the range and paste it using loops. There are plenty of resources on this, so I'm not going to paste my code; it has some custom setup and would only confuse you.
Edit: if you wish to keep the macros, load the workbook like this (and save it again as .xlsm):
wb = load_workbook(filename='filename.xlsm', read_only=False, keep_vba=True)
Formulas are stored as strings (for example "=SUM(A1:A3)"); if you wish to keep the formulas, keep them in that string format when you save.
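Putting the answer's points together for the question's calculator, a minimal sketch. openpyxl cannot open legacy .xls files, so this assumes the calculator has first been re-saved in Excel as .xlsx (or .xlsm if it contains macros). The function name `set_inputs` and the file name in the usage line are hypothetical. openpyxl does not evaluate formulas itself; the results are recalculated when the file is opened in Excel. Avoid `data_only=True` here, because that loads cached values in place of the formulas.

```python
import openpyxl


def set_inputs(calc_path, inputs, sheet_name=None):
    """Write input values into an existing .xlsx/.xlsm calculator and save it in place.

    `inputs` maps cell coordinates such as "C5" to the values to write.
    Formula cells are left untouched: with the default data_only=False,
    openpyxl loads them as formula strings and writes them back unchanged.
    """
    keep_vba = str(calc_path).lower().endswith(".xlsm")  # keep macros in .xlsm files
    wb = openpyxl.load_workbook(calc_path, keep_vba=keep_vba)
    ws = wb[sheet_name] if sheet_name else wb.active
    for coord, value in inputs.items():
        ws[coord] = value
    wb.save(calc_path)
```

Usage (hypothetical file name): `set_inputs("hagedornbrowncorrelation.xlsm", {"C5": 700})`, the openpyxl equivalent of `w_sheet.write(4, 2, 700)` in the question.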

Exporting plain text header and image to Excel

I am fairly new to Python, and I'm getting stuck trying to pass an image file into a header during the DataFrame.to_excel() portion of my file.
Basically what I want is a picture in the first cell of the Excel table, followed by a couple of rows (5 to be exact) of text which will include a date (probably from datetime.date.today().ctime() if possible).
I already have the code to output the table portion as:
mydataframe.to_excel(my_path_name, sheet_name= my_sheet_name, index=False, startrow=7,startcol=0)
Is there a way to output the image and text portion directly from Python?
UPDATE:
For clarity, mydataframe is exporting the meat and potatoes of the worksheet (data rows and columns). I already have it starting on row 7 of the worksheet in Excel. The header portion is the trouble spot.
I found the solution; thanks for all of the help.
The simple answer is to use the xlsxwriter package as the engine. For example, assume that the image is saved at the path /image.png. Then the code to insert the data into the Excel file, with the image located at the top of the data, would be:
# Importing packages and storing string for image file
import pandas as pd
import xlsxwriter
import numpy as np
image_file = '/image.png'
# Creating a fictitious data set since the actual data doesn't matter
dataframe = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
# Opening the xlsxwriter object to a path on the C:/ drive
writer = pd.ExcelWriter('C:/file.xlsx',engine='xlsxwriter')
dataframe.to_excel(writer,sheet_name = 'Arbitrary', startrow=3)
# Accessing the workbook / worksheet
workbook = writer.book
worksheet = writer.sheets['Arbitrary']
# Inserting the image into the workbook in cell A1
worksheet.insert_image('A1',image_file)
# Closing the workbook and saving the file to the specified path and filename
writer.close()  # writer.save() was removed in pandas 2.0
And now I have an image on the top of my excel file. Huzzah!
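The question also asked for five rows of text, including a date, between the picture and the table. A minimal sketch of that part, using xlsxwriter's `worksheet.write(row, col, data)` on the sheet pandas creates; the header strings and the function name `export_with_header` are hypothetical placeholders, and the image is optional. Rows and columns in `write()` are zero-based, so rows 1 to 5 land in Excel rows 2 to 6, and `startrow=7` puts the table header in Excel row 8. Images float over the cells, so depending on its size the picture in A1 may cover the text; `insert_image` accepts an options dict such as `{"x_scale": 0.5, "y_scale": 0.5}` to shrink it.

```python
import datetime
import os

import pandas as pd


def export_with_header(df, out_path, sheet_name="Arbitrary", image_file=None, header_lines=None):
    """Write df from Excel row 8, with an optional image in A1 and text in rows 2-6."""
    if header_lines is None:
        # Hypothetical header text; the question only says the rows include a date.
        header_lines = [
            "Report title",
            "Generated: " + datetime.date.today().ctime(),
            "Line 3",
            "Line 4",
            "Line 5",
        ]
    # The context manager saves and closes the file when the block ends.
    with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False, startrow=7, startcol=0)
        worksheet = writer.sheets[sheet_name]
        if image_file and os.path.exists(image_file):
            worksheet.insert_image("A1", image_file)
        for offset, text in enumerate(header_lines):
            worksheet.write(1 + offset, 0, text)  # zero-based: Excel rows 2..6, column A
    return out_path
```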

Overwriting existing cells in an XLSX file using Python

I am trying to find a library that overwrites an existing cell to change its contents using Python.
what I want to do:
read from an .xlsx file
compare cell data to determine whether a change is needed
change the data in the cell, e.g. overwrite the date in cell 'O2'
save the file
I have tried the following libraries:
xlsxwriter
combination of:
xlrd
xlwt
xlutils
openpyxl
xlsxwriter only writes to a new excel sheet and file.
combination: works to read from .xlsx but only writes to .xls
openpyxl: reads from an existing file but doesn't write to existing cells; it can only create new rows and cells, or create an entire new workbook.
Any suggestions would be greatly appreciated. Are there other libraries? How can I use the libraries above to overwrite data in an existing file?
from win32com.client import Dispatch
import os
xl = Dispatch("Excel.Application")
xl.Visible = True # otherwise excel is hidden
# newest excel does not accept forward slash in path
wbs_path = r'C:\path\to\a\bunch\of\workbooks'
for wbname in os.listdir(wbs_path):
    if not wbname.endswith(".xlsx"):
        continue
    wb = xl.Workbooks.Open(wbs_path + '\\' + wbname)
    sh = wb.Worksheets("name of sheet")
    sh.Range("A1").Value = "some new value"
    wb.Save()
    wb.Close()
xl.Quit()
Alternatively, you can use xlwings, which (if I had to guess) uses this same approach under the hood.
>>> import xlwings as xw
>>> wb = xw.Book() # this will create a new workbook
>>> wb = xw.Book('FileName.xlsx') # connect to an existing file in the current working directory
>>> wb = xw.Book(r'C:\path\to\file.xlsx') # on Windows: use raw strings to escape backslashes
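openpyxl can also overwrite an existing cell in a loaded workbook: assign to `ws[coord]` and save. A minimal sketch of the read, compare, overwrite, save steps from the question; the function name `update_cell_if_needed` and the sheet/file names in the usage line are hypothetical. openpyxl reads date cells back as `datetime.datetime`, so pass a `datetime.datetime` (not a `datetime.date`) for the comparison to work. Note that the openpyxl docs warn that items it does not read, such as shapes, are lost when an existing file is saved.

```python
import openpyxl


def update_cell_if_needed(path, sheet_name, coord, new_value):
    """Overwrite ws[coord] in an existing .xlsx file only when its value differs.

    Returns True if the cell was changed and the file re-saved.
    """
    wb = openpyxl.load_workbook(path)
    ws = wb[sheet_name]
    if ws[coord].value == new_value:
        return False  # nothing to do, leave the file untouched
    ws[coord] = new_value
    wb.save(path)
    return True
```

Usage (hypothetical names): `update_cell_if_needed("report.xlsx", "Sheet1", "O2", datetime.datetime(2024, 1, 15))`.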
