I am working on a script that reads specific cells from an Excel workbook into a list, and then writes the list to a CSV. There's also a loop that opens each workbook in a folder.
My code:
import csv
import openpyxl
import os
path = r'C:\Users.....' # Folder holding workbooks
workbooks = os.listdir(path)
cell_values = [] # List for storing cell values from worksheets
for workbook in workbooks: # Workbook iteration
    wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True) # Open workbook
    sheet = wb.active # Get sheet
    f = open('../record.csv', 'w', newline='') # Open the CSV file
    cell_list = ["I9", "AK6", "N35"] # List of cells to check
    with f: # CSV writer loop
        record_writer = csv.writer(f) # Open CSV writer
        for cells in cell_list: # Loop through cell list to get cell values and write them to the cell_values list
            cell_values.append(sheet[cells].value) # Append cell values to the cell_values list
        record_writer.writerow(cell_values) # Write cell_values list to CSV
quit() # Terminate program after all workbooks in the folder have been analyzed
The output just puts all values on the same line, albeit separated by commas, but it doesn't help me when I go to open my results in Excel if everything is on the same line. When I was using xlrd, the format was vertical but all I had to do was transpose the dataset to be good. But I had to change from xlrd (which was a smart move in general) because it would not read merged cells.
I get this:
4083940,140-21-541,NP,8847060,140-21-736,NP
When I want this
4083940,140-21-541,NP
8847060,140-21-736,NP
Edit - I forgot the "what have I tried" portion of my post. I have tried changing my loops around to avoid overwriting the previous write to the CSV. I have tried clearing the list on each loop to get the script to treat each new entry as a new line. I have tried adding \n in the writer line as I saw in a couple of posts. I have tried to use writerows instead of writerow. I tried A instead of W even though it is a fix and not a solution but that didn't quite work right either.
Your main problem is that cell_values is accumulating the cells from multiple sheets. You need to reset it, like, cell_values = [], for every sheet.
I went back to your original example and:
moved the opening of record.csv up, and placed all the work inside the scope of that file being open and written into
moved cell_values = [] inside your workbook loop
moved cell_list = ["I9", "AK6", "N35"] to the top, because that's really scoped for the entire script, if every workbook has the same cells
removed quit(), it's not necessary at the very end of the script, and in general should probably be avoided: Python exit commands - why so many and when should each be used?
import csv
import openpyxl
import os
path = r'C:\Users.....' # Folder holding workbooks
workbooks = os.listdir(path)
cell_list = ["I9", "AK6", "N35"] # List of cells to check
with open('record.csv', 'w', newline='') as f:
    record_writer = csv.writer(f)
    for workbook in workbooks:
        wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True)
        sheet = wb.active
        cell_values = [] # reset for every sheet
        for cells in cell_list:
            cell_values.append(sheet[cells].value)
        # Write one row per sheet
        record_writer.writerow(cell_values)
Also, I can see you're new to the CSV module and struggling a little conceptually (since you tried writerow, then writerows, while debugging your code). Python's official documentation for CSV doesn't really give practical examples of how to use it. Try reading up here, Writing to a CSV.
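To make the writerow / writerows difference concrete, here is a small sketch. It writes to an in-memory io.StringIO buffer instead of record.csv, and the sample rows are just the values from the desired output in the question:

```python
import csv
import io

rows = [
    ["4083940", "140-21-541", "NP"],
    ["8847060", "140-21-736", "NP"],
]

# writerow: one call writes one line, so call it once per workbook
one_by_one = io.StringIO()
writer = csv.writer(one_by_one)
for row in rows:
    writer.writerow(row)

# writerows: one call takes a list of rows and writes one line per inner list
all_at_once = io.StringIO()
csv.writer(all_at_once).writerows(rows)

print(one_by_one.getvalue())
print(one_by_one.getvalue() == all_at_once.getvalue())
```

Both buffers end up with the same two lines. Passing one ever-growing flat list to writerow is what produces a single long line.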
Related
I want to modify an existing XLSM file, but:
I need to keep the VBA-Scripts
I need to keep the shapes (aka my "buttons" to trigger the scripts)
Further information:
So far I used openpyxl (but am open to other modules)
I want to modify sheet_1 only - it contains only data
sheet_2 contains shapes as buttons with VBA-macros behind them
sheet_3 to sheet_n contain more data without any shapes/linked scripts.
I managed to keep the vba scripts within the XLSM file by using the parameter "keep_vba", but it still deletes any shapes in the workbook. And since the shapes are used as buttons, I cannot delete them.
My code currently looks like follows.
from openpyxl import load_workbook
workbook = load_workbook(filename=insert_file, read_only=False, keep_vba=True)
import_sheet = workbook["sheet_1"]
row_array = ["1", "2", "3", "4"]
# get first empty row based on the key-column (=A)
empty_row_index = None
for i in range(1, 500, 1):
    if import_sheet['A'+str(i)].value is None:
        empty_row_index = i
        break
for i, val in enumerate(row_array): # insert row
    import_sheet.cell(row=empty_row_index, column=i+1).value = val
workbook.save(insert_file)
Any help / suggestions are welcome.
It turns out that xlwings can do what I want - it doesn't specify it in their documentation, but it works nevertheless. The following code snippet opens an xlsm file, inserts data into a cell and overwrites the workbook while keeping both macros and shapes/"buttons".
pip install xlwings
import xlwings as xw
wb = xw.Book(r'C:\path\to\file\file.xlsm')
sht = wb.sheets['sheet_1']
sht.range('A5').value = 'Foo 1'
wb.save(r'C:\path\to\file\file.xlsm') # full path overwrites file without prompt
# optional to close the appearing excel window
app = xw.apps.active
app.quit()
this is my list
list=['a','b','c']
when using this code
with open('commentafterlink.csv', 'w') as f:
    f.write("%s\n" % list)
it stores each token of the list in its own cell, but I need to store the whole list in one cell. Where is the problem?
Can you try the following:
import xlwt
from xlwt import Workbook
# Workbook is created
wb = Workbook()
# add_sheet is used to create sheet.
sheet1 = wb.add_sheet('Sheet 1')
# sheet1.write(1, 0, ' '.join(list))
# if you want the output to be ['a','b','c']
sheet1.write(1, 0, str(list))
wb.save('xlwt example.xls')
1) You are not writing an Excel file, but a CSV file (that Excel knows how to import).
2) You are writing to the CSV file as if it was a text file, without respecting the semantics of CSV. CSV means "comma-separated values". If it contains the text ['a','b','c'], it will be interpreted as three columns: ['a' and 'b' and 'c']. You would need to quote the value in order to do it right; the default quote in CSV is a double quote. So if you write "['a','b','c']", this would be imported into Excel as one cell. However,
3) You are writing CSV file by hand, which means it is very easy to get the format wrong (e.g. forget to escape something that needs to be escaped). In general, whenever you are writing a file with an established format, it is worth checking if the format has a library that knows how to handle it. Python natively knows how to write CSV files using the csv module, and it knows how to write Excel files using packages you can install using pip, such as xlwt.
Here is how you write a CSV file correctly:
my_list = ['a', 'b', 'c']
import csv
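with open('commentafterlink.csv', 'w', newline='') as w:  # newline='' is what the csv docs recommend
    writer = csv.writer(w)
    writer.writerow([str(my_list)])  # a one-element row, so the whole list lands in one cell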
Note: It is a Bad Thing to overwrite Python's built-in variables, such as list.
(I see another answerer has already provided you the Excel-specific solution using xlwt.)
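If you want to check the quoting from point 2 without opening Excel, here is a rough sketch that reads both versions back with csv.reader. It uses in-memory io.StringIO buffers rather than commentafterlink.csv, and the variable names are only for illustration:

```python
import csv
import io

my_list = ['a', 'b', 'c']

# Written by hand: the commas inside the text are read as column separators
by_hand = "%s\n" % my_list
hand_cells = next(csv.reader(io.StringIO(by_hand)))

# Written by csv.writer: the value is wrapped in double quotes automatically
buffer = io.StringIO()
csv.writer(buffer).writerow([str(my_list)])
csv_cells = next(csv.reader(io.StringIO(buffer.getvalue())))

print(len(hand_cells))  # 3
print(len(csv_cells))   # 1
```

The hand-written line comes back as three cells, while the csv.writer line comes back as a single cell holding the whole list text.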
Using join():
lst=['a','b','c']
with open('commentafterlink.csv', 'w') as f:
    f.write("%s\n" % "".join(lst))
EDIT:
Using repr():
import xlsxwriter
lst=['a','b','c'] # avoid naming a variable list; it shadows the built-in
workbook = xlsxwriter.Workbook('commentafterlink.xlsx') # create the file
worksheet = workbook.add_worksheet() # adding a worksheet
worksheet.write('A1', repr(lst)) # write repr(lst) to cell A1
workbook.close() # close the file
Step 1. Take input from an excel file.
Step 2. Create WB object.
Step 3. Take data from that WB object and create a list of dictionaries.
Step 4. Manipulate, Format, style the data.
Step 5. Now need to output the data and APPEND it into an Existing Excel Workbook.
Meaning the Existing Excel Workbook will be appended with the new data from
Step 1 constantly.
I have Steps 1 - 4 down and can output my data to a NEW workbook. I am coming up
empty on Step 5.
Any guidance or direction would be appreciated.
### Python 3.X ###
import sys
import time
import openpyxl
from openpyxl.styles import Alignment, Font  # Style is not available in current openpyxl releases
from openpyxl.utils import get_column_letter  # moved here from openpyxl.cell
from pathlib import Path
###Double Click a Batch file, CMD opens, Client drags and drops excel File
###to be Formatted
try:
    path = Path(sys.argv[1])  # wrap in Path so .parent and .stem work below
except IndexError:
    path = Path(input('Input file to read: ').strip("'").strip('"'))
output_string = str(Path(path.parent, path.stem + '.NewFormatted.xlsx'))
wb = openpyxl.load_workbook(str(path))
sheet1 = wb.worksheets[0]
###Do the Python Dance on Data###
###Style, Font, Alignment applied to sheets###
###Currently Saves output as a NEW excel File in the Directory the original
###file was drag and dropped from
wb.save(output_string)
###Need the output to be appended to an existing Excel file already present
###in a target directory
###Note Formatted Workbook (output) has multiple sheets###
openpyxl will let you edit existing workbooks including appending data to them. But the methods provided are limited to individual items of data such as the cells. There are no aggregate functions for copying things like worksheets from one workbook to another.
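Since there is no built-in copy between workbooks, one way to do Step 5 is to copy the values row by row. This is only a sketch: append_workbook, source_path and target_path are made-up names, it assumes both files already exist, and it copies cell values only (no styles, fonts or alignment):

```python
import openpyxl


def append_workbook(source_path, target_path):
    """Append every row of every sheet in source_path to target_path."""
    source = openpyxl.load_workbook(source_path)
    target = openpyxl.load_workbook(target_path)
    for src_sheet in source.worksheets:
        # Reuse the sheet with the same title, or create it if it is missing
        if src_sheet.title in target.sheetnames:
            dst_sheet = target[src_sheet.title]
        else:
            dst_sheet = target.create_sheet(src_sheet.title)
        for row in src_sheet.iter_rows(values_only=True):
            dst_sheet.append(row)  # lands below the last used row
    target.save(target_path)
```

You would call it as append_workbook(output_string, existing_file), where existing_file is whatever path your target workbook has. Styling would have to be re-applied to the appended cells separately.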
I'm able to open my pre-existing workbook, but I don't see any way to open pre-existing worksheets within that workbook. Is there any way to do this?
You cannot append to an existing xlsx file with xlsxwriter.
There is a module called openpyxl which allows you to read and write pre-existing Excel files. As far as I can tell, it does so by reading the Excel file, holding all the information in memory, and then rewriting everything to your xlsx file when you call workbook.save().
Similarly, you can use a method of your own to "append" to xlsx documents. I recently had to append to a xlsx file because I had a lot of different tests in which I had GPS data coming in to a main worksheet, and then I had to append a new sheet each time a test started as well. The only way I could get around this without openpyxl was to read the excel file with xlrd and then run through the rows and columns...
i.e.
cells = []
for row in range(sheet.nrows):
    cells.append([])
    for col in range(sheet.ncols):
        cells[row].append(sheet.cell(row, col).value)  # cell() is a method of the xlrd sheet
You don't need arrays, though. For example, this works perfectly fine:
import xlrd
import xlsxwriter
from os.path import expanduser
home = expanduser("~")
# this writes test data to an excel file
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
sheet1 = wb.add_worksheet()
for row in range(10):
    for col in range(20):
        sheet1.write(row, col, "test ({}, {})".format(row, col))
wb.close()
# open the file for reading
wbRD = xlrd.open_workbook("{}/Desktop/test.xlsx".format(home))
sheets = wbRD.sheets()
# open the same file for writing (just don't write yet)
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
# run through the sheets and store sheets in workbook
# this still doesn't write to the file yet
for sheet in sheets: # write data from old file
    newSheet = wb.add_worksheet(sheet.name)
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            newSheet.write(row, col, sheet.cell(row, col).value)
    for row in range(10, 20): # write NEW data
        for col in range(20):
            newSheet.write(row, col, "test ({}, {})".format(row, col))
wb.close() # THIS writes
However, I found that it was easier to read the data and store it in a 2-dimensional array, because I was manipulating the data, was receiving input over and over again, and did not want to write to the Excel file until the test was over (which you could just as easily do with xlsxwriter, since that is probably what it does anyway until you call .close()).
After searching a bit for a way to open an existing sheet in xlsxwriter, I discovered
existingWorksheet = wb.get_worksheet_by_name('Your Worksheet name goes here...')
existingWorksheet.write_row(0,0,'xyz')
You can now append/write any data to the open worksheet.
You can use the workbook.get_worksheet_by_name() feature:
https://xlsxwriter.readthedocs.io/workbook.html#get_worksheet_by_name
According to https://xlsxwriter.readthedocs.io/changes.html the feature has been added on May 13, 2016.
"Release 0.8.7 - May 13 2016
-Fix for issue when inserting read-only images on Windows. Issue #352.
-Added get_worksheet_by_name() method to allow the retrieval of a worksheet from a workbook via its name.
-Fixed issue where internal file creation and modification dates were in the local timezone instead of UTC."
Although this is mentioned in the last two answers with its documentation link, and the documentation does seem to describe new methods for working with worksheets, I wasn't able to find these methods in the latest package, "xlsxwriter==3.0.3".
"xlrd" has now removed support for anything other than xls files.
Hence I worked it out with "openpyxl", which gives you the expected functionality, as mentioned in the first answer above.
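For reference, a minimal sketch of what that looks like with openpyxl. The function name append_rows and its parameters are hypothetical, and it assumes the .xlsx file and the named worksheet already exist:

```python
import openpyxl


def append_rows(path, sheet_name, rows):
    """Open an existing .xlsx file and add rows to one of its worksheets."""
    wb = openpyxl.load_workbook(path)
    ws = wb[sheet_name]  # raises KeyError if the sheet does not exist
    for row in rows:
        ws.append(row)  # written below the last row that holds data
    wb.save(path)  # openpyxl writes the file on save()
```

For example, append_rows('test.xlsx', 'Sheet1', [['x', 'y', 'z']]) adds one row to Sheet1 and leaves the other sheets' data in place.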
I am trying to extract the header row (the first row) from multiple files, each of which has multiple sheets. The output of each sheet should be saved and appended to a new master file that contains all the headers from each sheet and each file.
The easiest way I have found is to use the command row_slice. However, the output from the file is a list of Cell objects and I cannot seem to access their indices.
I am looking for a way to save the data extracted into a new workbook.
Here is the code I have so far:
from xlrd import open_workbook,cellname
book = open_workbook(r'E:\Files_combine\MOU worksheets 2012\Walmart-GE_MOU 2012-209_worksheet_v03.xls')  # raw string keeps the backslashes literal
last_index = len(book.sheet_names())
for sheet_index in range(last_index):
    sheet = book.sheet_by_index(sheet_index)
    print(sheet.name)
    print(sheet.row_slice(0, 1))
I cannot get the output and store it as an input to a new file. Also, any ideas on how to automate this process for 100+ files will be appreciated.
You can store the output in a csv file and you can use the os.listdir and a for loop to loop over all the file names
import csv
import os
from xlrd import open_workbook, cellname
EXCEL_DIR = r'E:\Files_combine\MOU worksheets 2012'
with open("headers.csv", 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for file_name in os.listdir(EXCEL_DIR):
        if file_name.endswith("xls"):
            book = open_workbook(os.path.join(EXCEL_DIR, file_name))
            for index, name in enumerate(book.sheet_names()):
                sheet = book.sheet_by_index(index)
                if sheet.nrows == 0:
                    continue  # skip empty sheets, which have no header row
                # writerow takes a sequence; row_values returns the plain cell
                # values of row 0, whereas row_slice returns Cell objects
                writer.writerow(sheet.row_values(0))