I'm trying to build a report generator that reads Excel sheets and returns the rows that contain certain values. I built a version that does what I need, but it only works for csv files. It was my first code-mash-together, but it worked. I would now like to add conditional formatting as well (highlight certain cell values, e.g. if <65, format red), which requires rewriting it for xlsx sheets rather than csv.
Below is my attempt at getting this to work...
I can find the values and return the row, but on the second run through the loop it raises an error:
AttributeError: 'Worksheet' object has no attribute 'cell_value'
This is surprising because it worked just previously, and stepping through the code returns the values I want. I have tried changing it to .value, but that returns:
AttributeError: 'function' object has no attribute 'value'
Help, I have no idea what I'm doing now. If it doesn't make any sense, I'm happy to post my original csv code to 'explain'.
Thanks
import xlsxwriter
import xlrd
import os
import xlwt
# open original excelbook and access first sheet
for excelDocs in os.listdir('.'):
    if not excelDocs.endswith('.xlsx'):
        continue    # skip non-xlsx files
    workbook = xlrd.open_workbook(excelDocs)
    sheet = workbook.sheet_by_index(0)
    cellslist = []
    i = 0
    #########WORKS!#####################
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            if sheet.cell_value(row, col) == 'CP' or sheet.cell_value(row, col) == 'LNA' or sheet.cell_value(row, col) == 'Last Name':
                i = i + 1
                data = [sheet.cell_value(0, col) for col in range(sheet.ncols)]
                workbook = xlsxwriter.Workbook()
                sheet = workbook.add_worksheet('excelDocs')
                for index, value in enumerate(data):
                    sheet.write(i, index, value)
                workbook = xlrd.open_workbook(excelDocs)
I have no experience with xlsxwriter, xlrd or xlwt. As this is your "1st code-mash-together", I figured I would offer an alternative using openpyxl.
I do not have your data, so testing is a little difficult, but any remaining errors should be easy to fix. Please let me know if this does not run and I will help fix it.
I am assuming your output goes to a separate file (report.xlsx here), with one tab for each workbook checked (each tab named after the source workbook).
import os
from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter
interestingValues = ['CP','LNA', 'Last Name']
report = Workbook()
dest_filename = 'report.xlsx'
# open original excelbook and access first sheet
for excelDocs in os.listdir('.'):
    if not excelDocs.endswith('.xlsx') or excelDocs == dest_filename:
        continue    # skip non-xlsx files and the report itself
    workbook = load_workbook(excelDocs)
    sheet = workbook.active
    workingReportSheet = report.create_sheet(str(excelDocs.split('.')[0]))
    i = 0
    for row in range(1, sheet.max_row + 1):    # rows are 1-based; include the last row
        for col in range(sheet.max_column):
            columnLetter = get_column_letter(col + 1)
            if str(sheet['%s%s' % (columnLetter, row)].value) in interestingValues:
                i += 1
                # copy the whole matching source row into report row i
                data = [sheet['%s%s' % (get_column_letter(c), row)].value for c in range(1, sheet.max_column + 1)]
                for index, value in enumerate(data):
                    workingReportSheet['%s%s' % (get_column_letter(index + 1), i)].value = value
report.save(filename = dest_filename)
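For the conditional formatting part of the question (e.g. red if <65), openpyxl can attach a rule to a cell range. The following is only a minimal sketch, not something run against your data: the helper name highlight_below, the range 'B1:B3' and the sample scores are assumptions made up for illustration.

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill


def highlight_below(ws, cell_range, threshold):
    """Attach a rule that fills cells in cell_range red when their value < threshold."""
    red_fill = PatternFill(start_color='FFFF0000', end_color='FFFF0000', fill_type='solid')
    rule = CellIsRule(operator='lessThan', formula=[str(threshold)], fill=red_fill)
    ws.conditional_formatting.add(cell_range, rule)
    return rule


# Made-up sample data: a label and a score per row.
wb = Workbook()
ws = wb.active
for score in (50, 65, 80):
    ws.append(['sample', score])

rule = highlight_below(ws, 'B1:B3', 65)

# Saved to memory here to keep the sketch self-contained;
# pass a filename such as 'report.xlsx' to write a real file.
buffer = BytesIO()
wb.save(buffer)
```

The rule is evaluated by the spreadsheet program when the file is opened, so the stored cell values themselves are not changed.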
Reading your code again, it may be that you are discarding your output.
Try the below.
import os
import xlrd
import xlsxwriter
#Create output workbook (xlsxwriter needs a filename to write to)
outputworkbook = xlsxwriter.Workbook('report.xlsx')
# open original excelbook and access first sheet
# (note: xlrd 2.0 and later only read .xls files)
for excelDocs in os.listdir('.'):
    if not excelDocs.endswith('.xlsx') or excelDocs == 'report.xlsx':
        continue    # skip non-xlsx files and the report itself
    workbook = xlrd.open_workbook(excelDocs)
    sheet = workbook.sheet_by_index(0)
    i = 0
    # one output sheet per source file; sheet names must be unique and at most 31 characters
    outputsheet = outputworkbook.add_worksheet(os.path.splitext(excelDocs)[0][:31])
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            if sheet.cell_value(row, col) in ('CP', 'LNA', 'Last Name'):
                i = i + 1
                # copy the matching row; the input sheet is never overwritten now
                data = [sheet.cell_value(row, c) for c in range(sheet.ncols)]
                for index, value in enumerate(data):
                    outputsheet.write(i, index, value)
outputworkbook.close()    # nothing is written to disk until close()
I would like to get a specific row from Workbook 1 and append it after the existing data in Workbook 2.
The code that I tried so far can be found down below:
import openpyxl as xl
from openpyxl.utils import range_boundaries
min_cols, min_rows, max_cols, max_rows = range_boundaries('A:GH')
#Take source file
source = r"C:\Users\Desktop\Python project\Workbook1.xlsx"
wb1 = xl.load_workbook(source)
ws1 = wb1["P2"] #get the needed sheet
#Take destination file
destination = r"C:\Users\Desktop\Python project\Workbook2.xlsx"
wb2 = xl.load_workbook(destination)
ws2 = wb2["A3"] #get the needed sheet
row_data = 0
#Get row & col position and store it in row_data & col_data
for row in ws1.iter_rows():
    for cell in row:
        if cell.value == "Positive":
            row_data += cell.row
for row in ws1.iter_rows(min_row=row_data, min_col = 1, max_col=250, max_row = row_data):
    ws2.append((cell.value for cell in row[min_cols:max_cols]))
wb2.save(destination)
wb2.close()
But when I use the code above, the row is appended with a shift of one row.
I want the data that ends up on row 8 to be on row 7, right after the last data in Workbook 2.
(Image: Workbook 2)
Does anyone have any feedback?
Thanks!
I found the solution and will post it here in case anyone has the same problem. Although the cells below the data looked empty, they apparently carried leftover formatting. That is why the Python script treated them as non-empty and appended the data further down, in the first row that had no formatting.
The solution is to reset every row below your data to truly empty cells. (Just copy a range of empty cells from a new workbook and paste it below your data.)
Hope that helps! ;)
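If cleaning the workbook by hand is not practical, another option is to look for the last row that actually holds a value and write directly below it, instead of relying on append. This is only a sketch under assumptions: last_data_row and append_after_data are hypothetical helper names, and the in-memory workbook with a bold but empty cell merely imitates the formatted-but-empty rows described above.

```python
from openpyxl import Workbook
from openpyxl.styles import Font


def last_data_row(ws):
    """Return the number of the last row holding at least one value (0 if none)."""
    last = 0
    for row in ws.iter_rows():
        if any(cell.value is not None for cell in row):
            last = row[0].row
    return last


def append_after_data(ws, values):
    """Write values into the row directly below the last row that has data."""
    target = last_data_row(ws) + 1
    for col, value in enumerate(values, start=1):
        ws.cell(row=target, column=col, value=value)
    return target


# Stand-in for Workbook 2: two rows of data plus a formatted, empty cell lower down.
wb = Workbook()
ws = wb.active
ws.append(['header'])
ws.append(['data'])
ws['A5'].font = Font(bold=True)  # formatting only, no value

written_row = append_after_data(ws, ['Positive', 1])
```

The scan walks every row, so on very large sheets it will be slower than a plain append.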
I have data in an Excel file, but for it to be useful I need to copy and paste the columns into a different order.
I have figured out how to open and read my file and how to write a new Excel file. I can also get the data from the original and paste it into my new file, but not in a loop.
Here is an example of the data I'm working with. To visualize my issue: I need A1, B1, C1 next to each other, then A2, B2, C2, and so on.
Here is my code from a smaller test file I created to play around with:
import openpyxl as op
wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
mylist = []
mylist2 = []
mylist3 = []
for row in ws.iter_rows('H13:H23'):
    for cell in row:
        mylist.append(cell.value)
for row in ws.iter_rows('L13:L23'):
    for cell in row:
        mylist2.append(cell.value)
for row in ws.iter_rows('P13:P23'):
    for cell in row:
        mylist3.append(cell.value)
print (mylist, mylist2, mylist3)
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
for row in zip (mylist, mylist2, mylist3):
    new_ws.append(row)
new_wb.save(filename=dest_filename)
I want to create a loop to do the rest of the work, but I can't figure out how to design it so that I don't have to code for each column and set.
Well, you can reuse your code and write each value straight into the new sheet, something like:
import openpyxl as op
wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
# ws['H13:H23'] yields one tuple of cells per row of the range
for row in ws['H13:H23']:
    for cell in row:
        # keep the source row number, so values land in rows 13 to 23
        new_ws['A%s' % cell.row].value = cell.value
for row in ws['L13:L23']:
    for cell in row:
        new_ws['B%s' % cell.row].value = cell.value
for row in ws['P13:P23']:
    for cell in row:
        new_ws['C%s' % cell.row].value = cell.value
new_wb.save(filename=dest_filename)
Tell me if that works for you.
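To avoid writing one block per column, the same idea can be put in a loop over the source column letters. This is a rough sketch, assuming the columns and row range from the question (H, L, P and rows starting at 13); copy_columns is a made-up helper name and the in-memory workbook only stands in for coding_test.xlsx.

```python
from openpyxl import Workbook


def copy_columns(src_ws, dest_ws, source_columns, first_row, last_row):
    """Copy the given source columns side by side into dest_ws, starting at A1."""
    for dest_col, src_letter in enumerate(source_columns, start=1):
        for dest_row, src_row in enumerate(range(first_row, last_row + 1), start=1):
            value = src_ws['%s%d' % (src_letter, src_row)].value
            dest_ws.cell(row=dest_row, column=dest_col, value=value)


# Stand-in source data: each cell holds its own coordinate as text.
wb = Workbook()
ws = wb.active
for letter in ('H', 'L', 'P'):
    for row in range(13, 16):
        ws['%s%d' % (letter, row)] = '%s%d' % (letter, row)

new_wb = Workbook()
new_ws = new_wb.active
copy_columns(ws, new_ws, ['H', 'L', 'P'], 13, 15)
```

Adding another set of columns then only means changing the list of letters, not the code.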
I have created an xlsx file into which I write some user inputs. So far so good, the program works; it writes the first line. But when I run the program again, instead of appending a row it writes over the first one. I'm trying to understand how to make it append a new row to the Excel sheet, save it, close it, etc.
import xlsxwriter
workbook = xlsxwriter.Workbook("test.xlsx",)
worksheet = workbook.add_worksheet()
row = 0
col = 0
worksheet.write(row, col, 'odhgos')
worksheet.write(row, col + 1, 'e/p')
worksheet.write(row, col + 2, 'dromologio')
worksheet.write(row, col + 3, 'ora')
row += 1
worksheet.write_string(row, col, odigosou)
worksheet.write_string(row, col + 1, dromou)
worksheet.write_string(row, col + 2, dromologio)
worksheet.write_string(row, col + 3, ora)
workbook.close()
With this code I'm able to write to the file, but how do I make it append a row to the existing sheet? All the tutorials I watched and all the instructions I researched just don't work; I'm obviously doing something wrong, but I'm not able to spot it.
Question: ... how do I make it to append a row in the existing sheet
Solution using openpyxl, for instance:
from openpyxl import load_workbook
new_row_data = [
    ['odhgos', 'e/p', 'dromologio', 'ora'],
    ['odigosou', 'dromou', 'dromologio', 'ora']]
wb = load_workbook("test/test.xlsx")
# Select First Worksheet
ws = wb.worksheets[0]
# Append 2 new Rows - Columns A - D
for row_data in new_row_data:
    # Append Row Values
    ws.append(row_data)
wb.save("test/test.xlsx")
Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2
Another solution which avoids FileNotFound errors by creating the file if it doesn't exist:
from openpyxl import Workbook
from openpyxl import load_workbook
filename = "myfile.xlsx"
new_row = ['1', '2', '3']
# Confirm file exists.
# If not, create it, add headers, then append new data
try:
    wb = load_workbook(filename)
    ws = wb.worksheets[0]  # select first worksheet
except FileNotFoundError:
    headers_row = ['Header 1', 'Header 2', 'Header 3']
    wb = Workbook()
    ws = wb.active
    ws.append(headers_row)
ws.append(new_row)
wb.save(filename)
# Note: if you're adding values from a list, you could instead use:
# new_row = []
# new_row += [val for val in my_list]
# Similarly, for adding values from a dict:
# new_row = []
# new_row += [val for val in mydict['mykey'].values()]
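When the values come from a dict, it can help to build the row in header order so each value lands under its own header. A small sketch of that idea follows; row_from_dict is a hypothetical helper and the record contents are invented sample values.

```python
from openpyxl import Workbook


def row_from_dict(record, headers):
    """Return the values of record ordered like headers (None where a key is missing)."""
    return [record.get(header) for header in headers]


headers = ['Header 1', 'Header 2', 'Header 3']
record = {'Header 3': 'c', 'Header 1': 'a'}  # invented sample; 'Header 2' is missing

wb = Workbook()
ws = wb.active
ws.append(headers)
ws.append(row_from_dict(record, headers))
```

A missing key simply leaves that cell empty instead of shifting the later values to the left.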
I'm working on a script that modifies an existing excel document and I need to have the ability to insert a column between two other columns like the VBA macro command .EntireColumn.Insert.
Is there any method with openpyxl to insert a column like this?
If not, any advice on writing one?
Here is an example of a much much faster way:
import openpyxl
filename = "filename.xlsx"
wb = openpyxl.load_workbook(filename)
sheet = wb.worksheets[0]
# this statement inserts a column before column 2
sheet.insert_cols(2)
wb.save(filename)
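To see what insert_cols does to existing data, here is a tiny in-memory sketch (the cell contents are made up): the values that were in columns B and C move one column to the right, and the new column B starts out empty.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(['first', 'second', 'third'])

ws.insert_cols(2)        # insert one empty column before column 2 (B)
ws['B1'] = 'inserted'    # fill the new column

first_row = [ws.cell(row=1, column=col).value for col in range(1, 5)]
```

insert_cols also takes an amount argument, e.g. ws.insert_cols(2, amount=3), to insert several columns at once. As far as I know it only moves cells, so formulas that refer to the shifted columns are not rewritten for you.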
I haven't found anything like .EntireColumn.Insert in openpyxl.
The first thought that comes to mind is to insert the column manually by modifying _cells on a worksheet. I don't think it's the best way to insert a column, but it works:
from openpyxl import Workbook
from openpyxl.cell import Cell
from openpyxl.utils import get_column_letter
wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.worksheets[0]
ws.title = "range names"
# inserting sample data
for col_idx in range(1, 10):
    col = get_column_letter(col_idx)
    for row in range(1, 10):
        ws['%s%s' % (col, row)].value = '%s%s' % (col, row)
# inserting column between 4 and 5
column_index = 5
new_cells = {}
# _cells is a private attribute; in recent openpyxl releases it is a dict
# keyed by (row, column) tuples, which may differ in other versions
for (row, column), cell in ws._cells.items():
    # shifting columns
    if column >= column_index:
        column += 1
    # it's important to create new Cell object
    new_cells[(row, column)] = Cell(ws, row=row, column=column, value=cell.value)
ws._cells = new_cells
wb.save(filename=dest_filename)
I understand that this solution is very ugly, but I hope it helps you think in the right direction.
I am writing a Python script to read data from an Excel sheet using xlrd. A few of the cells in the worksheet are highlighted with different colors, and I want to identify the color code of each cell. Is there any way to do that? An example would be really appreciated.
Here is one way to handle this:
import xlrd
book = xlrd.open_workbook("sample.xls", formatting_info=True)
sheets = book.sheet_names()
print("sheets are:", sheets)
for index, sh in enumerate(sheets):
    sheet = book.sheet_by_index(index)
    print("Sheet:", sheet.name)
    rows, cols = sheet.nrows, sheet.ncols
    print("Number of rows: %s Number of cols: %s" % (rows, cols))
    for row in range(rows):
        for col in range(cols):
            print("row, col is:", row+1, col+1, end=" ")
            thecell = sheet.cell(row, col)
            # could get 'dump', 'value', 'xf_index'
            print(thecell.value, end=" ")
            xfx = sheet.cell_xf_index(row, col)
            xf = book.xf_list[xfx]
            bgx = xf.background.pattern_colour_index
            print(bgx)
More info on the Python-Excel Google Group.
The solution suggested by JMax works only for xls files, not for xlsx files. With an xlsx file it raises NotImplementedError: formatting_info=True not yet implemented. The xlrd library has still not been updated to support this for xlsx files, so you would have to Save As and change the format every time, which may not work for you.
Here is a solution for xlsx files using the openpyxl library. A2 is the cell whose color code we need to find out.
from openpyxl import load_workbook
excel_file = 'color_codes.xlsx'
wb = load_workbook(excel_file, data_only = True)
sh = wb['Sheet1']
color_in_hex = sh['A2'].fill.start_color.index # hexadecimal aRGB value of the color, e.g. 'FFFF0000'
print ('HEX =',color_in_hex)
# skip the first two characters (alpha channel) when converting to RGB
print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (2, 4, 6))) # Color in RGB
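One thing to watch with the conversion: openpyxl stores these colours as 8-digit aRGB text, so the leading alpha byte has to be skipped to get plain RGB. Below is a small sketch of a conversion helper in plain Python. The name argb_to_rgb is my own, and the check for non-string input is there because, as far as I can tell, theme or indexed colours may come back as a number rather than hex text.

```python
def argb_to_rgb(hex_color):
    """Convert 'AARRGGBB' (or 'RRGGBB') hex text to an (r, g, b) tuple of ints."""
    if not isinstance(hex_color, str) or len(hex_color) not in (6, 8):
        raise ValueError('expected a 6- or 8-digit hex string, got %r' % (hex_color,))
    rgb_digits = hex_color[-6:]  # drop the leading alpha byte if present
    return tuple(int(rgb_digits[i:i + 2], 16) for i in (0, 2, 4))


red = argb_to_rgb('FFFF0000')   # opaque red in aRGB form
green = argb_to_rgb('00FF00')   # plain RGB form
```

It could be called as argb_to_rgb(sh['A2'].fill.start_color.index) once the value is known to be a hex string.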
This function returns the cell background's RGB value as a tuple.
def getBGColor(book, sheet, row, col):
    xfx = sheet.cell_xf_index(row, col)
    xf = book.xf_list[xfx]
    bgx = xf.background.pattern_colour_index
    pattern_colour = book.colour_map[bgx]
    #Actually, despite the name, the background colour is not the background colour.
    #background_colour_index = xf.background.background_colour_index
    #background_colour = book.colour_map[background_colour_index]
    return pattern_colour
# Say you have an Excel file called workbook
# and inside it there is a worksheet called worksheet.
# Inside that worksheet is a cell, say C1, that is highlighted
# and you want to get the color value of the highlighted cell.
# Trying the openpyxl package
import openpyxl
# Importing all modules from the openpyxl package
from openpyxl import *
# reading the excel workbook through the load_workbook function
workbook = load_workbook("C:PATHTO/workbook.xlsx")
# Accessing existing worksheets
worksheet = workbook["worksheet"]
## METHOD 1 ##
# Getting the highlight property of the cell
highlight = str(worksheet['C1'].fill)
# Printing out the value of the color
index = highlight.find("rgb='")
print(highlight[index+5:index+13])
## METHOD 2 ##
print(worksheet['C1'].fill.start_color.index)