Got weird results when using the openpyxl module in Python

I have an excel spreadsheet whose data are as follows (from A1 to C3):
original spreadsheet
I want to calculate the fraction of non-zero values in each column and write the result to the cell just below the data in that column. For the third column, the result should be 2/3 = 0.67.
Below is the Python script that I wrote to do this, but it obviously gets the wrong result.
The code:
import openpyxl
wb = openpyxl.load_workbook('testXls.xlsx')
sheet = wb.active
for colNum in range(1, sheet.max_column + 1):
    coverCount = 0
    for rowNum in range(1, sheet.max_row + 1):
        if sheet.cell(row=rowNum, column=colNum).value != 0:
            coverCount += 1
    sheet.cell(row=4, column=colNum).value = round(coverCount / 3, 2)
wb.save('testXls2.xlsx')
The result:
result spreadsheet
I can't find anything wrong in the code. Could someone enlighten me on this please? I really appreciate it.

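The problem is that you write the result to row 4 on each pass of the column loop. Once the first column's result has been written, sheet.max_row is 4 rather than 3, so for every later column the inner loop also visits row 4. That cell is empty (its value is None), and None != 0 is true, so it is counted as a non-zero value.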
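A minimal sketch of one possible fix, assuming the data start in row 1 and the results belong in the first row below them: read the sheet size once, before any result is written, so the new row never enters the count. The function name `write_nonzero_fractions` is made up for illustration; `sheet` is expected to be an openpyxl worksheet such as `wb.active`.

```python
def write_nonzero_fractions(sheet):
    """Write the fraction of non-zero values of each column below the data.

    `sheet` is assumed to be an openpyxl worksheet whose data start in row 1.
    """
    # Capture the size once: writing the results adds a row, and
    # sheet.max_row would then report that new row as well.
    n_rows = sheet.max_row
    n_cols = sheet.max_column
    for col in range(1, n_cols + 1):
        nonzero = 0
        for row in range(1, n_rows + 1):
            if sheet.cell(row=row, column=col).value != 0:
                nonzero += 1
        # Results go into the first row after the data (row 4 in the question).
        sheet.cell(row=n_rows + 1, column=col).value = round(nonzero / n_rows, 2)
```

Called as `write_nonzero_fractions(sheet)` before `wb.save(...)`, a three-row column with two non-zero values gets 0.67 in row 4.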

Related

Python openpyxl to automate entire column in excel

import openpyxl
i=2
workbook= openpyxl.load_workbook()
sheet = workbook.active
for i, cellObj in enumerate (sheet['I'],2):
    cellObj.value = '=IF(ISNUMBER(A2)*(A2<>0),A2,IF(ISNUMBER(F2)*(F2<>0),F2,IF(ISBLANK(A2)*ISBLANK(F2)*ISBLANK(H2),0,H2)))'
workbook.save()
Using openpyxl, I tried to apply a formula to the entire column I, but it does not work as intended: I wanted the formula to start at I2, but it starts at I1, and the output is wrong as well.
I have attached a screenshot.
Can someone please correct the code?
Output of print(list(enumerate(sheet['I']))):
You'd probably be better off doing it this way: skip row 1 automatically by starting the iteration at row 2, and build the formula from each cell's row number.
import openpyxl
excelfile = 'foo.xlsx'
workbook= openpyxl.load_workbook(excelfile)
sheet = workbook.active
mr = sheet.max_row # Last row to add formula to
for row in sheet.iter_rows(min_col=9, max_col=9, min_row=2, max_row=mr):
    for cell in row:
        cr = cell.row # Get the current row number to use in formula
        cell.value = f'=IF(ISNUMBER(A{cr})*(A{cr} <> 0), A{cr}, IF(ISNUMBER(F{cr})*(F{cr} <> 0), F{cr}, IF(ISBLANK(A{cr})*ISBLANK(F{cr})*ISBLANK(H{cr}), 0, H{cr})))'
workbook.save(excelfile)
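As for why the original loop began at I1: the second argument of `enumerate` only sets the starting value of the counter; it does not skip any items. A small illustration, with plain strings standing in for the cells of column I (the labels are placeholders, not real cell objects):

```python
# Placeholder labels standing in for the cells returned by sheet['I'].
cells = ["I1", "I2", "I3"]

# start=2 shifts the counter, but iteration still begins with the first item.
pairs = list(enumerate(cells, 2))
print(pairs)  # [(2, 'I1'), (3, 'I2'), (4, 'I3')]

# Skipping the header means slicing the sequence itself.
pairs_without_header = list(enumerate(cells[1:], 2))
print(pairs_without_header)  # [(2, 'I2'), (3, 'I3')]
```

The original formula string also hard-coded row 2, so every cell received the same formula regardless of the counter.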
If you know the from and to row numbers, then you can use it like this:
from openpyxl import load_workbook
wb = load_workbook(filename="/content/sample_data/Book1.xlsx")
ws = wb.active
from_row = 2
to_row = 4
for i in range(from_row, to_row+1):
    ws[f"C{i}"] = f'=_xlfn.CONCAT(A{i}, "_", B{i})'
wb.save("/content/sample_data/formula.xlsx")
Input (Book1.xlsx):
Output (formula.xlsx):
I don't have your data, so I did not test the following formula, but your formula can be written as a format string:
for i in range(from_row, to_row+1):
    ws[f"I{i}"] = f'=IF(ISNUMBER(A{i})*(A{i}<>0),A{i},IF(ISNUMBER(F{i})*(F{i}<>0),F{i},IF(ISBLANK(A{i})*ISBLANK(F{i})*ISBLANK(H{i}),0,H{i})))'
It formats the formula as:
=IF(ISNUMBER(A2)*(A2<>0),A2,IF(ISNUMBER(F2)*(F2<>0),F2,IF(ISBLANK(A2)*ISBLANK(F2)*ISBLANK(H2),0,H2)))
=IF(ISNUMBER(A3)*(A3<>0),A3,IF(ISNUMBER(F3)*(F3<>0),F3,IF(ISBLANK(A3)*ISBLANK(F3)*ISBLANK(H3),0,H3)))
=IF(ISNUMBER(A4)*(A4<>0),A4,IF(ISNUMBER(F4)*(F4<>0),F4,IF(ISBLANK(A4)*ISBLANK(F4)*ISBLANK(H4),0,H4)))
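The same idea can be wrapped in a small helper, so that the formula is written once and the row number is filled in on demand. This is only a sketch: `formula_for_row` and the `{r}` placeholder are names chosen here for illustration, not part of openpyxl.

```python
# The formula from the question, with {r} marking where the row number goes.
TEMPLATE = (
    "=IF(ISNUMBER(A{r})*(A{r}<>0),A{r},"
    "IF(ISNUMBER(F{r})*(F{r}<>0),F{r},"
    "IF(ISBLANK(A{r})*ISBLANK(F{r})*ISBLANK(H{r}),0,H{r})))"
)

def formula_for_row(row, template=TEMPLATE):
    """Return the formula text for one worksheet row."""
    return template.format(r=row)

# Print the formulas for rows 2 to 4.
for row in range(2, 5):
    print(formula_for_row(row))
```

Inside a loop over row numbers, this could be used as `ws[f"I{i}"] = formula_for_row(i)`.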

How can I copy a range of data in excel (B2:B15) down with openpyxl?

I'm a Python beginner and I made a script to extract data into an xlsx file with openpyxl, but I'm stuck on a problem that seems pretty easy. I'd like to copy (not move) the yellow data to the green cells in the following Excel file:
In other words, I want to copy B2:B15 to B16:B29 within my Python script. I don't need help with importing openpyxl or creating my ws; it's just the specific code that copies B2:B15 to B16:B29 that I can't work out.
I appreciate any help! Thank you so much.
I tried the following, which didn't work at all:
for row in range(16,29):
    for col in range(1,2):
        char = get_column_letter(col)
        ws[char + str(row)] = ws(['B2:B15'].value)
If ws is your worksheet, then the code to do that is:
for row in range(16,30):
    ws.cell(row=row, column=2).value = ws.cell(row=row-14, column=2).value
Updated below for doing this multiple times:
Repeat = 5 # How many times you want to paste the 14 rows (B2:B15)
for cycle in range(1, Repeat+1):
    for row in range(14):
        ws.cell(row=row+2+(cycle)*14, column=2).value = ws.cell(row=row+2, column=2).value
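The row arithmetic can also be hidden behind a small helper. The sketch below assumes `ws` is an openpyxl worksheet and that the source and destination blocks do not overlap; `copy_column_block` is a hypothetical name, and only values are copied, not styles.

```python
def copy_column_block(ws, column, first_row, last_row, dest_first_row):
    """Copy the values of rows first_row..last_row in one column to the
    block that starts at dest_first_row in the same column.

    Assumes the two blocks do not overlap.
    """
    offset = dest_first_row - first_row
    for row in range(first_row, last_row + 1):
        source = ws.cell(row=row, column=column)
        ws.cell(row=row + offset, column=column).value = source.value
```

With this, `copy_column_block(ws, column=2, first_row=2, last_row=15, dest_first_row=16)` copies B2:B15 to B16:B29.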

How can I loop through and increment the rows in an excel workbook formula in Python?

This is a continuation of this question: How can I iterate through excel files sheets and insert formula in Python?
I decided to put it in a new thread as it's a separate issue. I'm interested in copying a formula down a column, across the rows, in a number of workbooks. My code is below, and the problem is in the for loop.
import os
import openpyxl
in_folder = r'C:\xxx' #Input folder
out_folder = r'C:\yyy' #Output folder
if not os.path.exists(out_folder):
    os.makedirs(out_folder)
dir_list = os.listdir(in_folder)
print(dir_list)
for xlfile in dir_list:
    if xlfile.endswith('.xlsx') or xlfile.endswith('.xls'):
        str_file = xlfile
        work_book = openpyxl.load_workbook(os.path.join(in_folder,str_file))
        work_sheet = work_book['Sheet1']
        for i, cellObj in enumerate(work_sheet['U'], 1): #The cell where the formula is to be inserted and iterated down to the last row
            cellObj.value = '=Q2-T2' #Cells value. This is where I'm going wrong but I'm not sure of the best way to have '=Q3-T3' etc till the last row. For each iteration, Q2 and T2 will be incremented to Q3 and T3 till the last row in the dataset.
        work_book.save(os.path.join(out_folder, xlfile)) #Write the excel sheet with formulae to another folder
How can I increment the rows in the formula as I loop through the active worksheet to the end? More details are in the comments next to the code.
Maybe you could just try formatting the string?
...
row_count = 2
for i, cellObj in enumerate(work_sheet['U'], 1):
    cellObj.value = f'=Q{row_count}-T{row_count}'
    row_count += 1 # Advance to the next row before the next cell
work_book.save(os.path.join(out_folder, xlfile)) # Save once, after the loop
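An alternative sketch that avoids a separate counter by tying the row in the formula to the row being written. It assumes that row 1 holds a header and that `ws` is an openpyxl worksheet; `fill_difference_formulas` is a name invented for this example.

```python
def fill_difference_formulas(ws, first_row=2):
    """Write =Q<row>-T<row> into column U for every row from first_row down.

    Assumes row 1 is a header, so formulas start in U2 by default.
    """
    last_row = ws.max_row  # read once, before writing anything
    for row in range(first_row, last_row + 1):
        # The same row number is used for the target cell and the formula.
        ws[f"U{row}"] = f"=Q{row}-T{row}"
```

In the loop over files, this would be called as `fill_difference_formulas(work_sheet)` just before `work_book.save(...)`.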

How to paste values only in Excel using Python and openpyxl

I have an Excel worksheet.
In column J I have some source data, which I used to make calculations in column K.
Column K has the values I need, but when I click on a cell the formula shows up.
I only want the values from column K, not the formula.
I read somewhere that I need to set data_only=True, which I have done.
I then pasted the data from column K to column L (with the intention of later deleting columns J and K).
I thought that column L would have only the values from K, but if I click on a cell, the formula still shows up.
How do I simply paste values only from one column to another?
import openpyxl
wb = openpyxl.load_workbook('edited4.xlsx', data_only=True)
sheet = wb['Sheet1']
last_row = 100
for i in range(2, last_row):
    cell = "K" + str(i)
    a_cell = "J" + str(i)
    sheet[cell] = '=IF(' + a_cell + '="R","Yes","No")'
rangeselected = []
for i in range (1, 100,1):
    rangeselected.append(sheet.cell(row = i, column = 11).value)
for i in range (1, 1000,1):
    sheet.cell(row=i, column=12).value = rangeselected[i-1]
wb.save('edited4.xlsx')
It's been a while since I've used openpyxl. But:
Openpyxl doesn't run Excel formulas. It reads either the formula string or the result of the last calculation run by Excel*. This means that if a formula is created outside of Excel, and the file has never been opened by Excel, then only the formula will be available. Unless you need to display (for historical purposes, etc.) what the formula is, you should do the calculation in Python, which will be faster and more efficient anyway.
* When I say Excel, I also include any Excel-like spreadsheet that will cache the results of the last run.
Try this (adjust column numbers as desired):
import openpyxl
wb = openpyxl.load_workbook('edited4.xlsx', data_only=True)
sheet = wb['Sheet1']
last_row = 100
test_column = 10 # Column J: the source data to test
result_column = 11 # Column K: where "Yes" or "No" is written
for i in range(2, last_row):
    if sheet.cell(row=i, column=test_column).value == "R":
        sheet.cell(row=i, column=result_column).value = "Yes"
    else:
        sheet.cell(row=i, column=result_column).value = "No"
wb.save('edited4.xlsx')
If you have a well-formed data sheet, you could probably shorten this by another step or two by using enumerate() and Worksheet.iter_rows() but I'll leave that to your imagination.
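One way the shorter version might look, as a sketch only. It assumes `ws` is an openpyxl worksheet with a header in row 1, source data in column J (10) and results wanted in column K (11); `mark_r_rows` is a hypothetical name. Because the value is computed in Python, no formula is ever written to the sheet.

```python
def mark_r_rows(ws, source_col=10, result_col=11, first_row=2):
    """Write "Yes" next to every source cell equal to "R", otherwise "No"."""
    rows = ws.iter_rows(
        min_row=first_row,
        max_row=ws.max_row,
        min_col=source_col,
        max_col=source_col,
    )
    # Each row is a one-element tuple because only one column is requested.
    for (source_cell,) in rows:
        answer = "Yes" if source_cell.value == "R" else "No"
        ws.cell(row=source_cell.row, column=result_col).value = answer
```

Passing `ws.max_row` removes the need for a hard-coded `last_row`.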

Write formula to Excel with Python error

I am trying to follow this question to add a formula to my Excel file using Python and the openpyxl package.
That link is what I need for my task.
But in this code:
for i, cellObj in enumerate(Sheet.columns[2], 1):
    cellObj.value = '=IF($A${0}=$B${0}, "Match", "Mismatch")'.format(i)
I get an error at Sheet.columns[2]. Any idea why? I followed the complete code.
I have Python 2.7.13, if that helps with this error.
****UPDATE****
COMPLETE CODE :
import openpyxl
wb = openpyxl.load_workbook('test1.xlsx')
print wb.get_sheet_names()
Sheet = wb.worksheets[0]
for i, cellObj in enumerate(Sheet.columns[2], 1):
    cellObj.value = '=IF($A${0}=$B${0}, "Match", "Mismatch")'.format(i)
error message :
for i, cellObj in enumerate(Sheet.columns[2], 1):
TypeError: 'generator' object has no attribute '__getitem__'
ws.columns and ws.rows are properties that return generators, so they cannot be indexed with [2]. But openpyxl also supports slicing and indexing for rows and columns.
So ws['C'] will give you the cells in the third column as a tuple.
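The error can be reproduced without openpyxl. In this sketch, `columns` is a stand-in generator function written for illustration, not the real `ws.columns` property:

```python
def columns():
    """Stand-in that yields columns lazily, the way ws.columns does."""
    yield ("A1", "A2")
    yield ("B1", "B2")
    yield ("C1", "C2")

try:
    columns()[2]  # generators cannot be indexed
except TypeError as exc:
    print("TypeError:", exc)

# Materialise the generator first, then index the resulting list.
third = list(columns())[2]
print(third)  # ('C1', 'C2')
```

The same pattern, `list(ws.columns)[2]`, works on a worksheet, although `ws['C']` is the more direct way to get one column.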
For other Stack adventurers looking to copy/paste a formula:
# Writing from pandas back to an existing EXCEL workbook
wb = load_workbook(filename=myfilename, read_only=False, keep_vba=True)
ws = wb['Mysheetname']
# Paste a formula Vlookup! Look at column A, put result in column AC.
for i, cellObj in enumerate(ws['AC'], 1):
    cellObj.value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(i)
One issue: I have a header and the formula overwrites it. Does anyone know how to start from row 2?
If you want to start from another row you can either use an if statement to skip the first row, or specify the range in the enumeration. A coded example is below:
wb = load_workbook(filename=myfilename, read_only=False, keep_vba=True)
ws = wb['Mysheetname']
# using an if statement
for i, cellObj in enumerate(ws['AC'], 1):
    if i > 1:
        cellObj.value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(i)
# specifying range, up to max row on worksheet - or you can specify an exact range
for i, cellObj in enumerate(ws['AC2:AC'+str(ws.max_row)],2):
    cellObj[0].value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(i)
The second method requires you to begin the index at 2, and it yields a one-element tuple for each row rather than a cell object, so you need cellObj[0].value to reach the cell's value.
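A third option, sketched here under the assumption that `ws` is an openpyxl worksheet with a header in row 1: since `ws['AC']` returns the column's cells as a tuple, the header can be dropped with a slice, and each cell's own `row` attribute replaces the counter. The function name `fill_lookup_below_header` is hypothetical.

```python
def fill_lookup_below_header(ws, column_letter="AC"):
    """Write the VLOOKUP formula into every cell of the column except row 1."""
    # [1:] drops the first cell, i.e. the header in row 1.
    for cell in ws[column_letter][1:]:
        cell.value = "=VLOOKUP($A${0}, 'LibrarySheet'!C:D,2,FALSE)".format(cell.row)
```

Using `cell.row` keeps the formula's row in step with the cell being written, whichever row the slice starts at.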
Fortunately, writing formulas to individual cells is easy now. There are also simpler attributes to use, such as:
wb.sheetnames instead of wb.get_sheet_names()
sheet = wb['SHEET_NAME'] instead of sheet = wb.get_sheet_by_name('SHEET_NAME')
And formulas can be easily inserted with:
sheet['A1'] = '=SUM(1+1)'
