How to paste values only in Excel using Python and openpyxl

I have an Excel worksheet.
Column J holds source data that I used for calculations in column K.
Column K has the values I need, but when I click on a cell the formula shows up.
I only want the values from column K, not the formulas.
I read somewhere that I need to set data_only=True, which I have done.
I then pasted the data from column K into column L (with the intention of later deleting columns J and K).
I expected column L to contain only the values from K, but if I click on a cell, the formula still shows up.
How do I simply paste values only from one column to another?
import openpyxl
wb = openpyxl.load_workbook('edited4.xlsx', data_only=True)
sheet = wb['Sheet1']
last_row = 100
# Write the formula into column K
for i in range(2, last_row):
    cell = "K" + str(i)
    a_cell = "J" + str(i)
    sheet[cell] = '=IF(' + a_cell + '="R","Yes","No")'
# Read column K (column 11) back
rangeselected = []
for i in range(1, 100, 1):
    rangeselected.append(sheet.cell(row=i, column=11).value)
# Copy into column L (column 12); same range as above to avoid an IndexError
for i in range(1, 100, 1):
    sheet.cell(row=i, column=12).value = rangeselected[i-1]
wb.save('edited4.xlsx')

It's been a while since I've used openpyxl. But:
openpyxl doesn't evaluate Excel formulas. It reads either the formula string or the result of the last calculation run by Excel*. This means that if a formula is written outside of Excel and the file has never been opened by Excel, only the formula will be available. Unless you need to keep the formula itself in the sheet (for historical purposes, etc.), you should do the calculation in Python, which will be faster and more efficient anyway.
* When I say Excel, I also include any Excel-like spreadsheet that will cache the results of the last run.
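If the file was last saved by Excel, so that cached results exist, one way to get plain values into column L is to load the workbook twice: once normally and once with data_only=True, then copy the cached results across. This is only a sketch under that assumption. The helper names copy_cached_values and paste_values_only are made up for illustration; the file, sheet and column numbers are the ones from the question.

```python
def copy_cached_values(values_ws, target_ws, src_col, dst_col, first_row, last_row):
    """Copy cached results from values_ws into target_ws as plain values.

    Both worksheets only need a .cell(row=..., column=...) method whose
    result has a .value attribute, which openpyxl worksheets provide.
    """
    for row in range(first_row, last_row + 1):
        cached = values_ws.cell(row=row, column=src_col).value
        target_ws.cell(row=row, column=dst_col).value = cached


def paste_values_only(path, sheet_name="Sheet1", src_col=11, dst_col=12,
                      first_row=2, last_row=100):
    # Imported here so copy_cached_values can be used without openpyxl.
    from openpyxl import load_workbook

    wb = load_workbook(path)                         # formulas stay formulas
    wb_values = load_workbook(path, data_only=True)  # cached results only
    copy_cached_values(wb_values[sheet_name], wb[sheet_name],
                       src_col, dst_col, first_row, last_row)
    wb.save(path)
```

If the formulas in column K were written by openpyxl and the file has not been opened and saved in Excel since, the cached results are missing and every copied value will be None.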
Try this (adjust column numbers as desired):
import openpyxl
wb = openpyxl.load_workbook('edited4.xlsx', data_only=True)
sheet = wb['Sheet1']
last_row = 100
test_column = 10    # column J: the source data being tested
result_column = 12  # column L: receives the plain "Yes"/"No" values
for i in range(2, last_row):
    if sheet.cell(row=i, column=test_column).value == "R":
        sheet.cell(row=i, column=result_column).value = "Yes"
    else:
        sheet.cell(row=i, column=result_column).value = "No"
wb.save('edited4.xlsx')
If you have a well-formed data sheet, you could probably shorten this by another step or two by using enumerate() and Worksheet.iter_rows() but I'll leave that to your imagination.
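For what it's worth, the iter_rows() variant might look roughly like this. It is a sketch, not a tested drop-in: it assumes the question's layout (source data in column J, result wanted in column L), and the function names fill_yes_no and fill_workbook are invented for the example.

```python
def fill_yes_no(rows, match="R"):
    """Write "Yes"/"No" into the last cell of each row, based on the first."""
    for row in rows:
        source, result = row[0], row[-1]
        result.value = "Yes" if source.value == match else "No"


def fill_workbook(path, sheet_name="Sheet1", last_row=100):
    # Imported here so fill_yes_no can be used without openpyxl.
    from openpyxl import load_workbook

    wb = load_workbook(path)
    ws = wb[sheet_name]
    # Columns 10..12 are J..L: J holds the source data, L gets the result.
    fill_yes_no(ws.iter_rows(min_row=2, max_row=last_row, min_col=10, max_col=12))
    wb.save(path)
```

Because the result is a plain string rather than a formula, clicking a cell in column L shows "Yes" or "No" and nothing else.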

Related

Python openpyxl to automate entire column in excel

import openpyxl
i = 2
workbook = openpyxl.load_workbook()
sheet = workbook.active
for i, cellObj in enumerate(sheet['I'], 2):
    cellObj.value = '=IF(ISNUMBER(A2)*(A2<>0),A2,IF(ISNUMBER(F2)*(F2<>0),F2,IF(ISBLANK(A2)*ISBLANK(F2)*ISBLANK(H2),0,H2)))'
workbook.save()
Using openpyxl, I tried to apply a formula to the entire column I, but it is not working as intended: I wanted the formula to start from I2, but it starts from I1 and gives the wrong output as well.
I have attached a screenshot.
Can someone please correct the code?
Output of print(list(enumerate(sheet['I']))):

You'd probably be better off doing it this way: skip row 1 automatically by starting the iteration at row 2, and build the formula from each cell's row number.
import openpyxl
excelfile = 'foo.xlsx'
workbook = openpyxl.load_workbook(excelfile)
sheet = workbook.active
mr = sheet.max_row  # Last row to add formula to
for row in sheet.iter_rows(min_col=9, max_col=9, min_row=2, max_row=mr):
    for cell in row:
        cr = cell.row  # Get the current row number to use in formula
        cell.value = f'=IF(ISNUMBER(A{cr})*(A{cr} <> 0), A{cr}, IF(ISNUMBER(F{cr})*(F{cr} <> 0), F{cr}, IF(ISBLANK(A{cr})*ISBLANK(F{cr})*ISBLANK(H{cr}), 0, H{cr})))'
workbook.save(excelfile)

If you know the from and to row numbers, then you can use it like this:
from openpyxl import load_workbook
wb = load_workbook(filename="/content/sample_data/Book1.xlsx")
ws = wb.active
from_row = 2
to_row = 4
for i in range(from_row, to_row+1):
    ws[f"C{i}"] = f'=_xlfn.CONCAT(A{i}, "_", B{i})'
wb.save("/content/sample_data/formula.xlsx")
I don't have your data, so I did not test the following, but your formula can be written as an f-string like this:
for i in range(from_row, to_row+1):
    ws[f"I{i}"] = f'=IF(ISNUMBER(A{i})*(A{i}<>0),A{i},IF(ISNUMBER(F{i})*(F{i}<>0),F{i},IF(ISBLANK(A{i})*ISBLANK(F{i})*ISBLANK(H{i}),0,H{i})))'
It formats the formula as:
=IF(ISNUMBER(A2)*(A2<>0),A2,IF(ISNUMBER(F2)*(F2<>0),F2,IF(ISBLANK(A2)*ISBLANK(F2)*ISBLANK(H2),0,H2)))
=IF(ISNUMBER(A3)*(A3<>0),A3,IF(ISNUMBER(F3)*(F3<>0),F3,IF(ISBLANK(A3)*ISBLANK(F3)*ISBLANK(H3),0,H3)))
=IF(ISNUMBER(A4)*(A4<>0),A4,IF(ISNUMBER(F4)*(F4<>0),F4,IF(ISBLANK(A4)*ISBLANK(F4)*ISBLANK(H4),0,H4)))

Openpyxl - Copy range of cells(with formula) from a workbook to another

I'm trying to copy specific rows from Workbook 1 and append them to the existing data in Workbook 2.
Copy the highlighted rows from Workbook 1 and append them in Workbook 2 below 'March'.
So far I have managed to copy and paste the range, but there are two problems:
1. The cells are shifted.
2. The percentage (formula) is missing, leaving only numeric values.
import openpyxl as xl
source = r"C:\Users\Desktop\Test_project_20200401.xlsx"
wbs = xl.load_workbook(source)
wbs_sheet = wbs["P2"]  # selecting the sheet
destination = r"C:\Users\Desktop\Try999.xlsx"
wbd = xl.load_workbook(destination)
wbd_sheet = wbd["A3"]  # select the sheet
row_data = 0
for row in wbs_sheet.iter_rows():
    for cell in row:
        if cell.value == "Yes":
            row_data += cell.row
for row in wbs_sheet.iter_rows(min_row=row_data, min_col=1, max_col=250, max_row=row_data+1):
    wbd_sheet.append((cell.value for cell in row))
wbd.save(destination)
Does anyone have any idea how I can solve this?
Any feedback/solution would help!
Thanks!

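I first thought min_col should be 0, but openpyxl rows and columns are 1-based, so min_col=1 already starts at column A.
In VBA, Range("A1").Formula gets the formula and Range("A1").Value gets the value.
openpyxl has no separate .formula attribute: cell.value returns the formula string as long as the workbook is loaded without data_only=True.
(thanks to: Get back a formula from a cell - VBA)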

Just want to add my own solution here.
What I did was iterate through the columns and set cell.number_format = '0%', which displays the cell value as a percentage.
for col in ws.iter_cols(min_row=1, min_col=2, max_row=250, max_col=250):
    for cell in col:
        cell.number_format = '0%'
More info can be found here:
https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/styles/numbers.html
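Instead of re-applying '0%' afterwards, the number format could also be carried over while copying. The sketch below does that cell by cell; copy_row_with_format is a made-up helper name, and it assumes openpyxl 2.6 or later, where cell.column is an integer.

```python
def copy_row_with_format(src_row, dst_ws, dst_row_idx):
    """Copy each cell's value and number format into row dst_row_idx of dst_ws."""
    for src_cell in src_row:
        dst_cell = dst_ws.cell(row=dst_row_idx, column=src_cell.column)
        dst_cell.value = src_cell.value
        dst_cell.number_format = src_cell.number_format


# Possible use with the sheets from the question (row_data is the first row to copy):
# next_row = wbd_sheet.max_row + 1
# for offset, row in enumerate(wbs_sheet.iter_rows(min_row=row_data, max_row=row_data + 1)):
#     copy_row_with_format(row, wbd_sheet, next_row + offset)
```

Formula strings are copied verbatim, so any relative references in them still point at the source row numbers.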

got weird results when using the openpyxl module in Python

I have an Excel spreadsheet with data in cells A1 to C3.
I want to calculate the proportion of non-zero values in each column, then write the result to the cell below the data in that column. In the case of the third column, the result should be 2/3 = 0.67.
Below is the Python script that I wrote to do this, but it obviously gets the wrong result.
The code:
import openpyxl
wb = openpyxl.load_workbook('testXls.xlsx')
sheet = wb.active
for colNum in range(1, sheet.max_column + 1):
    coverCount = 0
    for rowNum in range(1, sheet.max_row + 1):
        if sheet.cell(row=rowNum, column=colNum).value != 0:
            coverCount += 1
    sheet.cell(row=4, column=colNum).value = round(coverCount / 3, 2)
wb.save('testXls2.xlsx')
I can't find anything wrong in the code. Could someone enlighten me on this please? I really appreciate it.

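The problem is that the result is written to row 4 while the loop is still running. After the first column is processed, sheet.max_row becomes 4, so for every later column the inner loop also visits row 4. That cell is empty (None), and None != 0 is True, so it is counted as a non-zero value.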
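A possible fix is to take a snapshot of the sheet size before writing anything, so the result row never takes part in the count. This is a minimal sketch; nonzero_fraction and write_fractions are names chosen for the example, not part of openpyxl.

```python
def nonzero_fraction(values):
    """Return the share of values that are non-zero, rounded to 2 places."""
    values = list(values)
    if not values:
        return 0.0
    nonzero = sum(1 for v in values if v != 0)
    return round(nonzero / len(values), 2)


def write_fractions(ws):
    # Snapshot the data size first; max_row grows once the result row is written.
    n_rows, n_cols = ws.max_row, ws.max_column
    for col in range(1, n_cols + 1):
        column_values = [ws.cell(row=r, column=col).value
                         for r in range(1, n_rows + 1)]
        ws.cell(row=n_rows + 1, column=col).value = nonzero_fraction(column_values)
```

As in the original comparison, an empty cell (None) still counts as non-zero here, so blank cells inside the data range would need their own check.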

Replace missing values in excel worksheet using openpyxl module

I'm trying to replace cells in my Excel worksheet that contain a hyphen "-" with the average of the cells directly above and below. I've been trying to do this by looping through each row in column 3:
import math
from openpyxl import load_workbook
import openpyxl
d_filename="Snow.xlsx"
wb = load_workbook(d_filename)
sheet_ranges=wb["PIT 1"]
def interpolatrion_of_empty_cell():
    for i in range(7,31):
        if i =="-":
            sheet_ranges.cell(row = i, column = 3).value = mean(i-1,i+1)
        else:
            sheet_ranges.cell(row = i, column = 3).value
wb.save(filename = d_filename)
Is this just too easy to do, or is it not possible with openpyxl?
cheers//
Smiffo

The values are not replaced because you compare i with "-". i is a row index, not the value of a cell. Likewise, the mean is computed from the indices, not from the values of the cells above and below.
So you could solve it in the following way:
def interpolatrion_of_empty_cell():
    for i in range(7,31):
        cell_value = sheet_ranges.cell(row=i, column=3).value
        if cell_value == "-":
            above_value = sheet_ranges.cell(row=i - 1, column=3).value
            below_value = sheet_ranges.cell(row=i + 1, column=3).value
            sheet_ranges.cell(row=i, column=3).value = (float(above_value) + float(below_value))/2
Note that this may require tweaking, as it does not account for cases where the cells above and below are themselves "-", non-numeric, or empty.
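One way to handle those edge cases is to do the interpolation on a plain list of values and only fill a gap when both neighbours are numeric. The sketch below takes that approach; interpolate_gaps and _as_number are illustrative names, and leaving unresolvable gaps untouched is just one possible policy.

```python
def _as_number(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def interpolate_gaps(values, marker="-"):
    """Replace each marker with the mean of its two neighbours.

    A marker is left untouched when it sits at either end of the list or
    when a neighbour is itself missing or non-numeric.
    """
    values = list(values)
    result = list(values)
    for i in range(1, len(values) - 1):
        if values[i] != marker:
            continue
        above, below = _as_number(values[i - 1]), _as_number(values[i + 1])
        if above is not None and below is not None:
            result[i] = (above + below) / 2
    return result


# Possible use with the sheet from the question (rows 7..30, column 3):
# rows = range(7, 31)
# column = [sheet_ranges.cell(row=r, column=3).value for r in rows]
# for r, v in zip(rows, interpolate_gaps(column)):
#     sheet_ranges.cell(row=r, column=3).value = v
```

Neighbours are always read from the original values, so a freshly interpolated cell is never used to fill the next gap.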

Openpyxl optimizing cells search speed

I need to search an Excel sheet for cells containing some pattern. It takes more time than I can handle. The most optimized code I could write is below. Since the data patterns usually appear row after row, I use iter_rows(row_offset=x). Unfortunately, the code below takes longer and longer to find the given pattern on each pass of the for loop (starting at milliseconds and growing to almost a minute). What am I doing wrong?
import openpyxl
import datetime
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "test_sheet"
print("Generating quite big excel file")
for i in range(1,10000):
    for j in range(1,20):
        ws.cell(row = i, column = j).value = "Cell[{},{}]".format(i,j)
print("Saving test excel file")
wb.save('test.xlsx')
def FindXlCell(search_str, last_r):
    t = datetime.datetime.utcnow()
    for row in ws.iter_rows(row_offset=last_r):
        for cell in row:
            if (search_str == cell.value):
                print(search_str, last_r, cell.row, datetime.datetime.utcnow() - t)
                last_r = cell.row
                return last_r
    print("record not found ",search_str, datetime.datetime.utcnow() - t)
    return 1
wb = openpyxl.load_workbook("test.xlsx", data_only=True)
t = datetime.datetime.utcnow()
ws = wb["test_sheet"]
last_row = 1
print("Parsing excel file in a loop for 3 cells")
for i in range(1,100,1):
    last_row = FindXlCell("Cell[0,0]", last_row)
    last_row = FindXlCell("Cell[1000,6]", last_row)
    last_row = FindXlCell("Cell[6000,6]", last_row)

Looping over a worksheet multiple times is inefficient. The reason for the search getting progressively slower looks to be increasingly more memory being used in each loop. This is because last_row = FindXlCell("Cell[0,0]", last_row) means that the next search will create new cells at the end of the rows: openpyxl creates cells on demand because rows can be technically empty but cells in them are still addressable. At the end of your script the worksheet has a total of 598000 rows but you always start searching from A1.
If you wish to search a large file for text multiple times then it would probably make sense to create a matrix keyed by the text with the coordinates being the value.
Something like:
matrix = {}
for row in ws:
    for cell in row:
        matrix[cell.value] = (cell.row, cell.col_idx)
In a real-world example you'd probably want to use a defaultdict to be able to handle multiple cells with the same text.
This could be combined with read-only mode for a minimal memory footprint. Except, of course, if you want to edit the file.
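A defaultdict version of that lookup might look like the sketch below. build_index is an illustrative name, and the code assumes openpyxl 2.6 or later, where cell.column is an integer for both normal and read-only cells.

```python
from collections import defaultdict


def build_index(rows):
    """Map each cell value to the list of (row, column) positions holding it."""
    index = defaultdict(list)
    for row in rows:
        for cell in row:
            if cell.value is not None:  # skip empty cells
                index[cell.value].append((cell.row, cell.column))
    return index


# Possible use: one pass over the sheet, then any number of fast lookups.
# index = build_index(ws.iter_rows())
# positions = index.get("Cell[1000,6]", [])
```

Looking keys up with index.get(...) rather than index[...] avoids adding an empty entry for every text that is not found.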
