How to achieve something like VBA's "Range.Find" using win32? - python

I'm trying to get functionality similar to VBA's Range.Find method using the win32com package in Python. I'm dealing with an Excel CSV file. I have found lots of solutions that use Range(), but they all seem to require a fixed range of cells, whereas Range.Find in VBA searches the worksheet without my having to fix the range in advance.
Here is my code:
import win32com.client as client
excel= client.dynamic.Dispatch("Excel.Application")
excel.visible= True
wb= excel.workbooks.open(r"ExcelFile.xls")
ws= wb.worksheets('First')
### This works and extracts information:
test_range= ws.Range("A1")
### This raises AttributeError: 'function' object has no attribute 'Find':
test_range= ws.Range.Find("Series ID")
print(test_range.value)
Does this mean that the Range.Find method is not supported by the win32 package, or am I calling it on the wrong object?

Bonus answer: if you are a fan of the Excel API (thanks to @ashleedawg's comment), you can use it directly through xlwings:
import xlwings as xw
bookName = r'C:\somePath\hello.xlsx'
sheetName = 'Sheet1'
wb = xw.Book(bookName)
sht = wb.sheets[sheetName]
myCell = wb.sheets[sheetName].api.UsedRange.Find('test')
print('---------------')
print (myCell.address)
input()

The first part of the code generates an Excel file with some sample numbers:
import xlsxwriter
from xlsxwriter.utility import xl_rowcol_to_cell
import xlrd
#First part of the code, used only to create some Excel file with data
wbk = xlsxwriter.Workbook('hello.xlsx')
wks = wbk.add_worksheet()
i = -1
for x in range(1, 1000, 11):
    i+=1
    cella = xl_rowcol_to_cell(i, 0) #0,0 is A1!
    cellb = xl_rowcol_to_cell(i, 1)
    cellc = xl_rowcol_to_cell(i, 2)
    #print (cella)
    wks.write(cella,x)
    wks.write(cellb,x*3)
    wks.write(cellc,x*4.5)
myPath= r'C:\Desktop\hello.xlsx'
wbk.close()
#SecondPart of the code
for sh in xlrd.open_workbook(myPath).sheets():
    for row in range(sh.nrows):
        for col in range(sh.ncols):
            myCell = sh.cell(row, col)
            print(myCell)
            if myCell.value == 300.0:
                print('-----------')
                print('Found!')
                print(xl_rowcol_to_cell(row,col))
                quit()
With the second part of the code, the real searching starts. In this case we are searching for 300, which is one of the values generated by the first part of the code.
Python loops through the rows and columns, comparing each value with 300. If the value is found, it prints Found! together with the cell address and stops searching.
This code can also be rewritten so that the second part becomes a function (def).

If you want to do it with a function, this is one way to do it. findCell is the name of the function.
import xlsxwriter
import os
import xlrd
import time
from xlsxwriter.utility import xl_rowcol_to_cell
def findCell(sh, searchedValue):
    for row in range(sh.nrows):
        for col in range(sh.ncols):
            myCell = sh.cell(row, col)
            if myCell.value == searchedValue:
                return xl_rowcol_to_cell(row, col)
    return -1
myName = 'hello.xlsx'
wbk = xlsxwriter.Workbook(myName)
wks = wbk.add_worksheet()
i = -1
for x in range(1, 1000, 11):
    i+=1
    cella = xl_rowcol_to_cell(i, 0) #0,0 is A1!
    cellb = xl_rowcol_to_cell(i, 1)
    cellc = xl_rowcol_to_cell(i, 2)
    wks.write(cella,x)
    wks.write(cellb,x*3)
    wks.write(cellc,x*4.5)
wbk.close() # write the file to disk before xlrd reads it
myPath= os.getcwd()+"\\"+myName
searchedValue = 300
for sh in xlrd.open_workbook(myPath).sheets():
    print(findCell(sh, searchedValue))
input('Press ENTER to exit')
Running it prints the address of the matching cell, B10 (row 10 holds x = 100, and column B holds x*3 = 300).
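The row-by-row scan used in this answer can also be sketched without any Excel library, which makes it easy to try out. This is only an illustration under stated assumptions: the sheet is represented as a plain list of rows, and `column_letters` and `find_all` are hypothetical helper names, not part of xlrd or xlsxwriter. Unlike findCell, it collects every match rather than stopping at the first one.

```python
def column_letters(col_index):
    """Convert a zero-based column index to Excel letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    n = col_index + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_all(rows, searched_value):
    """Return the A1-style address of every cell equal to searched_value."""
    matches = []
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            if value == searched_value:
                matches.append(f"{column_letters(col_index)}{row_index + 1}")
    return matches


# Same data as the generated workbook: the columns hold x, x*3 and x*4.5.
rows = [[x, x * 3, x * 4.5] for x in range(1, 1000, 11)]
print(find_all(rows, 300))
```

Returning a list (empty when nothing matches) avoids the mixed return type of findCell, which gives back either a string address or -1.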

Yes, win32com can call the same Range.Find() method. The problem with your code is that you didn't specify a range: ws.Range on its own is a method rather than a Range object, so it has no Find attribute.
test_range= ws.Range.Find("Series ID") #<----- no range specified
Below is the correct use of Range and Find:
import win32com.client as client
excel= client.dynamic.Dispatch("Excel.Application")
excel.visible= True
wb= excel.workbooks.open(r"ExcelFile.xls")
ws= wb.worksheets('First')
test_range= ws.Range("A1")
### example if you want to find out the column of search result
ResultColumn= test_range.Find("Series ID").Column
print(str(ResultColumn))
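The error message itself can be reproduced without Excel, which may help show what went wrong. The sketch below uses hypothetical stand-in classes (FakeSheet and FakeRange are not part of win32com) and assumes, as the traceback suggests, that ws.Range is exposed as a method that must be called before Find is available.

```python
class FakeRange:
    """Hypothetical stand-in for an Excel Range object."""

    def __init__(self, address):
        self.address = address

    def Find(self, what):
        return f"searching {self.address} for {what!r}"


class FakeSheet:
    """Hypothetical stand-in for a worksheet whose Range is a method."""

    def Range(self, address):
        return FakeRange(address)


ws = FakeSheet()

try:
    # Range is never called here, so Find is looked up on the method itself.
    ws.Range.Find("Series ID")
except AttributeError as error:
    message = str(error)
    print(message)

# Calling Range first returns an object that does have Find.
result = ws.Range("A1").Find("Series ID")
print(result)
```

In plain Python the failing lookup reports that a function object has no attribute 'Find', which matches the message in the question.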

Related

Writing data to excel from a formula in python

What I want is to use openpyxl to write a value I get from len() or dups() to an Excel cell.
Here are my imports:
import xlwings as xw
Here is the code:
#Load workbook
app = xw.App(visible = False)
wb = xw.Book(FilePath)
RawData_ws = wb.sheets['Raw Data']
Sheet1 = wb.sheets['Sheet 1']
RawData_ws['A1'] = (len(df.index))
Sheet1['B7'] = (len(df.index) - tot_dups)
RawData_ws['A2'] = (len(df.index)) #This one is after removing duplicate values
tot_dups is defined as:
tot_dups = len(df.index)
I want the values of the different len() calls to be written to the specific cells.
So, I already found the solution.
Change:
RawData_ws['A1'] = (len(df.index))
To:
RawData_ws['A1'].value = (len(df.index))

python xlwings won't save the excel file in each iteration

This is a bit complex to explain, so please ask if anything is unclear.
I have two Excel files, named initial and updated. The updated file always has more sheets, and it may have more rows or changed values in each sheet. I am trying to compare each sheet that exists in both the initial and updated files, and to write and highlight the changes in a new Excel file.
This is the code that I have.
from pathlib import Path
import pandas as pd
import numpy as np
import xlwings as xw
initial_version = Path.cwd() / "ConfigurationReport_TEST.xlsx"
updated_version = Path.cwd() / "ConfigurationReport_DEV2.xlsx"
excel1 = pd.ExcelFile(initial_version)
excel2 = pd.ExcelFile(updated_version)
lesser_sheetnames_dict = {}
greater_sheetnames_dict = {}
for idx, value in enumerate(excel1.sheet_names if len(excel1.sheet_names) < len(excel2.sheet_names) else excel2.sheet_names):
    lesser_sheetnames_dict[idx] = value
for idx, value in enumerate(excel1.sheet_names if len(excel1.sheet_names) > len(excel2.sheet_names) else excel2.sheet_names):
    greater_sheetnames_dict[idx] = value
print(lesser_sheetnames_dict)
print(len(lesser_sheetnames_dict))
print(len(greater_sheetnames_dict))
for sheetnum,sheetname in lesser_sheetnames_dict.items():
    if sheetname not in greater_sheetnames_dict.values():
        continue
    else:
        df1 = pd.read_excel(initial_version,sheet_name=sheetname)
        df2 = pd.read_excel(updated_version,sheet_name=sheetname)
        df1 = df1.fillna('')
        df2 = df2.fillna('')
        df2 = df2.reset_index()
        df3 = pd.merge(df1,df2,how='outer',indicator='Exist')
        df3 = df3.query("Exist != 'both'")
        df_highlight_right = df3.query("Exist == 'right_only'")
        df_highlight_left = df3.query("Exist == 'left_only'")
        highlight_rows_right = df_highlight_right['index'].tolist()
        highlight_rows_right = [int(row) for row in highlight_rows_right]
        first_row_in_excel = 2
        highlight_rows_right = [x + first_row_in_excel for x in highlight_rows_right]
        with xw.App(visible=False) as app:
            updated_wb = app.books.open(updated_version)
            print(updated_wb.sheets([x+1 for x in greater_sheetnames_dict.keys() if greater_sheetnames_dict[x] == sheetname][0]))
            updated_ws = updated_wb.sheets([x+1 for x in greater_sheetnames_dict.keys() if greater_sheetnames_dict[x] == sheetname][0])
            rng = updated_ws.used_range
            print(f"Used Range: {rng.address}")
            # Highlight the rows in Excel
            for row in rng.rows:
                if row.row in highlight_rows_right:
                    row.color = (255, 71, 76) # light red
            updated_wb.save(Path.cwd() / "Difference_Highlighted.xlsx")
The problem I'm facing is in the with block. Ideally, this code should run for each sheet that exists in both files, highlight the changes, and save them into a new file.
Instead, it runs for each sheet that exists in both files but only highlights and saves the last sheet.
This is my first time using the xlwings library, so I have very little idea of how that block works. Any assistance will be much appreciated.
I feel stupid for posting this question now. The error was caused by the scope of the with block. Since it was inside the if/else block, every iteration opened the workbook again, highlighted the changes for the sheet currently being iterated over, and then saved it. So during the last iteration (sheet) it opened the original file again, highlighted the changes, and overwrote the previously saved file.
To avoid this, I moved the with block's opening statement before the for loop, outside the if block, and now it works as intended.
with xw.App(visible=False) as app:
    for sheetnum,sheetname in lesser_sheetnames_dict.items():
        if sheetname not in greater_sheetnames_dict.values():
            continue
        else:
            # code
    updated_wb.save(Path.cwd() / "Difference_Highlighted.xlsx")
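The overwrite behaviour described above is not specific to xlwings, so it can be sketched without Excel. In the illustration below a small JSON file stands in for the workbook (an assumption made purely so the example runs anywhere), and the function names are hypothetical. The first function reloads the source and saves on every iteration, the second loads once and saves once.

```python
import json
import tempfile
from pathlib import Path


def save_inside_loop(source, target, sheet_names):
    """Pattern from the question: reload the source and save for every sheet."""
    for name in sheet_names:
        book = json.loads(source.read_text())   # fresh, unmodified copy
        book[name]["highlighted"] = True
        target.write_text(json.dumps(book))     # overwrites the previous save


def save_after_loop(source, target, sheet_names):
    """Pattern from the fix: load once, change every sheet, save once."""
    book = json.loads(source.read_text())
    for name in sheet_names:
        book[name]["highlighted"] = True
    target.write_text(json.dumps(book))


def highlighted_sheets(path):
    """List the sheets that are marked as highlighted in the saved file."""
    book = json.loads(path.read_text())
    return [name for name, sheet in book.items() if sheet["highlighted"]]


with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "updated.json"
    target = Path(folder) / "difference.json"
    sheets = {"Sheet1": {"highlighted": False}, "Sheet2": {"highlighted": False}}
    source.write_text(json.dumps(sheets))

    save_inside_loop(source, target, ["Sheet1", "Sheet2"])
    inside_result = highlighted_sheets(target)

    save_after_loop(source, target, ["Sheet1", "Sheet2"])
    after_result = highlighted_sheets(target)

print(inside_result)  # only the last sheet keeps its highlight
print(after_result)   # both sheets keep their highlight
```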

Reading a named range from excel - Python - xlrd

Following is the piece of code that I wrote, and I'm unable to proceed beyond reading the range. I want to be able to read the actual content of the range. Any help is appreciated.
import xlrd
xlBook = xlrd.open_workbook('data.xlsx')
# Open the sheet
sht = xlBook.sheet_by_name('calc')
# Look at the named range
named_obj = xlBook.name_map['load'][0]
# Now, able to retrieve the range
rangeData = (named_obj.formula_text)
(sheetName,ref) = rangeData.split('!')
# Gives me the range as $A$2:$B$20
print(ref)
# How do I print the contents of the cells knowing the range.
My approach is to work out the column coordinates from the reference,
although I still recommend openpyxl as the more intuitive option.
def col2int(s: str):
    weight = 1
    n = 0
    list_s = list(s)
    while list_s:
        n += (ord(list_s.pop()) - ord('A')+1) * weight
        weight *= 26
    return n
# ...
# How do I print the contents of the cells knowing the range. ↓
temp, col_start, row_start, col_end, row_end = ref.replace(':', '').split('$')
for row in range(int(row_start)-1, int(row_end)):
    for col in range(col2int(col_start)-1, col2int(col_end)):
        print(sht.cell(row, col).value)
The xlrd, xlwt and xlutils packages are meant for .xls files, per their documentation, which recommends openpyxl for .xlsx files. Then you can use this:
Read values from named ranges with openpyxl
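The range-splitting step from the first answer can also be written with a regular expression, which fails loudly on references it does not understand. This is a minimal sketch, assuming the reference is a two-corner range such as $A$2:$B$20; `parse_range` and `col_to_index` are hypothetical helper names, not xlrd functions. The zero-based bounds it returns can be fed to sht.cell(row, col) in a loop like the one above.

```python
import re

# Matches references such as "$A$2:$B$20" or "A2:B20".
RANGE_PATTERN = re.compile(r"^\$?([A-Z]+)\$?(\d+):\$?([A-Z]+)\$?(\d+)$")


def col_to_index(letters):
    """Convert column letters to a zero-based index ('A' -> 0, 'AA' -> 26)."""
    index = 0
    for char in letters:
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def parse_range(ref):
    """Return zero-based, inclusive (first_row, first_col, last_row, last_col)."""
    match = RANGE_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a two-corner cell range: {ref!r}")
    col_start, row_start, col_end, row_end = match.groups()
    return (int(row_start) - 1, col_to_index(col_start),
            int(row_end) - 1, col_to_index(col_end))


first_row, first_col, last_row, last_col = parse_range("$A$2:$B$20")
print(first_row, first_col, last_row, last_col)
```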

Get the absolute value of a sum in an Excel sheet using openpyxl

I am starting to use openpyxl and I want to copy the sum of a row.
In Excel the value is 150, but when I try to print it, the output I get is the formula, not the actual value:
=SUM(B1:B19)
The script I use is:
print(ws["B20"].value)
Using "data_only" didn't work:
wb = load_workbook("First_File_b.xlsx", data_only=True)
Any idea how I can solve to obtain the numerical value? Help would be greatly appreciated.
Okay, here's a simple example.
I created a workbook whose first sheet, "Feuil1" (French version of Excel), contains 1, 2, 3, 4, 5, 6, 7 in A1 to A7 and A8=SUM(A1:A7).
Here's the code, which could perhaps be adapted to other operators, though maybe not so simply. It should also handle ranges such as A1:B12 (untested), but it cannot parse multi-letter columns like AA, although that could be added.
import openpyxl,re
fre = re.compile(r"=(\w+)\((\w+):(\w+)\)$")
cre = re.compile(r"([A-Z]+)(\d+)")
def the_sum(a,b):
    return a+b
d=dict()
d["SUM"] = the_sum
def get_evaluated_value(w,sheet_name,cell_name):
    result = w[sheet_name][cell_name].value
    if isinstance(result,int) or isinstance(result,float):
        pass
    else:
        m = fre.match(result)
        if m:
            g = m.groups()
            operator=d[g[0]] # ATM only sum is supported
            # compute range
            mc1 = cre.match(g[1])
            mc2 = cre.match(g[2])
            start_col = ord(mc1.group(1))
            end_col = ord(mc2.group(1))
            start_row = int(mc1.group(2))
            end_row = int(mc2.group(2))
            result = 0
            for i in range(start_col,end_col+1):
                for j in range(start_row,end_row+1):
                    c = chr(i)+str(j)
                    # read from the requested sheet rather than a hard-coded name
                    result = operator(result,w[sheet_name][c].value)
    return result
w = openpyxl.load_workbook(r"C:\Users\dartypc\Desktop\test.xlsx")
print(get_evaluated_value(w,"Feuil1","A2"))
print(get_evaluated_value(w,"Feuil1","A8"))
output:
2
28
yay!
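The answer notes that multi-letter columns such as AA are not handled. The sketch below shows one way that could be added. It is an illustration only: the sheet is assumed to be a plain dict mapping cell names to values rather than an openpyxl worksheet, only SUM over a two-corner range is supported, and `evaluate_sum`, `col_to_number` and `number_to_col` are hypothetical helper names.

```python
import re

FORMULA = re.compile(r"^=SUM\(([A-Z]+)(\d+):([A-Z]+)(\d+)\)$")


def col_to_number(letters):
    """Convert column letters to a one-based number ('A' -> 1, 'AA' -> 27)."""
    number = 0
    for char in letters:
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


def number_to_col(number):
    """Convert a one-based column number back to letters (27 -> 'AA')."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def evaluate_sum(cells, formula):
    """Add up the numeric cells inside a '=SUM(X1:Y9)' style formula."""
    match = FORMULA.match(formula)
    if match is None:
        raise ValueError(f"unsupported formula: {formula!r}")
    col_start, row_start, col_end, row_end = match.groups()
    total = 0
    for col in range(col_to_number(col_start), col_to_number(col_end) + 1):
        for row in range(int(row_start), int(row_end) + 1):
            value = cells.get(f"{number_to_col(col)}{row}")
            if isinstance(value, (int, float)):   # skip empty or text cells
                total += value
    return total


# Same layout as the example sheet: A1..A7 hold 1..7 and A8 holds the formula.
cells = {f"A{n}": n for n in range(1, 8)}
cells["A8"] = "=SUM(A1:A7)"
print(evaluate_sum(cells, cells["A8"]))
```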
I have solved the matter using a combination of openpyxl and pandas:
import pandas as pd
import openpyxl
from openpyxl import Workbook , load_workbook
source_file = "Test.xlsx"
# write to file
wb = load_workbook (source_file)
ws = wb.active
ws.title = "hello world"
ws.append ([10,10])
wb.save(source_file)
# read from file
df = pd.read_excel(source_file)
sum_jan = df ["Jan"].sum()
print (sum_jan)

Getting my output into another excel file

import os, sys
from xlrd import open_workbook
from xlutils.copy import copy
from xlwt import easyxf, Style
import time
rb = open_workbook('A1.xls', on_demand=True,formatting_info =True)
rs = rb.sheet_by_index(0)
wb = copy(rb)
ws = wb.get_sheet(0)
start =time.time()
g1 = dict()
for row in range(1,rs.nrows):
    for cell in row:
        cellContent = str(cell.value)
        if cellContent not in g1.keys():
            g1[cellContent]=1
        else:
            g1[cellContent]=g1[cellContent]+1
for cellContent in g1.keys():
    print cellContent, g1[cellContent]
    ws.write(row,1, cellContent)
wb.save('A2.xls')
When I run this code, I get the error message cell object not iterable
What could have gone wrong?
I am not familiar with xlrd or any of the other modules myself, but for any work with CSV or Excel spreadsheets I use pandas. It lets you read a file, make all sorts of modifications, and then write it back out very easily. If all you wanted was to copy it, that would be really easy.
The problem you've got is that row is an integer. It comes from for row in range(1, rs.nrows):, and range() yields integers, in your case each row number from 1 up to (but not including) the number of rows in your spreadsheet.
I'm not familiar with how the xlrd, xlutils and xlwt modules work, but I'd imagine you want to do something more like the following:
for row_number in range(1, rs.nrows):
    row = rs.row(row_number)
    for cell in row:
        ....
The Sheet.row(rowx) method gives you a sequence of Cell objects that you can iterate in your inner loop.
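Once the loop iterates over cells rather than over an integer, the counting step from the question can be sketched with collections.Counter. The example below is library-free and rests on an assumption: each row is a plain list of cell values, standing in for the Cell objects that Sheet.row returns, and `count_cell_contents` is a hypothetical helper name.

```python
from collections import Counter


def count_cell_contents(rows):
    """Count how often each cell content (as text) appears across all rows."""
    counts = Counter()
    for row in rows:            # each row is a sequence of cell values
        for value in row:
            counts[str(value)] += 1
    return counts


# Sample data standing in for the sheet rows below the header.
rows = [["apple", "pear"], ["apple", 3.0], ["pear", "apple"]]
counts = count_cell_contents(rows)
for content, count in counts.items():
    print(content, count)
```

Counter replaces the manual "if key not in dict" bookkeeping from the question while producing the same tallies.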
