This is a bit complex to explain, so please ask if anything is unclear.
I have two Excel files, initial and updated. The updated file always has more sheets, and each sheet may have more rows or changed values. I am trying to compare each sheet that exists in both files and write the changes, highlighted, into a new Excel file.
This is the code that I have:
from pathlib import Path
import pandas as pd
import numpy as np
import xlwings as xw
initial_version = Path.cwd() / "ConfigurationReport_TEST.xlsx"
updated_version = Path.cwd() / "ConfigurationReport_DEV2.xlsx"
excel1 = pd.ExcelFile(initial_version)
excel2 = pd.ExcelFile(updated_version)
lesser_sheetnames_dict = {}
greater_sheetnames_dict = {}
for idx, value in enumerate(excel1.sheet_names if len(excel1.sheet_names) < len(excel2.sheet_names) else excel2.sheet_names):
    lesser_sheetnames_dict[idx] = value
for idx, value in enumerate(excel1.sheet_names if len(excel1.sheet_names) > len(excel2.sheet_names) else excel2.sheet_names):
    greater_sheetnames_dict[idx] = value
print(lesser_sheetnames_dict)
print(len(lesser_sheetnames_dict))
print(len(greater_sheetnames_dict))
for sheetnum,sheetname in lesser_sheetnames_dict.items():
    if sheetname not in greater_sheetnames_dict.values():
        continue
    else:
        df1 = pd.read_excel(initial_version,sheet_name=sheetname)
        df2 = pd.read_excel(updated_version,sheet_name=sheetname)
        df1 = df1.fillna('')
        df2 = df2.fillna('')
        df2 = df2.reset_index()
        df3 = pd.merge(df1,df2,how='outer',indicator='Exist')
        df3 = df3.query("Exist != 'both'")
        df_highlight_right = df3.query("Exist == 'right_only'")
        df_highlight_left = df3.query("Exist == 'left_only'")
        highlight_rows_right = df_highlight_right['index'].tolist()
        highlight_rows_right = [int(row) for row in highlight_rows_right]
        first_row_in_excel = 2
        highlight_rows_right = [x + first_row_in_excel for x in highlight_rows_right]
        with xw.App(visible=False) as app:
            updated_wb = app.books.open(updated_version)
            print(updated_wb.sheets([x+1 for x in greater_sheetnames_dict.keys() if greater_sheetnames_dict[x] == sheetname][0]))
            updated_ws = updated_wb.sheets([x+1 for x in greater_sheetnames_dict.keys() if greater_sheetnames_dict[x] == sheetname][0])
            rng = updated_ws.used_range
            print(f"Used Range: {rng.address}")
            # Highlight the rows in Excel
            for row in rng.rows:
                if row.row in highlight_rows_right:
                    row.color = (255, 71, 76) # light red
            updated_wb.save(Path.cwd() / "Difference_Highlighted.xlsx")
The problem I'm facing is in the with block. Ideally, this code should run for each sheet that exists in both files, highlight the changes, and save them into a new file.
It does run for each sheet that exists in both files, but only the last sheet ends up highlighted in the saved file.
This is my first time using the xlwings library, so I have very little idea of how that block works. Any assistance would be much appreciated.
I feel stupid for posting this question now. The error was caused by the scope of the with block. Since it was inside the if/else block, every iteration opened the original workbook again, highlighted the changes of the sheet currently being iterated on, and then saved it. So the last iteration (sheet) opened the file once more, highlighted its changes, and overwrote the previously saved file.
To avoid this, I moved the with statement out of the if/else block, above the loop, and now it works as intended.
with xw.App(visible=False) as app:
    for sheetnum,sheetname in lesser_sheetnames_dict.items():
        if sheetname not in greater_sheetnames_dict.values():
            continue
        else:
            # ... same per-sheet comparison and highlighting code as before ...
    updated_wb.save(Path.cwd() / "Difference_Highlighted.xlsx")
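As a side note on the bookkeeping in the question: the two index-to-name dictionaries are only used to find the sheets present in both workbooks, and the row arithmetic turns DataFrame indices into worksheet row numbers. A minimal sketch of those two steps on plain lists, assuming a single header row in each sheet (the helper names common_sheets and to_excel_rows are made up for illustration and are not part of pandas or xlwings):

```python
def common_sheets(initial_names, updated_names):
    """Return the sheet names present in both lists, in the order of updated_names."""
    initial = set(initial_names)
    return [name for name in updated_names if name in initial]


def to_excel_rows(df_indices, first_data_row=2):
    """Map zero-based DataFrame row indices to worksheet row numbers.

    Assumption: row 1 of the sheet is the header, so data starts at row 2.
    """
    return [int(i) + first_data_row for i in df_indices]


# Stand-ins for excel1.sheet_names and excel2.sheet_names
initial_names = ["Users", "Roles", "Legacy"]
updated_names = ["Roles", "Audit", "Users", "Settings"]

print(common_sheets(initial_names, updated_names))
print(to_excel_rows([0, 3, 7]))
```

In the real script the name lists would come from the sheet_names attribute of the two pd.ExcelFile objects, and each common name could then be looked up by name rather than by position.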
I'm trying to update a single cell in an existing excel file.
Here is my code:
file=(r'C:/Users/user/Desktop/test.xls')
df=pd.read_excel(file)
code=input('Patiste Kodiko:')
size=0
sizeint=int(input('Patiste Noumero:'))
given=int(input('Posa efigan?:'))
oldstock=(df[size].where(df['ΚΩΔΙΚΟΣ']==code))
oldstock=oldstock.dropna()
oldstock=oldstock.values[0]
oldstock = int(oldstock)
newstock = oldstock - given
x=(df['Α/Α'].where(df['ΚΩΔΙΚΟΣ']==code)+2)
x=x.dropna()
x = int(x)
dffin=df.at[x,size] = newstock
dffin.to_excel(file)
close()
After running this code, I receive an empty .xls file with only one cell written and everything else empty.
What am I missing here?
Thanks in advance.
You should be able to do this with df.at if you know the row label and the column name of the cell. Note that df.at[row, column] = value modifies the DataFrame in place, so it is the DataFrame itself that has to be written back with to_excel, not the result of the assignment.
import pandas as pd
fileLocation = r'TestExcelsheet.xlsx'
excel = pd.read_excel(fileLocation, converters={'NimikeNro': str})
print(excel.dtypes)
print(excel.index)
print(excel.head())
# set a single cell: row label 1, column 'One'
excel.at[1, 'One'] = 444
print(excel)
excel.to_excel('TestExcelsheet.xlsx')
DataFrame.at is what you need to set a single cell; use a for loop to set more than one cell.
I am hoping you can help me - I'm sure its likely a small thing to fix, when one knows how.
In my workshop, neither I nor my colleagues can make 'find and replace all' changes via the front-end of our database. The boss just denies us that level of access. If we need to make changes to dozens or perhaps hundreds of records it must all be done by copy-and-paste or similar means. Craziness.
I am trying to make a workaround to that with Python 2 and in particular libraries such as Pandas, pyautogui and xlrd.
I have researched several StackOverflow threads and have so far managed to write some code that works well at reading a given XL file. In production, this will be a file exported from a found data set in the database GUI front-end and will be just a single column of 'Article Numbers' for the items in the computer workshop. This will always have an Excel column header, e.g.
ANR
51234
34567
12345
...
All the records numbers are 5 digit numbers.
We also have the means of scanning items with an IR scanner to a 'Workflow' app on the iPad we have and automatically making an XL file out of that list of scanned items.
The XL file here could look something similar to this.
56788
12345
89012
...
It differs in that there is no column header. All XL files have their data 'anchored' at cell A1 on 'Sheet1' and again just a single column will be used. No unnecessary complications here!
Here is the script anyway. When it is fully working, system arguments will be supplied to it. For now, let's pretend that we need to change records to have their 'RAM' value changed from "2GB" to "2 GB".
import xlrd
import string
import re
import pandas as pd
field = "RAM"
value = "2 GB"
myFile = "/Users/me/folder/testArticles.xlsx"
df = pd.read_excel(myFile)
myRegex = "^[0-9]{5}$"
# data collection and putting into lists.
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
formatted = []
deDuped = []
# removing any possible XL headers, setting all values to strings
# that look like five-digit ints, apply a regex to be sure.
for i in data:
    cellValue = str(i)
    cellValue = cellValue.translate(None, '\'[u]\'')
    # remove the decimal point
    # Searching for the header will cause a database front-end problem.
    cellValue = cellValue[:-2]
    cellValue = cellValue.translate(None, string.letters)
    # making sure only valid article numbers get through
    # blank rows etc can take a hike
    if len(cellValue) != 0:
        if re.match(myRegex, cellValue):
            formatted.append(cellValue)
# weeding out any possible dupes.
for i in formatted:
    if i not in deDuped:
        deDuped.append(i)
#main code block
for i in deDuped:
    #lots going on here involving pyautogui
    #making sure of no error running searches, checking for warnings, moving/tabbing around DB front-end etc
    #if all goes to plan
    #removing that record number from the excel file and saving the change
    #so that if we run the script again for the same XL file
    #we don't needlessly update an already OK record again.
    df = df[~df['ANR'].astype(str).str.startswith(i)]
    df.to_excel(myFile, index=False)
What I would really like to find out is how I can run the script so that it "doesn't care" about the presence or absence of the column header.
df = df[~df['ANR'].astype(str).str.startswith(i)]
This appears to be the line of code on which it all hinges. I've made several changes to the line in different combinations, but my script always crashes.
If a column header ("ANR" in my case) is essential for this particular pandas method, is there a straightforward way of inserting a column header into an XL file that lacks one in the first place, i.e. the XL files that come from the IR scanner and the 'Workflow' app on the iPad?
Thanks guys!
UPDATE
As suggested by Patrick, I've tried implementing some code to check whether cell A1 has a header or not. Partial success: I can put "ANR" in cell A1 if it's missing, but I lose whatever was there in the first place.
import xlwt
from openpyxl import Workbook, load_workbook
from xlutils.copy import copy
import openpyxl
# data collection
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
cell_a1 = sheet.cell_value(rowx=0, colx=0)
if cell_a1 == "ANR":
    print "has header"
else:
    wb = openpyxl.load_workbook(filename= myFile)
    ws = wb['Sheet1']
    # this overwrites the first article number that was in A1
    ws['A1'] = "ANR"
    wb.save(myFile)
#re-open XL file again etc etc.
I found this new block of code over at writing to existing workbook using xlwt. In this instance the contributor actually used openpyxl.
I think I got it fixed for myself.
Still a tiny bit messy, but it seems to be working. I added an if/else clause to check the value of cell A1 and take action accordingly. I found most of the code for this at "how to append data using openpyxl python to excel file from a specified row?", using the suggestion for openpyxl.
import pyperclip
import xlrd
import pyautogui
import string
import re
import os
import pandas as pd
import xlwt
from openpyxl import Workbook, load_workbook
from xlutils.copy import copy
field = "RAM"
value = "2 GB"
myFile = "/Users/me/testSerials.xlsx"
df = pd.read_excel(myFile)
myRegex = "^[0-9]{5}$"
# data collection
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
cell_a1 = sheet.cell_value(rowx=0, colx=0)
if cell_a1 == "ANR":
    print "has header"
else:
    headers = ['ANR']
    # use the path itself, not the literal string 'myFile'
    workbook_name = myFile
    wb = Workbook()
    page = wb.active
    # page.title = 'companies'
    page.append(headers) # write the headers to the first line
    workbook = xlrd.open_workbook(workbook_name)
    sheet = workbook.sheet_by_index(0)
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    # copy the existing rows underneath the new header
    for records in data:
        page.append(records)
    wb.save(filename=workbook_name)
#then load the data all over again, this time with inserted header
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
formatted = []
deDuped = []
# removing any possible XL headers, setting all values to strings that look like five-digit ints, apply a regex to be sure.
for i in data:
    cellValue = str(i)
    cellValue = cellValue.translate(None, '\'[u]\'')
    # remove the decimal point
    cellValue = cellValue[:-2]
    # cellValue = cellValue.translate(None, ".0")
    cellValue = cellValue.translate(None, string.letters)
    # making sure any valid ANRs get through
    if len(cellValue) != 0:
        if re.match(myRegex, cellValue):
            formatted.append(cellValue)
# ------------------------------------------
# weeding out any possible dupes.
for i in formatted:
    if i not in deDuped:
        deDuped.append(i)
# ref - https://stackoverflow.com/questions/48942743/python-pandas-to-remove-rows-in-excel
df = pd.read_excel(myFile)
print df
for i in deDuped:
    #pyautogui code is run here...
    #if all goes to plan update the XL file
    df = df[~df['ANR'].astype(str).str.startswith(i)]
    df.to_excel(myFile, index=False)
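For the clean-up step on its own, the header question can arguably be sidestepped by validating every cell rather than treating the first row specially. The following is only a sketch under a few assumptions: it is written for Python 3 rather than the Python 2 used above, the cell values are passed in as a plain list standing in for the first worksheet column, and clean_article_numbers is a made-up helper name.

```python
import re

ARTICLE_RE = re.compile(r"^[0-9]{5}$")


def clean_article_numbers(cells):
    """Return unique five-digit article numbers as strings, in first-seen order.

    A header such as "ANR", blank cells and anything else that is not a
    five-digit number is dropped, so it does not matter whether the
    column starts with a header or not.
    """
    seen = {}
    for cell in cells:
        # Numeric Excel cells are commonly read back as floats (51234.0)
        if isinstance(cell, float) and cell.is_integer():
            text = str(int(cell))
        else:
            text = str(cell).strip()
        if ARTICLE_RE.match(text):
            seen.setdefault(text, None)  # dicts keep insertion order
    return list(seen)


with_header = ["ANR", 51234.0, 34567.0, "", 51234.0]
without_header = [56788.0, 12345.0, 89012.0]
print(clean_article_numbers(with_header))
print(clean_article_numbers(without_header))
```

Both lists go through the same function, so the export from the database front-end and the headerless file from the scanner need no separate handling at this stage.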
I'm looking to achieve functionality similar to the Range.Find method in VBA using the win32com package in Python. I'm dealing with an Excel CSV file. While I have found lots of solutions using range(), they seem to require specifying a fixed range of cells, as opposed to Range.Find in VBA, which searches the worksheet without a fixed range.
Here is my code:
import win32com.client as client
excel= client.dynamic.Dispatch("Excel.Application")
excel.visible= True
wb= excel.workbooks.open(r"ExcelFile.xls")
ws= wb.worksheets('First')
### This is able to extract information:
test_range= ws.Range("A1")
### This raises AttributeError: 'function' object has no attribute 'Find':
test_range= ws.Range.Find("Series ID")
print(test_range.value)
Does this mean that the Range.Find method is not supported in the win32com package, or am I calling it on the wrong object?
Bonus answer: if you are a fan of the Excel API (thanks to @ashleedawg's comment), you can use it directly through xlwings:
import xlwings as xw
bookName = r'C:\somePath\hello.xlsx'
sheetName = 'Sheet1'
wb = xw.Book(bookName)
sht = wb.sheets[sheetName]
myCell = wb.sheets[sheetName].api.UsedRange.Find('test')
print('---------------')
print (myCell.address)
input()
For a workbook whose Sheet1 contains the text test, this prints the address of a matching cell.
The first part of the code generates an Excel file with some sample numbers:
import xlsxwriter
from xlsxwriter.utility import xl_rowcol_to_cell
import xlrd
#First part of the code, used only to create some Excel file with data
wbk = xlsxwriter.Workbook('hello.xlsx')
wks = wbk.add_worksheet()
i = -1
for x in range(1, 1000, 11):
    i+=1
    cella = xl_rowcol_to_cell(i, 0) #0,0 is A1!
    cellb = xl_rowcol_to_cell(i, 1)
    cellc = xl_rowcol_to_cell(i, 2)
    #print (cella)
    wks.write(cella,x)
    wks.write(cellb,x*3)
    wks.write(cellc,x*4.5)
myPath= r'C:\Desktop\hello.xlsx'
wbk.close()
#SecondPart of the code
for sh in xlrd.open_workbook(myPath).sheets():
    for row in range(sh.nrows):
        for col in range(sh.ncols):
            myCell = sh.cell(row, col)
            print(myCell)
            if myCell.value == 300.0:
                print('-----------')
                print('Found!')
                print(xl_rowcol_to_cell(row,col))
                quit()
With the second part of the code, the real searching starts. In this case, we are searching for 300, which is one of the values generated by the first part of the code.
Python loops through the rows and columns, comparing each value with 300. If the value is found, it prints Found! together with the cell address and stops searching.
This code can be rewritten by turning the second part into a function (def).
If you want to do it with a function, this is one way to do it; findCell is the name of the function.
import xlsxwriter
import os
import xlrd
import time
from xlsxwriter.utility import xl_rowcol_to_cell
def findCell(sh, searchedValue):
    for row in range(sh.nrows):
        for col in range(sh.ncols):
            myCell = sh.cell(row, col)
            if myCell.value == searchedValue:
                return xl_rowcol_to_cell(row, col)
    return -1
myName = 'hello.xlsx'
wbk = xlsxwriter.Workbook(myName)
wks = wbk.add_worksheet()
i = -1
for x in range(1, 1000, 11):
    i+=1
    cella = xl_rowcol_to_cell(i, 0) #0,0 is A1!
    cellb = xl_rowcol_to_cell(i, 1)
    cellc = xl_rowcol_to_cell(i, 2)
    wks.write(cella,x)
    wks.write(cellb,x*3)
    wks.write(cellc,x*4.5)
# the file is only written to disk when the workbook is closed
wbk.close()
myPath= os.getcwd()+"\\"+myName
searchedValue = 300
for sh in xlrd.open_workbook(myPath).sheets():
    print(findCell(sh, searchedValue))
input('Press ENTER to exit')
Running it prints the address of the cell that contains 300 and then waits for ENTER.
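The same row-by-column search can also be tried without any workbook on disk. The sketch below is only an illustration: it uses a list of lists in place of the xlrd sheet, and find_cell and col_letters are hypothetical helpers written here, not functions from xlrd or xlsxwriter.

```python
def col_letters(col):
    """Convert a zero-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def find_cell(grid, searched_value):
    """Return the A1-style address of the first match, scanning row by row.

    Returns None when the value is not present.
    """
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == searched_value:
                return f"{col_letters(c)}{r + 1}"
    return None


# Same data as the generated workbook: x, x*3 and x*4.5 in columns A to C
grid = [[x, x * 3, x * 4.5] for x in range(1, 1000, 11)]
print(find_cell(grid, 300))  # B10
print(find_cell(grid, -5))   # None
```

Returning None instead of -1 for "not found" is a design choice made in this sketch, so that a missing value cannot be mistaken for an address.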
Yes, win32com can call the same Range.Find method. The problem with your code is that you didn't specify a range: ws.Range without arguments is the accessor itself, not a Range object, so it has no Find attribute.
test_range= ws.Range.Find("Series ID") #<-----no range specified
Below is the correct use of Range and Find:
import win32com.client as client
excel= client.dynamic.Dispatch("Excel.Application")
excel.visible= True
wb= excel.workbooks.open(r"ExcelFile.xls")
ws= wb.worksheets('First')
test_range= ws.Range("A1")
### example if you want to find out the column of search result
ResultColumn= test_range.Find("Series ID").Column
print(str(ResultColumn))
I know this is a lot of code and there is a lot to do, but I am really stuck and don't know how to continue now that the program can find matching entries between the two files. I am pretty sure you know how a lookup in Excel works; this program does basically the same. I have commented the important parts and hope you can give me some help on how to continue this project. Thank you very much!
import pandas as pd
import xlrd
File1 = pd.read_excel("Excel_test.xlsx", usecols=[0], header=None, index=False) #the two excel files with the columns that should be compared
File2 = pd.read_excel("Excel_test02.xlsx", usecols=[0], header=None, index=False)
fullFile1 = pd.read_excel("Excel_test.xlsx", header=None, index=False)#the full excel files
fullFile2 = pd.read_excel("Excel_test02.xlsx", header=None, index=False)
i = 0
writer = pd.ExcelWriter("output.xlsx")
def loadingTime(): #just a loader that shows the percentage of the matching process
    global i
    loading = (i / len(File1)) * 100
    loading = round(loading, 2)
    print(str(loading) + "%/100%")
def matcher():
    global i
    while(i < len(File1)):#goes in column that should be compared and goes on higher if there is a match found in second file
        for o in range(len(File2)):#runs through the column in second file
            matching = File1.iloc[i].str.lower() == File2.iloc[o].str.lower() #matches the column contents of the two files
            if matching.bool() == True:
                print("Match")
                """
                df.append(File1.iloc[i])#the whole row of the matched column should be appended in Dataframe with the arrangement of excel file
                df.append(File2.iloc[o])#the whole row of the matched column should be appended in Dataframe with the arrangement of excel file
                """
        i += 1
matcher()
df.to_excel(writer, "Sheet")
writer.save() #After the two files have been compared to each other, now a file containing both excel contents and is also arranged correctly
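One possible way to continue, sketched here with plain Python lists standing in for the rows of the two workbooks: index the second file by its lower-cased key column once, then walk the first file and join each row with its matches. This is an assumption about the desired output (one combined row per match, as a lookup would give), and match_rows is a made-up helper name.

```python
def match_rows(rows1, rows2, key_col=0):
    """Join rows whose key columns are equal, ignoring case.

    Each result row is the row from rows1 followed by the matching
    row from rows2. Rows without a partner are left out.
    """
    # Build the lookup once instead of rescanning rows2 for every row
    index = {}
    for row in rows2:
        index.setdefault(str(row[key_col]).lower(), []).append(row)

    matched = []
    for row in rows1:
        for other in index.get(str(row[key_col]).lower(), []):
            matched.append(list(row) + list(other))
    return matched


file1 = [["Apple", 1], ["pear", 2], ["Kiwi", 3]]
file2 = [["PEAR", "green"], ["apple", "red"], ["plum", "blue"]]
for combined in match_rows(file1, file2):
    print(combined)
```

The combined rows could then be turned into a DataFrame and written with to_excel; the dictionary lookup also avoids the nested loop over both files.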
I'm trying to use the openpyxl module to take a spreadsheet, see if there are empty cells in a certain column (in this case, column E), and then copy the rows that contain those empty cells to a new spreadsheet. The code runs without traceback, but the resulting file won't open. What's going on?
Here's my code:
#import the openpyxl module
import openpyxl
#First create a new workbook & sheet
newwb = openpyxl.Workbook()
newwb.save('TESTINGTHISTHING.xlsx')
newsheet = newwb.get_sheet_by_name('Sheet')
#open the original file
wb = openpyxl.load_workbook('OriginalWorkbook.xlsx')
#create a sheet object
sheet = wb.get_sheet_by_name('Sheet1')
#Find out how many cells of a certain column are left blank,
#and what rows they're in
count = 0
listofrows = []
for row in range(2, sheet.get_highest_row() + 1):
    company = sheet['E' + str(row)].value
    if company == None:
        listofrows.append(row)
        count += 1
print listofrows
print count
#Put the values of the rows with blank company names into the new sheet
for i in range(len(listofrows)):
    j = 0
    newsheet['A' + str(i+1)] = sheet['A' + str(listofrows[j])].value
    j += 1
newwb.save('TESTINGTHISTHING.xlsx')
Please help!
I just ran your program with a mock document and was able to open the output file without any problem. Your issue probably lies with your Excel or openpyxl version.
Please provide your software versions and your source document so I can look further into the issue.
You can always update openpyxl by running this from c:\Python27\Scripts:
pip install openpyxl --upgrade