I am trying to overwrite a value in a given cell using openpyxl. I have two sheets. The first, Raw, is populated by API calls. The second, Data, is fed from the Raw sheet. The two sheets have exactly the same shape (columns/rows). I compare the two to see whether there is a bay assignment in Raw. If there is, I copy it to the Data sheet. If the value in that column is missing in both Raw and Data, I run a complex algorithm (irrelevant to this question) to assign a bay number.
I am having problems rewriting the Excel file with openpyxl.
Here's an example of my code:
import pandas as pd
from openpyxl import load_workbook

data_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayData')
raw_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayRaw')
# grab rows where there is no bay assignment in a specific column
no_bay_res = data_df[data_df['Bay assignment'].isnull()].reset_index()

book = load_workbook("Algo Build v23test.xlsx")
sheet = book["MondayData"]
for index, reservation in no_bay_res.iterrows():
    idx = int(reservation['index'])
    if pd.isna(raw_df.iloc[idx, 13]):
        continue
    else:
        value = raw_df.iat[idx, 13]
        data_df.iloc[idx, 13] = value
        # +2: openpyxl is 1-based and row 1 holds the headers
        sheet.cell(idx + 2, 14).value = int(value)
book.save("Algo Build v23test.xlsx")
book.close()
print(value)  # 302
The problem is that book.close() does not seem to work: book is still usable in Python afterwards. The file itself is overwritten fine. However, if I run these two lines again:
data_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayData')
raw_df = pd.read_excel('Algo Build v23test.xlsx', sheet_name='MondayRaw')
I get DataFrames full of null (NaN) values, except for the value that was replaced (see the attached image).
However, if I open the file manually in Excel, save it (Ctrl+S) and run the code again, it works properly. Very strange.
I need to loop the code above for Monday to Sunday, so it has to be able to read the data again without me manually re-saving the file.
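One possible way around re-reading is to read each day's sheets once and apply the Raw-to-Data rule in memory, writing to the workbook only at the end. Below is a minimal sketch of that rule in plain Python, so it does not depend on the pandas version. `fill_bays` and the `assign` callback are hypothetical names: `assign` stands in for the complex algorithm the question leaves out, and missing values are assumed to arrive as `None` or `NaN` (pandas reports empty cells as NaN).

```python
import math
from typing import Any, Callable, List, Sequence


def is_missing(value: Any) -> bool:
    """True for None or a float NaN, the two usual forms of an empty cell."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_bays(
    data_bays: Sequence[Any],
    raw_bays: Sequence[Any],
    assign: Callable[[int], Any],
) -> List[Any]:
    """Return a new bay column for the Data sheet.

    Rows that already have a bay in Data keep it; otherwise the Raw value is
    copied; if both are missing, assign(row_index) is called (hypothetical
    placeholder for the bay-assignment algorithm).
    """
    if len(data_bays) != len(raw_bays):
        raise ValueError("Raw and Data must have the same number of rows")
    filled = []
    for i, (data_value, raw_value) in enumerate(zip(data_bays, raw_bays)):
        if not is_missing(data_value):
            filled.append(data_value)
        elif not is_missing(raw_value):
            filled.append(raw_value)
        else:
            filled.append(assign(i))
    return filled


if __name__ == "__main__":
    data = [301, None, None, float("nan")]
    raw = [301, 302, None, None]
    print(fill_bays(data, raw, lambda i: f"algo-{i}"))
    # [301, 302, 'algo-2', 'algo-3']
```

With pandas you would pass `data_df['Bay assignment'].tolist()` and the matching Raw column (column 13 in the code above) and assign the result back. The same call can sit inside a loop over day names, assuming the other days follow the `MondayData`/`MondayRaw` naming.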
For some reason, pandas reads all formula cells as NaN after openpyxl has saved the file, until the file has been opened, saved and closed in Excel. The following code does that from within the script, but it is rather slow:
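import pandas as pd
import xlwings as xl

def df_from_excel(path, sheet_name):
    # open and re-save the file in a hidden Excel instance so that Excel
    # recalculates the formulas and stores their values in the file again
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path, sheet_name=sheet_name)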
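The likely reason for the NaNs: openpyxl does not calculate formulas, and when it loads a workbook with its formulas (the default, `data_only=False`) and saves it, the formula cells are written without their last calculated values. pandas reads those cached values, so the cells come back empty until Excel recalculates and saves the file, which is what Ctrl+S and the xlwings workaround do. The standard-library sketch below lists formula cells in one worksheet part that have no cached value. `formula_cells_without_cache` and `check_sheet` are hypothetical helper names, and the default part name `xl/worksheets/sheet1.xml` is an assumption (the file that belongs to a given sheet name is listed in `xl/_rels/workbook.xml.rels`).

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Union

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}


def formula_cells_without_cache(sheet_xml: Union[bytes, str]) -> List[str]:
    """Return the references (e.g. "C5") of formula cells with no cached value.

    A formula cell is a <c> element containing <f>. Excel stores the last
    result in a sibling <v>; after an openpyxl save that <v> is empty or
    absent. A formula whose real result is an empty string is also reported.
    """
    root = ET.fromstring(sheet_xml)
    missing = []
    for cell in root.iter(f"{{{MAIN_NS}}}c"):
        formula = cell.find("m:f", NS)
        cached = cell.find("m:v", NS)
        if formula is not None and (cached is None or not cached.text):
            missing.append(cell.get("r"))
    return missing


def check_sheet(path, part: str = "xl/worksheets/sheet1.xml") -> List[str]:
    """Open an .xlsx package (path or binary file object) and check one worksheet part."""
    with zipfile.ZipFile(path) as zf:
        return formula_cells_without_cache(zf.read(part))
```

Running `check_sheet("Algo Build v23test.xlsx")` right after the openpyxl save and again after a manual Ctrl+S should show whether missing cached values explain the difference.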
I had the same problem; the only workaround I found was to terminate excel.exe manually from Task Manager. After that, everything worked.
I tested openpyxl's .remove() function and it works on several empty files.
Problem: I have a more complex Excel file with multiple sheets that I need to remove. If I remove one or two, it works; when I try to remove three or more, Excel raises an error when I open the file:
"Sorry, we have troubles getting info in file [...]"
The logs talk about problems with pictures
and mention error105960_01.xml?
The strange thing is that it talks about pictures, but I don't get this error if I remove fewer than three sheets. And I'm not removing any sheets with images!
Even stranger, it's always about the number: each sheet can be deleted on its own without trouble, but if I remove three or more, Excel complains.
The thing is, it's fine when Excel "repairs" the "error", but sometimes Excel resets the formatting of the sheets (cell sizes, bold, character length, etc.) and everything breaks :(
(image: the broken formatting I want to avoid)
If anyone has an idea, I'm running out of creativity!
For the code, I only use basic functions (simplified here, since the full version would take too long to present):
import openpyxl

INPUT_EXCEL_PATH = "my_excel.xlsx"
OUTPUT_EXCEL_PATH = "new_excel.xlsx"
wb = openpyxl.load_workbook(INPUT_EXCEL_PATH)
ws = wb["sheet1"]
wb.remove(ws)
ws = wb["sheet2"]
wb.remove(ws)
ws = wb["sheet3"]
wb.remove(ws)
wb.save(OUTPUT_EXCEL_PATH)
In my case it was a leftover empty CalculationChainPart. I used DocxToSource to investigate the corrupted file. Excel will attempt to fix the file on load; save the repaired file and compare its structure to the original file. To delete descendant parts, you can use the DeletePart() method.
using DocumentFormat.OpenXml.Packaging;

// document is the path (or stream) of the workbook to fix
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(document, true)) {
    // in a spreadsheet, the calculation chain hangs off the WorkbookPart
    WorkbookPart workbookPart = doc.WorkbookPart;
    if (workbookPart.CalculationChainPart != null) {
        workbookPart.DeletePart(workbookPart.CalculationChainPart);
    }
}
The CalculationChainPart can also be removed at any time.
While calculation chain information can be loaded by a spreadsheet application, it is not required. A calculation chain can be constructed in memory at load-time (source)
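For anyone not using C#, the same idea (dropping the calculation chain) can be sketched with Python's standard library by rewriting the package. This is a hedged sketch, not a tested repair tool: `strip_calc_chain` is a hypothetical name, it assumes the usual part locations (`xl/calcChain.xml`, `[Content_Types].xml`, `xl/_rels/workbook.xml.rels`), and it edits the two XML files with regular expressions, which is adequate for this narrow job but is not a general XML editor. Work on a copy of the file.

```python
import re
import zipfile

CALC_CHAIN_PART = "xl/calcChain.xml"
# The <Override> entry that declares the part's content type.
CONTENT_TYPE_ENTRY = re.compile(rb'<Override[^>]*PartName="/xl/calcChain\.xml"[^>]*/>')
# The workbook relationship that points at the part (relative or absolute target).
RELATIONSHIP_ENTRY = re.compile(rb'<Relationship[^>]*calcChain\.xml"[^>]*/>')


def strip_calc_chain(src, dst) -> bool:
    """Copy the package at src to dst without its calculation chain.

    src and dst may be paths or binary file objects, but must not be the same
    file. Returns True if a calcChain part was found and dropped. All other
    parts are copied as-is, apart from the two entries that referenced it.
    """
    found = False
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CALC_CHAIN_PART:
                found = True
                continue  # leave the part itself out of the new package
            if item.filename == "[Content_Types].xml":
                data = CONTENT_TYPE_ENTRY.sub(b"", data)
            elif item.filename == "xl/_rels/workbook.xml.rels":
                data = RELATIONSHIP_ENTRY.sub(b"", data)
            zout.writestr(item, data)
    return found
```

For the openpyxl script in the question this would run after `wb.save(OUTPUT_EXCEL_PATH)`, for example `strip_calc_chain("new_excel.xlsx", "new_excel_fixed.xlsx")`.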
I'm trying to create a Python script (I'm using Python 3.7.3 with UTF-8 encoding on Windows 10 64-bit with Microsoft Office 365) that exports user-selected worksheets to PDF after the user has selected the Excel files.
The Excel files contain a lot of different page setup settings, and each worksheet in each file has a different page setup.
I therefore need to read all the current page setup values so that I can pass them on to the export.
The problem is getting Excel to return the current print area of the worksheet, which I can't figure out.
As far as I understand, I need to be able to read the current print area to be able to set it for the export.
The Excel files are a mixture of ".xlsx" and ".xlsm".
I've tried all kinds of different methods from the Excel VBA documentation, e.g. adding ".Range" and ".Address", but nothing has worked so far.
I've also tried ".UsedRange", but there is nothing distinctive in the cells that I could search for, and I can't format them in a specific way, so I can't use this.
I've also tried the "IgnorePrintAreas = False" argument of the "ExportAsFixedFormat" function, but that didn't work either.
#This is some of the script.
#I've left out irrelevant parts (dialog boxes etc.) to keep it short; wb_path and pdf_path are set there
#Import pywin32 and open Excel and selected workbook.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch("Excel.Application")
excel.Visible = False
wb = excel.Workbooks.Open(wb_path)
#Select the 1st worksheet in the workbook
#This is just used for testing
wb.Sheets([1]).Select()
#This is the line I can't get to work
ps_prar = wb.ActiveSheet.PageSetup.PrintArea
#This is just used to test if I get the print area
print(ps_prar)
#This is exporting the selected worksheet to PDF
wb.Sheets([1]).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, pdf_path, Quality = 0, IncludeDocProperties = True, IgnorePrintAreas = False, OpenAfterPublish = True)
#This closes the workbook and Excel (although Excel sometimes still shows up in Task Manager)
wb.Close()
wb = None
excel.Quit()
excel = None
If I leave the code as above and open a test Excel file (.xlsx) with a small print area (A1:H8), the print statement just outputs a blank line.
If I append something to .PrintArea (as mentioned above), I get one of two errors:
"TypeError: 'str' object is not callable".
or
"ps_prar = wb.ActiveSheet.PageSetup.PrintArea.Range
AttributeError: 'str' object has no attribute 'Range'"
I'm hoping someone can help me with this. Thanks in advance.
Try
wb = excel.Workbooks.OpenXML(wb_path)
instead of
wb = excel.Workbooks.Open(wb_path)
My problem was with a German version of MS Office. It works now. See here: https://social.msdn.microsoft.com/Forums/de-DE/3dce9f06-2262-4e22-a8ff-5c0d83166e73/excel-api-interne-namen?forum=officede
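`PageSetup.PrintArea` is a plain string property (for example `$A$1:$H$8`, or an empty string when no print area is set), which is why calling it or asking it for `.Range` fails with the two errors above. To cross-check what is actually stored in the file independently of Excel, a print area is saved in `xl/workbook.xml` as a defined name called `_xlnm.Print_Area`, whose `localSheetId` is the 0-based position of the sheet it belongs to. A minimal standard-library sketch, assuming the workbook part sits at the usual `xl/workbook.xml` path; both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Union

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def print_areas_from_workbook_xml(workbook_xml: Union[bytes, str]) -> Dict[str, str]:
    """Map sheet name -> stored print area, e.g. {"Sheet1": "Sheet1!$A$1:$H$8"}."""
    root = ET.fromstring(workbook_xml)
    # <sheets> lists every sheet in tab order; localSheetId indexes into it.
    sheet_names = [s.get("name") for s in root.findall("m:sheets/m:sheet", NS)]
    areas = {}
    for defined in root.findall("m:definedNames/m:definedName", NS):
        local_id = defined.get("localSheetId")
        if defined.get("name") == "_xlnm.Print_Area" and local_id is not None:
            areas[sheet_names[int(local_id)]] = defined.text
    return areas


def print_areas(path) -> Dict[str, str]:
    """Read print areas straight from an .xlsx/.xlsm package without starting Excel."""
    with zipfile.ZipFile(path) as zf:
        return print_areas_from_workbook_xml(zf.read("xl/workbook.xml"))
```

A sheet that is missing from the returned dictionary has no print area stored in the file.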
I've created a scraper which is supposed to parse some documents from a webpage and save them to an Excel file with two sheets. However, when I run it, it only saves the documents from the last link in a single sheet, whereas there should be two sheets with the documents from both links. I printed the results to see what is happening in the background, but found nothing wrong there. I think the first sheet is overwritten and the second one is never created. How can I get around this so that the data is saved in two sheets in one Excel file? Thanks in advance for taking a look.
Here is my code:
import requests
from lxml import html
from pyexcel_ods3 import save_data

name_list = ['Altronix','APC']

def docs_parser(link, name):
    res = requests.get(link)
    root = html.fromstring(res.text)
    vault = {}
    for post in root.cssselect(".SubBrandList a"):
        if post.text == name:
            refining_docs(post.attrib['href'], vault)

def refining_docs(new_link, vault):
    res = requests.get(new_link).text
    root = html.fromstring(res)
    sheet = root.cssselect("#BrandContent h2")[0].text
    for elem in root.cssselect(".ProductDetails"):
        name_url = elem.cssselect("a[class]")[0].attrib['href']
        vault.setdefault(sheet, []).append([str(name_url)])
        save_data("docs.ods", vault)

if __name__ == '__main__':
    for name in name_list:
        docs_parser("http://store.immediasys.com/brands/" , name)
But when I write the same kind of code for another site, it works as expected, creating different sheets and saving documents in them. Here is the link:
https://www.dropbox.com/s/bgyh1xxhew8hcvm/Pyexcel_so.txt?dl=0
Question: I think the first sheet is overwritten and the second one is never created. How can I get around this so that the data is saved in two sheets in one Excel file?
You overwrite the workbook file every time a link is appended.
You should never call save_data(...) inside a loop; call it only once, at the end of your script, with a single dictionary that collects every sheet. (In the question's code, docs_parser also creates a fresh vault = {} for each name, so every save writes only the current name's sheet.)
Comparing your two scripts, there is no difference: both behave the same way, overwriting the workbook file again and again. Maybe the file I/O gets overloaded, as you overwrite the workbook file more than 160 times within a short time.
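A minimal sketch of that fix, assuming the scraping itself stays as in the question: one dictionary is shared across all brand names, and the file is written a single time after the loop. To keep it independent of the website, the fetching step is passed in as a function. `collect_and_save` and `fetch_sheet` are hypothetical names, and `save` is expected to behave like `pyexcel_ods3.save_data(filename, data)`, which writes one sheet per dictionary key.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Rows = List[List[str]]


def collect_and_save(
    names: Iterable[str],
    fetch_sheet: Callable[[str], Tuple[str, List[str]]],
    save: Callable[[str, Dict[str, Rows]], None],
    path: str = "docs.ods",
) -> Dict[str, Rows]:
    """Gather every brand into one dict, then write the file exactly once.

    fetch_sheet(name) is a hypothetical callback returning (sheet_title, urls),
    e.g. the <h2> heading and the product links scraped for that brand.
    save(path, data) should behave like pyexcel_ods3.save_data.
    """
    vault: Dict[str, Rows] = {}  # shared across all names, unlike a per-call dict
    for name in names:
        sheet, urls = fetch_sheet(name)
        for url in urls:
            vault.setdefault(sheet, []).append([str(url)])
    save(path, vault)  # called once, after the loop
    return vault
```

With the question's code, `fetch_sheet` would wrap the `requests`/`lxml` part of `docs_parser` and `refining_docs`, and the call would be `collect_and_save(name_list, fetch_sheet, save_data)`.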
The first script should create 13 sheets:
data sheet:powerpivot-etc links:20
data sheet:flappy-owl-videos links:1
data sheet:reporting-services-videos links:20
data sheet:csharp links:14
data sheet:excel-videos links:9
data sheet:excel-vba-videos links:20
data sheet:sql-server-videos links:9
data sheet:report-builder-2016-videos links:4
data sheet:ssrs-2016-videos links:5
data sheet:sql-videos links:20
data sheet:integration-services links:19
data sheet:excel-vba-user-form links:20
data sheet:archived-videos links:16
The second script should create 2 sheets:
vault sheet:Altronix links:16
vault sheet:APC links:16
I am trying to scrape an .xlsx Excel file for chart objects and export them as images. The only similar Stack Overflow question I found was this one, which attempts to do the same thing. The script, however, does not seem to work (even when I correct the syntax/methods).
I am willing to get this running in either Python 2.7.9 or 3.4.0, as I have both versions installed on my computer.
Here is the code I am working with:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'C:\Users\Emilyn\Desktop\chartTest.xlsx')
excel.Visible = True
wb.Sheets("Sheet1").Select()
wbSheetOne = wb.Sheets(1)
wb.DisplayAlerts = False
i = 0
for chart in wbSheetOne.ChartObjects():
    print(chart.Name)
    chart.CopyPicture()
    excel.ActiveWorkbook.Sheets.Add(After=excel.ActiveWorkbook.Sheets(3)).Name = "temp_sheet" + str(i)
    temp_sheet = wb.ActiveSheet
    cht = wb.ActiveSheet.ChartObjects().Add(0, 0, 800, 600)
    cht.Chart.Export("chart" + str(i) + ".png")
    i = i + 1
excel.ActiveWorkbook.close
wb.DisplayAlerts = True
This opens my Excel file, generates three .png images in my Documents folder and creates three new worksheets for the images, but the images are all blank. I'm not sure what I can do to get the chart objects in my Excel file copied correctly into these new images.
Any help would be greatly appreciated, as there seems to be no in-depth documentation on pywin32/win32com anywhere.
I've been searching the internet like mad and trying to get this to work for a day or two now. It's hard to get something working when you don't know all the available methods, or even what some of them do.
(Yes, I have read all the "read me" files that came with the library and what they offer on their website as well.)
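I already figured out what to do, but I'll post it for future users.
# wbSheet is the worksheet that holds the charts, e.g. wb.Sheets(1)
count = wbSheet.ChartObjects().Count
for index in range(1, count + 1):
    currentChart = wbSheet.ChartObjects(index)
    # Chart.Export renders the chart straight to an image file, so no Copy is needed;
    # a relative file name ends up in Excel's current folder (often Documents)
    currentChart.Chart.Export("chart" + str(index) + ".png")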
I used count as the upper bound of the for loop, so the number of chart objects in the Excel file is read dynamically.
Also, I started the range at 1 because collections in Excel's object model (as seen from VBA) are indexed from 1, not 0.
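If you only need to know how many charts a file contains before starting Excel (for example to sanity-check `count`), the chart definitions in an .xlsx are stored as `xl/charts/chartN.xml` parts inside the zip package. A hedged, standard-library sketch: `chart_parts` is a hypothetical name, it counts charts across the whole workbook rather than per sheet, and it cannot produce images, so rendering still needs Excel's `Chart.Export`.

```python
import re
import zipfile
from typing import List

# Chart definitions only; style/colour parts and _rels files do not match.
CHART_PART = re.compile(r"^xl/charts/chart(\d+)\.xml$")


def chart_parts(path) -> List[str]:
    """Return the chart part names in an .xlsx package, in numeric order."""
    with zipfile.ZipFile(path) as zf:
        matches = [m for m in map(CHART_PART.match, zf.namelist()) if m]
    # Sort by the number so that chart10 comes after chart2.
    matches.sort(key=lambda m: int(m.group(1)))
    return [m.group(0) for m in matches]
```

`len(chart_parts(r'C:\Users\Emilyn\Desktop\chartTest.xlsx'))` would then give the number of charts that the COM loop is expected to export.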
You copy an existing chart as a picture, but don't do anything with it.
You insert a worksheet, without adding any data, then you embed a chart in that worksheet, which must be blank since there's no data to display, then export the blank chart as a png file.
I've got a really strange problem. I'm trying to read some data from an Excel file, but the nrows property has the wrong value: although my file has a lot of rows, it just returns 2.
I'm working in PyDev (Eclipse). I don't know what the actual problem is; everything looks fine.
When I try to access other rows by index manually, I get an IndexError.
I'd appreciate any help.
In case it helps, here's my code:
import xlrd

def get_data_form_excel(address):
    wb = xlrd.open_workbook(address)
    profile_data_list = []
    for s in wb.sheets():
        for row in range(s.nrows):
            if row > 0:
                values = []
                for column in range(s.ncols):
                    values.append(str(s.cell(row, column).value))
                profile_data_list.append(values)
    print(str(profile_data_list))
    return profile_data_list
To make sure your file is not corrupt, try another file; I doubt xlrd is buggy.
Also, I've cleaned up your code a bit. For example, the if row > 0 check is unnecessary, because you can iterate over range(1, sheet.nrows) in the first place.
import xlrd

def get_data_form_excel(address):
    # this returns a generator, not a list; you can iterate over it as normal,
    # but if you need a list, convert the return value to one using list()
    for sheet in xlrd.open_workbook(address).sheets():
        for row in range(1, sheet.nrows):
            yield [str(sheet.cell(row, col).value) for col in range(sheet.ncols)]

or

def get_data_form_excel(address):
    # you can make this function also use a (lazily evaluated) generator instead
    # of a list by changing the brackets to normal parentheses.
    return [
        [str(sheet.cell(row, col).value) for col in range(sheet.ncols)]
        for sheet in xlrd.open_workbook(address).sheets()
        for row in range(1, sheet.nrows)
    ]
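To see what the generator version does without an Excel file, here is a small stand-in for an xlrd sheet. `FakeSheet` is hypothetical and only mimics the three things the function uses (`nrows`, `ncols`, `cell(row, col).value`); the generator body is the same logic as above, but it takes the sheets directly instead of a file path. It also shows the one practical difference from the list version: a generator can be consumed only once.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, List, Sequence


class FakeSheet:
    """Hypothetical stand-in exposing the parts of xlrd's Sheet used here."""

    def __init__(self, rows: Sequence[Sequence[Any]]):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, row: int, col: int) -> SimpleNamespace:
        return SimpleNamespace(value=self._rows[row][col])


def rows_after_header(sheets: Iterable[Any]) -> Iterator[List[str]]:
    """Yield every row except the first (header) row of each sheet, as strings."""
    for sheet in sheets:
        for row in range(1, sheet.nrows):
            yield [str(sheet.cell(row, col).value) for col in range(sheet.ncols)]


if __name__ == "__main__":
    sheets = [FakeSheet([["name", "age"], ["Ann", 31], ["Bob", 42]]),
              FakeSheet([["id"], [7]])]
    rows = rows_after_header(sheets)
    print(list(rows))  # [['Ann', '31'], ['Bob', '42'], ['7']]
    print(list(rows))  # [] (the generator is already exhausted)
```

If xlrd reports `nrows == 2`, this loop yields exactly one row per sheet, so the missing rows come from how the file is read, not from the loop.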
After trying some other files, I'm sure it's the file, and I think it's related to differences between Microsoft Office 2003 and 2007.
I recently ran into this problem too. I'm trying to read an Excel file, and the row count given by xlrd's nrows is less than the actual one. As Zeinab Abbasi said, I tried other files and they worked fine.
Finally, I found the difference: there's a VB-script-based button embedded in the failing file, which is used to download and append records to the current sheet.
Then I tried to convert the file to .xlsx format, but Excel asked me to save it in a macro-enabled format instead, e.g. .xlsm. With that file, xlrd's nrows gives the correct value.
Is your Excel file using external data? I just had the same problem and found a fix. I was using Excel to get data from a Google Sheet and wanted Python to show me that data. The fix for me was going to DATA > Connections (in "Get External Data") > Properties and unchecking "Remove data from the external data range before saving the workbook".