I created this topic (Delete lines found in file with many lines) and people suggested using the xlrd package. I used it and could interact with the file, but I could not compare the contents of a cell with a string.
Here's my code:
import xlrd
arquivo = xlrd.open_workbook('/media/IRREMOVIVEL/arquivo.xls',)
planilha = arquivo.sheet_by_index(0)
def lerPlanilha():
    for i in range(planilha.ncols):
        if (planilha.cell(8,9) == "2010"):
            print 'it works =>'
            break
        else:
            print 'not works'
            break
lerPlanilha()
But it always prints: not works
Sorry if this is a duplicate, and sorry for my bad English.
The xlrd.sheet.Sheet.cell method returns an xlrd.sheet.Cell instance, which represents the cell; the data itself is stored in its value attribute. So something like sheet.cell(x, y).value should work.
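A minimal sketch of that fix, with one more wrinkle worth checking: xlrd returns every numeric cell as a Python float, so if the cell holds the number 2010 rather than the text "2010", .value is 2010.0 and still won't equal "2010". normalise_cell_value and cell_equals are hypothetical helper names, and the sheet argument is assumed to be an xlrd Sheet such as planilha from the question.

```python
def normalise_cell_value(value):
    """Turn an xlrd cell value into text that can be compared with a string.

    xlrd stores numeric cells as floats, so a year typed into Excel as a
    number comes back as 2010.0 rather than "2010".
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))  # 2010.0 -> "2010"
    return str(value).strip()   # text cells: drop stray surrounding spaces


def cell_equals(sheet, row, col, expected):
    """Compare the cell at (row, col) with an expected string.

    `sheet` is assumed to be an xlrd Sheet: sheet.cell(row, col) returns a
    Cell object whose data lives in its .value attribute.
    """
    return normalise_cell_value(sheet.cell(row, col).value) == expected
```

With the question's variables this would read `if cell_equals(planilha, 8, 9, "2010"): print("it works")`.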
As for deleting: you can't modify a document using xlrd. Use xlwt to write Excel files, and xlutils to read, modify and write a document. A little googling gives you something like http://www.python-excel.org/
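Because xlrd only reads and xlwt only writes, one way to "delete" lines, sketched here under the assumption that losing cell formatting is acceptable, is to read the rows with xlrd and copy only the ones you want to keep into a new .xls file with xlwt. keep_rows and copy_without_rows are hypothetical helpers, and should_delete stands for whatever test you use to pick the lines to drop.

```python
def keep_rows(rows, should_delete):
    """Return only the rows for which should_delete(row) is False."""
    return [row for row in rows if not should_delete(row)]


def copy_without_rows(src_path, dst_path, should_delete):
    """Copy the first sheet of an .xls file into a new file, minus some rows.

    Needs xlrd (reading) and xlwt (writing). Only cell values are copied;
    formatting is not preserved, and xlwt writes the old .xls format.
    """
    import xlrd
    import xlwt

    src = xlrd.open_workbook(src_path).sheet_by_index(0)
    rows = [src.row_values(r) for r in range(src.nrows)]

    out = xlwt.Workbook()
    dst = out.add_sheet(src.name)
    # Row indexes are renumbered, so the kept rows close up the gaps.
    for r, row in enumerate(keep_rows(rows, should_delete)):
        for c, value in enumerate(row):
            dst.write(r, c, value)
    out.save(dst_path)
```

For example, copy_without_rows("in.xls", "out.xls", lambda row: row[0] == "DELETE") would drop every row whose first cell is the text DELETE (file names and marker are hypothetical).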
Forgive me if this is an idiotic question. I'm new to coding and wanted to automate part of my workflow.
I'm enjoying the puzzle, so I won't ask too many questions, but I'm stuck on this.
Every time an order comes in, I have to copy data from raw Excel files into a template.
I want to replace the three headers at the top of this page with variables I've already extracted from the raw Excel data.
[screenshot of the current template]
so that it would look like this on every page:
[screenshot of the desired header]
In every tutorial I've seen, the "header" is just row 1.
I think xlsxwriter can change those headers, but it looks like that only works on new worksheets.
df1.to_clipboard(index=False, header=False) #Copies df1 to clipboard (BOM Data)
ws.Range("A2").Select()
ws.PasteSpecial(Format='Unicode Text') # Paste as text in template
# So at this point I guess I'm using pywin32 to copy and paste, but have to switch back to xlsxwriter to change the header?
wb = xlsxwriter.Workbook(r'C:\Users\jfras\Desktop\Auto BOM\PARKER BOM TEMPLATE.xlsx')
ws = wb.Worksheets(1)
header1 = '&CTest Entry'
Your question is a little unclear: the screenshots you attached look like they come from Word. It seems like you are trying to automate moving data from Excel into a Word document template; is that correct?
If I understand correctly, you will need one Python package to read your Excel document, then another to insert that data into a parameterized Word template. Here is an article explaining exactly that.
In a nutshell, using openpyxl (or presumably any Python Excel reader of your choosing) you would read the Excel sheet, then "plug in" your data into a Word template using something like python-docx. The article linked above contains code snippets explaining this process in more detail.
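A minimal sketch of the "plug-in" step, assuming the Word template marks each field with a {{name}} placeholder (a convention chosen for this example, not something python-docx defines). fill_placeholders and fill_docx_template are hypothetical helpers; only the python-docx calls (Document, .paragraphs, .sections, .header, .runs, .save) come from that library. The values dict would hold whatever you already read from the Excel sheet.

```python
import re

# Placeholder syntax assumed for this sketch: {{name}}, spaces inside allowed.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(text, values):
    """Replace each {{name}} in text with values[name]; unknown names stay as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return PLACEHOLDER.sub(substitute, text)


def fill_docx_template(template_path, output_path, values):
    """Fill {{name}} placeholders in a Word template's body and page headers.

    Requires python-docx. Works run by run, so a placeholder that Word has
    split across several runs (e.g. after partial re-formatting) is missed.
    """
    from docx import Document

    doc = Document(template_path)
    paragraphs = list(doc.paragraphs)
    for section in doc.sections:
        paragraphs.extend(section.header.paragraphs)  # the per-page header
    for paragraph in paragraphs:
        for run in paragraph.runs:
            run.text = fill_placeholders(run.text, values)
    doc.save(output_path)
```

Replacing text run by run keeps each run's formatting, which is why this sketch does not simply assign paragraph.text; tables are not handled here.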
I hope I understood your question right. If so, something like this code below may work:
import xlsxwriter
workbook = xlsxwriter.Workbook('teste.xlsx')
worksheet = workbook.add_worksheet()
worksheet.set_header('&L P10853' + '&CTEST OBJECT' + '&RUN_28583')
workbook.close()
Of course, if you just run this code you will end up with an empty sheet that prints nothing until you fill at least one cell.
Anyway, the key part of the code is the set_header() method, which does what we want. In the header string, &L sets the left section, &C the center section and &R the right section. You can see more in https://xlsxwriter.readthedocs.io/example_headers_footers.html
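Since the three header values are variables already extracted from the raw data, a small helper can assemble the header string. build_header is a hypothetical helper; it uses the &L/&C/&R codes described above and doubles any & in the text to &&, which is how Excel's header syntax (as documented for XlsxWriter) represents a literal ampersand.

```python
def build_header(left="", center="", right=""):
    """Build an Excel header/footer string for xlsxwriter's set_header().

    &L, &C and &R start the left, center and right sections. A literal
    ampersand in the text is doubled (&&) so Excel does not read it as the
    start of a control code.
    """
    sections = []
    for code, text in (("&L", left), ("&C", center), ("&R", right)):
        if text:  # skip empty sections entirely
            sections.append(code + str(text).replace("&", "&&"))
    return "".join(sections)
```

Usage would be worksheet.set_header(build_header(part_number, description, run_id)), with those three variable names hypothetical. Note that in the snippet above, '&RUN_28583' is read as the code &R followed by the text UN_28583; if the right section should read RUN_28583, the string needs '&RRUN_28583', which is what build_header produces. Also, XlsxWriter only creates new files; to edit the existing template instead, openpyxl worksheets expose ws.oddHeader.left.text, ws.oddHeader.center.text and ws.oddHeader.right.text.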
I tested the openpyxl .remove() function and it works on multiple empty files.
Problem: I have a more complex Excel file with multiple sheets that I need to remove. If I remove one or two, it works; when I try to remove three or more, Excel raises an error when I open the file.
The message is something like: "Sorry, we have troubles getting info in file bla bla....."
The logs talk about problems with pictures.
The logs also mention error105960_01.xml?
The strange thing is that it talks about picture problems, but I don't get this error unless I remove three or more sheets. And I'm not removing any sheet with images!
Even stranger, it's always about the number: each sheet can be deleted without trouble, but if I remove three or more, Excel yells at me.
It's fine when Excel "repairs" the "error", but sometimes Excel resets the formatting of the sheets (cell sizes, bold, character lengths, etc.) and everything breaks :(
[screenshot of the broken formatting I want to avoid]
If anyone has an idea, let me know; I'm running out of creativity!
The code only uses basic functions (simplified here, since presenting more would take too long):
import openpyxl

INPUT_EXCEL_PATH = "my_excel.xlsx"
OUTPUT_EXCEL_PATH = "new_excel.xlsx"
wb = openpyxl.load_workbook(INPUT_EXCEL_PATH)
ws = wb["sheet1"]
wb.remove(ws)
ws = wb["sheet2"]
wb.remove(ws)
ws = wb["sheet3"]
wb.remove(ws)
wb.save(OUTPUT_EXCEL_PATH)
In my case it was a leftover empty CalculationChainPart. I used DocxToSource to investigate the corrupted file. Excel will attempt to fix the file on load; save the repaired file and compare its structure to the original file. To delete descendant parts you can use the DeletePart() method.
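using DocumentFormat.OpenXml.Packaging;

// A spreadsheet package exposes WorkbookPart (MainDocumentPart belongs to Word documents).
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(document, true)) {
    WorkbookPart workbookPart = doc.WorkbookPart;
    // Delete the calculation chain; Excel can rebuild it when the file is loaded.
    if (workbookPart.CalculationChainPart != null) {
        workbookPart.DeletePart(workbookPart.CalculationChainPart);
    }
}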
The CalculationChainPart can also be removed at any time:
While calculation chain information can be loaded by a spreadsheet application, it is not required. A calculation chain can be constructed in memory at load-time (source)
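For the "compare its structure" step without DocxToSource: an .xlsx file is a ZIP package, so Python's standard zipfile module can list the parts each file contains. This is only a rough stand-in, assuming part names are enough to spot the difference; compare_packages is a hypothetical helper, and it does not diff the XML inside the parts.

```python
import zipfile


def list_parts(path):
    """Return the set of part names (files) stored inside an .xlsx package."""
    with zipfile.ZipFile(path) as package:
        return set(package.namelist())


def compare_packages(original_path, repaired_path):
    """Report which parts exist in only one of two .xlsx files."""
    original = list_parts(original_path)
    repaired = list_parts(repaired_path)
    return {
        "only_in_original": sorted(original - repaired),
        "only_in_repaired": sorted(repaired - original),
    }
```

For example, compare_packages("new_excel.xlsx", "repaired.xlsx") would compare the openpyxl output with a copy that Excel repaired and re-saved (the second file name is hypothetical). A part such as xl/calcChain.xml appearing only in the original would point at the same kind of leftover described above.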
I have a really strange problem. I'm trying to read some data from an Excel file, but the nrows property has the wrong value: although my file has a lot of rows, it returns just 2.
I'm working in PyDev (Eclipse). I don't know what the actual problem is; everything looks fine.
When I try to access other rows manually by index, I get an index error.
I appreciate any help.
If it helps, here's my code:
import xlrd

def get_data_form_excel(address):
    wb = xlrd.open_workbook(address)
    profile_data_list = []
    for s in wb.sheets():
        for row in range(s.nrows):
            if row > 0:
                values = []
                for column in range(s.ncols):
                    values.append(str(s.cell(row, column).value))
                profile_data_list.append(values)
    print str(profile_data_list)
    return profile_data_list
To make sure your file is not corrupt, try with another file; I doubt xlrd is buggy.
Also, I've cleaned up your code to look a bit nicer. For example, the if row > 0 check is unnecessary because you can iterate over range(1, sheet.nrows) in the first place.
import xlrd

def get_data_form_excel(address):
    # this returns a generator not a list; you can iterate over it as normal,
    # but if you need a list, convert the return value to one using list()
    for sheet in xlrd.open_workbook(address).sheets():
        for row in range(1, sheet.nrows):
            yield [str(sheet.cell(row, col).value) for col in range(sheet.ncols)]
or
import xlrd

def get_data_form_excel(address):
    # you can make this function also use a (lazily evaluated) generator instead
    # of a list by changing the brackets to normal parentheses.
    return [
        [str(sheet.cell(row, col).value) for col in range(sheet.ncols)]
        for sheet in xlrd.open_workbook(address).sheets()
        for row in range(1, sheet.nrows)
    ]
After trying some other files, I'm sure the problem is in the file itself, and I think it's related to differences between Microsoft Office 2003 and 2007.
I recently ran into this problem too. I was trying to read an Excel file, and the row count given by xlrd's nrows was less than the actual number of rows. As Zeinab Abbasi said, I tried other files and they worked fine.
Finally, I found the difference: the failing file had an embedded VB-script-based button, used to download records and append them to the current sheet.
Then I tried to convert the file to .xlsx format, but Excel asked me to save it in a macro-enabled format instead, e.g. .xlsm. This time xlrd's nrows gave the correct value.
Is your Excel file using external data? I just had the same problem and found a fix. I was using Excel to get data from a Google Sheet, and I wanted Python to show me that data. The fix for me was going to Data > Connections (in "Get External Data") > Properties and unchecking "Remove data from the external data range before saving the workbook".
I'm almost an absolute beginner in Python, but I've been asked to handle a difficult task. I have read many tutorials and found some very useful tips on this website, but I don't think this question has been asked yet, or at least not in the way I phrased it in the search engine.
I have managed to write some URLs to a CSV file. Now I would like to write a script that opens this file, opens the URLs, and writes their content into a dictionary. But I have failed: my script can print these addresses, but cannot process the file.
Interestingly, my script did not raise the same error message each time. Here is the last one:
req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
So I think my script has several problems:
1. Is my method of opening URLs the right one?
2. What is wrong with the way I build the dictionary?
Here is my attempt. Thanks in advance to anyone who can help!
import csv
import urllib
dict = {}
test = csv.reader(open("read.csv","rb"))
for z in test:
    sock = urllib.urlopen(z)
    source = sock.read()
    dict[z] = source
    sock.close()
print dict
First, don't shadow built-ins. Rename your dictionary to something else, as dict is used to create new dictionaries.
Second, the csv reader creates a list per line containing all the columns. Either reference the column explicitly, as in urllib.urlopen(z[0]) # first column in the line, or open the file with a normal open() and iterate through it.
Apart from that, it works for me.
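A Python 3 sketch applying both fixes: it passes only the first column to urlopen (handing it the whole row, a list, is what produces errors like the AttributeError above) and stores results in a dictionary not named dict. In Python 3, urlopen lives in urllib.request and CSV files are opened in text mode with newline="" rather than "rb". fetch_urls_from_csv is a hypothetical name and the 10-second timeout is an arbitrary choice.

```python
import csv
import urllib.request


def fetch_urls_from_csv(csv_path, timeout=10):
    """Read one URL per CSV line (first column) and return {url: page bytes}."""
    pages = {}  # not called `dict`, so the built-in stays usable
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # blank lines come through as empty lists
            url = row[0].strip()  # first column only, not the whole row
            with urllib.request.urlopen(url, timeout=timeout) as response:
                pages[url] = response.read()
    return pages
```

print(fetch_urls_from_csv("read.csv")) would then show the same kind of dictionary the original script was aiming for, keyed by URL strings (which, unlike lists, can be dictionary keys).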
Using the python module xlwt, writing to the same cell twice throws an error:
Message        File Name                                          Line
Traceback
  <module>     S:\********
  write        C:\Python26\lib\site-packages\xlwt\Worksheet.py    1003
  write        C:\Python26\lib\site-packages\xlwt\Row.py          231
  insert_cell  C:\Python26\lib\site-packages\xlwt\Row.py          150
Exception: Attempt to overwrite cell: sheetname=u'Sheet 1' rowx=1 colx=12
The exception is raised by this code in xlwt's Row.py:
def insert_cell(self, col_index, cell_obj):
    if col_index in self.__cells:
        if not self.__parent._cell_overwrite_ok:
            msg = "Attempt to overwrite cell: sheetname=%r rowx=%d colx=%d" \
                % (self.__parent.name, self.__idx, col_index)
            raise Exception(msg) # line 150
        prev_cell_obj = self.__cells[col_index]
        sst_idx = getattr(prev_cell_obj, 'sst_idx', None)
        if sst_idx is not None:
            self.__parent_wb.del_str(sst_idx)
    self.__cells[col_index] = cell_obj
It looks like the code raises an exception, which halts the entire process. Is removing the raise statement enough to allow overwriting cells? I appreciate xlwt's warning, but I thought the Pythonic way is to assume "we know what we're doing". I don't want to break anything else by touching the module.
The problem is that overwriting of worksheet data is disabled by default in xlwt. You have to allow it explicitly, like so:
worksheet = workbook.add_sheet("Sheet 1", cell_overwrite_ok=True)
What Ned B. has written is valuable advice -- except for the fact that as xlwt is a fork of pyExcelerator, "author of the module" is ill-defined ;-)
... and Kaloyan Todorov has hit the nail on the head.
Here's some more advice:
(1) Notice the following line in the code that you quoted:
if not self.__parent._cell_overwrite_ok:
and search the code for _cell_overwrite_ok and you should come to Kaloyan's conclusion.
(2) Ask questions on (and search the archives of) the python-excel google-group
(3) Check out this site which gives pointers to the google-group and to a tutorial.
Background: the problem was that some people didn't know what they were doing (and in at least one case were glad to be told). The behaviour that xlwt inherited from pyExcelerator was to blindly write two (or more) records for the same cell, which led not only to file bloat but also to confusion: Excel would complain and show the first value written, while OpenOffice and Gnumeric would silently show the last. Removing all trace of the old data from the shared string table, so that it wouldn't waste space or (worse) remain visible in the file, was a PITA.
The whole saga is recorded in the google-group. The tutorial includes a section on overwriting cells.
If you:
don't want to make the entire worksheet overwritable in the constructor, and
still want to catch the exception on a case-by-case basis,
...try this:
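try:
    worksheet.write(row, col, "text")
except Exception:  # xlwt raises a plain Exception when a cell is overwritten
    # Temporarily allow overwriting (_cell_overwrite_ok is a private attribute)
    worksheet._cell_overwrite_ok = True
    try:
        # do any required operations since we found a duplicate
        worksheet.write(row, col, "new text")
    finally:
        worksheet._cell_overwrite_ok = False  # restore the check even on error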
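Another way to sidestep the exception without touching the private flag, offered as a sketch rather than an xlwt feature, is to collect values in a dict keyed by (row, col) so that later values replace earlier ones, then write each cell exactly once at the end. CellBuffer is a hypothetical helper; it relies only on the sheet's write(row, col, value) method.

```python
class CellBuffer:
    """Collect cell values first, then write each cell exactly once.

    Setting the same (row, col) twice just replaces the value in the dict,
    so the target sheet never receives a second write for that cell.
    """

    def __init__(self):
        self.cells = {}

    def set(self, row, col, value):
        self.cells[(row, col)] = value  # last value wins

    def flush(self, sheet):
        # `sheet` is assumed to be an xlwt Worksheet (anything with .write(r, c, v)).
        for (row, col), value in sorted(self.cells.items()):
            sheet.write(row, col, value)
        self.cells.clear()  # avoid re-writing the same cells on a second flush
```

Usage: call buffer.set(row, col, value) wherever the code currently calls worksheet.write, then buffer.flush(worksheet) once before saving the workbook.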
You should get in touch with the author of the module. Simply removing a raise is unlikely to work well. I would guess that it would lead to other problems further down the line. For example, later code may assume that any given cell is only in the intermediate representation once.