I have been running a Python script successfully for several months. The script edits a template Excel spreadsheet using win32com and then saves the edited workbook as a new .xlsx file.
results_path = "C:\\Users\\...\\"
results_title = results_path + input + "_Results.xlsx"
if os.path.exists(template_path):
    xl = win32com.client.gencache.EnsureDispatch("Excel.Application")
    xl.Application.DisplayAlerts = False
    xl.Workbooks.Open(Filename=template_path)
    xl.Application.Cells(2, 6).Value = input
    r = 17
    for row in y_test:
        row = str(row)
        row = row[1:]
        row = row[:-1]
        xl.Application.Cells(r, 2).Value = row
        r += 1
    # xl.Application.CalculateFullRebuild
    # xl.ActiveWorkbook.SaveAs(Filename = save_title)
    # time.sleep(20)
    r = 17
    for row in prediction:
        row = str(row)
        row = row[1:]
        row = row[:-1]
        xl.Application.Cells(r, 3).Value = row
        r += 1
    xl.ActiveWorkbook.SaveAs(Filename=results_title)
Without changing anything in the script, it no longer works. One day it just stopped working.
Here is the error:
Traceback (most recent call last):
File "<ipython-input-5-aaef40198ed6>", line 1, in <module>
runfile('C:/Users/Alex/Desktop/Stocks/Python Stock Code/BizNet.py', wdir='C:/Users/Alex/Desktop/Stocks/Python Stock Code')
File "C:\Users\Alex\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "C:\Users\Alex\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Alex/Desktop/Stocks/Python Stock Code/BizNet.py", line 99, in <module>
BizNet_test.accuracy_Test(companyInputOrderArray,input,model)
File "C:\Users\Alex\Desktop\Stocks\Python Stock Code\BizNet_test.py", line 125, in accuracy_Test
xl.ActiveWorkbook.SaveAs(results_title)
File "C:\Users\Alex\AppData\Local\Temp\gen_py\3.5\00020813-0000-0000-C000-000000000046x0x1x9\_Workbook.py", line 284, in SaveAs
, AccessMode, ConflictResolution, AddToMru, TextCodepage, TextVisualLayout
com_error: (-2147352562, 'Invalid number of parameters.', None, None)
Got it!!!
There was a temporary cache folder, "gen_py", that I had to delete: the one referenced by the file path in the error.
"C:\Users\Alex\AppData\Local\Temp\gen_py\3.5\00020813-0000-0000-C000-000000000046x0x1x9\_Workbook.py"
I have no clue why this worked or how the error initially occurred, but everything is fine now.
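If you ever need to clear that cache programmatically, here is a minimal sketch, assuming the cache lives under the user's temp directory as in the traceback above (win32com regenerates it on the next EnsureDispatch call):

import os
import shutil
import tempfile

# Assumption: the gen_py cache sits in %TEMP%\gen_py, as in the traceback above.
# In some setups it lives under site-packages\win32com\gen_py instead.
gen_py_dir = os.path.join(tempfile.gettempdir(), "gen_py")
if os.path.isdir(gen_py_dir):
    shutil.rmtree(gen_py_dir)  # the cache is rebuilt on the next EnsureDispatch call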
I have been building a web scraper in Python/Selenium/openpyxl as part of my job for the last two months. Considering it's my first time doing something like this, it has been relatively successful; it has produced results. But one thing I have never been able to figure out how to do properly is: how am I supposed to loop through an .xlsx document?
I am working with files of 500k+ rows. I have split them into files of 100k rows each, though, so that size wouldn't become an issue.
So my problem is, when I loop through the document in this way:
wb = load_workbook(filePath, read_only=True)
ws = wb.active
while currentRow <= docLength:
    adress = ws.cell(row=currentRow, column=1)
    car = ws.cell(row=currentRow, column=2)
    # Scrape info and append into other document
    currentRow += 1
It ends up consuming way too much memory, and the script slows down badly after only about 100 rows...
But if I do this, I get a ParseError after a few hundred rows every time! It's very frustrating, as I think this is the right way to do it:
wb = load_workbook(filePath, read_only=True)
ws = wb.active
try:
    for row in ws.iter_rows(min_row=1, max_col=10, max_row=docLength, values_only=True):
        for x, cell in enumerate(row):
            if x == 0:
                adress = cell
            if x == 1:
                car = cell
        # Scrape info and append into other document
        currentRow += 1
except xml.etree.ElementTree.ParseError:
    ???
The full exception error:
Traceback (most recent call last):
File "C:\python\scrape.py", line 263, in <module>
for row in ws.iter_rows(min_row=1, max_col=10, max_row=docLength, values_only=True):
File "C:\Users\x\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl-3.0.3-py3.8.egg\openpyxl\worksheet\_read_only.py", line 78, in _cells_by_row
File "C:\Users\x\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl-3.0.3-py3.8.egg\openpyxl\worksheet\_reader.py", line 142, in parse
File "C:\Users\x\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1227, in iterator
yield from pullparser.read_events()
File "C:\Users\x\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1302, in read_events
raise event
File "C:\Users\x\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1274, in feed
self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 769027
Why is this happening? :(
And why is it reporting such a high column number? Is that because of the XML formatting?
I have an .xlsx file which contains 2 worksheets. The first one contains regular data (nothing fancy), while the second one contains pivot tables. I need only the data from the first worksheet and I want to ignore the second one, but the pivot tables raise an error when openpyxl.load_workbook is called: TypeError: expected <type 'basestring'>.
The error is raised in openpyxl/reader/excel.py, at the line pivot_caches = parser.pivot_caches.
I tried with openpyxl versions 2.6.4 and 2.5.1. I'm using Python 2.7.
After deleting the 2nd worksheet, the error is gone and the data from the 1st worksheet is read correctly. However, these files are uploaded by users, and although I don't need the pivot tables, I would like to avoid forcing users to remove the unnecessary worksheet(s), if possible.
Sample code:
from io import BytesIO
import openpyxl

pivot = '~/Downloads/file_with_pivot_tables.xlsx'
with open(pivot) as fin:
    content = BytesIO(fin.read())
    wb = openpyxl.load_workbook(content)  # this line fails
    ws = wb.get_sheet_by_name('Sheet1')
Entire error trace:
File "/Users/gi/lib/openpyxl/reader/excel.py", line 224, in load_workbook
pivot_caches = parser.pivot_caches
File "/Users/gi/lib/openpyxl/packaging/workbook.py", line 125, in pivot_caches
cache = get_rel(self.archive, self.rels, id=c.id, cls=CacheDefinition)
File "/Users/gi/lib/openpyxl/packaging/relationship.py", line 162, in get_rel
obj.deps = get_dependents(archive, rels_path)
File "/Users/gi/lib/openpyxl/packaging/relationship.py", line 130, in get_dependents
rels = RelationshipList.from_tree(node)
File "/Users/gi/lib/openpyxl/descriptors/serialisable.py", line 84, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/gi/lib/openpyxl/descriptors/serialisable.py", line 100, in from_tree
return cls(**attrib)
File "/Users/gi/lib/openpyxl/packaging/relationship.py", line 50, in __init__
self.Target = Target
File "/Users/gi/lib/openpyxl/descriptors/base.py", line 44, in __set__
raise TypeError('expected ' + str(self.expected_type))
TypeError: expected <type 'basestring'>
You can specify the sheet that you want to manipulate:
wb = openpyxl.load_workbook('H:\\myfile.xlsx')
ws = wb['sheet1']
ws['E1'] = 'The sky is gray.'
wb.save('H:\\myfile.xlsx')
wb.close()
You can also get a list of all the sheet names if you need to check them first:
print(wb.sheetnames)
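For example, a small sketch of how you might combine the two (the sheet names here are only placeholders; use whatever print(wb.sheetnames) shows for your file):

import openpyxl

wb = openpyxl.load_workbook('H:\\myfile.xlsx')
print(wb.sheetnames)            # e.g. ['Sheet1', 'PivotSheet']

# 'Sheet1' is an assumed name for the worksheet that holds the plain data
if 'Sheet1' in wb.sheetnames:
    ws = wb['Sheet1']
    print(ws['E1'].value)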
I have a bunch of xlsx files, named from 1 to 169 like '1.xlsx', '2.xlsx' and so on... But while going through the for loop that reads those files, the code does not see any rows in the 11th file (nrows in the 11th file is always 0, even though the file is not empty if you open it manually) and gives me an IndexError.
I have no idea what is going on with that code.
import os, xlwt, xlrd

file_dir = 'docs/'
files = os.listdir(file_dir)

# Open file, read needed variables and write them
def r_file(path, file):
    workbook = xlrd.open_workbook(path + file)
    info_sheet = workbook.sheet_by_index(0)
    data_sheet = workbook.sheet_by_index(1)
    # cells with company info
    print info_sheet.nrows
    company_name = info_sheet.cell(3, 3).value
    company_leg_adress = info_sheet.cell(4, 3).value
    company_fact_adress = info_sheet.cell(5, 3).value
    # cells with answers
    question_1 = data_sheet.cell(3, 10).value
    question_1_1 = data_sheet.cell(8, 2).value
    question_1_2 = data_sheet.cell(13, 2).value
    question_2 = data_sheet.cell(18, 10).value
    question_3 = data_sheet.cell(25, 10).value
    question_3_additional = [data_sheet.cell(nrow, 10).value for nrow in range(30, 48)]
    question_4 = data_sheet.cell(51, 2).value
    question_5 = data_sheet.cell(56, 2).value
    # get full row in list
    row_as_list = [company_name, company_leg_adress, company_fact_adress, question_1, question_1_1, question_1_2, question_2, question_3, question_4] + question_3_additional
    return row_as_list

# write companies in file
def w_file(companies):
    wb = xlwt.Workbook()
    ws = wb.add_sheet('aggr', cell_overwrite_ok=True)
    for company in companies:
        row_as_list = r_file(file_dir, str(company) + '.xlsx')
        for each_index in row_as_list:
            ws.write(company, row_as_list.index(each_index), each_index)
    wb.save('aggregation.xls')

companies_amount = [x for x in range(1, 170)]
w_file(companies_amount)
After running it, it returns:
Traceback (most recent call last):
File "/home/ubuntu/workspace/ex50/bin/writing.py", line 44, in <module>
w_file(companies_amount)
File "/home/ubuntu/workspace/ex50/bin/writing.py", line 36, in w_file
row_as_list = r_file(file_dir,str(company)+'.xlsx')
File "/home/ubuntu/workspace/ex50/bin/writing.py", line 13, in r_file
company_name = info_sheet.cell(3,3).value
File "/usr/local/lib/python2.7/dist-packages/xlrd-1.0.0-py2.7.egg/xlrd/sheet.py", line 401, in cell
self._cell_types[rowx][colx],
IndexError: list index out of range
It fails only on the 11th file (no matter which file happens to be the 11th).
Can you tell me what is going on here?
Hey, I have a CSV with multilingual text. All I want is a column appended with the language detected. So I coded it as below:
from langdetect import detect
import csv

with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvinput:
    with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('Lang')
        all.append(row)
        for row in reader:
            row.append(detect(row[0]))
            all.append(row)
        writer.writerows(all)
But I am getting the error LangDetectException: No features in text.
The traceback is as follows:
runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')
Traceback (most recent call last):
File "<ipython-input-25-5f98f4f8be50>", line 1, in <module>
runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')
File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/dell/.spyder2-py3/temp.py", line 21, in <module>
row.append(detect(row[0]))
File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector_factory.py", line 130, in detect
return detector.detect()
File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 136, in detect
probabilities = self.get_probabilities()
File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 143, in get_probabilities
self._detect_block()
File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 150, in _detect_block
raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
LangDetectException: No features in text.
This is what my CSV looks like:
1)skunkiest smokiest yummiest strain pain killer and mood lifter
2)Relaxation, euphorique, surélevée, somnolence, concentré, picotement, une augmentation de l’appétit, soulager la douleur Giggly, physique, esprit sédation
3)Reduzierte Angst, Ruhe, gehobener Stimmung, zerebrale Energie, Körper Sedierung
4)Calmante, relajante muscular, Relajación Mental, disminución de náuseas
5)重いフルーティーな幸せ非常に強力な頭石のバースト
Please help me with this.
You can use something like this to detect which line in your file is throwing the error:
for row in reader:
    try:
        language = detect(row[0])
    except:
        language = "error"
        print("This row throws an error:", row[0])
    row.append(language)
    all.append(row)
What you're going to see is that it probably fails at "重いフルーティーな幸せ非常に強力な頭石のバースト". My guess is that detect() isn't able to 'identify' any characters to analyze in that row, which is what the error implies.
Other things, like when the input is only a URL, also cause this error.
The error occurs when passing a string with no letters to detect(). At least one letter should be there.
To reproduce, run any of the commands below:
detect('.')
detect(' ')
detect('5')
detect('/')
So you may apply some text pre-processing first to drop records in which the row[0] value is an empty string, a null value, whitespace, a number, a special character, or simply doesn't include any letters.
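A minimal sketch of such a filter, assuming the text to check is in row[0]: the helper requires at least one letter before calling detect(), and returns None otherwise.

import re
from langdetect import detect

def safe_detect(text):
    # Skip values with no letters at all; [^\W\d_] matches any Unicode letter in Python 3.
    if text and re.search(r'[^\W\d_]', text):
        return detect(text)
    return None  # nothing for langdetect to work with

print(safe_detect('skunkiest smokiest yummiest strain'))  # e.g. 'en'
print(safe_detect('5'))                                   # None instead of an exception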
The problem is null text or something like ' ' with no value; check for this in a condition and loop over your reader in a list comprehension:
from langdetect import detect

textlang = [detect(elem) for elem in textlist if len(elem) > 50]
textlang = [detect(elem) if len(elem) > 50 else 'no' for elem in textlist]
or with a loop:
texl70 = df5['Titletext']
langdet = []
for i in range(len(df5)):
    try:
        lang = detect(texl70[i])
    except:
        lang = 'no'
        print("This row throws an error:", texl70[i])
    langdet.append(lang)
The error occurs when the string has no letters. If you want to ignore that row and continue processing:
for i in df.index:
    str = df.iloc[i][1]
    try:
        lang = detect(str)
    except:
        continue
I have hundreds of XML files that I need to extract two values from and output to an Excel or CSV file. This is the code I currently have:
# grabs idRoot and typeId root values from XML files
import glob
from openpyxl import Workbook
from xml.dom import minidom
import os

wb = Workbook()
ws = wb.active

def typeIdRoot(filename):
    f = open(filename, encoding="utf8")
    for xml in f:
        xmldoc = minidom.parse(f)
        qmd = xmldoc.getElementsByTagName("MainTag")[0]
        typeIdElement = qmd.getElementsByTagName("typeId")[0]
        root = typeIdElement.attributes["root"]
        global rootValue
        rootValue = root.value
        print('rootValue =', rootValue)
        ws.append([rootValue])
        wb.save("some.xlsx")

wb = Workbook()
ws = wb.active

def idRoot(filename):
    f = open(filename, encoding="utf8")
    for xml in f:
        xmldoc = minidom.parse(f)
        tcd = xmldoc.getElementsByTagName("MainTag")[0]
        activitiesElement = tcd.getElementsByTagName("id")[0]
        sport = activitiesElement.attributes["root"]
        sportName = sport.value
        print('idRoot =', sportName)
        ws.append([idRoot])
        wb.save("some.xlsx")

for file in glob.glob("*.xml"):
    typeIdRoot(file)

for file in glob.glob("*.xml"):
    idRoot(file)
The first value follows a 1.11.111.1.111111.1.3 format. The second mixes letters and numbers. I believe this is the reason for the error:
Traceback (most recent call last):
File "C:\Python34\Scripts\xml\good.py", line 64, in <module>
idRoot (file)
File "C:\Python34\Scripts\xml\good.py", line 54, in idRoot
ws.append([idRoot])
File "C:\Python34\lib\site-packages\openpyxl\worksheet\worksheet.py", line 754, in append
cell = self._new_cell(col, row_idx, content)
File "C:\Python34\lib\site-packages\openpyxl\worksheet\worksheet.py", line 376, in _new_cell
cell = Cell(self, column, row, value)
File "C:\Python34\lib\site-packages\openpyxl\cell\cell.py", line 131, in __init__
self.value = value
File "C:\Python34\lib\site-packages\openpyxl\cell\cell.py", line 313, in value
self._bind_value(value)
File "C:\Python34\lib\site-packages\openpyxl\cell\cell.py", line 217, in _bind_value
raise ValueError("Cannot convert {0} to Excel".format(value))
ValueError: Cannot convert <function idRoot at 0x037D24F8> to Excel
I would like the result to have both values on the same row, so that there is a new row for each file in the directory, with the second value in the second column.
as such:
Value 1 Value 2
1.11.111.1.111111.1.3 10101011-0d10-0101-010d-0dc1010e0101
idRoot is the name of your FUNCTION.
So when you write
ws.append([idRoot])
you probably mean:
ws.append([sportName])
Of course, you can write something like:
ws.append([rootValue, sportName])
providing both variables are defined with reasonable values.
One last thing: you should save your file only once.
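A minimal sketch of how that could look, assuming the same tag names as in the question (one helper returns both values, each file becomes one row, and the workbook is saved once at the end):

import glob
from openpyxl import Workbook
from xml.dom import minidom

wb = Workbook()
ws = wb.active
ws.append(["Value 1", "Value 2"])

def extract_roots(filename):
    # Assumption: each file has one MainTag element with typeId and id children, as above.
    xmldoc = minidom.parse(filename)
    main = xmldoc.getElementsByTagName("MainTag")[0]
    type_id_root = main.getElementsByTagName("typeId")[0].attributes["root"].value
    id_root = main.getElementsByTagName("id")[0].attributes["root"].value
    return type_id_root, id_root

for file in glob.glob("*.xml"):
    rootValue, sportName = extract_roots(file)
    ws.append([rootValue, sportName])  # both values on the same row

wb.save("some.xlsx")  # save only once, at the end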