I'm using openpyxl to read some numerical values from Excel files. While reading the numbers in a column, I want to skip the cells that contain a division-by-zero error. I know that there are 4 or 5 of them among 100 numbers.
I used an "if not" condition in this way:
N = []
if not ZeroDivisionError:
    N.append(cell.value)
else:
    break
But this leaves the list empty. If I don't use the condition, the error values will also be present in the list.
I found the answer to my question and would like to leave it here in case it is useful:
Since a division by zero is read as the string '#DIV/0!', it is enough to set the condition:
if type(cell.value) is not str:
and the cell will be skipped.
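A minimal sketch of this filter, assuming the workbook was loaded so that cells hold cached results rather than formulas (in openpyxl, load_workbook(..., data_only=True)) and that error cells therefore arrive as strings such as '#DIV/0!'. The list of values below is made up for illustration and stands in for the cell.value of each cell in the column:

```python
def numeric_values(values):
    """Keep only real numbers; skip error strings, other text, and empty cells."""
    return [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]


# Hypothetical column contents, standing in for [cell.value for cell in column].
column_values = [12.5, 3, "#DIV/0!", None, 7.25, "#DIV/0!"]
N = numeric_values(column_values)
print(N)  # [12.5, 3, 7.25]
```

Testing for numbers rather than for "not str" also skips empty cells (None), which may or may not be what you want.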
Why not trap the error earlier, in the Excel spreadsheet itself:
=IFERROR(your_formula,0)
Replace the zero with whatever suits your application: a blank, some text, etc.
The problem is that the Excel cell contains a line break (\n).
It looks like this (and, for practical reasons, I don't want to change the cell so that it no longer contains a break):
Example-
123
My code works for other cells, but not in this case where the cell has a break. I tried changing "Example-123" to "Example-\n123" and "Example-\r123", but this didn't work.
How can I compare the two strings, ignoring the fact that the string contains a break?
if column[2].value == "Example-123":
    example_dict.update(column_Test=column[2].column - 1)
test123 = str(row[example_dict["column_Test"]].value)
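One possible approach, sketched here under the assumption that the only difference between the two strings is carriage-return and newline characters, is to strip those characters from the cell text before comparing. The helper name without_breaks and the sample value are made up for illustration:

```python
import re


def without_breaks(text):
    """Remove carriage returns and newlines from a cell's text."""
    return re.sub(r"[\r\n]+", "", text)


cell_text = "Example-\n123"  # hypothetical cell value containing a line break
print(without_breaks(cell_text) == "Example-123")  # True
```

In the code above, the comparison would then become without_breaks(str(column[2].value)) == "Example-123".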
I am trying to manipulate a dataframe. One of the values in a list that I use to append a column to the dataframe is 161137531201111100. I created a dictionary whose keys are the unique values of this column, and I use this dictionary in further operations. This code used to run perfectly.
However, after trying this code on another data set, I got the following error:
KeyError: 1.611375312011111e+17
which means that this value is not a key of the dictionary. I tried to trace the code and everything seemed to be okay. However, when I opened the CSV file of the dataframe I built, I found that the value causing the problem is 161137531201111000, which is not in the list (and of course not a key in the dictionary) I used to create this column of the dataframe. This seems weird, and I don't know the reason. Is there any reason that a number would be saved in a different form?
And how can I keep it as it is in all phases? Also, why did it change in the CSV?
No unfortunately, they are not equal
print(1.611375312011111e+17 == 161137531201111000)  # False
The problem lies in the way floating numbers are handled by computers, in general, and most programming languages, including Python.
Always use integers (and not "too large") when doing computations if you want exact results.
See "Is floating point math broken?" for a generic explanation that you definitely must know as a programmer, even if it's not specific to Python.
(and be aware that Python tries to do a rather good job at keeping precision on integers, that unfortunately won't work on floating-point numbers).
And just for the sake of "fun" with floating point numbers, 1.611375312011111e+17 is actually equal to the integer 161137531201111104!
print(format(1.611375312011111e+17, ".60g")) # shows 161137531201111104
print(1.611375312011111e+17 == 161137531201111104) # True
a = dict()
a[1.611375312011111e+17] = "hello"
#print(a[161137531201111100]) # Key error, as in question
print(a[161137531201111104]) # This one shows "hello" properly!
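As a rough illustration of how to keep the value intact, the sketch below shows the round trip through float changing the number, and one possible workaround: keeping the identifier as a string (or a plain int) so that it is never converted to a float. The variable names are made up for illustration, and this says nothing about which step in the original pipeline performed the conversion:

```python
original = 161137531201111100

# Converting to float rounds to the nearest representable double.
round_trip = int(float(original))
print(round_trip)  # 161137531201111104

# Integers are only guaranteed to survive a float round trip up to 2**53.
exact_limit = 2 ** 53
print(original > exact_limit)  # True

# Keeping the identifier as a string preserves every digit.
lookup = {str(original): "hello"}
print(lookup["161137531201111100"])  # hello
```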
I am writing a Python method that checks a specific column in Excel and highlights duplicate values in red (if any), then copies those rows onto a separate sheet that I will use to check why they have duplicate values. This is for Asset Management, where I want to make sure that no two serial numbers, Asset ID numbers, etc. are exactly the same.
At this moment I just want to check the column and highlight duplicate values in red. I have started this method and it runs, but it does not highlight any of the cells that have duplicate values. I am using a test sheet with these values in column A:
(336,565,635,567,474,326,366,756,879,567,453,657,678,324,987,667,567,657,567). The number "567" repeats a few times.
def check_duplicate_values(self, wb):
    self.wb = wb
    ws = self.wb.active
    dxf = DifferentialStyle(fill=self.red_fill())
    rule = Rule(type="duplicateValues", dxf=dxf, stopIfTrue=None, formula=['COUNTIF($A$1:$A1,A1)>1'])
    ws.conditional_formatting.add('Sheet1!$A:$A', rule)  # Not sure if I need this
    self.wb.save('test.xlsx')
In Excel, I can just create a Conditional Format rule to accomplish this however in OpenPyXL I am not sure if I am using their built-in methods correctly. Also, could my formula be incorrect?
Whose built-in methods are you referring to? openpyxl is a file format library and, hence, allows you to manage conditional formats as they are stored in Excel worksheets. Unfortunately, the details of the rules are not very clear from the specification, so some form of reverse engineering from an existing file is generally required, though it's probably worth noting that rules created by Excel are almost always more verbose than actually required.
I would direct further questions to the openpyxl mailing list.
Just remove the formula and you're good to go.
duplicate_rule = Rule(type="duplicateValues", dxf=dxf, stopIfTrue=None)
You can also use unique rule:
unique_rule = Rule(type="uniqueValues", dxf=dxf, stopIfTrue=None)
Check this out for more info: https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/formatting/rule.html#RuleType
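A conditional-format rule only changes how Excel displays the cells; it does not tell your Python code which rows are duplicates. For the second goal in the question (copying the duplicate rows elsewhere), a plain-Python sketch along these lines could find them. It uses the test values from the question; the function name find_duplicates is made up for illustration, and the values would in practice come from the cells of column A:

```python
from collections import Counter


def find_duplicates(values):
    """Return a dict mapping each repeated value to how often it occurs."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


column_a = [336, 565, 635, 567, 474, 326, 366, 756, 879, 567,
            453, 657, 678, 324, 987, 667, 567, 657, 567]

duplicates = find_duplicates(column_a)
print(duplicates)  # {567: 4, 657: 2}

# 1-based row numbers of the duplicated values, e.g. for copying those rows.
duplicate_rows = [i for i, v in enumerate(column_a, start=1) if v in duplicates]
print(duplicate_rows)  # [4, 10, 12, 17, 18, 19]
```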
I am trying to read till the last empty cell for the specified number of rows.
Here is my code:
for j in xrange(0, REPEAT_CONST, 1):
    if r_sheet.cell_type(row+j, 0) == xlrd.XL_CELL_EMPTY:
        break
This code only works if something was written to the cell earlier and then deleted, but it does not work if the cell has never been edited. I'm not sure how to handle this.
Could you please help me to do this.
I will be grateful for your support.
Regards,
Pavan
The error you're experiencing suggests the rowx or colx arguments are out of bounds for this sheet.
In the cell access functions, "rowx" is a row index, counting from zero, and "colx" is a column index, counting from zero. Negative values for row/column indexes and slice positions are supported in the expected fashion.
I just found this out (never used XLRD before) but if you query the r_sheet.nrows I bet you'll find that the value is less than row+j. It appears that xlrd only reads part of the worksheet, essentially the UsedRange from Excel.
So you can use some exception handling, either a try/except block, or you could do this. Note that this should short-circuit on the first part of the boolean expression whenever row+j is beyond the last row index (nrows-1) of that sheet.
if (row+j <= r_sheet.nrows-1) and (r_sheet.cell_value(row+j, 0) == ''):
    break
Or, perhaps even with your original method:
if (row+j <= r_sheet.nrows-1) and (r_sheet.cell_type(row+j, 0) == xlrd.XL_CELL_EMPTY):
    break
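A variation on the same bounds check, sketched here so that it can run without a workbook: rows at or beyond nrows are treated as empty, so the loop stops there instead of raising. It relies only on the nrows attribute and the cell_value(rowx, colx) method mentioned above; FakeSheet and count_filled_rows are hypothetical names used for illustration, with FakeSheet standing in for a real xlrd sheet:

```python
def count_filled_rows(sheet, start_row, max_rows, col=0):
    """Count consecutive non-empty cells in `col`, starting at `start_row`."""
    count = 0
    for offset in range(max_rows):
        rowx = start_row + offset
        # Rows at or beyond sheet.nrows were never stored, so treat them as empty.
        if rowx >= sheet.nrows or sheet.cell_value(rowx, col) == '':
            break
        count += 1
    return count


class FakeSheet:
    """Hypothetical stand-in exposing only nrows and cell_value."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


sheet = FakeSheet([["a"], ["b"], ["c"]])
print(count_filled_rows(sheet, 0, 10))  # 3
```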
There are numerous questions about how to stop Excel from interpreting text as a number, or how to output number formats with openpyxl, but I haven't seen any solutions to this problem:
I have an Excel spreadsheet given to me by someone else, so I did not create it. When I open the file with Excel, I have certain values like "5E12" (clone numbers, if anyone cares) that appear to display correctly, but there's a little green arrow next to each one warning me that "This appears to be a number stored as text". Excel then asks me if I would like to convert it to a number, and if I say yes, I get 5000000000000, which then converts automatically to scientific notation and displays 5E12 again, only this time a text output would show the full number with zeroes. Note that before the conversion, this really is text, even to Excel, and I'm only being warned/offered to convert it.
So, when reading this file in with openpyxl (from openpyxl.reader.excel import load_workbook), the 5E12 is getting converted automatically to 5000000000000. I assume that openpyxl is making the same assumption that Excel made, only the conversion happens without a prompt or input on my part.
How can I prevent this from happening? I do not want text that looks like "numbers stored as text" to be converted to numbers. It is text unless I say so.
So far, the only solution I have found is to add single quotes to the front of each cell, but this is not an ideal solution, as it's manual labor rather than a programmatic solution. Also, the solution needs to be general, since I don't always know where this problem might occur (I'm reading millions of lines per day, so I don't want to have to do anything by hand).
I think this is a problem with openpyxl. There is a google group discussion from the beginning of 2011 that mentions this problem, but assumes it's too rare to matter. https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/HZfpShMp8Tk
So, any suggestions?
If you want to use openpyxl again (for whatever reason), the following changes to the worksheet reader routine do the trick of keeping the strings as strings:
diff --git a/openpyxl/reader/worksheet.py b/openpyxl/reader/worksheet.py
--- a/openpyxl/reader/worksheet.py
+++ b/openpyxl/reader/worksheet.py
@@ -134,8 +134,10 @@
             data_type = element.get('t', 'n')
             if data_type == Cell.TYPE_STRING:
                 value = string_table.get(int(value))
-
-            ws.cell(coordinate).value = value
+                ws.cell(coordinate).set_value_explicit(value=value,
+                                                data_type=Cell.TYPE_STRING)
+            else:
+                ws.cell(coordinate).value = value
             # to avoid memory exhaustion, clear the item after use
             element.clear()
Cell.value is a property; on assignment it calls Cell._set_value, which in turn calls Cell.bind_value, which according to the method's docstring does the following: "Given a value, infer type and display options". Since the types of the values are stored in the XML file, those should be used (here I only do that for strings) instead of doing something 'smart'.
As you can see from the code, the test whether it is a string was already there.
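To make the difference concrete, here is a toy model of the two behaviours. It is not openpyxl's code: infer_value and read_cell are made-up names, and the inference rule is deliberately simplified to "anything float() accepts becomes a number". The type codes 's' (string) and 'n' (number) mirror the element.get('t', 'n') lookup in the patch above:

```python
def infer_value(raw):
    """Toy type inference: anything float() can parse becomes a number."""
    try:
        return float(raw)
    except ValueError:
        return raw


def read_cell(raw, data_type):
    """Honour the declared type for strings; only infer for everything else."""
    if data_type == 's':
        return raw  # declared as text in the file, so keep it as text
    return infer_value(raw)


print(infer_value("5E12"))     # 5000000000000.0  (text silently became a number)
print(read_cell("5E12", 's'))  # 5E12             (declared type respected)
```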