Python xlrd package issue - python

Is there any way to overcome the following problem?
When using the xlrd package in Python, the value '1' in the Excel file is shown as '1.0'.
I am writing a script that must be able to tell the difference: I am using these values as indexes, so '1' and '1.0' are completely different indexes, but I can't find any way around this.
Is there something I can do?
Thanks.

Yes, this is a common problem with xlrd.
You generally need to convert to int when the number is actually an integer, and keep it as a float the rest of the time.
So here is something that works well in most cases:
int_value = int(float_value)
if float_value == int_value:
    converted_value = int_value
else:
    converted_value = float_value
Example
>>> a = 123.0
>>> type(a)
<type 'float'>
>>> b = int(a)
>>> a == b
True
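The check above can be wrapped in a small helper. This is only a sketch: normalize_number is a made-up name, and it assumes the values arrive as plain Python floats or strings, as xlrd cell values normally do.

```python
def normalize_number(value):
    """Return an int for whole-number floats, otherwise the value unchanged."""
    # Excel stores every number as a float, so a cell showing 1 is read as 1.0.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


row = [1.0, 2.5, "id101"]
print([normalize_number(v) for v in row])  # [1, 2.5, 'id101']
```

Using float.is_integer() leaves values such as 2.5 untouched instead of truncating them.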

I had the same issue while parsing Excel files using xlrd. The easiest route I found was to just convert the known float values to ints, unless you don't know the data type of the field you are parsing.
In that situation, you can try converting to int and catching the error. For example:
try:
    converted_int = int(float_number)
except (TypeError, ValueError, OverflowError):
    # log or deal with the error here
    pass

Use int(some_number) to convert the number to an integer.
You can see the documentation for the int function here: BIF
Hope this will help you :)

You can convert the number to int using int(input_number).
Checking the type with isinstance() also helps. You can use something like
import xlrd
book = xlrd.open_workbook('input.xlsx')
sheet = book.sheet_by_index(0)
data = []
for row in range(1, sheet.nrows): # skip the header
    l = []
    for column in range(0, sheet.ncols):
        val = sheet.cell(row, column).value
        if isinstance(val,float):
            l.append(int(val))
        else:
            l.append(val)
    data.append(l)
print(data) # [['id101', 1], ['id102', 2], ['id103', 3]]

Related

How to get rid or ignore NaNs? error float when using split

In the y column I have words, but some rows contain NaNs, so I get an error saying I can't use split on a float.
What should I change?
Column y looks like: text1, text2, nan, nan, text3
missing_values = ["#N/A","None","nan"]
df_eval = pd.read_csv('e.csv',na_values = missing_values)
with open('e1.csv', 'a', newline='',errors='ignore', encoding="utf8" ) as f:
    writer = csv.writer(f)
    #df_eval= pd.read_csv(f,na_values = missing_values)
    # Importing the dataset
    for j in range(len(df_eval)):
        x=df_eval.iloc[j,1]
        #print("X", x)
        # x1=x.split("|")
        # # print(x1)
        y=df_eval.iloc[j,2]
        #df_new = df[df_eval.iloc[j,2]].notnull()]
        if str(y=="nan"):
            continue
        print("y", y)
Thank you in advance
You need to handle the case where x is NaN separately. You could either skip that iteration using continue, as shown below, or set x to an empty string (x = '') if you think that is more appropriate.
...
for j in range(len(df_eval)):
    x=df_eval.iloc[j,1]
    if pd.isna(x):
        continue
    ...
...
I suspect you just want something like this:
import pandas as pd
for row in df.iterrows():
    # pd.isna() accepts strings as well as floats; math.isnan() raises TypeError on a str
    entries = [x for x in row[1] if not pd.isna(x)]
    do_stuff(entries)
In other words, since you're iterating anyhow, you might as well build a standard Python type (a list) containing the values you want, and then work with that.
The basic approach (check whether something is NaN or not) is more widely applicable.
Note that you can't use .split() on a float, whether it's NaN or not.
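To illustrate that last point without pandas, here is a minimal sketch of a guard around .split(). The name split_or_empty and the sample values are made up for illustration; it assumes a missing cell arrives as float('nan'), which is how pandas normally represents it.

```python
import math


def split_or_empty(value, sep=","):
    """Split a text cell on sep; return [] for a missing (NaN) cell."""
    # A missing cell is a float NaN, and floats have no .split() method.
    if isinstance(value, float) and math.isnan(value):
        return []
    return str(value).split(sep)


column_y = ["text1,text2", float("nan"), "text3"]
for cell in column_y:
    print(split_or_empty(cell))
```

Returning an empty list keeps the loop body uniform; skipping the row with continue works just as well if you don't need a placeholder.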

Issues when trying to sum the content of a list that's in STR type

I've been trying to sum the content of list ['6134,15', '20432,65', '10588795,61'] obtained from a query to a DB2 table using ibm_db module (I'm also using pandas to frame the results). These are dollar amounts, but they are retrieved as strings so I can't sum them. I have tried:
# This gets me the list from the column I'm interested in.
amtTot = maind.get('INVOICE_TOTAL')
for char in amtTot:
    a = char.replace(",",".")
    b = float(a)
    print(b)
And in return I get each value - that's ok, but I need the sum of them. I tried to apply something like sum(b) but I get error "float is not iterable". If I try with int instead of float, I get "invalid literal for int() with base 10: '6134,15'".
The simplest solution would be to use a variable to keep track of the sum:
total = 0
for char in amtTot:
    a = char.replace(",",".")
    b = float(a)
    print(b)
    total += b
print(total)
Try this:
cleanList = [float(i.replace(",", ".")) for i in amtTot]
sum(cleanList)
Hope this helps!
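Since these are money amounts, one variation worth considering is decimal.Decimal instead of float, which avoids binary rounding. This is a sketch of that alternative, not what the answers above use; the list is the one from the question.

```python
from decimal import Decimal

amounts = ['6134,15', '20432,65', '10588795,61']

# Swap the decimal comma for a point, then parse exactly instead of via float.
total = sum(Decimal(s.replace(",", ".")) for s in amounts)
print(total)  # 10615362.41
```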

Pyexcel changing a cell value

So I was using openpyxl for all my Excel projects, but now I have to work with .xls files, so I was forced to change libraries. I chose pyexcel because it seemed fairly easy and well documented. I've gone through hell creating hundreds of variables, because there is no .index property or anything like it.
What I want to do now is read a column in one file, e.g. the "Quantity" column, and get a value from it, e.g. 12, then check the same column in another file and, if it is not 12, make it 12. Easy. But I cannot find anything about changing a single cell value in their documentation. Can you help me?
I'm not sure I follow; wouldn't this be the simplest approach?
import pyexcel as pe
column_name = 'Quantity'
value_to_find = 12
sheets1 = pe.get_book(file_name='Sheet1.xls')
sheets1[0].name_columns_by_row(0)
row = sheets1[0].column[column_name].index(value_to_find)
sheets2 = pe.get_book(file_name='Sheet2.xls')
sheets2[0].name_columns_by_row(0)
if sheets2[0][row, column_name] != value_to_find:
    sheets2[0][row, column_name] = value_to_find
EDIT
Strange, you can only assign values if you use cell_address indexing; it must be some bug. Add this function:
def index_to_letter(n):
    # n is a 0-based column index: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    result = []
    n += 1  # column letters have no zero digit, so work 1-based
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        result.insert(0, alphabet[remainder])
    return ''.join(result)
And modify the last part:
sheets2[0].name_columns_by_row(0)
col_letter = index_to_letter(sheets2[0].colnames.index(column_name))
cel_address = col_letter+str(row+1)
if sheets2[0][cel_address] != value_to_find:
    sheets2[0][cel_address] = value_to_find
EDIT 2
Looks like assignment fails only when you use the column name directly, so a workaround would be to find the column name's index:
sheets2[0].name_columns_by_row(0)
col_index = sheets2[0].colnames.index(column_name)
if sheets2[0][row, col_index] != value_to_find:
    sheets2[0][row, col_index] = value_to_find
Excel uses 2 sets of references to a cell. Cell name ("A1") and cell vector (Row, Column).
The PyExcel documentation tutorial states it supports both methods. caiohamamura's method tries to build the cell name - you don't need to if the cells are in the same location in each file, you can use the vector.
Once you have the cell, assigning a value to a single cell is simple - you assign the value. Example:
import pyexcel
sheet = pyexcel.get_sheet(file_name="Units.xls")
print(sheet[3,2]) # this gives me "cloud, underwater"
sheet[3,2] = "cloud, underwater, special"
sheet.save_as("Units1.xls")
Note that all I had to do was "sheet[3,2] =".
This is not explicitly stated but is hinted at in the pyexcel documentation where it states that to update a whole column you do:
sheet.column["Column 2"] = [11, 12, 13]
i.e. replace a list by assigning a new list. Same logic applies to a single cell - just assign a new value.
Bonus - the [Row, column] method gets around cell locations in columns greater than 26 (i.e. 'AA' and above).
Caveat - make sure your comparison is like-for-like, i.e. that an int is compared with an int and not with a str. Python does not convert between the two implicitly (12 == "12" is False), so convert explicitly where needed, especially if you are using Python 2 and Unicode is involved.
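A rough sketch of a like-for-like comparison, in case the two files hand back different types for the same quantity. The helper name same_quantity is made up; it is plain Python and does not depend on pyexcel.

```python
def same_quantity(a, b):
    """Compare two cell values numerically, whatever type each was read as."""
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        # Non-numeric content: fall back to a plain text comparison.
        return str(a) == str(b)


print(same_quantity(12, "12"))    # True
print(same_quantity(12.0, 12))    # True
print(same_quantity("12", "13"))  # False
```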

Convert Excel row,column indices to alphanumeric cell reference in python/openpyxl

I want to convert the row and column indices into an Excel alphanumeric cell reference like 'A1'. I'm using python and openpyxl, and I suspect there's a utility somewhere in that package that does this, but I haven't found anything after some searching.
I wrote the following, which works, but I'd rather use something that's part of the openpyxl package if it's available.
def xlref(row,column):
    """
    xlref - Simple conversion of row, column to an excel string format
    >>> xlref(0,0)
    'A1'
    >>> xlref(0,26)
    'AA1'
    """
    def columns(column):
        # ascii_uppercase exists in both Python 2 and 3 (string.uppercase is Python 2 only)
        from string import ascii_uppercase as uppercase
        if column > 26**3:
            raise Exception("xlref only supports columns < 26^3")
        c2chars = [''] + list(uppercase)
        c2,c1 = divmod(column,26)
        c3,c2 = divmod(c2,26)
        return "%s%s%s" % (c2chars[c3],c2chars[c2],uppercase[c1])
    return "%s%d" % (columns(column),row+1)
Does anyone know a better way to do this?
Here's the full new xlref using openpyxl.utils.get_column_letter from @Rick's answer:
from openpyxl.utils import get_column_letter
def xlref(row, column, zero_indexed=True):
    if zero_indexed:
        row += 1
        column += 1
    return get_column_letter(column) + str(row)
Now
>>> xlref(0, 0)
'A1'
>>> xlref(100, 100)
'CW101'
Looks like openpyxl.utils.get_column_letter does the same function as my columns function above, and is no doubt a little more hardened than mine is. Thanks for reading!
Older question, but maybe helpful: when using XlsxWriter, one can use xl_rowcol_to_cell() like this:
from xlsxwriter.utility import xl_rowcol_to_cell
cell = xl_rowcol_to_cell(1, 2) # C2
See Working with Cell Notation.
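If neither openpyxl nor XlsxWriter is available, the conversion is short enough to write by hand. This is a dependency-free sketch; column_letter and a1_reference are made-up names, and both take 0-based indices like the xlref in the question.

```python
def column_letter(index):
    """Convert a 0-based column index to Excel letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    index += 1  # column letters have no zero digit, so work 1-based
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_reference(row, column):
    """Build an 'A1'-style reference from 0-based row and column indices."""
    return column_letter(column) + str(row + 1)


print(a1_reference(0, 0))      # A1
print(a1_reference(0, 26))     # AA1
print(a1_reference(100, 100))  # CW101
print(a1_reference(0, 701))    # ZZ1
```

The loop is not limited to three letters, so it handles any column width.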

Arcmap Field Calculator Python sPrefix

I am working in Arcmap using the Field Calculator.
I have an attribute with values like the following:
"addr:city"="Bielefeld","addr:postcode"="33699","addr:street"="Westerkamp"
"addr:city"="Bielefeld","addr:street"="Detmolder Straße"
"addr:city"="Bielefeld","addr:housenumber"="34"
I want to extract them into individual attributes.
So I thought I need codes like:
dim city
if sPrefix = "addr:city":
return everything past "addr:city" until a comma appears
Any ideas on how to solve that? I don't have much experience, unfortunately.
Thanks,
Uli!
Have a look at python's csv module.
Edit:
I've never used Arcmap, but I'd imagine you can still import modules in it.
If the strings are pretty regular, you could just parse the data without it though:
e.g.
#test.py
def func(s, srch):
    parts = dict([item.replace('"','').split('=') for item in s.split(',')])
    return parts.get(srch,'')
if __name__ == '__main__':
    tags = '"addr:city"="Bielefeld","addr:postcode"="33699","addr:street"="Westerkamp"'
    print(func(tags, 'addr:city'))
>python test.py
>Bielefeld
Something like this; define your own function:
In [40]: def func(x, item):
   ....:     spl = x.split(",")
   ....:     for y in spl:
   ....:         if item in y:
   ....:             return y.split("=")[-1].strip('"')
   ....:
In [53]: strs='"addr:city"="Bielefeld","addr:postcode"="33699","addr:street"="Westerkamp"'
In [54]: func(strs,"addr:city")
Out[54]: 'Bielefeld'
In [55]: func(strs,"addr:street")
Out[55]: 'Westerkamp'
As I read your question, you want to split a string which looks like '"addr:city"="Bielefeld","addr:housenumber"="34"' into individual (key, value) pairs. The easiest way to do this is probably to use the csv reader (http://docs.python.org/2/library/csv.html). You will need to determine exactly how to use it in your use case, but here is a generic example which is likely to work, where attribute_list is an iterable of such strings:
import csv
for pairs in csv.reader(attribute_list):
    for pair in pairs:
        # split each field at the first '=' and drop any quotes that remain
        key, value = pair.split('=', 1)
        print(key.strip('"'), value.strip('"'))
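Another option is a regular expression that pulls every "key"="value" pair into a dict. This is a sketch only: parse_tags is a made-up name, it assumes neither keys nor values contain a double quote, and it does not show how to wire the function into the Field Calculator.

```python
import re

# Matches one "key"="value" pair; assumes neither part contains a double quote.
PAIR = re.compile(r'"([^"]*)"="([^"]*)"')


def parse_tags(text):
    """Return a dict of the key/value pairs found in the attribute string."""
    return dict(PAIR.findall(text))


tags = parse_tags('"addr:city"="Bielefeld","addr:street"="Detmolder Straße"')
print(tags.get("addr:city"))         # Bielefeld
print(tags.get("addr:housenumber"))  # None
```

Unlike splitting on commas, this keeps values that themselves contain a comma in one piece.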
