Possible Duplicate:
Integers from excel files become floats?
I have an Excel spreadsheet that contains 1984, which xlrd treats as a number type and therefore gives back to me as the float 1984.0. I want the original value as it appears in the spreadsheet, as the string "1984". How do I get this?
Internally, Excel stores that 1984 as a floating-point number, so 1984.0 is correct. You could have changed the number formatting to show it as 1984.00, or whatever.
So are you asking how to query the cell formatting to tell that the number format has no decimals? If so, you might look into using the formatting_info=True parameter of open_workbook:

from xlrd import open_workbook

sheet = open_workbook('types.xls', formatting_info=True).sheet_by_index(0)
Have you come across the python-excel.pdf document from http://www.python-excel.org/? It is a pretty good tutorial for learning to use xlrd and xlwt. Unfortunately, they say:
We've already seen that open_workbook has a parameter to load formatting information from Excel files. When this is done, all the formatting information is available, but the details of how it is presented are beyond the scope of this tutorial.
If cell.ctype == xlrd.XL_CELL_NUMBER, then Excel is storing 1984 as a float, and you would need to convert it to a string in Python.
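Since xlrd hands every XL_CELL_NUMBER cell value back as a float, the conversion can be done in plain Python. A minimal sketch (the helper name number_to_display is made up for illustration):

```python
def number_to_display(value):
    """Turn xlrd's float cell value back into the text Excel displays
    for whole numbers, e.g. 1984.0 -> '1984'."""
    if value == int(value):
        return str(int(value))
    return str(value)
```

For example, number_to_display(1984.0) returns '1984', while a genuinely fractional value such as 3.5 is left as '3.5'.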
In Excel:
="1984" would be a string
'1984 would be a string (note that the leading ' does not display)
1984 is a number
The only kind of number is a float. The formatting attached to the cell determines if it represents a date, a decimal, or an integer. Look up the format string, and hopefully it will let you discern how the number is to be displayed.
Use string formatting:
"%d" % mynumber
>>> "%d" % 1984.0
'1984'
Related
When a number is long in my excel document, Excel formats the cell value to scientific notation (ex 1.234567e+5) while the true number still exists in the formula bar at the top of the document (ex 123456789012).
I want to convert this number to a string for my own purposes, but when I do, the scientific notation is captured, rather than the true number. How can I assure that it's the true number that is being converted to a string?
Python will ignore the formatting that Excel uses for anything other than dates and times, so you should just be able to convert the number to a string. You will, however, be limited by Excel's precision. The OOXML file format is not suitable for some tasks, notably those involving historical dates or high-precision times.
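To illustrate with the number from the question (a sketch; the float is what xlrd would hand back, since scientific notation is only Excel's display):

```python
# xlrd returns the full-precision float, even though Excel displays 1.23457E+11
value = 123456789012.0
# Convert via int to avoid the trailing '.0' for whole numbers
text = str(int(value)) if value == int(value) else str(value)
# text == '123456789012'
```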
How can I convert a string with dot and comma into a float in Python
I am given a CSV file which contains numbers ranging from 800 to 3000. The problem is that numbers greater than a thousand have a comma in them, e.g. 1,227 or 1,074 or 2,403.
When I want to calculate their mean, variance, or standard deviation using scipy or numpy, I get the error ValueError: could not convert string to float: '1,227'. How do I convert them to numbers so that I can do calculations on them? The CSV file should not be changed, as it is a read-only file.
Thanks, guys! I fixed it by using the replace function. hpaulj's link was useful.

my_string = [val[2] for val in csvtext]
my_string = [x.replace(',', '') for x in my_string]
my_float = [float(i) for i in my_string]

In this code, the first line loads the CSV string list into my_string, the second line removes the commas, and the third line produces numbers that are ready for calculation. So there is no need to edit the file or create a new one; a simple list manipulation does the job.
This really is a locale issue, but a simple solution is to call replace on the string first:
a = '1,274'
float(a.replace(',','')) # 1274.0
Another way is to use pandas to read the csv file. Its read_csv function has a thousands argument.
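For example (a minimal sketch; the data is inlined via io.StringIO, and the column name value is made up — the fields are quoted so the comma is not treated as a delimiter):

```python
import io
import pandas as pd

# Hypothetical CSV with comma thousands separators
csv_text = 'value\n"1,227"\n"2,403"\n'
df = pd.read_csv(io.StringIO(csv_text), thousands=',')
# df['value'] is now a numeric column: 1227 and 2403
```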
If you do know something about the locale, then it's probably best to use the locale.atof() function
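A sketch of that approach, with a fallback since the en_US locale assumed here may not be installed on every system:

```python
import locale

def parse_grouped(s):
    # locale.atof understands the thousands separator of the active
    # numeric locale; fall back to stripping the comma by hand
    try:
        locale.setlocale(locale.LC_NUMERIC, 'en_US.UTF-8')
        return locale.atof(s)
    except locale.Error:
        return float(s.replace(',', ''))
```

Either path turns '1,227' into 1227.0.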
I have data structured as follows:
['1404407396000',
'484745869385011200',
'0',
'1922149633',
"The nurse from the university said I couldn't go if I don't get another measles immunization...",
'-117.14384195',
'32.8110777']
I want to write this data to a csv file, but when I do, Python converts the numbers to scientific notation (e.g. 1.404E12).
I am using the following function to convert the list of lists to a csv:
import csv

def list_to_csv(data, name_of_csv_string):
    """
    This function takes the list of lists created from the twitter data and
    writes it to a csv.

    data - List of lists
    name_of_csv_string - What do you think this could be?
    """
    with open(name_of_csv_string + ".csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows(data)
How can I avoid this?
By using the format specification mini-language described here: https://docs.python.org/2/library/string.html (see section 7.1.3.1, "Format Specification Mini-Language").
Use string formatting on each value before the rows are written:

writer.writerows([["%f" % x if isinstance(x, float) else x for x in row] for row in data])
There are various formatting options you can check out here.
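A self-contained sketch of formatting each value before writing (the rows here are hypothetical, borrowed from the question's data; io.StringIO stands in for the output file):

```python
import csv
import io

# Hypothetical rows mixing large integers and floats
data = [[1404407396000, 484745869385011200, -117.14384195]]

buf = io.StringIO()
writer = csv.writer(buf)
# Render floats with a fixed-point format so the CSV never contains
# scientific notation; integers pass through unchanged
writer.writerows([[format(v, 'f') if isinstance(v, float) else v for v in row]
                  for row in data])
```

The resulting row reads 1404407396000,484745869385011200,-117.143842 rather than anything in E-notation.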
In my case, it was the Microsoft Excel app which was converting the numbers to scientific notation (even in the formula bar the numbers were in scientific notation).
Try opening the csv file in Notepad or another plain text editor to check whether the numbers are saved as integers. In my case, Notepad showed normal integer numbers, while it was Excel that displayed them in scientific notation.
I am trying to figure out how to do some nice type inference on the columns of a CSV file.
Are there any libraries that might tell me, for example, that a column contains only integers?
All values are of course available in string format.
I will write my own tool if nothing of this sort already exists, but it seems weird to me that such a basic task does not have a library counterpart somewhere.
Why don't you do the straightforward approach?
if all values can be parsed as integers, the column is integers
otherwise, if all values can be parsed as doubles, the column is doubles
otherwise, the column is all strings
The reason why there is no library for this is probably that it is trivial to implement using the existing string-to-int and string-to-double conversion functions.
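The approach above can be sketched in a few lines (the function name infer_column_type is made up):

```python
def infer_column_type(values):
    """Infer a CSV column's type by trying the narrowest cast first,
    exactly as described above: int, then float, else string."""
    for caster, name in ((int, 'int'), (float, 'float')):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return 'str'
```

For example, ['1', '2', '3'] comes back as 'int', ['1.5', '2'] as 'float', and anything unparseable as 'str'.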
Regular expressions are good for that. In Python, you could use something like this:

import re

number_pattern = re.compile(r"^-?\d+(\.\d+)?$")

def str_is_num(s):
    return number_pattern.match(s) is not None
To check whether a cell is a number, you can evaluate str_is_num(cell)
Suppress the u'prefix indicating unicode' in python strings
I want to go through the data in my folder, identify the files, and rename them according to a list of rules I have in an Excel spreadsheet.
I load the needed libraries,
make my directory the working directory,
and read in the Excel file (using xlrd).
When I try to read the data by columns, e.g.:
fname = metadata.col_values(0, start_rowx=1, end_rowx=None)
the list of values comes with a u in front of them - I guess unicode - such as:
fname = [u'file1', u'file2'] and so on
How can I convert fname to a list of ascii strings?
I'm not sure what the big issue behind having unicode filenames is, but assuming that all of your characters are ascii-valid characters the following should do it. This solution will just ignore anything that's non-ascii, but it's worth thinking about why you're doing this in the first place:
ascii_string = unicode_string.encode("ascii", "ignore")
Specifically, for converting a whole list I would use a list comprehension:
ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
The u at the front is just a visual item to show you, when you print the string, what the underlying representation is. It's like the single-quotes around the strings when you print that list--they are there to show you something about the object being printed (specifically, that it's a string), but they aren't actually a part of the object.
In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.
with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.
You could also just use Python 3, where Unicode is the default and the u isn't normally printed.