How to format numbers without comma in csv using python? [duplicate] - python

This question already has answers here:
How can I convert a string with dot and comma into a float in Python
(9 answers)
Closed 5 months ago.
I am given a CSV file that contains numbers ranging from 800 to 3000. The problem is that numbers greater than one thousand have a comma in them, e.g. 1,227 or 1,074 or 2,403.
When I try to calculate their mean, variance or standard deviation using scipy or numpy, I get an error: ValueError: could not convert string to float: '1,227'. How can I convert them to numbers so that I can do calculations on them? The CSV file should not be changed, as it is a read-only file.

Thanks, guys! I fixed it by using replace function. hpaulj's link was useful.
my_string=[val[2] for val in csvtext]
my_string=[x.replace(',', '') for x in my_string]
my_float=[float(i) for i in my_string]
In this code, the 1st line loads the string values from the CSV rows (index 2 of each row) into my_string, the 2nd line removes the commas, and the 3rd line produces floats that are ready for calculation. So there is no need to edit the file or create a new one; a little list manipulation does the job.
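For anyone who wants to try this end to end, here is a minimal sketch using only the standard library. The sample text, the column layout and the helper name parse_number are assumptions made for illustration; they are not taken from the original file. Note that a value such as 1,227 has to be quoted in the file for the row to parse as CSV at all.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for the read-only file.
# Values containing a comma are quoted, otherwise they would split into two fields.
sample = 'id,name,count\n1,a,"1,227"\n2,b,"1,074"\n3,c,950\n'

def parse_number(text):
    """Drop thousands separators and convert to float."""
    return float(text.replace(",", ""))

rows = csv.reader(io.StringIO(sample))
next(rows)  # skip the header row
values = [parse_number(row[2]) for row in rows]

mean = statistics.mean(values)
print(values, mean)
```

The file itself is never modified; only the in-memory strings are cleaned before conversion.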

This really is a locale issue, but a simple solution would be to simply call replace on the string first:
a = '1,274'
float(a.replace(',','')) # 1274.0
Another way is to use pandas to read the csv file. Its read_csv function has a thousands argument.
If you know which locale the numbers were written in, then it's probably best to use the locale.atof() function.

Related

How to skip first two numbers in provide argument? (python) [duplicate]

This question already has answers here:
How to remove leading and trailing zeros in a string? Python
(7 answers)
Closed last year.
I'm having issues skipping or trimming the first two numbers in a provided argument.
As an example, I am passing the value "00123456" to 'id'. I want request.args.get to give me 123456 instead of 00123456. Is there a function I can use to drop the zeros? I'm also relatively new to the Python world, so please advise if I need to provide more info.
@main.route('/test')
def test():
    """
    Test route for validating number
    example - /test?id=00123456
    """
    # Get number passed in id argument
    varNum = request.args.get('id')
You can convert the "00123456" to an int and it will remove all the zeros at the start of the string.
print(int("00123456"))
output:
123456
Edit:
Use this only if you want to remove all zeros at the start of the number; if you want to remove exactly the first two characters, use string slicing.
Also, use this only if you know for sure that the string contains only digits.
You can use string slicing, if you know that there are always two zeroes:
varNum = request.args.get('id')[2:]
Alternatively, you can use .lstrip(), if you don't know how many leading zeroes there are in advance:
varNum = request.args.get('id').lstrip('0')
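To compare the three suggestions side by side, here is a small sketch on a plain string, so it runs without Flask. The variable raw stands in for the value returned by request.args.get('id'), and strip_leading_zeros is a hypothetical helper name, not part of any library.

```python
def strip_leading_zeros(raw):
    """Return raw without leading zeros, keeping a single "0" for all-zero input."""
    return raw.lstrip("0") or "0"

raw = "00123456"  # stand-in for request.args.get('id')

as_number = int(raw)                       # an int; raises ValueError on non-numeric text
first_two_dropped = raw[2:]                # a str; always drops exactly two characters
zeros_stripped = strip_leading_zeros(raw)  # a str; drops however many zeros there are

print(as_number, first_two_dropped, zeros_stripped)
```

In a real route, keep in mind that request.args.get('id') returns None when the parameter is missing, so slicing or lstrip would fail; passing a default such as request.args.get('id', '') avoids that.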

Reformatting a txt file with characters at certain positions using python

Very newbie programmer asking a question here. I have searched all over the forums but can't find something to solve this issue I thought there would be a simple function for. Is there a way to do this?
I am trying to reformat a txt file so I can use it with the pandas function but this requires my data to be in a specific format.
Currently my data is in the following format of a txt file:
01/09/21,00:28,7.1,75,3.0,3.7,3.7,292,0.0,0.0,1025.8,81.9,17.1,44,3.7,4.6,7.1,0,0,0.00,0.00,3.0,0,0.0,292,0.0,0.0
01/09/21,00:58,7.0,75,2.9,5.1,5.1,248,0.0,0.0,1025.9,81.9,17.0,44,5.1,3.8,7.0,0,0,0.00,0.00,1.9,0,0.0,248,0.0,0.0
it is required to be formatted like this for processing using pandas:
["06/09/21","19:58",11.4,69,5.9,0.0,0.0,0,0.0,0.3,1006.6,82.2,21.8,52,0.0,11.4,11.4,0,0,0.00,0.00,10.5,0,1.5,0,0.0,0.3],
["06/09/21","20:28",10.6,73,6.0,0.0,0.0,0,0.0,0.3,1006.3,82.2,22.4,49,0.0,10.6,10.6,0,0,0.00,0.00,9.7,0,1.5,0,0.0,0.3],
This requires adding a [" at the start and adding a " at the end of the date before the comma, then adding another " after the comma and another " at the end of the time section. At the end of the line, I also need to add a ],
I thought something like this would work, but I get an error when trying to run it.
info =
06/09/21,19:58,11.4,69,5.9,0.0,0.0,0,0.0,0.3,1006.6,82.2,21.8,52,0.0,11.4,11.4,0,0,0.00,0.00,10.5,0,1.5,0,0.0,0.3
info=info[:1] +"['" +info[1:]
print (info)
I have over 1000 lines of data so doing it manually is out of the question. I've seen other questions like this, but they didn't get helpful answers. Can it be done, preferably with either a method or a loop?
You are confusing the CONTENTS of your data with the REPRESENTATION of your data. You don't really need brackets and quotes at all. What you need is a list that contains strings and integers. What you've shown there is how Python would PRINT a list containing strings and integers. The list doesn't actually contain brackets or quotes.
You can use pandas.read_csv directly on that data file with no extra processing. You just need to provide the column names.
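To illustrate the contents versus representation point without any third-party dependency, here is a sketch that uses the standard csv module instead of pandas. The sample is a shortened, hypothetical version of the data (only the first five columns), and parse_row is a helper name made up for this example.

```python
import csv
import io

# Shortened stand-in for the txt file: date, time, then numeric readings.
sample = (
    "01/09/21,00:28,7.1,75,3.0\n"
    "01/09/21,00:58,7.0,75,2.9\n"
)

def parse_row(fields):
    """Keep date and time as strings, convert every other field to float."""
    date, time, *readings = fields
    return [date, time] + [float(x) for x in readings]

rows = [parse_row(fields) for fields in csv.reader(io.StringIO(sample))]

# Printing a list is what shows the brackets and quotes;
# they are not part of the data and never had to be added to the file.
print(rows[0])
```

The same idea applies with pandas.read_csv: the file stays as it is, and the parser builds the typed values.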

How to use string slicing inside string.format [duplicate]

This question already has answers here:
Slicing strings in str.format
(6 answers)
Closed 6 years ago.
How can I do variable string slicing inside string.format like this.
"{0[:2]} Some text {0[2:4]} some text".format("123456")
I want the result to look like this:
12 Some text 34 some text
You can't. Best you can do is limit how many characters of a string are printed (roughly equivalent to specifying a slice end), but you can't specify arbitrary start or end indices.
Save the data to a named variable and pass the slices to the format method, it's more readable, more intuitive, and easier for the parser to identify errors when they occur:
mystr = "123456"
"{} Some text {} some text".format(mystr[:2], mystr[2:4])
You could move some of the work from that to the format string if you really wanted to, but it's not a huge improvement (and in fact, involves larger temporaries when a slice ends up being needed anyway):
"{:.2s} Some text {:.2s} some text".format(mystr, mystr[2:])
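As a quick check of the options above, here is a short sketch. The f-string variant is an addition that is not in the answer; it assumes Python 3.6 or newer, where replacement fields are full expressions and can therefore contain slices.

```python
mystr = "123456"

# Slices computed outside the format string, as suggested above.
sliced = "{} Some text {} some text".format(mystr[:2], mystr[2:4])

# Precision-based variant: ".2s" keeps at most two characters.
precision = "{:.2s} Some text {:.2s} some text".format(mystr, mystr[2:])

# f-strings evaluate expressions, so slices work inline.
inline = f"{mystr[:2]} Some text {mystr[2:4]} some text"

# The original attempt fails because str.format treats ":2" as a string key.
try:
    "{0[:2]} Some text {0[2:4]} some text".format(mystr)
    original_attempt_failed = False
except TypeError:
    original_attempt_failed = True

print(sliced)
```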

Reading default output of Fortran in Python [duplicate]

This question already has answers here:
Read FORTRAN formatted numbers with Python
(4 answers)
Closed 6 years ago.
I have an output from old code in Fortran 77. The output is written with
write(NUM,*)
line. So basically, default format. Following is part of output:
1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394-118 8.02718071E-94
I had a post-processing tool written in F77 and READ(NUM,*) read the input file correctly as:
1.25107598000000E-67 1.89781536000000E-61 1.28064971000000E-94 5.85754394000000E-118 8.02718071000000E-94
The problematic number is 5.85754394-118.
F77 reads it correctly, since there it means 5.85754394E-118.
However, I have now written a post-processing tool in Python, and it has the following line of code:
Z = numpy.fromstring(lines[nl], dtype=float, sep=' ')
which reads the output line by line (through a loop over nl).
But when it reaches the number 5.85754394-118 it stops reading and goes on to the next line of output, so it basically reads the wrong numbers. Is there any way to read it correctly (the default way of Fortran)?
I guess I need to change the dtype option, but I have no clue how.
You can post-process your output efficiently with a regular expression:
import re
r = re.compile(r"(?<=\d)\-(?=\d)")
output_line = "1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394-118 8.02718071E-94 "
print(r.sub("E-",output_line))
result:
1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394E-118 8.02718071E-94
(?<=\d)\-(?=\d) uses a lookbehind and a lookahead for digits to find a single minus sign sitting directly between two digits, and the substitution replaces that minus sign with E-.
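Building on that regular expression, here is a sketch that goes all the way to floats without numpy. It is a slight generalization rather than the answer's exact pattern: it also repairs a bare plus sign (e.g. 1.5+100), which is an assumption about what the Fortran output might contain. The name parse_fortran_line is made up for this example.

```python
import re

# A sign directly between two digits means the exponent letter was dropped.
_MISSING_E = re.compile(r"(?<=\d)([+-])(?=\d)")

def parse_fortran_line(line):
    """Re-insert the missing 'E' and convert each whitespace-separated token to float."""
    fixed = _MISSING_E.sub(r"E\1", line)
    return [float(token) for token in fixed.split()]

output_line = "1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394-118 8.02718071E-94 "
values = parse_fortran_line(output_line)
print(values)
```

Tokens that already contain E are left alone, because the sign there follows a letter rather than a digit. The resulting list can be passed to numpy.array if an array is needed.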

In xlrd how do I get the original cell value? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Integers from excel files become floats?
I have an excel spreadsheet that contains 1984, which xlrd handles as a number type, and thus gives me the value back as the float 1984.0. I want to get the original value as it appears in the spreadsheet, as a string "1984". How do I get this?
So internally in Excel, that 1984 is stored as a decimal number, so 1984.0 is correct. You could have changed the number formatting to show it as 1984.00, or whatever.
So are you asking how to query the cell formatting to tell that the number format is no decimals? If so you might look into using the formatting_info=True parameter of open_workbook
from xlrd import open_workbook
sheet = open_workbook(
    'types.xls', formatting_info=True
).sheet_by_index(0)
Have you come across the python-excel.pdf document from http://www.python-excel.org/ ?
It is pretty good tutorial for learning to use xlrd and xlwt. Unfortunately, they say:
We've already seen that open_workbook has a parameter to load formatting information from Excel files. When this is done, all the formatting information is available, but the details of how it is presented are beyond the scope of this tutorial.
If cell.ctype == xlrd.XL_CELL_NUMBER, then Excel is storing 1984 as a float and you need to convert it to a string in Python.
In Excel:
="1984" would be a string
'1984 would be a string (note that the ' is not displayed)
1984 is a number
The only kind of number is a float. The formatting attached to the cell determines if it represents a date, a decimal, or an integer. Look up the format string, and hopefully it will let you discern how the number is to be displayed.
Use string formatting:
"%d" % mynumber
>>> "%d" % 1984.0
'1984'
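One caveat with "%d" is that it silently truncates non-integral values ("%d" % 19.5 gives '19'). Here is a minimal sketch of a more careful conversion; cell_to_text is a hypothetical helper, and value stands for whatever xlrd returned for the cell.

```python
def cell_to_text(value):
    """Render whole-number floats without the trailing '.0'; leave everything else as str()."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)

print(cell_to_text(1984.0))
print(cell_to_text(19.5))
```

This only looks at the stored value; it does not consult the cell's number format, so a cell displayed as 1984.00 in Excel would still come back as "1984".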
