This question already has answers here:
Read FORTRAN formatted numbers with Python
(4 answers)
Closed 6 years ago.
I have output from an old Fortran 77 code. The output is written with the line
write(NUM,*)
so basically the default (list-directed) format. The following is part of the output:
1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394-118 8.02718071E-94
I had a post-processing tool written in F77 and READ(NUM,*) read the input file correctly as:
1.25107598000000E-67 1.89781536000000E-61 1.28064971000000E-94 5.85754394000000E-118 8.02718071000000E-94
The problematic number is 5.85754394-118.
Fortran reads it correctly, because it means 5.85754394E-118: the E is dropped when the exponent needs three digits.
However, I have now written a post-processing tool in Python, and I have the following line of code:
Z = numpy.fromstring(lines[nl], dtype=float, sep=' ')
which reads the output line by line (through a loop over nl).
But when it reaches the number 5.85754394-118 it stops parsing that line, moves on to the next line of output, and basically reads the wrong numbers. Is there any way to read it correctly (the default way Fortran does)?
I guess I need to change the dtype option, but I have no clue how.
You can post-process your output efficiently with a regular expression:
import re
r = re.compile(r"(?<=\d)\-(?=\d)")
output_line = "1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394-118 8.02718071E-94 "
print(r.sub("E-",output_line))
result:
1.25107598E-67 1.89781536E-61 1.28064971E-94 5.85754394E-118 8.02718071E-94
(?<=\d)\-(?=\d) performs a lookbehind and a lookahead for digits and matches a single minus sign between them; sub() then replaces that minus sign with E-.
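For completeness, here is a sketch of how the fix could be dropped into the reading loop from the question; the file name and the loop are placeholders, and numpy.fromstring is kept only to match the original code:
import re
import numpy as np

# Pattern from above: a bare minus squeezed between digits, i.e. a
# three-digit exponent whose "E" was dropped by the Fortran writer.
fix_exponent = re.compile(r"(?<=\d)\-(?=\d)")

with open("output.dat") as f:          # placeholder file name
    for line in f:
        Z = np.fromstring(fix_exponent.sub("E-", line), dtype=float, sep=' ')
        # ... use Z here ...
Note that a large positive exponent is written the same way (e.g. 5.85754394+118), so a full fix would also replace (?<=\d)\+(?=\d) with E+.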
This question already has answers here:
Understanding generators in Python
(13 answers)
What does "list comprehension" and similar mean? How does it work and how can I use it?
(5 answers)
Closed last year.
I have found an example online of how to count items in a list with the sum() function in Python; however, when I search for how to use the sum() function on the internet, all I can find is the basic sum(iterable, start), which adds numbers together from each element of the list/array.
Code I found, where each line of the file contains one word, and file = open("words.txt", "r"):
wordsInFile = sum(1 for line in file)
This works in my program, and I roughly see what is happening, but I would like to learn more about this kind of syntax and what it can or can't use besides line. It seems pretty efficient, but I can't find any site explaining how it works, which keeps me from using it in other contexts in the future.
This expression is a generator expression.
First, let's write it a bit differently:
wordsInFile = sum([1 for line in file])
In this form, [1 for line in file] is called a list comprehension. It's basically a for loop which produces a list, wrapped up into one line. It's similar to
wordsInFile = []
for line in file:
    wordsInFile.append(1)
but a lot more concise.
Now, when we remove the brackets
wordsInFile = sum(1 for line in file)
we get what's called a generator expression. It's basically the same as what I wrote before except that it doesn't produce an intermediate list. It produces a special iterator object that supplies the values on-demand, which is more efficient for a function like sum that only uses its input once (and hence doesn't need us to waste a bunch of memory producing a big list).
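As a rough illustration (assuming a words.txt file as in the question), both forms give the same count; the generator expression just skips the intermediate list:
# Both forms count the lines of the file; the bracketed one builds a
# throwaway list of 1s first, the bare one feeds sum() one value at a time.
with open("words.txt") as file:
    with_list = sum([1 for line in file])

with open("words.txt") as file:
    with_generator = sum(1 for line in file)

print(with_list, with_generator)  # the two counts are equal
The part before the for can be any expression involving the loop variable, not just 1; for example, sum(len(line.split()) for line in file) would count the words on each line instead of the lines themselves.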
This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 1 year ago.
I'm merging some tab delimited files and the printed output is incorrect but if I access the string in a REPL it looks fine. Here's how it looks:
i = 0  # counter, initialised before the loop
fh = open('out.vcf')
for line in fh:
    i += 1
    if i == 29401:
        print(line)
AAEX03025909.1 1068 . T C 0 42 5
Then looking at it without print:
line
'AAEX03025909.1\t1405\t.\tC\tT\t\t\t\t\t\t0\t0\t0\t0\t0\t0\t0\t0\t10\t9\n'
When I look at out.vcf in less, it looks like the output of print. Why am I getting different outputs? I want the string that is produced without print. Using a comma instead of a tab solves the problem, but I'd like to keep it tab-delimited.
There's always going to be some difference between how data is stored and how it's displayed; the values are stored as bytes, and what you see depends on how they are rendered. In this case, you're seeing \t (ASCII character 9) rendered both ways.
print() writes the string itself, so the tab shows up as actual whitespace, while simply echoing the value at the REPL shows you its repr(), in which the tab is displayed as the escape sequence \t.
>>> "\t"
'\t'
>>> ord("\t")
9
>>> print("\t")
>>> repr("\t")
"'\\t'"
>>> print(repr("\t"))
'\t'
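In other words, the \t shown by the REPL is only display notation; the string itself already contains real tab characters, so you can work with it directly. A small sketch (the line is a shortened stand-in for one record from out.vcf):
# The escaped form is only how repr() displays the string; each \t is a
# single real tab character, so splitting on tabs works as expected.
line = 'AAEX03025909.1\t1405\t.\tC\tT\t0\t0\n'  # shortened stand-in
fields = line.rstrip('\n').split('\t')
print(fields)  # ['AAEX03025909.1', '1405', '.', 'C', 'T', '0', '0']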
This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 4 years ago.
I have a text file like below
# 1.txt
who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th
If I read a file and print it, it shows
>>> s = open("1.txt").read()
>>> print(s)
who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th
However, if I do the following with the same string,
>>> s = "who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th"
>>> print(s)
who is the winners where season result is 7th
I want to read a text file like "1.txt" and print it like the second example. I cannot find out how to do this. Please help me. Thanks.
\u00a0 is a non-breaking space and is one character.
In your first example you are reading \u00a0 as six literal characters (a backslash, a u, and four digits).
If you want to read a file containing literal \u00a0 sequences and interpret them as spaces, you have to process the text yourself and turn each \u00a0 into the character it represents.
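One way to do that parsing without writing it by hand (this follows the linked duplicate rather than anything specific to this answer) is to let Python's unicode_escape codec interpret the literal \uXXXX sequences; this assumes the rest of the file is plain ASCII text:
# Sketch: turn literal \u00a0 (and other \uXXXX escapes) in the file's text
# into the characters they stand for. Assumes the file is otherwise ASCII.
with open("1.txt") as f:
    raw = f.read()

decoded = raw.encode("latin-1").decode("unicode_escape")
print(decoded)  # the \u00a0 sequences now print as non-breaking spaces

# A non-breaking space still isn't an ordinary space; replace it if needed:
plain = decoded.replace('\u00a0', ' ')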
This question already has answers here:
How can I convert a string with dot and comma into a float in Python
(9 answers)
Closed 5 months ago.
I am given a CSV file which contains numbers ranging from 800 to 3000. The problem is that numbers greater than a thousand have a comma in them, e.g. 1,227 or 1,074 or 2,403.
When I want to calculate their mean, variance, or standard deviation using scipy or numpy, I get the error ValueError: could not convert string to float: '1,227'. How can I convert them to numbers so that I can do calculations on them? The CSV file should not be changed, as it is a read-only file.
Thanks, guys! I fixed it by using the replace function. hpaulj's link was useful.
my_string=[val[2] for val in csvtext]
my_string=[x.replace(',', '') for x in my_string]
my_float=[float(i) for i in my_string]
In this code, the first line loads the CSV column as a list of strings into my_string, the second line removes the commas, and the third line converts the strings to floats ready for calculation. So there is no need to edit the file or create a new one; a little list manipulation does the job.
This really is a locale issue, but a simple solution is to call replace on the string first:
a = '1,274'
float(a.replace(',','')) # 1274.0
Another way is to use pandas to read the csv file. Its read_csv function has a thousands argument.
If you do know something about the locale of the data, then it's probably best to use the locale.atof() function.
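For reference, rough sketches of those two library options; the file name and locale string are assumptions, and the chosen locale must actually be installed on the system:
# Option 1: let pandas strip the thousands separator while reading the CSV.
import pandas as pd
df = pd.read_csv("data.csv", thousands=",")  # '1,227' is read as the number 1227

# Option 2: locale-aware parsing of individual strings.
import locale
locale.setlocale(locale.LC_NUMERIC, "en_US.UTF-8")  # assumes this locale exists
value = locale.atof("1,227")  # 1227.0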
This question already has answers here:
Suppress the u'prefix indicating unicode' in python strings
(11 answers)
Closed 8 years ago.
I want to go through the files in my folder, identify them, and rename them according to a list of rules I have in an Excel spreadsheet.
I load the needed libraries,
I make my directory the working directory,
I read in the Excel file (using xlrd),
and when I try to read the data by columns, e.g.:
fname = metadata.col_values(0, start_rowx=1, end_rowx=None)
the list of values comes with a u in front of them - I guess unicode - such as:
fname = [u'file1', u'file2'] and so on
How can I convert fname to a list of ASCII strings?
I'm not sure what the big issue with having unicode filenames is, but assuming that all of your characters are valid ASCII, the following should do it. This solution simply ignores anything that's non-ASCII, but it's worth thinking about why you're doing this in the first place:
ascii_string = unicode_string.encode("ascii", "ignore")
Specifically, for converting a whole list I would use a list comprehension:
ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
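For illustration, a minimal Python 2 sketch with stand-in values (under Python 3, encode() would return bytes instead, which is one more reason to question whether the conversion is needed at all):
# Python 2 sketch: the u prefix disappears once the values are byte strings.
fname = [u'file1', u'file2']                     # stand-in for the xlrd output
ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
print(fname)       # [u'file1', u'file2']
print(ascii_list)  # ['file1', 'file2']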
The u at the front is just a visual item to show you, when you print the string, what the underlying representation is. It's like the single-quotes around the strings when you print that list--they are there to show you something about the object being printed (specifically, that it's a string), but they aren't actually a part of the object.
In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.
with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.
You could also just use Python 3, where Unicode is the default and the u isn't normally printed.