Reading file in Fortran-90 written out by Python f.write() - python

So someone wrote this code which outputs x,y,z positions of some particles.
if rs.contains_block(file+'.hdf5',"POS ",parttype=1):
d1 = rs.read_block(file, "POS ",parttype=1,verbose=False)
blocksize = struct.pack('I', len(d1)*8*3)
f.write(blocksize)
for i in range(len(d1)):
for j in range(3):
data = struct.pack('d',d1[i][j])
f.write(data)
f.write(blocksize)
print 'Position real', d1, k
else:
data = struct.pack('I',0)
f.write(data)
f.write(data)
Or basically:
for i = 1:Nparticles
for j = 1:3
write xyz[i][j]
end
end
Now I am trying to read this back in F90 like:
real*4, allocatable :: pos(:,:)
N=sum(Nparticles)
open (1, file=filename, form='unformatted')
allocate(pos(1:3,1:N))
read (1) pos
close(1)
When I print the first three rows I get:
do i =1,3
print *, pos(1:3,i)
end do
>> 0.00000000 2.61613369 -2.00000000
1.88289821 -2.00000000 1.96834707
2.00000000 2.61616445 2.00000000
Now I know for certain that the positions range from 0 - 25 so getting -2.0000 everywhere is concerning. Is there something about my friends Python write out (f.write()) that that I need to tell Fortran to do during the read in so it outputs the positions correctly? Leading characters? Allocation correction? I've used a Python read-in and I get the following for the first 3 entries:
>>> pos = rs.read_block("filename","POS ",parttype=1)
>>> pos(1:3,:)
array([[ 12.49398994, 21.89432526, 6.23691988],
[ 12.48858261, 21.89297867, 6.23258686],
[ 12.48777962, 21.89576149, 6.23423147],
Which is not what I get when I do the Fortran read in above.
Thanks.

Related

I'm reading into a 256 byte string. I want to skip it, if it's all binary zeros (\x00) Is there a single test?

Totally new to python. Trying to parse a file but not all records contain data. I want to skip the records that are all hex 00.
if record == ('\x00' * 256): from a sample of print("-"*80))
gave a Syntax error, hey I said I was new. :)
Thanks for the reply, I'm using 2.7 and reading like this....
with open(testfile, "rb") as f:
counter = 0
while True:
record = f.read(256)
counter += 1
Your example looks to be very close. I'm not sure about Python 2, but in Python 3 you should specify that a string is binary.
I would do something like:
empty = b'\x00' * 256
if record == empty:
print('skipped this line')
Remember that Python 2 uses print statements, so you should do print 'skipped this line' instead.

Separate lines in Python

I have a .txt file. It has 3 different columns. The first one is just numbers. The second one is numbers which starts with 0 and it goes until 7. The final one is a sentence like. And I want to keep them in different lists because of matching them for their numbers. I want to write a function. How can I separate them in different lists without disrupting them?
The example of .txt:
1234 0 my name is
6789 2 I am coming
2346 1 are you new?
1234 2 Who are you?
1234 1 how's going on?
And I have keep them like this:
----1----
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
----2----
2346 1 are you new?
----3-----
6789 2 I am coming
What I've tried so far:
inputfile=open('input.txt','r').read()
m_id=[]
p_id=[]
packet_mes=[]
input_file=inputfile.split(" ")
print(input_file)
input_file=line.split()
m_id=[int(x) for x in input_file if x.isdigit()]
p_id=[x for x in input_file if not x.isdigit()]
With your current approach, you are reading the entire file as a string, and performing a split on a whitespace (you'd much rather split on newlines instead, because each line is separated by a newline). Furthermore, you're not segregating your data into disparate columns properly.
You have 3 columns. You can split each line into 3 parts using str.split(None, 2). The None implies splitting on space. Each group will be stored as key-list pairs inside a dictionary. Here I use an OrderedDict in case you need to maintain order, but you can just as easily declare o = {} as a normal dictionary with the same grouping (but no order!).
from collections import OrderedDict
o = OrderedDict()
with open('input.txt', 'r') as f:
for line in f:
i, j, k = line.strip().split(None, 2)
o.setdefault(i, []).append([int(i), int(j), k])
print(dict(o))
{'1234': [[1234, 0, 'my name is'],
[1234, 2, 'Who are you?'],
[1234, 1, "how's going on?"]],
'6789': [[6789, 2, 'I am coming']],
'2346': [[2346, 1, 'are you new?']]}
Always use the with...as context manager when working with file I/O - it makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient.
Maybe you want something like that:
import re
# Collect data from inpu file
h = {}
with open('input.txt', 'r') as f:
for line in f:
res = re.match("^(\d+)\s+(\d+)\s+(.*)$", line)
if res:
if not res.group(1) in h:
h[res.group(1)] = []
h[res.group(1)].append((res.group(2), res.group(3)))
# Output result
for i, x in enumerate(sorted(h.keys())):
print("-------- %s -----------" % (i+1))
for y in sorted(h[x]):
print("%s %s %s" % (x, y[0], y[1]))
The result is as follow (add more newlines if you like):
-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming
It's based on regexes (module re in python). This is a good tool when you want to match simple line based patterns.
Here it relies on spaces as columns separators but it can as easily be adapted for fixed width columns.
The results is collected in a dictionary of lists. each list containing tuples (pairs) of position and text.
The program waits output for sorting items.
It's a quite ugly code but it's quite easy to understand.
raw = []
with open("input.txt", "r") as file:
for x in file:
raw.append(x.strip().split(None, 2))
raw = sorted(raw)
title = raw[0][0]
refined = []
cluster = []
for x in raw:
if x[0] == title:
cluster.append(x)
else:
refined.append(cluster)
cluster = []
title = x[0]
cluster.append(x)
refined.append(cluster)
for number, group in enumerate(refined):
print("-"*10+str(number)+"-"*10)
for line in group:
print(*line)

Formatting output of CSV data?

I'm fairly new to python and made something that had this output:
(The text is in a csv file so so:
1,A
2,B
3,C etc)
Number Letter
1 A
2 B
3 C
26 Z
Unfortunately, I spent a good amount of time making it using a complicated method in which I manually made spaces like this:
Updated Code rn
fx = int(input('Number?\n'))
f=open('nums.txt','r')
lines=f.readlines()
line = lines[fx - 1]
with open('nums.txt','r') as f:
for i, line in enumerate(f):
if i >= 5:
break
NUM, LTR, SMB = line.rsplit(',', 1)
print(NUM.ljust(13) + LTR.ljust(13) + SMB)
How do I get it to make 3 columns? Right now it comes up with a
ValueError: not enough values to unpack (expected 3, got 2)
So is there a simpler method of achieving this that doesn't move the strings around like this:
Number Letter
1 A
2 B
3 C
26 Z #< string moves with spaces.
For simple alignment, you can use ljust or rjust. There is also no need to read the entire file for each line you want to process:
with open('numberletter','r') as f:
for i, line in enumerate(f):
if i >= 5:
break
number, letter = line.rsplit(',', 1)
print(number.ljust(13) + letter)
For more complex output formatting, look at str.format() and the formatting syntax
You can use sys module for that.
import sys
a=[1,"A"]
sys.stdout.write("%-6s %-50s " % (a[0],a[1]))

Reading binary file in python

I wrote a python script to create a binary file of integers.
import struct
pos = [7623, 3015, 3231, 3829]
inh = open('test.bin', 'wb')
for e in pos:
inh.write(struct.pack('i', e))
inh.close()
It worked well, then I tried to read the 'test.bin' file using the below code.
import struct
inh = open('test.bin', 'rb')
for rec in inh:
pos = struct.unpack('i', rec)
print pos
inh.close()
But it failed with an error message:
Traceback (most recent call last):
File "readbinary.py", line 10, in <module>
pos = struct.unpack('i', rec)
File "/usr/lib/python2.5/struct.py", line 87, in unpack
return o.unpack(s)
struct.error: unpack requires a string argument of length 4
I would like to know how I can read these file using struct.unpack.
Many thanks in advance,
Vipin
for rec in inh: reads one line at a time -- not what you want for a binary file. Read 4 bytes at a time (with a while loop and inh.read(4)) instead (or read everything into memory with a single .read() call, then unpack successive 4-byte slices). The second approach is simplest and most practical as long as the amount of data involved isn't huge:
import struct
with open('test.bin', 'rb') as inh:
indata = inh.read()
for i in range(0, len(data), 4):
pos = struct.unpack('i', data[i:i+4])
print(pos)
If you do fear potentially huge amounts of data (which would take more memory than you have available), a simple generator offers an elegant alternative:
import struct
def by4(f):
rec = 'x' # placeholder for the `while`
while rec:
rec = f.read(4)
if rec: yield rec
with open('test.bin', 'rb') as inh:
for rec in by4(inh):
pos = struct.unpack('i', rec)
print(pos)
A key advantage to this second approach is that the by4 generator can easily be tweaked (while maintaining the specs: return a binary file's data 4 bytes at a time) to use a different implementation strategy for buffering, all the way to the first approach (read everything then parcel it out) which can be seen as "infinite buffering" and coded:
def by4(f):
data = inf.read()
for i in range(0, len(data), 4):
yield data[i:i+4]
while leaving the "application logic" (what to do with that stream of 4-byte chunks) intact and independent of the I/O layer (which gets encapsulated within the generator).
I think "for rec in inh" is supposed to read 'lines', not bytes. What you want is:
while True:
rec = inh.read(4) # Or inh.read(struct.calcsize('i'))
if len(rec) != 4:
break
(pos,) = struct.unpack('i', rec)
print pos
Or as others have mentioned:
while True:
try:
(pos,) = struct.unpack_from('i', inh)
except (some_exception...):
break
Check the size of the packed integers:
>>> pos
[7623, 3015, 3231, 3829]
>>> [struct.pack('i',e) for e in pos]
['\xc7\x1d\x00\x00', '\xc7\x0b\x00\x00', '\x9f\x0c\x00\x00', '\xf5\x0e\x00\x00']
We see 4-byte strings, it means that reading should be 4 bytes at a time:
>>> inh=open('test.bin','rb')
>>> b1=inh.read(4)
>>> b1
'\xc7\x1d\x00\x00'
>>> struct.unpack('i',b1)
(7623,)
>>>
This is the original int! Extending into a reading loop is left as an exercise .
You can probably use array as well if you want:
import array
pos = array.array('i', [7623, 3015, 3231, 3829])
inh = open('test.bin', 'wb')
pos.write(inh)
inh.close()
Then use array.array.fromfile or fromstring to read it back.
This function reads all bytes from file
def read_binary_file(filename):
try:
f = open(filename, 'rb')
n = os.path.getsize(filename)
data = array.array('B')
data.read(f, n)
f.close()
fsize = data.__len__()
return (fsize, data)
except IOError:
return (-1, [])
# somewhere in your code
t = read_binary_file(FILENAME)
fsize = t[0]
if (fsize > 0):
data = t[1]
# work with data
else:
print 'Error reading file'
Your iterator isn't reading 4 bytes at a time so I imagine it's rather confused. Like SilentGhost mentioned, it'd probably be best to use unpack_from().

Bash or Python to go backwards?

I have a text file which a lot of random occurrences of the string #STRING_A, and I would be interested in writing a short script which removes only some of them. Particularly one that scans the file and once it finds a line which starts with this string like
#STRING_A
then checks if 3 lines backwards there is another occurrence of a line starting with the same string, like
#STRING_A
#STRING_A
and if it happens, to delete the occurrence 3 lines backward. I was thinking about bash, but I do not know how to "go backwards" with it. So I am sure that this is not possible with bash. I also thought about python, but then I should store all information in memory in order to go backwards and then, for long files it would be unfeasible.
What do you think? Is it possible to do it in bash or python?
Thanks
Funny that after all these hours nobody's yet given a solution to the problem as actually phrased (as #John Machin points out in a comment) -- remove just the leading marker (if followed by another such marker 3 lines down), not the whole line containing it. It's not hard, of course -- here's a tiny mod as needed of #truppo's fun solution, for example:
from itertools import izip, chain
f = "foo.txt"
for third, line in izip(chain(" ", open(f)), open(f)):
if third.startswith("#STRING_A") and line.startswith("#STRING_A"):
line = line[len("#STRING_A"):]
print line,
Of course, in real life, one would use an iterator.tee instead of reading the file twice, have this code in a function, not repeat the marker constant endlessly, &c;-).
Of course Python will work as well. Simply store the last three lines in an array and check if the first element in the array is the same as the value you are currently reading. Then delete the value and print out the current array. You would then move over your elements to make room for the new value and repeat. Of course when the array is filled you'd have to make sure to continue to move values out of the array and put in the newly read values, stopping to check each time to see if the first value in the array matches the value you are currently reading.
Here is a more fun solution, using two iterators with a three element offset :)
from itertools import izip, chain, tee
f1, f2 = tee(open("foo.txt"))
for third, line in izip(chain(" ", f1), f2):
if not (third.startswith("#STRING_A") and line.startswith("#STRING_A")):
print line,
Why shouldn't it possible in bash? You don't need to keep the whole file in memory, just the last three lines (if I understood correctly), and write what's appropriate to standard-out. Redirect that into a temporary file, check that everything worked as expected, and overwrite the source file with the temporary one.
Same goes for Python.
I'd provide a script of my own, but that wouldn't be tested. ;-)
As AlbertoPL said, store lines in a fifo for later use--don't "go backwards". For this I would definitely use python over bash+sed/awk/whatever.
I took a few moments to code this snippet up:
from collections import deque
line_fifo = deque()
for line in open("test"):
line_fifo.append(line)
if len(line_fifo) == 4:
# "look 3 lines backward"
if line_fifo[0] == line_fifo[-1] == "#STRING_A\n":
# get rid of that match
line_fifo.popleft()
else:
# print out the top of the fifo
print line_fifo.popleft(),
# don't forget to print out the fifo when the file ends
for line in line_fifo: print line,
This code will scan through the file, and remove lines starting with the marker. It only keeps only three lines in memory by default:
from collections import deque
def delete(fp, marker, gap=3):
"""Delete lines from *fp* if they with *marker* and are followed
by another line starting with *marker* *gap* lines after.
"""
buf = deque()
for line in fp:
if len(buf) < gap:
buf.append(line)
else:
old = buf.popleft()
if not (line.startswith(marker) and old.startswith(marker)):
yield old
buf.append(line)
for line in buf:
yield line
I've tested it with:
>>> from StringIO import StringIO
>>> fp = StringIO('''a
... b
... xxx 1
... c
... xxx 2
... d
... e
... xxx 3
... f
... g
... h
... xxx 4
... i''')
>>> print ''.join(delete(fp, 'xxx'))
a
b
xxx 1
c
d
e
xxx 3
f
g
h
xxx 4
i
This "answer" is for lyrae ... I'll amend my previous comment: if the needle is in the first 3 lines of the file, your script will either cause an IndexError or access a line that it shouldn't be accessing, sometimes with interesting side-effects.
Example of your script causing IndexError:
>>> lines = "#string line 0\nblah blah\n".splitlines(True)
>>> needle = "#string "
>>> for i,line in enumerate(lines):
... if line.startswith(needle) and lines[i-3].startswith(needle):
... lines[i-3] = lines[i-3].replace(needle, "")
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
IndexError: list index out of range
and this example shows not only that the Earth is round but also why your "fix" to the "don't delete the whole line" problem should have used .replace(needle, "", 1) or [len(needle):] instead of .replace(needle, "")
>>> lines = "NEEDLE x NEEDLE y\nnoddle\nnuddle\n".splitlines(True)
>>> needle = "NEEDLE"
>>> # Expected result: no change to the file
... for i,line in enumerate(lines):
... if line.startswith(needle) and lines[i-3].startswith(needle):
... lines[i-3] = lines[i-3].replace(needle, "")
...
>>> print ''.join(lines)
x y <<<=== whoops!
noddle
nuddle
<<<=== still got unwanted newline in here
>>>
My awk-fu has never been that good... but the following may provide you what you're looking for in a bash-shell/shell-utility form:
sed `awk 'BEGIN{ORS=";"}
/#STRING_A/ {
if(LAST!="" && LAST+3 >= NR) print LAST "d"
LAST = NR
}' test_file` test_file
Basically... awk is producing a command for sed to strip certain lines. I'm sure there's a relatively easy way to make awk do all of the processing, but this does seem to work.
The bad part? It does read the test_file twice.
The good part? It is a bash/shell-utility implementation.
Edit: Alex Martelli points out that the sample file above might have confused me. (my above code deletes the whole line, rather than the #STRING_A flag only)
This is easily remedied by adjusting the command to sed:
sed `awk 'BEGIN{ORS=";"}
/#STRING_A/ {
if(LAST!="" && LAST+3 >= NR) print LAST "s/#STRING_A//"
LAST = NR
}' test_file` test_file
This may be what you're looking for?
lines = open('sample.txt').readlines()
needle = "#string "
for i,line in enumerate(lines):
if line.startswith(needle) and lines[i-3].startswith(needle):
lines[i-3] = lines[i-3].replace(needle, "")
print ''.join(lines)
this outputs:
string 0 extra text
string 1 extra text
string 2 extra text
string 3 extra text
--replaced -- 4 extra text
string 5 extra text
string 6 extra text
#string 7 extra text
string 8 extra text
string 9 extra text
string 10 extra text
In bash you can use sort -r filename and tail -n filename to read the file backwards.
$LINES=`tail -n filename | sort -r`
# now iterate through the lines and do your checking
I would consider using sed. gnu sed supports definition of line ranges. if sed would fail, then there is another beast - awk and I'm sure you can do it with awk.
O.K. I feel I should put my awk POC. I could not figure out to use sed addresses. I have not tried combination of awk+sed, but it seems to me it's overkill.
my awk script works as follows:
It reads lines and stores them into 3 line buffer
once desired pattern is found (/^data.*/ in my case), the 3-line buffer is looked up to check, whether desired pattern has been seen three lines ago
if pattern has been seen, then 3 lines are scratched
to be honest, I would probably go with python also, given that awk is really awkward.
the AWK code follows:
function max(a, b)
{
if (a > b)
return a;
else
return b;
}
BEGIN {
w = 0; #write index
r = 0; #read index
buf[0, 1, 2]; #buffer
}
END {
# flush buffer
# start at read index and print out up to w index
for (k = r % 3; k r - max(r - 3, 0); k--) {
#search in 3 line history buf
if (match(buf[k % 3], /^data.*/) != 0) {
# found -> remove lines from history
# by rewriting them -> adjust write index
w -= max(r, 3);
}
}
buf[w % 3] = $0;
w++;
}
/^.*/ {
# store line into buffer, if the history
# is full, print out the oldest one.
if (w > 2) {
print buf[r % 3];
r++;
buf[w % 3] = $0;
}
else {
buf[w] = $0;
}
w++;
}

Categories

Resources