Some code I am using (not in Python) takes input files written in a specific way. I usually prepare such input files with Python scripts. One of them takes the following format:
100
0 1 2
3 4 5
6 7 8
where 100 is just an overall parameter and the rest is a matrix. In Python 2, I used to do it in the following way:
# python 2.7
import numpy as np
Matrix = np.arange(9)
Matrix.shape = 3,3
f = open('input.inp', 'w')
print >> f, 100
np.savetxt(f, Matrix)
I converted to Python 3 recently. Running the above script through 2to3 gives me something like:
# python 3.6
import numpy as np
Matrix = np.arange(9)
Matrix.shape = 3,3
f = open('input.inp', 'w')
print(100, file=f)
np.savetxt(f, Matrix)
The first error I got was TypeError: write() argument must be str, not bytes, because numpy.savetxt executes something like fh.write(asbytes(format % tuple(row) + newline)) internally. I was able to fix this by opening the file in binary mode: f = open('input.inp', 'wb'). But then the print() fails. Is there a way to harmonize these two?
I ran into this same issue converting to Python 3. All strings in Python 3 are Unicode by default, so you have to convert explicitly. I found writing to an in-memory buffer first, and then writing that string to the file, to be the most appealing solution. This is a working version of your snippet in Python 3 using this method:
# python 3.6
from io import BytesIO
import numpy as np
Matrix = np.arange(9)
Matrix.shape = 3,3
f = open('input.inp', 'w')
print(100, file=f)
fh = BytesIO()
np.savetxt(fh, Matrix, fmt='%d')
cstr = fh.getvalue()
fh.close()
print(cstr.decode('UTF-8'), file=f)
f.close()
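If your NumPy is recent enough, an alternative that avoids mixing text and binary handles entirely is to let savetxt write the parameter line itself. A minimal sketch, assuming NumPy 1.7+ (where the header and comments keyword arguments exist):

import numpy as np

Matrix = np.arange(9).reshape(3, 3)
# header= writes the parameter line before the matrix; comments=''
# stops savetxt from prefixing that line with '# '.
np.savetxt('input.inp', Matrix, fmt='%d', header='100', comments='')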
Related
I am trying to remove the top 24 lines of a raw file. I opened the original raw file (let's call it raw1.raw), converted it to a NumPy array, then initialized a new array and removed the top 24 lines. But after writing the new array to a new binary file (raw2.raw), I found that raw2.raw is only 15.2 MB while the original raw1.raw is about 30.6 MB. My code:
import numpy as np

def ave():
    fd = open('raw1.raw', 'rb')
    rows = 3000  # around 3000, not the real rows
    cols = 5100  # around 5100, not the real cols
    f = np.fromfile(fd, dtype=np.uint8, count=rows*cols)
    I_array = f.reshape((rows, cols))  # notice row, column format
    #print(I_array)
    fd.close()
    im = np.zeros((rows - 24, cols))
    for i in range(len(I_array) - 24):
        for j in range(len(I_array[i])):
            im[i][j] = I_array[i + 24][j]
    #print(im)
    newFile = open("raw2.raw", "wb")
    im.astype('uint8').tofile(newFile)
    newFile.close()

if __name__ == "__main__":
    ave()
I tried using im.astype('uint16') when writing to the binary file, but the values come out wrong if I use uint16.
There must clearly be more data in your 'raw1.raw' file than you are using. Are you sure that file wasn't created from 'uint16' data, so that you are just pulling out the first half as 'uint8' data? I just checked by writing random data:
import os
import numpy as np

x = np.random.randint(0, 256, size=(3000, 5100), dtype='uint8')
x.tofile(open('testfile.raw', 'wb'))  # binary mode for raw bytes
print(os.stat('testfile.raw').st_size)  # I get 15.3 MB (15,300,000 bytes)
So 'uint8' data of size 3000 by 5100 clearly takes up 15.3 MB. I don't know how you got 30+.
EDIT:
Just to add more clarification: do you realize that dtype does nothing more than change the "view" of your data? It doesn't affect the actual data that is saved in memory. This also goes for data that you read from a file. Take for example:
import numpy as np

# The way to understand x: it occupies 12 bytes in memory and uses that
# information to hold 3 values. The first 4 bytes are the first value,
# the second 4 bytes are the second, etc.
x = np.array([1, 2, 3], dtype='uint32')

# Change x to view those same 12 bytes as 6 different values. Doing this
# does NOT change the data that the array is holding. You are only
# changing the 'view' of the data.
x.dtype = 'uint16'
print(x)  # on a little-endian machine: [1 0 2 0 3 0]
In general (there are a few special cases), changing the dtype doesn't change the underlying data. However, the conversion method .astype() does change the underlying data. If you have an array of 12 bytes viewed as 'uint32', then running .astype('uint8') will take each entry (4 bytes) and convert it (known as casting) to a uint8 entry (1 byte). The new array will only have 3 bytes for the 3 entries. You can see this literally:
x = np.array([1, 2, 3], dtype='uint32')
print(x.tobytes())  # 12 bytes (little-endian): b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
y = x.astype('uint8')
print(y.tobytes())  # 3 bytes: b'\x01\x02\x03'
So, when we say that a file is 30 MB, we mean that the file contains (minus some header information) 30,000,000 bytes, each of which is exactly one uint8. If an array holds 6000 by 5100 uint8s (bytes), then the array has 30,600,000 bytes of information in memory.
Likewise, if you read a file (it DOES NOT MATTER which file) with np.fromfile(fd, dtype=np.uint8, count=15_300_000), then you told Python to read EXACTLY 15_300_000 bytes (again, 1 byte is 1 uint8) of information (15 MB). If your file is 100 MB, 40 MB, or even 30 MB, that is completely irrelevant, because you told Python to only read the first 15 MB of data.
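If the file really does hold 16-bit pixels, a minimal sketch of the fix (assuming your real rows/cols and a plain headerless file) is to read it as uint16 and slice off the top rows instead of copying in a loop:

import os
import numpy as np

rows, cols = 3000, 5100
print(os.stat('raw1.raw').st_size)  # ~30.6 MB would mean 2 bytes per pixel
data = np.fromfile('raw1.raw', dtype=np.uint16, count=rows*cols)
I_array = data.reshape((rows, cols))
im = I_array[24:, :]  # drop the top 24 rows without a Python loop
im.tofile('raw2.raw')  # writes (rows - 24) * cols uint16 values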
I have an IDL procedure that reads a binary file, and I am trying to translate it into a Python routine.
The IDL code looks like:
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
It works perfectly. So I wrote the same thing in Python, but I can't get the same results (even the value of 'a' is different). Here is my code:
import struct
import numpy as np

x = np.zeros(nptx, float)
y = np.zeros(npty, float)
z = np.zeros(nptz, float)
with open(name_file_data, "rb") as fb:
    a, = struct.unpack("I", fb.read(4))
    b, c, d, e = struct.unpack("ffff", fb.read(16))
    x[:] = struct.unpack(str(nptx)+"d", fb.read(nptx*8))
    y[:] = struct.unpack(str(npty)+"d", fb.read(npty*8))
    z[:] = struct.unpack(str(nptz)+"d", fb.read(nptz*8))
I hope this is enough for someone to answer me.
Update: As suggested in the answers, I'm now trying the module FortranFile, but I'm not sure I have understood everything about its use.
from scipy.io import FortranFile

f = FortranFile(name_file_data, 'r')
a = f.read_record('H')
b = f.read_record('f', 'f', 'f', 'f')
However, instead of an integer for 'a', I got array([0, 0], dtype=uint16).
And I get the following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)
According to a table of IDL data types, UINT(0) creates an unsigned 16-bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4-byte unsigned integer, and H denotes an unsigned 16-bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no guarantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.
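If FortranFile does not cooperate, the records can be parsed by hand. A minimal sketch, assuming the common gfortran convention of a 4-byte little-endian length marker before and after each record (other compilers may differ):

import struct
import numpy as np

def read_f77_record(fb):
    # Each unformatted record: 4-byte length marker, payload, 4-byte trailer.
    head = fb.read(4)
    if len(head) < 4:
        return None
    nbytes, = struct.unpack("<i", head)
    payload = fb.read(nbytes)
    trailer, = struct.unpack("<i", fb.read(4))
    assert trailer == nbytes, "record markers disagree"
    return payload

with open(name_file_data, "rb") as fb:
    a, = struct.unpack("<H", read_f77_record(fb))  # the 16-bit unsigned int
    b, c, d, e = struct.unpack("<4f", read_f77_record(fb))  # the four floats
    x = np.frombuffer(read_f77_record(fb), dtype="<f8")  # nptx doubles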
I'm trying to read noncontiguous fields from a binary file in Python using NumPy's fromfile function. It's based on this Matlab code using fread:
fseek(file, 0, 'bof');
q = fread(file, inf, 'float32', 8);
The 8 indicates the number of bytes I want to skip after reading each value. I was wondering whether there is a similar option in fromfile, or another way of reading specific values from a binary file in Python. Thanks for your help.
Henrik
Something like this should work, untested:
import struct

floats = []
with open(filename, 'rb') as f:
    while True:
        buff = f.read(4)  # an 'f' float is 4 bytes wide
        if len(buff) < 4:
            break
        x = struct.unpack('f', buff)[0]  # unpack returns a tuple; take the float
        floats.append(x)  # add the float to a list (for example)
        f.seek(8, 1)  # whence=1: seek relative to the current position
Using struct.unpack()
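Since the question asks about fromfile specifically: the same skip pattern can be expressed with a structured dtype, where a void field soaks up the bytes to skip. A sketch, assuming float32 values, that the file starts on a value, and that its length is a whole number of 12-byte records:

import numpy as np

rec = np.dtype([('q', '<f4'), ('pad', 'V8')])  # one value plus 8 ignored bytes
data = np.fromfile(filename, dtype=rec)
q = data['q']  # the float32 values, like Matlab's fread with a skip of 8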
I am new to both Matlab and Python, and I have to convert a program from Matlab to Python. I am not sure how to typecast the data after reading it from the file in Python. The file used is a binary file.
Below is the Matlab code:
fid = fopen(filename, 'r');
fseek(fid, 0, -1);
meta = zeros(n, 9, 'single');
v = zeros(n, 128, 'single');
d = 0;
for i = 1:n
    meta(i,:) = fread(fid, 9, 'float');
    d = fread(fid, 1, 'int');
    v(i,:) = fread(fid, d, 'uint8=>single');
end
I have written the program below in Python:
from struct import unpack
import numpy as np

fid = open(filename, 'r')
fid.seek(0, 0)
meta = np.zeros((n, 9), dtype=np.float32)
v = np.zeros((n, 128), dtype=np.float32)
for i in range(n):
    data_str = fid.read(9)
    meta[i,:] = unpack('f', data_str)
For this unpack, I get the error "unpack requires a string argument of length 4". Please suggest some way to make it work.
I looked into the problem a little, mainly because I will need this in the near future too. It turns out there is a very simple solution using numpy, assuming you have a Matlab matrix stored like I do.
import numpy as np

def read_matrix(file_name):
    return np.fromfile(file_name, dtype='<f')  # little-endian single-precision float

arr = read_matrix(file_path)
print(arr[0:10])  # sample data
print(len(arr))  # number of elements
You must find out the data type (dtype) yourself; the NumPy documentation on dtypes can help there. I used fwrite(fid,value,'single'); to store the data in Matlab; if you have the same, the code above will work.
Note that the returned variable is a flat one-dimensional array; you'll have to reshape it to match the original shape of your data. In my case, len(arr) is 307200, from a matrix of size 15360 x 20.
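For the record-structured file in the question, a closer translation of the Matlab loop is also possible with np.fromfile and explicit counts. A sketch, assuming the layout implied by the Matlab code (9 float32s, one int32 count d, then d uint8s per record):

import numpy as np

meta = np.zeros((n, 9), dtype=np.float32)
v = np.zeros((n, 128), dtype=np.float32)
with open(filename, 'rb') as fid:
    for i in range(n):
        meta[i,:] = np.fromfile(fid, dtype='<f4', count=9)
        d = int(np.fromfile(fid, dtype='<i4', count=1)[0])
        v[i,:d] = np.fromfile(fid, dtype=np.uint8, count=d)  # cast to float32 on assignment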
Please let me know the best way to write 8-bit values to a file in Python. The values can be between 0 and 255.
I open the file as follows:
f = open('temp', 'wb')
Assume the value I am writing is an int between 0 and 255 assigned to a variable, e.g.
x = 13
This does not work:
f.write(x)
...which, as you experts know, does not work: Python complains about not being able to write ints to the buffer interface. As a workaround I am writing it as a hex string. Thus:
f.write(hex(x))
...and that works, but it is not only space-inefficient, it is clearly not the right Python way. Please help. Thanks.
Try explicitly creating a bytes object:
f.write(bytes([x]))
You can also output a series of bytes as follows:
f.write(bytes([65, 66, 67]))
As an alternative you can use the struct module...
import struct
x = 13
with open('temp', 'wb') as f:
    f.write(struct.pack('>I', x))  # big-endian, unsigned int (4 bytes)
To read x from the file...
with open('temp', 'rb') as f:
    x, = struct.unpack(">I", f.read())
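Note that '>I' packs each value into four bytes; if you want exactly one byte per 8-bit value, struct.pack('B', x) uses the unsigned-char format and writes a single byte.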
You just need to write an item of the bytes data type, not an integer.
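For instance, a minimal sketch of that idea, using int.to_bytes alongside the bytes([...]) constructor shown above:

x = 13
with open('temp', 'wb') as f:
    f.write(bytes([x]))  # one byte with value 13
    f.write(x.to_bytes(1, 'big'))  # an equivalent single byte via int.to_bytes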