Saving Python Results as .txt File

I have this code (unique_set = np.random.choice([0, 1], (10000, 10, 10, 10))), which generates 10000 binary matrices of shape 10x10x10, and I'm attempting to save the result as a .txt file. The similar questions I checked were either about writing a print statement to a file or were noticeably different. I tried many solutions like the one below, but none of them worked.
import sys

sys.stdout = open("test.txt", "w")
print(unique_set)
sys.stdout.close()

Try this one:
import numpy as np

unique_set = np.random.choice([0, 1], (10000, 10, 10, 10))
with open('D:\\yourpath\\filename.txt', 'w') as file:
    file.write('%s\n' % unique_set)  # note: writes numpy's summarized string (with ...), not every element

Not knowing what the format of your output file should look like, this is one possibility:
np.savetxt("test.txt", unique_set.flatten(), delimiter=",")

You can store it as a JSON text file, which preserves it being a 4D array (see: storing Numpy N dimensional arrays):
import json

with open('test.txt', 'w') as f:
    json.dump(unique_set.tolist(), f)
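Loading it back is the reverse (a sketch): json restores the nested lists and np.array rebuilds the 4D array.
import json
import numpy as np

with open('test.txt', 'r') as f:
    restored = np.array(json.load(f))  # shape (10000, 10, 10, 10)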

Related

Retrieving error when reading a .npy file

I am trying to read a large .npy file, but I am unable to read it. Below is my Python code for reading the file.
import numpy as np

pre_train = np.load('weights.npy', allow_pickle=True, encoding="latin1")
data_pic = pre_train.item()
#print(type(data_dic))
for item in data_pic:
    print(item)
The error occurs at data_pic = pre_train.item():
ValueError: can only convert an array of size 1 to a Python scalar
Your code does not crash when loading the file. It crashes when calling numpy.ndarray.item(), which only works for arrays of size 1. In your case, you do not need item() at all.
A good old for-loop will do!
data = np.load('...')
for i in data:
    for j in i:
        print(j)
# 2, 2, 6, 1, ...
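For contrast, item() is the right call when a .npy file holds a pickled Python object such as a dict; a sketch of that case (the file name here is hypothetical):
import numpy as np

# Saving a dict wraps it in a 0-d object array ...
np.save('weights_dict.npy', {'w1': np.zeros(3)})
# ... and item() unwraps the 0-d array back into the dict.
d = np.load('weights_dict.npy', allow_pickle=True).item()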

Numpy load result file saved in append mode

I have one big file saved using numpy in append mode, i.e., it contains maybe 5000 arrays, each with shape e.g. [1, 224, 224, 3], written like this:
filepath = 'hello'
for ...:  # some loop
    ...
    with open(filepath, 'ab') as f:
        np.save(f, ndarray)
I need to load the data in the file, maybe all arrays, or maybe in some generating mode, like reading the first 100, then the next 100, and so on. Is there any method to do this properly? For now, I only know that if I use np.load, I get one array per call, and I don't know how to read, say, arrays 100 through 199.
loading arrays saved using numpy.save in append mode
That question talks about something similar, but it doesn't seem to be what I want.
One solution, although ugly and only able to load all arrays in the file at once (thus risking an out-of-memory error), is the following:
a = []
with open(filepath, 'rb') as f:
    while True:
        try:
            a.append(np.load(f))
        except (EOFError, ValueError):  # raised when the file is exhausted
            break
a = np.stack(a)
This is more of a hack (given your situation).
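For the batched reading the question asks about (arrays 100 through 199), one possibility is a generator plus itertools.islice. This is a sketch under the same assumptions; note it still has to parse and skip the first 100 arrays, since the file carries no index:
import itertools
import numpy as np

def iter_arrays(path):
    # Yield arrays one at a time from a file written by repeated np.save calls.
    with open(path, 'rb') as f:
        while True:
            try:
                yield np.load(f)
            except (EOFError, ValueError):  # end of file reached
                return

# Read only arrays 100..199 without loading the whole file:
batch = list(itertools.islice(iter_arrays(filepath), 100, 200))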
Anyway, here is the code that created the file with np.save in append mode:
import numpy as np

numpy_arrays = [np.array([1, 2, 3]), np.array([0, 9])]
print(numpy_arrays[0], numpy_arrays[1])
print(type(numpy_arrays[0]), type(numpy_arrays[1]))

for numpy_array in numpy_arrays:
    with open("./my-numpy-arrays.bin", 'ab') as f:
        np.save(f, numpy_array)
[1 2 3] [0 9]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
... and here is the code that catches the end-of-file error while looping through:
with open("./my-numpy-arrays.bin", 'rb') as f:
    while True:
        try:
            numpy_array = np.load(f)
            print(numpy_array)
        except (EOFError, ValueError):  # no data left in the file
            break
[1 2 3]
[0 9]
Not very pretty but ... it works.

Streaming multiple numpy arrays to a file

This differs from Write multiple numpy arrays to file in that I need to be able to stream content, rather than writing it all at once.
I need to write multiple compressed numpy arrays in binary to a file. I cannot store all the arrays in memory before writing, so it is more like streaming numpy arrays to a file.
This currently works fine as text:
file = open("some file")
while doing_stuff:
    file.writelines(somearray + "\n")
where somearray is a new instance every loop. However, this does not work if I try to write the arrays as binary.
Arrays are created at 30 Hz and grow too big to keep in memory. They also cannot each be stored in a bunch of single-array files, because that would be wasteful and cause a huge mess. So I would like only one file per session instead of 10k files per session.
One option might be to use pickle to save the arrays to a file opened as an append binary file:
import numpy as np
import pickle

arrays = [np.arange(n**2).reshape((n, n)) for n in range(1, 11)]

with open('test.file', 'ab') as f:
    for array in arrays:
        pickle.dump(array, f)

new_arrays = []
with open('test.file', 'rb') as f:
    while True:
        try:
            new_arrays.append(pickle.load(f))
        except EOFError:
            break

assert all((new_array == array).all() for new_array, array in zip(new_arrays, arrays))
This might not be the fastest approach, but it should be fast enough. It might seem like it would take up more space, but compare these three:
x = 300
y = 300
# Note: the loop variable shadows x, so the arrays here actually have shapes
# (0, 300), (1, 300), ..., (29, 300); the file sizes below reflect that.
arrays = [np.random.randn(x, y) for x in range(30)]

with open('test2.file', 'ab') as f:
    for array in arrays:
        pickle.dump(array, f)

with open('test3.file', 'ab') as f:
    for array in arrays:
        f.write(array.tobytes())

with open('test4.file', 'ab') as f:
    for array in arrays:
        np.save(f, array)
You'll find the file sizes as 1,025 KB, 1,020 KB, and 1,022 KB respectively.
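One caveat (a sketch): the raw tobytes() stream is not self-describing, so reading it back requires knowing the dtype and shape up front, unlike pickle.load or np.load:
import numpy as np

a = np.random.randn(3, 4)
raw = a.tobytes()
# frombuffer needs the dtype, and reshape needs the original shape.
b = np.frombuffer(raw, dtype=a.dtype).reshape(a.shape)
assert (a == b).all()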
An NPZ file is just a zip archive, so you could save each array to a temporary NPY file, add that NPY file to the zip archive, and then delete the temporary file.
For example,
import os
import zipfile

import numpy as np

# File that will hold all the arrays.
filename = 'foo.npz'

with zipfile.ZipFile(filename, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(10):
        # `a` is the array to be written to the file in this iteration.
        a = np.random.randint(0, 10, size=20)

        # Name for the temporary file to which `a` is written. The root of this
        # filename is the name that will be assigned to the array in the npz file.
        # I've used 'arr_{}' (e.g. 'arr_0', 'arr_1', ...), similar to how `np.savez`
        # treats positional arguments.
        tmpfilename = "arr_{}.npy".format(i)

        # Save `a` to a npy file.
        np.save(tmpfilename, a)

        # Add the file to the zip archive.
        zf.write(tmpfilename)

        # Delete the npy file.
        os.remove(tmpfilename)
Here's an example where that script is run, and then the data is read back using np.load:
In [1]: !ls
add_array_to_zip.py
In [2]: run add_array_to_zip.py
In [3]: !ls
add_array_to_zip.py foo.npz
In [4]: foo = np.load('foo.npz')
In [5]: foo.files
Out[5]: ['arr_0', 'arr_1', 'arr_2', 'arr_3', 'arr_4', 'arr_5', 'arr_6', 'arr_7', 'arr_8', 'arr_9']
In [6]: foo['arr_0']
Out[6]: array([0, 9, 3, 7, 2, 2, 7, 2, 0, 5, 8, 1, 1, 0, 4, 2, 5, 1, 8, 2])
You'll have to test this on your system to see if it can keep up with your array generation process.
Another alternative is to use something like HDF5, with either h5py or pytables.
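As a rough illustration of the HDF5 route, here is a minimal h5py sketch (assuming h5py is installed and all arrays share one shape, here (300, 300)). A resizable, chunked dataset lets you append arrays as they arrive and slice arbitrary ranges on read:
import h5py
import numpy as np

with h5py.File('arrays.h5', 'w') as f:
    dset = f.create_dataset('frames', shape=(0, 300, 300),
                            maxshape=(None, 300, 300),  # growable along axis 0
                            chunks=(1, 300, 300), compression='gzip')
    for _ in range(100):  # stand-in for the 30 Hz producer
        frame = np.random.randn(300, 300)
        dset.resize(dset.shape[0] + 1, axis=0)
        dset[-1] = frame  # written out incrementally

# Reading back any range is then just slicing:
with h5py.File('arrays.h5', 'r') as f:
    batch = f['frames'][10:20]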

Saving a dictionary of numpy arrays in human-readable format

This is not a duplicate question. I looked around a lot and found this question, but the savez and pickle utilities render the file unreadable by a human. I want to save it in a .txt file which can be loaded back into a Python script. So I wanted to know whether there are some utilities in Python which can facilitate this task and keep the written file readable by a human.
The dictionary of numpy arrays contains 2D arrays.
EDIT:
Following Craig's answer, I tried the following:
import numpy as np

W = np.arange(10).reshape(2, 5)
b = np.arange(12).reshape(3, 4)
d = {'W': W, 'b': b}

with open('out.txt', 'w') as outfile:
    outfile.write(repr(d))

f = open('out.txt', 'r')
d = eval(f.readline())
print(d)
This gave the following error: SyntaxError: unexpected EOF while parsing.
But out.txt did contain the dictionary as expected. How can I load it correctly?
EDIT 2:
Ran into a problem: Craig's answer truncates the array if it is large. out.txt shows the first few elements, replaces the middle elements with ..., and shows the last few elements.
Convert the dict to a string using repr() and write that to the text file.
import numpy as np

d = {'a': np.zeros(10), 'b': np.ones(10)}

with open('out.txt', 'w') as outfile:
    outfile.write(repr(d))
You can read it back in and convert it to a dictionary with eval(); note the use of f.read() rather than readline(), since the repr spans multiple lines:
import numpy as np

with open('out.txt', 'r') as f:
    data = f.read()

data = data.replace('array', 'np.array')
d = eval(data)
Or, you can directly import array from numpy:
from numpy import array

with open('out.txt', 'r') as f:
    data = f.read()

d = eval(data)
H/T: How can a string representation of a NumPy array be converted to a NumPy array?
Handling large arrays
By default, numpy summarizes arrays longer than 1000 elements. You can change this behavior by calling numpy.set_printoptions(threshold=S) where S is larger than the size of the arrays. For example:
import numpy as np

W = np.arange(10).reshape(2, 5)
b = np.arange(12).reshape(3, 4)
d = {'W': W, 'b': b}

largest = max(np.prod(a.shape) for a in d.values())  # get the size of the largest array
np.set_printoptions(threshold=largest)  # set threshold to largest to avoid summarizing

with open('out.txt', 'w') as outfile:
    outfile.write(repr(d))

np.set_printoptions(threshold=1000)  # restore the default (recommended, but not necessary)
H/T: Ellipses when converting list of numpy arrays to string in python 3
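If eval() on file contents makes you uneasy, here is a sketch of an alternative in the spirit of the JSON answer further up: serialize each array via tolist(). The file stays human-readable, nothing is truncated, and no code is executed on load:
import json
import numpy as np

d = {'W': np.arange(10).reshape(2, 5), 'b': np.arange(12).reshape(3, 4)}

# Write: nested lists are something json can serialize.
with open('out.txt', 'w') as f:
    json.dump({k: v.tolist() for k, v in d.items()}, f, indent=2)

# Read: rebuild the arrays from the lists.
with open('out.txt', 'r') as f:
    d2 = {k: np.array(v) for k, v in json.load(f).items()}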

Splitting data file columns into separate arrays in Python

I'm new to Python and have been trying to figure this out all day. I have a data file laid out as below:
time I(R_stkb)
Step Information: Temp=0 (Run: 1/11)
0.000000000000000e+000 0.000000e+000
9.999999960041972e-012 8.924141e-012
1.999999992008394e-011 9.623148e-012
3.999999984016789e-011 6.154220e-012
(Note: there is no empty line between data lines.)
I want to plot the data using matplotlib functions, so I'll need the two separate columns in arrays.
I currently have:
def plotdata():
    Xvals = []
    Yvals = []
    i = open(file, 'r')
    for line in i:
        Xvals, Yvals = line.split(' ', 1)
    print(Xvals, Yvals)
But obviously it's completely wrong. Can anyone give me a simple answer, with an explanation of what exactly the lines mean? Cheers.
Edit: The first two lines repeat throughout the file.
This is a job for the * operator with the zip function.
>>> asdf
[[1, 2], [3, 4], [5, 6]]
>>> list(zip(*asdf))
[(1, 3, 5), (2, 4, 6)]
(In Python 3, zip returns an iterator, hence the list() call.)
So in the context of your data it might be something like:
handle = open(file,'r')
lines = [line.split() for line in handle if line[:4] not in ('time', 'Step')]
Xvals, Yvals = zip(*lines)
or if you really need to be able to mutate the data afterwards, you could just call the list constructor on each tuple:
Xvals, Yvals = [list(block) for block in zip(*lines)]
One way to do it is:
Xvals = []
Yvals = []
i = open(file, 'r')
for line in i:
    x, y = line.split(' ', 1)  # assumes the two header lines have been skipped or removed
    Xvals.append(float(x))
    Yvals.append(float(y))
print(Xvals, Yvals)
Note the call to the float function, which turns the string read from the file into a number.
This is what numpy.loadtxt is designed for. Try:
import numpy as np
import matplotlib.pyplot as plt

# assuming you have time and step information on 2 separate lines
# and you do not want to read them
data = np.loadtxt(file, skiprows=2)
plt.plot(data[:, 0], data[:, 1])
plt.show()
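A small variant of the same call (a sketch): unpack=True transposes the result, so the two columns come back directly as the Xvals/Yvals the question asks for.
import numpy as np

x, y = np.loadtxt(file, skiprows=2, unpack=True)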
EDIT:
If you have time and step information scattered throughout the file and you want to plot the data for every step, one possibility is to read the whole file into memory (assuming it is small enough) and then split it on the 'time' strings:
l = open(fname, 'r').read()
for chunk in l.split('time'):
    data = np.array([s.split() for s in chunk.split('\n')[2:]][:-1], dtype=float)
    plt.plot(data[:, 0], data[:, 1])
plt.show()
Or else you could prefix the header lines with the # comment sign and use np.loadtxt, which skips # comments by default.
If you want to plot this file with matplotlib, you might also want to check out its plotfile function; see the official documentation. (Note: plotfile has been deprecated and removed in recent Matplotlib releases.)
