I need to save multiple numpy arrays, along with the user input that was used to compute the data these arrays contain, in a single file. I'm having a hard time finding a good procedure for this, or even deciding what file type to use. The only thing I can think of is to put the computed arrays and the user input into one single array and then save it using numpy.save. Does anybody know any better alternatives or good file types for my use?
You could try using Pickle to serialize your arrays.
How about using pickle and then storing pickled array objects in a storage of your choice, like database or files?
I had this problem long ago, so I don't have the code at hand to show you, but I used a binary write to a tmp file to get that done.
EDIT: That's it, pickle is what I used. Thanks SpankMe and RoboInventor
Numpy provides functions to save arrays to files, e.g. savez():
import numpy as np
outfile = '/tmp/data.npz'  # savez appends .npz to names that lack it, so use it explicitly
x = np.arange(10)
y = np.sin(x)
np.savez(outfile, x=x, y=y)
npzfile = np.load(outfile)
print(npzfile['x'])
print(npzfile['y'])
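For the original question (arrays plus the user input that produced them, in one file), a minimal sketch is to serialize the input to a JSON string and store it as one more entry in the same .npz archive. The parameter names in user_input and the file name run.npz are made up for illustration.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical user input that drove the computation
user_input = {"n_points": 10, "function": "sin"}

x = np.arange(user_input["n_points"])
y = np.sin(x)

# Write to a temporary directory so the sketch does not touch real data
outfile = os.path.join(tempfile.mkdtemp(), "run.npz")

# The JSON string is stored as a 0-d string array next to x and y
np.savez(outfile, x=x, y=y, user_input=json.dumps(user_input))

with np.load(outfile) as npzfile:
    # .item() turns the 0-d string array back into a plain Python str
    restored_input = json.loads(npzfile["user_input"].item())
    restored_y = npzfile["y"]

print(restored_input)
```

Because the input is stored as a plain string rather than a pickled object, the file can be loaded with the default allow_pickle=False.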
I want to store multiple GeoTiff files in one HDF5 file to use for further analysis, since the function I am supposed to use can only deal with HDF5 (so basically like a raster stack in R, but stored in an HDF5 file). I have to use Python. I am relatively new to the HDF5 format (and to geoanalysis in Python generally) and don't really know how to approach this issue. Keeping the geolocation/projection information seems especially tricky to me. So far I tried:
import h5py
import rasterio
r1 = rasterio.open("filename.tif")
r2 = rasterio.open("filename2.tif")
with h5py.File('path/test.h5', 'w') as hdf:
    hdf.create_dataset('GeoTiff1', data=r1)
    hdf.create_dataset('GeoTiff2', data=r2)
Yielding the following error:
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
I am pretty sure this is not at all the correct approach, and I'm happy about any suggestions.
What you can try is to do this:
import numpy as np
spec_dtype = h5py.special_dtype(vlen=np.dtype('float64'))
This creates a spec_dtype variable describing variable-length float64 data. Then pass it to create_dataset:
with h5py.File('path/test.h5', 'w') as hdf:
    hdf.create_dataset('GeoTiff1', data=r1, dtype=spec_dtype)
    hdf.create_dataset('GeoTiff2', data=r2, dtype=spec_dtype)
Apply these and hopefully it will work.
Using HDFql in Python, your use-case could be solved as follows:
import HDFql
HDFql.execute("SHOW FILE SIZE filename.tif, filename2.tif")
HDFql.cursor_next()
HDFql.execute("CREATE DATASET path/test.h5 GeoTiff1 AS OPAQUE(%d) VALUES FROM BINARY FILE filename.tif" % HDFql.cursor_get_bigint())
HDFql.cursor_next()
HDFql.execute("CREATE DATASET path/test.h5 GeoTiff2 AS OPAQUE(%d) VALUES FROM BINARY FILE filename2.tif" % HDFql.cursor_get_bigint())
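A different route for the GeoTiff question is to store the pixel values as an ordinary numeric dataset and keep the georeferencing as HDF5 attributes on that dataset. The sketch below uses only h5py and numpy with made-up stand-in values. With rasterio, the pixels would presumably come from the opened file's read() method and the metadata from its crs and transform properties, but that part is an assumption and is not shown. The helper write_raster and the attribute names crs_wkt and transform are hypothetical, and whether a given downstream function understands them is not something this sketch can guarantee.

```python
import os
import tempfile

import h5py
import numpy as np


def write_raster(hdf, name, pixels, crs_wkt, transform):
    """Store a raster as a dataset with its georeferencing as attributes.

    Hypothetical helper: the attribute names are a convention chosen here,
    not something h5py or HDF5 prescribes.
    """
    ds = hdf.create_dataset(name, data=pixels)
    ds.attrs["crs_wkt"] = crs_wkt
    ds.attrs["transform"] = np.asarray(transform, dtype="float64")
    return ds


# Made-up stand-ins for what would be read from a GeoTiff
pixels = np.arange(12, dtype="float32").reshape(1, 3, 4)  # (bands, rows, cols)
crs_wkt = "EPSG:4326"  # placeholder string, not real WKT
transform = (0.1, 0.0, 10.0, 0.0, -0.1, 50.0)  # six affine coefficients

h5_path = os.path.join(tempfile.mkdtemp(), "test.h5")
with h5py.File(h5_path, "w") as hdf:
    write_raster(hdf, "GeoTiff1", pixels, crs_wkt, transform)

# Read everything back to check the round trip
with h5py.File(h5_path, "r") as hdf:
    ds = hdf["GeoTiff1"]
    restored_pixels = ds[...]
    restored_transform = ds.attrs["transform"]
    restored_crs = ds.attrs["crs_wkt"]

print(restored_pixels.shape, restored_crs)
```

The TypeError in the question arises because the opened file object itself, rather than a numeric array, was passed as data. Passing an array avoids it.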
I have multiple text files, each containing several columns. I need to read each file into an array in Python, called RDF. So far I have read one file into one array as follows:
RDF_1 = numpy.loadtxt("filename_1.txt", skiprows=205, usecols=(1,), unpack=True)
How to create a loop in python such that it reads more than one file into their corresponding arrays like this:
for i in range(100):
    RDF_i = numpy.loadtxt("filename_"+str(i)+".txt", skiprows=205, usecols=(1,), unpack=True)
The proper way is to use a dictionary:
files_mapping = dict()
for i in range(100):
    files_mapping[f'RDF_{i}'] = numpy.loadtxt(f"filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)
But if for some reason you really need to create variables dynamically, you can use exec:
for i in range(100):
    exec(f'RDF_{i} = numpy.loadtxt("filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)')
Another possible way is to use locals (this is only reliable at module level, where locals() is the same dictionary as globals()):
for i in range(100):
    locals()[f'RDF_{i}'] = numpy.loadtxt(f"filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)
Avoid the last two options in real code, because they are a direct route to hard-to-find bugs.
I found a way to do it. I use two-dimensional arrays after importing the numpy library.
However, I had to zero the arrays before filling them with data, because Python had initially filled them with random values.
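The two-dimensional approach can be sketched as follows. The random-looking initial values are what one would expect from numpy.empty, which does not initialize memory (whether that is what the poster used is an assumption). numpy.zeros gives a defined starting state. The in-memory fake_files below are made-up stand-ins for the real text files. With real files, pass the file name to loadtxt instead.

```python
import io

import numpy as np

# Made-up stand-ins for the text files: two columns, three rows each
fake_files = [
    io.StringIO("0 1.0\n1 2.0\n2 3.0\n"),
    io.StringIO("0 4.0\n1 5.0\n2 6.0\n"),
]

n_files = len(fake_files)
n_rows = 3  # every file is assumed to have the same number of data rows

# One row per file; np.zeros initializes every element to 0.0
RDF = np.zeros((n_files, n_rows))

for i, f in enumerate(fake_files):
    # usecols=(1,) keeps only the second column, as in the question
    RDF[i, :] = np.loadtxt(f, usecols=(1,))

print(RDF)
```

RDF[i] then plays the role of the separate RDF_i variables from the question.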
This forum has been extremely helpful for a Python novice like me to improve my knowledge. I have generated a large amount of raw data in text format from my CFD simulation. My objective is to import these text files into Python and do some postprocessing on them. This is the code that I currently have.
import numpy as np
from matplotlib import pyplot as plt
import os
filename=np.array(['v1-0520.txt','v1-0878.txt','v1-1592.txt','v1-3020.txt','v1-5878.txt'])
for i in filename:
    format_name = i
    path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
    data = os.path.join(path, format_name)
    X,Y,U,V,T,Tr = np.loadtxt(data,usecols=(1,2,3,4,5,6),skiprows=1,unpack = True) # Here X and Y represent the X and Y coordinates; U,V,T,Tr represent the dependent variables
    plt.figure(1)
    plt.plot(T,Y)
plt.legend(['vt1a','vtb','vtc','vtd','vte','vtf'])
plt.grid(True)
Is there a better way to do this, like importing all the text files (~10000 files) at once into Python and then accessing whichever files I need for postprocessing (maybe by indexing)? All the text files will have the same number of columns and rows.
I am just a beginner in Python. I will be grateful if someone can help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back
then loading it into a numpy array and plotting it
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use content directly in loadtxt.
content = content.replace("nodenumber","#nodenumber") # Ignoring Node number column
data1=np.loadtxt(content.splitlines())
Y=data1[:,2]
temp=data1[:,5]
loadtxt accepts anything that feeds it line by line. content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple file needs. One way or another, you need to open and read each file, one by one. And it would be best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays. If Y and temp are identical in size for all files, they can be collected into larger dimensional array, e.g. YY[i,:] = Y for the ith file, where YY is preallocated. If they can vary in size, it is better to collect them in lists.
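Putting these pieces together, a minimal sketch could read each file inside a with block, pass the lines straight to loadtxt, collect the per-file columns in lists, and stack them at the end. The helper name load_columns, the sample file names, and the sample file contents are all made up for illustration, since no sample file was given.

```python
import os
import tempfile

import numpy as np


def load_columns(path, usecols=(2, 5)):
    """Read one export file and return the selected columns (hypothetical helper)."""
    with open(path) as f:  # the file is closed when the block ends
        content = f.read()
    # Turn the header line into a comment so loadtxt skips it
    content = content.replace("nodenumber", "#nodenumber")
    return np.loadtxt(content.splitlines(), usecols=usecols, unpack=True)


# Write two small made-up files to a temporary directory
tmpdir = tempfile.mkdtemp()
paths = []
for k in range(2):
    path = os.path.join(tmpdir, f"sample_{k}.txt")
    with open(path, "w") as f:
        f.write("nodenumber x y u v t\n")
        f.write(f"1 0.0 0.1 1.0 2.0 {300 + k}\n")
        f.write(f"2 0.0 0.2 1.5 2.5 {310 + k}\n")
    paths.append(path)

# Lists work whether or not every file has the same length
Y_list, temp_list = [], []
for path in paths:
    Y, temp = load_columns(path)
    Y_list.append(Y)
    temp_list.append(temp)

# Stacking into 2-D arrays requires equal lengths across files
YY = np.stack(Y_list)
TT = np.stack(temp_list)
print(YY.shape, TT.shape)
```

After stacking, YY[i] and TT[i] hold the data for the ith file, so any file can be selected by index.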
I have just recently started using numpy and was wondering some things.
I have a numpy array that looks like this after splitting it:
[array([1,2,3]),
array([4,5,6])]
I want to use numpy.savez to save the main array into the .npz archive with each subarray in its own .npy file.
I thought using this:
numpy.savez('dataFile', mainArray)
would work but it only creates the archive with a single .npy file called arr_0.npy.
Is there a way to do something like this? And if so, is there a way to make it work for any array with any number of subarrays? To get these arrays I am reading from a .bin file that could contain any number of elements, which would split into any number of arrays. This is why I'm having a hard time.
Is there a way to add files to an already created .npz file?
After doing more research I came upon the answer to my main question. I found out that you can use *args unpacking to pass each array in the list as a separate argument.
I changed the code to
numpy.savez('test', *[mainArray[x] for x in range(len(mainArray))])
This gave me the solution I was looking for. Thank you for your help.
If you want to save the subarrays in your main array, then you probably need to use save manually, i.e.
mainArray = [np.array([1,2,3]), np.array([4,5,6])]
for i in range(len(mainArray)):
    np.save('dataFile_%i' % i, mainArray[i])
Or you can use savez to save subarrays separately and load them later.
mainArray = [np.array([1,2,3]), np.array([4,5,6])]
np.savez('dataFile', mainArray[0], mainArray[1])
npzfile = np.load('dataFile.npz')
npzfile['arr_0']
npzfile['arr_1']
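For a list with any number of subarrays, one option is to build a dictionary of names and unpack it into savez as keyword arguments, so each subarray gets a predictable name instead of arr_0, arr_1, and so on. This is a minimal sketch. The sub_ prefix and the sample data are made up.

```python
import os
import tempfile

import numpy as np

# Stand-in for the list produced by splitting: three subarrays here,
# but the same code works for any number
mainArray = np.split(np.arange(1, 10), 3)

# One keyword per subarray: sub_0, sub_1, ...
named = {f"sub_{i}": sub for i, sub in enumerate(mainArray)}

outfile = os.path.join(tempfile.mkdtemp(), "dataFile.npz")
np.savez(outfile, **named)

with np.load(outfile) as npzfile:
    # Rebuild the list in its original order by index
    restored = [npzfile[f"sub_{i}"] for i in range(len(npzfile.files))]

print(len(restored))
```

Indexing by number when loading avoids relying on the order of npzfile.files.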
I need to use some matrices in Python programs, like
Q = np.matrix([[1,0,1,1,0],
[0,2,0,1,1],
[1,0,2,0,1],
[1,1,0,1,0],
[0,1,1,0,1]])
and I want to import the matrix (using numpy) from a file. How should I do that? What code should I write, and what kind of file should I use (.txt?)? I am quite new to Python; can anyone help me? Thank you in advance.
I'm assuming that you're not only importing the matrices, but also exporting them to files in the first place.
If that's true, there are multiple easy options, with different tradeoffs.
np.save saves the array in a binary format that's only usable by NumPy. But it's very fast, and generates reasonably small files.
np.save('matrix.npy', Q)
Q = np.load('matrix.npy')
np.savetxt saves the array in a text file, using a dialect of CSV (with whitespace separators, by default). It's slower, and generates bigger files, but if you want to be able to read or edit the files (or send them through an ASCII-only channel, like email without attachments), it's the best option.
np.savetxt('matrix.txt', Q)
Q = np.loadtxt('matrix.txt')
np.savetxt can also save the array in a compressed text file. This gives you small files, but they're slower to save and load. They're not directly human-readable, but it's very easy to un-gzip a file, and then you've got a text file you can read and edit. So, sometimes this is worth doing.
np.savetxt('matrix.txt.gz', Q)
Q = np.loadtxt('matrix.txt.gz')
Finally, you can just use standard Python saving and loading mechanisms, like pickle:
import pickle
with open('matrix.pickle', 'wb') as f:
    pickle.dump(Q, f)
with open('matrix.pickle', 'rb') as f:
    Q = pickle.load(f)
This is really only useful if you need to store NumPy arrays together with non-NumPy objects.
If you have to save multiple matrices, instead of saving one per file, you might want to look at savez and savez_compressed. Or, if you need multiple objects, only some of which are NumPy, pickle may be the best option.
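One detail worth knowing about the text option: loadtxt returns floating-point values by default, so an integer matrix comes back as floats unless a dtype is requested. The sketch below round-trips the matrix from the question as integers. It uses a plain np.array rather than np.matrix and writes to a temporary path, both of which are choices made here for illustration.

```python
import os
import tempfile

import numpy as np

Q = np.array([[1, 0, 1, 1, 0],
              [0, 2, 0, 1, 1],
              [1, 0, 2, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1]])

path = os.path.join(tempfile.mkdtemp(), "matrix.txt")

# fmt="%d" writes whole numbers instead of scientific notation
np.savetxt(path, Q, fmt="%d")

# dtype=int asks loadtxt for integers instead of the default floats
Q_loaded = np.loadtxt(path, dtype=int)

print(Q_loaded.dtype, Q_loaded.shape)
```

The resulting file has one matrix row per line with values separated by spaces, so it can also be written or edited by hand.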