I have multiple text files, each containing several columns. I need to read each file into an array in Python, called RDF. The point is that I used to read one file into one array as follows:
RDF_1 = numpy.loadtxt("filename_1.txt", skiprows=205, usecols=(1,), unpack=True)
How can I create a loop in Python so that it reads each file into its corresponding array, like this:
for i in range(100):
    RDF_i = numpy.loadtxt("filename_" + str(i) + ".txt", skiprows=205, usecols=(1,), unpack=True)
The proper way is to use a dictionary:
files_mapping = dict()
for i in range(100):
    files_mapping[f'RDF_{i}'] = numpy.loadtxt(f"filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)
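The arrays are then looked up by key rather than by a variable name, e.g.:
rdf_3 = files_mapping['RDF_3']  # the data that would have been stored in RDF_3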
But if for some unknown reason you really need to create variables dynamically, then you can use exec:
for i in range(100):
    exec(f'RDF_{i} = numpy.loadtxt("filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)')
Another possible way is to use locals():
for i in range(100):
    locals()[f'RDF_{i}'] = numpy.loadtxt(f"filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)
You should avoid the last two options in real code, because they are a direct way to spawn hard-to-find bugs.
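If you don't actually need the name RDF_i anywhere, a plain list keeps things even simpler; a minimal sketch, assuming the same filename pattern as above:
rdf_arrays = [numpy.loadtxt(f"filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)
              for i in range(100)]
print(rdf_arrays[0])  # the data from filename_0.txt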
I found a way to do it. I use two-dimensional arrays after importing the numpy library.
However, I had to zero the arrays before filling them with data, because otherwise they started out with arbitrary (uninitialized) values.
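A minimal sketch of that preallocation approach, assuming 100 files that all have the same number of data rows (n_rows is a made-up name for that count):
import numpy
n_rows = 500  # hypothetical: number of data rows in each file
RDF = numpy.zeros((100, n_rows))  # start from zeros instead of uninitialized memory
for i in range(100):
    RDF[i, :] = numpy.loadtxt(f"filename_{i}.txt", skiprows=205, usecols=(1,), unpack=True)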
I have a bunch of .csv files that are x by y in dimension. I know how to flatten and reshape matrices, but am having trouble automatically doing this for multiple files. Once I have flattened the matrices into one dimension, I also would like to stack them on top of each other, in one big matrix.
Is this even the proper way to do a for loop? I have not gotten to the part of stacking the linearized matrices on top of each other into one matrix yet. Would that involve the DataFrame.stack() function? When I run the code below it gives me an error.
import numpy as np
import pandas as pd

file_list = sorted(os.listdir('./E/'))  # created the list of files in a specific directory
del file_list[0]  # removed an item from the list that I did not want

for file in range(0, 26):
    pd.read_csv('./E/' + print(file_list), header=None)  # should read files
    A = set(Int.flatten())  # should collapse matrix to one dimension
    B = np.reshape(A, -1)  # should make it linear going across
Since I don't know what your files look like, I'm not sure this will work. But still, the code below includes some concepts that should be useful:
import os
import numpy as np
import pandas as pd

file_list = sorted(os.listdir('.\\E'))
del file_list[0]

# Eventually, all_files_array will contain len(file_list) elements, one per file.
all_files_array = []
for i in range(len(file_list)):
    file = file_list[i]
    # Depending on how you saved your files, you may need to add index_col=None as an argument to read_csv.
    this_file_arr = pd.read_csv('.\\E\\' + file, header=None)
    # Change the dtype to int if that's what you're working with.
    this_file_arr = this_file_arr.to_numpy(dtype=float, copy=False)
    this_file_arr = np.unique(this_file_arr.flatten())
    # In all my tests, the following line does absolutely nothing, but I guess it doesn't hurt.
    this_file_arr = np.reshape(this_file_arr, -1)
    all_files_array.append(this_file_arr)
all_files_array = np.array(all_files_array)
# Each element of all_files_array is the 1-D array of unique values from one file.
# If every file yields the same number of unique values, all_files_array is a 2-D array
# of shape (len(file_list), that length); otherwise the rows have different lengths and
# cannot form a regular 2-D array.
The main takeaways:
os.listdir() works with or without a trailing slash, and Python accepts forward slashes in paths even on Windows; I've switched to backslashes and dropped the trailing slash purely as a style choice, but os.path.join is the most portable way to build paths (and don't forget import os).
Using range(len(file_list)) instead of hard-coding the number of files to read is good practice in case you add more files to file_list later down the line, unless of course you don't want to read all the files in file_list.
A print call inside pd.read_csv is at best useless and at worst will throw an error: print returns None, so './E/' + print(file_list) raises a TypeError.
this_file_arr.flatten() is a NumPy method, so this_file_arr needs to be a NumPy array, hence the to_numpy() line.
Because np.reshape doesn't take sets, I used np.unique instead to avoid converting to a non-NumPy structure. If you want to use NumPy methods, keep your data in a NumPy array and don't convert it to a list or set or anything else.
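As for stacking the flattened files into one big matrix, you don't need DataFrame.stack(); a minimal sketch in plain NumPy, assuming every file yields the same number of unique values:
# One row per file: shape (number_of_files, values_per_file)
big_matrix = np.vstack(all_files_array)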
Let me know if you have any questions!
This forum has been extremely helpful for a Python novice like me. I have generated a large amount of raw data in text format from my CFD simulations. My objective is to import these text files into Python and do some post-processing on them. This is the code I currently have.
import numpy as np
from matplotlib import pyplot as plt
import os

filename = np.array(['v1-0520.txt', 'v1-0878.txt', 'v1-1592.txt', 'v1-3020.txt', 'v1-5878.txt'])
for i in filename:
    format_name = i
    path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
    data = os.path.join(path, format_name)
    # X and Y are the X and Y coordinates; U, V, T, Tr are the dependent variables
    X, Y, U, V, T, Tr = np.loadtxt(data, usecols=(1, 2, 3, 4, 5, 6), skiprows=1, unpack=True)
    plt.figure(1)
    plt.plot(T, Y)
    plt.legend(['vt1a', 'vtb', 'vtc', 'vtd', 'vte', 'vtf'])
    plt.grid(True)
Is there a better way to do this, like importing all the text files (~10,000 files) into Python at once and then accessing whichever files I need for post-processing (maybe by indexing)? All the text files have the same number of columns and rows.
I am just a beginner in Python. I would be grateful if someone could help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back
then loading it into a numpy array and plotting it
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use the content directly in loadtxt.
content = content.replace("nodenumber","#nodenumber") # Ignoring Node number column
data1=np.loadtxt(content.splitlines())
Y=data1[:,2]
temp=data1[:,5]
loadtxt accepts anything that feeds it line by line. content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't fully understand your multiple-file needs. One way or another, you need to open and read each file, one by one. And it is best to close each file before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays. If Y and temp are identical in size for all files, they can be collected into a higher-dimensional array, e.g. YY[i, :] = Y for the i-th file, where YY is preallocated. If they can vary in size, it is better to collect them in lists.
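A minimal sketch of that idea, assuming (hypothetically) that every file yields Y and temp arrays of the same length n_points and that the file names sit in a list called file_names:
import numpy as np

file_names = ['v1-0520.txt', 'v1-0878.txt']  # hypothetical list of files
n_points = 100                               # hypothetical number of rows per file

YY = np.zeros((len(file_names), n_points))
TT = np.zeros((len(file_names), n_points))
for i, name in enumerate(file_names):
    with open(name) as f:                    # file is closed automatically
        Y, temp = np.loadtxt(f, usecols=(2, 5), unpack=True)
    YY[i, :] = Y
    TT[i, :] = temp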
I have just recently started using numpy and was wondering some things.
I have a numpy array that looks like this after splitting it:
[array([1,2,3]),
array([4,5,6])]
I want to use numpy.savez to save the main array into the .npz archive with each subarray in its own .npy file.
I thought using this:
numpy.savez('dataFile', mainArray)
would work but it only creates the archive with a single .npy file called arr_0.npy.
Is there a way to do something like this? And if so, is there a way to make it work for any array with any number of subarrays? To get these arrays I am reading from a .bin file that could contain any number of elements, which would then be split into any number of arrays. This is why I'm having a hard time.
Is there a way to add files to an already created .npz file?
After doing more research I came upon the answer to my main question. I found out that you can use the * unpacking operator to pass the list of arrays as separate arguments.
I changed the code to
numpy.savez('test', *[mainArray[x] for x in range(len(mainArray))])
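For what it's worth, the list comprehension isn't strictly necessary; unpacking the list directly is equivalent:
numpy.savez('test', *mainArray)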
This gave me the solution I was looking for. Thank you for your help.
If you want to save the subarrays in your main array, then you probably need to use save manually, i.e.
mainArray = [np.array([1,2,3]), np.array([4,5,6])]
for i in range(len(mainArray)):
    np.save('dataFile_%i' % i, mainArray[i])
Or you can use savez to save subarrays separately and load them later.
mainArray = [np.array([1,2,3]), np.array([4,5,6])]
np.savez('dataFile', mainArray[0], mainArray[1])
npzfile = np.load('dataFile.npz')
npzfile['arr_0']
npzfile['arr_1']
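You can also give the subarrays meaningful names instead of the default arr_0, arr_1 by passing them as keyword arguments; a small sketch (the names first and second are just examples):
np.savez('dataFile', first=mainArray[0], second=mainArray[1])
npzfile = np.load('dataFile.npz')
npzfile['first']
npzfile['second']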
I need to save multiple numpy arrays, along with the user input that was used to compute them, in a single file. I'm having a hard time finding a good procedure, or even a good file type, for this. The only thing I can think of is to put the computed arrays along with the user input into one single array and then save it using numpy.save. Does anybody know any better alternatives or good file types for my use case?
You could try using Pickle to serialize your arrays.
How about using pickle and then storing the pickled array objects in a storage of your choice, like a database or files?
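A minimal sketch of that idea, pickling a dict that bundles the arrays together with the user input (the key names here are just examples):
import pickle
import numpy as np

results = {
    'user_input': {'n_steps': 100, 'dt': 0.01},  # hypothetical input parameters
    'array_a': np.arange(10),
    'array_b': np.linspace(0.0, 1.0, 10),
}

with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)

with open('results.pkl', 'rb') as f:
    loaded = pickle.load(f)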
I had this problem long ago, so I don't have the code handy to show you, but I used a binary write to a temporary file to get that done.
EDIT: That's it, pickle is what I used. Thanks SpankMe and RoboInventor.
NumPy provides functions to save arrays to files, e.g. savez():
import numpy as np

outfile = '/tmp/data.npz'
x = np.arange(10)
y = np.sin(x)
np.savez(outfile, x=x, y=y)

npzfile = np.load(outfile)
print(npzfile['x'])
print(npzfile['y'])
I have a file test.txt which has an array:
array = [3,5,6,7,9,6,4,3,2,1,3,4,5,6,7,8,5,3,3,44,5,6,6,7]
Now what I want to do is get the contents of the array and perform some calculations with it. The problem is that when I do open("test.txt") it gives me the content as a string. The array is actually very big, and if I loop over it, it might not be efficient. Is there any way to get the content without splitting on ,? Any new ideas?
I recommend that you save the file as JSON instead and read it in with the json module. Either that, or make it a .py file and import it as Python. A .txt file that looks like a Python assignment is kind of odd.
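A minimal sketch of the JSON route (the file name test.json is just an example):
import json

# write the list once
with open('test.json', 'w') as f:
    json.dump([3, 5, 6, 7, 9, 6, 4, 3, 2, 1], f)

# read it back as a real Python list
with open('test.json') as f:
    array = json.load(f)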
Does your text file need to look like python syntax? A list of comma separated values would be the usual way to provide data:
1,2,3,4,5
Then you could read/write it with the csv module or the numpy functions mentioned above. There's a lot of documentation about how to read csv data efficiently. Once you have your csv reader object set up, the data could be stored with something like:
data = [list(map(float, row)) for row in csvreader]
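For completeness, a small sketch of setting up that reader, assuming a file data.csv of comma-separated numbers:
import csv

with open('data.csv', newline='') as f:
    csvreader = csv.reader(f)
    data = [list(map(float, row)) for row in csvreader]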
If you want to store a python-like expression in a file, store only the expression (i.e. without array =) and parse it using ast.literal_eval().
However, consider using a different format such as JSON. Depending on the calculations you might also want to consider using a format where you do not need to load all data into memory at once.
Must the array be saved as a string? Could you use a pickle file and save it as a Python list?
If not, could you try lazy evaluation? Maybe only process sections of the array as needed.
Possibly, if there are calculations on the entire array that you must always do, it might be a good idea to pre-compute those results and store them in the txt file either in addition to the list or instead of the list.
You could also use numpy to load the data from the file using numpy.genfromtxt or numpy.loadtxt. Both are pretty fast, and both can recast the data on load. If the array is already loaded, you can use numpy to convert it to an array of floats, and that is really fast.
import numpy as np
a = np.array(["1", "2", "3", "4"])
a = a.astype(float)
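And a small sketch of loading directly from the file with numpy, assuming the file held only the comma-separated numbers (no array = prefix):
import numpy as np
a = np.genfromtxt('test.txt', delimiter=',')  # 1-D float array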
You could write a parser. They are very straightforward, and much, much faster than regular expressions (please don't use those for this, not that anyone suggested it).
# open up the file (r = read-only, text mode)
stream = open("file_full_of_numbers.txt", "r")
prefix = ''  # end of the last chunk
full_number_list = []
# get a chunk of the file at a time
while True:
    # just a small 1k chunk
    buffer = stream.read(1024)
    # no more data is left in the file
    if buffer == '':
        break
    # delimit this chunk of data by a comma
    split_result = buffer.split(",")
    # prepend the end of the last chunk to the first number
    split_result[0] = prefix + split_result[0]
    # save the end of the buffer (a partial number perhaps) for the next loop
    prefix = split_result[-1]
    # only work with full results, so skip the last one
    numbers = split_result[0:-1]
    # do something with the numbers we got (like save them into the full list)
    full_number_list += numbers
stream.close()
# now full_number_list contains all the numbers in text format
You'll also have to add some logic to handle the leftover prefix once the buffer comes back empty, since it still holds the last number. But I'll leave that code up to you.
OK, so the following methods ARE dangerous. Since they can be used to attack systems by injecting code into them, use them at your own risk.
array = eval(open("test.txt", 'r').read().strip('array = '))
exec(open('test.txt').read())  # this is the fastest but most dangerous (execfile() is the Python 2 spelling)
Safer methods.
import ast
array = ast.literal_eval(open("test.txt", 'r').read().strip('array = '))
...
array = [float(value) for value in open('test.txt', 'r').read().strip('array = [').strip('\n]').split(',')]
The easiest way to serialize Python objects so you can load them later is to use pickle. This assumes you don't want a human-readable format, since that adds major overhead; otherwise, csv is fast and json is flexible.
import pickle
import random
array = random.sample(range(10**3), 20)
pickle.dump(array, open('test.obj', 'wb'))
loaded_array = pickle.load(open('test.obj', 'rb'))
assert array == loaded_array
pickle does have some overhead, and if you need to serialize large objects you can specify the pickle protocol; the default is an older, more compatible protocol, and you can pass pickle.HIGHEST_PROTOCOL for the most efficient encoding: pickle.dump(array, open('test.obj', 'wb'), pickle.HIGHEST_PROTOCOL)
If you are working with large numerical or scientific data sets, then use numpy.ndarray.tofile/numpy.fromfile or scipy.io.savemat/scipy.io.loadmat; they have little overhead, but again only if you are already using numpy/scipy.
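A minimal sketch of the scipy.io route (the key names are just examples):
import numpy as np
from scipy.io import savemat, loadmat

savemat('results.mat', {'array': np.arange(10), 'user_input': [1.0, 2.0]})
data = loadmat('results.mat')
print(data['array'])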
Good luck.