Access data in several MATLAB files - Python

I am currently trying to get data from several MATLAB files. I am reading from a directory where all my data is saved. I can read the data (e.g. Gain) within a single MATLAB file. I would like to add a for loop to read the same "Gain" from all the files (different data but the same format), but every time I try to write one, I get this error: TypeError: list indices must be integers or slices, not dict
import numpy as np
import sys
import scipy.io as sio
import scipy
import matplotlib.pyplot as plt
import tkinter as tk
from tkinter import *
from tkinter import filedialog
import os
#directory where all data will be stored
dataDir="C:/Users/me/Desktop/Data/"
Files=[] #list of files
lenght=len(Files)
for file in os.listdir(dataDir):
    Files.append(scipy.io.loadmat(dataDir+file))
#initialize arrays
results=[lenght]
Gain=[lenght]
for files in Files:
    results[files]=Files[files]['results']
    #Gain is inside the results key in the file. I can read it in only one file
    Gain[files]=results[files]['PowerDomain'][0,0]['Gain'][0,0]
    print (files)

There are two separate issues with the code that are causing this error:
(1) Your code is initializing results and Gain to length-1 lists. Instead (based on what it appears you're trying to do), you should be initializing them to length-lenght (sic*) lists, i.e. results = [None] * lenght and Gain = [None] * lenght.
(2) Files is an array of dicts. In each loop iteration, files (sic**) becomes one of those dicts. If you instead want files to be an index into the Files list, then your loop should be for files in range(0, lenght):.
Alternatively, if you want results and Gain to be dicts, you can initialize them as results = {} and similarly for Gain, and then in each iteration set their values like results[<filename>] = ... (where <filename> is the current file name).
Also, if you want to keep the loop of the form for files in Files:, that's fine, but keep in mind that files will actually be an element of Files rather than an index into Files.
* From a style perspective, you should correct the spelling of lenght to length.
** Also from a style perspective, you should probably not name the list index files; perhaps something like f. But this is more subjective.
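Putting those fixes together, here is a minimal sketch of the loop (assuming every .mat file shares the nested 'results' structure the asker reads from a single file):

import os
import scipy.io as sio

dataDir = "C:/Users/me/Desktop/Data/"

# Load every file in the directory into a list of dicts.
Files = [sio.loadmat(os.path.join(dataDir, f)) for f in os.listdir(dataDir)]
length = len(Files)  # note: computed *after* the list is filled

# Pre-size the output lists so they can be indexed.
results = [None] * length
Gain = [None] * length

for f in range(length):
    results[f] = Files[f]['results']
    # Same nested indexing the asker used on a single file; assumes every
    # .mat file shares that structure.
    Gain[f] = results[f]['PowerDomain'][0, 0]['Gain'][0, 0]
    print(f)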

Related

How to automatically flatten and combine .csv files into one matrix in Python?

I have a bunch of .csv files in Python, each x by y in dimension. I know how to flatten and reshape matrices, but am having trouble doing this automatically for multiple files. Once I have flattened the matrices into one dimension, I would also like to stack them on top of each other, in one big matrix.
Is this even the proper way to do a for loop? I have not gotten to the part of stacking the linearized matrices on top of each other into one matrix yet. Would that involve the DataFrame.stack() function? When I run the code below it gives me an error.
import numpy as np
import pandas as pd
file_list = sorted(os.listdir('./E/')) #created the list of files in a specific directory
del file_list[0] #removed an item from the list that I did not want
for file in range(0,26):
    pd.read_csv('./E/' + print(file_list), header=None) #should read files
    A = set(Int.flatten()) #should collapse matrix to one dimension
    B = np.reshape(A, -1) #should make it linear going across
Since I don't know what your files look like, I'm not sure this will work. But still, the below code includes some concepts that should be useful:
import os
import numpy as np
import pandas as pd
file_list = sorted(os.listdir('.\\E'))
del file_list[0]
# Eventually, all_files_array will contain len(file_list) elements, each of which is a file.
all_files_array = []
for i in range(len(file_list)):
    file = file_list[i]
    # Depending on how you saved your file, you may need to pass index_col=0
    # to read_csv so a saved index column isn't read as data.
    this_file_arr = pd.read_csv('.\\E\\' + file, header=None)
    # Change the dtype to int if that's what you're working with.
    this_file_arr = this_file_arr.to_numpy(dtype=float, copy=False)
    this_file_arr = np.unique(this_file_arr.flatten())
    # In all my tests, the following line does absolutely nothing, but I guess it doesn't hurt.
    this_file_arr = np.reshape(this_file_arr, -1)
    all_files_array.append(this_file_arr)
all_files_array = np.array(all_files_array)
# all_files_array now holds one 1-D array of unique values per file. Because of
# np.unique, the per-file lengths can differ, so this may be an object array
# rather than a regular (len(file_list), x, y) stack.
The main takeaways are that:
I've written the paths with escaped backslashes ('\\'); note that forward slashes also work in Python on Windows, so this is a style choice rather than a requirement of os.listdir().
Using range instead of hard-coding the number of files to read is good practice in case you add more files to file_list later down the line, unless of course you don't want to read all the files in file_list.
A print statement inside pd.read_csv is at best useless, and at worst will throw an error.
this_file_arr.flatten() is a NumPy method, so this_file_arr needs to be a NumPy array, hence the to_numpy() line.
Because np.reshape doesn't take sets, I used np.unique instead to avoid converting to a non-NumPy structure. If you want to use NumPy methods, keep your data in a NumPy array and don't convert it to a list or set or anything else.
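On the stacking part of the question: DataFrame.stack() pivots index levels and is not the tool for combining files; np.vstack (or np.concatenate) is. If you skip the set/unique step and keep every flattened value, each file flattens to the same length (x * y), and the 1-D arrays can be stacked into one big matrix. A sketch, reusing the all_files_array list from the loop above:

import numpy as np

# Assumes each element of all_files_array is a flattened file of equal length
# (i.e. the np.unique step above was skipped).
big_matrix = np.vstack(all_files_array)
# big_matrix has shape (len(file_list), x * y) when each file is x by y.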
Let me know if you have any questions!

python-xarray: open_mfdataset in which one .nc file has no populated time axis

I am attempting to open a series of netCDF4 datasets using something like:
import xarray as xr
import os
import fnmatch

def new_arr(string):
    return xr.open_mfdataset([f for f in os.listdir(os.getcwd()) if fnmatch.fnmatch(f, string)])
The fnmatch filter provides me with a list of files matching what I want to load into the dataset. This works as long as every matching file in the directory has at least one entry in the (unlimited) time axis. However, if one file does not, xarray just throws me an error.
Is there a concise pythonic way to have xarray pass over matching files that have time axes of length zero and still give me the dataset or am I going to be forced to run a bash script to get rid of the exceptions beforehand?
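No answer is recorded here, but one plausible approach (a sketch, not from the thread; it assumes the unlimited dimension is literally named 'time') is to pre-filter the matches before handing them to open_mfdataset:

import os
import fnmatch
import xarray as xr

def new_arr(string):
    matches = [f for f in os.listdir(os.getcwd()) if fnmatch.fnmatch(f, string)]
    keep = []
    for f in matches:
        # Peek at each file and drop the ones whose time axis is empty.
        with xr.open_dataset(f) as ds:
            if ds.sizes.get('time', 0) > 0:
                keep.append(f)
    return xr.open_mfdataset(keep)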

How to put many numpy files into one big numpy file without getting a memory error?

I followed this question, Append multiple numpy files to one big numpy file in python, in order to put many numpy files into one big file. The result is:
import matplotlib.pyplot as plt
import numpy as np
import glob
import os, sys
fpath ="path_Of_my_final_Big_File"
npyfilespath ="path_of_my_numpy_files"
os.chdir(npyfilespath)
npfiles= glob.glob("*.npy")
npfiles.sort()
all_arrays = np.zeros((166601,8000))
for i,npfile in enumerate(npfiles):
    all_arrays[i]=np.load(os.path.join(npyfilespath, npfile))
np.save(fpath, all_arrays)
data = np.load(fpath)
print(data)
print(data.shape)
I have thousands of files, and with this code I always get a memory error, so I never get my result file.
How do I resolve this error?
How can I read, write, and append to the final numpy file one file at a time?
Have a look at np.memmap. You can instantiate all_arrays as:
all_arrays = np.memmap("all_arrays.dat", dtype='float64', mode='w+', shape=(166601,8000))
From the documentation:
Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory.
You will be able to access the whole array, but the operating system will take care of loading only the parts that you actually need. Read the documentation page carefully, and note that from a performance point of view you can decide whether the file should be stored column-wise or row-wise.
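For completeness, a sketch of the original loop with the memmap swapped in for the in-memory array (keeping the asker's placeholder paths and hard-coded shape):

import os
import glob
import numpy as np

fpath = "path_Of_my_final_Big_File"      # asker's placeholder
npyfilespath = "path_of_my_numpy_files"  # asker's placeholder
os.chdir(npyfilespath)
npfiles = sorted(glob.glob("*.npy"))

# Disk-backed array: the OS pages rows in and out as they are touched.
all_arrays = np.memmap(fpath, dtype='float64', mode='w+',
                       shape=(166601, 8000))
for i, npfile in enumerate(npfiles):
    all_arrays[i] = np.load(npfile)
all_arrays.flush()  # push any cached writes out to the file on disk

# To read it back later without loading everything into memory:
data = np.memmap(fpath, dtype='float64', mode='r', shape=(166601, 8000))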

Import Multiple Text files (Large Number) using numpy and Post Processing

This forum has been extremely helpful for a Python novice like me to improve my knowledge. I have generated a large number of raw data files in text format from my CFD simulation. My objective is to import these text files into Python and do some post-processing on them. This is the code I currently have:
import numpy as np
from matplotlib import pyplot as plt
import os
filename=np.array(['v1-0520.txt','v1-0878.txt','v1-1592.txt','v1-3020.txt','v1-5878.txt'])
for i in filename:
    format_name = i
    path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
    data = os.path.join(path, format_name)
    X,Y,U,V,T,Tr = np.loadtxt(data, usecols=(1,2,3,4,5,6), skiprows=1, unpack=True) # Here X and Y represent the X and Y coordinates; U,V,T,Tr represent the dependent variables
    plt.figure(1)
    plt.plot(T, Y)
plt.legend(['vt1a','vtb','vtc','vtd','vte','vtf'])
plt.grid(b=True)
Is there a better way to do this, like importing all the text files (~10000 files) at once into Python and then accessing whichever files I need for post-processing (maybe by indexing)? All the text files will have the same number of columns and rows.
I am just a beginner at Python. I will be grateful if someone can help me or point me in the right direction.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back
then loading it into a numpy array and plotting it
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use content directly in loadtxt.
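# content is assumed to be the whole file read in as one string (the read
# itself isn't shown in the post), e.g.: content = open(data).read()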
content = content.replace("nodenumber","#nodenumber") # prefix the header line with '#' so loadtxt skips it
data1 = np.loadtxt(content.splitlines())
Y = data1[:,2]
temp = data1[:,5]
loadtxt accepts anything that feeds it line by line; content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple-file needs. One way or another, you need to open and read each file, one by one, and it is best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays, as in the sketch below. If Y and temp are identical in size for all files, they can be collected into a higher-dimensional array, e.g. YY[i,:] = Y for the ith file, where YY is preallocated. If they can vary in size, it is better to collect them in lists.
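A sketch of that collection idea (the two-file list and nrows are placeholders; it assumes every file has the same number of data rows, and uses the column layout from the question):

import numpy as np

filenames = ['v1-0520.txt', 'v1-0878.txt']  # placeholder subset of the ~10000 files
nrows = 100  # assumed: identical number of data rows in every file

# Preallocate one row per file for each variable of interest.
YY = np.zeros((len(filenames), nrows))
TT = np.zeros((len(filenames), nrows))

for i, name in enumerate(filenames):
    # np.loadtxt opens and closes the file itself, so no explicit with-open needed
    Y, T = np.loadtxt(name, usecols=(2, 5), skiprows=1, unpack=True)
    YY[i, :] = Y
    TT[i, :] = T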

while loops for processing FITS files (Python)

I am new at programming in Python and am trying to create a setup for processing thousands of files with one piece of code. I created a practice folder to do this in. In it are two FITS files (FITS1.fits and FITS2.fits). I did the following to put them both in a .txt file:
ls > practice.txt
Here is what I did next:
$ python
import numpy
import pyfits
import matplotlib.pyplot as plt
from matplotlib import pylab
from pylab import *
import asciidata
a = asciidata.open('practice.txt')
print(a[0][0]) #To test to see if practice.txt really contains my FITS files (FITS1.fits)
i = 0
while i <= 1: #Now I attempt a while loop to read data from columns in the FITS files, plot the numbers desired, and save and show the figures. I chose i <= 1 because there are only two FITS files in the text (also because of zero-indexing).
    b = pyfits.getdata(a[0][i]) # "i" will be the index used to use a different file when the while loop gets to the end
    time = b['TIME'] #'TIME' is a column in the FITS file
    brightness = b['SAP_FLUX']
    plt.plot(time, brightness)
    xlabel('Time(days)')
    ylabel('Brightness (e-/s)')
    title(a[0][i])
    pylab.savefig('a[0][i].png') #Here I am lost on how to get the while loop to name the saved figure something different every time. It takes 'a[0][i].png' as a literal string, not as the index I am trying to use.
    pylab.show()
    i = i + 1 # I placed this here, hoping that when the while loop gets to this point, it would start over again with a different "i" value
After pressing enter twice, I see the first figure as expected. Then I will close it and see the second. However, only the first figure is saved. Does anyone have any suggestions on how I can change my loop to do what I need it to?
In your code, the i inside the string is being treated as the letter i, not the variable. If you wanted to keep this naming you could do something like:
FileName = 'a[0][%s].png' % i
pylab.savefig(FileName)
You should use glob to automatically get the FITS files as a list; from there, using a for loop will let you iterate over the names of the files directly instead of using an index. When you call plt.savefig, you need to construct the file name you want to save it as. Here is the code cleaned up and put together:
from glob import glob
import pyfits
from matplotlib import pyplot as plt
files = glob('*.fits')
for file_name in files:
    data = pyfits.getdata(file_name)
    name = file_name[:-len('.fits')] # Remove .fits from the file name
    time = data['TIME']
    brightness = data['SAP_FLUX']
    plt.plot(time, brightness)
    plt.xlabel('Time(days)')
    plt.ylabel('Brightness (e-/s)')
    plt.title(name)
    plt.savefig(name + '.png')
    plt.show()
