Looping over files and plotting (Python)

My data looks like the picture below. All of my data files are in .txt format, and my aim is to loop over the files and plot them. The first row holds the variable names
(WL, ABS, T%), so I first need to delete it before proceeding.
with open('Desktop/100-3.txt', 'r') as f:
    data = f.read().splitlines(True)
with open('Desktop/100-3.txt', 'w') as f:
    f.writelines(data[1:])
This step is probably not necessary, but I am very new to NumPy. Basically, the algorithm will be as follows:
Read all the .txt files
Plot T% versus WL, plot ABS versus WL, save. (WL -> x variable)
Continue for the next file, .. (two graphs for every .txt file)
Then finish the loop, exit.
data looks like this
What I've tried
from numpy import loadtxt
import os
dizin = os.listdir(os.getcwd())
for i in dizin:
    if i.endswith('.txt'):
        data = loadtxt("??",float)

For data files like this I would prefer np.genfromtxt over np.loadtxt; it has many useful options that you can look up in the docs. The glob module is also handy for iterating over the files in a directory, using wildcards as filters:
from glob import glob
import numpy as np
import matplotlib.pyplot as plt
# loop over all files in the current directory ending with .txt
for fname in glob("./*.txt"):
    # read file, skip header (1 line) and unpack into 3 variables
    WL, ABS, T = np.genfromtxt(fname, skip_header=1, unpack=True)
    # first plot: T% versus WL
    plt.plot(WL, T)
    plt.xlabel('WL')
    plt.ylabel('T%')
    plt.show()
    plt.clf()
    # second plot: ABS versus WL
    plt.plot(WL, ABS)
    plt.xlabel('WL')
    plt.ylabel('ABS')
    plt.show()
    plt.clf()
The next step would be to do some research on matplotlib to make the plots look better.
Please let me know if the code does not work, and I'll try to fix it.
EDIT: Added plt.clf() to clear the figure before creating a new one.
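Since the question also asks to save each graph, the same loop could write image files instead of opening windows. The following is only a sketch under assumptions that are not in the original post: the helper names plot_spectrum_file and plot_all, the output naming scheme (name_T.png and name_ABS.png), and the premise that every file has one header line followed by exactly three numeric columns.

```python
import os
from glob import glob

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write files without opening windows
import matplotlib.pyplot as plt
import numpy as np


def plot_spectrum_file(fname, out_dir="."):
    """Save T%-vs-WL and ABS-vs-WL plots for one file; return the two PNG paths."""
    # Assumed layout: one header line, then three numeric columns WL, ABS, T%
    WL, ABS, T = np.genfromtxt(fname, skip_header=1, unpack=True)
    base = os.path.splitext(os.path.basename(fname))[0]
    saved = []
    for values, label, suffix in ((T, "T%", "T"), (ABS, "ABS", "ABS")):
        fig, ax = plt.subplots()
        ax.plot(WL, values)
        ax.set_xlabel("WL")
        ax.set_ylabel(label)
        out_path = os.path.join(out_dir, f"{base}_{suffix}.png")
        fig.savefig(out_path)
        plt.close(fig)  # free the figure before moving on to the next one
        saved.append(out_path)
    return saved


def plot_all(pattern="./*.txt", out_dir="."):
    """Process every file matching the glob pattern (hypothetical convenience wrapper)."""
    return [plot_spectrum_file(f, out_dir) for f in sorted(glob(pattern))]
```

Calling plot_all() in the directory that holds the .txt files would produce two PNG files per input file. Closing each figure explicitly keeps memory use flat when there are many files.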

Related

How to sum up yes and no into a total quantity, using matplotlib, pandas, python from a CSV import to plot a graph

QUESTION #1) I am new to Python and coding in general. I want to take my data from a CSV file that has a column labeled "U.S. OSHA Recordable?". In that column every answer is either "yes" or "no". I want to display a bar plot (plot.bar) that shows 23 "yes" answers and 7 "no" answers: essentially adding up the totals of "yes" and "no" in the column, then displaying them in one clean bar graph with two bars and the total number on top of each bar. The problem is that the bar graph currently draws one entry on the x-axis for every row, labeled "no, yes, no, yes, yes, no" and so on, about 27 individual times. I want users to easily see one bar graph showing only two bars with the totals on top, like this image.
This is my code. I am not sure what I would need to sum up the "yes" and "no" values in the column.
import pandas as pd # data analysis library with built-in plotting helpers
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # standard-library CSV reader and writer (not needed when using pd.read_csv)
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top of the file; encoding tells pandas how to decode this CSV
data.head() # returns the first five rows, useful for checking that the header was read correctly
data.plot.bar(x='U.S. OSHA Recordable?') # draws one group of bars per row, which is why the labels repeat
plt.show() # shows the plot to the user
df['Val'].value_counts().plot(kind='bar')
Here Val is the name of the column that contains 'Yes' & 'No'
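To also get the totals written on top of the two bars, the counts can be kept in a variable and annotated after plotting. This is a minimal sketch: the small DataFrame is made-up stand-in data, 'Val' is the placeholder column name from the line above, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the example runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the CSV column from the question
df = pd.DataFrame({"Val": ["yes", "no", "yes", "yes", "no", "yes"]})

counts = df["Val"].value_counts()  # one entry per distinct answer, largest count first
ax = counts.plot(kind="bar")
ax.set_ylabel("count")

# Write each total just above its bar
for position, total in enumerate(counts):
    ax.text(position, total, str(total), ha="center", va="bottom")

ax.figure.savefig("yes_no_counts.png")
```

With the real data, df would come from pd.read_csv and 'Val' would be replaced by 'U.S. OSHA Recordable?'.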
import pandas as pd # data analysis library with built-in plotting helpers
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # standard-library CSV reader and writer (not needed when using pd.read_csv)
import seaborn as sns # it counts everything for you and outputs it exactly like I want
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) # skiprows skips the comment lines at the top of the file; encoding tells pandas how to decode this CSV
sns.set(style="whitegrid")
ax = sns.countplot(x='U.S. OSHA Recordable?', data=data) # one bar per distinct value, with its count as the height
plt.show() # shows the plot to the user
Interestingly enough, I found out about seaborn, installed it with pip, and gave it a shot. It is supposed to pull data from a URL, but after viewing a few other pages on Stack Overflow I found a great suggestion. Anyway, this works great and does everything for me. I am so happy with this solution. Now on to the next problem, lol. I hope this helps someone else in the future.
My graph looks exactly like the one posted by SH-SF btw. Works great

Using multiple FITS file

How can I open two FITS files at the same time with astropy? Is it possible to work on multiple FITS files at the same time, or do I have to work on one at a time?
You can open as many FITS files as you like. Each is represented by a HDUList object.
from astropy.io import fits
hdu_list1 = fits.open('file1.fits')
hdu_list2 = fits.open('file2.fits')
Then I'd suggest to call this to see what the FITS files contain:
hdu_list1.info()
hdu_list2.info()
You can then access any header and data information in those FITS files and do what you want. It goes something like this:
array1 = hdu_list1[0].data
array2 = hdu_list2[0].data
ratio = array1 / array2
If you want to make a plot:
import matplotlib.pyplot as plt
plt.imshow(ratio)
The Astropy docs are very good. For example, you could start by reading the documentation for astropy.io.fits.

Import Multiple Text files (Large Number) using numpy and Post Processing

This forum has been extremely helpful for a Python novice like me to improve my knowledge. I have generated a large amount of raw data in text format from my CFD simulation. My objective is to import these text files into Python and do some post-processing on them. This is the code I currently have.
import numpy as np
from matplotlib import pyplot as plt
import os
filename=np.array(['v1-0520.txt','v1-0878.txt','v1-1592.txt','v1-3020.txt','v1-5878.txt'])
for i in filename:
    format_name= i
    path='E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
    data= os.path.join(path,format_name)
    X,Y,U,V,T,Tr = np.loadtxt(data,usecols=(1,2,3,4,5,6),skiprows=1,unpack = True) # Here X and Y represent the X and Y coordinates; U, V, T, Tr represent the dependent variables
    plt.figure(1)
    plt.plot(T,Y)
plt.legend(['vt1a','vtb','vtc','vtd','vte','vtf'])
plt.grid(True)
Is there a better way to do this, such as importing all the text files (~10000 files) into Python at once and then accessing whichever files I need for post-processing (maybe by indexing)? All the text files will have the same number of columns and rows.
I am just a beginner in Python. I will be grateful if someone can help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back
then loading it into a numpy array and plotting it
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use content directly in loadtxt.
content = content.replace("nodenumber","#nodenumber") # Ignoring Node number column
data1=np.loadtxt(content.splitlines())
Y=data1[:,2]
temp=data1[:,5]
loadtxt accepts anything that feeds it line by line. content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple-file needs. One way or another, you need to open and read each file, one by one. And it would be best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays. If Y and temp are identical in size for all files, they can be collected into larger dimensional array, e.g. YY[i,:] = Y for the ith file, where YY is preallocated. If they can vary in size, it is better to collect them in lists.
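The preallocation idea could look roughly like this. It is a sketch, not a tested solution for the real data: the function name load_stacked is hypothetical, the column indices (2, 5) are taken from the snippet above, and every file is assumed to have one header line and the same number of rows.

```python
import numpy as np


def load_stacked(paths, usecols=(2, 5), skiprows=1):
    """Load the same columns from every file into one (n_files, n_cols, n_rows) array."""
    def load(path):
        # ndmin=2 keeps the result two-dimensional even for a single data row
        return np.loadtxt(path, usecols=usecols, skiprows=skiprows,
                          unpack=True, ndmin=2)

    first = load(paths[0])
    stacked = np.empty((len(paths),) + first.shape)  # preallocate for all files
    stacked[0] = first
    for i, path in enumerate(paths[1:], start=1):
        stacked[i] = load(path)  # every file is assumed to have the same shape
    return stacked
```

With this layout, stacked[i, 0] holds Y and stacked[i, 1] holds temp for the i-th file, so individual files can be picked out by index after a single loading pass.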

Tab separated data python

I must start by saying that I am very new to Python and still very bad at it, but I believe it will be worth learning eventually.
My problem is that I have a device that prints out its values in a .txt file, separated by tabs instead of commas. Ex: 50\t50\t66\t0\t4...
What I want is just to plot a simple histogram of that data.
I realise that it should be the simplest thing, but somehow I am having trouble finding a solution in my beginner Python lectures, and I cannot word the problem well enough to get a hit from an online search.
import matplotlib.pyplot as plt
#import numpy as np
d = open('.txt', 'r')
d.read()
plt.hist(d)
plt.show()
PS: numpy is just a leftover from one of my previous exercises
No worries, everyone must start somewhere. You are on the right track, and you are correct that Python is a great language to learn. There are many ways this can be accomplished, but here is one. The way this example is written, it will generate one histogram per line in the file. You can change that behavior if needed.
Please note that the csv module will take care of converting the data in the file to floats when quoting=csv.QUOTE_NONNUMERIC is passed to the reader constructor. This is probably the preferred method of handling number conversion in a CSV/TSV file.
import csv
import matplotlib.pyplot as plt
data_file = open('testme.txt')
tsv_reader = csv.reader(data_file, delimiter='\t',
                        quoting=csv.QUOTE_NONNUMERIC)
for row in tsv_reader:
    plt.hist(row)
    plt.show()
I've left out some things, such as proper exception handling and using a context manager to open the file, which is best practice and is demonstrated in the csv module documentation.
Once you learn more about the language, I'd suggest digging into those subjects further.
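As a starting point for that, here is one way the context-manager version might look, collecting every value into a single list so that one histogram covers the whole file. The function name read_tsv_values is hypothetical, and the sketch assumes the file holds only numbers separated by tabs, with no trailing tabs or empty fields.

```python
import csv


def read_tsv_values(path):
    """Return every number in a tab-separated file as one flat list of floats."""
    values = []
    # The with block closes the file even if parsing fails part-way through
    with open(path, newline="") as data_file:
        reader = csv.reader(data_file, delimiter="\t",
                            quoting=csv.QUOTE_NONNUMERIC)
        for row in reader:
            values.extend(row)  # unquoted fields arrive already converted to float
    return values


# Usage (the file name is a placeholder):
# import matplotlib.pyplot as plt
# plt.hist(read_tsv_values("testme.txt"))
# plt.show()
```

Keeping the reading step in its own function makes it easy to check the parsed numbers before plotting them.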
Assign the string result of read() to a variable s:
s = d.read()
split will break your string s into a list of strings:
s = s.split("\t")
map will apply a function to every element of a list. In Python 3 it returns an iterator, so wrap it in list to get a list of floats:
s = list(map(float, s))
If you study csv, you can handle the file with delimiter='\t' as one of the options. This will change the expected delimiter from ',' to '\t' (tab). All the examples that you study that use ',' will be handled in the same way.

while loops for processing FITS files(python)

I am new at programming in python and am in the process of trying to create a setup of processing thousands of files with one piece of code in python. I created a practice folder to do this in. In it are two FITS files (FITS1.fits and FITS2.fits). I did the following to put them both in a .txt file:
ls > practice.txt
Here is what I did next:
$ python
import numpy
import pyfits
import matplotlib.pyplot as plt
from matplotlib import pylab
from pylab import *
import asciidata
a = asciidata.open('practice.txt')
print(a[0][0]) # To test whether practice.txt really contains my FITS file FITS1.fits
i = 0
while i <= 1: # Now I attempt a while loop to read data from columns in the FITS files, plot the numbers desired, save and show the figures. I chose i <= 1 because there are only two FITS files in the text file (also because of zero-indexing).
    b = pyfits.getdata(a[0][i]) # "i" is the index used to pick a different file each time the while loop gets to the end
    time = b['TIME'] # 'TIME' is a column in the FITS file
    brightness = b['SAP_FLUX']
    plt.plot(time, brightness)
    xlabel('Time(days)')
    ylabel('Brightness (e-/s)')
    title(a[0][i])
    pylab.savefig('a[0][i].png') # Here I am lost on how to get the while loop to name the saved figure something different every time. It takes 'a[0][i].png' as a string and not as the index I am trying to make it be.
    pylab.show()
    i = i + 1 # I placed this here, hoping that when the while loop gets to this point, it would start over again with a different "i" value
After pressing enter twice, I see the first figure as expected. Then I will close it and see the second. However, only the first figure is saved. Does anyone have any suggestions on how I can change my loop to do what I need it to?
In your code the i is being treated as the letter i, not the variable. If you wanted to keep this naming you could do something like:
FileName = 'a[0][%s].png' % i
pylab.savefig(FileName)
You should use glob to automatically get the FITS files as a list; from there, a for loop lets you iterate over the file names directly instead of using an index. When you call plt.savefig, you need to construct the file name you want to save it as. Here is the code cleaned up and put together:
from glob import glob
import pyfits
from matplotlib import pyplot as plt
files = glob('*.fits')
for file_name in files:
    data = pyfits.getdata(file_name)
    name = file_name[:-len('.fits')] # Remove .fits from the file name
    time = data['TIME']
    brightness = data['SAP_FLUX']
    plt.plot(time, brightness)
    plt.xlabel('Time(days)')
    plt.ylabel('Brightness (e-/s)')
    plt.title(name)
    plt.savefig(name + '.png')
    plt.show()
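The file-name construction is the part that fixes the original problem, so it may help to see it in isolation. This sketch uses only the standard library; the helper name png_name_for is hypothetical, and os.path.splitext is used in place of slicing so that the extension does not have to be exactly '.fits'.

```python
import os
from glob import glob


def png_name_for(fits_path):
    """Map 'some/dir/FITS1.fits' to 'FITS1.png' (extension swapped, directory dropped)."""
    stem, _extension = os.path.splitext(os.path.basename(fits_path))
    return stem + ".png"


# One distinct output name per input file found in the current directory
output_names = {path: png_name_for(path) for path in sorted(glob("*.fits"))}
```

Inside the loop above, plt.savefig(png_name_for(file_name)) would then write a separate image for each FITS file instead of overwriting a single one.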
