having problems with matplotlib and spectroscopy data - python

I am trying to plot a .dat file from a stellar catalog using this code:
try:
    import pyfits
    noPyfits=False
except:
    noPyfits=True
import matplotlib.pyplot as plt
import numpy as np
f2 = open('/home/mcditoos/Desktop/Astrophysics_programs/Data_LAFT/ESPECTROS/165401.dat', 'r')
lines = f2.readlines()
f2.close()
x1 = []
y1 = []
for line in lines:
    p = line.split()
    x1.append(float(p[0]))
    y1.append(float(p[1]))
xv = np.array(x1)
yv = np.array(y1)
plt.plot(xv, yv)
plt.show()
However, I get the following error:
x1.append(float(p[0]))
IndexError: list index out of range
I would also like to know whether there is any way to turn this into a program that can open the next .dat file given an input.

I may not fully understand your question, but why don't you use:
X, Y = numpy.genfromtxt('yourfile', dtype='str')
X = X.astype('float')
Y = Y.astype('float')
If your file has 2 columns (one row per data point) rather than 2 rows, transpose the table:
X, Y = numpy.genfromtxt('yourfile', dtype='str').T
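The IndexError in the question is what `p[0]` raises when `line.split()` returns an empty list, which happens for a blank line (for example an empty line at the end of the file). A minimal sketch, assuming the .dat file holds wavelength and flux in its first two whitespace-separated columns, that skips such lines and also picks the next .dat file in sorted name order. `read_spectrum` and `next_dat_file` are hypothetical helper names, not part of any library.

```python
import os

import numpy as np


def read_spectrum(path):
    """Read a two-column (wavelength, flux) text file into two arrays.

    Blank lines, '#' comment lines and non-numeric header lines are
    skipped instead of raising IndexError/ValueError.
    """
    wavelengths, fluxes = [], []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            # A blank line splits to [], so fields[0] would raise IndexError.
            if len(fields) < 2 or fields[0].startswith('#'):
                continue
            try:
                wavelengths.append(float(fields[0]))
                fluxes.append(float(fields[1]))
            except ValueError:
                continue  # e.g. a text header such as "lambda flux"
    return np.array(wavelengths), np.array(fluxes)


def next_dat_file(directory, current):
    """Return the .dat file name that follows `current` in sorted order.

    Returns the first file if `current` is not in the directory, and
    None when `current` is the last one (or there are no .dat files).
    """
    names = sorted(n for n in os.listdir(directory) if n.endswith('.dat'))
    if current not in names:
        return names[0] if names else None
    idx = names.index(current)
    return names[idx + 1] if idx + 1 < len(names) else None
```

For example, `xv, yv = read_spectrum(path)` followed by `plt.plot(xv, yv)` replaces the manual loop, and calling `next_dat_file(directory, os.path.basename(path))` after each plot (say, when the user presses Enter at an `input()` prompt) gives the file to open next.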

Related

How to automate loading multiple files into numpy arrays using a simple "for" loop?

I usually load my data, which in most cases consists of only two columns, using the np.loadtxt command as follows:
x0, y0 = np.loadtxt('file_0.txt', delimiter='\t', unpack=True)
x1, y1 = np.loadtxt('file_1.txt', delimiter='\t', unpack=True)
.
.
xn, yn = np.loadtxt('file_n.txt', delimiter='\t', unpack=True)
Then I plot each pair on its own, which is not ideal!
I want to write a simple "for" loop that goes through all the text files in the same directory, loads them and plots them on the same figure.
import os
import numpy as np
import matplotlib.pyplot as plt
# A list of all file names that end with .txt
myfiles = [myfile for myfile in os.listdir() if myfile.endswith(".txt")]
# Create a new figure
plt.figure()
# iterate over the file names
for myfile in myfiles:
    # load the x, y
    x, y = np.loadtxt(myfile, delimiter='\t', unpack=True)
    # plot the values
    plt.plot(x, y)
# show the figure after iterating over all files and plotting.
plt.show()
Load all the files into a dictionary:
d = {}
for i in range(n):
    d[i] = np.loadtxt('file_' + str(i) + '.txt', delimiter='\t', unpack=True)
Now, to access the kth file, use d[k] or:
xk, yk = d[k]
Since you have not described the data in the files or the plot you want to create, it's hard to say exactly what to do. For plotting, though, you can refer to the Matplotlib or Seaborn libraries.
You can also use glob to get all the files:
from glob import glob
import numpy as np
import os
res = []
file_path = "YOUR PATH"
file_pattern = "file_*.txt"
files_list = glob(os.path.join(file_path,file_pattern))
for f in files_list:
    print(f'----- Loading {f} -----')
    x, y = np.loadtxt(f, delimiter='\t', unpack=True)
    res += [(x,y)]
res will then contain the (x, y) contents of each file, with res[i] corresponding to files_list[i].
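Combining the loop and dictionary ideas above, a small sketch that loads every matching file in sorted name order (neither os.listdir nor glob guarantees any particular order) and keys each (x, y) pair by its file name, so a legend can show which curve came from which file. `load_all` and `plot_all` are hypothetical names, and the tab delimiter default is carried over from the question.

```python
import glob
import os

import numpy as np


def load_all(directory, pattern="*.txt", delimiter="\t"):
    """Load every file matching `pattern` in `directory`.

    Returns a dict mapping file name -> (x, y) arrays, inserted in
    sorted file-name order.
    """
    data = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        x, y = np.loadtxt(path, delimiter=delimiter, unpack=True)
        data[os.path.basename(path)] = (x, y)
    return data


def plot_all(data):
    """Plot every (x, y) pair on one figure, labelled by file name."""
    import matplotlib.pyplot as plt  # imported here; only needed for plotting

    fig, ax = plt.subplots()
    for name, (x, y) in data.items():
        ax.plot(x, y, label=name)
    ax.legend()
    return fig
```

Usage would be `plot_all(load_all("YOUR PATH"))` followed by `plt.show()`; the kth curve is still available as `data["file_k.txt"]`.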

How to make details of a graph sorted

I have a directory that contains 6 folders. I plot the folders automatically, but the result looks a bit strange: although the folders are sorted on my computer, the plot is not ordered. For example, I want the result for C_r 0.05 to appear before C_r 0.1, and so on. I am plotting from folders on my computer, so I do not know how to make a reproducible example, but here is the code that plots the graph.
import os
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sns.set(style="darkgrid")
#matplotlib qt
root = r'/home/hossein/Desktop/Out/INTERSECTION/BETA 15'
xx=[]
percentage=[]
labels = []
gg=[]
my_list = os.listdir(root)
my_list = [file for file in my_list if os.path.isdir(os.path.join(root, file))]
percetanges = []
for directory in my_list:
    CASES = [file for file in os.listdir(os.path.join(root, directory)) if file.startswith('config')]
    if len(CASES)==0:
        continue
    CASES.sort()
    #print(CASES)
    percentage=[]
    for filename in CASES:
        # print(filename)
        with open(os.path.join(root, directory,filename), "r") as file:
            #files[filename] = file.read()
            lines = file.readlines()
            x = [float(line.split()[0]) for line in lines]
            y = [float(line.split()[1]) for line in lines]
            #_new = np.array(y)
            g = np.linspace(min(y),max(y),100)
            h = min(y)*0.9
            t = max(y)*0.9
            xx=[]
            for i in range(1,len(x)):
                if (y[i] < h or y[i] > t):
                    xx.append(x[i])
            percent = len(xx)/len(y)
            percentage.append(percent)
    labels.append(directory)
    labels=sorted(labels)
    percetanges.append(percentage)
    percetanges=sorted(percetanges)
for i, x in enumerate(percetanges):
    plt.boxplot(x,positions=[i],whis=0.001)
plt.xticks(np.arange(len(labels)),labels)
The answer is easy: you just need to sort the directory list before reading and plotting, i.e. call my_list.sort() right after my_list is built. The folders are then processed in order, and the plot comes out in the right order.
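Sorting my_list fixes the order in which the folders are read. Note, though, that the question's code also sorts labels and percetanges separately: sorted(percetanges) orders the lists by their values, not by folder name, so boxes can end up under the wrong labels. A sketch that keeps each label paired with its data and sorts by the number in the folder name, assuming names like 'C_r 0.05' (a plain string sort would put 'C_r 10' before 'C_r 2'). `numeric_key` and `sort_paired` are hypothetical helpers.

```python
import re


def numeric_key(name):
    """Sort key: the first number found in `name` (0.05 in 'C_r 0.05').

    Names without any number sort after all numbered ones, alphabetically.
    """
    match = re.search(r"[-+]?\d*\.?\d+", name)
    return (0, float(match.group()), name) if match else (1, 0.0, name)


def sort_paired(labels, values):
    """Sort labels numerically and reorder values with them, keeping pairs together."""
    pairs = sorted(zip(labels, values), key=lambda pair: numeric_key(pair[0]))
    return [label for label, _ in pairs], [value for _, value in pairs]
```

In the question's loop this would mean appending to labels and percetanges without sorting them, then calling `labels, percetanges = sort_paired(labels, percetanges)` once after the loop, before drawing the boxplots.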

How to plot multiple points from a list using matplotlib?

I have read a list of 3D points from a text file. The list looks like this:
content = ['2.449,14.651,-0.992,', '6.833,13.875,-1.021,', '8.133,17.431,-1.150,', '3.039,13.724,-0.999,', '16.835,9.456,-1.031,', '16.835,9.457,-1.031,', '15.388,5.893,-0.868,', '13.743,25.743,-1.394,', '14.691,24.988,-1.387,', '15.801,25.161,-1.463,', '14.668,23.056,-1.382,', '22.378,20.268,-1.457,', '21.121,17.041,-1.353,', '19.472,13.555,-1.192,', '22.498,20.115,-1.436,', '13.344,-33.672,-0.282,', '13.329,-33.835,-0.279,', '13.147,-30.690,-0.305,', '13.097,-28.407,-0.339,', '13.251,-28.643,-0.366,', '13.527,-25.067,-0.481,', '19.433,-33.137,-0.408,', '19.445,-29.501,-0.345,', '20.592,-28.004,-0.312,', '19.109,-26.512,-0.380,', '18.521,-24.155,-0.519,', '22.837,48.245,-2.201,', '23.269,50.129,-2.282,', '23.499,46.652,-2.297,', '23.814,48.646,-2.271,', '30.377,46.501,-2.214,', '29.869,44.479,-2.143,', '29.597,41.257,-2.018,', '28.134,40.291,-2.159,', '-40.932,-0.320,-1.390,', '-36.808,0.442,-1.382,', '-30.831,0.548,-1.288,', '-29.404,1.235,-1.300,', '-26.453,1.424,-1.261,', '-30.559,2.775,-1.249,', '-27.714,3.439,-1.201,']
I want to plot all the points on a 3D plot. I have this so far:
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
with open("measurements.txt") as f:
    content = f.read().splitlines()
#print content
for value in content:
    x, y, z = value.split(',')
    #print x, y, z
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter(x, y, z)
fig.savefig('scatterplot.png')
It throws an error:
Traceback (most recent call last):
  File "plotting.py", line 11, in <module>
    x, y, z = value.split(',')
ValueError: too many values to unpack
How do I plot these points? Thank you for your help.
First, collect the values into separate lists by splitting each line, then pass those lists to the plotting function.
content = ['2.449,14.651,-0.992,', '6.833,13.875,-1.021,', '8.133,17.431,-1.150,', '3.039,13.724,-0.999,', '16.835,9.456,-1.031,', '16.835,9.457,-1.031,', '15.388,5.893,-0.868,', '13.743,25.743,-1.394,', '14.691,24.988,-1.387,', '15.801,25.161,-1.463,', '14.668,23.056,-1.382,', '22.378,20.268,-1.457,', '21.121,17.041,-1.353,', '19.472,13.555,-1.192,', '22.498,20.115,-1.436,', '13.344,-33.672,-0.282,', '13.329,-33.835,-0.279,', '13.147,-30.690,-0.305,', '13.097,-28.407,-0.339,', '13.251,-28.643,-0.366,', '13.527,-25.067,-0.481,', '19.433,-33.137,-0.408,', '19.445,-29.501,-0.345,', '20.592,-28.004,-0.312,', '19.109,-26.512,-0.380,', '18.521,-24.155,-0.519,', '22.837,48.245,-2.201,', '23.269,50.129,-2.282,', '23.499,46.652,-2.297,', '23.814,48.646,-2.271,', '30.377,46.501,-2.214,', '29.869,44.479,-2.143,', '29.597,41.257,-2.018,', '28.134,40.291,-2.159,', '-40.932,-0.320,-1.390,', '-36.808,0.442,-1.382,', '-30.831,0.548,-1.288,', '-29.404,1.235,-1.300,', '-26.453,1.424,-1.261,', '-30.559,2.775,-1.249,', '-27.714,3.439,-1.201,']
import numpy as np
import matplotlib.pyplot as plt
#with open("measurements.txt") as f:
#content = f.read().splitlines()
#print content
#for value in content:
# x, y, z = value.split(',')
x = [float(i.split(',')[0]) for i in content]
y = [float(i.split(',')[1]) for i in content]
z = [float(i.split(',')[2]) for i in content]
#print(x, y, z)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter(x, y, z)
fig.savefig('scatterplot.png')
The cause is clear: when you split each string, you get 4 values, not 3.
Solution:
for value in content:
    x, y, z, parasitic_value = value.split(',')
Each element in content looks like this:
'2.449,14.651,-0.992,'
A slightly different way to extract the data to plot from this string is to consider it as a tuple, and to use eval().
data = [eval("("+x[:len(x)-1]+")") for x in content]
Which returns:
[(2.449, 14.651, -0.992),
(6.833, 13.875, -1.021),
(8.133, 17.431, -1.15),
...
(-30.559, 2.775, -1.249),
(-27.714, 3.439, -1.201)]
EDIT: the error you got means:
You want 3 values (x, y and z), but splitting at "," produces more than that (too many values to unpack).
content[0].split(",")
Out[4]: ['2.449', '14.651', '-0.992', '']
I see at least one error in there.
The most obvious one (because you got an error) is in the splitting.
The trailing (third) comma causes each string to be split into four elements:
>>> l = 'a,b,c,'
>>> l.split(',')
['a', 'b', 'c', '']
You can work around that by using:
x,y,z,_ = value.split(',')
The next problem you'll run into is with your loop:
for value in content:
    x, y, z = value.split(',')
You only keep the last set of values, because x, y and z are overwritten on every iteration.
The easiest way to work around this is creating three lists and appending into them:
x = []
y = []
z = []
for measurement in content:
    a,b,c,_ = measurement.split(',')
    # convert the strings to numbers before storing them
    x.append(float(a))
    y.append(float(b))
    z.append(float(c))
This is not the most efficient way, but I think it should be easier to understand.
I recommend reading the file directly, like this:
x = []
y = []
z = []
with open('measurements.txt') as file:
    for line in file:
        a,b,c,_ = line.split(',')
        x.append(float(a))
        y.append(float(b))
        z.append(float(c))
To solve the main issue, you have to convert the list into a 3D NumPy array by parsing all the values, for example by traversing the list with re.
Rather than treating the list as multiple separate points, try taking the first 2 or 3 points as an image/3D graph, and use imshow or Axes3D to plot it.
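Pulling the answers above together, a sketch that strips the trailing comma before splitting, converts each field to float, and returns an (N, 3) NumPy array whose columns are x, y and z. `parse_points` is a hypothetical helper name, and it assumes every non-blank line holds exactly three comma-separated numbers.

```python
import numpy as np


def parse_points(lines):
    """Turn strings like '2.449,14.651,-0.992,' into an (N, 3) float array.

    The trailing comma is removed first, so split(',') yields exactly
    three fields instead of three plus an empty string.
    """
    rows = []
    for line in lines:
        line = line.strip().rstrip(',')
        if not line:
            continue  # skip blank lines, e.g. at the end of the file
        rows.append([float(v) for v in line.split(',')])
    return np.array(rows)
```

With `x, y, z = parse_points(content).T` the three columns can be passed straight to `ax.scatter(x, y, z)`; reading from the file works the same way inside `with open("measurements.txt") as f: points = parse_points(f)`.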

How to solve the error when I draw a graph in Python using data from CSV files?

I think the problem is in the following steps, but just in case, I will also post the whole body of my code below. The strangest thing is that this code can read over 6000 CSV files and the graphic is displayed successfully, but when I try to read more files, an error occurs. The screenshot shows the graphic and the content of the CSV files. As you can see, path = r'C:\Users\AK6PRAKT\Desktop\6daten' includes all the data and path = r'C:\Users\AK6PRAKT\Desktop\daten' includes only part of it.
import os
from matplotlib import pyplot as pyplot
from collections import defaultdict
import csv
import numpy as np
path = r'C:\Users\AK6PRAKT\Desktop\6daten'
dirs = os.listdir(path)
s = []
x = []
y = []
names = []
... (some steps for reading the data from the CSV files are omitted)
print(list_temp1,list_temp2) # list_temp1 holds the x-axis data and list_temp2 the y-axis data.
y.append(float(list_temp2))
names.append(list_temp1)
x = range(len(names))
pyplot.ylim((0, 40))
my_y_ticks = np.arange(0, 40, 10)
pyplot.plot(x,y, linewidth=2)
pyplot.xticks(x,names,rotation = 90)
fig = pyplot.figure(figsize=(10,10))
pyplot.show()
And here is the whole body. I should mention that I had no background in computer science before, so dealing with this much data right at the beginning is quite hard for me. I am currently doing an internship at a German company and started learning Python one week ago. My mentor gave me an assignment; I tried to divide it into several steps, searched for the commands for each step, and then combined them with some revisions. So it may look like I did a lot of unnecessary work. Please be kind in the comments (suggestions are always welcome, of course).
import os
from matplotlib import pyplot as pyplot
from collections import defaultdict
import csv
import numpy as np
path = r'C:\Users\AK6PRAKT\Desktop\6daten'
dirs = os.listdir(path)
s = []
x = []
y = []
names = []
fig = pyplot.figure()
for i in dirs:
    if os.path.splitext(i)[1] == ".csv":
        f = open(path+"/"+i)
        iter_f = iter(f);
        str = ""
        for line in iter_f:
            str = str + line
        s.append(str)
        with open(path+"/"+i,'r') as r:
            lines=r.readlines()
        with open(path+"/"+i,'w') as w:
            for row in lines:
                if 'Date' not in row:
                    w.write(row)
        columns = defaultdict(list)
        with open(path+"/"+i) as f:
            reader = csv.reader(f)
            for row in reader:
                for (i,v) in enumerate(row):
                    columns[i].append(v)
        list_temp1 = columns[0]
        list_temp1 = np.array(list_temp1)
        list_temp2 = columns[1]
        list_temp2 = np.array(list_temp2)
        print(list_temp1,list_temp2)
        y.append(float(list_temp2))
        names.append(list_temp1)
x = range(len(names))
pyplot.ylim((0, 40))
my_y_ticks = np.arange(0, 40, 10)
pyplot.plot(x,y, linewidth=2)
pyplot.xticks(x,names,rotation = 90)
pyplot.yticks(my_y_ticks)
fig = pyplot.figure(figsize=(10,10))
pyplot.show()
[Figure: the graphic produced from part of the data]
[Figure: the graphic cannot be shown when all the data is read]
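The error message itself is not shown, so this is a hedged sketch rather than a diagnosis. Two things in the question's code are worth changing regardless: pyplot.figure(figsize=(10,10)) is called after plotting, which opens a new, empty figure instead of resizing the plotted one, and every CSV file is rewritten in place to drop its 'Date' rows. A plausible cause of the error when more files are read (an assumption) is that float(list_temp2) only works when a file has exactly one data row; a file with zero or several rows makes NumPy raise a TypeError. With thousands of points, labelling every x tick is also very slow. `read_csv_points` and `plot_points` are hypothetical names; the column layout (label in column 0, number in column 1) is taken from the question.

```python
import csv
import math
import os


def read_csv_points(directory, skip_marker="Date"):
    """Collect (label, value) pairs from every .csv file in `directory`.

    Rows containing `skip_marker` (e.g. a header) are skipped in memory;
    the files on disk are never rewritten. Every data row is kept, so a
    file with several rows does not break the float() conversion.
    """
    names, values = [], []
    for filename in sorted(os.listdir(directory)):
        if os.path.splitext(filename)[1] != ".csv":
            continue
        with open(os.path.join(directory, filename), newline="") as fh:
            for row in csv.reader(fh):
                if len(row) < 2 or any(skip_marker in cell for cell in row):
                    continue  # header, marker or incomplete row
                names.append(row[0])
                values.append(float(row[1]))
    return names, values


def plot_points(names, values, max_ticks=50):
    """Plot the values, labelling at most `max_ticks` evenly spaced x ticks."""
    import matplotlib.pyplot as plt  # only needed for plotting

    fig, ax = plt.subplots(figsize=(10, 10))  # figure size set before plotting
    x = list(range(len(names)))
    ax.plot(x, values, linewidth=2)
    step = max(1, math.ceil(len(names) / max_ticks))
    ax.set_xticks(x[::step])
    ax.set_xticklabels(names[::step], rotation=90)
    ax.set_ylim(0, 40)
    ax.set_yticks(range(0, 40, 10))
    return fig
```

Usage would be `plot_points(*read_csv_points(path))` followed by `pyplot.show()`.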

Column stack and row stack with H5py to existing datasets

I am trying to use Python to column stack and row stack data I have in an HDF5 file with additional data. I am recording images from a camera and saving them to individual files. Then I want to be able to generate a single file with all of the images patched together. Therefore, I would like to be able to make one dataset in a new file and stack together all of the arrays from each image file into the single file.
I know that h5py lets me use datasets like NumPy arrays, but I do not know how to tell h5py to save the data to the file again. Below is a very simple example.
My question is how can I column stack the data from the HDF5 file with the second array (arr2) such that arr2 is saved to the file?
(Note: In my actual application, the data in the file will be much larger than in the example. Therefore, importing the data into the memory, column stacking, and then rewriting it to the file is out of the question.)
import h5py
import numpy
arr1 = numpy.random.random((2000,2000))
with h5py.File("Plot0.h5", "w") as f:
    dset = f.create_dataset("Plot", data = arr1)
arr2 = numpy.random.random((2000,2000))
with h5py.File("Plot0.h5", "r+") as f:
    dset = f["Plot"]
    dset = numpy.column_stack((dset, arr2))
It seems like a trivial issue, but all of my searches have been unsuccessful. Thanks in advance.
After rereading some of the documentation on H5py, I realized my mistake. Here is my new script structure that allows me to stack arrays in the HDF5 file:
import h5py
import numpy
arr1 = numpy.random.random((2000,2000))
# maxshape=(None, None) makes the dataset resizable along both axes
with h5py.File("Plot0.h5", "w") as f:
    dset = f.create_dataset("Plot", data = arr1, maxshape=(None,None))
dsetX, dsetY = 2000,2000
go = ""
while go == "":
    go = input("Current Size: " + str(dsetX) + " " + str(dsetY) + " Continue?")
    arr2 = numpy.random.random((2000,2000))
    with h5py.File("Plot0.h5", "r+") as f:
        dset = f["Plot"]
        print(len(arr2[:]))
        print(len(arr2[0][:]))
        change = "column"
        dsetX, dsetY = dset.shape
        if change == "column":
            # grows axis 0: arr2 is appended below the existing data as new rows
            x1 = dsetX
            x2 = len(arr2[:]) + dsetX
            y1 = 0
            y2 = len(arr2[0][:])
            dset.resize((x2, y2))
        else:
            # grows axis 1: arr2 is appended to the right as new columns
            x1 = 0
            x2 = len(arr2[:])
            y1 = dsetY
            y2 = len(arr2[0][:]) + dsetY
            dset.resize((x2, y2))
        print("x1", x1)
        print("x2", x2)
        print("y1", y1)
        print("y2", y2)
        print(dset.shape)
        # write only the new block; the existing data is not read into memory
        dset[x1:x2,y1:y2] = arr2
        print(arr2)
        print("\n")
        print(dset[x1:x2,y1:y2])
        dsetX, dsetY = dset.shape
I hope this can help someone else. And of course, better solutions to this problem are welcome.
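The approach above can be wrapped in a small helper: compute the enlarged shape and the slice where the new block goes, call the h5py Dataset's resize method (which requires the dataset to have been created with a maxshape that allows growth, such as maxshape=(None, None)), and write only the new block. `plan_append` and `append_block` are hypothetical names, and the sketch assumes 2-D datasets whose other dimension already matches the new block; axis=1 corresponds to numpy.column_stack and axis=0 to numpy.vstack.

```python
import numpy as np


def plan_append(current_shape, block_shape, axis):
    """Return (new_shape, region) for appending a 2-D block along `axis`.

    axis=0 adds rows below the existing data (like numpy.vstack);
    axis=1 adds columns to the right (like numpy.column_stack).
    The block must already match the dataset along the other axis.
    """
    other = 1 - axis
    if current_shape[other] != block_shape[other]:
        raise ValueError("block does not match the dataset along the other axis")
    new_shape = list(current_shape)
    new_shape[axis] += block_shape[axis]
    region = [slice(None), slice(None)]
    region[axis] = slice(current_shape[axis], new_shape[axis])
    return tuple(new_shape), tuple(region)


def append_block(dset, block, axis):
    """Grow a resizable h5py dataset and write `block` into the new region.

    Only `block` is written; the data already in the file is not read
    into memory.
    """
    block = np.asarray(block)
    new_shape, region = plan_append(dset.shape, block.shape, axis)
    dset.resize(new_shape)
    dset[region] = block
```

For example, inside `with h5py.File("Plot0.h5", "r+") as f:`, calling `append_block(f["Plot"], arr2, axis=1)` would append arr2 as new columns of the "Plot" dataset.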
