pandas plot - plot specific lines in file - python

I am using the following code to make pandas plots. It reads a file and plots specific lines, identified by locus ID (e.g. 'loc.27404').
However, this code manually specifies the lines that I want to plot.
I have another file containing all the lines (there are hundreds) that I want to plot. How can I write a script that takes this file as input, so that these specific lines are plotted using the code below? I can't seem to write anything that makes sense.
data = {}
for line in File:
    cols = line.strip().split('\t')
    vals = [float(v) for v in cols[6:]]  # a real list (map() is lazy in Python 3)
    data[cols[3]] = vals
fig, ax = plt.subplots(4, figsize=[15, 20])
l1 = 'loc.27404'
l2 = 'loc.37387'
l3 = 'loc.05134'
l4 = 'loc.10034'
pd.Series(data[l1], index=xticks).plot(ax=ax[0])
pd.Series(data[l2], index=xticks).plot(ax=ax[1])
pd.Series(data[l3], index=xticks).plot(ax=ax[2])
pd.Series(data[l4], index=xticks).plot(ax=ax[3])

A potential solution would be to read the 100s of lines and store results in a list, like: ['loc.27404','loc.37387','loc.05134','loc.10034'] to be used in a for-loop:
for i, line_to_plot in enumerate(['loc.27404', 'loc.37387', 'loc.05134', 'loc.10034']):
    pd.Series(data[line_to_plot], index=xticks).plot(ax=ax[i])
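
As a minimal sketch of the first step, the ID file could be read into a list as shown here. It assumes one locus ID per line; the helper name `read_locus_ids`, the sample IDs and the stand-in `data` dict are made up for illustration. IDs that do not occur in `data` are dropped so that the plotting loop cannot hit a KeyError.

```python
import io


def read_locus_ids(handle):
    """Return the non-empty, stripped lines of an open text file as a list."""
    return [line.strip() for line in handle if line.strip()]


# Stand-in for an opened ID file; the contents are assumptions.
id_file = io.StringIO("loc.27404\nloc.37387\n\nloc.99999\n")
ids = read_locus_ids(id_file)

# Stand-in for the `data` dict built from the main file.
data = {"loc.27404": [1.0, 2.0], "loc.37387": [3.0, 4.0]}

# Keep only the IDs that actually occur in the data.
lines_to_plot = [locus for locus in ids if locus in data]
print(lines_to_plot)
```

`lines_to_plot` can then replace the hard-coded list in the for-loop, with `plt.subplots(len(lines_to_plot), ...)` creating one axis per ID. For a single ID, `plt.subplots` returns a bare Axes rather than an array, so `ax[i]` would need adjusting in that case.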

Related

UV-Vis Spectrum, using readline() method

I have been set an exercise to list the data provided in a data file called "Spectrum.dat". If the code is correct, running the cell in Jupyter should print the list of energies [2.1674, 2.1724, 2.1774, 2.1824, etc...] and the list of intensities [6.4e-07, 1.26e-06, 2.39e-06, 4.36e-06, etc...].
spect = ___
eV = []
inten = []
for line in spect.readlines():
    if not (">" in ___):
        tmp = line___
        eV.append(float(tmp[0]))
        inten.append(float(tmp[1]))
print(eV)
print(inten)
I'm trying to work out what goes in place of the underscores. For the first blank I added spect = open("spectrum.dat", "r+"), and I am assuming the second one is if not (">" in spect):, so now I am stuck on the last one.
Below is the Spectrum.dat file:
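
The file contents are not shown here, so the following is only a sketch of one plausible way to fill the three blanks. It assumes that header lines contain ">" and that data lines hold two whitespace-separated numbers; the header text in the sample is invented, and only the numeric values are taken from the question. The second blank is taken to be `line` rather than `spect`, since the test has to look at the text of each line, and the third is `.split()`.

```python
import io

# Stand-in for open("Spectrum.dat", "r"); the header line is an assumed format.
spect = io.StringIO(
    "> Energy (eV)  Intensity\n"
    "2.1674 6.4e-07\n"
    "2.1724 1.26e-06\n"
)

eV = []
inten = []
for line in spect.readlines():
    if not (">" in line):      # second blank: test the current line
        tmp = line.split()     # third blank: split the line on whitespace
        eV.append(float(tmp[0]))
        inten.append(float(tmp[1]))

print(eV)
print(inten)
```

Opening the file with mode "r" is enough for reading; "r+" also works but additionally allows writing, which this task does not need.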

feed class from list

I am still new to Python but am using it for my linguistics research.
I am doing some research into toponyms, and I got a list of input data from a topographic institution, which looks like the following:
Official_Name, tab, Dialect_Name, tab, Administrative_district, Topographic_district, Y_coordinates, X_coordinates, Longitude, Latitude.
So, I defined a class:
class MacroTop:
    def __init__(self, Official_Name, Dialect_Name, Adm_District, Topo_District, Y, X, Long, Lat):
        self.Official_Name = Official_Name
        self.Dialect_Name = Dialect_Name
        self.Adm_District = Adm_District
        self.Topo_District = Topo_District
        self.Y = Y
        self.X = X
        self.Long = Long
        self.Lat = Lat
Then, with open(), I wanted to load my .txt file and read the data into the class using a loop, but it did not work.
The result I want is to be able to access an attribute of the class, say Dialect_Name, and look through all the entries for that attribute. I can do that within the loop alone, but I wanted to define a class so that I could do more manipulation afterwards.
My loop:
with open("locLuxAll.txt", "r") as topo_list:
    lines = topo_list.readlines()
    for line in lines:
        line = line.split('\t')
        print(line)
        print(line[0])  # This would access all the data that is characterized as Official_Name
I tried to make another loop:
for i in range(0-len(lines)):
    lines[i] = MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(line[4]), str(line[5]), str(line[6]), str(line[7]))
But that did not seem to work.
This line fails:
for i in range(0-len(lines)):
You're trying to loop over a negative range: 0-len(lines) is negative, so range() yields nothing and the loop body never runs.
In [11]: [i for i in range(-200)]
Out[11]: []
EDIT:
Your code is hard to follow: inside the for i in range(len(lines)) loop you index the variable line, but where does it come from? First of all, I would not write back into the lines list that came from readlines. Create a new list instead; you don't need the i variable, since the lines will be kept in order anyway.
class_lines = []
for line in lines:
    line = line.rstrip('\n').split('\t')  # lines holds raw strings, so split each one into fields first
    class_lines.append(MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(line[4]), str(line[5]), str(line[6]), str(line[7])))
Or even with a list comprehension:
class_lines = [MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(
    line[4]), str(line[5]), str(line[6]), str(line[7]))
    for line in (raw.rstrip('\n').split('\t') for raw in lines)]
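
Putting the pieces together, a self-contained sketch might look like the following. The sample rows are invented placeholders, not real data, and the helper name `load_toponyms` is hypothetical. It assumes eight tab-separated fields per line and skips any line that does not match.

```python
class MacroTop:
    def __init__(self, Official_Name, Dialect_Name, Adm_District,
                 Topo_District, Y, X, Long, Lat):
        self.Official_Name = Official_Name
        self.Dialect_Name = Dialect_Name
        self.Adm_District = Adm_District
        self.Topo_District = Topo_District
        self.Y = Y
        self.X = X
        self.Long = Long
        self.Lat = Lat


def load_toponyms(lines):
    """Build one MacroTop per line that has exactly eight tab-separated fields."""
    entries = []
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) == 8:
            entries.append(MacroTop(*fields))
    return entries


# Invented placeholder rows standing in for the lines of locLuxAll.txt.
sample = [
    "NameA\tDialectA\tAdm1\tTopo1\t100\t200\t6.1\t49.6\n",
    "NameB\tDialectB\tAdm2\tTopo2\t110\t210\t6.2\t49.7\n",
    "\n",  # blank line, skipped because it does not have eight fields
]
toponyms = load_toponyms(sample)

# Look through the Dialect_Name of every entry.
dialect_names = [t.Dialect_Name for t in toponyms]
print(dialect_names)
```

All attributes stay strings here; coordinates could be converted with float() inside `load_toponyms` if numeric work is planned.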

Using split in python on excel to get two parameter

I'm getting really confused by all the information on here about using 'split' in Python. Basically, I want to write code that opens a spreadsheet (with two columns in it); the function will use the first column as the x values and the second column as the y values, and then plot them in the x-y plane.
I thought I would use line.splitlines to cut each line of the spreadsheet into (x, y), but I keep getting
'ValueError: need more than 1 value to unpack'
I don't know what this means.
Below is what I've written so far (xdir is an initial condition for a different part of my question):
def plotMo(filename, xdir):
    infile = open(filename)
    data = []
    for line in infile:
        x,y = line.splitlines()
        x = float(x)
        y = float(y)
        data.append([x,y])
    infile.close()
    return data
    plt.plot(x,y)
For example with
0 0.049976
0.01 0.049902
0.02 0.04978
0.03 0.049609
0.04 0.04939
0.05 0.049123
0.06 0.048807
I would want the first point in my plane to be (0, 0.049976) and the second point to be (0.01, 0.049902).
x,y = line.splitlines() tries to split the current line into several lines.
Since splitlines returns only 1 element, there's an error because Python cannot find a value to assign to y.
What you want is x,y = line.split(), which splits the line on runs of one or more whitespace characters (like awk would do) if no parameter is specified.
However, it depends on the format: if there are blank lines you'll get the "unpack" problem at some point, so to be safe and skip malformed lines, write:
items = line.split()
if len(items)==2: x,y = items
To sum it up, a more pythonic, shorter & safer way of writing your routine would be:
def plotMo(filename):
    with open(filename) as infile:
        data = []
        for line in infile:
            items = line.split()
            if len(items)==2:
                data.append([float(e) for e in items])
    return data
(maybe it could be condensed more, but that's good for starters)
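
To make the difference concrete, here is a small sketch that compares splitlines() and split() on one of the sample lines and then parses a few lines into separate x and y sequences. The helper name `parse_pairs` is made up for illustration, and the lines are fed from a list rather than a file so that the snippet is self-contained.

```python
line = "0.01 0.049902\n"

# splitlines() breaks text at line boundaries, so a single line gives one element.
print(line.splitlines())
# split() with no argument breaks on runs of whitespace, giving two fields.
print(line.split())


def parse_pairs(lines):
    """Return [x, y] float pairs, skipping lines without exactly two fields."""
    data = []
    for text in lines:
        items = text.split()
        if len(items) == 2:
            data.append([float(e) for e in items])
    return data


sample = ["0 0.049976\n", "0.01 0.049902\n", "\n", "0.02 0.04978\n"]
data = parse_pairs(sample)

# Transpose the pairs into separate x and y sequences for plotting.
xs, ys = zip(*data)
print(xs)
print(ys)
```

`xs` and `ys` can then be passed to plt.plot(xs, ys), assuming matplotlib.pyplot has been imported as plt.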

Extract multiple arrays from .DAT file with undefined size

I have a device that stores three data sets in a .DAT file. They always have the same headings and number of columns, but the number of rows varies.
They are (n x 4), (m x 4), (L x 3).
I need to extract the three data sets into separate arrays for plotting.
I have been trying to use numpy.genfromtxt and numpy.loadtxt, but the only way I can get them to work for this format is to manually define the row at which each data set starts.
As I will regularly need to deal with this format, I have been trying to automate it.
If someone could suggest a method which might work I would greatly appreciate it. I have attached an example file.
Just a quick and dirty solution. At your file size, you might run into performance issues. If you know m, n and L, initialize the output vectors with the respective lengths.
Here is the strategy: load the whole file into a variable. Read the variable line by line. As soon as you discover a keyword, raise a flag that you are in the specific block. For the following lines, read each line into the correct variables.
isblock1 = isblock2 = isblock3 = False
fout = []  # construct also all the other variables that you want to collect
with open(filename, 'r') as infile:  # filename: path to the .DAT file
    lines = infile.readlines()  # read all the lines
for line in lines:
    # check for a heading first, so that a heading is never parsed as data
    if 'Frequency' in line:
        isblock1 = True
        isblock2 = isblock3 = False
        continue
    if 'Phasor' in line:
        isblock2 = True
        isblock1 = isblock3 = False
        continue
    if 'Voltage' in line:
        isblock3 = True
        isblock1 = isblock2 = False
        continue
    if not line.strip():
        continue  # skip blank lines
    if isblock1:
        (f, psd, ipj, itj) = line.split()
        fout.append(f)  # do this also with the other variables
    if isblock2:
        (t1, p1, p2, p12) = line.split()
    if isblock3:
        (t2, v1, v2) = line.split()
Hope that helps.
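
The same flag idea can be written more compactly by remembering the name of the current block instead of three booleans. This is only a sketch: the example file is not shown, so the sample text is invented, and it assumes each heading line contains exactly one of the keywords 'Frequency', 'Phasor' or 'Voltage'. The function name `split_blocks` is hypothetical.

```python
def split_blocks(lines, keywords=("Frequency", "Phasor", "Voltage")):
    """Group numeric rows under the most recent heading containing a keyword."""
    blocks = {key: [] for key in keywords}
    current = None
    for line in lines:
        heading = next((key for key in keywords if key in line), None)
        if heading is not None:
            current = heading  # a heading line switches the active block
            continue
        fields = line.split()
        if current is None or not fields:
            continue  # skip blank lines and anything before the first heading
        blocks[current].append([float(value) for value in fields])
    return blocks


# Invented sample in an assumed layout; the real file was not shown.
sample = """Frequency PSD Ipj Itj
1.0 0.5 0.1 0.2
2.0 0.4 0.1 0.3
Phasor t p1 p2 p12
0.0 1.0 2.0 3.0
Voltage t v1 v2
0.0 5.0 6.0
0.1 5.5 6.5
"""
blocks = split_blocks(sample.splitlines())
print(len(blocks["Frequency"]), len(blocks["Phasor"]), len(blocks["Voltage"]))
```

Each list of rows can be converted with numpy.array(blocks["Frequency"]) and so on, which gives the (n x 4), (m x 4) and (L x 3) arrays without specifying any row numbers by hand.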

Splitting data file columns into separate arrays in Python

I'm new to Python and have been trying to figure this out all day. I have a data file laid out as below:
time I(R_stkb)
Step Information: Temp=0 (Run: 1/11)
0.000000000000000e+000 0.000000e+000
9.999999960041972e-012 8.924141e-012
1.999999992008394e-011 9.623148e-012
3.999999984016789e-011 6.154220e-012
(Note: there are no empty lines between the data lines.)
I want to plot the data using matplotlib functions, so I'll need the two separate columns in arrays.
I currently have
def plotdata():
    Xvals=[], Yvals=[]
    i = open(file,'r')
    for line in i:
        Xvals,Yvals = line.split(' ', 1)
        print Xvals,Yvals
But obviously it's completely wrong. Can anyone give me a simple answer to this? An explanation of what exactly the lines mean would also be helpful. Cheers.
Edit: The first two lines repeat throughout the file.
This is a job for the * operator with the zip function.
>>> asdf
[[1, 2], [3, 4], [5, 6]]
>>> list(zip(*asdf))
[(1, 3, 5), (2, 4, 6)]
So in the context of your data it might be something like:
handle = open(file,'r')
lines = [[float(v) for v in line.split()] for line in handle if line[:4] not in ('time', 'Step')]
Xvals, Yvals = zip(*lines)
or, if you really need to be able to mutate the data afterwards, you could just call the list constructor on each tuple:
Xvals, Yvals = [list(block) for block in zip(*lines)]
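
A self-contained sketch of this approach, using the first few lines from the question as an in-memory list instead of a file. The tab separators in the sample are an assumption (split() with no argument handles spaces equally well).

```python
raw_lines = [
    "time\tI(R_stkb)\n",
    "Step Information: Temp=0  (Run: 1/11)\n",
    "0.000000000000000e+000\t0.000000e+000\n",
    "9.999999960041972e-012\t8.924141e-012\n",
    "1.999999992008394e-011\t9.623148e-012\n",
]

# Drop the two kinds of header line, then convert each field to float.
rows = [
    [float(value) for value in line.split()]
    for line in raw_lines
    if line[:4] not in ("time", "Step")
]

# zip(*rows) transposes rows into columns; list() makes each column mutable.
Xvals, Yvals = [list(column) for column in zip(*rows)]
print(Xvals)
print(Yvals)
```

The * unpacks `rows` so that zip receives each row as a separate argument, and zip then pairs up the first elements, the second elements, and so on.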
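One way to do it is:
Xvals=[]; Yvals=[]
i = open(file,'r')
for line in i:
    if line[:4] in ('time', 'Step'):
        continue  # skip the repeated header lines, which cannot be converted to float
    x, y = line.split(' ', 1)
    Xvals.append(float(x))
    Yvals.append(float(y))
print(Xvals, Yvals)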
Note the call to the float function, which will change the string you get from the file into a number.
This is what numpy.loadtxt is designed for. Try:
import numpy as np
import matplotlib.pyplot as plt
# assuming you have time and step information on 2 separate lines
# and you do not want to read them
data = np.loadtxt(file, skiprows=2)
plt.plot(data[:,0], data[:,1])
plt.show()
EDIT:
If the time and step information is scattered throughout the file and you want to plot the data for every step, one possibility is to read the whole file into memory (assuming it is small enough) and then split it on the 'time' strings:
l = open(fname, 'r').read()
for chunk in l.split('time')[1:]:  # [1:] skips the empty text before the first 'time'
    data = np.array([s.split() for s in chunk.split('\n')[2:]][:-1], dtype=float)
    plt.plot(data[:,0], data[:,1])
plt.show()
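
The same per-step grouping can also be sketched without NumPy by starting a new pair of lists at every 'Step' line. The function name `split_runs` is hypothetical, and the sample values and the second header are invented to show two runs.

```python
def split_runs(lines):
    """Return one (xs, ys) pair per run; a 'Step' header line starts a new run."""
    runs = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("time"):
            continue  # ignore blank lines and the column-title line
        if stripped.startswith("Step"):
            runs.append(([], []))  # begin a new run
            continue
        if not runs:
            continue  # data before any 'Step' header is ignored
        x, y = stripped.split()
        runs[-1][0].append(float(x))
        runs[-1][1].append(float(y))
    return runs


# Invented sample with two runs, following the layout shown in the question.
sample = [
    "time I(R_stkb)\n",
    "Step Information: Temp=0 (Run: 1/11)\n",
    "0.0 0.0\n",
    "1e-11 9e-12\n",
    "time I(R_stkb)\n",
    "Step Information: Temp=10 (Run: 2/11)\n",
    "0.0 1.0\n",
]
runs = split_runs(sample)
print(len(runs))
```

Each `(xs, ys)` pair can then be drawn with plt.plot(xs, ys) inside a loop over `runs`, followed by a single plt.show().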
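Alternatively, you could add the # comment sign to the header lines and use np.loadtxt.
If you want to plot this file with matplotlib, you might want to check out its plotfile function (see the official documentation). Note that plotfile was deprecated in Matplotlib 3.1 and removed in later releases, so this applies only to older versions.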
