Plotting in Python via matplotlib.pyplot (calculate the area) - python

I have a question. I need to plot a graph from numbers that I read from a file (which I have done), then draw a straight line connecting the first and last points, and calculate the area between that line and the curve. I have tried many variations, but I have no idea how to do it.
I'm trying to do it with the matplotlib.pyplot library.
Here is the figure I get after adding the line connecting the beginning and the end; now I need to calculate the area between the black line and the blue one.
PS: the black one is roughly straight :)
Here is the source code and my data file:
http://pastebin.com/g40bAzPR
#!/path/to/python -tt
# numerical data
# python GraphicalPart.py ../dataFile.txt
import sys
import matplotlib.pyplot as plt
import numpy as np
def startDivide(fileName):
    for i in range(1,2):
        inputFile = open(fileName)
        outputFile = open(fileName + "_" + str(i) + "_out.csv", "w")
        floatList = []
        for line in inputFile.readlines():
            data = line.split(" ")
            string = data[i]
            if string.startswith('-'): #remove '-'
                string = string[1:]
            floatList.append(float(string))
        floatList.sort() #sorting the list of data
        for item in floatList:
            outputFile.write("%s\n" % item)
        outputFile.close()
        inputFile.close()
        data1=np.genfromtxt(fileName + "_" + str(i) + '_out.csv', skip_header=1)
        plt.plot(data1)
        plt.savefig(fileName + "_" + str(i) + "_.png")
        plt.clf()
def main():
    if len(sys.argv) != 2:
        print("Not enough arguments. *_data.txt file only!")
    else:
        startDivide(sys.argv[1])
if __name__ == "__main__":
    main()

for i in range(1,2) is a loop that iterates only once. Maybe you plan to increase the number of iterations? If so, bear in mind that it is quicker to load the data once rather than on every pass through the for-loop. You can do that using np.genfromtxt with the usecols parameter to specify the desired columns.
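As a minimal sketch of the single-load idea: the inline sample text and the column indices below are made up for illustration, since the real file layout is not shown here.

```python
import io
import numpy as np

# Hypothetical sample standing in for the data file:
# a text label followed by two numeric columns.
sample = io.StringIO("a 1.0 -2.0\nb -3.0 4.0\nc 2.0 -1.0\n")

# Read columns 1 and 2 in a single pass; column 0 is skipped entirely.
columns = np.genfromtxt(sample, usecols=(1, 2))

# Same preprocessing as in the question: drop the sign, then sort each column.
columns = np.sort(np.abs(columns), axis=0)

# Loop over the already-loaded columns instead of re-reading the file.
for i in range(columns.shape[1]):
    print(i + 1, columns[:, i])
```

Each column can then be plotted or saved inside the loop without opening the file again.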
To find the area under a curve, you could use np.trapz.
To find the area between two curves, subtract the area under the lower curve from the area under the upper curve. Assuming the diagonal line is always above the data curve:
import sys
import matplotlib.pyplot as plt
import numpy as np
def startDivide(filename):
    data = np.genfromtxt(filename, dtype=None, usecols=[1])
    data = np.abs(data)
    data.sort()
    np.savetxt("{}_1_out.csv".format(filename), data)
    plt.plot(data)
    plt.plot([0,len(data)-1], [data[0], data[-1]])
    plt.savefig("{}_1_.png".format(filename))
    area = np.trapz([data[0], data[-1]], dx=len(data)-1) - np.trapz(data)
    print(area)
if __name__ == "__main__":
    startDivide(sys.argv[1])
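The same calculation can be checked on a tiny array. This is only a sketch: area_between_chord_and_curve is a hypothetical helper name, and the sample values are invented. NumPy 2.0 renamed np.trapz to np.trapezoid, so the sketch picks whichever one is available.

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever exists.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

def area_between_chord_and_curve(values):
    """Signed area between the straight start-to-end line and the sorted data."""
    y = np.sort(np.abs(np.asarray(values, dtype=float)))
    x = np.arange(len(y))
    # The straight line joining the first and last points, sampled at every x.
    chord = np.linspace(y[0], y[-1], len(y))
    return trapezoid(chord - y, x)

area = area_between_chord_and_curve([9, -4, 0, 1])
print(area)  # 4.0
```

The result is a signed area: it is positive while the line stays above the curve. If the curve can cross the line, integrating np.abs(chord - y) instead gives the total enclosed area.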

Related

python ignore empty files

We prepared the following Python script (Python 2.7) to make histograms.
histogram.py
#!/usr/bin/env python
import sys
import numpy as np
import matplotlib as mpl
import matplotlib.mlab as mlab
mpl.use('Agg')
import matplotlib.pyplot as plt
sys.argv[1] # Define input name
sys.argv[2] # Define output name
sys.argv[3] # Define title
# Open the file name called "input_file"
input_file=sys.argv[1]
inp = open (input_file,"r")
lines = inp.readlines()
if len(lines) >= 20:
    x = []
    #numpoints = []
    for line in lines:
        # if int(line) > -10000: # Activate this line if you would like to filter any date (filter out values smaller than -10000 here)
        x.append(float(line))
    # the histogram of the data
    n, bins, patches = plt.hist(x, 50, normed=False, facecolor='gray')
    plt.xlabel('Differences')
    numpoints = len(lines)
    plt.ylabel('Frequency ( n =' + str(numpoints) + ' ) ' )
    title=sys.argv[3]
    plt.title(title)
    plt.grid(True)
    save_file=sys.argv[2]
    plt.savefig(save_file+".png")
    plt.clf()
inp.close()
example: input
1
2
3
The script is run as follows:
python histogram.py input ${output_file_name}.png ${title_name}
We added the line "if len(lines) >= 20:" so that no plot is made when there are fewer than 20 data points.
However, if the file is empty, the Python script freezes.
We added a bash line to remove any empty files before running "python histogram.py input ${output_file_name}.png ${title_name}":
find . -size 0 -delete
For some reason, this line always works in small-scale tests but not in real production runs inside several loops. So we would love to make histogram.py ignore empty files if possible.
Searching only turned up this link, which doesn't seem very helpful :(
Ignoring empty files from coverage report
Could anyone kindly offer some comments? Thanks!
Check whether input_file is empty with os.path.getsize(input_file) > 0.
os.path.getsize
You will need the correct path to the file, which I presume you have. The call raises an error if the file does not exist or is inaccessible, so you may want to handle those cases.
This code works, ignoring empty files:
#!/usr/bin/env python
import sys
import numpy as np
import matplotlib as mpl
import matplotlib.mlab as mlab
import os
mpl.use('Agg')
import matplotlib.pyplot as plt
sys.argv[1] # Define input name
sys.argv[2] # Define output name
sys.argv[3] # Define title
input_file=sys.argv[1]
# Open the file name called "input_file"
if os.path.getsize(input_file) > 0:
    inp = open (input_file,"r")
    lines = inp.readlines()
    if len(lines) >= 20:
        x = []
        #numpoints = []
        for line in lines:
            # if int(line) > -10000: # Activate this line if you would like to filter any date (filter out values smaller than -10000 here)
            x.append(float(line))
        # the histogram of the data
        n, bins, patches = plt.hist(x, 50, normed=False, facecolor='gray')
        plt.xlabel('Differences')
        numpoints = len(lines)
        plt.ylabel('Frequency ( n =' + str(numpoints) + ' ) ' )
        title=sys.argv[3]
        plt.title(title)
        plt.grid(True)
        save_file=sys.argv[2]
        plt.savefig(save_file+".png")
        plt.clf()
    inp.close()
else:
    print("Empty file")
~$ python test.py empty.txt foo bar
Empty file
Check that the file exists and is not empty beforehand.
import os
def nonempty_file(filepath):
    # True only if filepath is an existing regular file with at least one byte
    return os.path.isfile(filepath) and os.path.getsize(filepath) > 0
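The size check and the 20-line threshold from the question can be folded into one guard. This is a sketch assuming Python 3; has_enough_data is a hypothetical helper name, and counting only non-blank lines is an assumption about what should count as a data point.

```python
import os

def has_enough_data(filepath, min_lines=1):
    """Return True if filepath is readable, non-empty, and has enough lines."""
    try:
        # A zero-byte file can be rejected without opening it.
        if os.path.getsize(filepath) == 0:
            return False
        with open(filepath) as handle:
            # Count only lines that contain something other than whitespace.
            return sum(1 for line in handle if line.strip()) >= min_lines
    except OSError:
        # Missing or inaccessible files are treated like empty ones.
        return False
```

A script such as histogram.py could then start with a call like has_enough_data(input_file, min_lines=20) and exit early when it returns False.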

Plotting Repeating Data Set from File using matplotlib and lists

This is my first post here, so I hope it goes well.
I have a file of data (about 2 MB) in the format
angle (space) energy (space) counts
angle (space) energy (space) counts
angle (space) energy (space) counts, etc.
(This is data recorded from a particle accelerator running for ~170 hours, so the file is large.)
The angle starts at 0 and stays at 0 while the energy goes up to about 4500; then the angle increases by one and the energy starts again at 0 and goes up to 4500. This repeats until theta = 255.
I am trying to create a program that plots the number of counts versus the energy level, with the energy level on the x axis and the counts on the y axis. I have tried many solutions, but to no avail.
Any help would be much appreciated.
My code is posted below.
import matplotlib.pyplot as plt
import numpy as np
import pylab
from numpy import *
from matplotlib.pyplot import *
import math
import sys
import scipy.optimize
"""
Usage
---------------
Takes a file in the format of
Theta |Rel_MeV |Counts
97 4024 0
97 4025 0
97 4026 6
97 4027 2
and graphs it
fileURL is the input for the file to put into the program
txt_Title is the graph label
"""
DEBUG = 1
fileURL = './ne19_peaks_all.dat'
txt_Title = 'Oxygen and Alpha Particle Relative Energy'
MeV_divide_factor = 100
ptSize = 5
MarkerType = '+'
MeV_max = 5000
def main():
    # Read the file.
    f2 = open(fileURL, 'r')
    # read the whole file into a single variable, which is a list of every row of the file.
    lines = f2.readlines()
    f2.close()
    # initialize some variable to be lists:
    list_MeV = []
    list_counts = []
    for i in range(MeV_max):
        list_MeV.append(i)
        list_counts.append(0)
    # scan the rows of the file stored in lines, and put the values into some variables:
    for line in lines:
        p = line.split()
        MeV = float(p[1])/MeV_divide_factor
        count = float(p[2])
        list_counts[int(MeV)] += count
    x_arr = np.array(list_MeV)
    y_arr = np.array(list_counts)
    plt.plot(x_arr, y_arr, MarkerType)
    plt.title(txt_Title)
    plt.show()
    return 0
def func(x, a, b):
    return a*x + b
if __name__ == '__main__':
    status = main()
    sys.exit(status)
I used a dictionary in which each energy level was a key, with the counts as the values.
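The dictionary approach might look like the sketch below. The original solution code was not posted, so sum_counts_by_energy is a hypothetical helper, and two choices in it are assumptions: counts for the same energy are summed across all angles, and lines that do not parse as numbers (such as a header) are skipped.

```python
from collections import defaultdict

def sum_counts_by_energy(lines):
    """Accumulate counts per energy level across all angles."""
    totals = defaultdict(float)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            energy = float(parts[1])
            count = float(parts[2])
        except ValueError:
            continue  # skip a header such as "Theta |Rel_MeV |Counts"
        totals[energy] += count
    energies = sorted(totals)
    return energies, [totals[e] for e in energies]

# Invented sample rows in the "angle energy counts" layout.
sample = ["Theta |Rel_MeV |Counts", "0 1 2", "0 2 0", "1 1 3", "1 2 5"]
energies, counts = sum_counts_by_energy(sample)
print(energies, counts)  # [1.0, 2.0] [5.0, 5.0]
```

The two returned lists can be passed straight to plt.plot(energies, counts, '+').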

Invalid Literal for int() with base 10: "

I have a string of data from an accelerometer (x, y, z), which looks like "XXX XXX XXX" in the text file, and I am attempting to read it and convert it to a line graph with three subplots of data. I'm adapting some code from a friend to do this, but I'm not sure where some of these errors are coming from. Obviously I'm a beginner programmer. Help is much appreciated.
Error: invalid literal for int() with base 10
import os
import numpy as npy
import matplotlib.pyplot as plt
global y0,y1,y2
increment_size = 8000
datasample_size = 16000
from os.path import join
filepath = "C:\\Users\\Riley\\Documents\\Programming\\"
infile = join(filepath, 'data.txt')
infile = open(infile,"r")
singleline = infile.readline()
asciidata = singleline.split()
asciidata[0]=asciidata[0][3:]
y0=[int(asciidata[0])]
y1=[int(asciidata[1])]
y2=[int(asciidata[2])]
count = 0
for singleline in infile:
    count += 1
    if (count % 10000) == 0:
        print(count)
    asciidata = singleline.split()
    y0.append(int(asciidata[0]))
    y1.append(int(asciidata[1]))
    y2.append(int(asciidata[2]))
infile.close()
totaldata=count-1
print(totaldata)
low = 0
high = datasample_size
while low < totaldata:
    t = npy.arange(low,high)
    plt.subplot(311)
    plt.ylim(-2000,2000)
    plt.plot(t,y0[low:high])
    plt.subplot(312)
    plt.ylim(-2000,2000)
    plt.plot(t,y1[low:high])
    plt.subplot(313)
    plt.ylim(-2000,2000)
    plt.plot(t,y2[low:high])
    # NOTE: shortfilename is never defined in this snippet
    outfilename = filepath + 'Plots/' + shortfilename + '_' + str(low) + '.png'
    plt.savefig(outfilename)
    outfilename2 = filepath + 'Datasegments/' + shortfilename + '_' + str(low) + '.txt'
    outfile = open(outfilename2,"w")
    for j in range(low,high):
        outfile.write(str(y0[j])+'\t'+str(y1[j])+'\t'+str(y2[j])+'\n')
    # print(low),
    plt.show()
    low = low + increment_size
    high = high + increment_size
    if high > totaldata:
        high = totaldata
    # if low > 10000:
    # break
    # plt.close()
You may be passing int() a string that actually holds a float.
If you need to handle empty values, try int(s or 0).
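Both suggestions can be combined in one small converter. This is a sketch only: to_int is a hypothetical helper, and truncating float-like text toward zero is an assumption about the desired behaviour.

```python
def to_int(text, default=0):
    """Convert text to int, tolerating blanks and float-like strings."""
    text = text.strip()
    if not text:
        return default  # same idea as int(s or 0)
    try:
        return int(text)
    except ValueError:
        # "3.7" is not a valid int literal, so go through float first.
        # Anything non-numeric still raises ValueError here.
        return int(float(text))

print(to_int("12"), to_int(""), to_int("3.7"))  # 12 0 3
```

Printing repr(asciidata) just before the failing int() call is a quick way to see exactly which token is being rejected.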
There's a numpy function that does almost all of this for you. It's hard for me to test it without knowing the format of your data file (it would help if you pasted in the first few lines of 'data.txt')
from os import path
import numpy as npy
import matplotlib.pyplot as plt
increment_size = 8000
datasample_size = 16000
filepath = "C:\\Users\\Riley\\Documents\\Programming\\"
infile = path.join(filepath, 'data.txt')
# This line replaces all the file reading lines:
y0, y1, y2 = npy.genfromtxt(infile, unpack=True)
totaldata = len(y0)
print(totaldata)
low = 0
high = datasample_size
while low < totaldata:
    ...
Possibly the plotting could be done more simply too, but I'm not sure I understand why you are plotting it section by section.
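One way to simplify the section-by-section loop, sketched under assumptions: the inline sample stands in for data.txt (whose real format is unknown), and windows is a hypothetical helper, not a NumPy function.

```python
import io
import numpy as np

# Invented three-column sample in place of the real data.txt.
sample = io.StringIO("10 -3 250\n12 -1 248\n11 0 251\n")
y0, y1, y2 = np.genfromtxt(sample, unpack=True)

def windows(total, size, step):
    """Return (low, high) index pairs of overlapping sections, clipped to total."""
    return [(low, min(low + size, total)) for low in range(0, total, step)]

# With the question's settings, 20000 samples give three sections.
sections = windows(20000, 16000, 8000)
print(sections)  # [(0, 16000), (8000, 20000), (16000, 20000)]
```

Each (low, high) pair can then drive one figure, plotting np.arange(low, high) against y0[low:high], y1[low:high] and y2[low:high].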

Accelerometer Data write to file then graph Matplotlib (3 subplots [x, y, z])

I'm not very versed in programming, so bear with me; this is a hobby programming project (I'm a physics major). I'm trying to receive serial data from an Arduino Uno using an ADXL345 Breakout Triple-Axis Accelerometer and then graph it using matplotlib. I don't need it to be dynamic (live feed) at the moment. Here's my code for writing the serial data to a file, which works well.
import serial
filepath = 'C:/Users/Josh/Documents/Programming/'
outfilename =filepath + 'data.txt'
outfile = open(outfilename,"w")
numpoints = 1000
ser = serial.Serial('COM4',9600)
for i in range(numpoints):
    inString=ser.readline()
    print(inString)
    outfile.write(inString)
ser.close()
outfile.close()
This produced a fairly accessible text file that I want to convert to a matplotlib graph containing three subplots, one for each axis (x, y, z). I'm getting an IOError (errno 2) from Python saying that it can't find the file (it doesn't exist), but the file does exist and the path is correct to my limited knowledge. Any help at all is much appreciated. This is the relevant part of my poorly made attempt:
import numpy as npy
import matplotlib.pyplot as plt
global y0,y1,y2
increment_size = 8000
datasample_size = 16000
filepath = ("C:\Users\Josh\Documents\Programming\data.txt")
infile = filepath + 'data.txt'
infile = open("data.txt","r")
singleline = infile.readline()
asciidata = singleline.split()
asciidata[0]=asciidata[0][3:] #strip three bytes of extraneous info
y0=[int(asciidata[0])]
y1=[int(asciidata[1])]
y2=[int(asciidata[2])]
Your filepath is the full file path, not the directory, and you are then adding 'data.txt' to it. You need to change your code to:
filepath = 'C:\\Users\\Josh\\Documents\\Programming\\'
infile = filepath + 'data.txt'
infile = open(infile,"r")
In Python, '\' is used for escaping characters, so to get an actual '\' you must use '\\'.
Alternatively you can (and generally should) use os.path.join to join together directories and files. In that case your code becomes:
from os.path import join
filepath = 'C:\\Users\\Josh\\Documents\\Programming'
infile = join(filepath, 'data.txt')
infile = open(infile,"r")
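A raw string is another way to avoid doubling every backslash. The sketch below uses ntpath, the Windows flavour of os.path that the standard library ships on every platform, so the result is the same wherever it runs; on Windows itself, os.path.join behaves the same way.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

# A raw string keeps each backslash literally; no doubling needed.
directory = r"C:\Users\Josh\Documents\Programming"

# join inserts the separator, so the directory needs no trailing backslash.
infile = ntpath.join(directory, "data.txt")
print(infile)  # C:\Users\Josh\Documents\Programming\data.txt
```

Note that a raw string cannot end in a single backslash, which is one more reason to let join add the separator.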
If you are interested in plotting realtime readings from the ADXL345, here is my code.
I used pyqtgraph for faster drawing.
from pyqtgraph.Qt import QtGui, QtCore
import numpy as np
import pyqtgraph as pg
import serial
app = QtGui.QApplication([])
xdata = [0]
ydata = [0]
zdata = [0]
# set up a plot window
graph = pg.plot()
graph.setWindowTitle("ADXL345 realtime data")
graph.setInteractive(True)
xcurve = graph.plot(pen=(255,0,0), name="X axis")
ycurve = graph.plot(pen=(0,255,0), name="Y axis")
zcurve = graph.plot(pen=(0,0,255), name="Z axis")
# open serial port
ser = serial.Serial("COM4", 115200, timeout=1)
def update():
    global xcurve, ycurve, zcurve, xdata, ydata, zdata
    # serial read
    dataRead = ser.readline().split()
    # append to data list
    xdata.append(float(dataRead[0]))
    ydata.append(float(dataRead[1]))
    zdata.append(float(dataRead[2]))
    # plot
    xcurve.setData(xdata)
    ycurve.setData(ydata)
    zcurve.setData(zdata)
    app.processEvents()
# Qt timer
timer = QtCore.QTimer()
timer.timeout.connect(update)
timer.start(0)
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

matplotlib: Have axis maintaining ratio

I am new to matplotlib, and I have a very simple (I'm guessing) question.
I have some data that needs to be represented in a rectangle of 50x70 "units" (they're feet, actually, representing a room), but I can't seem to get matplotlib to draw a rectangle with the same scale on both axes while keeping the 50x70 dimensions.
I've tried the following:
import json
import matplotlib
import numpy
import os
import sys
import traceback
import matplotlib.pyplot as plt
DATA_FILE = os.path.join(os.path.expanduser("~"), "results.json")
FLOOR_DIMENSIONS = (50, 70)
if __name__ == "__main__":
    if len(sys.argv) > 1:
        # the first command-line argument, not the script name in sys.argv[0]
        DATA_FILE = os.path.abspath(sys.argv[1])
    print("Gonna see what happens with file %s" % DATA_FILE)
    try:
        with open(DATA_FILE, 'r') as f:
            result_dict = json.load(f)
    except (IOError, OSError, ValueError) as e:
        print("Received %s %s when trying to parse json from %s\n"
              "Showing traceback: %s" % (type(e), e, DATA_FILE, traceback.format_exc()))
        result_dict = {}
    for d_mac in result_dict:
        data = result_dict[d_mac]
        if len(data) < 3:
            continue
        x_s = list(d['x'] for d in data)
        y_s = list(d['y'] for d in data)
        plt.scatter(x_s, y_s, marker='o', c=numpy.random.rand(5,1), s=15)
    plt.xlim([0, FLOOR_DIMENSIONS[0]])
    plt.ylim([0, FLOOR_DIMENSIONS[1]])
    #plt.axis('equal')
    plt.show()
    sys.exit(0)
Doing that, I get a plot that draws my data inside a square, changing the X-Y scale (X spans 50 units and Y spans 70, so Y appears shrunk).
Another option I tried was uncommenting the line plt.axis('equal'), but that cuts the Y axis: it no longer runs from 0 to 70 but from 15 to 55, probably because there is no data with y < 15 or y > 55.
I don't want that either. I want the "canvas" to start at Y=0 and end at Y=70, and if there's no data, just show empty space.
What I need is something like the plot I got by manually resizing the window where the plot was rendered :-D
Thank you in advance!
Add plt.axis('scaled').
edit: axis('image') may be better for your needs.
More axis settings can be found in the documentation.
import matplotlib.pyplot as plt
import numpy as np
xs = np.arange(50)
ys = (np.random.random(50)*70) + 15
plt.scatter(xs,ys)
plt.axis('image')
plt.axis([0, 50, 0, 70])
plt.show()
gives:
In the updated example, I know the ys actually have a maximum of ~85; the offset was just to demonstrate that the axis limits are enforced.
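An alternative sketch using the object-oriented interface: set the limits explicitly, then ask for an equal aspect with adjustable="box", which resizes the axes box instead of changing the limits. The random points are invented, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

FLOOR_DIMENSIONS = (50, 70)

# Invented points that leave the top and bottom of the room empty.
rng = np.random.default_rng(0)
xs = rng.uniform(0, FLOOR_DIMENSIONS[0], 40)
ys = rng.uniform(15, 55, 40)

fig, ax = plt.subplots()
ax.scatter(xs, ys, s=15)
ax.set_xlim(0, FLOOR_DIMENSIONS[0])
ax.set_ylim(0, FLOOR_DIMENSIONS[1])
# One data unit takes the same screen length on both axes;
# the axes box shrinks to fit, so the limits above are kept.
ax.set_aspect("equal", adjustable="box")
fig.canvas.draw()
```

With this setup, the empty regions below y = 15 and above y = 55 stay visible, because the limits are fixed before the aspect is applied.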
