graphing a line of data from a text file - python

This is my first time creating a graph in Python. I have a text file holding data of "weekly gas averages". There are 52 of them (it's a year's worth of data). I understand how to read the data and make it into a list, I think, and I can do the basics of making a graph if I make the points myself. But I don't know how to connect the two: that is, turn the data in the file into my X axis and then make my own Y axis (1-52). My code is a bunch of thoughts I've slowly put together. Any help or direction would be amazing.
import matplotlib.pyplot as plt

def main():
    print("Welcome to my program. This program will read data off a file"\
          +" called 1994_Weekly_Gas_Averages.txt. It will plot the"\
          +" data on a line graph.")
    print()
    gasFile = open("1994_Weekly_Gas_Averages.txt", 'r')
    gasList = []
    gasAveragesPerWeek = gasFile.readline()
    while gasAveragesPerWeek != "":
        gasAveragePerWeek = float(gasAveragesPerWeek)
        gasList.append(gasAveragesPerWeek)
        gasAveragesPerWeek = gasFile.readline()
    index = 0
    while index < len(gasList):
        gasList[index] = gasList[index].rstrip('\n')
        index += 1
    print(gasList)
    #create x and y coordinates with data
    x_coords = [gasList]
    y_coords = [1,53]
    #build line graph
    plt.plot(x_coords, y_coords)
    #add title
    plt.title('1994 Weekly Gas Averages')
    #add labels
    plt.xlabel('Gas Averages')
    plt.ylabel('Week')
    #display graph
    plt.show()

main()

Two errors I can spot while reading your code:
The object gasList is already a list, so when you write x_coords = [gasList] you're creating a list of lists, which will not work.
The line y_coords = [1,53] creates a list with only two values: 1 and 53. When you plot, you need as many y-values as there are x-values, so that list should hold 52 values. You don't have to write them all by hand; you can use the function range(start, stop) to do that for you.
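For example, a quick sketch (assuming all 52 weekly averages were read):
y_coords = list(range(1, 53))  # the integers 1 through 52, one y-value per weekly average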
That being said, you will probably gain a lot by using the functions that have already been written for you. For instance, with the numpy module (import numpy as np) you can use np.loadtxt() to read the contents of the file and create an array in one line. It's going to be much faster, and less error-prone than trying to parse the file yourself.
The final code:
import matplotlib.pyplot as plt
import numpy as np

def main():
    print(
        "Welcome to my program. This program will read data off a file called 1994_Weekly_Gas_Averages.txt. It will "
        "plot the data on a line graph.")
    print()
    gasFile = "1994_Weekly_Gas_Averages.txt"
    gasList = np.loadtxt(gasFile)
    y_coords = range(1, len(gasList) + 1)  # better not to hardcode the length of y_coords,
                                           # in case there are fewer values than expected
    # build line graph
    plt.plot(gasList, y_coords)
    # add title
    plt.title('1994 Weekly Gas Averages')
    # add labels
    plt.xlabel('Gas Averages')
    plt.ylabel('Week')
    # display graph
    plt.show()

if __name__ == "__main__":
    main()

Related

How do I make a PyQtGraph scrolling graph clear the previous line within a loop

I wish to plot some data from an array with multiple columns, and would like each column to be a different line on the same scrolling graph. As there are many columns, I think it would make sense to plot them within a loop. I'd also like to plot a second scrolling graph with a single line.
I can get the single line graph to scroll correctly, but the graph containing the multiple lines over-plots from the updated array without clearing the previous lines.
How do I get the lines to clear within the for loop? I thought that setData might do the clearing. Do I have to have a pg.QtGui.QApplication.processEvents() or something similar within the loop? I tried adding that call but it had no effect.
My code:
#Based on example from PyQtGraph documentation
import numpy as np
import pyqtgraph as pg

win = pg.GraphicsLayoutWidget(show=True)
win.setWindowTitle('pyqtgraph example: Scrolling Plots')

timer = pg.QtCore.QTimer()

plot_1 = win.addPlot()
plot_2 = win.addPlot()

data1 = np.random.normal(size=(300))
curve1 = plot_1.plot(data1)

data_2d = np.random.normal(size=(3,300))

def update_plot():
    global data1, data_2d
    data1[:-1] = data1[1:]
    data1[-1] = np.random.normal()
    curve1.setData(data1)
    for idx, n in enumerate(data_2d):
        n[:-1] = n[1:]
        n[-1] = np.random.normal()
        curve2 = plot_2.plot(n, pen=(idx))
        curve2.setData(n)
    #pg.QtGui.QApplication.processEvents() #Does nothing

timer = pg.QtCore.QTimer()
timer.timeout.connect(update_plot)
timer.start(50)

if __name__ == '__main__':
    pg.exec()
You could clear the plot of all curves each time with .clear(), but that wouldn't be very performant. A better solution would be to keep all the curve objects around and call setData on them each time, like you're doing with the single-curve plot. E.g.
curves_2d = [plot_2.plot(pen=idx) for idx, n in enumerate(data_2d)]
# ... in update_plot
curves_2d[idx].setData(n)
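Spliced into the question's code, a minimal sketch of the revised update_plot (a reconstruction, assuming curve1, data1 and data_2d are defined as in the question):

# Create the multi-line curves once, outside update_plot
curves_2d = [plot_2.plot(pen=idx) for idx in range(len(data_2d))]

def update_plot():
    global data1, data_2d
    data1[:-1] = data1[1:]
    data1[-1] = np.random.normal()
    curve1.setData(data1)
    for idx, n in enumerate(data_2d):
        n[:-1] = n[1:]
        n[-1] = np.random.normal()
        curves_2d[idx].setData(n)  # update the existing curve instead of re-plotting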

Plotting data at high frequency using matplotlib.animation

I am trying to write a script that will accomplish a few different things:
1) Read data from a text file that keeps being filled up by sensor data.
2) Filter this data.
3) Plot the unfiltered data.
At the moment, my script works fine for 1) and 2), but I can't get the plot to work. I think that this is complicated by the very high sampling rate I have on the sensor, 5376 Hz.
At the moment, my script polls the text file every 0.1 ms to see if a new line has been written to the document, with this code:
# Function that continually looks at thefile.
def follow(thefile):
    thefile.seek(0,2)  # Look at the last line in the file
    while True:
        line = thefile.readline()  # line is now equal to the last line in thefile
        if not line:  # If the last line in the file was empty, this will be true,
            time.sleep(0.0001)  # and the function will sleep for 0.1 ms before continuing
            continue
        yield line  # If the last line was not empty, it will be yielded
I then have my main script that analyses the data:
import time
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
import numpy as np
import os
import pylab as pl

# Opening text file in subdirectory
script_dir = os.path.dirname(__file__)  # <-- absolute dir the script is in
rel_path = "python_data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

# Vectors for storing unfiltered data in
xar = []
yar = []
zar = []

pl.ion()  # Interaction on
x = np.arange(0,10)  # x-array
pl.line, = plt.plot(x, np.sin(x))  # Initiate line

logfile = open(abs_file_path, "r")  # Open text file
loglines = follow(logfile)  # Look for changes in text file

for line in loglines:  # For each new line in text file
    try:
        x, y, z = line.split(' \t ')  # Split up x-, y- and z- data into different arrays.
        xar.append(int(x))
        yar.append(int(y))
        zar.append(int(z))
        line.set_ydata(int(x))  # Update the line
        pl.draw()  # Redraw the canvas
    except:
        pass

# I then perform the filtering of xar, yar and zar when they have reached a certain
# length, and reset them after filtering is done with xar = [], yar = [], zar = [].
The method I have implemented for plotting the data is the one specified here: https://scipy-cookbook.readthedocs.io/items/Matplotlib_Animations.html
However, it is not working: as soon as I execute my script a figure is created, but it is completely frozen, and nothing happens when I start collecting data in the text file. So I have a few questions:
1) Have I implemented the method correctly?
2) Am I calling line.set_ydata(int(x)) and pl.draw() too often, and is that why it crashes?
3) I don't really understand what happens when I create a line with pl.line, = plt.plot(x,np.sin(x)) and then update the y-element with line.set_ydata(int(x)). Does it matter what I write in the y-position of pl.line, = plt.plot(x,np.sin(x))?
4) Should I be using a completely different method if I want to update my plot at a frequency of 5376 Hz?

How does one plot a running average without importing external modules (other than matplotlib)?

Here is a link to the file with the information in 'sunspots.txt'. With the exception of the external modules matplotlib.pyplot and seaborn, how could one compute the running average without importing external modules like numpy and future? (If it helps, I can reproduce linspace and loadtxt without numpy.)
If it helps, my code thus far is posted below:
## open/read file
f2 = open("/Users/location/sublocation/sunspots.txt", 'r')

## extract data
lines = f2.readlines()

## close file
f2.close()

t = []  ## time
n = []  ## number

## col 1 == col[0] -- number identifying which month
## col 2 == col[1] -- number of sunspots observed
for col in lines:  ## 'col' can be replaced by 'line' iff change below is made
    new_data = col.split()  ## 'col' can be replaced by 'line' iff change above is made
    t.append(float(new_data[0]))
    n.append(float(new_data[1]))
## extract data ++ close file

## check ##
# print(t)
# print(n)
## check ##

## import
import matplotlib.pyplot as plt
import seaborn as sns

## plot
sns.set_style('ticks')
plt.figure(figsize=(12,6))
plt.plot(t, n, label='Number of sunspots observed monthly')
plt.xlabel('Time')
plt.ylabel('Number of Sunspots Observed')
plt.legend(loc='best')
plt.tight_layout()
plt.savefig("/Users/location/sublocation/filename.png", dpi=600)
The question comes from this university weblink (p. 11 of the PDF, p. 98 of the book, Exercise 3-1).
Before marking this as a duplicate:
A similar question was posted here. The difference is that all the posted answers require importing external modules like numpy and future, whereas I am trying to do this without external imports (with the exceptions above).
Noisy data that needs to be smoothed:
y = [1.0016, 0.95646, 1.03544, 1.04559, 1.0232,
1.06406, 1.05127, 0.93961, 1.02775, 0.96807,
1.00221, 1.07808, 1.03371, 1.05547, 1.04498,
1.03607, 1.01333, 0.943, 0.97663, 1.02639]
Try a running average with a window size of n:
n = 3
Each window can be represented by a slice:
window = y[i:i+n]
You need something to store the averages in:
averages = []
Iterate over n-length slices of the data; get the average of each slice; save the average in another list.
from __future__ import division  # For Python 2

for i in range(len(y) - n + 1):  # one average per complete window
    window = y[i:i+n]
    avg = sum(window) / n
    print(window, avg)
    averages.append(avg)
When you plot the averages you'll notice there are fewer averages than there are samples in the data.
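If it helps to see them together, here is a minimal plotting sketch (an addition, using only matplotlib as the question allows) that lines the averages up under the raw data by shifting each average to the centre of its window:

import matplotlib.pyplot as plt

plt.plot(range(len(y)), y, label='raw data')
plt.plot([i + n // 2 for i in range(len(averages))], averages,
         label='running average (n=%d)' % n)
plt.legend(loc='best')
plt.show()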
Maybe you could import an internal/built-in module and make use of this SO answer: https://stackoverflow.com/a/14884062/2823755
Searching for running average algorithm python turns up lots of hits.

Determining if data in a txt file obeys certain statistics

I'm working with a Geiger counter which can be hooked up to a computer and which records its output in the form of a .txt file, NC.txt, where it records the time since starting and the 'value' of the radiation it recorded. My code so far:
import pylab
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt

x1 = []
y1 = []

#Define a dictionary: counts
f = open("NC.txt", "r")
for line in f:
    line = line.strip()
    parts = line.split(",")  #the columns are separated by commas and spaces
    time = float(parts[1])   #time is recorded in the second column of NC.txt
    value = float(parts[2])  #and the value it records is in the third
    x1.append(time)
    y1.append(value)
f.close()

xv = np.array(x1)
yv = np.array(y1)

#Statistics
m = np.mean(yv)
d = np.std(yv)

#Strip out background radiation
trueval = yv - m

#Basic plot of counts
num_bins = 10000
plt.hist(trueval, num_bins)
plt.xlabel('Value')
plt.ylabel('Count')
plt.show()
This code so far will just create a simple histogram of the radiation counts centred at zero, so the background radiation is ignored.
What I want to do now is perform a chi-squared test to see how well the data fits, say, Poisson statistics (and then go on to compare it with other distributions later). I'm not really sure how to do that. I have access to scipy and numpy, so I feel like this should be a simple task, but I'm just learning Python as I go here, so I'm not a terrific programmer.
Does anyone know of a straightforward way to do this?
Edit for clarity: I'm not asking so much about if there is a chi-squared function or not. I'm more interested in how to compare it with other statistical distributions.
Thanks in advance.
You can use the SciPy library; see the scipy.stats documentation and examples.
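For instance, a minimal sketch (an addition, not from the original answer) of a chi-squared goodness-of-fit test of the counts against a Poisson model using scipy.stats.chisquare; it assumes the yv array from the question holds non-negative integer counts:

import numpy as np
from scipy import stats

mu = yv.mean()  # Poisson rate estimated from the data
values, observed = np.unique(yv.astype(int), return_counts=True)

# Expected frequencies under Poisson(mu), rescaled so the totals match,
# as stats.chisquare requires
expected = stats.poisson.pmf(values, mu) * yv.size
expected *= observed.sum() / expected.sum()

chi2, p = stats.chisquare(observed, expected, ddof=1)  # ddof=1: mu was estimated
print(chi2, p)

# Comparing against another distribution works the same way: compute its
# expected bin frequencies (e.g. from stats.norm) and call stats.chisquare again.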

Reading and manipulating multiple netcdf files in python

I need help with reading multiple netCDF files; despite the few examples on here, none of them works properly.
I am using Python(x,y) vers 2.7.5 and these other packages: netcdf4 1.0.7-4, matplotlib 1.3.1-4, numpy 1.8, pandas 0.12, basemap 1.0.2...
There are a few things I'm used to doing in GrADS that I need to start doing in Python.
I have some 2-meter temperature data (4-hourly data for each year, from ECMWF); each file contains 2-meter temp data with Xsize = 480, Ysize = 241, Zsize (level) = 1, and Tsize (time) = 1460, or 1464 for leap years.
This is what my file names look like: t2m.1981.nc, t2m.1982.nc, t2m.1983.nc ...etc.
Based on this page:
(Loop through netcdf files and run calculations - Python or R)
Here is where I am now:
from pylab import *
import netCDF4 as nc
from netCDF4 import *
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

f = nc.MFDataset('d:/data/ecmwf/t2m.????.nc')  # as '????' being the years
t2mtr = f.variables['t2m']
ntimes, ny, nx = shape(t2mtr)
temp2m = zeros((ny,nx), dtype=float64)
print ntimes
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:]  # I'm not sure how to slice this, just wanted to get the 00Z values.
    # is it possible to assign to a new array,...
    #... (for eg.) the average values of 00z for January only from 1981-2000?

#creating a NetCDF file
nco = nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc', 'w', clobber=True)
nco.createDimension('x', nx)
nco.createDimension('y', ny)

temp2m_v = nco.createVariable('t2m', 'i4', ('y', 'x'))
temp2m_v.units = 'Kelvin'
temp2m_v.long_name = '2 meter Temperature'
temp2m_v.grid_mapping = 'Lambert_Conformal'  # can it be something else or ...
#... eliminated? This is straight from the solution on that webpage.

lono = nco.createVariable('longitude','f8')
lato = nco.createVariable('latitude','f8')
xo = nco.createVariable('x','f4',('x'))  # not sure if this is important
yo = nco.createVariable('y','f4',('y'))  # not sure if this is important
lco = nco.createVariable('Lambert_Conformal','i4')  # not sure

#copy all the variable attributes from original file
for var in ['longitude','latitude']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var], att, getattr(f.variables[var], att))

# copy variable data for lon, lat, x and y
lono = f.variables['longitude'][:]
lato = f.variables['latitude'][:]
#xo[:] = f.variables['x']
#yo[:] = f.variables['y']

# write the temp at 2 m data
temp2m_v[:,:] = temp2m

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco, att, getattr(f, att))

nco.Conventions = 'CF-1.6'  # not sure what this is.
nco.close()

#attempt to plot the 00zJan mean
file = nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc', 'r')
t2mtr = file.variables['t2m'][:]
lon = file.variables['longitude'][:]
lat = file.variables['latitude'][:]

clevs = np.arange(0, 500., 10.)
map = Basemap(projection='cyl', llcrnrlat=0., urcrnrlat=10.,
              llcrnrlon=97., urcrnrlon=110., resolution='i')
x, y = map(*np.meshgrid(lon, lat))
cs = map.contourf(x, y, t2mtr, clevs, extend='both')
map.drawcoastlines()
map.drawcountries()
plt.plot(cs)
plt.show()
My first question is about temp2m += t2mtr[i,:,:]: I am not sure how to slice the data to get only the 00Z values (let's say for January only) of all files.
Second, while running the test, an error came up at cs = map.contourf(x,y,t2mtr,clevs,extend='both') saying "shape does not match that of z: found (1,1) instead of (241,480)". I know there is probably some error in the output data, due to an error in recording the values, but I can't figure out what or where.
Thanks for your time. I hope this is not confusing.
So t2mtr is a 3d array:
ntimes, ny, nx = shape(t2mtr)
This sums all values across the 1st axis:
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:]
A better way to do this is:
temp2m = np.sum(t2mtr, axis=0)
temp2m = t2mtr.sum(axis=0)  # alt
If you want the average, use np.mean instead of np.sum.
To average across a subset of the times, jan_times, use an expression like:
jan_avg = np.mean(t2mtr[jan_times,:,:], axis=0)
This is simplest if you want just a simple range, e.g. the first 31 times. For simplicity I'm assuming the data is daily and the years are a constant length; you can adjust things for the 4-hourly frequency and leap years.
t2mtr[0:31,:,:]
A simplistic way of getting Jan data for several years is to construct an index like:
yr_starts = np.arange(0,3)*365 # can adjust for leap years
jan_times = (yr_starts[:,None]+ np.arange(31)).flatten()
# array([ 0, 1, 2, ... 29, 30, 365, ..., 756, 757, 758, 759, 760])
Another option would be to reshape t2mtr (this doesn't work well for leap years):
t2mtr.reshape(nyrs, 365, ny, nx)[:,0:31,:,:].mean(axis=1)
You could test the time sampling with something like:
np.arange(5*365).reshape(5,365)[:,0:31].mean(axis=1)
Doesn't the data set have a time variable? You might be able to extract the desired time indices from that; a rough sketch follows. I worked with ECMWF data a number of years ago, but I don't remember a lot of the details.
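For example (a hedged sketch, assuming the files carry a CF-style time variable with a units attribute; untested against ECMWF files) using netCDF4's num2date to pick out the January 00Z steps:

from netCDF4 import num2date

times = f.variables['time']  # the MFDataset f from the question
dates = num2date(times[:], times.units)
jan00z = [i for i, d in enumerate(dates) if d.month == 1 and d.hour == 0]
jan_avg = t2mtr[jan00z,:,:].mean(axis=0)  # mean over the selected time steps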
As for your contourf error, I would check the shape of the 3 main arguments: x,y,t2mtr. They should match. I haven't worked with Basemap.
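For instance, a trivial check with the names from the question:

print(x.shape, y.shape, t2mtr.shape)  # all three should be (241, 480) here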
