I am new to Python and trying to learn as much as I can.
I am trying to create a live graph with Matplotlib by reading from a CSV file.
I am getting a TypeError: value, which I am guessing comes from the timestamp format.
From what I read in the pandas documentation, date_parser should take care of this, but I am unsure how to use it properly.
I would like to use the timestamp in the 2nd column of the CSV as the X axis, and then plot the rest of the data as Y values.
The CSV looks like this:
1,11:24:30,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:33,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:35,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:38,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:41,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:43,null,0,3,4,5,6,7,8,9,10,11,12
My code is below:
import random
from itertools import count
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
plt.style.use('fivethirtyeight')
x_vals = []
y_vals = []
index = count()
def animate(i):
    data = pd.read_csv('C:/Python/20220124.csv', names=["Pass", "Time", "1", "2"],
                       header=None, parse_dates=True)
    y1 = data['Pass']
    y2 = data['Time']
    y3 = data['1']
    y4 = data['2']
    plt.cla()
    plt.plot(y1, label='Pass/Fail', lw=3, c='c', marker='o', markersize=4, mfc='k')
    plt.plot(y2, label='Time', lw=3, c='c', marker='o', markersize=4, mfc='k')
    plt.plot(y3, label='1', lw=2, ls='--', c='k')
    plt.plot(y4, label='2', lw=2, ls='--', c='k')
    plt.legend(loc='upper left')
    plt.tight_layout()
    ax = plt.gca()
    xlim_low, xlim_high = ax.get_xlim()
    ax.set_xlim(xlim_low, xlim_high)
    y1offset = 1.0
    y1max = (y4.max() + y1offset)
    current_ymax = y1max
    y1min = (y3.min() - y1offset)
    current_ymin = y1min
    ax.set_ylim(current_ymin, current_ymax)
ani = FuncAnimation(plt.gcf(), animate, interval=1000)
plt.tight_layout()
plt.show()
Thanks for any help!
I should say up front that I am not an expert on pandas or matplotlib.
Looking at your code, I think the problem lies in how the columns of the CSV file are defined.
You pass read_csv a names list with 4 fields, but your CSV has many more columns (14 per line).
When I try your code with the extra data removed from the CSV, so that each line has only four fields, the plot is drawn.
As I said, I don't know these libraries well, but as I understand it, pandas read_csv loads the data into a data structure (the DataFrame, according to the docs).
Something probably goes wrong when the function reads more columns than it has names for and tries to parse them as timestamps, producing the TypeError.
But that is a big guess; I think it is better to go back to the pandas docs!
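One way to sidestep the column-count mismatch is sketched below. It reads every column, keeps four of them, and converts the time column explicitly. This is only a sketch: the choice of columns 3 and 4 for "1" and "2" is an assumption, and the inline sample stands in for the real file.

```python
import io
import pandas as pd

# Inline copy of a few lines from the question, standing in for the CSV file.
sample = """1,11:24:30,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:33,null,0,3,4,5,6,7,8,9,10,11,12
1,11:24:35,null,0,3,4,5,6,7,8,9,10,11,12
"""

# header=None gives integer column labels 0..13, one per CSV field.
raw = pd.read_csv(io.StringIO(sample), header=None)

# Assumption: fields 0, 1, 3 and 4 are the ones wanted as Pass, Time, 1, 2.
data = raw[[0, 1, 3, 4]].copy()
data.columns = ["Pass", "Time", "1", "2"]

# Parse the HH:MM:SS strings explicitly instead of relying on parse_dates.
data["Time"] = pd.to_datetime(data["Time"], format="%H:%M:%S")

print(data.dtypes)
```

With the time column parsed, it can be passed as the x argument, for example plt.plot(data["Time"], data["1"]).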
I just stumbled upon a problem I simply cannot solve. I have a dataset with raw data, which I have uploaded here: https://file.io/oJqkZjAGyqV1
It's an Excel file with the data inside.
I then wrote some code to open it, read it, and compute the mean and SEM of my data, as below.
# Import required packages
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pylab import cm
df = pd.read_excel("Chlorophyll_data_mod.xlsx")
#----Calculation of meanvalues and sem from raw_data---------
meandf2 = df.set_index(["Group"])
sets = []
for x in ["A","B","AB","xc"]:
    meandf3 = meandf2.filter(like=f"Chl_{x}_").reset_index()
    sets.append(meandf3)
#---------Grouping DataFrame----------#
means = []
ster = []
for x in range(len(sets)):
    meandf = sets[x].groupby(["Group"]).mean()
    meandf = meandf.reset_index()
    means.append(meandf)
    sems = sets[x].groupby("Group").sem()
    sems = sems.reset_index()
    ster.append(sems)
#----Selecting Dataframe from List-----#
plotdf = means[0]
ploter = ster[0]
plotgroup = plotdf.iloc[:,[0,]]
plotdata = plotdf.iloc[:,[1,]]
grouparray = plotgroup.to_numpy()
dataarray = plotdata.to_numpy()
#-----CreatePlot------#
fig, ax = plt.subplots(nrows=3, ncols=1, sharex="all", figsize=(10,8))
plotdf.plot(ax=ax[0,],x="Group",y="Chl_A_0D", kind="bar", legend=False, color="black")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_0D"],yerr=ploter["Chl_A_0D"])
plotdf.plot(ax=ax[1,],x="Group",y="Chl_A_10DaT", kind="bar", legend=False, color="blue")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_10DaT"],yerr=ploter["Chl_A_10DaT"])
plotdf.plot(ax=ax[2,],x="Group",y="Chl_A_7DaR", kind="bar", legend=False, color="magenta")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_7DaR"],yerr=ploter["Chl_A_7DaR"])
#----Legend of the Plot-----#
fig.legend(loc="lower center", bbox_to_anchor=(0.5,0), fancybox=True, ncol=6)
#----Layout------#
plt.tight_layout(rect=[0, 0.02, 1,1])
plt.show()
With this I manage to create a figure with subplots showing the three data columns I am interested in. However, I struggle with the error bars.
My approach was to calculate the SEM and store it in a new dataframe, and then read it from there for yerr. However, this doesn't work.
plotdf.plot(ax=ax[2,],x="Group",y="Chl_A_7DaR", kind="bar", legend=False, color="magenta", yerr=ploter["Chl_A_7DaR"])
This results in an array error because of the structure.
And my current approach, as in the main code above, only draws the error bars in the last subplot, not in each individual plot.
Maybe someone here could help me understand this function?
Best regards
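A likely reason the bars only appear in the last subplot is that plt.errorbar draws on the current axes, which after plt.subplots is the last one created. A minimal sketch of drawing on each axes object explicitly follows. The group labels and numbers are made-up placeholders (the real values are in the Excel file), and only two of the columns are shown.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder values only; "ctrl" and "treat" are hypothetical group names.
df = pd.DataFrame({
    "Group": ["ctrl", "ctrl", "ctrl", "treat", "treat", "treat"],
    "Chl_A_0D": [1.0, 1.2, 1.1, 0.9, 1.0, 1.1],
    "Chl_A_10DaT": [0.8, 0.9, 1.0, 0.5, 0.6, 0.4],
})

means = df.groupby("Group").mean()   # one row per group
sems = df.groupby("Group").sem()     # standard error of the mean per group

columns = ["Chl_A_0D", "Chl_A_10DaT"]
fig, axes = plt.subplots(nrows=len(columns), ncols=1, sharex="all")

# Call the plotting method on each Axes, not on plt, so that the
# bars and their error bars land in the intended subplot.
for ax, col in zip(axes, columns):
    ax.bar(means.index, means[col], yerr=sems[col], capsize=4, color="gray")
    ax.set_ylabel(col)

fig.tight_layout()
```

Calling plt.show() afterwards displays the figure. The same idea applies to the original code by replacing plt.errorbar(...) with ax[0].errorbar(...), ax[1].errorbar(...), and so on.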
I've got this script running to update every time a new value is added to my csv file (it's a manual log taking in values from a machine):
from itertools import count
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
plt.style.use('fivethirtyeight')
x_vals = []
y_vals = []
counter=0
index = count()
def animate(i):
    data = pd.read_csv('C:/Users/Owner/Downloads/Manual-Log-27Nov2020-150846.csv')
    data=data.dropna()
    data['Date And Time']=data['Date']+' '+data['Time']
    Date = data['Date And Time']
    Temp = data['Temperature']
    Pressure = data['Pressure']
    pH = data['pH']
    plt.cla()
    plt.plot(Date, Temp, label='Temperature')
    plt.plot(Date, Pressure, label='Pressure')
    plt.plot(Date, pH, label='pH')
    plt.legend(loc='upper left')
    plt.xlabel('Date and Time')
    plt.ylabel('Y Values')
    plt.xticks(rotation=90)
    plt.tight_layout()
ani = FuncAnimation(plt.gcf(), animate, interval=30000)
plt.tight_layout()
plt.show()
Since it's such a large file (20k lines or so) the resulting graph is huge and you can't really read the data properly. Is there a way I could only get it to show the 20 most recent readings?
You can slice it, or take the first or last n records. Apply something like the following after the line data=data.dropna() in your code:
df = pd.DataFrame({'Count':[2,33,4,6,8,9],'apha':['A','B','C','D','E','F']})
df_sorted = df.sort_values(by=['Count'], ascending=True) # in case needed
df_limited = df_sorted.head(3) # first 3 rows; use tail(n) instead for the last n rows
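Since the question asks for the most recent readings, and a log file normally appends new rows at the end, tail is probably the closer fit. A small sketch with a stand-in DataFrame (the 100 fake temperature rows are placeholders for the real log):

```python
import pandas as pd

# Stand-in for the real log: 100 rows, oldest first.
data = pd.DataFrame({"Temperature": range(100)})

# Keep only the 20 rows at the end of the frame, i.e. the newest readings.
recent = data.tail(20)

print(len(recent))
```

In the animate function this would amount to writing data = data.dropna().tail(20), assuming the file is ordered oldest to newest.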
I'm animating a scatter plot with the code below. It reads data from a .csv file "sample.csv" and animates it by using np.roll in the update function.
import numpy as np
import matplotlib.pyplot as plt
from numpy import genfromtxt
from matplotlib.animation import FuncAnimation
my_data = genfromtxt("sample.csv", delimiter=",", skip_header=1) # reading data
fig, ax = plt.subplots()
xdata, ydata = [], []
line, = ax.plot([], [], ".", markersize=14)
plt.grid()
def init():
    ax.set_xlim(0, 50)
    ax.set_ylim(0, 60)
    return line,
def update(frame):
    my_data[:,0] = np.roll(my_data[:,0],1) # moving graph
    gap_loc = [19,20,21] # location of a gap
    my_data[gap_loc, 1] = np.nan # creating a gap in graph
    xdata.append(my_data[:,0])
    ydata.append(my_data[:,1])
    line.set_data(xdata, ydata)
    return line,
ani = FuncAnimation(fig, update, frames=np.arange(0,50,1), init_func=init, blit=True)
plt.show()
The result looks like this:
As you can see, there is a gap which moves together with the remaining points. However, what I want is for the gap to stay stationary at locations 19, 20, 21 on the horizontal axis.
How can I achieve this effect?
Below is the dataset I'm using for this animation.
Day,Var 1,Var 2
1,2,12
2,4,19
3,6,20
4,8,25
5,10,25
6,12,33
7,14,40
8,16,47
9,18,49
10,20,50
11,22,52
12,24,55
13,26,65
14,28,82
15,30,100
16,32,100
17,34,110
18,36,117
19,38,140
20,40,145
21,42,164
22,44,170
23,46,198
24,48,200
25,50,210
26,48,210
27,46,211
28,44,216
29,42,267
30,40,317
31,38,325
32,36,335
33,34,337
34,32,347
35,30,356
36,28,402
37,26,410
38,24,448
39,22,449
40,20,457
41,18,463
42,16,494
43,14,500
44,12,501
45,10,502
46,8,514
47,6,551
48,4,551
49,2,558
50,0,628
Define the gap when you load the data, and do so in the x column rather than the y:
# imports
my_data = genfromtxt("sample.csv", delimiter=",", skip_header=1) # reading data
gap_loc = [19,20,21] # location of a gap
my_data[gap_loc, 0] = np.nan # creating a gap in graph
# plotting code
So now when you roll the x column, no point is ever drawn at the x values that were replaced by np.nan, regardless of what the y coordinate is. You can use print(my_data) within the update function to see what is going on at each iteration.
Here is the result:
Also, I think you are over-plotting because you continually expand xdata and ydata using append. I ended up just removing the xdata and ydata and doing:
def update(frame):
    my_data[:,0] = np.roll(my_data[:,0],1) # moving graph
    line.set_data(my_data[:,0], my_data[:,1])
    return line,
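To see why the gap stays put once the NaNs live in the x column, here is a tiny NumPy-only sketch. The ten-element array is a stand-in for the Day column, not the real data, and gap_loc holds indices as in the code above.

```python
import numpy as np

x = np.arange(1.0, 11.0)      # stand-in for the Day column: 1.0 .. 10.0
gap_loc = [3, 4]              # indices to blank out (values 4.0 and 5.0)
removed = x[gap_loc].copy()   # remember which x values disappear
x[gap_loc] = np.nan

# Rolling only permutes the array, so the removed x values never come back
# and nothing can be drawn at those horizontal positions.
for _ in range(25):
    x = np.roll(x, 1)
    assert not np.isin(removed, x).any()

print(np.sort(x[~np.isnan(x)]))
```

Note that gap_loc contains array indices, so with the question's data the indices 19, 20, 21 correspond to the Day values 20, 21, 22.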
I am plotting the error bars of a pandas DataFrame. The error bar has a weird arrow at the top, but what I want is a horizontal line. For example, a figure like this:
But now my error bar ends with an arrow instead of a horizontal line.
Here is the code I used to generate it:
plot = meansum.plot(
    kind="bar",
    yerr=stdsum,
    colormap="OrRd_r",
    edgecolor="black",
    grid=False,
    figsize=(8, 2),
    ax=ax,
    position=0.45,
    error_kw=dict(ecolor="black", elinewidth=0.5, lolims=True, marker="o"),
    width=0.8,
)
So what should I change to make the error bar look the way I want? Thanks.
Using plt.errorbar from matplotlib makes it easier as it returns several objects including the caplines which contain the marker you want to change (the arrow which is automatically used when lolims is set to True, see docs).
Using pandas, you just need to dig the correct line in the children of plot and change its marker:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5, lolims=True),width=0.8)
for ch in plot.get_children():
    if str(ch).startswith('Line2D'): # this is silly, but it appears that the first Line in the children are the caplines...
        ch.set_marker('_')
        ch.set_markersize(10) # to change its size
        break
plt.show()
The result looks like:
Just don't set lolims=True and you are good to go. An example with sample data:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5),width=0.8)
plt.show()
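Without lolims the arrow disappears, but matplotlib's default cap size is 0, so the bars may end without any horizontal line. One option is to add capsize to error_kw, sketched here on the same sample data; the value 4 is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 2))
df = pd.DataFrame({"val": [1, 2, 3, 4], "error": [.4, .3, .6, .9]})

# capsize (in points) sets the length of the horizontal caps on each error bar.
ax = df["val"].plot(
    kind="bar",
    yerr=df["error"],
    edgecolor="black",
    ax=ax,
    width=0.8,
    error_kw=dict(ecolor="black", elinewidth=0.5, capsize=4),
)
```

Calling plt.show() afterwards displays the figure.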
I have lists of ~10 corresponding input files containing columns of tab-separated data, approx. 300 lines/datapoints each.
I'm looking to plot the contents of each set of data such that I have 2 plots for each set: one is simply x vs (y1,y2,y3,...) and one is transformed by a function, e.g. x vs (f(y1), f(y2),f(y3),...).
I am not sure of the best way to achieve this. I thought about using a simple array of filenames, but then couldn't work out how to store all the data without overwriting it. Something like this:
import numpy as np
import matplotlib.pyplot as plt
def ReadDataFile(file):
    print (file)
    x,y = np.loadtxt(file, unpack=True, usecols=(8,9))
    return x, y
inputFiles = ['data1.txt','data2.txt','data2.txt',...]
for file in inputFiles:
    x1,y1 = ReadDataFile(file) ## ? ##
    p1,q1 = function(x1,y1) ## ? ##
plt.figure(1)
plt.plot(x1,y1)
plt.plot(x2,y2)
...
# plt.savefig(...)
plt.figure(2)
plt.plot(p1,q1)
plt.plot(p2,q2)
...
# plt.savefig(...)
plt.show()
I guess my question is how best to read and store all the data and keep the ability to access it without needing to put all the code in the read loop. Can I read two data sets into a list of pairs? Is that a thing in Python? If so, how do I access them?
Thanks in advance for any help!
Best regards!
Basically, I think you should put all your code in the readloop, because that will work easily. There's a slightly different way of using matplotlib that makes it easy to use the existing organization of your data AND write shorter code. Here's a toy, but complete, example:
import matplotlib.pyplot as plt
from numpy.random import random
fig, axs = plt.subplots(2)
for c in 'abc': # In your case, for filename in [file-list]:
    x, y = random((2, 5))
    axs[0].plot(x, y, label=c) # filename instead of c in your case
    axs[1].plot(x, y**2, label=c) # Plot p(x,y), q(x,y) in your case
axs[0].legend() # handy to get this from the data list
fig.savefig('two_plots.png')
You can also create two figures and plot into each of them explicitly, if you need them in different files for page layout, etc:
import matplotlib.pyplot as plt
from numpy.random import random
fig1, ax1 = plt.subplots(1)
fig2, ax2 = plt.subplots(1)
for c in 'abc': # or, for filename in [file-list]:
    x, y = random((2, 5))
    ax1.plot(x, y, label=c)
    ax2.plot(x, y**2, label=c)
ax1.legend()
ax2.legend()
fig1.savefig('two_plots_1.png')
fig2.savefig('two_plots_2.png')
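On the narrower question of keeping all the data around: yes, a list of (x, y) tuples or a dict keyed by filename works. A small sketch follows, assuming a hypothetical read_data_file helper that returns random arrays in place of np.loadtxt, and made-up filenames.

```python
import numpy as np

def read_data_file(name):
    # Hypothetical stand-in for np.loadtxt(name, unpack=True, usecols=(8, 9)).
    rng = np.random.default_rng(len(name))
    return rng.random(5), rng.random(5)

input_files = ["data1.txt", "data2.txt", "data3.txt"]  # placeholder names

# Each value is an (x, y) pair; nothing gets overwritten between files.
datasets = {name: read_data_file(name) for name in input_files}

# Loop over everything, unpacking each pair as you go...
for name, (x, y) in datasets.items():
    print(name, x.mean(), y.mean())

# ...or pull out one data set by its key.
x1, y1 = datasets["data1.txt"]
```

A plain list works the same way with pairs = [read_data_file(f) for f in input_files] and x1, y1 = pairs[0]; the dict just makes it easier to label each curve with its filename.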