Matplotlib: How to plot Time Series on top of Scatter Plot - python

I have found solutions to similar questions, but they all produce odd results.
I have a plot that looks like this:
generated using this code:
ax1 = dft.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Reds',colorbar=False,edgecolors='red',vmin=4,vmax=10)
ax1.set_xticklabels([datetime.datetime.fromtimestamp(ts / 1e9).strftime('%Y-%m-%d') for ts in ax1.get_xticks()])
dfb.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Blues',title='%s Polls'%state,ax=ax1,colorbar=False,edgecolors='blue',vmin=4,vmax=10)
plt.ylim(30,70)
plt.axhline(50,ls='--',alpha=0.5,color='grey')
plt.xticks(rotation=20)
Now, whenever I try to plot a line on top of this, I get something like the following:
import matplotlib.pyplot as plt
import numpy as np
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))
plt.hold(True)
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t,x)
plt.scatter(t, u)
plt.show()
If it's not clear, this is not what I want. These dots represent individual polls, and I have data representing a line that aggregates the individual polls. I think this has something to do with datetimes and the possibility of multiple polls for a particular date. I suspect the plotter gets confused because I have duplicate values for the same date, so it assumes this is not a time series, and when I plot a line it keeps assuming that no continuity is needed.
There must be something in Python that can draw a time series on top of a scatter plot with a time x-axis, right?
dft data:
end_date pct fte_grade Trump Odds
0 1598054400000000000 32.0 6 32.000000
1 1588550400000000000 32.0 7 32.000000
2 1582156800000000000 39.0 8 34.666667
3 1585180800000000000 33.0 8 34.206897
4 1587600000000000000 29.0 8 33.081081
5 1590019200000000000 32.0 8 33.025641
6 1559779200000000000 36.0 8 33.800000
7 1593043200000000000 32.0 8 32.400000

Isn't your strange line due to the fact that you didn't sort the DataFrame before plotting it?
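import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Sort by date first; otherwise the line joins the points in row order
dft = dft.sort_values(by=['end_date'])
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))
# plt.hold(True) is dropped: it was removed in Matplotlib 3.0, and
# successive plot calls already draw on the same axes
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t, x)
plt.scatter(t, u)
plt.show()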
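To make the sorting point concrete, here is a minimal sketch that uses four of the dft rows from the question. It assumes that the dots should be the individual polls (pct) and the line the aggregate (Trump Odds), which is how the question describes the data; the labels and colours are my own choices. The Agg backend line is only there so the sketch runs without a display.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Four rows copied from the dft sample (end_date is nanoseconds since the epoch)
dft = pd.DataFrame({
    "end_date": [1598054400000000000, 1588550400000000000,
                 1582156800000000000, 1585180800000000000],
    "pct": [32.0, 32.0, 39.0, 33.0],
    "Trump Odds": [32.0, 32.0, 34.666667, 34.206897],
})

# Integer input is interpreted as nanoseconds by default
dft["end_date"] = pd.to_datetime(dft["end_date"])
# Sorting makes the line move left to right instead of zig-zagging
dft = dft.sort_values("end_date")

fig, ax = plt.subplots()
ax.scatter(dft["end_date"], dft["pct"], color="red", label="individual polls")
ax.plot(dft["end_date"], dft["Trump Odds"], color="black", label="aggregate")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to see the result. Duplicate dates are not a problem for the line as long as the rows are in date order.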

Related

Python matplotlib ValueError: array is not a valid value for color [duplicate]

I have the following data set:
In[55]: usdbrl
Out[56]:
Date Price Open High Low Change STD
0 2016-03-18 3.6128 3.6241 3.6731 3.6051 -0.31 0.069592
1 2016-03-17 3.6241 3.7410 3.7449 3.6020 -3.16 0.069041
2 2016-03-16 3.7422 3.7643 3.8533 3.7302 -0.62 0.068772
3 2016-03-15 3.7656 3.6610 3.7814 3.6528 2.83 0.071474
4 2016-03-14 3.6618 3.5813 3.6631 3.5755 2.23 0.070348
5 2016-03-11 3.5820 3.6204 3.6692 3.5716 -1.09 0.076458
6 2016-03-10 3.6215 3.6835 3.7102 3.6071 -1.72 0.062977
7 2016-03-09 3.6849 3.7543 3.7572 3.6790 -1.88 0.041329
8 2016-03-08 3.7556 3.7826 3.8037 3.7315 -0.72 0.013700
9 2016-03-07 3.7830 3.7573 3.7981 3.7338 0.63 0.000000
I want to plot Price against Date:
But I would like to color the line by a third variable (in my case Date or Change).
Could anybody help with this please?
Thanks.
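I've written a simple function to map a given property into a color:
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
def plot_colourline(x, y, c):
    # Normalize c to [0, 1] and look up one RGBA colour per value
    c = cm.jet((c - np.min(c)) / (np.max(c) - np.min(c)))
    ax = plt.gca()
    # Draw each segment separately so it can get its own colour
    for i in np.arange(len(x) - 1):
        ax.plot([x[i], x[i+1]], [y[i], y[i+1]], c=c[i])
    return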
This function normalizes the desired property and gets a color from the jet colormap for each value. You may want to use a different colormap. It then gets the current axes and plots each segment of your data in its own colour. Because it uses a for loop, you should avoid it for very large data sets; for normal purposes, however, it is useful.
Consider the following example as a test:
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = 1.*np.arange(n)
y = np.random.rand(n)
prop = x**2
fig = plt.figure(1, figsize=(5,5))
ax = fig.add_subplot(111)
plot_colourline(x,y,prop)
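If the per-segment loop becomes too slow, one alternative (not what the function above does) is to hand all segments to Matplotlib at once with a LineCollection. The following is a rough sketch under that assumption; plot_colourline_lc is a hypothetical name, and segment i is coloured by c[i], as in the loop version.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

def plot_colourline_lc(ax, x, y, c, cmap="jet"):
    """Draw y(x) as a single LineCollection, colouring segment i by c[i]."""
    x, y, c = (np.asarray(v, dtype=float) for v in (x, y, c))
    points = np.column_stack([x, y]).reshape(-1, 1, 2)
    # Pair each point with its successor: shape (n - 1, 2, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = LineCollection(segments, cmap=cmap)
    lc.set_array(c[:-1])   # one scalar per segment, mapped through the colormap
    ax.add_collection(lc)
    ax.autoscale_view()    # collections do not rescale the view on their own
    return lc

n = 100
x = 1.0 * np.arange(n)
y = np.random.rand(n)
fig, ax = plt.subplots()
lc = plot_colourline_lc(ax, x, y, x**2)
fig.colorbar(lc, ax=ax)
```

The colour scaling to the minimum and maximum of c is done by the collection itself, so no manual normalization is needed here.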
You could color the data points by a third variable, if that would help:
dates = [dt.date() for dt in pd.to_datetime(df.Date)]
plt.scatter(dates, df.Price, c=df.Change, s=100, lw=0)
plt.plot(dates, df.Price)
plt.colorbar()
plt.show()

Get the height of the rectangles in a plot

I have the following graph [1], obtained with the code below [2]. As you can see from the first line inside the for loop, I set the height of the rectangles from the standard deviation value. But I can't figure out how to get the height of the corresponding rectangle. For example, given the blue rectangle, I would like to return the two bounds of the interval it covers, which are approximately 128.8 and 130.6. How can I do this?
[2] The code I used is the following:
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import numpy as np
dfLunedi = pd.read_csv( "0.lun.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = dfLunedi.groupby('slotID', as_index=False).agg( NLunUn=('date', 'nunique'),NLunTot = ('date', 'count'), MeanBPM=('tempo', 'mean'), std = ('tempo','std') )
#print(dfSlotMean)
dfSlotMean.drop(dfSlotMean[dfSlotMean.NLunUn < 3].index, inplace=True)
df = pd.DataFrame(dfSlotMean)
df.to_csv('1.silLunedi.csv', sep = ';', index=False)
print(df)
bpmMattino = df['MeanBPM']
std = df['std']
listBpm = bpmMattino.tolist()
limInf = df['MeanBPM'] - df['std']
limSup = df['MeanBPM'] + df['std']
tick_spacing = 1
fig, ax = plt.subplots(1, 1)
for _, r in df.iterrows():
    # One horizontal segment per slot; the line width encodes the standard deviation
    ax.plot([r['slotID'], r['slotID']+1], [r['MeanBPM']]*2, linewidth = r['std'] )
ax.xaxis.grid(True)
ax.yaxis.grid(True)
ax.yaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
This is the content of the csv:
slotID NMonUnique NMonTot MeanBPM std
0 7 11 78 129.700564 29.323091
2 11 6 63 123.372397 24.049397
3 12 6 33 120.625667 24.029006
4 13 5 41 124.516341 30.814985
5 14 4 43 118.904512 26.205309
6 15 3 13 116.380538 24.336491
7 16 3 42 119.670881 27.416843
8 17 5 40 125.424125 32.215865
9 18 6 45 130.540578 24.437559
10 19 9 58 128.180172 32.099529
11 20 5 44 125.596045 28.060657
I would advise against using linewidth to show anything related to your data. The reason is that linewidth is measured in "points" (see the matplotlib documentation), the size of which is not related to the xy-space that you plot your data in. To see this in action, try plotting with different linewidths and changing the size of the plotting window. The linewidth will not change with the axes.
Instead, if you do indeed want a rectangle, I suggest using matplotlib.patches.Rectangle. There is a good example of how to do that in the documentation, and I've also added an even shorter example below.
To give the rectangles different colors, you can do as shown here and simply get a random tuple with 3 elements and use that for the color. Another option is to take a list of colors, for example the TABLEAU_COLORS from matplotlib.colors, and take consecutive colors from that list. The latter may be better for testing, as the rectangles will get the same color on each run, but notice that there are just 10 colors in TABLEAU_COLORS, so you will have to cycle if you have more than 10 rectangles.
import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import matplotlib.colors as mcolors  # only needed for the TABLEAU_COLORS option
import random
x = 3
y = 4.5
y_std = 0.3
fig, ax = plt.subplots()
for i in range(10):
    c = tuple(random.random() for _ in range(3))
    # The other option as comment here
    #c = mcolors.TABLEAU_COLORS[list(mcolors.TABLEAU_COLORS.keys())[i]]
    # The rectangle spans y-y_std .. y+y_std in data coordinates
    rect = ptc.Rectangle(xy=(x, y-y_std), width=1, height=2*y_std, color=c)
    ax.add_patch(rect)
ax.set_xlim((0,10))
ax.set_ylim((0,5))
plt.show()
If you define the height as the standard deviation, and the center is at the mean, then the interval should be [mean-(std/2) ; mean+(std/2)] for each rectangle right? Is it intentional that the rectangles overlap? If not, I think it is your use of linewidth to size the rectangles which is at fault. If the plot is there to visualize the mean and variance of the different categories something like a boxplot or raincloud plot might be better.
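Once the rectangles are real patches, their bounds can be read straight back from them. Below is a small sketch using three rows of the CSV from the question. It assumes the interval you want is mean - std to mean + std (the limInf and limSup from your code); if the height should be the standard deviation itself, halve it accordingly. The bounds dictionary is just an illustrative way to collect the results.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Three rows taken from the CSV shown in the question
df = pd.DataFrame({
    "slotID": [7, 11, 12],
    "MeanBPM": [129.700564, 123.372397, 120.625667],
    "std": [29.323091, 24.049397, 24.029006],
})

fig, ax = plt.subplots()
bounds = {}  # slotID -> (lower, upper) in data coordinates
for r in df.itertuples(index=False):
    rect = Rectangle((r.slotID, r.MeanBPM - r.std), width=1,
                     height=2 * r.std, alpha=0.5)
    ax.add_patch(rect)
    # get_y() is the bottom edge; adding get_height() gives the top edge
    bounds[r.slotID] = (rect.get_y(), rect.get_y() + rect.get_height())
ax.autoscale_view()  # patches do not rescale the view on their own

for slot, (low, high) in bounds.items():
    print(slot, round(low, 2), round(high, 2))
```

Because the height is now in data units, the printed bounds match what you see on the y-axis regardless of the window size.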

Plot gets shifted when using secondary_y

I want to plot temperature and precipitation from a weather station in the same plot with two y-axis. However, when I try this, one of the plots gets shifted for no reason it seems like. This is my code: (I have just tried for two precipitation measurements as of now, but you get the deal.)
ax = m_prec_ra.plot()
ax2 = m_prec_po.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
This returns the following plot:
My plot is to be found here
I saw someone asking the same question, but I can't seem to figure out how to manually shift one of my datasets.
Here is my data:
print(m_prec_ra,m_prec_po)
Time
1 0.593436
2 0.532058
3 0.676219
4 1.780795
5 4.956048
6 11.909394
7 17.820051
8 14.225257
9 10.261061
10 2.628336
11 0.240568
12 0.431227
Name: Precipitation (mm), dtype: float64 Time
1 0.704339
2 1.225169
3 1.905223
4 4.156270
5 11.531221
6 22.246230
7 30.133800
8 27.634639
9 20.693056
10 5.282412
11 0.659365
12 0.622562
Name: Precipitation (mm), dtype: float64
The explanation for this behaviour is found in this Q & A.
Here, the solution would be to shift the lines one to the front, i.e. plotting against an index which starts at 0, instead of 1.
import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A" : np.arange(1,11),
"B" : np.random.rand(10),
"C" : np.random.rand(10)})
df.set_index("A", inplace=True)
ax = df.plot(y='B', kind = 'bar', legend = False)
df2 = df.reset_index()
df2.plot(ax = ax, secondary_y = True, y = 'B', kind = 'line')
plt.show()
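Another way around the shift, sketched here as an assumption rather than as what pandas does internally, is to bypass the pandas plotting wrappers and give both artists the same x values through Matplotlib directly, with twinx() providing the second y-axis. The numbers are the first six months of the two series in the question, rounded; the variable names follow the question.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

months = np.arange(1, 7)
# Rounded values from the first six rows of the two series in the question
m_prec_ra = pd.Series([0.59, 0.53, 0.68, 1.78, 4.96, 11.91], index=months)
m_prec_po = pd.Series([0.70, 1.23, 1.91, 4.16, 11.53, 22.25], index=months)

fig, ax = plt.subplots()
ax.plot(m_prec_ra.index, m_prec_ra.values, color="tab:blue")
ax2 = ax.twinx()  # second y-axis sharing the same x-axis
ax2.bar(m_prec_po.index, m_prec_po.values, alpha=0.4, color="tab:orange")

ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
```

Here the bars are centred on the index values 1 to 6, so they line up with the points of the line without any manual shifting.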
What version of pandas are you using for this plotting?
Using 0.23.4 running this code:
df1 = pd.DataFrame({'Data_1':[1,2,4,8,16,12,8,4,1]})
df2 = pd.DataFrame({'Data_2':[1,2,4,8,16,12,8,4,1]})
ax = df1.plot()
ax2 = df2.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
I get:
If you want to add sample data we could look at that.

Unable to handle missing values in time series plots using matplotlib

I am trying to create a time series graph using the following sample code, but it plots nothing when I put 'nan' for a missing value. It works fine if there are no missing values in between.
import matplotlib.pyplot as plot
import numpy as np
import datetime
nan = np.nan
date = [[2014, 1, 1], [2014, 2, 2], [2014, 3, 1], [2014, 4, 1], [2014, 5, 21]]
dtf = []
for i in range(len(date)):
    # year, month, day -> ordinal day number
    dtf.append(datetime.date(date[i][0], date[i][1], date[i][2]).toordinal())
days = np.array(dtf)
value = [nan, nan, 35, nan, 25]  # not working
# works fine: value = [20, 21, 35, 24, 25]
# not working: value = [20, 21, 35, nan, 25] (it joins the line up to 35 only)
fig, ax = plot.subplots()
ax.plot_date(x=days, y=value, fmt="r-")
plot.show()
The plot should break at the missing value and continue with the next value.
Please let me know how to do this.
A line connects two points. If one of the two points is nan it cannot be plotted, hence the line between a point and nan cannot be drawn.
Plotting an array with nan values will therefore only show lines, where both points are present.
This is fundamental logic and would occur even if trying to plot the data with pen and paper.
import matplotlib.pyplot as plt
import numpy as np
nan = np.nan
y = [2,3,2,nan,2,3,nan,3,nan,4,3,nan,2,1]
x = np.arange(len(y))
fig, ax = plt.subplots()
ax.plot(x,y, marker="o")
ax.grid()
ax.set_xticks(x)
for i in x:
    if np.isnan(y[i]):
        ax.text(i, 1.4, "nan", ha="center", rotation=90, fontsize=16)
plt.show()
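To check this against the three value lists from the question, here is a small sketch that counts how many line segments can be drawn, i.e. how many neighbouring pairs have two valid points. The helper name drawable_segments is made up for this illustration.

```python
import numpy as np

def drawable_segments(y):
    """Count neighbouring pairs in which both points are finite."""
    y = np.asarray(y, dtype=float)
    finite = np.isfinite(y)
    return int(np.sum(finite[:-1] & finite[1:]))

nan = np.nan
print(drawable_segments([nan, nan, 35, nan, 25]))  # 0
print(drawable_segments([20, 21, 35, nan, 25]))    # 2
print(drawable_segments([20, 21, 35, 24, 25]))     # 4
```

With zero drawable segments and a line-only format such as "r-", nothing shows up at all; adding a marker, as in the example above, makes the isolated points visible.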

Step plot by reading from file

I am a newbie to matplotlib. I am trying to plot step function and having some trouble. Right now I am able to read from the file and plot it as shown below. But the graph in the top is not in steps and the one below is not a proper step. I saw examples to plot step function by giving x & y value. I am not sure how to do it by reading from a file though. Can someone help me?
from pylab import plotfile, show, gca
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
fname = cbook.get_sample_data('sample.csv', asfileobj=False)
plotfile(fname, cols=(0,1), delimiter=' ')
plotfile(fname, cols=(0,2), newfig=False, delimiter=' ')
plt.show()
Sample inputs(3 columns):
27023927 3 0
27023938 2 0
27023949 3 0
27023961 2 0
27023972 3 0
27023984 2 0
27023995 3 0
27024007 2 0
27024008 2 1
27024018 3 1
27024030 2 1
27024031 2 0
27024041 3 0
27024053 2 0
27024054 2 1
27024098 2 0
Note: I have made the y-axis1 values as 3 & 2 so that this graph can occur in the top and another y-axis2 values 0 & 1 so that it comes in the bottom as shown below
Waveform as it looks now
Essentially your resolution is too low: in the lower plot each step (except the last one) occurs over 1 unit in x, while the distance between steps is about an order of magnitude larger. This gives the appearance of steps, but if you zoom in you will see that the vertical lines have a finite gradient (true steps change with an infinite gradient).
This is the same problem for both the top and bottom plots. We can easily remedy this by using the step function. You will generally find it easier to load the data yourself; in this example I use numpy's genfromtxt. This loads the data as an array called data:
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('test.csv', delimiter=" ")
ax1 = plt.subplot(2,1,1)
ax1.step(data[:,0], data[:,1])
ax2 = plt.subplot(2,1,2)
ax2.step(data[:,0], data[:,2])
plt.show()
If you are new to Python then there are two things worth mentioning. First, we use two subplots (ax1 and ax2) to plot the data rather than plotting on the same axes (this means you wouldn't need to add values to spatially separate them). Second, we access the elements of the array through [], which takes [row, column], with : meaning all rows and an index i selecting the ith column.
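As a quick illustration of the indexing, here is a sketch with three rows of the sample input typed in by hand instead of read from the file:

```python
import numpy as np

# Three rows of the sample input: x, first signal, second signal
data = np.array([[27023927, 3, 0],
                 [27023938, 2, 0],
                 [27024008, 2, 1]])

print(data.shape)   # (3, 3): 3 rows, 3 columns
print(data[:, 0])   # all rows of column 0 (the x values)
print(data[:, 2])   # all rows of column 2 (the second signal)
print(data[1, :])   # all columns of row 1
```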
I would propose to load the data to a numpy array
import numpy as np
data = np.loadtxt('sample.csv')
And then plot it:
# first point
ax = [data[0,0]]
ay = [data[0,1]]
for i in range(1, data.shape[0]):
    if ay[-1] != data[i,1]: # if y value has changed
        # add current x and old y
        ax.append(data[i,0])
        ay.append(ay[-1])
    # add current x and current y
    ax.append(data[i,0])
    ay.append(data[i,1])
import matplotlib.pyplot as plt
plt.plot(ax,ay)
plt.show()
Where my solution differs from yours is that I plot two points for every change in y. The two points produce the 90 degree bend. I only plot the first curve; change [?,1] to [?,2] for the second one.
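The same corner points can also be built without a loop. This is a sketch of a slightly different variant, not the code above: it inserts a corner after every sample, not only where y changes, which draws the same curve. The helper name step_points is hypothetical.

```python
import numpy as np

def step_points(x, y):
    """Return the corner points of a step curve that holds y[i] until x[i+1]."""
    x = np.asarray(x)
    y = np.asarray(y)
    xs = np.repeat(x, 2)[1:]    # x0, x1, x1, x2, x2, ...
    ys = np.repeat(y, 2)[:-1]   # y0, y0, y1, y1, y2, ...
    return xs, ys

xs, ys = step_points([0, 1, 2], [3, 2, 3])
print(xs.tolist())  # [0, 1, 1, 2, 2]
print(ys.tolist())  # [3, 3, 2, 2, 3]
```

Passing xs and ys to plt.plot should give the same shape as plt.step(x, y, where='post').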
Thanks for the suggestions. I was able to plot it after some research and here is my code,
import csv
import datetime
import matplotlib.pyplot as plt
import numpy as np
import dateutil.relativedelta as rd
import bisect
import scipy as sp
fname = "output.csv"
portfolio_list = []
x = []
a = []
b = []
portfolio = csv.DictReader(open(fname, "r"))
portfolio_list.extend(portfolio)
for data in portfolio_list:
    # DictReader yields strings; convert them so the axes are numeric
    x.append(float(data['i']))
    a.append(float(data['a']))
    b.append(float(data['b']))
stepList = [0, 1,2,3]
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
plt.step(x, a, 'g', where='post')
plt.step(x, b, 'r', where='post')
plt.show()
and got the image like,
