Unable to handle missing values in time series plots using matplotlib


I am trying to create a time series graph using the following sample code, but it plots nothing when I put 'nan' in for a missing value. It works fine if there are no missing values in between.
import matplotlib.pyplot as plot
import numpy as np
import datetime
nan = np.nan
# [year, month, day]; leading zeros such as 01 are not valid integer literals in Python 3
date = [[2014, 1, 1], [2014, 2, 2], [2014, 3, 1], [2014, 4, 1], [2014, 5, 21]]
dtf = []
for i in range(len(date)):
    dtf.append(datetime.date(int(date[i][0]), int(date[i][1]), int(date[i][2])).toordinal())
days = np.array(dtf)
value = [nan, nan, 35, nan, 25]  # not working
# works fine: value = [20, 21, 35, 24, 25]
# not working: value = [20, 21, 35, nan, 25]; it joins the line up to 35 only
fig, ax = plot.subplots()
ax.plot_date(x=days, y=value, fmt="r-")
plot.show()
The plot should break at a missing value and continue with the next value.
Please let me know how to do it.

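A line connects two points. If one of the two points is nan, it cannot be plotted, so the line between a point and a nan cannot be drawn.
Plotting an array with nan values will therefore only show line segments where both neighbouring points are present. A value with no valid neighbour, such as the 35 and the 25 in [nan, nan, 35, nan, 25], has no segment to belong to, so with a plain line style and no marker it does not appear at all.
This is fundamental logic and would occur even when plotting the data with pen and paper.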
import matplotlib.pyplot as plt
import numpy as np
nan = np.nan
y = [2,3,2,nan,2,3,nan,3,nan,4,3,nan,2,1]
x = np.arange(len(y))
fig, ax = plt.subplots()
ax.plot(x,y, marker="o")
ax.grid()
ax.set_xticks(x)
for i in x:
    if np.isnan(y[i]):
        ax.text(i, 1.4, "nan", ha="center", rotation=90, fontsize=16)
plt.show()
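Applied to the values from the question, the same rule explains why nothing was drawn: no two consecutive values are both finite, so there is no segment at all. The following is only a minimal sketch, not the original poster's code. It assumes the dates are passed as datetime.date objects, and the n_segments count is a helper expression introduced here for illustration. Adding a marker makes the isolated points visible.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it to view the figure
import matplotlib.pyplot as plt
import numpy as np

days = [datetime.date(2014, 1, 1), datetime.date(2014, 2, 2),
        datetime.date(2014, 3, 1), datetime.date(2014, 4, 1),
        datetime.date(2014, 5, 21)]
value = np.array([np.nan, np.nan, 35, np.nan, 25])

# A segment exists only between two consecutive finite values.
finite = np.isfinite(value)
n_segments = int(np.sum(finite[:-1] & finite[1:]))
print(n_segments)

fig, ax = plt.subplots()
# With "r-" alone there is nothing to draw here; the marker shows the isolated points.
line, = ax.plot(days, value, "r-", marker="o")
fig.autofmt_xdate()
# plt.show()  # uncomment in an interactive session
```

With value = [20, 21, 35, nan, 25] the same count gives two segments, which matches the line that "joins up to 35 only".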

Related

How to change y-axis limits on a bar graph?

I have a df, from which I've indexed europe_n, and I've plotted a bar plot.
europe_n (r=5, c=45) looks like this:
df['Country'] (string) and df['Population'] (numeric) variables.
plt.bar(df['Country'],df['Population'], label='Population')
plt.xlabel('Country')
plt.ylabel('Population')
plt.legend()
plt.show()
Which gives me:
Objective: I'm trying to change my y-axis limit to start from 0 instead of 43,094.
I ran the plt.ylim(0,500000) method, but there was no change to the y-axis and it threw an error. Any suggestions from the matplotlib library?
Error:
Conclusion: The reason I wasn't able to plot the graph as I wanted was that all columns were of object dtype. I only realized this when Jupyter threw an error stating, 'there are no integers to plot'. Once I converted the numeric column Population to int type, the code worked and I got the graph!
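The conversion described in the conclusion could look roughly like the sketch below. The small europe_n frame is a hypothetical stand-in with made-up string values (the real frame is not shown in the question), and pd.to_numeric is one of several ways to do the conversion.

```python
import pandas as pd

# Hypothetical stand-in for europe_n: the numbers were read in as strings.
europe_n = pd.DataFrame({
    "Country": ["Denmark", "Finland", "Iceland"],
    "Population": ["5882261", "5540745", "372899"],
})
print(europe_n["Population"].dtype)

# Convert to a numeric dtype. errors="coerce" turns unparseable entries into NaN
# instead of raising, so they can be inspected afterwards.
europe_n["Population"] = pd.to_numeric(europe_n["Population"], errors="coerce")
print(europe_n["Population"].dtype)
```

After the conversion, plt.bar and plt.ylim treat the column as numbers rather than as category labels.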

ax.set_ylim([0,max_value])
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.DataFrame({
'Country':['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden'],
'Population':[5882261, 5540745, 372899, 5434319, 10549347]
})
print(df)
# Output:
#    Country  Population
# 0  Denmark     5882261
# 1  Finland     5540745
# 2  Iceland      372899
# 3   Norway     5434319
# 4   Sweden    10549347
fig, ax = plt.subplots()
ax.bar(df['Country'], df['Population'], color='#3B4B59')
ax.set_title('Population of Countries')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
max_value = 12000000
ticks_loc = np.arange(0, max_value, step=2000000)
ax.set_yticks(ticks_loc)
ax.set_ylim([0,max_value])
ax.set_yticklabels(['{:,.0f}'.format(x) for x in ax.get_yticks()])
ax.grid(False)
fig.set_size_inches(10,5)
fig.set_dpi(300)
plt.show()
Make sure you have already imported the following packages:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
Your code should probably look like:
fig, ax = plt.subplots()
ax.bar(europe_n['Country'].values, europe_n['Area(sq km)'].values, color='#3B4B59')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
max_value = 500000
ticks_loc = np.arange(0, max_value, step=10000)
ax.set_yticks(ticks_loc)
ax.set_ylim(0,max_value)
ax.set_yticklabels(['{:,.0f}'.format(x) for x in ax.get_yticks()])
ax.grid(False)
fig.set_size_inches(10,5)
fig.set_dpi(300)
plt.show()

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html
To set the y limit:
plt.ylim(start,end)
To set the x limit:
plt.xlim(start,end)
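As a small illustration of plt.ylim, the sketch below reuses the country figures quoted in the other answer. The upper limit of 12,000,000 is simply a value chosen above the largest bar.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it to view the figure
import matplotlib.pyplot as plt

countries = ["Denmark", "Finland", "Iceland", "Norway", "Sweden"]
population = [5882261, 5540745, 372899, 5434319, 10549347]

plt.bar(countries, population)
plt.ylim(0, 12000000)     # set the lower and upper y limits
bottom, top = plt.ylim()  # called without arguments, ylim returns the current limits
print(bottom, top)
# plt.show()  # uncomment in an interactive session
```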

Matplotlib: How to plot Time Series on top of Scatter Plot

I have found solutions to similar questions, but they all produce odd results.
I have a plot that looks like this:
generated using this code:
ax1 = dft.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Reds',colorbar=False,edgecolors='red',vmin=4,vmax=10)
ax1.set_xticklabels([datetime.datetime.fromtimestamp(ts / 1e9).strftime('%Y-%m-%d') for ts in ax1.get_xticks()])
dfb.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Blues',title='%s Polls'%state,ax=ax1,colorbar=False,edgecolors='blue',vmin=4,vmax=10)
plt.ylim(30,70)
plt.axhline(50,ls='--',alpha=0.5,color='grey')
plt.xticks(rotation=20)
Now, whenever I try to plot a line on top of this, I get something like the following:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))
# plt.hold was removed in Matplotlib 3.0; successive calls draw on the same axes by default
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t,x)
plt.scatter(t, u)
plt.show()
If it's not clear, this is not what I want. These dots represent individual polls, and I have data representing a line that aggregates the individual polls. I think this has something to do with datetimes and the possibility of multiple polls on a particular date. I think the plotter is getting confused because I have multiple values for the same date, so it assumes this is not a time series, and when I plot a line, it maintains the assumption that we don't need any continuity.
There must be something within Python that can handle drawing a time series on top of a scatter plot with a time x-axis, right?
dft data:
              end_date   pct  fte_grade  Trump Odds
0  1598054400000000000  32.0          6   32.000000
1  1588550400000000000  32.0          7   32.000000
2  1582156800000000000  39.0          8   34.666667
3  1585180800000000000  33.0          8   34.206897
4  1587600000000000000  29.0          8   33.081081
5  1590019200000000000  32.0          8   33.025641
6  1559779200000000000  36.0          8   33.800000
7  1593043200000000000  32.0          8   32.400000

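Isn't your strange line due to the fact that you didn't sort the df before plotting it? plt.plot connects the points in the order they appear in the data, so unsorted dates make the line jump back and forth: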
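import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# sort by date so the line is drawn in chronological order
dft = dft.sort_values(by=['end_date'])
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))
# plt.hold was removed in Matplotlib 3.0; successive calls draw on the same axes by default
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t,x)
plt.scatter(t, u)
plt.show()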
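For a self-contained version, the sketch below rebuilds dft from the rows listed in the question and sorts it before drawing. It assumes, based on the question's description, that pct holds the individual polls (drawn as dots) and Trump Odds the aggregate (drawn as the line). The axis labels in the legend are my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it to view the figure
import matplotlib.pyplot as plt
import pandas as pd

dft = pd.DataFrame({
    "end_date": [1598054400000000000, 1588550400000000000, 1582156800000000000,
                 1585180800000000000, 1587600000000000000, 1590019200000000000,
                 1559779200000000000, 1593043200000000000],
    "pct": [32.0, 32.0, 39.0, 33.0, 29.0, 32.0, 36.0, 32.0],
    "Trump Odds": [32.0, 32.0, 34.666667, 34.206897, 33.081081,
                   33.025641, 33.8, 32.4],
})

# Integers are interpreted as nanoseconds since the Unix epoch.
dft["end_date"] = pd.to_datetime(dft["end_date"])
# Sort chronologically so the line is drawn from left to right.
dft = dft.sort_values("end_date")

fig, ax = plt.subplots()
ax.scatter(dft["end_date"], dft["pct"], label="individual polls")
ax.plot(dft["end_date"], dft["Trump Odds"], label="aggregate")
ax.legend()
fig.autofmt_xdate()
# plt.show()  # uncomment in an interactive session
```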

How do I make my histogram of unequal bins show properly?

My data consists of the following:
The majority of the numbers are < 60, plus a few outliers in the 2000s.
I want to display it in a histogram with the following bin ranges:
0-1, 1-2, 2-3, 3-4, ..., 59-60, 60-max
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
b = list(range(61)) + [2000] # will make [0, 1, ..., 60, 2000]
plt.hist(b, bins=b, edgecolor='black')
plt.xticks(b)
plt.show()
This shows the following:
Essentially what you see is all the numbers 0 .. 60 squished together on the left, and the 2000 on the right. This is not what I want.
So I remove the [2000] and get something like what I am looking for:
As you can see now it is better, but I still have the following problems:
How do I fix this so that the graph doesn't have any white space around it (there's a big gap before 0 and after 60)?
How do I fix this so that after 60 there is a 2000 tick at the very end, while still keeping roughly the same spacing (unlike the first plot)?

Here is one hacky solution using some random data. I still don't quite understand your second question, but I tried to do something based on your wording.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
fig, ax = plt.subplots(figsize=(12, 6))
data= np.random.normal(10, 5, 5000)
upper = 31
outlier = 2000
data = np.append(data, 100*[upper])
b = list(range(upper)) + [upper]
plt.hist(data, bins=b, edgecolor='black')
plt.xticks(b)
b[-1] = outlier
ax.set_xticklabels(b)
plt.xlim(0, upper)
plt.show()
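A variation on the same idea, offered only as a sketch: clip the outliers into one catch-all bin with np.clip and relabel its right edge with the true maximum. The data is made up, and the names upper and overflow_edge as well as the value 65 are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it to view the figure
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: most values below 60 plus a few outliers in the 2000s.
data = np.concatenate([rng.uniform(0, 60, 1000), [2000, 2100, 2500]])

upper = 60
overflow_edge = 65  # arbitrary right edge for the catch-all bin
edges = list(range(upper + 1)) + [overflow_edge]

# Clip outliers to `upper` so they all fall into the last bin, [60, 65].
clipped = np.clip(data, 0, upper)

fig, ax = plt.subplots(figsize=(12, 6))
counts, _, _ = ax.hist(clipped, bins=edges, edgecolor="black")

# Remove the white space and relabel the last edge with the true maximum.
ax.set_xlim(edges[0], edges[-1])
ticks = list(range(0, upper + 1, 10)) + [overflow_edge]
labels = [str(t) for t in ticks[:-1]] + [str(int(data.max()))]
ax.set_xticks(ticks)
ax.set_xticklabels(labels)
# plt.show()  # uncomment in an interactive session
```

The last bar then counts everything at or above 60, while the axis keeps a roughly even spacing.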

Matplotlib Name points on plots

I have searched and found that, using annotate in matplotlib for Jupyter, we can label a point at its x and y position.
I have retried it as you suggested.
import matplotlib.pyplot as plt
import pandas as pd
def fit_data():
    fig = plt.figure(1,figsize=(20,6))
    plt.subplot(111)
    data1 = pd.DataFrame({"ID" : list(range(11)),
                          "R" : list(range(11)),
                          "Theta" : list(range(11))})
    plt.scatter(data1['R'], data1['Theta'], marker='o', color='b', s=15)
    for i, row in data1.iterrows():
        plt.annotate(row["ID"], xy=(row["R"],row["Theta"]))
    plt.xlabel('R',size=20)
    plt.ylabel('Theta',size=20)
    plt.show()
    plt.close()
fit_data()
It still doesn't take the ID from my data; it is still plotting an arbitrary plot.
My data is as follows:
1 19.177 24.642
2 9.398 12.774
3 9.077 12.373
4 15.287 19.448
5 4.129 5.41
6 2.25 3.416
7 11.674 15.16
8 10.962 14.469
9 1.924 3.628
10 2.087 3.891
11 9.706 13.186

I suppose the confusion comes from the fact that scatter can plot all points at once, while an annotation is a single object. You would hence need one annotation per row in the dataframe.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"ID" : list(range(6)), # Do not copy this part.
"R" : [5,4,1,2,3,4], # Use your own data
"Theta" : [20,15,40,60,51,71]}) # instead.
fig = plt.figure(1,figsize=(20,6))
plt.subplot(111)
plt.scatter(df['R'], df['Theta'], marker='o', color='b', s=15)
for i, row in df.iterrows():
    plt.annotate(row["ID"], xy=(row["R"],row["Theta"]))
plt.xlabel('R',size=20)
plt.ylabel('Theta',size=20)
plt.show()
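With the eleven rows from the question, the same loop might look like the sketch below. It assumes the three columns in the posted data are ID, R and Theta, in that order. It uses itertuples rather than iterrows and adds a small label offset, both of which are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it to view the figure
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column order of the posted data: ID, R, Theta.
data1 = pd.DataFrame({
    "ID": list(range(1, 12)),
    "R": [19.177, 9.398, 9.077, 15.287, 4.129, 2.25,
          11.674, 10.962, 1.924, 2.087, 9.706],
    "Theta": [24.642, 12.774, 12.373, 19.448, 5.41, 3.416,
              15.16, 14.469, 3.628, 3.891, 13.186],
})

fig, ax = plt.subplots(figsize=(20, 6))
ax.scatter(data1["R"], data1["Theta"], marker="o", color="b", s=15)

# itertuples keeps ID as an integer; iterrows would upcast it to float
# because each row mixes int and float columns.
for row in data1.itertuples(index=False):
    ax.annotate(str(row.ID), xy=(row.R, row.Theta),
                xytext=(3, 3), textcoords="offset points")

ax.set_xlabel("R", size=20)
ax.set_ylabel("Theta", size=20)
# plt.show()  # uncomment in an interactive session
```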

Mapping Column Data to Graph Properties

I have a dataframe called df that looks like this:
Qname  X   Y  Magnitude
Bob    5  19         10
Tom    6  20         20
Jim    3  30         30
I would like to make a visual text plot of the data. I want to plot the Qnames on a figure at the coordinates X, Y, with the text size set by Magnitude.
I have tried:
fig = plt.figure()
ax = fig.add_axes((0,0,1,1))
X = df.X
Y = df.Y
S = df.Magnitude
Name = df.Qname
ax.text(X, Y, Name, size=S, color='red', rotation=0, alpha=1.0, ha='center', va='center')
fig.show()
However nothing is showing up on my plot. Any help is greatly appreciated.

This should get you started. Matplotlib does not handle the text placement for you so you will probably need to play around with this.
import pandas as pd
import matplotlib.pyplot as plt
# replace this with your existing code to read the dataframe
df = pd.read_clipboard()
plt.scatter(df.X, df.Y, s=df.Magnitude)
# annotate the plot
# unfortunately you have to iterate over your points
# see http://stackoverflow.com/q/5147112/553404
# for better annotation options
for idx, row in df.iterrows():
    plt.annotate(row['Qname'], xy=(row['X'], row['Y']))
plt.show()
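If the goal is the text-only plot described in the question, one option is to call ax.text once per row, since it places a single string at a single position. This is only a sketch: the frame is rebuilt from the three rows in the question, and the axis margins are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop it to view the figure
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Qname": ["Bob", "Tom", "Jim"],
    "X": [5, 6, 3],
    "Y": [19, 20, 30],
    "Magnitude": [10, 20, 30],
})

fig, ax = plt.subplots()
# One ax.text call per row; Magnitude is used as the font size in points.
for row in df.itertuples(index=False):
    ax.text(row.X, row.Y, row.Qname, size=row.Magnitude,
            color="red", ha="center", va="center")

# Text does not take part in autoscaling, so set the limits explicitly.
ax.set_xlim(df["X"].min() - 1, df["X"].max() + 1)
ax.set_ylim(df["Y"].min() - 5, df["Y"].max() + 5)
# plt.show()  # uncomment in an interactive session
```

Without the explicit limits the axes keep their default 0 to 1 range, which is one reason nothing appears to show up.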
