I am trying to create a line chart using a pandas DataFrame and matplotlib. I am using the following code to create it:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Quarter': ['Q1-2018', 'Q2-2018', 'Q3-2018', 'Q4-2018', 'Q1-2019'],
'Data': [256339, 265555, 274880, 211128, 0]
}
dataset2 = pd.DataFrame(data=data)
ax3 = dataset2[['Quarter', 'Data']].plot.line(x='Quarter', y='Data',
legend=False)
ax3.margins(x=0.1)
plt.show()
This produces the following result:
As you can see, the line starts and ends right at the edges of the plot.
What I am trying to achieve is some space at the start and end of the line chart, like below.
I tried setting the x margin with ax3.margins(x=0.1), but it does not do anything.
How do I add some space at the start and end of the chart so that the line does not stick to the edges?
In pandas 0.23 you would get the correct plot with margins as desired, yet without labels. This "bug" seems to have been fixed in pandas 0.24, at the expense of another undesired behaviour.
That is, pandas fixes the limits of categorical plots and sets the tick labels to the positions that would look correct as long as the limits are not changed. While you could in theory unfix the limits (ax.set_xlim(None, None)) and let the axes autoscale (ax.autoscale()), the result will be an incorrectly labelled plot.
I doubt there is any reasoning behind this; it is more likely an oversight in the pandas source. This pandas issue best describes the problem, which then boils down to this 5-year-old issue.
In any case, for categorical plots, consider using matplotlib directly. Its categorical feature is pretty stable by now and easy to use:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Quarter': ['Q1-2018', 'Q2-2018', 'Q3-2018', 'Q4-2018', 'Q1-2019'],
'Data': [1,3,2,4,1]
}
df = pd.DataFrame(data=data)
plt.plot("Quarter", "Data", data=df)
plt.show()
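With the plot drawn by matplotlib itself, the x limits are not fixed, so the margins call from the question should take effect. A minimal sketch, assuming the same data as above (the explicit fig, ax pair is my addition; call plt.show() at the end to display the figure):

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'Quarter': ['Q1-2018', 'Q2-2018', 'Q3-2018', 'Q4-2018', 'Q1-2019'],
    'Data': [1, 3, 2, 4, 1]
}
df = pd.DataFrame(data=data)

fig, ax = plt.subplots()
# Categories are placed at x = 0, 1, 2, 3, 4 by matplotlib's categorical support
ax.plot("Quarter", "Data", data=df)
# Pad the x range by 10% of the data span on each side
ax.margins(x=0.1)
```

Here the line no longer touches the left and right edges, and the tick labels stay attached to the right points because matplotlib manages both the limits and the labels.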
I have created a heatmap using the seaborn and matplotlib package in python, and while it is perfectly suited for my current needs, I really would prefer to have the labels on the x-axis of the heatmap to be placed at the top of the plot, rather than at the bottom (which seems to be its default).
So an abridged form of my data looks like this:
NP NP1 NP2 NP3 NP4 NP5
identifier
A1BG~P04217 -0.094045 0.012229 0.102279 1.319618 0.002383
A2M~P01023 -0.805089 -0.477339 -0.351341 0.089735 -0.473815
AARS1~P49588 0.081827 -0.099849 -0.287426 0.101588 0.136366
ABCB6~Q9NP58 0.109911 0.458039 -0.039325 -0.484872 1.905586
ABCC1~I3L4X2 -0.560155 0.580285 0.012868 0.291303 -0.407900
ABCC4~O15439 0.055264 0.138630 -0.204665 0.191241 0.304999
ABCE1~P61221 -0.510108 -0.059724 -0.233365 0.078956 -0.651327
ABCF1~Q8NE71 -0.348526 -0.135414 -0.390021 -0.190644 -0.276303
ABHD10~Q9NUJ1 0.237959 -2.060834 0.325901 -0.778036 -4.046345
ABHD11~Q8NFV4 0.294587 1.193258 -0.797294 -0.148064 -1.153391
And when I use the following code:
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,30))
ax = sns.heatmap(df_example, annot=True, xticklabels=True)
I get this kind of plot:
https://imgpile.com/i/T3zPH1
I should note that this plot was made from the abridged dataframe above; the actual dataframe has thousands of identifiers, making it very long.
But as you can see, the labels on the x axis only appear at the bottom. I have been trying to get them to appear on the top, but seaborn doesn't seem to allow this kind of formatting.
So I have also tried using plotly express, but while I solve the issue of placing my x-axis labels on top, I have been completely unable to format the heat map as I had before using seaborn. The following code:
import plotly.express as px
fig = px.imshow(df_example, width= 500, height=6000)
fig.update_xaxes(side="top")
fig.show()
yields this kind of plot: https://imgpile.com/i/T3zF42.
I have tried many times to reformat it using the documentation from plotly (https://plotly.com/python/heatmaps/), but I can't seem to get it to work. When one thing is fixed, another problem arises. I really just want to keep using the seaborn-based code above and just fix the x-axis labels. I'm also happy to have the x-axis labels at both the top and bottom of the plot, but I can't get that to work at present. Can someone advise me on what to do here?
OK, so I did a bit more research, and it turns out you can add the following code to the seaborn approach:
plt.tick_params(axis='both', which='major', labelsize=10, labelbottom = False, bottom=False, top = False, labeltop=True)
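For context, a minimal sketch of where that call fits in the seaborn code from the question. It assumes a small stand-in for df_example (three rows copied from the abridged data above) and uses the Axes-level ax.tick_params, which accepts the same arguments as plt.tick_params:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for df_example: the first three rows of the abridged data in the question
df_example = pd.DataFrame(
    {
        "NP1": [-0.094045, -0.805089, 0.081827],
        "NP2": [0.012229, -0.477339, -0.099849],
        "NP3": [0.102279, -0.351341, -0.287426],
        "NP4": [1.319618, 0.089735, 0.101588],
        "NP5": [0.002383, -0.473815, 0.136366],
    },
    index=pd.Index(["A1BG~P04217", "A2M~P01023", "AARS1~P49588"], name="identifier"),
)

fig, ax = plt.subplots(figsize=(10, 4))
sns.heatmap(df_example, annot=True, xticklabels=True, ax=ax)

# Hide the bottom labels and tick marks and show the labels at the top instead
ax.tick_params(axis='x', which='major', labelsize=10,
               labelbottom=False, bottom=False, top=False, labeltop=True)
```

Passing labelbottom=True together with labeltop=True should give labels on both edges, which the question mentions as an acceptable alternative.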
If your data are stored in a CSV file, you can use this code:
import pandas as pd
import plotly.express as px
df = pd.read_csv("file.csv").round(2)
fig = px.imshow(df.iloc[:,1:],
y = df['identifier'],
text_auto=True, aspect="auto")
fig.show()
The data in the CSV file are in the following format:
identifier NP1 NP2 NP3 NP4 NP5
A1BG~P04217 -0.094045 0.012229 0.102279 1.319618 0.002383
A2M~P01023 -0.805089 -0.477339 -0.351341 0.089735 -0.473815
AARS1~P49588 0.081827 -0.099849 -0.287426 0.101588 0.136366
ABCB6~Q9NP58 0.109911 0.458039 -0.039325 -0.484872 1.905586
ABCC1~I3L4X2 -0.560155 0.580285 0.012868 0.291303 -0.407900
ABCC4~O15439 0.055264 0.138630 -0.204665 0.191241 0.304999
ABCE1~P61221 -0.510108 -0.059724 -0.233365 0.078956 -0.651327
ABCF1~Q8NE71 -0.348526 -0.135414 -0.390021 -0.190644 -0.276303
ABHD10~Q9NUJ1 0.237959 -2.060834 0.325901 -0.778036 -4.046345
ABHD11~Q8NFV4 0.294587 1.193258 -0.797294 -0.148064 -1.153391
Now let's display the x-axis at the top of the heatmap by adding:
fig.update_layout(xaxis = dict(side ="top"))
Alternative solution if you have an older version of Plotly:
import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(
x=df.columns[1:],
y=df.identifier,
z=df.iloc[:,1:],
text=df.iloc[:,1:],
texttemplate="%{text}"))
fig.update_layout(xaxis = dict(side ="top"))
fig.show()
I am trying to generate a smooth line using a dataset that contains time (measured as number of days) and a set of numbers that represent a socioeconomic variable.
Here is a sample of my data:
date, data
726,1.2414
727,1.2414
728,1.2414
729,1.2414
730,1.2414
731,1.2414
732,1.2414
733,1.2414
734,1.2414
735,1.2414
736,1.2414
737,1.804597701
738,1.804597701
739,1.804597701
740,1.804597701
741,1.804597701
742,1.804597701
743,1.804597701
744,1.804597701
745,1.804597701
746,1.804597701
747,1.804597701
748,1.804597701
749,1.804597701
750,1.804597701
751,1.804597701
752,1.793103448
753,1.793103448
754,1.793103448
755,1.793103448
756,1.793103448
757,1.793103448
758,1.793103448
759,1.793103448
760,1.793103448
761,1.793103448
762,1.793103448
763,1.793103448
764,1
765,1
This is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
out_file = "path_to_file/file.csv"
df = pd.read_csv(out_file)
time = df['date']
data = df['data']
ax1 = plt.subplot2grid((4,3),(0,0), colspan = 2, rowspan = 2) # Will be adding other plots
plt.plot(time, data)
plt.yticks(np.arange(1,5,1)) # Include classes 1-4 showing only 1 step changes
plt.gca().invert_yaxis() # Reverse y axis
plt.ylabel('Trend', fontsize = 8, labelpad = 10)
This generates the following plot:
Test plot
I have seen posts that answer similar questions (like the ones below), but can't seem to get my code to work. Can anyone suggest an elegant solution?
Generating smooth line graph using matplotlib
Python Matplotlib - Smooth plot line for x-axis with date values
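One possible direction, offered only as a rough sketch and not taken from the linked posts: since the data is a step series, a centered rolling mean in pandas softens the jumps without any extra dependencies. The helper name smooth_steps and the window of 7 days are my own assumptions; the values reproduce the sample above.

```python
import matplotlib.pyplot as plt
import pandas as pd


def smooth_steps(series: pd.Series, window: int = 7) -> pd.Series:
    """Centered rolling mean; min_periods=1 keeps the edges defined (hypothetical helper)."""
    return series.rolling(window=window, center=True, min_periods=1).mean()


# Same values as the sample in the question (days 726 to 765)
df = pd.DataFrame({
    "date": range(726, 766),
    "data": [1.2414] * 11 + [1.804597701] * 15 + [1.793103448] * 12 + [1.0] * 2,
})
df["smooth"] = smooth_steps(df["data"])

fig, ax = plt.subplots()
ax.plot(df["date"], df["data"], label="raw")
ax.plot(df["date"], df["smooth"], label="rolling mean")
ax.invert_yaxis()  # reverse the y axis, as in the question
ax.set_ylabel("Trend", fontsize=8, labelpad=10)
ax.legend()
```

A larger window gives a smoother curve at the cost of blurring where the changes actually happen, so the window is worth tuning against the real data.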
I am new to pandas and its libraries. Using the following code, I can make a scatter plot of my 'class' in the 'Month' vs 'Amount' plane. Because I consider more than one class, I would like to use colors to distinguish the classes and to show a legend in the figure.
My first attempt below generates dots with a different color for each class, but it does not generate the right legend. The second attempt, on the other hand, generates a legend per class, but the labeling is not correct: I only see the first letter of each class name. Moreover, this second attempt plots as many figures as there are classes. I would like to see how I can correct both attempts. Any ideas or suggestions? Thanks in advance.
P.S. I also wanted to use
colors = itertools.cycle(['gold','blue','red','chocolate','mediumpurple','dodgerblue'])
so that I could choose the colors myself, but I could not make it work.
Attempts:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import matplotlib.cm as cm
np.random.seed(176)
random.seed(16)
df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
'Amount': np.random.sample(25)*100,
'Year': random.sample(list(range(2010,2018))*50, k=25),
'Month': random.sample(list(range(1,12))*100, k=25)})
print(df.head(25))
print(df['class'].unique())
for cls1 in df['class'].unique():
    test1= pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'])
    print(test1)
colors = cm.rainbow(np.linspace(0,2,len(df['class'].unique())))
fig, ax = plt.subplots(figsize=(8,6))
for cls1,c in zip(df['class'].unique(),colors):
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,ax=ax,color=c,s=50).legend(df['class'].unique(),scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show()
for cls2,c in zip(df['class'].unique(),colors):
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls2], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,color=c,s=50).legend(cls2,scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show()
Up-to-date code
I would like to plot the following code via scatter plot.
for cls1 in df['class'].unique():
    test3= pd.pivot_table(df[df['class']==cls1], index=['class', 'Month'], values=['Amount'], aggfunc=np.sum)
    print(test3)
Unlike above, here each class appears only once per month, thanks to the sum over Amount.
Here is my attempt:
for cls2 in df['class'].unique():
    test2= pd.pivot_table(df[df['class']==cls2], index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    print(test2)
    sns.lmplot(x='Year' , y='Amount', data=test2, hue='class',palette='hls', fit_reg=False,size= 5, aspect=5/3, legend_out=False,scatter_kws={"s": 70})
plt.show()
This gives me one plot for each class. Apart from the first one (class=car), which shows different colors, the others seem to be OK. Despite this, I would like to have only one plot with all classes.
After Marvin Taschenberger's useful help, here is the up-to-date result:
I get a white dot instead of a colorful one, and the legend is in a different place in the figure compared to yours. Moreover, I cannot see the year labels correctly. Why?
An easy way to work around (unfortunately not solve) your problem is to let seaborn do the heavy lifting with this single line:
sns.lmplot(x='Month' , y='Amount', data=df, hue='class',palette='hls', fit_reg=False,size= 8, aspect=5/3, legend_out=False)
You could also plug in other colors for palette.
EDIT: how about this then:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import seaborn as sns
np.random.seed(176)
random.seed(16)
df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
'Amount': np.random.sample(25)*100,
'Year': random.sample(list(range(2010,2018))*50, k=25),
'Month': random.sample(list(range(1,12))*100, k=25)})
frame = pd.pivot_table(df, index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
sns.lmplot(x='Year' , y='Amount', data=frame, hue='class',palette='hls', fit_reg=False,size= 5, aspect=5/3, legend_out=False,scatter_kws={"s": 70})
plt.show()
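If you would rather stay with matplotlib and pick the colors yourself, as the question tried to do with itertools.cycle, here is a minimal sketch. It is an alternative to the seaborn call above, not part of the original answer, and the random data only mimics the shape of the question's DataFrame.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data with the same columns as in the question (values are random)
rng = np.random.default_rng(176)
classes = ['living room', 'dining room', 'kitchen', 'car', 'bathroom', 'office']
df = pd.DataFrame({
    'class': rng.choice(classes, size=25),
    'Amount': rng.random(25) * 100,
    'Month': rng.integers(1, 13, size=25),
})

# One row per (class, Month), summing Amount
totals = df.groupby(['class', 'Month'], as_index=False)['Amount'].sum()

colors = itertools.cycle(['gold', 'blue', 'red', 'chocolate', 'mediumpurple', 'dodgerblue'])
fig, ax = plt.subplots(figsize=(8, 6))
for (name, group), color in zip(totals.groupby('class'), colors):
    # label=name gives each class its own legend entry on the shared axes
    ax.scatter(group['Month'], group['Amount'], s=50, color=color, label=name)
ax.set_xlabel('Month')
ax.set_ylabel('Amount')
legend = ax.legend(scatterpoints=1, loc='upper left', ncol=3, fontsize=10.5)
```

Drawing every class on the same ax and passing label= to each scatter call is what produces a single figure with one legend entry per class, instead of one figure per class.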
How can I format the x-axis so that the spacing between periods is "to scale"? That is, the distance between 10yr and 30yr should be much larger than the distance between 1yr and 2yr.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import Quandl as ql
yield_ = ql.get("USTREASURY/YIELD")
today = yield_.iloc[-1,:]
month_ago = yield_.iloc[-1000,:]
df = pd.concat([today, month_ago], axis=1)
df.columns = ['today', 'month_ago']
df.plot(style={'today': 'ro-', 'month_ago': 'bx--'},title='Treasury Yield Curve, %');
plt.show()
I want my chart to look like this...
I think doing this while staying purely within pandas might be tricky. You first need to create a new matplotlib figure and axes. The following might not work exactly, but it will give you a good idea.
df['years']=[1/12.,0.25,0.5,1,2,3,5,7,10,20,30]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
df.plot(x='years',y='today',ax=ax,kind='scatter')
df.plot(x='years',y='month_ago',ax=ax,kind='scatter')
plt.show()
If you want your axis labels to look like those in your chart, you'll also need to set the lower and upper limits of your axis so they look good, and then do something like:
ax.set_xticklabels(list(df.index))
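Putting the two steps together, a rough sketch. The maturity labels and the yield numbers below are made-up placeholders, not real Quandl data, and I am assuming the tick positions are set with set_xticks before set_xticklabels so that each label lines up with its own point:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder maturities and their length in years (assumed labels, not the Quandl column names)
maturities = ["1 MO", "3 MO", "6 MO", "1 YR", "2 YR", "3 YR",
              "5 YR", "7 YR", "10 YR", "20 YR", "30 YR"]
years = [1 / 12, 0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30]

# Made-up yield values, only there to give the curves a shape
df = pd.DataFrame(
    {
        "years": years,
        "today": [1.0 + 0.5 * y ** 0.5 for y in years],
        "month_ago": [0.8 + 0.4 * y ** 0.5 for y in years],
    },
    index=maturities,
)

fig, ax = plt.subplots()
ax.plot(df["years"], df["today"], "ro-", label="today")
ax.plot(df["years"], df["month_ago"], "bx--", label="month_ago")

# Ticks sit at the numeric maturities, so the spacing is to scale
ax.set_xticks(years)
ax.set_xticklabels(maturities, rotation=90)
ax.set_title("Treasury Yield Curve, %")
ax.legend()
```

The short maturities end up crowded near zero because the axis is to scale; rotating the labels, as done here, or dropping a few of them are possible ways to keep them readable.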
I'm trying to plot a statistical time series using Seaborn but I can't seem to figure it out. I've tried using both the lmplot and tsplot methods but am obviously missing something key.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as pylab
p = pd.DataFrame({
"date": pd.date_range('1/1/2015', periods = 12),
"values":range(1,13)
})
# Regular Matplotlib (via pandas) works
p.plot(x = "date", style = 'o--')
# Can't get lmplot to work
sns.lmplot(x = "date", y = "values", data = p)
# Can't get tsplot to work either
sns.tsplot(time = "date", value = "values", data = p)
Sorry I can't add this as a comment as I'm not rated high enough.
I've been battling through timeseries recently, and the following SO post is pretty much exactly the same as yours, with the same question about confidence intervals:
Plotting time-series data with seaborn