I searched and found that matplotlib's annotate can label a point at given x and y coordinates in Jupyter.
I have retried, doing as you suggested.
import matplotlib.pyplot as plt
import pandas as pd
def fit_data():
    fig = plt.figure(1,figsize=(20,6))
    plt.subplot(111)
    data1 = pd.DataFrame({"ID" : list(range(11)),
                          "R" : list(range(11)),
                          "Theta" : list(range(11))})
    plt.scatter(data1['R'], data1['Theta'], marker='o', color='b', s=15)
    for i, row in data1.iterrows():
        plt.annotate(row["ID"], xy=(row["R"],row["Theta"]))
    plt.xlabel('R',size=20)
    plt.ylabel('Theta',size=20)
    plt.show()
    plt.close()
fit_data()
It still doesn't take the ID from my data. It is still plotting an arbitrary plot.
[Image: the plot produced by the revised code]
My data is as follows:
1 19.177 24.642
2 9.398 12.774
3 9.077 12.373
4 15.287 19.448
5 4.129 5.41
6 2.25 3.416
7 11.674 15.16
8 10.962 14.469
9 1.924 3.628
10 2.087 3.891
11 9.706 13.186
I suppose the confusion comes from the fact that scatter can plot all points at once, while an annotation is a singular object. You would hence need one annotation per row in the dataframe.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"ID" : list(range(6)),          # Do not copy this part.
                   "R" : [5,4,1,2,3,4],            # Use your own data
                   "Theta" : [20,15,40,60,51,71]}) # instead.
fig = plt.figure(1,figsize=(20,6))
plt.subplot(111)
plt.scatter(df['R'], df['Theta'], marker='o', color='b', s=15)
for i, row in df.iterrows():
    plt.annotate(row["ID"], xy=(row["R"],row["Theta"]))
plt.xlabel('R',size=20)
plt.ylabel('Theta',size=20)
plt.show()
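Applied to the data posted in the question, the same per-row loop might look like the sketch below. The column names ID, R and Theta are an assumption about what the three columns of the posted table mean, and the label_points helper is hypothetical. It uses itertuples so the ID stays an integer and the label reads "1" rather than "1.0".

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# The whitespace-separated table from the question.
# Column names are assumed; the post does not give a header row.
RAW = """\
1 19.177 24.642
2 9.398 12.774
3 9.077 12.373
4 15.287 19.448
5 4.129 5.41
6 2.25 3.416
7 11.674 15.16
8 10.962 14.469
9 1.924 3.628
10 2.087 3.891
11 9.706 13.186
"""


def label_points(df):
    """Scatter R against Theta and write each row's ID next to its point."""
    fig, ax = plt.subplots(figsize=(20, 6))
    ax.scatter(df["R"], df["Theta"], marker="o", color="b", s=15)
    # One annotation per row; itertuples keeps ID as an integer
    for row in df.itertuples(index=False):
        ax.annotate(str(row.ID), xy=(row.R, row.Theta))
    ax.set_xlabel("R", size=20)
    ax.set_ylabel("Theta", size=20)
    return ax


df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", names=["ID", "R", "Theta"])
ax = label_points(df)
```

Call plt.show() afterwards (or let Jupyter render the figure) to see the labelled points.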
I have a df, from which I've indexed europe_n, and I've plotted a bar plot.
europe_n (r=5, c=45) looks like this:
df['Country'] (string) and df['Population'] (numeric) variables.
plt.bar(df['Country'],df['Population'], label='Population')
plt.xlabel('Country')
plt.ylabel('Population')
plt.legend()
plt.show()
Which gives me:
Objective: I'm trying to change my y-axis limit to start from 0, instead of 43,094.
I ran plt.ylim(0,500000), but the y-axis did not change and an error was thrown. Any suggestions using the matplotlib library?
Error:
Conclusion: I wasn't able to plot the graph as I wanted because all columns had the object dtype. I only realized this when Jupyter threw an error stating 'there are no integers to plot'. After I converted the numeric column Population to int, the code worked and I got the graph!
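The dtype problem described in the conclusion can be checked and fixed before plotting. The sketch below uses made-up country names and numbers, since the real europe_n frame is not shown, and it assumes the population values were read in as strings.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for europe_n: numbers stored as strings
europe_n = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Population": ["43094", "120000", "310500"],
})
print(europe_n.dtypes)  # Population is not a numeric dtype yet

# Convert to a numeric dtype; by default this raises on values it cannot parse
europe_n["Population"] = pd.to_numeric(europe_n["Population"])

fig, ax = plt.subplots()
ax.bar(europe_n["Country"], europe_n["Population"], label="Population")
ax.set_xlabel("Country")
ax.set_ylabel("Population")
ax.set_ylim(0, 500000)  # with numeric data the axis now runs from 0 to 500000
ax.legend()
```

Checking europe_n.dtypes first is a cheap way to spot this kind of issue before reaching for axis limits.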
ax.set_ylim([0,max_value])
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.DataFrame({
'Country':['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden'],
'Population':[5882261, 5540745, 372899, 5434319, 10549347]
})
print(df)
###
Country Population
0 Denmark 5882261
1 Finland 5540745
2 Iceland 372899
3 Norway 5434319
4 Sweden 10549347
fig, ax = plt.subplots()
ax.bar(df['Country'], df['Population'], color='#3B4B59')
ax.set_title('Population of Countries')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
max_value = 12000000
ticks_loc = np.arange(0, max_value, step=2000000)
ax.set_yticks(ticks_loc)
ax.set_ylim([0,max_value])
ax.set_yticklabels(['{:,.0f}'.format(x) for x in ax.get_yticks()])
ax.grid(False)
fig.set_size_inches(10,5)
fig.set_dpi(300)
plt.show()
Be sure that you have already imported the following packages:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
Your code should probably look like:
fig, ax = plt.subplots()
ax.bar(europe_n['Country'].values, europe_n['Area(sq km)'].values, color='#3B4B59')
ax.set_xlabel('Country')
ax.set_ylabel('Population')
max_value = 500000
ticks_loc = np.arange(0, max_value, step=10000)
ax.set_yticks(ticks_loc)
ax.set_ylim(0,max_value)
ax.set_yticklabels(['{:,.0f}'.format(x) for x in ax.get_yticks()])
ax.grid(False)
fig.set_size_inches(10,5)
fig.set_dpi(300)
plt.show()
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html
To set the y limit
plt.ylim(start,end)
To set the x limit
plt.xlim(start,end)
Example
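A minimal sketch with invented values: passing only bottom=0 pins the lower y limit and leaves the upper limit where matplotlib put it, while xlim(start, end) sets both ends at once. The bottom keyword is matplotlib's own; the data is made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [43094, 120000, 250000, 410000]  # invented values

plt.figure()
plt.plot(x, y, marker="o")
plt.ylim(bottom=0)  # only the lower y limit; the upper one is left as it was
plt.xlim(0, 5)      # both x limits at once: (start, end)

print(plt.ylim())   # calling it without arguments returns the current limits
print(plt.xlim())
```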
I am trying to create a violin plot using seaborn.
My df looks like this:
drought
Out[65]:
Dataset TGLO TAM TAFR TAA Type
0 ACCESS1-0 0.181017 0.068988 0.166761 0.069303 AMIP
1 ACCESS1-3 0.109676 -0.001961 -0.008700 0.373162 AMIP
2 BNU-ESM 0.277070 0.272242 0.266324 -0.077017 AMIP
3 CCSM4 0.385075 0.258976 0.304438 0.211241 AMIP
...
21 CMAP 0.087274 -0.062214 -0.079958 0.372267 OBS
22 ERA 0.179999 -0.010484 0.134584 0.204052 OBS
23 GPCC 0.173947 -0.020719 0.021819 0.370157 OBS
24 GPCP 0.151394 0.036450 -0.021462 0.336876 OBS
25 UEA 0.223828 -0.018237 0.088486 0.398062 OBS
26 UofD 0.190969 0.094744 0.036374 0.310938 OBS
I want to have a split violin plot based on Type and this is the code I am using
sns.violinplot(data=drought, hue='Type', split=True)
And this is the error:
Cannot use `hue` without `x` or `y`
I do not have an x or y value because I want the columns on the x-axis and the values in the rows on the y-axis.
Thanks for your help!
Do you want to ignore the 'Dataset' column and have split violins for the 4 other columns? In that case, you need to convert these 4 columns to "long form" (via pandas' melt()).
Here is an example:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
drought = pd.DataFrame({'Dataset': ["".join(np.random.choice([*'VWXYZ'], 5)) for _ in range(40)],
                        'TGLO': np.random.randn(40),
                        'TAM': np.random.randn(40),
                        'TAFR': np.random.randn(40),
                        'TAA': np.random.randn(40),
                        'Type': np.repeat(['AMIP', 'OBS'], 20)})
drought_long = drought.melt(id_vars=['Dataset', 'Type'], value_vars=['TGLO', 'TAM', 'TAFR', 'TAA'])
sns.set_style('white')
ax = sns.violinplot(data=drought_long, x='variable', y='value', hue='Type', split=True, palette='flare')
ax.legend()
sns.despine()
plt.tight_layout()
plt.show()
I was trying to plot using seaborn, but the label was not showing up, even though it was assigned in the axis object.
How to show the label on the plot?
Here is my code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dx = pd.DataFrame({'c0':range(5), 'c1':range(5,10)})
dx.index = list('abcde')
ax = sns.pointplot(x=dx.index,
y="c0",
data=dx, color="r",
scale=0.5, dodge=True,
capsize=.2, label="child")
ax = sns.pointplot(x=dx.index,
y="c1",
data=dx, color="g",
scale=0.5, dodge=True,
capsize=.2, label="teen")
ax.legend()
plt.show()
The legend gives error:
No handles with labels found to put in legend.
sns.pointplot() isn't meant for just plotting multiple dataframe attributes in the same figure, but for visualizing relationships between them, in which case it will generate its own labels. You can override them by passing a labels argument to ax.legend() (see "Add Legend to Seaborn point plot"), but once you make changes to your plot, chances are there is going to be some mess.
To produce your plots using seaborn esthetics, I would do this:
sns.set_style("white")
fig, ax = plt.subplots()
plt.plot(dx.index, dx.c0, "o-", ms=3,
color="r", label='child')
plt.plot(dx.index, dx.c1, "o-", ms=3,
color="g", label='teen')
ax.legend()
Result:
If you're using seaborn you should try to use tidy (or "long") data rather than "wide". See this link about Organizing Datasets
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dx = pd.DataFrame({'c0':range(5), 'c1':range(5,10)})
dx.index = list('abcde')
# reset the index and melt the remaining columns
dx1 = dx.reset_index().melt(id_vars='index')
print(dx1)
index variable value
0 a c0 0
1 b c0 1
2 c c0 2
3 d c0 3
4 e c0 4
5 a c1 5
6 b c1 6
7 c c1 7
8 d c1 8
9 e c1 9
You can now plot once rather than twice
# modified the "x" and "data" parameters
# added the "hue" parameter and removed the "color" parameter
ax = sns.pointplot(x='index',
y="value",
data=dx1,
hue='variable',
scale=0.5, dodge=True,
capsize=.2)
# get handles and labels from the data so you can edit them
h,l = ax.get_legend_handles_labels()
# keep same handles, edit labels with names of choice
ax.legend(handles=h, labels=['child', 'teen'])
plt.show()
Edit
As of pandas version 1.1.0, pd.melt has parameter ignore_index so we don't have to reset the index any more.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
dx = pd.DataFrame({'c0':range(5), 'c1':range(5,10)})
dx.index = list('abcde')
dx = dx.melt(ignore_index=False)
ax = sns.pointplot(x=dx.index,
y="value",
data=dx, hue="variable",
scale=0.5, dodge=True,
capsize=.2)
h,l = ax.get_legend_handles_labels()
l = ["child", "teen"]
ax.legend(h, l)
plt.show()
In your case, the ylabel is already set to c0, so a legend isn't necessary.
If you insist on legend, I suggest not using sns. Instead, try this using pandas' interface to matplotlib
dx = pd.DataFrame({'c0':range(5), 'c1':range(5,10)})
dx.set_index('c0').plot(marker='o')
Or use matplotlib's API directly with more flexibility
plt.plot(dx.c0, dx.c1, marker='o', label='child')
plt.legend()
After some practice, I found the solution using pandas itself,
dx.plot(kind='line',marker='o',xticks=range(5))
Gives the plot:
In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import pandas as pd
import matplotlib.pyplot as plt
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach would be to use the seaborn module. This plots the two density estimates on the same axes without needing a variable to hold the axes (using some of the data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot.
When using pandas.DataFrame.groupby, the column to be plotted, (e.g. the aggregation column) should be specified.
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
duration waiting kind
0 3.600 79 long
1 1.800 54 short
2 3.333 74 long
3 2.283 62 short
4 4.533 85 long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
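To see why .pivot works here, it may help to look at the reshaped frame on a tiny hand-made sample (the values are copied from the first rows shown above, not loaded from seaborn). Each value of 'kind' becomes its own column, rows that belong to the other group are filled with NaN, and DataFrame.plot then draws one curve per column.

```python
import pandas as pd

# First five rows of the geyser sample shown above, typed in by hand
df = pd.DataFrame({
    "duration": [3.600, 1.800, 3.333, 2.283, 4.533],
    "waiting": [79, 54, 74, 62, 85],
    "kind": ["long", "short", "long", "short", "long"],
})

# No index= argument, so the existing row index is kept
wide = df.pivot(columns="kind", values="duration")
print(wide)
```

With the full dataset, wide.plot(kind='kde') would then draw both densities on one set of axes; as far as I know, pandas needs SciPy installed for kde plots.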
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
fig, ax = plt.subplots(figsize=(10,8))
# 'class' is a Python keyword, so the column needs bracket access
classes = list(df['class'].unique())
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()
I have a dataframe called df that looks like this:
Qname X Y Magnitude
Bob 5 19 10
Tom 6 20 20
Jim 3 30 30
I would like to make a visual text plot of the data. I want to plot each Qname on a figure at its (X, Y) coordinates, with the text size set by Magnitude.
I have tried:
fig = plt.figure()
ax = fig.add_axes((0,0,1,1))
X = df.X
Y = df.Y
S = df.Magnitude
Name = df.Qname
ax.text(X, Y, Name, size=S, color='red', rotation=0, alpha=1.0, ha='center', va='center')
fig.show()
However nothing is showing up on my plot. Any help is greatly appreciated.
This should get you started. Matplotlib does not handle the text placement for you so you will probably need to play around with this.
import pandas as pd
import matplotlib.pyplot as plt
# replace this with your existing code to read the dataframe
df = pd.read_clipboard()
plt.scatter(df.X, df.Y, s=df.Magnitude)
# annotate the plot
# unfortunately you have to iterate over your points
for idx, row in df.iterrows():
    # see http://stackoverflow.com/q/5147112/553404
    # for better annotation options
    plt.annotate(row['Qname'], xy=(row['X'], row['Y']))
plt.show()
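If the names themselves should be drawn at a size given by Magnitude, as the question asks, one hedged option is to call ax.text once per row, since it places a single string at a single position. The sketch below rebuilds the three rows from the question by hand and sets the axis limits explicitly. To my knowledge text artists are not taken into account when matplotlib autoscales, so without this the default 0 to 1 limits would leave the names outside the visible area. The one-unit margin is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The three rows from the question, entered by hand
df = pd.DataFrame({
    "Qname": ["Bob", "Tom", "Jim"],
    "X": [5, 6, 3],
    "Y": [19, 20, 30],
    "Magnitude": [10, 20, 30],
})

fig, ax = plt.subplots()
for row in df.itertuples(index=False):
    # One text artist per row; Magnitude is used directly as the font size in points
    ax.text(row.X, row.Y, row.Qname, size=row.Magnitude,
            color="red", ha="center", va="center")

# Set the limits from the data so the names are visible (margin of 1 is arbitrary)
ax.set_xlim(df["X"].min() - 1, df["X"].max() + 1)
ax.set_ylim(df["Y"].min() - 1, df["Y"].max() + 1)
```

Call plt.show() afterwards to display the figure. If the sizes come out too large or too small, scaling Magnitude by a constant factor is a simple adjustment.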