I am using the standard pandas.df.plot() function to plot two columns in a dataframe. For some reason, the x-axis values and the xlabel are not visible! There seem to be no options to turn them on in the function either (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html).
Does someone know what is going on, and how to correct it?
import matplotlib.cm as cm
import pandas as pd
ax1 = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn);
gives this:
This is a bug with Jupyter notebooks displaying pandas scatterplots that have a colorscale displayed while using Matplotlib as the plotting backend.
@june-skeeter has a working solution in the answers. Alternatively, pass sharex=False to df.plot.scatter; then you don't need to create the subplots yourself.
import matplotlib.cm as cm
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
df.plot.scatter(
    x='t',
    y='hlReference',
    c='STEP_STRENGTH',
    cmap=cm.autumn,
    sharex=False  # keeps the x-axis ticks and label visible
)
See the discussion in this closed pandas issue, which references the above solution in a related SO answer.
Still an issue with pandas v1.1.0. You can track the issue here: https://github.com/pandas-dev/pandas/issues/36064
Create your axes instance first and then pass it as the ax argument to the plot call:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
# create the figure and axes explicitly, then let pandas draw on them
fig,ax1=plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn,ax=ax1)
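To confirm that either workaround took effect without relying on how the notebook renders the figure, you can inspect the x-axis label directly. This is only a rough sketch: it assumes the headless Agg backend, random data shaped like the example above, and the colormap name 'autumn' passed as a string. The names ax_a and ax_b are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 3)),
                  columns=['t', 'hlReference', 'STEP_STRENGTH'])

# Workaround 1: let pandas create the axes, but disable x-axis sharing.
ax_a = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH',
                       cmap='autumn', sharex=False)

# Workaround 2: create the axes yourself and hand them to pandas.
fig, ax_b = plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH',
                cmap='autumn', ax=ax_b)

# Report the label text and whether matplotlib will draw it.
for name, ax in [('sharex=False', ax_a), ('explicit ax', ax_b)]:
    label = ax.xaxis.get_label()
    print(name, repr(label.get_text()), label.get_visible())
```

If the label is reported as not visible, the axis-sharing logic has hidden it, which is the symptom described in the question.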
I am trying to create subplots inside a for loop for various columns of a dataset. I am using the California housing dataset from sklearn. There are 4 columns, and I want to display three figures for each column, each in its own subplot. I have provided the code I have tried. Can somebody help me with this issue? Can we make it dynamic, so that if I need to add more figures later I can add them easily, each with a title?
from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
california_housing = fetch_california_housing(as_frame=True)
# california_housing.frame.head()
features_of_interest = ["AveRooms", "AveBedrms", "AveOccup", "Population"]
california_housing.frame[features_of_interest]
fig, axes = plt.subplots(4, 3)
for cols in features_of_interest:
    # scatterplot
    sns.scatterplot(x=california_housing.frame[cols], y=california_housing.target)
    # histogram
    sns.histplot(x=california_housing.frame[cols], y=california_housing.target)
    # qqplot
    sm.qqplot(california_housing.frame[cols], line='45')
    plt.show()
There are some problems with your code:
you need to import statsmodels.api as sm
you need to pass the ax parameter to scatterplot, histplot, and qqplot to indicate which subplot each plot is drawn on
the way that you load the data is not allowing matplotlib and seaborn to use the data. I made some changes to this part.
you do not need to call plt.show() on each iteration, just once at the end.
from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
california_housing = fetch_california_housing(as_frame=True).frame
features_of_interest = ["AveRooms", "AveBedrms", "AveOccup", "Population"]
fig, axes = plt.subplots(len(features_of_interest), 3)
for i, cols in enumerate(features_of_interest):
    # scatterplot
    sns.scatterplot(x=california_housing[cols], y=california_housing['MedHouseVal'], ax=axes[i,0])
    # histogram
    sns.histplot(x=california_housing[cols], y=california_housing['MedHouseVal'], ax=axes[i,1])
    # qqplot
    sm.qqplot(california_housing[cols], line='45', ax=axes[i,2])
plt.show()
PS: I used len(features_of_interest) so that the number of subplot rows adapts automatically to the number of features.
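To make the grid easier to extend with more figures and titles, one option is to keep the per-column plots in a list of (title, function) pairs and size the grid from that list. The following is a minimal sketch under stated assumptions: it uses synthetic data and plain matplotlib calls rather than the California housing frame, seaborn and statsmodels, and the names plotters, draw_scatter and draw_hist are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing frame (assumption, not the real data).
rng = np.random.default_rng(0)
frame = pd.DataFrame({
    "AveRooms": rng.normal(5, 1, 200),
    "AveBedrms": rng.normal(1, 0.2, 200),
    "MedHouseVal": rng.normal(2, 0.5, 200),
})
features = ["AveRooms", "AveBedrms"]
target = "MedHouseVal"

def draw_scatter(ax, data, col):
    # feature against target
    ax.scatter(data[col], data[target], s=5)

def draw_hist(ax, data, col):
    # distribution of the feature
    ax.hist(data[col], bins=20)

# Adding a figure type means appending one (title, function) pair here.
plotters = [("scatter", draw_scatter), ("histogram", draw_hist)]

# squeeze=False keeps axes two-dimensional even for a single row or column.
fig, axes = plt.subplots(len(features), len(plotters), squeeze=False,
                         figsize=(4 * len(plotters), 3 * len(features)))
for i, col in enumerate(features):
    for j, (title, draw) in enumerate(plotters):
        draw(axes[i, j], frame, col)
        axes[i, j].set_title(f"{col}: {title}")
fig.tight_layout()
```

With this layout, both the number of rows and the number of columns follow from the two lists, so neither has to be hard-coded in plt.subplots.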
I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is already computed and I don't want to calculate the correlation again, but I have not managed to plot it as it is. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing in corrMatrix, which is the recalculated value, pass the data read from Excel, i.e. data or df (df is just a DataFrame built from data). Thus, all the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note that this assumes, however, that your data IS ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.
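If the first column of the sheet holds the row names, one way to make the table heatmap-ready is to load that column as the index, so that only the numeric values are passed to seaborn. The sketch below is an illustration under assumptions: it reads a small made-up CSV from memory with pd.read_csv instead of the real Excel file (pd.read_excel accepts the same index_col argument), and the values and the three labels used are invented for the example.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

# Hypothetical stand-in for the spreadsheet: first column holds the row names.
raw = io.StringIO(
    "name,CLC,GIEMS,GLWD\n"
    "CLC,1.0,0.8,0.6\n"
    "GIEMS,0.8,1.0,0.7\n"
    "GLWD,0.6,0.7,1.0\n"
)

# index_col=0 turns the name column into row labels instead of data.
corr = pd.read_csv(raw, index_col=0)

fig, ax = plt.subplots()
# The matrix is plotted as-is; nothing is recomputed with .corr().
sn.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax)
```

Because the names are now the index, seaborn can use them as the y-axis labels without a separate label list.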
I deleted the first column (the names) and added the names back later as axis labels, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:
Hi, I'm just starting to use pandas in Python to graph some data instead of Excel.
I want to customize the colors as well as the opacity of some of the data, because it always falls back to the default color list.
Here's my code:
from pandas import DataFrame
import matplotlib.pyplot as plt
import numpy as np
x=np.array([[4,8,5,7,6],[2,3,4,2,6],[4,7,4,7,8],[2,6,4,8,6],[2,4,3,3,2]])
df=DataFrame(x, columns=['a','b','c','d','e'], index=[2,4,6,8,10])
df.plot(kind='bar')
plt.show()
You can call df.plot.bar directly and pass a dictionary of column name to color mappings to the color parameter.
from pandas import DataFrame
import matplotlib.pyplot as plt
import numpy as np
x=np.array([[4,8,5,7,6],[2,3,4,2,6],[4,7,4,7,8],[2,6,4,8,6],[2,4,3,3,2]])
df=DataFrame(x, columns=['a','b','c','d','e'], index=[2,4,6,8,10])
df.plot.bar(color={'a':'gold','b':'silver','c':'green','d':'purple','e':'blue'})
plt.show()
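The question also asks about opacity. As far as I know, extra keyword arguments to df.plot.bar are forwarded to matplotlib, so alpha can be passed alongside color. Here is a minimal sketch, assuming the same data as above and an arbitrary alpha of 0.6. The Agg backend is also an assumption, chosen so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
from pandas import DataFrame

x = np.array([[4,8,5,7,6],[2,3,4,2,6],[4,7,4,7,8],[2,6,4,8,6],[2,4,3,3,2]])
df = DataFrame(x, columns=['a','b','c','d','e'], index=[2,4,6,8,10])

colors = {'a':'gold','b':'silver','c':'green','d':'purple','e':'blue'}

# alpha applies the same opacity to every bar (0 = transparent, 1 = opaque).
ax = df.plot.bar(color=colors, alpha=0.6)

# Inspect the opacity stored on the first drawn bar.
print(ax.patches[0].get_alpha())
```

This sets one opacity for the whole chart. Giving individual columns different opacities would need per-bar changes on ax.patches or RGBA color tuples in the dictionary.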
I have seen many questions on SO about changing the tick frequency, and they helped when I was building a line chart, but I have been struggling with a bar chart. Below is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar')
plt.show()
and that's the output I see. How do I change the tick frequency?
(To be clearer: I want a tick every 5 values on the x-axis.)
Using Pandas plot function you can do:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar', xticks=np.arange(0,90,5))
Or better:
df.plot(kind='bar', xticks=list(df.index[0::5]))
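Both variants work here because the index is the default 0..89. A pandas bar plot draws its bars at integer positions 0, 1, 2, ... whatever the index values are, so for another kind of index the tick positions and the tick labels have to be set separately. A rough sketch, assuming a hypothetical daily date index and the same step of 5:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=90, freq="D")  # hypothetical index
df = pd.DataFrame(rng.integers(1, 10, (90, 1)), columns=["Values"], index=idx)

ax = df.plot(kind="bar")

step = 5
# Bars sit at positions 0..len(df)-1, so ticks are chosen by position...
positions = np.arange(0, len(df), step)
ax.set_xticks(positions)
# ...and labelled with the matching index values.
ax.set_xticklabels(df.index[::step].strftime("%Y-%m-%d"), rotation=90)
```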
I'm puzzled by this Pandas/Matplotlib behaviour:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
series = pd.Series(np.arange(10))
classifier = lambda x: 'Odd' if x%2 else "Even"
grouped = series.groupby(classifier)
grouped.plot(kind='box')
plt.show()
How do I get the boxplots next to each other Pandas style i.e. with nice syntax? :)
(Pandas v. 0.16.2, Matplotlib v. 1.4.3)
Edit:
I know I could do this:
grouped = grouped.apply(pd.Series.to_frame)
but I would assume there's a cleaner way to do this?
So my general advice is to avoid plotting through pandas with the following exceptions:
Super quick 'n' dirty interactive exploration and inspection
Time series
Any other time, you'll want to use seaborn or roll your own matplotlib function. Since you're working with a dataframe, seaborn is your best bet, although labeled data support is very quickly coming down the pipe for matplotlib.
I'm also going to advise that you go ahead and create the dataframe with the classification stored inside of it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')
df = pd.DataFrame(np.arange(10), columns=['val'])
df['class'] = df['val'].apply(lambda x: 'Odd' if x%2 else "Even")
seaborn.boxplot(x='class', y='val', data=df, width=0.5)
seaborn.despine(offset=10, trim=True)
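If you would rather stay within pandas, DataFrame.boxplot with the by argument also draws the groups side by side once the classification is stored as a column. This is a hedged sketch, not a replacement for the seaborn version above: it assumes the same toy data, builds the class column with np.where, and passes an explicit axes object so the result can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10), columns=['val'])
# Store the classification in the frame, as advised above.
df['class'] = np.where(df['val'] % 2, 'Odd', 'Even')

fig, ax = plt.subplots()
# One box per distinct value of 'class', drawn next to each other.
df.boxplot(column='val', by='class', ax=ax)

# pandas adds an automatic figure title; replace it with a simpler one.
fig.suptitle('')
ax.set_title('val by class')
```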