I've simulated a Brownian motion 10000 times, with 1000 entries per path, and taken the last value of each path. I was able to plot the histogram of those values with the following code:
import seaborn as sns
import matplotlib.pyplot as plt
# BM holds the values generated by the Brownian motion
fig, (ax1,ax2) = plt.subplots(2)
ax1.hist(BM[:,-1],12)
I've been able to draw the KDE as follows; however, I am unable to merge the two diagrams together. Can someone please help me?
sns.kdeplot(data=BM[:,-1])
Try with sns.kdeplot(BM['col1']) where 'col1' is the name of the column you want to plot.
I'll give you a reproducible example that works for me.
import seaborn as sns
import pandas as pd
import numpy as np
BM = pd.DataFrame(np.array([-0.00871515, -0.0001227, -0.01449098, 0.01808527, 0.00074193, 0.01145541]),
                  columns=['col1'])
BM.head(2)
col1
0 -0.008715
1 -0.000123
sns.kdeplot(BM['col1'])
Edit based on your additional question:
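To draw the histogram and the KDE in one plot, use this one (distplot is deprecated since seaborn 0.11; the suggested replacement there is sns.histplot(BM['col1'], kde=True)):
sns.distplot(BM['col1'])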
I am trying to create subplots inside a for loop for various columns of a dataset. I am using the California housing dataset from sklearn. There are 4 columns, and I want to display three figures for each column, each in its own subplot. I have provided the code I have tried. Can somebody help me with this issue? Can we make it dynamic, so that if I need to add more figures I can do so easily, with titles?
from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
california_housing = fetch_california_housing(as_frame=True)
# california_housing.frame.head()
features_of_interest = ["AveRooms", "AveBedrms", "AveOccup", "Population"]
california_housing.frame[features_of_interest]
fig, axes = plt.subplots(4, 3)
for cols in features_of_interest:
    # scatterplot
    sns.scatterplot(x=california_housing.frame[cols], y=california_housing.target)
    # histogram
    sns.histplot(x=california_housing.frame[cols], y=california_housing.target)
    # qqplot
    sm.qqplot(california_housing.frame[cols], line='45')
    plt.show()
There are some problems with your code:
you need to import statsmodels.api as sm;
you need to pass the ax parameter to scatterplot, histplot, and qqplot to indicate which subplot each plot is drawn on;
the way you load the data does not allow matplotlib and seaborn to use it, so I made some changes to this part;
you do not need to call plt.show() on each iteration, just once at the end.
from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
california_housing = fetch_california_housing(as_frame=True).frame
features_of_interest = ["AveRooms", "AveBedrms", "AveOccup", "Population"]
fig, axes = plt.subplots(len(features_of_interest), 3)
for i, cols in enumerate(features_of_interest):
    # scatterplot
    sns.scatterplot(x=california_housing[cols], y=california_housing['MedHouseVal'], ax=axes[i,0])
    # histogram
    sns.histplot(x=california_housing[cols], y=california_housing['MedHouseVal'], ax=axes[i,1])
    # qqplot
    sm.qqplot(california_housing[cols], line='45', ax=axes[i,2])
plt.show()
PS.: I used len(features_of_interest) so that the number of subplot rows adapts automatically to the number of features.
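To make the columns of the grid dynamic as well, and to get a title on each subplot, one option is to keep the plot types in a dictionary and loop over it. The sketch below is only an illustration: the data is random (an assumed stand-in for the California housing frame), the plotters dictionary is a hypothetical helper, and plain matplotlib calls are used in place of seaborn and statsmodels to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data with the same column names as in the question.
rng = np.random.default_rng(0)
features = ["AveRooms", "AveBedrms", "AveOccup", "Population"]
df = pd.DataFrame(rng.lognormal(size=(200, len(features))), columns=features)
df["MedHouseVal"] = rng.uniform(0.5, 5.0, size=200)

# One entry per subplot column: title -> function(ax, x, y). Add entries to add plots.
plotters = {
    "scatter vs target": lambda ax, x, y: ax.scatter(x, y, s=5),
    "histogram": lambda ax, x, y: ax.hist(x, bins=30),
    "box plot": lambda ax, x, y: ax.boxplot(x),
}

n_rows, n_cols = len(features), len(plotters)
# squeeze=False keeps axes 2-D even when there is a single row or column.
fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows), squeeze=False)
for i, col in enumerate(features):
    for j, (title, draw) in enumerate(plotters.items()):
        draw(axes[i, j], df[col], df["MedHouseVal"])
        axes[i, j].set_title(f"{col}: {title}")
fig.tight_layout()
```

Adding a fourth plot type then only requires one more dictionary entry; the grid size and titles follow from it.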
I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is ready and I don't want to calculate the correlation again, but I have not managed to do that. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing in corrMatrix, which is the recalculated value, pass the raw Excel data, data or df (df is just a copy of data). Thus, all the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note, however, that this assumes your data is ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.
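If the first column of the sheet holds the row names, one way to keep them as labels is to use that column as the index before plotting. The sketch below assumes a small in-memory table in place of the Excel file; the column name "name" and the labels A, B, C are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for what read_excel returns: names in the first column,
# followed by the already computed correlation values.
raw = pd.DataFrame({
    "name": ["A", "B", "C"],
    "A": [1.0, 0.8, 0.3],
    "B": [0.8, 1.0, 0.5],
    "C": [0.3, 0.5, 1.0],
})

# Move the name column into the index so only numeric cells remain.
corr = raw.set_index("name")

fig, ax = plt.subplots(dpi=150)
# No call to .corr() here: the values are plotted as they are.
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
```

When reading from a file, passing index_col=0 to pd.read_excel should have the same effect as the set_index call.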
I deleted the first column (names) and added the names back later, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:
I have seen many questions on changing the tick frequency on SO, and they helped when I was building a line chart, but I have been struggling when it's a bar chart. Below is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar')
plt.show()
and that's the output I see. How do I change the tick frequency?
(To be clearer: a frequency of 5 on the x axis!)
Using Pandas plot function you can do:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar', xticks=np.arange(0,90,5))
Or better:
df.plot(kind='bar', xticks=list(df.index[0::5]))
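The same result can be reached on the returned Axes with plain matplotlib calls, which may help if the xticks argument is not honoured for bar plots in an older pandas version. This is a sketch under that assumption; step and positions are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, size=(90, 1)), columns=["Values"])

ax = df.plot(kind="bar")

# Bars sit at integer positions 0..len(df)-1, so keep every 5th position.
step = 5
positions = np.arange(0, len(df), step)
ax.set_xticks(positions)
# Label the kept ticks with the matching index values.
ax.set_xticklabels([str(v) for v in df.index[positions]], rotation=0)
```

Because the ticks are addressed by bar position rather than by value, this also works when the index is not a simple 0..n range.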
I have an imported excel file in python and want to create a bar chart.
In the bar chart, I want the bars to be separated by profit, 0-10, 10-20, 20-30...
How do I do this?
this is one of the things I have tried:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df.plot(kind="bar",x="profit", y="people")
df[df.profit<=10]
plt.show()
and:
df[df.profit range (10,20)]
It is a bit difficult to help you better without a sample of your data, but I constructed a dataset randomly that should have the shape of yours, so that this solution can hopefully be useful to you:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# For random data
import random
%matplotlib inline
df = pd.DataFrame({'profit':[random.choice([i for i in range(100)]) for x in range(100)], 'people':[random.choice([i for i in range(100)]) for x in range(100)]})
display(df)
out = pd.cut(df['profit'], bins=[x*10 for x in range(11)], include_lowest=True)
ax = out.value_counts(sort=False).plot.bar(rot=0, color="b", figsize=(14,4))
plt.xlabel("Profit")
plt.ylabel("People")
plt.show()
I had a look at another question on here (Pandas bar plot with binned range) and there they explained how this issue can be solved.
Hope it helps :)
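Note that value_counts gives the number of rows per profit range. If the bar height should instead be the total of the people column in each range, a groupby on the binned column is one way to get it. A small sketch with made-up numbers (the column name profit_range is an assumption):

```python
import pandas as pd

# Tiny hand-made table so the result is easy to check by eye.
df = pd.DataFrame({
    "profit": [3, 8, 12, 19, 25, 10],
    "people": [5, 1, 4, 2, 7, 3],
})

bins = list(range(0, 31, 10))  # edges 0, 10, 20, 30
df["profit_range"] = pd.cut(df["profit"], bins=bins, include_lowest=True)

# Number of rows that fall in each range (what the plot above shows).
row_counts = df["profit_range"].value_counts(sort=False)

# Total number of people in each range.
people_totals = df.groupby("profit_range", observed=False)["people"].sum()
```

Either series can then be drawn with .plot.bar(rot=0). The intervals are closed on the right, so a profit of exactly 10 lands in the first range.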
How can I achieve that using matplotlib?
Here is my code with the data you provided. As there are no classes (the values are all different, although the first example in your question does have classes), I assigned colors based on the numbers. You can take it from here toward whatever result you want to achieve. You just need pandas, seaborn and matplotlib:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import xls
df=pd.read_excel('data.xlsx')
# exclude Ranking values
df1 = df.iloc[:,1:-1]
# for each element it takes the value of the xls cell
df2=df1.applymap(lambda x: float(x.split('\n')[1]))
# now plot it
df_heatmap = df2
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(df_heatmap, square=True, ax=ax, annot=True, fmt="1.3f")
plt.yticks(rotation=0,fontsize=16);
plt.xticks(fontsize=12);
plt.tight_layout()
plt.savefig('dfcolorgraph.png')
Which produces the following picture.
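Since data.xlsx is not available here, the parsing step can be tried on a small made-up table. The sketch assumes, as the code above does, that each cell holds a label and a number separated by a line break; the column names and values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the spreadsheet: first and last columns are not plotted.
raw = pd.DataFrame({
    "Ranking": [1, 2],
    "col_a": ["alpha\n0.123", "beta\n0.456"],
    "col_b": ["gamma\n0.789", "delta\n0.012"],
    "Total": [10, 20],
})

# Drop the first and last columns, as in the answer.
cells = raw.iloc[:, 1:-1]

# Keep the part after the line break in every cell and convert it to float.
values = cells.apply(lambda col: col.str.split("\n").str[1].astype(float))

fig, ax = plt.subplots(figsize=(4, 4))
sns.heatmap(values, square=True, ax=ax, annot=True, fmt="1.3f")
```

Using the column-wise .str accessor here avoids applymap, which newer pandas versions mark as deprecated.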