How to use sns.jointplot for specific data? - python

I want to use jointplot from seaborn. I am using the following code, but I do not understand the sns.jointplot command. I want g on the x axis and Years on the y axis.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('My.txt', sep="\t", header=None)
df.info()
print(df)
Years=df[0]
date=df[1]
Lat=df[2]
Lon=df[3]
D=df[4]
g=df[5]

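sns.jointplot(data=df, x=5, y=0)
should work: the file is read with header=None, so the columns are labeled 0 to 5, with Years in column 0 and g in column 5. Otherwise, you will need to provide a Minimal, Complete, and Verifiable example that includes a toy dataset (refer to How to make good reproducible pandas examples).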

Related

Plot Correlation Table imported from excel with Python

So I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is ready and I don't want to calculate the correlation again, but I have not managed to do that. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing in corrMatrix, which holds the recalculated values, pass the Excel data itself: data or df (df is just a copy of data). Thus, all the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note that this assumes your data really is ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.
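If the first column of the spreadsheet holds the variable names, reading it as the index keeps the labels attached to the values. The sketch below uses a small invented matrix in place of the Excel file. The three names and all the numbers are placeholders, and the index_col=0 suggestion assumes the file is laid out that way.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

# Hypothetical precomputed correlation table; in practice this would come from
# something like pd.read_excel(path, index_col=0), so that the first column
# (the names) becomes the row index instead of a data column.
names = ["CLC", "GIEMS", "GLWD"]
corr = pd.DataFrame(
    [[1.00, 0.62, 0.48],
     [0.62, 1.00, 0.55],
     [0.48, 0.55, 1.00]],
    index=names,
    columns=names,
)

fig, ax = plt.subplots()
# No .corr() call: the values are plotted exactly as stored.
sn.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
```

With the names in the index, the row and column tick labels come from the DataFrame itself, so they do not have to be passed by hand.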
I have deleted the first column (names) and added the labels back afterwards, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:

Unable to change the tick frequency on my chart

I have seen many questions on changing the tick frequency on SO, and they helped when I was building a line chart, but I have been struggling now that it's a bar chart. My code is below:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar')
plt.show()
and that's the output I see. How do I change the tick frequency?
(To be clearer: a frequency of 5 on the x axis!)
Using Pandas plot function you can do:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar', xticks=np.arange(0,90,5))
Or better:
df.plot(kind='bar', xticks=list(df.index[0::5]))
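The same result can be reached through the returned matplotlib axes, which makes the position-versus-label distinction explicit. This is a minimal sketch with random data as in the question; the seed is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, size=(90, 1)), columns=["Values"])

ax = df.plot(kind="bar")

# Bars are drawn at positions 0, 1, ..., len(df) - 1 regardless of the index,
# so pick every 5th position and label it with the matching index value.
positions = np.arange(0, len(df), 5)
ax.set_xticks(positions)
ax.set_xticklabels(df.index[positions].astype(str), rotation=0)
```

Setting the labels from df.index[positions] keeps the ticks correct even when the index does not start at 0 or is not numeric.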

Plotting a bar chart

I have an imported excel file in python and want to create a bar chart.
In the bar chart, I want the bars to be separated by profit, 0-10, 10-20, 20-30...
How do I do this?
This is one of the things I have tried:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df.plot(kind="bar",x="profit", y="people")
df[df.profit<=10]
plt.show()
and:
df[df.profit range (10,20)]
It is a bit difficult to help you better without a sample of your data, but I constructed a dataset randomly that should have the shape of yours, so that this solution can hopefully be useful to you:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# For random data
import random
%matplotlib inline
df = pd.DataFrame({'profit':[random.choice([i for i in range(100)]) for x in range(100)], 'people':[random.choice([i for i in range(100)]) for x in range(100)]})
display(df)
# Bin edges 0, 10, ..., 100, so that profits above 90 are not dropped
out = pd.cut(df['profit'], bins=[x*10 for x in range(11)], include_lowest=True)
ax = out.value_counts(sort=False).plot.bar(rot=0, color="b", figsize=(14,4))
plt.xlabel("Profit")
plt.ylabel("People")
plt.show()
I had a look at another question on here (Pandas bar plot with binned range) and there they explained how this issue can be solved.
Hope it helps :)
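The code above counts rows per profit range. If the bar height should instead be the total of the people column in each range, a groupby on the same bins is one option. This is a sketch with a tiny invented dataset, assuming people is a count to be summed.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as in the question.
df = pd.DataFrame({
    "profit": [5, 12, 18, 25, 95, 100],
    "people": [1, 2, 3, 4, 5, 6],
})

# Edges 0, 10, ..., 100; include_lowest=True keeps a profit of exactly 0.
bins = list(range(0, 101, 10))
profit_range = pd.cut(df["profit"], bins=bins, include_lowest=True)

# observed=False keeps empty ranges so every bin gets a (possibly zero) bar.
people_per_range = df.groupby(profit_range, observed=False)["people"].sum()

ax = people_per_range.plot.bar(rot=0, figsize=(14, 4))
ax.set_xlabel("Profit")
ax.set_ylabel("People")
```

Call matplotlib.pyplot.show() afterwards when running outside a notebook.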

Make pandas plot() show xlabel and xvalues

I am using the standard pandas.df.plot() function to plot two columns in a dataframe. For some reason, the x-axis values and the xlabel are not visible! There seem to be no options to turn them on in the function either (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html).
Does someone know what is going on, and how to correct it?
import matplotlib.cm as cm
import pandas as pd
ax1 = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn);
gives this:
This is a bug with Jupyter notebooks displaying pandas scatterplots that have a colorscale displayed while using Matplotlib as the plotting backend.
@june-skeeter has a solution in the answers that works. Alternatively, pass sharex=False to df.plot.scatter and you don't need to create subplots.
import matplotlib.cm as cm
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
df.plot.scatter(
x='t',
y='hlReference',
c='STEP_STRENGTH',
cmap=cm.autumn,
sharex=False
)
See the discussion in this closed pandas issue, which references the above solution in a related SO answer.
Still an issue with pandas v1.1.0. You can track the issue here: https://github.com/pandas-dev/pandas/issues/36064
Create your axes instance first and then send it as an argument to the plot()
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
fig,ax1=plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn,ax=ax1)
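Another way to sidestep the behaviour is to skip the pandas wrapper and call matplotlib directly, adding the colorbar by hand. This is an alternative sketch, not the fix proposed in the answers above, and it uses random data as a stand-in for the real frame.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((10, 3)),
    columns=["t", "hlReference", "STEP_STRENGTH"],
)

fig, ax = plt.subplots()
# Colour each point by STEP_STRENGTH, then attach a colorbar to the same axes.
points = ax.scatter(df["t"], df["hlReference"], c=df["STEP_STRENGTH"], cmap="autumn")
fig.colorbar(points, ax=ax, label="STEP_STRENGTH")

# The axis labels are set explicitly, so nothing depends on pandas defaults.
ax.set_xlabel("t")
ax.set_ylabel("hlReference")
```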

Python pandas groupby boxplots overlap

I'm puzzled by this Pandas/Matplotlib behaviour:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
series = pd.Series(np.arange(10))
classifier = lambda x: 'Odd' if x%2 else "Even"
grouped = series.groupby(classifier)
grouped.plot(kind='box')
plt.show()
How do I get the boxplots next to each other Pandas style i.e. with nice syntax? :)
(Pandas v. 0.16.2, Matplotlib v. 1.4.3)
Edit:
I know I could do this:
grouped = grouped.apply(pd.Series.to_frame)
but I would assume there's a cleaner way to do this?
So my general advice is to avoid plotting through pandas with the following exceptions:
Super quick 'n' dirty interactive exploration and inspection
Time series
Any other time, you'll want to use seaborn or roll your own matplotlib function. Since you're working with a dataframe, seaborn is your best bet, although labeled data support is very quickly coming down the pipe for matplotlib.
I'm also going to advise that you go ahead and create the dataframe with the classification stored inside of it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')
df = pd.DataFrame(np.arange(10), columns=['val'])
df['class'] = df['val'].apply(lambda x: 'Odd' if x%2 else "Even")
seaborn.boxplot(x='class', y='val', data=df, width=0.5)
seaborn.despine(offset=10, trim=True)
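For readers who would rather stay within pandas, DataFrame.boxplot with the by argument also draws the groups side by side once the classification is stored as a column. This is a minimal sketch using the same toy frame. np.where stands in for the lambda here, and the explicit axes is a choice made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10), columns=["val"])
df["class"] = np.where(df["val"] % 2, "Odd", "Even")

fig, ax = plt.subplots()
# One box per distinct value of 'class', drawn side by side on the same axes.
df.boxplot(column="val", by="class", ax=ax)
```

Call plt.show() afterwards when running outside a notebook.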
