Python Change axis on Multi Histogram plot - python

I have a pandas DataFrame df for which I plot a multi-histogram as follows:
df.hist(bins=20)
This gives me a result that looks like this (yes, this example is ugly since there is only one value per histogram, sorry):
I have a subplot for each numerical column of my DataFrame.
Now I want all my histograms to have an x-axis between 0 and 1. I saw that the hist() function takes an ax parameter, but I cannot manage to make it work.
How can I do that?
EDIT:
Here is a minimal example:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
df.hist(bins=20)
plt.show()

Here is a solution that works, but for sure is not ideal:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
ax = df.hist(bins=20)  # 2D array of Axes, one per numerical column
for x in ax:
    for y in x:
        y.set_xlim(0,1)
plt.show()
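A possibly tidier variant, offered as a sketch rather than the definitive answer: DataFrame.hist forwards extra keyword arguments to matplotlib's hist, so range=(0, 1) makes the 20 bins span 0 to 1 in every subplot. The names rows, cols and axes are only illustrative, and the loop over axes.ravel() pins the visible limits as well.

```python
import matplotlib.pyplot as plt
import pandas as pd

rows = [(0, 0, 0, 0, 0.5, 0, 0, 0, 1), (0, 0, 0, 0, 0.5, 0, 0, 0, 1)]
cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9']
df = pd.DataFrame(rows, columns=cols)

# range=(0, 1) is passed through to matplotlib's hist, so every
# subplot uses the same 20 bins between 0 and 1.
axes = df.hist(bins=20, range=(0, 1))

# Flatten the 2D array of Axes and fix the visible x-limits too.
for ax in axes.ravel():
    ax.set_xlim(0, 1)
```

Call plt.show() afterwards to display the figure.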

Related

Transposing x and y axes with matplotlib and pandas

I'm trying to use a bar chart to visualize my csv data. The data looks like this:
question,count_1,count_2,count_3,count_4,count_5
Q1,0,0,6,0,0
Q2,6,0,0,0,0
Q3,3,2,1,0,0
Q4,0,0,6,0,0
Q5,6,0,0,0,0
Q6,0,6,0,0,0
Q7,6,0,0,0,0
Q8,0,0,0,5,1
Q9,1,4,0,0,1
Q10,0,0,1,5,0
Here is my code
import pandas as pd
import csv
import matplotlib.pyplot as plt
df = pd.read_csv('example.csv')
ax = df.set_index(['question']).plot.bar(stacked=True)
ax.legend(loc='best')
plt.show()
Which gives me:
What I'm trying to do is flip the x and y axes. I want the bars to be horizontal and y axis to be the questions. I tried to transpose my data frame using:
ax = df.set_index(['question']).T.plot.bar(stacked=True)
but that gives me:
which is not what I want. Can anyone help?
To get horizontal bars (flipping the x and y axes), use barh (horizontal bar) instead of bar. The code would be:
import pandas as pd
import csv
import matplotlib.pyplot as plt
df = pd.read_csv('example.csv')
ax = df.set_index(['question']).plot.barh(stacked=True)
ax.legend(loc='best')
plt.show()
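For a version that runs without example.csv, here is a small sketch that reads the same data from an in-memory string. The variable csv_text and the call to invert_yaxis() are additions for illustration: barh draws the first row at the bottom, so inverting the y-axis puts Q1 at the top.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Same data as in the question, held in a string instead of a file.
csv_text = """question,count_1,count_2,count_3,count_4,count_5
Q1,0,0,6,0,0
Q2,6,0,0,0,0
Q3,3,2,1,0,0
Q4,0,0,6,0,0
Q5,6,0,0,0,0
Q6,0,6,0,0,0
Q7,6,0,0,0,0
Q8,0,0,0,5,1
Q9,1,4,0,0,1
Q10,0,0,1,5,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Horizontal stacked bars: one bar per question, one segment per count column.
ax = df.set_index('question').plot.barh(stacked=True)
ax.invert_yaxis()  # list Q1 at the top instead of the bottom
ax.legend(loc='best')
```

Call plt.show() afterwards to display the figure.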

How to make a distplot for each column in a pandas dataframe

I'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have a list of columns whose values I want to plot as histograms, and I tried plotting a histogram for each of them:
continuous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continuous:
    sns.distplot(df[column])
And I get this result, only one plot with (presumably) all histograms:
My desired result is multiple histograms that look like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot(), so that each histogram is drawn in its own figure.
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range('2014-07-01', periods=300).tolist()  # pd.datetime is gone from recent pandas; a date string works
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
    plt.figure() # <==================== here!
    sns.distplot(df[column])
distplot has been deprecated since seaborn 0.11, and its deprecation warning states that it will be removed in v0.14.0. You can instead use sns.histplot() to plot a histogram for every column of the dataframe (numerical features only), here on a 2x5 grid of subplots, which is enough for up to ten columns:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
    sns.histplot(df[col], ax=ax[i]) # histogram call
    ax[i].set_title(col)
    # remove scientific notation for both axes
    ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
If you specifically want an estimate of the probability density function of a continuous random variable via kernel density estimation (mimicking the default behavior of sns.distplot()), add kde=True to the sns.histplot() call, and a density curve will be overlaid on each histogram.
Also works when looping with plt.show() inside:
for column in df.columns:
    sns.distplot(df[column])
    plt.show()

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Approved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a bar plot grouped by the Period column, showing every value in the Observations column, with bars colored by the Result column.
How can I do this?
I tried sns.barplot, but it merged the values of the Observations column into a single bar per group (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
            y='Observations',
            color=cmap[result_codes],
            legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
I am not sure I understood your question completely, but if you would like the bars grouped by month and then stacked, use the following (note that I changed your data so that one month has more than one status):
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Approved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)
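To see what the one-liner above actually plots, it can help to build the intermediate table first. This is a sketch: the name table and the fill_value=0 argument are additions, and the "Approved" spelling is assumed for the December row.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"],
        ["2019/oct", 30, "Approved"], ["2019/oct", 40, "Under evaluation"],
        ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Approved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])

# Sum the observations per (Period, Result) pair, then pivot Result
# into columns; missing combinations become 0 instead of NaN.
table = (df.groupby(['Period', 'Result'])['Observations']
           .sum()
           .unstack('Result', fill_value=0))
print(table)

# One stacked bar per period, one colored segment per result.
ax = table.plot(kind='bar', stacked=True)
```

Note that the periods are sorted as strings here (dec, nov, oct), which is alphabetical rather than chronological; converting Period to a real date or an ordered categorical would be one way to control the order.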

Plotting data with categorical x and y axes in python

I have a list of case and control samples along with the information about what characteristics are present or absent in each of them. A dataframe including the information can be generated by Pandas:
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
I need to visualize this data as a dot plot/scatter plot in which both the x and y axes are categorical and presence/absence is coded by different marker shapes. Something like the following:
Patient|   x      x      -
Control|   -      x      -
        ____________________
         GeneA  GeneB  GeneC
I am new to Matplotlib/seaborn and I can plot simple line plots and scatter plots. But searching online I could not find any instructions or plot similar to what I need here.
A quick way would be:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present'])
Thanks to @DEEPAK SURANA for adding labels to the colorbar.
I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.
# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)
# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
    df_gene = df[[gene, 'level']]
    cList = ['blue' if x == True else 'red' for x in df[gene]]
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()
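If presence and absence should be told apart by marker shape rather than color, as the question asks, one possible sketch is to call scatter twice, once per marker. The names present, rows, cols, absent_rows and absent_cols are illustrative, and the marker choices 'x' and '_' are just one option.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

# Row/column positions of the present (True) and absent (False) cells.
present = df.to_numpy(dtype=bool)
rows, cols = np.nonzero(present)
absent_rows, absent_cols = np.nonzero(~present)

fig, ax = plt.subplots()
ax.scatter(cols, rows, marker='x', s=100, label='Present')
ax.scatter(absent_cols, absent_rows, marker='_', s=100, label='Absent')

# Label the integer positions with the categorical names.
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels(df.index)
ax.invert_yaxis()  # first row (Patient) at the top, as in the sketch
ax.margins(0.3)    # some padding so markers do not sit on the frame
ax.legend()
```

Call plt.show() afterwards to display the figure.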
Something like this might work
import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html
and https://matplotlib.org/api/ticker_api.html
I think you have to convert the boolean values to zeros and ones to make it work, with something like df.astype(int).

How to plot DataFrames? in Python

I'm trying to plot a DataFrame, but I'm not getting the results I need. This is an example of what I'm trying to do and what I'm currently getting. (I'm new in Python)
import pandas as pd
import matplotlib.pyplot as plt
my_data = {1965:{'a':52, 'b':54, 'c':67, 'd':45},
1966:{'a':34, 'b':34, 'c':35, 'd':76},
1967:{'a':56, 'b':56, 'c':54, 'd':34}}
df = pd.DataFrame(my_data)
df.plot( style=[])
plt.show()
I'm getting the following graph, but what I need is the years on the x-axis and one line for each label that is currently on the x-axis (a, b, c, d). Thanks for your help!
import pandas as pd
import matplotlib.pyplot as plt
my_data = {1965:{'a':52, 'b':54, 'c':67, 'd':45},
1966:{'a':34, 'b':34, 'c':35, 'd':76},
1967:{'a':56, 'b':56, 'c':54, 'd':34}}
df = pd.DataFrame(my_data)
df.T.plot( kind='bar') # or df.T.plot.bar()
plt.show()
Updates:
If this is what you want:
df = pd.DataFrame(my_data)
df.columns=[str(x) for x in df.columns] # convert year numerical values to str
df.T.plot()
plt.show()
you can do it this way:
import matplotlib.ticker as mtick  # needed for FormatStrFormatter below
ax = df.T.plot(linewidth=2.5)
plt.locator_params(nbins=len(df.columns))
ax.xaxis.set_major_formatter(mtick.FormatStrFormatter('%4d'))
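Putting the pieces together, a self-contained sketch of the line-plot version could look like the following. Setting the ticks explicitly with set_xticks is an alternative to the locator/formatter approach, chosen here only for brevity; marker='o' and the axis label are illustrative additions.

```python
import matplotlib.pyplot as plt
import pandas as pd

my_data = {1965: {'a': 52, 'b': 54, 'c': 67, 'd': 45},
           1966: {'a': 34, 'b': 34, 'c': 35, 'd': 76},
           1967: {'a': 56, 'b': 56, 'c': 54, 'd': 34}}
df = pd.DataFrame(my_data)

# Transposing puts the years in the index (x-axis) and a, b, c, d
# in the columns, so each of them becomes its own line.
ax = df.T.plot(marker='o')
ax.set_xticks(list(df.columns))  # one tick per year, no fractional years
ax.set_xlabel('year')
```

Call plt.show() afterwards to display the figure.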
