plot graphs horizontally when using df.groupby.plot.bar - python

I want to graph 3 plots horizontally, side by side.
Three graphs are generated using the code below:
df.groupby(pd.cut(df.col1, [0,1,2])).col2.mean().plot.bar()
df1.groupby(pd.cut(df1.col1, [0,1,2])).col2.mean().plot.bar()
df2.groupby(pd.cut(df2.col1, [0,1,2])).col2.mean().plot.bar()
I'm not sure where to set axes in this case. Any help would be appreciated.

You may simply use pandas' barh function.
df.groupby(pd.cut(df.col1, [0,1,2])).col2.mean().plot.barh()
This is an example, using this approach to create a dataframe with random samples:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.groupby(pd.cut(df.A, [0,10,20,30,40,50,60,70,80,90,100])).A.mean().plot.barh()
This snippet outputs the following plot:
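If the goal is to place the three bar charts next to each other rather than to rotate the bars, one option is to create the axes first with plt.subplots and pass each one to plot.bar through the ax argument. A minimal sketch, assuming three dataframes df, df1 and df2 that each have col1 and col2 columns (the random data and the make_df helper are made up for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def make_df(n=100):
    # Hypothetical stand-in for the asker's data: col1 in [0, 2), col2 arbitrary
    return pd.DataFrame({"col1": rng.uniform(0, 2, n), "col2": rng.normal(size=n)})

df, df1, df2 = make_df(), make_df(), make_df()

# One row, three columns of axes that share the y scale
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

for ax, frame in zip(axes, [df, df1, df2]):
    # Bin col1 into (0, 1] and (1, 2], then plot the mean of col2 per bin
    bins = pd.cut(frame.col1, [0, 1, 2])
    means = frame.groupby(bins, observed=False).col2.mean()
    means.plot.bar(ax=ax)

fig.tight_layout()
```

In a script, call plt.show() at the end to display the figure; in a notebook the figure is displayed automatically.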

Related

How to create a heatmap of Pandas dataframe in Python

I'm trying to create a heatmap and I am following the following question:
Making heatmap from pandas DataFrame
My dataframe looks like the following picture:
I tried the following code:
years = ["1860","1870", "1880","1890","1900","1910","1920","1930","1940","1950","1960","1970","1980","1990","2000"]
kantons = ["AG","AI","AR","BE","BL","BS","FR","GE","GL","GR","JU","LU","NE","NW","OW","SG","SH","SO","SZ","TG","TI","UR","VD","VS","ZG","ZH"]
df = pd(abs(dfYears), index=years, columns=kantons)
which gives the exception that:
"AG" can not be used as float
So I wondered whether I need to drop the index column, which does not seem possible.
Any suggestions?
When replicating similar data, you can do:
import pandas as pd
import numpy as np
years = ["1860","1870", "1880","1890","1900","1910","1920","1930","1940","1950","1960","1970","1980","1990","2000"]
kantons = ["AG","AI","AR","BE","BL","BS","FR","GE","GL","GR","JU","LU","NE","NW","OW","SG","SH","SO","SZ","TG","TI","UR","VD","VS","ZG","ZH"]
df = pd.DataFrame(np.random.randint(low=10000, high=200000, size=(15, 26)), index=years, columns=kantons)
df.style.background_gradient(cmap='Reds')
pandas has some built-in styles for the most common visualization needs. The background_gradient method is a simple way to highlight cells based on their values. Its cmap parameter selects the color map, which can be any of the matplotlib colormaps.
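df.style returns a styled table, which is mostly useful in a notebook. If an actual figure is needed, a heatmap can also be drawn with matplotlib's imshow. A minimal sketch, again using randomly generated stand-in values rather than the asker's real data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

years = [str(y) for y in range(1860, 2001, 10)]
kantons = ["AG", "AI", "AR", "BE", "BL", "BS", "FR", "GE", "GL", "GR", "JU", "LU", "NE",
           "NW", "OW", "SG", "SH", "SO", "SZ", "TG", "TI", "UR", "VD", "VS", "ZG", "ZH"]

# Random stand-in values (assumption: the real data is numeric, one row per year)
rng = np.random.default_rng(0)
values = rng.integers(10000, 200000, size=(len(years), len(kantons)))
df = pd.DataFrame(values, index=years, columns=kantons)

fig, ax = plt.subplots(figsize=(12, 6))
im = ax.imshow(df.to_numpy(), cmap="Reds", aspect="auto")

# Label the ticks with the dataframe's columns and index
ax.set_xticks(range(len(kantons)))
ax.set_xticklabels(kantons)
ax.set_yticks(range(len(years)))
ax.set_yticklabels(years)
fig.colorbar(im, ax=ax)
```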

Unable to draw KDE on python

I've created a Brownian motion and then I have taken the last values of 1000 entries repeated 10000 times. I was able to plot the histogram using the following code as follows:
import seaborn as sns
import matplotlib.pyplot as plt
# BM represents the list of values generated by the Brownian motion
fig, (ax1,ax2) = plt.subplots(2)
ax1.hist(BM[:,-1],12)
I've also been able to draw the KDE as follows, but I am unable to merge the two diagrams. Can someone please help me?
sns.kdeplot(data=BM[:,-1])
Try with sns.kdeplot(BM['col1']) where 'col1' is the name of the column you want to plot.
I'll give you a reproducible example that works for me.
import seaborn as sns
import pandas as pd
import numpy as np
BM = pd.DataFrame(np.array([-0.00871515, -0.0001227, -0.01449098, 0.01808527, 0.00074193, 0.01145541]), columns=['col1'])
BM.head(2)
col1
0 -0.008715
1 -0.000123
sns.kdeplot(BM['col1'])
Edit based on your additional question:
To have the histogram and a kde plot use this one:
sns.distplot(BM['col1'])

How to plot time series graph in jupyter?

I have tried to plot the data in order to achieve something like this:
But I could not and I just achieved this graph with plotly:
Here is the small sample of my data
Does anyone know how to achieve that graph?
Thanks in advance
You'll find a lot of good stuff on timeseries on plotly.ly/python. Still, I'd like to share some practical details that I find very useful:
1. Organize your data in a pandas dataframe.
2. Set up a basic plotly structure using fig=go.Figure(go.Scatter()).
3. Make your desired additions to that structure using fig.add_traces(go.Scatter()).
Plot:
Code:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# random data or other data sources
np.random.seed(123)
observations = 200
timestep = np.arange(0, observations/10, 0.1)
dates = pd.date_range('1/1/2020', periods=observations)
val1 = np.sin(timestep)
val2=val1+np.random.uniform(low=-1, high=1, size=observations)#.tolist()
# organize data in a pandas dataframe
df = pd.DataFrame({'Timestep': timestep, 'Date': dates,
                   'Value_1': val1,
                   'Value_2': val2})
# Main plotly figure structure (the noisy series)
fig = go.Figure([go.Scatter(x=df['Date'], y=df['Value_2'],
                            marker_color='black',
                            opacity=0.6,
                            name='Value 2')])
# One of many possible additions (the underlying sine curve)
fig.add_traces([go.Scatter(x=df['Date'], y=df['Value_1'],
                           marker_color='blue',
                           name='Value 1')])
# plot figure
fig.show()

Make pandas plot() show xlabel and xvalues

I am using the standard pandas.df.plot() function to plot two columns in a dataframe. For some reason, the x-axis values and the xlabel are not visible! There seem to be no options to turn them on in the function either (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html).
Does someone know what is going on, and how to correct it?
import matplotlib.cm as cm
import pandas as pd
ax1 = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn);
gives this:
This is a bug with Jupyter notebooks displaying pandas scatterplots that have a colorscale displayed while using Matplotlib as the plotting backend.
@june-skeeter has a solution in the answers that works. Alternatively, pass sharex=False to df.plot.scatter and you don't need to create subplots.
import matplotlib.cm as cm
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
df.plot.scatter(
    x='t',
    y='hlReference',
    c='STEP_STRENGTH',
    cmap=cm.autumn,
    sharex=False
)
See the discussion in this closed pandas issue, which references the above solution in a related SO answer.
Still an issue with pandas v1.1.0. You can track the issue here: https://github.com/pandas-dev/pandas/issues/36064
Create your axes instance first and then pass it to plot() through the ax argument:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
fig,ax1=plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn,ax=ax1)

Plot each column of Pandas dataframe pairwise against one column

I have a pandas dataframe where one of the columns is a set of labels that I would like to plot each of the other columns against in subplots. In other words, I want the y-axis of each subplot to use the same column, called 'labels', and I want a subplot for each of the remaining columns with the data from each column on the x-axis. I expected the following code snippet to achieve this, but I don't understand why this results in a single nonsensical plot:
examples.plot(subplots=True, layout=(-1, 3), figsize=(20, 20), y='labels', sharey=False)
The problem with that code is that you didn't specify an x value. It seems nonsensical because it's plotting the labels column against an index from 0 to the number of rows. As far as I know, you can't do what you want in pandas directly. You might want to check out seaborn though, it's another visualization library that has some nice grid plotting helpers.
Here's an example with your data:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
examples = pd.DataFrame(np.random.rand(10,4), columns=['a', 'b', 'c', 'labels'])
g = sns.PairGrid(examples, x_vars=['a', 'b', 'c'], y_vars='labels')
g = g.map(plt.plot)
This creates the following plot:
Obviously it doesn't look great with random data, but hopefully with your data it will look better.
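A similar grid can also be built without seaborn by creating the subplots with matplotlib and looping over the columns, passing each axes to the pandas plot call. A minimal sketch under the same assumption of random data (the feature_cols name is just for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

examples = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "labels"])

# Every column except 'labels' gets its own subplot
feature_cols = [c for c in examples.columns if c != "labels"]

fig, axes = plt.subplots(1, len(feature_cols), figsize=(15, 4), sharey=True)
for ax, col in zip(axes, feature_cols):
    # One feature column on the x-axis, the shared 'labels' column on the y-axis
    examples.plot.scatter(x=col, y="labels", ax=ax)

fig.tight_layout()
```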
