Create Altair Chart of Value Counts of Multiple Columns - python

Using Altair, how can I chart the value_counts() of multiple columns? This is easy to do with pandas and matplotlib. How can the same chart be created with Altair?
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Col1':[0,1,2,3],
'Col2':[0,1,2,2],
'Col3':[2,3,3,3]})
pd.DataFrame({col:df[col].value_counts(normalize=True) for col in df}).plot(kind='bar')

You could do this:
import pandas as pd
import altair as alt
df = pd.DataFrame({
'Col1':[0,1,2,3],
'Col2':[0,1,2,2],
'Col3':[2,3,3,3]
}).melt(var_name='column')
alt.Chart(df).mark_bar().encode(
x='column',
y='count()',
column='value:O',
color='column'
)
In the next major release of Altair (version 5), you can use the offset channels (such as xOffset) instead of faceting.
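The chart above plots raw counts, whereas the matplotlib version in the question uses value_counts(normalize=True). Here is a minimal sketch of one way to get the proportions into long form with pandas first. The names counts, long_df, and proportion are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': [0, 1, 2, 3],
    'Col2': [0, 1, 2, 2],
    'Col3': [2, 3, 3, 3],
})

# Normalized value counts per column, as in the question's matplotlib code.
counts = pd.DataFrame({col: df[col].value_counts(normalize=True) for col in df})

# Long form: one row per (value, column) pair.
# Pairs that never occur show up as NaN after the reshape and are dropped.
long_df = (
    counts.rename_axis('value')
    .reset_index()
    .melt(id_vars='value', var_name='column', value_name='proportion')
    .dropna(subset=['proportion'])
)
print(long_df)
```

The resulting long_df could then be passed to alt.Chart with y='proportion:Q' in place of y='count()'.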

Related

Can I make a pie chart based on indexes in Python?

How can I make a pie chart in Python from this dataframe, using its index as the labels?
Here is a reproducible example of what the df looks like. The real one has many more rows.
import pandas as pd
data = [["70%"], ["20%"], ["10%"]]
example = pd.DataFrame(data, columns = ['percentage'])
example.index = ['Lasiogl', 'Centella', 'Osmia']
example
You can use matplotlib to plot the pie chart, using the dataframe's index as the labels. The percentage strings have to be converted to numbers first:
import matplotlib.pyplot as plt
import pandas as pd
data = [["70%"], ["20%"], ["10%"]]
example = pd.DataFrame(data, columns=['percentage'])
example.index = ['Lasiogl', 'Centella', 'Osmia']
# Strip the '%' sign so plt.pie receives floats rather than strings
sizes = example['percentage'].str.rstrip('%').astype(float)
plt.pie(sizes, labels=example.index, autopct='%1.1f%%')
plt.show()

Unable to change the tick frequency on my chart

I have seen many questions on SO about changing the tick frequency, and they helped when I was building a line chart, but I am struggling with a bar chart. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar')
plt.show()
and that is the output I see. How do I change the tick frequency?
(To be clearer: I want a tick every 5 units on the x-axis.)
Using the pandas plot function, you can pass the tick positions via xticks:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar', xticks=np.arange(0,90,5))
Or better, derive the ticks from the index so they do not depend on a hard-coded length:
df.plot(kind='bar', xticks=list(df.index[0::5]))
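As an alternative, the ticks can be thinned on the Axes object that df.plot returns. This is only a sketch and not part of the original answer. It assumes the bars sit at integer positions 0..89, which is how pandas lays out a bar plot, so the labels have to be reset together with the positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (90, 1)), columns=['Values'])

# df.plot returns the matplotlib Axes, with one tick per bar by default.
ax = df.plot(kind='bar')

# Keep every 5th tick position and give each one its matching index label.
ax.set_xticks(ax.get_xticks()[::5])
ax.set_xticklabels([str(i) for i in df.index[::5]])
```

Setting the labels explicitly matters here: bar plots use fixed tick labels, so changing only the positions would leave the labels out of step with the bars.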

Line chart using plotly

The following is my dataframe.
How can I create a line chart using plotly where the x-axis contains the years and the y-axis contains the production value for each year?
IIUC, this code should do the job:
import plotly.graph_objects as go
import pandas as pd
df = pd.DataFrame({'Minerals':['Nat. Gas'],
'2013':[5886],
'2014':[5258],
'2015':[5214],
'2016':[5073],
'2017':[5009],})
fig = go.Figure(data=go.Scatter(x=df.columns[1:],
y=df.loc[0][1:]))
fig.show()
and you get a line chart with the years on the x-axis and the production values on the y-axis.
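If the real dataframe has more than one mineral, selecting the row by label may be easier to read than slicing by position. A minimal sketch, assuming the Minerals column holds unique names (the variables production, years, and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Minerals': ['Nat. Gas'],
                   '2013': [5886],
                   '2014': [5258],
                   '2015': [5214],
                   '2016': [5073],
                   '2017': [5009]})

# Use the mineral name as the index so a row can be picked by label.
production = df.set_index('Minerals').loc['Nat. Gas']

years = production.index.tolist()   # column labels, i.e. the years
values = production.tolist()        # production value for each year
print(years, values)
```

years and values could then be passed as x and y to go.Scatter in the code above.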

Pandas: seaborn countplot from several columns

I have a dataframe with several categorical columns. I know how to make a countplot, which routinely plots ONE column.
Q: How can I plot the maximum count from ALL columns in one plot?
Here is an example dataframe to clarify the question:
import pandas as pd
import numpy as np
import seaborn as sns
testdf=pd.DataFrame(({ 'Ahome' : pd.Categorical(["home"]*10),
'Bsearch' : pd.Categorical(["search"]*8 + ["NO"]*2),
'Cbuy' : pd.Categorical(["buy"]*5 + ["NO"]*5),
'Dcheck' : pd.Categorical(["check"]*3 + ["NO"]*7),
} ))
testdf.head(10)
sns.countplot(data=testdf,x='Bsearch');
The last line just uses a normal countplot for one column. I'd like to have the column categories (home, search, buy and check) on the x-axis and their frequencies on the y-axis.
You need to use countplot as below:
df = pd.melt(testdf)
sns.countplot(data=df.loc[df['value']!="NO"], x='variable', hue='value')
As @HarvIpan points out, melt creates a long-form dataframe with the column names as entries. Calling countplot on this dataframe produces the correct plot.
Unlike the existing solution, I would recommend not using the hue argument at all.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df=pd.DataFrame(({ 'Ahome' : pd.Categorical(["home"]*10),
'Bsearch' : pd.Categorical(["search"]*8 + ["NO"]*2),
'Cbuy' : pd.Categorical(["buy"]*5 + ["NO"]*5),
'Dcheck' : pd.Categorical(["check"]*3 + ["NO"]*7),
} ))
df2 = df.melt(value_vars=df.columns)
df2 = df2[df2["value"] != "NO"]
sns.countplot(data=df2, x="variable")
plt.show()
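To check the bar heights without drawing anything, the same counts can be computed with pandas alone. A small sketch under the same assumptions as above; plain string lists stand in for the pd.Categorical columns, and long_df and counts are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Ahome': ['home'] * 10,
    'Bsearch': ['search'] * 8 + ['NO'] * 2,
    'Cbuy': ['buy'] * 5 + ['NO'] * 5,
    'Dcheck': ['check'] * 3 + ['NO'] * 7,
})

# melt() with no arguments yields two columns: 'variable' and 'value'.
long_df = df.melt()

# Drop the "NO" entries, then count the remaining rows per original column.
counts = (
    long_df.loc[long_df['value'] != 'NO', 'variable']
    .value_counts()
    .sort_index()
)
print(counts)
```

These are the values countplot draws for each category on the x-axis.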

Seaborn visualize groups

How can I plot this data frame using seaborn to show the KPI per model?
allFrame = pd.DataFrame({'modelName':['first','second', 'third'],
'kpi_1':[1,2,3],
'kpi_2':[2,4,3]})
Not like sns.barplot(x="kpi_2", y="kpi_1", hue="modelName", data=allFrame),
but rather with the bars grouped per KPI.
Try melting the dataframe first, and then you can plot using seaborn:
import pandas as pd
import seaborn as sns
allFrame = pd.DataFrame({'modelName':['first','second', 'third'],
'kpi_1':[1,2,3],
'kpi_2':[2,4,3]})
allFrame2 = pd.melt(frame=allFrame,
id_vars=['modelName'],
value_vars=["kpi_1","kpi_2"],
value_name="Values", var_name="kpis")
sns.barplot(x="kpis", y="Values", hue="modelName", data=allFrame2)
Thanks!
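To see what the melted frame in that answer looks like before plotting, here is a short sketch that lists its rows. The rows variable is only for illustration:

```python
import pandas as pd

allFrame = pd.DataFrame({'modelName': ['first', 'second', 'third'],
                         'kpi_1': [1, 2, 3],
                         'kpi_2': [2, 4, 3]})

# One row per (model, KPI) pair: the KPI name goes into 'kpis',
# and its number goes into 'Values'.
allFrame2 = pd.melt(frame=allFrame,
                    id_vars=['modelName'],
                    value_vars=['kpi_1', 'kpi_2'],
                    value_name='Values', var_name='kpis')

rows = list(allFrame2.itertuples(index=False, name=None))
for row in rows:
    print(row)
```

Each of the three models appears once per KPI, which is what lets barplot group the bars by kpis and color them by modelName.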
