bar chart with Matplotlib - python

Here is my data structure:
data = {'2013': {1:25,2:81,3:15}, '2014': {1:28, 2:65, 3:75}, '2015': {1:78,2:91,3:86 }}
My x-axis is the numbers [1, 2, 3].
My y-axis is the quantity of each number. For example, in 2013, 1 is on the x-axis and its quantity is 25.
I want a separate graph for each year.
I would like to draw a bar chart with matplotlib, with a legend on it.

import matplotlib.pyplot as plt
import pandas as pd
data = {'2013': {1:25,2:81,3:15}, '2014': {1:28, 2:65, 3:75}, '2015': {1:78,2:91,3:86 }}
df = pd.DataFrame(data)
df.plot(kind='bar')
plt.show()
I like pandas because it takes your data and plots it without any manual manipulation.

You can access the keys of a dictionary via dict.keys() and the values via dict.values().
If you wanted to plot, say, the data for 2013, you can do:
import matplotlib.pyplot as pl
# wrap the dict views in lists so bar() receives plain sequences
x_13 = list(data['2013'].keys())
y_13 = list(data['2013'].values())
pl.bar(x_13, y_13, label='2013')
pl.legend()
pl.show()
That should do the trick. More elegantly, you can simply do:
year = '2013'
pl.bar(list(data[year].keys()), list(data[year].values()), label=year)
which would allow you to loop it:
for year in ['2013', '2014', '2015']:
    pl.bar(list(data[year].keys()), list(data[year].values()), label=year)
Note that this draws every year at the same x positions, so the bars overlap unless each year gets its own figure or an x offset.
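Since the question asks for an individual graph per year, one option is a row of subplots, one panel per year. This is a minimal sketch, assuming the same `data` dictionary as in the question; the figure size and the shared y-axis are my own choices, not something the question specifies.

```python
import matplotlib.pyplot as plt

data = {'2013': {1: 25, 2: 81, 3: 15},
        '2014': {1: 28, 2: 65, 3: 75},
        '2015': {1: 78, 2: 91, 3: 86}}

years = sorted(data)
# one panel per year, sharing the y-axis so bar heights are comparable
fig, axes = plt.subplots(1, len(years), sharey=True, figsize=(9, 3))
for ax, year in zip(axes, years):
    xs = list(data[year].keys())
    ys = list(data[year].values())
    ax.bar(xs, ys, label=year)
    ax.set_xticks(xs)   # show only the ticks 1, 2, 3
    ax.legend()
fig.tight_layout()
```

Call plt.show() afterwards to display the figure, or fig.savefig(...) to write it to a file.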

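You can do this a few ways.
The functional way, using bar():
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = {'2013': {1: 25, 2: 81, 3: 15}, '2014': {1: 28, 2: 65, 3: 75}, '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)
X_axis = np.arange(len(df))
# shift each year by one bar width so the bars sit side by side
plt.bar(X_axis - 0.1, height=df["2013"], label='2013', width=0.1)
plt.bar(X_axis, height=df["2014"], label='2014', width=0.1)
plt.bar(X_axis + 0.1, height=df["2015"], label='2015', width=0.1)
# label the ticks with the actual x values (1, 2, 3) instead of 0, 1, 2
plt.xticks(X_axis, df.index)
plt.legend()
plt.show()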
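The object-oriented way, using figure():
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = {'2013': {1: 25, 2: 81, 3: 15}, '2014': {1: 28, 2: 65, 3: 75}, '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)
fig = plt.figure()
axes = fig.add_axes([.1, .1, .8, .8])
X_axis = np.arange(len(df))
axes.bar(X_axis - .25, df["2013"], color='b', width=.25, label='2013')
axes.bar(X_axis, df["2014"], color='r', width=.25, label='2014')
axes.bar(X_axis + .25, df["2015"], color='g', width=.25, label='2015')
# label the ticks with the actual x values and show the legend
axes.set_xticks(X_axis)
axes.set_xticklabels(df.index)
axes.legend()
plt.show()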
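Both variants above hard-code one bar() call and one offset per year. If the number of years can change, the offsets can be computed in a loop instead. This is only a sketch: `group_width`, `bar_width` and the `offsets` dictionary are names I introduce for illustration, and 0.8 is an arbitrary choice for the total width of a group.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {'2013': {1: 25, 2: 81, 3: 15},
        '2014': {1: 28, 2: 65, 3: 75},
        '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)

group_width = 0.8                          # total width used by one group of bars
bar_width = group_width / len(df.columns)  # width of a single bar
positions = np.arange(len(df))             # one tick position per x value

fig, ax = plt.subplots()
offsets = {}
for i, year in enumerate(df.columns):
    # centre the whole group of bars on each tick position
    offset = (i - (len(df.columns) - 1) / 2) * bar_width
    offsets[year] = offset
    ax.bar(positions + offset, df[year], width=bar_width, label=year)

ax.set_xticks(positions)
ax.set_xticklabels(df.index)
ax.legend()
```

Call plt.show() to display the chart. Adding a fourth year to `data` needs no further changes, because the bar width and offsets are derived from the number of columns.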

Related

Labeling year on time series

I am working on a timeseries plot from data that looks like the following:
import pandas as pd
data = {'index': [1, 34, 78, 900, 1200, 5000, 9001, 12000, 15234, 23432],
'rating': [90, 85, 89, 82, 78, 65, 54, 32, 39, 45],
'Year': [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2008, 2009, 2009]}
df = pd.DataFrame(data)
The main issue is the lack of actual dates. I have plotted the data in index order: the data is sorted by ascending index, but the index values themselves are meaningless.
I have plotted the data using
import plotly.express as px
fig = px.line(df, x='index', y='rating')
fig.show()
but would like to shade or label each year on the plot (could just be vertical dotted lines separating years, or alternated grey shades beneath the line but above the axis per year).
I am assuming that you have already sorted the DataFrame using the index column.
Here's a solution using a bar (column) chart in matplotlib.
import matplotlib.pyplot as plt
import numpy as np
# [optional] create a dictionary of colors with year as keys. It is better if this is dynamically generated if you have a lot of years.
color_cycle = {'2005': 'red', '2006': 'blue', '2007': 'green', '2008': 'orange', '2009': 'purple'}
# I am assuming that the rating data is sorted by index already
# plot rating as a column chart using equal spacing on the x-axis
plt.bar(x=np.arange(len(df)), height=df['rating'], width=0.8, color=[color_cycle[str(year)] for year in df['Year']])
# add Year as x-axis labels
plt.xticks(np.arange(len(df)), df['Year'])
# add labels to the axes
plt.xlabel('Year')
plt.ylabel('Rating')
# display the plot
plt.show()
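If the line plot should be kept and each year shaded, as the question asks, matplotlib's axvspan can draw alternating grey bands. This is a minimal sketch, assuming equal spacing on the x-axis (as in the bar chart above) and rows sorted by index; the `spans` list is only a helper I introduce to record where each year starts and ends.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'index': [1, 34, 78, 900, 1200, 5000, 9001, 12000, 15234, 23432],
    'rating': [90, 85, 89, 82, 78, 65, 54, 32, 39, 45],
    'Year': [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2008, 2009, 2009]})
# sort by the index column, then use the row position as the x value
df = df.sort_values('index').reset_index(drop=True)
positions = np.arange(len(df))

fig, ax = plt.subplots()
ax.plot(positions, df['rating'], marker='o')

spans = []
for i, (year, rows) in enumerate(df.groupby('Year')):
    # the band runs from half a step before the first row of the year
    # to half a step after its last row
    start = rows.index.min() - 0.5
    end = rows.index.max() + 0.5
    spans.append((year, start, end))
    if i % 2 == 0:
        ax.axvspan(start, end, color='grey', alpha=0.2)
    # write the year above the band (x in data units, y in axes units)
    ax.text((start + end) / 2, 1.01, str(year),
            transform=ax.get_xaxis_transform(), ha='center', va='bottom')

ax.set_xlabel('Observation (sorted by index)')
ax.set_ylabel('Rating')
```

Call plt.show() to display the plot. For dotted separators instead of shading, ax.axvline(end, linestyle=':') at each boundary would be the equivalent call.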

Python: Iterating though dataframe columns as values in a function that prints charts

I'm trying to iterate through numeric fields in a data frame and create two separate bar charts one for Test1 and another for Test2 scores grouped by Name. I have a for loop that I get a type error on. I have a small sample of the data below but this for loop would run for data frame larger than 25 fields. Below is my code and error:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}
df = pd.DataFrame(data)
for columns in df.columns[1:]:
    data = df[(df.columns > 80 )].groupby(
        df.Name, as_index = True).agg(
        {columns: "sum"})
    fig, (ax) = plt.subplots( figsize = (24,7))
    data.plot(kind = 'bar', stacked = False,
        ax = ax)
TypeError: '>' not supported between instances of 'str' and 'int'
The error comes from the expression df.columns > 80 in this line:
data = df[(df.columns > 80 )].groupby(df.Name, as_index = True).agg({columns: "sum"})
df.columns holds the column labels ("Name", "Test1", "Test2"), which are strings, so comparing them with the integer 80 raises the TypeError before any cell values are looked at. Through some trial and error, I revised your program to compare the values in the "Test1" and "Test2" columns instead. Following is the revised code.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}
df = pd.DataFrame(data)
for columns in df.columns[1:]:
    data = df[(df['Test1'] > 20) | (df['Test2'] > 80)].groupby(df.Name, as_index = True).agg({columns: "sum"})
    fig, (ax) = plt.subplots( figsize = (24,7))
    data.plot(kind = 'bar', stacked = False, ax = ax)
plt.show()
Running that program produced the two bar charts.
You might want to experiment with the comparison values, but I think this should provide you with the information to move forward on your program.
Hope that helped.
Regards.
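For a data frame with many more fields, naming each column in the filter does not scale. One possible variation is to let pandas pick the numeric columns and draw one chart per column. This is a sketch under my own assumptions: it sums every row per name without a threshold filter, and the `charts` dictionary is a helper I introduce to keep the aggregated values.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
        'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
        'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}
df = pd.DataFrame(data)

# select only the numeric columns, so "Name" is never compared or summed
numeric_cols = df.select_dtypes(include='number').columns

charts = {}
for col in numeric_cols:
    # total score per name for this column
    totals = df.groupby('Name')[col].sum()
    fig, ax = plt.subplots(figsize=(8, 4))
    totals.plot(kind='bar', ax=ax, title=col)
    charts[col] = totals
```

Call plt.show() after the loop to display all charts. A per-column filter such as df[df[col] > 80] could be applied before the groupby, but it would need a guard for columns where no row passes the filter.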

python seaborn: customize line plot and scatterplot together (also legend)

df = pd.DataFrame({
'id': {0: -3, 1: 2, 2: -3, 3: 1},
'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})
df
and I hope to get a plot like the following:
I know that with the following code I can get a close plot, but I also want the line width to vary with the "count" variable. When I tried adding size = 'count', I did not get a meaningful plot. As for the legend, I want only one legend, for "indicator", rather than two:
plt.figure()
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df)
sns.scatterplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)
To answer the second part of your question - you can disable the lineplot legend like so:
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df, legend=False)
This will leave you with two legend groups - one for colours and one for sizes. This is the easiest way, but you can also tinker with plt.legend() and build your own from scratch.
As for making the lines vary their thickness dynamically from one point to another, I don't think you can do it using seaborn. For something like that you'd need a more low-level library, like bokeh or use matplotlib directly to draw connecting lines between line markers, adjusting for their varying size.
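The matplotlib route mentioned above could look roughly like the following, using LineCollection to give each line segment its own width. This is a sketch, not a definitive implementation: the width and marker-size scaling formulas, the `colors` mapping and the `widths_by_group` dictionary are my own assumptions. Each segment has a constant width taken from the mean "count" of its two end points, so with only two points per indicator, as in the sample data, every line is a single segment.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

df = pd.DataFrame({
    'id': {0: -3, 1: 2, 2: -3, 3: 1},
    'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
    'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
    'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})

colors = {'A': 'tab:blue', 'B': 'tab:orange'}  # assumed colour per indicator
max_count = df['count'].max()

fig, ax = plt.subplots()
widths_by_group = {}
for name, group in df.groupby('indicator'):
    group = group.sort_values('id')
    points = group[['id', 'val']].to_numpy()
    counts = group['count'].to_numpy()
    # one segment between each pair of consecutive points: shape (n-1, 2, 2)
    segments = np.stack([points[:-1], points[1:]], axis=1)
    # segment width from the mean count of its end points, scaled to 1..10
    seg_counts = (counts[:-1] + counts[1:]) / 2
    widths = 1 + 9 * seg_counts / max_count
    widths_by_group[name] = widths
    ax.add_collection(LineCollection(segments, linewidths=widths,
                                     colors=colors[name]))
    # markers sized by count; only these carry a legend label
    ax.scatter(group['id'], group['val'], s=20 + 200 * counts / max_count,
               color=colors[name], label=name, zorder=3)

ax.autoscale_view()
ax.legend(title='indicator')
```

Call plt.show() to display the result. Because only the scatter calls carry a label, the legend has a single group for "indicator".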

Colors for Python (seaborn): colors without adding to DataFrame

slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
I'd like to see this picture
But I don't know how to do it. I want a red color for Russian people, a green color for USA people and a yellow color for Chinese people.
My attempt to find a solution:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white")
slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
obj = pd.DataFrame(slov)
palette=["g", "b", "r"]
obj['Color']='r'
row_index = obj.Country == 'Russia'
obj.loc[row_index, 'Color'] = 'r'
row_index = obj.Country == 'USA'
obj.loc[row_index, 'Color'] = 'g'
row_index = obj.Country == 'China'
obj.loc[row_index, 'Color'] = 'y'
g = sns.factorplot(x="People", y="Height", data=obj, kind='bar', palette=obj['Color'])
plt.show()
Maybe my solution is not very good, because I added the color to the DataFrame, which does not seem right. How can I solve this without adding the colors to my DataFrame?
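You can map the Country column through a dict and pass the result as the palette:
d = {'Russia':'r', 'USA':'g','China':'y'}
g = sns.factorplot(x="People",
                   y="Height",
                   data=obj,
                   kind='bar',
                   palette=obj['Country'].map(d))
plt.show()
Note that factorplot was renamed to catplot in seaborn 0.9, so on recent versions use sns.catplot in its place.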
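An alternative that also leaves the DataFrame untouched is to let seaborn do the colour lookup itself: pass the country column as hue and the dict as palette. This is a sketch that swaps in the axes-level sns.barplot rather than the figure-level function used above, and I am assuming dodge=False is wanted so that each bar keeps its full width and stays centred on its tick.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'},
        'Country': {0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},
        'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
obj = pd.DataFrame(slov)

# colour per country; no extra column is added to the DataFrame
d = {'Russia': 'r', 'USA': 'g', 'China': 'y'}

fig, ax = plt.subplots()
sns.barplot(data=obj, x="People", y="Height",
            hue="Country", palette=d, dodge=False, ax=ax)
```

Call plt.show() to display the chart. A side effect of using hue is that the legend lists the countries with their colours.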

Matplotlib vs PivotChart: Grouped Axis Labels

How can I format Matplotlib plots on multi-indexed data to resemble Excel's PivotChart axis layout? Excel's PivotChart feature groups similar axis labels together, whereas MPL labels each tick individually as (Index1,Index2). Using the Sample Data, I've provided the outputs for both Excel and MPL; notice how Index1 is grouped in the Excel chart, but not in the MPL plot.
data = {
'Index1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'Index2': {0: 1, 1: 2, 2: 1, 3: 2},
'Value': {0: 50, 1: 100, 2: 50, 3: 100}
}
Matplotlib Chart
Excel Chart
Does anyone have a solution? Ideally, the number of multi-index levels will not matter. Thanks for the help!
