I have this table of data:
Date, City, Sales, PSale1, PSale2
PSale1 and PSale2 are the predicted prices for sales.
The data set looks like this:
Date,City,Sales,PSale1,PSale2
01/01,NYC,200,300,178
01/01,SF,300,140,100
02/01,NYC,400,410,33
02/01,SF,42,24,60
I want to plot this data in seaborn. I am trying to build a graph with the date on the x-axis and PSale1 and PSale2 as bars for each city on the y-axis, but I am confused about how to work with data in this shape.
As far as I know, seaborn doesn't allow plotting two variables on one axis. How should I approach this situation?
Convert the dataframe to a long format
Plot using seaborn.FacetGrid
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Date': ['01/01', '01/01', '02/01', '02/01'],
'City': ['NYC', 'SF', 'NYC', 'SF'],
'Sales': [200, 300, 400, 42],
'PSale1': [300, 140, 410, 24],
'PSale2': [178, 100, 33, 60]}
df = pd.DataFrame(data)
# convert to long
dfl = df.set_index(['Date', 'City', 'Sales']).stack().reset_index().rename(columns={'level_3': 'pred', 0: 'value'})
# plot
g = sns.FacetGrid(dfl, row='City')
g.map(sns.barplot, 'Date', 'value', 'pred').add_legend()
plt.show()
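The same long-format conversion can also be sketched with DataFrame.melt, which some readers may find easier to follow than stack plus rename. This is an alternative to the stack-based line above, not a correction of it; the name dfl_melt is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01/01', '01/01', '02/01', '02/01'],
                   'City': ['NYC', 'SF', 'NYC', 'SF'],
                   'Sales': [200, 300, 400, 42],
                   'PSale1': [300, 140, 410, 24],
                   'PSale2': [178, 100, 33, 60]})

# keep Date, City and Sales as identifiers and fold PSale1/PSale2
# into a 'pred' column (which prediction) and a 'value' column
dfl_melt = df.melt(id_vars=['Date', 'City', 'Sales'],
                   value_vars=['PSale1', 'PSale2'],
                   var_name='pred', value_name='value')
print(dfl_melt)
```

The result has the same columns as dfl (Date, City, Sales, pred, value), but the row order differs: melt lists all PSale1 rows first and then all PSale2 rows.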
All in one figure
# shape the dataframe
dfc = df.drop(columns=['Sales']).set_index(['Date', 'City']).stack().unstack(level=1)
dfc.columns.name = None
# plot
dfc.plot.bar(stacked=True)
plt.xlabel('Date: Predicted')
plt.show()
I am working on a timeseries plot from data that looks like the following:
import pandas as pd
data = {'index': [1, 34, 78, 900, 1200, 5000, 9001, 12000, 15234, 23432],
'rating': [90, 85, 89, 82, 78, 65, 54, 32, 39, 45],
'Year': [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2008, 2009, 2009]}
df = pd.DataFrame(data)
The main issue is the lack of actual dates. I have plotted the data in index order: the data is sorted by ascending index, but the index values themselves are meaningless.
I have plotted the data using
import plotly.express as px
fig = px.line(df, x='index', y='rating')
fig.show()
but would like to shade or label each year on the plot (could just be vertical dotted lines separating years, or alternated grey shades beneath the line but above the axis per year).
I am assuming that you have already sorted the DataFrame using the index column.
Here's a solution that uses a bar (column) chart in matplotlib.
import matplotlib.pyplot as plt
import numpy as np
# [optional] create a dictionary of colors with year as keys. It is better if this is dynamically generated if you have a lot of years.
color_cycle = {'2005': 'red', '2006': 'blue', '2007': 'green', '2008': 'orange', '2009': 'purple'}
# I am assuming that the rating data is sorted by index already
# plot rating as a column chart using equal spacing on the x-axis
plt.bar(x=np.arange(len(df)), height=df['rating'], width=0.8, color=[color_cycle[str(year)] for year in df['Year']])
# add Year as x-axis labels
plt.xticks(np.arange(len(df)), df['Year'])
# add labels to the axes
plt.xlabel('Year')
plt.ylabel('Rating')
# display the plot
plt.show()
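The question also asked for shading or labelling each year beneath a line. A minimal matplotlib sketch of that idea (matplotlib because the answer above uses it; the question itself used plotly) could compute the first and last row position of each year and shade alternate years with axvspan. It assumes, as above, that the rows are already sorted so that each year forms one contiguous block; pos, spans and centers are illustrative names.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'index': [1, 34, 78, 900, 1200, 5000, 9001, 12000, 15234, 23432],
        'rating': [90, 85, 89, 82, 78, 65, 54, 32, 39, 45],
        'Year': [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2008, 2009, 2009]}
df = pd.DataFrame(data)

# equally spaced positions 0..n-1 stand in for the meaningless index values
pos = pd.Series(range(len(df)), index=df.index)
# first and last position occupied by each year
spans = pos.groupby(df['Year']).agg(['min', 'max'])

fig, ax = plt.subplots()
ax.plot(pos, df['rating'], color='black')

# shade every other year with a grey band
for i, (year, row) in enumerate(spans.iterrows()):
    if i % 2 == 0:
        ax.axvspan(row['min'] - 0.5, row['max'] + 0.5, color='lightgrey')

# put one tick, labelled with the year, at the centre of each band
centers = ((spans['min'] + spans['max']) / 2).tolist()
ax.set_xticks(centers)
ax.set_xticklabels([str(y) for y in spans.index])
ax.set_xlabel('Year')
ax.set_ylabel('Rating')
```

Call plt.show() afterwards to display the figure.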
I am having trouble plotting a timeseries with seaborn. When I add hue to the plot, the line breaks. I can't figure out why the line breaks partway through, or how to stop the dates from overlapping.
The code to replicate the issue is below:
test_data = {'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-06', '2021-01-08', '2021-01-09'],
'Price':[20, 10, 30, 40, 25, 23, 56],
'Kind': ['Pre', 'Pre', 'Pre', 'Pre', 'Current', 'Post', 'Post']}
test_df = pd.DataFrame(test_data)
test_df['Date'] = pd.to_datetime(test_df['Date'])
sns.lineplot(data=test_df, x="Date", y="Price", hue='Kind')
How can I fix the line break and dates overlapping?
Try adding the style and markers arguments to handle the isolated point with "Kind" == "Current":
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure()
sns.lineplot(data=test_df, x="Date", y="Price", style='Kind', markers=['o', 'o', 'o'], hue='Kind')
fig.autofmt_xdate()
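To see where the break comes from, it can help to count the observations per Kind. As far as I understand seaborn's behaviour, lineplot draws a separate line for each hue level, so a level with a single point (here Current) has no segment to draw, and the Pre and Post lines are not joined to each other. A quick check that reuses the question's data (points_per_kind and single_point_kinds are illustrative names):

```python
import pandas as pd

test_data = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
                      '2021-01-06', '2021-01-08', '2021-01-09'],
             'Price': [20, 10, 30, 40, 25, 23, 56],
             'Kind': ['Pre', 'Pre', 'Pre', 'Pre', 'Current', 'Post', 'Post']}
test_df = pd.DataFrame(test_data)
test_df['Date'] = pd.to_datetime(test_df['Date'])

# number of observations in each hue level
points_per_kind = test_df.groupby('Kind')['Date'].count()
# levels with only one observation cannot form a line segment
single_point_kinds = points_per_kind[points_per_kind == 1].index.tolist()

print(points_per_kind)
print(single_point_kinds)
```

This is why the markers argument helps: the isolated point is drawn as a marker even though it has no line.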
I have made a pie chart using an excel sheet but it is coming out incomplete. I am not sure of the reason. Here is the code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Employee=pd.read_excel("C:\\Users\\Jon\\Desktop\\data science\\Employee.xlsx")
Employee
colors = ["#1f77b4", "#ff7f0e"]
group_by_departments=Employee.groupby("Department").count().reset_index()
sizes = group_by_departments['Gender']
labels = group_by_departments['Department']
plt.pie(sizes, labels=labels, colors = colors,autopct='%.2f %%')
plt.show()
You can use .size() to get the count for each group. You'll need to group by Department and Gender simultaneously to obtain the individual counts of all the subgroups.
Here is some example code:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
N = 100
Employee = pd.DataFrame({'Gender': np.random.choice(['Male', 'Female'], N),
'Department': np.random.choice(['IT', 'Sales', 'HR', 'Finance'], N),
'Age': np.random.randint(20, 65, N),
'Salary': np.random.randint(20, 100, N) * 1000})
colors = ["turquoise", "tomato"]
group_by_departments_and_gender = Employee.groupby(["Department", "Gender"]).size().reset_index(name='Counts')
sizes = group_by_departments_and_gender['Counts']
labels = [f'{dept}\n {gender}' for dept, gender in group_by_departments_and_gender[['Department', 'Gender']].values]
plt.pie(sizes, labels=labels, colors=colors, autopct='%.2f %%')
plt.tight_layout()
plt.show()
PS: You could assign a color per gender via:
colors = ["magenta" if gender=="Male" else "deepskyblue" for gender in group_by_departments_and_gender["Gender"]]
This especially helps in case one of the genders wouldn't be present in one of the departments.
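The difference between .count() and .size() can be seen on a small made-up frame (the data below is invented purely for illustration). count() reports the non-missing values of a column per group, while size() reports the number of rows per group, and grouping by both columns gives one entry per subgroup:

```python
import pandas as pd

# tiny invented example; one Gender value is missing on purpose
emp = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR', 'HR'],
    'Gender': ['Male', None, 'Female', 'Male', 'Female'],
})

# non-missing Gender values per department
by_count = emp.groupby('Department')['Gender'].count()
# rows per department, regardless of missing values
by_size = emp.groupby('Department').size()
# rows per (Department, Gender) subgroup; rows with a missing key are dropped by default
by_both = emp.groupby(['Department', 'Gender']).size()

print(by_count)
print(by_size)
print(by_both)
```

With a single grouping column the pie only gets one wedge per department, so the two-column grouping is what produces a wedge per department and gender.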
I have a simple pandas DataFrame as shown below. I want to create a scatter plot of value on the y-axis, date on the x-axis, and color the points by category. However, coloring the points isn't working.
# Create dataframe
df = pd.DataFrame({
'date': ['2016-01-01', '2016-02-01', '2016-03-01', '2016-01-01', '2016-02-01', '2016-03-01'],
'category': ['Wholesale', 'Wholesale', 'Wholesale', 'Retail', 'Retail', 'Retail'],
'value': [50, 60, 65, 55, 62, 70]
})
df['date'] = pd.to_datetime(df['date'])
# Try to plot
df.plot.scatter(x='date', y='value', c='category')
ValueError: 'c' argument must be a mpl color, a sequence of mpl colors or a sequence of numbers, not ['Wholesale' 'Wholesale' 'Wholesale' 'Retail' 'Retail' 'Retail'].
Why am I getting the error? Pandas scatter plot documentation says the argument c can be "A column name or position whose values will be used to color the marker points according to a colormap."
df.plot.scatter(x='date', y='value', c=df['category'].map({'Wholesale':'red','Retail':'blue'}))
I think you are looking for seaborn:
import seaborn as sns
sns.scatterplot(data=df, x='date', y='value', hue='category')
Or you can loop through df.groupby:
fig, ax = plt.subplots()
for cat, d in df.groupby('category'):
ax.scatter(x=d['date'],y=d['value'], label=cat)
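As the error message says, c expects colors or numbers rather than arbitrary strings. Besides mapping each category to a color name, another option is to turn the categories into integer codes and let a colormap choose the colors. A minimal sketch, assuming matplotlib's Axes.scatter rather than df.plot.scatter; codes, categories and points are illustrative names:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'date': ['2016-01-01', '2016-02-01', '2016-03-01',
             '2016-01-01', '2016-02-01', '2016-03-01'],
    'category': ['Wholesale', 'Wholesale', 'Wholesale',
                 'Retail', 'Retail', 'Retail'],
    'value': [50, 60, 65, 55, 62, 70]
})
df['date'] = pd.to_datetime(df['date'])

# factorize assigns 0, 1, ... to the categories in order of first appearance
codes, categories = pd.factorize(df['category'])

fig, ax = plt.subplots()
# numeric codes are valid for c; the colormap turns them into colors
points = ax.scatter(df['date'], df['value'], c=codes, cmap='viridis')

# one legend handle per distinct code, labelled with the category names
handles, _ = points.legend_elements()
ax.legend(handles, list(categories), title='category')
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure.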
For example you might want data like:
DATE,KEY,VALUE
2019-01-01,REVENUE,100
2019-01-01,COST,100.1
...
plotted as a time series BAR chart with little space in between the bars and no labels except for dates. The popup or legend would show you what the REV,COST cols were.
Basic bar chart with alt.Column, alt.X, alt.Y works but the labels and grouping are wrong. Is it possible to make the Column groups correspond to the x-axis and hide the X axis labels?
EDIT:
Latest best:
import altair as alt
import numpy as np  # needed for np.random below
import pandas as pd
m = 100
data = pd.DataFrame({
'DATE': pd.date_range('2019-01-01', freq='D', periods=m),
'REVENUE': np.random.randn(m),
'COST': np.random.randn(m),
}).melt('DATE', var_name='KEY', value_name='VALUE')
bars = alt.Chart(data, width=10).mark_bar().encode(
y=alt.Y('VALUE:Q', title=None),
x=alt.X('KEY:O', axis=None),
color=alt.Color('KEY:O', scale=alt.Scale(scheme='category20')),
tooltip=['DATE', 'KEY', 'VALUE'],
)
(bars).facet(
column=alt.Column(
'yearmonthdate(DATE):T', header=alt.Header(labelOrient="bottom",
labelAngle=-45,
format='%b %d %Y'
)
),
align="none",
spacing=0,
).configure_header(
title=None
).configure_axis(
grid=False
).configure_view(
strokeOpacity=0
)
Another post because I can't seem to add multiple images to the original one.
This is another way with another flaw: the bars are overlapping. Notice, however, that the dates are handled properly because this uses an actual axis.
import altair as alt
import pandas as pd
import numpy as np
m = 250
data = pd.DataFrame({
'DATE': pd.date_range('2019-01-01', freq='D', periods=m),
'REVENUE': np.random.randn(m),
'COST': np.random.randn(m),
}).melt('DATE', var_name='KEY', value_name='VALUE')
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['REVENUE'], empty='none')
# The basic line
line = alt.Chart(data).mark_bar(interpolate='basis').encode(
x='DATE:T',
y='VALUE:Q',
color='KEY:N'
).configure_bar(opacity=0.5)
line
You can create a grouped bar chart using a combination of encodings and facets, and you can adjust the axis titles and scales to customize the appearance. Here is an example (replicating https://vega.github.io/editor/#/examples/vega/grouped-bar-chart in Altair, as you mentioned in your comment):
import altair as alt
import pandas as pd
data = pd.DataFrame([
{"category":"A", "position":0, "value":0.1},
{"category":"A", "position":1, "value":0.6},
{"category":"A", "position":2, "value":0.9},
{"category":"A", "position":3, "value":0.4},
{"category":"B", "position":0, "value":0.7},
{"category":"B", "position":1, "value":0.2},
{"category":"B", "position":2, "value":1.1},
{"category":"B", "position":3, "value":0.8},
{"category":"C", "position":0, "value":0.6},
{"category":"C", "position":1, "value":0.1},
{"category":"C", "position":2, "value":0.2},
{"category":"C", "position":3, "value":0.7}
])
text = alt.Chart(data).mark_text(dx=-10, color='white').encode(
x=alt.X('value:Q', title=None),
y=alt.Y('position:O', axis=None),
text='value:Q'
)
bars = text.mark_bar().encode(
color=alt.Color('position:O', legend=None, scale=alt.Scale(scheme='category20')),
)
(bars + text).facet(
row='category:N'
).configure_header(
title=None
)
original answer:
I had trouble parsing from your question exactly what you're trying to do (in the future please consider including a code snippet demonstrating what you've tried and pointing out why the result is not sufficient), but here is an example of a bar chart with data of this form, that has x axis labeled by only date, with a tooltip and legend showing the revenue and cost:
import altair as alt
import pandas as pd
data = pd.DataFrame({
'DATE': pd.date_range('2019-01-01', freq='D', periods=4),
'REVENUE': [100, 200, 150, 50],
'COST': [150, 125, 75, 80],
}).melt('DATE', var_name='KEY', value_name='VALUE')
alt.Chart(data).mark_bar().encode(
x='yearmonthdate(DATE):O',
y='VALUE',
color='KEY',
tooltip=['KEY', 'VALUE'],
)