I have clustered a Facebook dataset and added a label to each row that records the cluster it belongs to.
I would like to plot the data as a stacked bar chart,
so I grouped the data like this:
dfff=x_df.groupby("cluster")["page_type"].value_counts()
and the output looks like this:
cluster  page_type
0        government    5387
         company       3231
         politician    3149
         tvshow        1679
1        government     563
         company          9
         politician       2
2        company       3255
         politician    2617
         tvshow        1648
         government     930
Name: page_type, dtype: int64
How can I plot this Series as a stacked bar chart with three bars (0, 1, 2), one for each cluster?
To produce a stacked bar plot, call .unstack() on dfff, the Series returned by groupby(...).value_counts(), so that each page_type becomes a column.
pandas User Guide: Visualization
import pandas as pd
import matplotlib.pyplot as plt
# given dfff, the grouped value_counts Series from the question
dfp = dfff.unstack()
# display(dfp)
page_type  company  government  politician  tvshow
id
0           3231.0      5387.0      3149.0  1679.0
1              9.0       563.0         2.0     NaN
2           3255.0       930.0      2617.0  1648.0
# plot stacked bar
dfp.plot.bar(stacked=True)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
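The steps above can be put together as a self-contained sketch. It rebuilds the Series from the counts printed in the question, so the dict of counts, the "count" name, and the axis label are assumptions made for illustration. Passing fill_value=0 to unstack is an optional variation that replaces the NaN for the missing (cluster 1, tvshow) pair with 0 and keeps the values as integers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the value_counts() output shown in the question
counts = {
    (0, "government"): 5387, (0, "company"): 3231,
    (0, "politician"): 3149, (0, "tvshow"): 1679,
    (1, "government"): 563, (1, "company"): 9, (1, "politician"): 2,
    (2, "company"): 3255, (2, "politician"): 2617,
    (2, "tvshow"): 1648, (2, "government"): 930,
}
index = pd.MultiIndex.from_tuples(list(counts), names=["cluster", "page_type"])
dfff = pd.Series(list(counts.values()), index=index, name="count")

# One row per cluster, one column per page_type; absent pairs become 0
dfp = dfff.unstack(fill_value=0)

# One stacked bar per cluster
ax = dfp.plot.bar(stacked=True, rot=0)
ax.set_ylabel("count")
ax.legend(title="page_type", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
```

With the real data, dfff would come straight from x_df.groupby("cluster")["page_type"].value_counts() and only the last two steps are needed.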
Seaborn look
import matplotlib.pyplot as plt
# set style parameter (this style was called 'seaborn' before matplotlib 3.6)
plt.style.use('seaborn-v0_8')
# plot stacked bar
dfp.plot.bar(stacked=True)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
I'm trying to assess the displacement of a particular fish on the seabed according to seasonality. To do so, I would like to create a map with points colored by the month in which the detection occurred (e.g., all points from August in blue, all points from September in red, all points from October in yellow).
In my dataframe I have the coordinates of each point (LAT, LON) and the date of detection (Dates):
    LAT        LON         Dates
0   49.302005  -67.684971  2019-08-06
1   49.302031  -67.684960  2019-08-12
2   49.302039  -67.684983  2019-08-21
3   49.302039  -67.684979  2019-08-30
4   49.302041  -67.684980  2019-09-03
5   49.302041  -67.684983  2019-09-10
6   49.302042  -67.684979  2019-09-18
7   49.302043  -67.684980  2019-09-25
8   49.302045  -67.684980  2019-10-01
9   49.302045  -67.684983  2019-10-09
10  49.302048  -67.684979  2019-10-14
11  49.302049  -67.684981  2019-10-21
12  49.302049  -67.684982  2019-10-29
Would anyone know how to create this kind of map? I know how to create a simple map with all the points, but I am not sure how to color the points by date of detection.
Thank you very much.
Here's one way to do it entirely with Pandas and matplotlib:
import pandas as pd
from matplotlib import pyplot as plt
# I'll just create some fake data for the example
df = pd.DataFrame(
    {
        "LAT": [49.2, 49.2, 49.3, 45.6, 46.8],
        "LON": [-67.7, -68.1, -65.2, -67.8, -67.4],
        "Dates": ["2019-08-06", "2019-08-03", "2019-07-17", "2019-06-12", "2019-05-29"],
    }
)
# add a column containing the months
df["Month"] = pd.DatetimeIndex(df["Dates"]).month
# make a scatter plot with the colour based on the month
fig, ax = plt.subplots()
ax = df.plot.scatter(x="LAT", y="LON", c="Month", ax=ax, colormap="viridis")
fig.show()
If you want the months as names rather than indexes, and a slightly more fancy plot (e.g., with a legend labelling the dates) using seaborn, you could do:
import seaborn as sns
# get month as name
df["Month"] = pd.to_datetime(df["Dates"]).dt.strftime("%b")
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="LAT", y="LON", hue="Month", ax=ax)
fig.show()
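If specific colors per month are wanted, as in the question (August blue, September red, October yellow), one option is to draw each month as its own scatter series. The following is only a sketch: the month_style mapping is a hypothetical helper, the rows are a subset of the question's table, and longitude is placed on the x-axis (unlike the snippets above) because that is the usual map orientation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the table in the question
df = pd.DataFrame({
    "LAT": [49.302005, 49.302031, 49.302041, 49.302043, 49.302045, 49.302049],
    "LON": [-67.684971, -67.684960, -67.684980, -67.684980, -67.684980, -67.684982],
    "Dates": ["2019-08-06", "2019-08-12", "2019-09-03",
              "2019-09-25", "2019-10-01", "2019-10-29"],
})
df["Month"] = pd.to_datetime(df["Dates"]).dt.month

# Hypothetical mapping: month number -> (legend label, color)
month_style = {8: ("Aug", "blue"), 9: ("Sep", "red"), 10: ("Oct", "yellow")}

fig, ax = plt.subplots()
for month, group in df.groupby("Month"):
    label, colour = month_style[month]
    # one scatter call per month so each month gets its own legend entry
    ax.scatter(group["LON"], group["LAT"], color=colour,
               edgecolor="black", label=label)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.legend(title="Month")
```

Months that are missing from month_style would raise a KeyError here, so the mapping needs an entry for every month present in the data.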
There is this boring dataframe with stock data I have:
date close MA100 buy sell
2022-02-14 324.95 320.12 0 0
2022-02-13 324.87 320.11 1 0
2022-02-12 327.20 321.50 0 0
2022-02-11 319.61 320.71 0 1
Then I am plotting the prices
import pandas as pd
import matplotlib.pyplot as plt
df = ...
df['close'].plot()
df['MA100'].plot()
plt.show()
So far so good...
Then I'd like to show a marker on the chart if there was buy (green) or sell (red) on that day.
It's just to highlight if there was a transaction on that day. The exact intraday price at which the trade happened is not important.
So the x/y-coordinates could be the date and the close if there is a 1 in column buy (sell).
I am not sure how to implement this.
Would I need a loop to iterate over all rows where buy = 1 (sell = 1) and then somehow add these matches to the plot (probably with annotate?)
I'd really appreciate it if someone could point me in the right direction!
You can query the data frame for sell/buy and scatter plot:
fig, ax = plt.subplots()
df.plot(x='date', y=['close', 'MA100'], ax=ax)
df.query("buy==1").plot.scatter(x='date', y='close', c='g', ax=ax)
df.query("sell==1").plot.scatter(x='date', y='close', c='r', ax=ax)
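The same idea can be written with boolean masks and plain matplotlib calls, which avoids any loop over rows. This is a minimal sketch that assumes the four rows from the question and a date column parsed with pd.to_datetime; the triangle markers and the legend labels are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the question; dates parsed so the x-axis is a real time axis
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-14", "2022-02-13", "2022-02-12", "2022-02-11"]),
    "close": [324.95, 324.87, 327.20, 319.61],
    "MA100": [320.12, 320.11, 321.50, 320.71],
    "buy": [0, 1, 0, 0],
    "sell": [0, 0, 0, 1],
}).sort_values("date")

# Select the days with a transaction using boolean masks
buys = df[df["buy"] == 1]
sells = df[df["sell"] == 1]

fig, ax = plt.subplots()
ax.plot(df["date"], df["close"], label="close")
ax.plot(df["date"], df["MA100"], label="MA100")
# Markers sit at (date, close) for the days with a buy or a sell
ax.scatter(buys["date"], buys["close"], color="green", marker="^", zorder=3, label="buy")
ax.scatter(sells["date"], sells["close"], color="red", marker="v", zorder=3, label="sell")
ax.legend()
fig.autofmt_xdate()
```

Setting zorder=3 draws the markers on top of the two lines.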
My dataset is like this
Days Visitors
Tuesday 23
Monday 30
Sunday 120
Friday 2
Friday 30
Tuesday 13
Monday 20
Saturday 100
How can I plot a histogram for this dataset? Assume it is a large dataset (560030 rows), not just these values.
I want to have the days on the x-axis and Visitors on the y-axis.
Use seaborn, which is a high-level API for matplotlib.
seaborn.histplot
seaborn.displot
This will show the distribution of the number of visitors for each day of the week.
sns.histplot
import seaborn as sns
import pandas as pd
import numpy as np # for test data
import random # for test data
import calendar # for test data
# test dataframe
np.random.seed(365)
random.seed(365)
df = pd.DataFrame({'Days': random.choices(calendar.day_name, k=1000), 'Visitors': np.random.randint(1, 121, size=(1000))})
# display(df.head(6))
Days Visitors
0 Friday 83
1 Sunday 53
2 Saturday 34
3 Wednesday 92
4 Tuesday 45
5 Wednesday 6
# plot the histogram
sns.histplot(data=df, x='Visitors', hue='Days', multiple="stack")
Once the histogram is plotted, the legend may need to be moved; in that case the workaround described in seaborn issue #2280, "Not clear how to reposition seaborn.histplot legend", may be necessary.
sns.displot
This option most clearly conveys the daily distribution of visitor counts
sns.displot(data=df, col='Days', col_wrap=4, x='Visitors')
Barplot
seaborn.barplot
This will show the sum of all visits for a given day
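import matplotlib.pyplot as plt  # needed for plt.xticks below
# errorbar=None replaces the deprecated ci=None in seaborn >= 0.12
sns.barplot(data=df, x='Days', y='Visitors', estimator=sum, errorbar=None)
plt.xticks(rotation=90)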
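For a large dataset it can also help to aggregate first and plot only the seven totals. The sketch below does this with pandas alone, using the eight rows from the question; ordering the bars Monday to Sunday and filling missing days with 0 are assumptions about the desired layout.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the question
df = pd.DataFrame({
    "Days": ["Tuesday", "Monday", "Sunday", "Friday",
             "Friday", "Tuesday", "Monday", "Saturday"],
    "Visitors": [23, 30, 120, 2, 30, 13, 20, 100],
})

# Sum visitors per day, then put the days in calendar order (Monday..Sunday);
# days with no rows get a total of 0
totals = (
    df.groupby("Days")["Visitors"].sum()
      .reindex(list(calendar.day_name), fill_value=0)
)

ax = totals.plot.bar(rot=45)
ax.set_xlabel("Days")
ax.set_ylabel("Visitors")
plt.tight_layout()
```

Because only the aggregated Series is plotted, the cost of drawing does not grow with the number of rows.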
I'm trying to plot the following dataframe with Bokeh (data_frame in the code). In my example I only have two columns, 0 and 1, plus Date, which is the x-axis. In my real dataset I have more than 10 columns, so I'm looking for something that generalizes better than my version. (I thought of a for loop, but it doesn't seem optimal.)
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.charts import TimeSeries
from bokeh.io import output_notebook
output_notebook()
data_frame = pd.DataFrame({0: [0.17, 0.189, 0.185, 0.1657], 1: [0.05, 0.0635, 0.0741, 0.0925], 'Date': [2004, 2005, 2006, 2007]})
p = figure(x_axis_label = 'date',
y_axis_label='Topics Distribution')
p.circle(data_frame.Date, data_frame.iloc[:, 0])
p.circle(data_frame.Date, data_frame.iloc[:, 1])
show(p)
I've tried this as well, but it does not work, and I want points only, not lines:
p = TimeSeries(data_frame, index='Date', legend=True,
title = 'T', ylabel='topics distribution')
Thanks for your help!
Let's try a different approach and see if this makes a little more sense:
Reshape the data to be in a "tidy" data format
Use the Bokeh high-level Scatter chart with the color argument
Code:
chartdata = data_frame.set_index('Date').stack().reset_index().rename(columns={'level_1':'Category',0:'Value'})
print(chartdata)
Output "tidy" data format:
Date Category Value
0 2004 0 0.1700
1 2004 1 0.0500
2 2005 0 0.1890
3 2005 1 0.0635
4 2006 0 0.1850
5 2006 1 0.0741
6 2007 0 0.1657
7 2007 1 0.0925
Build chart:
from bokeh.charts import Scatter
p = Scatter(chartdata, x='Date', y='Value', color='Category',xlabel='date', ylabel='Topics Distribution')
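The bokeh.charts module used above is no longer part of current Bokeh releases, so here is a hedged sketch of the same plot using only bokeh.plotting. It assumes a reasonably recent Bokeh (the legend_label argument and the Category10 palette are used), and it simply loops over the value columns, which scales to more than two columns as long as the palette has enough colors.

```python
import pandas as pd
from bokeh.palettes import Category10
from bokeh.plotting import figure

# Data copied from the question
data_frame = pd.DataFrame({
    0: [0.17, 0.189, 0.185, 0.1657],
    1: [0.05, 0.0635, 0.0741, 0.0925],
    "Date": [2004, 2005, 2006, 2007],
})

# Every column except Date is plotted as its own set of points
value_columns = [c for c in data_frame.columns if c != "Date"]
palette = Category10[10]  # ten distinct colors, reused cyclically below

p = figure(x_axis_label="date", y_axis_label="Topics Distribution")
renderers = []
for i, col in enumerate(value_columns):
    r = p.scatter(
        data_frame["Date"], data_frame[col],
        size=8,
        color=palette[i % len(palette)],
        legend_label=str(col),
    )
    renderers.append(r)
```

Calling show(p), imported from bokeh.plotting, renders the figure; it is left out of the sketch so that it runs without opening a browser.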
I have a pivot table, for example:
City Atlanta New York Chicago
Region name Slow Grid Fathe
2010-01 1 2 3
2010-02 3 15 23
... ...
2016-01 12 1 0
when I try to plot some values with the following:
pivot.ix['2016-01'].plot(kind='barh',
figsize=(7, 10),
width=0.8,
fontsize=10,
colormap='autumn')
I get the following graph:
How can I change the code so that the bars are plotted in ascending order?
Adding .sort_values() between the row selection and the .plot() call sorts the Series before it is plotted and should give you the result you want. Note that .ix was removed in pandas 1.0; .loc selects the same row by label.
pivot.loc['2016-01'].sort_values().plot(kind='barh',
figsize=(7, 10),
width=0.8,
fontsize=10,
colormap='autumn')
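A self-contained sketch of the sorted plot, assuming a small pivot table with the three rows that are visible in the question (the elided rows are left out) and a plain string index; .loc is used for the row selection.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in a notebook
import pandas as pd

# Two-level column header as in the question: City on top, Region name below
columns = pd.MultiIndex.from_tuples(
    [("Atlanta", "Slow"), ("New York", "Grid"), ("Chicago", "Fathe")],
    names=["City", "Region name"],
)
pivot = pd.DataFrame(
    [[1, 2, 3], [3, 15, 23], [12, 1, 0]],
    index=["2010-01", "2010-02", "2016-01"],
    columns=columns,
)

# Select one row (a Series indexed by the column labels) and sort it ascending
row = pivot.loc["2016-01"].sort_values()

ax = row.plot(kind="barh", figsize=(7, 10), width=0.8, fontsize=10, colormap="autumn")
```

With barh the first value is drawn at the bottom, so the smallest bar ends up at the bottom and the largest at the top; sort_values(ascending=False) reverses that.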