Seaborn Plotting Multiple Plots for Groups - python

How do I plot multiple plots for each of the groups (each ID) below with Seaborn? I would like to plot two plots, one underneath the other, one line (ID) per plot.
ID Date Cum Value Daily Value
3306 2019-06-01 100.0 100.0
3306 2019-07-01 200.0 100.0
3306 2019-08-01 350.0 150.0
4408 2019-06-01 200.0 200.0
4408 2019-07-01 375.0 175.0
4408 2019-08-01 400.0 25.0
This only plots both lines together and can look messy if there are 200 unique IDs.
sns.lineplot(x="Date", y="Daily Value",
hue="ID", data=df)

You can use a FacetGrid to draw one plot per ID:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'id': [3306, 3306, 3306, 4408, 4408, 4408],
'date': ['2019-06-01', '2019-07-01', '2019-08-01', '2019-06-01', '2019-07-01', '2019-08-01'],
'cum': [100, 200, 350, 200, 375, 400],
'daily': [100, 100, 150, 200, 175, 25]
})
g = sns.FacetGrid(df, col = 'id')
g.map(plt.plot, 'date', 'daily')
which gives one panel per id, laid out side by side.
But what happens if you have 200 ids?

Related

Plotly ticks are weirdly aligned

Let's take the following pd.DataFrame as an example
df = pd.DataFrame({
'month': ['2022-01', '2022-02', '2022-03'],
'col1': [1_000, 1_500, 2_000],
'col2': [100, 150, 200],
}).melt(id_vars=['month'], var_name='col_name')
which creates
month col_name value
-----------------------------
0 2022-01 col1 1000
1 2022-02 col1 1500
2 2022-03 col1 2000
3 2022-01 col2 100
4 2022-02 col2 150
5 2022-03 col2 200
When I use plain seaborn
sns.barplot(data=df, x='month', y='value', hue='col_name');
I get:
Now I would like to use plotly with the following code
import plotly.express as px
fig = px.histogram(df,
x="month",
y="value",
color='col_name', barmode='group', height=500, width=1_200)
fig.show()
And I get:
So why are the x-ticks so weird and not simply 2022-01, 2022-02 and 2022-03?
What is happening here?
I found that I always have this problem with the ticks when using color. It somehow messes the ticks up.
You can solve it by customizing the step as 1 month per tick with dtick="M1", as follows:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({
'month': ['2022-01', '2022-02', '2022-03'],
'col1': [1000, 1500, 2000],
'col2': [100, 150, 200],
}).melt(id_vars=['month'], var_name='col_name')
fig = px.bar(df,
x="month",
y="value",
color='col_name', barmode='group', height=500, width=1200)
fig.update_xaxes(
tickformat = '%Y-%m',
dtick="M1",
)
fig.show()

multi-index dataframe causes wide separation between plotted data

I have the following plot:
My pandas dataset uses a multi-index, like this:
Below is my code:
ax = plt.gca()
df['adjClose'].plot(ax=ax, figsize=(12,4), rot=9, grid=True, label='price', color='orange')
df['ma5'].plot(ax=ax, label='ma5', color='yellow')
df['ma100'].plot(ax=ax, label='ma100', color='green')
# df.plot.scatter(x=df.index, y='buy')
x = pd.to_datetime(df.unstack(level=0).index, format='%Y/%m/%d')
# plt.scatter(x, df['buy'].values)
ax.scatter(x, y=df['buy'].values, label='buy', marker='^', color='red')
ax.scatter(x, y=df['sell'].values, label='sell', marker='v', color='green')
plt.show()
Data from .csv
symbol,date,close,high,low,open,volume,adjClose,adjHigh,adjLow,adjOpen,adjVolume,divCash,splitFactor,ma5,ma100,buy,sell
601398,2020-01-01 00:00:00+00:00,5.88,5.88,5.88,5.88,0,5.2991971571,5.2991971571,5.2991971571,5.2991971571,0,0.0,1.0,,,,
601398,2020-01-02 00:00:00+00:00,5.97,6.03,5.91,5.92,234949400,5.3803073177,5.4343807581,5.3262338773,5.3352461174,234949400,0.0,1.0,,,,
601398,2020-01-03 00:00:00+00:00,5.99,6.02,5.96,5.97,152213050,5.3983317978,5.425368518,5.3712950777,5.3803073177,152213050,0.0,1.0,,,,
601398,2020-01-06 00:00:00+00:00,5.97,6.05,5.95,5.96,226509710,5.3803073177,5.4524052382,5.3622828376,5.3712950777,226509710,0.0,1.0,,,,
The data above is what the dataframe looks like after saving it to csv, but after reloading it loses its original structure, as shown below.
The issue, as can be seen in the plot, is that the first 3 lines are plotted against the dataframe index, which is a tuple. The scatter plots are plotted against the datetime values in x, which are not values on the ax axis, so they are plotted far to the right.
- the axis is a bunch of stacked tuples, like
Don't convert the dataframe to a multi-index. If you're doing something, which creates the multi-index, then do df.reset_index(level=x, inplace=True) where x represents the level where 'symbol' is in the multi-index.
After removing 'symbol' from the index, convert 'date' to a datetime dtype with df.index = pd.to_datetime(df.index).date
Presumably, there's more than one unique 'symbol' in the dataframe, so a separate plot should be drawn for each.
Tested in pandas 1.3.1, python 3.8, and matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# load the data from the csv
df = pd.read_csv('file.csv')
# convert date to a datetime format and extract only the date component
df.date = pd.to_datetime(df.date).dt.date
# set date as the index
df.set_index('date', inplace=True)
# this is what the dataframe should look like before plotting
symbol close high low open volume adjClose adjHigh adjLow adjOpen adjVolume divCash splitFactor ma5 ma100 buy sell
date
2020-01-01 601398 5.88 5.88 5.88 5.88 0 5.30 5.30 5.30 5.30 0 0.0 1.0 NaN NaN NaN NaN
2020-01-02 601398 5.97 6.03 5.91 5.92 234949400 5.38 5.43 5.33 5.34 234949400 0.0 1.0 NaN NaN NaN NaN
2020-01-03 601398 5.99 6.02 5.96 5.97 152213050 5.40 5.43 5.37 5.38 152213050 0.0 1.0 NaN NaN NaN NaN
2020-01-06 601398 5.97 6.05 5.95 5.96 226509710 5.38 5.45 5.36 5.37 226509710 0.0 1.0 NaN NaN NaN NaN
# extract the unique symbols
symbols = df.symbol.unique()
# get the number of unique symbols
sym_len = len(symbols)
# create a number of subplots based on the number of unique symbols in df
fig, axes = plt.subplots(nrows=sym_len, ncols=1, figsize=(12, 4*sym_len))
# if there's only 1 symbol, axes won't be iterable, so we put it in a list
if not isinstance(axes, np.ndarray):
    axes = [axes]
# iterate through each symbol and plot the relevant data to an axes
for ax, sym in zip(axes, symbols):
    # select the data for the relevant symbol
    data = df[df.symbol.eq(sym)]
    # plot data
    data[['adjClose', 'ma5', 'ma100']].plot(ax=ax, title=f'Data for Symbol: {sym}', ylabel='Value')
    ax.scatter(data.index, y=data['buy'], label='buy', marker='^', color='red')
    ax.scatter(data.index, y=data['sell'], label='sell', marker='v', color='green')
    ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
fig.tight_layout()
In the example plot, data.high and data.low are used for the scatter plots instead, since data.buy and data.sell are np.nan in the test data.
df can be conveniently created with:
sample = {'symbol': [601398, 601398, 601398, 601398], 'date': ['2020-01-01 00:00:00+00:00', '2020-01-02 00:00:00+00:00', '2020-01-03 00:00:00+00:00', '2020-01-06 00:00:00+00:00'], 'close': [5.88, 5.97, 5.99, 5.97], 'high': [5.88, 6.03, 6.02, 6.05], 'low': [5.88, 5.91, 5.96, 5.95], 'open': [5.88, 5.92, 5.97, 5.96], 'volume': [0, 234949400, 152213050, 226509710], 'adjClose': [5.2991971571, 5.3803073177, 5.3983317978, 5.3803073177], 'adjHigh': [5.2991971571, 5.4343807581, 5.425368518, 5.4524052382], 'adjLow': [5.2991971571, 5.3262338773, 5.3712950777, 5.3622828376], 'adjOpen': [5.2991971571, 5.3352461174, 5.3803073177, 5.3712950777], 'adjVolume': [0, 234949400, 152213050, 226509710], 'divCash': [0.0, 0.0, 0.0, 0.0], 'splitFactor': [1.0, 1.0, 1.0, 1.0], 'ma5': [np.nan, np.nan, np.nan, np.nan], 'ma100': [np.nan, np.nan, np.nan, np.nan], 'buy': [np.nan, np.nan, np.nan, np.nan], 'sell': [np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(sample)
I just found another way to solve my problem:
df = df.unstack(level=0)
I have tested this and it works for me.
I think it is similar to the code below, which is based on Trenton's last piece of advice:
df.reset_index(level=0, inplace=True)
df.index = df.index.date
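The reset_index step discussed above can be sketched on a tiny frame. This is only an illustration, assuming a two-level index named ('symbol', 'date'); the level names and the three sample rows are made up for the example.

```python
import pandas as pd

# hypothetical two-level index: (symbol, date)
idx = pd.MultiIndex.from_product(
    [[601398], pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'])],
    names=['symbol', 'date'],
)
df = pd.DataFrame({'adjClose': [5.30, 5.38, 5.40]}, index=idx)

# move 'symbol' out of the index so only the dates remain
flat = df.reset_index(level='symbol')

print(type(flat.index).__name__)
print(flat.columns.tolist())
```

After this step the index holds only the dates, so line plots and scatter plots drawn against flat.index share the same x values.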

Matplotlib bar chart on datetime index values

I'm having trouble getting the following code to display a bar chart properly. The plot has very thin lines which are not visible until you zoom in, but even then it's not clear. I've tried to control with the width option to plt.bar() but it's not doing anything (e.g. tried 0.1, 1, 365).
Any pointers on what I'm doing wrong would be appreciated.
Many thanks
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import matplotlib.dates as mdates
plt.close('all')
mydateparser2 = lambda x: pd.datetime.strptime(x, "%m/%d/%Y")
colnames2=['Date','Net sales', 'Cost of sales']
df2 = pd.read_csv(r'account-test.csv', parse_dates = ['Date'] , date_parser = mydateparser2, index_col='Date')
df2= df2.filter(items=colnames2)
df2 = df2.sort_values('Date')
print (df2.info())
print (df2)
fig = plt.figure()
plt.bar(df2.index.values, df2['Net sales'], color='red', label='Net sales' )
plt.ylim(500000,2800000)
plt.show()
plt.legend(loc=4)
Resulting output (to show data types)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15 entries, 2005-12-31 to 2019-12-31
Data columns (total 2 columns):
Net sales 15 non-null int64
Cost of sales 15 non-null int64
dtypes: int64(2)
memory usage: 360.0 bytes
None
Net sales Cost of sales
Date
2005-12-31 1161400 907200
2006-12-31 1193100 928300
2007-12-31 1171100 888100
2008-12-31 1324900 1035700
2009-12-31 1108300 859800
2010-12-31 1173600 891000
2011-12-31 1392400 1050300
2012-12-31 1578200 1171500
2013-12-31 1678200 1224200
2014-12-31 1855500 1346700
2015-12-31 1861200 1328400
2016-12-31 2004300 1439700
2017-12-31 1973300 1421500
2018-12-31 2189100 1608300
2019-12-31 2355700 1715300
Maybe you are trying to plot too many bars on a small plot. Try fig = plt.figure(figsize=(12,6)) to get a bigger plot. You can also pass width=0.9 to the pandas bar plot:
fig, ax = plt.subplots(figsize=(12,6))
df2.plot.bar(y='Net sales', width=0.9, ax=ax) # modify width to your liking
Output:
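A self-contained sketch of the same idea, using only the first three rows of the data above. pandas draws bar plots at integer positions 0, 1, 2, ..., so width is a fraction of one slot rather than a number of days. The year-only tick labels are an optional extra, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

df2 = pd.DataFrame(
    {'Net sales': [1161400, 1193100, 1171100]},
    index=pd.to_datetime(['2005-12-31', '2006-12-31', '2007-12-31']),
)
df2.index.name = 'Date'

fig, ax = plt.subplots(figsize=(12, 6))
# bars sit at positions 0, 1, 2; width=0.9 leaves a small gap between them
df2.plot.bar(y='Net sales', width=0.9, ax=ax, color='red')
# replace the full timestamps on the x axis with the year only
ax.set_xticklabels(df2.index.strftime('%Y'), rotation=0)

# plt.show()  # uncomment to display the figure
```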

pandas - finding most recent (but previous) date in a second reference dataframe

I have two dataframes and for one I want to find the closest (previous) date in the other.
If the date matches then I need to take the previous date
df_main contains the reference information
For df_sample I want to lookup the Time in df_main for the closest (but previous) entry. I can do this using method='ffill' , but where the date for the Time field is the same day it returns that day - I want it to return the previous - basically a < rather than <=.
In my example df_res I want the closest_val column to contain [ "n/a", 90, 90, 280, 280, 280]
import pandas as pd
dsample = {'Index': [1, 2, 3, 4, 5, 6],
'Time': ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04" ,"2020-06-05" ,"2020-06-06"],
'Pred': [100, -200, 300, -400 , -500, 600]
}
dmain = {'Index': [1, 2, 3],
'Time': ["2020-06-01", "2020-06-03","2020-06-06"],
'Actual': [90, 280, 650]
}
def find_closest(x, df2):
    df_res = df2.iloc[df2.index.get_loc(x['Time'], method='ffill')]
    x['closest_time'] = df_res['Time']
    x['closest_val'] = df_res['Actual']
    return x
df_sample = pd.DataFrame(data=dsample)
df_main = pd.DataFrame(data=dmain)
df_sample = df_sample.set_index(pd.DatetimeIndex(df_sample['Time']))
df_main = df_main.set_index(pd.DatetimeIndex(df_main['Time']))
df_res = df_sample.apply(find_closest, df2=df_main ,axis=1)
Use pd.merge_asof on plain DataFrames built from dsample and dmain (make sure 'Time' is a datetime column in both, not a string, and that both frames are sorted by it):
pd.merge_asof(df_sample, df_main, on="Time", allow_exact_matches=False)
The output is:
Index_x Time Pred Index_y Actual
0 1 2020-06-01 100 NaN NaN
1 2 2020-06-02 -200 1.0 90.0
2 3 2020-06-03 300 1.0 90.0
3 4 2020-06-04 -400 2.0 280.0
4 5 2020-06-05 -500 2.0 280.0
5 6 2020-06-06 600 2.0 280.0
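A runnable sketch of the merge_asof approach with the question's data, assuming 'Time' is converted to datetime when the frames are built. The column names closest_time and closest_val are taken from the question; renaming the right-hand key before the merge is just one way to keep the matched time in the result.

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Index': [1, 2, 3, 4, 5, 6],
    'Time': pd.to_datetime(['2020-06-01', '2020-06-02', '2020-06-03',
                            '2020-06-04', '2020-06-05', '2020-06-06']),
    'Pred': [100, -200, 300, -400, -500, 600],
})
df_main = pd.DataFrame({
    'Time': pd.to_datetime(['2020-06-01', '2020-06-03', '2020-06-06']),
    'Actual': [90, 280, 650],
})

# rename the right-hand columns so the matched time survives the merge
right = df_main.rename(columns={'Time': 'closest_time', 'Actual': 'closest_val'})

# default direction is 'backward'; allow_exact_matches=False makes it strictly "<"
df_res = pd.merge_asof(
    df_sample, right,
    left_on='Time', right_on='closest_time',
    allow_exact_matches=False,
)
print(df_res)
```

The first row has no earlier entry in df_main, so its closest_time is NaT and its closest_val is NaN.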
IIUC, we can do a Cartesian product of both your dataframes, then filter out the exact matches, then apply some logic to figure out the closest date.
Finally, we will join your exact and non-exact matches into a final dataframe.
import numpy as np
# Cartesian product of both frames via a constant helper key
s = pd.merge(
    df_sample.assign(key="var1"),
    df_main.assign(key="var1").rename(columns={"Time": "TimeDelta"}).drop(columns="Index"),
    on="key",
    how="outer",
).drop(columns="key")
extact_matches = s[s['Time'].eq(s['TimeDelta'])]
non_exact_matches_cart = s[~s['Time'].isin(extact_matches['Time'])]
non_exact_matches = non_exact_matches_cart.assign(
    delta=(non_exact_matches_cart["Time"] - non_exact_matches_cart["TimeDelta"])
    / np.timedelta64(1, "D")
).query("delta >= 0").sort_values(["Time", "delta"]).drop_duplicates(
    "Time", keep="first"
).drop(columns="delta")
There is a lot to take in with the variable above, but essentially we compute the difference in time, remove any difference that points into the future, and drop duplicates so that only the closest date in the past is kept.
df = pd.concat([extact_matches, non_exact_matches], axis=0).sort_values("Time").rename(
columns={"TimeDelta": "closest_time", "Actual": "closest val"}
)
print(df)
Index Time Pred closest_time closest val
0 1 2020-06-01 100 2020-06-01 90
3 2 2020-06-02 -200 2020-06-01 90
7 3 2020-06-03 300 2020-06-03 280
10 4 2020-06-04 -400 2020-06-03 280
13 5 2020-06-05 -500 2020-06-03 280
17 6 2020-06-06 600 2020-06-06 650

Plot Price as Horizontal Line for Non Zero Volume Values

My Code:
import matplotlib.pyplot as plt
plt.style.use('seaborn-ticks')
import pandas as pd
import numpy as np
path = 'C:\\File\\Data.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '08/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/1000) < 60, 0, (df.volume/1000))
df.plot('Time','Price')
dff = df[df.Volume > 60].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
plt.subplots_adjust(left=0.05, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=None)
plt.show()
My Plot Output is as below:
The output of the dff dataframe is as below:
Date Time Price Volume
0 08/02/2019 13:39:43 685.35 97.0
1 08/02/2019 13:39:57 688.80 68.0
2 08/02/2019 13:43:50 683.00 68.0
3 08/02/2019 13:43:51 681.65 92.0
4 08/02/2019 13:49:42 689.95 70.0
5 08/02/2019 13:52:00 695.20 64.0
6 08/02/2019 14:56:42 686.25 68.0
7 08/02/2019 15:03:15 685.35 63.0
8 08/02/2019 15:03:31 683.15 69.0
9 08/02/2019 15:08:08 684.00 61.0
I want to plot the prices in this table as vertical lines, as in the image below. Any help would be appreciated.
Based on your image, I think you mean horizontal lines. Either way it's pretty simple, Pyplot has hlines/vlines builtins. In your case, try something like
plt.hlines(dff['Price'], '08/02/2019', '09/02/2019')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
path = 'File.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '05/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/7500) < 39, 0, (df.volume/7500))
df["Time"] = pd.to_datetime(df['Time'])
df.plot(x="Time",y='Price', rot=0)
plt.title("Date: " + str(df['Date'].iloc[0]))
dff = df[df.Volume > 39].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
# draw one horizontal line per high-volume price
for price in dff['Price']:
    plt.axhline(y=price, linewidth=2, color='blue')
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.95, top=0.96, wspace=None, hspace=None)
plt.show()
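A compact, self-contained sketch of the axhline approach, since the code above depends on a local file. The four rows are illustrative values only (prices and times borrowed from the table in the question, with two volumes set to 0 for the example), and the threshold of 60 follows the question's code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# illustrative data, not the real file contents
df = pd.DataFrame({
    'Time': pd.to_datetime(['13:39:43', '13:39:57', '13:43:50', '13:43:51'],
                           format='%H:%M:%S'),
    'Price': [685.35, 688.80, 683.00, 681.65],
    'Volume': [97.0, 0.0, 68.0, 0.0],
})

# price curve over time
ax = df.plot(x='Time', y='Price', rot=0)

# one horizontal line at the price of every high-volume row
high_volume = df[df['Volume'] > 60]
for price in high_volume['Price']:
    ax.axhline(y=price, linewidth=1, color='blue')

# plt.show()  # uncomment to display the figure
```

axhline spans the full width of the axes by default, so no start or end time has to be supplied.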
