I have managed to reshape my data and render the corresponding time series plot, but I am not satisfied with the output because the new plot is hard to read.
Here is what my current data looks like (image not included).
Update:
Here is the code I used to shape the plot data above:
df=df.groupby(['date'])['qty1'].sum().reset_index()
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month
plot_data=df.groupby(['year', 'month'])['qty1'].sum().unstack().fillna(0)
plot_data.plot(kind='line')
Based on this data, I get the following plot (image not included), but it is not what I expected.
Desired plot: here is the plot I actually want (image not included).
How can I produce this plot? Any ideas?
Is this what you are looking for?
import pandas as pd
import matplotlib.pyplot as plt
import calendar
# Jupyter only: remove the next line when running as a plain script
%matplotlib inline
df = pd.DataFrame(dic)  # dic is the dictionary you provided in the GitHub link
df.columns = [str(i) for i in range(1, 13)]
df = df.T  # months become the rows, years the columns
df.columns = ['2014', '2015', '2016', '2017', '2018']
df['Avg'] = df.mean(axis=1)  # 5-year average for each month
fig, ax = plt.subplots(figsize=(15, 7))
ax.plot(df.index, df['2016'], marker='s', color='green', linewidth=1, label="2016")
ax.plot(df.index, df['2017'], "bo-", linewidth=1.5, label="2017")
ax.plot(df.index, df['2018'], marker='s', ms=10, color='red', linewidth=3, label="2018")
ax.plot(df.index, df['Avg'], "--", color='grey', linewidth=8, label="5-Yr-Avg")
ax.set_xlabel('\n Months\n', fontsize=25, color='lightslategrey')
ax.legend(frameon=False, loc="lower center", ncol=len(df.columns), fontsize=14)
ax.grid(axis='y')
# The month strings '1'..'12' are categorical, so they sit at x positions 0..11
ax.set_xticks(range(12))
ax.set_xticklabels([calendar.month_abbr[i] for i in range(1, 13)])
ax.tick_params(left=False, labelsize=13)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
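Alternatively, the question's own groupby pipeline gets close to the desired plot if the year level, rather than the month level, is unstacked: months then stay in the row index and each year becomes one line. A minimal sketch, assuming daily data in columns named date and qty1; the random data below is a hypothetical stand-in for the real table:

```python
import calendar

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily data standing in for the question's df ('date', 'qty1')
rng = np.random.default_rng(0)
dates = pd.date_range('2016-01-01', '2018-12-31', freq='D')
df = pd.DataFrame({'date': dates, 'qty1': rng.integers(0, 100, len(dates))})

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
# unstack('year') moves the years into the columns and keeps the months as rows,
# so DataFrame.plot draws one line per year across the twelve months
plot_data = df.groupby(['year', 'month'])['qty1'].sum().unstack('year').fillna(0)
plot_data['Avg'] = plot_data.mean(axis=1)  # average over the years, per month

ax = plot_data.plot(kind='line', figsize=(10, 5))
ax.set_xticks(range(1, 13))
ax.set_xticklabels([calendar.month_abbr[m] for m in range(1, 13)])
ax.set_xlabel('Month')
plt.show()
```

The styling from the answer above (markers, line widths, hidden spines) can then be applied to the returned ax.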
I would like to show the distribution of Income by location and by whether the user left or not. Which graph should I use for this? More generally, how can I show the distribution of a numeric column across two categorical columns?
You can use seaborn.FacetGrid to quickly lay out subplots in two columns, one for users who left and one for those who didn't, and then use hue to distinguish locations:
g = sns.FacetGrid(data = df, col = 'Left', hue = 'Location')
g.map(sns.histplot, 'Income').add_legend()
Complete code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
base = {'Germany': 120000, 'France': 100000, 'Spain': 80000}
def func(row):
    # Base income for the location plus noise; rows with Left = 1 get an extra random term
    return base[row['Location']] + 10000*np.random.randn() - row['Left']*5000*np.random.randn()
N = 1000
df = pd.DataFrame()
df['Location'] = np.random.choice(a = ['France', 'Germany', 'Spain'], size = N)
df['Left'] = np.random.choice(a = [0, 1], size = N)
df['Income'] = df.apply(func, axis = 1)
g = sns.FacetGrid(data = df, col = 'Left', hue = 'Location')
g.map(sns.histplot, 'Income').add_legend()
plt.show()
Another solution, suggested by @JohanC in the comments, is a violin plot with the locations on the x axis and income on the y axis, using hue to distinguish users who left from those who didn't; with split = True, each violin is split into two halves, one per hue level:
fig, ax = plt.subplots()
sns.violinplot(ax = ax, data = df, x = 'Location', y = 'Income', hue = 'Left', split = True)
plt.show()
If you cannot use seaborn, you can get a result similar to the first example with plain matplotlib by looping over the locations:
fig, ax = plt.subplots(1, 2, sharex = 'all', sharey = 'all', figsize = (8, 4))
for location in df['Location'].unique():
    # left panel: users who stayed; right panel: users who left
    ax[0].hist(x = df[(df['Location'] == location) & (df['Left'] == 0)]['Income'], label = location, alpha = 0.7, edgecolor = 'black')
    ax[1].hist(x = df[(df['Location'] == location) & (df['Left'] == 1)]['Income'], label = location, alpha = 0.7, edgecolor = 'black')
ax[0].set_title('Left = 0')
ax[1].set_title('Left = 1')
ax[0].set_xlabel('Income')
ax[1].set_xlabel('Income')
ax[0].set_ylabel('Count')
ax[1].legend(title = 'Location', loc = 'upper left', bbox_to_anchor = (1.05, 1))
plt.tight_layout()
plt.show()
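If a single plot with one box per combination is enough, pandas can do the grouping and Matplotlib's Axes.boxplot can draw it. A minimal sketch, assuming data shaped like the example above (regenerated here with hypothetical values so the block runs on its own):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data shaped like the example above (hypothetical values)
rng = np.random.default_rng(0)
N = 1000
base = {'Germany': 120000, 'France': 100000, 'Spain': 80000}
df = pd.DataFrame({'Location': rng.choice(['France', 'Germany', 'Spain'], size=N),
                   'Left': rng.choice([0, 1], size=N)})
df['Income'] = df['Location'].map(base) + 10000 * rng.standard_normal(N)

# One group per (Location, Left) pair, in sorted key order
groups = df.groupby(['Location', 'Left'])['Income']
labels = [f'{loc}\nLeft={left}' for (loc, left), _ in groups]
data = [s.to_numpy() for _, s in groups]

fig, ax = plt.subplots(figsize=(8, 4))
ax.boxplot(data)  # boxes are drawn at x positions 1..len(data)
ax.set_xticks(range(1, len(data) + 1))
ax.set_xticklabels(labels)
ax.set_ylabel('Income')
plt.tight_layout()
plt.show()
```

Box plots hide the shape of each distribution, so they complement rather than replace the histograms or violins above.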
I'm trying to rework an existing animated line graph so that each line has its own y-axis scale, one on the left and one on the right. The graph compares the prices of two cryptocurrencies (ETH and BTC) whose values differ by orders of magnitude, so I need separate scales to see the changes at all.
My data is in a pandas DataFrame (the numbers here are random):
                    Date  ETH Price    BTC Price
0    2020-10-30 00:00:00   0.155705  1331.878496
1    2020-10-31 00:00:00   0.260152  1337.174272
..                   ...        ...          ...
290  2021-08-15 16:42:09   0.141994  2846.719819

[291 rows x 3 columns]
And the code is roughly:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as ani
color = ['cyan', 'orange', 'red']
fig = plt.figure()
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.subplots_adjust(bottom = 0.2, top = 0.9)
plt.ylabel('Coin Value (USD)')
plt.xlabel('Date')
def buildChart(i):
    df1 = df.set_index('Date', drop=True)
    plt.legend(["ETH Price", "BTC Price"])
    p = plt.plot(df1[:i].index, df1[:i].values)
    for k in range(0, 2):
        p[k].set_color(color[k])
animator = ani.FuncAnimation(fig, buildChart, interval = 10)
plt.show()
Resulting animation (not included).
I then tried to create a second y axis by twinning the first axis with twinx():
color = ['cyan', 'orange', 'blue']
fig, ax1 = plt.subplots() #Changes over here
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.subplots_adjust(bottom = 0.2, top = 0.9)
plt.ylabel('Coin Value (USD)')
plt.xlabel('Date')
def buildChart(i):
    df1 = df.set_index('Date', drop=True)
    plt.legend(["ETH Price", "Bitcoin Price"])
    data1 = df1.iloc[:i, 0:1] # Changes over here
    # ------------- More Changes Start
    ax2 = ax1.twinx()
    ax2.set_ylabel('Cost of Coin (USD)')
    data2 = df1.iloc[:i, 1:2]
    ax2.plot(df1[:i].index, data2)
    ax2.tick_params(axis='y')
    # -------------- More Changes End
    p = plt.plot(df1[:i].index, data1)
    for k in range(0, 1):
        p[k].set_color(color[k])
import matplotlib.animation as ani
animator = ani.FuncAnimation(fig, buildChart, interval = 10)
plt.show()
Resulting animation after the changes (not included).
Current issues:
- The x axis starts at ~1999 rather than late 2020,
  which squashes all the changes on the y axis into a nearly vertical line.
- The left y axis is on a scale of 0 to 1?
- The right y-axis labels repeat, overlap, and move.
With this many problems, I suspect my approach to adding a second scale is wrong, but twinx() seems like the way to do it.
I restructured your code so that a secondary-axis animation is easy to set up.
Here is the code for the animation with a single y axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
df = pd.DataFrame({'date': pd.date_range(start = '2020-01-01', end = '2020-04-01', freq = 'D')})
df['ETH'] = 2*df.index + 300 + 100*np.random.randn(len(df))
df['BTC'] = 5*df.index + 13000 + 200*np.random.randn(len(df))
def update(i):
    ax.cla()  # clear the previous frame before redrawing
    ax.plot(df.loc[:i, 'date'], df.loc[:i, 'ETH'], label = 'ETH Price', color = 'red')
    ax.plot(df.loc[:i, 'date'], df.loc[:i, 'BTC'], label = 'BTC Price', color = 'blue')
    ax.legend(frameon = True, loc = 'upper left', bbox_to_anchor = (1.15, 1))
    # fixed y limits, so the axis does not rescale on every frame
    ax.set_ylim(0.9*min(df['ETH'].min(), df['BTC'].min()), 1.1*max(df['ETH'].max(), df['BTC'].max()))
    ax.tick_params(axis = 'x', which = 'both', top = False)
    ax.tick_params(axis = 'y', which = 'both', right = False)
    plt.setp(ax.xaxis.get_majorticklabels(), rotation = 45)
    ax.set_xlabel('Date')
    ax.set_ylabel('Coin Value (USD)')
    plt.tight_layout()
fig, ax = plt.subplots(figsize = (6, 4))
ani = FuncAnimation(fig = fig, func = update, frames = len(df), interval = 100)
plt.show()
Starting from the code above, you should create the twin axis outside the update function: if you call ax.twinx() inside it, a new axis is created on every frame, and the axes (with their tick labels) pile up on top of each other.
Here is the code for an animation with a secondary axis:
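A quick way to see this effect: every call to twinx() adds another Axes to the figure, which is what happens on each frame when twinx() sits inside the update function. A minimal check:

```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
# Simulate three animation frames that each call twinx() inside update()
for frame in range(3):
    ax1.twinx()
print(len(fig.axes))  # 4: the original Axes plus three stacked twins
```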
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
df = pd.DataFrame({'date': pd.date_range(start = '2020-01-01', end = '2020-04-01', freq = 'D')})
df['ETH'] = 2*df.index + 300 + 100*np.random.randn(len(df))
df['BTC'] = 5*df.index + 13000 + 200*np.random.randn(len(df))
def update(i):
    ax1.cla()
    ax2.cla()
    line1 = ax1.plot(df.loc[:i, 'date'], df.loc[:i, 'ETH'], label = 'ETH Price', color = 'red')
    line2 = ax2.plot(df.loc[:i, 'date'], df.loc[:i, 'BTC'], label = 'BTC Price', color = 'blue')
    # one legend for both axes: collect the lines drawn on each of them
    lines = line1 + line2
    labels = [line.get_label() for line in lines]
    ax1.legend(lines, labels, frameon = True, loc = 'upper left', bbox_to_anchor = (1.15, 1))
    # each axis gets its own fixed range
    ax1.set_ylim(0.9*df['ETH'].min(), 1.1*df['ETH'].max())
    ax2.set_ylim(0.9*df['BTC'].min(), 1.1*df['BTC'].max())
    ax1.tick_params(axis = 'x', which = 'both', top = False)
    ax1.tick_params(axis = 'y', which = 'both', right = False, colors = 'red')
    ax2.tick_params(axis = 'y', which = 'both', right = True, labelright = True, left = False, labelleft = False, colors = 'blue')
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation = 45)
    ax1.set_xlabel('Date')
    ax1.set_ylabel('ETH Coin Value (USD)')
    ax2.set_ylabel('BTC Coin Value (USD)')
    ax1.yaxis.label.set_color('red')
    ax2.yaxis.label.set_color('blue')
    ax2.spines['left'].set_color('red')
    ax2.spines['right'].set_color('blue')
    plt.tight_layout()
fig, ax1 = plt.subplots(figsize = (6, 4))
ax2 = ax1.twinx()
ani = FuncAnimation(fig = fig, func = update, frames = len(df), interval = 100)
plt.show()
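A variation that avoids clearing and redrawing both axes on every frame is to create each line once and only update its data with Line2D.set_data. This is a sketch using the same synthetic data as above: plotting the full series first lets Matplotlib set up the date axis and the x/y limits, which then stay fixed because set_data does not trigger rescaling.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

df = pd.DataFrame({'date': pd.date_range(start='2020-01-01', end='2020-04-01', freq='D')})
df['ETH'] = 2*df.index + 300 + 100*np.random.randn(len(df))
df['BTC'] = 5*df.index + 13000 + 200*np.random.randn(len(df))

fig, ax1 = plt.subplots(figsize=(6, 4))
ax2 = ax1.twinx()  # created once, outside update()

# Plot the full series once so both axes autoscale to the complete data range
line1, = ax1.plot(df['date'], df['ETH'], color='red', label='ETH Price')
line2, = ax2.plot(df['date'], df['BTC'], color='blue', label='BTC Price')
ax1.set_ylabel('ETH Coin Value (USD)', color='red')
ax2.set_ylabel('BTC Coin Value (USD)', color='blue')
ax1.legend(handles=[line1, line2], loc='upper left')
fig.autofmt_xdate()
fig.tight_layout()

def update(i):
    # Show only the first i + 1 rows; the axis limits stay where they are
    line1.set_data(df['date'].iloc[:i + 1], df['ETH'].iloc[:i + 1])
    line2.set_data(df['date'].iloc[:i + 1], df['BTC'].iloc[:i + 1])
    return line1, line2

ani = FuncAnimation(fig=fig, func=update, frames=len(df), interval=100)
plt.show()
```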
My question:
When plotting x and y values from a DataFrame where the y values are discrete numbers (say, an id_number or a category), a scatter plot gives linearly spaced y-axis ticks, which can leave large vertical gaps between the plotted values depending on how far apart the original values are.
What I need is to plot some category values (fixed discrete values) against time events (x axis) in a scatter plot, but the values in the table are integers, not strings. Since I don't know how to do this properly, below is what I have achieved so far, but only by modifying the original table to hold string values. Here is my test data (the original data is large):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtic
import matplotlib.category as mcat
np.random.seed(432987435)
nofpoints = 160
xval = np.arange(nofpoints)
disc = [ 200, 240, 250, 290 ]
yval = np.random.choice( disc , nofpoints)
yval_str = yval.astype(str)
yval , yval_str
cval = np.random.random( nofpoints )
df = pd.DataFrame( { 'xval': xval , 'yval':yval , 'cval': cval })
df_str = pd.DataFrame( { 'xval': xval , 'yval':yval_str , 'cval': cval })
Using the usual plotting method:
fig = plt.figure(dpi=128 , figsize=(12,6))
ax1 = fig.add_subplot(111)
# here we are using the original dataframe(df), without any string field inside.
#ax1.grid(True)
ax1.scatter( 'xval' , 'yval' , data=df , marker='o', facecolor='None' , edgecolor='g')
plt.show()
This is what we get (image not included).
Note the large spacing between the values, and that the points do not line up with the tick values. (I don't want to use a colormap and legend to show the category, since those are reserved for another purpose.)
With a modified DataFrame holding strings as y-axis values:
fig = plt.figure(dpi=128 , figsize=(12,6))
ax2 = fig.add_subplot(111)
# dataframe used is modified one with a string field inside.
# as we can see the order is shuffled.
ax2.scatter( 'xval' , 'yval' , data=df_str , marker='o', facecolor='None' , edgecolor='k')
plt.show()
To avoid the shuffling:
fig = plt.figure(dpi=128 , figsize=(12,6))
ax3 = fig.add_subplot(111)
# to maintain the same order and avoid shuffling we used matplotlib.category
#ax3.grid(True)
disc_str = [ str(x) for x in disc ]
units = mcat.UnitData(sorted(disc_str))
ax3.yaxis.set_units(units)
ax3.yaxis.set_major_locator( mcat.StrCategoryLocator(units._mapping))
ax3.yaxis.set_major_formatter( mcat.StrCategoryFormatter(units._mapping))
ax3.scatter( 'xval' , 'yval' , data=df_str , marker='o', facecolor='None' , edgecolor='y')
plt.show()
Is there any way to achieve this without modifying the original table, i.e. to plot the integer category values directly as y-axis values?
You can do it by replacing ax1.scatter with seaborn.stripplot:
sns.stripplot(ax = ax1, data = df, x = 'xval', y = 'yval_str', marker = 'o', color = 'white', edgecolor = 'green', linewidth = 1)
Before you do that, if you want the y axis in a particular order, sort your DataFrame:
df = pd.DataFrame({'xval': xval, 'yval': yval, 'yval_str': yval_str, 'cval': cval}).sort_values(by = 'yval', ascending = False)
Complete Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(432987435)
nofpoints = 160
xval = np.arange(nofpoints)
disc = [200, 240, 250, 290]
yval = np.random.choice(disc, nofpoints)
yval_str = yval.astype(str)
cval = np.random.random(nofpoints)
df = pd.DataFrame({'xval': xval, 'yval': yval, 'yval_str': yval_str, 'cval': cval}).sort_values(by = 'yval', ascending = False)
fig = plt.figure(dpi = 128, figsize = (12, 6))
ax1 = fig.add_subplot(111)
sns.stripplot(ax = ax1, data = df, x = 'xval', y = 'yval_str', marker = 'o', color = 'white', edgecolor = 'green', linewidth = 1)
plt.show()
If you want perfectly horizontally aligned points, you have to pass jitter = False to sns.stripplot:
sns.stripplot(ax = ax1, data = df, x = 'xval', y = 'yval_str', marker = 'o', color = 'white', edgecolor = 'green', linewidth = 1, jitter = False)
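The seaborn answer still adds a yval_str column. If the table must stay untouched, one Matplotlib-only option is to map each integer to its rank among the sorted distinct values, plot the ranks, and label the ticks with the original integers. A minimal sketch reusing the question's test data; the names levels and ypos are illustrative:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same test data as in the question; the table keeps its integer yval column
np.random.seed(432987435)
nofpoints = 160
xval = np.arange(nofpoints)
disc = [200, 240, 250, 290]
yval = np.random.choice(disc, nofpoints)
cval = np.random.random(nofpoints)
df = pd.DataFrame({'xval': xval, 'yval': yval, 'cval': cval})

# Sorted distinct integer values; each value's index in this array
# becomes its evenly spaced y position (0, 1, 2, ...)
levels = np.sort(df['yval'].unique())
ypos = np.searchsorted(levels, df['yval'].to_numpy())

fig, ax = plt.subplots(dpi=128, figsize=(12, 6))
ax.scatter(df['xval'], ypos, marker='o', facecolor='none', edgecolor='g')
# One tick per position, labelled with the original integer value
ax.set_yticks(range(len(levels)))
ax.set_yticklabels([str(v) for v in levels])
plt.show()
```

Because the mapping is computed on the fly, cval is still free for a colormap, and the y order follows the numeric order of the categories.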
I want to automate an imshow gradient figure with Python 3: given a DataFrame, it should be plotted no matter how many columns it has.
I tried this:
vmin = 3.5
vmax = 6
fig, axes = plt.subplots(len(list(df.columns)),1)
for i,j in zip(list(df.columns),range(1,len(list(df.columns))+1)):
    df = df.sort_values([i], ascending = False)
    y = df[i].tolist()
    gradient = [y,y]
    plt.imshow(gradient, aspect='auto', cmap=plt.get_cmap('hot_r'), vmin=vmin, vmax=vmax)
    axes = plt.subplot(len(list(df.columns)),1,j)
sm = plt.cm.ScalarMappable(cmap=plt.get_cmap('hot_r'),norm=plt.Normalize(vmin,vmax))
sm._A = []
plt.colorbar(sm,ax=axes)
plt.show()
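My problem is that the first set of data (the first column of the df) is never shown. Also, the colorbar is not where I want it to be. This is exactly what I get (image not included).
But this is what I want (image not included).
You shouldn't use plt.subplot if you have already created your subplots via plt.subplots. In your loop, plt.imshow draws into whichever Axes is current before plt.subplot switches to the next panel, so the images do not end up in the subplots you intend. Instead, draw into each Axes object directly, and pass the whole axes array to fig.colorbar so that a single colorbar spans all subplots: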
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
f = lambda x, s: x*np.exp(-x**2/s)/2
df = pd.DataFrame({"A" : f(np.linspace(0,50,600),70)+3.5,
"B" : f(np.linspace(0,50,600),110)+3.5,
"C" : f(np.linspace(0,50,600),150)+3.5,})
vmin = 3.5
vmax = 6
# squeeze=False keeps axes a 2-D array even when df has a single column
fig, axes = plt.subplots(len(df.columns), 1, squeeze=False)
for col, ax in zip(df.columns, axes.flat):
    df = df.sort_values([col], ascending = False)
    y = df[col].values
    gradient = [y, y]
    # draw into this subplot's own Axes instead of the "current" one
    im = ax.imshow(gradient, aspect='auto',
                   cmap='hot_r', vmin=vmin, vmax=vmax)
# Since all images have the same vmin/vmax, we can take any of them for the colorbar
fig.colorbar(im, ax=axes)
plt.show()
I have a seaborn scatter plot (lmplot) with over 10K points. To perceive all the data, it works better when the plot is larger (making the markers relatively small) and the marker alpha is low. However, this makes the markers in the legend hard to distinguish. How does one set the size and alpha of the legend markers in seaborn?
I see that g._legend has a markersize attribute, but directly setting it doesn't do anything.
Example
import numpy as np
import pandas as pd
import seaborn as sns
n_group = 4000
pos = np.concatenate((np.random.randn(n_group,2) + np.array([-1,-1]),
np.random.randn(n_group,2) + np.array([0.2, 1.5]),
np.random.randn(n_group,2) + np.array([0.6, -1.8])))
df = pd.DataFrame({"x": pos[:,0], "y": pos[:, 1],
"label": np.repeat(range(3), n_group)})
g = sns.lmplot(data = df, x = "x", y = "y", hue = "label", fit_reg = False,
               height = 8, scatter_kws = {"alpha": 0.1})  # height was called size before seaborn 0.9
g._legend.set_title("Clusters")
You can do this by setting the alpha of the legend markers themselves. In the same loop, you can also set the marker sizes, either through the private _sizes attribute or with the public set_sizes method:
n_group = 4000
pos = np.concatenate((np.random.randn(n_group,2) + np.array([-1,-1]),
np.random.randn(n_group,2) + np.array([0.2, 1.5]),
np.random.randn(n_group,2) + np.array([0.6, -1.8])))
df = pd.DataFrame({"x": pos[:,0], "y": pos[:, 1],
"label": np.repeat(range(3), n_group)})
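g = sns.lmplot(data = df, x = "x", y = "y", hue = "label", fit_reg = False,
               height = 8, scatter_kws = {"alpha": 0.1})  # height was called size before seaborn 0.9
g._legend.set_title("Clusters")
# legend_handles is called legendHandles in Matplotlib < 3.7
for lh in g._legend.legend_handles:
    lh.set_alpha(1)
    lh._sizes = [50]
    # You can also use lh.set_sizes([50])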
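The same idea can be checked on a plain Matplotlib scatter plot without seaborn: the handles of a Legend object can be restyled after the legend has been created. A minimal sketch with random clusters similar to the question's:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
centers = [(-1, -1), (0.2, 1.5), (0.6, -1.8)]

fig, ax = plt.subplots(figsize=(8, 8))
for k, (cx, cy) in enumerate(centers):
    pts = rng.standard_normal((4000, 2)) + (cx, cy)
    # small, faint markers so that overlapping points remain visible
    ax.scatter(pts[:, 0], pts[:, 1], s=5, alpha=0.1, label=str(k))

leg = ax.legend(title='Clusters')
# legend_handles was called legendHandles before Matplotlib 3.7
handles = getattr(leg, 'legend_handles', None) or leg.legendHandles
for lh in handles:
    lh.set_alpha(1)      # fully opaque legend markers
    lh.set_sizes([50])   # larger legend markers
plt.show()
```

Only the legend handles change; the scatter points in the plot keep their low alpha and small size.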
The above did not work for me with a seaborn lineplot, whose legend handles are Line2D objects. This did:
# errorbar=None replaces ci=False in seaborn >= 0.12
g = sns.lineplot(data=df, x='X', y='Y', hue='HUE', errorbar=None, style='STYLE',
                 markers=True, ms=16, dashes=False)
# get the legend handles and labels, then change the marker size
handles, labels = g.get_legend_handles_labels()
for h in handles:
    h.set_markersize(10)
# replace the legend using the handles and labels from above
lgnd = plt.legend(handles, labels, bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0, title='TITLE')
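If only the marker size needs to change, Matplotlib's legend also accepts a markerscale argument, which scales every legend marker relative to the size drawn in the plot, so no loop over the handles is needed. A minimal sketch with plain Matplotlib lines and hypothetical data; with seaborn, passing markerscale to the plt.legend call that replaces the automatic legend should work the same way:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(10)
fig, ax = plt.subplots()
# markers drawn at 4 points in the plot itself
ax.plot(x, x, marker='o', ms=4, label='a')
ax.plot(x, 2 * x, marker='s', ms=4, label='b')
# markerscale=3 draws the legend markers three times as large as in the plot
leg = ax.legend(markerscale=3, bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
plt.show()
```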