If I specify the col argument in relplot(), it draws one plot for each category in the col column.
But if there are too many categories in that column, the graphs get squeezed, because seaborn puts all plots in one row.
I remember that there is an argument to solve this problem: something like max_number_of_col=4, which puts 4 plots on each row, so that when a row is full, plotting continues on the next row.
Unfortunately, I'm not 100% sure whether this is part of the seaborn API or not. Please let me know of any API that has this function.
You are looking for the col_wrap= argument to relplot().
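A minimal sketch of how col_wrap might be used, assuming a small made-up DataFrame (the column names category, x and y are only for illustration and are not from the question):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data: six categories, three points each.
categories = ["a", "b", "c", "d", "e", "f"]
data = pd.DataFrame({
    "category": [c for c in categories for _ in range(3)],
    "x": [0, 1, 2] * len(categories),
    "y": [i * j for i in range(len(categories)) for j in range(3)],
})

# col_wrap=4 starts a new row of facets after every fourth plot.
g = sns.relplot(data=data, x="x", y="y", col="category", col_wrap=4)
n_facets = len(g.axes.flat)
```

With six categories and col_wrap=4, the first row should hold four plots and the second row the remaining two.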
I want to plot a candlestick graph with additional indicators, where one of the indicators can be all NaN. I'm using a Python matplotlib utility called mplfinance for this. mplfinance takes one parameter as the main data for building the candlesticks, and a second parameter is an array with additional indicator values. When I implement a custom indicator, I first add an empty column filled with NaN to the array of indicators and then fill it in a loop with a condition. The whole column may stay all NaN after the loop, in which case I get an error and can't plot the graph.
import pandas as pd
import mplfinance as mpf

df = pd.DataFrame.from_dict(pd.json_normalize(newBars), orient='columns')
idf = df.copy()
idf = idf.iloc[:,[0]]
idf.columns = ['Col0']
idf = idf.assign(Col1=float('nan')) # Now we add column #1 (assign returns a new DataFrame)
for i in range(len(idf)-1):
    if a > b: # Some condition I use to calculate Col1
        idf.iat[i, 1] = float_value
indicators = [
    mpf.make_addplot(idf['Col0'],color='grey',width=1,panel=0),
    mpf.make_addplot(idf['Col1'],color='g',type='scatter',markersize=50,marker='^',panel=0),
]
mpf.plot(df, type='candle', style='yahoo', volume=True, addplot=indicators,
         figscale=1.1,figratio=(8,5), panel_ratios=(2,1))
From the code there is a chance that Col1 can be all nan and in this case I get the following error:
ValueError: zero-size array to reduction operation maximum which has no identity
How can I avoid this error and just plot the graph without nan columns even if such column exists in the array?
Mplfinance is designed this way on purpose. If you pass all-NaN data to mpf.make_addplot(), you are in effect saying "plot nothing". You can easily test whether you have any data before adding the make_addplot() to your list of addplot indicators.
Yes, it may make your code simpler if you can just pass indicators without having to check if your model actually "indicated" anything, however (1) this will make the mplfinance code have to check, increasing (albeit very slightly) the cost of maintaining the mplfinance library, and (2) it could be that you passed all NaN values by mistake, in which case if mplfinance simply ignores the data you may spend a lot of time debugging to determine why your indicator is not showing up on the plot.
For further discussion, see: https://github.com/matplotlib/mplfinance/issues/259#issuecomment-688429294
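The check described in this answer could look something like the sketch below. It uses only pandas, and the helper name columns_with_data is made up for illustration; the mplfinance call is shown as a comment so the snippet runs without the library installed.

```python
import pandas as pd

def columns_with_data(frame):
    """Return the names of columns that hold at least one non-NaN value."""
    return [name for name in frame.columns if frame[name].notna().any()]

# Hypothetical indicator frame: Col1 was never filled, so it is all NaN.
idf = pd.DataFrame({
    "Col0": [1.0, 2.0, 3.0],
    "Col1": [float("nan")] * 3,
})

plottable = columns_with_data(idf)

# With mplfinance, addplots would then be built only for those columns, e.g.:
# indicators = [mpf.make_addplot(idf[name], panel=0) for name in plottable]
```

Here plottable contains only Col0, so the all-NaN column never reaches make_addplot().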
I am writing code that will output subplots for anything between 1 subplot and 20 subplots. The row and column configuration for all 20 different types of plots will be different, (e.g. for 4 subplots, I'll want 2 rows and 2 columns, but for 12 subplots I'll want 3 rows and 4 columns), and instead of typing in the number of rows and columns I want for each different number of subplots, I was wondering if there was a way to automatically generate the nrow and ncolumn values based off of the number of subplots I want in the image. I know there are similar questions to this out there, but I've only seen answers that suggest manually entering in the number of rows and columns you want for each subplot, and haven't seen a way to automate it yet. Thanks in advance for the help!
Maybe try the package grid_strategy? https://github.com/matplotlib/grid-strategy
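If you would rather not add a dependency, one simple heuristic (an assumption on my part, not what grid_strategy does) is to take the ceiling of the square root as the number of columns and derive the rows from that. The function name grid_shape is made up for this sketch:

```python
import math

def grid_shape(n_plots):
    """Return (nrows, ncols) for a near-square grid holding n_plots subplots."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    # Columns: smallest integer whose square is at least n_plots.
    ncols = math.ceil(math.sqrt(n_plots))
    # Rows: just enough rows of ncols plots to fit everything.
    nrows = math.ceil(n_plots / ncols)
    return nrows, ncols
```

This gives 2 rows and 2 columns for 4 subplots and 3 rows and 4 columns for 12, matching the examples in the question. The result can be passed to plt.subplots(nrows, ncols); when the grid has more cells than plots, the leftover axes would need to be hidden or removed.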
I'm trying to make a line graph for my dataframe that has the names of 10 customers on the X axis and the number of purchases they made on the Y axis.
I have over 100 customers in my data frame, so I created a new data frame that is grouped by customers and which shows the sum of their orders and I wish to only display the top 10 customers on my graph.
I have tried using
TopCustomers.nlargest(10, 'Company', keep='first')
But I run into the error nlargest() got multiple values for argument 'keep' and if I don't use keep, I get told it's a required argument.
TopCustomers is composed of TopCustomers = raw.groupby(raw['Company'])['Orders'].sum()
Sorting is not required at the moment, but it'd be good to know in advance.
On an additional note: the list of customer names is rather lengthy and, after playing with some dummy data, I see that the labels on the X axis are stacked on top of each other. Is there a way to make the plot bigger so that all 10 are clearly visible? And maybe mark a dot where X and Y meet?
We can use sort_values and tail:
TopCustomers.sort_values().tail(10)
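As for the error: since TopCustomers comes from selecting a single column after groupby, it is a Series, and Series.nlargest takes no column name, so 'Company' ends up being read as the keep argument. A small sketch with made-up data (company names and order counts are invented) comparing both approaches:

```python
import pandas as pd

# Hypothetical raw data, not from the question.
raw = pd.DataFrame({
    "Company": ["A", "B", "C", "A", "D"],
    "Orders": [5, 3, 10, 1, 2],
})

# Same construction as in the question: the result is a Series indexed by Company.
TopCustomers = raw.groupby(raw["Company"])["Orders"].sum()

# Approach from the answer: sort ascending, keep the last n entries.
top_by_tail = TopCustomers.sort_values().tail(2)

# Alternative: Series.nlargest needs only n and returns the largest first.
top_by_nlargest = TopCustomers.nlargest(2)
```

Both select the same customers; the only difference is the order, which matters if the result is plotted directly.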
I am new to Bokeh and am trying to make a layout of 3 columns which have different amount of plots. For example, column 1 has 3 plots, but column 2 has 4 plots. So far the only way I can do it is by padding the shorter columns with extra entries, but this is obviously a waste of space.
I saw in this example that it is possible to do this with rows of different sizes, so I hope one can do the same with columns...
It is possible to do this by composing your layout using the row and column methods from the layouts module. Here is an example of what that could look like:
from bokeh.layouts import row, column
my_layout = row(
    column([plot1, plot2, plot3]),
    column([plot4, plot5])
)
This is the dataframe I am working with:
(only the first two years don't have data for country 69 I will fix this). nkill being the number of killed for that year summed from the original long form dataframe.
I am trying to do something similar to this plot:
However, with the country code as a hue. I know there are similar posts but none have helped me solve this, thank you in advance.
By hue I mean the term as seaborn uses it, as pictured in the third picture. In that example, hue creates a separate bar for every distinct value in that column. So if I had two country codes in the country column, for every year it would plot two bars (one for each country) side by side.
Just looking at the data it should be possible to directly use the hue argument.
But first you would need to turn the index into actual columns of the dataframe:
df.reset_index(inplace=True)
Then something like
sns.barplot(x = "year", y="nkill", hue="country", data=df)
should give you the desired plot.