The att_sales table has 3 fields: item, qty and yr_mon. The buckets(i) function returns a list of 20 SKUs (a list of item values). A groupby is used to find the monthly sales for each of these SKUs, and the result is drawn as a violin plot. The exercise works fine up to this point.
I am trying to visualize the monthly sales for about 200 SKUs across 10 subplots. To do this I intended to have an iterator run from 1 through 10 and populate each of the subplots. The code below populates only the last of the 10 subplots and leaves the others empty. How do I go about achieving this?
fig, (axis1) = plt.subplots(5,2,figsize=(15,30))
plt.xticks(rotation=45)
s=att_sales[['item','qty','yr_mon']]
s=s[s.item.isin(buckets(i))]
s=s.groupby(['item','yr_mon'], as_index=False).qty.sum()
sns.violinplot(x="item", y="qty", data=s)
Edit1: On implementing @Ted's solution I got the error "min() arg is an empty sequence" when the for loop ran from 0 to n. Changing the for loop to run from 1 to n provides most of the solution, but not quite all of it.
I still need to know how to increase the size of the overall plot and of the individual subplots, and how to rotate the xticks to 45 degrees.
Here is a simplified example that I think you can tweak to make it work for you. I am using the tips dataset in seaborn and plotting 4 different violin plots based on what day it is. I have also created a buckets function that returns a single element list of one day.
When the figure is created with fig, axes = plt.subplots(2,2,figsize=(10,10)), it returns both a matplotlib figure object which is stored into fig and a 2 dimensional numpy array of matplotlib axes objects which is stored in axes. To get the top left plot you would do axes[0, 0]. If you wanted the bottom right hand plot you would do axes[1, 1]. If you created a 5 row by 2 column figure axes[3,0] would be the plot on the 4th row and first column.
import matplotlib.pyplot as plt
import seaborn as sns
# create function that will return a list of items
# this particular example returns just a list of one element
def buckets(i):
    return [tips.day.unique()[i]]
# load dataset and create figure
tips = sns.load_dataset("tips")
num_plots = 4
fig, axes = plt.subplots(2,2,figsize=(10,10))
# iterate through all axes and create a violin plot
for i in range(num_plots):
    df = tips[tips.day.isin(buckets(i))]
    # map the running index i onto a (row, col) position in the 2x2 grid
    row = i // 2
    col = i % 2
    ax_curr = axes[row, col]
    sns.violinplot(x="sex", y="tip", data=df, ax=ax_curr)
    ax_curr.set_title(buckets(i))
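For the follow-up in Edit1 (overall size and rotated xticks), here is a minimal sketch rather than a definitive fix. It assumes a 5 by 2 grid as in the question, uses made-up random samples in place of att_sales, and draws with matplotlib's own violinplot so that it runs without seaborn. The figure size is set through figsize, and the tick rotation is applied to each axes separately with tick_params, since plt.xticks only affects the current axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 5, 2

# figsize is (width, height) in inches for the whole figure;
# each subplot gets roughly width/n_cols by height/n_rows
fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 30))

# axes.flat walks the 2D array row by row, so no manual row/col arithmetic is needed
for i, ax in enumerate(axes.flat):
    # placeholder data: four groups of 50 samples (stands in for one bucket of SKUs)
    samples = [rng.normal(loc=k, size=50) for k in range(4)]
    ax.violinplot(samples)
    ax.set_title(f"bucket {i}")
    # rotate the x tick labels on this particular axes
    ax.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
```

With seaborn, the same loop should work by passing ax=ax to sns.violinplot, as in the answer above.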
Note that in this particular example you can use a facet grid which will do the same exact thing as what I did by plotting each day in a separate plot. You can take advantage of the facet grid if you label each bucket of SKUs a unique id. See the very last example on this page
I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1) # Extracts relevant columns from larger dataframe
comp_relflux=comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1) # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot((bjd)%1, comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To get a single set of x-axis labels instead of one per row, you can use sharex. Also, by passing the number of columns you have to plt.subplots() as the number of rows, you get just one figure with all the subplots inside it. As there is no data available, I used random numbers below to demonstrate this. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
comp_relflux = pd.DataFrame(np.random.rand(100, 4)) #Random data - 4 columns
bjd=np.linspace(0,1,100) # Series of 100 points - 0 to 1
rows=len(comp_relflux.columns) # Number of columns = number of subplot rows
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12,6)) # The subplots... sharex is assigned here and I moved the size in here from your rcParams as well
for i, column in enumerate(comp_relflux.columns):
    # x and y are passed as keywords; recent seaborn versions no longer accept them positionally
    sns.scatterplot(x=(bjd)%1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
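The question also asked for no vertical spacing between the rows and for a column count that can drop to 1. A minimal sketch of one way to handle both, under some assumptions: the data is random, plain matplotlib scatter stands in for sns.scatterplot, and the "phase" label is made up. hspace=0 removes the gap between rows, and squeeze=False keeps the returned axes a 2D array even when there is only one row.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # placeholder data, 4 columns
bjd = np.linspace(0, 3, 100)                        # placeholder time values

n_rows = len(comp_relflux.columns)
fig, axes = plt.subplots(
    n_rows, 1,
    sharex=True,                  # one x-axis shared by every row
    squeeze=False,                # always a 2D array, even for a single column of data
    figsize=(12, 1.5 * n_rows),   # figure height grows with the number of rows
    gridspec_kw={"hspace": 0},    # no vertical gap between rows
)

for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")

axes[-1, 0].set_xlabel("phase")   # hypothetical label, shown once on the bottom row
```

Calling fig.savefig(...) on this figure afterwards writes all rows to a single image.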
I am working on a relatively large dataset (5000 rows) in pandas and would like to draw a bar plot that is continuous and uses different colors [1].
For every depth value there is a value of SBT.
Initially, I thought of generating a bar for each depth, but because of the amount of data, the graph does not display well and takes a really long time to load.
In the meantime, I generated a plot of the data, but with lines.
I added the code and the picture of this plot below [2].
fig, SBTcla = plt.subplots()
SBTcla.plot(SBT,datos['Depth (m)'], color='black',label='SBT')
plt.xlim(0, 10)
plt.grid(color='grey', linestyle='--', linewidth=1)
plt.title('SBT');
plt.xlabel('SBT');
plt.ylabel('Profundidad (mts)');
plt.gca().invert_yaxis();
Your graph consists of a lot of points that carry no information. Consecutive rows that contain the same SBT could be eliminated. Grouping consecutive rows with equal content can be done with a shift and a cumulative sum. The boolean expression looks for steps from one region to the next: at a step it returns True, and the sum increases by one.
x = datos.groupby((datos['SBT'].shift() != datos['SBT']).cumsum())
Each group can be plotted on its own, with a filled style
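A minimal sketch of what that could look like, not the original answer's code. The small `datos` frame is made up, each run of equal SBT is drawn as one filled block with fill_betweenx, and the bottom of each block is taken to be the top of the next run; the column names 'SBT' and 'Depth (m)' come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real `datos` dataframe
datos = pd.DataFrame({
    "Depth (m)": np.arange(0.0, 6.0, 0.5),
    "SBT": [3, 3, 3, 5, 5, 6, 6, 6, 6, 3, 4, 4],
})

# Label runs of consecutive equal SBT values: the counter increases at every step
run_id = (datos["SBT"].shift() != datos["SBT"]).cumsum().rename("run")

# One row per run: its SBT value and the depth where it starts
runs = datos.groupby(run_id).agg(sbt=("SBT", "first"), top=("Depth (m)", "min"))
# Assumption: a run extends down to where the next one starts
runs["bottom"] = runs["top"].shift(-1).fillna(datos["Depth (m)"].max())

fig, ax = plt.subplots(figsize=(4, 8))
cmap = plt.get_cmap("tab10")
for run in runs.itertuples():
    # one filled rectangle per run, colored by its SBT value
    ax.fill_betweenx([run.top, run.bottom], 0, run.sbt, color=cmap(int(run.sbt) % 10))

ax.set_xlim(0, 10)
ax.set_xlabel("SBT")
ax.set_ylabel("Depth (m)")
ax.invert_yaxis()
```

This draws one patch per run instead of one bar per row, which should keep the number of drawn objects far below 5000 when SBT changes rarely.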
I have been trying to merge these two plots together but have not found a built-in in the documentation for MatPlotLib on how to do so. I want to show the two bar values next to each and for every new entry, add the new entry to the graph while shifting the other entries over to make space. The plots are below.
As stated prior, when I say merge, I do not simply mean just plop Plot A onto Plot B, but rather join the plots together so both bar values are shown in the same graph, like this:
The reasoning for this is that I will be able to log all the entries in a single plot without having to manually do so. By implementing something like this in my code, it would make entries go a lot quicker.
EDIT: I understand that I can graph these two together, but that is not what I am looking for. Once I get the necessary input, my program creates a graph of that data and saves it as a file. I am looking to append any new data to that original file by just shifting the original value over to the left in order to make space.
EDIT 2: How could I extract the data from each plot and after doing so, create a new graph? This would seem to be another acceptable workaround.
Is there anything preventing you from plotting each of them side by side but changing the index?
a, b, c = 2, 5, 3
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
count = 0
ax.bar(count, a)
# if the program produces a new output then...
count += 1
ax.bar(count, b) # index means new bar plot has shifted
# again
count += 1
ax.bar(count, c) # shifted again
This should automatically expand the x-axis anyway. You may have to alter this slightly if you're particularly concerned about the width of these bars.
If this isn't what you wanted you could consider replotting with the bar container or even just stripping the height to reuse.
fig, ax = plt.subplots(1, 1)
count = 0
bar_cont = ax.bar(count, a) # reference to the bar container of interest
# the container holds one Rectangle per bar; the height lives on the rectangle
print(bar_cont[0].get_height())
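For EDIT 2 in the question (pulling the data back out of a plot and building a new graph from it), here is a rough sketch, assuming the original axes and its bar container are still available in the running program. The values 2, 5 and 3 are the same illustrative numbers as above. Note that this reads values from live matplotlib objects; it cannot recover data from an image file that has already been saved.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

# An existing plot with two bars (illustrative values)
old_fig, old_ax = plt.subplots()
old_bars = old_ax.bar([0, 1], [2, 5])

# Each bar is a Rectangle patch; its height is the plotted value
heights = [float(patch.get_height()) for patch in old_bars]

# Append the new entry and redraw everything on a fresh figure
heights.append(3.0)
new_fig, new_ax = plt.subplots()
new_bars = new_ax.bar(range(len(heights)), heights)
new_ax.set_xticks(range(len(heights)))
```

In practice it is usually simpler to keep the values in a list or file alongside the image and replot from that.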
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
people = ['JOHN DOE', 'BOB SMITH']
values = [14,14]
ax.bar(people,values)
plt.show()
This should be the solution. You just have to pass a list instead of a single value to the bar() function. A more detailed explanation is available here.
Using matplotlib, I can create figures that look like this:
Here, each row consists of a series of numbers from 0 to 0.6. The left hand axis text indicates the maximum value in each row. The bottom axis text represents the column indices.
The code for the actual grid essentially involves this line:
im = ax[r,c].imshow(info_to_use, vmin=0, vmax=0.6, cmap='gray')
where ax[r,c] is the current subplot axes at row r and column c, and info_to_use is a numpy array of shape (num_rows, num_cols) and has values between 0 and 0.6.
I am wondering if there is a way to convert the code above so that it instead displays bar charts, one per row? Something like this hand-drawn figure:
(The number of columns is not the same in my hand-drawn figure compared to the earlier one.) I know this would result in a very hard-to-read plot if it were embedded into a plot like the first one here. I would have this for a plot with fewer rows, which would make the bars easier to read.
The references that helped me make the first plot above were mostly from:
Python - Plotting colored grid based on values
custom matplotlib plot : chess board like table with colored cells
https://matplotlib.org/3.1.1/gallery/subplots_axes_and_figures/colorbar_placement.html#sphx-glr-gallery-subplots-axes-and-figures-colorbar-placement-py
https://matplotlib.org/3.1.1/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py
But I'm not sure how to make the jump from these to a bar chart in each row. Or at least something that could mirror it, e.g., instead of shading the full cell gray, only shade as much of it based on the percentage of the vmax?
import numpy as np
from matplotlib import pyplot as plt
a = np.random.rand(10,20)*.6
In a loop, call plt.subplot then plt.bar for each row in the 2-d array.
for i, thing in enumerate(a,1):
    plt.subplot(a.shape[0],1,i)
    plt.bar(range(a.shape[1]),thing)
plt.show()
plt.close()
Or, create all the subplots; then in a loop make a bar plot with each Axes.
fig, axes = plt.subplots(a.shape[0],1,sharex=True)
for ax, data in zip(axes, a):
    ax.bar(range(a.shape[1]), data)
plt.show()
plt.close()
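To get closer to the layout described in the question (row maximum on the left, column indices along the bottom, all rows on the same 0 to 0.6 scale), the second variant could be extended roughly as follows. This is only a sketch with random data; the gray color and the label formatting are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
a = rng.random((10, 20)) * 0.6   # placeholder for info_to_use
columns = np.arange(a.shape[1])

fig, axes = plt.subplots(a.shape[0], 1, sharex=True, sharey=True, figsize=(8, 10))
for ax, row in zip(axes, a):
    ax.bar(columns, row, color="gray")
    ax.set_ylim(0, 0.6)      # same scale as vmin/vmax in the imshow call
    ax.set_yticks([])        # hide per-row y ticks to reduce clutter
    # put the row maximum where the y label normally goes
    ax.set_ylabel(f"{row.max():.2f}", rotation=0, ha="right", va="center")

axes[-1].set_xticks(columns)  # column indices on the bottom row only
```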
I am trying to create a figure with four subplots using the Matplotlib object-based approach. I am having trouble setting the x-axis to hourly markers on each plot. With my present code the hourly marks are retained only on the last of the four subplots.
I have a list which contains four dataframes that were read in from CSV. I used pd.to_datetime to create an index. No problem.
I can loop through the four dataframes and plot my y variable (TS_comp) against time. This works fine and I get date/time on each x-axis. But what I want is to have just hour markers on each x-axis. When I add code in the loop to set the major locator, the x-axis labels end up wiped on the first three subplots. The two lines of code from the loop below are:
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
I do not understand why this is happening as each time it goes through the loop it should be addressing a different axis object. Note x-axis time ranges are different so not a simple matter of sharing the x-axis across the subplots.
fig, ax = plt.subplots(nrows=2, ncols=2)
i=0
hours = mdates.HourLocator(interval = 1)
for ax in fig.get_axes():
    ax.plot(dfs[i].TS_comp,'k-',markersize = 0.5)
    ax.xaxis.set_major_locator(hours)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
    i=i+1;
I expected to get hourly markers on each of the subplots, but ended up with hourly markers on just the last plot.
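A likely cause, as far as I can tell: a matplotlib locator instance is tied to a single axis, so the one `hours` object created before the loop ends up attached only to the last axis it was given. A minimal sketch of the usual workaround is to create a new HourLocator inside the loop for each axes. The four dataframes below are made up to stand in for the CSV data, each with a different time range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the four CSV-derived dataframes
rng = np.random.default_rng(3)
starts = ["2021-01-01 00:00", "2021-01-02 06:00", "2021-01-03 12:00", "2021-01-04 18:00"]
dfs = [
    pd.DataFrame(
        {"TS_comp": rng.normal(size=36)},
        index=pd.date_range(start, periods=36, freq="10min"),
    )
    for start in starts
]

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))
for ax, df in zip(axes.flat, dfs):
    ax.plot(df.index, df["TS_comp"], "k-", linewidth=0.5)
    # a fresh locator and formatter per axes instead of one shared instance
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))

fig.tight_layout()
```

Pairing the axes and dataframes with zip also removes the need for the manual counter i.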