I am working on getting some graphs generated for 4 columns, with the COLUMN_NM being the main index.
The issue I am facing is that the column names are showing along the bottom. This is problematic for two reasons. First, there could be dozens of these columns, so the graph would look messy and could stretch too far to the right. Second, the names are getting cut off (though I am sure that can be fixed).
I would prefer to have the column names listed vertically in the legend box where 'MAX_COL_LENGTH' currently resides, and have the bars in a different color per column instead.
Any ideas how I would adjust this or suggestions to make this better?
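import matplotlib.pyplot as plt

# grid is the DataFrame holding COLUMN_NM plus the four metric columns
for col in ['DISTINCT_COUNT', 'MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT']:
    grid[['COLUMN_NM', col]].set_index('COLUMN_NM').plot.bar(title=col)
plt.show()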
In this case you can plot the bars one by one and set the legend label for each bar:
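import matplotlib.pyplot as plt
from matplotlib import gridspec

gs = gridspec.GridSpec(1, 1)
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(gs[:, :])
data = [1, 2, 3, 4, 5]
label = ['l1', 'l2', 'l3', 'l4', 'l5']
for n, (p, l) in enumerate(zip(data, label)):
    ax.bar(n, p, label=l)  # one bar per call, so each gets its own color and legend entry
ax.set_xticklabels([])  # hide the tick labels along the bottom
ax.legend()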
This is the output for the code above:
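Applied to the data in the question, the same idea might look like the sketch below. The `grid` DataFrame here is a hypothetical stand-in (its values and the names `metrics` and `figures` are made up for illustration); only the column names come from the question. The legend is pushed outside the axes so a long list of names does not cover the bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plots interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's `grid` DataFrame (values are invented).
grid = pd.DataFrame({
    "COLUMN_NM": ["id", "name", "email", "created_at"],
    "DISTINCT_COUNT": [100, 87, 95, 60],
    "MAX_COL_LENGTH": [6, 40, 64, 19],
    "MIN_COL_LENGTH": [1, 3, 8, 19],
    "NULL_COUNT": [0, 2, 5, 0],
})

metrics = ["DISTINCT_COUNT", "MAX_COL_LENGTH", "MIN_COL_LENGTH", "NULL_COUNT"]
figures = {}
for metric in metrics:
    fig, ax = plt.subplots(figsize=(6, 4))
    # One ax.bar call per row, so each bar gets the next color in the cycle
    # and its own legend entry.
    for position, (name, value) in enumerate(zip(grid["COLUMN_NM"], grid[metric])):
        ax.bar(position, value, label=name)
    ax.set_xticks([])  # no tick labels along the bottom; the legend names the bars
    ax.set_title(metric)
    # Anchor the legend just outside the right edge of the axes.
    ax.legend(title="COLUMN_NM", loc="upper left", bbox_to_anchor=(1.01, 1.0))
    fig.tight_layout()
    figures[metric] = fig
```

Note that the default color cycle has only ten colors, so with dozens of columns the colors will start repeating; a larger colormap would be needed in that case.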
I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
import matplotlib.pyplot as plt
import seaborn as sns

comp_relflux = measurements.filter(like='rel_flux_C', axis=1)  # Extracts relevant columns from larger dataframe
comp_relflux = comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1)  # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()  # a new figure per column, which is why the plots end up separate
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To get a single set of x-axis labels instead of one per row, you can use sharex. Also, by calling plt.subplots() with as many rows as you have columns, you get just one figure with all the subplots inside it. As there is no data available, I used random numbers below to demonstrate this. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Number of columns = number of subplots
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))  # sharex is set here; the figure size moves here from your rcParams
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
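The question also asked for no vertical spacing, a column count anywhere from 1 to 30, and a single saved image. A minimal sketch of those three points follows, under some assumptions: it uses plain `Axes.scatter` in place of `sns.scatterplot` to keep the dependencies small, the data is random placeholder data, and the output file name `comp_relflux.png` and the "phase" axis label are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # placeholder data, 4 columns
bjd = np.linspace(0, 1, 100)

n_rows = len(comp_relflux.columns)
fig, axes = plt.subplots(
    n_rows, 1,
    sharex=True,
    squeeze=False,                  # always return a 2-D array, even for a single column
    figsize=(12, 1.5 * n_rows),     # grow the figure with the number of rows
    gridspec_kw={"hspace": 0},      # no vertical gap between the rows
)
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")
    ax.set_ylabel(str(column))
axes[-1, 0].set_xlabel("phase")     # assumed label for bjd % 1

# Hypothetical output file name; everything ends up in one image.
fig.savefig("comp_relflux.png", dpi=150, bbox_inches="tight")
```

Passing `squeeze=False` matters for the one-column case: without it, `plt.subplots(1, 1)` returns a single Axes rather than an array, and indexing it would fail.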
This is a bit of an odd problem I've encountered. I'm trying to read data from a CSV file in Python, and have the two resulting lines be inside of the same box, with different scales so they're both clear to read.
The CSV file looks like this:
Date,difference,current
11/19/20, 0, 606771
11/20/20, 14612, 621383
and the code looks like this:
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('data.csv')
time = data['Date']  # the CSV header names this column 'Date'
ycurr = data['current']
ydif = data['difference']
fig, ax = plt.subplots()
line1, = ax.plot(time, ycurr, label='Current total')
line1.set_dashes([2, 2, 10, 2])  # 2pt line, 2pt break, 10pt line, 2pt break
line2, = ax.twinx().plot(time, ydif, dashes=[6, 2], label='Difference')
ax.legend()
plt.show()
I can display the graphs with the X-axis having Date values and Y-axis having difference values or current values just fine.
However, when I attempt to use subplots() and call the twinx() method for the second line, I can only see one of the two lines.
I initially thought this might be a formatting issue in my code, so I updated the code to create the second axes with ax2 = ax1.twinx() and plot the second line on it, but the result stayed the same. I suspect that this might be an issue with reading in the CSV data? I tried using x = np.linspace(0, 10, 500), y = np.sin(x), y2 = np.sin(x-0.05) instead, and that worked:
Everything is working as expected but probably not how you want it to work!
Each line consists of only two data points, so each one is drawn as a single straight segment. Both lines share the same x-coordinates, while the y-axis is scaled separately for each plot. And here is the problem: both axes are autoscaled so that their data fills the plot in the same way. This means the two lines lie exactly on top of each other. It is difficult to see because both lines are dashed.
You can see what is going on by changing the color of one of the lines. For example, add color='C1' to one of the plot calls.
By the way, what do you want to show with your plot? A curve consisting of two data points doesn't show much, and you are better off just showing the values directly instead.
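To make the overlap visible, something like the sketch below could be used. It embeds the two CSV rows from the question as a string so it runs on its own, gives each line its own color and marker, and passes both line handles to a single legend (a plain `ax.legend()` only picks up lines drawn on `ax`, not on the twin axes). The variable names `csv_text` and `ax2` are introduced here for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import pandas as pd

# Inline copy of the two data rows from the question.
csv_text = """Date,difference,current
11/19/20, 0, 606771
11/20/20, 14612, 621383
"""
# skipinitialspace strips the blank after each comma in the data rows.
data = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y-axis sharing the same x-axis

line1, = ax.plot(data["Date"], data["current"],
                 color="C0", marker="o", label="Current total")
line2, = ax2.plot(data["Date"], data["difference"],
                  color="C1", marker="s", linestyle="--", label="Difference")

ax.set_ylabel("current")
ax2.set_ylabel("difference")

# One legend listing the lines from both axes.
ax.legend(handles=[line1, line2], loc="upper left")
```

With only two points per series the two lines will still coincide on screen; the different colors, markers and the dashed style just make it obvious that both are being drawn.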
I have a heatmap using seaborn and am trying to adjust the height of the 4th plot below. You will see that it only has 2 rows of data vs the others that have more:
I have used the following code to create the plot:
f, ax = plt.subplots(nrows=4,figsize=(20,10))
cmap = plt.cm.GnBu_r
sns.heatmap(df,cbar=False,cmap=cmap,ax=ax[0])
sns.heatmap(df2,cbar=False,cmap=cmap,ax=ax[1])
sns.heatmap(df3,cbar=False,cmap=cmap,ax=ax[2])
sns.heatmap(df4,cbar=False,cmap=cmap,ax=ax[3])
Does anyone know how to make the 4th plot smaller in height, thus stretching out the other 3? The 4th plot will generally have 2-3 rows, whereas the others will have 6-7 most of the time. Thanks very much!
As normal, it is pretty funky/tedious with matplotlib. But here it is!
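cmap = plt.cm.GnBu_r  # same colormap as in the question
f = plt.figure(constrained_layout=True)
specs = f.add_gridspec(ncols=1, nrows=4, height_ratios=[1, 1, 1, .5])
# use a separate loop variable so the original df is not overwritten
for spec, frame in zip(specs, (df, df2, df3, df4)):
    ax = sns.heatmap(frame, cbar=False, cmap=cmap, ax=f.add_subplot(spec))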
You can change the heights relative to each other using height_ratios. You could also pass a width_ratios parameter if you wanted to change the relative widths of columns. The for loop iterates over the grid cells and the dataframes together, so each heatmap is drawn into its own cell.
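As a variation, the ratios can be derived from the number of rows in each dataframe and passed through `plt.subplots` via `gridspec_kw`, so the original `plt.subplots(nrows=4, ...)` call needs only a small change. This is a sketch under assumptions: the `frames` list holds randomly generated placeholder data, and `Axes.pcolormesh` stands in for `sns.heatmap` to keep the example free of a seaborn dependency.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: three frames with 6 rows and one with 2 rows.
frames = [pd.DataFrame(rng.random((n_rows, 5))) for n_rows in (6, 6, 6, 2)]

# Each subplot's height is proportional to its number of data rows.
ratios = [len(frame) for frame in frames]

fig, axes = plt.subplots(
    nrows=len(frames),
    figsize=(20, 10),
    gridspec_kw={"height_ratios": ratios},
    constrained_layout=True,
)
for ax, frame in zip(axes, frames):
    ax.pcolormesh(frame.to_numpy(), cmap="GnBu_r")
    ax.invert_yaxis()  # first data row at the top, as in a heatmap
```

With row counts as ratios, every cell ends up roughly the same height across all four plots, whatever the number of rows in each frame.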
I have a data file which consists of 131 columns and 4 rows. I am plotting it in Python as follows:
df = pd.read_csv('data.csv')
df.plot(figsize = (15,10))
Once it is plotted, all 131 legends are coming together like a huge tower over the line plots.
Please see the image here, which I have got :
Link to Image, I have clipped after v82 for better understanding
I have found some solutions on Stack Overflow (SO) for moving the legend elsewhere in the plot, but I could not find any way to break this legend tower into several smaller pieces and stack them side by side.
I would like my plot to look something like this.
My desired plot:
Any help would be appreciated. Thank you.
You can specify the position of the legend in relative (axes) coordinates using loc, and use the ncol parameter to split the single legend column into multiple columns. To do so, you need the Axes object returned by df.plot:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
ax = df.plot(figsize=(10, 7))
ax.legend(loc=(1.01, 0.01), ncol=4)  # just right of the axes, split into 4 columns
plt.tight_layout()
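Rather than hard-coding `ncol=4`, the number of legend columns could be computed from the number of series. The sketch below assumes a cap of 20 entries per legend column (an arbitrary choice) and uses random placeholder data with hypothetical column names v1 to v131 in place of data.csv.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for data.csv: 4 rows, 131 columns named v1..v131.
df = pd.DataFrame(rng.random((4, 131)),
                  columns=[f"v{i}" for i in range(1, 132)])

entries_per_column = 20  # assumed cap on legend entries per column
n_legend_cols = math.ceil(len(df.columns) / entries_per_column)

ax = df.plot(figsize=(15, 10))
# Anchor the legend's upper-left corner just outside the right edge of the axes.
legend = ax.legend(
    loc="upper left",
    bbox_to_anchor=(1.01, 1.0),
    ncol=n_legend_cols,
    fontsize="small",
)
```

When saving the figure, passing `bbox_inches="tight"` to `savefig` helps keep a legend placed outside the axes from being clipped.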
I would like to plot a set of error bars with different colors. Also different colours for my data points.
At the moment I am using:
colours = ['r','b','g','k','m']
labels = ['200Mpc','300Mpc','340Mpc','400Mpc','450Mpc']
fig2 = plt.figure(figsize=(7,5))
ax3 = fig2.add_subplot(111)
for a,b,c,d,e,f in zip(r0_array,gamma_array,r0_error,gamma_error,colours,labels):
    ax3.scatter(r0_array,gamma_array,c=e,label=f)
    ax3.errorbar(r0_array,gamma_array,xerr=c,yerr=d,fmt='o',color=e)
ax3.set_xlabel('$r_{0}$',fontsize=14)
ax3.set_ylabel(r'$\gamma$',fontsize=14)
ax3.legend(loc='best')
fig2.show()
This results in a figure with the error bars and colours overplotted on top of one another.
I can tell that the for loop runs 5 times, since I can see all the colours, but I don't see why the overplotting is happening!
I figured out the very silly mistake I was making!!
Inside the for loop, a, b, c, d, e and f each take a single value from the arrays r0_array, gamma_array, etc. on every iteration.
Instead of passing a, b, c and d to scatter and errorbar, I was passing the entire arrays r0_array and gamma_array each time, so every iteration replotted all the points in that iteration's colour.
for a,b,c,d,e,f in zip(r0_array,gamma_array,r0_error,gamma_error,colours,labels):
    ax3.scatter(a,b,color=e,label=f)
    ax3.errorbar(a,b,xerr=c,yerr=d,fmt='o',color=e)  # color=e keeps the error bars in the same colour as the point
Changing the loop to this fixed the issue.
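For reference, a self-contained version of the corrected loop might look like the sketch below. The numeric values are invented purely for illustration (the original arrays are not shown in the question). Since `errorbar` with `fmt='o'` already draws the marker, the separate `scatter` call is left out here and the label is given to `errorbar` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt

# Made-up values for illustration only.
r0_array = [4.1, 4.6, 5.0, 5.3, 5.9]
gamma_array = [1.60, 1.72, 1.75, 1.81, 1.90]
r0_error = [0.20, 0.25, 0.30, 0.30, 0.40]
gamma_error = [0.05, 0.06, 0.05, 0.08, 0.10]
colours = ["r", "b", "g", "k", "m"]
labels = ["200Mpc", "300Mpc", "340Mpc", "400Mpc", "450Mpc"]

fig, ax = plt.subplots(figsize=(7, 5))
containers = []
for r0, gamma, r0_err, gamma_err, colour, label in zip(
        r0_array, gamma_array, r0_error, gamma_error, colours, labels):
    # One point per call: scalar x and y with that point's own errors and colour.
    containers.append(
        ax.errorbar(r0, gamma, xerr=r0_err, yerr=gamma_err,
                    fmt="o", color=colour, ecolor=colour,
                    capsize=3, label=label)
    )

ax.set_xlabel(r"$r_{0}$", fontsize=14)
ax.set_ylabel(r"$\gamma$", fontsize=14)
ax.legend(loc="best")
```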