I have the following dataframe:
land_cover 1 2 3 4 5 6 size
0 20 19.558872 6.856950 3.882243 1.743048 1.361306 1.026382 16.520265
1 30 9.499454 3.513521 1.849498 0.836386 0.659660 0.442690 8.652517
2 40 10.173790 3.123167 1.677257 0.860317 0.762718 0.560290 11.925280
3 50 10.098777 1.564575 1.280729 0.894287 0.884028 0.887448 12.647710
4 60 6.166109 1.588687 0.667839 0.230659 0.143044 0.070628 2.160922
5 110 17.846565 3.884678 2.202129 1.040551 0.843709 0.673298 30.406541
I want to plot the data so that:
. land_cover is the x-axis
. columns 1 to 6 are stacked bars, one stack per land_cover class (row)
. the column 'size' is plotted on a second y-axis, as a simple point symbol for every row plus a smooth line connecting the points
Any ideas?
Your code is pretty much fine. I only added two more lines:
import matplotlib.pyplot as plt
df.plot(x="land_cover", y=[1, 2, 3, 4, 5, 6], stacked=True, kind="bar")
ax = df['size'].plot(secondary_y=True, color='k', marker='o')
ax.set_ylabel('size')
plt.show()
In general, you can just add one extra argument to your plot call: secondary_y=['size'].
In this case a separate plot call is easier, though, because 'size' should be drawn as a line while the other columns are drawn as bars.
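For reference, the same layout can also be built with an explicit twin axis instead of secondary_y. This is only a minimal sketch: it uses the first three rows of the question's table, rounded to two decimals, and the name ax_size is mine, not from the original code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the question's table, rounded to two decimals.
df = pd.DataFrame({
    "land_cover": [20, 30, 40],
    1: [19.56, 9.50, 10.17],
    2: [6.86, 3.51, 3.12],
    3: [3.88, 1.85, 1.68],
    4: [1.74, 0.84, 0.86],
    5: [1.36, 0.66, 0.76],
    6: [1.03, 0.44, 0.56],
    "size": [16.52, 8.65, 11.93],
})

fig, ax = plt.subplots()

# Stacked bars on the left y-axis; pandas places the bars at x = 0, 1, 2, ...
df.set_index("land_cover")[[1, 2, 3, 4, 5, 6]].plot(kind="bar", stacked=True, ax=ax)

# Second y-axis that shares the x-axis with the bars.
ax_size = ax.twinx()
# Use the same 0, 1, 2, ... positions so the points sit over the bars.
ax_size.plot(range(len(df)), df["size"], color="k", marker="o")
ax_size.set_ylabel("size")
```

Call plt.show() afterwards to display the figure. Plotting 'size' against the positions 0, 1, 2, ... rather than against the land_cover values is what keeps the points aligned with the bars.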
I have a dataframe with 12 columns and 30 rows (only the first 5 rows are shown here):
0 1 2 3 4 5 6 7 8 9 10 11
0
10 0.420000 0.724000 0.552000 0.316000 0.176000 0.320000 0.228000 0.552000 0.476000 0.468000 0.560000 0.332000
20 0.387097 0.701613 0.516129 0.338710 0.177419 0.346774 0.217742 0.443548 0.483871 0.435484 0.516129 0.330645
30 0.353659 0.731707 0.365854 0.280488 0.158537 0.243902 0.231707 0.451220 0.524390 0.414634 0.451220 0.329268
40 0.377049 0.557377 0.311475 0.213115 0.213115 0.262295 0.262295 0.459016 0.540984 0.475410 0.377049 0.262295
50 0.285714 0.673469 0.183673 0.183673 0.163265 0.285714 0.204082 0.387755 0.489796 0.367347 0.306122 0.244898
I would like to plot a dot plot with the row indices on the x-axis and the column values on the y-axis (i.e. 12 dots at each x).
I have tried the following:
df.plot()
and I get this plot
I would like to show only the markers (dots) and not the lines
I tried df.plot(linestyle='None') but then I get an empty plot.
How can I change my code to show the dots/markers and hide the lines?
pandas.DataFrame.plot passes **kwargs to matplotlib's .plot method. Thus you can use any of the matplotlib.lines.Line2D properties:
df.plot(ls='', marker='.')
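The reason linestyle='None' alone gives an empty plot is that lines have no marker by default, so once the line is removed nothing is left to draw. A small sketch of the markers-only call, using random numbers as a stand-in for the question's data (the frame below is made up, only its shape follows the question):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 5 rows x 12 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 12)), index=[10, 20, 30, 40, 50])

# No connecting line, one dot per value: 12 dots above each row index.
ax = df.plot(linestyle="", marker=".", legend=False)
ax.set_xlabel("row index")
```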
How can I plot a set of numbers like the following (the first column is the x-axis, the second column is the y-axis)?
1 3.4335e-14
2 5.8945e-28
3 6.7462e-42
4 5.7908e-56
5 3.9765e-70
6 2.2756e-84
7 1.1162e-98
8 4.7904e-113
9 1.8275e-127
10 6.2749e-142
11 1.9586e-156
12 5.6041e-171
13 1.4801e-185
14 3.6300e-200
15 8.3091e-215
16 1.7831e-229
17 3.6013e-244
18 6.8694e-259
19 1.2414e-273
For now I get:
I can't figure out how to plot it properly, that is, without the flat line from 2 to the end and with correct y-axis values. I read these values from the file with:
x_values.append(line.split(' ')[0])
y_values.append(float(line.split(' ')[1]))
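You may wish to switch the y-axis to a log scale, e.g.:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
# x_values and y_values are the lists read from the file in the question
x = [int(v) for v in x_values]  # the question stores x as strings
y = y_values
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yscale("log")
# show the y tick labels in scientific notation
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'))
plt.show()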
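To make the whole path from file to plot concrete, here is a hedged, self-contained sketch. The io.StringIO object is only a stand-in for the real file (I am assuming whitespace-separated columns, as in the question), and it holds just the first three rows of the data.

```python
import io

import matplotlib.pyplot as plt

# Stand-in for the question's file: the first three rows of the data.
data_file = io.StringIO("1 3.4335e-14\n2 5.8945e-28\n3 6.7462e-42\n")

x_values, y_values = [], []
for line in data_file:
    x_text, y_text = line.split()
    x_values.append(int(x_text))    # numeric x instead of the raw string
    y_values.append(float(y_text))

fig, ax = plt.subplots()
ax.plot(x_values, y_values, marker="o")
# On a linear axis every value after the first is indistinguishable from 0;
# a log axis spreads the points out by order of magnitude.
ax.set_yscale("log")
ax.set_xticks(x_values)
```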
I have a dataframe whose columns are on very different scales, and I am trying to get a plot with subplots, something like this.
raw_data = {'strike_date': ['2019-10-31', '2019-11-31','2019-12-31','2020-01-31', '2020-02-31'],
'strike': [100.00, 113.00, 125.00, 126.00, 135.00],
'lastPrice': [42, 32, 36, 18, 23],
'volume': [4, 24, 31, 2, 3],
'openInterest': [166, 0, 0, 62, 12]}
ploty_df = pd.DataFrame(raw_data, columns = ['strike_date', 'strike', 'lastPrice', 'volume', 'openInterest'])
ploty_df
strike_date strike lastPrice volume openInterest
0 2019-10-31 100.0 42 4 166
1 2019-11-31 113.0 32 24 0
2 2019-12-31 125.0 36 31 0
3 2020-01-31 126.0 18 2 62
4 2020-02-31 135.0 23 3 12
This is what I have tried so far with twinx. As you can see, the output is flat, without any scale difference between strike and volume.
fig, ax = plt.subplots()
fig.subplots_adjust(right=0.75)
mm = ax.twinx()
yy = ax.twinx()
for col in ploty_df.columns:
mm.plot(ploty_df.index,ploty_df[[col]],label=col)
mm.set_ylabel('volume')
yy.set_ylabel('strike')
yy.spines["right"].set_position(("axes", 1.2))
yy.set_ylim(mm.get_ylim()[0]*12, mm.get_ylim()[1]*12)
plt.tick_params(axis='both', which='major', labelsize=16)
handles, labels = mm.get_legend_handles_labels()
mm.legend(fontsize=14, loc=6)
plt.show()
and the output
The main problem with your script is that you are generating three axes but only plotting on one of them. You need to think of each axes as a separate object with its own y-scale, y-limits and so on. For example, when you call fig, ax = plt.subplots() you generate the first axes, which you call ax (this is the standard y-axis with the scale on the left side of your plot). If you want to plot something on this axes you should call ax.plot(), but in your script you are plotting everything on the axes that you called mm.
I think you should really go through the matplotlib documentation to understand these concepts better. For plotting on multiple y-axes I would recommend having a look at this example.
Below you can find a basic example that plots your data on three different y-axes; you can take it as a starting point to produce the graph you are looking for.
import pandas as pd
import matplotlib.pyplot as plt
# convert the index of your dataframe to datetime (the dates must exist, see the p.s.)
ploty_df.index = pd.DatetimeIndex(ploty_df.strike_date)
fig, ax = plt.subplots(figsize=(15, 7))
fig.subplots_adjust(right=0.75)
# left axis: strike
l1, = ax.plot(ploty_df['strike'], 'r')
ax.set_ylabel('Strike')
# first right axis: lastPrice
ax2 = ax.twinx()
l2, = ax2.plot(ploty_df['lastPrice'], 'g')
ax2.set_ylabel('lastPrice')
# second right axis: volume, moved outward so it does not overlap the first
ax3 = ax.twinx()
l3, = ax3.plot(ploty_df['volume'], 'b')
ax3.set_ylabel('volume')
ax3.spines["right"].set_position(("axes", 1.2))
ax3.spines["right"].set_visible(True)
ax.legend((l1, l2, l3), ('Strike', 'lastPrice', 'volume'), loc='center left')
plt.show()
Here is the result:
P.S. Your example dataframe contains non-existent dates (31 November 2019 and 31 February 2020), so you have to correct those before you can convert the index to datetime.
Below are three columns: VMDensity, servers with correctable errors (correctableCount) and VM reboots (avgVMReboots).
VMDensity correctableCount avgVMReboots
LowDensity 7 5
HighDensity 1 23
LowDensity 5 11
HighDensity 1 23
LowDensity 9 5
HighDensity 1 22
HighDensity 1 22
LowDensity 9 2
LowDensity 9 6
LowDensity 5 3
I tried the following, but I am not sure how to plot the groups in different colors.
import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df.correctableCount, df.avgVMReboots)
Now I need to generate a scatter plot grouped by VMDensity: the low-density VMs should be in one color and the high-density VMs in another.
If I understand you correctly, you do not need to "group" the data: you want to plot all data points regardless and just color them differently. So try something like
plt.scatter(df.correctableCount, df.avgVMReboots, c=df.VMDensity)
For this to work you will need to map the df.VMDensity strings to numbers first, and possibly adjust scatter's cmap parameter.
See this example from matplotlib's gallery.
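An alternative that avoids the string-to-number mapping, and gives a legend for free, is to call scatter once per VMDensity value. This is a sketch rather than the only way to do it; the frame holds the first five rows of the question's table and the colors are simply matplotlib's default cycle.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the question's table.
df = pd.DataFrame({
    "VMDensity": ["LowDensity", "HighDensity", "LowDensity", "HighDensity", "LowDensity"],
    "correctableCount": [7, 1, 5, 1, 9],
    "avgVMReboots": [5, 23, 11, 23, 5],
})

fig, ax = plt.subplots()
# One scatter call per group: each call takes the next color in the cycle
# and contributes its own legend entry.
for density, group in df.groupby("VMDensity"):
    ax.scatter(group["correctableCount"], group["avgVMReboots"], label=density)

ax.set_xlabel("correctableCount")
ax.set_ylabel("avgVMReboots")
ax.legend(title="VMDensity")
```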
I want to make a scatter plot with a wavelength (float) on the y-axis and the spectral class (a list of characters/strings) on the x-axis, labels = ['B','A','F','G','K','M']. The data are stored in a pandas dataframe, df.
df['Spec Type Index']
0 NaN
1 A
2 G
. .
. .
167 K
168 Nan
169 G
Then,
df['Disk Major Axis "']
0 4.30
1 4.50
2 22.00
. .
. .
167 1.32
168 0.28
169 25.00
Thus, I thought this should be done simply with
plt.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
But I get this annoying error
could not convert string to float: 'G'
After fixing this, I want to make custom xticks as follows. However, how can I
labels = ['B','A','F','G','K','M']
ticks = np.arange(len(labels))
plt.xticks(ticks, labels)
First, I think you have to map those strings to integers so that matplotlib can decide where to place the points.
labels = ['B','A','F','G','K','M']
mapping = {'B': 0,'A': 1,'F': 2,'G': 3,'K': 4,'M': 5}
df = df.replace({'Spec Type Index': mapping})
Then plot the scatter,
fig, ax = plt.subplots()
ax.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
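Finally, fix the tick positions to match the mapping before setting the labels,
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)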
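Put together, the steps look roughly like the sketch below. It uses four rows taken from the question's output and builds the mapping from labels with enumerate; the name positions is mine. Rows with a missing spectral class stay NaN and are simply not drawn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = ["B", "A", "F", "G", "K", "M"]

# Four rows from the question's output, including one missing class.
df = pd.DataFrame({
    "Spec Type Index": [np.nan, "A", "G", "K"],
    'Disk Major Axis "': [4.30, 4.50, 22.00, 1.32],
})

# Map each class to its position in `labels`; missing classes stay NaN.
positions = df["Spec Type Index"].map({label: i for i, label in enumerate(labels)})

fig, ax = plt.subplots()
ax.scatter(positions, df['Disk Major Axis "'])

# Pin one tick per class first, then name the ticks.
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
```

Setting the tick positions explicitly matters: without it the labels would be attached to whatever ticks matplotlib picked automatically, which need not be 0 to 5.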