I have a dataframe with 12 columns and 30 rows (only the first 5 rows are shown here):
0 1 2 3 4 5 6 7 8 9 10 11
0
10 0.420000 0.724000 0.552000 0.316000 0.176000 0.320000 0.228000 0.552000 0.476000 0.468000 0.560000 0.332000
20 0.387097 0.701613 0.516129 0.338710 0.177419 0.346774 0.217742 0.443548 0.483871 0.435484 0.516129 0.330645
30 0.353659 0.731707 0.365854 0.280488 0.158537 0.243902 0.231707 0.451220 0.524390 0.414634 0.451220 0.329268
40 0.377049 0.557377 0.311475 0.213115 0.213115 0.262295 0.262295 0.459016 0.540984 0.475410 0.377049 0.262295
50 0.285714 0.673469 0.183673 0.183673 0.163265 0.285714 0.204082 0.387755 0.489796 0.367347 0.306122 0.244898
I would like to draw a dot plot with the row indices on the x-axis and the column values on the y-axis (i.e., 12 dots at each x).
I have tried the following:
df.plot()
and I get this plot
I would like to show only the markers (dots) and not the lines
I tried df.plot(linestyle='None') but then I get an empty plot.
How can I change my code to show the dots/markers and hide the lines?
pandas.DataFrame.plot passes **kwargs to matplotlib's .plot method. Thus you can use any of the matplotlib.lines.Line2D properties:
df.plot(ls='', marker='.')
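As a minimal, self-contained sketch of the above: a Line2D has no marker by default, which is why removing only the line style leaves an empty plot. The data here is random stand-in data (an assumption, shaped like the 30 x 12 frame in the question), and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import numpy as np
import pandas as pd

# Stand-in data (assumption): 30 rows indexed 10, 20, ..., 300 and 12 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((30, 12)), index=range(10, 310, 10))

# ls='' removes the connecting lines; marker='.' draws a dot at every data point
ax = df.plot(ls='', marker='.', legend=False)
ax.set_xlabel('row index')
```

Each of the 12 columns becomes its own Line2D, so every x position carries 12 dots.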
How do I add the x-axis(Month) to a simple Matplotlib
My Dataset:
Month Views CMA30
0 11 24662 24662.000000
1 11 2420 13541.000000
2 11 11318 12800.000000
3 11 8529 11732.250000
4 10 78861 25158.000000
5 10 1281 21178.500000
6 10 22701 21396.000000
7 10 17088 20857.500000
This is my code:
df[['Views', 'CMA30']].plot(label='Views', figsize=(5, 5))
This gives me Views and CMA30 on the y-axis. How do I put Month (1-12) on the x-axis?
If you want to average the values per month, try groupby/mean, which makes Month the index and therefore the x-axis:
df.groupby('Month')[['Views','CMA30']].mean().plot(label='Views', figsize=(5, 5))
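A runnable sketch of the groupby/mean approach, using the eight rows shown in the question. The marker and the explicit tick placement are illustrative choices, not part of the original answer, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import pandas as pd

df = pd.DataFrame({
    'Month': [11, 11, 11, 11, 10, 10, 10, 10],
    'Views': [24662, 2420, 11318, 8529, 78861, 1281, 22701, 17088],
    'CMA30': [24662.0, 13541.0, 12800.0, 11732.25, 25158.0, 21178.5, 21396.0, 20857.5],
})

# One row per month; Month becomes the index, which pandas plots on the x-axis
monthly = df.groupby('Month')[['Views', 'CMA30']].mean()

ax = monthly.plot(marker='o', figsize=(5, 5))
ax.set_xticks(list(monthly.index))  # one tick per month actually present in the data
ax.set_xlabel('Month')
```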
I want to display the confidence interval for each bar in my plot, but the intervals do not show. I have two dataframes, and my plot displays the average of the NUMBER_GIRLS column from each of them.
For example, consider the two dataframes (shown below).
schools_north_df
ID NAME NUMBER_GIRLS
----------------------------
1 SCHOOL_1 32
2 SCHOOL_2 12
3 SCHOOL_3 26
schools_south_df
ID NAME NUMBER_GIRLS
----------------------------
1 SCHOOL_1 56
2 SCHOOL_2 33
3 SCHOOL_3 34
Therefore, I have used this code (shown below) to plot my barplot with the confidence intervals showing for each bar - but when plotting it, the confidence interval does not show up.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
objects = ('North', 'South')
y_pos = np.arange(len(objects))
avg_girls = [schools_north_df['NUMBER_GIRLS'].mean(), schools_south_df['NUMBER_GIRLS'].mean()]
sns.barplot(x=y_pos, y=avg_girls, ci=95)
plt.xticks(y_pos, objects)
plt.title('Average Number of Girls')
plt.show()
Could anyone kindly point out what is wrong with my code? I really need the confidence interval to display on my barplot.
Thank you very much!
If you want seaborn to display the confidence intervals, you need to let seaborn aggregate the data by itself (that is to say, provide the raw data instead of calculating the mean yourself).
I would create a new dataframe with an extra column (region) to indicate whether the data are from the "north" or the "south" and then request seaborn to plot NUMBER_GIRLS vs region:
df = pd.concat([schools_north_df.assign(region='North'), schools_south_df.assign(region='South')])
output:
ID NAME NUMBER_GIRLS region
0 1 SCHOOL_1 32 North
1 2 SCHOOL_2 12 North
2 3 SCHOOL_3 26 North
0 1 SCHOOL_1 56 South
1 2 SCHOOL_2 33 South
2 3 SCHOOL_3 34 South
plot:
sns.barplot(data=df, x='region', y='NUMBER_GIRLS', ci=95)
I'm pretty new to plotting with matplotlib and I'm having a few problems with legends. I have this data set:
Wavelength CD Time
0 250.0 0.00000 1
1 249.8 -0.04278 1
2 249.6 -0.03834 1
3 249.4 -0.02384 1
4 249.2 -0.04817 1
... ... ... ...
3760 200.8 0.99883 15
3761 200.6 0.50277 15
3762 200.4 -0.19228 15
3763 200.2 0.81317 15
3764 200.0 0.90226 15
[3765 rows x 3 columns]
Column types:
Wavelength float64
CD float64
Time int64
dtype: object
Why are not all the values shown in the legend when I plot with Time as the categorical variable?
x = df1['Wavelength']
y = df1['CD']
z = df1['Time']
sns.lineplot(x=x, y=y, hue=z)
plt.tight_layout()
plt.show()
But I can plot the same data with a colorbar using pandas' built-in, matplotlib-based plotting function, like this:
df1.plot.scatter('Wavelength', 'CD', c='Time', cmap='RdYlBu')
What's the best way of choosing between discrete and continuous legends using matplotlib/seaborn?
Many thanks!
How can I plot a set of numbers like the following (first column is the x-axis, second column is the y-axis)?
1 3.4335e-14
2 5.8945e-28
3 6.7462e-42
4 5.7908e-56
5 3.9765e-70
6 2.2756e-84
7 1.1162e-98
8 4.7904e-113
9 1.8275e-127
10 6.2749e-142
11 1.9586e-156
12 5.6041e-171
13 1.4801e-185
14 3.6300e-200
15 8.3091e-215
16 1.7831e-229
17 3.6013e-244
18 6.8694e-259
19 1.2414e-273
For now I get:
I can't figure out how to plot it properly, that is, without the flat line from 2 to the end and with correct y-axis values. I read these values from the file with:
x_values.append(line.split(' ')[0])
y_values.append(float(line.split(' ')[1]))
You may wish to switch the y-axis to a log scale, e.g.:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
_, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yscale("log")
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'))
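For a self-contained version, here is a sketch that parses the first five rows from the question and plots them on a log scale. On a linear axis every value after the first is indistinguishable from zero next to 3.4e-14, which is what produces the flat line. Reading from an inline string instead of a file, and converting x to int, are assumptions made for illustration; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

# First five rows from the question, standing in for the input file
raw = """1 3.4335e-14
2 5.8945e-28
3 6.7462e-42
4 5.7908e-56
5 3.9765e-70"""

x_values, y_values = [], []
for line in raw.splitlines():
    first, second = line.split()    # split() tolerates repeated spaces and tabs
    x_values.append(int(first))     # numeric x instead of strings
    y_values.append(float(second))

fig, ax = plt.subplots()
ax.plot(x_values, y_values, marker='o')
ax.set_xticks(x_values)
ax.set_yscale('log')  # positions follow the exponent, so tiny values stay distinguishable
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'))
```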
Below are three columns: VMDensity, the number of servers with correctable errors, and the average number of VM reboots.
VMDensity correctableCount avgVMReboots
LowDensity 7 5
HighDensity 1 23
LowDensity 5 11
HighDensity 1 23
LowDensity 9 5
HighDensity 1 22
HighDensity 1 22
LowDensity 9 2
LowDensity 9 6
LowDensity 5 3
I tried the following, but I am not sure how to color the points by group.
import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df.correctableCount, df.avgVMReboots)
Now I need to generate a scatter plot grouped by VMDensity. The low-density VMs should be in one color and the high-density VMs in another.
If I understand you correctly, you do not need to "group" the data: you want to plot all data points regardless and just color them differently. So try something like
plt.scatter(df.correctableCount, df.avgVMReboots, c=df.VMDensity.map({'LowDensity': 0, 'HighDensity': 1}))
scatter's c argument cannot take the raw df.VMDensity strings, so they are mapped to numbers here; you can also play with scatter's cmap parameter to choose the colors.
See this example from matplotlib's gallery.
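As a concrete sketch of the mapping idea, the strings can also be mapped straight to color names, which avoids choosing a colormap. The color choices and the colors dictionary are hypothetical, the data is the ten rows from the question, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'VMDensity': ['LowDensity', 'HighDensity', 'LowDensity', 'HighDensity', 'LowDensity',
                  'HighDensity', 'HighDensity', 'LowDensity', 'LowDensity', 'LowDensity'],
    'correctableCount': [7, 1, 5, 1, 9, 1, 1, 9, 9, 5],
    'avgVMReboots': [5, 23, 11, 23, 5, 22, 22, 2, 6, 3],
})

# Hypothetical color choice: one named color per density class
colors = {'LowDensity': 'tab:blue', 'HighDensity': 'tab:red'}
point_colors = df.VMDensity.map(colors).tolist()

fig, ax = plt.subplots()
ax.scatter(df.correctableCount, df.avgVMReboots, c=point_colors)
ax.set_xlabel('correctableCount')
ax.set_ylabel('avgVMReboots')
```

Several rows in this sample share identical coordinates, so fewer than ten distinct dots are visible.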