I currently have a dataframe, df:
In [1]: df
Out [1]:
one  two
1.5  11.22
2    15.36
2.5  11
3.3  12.5
3.5  14.78
5    9
6.2  26.14
I used this code to get a heat map:
In [2]:
plt.figure(figsize=(30, 7))
plt.title('Test')
ax = sns.heatmap(data=df, annot=True,)
plt.xlabel('Test')
ax.invert_yaxis()
value = 6
index = np.abs(df.index - value).argmin()
ax.axhline(index + .5, ls='--')
print(index)
Instead, I want the y-axis to scale automatically and show the values of the second column (two) at their true positions along the full axis. For example, there should be a clear empty space between 3.5 and 5.0, because there are no values there; the positions in between should appear on the y-axis with a value of 0 against them.
This can be easily achieved with a bar plot instead:
plt.bar(df['one'], df['two'], color=list('rgb'), width=0.2, alpha=0.4)
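One way to get those gaps with the heatmap itself is to reindex the data onto a regular grid before plotting, so that positions without data become explicit rows filled with 0. The sketch below assumes that the one column should act as the y-axis, that a grid spacing of 0.1 is fine enough, and that 0 is an acceptable filler; SCALE, steps and dense are names introduced here for illustration only.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "one": [1.5, 2, 2.5, 3.3, 3.5, 5, 6.2],
    "two": [11.22, 15.36, 11, 12.5, 14.78, 9, 26.14],
})

SCALE = 10  # assumed resolution: 10 grid steps per unit, i.e. a spacing of 0.1

# Work in integer grid steps to avoid floating-point mismatches when reindexing.
steps = (df["one"] * SCALE).round().astype(int)
full_grid = list(range(steps.min(), steps.max() + 1))

# One row per grid position; positions without data get the value 0.
dense = df.set_index(steps)["two"].reindex(full_grid, fill_value=0.0)
dense.index = dense.index / SCALE
dense = dense.rename_axis("one").to_frame()

# Show the stretch between 3.5 and 5.0, which is now filled with zeros.
print(dense.loc[3.5:5.0])
```

dense can then be passed to sns.heatmap in place of df. With this many rows, annot=True is likely to look crowded, so the annotations may be better switched off.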
As an example, I have the following DataFrame df. I want to plot Price on the x-axis and share_1 and share_2 on the y-axis as stacked bars. I want to avoid pandas.plot and use plt.bar instead, extracting the x and y values from the DataFrame.
Price  size  share_1  share_2
10     1     0.05     0.95
10     2     0.07     0.93
10     3     0.1      0.95
20     4     0.15     0.75
20     5     0.2      0.8
20     6     0.35     0.65
30     7     0.5      0.5
30     8     0.53     0.47
30     9     0.6      0.4
This is the way I proceed:
x= df['Price']
y1= df['share_1']
y2= df['share_2']
plt.bar(x,y1,label='share_1')
plt.bar(x,y2,label='share_2')
My problem is that matplotlib seems to remove the duplicate values on the x-axis, or perhaps it automatically averages the rows that share a value, so I get 3 values on the x-axis and not the 6 I am aiming for. I don't know the reason.
My questions are:
Is it possible to extract the x and y values as I did, or should I convert them to another form, such as strings or a list?
How can I stop the duplicate values from being collapsed on the x-axis? I want exactly as many x values as there are rows in the DataFrame.
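Try:
import matplotlib.pyplot as plt
# use one position per row so that rows with the same Price do not overlap
pos = list(range(len(df)))
labels = df['Price'].astype(str)
fig, ax = plt.subplots()
ax.bar(pos, y1, label="share_1")
# stack share_2 on top of share_1
ax.bar(pos, y2, label="share_2", bottom=y1)
ax.set_xticks(pos)
ax.set_xticklabels(labels)
ax.legend()
plt.show()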
As an aside, consider using pandas.plot as follows:
fig,ax = plt.subplots()
df.plot.bar(x="Price", y=["share_1","share_2"], stacked=True, ax=ax)
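On why the original attempt showed only three bars: matplotlib does not remove or average anything. It draws one bar per row, but rows that share a Price land on the same x position and cover each other. The check below is a small sketch using the data from the question (reading the trailing dots as typos); the variable names are introduced here for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Price": [10, 10, 10, 20, 20, 20, 30, 30, 30],
    "share_1": [0.05, 0.07, 0.1, 0.15, 0.2, 0.35, 0.5, 0.53, 0.6],
    "share_2": [0.95, 0.93, 0.95, 0.75, 0.8, 0.65, 0.5, 0.47, 0.4],
})

fig, (ax_overlap, ax_rows) = plt.subplots(1, 2, figsize=(10, 4))

# Passing the Series directly works; no conversion to list or str is needed.
overlapping = ax_overlap.bar(df["Price"], df["share_1"])

# Count the bars drawn and the distinct x positions they occupy.
print(len(overlapping), len({bar.get_x() for bar in overlapping}))

# One position per row keeps every bar visible.
pos = list(range(len(df)))
separate = ax_rows.bar(pos, df["share_1"])
ax_rows.set_xticks(pos)
ax_rows.set_xticklabels(df["Price"].astype(str))

plt.show()
```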
I have the following dataset:
df = pd.DataFrame({'cls': [1,2,2,1,2,1,2,1,2,1,2],
'x': [10,11,21,21,8,1,4,3,5,6,2],
'y': [10,1,2,2,5,2,4,3,8,6,5]})
df['bin'] = pd.qcut(np.array(df['x']), 4)
a = df.groupby(['bin', 'cls'])['y'].mean()
a
This gives me
bin           cls
(0.999, 3.5]  1       2.5
              2       5.0
(3.5, 6.0]    1       6.0
              2       6.0
(6.0, 10.5]   1      10.0
              2       5.0
(10.5, 21.0]  1       2.0
              2       1.5
Name: y, dtype: float64
I want to plot the right-most column (that is, the average of y per cls per bin) per bin per class. That is, for each bin we have two values of y that I would like to plot as points/scatters. Is that possible using matplotlib or seaborn?
You can indeed use seaborn for what you're asking. Does this work?
# import libraries
import matplotlib.pyplot as plt
import seaborn as sns
# set up some plotting options
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(1,1,1)
# we reset index to avoid having to do multi-indexing
a = a.reset_index()
# use seaborn with argument 'hue' to do the grouping
sns.barplot(x="bin", y="y", hue="cls", data=a, ax=ax)
plt.show()
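EDIT: I've just noticed that you wanted to plot "points". I wouldn't advise it for this dataset, but you can do that by replacing barplot with stripplot or pointplot. catplot also draws points, but it is a figure-level function and does not take the ax argument.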
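If points are really what is wanted, a plain matplotlib scatter is another option. This is a minimal sketch, assuming the bins are converted to string labels so that matplotlib treats them as categories; observed=True is passed to groupby only to keep empty bin/class combinations out of the result.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({'cls': [1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2],
                   'x': [10, 11, 21, 21, 8, 1, 4, 3, 5, 6, 2],
                   'y': [10, 1, 2, 2, 5, 2, 4, 3, 8, 6, 5]})
df['bin'] = pd.qcut(np.array(df['x']), 4)

# Mean of y per bin and class, flattened into ordinary columns.
a = df.groupby(['bin', 'cls'], observed=True)['y'].mean().reset_index()
a['bin'] = a['bin'].astype(str)  # string labels act as x categories

fig, ax = plt.subplots(figsize=(6, 4))
# One scatter call per class gives each class its own color and legend entry.
for cls, part in a.groupby('cls'):
    ax.scatter(part['bin'], part['y'], label=f"cls {cls}")
ax.set_xlabel('bin')
ax.set_ylabel('mean of y')
ax.legend()
plt.show()
```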
I want to plot a box plot with my DataFrame:
A B C
max 10 11 14
min 3 4 10
q1 5 6 12
q3 9 7 13
How can I plot a box plot from these fixed values?
You can use the Axes.bxp method in matplotlib, based on this helpful answer. The input is a list of dictionaries containing the relevant values, but the median is a required key in these dictionaries. Since the data you provided does not include medians, I have made up medians in the code below (but you will need to calculate them from your actual data).
import matplotlib.pyplot as plt
import pandas as pd
# reproducing your data
df = pd.DataFrame({'A':[10,3,5,9],'B':[11,4,6,7],'C':[14,10,12,13]})
# add a row for median, you need median values!
sample_medians = {'A':7, 'B':6.5, 'C':12.5}
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df, pd.DataFrame([sample_medians])], ignore_index=True)
df.index = ['max','min','q1','q3','med']
Here is the modified df with medians included:
>>> df
A B C
max 10.0 11.0 14.0
min 3.0 4.0 10.0
q1 5.0 6.0 12.0
q3 9.0 7.0 13.0
med 7.0 6.5 12.5
Now we transform the df into a list of dictionaries:
labels = list(df.columns)
# create dictionaries for each column as items of a list
bxp_stats = df.apply(lambda x: {'med':x.med, 'q1':x.q1, 'q3':x.q3, 'whislo':x['min'], 'whishi':x['max']}, axis=0).tolist()
# add the column names as labels to each dictionary entry
for index, item in enumerate(bxp_stats):
    item.update({'label':labels[index]})
_, ax = plt.subplots()
ax.bxp(bxp_stats, showfliers=False);
plt.show()
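If the raw observations are still available, the medians do not have to be invented: matplotlib.cbook.boxplot_stats computes the same kind of dictionaries that bxp expects. The sketch below uses made-up observations for column A (raw_a is hypothetical, chosen only so that its quartiles match the table), so treat the numbers as an illustration rather than the original data.

```python
import matplotlib.pyplot as plt
from matplotlib import cbook

# Hypothetical raw observations for column A; the real data is not in the post.
raw_a = [3, 5, 7, 9, 10]

# boxplot_stats returns one dictionary per data set, with the keys
# 'med', 'q1', 'q3', 'whislo' and 'whishi' (plus a few more, e.g. 'fliers').
stats = cbook.boxplot_stats(raw_a)
stats[0]['label'] = 'A'

_, ax = plt.subplots()
ax.bxp(stats, showfliers=False)
plt.show()
```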
Unfortunately, the median is a required value, so it must be specified for every box. To hide it, we draw the median line as thin as possible so that it is virtually invisible.
If you want each box to be drawn with different specifications, they will have to be in different subplots. I understand if this looks kind of ugly, so you can play around with the spacing between subplots or consider removing some of the y-axes.
fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True)
# specify list of background colors, median line colors same as background with as thin of a width as possible
colors = ['LightCoral', '#FEF1B5', '#EEAEEE']
medianprops = [dict(linewidth = 0.1, color='LightCoral'), dict(linewidth = 0.1, color='#FEF1B5'), dict(linewidth = 0.1, color='#EEAEEE')]
# create a list of boxplots of length 3
bplots = [axes[i].bxp([bxp_stats[i]], medianprops=medianprops[i], patch_artist=True, showfliers=False) for i in range(len(df.columns))]
# set each boxplot a different color
for i, bplot in enumerate(bplots):
    for patch in bplot['boxes']:
        patch.set_facecolor(colors[i])
plt.show()
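If separate subplots are not wanted, one possible alternative is to draw all boxes in a single Axes and restyle the returned artists afterwards, since bxp returns the box patches and median lines it created. This is a sketch under the assumption that hiding the median lines after drawing is acceptable; the statistics are the ones used above, including the made-up medians.

```python
import matplotlib.pyplot as plt

# Same statistics as above; the medians are made up, as in the answer.
bxp_stats = [
    {'label': 'A', 'med': 7.0, 'q1': 5.0, 'q3': 9.0, 'whislo': 3.0, 'whishi': 10.0},
    {'label': 'B', 'med': 6.5, 'q1': 6.0, 'q3': 7.0, 'whislo': 4.0, 'whishi': 11.0},
    {'label': 'C', 'med': 12.5, 'q1': 12.0, 'q3': 13.0, 'whislo': 10.0, 'whishi': 14.0},
]
colors = ['lightcoral', '#FEF1B5', '#EEAEEE']

fig, ax = plt.subplots()
bplot = ax.bxp(bxp_stats, patch_artist=True, showfliers=False)

# Color each box individually and hide its median line.
for patch, median, color in zip(bplot['boxes'], bplot['medians'], colors):
    patch.set_facecolor(color)
    median.set_visible(False)

plt.show()
```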
Say I have a data set in two columns. I want to draw a line plot for every 10 rows: the first 10 rows form one line, and the next 10 rows, which sit directly below the first 10, form another line on the same graph in a different color. The data sets are stacked on top of each other in a CSV file with no header.
Currently, my code takes in the entire column and plots it, but nothing differentiates one data set from another. I want multiple lines on the same graph, but because the CSV file holds all the data sets in one column, I need to plot every 10 rows separately.
EDIT
I have added sample data below. I would like the first column to be the x-axis and the second to be the y-axis.
Sample Data:
0 8.2
1 9.1
2 2.2
3 3.3
4 9.8
5 6.3
6 4.8
7 8.6
8 3.9
9 2.1
0 9.34
1 10.2
2 7.22
3 6.98
4 1.34
5 2.56
6 6.78
7 4.56
8 3.3
9 9.4
OK, try this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# this is the toy data
df = pd.DataFrame({0:list(range(10))*2,
                   1:np.random.uniform(9,11,20)})
# set up axes for plots
fig, ax = plt.subplots(1,1)
# the groupby argument groups every 10 rows together
# then pass it to the `lambda` function,
# which plots each chunk to the given plt axis
df.groupby(df.reset_index().index//10).apply(lambda x: ax.plot(x[0], x[1]) )
plt.show()
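Applied to the headerless file from the question, the same idea might look like the sketch below. It assumes the two columns are separated by whitespace and reads the sample rows from an in-memory string instead of a file path; the column names x and y and the constant CHUNK are introduced here for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; replace `raw` with the path to the CSV file.
raw = io.StringIO("""0 8.2
1 9.1
2 2.2
3 3.3
4 9.8
5 6.3
6 4.8
7 8.6
8 3.9
9 2.1
0 9.34
1 10.2
2 7.22
3 6.98
4 1.34
5 2.56
6 6.78
7 4.56
8 3.3
9 9.4
""")

# header=None because the file has no header row.
df = pd.read_csv(raw, sep=r"\s+", header=None, names=["x", "y"])

CHUNK = 10  # number of rows per data set

fig, ax = plt.subplots()
# Slice the frame into consecutive blocks of CHUNK rows and plot each as its own line.
for i, start in enumerate(range(0, len(df), CHUNK)):
    chunk = df.iloc[start:start + CHUNK]
    ax.plot(chunk["x"], chunk["y"], label=f"set {i + 1}")
ax.legend()
plt.show()
```

Each call to ax.plot picks the next color from the default cycle, and the legend labels tell the data sets apart.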
Option 2:
I found seaborn (sns) to be a better tool for this purpose:
fig, ax = plt.subplots(1,1, figsize=(10,6))
sns.lineplot(x=df[0],
y=df[1],
hue=df.reset_index().index//10,
data=df,
palette='Set1')
plt.show()
This is my DataFrame df:
bin qty
0 (0.0, 25.0] 3634.805042
1 (25.0, 50.0] 1389.567460
2 (50.0, 75.0] 1177.400000
3 (75.0, 100.0] 898.750000
4 (100.0, 125.0] 763.000000
I want to create a bar chart like a histogram. Y axis should be qty and X axis should be bin, for example "(0.0, 25.0]", rotated vertically.
I tried this, but it fails because bin is not numeric:
plt.bar(df.bin, df.qty, align='center', alpha=0.5)
plt.show()
Let's try, using Pandas Plot:
df.plot.bar('bin','qty', alpha=.5)
Using matplotlib:
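import numpy as np
# pd.np is no longer available in current pandas, so use numpy directly
x = np.arange(len(df['bin']))
fig,ax = plt.subplots(figsize=(14,8))
# bars are centered on x by default, so the ticks go at x as well
ax.bar(x,df['qty'])
ax.set_xticks(x)
# label each bar with its bin, rotated vertically
ax.set_xticklabels(df['bin'].astype(str), rotation=90)
plt.show()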
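If you want to use matplotlib directly, your bin column is not something matplotlib.pyplot.bar can position: it expects a sequence of scalars for the x coordinates. By default these are the bar centers; pass align='edge' if they are the left edges of the bins. One option is to reduce each bin to its left value, so that your dataframe looks like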
bin qty
0 0.0 3634.805042
1 25.0 1389.567460
2 50.0 1177.400000
3 75.0 898.750000
4 100.0 763.000000
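A sketch of that approach, assuming the bin column holds pandas Interval objects (as pd.cut would produce); here the bins are rebuilt with IntervalIndex.from_breaks for illustration. The left edges go in as x, and align='edge' tells bar that they are edges rather than centers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the frame from the question; the interval breaks are taken from its bin labels.
bins = pd.IntervalIndex.from_breaks([0.0, 25.0, 50.0, 75.0, 100.0, 125.0])
df = pd.DataFrame({
    "bin": bins,
    "qty": [3634.805042, 1389.567460, 1177.4, 898.75, 763.0],
})

# Pull the numeric left edge and the width out of each interval.
intervals = pd.IntervalIndex(df["bin"])
left = intervals.left.to_numpy()
widths = intervals.length.to_numpy()

fig, ax = plt.subplots()
# align="edge" places each bar's left side at x, so the bars span their bins.
bars = ax.bar(left, df["qty"], width=widths, align="edge", alpha=0.5)
ax.set_xticks(left)
ax.set_ylabel("qty")
plt.show()
```

Because the x-axis is numeric here, the bars touch like a histogram and the tick positions reflect the real bin boundaries.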