draw multiple box plots on a single graph [duplicate] - python
This question already has answers here:
Import multiple CSV files into pandas and concatenate into one DataFrame
(20 answers)
dataframe to long format
(2 answers)
seaborn boxplot and stripplot points aren't aligned over the x-axis by hue
(1 answer)
Closed 6 months ago.
For a given dataset, I am plotting box plots of object size at 10 different positions, as below:
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import numpy as np
import pandas as pd
font_prop = font_manager.FontProperties(size=18)
def plot(path, name=""):
    df = pd.read_csv(path, index_col=0)
    df = df.dropna()
    # x position (1, 2, ...) of every value once the columns are stacked
    Position = [1 + i // df.shape[0] for i in range(df.size)]
    # stack all columns into one flat list of values
    df_n = [df[col] for col in df.columns]
    df_t = pd.concat(df_n).tolist()
    # collect the values belonging to each position, one list per box
    groups = [[] for _ in range(max(Position))]
    for pos, value in zip(Position, df_t):
        groups[pos - 1].append(value)
    plt.figure(figsize=(12, 5))
    plt.scatter(Position, df_t, color='g')  # raw points
    b = plt.boxplot(groups, patch_artist=False)  # one box per position
    for median in b['medians']:
        median.set(color='r', linewidth=2)
    plt.show()
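The Position/df_t/groups bookkeeping above amounts to putting each column into its own list, with every value of column k plotted at x = k. A minimal sketch, assuming a DataFrame whose columns are all numeric positions (the tiny frame uses the first rows of P1 and P2 from the sample data), comparing the loop-based construction with `[df[c].tolist() for c in df.columns]` and `np.repeat`:

```python
import numpy as np
import pandas as pd

def groups_from_columns(df):
    """One list of values per column, in column order (what plt.boxplot expects)."""
    return [df[col].tolist() for col in df.columns]

def positions_for_scatter(df):
    """x position 1..n_cols for every value, matching the column-by-column stacking."""
    return np.repeat(np.arange(1, df.shape[1] + 1), df.shape[0]).tolist()

def groups_original(df):
    """The question's construction, kept for comparison."""
    position = [1 + i // df.shape[0] for i in range(df.size)]
    values = pd.concat([df[col] for col in df.columns]).tolist()
    groups = [[] for _ in range(max(position))]
    for pos, value in zip(position, values):
        groups[pos - 1].append(value)
    return position, groups

if __name__ == "__main__":
    demo = pd.DataFrame({"P1": [7.6, 9.7, 1.0], "P2": [1.0, 2.7, 6.0]})
    pos, grp = groups_original(demo)
    print(grp == groups_from_columns(demo))    # True
    print(pos == positions_for_scatter(demo))  # True
```

With these helpers, `plt.boxplot(groups_from_columns(df))` should draw the same boxes as the original loop.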
A typical graph would be like this:
I have 4 different datasets, and I would like to present them in one graph where each position on the x axis has 4 box plots above it, one per dataset. How would I modify my code to do that?
Here is the sample dataset:
https://github.com/aebk2015/multipleboxplot.git
,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,Class
1,7.6,1.0,1.0,1.0,1.0,6.0,49.0,1.0,1.0,40.0,L
2,9.7,2.7,5.6,1.0,1.0,1.0,34.0,1.0,1.0,1.0,L
3,1.0,6.0,1.0,1.0,1.0,3.0,39.0,1.0,28.0,1.0,L
4,8.0,25.5,1.0,1.0,1.0,1.0,24.0,1.0,1.0,1.0,L
5,1.0,29.0,1.0,1.0,1.0,1.0,38.0,29.0,20.0,1.0,L
6,4.0,34.0,1.0,1.0,1.0,39.0,14.0,1.0,12.0,1.0,L
7,1.0,17.0,1.0,1.0,1.0,1.0,20.8,1.0,14.6,1.0,L
8,1.0,1.0,1.0,1.0,1.0,1.0,19.0,17.5,1.0,1.0,L
9,1.0,30.0,1.0,1.0,1.0,3.0,23.0,1.0,1.0,1.0,L
10,1.0,5.0,25.0,1.0,1.0,17.0,6.3,1.0,17.0,1.0,L
1,11.8,19.0,1.0,1.0,1.0,11.3,2.0,4.0,5.0,1.0,C
2,12.0,17.0,20.0,9.0,1.0,23.0,4.0,7.0,1.0,1.0,C
3,14.0,30.0,8.0,1.0,11.0,24.0,38.0,1.0,3.5,1.0,C
4,10.5,10.4,11.5,20.5,1.0,22.0,3.0,15.0,5.6,3.7,C
5,1.0,13.5,8.0,6.6,1.0,37.0,1.0,1.0,1.0,4.0,C
6,12.4,22.0,1.0,1.0,1.0,29.0,17.0,11.0,1.0,1.0,C
7,1.0,43.0,1.0,1.0,1.0,10.0,18.0,8.6,1.0,1.0,C
8,15.0,12.0,1.0,35.0,1.0,1.0,1.0,10.0,3.0,1.0,C
9,1.0,24.0,8.0,1.0,1.0,1.0,4.0,1.0,1.0,1.0,C
10,4.6,2.0,7.4,1.0,1.0,22.0,5.6,1.0,25.0,1.0,C
1,1.0,39.0,11.0,13.0,1.0,1.0,28.0,7.0,1.0,7.0,W
2,8.0,52.0,22.0,10.0,1.0,1.0,33.0,13.0,1.0,4.8,W
3,1.0,28.0,1.0,10.0,1.0,1.0,24.0,3.0,1.0,4.0,W
4,8.8,11.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,W
5,1.0,42.0,1.0,1.0,1.0,69.0,1.0,31.0,1.0,49.0,W
6,9.0,36.0,11.0,14.0,24.0,1.0,8.0,1.0,1.0,15.8,W
7,13.0,33.0,12.7,8.7,1.0,1.0,7.8,38.0,1.0,1.0,W
8,1.0,36.0,12.0,1.0,1.0,12.0,1.0,1.0,1.0,1.0,W
9,1.0,10.0,12.0,1.0,1.0,1.0,64.0,13.0,1.0,14.0,W
10,8.0,31.0,19.0,1.0,24.0,1.0,48.0,1.0,1.0,1.0,W
1,1.0,9.7,6.8,53.0,1.0,57.0,1.0,9.5,1.0,1.0,B
2,5.8,16.3,1.0,10.8,1.0,58.0,1.0,1.0,1.0,1.0,B
3,1.0,38.0,17.0,34.0,1.0,55.0,1.0,8.0,1.0,1.0,B
4,1.0,42.0,1.0,26.0,1.0,1.0,65.0,44.0,1.0,1.0,B
5,41.0,43.0,16.0,9.7,1.0,36.0,61.0,1.0,1.0,1.0,B
6,47.0,20.0,1.0,1.0,1.0,1.0,28.0,7.7,1.0,1.0,B
7,22.0,92.0,1.0,1.0,1.0,20.0,15.0,1.0,1.0,1.0,B
8,31.0,72.0,1.0,1.0,1.0,1.0,20.0,1.0,1.0,1.0,B
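One way to get four boxes above each position is to call `boxplot` once per class with shifted `positions`. A minimal sketch, assuming the single-file layout of the sample data (value columns P1..P10 plus a `Class` column); the helper name `grouped_boxplot`, the box width and the offsets are illustrative choices, and only a few rows and the first three position columns of the sample are embedded to keep it short:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line when plotting interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def grouped_boxplot(df, class_col="Class", width=0.18):
    """Draw one box per class at every value column, side by side."""
    value_cols = [c for c in df.columns if c != class_col]
    classes = list(pd.unique(df[class_col]))
    n = len(classes)
    base = np.arange(1, len(value_cols) + 1)  # x positions 1..n_cols
    fig, ax = plt.subplots(figsize=(12, 5))
    handles = []
    for k, cls in enumerate(classes):
        sub = df[df[class_col] == cls]
        # centre the n boxes of each position around the integer tick
        pos = base + (k - (n - 1) / 2) * width
        data = [sub[c].dropna().to_numpy() for c in value_cols]
        b = ax.boxplot(data, positions=pos, widths=width * 0.9,
                       patch_artist=True, manage_ticks=False)
        for box in b["boxes"]:
            box.set_facecolor(f"C{k}")
        for median in b["medians"]:
            median.set(color="r", linewidth=2)
        # overlay the raw points, as in the original single-dataset plot
        for p, values in zip(pos, data):
            ax.scatter(np.full(len(values), p), values, s=8, color="k", zorder=3)
        handles.append(b["boxes"][0])
    ax.set_xticks(base)
    ax.set_xticklabels(value_cols)
    ax.set_xlabel("Position")
    ax.legend(handles, classes, title=class_col)
    return fig, ax

SAMPLE = """,P1,P2,P3,Class
1,7.6,1.0,1.0,L
2,9.7,2.7,5.6,L
3,1.0,6.0,1.0,L
1,11.8,19.0,1.0,C
2,12.0,17.0,20.0,C
3,14.0,30.0,8.0,C
1,1.0,39.0,11.0,W
2,8.0,52.0,22.0,W
3,1.0,28.0,1.0,W
1,1.0,9.7,6.8,B
2,5.8,16.3,1.0,B
3,1.0,38.0,17.0,B
"""

if __name__ == "__main__":
    df = pd.read_csv(io.StringIO(SAMPLE), index_col=0)
    fig, ax = grouped_boxplot(df)
    print(len(ax.patches), "boxes drawn")  # 4 classes x 3 positions = 12
```

With the full file, `pd.read_csv(path, index_col=0)` can be passed straight to `grouped_boxplot`. If the four datasets live in separate files, tagging each one with `.assign(Class=...)` before `pd.concat` gives the same layout. The seaborn route suggested by the duplicate links should also work: reshape with `df.melt(id_vars="Class", var_name="Position")` and call `sns.boxplot(..., x="Position", y="value", hue="Class")`.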
Related
Plotting a DataFrame using matplotlib [duplicate]
This question already has answers here: Line plot with data points in pandas (2 answers) Closed 1 year ago.
Hi, I am trying to get a line plot for a dataframe:
i = [0.01,0.02,0.03,....,0.98,0.99,1.00]
values= [76,98,22,.....,32,98,100]
but the dataframe also has an index from 0, 1, ..., 99, and when I plot, the index line also gets plotted. How do I avoid plotting the index? I used the following code:
plt.plot(df, color='blue', label='values')
plt.title('values for corresponding i')
plt.legend(loc='upper right')
plt.xlabel("i")
plt.ylabel("values")
plt.show()
You could use plot.line directly on the pandas DataFrame; it's a wrapper around matplotlib and makes this easier. Example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate random DataFrame
i = np.arange(0, 1, 0.01)
values = np.random.randint(1, 100, 100)
df = pd.DataFrame({"i": i, "values": values})
# Plot "values" against the "i" column instead of against the index
df.plot.line(x="i", y="values", color="blue", label="values")
plt.title("values for corresponding i")
plt.legend(loc="upper right")
plt.xlabel("i")
plt.ylabel("values")
plt.show()
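The extra line appears because `plt.plot(df)` draws every column of the DataFrame against its index, so the `i` column becomes a line of its own. Passing the two columns explicitly keeps plain matplotlib. A minimal sketch, reusing the column names `i` and `values` from the question with randomly generated data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"i": np.arange(1, 101) / 100,
                   "values": rng.integers(1, 100, 100)})

fig, ax = plt.subplots()
# x = the "i" column, y = the "values" column; the index is not used at all
(line,) = ax.plot(df["i"], df["values"], color="blue", label="values")
ax.set_title("values for corresponding i")
ax.set_xlabel("i")
ax.set_ylabel("values")
ax.legend(loc="upper right")
```

Only one line ends up on the axes, with `i` on the x axis.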
Avoiding overlapping plots in seaborn bar plot
I have the following code where I am trying to plot a bar plot in seaborn. (This is sample data, and both x and y variables are continuous.)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame()
df['xvar'] = xvar
df['yvar'] = yvar
df['year'] = year
sns.set_style('whitegrid')
fig, ax = plt.subplots()
fig.set_size_inches(10,5)
sns.barplot(data=df,x='xvar',y='yvar',hue='year',lw=0,dodge=False)
It results in a plot where the bars overlap. Two questions here:
First, I want to plot the two bars side by side instead of overlapped the way they are now.
Second, the original data has a lot of x-labels. Is there a way to set the xticks to a specific frequency? For instance, in the chart above I only want to see 1, 3 and 6 as x-labels.
Note: If I set dodge = True, the bars become very thin with the original data.
For the first question, get the patches in the bar chart and halve the width of the target patches, shifting one of them along the x-axis so the two bars sit side by side. The second question can be handled by passing a list of tick labels, built with slicing or written manually in the category order, with blanks for the labels to hide.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
xvar = [1,2,2,3,4,5,6,8]
yvar = [3,6,-4,4,2,0.5,-1,0.5]
year = [2010,2011,2012,2010,2011,2012,2010,2011]
df = pd.DataFrame({'xvar':xvar,'yvar':yvar,'year':year})
sns.set_style('whitegrid')  # set the style before creating the axes so it applies to them
fig, ax = plt.subplots(figsize=(10,5))
sns.barplot(data=df, x='xvar', y='yvar', hue='year', lw=0, dodge=False, ax=ax)
# patches are ordered hue level by hue level; here indices 8 and 15 are the
# 2011 and 2012 bars at xvar=2 (this ordering may differ between seaborn versions)
for idx, patch in enumerate(ax.patches):
    current_width = patch.get_width()
    current_pos = patch.get_x()
    if idx == 8 or idx == 15:
        patch.set_width(current_width/2)
        if idx == 15:
            patch.set_x(current_pos+(current_width/2))
# one label per category (1, 2, 3, 4, 5, 6, 8); blanks hide the ticks we don't want
ax.set_xticklabels([1,'',3,'','',6,''])
plt.show()
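Hard-coding patch indices ties the fix to this exact data. Since both variables are continuous, an alternative is to skip the categorical axis and draw the bars with matplotlib at their numeric x values, splitting each slot only among the bars that actually share that x value. With `dodge=True`, by contrast, every slot is divided among all hue levels, which is why the bars get thin. A minimal sketch, assuming the question's `xvar`/`yvar`/`year` columns; `side_by_side_bars` is a hypothetical helper name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

def side_by_side_bars(df, x="xvar", y="yvar", hue="year", total_width=0.8, ticks=None):
    """Bars at numeric x positions; bars sharing an x value sit side by side."""
    fig, ax = plt.subplots(figsize=(10, 5))
    levels = sorted(df[hue].unique())
    colors = {level: f"C{k}" for k, level in enumerate(levels)}
    for xval, grp in df.groupby(x):
        w = total_width / len(grp)  # split the slot among the bars at this x only
        for k, (yval, level) in enumerate(zip(grp[y], grp[hue])):
            left = xval - total_width / 2 + k * w  # keep the group centred on xval
            ax.bar(left, yval, width=w, align="edge", color=colors[level], linewidth=0)
    if ticks is not None:
        ax.set_xticks(ticks)  # works directly because the x axis is numeric
    ax.legend(handles=[Patch(color=c, label=str(level)) for level, c in colors.items()],
              title=hue)
    return fig, ax

df = pd.DataFrame({"xvar": [1, 2, 2, 3, 4, 5, 6, 8],
                   "yvar": [3, 6, -4, 4, 2, 0.5, -1, 0.5],
                   "year": [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011]})
fig, ax = side_by_side_bars(df, ticks=[1, 3, 6])
```

For a regular tick frequency instead of a hand-picked list, something like `ticks=sorted(df["xvar"].unique())[::2]` can be passed.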
How can I get actual values from pandas df in sns.distplot displayed on x axis [duplicate]
This question already has an answer here: Prevent scientific notation (1 answer) Closed 1 year ago.
I'm trying to create a histogram from data I got as homework. When I plot it, the values on the x axis (0.0-1.0) are different from those in the actual dataset (20,000 - 1,000,000). How do I get the range of actual values from my data displayed on the x axis of the histogram instead? My code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('okcupid_profiles.csv')
df = df[df['income'] != -1]
income_histogram = sns.distplot(df['income'], bins=40)
income_histogram
Thanks
The values displayed on the x-axis are the same as in the dataset. The 1e6 in the bottom right corner is a multiplier for all tick labels, so a tick labelled 0.1 means 0.1 * 1e6 = 100,000.
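To show full numbers on the axis instead of the `1e6` multiplier, the tick formatter can be replaced. A minimal sketch with matplotlib's `StrMethodFormatter`, using synthetic income values (not the OkCupid data) and `ax.hist` in place of `sns.distplot`, which newer seaborn releases deprecate in favour of `histplot`/`displot`; the formatter works the same way on the `Axes` a seaborn function returns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import StrMethodFormatter

# synthetic incomes between 20,000 and 1,000,000 (illustrative only)
rng = np.random.default_rng(0)
income = rng.uniform(20_000, 1_000_000, size=500)

fig, ax = plt.subplots()
ax.hist(income, bins=40)
# format every x tick as a plain integer with thousands separators
fmt = StrMethodFormatter("{x:,.0f}")
ax.xaxis.set_major_formatter(fmt)
fig.canvas.draw()  # tick labels are computed at draw time
labels = [t.get_text() for t in ax.get_xticklabels()]
```

If thousands separators are not needed, `ax.ticklabel_format(style="plain", axis="x")` also turns off the scientific multiplier.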
How to color a single bar based off name in a Seaborn barplot Python [duplicate]
This question already has answers here: How to change the color of a single bar if condition is True (2 answers) Closed 2 years ago.
I have the following dataframe producing the following plot:
# Import pandas library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# initialize data
data = [['tom', 10,1,'a'], ['matt', 15,5,'a'], ['Nick', 14,1,'a']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Attempts','Score','Category'])
print(df.head(3))
   Name  Attempts  Score Category
0   tom        10      1        a
1  matt        15      5        a
2  Nick        14      1        a
# Initialize the matplotlib figure
sns.set()
sns.set_context("paper")
sns.axes_style({'axes.spines.left': True})
f, ax = plt.subplots(nrows=3,figsize=(8.27,11.7))
# Plot
sns.set_color_codes("muted")
sns.barplot(x="Attempts", y='Name', data=df, label="Total", color="b", ax=ax[0])
sns.scatterplot(x='Score',y='Name',data=df,zorder=10,color='k',edgecolor='k',ax=ax[0],legend=False)
ax[0].set_title("title")
plt.show()
I want to highlight just the bar for Nick in a different color (e.g. red). Is there an easy way to do this?
In the barplot method, you can use the palette parameter instead of color, and build the color list with a loop that checks which bar you want to change:
sns.barplot(x="Attempts", y='Name', data=df, label="Total", palette=["b" if x != 'Nick' else 'r' for x in df.Name], ax=ax[0])
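The palette list works because the bars are drawn in the order the names appear in `df`. A minimal sketch of the same idea in plain matplotlib, reusing the question's data; note that recent seaborn releases (0.13 and later) warn when `palette` is passed without `hue`, and there assigning the `y` variable (`hue="Name"`) together with `legend=False` is the suggested replacement:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

data = [["tom", 10, 1, "a"], ["matt", 15, 5, "a"], ["Nick", 14, 1, "a"]]
df = pd.DataFrame(data, columns=["Name", "Attempts", "Score", "Category"])

highlight = "Nick"  # the bar to colour differently
colors = ["r" if name == highlight else "b" for name in df["Name"]]

fig, ax = plt.subplots()
bars = ax.barh(df["Name"], df["Attempts"], color=colors)  # one colour per bar, in row order
ax.scatter(df["Score"], df["Name"], color="k", zorder=10)
ax.invert_yaxis()  # first row at the top, mirroring seaborn's horizontal bar order
```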
Plot Multicolored Time Series Plot based on Conditional in Python [duplicate]
This question already has answers here: How to plot multi-color line if x-axis is date time index of pandas (2 answers) Closed 5 years ago.
I have a pandas financial time series DataFrame with two columns and one datetime index.
            TOTAL.PAPRPNT.M  Label
1973-03-01        25504.000      3
1973-04-01        25662.000      3
1973-05-01        25763.000      0
1973-06-01        25996.000      0
1973-07-01        26023.000      1
1973-08-01        26005.000      1
1973-09-01        26037.000      2
1973-10-01        26124.000      2
1973-11-01        26193.000      3
1973-12-01        26383.000      3
As you can see, each data point corresponds to a 'Label'. This label should essentially classify whether the line from the previous point to the next point carries certain characteristics (different types of stock graph changes), so each of these segments should use a separate color. This question is related to Plot Multicolored line based on conditional in python, but the 'groupby' part went over my head, and that scheme is bicolored rather than multicolored (I have four labels). I want to create a multicolored plot of the graph based on the labels associated with each entry in the dataframe.
Here's an example of what I think you're trying to do. It's based on the matplotlib documentation mentioned in the comments and uses randomly generated data. Just map the colormap boundaries to the discrete values given by the number of classes.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm
import pandas as pd
num_classes = 4
ts = range(10)
df = pd.DataFrame(data={'TOTAL': np.random.rand(len(ts)), 'Label': np.random.randint(0, num_classes, len(ts))}, index=ts)
print(df)
# one colour per class; BoundaryNorm maps label k to colour k
cmap = ListedColormap(['r', 'g', 'b', 'y'])
norm = BoundaryNorm(range(num_classes+1), cmap.N)
# build segments [(x0, y0), (x1, y1)], [(x1, y1), (x2, y2)], ...
points = np.array([df.index, df['TOTAL']]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
lc = LineCollection(segments, cmap=cmap, norm=norm)
# segment i (from point i to point i+1) is coloured by the label of point i
lc.set_array(df['Label'].to_numpy()[:-1])
fig1 = plt.figure()
plt.gca().add_collection(lc)
plt.xlim(df.index.min(), df.index.max())
plt.ylim(-1.1, 1.1)
plt.show()
Each line segment is coloured according to the class label given in df['Label'].
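The question's index holds dates, which `LineCollection` cannot take directly. A minimal sketch, assuming the column names from the question (`TOTAL.PAPRPNT.M`, `Label`) and the ten rows shown there: convert the dates with `matplotlib.dates.date2num`, then build the segments as in the answer. The helper name `label_colored_line` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection
from matplotlib.colors import BoundaryNorm, ListedColormap

def label_colored_line(ax, series, labels, colors=("r", "g", "b", "y")):
    """Draw `series` (datetime index) as segments coloured by integer `labels`."""
    x = mdates.date2num(series.index.to_pydatetime())  # dates -> matplotlib floats
    points = np.column_stack([x, series.to_numpy()]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    cmap = ListedColormap(list(colors))
    norm = BoundaryNorm(range(len(colors) + 1), cmap.N)  # label k -> colour k
    lc = LineCollection(segments, cmap=cmap, norm=norm)
    lc.set_array(np.asarray(labels)[:-1])  # segment i takes the label of point i
    ax.add_collection(lc)
    ax.xaxis_date()  # show the float x values as dates
    ax.autoscale_view()
    return lc

idx = pd.date_range("1973-03-01", periods=10, freq="MS")
df = pd.DataFrame({"TOTAL.PAPRPNT.M": [25504.0, 25662.0, 25763.0, 25996.0, 26023.0,
                                       26005.0, 26037.0, 26124.0, 26193.0, 26383.0],
                   "Label": [3, 3, 0, 0, 1, 1, 2, 2, 3, 3]}, index=idx)
fig, ax = plt.subplots()
lc = label_colored_line(ax, df["TOTAL.PAPRPNT.M"], df["Label"])
```

If a label is meant to describe the segment ending at its point rather than the one starting there, `np.asarray(labels)[1:]` would be the slice to use instead.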