This question already has answers here:
Plotting two columns of dataFrame in seaborn
(1 answer)
Seaborn multiple barplots
(2 answers)
Closed last month.
sample data:
list1 = ['C','C1,C2','A9','GV5','A6','A3']
arr1 = np.random.default_rng().uniform(low=5,high=10,size=[6,3])
df = pd.DataFrame(arr1,index = list1, columns=["A","B","C"])
I can make a barplot of a single column, but I'm not sure how to display several columns at once: I want the values of columns A, B, and C shown side by side for each index value.
I tried adding more into the y designation, but it was fruitless.
ax = sns.barplot(data = df, x = df.index, y='A') ##chart of only one category
ax = sns.barplot(data = df, x = df.index, y=('A','B','C')) ##doesn't work
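One common way to get grouped bars is to reshape the frame to long format and let `hue` split the bars. The following is only a sketch under that assumption; the names `long_df`, `label`, `column`, and `value` are made up for illustration and do not come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Sample data from the question
list1 = ['C', 'C1,C2', 'A9', 'GV5', 'A6', 'A3']
arr1 = np.random.default_rng().uniform(low=5, high=10, size=[6, 3])
df = pd.DataFrame(arr1, index=list1, columns=["A", "B", "C"])

# Reshape to long format: one row per (index label, column name, value).
# "label", "column" and "value" are hypothetical names chosen for this sketch.
long_df = (
    df.rename_axis("label")
      .reset_index()
      .melt(id_vars="label", var_name="column", value_name="value")
)

# hue="column" draws the A, B and C bars side by side for each label
ax = sns.barplot(data=long_df, x="label", y="value", hue="column")
plt.savefig("grouped_bars.png")
```

If seaborn is not required, `df.plot.bar()` from pandas should give a similar grouped bar chart directly from the wide frame.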
This question already has answers here:
Remove prefix (or suffix) substring from column headers in pandas
(7 answers)
How to convert column names of a DataFrame from string to integers
(1 answer)
Rotate pandas DataFrame 90 degrees
(1 answer)
matplotlib large set of colors for plots
(1 answer)
How to plot multiple pandas columns
(3 answers)
Closed 7 months ago.
I have a Pandas DataFrame of measurements:
,Fp076,Fp084,Fp092,Fp099,Fp107,Fp115,Fp122,Fp130,Fp143,Fp151,Fp158,Fp166,Fp174,Fp181,Fp189,Fp197,Fp204,Fp212,Fp220,Fp227
0,0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147
1,-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467
2,0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189
3,2.6838,2.394591,2.493416,0.874906,2.113343,1.812258,1.667047,1.779347,1.515663,1.620196,1.539494,1.63528,1.555373,1.471318,1.610067,1.507087,1.467174,1.458346,1.681998,1.14625
4,0.368415,0.435004,0.155035,0.161064,0.180133,0.202117,0.142981,0.138321,0.122557,0.099213,0.098213,0.062174,0.123664,0.2051,0.167415,0.185133,0.127677,0.037875,0.156252,0.015579
5,0.213577,0.187244,0.274151,0.173572,0.296122,0.308341,0.164578,0.159559,0.318383,0.181329,0.260223,0.257395,0.241779,0.292731,0.244476,0.187523,0.247331,0.293338,0.323894,0.179478
6,0.096093,0.140454,0.067185,6.441058,0.016797,0.141757,0.181792,0.13692,0.204091,0.180182,0.149626,0.220342,0.179286,0.276316,0.104531,0.20343,0.045161,-0.004546,0.045833,0.193849
7,0.286467,0.086673,-0.106538,-0.261802,0.16964,0.182858,0.062774,0.20471,0.040105,0.086975,0.211068,0.182423,0.098721,0.077085,0.102986,0.129935,0.130571,0.176024,0.154079,0.102391
8,0.480631,0.714554,0.858241,0.746666,0.555411,0.452689,0.337912,0.333942,0.269359,0.221312,0.09818,0.226218,0.287361,0.209858,0.222951,0.207584,0.258397,0.026713,0.162048,0.149924
9,1.055405,0.638777,0.468793,0.41544,0.559187,0.471218,0.493805,0.544716,0.412903,0.412182,0.51041,0.383991,0.351397,0.383201,0.368308,0.237954,0.330242,0.262648,0.425204,0.434928
10,1.116658,0.737544,0.854376,-0.004434,0.419419,0.35921,0.377095,0.273815,0.258913,0.290614,0.271843,0.321572,0.234764,0.298931,0.206039,0.192746,0.200727,0.132419,0.229914,0.159857
11,-0.004305,0.052289,0.275035,-0.849414,0.104146,0.185819,0.128376,0.136433,0.091787,0.149753,0.107246,0.081407,0.118816,0.117434,0.169153,0.108273,0.205751,0.145238,0.153086,0.114278
12,0.836223,0.323901,0.269564,0.364082,0.343695,0.386785,0.24881,0.307267,0.222634,0.214189,0.12167,0.251107,0.134083,0.284545,0.175479,0.221877,0.184749,0.225089,0.205388,0.214972
where each row is the flux measurements at the frequencies in the header (76, 84, 92, 99... MHz). I'm trying to plot a line graph of the flux measurements for a row. Since the frequencies in the header are not linear, I've tried this:
f = np.array([76,84,92,99,107,115,122,130,143,151,158,166,174,181,189,197,204,212,220,227])
y1 = [0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147]
y2 = [-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467]
y3 = [0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189]
fig, ax = plt.subplots()
ax.scatter(f, y1, label = r'$\alpha = -0.37$')
ax.plot(f, y1)
ax.scatter(f, y2, label = r'$\alpha = NaN$')
ax.plot(f, y2)
ax.scatter(f, y3, label = r'$\alpha = -0.75$')
ax.plot(f, y3)
ax.set_xlabel('Frequency (MHz)')
ax.set_ylabel('Flux (Jy/beam)')
ax.grid(which = 'both', axis = 'both')
which is just copy-pasting the first three rows of data, to produce:
That's basically what I want, but what's a better way to do it?
There are many ways to solve this problem, but the simplest way (that I can think of) is to transpose your dataframe and then use seaborn to plot all the columns.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# convert your sample data; data_ is assumed to hold the CSV text from the question as one string
data = [[e for e in row.split(',') if e] for row in data_.split("\n") if row]
columns = data[0]
# create the `x` axis
columns = [int(col.replace('Fp','')) for col in columns]
columns = ['index'] + columns
data = data[1:]
df = pd.DataFrame(data=data, columns=columns)
df = df.drop(columns=['index'])
df = df.astype('float')
This is how the dataframe looks without transforming the headers with int(col.replace('Fp','')).
If you already have the dataframe, you can transform its columns as I did above using
df.columns = [int(col.replace('Fp','')) for col in df.columns]
Once this is done, you can transpose the data so that each original row becomes a column
# transpose the data: the frequencies become the index
df_ = df.T
# plot your data
plt.figure(figsize=(15,8))
sns.lineplot(data=df_)
plt.title('Example of timeseries plot')
plt.xlabel('Frequency(MHz)')
plt.ylabel('Flux (Jy/beam)')
the output is
You can adjust the plot to your liking, but this would be the simplest way (tip: try to leverage the seaborn or pandas plotting methods as much as possible for this kind of aggregated plot).
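For comparison, here is a minimal sketch that stays with plain matplotlib and loops over the rows instead of copy-pasting them. It uses only the header and the first three rows from the question; `csv_text` and the "row N" legend labels are illustrative choices, not part of the original post.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Header and first three rows copied from the question
csv_text = """,Fp076,Fp084,Fp092,Fp099,Fp107,Fp115,Fp122,Fp130,Fp143,Fp151,Fp158,Fp166,Fp174,Fp181,Fp189,Fp197,Fp204,Fp212,Fp220,Fp227
0,0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147
1,-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467
2,0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Turn "Fp076" into the integer frequency 76 so the x axis is numeric
df.columns = df.columns.str.replace("Fp", "", regex=False).astype(int)

fig, ax = plt.subplots()
for row_label, row in df.iterrows():
    # One line with markers per DataFrame row
    ax.plot(row.index, row.values, marker="o", label=f"row {row_label}")
ax.set_xlabel("Frequency (MHz)")
ax.set_ylabel("Flux (Jy/beam)")
ax.grid(which="both", axis="both")
ax.legend()
fig.savefig("flux_rows.png")
```

Because the column labels are real numbers after the conversion, the unevenly spaced frequencies are placed at their true positions on the x axis.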
This question already has answers here:
Import multiple CSV files into pandas and concatenate into one DataFrame
(20 answers)
dataframe to long format
(2 answers)
seaborn boxplot and stripplot points aren't aligned over the x-axis by hue
(1 answer)
Closed 6 months ago.
For a given dataset I am plotting a box plot of object size at 10 different positions, as below:
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import numpy as np
import pandas as pd
import matplotlib as mpl
font_prop = font_manager.FontProperties(size=18)
def plot(path, name=""):
    df = pd.read_csv(path, index_col=0)
    df = df.dropna()
    Position = [1 + i // df.shape[0] for i in range(df.size)]
    df_n = [df[col] for col in df.columns]
    df_t = pd.concat(df_n).tolist()
    groups = [[] for i in range(max(Position))]
    [groups[Position[i] - 1].append(df_t[i]) for i in range(len(df_t))]
    plt.figure(figsize=(12, 5))
    plt.scatter(Position, df_t, color='g')
    b = plt.boxplot(groups, patch_artist=False)
    for median in b['medians']:
        median.set(color='r', linewidth=2)
A typical graph would be like this:
I have 4 different datasets and I would like to present a graph where, at each position on the x axis, there are 4 box plots, one per dataset. How would I modify my code to do that?
Here is the sample dataset:
https://github.com/aebk2015/multipleboxplot.git
,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,Class
1,7.6,1.0,1.0,1.0,1.0,6.0,49.0,1.0,1.0,40.0,L
2,9.7,2.7,5.6,1.0,1.0,1.0,34.0,1.0,1.0,1.0,L
3,1.0,6.0,1.0,1.0,1.0,3.0,39.0,1.0,28.0,1.0,L
4,8.0,25.5,1.0,1.0,1.0,1.0,24.0,1.0,1.0,1.0,L
5,1.0,29.0,1.0,1.0,1.0,1.0,38.0,29.0,20.0,1.0,L
6,4.0,34.0,1.0,1.0,1.0,39.0,14.0,1.0,12.0,1.0,L
7,1.0,17.0,1.0,1.0,1.0,1.0,20.8,1.0,14.6,1.0,L
8,1.0,1.0,1.0,1.0,1.0,1.0,19.0,17.5,1.0,1.0,L
9,1.0,30.0,1.0,1.0,1.0,3.0,23.0,1.0,1.0,1.0,L
10,1.0,5.0,25.0,1.0,1.0,17.0,6.3,1.0,17.0,1.0,L
1,11.8,19.0,1.0,1.0,1.0,11.3,2.0,4.0,5.0,1.0,C
2,12.0,17.0,20.0,9.0,1.0,23.0,4.0,7.0,1.0,1.0,C
3,14.0,30.0,8.0,1.0,11.0,24.0,38.0,1.0,3.5,1.0,C
4,10.5,10.4,11.5,20.5,1.0,22.0,3.0,15.0,5.6,3.7,C
5,1.0,13.5,8.0,6.6,1.0,37.0,1.0,1.0,1.0,4.0,C
6,12.4,22.0,1.0,1.0,1.0,29.0,17.0,11.0,1.0,1.0,C
7,1.0,43.0,1.0,1.0,1.0,10.0,18.0,8.6,1.0,1.0,C
8,15.0,12.0,1.0,35.0,1.0,1.0,1.0,10.0,3.0,1.0,C
9,1.0,24.0,8.0,1.0,1.0,1.0,4.0,1.0,1.0,1.0,C
10,4.6,2.0,7.4,1.0,1.0,22.0,5.6,1.0,25.0,1.0,C
1,1.0,39.0,11.0,13.0,1.0,1.0,28.0,7.0,1.0,7.0,W
2,8.0,52.0,22.0,10.0,1.0,1.0,33.0,13.0,1.0,4.8,W
3,1.0,28.0,1.0,10.0,1.0,1.0,24.0,3.0,1.0,4.0,W
4,8.8,11.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,W
5,1.0,42.0,1.0,1.0,1.0,69.0,1.0,31.0,1.0,49.0,W
6,9.0,36.0,11.0,14.0,24.0,1.0,8.0,1.0,1.0,15.8,W
7,13.0,33.0,12.7,8.7,1.0,1.0,7.8,38.0,1.0,1.0,W
8,1.0,36.0,12.0,1.0,1.0,12.0,1.0,1.0,1.0,1.0,W
9,1.0,10.0,12.0,1.0,1.0,1.0,64.0,13.0,1.0,14.0,W
10,8.0,31.0,19.0,1.0,24.0,1.0,48.0,1.0,1.0,1.0,W
1,1.0,9.7,6.8,53.0,1.0,57.0,1.0,9.5,1.0,1.0,B
2,5.8,16.3,1.0,10.8,1.0,58.0,1.0,1.0,1.0,1.0,B
3,1.0,38.0,17.0,34.0,1.0,55.0,1.0,8.0,1.0,1.0,B
4,1.0,42.0,1.0,26.0,1.0,1.0,65.0,44.0,1.0,1.0,B
5,41.0,43.0,16.0,9.7,1.0,36.0,61.0,1.0,1.0,1.0,B
6,47.0,20.0,1.0,1.0,1.0,1.0,28.0,7.7,1.0,1.0,B
7,22.0,92.0,1.0,1.0,1.0,20.0,15.0,1.0,1.0,1.0,B
8,31.0,72.0,1.0,1.0,1.0,1.0,20.0,1.0,1.0,1.0,B
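One possible approach, sketched here under the assumption that the four datasets are combined into one frame with a `Class` column as in the sample above, is to melt the data to long format and let seaborn group the boxes by `hue`. For brevity the sketch uses only the first three rows of each class from the sample; `long_df`, `Position`, and `Size` are names chosen for illustration.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Subset of the sample data: first three rows of each class
csv_text = """,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,Class
1,7.6,1.0,1.0,1.0,1.0,6.0,49.0,1.0,1.0,40.0,L
2,9.7,2.7,5.6,1.0,1.0,1.0,34.0,1.0,1.0,1.0,L
3,1.0,6.0,1.0,1.0,1.0,3.0,39.0,1.0,28.0,1.0,L
1,11.8,19.0,1.0,1.0,1.0,11.3,2.0,4.0,5.0,1.0,C
2,12.0,17.0,20.0,9.0,1.0,23.0,4.0,7.0,1.0,1.0,C
3,14.0,30.0,8.0,1.0,11.0,24.0,38.0,1.0,3.5,1.0,C
1,1.0,39.0,11.0,13.0,1.0,1.0,28.0,7.0,1.0,7.0,W
2,8.0,52.0,22.0,10.0,1.0,1.0,33.0,13.0,1.0,4.8,W
3,1.0,28.0,1.0,10.0,1.0,1.0,24.0,3.0,1.0,4.0,W
1,1.0,9.7,6.8,53.0,1.0,57.0,1.0,9.5,1.0,1.0,B
2,5.8,16.3,1.0,10.8,1.0,58.0,1.0,1.0,1.0,1.0,B
3,1.0,38.0,17.0,34.0,1.0,55.0,1.0,8.0,1.0,1.0,B
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Long format: one row per (Class, Position, Size) observation.
# "Position" and "Size" are hypothetical column names for this sketch.
long_df = df.melt(id_vars="Class", var_name="Position", value_name="Size")

plt.figure(figsize=(12, 5))
# hue="Class" places one box per class next to each other at every position
ax = sns.boxplot(data=long_df, x="Position", y="Size", hue="Class")
plt.savefig("grouped_boxplot.png")
```

With the full files, the four CSVs could be read separately and joined with `pd.concat` before the `melt` step, provided each frame carries its own `Class` label.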
This question already has an answer here:
Prevent scientific notation
(1 answer)
Closed 1 year ago.
I'm trying to create a histogram from data I got as homework.
When I plot it, the values on the x axis (0.0-1.0) are different from those in the actual dataset (20,000 - 1,000,000).
How do I get the range of actual values from my data displayed on the x axis of the histogram instead?
My code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('okcupid_profiles.csv')
df = df[df['income'] != -1]
income_histogram = sns.distplot(df['income'], bins=40)
income_histogram
the histogram I've created
Thanks
The values displayed on the x-axis are the same as those in the dataset. In the bottom right corner of the plot there is a 1e6 multiplier, which means every tick label has to be multiplied by 1e6:
0.1 * 1e6 == 100,000
This question already has answers here:
How to change the color of a single bar if condition is True
(2 answers)
Closed 2 years ago.
I have the following dataframe producing the following plot:
# Import pandas library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# initialize data
data = [['tom', 10,1,'a'], ['matt', 15,5,'a'], ['Nick', 14,1,'a']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Attempts','Score','Category'])
print(df.head(3))
Name Attempts Score Category
0 tom 10 1 a
1 matt 15 5 a
2 Nick 14 1 a
# Initialize the matplotlib figure
sns.set()
sns.set_context("paper")
sns.axes_style({'axes.spines.left': True})
f, ax = plt.subplots(nrows=3,figsize=(8.27,11.7))
# Plot
sns.set_color_codes("muted")
sns.barplot(x="Attempts", y='Name', data=df,
label="Total", color="b", ax=ax[0])
sns.scatterplot(x='Score',y='Name',data=df,zorder=10,color='k',edgecolor='k',ax=ax[0],legend=False)
ax[0].set_title("title")
plt.show()
I want to highlight just the bar Nick in a different color (eg red). Is there an easy way to do this?
In barplot, you can pass palette instead of color and build the list of colors with a comprehension that checks which bar you want to change.
sns.barplot(x="Attempts", y='Name', data=df,
label="Total", palette=["b" if x!='Nick' else 'r' for x in df.Name], ax=ax[0])
and you get a plot in which only Nick's bar is red.
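An alternative sketch, not taken from the answer above, is to draw all bars in one color and then recolor the matching bar through matplotlib's patch objects. It assumes the bars in `ax.patches` come in the same order as the rows of `df`, which should hold for a plain barplot with unique names and no `hue`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Data from the question
data = [['tom', 10, 1, 'a'], ['matt', 15, 5, 'a'], ['Nick', 14, 1, 'a']]
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category'])

fig, ax = plt.subplots()
sns.barplot(x="Attempts", y="Name", data=df, color="b", ax=ax)

# Recolor only the bar whose name matches (assumes patch order == row order)
for patch, name in zip(ax.patches, df["Name"]):
    if name == "Nick":
        patch.set_facecolor("r")
fig.savefig("highlight_bar.png")
```

This avoids passing a palette without `hue`, which newer seaborn releases warn about.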