I have data that is arranged like the following. This is an example from a dataset with 100s of loci.
loci head(%) tail(%) wing(%)
1 20 40 40
2 10 50 40
3 12 48 40
4 22 38 40
I wish to make a ternary plot for these, with head, tail, and wing making the three points of the triangle. The edges of the triangle would represent the percentages. How can I begin to do this using pandas? Any guidance would be useful.
Using matplotlib and a couple of functions from the radar_chart example, we can create a radar chart directly from a dataframe.
Before we read the dataframe, you'll want to copy the imports, radar_factory and unit_poly_verts functions from the example matplotlib provides. You also need pandas, obviously.
Your imports will look like this:
import matplotlib.pyplot as plt
from matplotlib.path import Path
from matplotlib.spines import Spine
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
import pandas as pd
import numpy as np
Since you want only the head, tail and wing, and it looks like loci is an index, I imported the data set with index_col="loci". This means the dataframe looks like this upon import:
head(%) tail(%) wing(%)
loci
1 20 40 40
2 10 50 40
3 12 48 40
4 22 38 40
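A minimal sketch of that import step, assuming the data is whitespace-separated text. The inline string stands in for the real file, whose name and separator are not given in the question.

```python
import io

import pandas as pd

# Stand-in for the real file (assumed to be whitespace-separated).
raw = """loci head(%) tail(%) wing(%)
1 20 40 40
2 10 50 40
3 12 48 40
4 22 38 40
"""

# index_col="loci" turns the loci column into the index, so only the
# three percentage columns remain as data.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="loci")
print(df)
```

For a real file, replace io.StringIO(raw) with the file path and adjust sep to match the file.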
Finally, you want to create a function that operates similarly to the code in the example, but instead reads the dataframe. The code below should do that and is based on the code in the '__main__' block. I stripped out some of the code that isn't required for this example and unhardcoded the colors:
def nColors(k=2, cmap='nipy_spectral'):
    # Return k colors: sampled evenly from a named colormap, all black
    # if cmap is None, or cmap itself if it is already a list of colors.
    if isinstance(cmap, str):
        cm = plt.get_cmap(cmap)
        # max(k - 1, 1) avoids a division by zero when k == 1
        colors = [cm(i / max(k - 1, 1)) for i in range(k)]
    elif cmap is None:
        colors = ['k'] * k
    else:
        colors = cmap
    return colors
def plot_radar(data):
    N = data.shape[1]  # one spoke per column
    theta = radar_factory(N, frame='circle')
    spoke_labels = data.columns.tolist()
    fig = plt.figure(figsize=(9, 9))
    fig.subplots_adjust(wspace=0.25, hspace=0.20, top=0.85, bottom=0.05)
    ax = fig.add_subplot(111, projection='radar')
    colors = nColors(len(data), cmap='nipy_spectral')
    # one outline and one filled area per row of the dataframe
    for i, (index, d) in enumerate(data.iterrows()):
        ax.plot(theta, d.tolist(), color=colors[i])
        ax.fill(theta, d.tolist(), facecolor=colors[i], alpha=0.25)
    ax.set_varlabels(spoke_labels)
    plt.show()
Call this function and pass your dataframe:
plot_radar(df)
This code uses the nipy_spectral color map (named spectral in older matplotlib releases, where the original version of this code used that name), but you can change that by passing any valid color map name as the second argument in the colors = nColors(len(data), cmap='nipy_spectral') line.
You can either have a circle or a polygon (triangle in this case since there are 3 dimensions).
The above code results in a chart like this:
If you change the frame parameter in the line theta = radar_factory(N, frame='circle') to be polygon, you get a chart like this:
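If a true ternary plot is wanted rather than a radar chart, one common approach is to map each row's three percentages to a point inside an equilateral triangle using barycentric coordinates. The sketch below is not part of the radar example. The function name ternary_xy and the corner layout (head bottom-left, tail bottom-right, wing top) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ternary_xy(head, tail, wing):
    """Map three components to 2-D points inside a unit-edge triangle.

    Assumed corner layout: head=(0, 0), tail=(1, 0), wing=(0.5, sqrt(3)/2).
    """
    head, tail, wing = (np.asarray(v, dtype=float) for v in (head, tail, wing))
    total = head + tail + wing  # normalise so the three parts sum to 1
    x = (tail + 0.5 * wing) / total
    y = (np.sqrt(3) / 2 * wing) / total
    return x, y


df = pd.DataFrame(
    {"head(%)": [20, 10, 12, 22],
     "tail(%)": [40, 50, 48, 38],
     "wing(%)": [40, 40, 40, 40]},
    index=pd.Index([1, 2, 3, 4], name="loci"),
)

x, y = ternary_xy(df["head(%)"], df["tail(%)"], df["wing(%)"])

fig, ax = plt.subplots()
top = np.sqrt(3) / 2
ax.plot([0, 1, 0.5, 0], [0, 0, top, 0], color="k")  # triangle outline
ax.scatter(x, y)
for label, corner in {"head": (0, 0), "tail": (1, 0), "wing": (0.5, top)}.items():
    ax.annotate(label, corner)
ax.set_aspect("equal")
ax.axis("off")
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to view the result. Dedicated ternary-plot packages exist as well, but this keeps the dependencies to matplotlib and pandas.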
I have the following graph [1], obtained with the code in [2]. As you can see from the first line inside the for loop, I set the height of the rectangles from the standard deviation. But I can't figure out how to get the vertical extent of each rectangle. For example, for the blue rectangle I would like to get the two bounds of the interval it covers, which are approximately 128.8 and 130.6. How can I do this?
[2] The code I used is the following:
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import numpy as np
dfLunedi = pd.read_csv( "0.lun.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = dfLunedi.groupby('slotID', as_index=False).agg( NLunUn=('date', 'nunique'),NLunTot = ('date', 'count'), MeanBPM=('tempo', 'mean'), std = ('tempo','std') )
#print(dfSlotMean)
dfSlotMean.drop(dfSlotMean[dfSlotMean.NLunUn < 3].index, inplace=True)
df = pd.DataFrame(dfSlotMean)
df.to_csv('1.silLunedi.csv', sep = ';', index=False)
print(df)
bpmMattino = df['MeanBPM']
std = df['std']
listBpm = bpmMattino.tolist()
limInf = df['MeanBPM'] - df['std']
limSup = df['MeanBPM'] + df['std']
tick_spacing = 1
fig, ax = plt.subplots(1, 1)
for _, r in df.iterrows():
    # one horizontal segment per slot; the line width is taken from the std
    ax.plot([r['slotID'], r['slotID']+1], [r['MeanBPM']]*2, linewidth = r['std'] )
ax.xaxis.grid(True)
ax.yaxis.grid(True)
ax.yaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
This is the content of the csv:
    slotID  NMonUnique  NMonTot     MeanBPM        std
0        7          11       78  129.700564  29.323091
2       11           6       63  123.372397  24.049397
3       12           6       33  120.625667  24.029006
4       13           5       41  124.516341  30.814985
5       14           4       43  118.904512  26.205309
6       15           3       13  116.380538  24.336491
7       16           3       42  119.670881  27.416843
8       17           5       40  125.424125  32.215865
9       18           6       45  130.540578  24.437559
10      19           9       58  128.180172  32.099529
11      20           5       44  125.596045  28.060657
I would advise against using linewidth to show anything related to your data. The reason is that linewidth is measured in "points" (see the matplotlib documentation), a unit that is unrelated to the xy-space your data is plotted in. To see this in action, try plotting with different linewidths and resizing the plotting window: the linewidth will not scale with the axes.
Instead, if you do indeed want a rectangle, I suggest using matplotlib.patches.Rectangle. There is a good example of how to do that in the documentation, and I've also added an even shorter example below.
To give the rectangles different colors, you can do as here and simply generate a random tuple with 3 elements and use that as the color. Another option is to take a list of colors, for example TABLEAU_COLORS from matplotlib.colors, and take consecutive colors from that list. The latter may be better for testing, as the rectangles will get the same colors on each run, but note that there are only 10 colors in TABLEAU_COLORS, so you will have to cycle through them if you have more than 10 rectangles.
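import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import matplotlib.colors as mcolors  # only needed for the TABLEAU_COLORS option
import random
y = 4.5      # center of each rectangle (the mean)
y_std = 0.3  # half of the rectangle height (the standard deviation)
fig, ax = plt.subplots()
for i in range(10):
    # random RGB color
    c = tuple(random.random() for _ in range(3))
    # The other option as comment here (cycles after 10 colors)
    #c = list(mcolors.TABLEAU_COLORS.values())[i % len(mcolors.TABLEAU_COLORS)]
    # one rectangle per x position, spanning y - y_std to y + y_std
    rect = ptc.Rectangle(xy=(i, y-y_std), width=1, height=2*y_std, color=c)
    ax.add_patch(rect)
ax.set_xlim((0,10))
ax.set_ylim((0,5))
plt.show()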
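To get the interval covered by each rectangle, the bounds can be read back from the patch itself with get_y() and get_height(). A minimal sketch, assuming each rectangle is meant to span mean - std to mean + std (as limInf and limSup in the question suggest) and using three rows of the question's table as inline data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.patches as ptc
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the table in the question.
df = pd.DataFrame({
    "slotID": [7, 11, 12],
    "MeanBPM": [129.700564, 123.372397, 120.625667],
    "std": [29.323091, 24.049397, 24.029006],
})

fig, ax = plt.subplots()
bounds = {}
for _, r in df.iterrows():
    rect = ptc.Rectangle(
        xy=(r["slotID"], r["MeanBPM"] - r["std"]),  # lower-left corner
        width=1,
        height=2 * r["std"],
        alpha=0.5,
    )
    ax.add_patch(rect)
    # Lower and upper edge in data coordinates, read back from the patch.
    lower = rect.get_y()
    upper = rect.get_y() + rect.get_height()
    bounds[int(r["slotID"])] = (lower, upper)

# Patches do not rescale the view on their own, so set the limits by hand.
ax.set_xlim(df["slotID"].min() - 1, df["slotID"].max() + 2)
ax.set_ylim(min(b[0] for b in bounds.values()) - 5,
            max(b[1] for b in bounds.values()) + 5)
print(bounds)
```

Because the rectangle is defined in data coordinates, these bounds stay valid when the window is resized, which is not the case for a line drawn with a large linewidth.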
If you define the height as the standard deviation and the center is at the mean, then the interval should be [mean-(std/2) ; mean+(std/2)] for each rectangle, right? Is it intentional that the rectangles overlap? If not, I think the use of linewidth to size the rectangles is at fault. If the plot is meant to visualize the mean and variance of the different categories, something like a boxplot or raincloud plot might be better.
I have data consisting of 3 columns:
zone | pop1 | pop2
---- ---- ----
3 4500 3800
2 2800 3100
1 1350 1600
2 2100 1900
3 3450 3600
I would like to draw a scatter plot of pop1 and pop2, with the circles having colors based on the value of zone.
I have the following code so far:
df = pd.read_csv(file_path)
plt.scatter(df['pop1'],df['pop2'], s = 1)
How can I give different colors, let's say red, green and blue, corresponding to zone values 1, 2 and 3 respectively?
Without using an additional library, you can also go for something like:
colors = {1:'red', 2:'green', 3:'blue'}
for i in range(len(df)):
    plt.scatter(df['pop1'].iloc[i], df['pop2'].iloc[i],
                c=colors[df['zone'].iloc[i]])
EDIT: You don't need to use a loop; you can use something like this:
colors = {1:'red', 2:'green', 3:'blue'}
plt.scatter(df['pop1'], df['pop2'],
c=[colors[i] for i in df['zone']])
Which gives the output:
This requires you to make a dictionary of colors for the values in zones though. Also you will spend some extra time making the list comprehension.
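As a possible variation on the list comprehension, pandas can do the lookup itself with Series.map. This is a sketch using the sample rows from the question as inline data; the variable name point_colors is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "zone": [3, 2, 1, 2, 3],
    "pop1": [4500, 2800, 1350, 2100, 3450],
    "pop2": [3800, 3100, 1600, 1900, 3600],
})

colors = {1: "red", 2: "green", 3: "blue"}

# Series.map looks up every zone value in the dictionary.
point_colors = df["zone"].map(colors).tolist()

fig, ax = plt.subplots()
ax.scatter(df["pop1"], df["pop2"], c=point_colors)
```

A zone value that is missing from the dictionary becomes NaN after map, so it is worth checking the mapped column before plotting real data.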
You can use the seaborn package, which is a wrapper around matplotlib. It offers a wide variety of features and produces good-looking plots. Here is a simple example for your question.
import matplotlib.pyplot as plt
# Jupyter magic; omit the next line when running outside a notebook
%matplotlib inline
import seaborn as sns
import pandas as pd
data = pd.DataFrame({'col1':[4500,2800,1350,2100,3450],
                     'col2':[3800,3100 ,1650,1900,3600],
                     'col3':[3,2,1,2,3]})
# fit_reg=False turns off the regression line, since we only need the dots
sns.lmplot(data=data, x='col1', y='col2', hue='col3',
           fit_reg=False, legend=True)
I have:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
# Generate random data
set1 = np.random.randint(0, 40, 24)
set2 = np.random.randint(0, 100, 24)
# Put into dataframe and plot
df = pd.DataFrame({'set1': set1, 'set2': set2})
data = pd.melt(df)
sb.swarmplot(data=data, x='variable', y='value')
The two random distributions plotted with seaborn's swarmplot function:
I want the individual plots of both distributions to be connected with a colored line such that the first data point of set 1 in the dataframe is connected with the first data point of set 2.
I realize that this would probably be relatively simple without seaborn but I want to keep the feature that the individual data points do not overlap.
Is there any way to access the individual plot coordinates in seaborn's swarmplot function?
EDIT: Thanks to @Mead, who pointed out a bug in my post prior to 2021-08-23 (I forgot to sort the locations in the prior version).
I gave the nice answer by Paul Brodersen a try, and despite him saying that
Madness lies this way
... I actually think it's pretty straightforward and yields nice results:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Generate random data
rng = np.random.default_rng(42)
set1 = rng.integers(0, 40, 5)
set2 = rng.integers(0, 100, 5)
# Put into dataframe
df = pd.DataFrame({"set1": set1, "set2": set2})
print(df)
data = pd.melt(df)
# Plot
fig, ax = plt.subplots()
sns.swarmplot(data=data, x="variable", y="value", ax=ax)
# Now connect the dots
# Find idx0 and idx1 by inspecting the elements return from ax.get_children()
# ... or find a way to automate it
idx0 = 0
idx1 = 1
locs1 = ax.get_children()[idx0].get_offsets()
locs2 = ax.get_children()[idx1].get_offsets()
# before plotting, we need to sort so that the data points
# correspond to each other as they did in "set1" and "set2"
sort_idxs1 = np.argsort(set1)
sort_idxs2 = np.argsort(set2)
# revert "ascending sort" through sort_idxs2.argsort(),
# and then sort into order corresponding with set1
locs2_sorted = locs2[sort_idxs2.argsort()][sort_idxs1]
for i in range(locs1.shape[0]):
    x = [locs1[i, 0], locs2_sorted[i, 0]]
    y = [locs1[i, 1], locs2_sorted[i, 1]]
    ax.plot(x, y, color="black", alpha=0.1)
It prints:
set1 set2
0 3 85
1 30 8
2 26 69
3 17 20
4 17 9
And you can see that the data is linked correspondingly in the plot.
Sure, it's possible (but you really don't want to).
seaborn.swarmplot returns the axis instance (here: ax). You can call ax.get_children() to get all plot elements. You will see that for each set of points there is an element of type PathCollection. You can determine the x, y coordinates by using the PathCollection.get_offsets() method.
I do not suggest you do this! Madness lies this way.
I suggest you have a look at the source code (found here), and derive your own _PairedSwarmPlotter from _SwarmPlotter and change the draw_swarmplot method to your needs.
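The lookup described above can be sketched without hard-coding child indices by filtering the children by type. To keep the example free of a seaborn dependency, plain ax.scatter calls stand in for the two swarms here (that substitution is an assumption); swarmplot also draws its points as PathCollection objects, as noted above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection

fig, ax = plt.subplots()
ax.scatter([0, 0, 0], [3, 17, 26])  # stand-in for the first swarm
ax.scatter([1, 1, 1], [85, 8, 69])  # stand-in for the second swarm

# Keep only the point collections, in the order they were added.
swarms = [c for c in ax.get_children() if isinstance(c, PathCollection)]

# get_offsets() returns an (N, 2) array of x, y positions in data coordinates.
locs = [np.asarray(s.get_offsets()) for s in swarms]

# Connect the i-th point of the first set with the i-th point of the second.
for (x0, y0), (x1, y1) in zip(locs[0], locs[1]):
    ax.plot([x0, x1], [y0, y1], color="black", alpha=0.3)
```

With a real swarm plot, the order of the offsets may not match the row order of the dataframe, depending on the seaborn version, so the points may need to be re-sorted before they are connected.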
Let's say I did 16 experiments replicated 3 times for a total of 48 measures.
Here's an example of my dataframe and a snippet that replicates my problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.abs(np.random.randn(48, 4)), columns=list("ABCD"))
color_list = plt.cm.gnuplot(np.linspace(0,1,16))
df.A.plot.bar( figsize=(16,7), color=color_list)
plt.axvline(15.5, c="black", lw=1)
plt.axvline(31.5, c="black", lw=1)
plt.plot()
And here's the output of the graph:
So, the vertical lines separate the replicas, and there are 16 bars in each. My color list is also of length 16, so I would expect the same colors to repeat 3 times. But at 16 and 17 there are two black bars (black is the 0th color in the list), and at 32 there is a yellow bar that comes from the offset caused by the second black bar at 17, followed by 2 black bars again.
It looks like my color_list puts a 0 automatically when it starts for the second time.
Edit : My temporary solution was to create a list of length 48 that is the first list repeated 3 times:
lst=list(np.linspace(0,1,16))
liste = lst + lst + lst
color_list = plt.cm.gnuplot(liste)
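The same workaround can be written a little more compactly with np.tile, which repeats the 16 colors once per replica. This is only a sketch of the workaround and does not explain the underlying behaviour. It draws the bars with matplotlib's ax.bar directly instead of the pandas plotting call, which is a substitution made here, and the names n_experiments, n_replicas and base_colors are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

n_experiments, n_replicas = 16, 3

# One RGBA row per experiment: shape (16, 4).
base_colors = plt.cm.gnuplot(np.linspace(0, 1, n_experiments))

# Repeat the whole block once per replica: shape (48, 4).
color_list = np.tile(base_colors, (n_replicas, 1))

values = np.abs(np.random.randn(n_experiments * n_replicas))

fig, ax = plt.subplots(figsize=(16, 7))
bars = ax.bar(np.arange(len(values)), values, color=color_list)
ax.axvline(15.5, c="black", lw=1)
ax.axvline(31.5, c="black", lw=1)
```

Bars 0, 16 and 32 then share the first color of the list, bars 1, 17 and 33 the second, and so on.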
I am a newbie to matplotlib. I am trying to plot step function and having some trouble. Right now I am able to read from the file and plot it as shown below. But the graph in the top is not in steps and the one below is not a proper step. I saw examples to plot step function by giving x & y value. I am not sure how to do it by reading from a file though. Can someone help me?
from pylab import plotfile, show, gca
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
fname = cbook.get_sample_data('sample.csv', asfileobj=False)
plotfile(fname, cols=(0,1), delimiter=' ')
plotfile(fname, cols=(0,2), newfig=False, delimiter=' ')
plt.show()
Sample inputs(3 columns):
27023927 3 0
27023938 2 0
27023949 3 0
27023961 2 0
27023972 3 0
27023984 2 0
27023995 3 0
27024007 2 0
27024008 2 1
27024018 3 1
27024030 2 1
27024031 2 0
27024041 3 0
27024053 2 0
27024054 2 1
27024098 2 0
Note: I have made the y-axis1 values as 3 & 2 so that this graph can occur in the top and another y-axis2 values 0 & 1 so that it comes in the bottom as shown below
Waveform as it looks now
Essentially, your resolution is too low. In the lower plot, each transition (except the last one) takes place over 1 unit in x, while the flat segments between transitions are about an order of magnitude wider. This gives the appearance of steps, but if you zoom in you will see that the vertical lines have a finite gradient (true steps change with an infinite gradient).
The same problem affects both the top and bottom plots. We can easily remedy this by using the step function. You will generally find it easier to import the data first; in this example I use numpy's genfromtxt, which loads the data into an array called data:
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('test.csv', delimiter=" ")
ax1 = plt.subplot(2,1,1)
ax1.step(data[:,0], data[:,1])
ax2 = plt.subplot(2,1,2)
ax2.step(data[:,0], data[:,2])
plt.show()
If you are new to Python, there are two things worth mentioning. First, we use two subplots (ax1 and ax2) to plot the data rather than plotting on the same axes (this means you don't need to offset the values to separate them spatially). Second, we access the elements of the array through [], which takes [row, column], with : meaning all rows and an index i selecting the ith column.
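A short sketch of that indexing, using a few of the sample rows as inline text so that it runs without the file. Using where="post" (hold each value until the next sample) is an assumption about how the signal should be read.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# A few rows from the sample input, standing in for the file.
raw = """27023927 3 0
27023938 2 0
27023949 3 0
27024007 2 0
27024008 2 1
27024018 3 1
"""
data = np.genfromtxt(io.StringIO(raw), delimiter=" ")

# Indexing is data[row, column]; ":" selects everything along that axis.
t = data[:, 0]          # all rows of column 0 (the x values)
first_row = data[0, :]  # all columns of row 0

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.step(t, data[:, 1], where="post")  # column 1 on the top axes
ax2.step(t, data[:, 2], where="post")  # column 2 on the bottom axes
```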
I would propose loading the data into a numpy array
import numpy as np
data = np.loadtxt('sample.csv')
And then plot it:
# first point
ax = [data[0,0]]
ay = [data[0,1]]
for i in range(1, data.shape[0]):
    if ay[-1] != data[i,1]: # if y value has changed
        # add current x and old y
        ax.append(data[i,0])
        ay.append(ay[-1])
    # add current x and current y
    ax.append(data[i,0])
    ay.append(data[i,1])
import matplotlib.pyplot as plt
plt.plot(ax,ay)
plt.show()
Where my solution differs from yours is that I plot two points for every change in y. The two points produce the 90 degree bend. I only plot the first curve; change [?,1] to [?,2] for the second one.
Thanks for the suggestions. I was able to plot it after some research and here is my code,
import csv
import matplotlib.pyplot as plt
fname = "output.csv"
x = []
a = []
b = []
# csv.DictReader yields strings, so convert to numbers before plotting
with open(fname, "r", newline="") as f:
    for data in csv.DictReader(f):
        x.append(float(data['i']))
        a.append(float(data['a']))
        b.append(float(data['b']))
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
# where='post' holds each value until the next x
ax.step(x, a, 'g', where='post')
ax.step(x, b, 'r', where='post')
plt.show()
and got the image like,