I've been coding a Python program that will take a list from a different program, take specific values from that list, and add them to a 2D list. This subsequently creates an animation, with one frame per sub-list within the 2D list. However, when it's animated (using Celluloid), an extra frame is added in front of it that displays a graph of every sub-list at once, which disrupts the animation.
The code I'm using is this:
# Imports the relevant parts of the external modules
from matplotlib import pyplot as plt
from celluloid import Camera
from main import projectileInfo, tickRate
displacements = []  # A 2D list of all displacements
for i in projectileInfo:
    displacements.append([i[1], i[0]])
print(displacements)
fig = plt.figure()
camera = Camera(fig)
for j in displacements:
    x = [j[0]]
    y = [j[1]]
    plt.plot(x, y, color='#000000', marker='.')  # Plots the data as black points
    camera.snap()  # one frame per sub-list
ani = camera.animate(interval=1000 * tickRate, repeat=False)
plt.show()
The issue doesn't come up if I specify values for the animation within the code itself (e.g. displacements = [[1,1], [2,2], [3,3]]), but it does when the values come from projectileInfo.
For reference, here are example values for projectileInfo and tickRate:
projectileInfo = [[0, 0, 2.944881550342992, 0.5724269861296344, -1.0092420948384448, -0.03813290516155552, -12.052760210752101, -0.08473978924790115], [0.2944881550342992, 0.05724269861296344, 1.739605529267782, 0.5639530072048443, -0.35217721337929575, -0.03701225346578069, -10.592616029731769, -0.0822494521461793], [0.4684487079610774, 0.11363799933344787, 0.6803439262946049, 0.5557280619902264, -0.05386624698009846, -0.03594051938005718, -9.929702771066887, -0.0798678208445715], [0.5364831005905379, 0.1692108055324705, -0.3126263508120839, 0.5477412799057693, 0.011373937998969578, -0.0349148868178283, -9.784724582224511,
-0.07758863737295177], [0.5052204655093295, 0.22398493352304744, -1.291098809034535, 0.5399824161684741, 0.19398969267459468, -0.03393274001211678, -9.378911794056458, -0.07540608891581507], [0.376110584605876, 0.27798317513989484, -2.2289899884401807, 0.5324418072768926, 0.5781971273919331, -0.03299164661811001, -8.525117494684594, -0.07331477026246667], [0.1532115857618579, 0.3312273558675841, -3.0815017379086402, 0.5251103302506459, 1.1050566133054158, -0.0320893424586703, -7.354318637099077, -0.07130964990815623], [-0.15493858802900617, 0.3837383888926487, -3.816933601618548, 0.5179793652598302, 1.6954652941177968, -0.031223717732420407, -6.042299346404897, -0.06938603940537869], [-0.536631948190861, 0.43553632541863174, -4.421163536259038, 0.5110407613192923, 2.2747457012945764, -0.030392804526055698, -4.75500955267872, -0.0675395656134571], [-0.9787483018167649, 0.486640401550561, -4.89666449152691, 0.5042868047579466, 2.7903609807178054, -0.029594765491590475, -3.6091978206270996, -0.06576614553686772], [-1.4684147509694558, 0.5370690820263556, -5.25758427358962, 0.4977101902042599,
3.216860139839751, -0.02882788356578406, -2.6614219114672206, -0.06406196347952013], [-1.9941731783284178, 0.5868401010467816, -5.523726464736342, 0.49130399385630785, 3.5507821034099836, -0.028090552623374627, -1.9193731035333705, -0.062423450274165834], [-2.546545824802052, 0.6359705004324124, -5.715663775089679, 0.48506164882889125, 3.801833041871401, -0.027381268968280633, -1.3614821291746655, -0.06084726437395696], [-3.11811220231102, 0.6844766653153016, -5.851811988007145, 0.47897692239149553, 3.985110999814779, -0.026698623577869795, -0.9541977781893805, -0.05933027461748843]]
tickRate = 0.05
I haven't managed to reproduce the problem in any test program, not even within the same virtual environment as the original code.
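For comparison, here is a minimal sketch of the same one-point-per-frame animation built with matplotlib's own FuncAnimation instead of Celluloid. This is a swapped-in technique, not a diagnosis of the Celluloid issue. Each frame only moves a single Line2D, so in this sketch no frame can show every sub-list at once. The short projectile_info list is a rounded stand-in made from the first two values of the first four rows above, update is a hypothetical helper name, and the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Stand-in for projectileInfo: first two values of its first four rows (rounded)
projectile_info = [[0.0, 0.0], [0.294, 0.057], [0.468, 0.114], [0.536, 0.169]]
tick_rate = 0.05

# Same pairing as the question's loop: [i[1], i[0]]
displacements = [[row[1], row[0]] for row in projectile_info]

fig, ax = plt.subplots()
xs = [d[0] for d in displacements]
ys = [d[1] for d in displacements]
# Fixed limits so every frame uses the same axes
ax.set_xlim(min(xs) - 0.1, max(xs) + 0.1)
ax.set_ylim(min(ys) - 0.1, max(ys) + 0.1)
# One Line2D that is moved each frame instead of adding a new artist per frame
(point,) = ax.plot([], [], color="black", marker=".", linestyle="none")

def update(frame):
    """Move the single point to the displacement for this frame."""
    x, y = displacements[frame]
    point.set_data([x], [y])
    return (point,)

ani = FuncAnimation(fig, update, frames=len(displacements),
                    interval=1000 * tick_rate, repeat=False)
# plt.show() would display it; ani.save(...) could write it to a file instead
```

Because the figure only ever contains the one moving artist, there is nothing left over that could be captured as an extra "all points" frame.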
The post Get data points from Seaborn distplot describes how you can get data elements using sns.distplot(x).get_lines()[0].get_data(), sns.distplot(x).patches and [h.get_height() for h in sns.distplot(x).patches]
But how can you do this if you've used multiple layers by plotting the data in a loop, such as:
Snippet 1
for var in list(df):
    print(var)
    distplot = sns.distplot(df[var])
Is there a way to retrieve the X and Y values for both linecharts and the bars?
Here's the whole setup for easy copy and paste:
#%%
# imports
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pylab
pylab.rcParams['figure.figsize'] = (8, 4)
import seaborn as sns
from collections import OrderedDict
# Function to build synthetic data
def sample(rSeed, periodLength, colNames):
    np.random.seed(rSeed)
    date = pd.to_datetime("1st of Dec, 1999")
    cols = OrderedDict()
    for col in colNames:
        cols[col] = np.random.normal(loc=0.0, scale=1.0, size=periodLength)
    dates = date + pd.to_timedelta(np.arange(periodLength), 'D')
    df = pd.DataFrame(cols, index=dates)
    return df
# Dataframe with synthetic data
df = sample(rSeed=123, colNames=['X1', 'X2'], periodLength=50)
# sns.distplot with multiple layers
for var in list(df):
    myPlot = sns.distplot(df[var])
Here's what I've tried:
Y-values for histogram:
If I run:
barX = [h.get_height() for h in myPlot.patches]
Then I get the following list of length 11:
[0.046234272703757885,
0.1387028181112736,
0.346757045278184,
0.25428849987066837,
0.2542884998706682,
0.11558568175939472,
0.11875881712519201,
0.3087729245254993,
0.3087729245254993,
0.28502116110046083,
0.1662623439752689]
This seems reasonable, since there seem to be 6 values for the blue bars and 5 values for the red bars. But how do I tell which values belong to which variable?
Y-values for line:
This seems a bit easier than the histogram part since you can use myPlot.get_lines()[0].get_data() AND myPlot.get_lines()[1].get_data() to get:
Out[678]:
(array([-4.54448949, -4.47612134, -4.40775319, -4.33938504, -4.27101689,
...
3.65968859, 3.72805675, 3.7964249 , 3.86479305, 3.9331612 ,
4.00152935, 4.0698975 , 4.13826565]),
array([0.00042479, 0.00042363, 0.000473 , 0.00057404, 0.00073097,
0.00095075, 0.00124272, 0.00161819, 0.00208994, 0.00267162,
...
0.0033384 , 0.00252219, 0.00188591, 0.00139919, 0.00103544,
0.00077219, 0.00059125, 0.00047871]))
myPlot.get_lines()[1].get_data()
Out[679]:
(array([-3.68337423, -3.6256517 , -3.56792917, -3.51020664, -3.4524841 ,
-3.39476157, -3.33703904, -3.27931651, -3.22159398, -3.16387145,
...
3.24332952, 3.30105205, 3.35877458, 3.41649711, 3.47421965,
3.53194218, 3.58966471, 3.64738724]),
array([0.00035842, 0.00038018, 0.00044152, 0.00054508, 0.00069579,
0.00090076, 0.00116922, 0.00151242, 0.0019436 , 0.00247792,
...
0.00215912, 0.00163627, 0.00123281, 0.00092711, 0.00070127,
0.00054097, 0.00043517, 0.00037599]))
But the whole thing still seems a bit cumbersome. Does anyone know of a more direct approach, perhaps one that retrieves all the data into a dictionary or DataFrame?
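One way to tell which bars belong to which variable is to record how many patches the Axes already holds before each layer is drawn, then slice off only the new ones. The sketch below uses plain matplotlib ax.hist as a stand-in for sns.distplot so it runs without seaborn, with made-up normal data for X1 and X2. distplot draws its histogram bars as matplotlib patches as well, so the same counting should carry over, but that is an assumption worth checking against your seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data with the same column names as the question
rng = np.random.default_rng(123)
df = pd.DataFrame({"X1": rng.normal(size=50), "X2": rng.normal(size=50)})

fig, ax = plt.subplots()
frames = []
for var in df:
    n_before = len(ax.patches)                          # bars drawn by earlier layers
    ax.hist(df[var], bins=6, density=True, alpha=0.5)   # stand-in for sns.distplot
    new_bars = list(ax.patches)[n_before:]              # bars added by this layer only
    frames.append(pd.DataFrame({
        "variable": var,
        "left": [p.get_x() for p in new_bars],
        "width": [p.get_width() for p in new_bars],
        "height": [p.get_height() for p in new_bars],
    }))

# One long DataFrame: one row per bar, tagged with the variable it belongs to
bars = pd.concat(frames, ignore_index=True)
print(bars)
```

Since each histogram is drawn with density=True, the bar areas (width times height) of each variable should sum to about 1, which is a quick sanity check that the slicing picked the right bars.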
I recently had the same need to retrieve data from a seaborn distribution plot. What worked for me was calling the .findobj() method on each iteration's plot. The matplotlib.lines.Line2D objects it returns have a get_data() method, similar to the myPlot.get_lines()[1].get_data() call you mentioned.
Following your example code:
import matplotlib.lines
data = []
for idx, var in enumerate(list(df)):
    myPlot = sns.distplot(df[var])
    # Find the Line2D objects
    lines2D = [obj for obj in myPlot.findobj() if isinstance(obj, matplotlib.lines.Line2D)]
    # Retrieve the x, y data
    x, y = lines2D[idx].get_data()
    # Store as a DataFrame
    data.append(pd.DataFrame({'x': x, 'y': y}))
Notice that the data for the first sns.distplot plot is stored at the first index of lines2D and the data for the second sns.distplot at the second index. I'm not really sure why this happens, but if you were to consider more than two plots, you would access each sns.distplot's data by indexing lines2D at its respective position.
Finally, to verify, you can plot each distplot's data:
plt.plot(data[0].x, data[0].y)
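As for the index order: as far as I can tell from matplotlib's Axes.get_children, an Axes lists its own plotted artists before its axis objects, so findobj returns the plotted curves first and the Line2D objects used for tick marks and gridlines after them. The sketch below checks that, and also shows that ax.lines holds only the plotted lines, in drawing order, which avoids the index bookkeeping. It uses plain ax.plot curves (a standard normal density) as a stand-in for the KDE lines that sns.distplot draws, with made-up data, and collects everything into one DataFrame with a variable column.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.lines import Line2D

rng = np.random.default_rng(123)
df = pd.DataFrame({"X1": rng.normal(size=50), "X2": rng.normal(size=50)})

fig, ax = plt.subplots()
for var in df:
    # Stand-in for the KDE curve sns.distplot would draw for this column
    xs = np.linspace(df[var].min(), df[var].max(), 100)
    ax.plot(xs, np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi))

# findobj accepts an Artist subclass as its match argument
all_lines = ax.findobj(Line2D)

# ax.lines: only the lines added with plot(), in the order they were drawn
curves = pd.concat(
    [pd.DataFrame({"variable": var, "x": line.get_xdata(), "y": line.get_ydata()})
     for var, line in zip(df.columns, ax.lines)],
    ignore_index=True,
)
print(curves.groupby("variable").size())
```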
I work with simulation data and have been using matplotlib a lot lately, and I keep running into something annoying (a bug?).
I have been letting matplotlib set the tick labels and their type (scientific, etc.) automatically, and with some data I get weird scientific tick labels.
While searching for a fix, I found that you can call set_powerlimits((n,m)) to set the thresholds (as powers of ten) outside which scientific notation is used. But I have run into this problem (if I remember correctly) with data spanning several orders of magnitude, and my data is all over the place, so I need a programmatic solution of some sort rather than hard-coded limits.
see: http://matplotlib.org/api/ticker_api.html
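To see how set_powerlimits decides when scientific notation kicks in, here is a sketch using made-up data in the same range as the question's y values (about 0.0327). For a value written as a × 10^exp, ScalarFormatter switches to scientific notation when exp <= min_exp or exp >= max_exp; the exponent it picks is available after drawing as the formatter's orderOfMagnitude attribute. The helper name order_for is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

# Made-up data in the same range as the question's y values (~0.0327)
times = [i * 1e-7 for i in range(100)]
vals = [0.03275 - 1e-4 * (i / 99) ** 2 for i in range(100)]

def order_for(powerlimits):
    """Return the exponent ScalarFormatter chooses for vals under these limits."""
    fig, ax = plt.subplots()
    ax.plot(times, vals)
    fmt = ScalarFormatter(useOffset=False)  # no offset, so only the exponent varies
    fmt.set_powerlimits(powerlimits)
    ax.yaxis.set_major_formatter(fmt)
    fig.canvas.draw()  # tick locations, and hence the exponent, are set at draw time
    plt.close(fig)
    return fmt.orderOfMagnitude

# 0.0327 is about 3.27e-2, so exp = -2
print(order_for((-1, 1)))  # -2 <= -1: scientific notation with exponent -2
print(order_for((-3, 3)))  # -3 < -2 < 3: plain decimals (exponent 0)
```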
Below I have included example data, code, and a screenshot.
#! /usr/bin/env python
from matplotlib import pyplot as plt
data = [
[1.83186088e-08,0.03275],
[1.07139009e-07,0.03275],
[2.06376627e-07,0.03275],
[3.03918517e-07,0.03275],
[4.06032883e-07,0.03275],
[5.01194017e-07,0.03275],
[6.02195723e-07,0.03275],
[7.03536925e-07,0.03275],
[8.04625154e-07,0.03275],
[9.06401951e-07,0.03275],
[1.00041895e-06,0.03275],
[1.10230745e-06,0.03275],
[1.2042525e-06,0.03275],
[1.30647822e-06,0.03275],
[1.40109887e-06,0.03275],
[1.50380097e-06,0.03275],
[1.60683242e-06,0.03275],
[1.70208505e-06,0.03275],
[1.80545692e-06,0.03275],
[1.90090648e-06,0.03275],
[2.00453092e-06,0.03275],
[2.10018627e-06,0.03275],
[2.20401747e-06,0.03275],
[2.30009359e-06,0.03275],
[2.4043033e-06,0.03275],
[2.50066449e-06,0.03275],
[2.60513728e-06,0.03275],
[2.70165405e-06,0.03275],
[2.80635938e-06,0.03275],
[2.90331342e-06,0.03275],
[3.00021199e-06,0.03275],
[3.10546819e-06,0.03275],
[3.20257899e-06,0.03275],
[3.30032923e-06,0.0327499999],
[3.40612833e-06,0.0327499999],
[3.50401732e-06,0.0327499997],
[3.60153069e-06,0.0327499996],
[3.70700708e-06,0.0327499993],
[3.80456907e-06,0.0327499988],
[3.90259984e-06,0.0327499982],
[4.00084149e-06,0.0327499973],
[4.10700266e-06,0.0327499959],
[4.2047462e-06,0.0327499942],
[4.30209468e-06,0.0327499918],
[4.40018204e-06,0.0327499886],
[4.50712875e-06,0.032749984],
[4.60630591e-06,0.0327499785],
[4.70519881e-06,0.0327499715],
[4.80398305e-06,0.0327499628],
[4.90251297e-06,0.0327499521],
[5.00182752e-06,0.032749939],
[5.10157551e-06,0.0327499232],
[5.20157575e-06,0.0327499043],
[5.30145192e-06,0.0327498822],
[5.40127044e-06,0.0327498565],
[5.500537e-06,0.0327498272],
[5.60773155e-06,0.0327497911],
[5.70660709e-06,0.0327497534],
[5.80610521e-06,0.0327497112],
[5.90651786e-06,0.0327496642],
[6.00749437e-06,0.0327496124],
[6.10822094e-06,0.0327495566],
[6.20042255e-06,0.0327495018],
[6.30049028e-06,0.0327494386],
[6.40035803e-06,0.0327493715],
[6.50035477e-06,0.0327493004],
[6.60056805e-06,0.0327492251],
[6.70029936e-06,0.0327491461],
[6.80054193e-06,0.0327490625],
[6.90130872e-06,0.0327489743],
[7.00202598e-06,0.0327488818],
[7.10217348e-06,0.0327487855],
[7.20243015e-06,0.0327486847],
[7.30199609e-06,0.0327485801],
[7.40193254e-06,0.0327484707],
[7.50188319e-06,0.0327483567],
[7.60306205e-06,0.0327482367],
[7.70357184e-06,0.0327481129],
[7.80343389e-06,0.0327479853],
[7.90330165e-06,0.0327478532],
[8.00348513e-06,0.0327477162],
[8.10167039e-06,0.0327475777],
[8.206328e-06,0.0327474253],
[8.3020567e-06,0.0327472819],
[8.40527826e-06,0.0327471228],
[8.50095898e-06,0.0327469714],
[8.60536828e-06,0.0327468019],
[8.70106059e-06,0.0327466426],
[8.80396558e-06,0.032746467],
[8.90727378e-06,0.0327462865],
[9.00225164e-06,0.0327461166],
[9.10359892e-06,0.0327459311],
[9.20470894e-06,0.0327457418],
[9.30582982e-06,0.0327455481],
[9.40750123e-06,0.0327453488],
[9.50134495e-06,0.0327451608],
[9.60358199e-06,0.0327449513],
[9.70705637e-06,0.0327447344],
[9.80377546e-06,0.0327445269],
[9.90091941e-06,0.032744314],
]
times=[]
vals=[]
for elem in data:
    times.append(elem[0])
    vals.append(elem[1])
plt.plot(times,vals)
plt.show()
You might try using the Engineering Formatter:
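import matplotlib.pyplot as plt
from matplotlib.ticker import EngFormatter
times=[]
vals=[]
for elem in data:
    times.append(elem[0])
    vals.append(elem[1])
plt.plot(times,vals)
formatter = EngFormatter(unit='S', places=3)
formatter.ENG_PREFIXES[-6] = 'u'  # plain "u" for micro; note this edits the class-wide prefix table
plt.gca().yaxis.set_major_formatter(formatter)  # set the formatter before showing the figure
plt.show()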
Which will look like this:
This is a known problem. You'd be better off analysing the data manually for its limits, as you have done in the screenshot, and calling ax.set_ylim(min, max) yourself after plotting. You can also turn off the offset with:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
fig, ax = plt.subplots()
ax.plot(times, vals)  # plot some stuff
# ...
y_formatter = mticker.ScalarFormatter(useOffset=False)
ax.yaxis.set_major_formatter(y_formatter)
plt.show()
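To check programmatically whether an offset is being applied, and that useOffset=False removes it, a sketch along these lines can be used. The data is made up (values near 12345.6 that differ only after the decimal point) so that matplotlib's default offset logic definitely kicks in; offset_after_draw is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

# Made-up data: large values that differ only in the decimals
xs = list(range(11))
ys = [12345.6 + 0.01 * x for x in xs]

def offset_after_draw(use_offset):
    """Return (offset value, offset text) chosen by ScalarFormatter after drawing."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    fmt = ScalarFormatter(useOffset=use_offset)
    ax.yaxis.set_major_formatter(fmt)
    fig.canvas.draw()  # the offset is computed from the tick locations at draw time
    result = (fmt.offset, fmt.get_offset())
    plt.close(fig)
    return result

print(offset_after_draw(True))   # non-zero offset, shown as text at the end of the axis
print(offset_after_draw(False))  # (0, ''): tick labels show the full values
```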
I think your best option is to use a logarithmic axis, but if you need to create the graphic with a linear axis, you must set the power limits yourself. You can compute the power limits using math.log10:
import math
import matplotlib.pyplot as plt
from matplotlib import ticker
# Compute the span of the data (vals must be positive for log10)
pow_min = math.floor(math.log10(min(vals)))
pow_max = math.ceil(math.log10(max(vals)))
# Create a scalar formatter without offset, in order to have
# the right exponent over the y axis
fmt = ticker.ScalarFormatter(useOffset=False)
fmt.set_powerlimits((pow_min, pow_max))
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(times, vals)
ax1.yaxis.set_major_formatter(fmt)  # Set the formatter
plt.show()
I've been looking into how to make plots against time on the x axis and have it pretty much sorted, with one strange quirk that makes me wonder whether I've run into a bug or (admittedly much more likely) am doing something I don't really understand.
Simply put, below is a simplified version of my program. If I put this in a .py file and execute it from an interpreter (ipython) I get a figure with an x axis with the year only, "2012", repeated a number of times, like this.
However, if I comment out the line (40) that sets the xticks manually, namely 'plt.xticks(tk)' and then run that exact command in the interpreter immediately after executing the script, it works great and my figure looks like this.
Similarly it also works if I just move that line to be after the savefig command in the script, that's to say to put it at the very end of the file. Of course in both cases only the figure drawn on screen will have the desired axis, and not the saved file. Why can't I set my x axis earlier?
Grateful for any insights, thanks in advance!
import matplotlib.pyplot as plt
import datetime
from matplotlib.dates import date2num
# define arrays for x, y and errors
x=[16.7,16.8,17.1,17.4]
y=[15,17,14,16]
e=[0.8,1.2,1.1,0.9]
xtn=[]
# convert x to datetime format
for t in x:
    hours=int(t)
    mins=int((t-int(t))*60)
    secs=int(((t-hours)*60-mins)*60)
    dt=datetime.datetime(2012,1,1,hours,mins,secs)
    xtn.append(date2num(dt))
# set up plot
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
# plot
ax.errorbar(xtn,y,yerr=e,fmt='+',elinewidth=2,capsize=0,color='k',ecolor='k')
# set x axis range
ax.xaxis_date()
t0=date2num(datetime.datetime(2012,1,1,16,35)) # x axis startpoint
t1=date2num(datetime.datetime(2012,1,1,17,35)) # x axis endpoint
plt.xlim(t0,t1)
# manually set xtick values
tk=[]
tk.append(date2num(datetime.datetime(2012,1,1,16,40)))
tk.append(date2num(datetime.datetime(2012,1,1,16,50)))
tk.append(date2num(datetime.datetime(2012,1,1,17,0)))
tk.append(date2num(datetime.datetime(2012,1,1,17,10)))
tk.append(date2num(datetime.datetime(2012,1,1,17,20)))
tk.append(date2num(datetime.datetime(2012,1,1,17,30)))
plt.xticks(tk)
plt.show()
# save to file
plt.savefig('savefile.png')
I don't think you need that call to xaxis_date(), since you are already providing the x-axis data in a format that matplotlib knows how to deal with. I also think there's something slightly wrong with your secs formula.
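The secs remark can be checked directly. Because 16.7 - 16 evaluates to 0.6999999999999993 in floating point, truncating with int() at each step turns 16.7 hours into 16:41:59 rather than 16:42:00. Below is a sketch comparing the question's formula with a conversion that rounds to whole seconds first; question_formula and hours_to_time are hypothetical helper names.

```python
import datetime as dt

def question_formula(t):
    """The question's conversion, using int() truncation at each step."""
    hours = int(t)
    mins = int((t - int(t)) * 60)
    secs = int(((t - hours) * 60 - mins) * 60)
    return dt.time(hours, mins, secs)

def hours_to_time(t):
    """Convert fractional hours (< 24) to a time, rounding to whole seconds first."""
    total_seconds = round(t * 3600)
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return dt.time(hours, minutes, seconds)

print(question_formula(16.7))  # 16:41:59
print(hours_to_time(16.7))     # 16:42:00
```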
We can make use of matplotlib's built-in formatters and locators to:
- set the major xticks to a regular interval (minutes, hours, days, etc.)
- customize the display using a strftime formatting string
It appears that if a formatter is not specified, the default is to display the year; which is what you were seeing.
Try this out:
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MinuteLocator
x = [16.7,16.8,17.1,17.4]
y = [15,17,14,16]
e = [0.8,1.2,1.1,0.9]
xtn = []
for t in x:
    # round to whole minutes first; int() truncation would turn 16.7 into 16:41
    h, m = divmod(round(t * 60), 60)
    xtn.append(dt.datetime.combine(dt.date(2012,1,1), dt.time(h,m)))
def larger_alim( alim ):
    ''' simple utility function to expand axis limits a bit '''
    amin,amax = alim
    arng = amax-amin
    nmin = amin - 0.1 * arng
    nmax = amax + 0.1 * arng
    return nmin,nmax
plt.errorbar(xtn,y,yerr=e,fmt='+',elinewidth=2,capsize=0,color='k',ecolor='k')
plt.gca().xaxis.set_major_locator( MinuteLocator(byminute=range(0,60,10)) )
plt.gca().xaxis.set_major_formatter( DateFormatter('%H:%M:%S') )
plt.gca().set_xlim( larger_alim( plt.gca().get_xlim() ) )
plt.show()
Result:
FWIW the utility function larger_alim was originally written for this other question: Is there a way to tell matplotlib to loosen the zoom on the plotted data?
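If you would rather keep the hand-picked tick positions from the question than use a MinuteLocator, a sketch that pins a DateFormatter alongside them, and saves before showing, could look like this. It plots datetime objects directly, renders to an in-memory PNG (io.BytesIO) so it runs without a display or a writable directory, and reads the tick label text back; at is a hypothetical helper name.

```python
import datetime as dt
import io
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

day = dt.date(2012, 1, 1)

def at(h, m):
    """Build a datetime on the fixed day (hypothetical helper)."""
    return dt.datetime.combine(day, dt.time(h, m))

xtn = [at(16, 42), at(16, 48), at(17, 6), at(17, 24)]  # 16.7, 16.8, 17.1, 17.4 hours
y = [15, 17, 14, 16]
e = [0.8, 1.2, 1.1, 0.9]

fig, ax = plt.subplots()
ax.errorbar(xtn, y, yerr=e, fmt='+', elinewidth=2, capsize=0, color='k', ecolor='k')
ax.set_xlim(at(16, 35), at(17, 35))
# Fixed tick positions as in the question ...
ax.set_xticks([at(16, 40), at(16, 50), at(17, 0), at(17, 10), at(17, 20), at(17, 30)])
# ... plus an explicit formatter, so the labels no longer depend on the default one
ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))

buf = io.BytesIO()
fig.savefig(buf, format='png')  # save before plt.show(); use 'savefile.png' in a script
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```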
I have a series of lines that each need to be plotted with a separate colour. Each line is actually made up of several data sets (positive, negative regions etc.) and so I'd like to be able to create a generator that will feed one colour at a time across a spectrum, for example the gist_rainbow map shown here.
I have found that the following works, but it seems very complicated and, more importantly, difficult to remember:
from pylab import *
NUM_COLORS = 22
mp = cm.datad['gist_rainbow']
get_color = matplotlib.colors.LinearSegmentedColormap.from_list(mp, colors=['r', 'b'], N=NUM_COLORS)
...
# Then in a for loop
this_color = get_color(float(i)/NUM_COLORS)
Moreover, it does not cover the range of colours in the gist_rainbow map; I have to redefine a map.
Maybe a generator is not the best way to do this, if so what is the accepted way?
To index colors from a specific colormap you can use:
import matplotlib.pyplot as plt
NUM_COLORS = 22
cm = plt.get_cmap('gist_rainbow')
for i in range(NUM_COLORS):
    color = cm(1.*i/NUM_COLORS)  # color will now be an RGBA tuple
# or if you really want a generator:
cgen = (cm(1.*i/NUM_COLORS) for i in range(NUM_COLORS))
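If a reusable generator is still wanted, here is a small sketch that wraps the same colormap indexing in a generator function. color_generator is a hypothetical helper name; it yields n colours evenly spaced across the named colormap, and the plotting loop reuses one colour for several data sets per line (two made-up curves standing in for the positive and negative regions).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

def color_generator(cmap_name, n):
    """Yield n RGBA tuples evenly spaced across the named colormap."""
    cmap = plt.get_cmap(cmap_name)
    for i in range(n):
        yield cmap(i / n)

x = np.linspace(0, 1, 20)
fig, ax = plt.subplots()
for k, color in enumerate(color_generator("gist_rainbow", 5)):
    # Both data sets belonging to line k share the same colour
    ax.plot(x, k + x**2, color=color)
    ax.plot(x, -(k + x**2), color=color, linestyle="--")
```

Because the generator is created fresh for each plot, it cannot run out halfway through a figure the way a shared module-level generator could.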