Plot a graph relating to the Granger Causality Test - python

I have used the code below to run a Granger causality test on a data frame that I have. The code runs fine and returns the results I would expect; however, I was wondering whether it is possible to plot the data in a graph using python, showing the causality.
Something similar to this:
I have tried using the code below and have been successful in returning data.
print(grangercausalitytests(df[['Number_of_Ethereum_Searches', 'Price_in_USD']], maxlag=1, addconst=True, verbose=True))

If you are trying to inspect this visually, then you should use a cross-correlation plot. This illustrates the strength of the correlation between two time series at each lag.
Let's illustrate this with an example. Consider the following two variables:
Sunlight hours
Maximum temperature
Ever notice how July/August are the hottest months in the Northern Hemisphere, while the longest day is June 21? This is due to a time lag: the effects of maximum sunlight do not produce a maximum temperature until a month or so later.
If one were to plot a cross-correlation function to describe this, here is what it would look like (code included).
# Import libraries
import os
import numpy as np
import statsmodels.tsa.stattools as ts
import matplotlib.pyplot as plt

# Set working directory
path = "directory"
os.chdir(path)

# Load the two series (one column each) from the CSV
dataset = np.loadtxt("dataset.csv", delimiter=",")
x = dataset[:, 1]
y = dataset[:, 0]

# Cross-correlation plot with up to a year of lags in each direction
plt.xcorr(x, y, normed=True, usevlines=True, maxlags=365)
plt.title("Sunlight Hours versus Maximum Temperature")
plt.show()
Cross-Correlation Diagram
The ACF (autocorrelation) and PACF (partial autocorrelation) plots for these can also be plotted.
# Autocorrelation and partial autocorrelation for each series
acfx = ts.acf(x)
plt.plot(acfx)
plt.title("Autocorrelation Function (x)")
plt.show()

pacfx = ts.pacf(x)
plt.plot(pacfx)
plt.title("Partial Autocorrelation Function (x)")
plt.show()

acfy = ts.acf(y)
plt.plot(acfy)
plt.title("Autocorrelation Function (y)")
plt.show()

pacfy = ts.pacf(y)
plt.plot(pacfy)
plt.title("Partial Autocorrelation Function (y)")
plt.show()
Autocorrelation and Partial Autocorrelation plots (Maximum temperature)
Autocorrelation and Partial Autocorrelation plots (Sunlight hours)
Notice how the strength of the correlations for sunlight hours persists for longer than for maximum temperature, which implies that the effects of long sunlight hours persist in influencing temperature (i.e. one is Granger-causing the other).
Hope the above example is helpful. I would suggest looking at both cross-correlations and autocorrelations in order to get a better overview of the nature of Granger causality in your data.
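As a concrete starting point for your own data, here is a minimal sketch using synthetic stand-in series (I don't have your CSV, so the data below is made up; the column names are taken from your question, and detrend=mlab.detrend_mean removes the means so the bars show correlation rather than raw products):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import mlab

# Synthetic stand-in for the asker's dataframe: price loosely
# follows searches with a 7-day delay. Replace with your real df.
rng = np.random.default_rng(0)
searches = rng.normal(100, 10, 365)
price = np.roll(searches, 7) + rng.normal(0, 5, 365)
df = pd.DataFrame({'Number_of_Ethereum_Searches': searches,
                   'Price_in_USD': price})

plt.xcorr(df['Number_of_Ethereum_Searches'], df['Price_in_USD'],
          detrend=mlab.detrend_mean,  # correlate deviations from the mean
          normed=True, usevlines=True, maxlags=30)
plt.title("Cross-correlation: searches vs. price")
plt.show()
A spike at a nonzero lag suggests that one series leads the other by that many periods.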

Related

Cartopy/Matplotlib savefig slow for LCC projections

We're in the process of migrating from Basemap to Cartopy and finding the savefig functionality to be far too slow for our app, specifically when using the LambertConformal CRS. I ran a quick check using the code snippet below and found that it takes about 10x as long to save using the LambertConformal projection vs PlateCarree. The plots within our app include numerous data layers, including colormaps and contours, but I found that savefig is consistently much slower when using the LCC CRS. I saw that others suggested setting PYPROJ_GLOBAL_CONTEXT=ON, but since our app is multithreaded that isn't an option for us. We can adopt the PlateCarree CRS as our default if needed, but I thought I would check to see whether other users in the community have had similar issues.
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import time
extent = [-10, 10, 50, 60]
lcc_projection = ccrs.LambertConformal(
    central_latitude=55,
    central_longitude=0,
)
pc_projection = ccrs.PlateCarree()
pc_plot = plt.axes(projection=pc_projection, facecolor="dimgrey")
pc_plot.set_extent(extent, crs=pc_projection)
pc_plot.add_feature(cfeature.LAND)
t = time.time()
plt.savefig('map_pc.png')
print(f"pc savefig time : {time.time() - t}") # Takes ~3s
plt.close()
lcc_plot = plt.axes(projection=lcc_projection, facecolor="dimgrey")
lcc_plot.set_extent(extent, crs=pc_projection)
lcc_plot.add_feature(cfeature.LAND)
t = time.time()
plt.savefig('map_lcc.png')
print(f"lcc savefig time : {time.time() - t}") # Takes ~30s
plt.close()
Resulting images:
PC plot
LCC Plot

Transform an exponential distribution into a normal distribution

I have the following exponential distribution, generated with the following code:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
tags_ratio = np.random.exponential(1/25, 1000)

plt.hist(tags_ratio, range=(0, 1), bins=100)
plt.show()
I'm trying to transform my data, which resides in tags_ratio, into a normal distribution, but with no success.
I tried the log and square functions; they gave decent results, but I'm interested in more ideas, maybe more sophisticated ones.
You can try to see if this helps:
from scipy.stats import boxcox

# Box-Cox transform with a fixed lambda of 0.3
tags_ratio = boxcox(tags_ratio, 0.3)
plt.hist(tags_ratio)
plt.show()
result:
For more explanation and theory about Box-Cox, click here.
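If you would rather not hand-pick the exponent: when the second argument is omitted, scipy's boxcox estimates lambda by maximum likelihood and returns it alongside the transformed data. A minimal sketch, reusing the data from the question:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boxcox

np.random.seed(1)
tags_ratio = np.random.exponential(1/25, 1000)  # same data as the question

# With lmbda omitted, boxcox returns the transformed data and the
# maximum-likelihood estimate of lambda (input must be strictly positive).
transformed, fitted_lambda = boxcox(tags_ratio)
print(f"fitted lambda: {fitted_lambda:.3f}")

plt.hist(transformed, bins=100)
plt.title("Box-Cox with ML-estimated lambda")
plt.show()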

HourLocator() on yaxis raising runtime error, unexpectedly exceeding Locator.MAXTICKS

I am trying to set up a simple figure with some horizontal lines.
import datetime
import matplotlib.pyplot as plt

plt.figure()
plt.axhline(datetime.time(12, 0, 0, 0), color='blue', ls='--', lw=3)
plt.axhline(datetime.time(18, 0, 0, 0), color='red', ls='--', lw=3)
This works fine and I get:
which is correct.
Then I would like my yticks to be on the rounded hourly values only.
I am trying to use HourLocator():
from matplotlib.dates import HourLocator, DateFormatter

plt.gca().yaxis.set_major_locator(HourLocator())  # this fails
plt.gca().yaxis.set_major_formatter(DateFormatter('%H:%M'))
However, this generates the following error:
RuntimeError: Locator attempting to generate 570241 ticks from 42120.0 to 65880.0: exceeds Locator.MAXTICKS
Why is it trying to generate 570241 ticks?
Note that matplotlib does not support datetime.time values. Admittedly, the fact that it seemingly works hides this a bit: the times appear to be coerced to plain floats (seconds since midnight, i.e. 43200 and 64800, padded by the axis margins to 42120–65880), while matplotlib's date locators interpret axis values as days. HourLocator therefore tries to place 24 ticks per unit, i.e. (65880 - 42120) × 24 + 1 = 570241 ticks.
So you first need to use datetime.datetime instead.
import datetime
import matplotlib.pyplot as plt
fig,ax=plt.subplots()
ax.axhline(datetime.datetime(2018,7,24,12,0,0,0),color='blue',ls='--',lw=3)
ax.axhline(datetime.datetime(2018,7,24,18,0,0,0),color='red',ls='--',lw=3)
ax.autoscale()
plt.show()
Now this already gives you hourly ticks (by coincidence). But you may of course use custom locations and formats now, adding
from matplotlib.dates import HourLocator, DateFormatter
plt.gca().yaxis.set_major_locator(HourLocator())
plt.gca().yaxis.set_major_formatter(DateFormatter('%H:%M'))
as in the question will give you the desired output.
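Putting the two pieces together, a complete runnable version of the fix:
import datetime
import matplotlib.pyplot as plt
from matplotlib.dates import HourLocator, DateFormatter

fig, ax = plt.subplots()
ax.axhline(datetime.datetime(2018, 7, 24, 12, 0, 0), color='blue', ls='--', lw=3)
ax.axhline(datetime.datetime(2018, 7, 24, 18, 0, 0), color='red', ls='--', lw=3)
ax.autoscale()

# Hourly ticks on the y-axis, formatted as HH:MM
ax.yaxis.set_major_locator(HourLocator())
ax.yaxis.set_major_formatter(DateFormatter('%H:%M'))
plt.show()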

Smoothing curves with no local extrema using numpy

I am trying to get a smooth curve for my data points. Say (lin_space, rms) are the ordered pairs that I need to plot. For the following code:
from scipy.interpolate import UnivariateSpline
import numpy as np
import matplotlib.pyplot as plt

spl = UnivariateSpline(lin_space, rms)
x = np.arange(0, 1001, 0.5)
plt.plot(lin_space, rms, 'k.')
plt.plot(lin_space, spl(lin_space), 'b-')
plt.plot(x, np.sqrt(x), 'r-')
After smoothing with UnivariateSpline I get the blue line, whereas I need my plot to look like the red one shown (with no local extrema).
You'll want a more limited class of models.
One option, for the data that you have shown, is to do least squares with a square-root function. That should produce good results.
A running average will be smooth(er), depending on how you weight the terms.
A Gaussian Process regression with an RBF + WhiteNoise kernel might be worth looking into, with appropriate a priori bounds on the length scale of the RBF kernel (see the sketch after the code below). OTOH, your residuals aren't normally distributed, so this model may not work as well for values toward the edges.
Note: if you specifically want a function with no local extrema, you need to select a class of models that has that property, e.g. fitting a square-root function.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import sklearn.linear_model

mpl.rcParams['figure.figsize'] = (18, 16)
WINDOW = 30

def ma(signal, window=30):
    # Trailing moving average: element j is the mean of signal[j:j+window]
    return sum([signal[i:-window+i] for i in range(window)]) / window

X = np.linspace(0, 1000, 1000)
Y = np.sqrt(X) + np.log(np.log(X + np.e)) * np.random.normal(0, 1, X.shape)

# Least squares on sqrt(X): a model class with no local extrema
sqrt_model_X = np.sqrt(X)
model = sklearn.linear_model.LinearRegression()
model.fit(sqrt_model_X.reshape((-1, 1)), Y.reshape((-1, 1)))

plt.scatter(X, Y, c='b', marker='.', s=5)          # raw data
plt.plot(X, np.sqrt(X), 'r-')                      # true underlying curve
plt.plot(X[WINDOW:], ma(Y, window=WINDOW), 'g-.')  # moving average
plt.plot(X, model.predict(sqrt_model_X.reshape((-1, 1))), 'k--')  # sqrt fit
plt.show()
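And a minimal sketch of the Gaussian-process option mentioned above, assuming scikit-learn's GaussianProcessRegressor; the kernel hyperparameters are illustrative guesses, and the training set is subsampled because GP fitting scales as O(n³):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Same synthetic data as above
X = np.linspace(0, 1000, 1000)
Y = np.sqrt(X) + np.log(np.log(X + np.e)) * np.random.normal(0, 1, X.shape)

# Long RBF length scale (bounded below to keep the fit smooth)
# plus white noise to absorb the residual scatter.
kernel = (RBF(length_scale=200.0, length_scale_bounds=(100.0, 1000.0))
          + WhiteKernel(noise_level=1.0))
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

idx = np.arange(0, len(X), 5)  # subsample every 5th point
gpr.fit(X[idx].reshape(-1, 1), Y[idx])

plt.scatter(X, Y, c='b', marker='.', s=5)
plt.plot(X, gpr.predict(X.reshape(-1, 1)), 'm-')
plt.show()
Unlike the square-root fit, the GP posterior mean is only smooth; it is not guaranteed to be free of local extrema.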

Plotting an IV-curve with python

So I am trying to plot an IV-curve using python, but all I'm getting is a straight, linear line. This may seem like trivial code, but I just started teaching myself literally a week ago, so bear with me as I am still learning :)
Here's my code:
import matplotlib.pyplot as plt
xdata = [20,27] # voltage data
ydata = [0.4,0.9] # current data
plt.plot(xdata, ydata)
plt.title(r'IV-curve')
plt.xlabel('Voltage(V)')
plt.ylabel('Current(I)')
plt.show()
A picture of it is shown here: http://imgur.com/a/lxPPo
You can probably try something like the snippet below (change the equation that sets y to suit your requirements). Note that with only two data points, plot can only draw a straight line; to see a curve you need to evaluate an equation over a dense range of voltages.
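The original answer's code block appears to be missing; here is a minimal sketch of the idea, using the Shockley diode equation as a hypothetical example relationship (the constants I_s, n and V_T are illustrative, not from the original answer):
import numpy as np
import matplotlib.pyplot as plt

# Dense voltage range instead of two points
voltage = np.linspace(0, 30, 200)

# Hypothetical example equation: Shockley diode model.
# I_s (saturation current), n (ideality factor) and V_T (thermal
# voltage) are placeholder values; substitute your own relationship.
I_s, n, V_T = 1e-9, 2.0, 0.9
current = I_s * (np.exp(voltage / (n * V_T)) - 1)

plt.plot(voltage, current)
plt.title(r'IV-curve')
plt.xlabel('Voltage (V)')
plt.ylabel('Current (A)')
plt.show()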
