Visualizing time series in spirals using R or Python?

Does anyone know how to do this in R? That is, represent cyclical data as a time spiral, as in this paper, rather than as a standard linear plot?
http://cs.lnu.se/isovis/courses/spring07/dac751/papers/TimeSpiralsInfoVis2001.pdf
Here is some example data.
Day = c(rep(1,5),rep(2,5),rep(3,5))
Hour = rep(1:5,3)
Sunlight = c(0,1,2,3,0,1,2,3,2,1,0,0,4,2,1)
data = cbind(Day,Hour,Sunlight)

This seems pretty close:
# sample data - hourly for 10 days; daylight from roughly 6:00am to 6:00pm
set.seed(1) # for reproducibility
Day <- c(rep(1:10,each=24))
Hour <- rep(1:24, times = 10) # one value per hour for each of the 10 days
data <- data.frame(Day,Hour)
data$Sunlight <- with(data,-10*cos(2*pi*(Hour-1+abs(rnorm(240)))/24))
data$Sunlight[data$Sunlight<0] <- 0
library(ggplot2)
ggplot(data, aes(x = Hour, y = 10 + 24*Day + Hour - 1)) +
  geom_tile(aes(color = Sunlight), size = 2) +
  scale_color_gradient(low = "black", high = "yellow") +
  ylim(0, 250) + labs(y = "", x = "") +
  coord_polar(theta = "x") +
  theme(panel.background = element_rect(fill = "black"), panel.grid = element_blank(),
        axis.text.y = element_blank(), axis.text.x = element_text(color = "white"),
        axis.ticks.y = element_blank())

I know how to do this in Python; matplotlib's scatter plot works well for this sort of thing. Here's an example:
import matplotlib.pyplot as plt
import numpy as np
period = 0.5
f = np.arange(0, 100, 0.03)     # data range
z = np.sin(f)                   # the data itself (here, a sine wave)
a = f * np.sin(period * f)      # x coordinate along the spiral
b = f * np.cos(period * f)      # y coordinate along the spiral
fig, ax = plt.subplots()
ax.scatter(a, b, c=z, s=100, edgecolors='none')
plt.show()
You can change period to change the number of revolutions in the spiral. a and b plot the spiral whilst z contains the actual data (in this example, a sine wave).
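A minimal sketch of the same idea applied to day/hour data like the question's, using matplotlib's polar projection instead of manual trigonometry (the sunlight values here are made up for illustration):
import numpy as np
import matplotlib.pyplot as plt
# 10 days x 24 hours of synthetic sunlight values
days = np.repeat(np.arange(1, 11), 24)
hours = np.tile(np.arange(24), 10)
sunlight = np.clip(-10 * np.cos(2 * np.pi * hours / 24), 0, None)
theta = 2 * np.pi * hours / 24        # angle = time of day
r = days + hours / 24                 # radius grows smoothly through each day
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.scatter(theta, r, c=sunlight, cmap="viridis", s=20)
ax.set_yticklabels([])
plt.show()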

Related

Recover the time shift from numpy.correlate result in Python

This is not a duplicate question, since the other answers only explain how to plot the cross-correlation function and do not explain how to get the time difference from it.
Given a sine signal and a shifted version of it, we should be able to recover the time delay between them.
I have created a sine signal and shifted it by t_d = 0.05. The following is my code and its output:
import numpy as np
import matplotlib.pyplot as plt
fs = 1000
x = np.linspace(0, 1, fs)
f = 5
t_shift = 0.05
y = np.sin(2*np.pi*f*x)
y_shifted = np.sin(2*np.pi*f*(x-t_shift))
fig, ax = plt.subplots()
ax.plot(x, y, x, y_shifted)
plt.show()
By normalizing signals and applying numpy.correlate we get the following:
y_norm = (y-y.mean())/y.std()
y_shifted_norm = (y_shifted - y_shifted.mean())/y_shifted.std()
cc = np.correlate(y_norm, y_shifted_norm, 'full')
fig, ax = plt.subplots()
ax.plot(range(len(cc)), cc)
plt.show()
Question
From the indices of cross-correlation function, how can I get t_shift=0.05?
@Sepide: It seems to me that you are trying to maximise the correlation between the signal y and a shifted version y_shifted. This can be done with np.correlate(), but recovering the time shift from its output is not entirely trivial. In the solution below I manually shift the time series and compute the correlation coefficient using np.corrcoef for each candidate shift; as soon as this Pearson correlation coefficient equals 1, the two signals are aligned.
import numpy as np
import matplotlib.pyplot as plt
# Setting
fs = 1000
x = np.linspace(0, 1, fs)
f = 5
t_shift = 0.05
t_step = 1/fs
# Data
y = np.sin(2*np.pi*f*x)
y_shifted = np.sin(2*np.pi*f*(x-t_shift))
# Compute correlation
MaxTimeShift = 200
CorrelationList = np.empty((MaxTimeShift, 1))
CorrelationList[:] = np.nan
# Compute correlation for various shifts (first 801 samples against a sliding window)
for i in range(MaxTimeShift):
    CorrelationList[i] = np.corrcoef(y[0:801], y_shifted[i:(801 + i)])[0, 1]
# Plot 1
plt.figure(1)
plt.plot(x, y, x, y_shifted)
plt.show()
# Plot 2
plt.figure(2)
ShiftList = t_step*np.arange(MaxTimeShift)
plt.plot(ShiftList, CorrelationList)
plt.title("Correlation coefficient")
plt.show()
print("The time shift between the signals is: ", ShiftList[np.argmax(CorrelationList)])

Frequency of a power spectrum in Python

I want to plot a power spectrum from my data set (array of about 2000 values, the data is recorded every minute).
I've gotten so far as:
y= np.fft.fft(data)
abs = np.abs(y) #absolute value
p = np.square(abs) #power
but am confused about setting the frequency.
I've tried using freqs = np.fft.fftfreq(len(y)), but when I plot the result against those frequencies it doesn't look right.
What am I doing wrong?
Here is an example to plot the power spectrum:
import matplotlib.pyplot as plt
import numpy as np
t = np.linspace(0, 2, 2000)   # 2 seconds sampled at ~1 kHz, so 60 Hz and 42 Hz are well below Nyquist
data = 2 * np.sin(2*np.pi *60*t) + 2 * np.sin(2*np.pi *42*t)
spectrum = np.fft.fft(data)
power_spectrum = np.square(np.abs(spectrum))
fig, ax = plt.subplots()
ax.plot(np.arange(len(power_spectrum)), power_spectrum)
plt.show()
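To answer the original question about the frequency axis: np.fft.rfftfreq (or np.fft.fftfreq) gives one frequency per FFT bin when you pass it the sample spacing. A sketch assuming a stand-in for the asker's series of ~2000 values recorded once per minute:
import numpy as np
import matplotlib.pyplot as plt
n = 2000
dt = 60.0                                  # sample spacing in seconds (one value per minute)
t = np.arange(n) * dt
data = np.sin(2 * np.pi * t / 3600) + 0.3 * np.random.randn(n)   # ~1 cycle per hour plus noise
spectrum = np.fft.rfft(data)               # one-sided FFT of a real-valued signal
power = np.abs(spectrum) ** 2
freqs = np.fft.rfftfreq(n, d=dt)           # matching frequency axis in Hz
plt.plot(freqs, power)
plt.xlabel("frequency [Hz]")
plt.ylabel("power")
plt.show()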

matplotlib plot circular daily-cycle diagram (daily polar plot)

I am trying to plot the daily cycle of a sampling strategy as a kind of rose diagram / polar plot.
I want to show each of our experiment samples where the distance from the center is the day, and the angle from 0 represents the time at which that sample was collected. Ideally, I would like to be able to colour the points by different variables.
The ideal plot should look something like below:
Simulate dummy data to explain the problem
I have the data in an xarray format. We have a launchtime dimension that contains the time at which a sample was taken, and we want to use this to plot when in the day, and then each of the days in turn.
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas.tseries.offsets import DateOffset
import matplotlib.dates as mdates
import itertools
value = np.random.normal(size=100)
expected_time = pd.date_range("2000-01-01", freq="180min", periods=100)
# add random offset to simulate being +/- the true expected release time
time_deltas = np.array([DateOffset(minutes=max(0, min(int(i), 59))) for i in np.abs(np.random.normal(0, 10, size=100))])  # minutes= adds an offset of that many minutes
time = [expected_time[i] + time_deltas[i] if (i % 2 == 0) else expected_time[i] - time_deltas[i] for i in range(100)]
df = pd.DataFrame({"launchtime": time, "value": value})
ds = df.set_index("launchtime").to_xarray()
ds = ds.assign_coords(expected_time=("launchtime", expected_time))
My thinking so far
def time_to_angle(dt: pd.Timestamp) -> float:
    SEC_IN_DAY = 86_400
    start_of_day = dt.normalize()  # midnight of the same day
    delta = dt - start_of_day
    n_seconds = delta.seconds
    # return angle in degrees
    return (n_seconds / SEC_IN_DAY) * 360
# angle from 0 degrees
angles = [time_to_angle(pd.to_datetime(dt)) for dt in ds.launchtime.values]
# how far along the radius
days = np.arange(np.unique(ds["launchtime.day"].values).size)
# how to plot in polar coordinates? Do I have to draw an x,y grid and plot as a scatter?
Any advice on how to go about addressing this problem would be super appreciated!
With a bit of secondary-school trigonometry this can be achieved quite simply as a scatter plot:
consider the time of day as the angle, calculated in radians
consider the day (how old the sample is) as the radius
it then reduces to simple use of sin() and cos() to calculate the x and y coordinates
For good measure I decided to colour the points too. I'm not entirely happy with a clock face showing 24 hours, since analogue clocks only show 12 hours.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
fig, ax = plt.subplots(1,1, figsize=(4, 4))
df = pd.DataFrame({"sampledate":pd.date_range("01-apr-2021", "09-apr-2021 23:59", freq="30s")})
# drop a bit of data so it's not perfect circles...
df = df.loc[np.random.choice(df.index, int(len(df)/200) )]
df["angle"] = ((df["sampledate"] - df["sampledate"].dt.floor("D")).dt.total_seconds() / (24*60*60)) * 2*math.pi
df["radius"] = (df["sampledate"] - df["sampledate"].min()).dt.days+1
# scatter, radius is how old sample is, angle is time of day
ax.scatter(x=df["angle"].apply(math.sin) * df["radius"], y=df["angle"].apply(math.cos) * df["radius"],
c=np.where(df.sampledate.dt.hour.le(12), "red", "pink"), s=10)
ax.axis("off")
# draw markers on clock face...
for h in range(0, 24, 3):
    a = (h / 24) * 2 * math.pi
    x = math.sin(a) * df["radius"].max()
    y = math.cos(a) * df["radius"].max()
    ax.annotate(h, xy=(x, y), xytext=(x * 1.1, y * 1.1), backgroundcolor="yellow")
We can pass projection="polar" when creating the axes (ax = plt.subplot(projection='polar')) in order to create a plot with:
the angle showing the time of day (theta, defined in radians)
the radius showing number of days since the first sample (r).
We first need to do some pre-processing to turn the time of day into an angle and the sample dates into day numbers:
from sklearn.preprocessing import LabelEncoder
def time_to_angle(dt: pd.Timestamp) -> float:
    SEC_IN_DAY = 86_400
    start_of_day = dt.normalize()  # midnight of the same day
    delta = dt - start_of_day
    n_seconds = delta.seconds
    # return angle in radians
    return (n_seconds / SEC_IN_DAY) * 2 * np.pi
# angle in radians from midnight
angles = [time_to_angle(pd.to_datetime(dt)) for dt in ds.launchtime.values]
# how far along the radius = DAY NUM
le = LabelEncoder()
days = le.fit_transform(ds["launchtime.dayofyear"].values)
# Use the polar projection plot from matplotlib
ax = plt.subplot(projection='polar')
ax.set_theta_direction(-1)
ax.set_theta_zero_location("N")
ax.scatter(angles, days, marker="o")
ax.set_xticklabels(['00:00', '03:00', '06:00', '09:00', '12:00', '15:00', '18:00', '21:00',])
ax.set_yticklabels([])
ax.set_title("Sampling Schedule [UTC]")
plt.show()
The output looks something like this:

How to stack multiple histograms in a single figure in Python?

I have a numpy array with shape [30, 10000], where the first axis is the time step and the second contains the values observed for a series of 10000 variables. I would like to visualize the data in a single figure, similar to the ridgeline ("joyplot") example in the seaborn tutorial. Basically, what I would like is to draw a histogram of 30/40 bins for each of the 30 temporal steps, and then somehow concatenate these histograms so that they share a common axis and appear in the same figure.
My data look like a Gaussian that moves and gets wider in time. You can reproduce something similar using the following code:
import numpy as np
mean = 0.0
std = 1.0
data = []
for t in range(30):
    mean = mean + 0.01
    std = std + 0.1
    data.append(np.random.normal(loc=mean, scale=std, size=[10000]))
data = np.array(data)
A figure similar to the picture shown above would be best, but any help is appreciated!
Thank you,
G.
Use a 2-D histogram image? You could do this with np.histogram2d, but building it row by row is a little clearer...
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(30, 10000)
H = np.zeros((30, 40))
bins = np.linspace(-3, 3, 41)
for i in range(30):
    H[i, :], _ = np.histogram(data[i, :], bins)
fig, ax = plt.subplots()
times = np.arange(31) * 0.1   # bin edges along the time axis (one more than the 30 rows of H)
pc = ax.pcolormesh(bins, times, H)
ax.set_xlabel('data bins')
ax.set_ylabel('time [s]')
fig.colorbar(pc, label='count')
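If something closer to the ridgeline figure from the question is wanted, the per-step histograms can also be drawn as offset curves; a sketch assuming the same kind of drifting, widening Gaussians as the question's synthetic data:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(0)
data = np.array([rng.normal(loc=0.01 * t, scale=1.0 + 0.1 * t, size=10000) for t in range(30)])
bins = np.linspace(-10, 10, 41)
centers = 0.5 * (bins[:-1] + bins[1:])
fig, ax = plt.subplots(figsize=(6, 8))
for t in range(30):
    counts, _ = np.histogram(data[t], bins, density=True)
    # offset each (rescaled) density curve by its time step to mimic a ridgeline
    ax.fill_between(centers, t, t + counts / counts.max() * 0.9, alpha=0.7)
ax.set_xlabel("value")
ax.set_ylabel("time step")
plt.show()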

How to make a line plot readable when using a DataFrame with a large number of rows

I have a 1,000,000 x 2 DataFrame object consisting of data I'm trying to understand visually. It's basically a simulation of 1,000,000 events where a packet traveling along a network is either queued or dropped depending on the buffer's size. So, the two column values are Packets in Queue and Packets Dropped.
I'm trying to make a line plot using Python, Matplotlib and Jupyter Notebooks that has the ID of the event on the x-axis and the number of packets in the queue at a specific ID point on the y-axis. There should be two lines, the first representing the number of packets in the queue and the second representing the number of packets dropped. However, given that there are over 1,000,000 simulations, the graph isn't intelligible. The values are too squished together. Is it possible to make a readable graph with 1,000,000 event instances or do I need to dramatically trim the number of events?
With a million data points it will require a lot of effort and zooming in to see them in fine detail. Plotly has some nice tools for zooming in and out of plots as well as sliding your data window along the x-axis.
If you're okay with some averaging, you can plot a moving average and get close to a hundred thousand points. You can stack two subplots on each other to see both columns of data in reasonable detail. You can of course average them more, but you lose the ability to see fine details.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def moving_avg(x, N=30):
    return np.convolve(x, np.ones((N,))/N, mode='valid')
plt.figure(figsize = (16,12))
plt.subplot(3,1,1)
x = np.random.random(1000)
plt.plot(x, linewidth = 1, alpha = 0.5, label = 'linewidth = 1')
plt.plot(moving_avg(x, 10), 'C0', label = 'moving average, N = 10')
plt.xlim(0,len(x))
plt.legend(loc=2)
plt.subplot(3,1,2)
x = np.random.random(10000)
plt.plot(x, linewidth = 0.2, alpha = 0.5, label = 'linewidth = 0.2')
plt.plot(moving_avg(x, 100), 'C0', label = 'moving average, N = 100')
plt.xlim(0,len(x))
plt.legend(loc=2)
plt.subplot(3,1,3)
x = np.random.random(100000)
plt.plot(x, linewidth = 0.05, alpha = 0.5, label = 'linewidth = 0.05')
plt.plot(moving_avg(x, 500), 'C0', label = 'moving average, N = 500')
plt.xlim(0,len(x))
plt.legend(loc=2)
plt.tight_layout()
Try a histogram:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame()
df['x'] = np.random.rand(1000000)
plt.hist(df.index, weights=df.x, bins=1000)
plt.show()
Method 2: line plots
df['x'] = np.random.rand(1000000)
df['y'] = np.random.rand(1000000)
w = 1000
v1 = df['x'].rolling(min_periods=1, window=w).sum()[[i*w for i in range(1, int(len(df)/w))]]/w
v2 = df['y'].rolling(min_periods=1, window=w).sum()[[i*w for i in range(1, int(len(df)/w))]]/w
plt.plot(np.arange(len(v1)),v1, c='b')
plt.plot(np.arange(len(v1)),v2, c='r')
plt.show()
We are calculating the mean of w = 1000 points, i.e. averaging w values together and plotting them.
This is what it looks like when the 1,000,000 points are bucketed into intervals of 1,000.
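A pandas groupby gives an equivalent, slightly tidier bucketing; a sketch with placeholder column names standing in for the queue/drop counts:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# stand-in for the 1,000,000 x 2 simulation frame (column names are made up)
df = pd.DataFrame({"queued": np.random.rand(1_000_000),
                   "dropped": np.random.rand(1_000_000)})
w = 1000                                    # events per bucket
binned = df.groupby(df.index // w).mean()   # one averaged row per bucket of w events
binned.plot(subplots=True, figsize=(12, 6))
plt.xlabel(f"event bucket (x{w} events)")
plt.show()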
