I have a numpy array with shape [30, 10000], where the first axis is the time step, and the second contains the values observed for a series of 10000 variables. I would like to visualize the data in a single figure, similar to this:
which you can find in the seaborn tutorial here. Basically, what I would like is to draw a histogram with 30-40 bins for each of the 30 time steps, and then somehow concatenate these histograms so that they share a common axis and can be plotted in the same figure.
My data looks like a Gaussian that shifts and widens over time. You can reproduce something similar using the following code:
import numpy as np

mean = 0.0
std = 1.0
data = []
for t in range(30):
    mean = mean + 0.01
    std = std + 0.1
    data.append(np.random.normal(loc=mean, scale=std, size=[10000]))
data = np.array(data)
A figure similar to the picture shown above would be best, but any help is appreciated!
Thank you,
G.
Use np.histogram? You could do this in one call with np.histogram2d, but this way is a little clearer...
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(30, 10000)
H = np.zeros((30, 40))
bins = np.linspace(-3, 3, 41)
for i in range(30):
    H[i, :], _ = np.histogram(data[i, :], bins)
fig, ax = plt.subplots()
times = np.arange(31) * 0.1  # bin edges along time: one more edge than rows in H
pc = ax.pcolormesh(bins, times, H)
ax.set_xlabel('data bins')
ax.set_ylabel('time [s]')
fig.colorbar(pc, label='count')
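For reference, here is a rough sketch of the np.histogram2d route mentioned above (same random data, same 0.1 s time spacing); it should produce the same counts in a single call, though treat it as an illustration rather than a drop-in replacement:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(30, 10000)
bins = np.linspace(-3, 3, 41)
times = np.arange(30) * 0.1

# Pair every sample with the time of its row, then bin the pairs in 2D
t_flat = np.repeat(times, data.shape[1])
H2, xedges, yedges = np.histogram2d(data.ravel(), t_flat, bins=[bins, 30])

fig, ax = plt.subplots()
pc = ax.pcolormesh(xedges, yedges, H2.T)
ax.set_xlabel('data bins')
ax.set_ylabel('time [s]')
fig.colorbar(pc, label='count')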
This is not a duplicate question since other answers only explain how to plot the cross-correlation function and do not explain how you can get the time difference.
Given a sine signal and a shifted version of it, we should be able to recover the time delay between them.
I have created a sine signal and shifted it by t_shift=0.05. The following is my code and its output:
import numpy as np
import matplotlib.pyplot as plt
fs = 1000
x = np.linspace(0, 1, fs)
f = 5
t_shift = 0.05
y = np.sin(2*np.pi*f*x)
y_shifted = np.sin(2*np.pi*f*(x-t_shift))
fig, ax = plt.subplots()
ax.plot(x, y, x, y_shifted)
plt.show()
By normalizing the signals and applying numpy.correlate, we get the following:
y_norm = (y-y.mean())/y.std()
y_shifted_norm = (y_shifted - y_shifted.mean())/y_shifted.std()
cc = np.correlate(y_norm, y_shifted_norm, 'full')
fig, ax = plt.subplots()
ax.plot(range(len(cc)), cc)
plt.show()
Question
From the indices of cross-correlation function, how can I get t_shift=0.05?
@Sepide. It seems to me as if you are trying to maximise the correlation between the signal y and the shifted version y_shifted. This can be attempted with np.correlate(), but recovering the time shift from its indices is not entirely trivial. In the solution below I manually shift the time series and compute the correlation coefficient using np.corrcoef. When this Pearson correlation coefficient reaches its maximum (equal to 1 for a perfect match), the two signals are aligned.
import numpy as np
import matplotlib.pyplot as plt
# Setting
fs = 1000
x = np.linspace(0, 1, fs)
f = 5
t_shift = 0.05
t_step = 1/fs
# Data
y = np.sin(2*np.pi*f*x)
y_shifted = np.sin(2*np.pi*f*(x-t_shift))
# Compute correlation
MaxTimeShift = 200
CorrelationList = np.empty((MaxTimeShift, 1))
CorrelationList[:] = np.nan
# Compute correlation for various shifts
for iter in range(MaxTimeShift):
    CorrelationList[iter] = np.corrcoef(y[0:801].T, y_shifted[iter:(801+iter)].T)[0, 1]
# Plot 1
plt.figure(1)
plt.plot(x, y, x, y_shifted)
plt.show()
# Plot 2
plt.figure(2)
ShiftList = t_step*np.arange(MaxTimeShift)
plt.plot(ShiftList, CorrelationList)
plt.title("Correlation coefficient")
plt.show()
print("The time shift between the signals is: ", ShiftList[np.argmax(CorrelationList)])
I want to plot a power spectrum from my data set (array of about 2000 values, the data is recorded every minute).
I've gotten as far as:
y= np.fft.fft(data)
abs = np.abs(y) #absolute value
p = np.square(abs) #power
but am confused about setting the frequency.
I've tried using freqs = np.fft.fftfreq(len(y)), but the plot I get from that can't be right.
What am I doing wrong?
Here is an example of how to plot the power spectrum against a proper frequency axis:
import matplotlib.pyplot as plt
import numpy as np
fs = 200                                    # sampling frequency in Hz
t = np.arange(0, 10, 1/fs)                  # 10 s of data
data = 2 * np.sin(2*np.pi*60*t) + 2 * np.sin(2*np.pi*42*t)
spectrum = np.fft.fft(data)
power_spectrum = np.square(np.abs(spectrum))
freqs = np.fft.fftfreq(len(data), d=1/fs)   # frequency of each FFT bin, in Hz
fig, ax = plt.subplots()
ax.plot(freqs, power_spectrum)
ax.set_xlabel('frequency [Hz]')
plt.show()
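For the data in the question, sampled once per minute, the frequency axis comes from the sample spacing passed to np.fft.fftfreq. A minimal sketch, assuming a 1D array named data holding the roughly 2000 uniformly spaced values (the random array below is only a placeholder):

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(2000)                 # placeholder for the real measurements

spectrum = np.fft.fft(data)
power = np.square(np.abs(spectrum))
freqs = np.fft.fftfreq(len(data), d=60.0)    # 60 s between samples -> frequencies in Hz
                                             # (use d=1.0 for cycles per minute instead)

fig, ax = plt.subplots()
half = len(freqs) // 2
ax.plot(freqs[:half], power[:half])          # positive frequencies only
ax.set_xlabel('frequency [Hz]')
plt.show()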
I have a 1,000,000 x 2 DataFrame object consisting of data I'm trying to understand visually. It's basically a simulation of 1,000,000 events where a packet traveling along a network is either queued or dropped depending on the buffer's size. So, the two column values are Packets in Queue and Packets Dropped.
I'm trying to make a line plot using Python, Matplotlib and Jupyter Notebooks that has the ID of the event on the x-axis and the number of packets in the queue at a specific ID point on the y-axis. There should be two lines, the first representing the number of packets in the queue and the second representing the number of packets dropped. However, given that there are over 1,000,000 simulations, the graph isn't intelligible. The values are too squished together. Is it possible to make a readable graph with 1,000,000 event instances or do I need to dramatically trim the number of events?
With a million data points it will require a lot of effort and zooming in to see them in fine detail. Plotly has some nice tools for zooming in and out of plots as well as sliding your data window along the x-axis.
If you're okay with some averaging, you can plot a moving average and get close to a hundred thousand points. You can stack two subplots on each other to see both columns of data in reasonable detail. You can of course average them more, but you lose the ability to see fine details.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def moving_avg(x, N=30):
    return np.convolve(x, np.ones((N,))/N, mode='valid')
plt.figure(figsize = (16,12))
plt.subplot(3,1,1)
x = np.random.random(1000)
plt.plot(x, linewidth = 1, alpha = 0.5, label = 'linewidth = 1')
plt.plot(moving_avg(x, 10), 'C0', label = 'moving average, N = 10')
plt.xlim(0,len(x))
plt.legend(loc=2)
plt.subplot(3,1,2)
x = np.random.random(10000)
plt.plot(x, linewidth = 0.2, alpha = 0.5, label = 'linewidth = 0.2')
plt.plot(moving_avg(x, 100), 'C0', label = 'moving average, N = 100')
plt.xlim(0,len(x))
plt.legend(loc=2)
plt.subplot(3,1,3)
x = np.random.random(100000)
plt.plot(x, linewidth = 0.05, alpha = 0.5, label = 'linewidth = 0.05')
plt.plot(moving_avg(x, 500), 'C0', label = 'moving average, N = 500')
plt.xlim(0,len(x))
plt.legend(loc=2)
plt.tight_layout()
Try a histogram:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['x'] = np.random.rand(1000000)
plt.hist(df.index, weights=df.x, bins=1000)
plt.show()
Method 2: line plots
df['x'] = np.random.rand(1000000)
df['y'] = np.random.rand(1000000)
w = 1000
v1 = df['x'].rolling(min_periods=1, window=w).sum()[[i*w for i in range(1, int(len(df)/w))]]/w
v2 = df['y'].rolling(min_periods=1, window=w).sum()[[i*w for i in range(1, int(len(df)/w))]]/w
plt.plot(np.arange(len(v1)),v1, c='b')
plt.plot(np.arange(len(v1)),v2, c='r')
plt.show()
We are calculating the mean of w=1000 points, i.e. averaging w values together and plotting them.
This is what it looks like when the 1,000,000 points are bucketed into intervals of 1,000.
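As a side note, an equivalent and arguably clearer way to do the same bucketing is a groupby on the integer-divided index; this is only a sketch and assumes the default RangeIndex used above:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': np.random.rand(1000000), 'y': np.random.rand(1000000)})
w = 1000
means = df.groupby(df.index // w).mean()   # one row per bucket of w consecutive events
plt.plot(means.index, means['x'], c='b')
plt.plot(means.index, means['y'], c='r')
plt.show()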
I want to examine the variance of a dataset by bootstrapping (resampling) the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy.random import randn

fig, ax = plt.subplots()
bins = np.arange(-5, 6, 0.5)
df = pd.DataFrame(randn(3000))
df.hist(ax=ax, bins=bins, alpha=0.7, density=True)
count_collection = []
for i in range(1, 100):
    temp_df = df.sample(frac=0.5, replace=True)
    temp_df.hist(ax=ax, bins=bins, alpha=0.25, density=True)
    count, division = np.histogram(temp_df, bins=bins)
    count_collection.append(count)
However, with such a plot it is hard to see the spread of the counts. Is it possible to plot the upper/lower limit of the histogram so it can be seen more clearly, maybe something like a boxplot for each bin?
Or just curves with upper/lower limits to indicate the range?
My main difficulty is extracting the max/min value for each bin (from count_collection).
UPDATE:
What would be a good way to plot the range?
count_collection = np.array(count_collection)
mx = np.max(count_collection,0)
mn = np.min(count_collection,0)
ax.plot(division[1:]-0.25, mx, '_', mew=1)
ax.plot(division[1:]-0.25, mn, '_', mew=1)
I find this still hard to read. Any suggestions?
To extract the max and min you may use the following:
count_collection = np.array(count_collection)
mx = np.max(count_collection,0)
mn = np.min(count_collection,0)
The first line just converts the list of 1D arrays into a 2D array, so that max and min can operate along the first axis.
edit:
Since the original plot was normalized, it is hard to compare it directly with the max and min counts obtained from half the sample size. But you can do something like this:
import numpy as np
from numpy.random import randn
import matplotlib.pyplot as plt
import pandas as pd
fig,ax = plt.subplots()
bins = np.arange(-5,6,0.5)
df = pd.DataFrame(randn(3000))
# df.hist(ax=ax, bins=bins, alpha=0.7, density=True)
histval, _ = np.histogram(df, bins=bins)
count_collection = []
for i in np.arange(1, 100):
    temp_df = df.sample(frac=0.5, replace=True)
    # temp_df.hist(ax=ax, bins=bins, alpha=0.25, density=True)
    count, division = np.histogram(temp_df, bins=bins)
    count_collection.append(count)
count_collection = np.array(count_collection)
mx = np.max(count_collection,0)
mn = np.min(count_collection,0)
plt.bar(bins[:-1], histval, 0.5)
plt.plot(bins[:-1] + 0.25, mx*2)
plt.plot(bins[:-1] + 0.25, mn*2)
The 2x factor is due to the 2x smaller sample size when calculating the max and min.
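As one presentation option (just a sketch continuing the code above, not part of the original answer), the bootstrap range can be drawn as a shaded band around the full-sample histogram with fill_between:

# Continues from the block above: bins, histval, mn and mx are reused
centers = bins[:-1] + 0.25
fig2, ax2 = plt.subplots()
ax2.bar(bins[:-1], histval, 0.5, align='edge', alpha=0.5, label='full sample')
ax2.fill_between(centers, mn*2, mx*2, color='C1', alpha=0.4, label='bootstrap range (rescaled)')
ax2.legend()
plt.show()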
Does anyone know how to do this in R? That is, represent this cyclical data from the left plot to the right plot?
http://cs.lnu.se/isovis/courses/spring07/dac751/papers/TimeSpiralsInfoVis2001.pdf
Here is some example data.
Day = c(rep(1,5),rep(2,5),rep(3,5))
Hour = rep(1:5,3)
Sunlight = c(0,1,2,3,0,1,2,3,2,1,0,0,4,2,1)
data = cbind(Day,Hour,Sunlight)
This seems pretty close:
# sample data - hourly for 10 days; daylight from roughly 6:00am to 6:00pm
set.seed(1) # for reproducibility
Day <- c(rep(1:10,each=24))
Hour <- rep(1:24)
data <- data.frame(Day,Hour)
data$Sunlight <- with(data,-10*cos(2*pi*(Hour-1+abs(rnorm(240)))/24))
data$Sunlight[data$Sunlight<0] <- 0
library(ggplot2)
ggplot(data,aes(x=Hour,y=10+24*Day+Hour-1))+
geom_tile(aes(color=Sunlight),size=2)+
scale_color_gradient(low="black",high="yellow")+
ylim(0,250)+ labs(y="",x="")+
coord_polar(theta="x")+
theme(panel.background=element_rect(fill="black"),panel.grid=element_blank(),
axis.text.y=element_blank(), axis.text.x=element_text(color="white"),
axis.ticks.y=element_blank())
I know how to do this in Python. I find the scatter plot from matplotlib good for this sort of thing. Here's an example:
import matplotlib.pyplot as plt
import numpy as np
period = 0.5
f = np.arange(0, 100, 0.03)  # data range
z = np.sin(f)                # data
a = f*np.sin(period*f)
b = f*np.cos(period*f)
fig, ax = plt.subplots()
ax.scatter(a, b, c=z, s=100, edgecolors='none')
plt.show()
You can change period to change the number of revolutions in the spiral. a and b plot the spiral whilst z contains the actual data (in this example, a sine wave).