Hello, Stack Overflow community. Long time lurker, first time posting.
Consider stacked area charts that are made up of the sum of individual curves. I am trying to create 'optimized' stacked area charts in Python that shift the individual curves along the x-axis, within given constraints, to find a minimum area, minimum slope, or minimum peak.
My x-axis data is time and my y-axis data is headcount for a given project. The goal is to find the minimum number of resources needed across concurrent projects by varying their start times.
Here's my starter code with example data (not much I know), but just to illustrate:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Project_1': [0, 25, 36, 14, 2],
        'Project_2': [2, 78, 45, 11, 1],
        'Project_3': [25, 16, 59, 8, 0]}
df = pd.DataFrame(data)
ax = df.plot.area()
plt.show()
I'd like to set two constraints for each project specifying when it needs to start (a minimum time and a maximum time). For example:
Project 1: Time 1 <= x1 < Time 2 (project 1 must start at or later than Time 1 and before Time 2)
Project 2: Time 3 <= x2 < Time 4
Project 3: x3 = x1 (i.e. project 3 must start at the same time as project 1)
Once those are set, the curves would be shifted (for loop?) within their given constraints to find the minimum total area, the minimum slope (i.e. closest to a flat line), or the minimum peak. Any of these optimization objectives is OK, and the ability to choose among them would be bonus points. There might also be an interesting solution in trying to minimize the total curve length(?).
The output would be:
- A graph like the one shown, or multiple graphs if there are multiple solutions (maybe the top 3 best)
- The starting times for x1, x2, x3, etc.
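To make the idea concrete, here is a minimal brute-force sketch of roughly what I have in mind. The horizon length, the constraint windows, and the minimum-peak objective are made-up placeholders for illustration, not real project data:

import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = {'Project_1': [0, 25, 36, 14, 2],
        'Project_2': [2, 78, 45, 11, 1],
        'Project_3': [25, 16, 59, 8, 0]}

horizon = 10                      # assumed total number of time steps to evaluate

def shifted(values, start, horizon):
    """Place the project's curve at time step `start` in a zero-padded array."""
    out = np.zeros(horizon)
    out[start:start + len(values)] = values
    return out

# assumed constraint windows for the start times (made up for illustration)
x1_range = range(0, 3)            # Time 1 <= x1 < Time 2
x2_range = range(2, 5)            # Time 3 <= x2 < Time 4

best = None
for x1, x2 in itertools.product(x1_range, x2_range):
    x3 = x1                       # project 3 must start with project 1
    starts = (x1, x2, x3)
    total = sum(shifted(v, s, horizon) for v, s in zip(data.values(), starts))
    peak = total.max()            # minimum-peak objective; swap in total.sum() for minimum area
    if best is None or peak < best[0]:
        best = (peak, starts)

print("best start times (x1, x2, x3):", best[1], "with peak headcount:", best[0])

shifted_df = pd.DataFrame({name: shifted(v, s, horizon)
                           for (name, v), s in zip(data.items(), best[1])})
shifted_df.plot.area()
plt.show()

This is just an exhaustive loop over integer start offsets, so it only works for small constraint windows; I suspect a proper optimizer would be needed for anything bigger.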
Any help, tips, etc on this would be appreciated! Thank you so much!
I've tried searching for iterating curves and stacked area charts with little success. I included some basic code for illustration. I'm a little stumped on this one... enough to try Stack Overflow for the first time!
I have a time series of rain intensity (in µm/s), which I resample to 1-minute intervals. The data already has a 1-minute time step, but there may be gaps due to quality checks or basic equipment failure. The resample ensures that I have a consistent, equidistant time series to loop over, which has been the fastest approach for me so far.
The problem is that in theory I can choose another time step for the calculation, say 5 minutes. I have found that this gives larger dimensions for a rainwater basin, which was odd to me. I figured out that it is because the sum of the resampled values systematically comes out higher, i.e. more precipitation -> a larger basin.
How is it that resample gives this odd result? Is it because the same original time steps can be counted in different resampled bins...?
File is uploaded here
import pandas
import numpy
from matplotlib import pyplot as plt

# 1-minute rain intensity series; the "time" column becomes the index
data1 = pandas.read_csv("rain_1min.txt", sep=";", parse_dates=["time"], index_col="time")

test = list(range(1, 121))   # resampling rules from 1 to 120 minutes
sums = []
for timestep in test:
    # mean intensity per bin; fillna(0.0) zeroes out bins that are entirely empty
    data_rs = data1["rain"].resample(f"{timestep}Min").mean().fillna(0.0)
    sums.append(numpy.nansum(data_rs))

fig, ax = plt.subplots(figsize=[8, 4], dpi=100)
ax.plot(test, sums)
ax.set_xlabel("Rule = x Min")
ax.set_ylabel("Sum of mean()")
plt.show()
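To illustrate what I mean, here is a tiny synthetic series with a gap. Assuming the basin calculation recovers rain depth as mean intensity times the bin length (that part is my assumption, it is not in the script above), the larger bins effectively fill the gap with the bin mean:

import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=10, freq="1Min")
rain = pd.Series([1.0, 1.0, np.nan, np.nan, np.nan,
                  1.0, 1.0, 1.0, np.nan, 1.0], index=idx)

for rule, minutes in (("1Min", 1), ("5Min", 5)):
    means = rain.resample(rule).mean()     # bins that are entirely NaN stay NaN
    depth = np.nansum(means) * minutes     # total depth if depth = mean intensity * bin length
    print(rule, "sum of means:", np.nansum(means), "-> depth:", depth)

With the 1-minute rule the missing minutes contribute nothing, while the 5-minute rule averages only the valid minutes and then represents the whole bin with that mean, so the implied total grows.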
I have data from a number of high-frequency data capture devices connected to generators on an electricity grid. These meters collect data in ~1 second "bursts" at ~1.25 ms intervals, i.e. fast enough to actually see the waveform. See the graphs below showing voltage and current for the three phases in different colours.
This timeseries has a changing fundamental frequency, ie the frequency of the electricity grid is changing over the length of the timeseries. I want to roll this (messy) waveform data up to summary statistics of frequency and phase angle for each phase, calculated/estimated every 20ms (approx once per cycle).
The simplest way that I can think of would be to just measure the gap between the zero crossings (y = 0) on each wave and use the offset between phases to calculate the phase angle. Is there a neat way to achieve this (i.e. a table of interpolated x values for which y = 0)?
However, the above may be quite noisy, and I was wondering if there is a more mathematically elegant way of estimating a changing frequency and phase angle with pandas/scipy etc. I know there are some sophisticated techniques available for periodic functions but I'm not familiar enough with them. Any suggestions would be appreciated :)
Here's a "toy" data set of the first few waves as a pandas Series:
import pandas as pd, datetime as dt
ds_waveform = pd.Series(
index = pd.date_range('2020-08-23 12:35:37.017625', '2020-08-23 12:35:37.142212890', periods=100),
data = [ -9982., -110097., -113600., -91812., -48691., -17532.,
24452., 75533., 103644., 110967., 114652., 92864.,
49697., 18402., -23309., -74481., -103047., -110461.,
-113964., -92130., -49373., -18351., 24042., 75033.,
103644., 111286., 115061., 81628., 61614., 19039.,
-34408., -62428., -103002., -110734., -114237., -92858.,
-49919., -19124., 23542., 74987., 103644., 111877.,
115379., 82720., 62251., 19949., -33953., -62382.,
-102820., -111053., -114555., -81941., -62564., -19579.,
34459., 62706., 103325., 111877., 115698., 83084.,
62888., 20949., -33362., -61791., -102547., -111053.,
-114919., -82805., -62882., -20261., 33777., 62479.,
103189., 112195., 116380., 83630., 63843., 21586.,
-32543., -61427., -102410., -111553., -115374., -83442.,
-63565., -21217., 33276., 62024., 103007., 112468.,
116471., 84631., 64707., 22405., -31952., -61108.,
-101955., -111780., -115647., -84261.])
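For the zero-pass idea, here is a rough linear-interpolation sketch using the toy series above (my own attempt, not a vetted method):

import numpy as np

# seconds since the first sample, as plain floats
t = (ds_waveform.index - ds_waveform.index[0]).total_seconds().to_numpy()
y = ds_waveform.to_numpy()

# indices i where y changes sign between sample i and i+1
sign = np.signbit(y)
crossings = np.nonzero(sign[:-1] != sign[1:])[0]

# linearly interpolate the time at which y passes through zero
t0, t1 = t[crossings], t[crossings + 1]
y0, y1 = y[crossings], y[crossings + 1]
t_zero = t0 - y0 * (t1 - t0) / (y1 - y0)

# every second crossing is in the same direction, so the gap is one full period
periods = t_zero[2:] - t_zero[:-2]
print("estimated frequency (Hz):", 1.0 / periods)

For something less noisy, I have also been wondering whether scipy.signal.hilbert (instantaneous phase from the analytic signal) would be the more elegant route, but I haven't tried it yet.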
I have time series data from a sensor, covering a period of 24 hours sampled every minute (so 1440 data points per day in total). I did an FFT on it to see what the dominant frequencies are, but what I got is a very noisy FFT with a strong peak at zero.
I have already subtracted the mean to remove the DC component at bin 0, but I still get a strong peak at zero. I'm not able to figure out what else could cause it or what I should try next to remove it.
The graph is very different from what I have usually seen online while learning about FFTs, in the sense that I'm not able to see dominant peaks the way they usually appear. Is my FFT wrong?
Attaching the code that I tried and images:
import numpy as np
from matplotlib import pyplot as plt
from scipy.fftpack import fft, fftfreq

# stand-in for the sensor data: 1440 one-minute samples
x = np.random.default_rng().uniform(29, 32, 1440)
x = x - x.mean()          # remove the DC component (bin 0)

yf = fft(x)
yf_abs = np.abs(yf)

# raw magnitude spectrum against bin index
plt.plot(yf_abs)
plt.show()

# frequencies in Hz (sample spacing = 60 s)
freqs = fftfreq(len(x), 60)
plt.plot(freqs, yf_abs)
plt.show()
Frequency vs amplitude
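Here is also a variant I was considering (my own sketch, not from a tutorial): detrend the data, apply a Hann window to reduce leakage around bin 0, and label the one-sided spectrum in cycles per day so daily harmonics are easier to read off:

import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import detrend, get_window
import matplotlib.pyplot as plt

x = np.random.default_rng(0).uniform(29, 32, 1440)  # stand-in data, as above
x = detrend(x)                        # removes the mean and any linear trend
w = get_window("hann", len(x))        # Hann window to reduce spectral leakage
spectrum = np.abs(rfft(x * w))

# sample spacing is 1 minute = 1/1440 day, so frequencies come out in cycles/day
freqs = rfftfreq(len(x), d=1 / 1440)

plt.plot(freqs, spectrum)
plt.xlabel("Frequency (cycles per day)")
plt.ylabel("|FFT|")
plt.show()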
Since I'm new to this, I'm not able to figure out where I'm wrong or interpret the results. Any help will be appreciated. Thanks! :)
I have read the following sentence:
Figure 3 depicts how the pressure develops during a touch event. It shows the mean over all button touches from all users. To account for the different hold times of the touch events, the time axis has been normalized before averaging the pressure values.
They have measured the touch pressure over touch events and made a plot. I think normalizing the time axis means scaling the time axis to 1 s, for example. But how can this be done? Let's say, for example, I have a measurement which spans 3.34 seconds (1000 timestamps and 1000 measurements). How can I normalize this measurement?
If you want to normalize your data you can do as you suggest and simply calculate:
z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}
where z_i is your i-th normalized time value and x_i is your i-th absolute time value.
An example using numpy:
import numpy
x = numpy.random.rand(10)                         # generate 10 random values
normalized = (x - x.min()) / (x.max() - x.min())  # rescale to the range [0, 1]
print(x, normalized)
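To connect this back to the question's 3.34 s example, here is a sketch (my own extension, not the quoted paper's exact procedure): normalize each event's timestamps to [0, 1] and interpolate onto a shared grid so events of different lengths can be averaged:

import numpy as np

def to_common_grid(timestamps, pressures, n_points=100):
    t = np.asarray(timestamps, dtype=float)
    t_norm = (t - t.min()) / (t.max() - t.min())   # time axis now spans [0, 1]
    grid = np.linspace(0.0, 1.0, n_points)
    return np.interp(grid, t_norm, pressures)      # pressure resampled onto the shared grid

# two fake touch events with different durations and sample counts
event_a = to_common_grid(np.linspace(0, 3.34, 1000), np.random.rand(1000))
event_b = to_common_grid(np.linspace(0, 1.80, 540), np.random.rand(540))
mean_pressure = np.mean([event_a, event_b], axis=0)   # average over events, point by point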