I have a dataset that looks as follows:
What I would like to do with this data is calculate how much time was spent in each state, per day. For example, if I wanted to know how long the unit was running today, I would like the total time the unit spent in each state: RUNNING: 45 minutes, NOT_RUNNING: 400 minutes, WARMING_UP: 10 minutes, etc.
I know how to summarize the column data on its own, but I'd like to use the timestamp I have available: subtract the first time the unit was on from the last time it was on to get the difference. I haven't had any luck searching for a solution, but I can't be the first to run into this, and I know it can be done somehow. I'm just looking to learn how. Anything helps, thanks!
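Since the dataset itself is not shown, here is only a minimal sketch of the idea, assuming each row is a (timestamp, state) pair and that a state lasts until the next row's timestamp. The function name minutes_per_state and the sample rows are made up for illustration. An interval that crosses midnight is credited entirely to the day it started on, which may or may not be what you want.

```python
from collections import defaultdict
from datetime import datetime

def minutes_per_state(events):
    """Sum minutes spent in each state, keyed by (day, state).

    events: list of (datetime, state) tuples sorted by time (assumed layout).
    Each state is assumed to last until the next event's timestamp.
    """
    totals = defaultdict(float)
    for (t0, state), (t1, _) in zip(events, events[1:]):
        totals[(t0.date(), state)] += (t1 - t0).total_seconds() / 60
    return dict(totals)

# Made-up sample rows; the last row has no following timestamp,
# so its duration is unknown and it is not counted.
events = [
    (datetime(2024, 1, 1, 8, 0), "NOT_RUNNING"),
    (datetime(2024, 1, 1, 8, 10), "WARMING_UP"),
    (datetime(2024, 1, 1, 8, 20), "RUNNING"),
    (datetime(2024, 1, 1, 9, 5), "NOT_RUNNING"),
]
totals = minutes_per_state(events)
```

The same "next timestamp minus this timestamp" step is what you would compute in a dataframe before grouping by day and state.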
Related
I'd like to keep the last 10 minutes of a time series in memory with some sort of deque system in Python.
Right now I'm using deque, but I may receive 100 data points in a few seconds and then nothing for a few seconds.
Any ideas?
I read something about FastRBTree in a post but it dated back to 2014. Is there any better solution now?
I am mostly interested in computing the standard deviation over a fixed period of time, so the less data I receive within that period, the lower the standard deviation will be.
If you are concerned about container size, the simplest thing might be to use the deque with a maxlen argument: once it overflows, the oldest entries are simply discarded. That obviously does not guarantee 10 minutes' worth of data, but it is an efficient data structure for this.
If you want to "trim by time in deque" then you probably need to create a custom class that holds each data point together with a timestamp, and then periodically check the oldest end of the deque and keep popping until the oldest remaining item is no older than the current time minus 10 minutes.
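A rough sketch of that "trim by time" idea, using only the standard library. The class name TimeWindow and its methods are made up for illustration, and the optional now argument exists only so the behaviour can be checked without waiting for real time to pass.

```python
import time
from collections import deque
from statistics import pstdev

class TimeWindow:
    """Keep only the values received in the last `seconds` seconds."""

    def __init__(self, seconds=600):
        self.seconds = seconds
        self.items = deque()  # (timestamp, value) pairs, oldest on the left

    def add(self, value, now=None):
        now = time.time() if now is None else now
        self.items.append((now, value))
        self.trim(now)

    def trim(self, now=None):
        # Pop from the old end until everything left is inside the window.
        now = time.time() if now is None else now
        cutoff = now - self.seconds
        while self.items and self.items[0][0] < cutoff:
            self.items.popleft()

    def stdev(self, now=None):
        # Population standard deviation of the values still in the window.
        self.trim(now)
        values = [v for _, v in self.items]
        return pstdev(values) if len(values) > 1 else 0.0

window = TimeWindow(seconds=600)
window.add(1.0, now=0)
window.add(3.0, now=300)
window.add(5.0, now=700)  # the value added at t=0 is now older than 600 s
```

Trimming on every add and before every read keeps the deque bounded by the arrival rate rather than by a fixed maxlen, so a burst of 100 points followed by silence is handled the same way.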
If things are happening more dynamically, you might use some kind of database structure to do this (not my area of expertise, but it seems a plausible path to pursue) and possibly re-ask a similar question, with some more details, as a database or sqlite question.
I started running this code 3 days ago, and I do see progress, but I'm not sure whether this set of params should take this long.
I had already halved the params from before because the earlier run took too long, and I hoped this set would run faster. Still, 3 days is really long, so I'm starting to question whether I'm doing this right or whether I'm missing something.
I got the code from the fbprophet documentation itself, so it may not be an issue with the code(?).
In case it matters, I'm predicting solar energy potential for the last month of 2015, based on 10 years' worth of data. The dataset is sourced from Kaggle.
I have a program that outputs flight time, but sometimes the flight is less than an hour, and in that case I don't want a 0 hour displayed.
I could use an if statement with one format that has hours and one that doesn't, but that doesn't seem efficient. It would also be nice not to display minutes when minutes is zero.
landed_time_msg = time.strftime("Apx. flt. time %-H Hours : %-M Mins. ",time.gmtime(self.landed_time))
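One possible approach, sketched under the assumption that self.landed_time is a duration in seconds: compute hours and minutes with divmod and join only the non-zero parts. The function name format_flight_time is made up for illustration. Note that the %-H and %-M directives are platform-specific, and that time.gmtime wraps around after 24 hours, which this version does not.

```python
def format_flight_time(seconds):
    """Build the message, leaving out zero hours and zero minutes."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes = remainder // 60
    parts = []
    if hours:
        parts.append(f"{hours} Hours")
    # Show minutes when they are non-zero, or when there is nothing else to show.
    if minutes or not hours:
        parts.append(f"{minutes} Mins.")
    return "Apx. flt. time " + " : ".join(parts)

short_flight = format_flight_time(45 * 60)
long_flight = format_flight_time(90 * 60)
```

This still uses if statements, but only one per field, so there is a single code path instead of a separate format string for every combination.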
I’m trying to think through a sort of extra credit project: optimizing our schedule.
Givens:
“demand” numbers that go down to the 1/2 hour. These tell us the ideal number of people we’d have on at any given time;
8 hour shift, plus an hour lunch break > 2 hours from the start and end of the shift (9 hours from start to finish);
Breaks: 2x 30 minute breaks in the middle of the shift;
For simplicity, can assume an employee would have the same schedule every day.
Desired result:
Dictionary or data frame with the best-case distribution of start times, breaks, lunches across an input number of employees such that the difference between staffed and demanded labor is minimized.
I have pretty basic python, so my first guess was to just come up with all of the possible shift permutations (points at which one could take breaks or lunches), and then ask python to select x (x=number of employees available) at random a lot of times, and then tell me which one best allocates the labor. That seems a bit cumbersome and silly, but my limitations are such that I can’t see beyond such a solution.
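For what it's worth, the brute-force idea described above can be sketched in a few lines. Everything here is an assumption made for illustration: time is cut into half-hour slots, lunch sits at least 2 hours (4 slots) from either end of the 9-hour shift, one 30-minute break falls before lunch and one after, and "best" means the smallest total absolute gap between staffed and demanded people. The names shift_patterns, mismatch and random_search are hypothetical. Problems like this are more commonly handled with integer programming, so treat this as a starting point rather than a recommended method.

```python
import random
from itertools import product

SHIFT_SLOTS = 18  # 9 hours from start to finish, in half-hour slots
LUNCH_SLOTS = 2   # 1 hour lunch
MIN_GAP = 4       # assumed: lunch at least 2 hours from either end of the shift

def shift_patterns(n_slots):
    """List every allowed shift as a tuple of 0/1 flags, one per slot."""
    patterns = []
    for start in range(n_slots - SHIFT_SLOTS + 1):
        for lunch in range(MIN_GAP, SHIFT_SLOTS - MIN_GAP - LUNCH_SLOTS + 1):
            before = range(1, lunch)                            # break before lunch
            after = range(lunch + LUNCH_SLOTS, SHIFT_SLOTS - 1)  # break after lunch
            for b1, b2 in product(before, after):
                on = [0] * n_slots
                for off in range(SHIFT_SLOTS):
                    at_lunch = lunch <= off < lunch + LUNCH_SLOTS
                    on[start + off] = 0 if at_lunch or off in (b1, b2) else 1
                patterns.append(tuple(on))
    return patterns

def mismatch(chosen, demand):
    """Total absolute difference between staffed and demanded people."""
    staffed = [sum(column) for column in zip(*chosen)]
    return sum(abs(s - d) for s, d in zip(staffed, demand))

def random_search(demand, n_employees, tries=2000, seed=0):
    """Pick n_employees patterns at random many times and keep the best draw."""
    rng = random.Random(seed)
    patterns = shift_patterns(len(demand))
    best, best_cost = None, float("inf")
    for _ in range(tries):
        chosen = rng.choices(patterns, k=n_employees)
        cost = mismatch(chosen, demand)
        if cost < best_cost:
            best, best_cost = chosen, cost
    return best, best_cost

# Made-up demand for a 12-hour day (24 half-hour slots).
demand = [1] * 4 + [3] * 16 + [1] * 4
best, cost = random_search(demand, n_employees=4, tries=500)
```

Each entry of best is one employee's day as a row of working (1) and non-working (0) slots, which converts directly into a dictionary or data frame.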
I have tried to look for libraries or tools that help with this, but the question here (how to distribute start times and breaks within a shift) doesn’t seem to be widely discussed. I’m open to hearing that this is several years off for me, but...
Appreciate anyone’s guidance!
I have two separate programs: one counts the daily view stats and the other calculates earnings based on those stats.
The Counter runs first, followed by the Earning Calculator a few seconds later.
Earning Calculator works by getting stats from counter table using date(created_at) > date(now()).
The problem I'm facing is this: say the Counter adds stats for 100 views at 23:59:59, and by the time the Earning Calculator runs it's already the next day.
Since I'm using date(created_at) > date(now()), I will miss the last 100 views added by the Counter.
One way to solve my problem is to summarise the previous daily report at 00:00:10 every day. But I do not like this.
Are there any other ways to solve this issue?
Thanks.
You have to attach a date to your data and use that date instead of now().
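A small sketch of what that could look like, using an in-memory SQLite database. The table name counter, the column stat_date, and the helper functions are all made up for illustration; the point is only that both programs are handed the same explicit date, so nothing depends on what now() returns when the second program happens to run.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (stat_date TEXT, views INTEGER)")

def record_views(stat_date, views):
    """Counter side: stamp each row with the day the stats belong to."""
    conn.execute(
        "INSERT INTO counter (stat_date, views) VALUES (?, ?)",
        (stat_date.isoformat(), views),
    )

def views_for(stat_date):
    """Earning Calculator side: select by the stored date, not by now()."""
    row = conn.execute(
        "SELECT COALESCE(SUM(views), 0) FROM counter WHERE stat_date = ?",
        (stat_date.isoformat(),),
    ).fetchone()
    return row[0]

# Both steps receive the same run date, even if the second one
# actually executes after midnight.
run_date = date(2024, 1, 31)
record_views(run_date, 100)
total = views_for(run_date)
```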