How do I detect and remove multiple substrings, which may vary?

How do I detect and remove multiple substrings, which may vary? - python

Every week, my school makes us type which assignments we will be doing for the week. I want to do this automatically with python.
Below is a example of what an input I might want the program to take.
Unit 1 presentation - Due in 2 days - 40 minutes spent
English Unit 2 Test - Due in 3 days - 1 hour spent
History Essay - Due in 7 days - 10 minutes spent
I want to get rid of everything other than the name of the assigments. How can I achieve this? I can't simply use the find() method since the substrings may vary.
Google hasn't been much help. I'm a beginner, so please bear with me.

I would try to take each line as line and process as
assignment_name = line.split("-")[0].strip()

Related

Python PuLP Optimization - How to improve performance?

I'm writing a script to help me schedule shifts for a team. I have the model and the script to solve it, but when I try to run it, it takes too long. I think it might be an issue with the size of the problem I'm trying to solve. Before getting into the details, here are some characteristics of the problem:
Team has 13 members.
Schedule shifts for a month: 18 operating hours/day, 7 days/week, for 4 weeks.
I want to schedule non-fixed shifts: members can be on shift during any of the 18 operating hours. They can even be on for a couple hours, off for another hour, and then on again, etc. This is subject to some constraints.
Sets of the problem:
m: each of the members
h: operating hours (0 to 17)
d: days of the week (0 to 6)
w: week of the month (0 to 4)
Variables:
X[h,h,d,w]: Binary. 1 if member m starts a shift on hour h,d,w. 0 otherwise.
Y[h,h,d,w]: Binary. 1 if member m is on shift on hour h,d,w. 0 otherwise.
W[m,d,w]: Binary. 1 if member m had any shift on day d,w. 0 otherwise.
G[m,w]: Binary. 1 if member m had the weekend off during week w. 0 otherwise.
The problem has 20 constraints, 13 which are "real constraints" of the problem and 7 which are relations between the variables, for the model to work.
When I run it, I get this message:
At line 2 NAME MODEL
At line 3 ROWS
At line 54964 COLUMNS
At line 295123 RHS
At line 350083 BOUNDS
At line 369067 ENDATA
Problem MODEL has 54959 rows, 18983 columns and 201097 elements
Coin0008I MODEL read with 0 errors
I left it running overnight and didn't even get a solution. Then I tried changing all variables to continuous variables and it took ~25 seconds to find the optimal solution.
I don't know if I'm having issues because of the size of the problem, or if it's because I'm using only binary variables. Or a combination of that.
Are there any general tips or best practices that could be used to improve performance on a model? Or is it always related to the specific model I'm trying to solve?

The long duration solve is almost certainly due to the number of binary variables in your model, and it may be degenerate, meaning that there are many solutions that produce the same value for the objective, so it is just thrashing around trying all of the (relatively equal) combinations.
The fact that it does solve when you relax the integer constraint is good and good troubleshooting. Assuming the model is constructed properly, here are a couple things to try:
Solve for 1 week at a time (4 separate solves). It isn't clear what the linkage is from week-to-week from your description, but that would reduce the model to 1/4 of its size.
Change your time-blocks to more than 1 hour. If you used 3 hour blocks, your problem would again reduce to 1/3 of its size by your description. You would only need to reduce the set H to {1,2,3,4,5,6} and then do the mental math to align that with the actual hours. Or you could do 2 hour blocks.
You should also tinker with the optimality gap. Sometimes the difference between 95% Optimal and 100% is days/weeks/years of solver time. Have you tried a .02, .05, 0.10, or 0.25 relative gap? You may still get the optimal answer, but you forgo the guarantee of optimality.

Employee Scheduling Constraints Issue

I am currently trying to develop an employee scheduling tool in order to reduce the daily workload. I am using pyomo for setting up the model but unfortunately stuck on one of the constraint setting.
Here is the simplified background:
4 shifts are available for assignation - RDO (Regular Day Off), M (Morning), D (Day) and N (Night). All of them are 8-hrs shift
Every employee will get 1 RDO per week and constant RDO is preferred (say staff A better to have Monday as day off constantly but this can be violate)
Same working shift (M / D / N) is preferred for every staff week by week (the constraint that I stuck on)
a. Example 1 (RDO at Monday): The shift of Tuesday to Sunday should be / have better to be the same
b. Example 2 (RDO at Thursday): The shift of Mon to Wed should be same as the last working day of prior week, while the shift of Fri to Sun this week also need to be same but not limit to be which shift
Since the RDO day (Mon - Sun) is different among employees, the constraint of point 3 also require to be changed people by people conditionally (say if RDO == "Mon" then do A ; else if RDO == "Tue" then do B), I have no idea how can it be reflected on the constraint as IF / ELSE statement cant really work on solver.
Appreciate if you can give me some hints or direction. Thanks very much!

The constraints you are trying to create could be moderately complicated, and are very dependent on how you set up the problem, how many time periods you look at in the model, etc. etc. and are probably beyond the scope of 1 answer. Are you taking an LP course in school? If so, you might want to bounce your framework off of your instructor for ideas.
That aside, you might want to tackle the ROD by assigning each person a cost table based on their preferences and then putting in a small penalty in the objective based on their "costs" to influence the solver to give them their "pick" -- assumes the "picks" are relatively well distributed and not everybody wants Friday off, etc.
You could probably do the same with the shifts, essentially making a parameter that is indexed by [employee, shift] with "costs" and using that in the obj in a creative way. This would be the easiest solution... others get into counting variables, big-M, etc.

You could in this case use a switch case. Use the ROD as input and the cases are the days of the week. In the cases you can than to the rest of the planing.
Here is a good reverence:
https://pythongeeks.org/switch-in-python/

Cummulative Time Spent in Specific States

I have a dataset that looks as follows:
What I would like to do with this data is calculate how much time was spent in specific states, per day. So say for example I wanted to know how long the unit was running today. I would just like to know the sum of the time the unit spent RUNNING: 45 minutes, NOT_RUNNING: 400 minutes, WARMING_UP: 10 minutes, etc.
I know how to summarize the column data on its own, but I'm looking to reference the time stamp I have available to subtract the first time it was on, from the last time it was on and get that measure of difference. I haven't had any luck searching for this solution, but there's no way I'm the first to come across this and know it can be done some how, just looking to learn how. Anything helps, Thanks!

fbprophet taking >3 days to run. Am I exploring too many params at one go?

I ran this code 3 days ago, and I do see progress. I'm not sure if these set of params should be taking this long?
I've halved the params from before, because I thought it took too long, so I hoped that this set of params would run faster, but 3 days is really long so I'm starting to question whether I'm doing this right or whether I'm missing something.
I got the code from the fbprophet documentation itself, so it may not be an issue with the code(?).
In case it matters, I'm predicting solar energy potential for the last month of 2015, based on 10 year's worth of data. Dataset is sourced from kaggle.

Is there a library or suggested tactic for shift planning with hours and breaks?

I’m trying to think through a sort of extra credit project- optimizing our schedule.
Givens:
“demand” numbers that go down to the 1/2 hour. These tell us the ideal number of people we’d have on at any given time;
8 hour shift, plus an hour lunch break > 2 hours from the start and end of the shift (9 hours from start to finish);
Breaks: 2x 30 minute breaks in the middle of the shift;
For simplicity, can assume an employee would have the same schedule every day.
Desired result:
Dictionary or data frame with the best-case distribution of start times, breaks, lunches across an input number of employees such that the difference between staffed and demanded labor is minimized.
I have pretty basic python, so my first guess was to just come up with all of the possible shift permutations (points at which one could take breaks or lunches), and then ask python to select x (x=number of employees available) at random a lot of times, and then tell me which one best allocates the labor. That seems a bit cumbersome and silly, but my limitations are such that I can’t see beyond such a solution.
I have tried to look for libraries or tools that help with this, but the question here- how to distribute start times and breaks within a shift- doesn’t seem to be widely discussed. I’m open to hearing that this is several years off for me, but...
Appreciate anyone’s guidance!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.