I would like to get spot instances pricing history for different regions and instances.
I found how to get the pricing history for my own spot requests,
and found how to get the current spot instance pricing in the Spot Instance Advisor.
But I can't find how to get the general history for all instance types and regions. How can I do that? Preferably something that is ready for download, or in Python code.
You can use describe_spot_price_history():
Describes the Spot price history.
When you specify a start and end time, this operation returns the prices of the instance types within the time range that you specified and the time when the price changed. The price is valid within the time period that you specified; the response merely indicates the last time that the price changed.
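For example, a minimal boto3 sketch (the region list and instance types here are illustrative; adjust them to what you need):

    import boto3
    from datetime import datetime, timedelta

    # Illustrative regions and instance types
    for region in ['us-east-1', 'eu-west-1']:
        ec2 = boto3.client('ec2', region_name=region)
        paginator = ec2.get_paginator('describe_spot_price_history')
        pages = paginator.paginate(
            StartTime=datetime.utcnow() - timedelta(days=7),
            EndTime=datetime.utcnow(),
            InstanceTypes=['m5.large', 'c5.large'],
            ProductDescriptions=['Linux/UNIX'],
        )
        for page in pages:
            for entry in page['SpotPriceHistory']:
                print(region, entry['AvailabilityZone'], entry['InstanceType'],
                      entry['Timestamp'], entry['SpotPrice'])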
Please note that since March 2018, Spot Prices are relatively stable. Previously, when capacity was required, AWS increased the spot price. Now, however, the Spot Price tends to stay the same, but capacity is still reclaimed when needed. This means that higher bids do not impact the spot price.
For details, see: New Amazon EC2 Spot pricing model: Simplified purchasing without bidding and fewer interruptions | AWS Compute Blog
As a result, the Spot Price History is not particularly interesting any more. The Spot Instance Advisor is just as good a source of information to determine the likelihood of having spot instances taken away.
I am new to Backtrader and I can't figure out how to write the following strategy:
Every morning it places a limit buy order at 80% of Open price. If the order is executed during the day (i.e. Low price < the limit price for that day), then sell the stock at Close.
I am using Yahoo's OHLC daily data.
Can anyone show me how to write the Strategy part of the code? I posted a similar question on BT's official forum but couldn't get an answer.
Thanks.
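A minimal, untested sketch of one way to express this in backtrader, using its cheat-on-open and cheat-on-close facilities (the CSV file name and the limit_pct parameter name are illustrative):

    import backtrader as bt

    class OpenDipStrategy(bt.Strategy):
        params = (('limit_pct', 0.80),)  # buy limit at 80% of today's open

        def __init__(self):
            self.order = None

        def next_open(self):
            # cheat_on_open=True makes next_open() run before the bar trades,
            # so self.data.open[0] really is *today's* open
            if not self.position and self.order is None:
                price = self.data.open[0] * self.p.limit_pct
                self.order = self.buy(exectype=bt.Order.Limit, price=price)

        def next(self):
            if self.position:
                # broker cheat-on-close is set below, so this market order
                # fills at today's close instead of tomorrow's open
                self.close()
            elif self.order is not None:
                self.cancel(self.order)  # limit was never hit today
            self.order = None

    cerebro = bt.Cerebro(cheat_on_open=True)
    cerebro.broker.set_coc(True)
    cerebro.adddata(bt.feeds.YahooFinanceCSVData(dataname='prices.csv'))
    cerebro.addstrategy(OpenDipStrategy)
    cerebro.run()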
I'm currently exploring different ways to judge and predict the performance of various offers and marketing campaigns. I have a list of metrics to pull from which I'm currently using now to predict performance, such as:
Day the offer was sent
Month
Weather
Time of Day
+more
And for my performance metric, I use
Redemption Rate (For every offer sent, how many times was it redeemed) - This is how I judge success
But one of the most important metrics is the offer itself, which I have in the form of a text string.
Here are a few user-generated examples.
Get $4.00 off a large pizza
Receive 20% off your next order
Buy any Chocolate Milkshake, get another one half price
Two wraps for $7.50
Free cookie with any purchase
...and hundreds more
Now, I know there's very important information in those text strings, but I don't know the best way to analyze it and extract key information. For example, the text shows the product it's advertising, the discount, the dollar amount, the percentage off, etc. I need a generalized way to go through each string (I'm assuming through some tokenized method) and extract relevant information.
I'm hoping to get some input on how I could analyze these strings, eventually with the purpose of generating a string-based dataset (along with the other aforementioned data points) that I can use for predictions.
I am writing my code in Python 3.
Any advice is greatly appreciated. Thanks.
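As one possible starting point, here is a minimal regex-based sketch for pulling simple numeric features out of such strings (the feature names and patterns are illustrative, not a complete solution):

    import re

    def extract_offer_features(text):
        """Pull a few simple features out of an offer string."""
        features = {
            'dollar_amount': None,   # e.g. "$4.00" or "$7.50"
            'percent_off': None,     # e.g. "20%"
            'is_free': 'free' in text.lower(),
            'is_buy_x_get_y': bool(re.search(r'\bbuy\b.*\bget\b', text, re.I)),
        }
        m = re.search(r'\$(\d+(?:\.\d{1,2})?)', text)
        if m:
            features['dollar_amount'] = float(m.group(1))
        m = re.search(r'(\d+(?:\.\d+)?)\s*%', text)
        if m:
            features['percent_off'] = float(m.group(1))
        return features

    for offer in ["Get $4.00 off a large pizza",
                  "Receive 20% off your next order",
                  "Buy any Chocolate Milkshake, get another one half price",
                  "Two wraps for $7.50",
                  "Free cookie with any purchase"]:
        print(offer, '->', extract_offer_features(offer))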
Imagine the following problem:
Your friend bought a brand new car with 20 wheels, and this car drives large distances. As it drives, the car consumes tires. Every time your friend needs a new tire, he calls you to send one. And of course you do that.
After 2 years you want to know what the lifetime of that particular tire actually is. But, the only thing you know is
That you sent your friend 26 tires in the last 2 years
The dates you sent tires to your friend
His car has 20 wheels
His car was brand new to start with
All the tires were for the same car
How can we find the lifetime of this tire with only the data we have?
This problem is what I'm facing today. When a tire (or any other item) is replaced, the system does not track where in the machine it was replaced, only that it was replaced. For me this makes it difficult to come up with a method to find the lifetime.
Is there anyone who can guide me in the right direction?
Is there a sort of python library which can be used?
The easiest solution that comes to mind is the average duration between the sending events.
This only works if you send tires one at a time and not in batches; batches add additional complexity.
You need a list of all dates
Sort the list by date
Iterate over the list and calculate the duration between two dates following each other
Save the duration in another list
Calculate the average duration from this list
Now you have the average time between tire replacements for the whole vehicle. Since 20 tires wear down in parallel, multiply this duration by 20 to get the lifetime of the average single tire.
Correct me if I am wrong or didn't understand the question.
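A small sketch of that calculation (the dates are made up):

    from datetime import date

    NUM_WHEELS = 20

    # Made-up shipping dates; replace with your real data
    ship_dates = sorted([
        date(2021, 1, 5), date(2021, 2, 1), date(2021, 3, 3),
        date(2021, 3, 30), date(2021, 5, 2),
    ])

    # Days between consecutive shipments
    gaps = [(b - a).days for a, b in zip(ship_dates, ship_dates[1:])]

    # The average gap is the mean time until *some* tire on the car wears
    # out; with 20 positions wearing in parallel, one tire lasts ~20x longer
    avg_gap = sum(gaps) / len(gaps)
    print(f"Average days between replacements: {avg_gap:.1f}")
    print(f"Estimated single-tire lifetime: {avg_gap * NUM_WHEELS:.0f} days")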
I got this Prospects dataset:
ID     Company_Sector         Company_size  DMU_Final  Joining_Date  Country
65656  Finance and Insurance  10            End User   2010-04-13    France
54535  Public Administration  1             End User   2004-09-22    France
and Sales dataset:
ID     linkedin_shared_connections  online_activity  did_buy  Sale_Date
65656  11                           65               1        2016-05-23
54535  13                           100              1        2016-01-12
I want to build a model that assigns to each prospect in the Prospects table the probability of becoming a customer. The model will predict whether a prospect is going to buy, and return the probability. The Sales table gives info about 2015 sales.
My approach: the 'did_buy' column should be the label in the model, because 1 represents that the prospect bought in 2016 and 0 means no sale. Another interesting column is online_activity, which ranges from 5 to 685; the higher it is, the more active the prospect is about the product. So I'm thinking of building a Random Forest model and then somehow putting the probability for each prospect in a new 'intent' column. Is a Random Forest an efficient model in this case, or should I use another one? And how can I apply the model results to a new 'intent' column for each prospect in the first table?
Well, first, please see the How to Ask and On-topic guidelines. This is more of a consulting question than a practical or specific one. Maybe a more appropriate topic is machine learning.
TL;DR: Random forests are nice but seem to be inappropriate due to unbalanced data. You should read about recommender systems and about more fashionable, well-performing models like Wide & Deep.
An answer depends on several things: How much data do you have? What data is available during inference? Can you see the current "online_activity" attribute of a potential sale before the customer buys? Many such questions may change the whole approach that fits your task.
Suggestion:
Generally speaking, this is the kind of business where you usually deal with very unbalanced data: a low number of "did_buy"=1 rows against a huge number of potential customers.
On the data science side, you should define a metric for success that maps to money as directly as possible. Here, since advertising to or approaching the more probable customers is the action you can take, the ratio "did_buy" / "was_approached" is a great metric for success. Over time, you succeed if you raise that number.
Another thing to take into account is that your data may be sparse. I do not know how many purchases you usually get, but it may be that you have only one from each country, etc. That should also be taken into consideration, since a simple random forest can easily latch onto such a column in most of its random trees, and overfitting will become a big issue. Decision trees suffer from unbalanced datasets. However, taking the probability of each label at the leaf, instead of a hard decision, can sometimes be helpful for simple interpretable models, and it reflects the unbalanced data. To be honest, I do not truly believe this is the right approach.
If I were you:
I would first embed the Prospects columns into a vector by:
Converting categories to random vectors (one per category) or one-hot encoding.
Normalizing or bucketizing company sizes into numbers that fit the prediction model (see next).
The same idea applies to dates. Here, the year may be problematic, but months/days should be useful.
Country is definitely categorical; maybe add another "unknown" country class.
Then,
I would use a model that can actually be optimized according to different costs. Logistic regression is a "wide" one, a deep neural network is another option, or see Google's Wide & Deep for a combination.
Set the cost to be my golden number (the money metric in terms of labels), or something as close as possible.
Run the experiment.
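A minimal sketch of such a pipeline with pandas and scikit-learn, using logistic regression with balanced class weights as a stand-in for the cost-sensitive model (the file names are hypothetical, and every prospect is assumed to have a matching Sales row):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical file names; columns as in the question
    prospects = pd.read_csv('prospects.csv', parse_dates=['Joining_Date'])
    sales = pd.read_csv('sales.csv')
    df = prospects.merge(sales[['ID', 'linkedin_shared_connections',
                                'online_activity', 'did_buy']], on='ID')

    # Dates -> numeric parts (year left out, month kept)
    df['join_month'] = df['Joining_Date'].dt.month

    # One-hot encode the categorical columns
    features = pd.get_dummies(
        df[['Company_Sector', 'Country', 'DMU_Final', 'Company_size',
            'join_month', 'linkedin_shared_connections', 'online_activity']],
        columns=['Company_Sector', 'Country', 'DMU_Final'])

    X_train, X_test, y_train, y_test = train_test_split(
        features, df['did_buy'], test_size=0.2,
        stratify=df['did_buy'], random_state=0)

    # class_weight='balanced' compensates for the few did_buy=1 rows
    model = LogisticRegression(max_iter=1000, class_weight='balanced')
    model.fit(X_train, y_train)

    # The probability of buying becomes the new 'intent' column
    df['intent'] = model.predict_proba(features)[:, 1]
    print(df[['ID', 'intent']].head())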
Finally,
Inspect my results and see why it failed.
Suggest another model/feature
Repeat.
Go eat lunch.
Ask a bunch of data questions.
Try to answer at least some.
Discover new interesting relations in the data.
Suggest something interesting.
Repeat (tomorrow).
Of course, there is a lot more to it than just the above, but that is for you to discover in your data and business.
Hope I helped! Good luck.
I'm in the early stages of thinking through a wild trip that involves visiting every commercial airport in India. A little research shows that the national carrier - Air India, has a special ticket called the Silver Pass that allows unlimited travel on their domestic network for 15 days. I would like to use this as my weapon of choice!
See this for a map of all the airports served by Air India
I have the following information available with me in Excel:
All of the domestic flight routes (departure airports and arrival airports in IATA codes)
Duration for every flight route
Weekly frequency for every flight (not all flights run on all days of the week, for example)
Given this information, how do I figure out the maximum number of airports that I can hit in 15 days using the Silver Pass ticket? Looking online shows that this is either a traveling salesman problem or a graph traversal problem. What would you guys recommend that I look at to solve this?
Some background on myself - I'm just beginning to learn Python and would like to figure out a way to solve this problem using it. Given that, what are the Python-based algorithms/libraries that I should be looking at to help me structure an approach to solving this?
Your problem is closely related to the Hamiltonian Path problem and the Traveling Salesman Problem, which are NP-hard.
Given an instance of the Hamiltonian Path problem, build flight data as follows:
Each vertex is an airport
Each edge is a flight
All flights leave at the same times and take the same time. (*)
(*) The flight duration and departure times [which are common to all] should be chosen so that you are able to visit all airports only if you visit each airport exactly once. This can easily be done in polynomial time: assume we have a fixed time of k hours for the ticket; we construct the flight table such that each flight takes exactly k/(n-1) hours, and there is a flight every k/(n-1) hours as well (see note 1 below) [remember, all flights leave at the same times].
It is easy to see that you can use the ticket to visit all airports if and only if the graph has a Hamiltonian path: if we visit a certain airport twice in the path, we need at least n flights, and the total time will be at least (k/(n-1)) * n > k, so we exceed the time limit. [The other direction is similar.]
Thus your problem [in the general case] is NP-hard, and there is no known polynomial solution for it.
1: We assume it takes no time to pass between flights, this can be easily fixed by simply decreasing flight length by the time it takes to "jump" between two flights.
Representing your problem as a graph is definitely the best option. Since the duration, number of flights, and number of airports are relatively limited, and since you are (presumably) happy with approximate solutions, attacking this by brute force ought to be practical, and is probably your best option. Here's roughly what I would do:
Represent each airport as a node on the graph, and each flight as an edge.
Given a starting airport and a current time, select all the flights leaving that airport after the current time. Use a scoring function of some sort to rank them, such that flights to airports you haven't visited are ranked higher than flights to airports you have already visited, and flights rank higher the sooner they depart.
Recursively explore each outgoing edge, in order of score, and repeat the procedure for the arriving airport.
Any time you reach a node with no valid outgoing edges, compare the path to the best solution found so far. If it's an improvement, output it and set it as the new best solution.
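A minimal sketch of that procedure, assuming flights are dicts with src/dst IATA codes and dep/arr datetimes (the toy data and field names are made up):

    import datetime as dt

    def search(airport, now, deadline, visited, flights, best):
        """Depth-first search over catchable flights, best-scoring first."""
        options = [f for f in flights
                   if f['src'] == airport and now <= f['dep']
                   and f['arr'] <= deadline]
        # Score: unvisited destinations first, then earliest departure
        options.sort(key=lambda f: (f['dst'] in visited, f['dep']))
        if not options and len(visited) > len(best['airports']):
            best['airports'] = set(visited)   # new best solution found
        for f in options:
            search(f['dst'], f['arr'], deadline,
                   visited | {f['dst']}, flights, best)

    # Toy data: two flights on day one
    flights = [
        {'src': 'DEL', 'dst': 'BOM',
         'dep': dt.datetime(2022, 1, 1, 8), 'arr': dt.datetime(2022, 1, 1, 10)},
        {'src': 'BOM', 'dst': 'BLR',
         'dep': dt.datetime(2022, 1, 1, 12), 'arr': dt.datetime(2022, 1, 1, 14)},
    ]

    best = {'airports': set()}
    start = dt.datetime(2022, 1, 1)
    search('DEL', start, start + dt.timedelta(days=15), {'DEL'}, flights, best)
    print('Most airports visited:', len(best['airports']))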
Depending on the number of flights, you may be able to run this procedure exhaustively. The number of solutions grows exponentially with the number of flights, of course, so this will quickly become impractical. This is where the scoring function becomes useful - it prioritizes the solutions more likely to produce useful answers. You can run the procedure for as long as you want, and stop when it produces a solution you're happy with.
The properties of the scoring function will have a big impact on how good the solutions are. If your priority is exploring unique places, you want to put a big premium on unvisited airports, and since you want to explore as many as possible, you need to prioritize short transfer times. My suggestion for a starting point would be to make the penalty for going somewhere you've already been proportional to the time it would take to fly from there to somewhere else. That way, it'll still be explored as a stopover, but avoided where possible. Also, note that your scoring function will need context, namely the set of airports that have been visited by the current candidate path.
You can also use the scoring function to apply other constraints. Say you don't want to travel during the night (a reasonable assumption); you can penalize the score of edges that involve nighttime flights.
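As an illustration, the sort key in the sketch above could be extended to encode such constraints (the night window here is arbitrary):

    def score(flight, visited, now):
        """Sort key: unvisited destinations first, daytime flights first,
        then the shortest wait on the ground."""
        revisit = flight['dst'] in visited
        night = not (6 <= flight['dep'].hour <= 22)  # penalize red-eye flights
        wait = flight['dep'] - now                   # prefer quick transfers
        return (revisit, night, wait)

    # used as: options.sort(key=lambda f: score(f, visited, now))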