Predict the future graph based on averages of given data - python

I am trying to make a future stock price forecaster. I am nearly done, but the final step has stumped me.
How do I predict the future of the graph based on the different averages of the given data?
# How it works up to now:
stockprice = [1, 2, 3, ..., 9999]
# For every number in the stock price, sum the last x numbers (x would be an input) and divide by how many there are (calculate the average).
StockDataSeperate = StockData_AverageFinder[-int_toSplitBy:-1]
Average = 0
for num in StockDataSeperate:
    Average += num
Average = Average / len(StockDataSeperate)
Averaged_StockData = np.append(Averaged_StockData, Average)
# This is done x times, with the number of values averaged over growing exponentially with x.
Using this data (the averaged stock price graphs), is it possible to predict the future of the raw data?
If anyone has any links or ideas I would be so grateful!

Obviously, using a moving average for future values does not work, since you don't have values beyond the present. In theory you would assume that near-term stock prices follow a random walk, so your best guess for a future value would be to simply predict the last known value.
However, a more "exciting" solution could be to train an LSTM by turning the stock price series into a supervised learning problem. It is important that you don't predict the price itself but the return between the stock prices in your time series. Of course, you can also use the returns of moving averages as inputs, or even multiple moving averages, and conduct multivariate time series forecasting.
Hopefully I don't have to mention that stock price prediction is not that "easy" - it's a good exercise though.
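As a rough illustration of the supervised framing described above, here is a minimal sketch; the price series, window length and network size are placeholder assumptions, not a production forecaster:

import numpy as np
import tensorflow as tf

# Placeholder price series; in practice this would be the real stock prices.
prices = np.cumsum(np.random.normal(0, 1, 1000)) + 100

# Work with returns rather than raw prices.
returns = np.diff(prices) / prices[:-1]

# Turn the series into a supervised problem: predict the next return
# from a window of the previous `lookback` returns.
lookback = 20
X = np.array([returns[i:i + lookback] for i in range(len(returns) - lookback)])
y = returns[lookback:]
X = X[..., np.newaxis]  # shape (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(lookback, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Forecast the next return and convert it back to a price level.
next_return = model.predict(X[-1:], verbose=0)[0, 0]
next_price = prices[-1] * (1 + next_return)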

Related

Serially Correlated Demand in Python

Trying to solve the following problem but not sure how to continue:
Suppose a logistics company would like to simulate demand for a given product.
Assume that there are Good and Bad weeks.
On a good week, the demand is normally distributed with mean 200 and standard deviation 50.
On a bad week, the demand is normally distributed with mean 100 and standard deviation 30.
As a practical constraint, you should round the decimal part of demand to the nearest integer and set it to zero if it is ever negative.
Additionally, we should assume that a week being good or bad is serially correlated across time.
Conditional on a given week being Good, the next week remains Good with probability 0.9. Similarly, conditional on a given week being Bad, the next week remains Bad with probability 0.9.
You are to simulate a time series of demand for 100 weeks, assuming the first week starts Good. Also, plot the demand over time.
This is what I have so far:
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
simulated_demand = [rng.normal(200, 50)]
for t in range(1, 100):
    if simulated_demand[t-1] == rng.normal():
        simulated_demand.append(rng.normal(150, 70))
    else:
        simulated_demand.append(rng.normal(50, 15))
simulated_demand = pd.DataFrame(simulated_demand, columns=['Demand Time Series'])
simulated_demand.plot(style='r--', figsize=(10, 3))
How can I fix the if condition?
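A minimal sketch of one way the serially correlated Good/Bad state could be simulated, following the parameters in the problem statement above (variable names are placeholders):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

state = "Good"  # the first week starts Good
demand = []
for _ in range(100):
    if state == "Good":
        value = rng.normal(200, 50)
        # With probability 0.9 the next week stays Good, otherwise it turns Bad.
        state = "Good" if rng.random() < 0.9 else "Bad"
    else:
        value = rng.normal(100, 30)
        state = "Bad" if rng.random() < 0.9 else "Good"
    # Round to the nearest integer and floor negative demand at zero.
    demand.append(max(0, round(value)))

demand = pd.DataFrame(demand, columns=["Demand Time Series"])
demand.plot(style="r--", figsize=(10, 3))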

A Python method for calculating rolling Sum for a single line item and then using that to perform further calculations

Disclaimer: Python and Stack Overflow Noob here.
I have a DataFrame which looks like the one below:
I need to calculate the annualized return rate (ARR) for each product in that DataFrame using the formula:
ARR = (rolling 3-month sum of returns) / (3-month moving average of field population) × 100
The question is:
In that DataFrame, the rolling sum of returns and the moving average of field population are wrong, since they are calculated over the entire DataFrame and not for each product. How do I make the calculation run for each product?
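A minimal sketch of the per-product calculation, assuming hypothetical column names Product, Returns and FieldPopulation (adjust them to the real DataFrame):

import pandas as pd

# Hypothetical layout: one row per product per month.
df = pd.DataFrame({
    "Product": ["A"] * 6 + ["B"] * 6,
    "Returns": [5, 7, 6, 8, 9, 4, 1, 2, 2, 3, 1, 2],
    "FieldPopulation": [100, 110, 120, 130, 140, 150, 50, 55, 60, 65, 70, 75],
})

# Group by product so each rolling window never mixes rows from different products.
grouped = df.groupby("Product")
df["RollingReturns"] = grouped["Returns"].transform(lambda s: s.rolling(3).sum())
df["AvgFieldPop"] = grouped["FieldPopulation"].transform(lambda s: s.rolling(3).mean())
df["ARR"] = df["RollingReturns"] / df["AvgFieldPop"] * 100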

Is there a way to assign probabilities to samples in a random number generator?

I have a financial dataset with monthly aggregates. I know the real world average for each measure.
I am trying to build some dummy transactional data using Python. I do not want the dummy transactional data to be entirely random. I want to model it around the real world averages I have.
E.g., if from the real data the monthly total profit is $1000 and the total number of transactions is 5, then the average profit per transaction is $200.
I want to create dummy transactions that are modelled around this real world average of $200.
This is how I did it:
import pandas as pd
from random import gauss

bucket = []
for _ in range(5):
    value = [int(gauss(200, 50))]
    bucket += value
transactions = pd.DataFrame({'Amount': bucket})
Now, the challenge for me is that I have to randomize the identifiers too.
For example, I know for a fact that there are three buyers in total. Let's call them A, B and C.
These three have done those 5 transactions, and I want to randomly assign them when I create the dummy transactional data. However, I also know that A is very likely to do a lot more transactions than B and C. To make my dummy data close to a real-life scenario, I want to assign probabilities to the occurrence of these buyers in my dummy transactional data.
Let's say I want it like this:
A : 60% appearance
B : 20% appearance
C : 20% appearance
How can I achieve this?
What you are asking for is not quite a probability: you want A to account for 60% of purchases with certainty. For that, take a dict as input that holds each user's share of purchases. Then build a list that reflects these shares over your base number of purchases and randomly pick a buyer from that list. Something like below:
import random

# Buy percentages of the users
buy_percentage = {'A': 0.6, 'B': 0.2, 'C': 0.2}
# Number of purchases
base = 100

buy_list = list()
for buyer, percentage in buy_percentage.items():
    buy_user = [buyer for _ in range(0, int(percentage * base))]
    buy_list.extend(buy_user)

for _ in range(0, base):
    # Randomly gets a buyer but makes sure that your ratio is maintained
    buyer = random.choice(buy_list)
    # your code to get the buying price goes below
UPDATE:
Alternatively, the answer given in the link below can be used. In my opinion, that solution is better:
A weighted version of random.choice
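For instance, the standard library's random.choices accepts weights directly, which avoids building the intermediate list (a minimal sketch under the same 60/20/20 assumption):

import random

buy_percentage = {'A': 0.6, 'B': 0.2, 'C': 0.2}
base = 100  # number of purchases to simulate

# Draw `base` buyers at once, weighted by their expected share of purchases.
buyers = random.choices(
    population=list(buy_percentage.keys()),
    weights=list(buy_percentage.values()),
    k=base,
)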

What kind of machine learning problem is it if we have to predict a customer's next spend category in Python?

I have a data set of shape (6210782, 5).
It contains 200,000 unique customers and their transactions at different outlets. The time series spans a little over a year.
df.head()
customer_id  TransactionDate  TransationTime  Amount  OutletCategory
514          22-04-2015       19:42:18        9445    M16
514          23-04-2015       16:29:28        2000    M23
514          02-05-2015       15:17:55        1398    M16
514          27-06-2015       13:51:29        1995    M7
514          07-08-2015       17:31:30        2000    M23
What kind of machine learning problem is this, and what approach and algorithm should be used for the following tasks:
1) Predict the customer's next transaction category?
(I am thinking of this as multinomial classification)
2) Predict the customer's next transaction category in the next 6 hours?
3) Predict the customer's next transaction amount?
(Is this an LSTM task?)
4) Predict the customer's next transaction amount in the next 6 hours?
As we have 200,000 unique customers, how should I prepare the data if I have to predict the next transaction amount? Should I pivot the customers to columns?
Data/time series exploration that may help visualize the data:
Below is a chart of transaction amounts by category over the time series:
For the charts below, I created a small dataset with "Datetime" as the index and an "Amount" column to understand transactional behaviour with respect to time.
Amount spent by transaction date chart:
Amount spent by weekly transaction dates chart:
Mean amount spent in a day (hourly):
Expectations:
I am new to data science and Python, so I am just looking for the right steps to proceed with the task (I will manage the code myself).
There will never be an exactly right answer to this kind of problem.
To your questions:
Everything related to the 6-hour horizon looks like a time series problem. This works, e.g., with ARIMA models.
3) is a regression: you basically have to predict an amount with a wide range of possible values. A linear regression could be the starting point, but there are also other algorithms for that.
1) should be a multiclass problem; for this you could use, e.g., a decision tree.
In general:
To give you more ideas: scikit-learn (https://scikit-learn.org/stable/) can be a good starting point for you.
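As a rough sketch of the multiclass framing for question 1), here is how the transaction table could be turned into features and labels for a decision tree; the toy data and feature choice are illustrative assumptions, not a prescription:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the real transaction table.
df = pd.DataFrame({
    "customer_id": [514, 514, 514, 514, 514, 27, 27, 27],
    "hour": [19, 16, 15, 13, 17, 10, 11, 12],
    "amount": [9445, 2000, 1398, 1995, 2000, 500, 750, 300],
    "category": ["M16", "M23", "M16", "M7", "M23", "M7", "M7", "M16"],
})

# Label: the *next* category of the same customer.
df["next_category"] = df.groupby("customer_id")["category"].shift(-1)
df = df.dropna(subset=["next_category"])

X = pd.get_dummies(df[["hour", "amount", "category"]])
y = df["next_category"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))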

Margin of Error for Complex Sample in Python

I have a weighted Stata dataset from a national survey (n = 6342). The data has already been weighted, i.e. each respondent represents about 4000 respondents on average.
I am reading the dataset with the pandas.read_stata function. Basically, what I need to achieve is to extract the data for each question with the respective frequencies (%) along with the margin of error for each frequency.
I have written Python code to do this, and it works perfectly fine for the frequencies themselves, i.e. calculating the sum of the weights in each answer category and dividing it by the total sum of the weights.
Pseudo-code looks like this:
q_5 = dataset['q5'].unique()
weights_sum = dataset['indwt'].sum()  # total sum of the weights
frequencies = {}
for value in q_5:
    variable = dataset[dataset['q5'] == value]
    freq = (variable['indwt'].sum() / weights_sum) * 100
    freq = round(freq, 0)
    frequencies.update({value: freq})
However, I cannot get proper confidence intervals or margins of error, since this is a complex sample.
I was advised to use R instead, but taking the syntax learning curve into consideration, I would rather stick with Python.
Is there any statistical package for Python that can calculate the margin of error for a complex sample?
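For reference, here is a minimal sketch of an approximate margin of error for a weighted proportion, using Kish's effective sample size derived from the weights; this is an assumption-laden approximation, not a full complex-survey variance estimator that accounts for strata and clusters:

import numpy as np

def weighted_moe(weights_in_category, all_weights, z=1.96):
    """Approximate ~95% margin of error (in percentage points) for a weighted proportion."""
    w = np.asarray(all_weights, dtype=float)
    p = np.sum(weights_in_category) / w.sum()
    # Kish's effective sample size: (sum of weights)^2 / sum of squared weights.
    n_eff = w.sum() ** 2 / np.sum(w ** 2)
    return z * np.sqrt(p * (1 - p) / n_eff) * 100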
