I'm hoping someone can point me in the right direction. I have a list of users and their money transactions for a service.
The data I have: userid, transaction date, transaction amount. Based on those three variables, is it possible to code something that, for a new transaction, decides whether it is likely recurring or gives a probability that it is?
For example, a user has a $100 transaction at the end of the month, every month. If a new transaction for the same amount comes in the next month, there is a high probability that it is recurring, as opposed to a $25 transaction towards the start of the month.
Or someone has had a transaction of around 1200 every December for the last 10 years. If they do that again this year, it's likely recurring.
Any help or guidance on how to tackle this, or on which machine learning models to use, would be appreciated.
Thanks
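Before reaching for a model, a rule-based baseline may already separate the two examples above. The sketch below is only one possible heuristic, not an established method: it looks for past transactions of a similar amount, measures how regular their spacing was, and checks whether the new transaction arrives about one typical gap after the last one. The function name `recurrence_score` and the thresholds `amount_tol` and `day_tol` are assumptions made for illustration.

```python
from datetime import date
from statistics import median

def recurrence_score(history, new_date, new_amount, amount_tol=0.05, day_tol=5):
    """Heuristic score in [0, 1] that a new transaction continues a pattern.

    history: list of (date, amount) tuples for ONE user.
    amount_tol and day_tol are illustrative thresholds, not tuned values.
    """
    # Keep past transactions whose amount is within amount_tol of the new one.
    similar = sorted(d for d, a in history
                     if abs(a - new_amount) <= amount_tol * new_amount)
    if len(similar) < 2:
        return 0.0  # not enough evidence of a pattern

    # Typical spacing (in days) between the similar past transactions.
    gaps = [(b - a).days for a, b in zip(similar, similar[1:])]
    typical_gap = median(gaps)

    # Share of past gaps that were close to the typical spacing.
    regular = sum(abs(g - typical_gap) <= day_tol for g in gaps) / len(gaps)

    # Does the new transaction arrive about one typical gap after the last one?
    new_gap = (new_date - similar[-1]).days
    on_time = 1.0 if abs(new_gap - typical_gap) <= day_tol else 0.0

    return regular * on_time

# Made-up history: 100 on the 28th of each month, January to June 2022.
history = [(date(2022, m, 28), 100.0) for m in range(1, 7)]
monthly = recurrence_score(history, date(2022, 7, 28), 100.0)
one_off = recurrence_score(history, date(2022, 7, 3), 25.0)
```

A score like this could also serve as an input feature if a classifier is trained later on labelled transactions.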
Related
My question is something that I didn't encounter anywhere. I've been wondering whether it is possible for a TF model to determine the values between two dates that have real/validated values assigned to them.
Here is an example:
Let's take the price of nickel. Here is its chart for the last week:
There is no data for the following two dates: 19/11 and 20/11.
But we have the data points before and after.
So is it possible to use the data from before and after these two points to guess the values for the two missing dates?
Thanks a lot!
It would be possible to create a machine learning model to predict the prices given a dataset of previous prices. Take a look at this post for instance. You would have to modify it slightly so that it predicts the prices in the gaps given the previous and upcoming prices.
But for the example you gave, assuming the dates are in 2022, these are a Saturday and a Sunday. The stock market is closed on weekends, hence there is no price for the item. Also note that there are other days in the year when no trading occurs, such as holidays, and there is no price for those days either.
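For gaps that really are missing data rather than non-trading days, a simple baseline to compare any model against is linear interpolation between the neighbouring known points. This is a swapped-in baseline, not the machine learning approach mentioned above; the function name `interpolate_missing` and the prices are made up for illustration.

```python
from datetime import date

def interpolate_missing(known, missing_dates):
    """Linearly interpolate values for dates between known observations.

    known: dict mapping date -> value. Every missing date must have a known
    observation both before and after it, otherwise max()/min() will raise.
    """
    days = sorted(known)
    filled = {}
    for d in missing_dates:
        before = max(x for x in days if x < d)   # nearest earlier observation
        after = min(x for x in days if x > d)    # nearest later observation
        frac = (d - before).days / (after - before).days
        filled[d] = known[before] + frac * (known[after] - known[before])
    return filled

# Made-up prices, for illustration only.
prices = {date(2022, 11, 18): 100.0, date(2022, 11, 21): 130.0}
filled = interpolate_missing(prices, [date(2022, 11, 19), date(2022, 11, 20)])
```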
I work in the gym space, and I'm trying to predict the number of gym leavers we will see next month, the following month, etc.
The number of leavers is directly impacted by the number of joiners we had 13 months ago (for a 12-month contract) or 4 months ago (for a 3-month contract), as you need to give a month's notice.
There is some seasonality in Jan/Sept, but ultimately the type and length of the contract a member joins on is the biggest contributor to how long they are likely to stay.
We have over a hundred permutations of contract type and length.
What is the best way to model this in Python, and with which methods?
I've created a proof-of-concept model in Excel, which looks at historic churn rates at month 1/2/3 etc. by contract, and applies them to our current member mix and their tenure to predict how many will leave this month. It's extremely messy and spread over lots of worksheets, but outside of irregular macro events it is very accurate in predicting leavers within the next month.
I've tried a linear regression of the leaver volume this month against all the joiners in t-1, t-2, ..., t-64, but it spits out a bunch of coefficients which don't give any reasonable number. Some are positive and some are negative. But I thought that over a long enough period the number of joiners could be used to estimate leavers.
I've thought of time series next, but I struggle to understand how to set the data up to run that, as I have so many contract mixes. And in one way I need to look at the data and say: this person is on this contract, has been with us X months, so has this chance of leaving.
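The Excel logic described above (historic churn rate by contract and tenure month, applied to the current member mix) can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions: the input layout (one record per member per month of tenure), the function names and the contract labels are all hypothetical, and seasonality is ignored.

```python
from collections import defaultdict

def churn_rates(history):
    """Estimate P(leave this month | contract, tenure) from past member-months.

    history: iterable of (contract, tenure_months, left), where left is True
    if the member left during that month of tenure.
    """
    at_risk = defaultdict(int)
    leavers = defaultdict(int)
    for contract, tenure, left in history:
        at_risk[(contract, tenure)] += 1
        leavers[(contract, tenure)] += int(left)
    return {key: leavers[key] / at_risk[key] for key in at_risk}

def expected_leavers(current_members, rates, default_rate=0.0):
    """Sum each current member's leave probability to get expected leavers.

    current_members: iterable of (contract, tenure_months) pairs.
    default_rate is used for contract/tenure pairs never seen in history.
    """
    return sum(rates.get((contract, tenure), default_rate)
               for contract, tenure in current_members)

# Made-up data: 3 of 4 members on a 12-month contract left in month 13,
# and 1 of 2 members on a 3-month contract left in month 4.
history = ([("12m", 13, True)] * 3 + [("12m", 13, False)]
           + [("3m", 4, True), ("3m", 4, False)])
rates = churn_rates(history)
current = [("12m", 13)] * 4 + [("3m", 4)] * 2
forecast = expected_leavers(current, rates)
```

With over a hundred contract permutations, some contract/tenure cells will have very few observations, so pooling similar contracts before computing rates may be worth considering.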
I am new to Backtrader and I can't figure out how to write the following strategy:
Every morning it places a limit buy order at 80% of the Open price. If the order is executed during the day (i.e. the Low price < the limit price for that day), then it sells the stock at the Close.
I am using Yahoo's OHLC daily data.
Can anyone show me how to write the Strategy part of the code? I posted a similar question on BT's official forum but couldn't get an answer.
Thanks.
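The following is not Backtrader code. It is a library-free sketch of the rule as stated, which may be useful for checking the logic on daily OHLC rows before wiring it into a Strategy class. The function name `simulate`, the dict layout of each bar and the assumption that the order fills exactly at the limit price are all illustrative choices.

```python
def simulate(bars, limit_frac=0.80):
    """Return one record per day on which the limit order would have filled.

    bars: iterable of dicts with keys 'date', 'open', 'low', 'close'.
    """
    trades = []
    for bar in bars:
        limit = limit_frac * bar["open"]      # limit price set off the day's open
        if bar["low"] < limit:                # order would have executed intraday
            trades.append({
                "date": bar["date"],
                "buy": limit,                 # assumes a fill exactly at the limit
                "sell": bar["close"],         # position closed at the day's close
                "return": bar["close"] / limit - 1.0,
            })
    return trades

# Made-up bars: the first day dips below the limit, the second does not.
bars = [
    {"date": "2020-01-02", "open": 100.0, "low": 75.0, "close": 90.0},
    {"date": "2020-01-03", "open": 100.0, "low": 85.0, "close": 95.0},
]
trades = simulate(bars)
```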
Could you please assist me with the following question?
I have a customer activity dataframe that looks like this:
It contains at least 500,000 customers and a "timeseries" of 42 months. The ones and zeroes represent customer activity. If a customer was active during a particular month then there will be a 1, if not, a 0. I need to determine those customers that most likely (+ probability) will not be active during the next 6 months (2018 July-December).
Could you please direct me to what approach/models I should use to predict this? I use Python.
Thanks in advance!
The most direct analysis would be a survival model characterizing the customer's return over time: https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822
If you have more information about the customer besides the time series, you can augment your model with additional signals.
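To make the survival-model idea concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator applied to activity rows of ones and zeroes. The churn definition in `to_duration` (a customer counts as churned once the last `churn_gap` months are all inactive) is an assumption made for illustration, not something given in the question, and both function names are hypothetical.

```python
def to_duration(activity, churn_gap=6):
    """Turn one customer's 0/1 activity row into (duration, churn_observed).

    Assumed definition: churned if the final `churn_gap` months are inactive;
    duration is then the last active month, otherwise the full follow-up.
    """
    last_active = max((i + 1 for i, a in enumerate(activity) if a), default=0)
    churned = len(activity) - last_active >= churn_gap
    return (last_active if churned else len(activity)), churned

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as a list of (time, survival_probability).

    durations: months each customer was followed.
    observed: True if churn was observed at that time, False if censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    for t in event_times:
        at_risk = sum(d >= t for d in durations)                  # still followed at t
        events = sum(d == t and e for d, e in zip(durations, observed))
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Made-up example: four customers, one of them censored at month 3.
curve = kaplan_meier([2, 3, 3, 5], [True, True, False, True])
```

The curve describes the population as a whole. Per-customer probabilities would need a model that conditions on each customer's own history or on additional signals.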
I am building a very simple database in MySQL to track the movement of items. The current paper form looks like this:
Date      totalFromPreviousDay  NewToday  LeftToday  RemainAtEndOfDay
1.1.2017  5                     5         2          8  (5+5-2)
2.1.2017  8                     3         0          11 (8+3-0)
3.1.2017  11                    0         5          6  (11+0-5)
And so forth. In my table, I want to make totalFromPreviousDay and RemainAtEndOfDay calculated fields that are shown only in my front end. That is mainly because we tend to erase entries on the paper form due to errors, and I want the calculated fields to reflect any changes to the other two fields. As such, I defined my table like this:
id
date
NewToday
LeftToday
Now the problem I am facing is, I want to select any date and be able to say "there were 5 items at the start of the day or from previous day, then 5 were added, 0 left and the day ended with 10 items"
So far, I can't really think of a way of going about it. Theoretically, I want to try something like this: if the requested day is Feb. 1, 2017, start at 0 because that's the day we started collecting data. If not, loop through the records starting from 0 and do the math until the requested date is found.
But that is obviously inefficient because I have to start from the first date and work forward every time.
Is my approach OK, or should I include the columns in the table? If the former, what would be the way to do it in Python/MySQL?
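If the two calculated columns stay out of the table, the database can do the summing in a single aggregate query instead of a Python loop. The sketch below uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the table name `movements`, the ISO date strings and the `opening_stock` parameter are assumptions, and with a MySQL driver the placeholder syntax would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movements ("
    "id INTEGER PRIMARY KEY, date TEXT, NewToday INTEGER, LeftToday INTEGER)"
)
# Rows taken from the paper form above; ISO dates sort correctly as text.
conn.executemany(
    "INSERT INTO movements (date, NewToday, LeftToday) VALUES (?, ?, ?)",
    [("2017-01-01", 5, 2), ("2017-01-02", 3, 0), ("2017-01-03", 0, 5)],
)

def day_summary(conn, day, opening_stock=0):
    """Return (start_of_day, new, left, end_of_day) for the requested date."""
    # Net movement over all days strictly before the requested one.
    (net_before,) = conn.execute(
        "SELECT COALESCE(SUM(NewToday - LeftToday), 0) "
        "FROM movements WHERE date < ?", (day,)
    ).fetchone()
    # Movement on the requested day itself (zeros if there is no row).
    new, left = conn.execute(
        "SELECT COALESCE(SUM(NewToday), 0), COALESCE(SUM(LeftToday), 0) "
        "FROM movements WHERE date = ?", (day,)
    ).fetchone()
    start = opening_stock + net_before
    return start, new, left, start + new - left

# The paper form starts with 5 items carried over, hence opening_stock=5.
summary = day_summary(conn, "2017-01-02", opening_stock=5)
```

An index on the date column would likely keep this query fast for a table of daily rows.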
I think you have to step back a little and define the business needs first (it is worthwhile to talk to somebody who has worked with stocks before), because these determine your table structure.
A system always tracks the current stock level and the movements. How often you save your historical stock level is a business decision, and it influences how you store the data.
You may save the current stock level along with every transaction. In this case you would store the stock level in the transactions table. You do not even have to sum up the transactions per day, because the last transaction of each day will carry the daily closing stock level anyway.
You may choose to save the historic stock levels regularly (on a daily, weekly, monthly, etc. basis). In this case you will have a separate historic stock levels table with stock id, stock name (the name may change over time, so it may be a good idea to save it), date and the level. If you would like to know the historic stock level for a point in time that falls between your saved points, then you need to take the latest saved stock level before the point you are looking for and add all transactions between that saved point and the requested time.
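The second option (periodic snapshots plus the transactions since the latest snapshot) can be sketched as follows. This is a minimal in-memory illustration under assumptions: the function name `level_at` and the data layout are hypothetical, times can be any comparable values such as dates, and each snapshot is taken to already include every transaction up to and including its own time.

```python
from bisect import bisect_right

def level_at(snapshots, transactions, when):
    """Stock level at time `when`.

    snapshots: list of (time, level), sorted by time.
    transactions: list of (time, delta), positive for items added.
    """
    times = [t for t, _ in snapshots]
    i = bisect_right(times, when) - 1          # latest snapshot at or before `when`
    if i < 0:
        raise ValueError("no snapshot at or before the requested time")
    snap_time, level = snapshots[i]
    # Add only the movements after the snapshot, up to the requested time.
    return level + sum(delta for t, delta in transactions
                       if snap_time < t <= when)

# Made-up data using day numbers as times.
snapshots = [(0, 5), (10, 11)]
transactions = [(3, 5), (4, -2), (8, 3), (12, -5)]
level_day_9 = level_at(snapshots, transactions, 9)
```

The more frequent the snapshots, the fewer transactions have to be summed per lookup, at the cost of storing more rows.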