Train an LSTM on an oil & gas dataset to find the optimum number of tests to conduct each month for each oil well - python

I have a simulated dataset from the oil and gas industry. It contains 365 days of data for 36 wells; the flow rate of each well is recorded for all 365 days. My aim is to find the optimum number of tests that need to be carried out on each well in a month with a minimum number of errors, possibly by using an LSTM.
Screenshots of the dataset
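The question does not say how test frequency maps to error, but one common starting point is to train an LSTM to forecast each well's daily flow rate and treat the forecast error as the cost of testing less often. A minimal sketch of that forecasting step for a single well (the synthetic stand-in series, window length, and layer sizes are assumptions, not taken from the dataset):

import numpy as np
import tensorflow as tf

# Stand-in for one well's 365 daily flow rates; replace with the real column.
rates = np.sin(np.linspace(0, 12, 365)) + 0.1 * np.random.randn(365)

def make_windows(series, lookback=30):
    # Turn the series into (samples, lookback, 1) windows and next-day targets.
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.asarray(X)[..., None], np.asarray(y)

X, y = make_windows(rates)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(X.shape[1], 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mae')
model.fit(X, y, epochs=50, validation_split=0.2, verbose=0)
# The validation MAE shows how far the forecasts drift from measured rates; the number
# of tests per month could then be chosen as the smallest that keeps this error acceptable.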

Related

Multi-station time series flood forecasting with an LSTM

I need to make flood forecasts for 11 different stations. Each station requires a different forecast, i.e. the flood level (water inflow in cusecs) differs between stations. Can a single LSTM model handle all stations?
The input parameters are the same for all stations: upstream discharge, rain, slope, etc.
Moreover, the flood period is only one month or 20 days of the year. If I want to use 10 years of data, is there a way to use only 2 months of data from each year: 1 month with flood and 1 month with no flood?
But to my knowledge, there should not be any break in the time steps fed to an LSTM.
Please guide me; I am new to machine learning and would be very thankful.
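One option (a sketch only, with an assumed feature count and window length) is to train a single LSTM over windows from all 11 stations, appending a one-hot station identifier to every time step. Since each training window is contiguous on its own, the gaps between the two months kept from each year do not break anything, as long as no window spans a gap:

import numpy as np
import tensorflow as tf

LOOKBACK = 14      # assumed window length in days
N_FEATURES = 3     # e.g. upstream discharge, rain, slope
N_STATIONS = 11

def windows_for_station(features, target, station_id):
    # features: (days, N_FEATURES) array for one station and one contiguous month,
    # target: (days,) inflow. Pass each (station, month) block separately so that
    # no window crosses a gap in the record.
    X, y = [], []
    onehot = np.eye(N_STATIONS)[station_id]
    for i in range(len(target) - LOOKBACK):
        window = np.hstack([features[i:i + LOOKBACK],
                            np.tile(onehot, (LOOKBACK, 1))])
        X.append(window)
        y.append(target[i + LOOKBACK])
    return np.asarray(X), np.asarray(y)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(LOOKBACK, N_FEATURES + N_STATIONS)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
# Concatenate the windows from every station and every flood / non-flood month,
# then fit this single model on the combined arrays.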

Calculating a weighted daily average for each DOY in xarray across a decade

I have a few years of sea level height data with variables of both absolute height and sea level anomaly. I want to calculate an improved anomaly dataset that takes into account seasonal changes in absolute height. Towards that goal I'm trying to calculate the mean height at every point on the grid for each day of the year. Ideally I'd like to take into account the previous two weeks and following two weeks with the closer days carrying more weight in the final mean. I think a normal distribution of weights would be ideal. There is a nice example in the xarray documentation of how to calculate seasonal averages, but I've yet to find a suitable approach for this weighted mean of each day.
My initial ds looks like:
I'm able to calculate this daily average via:
ds_daily_avg = ds.groupby('time.dayofyear').mean(dim='time')
The output of ds_daily_avg
but there is too much variation in the daily averages because I only have a decade of data. I've thought of just doing a rolling average of ~14 days, and while that's good enough, it doesn't properly do the weighting I'm hoping to implement:
ds_daily_avg.sla.rolling(dayofyear=14).mean()
Any advice for properly doing this weighted mean through time?
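One way to get the Gaussian weighting (a sketch, assuming the ds_daily_avg from the groupby above; the 29-day window and 5-day standard deviation are arbitrary choices):

import numpy as np
import xarray as xr

window = 29                      # +/- 14 days around each day of year
half = window // 2
w = np.exp(-0.5 * ((np.arange(window) - half) / 5.0) ** 2)
weights = xr.DataArray(w / w.sum(), dims=['window'])

# Pad circularly so days near Jan 1 / Dec 31 still see a full window,
# then build the rolling windows explicitly and take the weighted mean.
padded = ds_daily_avg.sla.pad(dayofyear=half, mode='wrap')
rolled = padded.rolling(dayofyear=window, center=True).construct('window')
smoothed = (rolled * weights).sum('window').isel(dayofyear=slice(half, -half))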

Unsupervised learning: Anomaly detection on discrete time series

I am working on a final year project on an unlabelled dataset consisting of vibration data from multiple components inside a wind turbine.
Datasets:
I have data from 4 wind turbines each consisting of 415 10-second intervals.
About the 10 second interval data:
Each of the 415 10-second intervals consists of vibration data for the generator, gearbox etc. (14 features in total)
The vibration data (the 14 features) are sampled at 25.6 kHz (262144 rows in each interval)
The 10-second intervals are recorded once every day, at different times => a little more than 1 year's worth of data
Head of dataframe with some of the features shown:
Plan:
My current plan is to
Do a Fast Fourier Transform (FFT) from the time domain for each of the different sensors (gearbox, generator etc.) for each of the 415 intervals. From the FFT I am able to extract frequency information to put in a dataframe (statistical data from the FFT like spectral RMS per bin; see the sketch below).
Build different data sets for different components.
Add features such as wind speed, wind direction, power produced etc.
I will then build unsupervised ML models that can detect anomalies.
Unsupervised models I am considering are encoder-decoder networks and clustering.
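A minimal sketch of the FFT feature step mentioned in the plan (the bin count, Hann window, and simple per-bin RMS are assumptions, not a prescribed recipe):

import numpy as np

def spectral_rms_per_bin(signal, fs=25600, n_bins=10):
    # signal: one sensor's 10-second vibration trace (e.g. 262144 samples).
    windowed = signal * np.hanning(len(signal))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0.0, freqs[-1], n_bins + 1)
    feats = {}
    for i in range(n_bins):
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        feats[f'rms_bin_{i}'] = float(np.sqrt(np.mean(mag[mask] ** 2))) if mask.any() else 0.0
    return feats

# e.g. one row of the per-component feature table (hypothetical column name):
# row = spectral_rms_per_bin(interval_df['gearbox'].to_numpy())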
Questions:
Does it look like I have enough data for this type of task? 415 intervals x 4 different turbines = 1660 rows and approx. 20 features
Should the data be treated as a time series? (It is sampled for 10 seconds once a day at random times..)
What other unsupervised ML models/approaches could be good for this task?
I hope this was clearly written. Thanks in advance for any input!

Forecasting product return rates based on past returns

I have a waterfall dataset which shows the returns of laptops over different weeks due to defects. For each month of sales, it shows returns over the following months.
I transformed the data to a weekly level: for example, the returns column will be the returns in the first month for each sale month (the sum of the first diagonal).
I fit the data to a 2-parameter Weibull distribution, which models failure rates (to approximate a bathtub curve).
I then used the fitted curve to estimate reliability from the predicted CDF, and from that the unreliability rate is predicted.
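For reference, a sketch of this fit with SciPy, assuming individual times-to-return can be recovered (or approximated) from the waterfall; the example weeks below are made up:

import numpy as np
from scipy import stats

# Hypothetical weeks-to-return for individual returned units
weeks_to_return = np.array([1, 2, 2, 3, 5, 8, 13, 20, 30, 45], dtype=float)

# 2-parameter Weibull: fix the location at 0 and fit shape and scale
shape, loc, scale = stats.weibull_min.fit(weeks_to_return, floc=0)

weeks = np.arange(1, 53)
unreliability = stats.weibull_min.cdf(weeks, shape, loc=loc, scale=scale)  # F(t)
reliability = 1.0 - unreliability                                          # R(t)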
However, the Weibull distribution does not accurately model a bathtub curve, which is common with failure rates, and hence the accuracy is quite low.
Is there a better approach to this problem of predicting return rates?

Holt-Winters for multi-seasonal forecasting in Python

My data: I have two seasonal patterns in my hourly data... daily and weekly. For example... each day in my dataset has roughly the same shape based on hour of the day. However, certain days like Saturday and Sunday exhibit increases in my data, and also slightly different hourly shapes.
(Using Holt-Winters, as I discovered here: https://gist.github.com/andrequeiroz/5888967)
I ran the algorithm using 24 as my periods per season and forecasting out 7 seasons (1 week). I noticed that it would over-forecast the weekdays and under-forecast the weekend, since it's estimating the Saturday curve from Friday's curve and not from a combination of Friday's curve and the previous Saturday's curve. What would be a good way to include a secondary period in my data, as in both 24 and 7? Is there a different algorithm that I should be using?
One obvious way to account for different shapes would be to use just one sort of period, but make it have a periodicity of 7*24, so you would be forecasting the entire week as a single shape.
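In Python that idea can be tried directly with statsmodels' Holt-Winters implementation; a sketch with a random stand-in series (the additive trend and seasonal components are an assumption about your data):

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Stand-in for the real hourly series
idx = pd.date_range('2023-01-01', periods=24 * 7 * 10, freq='H')
y = pd.Series(np.random.rand(len(idx)), index=idx)

fit = ExponentialSmoothing(
    y,
    trend='add',
    seasonal='add',
    seasonal_periods=7 * 24,   # one "season" = the full weekly shape
).fit()
forecast = fit.forecast(7 * 24)  # one week ahead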
Have you tried linear regression, in which the predicted value is a linear trend plus a contribution from dummy variables? The simplest example to explain would be trend plus only a daily contribution. Then you would have
Y = X*t + c + A*D1 + B*D2 + ... + F*D6 (+ noise)
Here you use linear regression to find the best fitting values of X, c, and A...F. t is the time, counting up 0, 1, 2, 3,... indefinitely, so the fitted value of X gives you a trend. c is a constant value, so it moves all the predicted Ys up or down. D1 is set to 1 on Tuesdays and 0 otherwise, D2 is set to 1 on Wednesdays and 0 otherwise... D6 is set to 1 on Sundays and 0 otherwise, so the A..F terms give contributions for days other than Mondays. We don't fit a term for Mondays because if we did then we could not distinguish the c term - if you added 1 to c and subtracted one from each of A..F the predictions would be unchanged.
Hopefully you can now see that we could add 23 terms to account for a shape over the 24 hours of each day, and a total of 46 terms to account for one shape over the 24 hours of each weekday and a different one over the 24 hours of each weekend day.
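A Python sketch of exactly this regression, using statsmodels' formula interface on a random stand-in series (the column names and hourly index are assumptions):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Stand-in hourly data; replace with the real series
idx = pd.date_range('2023-01-01', periods=24 * 7 * 8, freq='H')
df = pd.DataFrame({'y': np.random.rand(len(idx))}, index=idx)

df['t'] = np.arange(len(df))      # the X*t trend term
df['hour'] = df.index.hour        # gives the 23 hour-of-day dummies
df['dow'] = df.index.dayofweek    # gives the 6 day-of-week dummies

# C() treats the columns as categorical and drops one reference level each,
# which is the "no separate Monday term" argument made above.
fit = smf.ols('y ~ t + C(hour) + C(dow)', data=df).fit()
fitted = fit.fittedvalues

# For the 46-term weekday/weekend version, one could instead let the hourly shape
# interact with a weekend flag, e.g. 'y ~ t + C(hour) * weekend + C(dow)'.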
You would do best to look for a statistical package to handle this for you, such as the free R package (http://www.r-project.org/). It does have a bit of a learning curve, but you can probably find books or articles that take you through using it for just this sort of prediction.
Whatever you do, I would keep on checking forecasting methods against your historical data - people have found that the most accurate forecasting methods in practice are often surprisingly simple.
