I have tried to go through many data sources but failed to find intraday historical data for the Hang Seng Composite Index (whether free or paid). Long story short, I am now trying to get intraday historical data for the HSCI from TradingView. Recently they added a feature called deep backtesting, which allows testing strategies against more high-frequency historical data than before. However, for some unknown reason, you cannot access the data itself, although you can see the history of your trades when testing a strategy. So I had the idea of simply writing a script (strategy) that buys every 5 minutes; that way I would be able to see the historical intraday price at 5-minute intervals. However, I am an absolute beginner when it comes to coding, so I have no idea how to write such a script. Please share any ideas; I will be extremely grateful for help!
Again, I have tried lots of different sources but couldn't find where to buy such a dataset.
My friend is doing his thesis on fish spawning in rivers. For this, he collects hours of audio that he then analyzes manually in Audacity, looking for specific patterns in the spectrograms that might indicate the sound of fish spawning.
Since he has days' worth of data, I proposed a challenge to myself: to create an algorithm that might help him detect these patterns.
I am fairly new to machine learning, but a junior in programming, and this sounds like a fun learning experience.
I identify the main problems as:
samples are 1 hour in length;
noise in the background (such as cars and the river itself).
Is this achievable with machine learning, or should I look into other options? If yes, which ones?
Thank you for taking the time to read!
The first step would be to convert the sound signals into features that machines can understand. Maybe look into MFCCs for that.
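As a rough illustration of that first step, here is a minimal sketch of extracting MFCC frames from a long field recording. It assumes the librosa package (any MFCC implementation would do), and the file name is a hypothetical placeholder.

```python
# Hedged sketch: MFCC frames from a long field recording, assuming librosa.
import librosa
import numpy as np

# Load one hour-long recording; sr=None keeps the original sample rate.
y, sr = librosa.load("river_recording.wav", sr=None)

# 13 MFCCs per short analysis frame; each column describes one slice of audio.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# First-order deltas capture how the spectrum changes over time.
delta = librosa.feature.delta(mfcc)
features = np.vstack([mfcc, delta])   # shape: (26, n_frames)

print(features.shape)
```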
Given that you have an appropriate feature representation of your problem domain, the main thing to consider would be what kind of machine learning algorithm you would apply. Unless you would like to sit and annotate hours of data, naive supervised learning is off the table.
I think your best bet would be to modify VAD (voice activity detection) algorithms or, better yet, speaker recognition/identification models.
You could also approach it by first having a complex enough representation that allows you to "see" the sound, and then comparing it with every frame of a specific length in the test data. It might be useful to check out DTW (Dynamic Time Warping).
If you have not designed such models before, it will be a bit difficult and might take quite a long time.
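To make the DTW idea above concrete, here is a minimal sketch that slides a short, hand-picked "spawning sound" template over a long recording and scores each window by DTW alignment cost. It assumes librosa; the file names, window length, and hop size are hypothetical.

```python
# Hedged sketch: DTW template matching over MFCC frames, assuming librosa.
import librosa
import numpy as np

def mfcc_of(path, offset=0.0, duration=None):
    y, sr = librosa.load(path, offset=offset, duration=duration)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

template = mfcc_of("spawning_template.wav")   # a few seconds, hand-annotated
hop = 2.0                                     # slide in 2-second steps
win = 4.0                                     # compare 4-second windows

scores = []
for start in np.arange(0, 3600, hop):         # one hour of audio
    # Reloading per window keeps the sketch simple; in practice you would load
    # the file once and slice the MFCC matrix instead.
    window = mfcc_of("river_recording.wav", offset=float(start), duration=win)
    # dtw returns the accumulated cost matrix; the bottom-right entry is the
    # total alignment cost (lower = more similar to the template).
    D, _ = librosa.sequence.dtw(X=template, Y=window, metric="euclidean")
    scores.append((float(start), D[-1, -1]))

# The lowest-cost windows are the best candidates to review manually.
scores.sort(key=lambda s: s[1])
print(scores[:10])
```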
I'm doing a project where I have data from 100 sensors and their cycles until they break. The data shows a lot of characteristics up to each failure, and then continues for the replacement sensor. With this data, I have to build a model that can predict how long a sensor will work until its failure, but using only a few data points, not the full cycle. I have no idea which machine learning model is suitable for this.
The type of problem you are describing is known as survival analysis. A wide range of both statistical and machine learning methods are available to help you solve this type of problem.
What is great about these methods is that they also allow you to use data points where the event you are interested in has not yet occurred (so-called censored observations). In your example, it means you can possibly extend your dataset by including data from sensors that have not failed yet.
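A minimal sketch of how such data (including censored sensors) can be fed to a survival model, assuming the lifelines package; the column names and toy values are hypothetical placeholders, and your real dataset of 100 sensors would replace the tiny frame below.

```python
# Hedged sketch using lifelines (an assumed choice; scikit-survival is another
# option). One row per sensor: covariates, cycles survived, and whether the
# failure was actually observed (failed=1) or the sensor is still running
# (failed=0, a censored observation).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "temperature": [61.0, 75.0, 58.0, 70.0, 64.0, 80.0],
    "vibration":   [0.12, 0.30, 0.10, 0.25, 0.15, 0.33],
    "cycles":      [410, 300, 520, 350, 480, 290],
    "failed":      [1, 1, 1, 1, 0, 0],   # last two have not failed yet
})

# A small penalizer helps the fit behave on tiny illustrative datasets.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="cycles", event_col="failed")
cph.print_summary()

# Predicted median lifetime for sensors described only by their covariates.
new_sensors = df.drop(columns=["cycles", "failed"]).head(2)
print(cph.predict_median(new_sensors))
```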
When you look at the methods, I suggest you also spend some time examining how to evaluate these types of models, since the evaluation methods are also slightly different than in typical machine learning problems.
A comprehensive range of techniques is available at: http://dmkd.cs.vt.edu/TUTORIAL/Survival/Slides.pdf
Hi all.
I have a question on how to add missing values to a dataset object.
I'm currently working on crop growth modelling, and I use the NASA Power API as the weather dataset.
However, the NASA Power dataset has missing days.
I used the pcse library to retrieve the NASA Power dataset.
My question is: how do I add the missing days' data?
I tried
wdp(date) = wdp(date-timedelta(days=1))
but it gives back a "can't assign to function call" error.
In any case, it seems that the data for the missing date does not exist in the object, and I am not able to add it.
You have the right idea, but the wrong syntax. In Python, list and dict access uses square brackets ([]), see the docs.
To add to that, pcse's WeatherDataProvider object does not support this style of access. Checking out the code in this link, it appears there is a method you can call named _store_WeatherDataContainer, where the leading _ indicates it is not intended for public use, but that doesn't mean you can't :-)
It should look like this:
wdp._store_WeatherDataContainer(wdp(date-timedelta(days=1)), date)
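Putting the pieces together, here is a minimal, hedged sketch of filling a gap by copying the previous day's container, assuming pcse's NASAPowerWeatherDataProvider and the internal method mentioned above; the coordinates and the missing date are hypothetical placeholders, and the internal API may change between pcse versions.

```python
# Hedged sketch: fill a missing day in a NASA Power weather dataset by copying
# the previous day's WeatherDataContainer. Illustrative only.
import datetime as dt
from pcse.db import NASAPowerWeatherDataProvider

wdp = NASAPowerWeatherDataProvider(latitude=52.0, longitude=5.0)

missing_dates = [dt.date(2021, 7, 14)]   # hypothetical gap in the dataset

for day in missing_dates:
    previous = wdp(day - dt.timedelta(days=1))   # container for the day before
    # _store_WeatherDataContainer is not part of the public interface, so it
    # may change between pcse versions.
    wdp._store_WeatherDataContainer(previous, day)

print(wdp(dt.date(2021, 7, 14)))
```

Note that this simply repeats the previous day's weather for the missing date, which is the same approximation the one-liner above makes.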
I was just wondering whether anyone has any thoughts on best practices when working in Databricks. Developing within Databricks is costing a lot financially, so I would like to know where else it would be best to develop Python code. With thought also to collaborative work, is there a setup similar to Databricks that is free or of little cost to use?
Any suggestions, greatly appreciated!
The cost of Databricks is really related to the size of the clusters you are running (1 worker and 1 driver, or 1 driver and 32 workers?), the spec of the machines in the cluster (low or high RAM and CPU), and how long you leave them running (always on, or a short time-to-live, aka "Terminate after x minutes of inactivity"). I am also assuming you are not running the always-on High Concurrency cluster mode.
Some general recommendations would be:
work with smaller datasets in dev, e.g. representative samples, which would enable you to...
work with smaller clusters in dev, e.g. instead of working with large 32-node clusters, work with small 2-node clusters
set the time-to-live to something short, e.g. 15 minutes
which together would reduce your cost.
Obviously there is a trade-off in assembling representative samples and making sure your outputs are still accurate and useful, but that's up to you.
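As one way of putting the "smaller datasets in dev" idea into practice, here is a minimal sketch of developing locally against a representative sample before promoting code to Databricks. It assumes a local pyspark installation; the paths and sample fraction are hypothetical.

```python
# Hedged sketch: free local development on a small representative sample,
# assuming pyspark is installed locally.
from pyspark.sql import SparkSession

# A single-machine Spark session costs nothing to run and is good enough for
# writing and unit-testing transformation logic.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("dev-sandbox")
    .getOrCreate()
)

full = spark.read.parquet("data/events.parquet")

# Work against a ~1% sample in dev; rerun against the full data on a small
# Databricks cluster only once the logic is stable.
sample = full.sample(fraction=0.01, seed=42)
sample.write.mode("overwrite").parquet("data/events_sample.parquet")
```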
I'm developing a little tool which is able to classify musical genres. To do this, I would like to use a k-NN algorithm (or another one, but this one seems good enough), and I'm using python-yaafe for the feature extraction.
My problem is that when I extract a feature from a song (for example, MFCC), since my songs are sampled at 44100 Hz, I get back a large number of 12-value arrays (one per sample window), and I really don't know how to deal with that. Is there an approach to get just one representative value per feature and per song?
One approach would be to take the least RMS energy value of the signal as a parameter for classification.
You should use a segment of the music rather than the whole file for classification. Theoretically, a 30-second excerpt starting after the first 30 seconds of the track is most representative for genre classification.
So instead of taking the whole array, consider only the part that corresponds to this time window (30 s to 59 s). Calculate the RMS energy of the signal separately for every music file, averaged over that window. You may also take other features into account, e.g. MFCCs.
To use MFCCs, you can take the values averaged over all signal windows for a particular music file and make a feature vector out of it.
You may use the difference between the features as the distance between the data points for classification.
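A minimal sketch of this pipeline: one fixed-length feature vector per song (window-averaged MFCCs plus mean RMS energy over the 30-59 s excerpt), then a simple k-NN classifier. It uses librosa and scikit-learn as stand-ins for python-yaafe and a hand-rolled k-NN; the file names and labels are hypothetical.

```python
# Hedged sketch: averaged MFCC + RMS feature vectors and k-NN classification.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def song_features(path):
    # Load only the 30-second excerpt starting 30 seconds into the track.
    y, sr = librosa.load(path, offset=30.0, duration=30.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)   # (12, n_windows)
    rms = librosa.feature.rms(y=y)                        # (1, n_windows)
    # Average over all windows so every song yields one vector of length 13.
    return np.concatenate([mfcc.mean(axis=1), rms.mean(axis=1)])

train_paths = ["rock1.mp3", "rock2.mp3", "jazz1.mp3", "jazz2.mp3"]
train_labels = ["rock", "rock", "jazz", "jazz"]

X = np.array([song_features(p) for p in train_paths])
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X, train_labels)

print(knn.predict([song_features("unknown_song.mp3")]))
```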