I'm currently writing a small program in Python to visualize the live price of a crypto asset. I already have the data (historical data, plus current data updated every second). I just want to find a good Python library (as optimized as possible) to show a candlestick chart and, eventually, some indicators or lines/curves on the same graph. I did some quick research and it seems like "plotly" (used with "cufflinks") or "bokeh" are good choices. Which one would you advise, and why? I'm also open to suggestions of other libraries if they are good and optimized!
Thank you in advance :)
Take a look at https://github.com/highfestiva/finplot, where you can find examples of fetching real-time data from crypto exchanges. The author notes that the library is designed with speed and crypto in mind.
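For instance, a minimal sketch adapted from the finplot README (the dummy DataFrame stands in for your own per-second feed):

import pandas as pd
import finplot as fplt

# Candles in a pandas DataFrame indexed by timestamp, with
# open/close/high/low columns; replace with your real data.
df = pd.DataFrame({
    'open':  [100, 102, 101],
    'close': [102, 101, 104],
    'high':  [103, 103, 105],
    'low':   [ 99, 100, 100],
}, index=pd.date_range('2023-01-01', periods=3, freq='1min'))

fplt.candlestick_ochl(df[['open', 'close', 'high', 'low']])
fplt.show()

The repo's examples also show how to refresh the plot on a timer for live data and how to overlay indicators on the same axes.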
Looks very nice.
I have to develop a relatively simple real-time data visualization module in Python, but I don't know where to begin or what tools to use.
Essentially, I would have two images drawn on either side of the screen, and depending on the values streamed through Lab Streaming Layer (LSL), the images would change size. That's it.
Any pointers would be extremely appreciated.
Maybe this would help: BrainStreamingLayer, a higher-level implementation around pyLSL: https://github.com/bsl-tools/bsl
It has a real-time visualization module; however, its initial use case is EEG amplifiers, so some adaptation may be required.
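If you want to start from pylsl directly, the receiving side can be a minimal loop like this (the stream name 'values' is a placeholder for whatever your LSL outlet publishes):

from pylsl import StreamInlet, resolve_stream

# Resolve the LSL stream by name and open an inlet on it.
streams = resolve_stream('name', 'values')
inlet = StreamInlet(streams[0])

while True:
    sample, timestamp = inlet.pull_sample()
    # e.g. let sample[0] drive the size of the left image and
    # sample[1] the size of the right image
    print(timestamp, sample)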
I began to fall in love with a Python visualization library called Altair, and I use it with every small data science project that I've done.
Now, in terms of industry use cases, does it make sense to visualize big data, or should we just take a random sample?
Short answer: no, if you're trying to visualize data with tens of thousands of rows or more, Altair is probably not the right tool. But there are efforts in progress to add support for larger datasets in the Vega ecosystem; see https://github.com/vega/scalable-vega.
I am trying to histogram counts of a large (300,000-record) temporal data set. For now I am just trying to histogram by month, which is only 6 data points, but doing this with either JSON or altair_data_server storage makes the page crash. Is this impossible to handle well in pure Altair? I could of course preprocess in pandas, but that ruins the wonderful declarative nature of Altair.
If so, is this a missing feature of Altair, or is it out of scope? I'm learning that Vega-Lite stores the entire underlying data and applies the transformation at run time, but it seems like Altair could (and maybe does) have a way to store only the data relevant to the chart.
import altair as alt

# df holds the ~300,000-row DataFrame with a 'timestamp' column
alt.Chart(df).mark_bar().encode(
    x=alt.X('month(timestamp):T'),
    y='count()'
)
Altair charts work by sending the entire dataset to your browser and processing it in the frontend; for this reason it does not work well for larger datasets, no matter how the dataset is served to the frontend.
In cases like yours, where you are aggregating the data before displaying it, it would in theory be possible to do that aggregation in the backend, and only send aggregated data to the frontend renderer. There are some projects that hope to make this more seamless, including scalable Vega and altair-transform, but neither approach is very mature yet.
In the meantime, I'd suggest doing your aggregations in Pandas, and sending the aggregated data to Altair to plot.
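For example, a minimal sketch of that approach, assuming df has a datetime 'timestamp' column as in your snippet:

import altair as alt
import pandas as pd

# Aggregate the 300,000 rows down to monthly counts in pandas,
# then hand only the tiny aggregated frame to Altair.
monthly = (
    df.groupby(df['timestamp'].dt.to_period('M').dt.to_timestamp())
      .size()
      .reset_index(name='count')
      .rename(columns={'timestamp': 'month'})
)

alt.Chart(monthly).mark_bar().encode(
    x=alt.X('month:T'),
    y='count:Q'
)

This sends only 6 rows to the browser instead of 300,000, so the frontend has nothing heavy left to do.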
Edit 2023-01-25: VegaFusion addresses this problem by automatically pre-aggregating the data on the server and is mature enough for production use. Version 1.0 is available under the same license as Altair.
Try the following:

alt.data_transformers.enable('default', max_rows=None)

and then

alt.Chart(df).mark_bar().encode(
    x=alt.X('month(timestamp):T'),
    y='count()'
)

You will get the chart, but make sure to save all of your work first, as the browser may still crash.
Using the following works for me (it requires the altair_data_server package mentioned in the question):

alt.data_transformers.enable('data_server')
Situation
Our company generates waste at various locations in the US. The waste is taken to different locations depending on the suppliers' treatment methods and the facilities they have placed nationally.
Consider a waste stream A that is generated at location X. The overall cost of taking care of stream A includes the transportation cost from our site as well as the treatment cost. This data is tabulated.
What I want to achieve
I would like my Python program to import an Excel table containing this data, plot the distance between our facility and the treatment facility in a hub-and-spoke style picture (just like airlines use), and show the treatment method as a color or something similar, just like on Google Maps.
Can someone give me leads on where I should start, or which Python API or module might best suit my scenario?
This is a rather broad question, and perhaps not the best fit for SO.
Now to answer it: you can read CSV files exported from Excel with the csv module. Plotting is best done with matplotlib.pyplot.
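For instance, a minimal hub-and-spoke sketch (the file name and column names are hypothetical; adapt them to your table):

import csv
import matplotlib.pyplot as plt

# One row per route: coordinates of our facility (the hub) and the
# treatment facility (the spoke end), plus the treatment method.
with open('waste_routes.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# Color each spoke by treatment method.
colors = {'incineration': 'red', 'landfill': 'brown', 'recycling': 'green'}

for row in rows:
    xs = [float(row['facility_lon']), float(row['treatment_lon'])]
    ys = [float(row['facility_lat']), float(row['treatment_lat'])]
    plt.plot(xs, ys, color=colors.get(row['treatment_method'], 'gray'))

plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

If you later want real map tiles underneath, libraries such as folium or cartopy build on the same idea.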
I want to build an analytics engine on top of an article publishing platform. More specifically, I want to track users' reading behaviour (e.g. number of views of an article, time spent with the article open, rating, etc.), as well as statistics on the articles themselves (e.g. number of paragraphs, author, etc.).
This will have two purposes:
Present insights about users and articles
Provide recommendations to users
For the data analysis part I've been looking at Cubes, pandas, and PyTables. There is a lot of data, and it is stored in MySQL tables; I'm not sure which of these packages would best handle such a backend.
For the recommendation part, I'm simply thinking about feeding data from the data analysis engine to a clustering model.
Any recommendations on how to put all this together, as well as cool Python projects out there that could help me out?
Please let me know if I should give more information.
Thank you
Scikit-learn should make you happy for the data processing (clustering) part.
For the analysis and visualization side, you have Cubes, as you mentioned; for viz I use CubesViewer, which I wrote.
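To make the clustering part concrete, here is a minimal sketch (the table name, column names, and MySQL URL are placeholders):

import pandas as pd
from sklearn.cluster import KMeans
from sqlalchemy import create_engine

# Pull per-user reading stats out of MySQL into a DataFrame.
engine = create_engine('mysql+pymysql://user:pass@localhost/analytics')
df = pd.read_sql(
    'SELECT user_id, views, avg_time_open, avg_rating FROM user_stats',
    engine,
)

# Cluster users on their reading behaviour; the cluster labels can
# then feed your recommendation step.
features = df[['views', 'avg_time_open', 'avg_rating']]
df['cluster'] = KMeans(n_clusters=5, random_state=0).fit_predict(features)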