How to build a data streaming pipeline with Python?

I want to build an application that analyzes tweets in real time and applies a machine-learning model.
What architecture would integrate most easily with the Python ecosystem? I've found Faust from Robinhood, but it looks like it is no longer maintained.
Would really appreciate your suggestions.

Related

Running a Python Script on a Website (in the background)

Firstly, apologies for the very basic question. I have looked into other answers but they haven't quite answered what I'm after. I'm confident designing a site in HTML/CSS and have very very basic knowledge of Python.
I want to run a very basic Python script on my website. It analyses tweets about a specific topic, and then posts a sentiment analysis score. I want it to run this sentiment analysis every hour and cache the score.
I have a working Python script which does this in Jupyter Notebook. Could you give me an overview of how I would make this script function online and cache the results? I've read into using Python web frameworks, but from my limited understanding, they seem like overkill?
Thank you for your help!

Could you give me an overview of how I would make this script function online
The key is to decouple the two parts of your system:
Producing the data
Showing it on a website
So the first thing to do is have your sentiment-analysis script push its value to a database. The database could be something as simple as a CSV file, or it could be a key/value store, or something like MySQL or CouchDB (or any of hundreds of other choices).
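As a rough sketch of that "push its value to a database" step, assuming SQLite as the store (just one of the many possible choices) and hypothetical names throughout (`sentiment.db`, the `scores` table, `save_score`, `latest_score`), the script could write each score and the website side could read the most recent one back:

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timezone

DB_PATH = "sentiment.db"  # hypothetical file name, not from the question


def _connect(db_path):
    """Open the database and make sure the (hypothetical) scores table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        "created_at TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def save_score(score, db_path=DB_PATH):
    """Producer side: append one sentiment score with a UTC timestamp."""
    with closing(_connect(db_path)) as conn:
        with conn:  # commits the insert on success, rolls back on error
            conn.execute(
                "INSERT INTO scores (created_at, score) VALUES (?, ?)",
                (datetime.now(timezone.utc).isoformat(), score),
            )


def latest_score(db_path=DB_PATH):
    """Website side: return the most recently stored score, or None."""
    with closing(_connect(db_path)) as conn:
        row = conn.execute(
            "SELECT score FROM scores ORDER BY rowid DESC LIMIT 1"
        ).fetchone()
    return row[0] if row else None
```

The existing analysis script would call `save_score(value)` at the end of each run (for example from an hourly scheduled job), and whatever serves the page would only ever call `latest_score()`, so the two halves never need to know about each other.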
Over on the website you have to make a decision between:
Server-side
Client-side
If the former, you could program in Python if that is what you are most familiar with. Whatever language/framework combination you go for, there will be an example tutorial showing how to read a value from a database and display it: it is just about the most fundamental thing.
If client-side, you will usually be programming in JavaScript. Again, you need to choose a framework, but again you should easily be able to find a tutorial to follow.
(Unless you have a good reason to prefer server-side, such as familiarity with an existing framework, or security issues with accessing your database, I'd go with a client-side approach.)
I've read into using Python web frameworks... overkill?
Yes and no. You are going to need some kind of database, and some kind of framework. It would be good to understand the basics of web security, too. If the sentiment analysis is your major goal, all that is going to be a distraction, and it might be better to find a friend who already knows web programming to work with. Or just find a tutorial that is very close to what you want to do, and adapt that.
(P.S. I was going to flag your question as "too broad", but you did ask for an overview, so I hope this helps.)

Is there a way to use a pre-trained R ML model in a Python web app?

More of a theoretical question:
Use case: Create an API that takes JSON input, runs an ML algorithm on it, and returns the result to the user.
I know that in the case of a Python ML model, I could just pack the whole thing into a pickle and use it easily inside my web app. The problem is that all our algorithms are currently written in R, and I would rather avoid rewriting them in Python. I have checked a few libraries that allow running R code from Python, but I cannot find a way to pack it "in a pickle way" and then just use it.
It may be stupid but I have not had much to do with R so far.
Thank you in advance for any suggestions!

Not sure what calling R code from Python has to do with ML models.
If you have a trained model, you can try converting it to the ONNX format (an emerging standard) and then using the result from Python.

Data Privacy with Tensorboard

I've recently begun using Tensorflow via Keras and Python 3.5 to analyze company data, and I am by no means an expert and only recently built my first "real-world" model.
With my experimental data I used Tensorboard to visualize how my neural network was working, and I would like to do the same with my real data. However, my company is extremely strict about company data leaving our servers, so my question is this:
Does Tensorboard take the raw data used in the model and upload it off-site to generate its reports/visuals, or does it only use processed data/results from my model?
I've done several google searches already, and I haven't found anything conclusive one way or the other.
If I'm not asking this question correctly, please let me know - I'm new to all of this.
Thank you.

No. Tensorboard does not upload the data to "the cloud" or anywhere else outside the computer where it is running; it just interprets data produced by the model.

extract grammar features from sentence on Google App Engine

For my GAE app I need to do some natural language processing to extract the subject and object from an input sentence.
Apparently NLTK can't be installed (easily) on GAE so I am looking for another solution.
I noticed GAE comes with Antlr3 but from browsing their documentation it solves a different kind of grammar problem.
Any ideas?

You can easily build an NLTK RPC server on some machine and access it.
Another option is to find a web-based service that already does that (such as OpenCalais).
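The RPC-server option could look roughly like the sketch below. It assumes XML-RPC from the Python standard library as the transport (the answer does not name one), and `extract_subject_object` is a hypothetical placeholder that just picks the first and last word; on the real server it would be replaced by an actual NLTK parse. `start_server` is likewise a made-up helper name.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def extract_subject_object(sentence):
    """Placeholder only: first word as subject, last word as object.

    A real server would run an NLTK-based parse here instead.
    """
    words = sentence.rstrip(".!?").split()
    if len(words) < 3:
        return {"subject": "", "object": ""}
    return {"subject": words[0], "object": words[-1]}


def start_server(host="127.0.0.1", port=0):
    """Start the RPC server in a background thread (port 0 = any free port)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(extract_subject_object)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # The web app would make this call over the network to the NLP machine.
    client = ServerProxy(f"http://{host}:{port}")
    print(client.extract_subject_object("Alice reads books."))
    server.shutdown()
    server.server_close()
```

The client half is shown with `xmlrpc.client` only for illustration; how the App Engine app makes the outbound request depends on what its runtime allows.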

With regards to the NLTK problem specifically, my solution would probably be to fix the weird imports that NLTK is doing, and use that as originally planned. When you're done, submit a patch of course.
That said, if this ultimately involves touching the data store, the answer is that it probably can't be done in a performant way, unless your data set is small or for some reason your NLP stuff doesn't need to hit some kind of full-text index. The GAE guys are working on it, but they have indicated that no one should be expecting a quick resolution to this particular issue.

Yahoo Chat in Python

I'm wondering what the best way to interface with Yahoo Chat is. I haven't found anything that looks especially easy yet. One thought is to build it all from scratch; the other is to grab the code from open source software. I could use something like zinc, but that may be more complex than it needs to be. Another option would be to find a library that supports it, but I haven't seen one. What are your thoughts on how to proceed, and what would be the best way? I'm not necessarily looking for the fastest way, as this is a bit of a learning project for me.

Python-purple is a Python API for accessing libpurple, the Pidgin backend. It will give you access to all the IM networks that Pidgin supports, including Y!Messenger, MSN Messenger, Jabber/GTalk/XMPP, and more.
