I am running tweepy and trying to run an advanced query through the "search_users" API. I am noticing a big difference between the results from the API and those from Twitter's web people search, even when the exact same query is passed. Any thoughts?
Example
query = 'mustang AND near:"New Haven" AND within:15mi'
users = api.search_users(q=query)  # api is an authenticated tweepy.API instance; search_users is not a module-level function
Is there a difference? Is there another API call I should look at?
After some research I have discovered that the user search endpoint does not currently support advanced search parameters. This is unfortunate. If this changes I will come back and update this answer.
Related
users = api.search_users(f"{name} AND (car OR racing OR F1 OR driver OR {city})", count=10)
I'm writing a small script to search for people on Twitter by keywords using tweepy and its search_users method, but I always get a blank response.
Maybe someone has encountered this, or knows how it works?
Twitter’s user search API does not support any of these kinds of filters or advanced searches, so Tweepy does not, either.
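For reference, a plain keyword query is the most this endpoint will honour. A minimal sketch (the credentials are placeholders; count follows the usage shown above):
import tweepy

# Placeholder credentials -- substitute your own app keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

# Only plain keyword matching is honoured; operators such as near: are ignored.
for user in api.search_users(q="mustang", count=10):
    print(user.screen_name, user.location)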
Firstly, apologies for the very basic question. I have looked into other answers but they haven't quite answered what I'm after. I'm confident designing a site in HTML/CSS and have very very basic knowledge of Python.
I want to run a very basic Python script on my website. It analyses tweets about a specific topic, and then posts a sentiment analysis score. I want it to run this sentiment analysis every hour and cache the score.
I have a working Python script which does this in Jupyter Notebook. Could you give me an overview of how I would make this script function online and cache the results? I've read into using Python web frameworks, but from my limited understanding, they seem like overkill?
Thank you for your help!
Could you give me an overview of how I would make this script function online
The key thing would be to uncouple the two parts of your system:
Producing the data
Showing it in a website.
So the first thing to do is have your sentiment-analysis script push its value to a database. The database could be something as simple as a CSV file, or it could be a key/value store, or something like MySQL or CouchDB (or hundreds of other choices).
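A minimal sketch of that first step, assuming SQLite as the store (the file name sentiment.db and the hard-coded score are placeholders for your existing analysis):
import sqlite3, time

def save_score(score):
    # Append one (timestamp, score) row per hourly run.
    conn = sqlite3.connect("sentiment.db")
    conn.execute("CREATE TABLE IF NOT EXISTS scores (ts INTEGER, score REAL)")
    conn.execute("INSERT INTO scores VALUES (?, ?)", (int(time.time()), score))
    conn.commit()
    conn.close()

save_score(0.42)  # call this from your existing analysis script
Running the script hourly from cron (0 * * * *) gives you the hourly cadence, and the caching comes for free: the website only ever reads the last stored row.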
Over on the website you have to make a decision between:
Server-side
Client-side
If the former, you could program in Python if that is what you are most familiar with. Whatever language/framework combination you go for, there will be an example tutorial of how to read a value from a database and display it: it is just about the most fundamental thing.
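To make the server-side option concrete, a minimal sketch with Flask (one framework choice among many), reading the latest score from the SQLite table sketched above:
from flask import Flask, jsonify
import sqlite3

app = Flask(__name__)

@app.route("/score")
def latest_score():
    # Serve the most recently cached score; nothing is recomputed per request.
    conn = sqlite3.connect("sentiment.db")
    row = conn.execute("SELECT ts, score FROM scores ORDER BY ts DESC LIMIT 1").fetchone()
    conn.close()
    if row is None:
        return jsonify(score=None)
    return jsonify(timestamp=row[0], score=row[1])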
If client-side you will usually be programming in JavaScript. Again you need to choose a framework, but again you should easily be able to find a tutorial to follow.
(Unless you have a good reason to prefer server-side, such as familiarity with an existing framework, or security issues with accessing your database, I'd go with a client-side approach.)
I've read into using Python web frameworks... overkill?
Yes and no. You are going to need some kind of database, and some kind of framework. It would be good to understand the basics of web security, too. If the sentiment analysis is your major goal, all that is going to be a distraction, and it might be better to find a friend who already knows web programming to work with. Or just find a tutorial that is very close to what you want to do, and adapt that.
(P.S. I was going to flag your question as "too broad", but you did ask for an overview, so I hope this helps.)
I'm looking for a search engine that I can point to a column in my database that supports advanced functions like spelling correction and "close to" results.
Right now I'm just using
SELECT <column> FROM <table> WHERE <colname> LIKE '%<searchterm>%'
and I'm missing some results particularly when users misspell items.
I've written some code to fix misspellings by running them through a spellchecker, but thought there may be a better out-of-the-box option. Google turns up lots of options for indexing and searching an entire site, where I really just need to index and search this one table column.
Apache Solr is a great search engine that provides:
1. N-gram indexing: search not just for complete strings but also for partial substrings, which helps greatly in finding similar results.
2. An out-of-the-box spell corrector based on an edit-distance metric, which will give you a "did you mean chicago" when the user types in "chicaog".
3. A fuzzy search option out of the box. Fuzzy searches help you get close matches for your query; for example, a user typing GA-123 could also obtain VMDEO-123 as a result.
4. A "More Like This" component, which helps you surface similar records, much like the options above.
Solr (based on the Lucene search library) is open source and is steadily becoming the de facto standard in the search industry; it is excellent for database searches (indexing a database column, as you describe, is a cakewalk for Solr). Lucene and Solr are used by many Fortune 500 companies as well as internet giants.
Sphinx Search Engine is also great (I love it too, as it has a very low footprint and is written in C++), but to put it simply, Solr is much more popular.
Python support and APIs are available for both. However, Sphinx runs as a standalone daemon, while Solr exposes an HTTP interface: you simply call the Solr URL from your Python program, and it returns results that you can send to your front end for rendering. As simple as that.
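To illustrate the HTTP point, a hedged sketch of querying a local Solr core with requests (the core name "products" and field "name" are placeholders):
import requests

resp = requests.get(
    "http://localhost:8983/solr/products/select",
    params={"q": "name:chicago~", "wt": "json"},  # trailing ~ is Lucene fuzzy-match syntax
)
for doc in resp.json()["response"]["docs"]:
    print(doc)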
So far so good. Coming to your question:
First you should ask yourself whether you really require a search engine. Search engines are good for all the use cases mentioned above, but they are really made for searching across huge amounts of full-text data or millions of rows of tabular data. Algorithms like "did you mean", similar records, and spell correction can be written on top. Before settling on Solr, also search Google for (1) Peter Norvig's spell corrector and (2) n-gram indexing. It is quite possible that just a few lines of code will give you exactly what you were looking for.
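For a taste of how little code the Norvig approach needs, a minimal sketch of the edit-distance-1 idea (his full corrector also ranks candidates by corpus frequency; min() below is just a deterministic stand-in):
import string

def edits1(word):
    # All strings one edit away: deletes, transposes, replaces, inserts.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, vocabulary):
    if word in vocabulary:
        return word
    candidates = edits1(word) & vocabulary
    return min(candidates) if candidates else word

print(correct("chicaog", {"chicago", "boston"}))  # -> chicago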
I leave it up to you to decide :)
I would suggest looking into open source technologies like Sphinx Search.
Before going down the Solr/Sphinx route for full-text indexing - which adds complexity and its own overhead - you can try the built-in full-text engine in PostgreSQL if you are using that database. It's easy to set up and performs better than LIKE queries.
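A hedged sketch of what that looks like from Python with psycopg2 (the items table and name column are placeholders):
import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()
# @@ matches a tsvector against a tsquery; plainto_tsquery handles raw user input.
cur.execute(
    "SELECT name FROM items "
    "WHERE to_tsvector('english', name) @@ plainto_tsquery('english', %s)",
    ("search term",),
)
print(cur.fetchall())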
Check out https://github.com/hcarvalhoalves/django-tsearch2
I am sorry for asking, but I am new to writing crawlers.
I would like to crawl Twitter for users and the follower relationships among them, using Python.
Any recommendation for starting points such as tutorials?
Thank you very much in advance.
I'm a big fan of Tweepy myself - https://github.com/tweepy/tweepy
You'll have to refer to the Twitter docs for the API methods that you're going to need. As far as I know, Tweepy wraps all of them, but I recommend looking at Twitter's own docs to find out which ones you need.
To construct a following/follower graph, you're going to need some of these (a short sketch follows the list):
GET followers/ids - grab followers (in IDs) for a user
GET friends/ids - grab followings (in IDs) for a user
GET users/lookup - grab up to 100 users, specified by IDs
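A hedged sketch tying those three calls together (method names below are from Tweepy 3.x; Tweepy 4.x renamed them to get_follower_ids and get_friend_ids, and kept lookup_users):
import tweepy

# Placeholder credentials -- substitute your own app keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

seed = api.get_user("jack").id          # any starting account
follower_ids = api.followers_ids(seed)  # GET followers/ids
friend_ids = api.friends_ids(seed)      # GET friends/ids

# GET users/lookup resolves at most 100 IDs per call
for user in api.lookup_users(user_ids=follower_ids[:100]):
    print(user.id, user.screen_name)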
Besides reading the Twitter API docs?
A good starting point would be the great Python Twitter library by Mike Verdone, which I personally think is the best one (there is also an introduction here).
Also see this question on Stack Overflow.
I'm working on a project that is quite search-oriented. Basically, users will add content to the site, and this content should be immediately available in the search results. The project is still in development.
Up until now, I've been using Haystack with Xapian. One thing I'm worried about is the performance of the website once a lot of content is available. Indexing will have to occur very frequently if I want to emulate real-time search.
I was reading up on MongoDB recently. I haven't found a satisfying answer to my question, but I have the feeling that MongoDB might be of help for the real-time search indexing issue I expect to encounter. Is this correct? In other words, would the search functionality available in MongoDB be more suited for a real-time search function?
The content that will be available on the site is large unstructured text (including HTML) and related data (prices, tags, datetime info).
Thanks in advance,
Laundro
I don't know much about MongoDB, but I'm using Sphinx Search with great success: a simple, powerful and very fast tool for full-text indexing and search. It also provides a Python wrapper out of the box.
It would be easier to pick it up if Haystack provided bindings for it; unfortunately, Sphinx bindings are still on the wish list.
Nevertheless, setting Sphinx up is so quick (I did it in a few hours, for an existing in-production Django-based CRM) that maybe you can give it a try before switching to a more generic solution.
MongoDB is not really a dedicated full-text search engine. Based on their full-text search docs, you can only create an array of tags that duplicates the string data of other fields, which with many elements (hundreds or thousands) can make inserts very expensive.
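A small pymongo sketch of that keyword-array workaround (database and collection names are placeholders):
from pymongo import MongoClient

db = MongoClient().mydb
text = "Large unstructured text about real-time search"
db.posts.insert_one({
    "body": text,
    "tags": list(set(text.lower().split())),  # keywords duplicated from the body
})
db.posts.create_index("tags")  # multikey index over the whole array
print(db.posts.find_one({"tags": "search"}))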
I agree with Tomasz: Sphinx Search can be used for what you need - real-time indexes if you want it to be truly real time, or delta indexes if a delay of several seconds is acceptable.
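If you go the Sphinx route, a hedged sketch of querying a real-time index over SphinxQL, which speaks the MySQL protocol (port 9306 by default; the index name rt_content is a placeholder):
import pymysql

# Sphinx ignores credentials on its SphinxQL listener.
conn = pymysql.connect(host="127.0.0.1", port=9306, user="")
with conn.cursor() as cur:
    cur.execute("SELECT id FROM rt_content WHERE MATCH(%s) LIMIT 10", ("real time search",))
    print(cur.fetchall())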