API Twitter: I want to exclude a specific argument using api.search - python

I need to get tweets that contain a specific subject. I want to see what customers are saying about the company 'this_company', but I don't want tweets from 'this_company' itself. Therefore, I want to exclude screen_name = 'this_company'.
I'm using:
posts = api.search(q='this_company', lang='en', tweet_mode='extended', since='2020-07-10')
I tried to put screen_name != 'this_company', but it doesn't work (I don't think I can pass an argument with !=).
Does someone know how I can do that?

I believe you can use search operators directly in the query, as per the search API. (Some examples here.)
So you could search with q = "this_company -from:this_company"
(Untested code; some quoting might be necessary.)
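If the `-from:` operator doesn't behave as expected on your endpoint, a fallback is to filter the returned statuses client-side. A minimal sketch, using tiny stand-in classes purely for illustration (real tweepy `Status` objects expose `status.user.screen_name` the same way):

```python
def exclude_author(statuses, screen_name):
    """Drop any status authored by the given screen name (case-insensitive)."""
    return [s for s in statuses
            if s.user.screen_name.lower() != screen_name.lower()]

# Stand-in objects; with tweepy you would pass the list
# returned by api.search(...) directly.
class _User:
    def __init__(self, screen_name):
        self.screen_name = screen_name

class _Status:
    def __init__(self, screen_name, text):
        self.user = _User(screen_name)
        self.text = text

statuses = [_Status('this_company', 'check out our new product!'),
            _Status('happy_customer', 'I really like this_company')]
filtered = exclude_author(statuses, 'this_company')
print([s.user.screen_name for s in filtered])  # ['happy_customer']
```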

Related

Is there a way to search user/screen names in tweepy (or any other API package) using partial names, like a contains or % wildcard lookup?

So I am able to look up users using the code below...
search = 'Amazon'
users = api.search_users(q=search)
results = []
for user in users:
    name = user.name
    screen_name = user.screen_name
    results.append([name, screen_name])
results = pd.DataFrame(results, columns=['name', 'screen_name'])
results
...and I was wondering if there is a way to use some form of contains/is-like/% lookup when I only know part of the name. So for instance, if I was looking for Amazon, could I do something like:
api.search_users(q is like 'Amaz%')
Furthermore, I believe that the search_users function looks up by screen name. Is there a function that looks up by the display name instead?
There is no Twitter API function for this (wildcard or contains lookup, or lookup by display name instead of screen name), so there is no way for Tweepy or other libraries to offer the functionality.
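A partial workaround, assuming the accounts you want surface in the results at all, is to run a normal search_users query and filter the returned display names client-side. A sketch (the stand-in user class is illustrative; real tweepy User objects expose `.name` and `.screen_name` the same way):

```python
def filter_by_partial_name(users, fragment):
    """Keep users whose display name contains the fragment (case-insensitive)."""
    fragment = fragment.lower()
    return [u for u in users if fragment in u.name.lower()]

class _User:  # stand-in for a tweepy User object
    def __init__(self, name, screen_name):
        self.name = name
        self.screen_name = screen_name

users = [_User('Amazon', 'amazon'),
         _User('Amazon Help', 'AmazonHelp'),
         _User('Azamon Fan', 'azamon_fan')]
matches = filter_by_partial_name(users, 'Amaz')
print([u.screen_name for u in matches])  # ['amazon', 'AmazonHelp']
```

Note this only filters within whatever page of results the API returns, so it cannot find accounts the search itself never surfaces.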
As Persian (and other non-Latin-script) users, we can't use this: we can't spell our exact names in an English-alphabet user_id, so we always find and search for people by their Unicode display names.

Is it possible to set multiple strings in query for search method of tweepy? python

What I want is to search Twitter, from Python, for tweets that contain multiple words of my choosing.
The official doc does not say anything about it, but it seems that the search method only takes one query.
source code
import tweepy

CK = ''  # consumer key (fill in)
CS = ''  # consumer secret (fill in)
AT = ''  # access token (fill in)
AS = ''  # access token secret (fill in)

auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)

for status in api.search(q='word', count=100):  # I want to set multiple words in q
    print(status.user.id)
    print(status.user.screen_name)
    print(status.user.name)
    print(status.text)
    print(status.created_at)
What I have tried is below. It didn't raise any error, but it searched only with the last word in the query: in this case, the results were only tweets with the word "Python"; it did not get tweets with both words.
for status in api.search(q='Java' and 'Python', count=100):
Official doc
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
So my question is: is it possible to set multiple words in the query?
Is the way I wrote it simply wrong?
If so, please let me know.
If multiple words can't be set, I would appreciate it if you could share simple Python code that does what I want.
Thank you in advance.
Use:
for status in api.search(q='Java Python', count=100):
From the Search Tweets: Standard v1.1 section Standard search operators:
watching now - containing both “watching” and “now”. This is the default operator.
As explained by Vlad Siv, just put each word you wish to look for inside the quotation marks of the query param, separated by spaces. This will look for tweets containing all of those words.

Py-StackExchange API returns nothing for a simple query

I'm using Py-StackExchange to get a list of questions from Cross Validated. I need to filter for questions whose titles include the word "keras".
This is my code. It takes a very long time to execute and finally returns nothing.
cv = stackexchange.Site(stackexchange.CrossValidated, app_key=user_api_key, impose_throttling=True)
cv.be_inclusive()
for q in cv.questions(pagesize=100):
    if "keras" in q.title:
        print('--- %s ---' % q.title)
        print(q.creation_date)
I checked the same query manually with a search and obtained the list of questions very quickly.
How can I do the same using Py-StackExchange?
You have two options:
Use this SEDE query. This will give you all questions which contain keras in their title on Cross Validated. However, note that SEDE is updated weekly.
Use the Stack Exchange API's /search/advanced method. This method has a title parameter which accepts:
text which must appear in returned questions' titles.
I haven't used Py-StackExchange before, so I don't know how it works. Therefore, in this example I'm going to use the StackAPI library (docs):
from stackapi import StackAPI

q_filter = '!4(L6lo9D9ItRz4WBh'
word_to_search = 'keras'

SITE = StackAPI('stats')
keras_qs = SITE.fetch('search/advanced',
                      filter=q_filter,
                      title=word_to_search)

print(keras_qs['items'])
print(f"Found {len(keras_qs['items'])} questions.")
The filter I'm using here restricts which fields are returned; you can change it or not provide one at all. There's no need to provide an API key (the library supplies one) unless you have a reason to do so.
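If you do stick with the client-side approach from the question, the slow part is paging through every question over the network; the substring check itself is trivial regardless of library. A generic sketch over already-fetched items (plain dicts standing in for the API's question objects):

```python
def titles_containing(items, word):
    """Case-insensitive filter of question items by title substring."""
    word = word.lower()
    return [q for q in items if word in q['title'].lower()]

# stand-in data; with StackAPI this would be keras_qs['items']
items = [{'title': 'How to tune a Keras model?'},
         {'title': 'PCA vs. t-SNE'},
         {'title': 'keras LSTM input shape'}]
hits = titles_containing(items, 'keras')
print(len(hits))  # 2
```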

Python Whoosh - Combining Results

Thanks for taking the time to answer this in advance. I'm relatively new to both Python (3.6) and Whoosh (2.7.4), so forgive me if I'm missing something obvious.
Whoosh 2.7.4 — Combining Results Error
I'm trying to follow the instructions in the Whoosh Documentation here on How to Search > Combining Results. However, I'm really lost in this section:
# Get the terms searched for
termset = set()
userquery.existing_terms(termset)
As I run my code, it produces this error:
'set' object has no attribute 'schema'
What went wrong?
I also looked into the Whoosh API docs on this, but I just got more confused about the role of ixreader. (Or is it index.Index.reader()?) Shrugs
A Peek at My Code
Schema
schema = Schema(uid=ID(unique=True, stored=True),  # unique ID
                indice=ID(stored=True, sortable=True),
                title=TEXT,
                author=TEXT,
                body=TEXT(analyzer=LanguageAnalyzer(lang)),
                hashtag=KEYWORD(lowercase=True, commas=True,
                                scorable=True))
The relevant field names are 'hashtag' and 'body'. Hashtags are user-selected keywords for each document, and body is the text of the document. Pretty self-explanatory, no?
Search Function
Much of this is lifted directly from the Whoosh docs. Note, dic is just a dictionary containing the query string. Also, note that the error occurs during userquery.existing_terms(termset), so if the rest of it is bunk, my apologies; I haven't gotten that far.
try:
    ix = index.open_dir(self.w_path, indexname=lang)
    qp = QueryParser('body', schema=ix.schema)
    userquery = qp.parse(dic['string'])

    termset = set()
    userquery.existing_terms(termset)

    bbq = Or([Term('hashtag', text) for fieldname, text
              in termset if fieldname == 'body'])
    s = ix.searcher()
    results = s.search(bbq, limit=5)
    allresults = s.search(userquery, limit=10)
    results.upgrade_and_extend(allresults)
    for r in results:
        print(r)
except Exception as e:
    print('failed to search')
    print(e)
    return False
finally:
    s.close()
Goal of My Code
I am taking pages from different files (pdf, epub, etc.) and storing each page's text as a separate 'document' in a Whoosh index (i.e. in the field 'body'). Each 'document' is also labeled with a unique ID (uid) that lets me take the search Results and determine which file a hit comes from and which pages contain the search term (e.g. the document for page 2 of "1.pdf" has the uid 1.2). In other words, I want to give the user a list of page numbers that contain the search term, and perhaps the pages with the most hits.

For each file, the only document that has hashtags (or keywords) is the one with a uid ending in zero (i.e. page zero, e.g. uid 1.0 for "1.pdf"). Page zero may or may not have a 'body' too (e.g. the publish date, author names, summary, etc.). I did this to prevent a file with many pages from being ranked dramatically higher than one with considerably fewer pages, simply because the keyword repeats across each of its 'documents' (i.e. pages).
Ultimately, I just want the code to rank documents with the hashtag above documents that only have search hits in the body text. I thought about just boosting the hashtag field instead, but I'm not sure how the mechanics of that work, and the documentation recommends against it.
Suggestions and corrections would be greatly appreciated. Thank you again!
The code at your link doesn't look right to me; it gives me the same error. Try rearranging your code as follows:
try:
    ix = index.open_dir(self.w_path, indexname=lang)
    qp = QueryParser('body', schema=ix.schema)
    userquery = qp.parse(dic['string'])

    s = ix.searcher()
    allresults = s.search(userquery, limit=10)
    termset = userquery.existing_terms(s.reader())

    bbq = Or([Term('hashtag', text) for fieldname, text
              in termset if fieldname == 'body'])
    results = s.search(bbq, limit=5)
    results.upgrade_and_extend(allresults)
    for r in results:
        print(r)
except Exception as e:
    print('failed to search')
    print(e)
    return False
finally:
    s.close()
existing_terms requires a reader, so I create the searcher first and pass its reader in.
As for boosting a field, the mechanics are quite simple:
schema = Schema(title=TEXT(field_boost=2.0), body=TEXT).
Add a sufficiently high boost to bring hashtag documents to the top and be sure to apply a single query on both body and hashtag fields.
Deciding between boosting and combining depends on whether you want all matching hashtag documents to always appear absolutely at the top, before any other matches. If so, combine. If instead you prefer to strike a balance in relevance, albeit with a strong bias toward hashtags, boost.
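To see how a field boost shifts ranking, here is a toy scoring sketch. The numbers and the additive formula are purely illustrative (Whoosh actually uses BM25F weighting, not this simple sum); the point is only that a large enough boost pushes a hashtag match above a body-only match with many more hits:

```python
def toy_score(body_hits, hashtag_hits, hashtag_boost=1.0):
    """Illustrative additive score: each field contributes hits * weight."""
    return body_hits * 1.0 + hashtag_hits * hashtag_boost

# a body-only document with many hits vs. a single hashtag match
body_doc = toy_score(body_hits=8, hashtag_hits=0, hashtag_boost=10.0)  # 8.0
tag_doc = toy_score(body_hits=1, hashtag_hits=1, hashtag_boost=10.0)   # 11.0
print(tag_doc > body_doc)  # True: the boosted hashtag match outranks it

# with no boost, raw body hits dominate
print(toy_score(8, 0) > toy_score(1, 1))  # True
```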

Does gdata-python-client allow fulltext queries with multiple terms?

I'm attempting to search for contacts via the Google Contacts API, using multiple search terms. Searching by a single term works fine and returns contact(s):
query = gdata.contacts.client.ContactsQuery()
query.text_query = '1048'
feed = gd_client.GetContacts(q=query)
for entry in feed.entry:
    pass  # Do stuff
However, I would like to search by multiple terms:
query = gdata.contacts.client.ContactsQuery()
query.text_query = '1048 1049 1050'
feed = gd_client.GetContacts(q=query)
When I do this, no results are returned, and I've found so far that spaces are being replaced by + signs:
https://www.google.com/m8/feeds/contacts/default/full?q=3066+3068+3073+3074
I'm digging through the gdata-client-python code right now to find where it's building the query string, but wanted to toss the question out there as well.
According to the docs, both types of search are supported by the API, and I've seen some similar docs when searching through related APIs (Docs, Calendar, etc):
https://developers.google.com/google-apps/contacts/v3/reference#contacts-query-parameters-reference
Thanks!
Looks like I was mistaken in my understanding of the gdata query-string functionality.
https://developers.google.com/gdata/docs/2.0/reference?hl=en#Queries
'The service returns all entries that match all of the search terms (like using AND between terms).'
It helps to read the docs and understand them!
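Since the terms are ANDed, '1048 1049 1050' only matches contacts containing all three strings. If what you actually want is OR semantics, one workaround (my suggestion, not a gdata feature) is to issue one query per term and de-duplicate the merged results by contact id. A sketch with a hypothetical fetch function standing in for GetContacts:

```python
def search_any(terms, fetch):
    """OR semantics on top of an AND-only full-text search:
    run one query per term and merge results by entry id."""
    seen = {}
    for term in terms:
        for entry in fetch(term):
            seen.setdefault(entry['id'], entry)
    return list(seen.values())

# fake backend: a contact matches if the term appears in its notes
contacts = [{'id': 'a', 'notes': 'order 1048'},
            {'id': 'b', 'notes': 'orders 1048 1049'},
            {'id': 'c', 'notes': 'order 1050'}]

def fake_fetch(term):
    return [c for c in contacts if term in c['notes']]

merged = search_any(['1048', '1049', '1050'], fake_fetch)
print(sorted(c['id'] for c in merged))  # ['a', 'b', 'c']
```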
