In Twitter search, I found a new parameter, src, which can be set to 'typd' or 'sprv', and I get different results for each value. But I can't figure out what the terms 'typd' and 'sprv' mean.
For example:
https://twitter.com/search?q=Technology&src=typd
https://twitter.com/search?q=Technology&src=sprv
'sprv' and 'typd' relate to Twitter's spelling correction system. As Leb said, 'typd' indicates results from a query that was typed-in and may be incorrect; while 'sprv' is a clear "no, I really meant this".
For example, if I type 'flayrah' into the search bar I get results for 'flayra' at URL https://twitter.com/search?q=flayrah&src=typd and the text "Showing results for flayra. Search for flayrah instead."
Clicking the link brings me to https://twitter.com/search?q=flayrah&src=sprv with results for 'flayrah'.
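To compare the two behaviours side by side, a small sketch like the following builds both URLs. The helper name search_url is hypothetical, and src is an undocumented web parameter, so what it does may change without notice.

```python
from urllib.parse import urlencode


def search_url(query, src):
    """Build a twitter.com web search URL (hypothetical helper; src is undocumented)."""
    return "https://twitter.com/search?" + urlencode({"q": query, "src": src})


# 'typd': the query as typed, which the site may spell-correct.
typed_url = search_url("flayrah", "typd")
# 'sprv': the "search for this instead" link, which keeps the query verbatim.
verbatim_url = search_url("flayrah", "sprv")

print(typed_url)
print(verbatim_url)
```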
I'm not sure what sprv means, but those two links aren't giving me different results; they're exactly the same.
typd means that you actually typed the query into the search yourself.
Also note that searching through the previous link and searching through the REST API (per your tag) are two different things.
The Twitter Search API is part of Twitter’s v1.1 REST API. It allows queries against the indices of recent or popular Tweets and behaves similarly to, but not exactly like, the Search feature available in Twitter mobile or web clients, such as Twitter.com search.
https://dev.twitter.com/rest/public/search
There were solutions provided before, but they don't work anymore:
extract the number of results from google search
For example, the above code doesn't work anymore because the number of results doesn't even seem to be in the response. There is no resultStats ID; in my browser the result is in the element with the ID "result-status", but this doesn't exist in the response.
I don't want to use the Google API because there is a strict limit on daily searches, and I need to search for thousands of words daily. What is the solution for me?
Is it possible to use the API search in Tweepy to restrict a keyword search to only people you follow? For example, on the web, the URL would be: https://twitter.com/search?q=keyword&f=live&pf=on
I tried using the pf= parameter, since that is what shows up in the web URL. For example: pf='on' or pf=true. But it is still returning unfiltered tweets.
What I have currently:
for tweet in tweepy.Cursor(api.search,
                           q='keyword',
                           result_type='recent',
                           pf='on').items(20):
    print(tweet.text)
I don't see any reference to this in the API docs. Is there some other way it would have to be done?
There's nothing available in the search API that would let you do this using an operator directly. One (clunky) way you could achieve this would be to put all the people you are following onto a Twitter list, and then use the list:username/listname operator inside the q parameter to limit the search to the accounts in that list.
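As a rough sketch of the list approach, the helper below only builds the q string. list_query is a hypothetical name, username and listname are placeholders, and the commented-out call assumes the same Tweepy api.search method used in the question.

```python
def list_query(keyword, owner, list_slug):
    """Combine a keyword with the list: operator (hypothetical helper)."""
    return f"{keyword} list:{owner}/{list_slug}"


query = list_query("keyword", "username", "listname")
print(query)

# With an authenticated Tweepy `api` object (not created here), the query
# would be passed the same way as in the question:
# for tweet in tweepy.Cursor(api.search, q=query, result_type='recent').items(20):
#     print(tweet.text)
```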
When using Tweepy, GetOldTweets3, and Twitter Advanced Search with the following parameters:
Query: "Accident"
Place: "Dallas, TX"
Since: "2018/1/1"
Until: "2018/1/2"
The number of Tweets is different for each method of searching. Tweepy, using full-archive search, returns 12 Tweets. GetOldTweets3 returns 22 Tweets. And using the Twitter Advanced Search returns 3 Tweets. Is there a reason for the different number of Tweets?
Twitter's search through its website has different operators than its API.
Searching "Accident near:Dallas,TX since:2018-01-01 until:2018-01-02" on Twitter itself, results in 22 Tweets. If you're looking at only the Top ones, there are only 3, yes, but you can see all of them by clicking the Latest tab. The near operator this query uses doesn't seem to be explicitly documented anywhere, so it's unclear how exactly it works. In fact, location/place doesn't even seem to be part of the Advanced Search UI anymore. Historically, it seems this worked by searching within a radius (defaulting to 15 miles if the within operator isn't set) of the location specified.
The current branch/PR for Tweepy adding API.search_full_archive, which is what I assume you're using, uses the full-archive endpoint of Twitter's premium search APIs. Something like api.search_full_archive("Environment_Name", "Accident place:Dallas,TX", fromDate=201801010000, toDate=201801020000) does in fact return 12 Tweets. However, this is using the documented place premium search operator, which has specific defined behavior:
Matches Tweets tagged with the specified location or Twitter place ID
This means that it will only return Tweets that were tagged specifically with that location, rather than including other locations nearby within a certain radius. Oddly enough, these results actually include 2 Tweets that the website's search misses and doesn't seem to return by location search. This could be due to Twitter's search policies, but again, it's difficult to determine the exact reason since Twitter's website search isn't documented and is somewhat of a black box.
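The premium search call described above can be sketched as follows. premium_search_args is a hypothetical helper that only assembles the arguments; it assumes the YYYYMMDDHHMM form used for fromDate and toDate in the call above and produces them as strings rather than integers.

```python
from datetime import datetime


def premium_search_args(keyword, place, start, end):
    """Assemble query and date arguments for a premium full-archive search.

    Hypothetical helper: it builds values only and makes no API call.
    """
    return {
        "query": f"{keyword} place:{place}",
        "fromDate": start.strftime("%Y%m%d%H%M"),
        "toDate": end.strftime("%Y%m%d%H%M"),
    }


args = premium_search_args(
    "Accident", "Dallas,TX", datetime(2018, 1, 1), datetime(2018, 1, 2)
)
print(args)

# With an authenticated Tweepy `api` object (not created here):
# api.search_full_archive("Environment_Name", args["query"],
#                         fromDate=args["fromDate"], toDate=args["toDate"])
```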
If you want to specify a set of coordinates and radius for your search using the premium search API, you can do so with the point_radius premium search operator. Using Tweepy's API.geo_search method, which uses the Twitter API's GET geo/search endpoint, and a query for "Dallas,TX", the Place object returned that represents Dallas, TX specifies a centroid of [-96.7301749064317, 32.819858499999995]. There's no guarantee that these are the coordinates that Twitter's website search uses, but with some testing, using these coordinates with point_radius, the radius that would return the exact results matching the website search results seems to be somewhere between 17 and 18 miles. With a radius of 17.5 miles, there are only 3 extra Tweets from Plano.
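A minimal sketch of building such a query, assuming the point_radius:[lon lat radius] operator syntax with longitude first and a unit suffix such as mi. The point_radius helper and the DALLAS_CENTROID name are hypothetical; the coordinates and the 17.5 mile radius are the ones discussed above.

```python
def point_radius(lon, lat, radius_miles):
    """Format a point_radius operator (hypothetical helper; longitude comes first)."""
    return f"point_radius:[{lon} {lat} {radius_miles}mi]"


# Centroid reported by GET geo/search for "Dallas,TX", as [longitude, latitude].
DALLAS_CENTROID = (-96.7301749064317, 32.819858499999995)

query = "Accident " + point_radius(*DALLAS_CENTROID, 17.5)
print(query)
```

The resulting string would be passed as the query argument of the full-archive search in place of the place: version.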
GetOldTweets3 does not use Twitter's API and instead scrapes the site directly. This should not be considered reliable and is against Twitter's Terms of Service:
scraping the Services without the prior consent of Twitter is expressly prohibited
If you want the most accurate and well-defined results, you should use Twitter's API. This is the only valid method if you want to retrieve those results programmatically without violating Twitter's TOS. Your options for searching by location are by specific location (name or Twitter place ID), by coordinates and radius, or by bounding box, using the place, point_radius, or bounding_box premium search operators, respectively. Note that for some reason, as those 2 other Tweets showed, certain Tweets might only be findable by specific location rather than by area.
I'm attempting to filter data down by page path which is simple enough in most cases. However, I'm getting an unexpected result returned:
data = service.data().ga().get(
    ids='ga:' + profile_id,
    start_date='2018-06-15',
    end_date='yesterday',
    metrics='ga:sessions,ga:newUsers,ga:sessionDuration,ga:goal12Completions,ga:goal4Completions,ga:goal5Completions,ga:bounces,ga:users',
    dimensions='ga:date,ga:sourceMedium,ga:userType,ga:country,ga:region,ga:city,ga:pagePath',
    sort='ga:date',
    filters='ga:pagePath=~/path1/path2.*',
    start_index=index,
    max_results=10000).execute()
return data
The data within Analytics has page data structured thus:
domain.com/path1/path2/
domain.com/path1/path2/some
domain.com/path1/path2/extra
domain.com/path1/path2/parameters
I expect the filter above to return data for each of these page structures. However, it only returns data for pages that have a parameter after path2:
domain.com/path1/path2/some
domain.com/path1/path2/extra
domain.com/path1/path2/parameters
I've tried various ways to filter this data including:
filters='ga:pagePath=@/path1/path2'
filters='ga:pagePath=@/path2'
I've also attempted to pass in the search string as a variable into the filter which produced the same result.
I've also tested it out in the Query Explorer, which gives the same results as my script. However, filtering for the same regular expression in the advanced filter area of GA gives me the results I expect from the first list above. I also threw some of the data into a text file and did a regex search on it, which gave me all of the expected results.
My next step is testing taking away specific metrics to see if there's a combination creating a problem but there shouldn't be according to the documentation.
Any suggestions on next steps for debugging or a correction of the filter?
Adjusting the filtering to a "contains substring" method will solve your problem. Refer to the Google Analytics API reference guide to see all of the available filtering options. Also, I would highly recommend double-checking your original data source within the Google Analytics user interface to ensure the URLs that you're seeking are in fact available.
filters='ga:pagePath=@<YOUR-SUBSTRING>',
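A minimal sketch of building such a filter string, assuming the Core Reporting API v3 operator syntax in which =@ means "contains substring" and =~ means "matches regular expression". contains_filter is a hypothetical helper. It backslash-escapes commas and semicolons, which the filter syntax otherwise treats as OR and AND separators.

```python
def contains_filter(dimension, substring):
    """Build a 'contains substring' filter expression (hypothetical helper).

    Commas and semicolons combine filter expressions, so they are
    backslash-escaped when they appear in the substring itself.
    """
    escaped = substring.replace(",", "\\,").replace(";", "\\;")
    return f"{dimension}=@{escaped}"


page_filter = contains_filter("ga:pagePath", "/path1/path2")
print(page_filter)
```

The result would be passed as the filters argument of the get() call in place of the regular-expression filter.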
I've searched the whole afternoon but I'm still stuck.
I need to google keywords, and save the ranks of a given domain name for each keyword.
I tried to use several libraries: xgoogle, google, and pygoogle. However, pygoogle just doesn't work, and google, pygoogle always end up raising "HTTP Error: Service Unavailable".
So I suppose I should use the Google API, which uses the libraries urllib2 and simplejson, as well as the URL "http://ajax.googleapis.com/ajax/services/search/web?v=1.0".
I have several questions:
How do I choose the top-level domain?
How do I choose the language of the results?
How do I choose how many results are shown?
Are the results ranked the way I would find them in my own Google search? I'm asking because I'm under the impression that's not the case.
Are photo URLs taken into account?
How do I choose the starting result? Is it possible to start from the 10th result?
Thank you for your help,
Sebi81