I'm collecting the full URLs from tweets (not the t.co ones). For that I needed to use the option tweet_mode="extended", which you get with the Elevated access level (which I have).
I want to get the full URLs using Paginator.
I don't know how to do this other than collecting tweet IDs first and then calling api.get_status, like this:
for sq in search_q:
    for tweet in tweepy.Paginator(client.search_recent_tweets, sq).flatten(limit=5):
        tweet_ids.append(tweet.id)

for tid in tweet_ids:
    status = api.get_status(tid, tweet_mode="extended")
    full_urls.append(status.entities['urls'][0]['expanded_url'])
which seems awfully inefficient.
Any help is appreciated.
Adding tweet_fields=["entities"] to the request solves this.
for sq in search_q:
    for tweet in tweepy.Paginator(client.search_recent_tweets, sq, tweet_fields=["entities"]).flatten(limit=5):
        full_urls.append(tweet.data["entities"]['urls'][0]['expanded_url'])
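Note that not every tweet carries a URL entity, so indexing ["entities"]['urls'][0] directly can raise a KeyError or IndexError on some tweets. A minimal defensive sketch, reusing the search_q, client, and full_urls names from the snippets above (the .get() guards are my own addition, not part of the original answer):

import tweepy

client = tweepy.Client(bearer_token)  # assumes a bearer token is already defined
full_urls = []

for sq in search_q:
    for tweet in tweepy.Paginator(client.search_recent_tweets, sq,
                                  tweet_fields=["entities"]).flatten(limit=5):
        entities = tweet.data.get("entities") or {}
        # Only collect tweets that actually carry URL entities
        for url in entities.get("urls", []):
            full_urls.append(url["expanded_url"])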
I'm trying to get the authors of a publication by using Scopus. For that I got an API key and started. I searched for the DOI and got a response. Everything is fine, and there is also an "authors" entry, but for each request this field is simply empty. My Python code is below:
import json
import pyscopus

key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'
doi = '10.1016/0270-0255(87)90003-0'
scopus = pyscopus.Scopus(key)
response_json = json.loads(scopus.search(f'doi({doi})', view='STANDARD').to_json(orient="records"))
So as I said, you can call response_json['authors'], but it is always empty. The authors are listed on the website, but web scraping is forbidden. Am I doing something wrong, or do they simply not provide this information (which is confusing, since there is a field for it)? So far I couldn't find an answer.
I know there are other ways, like Crossref, to get this information, but for reasons I want to do it with Scopus.
Thanks!
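One thing that might be worth ruling out (an assumption on my side, not something stated in the question): the Scopus Search API is documented to return author data only in its COMPLETE view, not in the STANDARD view, and the COMPLETE view requires a subscriber-entitled key. If search() accepts the view argument as used above, the check would look roughly like this:

import json
import pyscopus

key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'
doi = '10.1016/0270-0255(87)90003-0'
scopus = pyscopus.Scopus(key)

# Hypothetical check: ask for the COMPLETE view, which is the one that is
# supposed to include the author list (only works with an entitled key).
records = json.loads(scopus.search(f'doi({doi})', view='COMPLETE').to_json(orient="records"))
print(records[0].get('authors'))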
I am trying to get the number of tweets containing a hashtag (let's say "#kitten") in Python.
I am using Tweepy.
However, all the code I have found is of this form:
query = "kitten"
for i, status in enumerate(tweepy.Cursor(api.search, q=query).items(50)):
print(i, status)
I get this error: 'API' object has no attribute 'search'
Tweepy seems to no longer contain this method. Is there any way to solve my problem?
Sorry for my bad English.
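For what it's worth, the v1.1 search endpoint still exists in Tweepy 4.x under a new name, so a minimal port of the snippet above would be something like this (assuming an authenticated api object and that your access tier still allows v1.1 search):

# Tweepy 4.x renamed API.search to API.search_tweets; the Cursor usage is unchanged.
for i, status in enumerate(tweepy.Cursor(api.search_tweets, q=query).items(50)):
    print(i, status)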
After browsing the web and the Twitter documentation, I found the answer.
If you want the history of all tweet counts since 2006, you need Academic access. That is not my case, so I can only get 7-day tracking, which is enough for me. Here is the code:
import tweepy

query = "kitten -is:retweet"
client = tweepy.Client(bearer_token)
counts = client.get_recent_tweets_count(query=query, granularity='day')

for i in counts.data:
    print(i["tweet_count"])
The "-is:retweet" is here to not count the retweets. You need to remove it if you want to count them.
Since we're not pulling any tweets (only the volume of them), we are not increasing our MONTHLY TWEET CAP USAGE.
Be careful when using symbols such as "$" in your query; it might give you an error. For a list of valid operators, see: list of valid operators for query
As said in the Twitter counts introduction, you only need "read only" authorization to perform a recent count request (see Recent Tweet counts).
Hello, I am trying to scrape the tweets of a certain user using Tweepy.
Here is my code:
tweets = []
username = 'example'
count = 140  # number of tweets

try:
    # Pulling individual tweets from query
    for tweet in api.user_timeline(id=username, count=count, include_rts=False):
        # Adding to list that contains all tweets
        tweets.append((tweet.text))
except BaseException as e:
    print('failed on_status,', str(e))
    time.sleep(3)
The problem I am having is the tweets are coming back unfinished with "..." at the end.
I think I've looked at all the other similar problems on Stack Overflow and elsewhere, but nothing works. Most do not concern me because I am NOT dealing with retweets.
I have tried putting tweet_mode = 'extended' and/or tweet.full_text or tweet._json['extended_tweet']['full_text'] in different combinations.
I don't get an error message, but nothing works, just an empty list in return.
And it looks like the documentation is out of date, because it says nothing about the 'tweet_mode' nor the 'include_rts' parameter.
Has anyone managed to get the full text of each tweet? I'm really stuck on this seemingly simple problem and am losing my hair, so I would appreciate any advice :D
Thanks in advance!!!
TL;DR: You're most likely running into a Rate Limiting issue. And use the full_text attribute.
Long version:
First,
The problem I am having is the tweets are coming back unfinished with "..." at the end.
From the Tweepy documentation on Extended Tweets, this is expected:
Compatibility mode
... It will also be discernible that the text attribute of the Status object is truncated as it will be suffixed with an ellipsis character, a space, and a shortened self-permalink URL to the Tweet.
Regarding
And it looks like the documentation is out of date, because it says nothing about the 'tweet_mode' nor the 'include_rts' parameter.
They haven't explicitly added it to the documentation of each method, however, they specify that tweet_mode is added as a param:
Standard API methods
Any tweepy.API method that returns a Status object accepts a new tweet_mode parameter. Valid values for this parameter are compat and extended, which give compatibility mode and extended mode, respectively. The default mode (if no parameter is provided) is compatibility mode.
So without tweet_mode added to the call, you do get the tweets, just with partial text? And with it, all you get is an empty list? If you remove it and immediately retry, verify that you still get an empty list; that is, once you get an empty-list result, check whether you keep getting an empty list even after changing the params back to the ones that worked.
Based on bug #1329 - API.user_timeline sometimes returns an empty list - it appears to be a Rate Limiting issue:
Harmon758 commented on Feb 13
This API limitation would manifest itself as exactly the issue you're describing.
Even if it were working, the full text is in the full_text attribute, not the usual text. So the line
tweets.append((tweet.text))
should be
tweets.append(tweet.full_text)
(and you can skip the extra enclosing ())
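Putting the two fixes together, the loop from the question would look roughly like this (a sketch reusing the api, username, count, and tweets names from the question; error handling omitted):

tweets = []

# tweet_mode="extended" asks the v1.1 API for untruncated tweets;
# in that mode the text lives in full_text instead of text.
for tweet in api.user_timeline(id=username, count=count,
                               include_rts=False, tweet_mode="extended"):
    tweets.append(tweet.full_text)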
Btw, if you're not interested in retweets, see this example for the correct way to handle them:
Given an existing tweepy.API object and id for a Tweet, the following can be used to print the full text of the Tweet, or if it’s a Retweet, the full text of the Retweeted Tweet:
status = api.get_status(id, tweet_mode="extended")

try:
    print(status.retweeted_status.full_text)
except AttributeError:  # Not a Retweet
    print(status.full_text)
If status is a Retweet, status.full_text could be truncated.
As per the Twitter API v2:
tweet_mode does not work at all. You need to add expansions=referenced_tweets.id. Then, in the response, look at includes: the truncated tweets show up there as full tweets. You will still see the truncated tweets in the response data, but do not worry about that.
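Here is a short sketch of what that looks like with tweepy.Client (the query string and bearer_token are placeholders, and tweet_fields=["referenced_tweets"] is added explicitly so the link between the truncated tweet and its full version is present):

import tweepy

client = tweepy.Client(bearer_token)

response = client.search_recent_tweets(
    "from:example",                             # placeholder query
    expansions=["referenced_tweets.id"],        # pull the referenced (full) tweets
    tweet_fields=["referenced_tweets"],
)

# Full versions of retweeted/quoted tweets live in includes, keyed by tweet id.
referenced = {t.id: t for t in (response.includes or {}).get("tweets", [])}

for tweet in response.data or []:
    refs = tweet.referenced_tweets or []
    if refs and refs[0].id in referenced:
        print(referenced[refs[0].id].text)      # full text of the referenced tweet
    else:
        print(tweet.text)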
I'm looking into the Twitter Search API, and apparently it has a count parameter that determines "The number of tweets to return per page, up to a maximum of 100." What does "per page" mean if I'm, for example, running a Python script like this:
import twitter  # python-twitter package

api = twitter.Api(consumer_key="mykey",
                  consumer_secret="mysecret",
                  access_token_key="myaccess",
                  access_token_secret="myaccesssecret")

results = api.GetSearch(raw_query="q=%23myHashtag&geocode=59.347937,18.072433,5km")
print(len(results))
This only gives me 15 tweets in results. I want more, preferably all tweets if possible. So what should I do? Is there a "next page" option? Can't I just specify the search query in a way that gives me all tweets at once, or, if the number of tweets is too large, some maximum number of them?
Tweepy has a Cursor object that works like this:
for tweet in tweepy.Cursor(api.search, q="#myHashtag", geocode="59.347937,18.072433,5km",
                           lang='en', tweet_mode='extended').items():
    # handle tweets here
You can find more info in the Tweepy Cursor docs.
With TwitterAPI you would access pages this way:
from TwitterAPI import TwitterPager  # assumes an existing TwitterAPI instance named api

pager = TwitterPager(api,
                     'search/tweets',
                     {'q': '#myHashtag', 'geocode': '59.347937,18.072433,5km'})

for item in pager.get_iterator():
    print(item['text'] if 'text' in item else item)
A complete example is here: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py
I'm a newbie when it comes to Python. I literally just started today and have little understanding of programming. I have managed to make the following code work:
from twitter import *

config = {}
execfile("config.py", config)

twitter = Twitter(
    auth = OAuth(config["access_key"], config["access_secret"],
                 config["consumer_key"], config["consumer_secret"]))

user = "skiftetse"
results = twitter.statuses.user_timeline(screen_name = user)

for status in results:
    print "(%s) %s" % (status["created_at"], status["text"].encode("ascii", "ignore"))
The problem is that it only prints 20 results. The Twitter page I'd like to get data from has 22k posts, so something is wrong with the last line of code.
I would really appreciate help with this! I'm doing this for sentiment-analysis research, so I need several hundred tweets to analyze. Beyond that, it'd be great if retweets, and information about how many people retweeted each post, were included. I need to get better at all this, but right now I just need to meet that deadline at the end of the month.
You need to understand how the Twitter API works. Specifically, the user_timeline documentation.
By default, a request will only return 20 Tweets. If you want more, you will need to set the count parameter to, say, 50.
e.g.
results = twitter.statuses.user_timeline(screen_name = user, count = 50)
Note, count:
Specifies the number of tweets to try and retrieve, up to a maximum of 200 per distinct request.
In addition, the API will only let you retrieve the most recent 3,200 Tweets.
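To collect more than one request's worth (up to that roughly 3,200-tweet ceiling), the usual v1.1 pattern is to page backwards with max_id. A sketch using the same twitter and user names as the question (the other variable names are mine; each request asks for the 200-tweet maximum and then moves max_id below the oldest id seen so far):

all_statuses = []
max_id = None

while True:
    kwargs = {"screen_name": user, "count": 200}
    if max_id is not None:
        kwargs["max_id"] = max_id              # only return tweets older than this id
    batch = twitter.statuses.user_timeline(**kwargs)
    if not batch:
        break                                  # no older tweets left (or the ~3,200 cap was hit)
    all_statuses.extend(batch)
    max_id = batch[-1]["id"] - 1               # step below the oldest tweet in this batch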