I'm trying to scrape the full 280-character tweets from Twitter, but I can't stop them from trailing off with '...' after 140 characters. Here's my code:
import tweepy
import datetime
auth = tweepy.OAuthHandler("", "")
auth.set_access_token("", "")
api = tweepy.API(auth)
end_date = datetime.datetime.utcnow() - datetime.timedelta(days=0)
for status in api.user_timeline(target_user):
    print(status.text)
    if status.created_at > end_date:
        break
I've read that adding text_mode=extended to the function will solve this, but it's making no difference for me. If I use another suggested argument, tweet_mode='extended', text is no longer an attribute of status.
How can I fix this?
It seems you need to use full_text now to get the 280 char tweet. Try something along the lines of:
print(status.extended_tweet['full_text'])
Alternatively, you can pass tweet_mode='extended' to user_timeline, in which case you would just use:
print(status.full_text)
This looks a bit nicer to me.
It might also be worth pointing out that, from what I've read, this might not work for a retweet (see "Twitter streaming API not return full tweets"), but there are separate parts of the API you can use for that, so be sure to check before you print.
Twitter docs, in case you want a closer look at the update: https://developer.twitter.com/en/docs/tweets/tweet-updates.html
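To make the "check before you print" advice concrete, here is a minimal sketch of a helper that looks for the full text in the places mentioned above. The name best_text is made up for illustration, and the SimpleNamespace objects are stand-ins for tweepy Status objects so the snippet runs without credentials; treat it as one possible approach rather than the library's recommended one.

```python
from types import SimpleNamespace


def best_text(status):
    """Return the longest available text for a tweepy-style Status object."""
    # A retweet carries the untruncated original under retweeted_status.
    source = getattr(status, "retweeted_status", status)
    # tweet_mode='extended' on REST calls exposes full_text directly.
    if hasattr(source, "full_text"):
        return source.full_text
    # Streaming payloads in compatibility mode nest it in extended_tweet.
    extended = getattr(source, "extended_tweet", None)
    if extended:
        return extended["full_text"]
    # Short tweets fetched in compatibility mode only have text.
    return source.text


# Stand-in objects (hypothetical data) in place of real API results.
rest_status = SimpleNamespace(full_text="a long tweet fetched in extended mode")
stream_status = SimpleNamespace(
    text="a long tw...",
    extended_tweet={"full_text": "a long tweet from the stream"},
)
retweet = SimpleNamespace(full_text="RT @x: trunc...", retweeted_status=rest_status)

for s in (rest_status, stream_status, retweet):
    print(best_text(s))
```

With real data you would call best_text(status) inside the user_timeline loop instead of printing status.text.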
Related
Hello I am trying to scrape the tweets of a certain user using tweepy.
Here is my code :
import time

tweets = []
username = 'example'
count = 140  # number of tweets
try:
    # Pulling individual tweets from query
    for tweet in api.user_timeline(id=username, count=count, include_rts=False):
        # Adding to list that contains all tweets
        tweets.append((tweet.text))
except BaseException as e:
    print('failed on_status,', str(e))
    time.sleep(3)
The problem I am having is the tweets are coming back unfinished with "..." at the end.
I think I've looked at all the other similar problems on Stack Overflow and elsewhere, but nothing works. Most do not concern me because I am NOT dealing with retweets.
I have tried tweet_mode='extended' and/or tweet.full_text or tweet._json['extended_tweet']['full_text'] in different combinations.
I don't get an error message but nothing works, just an empty list in return.
And it looks like the documentation is out of date, because it says nothing about the 'tweet_mode' or 'include_rts' parameters.
Has anyone managed to get the full text of each tweet?? I'm really stuck on this seemingly simple problem and am losing my hair so I would appreciate any advice :D
Thanks in advance!!!
TL;DR: You're most likely running into a Rate Limiting issue. And use the full_text attribute.
Long version:
First,
The problem I am having is the tweets are coming back unfinished with "..." at the end.
From the Tweepy documentation on Extended Tweets, this is expected:
Compatibility mode
... It will also be discernible that the text attribute of the Status object is truncated as it will be suffixed with an ellipsis character, a space, and a shortened self-permalink URL to the Tweet.
With regard to:
And it looks like the documentation is out of date, because it says nothing about the 'tweet_mode' or 'include_rts' parameters.
They haven't explicitly added it to the documentation of each method, however, they specify that tweet_mode is added as a param:
Standard API methods
Any tweepy.API method that returns a Status object accepts a new tweet_mode parameter. Valid values for this parameter are compat and extended , which give compatibility mode and extended mode, respectively. The default mode (if no parameter is provided) is compatibility mode.
So without tweet_mode added to the call, you do get the tweets with partial text? And with it, all you get is an empty list? If you remove it and immediately retry, verify whether you still get an empty list. That is, once you get an empty list, check whether you keep getting one even after changing the parameters back to the ones that worked.
Based on bug #1329 - API.user_timeline sometimes returns an empty list - it appears to be a Rate Limiting issue:
Harmon758 commented on Feb 13
This API limitation would manifest itself as exactly the issue you're describing.
Even if it was working, it's in the full_text attribute, not the usual text. So the line
tweets.append((tweet.text))
should be
tweets.append(tweet.full_text)
(and you can skip the extra enclosing ())
Btw, if you're not interested in retweets, see this example for the correct way to handle them:
Given an existing tweepy.API object and id for a Tweet, the following can be used to print the full text of the Tweet, or if it’s a Retweet, the full text of the Retweeted Tweet:
status = api.get_status(id, tweet_mode="extended")
try:
    print(status.retweeted_status.full_text)
except AttributeError:  # Not a Retweet
    print(status.full_text)
If status is a Retweet, status.full_text could be truncated.
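Putting the pieces of this answer together, the loop from the question could be sketched roughly as below. The function name collect_full_texts is hypothetical, and api is assumed to be an already authenticated tweepy.API object; the parameters passed to user_timeline are the ones discussed above.

```python
def collect_full_texts(api, username, count=140):
    """Return the untruncated text of a user's recent tweets, skipping retweets."""
    statuses = api.user_timeline(
        screen_name=username,
        count=count,
        include_rts=False,
        tweet_mode="extended",  # without this, only the truncated text comes back
    )
    # In extended mode the text lives in full_text, not text.
    return [status.full_text for status in statuses]


# Usage, assuming `api` was built as in the question:
# tweets = collect_full_texts(api, "example")
```

If this returns an empty list, the rate limiting issue described above is still the first thing to rule out.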
As per the Twitter API v2:
tweet_mode does not work at all. You need to add expansions=referenced_tweets.id. Then look for includes in the response: it contains the full versions of the truncated tweets. You will still see the truncated tweets in the response, but you can ignore them.
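As a rough illustration of "search for includes", the sketch below swaps the truncated text of each retweet for the full text found under includes. It assumes the response has already been parsed into a dict with a list under "data" (as timeline and search endpoints return); resolve_full_texts and sample_response are made-up names and data, not part of any library.

```python
def resolve_full_texts(response):
    """Map each tweet id in a v2 response to its untruncated text."""
    # Index the expanded tweets from "includes" by id.
    included = {
        t["id"]: t["text"]
        for t in response.get("includes", {}).get("tweets", [])
    }
    result = {}
    for tweet in response.get("data", []):
        text = tweet["text"]
        for ref in tweet.get("referenced_tweets", []):
            # A retweet's own text is truncated; the original sits in includes.
            if ref["type"] == "retweeted" and ref["id"] in included:
                text = included[ref["id"]]
        result[tweet["id"]] = text
    return result


# Hypothetical response, shaped like a request made with
# expansions=referenced_tweets.id
sample_response = {
    "data": [
        {
            "id": "2",
            "text": "RT @nasa: a long tweet that gets cut...",
            "referenced_tweets": [{"type": "retweeted", "id": "1"}],
        },
        {"id": "3", "text": "a short original tweet"},
    ],
    "includes": {
        "tweets": [
            {"id": "1", "text": "a long tweet that gets cut off only in the retweet"}
        ]
    },
}

print(resolve_full_texts(sample_response))
```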
I have some code that looks like this:
import tweepy
auth = tweepy.OAuthHandler(...)
auth.set_access_token(...)
api = tweepy.API(auth)
for e, i in enumerate(tweepy.Cursor(api.retweeters, '1157819926532501504').items()): # 1157819926532501504 is the id of https://twitter.com/NASA/status/1157819926532501504
    print(e, i)
When I run that, I get something like this:
0 3249595190
1 1678701169
2 34877330
...
86 625615049
87 1157852235381870592
If you look at https://twitter.com/NASA/status/1157819926532501504, you will see that the tweet has 3.2K retweets, whereas only 88 are getting printed out. Why is this? I'm using tweepy.Cursor, so pagination should take care of itself, no? I tried api.retweeters(id='1157819926532501504', cursor=-1) to see what was happening to the cursor, and I got a response like this:
([3249595190, 1678701169, ..., 625615049, 1157852235381870592], (0, 0))
Changing the cursor parameter doesn't change the response, neither does using the page parameter.
Am I misunderstanding something fundamental? Does Twitter not allow one to retrieve all of the retweeters for a tweet? Am I misunderstanding tweepy? I'd appreciate any help. Thank you!
There really is a limit: 100. https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-retweets-id
Unfortunately, it doesn't always return 100. Sometimes it returns just 40 or 50, even if the tweet has 50k retweets.
Solutions:
1) Run a regular search with the exact same text as the original tweet. It's the best chance to retrieve more retweets, since retweets are returned by regular searches too.
2) If the tweet is "fresh" and "hot" (with a lot of engagement), make several calls. You can get the most recent retweets that way.
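For the second suggestion, a minimal sketch of merging the results of several calls is shown below. collect_retweeter_ids is a hypothetical helper, and fetch_ids stands for whatever call returns one batch of retweeter ids (for example the api.retweeters call from the question); the pause between calls is an assumption, not a documented requirement.

```python
import time


def collect_retweeter_ids(fetch_ids, calls=3, pause=0.0):
    """Call fetch_ids() several times and merge the ids without duplicates."""
    seen = set()
    ordered = []
    for attempt in range(calls):
        for user_id in fetch_ids():
            if user_id not in seen:
                seen.add(user_id)
                ordered.append(user_id)
        # Wait between calls (but not after the last one) so new retweets can arrive.
        if pause and attempt < calls - 1:
            time.sleep(pause)
    return ordered


# Usage, assuming `api` is the tweepy.API object from the question:
# ids = collect_retweeter_ids(
#     lambda: api.retweeters(id='1157819926532501504'), calls=5, pause=60)
```

This only accumulates whatever each call happens to return; it does not lift the limit of 100 per call.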
I'm a nub when it comes to python. I literally just started today and have little understanding of programming. I have managed to make the following code work:
from twitter import *
config = {}
execfile("config.py", config)
twitter = Twitter(
    auth = OAuth(config["access_key"], config["access_secret"],
                 config["consumer_key"], config["consumer_secret"]))
user = "skiftetse"
results = twitter.statuses.user_timeline(screen_name = user)
for status in results:
    print "(%s) %s" % (status["created_at"], status["text"].encode("ascii", "ignore"))
The problem is that it's only printing 20 results. The Twitter page I'd like to get data from has 22k posts, so something is wrong with the last line of code.
I would really appreciate help with this! I'm doing this for a research sentiment analysis, so I need several hundred tweets to analyze. Beyond that, it'd be great if retweets and information about how many people retweeted their posts were included. I need to get better at all this, but right now I just need to meet that deadline at the end of the month.
You need to understand how the Twitter API works. Specifically, the user_timeline documentation.
By default, a request will only return 20 Tweets. If you want more, you will need to set the count parameter to, say, 50.
e.g.
results = twitter.statuses.user_timeline(screen_name = user, count = 50)
Note, count:
Specifies the number of tweets to try and retrieve, up to a maximum of 200 per distinct request.
In addition, the API will only let you retrieve the most recent 3,200 Tweets.
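Since a single request is capped at 200 tweets, getting several hundred means paging backwards with the max_id parameter of user_timeline. The sketch below shows one way to do that; fetch_timeline and fetch_page are hypothetical names, and fetch_page is assumed to be a thin wrapper around the user_timeline call that returns a list of tweet dicts.

```python
def fetch_timeline(fetch_page, limit=3200, page_size=200):
    """Collect up to `limit` tweets by walking backwards with max_id."""
    tweets = []
    max_id = None
    while len(tweets) < limit:
        params = {"count": min(page_size, limit - len(tweets))}
        if max_id is not None:
            params["max_id"] = max_id
        page = fetch_page(**params)
        if not page:
            break  # no older tweets are available
        tweets.extend(page)
        # max_id is inclusive, so ask for ids strictly below the oldest one seen.
        max_id = min(status["id"] for status in page) - 1
    return tweets


# Usage, assuming `twitter` and `user` are set up as in the question:
# results = fetch_timeline(
#     lambda **kw: twitter.statuses.user_timeline(screen_name=user, **kw),
#     limit=500)
```

The default limit of 3200 mirrors the cap mentioned above; the API may return fewer tweets per page than requested, which is why the loop stops only on an empty page.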
I'm just being a bit of an idiot here, I think, but I've figured out how to fetch my timeline, but not how to modify that into performing a search. I've currently got:
consumer = oauth.Consumer(key=CONSUMER_KEY, secret=CONSUMER_SECRET)
access_token = oauth.Token(key=ACCESS_KEY, secret=ACCESS_SECRET)
client = oauth.Client(consumer, access_token)
response, data = client.request(searchURL)
I'm guessing it's the last line that'll change to work with the search, but I'm not sure how to format it. If I change searchURL to the one used for actually searching (it's currently the timeline URL), it says it's in the wrong format.
Can anyone help?
Thanks.
Turns out it's of the form:
searchURL = "https://api.twitter.com/1.1/search/tweets.json?q=obama&count=2&result_type=popular"
That's an example search using the keyword "obama", setting the count to 2, and filtering for popular results.
response, data = client.request(searchURL)
tweets = json.loads(data)
The format of the returned tweets is a bit...awkward, but understandable with a bit of playing around.
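To avoid hand-writing the query string, and to show what "playing around" with the result might look like, here is a small Python 3 sketch. build_search_url and extract_texts are made-up helper names; the only structure assumed about the response is that matching tweets sit in a list under the "statuses" key, each with "text" and a "user" object.

```python
import json
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=2, result_type="popular"):
    """Build a search URL with properly escaped query parameters."""
    params = {"q": query, "count": count, "result_type": result_type}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_texts(data):
    """Pull (screen_name, text) pairs out of the raw JSON search response."""
    payload = json.loads(data)
    return [
        (status["user"]["screen_name"], status["text"])
        for status in payload.get("statuses", [])
    ]


print(build_search_url("obama"))

# Usage with the client from the question:
# response, data = client.request(build_search_url("obama"))
# for name, text in extract_texts(data):
#     print(name, text)
```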
I can't seem to get tweepy to work with replying to a specific tweet:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
### at this point I've grabbed the tweet and loaded it to JSON...
tweetId = tweet['results'][0]['id']
api.update_status('My status update',tweetId)
The api says it takes optional parameters and in_reply_to_status_id is the first, but it seems to be ignoring it altogether. This script will post an updated status, but it does not link it as a reply to the tweetId that I'm passing.
API for reference: http://code.google.com/p/tweepy/wiki/APIReference#update_status
Anyone have any ideas? I feel like I'm missing something simple here...
Thanks in advance.
Just posting the solution so no one else suffers the way I did.
Twitter updated the API and added an option named auto_populate_reply_metadata.
All you need to do is set that to True and leave the rest as is. Here is a sample:
api.update_status(status = 'your reply', in_reply_to_status_id = tweetid , auto_populate_reply_metadata=True)
Also, the status_id is the long set of digits at the end of the tweet URL. Good Luck!
I ran into the same problem, but luckily I found the solution. You just need to include the user's screen_name in the tweet:
api.update_status('@<username> My status update', tweetId)
You can also do
api.update_status("my update", in_reply_to_status_id = tweetid)
Well then, it was something simple. I had to specify who the tweet was directed towards using the @ notation.
api.update_status('My status update @whoIReplyTo',tweetId)
I discovered that I had to include the tweet's ID string (rather than the actual ID number) when specifying the tweet that I was replying to:
api.update_status('@whoIReplyTo my reply tweet',tweetIdString)
This seems to be a bug in Tweepy: even if you make a call to api.update_status with the correct parameters set,
api.update_status(status=your_status, in_reply_to_status_id=tweet_to_reply_to.id)
the tweet will not be a reply. In order to get a reply, you need to mention the user you want to reply to AND specify the correct in_reply_to_status_id.
reply_status = "@%s %s" % (username_to_reply_to, your_status)
api.update_status(status=reply_status, in_reply_to_status_id=tweet_to_reply_to.id)
Keep in mind though – Tweepy and Twitter's servers still enforce a maximum number of 140 characters, so make sure you check that
len(reply_status) <= 140
Again, I think this is a bug because on the Twitter app, you can reply without embedding the username of the person to whom you're replying.
reply_status = "@%s %s" % (tweet.user.screen_name, "type your reply here")
api.update_status(status=reply_status, in_reply_to_status_id=tweet.id)
This is the latest correct form; I tested it just a few minutes ago.
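The mention-plus-length-check advice from the answers above can be wrapped in a small helper, sketched below. build_reply is a hypothetical name, the mention is assumed to use the @ prefix, and the default limit of 280 is an assumption based on the current tweet length (one answer above quotes 140, so pass max_len=140 if that applies to you).

```python
def build_reply(screen_name, text, max_len=280):
    """Prefix a reply with the @mention and check it against a length limit."""
    reply = "@%s %s" % (screen_name, text)
    if len(reply) > max_len:
        raise ValueError(
            "reply is %d characters, limit is %d" % (len(reply), max_len)
        )
    return reply


print(build_reply("whoIReplyTo", "my reply tweet"))

# Usage, assuming `api` and `tweet` exist as in the answers above:
# api.update_status(
#     status=build_reply(tweet.user.screen_name, "type your reply here"),
#     in_reply_to_status_id=tweet.id)
```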