Tweepy Cursor.items() not working as expected with api.retweeters - python

I have some code that looks like this:
import tweepy
auth = tweepy.OAuthHandler(...)
auth.set_access_token(...)
api = tweepy.API(auth)
for e, i in enumerate(tweepy.Cursor(api.retweeters, '1157819926532501504').items()): # 1157819926532501504 is the id of https://twitter.com/NASA/status/1157819926532501504
print(e, i)
When I run that, I get something like this:
0 3249595190
1 1678701169
2 34877330
...
86 625615049
87 1157852235381870592
If you look at https://twitter.com/NASA/status/1157819926532501504, you will see that the tweet has 3.2K retweets, whereas only 88 are getting printed out. Why is this? I'm using tweepy.Cursor, so pagination should take care of itself, no? I tried api.retweeters(id='1157819926532501504', cursor=-1) to see what was happening to the cursor, and I got a response like this:
([3249595190, 1678701169, ..., 625615049, 1157852235381870592], (0, 0))
Changing the cursor parameter doesn't change the response, neither does using the page parameter.
Am I misunderstanding fundamental? Does twitter not allow one to retrieve all of the retweeters for a tweet? Am I misunderstanding tweepy? I'd appreciate any help. Thank you!

There is a limit really:100. https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-retweets-id
Unfortunately, it didn't always return 100. Sometimes, it returns just 40 ou 50 even if the tweet has 50k retweets.
Solutions:
1) Run a regular search with the exact same text of the original tweet. It's the best chance to retrieve more retweets. Retweets are returned on regular searchs too.
2) If the tweet is "fresh" and "hot" (with a lot of engagement), do several calls. You can get the most recent tweets with that.

Related

Tweepy not returning full tweet: tweet_mode = 'extended' not working

Hello I am trying to scrape the tweets of a certain user using tweepy.
Here is my code :
tweets = []
username = 'example'
count = 140 #nb of tweets
try:
# Pulling individual tweets from query
for tweet in api.user_timeline(id=username, count=count, include_rts = False):
# Adding to list that contains all tweets
tweets.append((tweet.text))
except BaseException as e:
print('failed on_status,',str(e))
time.sleep(3)
The problem I am having is the tweets are coming back unfinished with "..." at the end.
I think I've looked at all the other similar problems on stack overflow and elsewhere but nothing works. Most do not concern me because I am NOT dealing with retweets .
I have tried putting tweet_mode = 'extended' and/or tweet.full_text or tweet._json['extended_tweet']['full_text'] in different combinations .
I don't get an error message but nothing works, just an empty list in return.
And It looks like the documentation is out of date because it says nothing about the 'tweet_mode' nor the 'include_rts' parameter :
Has anyone managed to get the full text of each tweet?? I'm really stuck on this seemingly simple problem and am losing my hair so I would appreciate any advice :D
Thanks in advance!!!
TL;DR: You're most likely running into a Rate Limiting issue. And use the full_text attribute.
Long version:
First,
The problem I am having is the tweets are coming back unfinished with "..." at the end.
From the Tweepy documentation on Extended Tweets, this is expected:
Compatibility mode
... It will also be discernible that the text attribute of the Status object is truncated as it will be suffixed with an ellipsis character, a space, and a shortened self-permalink URL to the Tweet.
Wrt
And It looks like the documentation is out of date because it says nothing about the 'tweet_mode' nor the 'include_rts' parameter :
They haven't explicitly added it to the documentation of each method, however, they specify that tweet_mode is added as a param:
Standard API methods
Any tweepy.API method that returns a Status object accepts a new tweet_mode parameter. Valid values for this parameter are compat and extended , which give compatibility mode and extended mode, respectively. The default mode (if no parameter is provided) is compatibility mode.
So without tweet_mode added to the call, you do get the tweets with partial text? And with it, all you get is an empty list? If you remove it and immediately retry, verify that you still get an empty list. ie, once you get an empty list result, check if you keep getting an empty list even when you change the params back to the one which worked.
Based on bug #1329 - API.user_timeline sometimes returns an empty list - it appears to be a Rate Limiting issue:
Harmon758 commented on Feb 13
This API limitation would manifest itself as exactly the issue you're describing.
Even if it was working, it's in the full_text attribute, not the usual text. So the line
tweets.append((tweet.text))
should be
tweets.append(tweet.full_text)
(and you can skip the extra enclosing ())
Btw, if you're not interested in retweets, see this example for the correct way to handle them:
Given an existing tweepy.API object and id for a Tweet, the following can be used to print the full text of the Tweet, or if it’s a Retweet, the full text of the Retweeted Tweet:
status = api.get_status(id, tweet_mode="extended")
try:
print(status.retweeted_status.full_text)
except AttributeError: # Not a Retweet
print(status.full_text)
If status is a Retweet, status.full_text could be truncated.
As per the twitter API v2:
tweet_mode does not work at all. You need to add expansions=referenced_tweets.id. Then in the response, search for includes. You can find all the truncated tweets as full tweets in the includes. You will still see the truncated tweets in response but do not worry about it.

Python Tweepy 280 Character Statuses

I'm trying to scrape the full 280 character tweets off of twitter but I can't get them to not trail off with '...' after 140 chars. Here's my code:
import tweepy
import datetime
auth = tweepy.OAuthHandler("", "")
auth.set_access_token("", "")
api = tweepy.API(auth)
end_date = datetime.datetime.utcnow() - datetime.timedelta(days=0)
for status in api.user_timeline(targer_user):
print(status.text)
if status.created_at > end_date:
break
I've read that adding text_mode=extendedto the function will solve this, but it's making no difference for me. If I use another suggested argument tweet_mode='extended', text is no longer an attribute of status.
How can I fix this?
It seems you need to use full_text now to get the 280 char tweet. Try something along the lines of:
print(status.extended_tweet['full_text'])
The tweet_mode='extended' can be used in user_timeline if you want, in which case you would just use below:
print(status.full_text)
This looks a bit nicer to me.
It might also be worth pointing out that - from what I've read - this might not work for a retweet (Twitter streaming API not return full tweets) but there are separate bits of the api you can use for that, so be sure to check before you print.
Twitter docs, in case you want a closer look at the update: https://developer.twitter.com/en/docs/tweets/tweet-updates.html

how to get the full status text from twitter API with JSON?

tldr: how to access the FULL tweet body with JSON?
Hello
I have a problem finding the full text of a tweet in JSON.
I am making a python app with tweepy. I would like to take a status, and then access the text
EDIT
I used user_timeline() to get a tweet_list. Then got one tweet from them like this:
tweet=tweet_list[index]._json
now when I do this:
tweet['text']
it returns a shortened tweet with a link to the original
eg:
Unemployment for Black Americans is the lowest ever recorded. Trump
approval ratings with Black Americans has doubl…
(the shortened link, couldn't directly link due to stackoverflow rules)
I want to return this:
Unemployment for Black Americans is the lowest ever recorded. Trump
approval ratings with Black Americans has doubled. Thank you, and it
will get even (much) better! #FoxNews
I don't mind if the link is added as long as the full tweet is shown
Okay after looking a bit more. I believe it is impossible to do it directly with JSON
There is a solution here about getting the full tweet. you can see it here
The problem with the answer above is that full_text turn the object into a string. if you need the object in its initial state to use it later with json to get other info. do the following:
use tweet_mode="extended" in user_timeline() and save it in tweet_list. eg:
tweet_list = api.user_timeline("user", count=10, tweet_mode="extended")
take one tweet only like this: tweet=tweet_list[0]
if you want the full text tweet, do this: tweet.full_text
if you need a json version of the object do this jtweet = tweet._json or just access the key like this tweet._json['id']
Hope that helps
You didn't provide any information about, how you want to achieve your goal. Looking at tweepy API, there is optional flag argument full_text which you can pass to function. get direct message function
It defaults to false causing that returned messages are shortened to 140 chars. Just set it at True and see what happen.

Twitter timeline in Python, but only getting 20ish results?

I'm a nub when it comes to python. I literally just started today and have little understanding of programming. I have managed to make the following code work:
from twitter import *
config = {}
execfile("config.py", config)
twitter = Twitter(
auth = OAuth(config["access_key"], config["access_secret"],
config["consumer_key"], config["consumer_secret"]))
user = "skiftetse"
results = twitter.statuses.user_timeline(screen_name = user)
for status in results:
print "(%s) %s" % (status["created_at"], status["text"].encode("ascii",
"ignore"))
The problem is that it's only printing 20 results. The twitter page i'd like to get data from has 22k posts, so something is wrong with the last line of code.
screenshot
I would really appreciate help with this! I'm doing this for a research sentiment analysis, so I need several 100's to analyze. Beyond that it'd be great if retweets and information about how many people re tweeted their posts were included. I need to get better at all this, but right now I just need to meet that deadline at the end of the month.
You need to understand how the Twitter API works. Specifically, the user_timeline documentation.
By default, a request will only return 20 Tweets. If you want more, you will need to set the count parameter to, say, 50.
e.g.
results = twitter.statuses.user_timeline(screen_name = user, count = 50)
Note, count:
Specifies the number of tweets to try and retrieve, up to a maximum of 200 per distinct request.
In addition, the API will only let you retrieve the most recent 3,200 Tweets.

Discogs API => How to retrieve genre?

I've crawled a tracklist of 36.000 songs, which have been played on the Danish national radio station P3. I want to do some statistics on how frequently each of the genres have been played within this period, so I figured the discogs API might help labeling each track with genre. However, the documentation for the API doesent seem to include an example for querying the genre of a particular song.
I have a CSV-file with with 3 columns: Artist, Title & Test(Test where i want the API to label each song with the genre).
Here's a sample of the script i've built so far:
import json
import pandas as pd
import requests
import discogs_client
d = discogs_client.Client('ExampleApplication/0.1')
d.set_consumer_key('key-here', 'secret-here')
input = pd.read_csv('Desktop/TEST.csv', encoding='utf-8',error_bad_lines=False)
df = input[['Artist', 'Title', 'Test']]
df.columns = ['Artist', 'Title','Test']
for i in range(0, len(list(df.Artist))):
x = df.Artist[i]
g = d.artist(x)
df.Test[i] = str(g)
df.to_csv('Desktop/TEST2.csv', encoding='utf-8', index=False)
This script has been working with a dummy file with 3 records in it so far, for mapping the artist of a given ID#. But as soon as the file gets larger(ex. 2000), it returns a HTTPerror when it cannot find the artist.
I have some questions regarding this approach:
1) Would you recommend using the search query function in the API for retrieving a variable as 'Genre'. Or do you think it is possible to retrieve Genre with a 'd.' function from the API?
2) Will I need to aquire an API-key? I have succesfully mapped the 3 records without an API-key so far. Looks like the key is free though.
Here's the guide I have been following:
https://github.com/discogs/discogs_client
And here's the documentation for the API:
https://www.discogs.com/developers/#page:home,header:home-quickstart
Maybe you need to re-read the discogs_client examples, i am not an expert myself, but a newbie trying to use this API.
AFAIK, g = d.artist(x) fails because x must be a integer not a string.
So you must first do a search, then get the artist id, then d.artist(artist_id)
Sorry for no providing an example, i am python newbie right now ;)
Also have you checked acoustid for
It's a probably a rate limit.
Read the status code of your response, you should find an 429 Too Many Requests
Unfortunately, if that's the case, the only solution is to add a sleep in your code to make one request per second.
Checkout the api doc:
http://www.discogs.com/developers/#page:home,header:home-rate-limiting
I found this guide:
https://github.com/neutralino1/discogs_client.
Access the api with your key and try something like:
d = discogs_client.Client('something.py', user_token=auth_token)
release = d.release(774004)
genre = release.genres
If you found a better solution please share.

Categories

Resources