I'm a newbie when it comes to Python. I literally just started today and have little understanding of programming. I have managed to make the following code work:
from twitter import *
config = {}
execfile("config.py", config)
twitter = Twitter(
    auth = OAuth(config["access_key"], config["access_secret"],
                 config["consumer_key"], config["consumer_secret"]))
user = "skiftetse"
results = twitter.statuses.user_timeline(screen_name = user)
for status in results:
    print "(%s) %s" % (status["created_at"], status["text"].encode("ascii", "ignore"))
The problem is that it's only printing 20 results. The Twitter page I'd like to get data from has 22k posts, so something is wrong with the last line of code.
I would really appreciate help with this! I'm doing this for a research sentiment analysis, so I need several hundred tweets to analyze. Beyond that, it'd be great if retweets and information about how many people retweeted each post were included. I need to get better at all this, but right now I just need to meet that deadline at the end of the month.
You need to understand how the Twitter API works, specifically the user_timeline documentation.
By default, a request will only return 20 Tweets. If you want more, you will need to set the count parameter to, say, 50.
For example:
results = twitter.statuses.user_timeline(screen_name = user, count = 50)
Note what the documentation says about count:
"Specifies the number of tweets to try and retrieve, up to a maximum of 200 per distinct request."
In addition, the API will only let you retrieve the most recent 3,200 Tweets.
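To make the two limits concrete, here is a small sketch (Python 3) of the arithmetic: how many tweets are actually reachable and how many requests that takes. The function name and constants are made up for illustration; only the figures 200 and 3,200 come from the limits described above.

```python
import math

MAX_PER_REQUEST = 200    # ceiling for `count` on a single request
MAX_RETRIEVABLE = 3200   # only the most recent 3,200 tweets are exposed


def requests_needed(wanted):
    """Return (reachable_tweets, request_count) for a desired number of tweets."""
    reachable = max(0, min(wanted, MAX_RETRIEVABLE))
    return reachable, math.ceil(reachable / MAX_PER_REQUEST)


print(requests_needed(500))    # (500, 3)
print(requests_needed(22000))  # (3200, 16)
```

So for an account with 22k posts, at most the newest 3,200 can be fetched, in 16 requests of 200 each.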
Related
I am trying to get the number of tweets containing a hashtag (let's say "#kitten") in Python.
I am using tweepy.
However, all the code I have found is in this form:
query = "kitten"
for i, status in enumerate(tweepy.Cursor(api.search, q=query).items(50)):
    print(i, status)
I get this error: 'API' object has no attribute 'search'
Tweepy no longer seems to contain this method. Is there any way to solve my problem?
Sorry for my bad English.
After browsing the web and the Twitter documentation, I found the answer.
If you want the full history of tweet counts back to 2006, you need Academic Research access. That is not my case, so I can only get counts for the last 7 days, which is enough for me. Here is the code:
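import tweepy
bearer_token = "YOUR_BEARER_TOKEN"  # placeholder: use your own token
query = "kitten -is:retweet"
client = tweepy.Client(bearer_token)
counts = client.get_recent_tweets_count(query=query, granularity='day')
# Each entry in counts.data covers one day
for i in counts.data:
    print(i["tweet_count"])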
The "-is:retweet" operator is there to exclude retweets. Remove it if you want to count them.
Since we're not pulling any tweets (only their volume), we are not increasing our monthly tweet cap usage.
Be careful when using symbols such as "$" in your query; they might give you an error. For valid operators, see: list of valid operators for query
As the Twitter counts introduction says, you only need "read only" authorization to perform a recent count request (see Recent Tweet counts).
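If you want a single number rather than one line per day, the daily buckets can simply be added up. A minimal sketch, assuming each bucket is a dict with a "tweet_count" key as in the loop above; the helper name and the sample values are made up for illustration.

```python
def total_tweet_count(buckets):
    """Sum the per-bucket volumes from a counts response (None counts as empty)."""
    return sum(bucket["tweet_count"] for bucket in buckets or [])


# Made-up sample in the same shape as counts.data (one dict per day).
sample = [
    {"start": "2022-01-01T00:00:00.000Z", "end": "2022-01-02T00:00:00.000Z", "tweet_count": 120},
    {"start": "2022-01-02T00:00:00.000Z", "end": "2022-01-03T00:00:00.000Z", "tweet_count": 95},
]
print(total_tweet_count(sample))  # 215
```

With the code above this would be called as total_tweet_count(counts.data).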
I have some code that looks like this:
import tweepy
auth = tweepy.OAuthHandler(...)
auth.set_access_token(...)
api = tweepy.API(auth)
# 1157819926532501504 is the id of https://twitter.com/NASA/status/1157819926532501504
for e, i in enumerate(tweepy.Cursor(api.retweeters, '1157819926532501504').items()):
    print(e, i)
When I run that, I get something like this:
0 3249595190
1 1678701169
2 34877330
...
86 625615049
87 1157852235381870592
If you look at https://twitter.com/NASA/status/1157819926532501504, you will see that the tweet has 3.2K retweets, whereas only 88 are getting printed out. Why is this? I'm using tweepy.Cursor, so pagination should take care of itself, no? I tried api.retweeters(id='1157819926532501504', cursor=-1) to see what was happening to the cursor, and I got a response like this:
([3249595190, 1678701169, ..., 625615049, 1157852235381870592], (0, 0))
Changing the cursor parameter doesn't change the response, neither does using the page parameter.
Am I misunderstanding something fundamental? Does Twitter not allow one to retrieve all of the retweeters for a tweet? Am I misunderstanding tweepy? I'd appreciate any help. Thank you!
There really is a limit: 100. https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-retweets-id
Unfortunately, it doesn't always return 100. Sometimes it returns just 40 or 50, even if the tweet has 50k retweets.
Solutions:
1) Run a regular search with the exact text of the original tweet. It's the best chance of retrieving more retweets, since retweets are returned by regular searches too.
2) If the tweet is "fresh" and "hot" (with a lot of engagement), make several calls. That way you can pick up the most recent retweets.
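Solution 2 could be sketched roughly as follows: call the endpoint several times with a pause in between and keep every retweeter id seen so far. This is only an illustration; collect_retweeter_ids, fetch_ids and the pause length are hypothetical names and values, and fetch_ids stands for whatever call returns the current list of ids.

```python
import time


def collect_retweeter_ids(fetch_ids, calls=3, pause_seconds=60, sleep=time.sleep):
    """Call fetch_ids() several times and return the ids seen, without duplicates."""
    seen = {}  # dict used as an ordered set: keeps first-seen order
    for attempt in range(calls):
        for user_id in fetch_ids():
            seen.setdefault(user_id, None)
        if attempt < calls - 1:
            sleep(pause_seconds)  # give new retweets time to arrive
    return list(seen)
```

With the setup from the question, fetch_ids might be something like lambda: api.retweeters('1157819926532501504'). How many extra ids this recovers depends on how quickly the tweet is still being retweeted.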
Context
I am working on a topic modeling project for Twitter.
The idea is to retrieve all tweets related to a specific country and analyze them in order to discover what people from a specific country are talking about on Twitter.
What I have tried
1. First solution
I know that we can use the Twitter streaming API or a cursor to retrieve tweets from a specific country, and I have tried the following code to get all tweets within the bounding-box coordinates of a country.
I have written the following code :
def get_tweets(query_fname, auth, max_time, location=None):
    stop = datetime.now() + max_time
    twitter_stream = Stream(auth, CustomListener(query_fname))
    while datetime.now() < stop:
        if location:
            twitter_stream.filter(locations=[11.94,-13.64,30.54,5.19], is_async=True)
        else:
            twitter_stream.filter(track=query, is_async=True)
Problems with this approach
Not everyone has allowed Twitter to access their location, so this approach only returns a few tweets, something like 300 for my location.
Some people are not in the country but tweet about it, and people within the country reply to them. Their tweets are not captured by this approach.
2. Second solution
Another approach was to collect tweets with hashtags related to a country, using a cursor.
I have tried this code:
def query_tweet(client, query=[], max_tweets=2000, country=None):
    """
    query tweets using the query list pass in parameter
    """
    query = ' OR '.join(query)
    name = 'by_hashtags_'
    now = datetime.now()
    today = now.strftime("%d-%m-%Y-%H-%M")
    with open('data/query_drc_{}_{}.jsonl'.format(name, today), 'w') as f:
        for status in Cursor(
                client.search,
                q=query,
                include_rts=True).items(max_tweets):
            f.write(json.dumps(status._json) + "\n")
Problem
This approach gives more results than the first one but, as you may notice, not everyone uses those hashtags when tweeting about the country.
3. Third approach
I have tried to retrieve tweets using a place ID specific to a country, but it has the same problem as the first approach.
My questions
How can I retrieve all tweets about a specific country? I mean everything people are tweeting about a specific country, with or without country-specific hashtags.
Hint: for people who are not located in the country, it may be a good idea to get their tweets if they were replied to or retweeted by people within the country.
Regards.
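Since each of the approaches above captures a different subset of tweets, one option is to run several of them and merge the results, dropping tweets that were collected more than once. A minimal sketch, assuming each tweet is a dict parsed from the JSON lines written above and carries an "id" key; merge_unique is a hypothetical helper, and this does not solve the coverage problem itself.

```python
def merge_unique(*collections):
    """Merge several tweet collections, keeping one copy of each tweet id."""
    merged = {}
    for tweets in collections:
        for tweet in tweets:
            merged.setdefault(tweet["id"], tweet)  # first copy wins
    return list(merged.values())


# Made-up sample data: tweet 2 was found by both approaches.
by_location = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
by_hashtag = [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}]
print(len(merge_unique(by_location, by_hashtag)))  # 3
```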
I'm a newbie when it comes to Python. I literally just started today and have little understanding of programming. I have managed to make the following code work:
from twitter import *
config = {}
execfile("config.py", config)
twitter = Twitter(
    auth = OAuth(config["access_key"], config["access_secret"],
                 config["consumer_key"], config["consumer_secret"]))
user = "skiftetse"
results = twitter.statuses.user_timeline(screen_name=user, count=(1000), include_rts=True)
for status in results:
    print "(%s) %s" % (status["created_at"], status["text"].encode("ascii", "ignore"))
I was told that the API will only let you retrieve the most recent 3,200 tweets, and that you can only retrieve 200 per distinct request. How can I get past the first 200 and move on to retrieve, say, 1,000?
Would it also be possible to make a chart that shows the posts and how many times each was retweeted?
I would really appreciate help with this! I'm doing this for a research sentiment analysis, so I need a large sample to analyze.
Thanks!
The first request gives you up to 200 tweets (count is capped at 200 per request). To get the next, older batch, take the lowest status id you received, subtract one, and pass it as max_id on the next call. Note that since_id works the other way round: it only returns tweets newer than the given id.
s = twitter.statuses.user_timeline(screen_name=user, count=200, max_id=max_id)
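The paging loop can be sketched like this (Python 3). It is not tied to any particular library: fetch_page is a hypothetical callable that you supply, taking count and max_id and returning a list of status dicts with an "id" key, newest first.

```python
def fetch_timeline(fetch_page, wanted=1000, page_size=200):
    """Collect up to `wanted` statuses by walking backwards through the timeline."""
    statuses = []
    max_id = None  # None on the first request: start from the newest tweet
    while len(statuses) < wanted:
        page = fetch_page(count=min(page_size, wanted - len(statuses)), max_id=max_id)
        if not page:
            break  # nothing older is available
        statuses.extend(page)
        # Next page: only tweets strictly older than the oldest one seen so far.
        max_id = min(status["id"] for status in page) - 1
    return statuses


# Possible wrapper around the call from the question (untested assumption):
# def fetch_page(count, max_id):
#     kwargs = {"screen_name": user, "count": count, "include_rts": True}
#     if max_id is not None:
#         kwargs["max_id"] = max_id
#     return twitter.statuses.user_timeline(**kwargs)
```

Each status in the result should also carry the retweet information asked about (for example a "retweet_count" field), which is what a retweets-per-post chart would be built from.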
I've crawled a tracklist of 36,000 songs that have been played on the Danish national radio station P3. I want to do some statistics on how frequently each genre has been played within this period, so I figured the Discogs API might help label each track with a genre. However, the documentation for the API doesn't seem to include an example of querying the genre of a particular song.
I have a CSV file with 3 columns: Artist, Title and Test (Test is where I want the API to label each song with its genre).
Here's a sample of the script I've built so far:
import json
import pandas as pd
import requests
import discogs_client
d = discogs_client.Client('ExampleApplication/0.1')
d.set_consumer_key('key-here', 'secret-here')
input = pd.read_csv('Desktop/TEST.csv', encoding='utf-8',error_bad_lines=False)
df = input[['Artist', 'Title', 'Test']]
df.columns = ['Artist', 'Title','Test']
for i in range(0, len(list(df.Artist))):
    x = df.Artist[i]
    g = d.artist(x)
    df.Test[i] = str(g)
df.to_csv('Desktop/TEST2.csv', encoding='utf-8', index=False)
This script has worked so far with a dummy file of 3 records, mapping the artist for a given ID number. But as soon as the file gets larger (e.g. 2,000 records), it raises an HTTPError when it cannot find the artist.
I have some questions regarding this approach:
1) Would you recommend using the search function in the API to retrieve a variable such as 'Genre'? Or do you think it is possible to retrieve the genre with a 'd.' function from the API?
2) Will I need to acquire an API key? I have successfully mapped the 3 records without an API key so far. It looks like the key is free, though.
Here's the guide I have been following:
https://github.com/discogs/discogs_client
And here's the documentation for the API:
https://www.discogs.com/developers/#page:home,header:home-quickstart
Maybe you need to re-read the discogs_client examples. I am not an expert myself, just a newbie trying to use this API.
AFAIK, g = d.artist(x) fails because x must be an integer, not a string.
So you must first do a search, then get the artist id, then call d.artist(artist_id).
Sorry for not providing an example, I am a Python newbie right now ;)
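A rough sketch of those three steps, for what it's worth. It assumes discogs_client's search() accepts the artist name plus a type="artist" filter and that each hit exposes an id; find_artist is a hypothetical helper name. As far as I know, searching requires an authenticated client, so treat this as an outline rather than tested code.

```python
def find_artist(client, name):
    """Search for an artist by name and return the first match, or None."""
    for hit in client.search(name, type="artist"):
        # Look the artist up by its numeric id, as d.artist() expects.
        return client.artist(hit.id)
    return None  # the search returned no results
```

In the loop from the question this would replace g = d.artist(x) with g = find_artist(d, x), followed by a check for None before writing to the Test column.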
Also have you checked acoustid for
It's probably a rate limit.
Check the status code of the response; you should find a 429 Too Many Requests.
Unfortunately, if that's the case, the only solution is to add a sleep to your code so that it makes one request per second.
Check out the API docs:
http://www.discogs.com/developers/#page:home,header:home-rate-limiting
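One way to add that sleep without scattering time.sleep() calls through the script is a small generator that spaces the lookups out. A minimal sketch using only the standard library; throttled and lookup are hypothetical names, and the one-second interval is the figure suggested above, not a value checked against the current rate limits.

```python
import time


def throttled(items, lookup, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield (item, lookup(item)), keeping calls at least min_interval seconds apart."""
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # wait out the remainder of the interval
        last_call = clock()
        yield item, lookup(item)
```

For the script in the question, usage might look like: for name, artist in throttled(df.Artist, d.artist): ...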
I found this guide:
https://github.com/neutralino1/discogs_client.
Access the API with your key and try something like:
d = discogs_client.Client('something.py', user_token=auth_token)
release = d.release(774004)
genre = release.genres
If you find a better solution, please share.
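Since the CSV only has artist names and titles rather than release ids, the same idea could be combined with a search. A sketch under assumptions: it relies on search() forwarding artist and track as Discogs search fields and on each release result exposing genres as in the snippet above; genres_for_track is a hypothetical helper, and taking the first hit is a simplification that may pick the wrong release.

```python
def genres_for_track(client, artist, title):
    """Return the genres of the first release matching artist and track title."""
    results = client.search(artist=artist, track=title, type="release")
    for release in results:
        return list(release.genres or [])  # first hit only
    return []  # nothing matched
```

With the client from above this would be called as genres_for_track(d, "some artist", "some title"), once per CSV row.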