Getting more than 700k follower IDs from Twitter using Python

I was able to pull around 75,000 IDs. After that, it keeps returning duplicate IDs. Here is my code. Can I get any suggestions so that I can correctly pull this large number of follower IDs without duplicates?
import tweepy
import time
access_token = "..."
access_token_secret = "..."
consumer_key = "..."
consumer_secret = "..."
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
ids = []
while True:
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name="...").pages():
            ids.extend(page)
    except tweepy.TweepError:
        # rate limited: wait out the 15-minute window and try again
        time.sleep(60 * 15)
        continue
    except StopIteration:
        pass
    break

I don't know why you are getting duplicates, but you could put the values into a set rather than a list to remove them efficiently.
Just change ids = [] to ids = set(), and ids.extend(page) to ids.update(page).
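For what it's worth, here is a minimal sketch of the whole loop with a set, assuming the same auth object and Tweepy 3.x method names as in the question; wait_on_rate_limit lets Tweepy sleep through the 15-minute rate-limit windows for you instead of raising:
api = tweepy.API(auth, wait_on_rate_limit=True)
ids = set()  # a set silently discards duplicate IDs
for page in tweepy.Cursor(api.followers_ids, screen_name="...").pages():
    ids.update(page)  # add every ID on this page to the set
print(len(ids), "unique follower IDs collected")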

Related

How to get a Twitter user ID from a username using Tweepy?

I am getting an error. I have tried many times, but it's not showing the ID from the username.
for tweet in tweepy.Cursor(api..........:
    try:
        screen_name = "NFTfemmefatale"
        id = screen_name
        get = api.get_user(id)
        print("id:" + str(get))
    except:
        print("error")
try this:
import tweepy
# assign the values accordingly
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authorization of consumer key and consumer secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access to user's access key and access secret
auth.set_access_token(access_token, access_token_secret)
# calling the api
api = tweepy.API(auth)
# the screen name of the user
screen_name = "yourname"
# fetching the user
user = api.get_user(screen_name)
# fetching the ID
ID = user.id_str
print("The ID of the user is : " + ID)

Is it possible to use the Tweepy module to get the date followers were added?

I apologize in advance if this is covered in the Tweepy documentation and I simply don't know how to search for it. I am quite new to Python/programming in general.
I have written a small script to pull Twitter follower data for an account I manage for work. I would like to investigate when followers added us, to see if our posts are increasing engagement. What I cannot figure out is whether I can use the Tweepy module to pull this particular piece of information (when the follower added us).
Thank you in advance for any help. My MWE:
import tweepy
import pandas as pd
# Load API keys
consumer_key = "my_consumer_key"
consumer_secret = "my_consumer_secret"
access_token = "my_access_token"
access_token_secret = "my_access_token_secret"
# Authenticate access to Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Get the list of followers for the account
followers = api.followers_ids()
# Create a user map
userMap = {}
# Loop over all users
for f in followers:
    # create a temporary list
    tempList = []
    try:
        tweets = api.user_timeline(f, count=33)  # pull the 33 most recent tweets
    except tweepy.TweepError:
        print('Failed to run command.')  # Tweepy throws an error if a user hasn't tweeted
        continue  # skip this user so we don't reuse the previous user's tweets
    # Loop over all tweets per each user f
    for t in tweets:
        tempList.append(t)
    userMap[f] = tempList
# Create lists of pertinent data
dateList = []
favList = []
rtList = []
keyList = []
def genList(tweetList):
    for tweets in tweetList:
        for t in tweets:
            keyList.append(str(t.id))
            dateList.append(str(t.created_at))
            favList.append(str(t.favorite_count))
            rtList.append(str(t.retweet_count))
genList(userMap.values())
# Create a pandas data frame
df = pd.DataFrame(list(zip(keyList, dateList, favList, rtList)),
                  columns=['userID', 'created_at', 'favorited', 'retweeted'])
This information is not provided by Twitter.
The followers/list endpoint (the followers() method in Tweepy) returns a list of User objects, and none of their fields record when the follow happened. It looks like the only solution is to monitor the changes yourself and manage the history.
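A rough sketch of that monitoring approach, assuming the authenticated api object from the question (the file name and the daily schedule are only illustrative):
import datetime
import json
import os
SNAPSHOT_FILE = "follower_history.json"  # hypothetical file name
# load the history from the previous run: {follower_id: first_seen_date}
history = {}
if os.path.exists(SNAPSHOT_FILE):
    with open(SNAPSHOT_FILE) as f:
        history = json.load(f)
# record today's date for every follower ID we haven't seen before
today = datetime.date.today().isoformat()
for page in tweepy.Cursor(api.followers_ids).pages():
    for follower_id in page:
        history.setdefault(str(follower_id), today)
with open(SNAPSHOT_FILE, "w") as f:
    json.dump(history, f)
Running this on a schedule (e.g. a daily cron job) gives an approximate "followed on" date from the point you start monitoring, which is the best you can do given that Twitter does not expose the real timestamp.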

Tweepy: Ignore previous tweets to improve optimization

Problem: Trying to pull tweets via tweepy using Cursor. I want to make sure I don't pull tweets I previously pulled.
Here is working code:
import csv
import tweepy
import pandas as pd
import numpy as np
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
# Creation of the actual interface, using authentication
api = tweepy.API(auth, wait_on_rate_limit=True)
csvFile = open(r'filename', 'a')
# Use csv writer
headers = ['UserName', 'Tweet', 'TweetId', 'tweet_date', 'source', 'fav_count', 'retweet_count', 'coordinates', 'geo']
# definitions for writing to CSV
csvWriter = csv.writer(csvFile, lineterminator='\n')
# write the headers once
csvWriter.writerow(headers)
handles = ['pycon', 'gvanrossum']
previousTweets = [
    '222288832031240000',
    '222287080586362000',
    '222277240178741000',
    '221414283844653000',
    '221188011906445000',
    '205274818877210000',
]
for handle in handles:
    for status in tweepy.Cursor(api.user_timeline, screen_name=handle, tweet_mode="extended").items():
        if status.id not in previousTweets:
            csvWriter.writerow([status.user.name.encode('utf-8'), status.full_text.encode('utf-8'), status.id, status.created_at, status.source,
                                status.favorite_count, status.retweet_count, status.coordinates, status.geo])
    print(handle)
This takes a long time and becomes unusable if the previousTweets list grows beyond about 75 tweets. Does anyone know a better way to filter out old tweets when using Tweepy and the Cursor function?
You can pass the since_id argument to the cursor.
This fetches only statuses more recent than the specified ID (see http://docs.tweepy.org/en/v3.5.0/api.html#API.user_timeline):
try:
    since_id = previous_tweets[-1]
except IndexError:
    since_id = None
for handle in handles:
    last_tweet_id = None
    for status in tweepy.Cursor(
        api.user_timeline, screen_name=handle,
        tweet_mode="extended", since_id=since_id
    ).items():
        # ... persist tweets to flat file or database
        last_tweet_id = status.id
    # this keeps the last_tweet_id in memory;
    # you may find that persisting it to a database is a better way to go.
    previous_tweets.append(last_tweet_id)
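A minimal way to persist the last seen ID between runs, assuming the api object from the question (the state file name is just an illustration):
import os
import tweepy
STATE_FILE = "last_tweet_id.txt"  # hypothetical state file
# read the ID stored by the previous run, if any
since_id = None
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        since_id = int(f.read().strip())
newest_id = since_id
for status in tweepy.Cursor(api.user_timeline, screen_name="pycon",
                            tweet_mode="extended", since_id=since_id).items():
    # ... write the status out to the CSV here ...
    if newest_id is None or status.id > newest_id:
        newest_id = status.id  # keep track of the most recent tweet seen
# store the newest ID so the next run only fetches newer tweets
if newest_id is not None:
    with open(STATE_FILE, "w") as f:
        f.write(str(newest_id))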

Python x Tweepy: how to pull tweets from all users contained within a list

I'm very, very new to Python, as a disclosure.
I have successfully pulled all users who are members of a list on Twitter. I have also pulled all tweets of a single user, based on screen name; both components are contained below. How do I combine these and pull all tweets of all users who are members of a list, please? Is this even possible? Everything below:
#GOAL: pull all tweets from all users who are members of a list.
#imports necessary methods from Twitter library
import json
import tweepy
import time
import csv
import sys
#authorises twitter
CONSUMER_KEY = 'SECRET'
CONSUMER_SECRET = 'SECRET'
ACCESS_TOKEN = 'SECRET'
ACCESS_SECRET = 'SECRET'
#authorisations
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)
#returns members of a list & some details on them
for user in tweepy.Cursor(api.list_members, slug="uk-mps-labour", owner_screen_name="tweetminster", include_entities=True).items():
    print(f"{user.id}\t{user.screen_name}\t{user.name}\t{user.description}\t{user.location}\t{user.followers_count}\t{user.friends_count}\t{user.verified}")
#creates a loop to iterate through the list of user ids
#returns all tweets of a user
counter = 0  #establishes a counter to number tweets output
for status in tweepy.Cursor(api.user_timeline, screen_name="frogface", tweet_mode="extended").items():
    counter = counter + 1
    print(f"{counter}\t{status.user.id}\t{status.user.screen_name}\t{status.created_at}\t{status.full_text}")
When you are iterating through the users of the list, instead of printing the user details, add the screen_name to a list.
Next, iterate through the screen_names list and get each user's tweets. The code will look something like this:
screen_names = []
#returns members of a list & some details on them
for user in tweepy.Cursor(api.list_members, slug="uk-mps-labour", owner_screen_name="tweetminster", include_entities=True).items():
    screen_names.append(f"{user.screen_name}")
for i in screen_names:
    #returns all tweets of a user
    counter = 0  #establishes a counter to number tweets output
    for status in tweepy.Cursor(api.user_timeline, screen_name=i, tweet_mode="extended").items():
        counter = counter + 1
        print(f"{counter}\t{status.user.id}\t{status.user.screen_name}\t{status.created_at}\t{status.full_text}")

Tweepy Search w/ While Loop

This is driving me crazy. As you can see below, I am trying to use a simple while loop to perform a couple of Tweepy searches and append the results to a data frame. For some reason, however, after pulling the first set of 100 tweets it just repeats that set instead of performing a new search. Any advice would be greatly appreciated.
import sys
import csv
import pandas as pd
import tweepy
from tweepy import OAuthHandler
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
num_results = 200
result_count = 0
last_id = None
df = pd.DataFrame(columns=['Name', 'Location', 'Followers', 'Text', 'Coordinates'])
while result_count < num_results:
    result = api.search(q='', count=100, geocode="38.996918,-104.995826,190mi", since_id=last_id)
    for tweet in result:
        user = tweet.user
        last_id = tweet.id_str
        name = user.name
        friends = user.friends_count
        followers = user.followers_count
        text = tweet.text.encode('utf-8')
        location = user.location
        coordinates = tweet.coordinates
        df.loc[result_count] = pd.Series({'Name': name, 'Location': location, 'Followers': followers, 'Text': text, 'Coordinates': coordinates})
        print(text)
        result_count += 1
# Save to Excel
print("Writing all tables to Excel...")
df.to_csv('out.csv')
print("Excel Export Complete.")
The API.search method returns tweets that match a specified query. It's not a streaming API, so it returns its data all at once.
Furthermore, in your query parameters you have added count, which specifies the number of statuses to retrieve.
So the problem is that with your query you are returning the first 100 results of the complete set on each while iteration.
I suggest you change the code to something like this:
result = api.search(q='', geocode="38.996918,-104.995826,190mi", since_id=last_id)
for tweet in result:
    user = tweet.user
    last_id = tweet.id_str
    name = user.name
    friends = user.friends_count
    followers = user.followers_count
    text = tweet.text.encode('utf-8')
    location = user.location
    coordinates = tweet.coordinates
    df.loc[result_count] = pd.Series({'Name': name, 'Location': location, 'Followers': followers, 'Text': text, 'Coordinates': coordinates})
    print(text)
    result_count += 1  # kept from the original loop so each tweet gets its own row
Let me know.
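As a side note that is not part of the answer above: with the standard search API, the usual way to page backwards without repeats is max_id (return tweets with IDs at or below the given value), while since_id is meant for picking up only newer tweets on a later run. A rough sketch of backwards paging under that assumption:
max_id = None
result_count = 0
while result_count < num_results:
    kwargs = dict(q='', count=100, geocode="38.996918,-104.995826,190mi")
    if max_id is not None:
        kwargs['max_id'] = max_id
    result = api.search(**kwargs)
    if not result:
        break  # nothing older left to fetch
    for tweet in result:
        # ... record the tweet fields as in the code above ...
        result_count += 1
    max_id = result[-1].id - 1  # continue strictly below the oldest ID seen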
