I need to extract the URL that is in a tweet when someone mentions me.
[image: example of the tweet I need to parse]
Problem: if the tweet contains a URL, I can't retrieve the text, but if the tweet only has text, I can retrieve it.
These are the results with different libraries.
Tweepy library
auth = tweepy.OAuthHandler(apiKey, apiSecretKey)
auth.set_access_token(accesToken, secretToken)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

search_words = "#LoremIp03500003 -filter:retweets"
tweets = tweepy.Cursor(api.search, q=search_words, tweet_mode="extended").items(20)
users = [[tweet.full_text, tweet.id] for tweet in tweets]
users
[image: Tweepy output]
Twython library
results = t.search(q="LoremIp03500003 -filter:retweets", tweet_mode='extended')
all_tweets = results['statuses']
for tweet in all_tweets:
    print(tweet['full_text'])
[image: Twython output]
Twitter library
api = twitter.Api(apiKey,
                  apiSecretKey,
                  accesToken,
                  secretToken)
api.GetSearch(term="#LoremIp03500003")
[image: twitter library output]
Final goal: I need to build a CSV of all mentions of my Twitter account with: username (the person who mentioned me), id_user, text, url (if the text contains one), and id_tweet.
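For that CSV, one relevant detail is that the v1.1 search endpoint already returns any links from the tweet under `entities['urls']`, with the expanded form alongside the shortened t.co wrapper that appears in `full_text`. A minimal sketch of flattening one mention into a row, using a hand-made status dict shaped like the v1.1 payload (the `mention_row` helper and all sample values are illustrations, not part of any library):

```python
import csv

def mention_row(status):
    """Flatten one mention (a v1.1-style Tweet JSON dict) into a CSV row.

    URLs that appear in the text are also listed, already expanded,
    under entities['urls'], so there is no need to parse full_text."""
    urls = status.get("entities", {}).get("urls", [])
    return {
        "username": status["user"]["screen_name"],
        "id_user": status["user"]["id_str"],
        "text": status["full_text"],
        "url": urls[0]["expanded_url"] if urls else "",
        "id_tweet": status["id_str"],
    }

# Sample payload shaped like api.search output (tweet_mode="extended"):
sample = {
    "id_str": "1310127204538482690",
    "full_text": "@me check this https://t.co/abc123",
    "user": {"screen_name": "someone", "id_str": "42"},
    "entities": {"urls": [{"url": "https://t.co/abc123",
                           "expanded_url": "https://example.com/page"}]},
}

row = mention_row(sample)
print(row["url"])  # the expanded URL, not the t.co wrapper

with open("mentions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
```

With Tweepy status objects, the same dict is available as `tweet._json`, so each cursor result could be passed through a helper like this before writing.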
Related
I am trying to get, for a list of tweets with a media attachment and a specific hashtag, their:
text, author id, tweet id, creation date, retweet_count, like_count and image url.
I am, however, having some trouble grabbing the image url of the media attachment.
This is one of my very poor (quite the novice here) attempts to do it:
client = tweepy.Client('bearer_token')
response = client.search_recent_tweets(
    "#covid -is:retweet has:media",
    max_results=100,
    expansions="author_id,attachments.media_keys",
    tweet_fields="created_at,public_metrics,attachments",
    user_fields="username,name,profile_image_url",
    media_fields="public_metrics,url,height,width,alt_text")

for tweet in response.data:
    metric = tweet.public_metrics
    print(f"{tweet.created_at}, {tweet.text}, {tweet.author_id} \n {metric['retweet_count']}, {metric['like_count']}")
    for image in tweet.includes['media']:
        media[image.media_key] = f"{image.url}"
        print(media[image.media_key])
With this I can get the text of the tweet. If I use the same technique with Paginator to obtain more tweets, I cannot even see the text.
Does anyone know how to retrieve both the text and the image url (preferably using Paginator) of a tweet?
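The expansion data does not hang off each tweet: in a v2 response the media objects arrive once, in `response.includes['media']`, and each tweet only carries `media_keys` that point into that list (with `tweepy.Paginator`, each page is its own response with its own `includes`). A minimal sketch of the join, using plain dicts shaped like a v2 search payload (the sample `tweets`/`includes` fragments are made up for illustration):

```python
def attach_media_urls(tweets, includes):
    """Pair each tweet with the URLs of its attached media by joining
    attachments.media_keys against the shared includes['media'] list."""
    media_by_key = {m["media_key"]: m.get("url")
                    for m in includes.get("media", [])}
    out = []
    for t in tweets:
        keys = t.get("attachments", {}).get("media_keys", [])
        out.append((t["id"], t["text"], [media_by_key.get(k) for k in keys]))
    return out

# Hypothetical response fragments standing in for response.data / response.includes:
tweets = [{"id": "1", "text": "a photo #covid",
           "attachments": {"media_keys": ["3_111"]}}]
includes = {"media": [{"media_key": "3_111",
                       "url": "https://pbs.twimg.com/media/xyz.jpg"}]}

for tweet_id, text, urls in attach_media_urls(tweets, includes):
    print(tweet_id, text, urls)
```

In the original loop the lookup would be built from `response.includes['media']` before iterating `response.data` (or from `page.includes` inside each Paginator page). Note that `url` may be absent for videos, which is why the sketch uses `.get()`.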
import pandas as pd
import tweepy as tw  # To extract the twitter data using Twitter's official API
from tqdm import tqdm, notebook
import os

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)

consumer_api_key = 'XXXX'
consumer_api_secret = 'XXXX'

auth = tw.OAuthHandler(consumer_api_key, consumer_api_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# We type in our keyword to search for relevant tweets that contain "#".
# You can fix a time frame with the until parameter.
search_words = "#Ethereum -filter:retweets"
date_until = "2021-05-01"

# Collect tweets
tweets = tw.Cursor(api.search_tweets,
                   q=search_words,
                   lang="en",
                   until=date_until).items(15000)

tweets_copy = []
for tweet in tqdm(tweets):
    tweets_copy.append(tweet)

print(f"New tweets retrieved: {len(tweets_copy)}")
I am trying to extract tweets with the keyword #Ethereum from a specific time frame, but when I run the code I keep getting a red bar in Jupyter Notebook that says "0it [00:00, ?it/s]", which means no tweets are being retrieved. Can anyone help?
From the Twitter search documentation:
The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.
And from the until parameter documentation for this endpoint:
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
This is also clearly written in the Tweepy method documentation.
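One way to avoid silently falling outside that window is to derive `until` from the current date instead of hard-coding it. A small sketch:

```python
from datetime import date, timedelta

# The standard search index only covers roughly the last 7 days, so a
# hard-coded date like "2021-05-01" eventually returns nothing. Deriving
# the cut-off from today keeps it inside the window:
date_until = (date.today() - timedelta(days=1)).isoformat()
print(date_until)
```

Keep in mind that `until` is exclusive (it returns tweets created before the given date); anything older than about a week needs the full-archive endpoints instead of standard search.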
I am going to collect all the comments on a particular Twitter username's tweets within 2018. How can I do that?
The code below scrapes the replies to one tweet, without any time consideration:
name = 'MoveTheWorld'
tweet_id = '1310127204538482690'

replies = []
for tweet in tweepy.Cursor(api.search, q='to:'+name, result_type='recent', timeout=999999).items(8000):
    if hasattr(tweet, 'in_reply_to_status_id_str'):
        if (tweet.in_reply_to_status_id_str == tweet_id):
            replies.append(tweet)
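To restrict the collected replies to 2018, one option is to filter on each status's `created_at` after collection (Tweepy exposes it as a `datetime`). A minimal sketch, where the helper and the sample datetimes are made up for illustration:

```python
from datetime import datetime

def within_year(created_at, year=2018):
    # tweet.created_at from Tweepy is a datetime, so a simple year check works
    return created_at.year == year

# Hypothetical creation times standing in for tweet.created_at values:
times = [datetime(2018, 3, 1), datetime(2019, 1, 5), datetime(2018, 12, 31)]
kept = [t for t in times if within_year(t)]
print(len(kept))  # 2
```

In the loop above this would become `if within_year(tweet.created_at): replies.append(tweet)`. Note that standard search only reaches back about 7 days, so tweets from 2018 would in practice require the full-archive endpoints.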
I am trying to scrape the tweets from a trending tag on Twitter. I tried to find the XPath of the text in a tweet, but it doesn't work.
browser = webdriver.Chrome('/Users/Suraj/Desktop/twitter/chromedriver')
url = 'https://twitter.com/search?q=%23'+'Swastika'+'&src=trend_click'
browser.get(url)
time.sleep(1)
The following piece of code doesn't give any results.
browser.find_elements_by_xpath('//*[@id="tweet-text"]')
Other selectors that I was able to find were:
browser.find_elements_by_css_selector("[data-testid=\"tweet\"]") # works
browser.find_elements_by_xpath("/html/body/div[1]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/section/div/div/div/div/div/div/article/div/div/div/div[2]/div[2]/div[1]/div/div") # works
I want to know how I can select the text from the tweet.
You can use Selenium to scrape Twitter, but it would be much easier, faster, and more efficient to use the Twitter API with tweepy. You can sign up for a developer account here: https://developer.twitter.com/en/docs
Once you have signed up get your access keys and use tweepy like so:
import tweepy

# connects to twitter and authenticates your requests
auth = tweepy.OAuthHandler(TWapiKey, TWapiSecretKey)
auth.set_access_token(TWaccessToken, TWaccessTokenSecret)

# wait_on_rate_limit prevents you from requesting too many times and having twitter block you
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# loops through every tweet that tweepy.Cursor pulls -- api.search tells cursor
# what to do, q is the search term, result_type can be recent, popular or mixed,
# max_id/since_id are snowflake ids, which are twitter's way of representing
# time, and count is the maximum number of tweets returned per request.
for tweet in tweepy.Cursor(api.search, q=YourSearchTerm, result_type='recent', max_id=snowFlakeCurrent, since_id=snowFlakeEnd, count=100).items(500):
    createdTime = tweet.created_at.strftime('%Y-%m-%d %H:%M')
    createdTime = dt.datetime.strptime(createdTime, '%Y-%m-%d %H:%M').replace(tzinfo=pytz.UTC)
    data.append(createdTime)
This code is an example of a script that pulls 500 recent tweets matching YourSearchTerm and appends each tweet's creation time to a list. You can check out the tweepy documentation here: http://docs.tweepy.org/en/latest/
Each tweet that you pull with tweepy.Cursor() has many attributes that you can pick out, append to a list, or process in some other way. Even though it is possible to scrape Twitter with Selenium, it's really not recommended, as it will be very slow, whereas tweepy returns results in mere seconds.
Applying for the API is not always successful. I used Twint, which provides a way to scrape quickly, in this case to a CSV output.
def search_twitter(terms, start_date, filename, lang):
    c = twint.Config()
    c.Search = terms
    c.Custom_csv = ["id", "user_id", "username", "tweet"]
    c.Output = filename
    c.Store_csv = True
    c.Lang = lang
    c.Since = start_date
    twint.run.Search(c)
    return
I am trying to extract the tweets of my friends using api.home_timeline. I don't want to stream it, but I want to save 800 tweets, the screen names, and their likes/favorites count to a csv file. Twitter only allows 200 tweets at a time. Given my keys as already specified, this is what I have so far:
def data_set(handle):
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
    api = tweepy.API(auth)
    count_tweets = api.home_timeline(screen_name=handle, count=200)
    twits = []
    tweet_data = [tweet.text for tweet in count_tweets]
    for t in count_tweets:
        twits.append(t)

if __name__ == '__main__':
    data_set('my twitter name')
My original plan was to have multiple count_tweets variables, such as count_tweet1, etc. I am unsure how to proceed with the rest. Any suggestions are greatly appreciated.
Twitter follows pagination. For every request you make, it gives a maximum of 200 tweets (in the case of home_timeline). The 200 tweets you get are based on popularity. You can fetch all the tweets from the user's timeline by iterating over the pages. Tweepy provides Cursor functionality to iterate over the pages.
Edited code for your case:
def data_set(handle):
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
    api = tweepy.API(auth)
    tweet_data = []
    for page in tweepy.Cursor(api.user_timeline, screen_name=handle, count=200, tweet_mode='extended').pages():
        for tweet in page:
            tweet_data.append(tweet.full_text)
    return tweet_data
    ## Not sure why the following lines are needed
    # twits = []
    # tweet_data = [tweet.text for tweet in count_tweets]
    # for t in count_tweets:
    #     twits.append(t)

if __name__ == '__main__':
    print(data_set('my twitter name'))
I have used api.user_timeline instead of api.home_timeline in the code, as you said you are trying to fetch the tweets from your friends' timelines. If your use case is satisfied by api.home_timeline, you can use it instead.