For a list of tweets that have media attached and contain a specific hashtag, I am trying to get their
text, author id, tweet id, creation date, retweet_count, like_count and image url.
I am, however, having trouble grabbing the image url of the media attachment.
This is one of my (admittedly novice) attempts:
client = tweepy.Client('bearer_token')
response = client.search_recent_tweets(
    "#covid -is:retweet has:media",
    max_results=100,
    expansions="author_id,attachments.media_keys",
    tweet_fields="created_at,public_metrics,attachments",
    user_fields="username,name,profile_image_url",
    media_fields="public_metrics,url,height,width,alt_text")
for tweet in response.data:
    metric = tweet.public_metrics
    print(f"{tweet.created_at}, {tweet.text}, {tweet.author_id} \n {metric['retweet_count']}, {metric['like_count']}")
    for image in tweet.includes['media']:
        media[image.media_key] = f"{image.url}"
        print(media[image.media_key])
With this I can get the text of the tweet, but not the image url. If I use the same technique with Paginator to obtain more tweets, I cannot even see the text.
Does anyone know how to retrieve both the text and the image url of a tweet (preferably using Paginator)?
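One possible way to pair each tweet with its image url, sketched on the raw JSON shape of a v2 search response (plain dicts with `data` and `includes` keys). The function name `attach_media_urls` and the row layout are made up for illustration. The idea it relies on is that media objects are not stored on each tweet: they are listed once per response under `includes`, and each tweet only carries the matching `media_keys` under `attachments`.

```python
def attach_media_urls(payload):
    """Return one row per tweet, with the URLs of its attached media."""
    # Media objects arrive once per response under includes.media.
    media_by_key = {
        m["media_key"]: m
        for m in payload.get("includes", {}).get("media", [])
    }
    rows = []
    for tweet in payload.get("data", []):
        keys = tweet.get("attachments", {}).get("media_keys", [])
        urls = []
        for key in keys:
            media = media_by_key.get(key)
            if media is None:
                continue
            # Photos carry "url"; fall back to "preview_image_url" if it was
            # requested and "url" is missing (assumed to be the video/GIF case).
            urls.append(media.get("url") or media.get("preview_image_url"))
        rows.append({"id": tweet["id"], "text": tweet["text"], "image_urls": urls})
    return rows
```

With Paginator, each page should be a separate response with its own `includes`, so the same join would be applied page by page. As far as I can tell, flattening the paginator yields only the tweets and drops `includes`, which may be why the media cannot be found that way.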
Related
I need to extract the URL contained in a tweet when someone mentions me.
[Image: what I need]
Problem: if the tweet contains a URL, I cannot get the text; if the tweet contains only text, I can get it.
These are the results with different libraries.
Tweepy library
auth = tweepy.OAuthHandler(apiKey, apiSecretKey)
auth.set_access_token(accesToken, secretToken)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
search_words = "#LoremIp03500003 -filter:retweets"
tweets = tweepy.Cursor(api.search, q=search_words, tweet_mode="extended").items(20)
users = [[tweet.full_text, tweet.id] for tweet in tweets]
users
[Image: Tweepy output]
Twython library
results = t.search(q="LoremIp03500003 -filter:retweets", tweet_mode='extended')
all_tweets = results['statuses']
for tweet in all_tweets:
    print(tweet['full_text'])
[Image: Twython output]
Twitter library
api = twitter.Api(apiKey,
                  apiSecretKey,
                  accesToken,
                  secretToken)
api.GetSearch(term="#LoremIp03500003")
[Image: Twitter output]
Final goal: I need to make a CSV of all mentions of my Twitter account with: username (of the person who mentioned me), id_user, text, url (if the text contains one), id_tweet
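A rough sketch of the CSV step, assuming each mention is available as a v1.1-style status dict (the kind Twython returns, or what Tweepy keeps in `status._json`). The names `FIELDS`, `mention_rows` and `write_mentions_csv` are invented for this example. The text is read from `full_text` when present, and the links are taken from `entities['urls']` rather than parsed out of the text.

```python
import csv

FIELDS = ["username", "id_user", "text", "url", "id_tweet"]

def mention_rows(statuses):
    """Yield one CSV row (a dict) per mention."""
    for status in statuses:
        # With tweet_mode="extended" the text is under "full_text", otherwise "text".
        text = status.get("full_text") or status.get("text", "")
        urls = [u["expanded_url"] for u in status.get("entities", {}).get("urls", [])]
        yield {
            "username": status["user"]["screen_name"],
            "id_user": status["user"]["id"],
            "text": text,
            "url": " ".join(urls),  # empty when the tweet has no link
            "id_tweet": status["id"],
        }

def write_mentions_csv(statuses, out):
    """Write the rows to an open text file (or any file-like object)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(mention_rows(statuses))
```

It would be called as `write_mentions_csv(statuses, open("mentions.csv", "w", newline="", encoding="utf-8"))`, where `statuses` is whatever list of mention dicts the chosen library returns.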
I am trying to download some photos from Flickr. With my KEY and SECRET, I am able to search and download using these lines of code:
image_tag = 'seaside'
extras = ','.join(SIZES[0])
flickr = FlickrAPI(KEY, SECRET)
photos = flickr.walk(text=image_tag,  # it will search by image title and image tags
                     extras=extras,  # get the urls for each size we want
                     privacy_filter=1,  # search only for public photos
                     per_page=50,
                     sort='relevance',
                     safe_search=1)
Using this I am able to acquire the url and the photo ID, but I would also like to download the photo stats (likes, views). I can't find a command that, starting from the photo ID, lets me download the stats.
You can find exactly what you are looking for in the API documentation on Flickr's website:
https://www.flickr.com/services/api/flickr.stats.getPhotoStats.html
Calling the method:
flickr.stats.getPhotoStats
with arguments:
api_key, date, photo_id
You will receive what you are looking for in the following format:
<stats views="24" comments="4" favorites="1" />
Remember to generate your authentication token first; if you have not done so yet, the same page links to instructions on how to generate it.
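Once you have that response, reading the numbers out of it could look like the sketch below. It only covers the parsing side and assumes the reply is the XML shown above, either bare or wrapped in an outer element such as `<rsp>`; the helper name `parse_photo_stats` is invented for this example.

```python
import xml.etree.ElementTree as ET

def parse_photo_stats(xml_text):
    """Extract views, comments and favorites from a getPhotoStats reply."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <stats .../> element or one nested in a wrapper.
    stats = root if root.tag == "stats" else root.find(".//stats")
    if stats is None:
        raise ValueError("no <stats> element in response")
    return {
        name: int(stats.get(name, 0))
        for name in ("views", "comments", "favorites")
    }
```

Calling it on the sample above gives a dict with the three counters as integers, which can then be stored next to the photo ID and url you already collect.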
I am trying to scrape the tweets from a trending tag in twitter. I tried to find the xpath of the text in a tweet, but it doesn't work.
browser = webdriver.Chrome('/Users/Suraj/Desktop/twitter/chromedriver')
url = 'https://twitter.com/search?q=%23'+'Swastika'+'&src=trend_click'
browser.get(url)
time.sleep(1)
The following piece of code doesn't give any results.
browser.find_elements_by_xpath('//*[@id="tweet-text"]')
Other selectors that I was able to get working:
browser.find_elements_by_css_selector("[data-testid=\"tweet\"]") # works
browser.find_elements_by_xpath("/html/body/div[1]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/section/div/div/div/div/div/div/article/div/div/div/div[2]/div[2]/div[1]/div/div") # works
I want to know how I can select the text from the tweet.
You can use Selenium to scrape Twitter, but it would be much easier, faster and more efficient to use the Twitter API with tweepy. You can sign up for a developer account here: https://developer.twitter.com/en/docs
Once you have signed up, get your access keys and use tweepy like so:
import datetime as dt
import pytz
import tweepy
# connects to twitter and authenticates your requests
auth = tweepy.OAuthHandler(TWapiKey, TWapiSecretKey)
auth.set_access_token(TWaccessToken, TWaccessTokenSecret)
# wait_on_rate_limit prevents you from requesting too many times and having twitter block you
# (this is the Tweepy 3.x API; in Tweepy 4+, wait_on_rate_limit_notify was removed
# and api.search was renamed to api.search_tweets)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
data = []  # collects the creation time of every tweet
# loops through every tweet that tweepy.Cursor pulls -- api.search tells cursor
# what to do, q is the search term, result_type can be recent popular or mixed,
# and the max_id/since_id are snowflake ids which are twitters way of
# representing time and finally count is the maximum amount of tweets you can return per request.
for tweet in tweepy.Cursor(api.search, q=YourSearchTerm, result_type='recent', max_id=snowFlakeCurrent, since_id=snowFlakeEnd, count=100).items(500):
    # round-trip through a string to drop the seconds, then mark the time as UTC
    createdTime = tweet.created_at.strftime('%Y-%m-%d %H:%M')
    createdTime = dt.datetime.strptime(createdTime, '%Y-%m-%d %H:%M').replace(tzinfo=pytz.UTC)
    data.append(createdTime)
This code is an example of a script that pulls up to 500 recent tweets matching YourSearchTerm and appends the time each was created to a list. You can check out the tweepy documentation here: http://docs.tweepy.org/en/latest/
Each tweet that you pull with tweepy.Cursor() has many attributes that you can append to a list or process in some other way. Even though it is possible to scrape Twitter with Selenium, it is really not recommended as it will be very slow, whereas tweepy returns results in mere seconds.
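As a side note on the timestamp handling in the loop: formatting to a string and parsing it back is one way to drop the seconds. A possible standard-library alternative, sketched under the assumption that `created_at` is already expressed in UTC, is shown below; `to_utc_minute` is just a name picked for this example.

```python
from datetime import datetime, timezone

def to_utc_minute(created_at):
    """Drop seconds and microseconds and label the timestamp as UTC."""
    # replace() only relabels the timezone; it does not convert between zones,
    # so this is only correct if the input is already a UTC time.
    return created_at.replace(second=0, microsecond=0, tzinfo=timezone.utc)
```

Inside the loop this would be used as `data.append(to_utc_minute(tweet.created_at))`.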
Applying for the API is not always successful. I used Twint, which provides a means to scrape quickly, in this case to a CSV output.
def search_twitter(terms, start_date, filename, lang):
    c = twint.Config()
    c.Search = terms
    c.Custom_csv = ["id", "user_id", "username", "tweet"]
    c.Output = filename
    c.Store_csv = True
    c.Lang = lang
    c.Since = start_date
    twint.run.Search(c)
    return
I am trying to extract the tweets of my friends using api.home_timeline. I don't want to stream it, but I want to save 800 tweets, the screen names, and their likes/favorites count to a csv file. Twitter only allows 200 tweets at a time. Given my keys as already specified, this is what I have so far:
def data_set(handle):
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
    api = tweepy.API(auth)
    count_tweets = api.home_timeline(screen_name=handle, count=200)
    twits = []
    tweet_data = [tweet.text for tweet in count_tweets]
    for t in count_tweets:
        twits.append(t)
if __name__ == '__main__':
    data_set('my twitter name')
My original plan was to have multiple count_tweets variables, such as count_tweet1, etc. I am unsure how to proceed with the rest. Any suggestions are greatly appreciated.
Twitter paginates its results. Each request returns a maximum of 200 tweets (in the case of home_timeline), most recent first. You can fetch all the tweets from the user's timeline by iterating over the pages. Tweepy provides a Cursor to iterate over the pages.
Edited code for your case:
def data_set(handle):
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
    api = tweepy.API(auth)
    tweet_data = []
    # Cursor requests page after page (up to 200 tweets each) until the timeline is exhausted
    for page in tweepy.Cursor(api.user_timeline, screen_name=handle, count=200, tweet_mode='extended').pages():
        for tweet in page:
            tweet_data.append(tweet.full_text)
    return tweet_data
    ## Not sure why the following lines are needed
    # twits=[]
    # tweet_data=[tweet.text for tweet in count_tweets]
    # for t in count_tweets:
    #     twits.append(t)
if __name__ == '__main__':
    print(data_set('my twitter name'))
I have used api.user_timeline instead of api.home_timeline in the code, as you said you are trying to fetch the tweets from your friends' timelines. If api.home_timeline satisfies your use case, you can use it instead.
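For the "save 800 tweets with screen names and like counts to a CSV" part of the question, something along these lines might work. It is a sketch that accepts any iterable of tweet-like objects exposing `user.screen_name`, `favorite_count` and `full_text` (or `text`); the function name `save_tweets` and the column names are made up here.

```python
import csv
from itertools import islice

def save_tweets(tweets, path, limit=800):
    """Write up to `limit` tweets to a CSV file and return how many were written."""
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["screen_name", "favorite_count", "text"])
        # islice stops after `limit` items, so no more pages are pulled than needed.
        for tweet in islice(tweets, limit):
            # Extended-mode tweets expose full_text; otherwise fall back to text.
            text = getattr(tweet, "full_text", None) or tweet.text
            writer.writerow([tweet.user.screen_name, tweet.favorite_count, text])
            written += 1
    return written
```

With Tweepy, the iterable could be `tweepy.Cursor(api.home_timeline, count=200, tweet_mode='extended').items()`, so that the Cursor handles the 200-per-request limit while the function caps the total.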
For instance, with my current code, this tweet shows as:
Stunning ride through the Oxfordshire lanes this morning to get the legs ready for the Fast Test… https:// t.co/W0uFKU9jCr
I want it to look as it is shown on Twitter's website, e.g.:
Stunning ride through the Oxfordshire lanes this morning to get the legs ready for the Fast Test… https://www.instagram.com/p/BSocl5Djf5v/
How can I do this exactly? I mean replacing Twitter's short urls with the urls of media, expanded urls, tweet quotes... I know it has to do with the 'entities' object in the JSON, but I'm not sure how to handle that in my code:
for status in new_tweets:
    if ('RT @' not in status.full_text):
        id = status.id
        text = status.full_text
You are right that you need to use entities. You can get the expanded_url like so:
for status in tweepy.Cursor(twitter_api.user_timeline, screen_name=username).items(limit):
    if status.entities['urls']:
        for url in status.entities['urls']:
            links = url['expanded_url']
            print(links)
You can make this print out the status text and the expanded_url by concatenating them:
for status in tweepy.Cursor(twitter_api.user_timeline, screen_name=username).items(limit):
    if status.entities['urls']:
        for url in status.entities['urls']:
            links = url['expanded_url']
            print(status.text + links)
Note that this code will only print when the tweet has URLs, so I believe it will not print a tweet if no media link is shared.
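To get the text the way the question asks for it (short links swapped for their targets, and tweets without links still kept), a small helper along these lines could be used. It is a sketch that assumes the v1.1 `entities` layout, where each item under `urls` (and `media`) has a `url` and an `expanded_url`; the name `expand_urls` is invented here.

```python
def expand_urls(text, entities):
    """Replace each t.co link in `text` with its expanded_url."""
    for kind in ("urls", "media"):
        for item in entities.get(kind, []):
            short, full = item.get("url"), item.get("expanded_url")
            if short and full:
                text = text.replace(short, full)
    # Tweets without any entities come back unchanged.
    return text
```

In the loop above it would be called as `expand_urls(status.full_text, status.entities)`, provided the timeline was requested with `tweet_mode='extended'` so that `full_text` is available.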