Exporting tweets to a dataframe - python

I can't export the user information as a data frame, even though it looks fine in the console when I run print(users_info). Can someone help? Thanks.
# Define the search term and the date_since date as variables
search_words = "#sunrise"
date_since = "2019-09-01"
# Collect tweets
tweets = tw.Cursor(api.search,
q=search_words,
lang="en",
since=date_since).items(5)
users_info = [[tweet.user.screen_name, tweet.user.location, tweet.text, tweet.created_at, tweet.retweet_count, tweet.source] for tweet in tweets]
df = pd.DataFrame(users_info, columns =['user_name','user_location', 'text', 'date', 'retweet_count', 'url'])
df.to_excel=('sunrise_tweets.xlsx')

There should not be an = after df.to_excel. With it, the line assigns your filename string to the attribute df.to_excel (shadowing the method) instead of calling the df.to_excel method:
df.to_excel('sunrise_tweets.xlsx')
Also ensure you have installed openpyxl or XlsxWriter. See the docs for further information.
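Here is a minimal sketch of the same pipeline wrapped in a helper. It assumes tweets is any iterable of Tweepy Status objects, for example the tw.Cursor(...).items(5) result above. The name tweets_to_dataframe is hypothetical. Note that tweet.source holds the client application name (such as "Twitter Web App"), not a URL, so the sketch labels that column source.

```python
import pandas as pd

COLUMNS = ["user_name", "user_location", "text", "date", "retweet_count", "source"]


def tweets_to_dataframe(tweets):
    """Flatten Tweepy Status-like objects into a DataFrame (hypothetical helper)."""
    rows = [
        [
            tweet.user.screen_name,
            tweet.user.location,
            tweet.text,
            tweet.created_at,
            tweet.retweet_count,
            tweet.source,  # client app name, e.g. "Twitter for iPhone"
        ]
        for tweet in tweets
    ]
    # Build the frame in one go from the collected rows
    return pd.DataFrame(rows, columns=COLUMNS)
```

After df = tweets_to_dataframe(tweets), call df.to_excel('sunrise_tweets.xlsx', index=False) to write the file. This is a call, not an assignment, and it needs openpyxl installed. df.to_csv(...) needs no extra package.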

Related

How to search a specific country's tweets with Tweepy client.search_recent_tweets()

Hi all. I'm trying to figure out how to filter for a specific country's tweets using search_recent_tweets. I take a country name as input and use pycountry to get the 2-character country code. I then try to add a location filter, either in my query or in the search_recent_tweets parameters. Nothing I have tried so far has worked in either place.
######
import tweepy
from tweepy import OAuthHandler
from tweepy import API
import pycountry as pyc
# upload token
BEARER_TOKEN='XXXXXXXXX'
# get tweets
client = tweepy.Client(bearer_token=BEARER_TOKEN)
# TAKE USER INPUT
countryQuery = input("Find recent tweets about travel in a certain country (input country name): ")
keyword = 'women safe' # gets tweets containing women and safe for that country (safe will catch safety)
# get country code to plug in as param in search_recent_tweets
country_code = str(pyc.countries.search_fuzzy(countryQuery)[0].alpha_2)
# get 100 recent tweets containing keywords and from location = countryQuery
query = str(keyword+' place_country='+str(countryQuery)+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
# expansions=geo.place_id, place.fields=[country_code],
# filter posts to remove retweets
# export tweets to json
import json
with open('twitter.json', 'w') as fp:
    for tweet in posts.data:
        json.dump(tweet.data, fp)
        fp.write('\n')
        print("* " + str(tweet.text))
I have tried variations of:
query = str(keyword+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, place_fields=[str(countryQuery), country_code], max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
and:
query = str(keyword+' place.fields='+str(countryQuery)+','+country_code+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
These either returned no tweets at all (posts.data was None) or raised this error:
"The place.fields query parameter value [Germany] is not one of [contained_within,country,country_code,full_name,geo,id,name,place_type]"
Based on the documentation for search_recent_tweets, it seems that place.fields / place_fields / place_country should be supported.
Any advice would help!
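The error message lists what place_fields accepts: names of Place attributes to return (such as country_code), not values to filter by. Filtering by country in the v2 query language uses the operator form place_country:XX with a colon and an ISO alpha-2 code. That operator must be combined with a standalone term such as your keywords. Below is a sketch of two hypothetical helpers. build_country_query builds the query string. tweets_in_country filters client-side on the raw v2 dicts that Tweepy exposes as tweet.data and place.data. Only geotagged tweets carry a geo.place_id, so expect most tweets to be dropped.

```python
def build_country_query(keyword, country_code):
    """Build a v2 search query using the place_country operator (hypothetical helper)."""
    # place_country: takes an ISO alpha-2 code; it cannot be used on its own,
    # so it is ANDed with the keywords here.
    return f"{keyword} place_country:{country_code.upper()} -is:retweet"


def tweets_in_country(tweets, places, country_code):
    """Keep tweets whose geo.place_id refers to a place in country_code.

    tweets and places are lists of raw v2 dicts (e.g. tweet.data, place.data).
    """
    code_by_place = {p["id"]: p.get("country_code") for p in places}
    wanted = country_code.upper()
    return [
        t for t in tweets
        if code_by_place.get(t.get("geo", {}).get("place_id")) == wanted
    ]
```

With a tweepy.Client, the request would look roughly like client.search_recent_tweets(query=build_country_query(keyword, country_code), expansions=['geo.place_id'], tweet_fields=['geo'], place_fields=['country_code'], max_results=100). The returned places are then in response.includes.get('places', []). Whether the place_country operator is accepted depends on your API access level. If it is rejected, drop it from the query and rely on the client-side filter. Also guard against response.data being None (for example, for tweet in response.data or []), since that is what an empty result looks like.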

How do I generate a new column in Pandas to create tweet hyperlinks from the conversation ID?

I am using Tweepy to scrape tweets. I can't get the tweet URL from Tweepy, but I can get the conversation ID. Before saving, I want to generate a new column whose value in every row is essentially twitter.com/user/status/(conversation_id).
How can I do this? My current code after the scraping cursor is:
columns = ['conversation id',
           'created_at',
           'likes',
           'full_text',
           'retweet count',
           'user location',
           'user name',
           'user verified',
           'in reply to status?',
           'language']
data = []
for tweet in tweets:
    data.append([tweet.id_str,
                 tweet.created_at,
                 tweet.favorite_count,
                 tweet.full_text,
                 tweet.retweet_count,
                 tweet.user.location,
                 tweet.user.screen_name,
                 tweet.user.verified,
                 tweet.in_reply_to_status_id,
                 tweet.lang])
df = pd.DataFrame(data, columns=columns)
print(df)
df.to_csv('testrun.csv')
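Fixed it by building the URL from tweet.id_str inside the loop. The columns list also has to be updated to match the new order, with the URL column in fifth position: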
for tweet in tweets:
    data.append([tweet.created_at,
                 tweet.favorite_count,
                 tweet.full_text,
                 tweet.retweet_count,
                 "https://twitter.com/user/status/" + tweet.id_str,
                 tweet.user.location,
                 tweet.user.screen_name,
                 tweet.user.verified,
                 tweet.in_reply_to_status_id,
                 tweet.lang])
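As the title asks, the column can also be added afterwards to an existing DataFrame using vectorized string concatenation. This sketch assumes the frame has a column holding each tweet's id_str (called 'tweet id' here, a hypothetical name) and a 'user name' column. Status URLs have the form https://twitter.com/<screen_name>/status/<id>. The link should use the tweet's own ID: a conversation ID is the ID of the first tweet in the thread, so for replies it would point at the root tweet instead. The helper add_tweet_url is hypothetical.

```python
import pandas as pd


def add_tweet_url(df, id_col="tweet id", user_col="user name"):
    """Return a copy of df with a 'url' column built from screen name and tweet ID."""
    out = df.copy()
    # astype(str) also covers IDs stored as integers; prefer id_str over the
    # numeric id, because large IDs lose precision if they pass through floats.
    out["url"] = (
        "https://twitter.com/"
        + out[user_col].astype(str)
        + "/status/"
        + out[id_col].astype(str)
    )
    return out
```

Call df = add_tweet_url(df) right before df.to_csv('testrun.csv').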

How to fix this error: AttributeError: 'Status' object has no attribute 'full_text'

#search tweet
keywords = 'تويو اسرع'
limit = 30
tweets = tweepy.Cursor(api.search_tweets, q=keywords, count=100, tweet_mode='extends').items(limit)
#create DataFrame
columns = ['Time','Tweet']
data = []
for tweet in tweets:
    data.append([tweet.created_at, tweet.full_text])
df = pd.DataFrame(data, columns=columns)
print(df)
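You made a typo in the tweepy.Cursor arguments: tweet_mode should be set to 'extended', not 'extends'. With any other value the API returns tweets in the default compatibility mode. Those have a possibly truncated text attribute and no full_text.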
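If the same code has to handle both modes, a small helper can read whichever attribute is present. This is a sketch, not part of Tweepy, and the name full_tweet_text is hypothetical. It also covers retweets: in extended mode a retweet's own full_text is prefixed with "RT @user:" and can be cut off, while the original tweet in retweeted_status holds the complete text.

```python
def full_tweet_text(tweet):
    """Return the untruncated text of a Tweepy Status if available (hypothetical helper)."""
    # For retweets, read the original tweet, which carries the complete text.
    source = getattr(tweet, "retweeted_status", tweet)
    if hasattr(source, "full_text"):  # tweet_mode="extended"
        return source.full_text
    return source.text  # compatibility mode, possibly truncated
```

In the loop above this becomes data.append([tweet.created_at, full_tweet_text(tweet)]).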

How to handle errors during tweet extraction using Python?

I'm trying to extract a dataset using Tweepy. I have a set of tweet IDs that I use to fetch the full text of each tweet. I loop over the IDs and call Tweepy for each one. However, my program keeps crashing because a few of the tweet IDs on my list belong to suspended accounts.
This is the related code snippet I'm using:
# Creating DataFrame using pandas
db = pd.DataFrame(columns=['username', 'description', 'location', 'following',
                           'followers', 'totaltweets', 'retweetcount', 'text', 'hashtags'])
#reading tweet IDs from file
df = pd.read_excel('dataid.xlsx')
mylist = df['tweet_id'].tolist()
#tweet counter
n=1
#looping for extract tweets
for i in mylist:
    tweets=api.get_status(i, tweet_mode="extended")
    username = tweets.user.screen_name
    description = tweets.user.description
    location = tweets.user.location
    following = tweets.user.friends_count
    followers = tweets.user.followers_count
    totaltweets = tweets.user.statuses_count
    retweetcount = tweets.retweet_count
    text=tweets.full_text
    hashtext = list()
    ith_tweet = [username, description, location, following,followers, totaltweets, retweetcount, text, hashtext]
    db.loc[len(db)] = ith_tweet
    n=n+1
filename = 'scraped_tweets.csv'
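One way to keep the loop running is to wrap each lookup in try/except and record the IDs that fail. In the sketch below, the fetching function is passed in, so it runs without Tweepy. The name fetch_statuses is hypothetical. With Tweepy 4.x you would pass fetch=lambda i: api.get_status(i, tweet_mode="extended") and errors=(tweepy.errors.TweepyException,). Tweepy 3.x raises tweepy.TweepError instead. Catching the library's exception type rather than using a bare except keeps genuine bugs visible.

```python
def fetch_statuses(ids, fetch, errors):
    """Call fetch(id) for each ID, skipping IDs whose lookup raises one of `errors`.

    Returns (results, failed), where failed maps each skipped ID to its exception.
    """
    results, failed = [], {}
    for tweet_id in ids:
        try:
            results.append(fetch(tweet_id))
        except errors as exc:  # e.g. suspended account or deleted tweet
            failed[tweet_id] = exc
    return results, failed
```

The DataFrame rows can then be built from results exactly as in the loop above, and failed shows which IDs were skipped and why. Tweepy also has a batch lookup (api.lookup_statuses in 4.x, api.statuses_lookup in 3.x) that takes up to 100 IDs per call and should simply omit unavailable tweets from its result.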

Twitter API: How to search tweets based on query words and predetermined time span + tweets characteristics

Novice programmer here seeking help. I have a list of hashtags for which I want to get all the historical tweets from 01-01-2015 to 31-12-2018.
I tried to use the Tweepy library, but it only gives access to the last 7 days of tweets. I also tried GetOldTweets, since it gives access to historical tweets, but it kept crashing. So now I have acquired premium API access for Twitter, which gives me access to the full tweet history. To run my query with the premium API I cannot use the Tweepy library (as it doesn't support the premium APIs, right?), so my choices are between TwitterAPI and Search-Tweets.
1- Do TwitterAPI and Search-Tweets supply the same information as Tweepy for each tweet? I need the user name, user location, whether the user is verified, the language of the tweet, the source of the tweet, the retweet and favourite counts, and the date. I could not find any information about this.
2- Can I supply a time span in my query?
3- How do I do all of this?
This was my code for the Tweepy library:
hashtags = ["#AAPL","#FB","#KO","#ABT","#PEPCO",...]
df = pd.DataFrame(columns = ["Hashtag", "Tweets", "User", "User_Followers",
                             "User_Location", "User_Verified", "User_Lang", "User_Status",
                             "User_Method", "Fav_Count", "RT_Count", "Tweet_date"])

def tweepy_df(df,tags):
    for cash in tags:
        i = len(df)+1
        for tweet in tweepy.Cursor(api.search, q= cash, since = "2015-01-01", until = "2018-12-31").items():
            print(i, end = '\r')
            df.loc[i, "Hashtag"] = cash
            df.loc[i, "Tweets"] = tweet.text
            df.loc[i, "User"] = tweet.user.name
            df.loc[i, "User_Followers"] = tweet.user.followers_count
            df.loc[i, "User_Location"] = tweet.user.location
            df.loc[i, "User_Verified"] = tweet.user.verified
            df.loc[i, "User_Lang"] = tweet.lang
            df.loc[i, "User_Status"] = tweet.user.statuses_count
            df.loc[i, "User_Method"] = tweet.source
            df.loc[i, "Fav_Count"] = tweet.favorite_count
            df.loc[i, "RT_Count"] = tweet.retweet_count
            df.loc[i, "Tweet_date"] = tweet.created_at
            i+=1
    return df
How do I adapt this for, say, the TwitterAPI library?
I know that it should be adapted to something like this:
for tweet in api.request('search/tweets', {'q':cash}):
But that is still missing the desired time span. I'm also not sure whether the names of the characteristics match the ones used by this library.
Using TwitterAPI, you can make Premium Search requests this way:
from TwitterAPI import TwitterAPI

SEARCH_TERM = '#AAPL OR #FB OR #KO OR #ABT OR #PEPCO'
PRODUCT = 'fullarchive'
LABEL = 'your label'

api = TwitterAPI('consumer key', 'consumer secret', 'access token key', 'access token secret')
r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL), {'query':SEARCH_TERM})
for item in r:
    if 'text' in item:
        print(item['text'])
        print(item['user']['name'])
        print(item['user']['followers_count'])
        print(item['user']['location'])
        print(item['user']['verified'])
        print(item['lang'])
        print(item['user']['statuses_count'])
        print(item['source'])
        print(item['favorite_count'])
        print(item['retweet_count'])
        print(item['created_at'])
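To get the same DataFrame layout as the Tweepy version, the returned tweet dicts can be collected into rows and turned into a frame in one step. This is usually much faster than assigning cell by cell with df.loc. The sketch assumes the premium endpoint returns tweets in the v1.1 JSON format used above, where longer tweets carry their complete text in extended_tweet.full_text. The helper name premium_items_to_df and its hashtag argument are hypothetical. With an OR query you would run one request per hashtag if the Hashtag column should be meaningful.

```python
import pandas as pd

COLUMNS = ["Hashtag", "Tweets", "User", "User_Followers", "User_Location",
           "User_Verified", "User_Lang", "User_Status", "User_Method",
           "Fav_Count", "RT_Count", "Tweet_date"]


def premium_items_to_df(items, hashtag):
    """Turn premium-search tweet dicts into the DataFrame layout used above."""
    rows = []
    for item in items:
        if "text" not in item:  # skip anything that is not a tweet
            continue
        user = item["user"]
        # Long tweets keep their complete text in extended_tweet.full_text
        text = item.get("extended_tweet", {}).get("full_text", item["text"])
        rows.append([hashtag, text, user["name"], user["followers_count"],
                     user["location"], user["verified"], item["lang"],
                     user["statuses_count"], item["source"],
                     item["favorite_count"], item["retweet_count"],
                     item["created_at"]])
    return pd.DataFrame(rows, columns=COLUMNS)
```

For example, df = premium_items_to_df(r, '#AAPL') with r from a request whose query is just '#AAPL'.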
The Premium search doc explains the supported request arguments. To specify a date range, pass fromDate and toDate as UTC timestamps in YYYYMMDDhhmm format:
r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query':SEARCH_TERM, 'fromDate':201501010000, 'toDate':201812310000})
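The question writes its dates as DD-MM-YYYY, while fromDate and toDate expect YYYYMMDDhhmm. The standard library's datetime can convert between the two. This is a small sketch, and the helper name to_premium_timestamp is hypothetical.

```python
from datetime import datetime


def to_premium_timestamp(day_month_year):
    """Convert 'DD-MM-YYYY' to the 'YYYYMMDDhhmm' string used by fromDate/toDate."""
    return datetime.strptime(day_month_year, "%d-%m-%Y").strftime("%Y%m%d%H%M")


params = {
    "query": "#AAPL OR #FB OR #KO OR #ABT OR #PEPCO",
    "fromDate": to_premium_timestamp("01-01-2015"),
    "toDate": to_premium_timestamp("31-12-2018"),
}
```

A four-year range will span many result pages, each linked by a next token in the response. TwitterAPI's TwitterPager class is meant for following such pages, but check its documentation for premium-search support in your version.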
