Putting several tweets in dataframe - python

I am trying to download the last 10 tweets from BarackObama. However, when I try to put them into a DataFrame, it only includes the 10th tweet (so only 1). Does someone know how to solve this problem? I tried the top part of the code first with print instead of building data, and then I got all 10 tweets, so I don't know where it goes wrong. I also don't get an error message.
user = 'BarackObama'
posts = tweepy.Cursor(api.user_timeline, screen_name=user).items(10)

for status in posts:
    if status.lang == 'en':
        data = {'User': [status.user.name],
                'Account name': ['#' + status.user.screen_name],
                'Tweet': [status.text],
                'Time': [status.created_at],
                'Nr of retweets': [status.retweet_count],
                'Nr of favorited': [status.favorite_count]}

df = pd.DataFrame(data)
df.head()

It seems you have to collect the tweets in a list first, and then put that list into a DataFrame:

user = 'BarackObama'
posts = tweepy.Cursor(api.user_timeline, screen_name=user).items(10)

tweets = []
for status in posts:
    if status.lang == 'en':
        data = {'User': [status.user.name],
                'Account name': ['#' + status.user.screen_name],
                'Tweet': [status.text],
                'Time': [status.created_at],
                'Nr of retweets': [status.retweet_count],
                'Nr of favorited': [status.favorite_count]}
        tweets.append(data)

df = pd.DataFrame(tweets)
df.head()
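A small aside, not from the original answer: because every dict value above is wrapped in a one-element list, each cell of the resulting DataFrame will itself hold a list. If plain values are preferred, the same loop can append unwrapped scalars (a minimal sketch using the same Tweepy status attributes):

tweets = []
for status in posts:
    if status.lang == 'en':
        tweets.append({'User': status.user.name,  # scalar values, not one-element lists
                       'Account name': '#' + status.user.screen_name,
                       'Tweet': status.text,
                       'Time': status.created_at,
                       'Nr of retweets': status.retweet_count,
                       'Nr of favorited': status.favorite_count})

df = pd.DataFrame(tweets)  # one row per tweet, one plain value per cell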


How do I generate a new column on Pandas with Python to Generate Tweet Hyperlinks with Conversation ID

I am using Tweepy to scrape tweets. I cannot get the tweet URL using Tweepy, but I can get the conversation ID. I want to generate a new column that is essentially twitter.com/user/status/(conversation_id) for every row before saving it.
How can I do this? My current code after the scraping cursor is:
columns = ['conversation id',
           'created_at',
           'likes',
           'full_text',
           'retweet count',
           'user location',
           'user name',
           'user verified',
           'in reply to status?',
           'language']

data = []
for tweet in tweets:
    data.append([tweet.id_str,
                 tweet.created_at,
                 tweet.favorite_count,
                 tweet.full_text,
                 tweet.retweet_count,
                 tweet.user.location,
                 tweet.user.screen_name,
                 tweet.user.verified,
                 tweet.in_reply_to_status_id,
                 tweet.lang])

df = pd.DataFrame(data, columns=columns)
print(df)
df.to_csv('testrun.csv')
Fixed it.
for tweet in tweets:
    data.append([tweet.created_at,
                 tweet.favorite_count,
                 tweet.full_text,
                 tweet.retweet_count,
                 "https://twitter.com/user/status/" + tweet.id_str,
                 tweet.user.location,
                 tweet.user.screen_name,
                 tweet.user.verified,
                 tweet.in_reply_to_status_id,
                 tweet.lang])
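Note that the columns list from the question then has to match the new row layout (the id column is gone and the link now sits after the retweet count). Alternatively, if the DataFrame has already been built with the original columns, the link column can be added afterwards with a vectorised string concatenation. A minimal sketch, assuming the 'conversation id' column holds the IDs as strings:

# hypothetical follow-up step on the DataFrame built in the question
df['tweet url'] = 'https://twitter.com/user/status/' + df['conversation id']
df.to_csv('testrun.csv')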

Want to get twitter data using tweepy but in trouble

I am trying to retrieve Twitter data using Tweepy with the code below, but I'm having difficulty collecting the media_fields data. In particular, I want to get the type of each media attachment, but I have not managed to.
As you can see in the screenshot, the same value is copied into cells that should be empty.
(screenshot: https://i.stack.imgur.com/AxCcl.png)
import tweepy
from twitter_authentication import bearer_token
import time
import pandas as pd
client = tweepy.Client(bearer_token, wait_on_rate_limit=True)
hoax_tweets = []
for response in tweepy.Paginator(client.search_all_tweets,
                                 query='Covid hoax -is:retweet lang:en',
                                 user_fields=['username', 'public_metrics', 'description', 'location', 'verified', 'entities'],
                                 tweet_fields=['id', 'in_reply_to_user_id', 'referenced_tweets', 'context_annotations',
                                               'source', 'created_at', 'entities', 'geo', 'withheld', 'public_metrics',
                                               'text'],
                                 media_fields=['media_key', 'type', 'url', 'alt_text',
                                               'public_metrics', 'preview_image_url'],
                                 expansions=['author_id', 'in_reply_to_user_id', 'geo.place_id',
                                             'attachments.media_keys', 'referenced_tweets.id', 'referenced_tweets.id.author_id'],
                                 place_fields=['id', 'name', 'country_code', 'place_type', 'full_name', 'country',
                                               'geo', 'contained_within'],
                                 start_time='2021-01-20T00:00:00Z',
                                 end_time='2021-01-21T00:00:00Z',
                                 max_results=100):
    time.sleep(1)
    hoax_tweets.append(response)
result = []
user_dict = {}
media_dict = {}

# Loop through each response object
for response in hoax_tweets:
    # Take all of the users, and put them into a dictionary of dictionaries with the info we want to keep
    for user in response.includes['users']:
        user_dict[user.id] = {'username': user.username,
                              'followers': user.public_metrics['followers_count'],
                              'tweets': user.public_metrics['tweet_count'],
                              'description': user.description,
                              'location': user.location,
                              'verified': user.verified
                              }
    for media in response.includes['media']:
        media_dict[tweet.id] = {'media_key': media.media_key,
                                'type': media.type
                                }
    for tweet in response.data:
        # For each tweet, find the author's information
        author_info = user_dict[tweet.author_id]
        # Put all of the information we want to keep in a single dictionary for each tweet
        result.append({'author_id': tweet.author_id,
                       'username': author_info['username'],
                       'author_followers': author_info['followers'],
                       'author_tweets': author_info['tweets'],
                       'author_description': author_info['description'],
                       'author_location': author_info['location'],
                       'author_verified': author_info['verified'],
                       'tweet_id': tweet.id,
                       'text': tweet.text,
                       'created_at': tweet.created_at,
                       'retweets': tweet.public_metrics['retweet_count'],
                       'replies': tweet.public_metrics['reply_count'],
                       'likes': tweet.public_metrics['like_count'],
                       'quote_count': tweet.public_metrics['quote_count'],
                       'in_reply_to_user_id': tweet.in_reply_to_user_id,
                       'media': tweet.attachments,
                       'media_type': media,
                       'conversation': tweet.referenced_tweets
                       })

# Change this list of dictionaries into a dataframe
df = pd.DataFrame(result)
Also, when I change 'media': tweet.attachments in the code to 'media': tweet.attachments[0] to get the 'media_key' data, I get the following error message: "TypeError: 'NoneType' object is not subscriptable".
What am I doing wrong? Any suggestions would be appreciated.
The subscriptable error comes from the fact that tweet.attachments is None, hence the NoneType part of the message. To make it work, you can add a check for None:
'media': tweet.attachments[0] if tweet.attachments else None
I have never used the Twitter API, so one thing to verify is whether the tweet attachments are always present or whether they may sometimes be absent.
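Going one step beyond the original answer (a rough sketch, not tested against the API): the media objects in response.includes['media'] are identified by media_key, so the lookup table would normally be keyed by that value rather than by a tweet id, and each tweet would then resolve its own media through its attachments. This assumes, as in the Twitter API v2 payload, that tweet.attachments, when present, is a dict containing a 'media_keys' list:

# build the lookup keyed by media_key instead of tweet.id
media_dict = {}
for media in response.includes['media']:
    media_dict[media.media_key] = {'media_key': media.media_key, 'type': media.type}

for tweet in response.data:
    # tweet.attachments is None for tweets without media, so guard before indexing
    media_keys = (tweet.attachments or {}).get('media_keys', [])
    media_info = media_dict.get(media_keys[0]) if media_keys else None
    media_type = media_info['type'] if media_info else None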

PRAW Loop With HTTP Exceptions

I am using a Python script with PRAW to loop through a list of subreddits and pull their posts. The list is quite long, however, and occasionally a subreddit on it will be deleted, resulting in an HTTP exception (403, 404, etc.). My code is below; does anyone know a line or two I can add to skip the subreddits that give errors?
df = pd.read_csv('reddits.csv', sep=',')
df.head()

Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'author', 'comments', 'url', 'domain', 'permalink', 'ups', 'downs', 'likes', 'crosspost', 'duplicates', 'views'])

data = []
for i in df.reddits:
    subreddit = reddit.subreddit(i)
    for submission in subreddit.new(limit=10):
        time = datetime.utcfromtimestamp(submission.created_utc)
        score = submission.score
        title = submission.title
        text = submission.selftext
        author = submission.author
        comments = submission.num_comments
        url = submission.url
        domain = submission.domain
        permalink = submission.permalink
        ups = submission.ups
        downs = submission.downs
        likes = submission.likes
        crosspost = submission.num_crossposts
        duplicates = submission.num_duplicates
        views = submission.view_count
        data.append(Submission(time, score, title, text, author, comments, url, domain, permalink, ups, downs, likes, crosspost, duplicates, views))
    df = pd.DataFrame(data)
    os.chdir('wd')
    filename = i + str(datetime.now()) + '.csv'
    df.to_csv(filename, index=False, encoding='utf-8')
You need to catch the exception; then you can continue:

df = pd.read_csv('reddits.csv', sep=',')
df.head()

Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'author', 'comments', 'url', 'domain', 'permalink', 'ups', 'downs', 'likes', 'crosspost', 'duplicates', 'views'])

data = []
for i in df.reddits:
    try:
        subreddit = reddit.subreddit(i)
    except HTTPError as e:
        print(f"Got {e} retrieving {i}")
        continue  # control passes back to the next iteration of the outer loop
    for submission in subreddit.new(limit=10):
        submission = Submission(
            datetime.utcfromtimestamp(submission.created_utc),
            submission.score,
            submission.title,
            submission.selftext,
            submission.author,
            submission.num_comments,
            submission.url,
            submission.domain,
            submission.permalink,
            submission.ups,
            submission.downs,
            submission.likes,
            submission.num_crossposts,
            submission.num_duplicates,
            submission.view_count,
        )
        data.append(submission)
    df = pd.DataFrame(data)
    os.chdir('wd')
    filename = i + str(datetime.now()) + '.csv'
    df.to_csv(filename, index=False, encoding='utf-8')
Also, unrelated: i is not a good name for that value; it traditionally stands for "index", which is not what it contains here. e would be the corresponding generic name, standing for "element", but a descriptive name such as subreddit_name would be the idiomatic choice in Python (reddit itself is already taken by the API client).
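One caveat worth adding (this is not from the original answer): PRAW creates subreddit objects lazily, so a 403/404 for a banned or deleted subreddit typically surfaces only while iterating subreddit.new(...), not when reddit.subreddit(i) is called. A rough sketch of the same skip-and-continue idea with the try moved around the actual fetch, assuming the HTTP errors come from the prawcore package that PRAW uses under the hood:

import prawcore

for i in df.reddits:
    try:
        # the network request happens here, so catch around the iteration itself
        submissions = list(reddit.subreddit(i).new(limit=10))
    except prawcore.PrawcoreException as e:  # base class covering NotFound, Forbidden, etc.
        print(f"Skipping {i}: {e}")
        continue
    for submission in submissions:
        ...  # build and append the Submission tuples as above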

Exporting tweets to a dataframe

I can't export the user information as a data frame, even though it appears fine in the console when I print(users_info). Can someone help? Thanks.
# Define the search term and the date_since date as variables
search_words = "#sunrise"
date_since = "2019-09-01"
# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since).items(5)

users_info = [[tweet.user.screen_name, tweet.user.location, tweet.text, tweet.created_at, tweet.retweet_count, tweet.source] for tweet in tweets]

df = pd.DataFrame(users_info, columns=['user_name', 'user_location', 'text', 'date', 'retweet_count', 'url'])
df.to_excel=('sunrise_tweets.xlsx')
There should not be an = after df.to_excel, as that assigns your filename to df.to_excel instead of calling the df.to_excel method:
df.to_excel('sunrise_tweets.xlsx')
Also ensure you have installed openpyxl or XlsxWriter. See the docs for further information.
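For completeness, a minimal corrected ending would look like this (the engine argument is optional and only shown to make the dependency explicit):

# requires an Excel writer backend, e.g.  pip install openpyxl
df.to_excel('sunrise_tweets.xlsx', index=False, engine='openpyxl')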

Python - iterate through a list

I'm trying to automate email reporting using Python. My problem is that I can't pull the subject from the data that my email client outputs.
Abbreviated dataset:
[(messageObject){
     id = "0bd503eb00000000000000000000000d0f67"
     name = "11.26.17 AM [TXT-CAT]{Shoppers:2}"
     status = "active"
     messageFolderId = "0bd503ef0000000000000000000000007296"
     content[] =
        (messageContentObject){
           type = "html"
           subject = "Early Cyber Monday – 60% Off Sitewide "
        }
   }
]
I can pull the other fields like this:
messageId = []
messageName = []
subject = []

for info in messages:
    messageId.append(str(info['id']))
    messageName.append(str(info['name']))
    subject.append(str(info[content['subject']]))

data = pd.DataFrame({
    'id': messageId,
    'name': messageName,
    'subject': subject
})

data.head()
I've been trying to iterate through content[] using a for loop, but I can't get it to work. Let me know if you have any suggestions.
@FamousJameous gave the correct answer:
That format is called SOAP. My guess for the syntax would be info['content']['subject'] or maybe info['content'][0]['subject']
info['content'][0]['subject'] worked with my data.
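Applied to the loop from the question, the working version looks roughly like this (a sketch, assuming messages is the list of messageObject records shown above):

messageId = []
messageName = []
subject = []
for info in messages:
    messageId.append(str(info['id']))
    messageName.append(str(info['name']))
    # content is a list, so take the first content entry and read its subject
    subject.append(str(info['content'][0]['subject']))

data = pd.DataFrame({'id': messageId, 'name': messageName, 'subject': subject})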
