I am getting the tweets and the corresponding id of that user in an object obj. I want to know why I don't get the other information like conversation_id. I want to use it to get the replies and the quotes. This is the solution I found on the internet, but I couldn't make it work.
Does anyone know how to extract the conversation_id or any other parameters like geo.place_id? I am using tweepy, but if anyone has another solution using a different library to get the same result, that would also be helpful. Thanks for your help!!!
You can try the code if you create a separate config file and define your tokens there. I can't share mine for security reasons.
import tweepy
import config
users_name = ['derspiegel', 'zeitonline']
tweet_tab = []
def getClient():
    client = tweepy.Client(bearer_token=config.BEARER_TOKEN,
                           consumer_key=config.API_KEY,
                           consumer_secret=config.API_KEY_SECRET,
                           access_token=config.ACCESS_TOKEN,
                           access_token_secret=config.ACCESS_TOKEN_SECRET)
    return client
def searchTweets(client):
    for i in users_name:
        client = getClient()
        user = client.get_user(username=i)
        userId = user.data.id
        tweets = client.get_users_tweets(userId,
                                         expansions=[
                                             'author_id', 'referenced_tweets.id', 'referenced_tweets.id.author_id',
                                             'in_reply_to_user_id', 'attachments.media_keys', 'entities.mentions.username', 'geo.place_id'],
                                         tweet_fields=[
                                             'id', 'text', 'author_id', 'created_at', 'conversation_id', 'entities',
                                             'public_metrics', 'referenced_tweets'
                                         ],
                                         user_fields=[
                                             'id', 'name', 'username', 'created_at', 'description', 'public_metrics',
                                             'verified'
                                         ],
                                         place_fields=['full_name', 'id'],
                                         media_fields=['type', 'url', 'alt_text', 'public_metrics'])
        if tweets is not None and len(tweets) > 0:
            obj = {}
            obj['id'] = userId
            obj['text'] = tweets
            tweet_tab.append(obj)
    return tweet_tab
searchTweets(client)
print("tableau final", tweet_tab)
My guess is that you need to put the IDs into a list that the function can iterate over. Create the ID list and try:
def get_tweets_from_timelines():
    tweets_timelines_list = []
    for id in range(0, len(ids), 1):
        one_id = ids[id:id+1]
        one_id = ' '.join(one_id)
        for tweet in tweepy.Paginator(client.get_users_tweets, id=one_id, max_results=100,
                                      tweet_fields=['attachments', 'author_id', 'context_annotations', 'created_at', 'entities',
                                                    'conversation_id', 'possibly_sensitive', 'public_metrics', 'referenced_tweets',
                                                    'reply_settings', 'source', 'withheld'],
                                      user_fields=['created_at', 'description', 'entities', 'profile_image_url', 'protected',
                                                   'public_metrics', 'url', 'verified', 'withheld'],
                                      expansions=['referenced_tweets.id', 'in_reply_to_user_id', 'attachments.media_keys'],
                                      media_fields=['preview_image_url'],
                                      ):
            tweets_timelines_list.append(tweet)
    return tweets_timelines_list
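For what it's worth, here is a minimal sketch of how you could then read conversation_id (and geo, when present) from the timeline results. It assumes client is an authenticated tweepy.Client and ids is your list of user IDs, and it uses Paginator(...).flatten() so the loop yields individual Tweet objects rather than whole response pages; the flatten limit is arbitrary.
import tweepy

for user_id in ids:
    # .flatten() yields Tweet objects directly instead of Response pages.
    for tweet in tweepy.Paginator(client.get_users_tweets, id=user_id,
                                  tweet_fields=['conversation_id', 'created_at', 'geo'],
                                  max_results=100).flatten(limit=500):
        # conversation_id and geo are only populated because they were requested above.
        print(tweet.id, tweet.conversation_id, tweet.geo)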
Related
I am using an academic account to retrieve tweet information, but I don't know how to get the status_id. I thought the conversation_id would be the same as the status_id, but when I traced it back, apparently it is not. What should I add to the tweet fields?
for response in tweepy.Paginator(client.search_all_tweets,
                                 query='query -is:retweet lang:en',
                                 user_fields=['username', 'public_metrics', 'description', 'location'],
                                 tweet_fields=['created_at', 'geo', 'public_metrics', 'text', 'id', 'conversation_id'],
                                 expansions=['author_id', 'geo.place_id'],
                                 start_time='2020-01-01T00:00:00Z',
                                 end_time='2020-12-12T00:00:00Z'):
    time.sleep(1)
    tweets.append(response)
You've already got it - "id" is the status id
Tweets are the basic atomic building block of all things Twitter.
Tweets are also known as “status updates.” The Tweet object has a long
list of ‘root-level’ attributes, including fundamental attributes such
as id, created_at, and text
https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet
It may be a bit confusing because references to that id are labeled things like "in_reply_to_status_id" - but there is no field called "status_id" - it's just id.
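As a small illustration (assuming tweets is the list of Response pages collected by the loop above), the status id and the conversation id can be read straight off each Tweet object:
# Each Response page keeps its Tweet objects in .data (which can be None for empty pages).
for response in tweets:
    for tweet in response.data or []:
        # tweet.id is the status id; tweet.conversation_id is the id of the thread's root tweet.
        print(tweet.id, tweet.conversation_id)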
I am trying to retrieve Twitter data using Tweepy with the code below, but I'm having difficulty collecting the media_fields data. In particular, I want to get the type of the media, but I failed.
As you can see below, the value is copied into cells that should be empty.
[Screenshot of the resulting dataframe][1]
import tweepy
from twitter_authentication import bearer_token
import time
import pandas as pd
client = tweepy.Client(bearer_token, wait_on_rate_limit=True)
hoax_tweets = []
for response in tweepy.Paginator(client.search_all_tweets,
                                 query='Covid hoax -is:retweet lang:en',
                                 user_fields=['username', 'public_metrics', 'description', 'location', 'verified', 'entities'],
                                 tweet_fields=['id', 'in_reply_to_user_id', 'referenced_tweets', 'context_annotations',
                                               'source', 'created_at', 'entities', 'geo', 'withheld', 'public_metrics',
                                               'text'],
                                 media_fields=['media_key', 'type', 'url', 'alt_text',
                                               'public_metrics', 'preview_image_url'],
                                 expansions=['author_id', 'in_reply_to_user_id', 'geo.place_id',
                                             'attachments.media_keys', 'referenced_tweets.id', 'referenced_tweets.id.author_id'],
                                 place_fields=['id', 'name', 'country_code', 'place_type', 'full_name', 'country',
                                               'geo', 'contained_within'],
                                 start_time='2021-01-20T00:00:00Z',
                                 end_time='2021-01-21T00:00:00Z',
                                 max_results=100):
    time.sleep(1)
    hoax_tweets.append(response)
result = []
user_dict = {}
media_dict = {}
# Loop through each response object
for response in hoax_tweets:
    # Take all of the users, and put them into a dictionary of dictionaries with the info we want to keep
    for user in response.includes['users']:
        user_dict[user.id] = {'username': user.username,
                              'followers': user.public_metrics['followers_count'],
                              'tweets': user.public_metrics['tweet_count'],
                              'description': user.description,
                              'location': user.location,
                              'verified': user.verified
                              }
    for media in response.includes['media']:
        media_dict[tweet.id] = {'media_key': media.media_key,
                                'type': media.type
                                }
    for tweet in response.data:
        # For each tweet, find the author's information
        author_info = user_dict[tweet.author_id]
        # Put all of the information we want to keep in a single dictionary for each tweet
        result.append({'author_id': tweet.author_id,
                       'username': author_info['username'],
                       'author_followers': author_info['followers'],
                       'author_tweets': author_info['tweets'],
                       'author_description': author_info['description'],
                       'author_location': author_info['location'],
                       'author_verified': author_info['verified'],
                       'tweet_id': tweet.id,
                       'text': tweet.text,
                       'created_at': tweet.created_at,
                       'retweets': tweet.public_metrics['retweet_count'],
                       'replies': tweet.public_metrics['reply_count'],
                       'likes': tweet.public_metrics['like_count'],
                       'quote_count': tweet.public_metrics['quote_count'],
                       'in_reply_to_user_id': tweet.in_reply_to_user_id,
                       'media': tweet.attachments,
                       'media_type': media,
                       'conversation': tweet.referenced_tweets
                       })
# Change this list of dictionaries into a dataframe
df = pd.DataFrame(result)
Also, when I change 'media': tweet.attachments to 'media': tweet.attachments[0] to get the 'media_key' data, I get the following error message: "TypeError: 'NoneType' object is not subscriptable".
What am I doing wrong? Any suggestions would be appreciated.
[1]: https://i.stack.imgur.com/AxCcl.png
The subscriptable error comes from the fact that tweet.attachments is None, hence the NoneType part. To make it work, you can add a check for None:
'media':tweet.attachments[0] if tweet.attachments else None
I have never used the Twitter API, but one thing to check is whether the tweet attachments are always present or whether they may be absent.
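Building on that, here is a minimal hedged sketch of how the media handling from the question could guard against missing attachments. It assumes media_dict is keyed by media.media_key (rather than by a tweet id) and that response and tweet are the same loop variables as in the question:
# Build the lookup keyed by media_key (assumption: one entry per media object in the includes).
for media in response.includes.get('media', []):
    media_dict[media.media_key] = {'media_key': media.media_key, 'type': media.type}

# Per tweet: tweets without attachments get None instead of raising.
media_keys = tweet.attachments.get('media_keys', []) if tweet.attachments else []
media_type = media_dict[media_keys[0]]['type'] if media_keys and media_keys[0] in media_dict else None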
I am using a Python script with praw to loop through a list of subreddits and pull their posts. The list is quite long, however, and occasionally a subreddit on it will have been deleted, resulting in an HTTP exception (403, 404, etc.). My code is below; does anyone know a line or two I can add to skip the subreddits that give errors?
df = pd.read_csv('reddits.csv', sep = ',')
df.head()
Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'author', 'comments', 'url', 'domain', 'permalink', 'ups', 'downs', 'likes', 'crosspost', 'duplicates', 'views'])
data = []
for i in df.reddits:
    subreddit = reddit.subreddit(i)
    for submission in subreddit.new(limit=10):
        time = datetime.utcfromtimestamp(submission.created_utc)
        score = submission.score
        title = submission.title
        text = submission.selftext
        author = submission.author
        comments = submission.num_comments
        url = submission.url
        domain = submission.domain
        permalink = submission.permalink
        ups = submission.ups
        downs = submission.downs
        likes = submission.likes
        crosspost = submission.num_crossposts
        duplicates = submission.num_duplicates
        views = submission.view_count
        data.append(Submission(time, score, title, text, author, comments, url, domain, permalink, ups, downs, likes, crosspost, duplicates, views))
    df = pd.DataFrame(data)
    os.chdir('wd')
    filename = i + str(datetime.now()) + '.csv'
    df.to_csv(filename, index=False, encoding='utf-8')
You need to catch the exception, then you can continue
df = pd.read_csv('reddits.csv', sep = ',')
df.head()
Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'author', 'comments', 'url', 'domain', 'permalink', 'ups', 'downs', 'likes', 'crosspost', 'duplicates', 'views'])
data = []
for i in df.reddits:
    try:
        subreddit = reddit.subreddit(i)
    except HTTPError as e:
        print(f"Got {e} retrieving {i}")
        continue  # control passes back to the next iteration of the outer loop
    for submission in subreddit.new(limit=10):
        submission = Submission(
            datetime.utcfromtimestamp(submission.created_utc),
            submission.score,
            submission.title,
            submission.selftext,
            submission.author,
            submission.num_comments,
            submission.url,
            submission.domain,
            submission.permalink,
            submission.ups,
            submission.downs,
            submission.likes,
            submission.num_crossposts,
            submission.num_duplicates,
            submission.view_count,
        )
        data.append(submission)
    df = pd.DataFrame(data)
    os.chdir('wd')
    filename = i + str(datetime.now()) + '.csv'
    df.to_csv(filename, index=False, encoding='utf-8')
Also, unrelated: i is not a good name for the value; it traditionally stands for "index", which is not what is contained there. e would be the corresponding generic name, standing for "element", but reddit would be the idiomatic choice in Python.
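One caveat: PRAW builds subreddit objects lazily, so the HTTP error often only surfaces once the listing is actually iterated, and the exception classes live in prawcore. Here is a hedged sketch of that variant, wrapping the iteration itself and catching prawcore's Forbidden (403) and NotFound (404); reddit, df and data are assumed to exist as in the question:
from prawcore.exceptions import Forbidden, NotFound

for name in df.reddits:
    try:
        # The network request happens while iterating the listing,
        # so the try/except wraps the loop itself.
        for submission in reddit.subreddit(name).new(limit=10):
            data.append(submission)
    except (Forbidden, NotFound) as e:
        print(f"Skipping {name}: {e}")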
I'm trying to retrieve the field 'biddingStrategyConfiguration' via the AdWords API for Python (3) using CampaignService, but I always get a weird error. It's weird because the field does exist, as mentioned in the documentation found here.
account_id = 'any_id'
adwords = Adwords(account_id) # classes and objects already created, etc.
def get_bidding_strategy():
    service = adwords.client.GetService('CampaignService', version='v201806')
    selector = {
        'fields': ['Id', 'Name', 'Status', 'biddingStrategyConfiguration']
    }
    results = service.get(selector)
    data = []
    if 'entries' in results:
        for item in results['entries']:
            if item['status'] == 'ENABLED':
                data.append({
                    'id': item['id'],
                    'name': item['name'],
                    'status': item['status']  # I have to retrieve biddingStrategyConfiguration.biddingStrategyName (next line)
                })
    return results
This is the error:
Error summary:
{'faultMessage': "[SelectorError.INVALID_FIELD_NAME # serviceSelector; trigger:'biddingStrategyConfiguration']",
'requestId': '000581286e61247e0a376ac776062df4',
'serviceName': 'CampaignService',
'methodName': 'get',
'operations': '1',
'responseTime': '315'}
Notice that fields like "id" or "name" are easily retrievable, but the bidding configuration is not. In fact, I'm looking for the id/name of the biddingStrategies using .biddingStrategyID or .biddingStrategyName.
Can anyone help me? Thanks in advance.
How I solved it: biddingStrategyConfiguration is not a retrievable field, but biddingStrategyName is (part of the JSON).
account_id = 'any_id'
adwords = Adwords(account_id) # classes and objects already created, etc.
def get_bidding_strategy():
service = adwords.client.GetService('CampaignService', version = 'v201806')
selector = {
'fields': ['Id', 'Name', 'Status', 'biddingStrategyName']
}
results = service.get(selector)
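To make the fix concrete, here is a hedged sketch of reading the strategy name out of the response, continuing inside get_bidding_strategy() after the service.get(selector) call and mirroring the loop from the question. The nesting under 'biddingStrategyConfiguration' follows the question's own comment, so treat the exact keys as an assumption:
    # Assumption: each entry nests the name under 'biddingStrategyConfiguration',
    # as the question's comment suggests.
    data = []
    if 'entries' in results:
        for item in results['entries']:
            if item['status'] == 'ENABLED':
                data.append({
                    'id': item['id'],
                    'name': item['name'],
                    'status': item['status'],
                    'bidding_strategy_name': item['biddingStrategyConfiguration']['biddingStrategyName'],
                })
    return data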
I'm new to programming and I've looked at previous answers to this question but none seem relevant to this specific query.
I'm learning to analyse data with Python.
This is the code:
import pandas as pd
import os
os.chdir('/Users/Benjy/Documents/Python/Data Analysis Python')
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ml-1m/users.dat', sep='::', header = None, names = unames)
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header = None, names = rnames)
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ml-1m/movies.dat', sep='::', header = None, names = mnames)
data = pd.merge(pd.merge(ratings, users), movies)
mean_ratings=data.pivot_table('ratings',rows='title', cols='gender',aggfunc='mean')
I keep getting an error saying mean_ratings is not defined...but surely it is defined in the last line of code above?
mean_ratings never gets defined because the pivot_table call itself fails: the column in the ratings table is called 'rating' (not 'ratings'), and in current pandas the rows/cols keywords have been replaced by index/columns. I think this will work: mean_ratings=data.pivot_table('rating',index='title',columns='gender',aggfunc='mean')
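In case it helps, here is a minimal self-contained sketch of the same pivot_table pattern on made-up toy data, just to illustrate the index/columns keywords:
import pandas as pd

toy = pd.DataFrame({
    'title': ['A', 'A', 'B', 'B'],
    'gender': ['F', 'M', 'F', 'M'],
    'rating': [4, 3, 5, 2],
})

# Mean rating per title, split by gender.
mean_ratings = toy.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
print(mean_ratings)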