Attribute error when using user object on tweepy - python

I'm trying to write a program that will stream tweets from Twitter using their Stream API and Tweepy. Here's the relevant part of my code:
def on_data(self, data):
    if data.user.id == "25073877" or data.in_reply_to_user_id == "25073877":
        self.filename = "trump.csv"
    elif data.user.id == "30354991" or data.in_reply_to_user_id == "30354991":
        self.filename = "harris.csv"
    if not 'RT @' in data.text:
        csvFile = open(self.filename, 'a')
        csvWriter = csv.writer(csvFile)
        print(data.text)
        try:
            csvWriter.writerow([data.text, data.created_at, data.user.id, data.user.screen_name, data.in_reply_to_status_id])
        except:
            pass

def on_error(self, status_code):
    if status_code == 420:
        return False
What the code should be doing is streaming the tweets and writing the text of the tweet, the creation date, the user ID of the tweeter, their screen name, and the reply ID of the status they're replying to if the tweet is a reply. However, I get the following error:
File "test.py", line 13, in on_data
if data.user.id == "25073877" or data.in_reply_to_user_id == "25073877":
AttributeError: 'unicode' object has no attribute 'user'
Could someone help me out? Thanks!
EDIT: Sample of what is being read into "data"
{"created_at":"Fri Feb 15 20:50:46 +0000 2019","id":1096512164347760651,"id_str":"1096512164347760651","text":"#realDonaldTrump \nhttps:\/\/t.co\/NPwSuJ6V2M","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":25073877,"in_reply_to_user_id_str":"25073877","in_reply_to_screen_name":"realDonaldTrump","user":{"id":1050189031743598592,"id_str":"1050189031743598592","name":"Lauren","screen_name":"switcherooskido","location":"United States","url":null,"description":"Concerned citizen of the USA who would like to see Integrity restored in the US Government. Anti-marxist!\nSigma, INTP\/J\nREJECT PC and Identity Politics #WWG1WGA","translator_type":"none","protected":false,"verified":false,"followers_count":1459,"friends_count":1906,"listed_count":0,"favourites_count":5311,"statuses_count":8946,"created_at":"Thu Oct 11 00:59:11 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"FF691F","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1068591478329495558\/ng_tNAXx_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1068591478329495558\/ng_tNAXx_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1050189031743598592\/1541441602","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":nul
l,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/NPwSuJ6V2M","expanded_url":"https:\/\/www.conservativereview.com\/news\/5-insane-provisions-amnesty-omnibus-bill\/","display_url":"conservativereview.com\/news\/5-insane-\u2026","indices":[18,41]}],"user_mentions":[{"screen_name":"realDonaldTrump","name":"Donald J. Trump","id":25073877,"id_str":"25073877","indices":[0,16]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1550263846848"}
So I suppose the revised question is: how do I tell the program to write only parts of this JSON output to the CSV file? I've been using the references that Twitter's stream API provides for the attributes of "data".

As stated in your comment, the tweet data is in "JSON format". I believe what you mean by this is that it is a string (unicode) containing JSON, not a parsed JSON object, which is why attribute access such as data.user fails. To access the fields the way your code wants to, you first need to parse the data string with the json module, e.g.
import json
json_data_object = json.loads(data)
You can then access the fields as you would in a dictionary, e.g.
json_data_object['some_key']['some_other_key']
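To connect this to the CSV part of the question, here is a minimal sketch of parsing one raw tweet string and writing only the selected fields. The names RAW_TWEET, tweet_to_row and write_row are made up for illustration, and the sample payload is a heavily trimmed stand-in for the one in the question, so treat it as an outline rather than a drop-in fix.

```python
import csv
import io
import json

# Hypothetical sample payload, heavily trimmed from the question's example.
RAW_TWEET = json.dumps({
    "created_at": "Fri Feb 15 20:50:46 +0000 2019",
    "text": "example text",
    "in_reply_to_status_id": None,
    "in_reply_to_user_id": 25073877,
    "user": {"id": 1050189031743598592, "screen_name": "switcherooskido"},
})


def tweet_to_row(raw):
    """Parse one raw JSON tweet string and pick the fields to store."""
    tweet = json.loads(raw)
    return [
        tweet["text"],
        tweet["created_at"],
        tweet["user"]["id"],
        tweet["user"]["screen_name"],
        tweet.get("in_reply_to_status_id"),  # None when the tweet is not a reply
    ]


def write_row(csv_file, raw):
    """Append the selected fields of one tweet to an open CSV file object."""
    csv.writer(csv_file).writerow(tweet_to_row(raw))


# An in-memory buffer stands in for the real CSV file here.
buffer = io.StringIO()
write_row(buffer, RAW_TWEET)
print(buffer.getvalue())
```

Note that in the sample data the numeric IDs are integers (for example "in_reply_to_user_id":25073877), so a comparison against the string "25073877" would never match; comparing against the integer, or against the corresponding "_str" field, should behave as intended.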

This is a very late answer, but I'm answering here because this is the first search hit when you search for this error. I was also using Tweepy and found that the JSON response object had attributes that could not be accessed.
'Response' object has no attribute 'text'
Through lots of tinkering and research, I found that when you loop over the result of a Tweepy Client call, you must put '.data' in the for statement itself (iterate over the response's data), not access it inside the loop body.
For example:
tweets = client.search_recent_tweets(query="covid", tweet_fields=['text'])
for tweet in tweets:
    print(tweet.text)  # or print(tweet.data.text)
This will not work, because the Response object itself does not expose the attributes of the tweets it contains. Instead, you do something like:
tweets = client.search_recent_tweets(query="covid", tweet_fields=['text'])
for tweet in tweets.data:
    print(tweet.text)
Basically, this was a long-winded way to fix a problem I was having for a long time. Cheers, hopefully, other noobs like me won't have to struggle as long as I did!
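To see why the first loop fails without calling the API, here is a small sketch that uses stand-ins. It assumes Tweepy's Response is a named tuple with the fields data, includes, errors and meta; the Tweet tuple below is a made-up minimal object, not Tweepy's class.

```python
from collections import namedtuple

# Stand-in for Tweepy's Response (assumed to be a named tuple with these four fields).
Response = namedtuple("Response", ("data", "includes", "errors", "meta"))
# Hypothetical minimal tweet object, used only for this illustration.
Tweet = namedtuple("Tweet", ("id", "text"))

response = Response(
    data=[Tweet(1, "first tweet"), Tweet(2, "second tweet")],
    includes={},
    errors=[],
    meta={"result_count": 2},
)

# Iterating over the response walks its four fields, not the tweets.
fields = [type(item).__name__ for item in response]

# Iterating over response.data walks the tweets themselves.
# `or []` is a guard in case data is None when nothing matched.
texts = [tweet.text for tweet in (response.data or [])]

print(fields)
print(texts)
```

In the failing loop, the first item handed to the loop body is the whole list of tweets, which has no text attribute, hence the AttributeError.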

Related

How to get twitter handle from tweet using Tweepy API 2.0

I am using the Twitter API's StreamingClient through the Python module Tweepy. I am currently running a short stream in which I collect tweets and save the ID and text of each tweet as a JSON object written to a file.
My goal is to be able to collect the Twitter handle from each specific tweet and save it to a json file (preferably print it in the output terminal as well).
This is what the current code looks like:
import json
import time
import tweepy

KEY_FILE = './keys/bearer_token'
DURATION = 10

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    #print('Received tweet:', json_obj)
    print(f'Tweet Screen Name: {json_obj.user.screen_name}')
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)

bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.sample(threaded=True)
time.sleep(DURATION)
streaming_client.disconnect()
I have no idea how to do this. The only thing I found is that someone did this:
json_obj.user.screen_name
However, this did not work at all, and I am completely stuck.
A couple of things:
Firstly, I'd recommend using on_response rather than on_data, because StreamingClient already defines an on_data function to parse the JSON. (It then fires on_tweet, on_response, on_error, etc.)
Secondly, json_obj.user.screen_name is part of API v1, I believe, which is why it doesn't work.
To get extra data using Twitter API v2, you'll want to use Expansions and Fields (Tweepy Documentation, Twitter Documentation).
For your case, you'll probably want to use "username", which is under the user_fields.
def on_response(response: tweepy.StreamResponse):
    tweet: tweepy.Tweet = response.data
    users: list = response.includes.get("users")
    # response.includes is a dictionary representing all the fields (user_fields, media_fields, etc)
    # response.includes["users"] is a list of `tweepy.User`
    # the first user in the list is the author (at least from what I've tested)
    # the rest of the users in that list are anyone who is mentioned in the tweet
    author_username = users and users[0].username
    print(tweet.text, author_username)

streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_response = on_response
# using user fields; the author_id expansion asks for the author's user object to be included
streaming_client.sample(threaded=True, expansions=["author_id"], user_fields=["id", "name", "username"])
time.sleep(DURATION)
streaming_client.disconnect()
Hope this helped.
Also, the Tweepy documentation definitely needs more examples for API v2.
KEY_FILE = './keys/bearer_token'
DURATION = 10

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    print('Received tweet:', json_obj)
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)

bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
# on_finish is not defined in this snippet; it has to be defined elsewhere
streaming_client.on_closed = on_finish
streaming_client.sample(threaded=True, expansions="author_id", user_fields="username", tweet_fields="created_at")
time.sleep(DURATION)
streaming_client.disconnect()
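If you stay with on_data and raw JSON, the handle has to be looked up in the "includes" part of the payload. The sketch below assumes the v2 stream sends a top-level "data" object carrying author_id plus an "includes" object with a "users" list when expansions="author_id" and user_fields="username" are requested; RAW and author_username are made-up names, and the payload is a hand-written sample rather than real output.

```python
import json

# Hypothetical payload in the shape the v2 stream is assumed to send when
# expansions="author_id" and user_fields="username" are requested.
RAW = json.dumps({
    "data": {"id": "1", "author_id": "42", "text": "hello"},
    "includes": {"users": [
        {"id": "7", "name": "Mentioned", "username": "mentioned_user"},
        {"id": "42", "name": "Author", "username": "author_handle"},
    ]},
}).encode()


def author_username(raw_bytes):
    """Return the author's handle, or None if the payload does not carry it."""
    payload = json.loads(raw_bytes.decode())
    author_id = payload.get("data", {}).get("author_id")
    users = payload.get("includes", {}).get("users", [])
    # Match on the ID instead of relying on the author being first in the list.
    by_id = {user["id"]: user["username"] for user in users}
    return by_id.get(author_id)


print(author_username(RAW))
```

Matching on author_id avoids depending on the order of the users list, which the answer above only reports as observed behaviour.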

Tweepy Api Python Response, need help decoding the response

I trust all is well with you and yours. Thank you for taking a moment to read through this, and I apologize if this is a repeat (if it is, point me to the right spot and I will read through that!).
I am trying to hit the Twitter API via Tweepy (because I'm too new to figure out Python and the official Twitter API) and return a result in a usable format.
import Auth_Codes
import json
import tweepy

twitter_auth_keys = {
    "consumer_key" : Auth_Codes.consumer_key,
    "consumer_secret" : Auth_Codes.consumer_secret,
    "access_token" : Auth_Codes.access_token,
    "access_token_secret" : Auth_Codes.access_token_secret
}
auth = tweepy.OAuthHandler(
    twitter_auth_keys["consumer_key"],
    twitter_auth_keys["consumer_secret"]
)
auth.set_access_token(
    twitter_auth_keys["access_token"],
    twitter_auth_keys["access_token_secret"]
)
api = tweepy.API(auth)
#api.search_tweets(q = "Aztar")
searched_tweets = [tweet for tweet in tweepy.Cursor(api.search_tweets,
                                                    q = "What you want to search",
                                                    lang = 'en',
                                                    result_type = 'recent',
                                                    count = 1)
                   .items(1)]
print(searched_tweets)
print(type(searched_tweets))
When this is executed, I get a very large response that I cannot fully post here.
It is also of type <class 'list'>.
I hope that added the spoiler button as intended. My issue is that I have tried several different ways to convert this into actual JSON, and I am struggling because every guide I follow online leads me to a dead end (granted, I am learning lots!). In Node.js, I would normally leverage a map and sort it that way. Is there something similar I can do here? Not all the data is relevant to me.
Thanks in advance, and really sorry about not knowing how to add a spoiler button, if it is at all possible.
I have added the following to it:
searched_tweets_dict = json.loads(searched_tweets)
print(searched_tweets_dict)
and the result is the following error code:
Traceback (most recent call last):
  File "E:\Dropbox\Backup\Github\Python\Mid_Journey\Search.py", line 33, in <module>
    searched_tweets_dict = json.loads(searched_tweets)
  File "C:\Pthyon_3.10\lib\json\__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not list
Why are you using a Cursor if you are only requesting one tweet?
And why don't you just use the generator instead of creating that list?
Anyway, the json object is already included in the Tweepy objects (._json).
cursor = tweepy.Cursor(
    api.search_tweets,
    q = "What you want to search",
    lang = 'en',
    result_type = 'recent',
    count = 1
)
for tweet in cursor.items(1):
    print(tweet._json)
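For the "map" part of the question, a list comprehension over the ._json dictionaries is one way to keep only the relevant fields. This is a rough sketch using stand-in objects that model nothing but the _json attribute mentioned above; WANTED is a hypothetical choice of fields and the tweet contents are invented.

```python
import json
from types import SimpleNamespace

# Stand-ins for Tweepy Status objects: only the `_json` attribute is modelled.
searched_tweets = [
    SimpleNamespace(_json={"id_str": "1", "text": "first", "lang": "en", "source": "web"}),
    SimpleNamespace(_json={"id_str": "2", "text": "second", "lang": "en", "source": "web"}),
]

WANTED = ("id_str", "text")  # hypothetical choice of relevant fields

# Roughly what Array.prototype.map would do in Node.js: one slim dict per tweet.
slim = [{key: tweet._json.get(key) for key in WANTED} for tweet in searched_tweets]

# json.dumps turns Python data into a JSON string; json.loads goes the other
# way, which is why it rejected the list in the traceback above.
as_json = json.dumps(slim, indent=2)
print(as_json)
```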

Parsing tweets in json format to find tweeter users

I am reading a Twitter feed in JSON format to count the number of users.
Some lines in the input file might not be tweets, but messages that the Twitter server sent to the developer (such as limit notices). I need to ignore these messages.
These messages do not contain the created_at field and can be filtered out accordingly.
I have written the following piece of code to extract the valid tweets and then extract the user.id and the text.
def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
        if 'created_at' in json_object:
            return json_object
        else:
            return
    except ValueError as error:
        return

def get_usr_txt(line):
    tmp = safe_parse(line)
    if tmp is not None:
        return (tmp.get('user').get('id_str'), tmp.get('text'))
    else:
        return
My challenge is that I get one extra user called "None".
Here is a sample of the output (it is a large file):
('49838600', 'This is the temperament you look for in a guy who would
have access to our nuclear arsenal. ), None, ('2678507624', 'RT
#GrlSmile: #Ricky_Vaughn99 Yep, which is why in 1992 I switched from
Democrat to Republican to vote Pat Buchanan, who warned of all of
t…'),
I am struggling to find out what I am doing wrong. There is no None in the Twitter file, so I assume that I am reading the {"limit":{"track":1,"timestamp_ms":"1456249416070"}} line, but the code above should not include it, unless I am missing something.
Any pointers? Thanks for your help and your time.
Some lines in the input file might not be tweets, but messages that the Twitter server sent to the developer (such as limit notices). I need to ignore these messages.
That's not exactly what happens. If either of the following is true:
raw_json is not a valid JSON document
created_at is not in the parsed object
the function returns the default value, which is None, and map keeps that None as an element. If you want to ignore these, you can add a filter step between the two operations:
rdd.map(safe_parse).filter(lambda x: x).map(get_usr_txt)
You can also use the flatMap trick to avoid the filter and simplify your code (borrowed from this answer by zero323):
def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
    except ValueError as error:
        return []
    else:
        if 'created_at' in json_object:
            yield json_object

rdd.flatMap(safe_parse).map(get_usr_txt)
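The same idea can be tried without Spark. The sketch below is a plain-Python analogue under a few assumptions: get_usr_txt here takes an already parsed tweet (unlike the version in the question, which parses the line itself), the sample lines are invented, and the isinstance check is an extra guard that the original code does not have.

```python
import json


def safe_parse(raw_json):
    """Return the parsed tweet, or None for invalid JSON and non-tweet messages."""
    try:
        json_object = json.loads(raw_json)
    except ValueError:
        return None
    # Extra guard (not in the original): valid JSON is not always an object.
    if isinstance(json_object, dict) and 'created_at' in json_object:
        return json_object
    return None


def get_usr_txt(tweet):
    """Takes a parsed tweet, so it never produces None itself."""
    return (tweet['user']['id_str'], tweet.get('text'))


# Invented sample input: one tweet, one limit notice, one broken line.
lines = [
    '{"created_at": "x", "user": {"id_str": "49838600"}, "text": "hello"}',
    '{"limit": {"track": 1, "timestamp_ms": "1456249416070"}}',
    'not json at all',
]

parsed = (safe_parse(line) for line in lines)
# The filter step: drop the None placeholders before extracting fields.
pairs = [get_usr_txt(tweet) for tweet in parsed if tweet is not None]
print(pairs)
```

Without the `if tweet is not None` condition, each skipped line would show up as an extra None entry, which matches the symptom described in the question.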

Transferring Twitter Tweets to a txt file

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from pprint import pprint

data_file = open('twitter.json')
data = json.load(data_file)
##Json file with all the ckey, csecret, atoken, and asecret
pprint(data)
#consumer key, consumer secret, access token, access secret.
ckey = data["ckey"]
csecret = data["csecret"]
atoken = data["atoken"]
asecret = data["asecret"]

class listener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        print((username,tweet))
        return True

    def on_error(self, status):
        print (status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
The code above is all standard in accessing the twitter api. However, I need to transfer the tweets obtained from twitter to a .txt file. I tried using the code below
twitterStream = Stream(auth, listener())
fid = open("cats based tweets.txt","w")
for tweet in twitterStream.filter(track=[cats]):
    fid.write(tweet)
fid.close()
I intend to find all tweets/reposts that include the keyword cats, which it does. However, it is also supposed to write a txt file that includes all the tweets, but it doesn't. Can anyone tell me what I need to do to fix it?
EDIT: I used the code that you guys have written, but it doesn't return all of the tweets. It prints out about 5 or 6, then the error
RuntimeError: No active exception to reraise
appears, and I have no idea why. Why does this occur? As far as I know, it shouldn't.
I've done this in a project and my method involves changing the on_data method within the StreamListener object.
My code looks like this:
class Listener(StreamListener):
    def __init__(self, api=None, path=None):
        #I don't remember exactly why I defined this.
        self.api = api
        #We'll need this later.
        self.path = path

    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        print((username,tweet))
        #Open, write and close your file (self.path is the path given to the constructor).
        savefile = open(self.path, 'a')
        savefile.write(tweet)
        savefile.close()
        return True
Then there are a few things to do in the main part of your script, outside the Listener class and its on_data method. In order:
Define the file where you want to save the tweets. Let's call that variable file_path. Don't forget to add the .txt extension here.
Call the Stream and the Listener:
twitterStream = Stream(authorization, Listener(path=file_path))
Use your filters. Mine are coordinates, and I put the filter in a try/except so that my code doesn't stop. Here it is adapted for you:
try:
    twitterStream.filter(track=[cats])
except Exception as e:
    print('Failed filter() with this error:', str(e))
Now the text in the tweet should be written in the file whenever a text appears in the stream. Take a look at your file size and you should see it increase. Particularly, if your filter is about cats. Internet loves cats.
I guess there is a slight indentation error in the snippet you provided. I will try to fix your error with two approaches: the first is to correct the indentation, and the second is to change your on_data method.
Approach 1:
fid = open("cats based tweets.txt","w")
for tweet in twitterStream.filter(track=[cats]):
    fid.write(tweet+"\n")
fid.close()
Or you could simply write the above code as:
with open("cats based tweets.txt","w") as fid:
    for tweet in twitterStream.filter(track=[cats]):
        fid.write(tweet+"\n")
Approach 2:
In the second approach we change the on_data method so that when the program receives a new tweet, it opens the file and writes to it directly. For this we need to open the file in append mode, because opening it in "w" (write) mode would overwrite the contents of the file again and again.
def on_data(self, data):
    all_data = json.loads(data)
    tweet = all_data["text"]
    username = all_data["user"]["screen_name"]
    print((username,tweet))
    with open("cats based tweets.txt","a") as fid:
        fid.write(tweet+"\n")
    return True
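The append-mode idea can be tested without a live stream. The following is a small sketch in which append_tweet is a made-up helper (not part of Tweepy), the messages are hand-written samples, and a temporary directory stands in for the real output location.

```python
import json
import os
import tempfile


def append_tweet(raw, path):
    """Parse one raw JSON message and append 'screen_name<TAB>text' to path."""
    message = json.loads(raw)
    if "text" not in message:  # e.g. limit notices carry no tweet text
        return False
    # Keep one tweet per line: replace newlines embedded in the tweet text.
    text = message["text"].replace("\n", " ")
    with open(path, "a", encoding="utf-8") as fid:
        fid.write(f'{message["user"]["screen_name"]}\t{text}\n')
    return True


# A temporary directory stands in for wherever the file should really live.
path = os.path.join(tempfile.mkdtemp(), "cats based tweets.txt")
append_tweet('{"text": "cats\\nrule", "user": {"screen_name": "a"}}', path)
append_tweet('{"limit": {"track": 1}}', path)
append_tweet('{"text": "more cats", "user": {"screen_name": "b"}}', path)

with open(path, encoding="utf-8") as fid:
    saved = fid.read()
print(saved)
```

Skipping messages that have no "text" key is a guess at one way on_data can fail partway through a stream; it is not a confirmed explanation of the RuntimeError mentioned in the question.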
See the link below for an example of how to save the tweets to a database as well as to a local file.
https://github.com/anandstarz/Scrapee/blob/master/tweets

why do i see my own timeline tweets and not user's?

I am trying to view another user's tweets. The other user is following me, and I am following that user on Twitter. But when I try this, I only see my own tweets, no matter what name I pass as the argument to GetUserTimeline.
What should I do?
import twitter
api = twitter.Api(consumer_key='', consumer_secret='', access_token_key='',access_token_secret='')
statuses = api.GetUserTimeline('chooimooi')
for tweet in statuses:
    print(tweet)
Also, how can I export this data to a text file?
Take a look at pydoc for twitter.Api.GetUserTimeline
pydoc twitter.Api.GetUserTimeline
which states:
twitter.Api.GetUserTimeline = GetUserTimeline(self, user_id=None, screen_name=None,
since_id=None, max_id=None, count=None, include_rts=True, trim_user=None,
exclude_replies=None) unbound twitter.Api method
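The first positional parameter is user_id, so a name passed positionally is not treated as a screen name. I therefore think that passing it as a keyword argument, screen_name='usernamerequired', will work. For example: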
statuses = api.GetUserTimeline(screen_name='chooimooi')
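The difference between the two calls can be seen with a stand-in function. The sketch below only mirrors the first parameters of the signature quoted above and reports how the arguments are bound; get_user_timeline is a made-up name, it calls no API, and it says nothing about what the real library does with the values.

```python
# Stand-in with the same leading parameters as the quoted signature;
# it only reports how the arguments were bound.
def get_user_timeline(user_id=None, screen_name=None, count=None):
    return {"user_id": user_id, "screen_name": screen_name}


positional = get_user_timeline('chooimooi')            # binds to user_id
keyword = get_user_timeline(screen_name='chooimooi')   # binds to screen_name

print(positional)
print(keyword)
```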
