Accessing JSON Attributes of Tweet using Twitter API - python

This is a pretty basic question, but it has had me stumped for a bit. I'm trying to access a specific attribute of a tweet (documentation found here), such as "text". I tried accessing it via data["text"], but this gives me the following error: TypeError: string indices must be integers.
So I tried parsing the data using json.loads(data), thinking this would allow me to access each attribute of the tweet. However, this instead returns solely the text portion of the tweet, meaning that when I do print(newData), it prints out the text. Although this is useful, I need to be able to access other attributes of the tweet, such as "created_at".
So my question is: how do I parse or access the tweet so that I can pluck out the individual attributes I need? To reiterate, I'm sure this is pretty simple, but I'm new to handling JSON objects, and other solutions I found simply told me to use loads(), which isn't what I want.
import json
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener
class TwitterStreamer():
    """
    Class for streaming and processing live tweets for a given list of hashtags
    """
    def stream_tweets(self, hashtag_list):
        listener = StdOutListener()
        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        stream = Stream(auth, listener)
        stream.filter(track=hashtag_list)
class StdOutListener(StreamListener):
    def on_data(self, data):
        # data arrives as a raw JSON string
        print(data)
        newData = json.loads(data)
        print(newData["text"])
        return True
    def on_error(self, status):
        print(status)
def main():
    hashtag_list = ['Chelsea']
    fetched_tweets_filename = "tweets.json"
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(hashtag_list)
main()

Try using the "." operator to access attributes of the tweet. I used it in my code as follows:
tweet = follow_user.status.created_at
Here I received the user as JSON data; "status" is an attribute of the object "follow_user".

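json.loads() parses a JSON string into a Python object (a dict for a tweet), while json.load() does the same for a file-like object. The TypeError comes from indexing the raw string, data["text"], before it has been parsed, since string indices can only be integers. Once you have newData = json.loads(data), index newData rather than data, for example newData["created_at"].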
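A minimal sketch of the parsing step, assuming the streamed payload is a JSON string shaped like a tweet. The sample payload and the parse_tweet helper are made up for illustration and are not part of tweepy.

```python
import json

# Hypothetical sample payload, shaped like the JSON string passed to on_data().
raw = (
    '{"created_at": "Wed Oct 10 20:19:24 +0000 2018", '
    '"text": "Hello", '
    '"user": {"screen_name": "example_user"}}'
)

def parse_tweet(data):
    """Parse a raw JSON string and pick out a few attributes."""
    tweet = json.loads(data)  # str -> dict
    user = tweet.get("user") or {}
    return {
        "text": tweet.get("text"),
        "created_at": tweet.get("created_at"),
        "screen_name": user.get("screen_name"),
    }

print(parse_tweet(raw))
```

Using .get() instead of square brackets is a defensive choice: a message that lacks one of these keys then yields None rather than raising KeyError.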

Related

Using tweepy to get a keyword from a specific user

I'm trying to create a listener for a very specific Twitter account (mine) so I can do some automation: if I tweet something with a "special" code at the end (it could be a character like "…"), it will trigger an action, like adding the preceding characters to a database.
So I used Tweepy, and I'm able to create the listener, filter keywords and so on, but it filters keywords from the whole Tweetverse. This is my code:
import tweepy
cfg = {
    "consumer_key" : "...",
    "consumer_secret" : "...",
    "access_token" : "...",
    "access_token_secret" : "..."
}
auth = tweepy.OAuthHandler(cfg['consumer_key'], cfg['consumer_secret'])
auth.set_access_token(cfg['access_token'], cfg['access_token_secret'])
api = tweepy.API(auth)
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)
        return True
    def on_error(self, status):
        print('error ',status)
        return False
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=auth, listener=myStreamListener)
myStream.filter(track=['…'])
This matches every message containing a "…" no matter who wrote it, so I added the follow parameter to the last line:
myStream.filter(follow='myTwitterName', track=['…'])
This always gives me a 406 error. If I use myStream.userstream('myTwitterName') instead, I get not just the tweets I write but my whole timeline.
So, what am I doing wrong?
EDIT
I just found my first error: I was using the user's screen name, not the Twitter ID. That got rid of the 406 error, but it still doesn't work. I placed the Twitter ID in the follow parameter, but it does absolutely nothing. I tried both my own account and a very active one like CNN (ID = 759251); I see new tweets arriving in my browser, but nothing on the listener.
If you want to find your own Twitter ID, I used this service: http://gettwitterid.com/
OK, solved. It was working from the very beginning; I had made two mistakes:
To solve the 406 error, use the Twitter ID instead of the Twitter name.
The listener appeared to do nothing because I was sending "big" tweets, that is, tweets longer than 140 characters. In that case you shouldn't use status.text, but status.extended_tweet['full_text'].
You must check for the existence of extended_tweet; if it is not in the status received, use text instead.
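The fallback described above can be sketched as a small helper. The name get_full_text is hypothetical, and the SimpleNamespace objects only stand in for the status objects a listener would receive; the sketch assumes extended_tweet, when present, is a dict with a 'full_text' key, as in the answer.

```python
from types import SimpleNamespace

def get_full_text(status):
    """Return the long-form text when present, else the plain text."""
    extended = getattr(status, "extended_tweet", None)
    if extended:
        return extended["full_text"]
    return status.text

# Stand-ins for received statuses (not real tweepy objects).
short_status = SimpleNamespace(text="short tweet")
long_status = SimpleNamespace(
    text="truncated…",
    extended_tweet={"full_text": "the complete long tweet"},
)

print(get_full_text(short_status))
print(get_full_text(long_status))
```

Inside on_status, the print(status.text) call would then become print(get_full_text(status)).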

Python function won't accept string as a parameter

I am trying to pass a string as a parameter to a function in Python 2, but when I try to execute it, I get an error that says:
Error get_all_tweets expected a string or other character buffer object
This is my code:
def get_all_tweets(screen_name):
    consumer_key = 'XXXXXXXXXXXXXXXXXXXXXXXX'
    consumer_secret = 'XXXXXXXXXXXXXXXXXXXXXXXX'
    access_token = 'XXXXXXXXXXXXXXXXXXXXXXXX'
    access_secret = 'XXXXXXXXXXXXXXXXXXXXXXXX'
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    alltweets = []
    new_tweets = api.user_timeline(screen_name = screen_name, count = 200)
    alltweets.extend(new_tweets)
    try:
        with open('$screen_name.json', 'a') as f:
            f.write(alltweets)
            return True
    except BaseException as e:
        print ('Error get_all_tweets %s' % str(e))
    return True
get_all_tweets(str("BarackObama"))
I can't understand why I get a complaint about the parameter not being a string, which it clearly is. I am fairly new to Python, but every resource I have come across states that this is the way to pass a string as a parameter.
Is there something I have overlooked? I don't get any other errors.
I am using Python 2.7.12.
Thanks in advance
The weird error message stems from catching BaseException, which hides the real exception type and is something you should never do.
The true error is a TypeError: you are trying to write a list to a file:
f.write(alltweets)
This won't work, because the write method of a file object only accepts strings or other character buffer objects as arguments.
The way to write a list to a file is by iterating over it:
for tweet in alltweets:
    f.write(tweet + "\n")
This will probably not work in your case, because I assume what tweepy returns as a tweet is a dictionary, not a simple string. In that case, use json to encode it:
import json
...
for tweet in alltweets:
    f.write(json.dumps(tweet) + "\n")
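Putting the pieces together, here is a hedged sketch of writing one JSON document per line. It assumes each item is already a plain dict; write_json_lines and the sample data are hypothetical. If the items are tweepy Status objects rather than dicts, the raw dict is typically available as the _json attribute, but treat that as an assumption to verify against your tweepy version.

```python
import json

def write_json_lines(tweets, path):
    """Append each tweet (a plain dict) to path as one JSON document per line."""
    with open(path, "a", encoding="utf-8") as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + "\n")

if __name__ == "__main__":
    # Build the filename explicitly; '$screen_name.json' is a literal string
    # in Python and is not interpolated.
    screen_name = "example_user"  # hypothetical value
    sample = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
    write_json_lines(sample, "{}.json".format(screen_name))
```

Letting exceptions propagate, or catching only IOError around the open() call, keeps a TypeError like the one in the question visible with its real type.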

How can I test if two objects are equal in python?

I have made a function that connects to the Twitter API.
This function returns a Twitter object. I want to create a test function that checks whether the returned object really is a Twitter object.
So this is my function:
def authenticate_twitter_api():
    """Make connection with twitters REST api"""
    try:
        logger.info('Starting Twitter Authentication')
        twitter_api = twitter.Twitter(auth=twitter.OAuth(config.TWITTER_ACCESS_KEY, config.TWITTER_ACCESS_SECRET,
                                                         config.TWITTER_CONSUMER_KEY, config.TWITTER_CONSUMER_SECRET))
        print twitter_api
        logger.info("Service has started")
        return twitter_api
    except:
        logger.error("Authentication Error. Could not connect to twitter api service")
When I run this function it returns:
<twitter.api.Twitter object at 0x7fc751783910>
Now I want to create a test function, maybe through numpy.testing, to check that the returned object has the right type.
numpy.testing.assert_equal(actual, desired, err_msg='')
actual = type(authenticate_twitter_api())
desired =<class 'twitter.api.Twitter'>
And here is the problem: I can't assign that object to 'desired'.
What can I do?
The desired object you are looking for is twitter.api.Twitter; just import it and pass the class to assert_equal.
However, it's more idiomatic to use isinstance:
from twitter.api import Twitter
if isinstance(authenticate_twitter_api(), Twitter):
    print("It was a Twitter object.")
Classes are objects themselves in Python, so you can assign your desired variable like this:
import twitter
# (...)
desired = twitter.api.Twitter
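Both suggestions can be sketched as a plain test function. The Twitter class and authenticate function below are hypothetical stand-ins so the example runs without the twitter package or credentials; in real code you would import twitter.api.Twitter and call authenticate_twitter_api() instead.

```python
class Twitter:
    """Hypothetical stand-in for twitter.api.Twitter."""

def authenticate():
    """Hypothetical stand-in for authenticate_twitter_api()."""
    return Twitter()

def test_returns_twitter_object():
    api = authenticate()
    # isinstance also accepts subclasses of Twitter.
    assert isinstance(api, Twitter)
    # Comparing types directly is stricter: the exact class must match.
    desired = Twitter
    assert type(api) is desired

test_returns_twitter_object()
```

A test like this also catches the case where the function swallows an exception and implicitly returns None, since None is not an instance of the class.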

Not able to store tweets in CouchDB

I retrieved user information using api.followers in tweepy and am trying to store them in couchDB, but I keep getting this error message
"u'doc validation, u'Bad Special document member" _json".
def save_user(self, u):
    temp = jsonpickle.encode(u)
    temp_obj = json.loads(temp)
    user_obj = temp_obj['py/state']
    self.db.save(user_obj)
u is the user profile returned by the command
for user in api.followers(screen_name="sharonsanderso6"):
    storage.save_user(user)
Directly storing the user in CouchDB gives the error "string indices must be integers,not str", so I tried decoding it using jsonpickle and json.loads. After doing this I get the u'Bad character error. How else can I store it in CouchDB?
Bad Special document member _json
CouchDB reserves JSON properties beginning with an underscore for itself. Change the key of the property to something that doesn't start with an underscore.
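One way to apply that renaming before saving is sketched below. The helper name rename_reserved_keys and the "x" prefix are arbitrary choices for illustration, and the sketch only touches top-level keys of a plain dict.

```python
def rename_reserved_keys(doc, prefix="x"):
    """Return a copy of doc whose top-level keys do not start with '_'."""
    safe = {}
    for key, value in doc.items():
        new_key = prefix + key if key.startswith("_") else key
        safe[new_key] = value
    return safe

# Hypothetical document shaped like the decoded user object.
user_obj = {"_json": {"id": 1}, "screen_name": "example_user"}
print(rename_reserved_keys(user_obj))
```

In save_user, the final call would then be self.db.save(rename_reserved_keys(user_obj)). If a renamed key could collide with an existing one, pick a different prefix.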

Stop stream.filter() on button click

I am using tweepy to stream tweets and store them in a file, with Python and Flask. On the click of a button the stream starts fetching tweets. What I want is for another button click to stop the stream.
I know the answers that rely on a counter variable, but I don't want to fetch a fixed number of tweets.
Thanks in advance
def fetch_tweet():
    page_type = "got"
    lang_list = request.form.getlist('lang')
    print lang_list
    #return render_template("index.html",lang_list=lang_list,page_type=page_type)
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    with open('hashtags.txt') as hf:
        hashtag = [line.strip() for line in hf]
    print hashtag
    print request.form.getlist('fetchbtn')
    if (request.form.getlist('stopbtn')) == ['stop']:
        print "inside stop"
        stream.disconnect()
        return render_template("analyse.html")
    elif (request.form.getlist('fetchbtn')) == ['fetch']:
        stream.filter(track=lang_list, async=True)
        return render_template("fetching.html")
So I'm assuming your initial button links to the initializing of a tweepy stream (i.e. a call to stream.filter()).
If you're going to allow your application to run while tweet collection is happening, you'll need to collect tweets asynchronously (threaded). Otherwise, once you call stream.filter() it will block your program while it collects tweets, until it either reaches some condition you have provided or you Ctrl-C out.
To take advantage of tweepy's built-in threading, you simply need to add the async parameter to your stream.filter() call (on Python 3.7+, where async is a reserved word, tweepy 3.7 and later name this parameter is_async), like so:
stream.filter(track=['your_terms'], async=True)
This will thread your tweet collection and allow your application to continue to run.
Finally, to stop your tweet collection, link a flask call to a function that calls disconnect() on your stream object, like so:
stream.disconnect()
This will disconnect your stream and stop tweet collection. Here is an example of this exact approach in a more object oriented design (see the gather() and stop() methods in the Pyckaxe object).
EDIT - OK, I can see your issue now, but I'm going to leave my original answer up for others who might find this. Your issue is where you are creating your stream object.
Every time fetch_tweet() gets called via Flask, you create a new stream object. The first call creates an initial object and starts the stream on it, but the second call invokes disconnect() on a different, freshly created stream object. Creating a single instance of your stream will solve the issue:
# Create the listener and stream once, at module level, so that every
# request handled by fetch_tweet() shares the same stream object.
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
def fetch_tweet():
    lang_list = request.form.getlist('lang')
    with open('hashtags.txt') as hf:
        hashtag = [line.strip() for line in hf]
    print hashtag
    print request.form.getlist('fetchbtn')
    if (request.form.getlist('stopbtn')) == ['stop']:
        print "inside stop"
        stream.disconnect()
        return render_template("analyse.html")
    elif (request.form.getlist('fetchbtn')) == ['fetch']:
        stream.filter(track=lang_list, async=True)
        return render_template("fetching.html")
Long story short, you need to create your stream object outside of fetch_tweet(). Good luck!
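The shared-instance idea can be illustrated without tweepy or Flask. In this sketch FakeStream is a hypothetical stand-in that mimics only the two calls used above (filter with an async flag, and disconnect), and handle_request plays the role of the Flask view; none of these names come from tweepy.

```python
import threading
import time

class FakeStream:
    """Hypothetical stand-in for a stream with filter() and disconnect()."""

    def __init__(self):
        self.running = False
        self.received = 0
        self._thread = None

    def filter(self, track, is_async=False):
        self.running = True
        if is_async:
            # Collect in a background thread so the caller is not blocked.
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        else:
            self._run()

    def _run(self):
        while self.running:
            self.received += 1  # pretend a tweet arrived
            time.sleep(0.01)

    def disconnect(self):
        self.running = False
        if self._thread is not None:
            self._thread.join()

# Created once at module level, so every request sees the same object.
stream = FakeStream()

def handle_request(action):
    if action == "fetch":
        stream.filter(track=["python"], is_async=True)
        return "fetching"
    if action == "stop":
        stream.disconnect()
        return "stopped"
    return "unknown"
```

Had stream been created inside handle_request, the "stop" request would disconnect a brand-new object and leave the original one running, which is the behaviour described in the question.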
