Parsing Twitter JSON object in Python - python

I am trying to download tweets from Twitter.
I am using Python and Tweepy for this, though I am new to both Python and the Twitter API.
My Python script is as follows:
#!/usr/bin/python
# import modules
import sys
import tweepy
import json
# global variables
consumer_key = ''
consumer_secret = ''
token_key = ''
token_secret = ''
# Main function
def main():
    print sys.argv[0], 'starts'
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(token_key, token_secret)
    print 'Connected to Twitter'
    api = tweepy.API(auth)
    if not api.test():
        print 'Twitter API test failed'
    print 'Experiment with cursor'
    print 'Get search method returns json objects'
    json_search = api.search(q="football")
    #json.loads(json_search())
    print json_search
# Standard boilerplate to call main function if this file runs
if __name__ == '__main__':
    main()
I get the following result:
[<tweepy.models.SearchResult object at 0x9a0934c>, <tweepy.models.SearchResult object at 0x9a0986c>, <tweepy.models.SearchResult object at 0x9a096ec>, <tweepy.models.SearchResult object at 0xb76d8ccc>, <tweepy.models.SearchResult object at 0x9a09ccc>, <tweepy.models.SearchResult object at 0x9a0974c>, <tweepy.models.SearchResult object at 0x9a0940c>, <tweepy.models.SearchResult object at 0x99fdfcc>, <tweepy.models.SearchResult object at 0x99fdfec>, <tweepy.models.SearchResult object at 0x9a08cec>, <tweepy.models.SearchResult object at 0x9a08f4c>, <tweepy.models.SearchResult object at 0x9a08eec>, <tweepy.models.SearchResult object at 0x9a08a4c>, <tweepy.models.SearchResult object at 0x9a08c0c>, <tweepy.models.SearchResult object at 0x9a08dcc>]
Now I am confused about how to extract the tweets from this information.
I tried to use json.loads on this data, but it fails with an error saying that JSON expects a string or buffer.
Example code would be highly appreciated.
Thanks in advance.

Tweepy gives you richer objects; it parsed the JSON for you.
The SearchResult objects have the same attributes as the JSON structures that Twitter sent; just look up the Tweet documentation to see what is available:
for result in api.search(q="football"):
    print result.text
Demo:
>>> import tweepy
>>> tweepy.__version__
'3.3.0'
>>> consumer_key = '<consumer_key>'
>>> consumer_secret = '<consumer_secret>'
>>> access_token = '<access_token>'
>>> access_token_secret = '<access_token_secret>'
>>> auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
>>> auth.set_access_token(access_token, access_token_secret)
>>> api = tweepy.API(auth)
>>> for result in api.search(q="football"):
...     print result.text
...
Great moments from the Women's FA Cup http://t.co/Y4C0LFJed9
RT @freebets: 6 YEARS AGO TODAY:
Football lost one of its great managers.
RIP Sir Bobby Robson. http://t.co/NCo90ZIUPY
RT @Oddschanger: COMPETITION CLOSES TODAY!
Win a Premier League or Football League shirt of YOUR choice!
RETWEET & FOLLOW to enter. http…
Berita Transfer: Transfer rumours and paper review – Friday, July 31 http://t.co/qRrDIEP2zh [TS] #nobar #gosip
@ajperry18 im sorry I don't know this football shit😂😅
@risu_football おれモロ誕生日で北辰なんすよ笑
NFF Unveils Oliseh As Super Eagles Coach - SUNDAY Oliseh has been unveiled by the Nigeria Football... http://t.co/IOYajD9bi2 #Sports
RT @BilelGhazi: RT @lequipe : Gourcuff, au tour de Guingamp http://t.co/Dkio8v9LZq
@EDS_Amy HP SAUCE ?
RT @fsntweet: マンCの塩対応に怒りの炎!ベトナム人ファン、チケットを燃やして猛抗議 - http://t.co/yg5iuABy3K
なめるなよ、プレミアリーグ!マンチェスターCのプレシーズンツアーの行き先でベトナム人男性が、衝撃的な行
RT @peterMwendo: Le football cest un sport collectif ou on doit se faire des passe http://t.co/61hy138yo8
RT @TSBible: 6 years ago today, football lost a true gentleman. Rest in Peace Sir Bobby Robson. http://t.co/6eHTI6UxaC
6 years ago today the greatest football manger of all time passed away SIR Bobby Robson a true Ipswich and footballing legend
The Guardian: PSG close to sealing £40m deal for Manchester United’s Ángel Di María. http://t.co/gAQEucRLZa
Sir Bobby Robson, the #football #legend passed away 6 years ago.
#Barcelona #newcastle #Porto http://t.co/4UXpnvrHhS
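To see why json.loads fails here, it can help to reproduce the situation without touching the network. The sketch below uses a stand-in class, FakeResult (hypothetical, not part of Tweepy), in place of tweepy.models.SearchResult. The only assumption is that each result exposes a text attribute, as in the demo above.

```python
import json


class FakeResult(object):
    """Hypothetical stand-in for a parsed search result (not a Tweepy class)."""

    def __init__(self, text):
        self.text = text


def extract_texts(results):
    """Collect the text attribute of every already-parsed result object."""
    return [result.text for result in results]


results = [FakeResult("first tweet"), FakeResult("second tweet")]

# json.loads expects a JSON string, so a list of objects is rejected.
try:
    json.loads(results)
    loads_failed = False
except TypeError:
    loads_failed = True

texts = extract_texts(results)
print(loads_failed, texts)
```

The objects are already the decoded form of the JSON, so reading attributes is all that is left to do.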

You can use the JSON parser to achieve this. Here is my code on App Engine that produces a JSONP response ready to be used with a jQuery client:
import webapp2
import tweepy
import json
from tweepy.parsers import JSONParser
class APISearchHandler(webapp2.RequestHandler):
    def get(self):
        CONSUMER_KEY = 'xxxx'
        CONSUMER_SECRET = 'xxxx'
        ACCESS_TOKEN_KEY = 'xxxx'
        ACCESS_TOKEN_SECRET = 'xxxx'
        auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
        api = tweepy.API(auth, parser=JSONParser())
        # Query String Parameters
        qs = self.request.get('q')
        max_id = self.request.get('max_id')
        # JSONP Callback
        callback = self.request.get('callback')
        max_tweets = 100
        search_results = api.search(q=qs, count=max_tweets, max_id=max_id)
        json_str = json.dumps(search_results)
        if callback:
            response = "%s(%s)" % (callback, json_str)
        else:
            response = json_str
        self.response.write(response)
So the key point is this line:
api = tweepy.API(auth, parser=JSONParser())
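The JSONP step in the handler can be tried on its own. This is a minimal sketch, assuming the parsed search result is a plain dict or list that json.dumps can serialize; the function name to_jsonp, the sample payload and the callback name handleTweets are made up for illustration.

```python
import json


def to_jsonp(payload, callback=None):
    """Serialize payload to JSON and wrap it in callback(...) if a name is given."""
    json_str = json.dumps(payload)
    if callback:
        return "%s(%s)" % (callback, json_str)
    return json_str


# Hypothetical payload standing in for what the search call returns.
sample = {"statuses": [{"text": "hello"}]}
plain = to_jsonp(sample)
wrapped = to_jsonp(sample, callback="handleTweets")
print(wrapped)
```

Without a callback the client gets plain JSON, and with one it gets a script body that calls the named function with the data.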

Instead of using global variables, I would reorganize the code into a Python class:
import tweepy
class TweetPrinter():
    """
    Simple class to print tweets
    """
    def __init__(self, consumer_key, consumer_secret, access_token,
                 access_token_secret):
        self.consumer_key = consumer_key
        self.consumer_secret = consumer_secret
        self.access_token = access_token
        self.access_token_secret = access_token_secret
        self.auth = tweepy.OAuthHandler(self.consumer_key,
                                        self.consumer_secret)
        self.auth.set_access_token(access_token, access_token_secret)
    def tweet_print(self):
        api = tweepy.API(self.auth)
        football_tweets = api.search(q="football")
        for tweet in football_tweets:
            print(tweet.text)
def main():
    # The my_* names are placeholders for your own credentials
    tweet_printer = TweetPrinter(my_consumer_key, my_consumer_secret,
                                 my_access_token, my_access_token_secret)
    tweet_printer.tweet_print()
if __name__ == '__main__':
    main()

Here is my code using Tweepy:
def twitterfeed():
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    statuses = tweepy.Cursor(api.home_timeline).items(20)
    data = [s.text.encode('utf8') for s in statuses]
    print data

Related

Twitter, tweepy, search_30_day - produces no results compared to the normal search

Could somebody point out what I am doing wrong here?
When I use the normal search, without the premium feature, I get results:
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
OAUTH_KEYS = {'consumer_key': consumer_key, 'consumer_secret': consumer_secret,
              'access_token_key': access_token, 'access_token_secret': access_token_secret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
api = tweepy.API(auth)
cricTweet = tweepy.Cursor(api.search, q='', geocode='60.00,10.00,40km').items(10)
how_many_tweets = 0
for tweet in cricTweet:
    how_many_tweets += 1
print(how_many_tweets)
premiumTweet = api.search_30_day(environment_name='dev', query='point_radius:[60.00 10.00 40km]',
                                 fromDate=202012300000, maxResults=100)
how_many_tweets = 0
for tweet in premiumTweet:
    how_many_tweets += 1
print(how_many_tweets)
But when I try the same with the premium search_30_day, I get no results back. What am I doing wrong?
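One thing that may be worth checking: the standard search's geocode parameter takes latitude first ("latitude,longitude,radius"), whereas the premium point_radius operator is documented as taking longitude first. If that is the cause here, the premium query is centred on a different place than the standard one. The helper below is only a sketch with a hypothetical name; it rearranges the string and makes no API call.

```python
def geocode_to_point_radius(geocode):
    """Convert 'lat,lon,radius' (standard search) to 'point_radius:[lon lat radius]'."""
    lat, lon, radius = [part.strip() for part in geocode.split(",")]
    # The premium operator lists longitude before latitude.
    return "point_radius:[%s %s %s]" % (lon, lat, radius)


query = geocode_to_point_radius("60.00,10.00,40km")
print(query)
```

The resulting string could then be passed as the query argument of the premium call in place of the hand-written one.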

Tweepy: Trying to fetch 3200 tweets from a public twitter account

With the Python code below I tried to fetch 3200 tweets from a public Twitter profile, but so far I only get varying numbers of tweets, all far fewer than the maximum of 3200, and I can't work out why. Can someone please explain what I am doing wrong here?
import tweepy
import json
import pandas as pd
consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxx"
access_token_secret = "xxx"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
results=[]
timeline = tweepy.Cursor(api.user_timeline, screen_name='@realDonaldTrump', tweet_mode="extended").items()
for status in timeline:
    data = (
        status.user.id,
        status.user.screen_name,
        status.user.name,
        status.full_text,
        status.created_at,
        status.lang)
    results.append(data)
cols = "user_id screen_name name text date lang".split()
df = pd.DataFrame(results, columns=cols)

How to improve my code to get friends in tweepy

I wrote some code to collect the friends of each user in a list of Twitter IDs.
But API issues make this code very slow.
Is it possible to improve it?
My code is:
import tweepy
consumer_key = ''
consumer_key_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
for cara in fin1:
    if cara in dici1.keys(): next
    else:
        amigos=[]
        for user in tweepy.Cursor(api.friends, screen_name=cara).items():
            time.sleep(60)
            try:
                amigos.append(user.screen_name)
                comum = [pessoa for pessoa in amigos if pessoa in fin1]
                dici = {cara : comum}
                dici1.update(dici)
            except: time.sleep(60), next
fin1 is the list of IDs (user names, 39 in total).
dici1 is the dict where I store the information.
Remove the time.sleep call inside the loop; it is not necessary. Some things in the code also make no sense at all, such as those bare next statements:
import time
import tweepy
consumer_key = ''
consumer_key_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
for cara in fin1:
    # Skip users that have already been processed
    if cara in dici1:
        continue
    amigos = []
    for user in tweepy.Cursor(api.friends, screen_name=cara).items():
        try:
            amigos.append(user.screen_name)
            comum = [pessoa for pessoa in amigos if pessoa in fin1]
            dici1[cara] = comum
        except Exception:
            # Back off for a minute before moving on to the next friend
            time.sleep(60)
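The filtering step can also be done once per user after the friends have been fetched, instead of rebuilding the list for every friend. The sketch below shows only that step on made-up data; common_friends and the sample names are hypothetical. It does not lift the API rate limit, which is probably the main cost here.

```python
def common_friends(friends_by_user, tracked):
    """For each user, keep only the friends that are also in the tracked list."""
    tracked_set = set(tracked)  # set membership checks are cheap
    return {
        user: [name for name in friends if name in tracked_set]
        for user, friends in friends_by_user.items()
    }


# Hypothetical data standing in for what the Cursor loop would collect.
fin1 = ["ana", "bruno", "carla"]
fetched = {"ana": ["bruno", "zeca", "carla"], "bruno": ["dora"]}
dici1 = common_friends(fetched, fin1)
print(dici1)
```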

How to extract the location of tweets that contain a specific keyword using the Twitter API in Python

I am trying to extract all tweets that contain a specific keyword, together with their geolocations.
For example, I want to download all tweets in English that contain the keyword 'iphone' from 'france' and 'singapore'.
My code:
import tweepy
import csv
import pandas as pd
import sys
# API credentials here
consumer_key = 'INSERT CONSUMER KEY HERE'
consumer_secret = 'INSERT CONSUMER SECRET HERE'
access_token = 'INSERT ACCESS TOKEN HERE'
access_token_secret = 'INSERT ACCESS TOKEN SECRET HERE'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
# Search word/hashtag value
HashValue = ""
# search start date value. the search will start from this date to the current date.
StartDate = ""
# getting the search word/hashtag and date range from user
HashValue = input("Enter the hashtag you want the tweets to be downloaded for: ")
StartDate = input("Enter the start date in this format yyyy-mm-dd: ")
# Open/Create a file to append data
csvFile = open(HashValue+'.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search,q=HashValue,count=20,lang="en",since=StartDate, tweet_mode='extended').items():
    print (tweet.created_at, tweet.full_text)
    csvWriter.writerow([tweet.created_at, tweet.full_text.encode('utf-8')])
print ("Scraping finished and saved to "+HashValue+".csv")
#sys.exit()
How can this be done?
Hello Rahul,
As I understand it, you are looking to get geo data off the tweets you find rather than to filter the search by geocode.
Here is a code sample with the relevant fields you are interested in. These may or may not be provided, depending on the tweeter's privacy settings.
Note there is no "since" parameter on the search API:
https://tweepy.readthedocs.io/en/latest/api.html#help-methods
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
Standard Twitter API search goes back 7 days. The premium and enterprise APIs have 30-day search as well as Full Archive search, but you will pay $$$.
Unfortunately tweepy still hasn't documented its models:
https://github.com/tweepy/tweepy/issues/720
So if you want to look at the tweet object you can use the pprint package and run:
pprint(tweet.__dict__)
One difference I noticed was that the "text" field in the JSON became "full_text" in the object (a consequence of requesting tweet_mode='extended').
If the tweet you found is a quote tweet, the object also contains information on the original tweet, with the same fields as far as I could see.
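The pprint suggestion can be tried on any object. Here is a small sketch using a stand-in built with SimpleNamespace rather than a real tweet; the field names are just the ones discussed here, and a real object carries many more.

```python
from pprint import pformat
from types import SimpleNamespace

# Hypothetical stand-in for a tweet object, not a Tweepy model.
tweet = SimpleNamespace(full_text="hello", coordinates=None, place=None)

# vars(obj) returns the same dictionary as obj.__dict__.
attributes = vars(tweet)
print(pformat(attributes))
field_names = sorted(attributes)
```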
Anyway, here's the code. I added a maximum tweet count when looping through the cursor while testing, to avoid hitting any API limits.
Let me know if you want the CSV code, but it looks like you can handle that already.
import tweepy
# API credentials here
consumer_key = 'your-info'
consumer_secret = 'your-info'
access_token = 'your-info'
access_token_secret = 'your-info'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
searchString = "iPhone"
cursor = tweepy.Cursor(api.search, q=searchString, count=20, lang="en", tweet_mode='extended')
maxCount = 1
count = 0
for tweet in cursor.items():
    print()
    print("Tweet Information")
    print("================================")
    print("Text: ", tweet.full_text)
    print("Geo: ", tweet.geo)
    print("Coordinates: ", tweet.coordinates)
    print("Place: ", tweet.place)
    print()
    print("User Information")
    print("================================")
    print("Location: ", tweet.user.location)
    print("Geo Enabled? ", tweet.user.geo_enabled)
    count = count + 1
    if count == maxCount:
        break
Will output something like this:
Tweet Information
================================
Text: NowPlaying : Hashfinger - Leaving
https://derp.com
#iPhone free app https://derp.com
#peripouwebradio
Geo: None
Coordinates: None
Place: None
User Information
================================
Location: Greece
Geo Enabled? True
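Since these fields are often empty, it can be convenient to gather them into one row with defaults before writing to CSV. This is a sketch on a stand-in object; location_row is a hypothetical helper, and the attribute names are the ones printed above.

```python
from types import SimpleNamespace


def location_row(tweet):
    """Return the location-related fields of a tweet-like object as a dict."""
    user = getattr(tweet, "user", None)
    return {
        "text": getattr(tweet, "full_text", None),
        "coordinates": getattr(tweet, "coordinates", None),
        "place": getattr(tweet, "place", None),
        "user_location": getattr(user, "location", None),
    }


# Hypothetical stand-in mirroring the sample output above.
sample = SimpleNamespace(
    full_text="NowPlaying : Hashfinger - Leaving",
    coordinates=None,
    place=None,
    user=SimpleNamespace(location="Greece", geo_enabled=True),
)
row = location_row(sample)
print(row)
```

Missing attributes come back as None rather than raising, so a row can be written for every tweet.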

tweepy: truncated tweets when using tweet_mode='extended'

This is my code in Python:
import tweepy
import csv
consumer_key = "?"
consumer_secret = "?"
access_token = "?"
access_token_secret = "?"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
search_tweets = api.search('trump',count=1,tweet_mode='extended')
print(search_tweets[0].full_text)
print(search_tweets[0].id)
and the output is the following tweet:
RT @CREWcrew: When Ivanka Trump has business interests across the world, we have to
ask if she’s representing the United States or her busi…
967462205561212929
which is truncated, although I used tweet_mode='extended'.
How can I extract the full text?
I had the same problem recently. It happened for retweets only, and I found that the full text is available here: tweet._json['retweeted_status']['full_text']
Code snippet:
...
search_tweets = api.search('trump',count=1,tweet_mode='extended')
for tweet in search_tweets:
    if 'retweeted_status' in tweet._json:
        print(tweet._json['retweeted_status']['full_text'])
    else:
        print(tweet.full_text)
...
EDIT: Also please note that this won't show RT @.... at the beginning of the text; you may want to add RT at the start of the text yourself, whatever suits you.
EDIT 2:
You can get the screen name of the original tweet's author and put it at the beginning as follows:
retweet_text = 'RT @' + api.get_user(tweet.retweeted_status.user.id_str).screen_name
This worked for me:
if 'retweeted_status' in status._json:
    if 'extended_tweet' in status._json['retweeted_status']:
        text = 'RT @'+status._json['retweeted_status']['user']['screen_name']+':'+status._json['retweeted_status']['extended_tweet']['full_text']
    else:
        text = 'RT @'+status._json['retweeted_status']['user']['screen_name']+':' +status._json['retweeted_status']['text']
else:
    if 'extended_tweet' in status._json:
        text = status._json['extended_tweet']['full_text']
    else:
        text = status.text
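The two approaches can be folded into one helper that works on the raw JSON dict (what tweet._json holds). This is a sketch, assuming only the keys already used in the snippets here; full_text_of and the sample data are made up for illustration.

```python
def full_text_of(tweet_json):
    """Return the untruncated text of a tweet given its JSON as a dict."""

    def body(node):
        # Prefer the extended payload, then full_text, then the classic text field.
        if "extended_tweet" in node:
            return node["extended_tweet"]["full_text"]
        return node.get("full_text", node.get("text", ""))

    retweeted = tweet_json.get("retweeted_status")
    if retweeted:
        # Rebuild the RT prefix from the original author's screen name.
        return "RT @%s: %s" % (retweeted["user"]["screen_name"], body(retweeted))
    return body(tweet_json)


# Hypothetical retweet whose top-level text is cut off.
sample_retweet = {
    "full_text": "RT @someone: a long tweet that gets cut of…",
    "retweeted_status": {
        "user": {"screen_name": "someone"},
        "full_text": "a long tweet that gets cut off no more",
    },
}
print(full_text_of(sample_retweet))
```

Taking the screen name from the embedded retweeted_status avoids an extra user lookup for each tweet.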
