from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from pprint import pprint
data_file = open('twitter.json')
data = json.load(data_file)
# JSON file containing the ckey, csecret, atoken, and asecret
pprint(data)
#consumer key, consumer secret, access token, access secret.
ckey = data["ckey"]
csecret = data["csecret"]
atoken = data["atoken"]
asecret = data["asecret"]
class listener(StreamListener):

    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        print((username, tweet))
        return True

    def on_error(self, status):
        print(status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
The code above is all standard for accessing the Twitter API. However, I need to transfer the tweets obtained from Twitter to a .txt file. I tried using the code below:
twitterStream = Stream(auth, listener())
fid = open("cats based tweets.txt", "w")
for tweet in twitterStream.filter(track=["cats"]):
    fid.write(tweet)
fid.close()
I intend on finding all tweets/reposts that include the keyword cats, which it does. However, it is supposed to also write a .txt file that includes all the tweets, but it doesn't. Can anyone tell me what I need to do to fix it?
EDIT: I used the code that you guys have written, but it doesn't return all of the tweets. It prints out five or six, then the error
RuntimeError: No active exception to reraise
appears, and I have no idea why. Why does this occur? I know it shouldn't.
I've done this in a project and my method involves changing the on_data method within the StreamListener object.
My code looks like this:
class Listener(StreamListener):

    def __init__(self, api=None, path=None):
        # I don't remember exactly why I defined this.
        self.api = api
        # We'll need this later.
        self.path = path

    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        print((username, tweet))
        # Open, write and close your file.
        savefile = open(self.path, 'a')
        savefile.write(tweet + "\n")
        savefile.close()
        return True
A few things to change in the actual code, not where you redefined Listener or on_data. In order:
Define the file where you want to save. Let's call that variable file_path. Don't forget to add the .txt extension here.
Call the Stream and the Listener:
twitterStream = Stream(authorization, Listener(path=file_path))
Use your filters. Mine are coordinates, and I put the filter in a try/except so that my code doesn't stop. Here it is, adapted for you:
try:
    twitterStream.filter(track=["cats"])
except Exception as e:
    print('Failed filter() with this error:', str(e))
Now the text of the tweet should be written to the file whenever a tweet appears in the stream. Take a look at your file size and you should see it increase, particularly if your filter is about cats. The internet loves cats.
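To sanity-check the file-writing part without a live stream, the append logic from on_data can be isolated and driven by a hand-made payload (hypothetical file name; the payload shape mimics what Tweepy hands to on_data):

```python
import json

def append_tweet(path, raw_data):
    """Parse a raw stream payload and append the tweet text to `path`.
    Mirrors the on_data logic above; `raw_data` is the JSON string."""
    all_data = json.loads(raw_data)
    tweet = all_data["text"]
    with open(path, "a") as savefile:
        savefile.write(tweet + "\n")
    return True

# Simulate two payloads arriving from the stream:
sample = json.dumps({"text": "I love cats", "user": {"screen_name": "someone"}})
append_tweet("cats_tweets.txt", sample)
append_tweet("cats_tweets.txt", sample)
```

Each simulated payload should add one line to the file, which is exactly the growth you'd watch for with a real stream.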
I guess there is a slight indentation error in the snippet you provided. I will try to fix your error with two approaches: the first corrects the indentation, and the second changes your on_data method.
Approach 1:
fid = open("cats based tweets.txt", "w")
for tweet in twitterStream.filter(track=["cats"]):
    fid.write(tweet + "\n")
fid.close()
Or you could simply write the above code as :
with open("cats based tweets.txt", "w") as fid:
    for tweet in twitterStream.filter(track=["cats"]):
        fid.write(tweet + "\n")
Approach 2:
In the second approach we change the on_data method so that when the program receives a new tweet, it opens the file and writes to it directly. For this we need to open the file in append mode, since opening it in the writable mode w would overwrite the file's contents again and again.
def on_data(self, data):
    all_data = json.loads(data)
    tweet = all_data["text"]
    username = all_data["user"]["screen_name"]
    print((username, tweet))
    with open("cats based tweets.txt", "a") as fid:
        fid.write(tweet + "\n")
    return True
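The overwrite-vs-append difference is easy to demonstrate with a scratch file (hypothetical filename):

```python
path = "mode_demo.txt"

for word in ["first", "second"]:
    with open(path, "w") as fid:   # write mode: truncates on every open
        fid.write(word + "\n")
with open(path) as fid:
    assert fid.read() == "second\n"   # only the last write survived

open(path, "w").close()               # reset the file
for word in ["first", "second"]:
    with open(path, "a") as fid:   # append mode: keeps earlier lines
        fid.write(word + "\n")
with open(path) as fid:
    assert fid.read() == "first\nsecond\n"
```

This is why a listener that reopens the file on every tweet must use "a": with "w", each tweet would erase all the previous ones.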
See the link below to learn how to save the tweets to a database as well as to a local file:
https://github.com/anandstarz/Scrapee/blob/master/tweets
I am using the Twitter API StreamingClient from the Python module Tweepy. I am currently doing a short stream where I am collecting tweets and saving the entire ID and text from the tweet inside of a JSON object and writing it to a file.
My goal is to be able to collect the Twitter handle from each specific tweet and save it to a json file (preferably print it in the output terminal as well).
This is what the current code looks like:
KEY_FILE = './keys/bearer_token'
DURATION = 10

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    # print('Received tweet:', json_obj)
    print(f'Tweet Screen Name: {json_obj.user.screen_name}')
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)

bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.sample(threaded=True)
time.sleep(DURATION)
streaming_client.disconnect()
And I have no idea how to do this; the only thing I found is that someone did this:
json_obj.user.screen_name
However, this did not work at all, and I am completely stuck.
So, a couple of things.
Firstly, I'd recommend using on_response rather than on_data, because StreamingClient already defines an on_data function to parse the JSON (it then fires on_tweet, on_response, on_error, etc.).
Secondly, json_obj.user.screen_name is part of API v1, I believe, which is why it doesn't work.
To get extra data using Twitter Apiv2, you'll want to use Expansions and Fields (Tweepy Documentation, Twitter Documentation)
For your case, you'll probably want to use "username" which is under the user_fields.
def on_response(response: tweepy.StreamResponse):
    tweet: tweepy.Tweet = response.data
    users: list = response.includes.get("users")
    # response.includes is a dictionary holding all the expanded objects
    # (user_fields, media_fields, etc.)
    # response.includes["users"] is a list of `tweepy.User`
    # the first user in the list is the author (at least from what I've tested)
    # the rest of the users in that list are anyone who is mentioned in the tweet
    author_username = users and users[0].username
    print(tweet.text, author_username)

streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_response = on_response
# expansions="author_id" is needed so the user objects actually get included
streaming_client.sample(threaded=True, expansions="author_id",
                        user_fields=["id", "name", "username"])
time.sleep(DURATION)
streaming_client.disconnect()
Hope this helped.
Also, the Tweepy documentation definitely needs more examples for API v2.
KEY_FILE = './keys/bearer_token'
DURATION = 10

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    print('Received tweet:', json_obj)
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)

def on_finish(response):
    # on_closed is called with the closing response when Twitter ends the stream
    print('Stream closed')

bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.on_closed = on_finish
streaming_client.sample(threaded=True, expansions="author_id",
                        user_fields="username", tweet_fields="created_at")
time.sleep(DURATION)
streaming_client.disconnect()
I'm trying to write a program that will stream tweets from Twitter using their Stream API and Tweepy. Here's the relevant part of my code:
def on_data(self, data):
    if data.user.id == "25073877" or data.in_reply_to_user_id == "25073877":
        self.filename = 'trump.csv'
    elif data.user.id == "30354991" or data.in_reply_to_user_id == "30354991":
        self.filename = 'harris.csv'
    if not 'RT @' in data.text:
        csvFile = open(self.filename, 'a')
        csvWriter = csv.writer(csvFile)
        print(data.text)
        try:
            csvWriter.writerow([data.text, data.created_at, data.user.id,
                                data.user.screen_name, data.in_reply_to_status_id])
        except:
            pass

def on_error(self, status_code):
    if status_code == 420:
        return False
What the code should be doing is streaming the tweets and writing the text of the tweet, the creation date, the user ID of the tweeter, their screen name, and the reply ID of the status they're replying to if the tweet is a reply. However, I get the following error:
File "test.py", line 13, in on_data
if data.user.id == "25073877" or data.in_reply_to_user_id == "25073877":
AttributeError: 'unicode' object has no attribute 'user'
Could someone help me out? Thanks!
EDIT: Sample of what is being read into "data"
{"created_at":"Fri Feb 15 20:50:46 +0000 2019","id":1096512164347760651,"id_str":"1096512164347760651","text":"@realDonaldTrump \nhttps:\/\/t.co\/NPwSuJ6V2M","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":25073877,"in_reply_to_user_id_str":"25073877","in_reply_to_screen_name":"realDonaldTrump","user":{"id":1050189031743598592,"id_str":"1050189031743598592","name":"Lauren","screen_name":"switcherooskido","location":"United States","url":null,"description":"Concerned citizen of the USA who would like to see Integrity restored in the US Government. Anti-marxist!\nSigma, INTP\/J\nREJECT PC and Identity Politics #WWG1WGA","translator_type":"none","protected":false,"verified":false,"followers_count":1459,"friends_count":1906,"listed_count":0,"favourites_count":5311,"statuses_count":8946,"created_at":"Thu Oct 11 00:59:11 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"FF691F","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1068591478329495558\/ng_tNAXx_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1068591478329495558\/ng_tNAXx_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1050189031743598592\/1541441602","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/NPwSuJ6V2M","expanded_url":"https:\/\/www.conservativereview.com\/news\/5-insane-provisions-amnesty-omnibus-bill\/","display_url":"conservativereview.com\/news\/5-insane-\u2026","indices":[18,41]}],"user_mentions":[{"screen_name":"realDonaldTrump","name":"Donald J. Trump","id":25073877,"id_str":"25073877","indices":[0,16]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1550263846848"}
So I supposed the revised question is how to tell the program to only write parts of this JSON output to the CSV file? I've been using the references Twitter's stream API provides for the attributes for "data".
As stated in your comment the tweet data is in "JSON format". I believe what you mean by this is that it is a string (unicode) in JSON format, not a parsed JSON object. In order to access the fields like you want to in your code you need to parse the data string using json.
e.g.
import json
json_data_object = json.loads(data)
you can then access the fields like you would a dictionary e.g.
json_data_object['some_key']['some_other_key']
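For instance, with a trimmed, made-up payload shaped like the streaming API's JSON:

```python
import json

# A hand-made string standing in for the raw `data` the stream delivers:
data = '{"text": "hello", "user": {"id": 25073877, "screen_name": "someone"}}'

json_data_object = json.loads(data)             # str -> dict
print(json_data_object["user"]["screen_name"])  # -> someone
print(json_data_object["text"])                 # -> hello
```

Once parsed, attribute-style access like data.user.id becomes dictionary access like json_data_object['user']['id'].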
This is a very late answer, but I'm answering here because this is the first search hit when you search for this error. I was also using Tweepy and found that the JSON response object had attributes that could not be accessed.
'Response' object has no attribute 'text'
Through lots of tinkering and research, I found that in the loop where you access the Twitter API using Tweepy, you must specify '.data' on the Response object you iterate over, not on the items inside the loop.
For example:
tweets = client.search_recent_tweets(query="covid", tweet_fields=['text'])
for tweet in tweets:
    print(tweet.text)  # or print(tweet.data.text)
Will not work because the Response variable doesn't have access to the attributes within the JSON response object. Instead, you do something like:
tweets = client.search_recent_tweets(query="covid", tweet_fields=['text'])
for tweet in tweets.data:
    print(tweet.text)
Basically, this was a long-winded way to fix a problem I was having for a long time. Cheers, hopefully, other noobs like me won't have to struggle as long as I did!
I am trying to take a Twitter stream, save it to a file and then analyze the contents. However, I am having an issue with files generated by the program, as opposed to files created from the CLI.
Twitter Analysis program:
import json
import pandas as pd
import matplotlib.pyplot as plt

tweets_data = []
tweets_file = open("test.txt", "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue

tweets = pd.DataFrame()
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
However, with the last line I keep getting "KeyError: 'text'", which I understand to be related to the fact that it can't find the key.
When I first run the Twitter search program, if I output the results to a file from the CLI, it works fine with no issues. If I save the output to a file from inside the program, it gives me the error.
Twitter Search program:
class Output(StreamListener):
    def on_data(self, data):
        with open("test.txt", "a") as tf:
            tf.write(data)

    def on_error(self, status):
        print status

L = Output()
auth = OAuthHandler(consKey, consSecret)
auth.set_access_token(Tok1, Tok2)
stream = Stream(auth, L)
stream.filter(track=['cyber'])
If I run the above as is, analyzing the test.txt will give me the error. But if I remove the line and instead run the program as:
python TwitterSearch.py > test.txt
then it will work with no problem when running test.txt through the analysis program.
I have tried changing the file handling from append to write which was of no help.
I also added the line:
print tweet['text']
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
This worked, showing that the program can see a value for the text key. I also compared the output file from the program with the one from the CLI and could not see any difference. Please help me understand and resolve the problem.
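As a sanity check, the line-by-line loading pattern from the analysis program can be exercised without Twitter at all (made-up file name and payloads); note that in Python 3 map() is lazy, so materializing the texts with a list comprehension is often less surprising:

```python
import json

# Write a file in the same shape the listener produces: one JSON document
# per line, with a malformed line mixed in to exercise the except branch.
with open("test_tweets.txt", "w") as tf:
    tf.write('{"text": "tweet one"}\n')
    tf.write('not json\n')
    tf.write('{"text": "tweet two"}\n')

tweets_data = []
with open("test_tweets.txt") as tweets_file:
    for line in tweets_file:
        try:
            tweets_data.append(json.loads(line))
        except ValueError:   # skip malformed/partial lines
            continue

texts = [t["text"] for t in tweets_data if "text" in t]
print(texts)  # -> ['tweet one', 'tweet two']
```

If a file written by the program triggers KeyError here, dumping the offending parsed object usually reveals a line that is valid JSON but lacks the 'text' key (e.g. a rate-limit notice from the stream).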
I'm writing a Python program which takes a Twitter name from one .txt file that contains a list of Twitter names, gets the number of followers from the Twitter API, and then writes it to another .txt file (every follower_count takes one line in the file I'm writing to).
My program, as it stands, contains some bugs; could anyone help me debug it? It's not running.
My program:
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler

CONSUMER_KEY = 'abc'
CONSUMER_SECRET = 'abc'
ACCESS_KEY = 'abc'
ACCESS_SECRET = 'abc'
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
api = tweepy.API(auth)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)

f = open('Twitternames.txt', 'r')
for x in f:
    class TweetListener(StreamListener):
        # A listener handles tweets that are received from the stream.
        # This is a basic listener that just prints received tweets to standard output.
        def on_data(self, data):
            print data
            return True

        def on_error(self, status):
            print status

    # search
    api = tweepy.API(auth)
    twitterStream = Stream(auth, TweetListener())
    test = api.lookup_users(screen_names=['x'])
    for user in test:
        print user.followers_count
        # print it out and also write it into a file
        f = open('followers_number.txt', 'w')
        string = user.followers_count
        f.write(string/n)
        f.close()
I am getting the following error:
File "twittercount.py", line 21
    def on_data(self, data):
      ^
IndentationError: expected an indented block
Every time you call f = open('followers_number.txt', 'w') you overwrite the contents; open the file outside the loop, and use a to append if you want to keep the data from a previous run.
with open('followers_number.txt', 'a') as f:  # with closes your file automatically
    for user in test:
        print user.followers_count
        # print it out and also write it into a file
        s = user.followers_count
        f.write(s + "\n")  # add a newline with +
If user.followers_count returns an int you will need to use str(s)
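For instance (made-up follower count):

```python
followers = 1459                   # user.followers_count is an int
line = str(followers) + "\n"       # convert with str() before concatenating
line2 = "{}\n".format(followers)   # or let format() handle the conversion
assert line == line2 == "1459\n"
```

Without the str() conversion, s + "\n" raises TypeError because you can't add an int and a str.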
You need to declare your class first, not inside a loop, and the methods should be inside the class:
# create the class first
class TweetListener(StreamListener):
    # A listener handles tweets that are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.
    def on_data(self, data):  # indented inside the class
        print(data)
        return True

    def on_error(self, status):
        print(status)

# open both files outside the loop
with open('Twitternames.txt') as f, open('followers_number.txt', 'a') as f1:
    for x in f:
        # search
        api = tweepy.API(auth)
        twitterStream = Stream(auth, TweetListener())
        test = api.lookup_users(screen_names=[x])  # pass the variable, not "x"
        for user in test:
            print(user.followers_count)
            # print it out and also write it into a file
            s = user.followers_count
            f1.write("{}\n".format(s))  # format() converts the int and adds a newline
I'm trying to get authenticated by an API I'm attempting to access. I'm using urllib.parse.urlencode to encode the parameters which go in my URL. I'm using urllib.request.urlopen to fetch the content.
This should return 3 values from the server, such as:
SID=AAAAAAAAAAA
LSID=BBBBBBBBBBB
AUTH=CCCCCCCCCCC
The problem is it only returns the first value, and the trailing new line character.
import urllib.request
import urllib.parse

Emailparamx = 'Email'
Emailparam = Emailparamx.encode('utf-8')
email = 'myemail#stackoverflow.com'
email = email.encode('utf-8')
Passwdparam = 'Passwd'
Passwdparam = Passwdparam.encode('utf-8')
password = 'hidden'
password = password.encode('utf-8')
Accounttypeparam = 'accountType'
Accounttypeparam = Accounttypeparam.encode('utf-8')
accounttype = 'GOOGLE'
accounttype = accounttype.encode('utf-8')
Serviceparam = 'service'
Serviceparam = Serviceparam.encode('utf-8')
service = 'adwords'
service = service.encode('utf-8')

url = 'https://accounts.google.com/ClientLogin?'
urlen = url.encode('utf-8')

data = [(Emailparamx, email), (Passwdparam, password),
        (Accounttypeparam, accounttype), (Serviceparam, service)]
auth = ''
dataurl = urllib.parse.urlencode(data)
accessurl = (url + "%s" % dataurl)
fh = urllib.request.urlopen(accessurl)
equals = '='
eqenc = equals.encode('utf-8')
try:
    msg = fh.readline().split(eqenc)
    print(msg)
And then msg prints
[b'SID', b'AAAAAAAAAAAAAAAAA\n']
I know that's some seriously ugly code, I'm about a week old in Python. Any help would be greatly appreciated.
The problem is that you're only calling readline once, so it only reads one line. If you want to read the lines one by one, you have to keep calling readline in a loop until done:
while True:
    msg = fh.readline()
    if not msg:
        break
    msg = msg.split(eqenc)
    print(msg)
However, there's really no good reason to call readline here, because any file-like object (including a urlopen object) is already an iterable full of lines, so you can just do this:
for msg in fh:
    print(msg)
Meanwhile, your original code has a try without an except or a finally, which will just raise a SyntaxError. Presumably you wanted something like this:
try:
    for msg in fh:
        print(msg)
except Exception as e:
    print('Exception: {}'.format(e))
While we're at it, we can simplify your code a bit.
If you look at the examples:
Here is an example session that uses the GET method to retrieve a URL containing parameters:
That's exactly what you want to do here (except for the last line). All the extra stuff you're doing with encoding the strings is not only unnecessary, but incorrect. UTF-8 is the wrong encoding to use for URLs (you get away with it because all of your strings are pure ASCII); urlopen requires a string rather than an encoded byte string (although, at least in CPython 3.0-3.3, it happens to work if you give it byte strings that happen to be encoded properly); urlencode can take byte strings but may not do the right thing (you want to give it the original Unicode so it can quote things properly); etc.
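To illustrate, urlencode does the quoting itself when handed plain (unencoded) strings; the address here is made up:

```python
from urllib.parse import urlencode

# Plain str values go in; urlencode percent-quotes reserved characters.
params = {'Email': 'someone@example.com', 'service': 'adwords'}
qs = urlencode(params)
print(qs)  # -> Email=someone%40example.com&service=adwords
```

Note the '@' became %40 automatically; pre-encoding the values to UTF-8 bytes would bypass this quoting step.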
Also, you probably want to decode the result (which is sent as ASCII—for more complicated examples, you'll have to either parse the fh.getheader('Content-Type'), or read the documentation for the API), and strip the newlines.
You also may want to build a structure you can use in your code instead of just printing it out. For example, if you store the results in login_info, and you need the SID in a later request, it's just login_info['SID'].
So, let's wrap things up in a function, then call that function:
import urllib.request
import urllib.parse

def client_login(email, passwd, account_type, service):
    params = {'Email': email,
              'Passwd': passwd,
              'accountType': account_type,
              'service': service}
    qs = urllib.parse.urlencode(params)
    url = 'https://accounts.google.com/ClientLogin?'
    with urllib.request.urlopen(url + qs) as fh:
        return dict(line.strip().decode('ascii').split('=', 1) for line in fh)
email = 'myemail#stackoverflow.com'
password = 'hidden'
accounttype = 'GOOGLE'
service = 'adwords'

try:
    results = client_login(email, password, accounttype, service)
    for key, value in results.items():
        print('key "{}" is "{}"'.format(key, value))
except Exception as e:
    print('Exception: {}'.format(e))