Parsing tweets in JSON format to find Twitter users - python

I am reading a Twitter feed in JSON format to count the number of users.
Some lines in the input file might not be tweets, but messages that the Twitter server sent to the developer (such as limit notices). I need to ignore these messages.
These messages would not contain the created_at field and can be filtered out accordingly.
I have written the following piece of code to extract the valid tweets, and then extract the user.id and the text.
import json
def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
        if 'created_at' in json_object:
            return json_object
        else:
            return
    except ValueError as error:
        return
def get_usr_txt(line):
    tmp = safe_parse(line)
    if tmp is not None:
        return (tmp.get('user').get('id_str'), tmp.get('text'))
    else:
        return
My challenge is that I get one extra user called "None".
Here is a sample of the output (it is a large file):
('49838600', 'This is the temperament you look for in a guy who would
have access to our nuclear arsenal. ), None, ('2678507624', 'RT
#GrlSmile: #Ricky_Vaughn99 Yep, which is why in 1992 I switched from
Democrat to Republican to vote Pat Buchanan, who warned of all of
t…'),
I am struggling to find out what I am doing wrong. There is no None in the Twitter file, so I am assuming that I am reading the limit notice
{"limit":{"track":1,"timestamp_ms":"1456249416070"}} but the code above should not include it, unless I am missing something.
Any pointers? Thanks for your help and your time.

Some lines in the input file might not be tweets, but messages that the Twitter server sent to the developer (such as limit notices). I need to ignore these messages.
That's not exactly what happens. If one of the following is true:
raw_json is not a valid JSON document
created_at is not in the parsed object
then safe_parse returns the default value, which is None, and get_usr_txt passes that None along. If you want to ignore these records, you can add a filter step between the two operations:
rdd.map(safe_parse).filter(lambda x: x).map(get_usr_txt)
You can also use the flatMap trick to avoid the filter and simplify your code (borrowed from this answer by zero323):
def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
    except ValueError as error:
        # safe_parse is a generator, so a bare return simply yields nothing
        return
    else:
        if 'created_at' in json_object:
            yield json_object
rdd.flatMap(safe_parse).map(get_usr_txt)
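To see why the generator version drops the unwanted records, the same logic can be tried without Spark. This is only a sketch under assumptions: itertools.chain.from_iterable stands in for rdd.flatMap, the three input lines are made up for illustration, and get_usr_txt is simplified to take an already parsed tweet.

```python
import json
from itertools import chain

def safe_parse(raw_json):
    """Yield the parsed tweet, or nothing for invalid JSON and non-tweets."""
    try:
        json_object = json.loads(raw_json)
    except ValueError:
        return  # generator ends without yielding anything
    if isinstance(json_object, dict) and 'created_at' in json_object:
        yield json_object

def get_usr_txt(tweet):
    # Simplified: receives a parsed tweet, not a raw line
    return (tweet['user']['id_str'], tweet['text'])

# Made-up input: one tweet, one limit notice, one broken line
lines = [
    '{"created_at": "Tue Feb 23", "user": {"id_str": "49838600"}, "text": "hello"}',
    '{"limit": {"track": 1, "timestamp_ms": "1456249416070"}}',
    'not json at all',
]

# chain.from_iterable flattens the per-line generators, like flatMap does
parsed = chain.from_iterable(safe_parse(line) for line in lines)
results = [get_usr_txt(tweet) for tweet in parsed]
print(results)
```

Only the first line survives, so no None ever reaches get_usr_txt.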

Related

How to get twitter handle from tweet using Tweepy API 2.0

I am using the Twitter API StreamingClient using the python module Tweepy. I am currently doing a short stream where I am collecting tweets and saving the entire ID and text from the tweet inside of a json object and writing it to a file.
My goal is to be able to collect the Twitter handle from each specific tweet and save it to a json file (preferably print it in the output terminal as well).
This is what the current code looks like:
import json
import time
import tweepy
KEY_FILE = './keys/bearer_token'
DURATION = 10
def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    #print('Received tweet:', json_obj)
    print(f'Tweet Screen Name: {json_obj.user.screen_name}')
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)
bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.sample(threaded=True)
time.sleep(DURATION)
streaming_client.disconnect()
And I have no idea how to do this, the only thing I found is that someone did this:
json_obj.user.screen_name
However, this did not work at all, and I am completely stuck.
So, a couple of things.
Firstly, I'd recommend using on_response rather than on_data, because StreamingClient already defines an on_data function to parse the JSON. (It then fires on_tweet, on_response, on_error, etc.)
Secondly, json_obj.user.screen_name is part of API v1, I believe, which is why it doesn't work.
To get extra data using Twitter API v2, you'll want to use Expansions and Fields (Tweepy Documentation, Twitter Documentation).
For your case, you'll probably want to use "username", which is under the user_fields.
def on_response(response: tweepy.StreamResponse):
    tweet: tweepy.Tweet = response.data
    users: list = response.includes.get("users")
    # response.includes is a dictionary representing all the fields (user_fields, media_fields, etc)
    # response.includes["users"] is a list of `tweepy.User`
    # the first user in the list is the author (at least from what I've tested)
    # the rest of the users in that list are anyone who is mentioned in the tweet
    author_username = users and users[0].username
    print(tweet.text, author_username)
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_response = on_response
# user objects are only included when an expansion such as author_id is requested
streaming_client.sample(threaded=True, expansions=["author_id"], user_fields=["id", "name", "username"])
time.sleep(DURATION)
streaming_client.disconnect()
Hope this helped.
Also, the Tweepy documentation definitely needs more examples for API v2.
import json
import time
import tweepy
KEY_FILE = './keys/bearer_token'
DURATION = 10
def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    print('Received tweet:', json_obj)
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)
bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
# on_finish is not defined in this snippet; it has to be defined elsewhere before this line runs
streaming_client.on_closed = on_finish
streaming_client.sample(threaded=True, expansions="author_id", user_fields="username", tweet_fields="created_at")
time.sleep(DURATION)
streaming_client.disconnect()
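With expansions="author_id", the raw payload handed to on_data should carry the author under includes.users. The sketch below shows one way to pull the handle out of such a payload. It assumes the usual API v2 shape (a "data" object with "author_id" and an "includes" object with a "users" list); the helper name author_username and the sample payload are made up for illustration.

```python
import json

def author_username(raw_payload):
    """Return the username of the tweet's author, or None if it is missing."""
    payload = json.loads(raw_payload)
    tweet = payload.get('data', {})
    users = payload.get('includes', {}).get('users', [])
    # Map user id -> username so the author is matched by id, not by position
    usernames_by_id = {user['id']: user['username'] for user in users}
    return usernames_by_id.get(tweet.get('author_id'))

# Made-up payload in the assumed v2 shape, encoded as bytes like the stream data
sample = json.dumps({
    'data': {'id': '1', 'text': 'hello', 'author_id': '42'},
    'includes': {'users': [
        {'id': '7', 'name': 'Someone Mentioned', 'username': 'mentioned_user'},
        {'id': '42', 'name': 'Example Author', 'username': 'example_user'},
    ]},
}).encode()

print(author_username(sample))
```

Matching on author_id avoids relying on the author being the first entry in the list.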

I am looking to create an API endpoint route that returns txt in a json format -Python

I'm new to developing and my question(s) involves creating an API endpoint in our route. The api will be used for a POST from a Vuetify UI. Data will come from our MongoDB. We will be getting a .txt file for our shell script but it will have to POST as a JSON. I think these are the steps for converting the text file:
1) Create a list for the lines of the .txt
2) Add each line to the list
3) Join the list elements into a string
4) Create a dictionary with the file/file content and convert it to JSON
This is my current code for the steps:
import json
### something.txt: an example of the shell script ###
f = open("something.txt")
# create a list to put the lines of the file in
file_output = []
# add each line of the file to the list
for line in f:
    file_output.append(line)
# mashes all of the list elements together into one string
fileoutput2 = ''.join(file_output)
print(fileoutput2)
# create a dict with file and file content and then convert to JSON
json_object = {"file": fileoutput2}
json_response = json.dumps(json_object)
print(json_response)
{"file": "Hello\n\nSomething\n\nGoodbye"}
I have the following baseline code below, which I execute on a button press in the UI:
@bp_customer.route('/install-setup/<string:customer_id>', methods=['POST'])
def install_setup(customer_id):
    cust = Customer()
    customer = cust.get_customer(customer_id)
    ### example of a series of lines with newline character between them.
    script_string = "Beginning\nof\nscript\n"
    json_object = {"file": script_string}
    json_response = json.dumps(json_object)
    # get the install shell script content
    # replace the values (somebody has already done this)
    # attempt to return the below example json_response
    return make_response(jsonify(json_response), 200)
My current Vuetify button press code is here, so I just have to amend it to a POST and the new route once this is established:
onClickScript() {
  console.log("clicked");
  axios
    .get("https://sword-gc-eadsusl5rq-uc.a.run.app/install-setup/")
    .then((resp) => {
      console.log("resp: ", resp.data);
      this.scriptData = resp.data;
    });
},
I'm having a hard time combining these 2 concepts in the correct way. Any input as to whether I'm on the right path? Insight from anyone who's much more experienced than me?
You're on the right path, but needlessly complicating things a bit. For example, the first bit could be just:
import json
with open("something.txt") as f:
    json_response = json.dumps({'file': f.read()})
print(json_response)
And since you're looking to pass everything through jsonify anyway, even this would suffice:
with open("something.txt") as f:
    data = {'file': f.read()}
Where you can pass data directly through jsonify. The rest of it isn't sufficiently complete to offer any concrete comments, but the basic idea is OK.
Once you have a working whole, you could go to https://codereview.stackexchange.com/ to ask for a review. Questions on StackOverflow should be limited to actual questions about getting something to work.
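One detail in the question's route is worth checking: json_response is already a JSON string, so passing it through jsonify would most likely encode it a second time. The sketch below uses only the standard json module, with json.dumps standing in for jsonify (an assumption made so the example runs without Flask).

```python
import json

script_string = "Beginning\nof\nscript\n"
data = {"file": script_string}

# Encoded once: the client decodes this into an object with a "file" key
encoded_once = json.dumps(data)

# Encoded twice: the client decodes this into a plain string, not an object
encoded_twice = json.dumps(encoded_once)

decoded_once = json.loads(encoded_once)
decoded_twice = json.loads(encoded_twice)

print(type(decoded_once).__name__, type(decoded_twice).__name__)
```

Passing the dictionary itself to jsonify, as suggested above, avoids the second encoding.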

How does django really handle multiple requests on development server?

I am making a little Django app to serve translations for my React frontend. The way it works is as follows:
The frontend tries to find a translation using a key.
If the translation for that key is not found, it sends a request to the backend with the missing key.
On the backend, the missing key is appended to a JSON file.
Everything works fine when the requests are sent one at a time (when one finishes, the next is sent). But when multiple requests are sent at the same time, everything breaks: the JSON file gets corrupted. It looks as if all the requests are changing the file at the same time. I am not sure that is the case, because I thought a file cannot be edited by two processes at the same time (correct me if I am wrong), but I don't receive any such error, which would suggest that the requests are handled one at a time, according to this and this.
Also, I tried something which, to my surprise, worked: adding time.sleep(1) to the top of my API view. When I did this, everything worked as expected.
What is going on?
Here is the code, just in case it matters:
@api_view(['POST'])
def save_missing_translation_keys(request, lng, ns):
    time.sleep(1)
    missing_trans_path = MISSING_TRANS_DIR / f'{lng}.json'
    # Read lng file and get current missing keys for given ns
    try:
        with open(missing_trans_path, 'r', encoding='utf-8') as missing_trans_file:
            if is_file_empty(missing_trans_path):
                missing_keys_dict = {}
            else:
                missing_keys_dict = json.load(missing_trans_file)
    except FileNotFoundError:
        missing_keys_dict = {}
    except Exception as e:
        # Even if file is not empty, we might not be able to parse it for some reason, so we log any errors in log file
        with open(MISSING_LOG_FILE, 'a', encoding='utf-8') as logFile:
            logFile.write(
                f'could not save missing keys {str(list(request.data.keys()))}\nnamespace {lng}/{ns} file can not be parsed because\n{str(e)}\n\n\n')
        raise e
    # Add new missing keys to the list above.
    ns_missing_keys = missing_keys_dict.get(ns, [])
    for missing_key in request.data.keys():
        if missing_key and isinstance(missing_key, str):
            ns_missing_keys.append(missing_key)
        else:
            raise ValueError('Missing key not allowed')
    missing_keys_dict.update({ns: list(set(ns_missing_keys))})
    # Write new missing keys to the file
    with open(missing_trans_path, 'w', encoding='utf-8') as missing_trans_file:
        json.dump(missing_keys_dict, missing_trans_file, ensure_ascii=False)
    return Response()
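One possible explanation, offered as a sketch rather than a diagnosis: the development server handles requests in separate threads by default, and opening a file does not lock it, so two requests can interleave their read-modify-write cycles on the same file. The example below serializes that cycle with a threading.Lock and swaps the new file in with os.replace. The names _file_lock and add_missing_keys are hypothetical, and the lock only covers threads inside a single process; several worker processes would need a cross-process file lock instead.

```python
import json
import os
import tempfile
import threading

# Hypothetical module-level lock: one per process, shared by all request threads
_file_lock = threading.Lock()

def add_missing_keys(path, ns, new_keys):
    """Merge new_keys into the JSON file at path under namespace ns."""
    with _file_lock:  # only one thread at a time may read, modify and write
        try:
            with open(path, 'r', encoding='utf-8') as f:
                content = f.read()
            data = json.loads(content) if content.strip() else {}
        except FileNotFoundError:
            data = {}
        merged = set(data.get(ns, [])) | set(new_keys)
        data[ns] = sorted(merged)
        # Write to a temporary file in the same directory, then swap it in,
        # so a reader never sees a half-written file
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
        with os.fdopen(fd, 'w', encoding='utf-8') as tmp:
            json.dump(data, tmp, ensure_ascii=False)
        os.replace(tmp_path, path)
```

Inside the view, the read, update and write steps would then be replaced by a single call such as add_missing_keys(missing_trans_path, ns, request.data.keys()).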

Attribute error when using user object on tweepy

I'm trying to write a program that will stream tweets from Twitter using their Stream API and Tweepy. Here's the relevant part of my code:
def on_data(self, data):
    if data.user.id == "25073877" or data.in_reply_to_user_id == "25073877":
        self.filename = 'trump.csv'
    elif data.user.id == "30354991" or data.in_reply_to_user_id == "30354991":
        self.filename = 'harris.csv'
    if not 'RT @' in data.text:
        csvFile = open(self.filename, 'a')
        csvWriter = csv.writer(csvFile)
        print(data.text)
        try:
            csvWriter.writerow([data.text, data.created_at, data.user.id, data.user.screen_name, data.in_reply_to_status_id])
        except:
            pass
def on_error(self, status_code):
    if status_code == 420:
        return False
What the code should be doing is streaming the tweets and writing the text of the tweet, the creation date, the user ID of the tweeter, their screen name, and the reply ID of the status they're replying to if the tweet is a reply. However, I get the following error:
File "test.py", line 13, in on_data
if data.user.id == "25073877" or data.in_reply_to_user_id == "25073877":
AttributeError: 'unicode' object has no attribute 'user'
Could someone help me out? Thanks!
EDIT: Sample of what is being read into "data"
{"created_at":"Fri Feb 15 20:50:46 +0000 2019","id":1096512164347760651,"id_str":"1096512164347760651","text":"#realDonaldTrump \nhttps:\/\/t.co\/NPwSuJ6V2M","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":25073877,"in_reply_to_user_id_str":"25073877","in_reply_to_screen_name":"realDonaldTrump","user":{"id":1050189031743598592,"id_str":"1050189031743598592","name":"Lauren","screen_name":"switcherooskido","location":"United States","url":null,"description":"Concerned citizen of the USA who would like to see Integrity restored in the US Government. Anti-marxist!\nSigma, INTP\/J\nREJECT PC and Identity Politics #WWG1WGA","translator_type":"none","protected":false,"verified":false,"followers_count":1459,"friends_count":1906,"listed_count":0,"favourites_count":5311,"statuses_count":8946,"created_at":"Thu Oct 11 00:59:11 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"FF691F","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1068591478329495558\/ng_tNAXx_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1068591478329495558\/ng_tNAXx_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1050189031743598592\/1541441602","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":nul
l,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/NPwSuJ6V2M","expanded_url":"https:\/\/www.conservativereview.com\/news\/5-insane-provisions-amnesty-omnibus-bill\/","display_url":"conservativereview.com\/news\/5-insane-\u2026","indices":[18,41]}],"user_mentions":[{"screen_name":"realDonaldTrump","name":"Donald J. Trump","id":25073877,"id_str":"25073877","indices":[0,16]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1550263846848"}
So I suppose the revised question is: how do I tell the program to write only parts of this JSON output to the CSV file? I've been using the references Twitter's stream API provides for the attributes of "data".
As stated in your comment the tweet data is in "JSON format". I believe what you mean by this is that it is a string (unicode) in JSON format, not a parsed JSON object. In order to access the fields like you want to in your code you need to parse the data string using json.
e.g.
import json
json_data_object = json.loads(data)
You can then access the fields as you would in a dictionary, e.g.
json_data_object['some_key']['some_other_key']
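Applied to the revised question, a minimal sketch of parsing the string and writing only selected fields to CSV could look like this. The helper name tweet_row is hypothetical, the sample is a heavily trimmed version of the data shown in the question with a shortened text, and io.StringIO stands in for the real CSV file.

```python
import csv
import io
import json

def tweet_row(raw):
    """Parse one raw JSON tweet string and pick the fields for the CSV row."""
    tweet = json.loads(raw)
    user = tweet['user']
    return [
        tweet['text'],
        tweet['created_at'],
        user['id_str'],
        user['screen_name'],
        tweet['in_reply_to_status_id'],
    ]

# Trimmed-down stand-in for the string that on_data receives
raw = json.dumps({
    'created_at': 'Fri Feb 15 20:50:46 +0000 2019',
    'text': 'example text',
    'in_reply_to_status_id': None,
    'in_reply_to_user_id_str': '25073877',
    'user': {'id_str': '1050189031743598592', 'screen_name': 'switcherooskido'},
})

buffer = io.StringIO()  # stands in for open(filename, 'a', newline='')
csv.writer(buffer).writerow(tweet_row(raw))
print(buffer.getvalue())
```

Using the id_str fields keeps the comparisons against string IDs such as "25073877" consistent.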
This is a very late answer, but I'm answering here because this is the first search hit when you search for this error. I was also using Tweepy and found that the JSON response object had attributes that could not be accessed.
'Response' object has no attribute 'text'
Through lots of tinkering and research, I found that when you loop over the result of a Tweepy API call, you must put '.data' in the for statement itself (iterate over tweets.data), not on each item inside the loop body.
For example:
tweets = client.search_recent_tweets(query="covid", tweet_fields=['text'])
for tweet in tweets:
    print(tweet.text) # or print(tweet.data.text)
This will not work, because the Response object itself does not expose the attributes of the tweets it contains. Instead, do something like:
tweets = client.search_recent_tweets(query="covid", tweet_fields=['text'])
for tweet in tweets.data:
    print(tweet.text)
Basically, this was a long-winded way to fix a problem I was having for a long time. Cheers, hopefully, other noobs like me won't have to struggle as long as I did!
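A stand-in can make the difference concrete. The sketch assumes, as the Tweepy documentation describes, that Response is a named tuple with the fields data, includes, errors and meta; FakeResponse and FakeTweet are hypothetical substitutes so the example runs without network access.

```python
from collections import namedtuple

# Hypothetical stand-ins for tweepy.Response and tweepy.Tweet
FakeTweet = namedtuple('FakeTweet', ['id', 'text'])
FakeResponse = namedtuple('FakeResponse', ['data', 'includes', 'errors', 'meta'])

response = FakeResponse(
    data=[FakeTweet(1, 'first'), FakeTweet(2, 'second')],
    includes={},
    errors=[],
    meta={'result_count': 2},
)

# Iterating over the response walks its four fields, not the tweets
fields = list(response)

# Iterating over response.data walks the tweets themselves
texts = [tweet.text for tweet in response.data]
print(len(fields), texts)
```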

Not receiving a return message in python 2 code

I'm pretty new to Python and just learning the ropes. In the code below I have a function taking several inputs from a JSON string. I'm attempting to return an output built from the specified strings. The problem? When I run the file I get nothing... I'm sure I'm missing something incredibly simple, but for the life of me I can't figure out what. I've attempted to use return as well as print at the end of the function. No cheese.
Help?
Here's what I've got so far:
import datetime, json
def jeeves(request): #defines the function
    message=''
    if request['type']=='maintainance':
        message='Thank you tenant at unit'+str(request['unit'])+', your request for maintenance to deal with '+'"'+str(request['issue'])+'"'+' has been received #2 input'
    elif request['type']=='purchase':
        message='Thank you tenant at unit'+str(request['unit'])+'your request to purchase a'+str(request['commodity'])+ ' has been received'
    elif request['type']=='reservation':
        startTime=request['date'].split(" ")[1]
        startTime=startTime.split('')
        time=0;
        num=[]
        for item in startTime:
            if isdigit(item):
                num.append(item)
        for index in range(len(num)):
            time+=num[index]*10**(len(num)-index)
        endTime=0
        daySplit=''.join(startTime[-2:])
        if time+int(request['duration'].split(' ')[0])>12:
            endTime=time+int(request['duration'].split(' ')[0])-12
            if daySplit=='AM':
                endTime=str(endTime)+'PM'
            else:
                endTime=str(endTime)+'AM'
        else:
            endTime=endTime+int(request['duration'].split(' ')[0])
            endTime=str(endTime)+daySplit
        message='Thank you tenant at unit'+str(request['unit'])+'your request to reserve our '+str(request['location'])+' on '+str(request['date'].split(' ')[0])+' from '+str(request['date'].split(' ')[1])+' to '+ endTime+' has been received'
    elif request['type']=='complaint':
        message='Thank you tenant at unit'+str(request['unit'])+' we will have someone follow up on '+'"'+request['issue']+'"'+' in regards to our '+request['location']
    return message
    print message
json.dumps(jeeves({"type":"maintenance", "unit":221, "issue":"Air filter needs replacing"}))
PS: I'm new to coding in general. If there is a better, more productive way for me to ask questions, I'm open to feedback. Thank you in advance.
You have to put the print statement before return, because return ends the function: anything placed after it never runs. You might also want to check out what return actually does here
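Two other details in the posted code may also contribute to the empty output, as far as can be told from the snippet: the first branch checks for 'maintainance' while the call passes 'maintenance', so no branch matches and message stays empty, and the result of json.dumps is never printed. A reduced sketch of the maintenance branch, written in Python 3 syntax (the question uses Python 2) and with the spelling made consistent:

```python
import json

def jeeves(request):
    """Build a reply for a tenant request (maintenance branch only)."""
    message = ''
    if request['type'] == 'maintenance':
        message = ('Thank you tenant at unit ' + str(request['unit'])
                   + ', your request for maintenance to deal with "'
                   + str(request['issue']) + '" has been received')
    # return is the last statement: nothing placed after it would run
    return message

reply = jeeves({"type": "maintenance", "unit": 221,
                "issue": "Air filter needs replacing"})
# The caller prints the returned value; json.dumps alone displays nothing
print(json.dumps(reply))
```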
