Convert Tweepy streaming data to Dictionary - Python

I have followed various links and spent hours googling, but could not find a simple way to convert the JSON data received from Tweepy's StreamListener to a Python dictionary so that it can be used with a pandas DataFrame.
What I did was save the received data to a JSON file and then read it back using the json library, but this produces various errors. I've also tried saving the stream data to a list and then converting it to a dictionary, but to no avail.
Here is my code:
class StreamCollector(StreamListener):
    def __init__(self, api=None):
        super(StreamListener, self).__init__()
        self.num_tweets = 0
    def on_data(self, raw_data):
        try:
            with open('java.json', 'a') as f:
                f.write(raw_data)
                self.num_tweets += 1
                if self.num_tweets > 4:
                    return False
                else:
                    return True
        except BaseException as base_ex:
            print(base_ex)
            return False
    def on_error(self, status_code):
        print("Error Status code: --> {}".format(status_code))
        return True
try:
    twitterStream = Stream(auth, StreamCollector())
    twitterStream.filter(track=['#Java'])
    tweetDict = json.loads('java.json')
    print(type(tweetDict))
    print(tweetDict)
except TweepError as e:
    print(e)
The above code produces the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
EDIT: I checked my JSON file, and it appears that instead of a single object it contains multiple objects, which causes an error.
For example:
{"name":"abc","created_at":"abc date"} //No comma
{"name":"xyz","created_at":"xyz date"}
The JSON file does not even have a root object or an array.
How should I correct it?
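One possible approach, sketched here in Python 3 under the assumption that each tweet sits on its own line as in the sample above. Note that json.loads() expects the JSON text itself, so json.loads('java.json') tries to parse the file name as JSON and fails at the first character. The helper name parse_json_lines is hypothetical, and the sample string stands in for the file contents:

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Blank lines carry no JSON object; skip them.
            continue
        records.append(json.loads(line))
    return records


# Stand-in for the contents of java.json (one object per line, no root array).
sample = (
    '{"name":"abc","created_at":"abc date"}\n'
    '\n'
    '{"name":"xyz","created_at":"xyz date"}\n'
)
tweets = parse_json_lines(sample)
print(type(tweets[0]))
print(tweets)
```

With a file on disk, the same function could be given the result of open('java.json').read(). The resulting list of dictionaries can then be passed to pandas.DataFrame(tweets).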

Related

Byte Json cannot be loaded after transforming to string json

The JSON received in message is a bytes object, like so: b'{"_timestamp": 1636472787, "actual": 59.9, "target": 60.0}'
The code is supposed to convert the bytes to a JSON string and load it to access the items, but when I load it I get the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code:
import json
def handle_msg(topic, message):
    m = message.decode("Utf-8")
    print(json.loads(m))
This happens because the message you received was empty rather than the JSON you expected.
Called with the payload from your question, the code works.
The following runs fine for me:
message = b'{"_timestamp": 1636472787, "actual": 59.9, "target": 60.0}'
topic = "what ever"
import json
def handle_msg(topic, message):
    m = message.decode("Utf-8")
    print(json.loads(m))
handle_msg(topic, message)
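A defensive variant, as a sketch only: it reports empty or malformed payloads instead of raising. The function name handle_msg_safely and its convention of returning None on failure are assumptions made for illustration, not part of the original handler.

```python
import json


def handle_msg_safely(topic, message):
    """Decode a bytes payload and parse it as JSON; return None if that fails."""
    text = message.decode("utf-8").strip()
    if not text:
        # An empty payload is what triggers "Expecting value ... (char 0)".
        print("Empty payload on topic {!r}".format(topic))
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        print("Invalid JSON on topic {!r}: {}".format(topic, exc))
        return None


good = handle_msg_safely(
    "demo", b'{"_timestamp": 1636472787, "actual": 59.9, "target": 60.0}'
)
empty = handle_msg_safely("demo", b"")
print(good, empty)
```

Logging the raw payload in the failure branches is often the quickest way to see what the sender actually delivered.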

json.decoder.JSONDecodeError while converting response into json

I am working on a task in which I have to fetch data for 20,000 records from an external API. For this I am using the requests module in Python. Below is my code:
def getdata():
    datanotfound=[]
    """ fetching the values from database to pass in the url """
    values = mongo.db.collection.find()
    for value in values:
        value = pro.get("name")
        """ adding the value parameter into the url and fetching the data """
        r = requests.get('myurl'+ value)
        if r!= None:
            response = r.json()
            info = response.get(val1)
            if info != None:
                val2 = info.get("val2")
                val3 = info.get("val3")
                val4 = info.get("val4")
                val5 = info.get("val5")
                """ saving the response in mongodb """
                mongo.db.collection.insert_one({
                    'name':val1,
                    "address":val2,
                    "length":val3,
                    "width":val4,
                    "data":val5,
                })
            else:
                """ sending the values for which response was None """
                datanotfound.append(value)
    return jsonify({'datanotfound':datanotfound})
This runs fine for a number of records, but after some time I get the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)
How can I get rid of this error?
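The traceback indicates that r.json() was given a body that is not valid JSON, which can happen when the API answers some requests with an error page or an empty body. A minimal Python 3 sketch of a guard follows; the helper name parse_json_body is hypothetical, and treating every non-200 status as a failure is an assumption about the API, not something stated in the question.

```python
import json


def parse_json_body(status_code, text):
    """Return the decoded JSON body, or None if the response is unusable."""
    if status_code != 200:
        # Assumption: only a 200 response carries usable data.
        return None
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None


ok = parse_json_body(200, '{"val1": {"val2": "x"}}')
html = parse_json_body(200, "\n    <html>Too many requests</html>")
failed = parse_json_body(503, "")
print(ok, html, failed)
```

With requests, the two arguments would be r.status_code and r.text. A record for which the helper returns None could be appended to datanotfound instead of aborting the whole loop.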

Python - IBM Watson Speech to Text 'NoneType' object has no attribute 'get_result'

I'm developing a program with IBM Watson Speech to Text and currently using Python 2.7. Here's a stub of some code for development:
class MyRecognizeCallback(RecognizeCallback):
    def __init__(self):
        RecognizeCallback.__init__(self)
    def on_data(self, data):
        pass
    def on_error(self, error):
        pass
    def on_inactivity_timeout(self, error):
        pass
speech_to_text = SpeechToTextV1(username='*goes here*', password='*goes here*')
speech_to_text.set_detailed_response(True)
f = '/home/user/file.wav'
rate, data = wavfile.read(f)
work = data.tolist()
with open(f, 'rb') as audio_file:
    # Get IBM Watson analytics
    currentModel = "en-US_NarrowbandModel" if rate <= 8000 else "en-US_BroadbandModel"
    x = ""
    print(" - " + f)
    try:
        # Callback info
        myRecognizeCallback = MyRecognizeCallback()
        # X represents the response from Watson
        audio_source = AudioSource(audio_file)
        my_result = speech_to_text.recognize_using_websocket(
            audio_source,
            content_type='audio/wav',
            timestamps=True,
            recognize_callback=myRecognizeCallback,
            model=currentModel,
            inactivity_timeout=-1,
            max_alternatives=0)
        x = json.loads(json.dumps(my_result, indent=2), object_hook=lambda d: n
        namedtuple('X', d.keys())(*d.values()))
What I'm expecting to be returned is a JSON object with the results for the file, given the above parameters. What I'm receiving instead is an error that looks like this:
Error received: 'NoneType' object has no attribute 'connected'
That's the entire traceback - no other errors than that. However, when I try to access the JSON object in further code, I get this error:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/watson_developer_cloud/websocket/recognize_listener.py", line 96, in run
chunk = self.audio_source.input.read(ONE_KB)
ValueError: I/O operation on closed file
Did I forget something or put something in the wrong place?
Edit:
My original code had an error in it that I fixed myself. Regardless, I'm still getting the original error. Here's the update:
my_result = speech_to_text.recognize_using_websocket(
    audio_source,
    content_type='audio/wav',
    timestamps=True,
    recognize_callback=myRecognizeCallback,
    model=currentModel,
    inactivity_timeout=None,
    max_alternatives=None).get_result()
x = json.loads(json.dumps(my_result, indent=2), object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
Take a look at object_hook=lambda d: n. In Python, lambda d: n means "a function that takes d, ignores d, and returns n".
I'm guessing n is set to None somewhere else.
If that doesn't work, it may be easier to debug if you break your lambda out into a separate function, perhaps def to_named_tuple(object):.
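A sketch of what that separate function could look like, written for Python 3. The JSON payload below is a hypothetical stand-in and does not reflect the shape of a real Watson response; the approach also assumes every key is a valid Python identifier, since namedtuple rejects other field names.

```python
import json
from collections import namedtuple


def to_named_tuple(mapping):
    """Convert one decoded JSON object (a dict) into a namedtuple."""
    return namedtuple("X", mapping.keys())(*mapping.values())


# Hypothetical payload used only to exercise the hook.
text = '{"transcript": "hello", "meta": {"confidence": 0.9}}'
result = json.loads(text, object_hook=to_named_tuple)
print(result.transcript, result.meta.confidence)
```

Because the hook is a named function, a breakpoint or print statement can be placed inside it to inspect each dictionary as it is converted.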

Python 3 json.loads - json.decoder error

I'm trying to parse JSON, but it doesn't work.
I removed the try and except from my code so you can see the error message.
import sqlite3
import json
import codecs
conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()
cur.execute('SELECT * FROM Locations')
fhand = codecs.open('where.js','w', "utf-8")
fhand.write("myData = [\n")
count = 0
for row in cur :
    data = str(row[1])
    print (data)
    print (type(data))
    #try:
    js = json.loads(data)
    #except: continue
    if not('status' in js and js['status'] == 'OK') : continue
    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    if lat == 0 or lng == 0 : continue
    where = js['results'][0]['formatted_address']
    where = where.replace("'","")
    try :
        print (where, lat, lng)
        count = count + 1
        if count > 1 : fhand.write(",\n")
        output = "["+str(lat)+","+str(lng)+", '"+where+"']"
        fhand.write(output)
    except:
        continue
fhand.write("\n];\n")
cur.close()
fhand.close()
print (count, "records written to where.js")
print ("Open where.html to view the data in a browser")
My problem is that
js = json.loads(data)
can't parse the data for some reason, and I get the following exception:
"raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)"
I thought it was because of the data type, but something weird is happening.
When I ask for type(data) I get str, but when I print data it looks like bytes.
Full output of the code:
Traceback (most recent call last):
File "C:/Users/user/Desktop/Courses Online/Coursera/Using Databases with Python/geodata/geodump.py", line 17, in <module>
js = json.loads(data)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
b'{\n "results" : [\n {\n "address_components" : [\n {\n ...... long json line......
<class 'str'>
I also tried to use decode("utf-8") on data, but I get the following error: 'str' object has no attribute 'decode'
js = json.loads(data.decode('utf8'))
solved the same problem for me.
You are converting a bytes value to a string the wrong way here:
data = str(row[1])
You forced it to be a str() object, but for bytes objects that'll include the b prefix and quotes, because bytes objects don't have a __str__ method, only __repr__ so you get a debug representation.
Decode the row without converting to a string:
data = row[1].decode('utf8')
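The difference between the two conversions can be seen in a short Python 3 sketch. The bytes literal below is a made-up stand-in for a database row, not data from the question:

```python
import json

# Stand-in for row[1] as returned by sqlite3.
raw = b'{"status": "OK"}'

as_repr = str(raw)            # debug representation, includes the b prefix and quotes
as_text = raw.decode("utf8")  # the actual JSON text

print(as_repr)
print(as_text)

try:
    json.loads(as_repr)
    repr_parses = True
except json.JSONDecodeError:
    # The leading "b" is not the start of any JSON value.
    repr_parses = False

parsed = json.loads(as_text)
print(repr_parses, parsed)
```

This also explains why type(data) reports str while the printed value looks like bytes: the string holds the text of the bytes representation.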
You really shouldn't hand-craft JSON / Javascript output in your code either. Just use json.dumps(); if you must use per-row streaming, you can still use json.dump() to create each list entry:
import sqlite3
import json
conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()
cur.execute('SELECT * FROM Locations')
with open('where.js', 'w', encoding="utf-8") as fhand:
    fhand.write("myData = [\n")
    # Iterate over the cursor (the original snippet mistakenly enumerated row)
    for count, row in enumerate(cur):
        try:
            js = json.loads(row[1].decode('utf8'))
        except json.JSONDecodeError:
            print('Could not decode a row: ', row[1])
            continue
        if js.get('status') != 'OK':
            continue
        lat = js["results"][0]["geometry"]["location"]["lat"]
        lng = js["results"][0]["geometry"]["location"]["lng"]
        if not (lat and lng):
            continue
        where = js['results'][0]['formatted_address']
        where = where.replace("'", "")
        print (where, lat, lng)
        if count: fhand.write(",\n")
        json.dump([lat, lng, where], fhand)
    fhand.write("\n];\n")
This uses plain open() (in Python 3, there is never a need to use codecs.open()), uses the file as a context manager, and adds in enumerate() to track if you have the first row processed yet.

Multiple Term search by following multiple users using Streaming API

I am trying to retrieve tweets matching multiple keyword terms while following a specific group of users, using the code below.
I previously posted another question about a ValueError in this code.
I figured that out somehow, but now I am stuck again because of this traceback:
import tweepy
from tweepy.error import TweepError
consumer_key=('ABC'),
consumer_secret=('ABC'),
access_key=('ABC'),
access_secret=('ABC')
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api=tweepy.API(auth)
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        try:
            print "%s\t%s\t%s\t%s" % (status.text,
                                      status.author.screen_name,
                                      status.created_at,
                                      status.source,)
        except Exception, e:
            print error
    #def filter(self, follow=None, track=None, async=False, locations=None):
        #self.parameters = {}
        #self.headers['Content-type'] = "application/x-www-form-urlencoded"
        #if self.running:
            #raise TweepError('Stream object already connected!')
        #self.url = '/%i/statuses/filter.json?delimited=length' % STREAM_VERSION
    def filter(self, follow=None, track=None, async=False, locations=None):
        self.parameters = {}
        self.headers['Content-type'] = "application/x-www-form-urlencoded"
        if self.running:
            raise TweepError('Stream object already connected!')
        self.url = '/%i/statuses/filter.json?delimited=length' % STREAM_VERSION
        if obey:
            self.parameters['follow'] = ' '.join(map(str, obey))
        if track:
            self.parameters['track'] = ' '.join(map(str, track))
        if locations and len(locations) > 0:
            assert len(locations) % 4 == 0
            self.parameters['locations'] = ' '.join('%.2f' % l for l in locations)
        self.body = urllib.urlencode(self.parameters)
        self.parameters['delimited'] = 'length'
        self._start(async)
    def on_error(self, status_code):
        return True
streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=60)
list_users = ['17006157','59145948','157009365','16686144','68044757','33338729']#Some ids
list_terms = ['narendra modi','robotics']#Some terms
streaming_api.filter(follow=[list_users])
streaming_api.filter(track=[list_terms])
I am getting a traceback:
Traceback (most recent call last):
File "C:\Python27\nytimes\26052014\Multiple term search with multiple addreses.py", line 49, in <module>
streaming_api.filter(follow=[list_users])
File "build\bdist.win32\egg\tweepy\streaming.py", line 296, in filter
encoded_follow = [s.encode(encoding) for s in follow]
AttributeError: 'list' object has no attribute 'encode'
Please help me resolve the issue.
You define list_users here
list_users = ['17006157','59145948','157009365','16686144','68044757','33338729']
and then you pass it to streaming_api.filter like this
streaming_api.filter(follow=[list_users])
When streaming_api.filter iterates over the value you pass as follow, it raises
AttributeError: 'list' object has no attribute 'encode'
The reason is that by wrapping list_users in [] you create a list inside a list. streaming_api.filter then iterates over follow, calling .encode on each entry, as we see here
[s.encode(encoding) for s in follow]
But the entry s is a list, while it should be a string.
The solution is simple. Change
streaming_api.filter(follow=[list_users])
to
streaming_api.filter(follow=list_users)
To pass a list to a function, just pass its name; there is no need to enclose it in [].
The same applies to the last line. Change
streaming_api.filter(track=[list_terms])
to
streaming_api.filter(track=list_terms)
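The effect can be reproduced without Tweepy. In the Python 3 sketch below, the hypothetical helper encode_all imitates the comprehension shown in the traceback; it is not Tweepy's actual implementation.

```python
list_users = ['17006157', '59145948']


def encode_all(follow, encoding='utf-8'):
    """Imitate the comprehension from the traceback: encode every entry."""
    return [s.encode(encoding) for s in follow]


# Flat list: every entry is a str, so .encode works.
flat = encode_all(list_users)

# Nested list: the single entry is itself a list, which has no .encode.
try:
    encode_all([list_users])
    nested_error = None
except AttributeError as exc:
    nested_error = str(exc)

print(flat)
print(nested_error)
```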
