Trying to parse JSON data from a URL with Python

I am trying to make an IP details grabber in Python using a JSON API, but I am having trouble parsing the JSON data.
I've tried loading and dumping the data, but none of that works, so I have no idea how to parse it.
#Importing
import requests
import json
import os
#Variables
cls = os.system('cls')
#Startup
cls #Clearing the console on startup
ipToSearch = input("Please enter the IP you wish to search: ")
saveDetails = input("Would you like to save the IP's details to a file? Y/N: ")
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
ip_Data = json.loads(ip_JSON)
print(ip_Data)
I am trying to parse the IP's information, but currently I get this error:
Traceback (most recent call last):
File "main.py", line 16, in <module>
ip_Data = json.loads(ip_JSON)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

The traceback happens because you already parsed the response on the previous line with .json(), and then you try to parse it again:
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
ip_Data = json.loads(ip_JSON)
Try this instead:
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
print(ip_JSON)
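Once .json() has returned a dict, you can index into it directly; a minimal sketch (field names such as "country" and "city" are the ones ip-api.com commonly returns, so treat them as assumptions):
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
# ip_JSON is already a dict; no json.loads needed
print(ip_JSON.get("country"))  # assumed field name
print(ip_JSON.get("city"))     # assumed field name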

If you want the data back as a JSON-formatted string, try json.dumps, like this:
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
ip_Data = json.dumps(ip_JSON)
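Since the question also asks about saving the details, json.dumps is useful for writing the dict out as text; a minimal sketch, where the file name is just an example:
if saveDetails.upper() == "Y":
    with open(ipToSearch + ".json", "w") as f:  # example file name
        f.write(json.dumps(ip_JSON, indent=4))  # dict -> formatted JSON string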

Related

Exception: TypeError(string indices must be integers)

I have written the Python function below (a snippet of the full code) to work in AWS Lambda. Its purpose is to take a GeoJSON file from an S3 bucket and parse it accordingly.
Once parsed, it is placed back into JSON format (data) and then should be inserted into the specified database using:
try:
    bulk_item['uuid'] = str(uuid.uuid4())
    bulk_item['name'] = feature_name
    bulk_item['type'] = feature_type
    bulk_item['info'] = obj
    bulk_item['created'] = epoch_time
    bulk_item['scope'] = 2

    data = json.dumps(bulk_item)
    print(data)

    self.database.upsert_record(self.organisation, json_doc=data)
except Exception as e:
    print(f'Exception: {e.__class__.__name__}({e})')
The db_access file that the above relates to is another Python script. The upsert_record function is defined as below:
def upsert_record(self, organisation,
                  json_doc={}):
My code works perfectly until I try to upsert the record into the database. Once that line is reached, it throws this error:
Traceback (most recent call last):
File "/var/task/s3_asset_handler.py", line 187, in process_incoming_file
self.database.upsert_record(self.organisation, json_doc=data)
File "/opt/python/database_access.py", line 1218, in upsert_record
new_uuid = json_doc['uuid']
TypeError: string indices must be integers
I can't seem to figure out the issue at all.
You are trying to look up a key in what should be a JSON object (a dict), but you are passing a string.
The line
data = json.dumps(bulk_item)
creates a string representation of the object, not a dict.
Try passing bulk_item on its own.
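A minimal sketch of that fix, assuming upsert_record expects a dict-like json_doc rather than a serialised string:
# keep the dict instead of serialising it with json.dumps
self.database.upsert_record(self.organisation, json_doc=bulk_item)
# inside upsert_record, json_doc['uuid'] works because json_doc is now a dict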

Creating a program to store all the tweets from a page in a text file for analysis, but getting TypeErrors

I started off by including Tweepy and managed to get a program that produces correct output with a search parameter. However, while trying to create a program that saves a person's timeline for data analysis, I came across a TypeError: must be str, not ResultSet.
import tweepy
#API keys access
auth = tweepy.OAuthHandler("", "")
auth.set_access_token("", "")
client = tweepy.API(auth)
#Opening a file with the name of the user wanted
screen = input("Enter the screen name: ")
filename = (screen+".txt")
file = open(filename, "w")
#Getting the user's timeline
user = client.get_user(screen_name=screen)
timeline = user.timeline()
#Writing new found data to the file.
file.write(timeline)
file.close()
This code keeps on spitting out the error:
Traceback (most recent call last):
File "GetTimeLine.py", line 20, in <module>
file.write(timeline)
TypeError: must be str, not ResultSet
However, for that line:
file.write(timeline)
I tried converting it to a str by using
file.write(str(timeline))
which throws an entirely different error.
Traceback (most recent call last):
File "GetTimeLine.py", line 20, in <module>
file.write(str(timeline))
File "C:\Program Files (x86)\Python\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 4122-4123: character maps to <undefined>
To try to fix this I tried adding .encode("utf-8"), but with no luck.
Any help greatly appreciated.
Try this; I searched for how to convert a ResultSet to a string and found this:
u'\n'.join(map(unicode, result))
Here "result" is your "timeline" variable; substitute it and you will get the data in string format. Simply doing str(timeline) will not encode it, which is why you are getting another error.
Just try it :) Cheers
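The UnicodeEncodeError in the traceback comes from the file being opened with Windows' default cp1252 encoding. A minimal sketch of an alternative, assuming Tweepy's Status objects expose a .text attribute:
# open the file with an explicit encoding so non-ASCII characters can be written
file = open(filename, "w", encoding="utf-8")
# a ResultSet behaves like a list of Status objects; join their text fields
file.write("\n".join(status.text for status in timeline))
file.close()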

Getting KeyError when parsing JSON containing three layers of keys, using Python

I'm building a Python program to parse some calls to a social media API into CSV and I'm running into an issue with a key that has two keys above it in the hierarchy. I get this error when I run the code with PyDev in Eclipse.
Traceback (most recent call last):
line 413, in <module>
main()
line 390, in main
postAgeDemos(monitorID)
line 171, in postAgeDemos
age0To17 = str(i["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
KeyError: 'ZERO_TO_SEVENTEEN'
Here's the section of the code I'm using for it. I have a few other functions built already that work with two layers of keys.
import urllib.request
import json

def postAgeDemos(monitorID):
    print("Enter the date you'd like the data to start on")
    startDate = input('The date must be in the format YYYY-MM-DD. ')
    print("Enter the date you'd like the data to end on")
    endDate = input('The date must be in the format YYYY-MM-DD. ')
    dates = "&start="+startDate+"&end="+endDate
    urlStart = getURL()
    authToken = getAuthToken()
    endpoint = "/monitor/demographics/age?id="
    urlData = urlStart+endpoint+monitorID+authToken+dates
    webURL = urllib.request.urlopen(urlData)
    fPath = getFilePath()+"AgeDemographics"+startDate+"&"+endDate+".csv"
    print("Connecting...")
    if (webURL.getcode() == 200):
        print("Connected to "+urlData)
        print("This query returns information in a CSV file.")
        csvFile = open(fPath, "w+")
        csvFile.write("postDate,totalPosts,totalPostsWithIdentifiableAge,0-17,18-24,25-34,35+\n")
        data = webURL.read().decode('utf8')
        theJSON = json.loads(data)
        for i in theJSON["ageCounts"]:
            postDate = i["startDate"]
            totalDocs = str(i["numberOfDocuments"])
            totalAged = str(i["ageCount"]["totalAgeCount"])
            age0To17 = str(i["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
            age18To24 = str(i["ageCount"]["sortedAgeCounts"]["EIGHTEEN_TO_TWENTYFOUR"])
            age25To34 = str(i["ageCount"]["sortedAgeCounts"]["TWENTYFIVE_TO_THIRTYFOUR"])
            age35Over = str(i["ageCount"]["sortedAgeCounts"]["THIRTYFIVE_AND_OVER"])
            csvFile.write(postDate+","+totalDocs+","+totalAged+","+age0To17+","+age18To24+","+age25To34+","+age35Over+"\n")
        print("File printed to "+fPath)
        csvFile.close()
    else:
        print("Server Error, No Data" + str(webURL.getcode()))
Here's a sample of the JSON I'm trying to parse.
{"ageCounts":[{"startDate":"2016-01-01T00:00:00","endDate":"2016-01-02T00:00:00","numberOfDocuments":520813,"ageCount":{"sortedAgeCounts":{"ZERO_TO_SEVENTEEN":3245,"EIGHTEEN_TO_TWENTYFOUR":4289,"TWENTYFIVE_TO_THIRTYFOUR":2318,"THIRTYFIVE_AND_OVER":70249},"totalAgeCount":80101}},{"startDate":"2016-01-02T00:00:00","endDate":"2016-01-03T00:00:00","numberOfDocuments":633709,"ageCount":{"sortedAgeCounts":{"ZERO_TO_SEVENTEEN":3560,"EIGHTEEN_TO_TWENTYFOUR":1702,"TWENTYFIVE_TO_THIRTYFOUR":2786,"THIRTYFIVE_AND_OVER":119657},"totalAgeCount":127705}}],"status":"success"}
Here it is again with line breaks and indentation so it's a little easier to read.
{
  "ageCounts": [
    {"startDate": "2016-01-01T00:00:00", "endDate": "2016-01-02T00:00:00", "numberOfDocuments": 520813,
     "ageCount": {"sortedAgeCounts": {"ZERO_TO_SEVENTEEN": 3245, "EIGHTEEN_TO_TWENTYFOUR": 4289,
                                      "TWENTYFIVE_TO_THIRTYFOUR": 2318, "THIRTYFIVE_AND_OVER": 70249},
                  "totalAgeCount": 80101}},
    {"startDate": "2016-01-02T00:00:00", "endDate": "2016-01-03T00:00:00", "numberOfDocuments": 633709,
     "ageCount": {"sortedAgeCounts": {"ZERO_TO_SEVENTEEN": 3560, "EIGHTEEN_TO_TWENTYFOUR": 1702,
                                      "TWENTYFIVE_TO_THIRTYFOUR": 2786, "THIRTYFIVE_AND_OVER": 119657},
                  "totalAgeCount": 127705}}
  ],
  "status": "success"
}
I've tried removing the ["sortedAgeCounts"] from the middle of
age0To17 = str(i["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"])
but I still get the same error. I've removed the 0-17 section to test the other age ranges and I get the same error for them as well. I also tried removing all the underscores from the JSON and then using keys without the underscores.
I've also tried moving the str() conversion from the assignment to the point where the output is printed, but the error persists.
Any ideas? Is this section not actually a JSON key, is the all caps a problem, or am I just doing something dumb? Any other code improvements are welcome as well, but I'm stuck on this one.
Let me know if you need to see anything else. Thanks in advance for your help.
Edited (this works):
JSON = json.loads(s)
for i in JSON:
    print(str(JSON[i][0]["ageCount"]["sortedAgeCounts"]["ZERO_TO_SEVENTEEN"]))
s is a string which contains your JSON.
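With the sample JSON above, the same nested keys can also be reached by iterating over theJSON["ageCounts"] as in the original question; a defensive sketch using .get(), where the fallback value 0 is just an assumption, makes a missing key easier to spot than a bare KeyError:
for i in theJSON["ageCounts"]:
    sorted_counts = i.get("ageCount", {}).get("sortedAgeCounts", {})
    age0To17 = str(sorted_counts.get("ZERO_TO_SEVENTEEN", 0))  # 0 is an assumed default
    print(age0To17)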

JSON parsing issue in Python

When I'm trying to parse a JSON dump, I get this attribute error
Traceback (most recent call last):
File "Security_Header_Collector.py", line 120, in <module>
process(sys.argv[-1])
File "Security_Header_Collector.py", line 67, in process
server_details = json.load(header_final)
File "/usr/lib/python2.7/json/__init__.py", line 274, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
Script:
finalJson[App[0]] = headerJson
header_final = json.dumps(finalJson, indent=4)
#print header_final
#json_data=open(header_final)
server_details = json.load(header_final)
with open("Out.txt", 'wb') as f:
    for appid, headers in server_details.iteritems():
        htypes = [h for h in headers if h in (
            'content-security-policy', 'x-frame-options',
            'strict-transport-security', 'x-content-type-options',
            'x-xss-protection')]
        headers = '{},{}'.format(appid, ','.join(htypes))
        f.write(headers + '\n')
f.close()
json.dumps returns a JSON-formatted string, but json.load expects a file-like object, not a string.
Solution: use json.loads instead of json.load in your code.
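For illustration, a minimal sketch of the difference (the file name is just an example):
import json

with open("example.json") as fp:   # hypothetical file
    from_file = json.load(fp)      # json.load reads from a file-like object

from_string = json.loads('{"status": "ok"}')  # json.loads parses a str (or bytes)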
Your code
header_final = json.dumps(finalJson, indent=4)
will give you a string, so you have to use json.loads to convert the string back to a Python object.
json.load is used for files / file-like objects.
json.loads is used for strings (or bytes).
You might also think about building the whole JSON as a single heredoc-style string at once and applying escaping to it afterwards; that way it becomes easier to validate the JSON format.

Read BSON file in Python?

I want to read a BSON format Mongo dump in Python and process the data. I am using the Python bson package (which I'd prefer to use rather than taking on a pymongo dependency), but its documentation doesn't explain how to read from a file.
This is what I'm trying:
bson_file = open('statistics.bson', 'rb')
b = bson.loads(bson_file)
print b[0]
But I get:
Traceback (most recent call last):
File "test.py", line 11, in <module>
b = bson.loads(bson_file)
File "/Library/Python/2.7/site-packages/bson/__init__.py", line 75, in loads
return decode_document(data, 0)[1]
File "/Library/Python/2.7/site-packages/bson/codec.py", line 235, in decode_document
length = struct.unpack("<i", data[base:base + 4])[0]
TypeError: 'file' object has no attribute '__getitem__'
What am I doing wrong?
I found this worked for me with a mongodb 2.4 BSON file and PyMongo's 'bson' module:
import bson
with open('survey.bson', 'rb') as f:
    data = bson.decode_all(f.read())
That returned a list of dictionaries matching the JSON documents stored in that mongo collection.
The f.read() data from a BSON file looks like this:
>>> rawdata[:100]
'\x04\x01\x00\x00\x12_id\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02_type\x00\x07\x00\x00\x00simple\x00\tchanged\x00\xd0\xbb\xb2\x9eI\x01\x00\x00\tcreated\x00\xd0L\xdcfI\x01\x00\x00\x02description\x00\x14\x00\x00\x00testing the bu'
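If the dump is large, PyMongo's bson module also provides decode_file_iter, which streams one document at a time instead of reading the whole file into memory; a sketch, again assuming PyMongo's bson rather than the standalone package:
import bson  # PyMongo's bson module

with open('survey.bson', 'rb') as f:
    for doc in bson.decode_file_iter(f):  # yields one dict per BSON document
        print(doc)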
The documentation states:
> help(bson.loads)
Given a BSON string, outputs a dict.
You need to pass a string. For example:
> b = bson.loads(bson_file.read())
loads expects a string (that's what the 's' stands for), not a file. Try reading from the file, and passing the result to loads.
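Putting that together, a minimal sketch with the standalone bson package (note that loads decodes a single BSON document, so a dump containing many documents would need further handling):
import bson

with open('statistics.bson', 'rb') as f:
    b = bson.loads(f.read())  # decode the file contents into a dict
print(b)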
