Read BSON file in Python? - python

I want to read a BSON format Mongo dump in Python and process the data. I am using the Python bson package (which I'd prefer to use rather than have a pymongo dependency), but it doesn't explain how to read from a file.
This is what I'm trying:
bson_file = open('statistics.bson', 'rb')
b = bson.loads(bson_file)
print b[0]
But I get:
Traceback (most recent call last):
File "test.py", line 11, in <module>
b = bson.loads(bson_file)
File "/Library/Python/2.7/site-packages/bson/__init__.py", line 75, in loads
return decode_document(data, 0)[1]
File "/Library/Python/2.7/site-packages/bson/codec.py", line 235, in decode_document
length = struct.unpack("<i", data[base:base + 4])[0]
TypeError: 'file' object has no attribute '__getitem__'
What am I doing wrong?

I found this worked for me with a MongoDB 2.4 BSON file and PyMongo's 'bson' module:
import bson
with open('survey.bson','rb') as f:
    data = bson.decode_all(f.read())
That returned a list of dictionaries matching the JSON documents stored in that Mongo collection.
The raw f.read() data from a BSON file looks like this:
>>> rawdata[:100]
'\x04\x01\x00\x00\x12_id\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02_type\x00\x07\x00\x00\x00simple\x00\tchanged\x00\xd0\xbb\xb2\x9eI\x01\x00\x00\tcreated\x00\xd0L\xdcfI\x01\x00\x00\x02description\x00\x14\x00\x00\x00testing the bu'

The documentation states:
> help(bson.loads)
Given a BSON string, outputs a dict.
You need to pass a string. For example:
> b = bson.loads(bson_file.read())

loads expects a string (that's what the 's' stands for), not a file. Try reading from the file and passing the result to loads.
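A single loads call is not enough for a whole dump, because a mongodump .bson file is a plain sequence of BSON documents, each starting with a little-endian int32 that holds the document's total size (the raw data above begins with \x04\x01\x00\x00, i.e. 260 bytes). The sketch below uses only the standard library to split a dump into one bytes chunk per document. The helper name iter_bson_documents is hypothetical, and feeding each chunk to the standalone bson package's loads is an assumption based on its documented "Given a BSON string, outputs a dict" behaviour.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_bson_documents(fp: BinaryIO) -> Iterator[bytes]:
    """Yield the raw bytes of each BSON document in a dump file.

    Every BSON document begins with a little-endian int32 giving its total
    size in bytes, counting the 4-byte prefix itself and the trailing 0x00.
    """
    while True:
        header = fp.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        (length,) = struct.unpack("<i", header)
        if length < 5:  # the smallest valid document (an empty one) is 5 bytes
            raise ValueError(f"invalid BSON document length: {length}")
        body = fp.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated BSON document")
        yield header + body


# The dump shown above starts with b"\x04\x01\x00\x00": 0x104 = 260 bytes.
print(struct.unpack("<i", b"\x04\x01\x00\x00")[0])  # 260

# Two hand-built documents: {} and {"a": 1} (an int32 element, type 0x10).
empty_doc = b"\x05\x00\x00\x00\x00"
int_doc = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
chunks = list(iter_bson_documents(io.BytesIO(empty_doc + int_doc)))
print(len(chunks))  # 2

# Hypothetical usage with the standalone bson package:
# import bson
# with open('statistics.bson', 'rb') as f:
#     docs = [bson.loads(raw) for raw in iter_bson_documents(f)]
```

With PyMongo's bson module, bson.decode_all (used above) or bson.decode_file_iter already handle this splitting for you.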

Related

Exception: TypeError(string indices must be integers)

I have written the Python function below (a snippet of the full code) to work in AWS Lambda. Its purpose is to take a GeoJSON file from an S3 bucket and parse it accordingly.
Once parsed, it is placed back into JSON format (data) and then should be inserted into the specified database using
try:
    bulk_item['uuid'] = str(uuid.uuid4())
    bulk_item['name'] = feature_name
    bulk_item['type'] = feature_type
    bulk_item['info'] = obj
    bulk_item['created'] = epoch_time
    bulk_item['scope'] = 2
    data = json.dumps(bulk_item)
    print(data)
    self.database.upsert_record(self.organisation, json_doc=data)
except Exception as e:
    print(f'Exception: {e.__class__.__name__}({e})')
The db_access file that the code above calls is a separate Python script. The upsert_record function is defined as follows:
def upsert_record(self, organisation,
json_doc={}):
My code works perfectly until I try to upsert the data into the database. When execution reaches that line, it throws this error:
Traceback (most recent call last):
File "/var/task/s3_asset_handler.py", line 187, in process_incoming_file
self.database.upsert_record(self.organisation, json_doc=data)
File "/opt/python/database_access.py", line 1218, in upsert_record
new_uuid = json_doc['uuid']
TypeError: string indices must be integers
I can't seem to figure out the issue at all.
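upsert_record indexes json_doc like a dictionary (json_doc['uuid']), but you are passing it a string.
The line
data = json.dumps(bulk_item)
creates a string representing the object, and indexing a string with a key such as 'uuid' raises exactly this TypeError.
Try passing bulk_item, the dictionary itself, instead:
self.database.upsert_record(self.organisation, json_doc=bulk_item)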
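A minimal reproduction of the problem and the fix follows, assuming upsert_record only needs to read keys from json_doc. fake_upsert_record is a hypothetical stand-in, not the real function from database_access.py.

```python
import json
import uuid


def fake_upsert_record(json_doc):
    # Hypothetical stand-in: like the real upsert_record, it indexes json_doc by key.
    return json_doc["uuid"]


bulk_item = {"uuid": str(uuid.uuid4()), "scope": 2}
data = json.dumps(bulk_item)  # a str, e.g. '{"uuid": "...", "scope": 2}'

try:
    fake_upsert_record(data)  # indexing a str with "uuid" raises TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")

# Passing the dict itself works.
print(fake_upsert_record(bulk_item) == bulk_item["uuid"])  # True
```

If the database layer genuinely needs a JSON string, the other option is to parse it back with json.loads(json_doc) at the top of upsert_record before reading any keys.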

Trying to parse JSON data from a url with python

I am trying to make an IP details grabber in Python using a JSON API, but I am having trouble parsing the JSON data.
I've tried loading and dumping the data, but neither works, so I have no idea how to parse it.
#Importing
import requests
import json
import os
#Variables
cls = os.system('cls')
#Startup
cls #Clearing the console on startup
ipToSearch = input("Please enter the IP you wish to search: ")
saveDetails = input("Would you like to save the IP's details to a file? Y/N: ")
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
ip_Data = json.loads(ip_JSON)
print(ip_Data)
I am trying to parse the IP's information, but I currently get this error:
Traceback (most recent call last):
File "main.py", line 16, in <module>
ip_Data = json.loads(ip_JSON)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict
The error occurs because .json() on the previous line has already parsed the response into a Python dict, and you then try to parse it again with json.loads, which only accepts str, bytes or bytearray:
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
ip_Data = json.loads(ip_JSON)
Try
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
print(ip_JSON)
If you need the data back as a JSON string (for example, to save it to a file), use json.dumps, like this:
ip_JSON = requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
ip_Data = json.dumps(ip_JSON)
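Because .json() already returns a dict, you can read fields from it directly, and json.dump (no "s") writes it straight to a file for the "save details" prompt. The field names query, country, city and isp are assumptions about ip-api.com's response, and the sample dict is invented (203.0.113.7 is a documentation-only address) so that the sketch runs without a network call.

```python
import json
import os
import tempfile


def summarize_ip(ip_data: dict) -> str:
    # Field names are assumptions about ip-api.com's response; missing ones are skipped.
    return ", ".join(
        f"{key}: {ip_data[key]}"
        for key in ("query", "country", "city", "isp")
        if key in ip_data
    )


def save_ip_details(ip_data: dict, path: str) -> None:
    # json.dump serializes the dict directly into the open file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ip_data, f, indent=4)


# Hypothetical sample standing in for:
# requests.get(url="http://ip-api.com/json/" + ipToSearch).json()
sample = {
    "status": "success",
    "query": "203.0.113.7",
    "country": "Exampleland",
    "city": "Sample City",
}
print(summarize_ip(sample))  # query: 203.0.113.7, country: Exampleland, city: Sample City

out_path = os.path.join(tempfile.gettempdir(), "ip_details.json")
save_ip_details(sample, out_path)
```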

Access JSON file data using preset query - Python

I am reading a JSON file of nested dictionaries and values, but I am struggling to use a variable as the lookup key when searching the JSON data.
x = value_cloud = "%s%s%s" % (["L1_METADATA_FILE"],["IMAGE_ATTRIBUTES"],["CLOUD_COVER"])
for meta in filelist(dir):
    with open(meta) as data_file:
        data = json.load(data_file)
        cloud = str(data[x])
The error I get is:
Traceback (most recent call last):
File "E:\SAMPLE\Sample_Script_AWS\L8_TOA_using_gdal_rasterio.py", line 96, in <module>
cloud = str(data[x])
KeyError: "['L1_METADATA_FILE']['IMAGE_ATTRIBUTES']['CLOUD_COVER']"
What I actually want is to look up, in the JSON data, the keys held in the variable.
The keys do exist in the JSON file, because running the following gives the correct output:
cloud = str(data["L1_METADATA_FILE"]["IMAGE_ATTRIBUTES"]["CLOUD_COVER"])
print cloud
My knowledge of Python is sketchy. I am passing the variable through as a string rather than as an expression or object, and that is why I get the error. What is the correct way to create the variable and look up the keys I want?
Thanks in advance!
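Your key ends up including the brackets in the string, which is where the error comes from: data[x] looks for a single top-level key literally named "['L1_METADATA_FILE']['IMAGE_ATTRIBUTES']['CLOUD_COVER']". If you keep each key in its own variable, like this: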
x, y, z = "L1_METADATA_FILE", "IMAGE_ATTRIBUTES" , "CLOUD_COVER"
and then:
cloud = str(data[x][y][z])
it should avoid any errors.
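If the lookup path has to live in a single variable, as in the question, one option is to store the keys as a tuple and walk the nested dictionaries with functools.reduce. get_path is a hypothetical helper name, and the sample data is invented to mirror the question's structure.

```python
import operator
from functools import reduce


def get_path(data, keys):
    # Apply data[k1][k2]...[kn] one key at a time; raises KeyError on a missing key.
    return reduce(operator.getitem, keys, data)


CLOUD_COVER_PATH = ("L1_METADATA_FILE", "IMAGE_ATTRIBUTES", "CLOUD_COVER")

# Hypothetical metadata shaped like the question's JSON file.
sample = {"L1_METADATA_FILE": {"IMAGE_ATTRIBUTES": {"CLOUD_COVER": 12.5}}}
cloud = str(get_path(sample, CLOUD_COVER_PATH))
print(cloud)  # 12.5
```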

JSON Parsing Issue in python

When I try to parse a JSON dump, I get this AttributeError:
Traceback (most recent call last):
File "Security_Header_Collector.py", line 120, in <module>
process(sys.argv[-1])
File "Security_Header_Collector.py", line 67, in process
server_details = json.load(header_final)
File "/usr/lib/python2.7/json/__init__.py", line 274, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
Script:
finalJson[App[0]] = headerJson
header_final=json.dumps(finalJson,indent=4)
#print header_final
#json_data=open(header_final)
server_details = json.load(header_final)
with open("Out.txt",'wb') as f :
    for appid, headers in server_details.iteritems():
        htypes = [h for h in headers if h in (
            'content-security-policy', 'x-frame-options',
            'strict-transport-security', 'x-content-type-options',
            'x-xss-protection')]
        headers='{},{}'.format(appid, ','.join(htypes))
        f.write(headers+'\n')
f.close()
json.dumps returns a JSON-formatted string, but json.load expects a file-like object (anything with a .read() method), not a string.
Solution: use json.loads instead of json.load in your code.
Your code
header_final=json.dumps(finalJson,indent=4)
gives you a string,
so you have to use json.loads to convert the string back into Python objects.
json.load is used for files and other file-like objects.
json.loads is used for strings (and, since Python 3.6, bytes or bytearray).
You might also consider building the whole JSON at once as a heredoc-style (triple-quoted) string and applying escaping afterwards; that makes it easier to validate the JSON format.
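The difference between the two calls, plus the question's header filter ported to Python 3, is sketched below with invented data. For string keys and JSON-compatible values, the dumps-then-loads round trip returns an equal dict, so when finalJson is already a dict you can iterate over it directly. In Python 3, use .items() instead of .iteritems(), and open the output file in text mode ("w") rather than "wb", because f.write is given a str.

```python
import io
import json

SECURITY_HEADERS = (
    "content-security-policy", "x-frame-options",
    "strict-transport-security", "x-content-type-options",
    "x-xss-protection",
)

# Hypothetical data standing in for finalJson.
final_json = {"app1": {"x-frame-options": "DENY", "server": "nginx"}}
header_final = json.dumps(final_json, indent=4)  # this is a str

from_string = json.loads(header_final)            # loads: parse a str
from_file = json.load(io.StringIO(header_final))  # load: parse a file-like object
print(from_string == from_file == final_json)     # True

lines = []
for appid, headers in from_string.items():  # .items() replaces Python 2's .iteritems()
    htypes = [h for h in headers if h in SECURITY_HEADERS]
    lines.append("{},{}".format(appid, ",".join(htypes)))
print(lines)  # ['app1,x-frame-options']
```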

Python / Dictionary / List / Mongo insert issues - beginner

Sorry, I'm still trying to understand and get used to dictionary and list objects.
I'm calling eBay's API through their ebaysdk and want to store the returned items as documents in a Mongo collection. Simple.
Here's a sample of the response that will be returned:
<timestamp>2009-09-04T00:47:12.456Z</timestamp>
<searchResult count="2">
<item>
<itemId>230371938681</itemId>
<title>Harry Potter and the Order of the Phoenix HD-DVD</title>
<globalId>EBAY-US</globalId>
<primaryCategory>
<categoryId>617</categoryId>
<categoryName>DVD, HD DVD & Blu-ray</categoryName>
</primaryCategory>
I've tried 500 iterations of this code; stripped down to the basics, here's what I have:
from ebaysdk import finding
from pymongo import MongoClient
api = finding(appid="billy-40d0a7e49d87")
api.execute('findItemsByKeywords', {'keywords': 'potter'})
listings = api.response_dict()
client = MongoClient('mongodb://user:pass#billy.mongohq.com:10099/ebaystuff')
db = client['ebaycollection']
ebay_collection = db.ebaysearch
for key in listings:
    print key
    ebay_collection.insert(key)
I get this error:
Traceback (most recent call last):
File "ebay_search.py", line 34, in <module>
ebay_collection.insert(key)
File "/Library/Python/2.7/site-packages/pymongo/collection.py", line 408, in insert
self.uuid_subtype, client)
File "/Library/Python/2.7/site-packages/pymongo/collection.py", line 378, in gen
doc['_id'] = ObjectId()
TypeError: 'str' object does not support item assignment
Simple stuff. All I want to do is add each item as a document.
An immutable type like a string cannot be used as a document because it doesn't allow adding additional fields, like the _id field Mongo requires. You can instead wrap the string in a dictionary to serve as a wrapper document:
key_doc = {'key': key}
ebay_collection.insert(key_doc)
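The reason insert received a str is that looping over a dict yields its keys. If the goal is one Mongo document per eBay item, one approach, assuming response_dict() mirrors the XML sample with items under searchResult and then item, is to pull the item dicts out first. extract_items and the sample listings are hypothetical, and the structure ebaysdk actually returns may differ. PyMongo 3.0 added insert_many, and PyMongo 4.0 removed insert.

```python
from typing import Any, Dict, List


def extract_items(listings: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Assumption: items live under listings["searchResult"]["item"].
    items = listings.get("searchResult", {}).get("item", [])
    if isinstance(items, dict):  # a single hit might come back as one dict, not a list
        items = [items]
    # Copy each item, since PyMongo adds an _id field to the dicts it inserts.
    return [dict(item) for item in items]


# Hypothetical parsed response shaped like the XML sample.
listings = {
    "timestamp": "2009-09-04T00:47:12.456Z",
    "searchResult": {
        "count": "2",
        "item": [
            {"itemId": "230371938681",
             "title": "Harry Potter and the Order of the Phoenix HD-DVD"},
            {"itemId": "000000000002", "title": "Hypothetical second item"},
        ],
    },
}

print([type(key).__name__ for key in listings])  # ['str', 'str']: keys, not items
docs = extract_items(listings)
print(len(docs))  # 2

# With PyMongo 3.0+ (insert() was removed in 4.0):
# ebay_collection.insert_many(docs)
```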
