Cannot encode object: pymongo.cursor.Cursor object at

I am trying to retrieve an audio file stored in MongoDB, and the above error is thrown.
The code is as follows:
elif json_data != None and 'retriever' in json_data:
    query_param = json_data['retriever']
    data = db.soundData
    x = data.find({'name': query_param})
    y = data.find({'data': x})
    return Response(y, mimetype='audio/mp3')
Under name I have the name of the file, and under data is the audio file itself.
As I am new to pymongo, can somebody point out where the error could be coming from?

First of all, you don't need to store the file itself in MongoDB. Store the filename there and keep the file itself on the file system.
The error appears because both x and y are MongoDB cursors rather than the data you expect. You should be using find_one instead.
find_one(filter=None, *args, **kwargs)
Get a single document from the database. All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single document, or None if no matching document is found.
x = data.find_one({'name': query_param})
find_one returns the matching document itself (a dict, or None if nothing matches), so a second query is not needed. The audio is the document's data field:
y = x['data']
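The lookup described above can be sketched as follows. This is a minimal illustration, not the asker's actual application: FakeCollection is a hypothetical in-memory stand-in for db.soundData so that the example runs without a MongoDB server, and fetch_audio is a helper name invented here.

```python
class FakeCollection:
    """In-memory stand-in for db.soundData (hypothetical, for illustration only)."""

    def __init__(self, documents):
        self._documents = documents

    def find_one(self, query):
        # Return the first document whose fields equal every key in the query.
        for doc in self._documents:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


def fetch_audio(collection, name):
    """Return the audio bytes stored under `name`, or None if there is no match."""
    doc = collection.find_one({'name': name})  # a dict or None, never a cursor
    if doc is None:
        return None
    return bytes(doc['data'])


sounds = FakeCollection([{'name': 'beep.mp3', 'data': b'ID3\x03'}])
audio = fetch_audio(sounds, 'beep.mp3')
missing = fetch_audio(sounds, 'nope.mp3')
```

In the Flask view from the question, the returned bytes could then be handed to Response(audio, mimetype='audio/mp3'), with a 404 returned when the result is None.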

Related

Exception: TypeError(string indices must be integers)

I have written the below python function (a snippet of the full code) to work in AWS Lambda. The purpose of it is to take a GeoJSON from an S3 bucket and parse it accordingly.
Once parsed, it is placed back into JSON format (data) and then should be inserted into the specified database using
    bulk_item['uuid'] = str(uuid.uuid4())
    bulk_item['name'] = feature_name
    bulk_item['type'] = feature_type
    bulk_item['info'] = obj
    bulk_item['created'] = epoch_time
    bulk_item['scope'] = 2
    data = json.dumps(bulk_item)
    print(data)
    self.database.upsert_record(self.organisation, json_doc=data)
except Exception as e:
    print(f'Exception: {e.__class__.__name__}({e})')
The db_access file that the above refers to is another Python script. Its upsert_record function is defined as follows:
def upsert_record(self, organisation, json_doc={}):
My code works perfectly until I try to upsert the record into the database. Once that line is reached, it throws this error:
Traceback (most recent call last):
  File "/var/task/s3_asset_handler.py", line 187, in process_incoming_file
    self.database.upsert_record(self.organisation, json_doc=data)
  File "/opt/python/database_access.py", line 1218, in upsert_record
    new_uuid = json_doc['uuid']
TypeError: string indices must be integers
I can't seem to figure out the issue at all

upsert_record looks up json_doc['uuid'] as if json_doc were a dict, but you are passing it a string.
The line
data = json.dumps(bulk_item)
creates a string representing the object.
Try passing bulk_item on its own.
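The difference can be reproduced in a few lines. This is a small sketch with made-up values; read_uuid is a hypothetical helper that only mimics the json_doc['uuid'] lookup shown in the traceback, not the real upsert_record.

```python
import json

bulk_item = {'uuid': '1234', 'name': 'example'}  # hypothetical values

as_text = json.dumps(bulk_item)  # a str holding JSON text, not a dict


def read_uuid(json_doc):
    """Mimic the lookup that fails inside upsert_record."""
    return json_doc['uuid']


from_dict = read_uuid(bulk_item)  # works: bulk_item is a dict

try:
    read_uuid(as_text)  # a str cannot be indexed with a string key
    string_lookup_failed = False
except TypeError:
    string_lookup_failed = True
```

If only the JSON text is available at the call site, json.loads(as_text) turns it back into a dict before the lookup.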

Why is "ObjectId('5efbe85b4aeb5d21e56fa81f')" not considered a valid ObjectId?

I am using PyMongo and I am trying to loop through an entire collection and display each ObjectId on my Flask web page. However, when I write my method I keep getting the error "ObjectId('5efbe85b4aeb5d21e56fa81f')" is not a valid ObjectId.
The following is the code I am running
def get_class_names(self):
    temp = list()
    print("1")
    for document_ in db.classes.find():
        tempstr = document_.get("_id")
        tempobjectid = ObjectId(tempstr)
        temp.append(repr(tempobjectid))
    print("2")
    classes = list()
    for class_ in temp:
        classes.append(class_, Classes.get_by_id(class_).name)
    return classes
How do I fix this?
Note: get_by_id just takes in an ObjectId and finds it in the database.

The line
tempstr = document_.get("_id")
already retrieves an ObjectId. If you print(type(tempstr)), you'll see that it's an ObjectId. You then wrap it in another ObjectId and call repr on that, which produces the string "ObjectId('5efbe85b4aeb5d21e56fa81f')" instead of an ObjectId. That string is what gets rejected later.
Just do temp.append(tempstr).
BTW, you should rename the variable tempstr to tempId or something more appropriate.
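Why the repr string is rejected can be shown without a database. The check below is a simplified stand-in written for this example, not bson's own validation code: it only encodes the rule that the string form of an ObjectId is 24 hexadecimal characters.

```python
import re

# The string form of an ObjectId is exactly 24 hexadecimal characters.
HEX24 = re.compile(r'[0-9a-fA-F]{24}')


def is_object_id_string(value):
    """Simplified stand-in for the check applied to string input."""
    return isinstance(value, str) and HEX24.fullmatch(value) is not None


hex_form = '5efbe85b4aeb5d21e56fa81f'              # what str() of an ObjectId gives
repr_form = "ObjectId('5efbe85b4aeb5d21e56fa81f')"  # what repr() gives

hex_ok = is_object_id_string(hex_form)
repr_ok = is_object_id_string(repr_form)
```

So if a string is needed for the web page, str(tempstr) is the form that can be converted back, while the repr form cannot.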

Using python class with spark DataFrame to parse URL's

I'm trying to process URLs in a PySpark DataFrame using a class that I've written and a udf. I'm aware of urllib and other URL parsing libraries, but for this case I need to use my own code.
In order to get the tld of a URL, I cross-check it against the iana public suffix list.
Here's a simplification of my code:
class Parser:
    # list of available public suffixes for extracting top level domains
    file = open("public_suffix_list.txt", 'r')
    data = []
    for line in file:
        if line.startswith("//") or line == '\n':
            pass
        else:
            data.append(line.strip('\n'))
    def __init__(self, url):
        self.url = url
        #the code here extracts port,protocol,query etc.
        #I think this bit below is causing the error
        matches = [r for r in self.data if r in self.hostname]
        #extra functionality in my actual class
        i = matches.index(self.string)
        try:
            self.tld = matches[i]
        # logic to find tld if no match
The class works in pure python so for example I can run
import Parser
x = Parser("www.google.com")
x.tld #returns ".com"
However when I try to do
import Parser
from pyspark.sql.functions import udf
parse = udf(lambda x: Parser(x).url)
df = sqlContext.table("tablename").select(parse("column"))
When I call an action I get
File "<stdin>", line 3, in <lambda>
File "<stdin>", line 27, in __init__
TypeError: 'in <string>' requires string as left operand
So my guess is that it's failing to interpret the data as a list of strings?
I've also tried to use
file = sc.textFile("my_file.txt")\
    .filter(lambda x: not x.startswith("//") or != "")\
    .collect()
data = sc.broadcast(file)
to open my file instead, but that causes
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Any ideas?
Thanks in advance
EDIT: Apologies, I didn't have my code to hand so my test code didn't explain very well the problems I was having. The error I initially reported was a result of the test data I was using.
I've updated my question to be more reflective of the challenge I'm facing.

Why do you need a class in this case? The code defining your class is incorrect (self.data is never assigned before it is used in the __init__ method), and the only line that affects the output you want is self.string=string, so you are basically passing the identity function as the udf.
The UnicodeDecodeError is due to an encoding issue in your file; it has nothing to do with your definition of the class.
The second error comes from the line sc.broadcast(file), details of which can be found here: Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion
EDIT 1
I would redefine your class structure as follows. You basically need to create the instance attribute self.data by assigning self.data = data before you can use it. Also, anything that you write in the class body before the __init__ method is executed when the class is defined, irrespective of whether you ever instantiate the class, so moving the file parsing part out will not have any effect.
# list of available public suffixes for extracting top level domains
file = open("public_suffix_list.txt", 'r')
data = []
for line in file:
    if line.startswith("//") or line == '\n':
        pass
    else:
        data.append(line.strip('\n'))
class Parser:
    def __init__(self, url):
        self.url = url
        self.data = data
        #the code here extracts port,protocol,query etc.
        #I think this bit below is causing the error
        matches = [r for r in self.data if r in self.hostname]
        #extra functionality in my actual class
        i = matches.index(self.string)
        try:
            self.tld = matches[i]
        # logic to find tld if no match
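If only the suffix lookup is needed, a plain function may be enough. The sketch below is one possible shape under stated assumptions, not the asker's parser: load_suffixes and longest_suffix are names invented here, the suffix lines are a tiny made-up sample instead of the real list file, and the hostname is assumed to have been extracted already.

```python
def load_suffixes(lines):
    """Keep the suffix entries, skipping blank lines and // comments."""
    suffixes = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("//"):
            suffixes.append(entry)
    return suffixes


def longest_suffix(hostname, suffixes):
    """Return the longest listed suffix that the hostname ends with, or None."""
    matches = [s for s in suffixes
               if hostname == s or hostname.endswith("." + s)]
    return max(matches, key=len) if matches else None


# Made-up sample standing in for the contents of the suffix list file.
sample_lines = ["// comment line\n", "\n", "com\n", "uk\n", "co.uk\n"]
suffixes = load_suffixes(sample_lines)

google_tld = longest_suffix("www.google.com", suffixes)
bbc_tld = longest_suffix("www.bbc.co.uk", suffixes)
unknown_tld = longest_suffix("localhost", suffixes)
```

Because suffixes is an ordinary Python list built on the driver, a function like this could be wrapped with udf (for example udf(lambda h: longest_suffix(h, suffixes))) without referencing the SparkContext inside the function.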

Access JSON file data using preset query - Python

I am reading a JSON file containing nested dictionaries, but I am struggling to use a variable as the lookup key when searching the JSON data.
x = value_cloud = "%s%s%s" % (["L1_METADATA_FILE"],["IMAGE_ATTRIBUTES"],["CLOUD_COVER"])
for meta in filelist(dir):
    with open (meta) as data_file:
        data = json.load(data_file)
    cloud = str(data[x])
The error I get is:
Traceback (most recent call last):
  File "E:\SAMPLE\Sample_Script_AWS\L8_TOA_using_gdal_rasterio.py", line 96, in <module>
    cloud = str(data[x])
KeyError: "['L1_METADATA_FILE']['IMAGE_ATTRIBUTES']['CLOUD_COVER']"
What I actually want is to search the json file for the key in the variable...
The keys do exist in the json file because when I run the following I get the correct output.
cloud = str(data["L1_METADATA_FILE"]["IMAGE_ATTRIBUTES"]["CLOUD_COVER"])
print cloud
My knowledge of Python is sketchy. I am passing the variable through as a string and not as an expression or object, and therefore it gives me that error. What is the correct way to create the variable and look up the keys that I want?
Thanks in advance!

Your key ends up including the brackets in the string, which is where the error comes from. If you put each key in its own variable, like this:
x, y, z = "L1_METADATA_FILE", "IMAGE_ATTRIBUTES" , "CLOUD_COVER"
and then:
cloud = str(data[x][y][z])
it should avoid any errors.
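If the key path should live in a single variable, as the question intended, one option is to store the keys as a tuple and walk them one level at a time. This is a minimal sketch: get_nested is a helper name made up here, and the sample dictionary is invented data with the same shape as the metadata in the question.

```python
from functools import reduce
import operator


def get_nested(data, keys):
    """Follow keys one level at a time: data[k0][k1]...[kn]."""
    return reduce(operator.getitem, keys, data)


# The whole path is kept in one variable, as a tuple of keys.
CLOUD_PATH = ("L1_METADATA_FILE", "IMAGE_ATTRIBUTES", "CLOUD_COVER")

# Invented sample with the same nesting as the file in the question.
sample = {"L1_METADATA_FILE": {"IMAGE_ATTRIBUTES": {"CLOUD_COVER": 12.5}}}

cloud = str(get_nested(sample, CLOUD_PATH))
```

A missing key still raises KeyError, just as the chained data[x][y][z] form would.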

Variable Route Not Working in For Loop

I tried to create multiple routes in one go by using the variables from the database and a for loop.
I tried this
temp = "example"
@app.route("/speaker/<temp>")
def getSpeakerAtr(temp):
    return '''%s''' % temp
It works very well. BUT:
for x in models.Speaker.objects:
    temp = str(x.name)
    @app.route("/speaker/<temp>")
    def getSpeakerAtr(temp):
        return '''%s''' % temp
Doesn't work. The error message:
File "/Users/yang/Documents/CCPC-Website/venv/lib/python2.7/site-packages/flask/app.py", line 1013, in decorator
02:03:04 web.1 | self.add_url_rule(rule, endpoint, f, **options)
The reason I want to use multiple routes is that I need to get the full data of an object by querying from the route. For example, if we type this URL:
//.../speaker/sam
we get the object whose 'name' value is 'sam'. Then I can use all of the values in this object, like bio or something.

You don't need multiple routes. Just one route that validates its value, e.g.:
@app.route('/speaker/<temp>')
def getSpeakerAtr(temp):
    if not any(temp == str(x.name) for x in models.Speaker.objects):
        # do something appropriate (404 or something?)
    # carry on doing something else
Or as to your real intent:
@app.route('/speaker/<name>')
def getSpeakerAtr(name):
    speaker = # do something with models.Speaker.objects to lookup `name`
    if not speaker: # or whatever check is suitable to determine name didn't exist
        # raise a 404, or whatever's suitable
    # we have a speaker object, so use as appropriate
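The lookup inside that single route can be sketched without Flask or a database. Everything here is an assumption made for illustration: SPEAKERS is a made-up dictionary standing in for models.Speaker.objects, and speaker_response is a hypothetical helper that returns a (body, status) pair of the kind a Flask view is allowed to return.

```python
# Made-up data standing in for the Speaker documents in the database.
SPEAKERS = {
    "sam": {"name": "sam", "bio": "Keynote speaker"},
}


def speaker_response(name, speakers):
    """Return a (body, status) pair for the requested speaker name."""
    speaker = speakers.get(name)
    if speaker is None:
        # The name from the URL does not match any speaker.
        return ("Speaker not found", 404)
    # We have a speaker, so its other fields (such as bio) can be used.
    return ("%s: %s" % (speaker["name"], speaker["bio"]), 200)


found = speaker_response("sam", SPEAKERS)
not_found = speaker_response("alex", SPEAKERS)
```

Inside the view, the name captured from the URL would be passed to a function like this, so one route serves every speaker and unknown names get a 404.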
