So far in my code below I have managed to store my data in MongoDB.
Now I want to be able to retrieve the data I have stored.
As you can see I have been trying, but I keep getting an error.
With BSON, do I have to decode the data first to retrieve it from MongoDB?
Any help would be greatly appreciated!
(Apologies for the messy code, I am just practicing through trial and error.)
import json
from json import JSONEncoder
import pymongo
from pymongo import MongoClient
from bson.binary import Binary
import pickle
#Do this for each
client = MongoClient("localhost", 27017)
db = client['datacampdb']
coll = db.personpractice4_collection #creating a collection in the database
#my collection on the database is called personpractice4_collection
class Person:
def __init__(self, norwegian, dame, brit, german, sweed):
self.__norwegian = norwegian
self.__dame = dame
self.__brit = brit
self.__german = german #private variable
self.__sweed = sweed
# create getters and setters later to make OOP
personone = Person("norwegian", "dame", "brit", "german","sweed")
class PersonpracticeEncoder(JSONEncoder):
def default(self, o):
return o.__dict__
#Encode Person object into JSON
personpracticeJson = json.dumps(personone, indent=4, cls=PersonpracticeEncoder)
practicedata = pickle.dumps(personpracticeJson)
coll.insert_one({'bin-data': Binary(practicedata)})
#print(personpracticeJson)
#print(db.list_collection_names()) #get the names of my collections in the DB
#retrieving data from mongodb
#Retrieving a Single Document with find_one()
print(({'bin-data': Binary(practicedata)}).find_one()) #not working
The find_one method should be called on a collection.
{'bin-data': Binary(practicedata)} is a query to find a document:
coll.find_one({'bin-data': Binary(practicedata)})
Which means: find a document in the collection coll where bin-data is equal to Binary(practicedata).
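To get the original object back you then unpickle the stored field. A minimal sketch, assuming the insert above already ran and the field is still named 'bin-data':
doc = coll.find_one({'bin-data': Binary(practicedata)})
if doc is not None:
    stored = doc['bin-data']              # a bson Binary, which is a bytes subclass
    restored_json = pickle.loads(stored)  # back to the JSON string produced by json.dumps
    print(restored_json)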
I am trying to persist an object reference using only ZODB with a FileStorage database.
I made a test to analyze its performance, but when the object is loaded it appears to be broken.
The test consists of:
creating an object in one script and writing it to the database;
reading that object from the same database in another script and using it there.
zodb1.py output from CMD (screenshot)
zodb2.py output from CMD (screenshot)
zodb1.py
import ZODB
from ZODB.FileStorage import FileStorage
import persistent
import transaction
storage = FileStorage('ODB.fs')
db = ZODB.DB(storage)
connection = db.open()
ODB = connection.root()
print(ODB)
class Instrument(persistent.Persistent):
def __init__(self, name, address):
self.name = name
self.address = address
def __str__(self):
return f'Instrument - {self.name}, ID: {self.address}'
camera = Instrument(name='Logitech', address='CAM0')
ODB['camera'] = camera
ODB._p_changed = True
transaction.commit()
print(ODB)
ob = ODB['camera']
print(ob)
print(dir(ob))
zodb2.py
import ZODB, ZODB.FileStorage
import persistent
import transaction
connection = ZODB.connection('ODB.fs')
ODB = connection.root()
print(ODB)
ob = ODB['camera']
print(ob)
print(dir(ob))
Am I missing something important? I've read ZODB's documentation and I see no other configuration step or alternative way to approach this.
Thank you in advance.
I think that the problem you see is because zodb2.py has no knowledge of the Instrument class defined in zodb1.py.
I guess that if you moved your class to a separate module and imported it in both zodb1 and zodb2, you would not see a broken object.
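For example, a minimal sketch of that refactoring; the module name instruments.py is just an illustration:
# instruments.py -- shared module imported by both scripts
import persistent

class Instrument(persistent.Persistent):
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def __str__(self):
        return f'Instrument - {self.name}, ID: {self.address}'

Then both zodb1.py and zodb2.py start with from instruments import Instrument, so ZODB can locate the class when it unpickles the object stored in ODB.fs.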
Background on the actual problem: I am trying to create an AWS Lambda function in Python that accumulates records from a DynamoDB stream into an S3 object. If you don't understand this context you can just ignore it, the question is really a pure Python question.
I got the code below barely working, the file is successfully concatenated with new records from the stream, in the desired format (one JSON object per line). But what I really want is to treat the file as a hashmap, using the fields in keys (4th line of the function definition), which are a subset of the fields in new, so that any records coming in will overwrite old records containing the same key values.
What is the obvious / idiomatic way to change the line journal += data so that instead of a concatenation I get an overwrite of lines with the same key values?
import json
import boto3
import re
import uuid
from decimal import Decimal
import six
import sys
from datetime import datetime
from boto3.dynamodb.types import TypeSerializer
s3 = boto3.resource('s3')
def lambda_handler(event, context):
object = s3.Object('some.bucket', 'address/dynamo-stream.json')
journal = object.get()['Body'].read().decode('utf-8')
for record in event['Records']:
keys = record['dynamodb'].get('Keys')
new = record['dynamodb'].get('NewImage')
if new:
data = json.dumps(loads(new))
journal += data + "\n"
object.put(Body=journal)
return "ok"
# below: code from https://github.com/Alonreznik/dynamodb-json/blob/master/dynamodb_json/json_util.py
[...]
def loads(s, as_dict=False, *args, **kwargs):
[...]
More explanation:
The variable keys is a subset of new in the sense that, for any JSON value of new in the format
{ "k1": "v1", "k2": "v2", "k3": "v3", ... "kN": "vN" }
keys will have the value
{ "k1": "v1", "k2": "v2" }
While building a Flask website, I'm using an external JSON feed to populate the local MongoDB with content. This feed is parsed and fed in while repurposing keys from the JSON as keys in Mongo.
One of the available keys from the feed is called "img_url" and contains, guess what, a URL to an image.
Is there a way, in Python, to mimic a PHP-style cURL? I'd like to grab that key, download the image, and store it somewhere locally while keeping the other associated keys, and have that as an entry in my db.
Here is my script so far:
import json
import sys
import urllib2
from datetime import datetime
import pymongo
import pytz
from utils import slugify
# from utils import logger
client = pymongo.MongoClient()
db = client.artlogic
def fetch_artworks():
# logger.debug("downloading artwork data from Artlogic")
AL_artworks = []
AL_artists = []
url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"
while True:
f = urllib2.urlopen(url)
data = json.load(f)
AL_artworks += data['rows']
# logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))
# Stop we are at the last page
if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
break
url = data['feed_data']['next_page_link']
# Now we have a list called ‘artworks’ in which all the descriptions are stored
# We are going to put them into the mongoDB database,
# Making sure that if the artwork is already encoded (an object with the same id
# already is in the database) we update the existing description instead of
# inserting a new one (‘upsert’).
# logger.debug("updating local mongodb database with %s entries" % len(artworks))
for artwork in AL_artworks:
# Mongo does not like keys that have a dot in their name,
# this property does not seem to be used anyway so let us
# delete it:
if 'artworks.description2' in artwork:
del artwork['artworks.description2']
# upsert into the database:
db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)
# artwork['artist_id'] is not functioning properly
db.AL_artists.update({"artist": artwork['artist']},
{"artist_sort": artwork['artist_sort'],
"artist": artwork['artist'],
"slug": slugify(artwork['artist'])},
upsert=True)
# db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
return AL_artworks
if __name__ == "__main__":
fetch_artworks()
First, you might like the requests library.
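For instance, a rough sketch with requests; this is not part of the original script, and the destination directory and filename scheme are assumptions:
import os
import uuid
import requests

def fetch_image(url, directory='/var/www/static'):
    response = requests.get(url, stream=True)
    response.raise_for_status()
    dst = os.path.join(directory, uuid.uuid4().hex + '.jpg')
    with open(dst, 'wb') as fo:
        for chunk in response.iter_content(4096):
            fo.write(chunk)
    return dst

# inside the artwork loop, something like:
# artwork['local_image'] = fetch_image(artwork['img_url'])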
Otherwise, if you want to stick to the stdlib, it will be something along the lines of:
import os
import uuid

def fetchfile(url, dst):
    fi = urllib2.urlopen(url)
    fo = open(dst, 'wb')
    while True:
        chunk = fi.read(4096)
        if not chunk:
            break
        fo.write(chunk)
    fo.close()
    fi.close()

fetchfile(
    data['feed_data']['next_page_link'],
    os.path.join('/var/www/static', uuid.uuid1().hex)
)
With proper exception handling, of course (I can expand on this if you want, but I'm sure the documentation is clear enough).
You could put the fetchfile() calls into a pool of async jobs to fetch many files at once.
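As an illustration of that idea, a small sketch using a thread pool from the stdlib (multiprocessing.dummy); the job list and paths are illustrative:
from multiprocessing.dummy import Pool  # thread-based Pool, fine for I/O-bound downloads

def fetch_one(job):
    url, dst = job
    fetchfile(url, dst)

jobs = [(artwork['img_url'], os.path.join('/var/www/static', uuid.uuid1().hex))
        for artwork in AL_artworks]

pool = Pool(4)
pool.map(fetch_one, jobs)
pool.close()
pool.join()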
https://docs.python.org/2/library/json.html
https://docs.python.org/2/library/urllib2.html
https://docs.python.org/2/library/tempfile.html
https://docs.python.org/2/library/multiprocessing.html
I have a list of about 20 objects and for each object I return a list of 10 dictionaries.
I am trying to store the list of 10 dictionaries for each object in the list on GAE; I do not think I am writing the code correctly to store this information to GAE.
Here is what I have:
Before my main request handler I have this class:
class Tw(db.Model):
tags = db.ListProperty()
ip = db.StringProperty()
In my main request handler I have the following:
for city in lst_of_cities: # this is the list of 20 objects
dict_info = hw12.twitter(city) # this is the function to get the list of 10 dictionaries for each object in the list
datastore = Tw() # this is the class defined for db.model
datastore.tags.append(dict_info)
datastore.ip = self.request.remote_addr
datastore.put()
data = Data.gql("") #data entities we need to fetch
I am not sure if this code is right at all. If anyone could help, it would be much appreciated.
Welcome to Stack Overflow!
I see a few issues:
Dictionaries are not supported value types for App Engine properties.
You're only storing the last entity; the rest are discarded.
You're using a ListProperty, but instead of appending each element of dict_info, you're doing a single append of the entire list.
Since you can't store a raw dictionary inside a property, you need to serialize it to some other format, like JSON or pickle. Here's a revised example using pickle:
from google.appengine.ext import db
import pickle
class Tw(db.Model):
tags = db.BlobProperty()
ip = db.StringProperty()
entities = []
for city in lst_of_cities:
dict_info = hw12.twitter(city)
entity = Tw()
entity.tags = db.Blob(pickle.dumps(dict_info))
entity.ip = self.request.remote_addr
entities.append(entity)
db.put(entities)
When you fetch the entity later, you can retrieve your list of dictionaries with pickle.loads(entity.tags).
When I deal with data types that are not directly supported by Google App Engine, like dictionaries or custom data types, I usually adopt the handy PickleProperty.
from google.appengine.ext import db
import pickle
class PickleProperty(db.Property):
def get_value_for_datastore(self, model_instance):
value = getattr(model_instance, self.name, None)
return pickle.dumps(value)
def make_value_from_datastore(self, value):
return pickle.loads(value)
Once the PickleProperty class is declared in your commons.py module, you can use it to store your custom data with something like this:
from google.appengine.ext import db
from commons import PickleProperty
class Tw(db.Model):
tags = PickleProperty()
ip = db.StringProperty()
entities = []
for city in lst_of_cities:
dict_info = hw12.twitter(city)
entity = Tw()
entity.tags = dict_info
entity.ip = self.request.remote_addr
entities.append(entity)
db.put(entities)
To retrieve the data later, just read:
entity.tags
Since this was written, the App Engine has pushed out their experimental "ndb" Python database model, which contains in particular the JsonProperty, something that pretty well directly implements what you want.
Now, you need to be running the Python 2.7 version of App Engine, which is still not quite ready for production, but it all seems pretty stable these days. GvR himself seems to be writing a lot of the code, which bodes well for its quality, and I'm intending to use this in production sometime this year...
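A minimal sketch of what that looks like; the model mirrors the example above but is illustrative, not the poster's code:
from google.appengine.ext import ndb

class Tw(ndb.Model):
    tags = ndb.JsonProperty()   # stores the list of dictionaries as JSON
    ip = ndb.StringProperty()

entity = Tw(tags=dict_info, ip=self.request.remote_addr)
entity.put()
# entity.tags comes back as the original list of dictionaries when the entity is read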
for p in db.collection.find({"test_set":"abc"}):
posts.append(p)
thejson = json.dumps({'results':posts})
return HttpResponse(thejson, mimetype="application/javascript")
In my Django/Python code, I can't return JSON from a Mongo query because of the ObjectId. The error says that the ObjectId is not serializable.
What do I have to do?
A hacky way would be to loop through:
for p in posts:
p['_id'] = ""
The json module won't work due to things like the ObjectID.
Luckily PyMongo provides json_util, which "... allow[s] for specialized encoding and decoding of BSON documents into Mongo Extended JSON's Strict mode. This lets you encode / decode BSON documents to JSON even when they use special BSON types."
Here is a simple sample, using pymongo 2.2.1:
import os
import sys
import json
import pymongo
from bson import BSON
from bson import json_util
if __name__ == '__main__':
try:
connection = pymongo.Connection('mongodb://localhost:27017')
database = connection['mongotest']
except:
print('Error: Unable to Connect')
connection = None
if connection is not None:
database["test"].insert({'name': 'foo'})
doc = database["test"].find_one({'name': 'foo'})
print(json.dumps(doc, sort_keys=True, indent=4, default=json_util.default))
It's pretty easy to write a custom serializer which copes with the ObjectIds. Django already includes one which handles decimals and dates, so you can extend that:
from django.core.serializers.json import DjangoJSONEncoder
from bson import objectid
class MongoAwareEncoder(DjangoJSONEncoder):
"""JSON encoder class that adds support for Mongo objectids."""
def default(self, o):
if isinstance(o, objectid.ObjectId):
return str(o)
else:
return super(MongoAwareEncoder, self).default(o)
Now you can just tell json to use your custom serializer:
thejson = json.dumps({'results':posts}, cls=MongoAwareEncoder)
Something even simpler which works for me on Python 3.6 using
motor==1.1
pymongo==3.4.0
from bson.json_util import dumps, loads
for mongo_doc in await cursor.to_list(length=10):
# mongo_doc is a <class 'dict'> returned from the async mongo driver, in this case motor / pymongo,
# result of executing a simple find() query.
json_string = dumps(mongo_doc)
# serialize the <class 'dict'> into a <class 'str'>
back_to_dict = loads(json_string)
# to deserialize, turning the string back into a <class 'dict'> with the original ObjectId type.