pymongo: Error Creating embedded array in an OrderedDict - python

While importing SQL data into MongoDB, I merged a few tables into an embedded array, but when I run the code I get KeyError exceptions.
Below is my code.
import pyodbc, json, collections, pymongo, datetime
arrayCol =[]
mongoConStr = 'localhost:27017'
sqlConStr = 'DRIVER={MSSQL-NC1311};SERVER=tcp:172.16.1.75,1433;DATABASE=devdb;UID=qauser;PWD=devuser'
mongoConnect = pymongo.MongoClient(mongoConStr)
sqlConnect = pyodbc.connect(sqlConStr)
dbo = mongoConnect.eaedw.ctArrayData
sqlCur = sqlConnect.cursor()
sqlCur.execute('''SELECT M.fldUserId ,TRU.intRuleGroupId ,TGM.strGroupName FROM TBL_USER_MASTER M
JOIN TBL_RULEGROUP_USER TRU ON M.fldUserId = TRU.intUserId
JOIN tbl_Group_Master TGM ON TRU.intRuleGroupId = TGM.intGroupId
''')
tuples = sqlCur.fetchall()
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['fldUserId'] = tuple.fldUserId
    doc['groups.gid'].append(tuple.intRuleGroupId)
    doc['groups.gname'].append(tuple.strGroupName)
    arrayCol.append(doc)
mongoImp = dbo.insert_many(arrayCol)
sqlCur.close()
mongoConnect.close()
sqlConnect.close()
Here, I was trying to create an embedded array named groups that holds gid and gname as a sub-document in each array element.
I get an error from the append calls; the code runs successfully without the embedded array.
Is there a mistake in the array definition?

You can't append to a list that doesn't exist. When you call append, doc['groups.gid'] and doc['groups.gname'] have not been assigned yet, so looking them up raises a KeyError. Even once you fix that problem, PyMongo prohibits you from inserting a document with keys like "groups.gid" that include dots. I think you intend to do this:
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['fldUserId'] = tuple.fldUserId
    doc['groups'] = collections.OrderedDict([
        ('gid', tuple.intRuleGroupId),
        ('gname', tuple.strGroupName)
    ])
    arrayCol.append(doc)
I'm only guessing, based on your question, the schema that you really want to create.
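If the goal is the one described in the question (one document per user, with a groups array holding several {gid, gname} sub-documents), the joined rows have to be grouped by user before inserting. A minimal sketch, assuming the rows arrive as objects with the question's column attributes; Row, build_user_docs and the sample group names are hypothetical stand-ins so the example runs without SQL Server or MongoDB:

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for the pyodbc rows returned by the JOIN query
Row = namedtuple("Row", ["fldUserId", "intRuleGroupId", "strGroupName"])


def build_user_docs(rows):
    """Group joined rows by user: one document per user, with a 'groups'
    list of {gid, gname} sub-documents."""
    docs = OrderedDict()  # fldUserId -> document, keeps first-seen order
    for row in rows:
        doc = docs.get(row.fldUserId)
        if doc is None:
            # Create the list up front so append() always has a target
            doc = OrderedDict([("fldUserId", row.fldUserId), ("groups", [])])
            docs[row.fldUserId] = doc
        doc["groups"].append(OrderedDict([
            ("gid", row.intRuleGroupId),
            ("gname", row.strGroupName),
        ]))
    return list(docs.values())


# Hypothetical sample rows: user 1 belongs to two groups, user 2 to one
rows = [Row(1, 10, "Admins"), Row(1, 11, "Editors"), Row(2, 10, "Admins")]
docs = build_user_docs(rows)
```

The resulting list could then be passed to dbo.insert_many(docs) as in the question. Because rows are looked up by fldUserId in a dict, the SQL query does not need to be ordered by user.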

Related

How to iterate over an array of dictionaries without a loop in Django?

This is my scenario: I have 30 records in an array of dictionaries in Django. Iterating over it works, but it takes around one minute. How can I reduce the iteration time? I tried the map function, but it didn't work. How can I fix this? My example code is below.
Example Code
def find_places():
    data = [{'a':1},{'a':2},{'a':3},{'a':4},{'a':5},{'a':6},{'a':7},{'a':8}]
    places =[]
    for p in range(1,len(data)):
        a = p.a
        try:
            s1 = sample.object.filter(a=a)
        except:
            s1 = sample(a=a)
            s1.save()
        plac={id:s1.id,
              a:s1.a}
        places.append(plac)
    return places
find_places()
I need an efficient way to iterate over the array of objects in Python without a loop.
You can filter outside the loop and run get_or_create instead of falling back to creating an object when the filter doesn't match.
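data_a = [d['a'] for d in data]  # data holds dicts, so use d['a'], not d.a
samples = sample.objects.filter(a__in=data_a)
places = []
for a in data_a:
    s1, created = samples.get_or_create(
        a=a
    )
    place = {'id': s1.id, 'a': s1.a}  # string keys, not the builtin id / loop variable
    places.append(place)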
You can build a list and then save it all at once. Try this:
def find_places():
    data = [{'a':1},{'a':2},{'a':3},{'a':4},{'a':5},{'a':6},{'a':7},{'a':8}]
    places =[]
    lst = []
    for p in data:
        a = p['a']
        lst.append(a) # store it at once
Then store the list in the database in one go (search for "How to store a list into a Model in Django").
I only changed the loop; if the database side also fails, let me know.
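The per-item get_or_create above still issues one query per value. A common alternative, sketched here under assumptions, is to fetch the existing values with a single filter(a__in=...) query and create only the missing ones with one bulk_create call. The Django ORM calls appear only in comments (sample is the question's model name); split_existing is a hypothetical helper, and the runnable part is just the pure-Python bookkeeping between those two queries:

```python
def split_existing(values, existing):
    """Return (found, missing) lists for the requested values, in order.

    `existing` stands in for the values already stored, which in Django
    could come from one query, e.g.
        set(sample.objects.filter(a__in=values).values_list('a', flat=True))
    """
    seen = set()
    found, missing = [], []
    for v in values:
        if v in seen:
            continue  # skip duplicates so nothing is created twice
        seen.add(v)
        (found if v in existing else missing).append(v)
    return found, missing


data = [{'a': 1}, {'a': 2}, {'a': 3}, {'a': 2}]
values = [d['a'] for d in data]
# Pretend only a=2 is already stored
found, missing = split_existing(values, existing={2})
# The missing values could then be inserted with a single query, e.g.
# sample.objects.bulk_create([sample(a=a) for a in missing])
```

Depending on the database backend and Django version, bulk_create may not set primary keys on the objects it returns, so re-query the rows if you need their ids.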

"string indices must be integers" error while trying to build a list from json input

I am trying to create a list from a dictionary built from a JSON file in Python 3.7.
The json file has the following structure:
watches
  collection
    model
      0 {…}
      1
        rmc
          0 "value_I_need"
          1 "value_I_need"
JSON extract:
{"watches":{"collection":{"event_banner":{"type":"banner","margin":false,"mobile_layer":true,"class":"tdr-banners-events","media":{"type":"image","src":"/public/banners/events/baselworld_2017_navigation.jpg","height":"150px"},"text":{"align":"left","animate":true,"positioning":"left","suptitle":"BANNER_EVENT_A_TITLE","title":"BANNER_EVENT_A_SUPTITLE","title_type":"h2","style":"light","link_text":"BANNER_EVENT_A_LINK_TEXT","link_href":"/magazine/article/baselworld-2017"}},"collection-navigation":{"type":"view","template":"nav.tdr-collection-navigation.tdr-flex.tdr-flex--align-items-center > ul.tdr-collection-navigation__list.tdr-flex.tdr-flex--align-items-flex-start#list","children":[{"type":"view","template":"li.tdr-collection-navigation__item","insert":{"where":"list"},"children":[{"type":"button-gamma","text":"FIND_YOUR_TUDOR_COLLECTION","href":"/search","cssClass":"tdr-button--gamma-collection-navigation","children":[{"type":"new-icon","cssClass":"circleicon dark-reverse-red","insert":{"where":"icon"},"icon":"search","width":"16","height":"16","colorClass":"tdr-icon-dark"}]}]},{"type":"view","template":"li.tdr-collection-navigation__item","insert":{"where":"list"},"children":[{"type":"collection-navigation-item","index":"0","text":"GRID_VIEW_COLLECTION","children":[{"type":"new-icon","cssClass":"red","insert":{"where":"icon"},"icon":"icon-grid","width":"36","height":"36","colorClass":"tdr-icon-dark"}]}]},{"type":"view","template":"li.tdr-collection-navigation__item","insert":{"where":"list"},"children":[{"type":"collection-navigation-item","index":"1","text":"LIST_VIEW_COLLECTION","children":[{"type":"new-icon","cssClass":"red","insert":{"where":"icon"},"icon":"icon-list-3","width":"36","height":"36","colorClass":"tdr-icon-dark"}]}]},{"type":"view","template":"li.tdr-collection-navigation__item.collection-navigation__item--new-collection","insert":{"where":"list"},"children":[{"type":"collection-navigation-item-new-collection","index":"2","text":"FEATURED_SELECTION","children":[{"type":"new-icon","cssClass":"red","insert":{"where":"icon"},"icon":"switch","width":"63","height":"63","colorClass":"tdr-icon-dark"}]}]}]},"collection_filter":{"0":{"route":"all","name":"all_collection","model_page":["black-bay","new-black-bay-fifty-eight","black-bay-32-36-41","new-black-bay-gmt","black-bay-chrono","black-bay-steel","black-bay-s-g","black-bay-dark","black-bay-bronze","north-flag","pelagos","new-1926","style","glamour-double-date","glamour-date-day","glamour-date","heritage-advisor","heritage-chrono","heritage-ranger","fastrider-black-shield","clair-de-rose","classic"]},"1":{"route":"featured-selection","name":"featured_selection","model_page":["glamour-double-date","new-black-bay-32","new-1926","black-bay-chrono"]},"length":2,"all":0,"featured-selection":1},"model":{"0":{"route":"black-bay-32-36-41","watch_model":"black_bay_32_36_41","model_group":"tudor","fam_intro_title":"bb32_36_41_intro_title","fam_intro_text":"bb32_36_41_intro_text","flagship_rmc":"m79580-0003","page_link":"/watches/black-bay-32-36-41/","tags":[],"optional_calibre":false,"no_wrap":true,"family_filter":true,"aggregated":true,"rmc":["m79540-0007","m79540-0009"]},
print(documents)
{'0': {'route': 'black-bay-32-36-41', 'watch_model': 'black_bay_32_36_41', 'model_group': 'tudor', 'fam_intro_title': 'bb32_36_41_intro_title', 'fam_intro_text': 'bb32_36_41_intro_text', 'flagship_rmc': 'm79580-0003', 'page_link': '/watches/black-bay-32-36-41/', 'tags': [], 'optional_calibre': False, 'no_wrap': True, 'family_filter': True, 'aggregated': True, 'rmc': ['m79580-0003', 'm79580-0004',
My code to build the list:
with open('test.json', 'r') as f:
    dictionary = json.load(f)
documents = dictionary["watches"]["collection"]["model"]
for document in documents:
    models = document["rmc"]
    try:
        for model in models:
            start_urls.append('https://www.example.com/'+document['page_link']+'/'+model+'.html')
    except Exception:
        pass
The traceback error:
models = document["rmc"]
TypeError: string indices must be integers
The rmc values are another list within each model entry, so each model may have its own list of rmc values.
My goal is to create a list of all models including their variants (rmc).
Why is Python telling me it is a string, when I believe the rmc rows are indexed by integers?
You seem to think your model value is a list. The JSON says otherwise:
"model":{"0":{"route":"black-bay-32-36-41",
It's a dict whose keys are strings. You iterate over that dict:
for document in documents:
When you iterate over a dict that way, you iterate over the keys of that dict, so document holds the string "0". The string cannot be indexed by another string as document['rmc'], so Python rightly complains.
You can fix it in a couple of ways. First, you can change the way you read the model:
for document in documents:
    models = documents[document]['rmc']
    ...
Or you can change the way you iterate over the dict:
for idx, document in documents.items():
    models = document['rmc']
Pretty-printing the JSON instead of leaving it as one inscrutable line would probably have alerted you to this issue much faster.
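Combining the dict-iteration fix with the pretty-printing advice, a sketch of the whole URL-building step might look like this. It uses a trimmed copy of the question's first model entry as sample input, and build_start_urls is a hypothetical helper name. Note that json.loads already maps JSON true/false to Python True/False, so no string replacement is needed before parsing:

```python
import json


def build_start_urls(dictionary, base="https://www.example.com"):
    """Build one URL per (model, rmc) pair under watches.collection.model."""
    models = dictionary["watches"]["collection"]["model"]
    urls = []
    # Iterating the dict directly would yield only its string keys ("0", ...);
    # .values() yields the model dicts themselves.
    for model in models.values():
        for rmc in model.get("rmc", []):  # tolerate entries without rmc
            urls.append(base + model["page_link"] + rmc + ".html")
    return urls


# Trimmed sample based on the JSON extract in the question
raw = ('{"watches": {"collection": {"model": {"0": {'
       '"page_link": "/watches/black-bay-32-36-41/", "aggregated": true, '
       '"rmc": ["m79540-0007", "m79540-0009"]}}}}}')
data = json.loads(raw)
print(json.dumps(data, indent=2))  # pretty-print to make the nesting visible
urls = build_start_urls(data)
```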
Fix JSON Identifiers
The true and false values were also causing an error (not valid identifiers). I was able to fix it with the following lines:
import json

start_urls = []
with open('test.json', 'r') as f:
    dictionary = json.loads(f.read().replace("true","1").replace("false","0"))
for document in dictionary:
    for i in range(len(dictionary['watches']['collection']['model'])):
        models = dictionary['watches']['collection']['model'][str(i)]
        try:
            # iterate over the rmc list itself; use j so the outer i is not overwritten
            for j in range(len(models['rmc'])):
                _string = ('https://www.example.com'+models['page_link']+models['rmc'][j]+'.html')
                print(_string) # This will show each generated string before processing
                start_urls.append(_string)
        except Exception as e:
            pass
Access as a List
The value of models['rmc'] is a list. Accessing a list inside a dict is a bit different from accessing a dict: we need to find the length of the list and iterate through it, because the elements aren't associated with string keys.
Change this line:
documents = dictionary['watches']['collection']['model']
To this:
documents = dict(dictionary['watches']['collection']['model'])
And documents will become a dictionary

Fast way to convert SQLAlchemy objects to Python dicts

I have this query that returns a list of student objects:
query = db.session.query(Student).filter(Student.is_deleted == false())
query = query.options(joinedload('project'))
query = query.options(joinedload('image'))
query = query.options(joinedload('student_locator_map'))
query = query.options(subqueryload('attached_addresses'))
query = query.options(subqueryload('student_meta'))
query = query.order_by(Student.student_last_name, Student.student_first_name,
                       Student.student_middle_name, Student.student_grade, Student.student_id)
query = query.filter(filter_column == field_value)
students = query.all()
The query itself does not take much time. The problem is converting all these objects (there can be 5000+) to Python dicts, which takes over a minute. Currently, the code loops through the objects and converts each one with to_dict(). I have also tried __dict__, which was much faster, but it does not seem to convert all the related objects.
How can I convert all these Student objects and related objects quickly?
Maybe this will help you...
from collections import defaultdict
from sqlalchemy import inspect

def query_to_dict(student_results):
    # Column-oriented result: {attribute_name: [value_for_each_object, ...]}
    result = defaultdict(list)
    for obj in student_results:
        instance = inspect(obj)
        for key, x in instance.attrs.items():
            result[key].append(x.value)
    return result

output = query_to_dict(students)
query = query.options(joinedload('attached_addresses').joinedload('address'))
By chaining the address joinedload to attached_addresses, I was able to significantly speed up the query.
My understanding of why this is the case:
Address objects were not being loaded by the initial query, so on every iteration through the loop the database was hit again to retrieve the Address object. With the chained joinedload, Address objects are now loaded by the initial query.
Thanks to Corley Brigman for the help.

NDB Model Querying of Key Ids using an array filter

I'm trying to query an NDB model using a list of provided key id strings. The model has string ids that are assigned at creation - for example:
objectKey = MyModel(
    id="123456ABC",
    name="An Object"
).put()
Now I can't figure out how to query the NDB key ids with a list filter. Normally you can use MyModel.property.IN() to query properties:
names = ['An Object', 'Something else', 'etc']
# This query works
query = MyModel.query(MyModel.name.IN(names))
When I try to filter by a list of keys, I can't get it to work:
# This simple get works
object = MyModel.get_by_id("123456ABC")
ids = ["123456ABC", "CBA654321", "etc"]
# These queries DON'T work
query = MyModel.query(MyModel.id.IN(ids))
query = MyModel.query(MyModel.key.id.IN(ids))
query = MyModel.query(MyModel.key.id().IN(ids))
query = MyModel.query(MyModel._properties['id'].IN(ids))
query = MyModel.query(getattr(MyModel, 'id').IN(ids))
...
I always get AttributeError: type object 'MyModel' has no attribute 'id' errors.
I need to be able to filter by a list of IDs, rather than iterate through each ID in the list (which is sometimes long). How do I do it?
The following should work:
keys = [ndb.Key(MyModel, anid) for anid in ids]
objs = ndb.get_multi(keys)
You can also use urlsafe keys if you have problems using the ids:
keys = ndb.get_multi([ndb.Key(urlsafe=k) for k in ids])

How to use ResultSet in PyES

I'm using PyES to use ElasticSearch in Python.
Typically, I build my queries in the following format:
# Create connection to server.
conn = ES('127.0.0.1:9200')
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# Execute the query.
results = conn.search(query=q, indices=['my-index'])
print type(results)
# > <class 'pyes.es.ResultSet'>
And this works perfectly. My problem begins when the query returns a large list of documents.
Converting the results to a list of dictionaries is computationally demanding, so I'm trying to get the query results back as dictionaries directly. I came across this documentation:
http://pyes.readthedocs.org/en/latest/faq.html#id3
http://pyes.readthedocs.org/en/latest/references/pyes.es.html#pyes.es.ResultSet
https://github.com/aparo/pyes/blob/master/pyes/es.py (line 1304)
But I can't figure out what exactly I'm supposed to do.
Based on the previous links, I've tried this:
from pyes import *
from pyes.query import *
from pyes.es import ResultSet
from pyes.connection import connect
# Create connection to server.
c = connect(servers=['127.0.0.1:9200'])
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query / Search object.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# (How to) create the model ?
mymodel = lambda x, y: y
# Execute the query.
# class pyes.es.ResultSet(connection, search, indices=None, doc_types=None,
# query_params=None, auto_fix_keys=False, auto_clean_highlight=False, model=None)
resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > TypeError: __init__() got an unexpected keyword argument 'search'
Has anyone been able to get a dict from the ResultSet?
Any good suggestion for efficiently converting the ResultSet to a (list of) dictionaries will be appreciated too.
I tried many ways to cast a ResultSet directly into a dict, without success. The best approach I have found is to append the ResultSet items to another list or dict, since each item in a ResultSet is already a dict.
Here is how I use:
#create a response dictionary
response = {"status_code": 200, "message": "Successful", "content": []}
#set result set to content of response
response["content"] = [result for result in resultset]
#return a json object
return json.dumps(response)
It's not that complicated: just iterate over the result set, for example with a for loop:
for item in results:
print item
