I use mongoengine for MongoDB in Django.
However, mongoengine fields (like StringField) force me to build the schema in a way I don't want: they insist that I write the key name up front, before I know what it will be. For example, in the mongo shell I can insert documents whose key names are only decided at runtime:
> for(var i=0; i<10; i++){
... o = {};
... o[i.toString()] = i + 100;
... db.test.save(o)
... }
> db.test.find()
{ "_id" : ObjectId("4ed623aa45c8729573313811"), "0" : 100 }
{ "_id" : ObjectId("4ed623aa45c8729573313812"), "1" : 101 }
{ "_id" : ObjectId("4ed623aa45c8729573313813"), "2" : 102 }
{ "_id" : ObjectId("4ed623aa45c8729573313814"), "3" : 103 }
{ "_id" : ObjectId("4ed623aa45c8729573313815"), "4" : 104 }
{ "_id" : ObjectId("4ed623aa45c8729573313816"), "5" : 105 }
{ "_id" : ObjectId("4ed623aa45c8729573313817"), "6" : 106 }
{ "_id" : ObjectId("4ed623aa45c8729573313818"), "7" : 107 }
{ "_id" : ObjectId("4ed623aa45c8729573313819"), "8" : 108 }
{ "_id" : ObjectId("4ed623aa45c872957331381a"), "9" : 109 }
[addition]
As you can see above, every document uses a different key. Please assume that I do not know ahead of time what key name will be put into the document.
As dcrosta replied, I am looking for a way to use mongoengine without specifying the fields ahead of time.
[/addition]
How can I do the same thing through mongoengine?
Please suggest a schema design along the lines of:
class Test(Document):
    tag = StringField(db_field='xxxx')
[addition]
I don't know what 'xxxx' will be as the key name.
Sorry, I'm Korean, so my English is awkward. I'd appreciate any pointers. Thanks for reading this.
[/addition]
Have you considered using PyMongo directly instead of using Mongoengine? Mongoengine is designed to declare and validate a schema for your documents, and provides many tools and conveniences around that. If your documents are going to vary, I'm not sure Mongoengine is the right choice for you.
If, however, you have some fields in common across all documents, and then each document has some set of fields specific to itself, you can use Mongoengine's DictField. The downside of this is that the keys will not be "top-level", for instance:
class UserThings(Document):
    # you can look this document up by username
    username = StringField()
    # you can store whatever you want here
    things = DictField()

dcrosta_things = UserThings(username='dcrosta')
dcrosta_things.things['foo'] = 'bar'
dcrosta_things.things['baz'] = 'quack'
dcrosta_things.save()
Results in a MongoDB document like:
{ _id: ObjectId(...),
  _types: ["UserThings"],
  _cls: "UserThings",
  username: "dcrosta",
  things: {
    foo: "bar",
    baz: "quack"
  }
}
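As a side note (not part of the original answer), you can still query into the DictField with mongoengine's usual double-underscore syntax, as long as the key is known at query time:

# matches documents whose "things" dict contains foo == "bar"
UserThings.objects(things__foo='bar')

# same query, expressed as a raw dotted-notation filter
UserThings.objects(__raw__={'things.foo': 'bar'})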
Edit: I should also note, there's work in progress on the development branch of Mongoengine for "dynamic" documents, where attributes on the Python document instances will be saved when the model is saved. See https://github.com/hmarr/mongoengine/pull/112 for details and history.
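If your mongoengine version already includes that feature, usage would look roughly like this (a sketch based on the pull request above, assuming it ships as a DynamicDocument base class):

from mongoengine import DynamicDocument, StringField

class Test(DynamicDocument):
    # declare only the fields you want validated
    user = StringField()

doc = Test(user='somebody')
# field name decided at runtime; it is saved along with the document
setattr(doc, 'key_chosen_at_runtime', 100)
doc.save()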
Related
I'm using db.collection.find({}, {'_id': False}).limit(2000) to get documents from a collection. These documents are sent to a Facebook API; after the API returns success, the documents need to be deleted from the collection.
My main question is:
Is there a way to delete all 2000 of these documents without using a for loop? I know that collection.find returns a cursor; is there a way to use this cursor in a delete_many?
The structure of my document is:
{
"_id" : ObjectId("61608068887f1a0e2162d94b"),
"event_time" : "1632582893",
"value" : "549.9000",
"contents" : [
{
"product_id" : "1-1",
"quantity" : "1.000000",
"value" : "10"
}
]
}
To solve this problem, based on the comments of @adarsh and @J.F, I've used the following code:
rm = [x['_id'] for x in MongoDB(mongo).db.get_collection("DataToSend").find({}, {'_id': 1}).limit(2000)]
MongoDB(mongo).db.get_collection("DataToSend").delete_many({'_id': {'$in': rm}})
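For reference, the same pattern with a plain pymongo collection object looks roughly like this (a sketch; the client, database and collection names are assumptions, not from the original post):

from pymongo import MongoClient

collection = MongoClient()['mydb']['DataToSend']  # assumed connection details

# collect the _ids of the batch that was sent to the API...
ids = [doc['_id'] for doc in collection.find({}, {'_id': 1}).limit(2000)]

# ...and delete them in a single call
collection.delete_many({'_id': {'$in': ids}})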
I have a structure like this:
{
    "id" : 1,
    "user" : "somebody",
    "players" : [
        {
            "name" : "lala",
            "surname" : "baba",
            "player_place" : "1",
            "start_num" : "123",
            "results" : {
                "1" : { ... },
                "2" : { ... },
                ...
            }
        },
        ...
    ]
}
I am pretty new to MongoDB and I just cannot figure out how to extract results for a specific user (in this case "somebody", but there are many other users, each with an array of players, and each player has many results) for a specific player identified by start_num.
I am using pymongo and this is the code I came up with:
record = collection.find(
    {'user': name},
    {'players': {'$elemMatch': {'start_num': start_num}}, '_id': False}
)
This extracts the matching player for a given user. That is good, but now I need to get a specific result from results, something like this:
{ 'results' : { '2' : { ... } } }.
I tried:
record = collection.find(
    {'user': name},
    {'players': {'$elemMatch': {'start_num': start_num}}, 'results': result_num, '_id': False}
)
but that, of course, doesn't work. I could just turn the result into a list in Python and extract what I need, but I would like to do it with a Mongo query.
Also, what would I need to do to replace a specific result in results for a specific player for a specific user? Let's say I have a new result with key 2 and I want to replace the existing result that has key 2. Can I do it with the same query as for find() (just replacing the find method with replace or find_and_replace)?
You can replace a specific result, and the syntax for that should be something like this,
assuming you want to replace the result with key 1:
collection.updateOne(
    { "user": name, "players.start_num": start_num },
    { $set: { "players.$.results.1": new_result } }
)
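Since the question uses pymongo, the equivalent call would look roughly like this (a sketch reusing the same variable names as above):

collection.update_one(
    {'user': name, 'players.start_num': start_num},
    # "$" refers to the array element matched by players.start_num
    {'$set': {'players.$.results.1': new_result}}
)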
I am using the following code to create an index and load data into Elasticsearch:
from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()
es = Elasticsearch('localhost:9200')
index_name = 'wordcloud_data'
with open('./csv-data/' + index_name + '.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index=index_name, doc_type='my-type')
print("done")
My CSV data is as follows
date,word_data,word_count
2017-06-17,luxury vehicle,11
2017-06-17,signifies acceptance,17
2017-06-17,agency imposed,16
2017-06-17,customer appreciation,11
The data loads fine, but the datatypes are not accurate.
How do I force word_count to be an integer and not text? Notice how it figures out the date type.
Is there a way it can figure out the int datatype automatically, or by passing some parameter?
Also, what do I do to increase ignore_above, or remove it for some of the fields if I wanted to, so there is basically no limit to the number of characters?
{
  "wordcloud_data" : {
    "mappings" : {
      "my-type" : {
        "properties" : {
          "date" : {
            "type" : "date"
          },
          "word_count" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "word_data" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}
You need to create a mapping that describes the field types.
With the elasticsearch-py client this can be done using the es.indices.put_mapping or es.indices.create methods, by passing them a JSON document that describes the mappings, as shown in this SO answer. It would be something like this:
es.indices.put_mapping(
    index="wordcloud_data",
    doc_type="my-type",
    body={
        "properties": {
            "date": {"type": "date"},
            "word_data": {"type": "text"},
            "word_count": {"type": "integer"}
        }
    }
)
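One thing to keep in mind (not from the original answer): csv.DictReader yields every value as a string. Elasticsearch will usually coerce numeric strings into the mapped integer type, but you can also cast explicitly before indexing; a small sketch reusing the es, reader and index_name names from the question:

def rows_with_ints(rows):
    # cast the CSV's string values to the mapped types before indexing
    for row in rows:
        row['word_count'] = int(row['word_count'])
        yield row

helpers.bulk(es, rows_with_ints(reader), index=index_name, doc_type='my-type')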
However, I'd suggest taking a look at the elasticsearch-dsl package, which provides a much nicer declarative API for describing things. It would be something along these lines (untested):
import csv

from elasticsearch_dsl import DocType, Date, Integer, Text
from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk

connections.create_connection(hosts=["localhost"])

class WordCloud(DocType):
    word_data = Text()
    word_count = Integer()
    date = Date()

    class Index:
        name = "wordcloud_data"
        doc_type = "my_type"  # If you need it to be called so

WordCloud.init()

index_name = "wordcloud_data"
with open("./csv-data/%s.csv" % index_name) as f:
    reader = csv.DictReader(f)
    bulk(
        connections.get_connection(),
        (WordCloud(**row).to_dict(True) for row in reader)
    )
Please note, I haven't tried the code I've posted - just written it. Don't have an ES server at hand to test. There could be some small mistakes or typos there (please point out if there are), but the general idea should be correct.
Thanks, @drdaeman's solution worked for me. Although I thought it's worth mentioning that in elasticsearch-dsl 6+ this snippet:
class Meta:
    index = "wordcloud_data"
    doc_type = "my-type"
will raise a "cannot write to wildcard index" exception. Change it to:
class Index:
    name = 'wordcloud_data'
    doc_type = 'my_type'
I have this Document in mongoengine:
class Mydoc(db.Document):
    x = db.DictField()
    item_number = IntField()
And I have this data in the Document:
{
    "_id" : ObjectId("55e360cce725070909af4953"),
    "x" : {
        "mongo" : [
            {
                "list" : "lista"
            },
            {
                "list" : "listb"
            }
        ],
        "hello" : "world"
    },
    "item_number" : 1
}
OK, if I want to push to the mongo list using mongoengine, I do this:
Mydoc.objects(item_number=1).update_one(push__x__mongo={"list" : "listc"})
That works pretty well; if I query the database again I get this:
{
    "_id" : ObjectId("55e360cce725070909af4953"),
    "x" : {
        "mongo" : [
            {
                "list" : "lista"
            },
            {
                "list" : "listb"
            },
            {
                "list" : "listc"
            }
        ],
        "hello" : "world"
    },
    "item_number" : 1
}
But when I try to pull from the same list using pull in mongoengine:
Mydoc.objects(item_number=1).update_one(pull__x__mongo={'list': 'lista'})
I get this error:
mongoengine.errors.OperationError: Update failed (Cannot apply $pull
to a non-array value)
Comparing the two statements:
Mydoc.objects(item_number=1).update_one(push__x__mongo={"list" : "listc"}) # Works
Mydoc.objects(item_number=1).update_one(pull__x__mongo={"list" : "listc"}) # Error
How can I pull from this list?
I appreciate any help.
I believe the problem is that mongoengine doesn't know the structure of your x document. You declared it as a DictField, so mongoengine thinks you are pulling from a DictField, not from a ListField. Declare x as a ListField and both queries should work just fine.
I suggest you should also create an issue for this:
https://github.com/MongoEngine/mongoengine/issues
As a workaround, you can use a raw query:
Mydoc.objects(item_number=1).update_one(__raw__={'$pull': {'x.mongo': {'list': 'listc'}}})
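A quick check after the raw $pull, using the data from the question (a sketch, not part of the original answer):

doc = Mydoc.objects.get(item_number=1)
# the pushed element is gone again
print(doc.x['mongo'])   # [{'list': 'lista'}, {'list': 'listb'}]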
My items are stored in MongoDB like this:
{"ProductName":"XXXX",
"Catalogs" : [
{
"50008064" : "Apple"
},
{
"50010566" : "Box"
},
{
"50016422" : "Water"
}
]}
Now I want to query all the items that belong to catalog 50008064. How can I do that?
(The catalog id is "50008064", the catalog name is "Apple".)
You cannot query this in an efficient manner, and performance will decrease as your data grows. As such I would consider it a schema bug, and you should refactor/migrate to the following model, which does allow for indexing:
{"ProductName":"XXXX",
"Catalogs" : [
{
id : "50008064",
value : "Apple"
},
{
id : "50010566",
value : "Box"
},
{
id : "50016422",
value : "Water"
}
]}
And then index:
ensureIndex({'Catalogs.id': 1})
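With that model in place, a lookup by catalog id can use the index. From pymongo it would look roughly like this (a sketch; the client and database names are assumptions, and "products" matches the collection used in the other answer below):

from pymongo import MongoClient

products = MongoClient()['mydb']['products']  # assumed connection details

# create the index, then run an indexed query on the catalog id
products.create_index('Catalogs.id')
matching = list(products.find({'Catalogs.id': '50008064'}))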
Again, I strongly suggest you change your schema as this is a potential performance bottleneck you cannot fix any other way.
This should probably work according to the entry here, although it won't be very fast, as stated in the link.
db.products.find({ "Catalogs.50008064" : { $exists: true } } )