How to set _id to be 32bit Integer? - python

I am currently pulling data into MongoDB, and will later need to pull this data into a separate application. This application has a requirement for the _id field to be a 32bit integer.
Be sure to explicitly set the _id attribute in the result document to unique 32 bit integers.
source
I am making use of pymongo to insert documents into a collection.
def parse_tweet(in_t):
    t = {}
    t["text"] = in_t["text"]
    t["shape"] = in_t["coordinates"]["coordinates"][0], in_t["coordinates"]["coordinates"][1]
    return t
This gives me the expected documents:
{
    "_id" : ObjectId("50a0de04f26afb14f4bba03d"),
    "text" : "hello world",
    "shape" : [144.9557834, -37.8208589]
}
How can I explicitly set the _id value to be a 32bit integer?
I don't intend on storing more than 6 million documents.

Just generate an id yourself and set it on the document before inserting. The _id can be a value of almost any type (but not an array).
def parse_tweet(in_t):
    t = {}
    # get_me_an_int32_id is a placeholder for your own function returning a unique 32-bit integer
    t["_id"] = get_me_an_int32_id()
    t["text"] = in_t["text"]
    t["shape"] = in_t["coordinates"]["coordinates"][0], in_t["coordinates"]["coordinates"][1]
    return t
You will have to take care of uniqueness yourself. MongoDB only guarantees that duplicate _id values are rejected; where the unique values come from is up to you.
Here are some ideas: How to make an Autoincrementing field.
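As a minimal sketch of one way to fill in that placeholder, assuming a single Python process does all the inserts: a generator hands out consecutive integers and refuses to go past the signed 32-bit range. The names int32_id_generator and INT32_MAX are made up for this example, and the counter restarts at 1 whenever the process restarts, so a real import would need to seed it from the highest _id already stored. As far as I know, PyMongo stores a Python int as a 32-bit BSON integer whenever it fits in that range.

```python
import itertools

INT32_MAX = 2**31 - 1  # largest value that fits in a signed 32-bit integer


def int32_id_generator(start=1):
    """Yield consecutive integers, refusing to go past the signed 32-bit range."""
    for value in itertools.count(start):
        if value > INT32_MAX:
            raise OverflowError("ran out of 32-bit ids")
        yield value


# Hypothetical wiring: one generator shared by every call to parse_tweet.
_ids = int32_id_generator()


def get_me_an_int32_id():
    return next(_ids)
```

With about 6 million documents expected, a plain counter like this stays far below the 32-bit limit.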

Related

How to set type of field when inserting into mongodb

So I am querying an API, receiving data and then storing it into MongoDB.
Everything was working fine until I started using MongoDB's aggregation pipeline. I then realized that the numeric data is being inserted as strings, so the pipeline fails on numerical computations such as averages, because MongoDB sees the values as strings.
How can I set the type of a field during insert, for example to specify that it is a float?
What I have tried so far is the code below, but it does not work because the mongo shell complains that the field name starts with a number:
db.weeklycol.find().forEach(function(ch) {
    db.weeklycol.update(
        {"_id": ch._id},
        {"$set": {"4_close": parseInt(ch.4_close)}}
    );
});
To access a property whose name is not a valid identifier (for example, one that starts with a digit), use bracket notation:
ch['4_close']
As for saving numbers, I ran a quick test:
> db.test.insertOne({_id:1, field: 2})
{ "acknowledged" : true, "insertedId" : 1 }
> db.test.find({_id:1})
{ "_id" : 1, "field" : 2 }
The number seems to be stored as a number just fine. Can you please post an exact code example, with some dummy values, where the inserted object has a property with a numeric value and the stored document has that value turned into a string?
I managed to resolve this with the code below. I convert the value types before inserting: I created this function and call it on my whole dictionary just before the insert. If a value is a number it is converted; otherwise it is left as is.
I have a similar function that converts to a date instead of float on line 4.
def convertint(bdic):
    for key, value in bdic.items():
        try:
            bdic[key] = float(value)
        except (TypeError, ValueError):
            # not a number: leave the value unchanged
            pass
    return bdic

How to prevent python from inserting duplicates into mongodb?

I have JSON data stored in a variable that gets inserted into MongoDB once per day with Python. The JSON data often does not change, but it still gets inserted, which creates masses of duplicates of the same entries.
Each entry in the JSON data has one unique key: uuid.
How do you prevent Python from inserting duplicates into MongoDB? I looked into db.collection.update(), but I'm not sure if it's suitable and I don't know how to use it with a variable.
As long as each entry has a field you can check for uniqueness, you can use update_one() with upsert=True.
For example,
filter_data = {'uuid': '111'}
new_data = {'$set': {'new_value': 25}}
db.collection.update_one(filter_data, new_data, upsert=True)
This checks whether a document with uuid '111' exists. If not, it creates one; otherwise, it updates the existing document.
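Applied to the daily import from the question, the same call can be made once per entry. The following is only a sketch: upsert_entries is a hypothetical helper, collection is assumed to be a PyMongo collection, and each entry is assumed to be a dict carrying a uuid key.

```python
def upsert_entries(collection, entries):
    """Upsert each entry keyed on its uuid; return how many were newly inserted."""
    inserted = 0
    for entry in entries:
        result = collection.update_one(
            {"uuid": entry["uuid"]},  # match on the unique key
            {"$set": entry},          # overwrite the fields with the latest values
            upsert=True,              # insert when no document matches
        )
        if result.upserted_id is not None:
            inserted += 1
    return inserted
```

Running it twice with unchanged data leaves one document per uuid, because the second run only matches existing documents.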

How to get dynamodb to only return certain columns

Hello, I have a simple dynamodb table here filled with placeholder values.
How would I go about retrieving only sort_number, current_balance and side with a query/scan?
I'm using python and boto3, however, just stating what to configure for each of the expressions and parameters is also enough.
Within the Boto3 SDK you can use:
get_item, if you're trying to retrieve a specific item
query, if you're trying to get items from a single partition (the hash key)
scan, if you're trying to retrieve items from across multiple partitions
Each of these has a parameter named ProjectionExpression. Using this parameter provides the following functionality:
A string that identifies one or more attributes to retrieve from the specified table or index. These attributes can include scalars, sets, or elements of a JSON document. The attributes in the expression must be separated by commas.
You specify the attributes that you want to retrieve, separated by commas. Be aware that this does not reduce the read capacity units (RCUs) consumed by the request.
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('tablename')
response = table.scan(
    AttributesToGet=['id']
)
This works, but AttributesToGet is a legacy parameter; using ProjectionExpression is recommended instead.
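For the three attributes in the question, the keyword arguments could be built along these lines. This is a sketch rather than the only way to do it: build_projection is a hypothetical helper, and it routes every attribute through an ExpressionAttributeNames placeholder so that a name clashing with a DynamoDB reserved word would not break the expression.

```python
def build_projection(attributes):
    """Return scan/query keyword arguments that project only `attributes`."""
    # Map a placeholder such as "#p0" to each real attribute name.
    names = {f"#p{i}": attr for i, attr in enumerate(attributes)}
    return {
        "ProjectionExpression": ", ".join(names),  # e.g. "#p0, #p1, #p2"
        "ExpressionAttributeNames": names,
    }


kwargs = build_projection(["sort_number", "current_balance", "side"])
# With boto3 this would be passed on as, for example:
#   response = table.scan(**kwargs)
#   items = response["Items"]
```

The same keyword arguments can be merged into a query call alongside its key condition.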
To return only some fields, use ProjectionExpression in the query parameters. It is a single string listing the attributes, separated by commas:
var params = {
    TableName: 'TableName',
    KeyConditionExpression: '#pk = :pk AND #sk = :sk',
    ExpressionAttributeValues: {
        ':pk': pk,
        ':sk': sk,
    },
    ExpressionAttributeNames: {
        '#sk': 'sk',
        '#pk': 'pk'
    },
    // a comma-separated string, not an array
    ProjectionExpression: 'sort_number, current_balance, side'
};

how to update entire object without changing the id in pymongo?

I am trying to update all properties of a record/object stored in MongoDB. Currently I do it like this:
Delete the object, but keep the ID of the object being deleted.
Create a new object with the same ID as the one I deleted.
Is this correct? Or what is the way to achieve this using pymongo?
mongo_object = {
    _id : 123,
    prop_key_1: some_value,
    // ... many present
    prop_key_n: some_value,
}
def delete(record):
    doc = get_db().reviews.delete_many({"id" : record["_id"]})
    print(doc.deleted_count)
# all key values are changed, mongo_object is changed except the id.
delete(mongo_object)
db.collection_name.insert_one(mongo_object)
But the above code is not deleting the object; doc.deleted_count is 0.
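db.collection_name.replace_one({"_id" : record["_id"]}, new_data)
Use replace_one rather than an update with $set: the document is replaced completely and its _id does not change. (update_one only accepts update operators such as $set, so it rejects a plain replacement document.)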
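from bson.objectid import ObjectId

def replace_one(record):
    # Assumes client is an existing MongoClient and record["_id"] is an ObjectId or its hex string.
    oid = ObjectId(record["_id"])
    # Leave _id out of the replacement so the stored _id is never altered.
    replacement = {k: v for k, v in record.items() if k != "_id"}
    result = client.test_db.test_collection.replace_one({"_id": oid}, replacement, upsert=True)
    print(result.matched_count)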
What is the correct way to query MongoDB for _id using string by using Python?
Pymongo doc - http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.replace_one

Why am I getting 2 different results for similar queries?

I'm trying to link one document to another. To do that, I'm trying to store the ObjectID of one document in the other. I'm trying a couple of different ways that should produce the same results, but they actually look different. Here are the ways I'm trying:
Method 1
owner['ownedCar'] = db.cars.find_one({ '_id' : ObjectId( $theCarsObjectIDstring ) }, {'_id': 1})
db.owners.save(owner)
which looks like this in the database:
{
_id {"$oid": "502186421fe3321dfa000001"}
}
and Method 2
car = db.cars.find_one( { '_id' : ObjectId( $theCarsObjectIDstring ) } )
owner['ownedCar'] = car['_id']
db.owners.save(owner)
which looks like this:
{"$oid": "502186421fe3321dfa000001"}
Shouldn't they look the same? What's the preferred way to link documents?
EDIT Why is this question getting downvoted?
These two results hold the same ID; the difference is in how you pick out the result to populate the linked field.
When you use the second parameter of find to select fields, even if it is just one field, it always returns an object with the field names as keys and the field values as values. You set the linked field to that object, so you don't get just the ID as the value of the linked field. The result of your first query is:
{
_id {"$oid": "502186421fe3321dfa000001"}
}
And you make the field equal that.
In the second query, by contrast, you explicitly pick out car['_id'], so the value of the linked field is just the ID.
This is a driver and language difference in interpretation of how it should return values.
I would say the second method is the best, since the first adds unnecessary bloat to the field in the form of the extra object.
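The difference can be reproduced with plain dictionaries. This sketch assumes nothing about the driver: a hex string stands in for the real ObjectId, and the projected dict stands in for what find_one returns when only _id is requested.

```python
# The 24-character hex string stands in for a real ObjectId so the sketch needs no driver.
car_id = "502186421fe3321dfa000001"

# Method 1: find_one with a projection of {'_id': 1} hands back a whole document (a dict).
projected = {"_id": car_id}
owner_1 = {"name": "owner", "ownedCar": projected}

# Method 2: index into the document and keep only the id value.
owner_2 = {"name": "owner", "ownedCar": projected["_id"]}
```

Here owner_1["ownedCar"] is a nested document holding the id, while owner_2["ownedCar"] is the id itself, which matches the two shapes shown in the question.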
