I have a Mongo Collection that I need to update, and I'm trying to use the collection.update command to no avail.
Code below:
import pymongo
from pymongo import MongoClient
client = MongoClient()
db = client.SensorDB
sensors = db.Sensor
for sensor in sensors.find():
    lat = sensor['location']['latitude']
    lng = sensor['location']['longitude']
    sensor['location'] = {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [lat, lng]
        },
        "properties": {
            "name": sensor['name']
        }
    }
    sensors.update({'webid': sensor['webid']}, {"$set": sensor}, upsert=True)
However, running this gets me the following:
Traceback (most recent call last):
File "purgeDB.py", line 21, in <module>
cameras.update({'webid': sensor['webid']} , {"$set": sensor}, upsert=True)
File "C:\Anaconda\lib\site-packages\pymongo\collection.py", line 561, in update
check_keys, self.uuid_subtype), safe)
File "C:\Anaconda\lib\site-packages\pymongo\mongo_client.py", line 1118, in _send_message
rv = self.__check_response_to_last_error(response, command)
File "C:\Anaconda\lib\site-packages\pymongo\mongo_client.py", line 1060, in __check_response_to_last_error
raise OperationFailure(details["err"], code, result)
pymongo.errors.OperationFailure: Mod on _id not allowed
Change this line:
for sensor in sensors.find():
to this:
for sensor in sensors.find({}, {'_id': 0}):
What this does is prevent Mongo from returning the _id field. Since you aren't using it, excluding it avoids the problem later in your update() call: you cannot "update" _id.
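If you do want to keep fetching the full document, an equivalent fix is to drop the _id key yourself before sending the $set (a minimal sketch of the same loop):

for sensor in sensors.find():
    sensor.pop('_id', None)  # drop _id so the $set never touches it
    # ... build sensor['location'] as before ...
    sensors.update({'webid': sensor['webid']}, {"$set": sensor}, upsert=True)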
An even better solution (only write the data that is needed):
for sensor in sensors.find():
    lat = sensor['location']['latitude']
    lng = sensor['location']['longitude']
    location = {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [lat, lng]
        },
        "properties": {
            "name": sensor['name']
        }
    }
    sensors.update({'webid': sensor['webid']}, {"$set": {'location': location}})
Edit:
As mentioned by Loïc Faure-Lacroix, you also do not need the upsert flag in your case - your code in this case is always updating, and never inserting.
Edit2:
Surrounded _id in quotes for first solution.
Is there any way to use $cond along with ($set, $inc, ...) operators in update? (MongoDB 4.2)
I want to update a field in my document by $inc-ing it with myDataInt if a condition holds, and otherwise keep it as it is:
db.mycoll.update(
    {"_id": "5e9e5da03da783817d231dc4"},
    {"$inc": {
        "my_data_sum": {
            "$cond": [
                {"$ne": ["$snapshot_time", new_snapshot_time]},
                myDataInt,
                0
            ]
        }
    }},
    upsert=True, multi=False
)
However, this gives an error in pymongo:
raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: The dollar ($) prefixed field '$cond' in 'my_data_sum.$cond' is not valid for storage.
Any idea how to avoid using find() before the update in this case?
Update:
If I use the approach that Joe mentioned, an exception is raised in PyMongo (v3.10.1) due to passing a 'list' instead of a 'dict' as the update parameter of update_many():
from pymongo import MongoClient

db = MongoClient()['mydb']
db.mycoll.update_many(
    {"_id": "5e9e5da03da783817d231dc4"},
    [{"$set": {
        "my_data_sum": {
            "$sum": [
                "$my_data_sum",
                {"$cond": [
                    {"$ne": ["$snapshot_time", new_snapshot_time]},
                    myDataInt,
                    0
                ]}
            ]
        }
    }}],
    upsert=True
)
That ends up with this error:
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 1076, in update_many session=session),
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 856, in _update_retryable _update, session)
File "/usr/local/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1491, in _retryable_write return self._retry_with_session(retryable, func, s, None)
File "/usr/local/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session return func(session, sock_info, retryable)
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 852, in _update retryable_write=retryable_write)
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 823, in _update _check_write_command_response(result)
File "/usr/local/lib64/python3.6/site-packages/pymongo/helpers.py", line 221, in _check_write_command_response _raise_last_write_error(write_errors)
File "/usr/local/lib64/python3.6/site-packages/pymongo/helpers.py", line 203, in _raise_last_write_error raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: Modifiers operate on fields but we found type array instead. For example: {$mod: {<field>: ...}} not {$set: [ { $set: { my_data_sum: { $sum: [ "$my_data_sum", { $cond: [ { $ne: [ "$snapshot_time", 1586910283 ] }, 1073741824, 0 ] } ] } } } ]}
If you are using MongoDB 4.2, you can use aggregation operators with updates. $inc is not an aggregation operator, but $sum is. To specify a pipeline, pass an array as the second argument to update:
db.coll.update(
    {"_id": "5e9e5da03da783817d231dc4"},
    [{"$set": {
        "my_data_sum": {
            "$sum": [
                "$my_data_sum",
                {"$cond": [
                    {"$ne": ["$snapshot_time", new_snapshot_time]},
                    myDataInt,
                    0
                ]}
            ]
        }
    }}],
    {upsert: true, multi: false}
)
After spending some time searching online, I figured out that the update_many(), update_one(), and update() methods of the Collection object in PyMongo do not accept a list as the update parameter, so they cannot express the new aggregation-pipeline form of the update operation in MongoDB 4.2+. (At least this option is not available in PyMongo v3.10 yet.)
However, it looks like I can use the command method of the Database object in PyMongo, which is the driver's equivalent of MongoDB's runCommand, and it worked just fine for me:
from pymongo import MongoClient

db = MongoClient()['mydb']
result = db.command(
    {
        "update": "mycoll",
        "updates": [{
            "q": {"_id": "5e9e5da03da783817d231dc4"},
            "u": [
                {"$set": {
                    "my_data_sum": {
                        "$sum": [
                            "$my_data_sum",
                            {"$cond": [
                                {"$ne": ["$snapshot_time", new_snapshot_time]},
                                myDataInt,
                                0
                            ]}
                        ]
                    }
                }}
            ],
            "upsert": True,
            "multi": True
        }],
        "ordered": False
    }
)
The command method of the Database object takes a dict containing the full update command as its first argument, and the aggregation pipeline can be included inside that dict (q is the update query, and u defines the update to apply).
result is the acknowledgement returned by MongoDB as a dictionary, containing fields such as 'n', 'nModified', 'upserted', and 'writeErrors'.
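For example, a quick sanity check on that acknowledgement (a sketch using the field names above):

# 'n' is the number of documents matched, 'nModified' the number changed
print(result.get("n"), result.get("nModified"))
if result.get("writeErrors"):
    raise RuntimeError(result["writeErrors"])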
https://mongoplayground.net/p/1AklFKuhFi6
[
    {
        "id": 1,
        "like": 3
    },
    {
        "id": 2,
        "like": 1
    }
]
Let value = 1 to decrement the counter; if you want to increment instead, use value = -1 * value.
db.collection.aggregate([
    {
        "$match": {
            "id": 1
        }
    },
    {
        "$set": {
            "count": {
                $cond: {
                    if: { $gt: ["$like", 0] },
                    then: { $subtract: ["$like", value] },
                    else: 0
                }
            }
        }
    }
])
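Note that aggregate by itself only returns the transformed documents; it does not write anything back. On MongoDB 4.2+ you can pass the same $set stage as an update pipeline to persist the change (a PyMongo sketch; recent driver versions accept a pipeline as the update argument):

db.collection.update_one(
    {"id": 1},
    [{"$set": {
        "count": {
            "$cond": {
                "if": {"$gt": ["$like", 0]},
                "then": {"$subtract": ["$like", value]},
                "else": 0
            }
        }
    }}]
)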
I am a newbie to MongoDB. I want to retrieve the data of certain fields on a specified date from MongoDB using Python. My MongoDB collection looks like this:
{
    "_id" : ObjectId("5d9d7eec7c6265a42e352d6d"),
    "browser" : "Chrome",
    "countryCode" : "IN",
    "Page" : "http://192.168.1.34/third.html",
    "date" : "2019-10-09T10:32:08.438660"
}
{
    "_id" : ObjectId("5d9d7eec7c6265a42e352d6e"),
    "browser" : "Chrome",
    "countryCode" : "IN",
    "Page" : "http://192.168.1.14/fourth.html",
    "date" : "2019-10-12T10:32:08.438662"
}
and so on
I retrieved the data by using the following query in the mongo shell:
db.collection_name.find({"date": {'$gte': "2019-10-09T10:32:08.438660", '$lte': "2019-10-10T10:32:08.438661"}},{}, {Page:[], _id:0})
I want to get that data using pymongo in Python. Here's the code I tried:
from pymongo import MongoClient
import pymongo
from bson.raw_bson import RawBSONDocument

myclient = pymongo.MongoClient(
    "mongodb://localhost:27017/", document_class=RawBSONDocument)
mydb = myclient['smackcoders']
mycol = mydb['logs']

from_date = "2019-10-09T10:32:08.438663"
to_date = "2019-10-12T10:32:08.438671"

for doc in mycol.find({"date": {'$gte': from_date, '$lte': to_date}}, {}, {'Page': [], '_id': 0}):
    print(doc)
It shows this error:
Traceback (most recent call last):
File "temp3.py", line 20, in <module>
for doc in mycol.find({"date": {'$gte': from_date, '$lte': to_date}}, {}, {'url': [], '_id': 0}):
File "/home/paulsteven/.local/lib/python3.7/site-packages/pymongo/collection.py", line 1460, in find
return Cursor(self, *args, **kwargs)
File "/home/paulsteven/.local/lib/python3.7/site-packages/pymongo/cursor.py", line 145, in __init__
raise TypeError("skip must be an instance of int")
TypeError: skip must be an instance of int
Output Required:
["http://192.168.1.34/third.html","http://192.168.1.14/fourth.html",.....and goes on for a specified date]
I don't know how to make it work. The query works in the mongo shell, but it fails in Python. Help me with some solutions.
You've got 3 parameters in your find function; you probably only need 2: a query and a projection. The third parameter is skip, which is why it's failing with that error.
The mongo shell only takes 2 parameters, so it is likely ignoring the third, which is why it looks like it is working.
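A sketch of the corrected call, with the projection folded into the second argument (field names taken from the question):

for doc in mycol.find(
        {"date": {"$gte": from_date, "$lte": to_date}},
        {"Page": 1, "_id": 0}):
    print(doc["Page"])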
How can I create external tables (federated data sources) in BigQuery using Python (google-cloud-bigquery)?
I know you can use bq commands like this, but that is not how I want to do it:
bq mk --external_table_definition=path/to/json tablename
bq update tablename path/to/schemafile
with external_table_definition as:
{
    "autodetect": true,
    "maxBadRecords": 9999999,
    "csvOptions": {
        "skipLeadingRows": 1
    },
    "sourceFormat": "CSV",
    "sourceUris": [
        "gs://bucketname/file_*.csv"
    ]
}
and a schemafile like this:
[
    {
        "mode": "NULLABLE",
        "name": "mycolumn1",
        "type": "INTEGER"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn2",
        "type": "STRING"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn3",
        "type": "STRING"
    }
]
Thank you for your help!
Lars
table_id = 'table1'
table = bigquery.Table(dataset_ref.table(table_id), schema=schema)
external_config = bigquery.ExternalConfig('CSV')
external_config = {
    "autodetect": true,
    "options": {
        "skip_leading_rows": 1
    },
    "source_uris": [
        "gs://bucketname/file_*.csv"
    ]
}
table.external_data_configuration = external_config
table = client.create_table(table)
The schema format is (note that SchemaField takes a mode argument rather than is_nullable):
schema = [
    bigquery.SchemaField(name='mycolumn1', field_type='INTEGER', mode='NULLABLE'),
    bigquery.SchemaField(name='mycolumn2', field_type='STRING', mode='NULLABLE'),
    bigquery.SchemaField(name='mycolumn3', field_type='STRING', mode='NULLABLE'),
]
I know this is well after the question has been asked and answered, but the accepted answer above does not work. I attempted to do the same thing you are describing, and additionally tried to use the same approach to update an existing external table that had gained some new columns. This would be the correct snippet to use, assuming you have that JSON schema stored somewhere like /tmp/schema.json:
[
    {
        "mode": "NULLABLE",
        "name": "mycolumn1",
        "type": "INTEGER"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn2",
        "type": "STRING"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn3",
        "type": "STRING"
    }
]
You should simply need to have the following if you already have the API representation of the options you want to add to the external table.
from google.cloud import bigquery

client = bigquery.Client()

# dataset must exist first
dataset_name = 'some_dataset'
dataset_ref = client.dataset(dataset_name)
table_name = 'tablename'

# Or wherever your json schema lives
schema = client.schema_from_json('/tmp/schema.json')

external_table_options = {
    "autodetect": True,
    "maxBadRecords": 9999999,
    "csvOptions": {
        "skipLeadingRows": 1
    },
    "sourceFormat": "CSV",
    "sourceUris": [
        "gs://bucketname/file_*.csv"
    ]
}
external_config = bigquery.ExternalConfig.from_api_repr(external_table_options)

table = bigquery.Table(dataset_ref.table(table_name), schema=schema)
table.external_data_configuration = external_config

client.create_table(
    table,
    # Now you can create the table safely with this option
    # so that it does not fail if the table already exists
    exists_ok=True
)

# And if you seek to update the table's schema and/or its
# external options through the same script then use
client.update_table(
    table,
    # As a side note, this portion of the code had me confounded for hours.
    # I could not for the life of me figure out that "fields" did not point
    # to the table's columns, but pointed to the `google.cloud.bigquery.Table`
    # object's attributes. IMHO, the naming of this parameter is horrible
    # given "fields" are already a thing (i.e. `SchemaField`s).
    fields=['schema', 'external_data_configuration']
)
In addition to setting the external table configuration using the API representation, you can set all of the same attributes by assigning to them directly on the bigquery.ExternalConfig object itself. So this is another approach for just the external_config portion of the code above.
external_config = bigquery.ExternalConfig('CSV')
external_config.autodetect = True
external_config.max_bad_records = 9999999
external_config.options.skip_leading_rows = 1
external_config.source_uris = ["gs://bucketname/file_*.csv"]
I must again, however, raise some frustration with the Google documentation. The bigquery.ExternalConfig.options attribute claims that it can be set with a dictionary:
>>> from google.cloud import bigquery
>>> help(bigquery.ExternalConfig.options)
Help on property:
Optional[Dict[str, Any]]: Source-specific options.
but that is completely false. As you can see above, the Python object attribute names and the API representation names of those same attributes are slightly different. Either way you try it, though, if you have a dict of the source-specific options (e.g. CSVOptions, GoogleSheetsOptions, BigtableOptions, etc.) and attempt to pass that dict as the options attribute, it laughs in your face and says mean things like this:
>>> from google.cloud import bigquery
>>> external_config = bigquery.ExternalConfig('CSV')
>>> options = {'skip_leading_rows': 1}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> options = {'skipLeadingRows': 1}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> options = {'CSVOptions': {'skip_leading_rows': 1}}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> options = {'CSVOptions': {'skipLeadingRows': 1}}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
The workaround was iterating over the options dict and setting each attribute on the options object, which worked well for me. Pick your favorite approach from above. I have tested all of this code and will be using it for some time.
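To be concrete, here is a minimal sketch of that workaround; the csv_options dict is an assumption, so substitute whatever snake_case CSVOptions attributes you actually need:

from google.cloud import bigquery

external_config = bigquery.ExternalConfig('CSV')
csv_options = {'skip_leading_rows': 1, 'allow_jagged_rows': True}  # assumed options
for name, value in csv_options.items():
    # set each source-specific option directly on the CSVOptions object
    setattr(external_config.options, name, value)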
I have this data in a mongo database,
{
    "_id": {
        "$oid": "5654a8f0d487dd1434571a6e"
    },
    "ValidationDate": {
        "$date": "2015-11-24T13:06:19.363Z"
    },
    "DataRaw": " WL 00100100012015-08-28 02:44:17+0000+ 16.81 8.879 1084.00",
    "ReadingsAreValid": true,
    "locationID": " WL 001",
    "Readings": {
        "pH": {
            "value": 8.879
        },
        "SensoreDate": {
            "value": {
                "$date": "2015-08-28T02:44:17.000Z"
            }
        },
        "temperature": {
            "value": 16.81
        },
        "Conductivity": {
            "value": 1084
        }
    },
    "HMAC": "ecb98d73fcb34ce2c5bbcc9c1265c8ca939f639d791a1de0f6275e2d0d71a801"
}
My goal is to count the documents whose temperature, pH, and conductivity values satisfy a given range over the last 30 days, but I am getting an error that I have not been able to resolve while searching online. Here is my code.
import datetime
from pymongo import MongoClient

def total_data_push():
    data = MongoClient().cloudtest.test_5_27
    now = datetime.datetime.utcnow()
    last_30d = now - datetime.timedelta(days=30)
    last_year = now.replace(year=now.year - 1)
    since_last_month = data.find({"ReadingsAreValid": False}, {"ValidationDate": {"$gte": last_30d}},
                                 {"Readings.temperature.value": {"$gt": 1.0}}).count()
    print since_last_month

def main():
    total_data_push()

if __name__ == "__main__":
    main()
When I run the script without the ValidationDate piece, it returns correct values, but adding this component to restrict to the last 30 days returns the following error:
Traceback (most recent call last):
File "total_data_push.py", line 29, in <module>
main()
File "total_data_push.py", line 26, in main
total_data_push()
File "total_data_push.py", line 17, in total_data_push
{"Readings.temperature.value": {"$gt": 1.0}}).count()
File "/Library/Python/2.7/site-packages/pymongo/collection.py", line 866, in find
return Cursor(self, *args, **kwargs)
File "/Library/Python/2.7/site-packages/pymongo/cursor.py", line 90, in __init__
raise TypeError("skip must be an instance of int")
TypeError: skip must be an instance of int
What am I really missing here? Thanks for your help in advance.
As @BenCr said, if you look at the find signature:
find(filter=None, projection=None, skip=0, limit=0,
     no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE,
     sort=None, allow_partial_results=False, oplog_replay=False,
     modifiers=None, manipulate=True)
The filter is the first parameter and should contain all of your conditions, like the following:
since_last_month = db.data.find({
    "ReadingsAreValid": False,
    "ValidationDate": {"$gte": last_30d},
    "Readings.temperature.value": {"$gte": 1.0}
}).count()
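As a side note, on newer PyMongo releases Cursor.count() is deprecated; the equivalent (a sketch with the same filter) is count_documents on the collection:

since_last_month = db.data.count_documents({
    "ReadingsAreValid": False,
    "ValidationDate": {"$gte": last_30d},
    "Readings.temperature.value": {"$gte": 1.0}
})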
I am working on pymongo and this is my document:
{
    "_id": ObjectId("51211b57f07ddaa377000000"),
    "assignments": {
        "0": {
            "0": {
                "_id": ObjectId("5120dd7400a4453d58a0d0ec")
            },
            "1": {
                "_id": ObjectId("5120dd8e00a4453d58a0d0ed")
            },
            "2": {
                "_id": ObjectId("5120ddad00a4453d58a0d0ee")
            }
        }
    },
    "password": "my_passwd",
    "username": "john"
}
I would like to unset the "assignments" property of all such docs.
I was able to achieve this on the mongo shell by doing:
db.users.update({}, {$unset: {"assignments": 1}}, false, true)
i.e., I passed the upsert and multi flags as the last two parameters to the update function on the users collection.
However I did this with pymongo:
db.users.update({}, {"$unset": {"assignments": 1}}, False, True)
But the Python interpreter threw an error as follows:
File "notes/assignment.py", line 34, in <module>
db.users.update({}, {"$unset": {"assignments": 1}}, False, True)
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 481, in update
check_keys, self.__uuid_subtype), safe)
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 852, in _send_message
rv = self.__check_response_to_last_error(response)
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 795, in __check_response_to_last_error
raise OperationFailure(details["err"], details["code"])
pymongo.errors.OperationFailure: Modifiers and non-modifiers cannot be mixed
Where am I going wrong?
The problem is that the two flags you are passing in aren't upsert and multi. Based on the documentation of PyMongo's Collection.update (found here), it looks like you might be passing in values for the upsert and manipulate options, although I am not certain.
All you have to do to solve this is use one of Python's most awesome features: named arguments. By specifying which options you are passing to update, you add clarity to your code in addition to making sure accidents like this don't happen.
In this case, we want to pass the options upsert=False and multi=True.
db.users.update({}, { "$unset": { "assignments": 1 } }, upsert=False, multi=True)
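For what it's worth, on PyMongo 3.x the same operation is spelled with update_many, which makes the multi-document intent explicit (a sketch):

db.users.update_many({}, {"$unset": {"assignments": 1}})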