Elasticsearch: fix date field value (change field value from int to string) - Python

I used Python to ship data from MongoDB to Elasticsearch.
There is a timestamp field named update_time, with a mapping like:
"update_time" : {
"type": "date",
"format": "yyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
epoch_millis is used for the timestamp.
After everything was done, I used a range query to find docs in a date range, but nothing was returned. After some research, I thought the problem was: a Python timestamp int(time.time()) is 10 digits, but Elasticsearch expects 13 digits (milliseconds). Official example:
PUT my_index/my_type/2?timestamp=1420070400000.
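For reference, the two resolutions look like this in Python:

import time

int(time.time())         # 10 digits: seconds, e.g. 1420070400
int(time.time() * 1000)  # 13 digits: milliseconds, e.g. 1420070400000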
So I tried to update the update_time field by multiplying it by 1000:
{
    "script": {
        "inline": "ctx._source.update_time=ctx._source.update_time*1000"
    }
}
Unfortunately, I found that all update_time values had become negative. Then it occurred to me that Java's max int value is pow(2,31) - 1, much lower than 1420070400000. So I guess the timestamp field should not be an int?
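A quick check of the overflow in Python (the wrapped value below is what a 32-bit signed int would hold):

MAX_INT32 = 2**31 - 1                      # 2147483647
seconds = 1420070400
print(seconds * 1000 > MAX_INT32)          # True: too big for a 32-bit int
wrapped = (seconds * 1000) & 0xFFFFFFFF    # keep only the low 32 bits
print(wrapped - 2**32 if wrapped >= 2**31 else wrapped)  # -1563774976, negative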
Then I wanted to update the field value from int to string (I don't want to change the field mapping type; I know changing a mapping type needs a reindex, but here only the value needs to change).
But I can't figure out what script to use; the official scripting documentation doesn't mention this.

I didn't know Groovy was a language; I thought it was only used for math methods, because that was all the ES docs showed.
The solution is simple: just cast the int to a long.
curl -XPOST http://localhost:9200/my_index/_update_by_query -d '
{
    "script": {
        "inline": "ctx._source.update_time=(long)ctx._source.update_time*1000"
    }
}'
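To avoid the mismatch at ingest time, the shipping script can write 13-digit epoch_millis values directly. A minimal sketch, assuming the elasticsearch-py client (the index and type names are illustrative):

import time
from elasticsearch import Elasticsearch

es = Elasticsearch()

def ship(doc):
    # Convert the 10-digit seconds timestamp to 13-digit milliseconds
    # so it matches the epoch_millis format in the date mapping.
    doc["update_time"] = int(doc["update_time"]) * 1000
    es.index(index="my_index", doc_type="my_type", body=doc)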

Related

How to set type of field when inserting into mongodb

So I am querying an API, receiving data, and then storing it in MongoDB.
All was working fine so far, except now I have started using Mongo's aggregation pipeline. During this I realized that Mongo is storing the numeric data as strings. Hence my aggregation pipeline won't work when I do numerical computation such as calculating averages, because Mongo sees the values as strings.
How can I set the type of a field during insert, such that I specify that this is a float etc.?
What I have tried so far is the code below, but it does not work, because the mongo shell complains that the field name starts with a number:
db.weeklycol.find().forEach(function(ch) {
    db.weeklycol.update(
        {"_id": ch._id},
        {"$set": {
            "4_close": parseInt(ch.4_close)
        }}
    );
});
To access a property whose name is unusual, use bracket notation:
ch['4_close']
Then, about saving numbers, I made a test:
> db.test.insertOne({_id:1, field: 2})
{ "acknowledged" : true, "insertedId" : 1 }
> db.test.find({_id:1})
{ "_id" : 1, "field" : 2 }
The number seems to be saved correctly. Can you please post an exact example of code with some dummy values, where the inserted object has a property with a numeric value and the stored document has that value turned into a string?
I have managed to resolve this with the code below. I change the variable type before inserting: I created this function, and I call it on the whole dictionary just before inserting. If a value is a number it gets converted; otherwise it is left alone.
I have a similar function that converts dates as well; instead of float on line 4, it uses a date conversion.
def convertint(bdic):
    for key, value in bdic.items():
        try:
            bdic[key] = float(value)
        except (TypeError, ValueError):
            pass  # not a number, leave the value as-is
    return bdic
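For example (a sketch with illustrative names), calling it on the dictionary just before the insert:

from pymongo import MongoClient

db = MongoClient()["mydb"]

record = {"symbol": "ABC", "4_close": "101.25", "volume": "5300"}
db.weeklycol.insert_one(convertint(record))
# stored as {"symbol": "ABC", "4_close": 101.25, "volume": 5300.0}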

MongoDB $set changing datatypes

I have a MongoDB 3.2 instance running on Ubuntu 14.04, in a single-node setup. Last night I performed a migration where I ran this code for ~1400 documents in a collection:
for r in responses:  # find cursor with ~1400 documents in it
    database.responses.update_one({
        "_id": r["_id"]
    }, {
        "$set": {
            "client_id": client["_id"]
        }
    })
After the migration, some of the fields in my response documents in the responses collection had switched from DateObject types to Int32 timestamp representations. Some of the Int32 fields had changed to Doubles. These fields were not updated in my $set statement (obviously). This affected only a small subset of the cursor (~75 documents).
This caused catastrophic failure as our models expected those fields to have data types they no longer had. Can someone explain to me what went wrong here?
After reading your question I got curious about what went wrong. My guess is that if you had explicitly set the type in the creation/update of these records, you would not have faced the issue, for example:
from bson import ObjectId  # coerce types explicitly

for r in responses:  # find cursor with ~1400 documents in it
    database.responses.update_one({
        "_id": r["_id"]
    }, {
        "$set": {
            # wrap the value so the stored BSON type is explicit
            "client_id": ObjectId(client["_id"])
        }
    })
My guess is that somewhere else in your code Python has changed the types (maybe some code that tries to automatically infer the type?).
I am pretty sure that before your "for r in responses:" loop there is something else that is trying to detect the types of the fields. Is this the case? Can you provide the code that runs before the snippet you posted?
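To illustrate the inference (a sketch, not the asker's code): pymongo derives the BSON type from the Python type at save time, so any upstream conversion of a value silently changes the stored type with it:

import datetime
import time
from pymongo import MongoClient

db = MongoClient().test

db.demo.insert_one({
    "as_date": datetime.datetime.utcnow(),   # stored as BSON Date
    "as_int": int(time.time()),              # stored as Int32/Int64
    "as_double": time.time(),                # float, stored as Double
})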

Property has meaning INDEX_VALUE, but is not expected to have any meaning

I am receiving an error from BigQuery when attempting to load data from a cloud backup.
From Google BigQuery Console
Errors:
query: Property established_year of type integer has meaning INDEX_VALUE, but is not expected to have any meaning. (error code: invalidQuery)
Job ID: csgapi:bquijob_736ab47b_156b2f2a4f8
I have looked up the model that the error references, and the property is of type ndb.IntegerProperty. When I then look at the Property class that ndb.IntegerProperty extends, I see a comment where new_p.set_meaning(entity_pb.Property.INDEX_VALUE) is set (docs here) that says:
# Projected properties have the INDEX_VALUE meaning and only contain
# the original property's name and value.
How does my property of type ndb.IntegerProperty come to have this INDEX_VALUE meaning set, and what must I do to fix it?
EDIT
Looking further into this, I was curious as to why the property was even being included on this item. According to the model, I have two elements that could be the offending properties. The first is company, an ndb.KeyProperty, which does not seem likely because it would only create a reference. The other is a generic property, company_base, which is not indexed and is mapped in at query time; it is not saved to the datastore. We have 14 GB of data for this specific kind. I have attempted to loop through all the elements within the backup to search for the field company_base, but I am not able to find the elements that are in the file.
I have looked at other kinds that use this same structure, and their schemas do not appear to have the same "company_base" within them. They too should not save that element to the datastore.
If the elements are appearing in the backup schema, does that mean that at some point in time these non-indexed properties were saved to the datastore, and thus appear in the backup?
The generic field "company_base" from the backup schema:
field {
  name: "company_base"
  type {
    is_list: false
    embedded_schema {
      kind: "Company"
      field {
        name: "ambest_outlook"
        type {
          is_list: false
          primitive_type: STRING
        }
      }
      field {
        name: "name_full"
        type {
          is_list: false
          primitive_type: STRING
        }
      }
      field {
        name: "established_year"
        type {
          is_list: false
          primitive_type: INTEGER
        }
      }
      field {
        name: "ambest_rating"
        type {
          is_list: false
          primitive_type: STRING
        }
      }
      field {
        name: "parent_company"
        type {
          is_list: false
          primitive_type: REFERENCE
        }
      }
      field {
        name: "sp_rating"
        type {
          is_list: false
          primitive_type: STRING
        }
      }
    }
  }
}
Within the schema (visible only on the full backup), there was a single entity with the wrong type. It wouldn't appear normally due to NDB's structure. The only way I was able to fix the issue was to iterate over the entities and remove the offending properties to ensure correctness.
It took a long time for the system to fix via iteration, but the items were not searchable either way.
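A rough sketch of that cleanup, assuming the kind can be loaded as an ndb.Expando so the stray dynamic property can be deleted (the names are illustrative, not from the original code):

from google.appengine.ext import ndb

class Company(ndb.Expando):
    pass

def strip_stray_property(prop_name="company_base"):
    # Walk every entity, drop the offending property, and re-save.
    for entity in Company.query():
        if getattr(entity, prop_name, None) is not None:
            delattr(entity, prop_name)
            entity.put()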

Python MongoDB retrieve value of ISODate field

I am writing a script in Python that will query a MongoDB database and parse the results into a format that can be imported into a relational database.
The data is stored in an associative array. I am able to query most of the fields by using dot notation, such as $status.state.
Issue:
The issue is that the ISODate field $last_seen is not returning a value when attempting to use dot notation.
# Last seen dot notation does not return a value
"u_updated_timestamp": "$last_seen.date"
Here is the data structure:
{
    "status" : {
        "state" : "up"
    },
    "addresses" : {
        "ipv4" : "192.168.1.1"
    },
    "last_seen" : ISODate("2016-04-29T14:06:17.441Z")
}
Here is the code I am starting with. All of the other fields are returning in the correct format; however, the last_seen ISO date field is not returning any value at all. What other steps are required to retrieve the value?
I tried $dateToString, but it did not work (we are running pymongo 2.7).
computers = db['computer'].aggregate([
    {"$project" : {
        "u_ipv4": "$addresses.ipv4",
        "u_status": "$status.state",
        # Last seen dot notation does not return a value
        "u_updated_timestamp": "$last_seen.date"
    }}
])
I also tried simply $last_seen, but that returns key and value, and I only need the value.
UPDATE: The desired format is flexible. It could be a Unix timestamp or mm-dd-yyyy; any format would be acceptable. The main issue is that no date value is returned at all with the query as it stands.
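One possible approach (an assumption, not from the thread): since last_seen holds a plain date value rather than an embedded document, $last_seen.date resolves to nothing. Projecting the raw field and formatting it client-side should work, because pymongo returns ISODate values as datetime.datetime:

import datetime

result = db['computer'].aggregate([
    {"$project": {
        "u_ipv4": "$addresses.ipv4",
        "u_status": "$status.state",
        "u_updated_timestamp": "$last_seen"  # project the value itself
    }}
])

# pymongo 2.7 returns the command response as a dict with a 'result' list
for doc in result.get('result', []):
    ts = doc.get("u_updated_timestamp")
    if isinstance(ts, datetime.datetime):
        doc["u_updated_timestamp"] = ts.strftime("%m-%d-%Y")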

Simple MongoDB query slow

I am new to MongoDB. I am trying to write some data to a Mongo database from a Python script. The data structure is simple:
{"name": name, "first": "2016-03-01", "last": "2016-03-01"}
I have a script that queries whether the "name" exists; if yes, it updates the "last" date, otherwise it creates the document.
if db.my_collection.find_one({"name": the_name}):
And the size of the data is actually very small: under 5 MB and fewer than 150k records.
It was fast at first (e.g. the first 20,000 records) and then got slower and slower. I checked the profiler output; some queries took more than 50 milliseconds, but I don't see anything abnormal about those records.
Any ideas?
Update 1:
Seems there is no index for the "name" field:
> db.my_collection.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "domains.my_collection"
    }
]
First, you should check whether the collection has an index on the "name" field. See the output of the following command in the mongo CLI:
db.my_collection.getIndexes();
If there is no index, then create one (note: on a production environment you should create the index in the background):
db.my_collection.createIndex({name:1},{unique:true});
And if you want to insert a document when it does not exist, or update one field when it does, you can do that in one step without pre-querying. Use the update command with the upsert option and the $set/$setOnInsert operators (see https://docs.mongodb.org/manual/reference/operator/update/setOnInsert/):
db.my_collection.update(
    {name: "the_name"},
    {
        $set: {last: "current_date"},
        $setOnInsert: {first: "current_date"}
    },
    {upsert: true}
);
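The same upsert from the Python side (a sketch; update_one assumes pymongo 3, and the database name and sample value are illustrative):

from datetime import datetime
from pymongo import MongoClient

db = MongoClient()["domains"]
the_name = "example.com"  # illustrative
today = datetime.utcnow().strftime("%Y-%m-%d")

db.my_collection.create_index("name", unique=True, background=True)
db.my_collection.update_one(
    {"name": the_name},
    {"$set": {"last": today}, "$setOnInsert": {"first": today}},
    upsert=True,
)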
