Add Timestamp to ElasticSearch with Elasticsearch-py using Bulk-API

I'm trying to add a timestamp to my data, bulk index it with elasticsearch-py, and then display the data in Kibana.
My data shows up in Kibana, but my timestamp is not being used. When I go to the "Discover" tab after configuring my index pattern, I get 0 results (yes, I tried adjusting the search time range).
Here is what my bulk index JSON looks like:
{'index':
{'_timestamp': u'2015-08-11 14:18:26',
'_type': 'webapp_fingerprint',
'_id': u'webapp_id_redacted_2015_08_13_12_39_34',
'_index': 'webapp_index'
}
}
****JSON DATA HERE***
This will be accepted by elasticsearch and will get imported into Kibana, but the _timestamp field will not actually be indexed (it does show up in the dropdown when configuring an index pattern under "Time-field name").
I also tried formatting the metaFields like this:
{'index': {
'_type': 'webapp_fingerprint',
'_id': u'webapp_id_redacted_2015_08_13_12_50_04',
'_index': 'webapp_index'
},
'source': {
'_timestamp': {
'path': u'2015-08-11 14:18:26',
'enabled': True,
'format': 'YYYY-MM-DD HH:mm:ss'
}
}
}
This also doesn't work.
Finally, I tried including the _timestamp field within the index and applying the format, but I got an error with elasticsearch.
{'index': {
'_timestamp': {
'path': u'2015-08-11 14:18:26',
'enabled': True,
'format': 'YYYY-MM-DD HH:mm:ss'
},
'_type': 'webapp_fingerprint',
'_id': u'webapp_id_redacted_2015_08_13_12_55_53',
'_index': 'webapp_index'
}
}
The error is:
elasticsearch.exceptions.TransportError:
TransportError(500,u'IllegalArgumentException[Malformed action/metadata
line [1], expected a simple value for field [_timestamp] but found [START_OBJECT]]')
Any help someone can provide would be greatly appreciated. I apologize if I haven't explained the issue well enough. Let me know if I need to clarify more. Thanks.

Fixed my own problem. Basically, I needed to add mappings for the timestamp when I created the index.
request_body = {
"settings" : {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings" : {
"_default_":{
"_timestamp":{
"enabled":"true",
"store":"true",
"path":"plugins.time_stamp.string",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
}
print("creating '%s' index..." % (index_name))
res = es.indices.create(index = index_name, body = request_body)
print(" response: '%s'" % (res))
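With a mapping like the one above, the timestamp is read from a field inside each document rather than from the bulk metadata line. Here is a minimal sketch of building bulk actions that match it. The helper name build_actions and the sample document are hypothetical. The action layout is the dict shape accepted by elasticsearch.helpers.bulk, and the _type key only applies to older Elasticsearch versions that still use mapping types.

```python
from datetime import datetime

# Python equivalent of the mapping format "yyyy-MM-dd HH:mm:ss"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_actions(docs, index_name, doc_type):
    """Yield one bulk action per (doc_id, doc) pair (hypothetical helper)."""
    for doc_id, doc in docs:
        # Fail early if the timestamp does not match the mapped format;
        # strptime raises ValueError on a mismatch.
        datetime.strptime(doc["plugins"]["time_stamp"]["string"], TIMESTAMP_FORMAT)
        yield {
            "_index": index_name,
            "_type": doc_type,
            "_id": doc_id,
            "_source": doc,  # the timestamp travels inside the document body
        }


# Sample document (illustrative only); the nested field matches the
# mapping path "plugins.time_stamp.string".
docs = [
    ("webapp_id_example", {"plugins": {"time_stamp": {"string": "2015-08-11 14:18:26"}}}),
]
actions = list(build_actions(docs, "webapp_index", "webapp_fingerprint"))
print(actions)
```

The resulting list could then be passed to helpers.bulk(es, actions), assuming es is the same client used to create the index.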

In recent versions of Elasticsearch, sending the document through the regular PUT/POST API with an ISO 8601 timestamp string (as produced by isoformat()) should work.
import datetime
import json
import requests
query = json.dumps(
{
"createdAt": datetime.datetime.now().replace(microsecond=0).isoformat(),
}
)
response = requests.post("https://search-XYZ.com/your-index/log", data=query,
headers={'Content-Type': 'application/json'})
print(response)
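To see what the timestamp string in that payload looks like, here is a small sketch that uses a fixed datetime instead of now(). The field name createdAt follows the answer; the sample date is only illustrative.

```python
import datetime
import json

# Fixed moment so the output is reproducible (illustrative value).
moment = datetime.datetime(2015, 8, 11, 14, 18, 26, 123456)

# Dropping microseconds yields a plain "YYYY-MM-DDTHH:MM:SS" string.
created_at = moment.replace(microsecond=0).isoformat()

payload = json.dumps({"createdAt": created_at})
print(payload)
```

Note that a naive datetime carries no time zone, so the string has no offset suffix. If the clocks involved are not all on UTC, a timezone-aware datetime may be the safer choice.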

Related

Notion database does not send name of linked page

I am using the Notion API to fetch data from Notion.
I am using
"Notion-Version": "2022-06-28"
The code below is my request:
readUrl = f"https://api.notion.com/v1/databases/{self.db_id}/query"
res = requests.request("POST", readUrl, headers=self.headers)
self.data = res.json()
Even though Sprints and Tickets have names, the response does not include them.
The data looks like this:
{
id: 'Rv%5Cn',
type: 'rollup',
rollup: { type: 'array', array: [], function: 'show_original' }
}
and ID is the same for all of them.
I tried one trick, created a new column, and used the CONCAT function, but this time they send data as below
{'id': 'rGlG', 'type': 'formula', 'formula': {'type': 'string', 'string': None}}
even though they have a name.
Response
"Sprint (contact)":{
"id":"?`:a",
"type":"formula",
"formula":{
"type":"string",
"string":null
}
},
"Sprint":{
"id":"?}m;",
"type":"rollup",
"rollup":{
"type":"array",
"array":[
],
"function":"show_original"
}
}

CDK WAF Python Multiple Statement values error

I have AWS WAF CDK that is working with rules, and now I'm trying to add a rule in WAF with multiple statements, but I'm getting this error:
Resource handler returned message: "Error reason: You have used none or multiple values for a field that requires exactly one value., field: STATEMENT, parameter: Statement (Service: Wafv2, Status Code: 400, Request ID: 6a36bfe2-543c-458a-9571-e929142f5df1, Extended Request ID: null)" (RequestToken: b751ae12-bb60-bb75-86c0-346926687ea4, HandlerErrorCode: InvalidRequest)
My Code:
{
'name': 'ruleName',
'priority': 3,
'statement': {
'orStatement': {
'statements': [
{
'iPSetReferenceStatement': {
'arn': 'arn:myARN'
}
},
{
'iPSetReferenceStatement': {
'arn': 'arn:myARN'
}
}
]
}
},
'action': {
'allow': {}
},
'visibilityConfig': {
'sampledRequestsEnabled': True,
'cloudWatchMetricsEnabled': True,
'metricName': 'ruleName'
}
},

There are two things going on there:
Firstly, your capitalization is off. iPSetReferenceStatement cannot be parsed and creates an empty statement reference. The correct key is ipSetReferenceStatement.
However, as mentioned here, there is a jsii implementation bug causing some issues with the IPSetReferenceStatementProperty. This causes it not to be parsed properly resulting in a jsii error when synthesizing.
You can fix it by using the workaround mentioned in the post.
Add to your file containing the construct:
import jsii
from aws_cdk import aws_wafv2 as wafv2  # just for clarity, you might already have this imported

@jsii.implements(wafv2.CfnRuleGroup.IPSetReferenceStatementProperty)
class IPSetReferenceStatement:
    @property
    def arn(self):
        return self._arn

    @arn.setter
    def arn(self, value):
        self._arn = value
Then define your ip reference statement as follows:
ip_set_ref_stmnt = IPSetReferenceStatement()
ip_set_ref_stmnt.arn = "arn:aws:..."
ip_set_ref_stmnt_2 = IPSetReferenceStatement()
ip_set_ref_stmnt_2.arn = "arn:aws:..."
Then in the rules section of the webacl, you can use it as follows:
...
rules=[
{
'name': 'ruleName',
'priority': 3,
'statement': {
'orStatement': {
'statements': [
wafv2.CfnWebACL.StatementProperty(
ip_set_reference_statement=ip_set_ref_stmnt
),
wafv2.CfnWebACL.StatementProperty(
ip_set_reference_statement=ip_set_ref_stmnt_2
),
]
}
},
'action': {
'allow': {}
},
'visibilityConfig': {
'sampledRequestsEnabled': True,
'cloudWatchMetricsEnabled': True,
'metricName': 'ruleName'
}
}
]
...
This should synthesize your stack as expected.
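For the capitalization point alone, a small pure-Python helper can rewrite the misspelled key in an existing rule dict before it is handed to CDK. This is only a sketch: the function name fix_statement_keys is hypothetical, it does nothing about the jsii issue, and it simply renames iPSetReferenceStatement to ipSetReferenceStatement wherever it appears.

```python
def fix_statement_keys(node, wrong="iPSetReferenceStatement",
                       right="ipSetReferenceStatement"):
    """Return a copy of a nested dict/list with `wrong` keys renamed to `right`."""
    if isinstance(node, dict):
        return {
            (right if key == wrong else key): fix_statement_keys(value, wrong, right)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fix_statement_keys(item, wrong, right) for item in node]
    return node  # scalars are returned unchanged


# Rule shaped like the one in the question (ARNs are placeholders).
rule = {
    "name": "ruleName",
    "statement": {
        "orStatement": {
            "statements": [
                {"iPSetReferenceStatement": {"arn": "arn:myARN"}},
                {"iPSetReferenceStatement": {"arn": "arn:myARN"}},
            ]
        }
    },
}
fixed = fix_statement_keys(rule)
print(fixed)
```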

Need help regarding relevancy score sorting in elastic search

### Settings for Indexing ###
import requests
import json
import logging
settings = {
'settings': {
'index': {
'number_of_shards': 1,
'number_of_replicas': 1,
'similarity': {
'default': {
'type': 'BM25',
"b": 0.3,
"k1": 0
}
}
}
},
'mappings': {
'properties': {
'title': {
'type': 'text',
}
}
}
}
headers = {'Content-Type': 'application/json'}
response = requests.put('http://localhost:9200/alldocs', data=json.dumps(settings), headers=headers)
response.json()
I am using the above elastic index setting for my search. I am using the BM25 scoring measure here.
When I search for the top 20 results, the scores do not appear to be sorted. Random sampling also shows that some documents outside my top 20 have a better BM25 score (computed with a different BM25 library).
Can anyone help me figure out why this happens and how I can resolve it?
(Elasticsearch documentation says it sorts all the scores by default)
Could it be because of sharding? But then I have explicitly asked the engine to use a single shard here.
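When comparing against another BM25 library, it may help to check how the similarity parameters from the settings above enter the formula. Below is a sketch of the textbook per-term BM25 score, which is not necessarily Lucene's exact implementation. The function name and sample numbers are hypothetical. With k1 = 0, as configured above, term frequency and document length cancel out and each matching term contributes only its IDF, so a library running with different k1 and b values would be expected to rank documents differently.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document."""
    if tf == 0:
        return 0.0  # the term does not occur in the document
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# Two very different documents, scored with the settings from the question.
short_doc = bm25_term_score(tf=1, doc_len=10, avg_doc_len=100, idf=2.0, k1=0, b=0.3)
long_doc = bm25_term_score(tf=50, doc_len=1000, avg_doc_len=100, idf=2.0, k1=0, b=0.3)
print(short_doc, long_doc)

# The same documents with commonly used defaults (k1=1.2, b=0.75).
short_default = bm25_term_score(tf=1, doc_len=10, avg_doc_len=100, idf=2.0)
long_default = bm25_term_score(tf=50, doc_len=1000, avg_doc_len=100, idf=2.0)
print(short_default, long_default)
```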

Specifying a Custom Date Range in Googleads API TargetingIdeaService

I am trying to use the Googleads API TargetingIdeaService to plan a budget and get a forecast for a list of keywords, as in Keyword Planner:
[Keyword Planner screenshot]
I want to specify the date range as next year (for example, 20180301~20190228), but I can't find any parameters or syntax that allow me to do so. The MonthlySearchVolumeAttribute is for selecting data for the past 12 months, so that doesn't seem to work for the situation.
Below is my python script for defining the selector:
selector = {
'searchParameters': [
{
'xsi_type': 'RelatedToQuerySearchParameter',
'queries': keyword_list
},
{
# Location setting (optional)
'xsi_type': 'LocationSearchParameter',
'locations': [{'id': '2840'}] #US
},
{
# Language setting (optional).
'xsi_type': 'LanguageSearchParameter',
'languages': [{'id': '1000'}] #English
},
{
# Network search parameter (optional)
'xsi_type': 'NetworkSearchParameter',
'networkSetting': {
'targetGoogleSearch': True,
'targetSearchNetwork': False,
'targetContentNetwork': False,
'targetPartnerSearchNetwork': True
}
}
],
'ideaType': 'KEYWORD',
'requestType': 'STATS',
'requestedAttributeTypes': ['KEYWORD_TEXT', 'SEARCH_VOLUME','COMPETITION','AVERAGE_CPC'],
# Configure the selector's Paging to limit the number of results returned by a single request
'paging': {
'startIndex': str(offset),
'numberResults': str(PAGE_SIZE)
},
}
Is there anything I can do to achieve my goal? Thanks!

JSON Schema: Input malformed

I'm using Tornado_JSON which is based on jsonschema and there is a problem with my schema definition. I tried fixing it in an online schema validator and the problem seems to lie in "additionalItems": True. True with capital T works for python and leads to an error in the online validator (Schema is invalid JSON.). With true the online validator is happy and the example json validates against the schema, but my python script doesn't start anymore (NameError: name 'true' is not defined). Can this be resolved somehow?
@schema.validate(
"""input_schema={
'type': 'object',
'properties': {
'DB': {
'type': 'number'
},
'values': {
'type': 'array',
'items': [
{
'type': 'array',
'items': [
{
'type': 'string'
},
{
'type': [
'number',
'string',
'boolean',
'null'
]
}
]
}
],
'additionalItems': true
}
}
},
input_example={
'DB': 22,
'values': [['INT', 44],['REAL', 33.33],['CHAR', 'b']]
}"""
)
I changed it according to your comments ( external file with json.loads() ). Perfect. Thank you.

Put the schema in a triple-quoted string or an external file, then parse it with json.loads(). Use the lower-case spelling.
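A minimal sketch of that approach, using only the standard library: the schema is written as real JSON (lower-case true, double quotes) inside a triple-quoted string and converted to a Python dict with json.loads(). The variable name SCHEMA_JSON is hypothetical; the schema body is the one from the question.

```python
import json

# The schema as JSON text: `true` is valid here because this is a string,
# not Python source code.
SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "DB": {"type": "number"},
    "values": {
      "type": "array",
      "items": [
        {
          "type": "array",
          "items": [
            {"type": "string"},
            {"type": ["number", "string", "boolean", "null"]}
          ]
        }
      ],
      "additionalItems": true
    }
  }
}
"""

# json.loads turns JSON `true` into Python `True`.
input_schema = json.loads(SCHEMA_JSON)
print(input_schema["properties"]["values"]["additionalItems"])
```

The resulting dict can then be passed wherever the Python code expects the schema, and the same JSON text can be pasted unchanged into an online validator.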

The error stems from trying to put a builtin Python datatype into a JSON schema. The latter is a template syntax that is used to check type consistency and should not hold actual data. Instead, under input_schema you'll want to define "additionalItems" to be of { "type": "boolean" } and then add it to the test JSON in your input_example with a boolean after for testing purposes.
Also, I'm not too familiar with Tornado_JSON but it looks like you aren't complying with the schema definition language by placing "additionalItems" inside of the "values" property. Bring that up one level.
More specifically, I think what you're trying to do should look like:
"values": {
...value schema definition...
}
"additionalItems": {
"type": "boolean"
}
And the input examples would become:
input_example={
"DB": 22,
"values": [['INT', 44],['REAL', 33.33],['CHAR', 'b']],
"additionalItems": true
}
