Elasticsearch sorting fails after update / insert - python

I'm inserting documents into Elasticsearch and trying to sort on a field that is present in all documents. However, whenever I update a document, sorting seems to break and the results no longer come back in sorted order. I created an index with:
self.conn = ES(server=url)
self.conn.create_index("test.test")
For instance, I would like to sort on a "_ts" field. Given the following dictionaries and code:
def update_or_insert(doc):
    doc_type = "string"
    index = doc['ns']
    doc['_id'] = str(doc['_id'])
    doc_id = doc['_id']
    self.conn.index(doc, index, doc_type, doc_id)
to_insert = [
    {'_id': '4', 'name': 'John', '_ts': 3, 'ns': 'test.test'},
    {'_id': '5', 'name': 'Paul', '_ts': 2, 'ns': 'test.test'},
    {'_id': '6', 'name': 'George', '_ts': 1, 'ns': 'test.test'},
    {'_id': '6', 'name': 'Ringo', '_ts': 4, 'ns': 'test.test'},
]
for x in to_insert:
    update_or_insert(x)
result = self.conn.search(q, sort={'_ts:desc'})
for it in result:
    print it
I would expect to get an ordering of "Ringo, John, Paul" but instead get an ordering of "John, Paul, Ringo". Any reason why this might be the case? I see there's a bug here:
https://github.com/elasticsearch/elasticsearch/issues/3078
But that seems to affect ES 0.90.0 and I'm using 0.90.1.

It should be:
sort={"_ts":"desc"}
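To illustrate the difference, here is a small sketch in plain Python 3 with no Elasticsearch client involved. The names wrong_sort, right_sort and store are only illustrative. It shows that the original argument is a set holding one string rather than a field-to-order mapping, and it simulates the expected ordering locally under the assumption that indexing a document with an existing _id replaces the earlier one.

```python
# Braces without a key: value pair build a set, not a dict.
wrong_sort = {'_ts:desc'}
right_sort = {"_ts": "desc"}
print(type(wrong_sort).__name__)  # set
print(type(right_sort).__name__)  # dict

to_insert = [
    {'_id': '4', 'name': 'John', '_ts': 3, 'ns': 'test.test'},
    {'_id': '5', 'name': 'Paul', '_ts': 2, 'ns': 'test.test'},
    {'_id': '6', 'name': 'George', '_ts': 1, 'ns': 'test.test'},
    {'_id': '6', 'name': 'Ringo', '_ts': 4, 'ns': 'test.test'},
]

# Local stand-in for the index: a later document with the same _id
# overwrites the earlier one (George is replaced by Ringo).
store = {}
for doc in to_insert:
    store[doc['_id']] = doc

# Sort descending on _ts, which is what the corrected sort asks for.
ordered = sorted(store.values(), key=lambda d: d['_ts'], reverse=True)
names = [d['name'] for d in ordered]
print(names)  # ['Ringo', 'John', 'Paul']
```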

Related

Extract Only Values as csv format in gremlin

I am trying to extract values from a graph database. I am using the Gremlin console command below, but it returns key-value pairs. How can I convert this to a list of values only?
%%gremlin
g.V().hasLabel('airport').limit(2).project('id','label','region','country').by(id()).by(label()).by('region').by('country').fold()
output
[{'id': '1', 'label': 'airport', 'region': 'US-GA', 'country': 'US'}, {'id': '2', 'label': 'airport', 'region': 'US-AK', 'country': 'US'}]
Expected output:
'1', 'airport', 'US-GA', 'US'
'2', 'airport', 'US-AK', 'US'
or
[['1','airport','US-GA','US'], ['2','airport', 'US-AK','US']]
Rather than use project you can use values. Steps like project and valueMap return a key:value map, whereas values does not include the keys in its result.
gremlin> g.V().
           hasLabel('airport').
           limit(2).
           local(union(id(),label(),values('region','country')).fold())
==>[1,airport,US,US-GA]
==>[2,airport,US,US-AK]
As an alternative, you can just add select(values) to your current query, which I think I prefer as it avoids needing the local and union steps.
gremlin> g.V().
           hasLabel('airport').
           limit(2).
           project('id','label','region','country').
             by(id()).
             by(label()).
             by('region').by('country').
           select(values).
           fold()
==>[[1,airport,US-GA,US],[2,airport,US-AK,US]]
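If changing the query is not an option, the same reshaping can be done on the client side. The following is a minimal sketch, assuming the result has already arrived in Python as the list of dicts shown in the question. The names rows, values_only and csv_text are only illustrative.

```python
import csv
import io

# Result of the original project(...) query, as shown in the question.
rows = [
    {'id': '1', 'label': 'airport', 'region': 'US-GA', 'country': 'US'},
    {'id': '2', 'label': 'airport', 'region': 'US-AK', 'country': 'US'},
]

# Drop the keys and keep the values in their existing order.
values_only = [list(row.values()) for row in rows]
print(values_only)

# Write the same rows as CSV text using the standard csv module.
buffer = io.StringIO()
csv.writer(buffer).writerows(values_only)
csv_text = buffer.getvalue()
print(csv_text)
```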

Convert nested dict in pandas dataframe

I have a nested dictionary with the following structure. I am trying to convert it to a pandas dataframe, but I am having trouble splitting the 'map' dictionary into separate columns.
{'16':
    {'label': 't1',
     'prefLab': 'name',
     'altLabel': ['test1', 'test3'],
     'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]
    },
 '17':
    {'label': 't2',
     'prefLab': 'name2',
     'broader': ['18'],
     'altLabel': ['test2'],
     'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]
    }
}
The ideal outcome would be a dataframe with the following structure:
    label  prefLab  broader  altLab  ciID  map1  map2  map3  ...
16
17
Try this, assuming your dictionary is named data:
train = pd.DataFrame.from_dict(data, orient='index')
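Note that the from_dict call above keeps 'map' as a single column holding a list of dicts. One way to get ciID, map1, map2 and map3 as separate columns is to flatten each entry first. The sketch below uses plain Python only, and flatten_entry and rows are hypothetical names. It assumes every 'map' item carries its values under 'idMap', as in the question.

```python
data = {
    '16': {'label': 't1', 'prefLab': 'name', 'altLabel': ['test1', 'test3'],
           'map': [{'id': '16', 'idMap': {'ciID': 16, 'map3': '033441'}}]},
    '17': {'label': 't2', 'prefLab': 'name2', 'broader': ['18'],
           'altLabel': ['test2'],
           'map': [{'id': '17', 'idMap': {'ciID': 17, 'map1': 1006558, 'map2': 1144}}]},
}

def flatten_entry(entry):
    """Copy every key except 'map', then lift the idMap values to the top level."""
    flat = {key: val for key, val in entry.items() if key != 'map'}
    for mapping in entry.get('map', []):
        flat.update(mapping.get('idMap', {}))
    return flat

rows = {key: flatten_entry(entry) for key, entry in data.items()}
print(rows['16'])
print(rows['17'])
```

The flattened rows dict can then be passed to pd.DataFrame.from_dict(rows, orient='index') in place of data. Keys missing from an entry, such as broader for '16', end up as NaN in the resulting dataframe.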

Convert Nested JSON into Dataframe

I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male
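Use json_normalize ("from pandas.io.json import json_normalize"; from pandas 1.0 onward it is also available as pd.json_normalize). It flattens the nested keys into dotted column names: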
id name weight.number weight.unit gender
123 joe 100 lbs male
If you want to discard the weight unit, just flatten the JSON first:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
Then turn it into a dataframe. Wrap the dict in a list, because pd.DataFrame raises a ValueError when given only scalar values without an index:
pd.DataFrame([temp])
Something like this should do the trick:
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# convert id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df

Writing json from two lists in Python

I am new to Python. I have two lists. One is key list and another list is value list.
title = ["Code","Title","Value",....]
value = [["100","abcd","100",...],["101","efgh","200",...],...]
data={}
data.setdefault("data",[]).append({"code": sp[0],"val": sp[2]})
This code gives me the following result.
{'data': [{'code': '100', 'val': '100'},{'code': '101', 'val': '200'}]}
But I want the result as the below,
{ "100": { "Title": "abcd", "Value": "100", ............, ............}, "101": { "Title": "efgh", "Value": "200", ............, ............} }
That is, the first column of each row in the value list should become the key of a JSON object, and the remaining items should be generated as key-value pairs using the titles. How can I generate this JSON in Python from the two lists?
Since the size of the lists is not specified, the code below loops over however many rows and titles there are. I am using Python 3.x.
title = ["Code","Title","Value"]
value = [["100","abcd","100"],["101","efgh","200"]]
dic1={}
# the outer loop walks the rows, the inner loop walks the titles after "Code"
for i in range(len(value)):
    for j in range(len(title)-1):
        dic1.setdefault(value[i][0],{}).update({title[j+1]:value[i][j+1]})
Output is
{'101': {'Title': 'efgh', 'Value': '200'}, '100': {'Title': 'abcd', 'Value': '100'}}
I hope it is helpful!
You can build a dict from these lists. I made a quick snippet just for you to understand:
title = ["Code","Title","Value"]
value = [['100','abcd','100'],['101','efgh','200']]
data={}
for whatever in value:
    your_titles = {}
    # pair each title with the item at the same position in the row
    your_titles[title[0]] = whatever[0]
    your_titles[title[1]] = whatever[1]
    your_titles[title[2]] = whatever[2]
    data[whatever[0]] = your_titles
print(data)
The output:
{'100': {'Code': '100', 'Title': 'abcd', 'Value': '100'}, '101': {'Code': '101', 'Title': 'efgh', 'Value': '200'}}
Please read this tutorial and try to make it yourself. This is not the optimal solution for this problem.
Make a data frame, set the 'Code' column as the index, and then convert it to JSON:
import pandas as pd
data_frame = pd.DataFrame(columns = title, data = value)
data = data_frame.set_index('Code')
json1 = data.to_json(orient='index')
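For comparison, the same structure can be built without pandas and without hard-coded positions. This is a minimal sketch using only the standard library. It assumes every row has the same length as title and that the first column is the key. The names result and json_text are only illustrative.

```python
import json

title = ["Code", "Title", "Value"]
value = [["100", "abcd", "100"], ["101", "efgh", "200"]]

# The first item of each row becomes the outer key; the remaining items
# are paired with the remaining titles by position.
result = {row[0]: dict(zip(title[1:], row[1:])) for row in value}

json_text = json.dumps(result)
print(json_text)
```

Because zip pairs the titles with the row items, this keeps working when more columns are added to both lists.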

how to take the specific details out in Python that are separated by a semi colon or a slash?

I have the following results from a vet analyser
result{type:PT/APTT;error:0;PT:32.3 s;INR:0.0;APTT:119.2;code:470433200;lot:405
4H0401;date:20/01/2017 06:47;PID:TREKKER20;index:015;C1:-0.1;C2:-0.1;qclock:0;ta
rget:2;name:;Sex:;BirthDate:;operatorID:;SN:024000G0900046;version:V2.8.0.09}
Using Python, how do I separate out the date, the time, the type, PT and APTT? Please note that the results will be different every time, so I need code that finds the date using the / and gets the time from the four digits and the colon. Do I use a for loop?
This code makes further usage of fields easier by converting them to dict.
from pprint import pprint
result = "result{type:PT/APTT;error:0;PT:32.3 s;INR:0.0;APTT:119.2;code:470433200;lot:405 4H0401;date:20/01/2017 06:47;PID:TREKKER20;index:015;C1:-0.1;C2:-0.1;qclock:0;ta rget:2;name:;Sex:;BirthDate:;operatorID:;SN:024000G0900046;version:V2.8.0.09}"
if result.startswith("result{") and result.endswith("}"):
    result = result[(result.index("{") + 1):result.index("}")]
# else:
#     raise ValueError("Invalid data '" + result + "'")
# Separate fields
fields = result.split(";")
# Separate field names and values
# First part is the name of the field for sure, but any additional ":" must not be split, as example "date:dd/mm/yyyy HH:MM" -> "date": "dd/mm/yyyy HH:MM"
fields = [field.split(":", 1) for field in fields]
fields = {field[0]: field[1] for field in fields}
a = fields['type'].split("/")
print(fields)
pprint(fields)
print(a)
The result:
{'type': 'PT/APTT', 'error': '0', 'PT': '32.3 s', 'INR': '0.0', 'APTT': '119.2', 'code': '470433200', 'lot': '405 4H0401', 'date': '20/01/2017 06:47', 'PID': 'TREKKER20', 'index': '015', 'C1': '-0.1', 'C2': '-0.1', 'qclock': '0', 'ta rget': '2', 'name': '', 'Sex': '', 'BirthDate': '', 'operatorID': '', 'SN': '024000G0900046', 'version': 'V2.8.0.09'}
{'APTT': '119.2',
'BirthDate': '',
'C1': '-0.1',
'C2': '-0.1',
'INR': '0.0',
'PID': 'TREKKER20',
'PT': '32.3 s',
'SN': '024000G0900046',
'Sex': '',
'code': '470433200',
'date': '20/01/2017 06:47',
'error': '0',
'index': '015',
'lot': '405 4H0401',
'name': '',
'operatorID': '',
'qclock': '0',
'ta rget': '2',
'type': 'PT/APTT',
'version': 'V2.8.0.09'}
['PT', 'APTT']
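Note that dictionaries are not sorted by key (they don't need to be in most cases, as you access the fields by their keys). The second listing above is in alphabetical order only because pprint sorts the keys for display.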
If you want to split the results by semicolon:
result_array = result.split(';')
In result_array you'll get all the strings that were separated by semicolons; you can then access the date by its position: result_array[index]
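Since the results differ from run to run, a fixed index may not always point at the date. A minimal sketch of looking the field up by its name instead is shown below. It assumes the field is always labelled date and uses the sample string from the question, while date_field, date_value, date_part and time_part are only illustrative names.

```python
result = ("result{type:PT/APTT;error:0;PT:32.3 s;INR:0.0;APTT:119.2;"
          "code:470433200;lot:405 4H0401;date:20/01/2017 06:47;"
          "PID:TREKKER20;index:015;C1:-0.1;C2:-0.1;qclock:0;ta rget:2;"
          "name:;Sex:;BirthDate:;operatorID:;SN:024000G0900046;"
          "version:V2.8.0.09}")

result_array = result.split(';')

# Find the entry whose name is "date" instead of relying on its position.
date_field = next(field for field in result_array if field.startswith('date:'))

# Split on the first colon only, so the colon inside the time is kept.
date_value = date_field.split(':', 1)[1]
date_part, time_part = date_value.split()
print(date_part)
print(time_part)
```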
That's quite a bad format to store data as fields might have colons in their values, but if you have to - you can strip away the surrounding result, split the rest on a semicolon, then do a single split on a colon to get dict key-value pairs and then just build a dict from that, e.g.:
data = "result{type:PT/APTT;error:0;PT:32.3 s;INR:0.0;APTT:119.2;code:470433200;lot:405 " \
"4H0401;date:20/01/2017 06:47;PID:TREKKER20;index:015;C1:-0.1;C2:-0.1;qclock:0;ta " \
"rget:2;name:;Sex:;BirthDate:;operatorID:;SN:024000G0900046;version:V2.8.0.09}"
parsed = dict(e.split(":", 1) for e in data[7:-1].split(";"))
print(parsed["APTT"]) # 119.2
print(parsed["PT"]) # 32.3 s
print(parsed["date"]) # 20/01/2017 06:47
If you need to further separate the date field to date and time, you can just do date, time = parsed["date"].split(), although if you're going to manipulate the object I'd suggest you to use the datetime module and parse it e.g.:
import datetime
date = datetime.datetime.strptime(parsed["date"], "%d/%m/%Y %H:%M")
print(date) # 2017-01-20 06:47:00
print(date.year) # 2017
print(date.hour) # 6
# etc.
To go straight to the point and get your type, PT, APTT, date and time, use re:
import re
from source import result_gen
result = result_gen()
def from_result(*vars):
    # re.escape guards against special characters in the field names
    regex = re.compile('|'.join([f'{re.escape(var)}:.*?;' for var in vars]))
    matches = dict(f.group().split(':', 1) for f in re.finditer(regex, result))
    return tuple(matches[v][:-1] for v in vars)
type, PT, APTT, datetime = from_result('type', 'PT', 'APTT', 'date')
date, time = datetime.split()
Notice that this can be easily extended in the event you became suddenly interested in some other 'var' in the string...
In short you can optimize this further (to avoid the split step) by capturing groups in the regex search...
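A sketch of that capture-group variant follows. It is one possible way to write it, not necessarily what the answer's author had in mind. The text is passed in as a parameter here, and the pattern requires each name to follow the opening brace or a semicolon so that, for example, PT is not picked up inside another field's value. The names kind and stamp are only illustrative.

```python
import re

result = ("result{type:PT/APTT;error:0;PT:32.3 s;INR:0.0;APTT:119.2;"
          "code:470433200;lot:405 4H0401;date:20/01/2017 06:47;"
          "PID:TREKKER20;index:015;C1:-0.1;C2:-0.1;qclock:0;ta rget:2;"
          "name:;Sex:;BirthDate:;operatorID:;SN:024000G0900046;"
          "version:V2.8.0.09}")

def from_result(text, *names):
    """Return the values of the requested fields, in the order requested."""
    alternatives = '|'.join(re.escape(name) for name in names)
    # Group 1 captures the field name, group 2 everything up to the next
    # ';' or '}', so no split or slicing is needed afterwards.
    pattern = re.compile(r'(?:^|[{;])(' + alternatives + r'):([^;}]*)')
    found = {m.group(1): m.group(2) for m in pattern.finditer(text)}
    return tuple(found[name] for name in names)

kind, pt, aptt, stamp = from_result(result, 'type', 'PT', 'APTT', 'date')
date, time = stamp.split()
print(kind, pt, aptt, date, time)
```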
