Python: Parse JSON objects in a different way

I'm working on a Python project in which we get some input from the user.
We are working on microservice deployments, where the user needs to provide the following:
1): The user will provide a GitHub repo that includes, inside a specific directory, all the microservices they want to deploy.
For example, we have a directory structure in the GitHub repo like this:
mysvcs
|----nodeservice
|----pyservice
2): The user will provide a JSON object mentioning the URL of this repo and some other information about these microservices, like this:
{
    "repo_url": "https://github.com/arycloud/mysvcs.git",
    "services": [
        {
            "name": "pyservice",
            "routing": {
                "path": "/",
                "service": "pyservice",
                "port": "5000"
            }
        },
        {
            "name": "nodeservice",
            "routing": {
                "path": "/",
                "service": "nodeservice",
                "port": "8080"
            }
        }
    ]
}
Then we read all the services from the GitHub repo and use their directories to read the source code. Along with that, we parse the JSON object to get some information about these services.
We are reading the repo like this:
tempdir = tempfile.mkdtemp()
saved_unmask = os.umask(0o077)
out_dir = os.path.join(tempdir)
Repo.clone_from(data['repo_url'], out_dir)
list_dir = os.listdir(out_dir)
print(list_dir)
services = []
for svc in range(0, len(data['services'])):
    services.append(list_dir[svc])
print(services)
According to our example above, it will return:
['nodeservice', 'pyservice']
But when we read the JSON object, the user may have mentioned the services in a different order instead of alphabetically. So when we loop through the services using the above array, we use the same index for the JSON object's services and for the list of directories after cloning the GitHub repo, but because the orders differ, the data gets interchanged.
Here's a sample code:
def my_deployment(data):
    # data is a JSON object
    # Clone the GitHub repo and grab the Dockerfiles
    tempdir = tempfile.mkdtemp()
    saved_unmask = os.umask(0o077)
    out_dir = os.path.join(tempdir)
    Repo.clone_from(data['repo_url'], out_dir)
    list_dir = os.listdir(out_dir)
    print(list_dir)

    services = []
    for svc in range(0, len(data['services'])):
        services.append(list_dir[svc])
    print(services)

    for service in range(len(services)):
        # Here I need to use the data from the JSON object for the current service
        data['services'][service]['routing']['port']
        # Here it's using the data of **pyservice** instead of **nodeservice** and vice versa.
Important: The order of services in the GitHub repo is ['nodeservice', 'pyservice'], but in the JSON object the user can mention their services in a different order, like pyservice, nodeservice. So when we loop through, how can we sync the order of both of these sources? This is the main issue.
I have tried changing the structure of my JSON object this way:
"services":[
"pyservice": {
"routing": {
"path": "/",
"service": "pyservice",
"port": "5000"
}
},
"nodeservice": {
"routing": {
"path": "/node",
"service": "nodeservice",
"port": "8080"
}
}
]
But it says the syntax is not correct.
How can I overcome this issue?
Thanks in advance!

You are overcomplicating this.
for svc in data['services']:
    print(svc['name'], svc['routing']['port'])
Done.
General observation: You seem to cling to loop indexes. Don't. It's a good thing that Python loops have no indexes.
Whenever you are tempted to write
for thing in range(len(some_list)):
stop, and write
for thing in some_list:
instead.
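Applied to the question, here is a minimal sketch of that idea (assuming each directory in the repo is named after the service's "name" field, as in the example layout):

import os
import tempfile

from git import Repo  # GitPython, as already used in the question

def my_deployment(data):
    out_dir = tempfile.mkdtemp()
    Repo.clone_from(data['repo_url'], out_dir)

    for svc in data['services']:
        # Pair each JSON entry with its directory by name, never by index.
        svc_dir = os.path.join(out_dir, svc['name'])
        port = svc['routing']['port']
        print(svc['name'], svc_dir, port)

With this, the order of the JSON array and the order of os.listdir() no longer matter at all.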

The reason the JSON is not valid is that you can't have name/value pairs in a JSON array. This page tells you an array can be:
A comma-delimited list of unnamed values, either simple or complex, enclosed in brackets
Is the below JSON any use?
{
    "repo_url": "https://github.com/arycloud/mysvcs.git",
    "services": [
        {
            "pyservice": {
                "routing": {
                    "path": "/",
                    "service": "pyservice",
                    "port": "5000"
                }
            }
        },
        {
            "nodeservice": {
                "routing": {
                    "path": "/node",
                    "service": "nodeservice",
                    "port": "8080"
                }
            }
        }
    ]
}
If you want services sorted alphabetically, you can do the following:
services = data["services"]
b = {}
for node in services:
    b.update(dict(node))
alphabetical_list = sorted(b)
Note:
This gives you a list ['nodeservice', 'pyservice'] which you can then use to get the object in b.
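A short usage sketch based on the restructured JSON above (alphabetical_list and b as defined in the snippet):

for name in alphabetical_list:
    routing = b[name]["routing"]
    print(name, routing["port"])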

Here's an approach we can use to overcome this order-sync issue:
os.listdir() does not guarantee any particular order, so the safest fix is to sort both sources alphabetically; then the same index refers to the same service in both the JSON array and the list of cloned directories.
Here's the code:
First, sort the services array of the JSON object:
data['services'] = sorted(data["services"], key=lambda d: d["name"])
By considering the example in the question, it will give us:
services = [
    {"nodeservice": {
        "A": "B"
    }},
    {"pyservice": {
        "X": "Y"
    }}
]
Then we will sort the list of directories from GitHub repo like this:
Repo.clone_from(data['repo_url'], out_dir)
list_dir = os.listdir(out_dir)
print(list_dir)

services = []
for svc in range(0, len(data['services'])):
    services.append(list_dir[svc])
services.sort()
print(services)
It will give us ['nodeservice', 'pyservice'], according to the example above in the question.
So in both cases we have nodeservice first and then pyservice, i.e. the same order.
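Putting both pieces together, a rough sketch (not the exact deployment code; it assumes out_dir comes from the clone step and the original JSON structure with a "name" field):

import os

# Sort the JSON services by name and the cloned directories alphabetically,
# then walk them in lockstep.
data['services'] = sorted(data['services'], key=lambda d: d['name'])
list_dir = sorted(os.listdir(out_dir))

for svc, directory in zip(data['services'], list_dir):
    print(directory, svc['name'], svc['routing']['port'])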

Related

Using Solr-Docker with Python returns wrong results

I have a Flask app which runs in a Docker container, and I wanted to use Solr with it for indexing and searching, so I built a container for Solr from the official Solr image and wired it to my app using docker-compose.
In the app I have multiple types of objects that I want to index, for example type1 and type2, and each type has specific fields. So in Solr I have documents with different fields: doc1 could have field1 and field2, while doc2 could have field3, field4 and field5, and each document has a field called type to specify its type.
I have two kinds of search. The first searches for documents of a specific type; here is an example URL, used with the requests Python package:
response = requests.get("http://solr:8983/solr/myCollection/select?q=*val*&defType=edismax&fq=type:type1&qf=field1^2&qf=field2^1")
The other is an overall search across documents of all types; here is its example URL:
response = requests.get("http://solr:8983/solr/myCollection/select?q=*val*&defType=edismax&fq=type:type1||type2&qf=field1^1&qf=field2^1&qf=field3^1&qf=field4^1&qf=field1^1")
I have two problems with my work:
I don't get the results I expected when I run some queries.
Some fields have values with special characters like (z=x+y*f), and when I try to escape these special characters with '\' it doesn't work.
So, is there something wrong with the queries I wrote, and is there any article or tutorial that could help? I searched the documentation and the internet a lot but couldn't find a way to solve my problems.
Note: I didn't change the schema file; I left it as the default.
I've solved the problems by using tokenizers and filters for indexing and querying.
You can apply them through the client API that Solr provides.
Here is an example of JSON data to add tokenizers and filters to a field type:
{
    "replace-field-type": {
        "name": "field_name",
        "class": "solr.TextField",
        "multiValued": true,
        "indexAnalyzer": {
            "tokenizer": {
                "class": "solr.LowerCaseTokenizerFactory"
            },
            "filters": [
                {
                    "class": "solr.LowerCaseFilterFactory"
                }
            ]
        },
        "queryAnalyzer": {
            "tokenizer": {
                "class": "solr.WhitespaceTokenizerFactory",
                "rule": "java"
            },
            "filters": [
                {
                    "class": "solr.LowerCaseFilterFactory"
                }
            ]
        }
    }
}
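As a rough sketch (not the poster's exact code), this payload can be sent to Solr's Schema API with requests, assuming the collection URL from the question:

import requests

# Field-type definition with the tokenizers/filters shown above.
schema_payload = {
    "replace-field-type": {
        "name": "field_name",
        "class": "solr.TextField",
        "multiValued": True,
        "indexAnalyzer": {
            "tokenizer": {"class": "solr.LowerCaseTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
        "queryAnalyzer": {
            "tokenizer": {"class": "solr.WhitespaceTokenizerFactory", "rule": "java"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    }
}

# POST the change to the collection's Schema API endpoint.
resp = requests.post("http://solr:8983/solr/myCollection/schema", json=schema_payload)
resp.raise_for_status()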

Unique index in mongodb using db.command

I am using mongodb with PyMongo and I would like to separate the schema definition from the rest of the application. I have a file user_schema.json with the schema for the user collection:
{
    "collMod": "user",
    "validator": {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name"],
            "properties": {
                "name": {
                    "bsonType": "string"
                }
            }
        }
    }
}
Then in db.py:
with open("user_schema.json", "r") as coll:
    data = OrderedDict(json.loads(coll.read()))  # Read JSON schema.

name = data["collMod"]  # Get name of collection.
db.create_collection(name)  # Create collection.
db.command(data)  # Add validation to the collection.
Is there any way to add unique index to name field in user collection without changing db.py (only by changing user_schema.json)? I know I can use this:
db.user.create_index("name", unique=True)
however, then I have information about the collection in two places. I would like to have all configuration of the collection in the user_schema.json file. I need something like that:
{
    "collMod": "user",
    "validator": {
        "$jsonSchema": {
            ...
        }
    },
    "index": {
        "name": {
            "unique": true
        }
    }
}
No, you won't be able to do that without changing db.py.
The contents of user_schema.json are an object that can be passed to db.command to run the collMod command.
In order to create an index, you need to run the createIndexes command, or one of the helpers that calls this.
It is not possible to complete both of these actions with a single command.
A simple modification would be to store an array of commands to run in user_schema.json, and have db.py iterate the array and run each command.
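For illustration only (not the asker's code), one way to do that is to make user_schema.json a JSON array of command documents, e.g. a collMod command followed by a createIndexes command, and have db.py run them in order:

import json
from collections import OrderedDict

# user_schema.json is assumed to hold a JSON array such as:
# [
#   {"collMod": "user", "validator": {"$jsonSchema": {...}}},
#   {"createIndexes": "user",
#    "indexes": [{"key": {"name": 1}, "name": "name_unique", "unique": true}]}
# ]
with open("user_schema.json", "r") as f:
    # Preserve key order: db.command() needs the command name as the first key.
    commands = json.load(f, object_pairs_hook=OrderedDict)

db.create_collection(commands[0]["collMod"])  # assumes the first command is collMod
for cmd in commands:
    db.command(cmd)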

Create a Drive File with a set locale in international settings through Drive API

I need to create a file using the Google Drive API (I'm using v3, the latest at the moment). Using python if it matters.
My code is below,
drive_service.files().create(supportsTeamDrives=True, body={
    'name': 'test-file',
    'mimeType': 'application/vnd.google-apps.spreadsheet',
    'parents': [folder_id],
    'properties': {'locale': 'en_GB',
                   'timeZone': 'Europe/Berlin'}
})
Following the documentation #here, I tried to set the properties key with the locale set to the desired one, but it keeps creating the file with my account's default locale.
How can I make it work at creation time? Is there another parameter I can set?
Your problem is that you are mixing up two different "properties".
The properties you are setting are user-defined properties which are only ever consumed by you yourself. They are of no significance to Google.
The properties you want to set are part of the Spreadsheet API. See https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#SpreadsheetProperties
The simplest solution is to not use the Drive API to create your spreadsheet. Instead use the Spreadsheet API as described at https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/create
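A rough sketch of that approach (untested; sheets_service is assumed to be a Sheets v4 client built with googleapiclient, alongside the existing drive_service and folder_id):

# Create the spreadsheet through the Sheets API so locale/timeZone are applied.
spreadsheet = sheets_service.spreadsheets().create(body={
    'properties': {
        'title': 'test-file',
        'locale': 'en_GB',
        'timeZone': 'Europe/Berlin',
    }
}).execute()

# Then move it into the desired folder with the Drive API.
drive_service.files().update(
    fileId=spreadsheet['spreadsheetId'],
    addParents=folder_id,
    supportsTeamDrives=True,
    fields='id, parents',
).execute()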
I just tested this in the APIs Explorer.
Create file Request
POST https://www.googleapis.com/drive/v3/files?key={YOUR_API_KEY}
{
    "properties": {
        "test": "test"
    },
    "name": "Hello"
}
Response
{
    "kind": "drive#file",
    "id": "1CYFI5rootSO5cndBD2gFb1n8SVvJ7_jo",
    "name": "Hello",
    "mimeType": "application/octet-stream"
}
File get request
GET https://www.googleapis.com/drive/v3/files/1CYFI5rootSO5cndBD2gFb1n8SVvJ7_jo?fields=*&key={YOUR_API_KEY}
Response
"kind": "drive#file",
"id": "1CYFI5rootSO5cndBD2gFb1n8SVvJ7_jo",
"name": "Hello",
"mimeType": "application/octet-stream",
"starred": false,
"trashed": false,
"explicitlyTrashed": false,
"parents": [
"0AJpJkOVaKccEUk9PVA"
],
"properties": {
"test": "test"
},
It appears to be working just fine. I suggest you check the following:
The file id that is returned in the response from creating the file, to ensure that you are checking the one you just uploaded. Every time you run that, you are going to create a new file.
Also remember to add fields=* with files.get if that's what you are using to check the result of your properties.
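For example, a quick check along those lines might look like this (a sketch reusing drive_service from the question):

# Create the file and keep the returned id.
created = drive_service.files().create(
    supportsTeamDrives=True,
    body={'name': 'test-file', 'properties': {'test': 'test'}},
    fields='id',
).execute()

# Fetch it back with all fields to verify the stored properties.
result = drive_service.files().get(
    fileId=created['id'],
    fields='*',
    supportsTeamDrives=True,
).execute()
print(result.get('properties'))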

Dynamodb get_item and put_item without data types in python

I currently have a python script that looks like:
import boto3
...
response = dynamodb.get_item(
    TableName=dynamodb_table_name,
    Key={
        "snippet_id": {
            "S": snippet_id
        }
    }
)
if "Item" in response:
    item = response["Item"]
    print(json.dumps(item, indent=4, cls=DecimalEncoder))
This prints something akin to:
{
    "var_1": {
        "BOOL": false
    },
    "var_2": {
        "S": "Text"
    },
    "snippet_id": {
        "S": "3a97e45c-ffed-4c76-8bb4-b2a32f49a5d2"
    }
}
Any idea how to do the type detection and return:
{
    "var_1": False,
    "var_2": "Text",
    "snippet_id": "3a97e45c-ffed-4c76-8bb4-b2a32f49a5d2"
}
Also, can this be done for the query as well?
TLDR
Use resource instead of client.
Summary
In essence, you can call boto3.client() or boto3.resource().
The client returns DynamoDB syntax, which looks like this:
'var_1': {'S': "string"}
The resource returns plain Python syntax, which looks like this:
'var_1': "string"
Further Reading
At its core, all that Boto3 does is call AWS APIs on your behalf. For the majority of the AWS services, Boto3 offers two distinct ways of accessing these abstracted APIs:
Client: low-level service access
Resource: higher-level object-oriented service access
Reference: https://realpython.com/lessons/clients-and-resources/

Translating Elasticsearch request from Kibana into elasticsearch-dsl

Recently migrated from AWS Elasticsearch Service (running Elasticsearch 1.5.2) to Elastic Cloud (currently running Elasticsearch 5.1.2). Glad I did it, but with that change comes a newer version of Elasticsearch and newer APIs, and I'm struggling to get my head around the new way of requesting things. Formerly, I could more or less copy/paste from Kibana's "Elasticsearch Request Body", adjust a few things, run elasticsearch.Elasticsearch.search() and get what I expected.
Here's my Elasticsearch Request Body from Kibana (for brevity, removed some of the extraneous stuff that Kibana usually inserts):
{
    "size": 500,
    "sort": [
        {
            "Time.ISO8601": {
                "order": "desc",
                "unmapped_type": "boolean"
            }
        }
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "Message\\ ID: 2003",
                        "analyze_wildcard": true
                    }
                },
                {
                    "range": {
                        "Time.ISO8601": {
                            "gte": 1484355455678,
                            "lte": 1484359055678,
                            "format": "epoch_millis"
                        }
                    }
                }
            ],
            "must_not": []
        }
    },
    "stored_fields": [
        "*"
    ],
    "script_fields": {}
}
Now I want to use elasticsearch-dsl to do it, since that seems to be the recommended method (instead of using elasticsearch-py). How would I translate the above into elasticsearch-dsl?
Here's what I have so far:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch(
    hosts=['HASH.REGION.aws.found.io/elasticsearch'],
    use_ssl=True,
    port=443,
    http_auth=('USER', 'PASS')
)

s = Search(using=client, index="emp*")
s = s.query("query_string", query="Message\ ID:2003", analyze_wildcards=True)
s = s.query("range", **{"Time.ISO8601": {"gte": 1484355455678, "lte": 1484359055678, "format": "epoch_millis"}})
s = s.sort("Time.ISO8601")
response = s.execute()

for hit in response:
    print '%s %s' % (hit['Time']['ISO8601'], hit['Message ID'])
My code as written above is not giving me what I expect: I'm getting results that include things that don't match "Message\ ID:2003", and it's also giving me hits outside the requested Time.ISO8601 range.
Totally new to elasticsearch-dsl and ES 5.1.2's way of doing things, so I know I've got lots to learn. What am I doing wrong? Thanks in advance for the help!
I don't have Elasticsearch running right now, but the query looks like what you wanted (you can always see the query produced by calling s.to_dict()), with the exception of escaping the \ sign. In the original query it was escaped, yet in Python the result might be different due to different escaping.
I would strongly advise not to have spaces in your field names, and to use a more structured query than query_string:
s = Search(using=client, index="emp*")
s = s.filter("term", message_id=2003)
s = s.query("range", Time__ISO8601={"gte": 1484355455678, "lte": 1484359055678, "format": "epoch_millis"})
s = s.sort("Time.ISO8601")
Note that I also changed query() to filter() for a slight speedup and used __ instead of . in the field name keyword argument; elasticsearch-dsl will automatically expand that to a single dot.
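For example, to double-check what will actually be sent before executing (a quick sketch):

import json

print(json.dumps(s.to_dict(), indent=2))  # inspect the generated query body
response = s.execute()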
Hope this helps...
