Python regex for selecting multiple lines of code [duplicate]

This question already has answers here:
Python: How to match nested parentheses with regex?
(13 answers)
Closed 27 days ago.
I'd like to create a Python regex that can select a Terraform resource block in its entirety. There are multiple resource blocks in a file (example below), and I want to select each one separately.
I've tried the following regexes. The first one gets caught up when there are multiple closing brackets in the code; the second one just selects the whole file.
1) match = re.search(r'resource.*?\{(.*?)\}', code, re.DOTALL)
2) match = re.search(r'resource.*?\{(.*)\}', code, re.DOTALL)
Sample file:
resource "aws_s3_bucket_notification" "aws-lambda-trigger" {
  bucket = aws_s3_bucket.newbucket.id
  lambda_function {
    lambda_function_arn = aws_lambda_function.test_lambda.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = var.prefix
    filter_suffix       = var.suffix
  }
}
resource "aws_s3_bucket" "newbucket" {
  bucket        = var.bucket_name
  force_destroy = true
  acl           = var.acl_value
}

Generally speaking, you want to avoid using regex to parse nested structures such as HTML, XML, or, in this case, HCL. Instead, use a parser like pyhcl:
import hcl
import json

with open("stack.tf") as main:
    obj = hcl.load(main)

print(json.dumps(obj, indent=4))
print(f'newbucket-Force-Destroy: {obj["resource"]["aws_s3_bucket"]["newbucket"]["force_destroy"]}')
Then you can parse it all into a dict and just look up any values you are interested in.
output
$ python ./stack.py
{
    "resource": {
        "aws_s3_bucket_notification": {
            "aws-lambda-trigger": {
                "bucket": "aws_s3_bucket.newbucket.id",
                "lambda_function": {
                    "lambda_function_arn": "aws_lambda_function.test_lambda.arn",
                    "events": [
                        "s3:ObjectCreated:*"
                    ],
                    "filter_prefix": "var.prefix",
                    "filter_suffix": "var.suffix"
                }
            }
        },
        "aws_s3_bucket": {
            "newbucket": {
                "bucket": "var.bucket_name",
                "force_destroy": true,
                "acl": "var.acl_value"
            }
        }
    }
}
newbucket-Force-Destroy: True
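That said, if a regex is still wanted for files indented like the sample above, one fragile but workable trick is to anchor the end of each block on a closing brace in column zero (using re.MULTILINE), so that indented inner braces are skipped. This is a sketch that assumes conventionally indented HCL, not a general parser:

```python
import re

code = '''resource "aws_s3_bucket_notification" "aws-lambda-trigger" {
  bucket = aws_s3_bucket.newbucket.id
  lambda_function {
    lambda_function_arn = aws_lambda_function.test_lambda.arn
  }
}
resource "aws_s3_bucket" "newbucket" {
  bucket        = var.bucket_name
  force_destroy = true
}
'''

# A closing brace at the very start of a line only terminates a top-level
# block in conventionally indented HCL, so anchor the lazy match on it.
blocks = re.findall(r'^resource.*?^\}', code, re.DOTALL | re.MULTILINE)
for block in blocks:
    print(block)
    print('---')
```

This breaks as soon as a file uses non-standard indentation, which is exactly why the parser approach above is preferable.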

Related

Using a configuration list of JSON paths in a script to dynamically transform from one payload to another

I have an existing question which may have been too big initially, so I am breaking it down into smaller questions that I can piece together. The related question is Python JSON transformation from explicit to generic by configuration.
I want to process some transformations using a configuration list: for each listed JSON path, the script should try the path, and wherever it can successfully create an object, append that object to an array called characteristics.
INPUT DATA
This data is a single value from an explicit data payload which has over 2,500 values individually defined at schema level. It's nasty, hence we want to transform it into a more data-driven object that can be maintained through configuration. Unfortunately, we have no control over the input data; otherwise we would ask for it to arrive in the preferred generic state instead.
data = {
    "activities_acceptance" : {
        "contractors_sub_contractors" : {
            "contractors_subcontractors_engaged" : "yes"
        }
    }
}
CONFIGURATION JSON
This configuration example is used to create an object with category and type and, for this example, adds a single value. The set_value list may be used to add more than one mapped value from the origin data.
config = {
    "processing_configuration" : [
        {
            "origin_path" : "activities_acceptance.contractors_sub_contractors",
            "set_category" : "business-activities",
            "set_type" : "contractors-sub-contractors-engaged",
            "set_value" : [
                {
                    "use_value" : "contractors_subcontractors_engaged",
                    "set_value" : "value"
                }
            ]
        }
    ]
}
MANUAL METHOD in PYTHON
This script currently works but requires me to repeat it for every generic object I want to create. I need the configuration JSON to be looped through instead, which reduces script size and lets new data be added through configuration management.
# Create Business Characteristics
business_characteristics = {
    "characteristics" : []
}

# Create Characteristics - Business - Liability
try:
    acc_liability = {
        "category" : "business-activities",
        "type" : "contractors-sub-contractors-engaged",
        "description" : "",
        "value" : "",
        "details" : ""
    }
    acc_liability['value'] = data['line_of_businesses'][0]['assets']['commercial_operations'][0]['liability_asset']['acceptance']['contractors_and_subcontractors']['contractors_and_subcontractors_engaged']
    business_characteristics['characteristics'].append(acc_liability)
except:
    acc_liability = {}
What I'm trying to do is set the path for `acc_liability['value']` using the configuration JSON, as shown below.
Note, I used a `.` separator for the entire path to avoid having to write all the `['']` brackets throughout the whole configuration file. So not only do I need the script to read the path from the configuration, it also needs to wrap each path segment in `['']`. If that complicates things from a script perspective, I'll just use the full path as I have entered in the manual version.
DYNAMIC VERSION - currently not working; I need help with this
# Create Business Characteristics
business_characteristics = {
    "characteristics" : []
}

# Create Characteristics - Business - Liability
try:
    acc_liability = {
        "category" : "",
        "type" : "",
        "description" : "",
        "value" : "",
        "details" : ""
    }
    acc_liability['category'] = config['processing_configuration'][0]['set_category']
    acc_liability['type'] = config['processing_configuration'][0]['set_type']
    acc_liability['value'] = data config['processing_configuration'][0]['origin_path'] + config['set_value']
    business_characteristics['characteristics'].append(acc_liability)
except:
    acc_liability = {}
EXPECTED OUTPUT
{
    "characteristics": [
        {
            "category": "business-activities",
            "type": "contractors-sub-contractors-engaged",
            "description": "",
            "value": "YES",
            "details": ""
        }
    ]
}
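A minimal sketch of the looped, configuration-driven version, using the simplified data and config shown above rather than the full 2,500-value payload: split each origin_path on ".", walk the dict with functools.reduce, then read each use_value key out of the resolved node.

```python
from functools import reduce

data = {
    "activities_acceptance": {
        "contractors_sub_contractors": {
            "contractors_subcontractors_engaged": "yes"
        }
    }
}

config = {
    "processing_configuration": [
        {
            "origin_path": "activities_acceptance.contractors_sub_contractors",
            "set_category": "business-activities",
            "set_type": "contractors-sub-contractors-engaged",
            "set_value": [
                {"use_value": "contractors_subcontractors_engaged",
                 "set_value": "value"}
            ]
        }
    ]
}

business_characteristics = {"characteristics": []}

for entry in config["processing_configuration"]:
    try:
        # Walk the dotted path one key at a time, e.g.
        # data["activities_acceptance"]["contractors_sub_contractors"].
        node = reduce(lambda d, k: d[k], entry["origin_path"].split("."), data)
        characteristic = {
            "category": entry["set_category"],
            "type": entry["set_type"],
            "description": "",
            "value": "",
            "details": "",
        }
        for mapping in entry["set_value"]:
            # use_value names the source key; set_value names the target field.
            characteristic[mapping["set_value"]] = node[mapping["use_value"]]
        business_characteristics["characteristics"].append(characteristic)
    except (KeyError, TypeError):
        # Path not present in this payload; skip this characteristic.
        continue

print(business_characteristics)
```

The expected output above shows the value uppercased ("YES"); if that is actually required, apply .upper() when copying the value across.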

The best way to transform a response to a json format in the example

I'd appreciate your help with the best way to transform a result into JSON as below.
We have a result like the one below, containing information on employees and companies. In the result, some of the properties (but not all) are prefixed with an enum-like T.
[{
    "T.id": "Employee_11",
    "T.category": "Employee",
    "node_id": ["11"]
},
{
    "T.id": "Company_12",
    "T.category": "Company",
    "node_id": ["12"],
    "employeecount": 800
},
{
    "T.id": "id~Employee_11_to_Company_12",
    "T.category": "WorksIn"
},
{
    "T.id": "Employee_13",
    "T.category": "Employee",
    "node_id": ["13"]
},
{
    "T.id": "Parent_Company_14",
    "T.category": "ParentCompany",
    "node_id": ["14"],
    "employeecount": 900,
    "childcompany": "Company_12"
},
{
    "T.id": "id~Employee_13_to_Parent_Company_14",
    "T.category": "Contractorin"
}]
We need to transform this result into a different structure, grouping by category: if the category is Employee, Company, or ParentCompany, the item should go under the node_properties object; otherwise it goes in edge_properties. Also, apart from the common properties (property_id, property_category, and node), different properties are added if the category is Company or ParentCompany. There is some further logic to derive the from and to properties of the edge object from its id. The expected response is:
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{node_id: "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{node_id: "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{node_id: "13"}
},
{
"property_id":"Company_14",
"property_category":"ParentCompany",
"node":{node_id: "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12",
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14",
}
]
In Java, we would use an enhanced for loop, switch, etc. How can we write the code in Python to get the structure above from the initial result structure? (I am new to Python.) Thank you in advance.
Here is a method that I quickly made; you can adjust it to your requirements. You can use regex or your own function to get the IDs for the edge_properties and then assign them to an object the way I did. I am not sure of your full requirements, but if the list you gave covers all the categories then this will be sufficient.
def transform(input_list):
    node_properties = []
    edge_properties = []
    for input_obj in input_list:
        new_obj = {}
        if input_obj['T.category'] in ('Employee', 'Company', 'ParentCompany'):
            new_obj['property_id'] = input_obj['T.id']
            new_obj['property_category'] = input_obj['T.category']
            new_obj['node'] = {'node_id': input_obj['node_id'][0]}
            if "employeecount" in input_obj:
                new_obj['employeecount'] = input_obj['employeecount']
            if "childcompany" in input_obj:
                new_obj['childcompany'] = input_obj['childcompany']
            node_properties.append(new_obj)
        else:  # You can use elif as well if there are other outliers in your data
            # Edge IDs look like "id~Employee_11_to_Company_12": strip the
            # "id~" prefix, then split on "_to_" to get the two endpoints.
            edge_id = input_obj['T.id'].split('~', 1)[-1]
            new_obj['from'], new_obj['to'] = edge_id.split('_to_')
            new_obj['property_id'] = edge_id
            edge_properties.append(new_obj)
    return [node_properties, edge_properties]

filter based media search with google photos API (python)

I'm trying to use the mediaItems().search() method, using the following body:
body = {
    "pageToken": page_token if page_token != "" else "",
    "pageSize": 100,
    "filters": {
        "contentFilter": {
            "includedContentCategories": {"LANDSCAPES", "CITYSCAPES"}
        }
    },
    "includeArchiveMedia": include_archive
}
but the problem is that the set {"LANDSCAPES","CITYSCAPES"} should actually be a set of enums (as in Java enums), not strings as I've written. This is specified in the API (https://developers.google.com/photos/library/reference/rest/v1/albums):
ContentFilter - This filter allows you to return media items based on the content type.
JSON representation
{
    "includedContentCategories": [
        enum (ContentCategory)
    ],
    "excludedContentCategories": [
        enum (ContentCategory)
    ]
}
Is there a proper way of solving this in Python?
Modification points:
When albumId and filters are used together, the error The album ID cannot be set if filters are used. occurs. So when you want to use filters, please remove albumId.
The value of includedContentCategories is an array, as follows:
"includedContentCategories": ["LANDSCAPES", "CITYSCAPES"]
includeArchiveMedia should be includeArchivedMedia, and it should be included inside filters.
When the above points are reflected in your script, it becomes as follows.
Modified script:
body = {
    # "albumId": album_id,  # <--- removed
    "pageToken": page_token if page_token != "" else "",
    "pageSize": 100,
    "filters": {
        "contentFilter": {
            "includedContentCategories": ["LANDSCAPES", "CITYSCAPES"]
        },
        "includeArchivedMedia": include_archive
    }
}
Reference:
Method: mediaItems.search
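For completeness, here is a sketch of a small helper that builds the corrected body, plus a commented-out paging loop. The `service` object, its authorization, and the surrounding google-api-python-client setup are assumed to already exist; the builder itself is plain dict construction:

```python
def build_search_body(page_token="", include_archive=True, categories=None):
    """Build a mediaItems.search request body with a content filter."""
    body = {
        "pageSize": 100,
        "filters": {
            "contentFilter": {
                # Categories are plain strings in the JSON body, in a list
                # (not a Python set, which is not JSON-serializable).
                "includedContentCategories": list(
                    categories or ["LANDSCAPES", "CITYSCAPES"]
                )
            },
            # includeArchivedMedia lives inside "filters".
            "includeArchivedMedia": include_archive,
        },
    }
    if page_token:  # only send a pageToken once we actually have one
        body["pageToken"] = page_token
    return body

# Hypothetical paging loop, assuming an authorized `service` object
# (not runnable here without credentials):
# page_token = ""
# while True:
#     response = service.mediaItems().search(
#         body=build_search_body(page_token)).execute()
#     for item in response.get("mediaItems", []):
#         print(item["filename"])
#     page_token = response.get("nextPageToken", "")
#     if not page_token:
#         break
```

Omitting pageToken entirely on the first request keeps the body clean; the API treats a missing token the same as an empty one.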

Parse a puppet manifest with Python

I have a puppet manifest file (init.pp) for my Puppet module.
In this file there are parameters for the class, and in most cases they're written in the same way:
Example Input:
class test_module(
  $first_param = 'test',
  $second_param = 'new' )
What is the best way that I can parse this file with Python and get a dict object like this, which includes all the class parameters?
Example output:
param_dict = {'first_param':'test', 'second_param':'new'}
Thanks in Advance :)
Puppet Strings is a Ruby gem that can be installed on top of Puppet and can output a JSON document containing the class parameters, documentation, etc.
After installing it (see the link above), run this command either in a shell or from your Python program to generate the JSON:
puppet strings generate --emit-json-stdout init.pp
This will generate:
{
    "puppet_classes": [
        {
            "name": "test_module",
            "file": "init.pp",
            "line": 1,
            "docstring": {
                "text": "",
                "tags": [
                    {
                        "tag_name": "param",
                        "text": "",
                        "types": [
                            "Any"
                        ],
                        "name": "first_param"
                    },
                    {
                        "tag_name": "param",
                        "text": "",
                        "types": [
                            "Any"
                        ],
                        "name": "second_param"
                    }
                ]
            },
            "defaults": {
                "first_param": "'test'",
                "second_param": "'new'"
            },
            "source": "class test_module(\n $first_param = 'test',\n $second_param = 'new' ) {\n}"
        }
    ]
}
(JSON trimmed slightly for brevity)
You can load the JSON in Python with json.loads, then extract the parameter names from root["puppet_classes"][0]["docstring"]["tags"] (the entries where tag_name is param) and any default values from root["puppet_classes"][0]["defaults"]. Note that puppet_classes is a list, so it must be indexed.
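A sketch of that extraction, with the trimmed JSON above inlined as a dict for brevity. The default values arrive as Puppet source text (e.g. "'test'", with the quotes included), so they are unquoted here with ast.literal_eval:

```python
import ast

root = {
    "puppet_classes": [
        {
            "name": "test_module",
            "docstring": {
                "tags": [
                    {"tag_name": "param", "name": "first_param"},
                    {"tag_name": "param", "name": "second_param"},
                ]
            },
            "defaults": {
                "first_param": "'test'",
                "second_param": "'new'",
            },
        }
    ]
}

klass = root["puppet_classes"][0]  # puppet_classes is a list
param_names = [t["name"] for t in klass["docstring"]["tags"]
               if t["tag_name"] == "param"]
# Defaults are Puppet source text such as "'test'"; strip the quoting.
param_dict = {name: ast.literal_eval(klass["defaults"][name])
              for name in param_names if name in klass["defaults"]}
print(param_dict)
```

ast.literal_eval is safe here because it only evaluates literals; parameters with non-literal defaults (variables, function calls) would need to be kept as raw strings instead.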
You can also use regular expressions (straightforward, but fragile):
import re

def parse(data):
    # DOTALL lets '.' span the newlines inside the parameter list.
    mm = re.search(r'\((.*?)\)', data, re.DOTALL)
    dd = {}
    if not mm:
        return dd
    matches = re.finditer(r"\s*\$(.*?)\s*=\s*'(.*?)'", mm.group(1), re.MULTILINE)
    for mm in matches:
        dd[mm.group(1)] = mm.group(2)
    return dd
You can use it as follows:
import codecs

with codecs.open(filename, 'r') as ff:
    dd = parse(ff.read())
I don't know about the "best" way, but one way would be:
1) Set up Rspec-puppet (see google or my blog post for how to do that).
2) Compile your code and generate a Puppet catalog. See my other blog post for that.
Now, the Puppet catalog you compiled is a JSON document.
3) Visually inspect the JSON document to find the data you are looking for. Its precise location in the JSON document depends on the version of Puppet you are using.
4) You can now use Python to extract the data as a dictionary from the JSON document.

Add fields and correct indentation for json file (using python or ruby) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have looked at a number of Stackoverflow questions related to this topic but can't find a solution which I can apply.
I have a database of 1450 restaurants, originally as a csv file, with the information for each restaurant occupying one row.
I converted this to a json file using an online tool, but can't get the customisation I want - which I think can only be achieved with code.
This is the nested pattern I need: it adds a "location" field below "website" and an "address" field below "longitude":
{
    "id":
    "name":
    "phone":
    "email":
    "website":
    "location": {
        "latitude":
        "longitude":
        "address": {
            "line1":
            "line2":
            "line3":
            "postcode":
            "city":
            "country":
        }
    }
}
A sample of the full raw json data I have now from the csv file looks like this (1450 entries):
{
    "id": "101756",
    "name": "1 Lombard Street",
    "phone": "+44 2079296611",
    "email": "reception#1lombardstreet.com",
    "website": "http://www.1lombardstreet.com/",
    "latitude": "51.5129",
    "longitude": "-0.089",
    "line1": "1 Lombard Street",
    "line2": "",
    "line3": "",
    "postcode": "EC3V 9AA",
    "city": "London",
    "country": "UK"
},
{
    "id": "105371",
    "name": "108 Brasserie",
    "phone": "+44 2079693900",
    "email": "enquiries#108marylebonelane.com",
    "website": "http://www.108brasserie.com",
    "latitude": "51.51795",
    "longitude": "-0.15079",
    "line1": "108 Marylebone Lane",
    "line2": "",
    "line3": "",
    "postcode": "W1U 2QE",
    "city": "London",
    "country": "UK"
},
{
    "id": "108701",
    "name": "1901 Restaurant",
    "phone": "+44 2076187000",
    "email": "london.restres#andaz.com",
    "website": "http://www.andazdining.com",
    "latitude": "51.51736",
    "longitude": "-0.08123",
    "line1": "Andaz Hotel",
    "line2": "40 Liverpool Street",
    "line3": "",
    "postcode": "EC2M 7QN",
    "city": "London",
    "country": "UK"
},
Is there a way using Ruby or Python to change it to the nested pattern in the first example above? How can I do this? Thanks!
I think you can write a template file specifying how the records should look, with field names and empty strings as their values.
Considering the structure you want the data in, the format file will look like this:
{
    "id": "",
    "name": "",
    "phone": "",
    "email": "",
    "website": "",
    "location": {
        "latitude": "",
        "longitude": "",
        "address": {
            "line1": "",
            "line2": "",
            "line3": "",
            "postcode": "",
            "city": "",
            "country": ""
        }
    }
}
And then utilize it in the code like this :
require 'json'

format = JSON.parse File.read('format.json')
records = JSON.parse File.read('input.json')

def convert(record, format)
  ret = {}
  format.each do |key, value|
    ret[key] = record[key] ? record[key] : convert(record, format[key])
  end
  ret
end

records.map! { |record| convert(record, format) }

File.open('output.json', 'w') do |file|
  file << JSON.generate(records)
end
It will convert the data to the format given in the format file. This solution works for any format of this kind, as long as it is just a matter of grouping the original fields under new fields. You can simply change the format file and the data will be converted accordingly, without any change to the code.
UPDATE
Here is the code to convert the data back to the regular CSV list :
data = {
  :id => 1,
  :location => {
    :address => {
      :line1 => 'line1'
    }
  },
  :website => 'site'
}

def deconvert(record)
  ret = {}
  record.each do |key, value|
    if value.is_a? Hash
      ret.merge!( deconvert(value) )
    else
      ret.merge!(key => value)
    end
  end
  ret
end

puts deconvert data
# => {:id=>1, :line1=>"line1", :website=>"site"}
The most straightforward way is to simply iterate over your data and nest the keys you need:
data.each do |entry|
  entry[:location] = {}
  entry[:location][:latitude] = entry[:latitude]
  entry[:location][:longitude] = entry[:longitude]
  entry[:location][:address] = {}
  entry[:location][:address][:line1] = entry[:line1]
  ...
  entry.delete(:latitude)
  entry.delete(:longitude)
  ...
end
Depending on how you initially parse your data, each key may be a symbol (:location) or a string ('location'), and you should access them accordingly (this example assumes symbols).
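Since the question allows Python as well, here is a sketch of the same nesting in that language. The field names are taken from the sample records above; the file names in the commented-out I/O are placeholders:

```python
import json

ADDRESS_KEYS = ["line1", "line2", "line3", "postcode", "city", "country"]
TOP_KEYS = ["id", "name", "phone", "email", "website"]

def nest(entry):
    """Group the flat latitude/longitude/address fields under "location"."""
    nested = {k: entry[k] for k in TOP_KEYS}
    nested["location"] = {
        "latitude": entry["latitude"],
        "longitude": entry["longitude"],
        "address": {k: entry[k] for k in ADDRESS_KEYS},
    }
    return nested

# Assuming the flat records sit in a JSON array in restaurants.json:
# with open("restaurants.json") as f:
#     records = json.load(f)
# with open("output.json", "w") as f:
#     json.dump([nest(r) for r in records], f, indent=4)
```

Building a fresh dict (rather than mutating the entry and deleting keys, as in the Ruby example) keeps the output key order predictable and leaves the source records untouched.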
