Python, parsing JSON-order by/sort by - python

I have this JSON data:
"InstanceProfileList": [
{
"InstanceProfileId": "AIPAI6ZC646GGONRADRSK",
"Roles": [
{
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com",
"ssm.amazonaws.com"
]
}
}
]
},
"RoleId": "AROAJMI3DEQ4AW5JJMFII",
"CreateDate": "2018-03-23T15:23:28Z",
"RoleName": "ec2ssmMaintWindow",
"Path": "/",
"Arn": "arn:aws:iam::279052847476:role/ec2ssmMaintWindow"
}
]
I use the following code to parse it:
def get_user_group_service(element):
    s = ''
    for e in element['AssumeRolePolicyDocument']['Statement']:
        p = e['Principal']
        if 'Federated' in p:
            s += p['Federated']
        if 'Service' in p:
            obj = p['Service']
            if type(obj) is str:
                s += obj  # element is string
            else:
                s += ''.join(obj)  # element is array of strings
        if 'AWS' in p:
            s += p['AWS']
    return s
Now, the issue is that sometimes the Service element contains:
ec2.amazonaws.com ssm.amazonaws.com
and sometimes:
ssm.amazonaws.com ec2.amazonaws.com
The order is different every time.
It really doesn't matter in which order it will be shown, I just need the output to be consistent. Is there any way to order this output alphabetically?
I googled it and it seems obj.sort() will fix it but don't know how to apply it.

From what I understand, you want a string whose space-separated parts are sorted. Here is my approach.
x = 'ec2ssmMaintWindow,AmazonSSMMaintenanceWindowRole,ec2.amazonaws.com ssm.amazonaws.com'
sorted_only_space_sparated = [ ' '.join( z for z in sorted(y.split(' '), reverse=True)) for y in x.split(',')]
print(','.join(str(i) for i in sorted_only_space_sparated))
Output:
ec2ssmMaintWindow,AmazonSSMMaintenanceWindowRole,ssm.amazonaws.com ec2.amazonaws.com
Let me know if it helps.

The problem is that your strings mix upper and lower case. Use the key parameter of sorted() to sort the data regardless of case:
Services=["ec2ssmMaintWindow","AmazonSSMMaintenanceWindowRole","ssm.amazonaws.com" ,"ec2.amazonaws.com"]
s=""
s+=" ".join(sorted(Services,key=lambda x:x.lower()))
OUT
AmazonSSMMaintenanceWindowRole ec2.amazonaws.com ec2ssmMaintWindow ssm.amazonaws.com
If Services always holds these 4 values, sorting the list gives each value the same index every time, so you can simply access a value by its index.

Thanks everyone, I found another solution. I had issues because the code in the question is native Python 3 code (I ran it under Python 2.7), and obj.sort() threw an error:
'unicode' object has no attribute 'sort'
I corrected the code so that it checks whether obj is unicode:
def get_user_group_service(element):
    s = ''
    for e in element['AssumeRolePolicyDocument']['Statement']:
        p = e['Principal']
        if 'Federated' in p:
            s += p['Federated']
        if 'Service' in p:
            obj = p['Service']
            if type(obj) in (str, unicode):
                s += obj  # element is string
            else:
                obj.sort()
                s += ''.join(obj)  # element is array of strings
        if 'AWS' in p:
            s += p['AWS']
    return s
Now the values are sorted alphabetically on every iteration.
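For readers on Python 3, where there is no separate unicode type, the same idea might be sketched as follows. This is only an illustration: the function name principal_summary and the choice to join the parts with a space are assumptions, not part of the original code. It uses sorted(), which returns a new list and leaves the parsed JSON untouched.

```python
def principal_summary(role):
    """Join the principals of a role, listing services in sorted order."""
    parts = []
    for statement in role['AssumeRolePolicyDocument']['Statement']:
        principal = statement['Principal']
        if 'Federated' in principal:
            parts.append(principal['Federated'])
        if 'Service' in principal:
            services = principal['Service']
            if isinstance(services, str):
                parts.append(services)  # single service given as a string
            else:
                parts.extend(sorted(services))  # list of services, sorted copy
        if 'AWS' in principal:
            parts.append(principal['AWS'])
    # Assumption: a space separator, matching the output shown in the question.
    return ' '.join(parts)


role = {
    'AssumeRolePolicyDocument': {
        'Statement': [
            {'Principal': {'Service': ['ssm.amazonaws.com', 'ec2.amazonaws.com']}}
        ]
    }
}
print(principal_summary(role))  # ec2.amazonaws.com ssm.amazonaws.com
```

Whatever order the API returns the services in, the result is the same string.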

Related

Access dictionary to X depth with a list of X values

Situation
I want to write a function that lets me pass a full dictionary path as a parameter and get back the value or node I need, without walking the dictionary node by node.
Code
This is the function. Obviously, as written, it throws TypeError: unhashable type: 'list'. It is only meant to convey the idea.
def get_section(api_data, section):
    if "/" in section:
        section = section.split("/")
        return api_data.json()[section]
    return api_data.json()[section]
Example
JSON
{
"component": {
"name": "gino",
"measures": [
{
"value": "12"
},
{
"value": "14"
}
]
},
"metrics": {
...
}
}
Expectation
analyses = get_section(analyses_data, "component/measures") # Returns measures node
analyses = get_section(analyses_data, "component/name") # Returns 'gino'
analyses = get_section(analyses_data, "component/measures/value") # Returns error, because it's ambiguous
Request
How can I do it?
Edits
Added examples for clarity
A cool solution could be:
def get_section(api_data, section):
    return [api_data := api_data[sec] for sec in section.split("/")][-1]
So if you execute it with:
analyses_data = {
"analyses": {
"dates": {
"xyz": "abc"
}
}
}
print(get_section(analyses_data, "analyses/dates/xyz")) # Returns: abc
Or since you are accessing a json using a custom method:
print(get_section(analyses_data.json(), "analyses/dates/xyz")) # Returns: abc
This works because the := operator (available since Python 3.8) is an assignment expression: it assigns a value and also evaluates to that value. The comprehension walks the parts of the section string, rebinding api_data to the result of each key lookup and collecting every intermediate result in a list. The trailing [-1] returns the last element, which is the value at the deepest level accessed.
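If building a throwaway list feels wasteful, or the code has to run on a Python older than 3.8, the same walk can be sketched with functools.reduce. This is an alternative to the answer above, not the answer's own method. The sep parameter is an added assumption, and the sample data is taken from the question.

```python
from functools import reduce


def get_section(data, path, sep="/"):
    """Follow each key in `path` through nested dicts and return the final value."""
    return reduce(lambda node, key: node[key], path.split(sep), data)


analyses_data = {
    "component": {
        "name": "gino",
        "measures": [{"value": "12"}, {"value": "14"}],
    }
}
print(get_section(analyses_data, "component/name"))      # gino
print(get_section(analyses_data, "component/measures"))  # the list of measures
```

With this sketch, "component/measures/value" raises a TypeError, because a list cannot be indexed with the string "value". That matches the error expected in the question for the ambiguous path.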

How do I convert it to a dict

The list I have:
[
"Mathematics-2 (21SMT-125)",
"Mid-Semester Test-1",
"40",
"23.5",
"Mid-Semester Test-2",
"40",
"34",
"Disruptive Technologies - 2 (21ECH-103)",
"Experiment-1",
"20",
"19",
"Experiment-2",
"20",
"17",
"Experiment-3",
"20",
"18.5",
]
This list of strings is parsed from HTML using bs4.
Format to convert it into:
{
"Subject": {
"Mathematics-2 (21SMT-125)": {
"Mid-Semester Test-1": [40,23.5],
"Mid-Semester Test-2": [40,34]
},
"Disruptive Technologies - 2 (21ECH-103)": {
"Experiment-1": [20,19],
"Experiment-2": [20,17],
"Experiment-3": [20,18.5]
}
}
}
The problem is that the list you provided is a flat list of items with no indicator of their hierarchical position in the desired structure.
One approach to consider: if the entries that represent a parent object (Mathematics, etc.) are the only entries that contain parentheses, you could iterate over your list and use either string matching or a regex to identify each parent and create a top-level object for it. You would then add the next two entries, as a list, as the value of the key/value pair.
This assumes that you'll always have two subsequent values at the child level. If the number of attributes isn't fixed but they're always numeric you could use regex to determine if it's numeric or non-numeric and keep adding items to the value list until you hit another non-numeric entry, which would be treated as the next sibling in the hierarchy.
I would review the approach and check whether the information from bs4 can be parsed in some smarter way. Try scraping in more steps: first reach the subject, then the "Semester/Experiment" entry, then the grades.
If that's not possible and the data returned from bs4 cannot be changed, the only thing you can do is try to determine whether each string is the name of a subject, a semester/experiment, or a grade/score, and use some while loops. The subject name seems to end with a special code, which a regexp can distinguish from the name of a semester/experiment, and a grade/score can always be parsed as a number.
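The classification idea described above could be sketched roughly as follows. It assumes, as the answers do, that only subject names contain a parenthesis and that every score parses as a number. The function name build_subject_map is made up for illustration, and the sketch does no validation: a score that appears before any test name would fail.

```python
def build_subject_map(items):
    """Group a flat list into {"Subject": {subject: {test: [scores, ...]}}}."""
    subjects = {}
    current_subject = None  # dict of tests for the subject being filled
    current_scores = None   # list of scores for the test being filled
    for item in items:
        try:
            number = float(item)
        except ValueError:
            number = None
        if number is not None:
            # Numeric entry: a score for the most recent test name.
            current_scores.append(int(number) if number.is_integer() else number)
        elif "(" in item:
            # Entry with a parenthesis: start a new subject (assumption).
            current_subject = subjects.setdefault(item, {})
        else:
            # Any other text: a test or experiment under the current subject.
            current_scores = current_subject.setdefault(item, [])
    return {"Subject": subjects}


sample = [
    "Mathematics-2 (21SMT-125)",
    "Mid-Semester Test-1", "40", "23.5",
    "Mid-Semester Test-2", "40", "34",
    "Disruptive Technologies - 2 (21ECH-103)",
    "Experiment-1", "20", "19",
]
result = build_subject_map(sample)
print(result["Subject"]["Mathematics-2 (21SMT-125)"]["Mid-Semester Test-1"])  # [40, 23.5]
```

Because scores are appended until the next non-numeric entry, this also handles tests with more or fewer than two numbers.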
For data exactly like yours (where a string with a ( denotes a top-level entry, and there are always two numbers per entry), you could come up with a state machine sort of thing like this -- but like I commented, you really should improve your parsing code instead, since the HTML you're scraping your data off is likely already structured.
def is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def parse_inp(inp):
    flat_map = {}
    stack = []
    x = 0
    while x < len(inp):
        if "(" in inp[x]:
            stack.clear()
        if is_float(inp[x]) and is_float(inp[x + 1]):
            flat_map[tuple(stack)] = (float(inp[x]), float(inp[x + 1]))
            x += 2
            stack.pop(-1)
            continue
        stack.append(inp[x])
        x += 1
    return flat_map

def nest_flat_map(flat_map):
    root = {}
    for key_path, values_list in flat_map.items():
        dst = root
        for key in key_path[:-1]:
            dst = dst.setdefault(key, {})
        dst[key_path[-1]] = values_list
    return root

inp = [
    # ... data from original post
]
nested_map = nest_flat_map(parse_inp(inp))
print(nested_map)
This outputs the expected
{
"Mathematics-2 (21SMT-125)": {
"Mid-Semester Test-1": (40.0, 23.5),
"Mid-Semester Test-2": (40.0, 34.0),
},
"Disruptive Technologies - 2 (21ECH-103)": {
"Experiment-1": (20.0, 19.0),
"Experiment-2": (20.0, 17.0),
"Experiment-3": (20.0, 18.5),
},
}
You can use a fuzzy form of itertools.groupby to find the groups in this list of strings. This assumes that every class ends with the pattern "(classref-section)", and that it is followed by test or homework names each followed by one or more numeric scores.
source_data = [
"Mathematics-2 (21SMT-125)",
"Mid-Semester Test-1",
"40",
"23.5",
"Mid-Semester Test-2",
"40",
"34",
"Disruptive Technologies - 2 (21ECH-103)",
"Experiment-1",
"20",
"19",
"Experiment-2",
"20",
"17",
"Experiment-3",
"20",
"18.5",
]
from collections import defaultdict
import itertools
import json
import re

class_id_pattern = re.compile(r"\([A-Z0-9]+-\d+\)")

def is_class_reference(s):
    return bool(class_id_pattern.match(s.rsplit(" ", 1)[-1]))

def group_by_class(s):
    if is_class_reference(s):
        group_by_class.current_class = s
    return group_by_class.current_class

group_by_class.current_class = ""

def convert_numeric(s):
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return None

def is_score(s):
    return convert_numeric(s) is not None

def is_test(s):
    return not is_score(s)

def group_by_test(s):
    if is_test(s):
        group_by_test.current_test = s
    return group_by_test.current_test

group_by_test.current_test = ""

accum = defaultdict(lambda: defaultdict(list))
for class_name, class_name_and_tests in itertools.groupby(source_data, key=group_by_class):
    class_name, *tests = class_name_and_tests
    for test_name, test_name_and_scores in itertools.groupby(tests, key=group_by_test):
        test_name, *scores = test_name_and_scores
        accum[class_name][test_name].extend(convert_numeric(s) for s in scores)

print(json.dumps(accum, indent=4))
Prints:
{
"Mathematics-2 (21SMT-125)": {
"Mid-Semester Test-1": [
40,
23.5
],
"Mid-Semester Test-2": [
40,
34
]
},
"Disruptive Technologies - 2 (21ECH-103)": {
"Experiment-1": [
20,
19
],
"Experiment-2": [
20,
17
],
"Experiment-3": [
20,
18.5
]
}
}
Read more about fuzzy groupby in my blog post: https://thingspython.wordpress.com/2020/11/11/fuzzy-groupby-unusual-restaurant-part-ii/

Save values from POST request of a list of dicts

I am trying to expose an API (if that's the correct way to say it). I am using Quart, a Python library based on Flask, and this is what my code looks like:
import asyncio
from quart import Quart, request

app = Quart(__name__)

async def capture_post_request(request_json):
    for item in request_json:
        callbackidd = item['callbackid']
        print(callbackidd)

@app.route('/start_work/', methods=['POST'])
async def start_work():
    content_type = request.headers.get('content-type')
    if (content_type == 'application/json'):
        request_json = await request.get_json()
        loop = asyncio.get_event_loop()
        loop.create_task(capture_post_request(request_json))
        body = "Async Job Started"
        return body
    else:
        return 'Content-Type not supported!'
My schema looks like that:
[
{
"callbackid": "dd",
"itemid": "234r",
"input": [
{
"type": "thistype",
"uri": "www.uri.com"
}
],
"destination": {
"type": "thattype",
"uri": "www.urino2.com"
}
},
{
"statusCode": "202"
}
]
So far what I am getting is this error:
line 11, in capture_post_request
callbackidd = item['callbackid']
KeyError: 'callbackid'
I've tried so many stackoverflow posts to see how to iterate through my list of dicts but nothing worked. At one point in my start_work function I was using the get_data(as_text=True) method but still no results. In fact with the last method (or attr) I got:
TypeError: string indices must be integers
Any help on how to access those values is greatly appreciated. Cheers.
Your schema indicates there are two items in the request_json. The first indeed has the callbackid, the 2nd only has statusCode.
Debugging this should be easy:
async def capture_post_request(request_json):
    for item in request_json:
        print(item)
        callbackidd = item.get('callbackid')
        print(callbackidd)  # will be None in case of the 2nd 'item'
This will print two dicts:
{
"callbackid": "dd",
"itemid": "234r",
"input": [
{
"type": "thistype",
"uri": "www.uri.com"
}
],
"destination": {
"type": "thattype",
"uri": "www.urino2.com"
}
}
And the 2nd, the cause of your KeyError:
{
"statusCode": "202"
}
I included the 'fix' of sorts already:
callbackidd = item.get('callbackid')
This will default to None if the key isn't in the dict.
Hopefully this will get you further!
Edit
How to work with only the dict containing your key? There are two options.
First, using filter. Something like this:
def has_callbackid(dict_to_test):
    return 'callbackid' in dict_to_test
list_with_only_list_callbackid_items = list(filter(has_callbackid, request_json))
# Still a list at this point! With dicts which have the `callbackid` key
filter accepts two arguments:
A function that is called on each value to decide whether it is kept (truthy result) or filtered out.
The iterable you want to filter.
Could also use a 'lambda function', but it's a bit evil. But serves the purpose just as well:
list_with_only_list_callbackid_items = list(filter(lambda x: 'callbackid' in x, request_json))
# Still a list at this point! With dict(s) which have the `callbackid` key
Option 2, simply loop over the result and only grab the one you want to use.
found_item = None  # default
for item in request_json:
    if 'callbackid' in item:
        found_item = item
        break  # found what we're looking for, stop now
# Do stuff with the found_item from this point.
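Both options can also be written more compactly. The sketch below is one possible way, using a trimmed copy of the payload from the question. A list comprehension replaces filter, and next() with a default replaces the explicit loop.

```python
request_json = [
    {"callbackid": "dd", "itemid": "234r"},
    {"statusCode": "202"},
]

# Keep only dicts that actually carry a callbackid, then pull the value out.
callback_ids = [item["callbackid"] for item in request_json if "callbackid" in item]
print(callback_ids)  # ['dd']

# next() with a default returns the first match, or None if there is none.
first_match = next((item for item in request_json if "callbackid" in item), None)
print(first_match)  # {'callbackid': 'dd', 'itemid': '234r'}
```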

Python3: How to convert plain html into nested dictionary based on level of `h` tags?

I have a html that looks like this:
<h1>Sanctuary Verses</h1>
<h2>Purpose and Importance of the Sanctuary</h2>
<p>Ps 73:17\nUntil I went into the sanctuary of God; [then] understood I their end.</p>
<p>...</p>
<h2>Some other title</h2>
<p>...</p>
<h3>sub-sub-title</h3>
<p>sub-sub-content</p>
<h2>Some different title</h2>
<p>...</p>...
There are no div or section tags that group the p tags. It works well for display purposes. I want to extract data such that I get the desired output.
Desired Output:
The h tags should be displayed as titles and nested according to their levels
The p tags should be added to the contents under the specific title as given by the h tag
Desired Output:
{
"title": "Sanctuary Verses"
"contents": [
{"title": "Purpose and Importance of the Sanctuary"
"contents":["Ps 73:17\nUntil I went into the sanctuary of God; [then] understood I their end.",
"...."
]
},
{"title": "Some other title"
"contents": ["...",
{"title": "sub-sub-title"
"content": ["sub-sub-content"]
}
]
},
{"title": "Some different title"
"content": ["...","..."]
}
}
I had written some workaround code that helped me get the desired output. I am wondering which is the easiest way to get the desired output
This is sort of a stack problem/graph problem. Let's call it a tree (or a document, or whatever).
I think your initial tuple could be improved. (text, depth, type)
stack = []
depth = 0
broken_value = -1
current = {"title": "root", "contents": []}
for item in list_of_tuples:
    if item[1] > depth:
        # deeper: open a child of the current element
        child = {"title": item[0], "contents": []}
        current["contents"].append(child)
        stack.append(current)
        current = child
        depth = item[1]
    elif item[1] < depth:
        # shallower: close the current element and return to the previous level
        while depth > item[1]:
            prev = stack.pop()
            depth = depth - 1
        current = {"title": item[0], "contents": []}
        stack[-1]["contents"].append(current)
        depth = item[1]
    else:
        # same depth
        if item[2] == broken_value:
            # <p> element gets added to current level.
            current["contents"].append(item[0])
        else:
            # <h> element gets added to parent of current.
            current = {"title": item[0], "contents": []}
            stack[-1]["contents"].append(current)
            broken_value = item[2]
This would create an arbitrary depth graph that assumes the depth increases by 1 but
could decrease by an arbitrary number.
It would probably be best to keep track of the depth in the dictionary so that you can move more than one depth at a time. Instead of just "title" and "content" maybe "title", "depth", and "content"
Explanation
The stack keeps track of open elements, and our current element is the element we are building.
If we find a depth greater than our current depth, then we put the current element on the stack (it is still open) and start working on the next level element.
If the depth is less than the current element, we will close the current element and parent elements up to the same depth.
Finally if it is the same depth, we decide if it is a 'p' element that just gets added, or another 'h' that closes the current and starts a new current.
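The steps above can be sketched in a compact, self-contained form. This is a variation on the idea rather than the code above. It assumes the input has already been reduced to (level, text) pairs, where level is the heading number (1 for h1, 2 for h2, ...) and None marks a paragraph. The name build_tree and that tuple shape are illustrative assumptions.

```python
def build_tree(items):
    """Nest (level, text) pairs; level is 1-6 for headings, None for paragraphs."""
    root = {"title": "root", "contents": []}
    stack = [(0, root)]  # open sections as (heading level, node)
    for level, text in items:
        if level is None:
            # Paragraph: belongs to the innermost open section.
            stack[-1][1]["contents"].append(text)
            continue
        # Heading: close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"title": text, "contents": []}
        stack[-1][1]["contents"].append(node)
        stack.append((level, node))
    return root["contents"]


items = [
    (1, "Sanctuary Verses"),
    (2, "Purpose and Importance of the Sanctuary"),
    (None, "Ps 73:17 ..."),
    (2, "Some other title"),
    (None, "..."),
    (3, "sub-sub-title"),
    (None, "sub-sub-content"),
    (2, "Some different title"),
    (None, "..."),
]
tree = build_tree(items)
print(tree[0]["title"])  # Sanctuary Verses
```

Storing the heading level next to each open node lets the depth jump by more than one in either direction, which is the improvement suggested above.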
You can use recursion with itertools.groupby:
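import itertools as it, re
def to_tree(d):
    # d is a list of (leading_whitespace, line) pairs; lines with no indentation start a new group
    v, r = [list(b) for _, b in it.groupby(d, key=lambda x: not x[0])], []
    for i in v:
        if r and isinstance(r[-1], dict) and not r[-1]['content']:
            # indented lines become the content of the preceding title (strips 4 characters of indentation per level)
            r[-1]['content'] = to_tree([(j[4:], k) for j, k in i])
        else:
            for _, k in i:
                r.append(re.sub(r'</*\w+>', '', k) if not re.findall(r'^<h', k) else {'title': re.sub(r'</*\w+>', '', k), 'content': []})
    return r
import json
# html is assumed to hold the markup as a string, with nesting expressed by leading whitespace
result = to_tree([((lambda x: '' if not x else x[0])(re.findall(r'^\s+', i)), re.sub(r'^\s+', '', i)) for i in filter(None, html.split('\n'))])
print(json.dumps(result[0], indent=4))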
Output:
{
"title": "Sanctuary Verses",
"content": [
{
"title": "Purpose and Importance of the Sanctuary",
"content": [
"Ps 73:17 Until I went into the sanctuary of God; [then] understood I their end.",
"..."
]
},
{
"title": "Some other title",
"content": [
"...",
{
"title": "sub-sub-title",
"content": [
"sub-sub-content"
]
}
]
},
{
"title": "Some different title",
"content": [
"..."
]
}
]
}

merge dictionaries and have one big dictionary within list

payload = [
{
"Beds:": "3"
},
{
"Baths:": "2.0"
},
{
"Sqft:": "1,260"
},
]
How would I have such list be like:
payload = [{'Beds':"3","Baths":"2.0","Sqft":"1,260"}]
instead of multiple dictionaries; I want one dictionary within the list.
Try this:
payload_new = [{i: j[i] for j in payload for i in j}]
This should help. Use the replace method to remove ":"
payload = [
{
"Beds:": "3"
},
{
"Baths:": "2.0"
},
{
"Sqft:": "1,260"
},
]
newDict = [{k.replace(":", ""): v for j in payload for k, v in j.items()}]
print(newDict)
Output:
[{'Beds': '3', 'Baths': '2.0', 'Sqft': '1,260'}]
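The same merge can be written as an explicit loop, which may be easier to read and extend. This is a sketch only: merge_dicts and its strip parameter are illustrative names. It uses rstrip so that only trailing colons are removed, whereas replace would also remove a colon in the middle of a key.

```python
def merge_dicts(dicts, strip=":"):
    """Merge a list of dicts into one, trimming trailing `strip` characters from keys."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            merged[key.rstrip(strip)] = value  # later dicts win on duplicate keys
    return merged


payload = [{"Beds:": "3"}, {"Baths:": "2.0"}, {"Sqft:": "1,260"}]
payload = [merge_dicts(payload)]
print(payload)  # [{'Beds': '3', 'Baths': '2.0', 'Sqft': '1,260'}]
```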
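Python 3.5+ has built-in dictionary unpacking with **. It is not allowed inside a comprehension ({**d for d in payload} is a SyntaxError), but it can be combined with functools.reduce:
from functools import reduce
payload = [reduce(lambda merged, d: {**merged, **d}, payload, {})]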
To merge dictionaries in a big dictionary, you can write it this way:
payload={"Beds": 3 ,
"Baths": 2.0,
"Sqft": 1260
}
output:
>>>payload["Baths"]
2.0
Notes:
Using [] made it an array/list rather than a dictionary.
Using "" around the values (e.g. "3") made them strings instead of numbers.
