Iterating through separately-JSON-encoded strings found inside a document - python
I am having trouble iterating through JSON that itself contains nested JSON strings (with escaped quotes).
(My apologies in advance, I am sort of new and probably missing some important info...)
Actually I have several questions:
1) How can I iterate (as I tried to do below with nested for loops) through the elements beneath "section-content" of the section "nodes" (!not of the section "element-names"!)? My problem seems to be, that section-content is a string with escaped quotes, which represents a separate json string in itself.
2) Is the JSON example provided even valid json? I tried several validators, which all seem to fail when the escaped quotes come into play.
3) Is there a smarter method of accessing specific elements, instead of just iterating through the whole tree?
I am thinking of something that specifies key/value pairs like:
my_json_obj['sections']['section-id' = 'nodes']['section-content']['occ_id' = '051MZjd97jUdYfSEOG}k10']
Code:
import json
import requests
import pprint
client = requests.session()
header = {'X-CSRF-Token': 'Fetch', 'Accept': 'application/json', 'Content-Type': 'application/json'}
response = client.get('http://xxxxxx.xxx/ProcessManagement/BranchContentSet(BranchId=\'051MZjd97jUdYfX7{dREAm\',SiteId=\'\',SystemRole=\'D\')/$value',auth=('TestUser', 'TestPass'),headers=header)
my_json_obj = response.json()
sections = my_json_obj['sections']
for mysection in sections:
    print(mysection['section-id'])
    if mysection['section-id'] == 'NODES':
        nodes = mysection['section-content']  # nodes seems to be string
        for mynode in nodes:
            print(mynode)  # prints string character by character
JSON example:
{
"smud-data-version": "0.1",
"sections": [
{
"section-id": "ELEMENT-NAMES",
"section-content-version": "",
"section-content": "{\"D\":[
{\"occ_id\":\"051MZjd97kcBgtZiEI0IvW\",\"lang\":\"E\",\"name\":\"0TD1 manuell\"},
{\"occ_id\":\"051MZjd97kcBgtZiEH}IvW\",\"lang\":\"E\",\"name\":\"Documentation\"}
]}"
},
{
"section-id": "NODES",
"section-content-version": "1.0",
"section-content": "[
{\"occ_id\":\"051MZjd97jUdYfSEOG}k10\",\"obj_type\":\"ROOT\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[]},
{\"occ_id\":\"051MZjd97jUdYfSEOH0k10\",\"obj_type\":\"ROOTGRP\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[]},
{\"occ_id\":\"051MZjd97jcAnKoe03JRRm\",\"obj_type\":\"SCN\",\"reference\":\"\",\"deleted\":\"\",\"attributes\":[
{\"attr_type\":\"NODE_CHANGED_AT\",\"lang\":\"\",\"values\":[\"20190213095843\"]},
{\"attr_type\":\"NODE_CHANGED_BY\",\"lang\":\"\",\"values\":[\"TestUser\"]},
{\"attr_type\":\"TCASSIGNMENTTYPE\",\"lang\":\"\",\"values\":[\"A\"]},
{\"attr_type\":\"DESCRIPTION\",\"lang\":\"E\",\"values\":[\"Scenario\"]}
]}
]"
}
]
}
Actual output:
ELEMENT-NAMES
NODES
[
{
"
o
c
c
_
i
d
"
Hopefully you can convince the folks who are generating this data to fix their server. That said, a workaround might look like:
# instead of using response.json(), remove literal newlines and then decode ourselves
# ...because the original data has newline literals in positions where they aren't allowed.
my_json_obj = json.loads(response.text.replace('\n', ''))
for section in my_json_obj['sections']:
    if section['section-id'] != 'NODES':
        continue
    # doing another json.loads() here so you treat content as an array, not a string
    for node in json.loads(section['section-content']):
        __import__('pprint').pprint(node)
...properly emits as output:
{u'attributes': [],
 u'deleted': u'',
 u'obj_type': u'ROOT',
 u'occ_id': u'051MZjd97jUdYfSEOG}k10',
 u'reference': u''}
{u'attributes': [],
 u'deleted': u'',
 u'obj_type': u'ROOTGRP',
 u'occ_id': u'051MZjd97jUdYfSEOH0k10',
 u'reference': u''}
{u'attributes': [{u'attr_type': u'NODE_CHANGED_AT',
                  u'lang': u'',
                  u'values': [u'20190213095843']},
                 {u'attr_type': u'NODE_CHANGED_BY',
                  u'lang': u'',
                  u'values': [u'TestUser']},
                 {u'attr_type': u'TCASSIGNMENTTYPE',
                  u'lang': u'',
                  u'values': [u'A']},
                 {u'attr_type': u'DESCRIPTION',
                  u'lang': u'E',
                  u'values': [u'Scenario']}],
 u'deleted': u'',
 u'obj_type': u'SCN',
 u'occ_id': u'051MZjd97jcAnKoe03JRRm',
 u'reference': u''}
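On the third question (addressing a specific element instead of walking the whole tree): plain dicts and lists have no key/value selector syntax, but a small lookup helper gets close. The following is only a sketch in Python 3; `find_by` is a hypothetical helper name, and the `document` below is a shortened stand-in for the server response, built with `json.dumps` so that `section-content` is a properly encoded nested JSON string.

```python
import json


def find_by(items, key, value):
    """Return the first dict in items whose key equals value, or None."""
    return next((item for item in items if item.get(key) == value), None)


# Shortened stand-in for the response: section-content is itself a JSON string.
nodes = [
    {"occ_id": "051MZjd97jUdYfSEOG}k10", "obj_type": "ROOT"},
    {"occ_id": "051MZjd97jUdYfSEOH0k10", "obj_type": "ROOTGRP"},
]
document = {
    "sections": [
        {"section-id": "ELEMENT-NAMES", "section-content": json.dumps({"D": []})},
        {"section-id": "NODES", "section-content": json.dumps(nodes)},
    ]
}

# Pick the section by its id, decode the nested string, then pick the node.
section = find_by(document["sections"], "section-id", "NODES")
decoded_nodes = json.loads(section["section-content"])  # second decode step
node = find_by(decoded_nodes, "occ_id", "051MZjd97jUdYfSEOG}k10")
print(node["obj_type"])
```

If many lookups are needed, building a dict keyed by `occ_id` once (for example `{n["occ_id"]: n for n in decoded_nodes}`) avoids rescanning the list each time.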
Related
Extract array values from JSON with character removal in python
I have the following JSON string and I am trying to extract the values to a python list. I achieved getting the id_list string but I want to get every single value without the : in each of them. EDIT: Using python json library is not an option. My approach (never used a lot of regex before): https://regex101.com/r/qxYe9N/1 I want to use the expression with re.filterall(EXPR, jsonstr) to receive a list like: result = ["B01M8QSY16", "B017XBDBI6", ...more ] { "ajax": { "params": { "asinMetadataKeys": "adId", "featureId": "SimilaritiesCarousel", "reftagPrefix": "pd_sbs_60", "widgetTemplateClass": "PI::Similarities::ViewTemplates::Carousel::Desktop", "imageHeight": 160, "linkGetParameters": "{\"pf_rd_s\":\"desktop-dp-sims\",\"pf_rd_m\":\"A3JWKAKR8XB7XF\",\"pd_rd_r\":\"ac83cd73-b019-11e8-99c8-33d23753c678\",\"pf_rd_r\":\"H21WNBAW5EGZX90ND4PN\",\"pf_rd_t\":\"40701\",\"pd_rd_wg\":\"e6DPw\",\"pf_rd_p\":\"946762da-975a-438a-9e2b-a585cbe769b5\",\"pf_rd_i\":\"desktop-dp-sims\",\"pd_rd_w\":\"xg8TH\"}", "faceoutTemplateClass": "PI::P13N::ViewTemplates::Product::Desktop::CarouselFaceout", "auiDeviceType": "desktop", "imageWidth": 160, "schemaVersion": 2, "productDetailsTemplateClass": "PI::P13N::ViewTemplates::ProductDetails::Desktop::Base", "forceFreshWin": 0, "productDataFlavor": "Faceout", "relatedRequestID": "H21WNBAW5EGZX90ND4PN", "maxLineCount": 6 }, "id_list": ["B01M8QSY16:", "B017XBDBI6:", "B01GL5MYCE:", "B0751DHYXC:", "B01AHWOH54:", "B01M7XYENW:", "B01N7FKKXV:", "B07C1NLKS5:", "B00R25QZDC:", "B01AJB1VFW:", "B079K773M7:", "B07DX3W41P:", "B01GL5606A:", "B07654YLSB:", "B01GFL6MZE:", "B00WLI5E3M:", "B01CTE28DG:", "B01BELELVC:", "B00ZY7H91M:", "B077TPG2WK:", "B01G503MC6:", "B01LYZFC4V:", "B00ID9UQYK:", "B07C3T52LB:", "B07DX39RNS:", "B076551MZP:", "B0761RWKPQ:", "B00T8FD9YM:", "B07653JBYS:", "B07G316H74:", "B01FSEBC9K:", "B014QKBVH0:", "B01BVA2I4S:", "B01CVOZNAE:", "B07D19JDH9:", "B018ACDMJK:", "B00V0H83YW:", "B07C432PK3:", "B07B9P4T4V:", "B076H4WWLK:", "B077G3Y86F:", 
"B077Z7XLJF:", "B01NCFB2BB:", "B01M4I7FMC:", "B01BEVFJCM:", "B01FSEBC8G:", "B07DXCTKB6:", "B01NBHYAR0:", "B07DGWJ887:", "B00SLP58SU:", "B01N55H5AE:", "B013AZCPLS:", "B076PC3NYV:", "B01BVA2JHE:", "B07FF38J8C:", "B07DHGTS81:", "B00R25QZHS:"], "url": "/gp/p13n-shared/faceout-partial", "id_param_name": "asins" }, "baseAsin": "B01GL56060", "name": "desktop-dp-sims_session-similarities", "set_size": 57 } EDIT: Raw string: {"ajax":{"params":{"asinMetadataKeys":"adId","featureId":"SimilaritiesCarousel","reftagPrefix":"pd_sbs_193","widgetTemplateClass":"PI::Similarities::ViewTemplates::Carousel::Desktop","imageHeight":160,"linkGetParameters":"{\"pf_rd_s\":\"desktop-dp-sims\",\"pf_rd_m\":\"A3JWKAKR8XB7XF\",\"pd_rd_r\":\"e672bcd4-b03e-11e8-8dbb-41abd883f66d\",\"pf_rd_r\":\"X5Z293FJ403CC225M759\",\"pf_rd_t\":\"40701\",\"pd_rd_wg\":\"CrGGS\",\"pf_rd_p\":\"946762da-975a-438a-9e2b-a585cbe769b5\",\"pf_rd_i\":\"desktop-dp-sims\",\"pd_rd_w\":\"ktYgt\"}","faceoutTemplateClass":"PI::P13N::ViewTemplates::Product::Desktop::CarouselFaceout","auiDeviceType":"desktop","imageWidth":160,"schemaVersion":2,"productDetailsTemplateClass":"PI::P13N::ViewTemplates::ProductDetails::Desktop::Base","forceFreshWin":0,"productDataFlavor":"Faceout","relatedRequestID":"X5Z293FJ403CC225M759","maxLineCount":6},"id_list":["B07BHS22V6:","B00ITJNHX6:","B07DDGCLZ1:","B017XYQ4X2:","B01LYA8CLG:","B0747T62HS:","B00LHT0I78:","B071D5LL18:","B071NPLTRS:","B00CFMRFO0:","B01N4X1EL9:","B077R4WZ46:","B00YTZSTVY:","B073V5T8G2:","B00CFMRI7E:","B01ARIYIPM:","B0747X16FY:","B00ZWNPJVA:","B01N4WZ4AL:","B00BU662AU:","B07C2NYVMP:","B01FD7ZOB4:","B017M17VTC:","B00YTZST0K:","B07CVSJG6H:","B00V63GQBC:","B00NYBAJJY:","B01MCZ2ZQC:","B078BSJ8TV:","B077QXWJBR:","B07BL5FWVP:","B00N8SPSSU:","B01LXMVFGI:","B06ZY83D2Z:","B00ZQYY9TI:","B0761HT6JJ:","B06XRWB686:","B075XHDQ85:","B01LYJMK02:","B018JWYKRE:","B0759W61P6:","B078ZKNGRS:","B013BJBZBE:","B01LYMTVY2:","B072VMTVGZ:","B077QXW1Z9:","B07CMB96BX:","B07BNXNMZ5:","B01N3CY4Y3:","B018JX3J7U:"
,"B0747T5MY1:","B07CQPTFDB:","B077QW292J:","B00LHT0GLQ:","B01C4B17XG:","B019WD74F4:"],"url":"/gp/p13n-shared/faceout-partial","id_param_name":"asins"},"baseAsin":"B01LS24R2U","name":"desktop-dp-sims_session-similarities","set_size":56}
Just use Python's json library:
import json
j1 = """{ "ajax": { "params": { "asinMetadataKeys": "adId", "featureId": "SimilaritiesCarousel", "reftagPrefix": "pd_sbs_60", "widgetTemplateClass": "PI::Similarities::ViewTemplates::Carousel::Desktop", "imageHeight": 160, "faceoutTemplateClass": "PI::P13N::ViewTemplates::Product::Desktop::CarouselFaceout", "auiDeviceType": "desktop", "imageWidth": 160, "schemaVersion": 2, "productDetailsTemplateClass": "PI::P13N::ViewTemplates::ProductDetails::Desktop::Base", "forceFreshWin": 0, "productDataFlavor": "Faceout", "relatedRequestID": "H21WNBAW5EGZX90ND4PN", "maxLineCount": 6 }, "id_list": ["B01M8QSY16:", "B017XBDBI6:", "B01GL5MYCE:", "B0751DHYXC:", "B01AHWOH54:", "B01M7XYENW:", "B01N7FKKXV:", "B07C1NLKS5:", "B00R25QZDC:", "B01AJB1VFW:", "B079K773M7:", "B07DX3W41P:", "B01GL5606A:", "B07654YLSB:", "B01GFL6MZE:", "B00WLI5E3M:", "B01CTE28DG:", "B01BELELVC:", "B00ZY7H91M:", "B077TPG2WK:", "B01G503MC6:", "B01LYZFC4V:", "B00ID9UQYK:", "B07C3T52LB:", "B07DX39RNS:", "B076551MZP:", "B0761RWKPQ:", "B00T8FD9YM:", "B07653JBYS:", "B07G316H74:", "B01FSEBC9K:", "B014QKBVH0:", "B01BVA2I4S:", "B01CVOZNAE:", "B07D19JDH9:", "B018ACDMJK:", "B00V0H83YW:", "B07C432PK3:", "B07B9P4T4V:", "B076H4WWLK:", "B077G3Y86F:", "B077Z7XLJF:", "B01NCFB2BB:", "B01M4I7FMC:", "B01BEVFJCM:", "B01FSEBC8G:", "B07DXCTKB6:", "B01NBHYAR0:", "B07DGWJ887:", "B00SLP58SU:", "B01N55H5AE:", "B013AZCPLS:", "B076PC3NYV:", "B01BVA2JHE:", "B07FF38J8C:", "B07DHGTS81:", "B00R25QZHS:"], "url": "/gp/p13n-shared/faceout-partial", "id_param_name": "asins" }, "baseAsin": "B01GL56060", "name": "desktop-dp-sims_session-similarities", "set_size": 57 }"""
d1 = json.loads(j1)
id_list = [elem.replace(":", "") for elem in d1["ajax"]['id_list']]
id_list
Output:
['B01M8QSY16', 'B017XBDBI6', ... 'B00R25QZHS']
I had to remove the line "linkGetParameters : ... " because it does not seem to be JSON-conformant.
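A likely reason for having to drop linkGetParameters is that the JSON was pasted into a regular (non-raw) Python string literal, where Python consumes the backslashes before the JSON parser ever sees them. The sketch below (Python 3) illustrates this with a shortened, hypothetical payload rather than the full data from the question; `raw_text` and `plain_text` are illustrative names.

```python
import json

# The same JSON text, once as a raw literal and once as a regular literal.
raw_text = r'{"linkGetParameters": "{\"pf_rd_t\":\"40701\"}", "id_list": ["B01M8QSY16:"]}'
plain_text = '{"linkGetParameters": "{\"pf_rd_t\":\"40701\"}", "id_list": ["B01M8QSY16:"]}'

# Raw literal: the backslashes reach the JSON parser, so the document decodes.
outer = json.loads(raw_text)
# The value is itself JSON text, so it can be decoded a second time.
inner = json.loads(outer["linkGetParameters"])
print(inner["pf_rd_t"])

# Regular literal: Python already turned \" into ", which breaks the JSON.
try:
    json.loads(plain_text)
    plain_parses = True
except json.JSONDecodeError:
    plain_parses = False
print(plain_parses)
```

When the text comes from a file or an HTTP response instead of a pasted literal, no raw prefix is needed, because no Python-level escape processing takes place.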
If you are sure that the attribute "id_list" will always be on one line, in a similar format with a single space after commas and colons, and the json module is not an option, then you can do the following:
list(                      # make sure the result is a list
    filter(                # filter to…
        None,              # …remove any empty items
        re.split(          # split the line of id_list on…
            r':(?:,\s)?',  # …colon and then optional comma and spaces
            re.search(     # search…
                r'(?<="id_list": \[)((?:"[^"]+:"(?:,\s*)?)+)', j1)  # …for the id_list property and its value
            .group(0)      # take the match
            .replace('"', '')  # and drop all double quotes
)))
['B01M8QSY16', 'B017XBDBI6', 'B01GL5MYCE', 'B0751DHYXC', 'B01AHWOH54', 'B01M7XYENW', 'B01N7FKKXV', 'B07C1NLKS5', 'B00R25QZDC', 'B01AJB1VFW', 'B079K773M7', 'B07DX3W41P', 'B01GL5606A', 'B07654YLSB', 'B01GFL6MZE', 'B00WLI5E3M', 'B01CTE28DG', 'B01BELELVC', 'B00ZY7H91M', 'B077TPG2WK', 'B01G503MC6', 'B01LYZFC4V', 'B00ID9UQYK', 'B07C3T52LB', 'B07DX39RNS', 'B076551MZP', 'B0761RWKPQ', 'B00T8FD9YM', 'B07653JBYS', 'B07G316H74', 'B01FSEBC9K', 'B014QKBVH0', 'B01BVA2I4S', 'B01CVOZNAE', 'B07D19JDH9', 'B018ACDMJK', 'B00V0H83YW', 'B07C432PK3', 'B07B9P4T4V', 'B076H4WWLK', 'B077G3Y86F', 'B077Z7XLJF', 'B01NCFB2BB', 'B01M4I7FMC', 'B01BEVFJCM', 'B01FSEBC8G', 'B07DXCTKB6', 'B01NBHYAR0', 'B07DGWJ887', 'B00SLP58SU', 'B01N55H5AE', 'B013AZCPLS', 'B076PC3NYV', 'B01BVA2JHE', 'B07FF38J8C', 'B07DHGTS81', 'B00R25QZHS']
This is dense and mostly unreadable code; use it as-is, or I can break the logic down more readably if you want.
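For readers who want that breakdown, the same logic can be spelled out one step at a time. This is a sketch in Python 3 under the same single-line assumption; `jsonstr` is a shortened, hypothetical sample, and the intermediate variable names are mine.

```python
import re

# Shortened, hypothetical sample in the same single-line layout as the question.
jsonstr = (
    '{"ajax": {"id_list": ["B01M8QSY16:", "B017XBDBI6:", "B01GL5MYCE:"], '
    '"url": "/gp/p13n-shared/faceout-partial"}}'
)

# Step 1: match the quoted items that follow '"id_list": ['.
match = re.search(r'(?<="id_list": \[)((?:"[^"]+:"(?:,\s*)?)+)', jsonstr)
raw_items = match.group(0)

# Step 2: drop the double quotes around each item.
unquoted = raw_items.replace('"', '')

# Step 3: split on each trailing colon plus the optional ", " separator.
pieces = re.split(r':(?:,\s)?', unquoted)

# Step 4: discard the empty string left over after the last colon.
id_list = [piece for piece in pieces if piece]
print(id_list)
```

Note that `re.search` returns `None` when the pattern is absent, so real input would need a check before calling `.group(0)`.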
Seeing as you can't use the JSON library, you can try this expression (tested on Python 3):
result = [
    id_.strip('":')
    for id_ in re.search(r'"id_list": \[(.*)\],', jsonstr).group(1).split(", ")
]
(where jsonstr is a string containing all of the original JSON code).
To make it easier to understand: the above code uses re.search (not re.filterall as you had suggested) to broadly locate and select the line, group to narrow down the selection, split to transform the string into a list, and strip to trim the unnecessary characters off each list item, leaving you with a list of IDs like the one you specify in your question.
First, as Florian H stated, you should request valid JSON from your source in order to be able to use the json Python module. Someone who provides JSON should provide valid JSON...
EDIT: The JSON seems valid, see below.
Trying to use the json module anyway to address your need, I noted that the parsing problem comes from the escaped double quotes in the linkGetParameters value. I assume the JSON string has been copied and pasted as is, and this is probably the source of the JSON parsing problem. Simply pasting this JSON into a Python string makes Python use the backslash to escape the double quote instead of preserving the two characters. To test the JSON content, you have to copy it into a raw string (prefixed by an r):
import json
json_ = r"""{ "ajax": { "params": { "asinMetadataKeys": "adId", "featureId": "SimilaritiesCarousel", "reftagPrefix": "pd_sbs_60", "widgetTemplateClass": "PI::Similarities::ViewTemplates::Carousel::Desktop", "imageHeight": 160, "linkGetParameters": "{\"pf_rd_s\":\"desktop-dp-sims\",\"pf_rd_m\":\"A3JWKAKR8XB7XF\",\"pd_rd_r\":\"ac83cd73-b019-11e8-99c8-33d23753c678\",\"pf_rd_r\":\"H21WNBAW5EGZX90ND4PN\",\"pf_rd_t\":\"40701\",\"pd_rd_wg\":\"e6DPw\",\"pf_rd_p\":\"946762da-975a-438a-9e2b-a585cbe769b5\",\"pf_rd_i\":\"desktop-dp-sims\",\"pd_rd_w\":\"xg8TH\"}", "faceoutTemplateClass": "PI::P13N::ViewTemplates::Product::Desktop::CarouselFaceout", "auiDeviceType": "desktop", "imageWidth": 160, "schemaVersion": 2, "productDetailsTemplateClass": "PI::P13N::ViewTemplates::ProductDetails::Desktop::Base", "forceFreshWin": 0, "productDataFlavor": "Faceout", "relatedRequestID": "H21WNBAW5EGZX90ND4PN", "maxLineCount": 6 }, "id_list": ["B01M8QSY16:", "B017XBDBI6:", "B01GL5MYCE:", "B0751DHYXC:", "B01AHWOH54:", "B01M7XYENW:", "B01N7FKKXV:", "B07C1NLKS5:", "B00R25QZDC:", "B01AJB1VFW:", "B079K773M7:", "B07DX3W41P:", "B01GL5606A:", "B07654YLSB:", "B01GFL6MZE:", "B00WLI5E3M:", "B01CTE28DG:", "B01BELELVC:", "B00ZY7H91M:", "B077TPG2WK:", "B01G503MC6:", "B01LYZFC4V:", "B00ID9UQYK:", "B07C3T52LB:", "B07DX39RNS:", "B076551MZP:", "B0761RWKPQ:", "B00T8FD9YM:", "B07653JBYS:", "B07G316H74:", "B01FSEBC9K:", "B014QKBVH0:", "B01BVA2I4S:", "B01CVOZNAE:", "B07D19JDH9:", "B018ACDMJK:", "B00V0H83YW:", "B07C432PK3:", "B07B9P4T4V:", "B076H4WWLK:", "B077G3Y86F:", "B077Z7XLJF:", "B01NCFB2BB:", "B01M4I7FMC:", "B01BEVFJCM:", "B01FSEBC8G:", "B07DXCTKB6:", "B01NBHYAR0:", "B07DGWJ887:", "B00SLP58SU:", "B01N55H5AE:", "B013AZCPLS:", "B076PC3NYV:", "B01BVA2JHE:", "B07FF38J8C:", "B07DHGTS81:", "B00R25QZHS:"], "url": "/gp/p13n-shared/faceout-partial", "id_param_name": "asins" }, "baseAsin": "B01GL56060", "name": "desktop-dp-sims_session-similarities", "set_size": 57 }"""
result = json.loads(json_)
print [id_[:-1] for id_ in result['ajax']['id_list']]
# [u'B01M8QSY16', u'B017XBDBI6', u'B01GL5MYCE', u'B0751DHYXC', u'B01AHWOH54', u'B01M7XYENW', u'B01N7FKKXV', u'B07C1NLKS5', u'B00R25QZDC', u'B01AJB1VFW', u'B079K773M7', u'B07DX3W41P', u'B01GL5606A', u'B07654YLSB', u'B01GFL6MZE', u'B00WLI5E3M', u'B01CTE28DG', u'B01BELELVC', u'B00ZY7H91M', u'B077TPG2WK', u'B01G503MC6', u'B01LYZFC4V', u'B00ID9UQYK', u'B07C3T52LB', u'B07DX39RNS', u'B076551MZP', u'B0761RWKPQ', u'B00T8FD9YM', u'B07653JBYS', u'B07G316H74', u'B01FSEBC9K', u'B014QKBVH0', u'B01BVA2I4S', u'B01CVOZNAE', u'B07D19JDH9', u'B018ACDMJK', u'B00V0H83YW', u'B07C432PK3', u'B07B9P4T4V', u'B076H4WWLK', u'B077G3Y86F', u'B077Z7XLJF', u'B01NCFB2BB', u'B01M4I7FMC', u'B01BEVFJCM', u'B01FSEBC8G', u'B07DXCTKB6', u'B01NBHYAR0', u'B07DGWJ887', u'B00SLP58SU', u'B01N55H5AE', u'B013AZCPLS', u'B076PC3NYV', u'B01BVA2JHE', u'B07FF38J8C', u'B07DHGTS81', u'B00R25QZHS']
Once the id_list is retrieved, you can remove the last character of each id using string slicing. When using JSON content from your original source instead of a string literal, you should not encounter this kind of escaping problem.
If it is really not possible, assuming an id is always 10 characters long, this should do the trick:
import re
json = """{ "ajax": { "params": { "asinMetadataKeys": "adId", "featureId": "SimilaritiesCarousel", "reftagPrefix": "pd_sbs_60", "widgetTemplateClass": "PI::Similarities::ViewTemplates::Carousel::Desktop", "imageHeight": 160, "linkGetParameters": "{\"pf_rd_s\":\"desktop-dp-sims\",\"pf_rd_m\":\"A3JWKAKR8XB7XF\",\"pd_rd_r\":\"ac83cd73-b019-11e8-99c8-33d23753c678\",\"pf_rd_r\":\"H21WNBAW5EGZX90ND4PN\",\"pf_rd_t\":\"40701\",\"pd_rd_wg\":\"e6DPw\",\"pf_rd_p\":\"946762da-975a-438a-9e2b-a585cbe769b5\",\"pf_rd_i\":\"desktop-dp-sims\",\"pd_rd_w\":\"xg8TH\"}", "faceoutTemplateClass": "PI::P13N::ViewTemplates::Product::Desktop::CarouselFaceout", "auiDeviceType": "desktop", "imageWidth": 160, "schemaVersion": 2, "productDetailsTemplateClass": "PI::P13N::ViewTemplates::ProductDetails::Desktop::Base", "forceFreshWin": 0, "productDataFlavor": "Faceout", "relatedRequestID": "H21WNBAW5EGZX90ND4PN", "maxLineCount": 6 }, "id_list": ["B01M8QSY16:", "B017XBDBI6:", "B01GL5MYCE:", "B0751DHYXC:", "B01AHWOH54:", "B01M7XYENW:", "B01N7FKKXV:", "B07C1NLKS5:", "B00R25QZDC:", "B01AJB1VFW:", "B079K773M7:", "B07DX3W41P:", "B01GL5606A:", "B07654YLSB:", "B01GFL6MZE:", "B00WLI5E3M:", "B01CTE28DG:", "B01BELELVC:", "B00ZY7H91M:", "B077TPG2WK:", "B01G503MC6:", "B01LYZFC4V:", "B00ID9UQYK:", "B07C3T52LB:", "B07DX39RNS:", "B076551MZP:", "B0761RWKPQ:", "B00T8FD9YM:", "B07653JBYS:", "B07G316H74:", "B01FSEBC9K:", "B014QKBVH0:", "B01BVA2I4S:", "B01CVOZNAE:", "B07D19JDH9:", "B018ACDMJK:", "B00V0H83YW:", "B07C432PK3:", "B07B9P4T4V:", "B076H4WWLK:", "B077G3Y86F:", "B077Z7XLJF:", "B01NCFB2BB:", "B01M4I7FMC:", "B01BEVFJCM:", "B01FSEBC8G:", "B07DXCTKB6:", "B01NBHYAR0:", "B07DGWJ887:", "B00SLP58SU:", "B01N55H5AE:", "B013AZCPLS:", "B076PC3NYV:", "B01BVA2JHE:", "B07FF38J8C:", "B07DHGTS81:", "B00R25QZHS:"], "url": "/gp/p13n-shared/faceout-partial", "id_param_name": "asins" }, "baseAsin": "B01GL56060", "name": "desktop-dp-sims_session-similarities", "set_size": 57 }"""
# https://regex101.com/r/qxYe9N/11
id_re = re.compile('"([A-Z0-9]{10}):"')
result = id_re.findall(json)
print result
# ['B01M8QSY16', 'B017XBDBI6', 'B01GL5MYCE', 'B0751DHYXC', 'B01AHWOH54', 'B01M7XYENW', 'B01N7FKKXV', 'B07C1NLKS5', 'B00R25QZDC', 'B01AJB1VFW', 'B079K773M7', 'B07DX3W41P', 'B01GL5606A', 'B07654YLSB', 'B01GFL6MZE', 'B00WLI5E3M', 'B01CTE28DG', 'B01BELELVC', 'B00ZY7H91M', 'B077TPG2WK', 'B01G503MC6', 'B01LYZFC4V', 'B00ID9UQYK', 'B07C3T52LB', 'B07DX39RNS', 'B076551MZP', 'B0761RWKPQ', 'B00T8FD9YM', 'B07653JBYS', 'B07G316H74', 'B01FSEBC9K', 'B014QKBVH0', 'B01BVA2I4S', 'B01CVOZNAE', 'B07D19JDH9', 'B018ACDMJK', 'B00V0H83YW', 'B07C432PK3', 'B07B9P4T4V', 'B076H4WWLK', 'B077G3Y86F', 'B077Z7XLJF', 'B01NCFB2BB', 'B01M4I7FMC', 'B01BEVFJCM', 'B01FSEBC8G', 'B07DXCTKB6', 'B01NBHYAR0', 'B07DGWJ887', 'B00SLP58SU', 'B01N55H5AE', 'B013AZCPLS', 'B076PC3NYV', 'B01BVA2JHE', 'B07FF38J8C', 'B07DHGTS81', 'B00R25QZHS']
Remove python dict item from nested json file
I have a JSON file that I fetch from an API, and I get KeyError: 0 when I attempt to remove items from a Python dict. I assume it's a combination of my lack of skill and the format of the JSON. My goal is to remove all instances of 192.168.1.1 from ip_address_1.
My Code:
from api import Request
import requests, json, ordereddict
# prepare request
request = Request().service('').where({"query":"192.168.1.0"}).withType("json")
# call request
response = request.execute()
# parse response into python object
obj = json.loads(response)
# remove items
for i in xrange(len(obj)):
    if obj[i]["ip_address_1"] == "192.168.1.1":
        obj.pop(i)
# display
print json.dumps(obj,indent=1)
Example JSON:
{
  "response": {
    "alerts": [
      {
        "action": "New",
        "ip_address_1": "192.168.1.1",
        "domain": "example.com",
        "ip_address_2": "192.68.1.2"
      },
      {
        "action": "New",
        "ip_address_1": "192.168.1.3",
        "domain": "example2.com",
        "ip_address_2": "192.168.1.1"
      }
    ],
    "total": "2",
    "query": "192.168.1.0",
  }
}
This is incorrect:
# remove items
for i in xrange(len(obj)):
    if obj[i]["ip_address_1"] == "192.168.1.1":
        obj.pop(i)
You are iterating over an object (a dict) as if it were a list. What you want to do:
for sub_obj in obj["response"]["alerts"]:
    if sub_obj["ip_address_1"] == "192.168.1.1":
        sub_obj.pop("ip_address_1")
I've interpreted your requirements to be:
1) Remove from the "alerts" list any dictionary with ip_address_1 set to 192.168.1.1.
2) Create a list of all other ip_address_1 values.
json.loads(response) produces this dictionary:
{u'response': {u'alerts': [{u'action': u'New',
                            u'domain': u'example.com',
                            u'ip_address_1': u'192.168.1.1',
                            u'ip_address_2': u'192.68.1.2'},
                           {u'action': u'New',
                            u'domain': u'example2.com',
                            u'ip_address_1': u'192.168.1.3',
                            u'ip_address_2': u'192.168.1.1'}],
               u'query': u'192.168.1.0',
               u'total': u'2'}}
The "alerts" list is accessed by (assuming the dict is bound to obj):
>>> obj['response']['alerts']
[{u'action': u'New',
  u'domain': u'example.com',
  u'ip_address_1': u'192.168.1.1',
  u'ip_address_2': u'192.68.1.2'},
 {u'action': u'New',
  u'domain': u'example2.com',
  u'ip_address_1': u'192.168.1.3',
  u'ip_address_2': u'192.168.1.1'}]
The first part can be done like this:
alerts = obj['response']['alerts']
obj['response']['alerts'] = [d for d in alerts if d.get('ip_address_1') != '192.168.1.1']
Here a list comprehension is used to filter out those dictionaries with ip_address_1 192.168.1.1, and the resulting list is then rebound to the obj dictionary. After this obj is:
>>> pprint(obj)
{u'response': {u'alerts': [{u'action': u'New',
                            u'domain': u'example2.com',
                            u'ip_address_1': u'192.168.1.3',
                            u'ip_address_2': u'192.168.1.1'}],
               u'query': u'192.168.1.0',
               u'total': u'2'}}
Next, creating a list of the other IP addresses is easy with another list comprehension, run on the alerts list after removing the undesired dicts as shown above:
ip_addresses = [d['ip_address_1'] for d in obj['response']['alerts'] if d.get('ip_address_1') is not None]
Notice that we use get() to handle the possibility that some dictionaries might not have an ip_address_1 key.
>>> ip_addresses
[u'192.168.1.3']
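The filtering approach can be put together as a runnable Python 3 sketch. It assumes the trailing comma after "query" in the example JSON is a transcription slip (it is removed here so the text parses), and `drop_alerts` is a hypothetical helper name, not part of any API.

```python
import json

# Example payload from the question, minus the trailing comma after "query".
response = '''{"response": {"alerts": [
  {"action": "New", "ip_address_1": "192.168.1.1",
   "domain": "example.com", "ip_address_2": "192.68.1.2"},
  {"action": "New", "ip_address_1": "192.168.1.3",
   "domain": "example2.com", "ip_address_2": "192.168.1.1"}
], "total": "2", "query": "192.168.1.0"}}'''

obj = json.loads(response)


def drop_alerts(data, unwanted_ip):
    """Return the alerts whose ip_address_1 differs from unwanted_ip."""
    alerts = data["response"]["alerts"]
    return [a for a in alerts if a.get("ip_address_1") != unwanted_ip]


# Build a new list instead of popping while iterating over the old one.
obj["response"]["alerts"] = drop_alerts(obj, "192.168.1.1")
remaining = [a["ip_address_1"] for a in obj["response"]["alerts"]
             if "ip_address_1" in a]
print(remaining)
```

Building a new list sidesteps the index shifting that `pop(i)` inside a `range(len(...))` loop causes, even once the loop targets the right list.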
how to format this string with % arguments according to PEP8
url_base = ("https://maps.googleapis.com/maps/api/place/search/json?"
            "location=%s,%s&radius=500&types=food|doctor&sensor=false&"
            "key=%s&pagetoken=%s") % (
                r['lat'],
                r['lng'],
                YOUR_API_KEY,
                ''
            )
While formatted as shown above I have no underlines in SublimeText regarding PEP8, but it looks weird to me. How can I format it so that it is:
a) better (code is more readable)
b) still according to PEP8?
Perhaps what makes it look weird to you is the way the very long string is being broken up to conform with PEP8. However, if you do need to break up a very long string, you are doing it the right way. Two juxtaposed strings are automatically concatenated by Python.
In your particular situation you don't need to write a very long string, however. Instead, you could use urllib.urlencode to format the parameters for you:
import urllib
url_base = "https://maps.googleapis.com/maps/api/place/search/json?"
params = urllib.urlencode(
    {'location': '{},{}'.format(r['lat'], r['lng']),
     'radius': 500,
     'types': 'food|doctor',
     'sensor': 'false',
     'key': YOUR_API_KEY,
     'pagetoken': ''
     })
url = url_base + params
from urlparse import urlunparse
query_params = {
    "location": "%s,%s" % (r['lat'], r['lng']),
    "radius": "500",
    "types": "food|doctor",
    "sensor": "false",
    "key": YOUR_API_KEY,
    "pagetoken": ""
}
# urlunparse takes (scheme, netloc, path, params, query, fragment)
url_base = urlunparse((
    "https",
    "maps.googleapis.com",
    "/maps/api/place/search/json",
    None,
    "&".join("=".join(qp) for qp in query_params.items()),
    None,
))
Remember though, PEP8 is a set of guidelines and suggestions, not a hard set of rules.
You can format it using a text editor of your choice. You can make it better if you find a format that suits you better. And you can ensure PEP8 compliance by sticking to PEP8.
Or, straight from the urllib docs:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> params
'eggs=2&bacon=0&spam=1'
>>> url = 'http://www.musi-cal.com/cgi-bin/query?' + params
>>> url
'http://www.musi-cal.com/cgi-bin/query?eggs=2&bacon=0&spam=1'
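The answers above use the Python 2 module layout. In Python 3 the same function lives in `urllib.parse`; a minimal sketch follows, with placeholder values for `r` and `YOUR_API_KEY` (both are assumptions for illustration, as neither is defined in the question).

```python
from urllib.parse import urlencode

r = {"lat": 53.35, "lng": -6.26}   # placeholder coordinates
YOUR_API_KEY = "demo-key"          # placeholder, not a real key

base = "https://maps.googleapis.com/maps/api/place/search/json"
params = {
    "location": "{},{}".format(r["lat"], r["lng"]),
    "radius": 500,
    "types": "food|doctor",
    "sensor": "false",
    "key": YOUR_API_KEY,
    "pagetoken": "",
}

# urlencode percent-escapes reserved characters such as "," and "|".
url = base + "?" + urlencode(params)
print(url)
```

Besides keeping each line short, this handles escaping, which manual `%` formatting or `"&".join(...)` does not.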
Python Json Parse
I made a mistake when storing JSON strings in a database. Instead of storing the JSON string, I accidentally stored the string representation of the Python object. I received my_jstring['field'] and inserted it into the database as a string. my_jstring['field'] is not JSON but a Python object (the result of decoding JSON). Is it possible to parse this string back into an object? My string is the following:
'"\'{u\'\'full_name\'\': u\'\'Dublin City\'\', u\'\'url\'\': u\'\'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json\'\', u\'\'country\'\': u\'\'Ireland\'\', u\'\'place_type\'\': u\'\'city\'\', u\'\'bounding_box\'\': {u\'\'type\'\': u\'\'Polygon\'\', u\'\'coordinates\'\': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u\'\'contained_within\'\': [], u\'\'country_code\'\': u\'\'IE\'\', u\'\'attributes\'\': {}, u\'\'id\'\': u\'\'7dde0febc9ef245b\'\', u\'\'name\'\': u\'\'Dublin City\'\'}\'"'
Use ast.literal_eval() to parse Python literals back into a Python object. You appear to have doubly quoted the value, however, adding in extra single quotes. These need to be repaired too:
data = ast.literal_eval(data)
data = data[1:-1].replace("''", "'")
obj = ast.literal_eval(data)
Demo:
>>> import ast
>>> data = '"\'{u\'\'full_name\'\': u\'\'Dublin City\'\', u\'\'url\'\': u\'\'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json\'\', u\'\'country\'\': u\'\'Ireland\'\', u\'\'place_type\'\': u\'\'city\'\', u\'\'bounding_box\'\': {u\'\'type\'\': u\'\'Polygon\'\', u\'\'coordinates\'\': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u\'\'contained_within\'\': [], u\'\'country_code\'\': u\'\'IE\'\', u\'\'attributes\'\': {}, u\'\'id\'\': u\'\'7dde0febc9ef245b\'\', u\'\'name\'\': u\'\'Dublin City\'\'}\'"'
>>> data = ast.literal_eval(data)
>>> data = data[1:-1].replace("''", "'")
>>> obj = ast.literal_eval(data)
>>> obj
{u'country_code': u'IE', u'url': u'https://api.twitter.com/1.1/geo/id/7dde0febc9ef245b.json', u'country': u'Ireland', u'place_type': u'city', u'bounding_box': {u'type': u'Polygon', u'coordinates': [[[-6.3873911, 53.2987449], [-6.3873911, 53.4110598], [-6.1078047, 53.4110598], [-6.1078047, 53.2987449]]]}, u'contained_within': [], u'full_name': u'Dublin City', u'attributes': {}, u'id': u'7dde0febc9ef245b', u'name': u'Dublin City'}
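The difference between the accidental and the intended storage format can be seen in a small Python 3 sketch. The `place` dict is a shortened, hypothetical stand-in for the real record, and it leaves out the extra quoting layer repaired above.

```python
import ast
import json

place = {"full_name": "Dublin City", "country_code": "IE", "contained_within": []}

stored_by_mistake = str(place)       # Python repr with single quotes: not JSON
stored_as_json = json.dumps(place)   # what was meant to be stored

# literal_eval only accepts Python literals, so it is safe on untrusted text
# in a way that eval() is not.
recovered = ast.literal_eval(stored_by_mistake)
print(recovered == place)
print(json.loads(stored_as_json) == place)
```

Once the rows are recovered, re-saving them with `json.dumps` means later reads can use `json.loads` directly.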
json parsing using dictionaries
I have a JSON file that looks like this. I have to extract the events, e.g. 'APP_STARTED', 'ORIENTATION', etc.
{u'ParamElement_ReceivedTime': u'2012-11-02-00-05-31-748',
 u'ParamElement_Name': u'LOG_CONTENT',
 u'ParamElement_Info_0': {u'dict': {u'Events_list': [
    {u'Event': u'APP_STARTED', u'time': u'2012-11-01 20:00:59.565 -0400'},
    {u'time': u'2012-11-01 20:01:01.168 -0400', u'Event': u'ORIENTATION', u'Orientation': u'Portrait'},
    {u'Event': u'CLIENT_RESULT_RECEIVED', u'time': u'2012-11-01 20:01:15.927 -0400'},
    {u'Prev_SessionID': u'802911CC329E47139B61B58E21BF2FFF', u'Prev_TransactionID': u'2', u'Tab_Index': u'5', u'time': u'2012-11-01 20:01:15.941 -0400', u'Event': u'RESOLVED_TAB', u'Accuracy': u'5.000000'},
    {u'Prev_TransactionID': u'2', u'Prev_SessionID': u'802911CC329E47139B61B58E21BF2FFF', u'Event': u'CLIENT_RESULT_RECEIVED', u'time': u'2012-11-01 20:01:16.568 -0400'}
}
The whole thing is stored in a variable called event_dict. I have code that looks like:
if event_dict:
    if 'dict' in event_dict['ParamElement_Info_0']:
        if 'el' in event_dict['ParamElement_Info_0']['dict']:
            if 'e' in event_dict['ParamElement_Info_0']['dict']['el']:
                print e['Event']
What could be the mistake?
The Python approach is "ask forgiveness, not permission": it is easier and better to use try/except blocks instead of condition checks, unless a failed condition must be handled separately.
try:
    event = event_dict['ParamElement_Info_0']['dict']['Events_list']
except Exception as e:
    log('Oops, incorrect data format: %s' % e)
That way, you can see your errors easily.
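A slightly fuller Python 3 sketch of the same idea follows. It catches only the exceptions a missing key or a non-dict level would raise, instead of every `Exception`; `extract_events` and the shortened `sample` are hypothetical names and data for illustration.

```python
def extract_events(event_dict):
    """Return the event names, or [] if the expected keys are missing."""
    try:
        events = event_dict['ParamElement_Info_0']['dict']['Events_list']
    except (KeyError, TypeError) as exc:
        # KeyError: a key is absent; TypeError: a level is not subscriptable.
        print('Unexpected data format: %r' % (exc,))
        return []
    return [event.get('Event') for event in events]


# Shortened stand-in for the structure shown in the question.
sample = {
    'ParamElement_Name': 'LOG_CONTENT',
    'ParamElement_Info_0': {'dict': {'Events_list': [
        {'Event': 'APP_STARTED', 'time': '2012-11-01 20:00:59.565 -0400'},
        {'Event': 'ORIENTATION', 'Orientation': 'Portrait'},
    ]}},
}

print(extract_events(sample))
print(extract_events({}))
```

Narrowing the `except` clause keeps unrelated bugs (a typo in a variable name, for example) from being silently swallowed.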
You never define the variable e: your last line should be a loop, not a conditional like the earlier lines:
for e in event_dict['ParamElement_Info_0']['dict']['el']:
    print e
Also, I think you say "el" when you would need to say "Events_list", making your corrected code:
if event_dict:
    if 'dict' in event_dict['ParamElement_Info_0']:
        if 'Events_list' in event_dict['ParamElement_Info_0']['dict']:
            for e in event_dict['ParamElement_Info_0']['dict']['Events_list']:
                print e
There is no 'el' element in your dictionary. When you write for A in B, you are creating a variable A to hold the contents of the iterable B. What you are doing is saying, if the key 'el' is in my dictionary... which it isn't. But Events_list is, as @David points out.
Here is what may be an easier approach.
def item_getter(struct, key):
    parts = key.split('.', 1)
    if len(parts) > 1:
        key_part, rest_part = parts
        return item_getter(struct.get(key_part, {}), rest_part)
    return struct.get(key, None)

items = item_getter(event_dict, "ParamElement_Info_0.dict.Events_list")
events = [item.get('Event', 'No Event') for item in items]
print events
OUTPUT
[u'APP_STARTED', u'ORIENTATION', u'CLIENT_RESULT_RECEIVED', u'RESOLVED_TAB', u'CLIENT_RESULT_RECEIVED']
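The dotted-path idea can also be written without recursion, and with a caller-supplied default so that a missing path does not make the later loop fail on None. This is a Python 3 sketch, not the answer's own code; `get_path` and the shortened `event_dict` are hypothetical.

```python
from functools import reduce


def get_path(struct, dotted_key, default=None):
    """Walk nested dicts along a dotted key; return default if a step is missing."""
    def step(current, key):
        # Stop descending as soon as the current level is not a dict.
        return current.get(key, default) if isinstance(current, dict) else default
    return reduce(step, dotted_key.split('.'), struct)


# Shortened stand-in for the structure shown in the question.
event_dict = {
    'ParamElement_Info_0': {'dict': {'Events_list': [
        {'Event': 'APP_STARTED'},
        {'Event': 'ORIENTATION', 'Orientation': 'Portrait'},
    ]}},
}

items = get_path(event_dict, 'ParamElement_Info_0.dict.Events_list', default=[])
events = [item.get('Event', 'No Event') for item in items]
print(events)

# A wrong key such as 'el' now yields the default instead of raising.
print(get_path(event_dict, 'ParamElement_Info_0.dict.el', default=[]))
```

One limitation of any dotted-path scheme is that keys which themselves contain a dot cannot be addressed; a list of keys would be needed for those.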