Extract values from json-file which has no unique markers - python

A JSON file with unique markers (or, more appropriately, field names) preceding the values is fairly easy to dissect: you can do a string search on the unique markers/field names to find the first and last character positions of the value, and with that information you can pinpoint and extract the value.
I have done this with various Lua and Python scripts (also on XML files).
Now I need to extract values from a JSON file which does not have unique markers/field names, but just repeated occurrences of "value_type" and "value" preceding the name and the value respectively: see below.
{
  "software_version": "NRZ-2017-099",
  "age": "78",
  "sensordatavalues": [
    {"value_type": "SDS_P1", "value": "4.43"},
    {"value_type": "SDS_P2", "value": "3.80"},
    {"value_type": "temperature", "value": "20.10"},
    {"value_type": "humidity", "value": "44.50"},
    {"value_type": "samples", "value": "614292"},
    {"value_type": "min_micro", "value": "233"},
    {"value_type": "max_micro", "value": "25951"},
    {"value_type": "signal", "value": "-66"}
  ]
}
The approach described above does not provide a working solution here.
Question: with this JSON file layout, how can I directly extract the specific, individual values (preferably with a Lua script)?
[Or might XML parsing provide an easier solution?]

Here is Python code to read the JSON file and reshape the data into something more convenient:
import json
import pprint
with open("/tmp/foo.json") as j:
data = json.load(j)
for sdv in data.pop('sensordatavalues'):
data[sdv['value_type']] = sdv['value']
pprint.pprint(data)
The results:
{'SDS_P1': '4.43',
'SDS_P2': '3.80',
'age': '78',
'humidity': '44.50',
'max_micro': '25951',
'min_micro': '233',
'samples': '614292',
'signal': '-66',
'software_version': 'NRZ-2017-099',
'temperature': '20.10'}
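Note that all the values are still strings, exactly as they appear in the JSON; convert them explicitly if you need numbers, e.g. (using the data dictionary built above):
temperature = float(data['temperature'])  # 20.1
samples = int(data['samples'])            # 614292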

You might want to have a look at filter functions.
E.g., in your example JSON, to get only the dict that contains the value for samples you could do:
sample_sensordata = list(filter(lambda d: d["value_type"] == "samples", your_json_dict["sensordatavalues"]))
sample_value = sample_sensordata[0]["value"]
To make a dictionary, as Ned Batchelder suggested, you could also go with a dict comprehension like this:
sensor_data_dict = {d['value_type']: d['value'] for d in your_json_dict['sensordatavalues']}
and then get the value you want simply with sensor_data_dict['<ValueTypeYouAreLookingFor>']
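Put together, a small self-contained sketch (reusing the example path /tmp/foo.json from the answer above):
import json

with open("/tmp/foo.json") as j:
    your_json_dict = json.load(j)

sensor_data_dict = {d['value_type']: d['value']
                    for d in your_json_dict['sensordatavalues']}
print(sensor_data_dict['samples'])      # '614292'
print(sensor_data_dict['temperature'])  # '20.10'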

A little bit late, but I'm using Anvil, where the previous answers didn't work for me; posting this just for the curious.
resp = anvil.http.request("http://<ipaddress>/data.json", json=True)
#print(resp) # prints json file
tempdict = resp['sensordatavalues'][2].values()
humiddict = resp['sensordatavalues'][3].values()
temperature = float(list(tempdict)[1])
humidity = float(list(humiddict)[1])
print(temperature)
print(humidity)
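If you'd rather not depend on the position of the temperature and humidity entries (the [2] and [3] indices break if the sensor reports the fields in a different order), here is a sketch that looks them up by value_type instead, assuming the same resp structure:
# Build a value_type -> value mapping from the response, then index by name.
readings = {d['value_type']: d['value'] for d in resp['sensordatavalues']}
temperature = float(readings['temperature'])
humidity = float(readings['humidity'])
print(temperature, humidity)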


Parse a json/dictionary with same key values

I currently have a list variable that looks like this:
list_of_dicts = [{"Away_Team":"KC", "Home_Team":"NYY"},
                 {"Away_Team":"TB", "Home_Team":"MIA"},
                 {"Away_Team":"TOR", "Home_Team":"BOS"},
                 ]
As you can see, there are multiple keys with the same names, pertaining to the game matchups.
When I try to use:
print(json.dumps(list_of_dicts[0], indent=4, sort_keys=True))
...it only prints out the first matchup due to the same keys:
{
    "Away_Team": "KC",
    "Home_Team": "NYY"
}
How can I convert this list_of_dicts variable into something like the following output so I can use it like a valid dictionary or json object?
{
"Away_Team_1":"KC", "Home_Team_1":"NYY",
"Away_Team_2":"TB", "Home_Team_2":"MIA",
"Away_Team_3":"TOR", "Home_Team_3":"BOS",
}
This output doesn't need to be exactly that if a better solution is available; it is just to give you an idea of how I'd like to be able to parse the data.
The list_of_dicts variable can be of varying size: I've shown 3 matchups here, but it could contain 1 or 10, so the solution needs to handle that dynamically.
You can add suffixes to the keys with enumerate:
list_of_dicts2 = [{f"{k}_{i}":v for k,v in d.items()} for i,d in enumerate(list_of_dicts, start=1)]
One option is to use pandas:
pd.DataFrame(list_of_dicts).to_csv('filename.csv', index=False)
gives
Away_Team,Home_Team
KC,NYY
TB,MIA
TOR,BOS
Now the index is implied by the row, and if you load it back in you'll have those indices. Pandas also supports to_json if you are hard set on using json though. You can even recover your original list from a dataframe using .to_dict(orient='records')
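For example, a short sketch of the to_json/to_dict round trip mentioned above (assuming pandas is available as pd):
import pandas as pd

df = pd.DataFrame(list_of_dicts)
print(df.to_json(orient='records'))
# [{"Away_Team":"KC","Home_Team":"NYY"},{"Away_Team":"TB","Home_Team":"MIA"},{"Away_Team":"TOR","Home_Team":"BOS"}]
records = df.to_dict(orient='records')  # recovers the original list of dicts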
Data structure is important. You really don't need a dictionary for this; simply reduce the data to a list of tuples, with the first slot always the away team and the second the home team.
list_of_dicts = [{"Away_Team":"KC", "Home_Team":"NYY"},
                 {"Away_Team":"TB", "Home_Team":"MIA"},
                 {"Away_Team":"TOR", "Home_Team":"BOS"},
                 ]
matchups = [tuple(d.values()) for d in list_of_dicts]
output:
[('KC', 'NYY'), ('TB', 'MIA'), ('TOR', 'BOS')]
The problem with your proposed solution is that iterating through dicts where you don't know the key names is cumbersome; this solution makes the data structure easy to decipher, transform, or manipulate.

How to parse a list of dictionaries in Python 3

I'm using Python (requests) to query an API. The JSON response is a list of dictionaries, like below:
locationDescriptions = timeseries.publish.get('/GetLocationDescriptionList')['LocationDescriptions']
print(locationDescriptions[0])
{'Name': 'Test',
 'Identifier': '000045',
 'UniqueId': '3434jdfsiu3hk34uh8',
 'IsExternalLocation': False,
 'PrimaryFolder': 'All Locations',
 'SecondaryFolders': [],
 'LastModified': '2021-02-09T06:01:25.0446910+00:00'}
I'd like to extract 1 field (Identifier) as a list for further analysis (count, min, max, etc.) but I'm having a hard time figuring out how to do this.
Python has a syntax feature called "list comprehensions", and you can do something like:
identifiers = [item['Identifier'] for item in locationDescriptions]
Here is a small article that gives you more details, and also shows an alternate way using map. And here is one of the many resources detailing list comprehensions, should you need it.
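For reference, the map-based alternative mentioned above could look like this (operator.itemgetter is just one way to express the lookup; a lambda works as well):
from operator import itemgetter

identifiers = list(map(itemgetter('Identifier'), locationDescriptions))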
You could extract them with a list comprehension:
identifiers = [i['Identifier'] for i in locationDescriptions]
You allude to needing a list of numbers (count, min, max, etc...), in which case:
identifiers = [int(i['Identifier']) for i in locationDescriptions]
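Once the identifiers are integers, the summary statistics you mention are straightforward:
print(len(identifiers), min(identifiers), max(identifiers))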
You can do
ids = [locationDescription['Identifier'] for locationDescription in locationDescriptions]
You will have the identifiers as a list of strings.

removing json items from array if value is duplicate python

I am incredibly new to python.
I have an array full of json objects. Some of the json objects contain duplicated values. The array looks like this:
[{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]
What I am trying to do is to remove an item if the name is the same as another json object, and leave the first one in the array.
So in this case I should be left with
[{"id":"1"."name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"}]
The code I currently have can be seen below and is largely based on this answer:
import json
ds = json.loads('python.json') #this file contains the json
unique_stuff = { each['name'] : each for each in ds }.values()
all_ids = [ each['name'] for each in ds ]
unique_stuff = [ ds[ all_ids.index(text) ] for text in set(texts) ]
print unique_stuff
I am not even sure that the line ds = json.loads('python.json') is working, as when I try to print ds nothing shows up in the console.
You might have overcomplicated your approach. I would tend to rewrite the list as a dictionary with "name" as the key and then fetch the values:
ds = [{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]
{elem["name"]:elem for elem in ds}.values()
Out[2]:
[{'age': '23', 'id': '3', 'name': 'Paul'},
{'age': '22', 'id': '2', 'name': 'Peter'}]
Of course the items within the dictionary and the list may not be ordered, but I do not see that as much of a concern. If it is, let us know and we can think it over.
If you need to keep the first instance of "Paul" in your data, a dictionary comprehension gives you the opposite result (it keeps the last occurrence of each name).
A simple solution could be the following:
new = []
seen = set()
for record in old:
    name = record['name']
    if name not in seen:
        seen.add(name)
        new.append(record)
del seen
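Applied to the sample data from the question (inlined here as old), this keeps the first 'Paul' record:
old = [{"id": "1", "name": "Paul", "age": "21"},
       {"id": "2", "name": "Peter", "age": "22"},
       {"id": "3", "name": "Paul", "age": "23"}]
new, seen = [], set()
for record in old:
    if record['name'] not in seen:
        seen.add(record['name'])
        new.append(record)
print(new)
# [{'id': '1', 'name': 'Paul', 'age': '21'}, {'id': '2', 'name': 'Peter', 'age': '22'}]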
First of all, your JSON snippet has an invalid format: there is a dot instead of a comma separating some keys.
You can solve your problem using a dictionary with names as keys:
import json

with open('python.json') as fp:
    ds = json.load(fp)  # this file contains the json

mem = {}
for record in ds:
    name = record["name"]
    if name not in mem:
        mem[name] = record

print(mem.values())

Finding a key value by referencing another key value in the same dict level in Python

I'm pretty new to Python, so I'm having a hard time even coming up with the proper jargon to describe my issue.
Basic idea is I have a dict that has the following structure:
myDict = {
    "SomeMetric": {
        "day": [
            {"date": "2013-01-01", "value": 1234},
            {"date": "2013-01-02", "value": 5678},
            etc...
I want to pull out the "value" where the date is known. So I want:
myDict["SomeMetric"]["day"]["value"] where myDict["SomeMetric"]["day"]["date"] = "2013-01-02"
Is there a nice one-line method for this without iterating through the whole dict? My dict is much larger and I'm already iterating through it, so I'd rather not do nested iteritems.
Generator expressions to the rescue:
next(d['value']
     for d in myDict['SomeMetric']['day']
     if d['date'] == "2013-01-02")
So, loop over all day dictionaries, and find the first one that matches the date you are looking for. This loop stops as soon as a match is found.
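Note that next() raises StopIteration if no entry matches; if you prefer a sentinel instead, pass a default as the second argument (the generator expression then needs its own parentheses):
value = next((d['value']
              for d in myDict['SomeMetric']['day']
              if d['date'] == "2013-01-05"), None)
# value is None when no entry has that date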
Do you have control over your data structure? It seems to be constructed in a way that lends itself to sub-optimal lookups.
I'd structure it as such:
data = { 'metrics': { '2013-01-02': 1234, '2013-01-01': 4321 } }
And then your lookup is simply:
data['metrics']['2013-01-02']
Can you change the structure? If you can, you might find it much easier to change the day list to a dictionary which has dates as keys and the values as values, like this:
myDict = {
    "SomeMetric": {
        "day": {
            "2013-01-01": 1234,
            "2013-01-02": 5678,
            etc...
Then you can just index into it directly with
myDict["SomeMetric"]["day"]["2013-01-02"]

python 3: splitting information and reporting occurrences

Say, for example, I want to count how many times bob visits sears and walmart. How would I do this by creating dictionaries?
information given:
bob:oct1:sears
bob:oct1:walmart
mary:oct2:walmart
don:oct2:sears
bob:oct4:walmart
mary:oct4:sears
Okay, as this might be homework, I’ll try to give you some hints on how to do this. If this is not homework, please say so, and I’ll restore my original answer and example code.
So first of all, you have your data set in a form where each entry is on a single line. As we want to work with each entry on its own, we have to split the original data into separate lines. We can use str.split for that.
Each entry follows a simple format, name:date:location. So to get each of those segments, we can use str.split again. Then we end up with the separate pieces of each entry.
To store this, we want to sort the data by the name first. So we choose a dictionary taking the name as the key, and put in the visits as the data. As we don't care about the date, we can forget about it. Instead we want to count how often a single location occurs for a given name. So what we do is keep another dictionary using the locations as keys and the visit counts as the data. We end up with a nested dictionary, looking like this:
{
    'bob': {
        'sears': 1,
        'walmart': 2,
    },
    'mary': {
        ...
    }
}
So to get the final answers we just look into that dictionary and can immediately read out the values.
poke provided a nice explanation; here's the corresponding code. It reads input from files provided on the command line (or stdin) and dumps the occurrences in JSON format:
#!/usr/bin/env python
import csv
import fileinput
import json
import sys
from collections import defaultdict

visits = defaultdict(lambda: defaultdict(int))
for name, _, shop in csv.reader(fileinput.input(), delimiter=':'):
    visits[name][shop] += 1

# pretty print
json.dump(visits, sys.stdout, indent=2)
Output
{
  "bob": {
    "sears": 1,
    "walmart": 2
  },
  "don": {
    "sears": 1
  },
  "mary": {
    "sears": 1,
    "walmart": 1
  }
}
This representation makes it easy to find out how many visits a person made and where.
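For example, looking up a single count or a per-person total in the visits dictionary built above:
print(visits['bob']['walmart'])     # -> 2
print(sum(visits['bob'].values()))  # total visits by bob -> 3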
If you always know both name and location then you could use a simpler representation:
visits = defaultdict(int)
for name, _, shop in csv.reader(fileinput.input(), delimiter=':'):
    visits[name, shop] += 1

print(visits['bob', 'walmart'])
# -> 2
