Parse a json/dictionary with same key values - python

I currently have a list variable that looks like this:
list_of_dicts = [{"Away_Team":"KC", "Home_Team":"NYY"},
{"Away_Team":"TB", "Home_Team":"MIA"},
{"Away_Team":"TOR", "Home_Team":"BOS"},
]
As you can see, there are multiple keys with the same names, pertaining to the game matchups.
When I try to use:
print(json.dumps(list_of_dicts[0], indent=4, sort_keys=True))
...it only prints out the first matchup due to the same keys:
{
"Away_Team": "KC",
"Home_Team": "NYY"
}
How can I convert this list_of_dicts variable into something like the following output so I can use it like a valid dictionary or json object?
{
"Away_Team_1":"KC", "Home_Team_1":"NYY",
"Away_Team_2":"TB", "Home_Team_2":"MIA",
"Away_Team_3":"TOR", "Home_Team_3":"BOS",
}
This output doesn't need to be exactly that if a better solution is available; this is just to give you an idea of how I'd like to be able to parse the data.
The list_of_dicts variable can vary in size. I've shown 3 matchups here, but it could contain 1 or 10, so the solution needs to handle that dynamically.

You can add suffixes to the keys with enumerate:
list_of_dicts2 = [{f"{k}_{i}":v for k,v in d.items()} for i,d in enumerate(list_of_dicts, start=1)]
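The comprehension above produces a list of dicts with suffixed keys. If a single flat dict like the one in the question is wanted, the same idea can be folded into one dict comprehension. This is a minimal sketch reusing list_of_dicts from the question; the name flat is only illustrative.

```python
import json

list_of_dicts = [
    {"Away_Team": "KC", "Home_Team": "NYY"},
    {"Away_Team": "TB", "Home_Team": "MIA"},
    {"Away_Team": "TOR", "Home_Team": "BOS"},
]

# Number each matchup from 1 and suffix every key with that number,
# collecting all pairs into one flat dict.
flat = {
    f"{key}_{i}": value
    for i, matchup in enumerate(list_of_dicts, start=1)
    for key, value in matchup.items()
}

print(json.dumps(flat, indent=4))
```

Because the suffix comes from enumerate, this works for any number of matchups.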

One option is to use pandas:
pd.DataFrame(list_of_dicts).to_csv('filename.csv', index=False)
gives
Away_Team,Home_Team
KC,NYY
TB,MIA
TOR,BOS
Now the index is implied by the row, and if you load it back in you'll have those indices. Pandas also supports to_json if you are hard set on using json though. You can even recover your original list from a dataframe using .to_dict(orient='records')

Data structure is important. You really don't need a dictionary for this. Simply reduce it to a list of tuples, where the first slot is always the away team and the second the home team.
list_of_dicts = [{"Away_Team":"KC", "Home_Team":"NYY"},
{"Away_Team":"TB", "Home_Team":"MIA"},
{"Away_Team":"TOR", "Home_Team":"BOS"},
]
matchups = [tuple(d.values()) for d in list_of_dicts]
output:
[('KC', 'NYY'), ('TB', 'MIA'), ('TOR', 'BOS')]
The problem with your proposed solution is that iterating through dicts whose key names you don't know in advance is cumbersome. This solution makes the data structure easy to decipher, transform, or manipulate.

Related

How to parse a list of dictionaries in Python 3

I'm using python(requests) to query an API. The JSON response is list of dictionaries, like below:
locationDescriptions = timeseries.publish.get('/GetLocationDescriptionList')['LocationDescriptions']
print(locationDescriptions[0])
{'Name': 'Test',
'Identifier': '000045',
'UniqueId': '3434jdfsiu3hk34uh8',
'IsExternalLocation': False,
'PrimaryFolder': 'All Locations',
'SecondaryFolders': [],
'LastModified': '2021-02-09T06:01:25.0446910+00:00',}
I'd like to extract 1 field (Identifier) as a list for further analysis (count, min, max, etc.) but I'm having a hard time figuring out how to do this.
Python has a syntax feature called "list comprehensions", and you can do something like:
identifiers = [item['Identifier'] for item in locationDescriptions]
Here is a small article that gives you more details, and also shows an alternate way using map. And here is one of the many resources detailing list comprehensions, should you need it.
You could extract them with a list comprehension:
identifiers = [i['Identifier'] for i in locationDescriptions]
You allude to needing a list of numbers (count, min, max, etc...), in which case:
identifiers = [int(i['Identifier']) for i in locationDescriptions]
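As a rough illustration of the follow-up analysis mentioned in the question, the sketch below counts identifiers and takes the minimum and maximum. The sample records are made up to stand in for the API response. Note that int() drops leading zeros, so the strings are kept alongside the numeric values.

```python
from collections import Counter

# Hypothetical sample standing in for the real API response.
locationDescriptions = [
    {"Name": "Test", "Identifier": "000045"},
    {"Name": "Other", "Identifier": "000012"},
    {"Name": "Again", "Identifier": "000045"},
]

# Original string identifiers (leading zeros preserved).
identifiers = [item["Identifier"] for item in locationDescriptions]
# Numeric view for min/max; int("000045") becomes 45.
numeric_ids = [int(s) for s in identifiers]

counts = Counter(identifiers)  # how often each identifier occurs
print(len(identifiers), min(numeric_ids), max(numeric_ids))
print(counts.most_common(1))
```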
You can do
ids = [locationDescription['Identifier'] for locationDescription in locationDescriptions]
You will have a list of identifiers as a string.
Best regards

Extract values from json-file which has no unique markers

A json-file which has unique markers (or, more appropriately, field names) preceding the values is rather easy to dissect: you can perform a string search on the unique markers/field names to find the first and last character positions of the value within the string, and with that info you can pinpoint and extract the value.
I have done that with various lua-scripts and Python-scripts (also on xml-files).
Now I need to extract values from a json-file which does not have unique markers/field names, but just multiple occurrences of "value_type" and "value", preceding the name and the value respectively: see below.
{
"software_version": "NRZ-2017-099",
"age":"78",
"sensordatavalues":[
{"value_type":"SDS_P1","value":"4.43"},
{"value_type":"SDS_P2","value":"3.80"},
{"value_type":"temperature","value":"20.10"},
{"value_type":"humidity","value":"44.50"},
{"value_type":"samples","value":"614292"},
{"value_type":"min_micro","value":"233"},
{"value_type":"max_micro","value":"25951"},
{"value_type":"signal","value":"-66"}
]
}
The approach described above does not provide a working solution.
Question: In this json-filelayout, how to directly extract the specific, individual values (preferably by lua-script)?
[Or might XML-parsing provide an easier solution?]
Here is Python to read the JSON file and make it more convenient:
import json
import pprint
with open("/tmp/foo.json") as j:
    data = json.load(j)
# Move each value_type/value pair up to a top-level key.
for sdv in data.pop('sensordatavalues'):
    data[sdv['value_type']] = sdv['value']
pprint.pprint(data)
The results:
{'SDS_P1': '4.43',
'SDS_P2': '3.80',
'age': '78',
'humidity': '44.50',
'max_micro': '25951',
'min_micro': '233',
'samples': '614292',
'signal': '-66',
'software_version': 'NRZ-2017-099',
'temperature': '20.10'}
You might want to have a look into filter functions.
E.g. in your example json to get only the dict that contains the value for samples you could go by:
sample_sensordata = list(filter(lambda d: d["value_type"] == "samples", your_json_dict["sensordatavalues"]))
sample_value = sample_sensordata[0]["value"]  # the filter result is a list of matches; take the first
To make a dictionary like Ned Batchelder said you could also go with a dict comprehension like this:
sensor_data_dict = {d['value_type']: d['value'] for d in your_json_dict['sensordatavalues']}
and then get the value you want just by sensor_data_dict['<ValueTypeYouAreLookingFor>']
A little bit late, but I'm trying Anvil, in which the previous answers didn't work. Just for the curious:
resp = anvil.http.request("http://<ipaddress>/data.json", json=True)
#print(resp) # prints json file
tempdict = resp['sensordatavalues'][2].values()
humiddict = resp['sensordatavalues'][3].values()
temperature = float(list(tempdict)[1])
humidity = float(list(humiddict)[1])
print(temperature)
print(humidity)

A list of data structures in Python

I am pretty new to Python (coming from a Java background) so was wondering if someone would have any advice on a data structure design question. I need to create a data structure with default values that would look something like this:
[
(Name=”name1”, {id=1, val1=”val1”} ),
(Name=”name2”, {id=2, val1=”val2”} )
]
i.e a list of tuples where each tuple consists of one string value (Name) and a dictionary of values.
The first piece of functionality I need is to be able to add to or override the above data structure with additional details e.g:
[
(Name=”name2”, {id=2, val1=”new value”} ) ,
(Name=”name2”, {id=3, val1=”another value”} ) ,
(Name=”name3”, {id=3, val1=”val3”} )
]
Which would ultimately result in a final data structure that looks like this:
[
(Name=”name1”, {id=1, val1=”val1”} ),
(Name=”name2”, {id=2, val1=”new value”} ) ,
(Name=”name2”, {id=3, val1=”another value”} ) ,
(Name=”name3”, {id=3, val1=”val3”} )
]
The second piece of functionality I need is to be able to access each tuple in the list according to the id value in the dictionary i.e
Get me tuple where name = “name2” and id=”3” .
Could anybody give me their opinions on how best this could be implemented in Python?
Thanks!
The namedtuple is closest to what you wrote but as others have said, there may be better designs for what you want.
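One possible shape for this, offered only as a sketch: a namedtuple for each record plus a dict keyed by (name, id), so that adding, overriding and lookup are all single dict operations. The names Entry, data and merge are hypothetical, chosen for this illustration.

```python
from collections import namedtuple

# Hypothetical record type: a name plus a dictionary of values.
Entry = namedtuple("Entry", ["name", "data"])


def merge(entries, updates):
    """Return a dict keyed by (name, id) with updates applied over entries."""
    index = {(e.name, e.data["id"]): e for e in entries}
    for u in updates:
        index[(u.name, u.data["id"])] = u  # adds a new key or overrides an old one
    return index


defaults = [
    Entry("name1", {"id": 1, "val1": "val1"}),
    Entry("name2", {"id": 2, "val1": "val2"}),
]
updates = [
    Entry("name2", {"id": 2, "val1": "new value"}),
    Entry("name2", {"id": 3, "val1": "another value"}),
    Entry("name3", {"id": 3, "val1": "val3"}),
]

index = merge(defaults, updates)
# "Get me the tuple where name = name2 and id = 3"
print(index[("name2", 3)])
```

Keying on the (name, id) pair is an assumption about what makes an entry unique; if id alone is unique, it could be the key by itself.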
You could try using dictionaries instead.
A dictionary is mutable and is another container type that can store any number of Python objects, including other container types. Dictionaries consist of pairs (called items) of keys and their corresponding values.
Python dictionaries are also known as associative arrays or hash tables. The general syntax of a dictionary is as follows:
dict = {'Alice': '2341', 'Beth': '9102', 'Cecil': '3258'}
You can create dictionary in the following way as well:
dict1 = {'abc': 456}
dict2 = {'abc': 123, 98.6: 37}
Each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly braces, like this: {}.
Keys are unique within a dictionary while values may not be. The values of a dictionary can be of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.
Accessing Values in Dictionary:
To access dictionary elements, you can use the familiar square brackets along with the key to obtain its value. Following is a simple example:
#!/usr/bin/env python3
person = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("person['Name']:", person['Name'])
print("person['Age']:", person['Age'])
When the above code is executed, it produces the following result:
person['Name']: Zara
person['Age']: 7
Source: http://www.tutorialspoint.com/python/python_dictionary.htm

Finding a key value by referencing another key value in the same dict level in Python

I'm pretty new to Python, so I'm having a hard time even coming up with the proper jargon to describe my issue.
Basic idea is I have a dict that has the following structure:
myDict =
"SomeMetric":{
"day":[
{"date": "2013-01-01","value": 1234},
{"date": "2013-01-02","value": 5678},
etc...
I want to pull out the "value" where the date is known. So I want:
myDict["SomeMetric"]["day"]["value"] where myDict["SomeMetric"]["day"]["date"] = "2013-01-02"
Is there a nice one-line method for this without iterating through the whole dict? My dict is much larger and I'm already iterating through it, so I'd rather not do nested iteritems.
Generator expressions to the rescue:
next(d['value']
for d in myDict['SomeMetric']['day']
if d['date'] == "2013-01-02")
So, loop over all day dictionaries, and find the first one that matches the date you are looking for. This loop stops as soon as a match is found.
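One caveat with next() on a generator expression: if no entry matches, it raises StopIteration. Passing a default as the second argument avoids that. A small sketch, with the elided data from the question filled in by two made-up entries and a hypothetical helper named value_on:

```python
myDict = {
    "SomeMetric": {
        "day": [
            {"date": "2013-01-01", "value": 1234},
            {"date": "2013-01-02", "value": 5678},
        ]
    }
}


def value_on(date):
    """Return the value recorded for date, or None if the date is absent."""
    days = myDict["SomeMetric"]["day"]
    # The second argument to next() is returned when nothing matches.
    return next((d["value"] for d in days if d["date"] == date), None)


print(value_on("2013-01-02"))
print(value_on("1999-12-31"))
```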
Do you have control over your data structure? It seems to be constructed in such a way that lends itself to sub-optimal lookups.
I'd structure it as such:
data = { 'metrics': { '2013-01-02': 1234, '2013-01-01': 4321 } }
And then your lookup is simply:
data['metrics']['2013-01-02']
Can you change the structure? If you can, you might find it much easier to change the day list to a dictionary which has dates as keys and values as values, so
myDict = {
"SomeMetric":{
"day":{
"2013-01-01": 1234,
"2013-01-02": 5678,
etc...
Then you can just index into it directly with
myDict["SomeMetric"]["day"]["2013-01-02"]
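If the data arrives in the original list-of-dicts shape, the date-keyed form can be built once with a dict comprehension and then reused for every lookup. A minimal sketch; the names days and by_date are illustrative only.

```python
# The "day" list in its original shape (two sample entries).
days = [
    {"date": "2013-01-01", "value": 1234},
    {"date": "2013-01-02", "value": 5678},
]

# Build the index once: date string -> value.
by_date = {d["date"]: d["value"] for d in days}

print(by_date["2013-01-02"])
# .get() returns None instead of raising KeyError for a missing date.
print(by_date.get("2013-01-03"))
```

Building the index costs one pass over the list; each lookup afterwards is a plain dict access.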

Accessing elements of Python dictionary by index

Consider a dict like
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
How do I access for instance a particular element of this dictionary?
For instance, I would like to print, after some formatting, only the first element of Apple, which in our case is 'American'.
Additional information
The above data structure was created by parsing an input file in a python function. Once created however it remains the same for that run.
I am using this data structure in my function.
So if the file changes, the next time this application is run the contents of the file are different and hence the contents of this data structure will be different but the format would be the same.
So you see, in my function I don't know whether the first element in Apple is 'American' or anything else, so I can't directly use 'American' as a key.
Given that it is a dictionary you access it by using the keys. Getting the dictionary stored under "Apple", do the following:
>>> mydict["Apple"]
{'American': '16', 'Mexican': 10, 'Chinese': 5}
And getting how many of them are American (16), do like this:
>>> mydict["Apple"]["American"]
'16'
If the question is how to access the value when I know that I have a dict of dicts that contains 'Apple' as a fruit and 'American' as a type of apple, I would use:
myDict = {'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
print(myDict['Apple']['American'])
as others suggested. If instead the question is that you don't know whether 'Apple' as a fruit and 'American' as a type of 'Apple' exist when you read an arbitrary file into your dict of dict data structure, you could do something like:
print([ftype['American'] for f, ftype in myDict.items() if f == 'Apple' and 'American' in ftype])
or better yet, so you don't unnecessarily iterate over the entire dict of dicts if you know that only Apple has the type American:
if 'Apple' in myDict:
    if 'American' in myDict['Apple']:
        print(myDict['Apple']['American'])
In all of these cases it doesn't matter what order the dictionaries actually store the entries. If you are really concerned about the order, then you might consider using an OrderedDict:
http://docs.python.org/dev/library/collections.html#collections.OrderedDict
As I understand your description, you only know that your parser will give you a dictionary whose values are dictionaries too, like this:
sampleDict = {
"key1": {"key10": "value10", "key11": "value11"},
"key2": {"key20": "value20", "key21": "value21"}
}
So you have to iterate over your parent dictionary. If you want to print out or access all first dictionary keys in sampleDict.values() list, you may use something like this:
for key, value in sampleDict.items():
    print value.keys()[0]
If you want to just access first key of the first item in sampleDict.values(), this may be useful:
print sampleDict.values()[0].keys()[0]
If you use the example you gave in the question, I mean:
sampleDict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'}
}
The output of the first snippet (run under Python 2, where dict key order was arbitrary) is:
American
Indian
And the output for the second code is:
American
EDIT 1:
The above code examples do not work in Python 3 and later: since version 3, the keys and values methods return view objects (dict_keys and dict_values) instead of lists. These views do not support indexing, but they are iterable. So you need to change the above code as below:
First One:
for key, value in sampleDict.items():
    print(list(value.keys())[0])
Second One:
print(list(list(sampleDict.values())[0].keys())[0])
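In Python 3 the first key can also be fetched without building an intermediate list, since iterating over a dict yields its keys. The following is a sketch of that alternative and assumes Python 3.7 or later, where dicts keep insertion order. The names first_keys and first_of_first are illustrative.

```python
sampleDict = {
    "Apple": {"American": "16", "Mexican": 10, "Chinese": 5},
    "Grapes": {"Arabian": "25", "Indian": "20"},
}

# First key of every inner dictionary.
first_keys = [next(iter(inner)) for inner in sampleDict.values()]

# First key of the first inner dictionary.
first_inner = next(iter(sampleDict.values()))
first_of_first = next(iter(first_inner))

print(first_keys)
print(first_of_first)
```

next(iter(d)) raises StopIteration on an empty dict; next(iter(d), None) returns None in that case instead.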
I know this is 8 years old, but no one seems to have actually read and answered the question.
You can call .values() on a dict and wrap the result in list() to get a list of the inner dicts, and thus access them by index.
>>> mydict = {
... 'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
... 'Grapes':{'Arabian':'25','Indian':'20'} }
>>> mylist = list(mydict.values())
>>> mylist[0]
{'American': '16', 'Mexican': 10, 'Chinese': 5}
>>> mylist[1]
{'Arabian': '25', 'Indian': '20'}
>>> myInnerList1 = list(mylist[0].values())
>>> myInnerList1
['16', 10, 5]
>>> myInnerList2 = list(mylist[1].values())
>>> myInnerList2
['25', '20']
As a bonus, I'd like to offer kind of a different solution to your issue. You seem to be dealing with nested dictionaries, which is usually tedious, especially when you have to check for existence of an inner key.
There are some interesting libraries regarding this on pypi, here is a quick search for you.
In your specific case, dict_digger seems suited.
>>> import dict_digger
>>> d = {
...     'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
...     'Grapes':{'Arabian':'25','Indian':'20'}
... }
>>> print(dict_digger.dig(d, 'Apple','American'))
16
>>> print(dict_digger.dig(d, 'Grapes','American'))
None
You can use list(mydict['Apple'].keys())[0] (or mydict['Apple'].keys()[0] in Python 2) in order to get the first key in the Apple dictionary, but before Python 3.7 there's no guarantee that it will be American. In those versions the order of keys in a dictionary can change depending on the contents of the dictionary and the order the keys were added.
You can't rely on order of dictionaries, but you may try this:
list(mydict['Apple'].items())[0][0]  # Python 2: mydict['Apple'].items()[0][0]
If you want the order to be preserved you may want to use this:
http://www.python.org/dev/peps/pep-0372/#ordered-dict-api
A simple example of how to access elements in a dictionary:
Create a Dictionary
d = {'dog' : 'bark', 'cat' : 'meow' }
print(d.get('cat'))
print(d.get('lion'))
print(d.get('lion', 'Not in the dictionary'))
print(d.get('lion', 'NA'))
print(d.get('dog', 'NA'))
Few people appear, despite the many answers to this question, to have pointed out that dictionaries are un-ordered mappings, and so (until the blessing of insertion order with Python 3.7) the idea of the "first" entry in a dictionary literally made no sense. And even an OrderedDict can only be accessed by numerical index using such uglinesses as mydict[mydict.keys()[0]] (Python 2 only, since in Python 3 keys() returns a non-subscriptable view object.)
From 3.7 onwards, and in practice in CPython 3.6 as well (the new behaviour was introduced then, but not included as part of the language specification until 3.7), iteration over the keys, values or items of a dict will yield the earliest-inserted objects first. Sets make no such ordering guarantee. There is still no simple way to access dict entries by numerical index of insertion.
As to the question of selecting and "formatting" items, if you know the key you want to retrieve in the dictionary you would normally use the key as a subscript to retrieve it (my_var = mydict['Apple']).
If you really do want to be able to index the items by entry number (ignoring the fact that a particular entry's number will change as insertions are made) then the appropriate structure would probably be a list of two-element tuples. Instead of
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
you might use:
mylist = [
('Apple', {'American':'16', 'Mexican':10, 'Chinese':5}),
('Grapes', {'Arabian': '25', 'Indian': '20'})
]
Under this regime the first entry is mylist[0] in classic list-indexed form, and its value is ('Apple', {'American':'16', 'Mexican':10, 'Chinese':5}). You could iterate over the whole list as follows:
for (key, value) in mylist:  # unpacks to avoid tuple indexing
    if key == 'Apple':
        if 'American' in value:
            print(value['American'])
but if you know you are looking for the key "Apple", why wouldn't you just use a dict instead?
You could introduce an additional level of indirection by caching the list of keys, but the complexities of keeping two data structures in synchronisation would inevitably add to the complexity of your code.
With the following small function, digging into a tree-shaped dictionary becomes quite easy:
def dig(tree, path):
    for key in path.split("."):
        # Test membership rather than truthiness so falsy values such as 0 or '' are still returned.
        if isinstance(tree, dict) and key in tree:
            tree = tree[key]
        else:
            return None
    return tree
Now, dig(mydict, "Apple.Mexican") returns 10, while dig(mydict, "Grapes") yields the subtree {'Arabian':'25','Indian':'20'}. If a key is not contained in the dictionary, dig returns None.
Note that you can easily change (or even parameterize) the separator char from '.' to '/', '|' etc.
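A sketch of the parameterized variant mentioned above, assuming a keyword argument named sep (a name chosen here for illustration, not taken from the answer):

```python
def dig(tree, path, sep="."):
    """Follow sep-separated keys into nested dicts; return None if any key is missing."""
    for key in path.split(sep):
        if not isinstance(tree, dict) or key not in tree:
            return None
        tree = tree[key]
    return tree


mydict = {
    "Apple": {"American": "16", "Mexican": 10, "Chinese": 5},
    "Grapes": {"Arabian": "25", "Indian": "20"},
}

print(dig(mydict, "Apple/Mexican", sep="/"))
print(dig(mydict, "Apple.Russian"))
```

A separator other than '.' is useful when the keys themselves may contain dots.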
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
for n in mydict:
    print(mydict[n])
