I know this is going to sound like I just need to use json.loads from the title. But I don't think that's the case here. Or at least I should be able to do this without any libraries (plus I want to understand this and not just solve it with a library method).
What I have currently is a dictionary where the keys are words and the values are total counts for those words:
myDict = { "word1": 12, "word2": 18, "word3": 4, "word4": 45 }
and so on...
what I want is for it to become something like the following (so that I can insert it into a scraperwiki datastore):
myNewDict = {"entry": "word1", "count": 12, "entry": "word2", "count": 18, "entry": "word3", "count": 4, "entry": "word4", "count": 45}
I figured I could just loop over myDict and insert each key/value after my new chosen keys "entry" and "count" (like so):
for k, v in myDict.items():
    myNewDict = { "entry": k, "count": v }
but with this, myNewDict is only saving the last result (so only giving me myNewDict={"entry":"word4", "count":45})
what am I missing here?
What you need is a list:
entries = []
for k, v in myDict.items():
    entries.append({ "entry": k, "count": v })
Or even better with list comprehensions:
entries = [{'entry': k, 'count': v} for k, v in myDict.items()]
In more detail: you were overwriting myNewDict at each iteration of the loop, creating a new dict every time. In case you need it, you can add key/value pairs to a dict like this:
myDict['key'] = ...
.. but used inside a loop, this would overwrite the previous value associated with 'key', hence the use of a list. In the same manner, if you type:
myNewDict = {"entry": "word1", "count": 12, "entry": "word2", "count": 18, "entry": "word3", "count": 4, "entry": "word4", "count": 45}
you actually get {'count': 45, 'entry': 'word4'} !
Note: I don't know the expected format of the output data, but JSON-wise a list of dicts should be correct (keys in a JSON object are supposed to be unique too, I believe).
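A minimal sketch of both points, assuming Python 3 and the sample counts from the question (the names collapsed, entries and round_trip are only for illustration): a dict literal with repeated keys keeps just the last value for each key, while a list of dicts keeps every pair and survives a round trip through the json module.

```python
import json

my_dict = {"word1": 12, "word2": 18, "word3": 4, "word4": 45}

# Repeated keys in a dict literal: later values silently replace earlier ones.
collapsed = {"entry": "word1", "count": 12, "entry": "word2", "count": 18}
print(collapsed)  # {'entry': 'word2', 'count': 18}

# One small dict per word, collected in a list, keeps everything.
entries = [{"entry": k, "count": v} for k, v in my_dict.items()]
print(entries[0])  # {'entry': 'word1', 'count': 12}

# The list serializes to a JSON array of objects and loads back unchanged.
round_trip = json.loads(json.dumps(entries))
print(round_trip == entries)  # True
```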
While it's not 100% clear what output you want, if you want just a string in the format you've outlined above, you could modify your loop to be of the form:
myCustomFormat = '{'
for k, v in myDict.items():
    # Quote the word so the entry value is a string, as in the desired output
    myCustomFormat += '"entry": "{0}", "count": {1},'.format(k, v)
# Use slice to trim off trailing comma
myCustomFormat = myCustomFormat[:-1] + '}'
That being said, this probably isn't what you want. As others have pointed out, the duplicative nature of the "keys" will make this somewhat difficult to parse.
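To illustrate why the repeated keys are awkward, here is a sketch (assuming Python 3; pairs, custom and parsed are hypothetical names) that builds the same kind of string with str.join instead of trimming a trailing comma, then feeds it to json.loads. Python's json module keeps only the last value for a repeated name, so almost all of the data is lost on parsing.

```python
import json

my_dict = {"word1": 12, "word2": 18, "word3": 4, "word4": 45}

# json.dumps(k) adds the quotes (and any escaping) around each word.
pairs = ['"entry": {}, "count": {}'.format(json.dumps(k), v)
         for k, v in my_dict.items()]
custom = "{" + ", ".join(pairs) + "}"
print(custom)

# Parsing succeeds, but only the last "entry"/"count" pair survives.
parsed = json.loads(custom)
print(parsed)  # {'entry': 'word4', 'count': 45}
```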
So essentially I have a JSON object obtained through an API that looks similar to the one below and I am wondering how I would collect the sub-elements such as name and quantity and place it into an array/list.
{
"item_one": {
"name": "Item One",
"weight": 0,
"quantity": 1
},
"item_two": {
"name": "Item Two",
"weight": 0,
"quantity": 23
},
"item_three": {
"name": "Item Three",
"weight": 0,
"quantity": 53
}
}
An example for what the desired output is would be the following:
nameLst = ['Item One', 'Item Two', 'Item Three']
quantityLst = ['1', '23', '53']
So far the only way I know how to do this would be to individually collect the quantity and name data by searching through all the specific items, this however would be impossible due to the sheer number of potential items.
You don't need to know the item names, you can simply loop over the keys of the dictionary and use those keys to query the JSON blob for each subdict.
namelst = []
quantitylst = []
for key in d.keys():
    subdict = d[key]
    namelst.append(subdict["name"])
    quantitylst.append(subdict["quantity"])
If you don't need the keys at any point, then you can loop over the values solely as Kelly Bundy mentions.
for v in d.values():
    namelst.append(v["name"])
    quantitylst.append(v["quantity"])
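The same loop can be written as two list comprehensions. This is a sketch assuming the JSON has already been parsed into a dict named d, as in the snippets above; the str() call is only there because the desired output in the question shows the quantities as strings, so drop it if integers are fine.

```python
d = {
    "item_one": {"name": "Item One", "weight": 0, "quantity": 1},
    "item_two": {"name": "Item Two", "weight": 0, "quantity": 23},
    "item_three": {"name": "Item Three", "weight": 0, "quantity": 53},
}

# dicts preserve insertion order (Python 3.7+), so both lists line up.
name_lst = [item["name"] for item in d.values()]
quantity_lst = [str(item["quantity"]) for item in d.values()]

print(name_lst)      # ['Item One', 'Item Two', 'Item Three']
print(quantity_lst)  # ['1', '23', '53']
```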
So far the only way I know how to do this would be to individually collect the quantity and name data by searching through all the specific items, this however would be impossible due to the sheer number of potential items.
I imagine you're just saying that this would be hard to do by hand. In that case you could do something like this:
distinct_keys = {k for d in json_obj.values() for k in d}
# you seem to want to convert ints to strings?
# if so, consider (some_transform(d[k]) if k in d else None)
result = {k:[d.get(k, None) for d in json_obj.values()] for k in distinct_keys}
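As a rough check of the comprehension above, here is a small run on made-up data (the json_obj contents below are hypothetical, with weight and quantity each missing from one item). Keys that an item lacks come back as None, so every list stays aligned with the order of the items.

```python
json_obj = {
    "item_one": {"name": "Item One", "weight": 0},
    "item_two": {"name": "Item Two", "quantity": 23},
}

# Every key that appears in at least one item.
distinct_keys = {k for d in json_obj.values() for k in d}

# One list per key; dict.get returns None when an item lacks the key.
result = {k: [d.get(k) for d in json_obj.values()] for k in distinct_keys}

print(result["name"])      # ['Item One', 'Item Two']
print(result["weight"])    # [0, None]
print(result["quantity"])  # [None, 23]
```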
If you actually need to iterate through this thing one object at a time though, consider something like the following:
from collections import defaultdict
result = defaultdict(list)
for d in json_obj.values():
    # if you KNOW you don't have missing data
    # for k,v in d.items(): result[k].append(v)
    # you probably do have missing data though, so a cost proportional
    # to your key sizes is unavoidable starting from completely unprocessed
    # json data. you could save a little work, but here's the basic idea
    # the work we do is different based on which sets/maps have the
    # keys we're operating on
    s = set(d.keys())
    new_keys = s.difference(result)
    missing_keys = [k for k in result if k not in s]
    same_keys = s.intersection(result)
    # this doesn't necessarily have to be special cased, but it
    # allows us to guarantee result is non-empty everywhere else
    # and avoid some more special casing.
    if new_keys and not result:
        for k,v in d.items():
            result[k].append(v)
    else:
        # backfill new keys we found with None
        L = result[next(iter(result))]
        for key in new_keys:
            result[key] = [None]*len(L)
            result[key].append(d[key])
        # fill in None for the current object for any keys in result
        # that we don't have available
        for key in missing_keys:
            result[key].append(None)
        # for everything in both objects, just append the new data
        for key in same_keys:
            result[key].append(d[key])
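For comparison, the same backfilling idea can be packed into a shorter helper. This is only a sketch: columns_from_records is a hypothetical name, and it still touches every known key once per record, so it is not asymptotically cheaper than the loop above.

```python
def columns_from_records(records):
    """Turn an iterable of dicts into {key: [value per record]}, padding with None."""
    result = {}
    for i, rec in enumerate(records):
        for k, v in rec.items():
            # A key first seen at record i gets i leading Nones before its value.
            result.setdefault(k, [None] * i).append(v)
        for col in result.values():
            # Columns still at length i had no value in this record.
            if len(col) == i:
                col.append(None)
    return result


json_obj = {
    "item_one": {"name": "Item One", "weight": 0},
    "item_two": {"name": "Item Two", "quantity": 23},
}
columns = columns_from_records(json_obj.values())
print(columns)
# {'name': ['Item One', 'Item Two'], 'weight': [0, None], 'quantity': [None, 23]}
```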
Then if you really needed variables and not a dictionary you can explicitly store them that way.
for k,L in result.items():
    globals()[f'{k}Lst'] = L
I'm trying to get a list of all keys in the nested level of my dictionary.
My dictionary resembles:
my_dict = {
    'DICT': {
        'level_1a': {
            'level_2a': {}
        },
        'level_1b': {
            'level_2b': {},
            'level_2c': {}
        }
    }
}
My desired output should resemble:
['level_2a', 'level_2b', 'level_2c']
What I've tried:
[list(v) for k, v in my_dict['DICT'].items()]
My current output:
[['level_2a'], ['level_2b', 'level_2c']]
I want my result to be fully flattened to a single-level list. I've tried flattening libraries but the result tends to appear as: ['level_2a', 'level_2blevel_2c'] which is incorrect. Not looking to make the code more complex by creating another method just to flatten this list.
Would appreciate some help, thank you!
Try:
my_dict = {
"DICT": {
"level_1a": {"level_2a": {}},
"level_1b": {"level_2b": {}, "level_2c": {}},
}
}
lst = [vv for v in my_dict["DICT"].values() for vv in v]
print(lst)
Prints:
['level_2a', 'level_2b', 'level_2c']
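An equivalent way to flatten one level, sketched here as an alternative rather than a replacement, is itertools.chain.from_iterable from the standard library. Iterating a dict yields its keys, so chaining the inner dicts produces the second-level keys in order.

```python
from itertools import chain

my_dict = {
    "DICT": {
        "level_1a": {"level_2a": {}},
        "level_1b": {"level_2b": {}, "level_2c": {}},
    }
}

# Each inner dict is iterated in turn, yielding its keys one by one.
lst = list(chain.from_iterable(my_dict["DICT"].values()))
print(lst)  # ['level_2a', 'level_2b', 'level_2c']
```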
I currently have a dictionary of dictionaries in Python. They may look something like this:
stocks = {
"VPER": {
"mentions": 6,
"score": 120,
"currentPrice": 0.0393,
},
"APPL": {
"mentions": 16,
"score": 120,
"currentPrice": 0.0393,
},
"NIO": {
"mentions": 36,
"score": 120,
"currentPrice": 0.0393,
}
}
What I am trying to do is look through the dictionaries and count how many times mentions equals 5, then if that count is 10 remove the nested dictionary (APPL, NIO and so on). So if I had NIO, APPL, TSLA, EPR, EKG, LPD, TTL, AGR, JKR, POP as nested dictionaries and they each had their mentions key set to a value of 5 then I would want to remove them all from the stocks dictionary.
I am not really sure how to go about this, any documentation, advice or examples would be highly appreciated.
CLARIFIED LOGIC:
If there are ten occurrences of mentions: 5 then delete all nested dictionaries where the mentions are equal to five.
counter = sum(value["mentions"] == 5 for key, value in stocks.items())
# "ten occurrences" means ten or more, so compare with >= rather than >
if counter >= 10:
    stocks = {key: value for key, value in stocks.items() if value["mentions"] != 5}
You would get the number of entries whose mentions equal 5 with:
sum(1 for k,v in stocks.items() if v["mentions"]==5)
And you would delete the nested dicts upon that condition with:
stocks={k:v for k,v in stocks.items() if v["mentions"]!=5}
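Putting the count and the filter together, a minimal sketch might look like the following. The function name drop_if_common, its target and threshold parameters, and the generated tickers are all hypothetical; the threshold of ten follows the clarified logic in the question.

```python
def drop_if_common(stocks, target=5, threshold=10):
    """Drop entries whose mentions == target, but only if at least
    `threshold` entries match; otherwise return stocks unchanged."""
    matches = sum(1 for v in stocks.values() if v["mentions"] == target)
    if matches >= threshold:
        return {k: v for k, v in stocks.items() if v["mentions"] != target}
    return stocks


# Ten made-up tickers with 5 mentions each, plus one with 36.
stocks = {"T{}".format(i): {"mentions": 5} for i in range(10)}
stocks["NIO"] = {"mentions": 36}

print(list(drop_if_common(stocks)))               # ['NIO']
print(len(drop_if_common(stocks, threshold=11)))  # 11
```

Building a new dictionary avoids deleting keys while iterating over the same dictionary, which would raise a RuntimeError.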
Let us imagine the following dictionary
dictionary = {
"key1": {
"value": [1, 3, 5],
},
"key2": {
"value": [1, 2, -1],
},
}
Is it possible to set all the "values" to [] without iterating over the dictionary keys? I want something like dictionary[]["value"]=[] such that all "value" attributes are set to []. But that doesn't work.
Because you need to avoid iteration, here is a little hacky way of solving the case.
Convert dictionary to string, replace and then back to dictionary:
import re, ast
dictionary = {
"key1": {
"value": [1, 3, 5],
},
"key2": {
"value": [1, 2, -1],
},
}
print(ast.literal_eval(re.sub(r'\[.*?\]', '[]', str(dictionary))))
# {'key1': {'value': []}, 'key2': {'value': []}}
I'm going to take a different tack here. Your question is a little misinformed. The implication is that it's "better" to avoid iterating dictionary keys. As mentioned, you can iterate over dictionary values. But, since internally Python stores dictionaries via two arrays, iteration is unavoidable.
Returning to your core question:
I want something like dictionary[]["value"]=[] such that all "value"
attributes are set to [].
Just use collections.defaultdict:
from collections import defaultdict
d = {k: defaultdict(list) for k in dictionary}
print(d['key1']['value']) # []
print(d['key2']['value']) # []
For the dictionary structure you have defined, this will certainly be more efficient than string conversion via repr + regex substitution.
If you insist on explicitly setting keys, you can avoid defaultdict at the cost of an inner dictionary comprehension:
d = {k: {i: [] for i in v} for k, v in dictionary.items()}
# {'key1': {'value': []}, 'key2': {'value': []}}
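If replacing the inner dictionaries is not acceptable and iterating is tolerable after all, a plain loop over the values resets the lists in place. A minimal sketch using the dictionary from the question:

```python
dictionary = {
    "key1": {"value": [1, 3, 5]},
    "key2": {"value": [1, 2, -1]},
}

# Each inner dict gets its own fresh list, so later appends stay independent.
for inner in dictionary.values():
    inner["value"] = []

print(dictionary)  # {'key1': {'value': []}, 'key2': {'value': []}}
```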
I have this data structure in Python:
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
How can I add the elements of each dictionary? The output should be :
{
"clicks":16, # 10 + 6
"views":20
}
I am looking for a Pythonic solution for this. Any solutions using Counter are welcome but I am not able to implement it.
I have tried this but I get an error:
counters = []
for i in result:
    for k,v in i.items():
        counters.append(Counter(v))
sum(counters)
Your code was quite close to a workable solution, and we can make it work with a few important changes. The most important change is that we need to iterate over the "data" item in result.
from collections import Counter
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
counts = Counter()
for d in result['data']:
    for k, v in d.items():
        counts.update(v)
print(counts)
output
Counter({'views': 20, 'clicks': 16})
We can simplify that a little because we don't need the keys.
counts = Counter()
for d in result['data']:
    for v in d.values():
        counts.update(v)
The code you posted makes a list of Counters and then tries to sum them. That is also a valid strategy, but by default the sum built-in starts from the integer 0, and adding a Counter to an int raises a TypeError. One way around that is functools.reduce.
from functools import reduce
counters = []
for d in result['data']:
    for v in d.values():
        counters.append(Counter(v))
print(reduce(Counter.__add__, counters))
However, I suspect that the first version will be faster, especially if there are lots of dicts to add together. Also, the list-based version consumes more RAM, since it keeps a list of all the Counters.
Actually we can use sum to add the Counters together, we just have to give it an empty Counter as the start value.
print(sum(counters, Counter()))
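To see why the start value matters, here is a small sketch (the two Counters are stand-ins for the ones built from the question's data):

```python
from collections import Counter

counters = [Counter(clicks=10, views=20), Counter(clicks=6)]

# Without a start value, sum() begins with the int 0, and 0 + Counter fails.
try:
    sum(counters)
    failed = False
except TypeError:
    failed = True
print(failed)  # True

# With an empty Counter as the start value, every addition is Counter + Counter.
total = sum(counters, Counter())
print(total)  # Counter({'views': 20, 'clicks': 16})
```

One difference worth knowing: adding Counters with + drops any key whose total is zero or negative, whereas Counter.update keeps it.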
We can combine this into a one-liner, eliminating the list by using a generator expression instead:
from collections import Counter
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
totals = sum((Counter(v) for i in result['data'] for v in i.values()), Counter())
print(totals)
output
Counter({'views': 20, 'clicks': 16})
This is not the best solution as I am sure that there are libraries that can get you there in a less verbose way but it is one you can easily read.
res = {}
for x in result['data']:
    for y in x:
        for t in x[y]:
            res.setdefault(t, 0)
            res[t] += x[y][t]
print(res) # {'clicks': 16, 'views': 20}