I am trying to pass an automatically generated dictionary into a function that defines the schema of a JSON object.
This is my function:
def split_line_items(df_transactions):
    for row in df_transactions.itertuples():
        yield {
            "count_0": row.count_0,
            "count_1": row.count_1,
            "count_2": row.count_2,
            "total_count": row.total_count
        }
and this is my dictionary:
d = {'"count_0"': 'row.count_0',
     '"count_1"': 'row.count_1',
     '"count_2"': 'row.count_2',
     '"total_count"': 'row.total_count'}
How can I pass the dictionary into the function without having to modify it manually?
EDIT: I don't want to use the data contained in the dictionary; I want to use its structure (keys and values) to define the JSON. The example in yield {} shows what it should look like.
I guess you mean something like this, although the question is not very clear.
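from collections import namedtuple
# considering this is your df_transactions (a stand-in for the real rows)
transactions = namedtuple('r', ['count_0', 'count_1', 'count_2', 'total_count'])
df_transactions = [transactions(1, 2, 3, 4), transactions(1, 2, 3, 4)]
def split_line_items(df_transactions, d):
    for row in df_transactions:
        # '"count_0"' becomes 'count_0'; 'row.count_0' becomes the attribute lookup row.count_0
        yield {k.strip('"'): getattr(row, v.split('.')[1]) for k, v in d.items()}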
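If the goal is only to emit selected fields of each row under their own names, the string-valued mapping may not be needed at all; a list of field names could be enough. A minimal sketch, assuming the rows are namedtuples (which is what DataFrame.itertuples() normally yields). The Row type, the sample values and the fields parameter are made up for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for the rows produced by df_transactions.itertuples()
Row = namedtuple("Row", ["count_0", "count_1", "count_2", "total_count"])
rows = [Row(1, 2, 3, 6), Row(4, 5, 6, 15)]


def split_line_items(rows, fields):
    """Yield one dict per row, containing only the requested fields."""
    for row in rows:
        yield {name: getattr(row, name) for name in fields}


items = list(split_line_items(rows, ["count_0", "total_count"]))
print(items)
```

The list of field names can itself be generated automatically, which avoids parsing strings such as 'row.count_0'.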
I am trying to understand a Python for loop that is implemented as below
samples = [(objectinstance.get('sample', record['token'])['timestamp'], record)
           for record in objectinstance.scene]
'scene' is a JSON file containing a list of dictionaries. Each dictionary's token value refers to an entry in another JSON file called 'sample', which contains a 'timestamp' key among other keys.
Although I roughly understand this at a high level, I cannot work out how 'record' is being used here together with the object's get method. I think this is some sort of list comprehension, but I am not sure. Can you help me understand this and point me to a reference that explains it? Thank you.
In non-comprehension form it looks like this:
samples = []
for record in objectinstance.scene:
    data = (
        objectinstance.get('sample', record['token'])['timestamp'],
        record
    )
    samples.append(data)
objectinstance.get('sample', record['token']) looks like a method that takes two arguments and returns a JSON object/dictionary such as
{<key1>: <value1>, ..., 'timestamp': <somedata>, ..., <keyn>: <valuen>}
and you are saving each record together with the timestamp value from this call.
objectinstance.get can be pictured like this:
class Tmp:
    def __init__(self):
        self.scene = [{'token': 'a'}, {'token': 'b'}, {'token': 'c'}]

    def get(self, arg1, arg2):
        # placeholder calculation; the real lookup is not shown in the question
        result = {'timestamp': 0}
        return result

objectinstance = Tmp()
samples = []
for record in objectinstance.scene:
    object_instance_data = objectinstance.get('sample', record['token'])
    data = object_instance_data['timestamp']
    samples.append(data)
So, as you can see, the object's class has a method named get that takes two arguments and uses them to compute a result, returned as a dict/JSON object that has 'timestamp' as one of its keys.
Yes, you are right, it is a list comprehension. Schematically, it is something like this:
samples = [(timestamp, item) for item in list_of_dicts]
The result will be a list of tuples, where objectinstance.get('sample', record['token'])['timestamp'] is the first entry and record is the second.
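Moreover, if objectinstance were a plain dict, objectinstance.get('key', default) would return the value stored under 'key', or the default if the key is absent (cf. the dict.get documentation at python.org). Here, though, objectinstance appears to be a custom object whose get method looks up the 'sample' entry identified by the token. Either way, the result of get seems to be a dict, from which the value under the 'timestamp' key is retrieved.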
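To see the comprehension run end to end, here is a small sketch. The real class behind objectinstance is not shown in the question, so FakeDataset, its _tables attribute and the timestamp values are all assumptions made up for illustration; only the comprehension itself comes from the question:

```python
class FakeDataset:
    """Hypothetical stand-in for objectinstance (the real class is not shown)."""

    def __init__(self):
        # 'scene' is a list of dictionaries, each carrying a token
        self.scene = [{"token": "a"}, {"token": "b"}]
        # assumed layout: table name -> token -> record
        self._tables = {
            "sample": {"a": {"timestamp": 100}, "b": {"timestamp": 200}},
        }

    def get(self, table_name, token):
        # look up one record of the named table by its token
        return self._tables[table_name][token]


objectinstance = FakeDataset()

# The comprehension from the question: one (timestamp, record) tuple per scene entry
samples = [(objectinstance.get("sample", record["token"])["timestamp"], record)
           for record in objectinstance.scene]
print(samples)
```

Here record is simply the loop variable: it is used once to pick the token passed to get, and once more as the second element of each tuple.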
Background
I have a module called db.py that basically consists of wrapper functions that make calls to the database. I have a table called nba with columns such as player_name, age, player_id, etc.
I have a simple function called db_cache() that queries the table for all the player IDs. The response looks something like this:
[Record(player_id='31200952409069'), Record(player_id='31201050710077'), Record(player_id='31201050500545'), Record(player_id='31001811412442'), Record(player_id='31201050607711')]
Then I simply iterate through the list and put each item into a dictionary.
I am wondering if there is a more Pythonic way to populate the dictionary?
My code
from typing import Dict
import db  # the wrapper module described above

def db_cache():
    my_dict: Dict[str, None] = {}
    response = db.run_query(sql="SELECT player_id FROM nba")
    for item in response:
        my_dict[item.player_id] = None
    return my_dict

my_dict = db_cache()
This is built into the dict type:
>>> help(dict.fromkeys)
Help on built-in function fromkeys:

fromkeys(iterable, value=None, /) method of builtins.type instance
    Create a new dictionary with keys from iterable and values set to value.
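The value we want is the default of None. Since the query returns Record objects rather than bare IDs, pull out player_id from each one; then all we need is:
my_dict = dict.fromkeys(item.player_id for item in db.run_query(sql="SELECT player_id FROM nba"))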
Note that the value will be reused, not copied, which can cause problems if you want to use a mutable value. In those cases, you should simply use a dict comprehension instead, as given in @AvihayTsayeg's answer.
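The sharing pitfall is easy to reproduce. A small sketch with made-up keys, comparing dict.fromkeys with a dict comprehension when the value is a mutable list:

```python
# dict.fromkeys stores the SAME list object under every key
shared = dict.fromkeys(["a", "b"], [])
shared["a"].append(1)
print(shared)    # the append is visible under both keys

# a dict comprehension evaluates [] once per key, so each key gets its own list
separate = {key: [] for key in ["a", "b"]}
separate["a"].append(1)
print(separate)  # only "a" changed
```

With an immutable value such as None this difference does not matter, which is why dict.fromkeys is fine for the cache above.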
my_arr = [1, 2, 3, 4]
my_dict = {item: None for item in my_arr}
I need to parse a JSON file which, unfortunately for me, does not follow the standard format. I have two issues with the data. I have already found a workaround for the second one, so I will just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test":{
    "entry":{
        "Type":"Something"
    },
    "entry":{
        "Type":"Something_Else"
    }
}, ...
The default json parser updates the dictionary and therefore keeps only the last entry. I HAVE to store the other one somehow as well, and I have no idea how to do this. I also HAVE to store the keys of the various dictionaries in the same order in which they appear in the file, which is why I am using an OrderedDict. That works fine, so if there is any way to extend it to handle the duplicate entries, I'd be grateful.
My second issue is that the very same JSON file contains entries like this:
"Test":{
    {
        "Type":"Something"
    }
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only workaround I found was to remove the inner braces manually.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder

def parse_object_pairs(pairs):
    # return the (key, value) pairs unchanged instead of building a dict
    return pairs

data = """
{"foo": {"baz": 42}, "foo": 7}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
[('foo', [('baz', 42)]), ('foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
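As one possible take on the first option (custom lookup logic), here is a minimal sketch that returns every value stored under a key in the pair list. The helper name get_all is made up for illustration; json.loads and its object_pairs_hook parameter are the standard library's:

```python
import json


def get_all(pairs, key):
    """Return every value stored under `key` in a list of (key, value) pairs."""
    return [value for k, value in pairs if k == key]


# keep the raw (key, value) pairs instead of building dicts
pairs = json.loads('{"foo": {"baz": 42}, "foo": 7}',
                   object_pairs_hook=lambda p: p)

foo_values = get_all(pairs, "foo")
print(foo_values)
```

This is a linear scan, so it trades lookup speed for keeping every duplicate; nested objects stay as pair lists and can be searched the same way.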
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder

def make_unique(key, dct):
    # append _1, _2, ... until the key no longer collides
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct

data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output (as printed by Python versions before 3.12; newer versions show the same contents in OrderedDict({...}) form):
OrderedDict([('foo', OrderedDict([('baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, so I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
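import collections

def dict_raise_on_duplicates(ordered_pairs):
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            # duplicate key: store it under a suffixed name instead of overwriting
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d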
The only thing remaining is to get rid of the double braces automatically, and then I am done :D Thanks again
If you would prefer to collect the values of duplicated keys into an array instead of keeping separate copies, this would do the job:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d
And then you just use:
data = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
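For a quick check, here is the same idea applied to a complete JSON document built from the entries in the question. This is a sketch: the hook is repeated so the snippet runs on its own, it is renamed merge_duplicates purely for illustration, and the outer braces around "Test" are added to make the input valid JSON:

```python
import json


def merge_duplicates(ordered_pairs):
    """Collect the values of duplicate keys into a list."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)       # third or later occurrence
            else:
                d[k] = [d[k], v]     # second occurrence: start a list
        else:
            d[k] = v
    return d


text = '{"Test": {"entry": {"Type": "Something"}, "entry": {"Type": "Something_Else"}}}'
parsed = json.loads(text, object_pairs_hook=merge_duplicates)
print(parsed)

# Caveat: a value that is already a JSON array gets extended, not wrapped
ambiguous = json.loads('{"a": [1], "a": 2}', object_pairs_hook=merge_duplicates)
print(ambiguous)
```

Because of that caveat, this approach is only unambiguous when the duplicated keys never hold arrays themselves.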
Suppose I have multiple functions:
type = {'a', 'b', ..., 'z'}
f={}
f['a'] = some func_a...
f['b'] = some func_b...
...
f['z'] = some func_z...
Now I want to collect their outputs:
output = {}
for t in type:
    output[t] = f[t](input)
I wonder whether there is a way to do this in one line, using the loop differently, like
[output[t] for t in type] = [f[t](input) for t in type]
Of course, this does not work. Is there a valid way to do it?
You want a dictionary comprehension. It works just like a list comprehension, but instead of a single expression to form the values, you get to provide two expressions to generate both a key and a value:
output = {t: f[t](input) for t in type}
The dict comprehension produces a new dictionary object; there is no need or use for an initial output = {} line.
I'd just iterate over the items of f, as it already has the keys we need:
output = {t: func(input) for t, func in f.items()}
As a side note, instead of using separate assignments for all your f functions, just use a single dictionary definition:
f = {
    'a': some_func_a,
    'b': some_func_b,
    # ...
    'z': some_func_z,
}
type is not a great name for a variable, either, as that masks the built-in function you may sometimes want to use. You don't need to create that set separately, as iteration over f would give you the same keys, or you can use set(f) to create a set copy, or f.keys(), to get a dictionary view object over the keys of f, which acts just like a set but is 'live' in that changes to f are reflected in it.
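The difference between the live view and a set copy can be seen in a few lines. A small sketch; the three string methods stand in for the some_func_* functions and are chosen only for illustration:

```python
# Illustrative stand-ins for some_func_a, some_func_b, ...
f = {"a": str.upper, "b": str.lower}

live_keys = f.keys()   # view object: reflects later changes to f
snapshot = set(f)      # independent copy of the keys at this moment

f["c"] = str.title     # add another function afterwards

print(sorted(live_keys))  # includes 'c'
print(sorted(snapshot))   # still only 'a' and 'b'

# The dict comprehension from above, iterating over the items of f
output = {name: func("Hello") for name, func in f.items()}
print(output)
```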
I am trying to return two dictionaries. person_to_friends and person_to_networks are given functions, and profiles_file is a text file.
What I wrote is:
def load_profiles(profiles_file, person_to_friends, person_to_networks):
    """
    (file, dict of {str : list of strs}, dict of {str : list of strs}) -> NoneType
    Update person to friends and person to networks dictionaries to include
    the data in open file.
    """
    profiles_file = open('data.txt', 'r')
    person_to_friends = person_to_friends(profiles_file)
    person_to_networks = person_to_networks(profiles_file)
    return person_to_friends, person_to_networks
This only gives me the person_to_friends dictionary. Could anyone help with this problem?
What I want to return is
{person_to_friends}
{person_to_networks}
Simply do:
return (person_to_friends, person_to_networks)
and when you call the function you need to unpack the return value:
person_to_friends, person_to_networks = load_profiles(var1, var2, var3)
You can return only one value (this value can be a tuple, as in your case). However, you can yield as many values as you need:
def load_profiles(profiles_file, person_to_friends, person_to_networks):
    """
    (file, dict of {str : list of strs}, dict of {str : list of strs}) -> NoneType
    Update person to friends and person to networks dictionaries to include
    the data in open file.
    """
    profiles_file = open('data.txt', 'r')
    person_to_friends = person_to_friends(profiles_file)
    person_to_networks = person_to_networks(profiles_file)
    yield person_to_friends  # you can do it without temp variable, obv.
    yield person_to_networks
The difference is that with the yield statement you don't construct a temporary tuple just to return two results at once. However, getting the values out of your "function" (which has become a generator) will be slightly more difficult:
profiles = load_profiles(your args)
will not actually run your function at all; it just initializes a generator. To actually get the values, you'll need to call next:
person_to_friends = next(profiles)
person_to_networks = next(profiles)
or just do a loop:
for result in load_profiles(your args):
    do_something_with_your_dictionaries
So your function will return one value: the initialized generator object. Iterating over it in a loop (it can be for loop, map, filter, list(your_generator) or something else) or just calling next(your_generator) will give you both dictionaries you actually need.
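The lazy behaviour described above can be checked with a stripped-down generator. A sketch with made-up data; the calls list exists only to record when the function body actually starts running:

```python
calls = []  # records when the generator body starts executing


def load_profiles():
    calls.append("started")
    yield {"alice": ["bob"]}            # stands in for person_to_friends
    yield {"alice": ["python-users"]}   # stands in for person_to_networks


profiles = load_profiles()
started_before_next = list(calls)       # still empty: nothing has run yet

person_to_friends = next(profiles)      # runs the body up to the first yield
person_to_networks = next(profiles)     # resumes up to the second yield

# Tuple unpacking also iterates the generator, so this works as well
friends_again, networks_again = load_profiles()
```

The unpacking form on the last line reads the same as unpacking a returned tuple, but it fails with a ValueError if the generator yields a different number of values.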
The way you are returning two dictionaries is fine; something funny must be going on in other parts of the code. If you remove them, everything works fine:
def load_profiles():
    person_to_friends = {'a' : 1}
    person_to_networks = {'b' : 2}
    return person_to_friends, person_to_networks
Result:
>>> load_profiles()
({'a': 1}, {'b': 2})
>>> dict_1, dict_2 = load_profiles()
>>> dict_1
{'a': 1}
>>> dict_2
{'b': 2}
Your docstring states that the function parameter person_to_friends is a
dict of {str : list of strs}
But then you call it as though it were a function and overwrite it with the result:
person_to_friends = person_to_friends(profiles_file)
Is this a mistake in the docstring, or the code?
Possibly you are masking the real function definition by having a locally defined variable of the same name (i.e. the parameter). In general it is bad practice to overwrite a variable of one type (e.g. a function) with a value of a vastly different type (e.g. a dict), although there are exceptions to this.
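The masking effect can be reproduced in isolation. A sketch under the assumption that person_to_friends is a module-level function and that the caller passes a dict for the parameter of the same name, as the docstring suggests; the function bodies and the name load_profiles_fixed are made up for illustration:

```python
def person_to_friends(lines):
    """Hypothetical module-level helper that builds a dict from the file lines."""
    return {"alice": list(lines)}


def load_profiles(lines, person_to_friends):
    # Inside this function the PARAMETER hides the module-level function,
    # so this "call" is applied to whatever the caller passed in (a dict here).
    return person_to_friends(lines)


try:
    load_profiles(["bob"], {})
    error = None
except TypeError as exc:
    error = str(exc)
print(error)


def load_profiles_fixed(lines, friends):
    # A distinct parameter name keeps the helper function reachable
    friends.update(person_to_friends(lines))
    return friends


result = load_profiles_fixed(["bob"], {})
print(result)
```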
Maybe you can try:
def temp(a, b):
    return dict(a=a, b=b)