I'm following the book 'Data Science from Scratch' and this is a piece of code in it:
dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1 # now dd_pair contains {2: [0, 1]}
Can someone please help me understand why and how it works?
defaultdict takes a callable (a "default factory") as its first argument; a type such as list works because calling it returns a new object. Suppose we have a dictionary called "users" with an "ID" as the key and a list as the value.
For each "ID" we have to check whether it already exists in the dictionary: if it does not, we first store an empty list under it, and then we append something to the list.
So with a regular dictionary, we do something like:
users = {}
if "id1" not in users:
    users["id1"] = []
users["id1"].append("log")
Now with defaultdict, all we have to do is to set an initialiser as:
from collections import defaultdict
users = defaultdict(list) # Any key not existing in the dictionary will get assigned a `list()` object, which is an empty list
users["id1"].append("log")
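As a rough illustration of the same pattern with more than one key, the sketch below groups a few made-up (user ID, action) pairs. The events list and its contents are assumptions for the example; dict.setdefault is included only as a plain-dict alternative for comparison.

```python
from collections import defaultdict

# Hypothetical sample data: (user ID, action) pairs.
events = [("id1", "login"), ("id2", "login"), ("id1", "logout")]

# defaultdict(list) creates the empty list the first time a key is accessed.
users = defaultdict(list)
for user_id, action in events:
    users[user_id].append(action)

# The same grouping with a regular dict, using setdefault.
plain = {}
for user_id, action in events:
    plain.setdefault(user_id, []).append(action)

print(dict(users))  # {'id1': ['login', 'logout'], 'id2': ['login']}
```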
So coming to your code,
dd_pair = defaultdict(lambda: [0, 0])
This says, any key which doesn't exist in dd_pair will get a list of two elements initialised to 0 as their initial value. So if you just do print(dd_pair["somerandomkey"]) it should print [0,0].
Therefore, dd_pair[2][1] translates roughly to look like this:
dd_pair[2] = [0,0] # dd_pair looks like: {2:[0,0]}
dd_pair[2][1] = 1 # dd_pair looks like: {2:[0,1]}
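A small sketch of that behaviour, using made-up keys "a" and "b": the lambda runs once per missing key, so every key ends up with its own independent list.

```python
from collections import defaultdict

dd_pair = defaultdict(lambda: [0, 0])

# Merely reading a missing key calls the factory and stores the result.
first = dd_pair["a"]
second = dd_pair["b"]

# Each key received its own list, so changing one does not affect the other.
first[1] = 1

print(dict(dd_pair))    # {'a': [0, 1], 'b': [0, 0]}
print(first is second)  # False
```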
Why the need for lambda, why not just use [0,0] ?
The defaultdict constructor expects a callable (The constructor actually expects a default_factory, check out Python docs). In extremely simple terms, if we do defaultdict(somevar), somevar() should be valid.
So, if you just pass [0,0] to defaultdict it'll be wrong since [0,0]() is not valid at all.
So what you need is a function which returns [0,0], which can be simply implemented using lambda:[0,0]. (To verify, just do (lambda:[0,0])() , it will return [0,0]).
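To see the callable requirement in action, the sketch below first passes a plain list (which fails) and then a named function that does the same job as the lambda. The name zero_pair is made up for this example.

```python
from collections import defaultdict

try:
    defaultdict([0, 0])       # a list is not callable, so this raises
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

def zero_pair():
    """Return a fresh [0, 0] list on every call."""
    return [0, 0]

# A named function works exactly like the lambda.
dd_named = defaultdict(zero_pair)
dd_named[2][1] = 1
print(dict(dd_named))         # {2: [0, 1]}
```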
One more way is to create a class for your specific type, which is better explained in this answer: https://stackoverflow.com/a/36320098/
I am computing returns from data in a dictionary. My keys are dates and for every key I have a dataframe with data to compute my returns. To compute the returns I need data today and yesterday (t and t-1), hence I want to initiate from the second observation (key).
Since I do not have much experience my initial thought was to execute like this:
dict_returns = {}
for t, value in dict_data.items()[1:]:
    returns = 'formula'
    dict_returns[t] = returns
Which gave me the error:
TypeError: 'dict_items' object is not subscriptable
Searching for an answer, the only discussion I could find was skipping the first item, e.g. like this:
from itertools import islice
for key, value in islice(largeSet.items(), 1, None):
Is there a simple approach to skip the first key?
Thank you
In Python 3, dict.items() returns a dict_items view object rather than a list, and a view does not support indexing or slicing. That is why you get the error. Convert the view to a list first, for example list(dict_data.items())[1:].
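A minimal sketch of both ways to skip the first item, plus one way to pair each key with the previous one for a t and t-1 calculation. The prices in dict_data are made-up numbers standing in for the dataframes in the question, and the simple return formula is an assumption.

```python
from itertools import islice

# Hypothetical data: one price per date instead of a dataframe per date.
dict_data = {"2021-01-01": 100.0, "2021-01-02": 110.0, "2021-01-03": 99.0}

# Option A: copy the items into a list, then slice.
skipped_with_list = list(dict_data.items())[1:]

# Option B: islice skips the first item without building a full copy.
skipped_with_islice = list(islice(dict_data.items(), 1, None))

# Pair every date with the one before it to compute a simple return.
dates = list(dict_data)
dict_returns = {}
for previous, current in zip(dates, dates[1:]):
    dict_returns[current] = dict_data[current] / dict_data[previous] - 1

print({date: round(r, 4) for date, r in dict_returns.items()})
# {'2021-01-02': 0.1, '2021-01-03': -0.1}
```

Both approaches rely on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards.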
I can think of 2 options, both require an extra step:
Option 1: Create a second dict without your key and loop over that
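loop_dict = dict(dict_data)  # shallow copy, so dict_data itself is left unchanged
loop_dict.pop(<key_to_remove>)  # pop returns the removed value, not the dict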
Then loop over loop_dict as you have done above.
Option 2: Create a list of keys from your dict and loop over that
keys = list(dict_data)  # a keys view cannot be sliced in Python 3, so build a list first
loop_keys = keys[1:]
for key in loop_keys:
Etc
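The two options above can be sketched as follows, assuming the key to skip is simply the first one in insertion order. The sample dictionary is made up for the example.

```python
# Hypothetical sample data.
dict_data = {"d1": 1, "d2": 2, "d3": 3}

# Find the first key without copying anything.
first_key = next(iter(dict_data))

# Option 1: a second dict without that key (dict_data is left untouched).
loop_dict = {k: v for k, v in dict_data.items() if k != first_key}

# Option 2: a list of keys, sliced to drop the first one.
loop_keys = list(dict_data)[1:]

for key in loop_keys:
    print(key, dict_data[key])
```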
If you pass a reference to your dictionary to list() you will get a list of the dictionary's keys. This is because dictionaries are iterable. In your code you're not interested in the key's value so:
dict_data = {'a': 1, 'b': 2} # or whatever
dict_data[list(dict_data)[1]] = 3
print(dict_data)
Output:
{'a': 1, 'b': 3}
Background
I have a module called db.py that basically consists of wrapper functions that make calls to the db. I have a table called nba that has columns such as player_name, age, player_id, etc.
I have a simple function called db_cache() where I query the db table for all the player ids. The response looks something like this:
[Record(player_id='31200952409069'), Record(player_id='31201050710077'), Record(player_id='31201050500545'), Record(player_id='31001811412442'), Record(player_id='31201050607711')]
Then I simply iterate through the list and dump each item inside a dictionary.
I am wondering if there is a more pythonic way to populate the dictionary?
My code
from typing import Dict
def db_cache():
    my_dict: Dict[str, None] = {}
    response = db.run_query(sql="SELECT player_id FROM nba")
    for item in response:
        my_dict[item.player_id] = None
    return my_dict
my_dict = db_cache()
This is built-in to the dict type:
>>> help(dict.fromkeys)
Help on built-in function fromkeys:
fromkeys(iterable, value=None, /) method of builtins.type instance
Create a new dictionary with keys from iterable and values set to value.
The value we want is the default of None, so all we need is:
my_dict = dict.fromkeys(item.player_id for item in db.run_query(sql="SELECT player_id FROM nba"))
Note that the value will be reused, not copied, which can cause problems if you want to use a mutable value such as a list. In these cases, you should instead simply use a dict comprehension, as given in @AvihayTsayeg's answer.
my_arr = [1,2,3,4]
my_dict = {item: None for item in my_arr}
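A short sketch of both points, using a namedtuple as a stand-in for the Record objects returned by the query (the real db module is not available here, so Record and response are assumptions). It also shows why a mutable value passed to dict.fromkeys ends up shared between all keys.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows returned by db.run_query.
Record = namedtuple("Record", ["player_id"])
response = [Record("31200952409069"), Record("31201050710077")]

# Keys come from the player_id attribute; every value defaults to None.
my_dict = dict.fromkeys(record.player_id for record in response)
print(my_dict)   # {'31200952409069': None, '31201050710077': None}

# Pitfall: the same list object is used as the value for every key.
shared = dict.fromkeys(["a", "b"], [])
shared["a"].append(1)
print(shared)    # {'a': [1], 'b': [1]}

# A dict comprehension evaluates [] once per key, so the lists are separate.
separate = {key: [] for key in ["a", "b"]}
separate["a"].append(1)
print(separate)  # {'a': [1], 'b': []}
```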
[How do I iterate through a dict to find a pair of similar keys, output the pair, and then delete it from the dict?]
My intention is to generate random output labels and store them in a dictionary. I then want to take the first key, search the dictionary for a similar key (for example, Light1on and Light1off both contain Light1), and store the values of both keys in a table under their respective columns.
For example:
Dict = {Light1on,Light2on,Light1off...}
Take the value stored under Light1on, iterate through the dictionary to find Light1off, and then store Light1on:value1 and Light1off:value2 in a table or DataFrame with the columns On: value1 and Off: value2.
I don't know how to insert the code as code, so I can only provide an image. Sorry for the trouble; this is my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random
olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
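Two other common ways to get the same result, sketched with a slightly larger made-up dictionary: iterate over a snapshot of the keys, or build a new dictionary that keeps only what you want.

```python
# Variant 1: list(d) copies the keys, so deleting from d inside the loop is safe.
d = {1: 2, 3: 4, 6: 7}
for key in list(d):
    if key % 2 == 1:
        del d[key]
print(d)         # {6: 7}

# Variant 2: leave the original alone and build a filtered copy.
original = {1: 2, 3: 4, 6: 7}
filtered = {key: value for key, value in original.items() if key % 2 == 0}
print(filtered)  # {6: 7}
```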
Also, check how your code calls the difflib.get_close_matches function. You can run help(difflib.get_close_matches) to see how you are meant to call it: the second argument is the collection of items against which your first argument is compared for possible matches. The function returns a list of matches, so the result cannot be used directly as a single key in del outputList[skeys]; you need to delete each matched key individually.
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
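As one possible simpler route, here is a rough sketch that pairs the on and off labels by stripping the suffix instead of using fuzzy matching. It assumes every label ends in exactly "on" or "off"; the events dictionary and its values are made up. Fuzzy matching may also rank Light2on as closer to Light1on than Light1off is, which is another reason to match on the device name directly.

```python
# Hypothetical labels mapped to event numbers.
events = {
    "Light1on": 0, "Light2off": 1, "Fan1on": 2,
    "Light1off": 3, "Fan1off": 4, "Light2on": 5,
}

pairs = {}
for label, value in events.items():
    # Check "off" first, then "on", and strip the suffix to get the device name.
    if label.endswith("off"):
        device, state = label[:-3], "off"
    elif label.endswith("on"):
        device, state = label[:-2], "on"
    else:
        continue  # skip labels that do not follow the assumed naming scheme
    pairs.setdefault(device, {})[state] = value

print(pairs["Light1"])  # {'on': 0, 'off': 3}
```

The resulting dictionary of dictionaries has one row per device and one column per state, so it could be passed to pandas.DataFrame.from_dict(pairs, orient="index") if a DataFrame is wanted.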
Suppose I have multiple functions:
type = {'a', 'b', ..., 'z'}
f={}
f['a'] = some func_a...
f['b'] = some func_b...
...
f['z'] = some func_z...
Now I want to get the outputs of them
output = {}
for t in type:
    output[t] = f[t](input)
I wonder if there is any way that we can do this in one line using a loop in a different way, like
[output[t] for t in type] = [f[t](input) for t in type]
Of course, this does not work. So would there be any valid way?
You want a dictionary comprehension. It works just like a list comprehension, but instead of a single expression to form the values, you get to provide two expressions to generate both a key and a value:
output = {t: f[t](input) for t in type}
The dict comprehension produces a new dictionary object; there is no need or use for an initial output = {} line.
I'd just iterate over the items of f, as it already has the keys we need:
output = {t: func(input) for t, func in f.items()}
As a side note, instead of using separate assignments for all your f functions, just use a single dictionary definition:
f = {
    'a': some_func_a,
    'b': some_func_b,
    # ...
    'z': some_func_z,
}
type is not a great name for a variable, either, as that masks the built-in function you may sometimes want to use. You don't need to create that set separately, as iteration over f would give you the same keys, or you can use set(f) to create a set copy, or f.keys(), to get a dictionary view object over the keys of f, which acts just like a set but is 'live' in that changes to f are reflected in it.
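Putting these pieces together, here is a small runnable sketch. The three functions and the name value are placeholders invented for the example (value is used instead of input to avoid shadowing another built-in).

```python
def double(x):
    return x * 2

def square(x):
    return x * x

def negate(x):
    return -x

# One dictionary definition instead of separate assignments.
f = {"a": double, "b": square, "c": negate}

value = 3
output = {name: func(value) for name, func in f.items()}
print(output)                   # {'a': 6, 'b': 9, 'c': -3}

# The keys view is live and supports set operations.
keys_view = f.keys()
f["d"] = abs
print("d" in keys_view)         # True
print(keys_view & {"a", "z"})   # {'a'}
```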
Summary of issue: I'm trying to create a nested Python dictionary, with keys defined by pre-defined variables and strings. And I'm populating the dictionary from regular expressions outputs. This mostly works. But I'm getting an error because the nested dictionary - not the main one - doesn't like having the key set to a string, it wants an integer. This is confusing me. So I'd like to ask you guys how I can get a nested python dictionary with string keys.
Below I'll walk you through the steps of what I've done. What is working, and what isn't. Starting from the top:
# Regular expressions module
import re
# Read text data from a file
file = open("dt.cc", "r")
dtcc = file.read()
# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))
The result is good, and is as something like this:
stations = ['AAAA','BBBB','CCCC','DDDD']
# Initialize a new dictionary
rows = {}
# Loop over each station in the station list, and start populating
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
The result is good, and is something like this:
rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]
However, when I try to create a sub-dictionary with a string key:
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
    rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
I get the following error.
"TypeError: list indices must be integers, not str"
It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.
Any thoughts on how to get this working?
The issue is that by doing
rows[station] = re.findall(...)
you are creating a dictionary with the station names as keys and the return values of re.findall as values, and those return values are lists. So when you then write
rows[station]["dt"] = re.findall(...)
the left-hand side rows[station] is a list, and a list can only be indexed by integers (or slices), which is what the TypeError is complaining about. rows[station][0], for example, would give you the first match from the regex. You said you want a nested dictionary, so you could do
rows[station] = dict()
rows[station]["dt"] = re.findall(...)
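The difference can be reproduced in a few lines. The station name and the values below are made-up stand-ins for the regex results.

```python
rows = {}
rows["AAAA"] = ["0.1132 0.32 P"]      # the value is a list, as re.findall returns

try:
    rows["AAAA"]["dt"] = ["0.1132"]   # indexing a list with a string fails
except TypeError as exc:
    message = str(exc)
    print(message)

# Wrap the list in a dictionary first; string keys then work as expected.
rows["AAAA"] = {"bulk": rows["AAAA"]}
rows["AAAA"]["dt"] = ["0.1132"]
print(rows)  # {'AAAA': {'bulk': ['0.1132 0.32 P'], 'dt': ['0.1132']}}
```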
To make it a bit nicer, a data structure that you could use instead is a defaultdict from the collections module.
A defaultdict is a dictionary that takes a callable (often a type constructor) as its argument and uses it to create the value for any key that is missing. For example, dictlist = defaultdict(list) defines a dictionary whose missing keys default to empty lists. Doing dictlist[key].append(item1) straight away is then legal, because the list is created automatically the first time the key is accessed.
In your case you could do
from collections import defaultdict
rows = defaultdict(dict)
for station in stations:
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % station, dtcc)
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)
Where you have to assign the first regex result to a new key, "bulk" here but you can call it whatever you like. Hope this helps.
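For a self-contained check, the sketch below runs the same approach on a small in-memory string instead of the dt.cc file. The sample text is invented, and wrapping the station name in re.escape is an extra precaution not present in the original code.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the contents of dt.cc.
dtcc = "\nAAAA 0.1132 0.32 P\nBBBB 0.2000 0.50 S\nAAAA 0.0500 0.90 P\n"

stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

# Each missing station key gets its own empty inner dictionary.
rows = defaultdict(dict)
for station in stations:
    name = re.escape(station)  # guard against regex metacharacters in names
    rows[station]["bulk"] = re.findall(rf"{name}\s(.+)", dtcc)
    rows[station]["dt"] = re.findall(rf"{name}\s(\S+)", dtcc)

print(stations)            # ['AAAA', 'BBBB']
print(rows["AAAA"]["dt"])  # ['0.1132', '0.0500']
```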