I'm working on a small project (I'm relatively new to Python) in which I have a dictionary of items, for example:
dic = {
'ab': 'a',
'ac': 'a',
'bc': 'b',
'bd': 'b',
'cd': 'c',
'ce': 'c'
}
I would like to convert it into a nested list in which each first-level index corresponds to one of the values, for example:
sorted_lst[0][0] == 'ab'
sorted_lst[2][1] == 'ce'
and so forth. How would I go about doing this? Would it be easier to start with a sorted list rather than a dictionary?
The reason is that I would like to print columns of guesses, where the column a guess goes in is determined by its initial letter, i.e. something like:
ab bc cd
ac bd ce
This way the user sees their guesses laid out more clearly than in a plain alphabetized list.
Looks like you could use a defaultdict to transform your data to 2D:
from collections import defaultdict
d = defaultdict(list)
for k,v in dic.items():
    d[v].append(k)
sorted_lst = list(d.values())
# or to ensure sorting of the first level:
# sorted_lst = [d[k] for k in sorted(d)]
sorted_lst[0][1]
# ac
sorted_lst[2][1]
# ce
output: [['ab', 'ac'], ['bc', 'bd'], ['cd', 'ce']]
Here is one way to do so:
We sort the unique values, then loop through each one and group the keys into a list.
data = {
'ab': 'a',
'ac': 'a',
'bc': 'b',
'bd': 'b',
'cd': 'c',
'ce': 'c'
}
sorted_data = []
for item in sorted(set(data.values())):
    sorted_data.append([key for key in data.keys() if data[key] == item])
# sorted_data = [['ab', 'ac'], ['bc', 'bd'], ['cd', 'ce']]
You can also use setdefault():
sorted_data = {}
for key, item in data.items():
    sorted_data.setdefault(item, []).append(key)
sorted_data = list(sorted_data.values())
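A further variant, not from the answer above, sketched with itertools.groupby: sort the keys by their value so that keys with equal values become adjacent, then group them. It assumes the values are sortable (one-letter strings here); within a group the keys keep the dictionary's order because sorted() is stable.

```python
from itertools import groupby

data = {
    'ab': 'a', 'ac': 'a',
    'bc': 'b', 'bd': 'b',
    'cd': 'c', 'ce': 'c',
}

# groupby only merges *adjacent* items, so sort the keys by their value first.
keys_by_value = sorted(data, key=data.get)
sorted_data = [list(group) for _, group in groupby(keys_by_value, key=data.get)]
print(sorted_data)  # [['ab', 'ac'], ['bc', 'bd'], ['cd', 'ce']]
```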
To print the values as columns, loop over the row index first, then over each group:
for i in range(len(sorted_data[0])):
    for j in range(len(sorted_data)):
        print(sorted_data[j][i], end=" ")
    print()
output:
ab bc cd
ac bd ce
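A shorter way to print the columns is to transpose the groups with zip. The sketch below uses itertools.zip_longest with a blank fillvalue, assuming the groups may have different lengths (plain zip would silently drop the extra rows) and that every guess is two characters wide. The extra guess 'be' is hypothetical, added only to show an uneven column.

```python
from itertools import zip_longest

# 'be' is a made-up extra guess so that the 'b' column is longer than the others.
sorted_data = [['ab', 'ac'], ['bc', 'bd', 'be'], ['cd', 'ce']]

# Each row takes the i-th element of every group; missing entries become ''.
rows = [' '.join(f'{cell:<2}' for cell in row)
        for row in zip_longest(*sorted_data, fillvalue='')]
print('\n'.join(rows))
```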
Given the following list:
list = ['aaa', 'xxx', 'bbb', 'ccc', 'xxx', 'bb']
and the following dictionary:
dict = {111:'aaa', 222:'bbb', 333:'ccc', 444:'ddd'}
I would like to delete the elements of the list that are NOT among the values of the dictionary, so that the new list looks like this:
list_new = ['aaa', 'bbb', 'ccc']
I wrote the following code in Python:
for keys, value in enumerate(list):
    if value not in dict.values():
        list.remove(value)
The code should check whether each element of the list is among the dictionary's values and, if not, delete it. However, it misses some elements (it does not delete everything it should). Any idea what's wrong?
As @Jan pointed out, you shouldn't alter a list while iterating through it, since that can cause unwanted behaviour. Instead, you can create another list using a list comprehension:
my_list = ['aaa', 'xxx', 'bbb', 'ccc', 'xxx', 'bb']
my_dict = {111:'aaa', 222:'bbb', 333:'ccc', 444:'ddd'}
my_new_list = [value for value in my_list if value in my_dict.values()]
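Note that value in my_dict.values() scans the values one by one on every test. For a large dictionary, a sketch that builds a set of the values once (assuming the values are hashable, as these strings are) makes each membership test constant time on average:

```python
my_list = ['aaa', 'xxx', 'bbb', 'ccc', 'xxx', 'bb']
my_dict = {111: 'aaa', 222: 'bbb', 333: 'ccc', 444: 'ddd'}

# Build the set of allowed values once instead of scanning .values() per element.
allowed = set(my_dict.values())
my_new_list = [value for value in my_list if value in allowed]
print(my_new_list)  # ['aaa', 'bbb', 'ccc']
```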
To clarify why you shouldn't alter a list while looping through it: removing a value from a list shifts the indexes of all the elements after it. For example, if I have the list ['a', 'b', 'c', 'd', 'e'] and I remove the value 'c', the values 'd' and 'e' have their indexes decreased by 1, since the list is now ['a', 'b', 'd', 'e']. The problem is that a for loop walks through a list by index, so after 'c' is removed the loop moves on to index 3, which now holds 'e', and 'd' is skipped. Code example:
>>> my_list = ['a', 'b', 'c', 'd', 'e']
>>> for value in my_list:
...     print(value)
...     if value == 'c':
...         my_list.remove(value)
...
a
b
c
e
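Applied to the question's data, the same effect explains the leftover element: each time an 'xxx' is removed, the next element slides into the slot the loop has just visited, and the loop ends before it reaches 'bb'. The sketch below reproduces that, then shows two alternatives that keep working on the same list object: iterating over a copy, or replacing the contents with slice assignment (useful when other references to the list must see the change). The variable names are illustrative only.

```python
values = {111: 'aaa', 222: 'bbb', 333: 'ccc', 444: 'ddd'}.values()

# The question's approach: removing while iterating skips elements.
buggy = ['aaa', 'xxx', 'bbb', 'ccc', 'xxx', 'bb']
for value in buggy:
    if value not in values:
        buggy.remove(value)
print(buggy)  # ['aaa', 'bbb', 'ccc', 'bb']  ('bb' was never checked)

# Fix 1: iterate over a shallow copy, remove from the original list.
fixed = ['aaa', 'xxx', 'bbb', 'ccc', 'xxx', 'bb']
for value in fixed[:]:
    if value not in values:
        fixed.remove(value)
print(fixed)  # ['aaa', 'bbb', 'ccc']

# Fix 2: rebuild the contents in place; every alias sees the new contents.
in_place = ['aaa', 'xxx', 'bbb', 'ccc', 'xxx', 'bb']
alias = in_place
in_place[:] = [value for value in in_place if value in values]
print(alias)  # ['aaa', 'bbb', 'ccc']
```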
Suppose I have two lists:
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
and I need to create a dictionary whose keys are the elements of the second list that are also found in the first, and whose values are lists of the elements found between those "keys", like:
result = {
'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']
}
What's a more pythonic way to do this?
Currently I'm doing this:
# I'm not sure that the first element in the second list
# will also be in the first, so I have to create a key
d = {}
k = ''
d[k] = []
for x in l2:
    if x in l1:
        k = x
        d[k] = []
    else:
        d[k].append(x)
But I'm quite sure that this is not the best way to do it, and it also doesn't look nice :)
Edit:
I should also mention that neither list is necessarily ordered, and the second list doesn't necessarily start with an element from the first one.
I don't think you'll do much better, given how the problem is stated. I'd do it this way, but it's not much better:
import collections
d = collections.defaultdict(list)
s = set(l1)
k = ''
for x in l2:
    if x in s:
        k = x
    else:
        d[k].append(x)
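For fun, you can also do this with itertools and the third-party numpy library:
import numpy as np
from itertools import zip_longest, islice
# positions of the elements of l2 that are keys (np.isin replaces np.in1d, deprecated in NumPy 2.0)
arr = np.where(np.isin(l2, l1))[0]
# pair each key position with the next one; the last key is paired with None (end of list)
res = {l2[i]: l2[i+1: j] for i, j in zip_longest(arr, islice(arr, 1, None))}
print(res)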
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Here is a version using itertools.groupby. Because its for loop runs once per group rather than once per element, it may be faster than the plain loop in your post, though that depends on the overhead of groupby itself.
from itertools import groupby
from collections import defaultdict, deque
def group_by_keys(keys, values):
    """
    >>> sorted(group_by_keys('abcdef', [
    ...     1, 2, 3,
    ...     'b', 4, 5,
    ...     'd',
    ...     'a', 6, 7,
    ...     'c', 8, 9,
    ...     'a', 10, 11, 12
    ... ]).items())
    [('a', [6, 7, 10, 11, 12]), ('b', [4, 5]), ('c', [8, 9])]
    """
    keys = set(keys)
    result = defaultdict(list)
    current_key = None
    for is_key, items in groupby(values, key=lambda x: x in keys):
        if is_key:
            current_key = deque(items, maxlen=1).pop()  # last of items
        elif current_key is not None:
            result[current_key].extend(items)
    return result
This doesn't distinguish between keys that don't occur in values at all (like e and f), and keys for which there are no corresponding values (like d). If this information is needed, one of the other solutions might be better suited.
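If that distinction matters, one option is a plain loop that creates an entry for every key that occurs in values, even when no values follow it, while keys that never occur stay absent. This is a sketch, not part of the answer above; the name group_by_keys_keep_empty is made up here.

```python
def group_by_keys_keep_empty(keys, values):
    """Group values under the most recent key.

    Keys that occur with no following values map to []; keys that never
    occur in values are absent from the result.
    """
    keys = set(keys)
    result = {}
    current_key = None
    for x in values:
        if x in keys:
            current_key = x
            # setdefault keeps earlier values if the same key occurs again
            result.setdefault(current_key, [])
        elif current_key is not None:
            result[current_key].append(x)
    return result

print(group_by_keys_keep_empty('abcdef', [
    1, 2, 3, 'b', 4, 5, 'd', 'a', 6, 7, 'c', 8, 9, 'a', 10, 11, 12,
]))
# {'b': [4, 5], 'd': [], 'a': [6, 7, 10, 11, 12], 'c': [8, 9]}
```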
Update (again):
I misinterpreted the question. For large lists, list comprehensions work well, and they are fairly simple once you learn how to use them.
I am going to use two list comprehensions:
idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
print(res)
Results:
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Speed Testing for large lists:
import collections
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4', *(str(i) for i in range(300)),
'b', 'some_other_el_1', 'some_other_el_2', *(str(i) for i in range(100)),
'c', 'another_element_1', 'another_element_2', *(str(i) for i in range(200)),
'd', '', '', 'another_element_3', 'd4'
]
def run_comp():
    idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
    res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}

def run_other():
    d = collections.defaultdict(list)
    k = ''
    for x in l2:
        if x in l1:
            k = x
        else:
            d[k].append(x)
import timeit
print('For Loop:', timeit.timeit(run_other, number=1000))
print("List Comprehension:", timeit.timeit(run_comp, number=1000))
Results:
For Loop: 0.1327093063242541
List Comprehension: 0.09343156142774986
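Original answer below (written for a different reading of the question; its results come from an l2 containing elements like 'a1', 'a2', not the l2 shown above):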
This is rather simple with list comprehensions.
{key: [val for val in l2 if key in val] for key in l1}
Results:
{'a': ['a', 'a1', 'a2', 'a3', 'a4'],
'b': ['b', 'b1', 'b2', 'b3', 'b4'],
'c': ['c', 'c1', 'c2', 'c3', 'c4'],
'd': ['d', 'd1', 'd2', 'd3', 'd4'],
'e': [],
'f': []}
The code below shows what is happening above.
d = {}
for key in l1:
    d[key] = []
    for val in l2:
        if key in val:
            d[key].append(val)
The dictionary/list comprehension version (the first piece of code) is usually somewhat faster. A comprehension appends each element with a dedicated bytecode instruction, whereas the explicit loop has to look up d[key] and its append method and call it on every iteration. list.append itself is cheap (amortized constant time): it does not walk the list, and both versions grow the list the same way.
References:
http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions
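To check the speed difference on your own machine, here is a small timeit sketch. The l1/l2 values are made-up stand-ins shaped like the ones behind the results above ('a', 'a1', ..., 'd4'); timings depend on the machine and Python version, so none are shown.

```python
import timeit

l1 = ['a', 'b', 'c', 'd', 'e', 'f']
# Hypothetical l2: 'a', 'a1', ..., 'a4', 'b', ..., 'd4'
l2 = [c + s for c in 'abcd' for s in ('', '1', '2', '3', '4')]

def with_comprehension():
    return {key: [val for val in l2 if key in val] for key in l1}

def with_append():
    d = {}
    for key in l1:
        d[key] = []
        for val in l2:
            if key in val:
                d[key].append(val)
    return d

# Both build the same dictionary; only the speed differs.
assert with_comprehension() == with_append()
print('comprehension:', timeit.timeit(with_comprehension, number=1000))
print('append loop:  ', timeit.timeit(with_append, number=1000))
```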
You can use itertools.groupby:
import itertools
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = ['x', 'q', 'we', 'da', 'po', 'a', 'el1', 'el2', 'el3', 'el4', 'b', 'some_other_el_1', 'some_other_el_2', 'c', 'another_element_1', 'another_element_2', 'd', '', '', 'another_element_3', 'd4']
groups = [[a, list(b)] for a, b in itertools.groupby(l2, key=lambda x:x in l1)]
final_dict = {groups[i][-1][-1]:groups[i+1][-1] for i in range(len(groups)-1) if groups[i][0]}
Output:
{'a': ['el1', 'el2', 'el3', 'el4'], 'b': ['some_other_el_1', 'some_other_el_2'], 'c': ['another_element_1', 'another_element_2'], 'd': ['', '', 'another_element_3', 'd4']}
Your code is readable, does the job and is reasonably efficient. There's no need to change much!
You could use more descriptive variable names and replace l1 with a set for faster lookup:
keys = {'a', 'c', 'b', 'e', 'f', 'd'}
keys_and_values = [
    'x','q','we','da','po',
    'a', 'el1', 'el2', 'el3', 'el4',
    'b', 'some_other_el_1', 'some_other_el_2',
    'c', 'another_element_1', 'another_element_2',
    'd', '', '', 'another_element_3', 'd4'
]
current_key = None
result = {}
for x in keys_and_values:
    if x in keys:
        current_key = x
        result[current_key] = []
    elif current_key is not None:
        result[current_key].append(x)
print(result)
# {'a': ['el1', 'el2', 'el3', 'el4'],
#  'b': ['some_other_el_1', 'some_other_el_2'],
#  'c': ['another_element_1', 'another_element_2'],
#  'd': ['', '', 'another_element_3', 'd4']}
def find_index():
    idxs = [l2.index(i) for i in set(l1).intersection(set(l2))]
    idxs.sort()
    idxs += [len(l2)+1]
    res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
    return res
Comparison of methods, using justengel's test (author of each method in parentheses):
run_comp (justengel): .455
run_other (justengel): .244
group_by_keys (mkrieger1): .160
find_index (me): .068
Note that my method ignores keys that don't appear in l2, and doesn't handle cases where keys appear more than once in l2. Adding empty lists for keys that don't appear in l2 can be done with {**res, **{key: [] for key in set(l1).difference(set(l2))}}, which raises the time to .105.
Even cleaner than turning l1 into a set: use the keys of the dictionary you're building, like this:
d = {x: [] for x in l1}
k = None
for x in l2:
    if x in d:
        k = x
    elif k is not None:
        d[k].append(x)
This is because (in the worst case) your code would be iterating over all the values in l1 for every value in l2 on the if x in l1: line, because checking if a value is in a list takes linear time. Checking if a value is in a dictionary's keys is constant time in the average case (same with sets, as already suggested by Eric Duminil).
I set k to None and check for it because your code would've returned d with '': ['x','q','we','da','po'], which is presumably not what you want. This assumes l1 can't contain None.
My solution also assumes it's okay for the resulting dictionary to contain keys with empty lists if there are items in l1 that never appear in l2. If that's not okay, you can remove them at the end with
final_d = {k: v for k, v in d.items() if v}
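To see why k starts as None, here is a sketch comparing the question's original loop (starting from k = '') with this answer's version on the question's data. The function names original_loop and keys_from_dict are illustrative only.

```python
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
    'x', 'q', 'we', 'da', 'po',
    'a', 'el1', 'el2', 'el3', 'el4',
    'b', 'some_other_el_1', 'some_other_el_2',
    'c', 'another_element_1', 'another_element_2',
    'd', '', '', 'another_element_3', 'd4',
]

def original_loop(l1, l2):
    # The question's approach: items before the first key go under ''.
    k = ''
    d = {k: []}
    for x in l2:
        if x in l1:
            k = x
            d[k] = []
        else:
            d[k].append(x)
    return d

def keys_from_dict(l1, l2):
    # This answer's approach: leading items are dropped, every key in l1 appears.
    d = {x: [] for x in l1}
    k = None
    for x in l2:
        if x in d:
            k = x
        elif k is not None:
            d[k].append(x)
    return d

print(original_loop(l1, l2)[''])  # ['x', 'q', 'we', 'da', 'po']
d = keys_from_dict(l1, l2)
print(d['e'], d['f'])             # [] []
final_d = {k: v for k, v in d.items() if v}
print(sorted(final_d))            # ['a', 'b', 'c', 'd']
```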
In Python 2.7, I'm measuring a process by counting, in a dictionary, the keys (characters) returned from a function.
A basic example is shown where the function getList() returns a list of chars which may be ['a'], ['b'], ['c'] or ['d']; most lists are single elements though two may be returned sometimes, e.g. ['a', 'd']. I'd like to count everything returned. A way I thought of doing this is shown below:
myDict = {'a':0, 'b':0, 'c':0, 'd':0, 'error':0, 'total':0}
for key in charList:
    myDict[key] += 1
    myDict['total'] += 1
Is there a more Pythonic way, perhaps dictionary comprehension to count keys within lists (of varying length)?
import random

def getList():
    '''mimics a process that returns a list of chars between a - d
    [most lists are single elements, though some are two elements]'''
    number = random.randint(97, 101)
    if number == 101:
        charList = [chr(number-1), chr(random.randint(97, 100))]
        if charList[0] == charList[1]:
            return getList()  # retry; the recursive result must be returned
    else:
        charList = [chr(number)]
    return charList

myDict = {'a':0, 'b':0, 'c':0, 'd':0, 'error':0, 'total':0}
for counter in range(0, 5):
    charList = getList()
    for key in charList:
        print charList, '\t', key
        try:
            myDict[key] += 1
            myDict['total'] += 1
        except KeyError:
            myDict['error'] += 1
print "\n", myDict
Output generated:
You can use the built-in collections.Counter class: https://docs.python.org/2/library/collections.html#collections.Counter
For example with your code:
import collections
ctr = collections.Counter()
for ii in range(0,5):
    charList = getList()
    ctr.update(charList)
ctr['total'] = sum(ctr.values())
print ctr
This will print:
Counter({'total': 7, 'd': 5, 'a': 1, 'c': 1})
You can use collections.Counter:
import collections
# You need to initialize the counter or you won't get the entries with a 0 count.
myDict = collections.Counter({'a': 0, 'b': 0, 'c': 0, 'd': 0})
myDict.update(x for _ in range(0, 5) for x in getList())
# Then create the 'total' entry
myDict['total'] = sum(myDict.values())
Note: This may add new keys to the counter without setting the 'error' entry if the list returned by getList() contains new characters ('e', 'f', ...).
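To keep the question's 'error' entry, here is a Python 3 sketch that counts everything first and then splits expected characters from unexpected ones. The fixed sample_lists replaces the random getList() so the result is reproducible; that data and the EXPECTED name are assumptions. As in the question's loop, 'total' counts only the expected characters.

```python
from collections import Counter
from itertools import chain

EXPECTED = ('a', 'b', 'c', 'd')
# Hypothetical stand-in for five getList() results; 'e' plays an unexpected character.
sample_lists = [['a'], ['d', 'b'], ['c'], ['e'], ['a', 'd']]

raw = Counter(chain.from_iterable(sample_lists))       # count every character
counts = Counter({key: raw[key] for key in EXPECTED})  # explicit 0 for unseen keys
counts['error'] = sum(n for key, n in raw.items() if key not in EXPECTED)
counts['total'] = sum(counts[key] for key in EXPECTED)
print(dict(counts))
# {'a': 2, 'b': 1, 'c': 1, 'd': 2, 'error': 1, 'total': 6}
```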
Use collections.Counter and a double-loop generator expression for feeding the individual elements into the counter:
>>> import collections
>>> lst = [['a'], ['a', 'b'], ['c'], ['c', 'd']]
>>> c = collections.Counter(y for x in lst for y in x)
>>> c
Counter({'a': 2, 'c': 2, 'b': 1, 'd': 1})
>>> c.most_common(2)
[('a', 2), ('c', 2)]
>>> sum(c.values())
6
The easiest way I can think of is to flatten your list with chain and then use a Counter. Let lst be the list [['a'], ['b'], ['c'], ['d'], ['a', 'd']]:
>>> from itertools import chain
>>> from collections import Counter
>>> c = Counter(chain(*lst))
>>> c['total'] = sum(c.values())
>>> c
Counter({'total': 6, 'd': 2, 'a': 2, 'b': 1, 'c': 1})
I have a dictionary like the one below:
dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
Here the nested dictionaries for 'a' and 'c' are equal, as are those for 'b' and 'd', so the expected result is:
result = [['a','c'],['b','d']]
>>> seen = {}
>>> dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
>>> for k in dict1:
...     fs = frozenset(dict1[k].items())
...     seen.setdefault(fs, []).append(k)
...
>>> list(seen.values())  # note: order not guaranteed before Python 3.7
[['a', 'c'], ['b', 'd']]
If order is needed:
>>> from collections import OrderedDict
>>> dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
>>> seen = OrderedDict()
>>> for k in sorted(dict1):
...     fs = frozenset(dict1[k].items())
...     seen.setdefault(fs, []).append(k)
...
>>> list(seen.values())
[['a', 'c'], ['b', 'd']]
Note: This code is currently cross-compatible on Python 2/3. On Python 2 you can make it more efficient by using .iteritems() instead of .items()
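In Python 3.7+, plain dicts keep insertion order, so a sketch with collections.defaultdict (a swap for the setdefault call above) returns the groups in the order their first key appears. Like the answer, it assumes the nested values are hashable so that frozenset(inner.items()) works.

```python
from collections import defaultdict

dict1 = {'a': {'a': 20, 'b': 30}, 'b': {'a': 30, 'b': 40},
         'c': {'a': 20, 'b': 30}, 'd': {'a': 30, 'b': 40}}

groups = defaultdict(list)
for key, inner in dict1.items():
    # Equal inner dicts give equal frozensets, so their keys share a bucket.
    groups[frozenset(inner.items())].append(key)

result = list(groups.values())
print(result)  # [['a', 'c'], ['b', 'd']]
```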
A quick approach: first collect the distinct values, then group the keys with a list comprehension.
>>> values = []
>>> for k in dict1:
...     if dict1[k] not in values:
...         values.append(dict1[k])
...
>>> values
[{'a': 20, 'b': 30}, {'a': 30, 'b': 40}]
>>> [[k for k in dict1 if dict1[k] == v] for v in values]
[['a', 'c'], ['b', 'd']]