Say I have a list with some numbers that are duplicates.
list = [1,1,1,1,2,3,4,4,1,2,5,6]
I want to identify all the elements in the list that are repeated consecutively, including the first element of each run, and replace them with the corresponding values from a dictionary:
mydict = {1: 'a', 4: 'd'}
list = ['a','a','a','a',2,3,'d','d',1,2,5,6]
Because I want to replace the first instance of the repetition as well, I am quite confused as to how to proceed!
itertools.groupby is your friend:
from itertools import groupby
mydict = {1: 'a', 4: 'd'}
A = [1,1,1,1,2,3,4,4,1,2,5,6]
res = []
for k, g in groupby(A):
    size = len(list(g))
    if size > 1:
        res.extend([mydict[k]] * size)  # see note 1
    else:
        res.append(k)
print(res) # -> ['a', 'a', 'a', 'a', 2, 3, 'd', 'd', 1, 2, 5, 6]
Notes:
If you want to catch possible KeyErrors and have a default value you want to fall back on, use mydict.get(k, <default>) instead of mydict[k]
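As a minimal sketch of that fallback, assuming you want repeated values without a mapping to stay unchanged (the function name replace_runs is made up for illustration):

```python
from itertools import groupby

def replace_runs(values, mapping):
    """Replace every run of two or more equal consecutive items via mapping."""
    result = []
    for key, group in groupby(values):
        size = len(list(group))
        if size > 1:
            # Fall back to the original value when the key has no mapping.
            result.extend([mapping.get(key, key)] * size)
        else:
            result.append(key)
    return result

print(replace_runs([1, 1, 2, 3, 3, 4, 4, 1], {1: 'a', 4: 'd'}))
# -> ['a', 'a', 2, 3, 3, 'd', 'd', 1]
```

Here the run of 3s has no entry in the mapping, so it is kept as is instead of raising a KeyError.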
Considering that I have two lists like:
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
and I need to create a dictionary where the keys are those element from second list that are found in the first and values are lists of elements found between "keys" like:
result = {
'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']
}
What's a more pythonic way to do this?
Currently I'm doing this :
# I'm not sure that the first element in the second list
# will also be in the first, so I have to create a key
d = {}
k = ''
d[k] = []
for x in l2:
    if x in l1:
        k = x
        d[k] = []
    else:
        d[k].append(x)
But I'm quite positive that this is not the best way to do it, and it also doesn't look nice :)
Edit:
I should also mention that neither list is necessarily ordered, and the second list need not start with an element from the first one.
I don't think you'll do much better if this is the most specific statement of the problem. I mean I'd do it this way, but it's not much better.
import collections
d = collections.defaultdict(list)
s = set(l1)
k = ''
for x in l2:
    if x in s:
        k = x
    else:
        d[k].append(x)
For fun, you can also do this with itertools and 3rd party numpy:
import numpy as np
from itertools import zip_longest, islice
arr = np.where(np.in1d(l2, l1))[0]
res = {l2[i]: l2[i+1: j] for i, j in zip_longest(arr, islice(arr, 1, None))}
print(res)
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Here is a version using itertools.groupby. It may or may not be more efficient than the plain version from your post, depending on how groupby is implemented, because the for loop has fewer iterations.
from itertools import groupby
from collections import defaultdict, deque
def group_by_keys(keys, values):
    """
    >>> sorted(group_by_keys('abcdef', [
    ...     1, 2, 3,
    ...     'b', 4, 5,
    ...     'd',
    ...     'a', 6, 7,
    ...     'c', 8, 9,
    ...     'a', 10, 11, 12
    ... ]).items())
    [('a', [6, 7, 10, 11, 12]), ('b', [4, 5]), ('c', [8, 9])]
    """
    keys = set(keys)
    result = defaultdict(list)
    current_key = None
    for is_key, items in groupby(values, key=lambda x: x in keys):
        if is_key:
            current_key = deque(items, maxlen=1).pop()  # last of items
        elif current_key is not None:
            result[current_key].extend(items)
    return result
This doesn't distinguish between keys that don't occur in values at all (like e and f), and keys for which there are no corresponding values (like d). If this information is needed, one of the other solutions might be better suited.
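If that distinction matters, one possible variation is a plain loop that creates an entry as soon as a key is seen. This is only a sketch; the name group_by_seen_keys is hypothetical and not part of the answer above:

```python
def group_by_seen_keys(keys, values):
    """Every key that occurs in values gets an entry, even an empty one."""
    keys = set(keys)
    result = {}
    current_key = None
    for item in values:
        if item in keys:
            current_key = item
            # setdefault keeps earlier items if the key shows up again
            result.setdefault(current_key, [])
        elif current_key is not None:
            result[current_key].append(item)
    return result

grouped = group_by_seen_keys(
    'abcdef',
    [1, 2, 3, 'b', 4, 5, 'd', 'a', 6, 7, 'c', 8, 9, 'a', 10, 11, 12],
)
print(sorted(grouped.items()))
# -> [('a', [6, 7, 10, 11, 12]), ('b', [4, 5]), ('c', [8, 9]), ('d', [])]
```

Now 'd' (seen, but without values) maps to an empty list, while 'e' and 'f' (never seen) are absent.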
Updated ... Again
I misinterpreted the question. If you are using large lists then list comprehensions are the way to go and they are fairly simple once you learn how to use them.
I am going to use two list comprehensions.
idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
print(res)
Results:
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Speed Testing for large lists:
import collections
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4', *(str(i) for i in range(300)),
'b', 'some_other_el_1', 'some_other_el_2', *(str(i) for i in range(100)),
'c', 'another_element_1', 'another_element_2', *(str(i) for i in range(200)),
'd', '', '', 'another_element_3', 'd4'
]
def run_comp():
    idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
    res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
def run_other():
    d = collections.defaultdict(list)
    k = ''
    for x in l2:
        if x in l1:
            k = x
        else:
            d[k].append(x)
import timeit
print('For Loop:', timeit.timeit(run_other, number=1000))
print("List Comprehension:", timeit.timeit(run_comp, number=1000))
Results:
For Loop: 0.1327093063242541
List Comprehension: 0.09343156142774986
old stuff below
This is rather simple with list comprehensions.
{key: [val for val in l2 if key in val] for key in l1}
Results:
{'a': ['a', 'a1', 'a2', 'a3', 'a4'],
'b': ['b', 'b1', 'b2', 'b3', 'b4'],
'c': ['c', 'c1', 'c2', 'c3', 'c4'],
'd': ['d', 'd1', 'd2', 'd3', 'd4'],
'e': [],
'f': []}
The code below shows what is happening above.
d = {}
for key in l1:
    d[key] = []
    for val in l2:
        if key in val:
            d[key].append(val)
The list comprehension / dictionary comprehension (first piece of code) is usually somewhat faster. A comprehension builds the list in a single specialized loop, so it avoids looking up and calling the append method on every iteration. Note that appending does not walk the list: it is amortized constant time, so the gain from the comprehension is a constant factor, not a different order of growth.
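A rough way to check how the two forms compare on your own machine is to time them directly. This is only a sketch: the data size and repeat count are arbitrary, and the numbers will vary between interpreters and machines.

```python
import timeit

data = list(range(10000))

def with_comprehension():
    return [x * 2 for x in data]

def with_append():
    out = []
    for x in data:
        out.append(x * 2)
    return out

# Both forms must produce the same list before timing means anything.
assert with_comprehension() == with_append()

print('comprehension:', timeit.timeit(with_comprehension, number=200))
print('append loop:  ', timeit.timeit(with_append, number=200))
```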
References:
http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions
You can use itertools.groupby:
import itertools
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = ['x', 'q', 'we', 'da', 'po', 'a', 'el1', 'el2', 'el3', 'el4', 'b', 'some_other_el_1', 'some_other_el_2', 'c', 'another_element_1', 'another_element_2', 'd', '', '', 'another_element_3', 'd4']
groups = [[a, list(b)] for a, b in itertools.groupby(l2, key=lambda x:x in l1)]
final_dict = {groups[i][-1][-1]:groups[i+1][-1] for i in range(len(groups)-1) if groups[i][0]}
Output:
{'a': ['el1', 'el2', 'el3', 'el4'], 'b': ['some_other_el_1', 'some_other_el_2'], 'c': ['another_element_1', 'another_element_2'], 'd': ['', '', 'another_element_3', 'd4']}
Your code is readable, does the job and is reasonably efficient. There's no need to change much!
You could use more descriptive variable names and replace l1 with a set for faster lookup:
keys = {'a', 'c', 'b', 'e', 'f', 'd'}
keys_and_values = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
current_key = None
result = {}
for x in keys_and_values:
    if x in keys:
        current_key = x
        result[current_key] = []
    elif current_key:
        result[current_key].append(x)
print(result)
# {'a': ['el1', 'el2', 'el3', 'el4'],
# 'c': ['another_element_1', 'another_element_2'],
# 'b': ['some_other_el_1', 'some_other_el_2'],
# 'd': ['', '', 'another_element_3', 'd4']}
def find_index():
    idxs = [l2.index(i) for i in set(l1).intersection(set(l2))]
    idxs.sort()
    idxs += [len(l2)+1]
    res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
    return res
Comparison of methods, using justengel's test:
justengel
run_comp: .455
run_other: .244
mkrieger1
group_by_keys: .160
me
find_index: .068
Note that my method ignores keys that don't appear in l2, and doesn't handle cases where keys appear more than once in l2. Adding in empty lists for keys that don't appear in l2 can be done by {**res, **{key: [] for key in set(l1).difference(set(l2))}}, which raises the time to .105.
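Put together, the variant with empty lists could look like the sketch below. The function name find_index_with_empty and the small sample lists are made up for illustration, and it still assumes each key appears at most once in l2:

```python
def find_index_with_empty(l1, l2):
    # Position in l2 of each key from l1 that actually occurs there.
    idxs = sorted(l2.index(key) for key in set(l1) & set(l2))
    idxs.append(len(l2))  # sentinel marking the end of the last slice
    res = {l2[start]: l2[start + 1:end] for start, end in zip(idxs, idxs[1:])}
    # Keys from l1 that never occur in l2 get an empty list.
    return {**res, **{key: [] for key in set(l1) - set(l2)}}

sample = find_index_with_empty(['a', 'b', 'e'], ['x', 'a', '1', '2', 'b', '3'])
print(sorted(sample.items()))
# -> [('a', ['1', '2']), ('b', ['3']), ('e', [])]
```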
Even cleaner than turning l1 into a set, use the keys of the dictionary you're building. Like this
d = {x: [] for x in l1}
k = None
for x in l2:
    if x in d:
        k = x
    elif k is not None:
        d[k].append(x)
This is because (in the worst case) your code would be iterating over all the values in l1 for every value in l2 on the if x in l1: line, because checking if a value is in a list takes linear time. Checking if a value is in a dictionary's keys is constant time in the average case (same with sets, as already suggested by Eric Duminil).
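To see the difference in lookup cost for yourself, you could time a membership test against a list and against a set. This is a rough sketch with arbitrary sizes; no particular timings are implied:

```python
import timeit

haystack_list = list(range(5000))
haystack_set = set(haystack_list)
needle = 4999  # worst case for the list: the very last element

# The list has to be scanned element by element; the set is a hash lookup.
print('list:', timeit.timeit(lambda: needle in haystack_list, number=2000))
print('set: ', timeit.timeit(lambda: needle in haystack_set, number=2000))
```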
I set k to None and check for it because your code would've returned d with '': ['x','q','we','da','po'], which is presumably not what you want. This assumes l1 can't contain None.
My solution also assumes it's okay for the resulting dictionary to contain keys with empty lists if there are items in l1 that never appear in l2. If that's not okay, you can remove them at the end with
final_d = {k: v for k, v in d.items() if v}
I've implemented this exact functionality with recursion, but I also want a version without recursion, since Python has a recursion limit and there are problems with sharing data.
sublist2 = [{'nothing': "Notice This"}]
sublist1 = [{'include':[sublist2]}]
mainlist = [{'nothing': 1}, {'include':[sublist1, sublist2]},
{'nothing': 2}, {'include':[sublist2]}]
What is to be filled in the Todo?
for i in mainlist:
    if 'nothing' in i:
        pass  # Do nothing
    elif 'include' in i:
        # Todo
        # Append the contents of the list mentioned recursively
        # in its own place without disrupting the flow
        pass
After the operation the expected result
mainlist = [{'nothing': 1},
{'nothing': "Notice This"}, {'nothing': "Notice This"},
{'nothing':2},
{'nothing': "Notice This"}]
Notice that sublist1 refers to sublist2. That's the reason
{'include':[sublist1, sublist2]} is replaced by
{'nothing':"Notice This"}, {'nothing':"Notice This"}
I've tried the following
Inserting values into specific locations in a list in Python
How to get item's position in a list?
Instead of using recursion, you just look at the nth element and change it in place until it doesn't need any further processing.
sublist2 = [{'nothing': "Notice This"}]
sublist1 = [{'include':[sublist2]}]
mainlist = [{'nothing': 1}, {'include':[sublist1, sublist2]},
{'nothing': 2}, {'include':[sublist2]}]
index = 0
while index < len(mainlist):
    if 'nothing' in mainlist[index]:
        index += 1
    elif 'include' in mainlist[index]:
        # replace the 'include' entries with their corresponding list
        mainlist[index:index+1] = mainlist[index]['include']
    elif isinstance(mainlist[index], list):
        # if an entry is a list, replace it with its entries
        mainlist[index:index+1] = mainlist[index]
Note the difference between assigning to an entry l[0] and assigning to a slice l[0:1]
>>> l = [1, 2, 3, 4]
>>> l[3] = ['a', 'b', 'c']
>>> l
[1, 2, 3, ['a', 'b', 'c']]
>>> l[0:1] = ['x', 'y', 'z']
>>> l
['x', 'y', 'z', 2, 3, ['a', 'b', 'c']]
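If you would rather not modify mainlist in place, the same expansion can be sketched with an explicit stack of iterators instead of recursion. The name expand_includes is made up, and the sketch assumes there are no circular includes (a list that ends up including itself would loop forever):

```python
def expand_includes(items):
    """Return a flat list with every 'include' entry replaced by its contents."""
    flat = []
    stack = [iter(items)]  # iterators still being consumed, innermost last
    while stack:
        for entry in stack[-1]:
            if isinstance(entry, list):
                stack.append(iter(entry))  # descend into a nested list
                break
            if 'include' in entry:
                stack.append(iter(entry['include']))  # descend into the include
                break
            flat.append(entry)
        else:
            stack.pop()  # the innermost iterator is exhausted
    return flat

sublist2 = [{'nothing': "Notice This"}]
sublist1 = [{'include': [sublist2]}]
mainlist = [{'nothing': 1}, {'include': [sublist1, sublist2]},
            {'nothing': 2}, {'include': [sublist2]}]
print(expand_includes(mainlist))
```

Because each iterator remembers its position, breaking out of the for loop and resuming later continues right after the entry that was expanded.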
In Python 2.7: I'm measuring a process that counts the keys of a dictionary returned from a function.
A basic example is shown where the function getList() returns a list of chars which may be ['a'], ['b'], ['c'] or ['d']; most lists are single elements though two may be returned sometimes, e.g. ['a', 'd']. I'd like to count everything returned. A way I thought of doing this is shown below:
myDict = {'a':0, 'b':0, 'c':0, 'd':0, 'error':0, 'total':0}
for key in charList:
    myDict[key] += 1
    myDict['total'] += 1
Is there a more Pythonic way, perhaps dictionary comprehension to count keys within lists (of varying length)?
import random
def getList():
    '''mimics a process that returns a list of chars between a - d
    [most lists are single elements, though some are two elements]'''
    number = random.randint(97, 101)
    if number == 101:
        charList = [chr(number-1), chr(random.randint(97,100))]
        if charList[0] == charList[1]:
            # retry, and actually use the result of the retry
            return getList()
    else:
        charList = [chr(number)]
    return charList
myDict = {'a':0, 'b':0, 'c':0, 'd':0, 'error':0, 'total':0}
for counter in range(0,5):
    charList = getList()
    for key in charList:
        print charList, '\t', key
        try:
            myDict[key] += 1
            myDict['total'] += 1
        except KeyError:
            myDict['error'] += 1
print "\n",myDict
Output generated:
You can use the built-in collections.Counter class: https://docs.python.org/2/library/collections.html#collections.Counter
For example with your code:
import collections
ctr = collections.Counter()
for ii in range(0,5):
    charList = getList()
    ctr.update(charList)
ctr['total'] = sum(ctr.values())
print ctr
This will print:
Counter({'total': 7, 'd': 5, 'a': 1, 'c': 1})
You can use collections.Counter:
# You need to initialize the counter or you won't get the entry with 0 count.
myDict = collections.Counter({'a': 0, 'b': 0, 'c': 0, 'd': 0})
myDict.update(x for _ in range(0, 5) for x in getList())
# Then create the 'total' entry
myDict['total'] = sum(myDict.values())
Note: This may add new keys to the counter without setting the 'error' entry if the list returned by getList() contains new characters ('e', 'f', ...).
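If you need the 'error' entry as well, one option is to check each character against the expected keys while counting. The sketch below uses a hypothetical helper, count_chars, that takes the lists directly instead of calling getList():

```python
import collections

def count_chars(lists, expected='abcd'):
    expected = set(expected)
    counts = collections.Counter({ch: 0 for ch in expected})
    errors = 0
    for char_list in lists:
        for ch in char_list:
            if ch in expected:
                counts[ch] += 1
            else:
                errors += 1  # unexpected character, counted separately
    # As in the question, 'total' only covers the expected characters.
    counts['total'] = sum(counts.values())
    counts['error'] = errors
    return counts

result = count_chars([['a'], ['a', 'd'], ['e']])
print(result['a'], result['d'], result['total'], result['error'])
# -> 2 1 3 1
```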
Use collections.Counter and a double-loop generator expression for feeding the individual elements into the counter:
>>> lst = [['a'], ['a', 'b'], ['c'], ['c', 'd']]
>>> c = collections.Counter((y for x in lst for y in x))
>>> c
Counter({'a': 2, 'c': 2, 'b': 1, 'd': 1})
>>> c.most_common(2)
[('a', 2), ('c', 2)]
>>> sum(c.values())
6
The easiest way I can think of is to flatten your list with chain and then use a Counter: let lst be the list [['a'], ['b'], ['c'], ['d'], ['a', 'd']]
>>> from itertools import chain
>>> from collections import Counter
>>> c = Counter(chain(*lst))
>>> c['total'] = sum(c.values())
>>> c
Counter({'total': 6, 'd': 2, 'a': 2, 'b': 1, 'c': 1})
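A small variation, in case you prefer it: chain.from_iterable takes the outer list directly, so it does not need to be unpacked into arguments, and keeping the total in its own variable avoids mixing it with the per-character counts. A sketch with the same lst as above:

```python
from collections import Counter
from itertools import chain

lst = [['a'], ['b'], ['c'], ['d'], ['a', 'd']]
c = Counter(chain.from_iterable(lst))
total = sum(c.values())  # kept separate so it is not mistaken for a key count
print(c['a'], c['d'], total)
# -> 2 2 6
```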
Sorry about the question repost...I should have just edited this question in the first place. Flagged the new one for the mods. Sorry for the trouble
Had to re-write the question due to changed requirements.
I have a dictionary such as the following:
d = {'a': [4, 2], 'b': [3, 4], 'c': [4, 3], 'd': [4, 3], 'e': [4], 'f': [4], 'g': [4]}
I want to get the keys that are associated with the smallest length in the dictionary d, as well as those that have the maximum value.
In this case, the keys with the smallest length (smallest length of lists in this dictionary) should return
'e', 'f', 'g'
And those with the greatest value(the sum of the integers in each list) should return
'b' 'c'
I have tried
min_value = min(dict.itervalues())
min_keys = [k for k in d if dict[k] == min_value]
But that does not give me the result I want.
Any ideas?
Thanks!
Your problem is that your lists contain strings ('2'), and not integers (2). Leave out the quotes, or use the following:
min_value = min(min(map(int, v)) for v in d.values())
min_keys = [k for k,v in d.items() if min_value in map(int, v)]
Similarly, to calculate the keys with the max length:
max_length = max(map(len, d.values()))
maxlen_keys = [k for k,v in d.items() if max_length == len(v)]
Also, it's a bad idea to use dict as a variable name, as doing so shadows the built-in dict.
You can use min() with a key= argument, and specify a key function that compares the way you want.
d = {'a': ['1'], 'b': ['1', '2'], 'c': ['2'], 'd':['1']}
min_value = min(d.values())
min_list = [key for key, value in d.items() if value == min_value]
max_len = len(max(d.values(), key=len))
long_list = [key for key, value in d.items() if len(value) == max_len]
print(min_list)
print(long_list)
Notes:
0) Don't use dict as a variable name; that's the name of the class for dictionary, and if you use it as a variable name you "shadow" it. I just used d for the name here.
1) min_value was easy; no need to use a key= function.
2) max_len uses a key= function, len(), to find the longest value.
How about using sorting and lambdas?
#!/usr/bin/env python
d = {'a': ['1'], 'b': ['1', '2'], 'c': ['8', '1'], 'd':['1'], 'e':['1', '2', '3'], 'f': [4, 1]}
sorted_by_sum_d = sorted(d, key=lambda key: sum(list(int(item) for item in d[key])))
sorted_by_length_d = sorted(d, key=lambda key: len(d[key]))
print "Sorted by sum of the items in the list : %s" % sorted_by_sum_d
print "Sorted by length of the items in the list : %s" % sorted_by_length_d
This would output:
Sorted by sum of the items in the list : ['a', 'd', 'b', 'f', 'e', 'c']
Sorted by length of the items in the list : ['a', 'd', 'c', 'b', 'f', 'e']
Be aware I changed the initial 'd' dictionary (just to make sure it was working)
Then, if you want the item with the biggest sum, you get the last element of the sorted_by_sum_d list.
(I'm not too sure this is what you want, though)
Edit:
If you can ensure that the lists are always going to be lists of integers (or other numeric types, such as long or float), there's no need to cast strings to integers. The calculation of the sorted_by_sum_d variable can be done simply using:
d = {'a': [1], 'b': [1, 2], 'c': [8, 1], 'd':[1], 'e':[1, 2, 3], 'f': [4, 1]}
sorted_by_sum_d = sorted(d, key=lambda key: sum(d[key]))
I've found such a simple solution:
min_len = len(min(d.values(), key=len))  # 1
min_keys = [key for key in d if len(d[key]) == min_len]  # ['e', 'f', 'g']
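For the dictionary from the question, both requests (shortest lists and largest sums) can be sketched in the same style. The variable names are only illustrative, and the results are sorted so the output does not depend on dictionary order:

```python
d = {'a': [4, 2], 'b': [3, 4], 'c': [4, 3], 'd': [4, 3], 'e': [4], 'f': [4], 'g': [4]}

# Keys whose list has the smallest length.
min_len = min(len(v) for v in d.values())
shortest_keys = sorted(k for k, v in d.items() if len(v) == min_len)

# Keys whose list has the largest sum.
max_sum = max(sum(v) for v in d.values())
largest_sum_keys = sorted(k for k, v in d.items() if sum(v) == max_sum)

print(shortest_keys)     # -> ['e', 'f', 'g']
print(largest_sum_keys)  # -> ['b', 'c', 'd']
```

Note that with this data 'd' also sums to 7, so it ties with 'b' and 'c'.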