Dict with curly braces and OrderedDict - python

I thought I had set myself a simple project, but I guess not. I think I'm using the OrderedDict function wrong, because I keep getting:
ValueError: too many values to unpack (expected 2)
Code:
import random
import _collections
shop = {
'bread': 2,
'chips': 4,
'tacos': 5,
'tuna': 4,
'bacon': 8,
}
print(shop)
'''
items = list(shop.keys())
random.shuffle(items)
_collections.OrderedDict(items)
'''
n = random.randrange(0, len(shop.keys()))
m = random.randrange(n, len(shop.keys()))
if m <= n:
    m += 1
print(n, " ", m)
for key in shop.keys():
    value = shop[key] * random.uniform(0.7,2.3)
    print(key, "=", int(value))
    if n < m:
        n += 1
    else:
        break
I would like this code to shuffle the dictionary, then multiply the values by a random factor between 0.7 and 2.3, and then loop between 0 and 5 times to give me a few random keys from the dictionary.
I have wrapped the code I am struggling with, which produces the error, in ''' '''.

You are very close, but you cannot just give the list of keys to the new OrderedDict; you must give the values too. Try this:
import random
import collections
shop = {
'bread': 2,
'chips': 4,
'tacos': 5,
'tuna': 4,
'bacon': 8,
}
print(shop)
items = list(shop.keys())
random.shuffle(items)
print(items)
ordered_shop = collections.OrderedDict()
for item in items:
    ordered_shop[item] = shop[item]
print(ordered_shop)
Example output:
{'chips': 4, 'tuna': 4, 'bread': 2, 'bacon': 8, 'tacos': 5}
['bacon', 'chips', 'bread', 'tuna', 'tacos']
OrderedDict([('bacon', 8), ('chips', 4), ('bread', 2), ('tuna', 4), ('tacos', 5)])
You could also do it like this (as pointed out by @ShadowRanger):
items = list(shop.items())
random.shuffle(items)
oshop = collections.OrderedDict(items)
This works because the OrderedDict constructor takes a list of key-value tuples. On reflection, this is probably what you were after with your initial approach - swap keys() for items().
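To see why the commented-out attempt fails, here is a minimal sketch (the names failed and shuffled_shop are mine, not from the question). Each key is a string, and OrderedDict tries to unpack every element of the list into a (key, value) pair, so a five-character string like 'bread' has too many values:

```python
import random
from collections import OrderedDict

shop = {'bread': 2, 'chips': 4, 'tacos': 5, 'tuna': 4, 'bacon': 8}

# Passing only the keys: every element must unpack into exactly two values,
# and a string such as 'bread' unpacks into five characters instead.
try:
    OrderedDict(list(shop.keys()))
    failed = False
except ValueError:
    failed = True

# Passing (key, value) tuples works, and shuffling them first randomizes the order.
pairs = list(shop.items())
random.shuffle(pairs)
shuffled_shop = OrderedDict(pairs)

print(failed)
print(shuffled_shop)
```

The shuffled result still holds the same keys and prices as shop; only the iteration order changes.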

d = collections.OrderedDict.fromkeys(items)
Then use the newly created dict d as you wish. Note that fromkeys sets every value to None, so the prices still have to be copied over from shop.
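If the end goal is just "a few random items with randomly scaled prices", shuffling the whole dictionary may not be needed at all. A rough sketch of that idea follows; random_offers and its rng parameter are hypothetical names of mine, and I am assuming that "a few" means at least one item and at most all of them:

```python
import random

shop = {'bread': 2, 'chips': 4, 'tacos': 5, 'tuna': 4, 'bacon': 8}

def random_offers(prices, rng=random):
    """Pick a random number of distinct items and scale each price by 0.7 to 2.3."""
    count = rng.randint(1, len(prices))       # assumption: between 1 and all items
    chosen = rng.sample(list(prices), count)  # distinct keys, already in random order
    return {item: int(prices[item] * rng.uniform(0.7, 2.3)) for item in chosen}

offers = random_offers(shop)
for item, price in offers.items():
    print(item, "=", price)
```

random.sample returns the keys in random order without repeats, so no separate shuffle step is required.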

Related

Order dictionary by key with numerical representation

I have this input, where each value has a range of 200:
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}
And I am looking for this expected order:
{'400-600': 1, '600-800': 3, '1000-1200': 5, '1800-2000': 3, '2600-2800': 1}
Already tried something like this, but the order is just wrong:
import collections
od = collections.OrderedDict(sorted(d.items()))
print(od)
You can split the key into parts at '-' and use the first part as integer value to sort it. The second part is irrelevant for ordering because of the nature of your key-values (when converted to integer):
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}
import collections
od = collections.OrderedDict(sorted(d.items(), key=lambda x: int(x[0].split("-")[0])))
print(od)
Output:
OrderedDict([('400-600', 1), ('600-800', 3), ('1000-1200', 5),
('1800-2000', 3), ('2600-2800', 1)])
Documentation:
sorted(iterable, key=None, reverse=False)
Related:
How to sort a list of objects based on an attribute of the objects? for more "sort by key" examples
Are dictionaries ordered in Python 3.6+? This one explains why you can omit the OrderedDict from Python 3.7 on (or on CPython 3.6).
If you want to order your dictionary by the first year first (and then by the second year if needed, which is unnecessary in the given example, but feels more natural), you need to convert to integers and set a custom key:
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}
sorted(d.items(), key=lambda t: tuple(map(int, t[0].split("-"))))
# [('400-600', 1),
# ('600-800', 3),
# ('1000-1200', 5),
# ('1800-2000', 3),
# ('2600-2800', 1)]
The conversion to integers is needed because e.g. "1000" < "200", but 1000 > 200. This list can be passed to OrderedDict afterwards like in your code, if needed.
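As a small sketch of both points (the string-comparison pitfall and the integer key), assuming Python 3.7+ where a plain dict keeps insertion order; range_bounds is a hypothetical helper name:

```python
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}

def range_bounds(key):
    """Turn '600-800' into (600, 800) so keys compare numerically."""
    return tuple(int(part) for part in key.split('-'))

# Plain string sorting compares character by character, so '1000-1200' comes first.
string_order = sorted(d)

# Sorting on the integer bounds gives the expected order.
ordered = dict(sorted(d.items(), key=lambda pair: range_bounds(pair[0])))

print(string_order)
print(ordered)
```

On older Python versions, pass the sorted list to collections.OrderedDict instead of dict.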

Need take a value list from a dictionary, and subtract that value from a total number of values

I realize that title may be confusing, so allow me to explain.
I take input from a list that looks like L = [21.123, 22.123, 23.123, 21.123]
I remove the decimals, and sort the list high to low. I also change it to a dictionary with occurrences, which looks like
newlist = {23: 1, 22: 1, 21: 2}
What I need to do is make a list of keys and a list of values, which I can do. This gives me two lists, [23, 22, 21] and [1, 1, 2], one for the values and one for the occurrences. I need to turn each entry of my occurrence list into the number of occurrences that are the same as, or lower than, its corresponding key.
I would like my list to look like [23, 22, 21] (which is easy to do) and [4, 3, 2] because 4 of the times are 23 seconds or less, 3 of the times are 22 seconds or less, and 2 of the times are 21 seconds or less.
I'm pretty sure I need a for loop to iterate through every frequency value, and change that value to be the total number of times entered into the list, and subtract any value more than it. I'm not sure how to go about this, so any help would be greatly appreciated.
You want a dictionary where, for each item in your data, the key is the rounded value (int(item)) and the value is the number of items that are smaller than or equal to this rounded value.
A dictionary comprehension (combined with a list comprehension) can do this:
data = [21.123, 22.123, 23.123, 21.123]
aggregate = {
item: len([n for n in data if int(n) <= item])
for item in set(map(int, data))
}
print(aggregate) # -> {21: 2, 22: 3, 23: 4}
which is the single-statement form of writing such a loop:
aggregate = {}
for item in set(map(int, data)):
    aggregate[item] = len([n for n in data if int(n) <= item])
Using set() makes the list unique. This way the loop only runs as often as necessary.
Here's a functional solution. The marginally tricky part is the backwards cumulative sum, which is possible by feeding a reversed tuple to itertools.accumulate and then reversing the result.
from collections import Counter
from itertools import accumulate
from operator import itemgetter
L = [21.123, 22.123, 23.123, 21.123]
c = Counter(map(int, L)) # Counter({21: 2, 22: 1, 23: 1})
counter = sorted(c.items(), reverse=True) # [(23, 1), (22, 1), (21, 2)]
keys, counts = zip(*counter) # ((23, 22, 21), (1, 1, 2))
cumsum = list(accumulate(counts[::-1]))[::-1] # [4, 3, 2]
Your desired result is stored in keys and cumsum:
print(keys)
(23, 22, 21)
print(cumsum)
[4, 3, 2]
Assuming you get the counts correctly from [21.123, 22.123, 23.123, 21.123], a simple nested loop with a running sum can do the rest:
from collections import Counter
newlist = {23: 1, 22: 1, 21: 2}
counts = Counter()
for k in newlist:
    for v in newlist:
        if v <= k:
            counts[k] += newlist[v]
print(counts)
# Counter({23: 4, 22: 3, 21: 2})
You could also use itertools.product() to condense the double loops into one:
from itertools import product
from collections import Counter
newlist = {23: 1, 22: 1, 21: 2}
counts = Counter()
for k, v in product(newlist, repeat=2):
    if v <= k:
        counts[k] += newlist[v]
print(counts)
# Counter({23: 4, 22: 3, 21: 2})
The above stores the counts in a collections.Counter(); you can get [4, 3, 2] by calling list(counts.values()).
I found my own solution, which seems relatively simple. The code looks like this:
counter = 0
print(valuelist)
for i in valuelist:
    print(int(solves - counter))
    counter = counter + i
    redonevalues.append(solves - counter + 1)
It takes my values, goes to the first one, adds the occurrences to counter, subtracts counter from solves, and adds 1 to even it out
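For comparison, here is a self-contained sketch of the same running-total idea. It is my own variant, not the poster's exact code: at_or_below_counts and remaining are hypothetical names, and it records the total before subtracting each key's occurrences instead of adding 1 afterwards:

```python
from collections import Counter

def at_or_below_counts(times):
    """Return (keys, totals): keys sorted high to low, and for each key
    the number of entries whose integer part is at or below that key."""
    occurrences = Counter(int(t) for t in times)
    keys = sorted(occurrences, reverse=True)
    remaining = len(times)             # everything is <= the largest key
    totals = []
    for key in keys:
        totals.append(remaining)
        remaining -= occurrences[key]  # drop this key's entries for the next, smaller key
    return keys, totals

keys, totals = at_or_below_counts([21.123, 22.123, 23.123, 21.123])
print(keys)    # [23, 22, 21]
print(totals)  # [4, 3, 2]
```

This makes a single pass over the sorted keys, so it avoids the nested loop.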

Get index range of the repetitive elements in the list

Suppose I have a list a = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1] in Python. Is there any built-in function to which we pass a list and which returns the index ranges at which each element is present? For example:
>>> index_range(a)
{-1 :'0-2,9-11', 1:'3-5,12-14', 2:'6-8'}
I have tried to use the Counter class from the collections module, but it only outputs the count of each element.
If there is no built-in function, can you please guide me on how I can achieve this in my own function? Not the whole code, just a guideline.
You can create your custom function using itertools.groupby and collections.defaultdict to get the range of numbers in the form of list as:
from itertools import groupby
from collections import defaultdict
def index_range(my_list):
    my_dict = defaultdict(list)
    for i, j in groupby(enumerate(my_list), key=lambda x: x[1]):
        index_range, numlist = list(zip(*j))
        my_dict[numlist[0]].append((index_range[0], index_range[-1]))
    return my_dict
Sample Run:
>>> index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
{1: [(3, 5), (12, 14)], 2: [(6, 8)], -1: [(0, 2), (9, 11)]}
In order to get the values as string in your dict, you may either modify the above function, or use the return value of the function in dictionary comprehension as:
>>> result_dict = index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
>>> {k: ','.join('{}:{}'.format(*i) for i in v)for k, v in result_dict.items()}
{1: '3:5,12:14', 2: '6:8', -1: '0:2,9:11'}
You can use a dict that uses list items as keys and their indexes as values:
>>> lst = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1]
>>> indexes = {}
>>> for index, item in enumerate(lst):
...     indexes.setdefault(item, []).append(index)
>>> indexes
{1: [3, 4, 5, 12, 13, 14], 2: [6, 7, 8], -1: [0, 1, 2, 9, 10, 11]}
You could then merge the index lists into ranges if that's what you need. I can help you with that too if necessary.
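The merging step could look roughly like this. It is only a sketch: merge_into_ranges and index_ranges are hypothetical names, and the 'start-end' string format is taken from the example in the question:

```python
def merge_into_ranges(indexes):
    """Collapse a sorted list of indexes into (start, end) runs of consecutive values."""
    ranges = []
    for index in indexes:
        if ranges and index == ranges[-1][1] + 1:
            ranges[-1][1] = index           # extend the current run
        else:
            ranges.append([index, index])   # start a new run
    return [tuple(r) for r in ranges]

def index_ranges(values):
    """Map each distinct item to a string such as '0-2,9-11'."""
    positions = {}
    for index, item in enumerate(values):
        positions.setdefault(item, []).append(index)
    return {
        item: ','.join('{}-{}'.format(start, end)
                       for start, end in merge_into_ranges(indexes))
        for item, indexes in positions.items()
    }

a = [-1, -1, -1, 1, 1, 1, 2, 2, 2, -1, -1, -1, 1, 1, 1]
print(index_ranges(a))
```

Because enumerate yields indexes in increasing order, each index list is already sorted, which is what merge_into_ranges relies on.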

Fast removal of consecutive duplicates in a list and corresponding items from another list

My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicate as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list1)-1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i+1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow, since deleting an element from a list is a slow operation in itself. Is there any way I can speed up this process? Please note that, as shown in the above code snippet, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has groupby in the standard library (itertools) for exactly this:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it using the keyfunc argument, to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again:
>>> zip(*_) # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
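Note that in Python 3, zip returns an iterator rather than a list, and the interactive `_` shortcut is not available in a script. A sketch of the whole thing as a Python 3 script, with pairs, dedup1 and dedup2 as names I made up:

```python
from itertools import groupby
from operator import itemgetter

list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]
list2 = [9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 5]

# Group consecutive pairs by the list1 value and keep the first pair of each run.
pairs = [next(group) for _, group in groupby(zip(list1, list2), key=itemgetter(0))]

# "Unzip" the pairs back into two lists (this assumes the input is not empty).
dedup1, dedup2 = (list(column) for column in zip(*pairs))

print(dedup1)  # [1, 2, 3, 4, 5, 1, 2]
print(dedup2)  # [9, 7, 7, 7, 6, 6, 5]
```

This builds new lists in one pass instead of deleting from the originals, which is what makes it fast compared with repeated del.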
You can use collections.deque and its maxlen argument to set a window size of 2. Then just compare the two entries in the window, and append to the results if they differ.
def remove_adj_dups(x):
    """
    :parameter x is something like '1, 1, 2, 3, 3'
    from an iterable such as a string or list or a generator
    :return 1,2,3, as list
    """
    result = []
    from collections import deque
    d = deque([object()], maxlen=2) # 1st entry is object() which only matches with itself. Kudos to Trey Hunner -->object()
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
I generated a random list with duplicates of 2 million numbers in the range 0 to 9.
def random_nums_with_dups(number_range=None, range_len=None):
    """
    :parameter
    :param number_range: use the numbers between 0 and number_range. The smaller this is then the more dups
    :param range_len: max len of the results list used in the generator
    :return: a generator
    Note: If number_range = 2, then random binary is returned
    """
    import random
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000
def mytest():
    for i in [1]:
        return [remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))]
big_result = mytest()[0]
print(len(big_result))
The length was 1800197 (that is, with the duplicates removed), in under 5 seconds, which includes the random list generator spinning up.
I lack the experience and know-how to say whether it is memory efficient as well. Could someone comment, please?
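On the memory question: as far as I can tell, the deque only ever holds two items, but the function still builds the full result list in memory. If the result does not have to be a list, a generator avoids that. This is an unbenchmarked sketch, and iter_without_adjacent_dups is a hypothetical name:

```python
def iter_without_adjacent_dups(iterable):
    """Yield each item that differs from the item directly before it."""
    previous = object()  # sentinel that is not equal to any real item
    for item in iterable:
        if item != previous:
            yield item
        previous = item

# Items are produced one at a time; nothing is stored unless the caller stores it.
for value in iter_without_adjacent_dups([1, 1, 1, 2, 3, 4, 4, 5, 1, 2]):
    print(value)
```

Wrapping the call in list() gives the same result as the list-building version when a list is needed.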

Counting occurrences in a loop

gzip_files=["complete-credit-ctrl-txn-SE06_2013-07-17-00.log.gz","complete-credit-ctrl-txn-SE06_2013-07-17-01.log.gz"]
def input_func():
    num = input("Enter the number of MIN series digits: ")
    return num
for i in gzip_files:
    import gzip
    f=gzip.open(i,'rb')
    file_content=f.read()
    digit = input_func()
    file_content = file_content.split('[')
    series = [] #list of MIN
    for line in file_content:
        MIN = line.split('|')[13:15]
        for x in MIN:
            n = digit
            x = x[:n]
            series.append(x)
            break
    #count the number of occurrences in the list named series
    for i in series:
        print i
    #end count
Result:
63928
63928
63929
63929
63928
63928
That is only part of the result; the actual result is a really long list. Now I want to list just the unique numbers and how many times each one appears in the list.
So
63928 = 4,
63929 = 2
I would use a collections.Counter class here.
>>> a = [1, 1, 1, 2, 3, 4, 4, 5]
>>> from collections import Counter
>>> Counter(a)
Counter({1: 3, 4: 2, 2: 1, 3: 1, 5: 1})
Just pass your series variable to Counter and you'll get a dictionary where the keys are the unique elements and the values are the number of times each occurs in the list.
collections.Counter was introduced in Python 2.7. Use the following list comprehension for versions below 2.7
>>> [(elem, a.count(elem)) for elem in set(a)]
[(1, 3), (2, 1), (3, 1), (4, 2), (5, 1)]
You can then just convert this into a dictionary for easy access.
>>> dict((elem, a.count(elem)) for elem in set(a))
{1: 3, 2: 1, 3: 1, 4: 2, 5: 1}
You can use a Counter() for this.
So this will print what you need:
from collections import Counter
c = Counter(series)
for item, count in c.items():
    print("%s = %s" % (item, count))
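As a self-contained Python 3 sketch using the sample values from the question (the series list here is just those six lines typed in by hand), most_common() also sorts the output from the most to the least frequent value:

```python
from collections import Counter

# Sample data copied from the partial result shown in the question.
series = ['63928', '63928', '63929', '63929', '63928', '63928']

counts = Counter(series)

# most_common() yields (item, count) pairs, highest count first.
lines = ['{} = {}'.format(item, count) for item, count in counts.most_common()]
print('\n'.join(lines))
# 63928 = 4
# 63929 = 2
```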
Compile a dictionary using unique numbers as keys, and their total occurrences as values:
d = {} #instantiate dictionary
for s in series:
    # set default key and value if key does not exist in dictionary
    d.setdefault(s, 0)
    # increment by 1 for every occurrence of s
    d[s] += 1
If this problem were any more complex, an implementation of map-reduce (also known as map-fold) might be appropriate.
Map Reduce:
https://en.wikipedia.org/wiki/MapReduce
Python map function:
http://docs.python.org/2/library/functions.html#map
Python reduce function:
http://docs.python.org/2/library/functions.html#reduce
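For what it is worth, the same count can be written as a fold. This is only an illustration of the reduce idea on the sample values, not a recommendation over Counter; add_occurrence is a hypothetical helper, and in Python 3 reduce lives in functools:

```python
from functools import reduce

def add_occurrence(counts, item):
    """Fold step: bump the count for item and hand the dict to the next step."""
    counts[item] = counts.get(item, 0) + 1
    return counts

series = ['63928', '63928', '63929', '63929', '63928', '63928']

# Start from an empty dict and fold every element of series into it.
counts = reduce(add_occurrence, series, {})
print(counts)  # {'63928': 4, '63929': 2}
```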
