python intersection initial state standard practice

Example:
complete_dict = dict()
common_set = set()
for count, group_name in enumerate(groups_dict):
    complete_dict[group_name] = dict()  # grabs a json file = group_name.json
    if count > 0:
        common_set.intersection_update(complete_dict[group_name])
    else:
        # this accounts for the fact common_set is EMPTY to begin with
        common_set.update(complete_dict[group_name])
WHERE: dict2 contains k:v of members(of the group): int(x)
Is there a more appropriate way to handle the initial state of the intersection?
I.e. We cannot intersect the first complete_dict[group_name] with common_set because it is empty and the result would therefore also be empty.

One way to avoid the special case would be to initialize the set with the contents of the item. Since a value intersected with itself is the value itself, you won't lose anything this way.
Here's an attempt to show how this could work. I'm not certain that I've understood what your different dict(...) calls are supposed to represent, so this may not translate perfectly into your code, but it should get you on the right path.
results = {}
it = iter(dict(...))  # this is the dict(...) from the for statement
first_key = next(it)
results[first_key] = dict(...)  # this is the dict(...) from inside the loop
common = set(results[first_key])  # initialize the set with the first item
for key in it:
    results[key] = dict(...)  # the inner dict(...) again
    common.intersection_update(results[key])
As jamylak commented, the calls you were making to the keys method of various dictionaries were always unnecessary (a dictionary acts pretty much like a set of keys if you don't do any indexing or use mapping-specific methods). I've also chosen variable names that are more in keeping with Python's style (lowercase for regular variables, with CAPITALS reserved for constants).
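A runnable sketch of the same idea, for illustration only. The `load_group` helper and the `SAMPLE` data are hypothetical stand-ins for whatever reads `group_name.json`; they are not from the question.

```python
# Hypothetical stand-in for reading group_name.json; the data is invented.
SAMPLE = {
    "admins": {"alice": 1, "bob": 2, "carol": 3},
    "editors": {"bob": 5, "carol": 6, "dave": 7},
    "viewers": {"carol": 8, "bob": 9},
}

def load_group(group_name):
    return dict(SAMPLE[group_name])

def common_members(group_names):
    results = {}
    it = iter(group_names)
    try:
        first = next(it)
    except StopIteration:
        return results, set()  # no groups at all: nothing in common
    results[first] = load_group(first)
    common = set(results[first])  # seed with the first group's keys
    for name in it:
        results[name] = load_group(name)
        common.intersection_update(results[name])
    return results, common

results, common = common_members(SAMPLE)
print(sorted(common))  # ['bob', 'carol']
```

Seeding from the first group keeps the loop body free of any "is this the first iteration?" test, at the cost of handling the empty case once up front.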

Following Blcknght's answer but with less complexity and repetition:
common = None
uncommon = {}
for key in outer_dict:
    inner = uncommon[key] = inner_dict(...)
    if common is None:
        common = set(inner)
    else:
        common.intersection_update(inner)
As Blcknght said, it is hard to know whether this captures the intent of the original because the variable names are neither descriptive nor distinct.

Logically, when we want the intersection of several sets, we start from the smallest set (the one with the fewest elements) and intersect the others into it, right?
uncommon_result = dict(...)  # a method that returns a dict()
# copy the smallest set so the original is not modified in place
common_set = set(min(uncommon_result.values(), key=len))
for a_set in uncommon_result.values():
    common_set.intersection_update(a_set)
I know the line that picks the smallest set may look like a poor way to initialize common_set, because it makes an extra pass over all the sets. In practice, though, we almost always want to know which set is the smallest anyway, so that work isn't wasted.
EDIT: If uncommon_result is a dict of dicts, you may need another for loop over its keys; with some minor changes to the loop above, you'll be good again.

Related

Python: Obtaining index of an element within a value list of a dictionary

I have a dictionary with key:value list pairings, and I intend to find the index of the value list that contains the desired element.
E.g., if the dictionary is:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
Then, given element 'v2' I should be able to get index 2.
For a value list with one element, the index can be obtained with: list(my_dict.values()).index(['v1']) , however this approach does not work with lists containing multiple elements.
Using for loop, it can be obtained via:
for key, value in my_dict.items():
    if value is None:
        continue
    if 'v2' in value:
        print(list(my_dict.keys()).index(key))
Is there a neater (pythonic) way to obtain the same?

You've got an XY problem. You want to know the key that points to a value, and you think you need to find the enumeration index iterating the values so you can then use it to find the key by iteration as well. You don't need all that. Just find the key directly:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
value = 'v2'
# Iterate key/vals pairs in genexpr; if the vals contains value, yield the key,
# next stops immediately for the first key yielded so you don't iterate the whole dict
# when the value is found on an early key
key_for_value = next(key for key, vals in my_dict.items() if vals and value in vals)
print(key_for_value)
That'll raise StopIteration if the value doesn't exist, otherwise it directly retrieves the first key where the values list for that key contains the desired value.
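If a missing value should not raise, `next` accepts a default as its second argument. A small sketch reusing the example data; the `key_for` name and the choice of `None` for "not found" are assumptions about what the caller wants.

```python
my_dict = {"key1": ['v1'], "key2": None, "key3": ['v2', 'v3'], "key4": ['v4', 'v5', 'v6']}

def key_for(value, mapping):
    # The second argument to next() is returned instead of raising StopIteration.
    return next((k for k, vals in mapping.items() if vals and value in vals), None)

print(key_for('v2', my_dict))   # key3
print(key_for('zzz', my_dict))  # None
```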
If you don't really have an XY problem, and the index is important (it shouldn't be, that's a misuse of dicts) it's trivial to produce it as well, changing the extraction of the key to get both, e.g.:
index, key_for_value = next((i, key) for i, (key, vals) in enumerate(my_dict.items()) if vals and value in vals)
Mind you, this is a terrible solution if you need to perform these lookups a lot and my_dict isn't trivially small; it's O(n) on the total number of values, so a large dict would take quite a while to check (relative to the cost of just looking up an arbitrary key, which is average-case O(1)). In that case, ideally, if my_dict doesn't change much/at all, you'd construct a reversed dictionary up-front to find the key(s) associated with a value, e.g.:
from collections import defaultdict
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
reversed_my_dict = defaultdict(set)
for key, vals in my_dict.items():
    for val in vals or ():  # "key2" maps to None, so treat None as empty
        reversed_my_dict[val].add(key)
reversed_my_dict = dict(reversed_my_dict)  # Optional: Prevents future autovivification of keys
                                           # by converting back to plain dict
after which you can cheaply determine the key(s) associated with a given value with:
reversed_my_dict.get(value, ()) # Using .get prevents autovivification of keys even if still a defaultdict
which returns the set of all keys that map to that value, if any, or the empty tuple if not (if you convert back to dict above, reversed_my_dict[value] would also work if you'd prefer to get a KeyError when the value is missing entirely; leaving it a defaultdict(set) would silently construct a new empty set, map it to the key and return it, which is fine if this happens rarely, but a problem if you test thousands of unmapped values and create a corresponding thousands of empty sets for no benefit, consuming memory wastefully).
Which you choose depends on how big my_dict is (for small my_dict, O(n) work doesn't really matter that much), how many times you need to search it (fewer searches mean less gain from reversed dict), and whether it's regularly modified. For that last point, if it's never modified, or rarely modified between lookups, rebuilding the reversed dict from scratch after each modification might be worth it for simplicity (assuming you perform many lookups per rebuild); if it's frequently modified, the reversed dict might still be worth it, you'd just have to update both the forward and reversed dicts rather than just one, e.g., expanding:
# New key
my_dict[newkey] = [newval1, newval2]
# Add value
my_dict[existingkey].append(newval)
# Delete value
my_dict[existingkey].remove(badval)
# Delete key
del my_dict[existingkey]
to:
# New key
newvals = my_dict[newkey] = [newval1, newval2]
for newval in newvals:
    reversed_my_dict[newval].add(newkey)  # reversed_my_dict.setdefault(newval, set()).add(newkey) if not defaultdict(set) anymore
# Add value
my_dict[existingkey].append(newval)
reversed_my_dict[newval].add(existingkey)  # reversed_my_dict.setdefault(newval, set()).add(existingkey) if not defaultdict(set) anymore
# Delete value
my_dict[existingkey].remove(badval)
if badval not in my_dict[existingkey]:  # Removed last copy; test only needed if one key can hold same value more than once
    reversed_my_dict[badval].discard(existingkey)
    # Optional delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
# Delete key
# set() conversion not needed if my_dict's value lists guaranteed not to contain duplicates
for badval in set(my_dict.pop(existingkey)):
    reversed_my_dict[badval].discard(existingkey)
    # Optional delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
respectively, roughly doubling the work incurred by modifications, in exchange for always getting O(1) lookups in either direction.

If you are looking for the key corresponding to a value, you can reverse the dictionary like so:
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}
Careful with duplicate values, though: the last occurrence will override the previous ones.
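To see the overwrite, and one way around it, here is a small sketch with made-up data in which 'v2' appears under two keys. Collecting the keys into sets is an alternative, not part of the one-liner above.

```python
my_dict = {"key1": ['v1', 'v2'], "key2": None, "key3": ['v2', 'v3']}

# One key per value: later keys silently replace earlier ones.
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}
print(reverse_dict['v2'])  # key3

# Keep every key per value instead.
reverse_multi = {}
for k, v in my_dict.items():
    for e in v or ():
        reverse_multi.setdefault(e, set()).add(k)
print(sorted(reverse_multi['v2']))  # ['key1', 'key3']
```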

Don't know if it's the best solution but this works:
value = 'v2'
list(map(lambda x : value in x, list(map(lambda x : x[1] or [], list(my_dict.items()))))).index(True)

How to implement dicts / sets opposed to a list search, to increase speed

I am making a program that has to search through very long lists, and I have seen people suggesting that using sets and dicts speeds it up massively. However, I am at a loss as to how to make it work within my code. Currently, the program does this:
indexes = []
print("Collecting indexes...")
for term in sliced_5:
    indexes.append(hex_crypted.index(term))
The code searches through the hex_crypted list, which contains 1,000,000+ terms, finds the index of each term, and appends it to the 'indexes' list.
I simply need to speed this process up. Thanks for any help.

You want to build a lookup table so you don't need to repeatedly loop over hex_crypted. Then you can simply look up each term in the table.
print("Collecting indexes...")
lookup = {term: index for (index, term) in enumerate(hex_crypted)}
indexes = [lookup[term] for term in sliced_5]
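One difference from the original loop is worth checking: `list.index` returns the first position of a term, while the dict comprehension keeps the last position when a term repeats. A sketch with made-up data, assuming the first occurrence is the one wanted:

```python
hex_crypted = ["a1", "b2", "a1", "c3"]  # made-up sample with a duplicate
sliced_5 = ["a1", "c3"]

# Later duplicates overwrite earlier ones, so "a1" maps to index 2.
last_seen = {term: index for index, term in enumerate(hex_crypted)}

# setdefault only stores a term the first time, matching list.index().
first_seen = {}
for index, term in enumerate(hex_crypted):
    first_seen.setdefault(term, index)

indexes = [first_seen[term] for term in sliced_5]
print(indexes)  # [0, 3]
```

If the list is known to contain no duplicates, the two tables are identical and the comprehension is the simpler choice.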

If you only need to test whether a term is present, the fastest approach is to convert the list to a set with set(), but I don't think that is what you want in this case.
hex_crypted_set = set(hex_crypted)
If you need to keep that index for some reason, you'll want to instead build a dictionary first.
hex_crypted_dict = {}
for index, term in enumerate(hex_crypted):
    hex_crypted_dict[term] = index
And then to get that index you just search the dict:
indexes = []
for term in sliced_5:
    indexes.append(hex_crypted_dict[term])
You will end up with the appropriate indexes which correspond to the original long list and only iterate that long list one time, which will be a lot better performance than iterating it for every time you do a lookup.

The first step is to generate a dict, for example:
hex_crypted_dict = {v: i for i, v in enumerate(hex_crypted)}
Then your code changes to:
indexes = []
hex_crypted_dict = {v: i for i, v in enumerate(hex_crypted)}
print("Collecting indexes...")
for term in sliced_5:
    indexes.append(hex_crypted_dict[term])

the code is giving me a number from the list instead of the mode

In one of my assignments I need to find the mode of a list called "dataset" without using any module or function that would find the mode by itself.
I tried to make it output the mode, or the list of modes, depending on the list of numbers. I used two for loops so that each number in the list is compared against every number in the list, including itself, to count how many times it occurs. For example, if my list were [1, 2, 3, 4, 1, 5], it would find two ones, and it does this for every number in the list. The number with the highest count is the mode. The bottom section of the code, with the if, elif, and else, checks whether the number has the highest count so far, by testing whether its count is greater than or equal to the previous highest count.
I've tried changing the order of the code, but I'm still confused about why it produces this error.
pop_number = []
pop_amount = 0
amount = 0
for i in range(len(dataset)):
    for x in dataset:
        if dataset[i] == x:
            amount += 1
    if amount>pop_amount:
        pop_amount = amount
        pop_number = []
        pop_number.append(x)
        amount = 0
    elif amount==pop_amount:
        pop_amount = amount
        if x not in pop_number:
            pop_number.append(x)
        pop_amount = amount
        amount = 0
    else:
        continue
print(pop_number)
I expected the output to be the mode of the list, or the list of modes, but it came out as the last number in the list.

As this is apparently homework, I will present a sketch, not working code.
Observe that a dict in Python can hold key-value mappings.
Let the numbers in the input list be the keys, and the values the number of times they occur. Going over the list, use each item as the key for the dict, and add one to the value (starting at 0 -- defaultdict(int) is good for this). If the result is bigger than any previous maximum, remember this key.
Since you want to allow for more than one mode value, the variable which remembers the maximum key should be a list; but since you have a new maximum, replace the old list with a list containing just this key. If another value also reaches the maximum, add it to the list. (That's the append method.)
(Notice the shape: if the count is bigger than the maximum so far, do one thing; else if it is equal to the maximum so far, do another; otherwise there is no need to do anything.)
When you have looped over all items in the input list, the list of remembered keys is your result.
Go back and think about what variables you need already before the loop. The maximum so far should be defined but guaranteed to be smaller than any value you will see -- it makes sense to start this at 0 because as soon as you see one key, it will have a bigger count than zero. And the keys you want to remember can start out as an empty list.
Now think about how you would test this. What happens if the input list is empty? What happens if the input list contains just the same number over and over? What happens if every item on the input list is unique? Can you think of other corner cases?
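For readers who have already attempted it themselves, one possible rendering of the sketch above follows. The function name `find_modes` is made up for illustration, and a plain dict with `.get` is used in place of `defaultdict(int)`.

```python
def find_modes(dataset):
    counts = {}   # number -> occurrences seen so far
    best = 0      # highest count seen so far
    modes = []    # numbers that currently have that count
    for item in dataset:
        counts[item] = counts.get(item, 0) + 1
        if counts[item] > best:
            best = counts[item]
            modes = [item]       # new maximum replaces the old list
        elif counts[item] == best:
            modes.append(item)   # ties join the list
    return modes

print(find_modes([1, 2, 3, 4, 1, 5]))  # [1]
print(find_modes([1, 2, 2, 1]))        # [2, 1]
print(find_modes([]))                  # []
```

The three calls correspond to the corner cases asked about: a single mode, a tie, and an empty input.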

Without using any module or function that specifically finds the mode, you can do this with much less code. Your code can be made to work with a little more effort, and I strongly suggest you try to solve the problem with your own logic. Meanwhile, let me show you how to use Python's built-in data structures (lists, tuples, dictionaries, and sets) to do it in 7-8 lines. There is also unpacking at the end (*). I suggest you look these up when you have time.
lst = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,2,2,2,5,5,6]
# finds the unique elements
unique_elems = set(lst)
# creates a dictionary with the unique elems as keys and initializes the values to 0
count = dict.fromkeys(unique_elems,0)
# gets the frequency of each element in the lst
for elem in unique_elems:
    count[elem] = lst.count(elem)
# finds max frequency
max_freq = max(count.values())
# stores list of mode(s)
modes = [i for i in count if count[i] == max_freq]
# prints mode(s), I have used unpacking here so that in case there is one mode,
# you don't have to print ugly [x]
print(*modes)
Or, if you want to go for the shortest (I really shouldn't be making such bold claims on Stack Overflow), then I guess this is it (even though writing short code for its own sake is discouraged):
lst = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,2,2,2,5,5,6]
freq_dist = [(i, lst.count(i)) for i in set(lst)]
[print(i,end=' ') for i,j in freq_dist if j==max(freq_dist, key=lambda x:x[1])[1]]
And if you just want to go bonkers and say goodbye to loops (Goes without saying, this is ugly, really ugly):
lst = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,2,2,2,5,5,6]
unique_elems = set(lst)
freq_dist = list(map(lambda x:(x, lst.count(x)), unique_elems))
print(*list(map(lambda x:x[0] if x[1] == max(freq_dist,key = lambda y: y[1])[1] else '', freq_dist)))

can this code be written in two lines?

I have a feeling that, using the setdefault method or a lambda, this code can be written in two lines:
variables = ['a','b','c','d']
for value in indefinite_dict.values():
    str1 = immediate_instantiation.get(value)
    if str1 is None:
        immediate_instantiation.update({value:variables[0]})
        del variables[0]
It loops through the values of indefinite_dict. If a value is not already a key in the immediate_instantiation dict, it adds that value as a key whose value is the first member of the variables list, then deletes that first member from the variables list.

If you’re okay with values in variables being deleted even if a corresponding key already exists in immediate_instantiation when that key has the same value, you’re right that you can do it with only setdefault:
for value in indefinite_dict.values():
    if immediate_instantiation.setdefault(value, variables[0]) is variables[0]:
        del variables[0]
To get it down to two lines without any other context takes a different (and kind of unpleasant) approach, though:
updates = (v for v in indefinite_dict.values() if v not in immediate_instantiation)
immediate_instantiation.update({v: variables.pop(0) for v in updates})
And indefinite_dict had better be an OrderedDict; otherwise you’re removing variables in a potentially random order! (Don’t rely on Python 3.6’s dict representation for this; insertion order only became a language guarantee in Python 3.7.)
If you don’t need variables after this and variables is guaranteed to be at least as long as updates, a non-mutating solution is much cleaner:
updates = (v for v in indefinite_dict.values() if v not in immediate_instantiation)
immediate_instantiation.update(zip(updates, variables))
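A runnable sketch of the zip variant with invented data. The `dict.fromkeys` step is an addition, not part of the answer: it removes repeated values up front while keeping first-seen order, so the result does not depend on the generator being consumed lazily.

```python
indefinite_dict = {"x": "cat", "y": "dog", "z": "cat", "w": "bird"}
immediate_instantiation = {"dog": "q"}
variables = ['a', 'b', 'c', 'd']

# dict.fromkeys drops repeated values while keeping first-seen order.
updates = [v for v in dict.fromkeys(indefinite_dict.values())
           if v not in immediate_instantiation]
immediate_instantiation.update(zip(updates, variables))

print(immediate_instantiation)  # {'dog': 'q', 'cat': 'a', 'bird': 'b'}
print(variables)                # ['a', 'b', 'c', 'd'] (left untouched)
```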

One line solution:
immediate_instantiation.update({value: variables.pop(0) for value in indefinite_dict.values() if value not in immediate_instantiation})

Optimised Python dictionary / negative index storage

Raised by this question's comments (I can see that this is irrelevant), I am now aware that using dictionaries for data that needs to be queried/accessed regularly is not good, speedwise.
I have a situation of something like this:
someDict = {}
someDict[(-2, -2)] = something
someDict[(3, -10)] = something_else
I am storing keys of coordinates to objects that act as arrays of tiles in a game. These are going to be negative at some point, so I can't use a list or some kind of sparse array (I think that's the term?).
Can I either:
Speed up dictionary lookups, so this would not be an issue
Find some kind of container that will support sparse, negative indices?
I would use a list, but then the querying would go from O(log n) to O(n) to find the area at (x, y). (I think my timings are off here too).

Python dictionaries are very fast, and using a tuple of integers as the key is not going to be a problem. However, your use case seems to sometimes need a single-coordinate check, and doing that by traversing the whole dict is of course slow.
Instead of doing a linear search, you can speed up the data structure for the access you need by using three dictionaries:
class Grid(object):
    def __init__(self):
        self.data = {}  # (i, j) -> data
        self.cols = {}  # i -> set of j
        self.rows = {}  # j -> set of i
    def __getitem__(self, ij):
        return self.data[ij]
    def __setitem__(self, ij, value):
        i, j = ij
        self.data[ij] = value
        try:
            self.cols[i].add(j)
        except KeyError:
            self.cols[i] = set([j])
        try:
            self.rows[j].add(i)
        except KeyError:
            self.rows[j] = set([i])
    def getRow(self, i):
        # all stored cells whose first coordinate is i
        return [(i, j, self.data[(i, j)])
                for j in self.cols.get(i, [])]
    def getCol(self, j):
        # all stored cells whose second coordinate is j
        return [(i, j, self.data[(i, j)])
                for i in self.rows.get(j, [])]
Note that there are many other possible data structures depending on exactly what you are trying to do, how frequent is reading, how frequent is updating, if you query by rectangles, if you look for nearest non-empty cell and so on.

To start off with
Speed up dictionary lookups, so this would not be an issue
Dictionary lookups are pretty fast, O(1) on average, but (from your other question) you're not relying on the hash-table lookup of the dictionary; you're relying on a linear search of the dictionary's keys.
Find some kind of container that will support sparse, negative indices?
This isn't indexing into the dictionary. A tuple is an immutable object, and you are hashing the tuple as a whole. The dictionary really has no idea of the contents of the keys, just their hash.
I'm going to suggest, as others did, that you restructure your data.
For example, you could create objects that encapsulate the data you need, and arrange them in a balanced binary tree for O(log n) searches. You can even go so far as to wrap the entire thing in a class that will give you the nice if foo in Bar: syntax you're looking for.
You probably need a couple coordinated structures to accomplish what you want. Here's a simplified example using dicts and sets (tweaking user 6502's suggestion a bit).
# this will be your dict that holds all the data
matrix = {}
# and each of these will be a dict of sets, pointing to coordinates
cols = {}
rows = {}
def add_data(coord, data):
    matrix[coord] = data
    try:
        cols[coord[0]].add(coord)
    except KeyError:
        # wrap coords in a list to prevent set() from iterating over it
        cols[coord[0]] = set([coord])
    try:
        rows[coord[1]].add(coord)
    except KeyError:
        rows[coord[1]] = set([coord])
# now you can find all coordinates from a row or column quickly
>>> add_data((2, 7), "foo4")
>>> add_data((2, 5), "foo3")
>>> 2 in cols
True
>>> 5 in rows
True
>>> [matrix[coord] for coord in cols[2]]
['foo4', 'foo3']
Now just wrap that in a class or a module, and you'll be off, and as always, if it's not fast enough profile and test before you guess.

Dictionary lookups are very fast. Searching for part of the key (e.g. all tiles in row x) is what's not fast. You could use a dict of dicts. Rather than a single dict indexed by a 2-tuple, use nested dicts like this:
somedict = {0: {}, 1:{}}
somedict[0][-5] = "thingy"
somedict[1][4] = "bing"
Then if you want all the tiles in a given "row" it's just somedict[0].
You will need some logic to add the secondary dictionaries where necessary and so on. Hint: check out __getitem__() and setdefault() on the standard dict type, or possibly the collections.defaultdict type.
This approach gives you quick access to all tiles in a given row. It's still slow-ish if you want all the tiles in a given column (though at least you won't need to look through every single cell, just every row). However, if needed, you could get around that by having two dicts of dicts (one in column, row order and the other in row, column order). Updating then becomes twice as much work, which may not matter for a game where most of the tiles are static, but access is very easy in either direction.
If you only need to store numbers and most of your cells will be 0, check out scipy's sparse matrix classes.
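The dict-of-dicts approach described above might look like this with collections.defaultdict. This is a minimal sketch; the names `tiles`, `tiles_in_row`, and `tiles_in_column` are made up for illustration.

```python
from collections import defaultdict

# Outer key is the row, inner key is the column; inner dicts appear on demand.
tiles = defaultdict(dict)
tiles[0][-5] = "thingy"
tiles[1][4] = "bing"
tiles[0][3] = "other"

def tiles_in_row(row):
    # .get avoids creating an empty inner dict for rows never written to.
    return tiles.get(row, {})

def tiles_in_column(col):
    # Still has to visit every row, but not every cell.
    return {row: cells[col] for row, cells in tiles.items() if col in cells}

print(tiles_in_row(0))     # {-5: 'thingy', 3: 'other'}
print(tiles_in_column(4))  # {1: 'bing'}
```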

One alternative would be to simply shift the index so it's positive.
E.g. if your indices are contiguous like this:
...
-2 -> a
-1 -> c
0 -> d
1 -> e
2 -> f
...
Just do something like LookupArray[Index + MinimumIndex], where MinimumIndex is the absolute value of the smallest index you would use.
That way, if your minimum was say, -50, it would map to 0. -20 would map to 30, and so forth.
Edit:
An alternative would be to use a trick with how you use the indices. Define the following key function
Key(n) = 2 * n (n >= 0)
Key(n) = -2 * n - 1. (n < 0)
This maps all non-negative keys to the even indices, and all negative keys to the odd indices. This may not be practical though, since if you add 100 negative keys, you'd have to expand your array by 200.
One other thing to note: If you plan on doing look ups and the number of keys is constant (or very slowly changing), stick with an array. Otherwise, dictionaries aren't bad at all.
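The Key function defined above translates directly into Python. The inverse, `unkey`, is an addition for illustration and assumes you sometimes need to recover the original index from an array slot.

```python
def key(n):
    # Non-negative n -> even slots 0, 2, 4, ...; negative n -> odd slots 1, 3, 5, ...
    return 2 * n if n >= 0 else -2 * n - 1

def unkey(k):
    # Inverse mapping: even slots came from n >= 0, odd slots from n < 0.
    return k // 2 if k % 2 == 0 else -(k + 1) // 2

print([key(n) for n in (-3, -2, -1, 0, 1, 2)])  # [5, 3, 1, 0, 2, 4]
```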

Use multi-dimensional lists -- usually implemented as nested lists. You can easily make this handle negative indices with a little arithmetic. It might use more memory than a dictionary, since something has to be put in every possible slot (usually None for empty ones), but access will be done via simple index lookup rather than hashing as it would with a dictionary.
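A minimal sketch of that arithmetic, assuming the coordinate bounds are known in advance. The `OffsetGrid` class and its parameters are hypothetical names, not an existing library.

```python
class OffsetGrid:
    """Fixed-size grid addressed by coordinates in [min, max] on each axis."""

    def __init__(self, min_x, max_x, min_y, max_y):
        self.min_x, self.min_y = min_x, min_y
        width = max_x - min_x + 1
        height = max_y - min_y + 1
        # Every slot exists up front; None marks an empty one.
        self.cells = [[None] * width for _ in range(height)]

    def _index(self, x, y):
        ix, iy = x - self.min_x, y - self.min_y
        # Reject out-of-range coordinates explicitly; a negative list index
        # would otherwise silently wrap around to the other side.
        if not (0 <= iy < len(self.cells) and 0 <= ix < len(self.cells[0])):
            raise IndexError((x, y))
        return ix, iy

    def __getitem__(self, xy):
        ix, iy = self._index(*xy)
        return self.cells[iy][ix]

    def __setitem__(self, xy, value):
        ix, iy = self._index(*xy)
        self.cells[iy][ix] = value

grid = OffsetGrid(-10, 10, -10, 10)
grid[-2, -2] = "something"
grid[3, -10] = "something else"
print(grid[-2, -2])  # something
print(grid[0, 0])    # None
```

The trade-off is the one described above: memory grows with the bounding box rather than with the number of occupied tiles.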
