can this code be written in two lines? - python

I have a feeling that, using the setdefault method or a lambda, this code can be written in two lines:
variables = ['a', 'b', 'c', 'd']
for value in indefinite_dict.values():
    str1 = immediate_instantiation.get(value)
    if str1 == None:
        immediate_instantiation.update({value: variables[0]})
        del variables[0]
It loops through the values of indefinite_dict; if a value is not already a key in the immediate_instantiation dict, it adds it as a key whose value is the first member of the variables list, then deletes that first member of the variables list.

If you’re okay with values in variables being deleted even if a corresponding key already exists in immediate_instantiation when that key has the same value, you’re right that you can do it with only setdefault:
for value in indefinite_dict.values():
    if immediate_instantiation.setdefault(value, variables[0]) is variables[0]:
        del variables[0]
To get it down to two lines without any other context takes a different (and kind of unpleasant) approach, though:
updates = (v for v in indefinite_dict.values() if v not in immediate_instantiation)
immediate_instantiation.update({v: variables.pop(0) for v in updates})
And indefinite_dict had better be an OrderedDict – otherwise you’re removing variables in a potentially random order! (Don’t rely on Python 3.6’s dict representation for this.)
If you don't need variables after this and variables is guaranteed to be at least as long as updates, a non-mutating solution is much cleaner:
updates = (v for v in indefinite_dict.values() if v not in immediate_instantiation)
immediate_instantiation.update(zip(updates, variables))
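For concreteness, here is a minimal runnable sketch of that last version with made-up data (the contents of indefinite_dict, immediate_instantiation and variables below are assumptions, not from the question):
indefinite_dict = {'k1': 'x', 'k2': 'y', 'k3': 'z'}
immediate_instantiation = {'y': 'already_here'}
variables = ['a', 'b', 'c', 'd']

updates = (v for v in indefinite_dict.values() if v not in immediate_instantiation)
immediate_instantiation.update(zip(updates, variables))
print(immediate_instantiation)  # {'y': 'already_here', 'x': 'a', 'z': 'b'}
print(variables)                # unchanged: ['a', 'b', 'c', 'd']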

One line solution:
immediate_instantiation.update({value: variables.pop(0) for value in indefinite_dict.values() if value not in immediate_instantiation})

Related

Python: Obtaining index of an element within a value list of a dictionary

I have a dictionary with key:value list pairings, and I intend to find the index of the value list that contains the desired element.
E.g., if the dictionary is:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
Then, given element 'v2' I should be able to get index 2.
For a value list with one element, the index can be obtained with: list(my_dict.values()).index(['v1']) , however this approach does not work with lists containing multiple elements.
Using a for loop, it can be obtained via:
for key, value in my_dict.items():
    if value is None:
        continue
    if 'v2' in value:
        print(list(my_dict.keys()).index(key))
Is there a neater (pythonic) way to obtain the same?
You've got an XY problem. You want to know the key that points to a value, and you think you need to find the enumeration index iterating the values so you can then use it to find the key by iteration as well. You don't need all that. Just find the key directly:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
value = 'v2'
# Iterate key/vals pairs in genexpr; if the vals contains value, yield the key,
# next stops immediately for the first key yielded so you don't iterate the whole dict
# when the value is found on an early key
key_for_value = next(key for key, vals in my_dict.items() if vals and value in vals)
print(key_for_value)
That'll raise StopIteration if the value doesn't exist, otherwise it directly retrieves the first key where the values list for that key contains the desired value.
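If raising isn't what you want, next also accepts a default; a hedged variation (None here is an arbitrary sentinel choice):
key_for_value = next(
    (key for key, vals in my_dict.items() if vals and value in vals),
    None,
)
print(key_for_value)  # 'key3' for value = 'v2'; None if no key's list contains the value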
If you don't really have an XY problem, and the index is important (it shouldn't be, that's a misuse of dicts) it's trivial to produce it as well, changing the extraction of the key to get both, e.g.:
index, key_for_value = next((i, key) for i, (key, vals) in enumerate(my_dict.items()) if vals and value in vals)
Mind you, this is a terrible solution if you need to perform these lookups a lot and my_dict isn't trivially small; it's O(n) on the total number of values, so a large dict would take quite a while to check (relative to the cost of just looking up an arbitrary key, which is average-case O(1)). In that case, ideally, if my_dict doesn't change much/at all, you'd construct a reversed dictionary up-front to find the key(s) associated with a value, e.g.:
from collections import defaultdict
my_dict = {"key1": ['v1'], "key2": None, "key3": ['v2', 'v3'], "key4": ['v4', 'v5', 'v6']}
reversed_my_dict = defaultdict(set)
for key, vals in my_dict.items():
    for val in vals or ():  # skip keys mapped to None
        reversed_my_dict[val].add(key)
reversed_my_dict = dict(reversed_my_dict)  # Optional: prevents future autovivification of keys
                                           # by converting back to a plain dict
after which you can cheaply determine the key(s) associated with a given value with:
reversed_my_dict.get(value, ()) # Using .get prevents autovivification of keys even if still a defaultdict
which returns the set of all keys that map to that value, if any, or the empty tuple if not. (If you converted back to a plain dict above, reversed_my_dict[value] would also work if you'd prefer to get a KeyError when the value is missing entirely; leaving it a defaultdict(set) would silently construct a new empty set, store it under that value and return it, which is fine if this happens rarely, but a problem if you test thousands of unmapped values and create a corresponding thousands of empty sets for no benefit, consuming memory wastefully.)
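A tiny illustration of that autovivification point (throwaway data, not from the question):
from collections import defaultdict

dd = defaultdict(set)
dd['missing']            # indexing a missing key creates and stores an empty set
print(len(dd))           # 1
dd.get('also_missing')   # .get() never autovivifies; it just returns None here
print(len(dd))           # still 1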
Which you choose depends on how big my_dict is (for small my_dict, O(n) work doesn't really matter that much), how many times you need to search it (fewer searches mean less gain from reversed dict), and whether it's regularly modified. For that last point, if it's never modified, or rarely modified between lookups, rebuilding the reversed dict from scratch after each modification might be worth it for simplicity (assuming you perform many lookups per rebuild); if it's frequently modified, the reversed dict might still be worth it, you'd just have to update both the forward and reversed dicts rather than just one, e.g., expanding:
# New key
my_dict[newkey] = [newval1, newval2]
# Add value
my_dict[existingkey].append(newval)
# Delete value
my_dict[existingkey].remove(badval)
# Delete key
del my_dict[existingkey]
to:
# New key
newvals = my_dict[newkey] = [newval1, newval2]
for newval in newvals:
    reversed_my_dict[newval].add(newkey)  # reversed_my_dict.setdefault(newval, set()).add(newkey) if not defaultdict(set) anymore
# Add value
my_dict[existingkey].append(newval)
reversed_my_dict[newval].add(existingkey)  # reversed_my_dict.setdefault(newval, set()).add(existingkey) if not defaultdict(set) anymore
# Delete value
my_dict[existingkey].remove(badval)
if badval not in my_dict[existingkey]:  # Removed last copy; test only needed if one key can hold same value more than once
    reversed_my_dict[badval].discard(existingkey)
    # Optional: delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
# Delete key
# set() conversion not needed if my_dict's value lists are guaranteed not to contain duplicates
for badval in set(my_dict.pop(existingkey)):
    reversed_my_dict[badval].discard(existingkey)
    # Optional: delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
respectively, roughly doubling the work incurred by modifications, in exchange for always getting O(1) lookups in either direction.
If you are looking for the key corresponding to a value, you can reverse the dictionary like so:
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}
Careful with duplicate values though. The last occurrence will override the previous ones.
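For example, running the comprehension against the question's dict (output assumes an insertion-ordered dict):
my_dict = {"key1": ['v1'], "key2": None, "key3": ['v2', 'v3'], "key4": ['v4', 'v5', 'v6']}
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}
print(reverse_dict['v2'])  # 'key3'
# If 'v1' also appeared under key4, reverse_dict['v1'] would end up as 'key4', not 'key1'.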
Don't know if it's the best solution but this works:
value = 'v2'
list(map(lambda x : value in x, list(map(lambda x : x[1] or [], list(my_dict.items()))))).index(True)

Perform function on many variables

I have many variables I want to perform the same function on. The return value of the function should be saved in a variable with the letter m appended to its name. I'm currently using a dictionary, but I'm having trouble converting each key/value pair into a variable with that name. I'm listing my approach below, and I'm open to either adding on to this method or using a better method altogether:
A = 4
B = 5
C = 6
listletter = {'mA': A, 'mB': B, 'mC': C}
for key in listletter:
    listletter[key] = performfunc(listletter[key])
Note that I do not want to use the dictionary to reference values. My actual use of the variables is in numpy, and there is already a lot of indexing going on and the lines are long. I therefore want distinct variable names as opposed to dict['variablename'][][]...
I am aware of the exec function, but I hear this is bad practice. I'd also appreciate any general advice on simple improvements unrelated to the task (perhaps I should use key, value instead?).
EDIT: I would also be okay with simply performing the function on the variables without appending the letter m. I want to perform the same function on a, b and c, and the function would return the new values of a, b and c. Once again, I ultimately want to access the variables as a, b and c and not through a dictionary.
You should be able to use locals() to convert all local variables into a dictionary then prepend m to them in your resulting dictionary.
res = {}
for key, value in list(locals().items()):  # snapshot, so the loop's own variables don't disturb the iteration
    res["m" + key] = somefunc(value)
I am not sure how many variables you have. If you have a handful then this should work.
a,b,c,d = 31,21,311,11
print("original values: ", a,b,c,d)
#replace the lambda function with whatever function that you want to use.
a,b,c,d = map(lambda x:x**2,(a,b,c,d))
print("changed values,same variables: ", a,b,c,d)

Index value confusion in python

Hey all, I am new to Python programming and I have noticed some code which is really confusing me.
import collections
s = 'mississippi'
d = collections.defaultdict(int)
for k in s:
    d[k] += 1
d.items()
The thing I need to know is the use of d[k] here. I know k is each value in the string s, but I didn't understand what d[k] returns. In defaultdict(int), a new value is created if the dictionary has no value for that key.
Please help me, any help would be appreciated. Thanks.
Dictionaries in Python are "mapping" types. (This applies to both regular dict dictionaries and the more specialized variations like defaultdict.) A mapping takes a key and "maps" it to a value. The syntax d[k] is used to look up the key k in the dictionary d. Depending on where it appears in your code, it can have slightly different semantics (either returning the existing value for the key or setting a new one).
In your example, you're using d[k] += 1, which increments the value under key k in the dictionary. Since integers are immutable, it actually breaks out into d[k] = d[k] + 1. The right side d[k] does a look up of the value in the dictionary. Then it adds one and, using the d[k] on the left side, assigns the result into the dictionary as a new value.
defaultdict changes things a bit in that keys that don't yet exist in the dictionary are treated as if they did exist. The argument to its constructor is a "factory" object which will be called to create the new values when an unknown key is requested.
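A tiny sketch of what that factory does the first time a missing key is looked up:
from collections import defaultdict

d = defaultdict(int)   # int() -> 0 is the factory for missing keys
print(d['m'])          # 0: the missing key 'm' is created with the factory's value
d['m'] += 1            # look up 0, add 1, store the result back under 'm'
print(dict(d))         # {'m': 1}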
Here you go
d[key]
Return the item of d with key key. Raises a KeyError if key is not in the map.
Straight from the Python docs, under mapping types.
Go to https://docs.python.org/ and bookmark it. It will become your best friend.

python intersection initial state standard practice

Example:
complete_dict = dict()
common_set = set()
for count, group_name in enumerate(groups_dict):
    complete_dict[group_name] = dict()  # grabs a json file = group_name.json
    if count > 0:
        common_set.intersection_update(complete_dict[group_name])
    else:
        # this accounts for the fact common_set is EMPTY to begin with
        common_set.update(complete_dict[group_name])
WHERE: dict2 contains k:v of members(of the group): int(x)
Is there a more appropriate way to handle the initial state of the intersection?
I.e. We cannot intersect the first complete_dict[group_name] with common_set because it is empty and the result would therefore also be empty.
One way to avoid the special case would be to initialize the set with the contents of the item. Since a value intersected with itself is the value itself, you won't lose anything this way.
Here's an attempt to show how this could work. I'm not certain that I've understood what your different dict(...) calls are supposed to represent, so this may not translate perfectly into your code, but it should get you on the right path.
it = iter(dict(...))  # this is the dict(...) from the for statement
first_key = next(it)
results[first_key] = dict(...)  # this is the dict(...) from inside the loop
common = set(results[first_key])  # initialize the set with the first item
for key in it:
    results[key] = dict(...)  # the inner dict(...) again
    common.intersection_update(results[key])
As jamylak commented, the calls you were making to the keys method of various dictionaries were always unnecessary (a dictionary acts pretty much like a set of keys if you don't do any indexing or use mapping-specific methods). I've also chosen variable names that are more in keeping with Python's style (lowercase for regular variables, with CAPITALS reserved for constants).
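To illustrate the point about .keys() being unnecessary, a small sketch with throwaway data:
d = {'a': 1, 'b': 2}
s = {'b', 'c'}
print(set(d))              # {'a', 'b'}: iterating a dict yields its keys
s.intersection_update(d)   # set methods accept the dict directly, no d.keys() needed
print(s)                   # {'b'}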
Following Blcknght's answer but with less complexity and repetition:
common = None
uncommon = {}
for key in outer_dict:
    inner = uncommon[key] = inner_dict(...)
    if common is None:
        common = set(inner)
    else:
        common.intersection_update(inner)
Like Blcknght, I find it hard to know whether this captures the intent of the original, because the variable names are neither descriptive nor distinct.
Logically, when we want to compute the intersection of several sets, we start from the smallest set (the one with minimum length) as the output set and intersect the other sets with it, right?
uncommon_result = dict(...)  # a method that returns a dict()
common_set = sorted(uncommon_result.values(), key=len)[0]
for a_set in uncommon_result.values():
    common_set.intersection_update(a_set)
I know the second line can be one of the worst ways to initialize common_set because it does a lot of unnecessary work, but mathematically we almost always want to know which of our sets is the smallest, and in that case the work isn't in vain.
EDIT: If uncommon_result is a dict of dicts, you may need to add another for loop to go over its keys; with some minor changes to the inner for loop above, you'll be good again.
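Since the sort above is only there to find the smallest set, a hedged variation with min avoids most of that extra work (still assuming the values are sets):
smallest = min(uncommon_result.values(), key=len)
common_set = set(smallest)  # copy, so intersection_update doesn't mutate one of the dict's values
for a_set in uncommon_result.values():
    common_set.intersection_update(a_set)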

More efficient way to get unique first occurrence from a Python dict

I have a very large file I'm parsing, getting a key and value from each line. I want to keep only the first key for each value; that is, I'm removing the duplicate values.
So it would look like:
{
A:1
B:2
C:3
D:2
E:2
F:3
G:1
}
and it would output:
{E:2,F:3,G:1}
It's a bit confusing because I don't really care what the key is. So E in the above could be replaced with B or D, F could be replaced with C, and G could be replaced with A.
Here is the best way I have found to do it but it is extremely slow as the file gets larger.
mapp = {}
value_holder = []
for i in mydict:
    if mydict[i] not in value_holder:
        mapp[i] = mydict[i]
        value_holder.append(mydict[i])
Must look through value_holder every time :( Is there a faster way to do this?
Yes, a trivial change makes it much faster:
value_holder = set()
(Well, you also have to change the append to add. But still pretty simple.)
Using a set instead of a list means each lookup is O(1) instead of O(N), so the whole operation is O(N) instead of O(N^2). In other words, if you have 10,000 lines, you're doing 10,000 hash lookups instead of 50,000,000 comparisons.
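If you want to see the difference concretely, a rough timeit sketch (sizes and repetition counts are arbitrary, and the globals= argument to timeit.timeit needs Python 3.5+):
import timeit

as_list = list(range(10000))
as_set = set(as_list)

# membership test for an element near the end: the worst case for the list
print(timeit.timeit('9999 in as_list', globals=globals(), number=1000))
print(timeit.timeit('9999 in as_set', globals=globals(), number=1000))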
One caveat with this solution—and all of the others posted—is that it requires the values to be hashable. If they're not hashable, but they are comparable, you can still get O(NlogN) instead of O(N^2) by using a sorted set (e.g., from the blist library). If they're neither hashable nor sortable… well, you'll probably want to find some way to generate something hashable (or sortable) to use as a "first check", and then only walk the "first check" matches for actual matches, which will get you to O(NM), where M is the average number of hash collisions.
You might want to look at how unique_everseen is implemented in the itertools recipes in the standard library documentation.
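A rough sketch of that recipe's idea adapted to (key, value) pairs, so that only the first key per value survives (the helper name and sample data are mine, not from the recipe):
def first_key_per_value(pairs):
    seen = set()
    for k, v in pairs:
        if v not in seen:
            seen.add(v)
            yield k, v

mydict = {'A': 1, 'B': 2, 'C': 3, 'D': 2, 'E': 2, 'F': 3, 'G': 1}
mapp = dict(first_key_per_value(mydict.items()))
print(mapp)  # {'A': 1, 'B': 2, 'C': 3} on Pythons where dicts preserve insertion order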
Note that dictionaries don't actually have an order, so there's no way to pick the "first" duplicate; you'll just get one arbitrarily. In which case, there's another way to do this:
inverted = {v:k for k, v in d.iteritems()}
reverted = {v:k for k, v in inverted.iteritems()}
(This is effectively a form of the decorate-process-undecorate idiom without any processing.)
But instead of building up the dict and then filtering it, you can make things better (simpler, and faster, and more memory-efficient, and order-preserving) by filtering as you read. Basically, keep the set alongside the dict as you go along. For example, instead of this:
mydict = {}
for line in f:
    k, v = line.split(None, 1)
    mydict[k] = v
mapp = {}
value_holder = set()
for i in mydict:
    if mydict[i] not in value_holder:
        mapp[i] = mydict[i]
        value_holder.add(mydict[i])
Just do this:
mapp = {}
value_holder = set()
for line in f:
    k, v = line.split(None, 1)
    if v not in value_holder:
        mapp[k] = v
        value_holder.add(v)
In fact, you may want to consider writing a one_to_one_dict that wraps this up (or search PyPI modules and ActiveState recipes to see if someone has already written it for you), so then you can just write:
mapp = one_to_one_dict()
for line in f:
    k, v = line.split(None, 1)
    mapp[k] = v
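A minimal sketch of what such a wrapper could look like (the class name comes from the suggestion above; silently dropping keys whose value was already stored is my assumption about the intended behaviour):
class one_to_one_dict(dict):
    """Keeps only the first key stored for each value; later duplicates are ignored."""
    def __init__(self):
        dict.__init__(self)
        self._seen_values = set()

    def __setitem__(self, key, value):
        if value not in self._seen_values:
            self._seen_values.add(value)
            dict.__setitem__(self, key, value)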
I'm not completely clear on exactly what you're doing, but set is a great way to remove duplicates. For example:
>>> k = [1,3,4,4,5,4,3,2,2,3,3,4,5]
>>> set(k)
set([1, 2, 3, 4, 5])
>>> list(set(k))
[1, 2, 3, 4, 5]
Though it depends a bit on the structure of the input you're loading, there might be a way to simply use set so that you don't have to iterate through the entire object every time to see if there are any matching keys--instead run it through set once.
The first way to speed this up, as others have mentioned, is a using a set to record seen values, as checking for membership on a set is much faster.
We can also make this a lot shorter with a dict comprehension:
seen = set()
new_mapp = {k: v for k, v in mapp.items() if not (v in seen or seen.add(v))}
The if case requires a little explanation: we only keep key/value pairs whose value we haven't seen before, and we use or a little bit hackishly to ensure any unseen value is added to the set on the way through. As set.add() returns None, it will not affect the outcome.
As always, in 2.x, prefer dict.iteritems() over dict.items().
Using a set instead of a list would speed you up considerably ...
You said you are reading from a very large file and want to keep only the first key for each value. I originally assumed this meant you care about the order in which the key/value pairs occur in the very large file. This code will do that and will be fast.
values_seen = set()
mapp = {}
with open("large_file.txt") as f:
    for line in f:
        key, value = line.split()
        if value not in values_seen:
            values_seen.add(value)
            mapp[key] = value
You were using a list to keep track of the values your code had seen. Searching through a list is very slow: it gets slower the larger the list gets. A set is much faster because lookups are close to constant time (they don't get much slower, or maybe at all slower, the larger the set gets). (A dict also works the way a set works.)
Part of your problem is that dicts do not preserve any sort of logical ordering when they are iterated through; they use hash tables to index items. So there's no real concept of "first occurrence of a value" in this sort of data structure. The right way to do this would probably be a list of key-value pairs, e.g.:
kv_pairs = [(k1,v1),(k2,v2),...]
or, because the file is so large, it would be better to use the excellent file iteration python provides to retrieve the k/v pairs:
def kv_iter(f):
    # f being the file descriptor
    for line in f:
        yield ...  # (whatever logic you use to get k, v values from a line)
value_holder is a great candidate for a set. You are really just testing membership in value_holder. Because values are unique, they can be indexed more efficiently using a similar hashing method. So it would end up a bit like this:
mapp = {}
value_holder = set()
for k, v in kv_iter(f):
    if v not in value_holder:
        mapp[k] = v
        value_holder.add(v)
