Pickling a dictionary that uses defaultdict

I have this dictionary defined by:
def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model
Later along the way, I want to use pickle and dump the dictionary into a text file:
f = open('dict.txt', 'wb')
pickle.dump(Nwords, f)
However the code doesn't work and I receive an error. Apparently pickle can't work with lambda and I'm better off defining the model using a module-level function. I have already read the answers here
Unfortunately as I am not experienced with Python I am not exactly sure how to do this. I tried:
def dd():
    return defaultdict(int)
def train(features):
    ## model = defaultdict(lambda: 1)
    model = defaultdict(dd)
    for f in features:
        model[f] += 1
    return model
I receive the error:
TypeError: unsupported operand type(s) for +=: 'collections.defaultdict' and 'int'
Other than that, return defaultdict(int) would always assign a zero to the first occurrence of a key, whereas I want it to assign 1. Any ideas on how I can fix this?

The answer you found is correct for that question but subtly wrong for yours. Replacing the lambda with a top-level function is the right idea, because pickle can store a reference to a module-level function but not to a lambda. The function must return the default value itself, though, and in your case that is not another defaultdict object.
Simply return the same value your lambda returns:
def dd():
    return 1
Every time you access a key that doesn't yet exist in the defaultdict instance, dd is called and its return value is stored under that key. The function in the other post returns another defaultdict instance (with int as its default) because that matches the lambda shown in that question. Here it is what causes your TypeError: model[f] becomes a defaultdict, and you then try to add 1 to it.
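To check the whole round trip, here is a minimal sketch. It assumes the code runs as a normal script or importable module (so pickle can find dd by name), and it pickles to an in-memory bytes object instead of dict.txt; the sample word list is made up for illustration.

```python
import pickle
from collections import defaultdict

def dd():
    # Module-level default factory: pickle stores a reference to it by name,
    # which is not possible for a lambda.
    return 1

def train(features):
    model = defaultdict(dd)
    for f in features:
        model[f] += 1
    return model

# Hypothetical sample data, only for illustration.
model = train(["the", "cat", "the"])

# Round trip through pickle (in memory here; with a file, open it in 'wb' / 'rb' mode).
restored = pickle.loads(pickle.dumps(model))
print(restored["the"], restored["cat"], restored["unseen"])
```

The restored object is still a defaultdict with dd as its factory, so unseen keys keep getting the default of 1.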

Related

How can I get the function object inside a lambda function

Sorry if this is very lame, but I'm pretty new to Python.
Since everything in Python is an object, I assume there is some way for an object to get a reference to itself. In methods, the self variable holds it, and from that reference the class object can be obtained (e.g. type(self)). But how can this be done inside a lambda?
I could figure it out for a normal function:
import inspect
def named_func():
    func_name = inspect.stack()[0].function
    func_obj = inspect.stack()[1].frame.f_locals[func_name]
    print(func_name, func_obj, func_obj.xxx)
named_func.xxx = 15
named_func()
The output looks like this:
named_func <function named_func at 0x7f56e368c2f0> 15
Unfortunately in a lambda the inspect.stack()[0].function gives <lambda> inside the lambda.
Is there a way to get the function object inside a lambda?
Is there a way to get function object directly (not using the name of the function)?
I imagined __self__, but it does not work.
UPDATE
I tried something like this in lambda:
lambda_func = lambda : inspect.stack()[0]
lambda_func.xxx = 2
print(lambda_func())
This prints:
FrameInfo(frame=<frame at 0x7f3eee8a6378, file './test_obj.py', line 74, code <lambda>>, filename='./test_obj.py', lineno=74, function='<lambda>', code_context=['lambda_func = lambda : inspect.stack()[0]\n'], index=0)
But is there a way, for example, to read the lambda object's xxx field in this case? For that, I would need to get hold of the lambda object itself somehow.

We can now use assignment expressions (the := operator, available since Python 3.8) to make this shorter and easier to read, without the need to define a new function for this purpose.
You can find two examples below:
Fibonacci:
(f:=lambda x: 1 if x <= 1 else f(x - 1) + f(x - 2))(5)
Factorial:
(f:=lambda x: 1 if x == 0 else x*f(x - 1))(5)
We use := to name the lambda, so the name can be used inside the lambda itself while the lambda is still called right away like an anonymous function.
So in your particular use case it would give something like this:
print((f:=lambda: f.__hash__())()) # prints the hash for example
You can do whatever you want with that f variable now (inside the lambda).
In fact, if you don't mind spreading the code over several lines, you could also just use the name directly and do something like this:
f = lambda : f.xxx
f.xxx = 2
print(f())
(see https://www.python.org/dev/peps/pep-0572 for more information about this := operator)
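A small runnable sketch of both variants follows. The names g, h and alias are only illustrative. The last part shows a caveat: the lambda looks its name up when it is called, so rebinding that name breaks the self-reference.

```python
# Recursive lambdas named with := (Python 3.8+) and called immediately.
fib5 = (f := lambda x: 1 if x <= 1 else f(x - 1) + f(x - 2))(5)
fact5 = (g := lambda x: 1 if x == 0 else x * g(x - 1))(5)
print(fib5, fact5)

# The plain multi-line variant: the lambda reads an attribute of its own
# function object through the name h.
h = lambda: h.xxx
h.xxx = 2
value = h()
print(value)

# Caveat: the name is resolved at call time, so rebinding it breaks the lambda.
alias = h
h = None
try:
    alias()
except AttributeError as exc:
    print("self-reference lost:", exc)
```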

Note, this is not an efficient/pragmatic solution to the problem. This is not a recommendation about how to write actual software.. it simply presents how access to the lambda reference from the lambda can be achieved without assigning it to a variable name. This is a very hacky answer.
This will only work completely correctly if you follow the advice from the answer found here
In short, given the stack you can find the code object, and then using the gc module you can find the reference to your lambda.
Example with @Tomalak's factorial lambda!
import gc
import inspect
def current_lambda():
    lambda_code = inspect.stack()[1].frame.f_code
    candidates = [
        referrer
        for referrer in gc.get_referrers(lambda_code)
        if inspect.isfunction(referrer)
        and referrer.__code__ is lambda_code
    ]
    if len(candidates) != 1:
        raise ValueError(
            "Multiple candidates found! Cannot determine correct function!"
        )
    return candidates[0]
print((lambda n: 1 if n < 2 else n * current_lambda()(n - 1))(5))
Outputs
120
Revisiting your example:
lambda_func = lambda: current_lambda().xxx
lambda_func.xxx = 10
print(lambda_func())
Outputs:
10

"AttributeError: 'generator' object has no attribute 'replace' "

I'm not sure why I'm seeing this error message: AttributeError: 'generator' object has no attribute 'replace' (on the line: modified_file = hex_read_file.replace(batch_to_amend_final, batch_amendment_final)).
import binascii, os, re, time
os.chdir(...)
files_to_amend = os.listdir(...)
joiner = "00"
# Allow user to input the text to be replaced, and with what
while True:
    batch_to_amend3 = input("\n\nWhat number would you like to amend? \n\n >>> ")
    batch_amendment3 = input("\n\nWhat is the new number? \n\n >>> ")
    batch_to_amend2 = batch_to_amend3.encode()
    batch_to_amend = joiner.encode().join(binascii.hexlify(bytes((i,))) for i in batch_to_amend2)
    batch_amendment2 = batch_amendment3.encode()
    batch_amendment = joiner.encode().join(binascii.hexlify(bytes((i,))) for i in batch_amendment2)
    # Function to translate label files
    def lbl_translate(files_to_amend):
        with open(files_to_amend, 'rb') as read_file:
            read_file2 = read_file.read()
            hex_read_file = (binascii.hexlify(bytes((i,))) for i in read_file2)
            print(hex_read_file)
            modified_file = hex_read_file.replace(batch_to_amend, batch_amendment)
        with open(files_to_amend, 'wb') as write_file:
            write_file.write(modified_file)
            write_file.close()
            print("Amended: " + files_to_amend)
    # Calling function to modify labels
    for label in files_to_amend:
        lbl_translate(label)

hex_read_file is a generator expression (note the round brackets around the statement) defined here:
hex_read_file = (binascii.hexlify(bytes((i,))) for i in read_file2)
As many already pointed out in the comments, generators don't have a replace method the way strings and bytes objects do, so you have two options, depending on your specific use case:
Join the generator's output into a single bytes object and call replace on that (given that you call write_file.write(modified_file) afterwards, this is the option that works with the rest of your code directly):
hex_read_file = b"".join(binascii.hexlify(bytes((int(i),))) for i in read_file2) # note: I added the additional int() call to fix the issue mentioned in the comments
Filter and replace directly in the generator expression (and modify how you write out the result):
def lbl_translate(files_to_amend, replacement_map):
    with open(files_to_amend, 'rb') as read_file:
        read_file2 = read_file.read()
        hex_read_file = ( replacement_map.get(binascii.hexlify(bytes((int(i),))), binascii.hexlify(bytes((int(i),)))) for i in read_file2) # see Note below
    with open(files_to_amend, 'wb') as write_file:
        for b in hex_read_file:
            write_file.write(b)
    print("Amended: " + files_to_amend)
where replacement_map is a dict that you fill in with batch_to_amend as the key and batch_amendment as the value (you can specify multiple amendments too and it will work just the same). The call would then be:
for label in files_to_amend:
    lbl_translate(label,{batch_to_amend:batch_amendment})
NOTE: With a standard Python dict you need to call binascii.hexlify(bytes((int(i),))) twice here, once for the lookup key and once for the fallback value passed to get.
A better option would be collections.defaultdict, if it were implemented in a sensible way (see here for more context on why I say that). A defaultdict expects a factory that takes no parameters to generate the value for unknown keys, so the factory cannot return the missing key itself. Instead, you need to create your own subclass of dict and implement the __missing__ method to obtain the desired behaviour:
hex_read_file = ( replacement_map[binascii.hexlify(bytes((int(i),)))] for i in read_file2) # replacement_map is a dict subclass that implements __missing__
and you define replacement_map as:
class dict_with_key_as_default(dict): # find a better name for the type
    def __missing__(self, key):
        '''if a value is not in the dictionary, return the key value instead.'''
        return key
replacement_map = dict_with_key_as_default()
replacement_map[batch_to_amend] = batch_amendment
for label in files_to_amend:
    lbl_translate(label, replacement_map)
(class dict_with_key_as_default taken from this answer and renamed for clarity)
Edit note: As mentioned in the comments, the OP has an error in the comprehension where they call hexlify() on some binary string instead of integer values. The solution adds a cast to int for the bytes where relevant, but it's far from the best solution to this problem. Since the OP's intent is not clear, I left it as close to the original as possible, but an alternative solution should be used instead.
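To see the __missing__ approach in isolation, here is a minimal sketch that works on an in-memory bytes object instead of a file. The class name KeyAsDefault and the sample data are made up for illustration; like the code above, it replaces one byte at a time.

```python
import binascii

class KeyAsDefault(dict):
    """dict that returns the looked-up key itself when the key is absent."""
    def __missing__(self, key):
        return key

# Hypothetical amendment: replace the byte for "1" with the byte for "2".
replacement_map = KeyAsDefault({binascii.hexlify(b"1"): binascii.hexlify(b"2")})

data = b"1a1"
# Each byte is hexlified once and looked up; unknown bytes fall through unchanged.
hex_out = b"".join(replacement_map[binascii.hexlify(bytes((i,)))] for i in data)
print(hex_out)
print(binascii.unhexlify(hex_out))
```

Note that __missing__ only returns the key; it does not store it, so the map does not grow while the file is processed.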

How to pass "random" amount of variables not all of them exist

I have a method to validate input:
def validate_user_input(*args):
    for item in args:
        if not re.match('^[a-zA-Z0-9_-]+$', item):
            ...
And I'm calling it like this:
validate_user_input(var1, var2, ..., var7)
But those are generated from user input, and some of those can be missing. What would be the proper way to do that, without creating tons of if statements?
Variables are assigned from a json input like so, and json input might not have some of the needed properties:
var1 = request.json.get('var1')
I assume they are <class 'NoneType'>
Here's the error: TypeError: expected string or buffer

If your request.json object is a dict or dict-like, you can just pass a default value as the second argument to get.

If I understand correctly you are generating var_ variables by request.json.get('var_') which will either return a string which you want to validate or None if the field was missing.
If this is the case then you can just add a special case to validate_user_input for a None value:
def validate_user_input(*args):
    for item in args:
        if item is None:
            continue #this is acceptable, don't do anything with it
        elif not re.match('^[a-zA-Z0-9_-]+$', item):
            ...
Or it may make more sense to store all of the values you are interested in in a dictionary:
wanted_keys = {'var1','var2','var3'}
## set intersection works in python3
present_keys = wanted_keys & request.json.keys()
## or for python 2 use a basic list comp
#present_keys = [key for key in request.json.keys() if key in wanted_keys]
actual_data = {key: request.json[key] for key in present_keys}
Then you would pass actual_data.values() as the argument list to validate_user_input.
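Both ideas can be tried together in a small sketch. A plain dict named payload stands in for request.json, and returning True/False is an assumption, since the question does not show what the validator does on failure.

```python
import re

VALID = re.compile(r'^[a-zA-Z0-9_-]+$')

def validate_user_input(*args):
    """Return True if every argument that is present (not None) matches the pattern."""
    for item in args:
        if item is None:
            continue  # a missing field is acceptable, skip it
        if not VALID.match(item):
            return False
    return True

# Hypothetical stand-in for request.json; 'var2' is missing on purpose.
payload = {"var1": "alice", "var3": "bob_42"}

# Variant 1: .get() yields None for missing keys, and the validator skips None.
print(validate_user_input(payload.get("var1"), payload.get("var2")))

# Variant 2: only collect the keys that are actually present.
wanted_keys = {"var1", "var2", "var3"}
present_keys = wanted_keys & payload.keys()
actual_data = {key: payload[key] for key in present_keys}
print(validate_user_input(*actual_data.values()))
```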

If it really is possible that some var-variables are undefined when you call validate_user_input, why not just initialize them all (e.g. to the empty string '' so that your regex fails) before actually defining them?

Can I implement a function or better a decorator that makes func(a1)(a2)(a3)...(an) == func(a1, a2, a3,...,an)? [duplicate]

On Codewars.com I encountered the following task:
Create a function add that adds numbers together when called in succession. So add(1) should return 1, add(1)(2) should return 1+2, ...
While I'm familiar with the basics of Python, I've never encountered a function that is able to be called in such succession, i.e. a function f(x) that can be called as f(x)(y)(z).... Thus far, I'm not even sure how to interpret this notation.
As a mathematician, I'd suspect that f(x)(y) is a function that assigns to every x a function g_{x} and then returns g_{x}(y) and likewise for f(x)(y)(z).
Should this interpretation be correct, Python would allow me to create functions dynamically, which seems very interesting to me. I've searched the web for the past hour, but wasn't able to find a lead in the right direction. Since I don't know what this programming concept is called, however, this may not be too surprising.
What is this concept called and where can I read more about it?

I don't know whether this is function chaining as much as it's callable chaining, but since functions are callables I guess there's no harm done. Either way, there are two ways I can think of doing this:
Sub-classing int and defining __call__:
The first way would be with a custom int subclass that defines __call__ which returns a new instance of itself with the updated value:
class CustomInt(int):
    def __call__(self, v):
        return CustomInt(self + v)
Function add can now be defined to return a CustomInt instance, which, as a callable that returns an updated value of itself, can be called in succession:
>>> def add(v):
...     return CustomInt(v)
>>> add(1)
1
>>> add(1)(2)
3
>>> add(1)(2)(3)(44) # and so on..
50
In addition, as an int subclass, the returned value retains the __repr__ and __str__ behavior of ints. For more complex operations though, you should define other dunders appropriately.
As @Caridorc noted in a comment, add could also be simply written as:
add = CustomInt
Renaming the class to add instead of CustomInt also works similarly.
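To illustrate the remark about defining other dunders: ordinary arithmetic on an int subclass returns a plain int, which is no longer callable. The sketch below overrides only __add__ as an example; PlainInt is a hypothetical name for the unmodified version shown above, and other operators would need the same treatment.

```python
class PlainInt(int):
    # Same idea as CustomInt above, with no extra dunders.
    def __call__(self, v):
        return PlainInt(self + v)

class CustomInt(int):
    def __call__(self, v):
        return CustomInt(int(self) + v)

    def __add__(self, other):
        # Delegate to int, then wrap the result so it stays callable.
        result = super().__add__(other)
        return result if result is NotImplemented else CustomInt(result)

print(type(PlainInt(1)(2) + 3).__name__)   # falls back to plain int
print(type(CustomInt(1)(2) + 3).__name__)  # stays CustomInt
print((CustomInt(1)(2) + 3)(4))            # so the chain can continue
```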
Define a closure, requires extra call to yield value:
The only other way I can think of involves a nested function that requires an extra empty argument call in order to return the result. I'm not using nonlocal and opt for attaching attributes to the function objects to make it portable between Pythons:
def add(v):
    def _inner_adder(val=None):
        """
        if val is None we return _inner_adder.v
        else we increment and return ourselves
        """
        if val is None:
            return _inner_adder.v
        _inner_adder.v += val
        return _inner_adder
    _inner_adder.v = v # save value
    return _inner_adder
This continuously returns itself (_inner_adder): if a val is supplied, it increments the stored value (_inner_adder.v += val), and if not, it returns the value as it is. Like I mentioned, it requires an extra () call in order to return the incremented value:
>>> add(1)(2)()
3
>>> add(1)(2)(3)() # and so on..
6
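If portability to Python 2 is not a concern, the same closure can be written with nonlocal instead of a function attribute. This is a sketch of that variant, not part of the original answer:

```python
def add(v):
    total = v

    def _inner_adder(val=None):
        nonlocal total  # Python 3 only: rebind the variable in the enclosing scope
        if val is None:
            return total
        total += val
        return _inner_adder

    return _inner_adder

print(add(1)(2)())
print(add(1)(2)(3)())
```

Each call to add creates a fresh total, so separate chains do not interfere with each other.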

You can hate me, but here is a one-liner :)
add = lambda v: type("", (int,), {"__call__": lambda self, v: self.__class__(self + v)})(v)
Edit: OK, how does this work? The code is identical to the answer by @Jim, but everything happens on a single line.
type can be used to construct new types: type(name, bases, dict) -> a new type. For name we provide an empty string, as the name is not really needed in this case. For bases (a tuple) we provide (int,), which is identical to inheriting from int. dict holds the class attributes, where we attach the __call__ lambda.
self.__class__(self + v) is identical to return CustomInt(self + v)
The new type is constructed and returned within the outer lambda.

If you want to define a function that can be called multiple times in succession, you first need to return a callable object each time (for example a function); otherwise you have to create your own object that defines a __call__ attribute, in order for it to be callable.
The next point is that you need to preserve all the arguments, which in this case means you might want to use coroutines or a recursive function. But note that coroutines are much more optimized/flexible than recursive functions, especially for such tasks.
Here is a sample function using coroutines that preserves its latest state. Note that it can't be called multiple times, since the return value is an integer, which is not callable, but you might think about turning this into your expected object ;-).
def add():
    current = yield
    while True:
        value = yield current
        current = value + current
it = add()
next(it)
print(it.send(10))
print(it.send(2))
print(it.send(4))
Output:
10
12
16

Simply:
class add(int):
    def __call__(self, n):
        return add(self + n)

If you are willing to accept an additional () in order to retrieve the result you can use functools.partial:
from functools import partial
def add(*args, result=0):
    return partial(add, result=sum(args)+result) if args else result
For example:
>>> add(1)
functools.partial(<function add at 0x7ffbcf3ff430>, result=1)
>>> add(1)(2)
functools.partial(<function add at 0x7ffbcf3ff430>, result=3)
>>> add(1)(2)()
3
This also allows specifying multiple numbers at once:
>>> add(1, 2, 3)(4, 5)(6)()
21
If you want to restrict it to a single number you can do the following:
def add(x=None, *, result=0):
    return partial(add, result=x+result) if x is not None else result
If you want add(x)(y)(z) to readily return the result and be further callable then sub-classing int is the way to go.

The pythonic way to do this would be to use dynamic arguments:
def add(*args):
    return sum(args)
This is not the answer you're looking for, and you may know this already, but I thought I would give it anyway: if someone is wondering about doing this for work rather than out of curiosity, they should probably have the "right thing to do" answer.

Python 3 changing value of dictionary key in for loop not working

I have python 3 code that is not working as expected:
def addFunc(x,y):
    print (x+y)
def subABC(x,y,z):
    print (x-y-z)
def doublePower(base,exp):
    print(2*base**exp)
def RootFunc(inputDict):
    for k,v in inputDict.items():
        if v[0]==1:
            d[k] = addFunc(*v[1:])
        elif v[0] ==2:
            d[k] = subABC(*v[1:])
        elif v[0]==3:
            d[k] = doublePower(*v[1:])
d={"s1_7":[1,5,2],"d1_6":[2,12,3,3],"e1_3200":[3,40,2],"s2_13":[1,6,7],"d2_30":[2,42,2,10]}
RootFunc(d)
#test to make sure key var assignment works
print(d)
I get:
{'d2_30': None, 's2_13': None, 's1_7': None, 'e1_3200': None, 'd1_6': None}
I expected:
{'d2_30': 30, 's2_13': 13, 's1_7': 7, 'e1_3200': 3200, 'd1_6': 6}
What's wrong?
Semi related: I know dictionaries are unordered but is there any reason why python picked this order? Does it run the keys through a randomizer?

print writes to standard output but does not return a useful value: it returns None. So every time you call your functions, they print the result and return None, and None is what ends up in the dictionary. Try changing all the print calls to return statements, like so:
def addFunc(x,y):
    return x+y
This will give the value x+y back to whatever called the function.
Another problem with your code (unless you meant to do this) is that you define a dictionary d and then when you define your function, you are working on this dictionary d and not the dictionary that is 'input':
def RootFunc(inputDict):
    for k,v in inputDict.items():
        if v[0]==1:
            d[k] = addFunc(*v[1:])
Are you planning to always change d and not the dictionary that you are iterating over, inputDict?
There may be other issues as well (accepting a variable number of arguments within your functions, for instance), but it's good to address one problem at a time.
Additional Notes on Functions:
Here's a skeleton that attempts to convey how functions are often used:
def sample_function(some_data):
    modified_data = []
    for element in some_data:
        processed = element  # do some processing here
        modified_data.append(processed)  # add the processed item to modified_data
    return modified_data
Functions are considered 'black box', which means you structure them so that you can dump some data into them and they always do the same stuff and you can call them over and over again. They will either return values or yield values or update some value or attribute or something (the latter are called 'side effects'). For the moment, just pay attention to the return statement.
Another interesting thing is that functions have 'scope' which means that when I just defined it with a fake-name for the argument, I don't actually have to have a variable called "some_data". I can pass whatever I want to the function, but inside the function I can refer to the fake name and create other variables that really only matter within the context of the function.
Now, if we run my function above, it will go ahead and process the data:
sample_function(my_data_set)
But this is often kind of pointless, because the function is supposed to return something and I didn't do anything with what it returned. What I should do is assign the function's return value to a variable so I can keep the processed information.
my_modified_data = sample_function(my_data_set)
This is a really common way to use functions and you'll probably see it again.
One Simple Way to Approach Your Problem:
Taking all this into consideration, here is one way to solve your problem that comes from a really common programming paradigm:
def RootFunc(inputDict):
    temp_dict = {}
    for k,v in inputDict.items():
        if v[0]==1:
            temp_dict[k] = addFunc(*v[1:])
        elif v[0] ==2:
            temp_dict[k] = subABC(*v[1:])
        elif v[0]==3:
            temp_dict[k] = doublePower(*v[1:])
    return temp_dict
inputDict={"s1_7":[1,5,2],"d1_6":[2,12,3,3],"e1_3200":[3,40,2],"s2_13":[1,6,7],"d2_30":[2,42,2,10]}
final_dict = RootFunc(inputDict)
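Putting the pieces together, here is a complete sketch with the helper functions changed to return their results. It swaps the if/elif chain for a lookup table (OPERATIONS is a made-up name), which is an alternative design and not what the code above does; the behaviour for the three operation codes is the same.

```python
def addFunc(x, y):
    return x + y

def subABC(x, y, z):
    return x - y - z

def doublePower(base, exp):
    return 2 * base ** exp

# Dispatch table: operation code -> function (replaces the if/elif chain).
OPERATIONS = {1: addFunc, 2: subABC, 3: doublePower}

def RootFunc(inputDict):
    # Build and return a new dict instead of modifying a global one.
    return {k: OPERATIONS[v[0]](*v[1:]) for k, v in inputDict.items()}

inputDict = {"s1_7": [1, 5, 2], "d1_6": [2, 12, 3, 3], "e1_3200": [3, 40, 2],
             "s2_13": [1, 6, 7], "d2_30": [2, 42, 2, 10]}
final_dict = RootFunc(inputDict)
print(final_dict)
```

Unlike the if/elif version, which silently skips unknown codes, an unknown operation code raises a KeyError here.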

As erewok stated, you are using print and not return, which is likely the source of your error. As far as the ordering is concerned: dictionaries are implemented as hash tables, so in the Python versions where they are unordered, the iteration order follows from the keys' hashes and the insertion history rather than from a randomizer. (Since Python 3.7, dicts are guaranteed to preserve insertion order.)
Excerpt from the python doc: [...]A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary. [...]
The key point is that the order of the elements is not really random. I have often noticed that the order stays the same no matter how I construct a dictionary from the same values, whether using a lambda or just creating it outright, so it can't be random, but it's definitely arbitrary.
