How to pass arguments when using apply in DataFrame - python

I want to apply a function along a column, eg coco.convert, and pass to it, along with the value in the row, the argument to = 'ISO3'.
df['alpha-3'] = df['Country'].apply(coco.convert('ISO3'))
misses 1 positional argument:
TypeError: convert() missing 1 required positional argument: 'names'
but
df['alpha-3'] = df['Country'].apply(coco.convert)
works fine (I assume it uses default values).
How do I pass the positional argument here?
Also, what exactly is happening here - can someone explain a little how the function is passed to apply?

Probably the most straightforward way to solve this is to use the **kwds parameter of .apply, which is covered in the apply documentation. Any extra named arguments you pass to .apply() are forwarded to the function on every call.
df['alpha-3'] = df['Country'].apply(coco.convert, to='ISO3')
An alternative is to define your own function with the argument already filled in, as below.
def my_fun(x):
    return coco.convert(names=x, to='ISO3')
df['alpha-3'] = df['Country'].apply(my_fun)
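A third option, sketched here under stated assumptions, is functools.partial, which builds the same kind of pre-filled function without writing a def. The convert function below is a hypothetical stand-in for coco.convert (its lookup table and its default for to are made up so the snippet runs without the real library), and the built-in map plays the role of .apply.

```python
from functools import partial

# Hypothetical stand-in for coco.convert; the tiny table and the default
# value of `to` are assumptions made only for this illustration.
_CODES = {
    "Germany": {"ISO2": "DE", "ISO3": "DEU"},
    "France": {"ISO2": "FR", "ISO3": "FRA"},
}

def convert(names, to="ISO2"):
    return _CODES[names][to]

# partial returns a new callable with to="ISO3" already filled in.
to_iso3 = partial(convert, to="ISO3")

countries = ["Germany", "France"]
codes = list(map(to_iso3, countries))  # map stands in for .apply here
print(codes)
```

With the real library, the equivalent call would presumably be df['Country'].apply(partial(coco.convert, to='ISO3')).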
To your other question, about how .apply works, it does roughly the following:
Take the column of data you give it and loop through each element.
For each element, pass its value to the supplied function as the first argument.
Collect the return values of the function into a new pandas Series.
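The steps above can be sketched in plain Python. This is not how pandas is implemented; apply_sketch and the stand-in convert are hypothetical names used only to show how the function object and the extra keyword arguments travel, and why calling the function too early fails.

```python
def apply_sketch(values, func, **kwargs):
    # Each element becomes the first positional argument; extra keyword
    # arguments are forwarded unchanged on every call.
    return [func(value, **kwargs) for value in values]

# Hypothetical stand-in for coco.convert (table and default are assumptions).
def convert(names, to="ISO2"):
    codes = {
        "Germany": {"ISO2": "DE", "ISO3": "DEU"},
        "France": {"ISO2": "FR", "ISO3": "FRA"},
    }
    return codes[names][to]

countries = ["Germany", "France"]

# Pass the function object itself, plus the keyword to forward.
result = apply_sketch(countries, convert, to="ISO3")
print(result)

# Writing convert(to="ISO3") calls the function immediately, before
# apply_sketch ever runs, so Python complains about the missing 'names'.
error_message = None
try:
    apply_sketch(countries, convert(to="ISO3"))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```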

Related

Python - parenthesis used in syntax

I'm VERY new to programming. It's my second day. In functions like file.read(), empty parentheses are part of the syntax. Are they always supposed to be empty, or is there an option to fill them? My script works fine; it's just a question I've always had.
When you define a function, you specify the function's arguments. A function can have zero or more arguments. When a function has zero arguments, or all of its arguments have default values, then you can call the function without passing any arguments.
It depends on whether you want them empty (no arguments) or not (with arguments).
Here is an example of a function:
# Let's create a function that adds two values
def add(x, y):  # the function has two positional parameters, x and y
    z = x + y
    return z  # the function returns the value of z
addition = add(5, 33)  # call add() with arguments x=5, y=33
print(addition)
# this time, pass variables as arguments
a = 5
b = 33
addition = add(a, b)
print(addition)
As you can see above, the function takes its input as arguments and returns the output.
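The earlier point about default values can be illustrated the same way. In this minimal sketch, greet is a hypothetical function whose parameters all have defaults, so the parentheses may be left empty or filled in.

```python
def greet(name="world", punctuation="!"):
    # Both parameters have defaults, so every argument is optional.
    return "Hello, " + name + punctuation

default_greeting = greet()                  # empty parentheses: defaults are used
custom_greeting = greet("Ada")              # one positional argument
keyword_greeting = greet(punctuation="?")   # one keyword argument

print(default_greeting)
print(custom_greeting)
print(keyword_greeting)
```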

difference between "function()" and "function"

I see
df["col2"] = df["col1"].apply(len)
len(df["col1"])
My question is,
Why is the len function used without parentheses in 1, but with parentheses in 2?
What is the difference between the two?
I see this a lot, where the same function is used both with and without parentheses.
Can someone explain to me what exactly is going on?
Thanks.
The first example you mentioned maps the function len over the elements of df["col1"]:
df["col2"] = df["col1"].apply(len)
Whenever you map a function over an iterable, you pass the function itself, written without parentheses.
In your case, df["col1"] must hold elements whose length can be calculated, and the call returns a pandas Series with the lengths of all the elements.
Take the following example.
a = ["1", "2", "3", "4"]
z = list(map(int, a))  # [1, 2, 3, 4]
Here, we mapped the built-in int function (which does the type conversion) over every element of the list.
The second example that you mentioned would give out the length of the df["col1"] series.
len(df["col1"])
It won't do any operations on the elements within that Series.
Take the following example.
a = ["1", "2", "3", "4"]
z = len(a)  # 4
In both cases len ends up being given an object that has a length, so neither raises an error. But the outputs are completely different, as explained above.
In 1, the function len is being passed to a method called apply. That method applies len to each element of the column and returns a Series of lengths. In 2, the function len is being called directly, with the argument df["col1"], to get the length of that column.
In 1, apply is what is sometimes called a "higher-order function", but in principle it's just passing a function to another function for it to use.
In the second case you are calling the len function directly and get the result, i.e. how many rows are in col1 of the df.
In the first you are giving a reference to the len function to the apply method.
This is a shortcut for df["col2"] = df["col1"].apply(lambda x: len(x))
You use this version when you want to make the behavior of a method flexible by letting its caller hand in a function that controls some part of the algorithm, as with the apply method here. Depending on the contents of the column, you want to fill the new column with something, and here it was decided to fill it with the lengths of the contents of the other column.
len(s) will return the length of the s variable.
len will return the function itself. So if I do a=len, then I can do a(s). Of course, rebinding a built-in like this (a=len) is not recommended.
Let's have a look at the documentation of DataFrame.apply:
Its first parameter is func: function, which is the function that will be applied to each column or row of the DataFrame. In your case this function is len.
Now let's see what happens when you pass len as a parameter with parenthesis:
df.apply(len())
-----------------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_11920/3211016940.py in <module>
----> 1 df.apply(len())
TypeError: len() takes exactly one argument (0 given)
By contrast, df.apply(len) works perfectly.
This is because the parameter must be a function, and parentheses are how Python distinguishes between the two: len is the function itself, while len() is a call to it, which is evaluated first and replaced by its return value (or, here, by an error).
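The difference between naming a function and calling it can be checked without pandas. In this short sketch the built-in map stands in for apply, and the variable names are illustrative only.

```python
words = ["a", "bb", "ccc"]

measure = len                    # no parentheses: another name for the same function object
same_object = measure is len

length_of_list = len(words)      # parentheses: the call happens now and yields an int
lengths = list(map(len, words))  # the function is handed over; map calls it per element
lengths_lambda = list(map(lambda x: len(x), words))  # equivalent, just more verbose

function_is_callable = callable(len)
result_is_callable = callable(len(words))

print(length_of_list, lengths, function_is_callable, result_is_callable)
```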

Can I use one argument as the default argument in the same function? [duplicate]

I want to define a resize(h, w) method, and I want to be able to call it in one of two ways:
resize(x,y)
resize(x)
Where, in the second call, I want y to be equal to x. Can I do this in the method definition or should I do something like resize(x,y=None) and check inside:
if y is None:
    y = x
Can I do this in the method definition
No. At definition time there is no way to know what value x will have at run time. Default arguments are evaluated once, when the function is defined, so a default for y cannot depend on x.
or should I do something like resize(x,y=None) and check inside
Exactly. This is a common idiom in Python.
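A minimal sketch of that idiom follows. Returning the pair (x, y) is an assumption made only so the effect is visible; a real resize would do its own work at that point.

```python
def resize(x, y=None):
    # None marks "y was not supplied"; fall back to x in that case.
    if y is None:
        y = x
    return (x, y)  # hypothetical return value, for demonstration only

print(resize(3))        # y defaults to x
print(resize(3, 4))     # y given positionally
print(resize(3, y=5))   # y given by keyword
```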
To complete Jim's answer: in the case that None is a valid parameter value, you could use variable-length arguments (positional and/or keyword). An example using positional arguments:
def resize(x, *args):
    if args:
        if len(args) > 1:
            raise Exception("Too many arguments")
        y = args[0]
    else:
        y = x
In that example, you have to pass x, but you can pass zero or more extra positional arguments. Of course you have to do the checking manually, and you lose the y keyword.
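If None must stay a valid value and the y keyword should be kept as well, a private sentinel object is a commonly used alternative. This is a different technique from the *args approach above, not part of that answer; the name _MISSING and the returned pair are assumptions for illustration.

```python
_MISSING = object()  # unique marker that no caller can pass by accident

def resize(x, y=_MISSING):
    # Identity check against the sentinel, so None is treated as a real value.
    if y is _MISSING:
        y = x
    return (x, y)  # hypothetical return value, for demonstration only

print(resize(2))
print(resize(2, None))
print(resize(2, y=7))
```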

Python function argument passing sequence

The following code is incorrect:
def add(a, b, c):
    return a + b + c
args = (2, 3)
add(a = 1, *args)
TypeError: add() got multiple values for keyword argument 'a'
I've seen some examples in the Python docs, but I still don't know why there's an error. Can anybody explain in detail?
When applying the arguments, Python first fills in the positional arguments, then the keyword arguments.
In your specific case, *args is applied first, so the first positional parameter is passed 2 and the second is passed 3. The first parameter here is a.
Then the a = 1 is applied, and Python finds that you already applied a value to it.
In other words, Python cannot and will not take positional arguments out of consideration when you use one as a keyword argument. Just because you used a as keyword argument does not make it ineligible as a positional argument.
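The binding order described above can be reproduced directly. This sketch reuses add and args from the question and shows two calls that avoid the conflict; the variable names for the results are illustrative.

```python
def add(a, b, c):
    return a + b + c

args = (2, 3)

# *args fills a and b first, so a=1 then collides with the value already bound to a.
conflict = None
try:
    add(a=1, *args)
except TypeError as exc:
    conflict = str(exc)
    print(conflict)

# Name the parameter that is still free instead.
total_keyword = add(*args, c=1)    # 2 -> a, 3 -> b, 1 -> c

# Or pass the extra value positionally, ahead of the unpacked tuple.
total_positional = add(1, *args)   # 1 -> a, 2 -> b, 3 -> c

print(total_keyword, total_positional)
```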

Python - list of function/argument tuples

def f1(n):  # accepts one argument
    pass
def f2():  # accepts no arguments
    pass
FUNCTION_LIST = [(f1, (2)),  # each list entry is a tuple containing a function object and a tuple of arguments
                 (f1, (6)),
                 (f2, ())]
for f, arg in FUNCTION_LIST:
    f(arg)
The third time round in the loop, it attempts to pass an empty tuple of arguments to a function that accepts no arguments. It gives the error TypeError: f2() takes no arguments (1 given). The first two function calls work correctly - the content of the tuple gets passed, not the tuple itself.
Getting rid of the empty tuple of arguments in the offending list entry doesn't solve the problem:
FUNCTION_LIST[2] = (f2,)
for f, arg in FUNCTION_LIST:
    f(arg)
results in ValueError: need more than 1 value to unpack.
I've also tried iterating over the index rather than the list elements.
for n in range(len(FUNCTION_LIST)):
    FUNCTION_LIST[n][0](FUNCTION_LIST[n][1])
This gives the same TypeError in the first case, and IndexError: tuple index out of range when the third entry of the list is (f2,).
Finally, asterisk notation doesn't work either. This time it errors on the call to f1:
for f, args in FUNCTION_LIST:
    f(*args)
gives TypeError: f1() argument after * must be a sequence, not int.
I've run out of things to try. I still think the first one ought to work. Can anyone point me in the right direction?
Your comment in this code snippet shows a misconception relevant in this context:
FUNCTION_LIST = [(f1, (2)),  # each list entry is a tuple containing a function object and a tuple of arguments
                 (f1, (6)),
                 (f2, ())]
The expressions (2) and (6) are not tuples – they are integers. You should use (2,) and (6,) to denote the single-element tuples you want. After fixing this, your loop code should look thus:
for f, args in FUNCTION_LIST:
    f(*args)
See Unpacking Argument Lists in the Python tutorial for an explanation of the *args syntax.
The problem is that this notation:
(6)
evaluates to an integer, and you need a tuple, so write it this way:
(6,)
and your asterisk notation will succeed.
Try passing *() instead of (). The * symbol tells python to unpack the iterable that follows it, so it unpacks the empty tuple and passes nothing to the function, since the tuple was empty.
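Putting these answers together, a corrected version of the loop might look like the sketch below. The calls list is an addition made only to record what each function receives; f1 and f2 otherwise mirror the question.

```python
calls = []  # added for illustration: records every call that is made

def f1(n):  # accepts one argument
    calls.append(("f1", n))

def f2():  # accepts no arguments
    calls.append(("f2",))

# Parentheses alone do not make a tuple; the trailing comma does.
plain_is_int = isinstance((2), int)
with_comma_is_tuple = isinstance((2,), tuple)

FUNCTION_LIST = [(f1, (2,)),
                 (f1, (6,)),
                 (f2, ())]

for f, args in FUNCTION_LIST:
    f(*args)  # unpacking an empty tuple passes no arguments at all

print(calls)
```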
For the record, a nice alternative I have since discovered is the use of functools.partial. The following code does what I was trying to do:
from functools import partial
def f1(n):  # accepts one argument
    pass
def f2():  # accepts no arguments
    pass
FUNCTION_LIST = [partial(f1, 2),  # each list entry is a callable with the argument pre-ordained
                 partial(f1, 6),
                 partial(f2)]  # the call to partial is not really necessary for the entry with no arguments
for f in FUNCTION_LIST:
    f()
