PySpark: when function with multiple outputs [duplicate] - python

This question already has answers here:
Spark Equivalent of IF Then ELSE
(4 answers)
Closed 5 years ago.
I am trying to use a "chained when" function.
In other words, I'd like to get more than two outputs.
I tried using the same logic of the concatenate IF function in Excel:
df.withColumn("device_id", when(col("device")=="desktop",1)).otherwise(when(col("device")=="mobile",2)).otherwise(null))
But that doesn't work since I can't put a tuple into the "otherwise" function.

Have you tried:
from pyspark.sql import functions as F
df.withColumn('device_id', F.when(F.col('device')=='desktop', 1).when(F.col('device')=='mobile', 2).otherwise(None))
Note that when chaining when functions you do not need to wrap the successive calls in an otherwise function.

Related

Explanation or documentation of how this string replace lambda group function works [duplicate]

This question already has answers here:
Function to dictate the replacements in re.sub method (Python)
(2 answers)
How to input a regex in string.replace?
(7 answers)
Closed 2 years ago.
I have this function here which I use to replace a string like 'ABCDe.CO' with 'ABCD-E.CO' in a pandas dataframe.
I don't understand how the group(0) part works, and I can't find any documentation on it. Could someone explain what this function does or where I can read up on it?
(df.loc[df.country.eq('ST'), 'ticker'].str.replace('([a-z])', lambda x: '-'+x.group(0).upper(), regex=True))
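As a rough sketch of what the lambda is doing: when the replacement given to re.sub is a callable, it is called once per match with a match object, and group(0) returns the entire matched text. The pandas str.replace call above follows the same convention when the pattern is treated as a regex. The example below uses only the standard re module, and the helper name dash_upper is made up for illustration.

```python
import re

def dash_upper(match):
    # group(0) is the whole text matched by the pattern;
    # here that is a single lowercase letter.
    return '-' + match.group(0).upper()

# The callable is invoked once for every match of the pattern.
result = re.sub(r'([a-z])', dash_upper, 'ABCDe.CO')
print(result)  # ABCD-E.CO
```

Because the whole pattern is wrapped in one capture group, group(1) would return the same text as group(0) in this particular case.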

How to export a list of pandas data frames to Excel using a nested generator expression? [duplicate]

This question already has answers here:
Understanding generators in Python
(13 answers)
How exactly does a generator comprehension work?
(8 answers)
Closed 3 years ago.
I'm trying to export a list of Pandas data frames to Excel files using a generator expression. However nothing is exported once the script has finished executing. It works if I use a for loop, but not using a generator expression. I'm really interested in knowing how it could work, but also why, thanks in advance.
This doesn't work:
def process_pandas_dfs():
    (df.to_excel('df_name.xlsx') for df in list_of_dfs)
However this does:
def process_pandas_dfs():
    for df in list_of_dfs:
        df.to_excel('df_name.xlsx')
Generator expressions are not evaluated when they are defined; they create a lazy generator object whose body only runs as it is iterated. Use a list comprehension instead:
[df.to_excel('df_name.xlsx') for df in list_of_dfs]
Though, as Yatu pointed out, the for loop is the more appropriate way to do this, since the calls are made only for their side effects.
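The laziness can be seen without pandas at all. In this minimal sketch, record is a hypothetical stand-in for a side-effecting call such as df.to_excel, and the list calls tracks which invocations have actually happened.

```python
calls = []

def record(name):
    # Stand-in for a side-effecting call such as df.to_excel(...).
    calls.append(name)

names = ['a', 'b', 'c']

# Building the generator does not call record() at all.
gen = (record(n) for n in names)
calls_after_creation = list(calls)

# Consuming the generator is what triggers each call.
for _ in gen:
    pass
calls_after_consumption = list(calls)

print(calls_after_creation)     # []
print(calls_after_consumption)  # ['a', 'b', 'c']
```

In the question's function the generator object is created and then discarded without ever being iterated, which is why no file is written.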

Difference between two conditional queries on a pandas dataframe? [duplicate]

This question already has answers here:
pandas logical and operator with and without brackets produces different results [duplicate]
(2 answers)
Logical operators for Boolean indexing in Pandas
(4 answers)
Closed 3 years ago.
I was trying to find records based on two conditions on a data frame preg
First:
preg[preg.caseid==2298 & preg.pregordr==1]
This throws an error saying that the truth value of a Series is ambiguous.
Why?
Second:
But this one works!
preg[(preg.caseid==2298) & (preg.pregordr==1)]
So what exactly is the difference between the two?
Because & binds more tightly than ==, Python evaluates 2298 & preg.pregordr first. If you want to avoid the parentheses, you can write:
preg[preg.caseid.eq(2298) & preg.pregordr.eq(1)]
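The precedence issue can be checked with plain integers, without pandas. In this sketch the names a and b are placeholders for the two columns. The unparenthesised form parses as a chained comparison, a == (2298 & b) == 1, and a chained comparison implies an `and` between its parts. With a Series, that `and` needs a single truth value, which is where the "ambiguous" error comes from.

```python
import ast

# Parse the unparenthesised expression and inspect its structure.
tree = ast.parse('a == 2298 & b == 1', mode='eval')
node = tree.body
# One Compare node with two == operators means a chained comparison.
is_chained_compare = isinstance(node, ast.Compare) and len(node.ops) == 2

# With plain integers the two spellings give different answers.
a, b = 2298, 1
without_parens = (a == 2298 & b == 1)   # a == (2298 & b) == 1
with_parens = (a == 2298) & (b == 1)

print(is_chained_compare)  # True
print(without_parens)      # False
print(with_parens)         # True
```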

Indexing Pandas DataFrame using variable [duplicate]

This question already has answers here:
Python slice how-to, I know the Python slice but how can I use built-in slice object for it?
(6 answers)
Closed 6 years ago.
I'm just wondering if I can do something like:
df.loc['1990':'2000']
by doing something like:
my_slice = '1990':'2000'
df.loc[my_slice]
What I've written doesn't work, but is there something similar that does?
Yes, but you don't write slices like that. The start:stop syntax is only valid inside square brackets, so you write slice('1990', '2000', None) instead.
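A small sketch of why this works, using only built-in types: inside square brackets Python converts start:stop into a slice object before passing it to the container. ShowKey is a hypothetical helper that simply returns whatever key it receives.

```python
class ShowKey:
    # Hypothetical helper: returns whatever object is passed inside [].
    def __getitem__(self, key):
        return key

probe = ShowKey()

# Python turns the bracketed start:stop form into a slice object.
literal = probe['1990':'2000']

# The same object can be built directly and stored in a variable.
my_slice = slice('1990', '2000')

print(literal == my_slice)  # True

# A stored slice works anywhere the literal form does.
years = [1988, 1989, 1990, 1991, 1992]
window = slice(1, 4)
print(years[window])        # [1989, 1990, 1991]
```

Since df.loc receives its key the same way, df.loc[my_slice] should behave like df.loc['1990':'2000'].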

2 methods with the same name in python [duplicate]

This question already has answers here:
Python function overloading
(19 answers)
Closed 5 years ago.
In Python, could I write 2 methods having the same name but a different number of parameters?
em.on_create_experience(action.dest_id)
em.on_create_experience2(action.dest_id,0)
You can't write two separate methods with the same name and different parameter lists; the second definition would simply replace the first. But you can write one method with optional keyword parameters to do what you want.
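A minimal sketch of the optional-parameter approach follows. The class EventManager, the parameter name amount, and the returned tuples are all assumptions made for illustration, since the question does not show what em is or what the second argument means.

```python
class EventManager:
    # Hypothetical class standing in for whatever `em` is in the question.
    def on_create_experience(self, dest_id, amount=None):
        # None is the default so that an explicit 0 can be told apart
        # from "argument not supplied".
        if amount is None:
            return ('one-arg', dest_id)
        return ('two-arg', dest_id, amount)

em = EventManager()
short_call = em.on_create_experience(42)
long_call = em.on_create_experience(42, 0)

print(short_call)  # ('one-arg', 42)
print(long_call)   # ('two-arg', 42, 0)
```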
