Every month I receive a dataframe and have to make the same adjustments to it. I would like to write a function that I can apply to each dataframe without rewriting the code.
This is what I have for the first dataframe, called enero:
for i in range(0,len(enero)):
    if enero.loc[i,"VENDEDOR_CLIENTE"] == "ARTURO":
        enero.loc[i,"MARCA"]="MAQUILA PINTUCO"
    elif enero.loc[i,"PROVEEDOR"] == "PEPITO" and enero.loc[i,"VENDEDOR_CLIENTE"] != "ARTURO":
        enero.loc[i,"MARCA"]="PINTURAS"
For the second dataframe, called febrero:
for i in range(0,len(febrero)):
    if febrero.loc[i,"VENDEDOR_CLIENTE"] == "ARTURO":
        febrero.loc[i,"MARCA"]="MAQUILA PINTUCO"
    elif febrero.loc[i,"PROVEEDOR"] == "PEPITO" and febrero.loc[i,"VENDEDOR_CLIENTE"] != "ARTURO":
        febrero.loc[i,"MARCA"]="PINTURAS"
So, to avoid repeating the code every month, I would like to create a function:
def ajustemarca(df,VENDEDOR_CLIENTE,MARCA,PROVEEDOR):
    for i in range(0,len(df)):
        if df.loc[i,"VENDEDOR_CLIENTE"] == "ARTURO":
            df.loc[i,"MARCA"]="MAQUILA PINTUCO"
        elif df.loc[i,"PROVEEDOR"] == "PEPITO" and df.loc[i,"VENDEDOR_CLIENTE"] != "ARTURO":
            df.loc[i,"MARCA"]="PINTURAS"
    return df.loc[i,"MARCA"]
Then I call the function:
enero.apply(ajustemarca)
febrero.apply(ajustemarca)
But it does not work. How should I write this function?
I am sharing an answer that someone posted here before it was deleted :(
The answer was perfect, and now the code works:
def ajustemarca(df):
    for i in range(0,len(df)):
        if df.loc[i,"VENDEDOR_CLIENTE"] == "ARTURO":
            df.loc[i,"MARCA"]="MAQUILA PINTUCO"
        elif df.loc[i,"PROVEEDOR"] == "PEPITO." and df.loc[i,"VENDEDOR_CLIENTE"] != "ARTURO":
            df.loc[i,"MARCA"]="PINTURAS"

ajustemarca(enero)
ajustemarca(febrero)
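For what it's worth, the same rules could probably be written without the row loop by using boolean masks with df.loc. This is only a sketch: the name ajustemarca_vectorized, the proveedor parameter, and the sample rows are made up for illustration, and the supplier string has to match whatever is actually stored in your data. Unlike df.loc[i, ...] with range(len(df)), the mask version does not assume the index is 0, 1, 2, ...

```python
import pandas as pd


def ajustemarca_vectorized(df, proveedor="PEPITO"):
    """Set MARCA with boolean masks instead of a row-by-row loop."""
    is_arturo = df["VENDEDOR_CLIENTE"] == "ARTURO"
    is_proveedor = df["PROVEEDOR"] == proveedor

    # Same precedence as the if/elif: ARTURO wins, then the supplier rule.
    df.loc[is_arturo, "MARCA"] = "MAQUILA PINTUCO"
    df.loc[is_proveedor & ~is_arturo, "MARCA"] = "PINTURAS"
    return df


# Hypothetical sample data, only for illustration.
enero = pd.DataFrame(
    {
        "VENDEDOR_CLIENTE": ["ARTURO", "LUIS", "ARTURO", "ANA"],
        "PROVEEDOR": ["PEPITO", "PEPITO", "OTRO", "OTRO"],
        "MARCA": ["A", "B", "C", "D"],
    },
    index=[10, 20, 30, 40],  # non-default index labels on purpose
)
ajustemarca_vectorized(enero)
print(enero["MARCA"].tolist())
```

The function modifies the dataframe in place and also returns it, so both ajustemarca_vectorized(febrero) and febrero = ajustemarca_vectorized(febrero) should behave the same way.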
Related
I am trying to add an if condition inside F.when on a PySpark column.
My code:
df = df.withColumn("column_fruits",F.when(F.col('column_fruits') == "Berries"
    if("fruit_color")== "red":
        return "cherries"
    elif("fruit_color") == "pink":
        return "strawberries"
    else:
        return "blackberries").otherwise("column_fruits")
I want to first pick out the rows where the fruit is Berries and rename them according to their color. All the remaining fruits should stay the same.
Can anyone tell me if this is a valid way of writing withColumn code?
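This would work. Berries are renamed according to their color, and every other fruit keeps its current value:
df.withColumn("column_fruits", F.when((F.col('column_fruits') == "Berries") & (F.col('fruit_color') == "red"), "cherries")\
    .when((F.col('column_fruits') == "Berries") & (F.col('fruit_color') == "pink"), "strawberries")\
    .when(F.col('column_fruits') == "Berries", "blackberries")\
    .otherwise(F.col('column_fruits')))\
    .show()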
I'm trying to create a new column based on a plain Python variable ModelType and a dataframe column model.
Currently I'm doing it this way:
if ModelType == 'FRSG':
    df=df.withColumn(MODEL_NAME+'_veh', F.when(df["model"].isin(MDL_CD), df["ford_cd"]))
elif ModelType == 'TYSG':
    df=df.withColumn(MODEL_NAME+'_veh', F.when(df["model"].isin(MDL_CD), df["toyota_cd"]))
else:
    df=df.withColumn(MODEL_NAME+'_veh', F.when(df["model"].isin(MDL_CD), df["cm_cd"]))
I have tried this as well:
df=df.withColumn(MODEL_NAME+'_veh', F.when((ModelType == 'FRSG') &(df["model"].isin(MDL_CD)), df["ford_cd"]))
but since the variable ModelType is not a column, it raises an error:
TypeError: condition should be a Column
Is there a more efficient way to do the same thing?
You can also use a dict that holds the possible mappings for ModelType and use it like this:
model_mapping = {"FRSG": "ford_cd", "TYSG": "toyota_cd"}
df = df.withColumn(
    MODEL_NAME + '_veh',
    F.when(df["model"].isin(MDL_CD), df[model_mapping.get(ModelType, "cm_cd")])
)
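The lookup part of this can be tried in plain Python, without a Spark session. The sketch below only shows how dict.get falls back to "cm_cd" for any ModelType that is not in the mapping; the helper name source_column and the "OTHER" value are made up for illustration.

```python
model_mapping = {"FRSG": "ford_cd", "TYSG": "toyota_cd"}


def source_column(model_type, default="cm_cd"):
    """Return the name of the column to read for a given ModelType."""
    # dict.get returns the default when the key is missing,
    # which plays the role of the final else branch.
    return model_mapping.get(model_type, default)


for model_type in ["FRSG", "TYSG", "OTHER"]:
    print(model_type, "->", source_column(model_type))
```

Because ModelType is an ordinary Python value, the choice is made once on the driver before the Spark expression is built, so it never has to be a Column.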
I would probably use a variable for the column to be chosen in the then part:
if ModelType == 'FRSG':
    x = "ford_cd"
elif ModelType == 'TYSG':
    x = "toyota_cd"
else:
    x = "cm_cd"
df=df.withColumn(MODEL_NAME+'_veh', F.when(df["model"].isin(MDL_CD), df[x]))
I would like to learn how to use df.loc and a for-loop to calculate new columns for the dataframe below.
Problem: from df_G, for T = 400, take the value of each Go_j as input.
Then add a new column "G_ads_400" to dataframe df, computed as df['Adsorption_energy_eV'] - Go_h2o
df_G
df
Here is my code for each temperature:
Go_co2 = df_G.loc[df_G.index == "400" & df_G.Go_CO2]
Go_o2= df_G.loc[df_G.index == "400" & df_G.Go_O2]
Go_co= df_G.loc[df_G.index == "400" & df_G.Go_CO]
df.loc[df['Adsorbates'] == "CO2", "G_ads_400"] = df.Adsorption_energy_eV-Go_co2
df.loc[df['Adsorbates'] == "CO", "G_ads_400"] = df.Adsorption_energy_eV-Go_co
df.loc[df['Adsorbates'] == "O2", "G_ads_400"] = df.Adsorption_energy_eV-Go_o2
I am not sure why I keep getting an error, and I would like to know how to put this in a for-loop so it looks less messy.
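One likely cause of the error is operator precedence: & binds more tightly than ==, so df_G.index == "400" & df_G.Go_CO2 is parsed as df_G.index == ("400" & df_G.Go_CO2). A single value can instead be read with df_G.loc[row_label, column_name]. The sketch below shows one way the loop might look. The two small tables are hypothetical stand-ins for the ones not shown here, and it assumes that df_G is indexed by temperature (as integers; use "400" if your index holds strings) and that the columns follow the Go_<adsorbate> naming pattern.

```python
import pandas as pd

# Hypothetical stand-ins for df_G and df; the numbers are made up.
df_G = pd.DataFrame(
    {"Go_CO2": [1.0, 2.0], "Go_O2": [0.5, 1.5], "Go_CO": [0.25, 0.75]},
    index=[400, 500],  # temperatures as integer index labels (assumption)
)
df = pd.DataFrame(
    {
        "Adsorbates": ["CO2", "CO", "O2"],
        "Adsorption_energy_eV": [-2.0, -1.0, -3.0],
    }
)

for T in df_G.index:
    new_col = f"G_ads_{T}"
    for ads in ["CO2", "CO", "O2"]:
        # Scalar lookup: row label T, column Go_<adsorbate>.
        go = df_G.loc[T, f"Go_{ads}"]
        mask = df["Adsorbates"] == ads
        # Only the rows of this adsorbate are filled in.
        df.loc[mask, new_col] = df.loc[mask, "Adsorption_energy_eV"] - go

print(df)
```

Rows whose adsorbate is not in the list are left as NaN in the new columns.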
I have a dataframe, for instance:
df = pd.DataFrame({"condition": ["a","a","a","b","b","b","a","b"], "dv1": [7,8,6,3,2,1,5,4]})
and I want to subtract 10 from column dv1 only where the value in column condition equals "a". Is there a way to do this in Python, for example with def and an if statement? I have tried the following, but it doesn't work:
def recode():
for i in df["condition']:
if i == "a":
return abs(10-df["dv1"])
Indent the code like this:
def recode():
    for i in df["condition"]:
        if i == "a":
            return abs(10-df["dv1"])
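Note that even with the indentation fixed, the function returns on the first "a" it meets and applies the calculation to the whole dv1 column. If the goal is what the question describes (subtract 10 from dv1 only where condition is "a"), a boolean mask with df.loc is one possible way to do it. This is a sketch under that reading; the frame argument and the choice to work on a copy are my own additions, and if abs(10 - dv1) is what is really wanted, the right-hand side would need to change accordingly.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "condition": ["a", "a", "a", "b", "b", "b", "a", "b"],
        "dv1": [7, 8, 6, 3, 2, 1, 5, 4],
    }
)


def recode(frame):
    """Return a copy with 10 subtracted from dv1 where condition == 'a'."""
    out = frame.copy()
    mask = out["condition"] == "a"
    out.loc[mask, "dv1"] = out.loc[mask, "dv1"] - 10
    return out


recoded = recode(df)
print(recoded)
```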
I have an Access query that I want to convert into a Python script:
SELECT
[Functional_Details].Customer_No,
Sum([Functional_Details].[SUM(Incoming_Hours)]) AS [SumOfSUM(Incoming_Hours)],
Sum([Functional_Details].[SUM(Incoming_Minutes)]) AS [SumOfSUM(Incoming_Minutes)],
Sum([Functional_Details].[SUM(Incoming_Seconds)]) AS [SumOfSUM(Incoming_Seconds)],
[Functional_Details].Rate,
[Functional_Details].Customer_Type
FROM [Functional_Details]
WHERE(
(([Functional_Details].User_ID) Not In ("IND"))
AND
(([Functional_Details].Incoming_ID)="Airtel")
AND
(([Functional_Details].Incoming_Category)="Foreign")
AND
(([Functional_Details].Outgoing_ID)="Airtel")
AND
(([Functional_Details].Outgoing_Category)="Foreign")
AND
(([Functional_Details].Current_Operation)="NO")
AND
(([Functional_Details].Active)="NO")
)
GROUP BY [Functional_Details].Customer_No, [Functional_Details].Rate, [Functional_Details].Customer_Type
HAVING ((([Functional_Details].Customer_Type)="Check"));
I have Functional_Details stored in a dataframe: df_functional_details
I don't understand how to proceed with the Python script.
So far I have tried:
df_fd_temp=df_functional_details.copy()
if(df_fd_temp['User_ID'] != 'IND'
   and df_fd_temp['Incoming_ID'] == 'Airtel'
   and df_fd_temp['Incoming_Category'] == 'Foreign'
   and df_fd_temp['Outgoing_ID'] == 'Airtel'
   and df_fd_temp['Outgoing_Category'] == 'Foreign'
   and df_fd_temp['Current_Operation'] == 'NO'
   and df_fd_temp['Active'] == 'NO'):
    df_fd_temp.groupby(['Customer_No','Rate','Customer_Type']).groups
    df_fd_temp[df_fd_temp['Customer_Type'].str.contains("Check")]
First, select the rows where the conditions apply (note the parentheses and & instead of and):
df_fd_temp = df_fd_temp[(df_fd_temp['User_ID'] != 'IND') &
                        (df_fd_temp['Incoming_ID'] == 'Airtel') &
                        (df_fd_temp['Incoming_Category'] == 'Foreign') &
                        (df_fd_temp['Outgoing_ID'] == 'Airtel') &
                        (df_fd_temp['Outgoing_Category'] == 'Foreign') &
                        (df_fd_temp['Current_Operation'] == 'NO') &
                        (df_fd_temp['Active'] == 'NO')]
Then, do the group-by logic:
df_grouped = df_fd_temp.groupby(['Customer_No','Rate','Customer_Type'])
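You now have a groupby object, which you can further manipulate and filter (the comparison is done on the column's values, since the in operator on a Series checks its index labels):
df_grouped.filter(lambda x: (x['Customer_Type'] == "Check").all())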
You might need to tweak the group filtering based on what your actual dataset looks like.
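The steps above stop before the Sum(...) columns of the query. Putting the WHERE, GROUP BY, Sum and HAVING parts together might look roughly like the sketch below. The sample rows are invented purely so the example runs, and since HAVING here only tests a grouping column, it is applied as an ordinary row filter after aggregating.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the query.
columns = [
    "Customer_No", "Rate", "Customer_Type", "User_ID",
    "Incoming_ID", "Incoming_Category", "Outgoing_ID", "Outgoing_Category",
    "Current_Operation", "Active",
    "SUM(Incoming_Hours)", "SUM(Incoming_Minutes)", "SUM(Incoming_Seconds)",
]
rows = [
    (1, 0.5, "Check", "USA", "Airtel", "Foreign", "Airtel", "Foreign", "NO", "NO", 1, 10, 30),
    (1, 0.5, "Check", "UK", "Airtel", "Foreign", "Airtel", "Foreign", "NO", "NO", 2, 20, 15),
    (1, 0.5, "Check", "IND", "Airtel", "Foreign", "Airtel", "Foreign", "NO", "NO", 5, 5, 5),
    (2, 0.7, "Cash", "USA", "Airtel", "Foreign", "Airtel", "Foreign", "NO", "NO", 3, 0, 0),
    (3, 0.9, "Check", "USA", "Airtel", "Local", "Airtel", "Foreign", "NO", "NO", 4, 4, 4),
]
df_functional_details = pd.DataFrame(rows, columns=columns)

fd = df_functional_details
# WHERE clause: every condition in parentheses, combined with &.
where = (
    (fd["User_ID"] != "IND")
    & (fd["Incoming_ID"] == "Airtel")
    & (fd["Incoming_Category"] == "Foreign")
    & (fd["Outgoing_ID"] == "Airtel")
    & (fd["Outgoing_Category"] == "Foreign")
    & (fd["Current_Operation"] == "NO")
    & (fd["Active"] == "NO")
)

sum_cols = ["SUM(Incoming_Hours)", "SUM(Incoming_Minutes)", "SUM(Incoming_Seconds)"]

# GROUP BY plus Sum(...): as_index=False keeps the keys as columns.
result = (
    fd[where]
    .groupby(["Customer_No", "Rate", "Customer_Type"], as_index=False)[sum_cols]
    .sum()
)

# HAVING Customer_Type = "Check"
result = result[result["Customer_Type"] == "Check"]
print(result)
```

If the SumOfSUM(...) column names from the query are needed, the three sum columns could be renamed afterwards with DataFrame.rename.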
Further reading:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.filter.html