I am trying to generate a for loop dynamically based on the number of columns in a dataframe.
For example, if my dataframe has 5 columns, the for loop and its variable assignments should be generated accordingly.
if
df_cols = ['USER_ID', 'BLID', 'PACKAGE_NAME', 'PACKAGE_PRICE', 'ENDED_DATE']
and brics is my dataframe
Then
for index, row in brics.iterrows():
    analytics.track(row['USER_ID'], 'Cancelled Subscription', {
        df_cols[1]: row['BLID'],
        df_cols[2]: row['PACKAGE_NAME'],
        df_cols[3]: row['PACKAGE_PRICE'],
        df_cols[4]: row['ENDED_DATE'],
    })
The df_cols entries and the row[...] lookups should be generated based on the number of columns in the dataframe.
For example, if there are only 2 columns in the dataframe, this is how the code should look:
if
df_cols = ['USER_ID', 'BLID']
Then
for index, row in brics.iterrows():
    analytics.track(row['USER_ID'], 'Cancelled Subscription', {
        df_cols[1]: row['BLID'],
    })
I searched SO for this but couldn't find a solution related to dataframes (though an R one is available). Any pointers will be helpful. Thank you.
df_cols = ['USER_ID', 'BLID', 'PACKAGE_NAME', 'PACKAGE_PRICE', 'ENDED_DATE']
for index, row in brics.iterrows():
    analytics.track(row['USER_ID'], 'Cancelled Subscription', {
        df_cols[i]: row[df_cols[i]] for i in range(1, len(df_cols))
    })
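A minimal runnable sketch of the idea above, with a toy dataframe standing in for brics and a stub recording function in place of analytics.track (the real analytics client isn't shown in the question, so both are assumptions):

```python
import pandas as pd

# Toy stand-in for 'brics'; the real data isn't shown in the question.
brics = pd.DataFrame(
    [["u1", "b1", "Gold", 9.99, "2020-01-31"]],
    columns=["USER_ID", "BLID", "PACKAGE_NAME", "PACKAGE_PRICE", "ENDED_DATE"],
)
df_cols = list(brics.columns)

tracked = []

def track(user_id, event, properties):
    # Stub that records the call instead of sending it to an analytics service.
    tracked.append((user_id, event, properties))

for index, row in brics.iterrows():
    # Build the properties dict from every column after USER_ID,
    # however many columns the dataframe happens to have.
    track(row["USER_ID"], "Cancelled Subscription",
          {col: row[col] for col in df_cols[1:]})

print(tracked[0][2])
```

The dict comprehension adapts automatically: with only ['USER_ID', 'BLID'] as columns it produces the two-column version from the question.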
I have a dataframe that I need to convert the Custom Field column rows to columns in a second dataframe. This part I have managed to do and it works fine.
The problem is that I need to add the corresponding values from the id column to the respective columns of the second dataframe.
Here is an example:
This is the first dataframe:
This is the second dataframe, with the columns already converted.
But I would like to add the values corresponding to the id column of the first dataframe to the second dataframe:
Attached is the code:
import pandas as pd

data = {
    "Custom Field": ["CF1", "CF2", "CF3"],
    "id": [50, 40, 45],
    "Name": ["Wilson", "Junior", "Otavio"]
}
### create the dataframe ###
df = pd.DataFrame(data)
print(df)
### add new columns from a list ###
columns_list = []
for x in df['Custom Field']:
    ### create multiple columns with x ###
    columns_list.append(x)
### convert list to new columns ###
df2 = pd.DataFrame(df, columns=columns_list)
df2["Name"] = df["Name"]
print(df2)
### If the Name in df2 equals the Name in df, and the column equals the Custom Field in df, then get the id from df and insert it into the corresponding column of df2. ###
#### First unsuccessful attempt ###
df2_columns_names = list(df2.columns.values)
for df2_name in df2['Name']:
    for df2_cf in df2_columns_names:
        for df_name in df['Name']:
            for df_cf in df['Custom Field']:
                for df_id in df['id']:
                    if df2_name == df_name and df2_cf == df_cf:
                        df2.loc[df2_name, df2_cf] = df_id
print(df2)
Any suggestions?
Thanks in advance.
Use pivot_table:
df.pivot_table(index='Name', columns='Custom Field', values='id')
As a general rule of thumb, if you are doing for loops and changing cells manually, you're using pandas wrong. Explore the methods of the framework in the docs, it can be very powerful :)
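To make the pivot_table suggestion concrete, here is a runnable sketch using the sample data from the question (note that pivot_table aggregates with the mean by default, which is harmless here since each Name/Custom Field pair occurs once):

```python
import pandas as pd

data = {
    "Custom Field": ["CF1", "CF2", "CF3"],
    "id": [50, 40, 45],
    "Name": ["Wilson", "Junior", "Otavio"],
}
df = pd.DataFrame(data)

# One row per Name, one column per Custom Field, cells filled with the id.
df2 = df.pivot_table(index="Name", columns="Custom Field", values="id")
print(df2)
```

Combinations that don't occur in the input (e.g. Wilson/CF2) come out as NaN, which matches the "empty unless there is a corresponding id" shape the question describes.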
I have the following dataframe (New_Data) and I want to add a new column depending on the content of the 'merchant' column. For example, if 'merchant' contains 'AMZ' or 'AMZN', the new column should return 'Amazon'; if 'merchant' contains 'PRIME', it should return 'Video'; and so on until the last row. I would like to do this through a loop.
I have attempted the following, which creates a column, but I don't know how to combine the loop with the conditional logic:
merchantlength = len(New_Data[['Merchant']])
merchantlength
i = 0
for i in range(merchantlength):
    df['newcolumn'] = "1"
New_Data = pd.concat([df], axis=1)
New_Data
Do you really want to use looping?
Have a look at NumPy's select function:
https://numpy.org/doc/stable/reference/generated/numpy.select.html
condlist = [
    df['merchant'] == 'AMZ',
    df['merchant'] == 'PRIME',
]
choicelist = [
    'Amazon',
    'Video',
]
df['newcol'] = np.select(condlist, choicelist)
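Since the question asks for substring matching ('merchant' contains 'AMZ'), the conditions can use str.contains instead of equality; np.select also accepts a default for rows no condition matches. A sketch with hypothetical sample data (the real New_Data frame isn't shown):

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the question's New_Data.
df = pd.DataFrame({"merchant": ["AMZN MKTP", "PRIME VIDEO", "STARBUCKS"]})

condlist = [
    # str.contains does substring matching, so 'AMZ' covers 'AMZN' too.
    df["merchant"].str.contains("AMZ"),
    df["merchant"].str.contains("PRIME"),
]
choicelist = ["Amazon", "Video"]

# The third argument is used for rows where no condition matches.
df["newcol"] = np.select(condlist, choicelist, default="Other")
print(df)
```

The first matching condition wins, so the order of condlist matters when patterns overlap.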
You are trying to add data to the new column based on the value of the "merchant" column, so try this:
values = []
for i in df['merchant']:
    if i == 'AMZ':
        values.append('Amazon')
    elif i == 'PRIME':
        values.append('Video')
After covering every possible category, assign the list as a new column with some name:
df['new_column'] = values
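One caveat with this loop: the assignment df['new_column'] = values raises a ValueError unless the list ends up the same length as the frame, so every row needs a branch. A runnable sketch with an else fallback (the sample data is hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for New_Data.
df = pd.DataFrame({"merchant": ["AMZ", "PRIME", "EBAY"]})

values = []
for i in df["merchant"]:
    if i == "AMZ":
        values.append("Amazon")
    elif i == "PRIME":
        values.append("Video")
    else:
        # Without this fallback, unmatched rows leave 'values' shorter than
        # the frame and the column assignment below fails.
        values.append("Other")

df["new_column"] = values
print(df)
```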
I have a pandas df of 7000 rows × 7 columns, and a list (row_list) of the values I want to filter on.
What I want to do is keep the rows of df that contain a corresponding value from the list.
This is what I got when I tried:
"Empty DataFrame
Columns: [A,B,C,D,E,F,G]
Index: []"
df = pd.read_csv('filename.csv')
df1 = pd.read_csv('filename1.csv', names='A')
row_list = []
for index, rows in df1.iterrows():
    my_list = [rows.A]
    row_list.append(my_list)
boolean_series = df.D.isin(row_list)
filtered_df = df[boolean_series]
print(filtered_df)
replace
boolean_series = df.RightInsoleImage.isin(row_list)
with
boolean_series = df.RightInsoleImage.isin(df1.A)
And let us know the result. If it doesn't work, show a sample of df and df1.A.
(1) generating separate dfs for each condition, concat, then dedup (slow)
(2) a custom function to annotate with bool column (default as False, then annotated True if condition is fulfilled), then filter based on that column
(3) keep a list of indices of all rows with your row_list values, then filter using iloc based on your indices list
Without an MRE, sample data, or a reason why your method didn't work, it's difficult to provide a more specific answer.
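The likely root cause in the code shown: appending [rows.A] builds a list of one-element lists, e.g. [['x'], ['z']], and isin compares each cell against those inner lists, so nothing ever matches and the result is empty. A small sketch with inline data standing in for the two CSV files:

```python
import pandas as pd

# Inline stand-ins for the two CSV files in the question.
df = pd.DataFrame({"D": ["x", "y", "z"], "E": [1, 2, 3]})
df1 = pd.DataFrame({"A": ["x", "z"]})

# Pass the column itself (or any flat iterable of scalars) to isin,
# rather than a list of one-element lists.
boolean_series = df.D.isin(df1.A)
filtered_df = df[boolean_series]
print(filtered_df)
```

This also avoids iterrows entirely, which is both simpler and faster on a 7000-row frame.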
I have a pandas dataframe with only two column names (a single row, which can also be considered the header). I want to make a dictionary out of this, with the first column being the key and the second column being the value. I already tried the to_dict() method, but it doesn't work since the dataframe is empty.
Example
df = |Land|Norway| to {'Land': 'Norway'}
I can change the pandas data frame to some other type and find my way around it, but this question is mostly to learn the best/different/efficient approach for this problem.
For now I have this as the solution :
dict(zip(a.iloc[0:0,0:1],a.iloc[0:0,1:2]))
Is there any other way to do this?
Here's a simple way: convert the columns to a list, then the list to a dictionary.
def list_to_dict(a):
    it = iter(a)
    ret_dict = dict(zip(it, it))
    return ret_dict

df = pd.DataFrame([], columns=['Land', 'Norway'])
dict_val = list_to_dict(df.columns.to_list())
dict_val  # {'Land': 'Norway'}
A very manual solution:
df = pd.DataFrame(columns=['Land', 'Norway'])
df = pd.DataFrame({df.columns[0]: df.columns[1]}, index=[0])
If you have any number of columns and you want each sequential pair to have this transformation, try:
df = pd.DataFrame(dict(zip(df.columns[::2], df.columns[1::2])), index=[0])
Note: You will get an error if your DataFrame does not have at least two columns.
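A quick demo of the sequential-pair version, using a hypothetical four-column header to show the pairing:

```python
import pandas as pd

# Hypothetical header with two key/value pairs.
df = pd.DataFrame(columns=["Land", "Norway", "City", "Oslo"])

# Pair every even-indexed column name with the following odd-indexed one.
pairs = dict(zip(df.columns[::2], df.columns[1::2]))
df2 = pd.DataFrame(pairs, index=[0])
print(df2)
```

With an odd number of columns the trailing name is silently dropped by zip, so it may be worth asserting len(df.columns) % 2 == 0 first.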
I have another problem with joining two dataframes using pandas. I want to merge a complete dataframe into a column/field of another dataframe wherever the foreign-key field of DF2 matches the unique key of DF1.
The input data are 2 CSV files that roughly look like this:
CSV 1 / DF 1:
cid;name;surname;address
1;Mueller;Hans;42553
2;Meier;Peter;42873
3;Schmidt;Micha;42567
4;Pauli;Ulli;98790
5;Dick;Franz;45632
CSV 2 / DF 2:
OID;ticketid;XID;message
1;9;1;fgsgfs
2;8;2;gdfg
3;7;3;gfsfgfg
4;6;4;fgsfdgfd
5;5;5;dgsgd
6;4;5;dfgsgdf
7;3;1;dfgdhfd
8;2;2;dfdghgdh
I want each row of DF2 whose XID matches a cid in DF1 to become a single field in DF1. My final goal is to convert the above input files into a nested JSON format.
Edit 1:
Something like this:
[
    {
        "cid": 1,
        "name": "Mueller",
        "surname": "Hans",
        "address": 42553,
        "ticket": [{
            "OID": 1,
            "ticketid": 9,
            "XID": 1,
            "message": "fgsgfs"
        }]
    },
    ...
]
Edit 2:
Some further thoughts: Would it be possible to create a dictionary of each row in dataframe 2 and then append this dictionary to a new column in dataframe 1 where some value (xid) of the dictionary matches with a unique id in a row (cid) ?
Some pseudocode I have in mind:

Add new column "ticket" in DF1
Iterate over rows in DF2:
    convert row to dictionary
    iterate over DF1:
        find the row where cid == dict.XID
        append the dictionary to its "ticket" field
Convert DF1 to JSON
Non-Python solutions are also acceptable.
Not sure what you expect as output, but check merge:
df1.merge(df2, left_on="cid", right_on="XID", how="left")
[EDIT based on the expected output]
Maybe something like this:
(
    df1.merge(
        df2.groupby("XID").apply(lambda g: g.to_dict(orient="records")).reset_index(name="ticket"),
        how="left", left_on="cid", right_on="XID")
    .drop(["XID"], axis=1)
    .to_json(orient="records")
)
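The merge above can be checked end to end by inlining a subset of the sample rows from the question instead of reading the CSV files:

```python
import json
import pandas as pd

# Subset of the question's sample data, inlined for the sketch.
df1 = pd.DataFrame({"cid": [1, 2], "name": ["Mueller", "Meier"],
                    "surname": ["Hans", "Peter"], "address": [42553, 42873]})
df2 = pd.DataFrame({"OID": [1, 7, 2], "ticketid": [9, 3, 8],
                    "XID": [1, 1, 2], "message": ["fgsgfs", "dfgdhfd", "gdfg"]})

# Collapse df2 to one row per XID whose 'ticket' cell is a list of row dicts.
tickets = (df2.groupby("XID")
              .apply(lambda g: g.to_dict(orient="records"))
              .reset_index(name="ticket"))

result = (df1.merge(tickets, how="left", left_on="cid", right_on="XID")
             .drop(["XID"], axis=1)
             .to_json(orient="records"))
print(result)
```

Each customer row ends up with a "ticket" list holding every matching DF2 row as a dict, which is exactly the nested shape shown in Edit 1.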