Change values in DataFrame - .iloc vs .loc - python

Hey, I wrote this code:
import pandas as pd
d1 = {"KEY": ["KEY1", "KEY2", "KEY3"], "value": ["A", "B", "C"]}
df1 = pd.DataFrame(d1)
df1["value 2"] = "nothing"
d2 = {"KEY": ["KEY2"], "value_alternative": ["D"]}
df2 = pd.DataFrame(d2)
for k in range(3):
    key = df1.iloc[k]["KEY"]
    print(key)
    if key in list(df2["KEY"]):
        df1.iloc[k]["value 2"] = df2.loc[df2["KEY"] == key, "value_alternative"].item()
    else:
        df1.iloc[k]["value 2"] = df1.iloc[k]["value"]
but unfortunately the values in df1["value 2"] haven't changed :( I rewrote it as follows:
import pandas as pd
d1 = {"KEY": ["KEY1", "KEY2", "KEY3"], "value": ["A", "B", "C"]}
df1 = pd.DataFrame(d1)
df1["value 2"] = "nothing"
d2 = {"KEY": ["KEY2"], "value_alternative": ["D"]}
df2 = pd.DataFrame(d2)
for k in range(3):
    key = df1.iloc[k]["KEY"]
    print(key)
    if key in list(df2["KEY"]):
        df1.loc[k, "value 2"] = df2.loc[df2["KEY"] == key, "value_alternative"].item()
    else:
        df1.loc[k, "value 2"] = df1.iloc[k]["value"]
and then everything works fine, but I don't understand why the previous method doesn't work. What is the easiest way to change a value in a dataframe in a loop?

First of all, don't use a for loop with dataframes unless you really, really have to.
Just use a boolean array to filter your dataframe with loc and assign your values that way.
You can do what you want with a simple merge.
df1 = df1.merge(df2, on='KEY', how='left').rename(columns={'value_alternative': 'value 2'})
df1.loc[df1['value 2'].isna(), 'value 2'] = df1['value']
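The reason the iloc version doesn't work with assignment is chained indexing: df1.iloc[k] first builds a new Series for row k, which is generally a copy rather than a view of df1, and ["value 2"] = ... then modifies that temporary Series instead of df1 (pandas usually warns about this pattern). To write into the underlying data, select the row and the column in a single indexing call, as in df1.loc[k, "value 2"] = ... (k works as a label here because df1 has the default RangeIndex 0, 1, 2). iloc also works this way if you give positions for both, e.g. df1.iloc[k, df1.columns.get_loc("value 2")]. Don't forget that loc and iloc do different things: loc looks at the labels of the index, while iloc looks at integer positions.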
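If you really do need a loop, a minimal sketch of the same logic could use DataFrame.at, the scalar accessor that takes a row label and a column label in one call and therefore writes into df1 directly (DataFrame.iat is the position-based equivalent). The `alternatives` dictionary is an assumption of this sketch: it replaces the repeated df2.loc[...] lookup inside the loop with a plain dict lookup.

```python
import pandas as pd

df1 = pd.DataFrame({"KEY": ["KEY1", "KEY2", "KEY3"], "value": ["A", "B", "C"]})
df1["value 2"] = "nothing"
df2 = pd.DataFrame({"KEY": ["KEY2"], "value_alternative": ["D"]})

# Hypothetical helper: KEY -> value_alternative lookup, built once outside the loop.
alternatives = dict(zip(df2["KEY"], df2["value_alternative"]))

for k in df1.index:
    key = df1.at[k, "KEY"]
    # One indexing call (row label, column label), so df1 itself is modified.
    df1.at[k, "value 2"] = alternatives.get(key, df1.at[k, "value"])

print(df1["value 2"].tolist())  # ['A', 'D', 'C']
```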
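For the merge solution to work you also have to delete the
df1["value 2"] = "nothing"
line from your program; otherwise the renamed value_alternative column would clash with the existing "value 2" column.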
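Put together, the merge approach might look like the following sketch. It starts from data without the "nothing" line and uses Series.fillna in place of the .loc assignment to fill in the keys that have no alternative. It assumes KEY is unique in df2; duplicate keys there would duplicate rows of df1 in the left merge.

```python
import pandas as pd

df1 = pd.DataFrame({"KEY": ["KEY1", "KEY2", "KEY3"], "value": ["A", "B", "C"]})
df2 = pd.DataFrame({"KEY": ["KEY2"], "value_alternative": ["D"]})

# Left merge keeps every row of df1 in order; keys missing from df2 get NaN.
df1 = df1.merge(df2, on="KEY", how="left").rename(columns={"value_alternative": "value 2"})
# Fill the gaps from the original "value" column.
df1["value 2"] = df1["value 2"].fillna(df1["value"])
print(df1)
```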

Related

Calling a list of df column names to a function based on dictionary keys

I would like to select columns of a pd.DataFrame, but only the ones whose names are keys of a dictionary. I have multiple Excel template files and their column names vary, so certain column names need to be removed. For reproducibility, I attached a sample below.
import pandas as pd
filename = 'template'
data = [['Auto', '', '', '']]
df = pd.DataFrame(data, columns=['industry', 'System_Type__c', 'AccountType', 'email'])
valid = {'industry': ['Automotive'],
         'SME Vertical': ['Agriculture'],
         'System_Type__c': ['Access'],
         'AccountType': ['Commercial']}
# keep only the dictionary entries whose key is a column of df
valid = {k: v for k, v in valid.items() if k in df.columns.values}
errors = {}
errors[filename] = {}
df1 = df[['industry', 'System_Type__c', 'AccountType']]
mask = df1.apply(lambda c: c.isin(valid[c.name]))
df1.mask(mask | df1.eq(' ')).stack()
# Series.iteritems() was removed in pandas 2.0; Series.items() is the equivalent
for err_i, (r, v) in enumerate(df1.mask(mask | df1.eq(' ')).stack().items()):
    errors[filename][err_i] = {"row": r[0],
                               "column": r[1],
                               "message": v + " is invalid check column " + r[1] + ' and replace with a standard value'}
I would like df1 to be built from a more dynamic list of columns.
How would I replace this piece of code to be more dynamic?
df1= df[['industry','System_Type__c','AccountType', 'SME Vertical']]
# desired output: drop 'SME Vertical', since it is not a column of df
df1= df[['industry','System_Type__c','AccountType']]
# list() of a dictionary returns its keys
# then select those columns from df and assign the result to df1
df1=df[list(valid)]
df1
  industry System_Type__c AccountType
0     Auto
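If you would rather not pre-filter `valid` first, a small sketch of two ways to select only the dictionary keys that are actual columns: a list comprehension that keeps the dictionary's key order, and DataFrame.filter(items=...), which silently skips labels that are not present. Both use the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame([["Auto", "", "", ""]],
                  columns=["industry", "System_Type__c", "AccountType", "email"])
valid = {"industry": ["Automotive"], "SME Vertical": ["Agriculture"],
         "System_Type__c": ["Access"], "AccountType": ["Commercial"]}

# Option 1: keep only the keys that exist as columns, in dictionary order.
df1 = df[[col for col in valid if col in df.columns]]

# Option 2: DataFrame.filter(items=...) ignores labels that are missing from df.
df1_alt = df.filter(items=list(valid))

print(list(df1.columns))  # ['industry', 'System_Type__c', 'AccountType']
```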

python define values for each data-frame if they meet a condition

I have 5 different data frames that are the output of different conditions or tables.
I want an output showing whether these data frames are empty or not. Basically, I will check each data frame with len(df) and produce a string if it has anything in it.
def check(df1, df2, df3, df4, df5):
    if len(df1) > 0:
        s1 = "df1 not empty"
    else:
        s1 = ""
    if len(df2) > 0:
        s2 = "df2 not empty"
    else:
        s2 = ""
    # ... and so on for df3, df4, df5
Then I want to join these strings together, so that I end up with a string like
df1 not empty, df3 not empty
Try this:
import pandas as pd
dfs = {'milk': pd.DataFrame(['a']), 'bread': pd.DataFrame(['b']), 'potato': pd.DataFrame()}
print(''.join(
[f'{name} not empty. ' for name, df in dfs.items() if (not df.empty)])
)
output:
milk not empty. bread not empty.
import pandas as pd
data = [1, 2, 3]
df = pd.DataFrame(data, columns=['col1'])  # create a non-empty df
data1 = []
df1 = pd.DataFrame(data1)  # create an empty df
dfs = [df, df1]  # list them
# the "for loop" is replaced here by a list comprehension
# I used enumerate() to give each df in the list an index, because printing df0 or df1 directly would show the entire dataframe, not just its name
print(' '.join([f'df{i} is not empty.' for i, df in enumerate(dfs) if not df.empty]))
Result:
df0 is not empty.
With a one-liner:
dfs = [df1,df2,df3,df4,df5]
output = ["your string here" for df in dfs if not df.empty]
You can then concatenate strings together, if you want:
final_string = "; ".join(output)
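To get exactly the string asked for (with each frame's own name), one option is to pass the frames as keyword arguments, so the names travel with them. The function name `non_empty_report` is hypothetical. Note that df.empty is also True for a frame that has rows but no columns, whereas len(df) only counts rows.

```python
import pandas as pd

def non_empty_report(**frames):
    """Return e.g. 'df1 not empty, df3 not empty' for the non-empty frames.

    Frames are passed as keyword arguments so that their names can be reported.
    """
    return ", ".join(f"{name} not empty" for name, frame in frames.items() if not frame.empty)

df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame()
df3 = pd.DataFrame({"b": [2, 3]})
report = non_empty_report(df1=df1, df2=df2, df3=df3)
print(report)  # df1 not empty, df3 not empty
```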

Turning dictionary key into an element of dataframe

My code can get the job done but I know it is not a good way to handle it.
The input is thisdict and the output is shown at the end.
Can you help me make it more efficient?
import pandas as pd
thisdict = {
    "A": {'v1': '3', 'v2': 5},
    "B": {'v1': '77', 'v2': 99},
    "ZZ": {'v1': '311', 'v2': 152}
}
output = pd.DataFrame()
for key, value in thisdict.items():
    # turn value to df
    test2 = pd.DataFrame(value.items(), columns=['item', 'value'])
    test2['id'] = key
    # transpose
    test2 = test2.pivot(index='id', columns='item', values='value')
    # concat
    output = pd.concat([output, test2])
output
You can use:
output = pd.DataFrame.from_dict(thisdict, orient='index')
or
output = pd.DataFrame(thisdict).T
and if you wish, rename the index by:
output.index.rename('id', inplace=True)
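A self-contained version of the from_dict suggestion is sketched below; rename_axis("id") is used here as a non-inplace alternative to index.rename. One difference between the two options: from_dict(orient="index") builds each column from its own values, so v2 stays an integer column, whereas pd.DataFrame(thisdict).T goes through object columns (each original column mixes strings and ints), so both v1 and v2 end up with object dtype.

```python
import pandas as pd

thisdict = {
    "A": {"v1": "3", "v2": 5},
    "B": {"v1": "77", "v2": 99},
    "ZZ": {"v1": "311", "v2": 152},
}

# Outer keys become the row index, inner keys become the columns.
output = pd.DataFrame.from_dict(thisdict, orient="index").rename_axis("id")
print(output)
print(output.dtypes)

# The transpose route gives the same values, but every column has object dtype.
transposed = pd.DataFrame(thisdict).T
```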

how to convert a column containing lists of nested dictionaries into a dataframe?

I have a column whose values are lists of dictionaries.
The dictionaries have three keys, createdBy, createdAt and notes, for different candidates, and I need to get them into separate columns: I need to expand this candidateNotes column into three columns, one per dictionary key. There is a date and time in each candidateNotes dictionary; using the date and time for each candidate id (which is in a separate column), I need to update the status in separate columns. I have tried the following, but it creates 6 columns and I need only 3:
df2 = pd.DataFrame.from_records(df1['candidateNotes']).add_prefix('s')
df2 = df2.s1.apply(pd.Series).add_prefix('') \
.merge(df2, left_index = True, right_index = True)
df2 = df2.s0.apply(pd.Series).add_prefix('') \
.merge(df2, left_index = True, right_index = True)
import pandas as pd
# create a dataframe whose rows each hold a list with one dictionary
cf = pd.DataFrame([[[{'createdat':'abc','createdby':'asas','createdon':'lklj'}]],[[{'createdat':'aaaaa','createdby':'asas','createdon':'lklj'}]]])
# take the first dictionary from each row's list and build one row per dictionary
# (DataFrame.append was removed in pandas 2.0, so collect the rows in a list first)
rows = [cf[0][i][0] for i in range(cf.shape[0])]
df = pd.DataFrame(rows)
cf = pd.DataFrame([[[{ "createdAt": "2019-07-09T21:47:59.748Z", "notes": "Candidate initial submission.", "createdBy": "Akash D" }]],[[ { "createdAt": "2019-07-09T21:47:59.748Z", "note": "Candidate initial submission.", "createdBy": "Akash D","demo":"abc" } ]]], columns=['CandidateNotes'])
print(cf)
rows = []
for i in range(cf.shape[0]):
    try:
        rows.append(cf['CandidateNotes'][i][0])
    except (IndexError, TypeError):
        # print the position of rows whose notes list is empty or missing
        print(i)
df = pd.DataFrame(rows)
df
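A vectorized alternative, sketched under the assumption that every row holds a non-empty list of dictionaries with the same keys: DataFrame.explode gives one row per note (keeping the candidate id alongside it), and pd.json_normalize turns the dictionaries into createdAt / notes / createdBy columns. The candidateId column and all sample values except the first note are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one candidate id per row, each with a list of note dicts.
df1 = pd.DataFrame({
    "candidateId": [1, 2],
    "candidateNotes": [
        [{"createdAt": "2019-07-09T21:47:59.748Z", "notes": "Candidate initial submission.", "createdBy": "Akash D"}],
        [{"createdAt": "2019-07-10T08:00:00.000Z", "notes": "Phone screen.", "createdBy": "Akash D"},
         {"createdAt": "2019-07-11T09:30:00.000Z", "notes": "Onsite.", "createdBy": "Akash D"}],
    ],
})

# One row per note; ignore_index gives a fresh 0..n-1 index for alignment below.
exploded = df1.explode("candidateNotes", ignore_index=True)
# Expand each dictionary into its own columns (same 0..n-1 index).
notes = pd.json_normalize(exploded["candidateNotes"].tolist())
result = pd.concat([exploded[["candidateId"]], notes], axis=1)
# Parse the timestamps so notes can be sorted or compared per candidate.
result["createdAt"] = pd.to_datetime(result["createdAt"])
print(result)
```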

Rename variously formatted column headers in pandas

I'm working on a small tool that does some calculations on a dataframe, let's say something like this:
df['column_c'] = df['column_a'] + df['column_b']
For this to work, the dataframe needs to have the columns 'column_a' and 'column_b'. I would like this code to work even if the columns are named slightly differently in the imported file (csv or xlsx), for example 'columnA', 'Col_a', etc.
The easiest way would be to rename the columns inside the imported file, but let's assume this is not possible. Therefore I would like to do something like this:
if column name is in list ['columnA', 'Col_A', 'col_a', 'a'... ] rename it to 'column_a'
I was thinking about having a dictionary with possible column names, when a column name would be in this dictionary it will be renamed to 'column_a'. An additional complication would be the fact that the columns can be in arbitrary order.
How would one solve this problem?
I recommend you formulate the conversion logic and write a function accordingly:
lst = ['columnA', 'Col_A', 'col_a', 'a']
def converter(x):
    return 'column_' + x[-1].lower()
res = list(map(converter, lst))
['column_a', 'column_a', 'column_a', 'column_a']
You can then use this directly in pd.DataFrame.rename:
df = df.rename(columns=converter)
Example usage:
df = pd.DataFrame(columns=['columnA', 'col_B', 'c'])
df = df.rename(columns=converter)
print(df.columns)
Index(['column_a', 'column_b', 'column_c'], dtype='object')
Simply
new_columns = list(df.columns)
for index, column_name in enumerate(new_columns):
    if column_name in ['columnA', 'Col_A', 'col_a']:
        new_columns[index] = 'column_a'
# a pandas Index is immutable, so assign the whole list back
df.columns = new_columns
with dictionary
dico = {'column_a':['columnA', 'Col_A', 'col_a' ], 'column_b':['columnB', 'Col_B', 'col_b' ]}
new_columns = list(df.columns)
for index, column_name in enumerate(new_columns):
    for name, ex_names in dico.items():
        if column_name in ex_names:
            new_columns[index] = name
df.columns = new_columns
This should solve it:
df = pd.DataFrame({'colA': [1, 2], 'columnB': [3, 4]})
def rename_df(col):
    if col in ['columnA', 'Col_A', 'colA']:
        return 'column_a'
    if col in ['columnB', 'Col_B', 'colB']:
        return 'column_b'
    return col
df = df.rename(rename_df, axis=1)
If you have lists of other names, such as list_othername_A or list_othername_B, you can do:
for col_name in df.columns:
    if col_name in list_othername_A:
        df = df.rename(columns={col_name: 'column_a'})
    elif col_name in list_othername_B:
        df = df.rename(columns={col_name: 'column_b'})
    # ... add an elif branch for each further target column
EDIT: using the dictionary from @djangoliv, you can make it even shorter:
dico = {'column_a':['columnA', 'Col_A', 'col_a' ], 'column_b':['columnB', 'Col_B', 'col_b' ]}
#create a dict to rename, kind of reverse dico:
dict_rename = {col:key for key in dico.keys() for col in dico[key]}
# then just rename:
df = df.rename(columns = dict_rename )
Note that this method breaks if df has both 'columnA' and 'Col_A' (both would be renamed to 'column_a', giving duplicate column names). Otherwise it should work, because rename ignores any key in dict_rename that is not in df.columns.
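Building on the dictionary idea, a sketch that also tolerates case, underscores and hyphens by normalizing both the aliases and the actual headers before matching, and that raises instead of silently producing duplicate names. `ALIASES`, `normalize` and `canonicalize_columns` are hypothetical names, and the normalization rule (lower-case, keep only letters and digits) is an assumption you may want to tighten.

```python
import re
import pandas as pd

# Hypothetical mapping: canonical name -> accepted spellings.
ALIASES = {
    "column_a": ["columnA", "Col_A", "col_a", "a"],
    "column_b": ["columnB", "Col_B", "col_b", "b"],
}

def normalize(name):
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^0-9a-z]", "", str(name).lower())

def canonicalize_columns(df, aliases=ALIASES):
    """Return a copy of df with known aliases renamed to their canonical names."""
    lookup = {normalize(alias): canon for canon, names in aliases.items() for alias in names}
    new_names = [lookup.get(normalize(col), col) for col in df.columns]
    duplicates = {n for n in new_names if new_names.count(n) > 1}
    if duplicates:
        raise ValueError(f"ambiguous columns after renaming: {sorted(duplicates)}")
    out = df.copy()
    out.columns = new_names
    return out

df = pd.DataFrame({"COL-A": [1, 2], "columnB": [3, 4], "other": [0, 0]})
df = canonicalize_columns(df)
df["column_c"] = df["column_a"] + df["column_b"]
print(df)
```

Column order does not matter here, since each header is looked up on its own.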
