Renaming columns of dataframe list in Pandas - python

I have a list with lots of dataframes
col = ['open', 'high', 'low', 'close']
index = [1, 2, 3, 4]
df1 = pd.DataFrame(columns=col, index=index)
df2 = pd.DataFrame(columns=col, index=index)
df3 = pd.DataFrame(columns=col, index=index)
dflist = [df1, df2, df3]
I need to rename all the columns of all the dataframes in the list. I need to add the name of each dataframe to the name of each column. I tried to do it with a for loop.
for key in dflist:
    key.rename(columns=lambda x: key + x)
Obviously, this is not working. The desired output would be:
In [1]: df1.columns.tolist()
Out[1]: ['df1open', 'df1high', 'df1low', 'df1close']
In [2]: df2.columns.tolist()
Out[2]: ['df2open', 'df2high', 'df2low', 'df2close']
In [3]: df3.columns.tolist()
Out[3]: ['df3open', 'df3high', 'df3low', 'df3close']
Thanks for your help.

You want to use a dict instead of a list to store the DataFrames if you need to access their "names" and manipulate them programmatically (think of when you have thousands of them). Also note the use of the inplace argument, which is common in pandas:
import pandas as pd
col = ['open', 'high', 'low', 'close']
index = [1, 2, 3, 4]
df_all = {'df1': pd.DataFrame(columns=col, index=index),
          'df2': pd.DataFrame(columns=col, index=index),
          'df3': pd.DataFrame(columns=col, index=index)}
for key, df in df_all.items():  # iteritems() in Python 2
    df.rename(columns=lambda x: key + x, inplace=True)
print(df_all['df1'].columns.tolist())
Output:
['df1open', 'df1high', 'df1low', 'df1close']

There are a couple of issues here. Firstly, dflist is the list of DataFrames, as opposed to the names of those DataFrames. So df1 is not the same as "df1", which means that key + x isn't a string concatenation.
Secondly, the rename() function returns a new DataFrame. So you have to pass the inplace=True parameter to overwrite the existing column names.
Try this instead:
dflist = ['df1', 'df2', 'df3']
for key in dflist:
    df = eval(key)
    df.rename(columns=lambda x: key + x, inplace=True)

Related

Assign specific value from a column to specific number of rows

I would like to assign agent_code to specific number of rows in df2.
df1
df2
df3 (Output)
Thank you.
First make sure both DataFrames have a default index via DataFrame.reset_index with drop=True, then repeat agent_code, reset to a default index again, and finally use concat:
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
s = df1['agent_code'].repeat(df1['number']).reset_index(drop=True)
df3 = pd.concat([df2, s], axis=1)
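To make the repeat-and-concat steps concrete, here is a runnable sketch with hypothetical stand-in data (the question's original tables aren't reproduced above):

```python
import pandas as pd

# Hypothetical stand-in data; the original question's tables were not shown
df1 = pd.DataFrame({'agent_code': ['A1', 'A2'], 'number': [2, 3]})
df2 = pd.DataFrame({'value': [10, 20, 30, 40, 50]})

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# repeat() duplicates each agent_code by its 'number'; resetting the index
# afterwards lets concat align the repeated codes row-for-row with df2
s = df1['agent_code'].repeat(df1['number']).reset_index(drop=True)
df3 = pd.concat([df2, s], axis=1)
print(df3['agent_code'].tolist())  # ['A1', 'A1', 'A2', 'A2', 'A2']
```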

rename columns according to list

I have 3 lists of data frames and I want to add a suffix to each column according to which list of data frames it belongs to. It's all in order, so the first item in the suffix list should be appended to the columns of the data frames in the first list of data frames, and so on. My attempt is below, but it's adding every item in the suffix list to each column.
In the expected output
all columns in dfs in cat_a need group1 appended
all columns in dfs in cat_b need group2 appended
all columns in dfs in cat_c need group3 appended
data and code are here
df1, df2, df3, df4 = (pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=('a', 'b')),
                      pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=('c', 'd')),
                      pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=('e', 'f')),
                      pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=('g', 'h')))
cat_a = [df1, df2]
cat_b = [df3, df4, df2]
cat_c = [df1]
suffix =['group1', 'group2', 'group3']
dfs = [cat_a, cat_b, cat_c]
for x, y in enumerate(dfs):
    for i in y:
        suff = suffix
        i.columns = i.columns + '_' + suff[x]
thanks for taking a look!
Brian Joseph's answer is great, but I'd like to point out that you were very close; you just weren't renaming the columns correctly. Your last line should be like this:
i.columns = [col + '_' + suff[x] for col in i.columns]
instead of this:
i.columns = i.columns + '_' + suff[x]
Assuming you want to have multiple suffixes for some dataframes, I think this is what you want?:
suffix_mapper = {
    'group1': [df1, df2],
    'group2': [df3, df4, df2],
    'group3': [df1]
}
for suffix, dfs in suffix_mapper.items():
    for df in dfs:
        df.columns = [f"{col}_{suffix}" for col in df.columns]
I think the issue is that you're not taking a copy of each dataframe, so the cat lists end up referencing the same df objects multiple times.
Try:
cat_a = [df1.copy(), df2.copy()]
cat_b = [df3.copy(), df4.copy(), df2.copy()]
cat_c = [df1.copy()]
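A minimal sketch of why the copies matter, using just df2 (which appears in two category lists) and two of the group suffixes from the question:

```python
import pandas as pd
import numpy as np

# Minimal data: df2 is the frame shared between two category lists
df1 = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=('a', 'b'))
df2 = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=('c', 'd'))

# With .copy(), each list holds its own object, so df2's columns get suffixed
# once per copy instead of twice on the same shared object
cat_a = [df1.copy(), df2.copy()]
cat_b = [df2.copy()]
for cat, suffix in zip([cat_a, cat_b], ['group1', 'group2']):
    for df in cat:
        df.columns = [f"{col}_{suffix}" for col in df.columns]

print(cat_a[1].columns.tolist())  # ['c_group1', 'd_group1']
print(cat_b[0].columns.tolist())  # ['c_group2', 'd_group2']
print(df2.columns.tolist())       # ['c', 'd'] -- the original is untouched
```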

python pandas loops to melt or pivot multiple df

I have several df with the same structure. I'd like to create a loop to melt them or create a pivot table.
I tried the following, but they are not working:
my_df = [df1, df2, df3]
for df in my_df:
    df = pd.melt(df, id_vars=['A', 'B', 'C'], value_name='my_value')
for df in my_df:
    df = pd.pivot_table(df, values='my_value', index=['A', 'B', 'C'], columns=['my_column'])
Any help would be great. Thank you in advance
You need to assign the output to a new list of DataFrames:
out = []
for df in my_df:
    df = pd.melt(df, id_vars=['A', 'B', 'C'], value_name='my_value')
    out.append(df)
The same idea as a list comprehension:
out = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
If you need to overwrite the original values in the list:
for i, df in enumerate(my_df):
    df = pd.melt(df, id_vars=['A', 'B', 'C'], value_name='my_value')
    my_df[i] = df
print(my_df)
If you need to overwrite the variables df1, df2, df3:
df1, df2, df3 = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
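For a concrete run, here is a sketch with hypothetical frames (the question only shows the id_vars A, B, C; the value columns are made up here):

```python
import pandas as pd

# Hypothetical frames matching the shape implied by the question
df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y'], 'C': [0.1, 0.2],
                    'v1': [10, 20], 'v2': [30, 40]})
df2 = df1.copy()
my_df = [df1, df2]

# melt keeps A, B, C as identifiers and stacks the remaining columns
out = [pd.melt(df, id_vars=['A', 'B', 'C'], value_name='my_value') for df in my_df]
print(out[0].columns.tolist())  # ['A', 'B', 'C', 'variable', 'my_value']
print(out[0].shape)             # (4, 5): 2 rows x 2 value columns
```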

Changing pandas dataframe by reference

I have two large DataFrames that I don't want to make copies of, but want to apply the same change to. How can I do this properly? For example, this is similar to what I want to do, but on a smaller scale. This only creates the temporary variable df that gives the result of each DataFrame, but I want both DataFrames to be themselves changed:
import pandas as pd
df1 = pd.DataFrame({'a':[1,2,3]})
df2 = pd.DataFrame({'a':[0,1,5,7]})
for df in [df1, df2]:
    df = df[df['a'] < 3]
We can use query with inplace=True:
df1 = pd.DataFrame({'a':[1,2,3]})
df2 = pd.DataFrame({'a':[0,1,5,7]})
for df in [df1, df2]:
    df.query('a < 3', inplace=True)
df1
   a
0  1
1  2
df2
   a
0  0
1  1
I don't think this is the best solution, but it should do the job.
import pandas as pd
df1 = pd.DataFrame({'a':[1,2,3]})
df2 = pd.DataFrame({'a':[0,1,5,7]})
dfs = [df1, df2]
for i, df in enumerate(dfs):
    dfs[i] = df[df['a'] < 3]
dfs[0]
   a
0  1
1  2

Python panda search for value in a df from another df

I’ve got two data frames:
Df1
Time V1 V2
02:00 D3F3 0041
02:01 DD34 0040
Df2
FileName V1 V2
1111.txt D3F3 0041
2222.txt 0000 0040
Basically I want to compare the V1 and V2 columns and, if they match, print the Time from the df1 row and the FileName from the df2 row. So far all I can find is isin(), which simply gives you a boolean output.
So the output would be :
1111.txt 02:00
I started using dataframes because I thought I could query the two df's on the V1/V2 values, but I can't see a way. Any pointers would be much appreciated.
Use merge on the dataframe columns that you want to have the same values. You can then drop the rows with NaN values, as those will not have matching values. From there, you can print the merged dataframes values however you see fit.
df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})
df1.merge(df2, on=['V1', 'V2'], how='outer').dropna()
=== Output: ===
Time V1 V2 fn
1 10p 2 4 10.txt
The most intuitive solution is:
1) iterate the V1 column in DF1;
2) for each item in this column, check if this item exists in the V1 column of DF2;
3) if the item exists in DF2's V1, then find the index of that item in the DF2 and then you would be able to find the file name.
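A literal sketch of those three steps, using the sample values from the question (checking both V1 and V2, since the expected output matches on the pair):

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'], 'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0041'], 'V2': ['0041', '0040']})

# Walk df1 row by row, look the (V1, V2) pair up in df2, report any match
matches = []
for _, row in df1.iterrows():
    hit = df2[(df2['V1'] == row['V1']) & (df2['V2'] == row['V2'])]
    for fname in hit['FileName']:
        matches.append((fname, row['Time']))
print(matches)  # [('1111.txt', '02:00')]
```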
You can try using pd.concat.
On this case it would be like:
pd.concat([df1, df2.reindex(df1.index)], axis=1)
It will create a new dataframe with all the values, but where values don't match in both dataframes it'll return NaN. If you don't want that to happen, use an inner join instead:
pd.concat([df1, df2], axis=1, join='inner')
If you wanna learn a bit more, see the pandas user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
You can use merge option with inner join
df2.merge(df1,how="inner",on=["V1","V2"])[["FileName","Time"]]
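Run against the sample frames from the question, this inner join yields exactly the expected pairing:

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'], 'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'], 'V2': ['0041', '0040']})

# Inner join keeps only rows where both V1 and V2 match across the frames
res = df2.merge(df1, how="inner", on=["V1", "V2"])[["FileName", "Time"]]
print(res.values.tolist())  # [['1111.txt', '02:00']]
```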
While I think Eric's solution is more pythonic, if your only aim is to print the rows on which df1 and df2 have v1 and v2 values the same, provided the two dataframes are of the same length, you can do the following:
for row in range(len(df1)):
    if (df1.iloc[row, 1:] == df2.iloc[row, 1:]).all():
        print(df1.iloc[row], df2.iloc[row])
Try this:
client = boto3.client('s3')
obj = client.get_object(Bucket='', Key='')
data = obj['Body'].read()
df1 = pd.read_excel(io.BytesIO(data), sheet_name='0')
df2 = pd.read_excel(io.BytesIO(data), sheet_name='1')
head = df2.columns[0]
print(head)
data = df1.iloc[[8],[0]].values[0]
print(data)
print(df2)
df2.columns = df2.iloc[0]
df2 = df2.drop(labels=0, axis=0)
df2['Head'] = head
df2['ID'] = pd.Series([data,data])
print(df2)
df2.to_csv('test.csv',index=False)
