I have 3 lists of data frames and I want to add a suffix to each column according to which list of data frames it belongs to. It's all in order, so the first item in the suffix list should be appended to the columns of the data frames in the first list, and so on. I am trying the code below, but it's adding each item in the suffix list to each column.
In the expected output
all columns in dfs in cat_a need group1 appended
all columns in dfs in cat_b need group2 appended
all columns in dfs in cat_c need group3 appended
data and code are here
import numpy as np
import pandas as pd

df1, df2, df3, df4 = (pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('a', 'b')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('c', 'd')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('e', 'f')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('g', 'h')))
cat_a = [df1, df2]
cat_b = [df3, df4, df2]
cat_c = [df1]
suffix =['group1', 'group2', 'group3']
dfs = [cat_a, cat_b, cat_c]
for x, y in enumerate(dfs):
    for i in y:
        suff=suffix
        i.columns = i.columns + '_' + suff[x]
thanks for taking a look!
Brian Joseph's answer is great, but I'd like to point out that you were very close; you just weren't renaming the columns correctly. Your last line should be like this:
i.columns = [col + '_' + suff[x] for col in i.columns]
instead of this:
i.columns = i.columns + '_' + suff[x]
Assuming you want to have multiple suffixes for some dataframes, I think this is what you want?:
suffix_mapper = {
'group1': [df1, df2],
'group2': [df3, df4, df2],
'group3': [df1]
}
for suffix, dfs in suffix_mapper.items():
    for df in dfs:
        df.columns = [f"{col}_{suffix}" for col in df.columns]
I think the issue is that you're not taking copies of the dataframes, so the same df object is referenced from more than one cat list and gets renamed once for each list it appears in.
Try:
cat_a = [df1.copy(), df2.copy()]
cat_b = [df3.copy(), df4.copy(), df2.copy()]
cat_c = [df1.copy()]
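To make the shared-reference effect concrete, here is a minimal sketch. The frames are small made-up examples and the helper name `add_suffixes` is hypothetical; it uses `DataFrame.add_suffix`, which returns a new frame, so no explicit `.copy()` is needed in that variant.

```python
import pandas as pd

def add_suffixes(groups, suffixes):
    """Return new lists of frames with each group's suffix appended to every column label."""
    return [
        [df.add_suffix(f"_{suf}") for df in group]
        for group, suf in zip(groups, suffixes)
    ]

# In-place renaming of a frame that sits in two groups stacks both suffixes
shared = pd.DataFrame({"a": [1], "b": [2]})
for suf, group in zip(["group1", "group3"], [[shared], [shared]]):
    for df in group:
        df.columns = [f"{col}_{suf}" for col in df.columns]
print(list(shared.columns))  # ['a_group1_group3', 'b_group1_group3']

# Building new frames per group avoids that
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"c": [5, 6], "d": [7, 8]})
groups = [[df1, df2], [df2], [df1]]  # df1 and df2 each appear twice on purpose
result = add_suffixes(groups, ["group1", "group2", "group3"])

print(list(result[0][0].columns))  # ['a_group1', 'b_group1']
print(list(result[2][0].columns))  # ['a_group3', 'b_group3']
print(list(df1.columns))           # ['a', 'b'] (original untouched)
```

The trade-off is that the results live in the new nested list rather than in the original variables.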
I just asked a similar question, "rename columns according to list", which has a correct answer for how to add suffixes to column names. But I have a new issue: I want to rename the actual index name of the columns for each dataframe. I have three lists of data frames, and some of the data frames share the same column index name (and the same data frames appear in more than one list, but that's not the issue; the issue is the duplicated original columns.name). I simply want to append a suffix to each dataframe's columns.name within each list, using the name in the suffix list at the same position.
Here is an example of the data and the output I would like:
import numpy as np
import pandas as pd

# add string to end of x in list of dfs
df1, df2, df3, df4 = (pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('a', 'b')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('c', 'd')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('e', 'f')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('g', 'h')))
df1.columns.name = 'abc'
df2.columns.name = 'abc'
df3.columns.name = 'efg'
df4.columns.name = 'abc'
cat_a = [df2, df1]
cat_b = [df3, df2, df1]
cat_c = [df1]
dfs = [cat_a, cat_b, cat_c]
suffix = ['group1', 'group2', 'group3']
# expected output =
#for df in cat_a: df.columns.name = df.columns.name + 'group1'
#for df in cat_b: df.columns.name = df.columns.name + 'group2'
#for df in cat_c: df.columns.name = df.columns.name + 'group3'
And here is some code that I have written that doesn't work: where the columns.name values are duplicated across data frames, multiple suffixes are appended.
for x, df in enumerate(dfs):
    for i in df:
        n = ([(i.columns.name + '_' + str(suffix[x])) for out in i.columns.name])
        i.columns.name=n[x]
Thank you for looking, I really appreciate it.
Your current code is not working as you have multiple references to the same df in your lists, so only the last change matters. You need to make copies.
Assuming you want to set the columns index name of each df in dfs to its group's suffix, you can use a list comprehension. rename_axis returns a new DataFrame, so the originals are left untouched:
dfs = [[d.rename_axis(suffix[i], axis=1) for d in group]
       for i, group in enumerate(dfs)]
output:
>>> dfs[0][0]
group1 c d
0 5 0
1 9 3
2 3 9
3 4 2
4 1 0
5 7 6
6 5 2
7 8 0
8 1 2
9 7 2
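The list comprehension above replaces the columns index name with the suffix. If the goal is to append the suffix to the existing name, as in the question's expected output, one possible variant is sketched below. The small frames and the underscore separator are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df1.columns.name = "abc"
df3 = pd.DataFrame({"e": [5, 6], "f": [7, 8]})
df3.columns.name = "efg"

dfs = [[df1], [df3, df1]]  # df1 appears in both groups
suffix = ["group1", "group2"]

# rename_axis returns a new frame each time, so df1 keeps its original name
renamed = [
    [d.rename_axis(f"{d.columns.name}_{suffix[i]}", axis=1) for d in group]
    for i, group in enumerate(dfs)
]

print(renamed[0][0].columns.name)  # abc_group1
print(renamed[1][0].columns.name)  # efg_group2
print(renamed[1][1].columns.name)  # abc_group2
print(df1.columns.name)            # abc
```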
If you have two Pandas dataframes in Python with identical axes, is there a function to merge the elements as tuples so that they maintain their positions? If there is a better way to combine these dataframes without duplicating the number of indices or columns, that works as well.
Expected logic:
You can do this in pure pandas:
(pd.concat([df1,df2])
.stack()
.groupby(level=[0,1])
.apply(tuple)
.unstack()
)
Output:
A B
0 (1, 7) (4, 10)
1 (2, 8) (5, 11)
2 (3, 9) (6, 12)
Input:
import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3],"B":[4,5,6]})
df2 = pd.DataFrame({"A":[7,8,9],"B":[10,11,12]})
The operation you're looking for seems like "zip". That is, match elements of two sequences together into a sequence of tuples. If you look at each column in your dataframes and zip them together you will have a result that is a list of lists of tuples - what you want to be in your result dataframe. You can then construct a dataframe with the same columns and index out of that data. In code, that looks like this:
data = [list(zip(df1[col], df2[col])) for col in df1]
pd.DataFrame(dict(zip(df1.columns, data)), index=df1.index)
You can use something like this to achieve what you want:
df3 = pd.DataFrame({x: list(zip(df1[x], df2[x])) for x in df1.columns})
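For reference, a self-contained sketch of the zip approach using the input frames from the question. The variable name `paired` is made up here, and passing `index=df1.index` assumes both frames share the same index.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [7, 8, 9], "B": [10, 11, 12]})

# One list of (left, right) tuples per column, keyed by column name
paired = pd.DataFrame(
    {col: list(zip(df1[col], df2[col])) for col in df1.columns},
    index=df1.index,
)

print(paired.loc[0, "A"])  # (1, 7)
print(paired.loc[2, "B"])  # (6, 12)
```

Building the frame from a dict keeps each zipped list as a column; passing the same lists as a plain list of lists would lay them out as rows instead.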
df1 = pd.DataFrame({"A" : [1,2,3], "B":[4,5,6]})
df2 = pd.DataFrame({"A" : [7,8,9], "B":[10,11,12]})
def add_dfs(df1, df2):
    for col in df1.columns:
        df1[col] = df1[col].apply(lambda x: (x,))
    for col in df2.columns:
        df2[col] = df2[col].apply(lambda x: (x,))
    df = df1 + df2 # using + operator , satisfies answer technically
    return df
df = add_dfs(df1, df2)
I have 4 different dfs.
Example of column names:
df1 = a1, b1, c1, d1
df2 = b1, c1, e1
df3 = a1, b1, c1
And I created a dictionary like this:
dict = {a1:art1, b1:base1, c1:cell1, d1:dan1, e1:el1}
Is it possible to rename the columns without using a for loop? I tried the rename function inside a for loop, but then I need to loop through every dataframe; it doesn't look good in the code, and I suspect it's not as fast as it should be.
My expected result is:
df1 = art1, base1, cell1, dan1
df2 = base1, cell1, el1
df3 = art1, base1, cell1
I found some answers on Stack Overflow, but nothing fits my problem, where I have one dictionary and several dfs whose column names are not unique across dataframes.
A comprehension:
d = {'a1': 'art1', 'b1': 'base1', 'c1': 'cell1', 'd1': 'dan1', 'e1': 'el1'}
df1, df2, df3 = [df.rename(columns=d) for df in [df1, df2, df3]]
A simple loop:
for df in [df1, df2, df3]:
    df.rename(columns=d, inplace=True)
Using map:
df1, df2, df3 = list(map(lambda df: df.rename(columns=d), [df1, df2, df3]))
At the end, IMHO the simple loop with inplace=True is the most elegant.
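As a quick check that one shared dictionary is enough even when each frame has only some of the keys, here is a minimal sketch. The empty frames and the `frames` dict are made up for illustration; `rename` ignores dictionary keys that are not present in a frame by default.

```python
import pandas as pd

d = {"a1": "art1", "b1": "base1", "c1": "cell1", "d1": "dan1", "e1": "el1"}

# Empty frames with only column labels are enough to show the renaming
frames = {
    "df1": pd.DataFrame(columns=["a1", "b1", "c1", "d1"]),
    "df2": pd.DataFrame(columns=["b1", "c1", "e1"]),
    "df3": pd.DataFrame(columns=["a1", "b1", "c1"]),
}

renamed = {name: df.rename(columns=d) for name, df in frames.items()}

print(list(renamed["df1"].columns))  # ['art1', 'base1', 'cell1', 'dan1']
print(list(renamed["df2"].columns))  # ['base1', 'cell1', 'el1']
print(list(renamed["df3"].columns))  # ['art1', 'base1', 'cell1']
```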
Basically, I have 5 pandas DataFrames, named df0, df1, df2, df3, df4. What I would like to do is use a for loop to add data to these 5 dataframes. Something like:
for i, dataset in enumerate([df0,df1,df2,df3,df4]):
    dataset = pd.concat([dataset, NEW_DATA])
However, when you do it like this (or when you use just a list instead of enumerate), 'dataset' refers to the dataset itself rather than to its variable name (i.e. df0), so the original variables are never updated. How can I solve this? For example, the second iteration should be equivalent to:
for i, dataset in enumerate([df0,df1,df2,df3,df4]):
    df1 = pd.concat([df1, NEW_DATA])
edit: I have also tried dictionaries, such as {'df0':df0... etc}, however, it again prints the dataset rather than the dataset 'variable name'.
You can re-assign the new df into your list:
import numpy as np
import pandas as pd

# setup example
df0 = pd.DataFrame(np.random.randint(0, 10, (3, 2)))
df1 = pd.DataFrame(np.random.randint(0, 10, (3, 2)))
df2 = pd.DataFrame(np.random.randint(0, 10, (3, 2)))
# then
lst = [df0, df1, df2]
for i, df in enumerate(lst):
    newdata = pd.DataFrame([[0,0], [0,0]]) # (say)
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    lst[i] = pd.concat([df, newdata])
df0, df1, df2 = lst
>>> df0
0 1
0 8 7
1 9 1
2 5 6
0 0 0
1 0 0
But, BTW, it might be better to store your DataFrames collection in a dict instead of a list, if you want to refer to them by name instead of by index.
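A minimal sketch of the dict-based layout, assuming made-up frames, a made-up `new_data`, and a hypothetical `frames` dict. `ignore_index=True` is an optional choice here to renumber the rows after concatenation.

```python
import pandas as pd

# Frames stored by name instead of in separate variables
frames = {
    "df0": pd.DataFrame({"x": [1, 2]}),
    "df1": pd.DataFrame({"x": [3, 4]}),
}
new_data = pd.DataFrame({"x": [0]})

# Reassigning through the key updates the stored frame
for name in frames:
    frames[name] = pd.concat([frames[name], new_data], ignore_index=True)

print(len(frames["df0"]))          # 3
print(frames["df1"]["x"].tolist())  # [3, 4, 0]
```

Assigning to an existing key while looping over the dict is fine because the set of keys does not change.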
Edit: Rewriting the solution to provide some proper practice.
So the problem is you have a bunch of values that need to be updated through reassignment. There's a stylistic thing going on where if you have df1, df2, ..., maybe you'd much rather have them in a list.
Using a list in any case is also how I'd address the issue.
dfs = [df0, df1, df2, ...]
dfs = [pd.concat([df, NEW_DATA]) for df in dfs]
[df0, df1, df2, ...] = dfs
See how, if you'd just use dfs in general and refer to dfs[0] instead of df0, this solution could almost come for free?
I have several df with the same structure. I'd like to create a loop to melt them or create a pivot table.
I tried the following, but neither loop works:
my_df = [df1, df2, df3]
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
for df in my_df:
    df = pd.pivot_table(df, values = 'my_value', index = ['A','B','C'], columns = ['my_column'])
Any help would be great. Thank you in advance
You need to assign the output to a new list of DataFrames:
out = []
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    out.append(df)
Same idea in list comprehension:
out = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
If you need to overwrite the original values in the list:
for i, df in enumerate(my_df):
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    my_df[i] = df
print (my_df)
If you need to overwrite the variables df1, df2, df3:
df1, df2, df3 = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
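Putting both steps from the question together, here is a small runnable sketch. The sample frames and the extra columns `x` and `y` are made up, and `var_name='my_column'` is an assumption so that the pivot step has a `my_column` to spread over.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1], "B": [2], "C": [3], "x": [10], "y": [20]})
df2 = pd.DataFrame({"A": [4], "B": [5], "C": [6], "x": [30], "y": [40]})
my_df = [df1, df2]

# Wide to long: each comprehension builds a new list instead of rebinding the loop variable
melted = [
    pd.melt(df, id_vars=["A", "B", "C"], var_name="my_column", value_name="my_value")
    for df in my_df
]

# Long back to wide, one pivot table per melted frame
pivoted = [
    pd.pivot_table(df, values="my_value", index=["A", "B", "C"], columns=["my_column"])
    for df in melted
]

print(melted[0].shape)           # (2, 5)
print(list(pivoted[0].columns))  # ['x', 'y']
```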