Python Pandas concatenate dataframes and rename index

I have three dataframes, all of which have 50 columns and one row. The same column names are used in each dataframe, and the single row is always indexed as 0. I'm trying to concatenate them to make viewing and comparing the data easier.
features = pd.concat([raw_features, fea_features, transformed_features], axis=0)
Now I want to rename the rows. I've tried several things including:
features = pd.concat([raw_features, fea_features, transformed_features], axis=0).reindex(['Raw_pulltest', 'FEA', 'Transformed_pulltest'])
which gives the error "cannot reindex from a duplicate axis"
and
features = pd.concat([raw_features, fea_features, transformed_features], axis=0).reset_index().reindex(['Raw_pulltest', 'FEA', 'Transformed_pulltest'])
which gives me the structure I want, except all the values are now NaN.
Please can you help me rename the index on the concatenated dataframe?

Use the keys parameter in pd.concat:
Try this:
pd.concat([raw_features, fea_features, transformed_features],
          axis=0, keys=['Raw_pulltest', 'FEA', 'Transformed_pulltest']) \
    .reset_index(level=1, drop=True)
Example:
import pandas as pd

d1 = pd.DataFrame([[1, 1, 1]], index=[0])
d2 = pd.DataFrame([[2, 2, 2]], index=[0])
d3 = pd.DataFrame([[3, 3, 3]], index=[0])
pd.concat([d1, d2, d3], axis=0, keys=['d1', 'd2', 'd3']).reset_index(level=1, drop=True)
Output:
    0  1  2
d1  1  1  1
d2  2  2  2
d3  3  3  3
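If you'd rather keep the plain concat call, you can also just overwrite the index afterwards. A minimal sketch, assuming the three frames really do have exactly one row each (the tiny stand-in frames below are hypothetical, not the asker's data):
import pandas as pd

# hypothetical stand-ins for raw_features, fea_features, transformed_features
raw_features = pd.DataFrame([[1, 2]], columns=['a', 'b'])
fea_features = pd.DataFrame([[3, 4]], columns=['a', 'b'])
transformed_features = pd.DataFrame([[5, 6]], columns=['a', 'b'])

features = pd.concat([raw_features, fea_features, transformed_features], axis=0)
# replace the duplicated 0/0/0 index with descriptive labels;
# the list length must match the number of rows in the concatenated frame
features.index = ['Raw_pulltest', 'FEA', 'Transformed_pulltest']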

Related

add/combine columns after searching in a DataFrame

I'm trying to copy data from different columns to a particular column in the same DataFrame.
(sample DataFrame with columns col1A, col2A, colB, list, CT, CW and CH; rows indexed 0-3 contain values such as 1, b, 2 and 3d)
But prior to that I want to check whether those columns (col1A, col2A, colB) exist in the DataFrame, group the columns that are present, and move the grouped data to the relevant columns (CT, CH, etc.), like:
   CH  CW  CT
0   1        1
1   b        b
2   2        2
3  3d       3d
I did,
col_list1 = ['ColA','ColB','ColC']
test1 = any([i in df.columns for i in col_list1])
if test1 == True:
    df['CH'] = df['Col1A'] + df['Col2A']
    df['CT'] = df['ColB']
This code is throwing a KeyError. I want it to ignore the columns that are not present and add only those that are present.
IIUC, you can use a Python set or Series.isin to find the common columns:
cols = list(set(col_list1) & set(df.columns))
# or
cols = df.columns[df.columns.isin(col_list1)]
df['CH'] = df[cols].sum(axis=1)
Instead of just concatenating the columns with +, collect the ones that exist into a list and sum them element-wise:
import numpy as np

df['CH'] = np.sum([df[c] for c in col_list1 if c in df], axis=0)
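A minimal, self-contained sketch of the isin approach, using made-up column names and values rather than the asker's actual data:
import pandas as pd

# hypothetical sample frame: Col2A is deliberately missing
df = pd.DataFrame({'Col1A': [1, 2], 'ColB': [10, 20]})
col_list1 = ['Col1A', 'Col2A', 'ColB']

# keep only the listed columns that actually exist in the frame
cols = df.columns[df.columns.isin(col_list1)]

# row-wise sum over just those columns; absent columns are simply skipped
df['CH'] = df[cols].sum(axis=1)
print(df)
#    Col1A  ColB  CH
# 0      1    10  11
# 1      2    20  22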

Add multi level column to dataframe

At the beginning, I'd like to add a multilevel column to an empty dataframe.
df = pd.DataFrame({"nodes": list(range(1, 5, 2))})
df.set_index("nodes", inplace=True)
So this is the dataframe to start with (still empty):
>>> df
nodes
1
3
Now I'd like to add a first multilevel column.
I tried the following:
new_df = pd.DataFrame.from_dict(dict(zip(df.index, [1, 2])), orient="index",
                                columns=["value"])
df = pd.concat([new_df], axis=1, keys=["test"])
Now the dataframe df looks like this:
>>> df
   test
  value
1      1
3      2
To add another column, I've done something similar.
new_df2 = pd.DataFrame.from_dict(dict(zip(df.index, [3, 4])), orient="index",
                                 columns=[("test2", "value2")])
df = pd.concat([df, new_df2], axis=1)
df.index.name = "nodes"
So the desired dataframe looks like this:
>>> df
       test  test2
nodes value value2
1         1      3
3         2      4
This way of adding multilevel columns seems a bit strange. Is there a better way of doing so?
Create a MultiIndex on the columns by storing your DataFrames in a dict and then concatenating along axis=1. The keys of the dict become levels of the column MultiIndex (tuple keys add multiple levels, depending on their length, while scalar keys add a single level) and the DataFrame columns stay as they are. Alignment is enforced on the row Index.
import pandas as pd
d = {}
d[('foo', 'bar')] = pd.DataFrame({'val': [1,2,3]}).rename_axis(index='nodes')
d[('foo2', 'bar2')] = pd.DataFrame({'val2': [4,5,6]}).rename_axis(index='nodes')
d[('foo2', 'bar1')] = pd.DataFrame({'val2': [7,8,9]}).rename_axis(index='nodes')
pd.concat(d, axis=1)
       foo foo2
       bar bar2 bar1
       val val2 val2
nodes
0        1    4    7
1        2    5    8
2        3    6    9
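Applied to the frames from the question, a minimal sketch of the dict approach (building the two single-column frames directly rather than via from_dict, purely for brevity):
import pandas as pd

df = pd.DataFrame({"nodes": [1, 3]}).set_index("nodes")

new_df = pd.DataFrame({"value": [1, 2]}, index=df.index)
new_df2 = pd.DataFrame({"value2": [3, 4]}, index=df.index)

# the dict keys become the top column level; each frame's own columns become the second level
out = pd.concat({"test": new_df, "test2": new_df2}, axis=1)
print(out)
#        test  test2
#       value value2
# nodes
# 1         1      3
# 3         2      4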

Is there a Python function to repeat string patterns to get multiple columns from different dataframes faster

I'm new to Python and I need to get many variables from multiple dataframes.
I wrote this code, but it takes me a long time to configure it for many exercises.
This is the code:
import pandas as pd
df = pd.concat([df1[df1.columns[0]], df2[df1.columns[0]], df1[df1.columns[1]],
                df2[df1.columns[1]], df1[df1.columns[2]], df2[df1.columns[2]],
                df1[df1.columns[3]], df2[df1.columns[3]], df1[df1.columns[4]],
                df2[df1.columns[4]], df1[df1.columns[5]], df2[df1.columns[5]],
                df1[df1.columns[6]], df2[df1.columns[6]]], axis=1)
The number of dataframes and columns can be much bigger. Thanks.
It looks like what you're trying to do is: for every column in one dataframe, pair that column with the same column from the other dataframe, ending up with a single dataframe that has two copies of every column, in the original column order.
In your case:
import pandas as pd

df1 = pd.DataFrame([['a', 'b', 'c'], ['d', 'e', 'f']])
df2 = pd.DataFrame([['g', 'h', 'i'], ['j', 'k', 'l']])
df = pd.concat([s for ss in [(df1[c], df2[c]) for c in df1.columns] for s in ss], axis=1)
print(df)
Result:
   0  0  1  1  2  2
0  a  g  b  h  c  i
1  d  j  e  k  f  l
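An alternative sketch that avoids the nested comprehension, assuming the column labels sort in the order you want (true for the 0, 1, 2, ... labels here): tag each frame with keys, swap the column levels, sort on the original labels, then drop the tag level again.
import pandas as pd

df1 = pd.DataFrame([['a', 'b', 'c'], ['d', 'e', 'f']])
df2 = pd.DataFrame([['g', 'h', 'i'], ['j', 'k', 'l']])

out = (pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])    # tag the source of each column
         .swaplevel(axis=1)                                  # put the original labels on top
         .sort_index(axis=1, level=0, sort_remaining=False)  # interleave df1/df2 per label
         .droplevel(1, axis=1))                              # back to plain duplicated labels
print(out)
#    0  0  1  1  2  2
# 0  a  g  b  h  c  i
# 1  d  j  e  k  f  l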

Sum of columns from two data frames that contain float values

I have two data frames. The column names are the same in both data frames, and I want to sum the float values of the matching columns. For that I can use
df3 = df1.add(df2)
However, my dataframes also contain two columns of strings, and these strings get added too. How can I write the code so that it adds the floats from the two data frames but not the strings?
The two sample dataframes are as follows:
df1 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[1,2,3,4]),index=[0,1,2,3])
df2 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[3,1,2,4]),index=[0,1,2,3])
When I used df3 = df1.add(df2), it also added the strings in the "Team" column, as follows:
  Team  Value
0   AA      4
1   BB      3
2   CC      5
3   DD      8
How can I write the code so that it adds Value but not Team?
Thanks,
Zep
Use the team names as indices instead of integer indices:
In [2]: df1 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[1,2,3,4])).set_index('Team')
...: df2 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[3,1,2,4])).set_index('Team')
In [3]: df1 + df2
Out[3]:
      Value
Team
A         4
B         3
C         5
D         8
In case you have multiple other columns, just sum the columns:
total = df1['Value'] + df2['Value']
If, in addition, you need a dataframe of the same shape as df1 and df2 with Value replaced by the sum, you can do
df3 = df1.copy()
df3['Value'] = total
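If there are several numeric columns and you'd rather not list them one by one, here's a sketch using select_dtypes, assuming both frames share the same column layout (the small frames below are made up):
import pandas as pd

df1 = pd.DataFrame(dict(Team=['A', 'B'], Code=['x', 'y'], Value=[1.0, 2.0]))
df2 = pd.DataFrame(dict(Team=['A', 'B'], Code=['x', 'y'], Value=[3.0, 1.0]))

num_cols = df1.select_dtypes('number').columns   # only the numeric columns

df3 = df1.copy()                                 # keep the string columns from df1
df3[num_cols] = df1[num_cols] + df2[num_cols]
print(df3)
#   Team Code  Value
# 0    A    x    4.0
# 1    B    y    3.0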

Added column to existing dataframe but all values entered as NaN

So I created two dataframes from existing CSV files, both consisting entirely of numbers. The second dataframe has an index from 0 to 8783 and one column of numbers, and I want to add it as a new column to the first dataframe, whose index consists of a month, day and hour. I tried using append, merge and concat and none of them worked, so then I tried simply using:
x1GBaverage['Power'] = x2_cut
where x1GBaverage is the first dataframe and x2_cut is the second. When I did this it added the column, but all the values were entered as NaN instead of the numerical values they should be. How should I be approaching this?
x1GBaverage['Power'] = x2_cut.values
problem solved :)
The thing about pandas is that values are implicitly linked to their indices unless you deliberately specify that you only need the values to be transferred over.
If the row counts are the same and you just want to tack it on the end, the indexes either need to match, or you need to pass just the underlying values. In the example below, columns 3 and 5 are the index-matching and values-only versions, and column 4 is what you're running into now:
In [58]: df = pd.DataFrame(np.random.random((3,3)))
In [59]: df
Out[59]:
0 1 2
0 0.670812 0.500688 0.136661
1 0.185841 0.239175 0.542369
2 0.351280 0.451193 0.436108
In [61]: df2 = pd.DataFrame(np.random.random((3,1)))
In [62]: df2
Out[62]:
0
0 0.638216
1 0.477159
2 0.205981
In [64]: df[3] = df2
In [66]: df.index = ['a', 'b', 'c']
In [68]: df[4] = df2
In [70]: df[5] = df2.values
In [71]: df
Out[71]:
0 1 2 3 4 5
a 0.670812 0.500688 0.136661 0.638216 NaN 0.638216
b 0.185841 0.239175 0.542369 0.477159 NaN 0.477159
c 0.351280 0.451193 0.436108 0.205981 NaN 0.205981
If the row counts differ, you'll need to use df.merge and let it know which columns it should be using to join the two frames.
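For the situation in the question, where the two frames have the same number of rows but completely different indexes, another option is to copy the index across before assigning. A rough sketch with made-up stand-ins for the CSV data (the column names here are hypothetical apart from 'Power'):
import pandas as pd

x1GBaverage = pd.DataFrame({'Wind': [1.0, 2.0, 3.0]},
                           index=['Jan-01 00:00', 'Jan-01 01:00', 'Jan-01 02:00'])
x2_cut = pd.DataFrame({'Power': [10.0, 20.0, 30.0]})   # plain 0..n-1 index

x2_cut.index = x1GBaverage.index        # make the indexes match, row for row
x1GBaverage['Power'] = x2_cut['Power']  # now the assignment aligns cleanly
print(x1GBaverage)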
