How can I combine two columns in one dataframe? - python

I have a dataset like this.
A B C A2
1 2 3 4
5 6 7 8
and I want to combine A and A2.
A B C
1 2 3
5 6 7
4
8
how can I combine two columns?
Hope for help. Thank you.

I don't think it is possible directly. But you can do it with a few lines of code:
import pandas as pd
df = pd.DataFrame({'A':[1,5],'B':[2,6],'C':[3,7],'A2':[4,8]})
# copy A2 out on its own and rename it so it lines up under A
df_A2 = df[['A2']].copy()
df_A2.columns = ['A']
df = pd.concat([df.drop(['A2'],axis=1),df_A2])
You will get this if you print df:
A B C
0 1 2.0 3.0
1 5 6.0 7.0
0 4 NaN NaN
1 8 NaN NaN

You could append the last column after renaming it (with pd.concat, since DataFrame.append was removed in pandas 2.0):
pd.concat([df, df[['A2']].set_axis(['A'], axis=1)]).drop(columns='A2')
It gives the expected result:
A B C
0 1 2.0 3.0
1 5 6.0 7.0
0 4 NaN NaN
1 8 NaN NaN

If the index is not important to you:
import pandas as pd
pd.concat([df[['A','B','C']], df[['A2']].rename(columns={'A2': 'A'})]).reset_index(drop=True)
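A self-contained sketch of the same idea, assuming the sample frame from the question. The names `extra` and `stacked` are only illustrative, and `ignore_index=True` is used here in place of `reset_index(drop=True)`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 5], 'B': [2, 6], 'C': [3, 7], 'A2': [4, 8]})

# Rename A2 to A so its values land in column A when the frames are stacked.
extra = df[['A2']].rename(columns={'A2': 'A'})

# Stack the renamed column under the original rows.
# B and C have no values for the appended rows, so they become NaN there.
stacked = pd.concat([df.drop(columns='A2'), extra], ignore_index=True)
print(stacked)
```

Because B and C receive NaN in the appended rows, pandas converts those two columns to float.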

Related

How to merge two tables while preserving all values?

I am relatively new to python and I am wondering how I can merge these two tables and preserve both their values?
Consider these two tables:
df = pd.DataFrame([[1, 3], [2, 4],[2.5,1],[5,6],[7,8]], columns=['A', 'B'])
A B
1 3
2 4
2.5 1
5 6
7 8
df2 = pd.DataFrame([[1],[2],[3],[4],[5],[6],[7],[8]], columns=['A'])
A
1
2
...
8
I want to obtain the following result:
A B
1 3
2 4
2.5 1
3 NaN
4 NaN
5 6
6 NaN
7 8
8 NaN
You can see that column A includes all values from both the first and second dataframe in an ordered manner.
I have attempted:
pd.merge(df,df2,how='outer')
pd.merge(df,df2,how='right')
But the former does not result in an ordered dataframe and the latter does not include rows that are unique to df.
Let us use concat, then drop_duplicates:
out = pd.concat([df2,df]).drop_duplicates('A',keep='last').sort_values('A')
print(out)
A B
0 1.0 3.0
1 2.0 4.0
2 2.5 1.0
2 3.0 NaN
3 4.0 NaN
3 5.0 6.0
5 6.0 NaN
4 7.0 8.0
7 8.0 NaN
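For comparison, the outer merge the asker tried can also be put in order by sorting afterwards. This is a minimal sketch, assuming the two frames from the question; casting `df2['A']` to float first is an added step so both key columns share a dtype, and `merged` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [2, 4], [2.5, 1], [5, 6], [7, 8]], columns=['A', 'B'])
df2 = pd.DataFrame([[1], [2], [3], [4], [5], [6], [7], [8]], columns=['A'])

# Match the key dtypes (df['A'] is float because of 2.5).
df2 = df2.astype({'A': 'float64'})

# Outer merge keeps every value of A from both frames;
# sorting afterwards gives the ordered result.
merged = (
    pd.merge(df, df2, on='A', how='outer')
      .sort_values('A')
      .reset_index(drop=True)
)
print(merged)
```

Unlike the concat version, this also yields a clean 0..8 index.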

How to remove observations with missing values for specific columns from pandas DataFrame?

I have a pandas DataFrame containing columns with missing values. I want to remove the observations (rows) that have missing values, but only for specific columns. For example:
A B C D E
2 1 NaN 7 9
1 3 6 NaN 10
NaN 3 11 0 8
And let's say I want to remove observations with missing value for column D. So I want result like this:
A B C D E
2 1 NaN 7 9
NaN 3 11 0 8
Thank you for all suggestions.
Let's try a boolean mask with pd.Series.notna():
df[df.D.notna()]
A B C D E
0 2.0 1 NaN 7.0 9
2 NaN 3 11.0 0.0 8
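The built-in `dropna` with its `subset` parameter should give the same rows. A small sketch, assuming the sample data from the question rebuilt by hand (`kept` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [2, 1, np.nan],
    'B': [1, 3, 3],
    'C': [np.nan, 6, 11],
    'D': [7, np.nan, 0],
    'E': [9, 10, 8],
})

# Drop only the rows where D is missing; NaN in other columns is left alone.
kept = df.dropna(subset=['D'])
print(kept)
```

Passing several column names in `subset` drops a row if any of those columns is missing.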

Compute difference between values in dataframe column

I have this dataframe:
a b c d
4 7 5 12
3 8 2 8
1 9 3 5
9 2 6 4
I want column 'd' to become the difference between the n-th value of column 'a' and the (n+1)-th value of column 'a'.
I tried this, but it doesn't run:
for i in data.index-1:
    data.iloc[i]['d']=data.iloc[i]['a']-data.iloc[i+1]['a']
Can anyone help me?
Basically what you want is diff.
df = pd.DataFrame.from_dict({"a":[4,3,1,9]})
df["d"] = df["a"].diff(periods=-1)
print(df)
Output
a d
0 4 1.0
1 3 2.0
2 1 -8.0
3 9 NaN
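To see what `diff(periods=-1)` computes, it can be written out with `shift`. A minimal sketch on the same sample column:

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 3, 1, 9]})

# shift(-1) moves every value up one row, so row n sees the value of row n+1.
# Subtracting gives a[n] - a[n+1]; the last row has no successor and is NaN.
df['d'] = df['a'] - df['a'].shift(-1)
print(df)
```

Both spellings avoid the chained `data.iloc[i]['d'] = ...` assignment from the question, which writes to a temporary copy rather than to the dataframe.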
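Let's try a simple way with an explicit loop:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})
diff = []
for i in range(len(df) - 1):
    # value at row n minus value at row n+1, as asked in the question
    diff.append(df['a'][i] - df['a'][i + 1])
diff.append(np.nan)  # the last row has no following value
df['d'] = diff
print(df)
a d
0 2 -2.0
1 4 -4.0
2 8 -7.0
3 15 NaN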

Parse columns to reshape dataframe

I have a csv that I import as a dataframe with pandas. The columns are like:
Step1:A Step1:B Step1:C Step1:D Step2:A Step2:B Step2:D Step3:B Step3:D Step3:E
0 1 2 3 4 5 6 7 8 9
Where the step and parameter are separated by ':'. I want to reshape the dataframe to look like this:
Step1 Step2 Step3
A 0 4 nan
B 1 5 7
C 2 nan nan
D 3 6 8
E nan nan 9
Now, if I want to maintain the original column order, consider this case:
Step2:A Step2:B Step2:C Step2:D Step1:A Step1:B Step1:D AStep3:B AStep3:D AStep3:E
0 1 2 3 4 5 6 7 8 9
Here I want the reshaped dataframe to keep that order and look like this:
Step2 Step1 AStep3
A 0 4 nan
B 1 5 7
C 2 nan nan
D 3 6 8
E nan nan 9
Try read_csv with a whitespace separator (delim_whitespace is deprecated since pandas 2.2):
df = pd.read_csv('file.csv', sep=r'\s+')
df.columns = df.columns.str.split(':', expand=True)
df.stack().reset_index(level=0, drop=True)
Output:
Step1 Step2 Step3
A 0.0 4.0 NaN
B 1.0 5.0 7.0
C 2.0 NaN NaN
D 3.0 6.0 8.0
E NaN NaN 9.0
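For the second case in the question, where the step columns should keep their original order, one possible alternative is to go through a long format with `melt` and `pivot`. This is a sketch rather than the answer's method: it assumes the data is read from an in-memory string instead of a file, and the names `raw`, `long`, `parts`, `step_order` and `wide` are illustrative.

```python
import io
import pandas as pd

# Sample data from the question's second case (whitespace-separated).
raw = (
    "Step2:A Step2:B Step2:C Step2:D Step1:A Step1:B Step1:D AStep3:B AStep3:D AStep3:E\n"
    "0 1 2 3 4 5 6 7 8 9\n"
)
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Long format: one row per original column, then split the "Step:Param" labels.
long = df.melt(var_name="col", value_name="value")
parts = long["col"].str.split(":", expand=True)
long["step"] = parts[0]
long["param"] = parts[1]

# Pivot back to wide. The step columns come back sorted,
# so reindex them to the order in which they first appeared.
step_order = long["step"].drop_duplicates().tolist()
wide = long.pivot(index="param", columns="step", values="value").reindex(columns=step_order)
print(wide)
```

`pivot` raises an error if a step/parameter pair occurs more than once, so this sketch assumes a single data row, as in the question.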

Find observations in which both columns are NaN and replace them with 0 in pandas DataFrame

Here is a dataframe
a b c d
nan nan 3 5
nan 1 2 3
1 nan 4 5
2 3 7 9
nan nan 2 3
I want to replace the values in columns 'a' and 'b' with 0, but only in rows where both of them are NaN. Rows 1 and 5 have NaN in both 'a' and 'b', so only those rows should get 0s in those two columns.
So my output must be:
a b c d
0 0 3 5
nan 1 2 3
1 nan 4 5
2 3 7 9
0 0 2 3
There might be an easier built-in function in pandas, but this one should work (using .loc, since .ix was removed in pandas 1.0):
mask = df['a'].isna() & df['b'].isna()
df.loc[mask, ['a', 'b']] = df.loc[mask, ['a', 'b']].fillna(0)
Actually, the solution from @Psidom is much easier to read.
You can create a boolean series based on the conditions on columns a/b, and then use loc to modify corresponding columns and rows:
df.loc[df[['a','b']].isnull().all(1), ['a','b']] = 0
df
# a b c d
#0 0.0 0.0 3 5
#1 NaN 1.0 2 3
#2 1.0 NaN 4 5
#3 2.0 3.0 7 9
#4 0.0 0.0 2 3
Or:
df.loc[df.a.isnull() & df.b.isnull(), ['a','b']] = 0
