How can I combine two columns in one dataframe?

I have a dataset like this.
A B C A2
1 2 3 4
5 6 7 8
I want to combine A and A2, so that the values of A2 are placed under the values of A:
A B C
1 2 3
5 6 7
4
8
How can I combine the two columns?
Hoping for help. Thank you.

I don't think it is possible directly. But you can do it with a few lines of code:
import pandas as pd
df = pd.DataFrame({'A':[1,5],'B':[2,6],'C':[3,7],'A2':[4,8]})
df_A2 = df[['A2']]
df_A2.columns = ['A']
df = pd.concat([df.drop(['A2'],axis=1),df_A2])
You will get this if you print df:
A B C
0 1 2.0 3.0
1 5 6.0 7.0
0 4 NaN NaN
1 8 NaN NaN

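You could append the last column after renaming it. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat:
pd.concat([df, df[['A2']].set_axis(['A'], axis=1)]).drop(columns='A2')
It gives the expected result: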
A B C
0 1 2.0 3.0
1 5 6.0 7.0
0 4 NaN NaN
1 8 NaN NaN

If the index is not important to you:
import pandas as pd
pd.concat([df[['A','B','C']], df[['A2']].rename(columns={'A2': 'A'})]).reset_index(drop=True)
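
Putting the pieces above together, a minimal self-contained sketch might look like this. It assumes the sample frame from the question; the names extra and combined are only illustrative, and ignore_index=True is used here as a shorthand for the reset_index(drop=True) step.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': [1, 5], 'B': [2, 6], 'C': [3, 7], 'A2': [4, 8]})

# Rename A2 to A so that concat lines it up under the existing A column
extra = df[['A2']].rename(columns={'A2': 'A'})  # illustrative name

# Stack the renamed column below the rest; B and C are filled with NaN
combined = pd.concat([df.drop(columns='A2'), extra], ignore_index=True)
print(combined)
```

Because B and C have no values for the appended rows, pandas converts them to float so they can hold NaN.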

Related

How to merge two tables while preserving all values?

I am relatively new to Python and I am wondering how I can merge these two tables while preserving the values of both.
Consider these two tables:
df = pd.DataFrame([[1, 3], [2, 4],[2.5,1],[5,6],[7,8]], columns=['A', 'B'])
A B
1 3
2 4
2.5 1
5 6
7 8
df2 = pd.DataFrame([[1],[2],[3],[4],[5],[6],[7],[8]], columns=['A'])
A
1
2
...
8
I want to obtain the following result:
A B
1 3
2 4
2.5 1
3 NaN
4 NaN
5 6
6 NaN
7 8
8 NaN
You can see that column A includes all values from both the first and second dataframe in an ordered manner.
I have attempted:
pd.merge(df,df2,how='outer')
pd.merge(df,df2,how='right')
But the former does not result in an ordered dataframe and the latter does not include rows that are unique to df.

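Let us use concat, then drop_duplicates with keep='last' so that the rows from df win over the matching rows from df2, and finally sort by A: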
out = pd.concat([df2,df]).drop_duplicates('A',keep='last').sort_values('A')
Out[96]:
A B
0 1.0 3.0
1 2.0 4.0
2 2.5 1.0
2 3.0 NaN
3 4.0 NaN
3 5.0 6.0
5 6.0 NaN
4 7.0 8.0
7 8.0 NaN
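
As a possible alternative, the outer merge the asker already tried can be followed by a sort. The sketch below assumes the two frames from the question; casting df2['A'] to float is an extra step added here so that both key columns share a dtype, and reset_index(drop=True) is optional.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [2, 4], [2.5, 1], [5, 6], [7, 8]], columns=['A', 'B'])
df2 = pd.DataFrame([[1], [2], [3], [4], [5], [6], [7], [8]], columns=['A'])

# An outer merge on the shared column 'A' keeps the keys from both frames;
# sorting afterwards puts them in ascending order.
out = (
    pd.merge(df, df2.astype({'A': float}), how='outer', on='A')
    .sort_values('A')
    .reset_index(drop=True)
)
print(out)
```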

How to remove observations with missing values for specific columns from pandas DataFrame?

I have a pandas DataFrame containing columns with missing values. I want to remove the observations (rows) that have missing values, but only for specific columns. For example:
A B C D E
2 1 NaN 7 9
1 3 6 NaN 10
NaN 3 11 0 8
Let's say I want to remove the observations with a missing value in column D, so the result should look like this:
A B C D E
2 1 NaN 7 9
NaN 3 11 0 8
Thank you for all suggestions.

Let's try a boolean mask built with pd.Series.notna():
df[df.D.notna()]
A B C D E
0 2.0 1 NaN 7.0 9
2 NaN 3 11.0 0.0 8
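
The same filter can also be written with DataFrame.dropna and its subset parameter. This is a small sketch that rebuilds the sample frame from the question; the name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [2, 1, np.nan],
    'B': [1, 3, 3],
    'C': [np.nan, 6, 11],
    'D': [7, np.nan, 0],
    'E': [9, 10, 8],
})

# Drop only the rows where D is missing; NaN in other columns is kept
result = df.dropna(subset=['D'])
print(result)
```

To check several columns at once, pass more names in the list, for example subset=['C', 'D'].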

Compute difference between values in dataframe column

I have this dataframe:
a b c d
4 7 5 12
3 8 2 8
1 9 3 5
9 2 6 4
I want column 'd' to become the difference between the n-th value of column 'a' and the (n+1)-th value of column 'a'.
I tried this but it doesn't run:
for i in data.index-1:
    data.iloc[i]['d']=data.iloc[i]['a']-data.iloc[i+1]['a']
Can anyone help me?

Basically what you want is diff.
df = pd.DataFrame.from_dict({"a":[4,3,1,9]})
df["d"] = df["a"].diff(periods=-1)
print(df)
Output
a d
0 4 1.0
1 3 2.0
2 1 -8.0
3 9 NaN
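
To see what diff(periods=-1) computes, the same column can be built by hand with shift. This is only a sketch of the equivalent arithmetic on the sample data above.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 3, 1, 9]})

# shift(-1) moves every value up one row, so row n lines up with a[n+1];
# the last row has no successor and becomes NaN
df["d"] = df["a"] - df["a"].shift(-1)
print(df)
```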

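Let's try a simple way with an explicit loop:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})
diff = []
# subtract the next value from the current one, as asked in the question
for i in range(len(df) - 1):
    diff.append(df['a'][i] - df['a'][i + 1])
# the last row has no next value
diff.append(np.nan)
df['d'] = diff
print(df)
a d
0 2 -2.0
1 4 -4.0
2 8 -7.0
3 15 NaN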

Parse columns to reshape dataframe

I have a csv that I import as a dataframe with pandas. The columns are like:
Step1:A Step1:B Step1:C Step1:D Step2:A Step2:B Step2:D Step3:B Step3:D Step3:E
0 1 2 3 4 5 6 7 8 9
Where the step and parameter are separated by ':'. I want to reshape the dataframe to look like this:
Step1 Step2 Step3
A 0 4 nan
B 1 5 7
C 2 nan nan
D 3 6 8
E nan nan 9
Now, suppose I also want to maintain the sequential order of the columns, as in this case:
Step2:A Step2:B Step2:C Step2:D Step1:A Step1:B Step1:D AStep3:B AStep3:D AStep3:E
0 1 2 3 4 5 6 7 8 9
Where the step and parameter are separated by ':'. I want to reshape the dataframe to look like this:
Step2 Step1 AStep3
A 0 4 nan
B 1 5 7
C 2 nan nan
D 3 6 8
E nan nan 9

Try read_csv with whitespace as the separator (delim_whitespace=True is deprecated since pandas 2.2; sep=r'\s+' is equivalent):
df = pd.read_csv('file.csv', sep=r'\s+')
df.columns = df.columns.str.split(':', expand=True)
df.stack().reset_index(level=0, drop=True)
output:
Step1 Step2 Step3
A 0.0 4.0 NaN
B 1.0 5.0 7.0
C 2.0 NaN NaN
D 3.0 6.0 8.0
E NaN NaN 9.0
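
For the second case in the question, where the steps should stay in the order they appear in the file, one possible approach is to go through a long table and reorder the columns afterwards. This is a sketch under assumptions: csv_text is a hypothetical stand-in for the file, and the names steps, params, long and wide are only illustrative.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file, using the second header from the question
csv_text = (
    "Step2:A Step2:B Step2:C Step2:D Step1:A Step1:B Step1:D "
    "AStep3:B AStep3:D AStep3:E\n"
    "0 1 2 3 4 5 6 7 8 9\n"
)
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# Split each 'Step:Param' label into its two parts
steps = [name.split(":")[0] for name in df.columns]
params = [name.split(":")[1] for name in df.columns]

# Long table with one row per (step, parameter) pair, taken from the single data row
long = pd.DataFrame({"step": steps, "param": params, "value": df.iloc[0].to_numpy()})

# pivot sorts the column labels, so reorder them by first appearance in the file
wide = long.pivot(index="param", columns="step", values="value")
wide = wide[list(dict.fromkeys(steps))]
print(wide)
```

dict.fromkeys keeps the first occurrence of each step name in insertion order, which is what restores the Step2, Step1, AStep3 sequence.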

Find observations in which both columns are NaN and replace them with 0 in pandas DataFrame

Here is a dataframe
a b c d
nan nan 3 5
nan 1 2 3
1 nan 4 5
2 3 7 9
nan nan 2 3
I want to replace the values in columns 'a' and 'b' with 0, but only in rows where both of them are NaN. Rows 1 and 5 have NaN in both 'a' and 'b', so I want to replace only those rows with 0s in those two columns.
so my output must be
a b c d
0 0 3 5
nan 1 2 3
1 nan 4 5
2 3 7 9
0 0 2 3

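There might be an easier built-in function in pandas, but this one should work (.ix was removed in pandas 1.0, so .loc is used here, and only the matching rows are assigned):
mask = df.a.isna() & df.b.isna()
df.loc[mask, ['a', 'b']] = df.loc[mask, ['a', 'b']].fillna(0)
Actually, the solution from @Psidom is much easier to read.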

You can create a boolean series based on the conditions on columns a/b, and then use loc to modify corresponding columns and rows:
df.loc[df[['a','b']].isnull().all(1), ['a','b']] = 0
df
# a b c d
#0 0.0 0.0 3 5
#1 NaN 1.0 2 3
#2 1.0 NaN 4 5
#3 2.0 3.0 7 9
#4 0.0 0.0 2 3
Or:
df.loc[df.a.isnull() & df.b.isnull(), ['a','b']] = 0
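
For anyone who wants to run this end to end, here is a small self-contained sketch. It rebuilds the sample frame from the question; the name both_nan is only illustrative, and isna is used as the alias of isnull.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': [np.nan, np.nan, 1, 2, np.nan],
    'b': [np.nan, 1, np.nan, 3, np.nan],
    'c': [3, 2, 4, 7, 2],
    'd': [5, 3, 5, 9, 3],
})

# True only for rows where 'a' and 'b' are both NaN
both_nan = df[['a', 'b']].isna().all(axis=1)

# Overwrite just those cells; rows with a single NaN are left untouched
df.loc[both_nan, ['a', 'b']] = 0
print(df)
```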
