Sum values in different rows and columns of a DataFrame in Python

My Data Frame
A B C D
2 3 4 5
1 4 5 6
5 6 7 8
How do I add values from different rows and different columns? For example:
Column A, row 2 with column B, row 1
Column A, row 3 with column B, row 2
and similarly for all rows.

If you only need to do this with two columns (and if I understand your question correctly), I think you can use the shift function.
Your data frame (pandas?) is something like:
import pandas as pd
d = {'A': [2, 1, 5], 'B': [3, 4, 6], 'C': [4, 5, 7], 'D': [5, 6, 8]}
df = pd.DataFrame(data=d)
So it's possible to create a new Series with the B column shifted down by one row:
df2 = df['B'].shift(1)
which gives:
0 NaN
1 3.0
2 4.0
Name: B, dtype: float64
and then join this new Series to the original df and, for example, sum the values:
df = df.join(df2, rsuffix='shift')
df['out'] = df['A'] + df['Bshift']
The final output is in out column:
A B C D Bshift out
0 2 3 4 5 NaN NaN
1 1 4 5 6 3.0 4.0
2 5 6 7 8 4.0 9.0
But this is only a guess; I'm not sure I understood your question!
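The same sum can also be written without the intermediate join. This is only a minimal sketch, assuming the data frame is the pandas one built above and that the result should again go into a column called out:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 5], 'B': [3, 4, 6], 'C': [4, 5, 7], 'D': [5, 6, 8]})

# Add each value of A to the value of B from the previous row.
# The first row has no previous row, so its result is NaN.
df['out'] = df['A'] + df['B'].shift(1)
print(df)
```

If the NaN in the first row is unwanted, shift also accepts a fill_value argument, for example shift(1, fill_value=0).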

Related

Select rows of pandas dataframe in order of a given list with repetitions and keep the original index

After looking here and here and in the documentation, I still cannot find a way to select rows from a DataFrame according to all these criteria:
Return rows in an order given from a list of values from a given column
Return repeated rows (associated with repeated values in the list)
Preserve the original indices
Ignore values of the list not present in the DataFrame
As an example, let
df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
df
A B
0 5 1
1 6 2
2 3 3
3 4 5
and let
list_of_values = [3, 4, 6, 4, 3, 8]
Then I would like to get the following DataFrame:
A B
2 3 3
3 4 5
1 6 2
3 4 5
2 3 3
How can I accomplish that? Zero's answer looks promising as it is the only one I found which preserves the original index, but it does not work with repetitions. Any ideas about how to modify/generalize it?
We have to preserve the index by assigning it as a column first, so that we can set_index after the merge:
list_of_values = [3, 4, 6, 4, 3, 8]
df2 = pd.DataFrame({'A': list_of_values, 'order': range(len(list_of_values))})
dfn = (
df.assign(idx=df.index)
.merge(df2, on='A')
.sort_values('order')
.set_index('idx')
.drop('order', axis=1)
)
A B
idx
2 3 3
3 4 5
1 6 2
3 4 5
2 3 3
If you want to get rid of the index name (idx), use rename_axis:
dfn = dfn.rename_axis(None)
A B
2 3 3
3 4 5
1 6 2
3 4 5
2 3 3
Here's a way to do that using merge (note that the original index is not preserved):
list_df = pd.DataFrame({"A": list_of_values, "order": range(len(list_of_values))})
pd.merge(list_df, df, on="A").sort_values("order").drop("order", axis=1)
The output is:
A B
0 3 3
2 4 5
4 6 2
3 4 5
1 3 3
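For short lists, a more direct alternative is to select the matching rows once per requested value and concatenate the pieces. This is a sketch rather than a tuned solution: the names pieces and result are made up for illustration, and because it scans the frame once per list element it will likely be slower than the merge approach for long lists.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 4, 6, 4, 3, 8]

# One boolean selection per requested value, in list order.
# Each selection keeps the original index labels of the rows it returns.
# A value that is absent from column A (8 here) selects zero rows and drops out.
pieces = [df[df['A'] == value] for value in list_of_values]
result = pd.concat(pieces)
print(result)
```

Repeated values in the list simply produce repeated selections, which is what yields the duplicated rows.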

Add two columns of different dataframes taking into account missing values

How can I add columns of two dataframes (A + B), so that the result (C) takes into account missing values ('---')?
DataFrame A
a = pd.DataFrame({'A': [1, 2, 3, '---', 5]})
A
0 1
1 2
2 3
3 ---
4 5
DataFrame B
b = pd.DataFrame({'B': [3, 4, 5, 6, '---']})
B
0 3
1 4
2 5
3 6
4 ---
Desired Result of A+B
C
0 4
1 6
2 8
3 ---
4 ---
Replace the '---' with np.nan, add the columns, and fill the resulting NaN values with '---':
(a['A'].replace('---', np.nan)+b['B'].replace('---', np.nan)).fillna('---')
You can assign the result to a new dataframe or an existing one:
df = pd.DataFrame()
df.assign(C = (a['A'].replace('---', np.nan)+b['B'].replace('---', np.nan)).fillna('---'))
OR
a.assign(C = (a['A'].replace('---', np.nan)+b['B'].replace('---', np.nan)).fillna('---'))
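An alternative that does not depend on the exact placeholder string is to coerce both columns to numbers first. The sketch below assumes that every non-numeric entry should be treated as missing; the names num_a, num_b, total and c are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3, '---', 5]})
b = pd.DataFrame({'B': [3, 4, 5, 6, '---']})

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
num_a = pd.to_numeric(a['A'], errors='coerce')
num_b = pd.to_numeric(b['B'], errors='coerce')

# The sum is NaN wherever either input was missing.
total = num_a + num_b

# Put the placeholder back; the column has to be object dtype to hold strings.
c = total.astype(object).where(total.notna(), '---')
print(c.tolist())
```

Note that the numeric results come back as floats (4.0 rather than 4), because NaN forces a floating-point column during the addition.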

How do I transpose dataframe in pandas without index?

Pretty sure this is very simple.
I am reading a csv file and have the dataframe:
Attribute A B C
a 1 4 7
b 2 5 8
c 3 6 9
I want to do a transpose to get
Attribute a b c
A 1 2 3
B 4 5 6
C 7 8 9
However, when I do df.T, it results in
0 1 2
Attribute a b c
A 1 2 3
B 4 5 6
C 7 8 9
How do I get rid of the indexes on top?
You can set the index to your first column (or, in general, the column you want to use as the index) first, then transpose the dataframe. For example, if the column you want to use as the index is 'Attribute', you can do:
df.set_index('Attribute',inplace=True)
df.transpose()
Or
df.set_index('Attribute').T
It works for me:
>>> data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
>>> df = pd.DataFrame(data, index=['a', 'b', 'c'])
>>> df.T
a b c
A 1 2 3
B 4 5 6
C 7 8 9
If your index column 'Attribute' really is set as the index before the transpose, then the top row after the transpose is not a data row but a header row. If you don't like it, I would first drop the index and then rename the columns after the transpose.

Augment DataFrame index

I want to write a column ('b') from one DataFrame (df2) into another (df1). Both DataFrames use the same index column, but the range of df2's index goes a bit further and it's missing some of the indices of df1.
This is the current behaviour:
>>> import pandas as pd
>>> pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
a b
0 1 4
1 2 5
2 3 6
>>>
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df1 = df.set_index(['a'])
>>> df1
b
a
1 4
2 5
3 6
>>> dg = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]})
>>> dg
a b
0 3 7
1 4 8
2 5 9
>>> df2 = dg.set_index('a')
>>> df2
b
a
3 7
4 8
5 9
>>> df1['b'] = df2['b']
>>> df1
b
a
1 NaN
2 NaN
3 7.0
When I call df1['b'] = df2['b'], the values at indices not in df2 become NaN, and the indices of df2 that aren't in df1 are not carried over into df1.
Is there any way to change this behaviour so that the resulting DataFrame is the below?
>>> df1
b
a
1 1
2 2
3 7
4 8
5 9
This is a use case for combine_first. It will prioritize the calling dataframe and fill in any missing values with the second. It will also concatenate rows from the second data frame that don't have labels in the first.
df2.combine_first(df1)
Another option is to reindex() df2 over the union of both indices and then fill the missing values from df1:
df2 = df2.reindex(df1.index.union(df2.index))
df2['b'] = df2['b'].fillna(df1['b'])
df2
# b
#a
#1 4.0
#2 5.0
#3 7.0
#4 8.0
#5 9.0
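For reference, here is the combine_first approach run end to end on freshly built frames, that is, before df1['b'] has been overwritten as in the question's session. The name merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).set_index('a')
df2 = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]}).set_index('a')

# df2 is the caller, so its value wins where both frames share a label (a == 3).
# Labels that exist only in df1 (1 and 2) are filled in from df1.
merged = df2.combine_first(df1)
print(merged)
```

Swapping the two frames, df1.combine_first(df2), would keep df1's value for label 3 instead.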

pandas rearrange dataframe to have all values in ascending order per every column independently

The title should say it all: I want to turn this DataFrame:
A NaN 4 3
B 2 1 4
C 3 4 2
D 4 2 8
into this DataFrame:
A 2 1 2
B 3 2 3
C 4 4 4
D NaN 4 8
And I want to do it in a nice manner. The ugly solution would be to take every column and form a new DataFrame.
To test, use:
d = {'one':[None, 2, 3, 4],
'two':[4, 1, 4, 2],
'three':[3, 4, 6, 8],}
df = pd.DataFrame(d, index = list('ABCD'))
The desired sort ignores the index values, so the operation appears to be more
like a NumPy operation than a Pandas one:
import pandas as pd
d = {'one':[None, 2, 3, 4],
'two':[4, 1, 4, 2],
'three':[3, 4, 6, 8],}
df = pd.DataFrame(d, index = list('ABCD'))
# one three two
# A NaN 3 4
# B 2 4 1
# C 3 6 4
# D 4 8 2
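arr = df.to_numpy(copy=True)  # work on a copy so the in-place sort cannot touch df
arr.sort(axis=0)  # sort each column independently; NaN ends up last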
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df)
yields
one three two
A 2 3 1
B 3 4 2
C 4 6 4
D NaN 8 4
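A pandas-only variant sorts each column on its own and reattaches the original index. This is a sketch, with sorted_df as a made-up name; unlike the array approach it keeps each column's own dtype, and on current Python versions the columns come out in insertion order (one, two, three).

```python
import pandas as pd

d = {'one': [None, 2, 3, 4],
     'two': [4, 1, 4, 2],
     'three': [3, 4, 6, 8]}
df = pd.DataFrame(d, index=list('ABCD'))

# Sort every column separately (NaN goes last by default), strip the
# shuffled labels with to_numpy(), and put the original index back on.
sorted_df = pd.DataFrame(
    {col: df[col].sort_values().to_numpy() for col in df.columns},
    index=df.index,
)
print(sorted_df)
```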
