I'm stuck and need some help. I have the following dataframe:
+-----+---+---+
|     | A | B |
+-----+---+---+
| 288 | 1 | 4 |
+-----+---+---+
| 245 | 2 | 3 |
+-----+---+---+
| 543 | 3 | 6 |
+-----+---+---+
| 867 | 1 | 9 |
+-----+---+---+
| 345 | 2 | 7 |
+-----+---+---+
| 122 | 3 | 8 |
+-----+---+---+
| 233 | 1 | 1 |
+-----+---+---+
| 346 | 2 | 6 |
+-----+---+---+
| 765 | 3 | 3 |
+-----+---+---+
Column A has repeating values as shown. For each row, I want to add a new column C holding the B value from the next row that has the same value in column A (NaN when there is no later occurrence), as shown below:
+-----+---+---+-----+
| | A | B | C |
+-----+---+---+-----+
| 288 | 1 | 4 | 9 |
+-----+---+---+-----+
| 245 | 2 | 3 | 7 |
+-----+---+---+-----+
| 543 | 3 | 6 | 8 |
+-----+---+---+-----+
| 867 | 1 | 9 | 1 |
+-----+---+---+-----+
| 345 | 2 | 7 | 6 |
+-----+---+---+-----+
| 122 | 3 | 8 | 3 |
+-----+---+---+-----+
| 233 | 1 | 1 | NaN |
+-----+---+---+-----+
| 346 | 2 | 6 | NaN |
+-----+---+---+-----+
| 765 | 3 | 3 | NaN |
+-----+---+---+-----+
Thanks.
Assuming that val is one of the repeated values,
slice = df.loc[df.A == val, 'B'].shift(-1)
will create a Series in which each value has been shifted up to the index of the previous occurrence of val (the last occurrence gets NaN).
Since the slices cover disjoint index values, you can use pandas.concat to stitch them together without fear of losing data, then attach the result as a new column:
df['C'] = pd.concat([df.loc[df['A'] == x, 'B'].shift(-1) for x in [1, 2, 3]])
When the column is assigned, the index values will make everything line up:
A B C
0 1 4 9.0
1 2 3 7.0
2 3 6 8.0
3 1 9 1.0
4 2 7 6.0
5 3 8 3.0
6 1 1 NaN
7 2 6 NaN
8 3 3 NaN
Reverse the dataframe order, apply a groupby transform with the shift function, and reverse it back:
df = df[::-1]
df['C'] = df.groupby(df.columns[0]).transform('shift')
df = df[::-1]
df
A B C
0 1 4 9.0
1 2 3 7.0
2 3 6 8.0
3 1 9 1.0
4 2 7 6.0
5 3 8 3.0
6 1 1 NaN
7 2 6 NaN
8 3 3 NaN
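For reference, the same result can be obtained in one line, without reversing the frame or listing the repeated values by hand; a minimal sketch assuming the columns are named A and B as above:
df['C'] = df.groupby('A')['B'].shift(-1)
Within each group of equal A values, shift(-1) pulls the next B value up to the current row, which is exactly the C column shown.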
More visually, I would like to move from this dataframe:
|   | A\B\C\D | Unnamed:1 | Unnamed:2 | Unnamed:3 | Unnamed:4 |
|---|---------|-----------|-----------|-----------|-----------|
| 0 | 1\2\3\4 | NaN       | NaN       | NaN       | NaN       |
| 1 | 1\2\3\4 | NaN       | NaN       | NaN       | NaN       |
| 2 | a\2\7\C | NaN       | NaN       | NaN       | NaN       |
| 3 | d\2\u\4 | NaN       | NaN       | NaN       | NaN       |
to this one:
|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 |
| 1 | 1 | 2 | 3 | 4 |
| 2 | a | 2 | 7 | C |
| 3 | d | 2 | u | 4 |
Thanks!
Try splitting the values first and then splitting the column name:
df2 = df.iloc[:, 0].str.split('\\', expand=True)
df2.columns = df.columns[0].split('\\')
df2
result:
A B C D
0 1 2 3 4
1 1 2 3 4
2 a 2 7 C
3 d 2 u 4
You can use DataFrame constructor:
out = pd.DataFrame(df.iloc[:, 0].str.split('\\').tolist(),
                   columns=df.columns[0].split('\\'))
print(out)
# Output
A B C D
0 1 2 3 4
1 1 2 3 4
2 a 2 7 C
3 d 2 u 4
The real question is: why do you have such input in the first place? Are you reading your data from a CSV file without using the right separator?
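If that is the case, reading the file with the proper separator avoids the split step entirely; a minimal sketch, where data.csv is a hypothetical file name and the file is assumed to be backslash-delimited:
import pandas as pd
# assumption: the source file really uses '\' as its delimiter
df = pd.read_csv('data.csv', sep='\\')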
The challenge I have, and don't know how to approach, is to average five, ten, or however many rows above each target value, plus the target row itself.
Dataset
target | A | B |
----------------------
nan | 6 | 4 |
nan | 2 | 7 |
nan | 4 | 9 |
nan | 7 | 3 |
nan | 3 | 7 |
nan | 6 | 8 |
nan | 7 | 6 |
53 | 4 | 5 |
nan | 6 | 4 |
nan | 2 | 7 |
nan | 3 | 3 |
nan | 4 | 9 |
nan | 7 | 3 |
nan | 3 | 7 |
51 | 1 | 3 |
Desired format:
target | A    | B    |
----------------------
 53    | 5.16 | 6.33 |
 51    | 3.33 | 5.33 |
Try this: [::-1] reverses the dataframe so it runs bottom to top, which lets us group each valid target with the rows "above" it:
df.groupby(df['target'].notna()[::-1].cumsum()[::-1]).apply(lambda x: x.tail(6).mean())
Output:
target A B
target
1 51.0 3.333333 5.333333
2 53.0 5.166667 6.333333
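To take five, ten, or any other number of rows above the target, only the tail size changes; a minimal sketch assuming you want n rows above plus the target row itself:
n = 5  # rows above the target
groups = df['target'].notna()[::-1].cumsum()[::-1]
out = df.groupby(groups).apply(lambda g: g.tail(n + 1).mean())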
I have the following pandas dataframe, where the first column is the datetime index. I am trying to produce the desired_output column, which increments every time the flag changes from 0 to 1 or from 1 to 0. I have done this kind of thing in SQL, but pandasql's sqldf strangely changes the values of the field being partitioned, so I am now trying to achieve it with regular pandas/Python syntax.
Any help would be much appreciated.
+-------------+------+----------------+
| date(index) | flag | desired_output |
+-------------+------+----------------+
| 1/01/2020 | 0 | 1 |
| 2/01/2020 | 0 | 1 |
| 3/01/2020 | 0 | 1 |
| 4/01/2020 | 1 | 2 |
| 5/01/2020 | 1 | 2 |
| 6/01/2020 | 0 | 3 |
| 7/01/2020 | 1 | 4 |
| 8/01/2020 | 1 | 4 |
| 9/01/2020 | 1 | 4 |
| 10/01/2020 | 1 | 4 |
| 11/01/2020 | 1 | 4 |
| 12/01/2020 | 1 | 4 |
| 13/01/2020 | 0 | 5 |
| 14/01/2020 | 0 | 5 |
| 15/01/2020 | 0 | 5 |
| 16/01/2020 | 0 | 5 |
| 17/01/2020 | 1 | 6 |
| 18/01/2020 | 0 | 7 |
| 19/01/2020 | 0 | 7 |
| 20/01/2020 | 0 | 7 |
| 21/01/2020 | 0 | 7 |
| 22/01/2020 | 1 | 8 |
| 23/01/2020 | 1 | 8 |
+-------------+------+----------------+
Use diff and cumsum:
print (df["flag"].diff().ne(0).cumsum())
0 1
1 1
2 1
3 2
4 2
5 3
6 4
7 4
8 4
9 4
10 4
11 4
12 5
13 5
14 5
15 5
16 6
17 7
18 7
19 7
20 7
21 8
22 8
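To attach it as the new column from the question:
# diff() is non-zero (or NaN on the first row) wherever the flag changes,
# ne(0) marks those change points as True, and cumsum() numbers the runs
df["desired_output"] = df["flag"].diff().ne(0).cumsum()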
How can I achieve the following:
I have a table like so:
|----------------------|
| Date | A | B | C | D |
|------+---+---+---+---|
| 2000 | 1 | 2 | 5 | 4 |
|------+---+---+---+---|
| 2001 | 2 | 2 | 7 | 4 |
|------+---+---+---+---|
| 2002 | 3 | 1 | 7 | 7 |
|------+---+---+---+---|
| 2003 | 4 | 1 | 5 | 7 |
|----------------------|
and turn it into a multi-index type dataframe:
|------------------------------------|
| Column Name | Date | Value | C | D |
|-------------+------+-------+---+---|
| A | 2000 | 1 | 5 | 4 |
| |------+-------+---+---|
| | 2001 | 2 | 7 | 4 |
| |------+-------+---+---|
| | 2002 | 3 | 7 | 7 |
| |------+-------+---+---|
| | 2003 | 4 | 5 | 7 |
|-------------+------+-------+---+---|
| B | 2000 | 2 | 5 | 4 |
| |------+-------+---+---|
| | 2001 | 2 | 7 | 4 |
| |------+-------+---+---|
| | 2002 | 1 | 7 | 7 |
| |------+-------+---+---|
| | 2003 | 1 | 5 | 7 |
|------------------------------------|
I have tried using the melt function on the dataframe but could not figure out how to achieve this desired look. I think I would also then have to apply a groupby to the melted dataframe.
You can use melt with set_index. By passing C and D as id_vars, those columns keep their structure; then just set the columns of interest as the index to get a MultiIndex dataframe:
df.melt(id_vars=['Date', 'C', 'D']).set_index(['variable', 'Date'])
C D value
variable Date
A 2000 5 4 1
2001 7 4 2
2002 7 7 3
2003 5 7 4
B 2000 5 4 2
2001 7 4 2
2002 7 7 1
2003 5 7 1
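If you also want the first index level to be labelled Column Name as in the desired layout, melt's var_name parameter handles the renaming:
df.melt(id_vars=['Date', 'C', 'D'], var_name='Column Name').set_index(['Column Name', 'Date'])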
I have a DataFrame that looks something like this:
|   | event_type | object_id |
|---|------------|-----------|
| 0 | A          | 1         |
| 1 | D          | 1         |
| 2 | A          | 1         |
| 3 | D          | 1         |
| 4 | A          | 2         |
| 5 | A          | 2         |
| 6 | D          | 2         |
| 7 | A          | 3         |
| 8 | D          | 3         |
| 9 | A          | 3         |
What I want is, for each row, the index of the next row where event_type is A and object_id is still the same; as an additional column this would look like this:
|   | event_type | object_id | next_A |
|---|------------|-----------|--------|
| 0 | A          | 1         | 2      |
| 1 | D          | 1         | 2      |
| 2 | A          | 1         | NaN    |
| 3 | D          | 1         | NaN    |
| 4 | A          | 2         | 5      |
| 5 | A          | 2         | NaN    |
| 6 | D          | 2         | NaN    |
| 7 | A          | 3         | 9      |
| 8 | D          | 3         | 9      |
| 9 | A          | 3         | NaN    |
and so on.
I want to avoid using .apply() because my DataFrame is quite large; is there a vectorized way to do this?
EDIT: for multiple A/D pairs for the same object_id, I'd like it to always use the next index of A, like this:
|   | event_type | object_id | next_A |
|---|------------|-----------|--------|
| 0 | A          | 1         | 2      |
| 1 | D          | 1         | 2      |
| 2 | A          | 1         | 4      |
| 3 | D          | 1         | 4      |
| 4 | A          | 1         | NaN    |
You can do it with groupby like:
def populate_next_a(object_df):
    # index of each 'A' row, NaN elsewhere
    object_df['a_index'] = pd.Series(object_df.index, index=object_df.index)[object_df.event_type == 'A']
    # backfill so every row sees the index of the current or next 'A' row
    object_df['a_index'] = object_df['a_index'].bfill()
    # 'A' rows should point at the following 'A' index rather than their own
    object_df['next_A'] = object_df['a_index'].where(object_df.event_type != 'A', object_df['a_index'].shift(-1))
    object_df = object_df.drop('a_index', axis=1)
    return object_df
result = df.groupby(['object_id']).apply(populate_next_a)
print(result)
event_type object_id next_A
0 A 1 2.0
1 D 1 2.0
2 A 1 NaN
3 D 1 NaN
4 A 2 5.0
5 A 2 NaN
6 D 2 NaN
7 A 3 9.0
8 D 3 9.0
9 A 3 NaN
GroupBy.apply will not have as much overhead as a simple apply.
Note that you cannot (yet) store integers alongside NaN (http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na), so the indices end up as float values.
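If you want to avoid apply altogether, here is a vectorized sketch of the same idea (a_idx and shifted are just intermediate names introduced for illustration):
# index of each 'A' row, NaN elsewhere
a_idx = df.index.to_series().where(df['event_type'] == 'A')
# within each object_id, look one row ahead for an 'A' index, then backfill it
shifted = a_idx.groupby(df['object_id']).shift(-1)
df['next_A'] = shifted.groupby(df['object_id']).bfill()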