Repeat the dataframe and increment the timestamps - python

I have a dataframe like this
Temp time[s]
0 20 0
1 21 1
2 21.5 2
I want to repeat the dataframe but add the timestamp, my output should like this ,
Temp time[s]
0 20 0
1 21 1
2 21.5 2
3 20 3
4 21 4
5 21.5 5
I tried something like this , but didn't worked for me
df2 = pd.concat([df]*2, ignore_index=True)
df2['time[s]'] = df2.groupby(level=0).cumcount() * 1
Can anyone help me plz ?

For a generic approach, you can use:
N = 2 # number or repetitions
df2 = pd.concat([df]*N, ignore_index=True)
df2['time[s]'] += np.repeat(np.arange(N), len(df))*len(df)
# or
# df2['time[s]'] += np.arange(len(df2))//len(df)*len(df)
output:
Temp time[s]
0 20.0 0
1 21.0 1
2 21.5 2
3 20.0 3
4 21.0 4
5 21.5 5

Try this for your exact use case with one repetition:
df_x = pd.concat([df, df]).reset_index(drop=True)
df_x["time[s]"] = df_x.index
Print out:
Temp time[s]
0 20.0 0
1 21.0 1
2 21.5 2
3 20.0 3
4 21.0 4
5 21.5 5

Related

Combining two dataframes

I've tried merging two dataframes, but I can't seem to get it to work. Each time I merge, the rows where I expect values are all 0. Dataframe df1 already as some data in it, with some left blank. Dataframe df2 will populate those blank rows in df1 where column names match at each value in "TempBin" and each value in "Month" in df1.
EDIT:
Both dataframes are in a for loop. df1 acts as my "storage", df2 changes for each location iteration. So if df2 contained the results for LocationZP, I would also want that data inserted in the matching df1 rows. If I use df1 = df1.append(df2) in the for loop, all of the rows from df2 keep inserting at the very end of df1 for each iteration.
df1:
Month TempBin LocationAA LocationXA LocationZP
1 0 7 1 2
1 1 98 0 89
1 2 12 23 38
1 3 3 14 17
1 4 7 9 14
1 5 1 8 99
13 0 0 0 0
13 1 0 0 0
13 2 0 0 0
13 3 0 0 0
13 4 0 0 0
13 5 0 0 0
df2:
Month TempBin LocationAA
13 0 11
13 1 22
13 2 33
13 3 44
13 4 55
13 5 66
desired output in df1:
Month TempBin LocationAA LocationXA LocationZP
1 0 7 1 2
1 1 98 0 89
1 2 12 23 38
1 3 3 14 17
1 4 7 9 14
1 5 1 8 99
13 0 11 0 0
13 1 22 0 0
13 2 33 0 0
13 3 44 0 0
13 4 55 0 0
13 5 66 0 0
import pandas as pd
df1 = pd.DataFrame({'Month': [1]*6 + [13]*6,
'TempBin': [0,1,2,3,4,5]*2,
'LocationAA': [7,98,12,3,7,1,0,0,0,0,0,0],
'LocationXA': [1,0,23,14,9,8,0,0,0,0,0,0],
'LocationZP': [2,89,38,17,14,99,0,0,0,0,0,0]}
)
df2 = pd.DataFrame({'Month': [13]*6,
'TempBin': [0,1,2,3,4,5],
'LocationAA': [11,22,33,44,55,66]}
)
df1 = pd.merge(df1, df2, on=["Month","TempBin","LocationAA"], how="left")
result:
Month TempBin LocationAA LocationXA LocationZP
1 0 7.0 1.0 2.0
1 1 98.0 0.0 89.0
1 2 12.0 23.0 38.0
1 3 3.0 14.0 17.0
1 4 7.0 9.0 14.0
1 5 1.0 8.0 99.0
13 0 NaN NaN NaN
13 1 NaN NaN NaN
13 2 NaN NaN NaN
13 3 NaN NaN NaN
13 4 NaN NaN NaN
13 5 NaN NaN NaN
Here's some code that worked for me:
# Merge two df into one dataframe on the columns "TempBin" and "Month" filling nan values with 0.
import pandas as pd
df1 = pd.DataFrame({'Month': [1]*6 + [13]*6,
'TempBin': [0,1,2,3,4,5]*2,
'LocationAA': [7,98,12,3,7,1,0,0,0,0,0,0],
'LocationXA': [1,0,23,14,9,8,0,0,0,0,0,0],
'LocationZP': [2,89,38,17,14,99,0,0,0,0,0,0]}
)
df2 = pd.DataFrame({'Month': [13]*6,
'TempBin': [0,1,2,3,4,5],
'LocationAA': [11,22,33,44,55,66]})
df_merge = pd.merge(df1, df2, how='left',
left_on=['TempBin', 'Month'],
right_on=['TempBin', 'Month'])
df_merge.fillna(0, inplace=True)
# add column LocationAA and fill it with the not null value from column LocationAA_x and LocationAA_y
df_merge['LocationAA'] = df_merge.apply(lambda x: x['LocationAA_x'] if pd.isnull(x['LocationAA_y']) else x['LocationAA_y'], axis=1)
# remove column LocationAA_x and LocationAA_y
df_merge.drop(['LocationAA_x', 'LocationAA_y'], axis=1, inplace=True)
print(df_merge)
Output:
Month TempBin LocationXA LocationZP LocationAA
0 1 0 1.0 2.0 0.0
1 1 1 0.0 89.0 0.0
2 1 2 23.0 38.0 0.0
3 1 3 14.0 17.0 0.0
4 1 4 9.0 14.0 0.0
5 1 5 8.0 99.0 0.0
6 13 0 0.0 0.0 11.0
7 13 1 0.0 0.0 22.0
8 13 2 0.0 0.0 33.0
9 13 3 0.0 0.0 44.0
10 13 4 0.0 0.0 55.0
11 13 5 0.0 0.0 66.0
Let me know if there's something you don't understand in the comments :)
PS: Sorry for the extra comments. But I left them there for some more explanations.
You need to use append to get the desired output:
df1 = df1.append(df2)
and if you want to replace the Nulls to zeros add:
df1 = df1.fillna(0)
Here is another way using combine_first()
i = ['Month','TempBin']
df2.set_index(i).combine_first(df1.set_index(i)).reset_index()

How to sort values in Pandas based on values in other column?

I have dataframe like this:
int_time
leng
id
3
4123
1
5
243
2
7
1232
3
8
3883
4
..
...
..
and I want to sort values in int_time and leng based on value id. So output will look like this:
int_time
leng
id
0
0
1
0
0
2
3
4123
3
4
0
4
5
243
5
6
0
6
7
1232
7
8
3883
8
..
...
..
In other words, i want to change row index for int_time and leng based on value in id. Can somebody help me with this please?
you can use merge -
n = df['int_time'].max()
new_df = pd.DataFrame({'id': range(1, int(n) + 1)})
new_df = new_df.merge(df, left_on='id', right_on='int_time', how= 'left').fillna(0).drop('id_y', axis=1).rename(columns={'id_x': 'Id'})
print(new_df)
output-
Id
int_time
leng
1
0.0
0.0
2
0.0
0.0
3
3.0
4123.0
4
0.0
0.0
5
5.0
243.0
6
0.0
0.0
7
7.0
1232.0
8
8.0
3883.0

how to adjust subtotal columns in pandas using grouby?

I'm working on exporting data frames to Excel using dataframe join.
However, after Join dataframe,
when calculating subtotal using groupby, the figure below is executed.
There's a "Subtotal" word in the index column.
enter image description here
Is there any way to move it into the code column and sort the indexes?
enter image description here
here codes :
def subtotal(df__, str):
container = []
for key, group in df__.groupby(['key']):
group.loc['subtotal'] = group[['quantity', 'quantity2', 'quantity3']].sum()
container.append(group)
df_subtotal = pd.concat(container)
df_subtotal.loc['GrandTotal'] = df__[['quantity', 'quantity2', 'quantity3']].sum()
print(df_subtotal)
return (df_subtotal.to_excel(writer, sheet_name=str))
Use np.where() to fill NaN in code column with value in df.index. Then assign a new index array to df.index.
import numpy as np
df['code'] = np.where(df['code'].isna(), df.index, df['code'])
df.index = np.arange(1, len(df) + 1)
print(df)
code key product quntity1 quntity2 quntity3
1 cs01767 a apple-a 10 0 10.0
2 Subtotal NaN NaN 10 0 10.0
3 cs0000 b bannana-a 50 10 40.0
4 cs0000 b bannana-b 0 0 0.0
5 cs0000 b bannana-c 0 0 0.0
6 cs0000 b bannana-d 80 20 60.0
7 cs0000 b bannana-e 0 0 0.0
8 cs01048 b bannana-f 0 0 NaN
9 cs01048 b bannana-g 0 0 0.0
10 Subtotal NaN NaN 130 30 100.0
11 cs99999 c melon-a 50 10 40.0
12 cs99999 c melon-b 20 20 0.0
13 cs01188 c melon-c 10 0 10.0
14 Subtotal NaN NaN 80 30 50.0
15 GrandTotal NaN NaN 220 60 160.0

Pandas Create New Column Based on Value in Another Column, If False Return Previous Value of New Column

this is a Python pandas problem I've been struggling with for a while now. Lets say I have a simple dataframe df where df['a'] = [1,2,3,1,4,6] and df['b'] = [10,20,30,40,50,60]. I would like to create a third column 'c', where if the value of df['a'] == 1, df['c'] = df['b']. If this is false, df['c'] = the previous value of df['c']. I have tried using np.where to make this happen, but the result is not what I was expecting. Any advice?
df = pd.DataFrame()
df['a'] = [1,2,3,1,4,6]
df['b'] = [10,20,30,40,50,60]
df['c'] = np.nan
df['c'] = np.where(df['a'] == 1, df['b'], df['c'].shift(1))
The result is:
a b c
0 1 10 10.0
1 2 20 NaN
2 3 30 NaN
3 1 40 40.0
4 4 50 NaN
5 6 60 NaN
Whereas I would have expected:
a b c
0 1 10 10.0
1 2 20 10.0
2 3 30 10.0
3 1 40 40.0
4 4 50 40.0
5 6 60 40.0
Try this:
df.c.ffill(inplace=True)
Output:
a b c
0 1 10 10.0
1 2 20 10.0
2 3 30 10.0
3 1 40 40.0
4 4 50 40.0
5 6 60 40.0

Use Pandas dataframe to add lag feature from MultiIindex Series

I have a MultiIndex Series (3 indices) that looks like this:
Week ID_1 ID_2
3 26 1182 39.0
4767 42.0
31393 20.0
31690 42.0
32962 3.0
....................................
I also have a dataframe df which contains all the columns (and more) used for indices in the Series above, and I want to create a new column in my dataframe df that contains the value matching the ID_1 and ID_2 and the Week - 2 from the Series.
For example, for the row in dataframe that has ID_1 = 26, ID_2 = 1182 and Week = 3, I want to match the value in the Series indexed by ID_1 = 26, ID_2 = 1182 and Week = 1 (3-2) and put it on that row in a new column. Further, my Series might not necessarily have the value required by the dataframe, in which case I'd like to just have 0.
Right now, I am trying to do this by using:
[multiindex_series.get((x[1].get('week', 2) - 2, x[1].get('ID_1', 0), x[1].get('ID_2', 0))) for x in df.iterrows()]
This however is very slow and memory hungry and I was wondering what are some better ways to do this.
FWIW, the Series was created using
saved_groupby = df.groupby(['Week', 'ID_1', 'ID_2'])['Target'].median()
and I'm willing to do it a different way if better paths exist to create what I'm looking for.
Increase the Week by 2:
saved_groupby = df.groupby(['Week', 'ID_1', 'ID_2'])['Target'].median()
saved_groupby = saved_groupby.reset_index()
saved_groupby['Week'] = saved_groupby['Week'] + 2
and then merge df with saved_groupby:
result = pd.merge(df, saved_groupby, on=['Week', 'ID_1', 'ID_2'], how='left')
This will augment df with the target median from 2 weeks ago.
To make the median (target) saved_groupby column 0 when there is no match, use fillna to change NaNs to 0:
result['Median'] = result['Median'].fillna(0)
For example,
import numpy as np
import pandas as pd
np.random.seed(2016)
df = pd.DataFrame(np.random.randint(5, size=(20,5)),
columns=['Week', 'ID_1', 'ID_2', 'Target', 'Foo'])
saved_groupby = df.groupby(['Week', 'ID_1', 'ID_2'])['Target'].median()
saved_groupby = saved_groupby.reset_index()
saved_groupby['Week'] = saved_groupby['Week'] + 2
saved_groupby = saved_groupby.rename(columns={'Target':'Median'})
result = pd.merge(df, saved_groupby, on=['Week', 'ID_1', 'ID_2'], how='left')
result['Median'] = result['Median'].fillna(0)
print(result)
yields
Week ID_1 ID_2 Target Foo Median
0 3 2 3 4 2 0.0
1 3 3 0 3 4 0.0
2 4 3 0 1 2 0.0
3 3 4 1 1 1 0.0
4 2 4 2 0 3 2.0
5 1 0 1 4 4 0.0
6 2 3 4 0 0 0.0
7 4 0 0 2 3 0.0
8 3 4 3 2 2 0.0
9 2 2 4 0 1 0.0
10 2 0 4 4 2 0.0
11 1 1 3 0 0 0.0
12 0 1 0 2 0 0.0
13 4 0 4 0 3 4.0
14 1 2 1 3 1 0.0
15 3 0 1 3 4 2.0
16 0 4 2 2 4 0.0
17 1 1 4 4 2 0.0
18 4 1 0 3 0 0.0
19 1 0 1 0 0 0.0

Categories

Resources