The question is pretty self-explanatory: how would you insert a DataFrame with a couple of values into a bigger DataFrame at a given point (between indices 10 and 11)? That means .append can't be used.
You can use concat with the DataFrame sliced by loc:
import numpy as np
import pandas as pd

np.random.seed(100)
df1 = pd.DataFrame(np.random.randint(100, size=(5,6)), columns=list('ABCDEF'))
print (df1)
A B C D E F
0 8 24 67 87 79 48
1 10 94 52 98 53 66
2 98 14 34 24 15 60
3 58 16 9 93 86 2
4 27 4 31 1 13 83
df2 = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df2)
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
#inserted between index values 3 and 4
#loc slicing is inclusive at both ends, so split as :3 and 4: to avoid duplicating a row
print (pd.concat([df1.loc[:3], df2, df1.loc[4:]], ignore_index=True))
A B C D E F
0 8 24 67 87 79 48
1 10 94 52 98 53 66
2 98 14 34 24 15 60
3 58 16 9 93 86 2
4 1 4 7 1 5 7
5 2 5 8 3 3 4
6 3 6 9 5 6 3
7 27 4 31 1 13 83
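To insert at an arbitrary position (for example between rows 10 and 11, as asked), positional slicing with iloc avoids the inclusive-endpoint issue. A minimal sketch, where insert_rows is a hypothetical helper name:
def insert_rows(df, other, pos):
    # insert `other` before integer position `pos` of `df`
    return pd.concat([df.iloc[:pos], other, df.iloc[pos:]], ignore_index=True)

# between index 10 and 11 means inserting before position 11
# result = insert_rows(big_df, small_df, 11)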
I have this dataset:
menu alternative id varA varB varC
1 NaN A NaN NaN NaN
1 NaN A NaN NaN NaN
2 94 A 8.089481019 7.07639559 0.90627215
2 89 A 7.52310322 19.49894193 14.4562262
3 79 A 24.79634962 18.91163612 23.85341972
3 95 A 21.10990397 17.00630516 1.09875582
4 47 A 5.681766806 4.136047755 17.38880496
4 62 A 10.39459876 0.997853805 0.045331687
5 58 A 11.91790497 5.696799013 27.21424163
5 23 A 11.71107828 2.165751058 11.56534045
6 57 A 1.068603487 27.95362014 1.334049372
6 100 A 26.31848796 6.757305213 4.742282633
7 NaN A NaN NaN NaN
7 NaN A NaN NaN NaN
8 24 A 29.99608877 28.49057834 0.14073638
8 7 A 8.749041949 14.17745528 9.604565417
9 64 A 29.4316969 19.57593592 9.174503643
9 60 A 13.53995541 1.898164567 16.49089291
10 85 A 20.1394155 0.995839592 16.18638727
10 22 A 22.68625486 14.26052953 17.79707308
1 82 B 21.34428845 9.901326583 1.053134597
1 91 B 19.04689216 16.29217346 29.56962312
2 35 B 25.44168095 29.00407645 2.246459981
2 100 B 15.79687903 20.37920541 28.45071525
3 44 B 7.359501131 23.66924419 7.198215907
3 41 B 22.65272801 8.66227065 12.05186217
4 59 B 26.67565422 9.608511948 26.45016581
4 53 B 5.64870847 21.83063691 19.20105218
5 48 B 5.591317152 25.17616679 24.30522374
5 16 B 23.85069753 23.12154586 0.781450997
6 NaN B NaN NaN NaN
6 NaN B NaN NaN NaN
7 68 B 9.334935288 16.39114327 21.17696541
7 41 B 5.841577934 6.901223007 28.38116983
8 35 B 21.20288984 9.665414964 4.472546438
8 96 B 0.451299457 27.66880932 26.2120144
9 84 B 19.67310555 1.993071082 9.08442779
9 65 B 0.475983889 16.72261394 17.17122898
10 40 B 9.553130945 17.88616649 22.17570401
10 40 B 19.70487161 5.898428653 11.25844279
1 19 C 20.47792809 9.344376127 7.855311112
1 59 C 14.59141273 8.090534362 19.6972446
2 19 C 6.624345353 0.192145343 26.31356322
2 67 C 24.483236 6.718856437 25.75609679
3 67 C 27.6408808 24.91014602 25.90758755
3 30 C 26.52738124 10.78363589 4.873602089
4 14 C 3.776964641 21.16561036 24.03153234
4 46 C 16.53719818 23.86634958 25.61504006
5 NaN C NaN NaN NaN
5 NaN C NaN NaN NaN
6 58 C 28.1357636 15.89359176 0.567406646
6 28 C 0.708229201 12.20641988 0.309303591
7 17 C 4.319991082 21.29233667 3.516184987
7 8 C 24.09490443 9.507000131 14.93472971
8 85 C 19.99606403 21.61509867 0.161222766
8 5 C 6.056082264 25.35186187 5.375641692
9 24 C 19.83904205 24.54037422 11.08571464
9 13 C 4.388769239 7.928106767 4.279531285
10 78 C 13.67598922 5.3140143 15.2710129
10 13 C 12.27642791 16.04610858 1.815260029
As you can see, I have some null values which I need to fill, but in a somewhat custom manner: for every id and every menu, I need to fill the null values by randomly selecting a row with the same menu number from a different id where the values are non-null.
For example, menu 1 in id A has null values. I want to randomly select a menu 1 with non-null values from a different id and fill from there; say that is menu 1 of id B. For menu 7 in id A, say it is menu 7 of id C, and so on.
It is somewhat similar to this question, but in my case the filling should happen within the same "subgroups", so to speak.
The final output should be something like this:
menu alternative id varA varB varC
1 82 A 21.34428845 9.901326583 1.053134597
1 91 A 19.04689216 16.29217346 29.56962312
2 94 A 8.089481019 7.07639559 0.90627215
2 89 A 7.52310322 19.49894193 14.4562262
3 79 A 24.79634962 18.91163612 23.85341972
3 95 A 21.10990397 17.00630516 1.09875582
4 47 A 5.681766806 4.136047755 17.38880496
4 62 A 10.39459876 0.997853805 0.045331687
5 58 A 11.91790497 5.696799013 27.21424163
5 23 A 11.71107828 2.165751058 11.56534045
6 57 A 1.068603487 27.95362014 1.334049372
6 100 A 26.31848796 6.757305213 4.742282633
7 17 A 4.319991082 21.29233667 3.516184987
7 8 A 24.09490443 9.507000131 14.93472971
8 24 A 29.99608877 28.49057834 0.14073638
8 7 A 8.749041949 14.17745528 9.604565417
9 64 A 29.4316969 19.57593592 9.174503643
9 60 A 13.53995541 1.898164567 16.49089291
10 85 A 20.1394155 0.995839592 16.18638727
10 22 A 22.68625486 14.26052953 17.79707308
1 82 B 21.34428845 9.901326583 1.053134597
1 91 B 19.04689216 16.29217346 29.56962312
2 35 B 25.44168095 29.00407645 2.246459981
2 100 B 15.79687903 20.37920541 28.45071525
3 44 B 7.359501131 23.66924419 7.198215907
3 41 B 22.65272801 8.66227065 12.05186217
4 59 B 26.67565422 9.608511948 26.45016581
4 53 B 5.64870847 21.83063691 19.20105218
5 48 B 5.591317152 25.17616679 24.30522374
5 16 B 23.85069753 23.12154586 0.781450997
6 57 B 1.068603487 27.95362014 1.334049372
6 100 B 26.31848796 6.757305213 4.742282633
7 68 B 9.334935288 16.39114327 21.17696541
7 41 B 5.841577934 6.901223007 28.38116983
8 35 B 21.20288984 9.665414964 4.472546438
8 96 B 0.451299457 27.66880932 26.2120144
9 84 B 19.67310555 1.993071082 9.08442779
9 65 B 0.475983889 16.72261394 17.17122898
10 40 B 9.553130945 17.88616649 22.17570401
10 40 B 19.70487161 5.898428653 11.25844279
1 19 C 20.47792809 9.344376127 7.855311112
1 59 C 14.59141273 8.090534362 19.6972446
2 19 C 6.624345353 0.192145343 26.31356322
2 67 C 24.483236 6.718856437 25.75609679
3 67 C 27.6408808 24.91014602 25.90758755
3 30 C 26.52738124 10.78363589 4.873602089
4 14 C 3.776964641 21.16561036 24.03153234
4 46 C 16.53719818 23.86634958 25.61504006
5 48 C 5.591317152 25.17616679 24.30522374
5 16 C 23.85069753 23.12154586 0.781450997
6 58 C 28.1357636 15.89359176 0.567406646
6 28 C 0.708229201 12.20641988 0.309303591
7 17 C 4.319991082 21.29233667 3.516184987
7 8 C 24.09490443 9.507000131 14.93472971
8 85 C 19.99606403 21.61509867 0.161222766
8 5 C 6.056082264 25.35186187 5.375641692
9 24 C 19.83904205 24.54037422 11.08571464
9 13 C 4.388769239 7.928106767 4.279531285
10 78 C 13.67598922 5.3140143 15.2710129
10 13 C 12.27642791 16.04610858 1.815260029
Any guidance would be appreciated. Maybe there is even some groupby/apply logic that could assist with this.
You can run fillna() row-wise in apply(), then fill with a random sample from the dataframe filtered by your conditions:
df = df.apply(
    lambda row: row.fillna(
        # donor rows: same menu, a different id, non-null values
        df[(df['menu'] == row['menu']) & (df['id'] != row['id'])]
        .dropna()
        .sample(n=1)
        .iloc[0]
    ),
    axis=1,
)
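Note that this draws a donor sample for every row, including the complete ones. A sketch that restricts the work to the rows that actually contain nulls (assuming, as in the data above, a row is either fully populated or null across the var columns):
# rows that need filling
mask = df['varA'].isna()

def fill_row(row):
    # candidate donors: same menu, a different id, fully populated
    donors = df[(df['menu'] == row['menu']) & (df['id'] != row['id'])].dropna()
    return row.fillna(donors.sample(n=1).iloc[0])

df.loc[mask] = df.loc[mask].apply(fill_row, axis=1)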
I basically have two DataFrames from different dates and want to join them into one.
Let's say this is the data from 25 Sep:
hour columnA columnB
0 12 24
1 45 87
2 10 58
3 12 13
4 12 20
and here is the data from 26 Sep:
hour columnA columnB
0 54 89
1 45 3
2 33 97
3 12 13
4 78 47
Now I want to join both DataFrames and get a MultiIndex DataFrame like this:
25sep hour columnA columnB
0 12 24
1 45 87
2 10 58
3 12 13
4 12 20
26sep hour columnA columnB
0 54 89
1 45 3
2 33 97
3 12 13
4 78 47
I read the docs about MultiIndex but am not sure how to apply it to my situation.
Use pandas.concat
https://pandas.pydata.org/docs/reference/api/pandas.concat.html
>>> df = pd.concat([df1.set_index('hour'), df2.set_index('hour')],
keys=["25sep", "26sep"])
>>> df
columnA columnB
hour
25sep 0 12 24
1 45 87
2 10 58
3 12 13
4 12 20
26sep 0 54 89
1 45 3
2 33 97
3 12 13
4 78 47
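If you also want the index levels to carry names (handy for later selections), concat accepts a names= argument; 'date' here is just an illustrative label:
df = pd.concat(
    [df1.set_index('hour'), df2.set_index('hour')],
    keys=['25sep', '26sep'],
    names=['date', 'hour'],  # name the outer and inner levels
)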
Let us try
out = pd.concat({y: x.set_index('hour') for x, y in zip([df1, df2], ['25sep', '26sep'])})
print(out)
columnA columnB
hour
25sep 0 12 24
1 45 87
2 10 58
3 12 13
4 12 20
26sep 0 54 89
1 45 3
2 33 97
3 12 13
4 78 47
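Once the MultiIndex is in place, a single day comes back out with .loc, and cross-day slices with .xs; for example:
# all rows for 25 Sep (drops the outer level)
print(df.loc['25sep'])

# hour 3 from every day
print(df.xs(3, level='hour'))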
My dataset:
name day value
A 7 88
A 15 101
A 21 121
A 29 56
B 21 131
B 30 78
B 35 102
C 8 80
C 16 101
...
I am trying to plot the values for these days, but there are too many unique day numbers to label them all.
Is there a way to simplify the labelling by binning the days into weeks, i.e. cutting every 7 days?
For example, days 1-7 = week 1, days 8-14 = week 2, and so on.
The output I want:
name day value week
A 7 88 1
A 15 101 3
A 21 121 3
A 29 56 5
B 21 131 3
B 30 78 5
B 35 102 5
C 8 80 2
C 16 101 3
Thank you for reading.
Subtract 1, then use integer division by 7, and finally add 1:
df['week'] = (df['day'] - 1) // 7 + 1
print (df)
name day value week
0 A 7 88 1
1 A 15 101 3
2 A 21 121 3
3 A 29 56 5
4 B 21 131 3
5 B 30 78 5
6 B 35 102 5
7 C 8 80 2
8 C 16 101 3
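If you later want labelled, ordered bins (e.g. for a plot legend), pd.cut produces the same grouping as the arithmetic above; a sketch:
import numpy as np
import pandas as pd

# bin edges cover days 1-7, 8-14, ... up to the largest day present
max_week = int(np.ceil(df['day'].max() / 7))
edges = [7 * i for i in range(max_week + 1)]  # 0, 7, 14, ...
df['week'] = pd.cut(df['day'], bins=edges, labels=range(1, max_week + 1))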
I am working with Python and pandas to create a new frame starting from two frames.
The first frame (called frame1) is composed of the following rows:
A B C D E
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
15 15 15 15 15
The second frame (called frame2) is:
A B C D E
19 19 19 19 19
24 24 24 24 24
29 29 29 29 29
34 34 34 34 34
39 39 39 39 39
44 44 44 44 44
49 49 49 49 49
54 54 54 54 54
59 59 59 59 59
64 64 64 64 64
69 69 69 69 69
74 74 74 74 74
79 79 79 79 79
84 84 84 84 84
89 89 89 89 89
94 94 94 94 94
99 99 99 99 99
Now I want to create a new dataset with this logic: starting from frame1, replace every 5th row, until the end of frame1, with a random row of frame2 (and remove the used row from frame2 so it cannot be picked again). A possible output would be:
A B C D E
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
59 59 59 59 59
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
29 29 29 29 29
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
84 84 84 84 84
How can I do this operation?
It's quite simple:
frame1.loc[4::5] = frame2.sample(frac=1).reset_index(drop=True)
where
frame1.loc[4::5] selects every fifth row, starting with the fifth one in frame1, and
frame2.sample(frac=1).reset_index(drop=True) shuffles frame2 around randomly.
Note that the assignment aligns on index labels: after reset_index the shuffled frame is labelled 0-16, so its rows labelled 4, 9 and 14 are the ones written into the selected rows of frame1.
One way is to first obtain the integer positions to update (we could also slice-assign, but then we'd have the problem of the end not being included), and then assign back a sample of the corresponding size from frame2:
import numpy as np

ix = np.flatnonzero(np.diff(np.arange(frame1.shape[0] + 1) // 5))
frame1.iloc[ix] = frame2.sample(frame1.shape[0] // 5).to_numpy()
print(frame1)
A B C D E
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
4 84 84 84 84 84
5 6 6 6 6 6
6 7 7 7 7 7
7 8 8 8 8 8
8 9 9 9 9 9
9 89 89 89 89 89
10 11 11 11 11 11
11 12 12 12 12 12
12 13 13 13 13 13
13 14 14 14 14 14
14 99 99 99 99 99
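Neither snippet removes the used rows from frame2, which the question also asks for. A sketch that does, assuming frame2 has unique index labels:
picked = frame2.sample(n=len(frame1) // 5)  # random rows, without replacement
frame1.iloc[4::5] = picked.to_numpy()       # positional write, no index alignment
frame2 = frame2.drop(picked.index)          # remove the used rows from frame2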
I have 2 data frames, df1 and df2:
df1
Out[66]:
A B
0 1 11
1 1 2
2 1 32
3 1 42
4 1 54
5 1 66
6 2 16
7 2 23
8 3 13
9 3 24
10 3 35
11 3 46
12 3 51
13 4 12
14 4 28
15 4 39
16 4 49
df2
Out[80]:
B
0 32
1 42
2 13
3 24
4 35
5 39
6 49
I want to merge the dataframes, but at the same time include the rows immediately before and/or after each matched row within its column-A group. This is an example of the desired outcome:
df3
Out[93]:
A B
0 1 2
1 1 32
2 1 42
3 1 54
4 3 13
5 3 24
6 3 35
7 3 46
8 4 28
9 4 39
10 4 49
I'm trying to use merge, but that only gives the portion of the data frames that coincides. Does someone have an idea of how to deal with this? Thanks!
Here's one way to do it using merge with indicator, groupby, and rolling:
df1[df1.merge(df2, on='B', how='left', indicator='Ind').eval('Found=Ind == "both"')
    .groupby('A')['Found']
    .apply(lambda x: x.rolling(3, center=True, min_periods=2).max()).astype(bool)]
Output:
A B
1 1 2
2 1 32
3 1 42
4 1 54
8 3 13
9 3 24
10 3 35
11 3 46
14 4 28
15 4 39
16 4 49
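For comparison, the same "matched rows plus immediate neighbours within each A group" idea can be written with group-wise shifts instead of a rolling window; a minimal sketch:
import pandas as pd

# mark rows of df1 whose B value appears in df2
found = df1['B'].isin(df2['B'])

# keep a row if it, its predecessor, or its successor within the same
# 'A' group is a match
g = found.groupby(df1['A'])
keep = found | g.shift(1, fill_value=False) | g.shift(-1, fill_value=False)
print(df1[keep])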
out = (pd.concat([df1.groupby('A').min().reset_index(),
                  pd.merge(df1, df2, on='B'),
                  df1.groupby('A').max().reset_index()])
         .reset_index(drop=True)
         .drop_duplicates()
         .sort_values(['A', 'B']))
print(out)
A B
0 1 2
4 1 32
5 1 42
11 1 66
1 2 16
12 2 23
2 3 13
7 3 24
8 3 35
13 3 51
3 4 12
9 4 39
10 4 49
Breaking down each part
#Get Minimum
df1.groupby('A').min().reset_index()
# Merge on B
pd.merge(df1,df2, on="B")
# Get Maximum
df1.groupby('A').max().reset_index()
# Reset the Index and drop duplicated rows since there may be similarities between the Merge and Min/Max. Sort values by 'A' then by 'B'
.reset_index(drop=True).drop_duplicates().sort_values(['A','B'])