For my quarter columns, instead of values such as 1, 0, 0, 0, I get NaN.
How do I fix the code below so that my DataFrame contains the values?
import pandas as pd

qrt_1 = {'q1':[1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]}
qrt_2 = {'q2':[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0]}
qrt_3 = {'q3':[0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0]}
qrt_4 = {'q4':[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}
year = {'year': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9]}
value = data_1['Sales']
data = [year, qrt_1, qrt_2, qrt_3, qrt_4]
dataframes = []
for x in data:
    dataframes.append(pd.DataFrame(x))
df = pd.concat(dataframes)
I am expecting a DataFrame that contains qrt_1, qrt_2, etc. under their corresponding column names.
Try passing axis=1 to pd.concat so the frames are joined side by side instead of stacked vertically:
df = pd.concat(dataframes, axis=1)
print(df)
Prints:
year q1 q2 q3 q4
0 1 1 0 0 0
1 1 0 1 0 0
2 1 0 0 1 0
3 1 0 0 0 1
4 2 1 0 0 0
5 2 0 1 0 0
6 2 0 0 1 0
7 2 0 0 0 1
8 3 1 0 0 0
9 3 0 1 0 0
10 3 0 0 1 0
11 3 0 0 0 1
12 4 1 0 0 0
13 4 0 1 0 0
14 4 0 0 1 0
15 4 0 0 0 1
16 5 1 0 0 0
17 5 0 1 0 0
18 5 0 0 1 0
19 5 0 0 0 1
20 6 1 0 0 0
21 6 0 1 0 0
22 6 0 0 1 0
23 6 0 0 0 1
24 7 1 0 0 0
25 7 0 1 0 0
26 7 0 0 1 0
27 7 0 0 0 1
28 8 1 0 0 0
29 8 0 1 0 0
30 8 0 0 1 0
31 8 0 0 0 1
32 9 1 0 0 0
33 9 0 1 0 0
34 9 0 0 1 0
35 9 0 0 0 1
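For what it's worth, the NaNs came from the default axis=0: pd.concat stacks the five one-column frames on top of each other, so each row has a value only in its own column and NaN everywhere else. An alternative is to build the whole frame from a single dict and avoid concat entirely; a sketch:

```python
import pandas as pd

# Sketch: build the same frame from one dict, avoiding concat entirely.
# The quarter pattern repeats every 4 rows over 9 years (36 rows total).
n_years = 9
data = {
    'year': [y for y in range(1, n_years + 1) for _ in range(4)],
    'q1': [1, 0, 0, 0] * n_years,
    'q2': [0, 1, 0, 0] * n_years,
    'q3': [0, 0, 1, 0] * n_years,
    'q4': [0, 0, 0, 1] * n_years,
}
df = pd.DataFrame(data)
```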
I have a DataFrame and I need to group runs of at least one occurrence greater than 0, and place the sum of each run at its last occurrence. My code is below.
data = {'id':
[7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
'timeatAcc':
[0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,0,0]
}
df = pd.DataFrame(data, columns=['id', 'timeatAcc'])
df['consecutive'] = df['id'].groupby(
    (df['timeatAcc'] != df['timeatAcc'].shift()).cumsum()
).transform('size') * df['timeatAcc']
print(df)
(Current and expected output were attached as screenshots.)
Need help and thanks in advance
Let's try groupby().diff():
df['Occurences'] = df.groupby('id')['timeatAcc'].diff(-1).eq(1).astype(int)
Output:
id timeatAcc Occurences
0 7 0 0
1 7 0 0
2 7 0 0
3 7 0 0
4 7 0 0
5 7 0 0
6 7 0 0
7 7 0 0
8 7 1 0
9 7 1 0
10 7 1 1
11 7 0 0
12 7 0 0
13 7 1 0
14 7 1 1
15 7 0 0
16 7 0 0
17 7 1 0
18 7 1 0
19 7 1 0
20 1 1 0
21 1 1 0
22 1 1 1
23 1 0 0
24 1 0 0
25 1 0 0
26 1 0 0
27 1 0 0
28 1 1 0
29 1 1 0
30 1 1 1
31 1 0 0
32 1 0 0
33 1 1 0
34 1 1 1
35 1 0 0
36 1 0 0
37 1 0 0
38 1 0 0
39 1 0 0
Update: to get the sum of each run instead of 1 (np here is numpy):
df['Occurences'] = df.groupby(['id', df['timeatAcc'].eq(0).cumsum()])['timeatAcc'].transform('sum')
df['Occurences'] = np.where(df.groupby('id')['timeatAcc'].diff(-1).eq(1),
                            df['Occurences'], 0)
Output:
id timeatAcc Occurences
0 7 0 0
1 7 0 0
2 7 0 0
3 7 0 0
4 7 0 0
5 7 0 0
6 7 0 0
7 7 0 0
8 7 1 0
9 7 1 0
10 7 1 3
11 7 0 0
12 7 0 0
13 7 1 0
14 7 1 2
15 7 0 0
16 7 0 0
17 7 1 0
18 7 1 0
19 7 1 0
20 1 1 0
21 1 1 0
22 1 1 3
23 1 0 0
24 1 0 0
25 1 0 0
26 1 0 0
27 1 0 0
28 1 1 0
29 1 1 0
30 1 1 3
31 1 0 0
32 1 0 0
33 1 1 0
34 1 1 2
35 1 0 0
36 1 0 0
37 1 0 0
38 1 0 0
39 1 0 0
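The update above can be condensed into a self-contained sketch; the short id/timeatAcc arrays here are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':        [7, 7, 7, 7, 7, 1, 1, 1],
    'timeatAcc': [0, 1, 1, 0, 1, 1, 1, 0],
})

# The running count of zeros is constant inside a run of 1s, so it
# serves as a run identifier within each id.
run_id = df['timeatAcc'].eq(0).cumsum()

# Sum of each run, broadcast back to every row of that run
run_sum = df.groupby(['id', run_id])['timeatAcc'].transform('sum')

# diff(-1) == 1 marks the last 1 of a run; keep the sum only there
last_of_run = df.groupby('id')['timeatAcc'].diff(-1).eq(1)
df['Occurences'] = np.where(last_of_run, run_sum, 0)
```

One caveat, also visible in the output above: diff(-1) is NaN on the final row of each id group, so a run that ends exactly at the group boundary (rows 17-19 for id 7) is never marked.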
I am trying to create a new index for a DataFrame created from a ROOT file. I'm using uproot to open the file:
upfile_muon = uproot.open(file_prefix_muon + '.root')
tree_muon = upfile_muon['ntupler']['tree']
df_muon = tree_muon.pandas.df(['vh_sim_r','vh_sim_phi','vh_sim_z','vh_sim_tp1','vh_sim_tp2',
                               'vh_type','vh_station','vh_ring','vh_sim_theta'], entrystop=args.max_events)
This creates a pandas DataFrame with a MultiIndex of entry and subentry. I want to filter out all entries with 3 or fewer subentries. I do that with the following code, which also builds lists for slicing the DataFrame into the data I need:
a = 0
bad_entries = 0
entries = []
nuindex = []
tru = 0
while a < args.max_events:
    if df_muon.loc[(a), :].shape[0] > 3:
        entries.append(a)
        b = 0
        while b < df_muon.loc[(a), :].shape[0]:
            nuindex.append(tru)
            b = b + 1
        tru = tru + 1
    else:
        bad_entries = bad_entries + 1
    a = a + 1
df_muon = df_muon.loc[pd.IndexSlice[entries, :], :]
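As an aside, the explicit while loop can be replaced by a vectorized filter on the entry level; a sketch on a toy MultiIndex standing in for df_muon:

```python
import pandas as pd

# Toy stand-in: entries 0 and 2 have 4 subentries, entry 1 has only 1
df = pd.DataFrame(
    {'vh_sim_r': range(9)},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0),
         (2, 0), (2, 1), (2, 2), (2, 3)],
        names=['entry', 'subentry'],
    ),
)

# Keep only entries with more than 3 subentries
sizes = df.groupby(level='entry').size()
good = sizes[sizes > 3].index
df = df.loc[df.index.get_level_values('entry').isin(good)]
```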
So now my dataframe looks like this
vh_sim_r vh_sim_phi vh_sim_z vh_sim_tp1 vh_sim_tp2 vh_type vh_station vh_ring vh_sim_theta
entry subentry
0 0 149.724701 -124.728081 793.598755 0 0 3 2 1 10.684152
1 149.236725 -124.180763 796.001221 -1 -1 3 2 1 10.618716
2 149.456131 -124.687302 796.001221 0 0 3 2 1 10.633972
3 92.405533 -126.913628 539.349976 0 0 4 1 1 9.721958
4 149.345184 -124.332527 839.810669 0 0 1 2 1 10.083608
5 176.544983 -123.978333 964.500000 0 0 2 3 1 10.372764
6 194.614502 -123.764595 1054.994995 0 0 2 4 1 10.451831
7 149.236725 -124.180763 796.001221 -1 -1 3 2 1 10.618716
8 149.456131 -124.687302 796.001221 0 0 3 2 1 10.633972
9 92.405533 -126.913628 539.349976 0 0 4 1 1 9.721958
10 149.345184 -124.332527 839.810669 0 0 1 2 1 10.083608
11 176.544983 -123.978333 964.500000 0 0 2 3 1 10.372764
12 194.614502 -123.764595 1054.994995 0 0 2 4 1 10.451831
1 0 265.027252 -3.324370 796.001221 0 0 3 2 1 18.415092
1 272.908997 -3.531896 839.903625 0 0 1 2 1 18.000479
2 299.305176 -3.531351 923.885132 0 0 1 3 1 17.950438
3 312.799255 -3.499015 964.500000 0 0 2 3 1 17.968519
4 328.321442 -3.530087 1013.620056 0 0 1 4 1 17.947645
5 181.831726 -1.668625 567.971252 0 0 3 1 1 17.752077
6 265.027252 -3.324370 796.001221 0 0 3 2 1 18.415092
7 197.739120 -2.073746 615.796265 0 0 1 1 1 17.802410
8 272.908997 -3.531896 839.903625 0 0 1 2 1 18.000479
9 299.305176 -3.531351 923.885132 0 0 1 3 1 17.950438
10 312.799255 -3.499015 964.500000 0 0 2 3 1 17.968519
11 328.321442 -3.530087 1013.620056 0 0 1 4 1 17.947645
12 356.493073 -3.441958 1065.694946 0 0 2 4 2 18.495964
2 0 204.523163 -124.065643 839.835571 0 0 1 2 1 13.686690
1 135.439163 -122.568153 567.971252 0 0 3 1 1 13.412345
2 196.380875 -123.940300 796.001221 0 0 3 2 1 13.858652
3 129.801193 -122.348656 539.349976 0 0 4 1 1 13.531607
4 224.134796 -124.194283 923.877441 0 0 1 3 1 13.636631
5 237.166031 -124.181770 964.500000 0 0 2 3 1 13.814683
6 246.809235 -124.196938 1013.871643 0 0 1 4 1 13.681540
7 259.389587 -124.164017 1054.994995 0 0 2 4 1 13.813211
8 204.523163 -124.065643 839.835571 0 0 1 2 1 13.686690
9 196.380875 -123.940300 796.001221 0 0 3 2 1 13.858652
10 129.801193 -122.348656 539.349976 0 0 4 1 1 13.531607
11 224.134796 -124.194283 923.877441 0 0 1 3 1 13.636631
12 237.166031 -124.181770 964.500000 0 0 2 3 1 13.814683
13 246.809235 -124.196938 1013.871643 0 0 1 4 1 13.681540
14 259.389587 -124.164017 1054.994995 0 0 2 4 1 13.813211
3 0 120.722900 -22.053474 615.786621 0 0 1 1 4 11.091969
1 170.635376 -23.190208 793.598755 0 0 3 2 1 12.134683
2 110.061127 -21.370941 539.349976 0 0 4 1 1 11.533570
3 164.784668 -23.263920 814.977478 0 0 1 2 1 11.430829
4 192.868652 -23.398684 948.691345 0 0 1 3 1 11.491603
5 199.817978 -23.325649 968.900024 0 0 2 3 1 11.652840
6 211.474625 -23.265354 1038.803833 0 0 1 4 1 11.506759
7 216.406830 -23.275047 1059.395020 0 0 2 4 1 11.545199
8 170.612457 -23.136520 793.598755 -1 -1 3 2 1 12.133101
5 0 179.913177 -14.877813 615.749207 0 0 1 1 1 16.287615
1 160.188034 -14.731569 565.368774 0 0 3 1 1 15.819215
2 240.671204 -15.410946 793.598755 0 0 3 2 1 16.870745
3 166.238678 -14.774992 586.454590 0 0 1 1 1 15.826117
4 241.036865 -15.400753 815.009399 0 0 1 2 1 16.475443
5 281.086792 -15.534301 948.707581 0 0 1 3 1 16.503710
6 288.768768 -15.577776 968.900024 0 0 2 3 1 16.596043
7 309.145935 -15.533208 1038.588745 0 0 1 4 1 16.576143
8 312.951233 -15.579374 1059.395020 0 0 2 4 1 16.457436
9 312.313416 -16.685022 1059.395020 -1 -1 2 4 1 16.425705
Now my goal is to change the 5 in the entry index to a 4, and to do it in a way that automates the process: with a huge number of entries (~20,000), my filter deletes the unusable entries, and then all remaining entries should be renumbered sequentially from 0 to the last unfiltered entry. I've tried all sorts of commands but have had no luck. Is there a way to do this directly?
df_muon = (df_muon
.reset_index() # Get the multi-index back as columns
.replace({'entry': 5}, {'entry': 4}) # Replace 5 in column 'entry' with 4
.set_index(['entry', 'subentry']) # Go back to the multi-index
)
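Replacing a single value by hand won't scale to ~20,000 entries. To renumber all surviving entries sequentially from 0, pd.factorize on the entry level should do it; a sketch on a made-up MultiIndex standing in for the filtered df_muon:

```python
import pandas as pd

# Made-up filtered frame: entries 0, 2, 5 survive and should become 0, 1, 2
df = pd.DataFrame(
    {'val': [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (2, 0), (5, 0), (5, 1)],
        names=['entry', 'subentry'],
    ),
)

# factorize maps each distinct entry to 0, 1, 2, ... in order of appearance
new_entry = pd.factorize(df.index.get_level_values('entry'))[0]
df.index = pd.MultiIndex.from_arrays(
    [new_entry, df.index.get_level_values('subentry')],
    names=['entry', 'subentry'],
)
```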
Now I have a DataFrame as below:
video_id 0 1 2 3 4 5 6 7 8 9 ... 53 54 55 56
user_id ...
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
1 2 0 4 13 16 2 0 10 6 45 ... 3 352 6 0
2 0 0 0 0 0 0 0 11 0 0 ... 0 0 0 0
3 4 13 0 8 0 0 5 9 12 11 ... 14 17 0 6
4 0 0 4 13 25 4 0 33 0 39 ... 5 7 4 3
6 2 0 0 0 12 0 0 0 2 0 ... 19 4 0 0
7 33 59 52 59 113 53 29 32 59 82 ... 60 119 57 39
9 0 0 0 0 5 0 0 1 0 4 ... 16 0 0 0
10 0 0 0 0 40 0 0 0 0 0 ... 26 0 0 0
11 2 2 32 3 12 3 3 11 19 10 ... 16 3 3 9
12 0 0 0 0 0 0 0 7 0 0 ... 7 0 0 0
We can see that part of the DataFrame is missing, like user_id_5 and user_id_8. What I want to do is to fill these rows with 0, like:
video_id 0 1 2 3 4 5 6 7 8 9 ... 53 54 55 56
user_id ...
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
1 2 0 4 13 16 2 0 10 6 45 ... 3 352 6 0
2 0 0 0 0 0 0 0 11 0 0 ... 0 0 0 0
3 4 13 0 8 0 0 5 9 12 11 ... 14 17 0 6
4 0 0 4 13 25 4 0 33 0 39 ... 5 7 4 3
5 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
6 2 0 0 0 12 0 0 0 2 0 ... 19 4 0 0
7 33 59 52 59 113 53 29 32 59 82 ... 60 119 57 39
8 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
9 0 0 0 0 5 0 0 1 0 4 ... 16 0 0 0
10 0 0 0 0 40 0 0 0 0 0 ... 26 0 0 0
11 2 2 32 3 12 3 3 11 19 10 ... 16 3 3 9
12 0 0 0 0 0 0 0 7 0 0 ... 7 0 0 0
Is there any solution to this issue?
You could use np.arange + reindex (with numpy imported as np), assuming your index is meant to be a monotonically increasing integer range:
df = df.reindex(np.arange(df.index.min(), df.index.max() + 1), fill_value=0)
df
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 2 0 4 13 16 2 0 10 6 45
2 0 0 0 0 0 0 0 11 0 0
3 4 13 0 8 0 0 5 9 12 11
4 0 0 4 13 25 4 0 33 0 39
6 2 0 0 0 12 0 0 0 2 0
7 33 59 52 59 113 53 29 32 59 82
9 0 0 0 0 5 0 0 1 0 4
10 0 0 0 0 40 0 0 0 0 0
11 2 2 32 3 12 3 3 11 19 10
12 0 0 0 0 0 0 0 7 0 0
df.reindex(np.arange(df.index.min(), df.index.max() + 1), fill_value=0)
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 2 0 4 13 16 2 0 10 6 45
2 0 0 0 0 0 0 0 11 0 0
3 4 13 0 8 0 0 5 9 12 11
4 0 0 4 13 25 4 0 33 0 39
5 0 0 0 0 0 0 0 0 0 0 # <-----
6 2 0 0 0 12 0 0 0 2 0
7 33 59 52 59 113 53 29 32 59 82
8 0 0 0 0 0 0 0 0 0 0 # <-----
9 0 0 0 0 5 0 0 1 0 4
10 0 0 0 0 40 0 0 0 0 0
11 2 2 32 3 12 3 3 11 19 10
12 0 0 0 0 0 0 0 7 0 0
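If the user_ids should always start at 0 regardless of what the data happens to contain, reindexing to a fixed range works the same way; a sketch with made-up values:

```python
import pandas as pd

# Toy stand-in for the ratings frame; user_ids 1 and 3 are missing
df = pd.DataFrame(
    {0: [7, 8, 9], 1: [4, 5, 6]},
    index=pd.Index([0, 2, 4], name='user_id'),
)

# Reindex to a fixed 0..max range, inserting all-zero rows for missing ids
full = df.reindex(range(df.index.max() + 1), fill_value=0)
```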