How to replace specific data using pandas (Python) or Excel

I have a CSV file containing semicolon-delimited data. Some of the fields contain specific ID numbers, and I need to replace each ID with its corresponding location name.
Available data
24CFA4A-12L - GF Electrical corridor
Row to modify, where the ID number inside the semicolon-delimited field should be replaced:
1;1;35;;1/2/1/37 24CFA4A;;;0;;;
Files with data - https://gofile.io/d/bQDppz
Thank you if anyone has a solution.
(Screenshot: main data to be replaced after finding the ID number and substituting the location.)

Supposing you have dataframes:
df1 = pd.read_excel("ID_list.xlsx", header=None)
df2 = pd.read_excel("location.xlsx", header=None)
df1:
0
0 1;1;27;;1/2/1/29 25BAB3D;;;0;;;
1 1;1;27;1;;;;0;;;
2 1;1;28;;1/2/1/30 290E6D2;;;0;;;
3 1;1;28;1;;;;0;;;
4 1;1;29;;1/2/1/31 28BA737;;;0;;;
5 1;1;29;1;;;;0;;;
6 1;1;30;;1/2/1/32 2717823;;;0;;;
7 1;1;30;1;;;;0;;;
8 1;1;31;;1/2/1/33 254DEAA;;;0;;;
9 1;1;31;1;;;;0;;;
10 1;1;32;;1/2/1/34 28AE041;;;0;;;
11 1;1;32;1;;;;0;;;
12 1;1;33;;1/2/1/35 254DE82;;;0;;;
13 1;1;33;1;;;;0;;;
14 1;1;34;;1/2/1/36 2539D70;;;0;;;
15 1;1;34;1;;;;0;;;
16 1;1;35;;1/2/1/37 24CFA4A;;;0;;;
17 1;1;35;1;;;;0;;;
18 1;1;36;;1/2/1/39 28F023E;;;0;;;
19 1;1;36;1;;;;0;;;
20 1;1;37;;1/2/1/40 2717831;;;0;;;
21 1;1;37;1;;;;0;;;
22 1;1;38;;1/2/1/41 2397D75;;;0;;;
23 1;1;38;1;;;;0;;;
24 1;1;39;;1/2/1/42 287844C;;;0;;;
25 1;1;39;1;;;;0;;;
26 1;1;40;;1/2/1/43 28784F0;;;0;;;
27 1;1;40;1;;;;0;;;
28 1;1;41;;1/2/1/44 2865B67;;;0;;;
29 1;1;41;1;;;;0;;;
30 1;1;42;;1/2/1/45 2865998;;;0;;;
31 1;1;42;1;;;;0;;;
32 1;1;43;;1/2/1/46 287852F;;;0;;;
33 1;1;43;1;;;;0;;;
34 1;1;44;;1/2/1/47 287AC43;;;0;;;
35 1;1;44;1;;;;0;;;
36 1;1;45;;1/2/1/48 287ACF8;;;0;;;
37 1;1;45;1;;;;0;;;
38 1;1;46;;1/2/1/49 2878586;;;0;;;
39 1;1;46;1;;;;0;;;
40 1;1;47;;1/2/1/50 2878474;;;0;;;
41 1;1;47;1;;;;0;;;
42 1;1;48;;1/2/1/51 2846315;;;0;;;
df2:
0 1
0 GF General Dining TC 254DEAA-02L
1 GF General Dining TC 2717823-26L
2 GF General Dining FC 28BA737-50L
3 GF Preparation FC 25BAB3D-10L
4 GF Preparation TC 290E6D2-01M
5 GF Hospital Kitchen FC 25BAB2F-10L
6 GF Hospital Kitchen TC 2906F5C-01M
7 GF Food Preparation FC 25F5723-10L
8 GF Food Preparation TC 29070D6-01M
9 GF KITCHEN Corridor 254DF5D-02L
Then:
df1 = df1[0].str.split(";", expand=True)  # one column per semicolon-delimited field
df1[4] = df1[4].apply(lambda x: v[-1] if (v := x.split()) else "")  # keep only the trailing ID token
df2[1] = df2[1].apply(lambda x: x.split("-")[0])  # drop the "-12L"-style suffix
df1:
0 1 2 3 4 5 6 7 8 9 10
0 1 1 27 25BAB3D 0
1 1 1 27 1 0
2 1 1 28 290E6D2 0
3 1 1 28 1 0
4 1 1 29 28BA737 0
5 1 1 29 1 0
6 1 1 30 2717823 0
7 1 1 30 1 0
8 1 1 31 254DEAA 0
9 1 1 31 1 0
10 1 1 32 28AE041 0
11 1 1 32 1 0
12 1 1 33 254DE82 0
13 1 1 33 1 0
14 1 1 34 2539D70 0
15 1 1 34 1 0
16 1 1 35 24CFA4A 0
17 1 1 35 1 0
18 1 1 36 28F023E 0
19 1 1 36 1 0
20 1 1 37 2717831 0
21 1 1 37 1 0
22 1 1 38 2397D75 0
23 1 1 38 1 0
24 1 1 39 287844C 0
25 1 1 39 1 0
26 1 1 40 28784F0 0
27 1 1 40 1 0
28 1 1 41 2865B67 0
29 1 1 41 1 0
30 1 1 42 2865998 0
31 1 1 42 1 0
32 1 1 43 287852F 0
33 1 1 43 1 0
34 1 1 44 287AC43 0
35 1 1 44 1 0
36 1 1 45 287ACF8 0
37 1 1 45 1 0
38 1 1 46 2878586 0
39 1 1 46 1 0
40 1 1 47 2878474 0
41 1 1 47 1 0
42 1 1 48 2846315 0
df2:
0 1
0 GF General Dining TC 254DEAA
1 GF General Dining TC 2717823
2 GF General Dining FC 28BA737
3 GF Preparation FC 25BAB3D
4 GF Preparation TC 290E6D2
5 GF Hospital Kitchen FC 25BAB2F
6 GF Hospital Kitchen TC 2906F5C
7 GF Food Preparation FC 25F5723
8 GF Food Preparation TC 29070D6
9 GF KITCHEN Corridor 254DF5D
To replace the values:
m = dict(zip(df2[1], df2[0]))  # ID -> location name
df1[4] = df1[4].replace(m)
df1:
0 1 2 3 4 5 6 7 8 9 10
0 1 1 27 GF Preparation FC 0
1 1 1 27 1 0
2 1 1 28 GF Preparation TC 0
3 1 1 28 1 0
4 1 1 29 GF General Dining FC 0
5 1 1 29 1 0
6 1 1 30 GF General Dining TC 0
7 1 1 30 1 0
8 1 1 31 GF General Dining TC 0
9 1 1 31 1 0
10 1 1 32 28AE041 0
11 1 1 32 1 0
12 1 1 33 254DE82 0
13 1 1 33 1 0
14 1 1 34 2539D70 0
15 1 1 34 1 0
16 1 1 35 24CFA4A 0
17 1 1 35 1 0
18 1 1 36 28F023E 0
19 1 1 36 1 0
20 1 1 37 2717831 0
21 1 1 37 1 0
22 1 1 38 2397D75 0
23 1 1 38 1 0
24 1 1 39 287844C 0
25 1 1 39 1 0
26 1 1 40 28784F0 0
27 1 1 40 1 0
28 1 1 41 2865B67 0
29 1 1 41 1 0
30 1 1 42 2865998 0
31 1 1 42 1 0
32 1 1 43 287852F 0
33 1 1 43 1 0
34 1 1 44 287AC43 0
35 1 1 44 1 0
36 1 1 45 287ACF8 0
37 1 1 45 1 0
38 1 1 46 2878586 0
39 1 1 46 1 0
40 1 1 47 2878474 0
41 1 1 47 1 0
42 1 1 48 2846315 0
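Putting the steps above together, here is a minimal self-contained sketch of the whole approach, using inline sample rows in place of the two Excel files (the two sample rows and the single ID/location pair are assumptions for illustration):

```python
import pandas as pd

# Inline stand-ins for ID_list.xlsx and location.xlsx (hypothetical sample rows).
df1 = pd.DataFrame({0: [
    "1;1;35;;1/2/1/37 24CFA4A;;;0;;;",
    "1;1;35;1;;;;0;;;",
]})
df2 = pd.DataFrame({0: ["GF Electrical corridor"], 1: ["24CFA4A-12L"]})

# Split each row on ";"; field 4 holds "path id" or is empty.
df1 = df1[0].str.split(";", expand=True)
# Keep only the trailing ID token (empty string when the field is blank).
df1[4] = df1[4].apply(lambda x: v[-1] if (v := x.split()) else "")
# Strip the "-12L"-style suffix so the IDs match the extracted tokens.
df2[1] = df2[1].apply(lambda x: x.split("-")[0])

# Map ID -> location name and substitute.
m = dict(zip(df2[1], df2[0]))
df1[4] = df1[4].replace(m)
print(df1[4].tolist())  # ['GF Electrical corridor', '']
```

Empty fields stay empty, and any ID with no match in the location table is simply left unchanged by `replace`.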

Iterate over rows and calculate values

I have the following pandas dataframe:
temp stage issue_datetime
20 1 2022/11/30 19:20
21 1 2022/11/30 19:21
20 1 None
25 1 2022/11/30 20:10
30 2 None
22 2 2022/12/01 10:00
22 2 2022/12/01 10:01
31 3 2022/12/02 11:00
32 3 2022/12/02 11:01
19 1 None
20 1 None
I want to get the following result:
temp stage num_issues
20 1 3
21 1 3
20 1 3
25 1 3
30 2 2
22 2 2
22 2 2
31 3 2
32 3 2
19 1 0
20 1 0
Basically, I need to calculate the number of non-None values per continuous block of stage values and create a new column called num_issues.
How can I do it?
You can identify the blocks of continuous stage values with cumsum on the change points, then group by those blocks and transform with the sum of non-null values:
blocks = df['stage'].ne(df['stage'].shift()).cumsum()
df['num_issues'] = df['issue_datetime'].notna().groupby(blocks).transform('sum')
# or
# df['num_issues'] = df['issue_datetime'].groupby(blocks).transform('count')
Output:
temp stage issue_datetime num_issues
0 20 1 2022/11/30 19:20 3
1 21 1 2022/11/30 19:21 3
2 20 1 None 3
3 25 1 2022/11/30 20:10 3
4 30 2 None 2
5 22 2 2022/12/01 10:00 2
6 22 2 2022/12/01 10:01 2
7 31 3 2022/12/02 11:00 2
8 32 3 2022/12/02 11:01 2
9 19 1 None 0
10 20 1 None 0
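The blocks-via-cumsum technique can be sketched end to end on a small assumed sample (shorter than the question's data, for illustration):

```python
import pandas as pd

# Assumed minimal sample: three blocks of "stage" (1, 2, then 1 again).
df = pd.DataFrame({
    "stage": [1, 1, 1, 2, 2, 1],
    "issue_datetime": ["a", None, "b", None, "c", None],
})

# A new block starts whenever "stage" differs from the previous row.
blocks = df["stage"].ne(df["stage"].shift()).cumsum()
# Count non-null issue_datetime per block, broadcast back to every row.
df["num_issues"] = df["issue_datetime"].notna().groupby(blocks).transform("sum")
print(df["num_issues"].tolist())  # [2, 2, 2, 1, 1, 0]
```

Note the final stage-1 block is counted separately from the first one, which is exactly what distinguishes this from a plain `groupby('stage')`.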

Counting values in data frame rows against another df to see how many values are higher

I have two data frames
df2022fl: a dataframe of 24 rows
df: a dataframe with a single row of values
There are 1759 columns in each.
I want to check every row of the 24-row dataframe to count how many of its columns are above the corresponding column in the one-row df.
I used the code below, but keep getting the error shown after it:
( df2022fl > df.T[df2022fl.columns].values ).sum(axis=1)
KeyError: "None of [Index(['id', 'table_position', 'performance_rank', 'risk', 'competition_id',\n 'suspended_matches', 'homeAttackAdvantage', 'homeDefenceAdvantage',\n 'homeOverallAdvantage', 'seasonGoals_overall',\n ...\n 'freekicks_total_over275_away', 'freekicks_total_over285_overall',\n 'freekicks_total_over285_home', 'freekicks_total_over285_away',\n 'freekicks_total_over295_overall', 'freekicks_total_over295_home',\n 'freekicks_total_over295_away', 'freekicks_total_over305_overall',\n 'freekicks_total_over305_home', 'freekicks_total_over305_away'],\n dtype='object', length=1759)] are in the [columns]"
I have no idea why this is happening, as I also removed all object-typed columns so that only float64 dtypes remain.
Any ideas to help, please?
df in text format (this is the one-row dataframe):
234 5 5 42 32 0 4 -33 -2 54 30 84 55 29 54 31 19 30 20 10 35 31 34 56 49 58 74 71 71 3 4 -4 16 8 7 13 5 6 7 3 4 38 19 19 4 3 3 1 13 5 5 28 26 21 22 10 9 48 50 39 10 23 9 13 2 3 19 50 42 42 9 10 18 10 6 47 42 32 6 2 2 13 9 9 1 1 1 2 2 1 1 1 1 0 0 0 35 35 30 27 26 25 18 13 21 10 6 2 21 26 8 8 8 17 35 33 39 2 3 8 9 16 17 51 26 17 1 1 0 0 0 0 0 0 0 0 0 0 20 12 7 16 7 5 37 19 14 -8 -2 -9 0 3 5 14 27 34 0 8 13 37 60 81 96 85 67 44 26 4 37 35 32 21 11 2 0 8 25 48 67 79 0 2 5 11 16 18 92 78 65 37 16 0 18 16 14 7 3 0 0 0 0 11 50 83 0 0 0 2 11 16 92 83 67 48 21 4 19 19 16 11 5 1 1 8 24 3 17 52 4 25 48 1 6 11 0 9 57 0 2 12 38 19 19 25 25 22 15 14 9 5 3 2 66 64 49 39 36 19 13 8 4 12 12 9 6 5 3 1 1 1 63 63 47 32 26 16 5 5 5 13 13 12 9 9 4 3 2 0 68 63 50 46 39 17 13 9 0 31 24 19 13 8 5 2 82 63 50 34 21 13 5 16 14 11 6 3 2 1 84 74 56 32 16 11 4 15 10 8 7 5 2 1 78 53 42 37 26 9 5 26 21 15 10 5 3 2 57 47 32 21 11 6 4 12 9 3 2 1 0 0 52 41 14 9 5 0 0 14 12 9 6 2 1 0 61 52 38 25 8 4 0 37 34 25 18 12 4 2 0 0 94 81 57 46 28 8 4 0 0 18 15 11 8 5 1 1 0 0 88 74 46 39 21 4 4 0 0 19 19 14 10 5 3 1 0 0 96 83 63 42 26 13 4 0 0 29 19 7 2 0 0 0 75 40 15 4 0 0 0 12 8 3 2 0 0 0 63 33 13 8 0 0 0 17 10 4 0 0 0 0 83 42 17 0 0 0 0 33 21 11 5 0 0 0 77 55 25 11 0 0 0 17 10 4 2 0 0 0 71 46 17 8 0 0 0 16 11 4 1 0 0 0 83 57 21 4 0 0 0 5 6 2 7 7 14 30 29 30 176 91 85 66 27 35 8 8 9 4 4 4 161 63 94 3 3 3 10 0 4 0 1 1 1 374 229 145 9 12 7 177 88 75 197 127 70 4 4 3 5 6 3 48 51 46 9 9 8 377 182 195 151 66 79 62 28 31 32 16 16 3 2 3 1 1 1 32 31 27 19 12 6 4 96 78 59 41 26 13 9 16 15 12 7 5 1 1 96 78 52 30 22 4 4 16 16 14 10 7 4 1 96 75 57 46 28 17 4 30 18 4 1 0 0 0 78 38 9 2 0 0 0 16 9 0 0 0 0 0 78 39 0 0 0 0 0 14 9 4 1 0 0 0 75 38 17 4 0 0 0 8 4 3 17 17 13 38 19 19 6 4 1 13 17 5 7 3 3 15 13 13 1 1 0 3 5 0 1 1 0 0 0 0 0 0 0 46 31 15 14 9 5 32 19 10 4 10 26 11 26 67 3 7 14 13 37 70 0 3 12 0 16 63 57 33 24 1 1 1 14 8 5 30 35 22 14 5 9 33 26 39 5 2 3 13 11 13 0 2 0 6 4 2 16 21 9 6 2 2 13 9 9 18 9 9 39 39 39 17 6 
10 38 26 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 3 3 1 3 4 4 3 6 3 5 1 5 4 8 2 5 5 5 7 7 10 7 10 11 9 9 16 13 12 16 3 4 5 7 9 11 5 3 6 5 3 5 1 1 0 2 3 2 4 3 4 1 1 2 4 5 6 0 0 0 1 0 2 0 2 1 2 1 2 2 1 3 1 2 0 4 6 7 5 6 5 3 2 9 9 9 9 0 0 0 0 0 2 0 1 2 0 0 3 2 1 3 1 2 0 2 1 1 0 0 1 2 1 2 2 1 1 2 2 1 3 1 3 2 2 6 3 3 7 4 3 7 42 0 0 0 0 0 0 -2 -1 -1 -2 -1 -1 1 5 17 28 2 11 37 67 1 3 9 15 4 13 39 65 0 1 6 12 0 5 28 56 0 1 5 24 0 3 13 63 0 1 5 13 0 4 21 54 0 0 0 10 0 0 0 53 37 18 19 38 19 19 44 21 21 92 44 48 19 8 7 40 19 20 22 11 11 47 23 22 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 2 2 2 50 52 48 23 22 25 0 0 0 27 27 21 40 28 39 15 9 18 14 11 9 56 52 61 34 36 25 50 43 56 66 27 35 73 38 34 1 1 1 1 1 1 139 66 73 3 2 3 4 1 1 3 1 1 3 1 1 0 0 0 38 19 19 10 11 9 3 5 3 2 3 6 4 1 12 8 7 4 8 5 5 3 4 2 2 1 31 21 16 9 42 25 23 14 17 9 9 4 19 16 13 5 4 3 11 9 4 42 36 28 23 18 14 57 47 21 19 13 11 8 4 4 13 11 9 7 7 7 6 2 2 1 1 1 50 34 28 21 11 11 61 56 47 37 37 37 32 11 9 5 5 5 28 18 12 6 11 7 5 2 13 7 3 1 62 40 31 13 50 32 23 9 68 37 16 5 1 0 1 32 15 17 1 0 0 2 0 4 78 74 74 2 0 0 4 1 2 0 0 0 1 0 0 8 4 11 0 0 0 2 0 0 32 16 16 350 172 178 10 10 11 23 27 20 6 9 21 13 8 13 26 17 4 2 5 6 7 2 2 0 0 0 1 1 3 2 0 0 0 1 4 3 1 0 0 0 0 38 19 19 10 8 1 1 1 0 26 35 5 2 4 0 0 0 0 3 1 0 26 11 13 6 4 1 7 3 3 10 7 3 12 3 5 15 13 13 22 13 25 13 21 0 0 0 7 4 0 57 48 54 15 17 4 30 6 6 6 2 2 3 9 10 7 18 12 6 0 0 0 0 0 0 41 52 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 2 7 2 3 17 8 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
df2022fl below is the 24-row dataframe.
Compare the two dataframes using
df2022fl.ge(df.iloc[0]).sum()
This gives the number of values in each column of df2022fl that are greater than or equal to the corresponding value in df.
Output :
id 24
table_position 20
performance_rank 20
risk 23
competition_id 24
..
freekicks_total_over295_home 24
freekicks_total_over295_away 24
freekicks_total_over305_overall 24
freekicks_total_over305_home 24
freekicks_total_over305_away 24
Length: 1759, dtype: int64
To count, per row, how many columns are greater than or equal to the values in df, use axis=1:
df2022fl['stats'] = df2022fl.ge(df.iloc[0]).sum(axis=1)
This gives you the expected output:
id table_position ... freekicks_total_over305_away stats
1 234.0 6.0 ... 0.0 1688
2 235.0 18.0 ... 0.0 1529
3 236.0 16.0 ... 0.0 1565
4 237.0 24.0 ... 0.0 1409
5 242.0 3.0 ... 0.0 1566
6 244.0 4.0 ... 0.0 1681
7 246.0 23.0 ... 0.0 1607
8 247.0 5.0 ... 0.0 1642
9 248.0 14.0 ... 0.0 1603
10 253.0 15.0 ... 0.0 1575
11 254.0 12.0 ... 0.0 1554
12 255.0 13.0 ... 0.0 1593
13 257.0 20.0 ... 0.0 1533
14 258.0 21.0 ... 0.0 1537
15 259.0 9.0 ... 0.0 1585
16 262.0 17.0 ... 0.0 1488
17 265.0 11.0 ... 0.0 1647
18 267.0 7.0 ... 0.0 1628
19 268.0 2.0 ... 0.0 1615
20 1020.0 1.0 ... 0.0 1601
21 1827.0 8.0 ... 0.0 1603
22 1833.0 22.0 ... 0.0 1587
23 3124.0 19.0 ... 0.0 1594
24 3141.0 10.0 ... 0.0 1623
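A minimal runnable sketch of the same comparison, with tiny made-up frames standing in for the 1759-column data (column names and values are assumptions):

```python
import pandas as pd

# Toy stand-ins: df2022fl has several rows, df has one row of thresholds.
df2022fl = pd.DataFrame({"a": [1, 5, 9], "b": [2, 2, 2]})
df = pd.DataFrame({"a": [4], "b": [2]})

# Per row, count columns >= the single threshold row; df.iloc[0] is a
# Series, so ge() aligns by column name automatically.
df2022fl["stats"] = df2022fl.ge(df.iloc[0]).sum(axis=1)
print(df2022fl["stats"].tolist())  # [1, 2, 2]
```

Because `ge` aligns on column labels rather than positions, this also sidesteps the `KeyError` from the question's transpose-and-index approach.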

Pandas code to get the count of each value

Here I'm sharing some sample data (I'm dealing with big data); the "counts" value varies from 1 to 3000+, and sometimes more than that.
The sample data looks like:
ID counts
41 44 17 16 19 52 6
17 30 16 19 4
52 41 44 30 17 16 6
41 44 52 41 41 41 6
17 17 17 17 41 5
I was trying to split the "ID" column into multiple columns and get the counts from those:
data = pd.read_csv(csv_file)  # load the CSV
split_data = data.ID.apply(lambda x: pd.Series(str(x).split(" ")))  # one column per token
As I mentioned, I'm dealing with big data, so this method is not very effective, and I'm struggling to get the "ID" counts.
I want to collect the total counts of each ID & map it to the corresponding ID column.
Expected output:
ID counts 16 17 19 30 41 44 52
41 44 17 16 19 52 6 1 1 1 0 1 1 1
17 30 16 19 4 1 1 1 1 0 0 0
52 41 44 30 17 16 6 1 1 0 1 1 1 1
41 44 52 41 41 41 6 0 0 0 0 4 1 1
17 17 17 17 41 5 0 4 0 0 1 0 0
If you have any ideas, please let me know.
Thank you.
Use Counter to get the counts of the space-separated values in a list comprehension:
from collections import Counter
L = [{int(k): v for k, v in Counter(x.split()).items()} for x in df['ID']]
df1 = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
df = df.join(df1)
print (df)
ID counts 16 17 19 30 41 44 52
0 41 44 17 16 19 52 6 1 1 1 0 1 1 1
1 17 30 16 19 4 1 1 1 1 0 0 0
2 52 41 44 30 17 16 6 1 1 0 1 1 1 1
3 41 44 52 41 41 41 6 0 0 0 0 4 1 1
4 17 17 17 17 41 5 0 4 0 0 1 0 0
Another idea, though I guess slower:
df1 = df.assign(a = df['ID'].str.split()).explode('a')
df1 = df.join(pd.crosstab(df1['ID'], df1['a']), on='ID')
print (df1)
ID counts 16 17 19 30 41 44 52
0 41 44 17 16 19 52 6 1 1 1 0 1 1 1
1 17 30 16 19 4 1 1 1 1 0 0 0
2 52 41 44 30 17 16 6 1 1 0 1 1 1 1
3 41 44 52 41 41 41 6 0 0 0 0 4 1 1
4 17 17 17 17 41 5 0 4 0 0 1 0 0
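The Counter approach can be sketched on a small assumed sample (two rows instead of the question's five):

```python
from collections import Counter

import pandas as pd

# Assumed two-row sample in the question's shape.
df = pd.DataFrame({"ID": ["41 44 17", "17 17 41"], "counts": [3, 3]})

# One {id: count} dict per row, keyed by the integer ID.
L = [{int(k): v for k, v in Counter(x.split()).items()} for x in df["ID"]]
# Missing IDs become NaN, so fill with 0 and sort the ID columns.
dummies = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
out = df.join(dummies)
print(out)
```

Building plain dicts first and constructing the frame once keeps the per-row work cheap, which is the point of this approach for large inputs.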

Defining Target based on two column values

I am new to Python and am facing an issue solving the following problem.
I have the following dataframe:
SoldDate CountSoldperMonth
2019-06-01 20
5
10
12
33
16
50
27
2019-05-01 2
5
11
13
2019-04-01 32
35
39
42
47
55
61
80
I need to add a Target column such that, for the top 5 values in 'CountSoldperMonth' for a particular SoldDate, Target should be 1, else 0. If the number of rows in 'CountSoldperMonth' for a particular 'SoldDate' is less than 5, then only the row with the highest count will be marked as 1 in Target and the rest as 0. The resulting dataframe should look as below.
SoldDate CountSoldperMonth Target
2019-06-01 20 1
5 0
10 0
12 0
33 1
16 1
50 1
27 1
2019-05-01 2 0
5 0
11 0
13 1
2019-04-01 32 0
35 0
39 0
42 1
47 1
55 1
61 1
80 1
How do I do this?
In your case, use groupby with your rules, chained with an apply of if...else:
df.groupby('SoldDate').CountSoldperMonth.\
apply(lambda x : x==max(x) if len(x)<=5 else x.isin(sorted(x)[-5:])).astype(int)
Out[346]:
0 1
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 0
9 0
10 0
11 1
12 0
13 0
14 0
15 1
16 1
17 1
18 1
19 1
Name: CountSoldperMonth, dtype: int32
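A runnable sketch of the same rule on an assumed smaller sample; it uses transform instead of apply so the result stays aligned with the original index on recent pandas versions (the dates and counts below are illustrative):

```python
import pandas as pd

# Assumed sample: date "A" has 6 rows, date "B" has only 3.
df = pd.DataFrame({
    "SoldDate": ["A"] * 6 + ["B"] * 3,
    "CountSoldperMonth": [20, 5, 10, 12, 33, 16, 2, 5, 11],
})

# Groups with more than 5 rows: flag membership in the top-5 counts.
# Groups with 5 or fewer rows: flag only the maximum.
df["Target"] = (
    df.groupby("SoldDate")["CountSoldperMonth"]
      .transform(lambda x: x == x.max() if len(x) <= 5 else x.isin(sorted(x)[-5:]))
      .astype(int)
)
print(df["Target"].tolist())  # [1, 0, 1, 1, 1, 1, 0, 0, 1]
```

One caveat carried over from the original answer: `isin` on the top-5 values flags ties as well, so a group with duplicated counts can end up with more than five 1s.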

Pandas DataFrame Return Value from Column Index

I have a dataframe that holds column numbers referring to another dataframe. Is there a way that I can return the value from the other dataframe instead of just having the column index?
I basically want to match up the index between the Push and df dataframes. The values in the Push dataframe contain what column I want to return from the df dataframe.
Push dataframe:
0 1
0 1 2
1 0 3
2 0 3
3 1 3
4 0 2
df dataframe:
0 1 2 3 4
0 10 11 22 33 44
1 10 11 22 33 44
2 10 11 22 33 44
3 10 11 22 33 44
4 10 11 22 33 44
return:
0 1
0 11 22
1 10 33
2 10 33
3 11 33
4 10 22
You can do it with np.take; however, this function works on the flattened array, so push must be shifted like this:
In [285]: push1 = push.values+np.arange(0,25,5)[:,None]
In [229]: pd.DataFrame(df.values.take(push1))
EDIT
No, I just reinvented np.choose:
In [24]: df
Out[24]:
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 20 21 22 23 24
3 30 31 32 33 34
4 40 41 42 43 44
In [25]: push
Out[25]:
0 1
0 1 2
1 0 3
2 0 3
3 1 3
4 0 2
In [27]: np.choose(push.T,df).T
Out[27]:
0 1
0 1 2
1 10 13
2 20 23
3 31 33
4 40 42
Using melt then replace; note that df1 is your push and df2 is your df:
df1.astype(str).replace(df2.melt().drop_duplicates().set_index('variable').value.to_dict())
Out[31]:
0 1
0 11 22
1 10 33
2 10 33
3 11 33
4 10 22
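As an alternative sketch of the same per-row column lookup, NumPy integer (fancy) indexing avoids both take's flattening offset and np.choose; this uses the question's data, not the values from the answer's EDIT:

```python
import numpy as np
import pandas as pd

# df holds the values; push holds, per row, which column of df to pick.
df = pd.DataFrame([[10, 11, 22, 33, 44]] * 5)
push = pd.DataFrame({0: [1, 0, 0, 1, 0], 1: [2, 3, 3, 3, 2]})

rows = np.arange(len(df))
# For each push column c, pick df[row, push[row, c]] with fancy indexing.
out = pd.DataFrame({c: df.values[rows, push[c].values] for c in push.columns})
print(out)
```

Indexing with `(rows, column_indices)` pairs reads one element per row directly, so no shifting of push is needed.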
