If I create a dataframe like so:
import pandas as pd, numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('AB'))
How would I change the entries in column A to the number 16 for rows 0-15, for example? In other words, how do I replace cells based purely on index?
Use loc:
df.loc[0:15,'A'] = 16
print (df)
A B
0 16 45
1 16 5
2 16 97
3 16 58
4 16 26
5 16 87
6 16 51
7 16 17
8 16 39
9 16 73
10 16 94
11 16 69
12 16 57
13 16 24
14 16 43
15 16 77
16 41 0
17 3 21
18 0 98
19 45 39
20 66 62
21 8 53
22 69 47
23 48 53
Note: the older solution using .ix is deprecated (and removed as of pandas 1.0).
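Note that loc slices by label and includes both endpoints, so 0:15 covers 16 rows. For purely positional indexing, an iloc equivalent (end-exclusive, a sketch) would be:
df.iloc[0:16, df.columns.get_loc('A')] = 16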
In addition to the other answers, here is what you can do if you have a list of individual indices:
indices = [0,1,3,6,10,15]
df.loc[indices,'A'] = 16
print(df.head(16))
Output:
A B
0 16 4
1 16 4
2 4 3
3 16 4
4 1 1
5 3 0
6 16 4
7 2 1
8 4 4
9 3 4
10 16 0
11 3 1
12 4 2
13 2 2
14 2 1
15 16 1
One more solution is
df.at[0:15, 'A'] = 16
Note, however, that .at is designed for fast scalar (single-cell) access, so recent pandas versions may reject a slice here; .loc is the safer general choice.
print(df.head(20))
OUTPUT:
A B
0 16 44
1 16 86
2 16 97
3 16 79
4 16 94
5 16 24
6 16 88
7 16 43
8 16 64
9 16 39
10 16 84
11 16 42
12 16 8
13 16 72
14 16 23
15 16 28
16 18 11
17 76 15
18 12 38
19 91 6
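For reference, a cell-by-cell equivalent that uses .at the way it is designed (a sketch) would be:
for i in range(16):  # rows 0 through 15
    df.at[i, 'A'] = 16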
An interesting observation: the code below does change the value in the original dataframe:
df.loc[0:15,'A'] = 16
But if you use seemingly similar code like this:
df.loc[0:15]['A'] = 16
then the assignment is made on a copy of the slice, and the value in the original df object is not changed (this is the classic chained-indexing pitfall, for which pandas may emit a SettingWithCopyWarning).
Hope this saves some time for someone dealing with this issue.
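A minimal sketch demonstrating the difference on a fresh frame:
import pandas as pd, numpy as np

df = pd.DataFrame(np.random.randint(0, 100, size=(20, 2)), columns=list('AB'))
df.loc[0:15, 'A'] = 16   # single .loc call: modifies df in place
df.loc[0:15]['A'] = 16   # chained indexing: may write to a temporary copy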
Could you update the value of that column to -1.0 instead of 16? For me, it returns 255 instead of -1.0.
>>> effect_df.loc[3:5, ['city_SF', 'city_Seattle']] = -1.0
Rent city_SF city_Seattle
0 3999 1 0
1 4000 1 0
2 4001 1 0
3 3499 255 255
4 3500 255 255
5 3501 255 255
6 2499 0 1
7 2500 0 1
8 2501 0 1
To Mad Physicist: it appears that you first need to change the column data types from short integer to float. Your -1.0 was being cast to an unsigned 8-bit integer, where -1 wraps around to 255.
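A sketch of the fix, assuming the dummy columns are uint8 (as older versions of pd.get_dummies produced):
cols = ['city_SF', 'city_Seattle']
effect_df[cols] = effect_df[cols].astype('float64')  # avoid unsigned wrap-around
effect_df.loc[3:5, cols] = -1.0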
Related
How to iterate over rows in pandas to perform the manipulation in the format below
I have a csv file that contains 365 columns and 1152 rows (the row index repeats in blocks like (1,48), (1,48), ...). I need to select the K maximum rows from every (1,48) block and perform some manipulation on them.
Steps I took:
I used df.apply to do this.
Code I tried:
batterysize = 50

def with_battery(val):
    # val is a single column; adjust the selected rows in place
    for i in range(D2i.shape[0]):
        if i in [31,32,33,34,35,36]:  # [31,32,33,34,35,36] should be replaced by the top K max rows
            if val.iloc[i] > batterysize:
                val.iloc[i] -= batterysize  # large values lose the battery size
            else:
                val.iloc[i] = 0             # small values become zero
    return val

D2j = D2i.apply(with_battery, axis=0)
Here is the data:
Input Dataframe
1 2 3 4 5 6 7
1 10 11 34 21 23 12 10
2 11 11 11 11 11 11 11
3 32 32 32 32 32 32 32
4 21 21 21 21 21 21 21
5 42 42 42 42 42 42 42
6 34 34 34 34 34 34 34
1 21 21 21 21 21 21 21
2 22 22 22 22 22 22 22
3 54 54 54 54 54 54 54
4 45 45 45 45 45 45 45
5 43 43 43 43 43 43 43
6 42 42 42 42 42 42 42
> For K=3, rows (3,5,6) have the largest values, so I set values less than 50 to zero and replaced values greater than 50 with value - 50. Similarly, in the next chunk of rows, (3,4,5) are the top 3 rows, and I performed the same action (a groupby sketch follows the output below).
Output Dataframe
1 2 3 4 5 6 7
1 10 11 34 21 23 12 10
2 11 11 11 11 11 11 11
3 0 0 0 0 0 0 0
4 21 21 21 21 21 21 21
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
1 21 21 21 21 21 21 21
2 22 22 22 22 22 22 22
3 4 4 4 4 4 4 4
4 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
6 42 42 42 42 42 42 42
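Here is a minimal sketch of one way to do this with groupby, assuming the blocks are consecutive runs of equal length and that "top K" means the K rows with the largest row sums within each block (D2i is the input dataframe from the question):
import numpy as np
import pandas as pd

K = 3
batterysize = 50
block_size = 6  # 48 in the real data

# label each row with its block number
block = np.arange(len(D2i)) // block_size

def discharge(g):
    top = g.sum(axis=1).nlargest(K).index  # top-K rows of this block
    # values <= batterysize become 0; larger values lose batterysize
    g.loc[top] = (g.loc[top] - batterysize).clip(lower=0)
    return g

D2j = D2i.groupby(block, group_keys=False).apply(discharge)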
I am working on a dataframe and I want to group the data for an hour into 4 different slots of 15 minutes:
0-15 - 1st slot
15-30 - 2nd slot
30-45 - 3rd slot
45-00 (or 60) - 4th slot
I am not sure how to go about this. I tried extracting hours, minutes and seconds from the time, but what should I do next?
Use integer division by 15 and then add 1:
df = pd.DataFrame({'M': range(60)})
df['slot'] = df['M'] // 15 + 1
print (df)
M slot
0 0 1
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
11 11 1
12 12 1
13 13 1
14 14 1
15 15 2
16 16 2
17 17 2
18 18 2
19 19 2
20 20 2
21 21 2
22 22 2
23 23 2
24 24 2
25 25 2
26 26 2
27 27 2
28 28 2
29 29 2
30 30 3
31 31 3
32 32 3
33 33 3
34 34 3
35 35 3
36 36 3
37 37 3
38 38 3
39 39 3
40 40 3
41 41 3
42 42 3
43 43 3
44 44 3
45 45 4
46 46 4
47 47 4
48 48 4
49 49 4
50 50 4
51 51 4
52 52 4
53 53 4
54 54 4
55 55 4
56 56 4
57 57 4
58 58 4
59 59 4
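If the raw data is a time or datetime column rather than a minute count (the column name 'time' here is an assumption), the slot can be derived directly from the minute component:
df['slot'] = pd.to_datetime(df['time']).dt.minute // 15 + 1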
Here I'm sharing some sample data (I'm dealing with big data); the "counts" value varies from 1 to 3000+, sometimes more than that.
Sample data looks like :
ID counts
41 44 17 16 19 52 6
17 30 16 19 4
52 41 44 30 17 16 6
41 44 52 41 41 41 6
17 17 17 17 41 5
I was trying to split the "ID" column into multiple columns and then get the counts:
data = ...  # read the csv file
split_data = data.ID.apply(lambda x: pd.Series(str(x).split(" "))) # separating columns
As I mentioned, I'm dealing with big data, so this method is not very efficient, and I'm having trouble getting the "ID" counts.
I want to collect the total counts of each ID and map them to the corresponding ID columns.
Expected output:
ID counts 16 17 19 30 41 44 52
41 44 17 16 19 52 6 1 1 1 0 1 1 1
17 30 16 19 4 1 1 1 1 0 0 0
52 41 44 30 17 16 6 1 1 0 1 1 1 1
41 44 52 41 41 41 6 0 0 0 0 4 1 1
17 17 17 17 41 5 0 4 0 0 1 0 0
If you have any ideas, please let me know.
Thank you.
Use Counter to get the counts of the values split by space, inside a list comprehension:
from collections import Counter
L = [{int(k): v for k, v in Counter(x.split()).items()} for x in df['ID']]
df1 = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
df = df.join(df1)
print (df)
ID counts 16 17 19 30 41 44 52
0 41 44 17 16 19 52 6 1 1 1 0 1 1 1
1 17 30 16 19 4 1 1 1 1 0 0 0
2 52 41 44 30 17 16 6 1 1 0 1 1 1 1
3 41 44 52 41 41 41 6 0 0 0 0 4 1 1
4 17 17 17 17 41 5 0 4 0 0 1 0 0
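To see what one element of the list comprehension looks like, here is Counter applied to a single row's string (the fourth sample row):
from collections import Counter
Counter('41 44 52 41 41 41'.split())
# Counter({'41': 4, '44': 1, '52': 1})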
Another idea, though probably slower:
df1 = df.assign(a = df['ID'].str.split()).explode('a')
df1 = df.join(pd.crosstab(df1['ID'], df1['a']), on='ID')
print (df1)
ID counts 16 17 19 30 41 44 52
0 41 44 17 16 19 52 6 1 1 1 0 1 1 1
1 17 30 16 19 4 1 1 1 1 0 0 0
2 52 41 44 30 17 16 6 1 1 0 1 1 1 1
3 41 44 52 41 41 41 6 0 0 0 0 4 1 1
4 17 17 17 17 41 5 0 4 0 0 1 0 0
I am working in Python to create a new frame from two frames using pandas.
The first frame (called frame1) is composed of the following lines:
A B C D E
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
15 15 15 15 15
The second frame (called frame2) is:
A B C D E
19 19 19 19 19
24 24 24 24 24
29 29 29 29 29
34 34 34 34 34
39 39 39 39 39
44 44 44 44 44
49 49 49 49 49
54 54 54 54 54
59 59 59 59 59
64 64 64 64 64
69 69 69 69 69
74 74 74 74 74
79 79 79 79 79
84 84 84 84 84
89 89 89 89 89
94 94 94 94 94
99 99 99 99 99
Now I want to create a new dataset with this logic: starting from frame1, substitute every 5th row (until the end of frame1) with a random row of frame2 (and remove the used row from frame2). A possible output would be:
A B C D E
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
59 59 59 59 59
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
29 29 29 29 29
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
84 84 84 84 84
How can I do this operation?
It's quite simple:
frame1.loc[4::5] = frame2.sample(frac=1).reset_index(drop=True)
where
df.loc[4::5] selects every fifth row, starting with the fifth one in df, and
df.sample(frac=1).reset_index(drop=True) shuffles a df around randomly.
The assignment then aligns on the row labels, so rows 4, 9, 14, ... of the shuffled frame2 land in the matching rows of frame1.
One way is to first obtain the indices to update (we could also use slice assignment, but we'd have the problem of the end not being included), and then assign back a sample from df2 of the corresponding size:
import numpy as np

ix = np.flatnonzero(np.diff(np.arange(df1.shape[0]+1)//5))
df1.iloc[ix] = df2.sample(df1.shape[0]//5).to_numpy()
print(df1)
A B C D E
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
4 84 84 84 84 84
5 6 6 6 6 6
6 7 7 7 7 7
7 8 8 8 8 8
8 9 9 9 9 9
9 89 89 89 89 89
10 11 11 11 11 11
11 12 12 12 12 12
12 13 13 13 13 13
13 14 14 14 14 14
14 99 99 99 99 99
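For this frame, np.arange builds the same positions more directly (a sketch equivalent to the flatnonzero trick above):
ix = np.arange(4, len(df1), 5)  # rows 4, 9, 14
df1.iloc[ix] = df2.sample(len(ix)).to_numpy()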
I have two different dataframes with the same column names:
e.g.
0 1 2
0 10 13 17
1 14 21 34
2 68 32 12
0 1 2
0 45 56 32
1 9 22 86
2 55 64 19
I would like to append the second frame to the right of the first one while continuing the column names from the first frame. The output would look like this:
0 1 2 3 4 5
0 10 13 17 45 56 32
1 14 21 34 9 22 86
2 68 32 12 55 64 19
What is the most efficient way of doing this?
Thanks.
Use pd.concat first and then reset the columns.
In [1108]: df_out = pd.concat([df1, df2], axis=1)
In [1109]: df_out.columns = list(range(len(df_out.columns)))
In [1110]: df_out
Out[1110]:
0 1 2 3 4 5
0 10 13 17 45 56 32
1 14 21 34 9 22 86
2 68 32 12 55 64 19
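Alternatively, since ignore_index applies along the concatenation axis, pd.concat can renumber the columns in one step:
df_out = pd.concat([df1, df2], axis=1, ignore_index=True)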
Why not join:
>>> df=df.join(df_,lsuffix='_')
>>> df.columns=range(len(df.columns))
>>> df
0 1 2 3 4 5
0 10 13 17 45 56 32
1 14 21 34 9 22 86
2 68 32 12 55 64 19
join is your friend; I use lsuffix (rsuffix would work too) to avoid the error about duplicate column names.