Pandas: Perform operation on various columns and create, rename new columns - python

We have a dataframe 'A' with 5 columns. To add the rolling mean of each column, we could do:
A = pd.DataFrame(np.random.randint(100, size=(5, 5)))
for i in range(0, 5):
    A[i+6] = A[i].rolling(3).mean()
If, however, 'A' has columns named 'A', 'B', ... 'E':
A = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                 columns=['A', 'B', 'C', 'D', 'E'])
How could we neatly add 5 columns with the rolling mean, with each named ['A_mean', 'B_mean', ... 'E_mean']?

Try this:
for col in A:
    A[col + '_mean'] = A[col].rolling(3).mean()
Output with your way:
0 1 2 3 4 6 7 8 9 10
0 16 53 9 16 67 NaN NaN NaN NaN NaN
1 55 37 93 92 21 NaN NaN NaN NaN NaN
2 10 5 93 99 27 27.0 31.666667 65.000000 69.000000 38.333333
3 94 32 81 91 34 53.0 24.666667 89.000000 94.000000 27.333333
4 37 46 20 18 10 47.0 27.666667 64.666667 69.333333 23.666667
and Output with mine:
A B C D E A_mean B_mean C_mean D_mean E_mean
0 16 53 9 16 67 NaN NaN NaN NaN NaN
1 55 37 93 92 21 NaN NaN NaN NaN NaN
2 10 5 93 99 27 27.0 31.666667 65.000000 69.000000 38.333333
3 94 32 81 91 34 53.0 24.666667 89.000000 94.000000 27.333333
4 37 46 20 18 10 47.0 27.666667 64.666667 69.333333 23.666667

Without loops:
pd.concat([A, A.apply(lambda x:x.rolling(3).mean()).rename(
columns={col: str(col) + '_mean' for col in A})], axis=1)
A B C D E A_mean B_mean C_mean D_mean E_mean
0 67 54 85 61 62 NaN NaN NaN NaN NaN
1 44 53 30 80 58 NaN NaN NaN NaN NaN
2 10 59 14 39 12 40.333333 55.333333 43.0 60.000000 44.000000
3 47 25 58 93 38 33.666667 45.666667 34.0 70.666667 36.000000
4 73 80 30 51 77 43.333333 54.666667 34.0 61.000000 42.333333
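If building the rename dict by hand feels clunky, the same result can be had (a sketch) with `add_suffix`, which appends a suffix to every column label of the rolling-mean frame before joining it back:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                 columns=['A', 'B', 'C', 'D', 'E'])

# rolling(3).mean() keeps the original labels, so add_suffix renames
# them all at once before joining back onto A
result = A.join(A.rolling(3).mean().add_suffix('_mean'))
print(result.columns.tolist())
```

This avoids both the explicit loop and the dict comprehension.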


New column based on last time row value equals some numbers in Pandas dataframe

I have a dataframe, sorted by date in descending order, that records the Rank of students in a class and the predicted score.
Date Student_ID Rank Predicted_Score
4/7/2021 33 2 87
13/6/2021 33 4 88
31/3/2021 33 7 88
28/2/2021 33 2 86
14/2/2021 33 10 86
31/1/2021 33 8 86
23/12/2020 33 1 81
8/11/2020 33 3 80
21/10/2020 33 3 80
23/9/2020 33 4 80
20/5/2020 33 3 80
29/4/2020 33 4 80
15/4/2020 33 2 79
26/2/2020 33 3 79
12/2/2020 33 5 79
29/1/2020 33 1 70
I want to create a column called Recent_Predicted_Score that records the last Predicted_Score, from an earlier date, where that student actually ranked in the top 3. So the desired outcome looks like:
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
4/7/2021 33 2 87 86
13/6/2021 33 4 88 86
31/3/2021 33 7 88 86
28/2/2021 33 2 86 81
14/2/2021 33 10 86 81
31/1/2021 33 8 86 81
23/12/2020 33 1 81 80
8/11/2020 33 3 80 80
21/10/2020 33 3 80 80
23/9/2020 33 4 80 80
20/5/2020 33 3 80 79
29/4/2020 33 4 80 79
15/4/2020 33 2 79 79
26/2/2020 33 3 79 70
12/2/2020 33 5 79 70
29/1/2020 33 1 70
Here's what I have tried, but it doesn't quite work; I'm not sure if I am on the right track:
df.sort_values(by = ['Student_ID', 'Date'], ascending = [True, False], inplace = True)
lp1 = df['Predicted_Score'].where(df['Rank'].isin([1,2,3])).groupby(df['Student_ID']).bfill()
lp2 = df.groupby(['Student_ID', 'Rank'])['Predicted_Score'].shift(-1)
df = df.assign(Recent_Predicted_Score=lp1.mask(df['Rank'].isin([1,2,3]), lp2))
Thanks in advance.
Try:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values(['Student_ID', 'Date'])
df['Recent_Predicted_Score'] = np.where(df['Rank'].isin([1, 2, 3]), df['Predicted_Score'], np.nan)
df['Recent_Predicted_Score'] = df.groupby('Student_ID', group_keys=False)['Recent_Predicted_Score'].apply(lambda x: x.ffill().shift().fillna(''))
df = df.sort_values(['Student_ID', 'Date'], ascending = [True, False])
print(df)
Prints:
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
0 2021-07-04 33 2 87 86.0
1 2021-06-13 33 4 88 86.0
2 2021-03-31 33 7 88 86.0
3 2021-02-28 33 2 86 81.0
4 2021-02-14 33 10 86 81.0
5 2021-01-31 33 8 86 81.0
6 2020-12-23 33 1 81 80.0
7 2020-11-08 33 3 80 80.0
8 2020-10-21 33 3 80 80.0
9 2020-09-23 33 4 80 80.0
10 2020-05-20 33 3 80 79.0
11 2020-04-29 33 4 80 79.0
12 2020-04-15 33 2 79 79.0
13 2020-02-26 33 3 79 70.0
14 2020-02-12 33 5 79 70.0
15 2020-01-29 33 1 70
Mask the scores where rank is greater than 3, then group the masked column by Student_ID, shift and backward-fill to propagate the last top-3 predicted score:
c = 'Recent_Predicted_Score'
df[c] = df['Predicted_Score'].mask(df['Rank'].gt(3))
df[c] = df.groupby('Student_ID')[c].apply(lambda s: s.shift(-1).bfill())
Result
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
0 4/7/2021 33 2 87 86.0
1 13/6/2021 33 4 88 86.0
2 31/3/2021 33 7 88 86.0
3 28/2/2021 33 2 86 81.0
4 14/2/2021 33 10 86 81.0
5 31/1/2021 33 8 86 81.0
6 23/12/2020 33 1 81 80.0
7 8/11/2020 33 3 80 80.0
8 21/10/2020 33 3 80 80.0
9 23/9/2020 33 4 80 80.0
10 20/5/2020 33 3 80 79.0
11 29/4/2020 33 4 80 79.0
12 15/4/2020 33 2 79 79.0
13 26/2/2020 33 3 79 70.0
14 12/2/2020 33 5 79 70.0
15 29/1/2020 33 1 70 NaN
Note: Make sure your dataframe is sorted on Date in descending order.
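The same mask-then-look-back idea on a tiny made-up frame, using the groupby-level shift and bfill methods directly (which sidestep apply's group-key index behavior); all numbers are invented:

```python
import pandas as pd

# rows already sorted by descending Date within each student (invented data)
df = pd.DataFrame({
    'Student_ID':      [33, 33, 33, 66, 66, 66],
    'Rank':            [ 5,  2,  1,  4,  7,  2],
    'Predicted_Score': [90, 85, 80, 70, 65, 60],
})

c = 'Recent_Predicted_Score'
# keep the score only where the student ranked top 3, else NaN
df[c] = df['Predicted_Score'].mask(df['Rank'].gt(3))
# within each student: shift(-1) looks at strictly older rows,
# bfill propagates the nearest older top-3 score upward
df[c] = df.groupby('Student_ID')[c].shift(-1)
df[c] = df.groupby('Student_ID')[c].bfill()
print(df)
```

The oldest row per student ends up NaN, since there is no earlier top-3 score to fall back on.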
Let's assume:
there may be more than one unique Student_ID
the rows are ordered by descending Date as indicated by OP, but may not be ordered by Student_ID
we want to preserve the index of the original dataframe
Subject to these assumptions, here's a way to do what your question asks:
df['Recent_Predicted_Score'] = df.loc[df.Rank <= 3, 'Predicted_Score']
df['Recent_Predicted_Score'] = (df
    .groupby('Student_ID', sort=False)
    .apply(lambda group: group.shift(-1).bfill())
    ['Recent_Predicted_Score'])
Explanation:
create a new column Recent_Predicted_Score containing Predicted_Score where Rank is in the top 3 and NaN otherwise
use groupby() on Student_ID with the sort argument set to False for better performance (groupby() preserves the order of rows within each group, so the existing descending order by Date is untouched)
within each group, do shift(-1) and bfill() to get the desired result for Recent_Predicted_Score.
Sample input (with two distinct Student_ID values):
Date Student_ID Rank Predicted_Score
0 2021-07-04 33 2 87
1 2021-07-04 66 2 87
2 2021-06-13 33 4 88
3 2021-06-13 66 4 88
4 2021-03-31 33 7 88
5 2021-03-31 66 7 88
6 2021-02-28 33 2 86
7 2021-02-28 66 2 86
8 2021-02-14 33 10 86
9 2021-02-14 66 10 86
10 2021-01-31 33 8 86
11 2021-01-31 66 8 86
12 2020-12-23 33 1 81
13 2020-12-23 66 1 81
14 2020-11-08 33 3 80
15 2020-11-08 66 3 80
16 2020-10-21 33 3 80
17 2020-10-21 66 3 80
18 2020-09-23 33 4 80
19 2020-09-23 66 4 80
20 2020-05-20 33 3 80
21 2020-05-20 66 3 80
22 2020-04-29 33 4 80
23 2020-04-29 66 4 80
24 2020-04-15 33 2 79
25 2020-04-15 66 2 79
26 2020-02-26 33 3 79
27 2020-02-26 66 3 79
28 2020-02-12 33 5 79
29 2020-02-12 66 5 79
30 2020-01-29 33 1 70
31 2020-01-29 66 1 70
Output:
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
0 2021-07-04 33 2 87 86.0
1 2021-07-04 66 2 87 86.0
2 2021-06-13 33 4 88 86.0
3 2021-06-13 66 4 88 86.0
4 2021-03-31 33 7 88 86.0
5 2021-03-31 66 7 88 86.0
6 2021-02-28 33 2 86 81.0
7 2021-02-28 66 2 86 81.0
8 2021-02-14 33 10 86 81.0
9 2021-02-14 66 10 86 81.0
10 2021-01-31 33 8 86 81.0
11 2021-01-31 66 8 86 81.0
12 2020-12-23 33 1 81 80.0
13 2020-12-23 66 1 81 80.0
14 2020-11-08 33 3 80 80.0
15 2020-11-08 66 3 80 80.0
16 2020-10-21 33 3 80 80.0
17 2020-10-21 66 3 80 80.0
18 2020-09-23 33 4 80 80.0
19 2020-09-23 66 4 80 80.0
20 2020-05-20 33 3 80 79.0
21 2020-05-20 66 3 80 79.0
22 2020-04-29 33 4 80 79.0
23 2020-04-29 66 4 80 79.0
24 2020-04-15 33 2 79 79.0
25 2020-04-15 66 2 79 79.0
26 2020-02-26 33 3 79 70.0
27 2020-02-26 66 3 79 70.0
28 2020-02-12 33 5 79 70.0
29 2020-02-12 66 5 79 70.0
30 2020-01-29 33 1 70 NaN
31 2020-01-29 66 1 70 NaN
Output sorted by Student_ID, Date for easier inspection:
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
0 2021-07-04 33 2 87 86.0
2 2021-06-13 33 4 88 86.0
4 2021-03-31 33 7 88 86.0
6 2021-02-28 33 2 86 81.0
8 2021-02-14 33 10 86 81.0
10 2021-01-31 33 8 86 81.0
12 2020-12-23 33 1 81 80.0
14 2020-11-08 33 3 80 80.0
16 2020-10-21 33 3 80 80.0
18 2020-09-23 33 4 80 80.0
20 2020-05-20 33 3 80 79.0
22 2020-04-29 33 4 80 79.0
24 2020-04-15 33 2 79 79.0
26 2020-02-26 33 3 79 70.0
28 2020-02-12 33 5 79 70.0
30 2020-01-29 33 1 70 NaN
1 2021-07-04 66 2 87 86.0
3 2021-06-13 66 4 88 86.0
5 2021-03-31 66 7 88 86.0
7 2021-02-28 66 2 86 81.0
9 2021-02-14 66 10 86 81.0
11 2021-01-31 66 8 86 81.0
13 2020-12-23 66 1 81 80.0
15 2020-11-08 66 3 80 80.0
17 2020-10-21 66 3 80 80.0
19 2020-09-23 66 4 80 80.0
21 2020-05-20 66 3 80 79.0
23 2020-04-29 66 4 80 79.0
25 2020-04-15 66 2 79 79.0
27 2020-02-26 66 3 79 70.0
29 2020-02-12 66 5 79 70.0
31 2020-01-29 66 1 70 NaN

first argument must be an iterable of pandas objects, you passed an object of type "DataFrame" - not sure why

I have the code below. I am trying to concatenate columns together and fill an empty dataframe, 'emptyframe'.
The idea is that I start with 3 columns, then add on another 3 columns, then another 3 columns, etc.
emptyframe = pd.DataFrame()
for j in range(0,len(df.columns)):
    newdata = pd.concat((newdf.loc[:,j],happydf.loc[:,j],motivedf.loc[:,j]),axis=1)
    emptyframe = pd.concat(newdata,emptyframe)
print(emptyframe)
However, this code gives me the error 'first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"'.
As an example:
newdata = pd.concat((newdf.loc[:,0],happydf.loc[:,0],motivedf.loc[:,0]),axis=1)
Gives me:
0 0 0
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 fullMiss 37 66
6 nearMiss 33 67
7 hit 75 60
8 fullMiss 36 63
9 hit 74 42
10 nearMiss 19 45
11 fullMiss 24 28
12 fullMiss 13 20
13 nearMiss 2 9
14 fullMiss 8 9
15 fullMiss 3 4
16 nearMiss 52 5
17 fullMiss 49 2
18 fullMiss 52 3
19 fullMiss 52 0
20 hit 50 10
21 nearMiss 59 3
22 hit 52 2
23 fullMiss 54 4
24 nearMiss 35 1
25 fullMiss 49 0
26 nearMiss 51 13
27 fullMiss 54 9
28 nearMiss 53 4
I would be so grateful for a helping hand!
As an example, the first 10 lines of 'newdf':
0 1 2 3 4 5 6 ... 71 72 73 74 75 76 77
0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
5 fullMiss fullMiss fullMiss fullMiss fullMiss hit fullMiss ... fullMiss fullMiss nearMiss nearMiss fullMiss nearMiss fullMiss
6 nearMiss nearMiss nearMiss nearMiss nearMiss fullMiss nearMiss ... nearMiss nearMiss hit fullMiss hit fullMiss nearMiss
7 hit hit hit fullMiss fullMiss nearMiss fullMiss ... hit hit fullMiss hit fullMiss hit fullMiss
8 fullMiss fullMiss hit fullMiss fullMiss fullMiss fullMiss ... fullMiss nearMiss fullMiss nearMiss nearMiss fullMiss nearMiss
9 hit fullMiss hit nearMiss nearMiss fullMiss fullMiss ... nearMiss nearMiss hit hit fullMiss nearMiss fullMiss
happydf:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... 64 65 66 67 68 69 70 71 72 73 74 75 76 77
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 37 31 48 32 17 70 6 3 22 52 20 99 51 29 ... 3 51 52 50 18 28 22 52 35 36 1 9 4 20
6 33 42 39 35 34 8 15 0 23 49 67 50 50 29 ... 49 50 41 43 54 39 35 35 16 68 0 89 0 62
7 75 75 65 36 52 20 4 0 28 100 34 49 47 28 ... 97 42 100 43 20 36 100 43 23 28 99 21 30 35
8 36 63 70 33 53 1 5 50 7 18 29 1 64 74 ... 50 46 50 45 19 72 58 30 48 12 3 54 0 38
9 74 32 58 35 30 16 8 49 83 50 30 1 51 39 ... 48 39 31 39 12 37 42 59 18 68 88 12 5 33
motivedf
0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... 64 65 66 67 68 69 70 71 72 73 74 75 76 77
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 66 50 65 62 17 35 46 95 51 72 51 0 56 24 ... 79 26 82 25 76 25 69 51 86 15 71 66 2 60
6 67 44 60 51 40 67 46 49 52 74 50 0 58 26 ... 85 24 48 24 77 34 62 64 66 50 63 70 4 64
7 60 58 67 39 45 52 48 0 52 95 53 0 51 24 ... 83 11 66 28 76 28 90 63 45 23 71 53 55 50
8 63 44 67 34 52 0 48 53 52 50 53 0 55 41 ... 51 10 28 28 51 72 65 62 50 0 62 50 4 62
9 42 44 58 28 50 69 52 50 52 49 51 1 53 32 ... 39 0 52 45 35 16 45 64 31 15 69 39 3 49
I used the 'append' function to solve this issue:
emptyframe = pd.DataFrame()
for j in range(0,len(df.columns)):
    newdata = pd.concat((newdf.iloc[:,j],happydf.iloc[:,j],motivedf.iloc[:,j]),axis=1)
    emptyframe = emptyframe.append(newdata)
print(emptyframe)
:)
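Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. The same fix works by collecting the pieces in a list and concatenating once at the end, which is exactly what the error message is asking for (`pd.concat` wants an iterable of pandas objects). A sketch with tiny made-up stand-ins for the three frames, and hypothetical column labels:

```python
import pandas as pd

# small stand-ins for newdf, happydf and motivedf (invented data)
newdf = pd.DataFrame([['hit', 'miss'], ['miss', 'hit']])
happydf = pd.DataFrame([[37, 31], [33, 42]])
motivedf = pd.DataFrame([[66, 50], [67, 44]])

pieces = []
for j in range(len(newdf.columns)):
    # pd.concat's first argument must be an iterable of pandas objects,
    # hence the tuple here and the list below
    piece = pd.concat((newdf.iloc[:, j], happydf.iloc[:, j],
                       motivedf.iloc[:, j]), axis=1)
    piece.columns = ['response', 'happy', 'motive']  # hypothetical labels
    pieces.append(piece)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Giving each piece the same column labels also avoids alignment problems when the pieces are stacked.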

Python socket not receiving bytes of data from server

The following is my code for the server and client. The client receives the first 4 bytes, which give the length of the data block, and then receives that block from the server. Once received, it performs certain calculations, writes the result into a file, and then receives the next 4 bytes to get the next block, and so on.
After the first pass through the loop and writing into the file, it fails to receive the next 4 bytes and hence comes out of the loop. I am not sure why this is happening. Please let me know where I am going wrong.
Thank you in advance!
Server code
import socket
import threading
import os
def Main():
host = '127.0.0.1'
port = 5009
s = socket.socket()
s.bind((host,port))
s.listen(5)
print("Server started")
while True:
c,addr = s.accept()
print("Client connected ip:<" + str(addr) + ">")
c.sendall('1685 2020/03/02 14:42:05 318397 4 1 25 0 0 0 0 1513,094 1516,156 1519,154 1521,969 1525,029 1527,813 1530,921 1533,869 1536,740 1539,943 1542,921 1545,879 1548,843 1551,849 1554,760 1557,943 1560,782 1563,931 1566,786 1569,751 1572,690 1575,535 1578,638 1581,755 1584,759 41 39 33 39 48 44 49 55 61 58 64 55 68 74 68 59 57 74 61 68 58 64 54 47 46 2 25 0 0 0 0 1512,963 1515,935 1518,857 1521,849 1524,655 1527,577 1530,332 1533,233 1536,204 1539,488 1542,571 1545,725 1549,200 1552,430 1555,332 1558,484 1561,201 1564,285 1567,001 1569,870 1572,758 1575,491 1578,512 1581,547 1584,405 48 43 37 42 57 54 59 62 67 58 71 59 77 82 82 64 71 88 77 79 72 73 63 49 50 3 25 0 0 0 0 1513,394 1516,517 1519,536 1522,082 1525,428 1527,963 1531,288 1534,102 1536,659 1539,757 1542,707 1545,627 1548,389 1551,459 1554,406 1557,986 1560,667 1564,103 1567,036 1570,144 1573,189 1575,888 1579,185 1582,323 1585,338 35 36 32 37 57 58 61 64 75 73 70 62 61 62 59 51 52 64 58 62 70 70 64 54 55 4 25 0 0 0 0 1512,658 1515,752 1518,797 1521,707 1524,744 1527,627 1530,871 1534,002 1537,086 1540,320 1543,217 1546,010 1548,660 1551,385 1554,253 1557,074 1560,193 1563,116 1566,043 1568,963 1571,855 1574,957 1577,954 1581,128 1584,273 43 42 39 40 56 50 56 62 65 54 59 62 75 79 73 63 67 77 73 75 68 62 54 51 51 100 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN'.encode())
c.sendall('1685 2020/03/03 14:42:05 318398 4 1 25 0 0 0 0 1513,094 1516,156 1519,154 1521,969 1525,029 1527,812 1530,921 1533,869 1536,740 1539,943 1542,922 1545,878 1548,843 1551,849 1554,760 1557,944 1560,782 1563,931 1566,786 1569,751 1572,691 1575,535 1578,638 1581,755 1584,758 41 39 33 39 48 44 49 55 61 58 64 55 68 74 68 59 57 74 61 68 58 64 54 47 46 2 25 0 0 0 0 1512,963 1515,935 1518,857 1521,849 1524,655 1527,577 1530,332 1533,233 1536,204 1539,489 1542,571 1545,725 1549,200 1552,430 1555,331 1558,484 1561,201 1564,285 1567,002 1569,870 1572,758 1575,491 1578,512 1581,547 1584,405 48 43 37 42 57 54 58 62 67 58 70 59 77 82 82 64 71 88 77 79 72 73 63 49 50 3 25 0 0 0 0 1513,394 1516,517 1519,536 1522,082 1525,427 1527,963 1531,288 1534,102 1536,659 1539,757 1542,707 1545,627 1548,389 1551,459 1554,406 1557,986 1560,666 1564,103 1567,036 1570,144 1573,190 1575,887 1579,185 1582,323 1585,338 34 35 32 37 57 58 61 64 75 73 70 62 61 61 59 51 52 64 58 62 70 70 64 54 55 4 25 0 0 0 0 1512,658 1515,753 1518,797 1521,707 1524,744 1527,627 1530,872 1534,002 1537,086 1540,320 1543,217 1546,011 1548,660 1551,385 1554,253 1557,074 1560,193 1563,116 1566,043 1568,963 1571,855 1574,957 1577,953 1581,129 1584,273 43 42 39 40 56 50 56 62 65 54 59 62 75 79 73 63 67 77 73 75 68 62 54 51 51 100 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN'.encode())
c.sendall('1685 2020/03/04 14:42:05 318398 4 1 25 0 0 0 0 1513,094 1516,156 1519,154 1521,969 1525,029 1527,812 1530,921 1533,869 1536,740 1539,943 1542,922 1545,878 1548,843 1551,849 1554,760 1557,944 1560,782 1563,931 1566,786 1569,751 1572,691 1575,535 1578,638 1581,755 1584,758 41 39 33 39 48 44 49 55 61 58 64 55 68 74 68 59 57 74 61 68 58 64 54 47 46 2 25 0 0 0 0 1512,963 1515,935 1518,857 1521,849 1524,655 1527,577 1530,332 1533,233 1536,204 1539,489 1542,571 1545,725 1549,200 1552,430 1555,331 1558,484 1561,201 1564,285 1567,002 1569,870 1572,758 1575,491 1578,512 1581,547 1584,405 48 43 37 42 57 54 58 62 67 58 70 59 77 82 82 64 71 88 77 79 72 73 63 49 50 3 25 0 0 0 0 1513,394 1516,517 1519,536 1522,082 1525,427 1527,963 1531,288 1534,102 1536,659 1539,757 1542,707 1545,627 1548,389 1551,459 1554,406 1557,986 1560,666 1564,103 1567,036 1570,144 1573,190 1575,887 1579,185 1582,323 1585,338 34 35 32 37 57 58 61 64 75 73 70 62 61 61 59 51 52 64 58 62 70 70 64 54 55 4 25 0 0 0 0 1512,658 1515,753 1518,797 1521,707 1524,744 1527,627 1530,872 1534,002 1537,086 1540,320 1543,217 1546,011 1548,660 1551,385 1554,253 1557,074 1560,193 1563,116 1566,043 1568,963 1571,855 1574,957 1577,953 1581,129 1584,273 43 42 39 40 56 50 56 62 65 54 59 62 75 79 73 63 67 77 73 75 68 62 54 51 51 100 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN'.encode())
c.sendall('1685 2020/03/05 14:42:05 318398 4 1 25 0 0 0 0 1513,094 1516,156 1519,154 1521,969 1525,029 1527,812 1530,921 1533,869 1536,740 1539,943 1542,922 1545,878 1548,843 1551,849 1554,760 1557,944 1560,782 1563,931 1566,786 1569,751 1572,691 1575,535 1578,638 1581,755 1584,758 41 39 33 39 48 44 49 55 61 58 64 55 68 74 68 59 57 74 61 68 58 64 54 47 46 2 25 0 0 0 0 1512,963 1515,935 1518,857 1521,849 1524,655 1527,577 1530,332 1533,233 1536,204 1539,489 1542,571 1545,725 1549,200 1552,430 1555,331 1558,484 1561,201 1564,285 1567,002 1569,870 1572,758 1575,491 1578,512 1581,547 1584,405 48 43 37 42 57 54 58 62 67 58 70 59 77 82 82 64 71 88 77 79 72 73 63 49 50 3 25 0 0 0 0 1513,394 1516,517 1519,536 1522,082 1525,427 1527,963 1531,288 1534,102 1536,659 1539,757 1542,707 1545,627 1548,389 1551,459 1554,406 1557,986 1560,666 1564,103 1567,036 1570,144 1573,190 1575,887 1579,185 1582,323 1585,338 34 35 32 37 57 58 61 64 75 73 70 62 61 61 59 51 52 64 58 62 70 70 64 54 55 4 25 0 0 0 0 1512,658 1515,753 1518,797 1521,707 1524,744 1527,627 1530,872 1534,002 1537,086 1540,320 1543,217 1546,011 1548,660 1551,385 1554,253 1557,074 1560,193 1563,116 1566,043 1568,963 1571,855 1574,957 1577,953 1581,129 1584,273 43 42 39 40 56 50 56 62 65 54 59 62 75 79 73 63 67 77 73 75 68 62 54 51 51 100 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN'.encode())
c.close()
if __name__ == '__main__':
Main()
Client code
import socket
import turtle
from tkinter import *
root = Tk()
canvas = Canvas(root, width = 150, height = 50)
canvas.grid(row = 4, column = 3)
input_label = Label(root, text = "Input all the gratings set straight wavelength values in nm")
input_label.grid(row = 0)
# Green light
green_light = turtle.RawTurtle(canvas)
green_light.shape('circle')
green_light.color('grey')
green_light.penup()
green_light.goto(0,0)
# Red light
red_light = turtle.RawTurtle(canvas)
red_light.shape('circle')
red_light.color('grey')
red_light.penup()
red_light.goto(40,0)
core_string = "Core "
entries = []
label_col_inc = 0
entry_col_inc = 1
core_range = range(1, 5)
for y in core_range:
    core_text = core_string + str(y) + '_' + '25'
    core_label = Label(root, text = core_text)
    entry = Entry(root)
    core_label.grid(row=1, column=label_col_inc, sticky=E)
    entry.grid(row=1, column=entry_col_inc)
    entries.append(entry)
    label_col_inc += 2
    entry_col_inc += 2
threshold_label = Label(root, text = "Threshold in nm")
entry_threshold = Entry(root)
threshold_label.grid(row = 2, sticky = E)
entry_threshold.grid(row = 2, column = 1)
#dataNo_label = Label(root, text = "Number of data to be collected")
#entry_datNo = Entry(root)
#dataNo_label.grid(row = 3, sticky = E)
#entry_datNo.grid(row = 3, column = 1)
# function to receive TCP data blocks
def getData():
    host = '127.0.0.1'
    port = 5009
    s = socket.socket()
    s.connect((host, port))
    len_message = s.recv(4)
    print('len_msg is', len_message)
    # input of the number of data to be received
    #dataNo = float(entry_datNo.get())
    while len_message:
        bytes_length = int.from_bytes(len_message,'big')
        data = s.recv(bytes_length)
        stringdata = data.decode('utf-8')
        rep_str = stringdata.replace(",", ".")
        splitstr = rep_str.split()
        # received wavelength values
        inc = 34
        wav_threshold = []
        for y in entries:
            straight_wav = float(y.get())
            print('y is', straight_wav)
            wav = float(splitstr[inc])
            wav_diff = wav - straight_wav
            if wav_diff < 0:
                wav_diff = wav_diff * (-1)
            wav_threshold.append(wav_diff)
            inc += 56
        threshold = float(entry_threshold.get())
        # writing into the file
        data = []
        inc1 = 0
        col1 = 2
        col2 = 6
        data.insert(0, (str(splitstr[0])))
        data.insert(1, (str(splitstr[1])))
        for x in wav_threshold:
            if (x > threshold):
                red_light.color('red')
                green_light.color('grey')
                data.insert(col1, (str(splitstr[34 + inc1])))
                data.insert(col2,(str(x)))
            else:
                red_light.color('grey')
                green_light.color('green')
                data.insert(col1,'-')
                data.insert(col2,'-')
            inc1 += 56
            col1 += 1
            col2 += 1
        write_file(data)
        print('writing...')
        len_message = s.recv(4)
        print('len_message is', len_message)
    print('out of loop')
    s.close()
# function to write into the file
def write_file(data):
    with open("Output.txt", "a") as text_file:
        text_file.write('\t'.join(data[0:]))
        text_file.write('\n')
data_button = Button(root, text = "Get data above threshold", command = getData)
data_button.grid(row = 5, column = 0)
root.mainloop()
The same server code works fine with the following Client code:
import socket
import pandas as pd  # used by write_excel below; missing from the original
def Main():
    host = '127.0.0.1'
    port = 5001
    s = socket.socket()
    s.connect((host,port))
    i = 0
    first_wav = float(input("Enter the threshold for 1-25: "))
    second_wav = float(input("Enter the threshold for 2-25: "))
    third_wav = float(input("Enter the threshold for 3-25: "))
    fourth_wav = float(input("Enter the threshold for 4-25: "))
    len_message = s.recv(4)
    while len_message:
        print(len_message)
        while i < 2:
            bytes_length = int(len_message.decode())
            data = s.recv(bytes_length)
            stringdata = data.decode('utf-8')
            rep_str = stringdata.replace(",",".")
            splitstr = rep_str.split()
            maxlength = len(splitstr)
            print('1-25 wavelength value:', splitstr[34])
            print('2-25 wavelength value:', splitstr[90])
            print('3-25 wavelength value:', splitstr[146])
            print('4-25 wavelength value:', splitstr[202], '\n')
            print('Threshold limit check:')
            a = float(splitstr[34])
            b = float(splitstr[90])
            c = float(splitstr[146])
            d = float(splitstr[202])
            if a > first_wav:
                print('1-25 Above Threshold')
            if b > second_wav:
                print('2-25 Above Threshold')
            if c > third_wav:
                print('3-25 Above Threshold')
            if d > fourth_wav:
                print('4-25 Above Threshold \n')
            len_message = s.recv(4)
            i += 1
            print('i is', i)
        else:
            print('out of loop')
            s.close()
def write_file(data):
    with open("Output.txt", "ab") as text_file:
        text_file.write(data)
        print('data written')
        text_file.write('\n'.encode())
def write_excel(data):
    df = pd.DataFrame(data)
    df.to_excel('Output_Excel.xlsx','Sheet 1')
if __name__ == '__main__':
    Main()
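One likely source of trouble with this kind of framing, independent of the GUI code: `s.recv(n)` may return fewer than n bytes, so both the 4-byte header read and the block read need a loop. Note also that the server shown never actually sends a 4-byte length prefix at all; it sends raw text, so `int.from_bytes` ends up interpreting the first characters of the data itself as a huge length. A minimal sketch of a helper that reads exactly n bytes, assuming a genuinely length-prefixed protocol on the server side:

```python
import socket

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)

# usage sketch (hypothetical length-prefixed stream):
# header = recv_exact(s, 4)
# length = int.from_bytes(header, 'big')
# block = recv_exact(s, length)
```

The server would then need to send `len(payload).to_bytes(4, 'big') + payload` for each block.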

Move every second row to row above in pandas dataframe

I have a dataframe in this shape:
A B C D E
213-1 XL NaN NaN NaN
21 22.0 12 232.0 101.32
23-0 L NaN NaN NaN
12 23 12 232.2 NaN
31-0 LS NaN NaN NaN
70 70 23 NaN 21.22
I would like to move every second row of the dataframe into the row above, so that only combined rows are left, as seen in the expected result:
ID Name A B C D E
213-1 XL 21 22.0 12 232.0 101.32
23-0 L 12 23 12 232.2 NaN
31-0 LS 70 70 23 NaN 21.22
Is it possible to do with Pandas?
I would use concat:
new_df = pd.concat((df.iloc[::2, :2].reset_index(drop=True),
df.iloc[1::2].reset_index(drop=True)),
axis=1)
# rename
new_df.columns = ['ID', 'Name'] + new_df.columns[2:].to_list()
Output:
ID Name A B C D E
0 213-1 XL 21 22.0 12.0 232.0 101.32
1 23-0 L 12 23 12.0 232.2 NaN
2 31-0 LS 70 70 23.0 NaN 21.22
concat on df.iloc[::2] and df.iloc[1::2]:
df1= (df.iloc[::2].dropna(axis=1).reset_index(drop=True))
df2 = (df.iloc[1::2].reset_index(drop=True))
print (pd.concat([df1,df2],ignore_index=True,axis=1))
#
0 1 2 3 4 5 6
0 213-1 XL 21 22.0 12.0 232.0 101.32
1 23-0 L 12 23 12.0 232.2 NaN
2 31-0 LS 70 70 23.0 NaN 21.22
Alternatively, split the frame on the rows where C is NaN:
master_df = df[~df['C'].isna()].reset_index(drop=True)
master_df[['ID','Name']] = pd.DataFrame(df[df['C'].isna()][['A','B']].reset_index(drop=True), index=master_df.index)
Output
##print(master_df[['ID','Name','A', 'B', 'C', 'D', 'E']])
ID Name A B C D E
0 213-1 XL 21 22.0 12.0 232.0 101.32
1 23-0 L 12 23 12.0 232.2 NaN
2 31-0 LS 70 70 23.0 NaN 21.22
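All of these approaches assume the rows strictly alternate header/value pairs. Here is the first approach run end-to-end on a small copy of the example frame, as a self-contained check:

```python
import numpy as np
import pandas as pd

# two header/value row pairs from the question
df = pd.DataFrame({'A': ['213-1', '21', '23-0', '12'],
                   'B': ['XL', 22.0, 'L', 23],
                   'C': [np.nan, 12, np.nan, 12],
                   'D': [np.nan, 232.0, np.nan, 232.2],
                   'E': [np.nan, 101.32, np.nan, np.nan]})

# even rows give ID/Name, odd rows give the data columns
new_df = pd.concat((df.iloc[::2, :2].reset_index(drop=True),
                    df.iloc[1::2].reset_index(drop=True)),
                   axis=1)
new_df.columns = ['ID', 'Name'] + new_df.columns[2:].to_list()
print(new_df)
```

If the pairing is ever broken (e.g. a header row without a value row), the two halves will misalign silently, so validating the pattern first is worthwhile.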

Python/Scikit-learn/regressions - from pandas Dataframes to Scikit prediction

I have the following pandas DataFrame, called main_frame:
target_var input1 input2 input3 input4 input5 input6
Date
2013-09-01 13.0 NaN NaN NaN NaN NaN NaN
2013-10-01 13.0 NaN NaN NaN NaN NaN NaN
2013-11-01 12.2 NaN NaN NaN NaN NaN NaN
2013-12-01 10.9 NaN NaN NaN NaN NaN NaN
2014-01-01 11.7 0 13 42 0 0 16
2014-02-01 12.0 13 8 58 0 0 14
2014-03-01 12.8 13 15 100 0 0 24
2014-04-01 13.1 0 11 50 34 0 18
2014-05-01 12.2 12 14 56 30 71 18
2014-06-01 11.7 13 16 43 44 0 22
2014-07-01 11.2 0 19 45 35 0 18
2014-08-01 11.4 12 16 37 31 0 24
2014-09-01 10.9 14 14 47 30 56 20
2014-10-01 10.5 15 17 54 24 56 22
2014-11-01 10.7 12 18 60 41 63 21
2014-12-01 9.6 12 14 42 29 53 16
2015-01-01 10.2 10 16 37 31 0 20
2015-02-01 10.7 11 20 39 28 0 19
2015-03-01 10.9 10 17 75 27 87 22
2015-04-01 10.8 14 17 73 30 43 25
2015-05-01 10.2 10 17 55 31 52 24
I've been having trouble exploring the dataset with Scikit-learn, and I'm not sure whether the problem is the pandas DataFrame itself, the dates as index, the NaNs/Infs/zeros (which I don't know how to handle), or something else I wasn't able to track down.
I want to build a simple regression to predict the next target_var item based on the variables named "Input" (1,2,3..).
Note that there are a lot of zeros and NaN's in the time series, and eventually we might find Inf's as well.
You should first remove any row with an Inf, -Inf or NaN value (other methods include filling in the NaNs with, for example, the mean value of the feature):
df = df.replace(to_replace=[np.inf, -np.inf], value=np.nan)
df = df.dropna()
Now, create a numpy matrix of your features and a vector of your targets. Given that your target variable is in the first column, you can use integer-based indexing as follows:
X = df.iloc[:, 1:].values
y = df.iloc[:, 0].values
Then create and fit your model:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X=X, y=y)
Now you can observe your estimates:
>>> model.intercept_
12.109583092421092
>>> model.coef_
array([-0.05269033, -0.17723251, 0.03627883, 0.02219596, -0.01377465,
0.0111017 ])
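Once fitted, the next target_var value can be estimated by passing a row of inputs to predict. A sketch with synthetic data (since the real main_frame isn't reproduced here), using a known linear rule so the fit is exact:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((20, 6))                    # stand-in for input1..input6
w = np.array([1., -2., 0.5, 0., 3., -1.])  # known coefficients
y = X @ w + 0.1                            # noise-free linear target

model = LinearRegression()
model.fit(X, y)

# predict target_var for a new row of inputs (must be 2-D: 1 sample x 6 features)
x_new = rng.random((1, 6))
print(model.predict(x_new))
```

On real data you would also want a train/test split or cross-validation before trusting the coefficients.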
