I extracted the data from a CSV file and converted it to the format below after data preparation with Python.
I want to prepare it further, as shown below, so it can be stored as a table in a DB.
In the table below, for the 8th hour, minutes 0 to 52 are working time (status 1),
and minutes 53 to 59 are a break (snacks break, status 2).
How do I convert it?
Existing
+------+-------+------------+------+------+------+----------+--------+--------+-------+-----+
| | plant | date | shop | line | hour | startmin | endmin | status | shift | uph |
+------+-------+------------+------+------+------+----------+--------+--------+-------+-----+
| 8 | HEF1 | 03-01-2020 | E | 1 | 8 | 0 | 52 | 1 | 2 | 25 |
| 9 | HEF1 | 03-01-2020 | E | 1 | 8 | 53 | 59 | 2 | 2 | 25 |
| 10 | HEF1 | 03-01-2020 | E | 1 | 9 | 0 | 59 | 1 | 2 | 25 |
| 11 | HEF1 | 03-01-2020 | E | 1 | 10 | 0 | 59 | 1 | 2 | 25 |
| 9645 | HEF2 | 27-01-2020 | E | 1 | 7 | 0 | 59 | 1 | 1 | 58 |
| 9646 | HEF2 | 27-01-2020 | E | 1 | 8 | 0 | 52 | 1 | 1 | 58 |
| 9647 | HEF2 | 27-01-2020 | E | 1 | 8 | 53 | 59 | 2 | 1 | 58 |
+------+-------+------------+------+------+------+----------+--------+--------+-------+-----+
I want to convert it to the format below.
Required
+-------+---------------------+------+------+------+--------+-------+-----+
| plant | datetime | shop | line | hour | status | shift | uph |
+-------+---------------------+------+------+------+--------+-------+-----+
| HEF1 | 03-01-2020 08:00:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:01:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:02:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:03:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:04:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:05:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:06:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:07:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:08:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:09:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:10:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:11:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:12:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:13:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:14:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:15:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:16:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:17:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:18:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:19:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:20:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:21:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:22:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:23:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:24:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:25:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:26:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:27:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:28:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:29:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:30:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:31:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:32:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:33:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:34:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:35:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:36:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:37:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:38:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:39:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:40:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:41:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:42:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:43:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:44:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:45:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:46:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:47:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:48:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:49:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:50:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:51:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:52:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:53:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:54:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:55:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:56:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:57:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:58:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:59:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 09:00:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 09:01:00 | E | 1 | 8 | 1 | 2 | 25 |
+-------+---------------------+------+------+------+--------+-------+-----+
First, create begin and end timestamps (dayfirst=True because the date column is in dd-mm-yyyy format):
df['start_ts'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['hour'].astype(str) + ':' + df['startmin'].astype(str), dayfirst=True)
df['end_ts'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['hour'].astype(str) + ':' + df['endmin'].astype(str), dayfirst=True)
Then create a date range column:
df['t_range'] = [pd.date_range(start=x[0], end=x[1], freq='min') for x in zip(df['start_ts'], df['end_ts'])]
Then explode on that column (explode returns a new dataframe, so assign the result):
df = df.explode('t_range')
Finally, drop and rename columns as needed.
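For the last step, a minimal sketch of the drop/rename, assuming the exploded column should be named datetime and the column order should match the required table:
df = (df.rename(columns={'t_range': 'datetime'})
        .drop(columns=['date', 'startmin', 'endmin', 'start_ts', 'end_ts'])
        [['plant', 'datetime', 'shop', 'line', 'hour', 'status', 'shift', 'uph']]
        .reset_index(drop=True))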
I have a dataframe df_groups that contains sample number, group number and accuracy.
Table 1: Samples with their groups
+----+----------+------------+------------+
| | sample | group | Accuracy |
|----+----------+------------+------------|
| 0 | 0 | 6 | 91.6 |
| 1 | 1 | 4 | 92.9333 |
| 2 | 2 | 2 | 91 |
| 3 | 3 | 2 | 90.0667 |
| 4 | 4 | 4 | 91.8 |
| 5 | 5 | 5 | 92.5667 |
| 6 | 6 | 6 | 91.1 |
| 7 | 7 | 5 | 92.3333 |
| 8 | 8 | 2 | 92.7667 |
| 9 | 9 | 0 | 91.1333 |
| 10 | 10 | 4 | 92.5 |
| 11 | 11 | 5 | 92.4 |
| 12 | 12 | 7 | 93.1333 |
| 13 | 13 | 7 | 93.5333 |
| 14 | 14 | 2 | 92.1 |
| 15 | 15 | 6 | 93.2 |
| 16 | 16 | 8 | 92.7333 |
| 17 | 17 | 8 | 90.8 |
| 18 | 18 | 3 | 91.9 |
| 19 | 19 | 3 | 93.3 |
| 20 | 20 | 5 | 90.6333 |
| 21 | 21 | 9 | 92.9333 |
| 22 | 22 | 4 | 93.3333 |
| 23 | 23 | 9 | 91.5333 |
| 24 | 24 | 9 | 92.9333 |
| 25 | 25 | 1 | 92.3 |
| 26 | 26 | 9 | 92.2333 |
| 27 | 27 | 6 | 91.9333 |
| 28 | 28 | 5 | 92.1 |
| 29 | 29 | 8 | 84.8 |
+----+----------+------------+------------+
I want to return a dataframe containing only the rows whose accuracy is above a given threshold (e.g. 92).
So the result will look like this:
Table 1: Samples with their groups when accuracy is above 92.
+----+----------+------------+------------+
| | sample | group | Accuracy |
|----+----------+------------+------------|
| 1 | 1 | 4 | 92.9333 |
| 2 | 5 | 5 | 92.5667 |
| 3 | 7 | 5 | 92.3333 |
| 4 | 8 | 2 | 92.7667 |
| 5 | 10 | 4 | 92.5 |
| 6 | 11 | 5 | 92.4 |
| 7 | 12 | 7 | 93.1333 |
| 8 | 13 | 7 | 93.5333 |
| 9 | 14 | 2 | 92.1 |
| 10 | 15 | 6 | 93.2 |
| 11 | 16 | 8 | 92.7333 |
| 12 | 19 | 3 | 93.3 |
| 13 | 21 | 9 | 92.9333 |
| 14 | 22 | 4 | 93.3333 |
| 15 | 24 | 9 | 92.9333 |
| 16 | 25 | 1 | 92.3 |
| 17 | 26 | 9 | 92.2333 |
| 18 | 28 | 5 | 92.1 |
+----+----------+------------+------------+
So the result should be filtered on the condition of being greater than or equal to a predefined accuracy (e.g. 92, 90, 85, etc.).
You can use df.loc[df['Accuracy'] >= predefined_accuracy].
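For example, with the dataframe named df_groups from the question and a threshold of 92 (any other threshold such as 90 or 85 works the same way):
predefined_accuracy = 92
result = df_groups.loc[df_groups['Accuracy'] >= predefined_accuracy]
print(result)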
Before web scraping, I'm using the following code to log in.
import requests
from bs4 import BeautifulSoup
login_url = 'https://www.footballoutsiders.com/user/login'
data = {
'username': 'username',
'password': 'password'
}
with requests.Session() as s:
    response = s.post(login_url, data=data)
    print(response.text)
I then do the following to view the table, but the cells are still locked.
index_page = s.get('https://www.footballoutsiders.com/stats/nfl/historical-lookup-by-week/2020/1/overall')
soup = BeautifulSoup(index_page.text, 'lxml')
table1 = soup.find('table')
table1
What am I doing wrong?
You are passing in the wrong form fields. They should be 'name' and 'pass', as in the code below. Also, don't use BeautifulSoup to parse <table> tags when you only need the content; pandas can do that for you (it uses bs4 under the hood).
import requests
import pandas as pd
LOGIN_URL = 'https://www.footballoutsiders.com/user/login?destination=home'
login = {
    'name': '123#email.com',
    'pass': '54321',
    'form_id': 'user_login_form',
    'op': 'Login'
}
s = requests.Session()
s.post(LOGIN_URL, data=login)
index_page = s.get('https://www.footballoutsiders.com/stats/nfl/historical-lookup-by-week/2020/1/overall')
df = pd.read_html(index_page.text)[0]
Output:
print(df.to_markdown())
| | Team | W-L | Total DVOA | Total DVOA.1 | Weighted DVOA | Weighted DVOA.1 | Offense DVOA | Offense DVOA.1 | Offense Weighted DVOA | Offense Weighted DVOA.1 | Defense DVOA | Defense DVOA.1 | Defense Weighted DVOA | Defense Weighted DVOA.1 | Special Teams DVOA | Special Teams DVOA.1 | Special Teams Weighted DVOA | Special Teams Weighted DVOA.1 |
|---:|:-------|:------|-------------:|:---------------|----------------:|:------------------|---------------:|:-----------------|------------------------:|:--------------------------|---------------:|:-----------------|------------------------:|:--------------------------|---------------------:|:-----------------------|------------------------------:|:--------------------------------|
| 0 | BAL | 1-0 | 1 | 88.0% | 1 | 88.0% | 1 | 39.9% | 1 | 39.9% | 3 | -38.8% | 3 | -38.8% | 2 | 9.4% | 2 | 9.4% |
| 1 | NE | 1-0 | 2 | 52.3% | 2 | 52.3% | 3 | 36.4% | 3 | 36.4% | 5 | -23.8% | 5 | -23.8% | 25 | -7.9% | 25 | -7.9% |
| 2 | JAX | 1-0 | 3 | 38.0% | 3 | 38.0% | 4 | 35.8% | 4 | 35.8% | 13 | 0.5% | 13 | 0.5% | 5 | 2.8% | 5 | 2.8% |
| 3 | SEA | 1-0 | 4 | 37.0% | 4 | 37.0% | 2 | 38.6% | 2 | 38.6% | 21 | 9.5% | 21 | 9.5% | 3 | 7.8% | 3 | 7.8% |
| 4 | PIT | 1-0 | 5 | 36.0% | 5 | 36.0% | 14 | 6.5% | 14 | 6.5% | 2 | -39.0% | 2 | -39.0% | 27 | -9.4% | 27 | -9.4% |
| 5 | WAS | 1-0 | 6 | 35.9% | 6 | 35.9% | 28 | -32.7% | 28 | -32.7% | 1 | -69.4% | 1 | -69.4% | 13 | -0.8% | 13 | -0.8% |
| 6 | BUF | 1-0 | 7 | 16.7% | 7 | 16.7% | 17 | 2.6% | 17 | 2.6% | 7 | -19.0% | 7 | -19.0% | 22 | -4.8% | 22 | -4.8% |
| 7 | LV | 1-0 | 8 | 13.7% | 8 | 13.7% | 5 | 31.7% | 5 | 31.7% | 23 | 16.5% | 23 | 16.5% | 15 | -1.4% | 15 | -1.4% |
| 8 | NO | 1-0 | 9 | 10.8% | 9 | 10.8% | 24 | -13.7% | 24 | -13.7% | 9 | -14.3% | 9 | -14.3% | 1 | 10.2% | 1 | 10.2% |
| 9 | MIN | 0-1 | 10 | 10.8% | 10 | 10.8% | 6 | 28.2% | 6 | 28.2% | 26 | 20.1% | 26 | 20.1% | 6 | 2.7% | 6 | 2.7% |
| 10 | LAC | 1-0 | 11 | 4.1% | 11 | 4.1% | 22 | -7.6% | 22 | -7.6% | 6 | -20.4% | 6 | -20.4% | 26 | -8.7% | 26 | -8.7% |
| 11 | CAR | 0-1 | 12 | 2.5% | 12 | 2.5% | 8 | 23.4% | 8 | 23.4% | 27 | 24.0% | 27 | 24.0% | 4 | 3.2% | 4 | 3.2% |
| 12 | CHI | 1-0 | 13 | 0.3% | 13 | 0.3% | 19 | 0.9% | 19 | 0.9% | 12 | -1.6% | 12 | -1.6% | 16 | -2.2% | 16 | -2.2% |
| 13 | DAL | 0-1 | 14 | 0.3% | 14 | 0.3% | 12 | 9.2% | 12 | 9.2% | 18 | 3.9% | 18 | 3.9% | 23 | -5.1% | 23 | -5.1% |
| 14 | DET | 0-1 | 15 | -0.1% | 15 | -0.1% | 21 | -0.3% | 21 | -0.3% | 15 | 1.2% | 15 | 1.2% | 9 | 1.4% | 9 | 1.4% |
| 15 | KC | 1-0 | 16 | -1.3% | 16 | -1.3% | 9 | 17.6% | 9 | 17.6% | 25 | 17.5% | 25 | 17.5% | 14 | -1.3% | 14 | -1.3% |
| 16 | GB | 1-0 | 17 | -5.0% | 17 | -5.0% | 7 | 24.2% | 7 | 24.2% | 30 | 28.7% | 30 | 28.7% | 12 | -0.4% | 12 | -0.4% |
| 17 | ARI | 1-0 | 18 | -5.2% | 18 | -5.2% | 20 | 0.5% | 20 | 0.5% | 17 | 2.9% | 17 | 2.9% | 17 | -2.8% | 17 | -2.8% |
| 18 | SF | 0-1 | 19 | -6.1% | 19 | -6.1% | 18 | 2.3% | 18 | 2.3% | 14 | 0.6% | 14 | 0.6% | 24 | -7.8% | 24 | -7.8% |
| 19 | HOU | 0-1 | 20 | -9.3% | 20 | -9.3% | 11 | 11.1% | 11 | 11.1% | 24 | 16.7% | 24 | 16.7% | 19 | -3.6% | 19 | -3.6% |
| 20 | LAR | 1-0 | 21 | -13.5% | 21 | -13.5% | 13 | 7.5% | 13 | 7.5% | 20 | 9.5% | 20 | 9.5% | 28 | -11.4% | 28 | -11.4% |
| 21 | TEN | 1-0 | 22 | -14.3% | 22 | -14.3% | 16 | 3.4% | 16 | 3.4% | 10 | -11.1% | 10 | -11.1% | 32 | -28.7% | 32 | -28.7% |
| 22 | TB | 0-1 | 23 | -16.3% | 23 | -16.3% | 25 | -16.1% | 25 | -16.1% | 8 | -16.4% | 8 | -16.4% | 30 | -16.7% | 30 | -16.7% |
| 23 | DEN | 0-1 | 24 | -17.1% | 24 | -17.1% | 23 | -13.3% | 23 | -13.3% | 19 | 3.9% | 19 | 3.9% | 11 | 0.1% | 11 | 0.1% |
| 24 | ATL | 0-1 | 25 | -26.0% | 25 | -26.0% | 10 | 11.8% | 10 | 11.8% | 32 | 38.9% | 32 | 38.9% | 10 | 1.1% | 10 | 1.1% |
| 25 | IND | 0-1 | 26 | -27.0% | 26 | -27.0% | 15 | 3.8% | 15 | 3.8% | 28 | 26.0% | 28 | 26.0% | 21 | -4.8% | 21 | -4.8% |
| 26 | CIN | 0-1 | 27 | -28.4% | 27 | -28.4% | 29 | -33.2% | 29 | -33.2% | 11 | -8.5% | 11 | -8.5% | 20 | -3.7% | 20 | -3.7% |
| 27 | NYJ | 0-1 | 28 | -35.2% | 28 | -35.2% | 26 | -21.1% | 26 | -21.1% | 16 | 2.7% | 16 | 2.7% | 29 | -11.4% | 29 | -11.4% |
| 28 | PHI | 0-1 | 29 | -41.1% | 29 | -41.1% | 32 | -71.3% | 32 | -71.3% | 4 | -33.6% | 4 | -33.6% | 18 | -3.4% | 18 | -3.4% |
| 29 | MIA | 0-1 | 30 | -48.9% | 30 | -48.9% | 27 | -23.2% | 27 | -23.2% | 29 | 28.3% | 29 | 28.3% | 8 | 2.6% | 8 | 2.6% |
| 30 | NYG | 0-1 | 31 | -54.7% | 31 | -54.7% | 30 | -45.4% | 30 | -45.4% | 22 | 11.9% | 22 | 11.9% | 7 | 2.6% | 7 | 2.6% |
| 31 | CLE | 0-1 | 32 | -107.6% | 32 | -107.6% | 31 | -57.5% | 31 | -57.5% | 31 | 33.3% | 31 | 33.3% | 31 | -16.8% | 31 | -16.8% |
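If the table still comes back locked, it is worth confirming that the login actually succeeded before requesting the stats page. Failed form logins usually still return HTTP 200, so the sketch below also checks for a marker string; 'Log out' is an assumption and should be replaced with whatever only appears when you are authenticated on this site:
resp = s.post(LOGIN_URL, data=login)
resp.raise_for_status()              # catches transport/HTTP errors only
logged_in = 'Log out' in resp.text   # assumed marker; adjust to the site's logged-in page
print('login ok:', resp.status_code, logged_in)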
I am using the code below to produce the following result in Python, and I want an equivalent of this code in R.
Here N is a column of the dataframe data. The CN column is calculated from the values of column N with a specific pattern, and it gives me the following result in Python:
+---+----+
| N | CN |
+---+----+
| 0 | 0 |
| 1 | 1 |
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 0 | 3 |
| 0 | 3 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 2 | 5 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 0 | 8 |
| 1 | 9 |
| 2 | 10 |
+---+----+
A short overview of my code:
data = pd.read_table(filename,skiprows=15,decimal=',', sep='\t',header=None,names=["Date ","Heure ","temps (s) ","X","Z"," LVDT V(mm) " ,"Force normale (N) ","FT","FN(N) ","TS"," NS(kPa) ","V (mm/min)","Vitesse normale (mm/min)","e (kPa)","k (kPa/mm) " ,"N " ,"Nb cycles normal" ,"Cycles " ,"Etat normal" ,"k imposÈ (kPa/mm)"])
data.columns = [col.strip() for col in data.columns.tolist()]
N = data[data.keys()[15]]
N = np.array(N)
data["CN"] = (data.N.shift().bfill() != data.N).astype(int).cumsum()
An example of data.head() is here:
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
| Index | Date | Heure | temps (s) | X | Z(mm) | LVDT V(mm) | Force normale (N) | FT | FN(N) | FT (kPa) | NS(kPa) | V (mm/min) | Vitesse normale (mm/min) | e (kPa) | k (kPa/mm) | N | Nb cycles normal | Cycles | Etat normal | k imposÈ (kPa/mm) | CN |
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
| 184 | 01/02/2022 | 12:36:52 | 402.163 | 6.910243 | 1.204797 | 0.001101 | 299.783665 | 31.494351 | 1428.988908 | 11.188704 | 505.825016 | 0.1 | 2.0 | 512.438828 | 50.918786 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
| 185 | 01/02/2022 | 12:36:54 | 404.288 | 6.907822 | 1.205647 | 4.9e-05 | 296.072718 | 31.162313 | 1404.195316 | 11.028167 | 494.97955 | 0.1 | -2.0 | 500.084986 | 49.685639 | 0.0 | 0.0 | Sort | Descend | 0.0 | 0 |
| 186 | 01/02/2022 | 12:36:56 | 406.536 | 6.907906 | 1.204194 | -0.000214 | 300.231424 | 31.586401 | 1429.123486 | 11.21895 | 505.750815 | 0.1 | 2.0 | 512.370164 | 50.914002 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
| 187 | 01/02/2022 | 12:36:58 | 408.627 | 6.910751 | 1.204293 | -0.000608 | 300.188686 | 31.754064 | 1428.979519 | 11.244542 | 505.624564 | 0.1 | 2.0 | 512.309254 | 50.906544 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
| 188 | 01/02/2022 | 12:37:00 | 410.679 | 6.907805 | 1.205854 | -0.000181 | 296.358074 | 31.563389 | 1415.224427 | 11.129375 | 502.464948 | 0.1 | 2.0 | 510.702313 | 50.742104 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
A one-line cumsum trick solves it.
cumsum(c(0L, diff(df1$N) != 0))
#> [1] 0 1 1 2 2 3 3 4 4 4 5 5 6 7 8 9 10
all.equal(
cumsum(c(0L, diff(df1$N) != 0)),
df1$CN
)
#> [1] TRUE
Created on 2022-02-14 by the reprex package (v2.0.1)
Data
x <- "
+---+----+
| N | CN |
+---+----+
| 0 | 0 |
| 1 | 1 |
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 0 | 3 |
| 0 | 3 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 2 | 5 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 0 | 8 |
| 1 | 9 |
| 2 | 10 |
+---+----+"
df1 <- read.table(textConnection(x), header = TRUE, sep = "|", comment.char = "+")[2:3]
Created on 2022-02-14 by the reprex package (v2.0.1)
I am new to Python. I have two dataframes: df1 contains information about all students with their group and score, and df2 contains updated information about a few students who changed their group and score. How can I update the information in df1 based on the values in df2 (group and score)?
df1
+----+----------+-----------+----------------+
| |student No| group | score |
|----+----------+-----------+----------------|
| 0 | 0 | 0 | 0.839626 |
| 1 | 1 | 0 | 0.845435 |
| 2 | 2 | 3 | 0.830778 |
| 3 | 3 | 2 | 0.831565 |
| 4 | 4 | 3 | 0.823569 |
| 5 | 5 | 0 | 0.808109 |
| 6 | 6 | 4 | 0.831645 |
| 7 | 7 | 1 | 0.851048 |
| 8 | 8 | 3 | 0.843209 |
| 9 | 9 | 4 | 0.84902 |
| 10 | 10 | 0 | 0.835143 |
| 11 | 11 | 4 | 0.843228 |
| 12 | 12 | 2 | 0.826949 |
| 13 | 13 | 0 | 0.84196 |
| 14 | 14 | 1 | 0.821634 |
| 15 | 15 | 3 | 0.840702 |
| 16 | 16 | 0 | 0.828994 |
| 17 | 17 | 2 | 0.843043 |
| 18 | 18 | 4 | 0.809093 |
| 19 | 19 | 1 | 0.85426 |
+----+----------+-----------+----------------+
df2
+----+-----------+----------+----------------+
| | group |student No| score |
|----+-----------+----------+----------------|
| 0 | 2 | 1 | 0.887435 |
| 1 | 0 | 19 | 0.81214 |
| 2 | 3 | 17 | 0.899041 |
| 3 | 0 | 8 | 0.853333 |
| 4 | 4 | 9 | 0.88512 |
+----+-----------+----------+----------------+
The result (df3):
+----+----------+-----------+----------------+
| |student No| group | score |
|----+----------+-----------+----------------|
| 0 | 0 | 0 | 0.839626 |
| 1 | 1 | 2 | 0.887435 |
| 2 | 2 | 3 | 0.830778 |
| 3 | 3 | 2 | 0.831565 |
| 4 | 4 | 3 | 0.823569 |
| 5 | 5 | 0 | 0.808109 |
| 6 | 6 | 4 | 0.831645 |
| 7 | 7 | 1 | 0.851048 |
| 8 | 8 | 0 | 0.853333 |
| 9 | 9 | 4 | 0.88512 |
| 10 | 10 | 0 | 0.835143 |
| 11 | 11 | 4 | 0.843228 |
| 12 | 12 | 2 | 0.826949 |
| 13 | 13 | 0 | 0.84196 |
| 14 | 14 | 1 | 0.821634 |
| 15 | 15 | 3 | 0.840702 |
| 16 | 16 | 0 | 0.828994 |
| 17 | 17 | 3 | 0.899041 |
| 18 | 18 | 4 | 0.809093 |
| 19 | 19 | 0 | 0.81214 |
+----+----------+-----------+----------------+
My code to update df1 from df2:
dfupdated = df1.merge(df2, how='left', on=['student No'], suffixes=('', '_new'))
dfupdated['group'] = np.where(pd.notnull(dfupdated['group_new']), dfupdated['group_new'],
dfupdated['group'])
dfupdated['score'] = np.where(pd.notnull(dfupdated['score_new']), dfupdated['score_new'],
dfupdated['score'])
dfupdated.drop(['group_new', 'score_new'],axis=1, inplace=True)
dfupdated.reset_index(drop=True, inplace=True)
But I face the following error:
KeyError: "['group'] not in index"
I don't know what's wrong.
I ran the same code and got the expected answer, so here is a different way to solve it.
Try:
dfupdated = df1.merge(df2, on='student No', how='left')
dfupdated['group'] = dfupdated['group_y'].fillna(dfupdated['group_x'])
dfupdated['score'] = dfupdated['score_y'].fillna(dfupdated['score_x'])
dfupdated.drop(['group_x', 'group_y','score_x', 'score_y'], axis=1,inplace=True)
This will give you the solution you want.
To get the max score from each group:
dfupdated.groupby(['group'], sort=False)['score'].max()
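As an alternative for the update step itself, here is a sketch using DataFrame.update, assuming 'student No' is unique in both frames. update aligns on the index and overwrites only the cells present in df2 (note that it may upcast integer columns to float):
df1 = df1.set_index('student No')
df1.update(df2.set_index('student No'))   # overwrites group/score for students present in df2
df1 = df1.reset_index()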
I am trying to create a predictive regression model that forecasts the completion date of a number of orders.
My dataset looks like:
| ORDER_NUMBER | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | Feature6 | TOTAL_DAYS_TO_COMPLETE | Feature8 | Feature9 | Feature10 | Feature11 | Feature12 | Feature13 | Feature14 | Feature15 | Feature16 | Feature17 | Feature18 | Feature19 | Feature20 | Feature21 | Feature22 | Feature23 | Feature24 | Feature25 | Feature26 | Feature27 | Feature28 | Feature29 | Feature30 | Feature31 |
|:------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:----------------------:|:--------:|:--------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 102203591 | 12 | 2014 | 10 | 2014 | 1 | 2015 | 760 | 50 | 83 | 5 | 6 | 12 | 18 | 31 | 8 | 0 | 1 | 0 | 1 | 16 | 131.29 | 24.3768 | 158.82 | 1.13 | 6.52 | 10 | 51 | 39 | 27 | 88 | 1084938 |
| 102231010 | 2 | 2015 | 1 | 2015 | 2 | 2015 | 706 | 35 | 34 | 2 | 1 | 4 | 3 | 3 | 3 | 0 | 0 | 0 | 1 | 2 | 11.95 | 5.162 | 17.83 | 1.14 | 3.45 | 1 | 4 | 20 | 16 | 25 | 367140 |
| 102251893 | 6 | 2015 | 4 | 2015 | 3 | 2015 | 1143 | 36 | 43 | 1 | 2 | 4 | 5 | 6 | 3 | 1 | 0 | 0 | 1 | 5 | 8.55 | 5.653 | 34.51 | 4.59 | 6.1 | 0 | 1 | 17 | 30 | 12 | 103906 |
| 102287793 | 4 | 2015 | 2 | 2015 | 4 | 2015 | 733 | 45 | 71 | 4 | 1 | 6 | 35 | 727 | 6 | 0 | 3 | 15 | 0 | 19 | 174.69 | 97.448 | 319.98 | 1.49 | 3.28 | 20 | 113 | 71 | 59 | 71 | 1005041 |
| 102288060 | 6 | 2015 | 5 | 2015 | 4 | 2015 | 1092 | 26 | 21 | 1 | 1 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 4.73 | 4.5363 | 18.85 | 3.11 | 4.16 | 0 | 1 | 16 | 8 | 16 | 69062 |
| 102308069 | 8 | 2015 | 6 | 2015 | 5 | 2015 | 676 | 41 | 34 | 2 | 0 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 2.98 | 6.1173 | 11.3 | 1.36 | 1.85 | 0 | 1 | 17 | 12 | 3 | 145887 |
| 102319918 | 8 | 2015 | 7 | 2015 | 6 | 2015 | 884 | 25 | 37 | 1 | 1 | 3 | 2 | 3 | 2 | 0 | 0 | 1 | 0 | 2 | 5.57 | 3.7083 | 9.18 | 0.97 | 2.48 | 0 | 1 | 14 | 5 | 7 | 45243 |
| 102327578 | 6 | 2015 | 4 | 2015 | 6 | 2015 | 595 | 49 | 68 | 3 | 5 | 9 | 11 | 13 | 5 | 4 | 2 | 0 | 1 | 10 | 55.41 | 24.3768 | 104.98 | 2.03 | 4.31 | 10 | 51 | 39 | 26 | 40 | 418266 |
| 102337989 | 7 | 2015 | 5 | 2015 | 7 | 2015 | 799 | 50 | 66 | 5 | 6 | 12 | 21 | 29 | 12 | 0 | 0 | 0 | 1 | 20 | 138.79 | 24.3768 | 172.56 | 1.39 | 7.08 | 10 | 51 | 39 | 34 | 101 | 1229299 |
| 102450069 | 8 | 2015 | 7 | 2015 | 11 | 2015 | 456 | 20 | 120 | 2 | 1 | 3 | 12 | 14 | 8 | 0 | 0 | 0 | 0 | 7 | 2.92 | 6.561 | 12.3 | 1.43 | 1.87 | 2 | 1 | 15 | 6 | 6 | 142805 |
| 102514564 | 5 | 2016 | 3 | 2016 | 2 | 2016 | 639 | 25 | 35 | 1 | 2 | 4 | 3 | 6 | 3 | 0 | 0 | 0 | 0 | 3 | 4.83 | 4.648 | 14.22 | 2.02 | 3.06 | 0 | 1 | 15 | 5 | 13 | 62941 |
| 102528121 | 10 | 2015 | 9 | 2015 | 3 | 2016 | 413 | 15 | 166 | 1 | 1 | 3 | 2 | 3 | 2 | 0 | 0 | 0 | 0 | 2 | 4.23 | 1.333 | 15.78 | 8.66 | 11.84 | 1 | 4 | 8 | 6 | 3 | 111752 |
| 102564376 | 1 | 2016 | 12 | 2015 | 4 | 2016 | 802 | 27 | 123 | 2 | 1 | 4 | 3 | 3 | 3 | 0 | 1 | 0 | 0 | 3 | 1.27 | 2.063 | 6.9 | 2.73 | 3.34 | 1 | 4 | 14 | 20 | 6 | 132403 |
| 102564472 | 1 | 2016 | 12 | 2015 | 4 | 2016 | 817 | 27 | 123 | 0 | 1 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1.03 | 2.063 | 9.86 | 4.28 | 4.78 | 1 | 4 | 14 | 22 | 4 | 116907 |
| 102599569 | 2 | 2016 | 12 | 2015 | 5 | 2016 | 425 | 47 | 151 | 1 | 2 | 4 | 3 | 4 | 3 | 0 | 0 | 0 | 0 | 2 | 27.73 | 15.8993 | 60.5 | 2.06 | 3.81 | 12 | 108 | 34 | 24 | 20 | 119743 |
| 102599628 | 2 | 2016 | 12 | 2015 | 5 | 2016 | 425 | 47 | 151 | 3 | 4 | 8 | 8 | 9 | 7 | 0 | 0 | 0 | 2 | 8 | 39.28 | 14.8593 | 91.26 | 3.5 | 6.14 | 12 | 108 | 34 | 38 | 15 | 173001 |
| 102606421 | 3 | 2016 | 12 | 2015 | 5 | 2016 | 965 | 55 | 161 | 5 | 11 | 17 | 29 | 44 | 11 | 1 | 1 | 0 | 1 | 22 | 148.06 | 23.7983 | 195.69 | 2 | 8.22 | 10 | 51 | 39 | 47 | 112 | 1196097 |
| 102621293 | 7 | 2016 | 5 | 2016 | 6 | 2016 | 701 | 42 | 27 | 2 | 1 | 4 | 3 | 3 | 1 | 0 | 0 | 0 | 1 | 2 | 8.39 | 3.7455 | 13.93 | 1.48 | 3.72 | 1 | 5 | 14 | 14 | 20 | 258629 |
| 102632364 | 7 | 2016 | 6 | 2016 | 6 | 2016 | 982 | 41 | 26 | 4 | 2 | 7 | 6 | 6 | 2 | 0 | 0 | 0 | 1 | 4 | 26.07 | 2.818 | 37.12 | 3.92 | 13.17 | 1 | 5 | 14 | 22 | 10 | 167768 |
| 102643207 | 9 | 2016 | 9 | 2016 | 7 | 2016 | 255 | 9 | 73 | 3 | 1 | 5 | 4 | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 2.17 | 0.188 | 4.98 | 14.95 | 26.49 | 1 | 4 | 2 | 11 | 1 | 49070 |
| 102656091 | 9 | 2016 | 8 | 2016 | 7 | 2016 | 356 | 21 | 35 | 1 | 0 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1.45 | 2.0398 | 5.54 | 2.01 | 2.72 | 1 | 4 | 14 | 15 | 3 | 117107 |
| 102660407 | 9 | 2016 | 8 | 2016 | 7 | 2016 | 462 | 21 | 31 | 2 | 0 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 3.18 | 2.063 | 8.76 | 2.7 | 4.25 | 1 | 4 | 14 | 14 | 10 | 151272 |
| 102665666 | 10 | 2016 | 9 | 2016 | 7 | 2016 | 235 | 9 | 64 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0.188 | 2.95 | 10.37 | 15.69 | 1 | 4 | 2 | 10 | 1 | 52578 |
| 102665667 | 10 | 2016 | 9 | 2016 | 7 | 2016 | 235 | 9 | 64 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0.72 | 0.188 | 2.22 | 7.98 | 11.81 | 1 | 4 | 2 | 10 | 1 | 52578 |
| 102665668 | 10 | 2016 | 9 | 2016 | 7 | 2016 | 235 | 9 | 64 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0.9 | 0.188 | 2.24 | 7.13 | 11.91 | 1 | 4 | 2 | 10 | 1 | 52578 |
| 102666306 | 7 | 2016 | 6 | 2016 | 7 | 2016 | 235 | 16 | 34 | 3 | 1 | 5 | 5 | 6 | 4 | 0 | 0 | 0 | 0 | 3 | 14.06 | 3.3235 | 31.27 | 5.18 | 9.41 | 1 | 1 | 16 | 5 | 18 | 246030 |
| 102668177 | 8 | 2016 | 6 | 2016 | 8 | 2016 | 233 | 36 | 32 | 0 | 1 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2.5 | 5.2043 | 8.46 | 1.15 | 1.63 | 0 | 1 | 14 | 2 | 4 | 89059 |
| 102669909 | 6 | 2016 | 4 | 2016 | 8 | 2016 | 244 | 46 | 105 | 4 | 11 | 16 | 28 | 30 | 15 | 1 | 2 | 1 | 1 | 25 | 95.49 | 26.541 | 146.89 | 1.94 | 5.53 | 1 | 51 | 33 | 9 | 48 | 78488 |
| 102670188 | 5 | 2016 | 4 | 2016 | 8 | 2016 | 413 | 20 | 109 | 1 | 1 | 2 | 2 | 3 | 2 | 0 | 0 | 0 | 0 | 1 | 2.36 | 6.338 | 8.25 | 0.93 | 1.3 | 2 | 1 | 14 | 5 | 3 | 117137 |
| 102671063 | 8 | 2016 | 6 | 2016 | 8 | 2016 | 296 | 46 | 44 | 2 | 4 | 7 | 7 | 111 | 3 | 1 | 0 | 1 | 0 | 7 | 12.96 | 98.748 | 146.24 | 1.35 | 1.48 | 20 | 113 | 70 | 26 | 9 | 430192 |
| 102672475 | 8 | 2016 | 7 | 2016 | 8 | 2016 | 217 | 20 | 23 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0.5 | 4.9093 | 5.37 | 0.99 | 1.09 | 0 | 1 | 16 | 0 | 1 | 116673 |
| 102672477 | 10 | 2016 | 9 | 2016 | 8 | 2016 | 194 | 20 | 36 | 1 | 0 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0.61 | 5.1425 | 3.65 | 0.59 | 0.71 | 0 | 1 | 16 | 0 | 2 | 98750 |
| 102672513 | 10 | 2016 | 9 | 2016 | 8 | 2016 | 228 | 20 | 36 | 1 | 1 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0.25 | 5.1425 | 6.48 | 1.21 | 1.26 | 0 | 1 | 16 | 0 | 2 | 116780 |
| 102682943 | 5 | 2016 | 4 | 2016 | 8 | 2016 | 417 | 20 | 113 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0.64 | 6.338 | 5.53 | 0.77 | 0.87 | 2 | 1 | 14 | 5 | 2 | 100307 |
ORDER_NUMBER should not be a feature in the model; it is just a random unique identifier, so I don't want it to count in the model. However, I do want it in the final dataset so I can tie predictions and actual values back to the order.
Currently, my code looks like this:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np
def get_feature_importances(cols, importances):
feats = {}
for feature, importance in zip(cols, importances):
feats[feature] = importance
importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
return importances.sort_values(by='Gini-importance', ascending = False)
def compare_values(arr1, arr2):
thediff = 0
thediffs = []
for thing1, thing2 in zip(arr1, arr2):
thediff = abs(thing1 - thing2)
thediffs.append(thediff)
return thediffs
def print_to_file(filepath, arr):
with open(filepath, 'w') as f:
for item in arr:
f.write("%s\n" % item)
# READ IN THE DATA TABLE ABOVE
data = pd.read_csv('test.csv')
# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# remove the header
label = label[1:]
# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)
# Remove the order number since we don't need it
data = data.drop('ORDER_NUMBER', axis=1)
# remove the header
data = data[1:]
# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)
rf = RandomForestRegressor(
bootstrap = True,
max_depth = None,
max_features = 'sqrt',
min_samples_leaf = 1,
min_samples_split = 2,
n_estimators = 5000
)
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_test)
rf_differences = compare_values(y_test, rf_predictions)
rf_Avg = np.average(rf_differences)
print("#################################################")
print("DATA FOR RANDOM FORESTS")
print(rf_Avg)
importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)
If I print(y_test) and print(rf_predictions) I get something like:
**print(y_test)**
7
155
84
64
49
41
200
168
43
111
64
46
96
47
50
27
216
..
**print(rf_predictions)**
34.496
77.366
69.6105
61.6825
80.8495
79.8785
177.5465
129.014
70.0405
97.3975
82.4435
57.9575
108.018
57.5515
..
And it works. If I print out y_test and rf_predictions, I get the labels for the test data and the predicted label values.
However, I would like to see which orders are associated with both the y_test values and the rf_predictions values. How can I keep that information and create a dataframe like the one below?
| Order Number | Predicted Value | Actual Value |
|--------------|-------------------|--------------|
| Foo0 | 34.496 | 7 |
| Foo1 | 77.366 | 155 |
| Foo2 | 69.6105 | 84 |
| Foo3 | 61.6825 | 64 |
I have tried looking at this post but I could not get a solution. I did try print(y_test, rf_predictions), but that did not do any good since I have already dropped the ORDER_NUMBER field with .drop().
Since you're using pandas dataframes, the original index is retained in all of your X/y train/test datasets, so you can re-assemble everything after applying the model. You just need to save the order numbers before dropping that column: order_numbers = data['ORDER_NUMBER']. The predictions in rf_predictions are returned in the same order as the input to rf.predict(X_test), i.e. rf_predictions[i] belongs to X_test.iloc[i].
This creates your required result dataset:
res = y_test.to_frame('Actual Value')
res.insert(0, 'Predicted Value', rf_predictions)
res = order_numbers.to_frame().join(res, how='inner')
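Equivalently, because y_test keeps the original row index, the order numbers can be looked up directly; this sketch uses the same variable names as above and assembles everything positionally:
res = pd.DataFrame({
    'ORDER_NUMBER': order_numbers.loc[y_test.index].values,
    'Predicted Value': rf_predictions,
    'Actual Value': y_test.values,
}, index=y_test.index)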
By the way, data = data[1:] doesn't remove the header; it removes the first data row. There is no need to remove anything when you work with pandas dataframes, which is why those lines are gone below.
So the final program will be:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np
def get_feature_importances(cols, importances):
feats = {}
for feature, importance in zip(cols, importances):
feats[feature] = importance
importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
return importances.sort_values(by='Gini-importance', ascending = False)
def compare_values(arr1, arr2):
thediff = 0
thediffs = []
for thing1, thing2 in zip(arr1, arr2):
thediff = abs(thing1 - thing2)
thediffs.append(thediff)
return thediffs
def print_to_file(filepath, arr):
with open(filepath, 'w') as f:
for item in arr:
f.write("%s\n" % item)
# READ IN THE DATA TABLE ABOVE
data = pd.read_csv('test.csv')
# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)
# Remove the order number since we don't need it
order_numbers = data['ORDER_NUMBER']
data = data.drop('ORDER_NUMBER', axis=1)
# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)
rf = RandomForestRegressor(
bootstrap = True,
max_depth = None,
max_features = 'sqrt',
min_samples_leaf = 1,
min_samples_split = 2,
n_estimators = 5000
)
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_test)
rf_differences = compare_values(y_test, rf_predictions)
rf_Avg = np.average(rf_differences)
print("#################################################")
print("DATA FOR RANDOM FORESTS")
print(rf_Avg)
importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)
res = y_test.to_frame('Actual Value')
res.insert(0, 'Predicted Value', rf_predictions)
res = order_numbers.to_frame().join(res, how='inner')
print(res)
With your example data from above we get (for train_test_split with random_state=1):
ORDER_NUMBER Predicted Value Actual Value
3 102287793 652.0746 733
14 102599569 650.3984 425
19 102643207 319.4964 255
20 102656091 388.6004 356
26 102668177 475.1724 233
27 102669909 671.9158 244
32 102672513 319.1550 228