My data
I have the attached data and I'm trying to overlay the Home and Away histograms for each team individually. I'm new to Python, by the way.
So far I have made the two plots below, which look exactly like what I want, but I want to combine them into one plot per team:
df_EPL['Away_score'].hist(by=df_EPL['AwayTeam'],figsize = (8,8),color = '#96ddff');
and
df_EPL['Home_score'].hist(by=df_EPL['HomeTeam'],figsize = (8,8),color = '#82c065');
Fake Dataframe creation
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(42)
teams = ['Arsenal', 'Chelsea', 'Liverpool', 'Manchester City', 'Manchester Utd']
df = pd.DataFrame({'HomeTeam': np.repeat(teams, len(teams) - 1)})
df['AwayTeam'] = [away_team for home_team in teams for away_team in teams if away_team != home_team]
df['Home_score'] = np.random.randint(0, 5, len(df))
df['Away_score'] = np.random.randint(0, 5, len(df))
HomeTeam AwayTeam Home_score Away_score
0 Arsenal Chelsea 3 1
1 Arsenal Liverpool 4 4
2 Arsenal Manchester City 2 3
3 Arsenal Manchester Utd 4 0
4 Chelsea Arsenal 4 0
5 Chelsea Liverpool 1 2
6 Chelsea Manchester City 2 2
7 Chelsea Manchester Utd 2 1
8 Liverpool Arsenal 2 3
9 Liverpool Chelsea 4 3
10 Liverpool Manchester City 3 2
11 Liverpool Manchester Utd 2 3
12 Manchester City Arsenal 4 3
13 Manchester City Chelsea 1 0
14 Manchester City Liverpool 3 2
15 Manchester City Manchester Utd 1 4
16 Manchester Utd Arsenal 3 2
17 Manchester Utd Chelsea 4 4
18 Manchester Utd Liverpool 0 0
19 Manchester Utd Manchester City 3 1
Dataframe re-shape
You need to re-shape your dataframe in a different format in order to make the plot you want. For this purpose, you can use pandas.melt:
df = pd.melt(frame = df,
             id_vars = ['HomeTeam', 'AwayTeam'],
             var_name = 'H/A',
             value_name = 'Score')
df = df.drop('AwayTeam', axis = 1).rename(columns = {'HomeTeam': 'Team'}).replace({'Home_score': 'Home', 'Away_score': 'Away'})
Team H/A Score
0 Arsenal Home 3
1 Arsenal Home 4
2 Arsenal Home 2
3 Arsenal Home 4
4 Chelsea Home 4
5 Chelsea Home 1
6 Chelsea Home 2
7 Chelsea Home 2
8 Liverpool Home 2
9 Liverpool Home 4
10 Liverpool Home 3
11 Liverpool Home 2
12 Manchester City Home 4
13 Manchester City Home 1
14 Manchester City Home 3
15 Manchester City Home 1
16 Manchester Utd Home 3
17 Manchester Utd Home 4
18 Manchester Utd Home 0
19 Manchester Utd Home 3
20 Arsenal Away 1
21 Arsenal Away 4
22 Arsenal Away 3
23 Arsenal Away 0
24 Chelsea Away 0
25 Chelsea Away 2
26 Chelsea Away 2
27 Chelsea Away 1
28 Liverpool Away 3
29 Liverpool Away 3
30 Liverpool Away 2
31 Liverpool Away 3
32 Manchester City Away 3
33 Manchester City Away 0
34 Manchester City Away 2
35 Manchester City Away 4
36 Manchester Utd Away 2
37 Manchester Utd Away 4
38 Manchester Utd Away 0
39 Manchester Utd Away 1
Plot
Now the dataframe is ready to be plotted. You can use seaborn.FacetGrid to create the grid of subplots, one for each team. Each subplot will contain two seaborn.histplot histograms: one for the Home scores and one for the Away scores:
g = sns.FacetGrid(df, col = 'Team', hue = 'H/A')
g.map(sns.histplot, 'Score', bins = np.arange(df['Score'].min() - 0.5, df['Score'].max() + 1.5, 1))
g.add_legend()
g.set(xticks = np.arange(df['Score'].min(), df['Score'].max() + 1, 1))
plt.show()
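One caveat about the reshape above: after the melt, AwayTeam is dropped and HomeTeam becomes Team, so each 'Away' score stays attached to the home side of that match rather than to the team that scored it away. If the aim is each team's own away goals, as in the question's hist(by=df_EPL['AwayTeam']) call, a minimal sketch of an alternative reshape is to build the Home and Away halves separately and concatenate them. It assumes the wide dataframe from the "Fake Dataframe creation" step; the names home, away and long_df are illustrative only.

```python
import numpy as np
import pandas as pd

# Same fake data as in the "Fake Dataframe creation" step
np.random.seed(42)
teams = ['Arsenal', 'Chelsea', 'Liverpool', 'Manchester City', 'Manchester Utd']
df = pd.DataFrame({'HomeTeam': np.repeat(teams, len(teams) - 1)})
df['AwayTeam'] = [a for h in teams for a in teams if a != h]
df['Home_score'] = np.random.randint(0, 5, len(df))
df['Away_score'] = np.random.randint(0, 5, len(df))

# Pair each score with the team that actually scored it
home = (df[['HomeTeam', 'Home_score']]
        .rename(columns={'HomeTeam': 'Team', 'Home_score': 'Score'})
        .assign(**{'H/A': 'Home'}))
away = (df[['AwayTeam', 'Away_score']]
        .rename(columns={'AwayTeam': 'Team', 'Away_score': 'Score'})
        .assign(**{'H/A': 'Away'}))

# Stack both halves into one long frame with the same columns as before
long_df = pd.concat([home, away], ignore_index=True)[['Team', 'H/A', 'Score']]
print(long_df[long_df['Team'] == 'Arsenal'])
```

long_df can then be passed to sns.FacetGrid in place of df, with the same col = 'Team' and hue = 'H/A' arguments.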
Related
I am trying to create a separate pandas DataFrame in python using pandas'.groupby function. I am working with basketball data and want to create a column that displays if the home and away teams are on the tail end of a back-to-back.
The 0 in the yesterday_home_team and yesterday_away_team columns indicates that the away team did not play the previous night.
Given that there are multiple games each night, the .groupby function should be used.
Input Data:
date home_team away_team
9/22/22 LAL DET
9/23/22 LAC LAL
Desired output:
date home_team away_team yesterday_home_team yesterday_away_team
9/21/22 LAL MIN 0 MIN
9/22/22 LAL DET DET 0
9/23/22 LAC LAL LAL LAC
Appreciate your assistance.
Your output example doesn't make sense to me. Do you need the team names in the 'yesterday_home_team' and 'yesterday_away_team'? Is it sufficient to simply just have a 1 if the home team is on the back to back, and 0 if the home team is not (and then also same logic for away team)? It's also tough when you don't provide a good sample dataset.
Anyways, here's my solution that just indicates a 1 or 0 if the given team is on the back end of the back to back:
import pandas as pd
import numpy as np
months = ['October', 'November', 'December', 'January', 'February', 'March', 'April', 'May', 'June']
dfs = []
for month in months:
    month = month.lower()
    url = f'https://www.basketball-reference.com/leagues/NBA_2022_games-{month}.html'
    df = pd.read_html(url)[0]
    df['Date'] = pd.to_datetime(df['Date'])
    dfs.append(df)
df = pd.concat(dfs)
df = df.rename(columns={'Visitor/Neutral':'away_team', 'Home/Neutral':'home_team'})
df_melt = pd.melt(df, id_vars=['Date'],
                  value_vars=['away_team', 'home_team'],
                  var_name = 'Home_Away',
                  value_name = 'Team')
df_melt = df_melt.sort_values('Date').reset_index(drop=True)
df_melt['days_between'] = df_melt.groupby('Team')['Date'].diff().dt.days
df_melt['yesterday'] = np.where(df_melt['days_between'] == 1, 1, 0)
df_melt = df_melt.drop(['days_between', 'Home_Away'], axis=1)
df = df.merge(df_melt.rename(columns={'Team':'home_team', 'yesterday':'yesterday_home_team'}), how='left', left_on=['Date', 'home_team'], right_on=['Date', 'home_team'])
df = df.merge(df_melt.rename(columns={'Team':'away_team', 'yesterday':'yesterday_away_team'}), how='left', left_on=['Date', 'away_team'], right_on=['Date', 'away_team'])
df = df[['Date', 'home_team', 'away_team', 'yesterday_home_team', 'yesterday_away_team']]
Output:
print(df.head(30).to_string())
Date home_team away_team yesterday_home_team yesterday_away_team
0 2021-10-19 Milwaukee Bucks Brooklyn Nets 0 0
1 2021-10-19 Los Angeles Lakers Golden State Warriors 0 0
2 2021-10-20 Charlotte Hornets Indiana Pacers 0 0
3 2021-10-20 Detroit Pistons Chicago Bulls 0 0
4 2021-10-20 New York Knicks Boston Celtics 0 0
5 2021-10-20 Toronto Raptors Washington Wizards 0 0
6 2021-10-20 Memphis Grizzlies Cleveland Cavaliers 0 0
7 2021-10-20 Minnesota Timberwolves Houston Rockets 0 0
8 2021-10-20 New Orleans Pelicans Philadelphia 76ers 0 0
9 2021-10-20 San Antonio Spurs Orlando Magic 0 0
10 2021-10-20 Utah Jazz Oklahoma City Thunder 0 0
11 2021-10-20 Portland Trail Blazers Sacramento Kings 0 0
12 2021-10-20 Phoenix Suns Denver Nuggets 0 0
13 2021-10-21 Atlanta Hawks Dallas Mavericks 0 0
14 2021-10-21 Miami Heat Milwaukee Bucks 0 0
15 2021-10-21 Golden State Warriors Los Angeles Clippers 0 0
16 2021-10-22 Orlando Magic New York Knicks 0 0
17 2021-10-22 Washington Wizards Indiana Pacers 0 0
18 2021-10-22 Cleveland Cavaliers Charlotte Hornets 0 0
19 2021-10-22 Boston Celtics Toronto Raptors 0 0
20 2021-10-22 Philadelphia 76ers Brooklyn Nets 0 0
21 2021-10-22 Houston Rockets Oklahoma City Thunder 0 0
22 2021-10-22 Chicago Bulls New Orleans Pelicans 0 0
23 2021-10-22 Denver Nuggets San Antonio Spurs 0 0
24 2021-10-22 Los Angeles Lakers Phoenix Suns 0 0
25 2021-10-22 Sacramento Kings Utah Jazz 0 0
26 2021-10-23 Cleveland Cavaliers Atlanta Hawks 1 0
27 2021-10-23 Indiana Pacers Miami Heat 1 0
28 2021-10-23 Toronto Raptors Dallas Mavericks 1 0
29 2021-10-23 Chicago Bulls Detroit Pistons 1 0
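The same melt / groupby / diff idea can be checked offline on a tiny frame. This is only a sketch: the three games come from the question's sample (the 9/21 row is taken from its desired output), and the column names home_b2b and away_b2b are made up for illustration.

```python
import pandas as pd

games = pd.DataFrame({
    'date': pd.to_datetime(['2022-09-21', '2022-09-22', '2022-09-23']),
    'home_team': ['LAL', 'LAL', 'LAC'],
    'away_team': ['MIN', 'DET', 'LAL'],
})

# One row per (date, team), regardless of home/away
long = games.melt(id_vars='date', value_vars=['home_team', 'away_team'],
                  value_name='team').sort_values('date')

# 1 if the team's previous game was exactly one day earlier, else 0
long['b2b'] = long.groupby('team')['date'].diff().dt.days.eq(1).astype(int)
flags = long[['date', 'team', 'b2b']]

# Attach the flag once for the home side and once for the away side
games = games.merge(flags.rename(columns={'team': 'home_team', 'b2b': 'home_b2b'}),
                    on=['date', 'home_team'], how='left')
games = games.merge(flags.rename(columns={'team': 'away_team', 'b2b': 'away_b2b'}),
                    on=['date', 'away_team'], how='left')
print(games)
```

Here LAL is flagged on 9/22 (as the home team) and on 9/23 (as the away team), because it played the day before in both cases.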
I want to get the value of a cell in a DataFrame based on a string that is not equal but very similar.
This is the dataframe
Teams GP Pts
0 Liverpool 15 44
1 Chelsea 15 35
2 Manchester C. 15 32
3 West Ham Utd 15 28
4 Manchester Utd 14 24
5 Leicester City 14 22
6 Watford 15 20
7 Aston Villa 14 19
8 Crystal Palace 14 19
9 Arsenal 14 17
10 Brentford 14 17
11 Everton 14 17
12 Newcastle Utd 15 17
13 Brighton 15 14
14 Burnley 14 14
15 Southampton 15 14
16 Leeds Utd 14 13
17 Tottenham 13 13
18 Wolverhampton 15 12
19 Norwich City 14 8
Code
hometeam = 'Manchester City'
pts_man_city = df[df.Teams == hometeam].iloc[0]['Pts']
But I got IndexError: single positional indexer is out-of-bounds
You can use thefuzz.process (previously fuzzywuzzy):
# pip install thefuzz
from thefuzz import process
hometeam = 'Manchester City'
best = process.extractOne(hometeam, df['Teams'])[0]
df.loc[df['Teams'].eq(best), 'Pts'].iloc[0]
output: 32
We need to find similar strings. Ok, let's do it!
from difflib import SequenceMatcher
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
hometeam = 'Manchester City'
alpha = 0.75
# index label of the first team whose similarity to hometeam exceeds alpha
idx = df['Teams'].apply(lambda x: x if similar(x, hometeam) > alpha else None).dropna().index[0]
df.loc[idx, 'Pts']
Adjust the alpha threshold to suit your task.
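A threshold returns the first name that clears it, and more than one name can do so (against 'Manchester City', 'Manchester Utd' also scores above 0.75). A minimal sketch of a variant that instead picks the highest-scoring name is shown below. It uses a shortened copy of the question's table; the names scores, best_idx, best_team and pts are illustrative.

```python
from difflib import SequenceMatcher

import pandas as pd

# Shortened copy of the table from the question
df = pd.DataFrame({
    'Teams': ['Liverpool', 'Chelsea', 'Manchester C.', 'West Ham Utd',
              'Manchester Utd', 'Leicester City'],
    'Pts': [44, 35, 32, 28, 24, 22],
})
hometeam = 'Manchester City'

# Similarity of every team name to the search string
scores = df['Teams'].apply(lambda name: SequenceMatcher(None, name, hometeam).ratio())

# Row label of the most similar name, then look up its points
best_idx = scores.idxmax()
best_team = df.loc[best_idx, 'Teams']
pts = df.loc[best_idx, 'Pts']
print(best_team, pts)
```

If no name is close enough to be meaningful, the best score can still be compared against a cutoff before it is used.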
The code below returns the row for a specific team (exact matches only):
df.loc[df['Teams'] == hometeam]
I need to sort the values in one column of a dataframe in ascending order, based on the values of another column in the same dataframe. The first thing I do is run a 'select' query to retrieve all data from our table and store it in a dataframe named 'df':
def from_econtable_search_virk():
    engine = create_engine(f'postgresql+psycopg2://{username}:{password}@{server}:5432/{database}')
    df = pd.read_sql_query(f'select * from {table}', con=engine)
When I print the dataframe df, I get output like the following:
number name address city token
0 1 Alarm oxstreet 12 Reading eng
1 3 Center examstreet 24 Bristol JOC
2 2 Computer pentaroad 4 Oxford eng
3 3 Music thisstreet 2 London eng
4 4 School schoolroad 45 London eng
5 1 Hospital madstreet 24 Manchester Owx
6 2 Bowling placestreet 5 Birmingham Owx
7 1 Hotel cemstreet 24 Liverpool JOC
8 2 Paintball shootstreet 2 Manchester JOC
9 4 Computer comproad 24 Brigthon JOC
What I then need to do with the dataframe df is, first and foremost, sort the rows by token in the order given by a list (not alphabetically).
list = ['eng', 'Owx', 'JOC']
Which should make the dataframe df look like the following:
number name address city token
0 1 Alarm oxstreet 12 Reading eng
1 2 Computer pentaroad 4 Oxford eng
2 3 Music thisstreet 2 London eng
3 4 School schoolroad 45 London eng
4 1 Hospital madstreet 24 Manchester Owx
5 2 Bowling placestreet 5 Birmingham Owx
6 1 Hotel cemstreet 24 Liverpool JOC
7 2 Paintball shootstreet 2 Manchester JOC
8 4 Computer comproad 24 Brigthon JOC
9 3 Center examstreet 24 Bristol JOC
Finally, the values in the number column must be ordered in an ascending manner, based on the token, and the dataframe will eventually then look like the following:
number name address city token
0 1 Alarm oxstreet 12 Reading eng
1 2 Computer pentaroad 4 Oxford eng
2 3 Music thisstreet 2 London eng
3 4 School schoolroad 45 London eng
4 1 Hospital madstreet 24 Manchester Owx
5 2 Bowling placestreet 5 Birmingham Owx
6 1 Hotel cemstreet 24 Liverpool JOC
7 2 Paintball shootstreet 2 Manchester JOC
8 3 Center examstreet 24 Bristol JOC
9 4 Computer comproad 24 Brigthon JOC
Convert token column to categorical dtype and sort values by token then by number.
cats = ['eng', 'Owx', 'JOC']
df['token'] = df['token'].astype(pd.CategoricalDtype(cats, ordered=True))
>>> df['token'].dtype
CategoricalDtype(categories=['eng', 'Owx', 'JOC'], ordered=True)
>>> df.sort_values(['token', 'number'])
number name address city token
0 1 Alarm oxstreet 12 Reading eng
2 2 Computer pentaroad 4 Oxford eng
3 3 Music thisstreet 2 London eng
4 4 School schoolroad 45 London eng
5 1 Hospital madstreet 24 Manchester Owx
6 2 Bowling placestreet 5 Birmingham Owx
7 1 Hotel cemstreet 24 Liverpool JOC
8 2 Paintball shootstreet 2 Manchester JOC
1 3 Center examstreet 24 Bristol JOC
9 4 Computer comproad 24 Brigthon JOC
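The sorted frame above keeps the original index labels, while the expected output in the question is numbered 0 to 9. As a small sketch, sort_values accepts ignore_index=True to renumber the rows. The frame below is a cut-down copy of the question's data (the address and city columns are left out for brevity), and result is an illustrative name.

```python
import pandas as pd

# Cut-down copy of the question's data
df = pd.DataFrame({
    'number': [1, 3, 2, 3, 4, 1, 2, 1, 2, 4],
    'name': ['Alarm', 'Center', 'Computer', 'Music', 'School',
             'Hospital', 'Bowling', 'Hotel', 'Paintball', 'Computer'],
    'token': ['eng', 'JOC', 'eng', 'eng', 'eng', 'Owx', 'Owx', 'JOC', 'JOC', 'JOC'],
})

# Custom (non-alphabetical) order for the token column
cats = ['eng', 'Owx', 'JOC']
df['token'] = pd.Categorical(df['token'], categories=cats, ordered=True)

# Sort by token order, then by number, and renumber the index from 0
result = df.sort_values(['token', 'number'], ignore_index=True)
print(result)
```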
I would appreciate help converting the output of the code below into a DataFrame with 2 columns. I was able to write the for loop that prints each result, and now I need to save this data to a DataFrame, but I have not been able to get the result right. Can you help me?
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.skysports.com/premier-league-table') #get page web information
soup = BeautifulSoup(r.text, 'html.parser') # interpreter
print(soup.prettify())
print(soup.title)
league_table = soup.find('table', class_ = 'standing-table__table callfn')
for team in league_table.find_all('tbody'):
    rows = team.find_all('tr')
    for row in rows:
        pl_team = row.find('td', class_ ='standing-table__cell standing-table__cell--name')
        pl_team = pl_team['data-long-name']
        points = row.find_all('td', class_ = 'standing-table__cell')[9].text
        print(pl_team, points)
Use pandas. You can do it in 1 line.
import pandas as pd
df = pd.read_html('https://www.skysports.com/premier-league-table')[0]
Output:
print(df[['Team','Pts']])
Team Pts
0 Manchester City 83
1 Manchester United 71
2 Chelsea 67
3 Liverpool 66
4 Leicester City 66
5 West Ham United 62
6 Tottenham Hotspur 59
7 Everton 59
8 Arsenal 58
9 Leeds United 56
10 Aston Villa 52
11 Wolverhampton Wanderers 45
12 Crystal Palace 44
13 Southampton 43
14 Newcastle United 42
15 Brighton and Hove Albion 41
16 Burnley 39
17 Fulham 28
18 West Bromwich Albion 26
19 Sheffield United 20
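If you would rather keep the BeautifulSoup loop from the question, the usual pattern is to append one record per row to a list instead of printing, and build the DataFrame once at the end. The sketch below does not touch the network: scraped is a hypothetical stand-in for the (pl_team, points) pairs the loop yields, filled with three values from the table above.

```python
import pandas as pd

# Hypothetical stand-in for the (pl_team, points) pairs found in the loop;
# points is text because it comes from the cell's .text attribute
scraped = [('Manchester City', '83'), ('Manchester United', '71'), ('Chelsea', '67')]

records = []
for pl_team, points in scraped:
    # in the real loop this replaces print(pl_team, points)
    records.append({'Team': pl_team, 'Pts': points})

# Build the DataFrame once, then convert the points to integers
df = pd.DataFrame(records)
df['Pts'] = df['Pts'].astype(int)
print(df)
```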
I have a text file that looks like this:
************************************************************************************************
English Premier Division - Saturday 25th May 2002
************************************************************************************************
================================================================================================
2001/2 Assists
================================================================================================
Pos Player Club Apps Asts
-------------------------------------------------------------------------
1st David Beckham Man Utd 29 15
2nd Dean Gordon Middlesbrough 30 (1) 11
3rd John Collins Fulham 32 11
4th Ryan Giggs Man Utd 32 11
5th Kieron Dyer Newcastle 33 10
6th Sean Davis Fulham 23 (1) 10
7th Damien Duff Blackburn 30 (3) 10
8th Alan Smith Leeds 23 (6) 9
9th Jesper Grønkjær Chelsea 34 9
10th Andrejs Stolcers Fulham 28 9
11th Ian Harte Leeds 37 8
12th Eidur Gudjohnsen Chelsea 28 (3) 8
13th Robert Pires Arsenal 24 (3) 7
14th Lauren Arsenal 32 (1) 7
15th John Robinson Charlton 33 7
16th Michael Gray Sunderland 37 7
17th Henrik Pedersen Bolton 36 7
18th Anders Svensson Southampton 34 (2) 7
19th Lee Bowyer Leeds 32 7
20th Craig Hignett Blackburn 21 (6) 7
21st Paul Merson Aston Villa 27 7
22nd Teddy Sheringham Tottenham 37 7
23rd Steed Malbranque Fulham 16 (14) 7
24th Marian Pahars Southampton 37 7
25th Muzzy Izzet Leicester 28 7
26th Sergei Rebrov Tottenham 36 (1) 7
27th Julio Arca Sunderland 32 (1) 7
28th Christian Bassedas Newcastle 37 7
29th Juan Sebastián Verón Man Utd 29 (2) 7
30th Joe Cole West Ham 32 6
I'm trying to read it into a pandas data frame like this:
df = pd.read_table('assist1.txt',
                   sep=r'\s+',
                   skiprows=6,
                   header=0,)
This code throws an exception - pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 31, saw 8.
I guess that's because of the space between the first and last name of the player (should be the value of the Player column).
Is there a way to achieve this?
Furthermore, it is a part of a larger text file that looks like this:
************************************************************************************************
English Premier Division - Saturday 25th May 2002
************************************************************************************************
================================================================================================
2001/2 Table
================================================================================================
Pos Team Pld Won Drn Lst For Ag Won Drn Lst For Ag Pts
--------------------------------------------------------------------------------------------------
1st C Man Utd 38 15 4 0 41 4 10 4 5 34 20 83
--------------------------------------------------------------------------------------------------
2nd Arsenal 38 15 2 2 38 9 11 3 5 28 14 83
3rd Leeds 38 15 4 0 33 8 9 4 6 36 37 80
4th Liverpool 38 13 4 2 25 7 9 2 8 26 24 72
5th Chelsea 38 16 1 2 44 18 4 5 10 24 33 66
6th Newcastle 38 11 5 3 40 23 7 3 9 25 33 62
7th Blackburn 38 11 3 5 36 24 5 5 9 23 30 56
8th Middlesbrough 38 9 7 3 31 19 5 6 8 20 29 55
9th Sunderland 38 8 5 6 31 30 8 2 9 22 25 55
10th West Ham 38 11 3 5 31 17 3 7 9 14 29 52
11th Tottenham 38 10 3 6 35 26 4 5 10 23 35 50
12th Leicester 38 7 5 7 23 20 6 4 9 26 28 48
13th Fulham 38 7 5 7 39 35 5 7 7 33 44 48
14th Ipswich 38 9 4 6 23 22 3 3 13 14 34 43
15th Charlton 38 5 5 9 18 26 5 4 10 16 30 39
16th Everton 38 8 4 7 30 28 1 5 13 11 36 36
17th Aston Villa 38 2 8 9 19 28 5 6 8 21 26 35
--------------------------------------------------------------------------------------------------
18th R Derby 38 6 4 9 25 28 3 3 13 14 39 34
19th R Southampton 38 5 7 7 34 34 1 4 14 12 35 29
20th R Bolton 38 6 3 10 25 31 1 4 14 15 40 28
================================================================================================
2001/2 Goals
================================================================================================
Pos Player Club Apps Gls
-------------------------------------------------------------------------
1st Thierry Henry Arsenal 34 25
2nd Alan Shearer Newcastle 36 25
3rd Ruud van Nistelrooy Man Utd 26 23
4th Steve Marlet Fulham 38 20
5th Jimmy Floyd Hasselbaink Chelsea 30 (1) 20
6th Les Ferdinand Sunderland 27 (2) 17
7th Kevin Phillips Sunderland 36 17
8th Frédéric Kanouté West Ham 32 (3) 14
9th Marcus Bent Blackburn 28 (4) 13
10th Alen Boksic Middlesbrough 36 13
11th Eidur Gudjohnsen Chelsea 28 (3) 13
12th Luis Boa Morte Fulham 36 13
13th Michael Owen Liverpool 32 (1) 12
14th Dwight Yorke Man Utd 29 (1) 11
15th Henrik Pedersen Bolton 36 11
16th Juan Pablo Angel Aston Villa 34 (2) 11
17th Juan Sebastián Verón Man Utd 29 (2) 11
18th Shaun Bartlett Charlton 35 10
19th Matt Jansen Blackburn 28 (5) 10
20th Duncan Ferguson Everton 28 (5) 10
21st Ian Harte Leeds 37 10
22nd Bosko Balaban Aston Villa 36 10
23rd Robbie Fowler Liverpool 25 (3) 10
24th Georgi Kinkladze Derby 36 (1) 10
25th Hamilton Ricard Middlesbrough 28 (2) 10
26th Robert Pires Arsenal 24 (3) 9
27th Andrew Cole Man Utd 15 (5) 9
28th Rod Wallace Bolton 31 9
29th James Beattie Southampton 28 (1) 9
30th Robbie Keane Leeds 28 (8) 9
================================================================================================
2001/2 Assists
================================================================================================
Pos Player Club Apps Asts
-------------------------------------------------------------------------
1st David Beckham Man Utd 29 15
2nd Dean Gordon Middlesbrough 30 (1) 11
3rd John Collins Fulham 32 11
4th Ryan Giggs Man Utd 32 11
5th Kieron Dyer Newcastle 33 10
6th Sean Davis Fulham 23 (1) 10
7th Damien Duff Blackburn 30 (3) 10
8th Alan Smith Leeds 23 (6) 9
9th Jesper Grønkjær Chelsea 34 9
10th Andrejs Stolcers Fulham 28 9
11th Ian Harte Leeds 37 8
12th Eidur Gudjohnsen Chelsea 28 (3) 8
13th Robert Pires Arsenal 24 (3) 7
14th Lauren Arsenal 32 (1) 7
15th John Robinson Charlton 33 7
16th Michael Gray Sunderland 37 7
17th Henrik Pedersen Bolton 36 7
18th Anders Svensson Southampton 34 (2) 7
19th Lee Bowyer Leeds 32 7
20th Craig Hignett Blackburn 21 (6) 7
21st Paul Merson Aston Villa 27 7
22nd Teddy Sheringham Tottenham 37 7
23rd Steed Malbranque Fulham 16 (14) 7
24th Marian Pahars Southampton 37 7
25th Muzzy Izzet Leicester 28 7
26th Sergei Rebrov Tottenham 36 (1) 7
27th Julio Arca Sunderland 32 (1) 7
28th Christian Bassedas Newcastle 37 7
29th Juan Sebastián Verón Man Utd 29 (2) 7
30th Joe Cole West Ham 32 6
================================================================================================
2001/2 Average Rating
================================================================================================
Pos Player Club Apps Av R
-------------------------------------------------------------------------
1st Ruud van Nistelrooy Man Utd 26 8.54
2nd Thierry Henry Arsenal 34 8.09
3rd Alan Shearer Newcastle 36 7.97
4th Kieron Dyer Newcastle 33 7.94
5th Steve Marlet Fulham 38 7.89
6th Ian Harte Leeds 37 7.86
7th Andrew Cole Man Utd 15 (5) 7.85
8th Roy Keane Man Utd 19 7.84
9th Les Ferdinand Sunderland 27 (2) 7.83
10th Juan Sebastián Verón Man Utd 29 (2) 7.81
11th Eidur Gudjohnsen Chelsea 28 (3) 7.77
12th Jesper Grønkjær Chelsea 34 7.76
13th Michaël Silvestre Man Utd 32 7.72
14th Dean Gordon Middlesbrough 30 (1) 7.71
15th Michael Owen Liverpool 32 (1) 7.70
16th Patrick Vieira Arsenal 29 7.69
17th Robert Pires Arsenal 24 (3) 7.67
18th Ryan Giggs Man Utd 32 7.66
19th Dwight Yorke Man Utd 29 (1) 7.63
20th Mario Stanic Chelsea 29 (3) 7.63
21st Frédéric Kanouté West Ham 32 (3) 7.57
22nd Mark Viduka Leeds 21 7.57
23rd David Beckham Man Utd 29 7.55
24th Jimmy Floyd Hasselbaink Chelsea 30 (1) 7.55
25th Martin Taylor Blackburn 14 (8) 7.55
26th Titus Bramble Ipswich 33 7.55
27th Sol Campbell Arsenal 20 (1) 7.52
28th Mario Melchiot Chelsea 19 (2) 7.52
29th Stephane Henchoz Liverpool 29 7.52
30th Rio Ferdinand Leeds 36 (1) 7.51
================================================================================================
2001/2 Man of Match
================================================================================================
Pos Player Club Apps MoM
-------------------------------------------------------------------------
1st Thierry Henry Arsenal 34 8
2nd Ruud van Nistelrooy Man Utd 26 8
3rd Kieron Dyer Newcastle 33 6
4th Les Ferdinand Sunderland 27 (2) 6
5th Steve Marlet Fulham 38 6
6th Eidur Gudjohnsen Chelsea 28 (3) 6
7th Ian Harte Leeds 37 5
8th Richie Wellens Leicester 20 (9) 5
9th Henrik Pedersen Bolton 36 5
10th Alan Shearer Newcastle 36 5
11th Michael Owen Liverpool 32 (1) 4
12th Dean Gordon Middlesbrough 30 (1) 4
13th Matt Jansen Blackburn 28 (5) 4
14th Marcus Bent Blackburn 28 (4) 4
15th Kevin Campbell Everton 27 (4) 4
16th Titus Bramble Ipswich 33 4
17th Roy Keane Man Utd 19 4
18th Frédéric Kanouté West Ham 32 (3) 4
19th Patrick Vieira Arsenal 29 4
20th Hermann Hreidarsson Ipswich 34 4
21st Dennis Bergkamp Arsenal 22 (9) 4
22nd Jimmy Floyd Hasselbaink Chelsea 30 (1) 4
23rd Claus Lundekvam Southampton 27 (2) 4
24th Robert Pires Arsenal 24 (3) 3
25th Shaun Bartlett Charlton 35 3
26th Kevin Phillips Sunderland 36 3
27th Lucas Radebe Leeds 31 (1) 3
28th Ragnvald Soma West Ham 27 (3) 3
29th Dean Richards Tottenham 34 3
30th Wayne Quinn Liverpool 25 (4) 3
Ideally I would like to run a function that creates a data frame out of each table above, but can't figure it out.
Thanks
Another way: you can specify the separator as two or more whitespace characters and pass skiprows as a list of row numbers. I tried this and it gave me your expected output. You can write a simple script to find which lines to skip and which to keep.
df = pd.read_table('assist1.txt', sep=r'\s\s+', skiprows=[0,1,2,3,4,5,6,7,8,10], header=0, engine='python')
You're using whitespace as a delimiter, but this file is fixed-width, not whitespace-delimited. Look up fixed-width parsing, e.g. pandas.read_fwf: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_fwf.html.
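For the larger file, one possible approach is to split the text at the "=====" rules and parse each section on its own. The sketch below makes several assumptions: the columns in the real file are separated by at least two spaces (the post shows the spacing collapsed, so SAMPLE is a made-up layout), single spaces occur only inside values such as "30 (1)", and each section has unique column names (the league table repeats Won/Drn/Lst and would need extra handling). split_tables and SAMPLE are hypothetical names.

```python
import io

import pandas as pd

SAMPLE = """\
================================================
2001/2 Goals
================================================
Pos   Player                   Club       Apps     Gls
------------------------------------------------------
1st   Thierry Henry            Arsenal    34       25
3rd   Ruud van Nistelrooy      Man Utd    26       23
5th   Jimmy Floyd Hasselbaink  Chelsea    30 (1)   20
================================================
2001/2 Assists
================================================
Pos   Player         Club           Apps     Asts
-------------------------------------------------
1st   David Beckham  Man Utd        29       15
2nd   Dean Gordon    Middlesbrough  30 (1)   11
"""

def split_tables(text):
    """Return {section title: DataFrame} for each '====' delimited section."""
    lines = text.splitlines()
    tables = {}
    i = 0
    while i < len(lines) - 2:
        # A section starts with: rule line, title line, rule line
        if lines[i].startswith('=') and lines[i + 2].startswith('='):
            title = lines[i + 1].strip()
            j = i + 3
            body = []
            # Collect lines until the next '====' rule, skipping dashes and blanks
            while j < len(lines) and not lines[j].startswith('='):
                if lines[j].strip() and not lines[j].startswith('-'):
                    body.append(lines[j])
                j += 1
            # Two or more spaces separate columns; first kept line is the header
            tables[title] = pd.read_csv(io.StringIO('\n'.join(body)),
                                        sep=r'\s{2,}', engine='python')
            i = j
        else:
            i += 1
    return tables

tables = split_tables(SAMPLE)
for title, frame in tables.items():
    print(title)
    print(frame)
```

With a real file, the text could be read with open(...).read() and passed to split_tables; pandas.read_fwf on each section's body is an alternative if the column positions are strictly fixed.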