In every iteration of the "for" loop, two items (user and likes) are taken from the "Data" list and added to a separate list (scoreboard). I am trying to detect whether "user" already exists in the scoreboard list. If it does, I take the item that comes after "user" in the scoreboard list (the previous "likes" value), add the new "likes" value to it, and try to store the sum back in that position. But I couldn't make it work.
def likecount():
    scoreboard = []
    for t in range(0,len(Data),3):
        user = Data[t]
        likes = Data[t + 2]
        if user not in scoreboard:
            scoreboard.append(user), scoreboard.append(likes)
        else:
            scoreboard[(scoreboard.index(user) + 1)] + likes == scoreboard[(scoreboard.index(user) + 1)]
    for i in range(0,len(scoreboard),2):
        print (scoreboard[i],scoreboard[i+1], end= "\n")
#Data list:
['Rhett_andEmma77', 'Los Angeles is soooo hot', '0', 'Tato', 'Glitch', '1', 'Aurio', 'April fools tomorrow', '4', 'nap', 'The bully**', '3', 'NICKSION', 'Oops', '6', 'stupidfuck', 'hes a little guy', '5', 'database', 'Ty', '1', 'stupidfuck', 'possible object show (objects needed)', '3', 'NightSkyMusic', 'nicotine takes 10 seconds to reach your brain', '27', 'stupidfuck', '#BFBleafyfan99 be anii e', '4', 'Odminey', 'Aliveness', '26', 'stupidfuck', '#techness2011 the', '5', 'techness2011', 'Boomerang', '1', 'saltyipaint', 'April is r slur month', '5', 'HENRYFOLIO', 'flip and dunk', '4', 'SpenceAnimation', 'Got Any grapes 🍇', '2', 'RainyFox2', 'Draw me in your style****', '1', 'hecksacontagon', 'funky cat guy impresses you with his fish corpse', '11', 'HENRYFOLIO', 'flip and dunk #bruhkeko version', '4', 'nairb', 'Spoderman turns green', '5', 'SpenceAnimation', 'Jellybean', '1', 'SpenceAnimation', '#FussiArt', '3']
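A dictionary is precisely what you are looking for, instead of keeping a flat list of items. Whenever a user comes up, check whether he/she already exists in the dictionary.
If the user exists, add the new likes to the old like count.
If not, create an entry with the like count.
That is the natural solution. Note also that the likes in your list are strings, so they have to be converted with int() before adding, and that == in your else branch only compares two values; it never assigns anything.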
data = ['Rhett_andEmma77', 'Los Angeles is soooo hot', '0', 'Tato', 'Glitch', '1', 'Aurio', 'April fools tomorrow', '4', 'nap', 'The bully**', '3', 'NICKSION', 'Oops', '6', 'stupidfuck', 'hes a little guy', '5', 'database', 'Ty', '1', 'stupidfuck', 'possible object show (objects needed)', '3', 'NightSkyMusic', 'nicotine takes 10 seconds to reach your brain', '27', 'stupidfuck', '#BFBleafyfan99 be anii e', '4', 'Odminey', 'Aliveness', '26', 'stupidfuck', '#techness2011 the', '5', 'techness2011', 'Boomerang', '1', 'saltyipaint', 'April is r slur month', '5', 'HENRYFOLIO', 'flip and dunk', '4', 'SpenceAnimation', 'Got Any grapes 🍇', '2', 'RainyFox2', 'Draw me in your style****', '1', 'hecksacontagon', 'funky cat guy impresses you with his fish corpse', '11', 'HENRYFOLIO', 'flip and dunk #bruhkeko version', '4', 'nairb', 'Spoderman turns green', '5', 'SpenceAnimation', 'Jellybean', '1', 'SpenceAnimation', '#FussiArt', '3']
assert(len(data)%3 == 0)
scoreboard = {}
for i in range(0, len(data), 3):
    if data[i] in scoreboard.keys():
        scoreboard[data[i]]+= int(data[i+2])
    else:
        scoreboard[data[i]] = int(data[i+2])
print(scoreboard)
Output
{'Rhett_andEmma77': 0, 'Tato': 1, 'Aurio': 4, 'nap': 3, 'NICKSION': 6, 'stupidfuck': 17, 'database': 1, 'NightSkyMusic': 27, 'Odminey': 26, 'techness2011': 1, 'saltyipaint': 5, 'HENRYFOLIO': 8, 'SpenceAnimation': 6, 'RainyFox2': 1, 'hecksacontagon': 11, 'nairb': 5}
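As a possible variation on the loop above, the same tally can be written with collections.Counter and list slicing. This is only a sketch: the helper name count_likes and the shortened sample list are made up for illustration, and it assumes the list always holds user, text, likes triples.

```python
from collections import Counter


def count_likes(data):
    """Sum the likes per user from a flat [user, text, likes, ...] list."""
    if len(data) % 3 != 0:
        raise ValueError("expected user/text/likes triples")
    scoreboard = Counter()
    # data[0::3] are the users, data[2::3] the like counts (as strings)
    for user, likes in zip(data[0::3], data[2::3]):
        scoreboard[user] += int(likes)
    return scoreboard


# Shortened sample (hypothetical, same layout as the original list)
sample = ['Tato', 'Glitch', '1',
          'HENRYFOLIO', 'flip and dunk', '4',
          'HENRYFOLIO', 'flip and dunk #bruhkeko version', '4']

scoreboard = count_likes(sample)
for user, likes in scoreboard.most_common():
    print(user, likes)
```

A Counter starts every missing key at 0, so the if/else branch is no longer needed, and most_common() gives the users sorted by like count.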
I am trying to use beautiful soup to pull the table corresponding to the HTML code below
<table class="sortable stats_table now_sortable" id="team_pitching" data-cols-to-freeze=",2">
<caption>Team Pitching</caption>
from https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2. Here is a screenshot of the site layout and HTML code I am trying to extract from.
I was using the code
url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
res = requests.get(url)
soup1 = BS(res.content, 'html.parser')
table1 = soup1.find('table',{'id':'team_pitching'})
table1
I can't seem to figure out how to get this working. The table above can be extracted with the line
table1 = soup1.find('table',{'id':'team_batting'})
and I figured similar code should work for the one below. Additionally, is there a way to extract this using the table class "sortable stats_table now_sortable" rather than id?
The problem is that when you open the page normally it shows all the tables, but when you load it with Developer Tools open only the first table is shown. So when you make your request, the remaining tables are not included in the HTML you get back. The table you're looking for is not shown until the "Show team pitching" button is pressed; to do that you could use Selenium and get the full HTML response.
That is because the table you are looking for, i.e. the <table> with id="team_pitching", is present as a comment inside the soup. You can check this yourself by printing soup.
You need to:
1. Extract that comment from the soup.
2. Convert it into a soup object.
3. Extract the table data from the soup object.
Here is the complete code that does the above-mentioned steps.
from bs4 import BeautifulSoup, Comment
import requests
url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
main_div = soup.find('div', {'id': 'all_team_pitching'})
# Extracting the comment from the above selected <div>
for comments in main_div.find_all(text=lambda x: isinstance(x, Comment)):
    temp = comments.extract()
# Converting the above extracted comment to a soup object
s = BeautifulSoup(temp, 'lxml')
trs = s.find('table', {'id': 'team_pitching'}).find_all('tr')
# Printing the first four entries of the table (skipping the first row)
for tr in trs[1:5]:
    print(list(tr.stripped_strings))
The first 4 entries from the table
['1', 'Tyler Ahearn', '21', '1', '0', '1.000', '1.93', '6', '0', '0', '1', '9.1', '8', '5', '2', '0', '4', '14', '0', '0', '0', '42', '1.286', '7.7', '0.0', '3.9', '13.5', '3.50']
['2', 'Jack Anderson', '20', '2', '0', '1.000', '0.79', '4', '1', '0', '0', '11.1', '6', '4', '1', '0', '3', '11', '1', '0', '0', '45', '0.794', '4.8', '0.0', '2.4', '8.7', '3.67']
['3', 'Shane Drohan', '*', '21', '0', '1', '.000', '4.08', '4', '4', '0', '0', '17.2', '15', '12', '8', '0', '11', '27', '1', '0', '2', '82', '1.472', '7.6', '0.0', '5.6', '13.8', '2.45']
['4', 'Conor Grady', '21', '2', '0', '1.000', '3.00', '4', '4', '0', '0', '15.0', '10', '5', '5', '3', '8', '15', '1', '0', '2', '68', '1.200', '6.0', '1.8', '4.8', '9.0', '1.88']
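To try the comment trick without hitting the site, here is a small offline sketch. The HTML string is made up to mimic the layout described above (the real page has far more markup), and it uses the built-in 'html.parser' backend instead of lxml. It also touches on the question about matching on the class rather than the id, using a CSS selector.

```python
from bs4 import BeautifulSoup, Comment

# Made-up HTML that mimics the layout described above:
# the table lives inside an HTML comment within a wrapper <div>.
html = """
<div id="all_team_pitching">
<!--
<table class="sortable stats_table" id="team_pitching">
  <caption>Team Pitching</caption>
  <tr><th>Rk</th><th>Name</th></tr>
  <tr><td>1</td><td>Pitcher A</td></tr>
  <tr><td>2</td><td>Pitcher B</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
wrapper = soup.find('div', id='all_team_pitching')

# Collect every comment node inside the wrapper
comments = wrapper.find_all(string=lambda node: isinstance(node, Comment))

table = None
for comment in comments:
    # Parse the comment text as its own little document
    inner = BeautifulSoup(comment, 'html.parser')
    # Match on CSS classes instead of the id
    table = inner.select_one('table.sortable.stats_table')
    if table is not None:
        break

rows = [list(tr.stripped_strings) for tr in table.find_all('tr')]
print(rows)
```

Note that string= is the newer spelling of the text= argument used above. Several tables on a page may share the same classes, so selecting by class alone can return a different table than selecting by id.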
Given the following csv file:
['offre_bfr.entreprise', 'offre_bfr.nombreemp', 'offre_bfr.ca2020', 'offre_bfr.ca2019', 'offre_bfr.ca2018', 'offre_bfr.benefice2020', 'offre_bfr.benefice2019', 'offre_bfr.benefice2018', 'offre_bfr.tauxrenta2020', 'offre_bfr.tauxrenta2019', 'offre_bfr.tauxrenta2018', 'offre_bfr.tauximposition', 'offre_bfr.chargesalariale', 'offre_bfr.chargesfixes', 'offre_bfr.agedirigeant', 'offre_bfr.partdirigeant', 'offre_bfr.agemoyact', 'offre_bfr.parttotaleact', 'offre_bfr.mtdmdcred', 'offre_bfr.creditusuel', 'offre_bfr.capipropres', 'offre_bfr.dettefin', 'offre_bfr.dettenonfin', 'offre_bfr.stock', 'offre_bfr.creances', 'offre_bfr.actifimmobilise', 'offre_bfr.passiftotal', 'offre_bfr.tresorerie', 'offre_bfr.capitalisation2020', 'offre_bfr.capitalisation2019', 'offre_bfr.capitalisation2018', 'offre_bfr.nivrisque', 'offre_bfr.indconfiance', 'offre_bfr.indperseverance', 'offre_bfr.score']
['1', '15', '1.84', '5.18', '7.96', '0.48', '1.19', '0.11', '26.086956', '22.972973', '1.3819095', '17.9', '0.035295', '1.2', '55', '33', '69', '67', '10', '14.98', '0.05', '0.04', '0.21', '0.1', '0.08', '0.41', '0.8', '0.0', '7.5', '52.8', '0.16', 'Bas', '4', '4', '5.0']
['3', '3030', '546.7', '589.7', '430.9', '62.58', '20.63', '99.06', '11.446863', '3.498389', '22.989092', '17.4', '7.12959', '270.9', '46', '37', '69', '73', '2973', '1567.3', '46.97', '13.39', '61.92', '3.0', '8.0', '145.0', '278.4', '-51.0', '1063.5', '3047.8', '538.08', 'Eleve', '4', '4', '3.0']
['4', '42', '4.28', '9.13', '8.99', '0.45', '0.59', '0.08', '10.514019', '6.4622126', '0.8898776', '31.5', '0.098826', '2.2', '70', '32', '53', '68', '9', '22.4', '0.13', '0.06', '0.31', '0.1', '0.07', '0.92', '1.7', '-0.3', '42.5', '69.5', '2.73', 'Eleve', '4', '4', '3.0']
['5', '497', '92.2', '62.5', '40.3', '20.14', '6.91', '4.92', '21.843819', '11.056', '12.208437', '32.2', '1.169441', '5.1', '64', '32', '70', '68', '197', '195.0', '6.07', '1.83', '12.49', '5.9', '3.83', '16.41', '16.5', '-2.7', '1048.3', '618.8', '11.24', 'Moyen', '4', '4', '4.0']
['8', '122', '67.8', '24.5', '91.4', '12.67', '5.69', '8.43', '18.687315', '23.22449', '9.223195', '24.8', '0.287066', '19.5', '53', '35', '61', '65', '424', '183.7', '1.64', '1.92', '6.48', '4.9', '2.45', '23.6', '23.7', '-3.5', '204.2', '109.5', '5.33', 'Eleve', '4', '4', '3.0']
['11', '310', '77.5', '78.7', '24.9', '8.05', '21.76', '1.79', '10.387096', '27.649302', '7.188755', '29.0', '0.72943', '12.0', '47', '32', '65', '68', '38', '181.1', '6.55', '3.27', '8.16', '5.1', '2.08', '15.09', '36.3', '-7.0', '669.8', '705.3', '22.95', 'Eleve', '4', '4', '3.0']
['14', '283', '91.9', '52.9', '51.9', '10.48', '7.01', '12.57', '11.4037', '13.251418', '24.219654', '24.2', '0.665899', '2.3', '61', '29', '58', '71', '60', '196.7', '8.02', '2.93', '7.79', '7.0', '3.87', '25.1', '42.7', '-4.4', '434.0', '143.4', '17.18', 'Eleve', '4', '4', '3.0']
['16', '41', '5.54', '6.48', '5.5', '1.55', '1.51', '0.73', '27.97834', '23.30247', '13.272727', '15.9', '0.096473', '2.4', '71', '39', '56', '61', '29', '17.52', '0.41', '0.11', '0.62', '0.3', '0.17', '1.47', '2.4', '0.0', '36.7', '76.0', '4.2', 'Bas', '4', '4', '5.0']
I would like to create a bar chart from columns 0 and 34 of the csv file.
Here is the python script I am running:
# -*-coding:Latin-1 -*
#!/usr/bin/python
#!/usr/bin/env python
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
import csv
x = []
y = []
Bfr = csv.reader(open('/home/cloudera/PMGE/Bfr.csv'))
linesBfr = list(Bfr)
i=1
for l in linesBfr:
    x.append(l[i][0])
    y.append(int(l[i][34]))
plt.bar(x, y, color = 'g', width = 0.72, label = "Score")
plt.xlabel('Entreprise')
plt.ylabel('Scores')
plt.title('Scores des entreprises en BFR')
plt.legend()
plt.show()
But I'm getting the following error:
Traceback (most recent call last):
File "barplot.py", line 20, in <module>
y.append(int(l[i][34]))
IndexError: string index out of range
Can someone help me out?
Python lists are zero-indexed, so a 35-element row has valid indices 0 to 34. The error comes from l[i][34]: l is already one row, so l[i] is a single string field (for example '15'), and asking for character 34 of such a short string raises the IndexError.
Firstly, each row has 35 elements, indexed 0 to 34, so index 34 itself is valid. The problem is the extra index i: with i=1, l[i] selects the second field of the row, which is a string, and [34] then looks for the 35th character of that string, which does not exist, hence "index out of range". Secondly, this is not the standard way to use 2D lists in Python. I suggest a method more like this:
https://www.kite.com/python/answers/how-to-append-to-a-2d-list-in-python#:~:text=Append%20a%20list%20to%20a,list%20to%20the%202D%20list.
Hope this was helpful.
You probably meant to write this:
x = []
y = []
Bfr = csv.reader(open('/home/cloudera/PMGE/Bfr.csv'))
next(Bfr , None)  # skip the header
for l in Bfr:
    x.append(int(l[0]))
    y.append(float(l[34]))  # scores look like '5.0', which int() cannot parse
...
(See this question about skipping the header of a csv)
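For anyone who wants to test the reading part without the original file, here is a minimal sketch on an in-memory sample. The three-column sample and the ID_COL / SCORE_COL names are made up; for the file in the question the score column would be 34.

```python
import csv
import io

# Hypothetical three-column sample; the real file has 35 columns,
# with the company id in column 0 and the score in column 34.
sample = io.StringIO(
    "entreprise,nombreemp,score\n"
    "1,15,5.0\n"
    "3,3030,3.0\n"
    "4,42,3.0\n"
)

ID_COL = 0
SCORE_COL = 2  # would be 34 for the file in the question

reader = csv.reader(sample)
next(reader, None)  # skip the header row

x, y = [], []
for row in reader:
    if not row:  # ignore blank lines
        continue
    x.append(row[ID_COL])            # keep ids as labels for the bar chart
    y.append(float(row[SCORE_COL]))  # '5.0' -> 5.0

print(x)
print(y)
```

x and y can then be passed to plt.bar(x, y) as in the question. With the Agg backend selected there, plt.show() does not open a window; plt.savefig('scores.png') (file name made up) writes the chart to disk instead.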
I am trying to construct a dictionary called author_venues in which author names are the keys and values are the list of venues where they have published.
I was given two dictionaries:
A sample author_pubs dictionary where author name is the key and a list of publication ids is the value
defaultdict(list,
{'José A. Blakeley': ['2',
'25',
'2018',
'2185',
'94602',
'145114',
'182779',
'182780',
'299422',
'299426',
'299428',
'299558',
'302125',
'511816',
'521294',
'597967',
'598123',
'598125',
'598130',
'598132',
'598134',
'598136',
'598620',
'600180',
'600221',
'642049',
'643606',
'808458',
'832249',
'938531',
'939047',
'1064640',
'1064641',
'1065929',
'1118153',
'1269074',
'2984279',
'3154713',
'3169639',
'3286099',
'3494140'],
'Yuri Breitbart': ['3',
'4',
'76914',
'113875',
'140847',
'147900',
'147901',
'150951',
'176221',
'176896',
'182963',
'200336',
'262940',
'285098',
'285564',
'299526',
'301313',
'303418',
'304160',
'400040',
'400041',
'400174',
'400175',
'402178',
'482506',
'482785',
'544757',
'545233',
'545429',
'559737',
'559761',
'559765',
'559783',
'559785',
'597889',
'598201',
'598202',
'598203',
'599325',
'599899',
'620806',
'636455',
'641884',
'642157',
'654200',
'654201',
'740600',
'740602',
'833336',
'844280',
'856032',
'856222',
'888870',
'934979',
'938228',
'941484',
'945339',
'949548',
'971592',
'971593',
'972813',
'972958',
'1064100',
'1064690',
'1064691',
'1064693',
'1064694',
'1078369',
'1078370',
'1089675',
'1095084',
'1121956',
'1122006',
'1122610',
'1127610',
'1138059',
'1138061',
'1141938',
'1227365',
'1278703',
'1319498',
'2818906',
'2876867',
'2978458',
'3015058',
'3223418'],
A sample venue_pubs dictionary where venue name is the key and a list of publication ids is the value
defaultdict(list,
{'Modern Database Systems': ['2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'10',
'11',
'12',
'13',
'14',
'15',
'16',
'17',
'18',
'19',
'20',
'21',
'22',
'23',
'24',
'25',
'26',
'27',
'28',
'29',
'30',
'31',
'32',
'33',
'34',
'1203459',
'3000615',
'3000616',
'3000617',
'3000618',
'3000619',
'3000620',
'3000621',
'3000622',
'3000623',
'3000624',
'3000625',
'3000626'],
'Object-Oriented Concepts, Databases, and Applications': ['36',
'37',
'38',
'39',
'40',
'41',
'42',
'43',
'44',
'45',
'46',
'47',
'48',
'49',
'50',
'51',
'52',
'53',
'54',
'55',
'56',
'57',
'58',
'59'],
'The INGRES Papers': ['60',
'61',
'62',
'63',
'64',
'65',
'66',
'67',
'68',
'69'],
'Temporal Databases': ['168',
'169',
'170',
'171',
'172',
'173',
'174',
'175',
'176',
'177',
'178',
'179',
'180',
'181',
'182',
'183',
'184',
'185',
'186',
'187',
'188',
'189',
'190',
'627582',
'627584',
'627588',
'627589',
'627591',
'627592',
'627593',
'627594',
'627596',
'627600',
'627601',
'627602',
'627603',
'627604',
'627605',
'627608',
'627613',
'627615',
'627616',
'627617'],
The resulting dictionary should look like {'author':['venue1','venue2','venue3']}
author_venue = defaultdict(list)
This is code I wrote:
for k,v in author_pubs.items():
    for item in v:
        for x,y in venue_pubs.items():
            if item in y:
                venue = x
                author_venue[k].append(venue)
But this loop takes forever, since I have over 3 million records.
Please help!
You can "invert" the dictionary venue_pubs to speed up the search:
from collections import defaultdict
author_pubs = {
    "author1": [1, 2, 3],
    "author2": [3, 4, 5],
}
venue_pubs = {
    "xxx1": [1, 4, 20],
    "xxx2": [4, 30, 40],
}
# "invert" dictionary `venue_pubs`:
tmp = defaultdict(list)
for k, v in venue_pubs.items():
    for val in v:
        tmp[val].append(k)
author_venue = defaultdict(list)
for k, v in author_pubs.items():
    for item in v:
        venues = tmp.get(item)
        if venues is not None:
            author_venue[k].extend(venues)
print(author_venue)
Prints:
defaultdict(<class 'list'>, {'author1': ['xxx1'], 'author2': ['xxx1', 'xxx2']})
EDIT: To remove duplicates:
# ...
for k in author_venue:
    author_venue[k] = list(set(author_venue[k]))
print(author_venue)
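If the order of the venues matters, list(set(...)) will scramble it. A possible alternative, sketched here with made-up inputs (string ids as in the question, venue names invented), is to drop duplicates while building each list by using dict keys, which keep insertion order.

```python
from collections import defaultdict

# Tiny made-up inputs with string ids, as in the question
author_pubs = {
    "author1": ["1", "2", "3"],
    "author2": ["3", "4", "5", "20"],
}
venue_pubs = {
    "venueA": ["1", "4", "20"],
    "venueB": ["4", "30", "40"],
}

# Invert venue_pubs once: publication id -> venues
pub_venues = defaultdict(list)
for venue, pubs in venue_pubs.items():
    for pub in pubs:
        pub_venues[pub].append(venue)

author_venue = {}
for author, pubs in author_pubs.items():
    seen = {}  # dict keys keep insertion order and drop duplicates
    for pub in pubs:
        # .get() avoids creating empty entries for unknown ids
        for venue in pub_venues.get(pub, ()):
            seen[venue] = None
    author_venue[author] = list(seen)

print(author_venue)
```

Either way the speed-up comes from the inversion: each publication id is looked up in a dictionary instead of being searched for in every venue's list.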
I'm trying to scrape the following website:
http://mlb.mlb.com/stats/sortable_batter_vs_pitcher.jsp#season=2018&batting_team=119&batter=571771&pitching_team=133&pitcher=641941
(this is an example URL with a certain pitcher/batter matchup)
I'm able to enter the player codes and team codes easily with this function:
def matchupURL(season, batter, batterTeam, pitcher, pitcherTeam):
    return "http://mlb.mlb.com/stats/sortable_batter_vs_pitcher.jsp#season=" + str(season)+ "&batting_team="+str(teamNumDict[batterTeam])+"&batter="+str(batter)+"&pitching_team="+str(teamNumDict[pitcherTeam])+"&pitcher="+str(pitcher);
which works nicely, and the returned string works when pasted into my browser.
But when I make a request like this:
newURL = matchupURL(2018,i.id,x.home_team,j.id,x.away_team)
print(i+ " vs " + j)
newSes = requests.get(newURL);
html = BeautifulSoup(newSes.text, "lxml")
mydivs = html.findAll("td",{"class":"dg-ops"})
#do something with this div
I'm unable to find the div. In fact, the entire format of the returned HTML changes. Adding headers didn't help, nor did using urllib instead of requests.
This page is dynamic, i.e., the content is generated by JavaScript and rendered in the browser. That is why you can't find the tag in the HTML that requests returns.
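One way to see why the request misses the matchup: everything after the # in that URL is a fragment, and an HTTP client does not send the fragment to the server. Presumably the page's JavaScript reads it in the browser and then loads the data. The sketch below uses only the standard library to split the example URL from the question.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://mlb.mlb.com/stats/sortable_batter_vs_pitcher.jsp"
       "#season=2018&batting_team=119&batter=571771"
       "&pitching_team=133&pitcher=641941")

parts = urlsplit(url)
print(parts.path)      # what the server is asked for
print(parts.query)     # empty: no real query string is sent
print(parts.fragment)  # stays on the client side

# The fragment happens to use query-string syntax, so it can be parsed
settings = parse_qs(parts.fragment)
print(settings["batter"], settings["pitcher"])
```

So every matchup URL built this way requests the same bare page, whatever ids are put after the #.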
In this case, though, there is an easier way to scrape. With your browser's inspect tool you can see that the data comes from a GET request to a URL. For your example, you only have to provide the player ids:
import requests
url = 'http://lookup-service-prod.mlb.com/json/named.stats_batter_vs_pitcher_composed.bam'
params = {"sport_code":"'mlb'","game_type":"'R'","player_id":"571771","pitcher_id":"641941"}
resp = requests.get(url, params=params).json()
print(resp)
That prints:
{'stats_batter_vs_pitcher_composed': {'stats_batter_vs_pitcher_total': {'queryResults': {'created': '2018-04-12T22:21:47', 'totalSize': '1', 'row': {'hr': '1', 'gidp': '0', 'pitcher_first_last_html': 'Emilio Pagán', 'player': 'Hernandez, Enrique', 'np': '4', 'sac': '0', 'pitcher': 'Pagan, Emilio', 'rbi': '1', 'player_first_last_html': 'Enrique Hernández', 'tb': '4', 'bats': 'R', 'xbh': '1', 'bb': '0', 'slg': '4.000', 'avg': '1.000', 'pitcher_id': '641941', 'ops': '5.000', 'hbp': '0', 'pitcher_html': 'Pagán, Emilio', 'g': '', 'd': '0', 'so': '0', 'throws': 'R', 'sf': '0', 'tpa': '1', 'h': '1', 'cs': '0', 'obp': '1.000', 't': '0', 'ao': '0', 'r': '1', 'go_ao': '-.--', 'sb': '0', 'player_html': 'Hernández, Enrique', 'sbpct': '.---', 'player_id': '571771', 'ibb': '0', 'ab': '1', 'go': '0'}}}, 'copyRight': ' Copyright 2018 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt ', 'stats_batter_vs_pitcher': {'queryResults': {'created': '2018-04-12T22:21:47', 'totalSize': '1', 'row': {'hr': '1', 'gidp': '0', 'pitcher_first_last_html': 'Emilio Pagán', 'player': 'Hernandez, Enrique', 'np': '4', 'sac': '0', 'pitcher': 'Pagan, Emilio', 'rbi': '1', 'opponent': 'Oakland Athletics', 'player_first_last_html': 'Enrique Hernández', 'tb': '4', 'xbh': '1', 'bats': 'R', 'bb': '0', 'avg': '1.000', 'slg': '4.000', 'pitcher_id': '641941', 'ops': '5.000', 'hbp': '0', 'pitcher_html': 'Pagán, Emilio', 'g': '', 'd': '0', 'so': '0', 'throws': 'R', 'sport': 'MLB', 'sf': '0', 'team': 'Los Angeles Dodgers', 'tpa': '1', 'league': 'NL', 'h': '1', 'cs': '0', 'obp': '1.000', 't': '0', 'ao': '0', 'season': '2018', 'r': '1', 'go_ao': '-.--', 'sb': '0', 'opponent_league': 'AL', 'player_html': 'Hernández, Enrique', 'sbpct': '.---', 'player_id': '571771', 'ibb': '0', 'ab': '1', 'opponent_id': '133', 'team_id': '119', 'go': '0', 'opponent_sport': 'MLB'}}}}}
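To pull a single stat out of that nested response, something along these lines should work. The dictionary below is a trimmed copy of the output above so that the sketch runs offline, and the helper name total_rows is made up. It also assumes that 'row' could be a list when there is more than one result, which the single-result output above does not confirm.

```python
# Trimmed copy of the response shown above (most fields omitted)
resp = {
    'stats_batter_vs_pitcher_composed': {
        'stats_batter_vs_pitcher_total': {
            'queryResults': {
                'totalSize': '1',
                'row': {'player': 'Hernandez, Enrique',
                        'pitcher': 'Pagan, Emilio',
                        'ab': '1', 'h': '1', 'ops': '5.000'},
            }
        }
    }
}


def total_rows(resp):
    """Return the 'row' entries of the totals block as a list."""
    results = (resp['stats_batter_vs_pitcher_composed']
                   ['stats_batter_vs_pitcher_total']
                   ['queryResults'])
    row = results.get('row')
    if row is None:
        return []
    # Assumption: 'row' may be a single dict or a list of dicts
    return row if isinstance(row, list) else [row]


for row in total_rows(resp):
    print(row['player'], 'vs', row['pitcher'], 'OPS:', float(row['ops']))
```

All values arrive as strings, so numeric stats need float() or int() before any arithmetic.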
I am opening and reading one .csv file at a time from a folder and printing them out as follows:
ownerfiles = os.listdir(filepath)
for ownerfile in ownerfiles:
    if ownerfile.endswith(".csv"):
        eachfile = (filepath + ownerfile) #loops over each file in ownerfiles
        with open (eachfile, 'r', encoding="UTF-8") as input_file:
            next(input_file)
            print(eachfile)
            for idx, line in enumerate(input_file.readlines()) :
                line = line.strip().split(",")
                print(line)
However, when I do print(line) the files are printing as follows:
/Users/Sulz/Desktop/MSBA/Applied Data Analytics/Test_File/ownerfile_138.csv
['']
['2010-01-01 11:28:35', '16', '54', '59', '0000000040400', 'O.Coffee Hot Small', 'I', ' ', ' ', '14', '1', '0', '0.3241', '1.4900', '1.4900', '1.4900', '0.0000', '1', '0', '0', '0', '0.0000', '0.0000', '1', '44', '0', '0.00000000', '1', '0', '0', '0.0000', '0', '0', '', '0', '5', '0', '0', '0', '0', 'NULL', '0', 'NULL', '', '0', '20436', '1', '0', '0', '1']
How can I get rid of the [''] before the list with all the data?
EDIT:
I now tried reading it with the .csv module like this:
ownerfiles = os.listdir(filepath)
for ownerfile in ownerfiles:
    if ownerfile.endswith(".csv"):
        eachfile = (filepath + ownerfile) #loops over each file in ownerfiles
        with open (eachfile, 'r', encoding="UTF-8") as input_file:
            next(input_file)
            reader = csv.reader(input_file, delimiter=',', quotechar='|')
            for row in reader :
                print(row)
However, it still prints output like this:
[]
['2010-01-01 11:28:35', '16', '54', '59', '0000000040400', 'O.Coffee Hot Small', 'I', ' ', ' ', '14', '1', '0', '0.3241', '1.4900', '1.4900', '1.4900', '0.0000', '1', '0', '0', '0', '0.0000', '0.0000', '1', '44', '0', '0.00000000', '1', '0', '0', '0.0000', '0', '0', '', '0', '5', '0', '0', '0', '0', 'NULL', '0', 'NULL', '', '0', '20436', '1', '0', '0', '1']
That's just Python's list syntax being printed. You are splitting each line on a comma which is generating a list. If you print the line before the split you'll probably get what you're looking for:
line = line.strip()
print(line)
line = line.split(",")
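By the way, Python has a built-in csv module for reading and writing CSV files, in case you didn't know.
EDIT: Sorry, I misread your question. The [''] comes from a blank line in the file: stripping it gives an empty string, and splitting an empty string gives ['']. Add this to the start of your readlines loop so that blank lines are skipped:
line = line.strip()
if not line:
    continue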
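The same idea carries over to the csv module version from the question's edit, where a blank line shows up as an empty list. A minimal sketch on made-up in-memory data (the column names and the single data row are invented for illustration):

```python
import csv
import io

# Made-up file content: a header, a blank line, then one data row
sample = io.StringIO(
    "timestamp,store,item\n"
    "\n"
    "2010-01-01 11:28:35,16,O.Coffee Hot Small\n"
)

reader = csv.reader(sample)
next(reader, None)  # skip the header

rows = [row for row in reader if row]  # csv.reader yields [] for blank lines
for row in rows:
    print(row)
```

With a real file, the csv documentation recommends opening it with newline='' (for example open(eachfile, newline='', encoding="UTF-8")). A line that contains only spaces is not empty for csv.reader, so any(field.strip() for field in row) is a stricter check if that can happen.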