Related
I have a nested list that looks like this:
[['Student', 'Exam 1', 'Exam 2', 'Exam 3'],
['Thorny', '100', '90', '80'],
['Mac', '88', '99', '111'],
['Farva', '45', '56', '67'],
['Rabbit', '59', '61', '67'],
['Ursula', '73', '79', '83'],
['Foster', '89', '97', '101']]
I was able to pull out student names using the following:
`def get_students(grades):
student_names = []
for i in range(1,len(grades)):
student_names.append(grades[i][0])
return student_names `
The output was ['Thorny', 'Mac', 'Farva', 'Rabbit', 'Ursula', 'Foster']
Now I need to pull out the "Exam" headers. I used similar code but it is getting very different results.
`def get_assignments(grades):
exam_list = []
for i in range(6, len(grades)):
exam_list.append(grades[0][1 : ])
return exam_list`
This returns the output:
`[['Exam 1', 'Exam 2', 'Exam 3']]`
I have been looking here and at GeeksForGeeks to try and figure out what I am doing wrong. It seemed like it should be a simple tweaking of the first code to access a different part of the nested list but I can't seem to get it. Any tips would be appreciated.
EDIT: Apologies, my problem may not have been clear. The desired output is
['Exam 1', 'Exam 2', 'Exam 3']
Thank you!
this is a short solution with list Comprehension
lst=[['Student', 'Exam 1', 'Exam 2', 'Exam 3'], ['Thorny', '100', '90', '80'], ['Mac', '88', '99', '111'], ['Farva', '45', '56', '67'], ['Rabbit', '59', '61', '67'], ['Ursula', '73', '79', '83'], ['Foster', '89', '97', '101']]
lstStudents=[nlst[0] for nlst in lst]
lstExams=[nlst[1:] for nlst in lst[1:]]
print(lstStudents)
print(lstExams)
the out put will be:
['Student', 'Thorny', 'Mac', 'Farva', 'Rabbit', 'Ursula', 'Foster']
[['100', '90', '80'], ['88', '99', '111'], ['45', '56', '67'], ['59', '61', '67'], ['73', '79', '83'], ['89', '97', '101']]
Maybe consider turning your data in to a list of dictionaries, then we can make use of Python's list comprehensions to retrieve whatever you need.
grades = [['Student', 'Exam 1', 'Exam 2', 'Exam 3'], ['Thorny', '100', '90', '80'], ['Mac', '88', '99', '111'], ['Farva', '45', '56', '67'], ['Rabbit', '59', '61', '67'], ['Ursula', '73', '79', '83'], ['Foster', '89', '97', '101']]
headers = grades[0]
grades_list = [dict(zip(headers, student)) for student in grades[1:]]
print(*(grade for grade in grades_list if grade['Student']=='Rabbit'))
# Output {'Student': 'Rabbit', 'Exam 1': '59', 'Exam 2': '61', 'Exam 3': '67'}
I'm a newbie in this sector. Here is the website I need to crawling "http://py4e-data.dr-chuck.net/comments_1430669.html" and here is it source code "view-source:http://py4e-data.dr-chuck.net/comments_1430669.html"
It's a simple website for practice. The HTML code look something like:
<html>
<head>
<title>Welcome to the comments assignment from www.py4e.com</title>
</head>
<body>
<h1>This file contains the actual data for your assignment - good luck!</h1>
<table border="2">
<tr>
<td>Name</td><td>Comments</td>
</tr>
<tr><td>Melodie</td><td><span class="comments">100</span></td></tr>
<tr><td>Machaela</td><td><span class="comments">100</span></td></tr>
<tr><td>Rhoan</td><td><span class="comments">99</span></td></tr>
I need to get the number between comments and span (100,100,99)
Below is my code:
html=urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup=BeautifulSoup(html,'html.parser')
tag=soup.span
print(tag) #<span class="comments">100</span>
print(tag.string) #100
I got the number 100 but only the first one, now I want to get all of them by iterating through a list or sth like that. What is the method to do this with beautifulsoup?
import urllib.request
from bs4 import BeautifulSoup
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup = BeautifulSoup(html,'html.parser')
tags = soup.find_all("span")
for i in tags:
print(i.string)
You can use find_all() function and then iterate it to get the numbers.
If you want names also you can use python dictionary :
import urllib.request
from bs4 import BeautifulSoup
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup = BeautifulSoup(html,'html.parser')
tags = soup.find_all("span")
comments = {}
for index, tag in enumerate(tags):
commentorName = tag.find_previous('tr').text
commentorComments = tag.string
comments[commentorName] = commentorComments
print(comments)
This will give you the output as :
{'Melodie100': '100', 'Machaela100': '100', 'Rhoan99': '99', 'Murrough96': '96', 'Lilygrace93': '93', 'Ellenor93': '93', 'Verity89': '89', 'Karlie88': '88', 'Berlin85': '85', 'Skylar84': '84', 'Benny84': '84', 'Crispin81': '81', 'Asya79': '79', 'Kadi76': '76', 'Dua74': '74', 'Stephany73': '73', 'Eila71': '71', 'Jennah70': '70', 'Eduardo67': '67', 'Shannan61': '61', 'Chymari60': '60', 'Inez60': '60', 'Charlene59': '59', 'Rosalin54': '54', 'James53': '53', 'Rhy53': '53', 'Zein52': '52', 'Ayren50': '50', 'Marissa46': '46', 'Mcbride46': '46', 'Ruben45': '45', 'Mikee41': '41', 'Carmel38': '38', 'Idahosa37': '37', 'Brooklin37': '37', 'Betsy36': '36', 'Kayah34': '34', 'Szymon26': '26', 'Tea24': '24', 'Queenie24': '24', 'Nima23': '23', 'Eassan23': '23', 'Haleema21': '21', 'Rahma17': '17', 'Rob17': '17', 'Roma16': '16', 'Jeffrey14': '14', 'Yorgos12': '12', 'Denon11': '11', 'Jasmina7': '7'}
Try the following approach:
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup = BeautifulSoup(html, 'html.parser')
data = []
for tr in soup.find_all('tr'):
row = [td.text for td in tr.find_all('td')]
data.append(row[1]) # or data.append(row) for both
print(data)
Giving you data holding a list containing just the one column:
['Comments', '100', '100', '99', '96', '93', '93', '89', '88', '85', '84', '84', '81', '79', '76', '74', '73', '71', '70', '67', '61', '60', '60', '59', '54', '53', '53', '52', '50', '46', '46', '45', '41', '38', '37', '37', '36', '34', '26', '24', '24', '23', '23', '21', '17', '17', '16', '14', '12', '11', '7']
First locate all of the table <tr> rows. Then extract all of the <td> values for each row. As you only want the second one, append row[1] to a data list holding your values.
You can skip the first one if needed with data[1:].
This approach would let you also save the name at the same time by appending the whole of row. e.g. use data.append(row) instead...
You could then display the entries using:
for name, comment in data[1:]:
print(name, comment)
Giving output starting:
Melodie 100
Machaela 100
Rhoan 99
Murrough 96
Lilygrace 93
Ellenor 93
Verity 89
Karlie 88
Given the following csv file:
['offre_bfr.entreprise', 'offre_bfr.nombreemp', 'offre_bfr.ca2020', 'offre_bfr.ca2019', 'offre_bfr.ca2018', 'offre_bfr.benefice2020', 'offre_bfr.benefice2019', 'offre_bfr.benefice2018', 'offre_bfr.tauxrenta2020', 'offre_bfr.tauxrenta2019', 'offre_bfr.tauxrenta2018', 'offre_bfr.tauximposition', 'offre_bfr.chargesalariale', 'offre_bfr.chargesfixes', 'offre_bfr.agedirigeant', 'offre_bfr.partdirigeant', 'offre_bfr.agemoyact', 'offre_bfr.parttotaleact', 'offre_bfr.mtdmdcred', 'offre_bfr.creditusuel', 'offre_bfr.capipropres', 'offre_bfr.dettefin', 'offre_bfr.dettenonfin', 'offre_bfr.stock', 'offre_bfr.creances', 'offre_bfr.actifimmobilise', 'offre_bfr.passiftotal', 'offre_bfr.tresorerie', 'offre_bfr.capitalisation2020', 'offre_bfr.capitalisation2019', 'offre_bfr.capitalisation2018', 'offre_bfr.nivrisque', 'offre_bfr.indconfiance', 'offre_bfr.indperseverance', 'offre_bfr.score']
['1', '15', '1.84', '5.18', '7.96', '0.48', '1.19', '0.11', '26.086956', '22.972973', '1.3819095', '17.9', '0.035295', '1.2', '55', '33', '69', '67', '10', '14.98', '0.05', '0.04', '0.21', '0.1', '0.08', '0.41', '0.8', '0.0', '7.5', '52.8', '0.16', 'Bas', '4', '4', '5.0']
['3', '3030', '546.7', '589.7', '430.9', '62.58', '20.63', '99.06', '11.446863', '3.498389', '22.989092', '17.4', '7.12959', '270.9', '46', '37', '69', '73', '2973', '1567.3', '46.97', '13.39', '61.92', '3.0', '8.0', '145.0', '278.4', '-51.0', '1063.5', '3047.8', '538.08', 'Eleve', '4', '4', '3.0']
['4', '42', '4.28', '9.13', '8.99', '0.45', '0.59', '0.08', '10.514019', '6.4622126', '0.8898776', '31.5', '0.098826', '2.2', '70', '32', '53', '68', '9', '22.4', '0.13', '0.06', '0.31', '0.1', '0.07', '0.92', '1.7', '-0.3', '42.5', '69.5', '2.73', 'Eleve', '4', '4', '3.0']
['5', '497', '92.2', '62.5', '40.3', '20.14', '6.91', '4.92', '21.843819', '11.056', '12.208437', '32.2', '1.169441', '5.1', '64', '32', '70', '68', '197', '195.0', '6.07', '1.83', '12.49', '5.9', '3.83', '16.41', '16.5', '-2.7', '1048.3', '618.8', '11.24', 'Moyen', '4', '4', '4.0']
['8', '122', '67.8', '24.5', '91.4', '12.67', '5.69', '8.43', '18.687315', '23.22449', '9.223195', '24.8', '0.287066', '19.5', '53', '35', '61', '65', '424', '183.7', '1.64', '1.92', '6.48', '4.9', '2.45', '23.6', '23.7', '-3.5', '204.2', '109.5', '5.33', 'Eleve', '4', '4', '3.0']
['11', '310', '77.5', '78.7', '24.9', '8.05', '21.76', '1.79', '10.387096', '27.649302', '7.188755', '29.0', '0.72943', '12.0', '47', '32', '65', '68', '38', '181.1', '6.55', '3.27', '8.16', '5.1', '2.08', '15.09', '36.3', '-7.0', '669.8', '705.3', '22.95', 'Eleve', '4', '4', '3.0']
['14', '283', '91.9', '52.9', '51.9', '10.48', '7.01', '12.57', '11.4037', '13.251418', '24.219654', '24.2', '0.665899', '2.3', '61', '29', '58', '71', '60', '196.7', '8.02', '2.93', '7.79', '7.0', '3.87', '25.1', '42.7', '-4.4', '434.0', '143.4', '17.18', 'Eleve', '4', '4', '3.0']
['16', '41', '5.54', '6.48', '5.5', '1.55', '1.51', '0.73', '27.97834', '23.30247', '13.272727', '15.9', '0.096473', '2.4', '71', '39', '56', '61', '29', '17.52', '0.41', '0.11', '0.62', '0.3', '0.17', '1.47', '2.4', '0.0', '36.7', '76.0', '4.2', 'Bas', '4', '4', '5.0']
I would like to create a bar chart from columns 0 and 34 of the csv file.
Here is the python script I am running:
# -*-coding:Latin-1 -*
#!/usr/bin/python
#!/usr/bin/env python
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
import csv
x = []
y = []
Bfr = csv.reader(open('/home/cloudera/PMGE/Bfr.csv'))
linesBfr = list(Bfr)
i=1
for l in linesBfr:
x.append(l[i][0])
y.append(int(l[i][34]))
plt.bar(x, y, color = 'g', width = 0.72, label = "Score")
plt.xlabel('Entreprise')
plt.ylabel('Scores')
plt.title('Scores des entreprises en BFR')
plt.legend()
plt.show()
But i'm getting the following error:
Traceback (most recent call last):
File "barplot.py", line 20, in <module>
y.append(int(l[i][34]))
IndexError: string index out of range
Can someone help me out?
Python lists are zero-indexed. You are trying to iterate to the 35th element in a 34 element list.
Firstly, there are 35 elements from 0 to 34. This means that starting your indexing i at i=1 will look for an element at the 35th index, which does not exist, or be an "index out of range". To be more specific, your code is looking for a list that does not exist. Secondly, this is not the standard way to use 2d lists in python. I suggest using a method more as such:
https://www.kite.com/python/answers/how-to-append-to-a-2d-list-in-python#:~:text=Append%20a%20list%20to%20a,list%20to%20the%202D%20list.
Hope this was helpful.
You probably meant to write this:
x = []
y = []
Bfr = csv.reader(open('/home/cloudera/PMGE/Bfr.csv'))
next(Bfr , None) # skip the header
for l in Bfr:
x.append(int(l[0]))
y.append(int(l[34]))
...
(See this question about skipping the header of a csv)
I am trying to construct a dictionary called author_venues in which author names are the keys and values are the list of venues where they have published.
I was given two dictionaries:
A sample author_pubs dictionary where author name is the key and a list of publication ids is the value
defaultdict(list,
{'José A. Blakeley': ['2',
'25',
'2018',
'2185',
'94602',
'145114',
'182779',
'182780',
'299422',
'299426',
'299428',
'299558',
'302125',
'511816',
'521294',
'597967',
'598123',
'598125',
'598130',
'598132',
'598134',
'598136',
'598620',
'600180',
'600221',
'642049',
'643606',
'808458',
'832249',
'938531',
'939047',
'1064640',
'1064641',
'1065929',
'1118153',
'1269074',
'2984279',
'3154713',
'3169639',
'3286099',
'3494140'],
'Yuri Breitbart': ['3',
'4',
'76914',
'113875',
'140847',
'147900',
'147901',
'150951',
'176221',
'176896',
'182963',
'200336',
'262940',
'285098',
'285564',
'299526',
'301313',
'303418',
'304160',
'400040',
'400041',
'400174',
'400175',
'402178',
'482506',
'482785',
'544757',
'545233',
'545429',
'559737',
'559761',
'559765',
'559783',
'559785',
'597889',
'598201',
'598202',
'598203',
'599325',
'599899',
'620806',
'636455',
'641884',
'642157',
'654200',
'654201',
'740600',
'740602',
'833336',
'844280',
'856032',
'856222',
'888870',
'934979',
'938228',
'941484',
'945339',
'949548',
'971592',
'971593',
'972813',
'972958',
'1064100',
'1064690',
'1064691',
'1064693',
'1064694',
'1078369',
'1078370',
'1089675',
'1095084',
'1121956',
'1122006',
'1122610',
'1127610',
'1138059',
'1138061',
'1141938',
'1227365',
'1278703',
'1319498',
'2818906',
'2876867',
'2978458',
'3015058',
'3223418'],
A sample venue_pubs dictionary where venue name is the key and a list of publication ids is the value
defaultdict(list,
{'Modern Database Systems': ['2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'10',
'11',
'12',
'13',
'14',
'15',
'16',
'17',
'18',
'19',
'20',
'21',
'22',
'23',
'24',
'25',
'26',
'27',
'28',
'29',
'30',
'31',
'32',
'33',
'34',
'1203459',
'3000615',
'3000616',
'3000617',
'3000618',
'3000619',
'3000620',
'3000621',
'3000622',
'3000623',
'3000624',
'3000625',
'3000626'],
'Object-Oriented Concepts, Databases, and Applications': ['36',
'37',
'38',
'39',
'40',
'41',
'42',
'43',
'44',
'45',
'46',
'47',
'48',
'49',
'50',
'51',
'52',
'53',
'54',
'55',
'56',
'57',
'58',
'59'],
'The INGRES Papers': ['60',
'61',
'62',
'63',
'64',
'65',
'66',
'67',
'68',
'69'],
'Temporal Databases': ['168',
'169',
'170',
'171',
'172',
'173',
'174',
'175',
'176',
'177',
'178',
'179',
'180',
'181',
'182',
'183',
'184',
'185',
'186',
'187',
'188',
'189',
'190',
'627582',
'627584',
'627588',
'627589',
'627591',
'627592',
'627593',
'627594',
'627596',
'627600',
'627601',
'627602',
'627603',
'627604',
'627605',
'627608',
'627613',
'627615',
'627616',
'627617'],
The resulting dictionary should look like {'author':['venue1','venue2','venue3']}
author_venue = defaultdict(list)
This is code I wrote:
for k,v in author_pubs.items():
for item in v:
for x,y in venue_pubs.items():
if item in y:
venue = x
author_venue[k].append(venue)
But this loop takes forever since I have over 3million records
please help!
You can "invert" the dictionary venue_pubs to speed up the search:
from collections import defaultdict
author_pubs = {
"author1": [1, 2, 3],
"author2": [3, 4, 5],
}
venue_pubs = {
"xxx1": [1, 4, 20],
"xxx2": [4, 30, 40],
}
# "invert" dictionary `venue_pubs`:
tmp = defaultdict(list)
for k, v in venue_pubs.items():
for val in v:
tmp[val].append(k)
author_venue = defaultdict(list)
for k, v in author_pubs.items():
for item in v:
venues = tmp.get(item)
if not venues is None:
author_venue[k].extend(venues)
print(author_venue)
Prints:
defaultdict(<class 'list'>, {'author1': ['xxx1'], 'author2': ['xxx1', 'xxx2']})
EDIT: To remove duplicates:
# ...
for k in author_venue:
author_venue[k] = list(set(author_venue[k]))
print(author_venue)
I have a csv files with player attributes:
['Peter Regin', '2', 'DAN', 'N', '1987', '6', '6', '199', '74', '2', '608000', '', '77', '52', '74', '72', '58', '72', '71', '72', '70', '72', '74', '68', '74', '41', '40', '51']
['Andrej Sekera', '8', 'SVK', 'N', '1987', '6', '6', '198', '72', '3', '1323000', '', '65', '39', '89', '78', '75', '70', '72', '56', '53', '56', '57', '72', '57', '59', '70', '51']
For example, I want to check if a player is a CENTER ('2' in position 1 in my list) and after I want to modify the 12 element (which is '77' for Peter Regin)
How can I do that using the CSV module ?
import csv
class ManipulationFichier:
def __init__(self, fichier):
self.fichier = fichier
def read(self):
with open(self.fichier) as f:
reader = csv.reader(f)
for row in reader:
print(row)
def write(self):
with open(self.fichier) as f:
writer = csv.writer(f)
for row in f:
if row[1] == 2:
writer.writerows(row[1] for row in f)
Which do nothing important..
Thanks,
In general, CSV files cannot be reliably modified in-place.
Read the entire file into memory (usually a list of lists, as in your example), modify the data, then write the entire file back.
Unless your file is really huge, and you do this really often, the performance hit will be negligible.