I'm a web-scraping beginner and am trying to scrape this webpage: https://profiles.doe.mass.edu/statereport/ap.aspx
I'd like to be able to put in some settings at the top (like District, 2020-2021, Computer Science A, Female) and then download the resulting data for those settings.
Here's the code I'm currently using:
import requests
from bs4 import BeautifulSoup

url = 'https://profiles.doe.mass.edu/statereport/ap.aspx'

with requests.Session() as s:
    s.headers['User-Agent'] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
    r = s.get('https://profiles.doe.mass.edu/statereport/ap.aspx')
    soup = BeautifulSoup(r.text, "lxml")
    data = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    data["ctl00$ContentPlaceHolder1$ddReportType"] = "DISTRICT",
    data["ctl00$ContentPlaceHolder1$ddYear"] = "2021",
    data["ctl00$ContentPlaceHolder1$ddSubject"] = "COMSCA",
    data["ctl00$ContentPlaceHolder1$ddStudentGroup"] = "F",
    p = s.post(url, data=data)
When I print out p.text, I get a page whose title is '\t404 - Page Not Found\r\n' and whose body contains this message:
<h2>We are unable to locate information at: <br /><br />http://profiles.doe.mass.edu:80/statereport/ap.aspxp?ASP.NET_SessionId=bxfgao54wru50zl5tkmfml00</h2>
Here's what data looks like before I modify it:
{'__EVENTVALIDATION': '/wEdAFXz4796FFICjJ1Xc5ZOd9SwSHUlrrW+2y3gXxnnQf/b23Vhtt4oQyaVxTPpLLu5SKjKYgCipfSrKpW6jkHllWSEpW6/zTHqyc3IGH3Y0p/oA6xdsl0Dt4O8D2I0RxEvXEWFWVOnvCipZArmSoAj/6Nog6zUh+Jhjqd1LNep6GtJczTu236xw2xaJFSzyG+xo1ygDunu7BCYVmh+LuKcW56TG5L0jGOqySgRaEMolHMgR0Wo68k/uWImXPWE+YrUtgDXkgqzsktuw0QHVZv7mSDJ31NaBb64Fs9ARJ5Argo+FxJW/LIaGGeAYoDphL88oao07IP77wrmH6t1R4d88C8ImDHG9DY3sCDemvzhV+wJcnU4a5qVvRziPyzqDWnj3tqRclGoSw0VvVK9w+C3/577Gx5gqF21UsZuYzfP4emcqvJ7ckTiBk7CpZkjUjM6Z9XchlxNjWi1LkzyZ8QMP0MaNCP4CVYJfndopwFzJC7kI3W106YIA/xglzXrSdmq6/MDUCczeqIsmRQGyTOkQFH724RllsbZyHoPHYvoSAJilrMQf6BUERVN4ojysx3fz5qZhZE7DWaJAC882mXz4mEtcevFrLwuVPD7iB2v2mlWoK0S5Chw4WavlmHC+9BRhT36jtBzSPRROlXuc6P9YehFJOmpQXqlVil7C9OylT4Kz5tYzrX9JVWEpeWULgo9Evm+ipJZOKY2YnC41xTK/MbZFxsIxqwHA3IuS10Q5laFojoB+e+FDCqazV9MvcHllsPv2TK3N1oNHA8ODKnEABoLdRgumrTLDF8Lh+k+Y4EROoHhBaO3aMppAI52v3ajRcCFET22jbEm/5+P2TG2dhPhYgtZ8M/e/AoXht29ixVQ1ReO/6bhLIM+i48RTmcl76n1mNjfimB8r3irXQGYIEqCkXlUHZ/SNlRYyx3obJ6E/eljlPveWNidFHOaj+FznOh264qDkMm7fF78WBO2v0x+or1WGijWDdQtRy9WRKXchYxUchmBlYm15YbBfMrIB7+77NJV+M6uIVVnCyiDRGj+oPXcTYxqSUCLrOMQyzYKJeu8/hWD0gOdKeoYUdUUJq4idIk+bLYy76sI/N2aK+aXZo/JPQ+23gTHzIlyi4Io7O6kXaULPs8rfo8hpkH1qXyKb/rP2VJBNWgyp8jOMx9px+m4/e2Iecd86E4eN4Rk6OIiwqGp+dMdgntXu5ruRHb1awPlVmDw92dL1P0b0XxJW7EGfMzyssMDhs1VT6K6iMUTHbuXkNGaEG1dP1h4ktnCwGqDLVutU6UuzT6i4nfqnvFjGK9+7Ze8qWIl8SYyhmvzmgpLjdMuF9CYMQ2Aa79HXLKFACsSSm0dyiU1/ZGyII2Fvga9o+nVV1jZam3LkcAPaXEKwEyJXfN/DA7P4nFAaQ+QP+2bSgrcw+/dw+86OhPyG88qyJwqZODEXE1WB5zSOUywGb1/Xed7wq9WoRs6v8rAK5c/2iH7YLiJ4mUVDo+7WCKrzO5+Hsyah3frMKbheY1acRmSVUzRgCnTx7jvcLGR9Jbt6TredqZaWZBrDFcntdg7EHd7imK5PqjUld3iCVjdyO+yLKUkMKiFD85G3vEferg/Q/TtfVBqeTU0ohP9d+CsKOmV/dxVYWEtBcfa9KiN6j4N8pP7+3iUOhajojZ8jV98kxT0zPZlzkpqI4SwR6Ys8d2RjIi5K+oQul4pL5u+zZvX0lsLP9Jl7FeVTfBvST67T6ohz8dl9gBfmmbwnT23SyuFSUGd6ZGaKE+9kKYmuImW7w3ePs7C70yDWHpIpxP/IJ4GHb36LWto2g3Ld3goCQ4fXPu7C4iTiN6b5WUSlJJsWGF4eQkJue8=',
'__VIEWSTATE': '/wEPDwUKLTM0NzY4OTQ4NmRkDwwPzTpuna+yxVhQxpRF4n2+zYKQtotwRPqzuCkRvyU=',
'__VIEWSTATEGENERATOR': '2B6F8D71',
'ctl00$ContentPlaceHolder1$btnViewReport': 'View Report',
'ctl00$ContentPlaceHolder1$hfExport': 'ViewReport',
'leftNavId': '11241',
'quickSearchValue': '',
'runQuickSearch': 'Y',
'searchType': 'QUICK',
'searchtext': ''}
Following suggestions from similar questions, I've tried playing around with the parameters, editing data in various ways (to emulate the POST request that I see in my browser when I navigate the site myself), and specifying an ASP.NET_SessionId, but to no avail.
How can I access the information from this website?
This should be what you are looking for. I used bs4 to parse the HTML and find the table, then grabbed the rows and, to make the data easier to work with, put them into a dictionary.
import requests
from bs4 import BeautifulSoup

url = 'https://profiles.doe.mass.edu/statereport/ap.aspx'

with requests.Session() as s:
    s.headers['User-Agent'] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
    r = s.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.find_all('table')
    rows = table[0].find_all('tr')

data = {}
for row in rows:
    if row.find_all('th'):
        # Header row: one dict key (column) per <th>.
        keys = row.find_all('th')
        for key in keys:
            data[key.text] = []
    else:
        # Data row: append each <td> to the column in the same position.
        # enumerate avoids values.index(value), which misfires on duplicate cells.
        for i, value in enumerate(row.find_all('td')):
            data[keys[i].text].append(value.text)

for key in data:
    print(key, data[key][:10])
    print('\n')
The output:
District Name ['Abington', 'Academy Of the Pacific Rim Charter Public (District)', 'Acton-Boxborough', 'Advanced Math and Science Academy Charter (District)', 'Agawam', 'Amesbury', 'Amherst-Pelham', 'Andover', 'Arlington', 'Ashburnham-Westminster']
District Code ['00010000', '04120000', '06000000', '04300000', '00050000', '00070000', '06050000', '00090000', '00100000', '06100000']
Tests Taken [' 100', ' 109', ' 1,070', ' 504', ' 209', ' 126', ' 178', ' 986', ' 893', ' 97']
Score=1 [' 16', ' 81', ' 12', ' 29', ' 27', ' 18', ' 5', ' 70', ' 72', ' 4']
Score=2 [' 31', ' 20', ' 55', ' 74', ' 65', ' 34', ' 22', ' 182', ' 149', ' 23']
Score=3 [' 37', ' 4', ' 158', ' 142', ' 55', ' 46', ' 37', ' 272', ' 242', ' 32']
Score=4 [' 15', ' 3', ' 344', ' 127', ' 39', ' 19', ' 65', ' 289', ' 270', ' 22']
Score=5 [' 1', ' 1', ' 501', ' 132', ' 23', ' 9', ' 49', ' 173', ' 160', ' 16']
% Score 1-2 [' 47.0', ' 92.7', ' 6.3', ' 20.4', ' 44.0', ' 41.3', ' 15.2', ' 25.6', ' 24.7', ' 27.8']
% Score 3-5 [' 53.0', ' 7.3', ' 93.7', ' 79.6', ' 56.0', ' 58.7', ' 84.8', ' 74.4', ' 75.3', ' 72.2']
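If you want the result as a table rather than a dict of lists, pandas will take that dict directly, since it maps column names to equal-length lists of values. A minimal follow-up sketch (assuming pandas is installed):

import pandas as pd

# The dict built above maps each column header to its list of values,
# which is exactly the orientation pd.DataFrame expects.
df = pd.DataFrame(data)
print(df.head())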
I was able to get this working by adapting the code from here. I'm not sure why editing the payload in this way made the difference, so I'd be grateful for any insights! (My guess is that the original payload carried along the quick-search fields such as runQuickSearch: 'Y', which may have routed the POST to the site's search handler rather than the report handler, but I haven't confirmed that.)
Here's my working code, using Pandas to parse out the tables:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://profiles.doe.mass.edu/statereport/ap.aspx'

with requests.Session() as s:
    s.headers['User-Agent'] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
    response = s.get(url)
    soup = BeautifulSoup(response.content, 'html5lib')

    # Only the ctl00 inputs that actually carry a value (buttons, hidden fields).
    data = {tag['name']: tag['value']
            for tag in soup.select('input[name^=ctl00]') if tag.get('value')}
    # The ASP.NET state fields (__VIEWSTATE, __EVENTVALIDATION, ...).
    state = {tag['name']: tag['value']
             for tag in soup.select('input[name^=__]')}

    payload = data.copy()
    payload.update(state)
    payload["ctl00$ContentPlaceHolder1$ddReportType"] = "DISTRICT"
    payload["ctl00$ContentPlaceHolder1$ddYear"] = "2021"
    payload["ctl00$ContentPlaceHolder1$ddSubject"] = "COMSCA"
    payload["ctl00$ContentPlaceHolder1$ddStudentGroup"] = "F"

    p = s.post(url, data=payload)

df = pd.read_html(p.text)[0]
df["District Code"] = df["District Code"].astype(str).str.zfill(8)
display(df)  # display() is a Jupyter helper; use print(df) in a plain script
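To actually download the data for the chosen settings (the original goal), write the parsed table to disk; repeating the POST with different dropdown values should give the other reports. A short sketch (the filename is arbitrary):

# Save the parsed report; the zero-padded District Code survives the round trip
# because to_csv writes it out as a string.
df.to_csv("ap_district_2021_comsca_female.csv", index=False)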
Related
I am trying to scrape PFF.com for football grades with Selenium, specifically one grade for every quarterback. The problem is that it doesn't seem to be capturing the text: .text returns nothing, but I'm not getting a NoSuchElementException either.
Here's my code:
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

service = Service(executable_path="C:\\chromedriver.exe")
op = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service, options=op)
driver.get("https://premium.pff.com/nfl/positions/2022/REG/passing?position=QB")
sleep(2)

sign_in = driver.find_element(By.XPATH, '/html/body/div/div/header/div[3]/button')
sign_in.click()
sleep(2)

email = driver.find_element(By.XPATH, '/html/body/div/div/div/div/div/div/form/div[1]/input')
email.send_keys(my_email)
password = driver.find_element(By.XPATH, '/html/body/div/div/div/div/div/div/form/div[2]/input')
password.send_keys(my_password)
sleep(2)

sign_in_2 = driver.find_element(By.XPATH, '/html/body/div/div/div/div/div/div/form/button')
sign_in_2.click()
sleep(2)

all_off_grades = driver.find_elements(By.CSS_SELECTOR, '.kyber-table .kyber-grade-badge__info-text div')
all_qb_names = driver.find_elements(By.CSS_SELECTOR, '.kyber-table .p-1 a')

qb_grades = []
qb_names = []
for grade in all_off_grades:
    qb_grades.append(grade.text)
for qb_name in all_qb_names:
    qb_names.append(qb_name.text)
print(qb_grades)
print(qb_names)
The lists keep showing up as empty.
Here are the elements I am trying to pull for one QB; I've already confirmed the other QBs use the same class names for their grades and names.
<div class="kyber-grade-badge__info-text">91.5</div>
I need to pull the 91.5.
<a class="p-1" href="/nfl/players/2022/REG/josh-allen/46601/passing">Josh Allen</a>
I need to pull Josh Allen.
@Jbuck3 I tried modifying the locators and it works for me. I've included the output I'm getting; let me know if that's what you were expecting.
all_off_grades = driver.find_elements(By.CSS_SELECTOR, '.kyber-table-body__scrolling-rows-container .kyber-grade-badge__info-text')
all_qb_names = driver.find_elements(By.CSS_SELECTOR, "a[data-gtm-id = 'player_name']")
And the output I got is:
['91.5', '90.3', '74.6', '-', '-', '60.0', '84.3', '78.3', '78.1', '-', '-', '60.0', '82.8', '83.4', '-', '-', '-', '60.0']
['Josh Allen ', 'Geno Smith ', 'Kirk Cousins ', 'Marcus Mariota ', 'Jameis Winston ', 'Trey Lance ', 'Derek Carr ', 'Justin Fields ', 'Trevor Lawrence ', 'Russell Wilson ', 'Ryan Tannehill ', 'Tom Brady ', 'Tua Tagovailoa ', 'Mac Jones ', 'Davis Mills ', 'Matthew Stafford ', 'Baker Mayfield ', 'Lamar Jackson ', 'Joe Flacco ', 'Matt Ryan ', 'Jalen Hurts ', 'Daniel Jones ', 'Kyler Murray ', 'Justin Herbert ', 'Joe Burrow ', 'Aaron Rodgers ', 'Patrick Mahomes ', 'Mitchell Trubisky ', 'Dak Prescott ', 'Jacoby Brissett ', 'Carson Wentz ', 'Jared Goff ']
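As a follow-up: the fixed sleep(2) calls can still race the page's rendering. An explicit wait retries until the elements actually exist; here is a sketch using the same locators as above (assuming the login steps have already run):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 15)
# Block until at least one grade badge has rendered inside the table.
wait.until(EC.presence_of_all_elements_located(
    (By.CSS_SELECTOR, '.kyber-table-body__scrolling-rows-container .kyber-grade-badge__info-text')))
qb_grades = [el.text for el in driver.find_elements(
    By.CSS_SELECTOR, '.kyber-table-body__scrolling-rows-container .kyber-grade-badge__info-text')]
qb_names = [el.text.strip() for el in driver.find_elements(
    By.CSS_SELECTOR, "a[data-gtm-id='player_name']")]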
I made a dictionary using the .groupdict() function; however, I'm having a problem eliminating certain output dictionaries.
For example, my code looks like this (tweet holds strings, each containing 5 fields separated by ||):
def somefunction(pattern, tweet):
    pattern = "^(?P<username>.*?)(?:\|{2}[^|]+){2}\|{2}(?P<botprob>.*?)(?:\|{2}|$)"
    for paper in tweet:
        for item in re.finditer(pattern, paper):
            item.groupdict()
This produces an output in the form:
{'username': 'yashrgupta ', 'botprob': ' 0.30794588629999997 '}
{'username': 'sterector ', 'botprob': ' 0.39391528649999996 '}
{'username': 'MalcolmXon ', 'botprob': ' 0.05630123819 '}
{'username': 'ryechuuuuu ', 'botprob': ' 0.08492567222000001 '}
{'username': 'dpsisi ', 'botprob': ' 0.8300337045 '}
But I would like it to only return dictionaries whose botprob is above 0.7. How do I do this?
Specifically, as @WiktorStribizew notes, just skip the iterations you don't want:
pattern = r"^(?P<username>.*?)(?:\|{2}[^|]+){2}\|{2}(?P<botprob>.*?)(?:\|{2}|$)"
for paper in tweet:
    for item in re.finditer(pattern, paper):
        item = item.groupdict()
        if float(item["botprob"]) < 0.7:  # the captured value is a string, so convert before comparing
            continue
        print(item)
This could be wrapped in a generator expression to save the explicit continue, but there's enough going on as it is without making it harder to read (in this case).
UPDATE since you are apparently in a function:
pattern = r"^(?P<username>.*?)(?:\|{2}[^|]+){2}\|{2}(?P<botprob>.*?)(?:\|{2}|$)"
items = []
for paper in tweet:
    for item in re.finditer(pattern, paper):
        item = item.groupdict()
        if float(item["botprob"]) > 0.7:
            items.append(item)
return items
Or using comprehensions:
groupdicts = (item.groupdict() for paper in tweet for item in re.finditer(pattern, paper))
return [item for item in groupdicts if float(item["botprob"]) > 0.7]
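For a self-contained check, here is the comprehension version run against two made-up ||-separated records (the usernames and probabilities are invented for the demo):

import re

pattern = r"^(?P<username>.*?)(?:\|{2}[^|]+){2}\|{2}(?P<botprob>.*?)(?:\|{2}|$)"
tweet = [
    "dpsisi ||a||b|| 0.8300337045 ||c",      # above 0.7, kept
    "yashrgupta ||a||b|| 0.3079458862 ||c",  # below 0.7, dropped
]
groupdicts = (item.groupdict() for paper in tweet for item in re.finditer(pattern, paper))
print([item for item in groupdicts if float(item["botprob"]) > 0.7])
# [{'username': 'dpsisi ', 'botprob': ' 0.8300337045 '}]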
I would like it to only return dictionaries whose botprob is above 0.7.
entries = [{'username': 'yashrgupta ', 'botprob': ' 0.30794588629999997 '},
{'username': 'sterector ', 'botprob': ' 0.39391528649999996 '},
{'username': 'MalcolmXon ', 'botprob': ' 0.05630123819 '},
{'username': 'ryechuuuuu ', 'botprob': ' 0.08492567222000001 '},
{'username': 'dpsisi ', 'botprob': ' 0.8300337045 '}]
filtered_entries = [e for e in entries if float(e['botprob'].strip()) > 0.7]
print(filtered_entries)
output
[{'username': 'dpsisi ', 'botprob': ' 0.8300337045 '}]
I'm new to web scraping and I've been trying to get the right XPath for this portion of the page
from this website:
HTML code
I've been using these Scrapy commands:
response.xpath('//*[@id="companycontent"]/div/div/div[2]/div/div[6]/div').getall()
This is the Output:
['<div class="address">\r\n <h4>Address <span>1</span></h4>\r\n <strong>Office : </strong>1715 , 1714<br>\r\n <strong>Floor : </strong>Floor 17<br>\r\n <strong>Building : </strong>Shatha Tower<br>\r\n Dubai Internet City<br><br>\r\n \t\t</div>']
response.xpath('//*[@id="companycontent"]/div/div/div[2]/div/div[6]/div').get()
'\r\n Address 1\r\n Office : 1715 , 1714\r\n Floor : Floor 17\r\n Building : Shatha Tower\r\n Dubai Internet City\r\n \t\t'
And this one:
response.xpath('//div[contains(@class, "address")]/text()').extract()
with the output:
['\r\n \r\n \r\n \t\t\t\t\t\t\t\t ', '\r\n ', '\r\n ', '1715 , 1714', '\r\n ', 'Floor 17', '\r\n ', 'Shatha Tower', '\r\n Dubai Internet City', '\r\n \t\t', ' \r\n \t\t\r\n\r\n\t\t\t\t\t\t \r\n ']
response.xpath('//div[contains(@class, "address")]/text()').getall()
['\r\n \r\n \r\n \t\t\t\t\t\t\t\t ', '\r\n ', '\r\n ', '1715 , 1714', '\r\n ', 'Floor 17', '\r\n ', 'Shatha Tower', '\r\n Dubai Internet City', '\r\n \t\t', ' \r\n \t\t\r\n\r\n\t\t\t\t\t\t \r\n ']
I'm sure the first command will do the job, but I was wondering if there's a shorter XPath expression that gets the same result.
Hope someone can help me.
Finding text by XPath follows the pattern //tag-name[@class="class-name"]; you can use this approach to find the data.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
path = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(service=Service(path))
driver.get("https://tecomgroup.ae/directory/company.php?company=0016F00001wcgFJQAY&csrt=2648526569298119449")
data = driver.find_element(By.XPATH, '//div[@class="address"]')
print(data.text.split("\n"))
Output:
['ADDRESS 1',
'Office : 1715 , 1714',
'Floor : Floor 17',
'Building : Shatha Tower',
'Dubai Internet City']
You could also use a CSS selector: response.css('div.address > div.address ::text')
print([x.strip() for x in response.css('div.address > div.address ::text').getall() if x.strip()])
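If you want to stay with XPath, selecting by class and filtering out the whitespace-only text nodes is much shorter than the positional chain; a sketch, assuming the address block is the only div with that class:

# Grab every text node under the address div, then drop pure-whitespace entries.
parts = [t.strip()
         for t in response.xpath('//div[@class="address"]//text()').getall()
         if t.strip()]
print(parts)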
I am using lxml's XPath in Python. I can extract text if I give the full path to an HTML tag, but I can't extract all the text from a tag and its child elements into a list. For example, given this HTML, I would like to get all the text under the "example" class:
<div class="example">
"Some text"
<div>
"Some text 2"
<p>"Some text 3"</p>
<p>"Some text 4"</p>
<span>"Some text 5"</span>
</div>
<p>"Some text 6"</p>
</div>
I would like to get:
["Some text", "Some text 2", "Some text 3", "Some text 4", "Some text 5", "Some text 6"]
mzjn's answer is correct. After some trial and error I've managed to get it working; this is what the end code looks like. You need to put //text() at the end of the XPath. It's unrefactored for the moment, so there will definitely be some mistakes and bad practices, but it works.
import urllib.request

import requests
from bs4 import BeautifulSoup
from lxml import html
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Session that retries flaky connections.
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

page = session.get("The url you are webscraping")
content = page.content

# This second fetch and the soup object aren't used by the xpath call below.
htmlsite = urllib.request.urlopen("The url you are webscraping")
soup = BeautifulSoup(htmlsite, 'lxml')
htmlsite.close()

tree = html.fromstring(content)
scraped = tree.xpath('//html[contains(@class, "no-js")]/body/div[contains(@class, "container")]/div[contains(@class, "content")]/div[contains(@class, "row")]/div[contains(@class, "col-md-6")]/div[contains(@class, "clearfix")]//text()')
I've tried it out on the team introduction page of keeleyteton.com. It returned the following list, which is correct (although it needs lots of amending!) because the texts sit in different tags, some of them children. Thank you for the help!
['\r\n ', '\r\n ', 'Nicholas F. Galluccio', '\r\n ', '\r\n ', 'Managing Director and Portfolio Manager', '\r\n ', 'Teton Small Cap Select Value', '\r\n ', 'Keeley Teton Small Mid Cap Value', '\r\n ', '\r\n ', '\r\n ', 'Scott R. Butler', '\r\n ', '\r\n ', 'Senior Vice President and Portfolio Manager ', '\r\n ', 'Teton Small Cap Select Value', '\r\n ', 'Keeley Teton Small Mid Cap Value', '\r\n ', '\r\n ', '\r\n ', 'Thomas E. Browne, Jr., CFA', '\r\n ', '\r\n ', 'Portfolio Manager', '\r\n ', 'Keeley Teton Small and Mid Cap Dividend Value', '\r\n ', 'Keeley Teton Small and Small Mid Cap Value', '\r\n ', '\r\n ', '\r\n ', 'Brian P. Leonard, CFA', '\r\n ', '\r\n ', 'Portfolio Manager', '\r\n ', 'Keeley Teton Small and Mid Cap Dividend Value', '\r\n ', 'Keeley Teton Small and Small Mid Cap Value', '\r\n ', '\r\n ', '\r\n ', 'Robert M. Goldsborough', '\r\n ', '\r\n ', 'Research Analyst', '\r\n ', 'Keeley Teton Small and Mid Cap Dividend Value', '\r\n ', '\r\n ', '\r\n ', 'Brian R. Keeley, CFA', '\r\n ', '\r\n ', 'Portfolio Manager', '\r\n ', 'Keeley Teton Small and Small Mid Cap Value', '\r\n ', '\r\n ', '\r\n ', 'Edward S. Borland', '\r\n ', '\r\n ', 'Research Analyst', '\r\n ', 'Keeley Teton Small and Small Mid Cap Value', '\r\n ', '\r\n ', '\r\n ', 'Kevin M. Keeley', '\r\n ', '\r\n ', 'President', '\r\n ', '\r\n ', '\r\n ', 'Deanna B. Marotz', '\r\n ', '\r\n ', 'Chief Compliance Officer', '\r\n ']
I'm trying to write a nested dictionary to a CSV file and running into issues; either the file doesn't write anything, or it errors out.
The dictionary looks something like this:
finalDict = {
    'How would you rate the quality of the product?': [
        {'10942625544': 'High quality'},
        {'10942625600': 'Neither high nor low quality'},
        {'10942625675': 'Neither high nor low quality'},
        {'10942625736': 'Very high quality'},
        {'10942625788': 'Neither high nor low quality'},
        {'10942625827': 'Neither high nor low quality'},
        {'10942625878': 'Neither high nor low quality'},
        {'10942625932': 'High quality'},
        {'10942625977': 'High quality'},
        {'10942626027': 'Neither high nor low quality'},
        {'10942626071': 'High quality'},
        {'10942626128': 'High quality'},
        {'10942626180': 'Very high quality'},
        {'10942626227': 'Very high quality'},
        {'10942626278': 'High quality'},
        {'10942626332': 'Low quality'},
        {'10942626375': 'Very high quality'},
        {'10942626430': 'Low quality'},
        {'10942626492': 'Low quality'}],
    'How would you rate the value for money of the product?': [
        {'10942625544': 'Above average'},
        {'10942625600': 'Below average'},
        {'10942625675': 'Average'},
        {'10942625736': 'Excellent'},
        {'10942625788': 'Above average'},
        {'10942625827': 'Below average'},
        {'10942625878': 'Average'},
        {'10942625932': 'Average'},
        {'10942625977': 'Above average'},
        {'10942626027': 'Above average'},
        {'10942626071': 'Above average'},
        {'10942626128': 'Average'},
        {'10942626180': 'Excellent'},
        {'10942626227': 'Average'},
        {'10942626278': 'Average'},
        {'10942626332': 'Below average'},
        {'10942626375': 'Excellent'},
        {'10942626430': 'Poor'},
        {'10942626492': 'Below average'}],
}
I've tried working off of Write Nested Dictionary to CSV but am struggling to adapt it to my specific case.
My code currently looks like:
def writeToCsv(finalDict):
    csv_columns = ['Question', 'UserID', 'Answer']
    filename = "output.csv"
    with open(filename, "w") as filename:
        w = csv.DictWriter(filename, fieldnames=csv_columns)
        w.writeheader()
        for data in finalDict:  # where I'm stuck
Any recommendations would be appreciated!
This is an option:

import csv

def writeToCsv(finalDict):
    csv_columns = ['Question', 'UserID', 'Answer']
    filename = "output.csv"
    with open(filename, "w") as fl:
        w = csv.DictWriter(fl, fieldnames=csv_columns, lineterminator='\n')
        w.writeheader()
        # One CSV row per (question, user, answer) triple.
        for question, data in finalDict.items():
            for item in data:
                for user, answer in item.items():
                    w.writerow(dict(zip(csv_columns, (question, user, answer))))

Or, spelling the row out more explicitly (each inner dict holds a single userID: answer pair):

        for question, data in finalDict.items():
            for resp in data:
                row = {'Question': question,
                       'UserID': list(resp.keys())[0],
                       'Answer': list(resp.values())[0]}
                w.writerow(row)
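Called on the dictionary from the question, a sketch of the expected result (the exact rows depend on the full dict):

writeToCsv(finalDict)
# output.csv then starts like:
# Question,UserID,Answer
# How would you rate the quality of the product?,10942625544,High quality
# How would you rate the quality of the product?,10942625600,Neither high nor low quality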