How can I convert a CSV file to HTML and open it in a web browser via Python, using pandas?
Below is my program, but the table is not displayed in the web page:
import pandas
import webbrowser
data = pandas.read_csv(r'C:\Users\issao\Downloads\data.csv')
data = data.to_html()
webbrowser.open('data.html')
You need to pass a URL to webbrowser.
Save the HTML content into a local file and pass its path to webbrowser:
import os
import webbrowser
import pandas
data = pandas.read_csv(r'C:\Users\issao\Downloads\data.csv')
html = data.to_html()
path = os.path.abspath('data.html')
url = 'file://' + path
with open(path, 'w') as f:
    f.write(html)
webbrowser.open(url)
You're missing a few steps:
Pandas does not build a full HTML page, only a <table> element.
pd.DataFrame({'a': [1,2,3]}).to_html()
Returns: <table border="1" class="dataframe">...</table>
You need to host the HTML somewhere and open a web browser. You can use a local file and run the browser from Python (os.system('firefox page.html')). But I doubt that is what you are looking for.
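For instance, a minimal sketch of wrapping the fragment in a complete page (file name and page title are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
# to_html() yields only a <table> fragment; wrap it in a full HTML page
table = df.to_html()
page = "<html><head><title>Data</title></head><body>{}</body></html>".format(table)
with open('page.html', 'w') as f:
    f.write(page)
```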
This doesn't answer the OP's question directly, but anyone looking for an alternative to pandas can check out csvtotable (https://github.com/vividvilla/csvtotable), especially its "--serve" option. Sample usage would be something like this: csvtotable data.csv --serve. This "serves" the CSV file to the browser.
Here is my code:
import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
url = "https://mathsmadeeasy.co.uk/gcse-maths-revision/"
#If there is no such folder, the script will create one automatically
folder_location = r'E:\webscraping'
if not os.path.exists(folder_location):
    os.mkdir(folder_location)
response = requests.get(url)
soup= BeautifulSoup(response.text, "html.parser")
for link in soup.select("a[href$='.pdf']"):
    # Name the pdf files using the last portion of each link, which is unique in this case
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)
Any help as to why the code does not download any of the files from the maths revision site?
Thanks.
Looking at the page itself, while it may look static, it isn't. The content you are trying to access is gated behind some fancy JavaScript loading. What I've done to assess that is simply log the page that BS4 actually got and open it in a text editor:
with open(os.path.join(folder_location, "page.html"), 'wb') as f:
    f.write(response.content)
By the look of it, the page is replacing placeholders with JS, as hinted by the comment on line 70 of the HTML file: // interpolate json by replacing placeholders with variables
As for solutions to your problem: BS4 is not able to run JavaScript. I suggest looking at this answer from someone who had a similar problem. I also suggest looking into Scrapy if you intend to do some more complex web scraping.
I am trying to download a CSV file from morningstar using python.
here is the link:
http://financials.morningstar.com/income-statement/is.html?t=NAB&region=aus
There is a button to "export CSV" but I can't access the link.
There is this javascript:exportKeyStat2CSV(); but I am not sure how to find the URL of the CSV file?
I tried downloading the file manually to get its URL so I could use requests/pandas to download it, but requests/pandas can't get anything.
import pandas as pd
URL = "http://financials.morningstar.com/finan/ajax/exportKR2CSV.html?&callback=?&t=XASX:NAB&region=aus&culture=en-US&cur=&order=asc"
df = pd.read_csv(URL)  # didn't work with pandas
import requests
print ('Starting to Download!')
r = requests.get(URL)
filename = URL.split('/')[-1]
with open(filename, 'wb') as out_file:
    out_file.write(r.content)
print("Download complete!")
I get an HTTP 204 (No Content) response code.
How do I solve the problem?
Have you tried accessing the resource via Selenium with Python?
See this example:
https://stackoverflow.com/a/51725319/5198805
I have been working on a CSV-based image scraper using BeautifulSoup. This is because the links have to be modified before downloading.
This is the basis of the code :
import requests
import csv
from bs4 import BeautifulSoup
from urllib import urlretrieve
import os
import sys
url = '..............'
r = requests.get(url)
soup = BeautifulSoup(r.content,'lxml')
with open('output.csv', 'wb') as f:
    bsoup_writer = csv.writer(f)
    for link in soup.find_all('a', {'class': '........'}):
        bsoup_writer.writerow([link.get('href')])
This is just part of the main code, and it works very well on the page/link you're at. With that said, I would like to use another CSV file (this would be the crawling file) with a list of links to feed to this code/py program, so it could download from each link in that CSV file. Hence, is it possible to modify the url variable to read the CSV file and iterate over the links in it?
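A minimal sketch of that loop, assuming a hypothetical crawl file links.csv with one URL per line (written here only for illustration); each URL would then be fetched and souped exactly as in the code above:

```python
import csv

# Hypothetical crawl file, one URL per line (created here only for illustration)
with open('links.csv', 'w', newline='') as f:
    f.write('http://example.com/page1\nhttp://example.com/page2\n')

# Read the crawl file back into a list of URLs
with open('links.csv', newline='') as f:
    urls = [row[0] for row in csv.reader(f) if row]

for url in urls:
    # here you would do: r = requests.get(url); soup = BeautifulSoup(r.content, 'lxml')
    print(url)
```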
I have been scratching my head on how to tackle this dilemma of mine for a while now. I have an Address column in my CSV file, which contains a list of addresses. I want to be able to direct Python to search the website designated below with the individual address values in the CSV file and save the results into a new CSV file.
import csv
import requests
with open('C:/Users/thefirstcolumn.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Address'])

website = requests.get('https://etrakit.friscotexas.gov/Search/permit.aspx')
writer = csv.writer(open('thematchingresults.csv', 'w'))
print(website.content)
For example:
One of the address values I have in the CSV file:
6525 Mountain Sky Rd
returns three rows of data when I manually paste the address into the search box. How can I tell Python to search the website for each of the addresses in the CSV file and save the results for each address in a new CSV file? How can I accomplish this mountainous task?
The requests module downloads static HTML pages from the website; it cannot interact with JavaScript.
You need to use Selenium to interact with the website.
For example
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

driver = webdriver.Firefox()
driver.get('https://etrakit.friscotexas.gov/Search/permit.aspx')

# read in addresses
with open('file.csv', 'r') as f:
    addresses = f.readlines()

# use CSS selectors to locate the search field
for address in addresses:
    driver.find_element(By.CSS_SELECTOR, '#cplMain_txtSearchString').clear()
    driver.find_element(By.CSS_SELECTOR, '#cplMain_txtSearchString').send_keys(address)
    driver.find_element(By.CSS_SELECTOR, '#cplMain_btnSearch').click()
    time.sleep(5)
    # JS-injected HTML
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # extract relevant info from the soup
    # and save to your new csv here
You would need to do a POST request for each value you have in the CSV file. For example, to search for "6525 Mountain Sky Rd" at https://etrakit.friscotexas.gov/Search/permit.aspx, you can look at the developer console to see what POST parameters it is sending.
You can use something like requests and pass the header values and form data, or you could use something like CasperJS or Selenium to emulate the browser.
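A sketch of the requests route; the form field names below are hypothetical placeholders, so copy the real ones from the recorded POST in the developer console's network tab:

```python
import requests

# Field names are hypothetical; take the real ones from the recorded POST
data = {
    'ctl00$cplMain$txtSearchString': '6525 Mountain Sky Rd',
    'ctl00$cplMain$btnSearch': 'Search',
}
req = requests.Request(
    'POST', 'https://etrakit.friscotexas.gov/Search/permit.aspx', data=data)
prepared = req.prepare()
# prepared.body holds the url-encoded form; send it with
# requests.Session().send(prepared) when ready
print(prepared.body)
```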
I have a URL, for example
url = "www.example.com/file/processing/path/excefile.xls"
This URL downloads an excel file directly when I paste it in a browser.
How can I use Python to download this file? That is, when I run the Python code, the above URL should open in a browser and download the Excel file.
If you don't necessarily need to go through the browser, you can use the urllib module to save a file to a specified location.
import urllib
url = 'http://www.example.com/file/processing/path/excelfile.xls'
local_fname = '/home/John/excelfile.xls'
filename, headers = urllib.urlretrieve(url, local_fname)
http://docs.python.org/library/urllib.html#urllib.urlretrieve
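Note that this is the Python 2 API; in Python 3 the same function lives in urllib.request, so a sketch of the equivalent would be:

```python
from urllib.request import urlretrieve

# Python 3 equivalent of Python 2's urllib.urlretrieve
url = 'http://www.example.com/file/processing/path/excelfile.xls'
local_fname = 'excelfile.xls'
# filename, headers = urlretrieve(url, local_fname)  # uncomment to actually download
```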
Use the webbrowser module:
import webbrowser
webbrowser.open(url)
You should definitely look into the awesome requests lib.
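For example, a small sketch of a requests-based download (wrapped in a function so nothing is fetched until you call it):

```python
import requests

def download(url, dest):
    """Fetch url and write the raw response bytes to dest."""
    r = requests.get(url)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    with open(dest, 'wb') as f:
        f.write(r.content)

# download('http://www.example.com/file/processing/path/excelfile.xls', 'excelfile.xls')
```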