I'm using Selenium to scrape table data from a website. I found that I can easily iterate through the rows to get the information I need using XPath. Does Selenium hit the website every time I search for an element's text by XPath, or does it download the page first and then search through the elements offline?
If the former is true, is there a way to download the HTML and iterate offline using Selenium?
Selenium uses a web driver to control a real web browser. Selenium downloads the web page once when you navigate to it; after that, element lookups query the browser's in-memory DOM rather than re-fetching the page, unless you've written code to reload it.
You can also download the web page and access it locally in Selenium. For example, you could point Selenium at the saved file "C:\users\public\Desktop\index.html".
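A minimal sketch of that offline approach, assuming Chrome and a page already saved to disk (the file path matches the example above, and the XPath is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Load the saved copy from disk via a file:// URL; no network request is made.
driver.get("file:///C:/users/public/Desktop/index.html")

# Every lookup below runs against the browser's in-memory DOM, not the website.
rows = driver.find_elements(By.XPATH, "//table//tr")
for row in rows:
    print(row.text)

driver.quit()
```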
I need to get the table from this website on a live basis, and I'm unable to download the CSV because the link is hidden behind JavaScript. Selenium is also not able to access this website: https://www.nseindia.com/option-chain.
You can use BeautifulSoup for scraping and get the table by its id; here is the documentation.
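A minimal sketch of that approach; the table id below is hypothetical (inspect the page for the real one), and NSE tends to reject requests that don't look like they come from a browser:

```python
import requests
from bs4 import BeautifulSoup

# NSE tends to reject requests without a browser-like User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
resp = requests.get("https://www.nseindia.com/option-chain", headers=headers)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
# "optionChainTable" is a hypothetical id; inspect the page for the real one.
table = soup.find("table", id="optionChainTable")
if table is not None:
    for row in table.find_all("tr"):
        cells = [c.get_text(strip=True) for c in row.find_all(["td", "th"])]
        print(cells)
```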
I'm trying to scrape data within an iframe.
I have tried the webdriver in Chrome as well as PhantomJS with no success. There are source links contained within the iframe that I assume its data is being pulled from; however, when I use these links, an error is generated saying "You can't render widget content without a correct InstanceId parameter."
Is it possible to access this data using Python (PhantomJS)?
Go to the network tools in your browser, investigate what data goes to the server, and just scrape it via simple requests.
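A minimal sketch of replaying such a request with Requests; the endpoint, parameters, and headers here are hypothetical stand-ins for whatever the Network panel actually records:

```python
import requests

# Hypothetical endpoint copied from the browser's Network panel.
url = "https://example.com/widget/data"
# The error message suggests the widget wants an InstanceId; copy the real
# value from the recorded request (this one is made up).
params = {"InstanceId": "12345"}
headers = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/",
}

resp = requests.get(url, params=params, headers=headers)
resp.raise_for_status()
print(resp.json())  # many such widget endpoints return JSON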
I want to crawl some data from the Chrome Web Store website, but I run into a problem whenever I try to use the Selenium webdriver. When I use the following code, I get an error stating that the element doesn't exist on the site.
button = driver.find_element_by_class_name("a-d-l-L")
[Snapshot of the website]
Also, how do I get data from a pop-up window that comes up when I press a button? I want to store the data shown in the pop-up message.
Google has special permissions on the Chrome Web Store; as of now, you can't use Selenium to automate any page on the Chrome Web Store website.
I use the Python Requests library (http://docs.python-requests.org/en/latest/) to download and parse particular web pages. This works fine as long as the page is not dynamic; things look different if the page under consideration uses JavaScript.
In particular, I am talking about a web page that automatically loads more content once you have scrolled to the bottom, so that you can continue scrolling. This new content is not included in the page's source text; thus, I can't download it.
I thought about simulating a browser in Python (Selenium) - is this the right way to go?
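If Selenium is the route taken, a minimal sketch of the usual scroll-until-exhausted pattern might look like this; the URL is a placeholder, and the fixed sleep is a crude stand-in for a proper wait:

```python
import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/infinite-scroll-page")  # placeholder URL

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom, which triggers the page to fetch more content.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # crude wait for the new content to arrive
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # no new content was loaded; we've hit the real bottom
    last_height = new_height

html = driver.page_source  # now includes the dynamically loaded content
driver.quit()
```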
I am trying to scrape a website where the targeted items are populated using the document.write method. How can I get the full, browser-rendered HTML version of the website in Scrapy?
You can't do this directly, as Scrapy will not execute the JavaScript code.
What you can do:
Rely on a headless browser like Selenium, which will execute the JavaScript. Afterwards, use XPath (or simple DOM access) as before to query the rendered page (see the sketch after this list).
Understand where the contents come from, and load and parse that source directly instead. Chrome Dev Tools / Firebug might help you with that; have a look at the "Network" panel, which shows the fetched data.
Look especially for JSON, and sometimes also XML.
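A minimal sketch of the first option, assuming a headless Chrome; the URL and class name are placeholders, and the rendered HTML is then handed to Scrapy's Selector for the usual XPath queries:

```python
from scrapy.selector import Selector
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

driver.get("https://example.com/page-using-document-write")  # placeholder URL
# The browser has executed the JavaScript, so document.write output is in the DOM.
rendered_html = driver.page_source
driver.quit()

# Query the rendered page with Scrapy's own selectors, as you would a response.
sel = Selector(text=rendered_html)
for text in sel.xpath("//div[@class='target-item']//text()").getall():  # hypothetical class
    print(text)
```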