I am attempting to automate a button push that triggers JavaScript on my own website (GoDaddy server). I am running a .py file from my macOS Terminal.
I have found Beautiful Soup and MechanicalSoup, but have only found documentation for parsing text or pre-filling forms.
I have attempted MechanicalSoup code without any mention of a 'form', merely trying to click a button based on a CSS selector.
I have played around with this code for a few hours and am not convinced that what I want to accomplish is possible. Can anyone confirm whether this is possible with either of these modules? If not, what is a better tool?
I have just been using the example provided here: https://mechanicalsoup.readthedocs.io/en/stable/tutorial.html#first-contact-step-by-step
My code, in a function:

import mechanicalsoup

def updatePrices():
    br = mechanicalsoup.StatefulBrowser()
    br.open("http://example.com")
    # locate the button by its id
    br.get_current_page().find('button', id='exporter_decisionStream')
    br.submit_selected()

updatePrices()
Any guidance is appreciated.
Thanks.
I am looking for a way to scrape a web page after typing into its search box. Let me explain with an example: I am looking for an R function that types the word "notebook" into the Amazon home page's search box so that I can then scrape the resulting page.
Any help?
Any suggestions?
Maybe I could do it in Python?
Thanks everyone for the help.
In Python you have several modules designed for web scraping; here is a list of the most common ones:
Requests
Beautiful Soup 4
lxml
Selenium
Scrapy
Just scrape the webpage at
https://www.amazon.com/s?k=whatever+you+want+to+search
Almost any website will give you a URL with a query string when you search; just scrape from that URL.
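As a sketch of that idea: the search term just becomes the k parameter of the query string, which can be built safely with urlencode. (Actually fetching the result may still require browser-like headers, and Amazon actively discourages scraping, so treat this as illustrative.)

```python
# Sketch: building the URL Amazon's search box would produce.
# The "k" parameter comes from the URL pattern above; urlencode
# handles spaces and special characters in the search term.
from urllib.parse import urlencode

def search_url(term):
    return "https://www.amazon.com/s?" + urlencode({"k": term})

print(search_url("notebook"))      # https://www.amazon.com/s?k=notebook
print(search_url("data science"))  # https://www.amazon.com/s?k=data+science
```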
I am trying to extract information from an exchange website (chiliz.net) using Python (requests module) and the following code:
time.sleep(15)  # wait first; passing time.sleep(15) into get() only passes None as its params argument
data = requests.get(url).text
I used time.sleep since the website is not directly connecting to the exchange main page, but I am not sure it is necessary.
The thing is, I cannot find anything written under <body style> in the HTML text (which is the data variable in this case). How can I reach the full HTML code and then start to extract the price information from this website?
I know Python, but I am not that familiar with websites/HTML. So I would appreciate it if you could explain the website-related details as if you were talking to a beginner. Thanks!
There could be a few reasons for this.
The website runs behind a proxy server from what I can tell, so this does interfere with your request loading time. This is why it's not directly connecting to the main page.
It might also be the case that the elements are rendered using javascript AFTER the page has loaded. So, you only get the page and not the javascript rendered parts. You can try to increase your sleep() time but I don't think that will help.
You can also use a library called Selenium. It simply automates browsers and you can use the page_source property to obtain the HTML source code.
Code (taken from here)
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://example.com")
html_source = browser.page_source
With Selenium you can also use an XPath to locate the data for 'extract the price information from this website'; you can see a tutorial on that here. Alternatively,
once you have extracted the HTML code, you can use a parser such as bs4 to pull out the required data.
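A minimal sketch of that bs4 step, run against an inline snippet rather than the live site (the tag and class names are invented for the demo; the real ones would have to be read from chiliz.net's rendered HTML):

```python
# Sketch: extracting a price from HTML with Beautiful Soup 4.
# The markup below is a stand-in; inspect the real page (after
# JavaScript has rendered it) to find the actual tag and class.
from bs4 import BeautifulSoup

html_source = '<body><span class="last-price">0.0123</span></body>'
soup = BeautifulSoup(html_source, "html.parser")
price = soup.find("span", class_="last-price").text
print(price)  # 0.0123
```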
I'm using python + selenium to try and automate a few things. I've had success using it before, but I am stuck on this issue. Every time I try and click on this link I get a "NoSuchElementException". Here is the html from the portion of the webpage:
Knowledge Base.html
Some things I have tried:
current_file = driver.find_element_by_xpath("//a[contains(#href, 'Knowledge Base.html')]")
current_file = driver.find_element_by_link_text("Knowledge Base.html")
I have also tried inspecting and copying the XPath of the link. On the webpage, clicking "Knowledge Base.html" brings us to a place where we can upload an HTML file. I am fairly new to Python and HTML as well, so I'm wondering if Selenium can't find the link because of the onclick JavaScript function. Anyway, any advice is appreciated, and let me know if I need to post a bigger chunk of the HTML.
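One concrete thing to check in the first attempt: contains(#href, ...) is not valid XPath; attributes are addressed with @, so it should read contains(@href, ...). A quick way to sanity-check an expression outside the browser is lxml, here against a made-up snippet shaped like the link described (if the element only appears after JavaScript runs, an explicit wait in Selenium would also be needed):

```python
# Sketch: validating the corrected XPath (@href, not #href) with lxml.
# The surrounding markup is invented; only the link text/href value
# is taken from the question.
from lxml import etree

snippet = '<div><a href="Knowledge Base.html" onclick="openUpload()">Knowledge Base.html</a></div>'
tree = etree.HTML(snippet)
matches = tree.xpath("//a[contains(@href, 'Knowledge Base.html')]")
print(matches[0].text)  # Knowledge Base.html
```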
(I am a newbie in programming)
I am trying to write some Python code to log in to a forum; this is the webpage: https://www.artofproblemsolving.com/Forum/ucp.php?mode=login&redirect=/index.php.
Unfortunately, I don't have much knowledge of web page source code. My main question is: what does the form name look like in the webpage's source?
For the code below (line 4), I need the form name of the webpage. I tried the following, but it is not working.
import mechanize
b = mechanize.Browser()
r = b.open("https://www.artofproblemsolving.com/Forum/ucp.php?mode=login&redirect=/index.php")
b.select_form(name="login")
b.form["login"] = "MYNAME"
b.form["password"] = "MYPASSWORD"
b.submit()
Could you please help me? Many thanks.
Check how the HTML page is structured; focus on the form tag.
Install FireBug on the Firefox browser (or use the equivalent built-in tools if you are on Chrome), then, with FireBug open, go to the 'Net' tab and check what calls were made to the server when you submitted the form.
Install Scrapy and go through its tutorial http://doc.scrapy.org/en/latest/intro/tutorial.html; when you feel comfortable enough, check how to use Scrapy's FormRequest http://doc.scrapy.org/en/latest/topics/request-response.html#formrequest-objects
Enjoy!
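To illustrate what FormRequest boils down to: once the 'Net' tab shows which fields the form POSTs, the same request can be reproduced with any HTTP client. A sketch of that idea with requests; the field names here are guesses for a phpBB-style login form and would have to be confirmed against the real page (which likely also requires hidden token fields):

```python
# Sketch: reproducing a form submission as a plain POST.
# Field names are hypothetical phpBB-style guesses; read the real
# ones (including any hidden token fields) from the form's HTML
# or the browser's network tab.
import requests

def build_login_payload(username, password):
    return {"username": username, "password": password, "login": "Login"}

def login(url, username, password):
    session = requests.Session()  # a Session keeps cookies across requests
    return session.post(url, data=build_login_payload(username, password))

print(sorted(build_login_payload("MYNAME", "MYPASSWORD")))
# ['login', 'password', 'username']
```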
I would like to web-scrape the HTML source code of JavaScript pages that I can't access without first selecting an option in a drop-down list and then 'clicking' links. Although the example itself does not involve Java, a simple illustration would be this:
Web-scrape the main Wikipedia pages in all the languages available in the drop-down list at the bottom of this URL: http://www.wikipedia.org/
To do so, I need to select one language, English for example, and then 'click' the 'Main Page' link on the left of the new URL (http://en.wikipedia.org/wiki/Special:Search?search=&go=Go).
After this step, I would scrape the html source code of the wikipedia main page in English.
Is there any way to do this using R? I have already tried the RCurl and XML packages, but they do not work well with JavaScript pages.
If it is not possible with R, could anyone tell me how to do this with Python?
It's possible to do this in Python with the selenium package. There are some useful examples here. I found it helpful to install Firebug so that I could identify elements on the page. There is also a Selenium Firefox plugin with an interactive window that can help.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://website.aspx")
# locate the input field by its id, type a value, and press Enter
elem = driver.find_element_by_id("ctl00_ctl00")
elem.send_keys('15')
elem.send_keys(Keys.RETURN)
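For this particular example, a full browser may not even be needed if only the per-language URLs matter: each entry in wikipedia.org's language list points at a language subdomain, so the pages can be fetched directly. A sketch, assuming the usual lang.wikipedia.org pattern (the root URL redirects to that language's main page):

```python
# Sketch: building per-language Wikipedia URLs directly, assuming
# the usual lang.wikipedia.org subdomain pattern; the root URL
# redirects to that language's main page.
def wikipedia_root(lang_code):
    return "https://{}.wikipedia.org/".format(lang_code)

for code in ("en", "de", "fr"):
    print(wikipedia_root(code))
```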
Take a look at the RCurl and XML packages for posting the form information to the website and then processing the data afterwards. RCurl is pretty cool, but you might have an issue with the HTML parsing: if it isn't standards-compliant, the XML package may not want to play nice.
If you are interested in learning Python, however, Celenius' example above coupled with BeautifulSoup would be what you need.