Getting started with web crawling - Python

I am trying to get started with web crawling.
My main struggle is that I need a visual interface linked to Python that lets me see what is happening as I crawl the webpage. After I load the URL, I have to press an "x" to be redirected to a new page, from which I want to extract some data. However, using the browser inspector, I am having a hard time finding the actual redirect link.
link:https://shop.axs.co.uk/Lw%2fYCwAAAAA6dpvSAAAAAABB%2fv%2f%2f%2fwD%2f%2f%2f%2f%2fBXRoZW8yAP%2f%2f%2f%2f%2f%2f%2f%2f%2f%2f
PS: The main reason is that I want to buy some concert tickets to go see a band my dad loves, but tickets are currently sold out. Sometimes people resell theirs, and I want to detect when tickets become available on the second page and then notify myself, so that I can proceed to buy the tickets in the visual interface I am using.
I know I am asking for a lot, but I really want to get me and my dad to the concert.
Thank you in advance kind stranger.

To begin with, you need to use Selenium, because interacting with JavaScript requires something more advanced than a plain scraper.
Here is a simple tutorial:
https://realpython.com/modern-web-automation-with-python-and-selenium/
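
A minimal sketch of that approach, assuming Selenium 4 and Firefox with geckodriver are installed. The CSS selector for the "x" button, the "sold out" marker text, and the polling interval are hypothetical placeholders, not taken from the site; inspect the real page to find the right values.

```python
import time

# Hypothetical marker text; check what the real page shows when nothing is on sale.
SOLD_OUT_MARKER = "sold out"


def tickets_available(page_text: str) -> bool:
    """Return True when the sold-out marker is absent from the page text."""
    return SOLD_OUT_MARKER not in page_text.lower()


def watch(url: str, close_selector: str, interval_s: int = 60):
    """Open a visible browser, click the "x", and poll until tickets may be on sale."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # a real window, so you can watch and take over
    while True:
        driver.get(url)
        # Wait for the "x" button to become clickable, then click it.
        WebDriverWait(driver, 15).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, close_selector))
        ).click()
        time.sleep(5)  # crude pause to let the second page load
        if tickets_available(driver.page_source):
            print("Tickets may be available at:", driver.current_url)
            return driver  # keep this reference so the window stays usable
        time.sleep(interval_s)
```

Calling watch() with the shop URL and the selector you found in the inspector returns the driver once the marker text disappears, at which point you can continue the purchase by hand in the open window. Printing driver.current_url after the click is also one way to discover the redirect link that is hard to spot in the inspector.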

Related

Python: Checking if a specific website is open in your web browser?

I have an idea for a program that I would like to try to write. Basically, I'm looking to check if I'm accessing a specific website. Then if I'm on that specific website the program immediately terminates that site.
I want to block a few websites from myself for times when I need to focus on school or work. Say I try to check my Facebook, I want it to close itself no matter how many times I try.
Does anyone know a way to check if a specific website is being opened?

Suitable Python modules for navigating a website

I am looking for a Python module that will let me navigate the search bars, links, etc. of a website.
For context, I am looking to do a little web scraping of this website [https://www.realclearpolitics.com/]
I simply want to take information on each state (polling data etc.) in relation to the 2020 election and organize it all in a database.
Obviously there are a lot of states to go through, and each is on a separate webpage. So I'm looking for a method in Python with which I could quickly navigate the site and take the data from each page, as well as update and add to existing data. Finding a way to quickly navigate links and search bars with my own input would be very helpful.
Any suggestions would be greatly appreciated.
# a simple list that contains the names of each state
states = ["Alabama", "Alaska", "Arizona", "....."]
for state in states:
    # code to look up the state in the search bar of the website
    # figures being taken from the website etc.
    break
Here is the rough idea I have.
There are many options to accomplish this with Python. As @LD mentioned, you can use Selenium. Selenium is a good option if you need to interact with a website's UI through a browser (optionally headless), e.g. clicking a button or entering text into a search bar. If your needs aren't that complex, for instance if you just need to quickly fetch the raw content of a web page and process it, then you should use the requests library (a third-party package, not part of Python's standard library).
For processing raw content from a crawl, I would recommend Beautiful Soup.
Hope that helps!
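
A rough sketch of the fetch-then-parse idea follows. To keep it runnable without installing anything, it swaps in the standard library's urllib.request and html.parser for requests and Beautiful Soup; the structure would be the same with those packages. The helper names (LinkCollector, links_for_states, fetch) are made up for illustration, and matching a state by its link text is an assumption about how the site labels its pages.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_for_states(html: str, states: list[str]) -> dict[str, str]:
    """Map each state name to the first link whose text mentions it."""
    parser = LinkCollector()
    parser.feed(html)
    found = {}
    for state in states:
        for href, text in parser.links:
            if state.lower() in text.lower():
                found[state] = href
                break
    return found


def fetch(url: str) -> str:
    """Download a page and decode it as UTF-8 (an assumption about the site)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

With this, links_for_states(fetch(index_url), states) would give one URL per state to fetch and process in turn, where index_url stands for whichever page lists the state polls. States with no matching link are simply left out of the result.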

Program for Google Sites in Python

I need to get the URLs of all subpages within one Google Site in editor mode. I have a subpage for each Form (1 to 6 Upper) of all classes at school, and I intend to automate any future changes using Python code. I must be able to access each page and upload photos to the subpages under each one. But, for that, I must get onto the subpage itself.
Basically, the web structure goes like this:
EVERYTHING -> CLASSES -> SUBJECTS
I have tried using Selenium for automation, but that idea didn't work out because I cannot log in with Google once the browser is in automation mode under Selenium. I have also tried using a program to simulate mouse motion and actually click on the subpages, but it is far too complex, and after several unsuccessful attempts I gave up.
I need ideas on what I should do to access each subpage and retrieve its URL. I would appreciate if someone could help me because I am really stuck as I cannot hope to update the entire site manually on a weekly basis.
If someone could show me the code which would perform this task, I would appreciate it too much to express in words. No matter what, thanks very much!

Selenium script to remove selected page likes from Facebook page

I'm looking for a script, or an example of a script, which traverses the list of users who like a Facebook page I administer and removes the likes that meet some simple criteria (e.g. country). Maybe some Selenium code?
Has somebody seen something like that on the web, or could somebody share some code?
You can accomplish that with Selenium. But begin with the Selenium Firefox extension to record the scenario you want, and then convert it to a Python script.
For more info, I recommend reading the docs.

Suggestion on creation app for getting information from webpage

First, I want to say that I have experience with Python and some web libraries like mechanize, Beautiful Soup, and urllib2.
The idea is to create an app that will grab information from the webpage that I am currently looking at in my web browser, and then store it.
For example:
I manually go to the website and create a user.
Then I run my app, which grabs some details from the webpage I am currently looking at, like user name, first name, last name and so on.
Problems:
I don't know how to make a program run, so to speak, on top of my web browser. I can't simply write a script to log in to this webpage and do the rest with Beautiful Soup, because the site has very good protection against web crawlers and web bots.
I need somewhere to start. So the main question is: is it possible to grab the information that is currently in my web browser? If yes, I hope to hear some suggestions on how to make my program look at the browser.
Please feel free to ask me if you don't understand what I'm asking, or if you have suggestions for libraries that I can use.
The easiest thing to do is probably to save the HTML content of the current page to a file (using File -> Save Page As or whatever it is in your browser) and then run Beautiful Soup / lxml.html / whatever on that file.
You could probably also get Selenium to do what you want, though I've never used it and am not sure.
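
As an illustration of the save-then-parse route, here is a small sketch that reads a saved page and pulls out form field values. It uses the standard library's html.parser instead of Beautiful Soup or lxml so that it runs as-is. It assumes the details you want appear as value attributes on named input elements, which may not hold for your page; the function names are made up for this example.

```python
from html.parser import HTMLParser
from pathlib import Path


class InputCollector(HTMLParser):
    """Collect name -> value pairs from <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:  # skip unnamed inputs such as bare submit buttons
                self.fields[name] = attributes.get("value") or ""


def fields_from_html(html: str) -> dict:
    """Return the named input fields found in an HTML string."""
    parser = InputCollector()
    parser.feed(html)
    return parser.fields


def read_saved_fields(path: str) -> dict:
    """Parse a page saved from the browser (File -> Save Page As)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return fields_from_html(text)
```

If the details are plain text inside other tags rather than input values, the same parser class can be adapted by overriding handle_data and keying on whichever tag or attribute surrounds the text.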
