I have an idea for a program that I would like to try to write. Basically, I'm looking to check whether I'm accessing a specific website, and if I am, the program immediately closes that site.
I want to block a few websites from myself for times when I need to focus on school or work. Say I try to check Facebook: I want the page to close itself, no matter how many times I try.
Does anyone know a way to check if a specific website is being opened?
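One possible approach, a minimal sketch assuming Windows and the third-party PyGetWindow package (pip install pygetwindow), is to poll the open window titles and close any window whose title mentions a blocked site. Mainstream browsers put the page title in the window title, so this catches most cases:

```python
import time
import pygetwindow as gw  # third-party; Windows support is the most complete

BLOCKED = ["Facebook", "Twitter"]  # substrings to look for in window titles

while True:
    for win in gw.getAllWindows():
        if any(word in win.title for word in BLOCKED):
            win.close()  # closes the whole browser window, not just one tab
    time.sleep(1)  # poll once per second
```

Note that win.close() is a blunt instrument, since it closes the entire window. Another common technique is to redirect the blocked domains to 127.0.0.1 in your hosts file, which needs no polling at all.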
I am trying to write a program that goes to the following site and downloads the Excel file that automatically downloads when clicking the XLS button on the bottom of the page. To be honest, I am quite new to programming.
Site: https://echa.europa.eu/assessment-regulatory-needs
At first I tried using Selenium to let the program click itself through the browser and really click on the button. However, I think the website detects the usage of automated software and I cannot bypass the disclaimer that appears when opening the website.
Then I read some answers on the same topic where it is possible by using the requests module. However, I could not get it to work.
One thing I think I understood from this answer was that you need to find the site/server the data is requested from by inspecting the button with F12 in the browser. I tried this, and thought I had it, but I cannot get the code to work. I learnt from this answer that you need to supply the referer as well, but I think the referer for this file is only partially written out, as it is "https://echa.europa.eu/assessment-regulatory-needs".
This answer explained the network process in more detail, however I am not able to recreate it. Also, to be honest, I do not fully understand what the API is or how to search for it.
I also found this answer, but it does not work for me either.
So I am asking for help on this, as I think my HTML, JavaScript, web, and Python knowledge is too limited for me to see what I have to change to be able to download the Excel file.
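A minimal sketch of the requests approach, assuming you have already located the real download URL in the browser's F12 Network tab (the EXPORT_URL below is a placeholder, not the actual ECHA endpoint):

```python
import requests

# hypothetical: paste the request URL you find in the Network tab here
EXPORT_URL = "https://echa.europa.eu/..."

headers = {
    "User-Agent": "Mozilla/5.0",  # some servers reject non-browser user agents
    "Referer": "https://echa.europa.eu/assessment-regulatory-needs",
}

resp = requests.get(EXPORT_URL, headers=headers, timeout=30)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page

# write the bytes as-is: an .xlsx file is binary, not text
with open("regulatory_needs.xlsx", "wb") as f:
    f.write(resp.content)
```

If the button fires a POST request rather than a GET, copy the form data shown in the Network tab and pass it via requests.post(..., data=...).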
I need to get the URLs of all subpages found within one Google Site in editor mode. I have a subpage for each Form (1 to 6 Upper) of every class at school, and I intend to automate future changes using Python code. I need to be able to access each page and upload photos to the subpages under it, but for that I first have to get onto the subpage itself.
Basically, the web structure goes like this:
EVERYTHING -> CLASSES -> SUBJECTS
I have tried using Selenium for automation, but that idea didn't work out: once Selenium is driving the browser, Google refuses to let me log in. I have also tried a program that simulates mouse motion to actually click on the subpages, but it is far too complex, and after several unsuccessful attempts I gave up.
I need ideas on how to access each subpage and retrieve its URL. I would appreciate it if someone could help me, because I am really stuck: I cannot hope to update the entire site manually on a weekly basis.
If someone could show me code that would perform this task, I would appreciate it more than words can express. No matter what, thank you very much!
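One possible starting point, a minimal sketch assuming the site is published and viewable without logging in (SITE_ROOT below is a hypothetical URL), is to crawl the published site with requests and BeautifulSoup and collect every internal link. This will not reach editor-mode pages, which require a Google login, but it does recover the subpage URLs:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

SITE_ROOT = "https://sites.google.com/view/your-site/"  # hypothetical root URL

seen, queue = set(), [SITE_ROOT]
while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    for a in soup.find_all("a", href=True):
        full = urljoin(url, a["href"]).split("#")[0]  # drop #fragments
        if full.startswith(SITE_ROOT) and full not in seen:
            queue.append(full)  # only follow links within the site

print("\n".join(sorted(seen)))  # every subpage URL discovered
```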
I am trying to get started with web crawling.
My main struggle is that I need a visual interface linked to Python that lets me see what is happening as I crawl the webpage. The main idea: after I load the URL, I have to press an "x" to be redirected to a new page, from which I want to extract some data. However, using an inspector, I am having a hard time finding the actual redirecting link.
Link: https://shop.axs.co.uk/Lw%2fYCwAAAAA6dpvSAAAAAABB%2fv%2f%2f%2fwD%2f%2f%2f%2f%2fBXRoZW8yAP%2f%2f%2f%2f%2f%2f%2f%2f%2f%2f
PS: The main reason is that I want to buy some concert tickets to see a band my dad loves, but tickets are currently sold out. Sometimes people resell theirs, and I want to detect when tickets become available on the second page and then notify myself, so that I can proceed to buy them through the visual interface I am using.
I know I am asking for a lot, but I really want to get my dad and me to that concert.
Thank you in advance kind stranger.
To begin with, you need to use Selenium: interacting with JavaScript-driven pages requires something more advanced than a plain scraper.
Here is a simple tutorial:
https://realpython.com/modern-web-automation-with-python-and-selenium/
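A minimal sketch of that approach, assuming Chrome and a visible (non-headless) window so the browser itself serves as your visual interface; the CSS selector for the "x" button is hypothetical, so replace it with whatever the inspector shows:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # a visible window doubles as the visual interface
driver.get("https://shop.axs.co.uk/...")  # the event URL from the question

wait = WebDriverWait(driver, 15)
# hypothetical selector -- inspect the "x" element and use its real locator
close_btn = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.close")))
close_btn.click()  # this triggers the redirect to the second page

# once redirected, scan the rendered page for signs of a resale listing
if "resale" in driver.page_source.lower():
    print("Tickets may be available - check the open browser window!")
```

You could run this in a loop with a delay to get the repeated availability check you describe.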
I want to get some information from a web page. I use requests.get to fetch the page, but I cannot find what I want. Checking carefully, I found that the info I want is in a list with a scrollbar: when I drag the scrollbar down, more and more info is loaded. So I guess not all the info in the list has been loaded yet when I fetch the page with the requests module. I want to know what happens in this process, and how I can gather the information I want. (I am not familiar with HTML.)
I want to know what happens in this process
It sounds like the scrolling causes some JavaScript (JS) to execute, and the JS makes repeated requests to the server for more data. Unfortunately, the requests module cannot make the JavaScript on an HTML page execute; all you get back is the text of the JS. This inability to execute JavaScript, and thereby retrieve what the user actually sees, has been a problem for a long time. Fortunately, smart programmers have largely solved it, but you need to use a different module: check out Selenium.
I am not familiar with HTML
Scraping web pages can get really tricky really fast, and some web pages proactively try to prevent programs from scraping their content, so you need to know both HTML and JS in order to figure out what is going on.
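A minimal sketch of the usual Selenium workaround for scroll-loaded lists, assuming the whole page scrolls (if the list is an inner element, you would set that element's scrollTop instead); the URL is a placeholder:

```python
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/the-page")  # hypothetical URL

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # scroll to the bottom, which triggers the JS that loads more rows
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the JS time to fetch and render the next batch
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded; we've reached the end of the list
    last_height = new_height

html = driver.page_source  # now contains the fully loaded list
```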
First, I want to say that I have experience with Python and some web libraries like mechanize, Beautiful Soup, and urllib2.
The idea is to create an app that will grab information from the webpage I am currently looking at in my web browser, and then store it.
For example:
I manually go to the website and create a user.
Then I run my app, which grabs some details from the webpage I'm currently looking at, like user name, first name, last name and so on.
Problems:
I don't know how to make a program run on top of my web browser. I can't simply make a script that logs in to this webpage and does the rest with Beautiful Soup, because the site has very good protection against web crawlers and bots.
I need some place to start. So the main question is: is it possible to grab information that is currently in my web browser? If yes, I hope to hear some suggestions on how to make my program look at the browser.
Please feel free to ask if you don't quite understand what I'm asking, or if you have suggestions or libraries that I can use.
The easiest thing to do is probably to save the HTML content of the current page to a file (using File -> Save Page As or whatever it is in your browser) and then run Beautiful Soup / lxml.html / whatever on that file.
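A minimal sketch of that save-then-parse approach, assuming the page was saved as page.html; the tag and attribute names are hypothetical, so inspect the saved file to find where the user details actually live:

```python
from bs4 import BeautifulSoup

with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# hypothetical selectors -- adjust them to the page's real markup
username = soup.find("span", class_="username")
first_name = soup.find("input", id="first-name")

print(username.get_text(strip=True) if username else "username not found")
print(first_name.get("value", "") if first_name else "first name not found")
```

Since you do the login by hand in your normal browser, the saved file already contains whatever the logged-in page shows, which sidesteps the bot protection entirely.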
You could probably also get Selenium to do what you want, though I've never used it and am not sure.