I'm looking for a script, or an example of one, that traverses the list of users who like a Facebook page I administer and removes the likes that meet some simple criterion (e.g. country). Maybe some Selenium code?
Has anybody seen something like that on the web, or could somebody share some code?
You can accomplish that with Selenium. Start with the Selenium IDE Firefox extension to record the scenario you want, and then export it to a Python script.
For more information, I recommend reading these docs.
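As a rough idea of what the exported script could end up looking like, here is a minimal sketch. It assumes the fans list actually shows each person's country and has a remove button on every row; the CSS selectors (`.fan-row`, `.fan-name`, `.fan-country`, `.remove-button`), the `Fan` class and both function names are hypothetical placeholders, to be replaced with whatever Selenium IDE records on the real page. The criterion lives in a plain function so it can be changed without touching the browser code.

```python
from dataclasses import dataclass


@dataclass
class Fan:
    name: str
    country: str


def should_remove(fan: Fan, blocked_countries: set[str]) -> bool:
    # Criterion from the question: remove fans from certain countries.
    # blocked_countries is expected to contain lowercase names.
    return fan.country.strip().lower() in blocked_countries


def remove_matching_fans(page_url: str, blocked_countries: set[str]) -> None:
    # Imported here so should_remove() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(page_url)
        # Hypothetical selectors: one row per fan with name, country, remove button.
        for row in driver.find_elements(By.CSS_SELECTOR, ".fan-row"):
            fan = Fan(
                name=row.find_element(By.CSS_SELECTOR, ".fan-name").text,
                country=row.find_element(By.CSS_SELECTOR, ".fan-country").text,
            )
            if should_remove(fan, blocked_countries):
                row.find_element(By.CSS_SELECTOR, ".remove-button").click()
    finally:
        driver.quit()
```

If removing a fan re-renders the list, the remaining row elements may go stale; in that case re-query the rows after each click.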
I'd like to ask somebody with experience with headless browsers and Python whether it's possible to extract the box info with the distance to the closest strike from the webpage below. Until now I have been using Python with bs4, but since everything here is driven by jQuery, simply downloading the webpage doesn't work. I found PhantomJS, but I wasn't able to extract it with that either, so I'm not sure it's possible. Thanks for any hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0
This isn't really a Linux question; it's a Stack Overflow question, so I won't go into too much detail.
What you want to do can easily be done with Selenium. Selenium has both a headless mode and a headed mode (where you can watch it open your browser and click on things). Its DOM query API is a bit less extensive than bs4's, but it does have nice visual query functions (location on screen). So you would write a Python script that initializes Selenium, goes to your website, and interacts with it. At some point you may need to do some image recognition on screenshots. That may be as simple as finding a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out which Selenium features you can use to do what you want; that depends on luck, and on whether what you want happens to be easy or hard on that particular website.
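A minimal headless sketch, assuming Selenium 4 with Firefox and geckodriver installed. The key step is waiting for the jQuery-rendered element to become visible before reading its text, which is exactly what a plain bs4 download cannot do. The CSS selector you pass to `read_box_text` is an assumption (inspect the page to find the real one), and the screenshot is only there in case you end up needing the image-recognition route.

```python
LIGHTNING_URL = (
    "https://lxapp.weatherbug.net/v2/lxapp_impl.html"
    "?lat=49.13688&lon=16.56522&v=1.2.0"
)


def firefox_arguments(headless: bool) -> list[str]:
    # Firefox accepts "-headless" to run without a visible window.
    return ["-headless"] if headless else []


def read_box_text(css_selector: str, headless: bool = True, timeout: float = 15) -> str:
    # Imported here so firefox_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    for arg in firefox_arguments(headless):
        options.add_argument(arg)
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(LIGHTNING_URL)
        # Block until the page's JavaScript has rendered the element we want.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        driver.save_screenshot("lightning.png")  # optional, for debugging or image matching
        return element.text
    finally:
        driver.quit()
```

For example, `read_box_text("#distance")`, where `#distance` is a made-up selector standing in for the real one.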
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.
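A minimal sketch of calling that endpoint directly with the standard library. The parameter values are assumptions: `authid`, `timestamp` and `hash` look like values generated by the page's JavaScript, so you would likely have to copy them from a captured request, and the `location` format in the check below is only a guess. The `callback` parameter suggests a JSONP response, so the sketch strips a `callback(...)` wrapper before parsing the JSON.

```python
import json
import re
import urllib.parse
import urllib.request

SPARK_URL = "https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark"


def build_spark_url(params: dict[str, str]) -> str:
    # Query-string encode the parameters captured from the browser.
    return SPARK_URL + "?" + urllib.parse.urlencode(params)


def strip_jsonp(body: str) -> str:
    # JSONP responses look like callbackName({...}); keep only the JSON inside.
    match = re.fullmatch(r"\s*[\w$.]+\((.*)\)\s*;?\s*", body, re.DOTALL)
    return match.group(1) if match else body


def fetch_spark(params: dict[str, str]) -> dict:
    with urllib.request.urlopen(build_spark_url(params), timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(strip_jsonp(body))
```

In practice you would call `fetch_spark` with the parameters copied from a request captured in your browser's developer tools, then drop them one at a time to see which ones the API really needs.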
For example, take a look at this page: https://erecord.co.lubbock.tx.us/recorder/eagleweb/viewDoc.jsp?node=DOCCL-OPR19830027670
I would like to get all the data from a page like that, with the user entering a name into my program.
Should I do this with web scraping or is using an API (if there is one) best? How would you go about tackling this project?
The simple answer is: definitely use an API first, if one exists. The second step is to figure out what type of pages the site is serving. It looks like JSP, so any of the JSP-related scraping approaches would be the way forward. I'm not sure it merits another answer, since there's lots of material out there.
For example, here are answers about scraping JSP with Python:
How do I web-scrape a JSP with Python, Selenium and BeautifulSoup?
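Since a JSP page like this is rendered on the server, a plain HTTP request may be enough, with no browser involved. A minimal sketch using only the standard library (`html.parser` instead of Beautiful Soup, to stay dependency-free), assuming the document data sits in table cells and that the site answers anonymous requests. Looking documents up by name would go through the site's search form, which is not covered here.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

VIEW_DOC_URL = "https://erecord.co.lubbock.tx.us/recorder/eagleweb/viewDoc.jsp"


class CellTextParser(HTMLParser):
    """Collects the whitespace-normalised text of every <td>/<th> cell."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: list[str] = []
        self._in_cell = False
        self._buffer: list[str] = []

    def _flush(self) -> None:
        # Close the current cell and store its text.
        self.cells.append(" ".join("".join(self._buffer).split()))
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            if self._in_cell:  # previous cell had no explicit end tag
                self._flush()
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table") and self._in_cell:
            self._flush()

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


def fetch_document(node: str) -> list[str]:
    url = VIEW_DOC_URL + "?" + urllib.parse.urlencode({"node": node})
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells
```

For the page in the question, `fetch_document("DOCCL-OPR19830027670")` would return its cell texts in document order, which you can then pair up into label/value fields.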
I am looking for a Python module that will let me navigate the search bars, links, etc. of a website.
For context, I am looking to do a little web scraping of this website: https://www.realclearpolitics.com/
I simply want to take information on each state (polling data, etc.) in relation to the 2020 election and organize it all into a database collection.
Obviously there are a lot of states to go through, and each is on a separate web page. So I'm looking for a method in Python with which I could quickly navigate the site and take the data from each page, as well as update and add to existing data. A method for quickly navigating links and search bars with my own input data would be very helpful.
Any suggestions would be greatly appreciated.
Here is the rough idea I have:
# a simple list that contains the names of each state
states = ["Alabama", "Alaska", "Arizona", "....."]
for state in states:
    # code to look up the state in the search bar of the website
    # figures being taken from the website, etc.
    pass  # placeholder for the scraping logic
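For the "update and add to existing data" part, one option is a small SQLite table keyed by state, so that re-running the loop overwrites old figures instead of duplicating them. A minimal sketch with the standard-library `sqlite3` module; `scrape_state` is a hypothetical placeholder for the actual fetching and parsing, and storing a single text summary per state is a simplification.

```python
import sqlite3

STATES = ["Alabama", "Alaska", "Arizona"]  # extend with the remaining states


def open_db(path: str = "polls.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS polls ("
        " state TEXT PRIMARY KEY,"
        " summary TEXT NOT NULL)"
    )
    return conn


def upsert_poll(conn: sqlite3.Connection, state: str, summary: str) -> None:
    # PRIMARY KEY + OR REPLACE: a second run for the same state overwrites the row.
    conn.execute(
        "INSERT OR REPLACE INTO polls (state, summary) VALUES (?, ?)",
        (state, summary),
    )
    conn.commit()


def scrape_state(state: str) -> str:
    # Hypothetical placeholder: fetch and parse this state's page here.
    return f"polling data for {state}"


def refresh_all(conn: sqlite3.Connection, states: list[str]) -> None:
    for state in states:
        upsert_poll(conn, state, scrape_state(state))
```

Calling `refresh_all(open_db(), STATES)` on a schedule would keep the table current; passing `":memory:"` to `open_db` is handy for experimenting.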
There are many options to accomplish this with Python. As @LD mentioned, you can use Selenium. Selenium is a good option if you need to interact with a website's UI via a headless browser, e.g. clicking a button or entering text into a search bar. If your needs aren't that complex, for instance if you just need to quickly fetch all the raw content of a web page and process it, then you should use the third-party requests library.
For processing the raw content from a crawl, I would recommend Beautiful Soup.
Hope that helps!
I am looking for resources or a guide so that I can write Python code to fill in my ~2k online forms automatically. Sorry, I don't have any script to share. Many resources show Python code that goes to a form's URL and fills it in, but in my case it is a pop-up form, so it doesn't really have a URL of its own.
Please be kind, I am new to Python.
Is there a way to imitate clicks in the browser window and fill in new values in the form?
You can imitate clicks in the browser using Selenium: https://realpython.com/modern-web-automation-with-python-and-selenium/. There are plenty of tutorials on how to do that.
Other tools would be:
https://www.cypress.io/
http://wwwsearch.sourceforge.net/mechanize/
If you don't want to write code, there is a browser extension: https://www.seleniumhq.org/projects/ide/
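A pop-up form without its own URL is still part of the page's DOM, so Selenium can fill it like any other form. A minimal sketch, assuming the ~2k entries live in a CSV file whose header names match the form fields' `name` attributes; the selectors `#open-form` and `#popup-form` and the submit-button selector are hypothetical. If the pop-up is inside an iframe, you would need `driver.switch_to.frame(...)` first.

```python
import csv
import io


def load_rows(csv_text: str) -> list[dict[str, str]]:
    # One dict per form submission, keyed by the CSV header row.
    return list(csv.DictReader(io.StringIO(csv_text)))


def fill_forms(page_url: str, rows: list[dict[str, str]]) -> None:
    # Imported here so load_rows() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    popup_locator = (By.CSS_SELECTOR, "#popup-form")  # hypothetical selector
    driver = webdriver.Firefox()
    wait = WebDriverWait(driver, 10)
    try:
        driver.get(page_url)
        for row in rows:
            # Hypothetical selector for the button that opens the pop-up.
            driver.find_element(By.CSS_SELECTOR, "#open-form").click()
            popup = wait.until(EC.visibility_of_element_located(popup_locator))
            for field_name, value in row.items():
                field = popup.find_element(By.NAME, field_name)
                field.clear()
                field.send_keys(value)
            popup.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
            # Wait for the pop-up to close before starting the next row.
            wait.until(EC.invisibility_of_element_located(popup_locator))
    finally:
        driver.quit()
```

Usage would be along the lines of `fill_forms(url, load_rows(open("entries.csv", encoding="utf-8").read()))`; trying it on a handful of rows first is wise before running all ~2k.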
First, I want to say that I have experience with Python and some web libraries like mechanize, Beautiful Soup, and urllib2.
The idea is to create an app that grabs information from the web page I am currently viewing in my web browser, and then stores it.
For example:
I manually go to the website and create a user.
Then I run my app, which grabs some details from the web page I'm currently looking at, like user name, first name, last name, and so on.
Problems:
I don't know how to make a program that runs, kind of, on top of my web browser. I can't simply write a script that logs in to this web page and does the rest with Beautiful Soup, because the site has very good protection against web crawlers and bots.
I need somewhere to start. So the main question is: is it possible to grab the information that is currently shown in my web browser? If so, I'd like to hear suggestions on how to make my program look at the browser.
Please feel free to ask if you don't quite understand what I'm asking, or if you have suggestions or libraries I could use.
The easiest thing to do is probably to save the HTML content of the current page to a file (using File -> Save Page As, or whatever it is called in your browser) and then run Beautiful Soup, lxml.html, or whatever you prefer on that file.
You could probably also get Selenium to do what you want, though I've never used it and am not sure.
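A minimal sketch of the save-then-parse approach using only the standard library, assuming the details you want (user name, first name, etc.) appear as `<input name=... value=...>` fields on the saved page; if they are plain text instead, you would match on the surrounding tags rather than on inputs. `FieldParser` and `read_saved_page` are illustrative names, and Beautiful Soup or lxml.html would do the same job.

```python
from html.parser import HTMLParser
from pathlib import Path


class FieldParser(HTMLParser):
    """Collects name -> value for every <input> that has a name attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            name = attr.get("name")
            if name:
                # Boolean attributes have value None, so fall back to "".
                self.fields[name] = attr.get("value") or ""


def read_saved_page(path: str) -> dict[str, str]:
    # Parse a page saved via "Save Page As" and return its named input fields.
    parser = FieldParser()
    parser.feed(Path(path).read_text(encoding="utf-8", errors="replace"))
    parser.close()
    return parser.fields
```

Since the file is saved from your own logged-in browser session, the site's anti-bot protection never comes into play.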