Use a Python script to scrape comments (Spot.IM)

I am trying to download 6,000+ comments from this link, which uses Spot.IM to manage the comments. I saw an earlier solution posted here that requires a Spot.IM token, but the token can only be given by the account manager (I presume it requires a paid account).
Is there any other way to download the comments without the need for a token?

Yes, you can use Selenium with a webdriver for this.
You can follow this guide to get started:
https://selenium-python.readthedocs.io/getting-started.html
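For a Spot.IM thread that lazy-loads comments as you scroll, the Selenium part can be sketched roughly like this. Note that the CSS selector and the scroll-until-stable loop are my assumptions; inspect the actual page to find the real selector Spot.IM uses for comment bodies:

```python
import time

def collect_comments(driver, url, comment_css="div.spot-im-comment-text",
                     max_scrolls=50, pause=1.5):
    """Scroll the page until no new comments load, then return their text.

    `comment_css` is a guess -- inspect the real page to find the actual
    selector for Spot.IM comment bodies.
    """
    driver.get(url)
    seen = -1
    for _ in range(max_scrolls):
        elements = driver.find_elements("css selector", comment_css)
        if len(elements) == seen:   # nothing new appeared: stop scrolling
            break
        seen = len(elements)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)           # give Spot.IM time to fetch the next batch
    return [e.text for e in driver.find_elements("css selector", comment_css)]

# Usage (requires selenium and a matching chromedriver):
#   from selenium import webdriver
#   browser = webdriver.Chrome()
#   for c in collect_comments(browser, "https://example.com/article"):
#       print(c)
#   browser.quit()
```

With 6,000+ comments you may also need to click a "load more" button instead of (or in addition to) scrolling; that depends on how the page is built.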

Related

How can I download a single post by URL using instaloader?

I have a problem with the Python library instaloader. It's really cool, but I can't find a method to download a post by URL or post ID. Everything I have found is the terminal command in the official documentation:
instaloader -- -B_K4CykAOtf
But that isn't what I need; I need a way to use it in a script. I hope somebody knows the answer. Thanks for your attention.
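For what it's worth, the shortcode the CLI takes can also be used from a script via instaloader's `Post.from_shortcode`. A sketch (the URL-parsing helper is my own; the download itself needs network access and `pip install instaloader`):

```python
import re

def shortcode_from_url(url):
    """Pull the shortcode out of an Instagram post URL, e.g.
    https://www.instagram.com/p/B_K4CykAOtf/ -> B_K4CykAOtf."""
    m = re.search(r"/(?:p|reel|tv)/([A-Za-z0-9_-]+)", url)
    if not m:
        raise ValueError("no shortcode found in %r" % url)
    return m.group(1)

# With the shortcode in hand:
#   import instaloader
#   L = instaloader.Instaloader()
#   code = shortcode_from_url("https://www.instagram.com/p/B_K4CykAOtf/")
#   post = instaloader.Post.from_shortcode(L.context, code)
#   L.download_post(post, target=code)   # saves the media plus metadata
```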

How to automatically log in using python

I want to use Python to automatically enter a website, log in to it, and maybe press some buttons or something like that (for example, entering an online class automatically).
I also don't want to use Selenium or anything like that (I don't want to use web drivers).
And if you are suggesting "requests", please tell me how I should do it.
Thanks a lot for your help.
You cannot use requests, BeautifulSoup, or urllib for clicking purposes.
Here is the answer for why you can't.
I suggest you use Selenium; it is easy to use. Here is the documentation:
https://www.selenium.dev/documentation/en/
And you can download ChromeDriver from here (if you are using Chrome):
https://chromedriver.chromium.org/downloads
If you want a detailed explanation of how to install the driver and start creating some scripts, I found a YouTube video which explains it all:
https://www.youtube.com/watch?v=8iAqUVvytJk&ab_channel=TheAmericanDeveloper
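As a rough sketch of what such a Selenium script looks like (all three element IDs below are hypothetical placeholders; find the real ones with your browser's inspector):

```python
def log_in(driver, url, username, password,
           user_id="username", pass_id="password", submit_id="login-button"):
    """Fill in a login form and click the submit button.

    The three element IDs are hypothetical defaults -- inspect the page
    to find the actual ones for your site.
    """
    driver.get(url)
    driver.find_element("id", user_id).send_keys(username)
    driver.find_element("id", pass_id).send_keys(password)
    driver.find_element("id", submit_id).click()

# Usage (requires selenium and chromedriver on your PATH):
#   from selenium import webdriver
#   browser = webdriver.Chrome()
#   log_in(browser, "https://example.com/login", "me", "secret")
```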

Python: Scraping a Website that Requires Login

Right, so I understand how to scrape a website, but I'm having trouble using Python 3 to log in to a site I'm trying to scrape.
I've included the HTML that the site uses. As I understand it, this is what is needed?
I tried a simple solution that looked like it should work, but it doesn't (it appears not to log in, and goes straight to the destination URL, skipping the login).
Attempted Solution: https://pastebin.com/AEK6Qwnb (I've also tried a solution using RoboBrowser, but I couldn't succeed there either.)
Website HTML: https://pastebin.com/Jp8Zpq2a
Let me know if this information isn't sufficient and I can try to provide more.
Thanks in advance!
There are a number of possible solutions to this, depending on the site, your needs and limitations, as well as personal preference. However, a straightforward solution is possible with Selenium:
from selenium import webdriver

account = 'account'
password = 'password'

browser = webdriver.Chrome()          # create a driver session first
browser.get('desktop/test.html')      # then load the page with it
browser.find_element_by_id('Account').send_keys(account)
browser.find_element_by_id('password').send_keys(password)
# finally, submit the form (e.g. click the site's login button)

What information do I need when scraping a website that requires logging in?

I want to access my business' database on some site and scrape it using Python (I'm using Requests and BS4; I can go further if needed), but I haven't been able to.
Can someone provide info and simple resources on how to scrape such sites?
I'm not talking about providing usernames and passwords; the site requires much more than that.
How do I know what information my script must provide aside from the username and password (e.g., how do I know that I must provide, say, an auth token)?
How do I deal with the site when there are no HTTP URLs, but hrefs in the form of javascript:__doPostBack?
And in this regard, how do I get from the login page to the page I want (the one behind the aforementioned javascript:__doPostBack)?
Are the libraries I'm using enough, or do you recommend using (and, in my case, learning) something else?
Your help is greatly appreciated.
You didn't mention what you use for scraping, but since it sounds like a lot of the interaction on this site is based on client-side code, I'd suggest using a real browser to do the scraping, interacting with the site not through low-level HTTP requests but through client-side interaction (such as typing into elements or clicking buttons). This way, you don't need to worry about what form data to send or how to get the URLs of links yourself.
One recommended way of doing this is to use BeautifulSoup with Selenium / WebDriver. There are multiple resources on how to do this, for example: How can I parse a website using Selenium and Beautifulsoup in python?
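On the `javascript:__doPostBack` point from the question: with a real browser you can simply execute the href's JavaScript, and the ASP.NET page will perform the postback navigation for you. A sketch (the exact postback target is whatever appears in the real href; the one in the comment is illustrative):

```python
import re

def follow_dopostback(driver, href):
    """Execute a javascript:__doPostBack(...) href inside the browser,
    which submits the ASP.NET form and navigates for you."""
    m = re.match(r"javascript:(__doPostBack\(.*\))", href)
    if not m:
        raise ValueError("not a __doPostBack link: %r" % href)
    driver.execute_script(m.group(1))
    return m.group(1)

# Usage with a real Selenium driver, for a link element you located:
#   follow_dopostback(browser, link.get_attribute("href"))
# e.g. an href like javascript:__doPostBack('ctl00$target','')
```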

Using Mechanize for Python, need to be able to right-click

My script logs in to my account and navigates the links it needs to, but I need to download an image. This seems easy enough to do using urlretrieve. The problem is that the src attribute for the image contains a link which points to the page that initiates a download prompt, so my only foreseeable option is to right-click and select "Save as". I'm using Mechanize, and from what I can tell, Mechanize doesn't have this functionality. My question is: should I switch to something like Selenium?
Mechanize, last I checked, was pretty poorly maintained and documented. Selenium has a much more active community.
That being said: why do you need mechanize to do this? Why not just use urllib?
I would try watching Chrome's network tab and imitating the final request to get the image. If that turns out to be too difficult, then I would use Selenium, as you suggested.
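A sketch of that "imitate the final request" idea using only the standard library, assuming you've copied the image URL and any required headers (Cookie, Referer) from the request shown in the network tab:

```python
import urllib.request

def fetch_bytes(url, headers=None):
    """Fetch a URL with the given headers and return the raw bytes.

    For a login-protected image, copy the Cookie and Referer headers
    from the image request Chrome's network tab shows.
    """
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Usage (the URL and header values are placeholders for what you copy
# out of the network tab):
#   data = fetch_bytes("https://example.com/image.jpg",
#                      headers={"Cookie": "session=...",
#                               "Referer": "https://example.com/"})
#   with open("image.jpg", "wb") as f:
#       f.write(data)
```

If the server checks more than cookies (e.g. one-time tokens in the URL), replaying the request may not work, and a real browser is the fallback.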
