I want to create a bot that scrapes all images from a website and sends them back to me.
My idea was to build a Python scraper with BeautifulSoup that collects the image URLs, and then call bot.send_photo(chat_id, 'your URL') in a for loop over every URL.
The thing is, I don't actually know whether it can be done this way, or whether it has to use only Telegram's own functions to work on mobile.
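Here is roughly what I had in mind; a minimal sketch, assuming the synchronous python-telegram-bot v13 API, a placeholder site, and absolute image URLs:

import requests
import telegram
from bs4 import BeautifulSoup

# Placeholders: your bot token, your chat ID, and the site to scrape.
bot = telegram.Bot(token="YOUR_BOT_TOKEN")
chat_id = 123456789

html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")

# Collect the src of every <img> tag and send each one to the chat;
# send_photo accepts a URL string, and Telegram fetches the image itself.
for img in soup.find_all("img"):
    url = img.get("src")
    if url and url.startswith("http"):
        bot.send_photo(chat_id, url)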
I'm currently working on an Instagram bot and want to upload pictures using Selenium. I'm emulating a phone in Selenium, but every time I click on the upload button, Windows Explorer opens and wants me to select the pictures manually.
Is there a way to bypass that so picture uploads are automated?
Selenium is simply a web automator; it is of no use for handling native file selectors, or anything else that lies outside web functionality.
That being said, I do believe this can be solved by using a library called Sikuli. See this article on its use and how you can incorporate it into your script. Also see the answers to How to use Sikuli with Selenium in Python?
The other way is, of course, to use the API.
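If setting up Sikuli is too heavy, the same idea of driving the native dialog from outside the browser can be sketched with pyautogui, a different library swapped in here; the file path is a placeholder:

import time
import pyautogui

# Selenium has already clicked the upload button at this point; the
# native file dialog that opens is outside the browser, so we type the
# path into it with pyautogui and confirm.
time.sleep(2)  # give the dialog a moment to appear
pyautogui.write(r"C:\path\to\picture.jpg")
pyautogui.press("enter")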
I was able to upload pictures to Instagram from Python using a library called instabot.
Then I did the following:
from instabot import Bot

# USERNAME, PASSWORD and IMAGE_PATH are placeholders for your own values.
bot = Bot()
bot.login(username=USERNAME, password=PASSWORD)
bot.upload_photo(IMAGE_PATH, caption="Follow me on instagram! :D")
Note: you might get an error when running the script more than once. To overcome this, delete the config folder instabot creates, or write a script that automates that task, as in the sketch below.
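A minimal sketch of such a cleanup, assuming instabot's default behavior of keeping its session cache in a config folder next to the script:

import shutil
from pathlib import Path

from instabot import Bot

# instabot caches session state in a "config" folder next to the
# script; a stale cache can make a repeated login fail, so clear it
# before logging in again.
config_dir = Path("config")
if config_dir.exists():
    shutil.rmtree(config_dir)

bot = Bot()
bot.login(username=USERNAME, password=PASSWORD)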
I was just making a simple Python program that would update me about my Instagram likes and followers. I don't want to open the browser every time it runs, so I'm using requests and BeautifulSoup to get the page data. But the GET request gives me markup that is totally different from the actual page source in the browser. There are no classes, so I can't use BeautifulSoup's find() method on it. It seems like I'm receiving a different page because it's a robot accessing it.
Is there any other module or method I can use?
I'm developing a web crawler in Python using the Django framework. I want it to work like a web app: if I open it in two different browser tabs, they should work independently, each with its own data (crawled + queued links). Both should start crawling from separate URLs and continue their work.
Currently I have a very simple version of it. It works in one tab but not in a second browser tab. I have even tried opening a new Chrome window, with the same result.
I'm not sure what feature or library I should use for this purpose. Can somebody help me?
You can pass a key in the URL:
URL pattern: <your_domain>/crawled/<key>
Then you can open each URL in a different tab:
TAB1: <your_domain>/crawled/abcd
TAB2: <your_domain>/crawled/xyz
Or you can send the key in request.GET.
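A minimal sketch of that in Django; the crawled view name, the key parameter, and the in-memory store are illustrative assumptions, not code from the question:

# urls.py
from django.urls import path
from . import views

urlpatterns = [
    # Each browser tab opens /crawled/<key>/ with its own key.
    path("crawled/<str:key>/", views.crawled, name="crawled"),
]

# views.py
from django.http import HttpResponse

# Illustrative in-memory store keyed by the URL key; a real app would
# persist crawled and queued links in the database instead.
CRAWL_STATE = {}

def crawled(request, key):
    state = CRAWL_STATE.setdefault(key, {"queued": [], "crawled": []})
    return HttpResponse(
        "Crawler %s: %d crawled, %d queued"
        % (key, len(state["crawled"]), len(state["queued"]))
    )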
I would create a default page for your app: a form that accepts one or more URLs to crawl.
When the submit button is pressed, the list of URLs is stored in the database and a background process, using something such as Celery, works through the queue of URLs.
You don't say anything about how the results of the crawl are to be stored or presented, so I'm assuming you just want to kick off the crawl, with the pages stored in some way by the crawling code and no response sent back to the web page.
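A minimal sketch of that background piece, assuming Celery with a default broker; the crawl_url task name and the commented-out model are hypothetical:

# tasks.py
import requests
from celery import shared_task

@shared_task
def crawl_url(url):
    # Runs in the Celery worker, outside the request/response cycle.
    response = requests.get(url, timeout=10)
    # Hypothetical: store the result however your app needs, e.g. in a
    # CrawlResult model; here we just return the status code.
    return response.status_code

# In the view handling the form submission:
# for url in submitted_urls:
#     crawl_url.delay(url)  # queue each URL for the background worker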
I am writing a web scraping application. When I enter the URL directly into a browser, it displays the JSON data I want.
However, if I use Python's requests library, or URLDownloadToFile in C++, it simply downloads the HTML for the login page.
The site I am trying to scrape (DraftKings.com) requires a login; the other sites I scrape don't.
I am 100% sure this is related, since if I paste the URL while logged out, I get the login page rather than the JSON data. Once I log in and paste the URL again, I get the JSON data.
The thing is, even if I stay logged in in the browser and then use the Python script or C++ app to download the JSON data, it still downloads the login HTML.
Anyone know how I can fix this issue?
Please don't ask us to help with an activity that violates the terms of service of the site you are trying to (ab-)use:
Using automated means (including but not limited to harvesting bots, robots, parser, spiders or screen scrapers) to obtain, collect or access any information on the Website or of any User for any purpose.
Even if that kind of usage were allowed, the answer would be boring:
You'd need to implement the login functionality in your scraper.
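In the generic case that means carrying a logged-in session's cookies; a minimal sketch with requests.Session against a placeholder site, with made-up endpoint and form field names:

import requests

session = requests.Session()

# Hypothetical login endpoint and form field names; the real ones have
# to be read from the site's login form or its network traffic.
session.post(
    "https://example.com/login",
    data={"username": "me", "password": "secret"},
)

# The session object now carries the authentication cookies, so the
# JSON endpoint answers as it does in a logged-in browser.
response = session.get("https://example.com/api/data")
print(response.json())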
I'm using urllib and BeautifulSoup to crawl Twitter data. But when I try to crawl someone else's data, for example https://twitter.com/twitterapi/followers, it requires a login.
How can I implement automatic login with a Python script? Or how can I simulate the login process in Python?
You don't need to screen-scrape. Twitter has a perfectly functional API.
https://dev.twitter.com/docs/api
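A minimal sketch with the tweepy client library, assuming tweepy v4 and a bearer token from a developer account; the username comes from the question, everything else is a placeholder:

import tweepy

# A bearer token comes from the Twitter developer portal.
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Look up the account, then fetch its followers through the API
# instead of scraping the login-walled web page.
user = client.get_user(username="twitterapi")
followers = client.get_users_followers(id=user.data.id, max_results=100)
for follower in followers.data:
    print(follower.username)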