Can I use urllib2 to open a webpage that contains a video (like a Vimeo page), and will that visit be counted as a view?
In general, yes. A request made with urllib2 is a normal HTTP request, and as such it will be recognized as a normal "visit" by the server you are connecting to. Depending on what additional headers you set, you can even make yourself look like a common browser, so they won't be able to filter you out either.
As far as view counts go, however, I'm pretty sure that simply visiting the site, without executing any code on it and without actually playing the video, will not increase the view counter. In addition, these sites employ systems to prevent abuse of the counter. So if you hope to spoof real views and increment the view counter by repeatedly visiting the page, you will be out of luck.
As for actually playing the video (if you are interested in the content rather than the view counter): yes, you can use Python to get access to it. Python won't play the video by itself, of course, but you can download it instead. There are existing scripts that already do this for you.
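For the "normal visit" part, a minimal sketch with urllib2 might look like this (the URL and the User-Agent string are placeholder values, not anything specific to Vimeo):

    import urllib2

    # Placeholder URL; a real video page would go here.
    url = "http://vimeo.com/some-video-page"

    # Send a browser-like User-Agent so the request is not trivially
    # identifiable as a script.
    request = urllib2.Request(url, headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:25.0) "
                      "Gecko/20100101 Firefox/25.0",
    })
    response = urllib2.urlopen(request)
    html = response.read()  # the raw HTML of the page, not the video itself

Note that this only fetches the page's HTML; the video itself is typically loaded by JavaScript afterwards, which is exactly why such a request alone won't bump the view counter.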
Might be a silly question... I want to use a Python script to get some data from a website every 10 or 20 minutes.
I'm using:

    import requests

    # Fetch the page and keep the response object so we can read its body.
    response = requests.get("http://somewebsite.php")
    data = response.text

to get the data; the rest is basically extracting values from the string, etc.
I would like to loop it so it makes a new request to the website every 10 or 20 minutes.
Assuming I'm running this script for a few hours:
Would it look suspicious to the owner of the website?
Would it in any way "hurt" the website, or is it just equivalent to refreshing the page in the browser?
I just don't want someone, somewhere, to think something malicious is happening when I'm just playing around learning Python. The data isn't even important; I just want to see whether the script I wrote works. I figured I'd ask here before running it.
Thanks in advance for any replies.
Although you don't want to do any harm, you can still misconfigure the script by accident (we are only human), generate suspicious activity, and end up with a real person spending time investigating your activity (I'm not kidding, these things really happen).
My suggestion is to use a testing service like https://httpbin.org/ to play with the requests library. httpbin was actually created by the same person who started the requests library (Kenneth Reitz).
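As a rough sketch, a polling loop against httpbin.org could look like this (the 10-minute interval follows the question; the endpoint is just a safe playground):

    import time
    import requests

    URL = "https://httpbin.org/get"   # harmless test endpoint
    INTERVAL = 10 * 60                # 10 minutes, in seconds

    while True:
        response = requests.get(URL, timeout=30)
        data = response.text
        print(data)                   # replace with your own value extraction
        time.sleep(INTERVAL)          # be polite: wait before the next request

One request every 10 or 20 minutes is a very light load, roughly equivalent to an occasional browser refresh, so a loop like this is unlikely to hurt a website or look alarming.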
My question is: how would one input data into, and retrieve data from, various websites (not using an API)?
Is there a module that can search, or act like a human, filling in the relevant fields in order to (as said before) retrieve data?
Sorry if my question is hard to follow; here's an example of what I am trying to accomplish:
Directing an AI towards a specific website.
Inputting data into the search field.
Then finally, retrieving said data once the previous steps have run.
I'm fairly new to this field of manipulating websites via APIs or other (unknown to me) code, so sorry if I missed anything!
You can use the
mechanize,
BeautifulSoup,
urllib, and
urllib2
modules in Python. I suggest the mechanize module: it lets you scrape websites from a Python program, and it essentially behaves like a browser driven by Python code.
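A minimal mechanize sketch of the "direct, fill in a field, retrieve" flow described above (the URL, the form index, and the field name "q" are placeholder assumptions; inspect the real page to find the actual ones):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)                  # ignore robots.txt for testing
    br.addheaders = [("User-agent", "Mozilla/5.0")]

    br.open("http://example.com/search")         # 1. direct it at a website
    br.select_form(nr=0)                         # assume the first form is the search form
    br["q"] = "what to search for"               # 2. input data into the search field
    response = br.submit()                       # 3. submit and retrieve the result
    html = response.read()

You would then feed html into BeautifulSoup to extract the values you care about.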
It may seem like a stupid question, but I really can't find information about this on Google.
I am trying to develop a server-client application in Python, and I am looking for the correct way to save data on the client's computer.
When a client clicks the "Register" button, I want their computer to save the information so they can be logged in automatically the next time they open the program.
Should I create a new file, save the data to it on the computer, and then load and read it back later? I really don't know if this is the correct way.
There are different approaches to this problem. You could save the credentials/token/... to the local disk, but keep in mind that in some cases this might be considered a security risk. If you do so, you should probably store it under the user's home folder, to keep it away from other (non-admin/root) users at least.
You could also store it encrypted with a "master password" (like Firefox does if you enable that feature).
Or you could connect to a third-party authentication server and store your information there. It all depends on the use case you are implementing, as well as the complexity required.
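For the first approach, a minimal sketch of saving a token under the user's home folder with owner-only permissions (the directory and file names here are arbitrary choices, and the token is stored unencrypted, which may itself be a security risk):

    import os

    # Arbitrary app-specific location under the user's home folder.
    TOKEN_DIR = os.path.join(os.path.expanduser("~"), ".myapp")
    TOKEN_PATH = os.path.join(TOKEN_DIR, "token")

    def save_token(token):
        if not os.path.isdir(TOKEN_DIR):
            os.makedirs(TOKEN_DIR)
        with open(TOKEN_PATH, "w") as f:
            f.write(token)
        os.chmod(TOKEN_PATH, 0o600)   # readable/writable by the owner only

    def load_token():
        try:
            with open(TOKEN_PATH) as f:
                return f.read()
        except IOError:
            return None               # no saved login yet

On startup, the client calls load_token(); if it returns something, it attempts the auto-login, otherwise it shows the register/login screen.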
I want to create a little program with the following features.
Use proxies in the format proxy:port:username:password
Choose a proxy sequentially from a list
Open http://example.com
Fill in details using data from data.txt (CSV)
Export cookie, username, password, and email address --> cookie.txt
Delete Cookies
Log into the associated email account and confirm the account by visiting the link sent to that email address.
Then cycle back to step 1 again.
I have read several similar questions on Stack Overflow.
I planned to use Selenium for this program, but then I read this comment on How to save and restore all cookies with Selenium RC?:
"the get_cookie method doesn't provide the path, domain, and expiry date for each cookie, so it isn't possible to fully restore those parameters with create_cookie. Any other ideas?"
And I won't be able to manipulate cookies using the method described here: http://hub.tutsplus.com/tutorials/how-to-build-a-python-bot-that-can-play-web-games--active-11117
I want to know the easiest way to tackle this problem. I plan to run a single-threaded application.
I don't know Selenium, but why not use mechanize and requests? Both are awesome.
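As a rough sketch of one cycle with requests alone (the proxy line format follows the question; the registration URL, the form field names, and the CSV column order are all placeholder assumptions, and the email-confirmation step would need something like imaplib on top of this):

    import csv
    import requests

    def parse_proxy(line):
        # Input format from the question: proxy:port:username:password.
        # Note: this simple split breaks if the password contains ":".
        host, port, user, password = line.strip().split(":")
        url = "http://%s:%s@%s:%s" % (user, password, host, port)
        return {"http": url, "https": url}

    with open("proxies.txt") as f:
        proxies = [parse_proxy(line) for line in f if line.strip()]

    with open("data.txt") as f:
        rows = list(csv.reader(f))

    for proxy, row in zip(proxies, rows):       # one proxy per account, in order
        username, password, email = row[:3]     # assumed CSV column order
        session = requests.Session()
        session.post("http://example.com/register",     # placeholder URL
                     data={"user": username, "pass": password, "email": email},
                     proxies=proxy)
        # Export the session's cookies along with the account details.
        with open("cookie.txt", "a") as out:
            out.write("%s,%s,%s,%s\n" %
                      (session.cookies.get_dict(), username, password, email))
        session.close()    # dropping the session discards its cookies

A fresh Session per account gives you the "delete cookies" step for free, and requests handles authenticated proxies natively via the proxies argument.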
I am currently doing a research project, and I am trying to figure out a good way to identify ads given access to the HTML of a webpage.
I thought it might be a good idea to start with AdBlock. AdBlock is a program that prevents ads from being displayed to the user, so presumably it has a mechanism for identifying things as ads.
I downloaded the source code for Adblock Plus, but I find myself completely lost in all of the files. I am not sure where to look for this detection mechanism, so I was wondering if anyone has advice on where to start. Alternatively, if you have dealt with AdBlock before and are familiar with it, I would appreciate any extra information.
For example, if the webpage needs to be rendered in a real browser for AdBlock to work, there are programs that can automate loading a webpage, so that wouldn't be a problem; but I am not sure how to figure out whether that is what AdBlock actually does in the first place.
Note: AdBlock is written in Python and Perl :)
Thanks!
I would advise you to first have a look at how Adblock filter rules are written.
Then, once you get an idea of this, you can start parsing the Adblock lists that are available in various languages to suit your needs.
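If you go the Python route, one third-party library that parses these filter lists is adblockparser (suggesting it here as an assumption that it fits your research setup; it is not part of AdBlock itself). A minimal sketch:

    from adblockparser import AdblockRules

    # A couple of EasyList-style filter rules; real lists contain thousands,
    # and you would normally load them from a downloaded list file.
    raw_rules = [
        "||ads.example.com^",
        "/banner/*/img",
    ]
    rules = AdblockRules(raw_rules)

    print(rules.should_block("http://ads.example.com/banner.png"))   # True
    print(rules.should_block("http://example.com/static/logo.png"))  # False

This mirrors the core of AdBlock's detection mechanism: requests whose URLs match a filter rule are classified as ads, which you can use to label elements in the HTML you are analyzing.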