Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Thanks in advance! I was hoping someone could point me in the right direction on how to scrape a searchable online database. Here is the URL: https://hord.ca/projects/eow/. If possible, I'd like to access all of the data in the site's database; I'm just not sure how to get at it using bs4. Maybe bs4 isn't the answer here, though. I'm still a relatively new Pythonista, so any help is greatly appreciated!
Since you are new, there is a combination of things you need to address. You need a good handle on where to look in the HTML, and you need to understand how the site works: what does it put into its URLs, and why? What are the class names of the important parts of the site that you will want to reference? How does it handle multi-page display, if it does so at all?
Once you know the website you are scraping inside and out, apply that knowledge when you write your automation.
For beginners I'd highly recommend this ebook: https://automatetheboringstuff.com/
It's a great read and easy to follow, even for a beginner in both Python and HTML. Even better, it's free to read on the site!
Chapter 11 is the part you are specifically looking for on web scraping. It will give you the rundown on what you need to look for and how to go about planning your code.
I highly recommend reading the whole thing once you are done with your current project, though.
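As a rough illustration of the "know the class names" advice, here is a minimal sketch that pulls the text out of table cells with a given class. It uses only the standard library's html.parser rather than bs4, and the markup in SAMPLE, the class names, and the helper names are all made up for the example; the real site's HTML would need to be inspected first.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> that carries the wanted class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.cells = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may list several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and self.wanted_class in classes:
            self._inside = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.cells[-1] += data


def extract_cells(html, wanted_class):
    """Return the stripped text of all matching cells, in document order."""
    parser = CellCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Hypothetical markup standing in for one page of search results.
SAMPLE = """
<table>
  <tr><td class="name">first entry</td><td class="value">10</td></tr>
  <tr><td class="name">second entry</td><td class="value">20</td></tr>
</table>
"""

print(extract_cells(SAMPLE, "name"))
```

With bs4 installed, the same idea would be a short call to find all cells by class; the point is that the class name comes from inspecting the page, not from the library.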
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am a Python beginner with a little experience in OO programming in Java and PHP, and in functional programming in R. My question is about the general usage of Python scripts in everyday use cases.
I want to learn how to think about and approach a problem when I run into a situation with my software where a script could help me out or improve something.
For instance, I've heard friends talking about their self-made Python scripts that even out the audio of movies to avoid loud outliers in explosive scenes, etc. Another example, in my case right now, is filtering out pictures that have no GPS time metadata for the timezone, so that I can sort these photos in line with the others.
I really want to get the essence and a recipe, based on the examples above, to better integrate Python into my everyday life and get an intuitive feeling for it (i.e. what would a simple script look like that takes a picture, reads its metadata, and does something with it, and where do I have to run the script so that I can call the function with these .JPG files as its arguments?).
I would also be glad if some of you could recommend some practical tutorials or literature.
Thank you in advance :)
P.S. I know this is not a concrete question; it is intended to get a glimpse of a wide field of usage and thinking. I am after the essential takeaway that motivates me and shows me the direction.
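One possible shape for such an everyday script, as a minimal sketch: it takes file paths as command-line arguments and splits them into files that appear to carry EXIF metadata and files that do not. The check is a crude heuristic (it only looks for the EXIF header bytes near the start of the file and does not read GPS or time fields), and the function names and the script name used below are invented for the example.

```python
import sys
from pathlib import Path

# JPEG files that carry EXIF data contain this marker in an early segment.
EXIF_TAG = b"Exif\x00\x00"


def has_exif(path, probe_bytes=65536):
    """Heuristic: report whether the EXIF header appears near the file start."""
    with open(path, "rb") as handle:
        return EXIF_TAG in handle.read(probe_bytes)


def split_by_metadata(paths):
    """Return (names with EXIF, names without EXIF) for the given paths."""
    with_meta, without_meta = [], []
    for path in paths:
        bucket = with_meta if has_exif(path) else without_meta
        bucket.append(Path(path).name)
    return with_meta, without_meta


def main(argv):
    with_meta, without_meta = split_by_metadata(argv)
    print("with EXIF:", with_meta)
    print("without EXIF:", without_meta)


if __name__ == "__main__":
    # Everything after the script name is treated as a file to inspect.
    main(sys.argv[1:])
```

Saved as, say, sort_photos.py, it would be run from a terminal in the photo folder as `python sort_photos.py *.JPG`; the shell expands the pattern and the files arrive in sys.argv. Reading the actual GPS and time fields would need a proper EXIF library.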
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
So I have made a project website with a couple of HTML files, CSS, and some JavaScript. I was thinking of adding login functionality and a few other things like that, so I set off to find out what would be good for it. Now it is hard for me because I can't decide which one to use. I saw Django for Python, but I would have to make a whole new project for it and start from scratch. I was just thinking I would add a database to this site using something. I am only doing some lightweight things. What should I use? Thank you.
Well, if you are not going to use Python with Django for this, you could use PHP and MySQL. I do it all the time for small sites. Basically, go to your hosting panel and create a new DB, then use PHP to write a connection script, and go from there as far as adding to and using the database. Every web host I know of supports this. If you go with Django, though, you can simply use the SQLite database it is set up with by default, or use something like MongoDB or any other DB out there, really. Please look up the exact documentation for whichever route you choose.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Might be a silly question... I want to use a Python script to get some data from a website every 10 or 20 minutes.
I'm using:
import requests
response = requests.get("http://somewebsite.php")
data = response.text
to get the data, and the rest is basically extraction of values from the string etc.
I would like to loop it and make a new request to the website every 10 or 20 minutes to get data.
Assuming I'm running this script for a few hours:
Would it look suspicious to the owner of the website?
Would it in any way 'hurt' the website or is it just equivalent to refreshing the website in the browser?
I just don't want someone, somewhere, to think something malicious is happening when I'm just playing around learning Python. The data is not even important; I just want to see if the script that I wrote works. I figured I might ask here before running it.
Thanks for any replies in advance.
Although you don't want to do any harm, you can misconfigure the script by accident (we are only human) and generate suspicious activity, and a real person might spend time investigating it (I'm not kidding, these things really happen).
My suggestion is to use a testing service like https://httpbin.org/ to play with the requests library. httpbin was actually created by the same person who started the requests library (Kenneth Reitz).
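To guard against the accidental misconfiguration described above, the loop can be given a fixed interval and a hard cap on the number of requests. The following is a minimal sketch using only the standard library (the requests library would work the same way); the function names, the default timeout, and the numbers in the usage comment are assumptions for illustration.

```python
import time
import urllib.request


def fetch_text(url, timeout=10):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def poll(url, interval_seconds, max_requests, fetch=fetch_text, sleep=time.sleep):
    """Fetch url at a fixed interval and stop after max_requests fetches."""
    results = []
    for attempt in range(max_requests):
        results.append(fetch(url))
        # Pause between requests, but not after the last one.
        if attempt < max_requests - 1:
            sleep(interval_seconds)
    return results


# Example call (not executed here): one request every 10 minutes, 12 in total.
# pages = poll("https://httpbin.org/get", interval_seconds=600, max_requests=12)
```

Passing fetch and sleep as parameters is a design choice that lets the loop be tried out with stand-in functions before it ever touches a real site.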
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
In reference to my question: how would one input data into, and retrieve data from, various websites (without using an API)?
Is there a module that acts like a human, searching through the given fields of a site in order to (as said before) retrieve data?
Sorry if my question is hard to follow; here's an example of what I am trying to accomplish:
Directing an AI towards a specific website.
Inputting data into the search field.
Then, finally, retrieving the resulting data once the previous steps have run.
I'm fairly new to manipulating websites via APIs or other (unknown) code, so sorry if I missed anything!
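You can use the mechanize, BeautifulSoup, urllib, and urllib2 modules in Python (urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request). What I suggest is the mechanize module. It lets you scrape a website from a Python program; more than that, it is essentially a browser driven by Python code.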
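For the "input data into the search field" step, many search forms simply send their fields in the URL's query string, in which case no browser emulation is needed. The sketch below builds such a URL with the standard library; the site, the path, and the field names q and page are hypothetical, and a form that submits via POST or relies on JavaScript would need a different approach (for example mechanize, as suggested above).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_search_url(base_url, **fields):
    """Return base_url with the given form fields encoded as its query string."""
    parts = urlsplit(base_url)
    query = urlencode(fields)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Hypothetical site and field names, as they might appear in a form's HTML.
print(build_search_url("https://example.com/search", q="old words", page=1))
# prints https://example.com/search?q=old+words&page=1
```

The resulting URL can then be fetched like any other page and the response parsed for the data of interest.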
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am currently doing a research project and I am attempting to figure out a good way to identify ads given access to the html of a webpage.
I thought it might be a good idea to start with AdBlock. AdBlock is a program that prevents ads from being displayed to the user, so presumably it has a mechanism for identifying things as ads.
I downloaded the source code for AdBlockPlus, but I find myself completely lost in all of the files. I am not sure where to start looking for this detection mechanism, so I was wondering if anyone had any advice on where to start. Alternatively if you have dealt with AdBlock before and are familiar with it, I would appreciate any extra information.
For example, if the webpage needs to be rendered in a real browser for AdBlock to work, that wouldn't be a problem, since there are programs that automate the loading of a webpage. I am just not sure how to figure out whether this is what AdBlock does in the first place.
Note: AdBlock is written in Python and Perl :)
Thanks!
I would advise you to first have a look at writing adblock filter rules.
Then, once you get an idea of this, you can start parsing adblock lists available in various languages to suit your needs.
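As a starting point for parsing such lists, here is a minimal sketch that sorts the lines of a filter list into broad categories. It reflects only a simplified reading of the Adblock Plus filter syntax (lines starting with ! are comments, @@ marks an exception, ## marks an element-hiding rule, everything else is treated as a blocking rule); the sample list and the function names are made up, and real lists use more syntax than this handles.

```python
from collections import Counter


def classify_filter(line):
    """Return a coarse category for one line of an Adblock-style filter list."""
    rule = line.strip()
    # Blank lines, "! ..." comments, and the "[Adblock Plus x.y]" header.
    if not rule or rule.startswith("!") or rule.startswith("["):
        return "comment"
    if rule.startswith("@@"):
        return "exception"
    if "#@#" in rule:
        return "element-hiding exception"
    if "##" in rule:
        return "element-hiding"
    return "blocking"


def summarize(filter_text):
    """Count how many lines of the list fall into each category."""
    return Counter(classify_filter(line) for line in filter_text.splitlines())


# Made-up example list, not taken from any real subscription.
SAMPLE_LIST = """\
[Adblock Plus 2.0]
! Title: made-up example list
||ads.example.com^
@@||example.com/ads/allowed.js
example.com##.banner-ad
/banner/*/img^
"""

print(summarize(SAMPLE_LIST))
```

For ad identification in raw HTML, the element-hiding rules are probably the most directly useful category, since the part after ## is a CSS selector that can be matched against the page.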