Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
I am writing a Python program to evaluate stock prices. I'm using this page on Yahoo! Finance to get my stock information. I want to get the stock symbols of the top five listings on the top gainers page.
Can someone either provide an example of how to get the top five stock symbols, or show me how to find the symbol element (using the data-reactid or any other method), preferably with Selenium?
Before this is flagged as a copy, I looked at the pages similar to this, but they did not solve my problem. Thanks in advance for any help!
I personally don't have a lot of experience with Selenium, but this sounds like a job that could be handled with either BeautifulSoup's find()/find_all() methods or Scrapy's XPath/CSS selectors.
For a beginner, I would recommend BeautifulSoup for a task like this. It makes it easy to target the page element you're looking for (in this case, the stock symbol with its data-reactid).
Hope this helped.
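As a rough illustration of targeting elements by attribute, here is a minimal sketch. It uses the standard library's html.parser as a dependency-free stand-in for BeautifulSoup's find_all(), and the sample markup is hypothetical: the real page's tags, classes, and data-reactid values will differ and need to be checked in the browser's inspector.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real gainers page will be structured differently.
SAMPLE_HTML = """
<a href="/help">Help</a>
<table>
  <tr><td><a data-reactid="10" href="/quote/AAA">AAA</a></td></tr>
  <tr><td><a data-reactid="20" href="/quote/BBB">BBB</a></td></tr>
  <tr><td><a data-reactid="30" href="/quote/CCC">CCC</a></td></tr>
  <tr><td><a data-reactid="40" href="/quote/DDD">DDD</a></td></tr>
  <tr><td><a data-reactid="50" href="/quote/EEE">EEE</a></td></tr>
  <tr><td><a data-reactid="60" href="/quote/FFF">FFF</a></td></tr>
</table>
"""


class SymbolParser(HTMLParser):
    """Collect the text of every <a> tag that has a data-reactid attribute."""

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and "data-reactid" in dict(attrs):
            self._capture = True

    def handle_data(self, data):
        if self._capture and data.strip():
            self.symbols.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self._capture = False


def top_symbols(html, limit=5):
    """Return the first `limit` symbols found in the given HTML string."""
    parser = SymbolParser()
    parser.feed(html)
    return parser.symbols[:limit]


print(top_symbols(SAMPLE_HTML))
```

If the page builds its table with JavaScript, the HTML returned by a plain HTTP request may not contain these elements at all, which is one reason Selenium is sometimes used instead.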
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I am trying to scrape this website with Scrapy. So far I have had to visit each link and extract the information from each one. I would like to know whether the site has an API that I can use (I don't know how to find it).
I would also like to know how I can obtain the latitude and longitude. Currently the map is shown, but I do not know how to obtain the numbers.
I appreciate any suggestions.
The website may be loading the data dynamically using JavaScript. Open your browser's dev tools, go to the Network tab, and look for any XHR calls that may be accessing an API. Then you can scrape from that API directly.
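Once such an XHR call has been spotted, reading it from Python might look roughly like the sketch below. The JSON layout and the field names ("results", "name", "lat", "lng") are assumptions for illustration only; the real response has to be inspected in the Network tab first.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Request a URL found in the Network tab and decode its JSON body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_coordinates(payload):
    """Pull (name, lat, lng) tuples out of a decoded payload.

    The keys used here are hypothetical; adjust them to the real response.
    """
    return [
        (item["name"], item["lat"], item["lng"])
        for item in payload.get("results", [])
        if "lat" in item and "lng" in item
    ]


# Hypothetical payload standing in for a real API response.
SAMPLE_PAYLOAD = json.loads(
    '{"results": ['
    '{"name": "Place A", "lat": 40.5, "lng": -3.25},'
    '{"name": "Place B"},'
    '{"name": "Place C", "lat": 41.0, "lng": 2.0}'
    "]}"
)

print(extract_coordinates(SAMPLE_PAYLOAD))
```

With a real endpoint, the call would be something like extract_coordinates(fetch_json(url)), where url is copied from the XHR entry in the dev tools.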
Closed 4 years ago.
Thanks in advance! I was hoping someone might be able to point me in the right direction as to how to scrape a searchable online database. Here is the URL: https://hord.ca/projects/eow/. If possible, I'd like to be able to access all of the data from the site's database; I'm just not sure how to access it using bs4... Maybe bs4 isn't the answer here, though. I'm still a relatively new Pythonista, so any help is greatly appreciated!
Since you are new, there is a combination of things you need to address. You need a good handle on where to look in the HTML, and you need to understand how the site works: what does it put into its URLs, and why? What are the class names of the important parts of the site that you will want to reference? And how does it handle multipage display (if it does so at all)?
Once you are intimately familiar with the website you are scraping, you will need to apply that knowledge when you write your automation.
For beginners I'd highly recommend this ebook: https://automatetheboringstuff.com/
It's a great read and easy to follow, even for a beginner in both Python and HTML. Even better, it's free to read on the site!
Chapter 11 is the part you are specifically looking for; it covers web scraping and will give you the rundown on what to look for and how to go about planning your code.
But I highly recommend you read the whole thing once you are done focusing on your current project.
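To make the "what does it put into its URLs" step concrete, here is a small sketch that takes a URL apart and generates the URLs for further result pages. The example URL and the parameter names ("term", "page") are made up; the real site's parameters have to be read off the address bar while clicking through it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def describe_url(url):
    """Return the path and the decoded query parameters of a URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


def page_urls(url, page_param, pages):
    """Build one URL per page number by rewriting a single query parameter."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    urls = []
    for page in pages:
        query[page_param] = [str(page)]
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


# Hypothetical search URL, as it might appear in the browser's address bar.
EXAMPLE_URL = "https://example.com/search?term=bell&page=1"

print(describe_url(EXAMPLE_URL))
print(page_urls(EXAMPLE_URL, "page", range(1, 4)))
```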
Closed 7 years ago.
In reference to my question, how would one input data into and retrieve data from various websites (without using an API)?
Is there a module that acts like a human, searching through the given fields in order to (as said before) retrieve data?
Sorry if my question is hard to follow; here's an example of what I am trying to accomplish:
Directing an AI towards a specific website.
Inputting data into the search field.
Then finally, retrieving the resulting data once the previous steps have run.
I'm fairly new to manipulating websites via APIs or other (unknown) code, so sorry if I missed anything!
You can use the mechanize, BeautifulSoup, urllib, and urllib2 modules in Python. I suggest the mechanize module. It lets you scrape a website from a Python program; moreover, it is essentially a browser driven through Python code.
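As a rough sketch of "inputting data into the search field" without a browser, the snippet below prepares a form submission using only the standard library (in Python 3, urllib and urllib2 were merged into urllib.request and urllib.parse). It is not mechanize, which additionally parses forms and keeps browser-like state. The form URL, the field name "q", and the User-Agent string are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(form_url, fields):
    """Prepare a POST request carrying the form fields, without sending it."""
    data = urlencode(fields).encode("utf-8")
    # Hypothetical User-Agent; some sites reject the default one.
    return Request(form_url, data=data, headers={"User-Agent": "example-bot/0.1"})


def submit(request):
    """Send the prepared request and return the response body as text."""
    with urlopen(request, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical form endpoint and field name.
request = build_search_request("http://example.com/search", {"q": "python"})
print(request.get_method(), request.full_url, request.data)
```

Calling submit(request) would perform the actual network call; the returned HTML can then be handed to a parser such as BeautifulSoup.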
Closed 9 years ago.
I know many questions have been asked in this same context, but I am not able to find a generic solution (one that works on most websites).
I want to search a website through the search box it provides and store the links generated as a result of my search query. But all the solutions I found are for one particular website only, and they didn't even store the results of the search query. Any idea how I can achieve this?
Thanks
Every website is different.
For example, website No. 1 might have called its search parameter 'q', while website No. 2 might have named its search parameter 'search'.
Examples:
http://example.com/search.php?search=
http://example.com/search.php?q=
A good approach would be to store every parameter name in a dictionary and iterate over it while getting the resulting links for every page.
To exemplify, you could do
pages = {'http://example.com/search.php?': 'q', 'http://example23.com/php_search?': 'search'}  # and so on
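One way the dictionary idea could be sketched in runnable form is shown below. The two sites are the placeholder URLs from this answer, and build_search_urls is a hypothetical helper name; fetching each URL and collecting the result links is left out.

```python
from urllib.parse import quote_plus

# Maps each site's search endpoint to the name of its query parameter.
SEARCH_PAGES = {
    "http://example.com/search.php?": "q",
    "http://example23.com/php_search?": "search",
}


def build_search_urls(pages, query):
    """Return one search URL per site, using that site's parameter name."""
    encoded = quote_plus(query)  # escape spaces and special characters
    return [f"{base}{param}={encoded}" for base, param in pages.items()]


for url in build_search_urls(SEARCH_PAGES, "web scraping"):
    print(url)
```

This only covers sites whose search works through a GET parameter; sites that post a form or load results with JavaScript would need a different entry per site.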
Closed 8 years ago.
I am currently doing a research project and I am attempting to figure out a good way to identify ads given access to the html of a webpage.
I thought it might be a good idea to start with AdBlock. AdBlock is a program that prevents ads from being displayed to the user, so presumably it has a mechanism for identifying things as ads.
I downloaded the source code for AdBlockPlus, but I find myself completely lost in all of the files. I am not sure where to start looking for this detection mechanism, so I was wondering if anyone had any advice on where to start. Alternatively if you have dealt with AdBlock before and are familiar with it, I would appreciate any extra information.
For example, if the webpage needs to be rendered in a real browser to use AdBlock, there are programs that will automate the loading of a webpage, so this wouldn't be a problem. But I am not sure how to figure out whether this is what AdBlock does in the first place.
Note: AdBlock is written in Python and Perl :)
Thanks!
I would advise you to first have a look at writing adblock filter rules.
Then, once you get an idea of this, you can start parsing adblock lists available in various languages to suit your needs.
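As a starting point for parsing such lists, here is a deliberately simplified sketch that sorts filter lines into four kinds based on their common prefixes and separators: "!" for comments, "@@" for exception rules, "##" for element hiding rules, and everything else as a blocking rule. The real filter syntax has more cases (options after "$", the list header line, other separators), so treat this as an assumption-laden outline. The sample rules are made up.

```python
def classify_filter(line):
    """Roughly classify one line of an Adblock-style filter list."""
    line = line.strip()
    if not line or line.startswith("!"):
        return "comment"
    if line.startswith("@@"):
        return "exception"
    if "##" in line:
        return "element-hiding"
    return "blocking"


# Hypothetical sample rules, not taken from any real list.
SAMPLE_RULES = [
    "! Title: sample list",
    "||ads.example.com^",
    "@@||example.com/allowed.js",
    "example.com##.ad-banner",
]

for rule in SAMPLE_RULES:
    print(classify_filter(rule), rule)
```

For identifying ads in static HTML, the element hiding rules are probably the most directly useful, since the part after "##" is a CSS selector that can be matched against the page.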