How to access the search box of any website using Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I know many questions have been asked in this context, but I have not been able to find a generic solution (one that works on most websites).
I want to search a website through the search box it provides and store the links generated as a result of my search query. However, all the solutions I found work only for one particular website, and none of them store the results of the search query. Any idea how I can achieve this?
Thanks

Every website is different.
For example, website No. 1 might name its search parameter 'q', while website No. 2 might name it 'search'.
Examples:
http://example.com/search.php?search=
http://example.com/search.php?q=
A good approach would be to store every site's search URL and parameter name in a dictionary, then iterate over it, collecting the resulting links from each page.
For example, you could do:
pages = {'http://example.com/search.php?': 'q', 'http://example23.com/php_search?': 'search'}  # and so on
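The dictionary idea can be sketched roughly as follows. This is a minimal illustration rather than a generic solution: the names `SEARCH_PAGES`, `build_search_url`, `LinkCollector`, `extract_links` and `search_all` are made up for this example, the two URLs are the placeholder sites from above, and the fetch step assumes each site answers a plain GET request with static HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Search endpoint -> name of its query parameter (placeholder sites).
SEARCH_PAGES = {
    "http://example.com/search.php": "q",
    "http://example23.com/php_search": "search",
}


def build_search_url(endpoint, param, query):
    """Return the endpoint with the query encoded under the given parameter."""
    return endpoint + "?" + urlencode({param: query})


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all links found in an HTML string as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def search_all(query):
    """Run the query on every configured site (performs network requests)."""
    results = {}
    for endpoint, param in SEARCH_PAGES.items():
        url = build_search_url(endpoint, param, query)
        with urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        results[endpoint] = extract_links(html, url)
    return results
```

Note that `extract_links` returns every link on the page, including navigation links, so the result links would still need per-site filtering.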

Related

Is there any way to build a scraper that fetches contact info from different websites with different structures using python? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I need to build a scraper in Python that fetches contact info from different websites with different structures. I have tried, but since the websites have different structures, my code doesn't work for all of them.
Is this doable, or do I need to write code for each website individually?
There is no direct way of getting information from different websites, because they are built with different architectures. It is very rare for different websites to use the same class values and IDs.
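One partial workaround, offered here only as an assumption and not as something the answer above proposes, is to ignore the page structure and match patterns in the raw HTML. The sketch below does this for email addresses only. The name `find_emails` and the regular expression are illustrative; the pattern is deliberately loose and will miss obfuscated addresses.

```python
import re

# Loose email pattern (an illustrative assumption, not a full RFC 5322 parser).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(html):
    """Return unique email-like strings in the order they first appear."""
    seen = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen
```

Because it never looks at class values or IDs, the same function can be run against any page, but other kinds of contact info (names, postal addresses) would still need site-specific code.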

what is the better way to get the information from this website with scrapy? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am trying to scrape this website with Scrapy. So far I have had to follow each link and extract the information from each one. I would like to know whether the site has an API that I can use (I don't know how to find it).
I would also like to know how I can obtain the latitude and longitude. Currently the map is shown, but I do not know how to obtain the numbers.
I appreciate any suggestions.
The website may be loading the data dynamically using JavaScript. Open your browser's dev tools, go to the Network tab, and look for any XHR calls that may be accessing an API. If you find one, you can scrape from that endpoint directly.
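Once such an XHR endpoint has been found, reading it might look like the sketch below. Everything specific here is an assumption: the site in question is not known, so the functions `fetch_json` and `extract_coordinates` and the JSON keys `name`, `lat` and `lng` are hypothetical and would have to be replaced with whatever the real response contains.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Download and decode a JSON document (performs a network request)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_coordinates(records):
    """Pull (name, latitude, longitude) out of a list of record dicts.

    The keys "name", "lat" and "lng" are hypothetical; check the real
    response in the Network tab and adjust them.
    """
    coordinates = []
    for record in records:
        lat, lng = record.get("lat"), record.get("lng")
        if lat is not None and lng is not None:
            coordinates.append((record.get("name"), float(lat), float(lng)))
    return coordinates
```

The map coordinates are often part of the same JSON payload that fills the page, which is why inspecting the XHR response can answer both parts of the question.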

How do I get the latest message sent on a group chat on whatsapp web as a string using python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Pretext:
I'm trying to make an app that automates my Zoom classes. The links to the classes are shared in a WhatsApp group that I can open with WhatsApp Web.
What I want to do exactly:
I want to take the latest message in the group chat, check whether it is text, and then check whether it contains a link.
If it does, I want to extract the link, assign it to a variable, and finally open it in a browser to start the class.
The problem is that I have no idea how to get the latest message sent in the group chat.
Please help!
Put all the WebElements with the span tag in an array.
Cycle through it, checking whether each span has a data-icon property.
If you find one, cycle through its children using JS until you find an a tag with an href property. Save it and break.
To find the last one, just cycle in reverse order.
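The reverse-order search can be illustrated without a browser. The sketch below assumes the message texts have already been read from the page (for example from the span WebElements mentioned above) into a list of strings ordered oldest to newest; that step is not shown. The name `latest_link` and the URL pattern are illustrative assumptions.

```python
import re

# Simple pattern for http(s) links; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def latest_link(messages):
    """Return the first URL found in the newest message that contains one.

    `messages` is a list of message texts ordered oldest to newest.
    """
    for text in reversed(messages):
        match = URL_RE.search(text)
        if match:
            return match.group(0)
    return None
```

The returned string can then be passed to the standard library's `webbrowser.open` to open the class in the default browser.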

How would I go about pulling data from a website using Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
In reference to my question, how would one input data into, and retrieve data from, various websites (without using an API)?
Is there a module that acts like a human, filling in the given search fields, in order to (as said before) retrieve data?
Sorry if my question is hard to follow. Here's an example of what I am trying to accomplish:
Directing an AI towards a specific website.
Inputting data into the search field.
Then, finally, retrieving the resulting data once those steps have run.
I'm fairly new to manipulating websites via APIs or other (unknown) code, so sorry if I missed anything!
You can use the mechanize, BeautifulSoup, urllib, and urllib2 modules in Python (urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request). I suggest using the mechanize module: it lets you scrape a website from a Python program and, moreover, behaves like a simple browser driven by Python code.
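As a rough illustration of "inputting data into the search field" from code, the sketch below submits form fields with Python 3's standard `urllib` instead of mechanize, so it only covers plain HTML forms and does not emulate a browser. The names `build_form_request` and `submit_form`, and the field name used in the usage note, are assumptions for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(url, fields):
    """Build a POST request carrying the fields as URL-encoded form data."""
    data = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=data, headers=headers)


def submit_form(url, fields):
    """Send the form and return the response body (performs a network request)."""
    with urlopen(build_form_request(url, fields), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call such as `submit_form("http://example.com/search", {"q": "python scraping"})` would return the result page's HTML, which could then be parsed with BeautifulSoup. The field name `q` has to match the `name` attribute of the site's actual search input.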

Identify main text in website article [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I would like to know if there is a tool that, given the URL of a blog or web page, identifies and extracts the main text. An article page, say a blog post, may contain several blocks of text, only one of which is the article itself. Is there a way to identify and extract it?
Thank you.
There are three steps to this problem:
1. Retrieve the data from the URL
2. Extract the article text (removing ads ...)
3. Summarize the text
Step 1 is easily done with Python's urllib2.urlopen.
For step 2, if you know the structure of the website (main HTML tags and such), this can easily be done with tools such as BeautifulSoup. Removing ads in a generic way is a bigger subject; you can find some research on it online.
For step 3, creating a summary by extracting sentences is a well-known field. I think NLTK has some modules to do that. You can even take a look at a simple (and effective) approach I wrote a while back.
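The extraction step can be sketched with a simple heuristic when the site's structure is not known in advance. This is an assumption, not the method described above: it keeps the text of `<p>` elements, skips anything inside tags that usually hold navigation or scripts, and drops very short paragraphs. The names `ParagraphCollector` and `extract_main_text`, the skipped tag set and the `min_length` threshold are all illustrative choices.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements that are not inside skipped tags."""

    # Tags whose contents are assumed not to be article text.
    SKIPPED = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._skip_depth = 0
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_paragraph:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)


def extract_main_text(html, min_length=40):
    """Return paragraphs of at least min_length characters, in page order."""
    parser = ParagraphCollector()
    parser.feed(html)
    return [p for p in parser.paragraphs if len(p) >= min_length]
```

A length threshold is a crude stand-in for real ad and boilerplate removal, so pages with short paragraphs or text outside `<p>` tags would need a different rule.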
You could use an AJAX call to grab the content, but you have to be on the same domain. You can't copy someone else's content.
Alternatively, grab it with PHP using $content = file_get_contents('{filename}'); and then use an HTML tag (e.g. '<section>') to split it.
What are you using it for? If it is your own content, I would use AJAX and always put the content you want to grab in a tag with a specific class assigned. If it is someone else's content, you might want to ask their permission first.
