Python Webpage Downloader [closed]

Hey guys, I need help with this please. I'm desperate.
Develop a webpage downloader program in Python using basic socket programming, as discussed in class. The program receives a URL pointing to the base HTML file as a command-line argument, and then downloads this base file as well as all image objects within that file. You only need to support non-persistent connections.
Recall that in all class projects, you must use basic socket programming for networking, not higher-level libraries.
Hint: You can use the HTMLParser library to parse the HTML file and identify all the images there. More information about this library can be found at https://docs.python.org/2/library/htmlparser.html.

This sounds like two assignments. Certainly you don't expect that we heard what was "discussed in class"!
Use sockets to perform HTTP GETs.
Parse the HTML.
Work on them independently. Glue them together when done, as in the sketch below.
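A minimal sketch of how the two halves could fit together, assuming Python 3 (socket plus html.parser), a plain HTTP URL, and no redirect handling; fetching each discovered image with the same http_get call is left as the final step:

    import socket
    import sys
    from html.parser import HTMLParser
    from urllib.parse import urlparse

    class ImgParser(HTMLParser):
        # collect the src attribute of every <img> tag
        def __init__(self):
            super().__init__()
            self.images = []
        def handle_starttag(self, tag, attrs):
            if tag == "img":
                for name, value in attrs:
                    if name == "src" and value:
                        self.images.append(value)

    def http_get(host, path, port=80):
        # one non-persistent connection per object, as the assignment asks
        with socket.create_connection((host, port)) as sock:
            request = ("GET {} HTTP/1.0\r\nHost: {}\r\n"
                       "Connection: close\r\n\r\n").format(path or "/", host)
            sock.sendall(request.encode())
            response = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                response += chunk
        # split the headers from the body at the first blank line
        return response.partition(b"\r\n\r\n")[2]

    if __name__ == "__main__":
        url = urlparse(sys.argv[1])
        body = http_get(url.hostname, url.path).decode(errors="replace")
        parser = ImgParser()
        parser.feed(body)
        print(parser.images)  # fetch each of these with http_get the same way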

Related

Could you check it's possible (Selenium python automation + PHP) [closed]

Our system is developed in PHP, and one of our coworkers developed an Amazon automation program in Python.
I am wondering if it's possible to integrate them together?
If it is, please recommend ways I can do this.
https://github.com/jasonminsookim/order_automation/blob/master/src/amzn.py
Here's the code for the Amazon automation program.
Thank you
There are lots of ways to do this, but I would weigh what you have available to you and go from there. The temp-file solution is the most general and is a common interface pattern between any two or more languages, but you can get more exotic with pipes if performance is a major concern.
Temp-file
I guess the most rudimentary way to do this would be to have the Python program output some data to a file that can be read in by PHP, or vice versa.
Something like creating a directory called /orders where PHP puts in order.json files and Python takes those in, reads them, gets the result, then puts it back as an order-result.json. Essentially a temp-file system for communicating between the two, as sketched below.
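For concreteness, a hypothetical sketch of the Python side of that hand-off; the /orders directory and the order.json / order-result.json names are the ones invented above, and process() stands in for the real automation logic:

    import json
    import time
    from pathlib import Path

    ORDERS = Path("/orders")

    def process(order):
        # stand-in for the actual Amazon automation logic
        return {"order_id": order.get("id"), "status": "done"}

    while True:
        order_file = ORDERS / "order.json"
        if order_file.exists():
            order = json.loads(order_file.read_text())
            result = process(order)
            (ORDERS / "order-result.json").write_text(json.dumps(result))
            order_file.unlink()  # consume the request so it is handled only once
        time.sleep(1)            # simple polling; inotify would be less wasteful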
Pipes
Alternatively, depending on your setup, you could pipe results from Python into PHP with something like the subprocess module and a PHP CLI script that interfaces with your DB.
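A hedged sketch of that direction, assuming Python drives the hand-off; store-order.php is a made-up name for whatever PHP CLI script writes the result into your DB:

    import json
    import subprocess

    result = {"order_id": 123, "status": "done"}  # illustrative payload

    # hand the JSON to the PHP CLI script over a pipe
    proc = subprocess.run(
        ["php", "store-order.php"],
        input=json.dumps(result).encode(),
        stdout=subprocess.PIPE,
    )
    print(proc.stdout.decode())  # whatever the PHP script echoed back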

How would I go about pulling data from a website using Python? [closed]

In reference to my question: how would one be able to input data into and retrieve data from various websites (not using an API)?
Is there a module that can search, or act like a human, filling in the applicable fields, in an effort to (as said before) retrieve data?
Sorry if I'm making my question hard to follow; if so, here's an example of what I am trying to accomplish:
Directing an AI towards a specific website.
Inputting data into the search field.
Then finally, retrieving the data once those steps have run.
I'm fairly new to the field of manipulating websites via APIs or code, so sorry if I missed anything!
You can use the mechanize, BeautifulSoup, urllib, or urllib2 modules in Python. What I suggest is the mechanize module. It is like scraping a website through a Python program; moreover, it is essentially a browser driven through Python code.
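A minimal mechanize sketch along those lines; the URL and the form-field name "q" are illustrative assumptions, not from the original answer:

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)       # some sites refuse robots.txt-obeying clients
    br.open("http://example.com/search")
    br.select_form(nr=0)              # pick the first form on the page
    br["q"] = "my search terms"       # fill in the input named "q"
    response = br.submit()            # acts like clicking the submit button
    print(response.read())            # raw HTML of the results page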

How to write a crawler for a desktop [closed]

I'd like to write a program that indexes my PDF and music files on my hard drive (not a server). I plan to do this via Perl or Python, or both. I'll basically be writing a crawler for my desktop. The user interface will be in JavaFX, which I am quite fluent in; I've done a couple of projects in JavaFX. I have not done anything in Perl or Python beyond a few lines of code while teaching myself the syntax.
The question is: what topics should I start researching when embarking on writing a crawler? I've seen quite a number of crawler tutorials online, but they all do web-page indexing. Also, what modules should I look into?
In Python, to find the files you can use os.walk; the examples in the documentation are very helpful.
Assuming that you are looking to do more than just locate the files and get their names, you will need to look into extracting more information about their contents; there are Python libraries that can get text from PDF files, such as PDFMiner and pdfquery.
Likewise, there are numerous Python tools that can get you more information on your music files.
It all depends on how you are planning to index them.
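A small sketch of the os.walk half of this, assuming you index by file extension only; a PDFMiner or tag-reading pass would slot in where the comments note:

    import os

    MUSIC_EXTS = {".mp3", ".flac", ".ogg"}

    def index_files(root):
        # walk the tree and collect paths of PDFs and music files
        pdfs, music = [], []
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                ext = os.path.splitext(name)[1].lower()
                path = os.path.join(dirpath, name)
                if ext == ".pdf":
                    pdfs.append(path)    # a PDFMiner pass could pull text here
                elif ext in MUSIC_EXTS:
                    music.append(path)   # a tag library could read metadata here
        return pdfs, music

    pdfs, music = index_files(os.path.expanduser("~"))
    print(len(pdfs), "PDFs and", len(music), "music files found")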

Storing scraped data into an SQLite database [closed]

I have used Scrapy to scrape some text from a website, but I am not quite sure how to store it in SQLite. Can anyone help me with the code?
While you can find examples that use blocking operations to interact with the database, it is worth noting that Scrapy is built on top of the Twisted library, meaning that at its core there is only a single thread with a single event loop for all operations. So when you do something like:
self.cursor.execute(...)
the entire system is waiting for a response from the database, including the HTTP requests that are waiting to be executed, etc.
Having said that, I suggest you check this code snippet: https://github.com/riteshk/sc/blob/master/scraper/pipelines.py
Using twisted.enterprise.adbapi.ConnectionPool is a little more complex than simple blocking database-access code, but it plays well with the way Scrapy handles I/O operations.
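In that spirit, a hedged sketch of a Scrapy pipeline built on the ConnectionPool; the table name and item fields are made up for illustration, and the linked snippet is the fuller reference:

    from twisted.enterprise import adbapi

    class SqlitePipeline(object):
        def __init__(self):
            # check_same_thread=False because the pool runs queries in worker threads
            self.dbpool = adbapi.ConnectionPool(
                "sqlite3", "scraped.db", check_same_thread=False)
            self.dbpool.runOperation(
                "CREATE TABLE IF NOT EXISTS items (title TEXT, body TEXT)")

        def process_item(self, item, spider):
            # runInteraction hands the insert to a thread pool, so the event
            # loop (and the pending HTTP requests) never blocks on the DB
            d = self.dbpool.runInteraction(self._insert, item)
            d.addErrback(spider.logger.error)
            return item

        def _insert(self, cursor, item):
            cursor.execute("INSERT INTO items (title, body) VALUES (?, ?)",
                           (item.get("title"), item.get("body")))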

nodejs python or twisted [closed]

I'm new to web development and am going to make a website that responds with data received from requests to a web service (Facebook, for example). How do I choose what is more useful here?
Node.js has a callback model that allows you not to wait while gathering data for the user from other services (but I've broken my fingers and my brain trying to make a kind of class with inheritance in JavaScript, and the whole server drops after an unhandled error in a script).
Python is very convenient for working with different kinds of data, and it's more convenient for me as a former C++ developer.
Yesterday I read about Twisted, a Python framework that also uses callbacks.
Please help me choose what to use; I'm after better performance and simple code.
The callback model might make your code more verbose, but WAIT! There is a solution! Check out waitfor.
Anyway, if it's a personal project then no one is forcing you to use Node.js for web-app development. You should go with what makes you more comfortable. If you like developing in Python then go for it! :)
Why don't you try Django? It uses Python (which you said is more convenient) and is also very commonly used for web development.
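Since the callback style keeps coming up, a tiny illustrative Twisted sketch; the one-second timer stands in for a slow web-service call, and nothing here is from the answers above:

    from twisted.internet import reactor, defer

    def fetch_data():
        # return a Deferred immediately; the result arrives later
        d = defer.Deferred()
        reactor.callLater(1, d.callback, {"user": "example"})  # fake slow service
        return d

    def on_result(data):
        print("got", data)
        reactor.stop()

    d = fetch_data()          # no blocking wait here
    d.addCallback(on_result)  # runs when the "service" responds
    reactor.run()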
