Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I have used Scrapy to scrape some text from a website, but I am not sure how to store it in SQLite. Can anyone help me with the code?
While you can find examples that use blocking operations to interact with the database, it is worth noting that Scrapy is built on top of the Twisted library. At its core there is a single thread running a single event loop for all operations, so when you do something like:
self.cursor.execute(...)
the entire system waits for a response from the database, including HTTP requests that are waiting to be executed.
Having said that, I suggest you check this code snippet: https://github.com/riteshk/sc/blob/master/scraper/pipelines.py
Using twisted.enterprise.adbapi.ConnectionPool is a little more complex than simple blocking database access code, but it fits the way Scrapy handles I/O operations.
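For reference, here is a minimal sketch of the simple blocking variant described above, using only the standard sqlite3 module. The class name, table layout, item fields ("url", "text") and default file name are assumptions made for illustration; they are not taken from the linked pipelines.py.

```python
import sqlite3


class SQLitePipeline:
    """Scrapy-style item pipeline that writes items to SQLite (blocking)."""

    def __init__(self, db_path="scraped.db"):
        # Hypothetical default file name; pass ":memory:" for a throwaway DB.
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: open the DB and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.cursor = self.conn.cursor()
        self.cursor.execute(
            "CREATE TABLE IF NOT EXISTS items (url TEXT, text TEXT)"
        )

    def process_item(self, item, spider):
        # This call blocks the single Twisted thread until SQLite returns.
        self.cursor.execute(
            "INSERT INTO items (url, text) VALUES (?, ?)",
            (item.get("url"), item.get("text")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

For a local SQLite file the blocking cost is often small, so this may be good enough for a small crawl. The non-blocking route keeps the same three methods but hands the insert to an adbapi.ConnectionPool (for example through its runInteraction method), so the query runs in a pool thread rather than in the event loop.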
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
With reference to my question: how can one submit data to, and retrieve data from, various websites without using an API?
Is there a module that acts like a human user, filling in the given fields and running a search, in order to retrieve data?
Sorry if my question is hard to follow; here is an example of what I am trying to accomplish:
Directing an AI towards a specific website.
Inputting data into the search field.
Finally, retrieving the resulting data once the previous steps have run.
I'm fairly new to manipulating websites via APIs or other code, so sorry if I missed anything!
You can use the
mechanize,
BeautifulSoup,
urllib,
urllib2
modules in Python. I suggest the mechanize module: it lets you scrape a website from a Python program and acts, in effect, as a simple browser driven by Python code.
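As a rough illustration of the "fill in a search field, then read the results" flow, here is a minimal sketch using only the Python 3 standard library (urllib.request and urllib.parse, which replaced urllib and urllib2). It is not mechanize or BeautifulSoup code. The URL, the field name and the User-Agent string are hypothetical, and it assumes the site's search form is submitted with a plain GET.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_search_request(base_url, field, query):
    """Build a GET request that fills in one search field."""
    url = base_url + "?" + urlencode({field: query})
    return Request(url, headers={"User-Agent": "example-bot/0.1"})


def extract_links(html):
    """Return all (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links
```

To run it against a real site you would pass the request to urllib.request.urlopen, decode the response body, and hand the text to extract_links. Forms that use POST, cookies or JavaScript need more than this, which is where mechanize or a similar library helps.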
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Hey guys I need help with this please. I'm desperate.
Develop a webpage downloader program in Python using basic socket
programming, as discussed in the class. The program receives a URL pointing to the base HTML file
as a command line argument, and then downloads this base file as well as all image objects within that
file. You only need to support nonpersistent connections.
Recall that in all class projects, you must use basic socket programming for networking and not
higher-level libraries.
Hint: You can use the HTMLParser library to parse the HTML file and identify all images there. More
information about this library can be found at https://docs.python.org/2/library/htmlparser.html.
This sounds like 2 assignments. Surely you don't expect us to have heard what was "discussed in class"!
Use sockets to perform HTTP GETs
Parse HTML
Work on them independently.
Glue them together when done.
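The two pieces above could be sketched roughly as follows in Python 3 (where HTMLParser lives in html.parser). This is only one possible shape, not the assignment's reference solution. The function and class names are made up for illustration, urllib.parse is used purely for splitting and joining URLs (the networking itself goes through the socket module), and HTTP/1.0 with "Connection: close" is assumed as the way to get nonpersistent connections.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def split_response(raw):
    """Split a raw HTTP response into (header text, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


def http_get(url, timeout=10.0):
    """Fetch url over a fresh connection that is closed after one response."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return split_response(b"".join(chunks))


class ImageCollector(HTMLParser):
    """Collects absolute URLs of all <img src=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(urljoin(self.base_url, src))
```

Gluing them together would mean calling http_get on the base URL, feeding the decoded body to ImageCollector, and then calling http_get once per collected image URL and writing each body to a file. Status codes, redirects and HTTPS are deliberately left out of this sketch.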
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Can I use urllib2 to open a webpage that contains a video (like a Vimeo page), and will this visit be counted as a view?
In general, yes. A request done with urllib2 will be a normal HTTP request and as such will be recognized as a normal “visit” for the server you are connecting to. Depending on what additional headers you set, you can even make yourself look like a common browser, so they won’t be able to filter you out either.
As far as video counts go however, I’m pretty sure that simply visiting the site—without executing any code on it, and without actually playing the video—will not increase the view counter. In addition, these sites employ some systems to prevent abuse of the counter too. So if you have the hope to be able to spoof real views and increment the view counter by repeatedly visiting the page, then you will be out of luck.
As for actually playing—if you are interested in the content instead of the view counter—then yes, you can use Python to get access to the video. Of course Python won’t be able to play it, but you can download it instead. There are scripts like this one that already do this for you too.
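To make the first point concrete, here is a small sketch of an ordinary HTTP request with an explicit header, written against Python 3's urllib.request (the successor to urllib2). The function names and the User-Agent value are placeholders chosen for illustration. As the answer says, a request like this only fetches the page; it does not run scripts or play the video.

```python
from urllib.request import Request, urlopen


def make_request(url, user_agent="Mozilla/5.0 (compatible; example)"):
    """Build a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Send the request and return (status code, body bytes)."""
    with urlopen(make_request(url), timeout=10) as resp:
        return resp.status, resp.read()
```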
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am trying to build a client-side chat that can send and read messages simultaneously.
One problem is that when I am writing a message and someone else sends something, it disrupts the message I am writing.
Another problem is that raw_input blocks, which prevents the user from reading new messages.
I tried to fix this by using msvcrt, which causes another problem (I can't see the message I am writing or edit it).
How can I fix these 3 problems?
Edit: without using threads.
I think you may need asynchronous sockets. They give you the ability to handle sending and receiving in a single thread.
Look here for asynchronous sockets in python. This will let you code it "bare bones" (i.e. keep most of your code and just use the sockets).
Another option is to use Twisted. This has some complications, it is a complete framework, but it gives you a lot of lift.
You can also try multi-threading. This is not trivial to do, however.
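As a sketch of the single-threaded, non-blocking idea, the receive side can be polled with the standard selectors module (Python 3). The helper names below are made up, and this covers only the socket half of the problem; keyboard input still has to be checked separately.

```python
import selectors
import socket


def make_selector(sock):
    """Register a non-blocking socket for read events."""
    sel = selectors.DefaultSelector()
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ)
    return sel


def poll_messages(sel, timeout=0.1):
    """Return decoded messages from every socket that is ready to read."""
    messages = []
    for key, _events in sel.select(timeout):
        data = key.fileobj.recv(4096)
        if data:
            messages.append(data.decode("utf-8", errors="replace"))
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(key.fileobj)
    return messages
```

A main loop would call poll_messages with a short timeout and, between calls, check the keyboard without blocking (on Windows, for instance, with msvcrt.kbhit() and msvcrt.getwch(), echoing characters yourself). Keeping the half-typed line in a buffer lets you redraw it after printing an incoming message, which addresses the "disrupted message" problem.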
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I have a Python script whose output is a list. Now I need to get it online and make it accessible to others. I looked at Django, but then I realised that it may be hard to create the UI. Is there a simple way to create a UI in Django and map it to an existing Python script? Right now I am not using SQL or anything like that. Or is there a simpler way to proceed?
I'd go with Flask or web.py.
Django pays off if you develop a large app; yours is not.
Probably all you need is two pages: one with an input form, and another with results. As long as your input is text, you should have little trouble taking input from a POST handler and passing it as is to your script.
Both microframeworks have tutorials: here's web.py's, and Flask's is right on the home page. Should get you started very quickly.
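To show the two-page shape (a form, then a results page), here is a dependency-free sketch written as a plain WSGI app with the standard library's wsgiref, used as a stand-in rather than Flask or web.py code. run_script is a hypothetical placeholder for your existing script, and the form field name "text" is an assumption.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

FORM = (
    '<form method="post" action="/">'
    '<input name="text"><button>Run</button></form>'
)


def run_script(text):
    """Hypothetical stand-in for the existing script; returns a list."""
    return text.split()


def render_results(form_body):
    """Turn a URL-encoded form body into an HTML list of results."""
    fields = parse_qs(form_body)
    results = run_script(fields.get("text", [""])[0])
    items = "".join(f"<li>{escape(str(r))}</li>" for r in results)
    return f"<ul>{items}</ul>"


def app(environ, start_response):
    """GET shows the input form; POST runs the script and shows the list."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        body = render_results(environ["wsgi.input"].read(size).decode("utf-8"))
    else:
        body = FORM
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]


def serve(host="127.0.0.1", port=8000):
    """Run a development server until interrupted."""
    make_server(host, port, app).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/ should show the form. In Flask or web.py the same idea is a single handler that reads the POSTed field and passes it to your script, with the framework taking care of the parsing done by hand here.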