Extract Author Name and Abstract from Python Code [closed] - python

In my Python project, I have a list of cited papers, and for each paper I need its author name, abstract, and citation count from Google Scholar. I was using scholarly (PyPI) like this:
from scholarly import scholarly  # pip install scholarly

search_pub = scholarly.search_pubs(paperName)
docInfo = next(search_pub)
but now I'm getting this error:
Exception: Cannot fetch the page from Google Scholar.
It seems like they've blocked my IP due to multiple requests. Now I'm unable to find any other programmatic way to extract this info. I have a list of paper references to extract data for.
Can anyone suggest a Python library or guide me in writing some code for this?

You can just wait for this temporary ban to expire and keep going. Make sure to insert a time.sleep(...) or similar in your code to stay under their rate limit. Google Scholar has no official API, so scraping is your only option if this is the data you want.
(I am not recommending that you scrape, and please note that Google Scholar disallows robots through their robots.txt)
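If you do go the scraping route with scholarly, a minimal sketch might look like the following. The paper title is a placeholder, and the exact result fields (such as 'bib' and 'num_citations') can vary between scholarly versions:

import random
import time

from scholarly import scholarly

paper_names = ["An example cited paper title"]  # placeholder list of references

records = []
for name in paper_names:
    pub = next(scholarly.search_pubs(name))  # first search hit for this reference
    bib = pub.get("bib", {})
    records.append({
        "authors": bib.get("author"),
        "abstract": bib.get("abstract"),
        "citations": pub.get("num_citations"),
    })
    # Sleep a random amount between queries to stay under Google Scholar's rate limit.
    time.sleep(random.uniform(30, 90))

print(records)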

Google Scholar blocks your IP if you query too much or too often. Even if you make your program sleep, don't sleep for a fixed period, since they can detect that too; Google may treat it as a DoS (Denial of Service) attack. Even if you randomise your sleep time, it will eventually flag you if you make too many queries. One workaround is a rotating proxy service; look online, there are plenty of free ones. Many also provide User-Agent strings, and if you set a different one on each query, you should be fine as far as I know.
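scholarly itself ships a ProxyGenerator helper that can route its requests through free rotating proxies. A rough sketch (free proxies are often slow or dead, so treat this as best effort):

from scholarly import ProxyGenerator, scholarly

# Route scholarly's traffic through a pool of free rotating proxies.
pg = ProxyGenerator()
if pg.FreeProxies():
    scholarly.use_proxy(pg)

results = scholarly.search_pubs("An example cited paper title")  # placeholder title
print(next(results))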

Related

What Python tools can I use to write a scraper of a password-protected webpage? [closed]

Suppose there is a password-protected website that I want to access to scrape some info from and put it into a spreadsheet. For example, it could be my personal credit card account page, and I would be scraping info about the latest transactions.
A variation of this would be if the site allowed downloading the transaction info as a CSV file, in which case I would want to download that file.
If I want to write such a scraper in Python, what packages should I use for the task? Does it depend on how a specific website is implemented, i.e. might I need one tool to scrape one site and another tool to scrape another?
Thank you
I actually did something very similar to this, but in Node. Are you definitely set on doing this in Python?
If you want to stick with Python, take a look at these modules:
BeautifulSoup
requests
Someone also wrote a really nice module combining the two above:
RoboBrowser
If you would like to venture down the Node route, take a look at this:
nightmarejs
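As a rough illustration of the requests + BeautifulSoup approach for a form-based login: the URLs and form field names below are made up, and a real site may also use CSRF tokens, redirects, or JavaScript, in which case RoboBrowser or a headless browser is a better fit.

import csv

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://example.com/login"        # hypothetical login form
DATA_URL = "https://example.com/transactions"  # hypothetical page behind the login

with requests.Session() as session:
    # Submit the login form; the field names depend on the actual site.
    session.post(LOGIN_URL, data={"username": "me", "password": "secret"})

    # The session keeps the auth cookie, so this request sees the protected page.
    soup = BeautifulSoup(session.get(DATA_URL).text, "html.parser")

    # Dump the first table's rows into a CSV "spreadsheet".
    with open("transactions.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in soup.select("table tr"):
            writer.writerow(cell.get_text(strip=True) for cell in row.find_all(["td", "th"]))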

Using Python To Interact with Webpages [closed]

In my current job we have a web-based business intelligence tool, and every morning I have to create data extracts from the system and paste them into a PowerPoint presentation. I want to automate this because it's repetitive and time-consuming (we have also had several redundancies and I have been allocated the analyst's work, so I'd also like to get home before 10pm :)). The bottleneck in generating these reports is running them on the website and then exporting the results to Excel, as this manual process can take anything from 10 minutes to an hour of waiting.
I would like to create a script that will open the web page, make selections in listboxes containing information such as location, product, etc. as well as in a date chooser, press an apply button, and then export the report once it has generated. This would happen during the period when no one is in the office, so the files would be ready for me to analyse when I come in, rather than my having to generate the reports first.
A second, smaller question: is there a quick way to identify the listboxes using Firefox or Internet Explorer so that they can be referenced in the code?
Is this possible in Python?
Our IT department is also quite strict, so for example I cannot install new software, but I can install libraries for Python.
Could anyone point me in the direction of sample code, particularly referencing listboxes or date objects?
Thank you very much for your time
All of this can be automated using Selenium [1]. If you know the class name/ID etc. of the listboxes, Selenium allows you to send click events to the browser to check/uncheck listboxes. Read up on filling in HTML forms using Selenium [2]. You can find the relevant code in the documentation links below.
[1] http://selenium-python.readthedocs.org/
[2] http://selenium-python.readthedocs.org/en/latest/navigating.html#filling-in-forms
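A minimal sketch of what that looks like with the current Selenium API. The URL, element IDs, and option labels below are hypothetical stand-ins for whatever the BI tool actually uses; you can find the real IDs by right-clicking an element in Firefox or IE and using the browser's inspect/developer tools.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
driver.get("https://reports.example.com")  # hypothetical BI tool URL

# Choose values in the listboxes; the IDs and labels are made up.
Select(driver.find_element(By.ID, "location")).select_by_visible_text("London")
Select(driver.find_element(By.ID, "product")).select_by_visible_text("Widgets")

# Fill in the date chooser and run the report.
driver.find_element(By.ID, "report-date").send_keys("2024-01-31")
driver.find_element(By.ID, "apply").click()

# Once the report has rendered, trigger the export.
driver.find_element(By.ID, "export-to-excel").click()
driver.quit()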

Is there any URL info / meta data webservice API? [closed]

Is there any known URL info API which will provide data like title, description, content type, image, etc.? I researched a bit but have not come across any such API, so eventually I ventured into creating something like this from scratch.
Such an API could be consumed by various web apps that need to display URL information. A typical real-world example is what Facebook does when you share/attach a link in a status update.
Suggestions are welcome, as they will save me the effort of maintaining such a web service myself.
Edit:
Found a few good sources which may be helpful:
Protonet - The Art Of Turning URLs Into A User Readable Preview
Using YQL to do a quick fetch
Iframely web service
Edit:
After much research, and to address my specific use case, I created my own API for this, hosted on Google App Engine. Anyone looking for this may get in touch.
There's no such API to the best of my knowledge, simply because accessing such an API wouldn't be much simpler than just fetching the page itself.
I like the idea and I think you should try to implement it and "sell" it to other web developers. Of course there will be a trust issue -- will they trust you not to lie?
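For what it's worth, fetching the page yourself and pulling out the usual preview fields is only a few lines with requests and BeautifulSoup. A rough sketch; real pages need error handling, encoding detection, and relative-URL resolution:

import requests
from bs4 import BeautifulSoup

def url_metadata(url):
    # Fetch the page and extract the fields a link preview typically needs.
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    def meta(name):
        tag = soup.find("meta", attrs={"property": name}) or soup.find("meta", attrs={"name": name})
        return tag.get("content") if tag else None

    return {
        "title": meta("og:title") or (soup.title.string if soup.title else None),
        "description": meta("og:description") or meta("description"),
        "image": meta("og:image"),
        "content_type": resp.headers.get("Content-Type"),
    }

print(url_metadata("https://example.com"))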

Open source Twitter clone (in Ruby/Python) [closed]

Are there any production-ready open source Twitter clones written in Ruby or Python?
I am more interested in feature-rich implementations, not just bare-bones Twitter-like messaging (e.g. APIs, FBConnect, notifications, etc.).
Thanks!
I know of Twissandra, which is an open source clone, though I doubt it meets your need for a feature-rich implementation.
http://github.com/rnielsen/twetter
From their readme:
Twetter is an implementation of the twitter.com API, designed for use in situations where internet access is not available but a large number of people have twitter clients and want to tell each other what they are doing, for example a RailsCamp, where it was first developed.
The current goal is to have it work with as many third party twitter clients as possible. It has currently been tested with Twitterific, TwitterFox, and Spaz on OSX.
The open source alternative to Twitter at http://identi.ca/ is built on the software at http://status.net/ , which appears to be written in PHP.
There is also http://code.google.com/p/jaikuengine/ , a microblogging platform for Google App Engine; it should serve as an example of a Python implementation.
Also look at http://www.typepad.com/go/motion/
Found two relevant projects:
http://github.com/insoshi/insoshi
http://github.com/dmitryame/echowaves/wiki
Sadly, both appear to be discontinued.

How to write a Web Service for Google App Engine? [closed]

I am really new to Python and I have been looking, with no luck, for an example of how to write a web service (XML/SOAP) in Python on Google App Engine.
Can anyone point me to an article or give me an example of how to do this?
I was curious about this myself, and not finding anything, I decided to try to get something working. The short answer is that a SOAP service can actually be done using the latest alpha ZSI library. However, it isn't simple, and I didn't do much more than a simple request, so it could fall apart with a complex type. I'll try to find time to write a tutorial on how to do it and edit this answer with more detail.
Unless this is a hard requirement, I would do what jamtoday says and go with a REST or RPC service. The SOAP route could be filled with trouble.
Update: for anyone interested, I've written a tutorial on how to deploy a SOAP service to Google App Engine. It is a long process, so I'm just linking to it instead of pasting it all here.
If you want to do something with App Engine specifically, there are libraries that will make it much faster on your end. I'd recommend looking at the XML-RPC and REST examples.
http://appengine-cookbook.appspot.com/recipe/xml-rpc-server-using-google-app-engine/
http://github.com/fczuardi/gae-rest/tree/master
I know this is an old thread, but just in case someone happens to read this:
I have just started an open source project for creating web services on GAE.
Project site: http://code.google.com/p/webserviceservlet/
Hope this is helpful.
EDIT:
Just noticed that this is a Python question and the linked project is a Java project...
Here is a Python Web Services project that might be helpful.
EDIT
And here is a SOAP consuming demonstration....
You could take a look at the Bottle framework. It's a Python framework with which you can easily create a REST API.
In my opinion, REST is definitely better than SOAP: it can be consumed easily by any software able to speak HTTP, and it's faster to implement.
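A tiny sketch of a JSON/REST endpoint in Bottle. The /papers resource is just an illustrative example, and on App Engine the app would be wired into the platform's WSGI handling rather than run with the built-in development server:

from bottle import Bottle, request, run

app = Bottle()
papers = {}  # in-memory store, purely for the sketch

@app.get("/papers/<paper_id>")
def get_paper(paper_id):
    # Bottle serialises returned dicts to JSON automatically.
    return papers.get(paper_id, {"error": "not found"})

@app.post("/papers/<paper_id>")
def add_paper(paper_id):
    papers[paper_id] = request.json
    return {"status": "created"}

if __name__ == "__main__":
    run(app, host="localhost", port=8080)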
