No module found ISBNLib using python3 and cgi - python

I am teaching myself how to use CGI for fun and have run into the error "isbnlib module not found". The module is definitely installed, and a similar script in that directory that uses isbnlib runs fine. What could be causing this error, and why?
My code:
#!/usr/bin/python3
# Send the HTTP header first, then the empty line that ends it
print("Content-Type: text/html")
print()
import cgitb
cgitb.enable()
import cgi
import html
import isbnlib
# Create instance of FieldStorage
form = cgi.FieldStorage()
# Get data from fields (default to '' if the field is missing)
gtitle = form.getvalue('Title', '')
# goom() returns a list of metadata records, not an (isbn, meta) tuple
records = isbnlib.goom(gtitle) if gtitle else []
meta = records[0] if records else {}
# Extract the ISBN, author and title from the first record
isbn = meta.get("ISBN-13", "")
author = ", ".join(meta.get("Authors", []))
title = meta.get("Title", "")
# Make the HTML page (values are escaped before insertion)
page = """<html>
<head><title>Book Maker</title></head>
<body>
<h1 style='background-color:black;color:white;'>
Generating author and title from Title
</h1>
<div style='background-color:black;color:white;'>
<p>ISBN: {}</p>
<p>Author: {}</p>
<p>Title: {}</p>
</div>
</body>
</html>
""".format(html.escape(isbn), html.escape(author), html.escape(title))
print(page)

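This can happen when the script runs in an environment that requires root/admin rights, or under a different user such as the web server's account, while the library was installed only for your own user and not system-wide as root/admin.
So install it system-wide for the same interpreter the shebang points to, e.g. sudo /usr/bin/python3 -m pip install isbnlib (the original suggestion was sudo pip install isbnlib), and you should get your results.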
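A small diagnostic CGI script can show which interpreter the web server actually runs and whether that interpreter can see isbnlib. This is a sketch that assumes the server runs the file through the same #!/usr/bin/python3 shebang as the original script. The function names and output format are only illustrative. If the reported user or interpreter differs from the one you used with pip, that mismatch is the likely cause.

```python
#!/usr/bin/python3
# Diagnostic CGI sketch: reports which user and interpreter the web server
# uses and whether a given module is importable from that interpreter.
import getpass
import html
import importlib.util
import sys


def current_user():
    """Best-effort name of the account running the script."""
    try:
        return getpass.getuser()
    except Exception:  # e.g. no USER/LOGNAME set and uid not in the passwd db
        return "unknown"


def diagnostics(module_name="isbnlib"):
    """Collect the interpreter details relevant to a 'module not found' error."""
    spec = importlib.util.find_spec(module_name)
    return {
        "user": current_user(),
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module_found": spec is not None,
        "module_origin": spec.origin if spec is not None else None,
        "sys_path": list(sys.path),
    }


def render(info):
    """Build a complete CGI response: header, empty line, HTML body."""
    lines = ["Content-Type: text/html", "", "<html><body><ul>"]
    for key, value in info.items():
        lines.append("<li>{}: {}</li>".format(html.escape(key), html.escape(str(value))))
    lines.append("</ul></body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(diagnostics()))
```

Put it in the same cgi-bin directory, open it in the browser, and compare the executable and sys_path it reports with what python3 -c "import sys; print(sys.executable, sys.path)" prints in your own shell.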

Related

Save image/table from a webpage while scraping

I need to scrape an image from https://web.archive.org/web/ for several websites,
for example Stack Overflow and Towards Data Science. My data looks like this:
URL
stackoverflow.com
towardsdatascience.com
I do not know how to extract the table/image contained in this element:
<div class="sparkline" style="width: 1225px;"><div id="wm-graph-anchor"><div id="wm-ipp-sparkline" title="Explore captures for this URL" style="height: 77px;"><canvas class="sparkline-canvas" width="1225" height="75" alt="sparklines"></canvas></div></div><div id="year-labels"><span class="sparkline-year-label">1996</span><span class="sparkline-year-label">1997</span><span class="sparkline-year-label">1998</span><span class="sparkline-year-label">1999</span><span class="sparkline-year-label">2000</span><span class="sparkline-year-label">2001</span><span class="sparkline-year-label">2002</span><span class="sparkline-year-label">2003</span><span class="sparkline-year-label">2004</span><span class="sparkline-year-label">2005</span><span class="sparkline-year-label">2006</span><span class="sparkline-year-label">2007</span><span class="sparkline-year-label">2008</span><span class="sparkline-year-label">2009</span><span class="sparkline-year-label">2010</span><span class="sparkline-year-label">2011</span><span class="sparkline-year-label">2012</span><span class="sparkline-year-label">2013</span><span class="sparkline-year-label">2014</span><span class="sparkline-year-label">2015</span><span class="sparkline-year-label">2016</span><span class="sparkline-year-label">2017</span><span class="sparkline-year-label">2018</span><span class="sparkline-year-label">2019</span><span class="sparkline-year-label selected-year">2020</span></div></div>
i.e. the image that shows the capture timeline across the years.
I would like to save this image/table for each website, if possible.
I tried to write some code, but it is missing this part:
import json
import pandas as pd
import requests
def my_function(file):
    urls = list(set(file.URL.tolist()))
    df_url = pd.DataFrame(columns=['URL'])
    df_url['URL'] = urls
    api_url = 'https://web.archive.org/__wb/search/metadata'
    for url in df_url['URL']:
        res = requests.get(api_url, params={'q': url})
        # part to scrape the image
    return
my_function(df)
Can you give me some input on how to get those images?
If you have each image's URL in the for loop, you can download the images with the urlretrieve() function from Python's urllib.request module.
First import it at the beginning of the script:
import os
from urllib.parse import urlparse
import urllib.request
Then download them:
for url in df_url['URL']:
    urllib.request.urlretrieve(url, os.path.basename(urlparse(url).path))
If you don't want to name the files after the URL's basename, you don't need the first two imports.
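Here is a slightly more defensive version of the loop above, as a sketch. It takes the file name from the URL path and falls back to a numbered name when the path has no basename. For example, https://stackoverflow.com/ ends in "/", so os.path.basename returns an empty string.

The following are assumptions for illustration:

* the image_urls list
* the images directory
* the .png default

The sparkline in the question is a <canvas> element that JavaScript draws in the page. There may therefore be no image file URL to fetch for it.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_for(url, index, default_ext=".png"):
    """Use the last path segment of the URL, or a numbered fallback name."""
    name = os.path.basename(urlparse(url).path)
    return name if name else "image_{}{}".format(index, default_ext)


def download_images(image_urls, target_dir="images"):
    """Download each URL into target_dir and return the saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(image_urls):
        path = os.path.join(target_dir, filename_for(url, index))
        urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```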

HTML form return displays python code instead of executing

I have only basic front-end experience and am running into trouble using method="get" with my Python script.
The entire Python source is returned to the browser instead of just the output of the print statement. I am using Python's SimpleHTTPServer and suspect I am missing some configuration, but I am having quite a bit of trouble finding a solution.
Here is the HTML:
<form name="search" action="/cgi-bin/test.py" method="get">
Search: <input type="text" name="searchbox">
<input type="submit" value="Submit">
</form>
Here is the Python (and also what gets returned to the browser when the form is submitted):
#!/usr/bin/env python3
import cgi
form = cgi.FieldStorage()
searchterm = form.getvalue('searchbox')
print(searchterm)
I know this is probably pretty basic, but I am stumped. I'd appreciate any guidance.
If you use Python 2, use CGIHTTPServer instead of SimpleHTTPServer:
python2 -m CGIHTTPServer
If you use Python 3, use http.server with the --cgi option:
python3 -m http.server --cgi
The script also has to be executable.
On Linux:
chmod +x cgi-bin/test.py
On Windows you have to associate the .py extension with Python. I don't use Windows, so I can't give more details.
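You can also set the permission from Python, for example in a setup script. This sketch adds the execute bits for owner, group and others, which is roughly what chmod +x does, although unlike the shell command it ignores the umask. The helper name make_executable is hypothetical.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for owner, group and others, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For the question's layout, call make_executable("cgi-bin/test.py") from the directory that contains cgi-bin.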
The script must send a header saying what type of data it sends (text, HTML, image, PDF, Excel, etc.), followed by an empty line that separates the header from the body.
print("Content-Type: text/html")
print()  # empty line between header and body
print(searchterm)
Without this header, the browser may treat the response as a file to download.
If you want to display text in the console, you may have to use "standard error", because "standard output" (and therefore print()) is sent to the browser:
import sys
sys.stderr.write(searchterm + '\n')
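#!/usr/bin/env python3
import cgi
import html
import sys
form = cgi.FieldStorage()
# default to '' so the string concatenation below cannot fail on None
searchterm = form.getvalue('searchbox', '')
# send text to the console (the server's stderr)
sys.stderr.write('searchterm: ' + searchterm + '\n')
# send text to the browser
#print("Content-Type: text/plain")  # send everything as plain text
print("Content-Type: text/html")  # send everything as HTML
print()  # empty line between header and body
print(html.escape(searchterm))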
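Python 3.13 and later no longer include the cgi module in the standard library (PEP 594 removed it), so cgi.FieldStorage is not available there. The sketch below does the same job with a different technique: it parses the GET query string with urllib.parse.parse_qs. It assumes the server passes the query in the QUERY_STRING environment variable, as CGI servers do. The helper name build_response is hypothetical.

```python
#!/usr/bin/env python3
# CGI sketch without the cgi module: read the GET query string directly.
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(query_string):
    """Return the full CGI output: header, empty line, then the HTML body."""
    params = parse_qs(query_string)
    # parse_qs maps each field name to a list of values; take the first one
    searchterm = params.get("searchbox", [""])[0]
    # stderr ends up in the server's console/log, not in the browser
    sys.stderr.write("searchterm: " + searchterm + "\n")
    body = "<p>You searched for: {}</p>".format(html.escape(searchterm))
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    print(build_response(os.environ.get("QUERY_STRING", "")))
```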

Unable to write data from StreamSets Jython Evaluator

I am trying to read data from a directory, parse it, and finally write it to another directory.
For this I am using the Jython Evaluator. Here is my code:
import sys
sys.path.append('/usr/lib/python2.7/site-packages')
import feedparser
for record in records:
    myfeed = feedparser.parse(str(record))
    for item in myfeed['items']:
        title = item.title
        link = item.link
    output.write(record)
I am able to write data to the output, but my requirement is to write the title and link parsed from the input record.
Any suggestions, please?
Thanks in advance.
You need to write the values into the record. In the code below, title and link are assigned to fields of record.value:
import sys
sys.path.append('/usr/lib/python2.7/site-packages')
import feedparser
for record in records:
    myfeed = feedparser.parse(str(record))
    for item in myfeed['items']:
        record.value["title"] = item.title
        record.value["link"] = item.link
    output.write(record)
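In both versions, title and link are overwritten on each pass through the inner loop, so only the last feed item survives. If every item should be kept, one option is to collect the items into a list first. This sketch uses only syntax that runs under both Jython 2.7 and CPython.

The field name items is hypothetical. The wiring also assumes the evaluator converts a list of dicts into a list of map fields, so check that against your StreamSets version.

```python
def extract_entries(parsed_feed):
    """Return one {'title': ..., 'link': ...} dict per item of a parsed feed."""
    entries = []
    for item in parsed_feed.get('items', []):
        # feedparser results behave like dicts, so .get works for missing keys
        entries.append({'title': item.get('title'), 'link': item.get('link')})
    return entries


# Hypothetical wiring inside the Jython Evaluator (not executed here):
# for record in records:
#     myfeed = feedparser.parse(str(record))
#     record.value['items'] = extract_entries(myfeed)
#     output.write(record)
```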

spynner doesn't load XHR data

I'm building a script to monitor a reporting service. Depending on how long the report takes to process, it either appears in the HTML or arrives via XMLHttpRequest.
As a tool to check the page I want to use spynner, which works perfectly for HTML, but I can't get it to work when the data comes via XHR.
The code for the test is the following:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__docformat__ = 'restructuredtext en'
from time import sleep
from spynner import browser
import pyquery
from PyQt4.QtCore import QUrl
from PyQt4.QtNetwork import QNetworkRequest, QNetworkAccessManager
from PyQt4.QtCore import QByteArray
def load_page(br):
    ret = br.load_jquery(True)
    print ret
    return 'Japan' in br.html
br = browser.Browser(
    debug_level=4
)
br.load('https://foobar.eu/newton/cgi-bin/cognos.cgi')
br.create_webview()
br.show()
#br.load("https://foobar.eu/newton/cgi-bin/cognos.cgi?b_action=xts.run&m=portal/cc.xts&m_folder=iA37B5BBC0615469DA37767D2B6F1DCF1")
#br.browse()
res = br.load("https://foobar.eu:443/newton/cgi-bin/cognos.cgi?b_action=cognosViewer&ui.action=run&ui.object=/content/folder[#name='DMA Admin Zone']/folder[#name='02. Performance Benchmark Module']/folder[#name='1. Reports']/report[#name='CQM_Test_3_HTML_Heavy_Local_Processing_Final']&ui.name=CQM_Test_3_HTML_Heavy_Local_Processing_Final&run.outputFormat=&run.prompt=true", 1, wait_callback=load_page)
d = str(pyquery.PyQuery(br.html))
if d.find("Japan") > -1:
    print 'We discovered Japan!'
else:
    print 'Japan is nowhere to be seen!'
sleep(10)
The commented-out URL is a page that contains a link to the report. When I click the report link by hand, the report works (via XHR). However, I can't get it to work via scripting.
The br.load_jquery call always returns None.
To help, here is part of the spynner debug trace from when I click the link by hand: http://fpaste.org/97583/13987135/
In Firebug I can clearly see the XHR response containing the string 'Japan'.
What am I missing?
Apparently, replacing the load_page function with the following code makes it work:
def load_page(br):
    br.wait(5)
    return 'Japan' in br.html
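A fixed br.wait(5) works when the XHR response arrives within five seconds, but gives up when the report takes longer. One alternative is a polling helper that checks repeatedly until a timeout. The function and parameter names below are hypothetical.

With spynner you would pass br.wait as the sleep function so the Qt event loop keeps running between checks. That wiring is an assumption based on the answer's use of br.wait and has not been verified here. The helper avoids Python 3-only syntax, so it also runs on the Python 2 that spynner needs.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5, sleep=time.sleep, clock=time.time):
    """Poll condition() until it returns True or timeout seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A possible call is found = wait_until(lambda: 'Japan' in br.html, timeout=60, sleep=br.wait).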

upload image to folder using python

I want to select an image in the browser and upload it to a folder using Python. I have tried a variety of solutions posted on the forum, but none of them worked in my case. Please tell me what needs to be corrected. Thanks, all, for your quick help.
I'm getting this error:
raise AttributeError(attr)
AttributeError: has_key
#!/usr/bin/env python
import cgi, os
import cgitb; cgitb.enable()
import datetime
import webapp2
from google.appengine.ext import ndb
from google.appengine.api import users
guestbook_key = ndb.Key('Guestbook', 'default_guestbook')
class Greeting(ndb.Model):
    author = ndb.UserProperty()
    content = ndb.TextProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.out.write('<html><body>')
        greetings = ndb.gql('SELECT * '
                            'FROM Greeting '
                            'WHERE ANCESTOR IS :1 '
                            'ORDER BY date DESC LIMIT 10',
                            guestbook_key)
        for greeting in greetings:
            if greeting.author:
                self.response.out.write('<b>%s</b> wrote:' % greeting.author.nickname())
            else:
                self.response.out.write('An anonymous person wrote:')
            self.response.out.write('<blockquote>%s</blockquote>' %
                                    cgi.escape(greeting.content))
        self.response.out.write("""
          <form enctype="multipart/form-data" action="/sign" method="post">
            <p>File: <input type="file" name="file1"></p>
            <p><input type="submit" value="Upload"></p>
          </form>
        </html>""")
class Guestbook(webapp2.RequestHandler):
    def post(self):
        form = cgi.FieldStorage()
        # A nested FieldStorage instance holds the file
        #file = models.FileField(upload_to='documents/', max_length=5234,blank=True, null=True,)
        # docfile = forms.FileField(label='', show_hidden_initial='none',required=True,)
        fileitem = str(self.request.get('file1'))
        # Test if the file was uploaded
        if self.request.has_key('file1'):
            # strip leading path from file name to avoid directory traversal attacks
            fn = os.path.basename(fileitem.file)
            open('files/' + fn, 'wb').write(fileitem.file.read())
            message = 'The file "' + fn + '" was uploaded successfully'
        else:
            message = 'No file was uploaded'
        print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
</body></html>
""" % (message,)
app = webapp2.WSGIApplication([
    ('/', MainPage),
    ('/sign', Guestbook)
], debug=True)
You need to stop here and go back and read the introductory documentation on App Engine and the Python runtime. If you read through it, you will see the section on the Python runtime, the sandbox, and its restrictions.
That section explains that you cannot write to the filesystem in App Engine. It is worth noting the other restrictions while you are at it.
As for where in your code the error occurs, you should at least include the stack trace, look at the particular lines where the error occurs, and then ask specific questions about them, rather than dumping all of your code and stating which error you got.
At the moment I don't see much point in looking at the has_key error: it is self-explanatory, and the rest of what you are trying to do won't work anyway.
Your GAE Python project files are read-only. You can only change them when you update your project using appcfg.py or push-to-deploy.
However, you can use Google Cloud Storage folders or subdirectories to upload, write, or overwrite files.
Docs: https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
If you use your app ID's default bucket for your folders, you have 5 GB of free quota.
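The question's code comment mentions stripping the leading path to avoid directory-traversal attacks, but it has two problems:

* os.path.basename(fileitem.file) passes the file object rather than its name.
* On Linux, os.path.basename does not strip the Windows-style backslash paths that some browsers send as the file name.

A stricter helper is sketched below, assuming you run it somewhere you can actually write files. Per the answers, that rules out the App Engine sandbox. The function name and the rejection rules are illustrative. The same cleaned name could also serve as an object name inside a Cloud Storage bucket.

```python
import posixpath


def safe_upload_name(filename):
    """Reduce a client-supplied file name to a bare, safe base name.

    Returns None when nothing usable is left.
    """
    if not filename:
        return None
    # Treat backslashes as separators too, so C:\Users\me\cat.jpg -> cat.jpg
    name = posixpath.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        return None
    return name
```

In a handler it would be applied to the uploaded field's file name (for example fn = safe_upload_name(fileitem.filename)), not to the file object.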
