wikitools parsing error - python

I'm using the wikitools package to parse Wikipedia. I just copied this example from the documentation, but it's not working. When I run this code I get the following error:
Invalid JSON, trying requesting again.
Can you please help me? Thanks.
from wikitools import wiki
from wikitools import api
# create a Wiki object
site = wiki.Wiki("http://my.wikisite.org/w/api.php")
# define the params for the query
params = {'action':'query', 'titles':'Papori'}
# create the request object
request = api.APIRequest(site, params)
# query the API
result = request.query()

The "http://my.wikisite.org/w/api.php" is only an example, there is no MediaWiki under that domain. Try with "http://en.wikipedia.org/w/api.php" which searches in the English Wikipedia.

Related

How to fix the tainted source of data in Coverity issue

I'm trying to read a URL from a JSON file, which the Coverity report flags as tainted (an untrusted source of data). The issue is called URL Manipulation and it points at the URL attribute I take from the JSON.
Can anyone suggest ways to mitigate the URL Manipulation error in the Coverity report?
It means you need to parse/validate the URL string.
You can do this in a number of ways - either with your own regex, or with purpose-built libraries (urllib, validators).
For example:
from urllib.parse import urlparse

URL_TO_TEST = "https://www.google.com"

result = urlparse(URL_TO_TEST)
# a well-formed absolute URL has both a scheme and a network location
if not (result.scheme and result.netloc):
    raise ValueError("Invalid url string")
print(f"url: '{URL_TO_TEST}' is valid")
The Snyk page for the same type of issue provides some good info.
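If you would rather lean on a library than a hand-rolled check, the validators package mentioned above does the same job; a minimal sketch (assuming validators is installed from PyPI):
import validators

URL_TO_TEST = "https://www.google.com"

# validators.url() returns True for a well-formed URL and a falsy failure object otherwise
if not validators.url(URL_TO_TEST):
    raise ValueError("Invalid url string")
print(f"url: '{URL_TO_TEST}' is valid")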

Store RDF data into Triplestore via SPARQL endpoint using python

I am trying to save the data at the following URL as triples in a triple store for future queries. Here is my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re

url = 'http://gnafld.net/address/?per_page=10&page=7'
response = requests.get(url)
response.raise_for_status()

# pull the first address ID (GAACT...) out of the listing page
results = re.findall('\"Address ID: (GAACT[0-9]+)\"', response.text)
address1 = results[0]

# fetch the RDF description of that single address
new_url = "http://gnafld.net/address/" + address1
r = requests.get(new_url).content
print(r)
After I run the code above, I get output like this (screenshot of the raw RDF response omitted).
My question is how to insert this RDF data into a Fuseki server SPARQL endpoint. I tried code like this:
import rdflib
from rdflib.plugins.stores import sparqlstore

# the following SPARQL endpoint is provided by the GNAF website
endpoint = 'http://gnafld.net/sparql'
store = sparqlstore.SPARQLUpdateStore(endpoint)
gs = rdflib.ConjunctiveGraph(store)
gs.open((endpoint, endpoint))
for stmt in r:
    gs.add(stmt)
But it seems that it does not work. How can I fix this problem? Thanks for your help!
The answer you show in the image is in RDF triple format; it is just not pretty-printed.
To store the RDF data in an RDF store you can use RDFlib; here is an example of how to do that.
If you use a Jena Fuseki server, you should be able to access it from Python just as you would access any other SPARQL endpoint.
You may want to see my answer to a related SO question as well.
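A minimal sketch of that approach, reusing new_url from the first snippet: parse the downloaded RDF into an in-memory rdflib Graph, then copy the triples into a Fuseki dataset through its update endpoint. The localhost URLs and the dataset name "gnaf" are assumptions; adjust them to your own Fuseki setup, and change format= if the server returns something other than Turtle.
import requests
import rdflib
from rdflib.plugins.stores import sparqlstore

# download one address as RDF and parse it into an in-memory graph
# (assumes the server honours an Accept: text/turtle header)
data = requests.get(new_url, headers={"Accept": "text/turtle"}).text
g = rdflib.Graph()
g.parse(data=data, format="turtle")

# point an update-capable store at a local Fuseki dataset (dataset name assumed)
query_endpoint = "http://localhost:3030/gnaf/query"
update_endpoint = "http://localhost:3030/gnaf/update"
store = sparqlstore.SPARQLUpdateStore(query_endpoint, update_endpoint)
remote = rdflib.Graph(store, identifier=rdflib.URIRef("urn:example:gnaf"))

# copy every parsed triple into the remote store
for triple in g:
    remote.add(triple)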

Connecting to YouTube API and download URLs - getting KeyError

My goal is to connect to the YouTube API and download the URLs of videos by specific music producers. I found the following script in this video: https://www.youtube.com/watch?v=_M_wle0Iq9M. In the video the code works beautifully, but when I try it on Python 2.7 it gives me KeyError: 'items'.
I know KeyErrors can occur when a dictionary is used incorrectly or when a key doesn't exist.
I have checked the Google developers site for YouTube to make sure that 'items' exists, and it does.
I am also aware that using get() may help with my problem, but I'm not sure. Any suggestions for fixing my KeyError using the following code, or for improving my code to reach my main goal of downloading the URLs (I have a YouTube API key)?
Here is the code:
# these modules help with HTTP requests to YouTube
import urllib
import urllib2
import json

API_KEY = open("/Users/ereyes/Desktop/APIKey.rtf", "r")
API_KEY = API_KEY.read()

searchTerm = raw_input('Search for a video:')
searchTerm = urllib.quote_plus(searchTerm)

url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&q=' + searchTerm + '&key=' + API_KEY
response = urllib.urlopen(url)
videos = json.load(response)

videoMetadata = []  # declaring our list
for video in videos['items']:  # cycle through the items in the JSON response
    if video['id']['kind'] == 'youtube#video':  # make sure the item we are looking at is a video
        videoMetadata.append(video['snippet']['title'] +  # get the title and URL and put them in the list
                             "\nhttp://youtube.com/watch?v=" + video['id']['videoId'])

videoMetadata.sort()  # sorts our list alphabetically

print("\nSearch Results:\n")  # print out search results
for metadata in videoMetadata:
    print(metadata + "\n")

raw_input('Press Enter to Exit')
The problem is most likely a combination of two things: you are reading the API key from an RTF file instead of a plain text file, and you seem to be confused about whether to use urllib or urllib2, since you imported both.
Personally, I would recommend requests, but at a minimum I think you need to read() the contents of the response to get a string (and then parse that string with json.loads rather than json.load):
response = urllib.urlopen(url).read()
You can check what actually came back by printing the response variable.
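A minimal sketch of the same search using requests instead. The plain-text key file name APIKey.txt is an assumption (store the key without RTF formatting); printing the whole response when 'items' is missing shows the API's actual error message instead of a bare KeyError:
import requests

# read the API key from a plain text file (not .rtf) and strip the trailing newline
with open("/Users/ereyes/Desktop/APIKey.txt") as key_file:
    api_key = key_file.read().strip()

search_term = raw_input('Search for a video:')

# requests handles URL-encoding of the query parameters for us
resp = requests.get('https://www.googleapis.com/youtube/v3/search',
                    params={'part': 'snippet', 'q': search_term, 'key': api_key})
videos = resp.json()

if 'items' not in videos:
    # the error body explains why the request failed (bad key, quota, etc.)
    print(videos)
else:
    for video in videos['items']:
        if video['id']['kind'] == 'youtube#video':
            print(video['snippet']['title'])
            print('http://youtube.com/watch?v=' + video['id']['videoId'] + '\n')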

generating RSS feed...django/python

I'm not using a regular model, so I can't use Django's syndication framework. Instead, I used the low-level syndication utility feedgenerator to generate RSS feeds, as shown below.
feed = feedgenerator.Rss201rev2Feed(title=_("Feed by %s") % user.username,
                                    link="http://%s" % DOMAIN_NAME,
                                    description=_("RSS Feed provided by something.com"),
                                    language=user.language,
                                    author_name=user.full_name,
                                    feed_url="something")
for note in ObjectModel.published_objects.filter(user=user):
    feed.add_item(title=note.title,
                  link="",
                  pubDate=note.created,
                  description=note.note)
response = HttpResponse(feed.writeString('UTF-8'), mimetype='application/rss+xml')
return response
However, I couldn't find a good example of how to return this as a response.
response = HttpResponse(feed.writeString('UTF-8'), mimetype='application/rss+xml')
Apparently the code above is not right, because the browser does not recognize the result as an RSS feed. Could someone tell me what I should do to fix this problem?
This is working fine for me; the feed was recognized by the browser.
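For reference, a minimal sketch of the same code wrapped in a complete view. DOMAIN_NAME and ObjectModel are the names from the question; note that on Django 1.7+ the mimetype argument has been renamed content_type, and Django's feedgenerator expects the lowercase pubdate keyword:
from django.http import HttpResponse
from django.utils import feedgenerator

def user_feed(request, user):
    feed = feedgenerator.Rss201rev2Feed(
        title="Feed by %s" % user.username,
        link="http://%s" % DOMAIN_NAME,
        description="RSS Feed provided by something.com",
        language=user.language,
        author_name=user.full_name,
        feed_url="something")
    for note in ObjectModel.published_objects.filter(user=user):
        feed.add_item(title=note.title,
                      link="",
                      pubdate=note.created,  # lowercase keyword in Django's feedgenerator
                      description=note.note)
    # content_type replaces mimetype on Django 1.7 and later
    return HttpResponse(feed.writeString('UTF-8'),
                        content_type='application/rss+xml')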

Using the RESTful interface to Google's AJAX Search API for "Did you mean"?

Is it possible to get spelling/search suggestions (i.e. "Did you mean") via the RESTful interface to Google's AJAX search API? I'm trying to access this from Python, though the URL query syntax is all I really need.
Thanks!
The Google AJAX API doesn't have a spell-check feature (see this). You could use the SOAP service, but I think it's no longer available.
Alternatively, you can look at the Yahoo API, which does have a spell-check feature.
EDIT: check this, maybe it can help you:
import httplib
import xml.dom.minidom

data = """
<spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
<text> %s </text>
</spellrequest>
"""

word_to_spell = "gooooooogle"

con = httplib.HTTPSConnection("www.google.com")
con.request("POST", "/tbproxy/spell?lang=en", data % word_to_spell)
response = con.getresponse()

dom = xml.dom.minidom.parseString(response.read())
dom_data = dom.getElementsByTagName('spellresult')[0]
for child_node in dom_data.childNodes:
    result = child_node.firstChild.data.split()
    print result
If you're just looking for spelling suggestions, you might want to check out something like Wordnik: http://docs.wordnik.com/api/methods
