I am using Python to find the main category of a Wikipedia page by repeatedly following the first entry in its category list. However, when I wrote the recursive code, it kept returning the result for the first argument I passed in, even though I try to change it inside the function.
import csv
from bs4 import BeautifulSoup
import urllib.request

string_set=[]

def get_first_category(url):
    k=urllib.request.urlopen(url)
    soup=BeautifulSoup(k)
    s=soup.find_all('a')
    for i in s:
        string_set.append(i.string)
    for i in range(-len(string_set), 0):
        if string_set[i] == ("Categories"):
            return (string_set[i + 1])

def join_with(k):
    return k.replace(" ","_")

def get_category_page(k):
    p=["https://en.wikipedia.org/wiki/Category:",k]
    return "".join(p)

def return_link(url):
    return get_category_page(join_with(get_first_category(url)))

file=open("Categories.csv")
categories=csv.reader(file)
categories=zip(*categories)

def find_category(url):
    k=get_first_category(url)
    for i in categories:
        if k in i:
            return [True,i[0]]
    return [False,k]

def main(url):
    if find_category(url)[0]:
        return find_category(url)[1]
    else:
        print(find_category(url)[1])
        return main(return_link(url))

print (main('https://en.wikipedia.org/wiki/Category:International_charities'))
The category csv is shared:
Categories.csv
Ideally, main should keep following the first category link until it reaches something listed in Categories.csv, but it just keeps going to the link I passed in.
def main(url):
    if find_category(url)[0]:
        return find_category(url)[1]
    else:
        print(find_category(url)[1])
        return main(return_link(url))
Here, you call the find_category function which is supposed to return the new url, then you print it, and then you call main with the original url again. You do not change the value of url based on the return value of find_category. So it keeps repeating itself.
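One more thing worth checking in the posted code: get_first_category appends to the module-level string_set on every call and then searches that list from the start, so it appears to find the "Categories" entry of the first page again each time. The sketch below reproduces that effect without any network access. The page contents and the function names first_category_shared and first_category_local are made up for illustration.

```python
# Hypothetical link texts standing in for two different wiki pages.
page_one = ["Help", "Categories", "Charities by country", "About"]
page_two = ["Help", "Categories", "Charities", "About"]

shared = []  # plays the role of the module-level string_set


def first_category_shared(link_texts):
    """Mimics the question: accumulate into a shared list, search from the start."""
    shared.extend(link_texts)
    for i, text in enumerate(shared[:-1]):
        if text == "Categories":
            return shared[i + 1]
    return None


def first_category_local(link_texts):
    """Searches only the links of the page that was passed in."""
    for i, text in enumerate(link_texts[:-1]):
        if text == "Categories":
            return link_texts[i + 1]
    return None


stale = [first_category_shared(page_one), first_category_shared(page_two)]
fresh = [first_category_local(page_one), first_category_local(page_two)]
print(stale)  # both entries come from page_one
print(fresh)  # one entry per page
```

If this matches what you see, building the list inside get_first_category instead of at module level should let each call see only its own page. Note also that in Python 3 zip returns an iterator, so categories can only be looped over once unless it is wrapped in list().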
I am trying to figure out how to only get the source code of the body of the function.
Let's say I have:
def simple_function(b = 5):
    a = 5
    print("here")
    return a + b
I would want to get (up to indentation):
"""
a = 5
print("here")
return a + b
"""
While it's easy in the case above, I want it to be agnostic of decorators, function headers, etc., while still including inline comments. So for example:
@decorator1
@decorator2
def simple_function(b: int = 5):
    """ Very sophisticated docs
    """
    a = 5
    # Comment on top
    print("here") # And in line
    return a + b
Would result in:
"""
a = 5
# Comment on top
print("here") # And in line
return a + b
"""
I was not able to find any utility for this and have been playing with inspect.getsourcelines for a few hours now, but with no luck.
Any help appreciated!
Why is it different from How can I get the source code of a Python function?
That question asks for the source code of a whole function, which includes the decorators, docstring, def line, and body. I'm interested only in the body of the function.
I wrote a simple regex that does the trick. I tried this script with classes and without, and it seemed to work fine either way. It opens whatever file you designate in the Main call at the bottom, rewrites the entire document with all function/method bodies doc-stringed, and then saves it under the name you passed as the second argument to the Main call.
It's not beautiful, and it could probably have more efficient regex statements. It works though. The regex finds everything from a decorator (if one) to the end of a function/method, grouping tabs and the function/method body. It then uses those groups in finditer to construct a docstring and place it before the entire chunk it found.
import re

FUNC_BODY = re.compile(r'^((([ \t]+)?@.+\n)+)?(?P<tabs>[\t ]+)?def([^\n]+)\n(?P<body>(^([\t ]+)?([^\n]+)\n)+)', re.M)
BLANK_LINES = re.compile(r'^[ \t]+$', re.M)

class Main(object):
    def __init__(self, file_in:str, file_out:str) -> None:
        #prime in/out strings
        in_txt = ''
        out_txt = ''

        #open requested file
        with open(file_in, 'r') as f:
            in_txt = f.read()

        #remove all lines that just have space characters on them
        #this stops FUNC_BODY from finding the entire file in one shot
        in_txt = BLANK_LINES.sub('', in_txt)

        last = 0 #to keep track of where we are in the file

        #process all matches
        for m in FUNC_BODY.finditer(in_txt):
            s, e = m.span()
            #make sure we catch anything that was between our last match and this one
            out_txt = f"{out_txt}{in_txt[last:s]}"
            last = e
            tabs = m.group('tabs') if not m.group('tabs') is None else ''
            #construct the docstring and inject it before the found function/method
            out_txt = f"{out_txt}{tabs}'''\n{m.group('body')}{tabs}'''\n{m.group()}"

        #keep whatever follows the last match
        out_txt = f"{out_txt}{in_txt[last:]}"

        #save as requested file name
        with open(file_out, 'w') as f:
            f.write(out_txt)

if __name__ == '__main__':
    Main('test.py', 'test_docd.py')
EDIT:
Apparently, I "missed the entire point" so I wrote it again a different way. Now you can get the body while the code is running and decorators don't matter, at all. I left my other answer here because it is also a solution, just not a "real time" one.
import re, inspect

FUNC_BODY = re.compile('^(?P<tabs>[\t ]+)?def (?P<name>[a-zA-Z0-9_]+)([^\n]+)\n(?P<body>(^([\t ]+)?([^\n]+)\n)+)', re.M)

class Source(object):
    @staticmethod
    def investigate(focus:object, strfocus:str) -> str:
        with open(inspect.getsourcefile(focus), 'r') as f:
            for m in FUNC_BODY.finditer(f.read()):
                if m.group('name') == strfocus:
                    tabs = m.group('tabs') if not m.group('tabs') is None else ''
                    return f"{tabs}'''\n{m.group('body')}{tabs}'''"

def decorator(func):
    def inner():
        print("I'm decorated")
        func()
    return inner

@decorator
def test():
    a = 5
    b = 6
    return a+b

print(Source.investigate(test, 'test'))
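As an alternative to the regex, here is a rough sketch that uses the ast module to locate the body and then slices the raw text, so comments survive. This is not the approach from the answer above. It assumes Python 3.8+ (for end_lineno), that the body starts on the line after the header, and that the text holds a single function. The name body_from_source is hypothetical.

```python
import ast
import textwrap


def body_from_source(source: str) -> str:
    """Return the dedented body of the first function found in `source`.

    Decorators, the def header and a leading docstring are dropped;
    comments are kept because the raw text lines are sliced.
    """
    source = textwrap.dedent(source)
    lines = source.splitlines()
    func = ast.parse(source).body[0]
    if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("expected a function definition")

    first = func.body[0]
    if ast.get_docstring(func, clean=False) is not None:
        # Skip the docstring: the body starts on the line after it ends.
        start = first.end_lineno
    else:
        # No docstring: start at the first statement, then back up over
        # comment or blank lines sitting between the header and it.
        start = first.lineno - 1
        while start > 0 and (not lines[start - 1].strip()
                             or lines[start - 1].lstrip().startswith("#")):
            start -= 1
    return textwrap.dedent("\n".join(lines[start:func.end_lineno]))


sample = '''
@decorator1
@decorator2
def simple_function(b: int = 5):
    """ Very sophisticated docs
    """
    a = 5
    # Comment on top
    print("here")  # And in line
    return a + b
'''
body = body_from_source(sample)
print(body)
```

For a live function, inspect.getsource(func) could supply the text. With a decorated function you would probably need inspect.unwrap(func) first, which only helps if the decorator sets __wrapped__ (for example via functools.wraps).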
I have the following python code:
import requests

def requestAPI(url):
    return requests.get(url=url).json()

UselessFact = requestAPI("https://uselessfacts.jsph.pl/random.json?language=en")['text']
I wanted to put a try/except on the requestAPI function so it doesn't break the code. I thought about this:
import requests

def requestAPI(url, keys):
    return requests.get(url=url).json() #Here is the struggle with passing the "keys" parameter into the return

UselessFact = requestAPI("https://uselessfacts.jsph.pl/random.json?language=en", ['text'])
I could do something like:
import requests

def requestAPI(url):
    try:
        return requests.get(url=url).json()
    except:
        return False

UselessFact = requestAPI("https://uselessfacts.jsph.pl/random.json?language=en")['text'] if (condition here) else False
But I think there's a better way of doing this.
You can achieve it without a try-except via dict.get():
def requestAPI(url, key):
    return requests.get(url=url).json().get(key, None)
This returns the value for key if it exists in the JSON; otherwise it returns None. If you want it to return False instead, use .get(key, False).
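Note that dict.get() only covers a missing key; a failed request or a non-JSON response would still raise. Below is a minimal sketch that handles both. To stay runnable without third-party packages it uses the standard library's urllib in place of requests, which is a substitution and not what the question uses. The opener parameter, the fake openers, and the example.invalid URL are hypothetical, added only so the demo works offline.

```python
import io
import json
from urllib.request import urlopen


def request_api(url, key, default=None, opener=urlopen):
    """Fetch JSON from `url` and return payload[key], or `default` on any failure."""
    try:
        with opener(url) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers bad JSON or a bad URL.
        return default
    if not isinstance(payload, dict):
        return default
    return payload.get(key, default)


def fake_opener(url):
    # Offline stand-in for urlopen (assumption for the demo).
    return io.BytesIO(b'{"text": "a fact"}')


def broken_opener(url):
    # Simulates a network failure.
    raise OSError("offline")


fact = request_api("https://example.invalid/random.json", "text", opener=fake_opener)
missing = request_api("https://example.invalid/random.json", "nope", default=False, opener=fake_opener)
failed = request_api("https://example.invalid/random.json", "text", default=False, opener=broken_opener)
print(fact, missing, failed)
```

With requests the same shape would apply: wrap the call and the .json() parsing in the try block and catch the library's own exception types.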
Easiest way to explain this one:
import unittest
from element import Element

class TestHTMLGen(unittest.TestCase):
    def test_Element(self):
        page = Element("html", el_id=False)
        self.assertEqual(page, Element("html", el_id=False)) # this is where I need help
I get the following error:
AssertionError: <element.Element object at 0x025C1B70> != <element.Element object at 0x025C1CB0>
I know the objects are not exactly the same but is there any way to check that they are equal? I would think that assertEqual would work.
Edit: I am now working with addTypeEqualityFunc. However, I am still having trouble:
def test_Element(self):
    page = Element("html", el_id=False)
    self.addTypeEqualityFunc(Element, self.are_elements_equal)
    self.assertEqual(page, Element("html", el_id=False))

def are_elements_equal(self, first_element, second_element, msg=None):
    print(first_element.attribute == second_element.attribute)
    return type(first_element) is type(second_element) and first_element.tag == second_element.tag and first_element.attribute == second_element.attribute
This is the output I get:
False
and it says the test passed. It should not pass, because first_element.attribute is not equal to second_element.attribute. Furthermore, even if are_elements_equal just returns False, the test still passes.
Solution:
import unittest
from element import Element

class TestHTMLGen(unittest.TestCase):
    def test_Element(self):
        page = Element("html", el_id=False)
        self.addTypeEqualityFunc(Element, self.are_elements_equal)
        self.assertEqual(page, Element("html", el_id=False)) # this is where I need help

    def are_elements_equal(self, first_element, second_element, msg=None):
        self.assertEqual(type(first_element), type(second_element))
        self.assertEqual(first_element.tag, second_element.tag)
        self.assertEqual(first_element.attribute, second_element.attribute)
However, a lot of the time self.assertEqual(vars(page), vars(Element("html", el_id=False))) will do the trick.
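The reason returning False had no effect is that a function registered with addTypeEqualityFunc is expected to raise self.failureException when the objects differ; its return value is ignored. Here is a small self-contained sketch of that contract. The Element class is a hypothetical stand-in with only tag and attribute, since the real element module is not shown.

```python
import unittest


class Element:
    """Hypothetical stand-in for the Element class from the question."""

    def __init__(self, tag, attribute=None):
        self.tag = tag
        self.attribute = attribute or {}


class TestElement(unittest.TestCase):
    def setUp(self):
        # Register the comparison once for every test in this case.
        self.addTypeEqualityFunc(Element, self.assert_elements_equal)

    def assert_elements_equal(self, first, second, msg=None):
        # Signal inequality by raising; a returned bool would be ignored.
        if (first.tag, first.attribute) != (second.tag, second.attribute):
            raise self.failureException(msg or "Elements differ")

    def test_equal(self):
        self.assertEqual(Element("html"), Element("html"))

    def test_unequal(self):
        with self.assertRaises(self.failureException):
            self.assertEqual(Element("html"), Element("body"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestElement)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If you control the class, defining __eq__ on Element is another option and makes plain assertEqual work without any registration.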
Edit: I should also add that I wrote a small function that checks whether two objects are equal. It should work in most cases.
def are_elements_equal(self, first_element, second_element, msg=None):
    self.assertEqual(type(first_element), type(second_element))
    try:
        # vars() raises TypeError for objects without a __dict__
        type(vars(first_element)) is dict
    except TypeError:
        self.assertEqual(first_element, second_element)
    else:
        for i in vars(first_element).keys():
            try:
                type(vars(vars(first_element)[i])) is dict
            except TypeError:
                if type(vars(first_element)[i]) is list:
                    # compare list attributes item by item
                    for j in range(len(vars(first_element)[i])):
                        self.are_elements_equal(vars(first_element)[i][j], vars(second_element)[i][j])
                else:
                    self.assertEqual(vars(first_element)[i], vars(second_element)[i])
            else:
                # the attribute is itself an object with a __dict__, so recurse
                self.are_elements_equal(vars(first_element)[i], vars(second_element)[i])
Use vars():
Return the __dict__ attribute for a module, class, instance, or any other object with a __dict__ attribute.
self.assertEqual(vars(page), vars(Element("html", el_id=False)))
I have a function:
def search_result(request):
    if request.method =='POST':
        data = request.body
        qd = QueryDict(data)
        place = qd.values()[2]
        indate = qd.values()[3]
        outdate = qd.values()[0]
        url = ('http://terminal2.expedia.com/x/mhotels/search?city=%s&checkInDate=%s&checkOutDate=%s&room1=2&apikey=%s') %(place, indate, outdate, MY_API_KEY)
        req = requests.get(url).text
        json_data = json.loads(req)
        results = []
        for hotels in json_data.get('hotelList'):
            results.append(hotels.get('localizedName'))
        return HttpResponse(results)
Now I want to use the first function's (func1's) return value in another function to render a template, something like this:
def search_page(request):
    r = search_result(request)
    d = r.content
    return render(request,'search.html', {'res':d})
But this does not actually work.
Is there any way to do what I want (without using a class)?
I make a POST via ajax from a template form, and my first function works properly and prints the result in the console. The problem occurs when I try to use the response in the next function to render it in a template. That's why I am asking for help. Do you have any ideas for making the response from the first function available to another function?
You have defined func1 to take the request parameter, but when you call it in your second function you do not pass it any arguments.
If you pass in request it should work.
EDIT: You are looking for the results, so I suppose you can just return them instead of the HttpResponse (maybe we need more information on what you are trying to accomplish)
def func1(request):
    ......
    results = []
    for hotels in json_data.get('hotelList'):
        results.append(hotels.get('localizedName'))
    return results

def func2(request):
    f = func1(request)
    return render(request,'search.html', {'res':f})
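To make the "return the data, not the response" idea concrete, here is a framework-free sketch of the part both views could share. The helper name hotel_names and the sample payload are hypothetical; only the hotelList and localizedName keys come from the question's code.

```python
def hotel_names(json_data):
    """Collect the localizedName of every entry under 'hotelList'."""
    hotels = json_data.get('hotelList') or []  # tolerate a missing or null list
    return [hotel.get('localizedName') for hotel in hotels]


# Made-up payload in the shape the question's loop expects.
sample = {'hotelList': [{'localizedName': 'Hotel A'}, {'localizedName': 'Hotel B'}]}
names = hotel_names(sample)
print(names)
```

A view that answers the ajax call could wrap the list in an HttpResponse, while the page view passes the same list to render as the 'res' context value.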
I keep getting the error "unhashable type: list" for the line routing_table[key][0] = [[params], func]. I'm attempting to pass a URL and a function into a route function. This route function should pick out the parameters for other functions using regular expressions. The ultimate goal is to read in "/page/<page_id>", then pick out <page_id> and replace it with user input. That user input will then be passed into the function. I don't see why this isn't working.
import re

routing_table = {}

def route(url, func):
    key = url
    key = re.findall(r"(.+?)/<[a-zA-Z_][a-zA-Z0-9_]*>", url)
    if key:
        params = re.findall(r"<([a-zA-Z_][a-zA-Z0-9_]*)>", url)
        routing_table[key][0] = [[params], func]
    else:
        routing_table[url] = func

def find_path(url):
    if url in routing_table:
        return routing_table[url]
    else:
        return None

def index():
    return "This is the main page"

def hello():
    return "hi, how are you?"

def page(page_id = 7):
    return "this is page %d" % page_id

route("/page/<page_id>", page)
print(routing_table)
I'm not sure why you are using re.findall if you're only interested in the first value found.
Nevertheless, your problem is simply the way you index the result: key is a list, and as the error says, you can't use a list as a dict key. But you've put the [0] outside the first set of brackets, so Python interprets this as you wanting to set the first value of routing_table[key], rather than you wanting to use the first value of key as the thing to set.
What you actually mean is this:
routing_table[key[0]] = [[params], func]
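Following up on the remark about re.findall, here is one possible sketch that uses re.match to grab the single prefix, so the key is a string from the start. It reuses the patterns from the question but stores a (params, func) tuple instead of the nested list, which is my own choice and not part of the original code.

```python
import re

routing_table = {}

PREFIX = re.compile(r"(.+?)/<[a-zA-Z_][a-zA-Z0-9_]*>")
PARAM = re.compile(r"<([a-zA-Z_][a-zA-Z0-9_]*)>")


def route(url, func):
    """Register func under the static prefix of url, remembering parameter names."""
    match = PREFIX.match(url)
    if match:
        # group(1) is a str, so it is hashable and can be a dict key.
        routing_table[match.group(1)] = (PARAM.findall(url), func)
    else:
        routing_table[url] = ([], func)


def page(page_id=7):
    return "this is page %d" % page_id


route("/page/<page_id>", page)
params, handler = routing_table["/page"]
print(params, handler(3))
```

Storing the same tuple shape for routes with and without parameters means a lookup can unpack every entry the same way.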