url builder for python - python

I know about urllib and urlparse, but I want to make sure I wouldn't be reinventing the wheel.
My problem is that I am going to be fetching a bunch of urls from the same domain via the urllib library. I basically want to be able to generate urls to use (as strings) with different paths and query params. I was hoping that something might have a syntax like:
url_builder = UrlBuilder("some.domain.com")
# should give me "http://some.domain.com/blah?foo=bar
url_i_need_to_hit = url_builder.withPath("blah").withParams("foo=bar") # maybe a ".build()" after this
Basically I want to be able to store defaults that get passed to urlparse.urlunsplit instead of constantly clouding up the code by passing in the whole tuple every time.
Does something like this exist? Do people agree it's worth throwing together?

Are you proposing an extension to http://docs.python.org/library/urlparse.html#urlparse.urlunparse that would substitute into the 6-item tuple?
Are you talking about something like this?
def myUnparse( someTuple, scheme=None, netloc=None, path=None, etc. ):
parts = list( someTuple )
if scheme is not None: parts[0] = scheme
if netloc is not None: parts[1]= netloc
if path is not None: parts[2]= path
etc.
return urlunparse( parts )
Is that what you're proposing?
This?
class URLBuilder( object ):
def __init__( self, base ):
self.parts = list( urlparse(base) )
def __call__( self, scheme=None, netloc=None, path=None, etc. ):
if scheme is not None: self.parts[0] = scheme
if netloc is not None: self.parts[1]= netloc
if path is not None: self.parts[2]= path
etc.
return urlunparse( self.parts )
bldr= URLBuilder( someURL )
print bldr( scheme="ftp" )
Something like that?

You might want consider having a look at furl because it might be an answer to your needs.

Still not quite sure what you're looking for... But I'll give it a shot. If you're just looking to make a class that will keep your default values and such, it's simple enough to make your own class and use Python magic like str. Here's a scratched-out example (suboptimal):
class UrlBuilder:
def __init__(self,domain,path="blah",params="foo=bar"):
self.domain = domain
self.path = path
self.params = params
def withPath(self,path):
self.path = path
return self
def withParams(self,params):
self.params = params
return self
def __str__(self):
return 'http://' + self.domain + '/' + self.path + '?' + self.params
# or return urlparse.urlunparse( ( "http", self.domain, self.path, self.params, "", "" )
def build(self):
return self.__str__()
if __name__ == '__main__':
u = UrlBuilder('www.example.com')
print u.withPath('bobloblaw')
print u.withParams('lawyer=yes')
print u.withPath('elvis').withParams('theking=true')
If you're looking for more of the Builder Design Pattern, the Wikipedia article has a reasonable Python example (as well as Java).

I think you want http://pythonhosted.org/uritools/.
Example from the docs:
parts = urisplit('foo://user#example.com:8042/over/there?name=ferret#nose')
orig_uri = uriunsplit(parts)
The split value is a named tuple, not a regular list. It is accessible by name or index:
assert(parts[0] == parts.schema)
assert(parts[1] == parts.authority)
assert(parts[2] == parts.path)
assert(parts[3] == parts.query)
assert(parts[4] == parts.fragment)
Make a copy to make changes:
new_parts = [part for part in parts]
new_parts[2] = "/some/other/path"
new_uri = uriunsplit(new_parts)

Related

How to reduce the complexity of several for loops in python

I have the following function
import requests
children_dict = {}
def get_list_of_children(base_url, username, password, folder="1"):
token = get_token()
url = f"{base_url}/unix/repo/folders/{folder}/list"
json = requests_json(url,token)
for obj in json["list"]:
if obj['name'] == 'MainFolder':
folderId = obj['id']
url_parent = f"{base_url}/unix/repo/folders/{folderId}/list"
json_parent = requests_json(url_parent,token)
for obj_child in json_parent['list']:
if obj_child['folder'] == True:
folder_grand_child_id = obj_child['id']
url_grand_child = f"{base_url}/unix/repo/folders/{folder_grand_child_id}/list"
json_grand_child = requests_json(url_grand_child,token)
for obj_grand_child in json_grand_child["list"]:
if obj_grand_child['name'] == 'SubFolder':
folder_grand_grand_child = obj_grand_child['id']
url_grand_grand_child = f"{base_url}/unix/repo/folders/{folder_grand_grand_child}/list"
json_grand_grand_child = requests_json(url_grand_grand_child,token)
for obj_grand_grand_child in json_grand_grand_child["list"]:
if obj_grand_grand_child['name'] == 'MainTasks':
folder_grand_grand_grand_child = obj_grand_grand_child['id']
url_grand_grand_grand_child = f"{base_url}/unix/repo/folders/{folder_grand_grand_grand_child}/list"
json_grand_grand_grand_child = requests_json(url_grand_grand_grand_child,token)
for obj_grand_grand_grand_child in json_grand_grand_grand_child["list"]:
children_dict[[obj_grand_grand_grand_child['id']] = obj_grand_grand_grand_child['name']
return children_dict
What i am trying to accomplish here is to make repeated api calls to traverse through http folder structure to get the list of files in the last directory
The function works as intended but sonarlint is through the below error
Refactor this function to reduce its Cognitive Complexity from 45 to the 15 allowed. [+9 locations]sonarlint(python:S3776)
is there is a better way to handle this function ?
can anyone refactor this, pointing me in the right direct will do
This isn't a complete solution, but to answer your question "how can I simplify this" more generally, you need to look for repeated patterns in your code and generalize them into a function. Perhaps it's a function you can call recursively, or in a loop. For example, in that deeply-nested statement of yours it's just the same pattern over and over again like:
url = f"{base_url}/unix/repo/folders/{folder}/list"
json = requests_json(url,token)
for obj in json["list"]:
if obj['name'] == '<some folder name>':
folderId = obj['id']
# ...repeat...
So try generalizing this into a loop, maybe, like:
url_format = "{base_url}/unix/repo/folders/{folder_id}/list"
folder_hierarchy = ['MainFolder', 'SubFolder', 'MainTasks']
folder_id = '1' # this was the argument passed to your function
for subfolder in folder_hierarchy:
url = url_format.format(base_url=base_url, folder_id=folder_id)
folder_json = requests_json(url, token)
for obj in folder_json['list']:
if obj['name'] == subfolder:
folder_id = obj['id']
break
# Now the pattern repeats for the next level of the hierarchy but
# starting with the new folder_id
This is just schematic and you may need to generalize further, but it's one idea.
If your goal is to traverse a more complicated hierarchy you might want to look into tree-traversal algorithms.
There's plenty of repeating code. Once you identify the repeating patterns, you can extract them to the classes and functions. In particular, I find it useful to isolate all the web API logic from the rest of the code:
class Client:
def __init__(self, base_url, token):
self.base_url = base_url
self.token = token
def list_folder(self, folder_id):
return request_json(
f'{self.base_url}/unix/repo/folders/{folder_id}/list', self.token
)['list']
def get_subfolders(self, parent_id=1):
return [c for c in self.list_folder(parent_id) if c['folder']]
def get_subfolder(self, parent_id, name):
children = self.list_folder(parent_id)
for child in children:
if child['name'] == name:
return child
return None
def resolve_path(self, path, root_id=1):
parent_id = root_id
for p in path:
current = self.get_subfolder(parent_id, p)
if not current:
return None
parent_id = current['id']
return current
Now you can use the class above to simplify the main code:
client = Client(base_url, token)
for folder in client.get_subfolders():
child = client.resolve_path(folder['id'], ('SubFolder', 'MainTasks'))
if child:
# do the rest of the stuff
The code above is not guaranteed to work as is, just an illustration of the idea.
I can't really test it, but I'd make it like the following in order to easily build many levels of repeating code.
class NestedProcessing:
def __init__(self, base_url):
self.base_url = base_url
self.token = get_token()
self.obj_predicates = []
def next_level_predicate(self, obj_predicate):
self.obj_predicates.append(obj_predicate)
return self
def final_action(self, obj_action):
self.obj_action = obj_action
return self
def process(self, first_folder_id):
self.process_level(0, first_folder_id)
def process_level(self, index, folder_id):
obj_is_good = self.obj_predicates[index]
url = f"{self.base_url}/unix/repo/folders/{folder_id}/list"
json = requests_json(url, self.token)
for obj in json["list"]:
if index == len(self.obj_predicates) - 1: # last level
self.obj_action(obj)
elif obj_is_good(obj):
self.process_level(index + 1, obj['id'])
def get_list_of_children(base_url, username, password, folder="1"):
children_dict = {}
NestedProcessing(base_url)
.next_level_predicate(lambda obj: obj['name'] == 'MainFolder')
.next_level_predicate(lambda obj: obj['folder'] == True)
.next_level_predicate(lambda obj: obj['name'] == 'SubFolder')
.next_level_predicate(lambda obj: obj['name'] == 'MainTasks')
.final_action(lambda obj, storage=children_dict: storage.update({obj['id']: obj['name']})
.process(folder)
return children_dict

python structs for dict of files

This may be a newb question, but I'm a bit of newb with python, so here's what I'm trying to do...
I am using python2.7
I would like to assign a file path as a string into a dict in functionA, and then call this dict in functionB.
I looked at C-like structures in Python to try and use structs with no luck, possibly from a lack of understanding... The below sample is an excerpt from the link.
I also took a look at What are metaclasses in Python?, but I'm not sure if I understand metaclasses either.
So, how would I call assigned parameters in functionaA, within frunctionB such as:
class cstruct:
path1 = ""
path2 = ""
path3 = ""
def functionA():
path_to_a_file1 = os.path.join("/some/path/", "filename1.txt")
path_to_a_file2 = os.path.join("/some/path/", "filename2.txt")
path_to_a_file3 = os.path.join("/some/path/", "filename3.txt")
obj = cstruct()
obj.path1 = path_to_a_file1
obj.path2 = path_to_a_file2
obj.path3 = path_to_a_file3
print("testing string here: ", obj.path1)
# returns the path correctly here
# this is where things fall apart and the print doesn't return the string that I've tested with print(type(obj.path))
def functionB():
obj = cstructs()
print(obj.path1)
print(obj.path2)
print(obj.path3)
print(type(obj.path))
# returns <type 'str'>, which is what i want, but no path
Am I passing the parameters properly for the paths? If not, could someone please let me know what would be the right way to pass the string to be consumed?
Thanks!
You need to do something like this:
class Paths:
def __init__(self, path1, path2, path3):
self.path1 = path1
self.path2 = path2
self.path3 = path3
def functionA():
path_to_a_file1 = os.path.join("/some/path/", "filename1.txt")
path_to_a_file2 = os.path.join("/some/path/", "filename2.txt")
path_to_a_file3 = os.path.join("/some/path/", "filename3.txt")
obj = Paths(path_to_a_file1, path_to_a_file2, path_to_a_file3)
return obj
def functionB(paths): # should take a parameter
# obj = cstructs() don't do this! This would create a *new empty object*
print(paths.path1)
print(paths.path2)
print(paths.path3)
print(type(paths.path))
paths = functionA()
functionB(paths) # pass the argument
In any case, you really should take the time to read the official tutorial on classes. And you really should be using Python 3, Python 2 is passed its end of life.

"Sub-classes" and self in Python

Note: I see that I need to more clearly work out what it is that I want each property/descriptor/class/method to do before I ask how to do it! I don't think my question can be answered at this time. Thanks all for helping me out.
Thanks to icktoofay and BrenBarn, I'm starting to understand discriptors and properties, but now I have a slightly harder question to ask:
I see now how these work:
class Blub(object):
def __get__(self, instance, owner):
print('Blub gets ' + instance._blub)
return instance._blub
def __set__(self, instance, value):
print('Blub becomes ' + value)
instance._blub = value
class Quish(object):
blub = Blub()
def __init__(self, value):
self.blub = value
And how a = Quish('one') works (produces "Blub becomes one") but take a gander at this code:
import os
import glob
class Index(object):
def __init__(self, dir=os.getcwd()):
self.name = dir #index name is directory of indexes
# index is the list of indexes
self.index = glob.glob(os.path.join(self.name, 'BatchStarted*'))
# which is the pointer to the index (index[which] == BatchStarted_12312013_115959.txt)
self.which = 0
# self.file = self.File(self.index[self.which])
def get(self):
return self.index[self.which]
def next(self):
self.which += 1
if self.which < len(self.index):
return self.get()
else:
# loop back to the first
self.which = 0
return None
def back(self):
if self.which > 0:
self.which -= 1
return self.get()
class File(object):
def __init__(self, file):
# if the file exists, we'll use it.
if os.path.isfile(file):
self.name = file
# otherwise, our name is none and we return.
else:
self.name = None
return None
# 'file' attribute is the actual file object
self.file = open(self.name, 'r')
self.line = Lines(self.file)
class Lines(object):
# pass through the actual file object (not filename)
def __init__(self, file):
self.file = file
# line is the list if this file's lines
self.line = self.file.readlines()
self.which = 0
self.extension = Extension(self.line[self.which])
def __get__(self):
return self.line[self.which]
def __set__(self, value):
self.which = value
def next(self):
self.which += 1
return self.__get__()
def back(self):
self.which -= 1
return self.__get__()
class Extension(object):
def __init__(self, lineStr):
# check to make sure a string is passed
if lineStr:
self.lineStr = lineStr
self.line = self.lineStr.split('|')
self.pathStr = self.line[0]
self.path = self.pathStr.split('\\')
self.fileStr = self.path[-1]
self.file = self.fileStr.split('.')
else:
self.lineStr = None
def __get__(self):
self.line = self.lineStr.split('|')
self.pathStr = self.line[0]
self.path = self.pathStr.split('\\')
self.fileStr = self.path[-1]
self.file = self.fileStr.split('.')
return self.file[-1]
def __set__(self, ext):
self.file[-1] = ext
self.fileStr = '.'.join(self.file)
self.path[-1] = fileStr
self.pathStr = '\\'.join(self.path)
self.line[0] = self.pathStr
self.lineStr = '|'.join(self.line)
Firstly, there may be some typos in here because I've been working on it and leaving it half-arsed. That's not my point. My point is that in icktoofay's example, nothing gets passed to Blub(). Is there any way to do what I'm doing here, that is set some "self" attributes and after doing some processing, taking that and passing it to the next class? Would this be better suited for a property?
I would like to have it so that:
>>> i = Index() # i contains list of index files
>>> f = File(i.get()) # f is now one of those files
>>> f.line
'\\\\server\\share\\folder\\file0.txt|Name|Sean|Date|10-20-2000|Type|1'
>>> f.line.extension
'txt'
>>> f.line.extension = 'rtf'
>>> f.line
'\\\\server\\share\\folder\\file0.rtf|Name|Sean|Date|10-20-2000|Type|1'
You can do that, but the issue there is less about properties/descriptors and more about creating classes that give the behavior you want.
So, when you do f.line, that is some object. When you do f.line.extension, that is doing (f.line).extension --- that is, it first evalautes f.line and then gets the extension attribute of whatever f.line is.
The important thing here is that f.line cannot know whether you are later going to try to access its extension. So you can't have f.line do one thing for "plain" f.line and another thing for f.line.extension. The f.line part has to be the same in both, and the extension part can't change that.
The solution for what you seem to want to do is to make f.line return some kind of object that in some way looks or works like a string, but also allows setting attributes and updating itself accordingly. Exactly how you do this depends on how much you need f.lines to behave like a string and how much you need it to do other stuff. Basically you need f.line to be a "gatekeeper" object that handles some operations by acting like a string (e.g., you apparently want it to display as a string), and handles other objects in custom ways (e.g., you apparently want to be able to set an extension attribute on it and have that update its contents).
Here's a simplistic example:
class Line(object):
def __init__(self, txt):
self.base, self.extension = txt.split('.')
def __str__(self):
return self.base + "." + self.extension
Now you can do:
>>> line = Line('file.txt')
>>> print line
file.txt
>>> line.extension
'txt'
>>> line.extension = 'foo'
>>> print line
file.foo
However, notice that I did print line, not just line. By writing a __str__ method, I defined the behavior that happens when you do print line. But if you evaluate it "raw" without printing it, you'll see it's not really a string:
>>> line
<__main__.Line object at 0x000000000233D278>
You could override this behavior as well (by defining __repr__), but do you want to? That depends on how you want to use line. The point is that you need to decide what you want your line to do in what situations, and then craft a class that does that.

cherrypy handle all request with one function or class

i'd like to use cherrypy but i don't want to use the normal dispatcher, i'd like to have a function that catch all the requests and then perform my code. I think that i have to implement my own dispatcher but i can't find any valid example. Can you help me by posting some code or link ?
Thanks
make a default function:
import cherrypy
class server(object):
#cherrypy.expose
def default(self,*args,**kwargs):
return "It works!"
cherrypy.quickstart(server())
What you ask can be done with routes and defining a custom dispatcher
http://tools.cherrypy.org/wiki/RoutesUrlGeneration
Something like the following. Note the class instantiation assigned to a variable that is used as the controller for all routes, otherwise you will get multiple instances of your class. This differs from the example in the link, but I think is more what you want.
class Root:
def index(self):
<cherrpy stuff>
return some_variable
dispatcher = None
root = Root()
def setup_routes():
d = cherrypy.dispatch.RoutesDispatcher()
d.connect('blog', 'myblog/:entry_id/:action', controller=root)
d.connect('main', ':action', controller=root)
dispatcher = d
return dispatcher
conf = {'/': {'request.dispatch': setup_routes()}}
Hope that helps : )
Here's a quick example for CherryPy 3.2:
from cherrypy._cpdispatch import LateParamPageHandler
class SingletonDispatcher(object):
def __init__(self, func):
self.func = func
def set_config(self, path_info):
# Get config for the root object/path.
request = cherrypy.serving.request
request.config = base = cherrypy.config.copy()
curpath = ""
def merge(nodeconf):
if 'tools.staticdir.dir' in nodeconf:
nodeconf['tools.staticdir.section'] = curpath or "/"
base.update(nodeconf)
# Mix in values from app.config.
app = request.app
if "/" in app.config:
merge(app.config["/"])
for segment in path_info.split("/")[:-1]:
curpath = "/".join((curpath, segment))
if curpath in app.config:
merge(app.config[curpath])
def __call__(self, path_info):
"""Set handler and config for the current request."""
self.set_config(path_info)
# Decode any leftover %2F in the virtual_path atoms.
vpath = [x.replace("%2F", "/") for x in path_info.split("/") if x]
cherrypy.request.handler = LateParamPageHandler(self.func, *vpath)
Then just set it in config for the paths you intend:
[/single]
request.dispatch = myapp.SingletonDispatcher(myapp.dispatch_func)
...where "dispatch_func" is your "function that catches all the requests". It will be passed any path segments as positional arguments, and any querystring as keyword arguments.

Python classes from a for loop

I've got a piece of code which contains a for loop to draw things from an XML file;
for evoNode in node.getElementsByTagName('evolution'):
evoName = getText(evoNode.getElementsByTagName( "type")[0].childNodes)
evoId = getText(evoNode.getElementsByTagName( "typeid")[0].childNodes)
evoLevel = getText(evoNode.getElementsByTagName( "level")[0].childNodes)
evoCost = getText(evoNode.getElementsByTagName("costperlevel")[0].childNodes)
evolutions.append("%s x %s" % (evoLevel, evoName))
Currently it outputs into a list called evolutions as it says in the last line of that code, for this and several other for functions with very similar functionality I need it to output into a class instead.
class evolutions:
def __init__(self, evoName, evoId, evoLevel, evoCost)
self.evoName = evoName
self.evoId = evoId
self.evoLevel = evoLevel
self.evoCost = evoCost
How to create a series of instances of this class, each of which is a response from that for function? Or what is a core practical solution? This one doesn't really need the class but one of the others really does.
A list comprehension might be a little cleaner. I'd also move the parsing logic to the constructor to clean up the implemenation:
class Evolution:
def __init__(self, node):
self.node = node
self.type = property("type")
self.typeid = property("typeid")
self.level = property("level")
self.costperlevel = property("costperlevel")
def property(self, prop):
return getText(self.node.getElementsByTagName(prop)[0].childNodes)
evolutionList = [Evolution(evoNode) for evoNode in node.getElementsByTagName('evolution')]
Alternatively, you could use map:
evolutionList = map(Evolution, node.getElementsByTagName('evolution'))
for evoNode in node.getElementsByTagName('evolution'):
evoName = getText(evoNode.getElementsByTagName("type")[0].childNodes)
evoId = getText(evoNode.getElementsByTagName("typeid")[0].childNodes)
evoLevel = getText(evoNode.getElementsByTagName("level")[0].childNodes)
evoCost = getText(evoNode.getElementsByTagName("costperlevel")[0].childNodes)
temporaryEvo = Evolutions(evoName, evoId, evoLevel, evoCost)
evolutionList.append(temporaryEvo)
# Or you can go with the 1 liner
evolutionList.append(Evolutions(evoName, evoId, evoLevel, evoCost))
I renamed your list because it shared the same name as your class and was confusing.

Categories

Resources