How do I convert binary ldap attributes returned by python-ldap to a nice hex representation and back again for use in an ldap filter?
For the task of converting to and from hex string, you should consider the builtin uuid module.
import uuid
object_guid = b'Igr\xafb\x19ME\xb2P9c\xfb\xa0\xe2w'  # bytes literal: required on Python 3
guid = uuid.UUID(bytes=object_guid)
# to hex
assert guid.hex == '496772af62194d45b2503963fba0e277'
# to human-readable guid
assert str(guid) == '496772af-6219-4d45-b250-3963fba0e277'
# to bytes
assert guid.bytes == object_guid == b'Igr\xafb\x19ME\xb2P9c\xfb\xa0\xe2w'
def guid2hexstring(val):
    # bytearray() yields integer byte values on both Python 2 and Python 3
    s = ['\\%02X' % b for b in bytearray(val)]
    return ''.join(s)
guid = ldapobject.get('objectGUID', [b''])[0] # b'Igr\xafb\x19ME\xb2P9c\xfb\xa0\xe2w'
guid = guid2hexstring(guid).replace("\\", "") # '496772AF62194D45B2503963FBA0E277'
# and back to a value you can use in an ldap search filter
guid = ''.join(['\\%s' % guid[i:i+2] for i in range(0, len(guid), 2)]) # '\\49\\67\\72\\AF\\62\\19\\4D\\45\\B2\\50\\39\\63\\FB\\A0\\E2\\77'
searchfilter = ('(objectGUID=%s)' % guid)
I wasn't able to directly use any of the above code in order to go from a string objectGUID representation to something that would work for an ldap query. But going on the code from #Rel and the comment from #hernan I was able to figure out how to do it. I'm posting this in case someone like me is still puzzled about how to use the details above to formulate a search filter. Here's what I did:
Starting from a string objectGuid (and I've borrowed the one above) I remove the hyphens.
guidString = '496772af-6219-4d45-b250-3963fba0e277'.replace("-","")
You need to reorder the characters, in pairs of characters, for the first three groupings. I generated an order as follows:
newOrder = [6,7,4,5,2,3,0,1,10,11,8,9,14,15,12,13] # the weird-ordered stuff
for i in range(16, len(guidString)): newOrder.append(i) # slam the rest on
I then create a new string with the characters in the stated order:
guid_string_in_search_order = str.join('', [guidString[i] for i in newOrder])
Then you need to add escaped backslashes in front of each of the pairs:
guidSearch = ''.join(['\\%s' % str.join('',guid_string_in_search_order[i:i+2]) for i in range(0, len(guid_string_in_search_order), 2)])
That should get you a guidSearch of:
'\\af\\72\\67\\49\\19\\62\\45\\4d\\b2\\50\\39\\63\\fb\\a0\\e2\\77'
So now you make that an ldap search string:
search_filter = '(objectGUID={})'.format(guidSearch)
And there you go - ready for an ldap search. I suspect someone with more miles clocked doing this stuff could do it in fewer steps, but at least this way you can follow what I did.
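The reordering described above appears to be the same byte swap that the standard uuid module exposes as bytes_le (little-endian for the first three fields), so the steps can probably be condensed. A minimal sketch, assuming Python 3; the helper name guid_to_ldap_filter is made up for illustration:

```python
import uuid

def guid_to_ldap_filter(guid_string, attribute="objectGUID"):
    """Build an escaped LDAP filter from a hyphenated GUID string."""
    # bytes_le swaps the byte order of the first three fields, which matches
    # the manual newOrder reshuffle described above.
    raw = uuid.UUID(guid_string).bytes_le
    # Escape every byte as a backslash followed by two hex digits.
    escaped = "".join("\\%02x" % b for b in raw)
    return "(%s=%s)" % (attribute, escaped)

search_filter = guid_to_ldap_filter("496772af-6219-4d45-b250-3963fba0e277")
print(search_filter)
```

For the example GUID this produces the same escaped sequence as the step-by-step version.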
We can use Python's uuid module to get the hex representation:
import uuid
object_guid_from_ldap_ad = b'\x1dC\xce\x04\x88h\xffL\x8bX|\xe5!,\x9b\xa9'
guid = uuid.UUID(bytes=object_guid_from_ldap_ad)
# To hex
guid.hex
# To human readable
str(guid)
# Back to bytes
assert guid.bytes == object_guid_from_ldap_ad
As for the second part of the question:
The search filter can be created from the original raw objectGUID returned by LDAP/AD or from guid.bytes of the Python UUID object; both are the same.
Example:
search_filter = ('(objectGUID=%s)' % object_guid_from_ldap_ad)
OR
search_filter = ('(objectGUID=%s)' % guid.bytes)
Then you use your search_filter in a LDAP search.
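On Python 3, %-formatting a bytes object into a str inserts its repr (b'...') rather than the raw bytes, so there it may be safer to escape each byte explicitly. A minimal sketch under that assumption; escape_guid_bytes is a hypothetical helper name, not part of python-ldap:

```python
def escape_guid_bytes(raw):
    """Return raw bytes as an LDAP filter value: backslash + two hex digits per byte."""
    # bytearray() gives integer byte values on both Python 2 and Python 3.
    return "".join("\\%02x" % b for b in bytearray(raw))

object_guid_from_ldap_ad = b'\x1dC\xce\x04\x88h\xffL\x8bX|\xe5!,\x9b\xa9'
search_filter = "(objectGUID=%s)" % escape_guid_bytes(object_guid_from_ldap_ad)
print(search_filter)
```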
I'm trying to write a BitTorrent tracker. The tracker protocol encodes raw binary data (the info_hash) into the HTTP GET parameters. It looks something like the following:
http://localhost:5000/announce?info_hash=%C5O%94%1b%1a9%86%86%12B%D7U%D0%ACF%E9%FA%3c%5d2
If I use request.args, it will try to decode the "info_hash" into a string, which will not work because the info_hash should just be some raw binary sequence which can't be represented by any string encoding.
So is there anything I could use to extract the result in bytes without rolling my own parser?
Thanks
Unfortunately there does not seem to be a really simple way of doing this with the standard library.
The way to go is to take the raw query string via request.query_string and parse it yourself. The problem is that the Python standard library functions all assume that the values are also Unicode, which is of course not true for BitTorrent trackers, as you have pointed out.
The good news is that the standard library has all of the required functionality: the function parse_qs from urllib.parse can be reimplemented with minor modifications:
from urllib.parse import unquote, unquote_to_bytes
def parse_qs_to_bytes(qs, keep_blank_values=False, strict_parsing=False,
                      max_num_fields=None):
    parsed_result = {}
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
                      max_num_fields=max_num_fields)
    for name, value in pairs:
        if name in parsed_result:
            parsed_result[name].append(value)
        else:
            parsed_result[name] = [value]
    return parsed_result
def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
              max_num_fields=None):
    if max_num_fields is not None:
        num_fields = 1 + qs.count('&') + qs.count(';')
        if max_num_fields < num_fields:
            raise ValueError('Max number of fields exceeded')
    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
    r = []
    for name_value in pairs:
        if not name_value and not strict_parsing:
            continue
        nv = name_value.split('=', 1)
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError("bad query field: %r" % (name_value,))
            # Handle case of a control-name with no equal sign
            if keep_blank_values:
                nv.append('')
            else:
                continue
        if len(nv[1]) or keep_blank_values:
            name = nv[0].replace('+', ' ')
            name = unquote(name)
            value = nv[1].replace('+', ' ')
            value = unquote_to_bytes(value) # changed line from original to prevent coercing into unicode
            r.append((name, value))
    return r
So, you should be able to call parse_qs_to_bytes(request.query_string.decode()).
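If the full parse_qs behaviour is not needed, a much shorter variant may be enough. The following is only a sketch under stated assumptions: the function name query_to_bytes is made up, parameter names are assumed to be plain ASCII and are not unquoted, and the query string is passed as a str (for example a decoded request.query_string):

```python
from urllib.parse import unquote_to_bytes

def query_to_bytes(query_string):
    """Map each parameter name to a list of raw bytes values."""
    result = {}
    for field in query_string.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        # unquote_to_bytes decodes %XX escapes without forcing a text encoding.
        raw = unquote_to_bytes(value.replace("+", " "))
        result.setdefault(name, []).append(raw)
    return result

params = query_to_bytes(
    "info_hash=%C5O%94%1b%1a9%86%86%12B%D7U%D0%ACF%E9%FA%3c%5d2"
)
info_hash = params["info_hash"][0]
print(len(info_hash), info_hash)
```

For the sample URL from the question, info_hash comes out as a 20-byte bytes object, which is the length of a SHA-1 digest.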
How can I change multiple parameter values in this URL: https://google.com/?test=sadsad&again=tesss&dadasd=asdaas
Here is my code; it can only change two of the values.
This is the result: https://google.com/?test=aaaaa&dadasd=howwww
The again parameter is missing from the result. How can I change its value and keep it in the URL?
def between(value, a, b):
    pos_a = value.find(a)
    if pos_a == -1: return ""
    pos_b = value.rfind(b)
    if pos_b == -1: return ""
    adjusted_pos_a = pos_a + len(a)
    if adjusted_pos_a >= pos_b: return ""
    return value[adjusted_pos_a:pos_b]
def before(value, a):
    pos_a = value.find(a)
    if pos_a == -1: return ""
    return value[0:pos_a]
def after(value, a):
    pos_a = value.rfind(a)
    if pos_a == -1: return ""
    adjusted_pos_a = pos_a + len(a)
    if adjusted_pos_a >= len(value): return ""
    return value[adjusted_pos_a:]
test = "https://google.com/?test=sadsad&again=tesss&dadasd=asdaas"
if "&" in test:
    print(test.replace(between(test, "=", "&"), 'aaaaa').replace(after(test, "="), 'howwww'))
else:
    print(test.replace(after(test, "="), 'test'))
Thanks!
From your code it seems like you are probably fairly new to programming, so first of all congratulations on having attempted to solve your problem.
As you might expect, there are language features you may not know about yet that can help with problems like this. (There are also libraries specifically for parsing URLs, but pointing you to those wouldn't help your progress in Python quite as much - if you are just trying to get some job done they might be a godsend.)
Since the question lacks a little clarity (don't worry - I can only speak and write English, so you are ahead of me there), I'll try to explain a simpler approach to your problem. From the last block of your code I understand your intent to be:
"If there are multiple parameters, replace the value of the first with 'aaaaa' and the others with 'howwww'. If there is only one, replace its value with 'test'."
Your code is a fair attempt (at what I think you want to do). I hope the following discussion will help you. First, set url to your example initially.
>>> url = "https://google.com/?test=sadsad&again=tesss&dadasd=asdaas"
While the code deals with multiple arguments or one, it doesn't deal with no arguments at all. This may or may not matter, but I like to program defensively, having made too many silly mistakes in the past. Further, detecting that case early simplifies the remaining logic by eliminating an "edge case" (something the general flow of your code does not handle). If I were writing a function (good when you want to repeat actions) I'd start it with something like
if "?" not in url:
    return url
I skipped this here because I know what the sample string is and I'm not writing a function. Once you know there are arguments, you can split them out quite easily with
>>> stuff, args = url.split("?", 1)
The second argument to split is another defensive measure, telling it to ignore all but the first question mark. Since we know there is at least one, this guarantees there will always be two elements in the result, and Python won't complain about a mismatch between the number of names and the number of values in that assignment. Let's confirm their values:
>>> stuff, args
('https://google.com/', 'test=sadsad&again=tesss&dadasd=asdaas')
Now we have the arguments alone, we can split them out into a list:
>>> key_vals = args.split("&")
>>> key_vals
['test=sadsad', 'again=tesss', 'dadasd=asdaas']
Now you can create a list of key,value pairs:
>>> kv_pairs = [kv.split("=", 1) for kv in key_vals]
>>> kv_pairs
[['test', 'sadsad'], ['again', 'tesss'], ['dadasd', 'asdaas']]
At this point you can do whatever is appropriate to the keys and values - deleting elements, changing values, changing keys, and so on. You could create a dictionary from them, but beware of repeated keys. I assume you can change kv_pairs to reflect the final URL you want.
Once you have made the necessary changes, putting the return value back together is relatively simple: we have to put an "=" between each key and value, then a "&" between each resulting string, then join the stuff back up with a "?". One step at a time:
>>> [f"{k}={v}" for (k, v) in kv_pairs]
['test=sadsad', 'again=tesss', 'dadasd=asdaas']
>>> "&".join(f"{k}={v}" for (k, v) in kv_pairs)
'test=sadsad&again=tesss&dadasd=asdaas'
>>> stuff + "?" + "&".join(f"{k}={v}" for (k, v) in kv_pairs)
'https://google.com/?test=sadsad&again=tesss&dadasd=asdaas'
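Putting the pieces together, the intent as I understood it (first value becomes 'aaaaa', the others 'howwww', a lone value becomes 'test') could be sketched as a function. This is only one reading of the question; the name replace_values and its default arguments are made up for illustration:

```python
def replace_values(url, first="aaaaa", rest="howwww", only="test"):
    """Rewrite query values positionally, keeping every key."""
    if "?" not in url:
        return url  # no arguments at all: nothing to change
    stuff, args = url.split("?", 1)
    # Keep only the key from each "key=value" item.
    keys = [kv.split("=", 1)[0] for kv in args.split("&")]
    if len(keys) == 1:
        values = [only]
    else:
        values = [first] + [rest] * (len(keys) - 1)
    return stuff + "?" + "&".join(f"{k}={v}" for k, v in zip(keys, values))

print(replace_values("https://google.com/?test=sadsad&again=tesss&dadasd=asdaas"))
```

Because the values are replaced by position rather than by string search, the again parameter is no longer lost.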
I would use urllib since it handles this for you.
First let's break down the URL.
import urllib.parse
u = urllib.parse.urlparse('https://google.com/?test=sadsad&again=tesss&dadasd=asdaas')
ParseResult(scheme='https', netloc='google.com', path='/', params='', query='test=sadsad&again=tesss&dadasd=asdaas', fragment='')
Then let's isolate the query element.
data = dict(urllib.parse.parse_qsl(u.query))
{'test': 'sadsad', 'again': 'tesss', 'dadasd': 'asdaas'}
Now let's update some elements.
data.update({
    'test': 'foo',
    'again': 'fizz',
    'dadasd': 'bar'})
Now we should encode it back to the proper format.
encoded = urllib.parse.urlencode(data)
'test=foo&again=fizz&dadasd=bar'
And finally let us assemble the whole URL back together.
new_parts = (u.scheme, u.netloc, u.path, u.params, encoded, u.fragment)
final_url = urllib.parse.urlunparse(new_parts)
'https://google.com/?test=foo&again=fizz&dadasd=bar'
Is it necessary to do it from scratch? If not, use urllib, which is already included in vanilla Python.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
url = "https://google.com/?test=sadsad&again=tesss&dadasd=asdaas"
parsed_url = urlparse(url)
qs = dict(parse_qsl(parsed_url.query))
# {'test': 'sadsad', 'again': 'tesss', 'dadasd': 'asdaas'}
if 'again' in qs:
    del qs['again']
# {'test': 'sadsad', 'dadasd': 'asdaas'}
parts = list(parsed_url)
parts[4] = urlencode(qs)
# ['https', 'google.com', '/', '', 'test=sadsad&dadasd=asdaas', '']
new_url = urlunparse(parts)
# https://google.com/?test=sadsad&dadasd=asdaas
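To change values instead of deleting a key, the same urllib.parse calls can be wrapped in a small helper. A minimal sketch; update_query is a hypothetical name, and converting the query to a dict assumes no parameter is repeated:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def update_query(url, **changes):
    """Return url with the given query parameters added or overwritten."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))  # assumes no repeated keys
    query.update(changes)
    # ParseResult is a namedtuple, so _replace returns a modified copy.
    return urlunparse(parts._replace(query=urlencode(query)))

new_url = update_query(
    "https://google.com/?test=sadsad&again=tesss&dadasd=asdaas",
    test="aaaaa", again="fizz", dadasd="howwww",
)
print(new_url)
```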
I am using a new script (a) to extract information from an old script (b) to create a new file (c). I am looking for an equal sign in the old script (b) and want to modify the modification script (a) to make it automated.
The string is
lev1tolev2 'from=e119-b3331l1 mappars="simp:180" targ=enceladus.bi.def.3 km=0.6 lat=(-71.5,90) lon=(220,360)'
It is written in python 3.
The current output is fixed at
cam2map from=e119-b3331l1 to=rsmap-x map=enc.Ink.map pixres=mpp defaultrange=MAP res=300 minlat=-71.5 maxlat=90 minlon=220 maxlon=360
Currently, I have the code able to export a string of 0.6 for all of the iterations of lev1tolev2, but each one of these is going to be different.
cam2map = Call("cam2map")
cam2map.kwargs["from"] = old_lev1tolev2.kwargs["from"]
cam2map.kwargs["to"] = "rsmap-x"
cam2map.kwargs["map"] = "enc.Ink.map"
cam2map.kwargs["pixres"] = "mpp"
cam2map.kwargs["defaultrange"] = "MAP"
**cam2map.kwargs["res"] = float((old_lev1tolev2.kwargs["km"]))**
cam2map.kwargs["minlat"] = lat[0]
cam2map.kwargs["maxlat"] = lat[1]
cam2map.kwargs["minlon"] = lon[0]
cam2map.kwargs["maxlon"] = lon[1]
I have two questions: why is this not converting the string to a float? And why is this not iterating over all of the lev1tolev2 commands, as everything else in the code does?
The full code is available here.
https://codeshare.io/G6drmk
The problem occurred at a different location in the code.
def escape_kw_value(value):
    if not isinstance(value, str):
        return value
    elif (value.startswith(('"', "'")) and value.endswith(('"', "'"))):
        return value
    # TODO escape the quote with \" or \'
    #if value.startswith(('"', "'")) or value.endswith(('"', "'")):
    #    return value
    if " " in value:
        value = '"{}"'.format(value)
    return value
It doesn't seem clear to me, but from your syntax here:
**cam2map.kwargs["res"] = float((old_lev1tolev2.kwargs["km"]))**
I'd bet that cam2map.kwargs["res"] is a dict, and you thought that the ** syntax would convert every value in the dict. The float built-in should instead be called in a loop over the elements of the dict, or possibly in a comprehension, as here:
cam2map.kwargs["res"] = dict()
for key, value in old_lev1tolev2.kwargs["res"].items():
    cam2map.kwargs["res"][key] = float(value)
Edit :
Ok so, it seems you took the string 'from=e119-b3331l1 mappars="simp:180" targ=enceladus.bi.def.3 km=0.6 lat=(-71.5,90) lon=(220,360)'
And then thought that calling yourstring.kwargs would give you a dict, but it won't. You can parse it into a dict first using some library, or use mystring.split('=') and work your way to a dict, like this:
output = dict()
for one_bit in lev_1_lev2.split(' '):
    key, value = one_bit.split('=', 1)
    output[key] = value
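As a fuller illustration of parsing that argument string, the standard shlex module can split it while respecting the quoted value. This is a sketch only; parse_call_args is a made-up name and has nothing to do with the Call class in the linked script:

```python
import shlex

def parse_call_args(arg_string):
    """Split 'key=value key2="quoted value"' into a dict of strings."""
    result = {}
    for token in shlex.split(arg_string):  # handles the quotes around simp:180
        key, _, value = token.partition("=")
        result[key] = value
    return result

args = parse_call_args(
    'from=e119-b3331l1 mappars="simp:180" targ=enceladus.bi.def.3 '
    'km=0.6 lat=(-71.5,90) lon=(220,360)'
)
res = float(args["km"])  # the string '0.6' becomes the float 0.6
lat = tuple(float(x) for x in args["lat"].strip("()").split(","))
print(res, lat)
```

Every value starts out as a string, so each numeric field has to be converted explicitly, as shown for km and lat.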
I try to filter and get some set of objects using this segment.
baseSet = ThreadedComment.objects.filter(tree_path__contains = baseT.comment_ptr_id)
but it brings some objects that are not supposed to be there.
For example, my baseT.comment_ptr_id is 1, it brought items with these tree_path.
comment_ptr_id=1 treepath = 0000000001
comment_ptr_id=3 treepath = 0000000001/0000000003
comment_ptr_id=4 treepath = 0000000001/0000000003/0000000004
comment_ptr_id=8 treepath = 0000000001/0000000003/0000000004/0000000008
comment_ptr_id=10 treepath = 0000000006/0000000010
comment_ptr_id=11 treepath = 0000000011
The last two are not supposed to be here, but since their tree_path contains "1", the filter brings those as well.
How can I write regex to create a filter that does not bring those items?
Why not do
baseSet = ThreadedComment.objects.filter(tree_path__contains = ('%010i' % int(baseT.comment_ptr_id)))
so that the search string for id=1 will be "0000000001" and won't be a substring of "0000000011"?
EDIT: As per the comment below, it might be better to use COMMENT_PATH_DIGITS. This is a little messier because you're using formatting to set a formatting tag. It looks like this:
tree_path__contains = ('%%0%ii' % COMMENT_PATH_DIGITS % int(baseT.comment_ptr_id))
The regexp would be '(^|/)0*%d(/|$)' % baseT.comment_ptr_id, and you use it with tree_path__regex.
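To check both suggestions against the sample data without a database, the patterns can be tried with Python's re module. This is only an approximation, since tree_path__regex is evaluated by the database backend, whose regex dialect may differ; the dictionary below simply copies the paths from the question:

```python
import re

tree_paths = {
    1: "0000000001",
    3: "0000000001/0000000003",
    4: "0000000001/0000000003/0000000004",
    8: "0000000001/0000000003/0000000004/0000000008",
    10: "0000000006/0000000010",
    11: "0000000011",
}
target_id = 1

# Regex approach: the id must fill a whole path segment.
pattern = re.compile("(^|/)0*%d(/|$)" % target_id)
by_regex = [cid for cid, path in tree_paths.items() if pattern.search(path)]

# Zero-padded substring approach.
padded = "%010d" % target_id
by_padding = [cid for cid, path in tree_paths.items() if padded in path]

print(by_regex, by_padding)
```

Both approaches keep comments 1, 3, 4 and 8 and drop 10 and 11.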
Read about MPTT for alternatives to this approach.
How can I make unique URL in Python a la http://imgur.com/gM19g or http://tumblr.com/xzh3bi25y
When using uuid from python I get a very large one. I want something shorter for URLs.
Edit: Here, I wrote a module for you. Use it. http://code.activestate.com/recipes/576918/
Counting up from 1 will guarantee short, unique URLs: /1, /2, /3, etc.
Adding uppercase and lowercase letters to your alphabet will give URLs like those in your question. And you're just counting in base-62 instead of base-10.
Now the only problem is that the URLs come consecutively. To fix that, read my answer to this question here:
Map incrementing integer range to six-digit base 26 max, but unpredictably
Basically the approach is to simply swap bits around in the incrementing value to give the appearance of randomness while maintaining determinism and guaranteeing that you don't have any collisions.
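As an illustration of the idea (not the exact scheme from the linked answer), reversing the bits of a fixed-width counter is one simple bit permutation. It is its own inverse, so distinct counters always map to distinct values. The function name scramble and the 24-bit width are assumptions:

```python
def scramble(n, width=24):
    """Reverse the bits of n within a fixed width (a bijection on 0..2**width-1)."""
    if not 0 <= n < (1 << width):
        raise ValueError("n does not fit in %d bits" % width)
    bits = format(n, "0%db" % width)  # zero-padded binary string
    return int(bits[::-1], 2)

for counter in range(1, 5):
    print(counter, scramble(counter))
```

Plain bit reversal is easy to spot for someone who looks closely, so treat it as obfuscation of the sequence, not as security.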
I'm not sure most URL shorteners use a random string. My impression is they write the URL to a database, then use the integer ID of the new record as the short URL, encoded base 36 or 62 (letters+digits).
Python code to convert an int to a string in arbitrary bases is here.
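For reference, a minimal base-62 encoder and decoder might look like the following sketch. The alphabet order (digits, then lowercase, then uppercase) is an arbitrary choice, and the function names are made up:

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters

def encode(n, alphabet=ALPHABET):
    """Convert a non-negative integer to a string in base len(alphabet)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))

def decode(s, alphabet=ALPHABET):
    """Inverse of encode()."""
    base = len(alphabet)
    n = 0
    for ch in s:
        n = n * base + alphabet.index(ch)
    return n

short_code = encode(125)
print(short_code, decode(short_code))
```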
Python's short_url is awesome.
Here is an example:
import short_url
id = 20 # your object id
domain = 'mytiny.domain'
shortened_url = "http://{}/{}".format(
    domain,
    short_url.encode_url(id)
)
And to decode the code:
decoded_id = short_url.decode_url(param)
That's it :)
Hope this will help.
Hashids is an awesome tool for this.
Edit:
Here's how to use Hashids to generate a unique short URL with Python:
from hashids import Hashids
pk = 123 # Your object's id
domain = 'imgur.com' # Your domain
hashids = Hashids(salt='this is my salt', min_length=6)
link_id = hashids.encode(pk)
url = 'http://{domain}/{link_id}'.format(domain=domain, link_id=link_id)
This module will do what you want, guaranteeing that the string is globally unique (it is a UUID):
http://pypi.python.org/pypi/shortuuid/0.1
If you need something shorter, you should be able to truncate it to the desired length and still get something that is reasonably unlikely to clash.
This answer comes pretty late, but I stumbled upon this question when I was planning to create a URL shortener project. Now that I have implemented a fully functional URL shortener (source code at amitt001/pygmy), I am adding an answer here for others.
The basic principle behind any URL shortener is to get an int from the long URL, then use base62 (or base32, etc.) encoding to convert this int to a more readable short URL.
How is this int generated?
Most URL shorteners add the URL to some auto-incrementing datastore and use the auto-increment id as the int to encode in base62.
A sample base62 encoding program:
# Base-62 hash
import string
import time
_BASE = 62
class HashDigest:
    """Base 62 hash library."""
    def __init__(self):
        self.base = string.ascii_letters + string.digits
        self.short_str = ''
    def encode(self, j):
        """Returns the repeated div mod of the number.
        :param j: int
        :return: list
        """
        if j == 0:
            return [j]
        r = []
        dividend = j
        while dividend > 0:
            dividend, remainder = divmod(dividend, _BASE)
            r.append(remainder)
        r = list(reversed(r))
        return r
    def shorten(self, i):
        """
        :param i:
        :return: str
        """
        self.short_str = ""
        encoded_list = self.encode(i)
        for val in encoded_list:
            self.short_str += self.base[val]
        return self.short_str
This is just the partial code showing base62 encoding. Check out the complete base62 encoding/decoding code at core/hashdigest.py
All the links in this answer are shortened using the project I created.
The reason UUIDs are long is because they contain lots of information so that they can be guaranteed to be globally unique.
If you want something shorter, then you'll need to do something like generate a random string, checking whether it is in the universe of already generated strings, and repeating until you get an unused string. You'll also need to watch out for concurrency here (what if the same string gets generated by a separate process before you inserted into the set of strings?).
If you need some help generating random strings in Python, this other question might help.
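The generate, check and retry loop described above can be sketched as follows. Here an in-memory set stands in for the store of already issued strings, which is an assumption; with a real database, a unique constraint on the column would be one way to handle the concurrency issue:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def new_unique_id(existing, length=6):
    """Draw random strings until one is not yet in `existing`, then record it."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            existing.add(candidate)
            return candidate

issued = set()
ids = [new_unique_id(issued) for _ in range(100)]
print(ids[:3])
```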
It doesn't really matter that this is Python, but you just need a hash function that maps to the length you want. For example, maybe use MD5 and then take just the first n characters. You'll have to watch out for collisions in that case, though, so you might want to pick something a little more robust in terms of collision detection (like using primes to cycle through the space of hash strings).
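A minimal sketch of the truncated-hash idea using hashlib; the function name short_hash and the default length of 8 are arbitrary choices. The result is deterministic for a given input, but collisions still have to be checked for separately:

```python
import hashlib

def short_hash(text, n=8):
    """Return the first n hex characters of the MD5 digest of text."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return digest[:n]

code = short_hash("http://example.com/some/very/long/path")
print(code)
```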
I don't know if you can use this, but we generate content objects in Zope that get unique numeric ids based on current time strings, in millis (eg, 1254298969501)
Maybe you can guess the rest. Using the recipe described here:
How to convert an integer to the shortest url-safe string in Python?, we encode and decode the real id on the fly, with no need for storage. A 13-digit integer is reduced to 7 alphanumeric chars in base 62, for example.
To complete the implementation, we registered a short (xxx.yy) domain name that decodes and does a 301 redirect for "not found" URLs.
If I were starting over, I would subtract the "starting-over" time (in millis) from the numeric id prior to encoding, then re-add it when decoding (or else do this when generating the objects). That would be way shorter.
You can generate a random string of length N:
import string
import random
def short_random_string(N: int) -> str:
    return ''.join(
        random.SystemRandom().choice(string.ascii_letters + string.digits)
        for _ in range(N)
    )
so,
print (short_random_string(10) )
#'G1ZRbouk2U'
all lowercase
print (short_random_string(10).lower() )
#'pljh6kp328'
Try this http://code.google.com/p/tiny4py/ ... It's still under development, but very useful!!
My Goal: Generate a unique identifier of a specified fixed length consisting of the characters 0-9 and a-z. For example:
zcgst5od
9x2zgn0l
qa44sp0z
61vv1nl5
umpprkbt
ylg4lmcy
dec0lu1t
38mhd8i5
rx00yf0e
kc2qdc07
Here's my solution. (Adapted from this answer by kmkaplan.)
import random
class IDGenerator(object):
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
    def __init__(self, length=8):
        self._alphabet_length = len(self.ALPHABET)
        self._id_length = length
    def _encode_int(self, n):
        # Adapted from:
        # Source: https://stackoverflow.com/a/561809/1497596
        # Author: https://stackoverflow.com/users/50902/kmkaplan
        encoded = ''
        while n > 0:
            n, r = divmod(n, self._alphabet_length)
            encoded = self.ALPHABET[r] + encoded
        return encoded
    def generate_id(self):
        """Generate an ID without leading zeros.
        For example, for an ID that is eight characters in length, the
        returned values will range from '10000000' to 'zzzzzzzz'.
        """
        start = self._alphabet_length**(self._id_length - 1)
        end = self._alphabet_length**self._id_length - 1
        return self._encode_int(random.randint(start, end))
if __name__ == "__main__":
    # Sample usage: Generate ten IDs each eight characters in length.
    idgen = IDGenerator(8)
    for i in range(10):
        print(idgen.generate_id())