How to make unique short URL with Python?

How can I make a unique URL in Python à la http://imgur.com/gM19g or http://tumblr.com/xzh3bi25y
When I use uuid from Python, I get a very long one. I want something shorter for URLs.

Edit: Here, I wrote a module for you. Use it. http://code.activestate.com/recipes/576918/
Counting up from 1 will guarantee short, unique URLs: /1, /2, /3, etc.
Adding uppercase and lowercase letters to your alphabet will give URLs like those in your question. And you're just counting in base-62 instead of base-10.
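A minimal sketch of that counting scheme, assuming a digits-then-lowercase-then-uppercase alphabet (the ordering is an arbitrary choice; any fixed 62-character ordering works as long as the encoder and decoder agree):

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Write a non-negative integer in base 62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(s: str) -> int:
    """Invert encode_base62."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


# Counting from 1: /1 ... /9, /a ... /z, /A ... /Z, then /10, /11, ...
for counter in (1, 9, 10, 35, 36, 61, 62):
    print(counter, "/" + encode_base62(counter))
```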
Now the only problem is that the URLs come consecutively. To fix that, read my answer to this question here:
Map incrementing integer range to six-digit base 26 max, but unpredictably
Basically the approach is to simply swap bits around in the incrementing value to give the appearance of randomness while maintaining determinism and guaranteeing that you don't have any collisions.
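A minimal sketch of the bit-swapping idea, using a fixed-width bit reversal as the permutation (a simple swapped-in choice, not necessarily the one from the linked answer). BITS = 32 is an assumed ID width; reversal is a bijection on [0, 2**BITS), so distinct counters can never collide, although the result only looks scrambled and is not secret:

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BITS = 32  # assumed ID width; every counter value must fit in this many bits


def reverse_bits(n: int, width: int = BITS) -> int:
    """Reverse the low `width` bits of n (a bijection on [0, 2**width))."""
    if not 0 <= n < (1 << width):
        raise ValueError("counter does not fit in the configured bit width")
    out = 0
    for _ in range(width):
        out = (out << 1) | (n & 1)  # move the lowest bit of n onto out
        n >>= 1
    return out


def to_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def from_base62(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def code_for(counter: int) -> str:
    """Scramble a sequential counter, then encode it."""
    return to_base62(reverse_bits(counter))


def counter_for(code: str) -> int:
    """Undo code_for: decode, then reverse the bits back."""
    return reverse_bits(from_base62(code))


for i in range(1, 4):
    print(i, code_for(i))
```

One trade-off: reversal moves small counters into the high bits, so even /1 becomes a six-character code; a smaller BITS keeps codes shorter but caps how many IDs you can issue.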

I'm not sure most URL shorteners use a random string. My impression is they write the URL to a database, then use the integer ID of the new record as the short URL, encoded base 36 or 62 (letters+digits).
Python code to convert an int to a string in arbitrary bases is here.
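A sketch of that database-backed approach, using an in-memory SQLite table as a stand-in for the real database (the links table and the shorten/expand names are hypothetical) and base 36, so that Python's built-in int(code, 36) can do the decoding:

```python
import sqlite3
from typing import Optional

DIGITS36 = "0123456789abcdefghijklmnopqrstuvwxyz"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)"
)


def to_base36(n: int) -> str:
    if n == 0:
        return "0"
    chars = []
    while n:
        n, rem = divmod(n, 36)
        chars.append(DIGITS36[rem])
    return "".join(reversed(chars))


def shorten(url: str) -> str:
    """Insert the URL and return its row ID written in base 36."""
    cur = conn.execute("INSERT INTO links (url) VALUES (?)", (url,))
    conn.commit()
    return to_base36(cur.lastrowid)


def expand(code: str) -> Optional[str]:
    """Decode the short code back to a row ID and look the URL up."""
    row = conn.execute(
        "SELECT url FROM links WHERE id = ?", (int(code, 36),)
    ).fetchone()
    return row[0] if row else None


code = shorten("https://example.com/some/long/path")
print(code, expand(code))
```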

Python's short_url is awesome.
Here is an example:
import short_url

obj_id = 20  # your object's id
domain = 'mytiny.domain'
shortened_url = "http://{}/{}".format(domain, short_url.encode_url(obj_id))
And to decode the code:
decoded_id = short_url.decode_url(param)  # param: the short code taken from the requested URL path
That's it :)
Hope this will help.

Hashids is an awesome tool for this.
Edit:
Here's how to use Hashids to generate a unique short URL with Python:
from hashids import Hashids
pk = 123 # Your object's id
domain = 'imgur.com' # Your domain
hashids = Hashids(salt='this is my salt', min_length=6)
link_id = hashids.encode(pk)
url = 'http://{domain}/{link_id}'.format(domain=domain, link_id=link_id)

This module will do what you want, guaranteeing that the string is globally unique (it is a UUID):
http://pypi.python.org/pypi/shortuuid/0.1
If you need something shorter, you should be able to truncate it to the desired length and still have a reasonable chance of avoiding clashes.
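To judge how likely a clash is after truncation, the birthday bound gives a quick estimate. This is a sketch assuming the kept characters are uniformly random; 57 is only an example alphabet size, so substitute whatever alphabet your library actually uses:

```python
import math


def collision_probability(num_ids: int, alphabet_size: int, length: int) -> float:
    """Birthday-bound estimate that at least two of num_ids random codes collide.

    Assumes every code is drawn uniformly from alphabet_size**length values.
    """
    space = alphabet_size ** length
    # P(collision) ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps tiny values accurate
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


for length in (6, 8, 10, 22):
    print(length, collision_probability(1_000_000, 57, length))
```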

This answer comes pretty late, but I stumbled upon this question when I was planning to create a URL shortener project. Now that I have implemented a fully functional URL shortener (source code at amitt001/pygmy), I am adding an answer here for others.
The basic principle behind any URL shortener is to map each long URL to an integer, then encode that integer in base 62 (or base 32, etc.) to get a short, more readable URL.
How is this int generated?
Most URL shorteners store each URL in some auto-incrementing datastore and base62-encode the auto-increment ID of the new record.
Here is a sample base62 encoder from the project:
# Base-62 hash
import string

_BASE = 62


class HashDigest:
    """Base-62 hash library."""

    def __init__(self):
        self.base = string.ascii_letters + string.digits
        self.short_str = ''

    def encode(self, j):
        """Return the base-62 digits of j, most significant first (repeated divmod).

        :param j: int
        :return: list
        """
        if j == 0:
            return [j]
        r = []
        dividend = j
        while dividend > 0:
            dividend, remainder = divmod(dividend, _BASE)
            r.append(remainder)
        r = list(reversed(r))
        return r

    def shorten(self, i):
        """Encode the integer i as a base-62 string.

        :param i: int
        :return: str
        """
        self.short_str = ""
        encoded_list = self.encode(i)
        for val in encoded_list:
            self.short_str += self.base[val]
        return self.short_str
This is just the partial code showing base62 encoding. Check out the complete base62 encoding/decoding code at core/hashdigest.py
All the links in this answer are shortened using the project I created.
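The decoding half lives in core/hashdigest.py and is not shown above. The following is an independent sketch (not the project's code) of a decoder compatible with the same alphabet, string.ascii_letters + string.digits, together with a compact encoder so the round trip can be checked:

```python
import string

BASE62 = string.ascii_letters + string.digits  # same ordering as HashDigest.base
INDEX = {ch: i for i, ch in enumerate(BASE62)}  # character -> digit value


def shorten(n: int) -> str:
    """Same output as HashDigest().shorten(n) for n >= 0."""
    if n == 0:
        return BASE62[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))


def unshorten(s: str) -> int:
    """Inverse of shorten: fold the digits back with Horner's rule."""
    n = 0
    for ch in s:
        n = n * 62 + INDEX[ch]
    return n


print(shorten(125), unshorten(shorten(125)))
```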

The reason UUIDs are long is because they contain lots of information so that they can be guaranteed to be globally unique.
If you want something shorter, then you'll need to do something like generating a random string, checking whether it is in the set of already generated strings, and repeating until you get an unused one. You'll also need to watch out for concurrency here (what if the same string gets generated by a separate process before you insert it into the set of strings?).
If you need some help generating random strings in Python, this other question might help.
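A minimal sketch of that generate-check-retry loop. The in-memory _issued set and the threading.Lock are stand-ins: the lock only protects threads within one process, so across processes you would instead rely on something like a UNIQUE constraint in the database and retry when the insert fails:

```python
import secrets
import string
import threading

ALPHABET = string.ascii_letters + string.digits
_issued = set()            # stand-in for the table of already-used codes
_lock = threading.Lock()   # guards the check-then-insert step across threads


def new_code(length: int = 6, max_tries: int = 100) -> str:
    """Draw random codes until one is unused, then reserve it atomically."""
    for _ in range(max_tries):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        with _lock:
            if code not in _issued:
                _issued.add(code)
                return code
    raise RuntimeError("no free code found; consider a longer length")


print(new_code())
```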

It doesn't really matter that this is Python, but you just need a hash function that maps to the length you want. For example, maybe use MD5 and then take just the first n characters. You'll have to watch out for collisions in that case, though, so you might want to pick something a little more robust in terms of collision detection (like using primes to cycle through the space of hash strings).
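A sketch of the truncated-hash idea with hashlib.md5, using a plain dict as a stand-in for the code-to-URL table. On a collision with a different URL it slides to the next window of the hex digest; that fallback is a simple illustrative choice, not the prime-cycling scheme mentioned above:

```python
import hashlib


def assign_code(url: str, table: dict, length: int = 7) -> str:
    """Return a short hex code for url, recording it in table (code -> url)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex characters
    for start in range(len(digest) - length + 1):
        code = digest[start:start + length]
        owner = table.get(code)
        if owner is None:       # free: claim it
            table[code] = url
            return code
        if owner == url:        # same URL shortened before: reuse its code
            return code
    raise RuntimeError("every window of the digest is taken")


links = {}
print(assign_code("http://example.com/a/very/long/path", links))
```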

I don't know if you can use this, but we generate content objects in Zope that get unique numeric IDs based on the current time in milliseconds (e.g. 1254298969501).
Maybe you can guess the rest. Using the recipe described here:
How to convert an integer to the shortest url-safe string in Python?, we encode and decode the real id on the fly, with no need for storage. A 13-digit integer is reduced to 7 alphanumeric chars in base 62, for example.
To complete the implementation, we registered a short (xxx.yy) domain name that decodes the ID and does a 301 redirect for "not found" URLs.
If I were starting over, I would subtract the "starting-over" time (in milliseconds) from the numeric ID before encoding and re-add it when decoding (or subtract it already when generating the objects). That would give much shorter IDs.
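A sketch of the custom-epoch idea described above. CUSTOM_EPOCH_MS is a hypothetical "starting-over" time and the 0-9a-zA-Z alphabet is an assumption; two objects created in the same millisecond would get the same ID, so this relies on the numeric IDs themselves being unique:

```python
import string
import time
from typing import Optional

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CUSTOM_EPOCH_MS = 1_254_000_000_000  # hypothetical "starting-over" time, in ms


def encode62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode62(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def short_id(millis: Optional[int] = None) -> str:
    """Encode a millisecond timestamp relative to the custom epoch."""
    if millis is None:
        millis = int(time.time() * 1000)
    return encode62(millis - CUSTOM_EPOCH_MS)


def millis_from_id(code: str) -> int:
    """Re-add the epoch to recover the original numeric ID."""
    return decode62(code) + CUSTOM_EPOCH_MS


print(encode62(1254298969501))   # full timestamp
print(short_id(1254298969501))   # same instant, relative to the custom epoch
```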

You can generate a random string of length N:
import string
import random


def short_random_string(N: int) -> str:
    # SystemRandom draws from os.urandom(), the operating system's randomness source
    return ''.join(random.SystemRandom().choice(string.ascii_letters + string.digits)
                   for _ in range(N))

so,
print(short_random_string(10))
# e.g. 'G1ZRbouk2U'
all lowercase:
print(short_random_string(10).lower())
# e.g. 'pljh6kp328'
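Calling .lower() on a mixed-case result shrinks the alphabet to 36 symbols and makes each letter twice as likely as each digit. A sketch that instead passes the desired alphabet in, using the secrets module (Python 3.6+), which is intended for tokens like these:

```python
import secrets
import string

DEFAULT_ALPHABET = string.ascii_letters + string.digits
LOWER_ALNUM = string.ascii_lowercase + string.digits


def short_random_string(n: int, alphabet: str = DEFAULT_ALPHABET) -> str:
    """Return n characters drawn uniformly from alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(n))


print(short_random_string(10))               # mixed case and digits
print(short_random_string(10, LOWER_ALNUM))  # lowercase and digits, uniform
print(secrets.token_urlsafe(8))              # URL-safe base64 text from 8 random bytes
```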

Try this http://code.google.com/p/tiny4py/ ... It's still under development, but very useful!!

My Goal: Generate a unique identifier of a specified fixed length consisting of the characters 0-9 and a-z. For example:
zcgst5od
9x2zgn0l
qa44sp0z
61vv1nl5
umpprkbt
ylg4lmcy
dec0lu1t
38mhd8i5
rx00yf0e
kc2qdc07
Here's my solution. (Adapted from this answer by kmkaplan.)
import random


class IDGenerator(object):
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

    def __init__(self, length=8):
        self._alphabet_length = len(self.ALPHABET)
        self._id_length = length

    def _encode_int(self, n):
        # Adapted from:
        # Source: https://stackoverflow.com/a/561809/1497596
        # Author: https://stackoverflow.com/users/50902/kmkaplan
        encoded = ''
        while n > 0:
            n, r = divmod(n, self._alphabet_length)
            encoded = self.ALPHABET[r] + encoded
        return encoded

    def generate_id(self):
        """Generate an ID without leading zeros.

        For example, for an ID that is eight characters in length, the
        returned values will range from '10000000' to 'zzzzzzzz'.
        """
        start = self._alphabet_length**(self._id_length - 1)
        end = self._alphabet_length**self._id_length - 1
        return self._encode_int(random.randint(start, end))


if __name__ == "__main__":
    # Sample usage: Generate ten IDs each eight characters in length.
    idgen = IDGenerator(8)
    for i in range(10):
        print(idgen.generate_id())

Related

How to replace all T with U in an input string of DNA?

So, the task is quite simple. I just need to replace all "T"s with "U"s in an input string of DNA. I have written the following code:
def transcribe_dna_to_rna(s):
    base_change = {"t":"U", "T":"U"}
    replace = "".join([base_change(n,n) for n in s])
    return replace.upper()
and for some reason, I get the following error code:
'dict' object is not callable
Why is it that my dictionary is not callable? What should I change in my code?
Thanks for any tips in advance!
To correctly convert DNA to RNA nucleotides in string s, use a combination of str.maketrans and str.translate, which replaces thymine with uracil while preserving the case. For example:
s = 'ACTGactgACTG'
s = s.translate(str.maketrans("tT", "uU"))
print(s)
# ACUGacugACUG
Note that in bioinformatics, case (lower or upper) is often important and should be preserved, so keeping both t -> u and T -> U is important. See, for example:
Uppercase vs lowercase letters in reference genome
SEE ALSO:
Character Translation using Python (like the tr command)
Note that there are specialized bioinformatics tools specifically for handling biological sequences.
For example, BioPython offers transcribe:
from Bio.Seq import Seq
my_seq = Seq('ACTGactgACTG')
my_seq = my_seq.transcribe()
print(my_seq)
# ACUGacugACUG
To install BioPython, use conda install biopython or conda create --name biopython biopython.
The error tells you that base_change(n,n) calls base_change as if it were a function, when in fact it is a dictionary.
I guess what you wanted to say was
def transcribe_dna_to_rna(s):
    base_change = {"t":"U", "T":"U"}
    replace = "".join([base_change.get(n, n) for n in s])
    return replace.upper()
where the function is the .get(x, y) method of the dictionary, which returns the value for key x if it is present, and y otherwise (so in this case, it returns the original n if n is not in the dictionary).
But this is overcomplicating things; Python very easily lets you replace characters in strings.
def transcribe_dna_to_rna(s):
    return s.upper().replace("T", "U")
(Stole the reordering to put the .upper() first from @norie's answer; thanks!)
If your real dictionary was much larger, your original attempt might make more sense, as long chains of .replace().replace().replace()... are unattractive and eventually inefficient when you have a lot of them.
In Python 3, use str.translate:
dna = "ACTG"
rna = dna.translate(str.maketrans("T", "U")) # "ACUG"
Change s to upper and then do the replacement.
def transcribe_dna_to_rna(s):
    return s.upper().replace("T", "U")

Generate a sequence in Python for barcoding items

I am trying to generate barcodes in an app to tag products; each barcode includes 3 things:
Batch no. (GRN ID)
Product ID
Serial ID
Something like this:
def get(self, request, *args, **kwargs):
    pk = self.kwargs['pk']
    grn = Grn.objects.filter(pk=pk)[0]
    grn_prod = grn.items.all()
    items = []
    for i in grn_prod:
        for j in range(i.item_quantity):
            items.append("YNT" + str(pk) + str(i.item.pk) + str(j + 1))
It generates a sequence like the following:
YNT55232
This works, but when scanning it, I cannot tell which digits are the item ID or the serial ID, as the item ID could be 23, 523, 3, etc.
For this I want to use a fixed number of digits for the GRN, Product and Serial IDs, something like this:
Prefix   GRN ID   Product ID   Serial ID
YNT      000X     000X         0000X
I am unable to figure out how to prepend zeros to the IDs.
You can use str.format in Python; it is commonly used to format several variables at once.
If you want to format this: "YNT" + str(pk) + str(i.item.pk) + str(j + 1)
you can use format as below:
'YNT{:04d}{:04d}{:05d}'.format(pk, i.item.pk, j + 1)
In case you do not know: the {} placeholders are filled by the arguments of format(), in order.
Since you want pk and i.item.pk to be four characters long, you add :04d, which pads the number with leading zeros to a width of 4. For instance,
if pk = 1, it becomes 0001, and if it is 101, it becomes 0101.
The same goes for j + 1 with :05d: if j + 1 is 1, it becomes 00001, and if it is 101, it becomes 00101.
If you have not used format in Python before, I suggest you learn it; it is really helpful for formatting variables.
The zfill function does exactly this.
For example, s.zfill(5) pads the string s with leading zeros to at least 5 characters, so str(pk).zfill(4) turns 1 into '0001'.
There are different ways to achieve string formatting.
%-formatting
The documentation of str.zfill links to printf-style string formatting, which uses the % (modulo) operator:
barcode = 'YNT%(grn)03d%(product)03d%(serial)04d' % {'grn': 123, 'product': 456, 'serial': 7890}
print(barcode)
This prints YNT1234567890.
f-strings (since Python 3.6)
You can also use the Python 3 way with f-strings:
# barcode components, with readable names
grn = pk
product = i.item.pk
serial = j + 1
# the f-string replaces each {name:spec} with the formatted value of that variable
barcode = f"YNT{grn:03}{product:03}{serial:04}"
items.append(barcode)
Each variable name is followed by a format spec:
:0N left-pads the number with zeros to a total width of N.
Note: I always name the template variables (here: the barcode components) clearly, by putting them
in a map with descriptive keys (like in the %-formatting example above), or
in separate variables whose names describe each of them (like in the f-string example).
That way the template, also called a format literal, stays readable:
"{component_1} before {component_2} then {the_rest}"
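Fixed widths are what make the barcode decodable again when it is scanned. A sketch, assuming the 4/4/5-digit layout from the question's table (make_barcode and parse_barcode are hypothetical helper names); since format padding never truncates, the builder rejects IDs that would overflow their field:

```python
PREFIX = "YNT"
# (field name, digit width) in barcode order, per the question's layout
FIELDS = (("grn_id", 4), ("product_id", 4), ("serial_id", 5))


def make_barcode(grn_id: int, product_id: int, serial_id: int) -> str:
    values = (grn_id, product_id, serial_id)
    for (name, width), value in zip(FIELDS, values):
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{name}={value} does not fit in {width} digits")
    return f"{PREFIX}{grn_id:04d}{product_id:04d}{serial_id:05d}"


def parse_barcode(code: str) -> dict:
    """Split a scanned barcode back into its numeric fields."""
    if not code.startswith(PREFIX):
        raise ValueError("unknown prefix")
    pos = len(PREFIX)
    result = {}
    for name, width in FIELDS:
        chunk = code[pos:pos + width]
        if len(chunk) != width or not chunk.isdigit():
            raise ValueError(f"bad {name} field: {chunk!r}")
        result[name] = int(chunk)
        pos += width
    if pos != len(code):
        raise ValueError("trailing characters after serial ID")
    return result


code = make_barcode(55, 23, 2)
print(code, parse_barcode(code))
```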

random.choice() returns same value at the same second, how does one avoid it?

I have been looking at similar questions about how to generate random numbers in Python (example: Similar Question), but I do not have the problem that the random function returns the same values every time.
My random generator works fine; the problem is that it returns the same value when the function is called at (what I think is) the same second, which is undesirable.
My code looks like this
def getRandomID():
    token = ''
    letters = "abcdefghiklmnopqrstuvwwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    for i in range(1,36):
        token = token + random.choice(letters)
    return token
As I mentioned, this function returns different values when called at different times, but the same value when called at the same time. How do I avoid this problem?
I use this function in a back-end server to generate unique IDs for users in the front end to insert into a database, so I cannot control the time intervals at which this happens. I must have random tokens to map the users in the database so I can insert them correctly with queue numbers.
You could possibly improve matters by using random.SystemRandom() as follows:
import random

sys_random = random.SystemRandom()

def getRandomID():
    token = ''
    # full a-z (the original string lacked 'j' and repeated 'w')
    letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    for i in range(1, 36):
        token = token + sys_random.choice(letters)
    return token

print(getRandomID())
SystemRandom uses the os.urandom() function, which generates random numbers from sources provided by the operating system, so it does not depend on a seed. Its .choices() method can also return all the characters in a single call, avoiding the string concatenation:
import random

sys_random = random.SystemRandom()

def getRandomID():
    letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    return ''.join(sys_random.choices(letters, k=35))

print(getRandomID())
import datetime
import random

def getRandomID(n):
    # seed() accepts int/float/str/bytes; passing a datetime object fails on Python 3.11+
    random.seed(datetime.datetime.now().timestamp())
    letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    idList = [''.join([random.choice(letters) for j in range(1, 36)]) for i in range(n)]
    return idList
In a third test with 10 million IDs, this script again produced all-unique IDs.
Changing the for loop to a list comprehension sped it up quite a bit.
>>> listt = getRandomID(10000000)
>>> print(len(listt))
10000000
>>> setOfIds = set(listt)
>>> print(len(setOfIds))
10000000
This script draws 35 characters, with repetition, from a 62-character alphabet,
so the theoretical total number of IDs is quite big: pow(62, 35) =
541638008296341754635824011376225346986572413939634062667808768
Another option would be to update the seed with the previous result to get a pseudorandom sequence. An option would be old_seed XOR result or just the result.
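If the real goal is unique IDs for database rows, re-seeding random is not needed at all. A sketch using standard-library generators that draw from the operating system (the function names are hypothetical); even these are only unique with overwhelming probability, so a UNIQUE constraint on the column remains the actual guarantee:

```python
import secrets
import uuid


def new_token() -> str:
    """32 hex characters (128 bits) from the OS random source."""
    return secrets.token_hex(16)


def new_uuid() -> str:
    """A random (version 4) UUID as 32 hex characters."""
    return uuid.uuid4().hex


print(new_token())
print(new_uuid())
```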

Python LDAP converting objectGUID to hex string and back

How do I convert binary ldap attributes returned by python-ldap to a nice hex representation and back again for use in an ldap filter?
For the task of converting to and from hex string, you should consider the builtin uuid module.
import uuid
object_guid = b'Igr\xafb\x19ME\xb2P9c\xfb\xa0\xe2w'  # raw 16-byte value (bytes in Python 3)
guid = uuid.UUID(bytes=object_guid)
# to hex
assert guid.hex == '496772af62194d45b2503963fba0e277'
# to human-readable guid
assert str(guid) == '496772af-6219-4d45-b250-3963fba0e277'
# to bytes
assert guid.bytes == object_guid == b'Igr\xafb\x19ME\xb2P9c\xfb\xa0\xe2w'
def guid2hexstring(val):
    # val is the raw objectGUID (bytes); render each byte as \XX
    return ''.join('\\%02X' % b for b in val)

guid = ldapobject.get('objectGUID', [b''])[0]  # b'Igr\xafb\x19ME\xb2P9c\xfb\xa0\xe2w'
hex_guid = guid2hexstring(guid).replace("\\", "")  # '496772AF62194D45B2503963FBA0E277'
# and back to a value you can use in an LDAP search filter
escaped = ''.join(['\\%s' % hex_guid[i:i+2] for i in range(0, len(hex_guid), 2)])  # '\\49\\67\\72\\AF\\62\\19\\4D\\45\\B2\\50\\39\\63\\FB\\A0\\E2\\77'
searchfilter = '(objectGUID=%s)' % escaped
I wasn't able to directly use any of the above code to go from a string objectGUID representation to something that would work in an LDAP query. But building on the code from @Rel and the comment from @hernan, I was able to figure out how to do it. I'm posting this in case someone like me is still puzzled about how to use the details above to formulate a search filter. Here's what I did:
Starting from a string objectGuid (and I've borrowed the one above) I remove the hyphens.
guidString = '496772af-6219-4d45-b250-3963fba0e277'.replace("-","")
You need to reorder the characters, in pairs of characters, for the first three groupings. I generated an order as follows:
newOrder = [6,7,4,5,2,3,0,1,10,11,8,9,14,15,12,13] # the weird-ordered stuff
for i in range(16, len(guidString)): newOrder.append(i) # slam the rest on
I then create a new string with the characters in the stated order:
guid_string_in_search_order = str.join('', [guidString[i] for i in newOrder])
Then you need to add escaped backslashes in front of each of the pairs:
guidSearch = ''.join(['\\%s' % str.join('',guid_string_in_search_order[i:i+2]) for i in range(0, len(guid_string_in_search_order), 2)])
That should get you a guidSearch of:
'\\af\\72\\67\\49\\19\\62\\45\\4d\\b2\\50\\39\\63\\fb\\a0\\e2\\77'
So now you make that an ldap search string:
search_filter = '(objectGUID={})'.format(guidSearch)
And there you go - ready for an ldap search. I suspect someone with more miles clocked doing this stuff could do it in fewer steps, but at least this way you can follow what I did.
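The pair reordering above is the little-endian layout of the first three GUID fields, which Python's uuid module exposes as UUID.bytes_le. A sketch, assuming the string is in the same display form as in the answer above (the helper names are hypothetical):

```python
import uuid


def guid_string_to_filter(guid_string: str) -> str:
    """Build an objectGUID search filter from a display-form GUID string."""
    raw = uuid.UUID(guid_string).bytes_le  # first three fields byte-swapped
    escaped = "".join("\\%02x" % b for b in raw)  # each byte as \xx
    return "(objectGUID=%s)" % escaped


def raw_guid_to_string(raw: bytes) -> str:
    """Inverse direction: raw objectGUID bytes -> display-form string."""
    return str(uuid.UUID(bytes_le=raw))


print(guid_string_to_filter("496772af-6219-4d45-b250-3963fba0e277"))
```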
We can use Python's uuid module to get the hex representation:
import uuid
object_guid_from_ldap_ad = b'\x1dC\xce\x04\x88h\xffL\x8bX|\xe5!,\x9b\xa9'  # raw bytes from LDAP
guid = uuid.UUID(bytes=object_guid_from_ldap_ad)
# To hex
guid.hex
# To human readable
str(guid)
# Back to bytes
assert guid.bytes == object_guid_from_ldap_ad
The answer to the second part of the question:
A search filter can be created from either the raw objectGUID returned by LDAP/AD or the guid.bytes of the Python UUID object; both are the same bytes. In Python 3, though, formatting bytes with %s inserts their repr (b'...'), so escape each byte as \xx instead.
Example:
search_filter = '(objectGUID=%s)' % ''.join('\\%02x' % b for b in object_guid_from_ldap_ad)
OR
search_filter = '(objectGUID=%s)' % ''.join('\\%02x' % b for b in guid.bytes)
Then you use your search_filter in an LDAP search.

ArcMap Field Calculator Program to create Unique ID's

I'm using the Field Calculator in ArcMap and
I need to create a unique ID for every storm drain in my county.
An ID Should look something like this: 16-I-003
The first number is the municipal number which is in the column/field titled "Munic"
The letter is using the letter in the column/field titled "Point"
The last number is simply just 1 to however many drains there are in a municipality.
So far I have:
rec = 0

def autoIncrement():
    global rec
    pStart = 1
    pInterval = 1
    if rec == 0:
        rec = pStart
    else:
        rec = rec + pInterval
    return "16-I-{0:03}".format(rec)
So you can see that I have manually been typing in the municipal number, the letter, and the hyphens. But I would like to use the fields: Munic and Point so I don't have to manually type them in each time it changes.
I'm a beginner when it comes to python and ArcMap, so please dumb things down a little.
I'm not familiar with ArcMap, so I can't help you directly, but you might just change your function to a generator, like this:
def StormDrainIDGenerator():
    rec = 0
    while rec < 99:
        rec += 1
        yield "16-I-{0:03}".format(rec)
If you are ok with that, then parameterize the generator to accept the Munic and Point values and use them in your formatting string. You probably should also parameterize the ending value as well.
Use of a generator will allow you to drop it into any later expression that accepts an iterable, so you could create a list of such simply by saying list(StormDrainIDGenerator()).
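Following that suggestion, a sketch of the generator parameterized with the Munic and Point values and the ending value (storm_drain_ids is a hypothetical name; this is plain Python, not ArcMap-specific):

```python
from typing import Iterator


def storm_drain_ids(munic, point, count: int = 99) -> Iterator[str]:
    """Yield IDs like '16-I-001' up to '16-I-<count>'."""
    for rec in range(1, count + 1):
        yield "{0}-{1}-{2:03}".format(munic, point, rec)


print(list(storm_drain_ids(16, "I", 3)))
```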
Is your question on how to get Munic and Point values into the string ID? using .format()?
I think you can use following code to do that.
rec = 0

def autoIncrement(a, b):
    global rec
    pStart = 1
    pInterval = 1
    if rec == 0:
        rec = pStart
    else:
        rec = rec + pInterval
    r = "{0}-{1}-{2:03}".format(a, b, rec)
    return r
and call
autoIncrement(!Munic!, !Point!)
The line r = "{0}-{1}-{2:03}".format(a, b, rec) fills the {}s with the values of a and b, which are the Munic and Point values passed to the function, followed by rec padded to three digits.
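Both versions keep a single counter, so the number does not restart at 1 for each municipality as the question asks. A sketch that keeps one counter per Munic value in a dict; it assumes the Field Calculator runs the code block once and then calls the function for each row in order (drain_id is a hypothetical name), with drain_id(!Munic!, !Point!) as the expression:

```python
counters = {}  # Munic value -> last number handed out in that municipality


def drain_id(munic, point):
    """Return IDs like '16-I-003', numbering drains separately per municipality."""
    counters[munic] = counters.get(munic, 0) + 1
    return "{0}-{1}-{2:03}".format(munic, point, counters[munic])
```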
