Random strings in Python 2.6 (Is this OK?) - python

I've been trying to find a more Pythonic way of generating a random string in Python that also scales well. Typically, I see something similar to
''.join(random.choice(string.letters) for i in xrange(length))
That gets slow if you want to generate a long string.
I've been thinking about random.getrandbits for a while, and about how to convert its result to an array of bits and then hex-encode that. Using Python 2.6 I came across the bytearray type, which doesn't seem to be well documented. Somehow I got it to work, and it seems really fast.
It generates a 50-million-character random string on my notebook in about 3 seconds.
import base64
import random

def rand1(leng):
    nbits = leng * 6 + 1
    bits = random.getrandbits(nbits)
    uc = u"%0x" % bits
    newlen = int(len(uc) / 2) * 2  # we have to make the string an even length
    ba = bytearray.fromhex(uc[:newlen])
    return base64.urlsafe_b64encode(str(ba))[:leng]  # Python 2: str(ba) gives a byte string
EDIT
heikogerlach pointed out that an odd number of hex digits was causing the issue. I added code to make sure fromhex always receives an even number of hex digits.
Still curious if there's a better way of doing this that's just as fast.
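For comparison, a minimal Python 3 sketch of the same idea (random bytes, then base64), assuming os.urandom as the randomness source instead of random.getrandbits; the function name rand_urlsafe is hypothetical. Every 3 random bytes become 4 base64 characters, so about 3/4 of a byte per output character is enough:

```python
import base64
import os


def rand_urlsafe(length):
    """Return a random URL-safe string of exactly `length` characters."""
    # ceil(3 * length / 4) bytes encode to at least `length` base64 characters,
    # so the slice below never reaches the '=' padding.
    nbytes = (3 * length + 3) // 4
    encoded = base64.urlsafe_b64encode(os.urandom(nbytes))
    return encoded[:length].decode("ascii")


s = rand_urlsafe(50)
```

Each character carries 6 random bits, and the work is done in C by os.urandom and base64, so it scales to long strings.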

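import os
random_string = os.urandom(string_length)
This gives you string_length random bytes, not printable characters. If you need a URL-safe string:
import os
random_string = os.urandom(string_length).hex()
(note that random_string is then twice as long as string_length, since each byte becomes two hex digits; bytes.hex() requires Python 3.5+)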
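If the string is used for anything security-sensitive (tokens, passwords), Python 3.6+ also has the secrets module. A minimal sketch, assuming you want an exact number of hex characters; random_hex is a hypothetical helper name:

```python
import secrets


def random_hex(length):
    """Return `length` random hex digits (two per random byte, trimmed if odd)."""
    return secrets.token_hex((length + 1) // 2)[:length]


# token_urlsafe(nbytes) base64-encodes nbytes random bytes without '=' padding
token = secrets.token_urlsafe(16)
```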

Sometimes a UUID is short enough, and if you don't like the dashes you can always .replace('-', '') them:
from uuid import uuid4
random_string = str(uuid4())
If you want a specific length without dashes:
random_string_length = 16
str(uuid4()).replace('-', '')[:random_string_length]
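The uuid module also exposes the dash-free form directly as the hex attribute. Note that a version-4 UUID is not 128 fully random bits: 6 bits are fixed, and the 13th hex digit is always 4, so a truncated string is somewhat less random than its length suggests. A small sketch:

```python
from uuid import uuid4

random_string_length = 16

# .hex is the 32-digit lowercase form without dashes
random_string = uuid4().hex[:random_string_length]

# The version digit sits at index 12 of the hex form and is always '4'
version_digit = uuid4().hex[12]
```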

Taken from bug report 1023290 at Python.org:
junk_len = 1024
junk = (("%%0%dX" % junk_len) % random.getrandbits(junk_len * 8)).decode("hex")
Also see issues 923643 and 1023290.

It seems the fromhex() method expects an even number of hex digits. Your string is 75 characters long.
Be aware that something[:-1] excludes the last element! Just use something[:].
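A quick way to see the even-length requirement (Python 3 shown; the sample string is arbitrary). Padding with a leading "0" keeps the numeric value, whereas dropping the last digit changes it:

```python
odd = "abc"  # three hex digits

try:
    bytearray.fromhex(odd)
except ValueError as exc:
    print("odd length rejected:", exc)

# Left-pad to an even number of digits instead of truncating
padded = odd if len(odd) % 2 == 0 else "0" + odd
data = bytearray.fromhex(padded)  # bytearray(b'\n\xbc')
```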

Regarding the last example, the following fix makes sure the hex string has an even length (two hex digits per byte), whatever the junk_len value:
junk_len = 1024
junk = (("%%0%dX" % (junk_len * 2)) % random.getrandbits(junk_len * 8)).decode("hex")
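str.decode("hex") only exists in Python 2. In Python 3 the detour through a hex string isn't needed: int.to_bytes does the conversion directly, and Python 3.9+ has random.randbytes. A sketch; neither is suitable for security-sensitive data:

```python
import random

junk_len = 1024

# junk_len * 8 random bits -> exactly junk_len bytes, big-endian
junk = random.getrandbits(junk_len * 8).to_bytes(junk_len, "big")

# Python 3.9+ only
if hasattr(random, "randbytes"):
    junk2 = random.randbytes(junk_len)
```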

Related

How to increment a numeric string in Python

I've spent the last two days trying to figure out how to increment a numeric string in Python. I am trying to increment a sequence number when a record is created. I spent all day yesterday trying to do this with an integer, and it works fine, but I could never get the database to store leading zeros. I did extensive research on this topic on Stack Overflow, and while there are several examples of how to do this with an integer and store leading zeros, none of them worked for me. Many of the examples were from 2014, so perhaps the methodology has changed. I then switched to a string and changed my attribute to a CharField. Now the function works with leading zeros, but I can't get it to increment: every time I call it, it just returns 00000001. I'm sure it's something simple I'm not doing, but I'm out of ideas. Thanks in advance for your help. Here is the function that works but doesn't increment:
def getNextSeqNo(self):
    x = int(self.request_number) + 1
    self.request_number = str(x).zfill(8)
    return self.request_number
Here is the field as it's defined:
request_number = models.CharField(editable=True,null=True,max_length=254,default="00000")
I added a default of "00000" because without it the system gives me the following error:
int() argument must be a string, a bytes-like object or a number, not 'NoneType'
I realize the code I have is basically incrementing my default by 1, which is why I always get 00000001 as my sequence number. I can't figure out how to get the current number and then increment it by 1. Any help is appreciated.
A while ago I did something similar.
You have to convert your string to an int, increment it, then get its length and calculate the number of zeros you need:
code = "00000"
code = str(int(code) + 1 )
code_length = len(code)
if code_length < 5: # number five is the max length of your code
code = "0" * (5 - code_length) + code
print(code)
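The same idea with str.zfill, as a standalone helper that also tolerates a missing (None) value, which is what triggers the NoneType error in the question. next_seq_no and the width of 8 are illustrative assumptions; in a Django model you would presumably still need to read the current value from the latest saved record rather than from the field default:

```python
def next_seq_no(current, width=8):
    """Return the next sequence number as a zero-padded string.

    `current` may be None or "" (no record yet), in which case numbering starts at 1.
    """
    value = int(current) if current else 0
    return str(value + 1).zfill(width)


print(next_seq_no(None))        # 00000001
print(next_seq_no("00000041"))  # 00000042
```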
Can this be done? Yes. But don't do it.
Make it an integer.
Incrementing is then trivial, and automatic if you make this the primary key. For searching, convert the search string to an integer and search on the integer; that way you don't have to worry about how many leading zeros were actually typed, as they will all be ignored. Otherwise you will have a problem if you use 6 digits and a user who can't remember types six zeros plus the number, and then doesn't get a match.
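A minimal sketch of that approach outside any framework: keep the integer, add the zeros only when displaying, and normalize user input before comparing. The helper names and the width of 6 are assumptions for illustration:

```python
WIDTH = 6  # hypothetical display width


def display(number):
    """Format a stored integer with leading zeros for display only."""
    return f"{number:0{WIDTH}d}"


def parse_search(text):
    """Turn user input such as '000000123' or '123' into the stored integer."""
    return int(text.strip())


stored = 123
shown = display(stored)              # '000123'
match = parse_search("000000123") == stored  # extra typed zeros still match
```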
For those who want to just increase the last number in a string:
import re
a1 = 'name 1'
# \d+(?!.*\d) matches the last run of digits: no other digit may follow it
print(re.sub(r'\d+(?!.*\d)', lambda m: str(int(m.group()) + 1), a1))
number = int('00000150')
('0'*7 + str(number+1))[-8:]
This takes any number, adds 1, prepends a string of zeros (at least 7 in your case), and then slices off the last 8 characters.
IMHO simpler and more elegant than measuring length and working out how many zeros to add.
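One difference worth knowing: the slicing trick always returns exactly 8 characters, so if the number ever outgrows 8 digits it silently drops the leading digits, whereas a format specification only sets a minimum width:

```python
number = 99999999

sliced = ('0' * 7 + str(number + 1))[-8:]
formatted = '{:08d}'.format(number + 1)

print(sliced)     # 00000000  (leading '1' cut off)
print(formatted)  # 100000000 (widens instead)
```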

How to define 80-bit long variable in Python to generate random .onion addresses?

I'm trying to implement a random generator of Tor .onion addresses, which involves generating 80-bit numbers to create 16-character hashes.
How do I define such a variable in Python?
.onion format:
"16-character hashes can be made up of any letter of the alphabet, and
decimal digits beginning with 2 and ending with 7, thus representing
an 80-bit number in base32."
Links:
Manipulating 80 bits datatype in C
You want this one-liner if you are on Python 3:
import base64
import codecs
import random
data = base64.b32encode(
    codecs.decode(codecs.encode(
        '{0:020x}'.format(random.getrandbits(80))
    ), 'hex_codec')
)
Explanation: you grab your 80 random bits using random.getrandbits and convert them into binary form (which here means going through the hex encoding: 20 hex digits decode to 10 bytes), then use the base64.b32encode function, which provides the RFC 3548 compliant base32 encoding you are targeting.
Works for Python 2 also.
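On Python 3 the hex round trip can be skipped: int.to_bytes turns the 80-bit integer into 10 bytes directly. A sketch; the lowercasing and the ".onion" suffix are assumptions about the desired output, and secrets.randbits is used instead of random.getrandbits in case the addresses need to be unpredictable:

```python
import base64
import secrets

bits = secrets.randbits(80)                            # 0 <= bits < 2**80
raw = bits.to_bytes(10, "big")                         # 80 bits = 10 bytes
label = base64.b32encode(raw).decode("ascii").lower()  # 16 base32 characters, no padding
address = label + ".onion"
```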
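You can create a sequence of 10 bytes encoding an 80-bit random number like this:
import struct
import random
number = random.getrandbits(80)  # 0 <= number < 2**80
# high 64 bits as an unsigned long long, low 16 bits as an unsigned short, big-endian
data = struct.pack(">QH", number >> 16, number & 0xFFFF)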
Update
Sorry, the above part does not take care of encoding the key in Base32.
Without resorting to Python's string codecs (see metatoaster's answer for that), a compact and readable form is:
import string
import random
digits = string.ascii_lowercase + "234567"
res = ""
n = random.randrange(2**80)
for _ in range(16):
    res += digits[n & 0b11111]  # lowest 5 bits select one base32 digit
    n >>= 5
Since you actually need the alphanumeric representation of the 80-bit hash, just select the base-32 digits directly.
digits = "abcdefghijklmnopqrstuvwxyz234567"
address = "".join(random.choice(digits) for _ in range(16))
I found a 15% speed-up by avoiding repeated name lookups for random.choice and by using a list comprehension rather than passing a generator to "".join.
from random import choice
digits = "abcdefghijklmnopqrstuvwxyz234567"
address = "".join([choice(digits) for _ in range(16)])
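The random module is documented as unsuitable for security purposes. If the generated addresses must not be predictable, the same one-liner works with secrets.choice (Python 3.6+); a sketch:

```python
import secrets

digits = "abcdefghijklmnopqrstuvwxyz234567"
# 16 base32 digits * 5 bits each = 80 bits
address = "".join([secrets.choice(digits) for _ in range(16)])
```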

Generate ID from string in Python

I'm struggling a bit to generate an integer ID for a given string in Python.
I thought the built-in hash function would be perfect, but it appears the IDs are sometimes too long. That's a problem since I'm limited to 64 bits as the maximum length.
My code so far: hash(s) % 10000000000.
The input strings I can expect will be 12-512 characters long.
Requirements are:
integers only
generated from provided string
ideally up to 10-12 chars long (I'll have ~5 million items only)
low probability of collision..?
I would be glad if someone can provide any tips / solutions.
I would do something like this (in Python 3, update() needs bytes):
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"some string")
>>> str(int(m.hexdigest(), 16))[0:12]
'120665287271'
The idea:
Calculate the hash of the string with MD5 (or SHA-1 or ...) in hexadecimal form (see the hashlib module)
Convert the hex string into an integer and back into a base-10 string (so the result contains only digits)
Use the first 12 characters of that string.
If characters a-f are also okay, I would do m.hexdigest()[0:12].
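Since the hard limit is 64 bits, another option is to ask hashlib for a digest of exactly 8 bytes and read it as an integer, instead of cutting decimal digits. This sketch swaps in BLAKE2b (hashlib.blake2b, Python 3.6+) and adds a birthday-bound estimate of expected collisions; string_id and expected_collisions are hypothetical names. For about 5 million items, 12 decimal digits (10^12 values) give roughly 12.5 expected collisions, while the full 64-bit space gives well under one:

```python
import hashlib


def string_id(s):
    """Deterministic unsigned 64-bit integer ID for a string."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")  # 0 <= id < 2**64


def expected_collisions(n_items, space):
    """Birthday approximation: expected number of colliding pairs."""
    return n_items * (n_items - 1) / (2 * space)


print(expected_collisions(5_000_000, 10**12))  # about 12.5 with 12 decimal digits
print(expected_collisions(5_000_000, 2**64))   # about 6.8e-07 with 64 bits
```

If the storage column is a signed 64-bit integer, int.from_bytes(digest, "big", signed=True) keeps the value in range.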
If you're not allowed to add an extra dependency, you can keep using the hash function like this:
>>> my_string = "whatever"
>>> str(hash(my_string))[1:13]
'460440266319'
NB:
I am ignoring the 1st character as it may be the negative sign.
hash may return different values for the same string across runs, because string hashing is randomized per process (the default since Python 3.3) unless PYTHONHASHSEED is set. You may want to set it to a fixed value; see the PYTHONHASHSEED documentation.
Encoding to UTF-8 was needed for mine to work:
import hashlib

def unique_name_from_str(string: str, last_idx: int = 12) -> str:
    """
    Generates a unique id name
    refs:
    - md5: https://stackoverflow.com/questions/22974499/generate-id-from-string-in-python
    - sha3: https://stackoverflow.com/questions/47601592/safest-way-to-generate-a-unique-hash
    (- guid/uuid: https://stackoverflow.com/questions/534839/how-to-create-a-guid-uuid-in-python?noredirect=1&lq=1)
    """
    m = hashlib.md5()
    string = string.encode('utf-8')  # hashlib works on bytes, not str
    m.update(string)
    unique_name: str = str(int(m.hexdigest(), 16))[0:last_idx]
    return unique_name
See my ultimate-utils Python library.

Strings, ints and leading zeros

I need to record SerialNumber(s) on an object. We enter many objects. Most serial numbers are strings: the numbers aren't used numerically, just as unique identifiers, but they are often sequential. Further, leading zeros are important because the serial number is a unique ID.
When doing data entry, it's nice to just enter the first "sequential" serial number (e.g. 000123) and the number of items (e.g. 5) to get the desired output, so that we can enter data in bulk, as below:
Obj1.serial = 000123
Obj2.serial = 000124
Obj3.serial = 000125
Obj4.serial = 000126
Obj5.serial = 000127
The problem is that when you take the first number-as-string, turn it into an integer and increment it, you lose the leading zeros.
Not all serials are sequential, and not all are even numeric (e.g. FDM-434\RRTASDVI908).
But for those that are, I would like to automate entry.
In Python, what is the most elegant way to check for leading zeros (and, I guess, edge cases like 0009999) in a string before incrementing, and then reapply those zeros afterwards?
I have a solution to this problem but it isn't elegant. In fact, it's the most boring and blunt algorithm possible.
Is there an elegant solution to this problem?
EDIT
To clarify the question, I want the serial to have the same number of digits after the increment.
So, in most cases, this will mean reapplying the same number of leading zeros. BUT in some edge cases the number of leading zeros will be decremented. eg: 009 -> 010; 0099 -> 0100
Try str.zfill():
>>> s = "000123"
>>> i = int(s)
>>> i
123
>>> n = 6
>>> str(i).zfill(n)
'000123'
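Putting the pieces together for the bulk-entry case: keep the original width with zfill(len(s)), which also covers the 0099 -> 0100 edge case, and leave non-numeric serials alone. increment_serial and serial_range are hypothetical helpers, and raising ValueError for non-numeric input is just one possible policy:

```python
def increment_serial(serial, step=1):
    """Add `step` to a numeric serial while keeping its original width."""
    if not serial.isdigit():
        raise ValueError(f"not a numeric serial: {serial!r}")
    return str(int(serial) + step).zfill(len(serial))


def serial_range(first, count):
    """Return `count` consecutive serials starting at `first`."""
    return [increment_serial(first, i) for i in range(count)]


print(serial_range("000123", 5))
# ['000123', '000124', '000125', '000126', '000127']
```

If the number outgrows the width (999 -> 1000), zfill simply returns the longer string.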
Expanding on my comment, with Obj1.serial being a string:
Obj1.serial = "000123"
('%0'+str(len(Obj1.serial))+'d') % (1+int(Obj1.serial))
It's like owen-s's answer, '%06d' % n: print the number, padded with leading zeros.
Regarding '%d' % n, it's just one way of printing. From PEP3101:
In Python 3.0, the % operator is supplemented by a more powerful
string formatting method, format(). Support for the str.format()
method has been backported to Python 2.6.
So you may want to use format instead… Anyway, you have an integer at the right of the % sign, and it will replace the %d inside the left string.
'%06d' means print a minimum of 6 (6) digits (d) long, fill with 0 (0) if necessary.
As Obj1.serial is a string, you have to convert it to an integer before the increment: 1+int(Obj1.serial). And because the right side takes an integer, we can leave it like that.
Now, for the left part, as we can't hard-code 6, we have to take the length of Obj1.serial. But that is an integer, so we have to convert it back to a string and concatenate it with the rest of the expression (%0, 6, d): '%0'+str(len(Obj1.serial))+'d'. Thus
('%0'+str(len(Obj1.serial))+'d') % (1+int(Obj1.serial))
Now, with format (format-specification):
'{0:06}'.format(n)
is replaced in the same way by
('{0:0'+str(len(Obj1.serial))+'}').format(1+int(Obj1.serial))
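The width can also be passed as a separate argument instead of being concatenated into the format string, because a format specification may itself contain a nested replacement field. A sketch (the f-string form needs Python 3.6+; the str.format form uses explicit field numbers, which Python 2.6 requires):

```python
serial = "000123"
n = 1 + int(serial)

# Width supplied through a nested replacement field
a = '{0:0{1}}'.format(n, len(serial))
b = f"{n:0{len(serial)}}"

print(a, b)  # 000124 000124
```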
You could check the length of the string ahead of time, then use rjust to pad to the same length afterwards:
>>> s = "000123"
>>> len_s = len(s)
>>> i = int(s)
>>> i
123
>>> str(i).rjust(len_s, "0")
'000123'
You can check whether a serial number is all digits using:
if serial.isdigit():
    print(serial, "is all digits")

Length of hexadecimal number

How can we get the length of a hexadecimal number in the Python language?
I tried the following code, but it raises an error:
i = 0
def hex_len(a):
    if a > 0x0:
        # i = 0
        i = i + 1
        a = a/16
    return i
b = 0x346
print(hex_len(b))
Here I just used 346 as the hexadecimal number, but my actual numbers are too big to count manually.
Use the function hex:
>>> b = 0x346
>>> hex(b)
'0x346'
>>> len(hex(b))-2
3
or using string formatting:
>>> len("{:x}".format(b))
3
While using the string representation as an intermediate result is simple, it wastes some time and memory. I'd prefer a mathematical solution (returning the pure number of digits without any 0x prefix):
from math import ceil, log

def numberLength(n, base=16):
    return ceil(log(n+1)/log(base))
The +1 adjustment handles exact powers of the base, which need an extra leading "1" digit (for example, 0x100 has 3 hex digits although log16(0x100) is exactly 2).
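An exact, integer-only alternative avoids floating-point log, which may be off by one for very large numbers close to a power of 16: each hex digit holds 4 bits, so the digit count is the bit length divided by 4, rounded up. A sketch (hex_digits is a hypothetical name; it treats 0 as one digit, as hex(0) does):

```python
def hex_digits(n):
    """Number of hex digits of a non-negative int, computed without a string."""
    if n < 0:
        raise ValueError("negative numbers are not handled here")
    return max(1, (n.bit_length() + 3) // 4)


print(hex_digits(0x346))  # 3
big = 16**100
print(hex_digits(big), len(format(big, "x")))  # 101 101
```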
As Ashwini wrote, the hex function does the hard work for you:
hex(x)
Convert an integer number (of any size) to a hexadecimal string. The result is a valid Python expression.
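For completeness, a corrected sketch of the loop from the question, as reconstructed above. It fails because i is assigned inside the function, which makes it a local variable that is read before being set (UnboundLocalError); it also needs while instead of if, and integer division // so that a stays an int in Python 3:

```python
def hex_len(a):
    """Count hex digits by repeatedly dividing by 16."""
    if a == 0:
        return 1  # hex(0) is '0x0': one digit
    count = 0  # local counter instead of the global i
    while a > 0:
        count += 1
        a //= 16  # integer division keeps a an int
    return count


b = 0x346
print(hex_len(b))  # 3
```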
