How to increment a numeric string in Python - python

I've spent the last two days trying to figure out how to increment a numeric string in Python. I am trying to increment a sequence number when a record is created. I spent all day yesterday trying to do this as an Integer, and it works fine, but I could never get the database to store leading zeros. I did extensive research on this topic on StackOverflow, and while there are several examples of how to do this as an Integer and store leading zeros, none of the examples worked for me. Many of the examples were from 2014, so perhaps the methodology has changed. I then switched over to a String and changed my attribute to a CharField, and can get the function to work with leading zeros, but now I can't seem to get it to increment. Again, the examples that I found on SO were from 2014, so maybe things have changed a bit. Every time I call the function, it doesn't increment; it just returns 00000001. I'm sure it's something simple I'm not doing, but I'm out of ideas. Thanks in advance for your help. Here is the function that works but doesn't increment:
def getNextSeqNo(self):
    x = str(int(self.request_number) + 1)
    self.request_number = str(x).zfill(8)
    return self.request_number
Here is the field as it's defined:
request_number = models.CharField(editable=True,null=True,max_length=254,default="00000")
I added a default of "00000" as the system is giving me the following error if it is not present:
int() argument must be a string, a bytes-like object or a number, not 'NoneType'
I realize the code I have is basically incrementing my default by 1, which is why I'm always getting 00000001 as my sequence number. Can't seem to figure out how to get the current number and then increment by 1. Any help is appreciated.

A while ago I made something similar.
You have to convert your string to an int, increment it, then take the length of the result and work out how many leading zeros you need to add back:
code = "00000"
code = str(int(code) + 1)
code_length = len(code)
if code_length < 5:  # 5 is the max length of your code
    code = "0" * (5 - code_length) + code
print(code)

Can this be done? Yes. But don't do it.
Make it an integer.
Incrementing is then trivial - automatic if you make this the primary key. For searching, you convert the string to an integer and search the integer - that way you don't have to worry how many leading zeros were actually included as they will all be ignored. Otherwise you will have a problem if you use 6 digits and the user can't remember and puts in 6 0's + the number and then doesn't get a match.

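For those who just want to increase a number embedded in a string:
import re
a1 = 'name 1'
# re.search returns the first run of digits in the string
num0_m = re.search(r'\d+', str(a1))
if num0_m:
    # match that number only where it is not part of a longer number
    rx = r'(?<!\d){}(?!\d)'.format(num0_m.group())
    print(re.sub(rx, lambda x: str(int(x.group()) + 1), a1))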
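Note that `re.search` returns the first run of digits, so for a string with several numbers the snippet above increments the first one. A minimal sketch that targets the last number instead and keeps any zero padding might look like this; the function name `increment_last_number` is hypothetical, and the lookahead `(?=\D*$)` is one possible way to say "no more digits follow".

```python
import re

def increment_last_number(text):
    """Add 1 to the last run of digits in text, preserving its zero padding."""
    def bump(match):
        digits = match.group()
        # zfill keeps the original width; the result grows if it overflows
        return str(int(digits) + 1).zfill(len(digits))
    # \d+ followed only by non-digits up to the end of the string
    return re.sub(r'\d+(?=\D*$)', bump, text, count=1)

print(increment_last_number('name 1'))       # name 2
print(increment_last_number('file007.txt'))  # file008.txt
print(increment_last_number('a1b22c'))       # a1b23c
```

Strings without any digits are returned unchanged.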

number = int('00000150')
('0'*7 + str(number+1))[-8:]
This takes any number, adds 1, concatenates/joins it to a string of several (at least 7 in your case) zeros, and then slices to return the last 8 characters.
IMHO simpler and more elegant than measuring length and working out how many zeros to add.
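One thing to watch with the slicing approach: when the incremented number needs more than 8 digits, the slice drops the leading digit, so '99999999' wraps around to '00000000'. A small sketch of an alternative using the format mini-language, assuming the width should simply match the input string (the name `increment_padded` is hypothetical):

```python
def increment_padded(value: str) -> str:
    """Add 1 to a numeric string, keeping at least its original width."""
    width = len(value)
    # 0{width}d pads with zeros up to width but never truncates
    return f"{int(value) + 1:0{width}d}"

print(increment_padded("00000150"))  # 00000151
print(increment_padded("00000009"))  # 00000010
print(increment_padded("99999999"))  # 100000000 (grows instead of wrapping)
```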

Related

Why compare two strings via calculating xor of their characters?

Some time ago I found this function (unfortunately, I don't remember where it came from; most likely some Python framework) that compares two strings and returns a bool value. It's quite simple to understand what's going on here.
The XOR of two characters' code points is non-zero (truthy) if they do not match.
def cmp_strings(str1, str2):
    return len(str1) == len(str2) and sum(ord(x)^ord(y) for x, y in zip(str1, str2)) == 0
But why is this function used? Isn't it the same as str1==str2?
It takes a similar amount of time to compare any strings that have the same length. It's used for security when the strings are sensitive. Usually it's used to compare password hashes.
If == is used, Python stops comparing characters when the first one not matching is found. This is bad for hashes because it could reveal how close a hash was to matching. This would help an attacker to brute force a password.
This is how hmac.compare_digest works.
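As a rough illustration of using the standard library function rather than a hand-written comparison, here is a minimal sketch. The stored password, the `check_password` name, and the use of plain unsalted SHA-256 are assumptions made only to keep the example short; they are not a recommendation for storing passwords.

```python
import hashlib
import hmac

# Hypothetical stored hash, for illustration only
stored_hash = hashlib.sha256(b"correct horse").hexdigest()

def check_password(guess: bytes) -> bool:
    """Hash the guess and compare it to the stored hash."""
    guess_hash = hashlib.sha256(guess).hexdigest()
    # compare_digest is designed not to leak where the first mismatch occurs
    return hmac.compare_digest(stored_hash, guess_hash)

print(check_password(b"correct horse"))  # True
print(check_password(b"wrong guess"))    # False
```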
The security issue that is being addressed by XOR comparison is known as a Timing Attack. ...This is where you observe how much time it takes the Compare function to succeed|fail, and use that knowledge to gain an advantage over the system.
There are 95 printable ASCII characters. If you have an 8 character password, there are 95^8 (6,634,204,312,890,625) possible combinations ...If the correct password is the last one in your list, and you can try 1 billion passwords per second, it will take you about 77 days to Brute Force the password ...That's too long - so we need a shortcut!
There are an infinite number of ways to store a string - and probably a dozen in popular use {length-prefixed, nul-terminated, ...}{Unicode, UTF-8, ASCII, ...}. For this working example, I will use the ubiquitous 'NUL-terminated array of bytes using ASCII encoding' ...i.e. "ABC" will be stored as "ABC"NUL, or {65, 66, 67, 0} ...but whatever storage/encoding standard you use, the problem is essentially the same.
Syntactically, there are as many ways to compare two strings as there are languages, e.g. if str1 == str2 or if (strcmp(str1, str2) == 0) etc. ...but when you look at how they work internally, they are all pretty much the same. Here is some simple (but realistic) pseudo-code to perform a classic (non-security) string compare:
index = 0
LOOP FOREVER {
    IF ( (str1[index] == 0) AND (str2[index] == 0) ) THEN return 'same'
    IF (str1[index] != str2[index]) THEN return 'different'
    index = index + 1
}
Assuming the secret password is "BY3"NUL ...Let's try some passwords, and notice how many operations the Compare function has to do to establish success|fail.
1. "A"NUL ... returns 'different' when 1st char is checked (A) [zero chars are correct]
2. "B"NUL ... returns 'different' when 2nd char is checked (NUL) [first char must be correct]
3. "BX"NUL ... returns 'different' when 2nd char is checked (X) [first char must be correct]
4. "BY"NUL ... returns 'different' when 3rd char is checked (NUL) [first two chars must be correct]
5. "BY1"NUL ... returns 'different' when 3rd char is checked (1) [first two chars must be correct]
6. "BY2"NUL ... returns 'different' when 3rd char is checked (2) [first two chars must be correct]
7. "BY3"NUL ... returns 'same' when the 4th character is checked (NUL) [all three chars are correct]
You can see that guess 1 fails the 1st time around the loop, guesses 2 & 3 fail the 2nd time around the loop ...guesses 4, 5, 6 fail the 3rd time around the loop ...and guess 7 succeeds the 4th time around the loop.
By observing how much time it takes the Compare function to fail, we can tell which character is wrong! This means we can actually guess the password one character at a time.
Again, let's assume an 8 character password made up of the 95 printable characters, and our last guess will be correct ...Because we can now guess the password one character at a time, it will take 95*8 (760) guesses. At 1 billion guesses per second, it will take about 0.76 microseconds to find the password [it takes about 100 ms to blink] ...which is a significant advantage over 77 days ...For a laugh, work out the advantage for a 20 character password (95^20 vs 95 * 20).
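The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures from the text (95 printable characters, 1 billion guesses per second); the variable names are made up for the example.

```python
ALPHABET_SIZE = 95          # printable ASCII characters
GUESSES_PER_SECOND = 10**9  # rate assumed in the text

def worst_case_guesses(length):
    """Return (full brute force, character-by-character) guess counts."""
    return ALPHABET_SIZE ** length, ALPHABET_SIZE * length

brute_force, one_char_at_a_time = worst_case_guesses(8)
print(brute_force)                               # 6634204312890625
print(brute_force / GUESSES_PER_SECOND / 86400)  # days, roughly 77
print(one_char_at_a_time)                        # 760
print(one_char_at_a_time / GUESSES_PER_SECOND)   # seconds

# The 20-character case mentioned above
brute_force_20, timing_20 = worst_case_guesses(20)
print(brute_force_20 // timing_20)  # how many times fewer guesses are needed
```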
So how do we stop an attacker from using a Timing Attack? [Spoiler: XOR]
The first thing we need to do is to make both strings the same length; and secondly, we must ALWAYS check EVERY character before returning 'same' or 'different' ...This is surprisingly difficult to do without introducing a new Timing Attack. But rather than show you lots of ways to get it wrong, let's see a way to do it right.
Passwords should (where possible) be stored as Hashes ...{DES, MD5, SHA-1, ...} have now been shown to have cryptographic flaws, {SHA-256, SHA-3, Whirlpool, ...} are still in good favour [Oct 2021] ...You may know that ALL Hashes (generated by a given algorithm) are the same length ...So if we Hash the guess and compare the Guess-Hash against the Stored-Hash, we have solved the first problem - the 'strings' (array of bytes) we need to compare are now ALWAYS the same length.
Secondly. How to make sure our Compare function ALWAYS takes the same amount of time to reach its decision ...There are probably a lot of ways to do this, but the most common solution is to use XOR like this:
result = 0
index = 0
LOOP WHILE (index < hashLength) {
    result = result OR ( secretHash[index] XOR guessHash[index] )
    index = index + 1
}
IF result == 0 THEN return 'same' ELSE return 'different'
And this way ALL calls to the compare function take the same length of time to run ...No more Timing Attack!
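The pseudo-code translates fairly directly to Python. The sketch below is illustrative only: the function name `xor_equal` and the sample passwords are assumptions, and an interpreted language gives no hard guarantee about the timing of individual integer operations, which is why `hmac.compare_digest` is the usual choice in real code.

```python
import hashlib

def xor_equal(secret_hash: bytes, guess_hash: bytes) -> bool:
    """Compare two equal-length byte strings without an early exit."""
    if len(secret_hash) != len(guess_hash):
        return False  # hashes from the same algorithm always match in length
    result = 0
    for s, g in zip(secret_hash, guess_hash):
        # any differing byte sets at least one bit, and OR never clears it
        result |= s ^ g
    return result == 0

secret = hashlib.sha256(b"BY3").digest()
print(xor_equal(secret, hashlib.sha256(b"BY2").digest()))  # False
print(xor_equal(secret, hashlib.sha256(b"BY3").digest()))  # True
```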
Footnote:
For readers not familiar with Boolean Logic - go and read up; but the essence here is:
If A and B are the same, (A XOR B) gives a result of 0
If A and B are different, (A XOR B) gives a non-0 result
If A and B are both 0, (A OR B) gives a result of 0
If either A or B is non-0, (A OR B) gives a non-0 result
So (looking at the second code block) the first time the XOR returns non-0 (different), the result becomes non-0 (different) and can never return to 0 (same).
A search for "cve timing attack" will provide you with a list of real-life examples.
It appears to be doing a correlation (XOR sum) character-wise between the strings, given they are of the same length. It could be required in situations where you need to know 'similarity' and not equality. Maybe that was the plan. The author might have wanted to extend this function further.

How do I represent a string as a number?

I need to represent a string as a number, however it is 8928313 characters long, note this string can contain more than just alphabet letters, and I have to be able to convert it back efficiently too. My current (too slow) code looks like this:
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alphaLeng = len(alpha)
def letterNumber(letters):
    letters = str(letters)
    cof = 1
    nr = 0
    for i in range(len(letters)):
        nr += cof*alpha.find(letters[i])
        cof *= alphaLeng
        print(i,' ',len(letters))
    return str(nr)
Ok, since other people are giving awful answers, I'm going to step in.
You shouldn't do this.
You shouldn't do this.
An integer and an array of characters are ultimately the same thing: bytes. You can access the values in the same way.
Most number representations cap out at 8 bytes (64-bits). You're looking at 8 MB, or 1 million times the largest integer representation. You shouldn't do this. Really.
You shouldn't do this. Your number will just be a custom, gigantic number type that would be identical under the hood.
If you really want to do this, despite all the reasons above, here's how...
Code
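def lshift(a, b):
    # shift a left by b bytes (8 * b bits)
    return a << (8 * b)
def string_to_int(data):
    # data must be bytes; in Python 3, encode a str first (e.g. data.encode())
    sum_ = 0
    # positions in reverse order: the first byte is the most significant
    r = range(len(data) - 1, -1, -1)
    for a, b in zip(bytearray(data), r):
        sum_ += lshift(a, b)
    return sum_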
DONT DO THIS
Explanation
Characters are essentially bytes: they can be encoded in different ways, but ultimately you can treat them within a given encoding as a sequence of bytes. In order to convert them to a number, we can shift each one left by 8 bits per position in the sequence, creating a unique number. r, the range value, is the position in reverse order: the 4th element from the end needs to go left 24 bits (3*8), etc.
After getting the range and converting our data to 8-bit integers, we can then transform the data and take the sum, giving us our unique identifier. It will be identical byte-wise (or in reverse byte-order) to the original data, but just "as a number". This is entirely futile. Don't do it.
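For comparison, Python 3's built-in `int.from_bytes` and `int.to_bytes` do the same big-endian packing without a Python-level loop, and they also give the reverse conversion the question asks for. This is a sketch under the assumption that the text is encoded as UTF-8; the function names are hypothetical.

```python
def text_to_int(text: str) -> int:
    """Interpret the UTF-8 bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), "big")

def int_to_text(number: int) -> str:
    """Reverse text_to_int (leading NUL characters are not recoverable)."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold the value
    return number.to_bytes(length, "big").decode("utf-8")

n = text_to_int("AB")
print(n)               # 16706, i.e. 65 * 256 + 66
print(int_to_text(n))  # AB
```

The caveat in the docstring matters: a string that starts with NUL characters maps to the same integer as the string without them, so the round trip is only exact when the first byte is non-zero.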
Performance
Any performance is going to be outweighed by the fact that you're creating an identical object for no valid reason, but this solution is decently performant.
1,000 elements takes ~486 microseconds, 10,000 elements takes ~20.5 ms, while 100,000 elements takes about 1.5 seconds. It would work, but you shouldn't do it. This means it's scaled as O(n**2), which is likely due to memory overhead of reallocating the data each time the integer size gets larger. This might take ~4 hours to process all 8e6 elements (14365 seconds, calculated fitting the lower-order data to ax**2+bx+c). Remember, this is all to get the identical byte representation as the original data.
Futility
Remember, there are ~1e78 to 1e82 atoms in the entire universe, on current estimates. This is ~2^275. Your value will be able to represent 2^71426504, or about 260,000 times as many bits as you need to represent every atom in the universe. You don't need such a number. You never will.
If there are only ASCII characters, you can use the built-in functions ord() and chr().
There are several optimizations you can perform. For example, the find method requires searching through your string for the corresponding letter. A dictionary would be faster. Even faster might be (benchmark!) the chr function (if you're not too picky about the letter ordering) and the ord function to reverse the chr. But if you're not picky about ordering, it might be better if you just left-NULL-padded your string and treated it as a big binary number in memory if you don't need to display the value in any particular format.
You might get some speedup by iterating over characters instead of character indices. If you're using Python 2, a large range will be slow since a list needs to be generated (use xrange instead for Python 2); Python 3's range is a lazy sequence, so it doesn't have this problem.
Your print function is going to slow down output a fair bit, especially if you're outputting to a tty.
A big number library may also buy you speed-up: Handling big numbers in code
Your alpha.find() function needs to iterate through alpha on each loop.
You can probably speed things up by using a dict, as dictionary lookups are O(1):
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alpha_dict = { letter: index for index, letter in enumerate(alpha)}
print(alpha.find('$'))
# 83
print(alpha_dict['$'])
# 83
Store your strings in an array of distinct values; i.e. a string table. In your dataset, use a reference number. A reference number of n corresponds to the nth element of the string table array.

Sum of parts of a string in Python

I'm learning to program using the book "Introduction to computation and programming using Python" by John V. Guttag. There is an exercise on it that says the following:
'Finger exercise: Let s be a string that contains a sequence of
decimal numbers separated by commas, e.g., s = '1.23,2.4,3.123'. Write
a program that prints the sum of the numbers in s.'
My try was:
#Finger exercise [MIT] PAGE 42 12:50 | 11.10.2015
s = ','+raw_input('Enter a string that contains a sequence of decimal numbers separated by commas, e.g. 1.23,2.4,3.123): ')+','
total = 0
for l in range(0,len(s)):
    if s[l] == ',':
        c = l + 1
        while s[c] != ',':
            c = c + 1
            if s[c] == ',':
                total = total + int(s[int(l),int(c)])
print total
but it keeps showing this error
TypeError: string indices must be integers, not tuple
I've tried to look for solutions online, but the ones I found rely on material I haven't learned yet.
Any help?
You are creating a tuple when accessing your string item here:
s[int(l),int(c)]
Commas generally create tuples.
Instead, you want to use a slice here using a colon:
s[int(l):int(c)]
Note that both variables are already integers, so you don't actually need to convert them:
s[l:c]
Also note that you are summing integer values although you accept floats as the input. So instead of adding int(s[l:c]) you want to add float(s[l:c]).
First of all, there is no processing of anything before the first comma.
Next, you should comment each part of it, at least initially, so you are clear what each line is doing.
You shouldn't need to check for a ',' in multiple places; keep a variable.
A solution I found, hope it's useful:
s = "1.23, 2.4, 3.123"
news = s.split(",")
total = 0
for string in range(len(news)):
    total += float(news[string])
print(total)
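The same split-and-sum idea can be written more compactly with a generator expression. This is just one possible sketch in Python 3; the function name `sum_csv_numbers` is made up, and skipping empty pieces (for example from a trailing comma) is an added assumption rather than part of the exercise.

```python
def sum_csv_numbers(s: str) -> float:
    """Sum the decimal numbers in a comma-separated string."""
    # float() tolerates surrounding spaces; empty pieces are skipped
    return sum(float(part) for part in s.split(",") if part.strip())

print(sum_csv_numbers("1.23,2.4,3.123"))
print(sum_csv_numbers("1.23, 2.4, 3.123,"))
```

Both calls print a value close to 6.753; the exact digits depend on floating-point rounding.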

Strings, ints and leading zeros

I need to record SerialNumber(s) on an object. We enter many objects. Most serial numbers are strings - the numbers aren't used numerically, just as unique identifiers - but they are often sequential. Further, leading zeros are important due to unique id status of serial number.
When doing data entry, it's nice to just enter the first "sequential" serial number (e.g. 000123) and then the number of items (e.g. 5) to get the desired output. That way we can enter data in bulk, as shown below:
Obj1.serial = 000123
Obj2.serial = 000124
Obj3.serial = 000125
Obj4.serial = 000126
Obj5.serial = 000127
The problem is that when you take the first number-as-string, turn it into an integer and increment it, you lose the leading zeros.
Not all serials are sequential - not all are even numbers (eg FDM-434\RRTASDVI908)
But those that are, I would like to automate entry.
In python, what is the most elegant way to check for leading zeros (*and, I guess, edge cases like 0009999) in a string before iterating, and then re-application of those zeros after increment?
I have a solution to this problem but it isn't elegant. In fact, it's the most boring and blunt alg possible.
Is there an elegant solution to this problem?
EDIT
To clarify the question, I want the serial to have the same number of digits after the increment.
So, in most cases, this will mean reapplying the same number of leading zeros. BUT in some edge cases the number of leading zeros will be decremented. eg: 009 -> 010; 0099 -> 0100
Try str.zfill():
>>> s = "000123"
>>> i = int(s)
>>> i
123
>>> n = 6
>>> str(i).zfill(n)
'000123'
I develop my comment here, Obj1.serial being a string:
Obj1.serial = "000123"
('%0'+str(len(Obj1.serial))+'d') % (1+int(Obj1.serial))
It's like #owen-s answer '%06d' % n: print the number and pad with leading 0.
Regarding '%d' % n, it's just one way of printing. From PEP3101:
In Python 3.0, the % operator is supplemented by a more powerful
string formatting method, format(). Support for the str.format()
method has been backported to Python 2.6.
So you may want to use format instead… Anyway, you have an integer at the right of the % sign, and it will replace the %d inside the left string.
'%06d' means print a minimum of 6 (6) digits (d) long, fill with 0 (0) if necessary.
As Obj1.serial is a string, you have to convert it to an integer before the increment: 1+int(Obj1.serial). And because the right side takes an integer, we can leave it like that.
Now, for the left part, as we can't hard code 6, we have to take the length of Obj1.serial. But this is an integer, so we have to convert it back to a string, and concatenate to the rest of the expression %0 6 d : '%0'+str(len(Obj1.serial))+'d'. Thus
('%0'+str(len(Obj1.serial))+'d') % (1+int(Obj1.serial))
Now, with format (format-specification):
'{0:06}'.format(n)
is replaced in the same way by
('{0:0'+str(len(Obj1.serial))+'}').format(1+int(Obj1.serial))
You could check the length of the string ahead of time, then use rjust to pad to the same length afterwards:
>>> s = "000123"
>>> len_s = len(s)
>>> i = int(s)
>>> i
123
>>> str(i).rjust(len_s, "0")
'000123'
You can check a serial number for all digits using:
if serial.isdigit():
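Putting the pieces from these answers together, a bulk-entry helper could be sketched as follows. The name `serial_range` and the choice to raise `ValueError` for non-numeric serials are assumptions for illustration; serials such as FDM-434\RRTASDVI908 would have to be entered individually.

```python
def serial_range(first, count):
    """Return count sequential serials starting at first, keeping its width."""
    if not first.isdigit():
        raise ValueError("serial is not purely numeric: %r" % first)
    width = len(first)  # remember the width before the zeros are lost
    start = int(first)
    # zfill re-applies the zeros; 0099 -> 0100 keeps the same digit count
    return [str(start + i).zfill(width) for i in range(count)]

print(serial_range("000123", 5))
# ['000123', '000124', '000125', '000126', '000127']
print(serial_range("0099", 2))
# ['0099', '0100']
```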

Python: How to refer to a digit in a string by its index?

I feel like this is a simple question, but it keeps escaping me...
If I had a string, say, "1010101", how would I refer to the first digit in the string by its index?
You can get the first element of any sequence with [0]. Since a string is a sequence of characters, you're looking for s[0]:
>>> s = "1010101"
>>> s[0]
'1'
For a detailed explanation, refer to the Python tutorial on strings.
Negative indexes count from the right side.
digit = mystring[-1]
In Python, a string is subscriptable. That means you can access its individual characters using square brackets, just like you can with a list.
If you want to get the first character of the string, then you can simply use my_string[0].
If you need to get the last (character) in a string (the final 1 in the string you provided), then use my_string[-1].
If you originally have an int (or a long) and you are looking for the last digit, you are best off using % (modulo) (10101 % 10 => 1).
If you have a float, on the other hand, you are best off using str(my_float)[-1].
