What is the equivalent of Base64.encodeToString(bytes,11) in python? - python

I am trying to recreate a java program in python, but I am stuck at this point
import android.util.Base64 -> this is the package
Base64.encodeToString(bytes, 11)
The Python base64 encoding functions take only one required parameter, the bytes to encode.
According to the Android docs, the second parameter is a set of flags, but I can't find any information about the number 11.
What does this mean and how can I implement this in python?

11 is a combination of flags (or-ed together), specifically:
NO_PADDING, which has value 1
NO_WRAP which has value 2 and
URL_SAFE which has value 8.
I don't know the exact way to reproduce this in Python, but I believe base64.urlsafe_b64encode gets you most of the way there by implementing the equivalent of URL_SAFE. It also never inserts line breaks, which covers NO_WRAP. For NO_PADDING you could simply trim any trailing padding (i.e. = characters) from the output.
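A minimal sketch of that suggestion, assuming the goal is a str like the one Android returns. The function name encode_like_android_flag_11 is made up for this example and is not part of any library.

```python
import base64


def encode_like_android_flag_11(data: bytes) -> str:
    """Approximate Base64.encodeToString(data, 11) from android.util.Base64.

    11 == NO_PADDING (1) | NO_WRAP (2) | URL_SAFE (8).
    """
    # URL_SAFE: urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    # NO_WRAP: urlsafe_b64encode never inserts line breaks.
    encoded = base64.urlsafe_b64encode(data)
    # NO_PADDING: drop the trailing '=' characters.
    return encoded.rstrip(b"=").decode("ascii")


print(encode_like_android_flag_11(b"hi"))        # aGk
print(encode_like_android_flag_11(b"\xfb\xff"))  # -_8
```

To decode such a string in Python, the stripped padding has to be added back first, because base64.urlsafe_b64decode expects a length that is a multiple of four.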


Python __add__ magic method with integers

When I try this:
>>> "a".__add__("b")
'ab'
It is working. But this
>>> 1.__add__(2)
SyntaxError: invalid syntax
is not working.
And this is working:
>>> 1 .__add__(2) # notice that space after 1
3
What is going on here? Is it about variable naming rules, and does Python think I am trying to create a variable when I don't use a space?
The Python parser is intentionally kept dumb and simple. When it sees 1., it thinks you are midway through a floating point number, and 1._ is not a valid number (or more correctly, 1. is a valid float, and you can't follow a value by _: "a" __add__("b") is also an error). Thus, anything that makes it clear that . is not a part of the number helps. Having a space before the dot works, as you discovered (since a space is not found in numbers, Python abandons the idea of a float and goes with integer syntax). Parentheses also help: (1).__add__(2). Adding another dot does as well: 1..__add__(2) (here, 1. is a valid float, then .__add__ does the usual thing).
The Python tokenizer tries to interpret an integer followed by a dot as a floating point number. To avoid this ambiguity, you have to add an extra space.
For comparison, the same code works without problems on a float:
>>> 4.23.__add__(2)
6.23
It also works if you put the int in parentheses:
>>> (5).__add__(4)
9
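When you write 1. the interpreter thinks you have started writing a float (in an IDE, at least in PyCharm, you can see that the dot is colored as part of the number). The space ends the integer literal 1, so the dot that follows is read as attribute access. 1..__add__(2) will also do the trick, but there 1. is the float 1.0, so the result is 3.0 rather than 3.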
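The variants discussed in these answers can be compared side by side. This is only an illustrative sketch; compile() is used so that the failing form can be shown without stopping the script.

```python
# Forms that make it clear the dot is attribute access on an int.
with_parens = (1).__add__(2)
with_space = 1 .__add__(2)

# Here "1." is a complete float literal, so __add__ is called on 1.0.
with_two_dots = 1..__add__(2)

# The problematic form is rejected before it ever runs.
try:
    compile("1.__add__(2)", "<demo>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(with_parens, with_space, with_two_dots, rejected)
```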

Python library to generate hash value as numbers

I am searching for a library that hashes a string and produces a number rather than an alphanumeric value
eg:
Input string: hello world
Salt value: 5467865390
Output value: 9223372036854775808
I have searched many libraries, but they all produce alphanumeric output, and I need plain numbers.
Is there any such library? Restricting the output to digits gives a higher chance of collisions, but that is fine for my business use case.
EDIT 1:
I also need to control the number of digits in the output. I want to store the value in a database column with a Numeric datatype, so the number of digits must fit within the range of that data type.
Hexadecimal hash codes can be interpreted as (rather large) numbers:
import hashlib
hex_hash = hashlib.sha1('hello world'.encode('utf-8')).hexdigest()
int_hash = int(hex_hash, 16) # convert hexadecimal to integer
print(hex_hash)
print(int_hash)
outputs
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
243667368468580896692010249115860146898325751533
EDIT: As asked in the comments, to limit the number to a certain range, you can simply use the modulus operator. Note, of course, that this will increase the possibility of collisions. For instance, we can limit the "hash" to 0 .. 9,999,999 with modulus 10,000,000.
limited_int_hash = int_hash % 10_000_000 # take the modulus of the integer, not of the hex string
print(limited_int_hash)
outputs
5751533
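Putting the pieces of this answer together with the salt from the question, one possible sketch looks like the following. The function name numeric_hash, the choice of SHA-256, and the way the salt is joined to the input are all assumptions made for illustration, not something the question or answer specifies.

```python
import hashlib


def numeric_hash(text: str, salt: str, digits: int) -> int:
    """Return a non-negative integer with at most `digits` decimal digits.

    Hypothetical helper: the salt is simply prepended to the text
    before hashing, which is one of several reasonable conventions.
    """
    payload = (salt + ":" + text).encode("utf-8")
    hex_hash = hashlib.sha256(payload).hexdigest()
    # Interpret the hex digest as an integer, then cut it down to size.
    return int(hex_hash, 16) % (10 ** digits)


value = numeric_hash("hello world", "5467865390", 18)
print(value)
```

Unlike the built-in hash(), a hashlib digest is the same in every process, so the value can be stored and recomputed later. Fewer digits mean more collisions, as noted above.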
I think there is no need for a library. You can accomplish this with the built-in hash() function in Python.
InputString="Hello World!!"
HashValue=hash(InputString)
print(HashValue)
print(type(HashValue))
Output:
8831022758553168752
<class 'int'>
Solution for the problem based on the latest EDIT:
The above method is the simplest solution. Note that in Python 3 the hash of a string changes on each invocation of the interpreter, because string hashing is randomized per process; this helps prevent attackers from exploiting predictable hash values in our application.
If you would like to switch off the randomization (for example, to get the same value on every run), you can do that by setting the environment variable PYTHONHASHSEED to zero.
For information on switching off the randomization, check the official docs: https://docs.python.org/3.3/using/cmdline.html#cmdoption-R

SHA512 crypt returns *0 when rounds=5000

For the past few days, the following Python program has been returning *0:
import crypt
# broken:
>>> crypt.crypt('pw', '$6$rounds=5000$0123456789abcdef')
'*0'
# works:
>>> crypt.crypt("pw", '$6$0123456789abcdef')
'$6$0123456789abcdef$zAYvvEJcrKSqV2KUPTUM1K9eaGv20n9mUjWSDZW0QnwBRk0L...'
>>> crypt.crypt('pw', '$6$rounds=5001$0123456789abcdef')
'$6$rounds=5001$0123456789abcdef$mG98GkftS5iu1VOpowpXm1fgefTbWnRm4rbw...'
>>> crypt.crypt("pw", '$6$rounds=4999$0123456789abcdef')
'$6$rounds=4999$0123456789abcdef$ulXwrQtpwNd/t6NVUJo53AXMpp40IrpCHFyC...'
I did the same with a small C program using crypt_r and the output was the same. I read in some posts that *0 and *1 will be returned when there are errors.
According to the manpage crypt(3) specifying the rounds=xxx parameter is supported since glibc 2.7 and the default is 5000, when no rounds parameter is given (like in the second example). But why am I not allowed to set rounds to 5000?
I'm using Fedora 28 with glibc 2.27. The results are the same with different Python versions (even Python2 and Python3). Using crypt in PHP also works as expected. But the most interesting thing is that running the same command in a Docker container (fedora:28) works:
>>> crypt.crypt("pw", '$6$rounds=5000$0123456789abcdef')
'$6$rounds=5000$0123456789abcdef$zAYvvEJcrKSqV2KUPTUM1K9eaGv20n9mUjWS...'
Does anybody know the reason for this behavior?
The libxcrypt sources contain this:
/* Do not allow an explicit setting of zero rounds, nor of the
default number of rounds, nor leading zeroes on the rounds. */
This was introduced in a commit “Add more tests based on gaps in line coverage.” with this comment:
This change makes us pickier about non-default round parameters to $5$ and $6$ hashes; numbers outside the valid range are now rejected, as are numbers with leading zeroes and an explicit request for the default number of rounds. This is in keeping with the observation, in the Passlib documentation, that allowing more than one valid crypt output string for any given (rounds, salt, phrase) triple is asking for trouble.
I suggest opening an issue if this causes too many compatibility problems. Alternatively, you can remove the rounds=5000 specification, but based on a quick glance, the change looks to me as if it should be reverted. It's not part of the original libcrypt implementation in glibc.
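If removing the rounds=5000 specification is the route taken, the salt string could be normalized before it is passed to crypt. The helper below is a hypothetical sketch (the name strip_default_rounds is invented here); it only rewrites the salt string and does not call crypt itself.

```python
import re

# Default number of rounds for $5$ and $6$ hashes, per crypt(3).
DEFAULT_ROUNDS = 5000


def strip_default_rounds(salt: str) -> str:
    """Drop an explicit 'rounds=5000$' from a $5$ or $6$ salt string."""
    pattern = r"^(\$[56]\$)rounds=%d\$" % DEFAULT_ROUNDS
    return re.sub(pattern, r"\1", salt)


print(strip_default_rounds("$6$rounds=5000$0123456789abcdef"))
print(strip_default_rounds("$6$rounds=5001$0123456789abcdef"))
```

Other round counts are left untouched, so only the case that libxcrypt rejects is rewritten.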

PyPDF2<=1.19 has issues with PDF encoding

I am trying to encrypt PDF files under Python 3.3.2 using PyPDF2.
The code is very simple:
password = 'password';
# password = password.encode('utf-8')
PDFout.encrypt(user_pwd=password,owner_pwd=password)
However, I am getting the following errors, depending on whether the encoding line is on or off:
on: TypeError: slice indices must be integers or None or have an __index__ method
off: TypeError: Can't convert 'bytes' object to str implicitly
Would you know by any chance how to resolve that problem?
Thanks and Regards
Peter
It appears to me that the current version of PyPDF2 (1.19 as of this writing) has some bugs concerning compatibility with Python 3, and that is what is causing both error messages. The change log on GitHub for PyPDF2 indicates that Python 3 support was added in version 1.16, which was released only 3 1/2 months ago, so it is possible this bug has not been reported or fixed yet. GitHub also shows that there is a branch of this project specifically for Python 3.3 support, which is not currently merged back into the main branch.
Both errors occur in the pdf.py file of the PyPDF2 module. Here is what is happening:
The PyPDF2 module creates some extra bytes as padding and concatenates it with your password. If the Python version is less than 3, the padding is created as a string literal. If the version is 3 or higher, the padding is encoded using the 'latin-1' encoding. In Python 3, this means the padding is a bytes object, and concatenating that with a string object (your password) produces the TypeError you saw. Under Python 2, the concatenation would work because both objects would be the same type.
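The first error can be reproduced in isolation under Python 3. In this sketch the padding bytes are a made-up placeholder, not the actual value PyPDF2 uses.

```python
# Placeholder padding; the real bytes in PyPDF2 are different.
padding = "\x28\xbf".encode("latin-1")  # a bytes object in Python 3
password = "password"                   # a str object

concat_error = None
try:
    password + padding  # str + bytes is not allowed in Python 3
except TypeError as exc:
    concat_error = exc

# Encoding the password first makes both operands bytes.
combined = password.encode("utf-8") + padding
print(type(concat_error).__name__, type(combined).__name__)
```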
When you encode your password using "utf-8", you resolve that problem since both the password and padding are bytes objects in that case. However, you end up running into a second bug later in the module. The pdf.py file creates and uses a variable "keylen" like this:
keylen = 128 / 8
... # later on in the code...
key = md5_hash[:keylen]
The division operator has been changing since Python 2.2 (which introduced "//" and made the new behavior available via from __future__ import division), and the new behavior became the default in Python 3. In brief, "/" applied to two ints means floor division in Python 2 and returns an int, but it means true division in Python 3 and returns a float. Therefore, "keylen" would be 16 in Python 2, but 16.0 in Python 3. Floats, unlike ints, can't be used as slice indices, so Python 3 raises the TypeError you saw when md5_hash[:keylen] is evaluated. Python 2 would run this without error, since keylen would be an int.
You could resolve this second problem by altering the module's source code to use the "//" operator (which means floor division and returns an int in both Python 2 and 3):
keylen = 128 // 8
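The difference between the two operators, and its effect on slicing, can be checked with a short standalone sketch under Python 3. The md5_hash value here is just an arbitrary digest for demonstration, not the one PyPDF2 computes.

```python
import hashlib

md5_hash = hashlib.md5(b"example").digest()  # 16 bytes

keylen_true = 128 / 8    # true division: a float in Python 3
keylen_floor = 128 // 8  # floor division: an int

try:
    md5_hash[:keylen_true]
    slicing_failed = False
except TypeError:
    # "slice indices must be integers or None or have an __index__ method"
    slicing_failed = True

key = md5_hash[:keylen_floor]  # works, keylen_floor is an int
print(keylen_true, keylen_floor, slicing_failed, len(key))
```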
However, you would then run into a third bug later in the code, also related to Python 3 compatibility. I won't belabor the point by describing it. The short answer to your question then, as far as I see it, is to either use Python 2, or patch the various code compatibility problems, or use a different PDF library for Python which has better support for Python 3 (if one exists which meets your particular requirements).
Try installing the most recent version of PyPDF2 - it now fully supports Python 3!
It seems that "some" support was added in 1.16, but it didn't cover all features. The library should now be fully compatible with Python 3.

how to avoid python numeric literals beginning with "0" being treated as octal?

I am trying to write a small Python 2.x API to support fetching a
job by jobNumber, where jobNumber is provided as an integer.
Sometimes the users provide a jobNumber as an integer literal
beginning with 0, e.g. 037537. (This is because they have been
coddled by R, a language that sanely considers 037537==37537.)
Python, however, considers integer literals starting with "0" to
be OCTAL, thus 037537!=37537, instead 037537==16223. This
strikes me as a blatant affront to the principle of least
surprise, and thankfully it looks like this was fixed in Python
3---see PEP 3127.
But I'm stuck with Python 2.7 at the moment. So my users do this:
>>> fetchJob(037537)
and silently get the wrong job (16223), or this:
>>> fetchJob(038537)
File "<stdin>", line 1
fetchJob(038537)
^
SyntaxError: invalid token
where Python is rejecting the octal-incompatible digit.
There doesn't seem to be anything provided via __future__ to
allow me to get the Py3K behavior---it would have to be built-in
to Python in some manner, since it requires a change to the lexer
at least.
Is anyone aware of how I could protect my users from getting the
wrong job in cases like this? At the moment the best I can think
of is to change that API so it takes a string instead of an int.
"At the moment the best I can think of is to change that API so it takes a string instead of an int."
Yes, and I think this is a reasonable option given the situation.
Another option would be to make sure that all your job numbers contain at least one digit greater than 7 so that adding the leading zero will give an error immediately instead of an incorrect result, but that seems like a bigger hack than using strings.
A final option could be to educate your users. It will only take five minutes or so to explain not to add the leading zero and what can happen if you do. Even if they forget or accidentally add the zero due to old habits, they are more likely to spot the problem if they have heard of it before.
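Perhaps you could take the input as a string and convert it to an int? With a string, int() always parses base 10 and ignores leading zeros, so no stripping is needed (stripping with lstrip("0") would turn "0" into an empty string and fail).
test = "001234505"
test = int(test, 10) # 1234505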
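A string-based entry point along these lines might look like the sketch below. The name parse_job_number and the validation rules are assumptions for illustration; the real fetchJob API is not shown in the question.

```python
def parse_job_number(raw):
    """Convert user input such as "037537" to the int 37537.

    Hypothetical helper: accepts a str (or an int) and always
    interprets the digits as base 10, never as octal.
    """
    text = str(raw).strip()
    if not text.isdigit():
        raise ValueError("job number must contain only digits: %r" % (raw,))
    return int(text, 10)


print(parse_job_number("037537"))  # 37537
print(parse_job_number("038537"))  # 38537
```

Note that this only protects users who pass a string. An integer literal such as 037537 is converted by the Python 2 parser before the function is ever called, so the function cannot detect it.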
