For the past few days, the following Python program has been returning *0:
import crypt
# broken:
>>> crypt.crypt('pw', '$6$rounds=5000$0123456789abcdef')
'*0'
# works:
>>> crypt.crypt("pw", '$6$0123456789abcdef')
'$6$0123456789abcdef$zAYvvEJcrKSqV2KUPTUM1K9eaGv20n9mUjWSDZW0QnwBRk0L...'
>>> crypt.crypt('pw', '$6$rounds=5001$0123456789abcdef')
'$6$rounds=5001$0123456789abcdef$mG98GkftS5iu1VOpowpXm1fgefTbWnRm4rbw...'
>>> crypt.crypt("pw", '$6$rounds=4999$0123456789abcdef')
'$6$rounds=4999$0123456789abcdef$ulXwrQtpwNd/t6NVUJo53AXMpp40IrpCHFyC...'
I did the same with a small C program using crypt_r, and the output was the same. I have read in some posts that *0 and *1 are returned when there is an error.
According to the crypt(3) man page, the rounds=xxx parameter has been supported since glibc 2.7, and the default is 5000 when no rounds parameter is given (as in the second example). So why am I not allowed to set rounds to 5000 explicitly?
I'm using Fedora 28 with glibc 2.27. The results are the same across different Python versions (both Python 2 and Python 3). Using crypt in PHP also works as expected. But the most interesting thing is that running the same command in a Docker container (fedora:28) works:
>>> crypt.crypt("pw", '$6$rounds=5000$0123456789abcdef')
'$6$rounds=5000$0123456789abcdef$zAYvvEJcrKSqV2KUPTUM1K9eaGv20n9mUjWS...'
Does anybody know the reason for this behavior?
The libxcrypt sources contain this:
/* Do not allow an explicit setting of zero rounds, nor of the
default number of rounds, nor leading zeroes on the rounds. */
This was introduced in a commit “Add more tests based on gaps in line coverage.” with this comment:
This change makes us pickier about non-default round parameters to $5$ and $6$ hashes; numbers outside the valid range are now rejected, as are numbers with leading zeroes and an explicit request for the
default number of rounds. This is in keeping with the observation, in
the Passlib documentation, that allowing more than one valid crypt
output string for any given (rounds, salt, phrase) triple is asking
for trouble.
I suggest opening an issue if this causes too many compatibility problems. Alternatively, you can drop the rounds=5000 specification, but based on a quick glance, the change looks to me as if it should be reverted; it is not part of the original libcrypt implementation in glibc.
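If you cannot avoid callers passing rounds=5000 explicitly, one workaround is to normalize the setting string before handing it to crypt. This is just a sketch; normalize_sha512_setting is my own helper name, not part of any library:
import crypt
import re
def normalize_sha512_setting(setting):
    # libxcrypt rejects an explicit rounds=5000 because it equals the default,
    # so drop that prefix and let the default apply instead.
    return re.sub(r'^\$6\$rounds=5000\$', '$6$', setting)
print(crypt.crypt('pw', normalize_sha512_setting('$6$rounds=5000$0123456789abcdef')))
# same result as crypt.crypt('pw', '$6$0123456789abcdef')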
Related
I am trying to recreate a Java program in Python, but I am stuck at this point:
import android.util.Base64 -> this is the package
Base64.encodeToString(bytes, 11)
The Python base64 module's encoding function only takes one parameter, the bytes to encode.
Apparently, according to the Android docs, the second parameter is supposed to be a flag, but I can't find any information about the number 11.
What does this mean and how can I implement this in python?
11 is a combination of flags (OR-ed together), specifically:
NO_PADDING, which has value 1,
NO_WRAP, which has value 2, and
URL_SAFE, which has value 8.
I don't know the exact way to reproduce this in Python, but I believe base64.urlsafe_b64encode gets you halfway there by implementing the equivalent of URL_SAFE. For NO_PADDING you could simply trim any trailing padding (i.e. = characters) from the output.
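Putting that together, here is a sketch of the Python equivalent (the function name is mine; note that Python's base64 functions never wrap lines, so NO_WRAP needs no extra handling):
import base64
def encode_like_android_flags_11(data):
    # URL_SAFE: use the '-' and '_' alphabet instead of '+' and '/'
    encoded = base64.urlsafe_b64encode(data)
    # NO_PADDING: drop any trailing '=' characters
    encoded = encoded.rstrip(b"=")
    # NO_WRAP: nothing to do, Python never inserts line breaks here
    return encoded.decode("ascii")
print(encode_like_android_flags_11(b"hello world"))  # aGVsbG8gd29ybGQ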
Armin Ronacher, http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/
If you for instance pass [the result of os.fsdecode() or equivalent] to a template engine you [sometimes get a UnicodeEncodeError] somewhere else entirely, and because the encoding happens at a much later stage you no longer know why the string was incorrect. If you detect that error when it happens, the issue becomes much easier to debug.
Armin suggests a function:
def remove_surrogate_escaping(s, method='ignore'):
assert method in ('ignore', 'replace'), 'invalid removal method'
return s.encode('utf-8', method).decode('utf-8')
Nick Coghlan, 2014, [Python-Dev] Cleaning up surrogate escaped strings
The current proposal on the issue tracker is to ... take advantage of
the existing error handlers:
def convert_surrogateescape(data, errors='replace'):
return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)
That code is short, but semantically dense - it took a few iterations to
come up with that version. (Added bonus: once you're alerted to the
possibility, it's trivial to write your own version for existing Python 3
versions. The standard name just makes it easier to look up when you come
across it in a piece of code, and provides the option of optimising it
later if it ever seems worth the extra work)
The functions are slightly different. The second was written with knowledge of the first.
Since Python 3.5, the backslashreplace error handler works on decoding as well as encoding. The first approach is not designed to use backslashreplace: the escaping would happen at the encode step, so a byte 0xff that originally failed to decode would come out as "\udcff" (the escape of the surrogate). The second approach is designed to handle this; it would print "\xff".
If you did not need backslashreplace, you might prefer the first version if you had the misfortune to be supporting Python < 3.5 (including polyglot 2/3 code, ouch).
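To make the difference concrete, a small demonstration (requires Python 3.5+ for backslashreplace on decoding):
raw = b"caf\xff"                                # 0xff is not valid UTF-8
s = raw.decode("utf-8", "surrogateescape")      # 'caf\udcff'
# First approach, with backslashreplace at the encode step:
# the lone surrogate itself gets escaped.
print(s.encode("utf-8", "backslashreplace").decode("utf-8"))   # caf\udcff
# Second approach: surrogateescape restores the original byte,
# then backslashreplace applies while decoding.
print(s.encode("utf-8", "surrogateescape").decode("utf-8", "backslashreplace"))   # caf\xff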
Question
Is there a better idiom for this purpose yet? Or do we still use this drop-in function?
Nick referred to an issue for adding such a function to the codecs module. As of 2019 the function has not been added, and the ticket remains open.
The latest comment says
msg314682 Nick Coghlan, 2018
A recent discussion on python-ideas also introduced me to the third party library, "ftfy", which offers a wide range of tools for cleaning up improperly decoded data.
That includes a lone surrogate fixer: ftfy.fixes.fix_surrogates(text)
...
I do not find the ftfy function appealing. The documentation does not say so, but it appears to be designed both to handle surrogateescape and, perhaps, to be part of a workaround for CESU-8, or something like that:
Replace 16-bit surrogate codepoints with the characters they represent (when properly paired), or with � otherwise.
I am trying to encrypt PDF files under Python 3.3.2 using PyPDF2.
The code is very simple:
password = 'password'
# password = password.encode('utf-8')
PDFout.encrypt(user_pwd=password,owner_pwd=password)
However, I am getting the following errors, depending on whether the encode line is commented out or not:
on: TypeError: slice indices must be integers or None or have an __index__ method
off: TypeError: Can't convert 'bytes' object to str implicitly
Would you know by any chance how to resolve that problem?
Thanks and Regards
Peter
It appears to me that the current version of PyPDF2 (1.19 as of this writing) has some bugs concerning compatibility with Python 3, and that is what is causing both error messages. The change log on GitHub for PyPDF2 indicates that Python 3 support was added in version 1.16, which was released only 3 1/2 months ago, so it is possible this bug hasn't been reported or fixed yet. GitHub also shows that there is a branch of this project specifically for Python 3.3 support, which is not currently merged back into the main branch.
Both errors occur in the pdf.py file of the PyPDF2 module. Here is what is happening:
The PyPDF2 module creates some extra bytes as padding and concatenates them with your password. If the Python version is less than 3, the padding is created as a string literal. If the version is 3 or higher, the padding is encoded using the 'latin-1' encoding. In Python 3, this means the padding is a bytes object, and concatenating it with a string object (your password) produces the TypeError you saw. Under Python 2, the concatenation would work because both objects would be the same type.
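That first error can be reproduced in isolation like this (the padding value below is just a placeholder, not PyPDF2's actual constant):
padding = "\x28\xbf\x4e\x5e".encode("latin-1")   # bytes, as PyPDF2 produces on Python 3
password = "password"                            # str
password + padding                               # TypeError: str and bytes cannot be concatenated
# (the exact wording varies by version, e.g. "Can't convert 'bytes' object to str implicitly")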
When you encode your password using "utf-8", you resolve that problem since both the password and padding are bytes objects in that case. However, you end up running into a second bug later in the module. The pdf.py file creates and uses a variable "keylen" like this:
keylen = 128 / 8
... # later on in the code...
key = md5_hash[:keylen]
True division was introduced in Python 2.2 (behind from __future__ import division) and became the default behavior of "/" in Python 3. In brief, "/" means floor division in Python 2 and returns an int, but it means true division in Python 3 and returns a float. Therefore, "keylen" would be 16 in Python 2, but 16.0 in Python 3. Floats, unlike ints, can't be used to slice sequences, so Python 3 throws the TypeError you saw when md5_hash[:keylen] is evaluated. Python 2 would run this without error, since keylen would be an int.
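The same behaviour can be seen in isolation, outside PyPDF2:
keylen = 128 / 8
print(keylen)        # 16.0 under Python 3 (true division); 16 under Python 2
print(128 // 8)      # 16 in both versions (floor division)
try:
    b"0123456789abcdef0123"[:keylen]
except TypeError as exc:
    print(exc)       # slice indices must be integers or None or have an __index__ method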
You could resolve this second problem by altering the module's source code to use the "//" operator (which means floor division and returns an int in both Python 2 and 3):
keylen = 128 // 8
However, you would then run into a third bug later in the code, also related to Python 3 compatibility. I won't belabor the point by describing it. The short answer to your question then, as far as I see it, is to either use Python 2, or patch the various code compatibility problems, or use a different PDF library for Python which has better support for Python 3 (if one exists which meets your particular requirements).
Try installing the most recent version of PyPDF2 - it now fully supports Python 3!
It seems that "some" support was added in 1.16, but it didn't cover all features. Now the library should be fully compatible with Python 3.
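For completeness, a minimal sketch of encrypting a PDF with a recent 1.x release of PyPDF2 under Python 3, using a plain str password and no manual .encode() call (the file names are placeholders):
from PyPDF2 import PdfFileReader, PdfFileWriter
reader = PdfFileReader("input.pdf")
writer = PdfFileWriter()
for page in reader.pages:
    writer.addPage(page)
password = "password"
writer.encrypt(user_pwd=password, owner_pwd=password)
with open("encrypted.pdf", "wb") as out:
    writer.write(out)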
Haskell and Python don't seem to agree on MurmurHash2 results. Python, Java, and PHP returned the same results, but Haskell doesn't. Am I doing something wrong with MurmurHash2 in Haskell?
Here is my code for Haskell Murmurhash2:
import Data.Digest.Murmur32
main = do
  print $ asWord32 $ hash32WithSeed 1 "woohoo"
And here is the code written in Python:
import murmur
if __name__ == "__main__":
    print murmur.string_hash("woohoo", 1)
Python returned 3650852671, while Haskell returned 3966683799.
From a quick inspection of the sources, it looks like the algorithm operates on 32 bits at a time. The Python version gets these by simply grabbing 4 bytes at a time from the input string, while the Haskell version converts each character to a single 32-bit Unicode index.
It's therefore not surprising that they yield different results.
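To illustrate how differently the two implementations consume the input (this only sketches the input handling, it is not a reimplementation of either hash):
import struct
s = "woohoo"
data = s.encode("ascii")
# The Python/C binding hashes the raw bytes, consumed as little-endian
# 32-bit words, four bytes at a time (plus a shorter tail).
full = len(data) // 4 * 4
words_from_bytes = [w for (w,) in struct.iter_unpack("<I", data[:full])]
# The Haskell package feeds one 32-bit word per character,
# taken from its Unicode code point.
words_from_chars = [ord(c) for c in s]
print(words_from_bytes)   # [1752133495] - the bytes b'wooh' packed as one word
print(words_from_chars)   # [119, 111, 111, 104, 111, 111]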
The murmur-hash package (I am its author) does not promise to compute the same hashes as other languages. If you rely on hashes being compatible with other software, I suggest you create newtype wrappers that compute hashes the way you want them. For text in particular, you need to at least specify the encoding. In your case you could convert the text to an ASCII string using Data.ByteString.Char8.pack, but that still doesn't give you the same hash, since the ByteString instance is more of a placeholder.
BTW, I'm not actively improving that package because MurmurHash2 has been superseded by MurmurHash3, but I keep accepting patches.
I am trying to write a small Python 2.x API to support fetching a
job by jobNumber, where jobNumber is provided as an integer.
Sometimes the users provide a jobNumber as an integer literal
beginning with 0, e.g. 037537. (This is because they have been
coddled by R, a language that sanely considers 037537==37537.)
Python, however, considers integer literals starting with "0" to
be OCTAL, thus 037537!=37537, instead 037537==16223. This
strikes me as a blatant affront to the principle of least
surprise, and thankfully it looks like this was fixed in Python
3---see PEP 3127.
But I'm stuck with Python 2.7 at the moment. So my users do this:
>>> fetchJob(037537)
and silently get the wrong job (16223), or this:
>>> fetchJob(038537)
File "<stdin>", line 1
fetchJob(038537)
^
SyntaxError: invalid token
where Python is rejecting the octal-incompatible digit.
There doesn't seem to be anything provided via __future__ to
allow me to get the Py3K behavior---it would have to be built-in
to Python in some manner, since it requires a change to the lexer
at least.
Is anyone aware of how I could protect my users from getting the
wrong job in cases like this? At the moment the best I can think
of is to change that API so it takes a string instead of an int.
At the moment the best I can think of is to change that API so it takes a string instead of an int.
Yes, and I think this is a reasonable option given the situation.
Another option would be to make sure that all your job numbers contain at least one digit greater than 7 so that adding the leading zero will give an error immediately instead of an incorrect result, but that seems like a bigger hack than using strings.
A final option could be to educate your users. It will only take five minutes or so to explain not to add the leading zero and what can happen if you do. Even if they forget or accidentally add the zero due to old habits, they are more likely to spot the problem if they have heard of it before.
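Going back to the first option (taking a string), here is a sketch of what the wrapper might look like; fetch_job and _lookup_job are hypothetical names standing in for your API:
def fetch_job(job_number):
    # Expect a string such as '037537'; anything that is not all digits is
    # rejected, which also catches an accidental int (it has no .isdigit()).
    if not hasattr(job_number, 'isdigit') or not job_number.isdigit():
        raise ValueError("pass the job number as a string of digits, e.g. fetch_job('037537')")
    # int(..., 10) always parses base 10, so '037537' -> 37537
    return _lookup_job(int(job_number, 10))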
Perhaps you could take the input as a string, strip leading zeros, then convert back to an int?
test = "001234505"
test = int(test.lstrip("0")) # 1234505