I'm using Python 2 and am attempting to perform SHA-256 on binary values using hashlib.
I've become a bit stuck, as I'm quite new to it all, but have cobbled together:
hashlib.sha256('0110100001100101011011000110110001101111'.decode('hex')).hexdigest()
I believe it interprets the string as hex, based on substituting the hex value ('68656c6c6f') into the above and it returning
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
and comparing to this answer, in which 'hello' or '68656c6c6f' is used.
I think the answer lies with the decode component, but I can't find an example for binary, only for 'hex' or 'utf-8'.
Is anyone able to suggest what needs to be changed so that the function interprets the input as binary values instead of hex?
Here is code that does each of the data conversions you are looking for. These steps can all be combined, but are separated here so you can see each value.
import hashlib
import binascii
binstr = '0110100001100101011011000110110001101111'
hexstr = "{0:0>{1}X}".format(int(binstr,2), len(binstr)//4) # pad to the full width so leading zero bits survive: '68656C6C6F'
data = binascii.a2b_hex(hexstr) # 'hello'
output = hashlib.sha256(data).hexdigest()
print output
OUTPUT:
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
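If you prefer a single expression, the same steps collapse into one line (a sketch of my own; it assumes the length of the binary string is a multiple of 8, so the hex string has an even number of digits):
import hashlib
import binascii
binstr = '0110100001100101011011000110110001101111'
print hashlib.sha256(binascii.a2b_hex("{0:0>{1}X}".format(int(binstr, 2), len(binstr) // 4))).hexdigest()
# prints 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824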
I am trying to read from a file a bunch of hex numbers.
lines = '4005297103CE40C040059B532A7472C440061509BB9597D7400696DBCF1E35CC4007206BB5B0A67B4007AF4B08111B87400840D4766460524008D47E0FFB4ABA400969A572EBAFE7400A0107CCFDF50E'
dummy = [lines[i:i+16] for i in range(0, len(lines), 16)]
rdummy = []
for elem in dummy[:-1]:
    rdummy.append(int(elem, 16))
These are 10 numbers of 16 hex digits each.
In particular, when reading the first one, I have:
print(dummy[0])
4005297103CE40C0
Now I would like to convert it to a float.
I have an IDL script that, when reading this number, gives 2.64523509.
The command used in IDL is
double(4613138958682833088,0)
where it appears 0 is an offset used when converting.
Is there a way to do this in Python?
You probably want to use the struct module for this; something like this seems to work:
import struct
lines ='4005297103CE40C040059B532A7472C440061509BB9597D7400696DBCF1E35CC4007206BB5B0A67B4007AF4B08111B87400840D4766460524008D47E0FFB4ABA400969A572EBAFE7400A0107CCFDF50E'
for [value] in struct.iter_unpack('>d', bytes.fromhex(lines)):
    print(value)
This results in 2.64523509 being printed first, which seems about right.
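Note that struct.iter_unpack needs Python 3.4 or later. If you would rather decode all the values in a single call, a rough equivalent (my own sketch, assuming the hex string's length is a multiple of 16 characters) is:
import struct
lines = '4005297103CE40C040059B532A7472C440061509BB9597D7400696DBCF1E35CC4007206BB5B0A67B4007AF4B08111B87400840D4766460524008D47E0FFB4ABA400969A572EBAFE7400A0107CCFDF50E'
data = bytes.fromhex(lines)
values = struct.unpack('>%dd' % (len(data) // 8), data)  # here '>10d': ten big-endian doubles
print(values[0])  # first value, 2.64523509 as above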
I am searching for a library where I can hash a string and get numbers rather than alphanumeric output.
e.g.:
Input string: hello world
Salt value: 5467865390
Output value: 9223372036854775808
I have searched many libraries, but they produce alphanumeric output, whereas I need plain numbers as output.
Is there any such library? Having only numbers as output raises the chance of collisions, but that is fine for my business use case.
EDIT 1:
Also, I need to control the number of digits in the output. I want to store the value in a database column with a Numeric datatype, so I need to limit the number of digits to fit within the data type's range.
Hexadecimal hash codes can be interpreted as (rather large) numbers:
import hashlib
hex_hash = hashlib.sha1('hello world'.encode('utf-8')).hexdigest()
int_hash = int(hex_hash, 16) # convert hexadecimal to integer
print(hex_hash)
print(int_hash)
outputs
'2aae6c35c94fcfb415dbe95f408b9ce91ee846ed'
243667368468580896692010249115860146898325751533
EDIT: As asked in the comments, to limit the number to a certain range, you can simply use the modulus operator. Note, of course, that this will increase the possibility of collisions. For instance, we can limit the "hash" to 0 .. 9,999,999 with modulus 10,000,000.
limited_hash = int_hash % 10_000_000
print(limited_hash)
outputs
5751533
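If you also want to mix in the salt from the question and cap the digit count in one place, here is a minimal sketch of my own (numeric_hash is a hypothetical helper name, and prefixing the salt to the text is just one possible way of combining them):
import hashlib

def numeric_hash(text, salt, digits=10):
    # hypothetical helper: salt the text, hash it, then take a modulus to get at most `digits` decimal digits
    hex_hash = hashlib.sha1((salt + text).encode('utf-8')).hexdigest()
    return int(hex_hash, 16) % (10 ** digits)

print(numeric_hash('hello world', '5467865390'))  # a number with at most 10 digits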
I think there is no need for libraries. You can simply accomplish this with the built-in hash() function in Python.
InputString="Hello World!!"
HashValue=hash(InputString)
print(HashValue)
print(type(HashValue))
Output:
8831022758553168752
<class 'int'>
Solution for the problem based on the latest edit:
The above method is the simplest solution; the hash changing for each interpreter invocation (hash randomization) helps prevent attackers from tampering with our application.
If you would like to switch off the randomization, you can do so by setting the
PYTHONHASHSEED environment variable to zero.
For information on switching off the randomization, check the official docs: https://docs.python.org/3.3/using/cmdline.html#cmdoption-R
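For the digit-limit part of the edit, the same modulus trick works with hash() as well; a minimal sketch of my own (assuming Python 3, and remembering that without a fixed PYTHONHASHSEED the value changes between interpreter runs):
InputString = "Hello World!!"
digits = 9                                      # e.g. to fit a NUMERIC(9) column
HashValue = abs(hash(InputString)) % 10**digits
print(HashValue)                                # at most 9 decimal digits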
I'm trying to build a way to find emojis in Twitter and relate them to the Unicode table that one can find at unicode.org, but I'm finding it hard to identify them because of what I think are encoding problems, or simply my misunderstanding of this topic. In short, what I did was build a "library" of emojis from the table found at http://www.unicode.org/emoji/charts/full-emoji-list.html, which contains the title and the code point (code) of each emoji. I scraped this in R with the library rvest.
The problem comes when I grab the information from Twitter with the twitteR API in R, as the codes for the emojis do not look at all like the ones in that table.
Let's have an example with the emoji of the 100 (one hundred points) red icon. This is number 1468 in the table linked above, and its code point is:
U+1F4AF
Now, when I grab it from Twitter, it is first shown like this in the status class that the API has built in to work with the tweets:
\xed��\xed��
Then I convert it to a data frame, which I also do with a built-in function from the twitteR API. For example:
tweet$toDataFrame()
The emoji becomes this:
<ed><U+00A0><U+00BD><ed><U+00B2><U+00AF>
I tried to convert it with the function iconv in R, with the following code:
iconv(tweet$text, from="UTF-8", to="ASCII", "byte")
and I only manage to make it look like this:
<ed><a0><bd><ed><b2><af>
So, wrapping up, at the end of my tests I got the following results:
<ed><a0><bd><ed><b2><af>
<ed><U+00A0><U+00BD><ed><U+00B2><U+00AF>
\xed��\xed��
None of which look like the code point specified by the table:
U+1F4AF
Is there any way to transform between the two representations?
What am I missing? Why is Twitter returning this information for emojis?
I didn't know anything about encoding before, but after days of reading I think I know what is going on. I don't understand perfectly how the encoding for emoji works, but I stumbled upon the same problem and solved it.
You want to map \xed��\xed�� to its name-decoded version: hundred points. A sensible way could be to scrape a dictionary online and use a key, such as Unicode, to replace it. In this case it would be U+1F4AF.
The conversions you show are not different encodings but different notation for the same encoded emoji:
as.data.frame(tweet) returns <ed><U+00A0><U+00BD><ed><U+00B2><U+00AF>.
iconv(tweet, from="UTF-8", to="ASCII", "byte") returns <ed><a0><bd><ed><b2><af>.
So using the Unicode code point directly isn't feasible. Another way could be to use a dictionary that already encodes emoji in the <ed>...<ed>... way, like the one here: emoji list. Voilà! The only catch is that that list is incomplete, because it comes from a dictionary that contains fewer emoticons.
The fast solution is to simply scrape a more complete dictionary and map the <ed>...<ed>... representation to its corresponding English text translation. I have done that already and posted it here.
Still, the fact that nobody else had posted a list with the proper encoding bugged me. In fact, most dictionaries I found used a UTF-8 encoding with not an <ed>...<ed>... representation but rather <f0>.... It turns out they both represent the same code point U+1F4AF; only the bytes are read differently.
Long answer: the tweet is read in UTF-16 and then converted to UTF-8, and this is where the conversions diverge. When the read is done in pairs of bytes, the result will be the UTF-8 <ed>...<ed>... form; when it is read in chunks of four bytes, the result will be the UTF-8 <f0>... form (why this is, I don't fully understand, but I suspect it has something to do with the architecture of your processor).
So a slower (but more deliberate) way to solve your problem is to scrape the <f0>... dictionary, convert it to UTF-16, and convert it back to UTF-8 by pairs; you'll end up with two <ed>... units. This pair is known as the high-low surrogate pair representation of the Unicode code point U+xxxxx.
As an example:
unicode <- 0x1F4Af
# Multibyte Version
intToUtf8(unicode)
# Byte-pair Version
hilo <- unicode2hilo(unicode)
intToUtf8(hilo)
Returns:
[1] "\xf0\u009f\u0092�"
[1] "\xed��\xed��"
Which, again, using iconv(..., 'utf-8', 'latin1', 'byte'), is the same as:
[1] "<f0><9f><92><af>"
[1] "<ed><a0><bd><ed><b2><af>"
PS1.:
The function unicode2hilo is a simple linear transformation from a code point to its hi-lo surrogate pair (and hilo2unicode is the inverse):
unicode2hilo <- function(unicode){
  hi = floor((unicode - 0x10000)/0x400) + 0xd800
  lo = (unicode - 0x10000) + 0xdc00 - (hi-0xd800)*0x400
  hilo = paste('0x', as.hexmode(c(hi,lo)), sep = '')
  return(hilo)
}

hilo2unicode <- function(hi,lo){
  unicode = (hi - 0xD800) * 0x400 + lo - 0xDC00 + 0x10000
  unicode = paste('0x', as.hexmode(unicode), sep = '')
  return(unicode)
}
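As a quick sanity check of these formulas: plugging in the example code point 0x1F4AF gives hi = floor((0x1F4AF - 0x10000)/0x400) + 0xD800 = 0xD83D and lo = 0xAF + 0xDC00 = 0xDCAF; written out byte by byte in UTF-8, those two values are <ed><a0><bd> and <ed><b2><af>, exactly the pair shown above.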
PS2.:
I would recommend using iconv(tweet, 'UTF-8', 'latin1', 'byte') to preserve special characters like áäà.
PS3.:
To replace the emoji with its English text, tag, hash, or anything you want to map it to, I would suggest using DFS in a graph of emojis, because some emojis' encodings are the concatenation of other, simpler emojis' encodings (i.e. <f0><9f><a4><b8><e2><80><8d><e2><99><82><ef><b8><8f> is a man cartwheeling, while independently <f0><9f><a4><b8> is person cartwheeling, <e2><80><8d> is nothing, <e2><99><82> is a male sign, and <ef><b8><8f> is nothing). While man cartwheeling and person cartwheeling plus male sign are obviously semantically related, I prefer the more faithful translation.
The answer provided by Felipe Suárez Colmenares is excellent because it describes the mechanics of this issue, but I wanted to point you here, which is a dictionary I made with the <ed> R encoding specifically for Twitter. I also have code on how to go through and identify prose versions of emojis. I thought this might be easier for people who stumble into this problem in the future. The dictionary is up to date with the most recent Unicode version (9), and once the even newer one comes out I'll update it then too.
Please try typing this: iconv(tweet$text, "latin1", "ASCII", sub="")
There is also a similar discussion here:
Emoticons in Twitter Sentiment Analysis in r
Regards,
Magda
Python 2.6 on Red Hat 6.3
I have a device that saves 32 bit floating point value across 2 memory registers, split into most significant word and least significant word.
I need to convert this to a float.
I have been using the following code found on SO, and it is similar to code I have seen elsewhere:
#!/usr/bin/env python
import sys
from ctypes import *
first = sys.argv[1]
second = sys.argv[2]
reading_1 = str(hex(int(first)).lstrip("0x"))
reading_2 = str(hex(int(second)).lstrip("0x"))
sample = reading_1 + reading_2
def convert(s):
    i = int(s, 16)                    # convert from hex to a Python int
    cp = pointer(c_int(i))            # make this into a c integer
    fp = cast(cp, POINTER(c_float))   # cast the int pointer to a float pointer
    return fp.contents.value          # dereference the pointer, get the float
print convert(sample)
An example of the register values would be:
register-1;16282 register-2;60597
This produces the resulting float of
1.21034872532
A perfectly cromulent number; however, sometimes the memory values are something like:
register-1;16282 register-2;1147
which, using this function, results in a float of:
1.46726675314e-36
which is a fantastically small number, and not a number that seems to be correct. This device should be producing readings around the 1.2 to 1.3 range.
What I am trying to work out is whether the device is throwing bogus values or whether the values I am getting are correct but the function I am using is not converting them properly.
Also, is there a better way to do this, for example with numpy or something of that nature?
I will hold my hand up and say that I have just copied this code from examples online and I have very little understanding of how it works; however, it seemed to work in the test cases that I had available to me at the time.
Thank you.
If you have the raw bytes (e.g. read from memory, from file, over the network, ...) you can use struct for this:
>>> import struct
>>> struct.unpack('>f', '\x3f\x9a\xec\xb5')[0]
1.2103487253189087
Here, \x3f\x9a\xec\xb5 are your input registers, 16282 (hex 0x3f9a) and 60597 (hex 0xecb5), expressed as bytes in a string. The > specifies big-endian byte order.
So depending on how you get the register values, you may be able to use this method (e.g. by converting your input integers to byte strings). You can use struct for this, too; this is your second example:
>>> raw = struct.pack('>HH', 16282, 1147) # from two unsigned shorts
>>> struct.unpack('>f', raw)[0] # to one float
1.2032617330551147
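Putting that together with the original command-line usage, a minimal sketch (my own, and assuming the first argument really is the most significant word in big-endian order) would be:
#!/usr/bin/env python
import struct
import sys

first = int(sys.argv[1])    # most significant word, e.g. 16282
second = int(sys.argv[2])   # least significant word, e.g. 60597 or 1147

# pack the two 16-bit words big-endian, then reinterpret the four bytes as a float
print struct.unpack('>f', struct.pack('>HH', first, second))[0]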
The way you're converting the two ints makes implicit assumptions about endianness that I believe are wrong.
So, let's back up a step. You know that the first argument is the most significant word, and the second is the least significant word. So, rather than try to figure out how to combine them into a hex string in the appropriate way, let's just do this:
import struct
import sys
first = sys.argv[1]
second = sys.argv[2]
sample = int(first) << 16 | int(second)
Now we can just convert like this:
def convert(i):
    s = struct.pack('=i', i)
    return struct.unpack('=f', s)[0]
And if I try it on your inputs:
$ python floatify.py 16282 60597
1.21034872532
$ python floatify.py 16282 1147
1.20326173306
As I understand it, files like /dev/urandom provide just a constant stream of bits. The terminal emulator then tries to interpret them as strings, which results in a mess of unrecognised characters.
How would I go about doing the same thing in Python, i.e. sending a string of ones and zeros to the terminal as "raw bits"?
edit
I may have to clarify:
Say, for example, the string I want to "print" is 1011100. On an ASCII system, the output should be "\". If I cat /dev/urandom, it provides a constant stream of bits, which get printed like this: "���c�g/�t]+__��-�;". That's what I want.
Stephano: the key is the incomplete answer by "#you" above - the chr function:
import random, sys
for i in xrange(500):
    sys.stdout.write(chr(random.randrange(256)))
Use the chr function. It takes an input between 0 and 255 and returns a string containing the character corresponding to that value.
And from another question on StackOverflow you can get a _bin function.
def _bin(x, width):
    return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))
Then simply call _bin(ord(x), 8), where x is a character (a string of length one).
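For the exact example in the question, a tiny sketch of my own combining int(..., 2) and chr:
import sys

bits = '1011100'                       # 0b1011100 == 92, the ASCII code for '\'
sys.stdout.write(chr(int(bits, 2)))    # writes the raw byte, printing a single backslash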
import sys, random
while True:
    # emit one random byte at a time, like reading from /dev/urandom
    sys.stdout.write(chr(random.getrandbits(8)))
    sys.stdout.flush()