How do you convert string to bytes - python

I am using Python 2.7.3 and am trying to get a conversion to work, but am having some issues.
This is code that works the way I would like it to:
testString = "\x00\x13\xA2\x00\x40\xAA\x15\x47"
print 'Test String:',testString
This produces the following result
Test String: ¢@ªG
Now I load the same string as above along with some other data:
\x00\x13\xA2\x00\x40\xAA\x15\x47123456
into a SQLite3 database and then pull it from the database as such:
cur.execute('select datafield from databasetable')
rows = cur.fetchall()
if len(rows) == 0:
    print 'Sorry Found Nothing!'
else:
    row = rows[0]
    print row[0][:32]
This however produces the following result:
\x00\x13\xA2\x00\x40\xAA\x15\x47
I cannot figure out how to convert the database-stored string to the byte string (if that is what it is) that the first snippet of code produces. I actually need it loaded into a variable in that format so I can pass it to a function for further processing.
The following I have tried:
print "My Addy:",bytes(row[0][:32])
print '{0}'.format(row[0][:32])
...
They all produce the same results...
First, can anyone please tell me what format the first result is in? I think it's bytes format, but I am not sure.
Second, how can I convert the database stored text into
I would be eternally grateful for any help.
Thanks in advance,
Ed

The problem is that you're not storing the value in the database properly. You want to store a sequence of bytes, but you're storing an escaped version of those bytes instead.
When entering string literals into a programming language, you can use escape codes in your source code to access non-printing characters. That's what you've done in your first example:
testString = "\x00\x13\xA2\x00\x40\xAA\x15\x47"
print 'Test String:',testString
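But those escapes are processed by the Python interpreter as it reads your source code. Text that arrives at run time, for example from a database, is never processed that way, so its backslashes and hex digits stay as literal characters.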
Change the database column to a binary blob instead of a string, then go back to the code you're using to store the bytes in SQLite3, and have it store the actual bytes ('ABC', 3 bytes) instead of an escaped string ('\x41\x42\x43', 12 characters).
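A minimal Python 3 sketch of storing the real bytes, assuming an in-memory database and the table and column names from the question (databasetable, datafield). In Python 3 the sqlite3 module binds a bytes object as a BLOB and returns it as bytes; in Python 2.7 you would wrap the value with sqlite3.Binary instead.

```python
import sqlite3

# In-memory database for illustration; table and column names follow the question.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE databasetable (datafield BLOB)")

# The 8 actual bytes, not the 32-character escaped text.
raw = b"\x00\x13\xA2\x00\x40\xAA\x15\x47"
cur.execute("INSERT INTO databasetable (datafield) VALUES (?)", (raw,))
conn.commit()

cur.execute("SELECT datafield FROM databasetable")
rows = cur.fetchall()
stored = rows[0][0]  # a bytes object again, identical to raw
print(len(stored), stored == raw)  # 8 True
```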
If you truly need to store the escaped string in SQLite3 and convert it at run-time, you might be able to use ast.literal_eval() to evaluate it as a string literal.
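If the escaped text has to stay in the database, here is a sketch of the ast.literal_eval route in Python 3, assuming the stored value contains no quote characters. Wrapping the text in b'...' makes literal_eval parse it as a bytes literal, so each \xNN becomes one byte; unlike eval, literal_eval only accepts literals and won't run arbitrary code.

```python
import ast

# The value as it comes back from the database: literal backslashes and hex digits.
escaped = r"\x00\x13\xA2\x00\x40\xAA\x15\x47123456"

# Wrapping the text in b'...' makes literal_eval parse it as a bytes literal.
decoded = ast.literal_eval("b'" + escaped + "'")
print(decoded)       # b'\x00\x13\xa2\x00@\xaa\x15G123456'
print(len(decoded))  # 14
```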

Related

How to create variables for substitution based on user for unique filepath in python?

I'm writing code that I want to make generic to whoever needs to follow it.
Part of the code reads in an Excel file that the user has to download. I know that each user has a specific 6-digit unique ID, and the folder and name of the file remain the same. Is there some way for me to modify the pd.read_csv call so that it is like this:
USERID = '123abc'
pd.read_csv(r'C:\Users\USERID\Documents\Dataset.csv')
I keep getting stuck because there is an ' next to the r so concatenation with a constant does not seem to work.
Similarly, is there a method for code for exporting that would insert the current date in the title?
What you want to use are formatted strings. The r preceding the string literal in your code only makes it a raw string, meaning backslashes are treated as literal characters; it does nothing to substitute your variable's value into the string. Python's docs explain what these raw strings are:
Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. (3.10.4 Python Language Reference, Lexical Analysis)
Like Fredericka mentions in her comment, a formatted string literal (f-string) is a great way to accomplish what you're trying to do; f-strings require Python 3.6 or greater. On any version you can also use the format method on the string, which does the same thing.
import pandas as pd
# set the User ID
user_id = "PythonUser1"
# print the full filepath
print("C:\\Users\\{}\\Documents\\Dataset.csv".format(user_id))
# read the CSV file using a formatted string literal (note the f prefix)
my_csv = pd.read_csv(f"C:\\Users\\{user_id}\\Documents\\Dataset.csv")
# read the CSV file using the format method
my_csv = pd.read_csv("C:\\Users\\{}\\Documents\\Dataset.csv".format(user_id))
For more information, I'd recommend checking out the official Python docs on input and output.
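The question also asked about putting the current date in an export file name. A minimal sketch, assuming Python 3.6+ and the hypothetical ID 123abc from the question: an rf"..." prefix combines a raw string (backslashes kept literally) with f-string substitution, and a datetime.date accepts strftime-style format codes after the colon inside the braces.

```python
from datetime import date

user_id = "123abc"  # hypothetical ID, as in the question

# rf"..." is a raw f-string: backslashes stay literal and {user_id} is substituted.
input_path = rf"C:\Users\{user_id}\Documents\Dataset.csv"

# Today's date in ISO form inside the export name, e.g. Dataset_2024-01-31.csv
output_path = rf"C:\Users\{user_id}\Documents\Dataset_{date.today():%Y-%m-%d}.csv"

print(input_path)   # C:\Users\123abc\Documents\Dataset.csv
print(output_path)
```

The resulting output_path can then be passed to DataFrame.to_csv.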

hex header of file, magic numbers, python

I have a program in Python which analyses file headers and decides which file type it is. (https://github.com/LeoGSA/Browser-Cache-Grabber)
The problem is the following:
I read the first 24 bytes of a file:
with open(from_folder + "/" + i, "rb") as myfile:
    header = str(myfile.read(24))
Then I look for a pattern in it:
if y[1] in header:
    shutil.move(from_folder + "/" + i, to_folder + y[2] + i + y[3])
where y = ['/video', r'\x47\x40\x00', '/video/', '.ts']
y[1] is the pattern: r'\x47\x40\x00'.
The file does contain it (the original post showed this in a picture).
But the program does NOT find this pattern (r'\x47\x40\x00') in the file header.
So I tried printing header:
You see? Python shows it as 'G@' instead of '\x47\x40'.
And if I search for 'G@'+r'\x00' in header, everything is OK. It finds it.
Question: What am I doing wrong? I want to look for r'\x47\x40\x00' and find it, not for some strange 'G@'+r'\x00'.
OR
Why does Python show the first two bytes as 'G@' and not as '\x47\x40', even though it shows the rest of the header in hex? Is there a way to fix it?
import binascii
with open(from_folder + "/" + i, "rb") as myfile:
    header = myfile.read(24)
header = binascii.hexlify(header).decode("ascii")
The result I get is:
4740001b0000b00d0001c100000001efff3690e23dffffff
And I can work with it.
P.S. But anyway, if anybody can explain what the problem with the first two bytes was, I would be grateful.
In Python 3 you'll get bytes from a binary read, rather than a string.
No need to convert it to a string by str.
Print will try to convert bytes to something human readable.
If you don't want that, convert your bytes to e.g. hex representations of the integer values of the bytes by:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print(aBytes)
print(''.join([hex(aByte) for aByte in aBytes]))
Output as redirected from the console:
b'\x00G@\x00\x13\x00\x00\xb0'
0x00x470x400x00x130x00x00xb0
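You can search in aBytes directly with the in operator, as long as the pattern is also bytes: b'\x47\x40\x00' in aBytes is True. What fails is searching str(header) for r'\x47\x40\x00', because that pattern is 12 literal characters, while str() shows the bytes 0x47 and 0x40 as 'G' and '@'.
If you still want to apply a string search on text like '\x00\x47\x40', use: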
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print(aBytes)
print(r'\x'.join([''] + ['%0.2x' % aByte for aByte in aBytes]))
Which will give you:
b'\x00G@\x00\x13\x00\x00\xb0'
\x00\x47\x40\x00\x13\x00\x00\xb0
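So there are a number of separate issues at play here:
print shows bytes in a human-readable form: printable ASCII bytes appear as characters (0x47 as 'G', 0x40 as '@') and everything else as \x escapes. That's why the first two bytes looked different.
str(header) turns the bytes into that same display text, so a raw-string pattern like r'\x47\x40\x00' won't match it. Either search the bytes with a bytes pattern, or convert both sides to strings containing fixed-length hex representations, as shown.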
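A sketch of the bytes-only approach in Python 3. SIGNATURES and classify are hypothetical names modelled on the question's y list, and io.BytesIO stands in for the real file so the example runs on its own. It keeps the question's "anywhere in the header" check; for magic numbers that must sit at offset 0, header.startswith(magic) is the stricter test.

```python
import io

# Hypothetical signature table shaped like the question's y list:
# (destination subfolder, magic bytes, file extension)
SIGNATURES = [
    ("/video/", b"\x47\x40\x00", ".ts"),
]


def classify(header):
    """Return (subfolder, extension) for the first signature found, else None."""
    for folder, magic, ext in SIGNATURES:
        if magic in header:  # bytes-in-bytes subsequence search
            return folder, ext
    return None


# Stand-in for open(path, "rb"), using the header bytes printed in the question.
sample = io.BytesIO(bytes.fromhex("4740001b0000b00d0001c100000001efff3690e23dffffff"))
header = sample.read(24)

print(classify(header))  # ('/video/', '.ts')
print(header[:3].hex())  # 474000
```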

Python: String indices must be integers

I get this string from stdin.
{u'trades': [Custom(time=1418854520, sn=47998, timestamp=1418854517,
price=322, amount=0.269664, tid=48106793, type=u'ask',
start=1418847319, end=1418847320), Custom(time=1418854520, sn=47997,
timestamp=1418854517, price=322, amount=0.1, tid=48106794,
type=u'ask', start=1418847319, end=1418847320),
Custom(time=1418854520, sn=47996, timestamp=1418854517, price=321.596,
amount=0.011, tid=48106795, type=u'ask', start=1418847319,
end=1418847320)]}
My program fails when I try to access jsonload["trades"]. If I use jsonload[0], I only receive one character: {.
I checked that reading the text from stdin isn't the problem, but I don't know whether the problem is the format of the data I receive (because I used the Incursion library) or my Python code. I have tried many combinations of json.load/s and json.dump/s without success.
inputdata = sys.stdin.read()
jsondump = json.dumps(inputdata)
jsonload = json.loads(jsondump)
print jsonload
print type(jsonload) # return me "<type 'unicode'>"
print repr(jsonload) # return me same but with u" ..same string.... "
for row in jsonload["trades"]: # error here: TypeError: string indices must be integers
You read input data into a string. This is then turned into a JSON encoded string by json.dumps. You then turn it back into a plain string using json.loads. You have not interpreted the original data as JSON at any point.
Try just converting the input data from json:
inputdata = sys.stdin.read()
jsonload = json.loads(inputdata)
However, this will not work because your snippet is not valid JSON. It looks like serialized Python code (the repr of a Python object). You can check the input data using http://jsonlint.com
The use of u'trades' shows that you have a Python unicode string; the JSON equivalent would be "trades". To convert the Python code you could eval it (with a Custom class in scope, since the data calls Custom(...)), but this is a dangerous operation if the data comes from an untrusted source.
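A short Python 3 sketch of the difference, using a hypothetical, valid-JSON version of the trades data (the real input, as noted, is not JSON): a dumps/loads round trip hands back the same text, while json.loads on the text itself produces a dict you can index by key.

```python
import json

# Hypothetical valid-JSON stand-in for the trades data.
raw = '{"trades": [{"price": 322, "amount": 0.269664, "type": "ask"}]}'

# dumps() wraps the text in a JSON string literal; loads() unwraps it again.
round_trip = json.loads(json.dumps(raw))
print(type(round_trip), round_trip == raw)  # <class 'str'> True

# Parsing the text itself yields a dict, so string keys work.
data = json.loads(raw)
for trade in data["trades"]:
    print(trade["price"], trade["type"])  # 322 ask
```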

How do I make Python 3.4 c_char_array read strings as two bytes?

I'm using pypyodbc with Python 3.4 on Ubuntu 12.04.
I'm trying to get the column names, but something is a little wonky. What is coming back is just the first character as a byte, like this:
(Pdb) Cname.value
b'T'
The thing behind the scenes is a ctypes char array:
(Pdb) Cname
<ctypes.c_char_Array_1024 object at 0xb6a1ad1c>
But if I look at the raw value:
(Pdb) Cname.raw
b'T\x00Y\x00P\x00E\x00_\x00N\x00A\x00M\x00E\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
you can see that the value TYPE_NAME is separated by \x00.
So it appears to me that what's happening is something (ctypes?) is reading that first \x00 as the null terminator for the string instead of part of the characters.
What can I do to modify the way ctypes is being used so that it will read the entire string? Everything else seems to work fine, it's just the descriptions that are wonky.
Your string is encoded in UTF-16LE. You want to call something like Cname.raw.decode('utf_16le').rstrip('\x00'). That will return a Python string, which you can then do with as you please.
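The effect is easy to reproduce without the ODBC driver. In this sketch ctypes.create_string_buffer stands in for the buffer pypyodbc fills (an assumption for illustration): .value stops at the first NUL byte, which for ASCII text in UTF-16LE is the high byte of the first character, while .raw returns all 1024 bytes.

```python
import ctypes

# Simulate the driver output: a 1024-byte char buffer holding UTF-16LE text.
Cname = ctypes.create_string_buffer("TYPE_NAME".encode("utf_16le"), 1024)

print(Cname.value)  # b'T'; .value stops at the first NUL byte
name = Cname.raw.decode("utf_16le").rstrip("\x00")
print(name)  # TYPE_NAME
```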

Working with unicode encoded Strings from Active Directory via python-ldap

I already asked about this problem, but after some testing I decided to create a new question with more specific info:
I am reading user accounts with python-ldap (and Python 2.7) from our Active Directory. This works well, but I have problems with special characters. They look like UTF-8 encoded strings when printed on the console. The goal is to write them into a MySQL DB, but I can't get those strings into proper UTF-8 in the first place.
Example (fullentries is my array with all the AD entries):
fullentries[23][1].decode('utf-8', 'ignore')
print fullentries[23][1].encode('utf-8', 'ignore')
print fullentries[23][1].encode('latin1', 'ignore')
print repr(fullentries[23][1])
A second test with a string inserted by hand as follows:
testentry = "M\xc3\xbcller"
testentry.decode('utf-8', 'ignore')
print testentry.encode('utf-8', 'ignore')
print testentry.encode('latin1', 'ignore')
print repr(testentry)
The output of the first example is:
M\xc3\xbcller
M\xc3\xbcller
u'M\\xc3\\xbcller'
Edit: If I try to replace the double backslashes with .replace('\\\\', '\\'), the output remains the same.
The output of the second example:
Müller
M�ller
'M\xc3\xbcller'
Is there any way to get the AD output properly encoded? I already read a lot of documentation, but it all states that LDAPv3 gives you strictly UTF-8 encoded strings. Active Directory uses LDAPv3.
My older question on this topic is here: Writing UTF-8 String to MySQL with Python
Edit: Added repr(s) info
First, know that printing to a Windows console is often the step that garbles data, so for your tests, you should print repr(s) to see the precise bytes you have in your string.
You need to find out how the data from AD is encoded. Again, print repr(s) will let you see the content of the data.
UPDATED:
OK, it looks like you're getting strange strings somehow. There might be a way to get them better, but you can adapt in any case, though it isn't pretty:
u.decode('unicode_escape').encode('iso8859-1').decode('utf8')
You might want to look into whether you can get the data in a more natural format.
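On Python 3, str has no decode method, so the same repair needs an extra encode step first. A sketch, assuming the value contains only ASCII characters and literal \xNN escapes, like the repr shown in the question:

```python
# Text containing literal backslash escapes, like u'M\\xc3\\xbcller' in the question.
s = "M\\xc3\\xbcller"

fixed = (
    s.encode("ascii")            # b'M\\xc3\\xbcller': same characters as bytes
    .decode("unicode_escape")    # each \xNN escape becomes one code point
    .encode("iso8859-1")         # code points 0-255 back to raw bytes: b'M\xc3\xbcller'
    .decode("utf8")              # interpret those bytes as UTF-8
)
print(fixed)  # Müller
```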
