Python search and replace in binary file - python

I am trying to search and replace some text (e.g. 'Smith, John') in this PDF form file (header.fdf, which I presume is treated as a binary file):
'%FDF-1.2\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<</FDF<</Fields[<</V(M)/T(PatientSexLabel)>><</V(24-09-1956 53)/T(PatientDateOfBirth)>><</V(Fisher)/T(PatientLastNameLabel)>><</V(CNSL)/T(PatientConsultant)>><</V(28-01-2010 18:13)/T(PatientAdmission)>><</V(134 Field Street\\rBlackburn BB1 1BB)/T(PatientAddressLabel)>><</V(Smith, John)/T(PatientName)>><</V(24-09-1956)/T(PatientDobLabel)>><</V(0123456)/T(PatientRxr)>><</V(01234567891011)/T(PatientNhsLabel)>><</V(John)/T(PatientFirstNameLabel)>><</V(0123456)/T(PatientRxrLabel)>>]>>>>\nendobj\ntrailer\n<</Root 1 0 R>>\n%%EOF\n'
After
f=open("header.fdf","rb")
s=f.read()
f.close()
s=s.replace(b'PatientName',name)
the following error occurs:
Traceback (most recent call last):
File "/home/aj/Inkscape/Med/GAD/gad.py", line 56, in <module>
s=s.replace(b'PatientName',name)
TypeError: expected an object with the buffer interface
How best to do this?

f=open("header.fdf","rb")
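s=f.read().decode('latin-1') # latin-1 maps every byte to exactly one character
f.close()
s=s.replace('PatientName',name) # both arguments are now text strings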
or
f=open("header.fdf","rb")
s=f.read()
f.close()
s=s.replace(b'PatientName',name.encode('ascii'))
Probably the latter, as I don't think you are going to be able to use Unicode names with this type of substitution anyway.

You must be using Python 3.X. You didn't define 'name' in your example, but it is the problem. Likely you defined it as a Unicode string:
name = 'blah'
It needs to be a bytes object too:
name = b'blah'
This works:
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('file.txt','rb')
>>> s = f.read()
>>> f.close()
>>> s
b'Test File\r\n'
>>> name = b'Replacement'
>>> s=s.replace(b'File',name)
>>> s
b'Test Replacement\r\n'
In a bytes object, the arguments to replace must both be bytes objects.
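If the replacement text arrives as an ordinary string, it has to be encoded before it is passed to replace. Here is a minimal sketch of that under Python 3. The helper name replace_field and the choice of latin-1 are my own assumptions and not something from the question. A real FDF value would also need parentheses and backslashes escaped, which this sketch does not attempt.

```python
def replace_field(data, old, new_text, encoding="latin-1"):
    """Replace the bytes `old` in `data` with `new_text` (hypothetical helper).

    `encoding` is an assumption; pick whatever the target file really uses.
    """
    # bytes.replace() needs bytes for both arguments, so encode str first.
    new = new_text.encode(encoding) if isinstance(new_text, str) else new_text
    return data.replace(old, new)


sample = b"<</V(Smith, John)/T(PatientName)>>"
result = replace_field(sample, b"Smith, John", "Doe, Jane")
print(result)  # b'<</V(Doe, Jane)/T(PatientName)>>'
```

The same call works when the new value is already a bytes object, in which case it is passed through unchanged.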

Related

Unable to concatenate strings in Python2.7

I'm trying to concatenate two strings in my function. I tried every concatenation method I know, but the two strings just don't join one after another; instead, the shorter string B (length s) replaces the first s characters of the longer string A.
I read some data from input file, and store third line whose content is "00001M035NNYY1111111" into a variable called applicant:
data = open("input.txt").read().split('\n')
applicant = str(data[2])
I want to add the integer 8 at the end of applicant, so the new applicant will be "00001M035NNYY11111118". I tried applicant += str(8) and "".join((applicant, str(8))) and other concatenation methods, but all of them only give me "80001M035NNYY1111111"... Does anyone know why this happens and what I should do to get the intended result?
You probably have Windows line endings (\r\n) in your file. Splitting on \n leaves a trailing \r on each line; when the string is printed, the \r moves the cursor back to the start of the line, so the appended 8 overwrites the first character. You can strip it while reading:
with open("input.txt") as f:
    data = [line.rstrip() for line in f]
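To see the stray carriage return directly, it helps to look at repr() of the string rather than printing it. A small sketch, using an in-memory string in place of input.txt (the first two lines of the sample are made up; only the third comes from the question):

```python
# Stand-in for the contents of input.txt with Windows line endings.
raw = "line one\r\nline two\r\n00001M035NNYY1111111\r\n"

# Splitting on '\n' alone keeps the trailing '\r' on every line.
data = raw.split("\n")
applicant = data[2] + str(8)
print(repr(applicant))  # '00001M035NNYY1111111\r8'

# splitlines() treats '\r\n' as a single line break, so nothing is left over.
fixed = raw.splitlines()[2] + str(8)
print(fixed)  # 00001M035NNYY11111118
```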
This should work
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = open("input.txt").read().split("\n")
>>> applicant = data[2] + str(8)
>>> print applicant
00001M035NNYY11111118
>>>
There is probably something wrong with your text file if this does not work.

Python using cStringIO with foreach loop

I want to iterate over the lines of a cStringIO object, but it does not seem to work with a for loop. To be more precise, it behaves as if the collection were empty. What am I doing wrong?
example:
Python 2.7.12 (default, Aug 29 2016, 16:51:45)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cStringIO
>>> s = cStringIO.StringIO()
>>> import os
>>> s.write("Hello" + os.linesep + "World" + os.linesep)
>>> s.getvalue()
'Hello\nWorld\n'
>>> for line in s:
...     print line
...
>>>
Thank you.
cStringIO.StringIO() returns a cStringIO.InputType object (an input stream) if you pass it a string; otherwise it returns a cStringIO.OutputType object (an output stream).
In [13]: sio = cStringIO.StringIO()
In [14]: sio??
Type: StringO
String form: <cStringIO.StringO object at 0x7f63d418f538>
Docstring: Simple type for output to strings.
In [15]: isinstance(sio, cStringIO.OutputType)
Out[15]: True
In [16]: sio = cStringIO.StringIO("dsaknml")
In [17]: sio??
Type: StringI
String form: <cStringIO.StringI object at 0x7f63d4218580>
Docstring: Simple type for treating strings as input file streams
In [18]: isinstance(sio, cStringIO.InputType)
Out[18]: True
So you can do either read operations or write operations on a given object, but not both. A simple way to read from a cStringIO.OutputType object is to get its contents as a string with the getvalue() method.
If you try to do both, one of them is silently ignored.
cStringIO.OutputType.getvalue(c_string_io_object)
Try using the string split method:
for line in s.getvalue().split('\n'): print line
...
Hello
World
Or as suggested, if you are always splitting on a new line:
for line in s.getvalue().splitlines(): print line
You can read the contents from an open file handle after writing, but you first have to use the seek(0) method to move the pointer back to the start. This will work for either cStringIO or a real file:
import cStringIO
s = cStringIO.StringIO()
s.write("Hello\nWorld\n") # Python automatically converts '\n' as needed
s.getvalue()
# 'Hello\nWorld\n'
s.seek(0) # move pointer to start of file
for line in s:
    print line.strip()
# Hello
# World
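For anyone on Python 3, where cStringIO no longer exists, the same seek(0) idea can be sketched with io.StringIO. This is a substitute for the module in the question, not the original API:

```python
import io

buf = io.StringIO()
buf.write("Hello\nWorld\n")

# After writing, the position is at the end, so iterating yields nothing.
before = list(buf)

# Rewind to the start, then iterate line by line.
buf.seek(0)
after = [line.rstrip("\n") for line in buf]

print(before, after)  # [] ['Hello', 'World']
```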

Python recognizes the function count as a name

I am working through the Python tutorials from the Pascal institute, which the BDFL says are the best place to start, and I have a very basic question.
The tutorial says:
How many of each base does this sequence contains?
>>> count(seq, 'a')
35
>>> count(seq, 'c')
21
>>> count(seq, 'g')
44
>>> count(seq, 't')
12
When I try it, it does not work:
>>> count(seq, 'a')
Traceback (most recent call last):
File "<pyshell#140>", line 1, in <module>
count(seq, 'a')
NameError: name 'count' is not defined
Why is this happening?
I've searched Stack resources, BTW, and didn't find anything.
COMMENT
Take a look at the start of the section 1.1.3. You have to type first from string import *
>>> from string import*
>>> nb_a = count(seq, 'a')
Traceback (most recent call last):
File "<pyshell#73>", line 1, in <module>
nb_a = count(seq, 'a')
NameError: name 'count' is not defined
>>> from string import *
>>> nb_a = count(seq, 'a')
Traceback (most recent call last):
File "<pyshell#75>", line 1, in <module>
nb_a = count(seq, 'a')
NameError: name 'count' is not defined
I did.
ANSWER
>>> from string import *
>>> from string import count
Traceback (most recent call last):
File "<pyshell#93>", line 1, in <module>
from string import count
ImportError: cannot import name count
>>> from string import count
Traceback (most recent call last):
File "<pyshell#94>", line 1, in <module>
from string import count
ImportError: cannot import name count
I did. Didn't work.
The tutorial you linked to is very old:
Python 2.4.2 (#1, Dec 20 2005, 16:25:40)
You're probably using a more modern Python (>= 3) in which case there are no longer string functions like count in the string module. We used to have
Python 2.7.5+ (default, Feb 27 2014, 19:39:55)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from string import count
>>> count("abcc", "c")
2
but today:
Python 3.3.2+ (default, Feb 28 2014, 00:53:38)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from string import count
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name count
>>> import string
>>> dir(string)
['ChainMap', 'Formatter', 'Template', '_TemplateMetaclass', '__builtins__',
'__cached__', '__doc__', '__file__', '__initializing__', '__loader__', '__name__',
'__package__', '_re', '_string', 'ascii_letters', 'ascii_lowercase',
'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable',
'punctuation', 'whitespace']
These days we use the string methods instead, the ones that live in str itself:
>>> 'abcc'.count('c')
2
or even
>>> str.count('abcc','c')
2
While the other answers are correct, current Python releases offer another way to call count: as a method on the object itself. It works for str and for other sequence types as well, as the documentation advises:
>>> seq.count('a')
35
As seq is a string object, it also has the count method.
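A short sketch of the method form, which runs on both Python 2.7 and Python 3. The sample sequence is the one used in another answer here, and count_bases is a hypothetical helper of my own, not something from the tutorial:

```python
seq = "acdaacc"


def count_bases(sequence, bases="acgt"):
    """Return a dict mapping each base to its number of occurrences."""
    return {base: sequence.count(base) for base in bases}


print(seq.count("a"))  # 3
counts = count_bases(seq)
print(counts["c"])  # 3
```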
The count() function is defined in the string module (in Python 2). To use it in your code, you need to import it.
Adding the following import statement before calling it will solve your problem in Python 2; Python 3 removed these functions from the string module:
from string import count
>>> seq='acdaacc'
>>> count(seq,'a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'count' is not defined
>>> from string import count
>>> count(seq,'a')
3
In Python 2, count is a function in the string module, meaning that at the top of your file (before you use the function) you need to import it so that your interpreter knows what you're talking about. Add the line from string import count as the first line of your file and it should work there; in Python 3, use the str method seq.count('a') instead.

How to save to file a dictionary with utf-8 strings correctly

I am using googlemaps Python package to do reverse geocoding. Observe:
PS Z:\dev\poc\SDR> python
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from googlemaps import GoogleMaps
>>> gmaps = GoogleMaps("*** my google API key ***")
>>> d=gmaps.reverse_geocode(51.75,19.46667)
>>> d
{u'Status': {u'code': 200, u'request': u'geocode'}, u'Placemark': [{u'Point': {u'coordinates': [19.466876, 51.7501456, 0]}, u'ExtendedData': {u'LatLonBox': {u'west': 19.465527, u'east': 19.468225, u'n
orth': 51.7514946, u'south': 51.7487966}}, u'AddressDetails': {u'Country': {u'CountryName': u'Polska', u'AdministrativeArea': {u'SubAdministrativeArea': {u'SubAdministrativeAreaName': u'\u0141\xf3d\u0
17a', u'Locality': {u'Thoroughfare': {u'ThoroughfareName': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16'}, u'LocalityName': u'\u0141\xf3d\u017a'}}, u'AdministrativeAreaName': u'\u0142\xf3dzkie'
}, u'CountryNameCode': u'PL'}, u'Accuracy': 8}, u'id': u'p1', u'address': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16, 90-001 \u0141\xf3d\u017a, Poland'}], u'name': u'51.750000,19.466670'}
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent = 2)
>>> pp.pprint(d)
{ u'Placemark': [ { u'AddressDetails': { u'Accuracy': 8,
u'Country': { u'AdministrativeArea': { u'AdministrativeAreaName': u'\u0142\xf3dzkie',
u'SubAdministrativeArea': { u'Locality': { u'LocalityName': u'\u0141\xf3d\u017a',
u'Thoroughfare': { u'ThoroughfareName': u'ksi\u0119dza Biskupa Wincentego Tym
ienieckiego 16'}},
u'SubAdministrativeAreaName': u'\u0141\xf3d\u017a'}},
u'CountryName': u'Polska',
u'CountryNameCode': u'PL'}},
u'ExtendedData': { u'LatLonBox': { u'east': 19.468225,
u'north': 51.7514946,
u'south': 51.7487966,
u'west': 19.465527}},
u'Point': { u'coordinates': [19.466876, 51.7501456, 0]},
u'address': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16, 90-001 \u0141\xf3d\u017a, Poland',
u'id': u'p1'}],
u'Status': { u'code': 200, u'request': u'geocode'},
u'name': u'51.750000,19.466670'}
Now, I want to save the d dictionary to a file, but I do not want to see u'\u0141\xf3d\u017a' as the locality name. I want to see Łódź. Indeed:
\u0141 - http://www.fileformat.info/info/unicode/char/0141/index.htm
\xf3 = \u00f3 - http://www.fileformat.info/info/unicode/char/00f3/index.htm
\u017a - http://www.fileformat.info/info/unicode/char/017a/index.htm
So, I have tried this:
with codecs.open("aa.txt", "w", "utf-8") as f:
    f.write(unicode(d))
and this:
with codecs.open("aa.txt", "w", "utf-8") as f:
    f.write(unicode(str(d), "utf-8"))
and this:
with open("aa.txt", "w") as f:
    f.write(unicode(d))
And of course, nothing works. All the trials yield \u0141\xf3d\u017a. How can I save it correctly?
Pass ensure_ascii=False to json.dump*() and use codecs.open().
The first form is right for writing unicode to the file:
>>> s = u'\u0141\xf3d\u017a'
>>> with codecs.open('aa.txt', 'w', 'utf-8') as f:
...     f.write(s)
...
>>> with codecs.open('aa.txt', 'r', 'utf-8') as f:
...     print f.read()
...
Łódź
What's happening is that you're saving the representation of the dictionary when you use unicode(d).
>>> unicode(d)
u"{u'locality': u'\\u0141\\xf3d\\u017a'}"
Which is equivalent to:
>>> unicode(repr(d))
u"{u'locality': u'\\u0141\\xf3d\\u017a'}"
So, you aren't really writing Łódź to the file. Notice that the original escape sequences have themselves been escaped: u'\u0141' is the single character Ł, but u'\\u0141' is a string of 6 characters.
Since Python dictionaries don't have a unicode representation that won't do that escaping, you should use a better serialization method. Using json should be fine if the application that will read the file supports it.
If you really need to write a file readable by some other application that does not support the same serialization method, you have to iterate over the dict and write the key, value pairs one at a time, not the representation.
A file is a stream of bytes, so your unicode needs to be encoded (represented as bytes) before it is saved to the file.
When reading the data back from the file, you need to decode it to unicode using the same encoding scheme, e.g. utf-8.
Be careful to write a serialization of your object to the file, and not its representation. Use json.dumps(d) to obtain a serialization and json.loads(filecontent) to read it back.
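Putting the JSON suggestions together, a round trip might look roughly like this. It is a sketch for Python 3, using io.open rather than codecs.open, a temporary directory instead of the working directory, and a cut-down dictionary standing in for the full geocoding result:

```python
import io
import json
import os
import tempfile

# Cut-down stand-in for the reverse_geocode() result.
d = {u"LocalityName": u"\u0141\xf3d\u017a"}

path = os.path.join(tempfile.mkdtemp(), "aa.txt")

# ensure_ascii=False keeps non-ASCII characters as they are instead of
# turning them into \uXXXX escapes; the file object handles the UTF-8 encoding.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(d, ensure_ascii=False))

# Read the text back and parse it into a dictionary again.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

restored = json.loads(text)
```

Opening aa.txt in a UTF-8 aware editor should then show the locality name as Łódź rather than as escape sequences.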

Python: String of 1s and 0s -> binary file

I have a string of 1's and 0's in Python and I would like to write it to a binary file. I'm having a lot of trouble with finding a good way to do this.
Is there a standard way to do this that I'm simply missing?
If you want a binary file,
>>> import struct
>>> myFile=open('binaryFoo','wb')
>>> myStr='10010101110010101'
>>> x=int(myStr,2)
>>> x
76693
>>> struct.pack('i',x)
'\x95+\x01\x00'
>>> myFile.write(struct.pack('i',x))
>>> myFile.close()
>>> quit()
$ cat binaryFoo
�+$
Is this what you are looking for?
In [1]: int('10011001',2)
Out[1]: 153
Split your input into pieces of eight bits, then apply int(_, 2) and chr, then concatenate into a string and write this string to a file.
Something like...:
your_file.write(''.join(chr(int(your_input[8*k:8*k+8], 2)) for k in xrange(len(your_input)/8)))
There is a bitstring module now which does what you need.
from bitstring import BitArray
my_str = '001001111'
binary_file = open('file.bin', 'wb')
b = BitArray(bin=my_str)
b.tofile(binary_file)
binary_file.close()
You can test it from the shell in Linux with xxd -b file.bin
Or you can use the array module like this
$ python
Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import random,array
#This is the best way, I could think of for coming up with an binary string of 100000
>>> binStr=''.join([str(random.randrange(0,2)) for i in range(100000)])
>>> len(binStr)
100000
>>> a = array.array("c", binStr)
#c is the type of data (character), so each '0' or '1' is written as one byte
>>> with open("binaryFoo", "ab") as f:
...     a.tofile(f)
...
#raw writing to file
>>> quit()
$
BITS_IN_BYTE = 8
chars = '00111110'
# Convert each group of 8 characters into one byte value
data = bytearray(int(chars[i:i+BITS_IN_BYTE], 2)
                 for i in xrange(0, len(chars), BITS_IN_BYTE))
with open('filename', 'wb') as f:
    f.write(data)
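For Python 3, the same chunk-by-eight idea could be sketched as below. The function name bits_to_bytes is made up, and padding the last group with zeros on the right is an assumption on my part; the question does not say what should happen when the length is not a multiple of 8. Note also that the byte layout differs from the struct.pack('i', x) answer, which writes a 4-byte integer in native byte order.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    # Assumption: pad the final group with zeros on the right if needed.
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


packed = bits_to_bytes("10010101110010101")
print(packed)  # b'\x95\xca\x80'
```

The result can then be written with open('file.bin', 'wb') in the usual way.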
