Python: using cStringIO with a for loop

I want to iterate over the lines of a cStringIO object, but it does not seem to work with a for loop. To be more precise, the behavior is as if the collection were empty. What am I doing wrong?
example:
Python 2.7.12 (default, Aug 29 2016, 16:51:45)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cStringIO
>>> s = cStringIO.StringIO()
>>> import os
>>> s.write("Hello" + os.linesep + "World" + os.linesep)
>>> s.getvalue()
'Hello\nWorld\n'
>>> for line in s :
... print line
...
>>>
Thank you.

cStringIO.StringIO returns either a cStringIO.InputType object (an input stream) when given a string, or a cStringIO.OutputType object (an output stream) when called with no argument.
In [13]: sio = cStringIO.StringIO()
In [14]: sio??
Type: StringO
String form: <cStringIO.StringO object at 0x7f63d418f538>
Docstring: Simple type for output to strings.
In [15]: isinstance(sio, cStringIO.OutputType)
Out[15]: True
In [16]: sio = cStringIO.StringIO("dsaknml")
In [17]: sio??
Type: StringI
String form: <cStringIO.StringI object at 0x7f63d4218580>
Docstring: Simple type for treating strings as input file streams
In [18]: isinstance(sio, cStringIO.InputType)
Out[18]: True
So you can do either read operations or write operations, but not both; if you try to do both, one of them is silently ignored. A simple way to read from a cStringIO.OutputType object is to extract its contents with the getvalue() method:
cStringIO.OutputType.getvalue(c_string_io_object)

Try using the string split method:
for line in s.getvalue().split('\n'): print line
...
Hello
World
Or, as suggested, if you are always splitting on newlines:
for line in s.getvalue().splitlines(): print line
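Note the difference between the two: split('\n') leaves a trailing empty string when the buffer ends in a newline, while splitlines() does not. A quick sketch:

```python
value = 'Hello\nWorld\n'
print(value.split('\n'))    # trailing empty entry: ['Hello', 'World', '']
print(value.splitlines())   # no trailing entry:    ['Hello', 'World']
```

With split('\n') the loop would print an extra blank line at the end.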

You can read the contents from an open file handle after writing, but you first have to use the seek(0) method to move the pointer back to the start. This will work for either cStringIO or a real file:
import cStringIO
s = cStringIO.StringIO()
s.write("Hello\nWorld\n") # plain '\n' is fine here; cStringIO does no newline translation
s.getvalue()
# 'Hello\nWorld\n'
s.seek(0) # move pointer to start of file
for line in s:
    print line.strip()
# Hello
# World
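The same rewind-then-read pattern carries over to io.StringIO, the Python 3 replacement for cStringIO; a minimal sketch:

```python
import io

s = io.StringIO()
s.write(u"Hello\nWorld\n")
s.seek(0)  # rewind to the start before iterating
lines = [line.strip() for line in s]
print(lines)  # → ['Hello', 'World']
```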

Related

Unable to concatenate strings in Python2.7

I'm trying to concatenate two strings in my function. I tried every concatenation method, but the two strings just don't concatenate one after the other; instead, the shorter string B (length s) replaces the first s characters of the longer string A.
I read some data from an input file and store the third line, whose content is "00001M035NNYY1111111", into a variable called applicant:
data = open("input.txt").read().split('\n')
applicant = str(data[2])
I want to add an integer 8 at the end of applicant, so the new applicant will be "00001M035NNYY11111118". I tried applicant += str(8) and "".join((applicant, str(8))) and other concatenation methods, but all of them only give me "80001M035NNYY1111111"... Does anyone know why this happens and what I should do to get my intended result?
You probably have Windows line endings in your file: \r\n. By splitting on \n, you leave a \r at the end of each line; when printed, it returns the cursor to the beginning of the line. You can trim it manually:
with open("input.txt") as f:
    data = [line.rstrip() for line in f]
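To see the stray \r in action, here is a small sketch (the literal below is a hypothetical stand-in for the file's third line):

```python
line = "00001M035NNYY1111111\r"   # what split('\n') leaves behind on a CRLF file
applicant = line + "8"            # the '8' now sits after a carriage return
print(repr(applicant))            # repr() makes the hidden '\r' visible
print(repr(line.rstrip() + "8"))  # rstrip() removes it before concatenating
```

Printed without repr(), the \r moves the cursor back to column 0, so the 8 visually overwrites the first character.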
This should work
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = open("input.txt").read().split("\n")
>>> applicant = data[2] + str(8)
>>> print applicant
00001M035NNYY11111118
>>>
There is probably something wrong with your text file if this does not work.

Bug in Python's StringIO module when used with numpy

Very simple code:
import StringIO
import numpy as np
c = StringIO.StringIO()
c.write("1 0")
a = np.loadtxt(c)
print a
I get an empty array and a warning that c is an empty file.
I fixed this by adding:
d=StringIO.StringIO(c.getvalue())
a = np.loadtxt(d)
I don't think such a thing should happen; what is going on here?
It's because the 'position' of the file object is at the end of the file after the write. So when numpy reads it, it reads from the end of the file to the end, which is nothing.
Seek to the beginning of the file and then it works:
>>> from StringIO import StringIO
>>> s = StringIO()
>>> s.write("1 2")
>>> s.read()
''
>>> s.seek(0)
>>> s.read()
'1 2'
StringIO is a file-like object. As such it has behaviors consistent with a file. There is a notion of a file pointer - the current position within the file. When you write data to a StringIO object the file pointer is adjusted to the end of the data. When you try to read it, the file pointer is already at the end of the buffer, so no data is returned.
To read it back you can do one of two things:
Use StringIO.getvalue(), as you already discovered. This returns the data from the beginning of the buffer, leaving the file pointer unchanged.
Use StringIO.seek(0) to reposition the file pointer to the start of the buffer, then call StringIO.read() to read the data.
Demo
>>> from StringIO import StringIO
>>> s = StringIO()
>>> s.write('hi there')
>>> s.read()
''
>>> s.tell() # shows the current position of the file pointer
8
>>> s.getvalue()
'hi there'
>>> s.tell()
8
>>> s.read()
''
>>> s.seek(0)
>>> s.tell()
0
>>> s.read()
'hi there'
>>> s.tell()
8
>>> s.read()
''
There is one exception to this. If you provide a value at the time you create the StringIO, the buffer is initialised with that value, but the file pointer is positioned at the start of the buffer:
>>> s = StringIO('hi there')
>>> s.tell()
0
>>> s.read()
'hi there'
>>> s.read()
''
>>> s.tell()
8
And that is why it works when you use
d = StringIO.StringIO(c.getvalue())
because you are initialising the StringIO object at creation time, so the file pointer is positioned at the beginning of the buffer.
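Applied to the original numpy snippet, a seek(0) between the write and loadtxt is enough (sketched with Python 3's io.StringIO, since the StringIO module is Python 2 only):

```python
import io
import numpy as np

c = io.StringIO()
c.write(u"1 0")
c.seek(0)            # rewind so loadtxt reads from the start, not the end
a = np.loadtxt(c)
print(a)  # → [1. 0.]
```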

How to save to file a dictionary with utf-8 strings correctly

I am using googlemaps Python package to do reverse geocoding. Observe:
PS Z:\dev\poc\SDR> python
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from googlemaps import GoogleMaps
>>> gmaps = GoogleMaps("*** my google API key ***")
>>> d=gmaps.reverse_geocode(51.75,19.46667)
>>> d
{u'Status': {u'code': 200, u'request': u'geocode'}, u'Placemark': [{u'Point': {u'coordinates': [19.466876, 51.7501456, 0]}, u'ExtendedData': {u'LatLonBox': {u'west': 19.465527, u'east': 19.468225, u'north': 51.7514946, u'south': 51.7487966}}, u'AddressDetails': {u'Country': {u'CountryName': u'Polska', u'AdministrativeArea': {u'SubAdministrativeArea': {u'SubAdministrativeAreaName': u'\u0141\xf3d\u017a', u'Locality': {u'Thoroughfare': {u'ThoroughfareName': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16'}, u'LocalityName': u'\u0141\xf3d\u017a'}}, u'AdministrativeAreaName': u'\u0142\xf3dzkie'}, u'CountryNameCode': u'PL'}, u'Accuracy': 8}, u'id': u'p1', u'address': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16, 90-001 \u0141\xf3d\u017a, Poland'}], u'name': u'51.750000,19.466670'}
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent = 2)
>>> pp.pprint(d)
{ u'Placemark': [ { u'AddressDetails': { u'Accuracy': 8,
u'Country': { u'AdministrativeArea': { u'AdministrativeAreaName': u'\u0142\xf3dzkie',
u'SubAdministrativeArea': { u'Locality': { u'LocalityName': u'\u0141\xf3d\u017a',
u'Thoroughfare': { u'ThoroughfareName': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16'}},
u'SubAdministrativeAreaName': u'\u0141\xf3d\u017a'}},
u'CountryName': u'Polska',
u'CountryNameCode': u'PL'}},
u'ExtendedData': { u'LatLonBox': { u'east': 19.468225,
u'north': 51.7514946,
u'south': 51.7487966,
u'west': 19.465527}},
u'Point': { u'coordinates': [19.466876, 51.7501456, 0]},
u'address': u'ksi\u0119dza Biskupa Wincentego Tymienieckiego 16, 90-001 \u0141\xf3d\u017a, Poland',
u'id': u'p1'}],
u'Status': { u'code': 200, u'request': u'geocode'},
u'name': u'51.750000,19.466670'}
Now, I want to save the d dictionary to a file, but I do not want to see u'\u0141\xf3d\u017a' as the locality name. I want to see Łódź. Indeed:
\u0141 - http://www.fileformat.info/info/unicode/char/0141/index.htm
\xf3 = \u00f3 - http://www.fileformat.info/info/unicode/char/00f3/index.htm
\u017a - http://www.fileformat.info/info/unicode/char/017a/index.htm
So, I have tried this:
with codecs.open("aa.txt", "w", "utf-8") as f:
f.write(unicode(d))
and this:
with codecs.open("aa.txt", "w", "utf-8") as f:
f.write(unicode(str(d), "utf-8"))
and this:
with open("aa.txt", "w") as f:
f.write(unicode(d))
And of course, nothing works. All the trials yield \u0141\xf3d\u017a. How can I save it correctly?
Pass ensure_ascii=False to json.dump*() and use codecs.open().
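A minimal sketch of that approach, using an in-memory buffer and a hypothetical one-key dict in place of the real geocode result (in Python 2, codecs.open('aa.txt', 'w', 'utf-8') plays the role of the buffer):

```python
import io
import json

d = {u'locality': u'\u0141\xf3d\u017a'}  # stand-in for the reverse_geocode result
buf = io.StringIO()                      # stand-in for the utf-8 file handle
json.dump(d, buf, ensure_ascii=False)    # write the real characters, not \uXXXX escapes
print(buf.getvalue())  # → {"locality": "Łódź"}
```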
The first form is right for writing unicode to the file:
>>> s = u'\u0141\xf3d\u017a'
>>> with codecs.open('aa.txt', 'w', 'utf-8') as f:
... f.write(s)
...
>>> with codecs.open('aa.txt', 'r', 'utf-8') as f:
... print f.read()
...
Łódź
What's happening is that you're saving the representation of the dictionary when you use unicode(d).
>>> unicode(d)
u"{u'locality': u'\\u0141\\xf3d\\u017a'}"
Which is equivalent to:
>>> unicode(repr(d))
u"{u'locality': u'\\u0141\\xf3d\\u017a'}"
So you aren't really writing Łódź to the file. Notice the original escape sequences are themselves escaped: u'\u0141' is the single character Ł, but u'\\u0141' is a string of 6 characters.
Since Python dictionaries don't have a unicode representation that avoids this escaping, you should use a proper serialization method. Using json should be fine if the application that will read the file supports it.
If you really need to write the data to a file readable by some other application that does not support the same serialization method, you have to iterate over the dict and write the key/value pairs one at a time, not the representation.
A file is a stream of bytes, so your unicode needs to be encoded (represented as bytes) before being saved to the file.
When reading the data back from the file, you need to decode it to unicode using the same encoding scheme, e.g. utf-8.
Be careful to write a serialization of your object to the file, not its representation. Use json.dumps(d) to obtain a serialization and json.loads(filecontent) to read it back.

Python: String of 1s and 0s -> binary file

I have a string of 1's and 0's in Python and I would like to write it to a binary file. I'm having a lot of trouble with finding a good way to do this.
Is there a standard way to do this that I'm simply missing?
If you want a binary file,
>>> import struct
>>> myFile=open('binaryFoo','wb')
>>> myStr='10010101110010101'
>>> x=int(myStr,2)
>>> x
76693
>>> struct.pack('i',x)
'\x95+\x01\x00'
>>> myFile.write(struct.pack('i',x))
>>> myFile.close()
>>> quit()
$ cat binaryFoo
�+$
Is this what you are looking for?
In [1]: int('10011001',2)
Out[1]: 153
Split your input into pieces of eight bits, then apply int(_, 2) and chr, then concatenate into a string and write this string to a file.
Something like...:
your_file.write(''.join(chr(int(your_input[8*k:8*k+8], 2)) for k in xrange(len(your_input)/8)))
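In Python 3 the same chunk-by-8 idea can be written with bytes() instead of chr()/join(); a sketch:

```python
bits = '0100100001101001'  # 'Hi': two bytes, 8 bits each
data = bytes(int(bits[8*k:8*k + 8], 2) for k in range(len(bits) // 8))
print(data)  # → b'Hi'
```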
There is a bitstring module now which does what you need.
from bitstring import BitArray
my_str = '001001111'
binary_file = open('file.bin', 'wb')
b = BitArray(bin=my_str)
b.tofile(binary_file)
binary_file.close()
You can test it from the shell in Linux with xxd -b file.bin
Or you can use the array module like this
$ python
Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import random,array
# The best way I could think of to generate a binary string of length 100000
>>> binStr=''.join([str(random.randrange(0,2)) for i in range(100000)])
>>> len(binStr)
100000
>>> a = array.array("c", binStr)
#c is the type of data (character)
>>> with open("binaryFoo", "ab") as f:
... a.tofile(f)
...
#raw writing to file
>>> quit()
$
BITS_IN_BYTE = 8
chars = '00111110'
buf = bytearray(int(chars[i:i+BITS_IN_BYTE], 2)
                for i in xrange(0, len(chars), BITS_IN_BYTE))
open('filename', 'wb').write(buf)  # 'buf' avoids shadowing the built-in name 'bytes'

Python search and replace in binary file

I am trying to search and replace some of the text (e.g. 'Smith, John') in this PDF form file (header.fdf, which I presume is treated as a binary file):
'%FDF-1.2\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<</FDF<</Fields[<</V(M)/T(PatientSexLabel)>><</V(24-09-1956 53)/T(PatientDateOfBirth)>><</V(Fisher)/T(PatientLastNameLabel)>><</V(CNSL)/T(PatientConsultant)>><</V(28-01-2010 18:13)/T(PatientAdmission)>><</V(134 Field Street\\rBlackburn BB1 1BB)/T(PatientAddressLabel)>><</V(Smith, John)/T(PatientName)>><</V(24-09-1956)/T(PatientDobLabel)>><</V(0123456)/T(PatientRxr)>><</V(01234567891011)/T(PatientNhsLabel)>><</V(John)/T(PatientFirstNameLabel)>><</V(0123456)/T(PatientRxrLabel)>>]>>>>\nendobj\ntrailer\n<</Root 1 0 R>>\n%%EOF\n'
After
f=open("header.fdf","rb")
s=f.read()
f.close()
s=s.replace(b'PatientName',name)
the following error occurs:
Traceback (most recent call last):
File "/home/aj/Inkscape/Med/GAD/gad.py", line 56, in <module>
s=s.replace(b'PatientName',name)
TypeError: expected an object with the buffer interface
How best to do this?
f=open("header.fdf","rb")
s=str(f.read())
f.close()
s=s.replace(b'PatientName',name)
or
f=open("header.fdf","rb")
s=f.read()
f.close()
s=s.replace(b'PatientName',bytes(name))
probably the latter, as I don't think you are going to be able to use unicode names with this type of substitution anyway
You must be using Python 3.x. You didn't define 'name' in your example, but it is the problem. You likely defined it as a Unicode string:
name = 'blah'
It needs to be a bytes object too:
name = b'blah'
This works:
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('file.txt','rb')
>>> s = f.read()
>>> f.close()
>>> s
b'Test File\r\n'
>>> name = b'Replacement'
>>> s=s.replace(b'File',name)
>>> s
b'Test Replacement\r\n'
In a bytes object, the arguments to replace must both be bytes objects.
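The rule in a compact, runnable form (the FDF fragment below is a hypothetical stand-in, not the real header.fdf contents):

```python
data = b'<</V(Smith, John)/T(PatientName)>>'  # hypothetical FDF field entry
name = b'Doe, Jane'                           # must be bytes, not str, in Python 3
patched = data.replace(b'Smith, John', name)
print(patched)  # → b'<</V(Doe, Jane)/T(PatientName)>>'
```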
