Reading a line from a CSV file in Python gives me "" instead of '' (classes)

I am reading in a CSV file in Python that looks like this:
REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231
My problem is that when I read the first line, it is read as
REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
which at first doesn't seem like a problem.
But later on I look for a year, so I split the string into a list
lijst_eerste_regel = self.eerste_regel.split(",")
and then look for the index of str(2010), but Python seems to look for '2010' rather than "2010", so it doesn't find the index.
I'm posting the code here (the problem occurs inside a class; I'm not sure whether that is relevant):
import io
class Volkstelling:
    def __init__(self,jaartal,csvb):
        """
        >>> vs2010 = Volkstelling(2010, 'vs_bevolkingsaantal.csv')
        """
        import csv
        self.jaartal = jaartal
        self.csvb = csvb
        self.eerste_regel = next(self.csvb)
        if str(jaartal) not in self.eerste_regel:
            raise AssertionError ("geen gegevens beschikbaar")
    def inwoners(self, regio):
        lijst_eerste_regel = self.eerste_regel.split(",")
        plaats_jaartal = lijst_eerste_regel.index(self.jaartal) # here is where the error occurs
data = """REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231"""
v = Volkstelling('2010',io.StringIO(data))
v.inwoners('Alabama')
## ValueError: '2010' not in list

Your code has two issues, and the first is why 2010 is not found. The quote style is not the cause: '2010' and "2010" are the same string in Python.
When you read a file line by line, each line ends with a newline character, written \n, so the last item after splitting is '2010\n' rather than '2010'. Insert the following line into your inwoners function to see the newline character after 2010:
print(lijst_eerste_regel)
You can remove leading and trailing whitespace, including newlines, with the string method strip(), for example 'SOME STRING'.strip().
Your function does not return a value, so inwoners returns None even when it runs without error.
The following example works:
import io
class Volkstelling:
    def __init__(self,jaartal,csvb):
        """
        >>> vs2010 = Volkstelling(2010, 'vs_bevolkingsaantal.csv')
        """
        import csv
        self.jaartal = jaartal
        self.csvb = csvb
        self.eerste_regel = next(self.csvb)
        if str(jaartal) not in self.eerste_regel:
            raise AssertionError ("geen gegevens beschikbaar")
    def inwoners(self, regio):
        # strip() removes the trailing newline from the last column name
        lijst_eerste_regel = [s.strip() for s in self.eerste_regel.split(",")]
        # str() so that an int year such as 2010 is found as well
        plaats_jaartal = lijst_eerste_regel.index(str(self.jaartal))
        return plaats_jaartal # Returns the column index where to find the no of inhabitants
data = """REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231"""
v2 = Volkstelling('1920',io.StringIO(data))
print(v2.inwoners('Alabama'))
## -> prints 2
v1 = Volkstelling('2010',io.StringIO(data))
print(v1.inwoners('Alabama'))
## -> prints 11
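The column lookup above can also be done with the standard csv module, which splits each row into fields and drops the line ending for you. The following is a minimal sketch rather than the asker's class: the function name inwoners_voor and its parameters are made up for illustration, and it assumes the year appears in the header row and the region name in the first column.

```python
import csv
import io

def inwoners_voor(csvb, jaartal, regio):
    """Return the population of `regio` in year `jaartal`, or None if the region is missing."""
    reader = csv.reader(csvb)
    header = next(reader)                 # fields carry no trailing newline
    kolom = header.index(str(jaartal))    # raises ValueError for an unknown year
    for rij in reader:
        if rij and rij[0] == regio:
            return int(rij[kolom])
    return None

data = """REGION,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010
Alabama,2138093,2348174,2646248,2832961,3061743,3266740,3444165,3893888,4040587,4447100,4779736
Alaska,64356,55036,59278,72524,128643,226167,300382,401851,550043,626932,710231"""

print(inwoners_voor(io.StringIO(data), 2010, 'Alabama'))
## -> prints 4779736
```

Because the header is split by csv.reader, no strip() call is needed, and passing the year as an int or a str both work.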

Related

How to read values one whitespace separated value at a time?

In C++ you can read one value at a time like this:
//from console
cin >> x;
//from file:
ifstream fin("file name");
fin >> x;
I would like to emulate this behaviour in Python. It seems, however, that the ordinary ways to get input in Python read either whole lines, the whole file, or a set number of characters.
I would like a function, let's call it one_read(), that reads from a file until it encounters either a white-space or a newline character, then stops. Also, on subsequent calls to one_read() the input should begin where it left off.
Examples of how it should work:
# file input.in is:
# 5 4
# 1 2 3 4 5
n = int(one_read())
k = int(one_read())
a = []
for i in range(n):
    a.append(int(one_read()))
# n = 5 , k = 4 , a = [1,2,3,4,5]
How can I do this?
I think the following should get you close. I admit I haven't tested the code carefully. It sounds like itertools.takewhile should be your friend, and a generator like yield_characters below will be useful.
from itertools import takewhile
import re
# Yield the characters of a file one at a time.
def yield_characters(file):
    with open(file, 'r') as f:
        for line in f:  # iterating over the file stops at end of file
            for char in line:
                yield char
# True for any character that is not whitespace.
def not_whitespace(char):
    return bool(re.match(r"\S", char))
# Yield one whitespace-delimited token at a time.
def read_one(file):
    chars = yield_characters(file)
    for first in chars:  # skips any run of whitespace between tokens
        if not_whitespace(first):
            # takewhile also consumes the whitespace character that ends the token
            yield first + ''.join(takewhile(not_whitespace, chars))
The read_one above is a generator, so you will need to call next() on it to get one value at a time, or call something like list() on it to collect all the values.
Normally you would just read a line at a time, then split it and work with each part. However, if you can't do this for resource reasons, you can implement your own reader which reads one character at a time and yields a word each time it reaches a delimiter (in this example, also a newline or the end of the file).
This implementation uses a context manager to handle opening and closing the file, though this might be overkill:
from functools import partial
class Words():
    def __init__(self, fname, delim):
        self.delims = ['\n', delim]
        self.fname = fname
        self.fh = None
    def __enter__(self):
        self.fh = open(self.fname)
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()
    def one_read(self):
        chars = []
        for char in iter(partial(self.fh.read, 1), ''):
            if char in self.delims:
                # delimiter signifies end of word
                word = ''.join(chars)
                chars = []
                yield word
            else:
                chars.append(char)
        if chars:
            # end of file also ends the last word
            yield ''.join(chars)
# Assuming x.txt contains 12 34 567 8910
with Words('/tmp/x.txt', ' ') as w:
    print(next(w.one_read()))
    # 12
    print(next(w.one_read()))
    # 34
    print(list(w.one_read()))
    # ['567', '8910']
More or less anything that operates on files in Python can operate on the standard input and standard output. The sys standard library module defines stdin and stdout which give you access to those streams as file-like objects.
Reading a line at a time is considered idiomatic in Python because the other way is quite error-prone (just one C++ example question on Stack Overflow). But if you insist: you will have to build it yourself.
As you've found, .read(n) will read at most n text characters (technically, Unicode code points) from a stream opened in text mode. You can't tell where the end of the word is until you read the whitespace, but you can .seek back one spot - though not on the standard input, which isn't seekable.
You should also be aware that the built-in input will ignore any existing data on the standard input before prompting the user:
>>> sys.stdin.read(1) # blocks
foo
'f'
>>> # the `foo` is our input, the `'f'` is the result
>>> sys.stdin.read(1) # data is available; doesn't block
'o'
>>> input()
bar
'bar'
>>> # the second `o` from the first input was lost
Try creating a class to remember where the operation left off.
The __init__ function takes the filename, you could modify this to take a list or other iterable.
read_one checks if there is anything left to read, and if there is, removes and returns the item at index 0 in the list; that being everything until the first whitespace.
class Reader:
    def __init__(self, filename):
        self.file_contents = open(filename).read().split()
    def read_one(self):
        if self.file_contents != []:
            return self.file_contents.pop(0)
Initialise the reader as follows and adapt to your liking:
reader = Reader(filepath)
reader.read_one()
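If the file is too large to read in one go, the same idea can be written as a generator that reads one line at a time and hands out one token per call. This is only a sketch under that assumption: the names tokens and one_read are illustrative, and io.StringIO stands in for the input.in file from the question.

```python
import io

def tokens(stream):
    """Yield whitespace-separated tokens from a file-like object, one line at a time."""
    for line in stream:
        yield from line.split()

# io.StringIO stands in for open('input.in')
stream = io.StringIO("5 4\n1 2 3 4 5\n")
one_read = tokens(stream).__next__   # each call returns the next token

n = int(one_read())
k = int(one_read())
a = [int(one_read()) for _ in range(n)]
print(n, k, a)
# 5 4 [1, 2, 3, 4, 5]
```

Calling one_read() after the last token raises StopIteration, so a caller that does not know the token count in advance may prefer to loop over tokens(stream) directly.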

How to remove brackets and the contents inside from a file

I have a file named sample.txt which looks like below
ServiceProfile.SharediFCList[1].DefaultHandling=1
ServiceProfile.SharediFCList[1].ServiceInformation=
ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n
ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n
My requirement is to remove the brackets and the integer inside them, and then build OS commands from the result:
ServiceProfile.SharediFCList.DefaultHandling=1
ServiceProfile.SharediFCList.ServiceInformation=
ServiceProfile.SharediFCList.IncludeRegisterRequest=n
ServiceProfile.SharediFCList.IncludeRegisterResponse=n
I am quite a newbie in Python. This is my first attempt. I have used this code to remove the brackets:
#!/usr/bin/python
import re
import os
import sys
f = os.open("sample.txt", os.O_RDWR)
ret = os.read(f, 10000)
os.close(f)
print ret
var1 = re.sub("[\(\[].*?[\)\]]", "", ret)
print var1
f = open("removed.cfg", "w+")
f.write(var1)
f.close()
After this, using the file as input, I want to form application-specific commands which look like this:
cmcli INS "DefaultHandling=1 ServiceInformation="
and the next set as
cmcli INS "IncludeRegisterRequest=n IncludeRegisterRequest=y"
So basically I now want all the output to be grouped into sets of two, so that I can execute the commands on the operating system.
Is there any way that I could group them into sets of two?
Reading 10,000 bytes of text into a string is really not necessary when your file is line-oriented text, and isn't scalable either. And you need a very good reason to be using os.open() instead of open().
So, treat your data as the lines of text that it is, and every two lines, compose a single line of output.
from __future__ import print_function
import re
command = [None,None]
cmd_id = 1
bracket_re = re.compile(r".+\[\d\]\.(.+)")
# This doesn't just remove the brackets: what you actually seem to want is
# to pick out everything after [1]. and ignore the rest.
with open("removed_cfg","w") as outfile:
    with open("sample.txt") as infile:
        for line in infile:
            m = bracket_re.match(line)
            cmd_id = 1 - cmd_id # gives 0, 1, 0, 1
            command[cmd_id] = m.group(1)
            if cmd_id == 1: # we have a pair
                output_line = """cmcli INS "{0} {1}" """.format(*command)
                print (output_line, file=outfile)
This gives the output
cmcli INS "DefaultHandling=1 ServiceInformation="
cmcli INS "IncludeRegisterRequest=n IncludeRegisterResponse=n"
The second line doesn't correspond to your sample output. I don't know how the input IncludeRegisterResponse=n is supposed to become the output IncludeRegisterRequest=y. I assume that's a mistake.
Note that this code depends on your input data being precisely as you describe it and has no error checking whatsoever. So if the format of the input is in reality more variable than that, then you will need to add some validation.
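The pairing step can also be written with the zip(it, it) idiom, which takes two items from the same iterator on each pass. Here is a minimal sketch in Python 3, assuming the [n] index may have more than one digit and that only the part after the last dot belongs in the command. The function name build_commands is made up for illustration.

```python
import re

INDEX_RE = re.compile(r"\[\d+\]")   # matches [1], [12], ...

def build_commands(lines):
    """Strip the [n] index, keep the name after the last dot, and pair the settings."""
    settings = []
    for line in lines:
        line = INDEX_RE.sub("", line.strip())
        if not line:
            continue                                  # skip blank lines
        key, sep, value = line.partition("=")
        settings.append(key.rsplit(".", 1)[-1] + sep + value)
    it = iter(settings)
    # zip(it, it) pulls two consecutive items per step; an unpaired last item is dropped
    return ['cmcli INS "{0} {1}"'.format(a, b) for a, b in zip(it, it)]

sample = [
    "ServiceProfile.SharediFCList[1].DefaultHandling=1",
    "ServiceProfile.SharediFCList[1].ServiceInformation=",
    "ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n",
    "ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n",
]
for cmd in build_commands(sample):
    print(cmd)
# cmcli INS "DefaultHandling=1 ServiceInformation="
# cmcli INS "IncludeRegisterRequest=n IncludeRegisterResponse=n"
```

With an odd number of settings the last one is silently dropped, so a real script would want to check for that case.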

Writing to UTF-16-LE text file with BOM

I've read a few postings regarding Python writing to text files but I could not find a solution to my problem. Here it is in a nutshell.
The requirement: to write values delimited by thorn characters (U+00FE, surrounding the text values) and the pilcrow character (U+00B6, after each column) to a UTF-16LE text file with BOM (FF FE).
The issue: The written-to text file has whitespace between each column that I did not script for. Also, it's not showing up right in UltraEdit. Only the first value ("mom") shows. I welcome any insight or advice.
The script (simplified to ease troubleshooting; the actual script uses a third-party API to obtain the list of values):
import os
import codecs
import shutil
import sys
import codecs
first = u''
textdel = u'\u00FE'.encode('utf_16_le') #thorn
fielddel = u'\u00B6'.encode('utf_16_le') #pilcrow
list1 = ['mom', 'dad', 'son']
num = len(list1) #pretend this is from the metadata profile
f = codecs.open('c:/myFile.txt', 'w', 'utf_16_le')
f.write(u'\uFEFF')
for item in list1:
    mytext2 = u''
    i = 0
    i = i + 1
    mytext2 = mytext2 + item + textdel
    if i < (num - 1):
        mytext2 = mytext2 + fielddel
    f.write(mytext2 + u'\n')
f.close()
You're double-encoding your strings. You've already opened your file as UTF-16-LE, so leave your textdel and fielddel strings unencoded. They will get encoded at write time along with every line written to the file.
Or put another way, textdel = u'\u00FE' sets textdel to the "thorn" character, while textdel = u'\u00FE'.encode('utf-16-le') sets textdel to a particular serialized form of that character, a sequence of bytes according to that codec; it is no longer a sequence of characters:
textdel = u'\u00FE'
len(textdel) # -> 1
type(textdel) # -> unicode
len(textdel.encode('utf-16-le')) # -> 2
type(textdel.encode('utf-16-le')) # -> str
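In Python 3 the same rule applies: keep the delimiters as text and let the file object do the encoding. The following is a minimal sketch, assuming each value is wrapped in thorn characters and the values are separated by a pilcrow. The helper names build_record and write_records and the temporary path are illustrative, not part of the original script.

```python
import os
import tempfile

TEXTDEL = '\u00FE'    # thorn, wraps each value
FIELDDEL = '\u00B6'   # pilcrow, separates columns

def build_record(values):
    """Wrap every value in thorns and join the columns with pilcrows."""
    return FIELDDEL.join(TEXTDEL + v + TEXTDEL for v in values)

def write_records(path, records):
    # The 'utf-16-le' codec does not write a BOM itself, so U+FEFF is written explicitly.
    # newline='' stops Python from translating '\n' on write.
    with open(path, 'w', encoding='utf-16-le', newline='') as f:
        f.write('\ufeff')
        for values in records:
            f.write(build_record(values) + '\n')

path = os.path.join(tempfile.mkdtemp(), 'myFile.txt')
write_records(path, [['mom', 'dad', 'son']])
with open(path, 'rb') as f:
    raw = f.read()
print(raw[:2])
# b'\xff\xfe'
```

Every character, including the delimiters and the BOM, is encoded exactly once when it is written, which is what avoids the stray bytes between columns.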

"list index out of range" in python

I have Python code that indexes a text file containing Arabic words. I tested the code on an English text and it works well, but it gives me an error when I test it on an Arabic one.
Note: the text file is saved in Unicode encoding, not in ANSI encoding.
This is my code:
from whoosh import fields, index
import os.path
import csv
import codecs
from whoosh.qparser import QueryParser
# This list associates a name with each position in a row
columns = ["juza","chapter","verse","voc"]
schema = fields.Schema(juza=fields.NUMERIC,
                       chapter=fields.NUMERIC,
                       verse=fields.NUMERIC,
                       voc=fields.TEXT)
# Create the Whoosh index
indexname = "indexdir"
if not os.path.exists(indexname):
    os.mkdir(indexname)
ix = index.create_in(indexname, schema)
# Open a writer for the index
with ix.writer() as writer:
    with open("h.txt", 'r') as txtfile:
        lines=txtfile.readlines()
        # Read each row in the file
        for i in lines:
            # Create a dictionary to hold the document values for this row
            doc = {}
            thisline=i.split()
            u=0
            # Read the values for the row enumerated like
            # (0, "juza"), (1, "chapter"), etc.
            for w in thisline:
                # Get the field name from the "columns" list
                fieldname = columns[u]
                u+=1
                #if isinstance(w, basestring):
                #    w = unicode(w)
                doc[fieldname] = w
            # Pass the dictionary to the add_document method
            writer.add_document(**doc)
with ix.searcher() as searcher:
    query = QueryParser("voc", ix.schema).parse(u"بسم")
    results = searcher.search(query)
    print(len(results))
    print(results[1])
The error is:
Traceback (most recent call last):
  File "C:\Python27\yarab.py", line 38, in <module>
    fieldname = columns[u]
IndexError: list index out of range
This is a sample of the file:
1 1 1 كتاب
1 1 2 قرأ
1 1 3 لعب
1 1 4 كتاب
While I cannot see anything obviously wrong with that, I would make sure you're designing for error. Catch any line where split() returns more elements than expected and handle it promptly (e.g. print the line and terminate). It looks like you might be dealing with ill-formatted data.
Your script is missing the source-encoding declaration. Because it contains a non-ASCII string literal, the first line should be:
# -*- coding: utf-8 -*-
Also, to open a file with an explicit encoding, use:
import codecs
with codecs.open("s.txt",encoding='utf-8') as txtfile:
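Both suggestions can be combined: decode the file explicitly and check each row before indexing it. The sketch below is Python 3 and leaves Whoosh out. It assumes the file is UTF-8, possibly with a byte-order mark (which the 'utf-8-sig' codec strips), and the function name parse_rows is made up for illustration.

```python
columns = ["juza", "chapter", "verse", "voc"]

def parse_rows(lines):
    """Turn whitespace-separated lines into dicts; collect malformed lines instead of crashing."""
    docs, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        if len(parts) != len(columns):
            bad.append((lineno, line))    # too few or too many fields
            continue
        docs.append(dict(zip(columns, parts)))
    return docs, bad

# Bytes as they might appear on disk: UTF-8 with a BOM; the last row has an extra field.
raw = "1 1 1 كتاب\n1 1 2 قرأ\n1 1 3 لعب extra\n".encode("utf-8-sig")
docs, bad = parse_rows(raw.decode("utf-8-sig").splitlines())
print(len(docs), len(bad))
# 2 1
```

Printing the entries in bad shows exactly which lines have the wrong number of fields, which is the situation that makes columns[u] run past the end of the list.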

Why am I getting an IndexError: string index out of range?

I am running the following code on ubuntu 11.10, python 2.7.2+.
import urllib
import Image
import StringIO
source = '/home/cah/Downloads/evil2.gfx'
dataFile = open(source, 'rb').read()
slicedFile1 = StringIO.StringIO(dataFile[::5])
slicedFile2 = StringIO.StringIO(dataFile[1::5])
slicedFile3 = StringIO.StringIO(dataFile[2::5])
slicedFile4 = StringIO.StringIO(dataFile[3::5])
jpgimage1 = Image.open(slicedFile1)
jpgimage1.save('/home/cah/Documents/pychallenge12.1.jpg')
pngimage1 = Image.open(slicedFile2)
pngimage1.save('/home/cah/Documents/pychallenge12.2.png')
gifimage1 = Image.open(slicedFile3)
gifimage1.save('/home/cah/Documents/pychallenge12.3.gif')
pngimage2 = Image.open(slicedFile4)
pngimage2.save('/home/cah/Documents/pychallenge12.4.png')
In essence, I'm taking a binary file that contains the bytes of several image files interleaved like 123451234512345..., separating them, and then saving each one. The problem is that I'm getting the following error:
  File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 96, in read
    len = i32(s)
  File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 44, in i32
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
IndexError: string index out of range
I found PngImagePlugin.py and looked at what it contains:
def i32(c):
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)  # line 44
# lines 88-96:
"Fetch a new chunk. Returns header information."
if self.queue:
    cid, pos, len = self.queue[-1]
    del self.queue[-1]
    self.fp.seek(pos)
else:
    s = self.fp.read(8)
    cid = s[4:]
    pos = self.fp.tell()
    len = i32(s)
I would try tinkering, but I'm afraid I'll break PNG support in PIL, which has been irksome to get working.
Thanks
It would appear that len(s) < 4 at this stage:
len = i32(s)
which means that
s = self.fp.read(8)
returned fewer than 4 bytes, most likely because the stream ended early.
The data in the file object you are passing probably doesn't make sense to the image decoder.
Double check that you are slicing correctly.
Make sure that the string you are passing in is at least 4 bytes long.
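One way to double check the slicing is to de-interleave the bytes and look at the first few bytes of each stream before handing it to PIL. This is a sketch in Python 3 on made-up data: the helper names deinterleave and sniff are illustrative, and the signatures listed are the standard JPEG, PNG and GIF magic numbers.

```python
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def deinterleave(data, n):
    """Split bytes interleaved as 1,2,...,n,1,2,... into n separate streams."""
    return [data[i::n] for i in range(n)]

def sniff(blob):
    """Return the image type suggested by the leading bytes, or None."""
    for magic, name in SIGNATURES.items():
        if blob.startswith(magic):
            return name
    return None

# Made-up example: interleave a fake PNG header and a fake GIF header.
a = b"\x89PNG\r\n\x1a\n" + b"\x00" * 4
b = b"GIF89a" + b"\x00" * 6
mixed = bytes(x for pair in zip(a, b) for x in pair)

for i, stream in enumerate(deinterleave(mixed, 2)):
    print(i, len(stream), sniff(stream))
# 0 12 png
# 1 12 gif
```

For the file in the question the call would be deinterleave(dataFile, 5). A stream for which sniff returns None, or one that does not match the format you expected at that offset, is a sign that the slice and the decoder are mismatched.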
