Python ConfigParser elements into CSV arguments - python

I have a script that parses a csv file and produces an XML file. One of the arguments I have to give the parser is the delimiter, which in my case is not a comma but a tab.
This information is stored in a configuration file which I extract and then pass to the csv parser.
ident = parser.get('CSV', 'delimiter')  # delimiter taken from config file
csv.register_dialect('custom',
                     delimiter=ident,  # passed to the csv parser
                     doublequote=False,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL,
                     skipinitialspace=False)
However, I get a TypeError saying that the "delimiter" must be a 1-character string. I checked the type of ident and it's a string, but it doesn't seem to be recognising the \t as a tab. When I put ident = '\t' or delimiter = '\t' it works. How do I get the value correctly from the config file?

Maybe a bit too late, but I have a small workaround: set the parameter to the hex code of the character and then decode it:
from ConfigParser import ConfigParser

cp = ConfigParser()
cp.add_section('a')
cp.set('a', 'b', '09')  # hex code for tab (note that there is no \x prefix)
cp.write(open('foo.ini', 'w'))

cp_in = ConfigParser()
cp_in.read('foo.ini')
print(repr(bytearray.fromhex(cp_in.get('a', 'b')).decode()))  # where the magic happens

This doesn't appear to be possible using ConfigParser.
While the docs don't explicitly mention this case, they do say that leading whitespace is stripped from values.
Trying to round-trip the value just gets back an empty string:
from ConfigParser import ConfigParser

cp = ConfigParser()
cp.add_section('a')
cp.set('a', 'b', '\t')
cp.write(open('foo.ini', 'w'))

cp_in = ConfigParser()
cp_in.read('foo.ini')
print(repr(cp_in.get('a', 'b')))  # prints ''

I'm adding what I think is the obvious answer that everyone apparently missed. Judging from the comments, the config file looks something like this:
[CSV]
delimiter=\t
quoting=QUOTE_ALL
The value for 'delimiter' is two characters: a backslash and a 't'. Here's how to read it and convert the value into a tab.
>>> import configparser, codecs, csv
>>> parser = configparser.ConfigParser()
>>> parser.read('foo.cfg')
['foo.cfg']
>>> ident = parser.get('CSV', 'delimiter')
>>> csv.register_dialect('custom', delimiter=ident)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: "delimiter" must be a 1-character string
>>> ident, len(ident)
('\\t', 2)
>>> decoded = codecs.decode(ident, encoding='unicode_escape')
>>> csv.register_dialect('custom', delimiter=decoded)
>>> decoded, len(decoded)
('\t', 1)
And here's a bonus:
>>> quoting = parser.get('CSV', 'quoting')
>>> csv.register_dialect('custom', quoting=quoting)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: "quoting" must be an integer
>>> quoting
'QUOTE_ALL'
>>> try:
...     quoting = parser.getint('CSV', 'quoting')
... except ValueError:
...     quoting = getattr(csv, parser.get('CSV', 'quoting'))
...
>>> csv.register_dialect('custom', quoting=quoting)
>>> quoting
1

Related

I'm trying to save my result into a new file but got problems - Python

I'm trying to make a script that takes all rows starting with 'HELIX', 'SHEET' and 'DBREF' from a .txt file, takes some specific columns from those rows, and then saves the results to a new file.
#!/usr/bin/python
import sys

if len(sys.argv) != 3:
    print("2 Parameters expected: You must introduce your pdb file and a name for output file.")
    exit()

for line in open(sys.argv[1]):
    if 'HELIX' in line:
        helix = line.split()
        cols_h = helix[0], helix[3:6:2], helix[6:9:2]
    elif 'SHEET' in line:
        sheet = line.split()
        cols_s = sheet[0], sheet[4:7:2], sheet[7:10:2], sheet[12:15:2], sheet[16:19:2]
    elif 'DBREF' in line:
        dbref = line.split()
        cols_id = dbref[0], dbref[3:5], dbref[8:10]

modified_data = open(sys.argv[2], 'w')
modified_data.write(cols_id)
modified_data.write(cols_h)
modified_data.write(cols_s)
My problem is that when I try to write my final results it gives this error:
Traceback (most recent call last):
  File "funcional2.py", line 21, in <module>
    modified_data.write(cols_id)
TypeError: expected a character buffer object
When I try to convert to a string using ''.join() it returns another error
Traceback (most recent call last):
  File "funcional2.py", line 21, in <module>
    modified_data.write(' '.join(cols_id))
TypeError: sequence item 1: expected string, list found
What am I doing wrong?
Also, if there is some easy way to simplify my code, it'll be great.
PS: I'm no programmer so I'll probably need some explanation if you do something...
Thank you very much.
cols_id, cols_h and cols_s seem to be lists, not strings.
You can only write a string in your file so you have to convert the list to a string.
modified_data.write(' '.join(cols_id))
and similar.
'!'.join(a_list_of_things) converts the list into a string, separating each element with an exclamation mark.
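For instance, with illustrative values (not from the question's data):

```python
# str.join concatenates the elements of an iterable of strings,
# inserting the separator between each pair of elements
parts = ['HELIX', '1', 'A']
print('!'.join(parts))  # HELIX!1!A
print(' '.join(parts))  # HELIX 1 A
```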
EDIT:
#!/usr/bin/python
import sys

if len(sys.argv) != 3:
    print("2 Parameters expected: You must introduce your pdb file and a name for output file.")
    exit()

cols_h, cols_s, cols_id = [], [], []
for line in open(sys.argv[1]):
    if 'HELIX' in line:
        helix = line.split()
        cols_h.append(''.join([helix[0]] + helix[3:6:2] + helix[6:9:2]))
    elif 'SHEET' in line:
        sheet = line.split()
        cols_s.append(''.join([sheet[0]] + sheet[4:7:2] + sheet[7:10:2] + sheet[12:15:2] + sheet[16:19:2]))
    elif 'DBREF' in line:
        dbref = line.split()
        cols_id.append(''.join([dbref[0]] + dbref[3:5] + dbref[8:10]))

modified_data = open(sys.argv[2], 'w')
cols = [cols_id, cols_h, cols_s]
for col in cols:
    modified_data.write(''.join(col))
Here is a solution (untested) that separates data and code a little more. A data structure (keyword_and_slices) describes the keywords searched for in the lines, paired with the slices to be taken for the result.
The code then goes through the lines and builds a data structure (keyword2lines) mapping each keyword to the result lines for that keyword.
At the end, the collected lines for each keyword are written to the result file.
import sys
from collections import defaultdict

def main():
    if len(sys.argv) != 3:
        print(
            '2 Parameters expected: You must introduce your pdb file'
            ' and a name for output file.'
        )
        sys.exit(1)
    input_filename, output_filename = sys.argv[1:3]
    #
    # Pairs of keywords and slices that should be taken from the line
    # starting with the respective keyword.
    #
    keyword_and_slices = [
        ('HELIX', [slice(3, 6, 2), slice(6, 9, 2)]),
        (
            'SHEET',
            [slice(a, b, 2) for a, b in [(4, 7), (7, 10), (12, 15), (16, 19)]]
        ),
        ('DBREF', [slice(3, 5), slice(8, 10)]),
    ]
    keyword2lines = defaultdict(list)
    with open(input_filename, 'r') as lines:
        for line in lines:
            for keyword, slices in keyword_and_slices:
                if line.startswith(keyword):
                    parts = line.split()
                    result_line = [keyword]
                    for index in slices:
                        result_line.extend(parts[index])
                    keyword2lines[keyword].append(' '.join(result_line) + '\n')
    with open(output_filename, 'w') as out_file:
        for keyword in ['DBREF', 'HELIX', 'SHEET']:
            out_file.writelines(keyword2lines[keyword])

if __name__ == '__main__':
    main()
The code follows your text in checking whether a line starts with a keyword, instead of your code, which checks whether a keyword appears anywhere within a line.
It also makes sure all files are closed properly by using the with statement.
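To illustrate the difference between the two checks, with a made-up line rather than one from a real PDB file:

```python
# 'HELIX' in line matches the keyword anywhere in the line;
# line.startswith('HELIX') matches only at the very beginning
line = 'REMARK mentions HELIX in a comment'
print('HELIX' in line)           # True
print(line.startswith('HELIX'))  # False
```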
You need to convert the tuple created on the right-hand side of your assignments to a string. Note that dbref[3:5] and dbref[8:10] are lists, so they have to be flattened before joining:
# Replace this
cols_id = dbref[0], dbref[3:5], dbref[8:10]
# with a statement that creates a string out of the parts
cols_id = ''.join([dbref[0]] + dbref[3:5] + dbref[8:10])

How to parse JSON files with double-quotes inside strings in Python?

I'm trying to read a JSON file in Python. Some of the lines have strings with double quotes inside:
{"Height__c": "8' 0\"", "Width__c": "2' 8\""}
Using a raw string literal produces the right output:
json.loads(r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}""")
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
But my string comes from a file, ie:
s = f.readline()
Where:
>>> print repr(s)
'{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
And json throws the following exception:
json.loads(s) # s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
ValueError: Expecting ',' delimiter: line 1 column 21 (char 20)
Also,
>>> s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
Fails, but assigning the raw literal works:
>>> s = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
Do I need to write a custom Decoder?
The data file you have does not escape the nested quotes correctly; this can be hard to repair.
If the nested quotes follow a pattern; e.g. always follow a digit and are the last character in each string you can use a regular expression to fix these up. Given your sample data, if all you have is measurements in feet and inches, that's certainly doable:
import re
from functools import partial
repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
json.loads(repair_nested(s))
Demo:
>>> import json
>>> import re
>>> from functools import partial
>>> s = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
>>> repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
>>> json.loads(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 21 (char 20)
>>> json.loads(repair_nested(s))
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}

How to base64 encode/decode a variable with string type in Python 3?

It gives me an error that the value to encode needs to be bytes, not str/dict.
I know that adding a b before the text will solve that and print the encoded value.
import base64
s = base64.b64encode(b'12345')
print(s)
b'MTIzNDU='
But how do I encode a variable?
such as
import base64
s = "12345"
s2 = base64.b64encode(s)
print(s2)
It gives me an error both with the b added and without it, and I don't understand why.
I'm also trying to encode/decode a dictionary with base64.
You need to encode the unicode string. If it's just normal characters, you can use ASCII. If it might have other characters in it, or just for general safety, you probably want utf-8.
>>> import base64
>>> s = "12345"
>>> s2 = base64.b64encode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ". . . /lib/python3.3/base64.py", line 58, in b64encode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
>>> s2 = base64.b64encode(s.encode('ascii'))
>>> print(s2)
b'MTIzNDU='
>>>
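To round-trip the value, note that b64decode() also returns bytes, which you decode back to a str. For a dictionary, one approach is to serialize it with json first; this is a sketch with made-up data, not part of the original answer:

```python
import base64
import json

d = {'user': 'alice', 'id': 12345}  # example dict

# dict -> JSON str -> utf-8 bytes -> base64 bytes
encoded = base64.b64encode(json.dumps(d).encode('utf-8'))

# base64 bytes -> utf-8 bytes -> JSON str -> dict
decoded = json.loads(base64.b64decode(encoded).decode('utf-8'))
print(decoded == d)  # True
```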

"list index out of range" in python

I have code in Python to index a text file that contains Arabic words. I tested the code on an English text and it works well, but it gives me an error when I test an Arabic one.
Note: the text file is saved with Unicode encoding, not ANSI.
This is my code:
from whoosh import fields, index
import os.path
import csv
import codecs
from whoosh.qparser import QueryParser

# This list associates a name with each position in a row
columns = ["juza", "chapter", "verse", "voc"]
schema = fields.Schema(juza=fields.NUMERIC,
                       chapter=fields.NUMERIC,
                       verse=fields.NUMERIC,
                       voc=fields.TEXT)

# Create the Whoosh index
indexname = "indexdir"
if not os.path.exists(indexname):
    os.mkdir(indexname)
ix = index.create_in(indexname, schema)

# Open a writer for the index
with ix.writer() as writer:
    with open("h.txt", 'r') as txtfile:
        lines = txtfile.readlines()
        # Read each row in the file
        for i in lines:
            # Create a dictionary to hold the document values for this row
            doc = {}
            thisline = i.split()
            u = 0
            # Read the values for the row enumerated like
            # (0, "juza"), (1, "chapter"), etc.
            for w in thisline:
                # Get the field name from the "columns" list
                fieldname = columns[u]
                u += 1
                #if isinstance(w, basestring):
                #    w = unicode(w)
                doc[fieldname] = w
            # Pass the dictionary to the add_document method
            writer.add_document(**doc)

with ix.searcher() as searcher:
    query = QueryParser("voc", ix.schema).parse(u"بسم")
    results = searcher.search(query)
    print(len(results))
    print(results[1])
Then the error is:
Traceback (most recent call last):
  File "C:\Python27\yarab.py", line 38, in <module>
    fieldname = columns[u]
IndexError: list index out of range
This is a sample of the file:
1 1 1 كتاب
1 1 2 قرأ
1 1 3 لعب
1 1 4 كتاب
While I cannot see anything obviously wrong with that, I would make sure you're designing for error. Make sure you catch any situation where split() returns more than the expected number of elements and handle it promptly (e.g. print and terminate). It looks like you might be dealing with ill-formatted data.
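One way to design for that error is to check the field count before indexing into columns; this is a sketch reusing the column names from the question:

```python
columns = ["juza", "chapter", "verse", "voc"]

def parse_line(line):
    fields = line.split()
    if len(fields) != len(columns):
        # Ill-formatted row: report it promptly instead of
        # letting columns[u] raise IndexError later on
        raise ValueError('expected %d fields, got %d: %r'
                         % (len(columns), len(fields), line))
    # Pair each field with its column name
    return dict(zip(columns, fields))

print(parse_line('1 1 1 test'))
# {'juza': '1', 'chapter': '1', 'verse': '1', 'voc': 'test'}
```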
You missed the Unicode encoding header in your script. The first line should be:
# -*- coding: utf-8 -*-
Also, to open a file with the Unicode encoding, use:
import codecs
with codecs.open("s.txt",encoding='utf-8') as txtfile:

The right and elegant way to split and join a string in Python

I have the following variables:
>>> poly
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa.shp'
>>> record
1373155
and I wish to create:
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
I wish to split in order to get the part "C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa".
I have tried this two-code-lines solution:
mylist = [poly.split(".")[0], "_", record, ".txt"]
>>> mylist
['C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa', '_', 1373155, '.txt']
From there, reading the example in Python join: why is it string.join(list) instead of list.join(string)?, I found this solution to join, but I get this error message:
>>> mylist.join("")
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'
Also, if I use:
>>> "".join(mylist)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: sequence item 2: expected string, int found
As Python join: why is it string.join(list) instead of list.join(string)? explains, it is
"".join(mylist)
instead of
mylist.join("")
There's your error.
To solve your int/string problem, convert the int to a string:
mylist = [poly.split(".")[0], "_", str(record), ".txt"]
or write directly:
"{}_{}.txt".format(poly.split(".")[0], record)
>>> from os import path
>>>
>>> path.splitext(poly)
('C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa', '.shp')
>>>
>>> filename, ext = path.splitext(poly)
>>> "{0}_{1}.txt".format(filename, record)
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
>>> poly = 'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa.shp'
>>> record = 1373155
>>> "{}_{}.txt".format(poly.rpartition('.')[0], record)
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
or if you insist on using join()
>>> "".join([poly.rpartition('.')[0], "_", str(record), ".txt"])
'C:\\04-las_clip_inside_area\\16x16grids_1pp_fsa_1373155.txt'
It's important to use rpartition() (or rsplit()) as otherwise it won't work properly if the path has any other '.' characters in it.
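To see why, compare split() and rpartition() on a hypothetical path with an extra '.' in a directory name:

```python
poly = 'C:\\dir.v2\\grids.shp'   # made-up path with an extra '.'
print(poly.split('.')[0])        # C:\dir -- cut at the wrong dot
print(poly.rpartition('.')[0])   # C:\dir.v2\grids -- correct
```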
You need to convert record into a string.
mylist = [poly.split(".")[0], "_", str(record), ".txt"]
