How can I loop through blocks of lines in a file? - python

I have a text file that looks like this, with blocks of lines separated by blank lines:
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25
How can I loop through the blocks and process the data in each block? Eventually I want to gather the name, family name, and age values into three columns, like so:
Y X 20
F H 23
Y S 13
Z M 25

Here's another way, using itertools.groupby.
The groupby function iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools.groupby then groups all the consecutive lines that yielded the same True or False result.
This is a very convenient way to collect lines into groups.
import itertools

def isa_group_separator(line):
    return line == '\n'

with open('data_file') as f:
    for key, group in itertools.groupby(f, isa_group_separator):
        # print(key, list(group))  # uncomment to see what itertools.groupby does,
        # but note that it exhausts the `group` iterator, so the rest of the loop won't work
        if not key:
            data = {}
            for item in group:
                field, value = item.split(':')
                value = value.strip()
                data[field] = value
            print('{FamilyN} {Name} {Age}'.format(**data))
# Y X 20
# F H 23
# Y S 13
# Z M 25

Use a generator.
def blocks(iterable):
    accumulator = []
    for line in iterable:
        if start_pattern(line):
            if accumulator:
                yield accumulator
            accumulator = []
        # elif other significant patterns
        else:
            accumulator.append(line)
    if accumulator:
        yield accumulator
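The answer leaves start_pattern undefined; here is one hedged way to complete it, assuming (as in the question) that blocks are separated by blank lines:

```python
def start_pattern(line):
    # hypothetical predicate: a blank separator line starts a new block
    return not line.strip()

def blocks(iterable):
    accumulator = []
    for line in iterable:
        if start_pattern(line):
            if accumulator:
                yield accumulator
            accumulator = []
        else:
            accumulator.append(line)
    if accumulator:
        yield accumulator

lines = ['ID: 1', 'Name: X', '', 'ID: 2', 'Name: H']
result = list(blocks(lines))
print(result)
```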

import re

result = re.findall(
    r"""(?mx)            # multiline, verbose regex
    ^ID:.*\s*            # match "ID:" and anything else on that line
    Name:\s*(.*)\s*      # match name, capture all characters on this line
    FamilyN:\s*(.*)\s*   # etc. for family name
    Age:\s*(.*)$         # and age""",
    subject)
Result will then be
[('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]
which can be trivially changed into whatever string representation you want.
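For instance, a sketch of turning those tuples into the requested three-column layout (assuming result holds the tuples above):

```python
result = [('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]
# Each tuple is (name, familyn, age); the question wants "familyn name age" lines.
lines = ['%s %s %s' % (familyn, name, age) for name, familyn, age in result]
output = '\n'.join(lines)
print(output)
```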

If your file is too large to read into memory all at once, you can still use a regular expressions based solution by using a memory mapped file, with the mmap module:
import sys
import re
import os
import mmap
block_expr = re.compile(r'ID:.*?\nAge: \d+', re.DOTALL)

filepath = sys.argv[1]
fp = open(filepath)
contents = mmap.mmap(fp.fileno(), os.stat(filepath).st_size, access=mmap.ACCESS_READ)

for block_match in block_expr.finditer(contents):
    print block_match.group()
The mmap trick provides a "pretend string" so regular expressions can work on the file without reading it all into one large string. And the finditer() method of the compiled pattern yields matches one at a time, without creating a list of all matches at once (which findall() does).
I do think this solution is overkill for this use case however (still: it's a nice trick to know...)

If the file is not huge, you can read the whole file with:
content = open(filename).read()
then split the content into blocks using:
blocks = content.split('\n\n')
Now you can write a function to parse a block of text. I would use split('\n') to get the lines from a block and split(':') to get each key and value, possibly with str.strip() or some help from regular expressions.
Without checking whether a block has the required data, the code can look like:
f = open('data.txt', 'r')
content = f.read()
f.close()

for block in content.split('\n\n'):
    person = {}
    for l in block.split('\n'):
        k, v = l.split(': ')
        person[k] = v
    print('%s %s %s' % (person['FamilyN'], person['Name'], person['Age']))

import itertools

# Assuming input in file input.txt
data = open('input.txt').readlines()
records = (lines for valid, lines in itertools.groupby(data, lambda l: l != '\n') if valid)
output = [tuple(field.split(':')[1].strip() for field in itertools.islice(record, 1, None))
          for record in records]
# output == [('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]
# You could make output a generator instead by using (...) rather than [...] above.
# You can then iterate and change the order of elements any way you want, e.g.
# [(elem[1], elem[0], elem[2]) for elem in output] as required in your output.

This answer isn't necessarily better than what's already been posted, but as an illustration of how I approach problems like this it might be useful, especially if you're not used to working with Python's interactive interpreter.
I've started out knowing two things about this problem. First, I'm going to use itertools.groupby to group the input into lists of data lines, one list for each individual data record. Second, I want to represent those records as dictionaries so that I can easily format the output.
One other thing that this shows is how using generators makes breaking a problem like this down into small parts easy.
>>> # first let's create some useful test data and put it into something
>>> # we can easily iterate over:
>>> data = """ID: 1
Name: X
FamilyN: Y
Age: 20
ID: 2
Name: H
FamilyN: F
Age: 23
ID: 3
Name: S
FamilyN: Y
Age: 13"""
>>> data = data.split("\n")
>>> # now we need a key function for itertools.groupby.
>>> # the key we'll be grouping by is, essentially, whether or not
>>> # the line is empty.
>>> # this will make groupby return groups whose key is True if we
>>> # care about them.
>>> def is_data(line):
...     return True if line.strip() else False
...
>>> # make sure this really works
>>> "\n".join([line for line in data if is_data(line)])
'ID: 1\nName: X\nFamilyN: Y\nAge: 20\nID: 2\nName: H\nFamilyN: F\nAge: 23\nID: 3\nName: S\nFamilyN: Y\nAge: 13\nID: 4\nName: M\nFamilyN: Z\nAge: 25'
>>> # does groupby return what we expect?
>>> import itertools
>>> [list(value) for (key, value) in itertools.groupby(data, is_data) if key]
[['ID: 1', 'Name: X', 'FamilyN: Y', 'Age: 20'], ['ID: 2', 'Name: H', 'FamilyN: F', 'Age: 23'], ['ID: 3', 'Name: S', 'FamilyN: Y', 'Age: 13'], ['ID: 4', 'Name: M', 'FamilyN: Z', 'Age: 25']]
>>> # what we really want is for each item in the group to be a tuple
>>> # that's a key/value pair, so that we can easily create a dictionary
>>> # from each item.
>>> def make_key_value_pair(item):
...     items = item.split(":")
...     return (items[0].strip(), items[1].strip())
...
>>> make_key_value_pair("a: b")
('a', 'b')
>>> # let's test this:
>>> dict(make_key_value_pair(item) for item in ["a:1", "b:2", "c:3"])
{'a': '1', 'c': '3', 'b': '2'}
>>> # we could conceivably do all this in one line of code, but this
>>> # will be much more readable as a function:
>>> def get_data_as_dicts(data):
...     for (key, value) in itertools.groupby(data, is_data):
...         if key:
...             yield dict(make_key_value_pair(item) for item in value)
...
>>> list(get_data_as_dicts(data))
[{'FamilyN': 'Y', 'Age': '20', 'ID': '1', 'Name': 'X'}, {'FamilyN': 'F', 'Age': '23', 'ID': '2', 'Name': 'H'}, {'FamilyN': 'Y', 'Age': '13', 'ID': '3', 'Name': 'S'}, {'FamilyN': 'Z', 'Age': '25', 'ID': '4', 'Name': 'M'}]
>>> # now for an old trick: using a list of column names to drive the output.
>>> columns = ["Name", "FamilyN", "Age"]
>>> print "\n".join(" ".join(d[c] for c in columns) for d in get_data_as_dicts(data))
X Y 20
H F 23
S Y 13
M Z 25
>>> # okay, let's package this all into one function that takes a filename
>>> def get_formatted_data(filename):
...     with open(filename, "r") as f:
...         columns = ["Name", "FamilyN", "Age"]
...         for d in get_data_as_dicts(f):
...             yield " ".join(d[c] for c in columns)
...
>>> print "\n".join(get_formatted_data("c:\\temp\\test_data.txt"))
X Y 20
H F 23
S Y 13
M Z 25

Use a dict, namedtuple, or custom class to store each attribute as you come across it, then append the object to a list when you reach a blank line or EOF.
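That idea can be sketched with a namedtuple (the Person type, the parse helper, and the lowercased field names are illustrative assumptions, not part of the original answer):

```python
from collections import namedtuple

# hypothetical record type for the sample data
Person = namedtuple('Person', ['id', 'name', 'familyn', 'age'])

def parse(lines):
    people, fields = [], {}
    for line in lines:
        if not line.strip():              # blank line ends a record
            if fields:
                people.append(Person(**fields))
                fields = {}
            continue
        key, _, value = line.partition(':')
        fields[key.strip().lower()] = value.strip()
    if fields:                            # EOF also ends a record
        people.append(Person(**fields))
    return people

people = parse(['ID: 1', 'Name: X', 'FamilyN: Y', 'Age: 20', '',
                'ID: 2', 'Name: H', 'FamilyN: F', 'Age: 23'])
print(people[0].familyn, people[0].name, people[0].age)
```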

Simple solution:
result = []
for record in content.split('\n\n'):
    try:
        id, name, familyn, age = map(lambda rec: rec.split(' ', 1)[1], record.split('\n'))
    except ValueError:
        pass
    except IndexError:
        pass
    else:
        result.append((familyn, name, age))

Along with the half-dozen other solutions I already see here, I'm a bit surprised that no one has been so simple-minded (that is, generator-, regex-, map-, and read-free) as to propose, for example,
fp = open(fn)

def get_one_value():
    line = fp.readline()
    if not line:
        return None
    parts = line.split(':')
    if 2 != len(parts):
        return ''
    return parts[1].strip()

# The result is supposed to be a list.
result = []
while 1:
    # We don't care about the ID.
    if get_one_value() is None:
        break
    name = get_one_value()
    familyn = get_one_value()
    age = get_one_value()
    result.append((name, familyn, age))
    # We don't care about the block separator.
    if get_one_value() is None:
        break

for item in result:
    print item
Re-format to taste.


Python : convert text file to dict

I would like to convert this text file to a dict, but I get an error. The file looks like:
A 123 132 21
B 34 293 91
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[(key)] = val
print(d)
You probably got a "ValueError: too many values to unpack (expected 2)" error.
I am assuming you are trying to create a dictionary where the results will be:
d = {'A': [123, 132, 21], 'B': [34, 293, 91]}
If that is the case, you need:
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, *val) = line.split()  # this will allow val to become a list of the remaining values
        d[key] = list(map(lambda x: int(x), val))
print(d)
You were just missing the * before val, which causes iterable unpacking: its operand must be an iterable, and the iterable is expanded into a sequence of items, which are included in the new tuple, list, or set at the site of the unpacking.
Update
From the Python manual concerning the map(function, iterable, ...) function:
Return an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see itertools.starmap().
In this case we use a lambda function as the first argument to map. It takes one argument and converts it to an int.
Then we apply the list() constructor to the iterator returned by the map function to create a list of int items.
Crank up an interactive interpreter (just type python with no arguments):
>>> val = ['123', '132', '21']
>>> m = map(lambda x: int(x), val)
>>> m
<map object at 0x00000279A94F1358>
>>> list(m)
[123, 132, 21]
Or:
>>> val = ['123', '132', '21']
>>> m = map(lambda x: int(x), val)
>>> for x in m: print(x)
...
123
132
21
And:
>>> l = lambda x, y: x + y
>>> l(7,9)
16
>>>
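As an aside, the lambda wrapper is not actually needed here; int itself can be passed to map. A quick sketch showing both forms agree:

```python
val = ['123', '132', '21']
a = list(map(lambda x: int(x), val))  # the form used above
b = list(map(int, val))               # int is already a one-argument callable
assert a == b
print(b)
```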
You can also accomplish the conversion with a list comprehension:
Most people find the list comprehension method easier to read/understand compared with using map, but I wanted to show you both ways:
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, *val) = line.split()  # this will allow val to become a list of the remaining values
        d[key] = [int(x) for x in val]
print(d)
Depending on what you want to achieve, either:
- replace line.split() with line.split(maxsplit=1) - this way the values of your dict will be strings like "123 132 21", or
- replace (key, val) = line.split() with (key, *val) = line.split() - this way the values of your dict will be lists like ['123', '132', '21']
The built-in function str.split() splits a string wherever it finds whitespace when no separator is specified. So, using your first line as an example, the following happens:
>>> 'A 123 132 21'.split()
['A', '123', '132', '21']
Thus, you get too many values to unpack in your definition of the tuple. To fix it, you can use the star * operator (known as starred assignment, or extended iterable unpacking) in front of your variable definition
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, *val) = line.split()
        d[key] = val
print(d)
# Prints {'A': ['123', '132', '21'], 'B': ['34', '293', '91']}
This is explained in the Python tutorial.
Another option is to use the named maxsplit parameter of the split() function:
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, val) = line.split(maxsplit=1)
        d[key] = val.strip()
print(d)
Note that I had to append strip() when setting the entry in the dictionary, to get rid of a trailing newline \n.
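For reference, a quick sketch of what maxsplit=1 does to one of the sample lines:

```python
line = 'A 123 132 21'
key, val = line.split(maxsplit=1)  # split only at the first run of whitespace
print(key, '|', val)
```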
You can do it this way:
fichier = open("ab.txt", 'r')
d = {}
for line in fichier.readlines():
    d[line[0]] = line[2:].rstrip().rsplit(' ')
print(d)
Output:
{'A': ['123', '132', '21'], 'B': ['34', '293', '91']}
Ask me if you don't understand something :).

pattern match get list and dict from string

I have the string below, and I want to extract the lists, dicts, and variables from it. How can I split this string into that specific format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re

m1 = re.findall(r'(?=.*,)(.*?=\[.+?\],?)', s)
for i in m1:
    print('m1:', i)
Only the first result is correct. Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work out each variable name and its value.
You still need to handle the type casting of the values (a regex, split, or try with casting may help).
Also, as others have commented, using a dict may be easier to handle.
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=]+)="                  # LHS and assignment operator
           + r"([+-]?\d+(?:\.\d+)?|"    # numbers
           + r"[+-]?\d+\.|"             # more numbers
           + r"\[[^]]+\]|"              # lists
           + r"{[^}]+}|"                # dictionaries
           + r"[a-zA-Z_][a-zA-Z_\d]*)", # identifiers
           s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
#  ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
Building on that regex, a full answer looks like this:
import re
from pprint import pprint

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)="                  # LHS and assignment operator
                + r"([+-]?\d+(?:\.\d+)?|"    # numbers
                + r"[+-]?\d+\.|"             # more numbers
                + r"\[[^]]+\]|"              # lists
                + r"{[^}]+}|"                # dictionaries
                + r"[a-zA-Z_][a-zA-Z_\d]*)", # identifiers
                s)
temp_d = {}
for i, j in m1:
    temp = i.strip(',').split(',')
    if len(temp) > 1:
        for k in temp[:-1]:
            temp_d[k] = ''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
The output is:
{'Record': '',
 'Save': '',
 'a': '3',
 'b': '1.3',
 'c': 'abch',
 'dict_a': '{a:2,b:3}',
 'list_a': '[1]',
 'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
regex = re.compile(r'([a-z]|_)+=')
# note: if you want all valid variable names, use r'([a-zA-Z0-9_])+='
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that sequentially chops up s, using the list above to partition the string:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1])  # strip leading bits
    return mystr.partition(mylist[0])[2][cut:], mylist[1:]

mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x + 1]) - 1  # -1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop())  # get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).
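A sketch of that final parsing step, assuming result is the list above:

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
# Split each element at the first '=' to get (name, value) pairs.
parsed = dict(item.split('=', 1) for item in result)
print(parsed['list_c'], parsed['b'])
```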

Creating dicts from file in python

For example, I've got a file with multiple lines like
<<something>> 1, 5, 8
<<somethingelse>> hello
<<somethingelseelse>> 1,5,6
I need to create a dict with keys taken from inside the << >>:
dict = { "something":[1,5,8], "somethingelse": "hello" ...}
I need to somehow read what is inside << >> and use it as a key, and I also need to check whether there are many elements or only one. If only one, I put it in as a string; if more than one, I need to put it in as a list of elements.
Any ideas how to do this? Maybe regexes, but I'm not great with them.
I can easily read the file's lines, but I don't know how to separate out those values:
f = open('something.txt', 'r')
lines = f.readlines()
f.close()

def finding_path():
    for line in lines:
        print line
finding_path()
Any ideas? Thanks :)
Assuming that your keys will always be single words, you can play with split(sep, maxsplit). Something like below:
import sys

def finding_path(file_name):
    f = open(file_name, 'r')
    my_dict = {}
    for line in f:
        # split on the first occurrence of a space
        key_val_pair = line.split(' ', 1)
        # if we do have a key separated by a space
        if len(key_val_pair) > 1:
            key = key_val_pair[0]
            # proceed only if the key is enclosed within '<<' and '>>'
            if key.startswith('<<') and key.endswith('>>'):
                key = key[2:-2]
                # put more than one value in a list, otherwise keep a string literal
                val = key_val_pair[1].split(',') if ',' in key_val_pair[1] else key_val_pair[1]
                my_dict[key] = val
    print my_dict
    f.close()

if __name__ == '__main__':
    finding_path(sys.argv[1])
Using a file like below
<<one>> 1, 5, 8
<<two>> hello
// this is a comment, it will be skipped
<<three>> 1,5,6
I get the output
{'three': ['1', '5', '6\n'], 'two': 'hello\n', 'one': ['1', ' 5', ' 8\n']}
Please check the code below. It uses a regex to get each key and value; if the value list has length 1, it is converted to a string.
import re

demo_dict = {}
with open("val.txt", 'r') as f:
    for line in f:
        m = re.search(r"<<(.*?)>>(.*)", line)
        if m is not None:
            k = m.group(1)
            v = m.group(2).strip().split(',')
            if len(v) == 1:
                v = v[0]
            demo_dict[k] = v
print demo_dict
Output:
C:\Users\dinesh_pundkar\Desktop>python demo.Py
{'somethingelseelse': [' 1', '5', '6'], 'somethingelse': 'hello', 'something': [' 1', ' 5', ' 8']}
My answer is similar to Dinesh's. I've added a function to convert the values in the list to numbers if possible, and some error handling so that if a line doesn't match, a useful warning is given.
import re
import warnings

regexp = re.compile(r'<<(\w+)>>\s+(.*)')

lines = ["<<something>> 1, 5, 8\n",
         "<<somethingelse>> hello\n",
         "<<somethingelseelse>> 1,5,6\n"]
# In real use, pass a file object instead of the list:
# lines = open('something.txt', 'r')

def get_value(obj):
    """Converts an object to a number if possible,
    or a string if not possible."""
    try:
        return int(obj)
    except ValueError:
        pass
    try:
        return float(obj)
    except ValueError:
        return str(obj)

dictionary = {}
for line in lines:
    line = line.strip()
    m = re.search(regexp, line)
    if m is None:
        warnings.warn("Match failed on \n {}".format(line))
        continue
    key = m.group(1)
    value = [get_value(x) for x in m.group(2).split(',')]
    if len(value) == 1:
        value = value[0]
    dictionary[key] = value
print(dictionary)
output
{'something': [1, 5, 8], 'somethingelse': 'hello', 'somethingelseelse': [1, 5, 6]}

Using `itertools.groupby()` to get lists of runs of strings that start with `A`?

The (abstracted) problem is this: I have a log file
A: 1
A: 2
A: 3
B: 4
B: 5
A: 6
C: 7
D: 8
A: 9
A: 10
A: 11
and I want to end up with a list of lists like this:
[["1", "2", "3"], ["6"], ["9", "10", "11"]]
where the file has been broken up into "runs" of strings starting with A. I know that I can use itertools.groupby to solve this, and right now I have this solution (where f is a list of the lines in the file).
starts_with_a = lambda x: x.startswith("A")
coalesced = [g for _, g in groupby(f, key=starts_with_a)]
runs = [re.sub(r'A: ', '', s) for s in coalesced if starts_with_a(s)]
So I use groupby, but then I have to filter out the stuff that doesn't start with "A". This is okay, and pretty terse, but is there a more elegant way to do it? I'd love a way that:
doesn't require two passes
is terser (and/or) is more readable
Help me harness the might of itertools!
Yes, filter out the lines that don't start with A but use the key produced by groupby() for each group returned. It's the return value of the key function, so it'll be True for those lines that do start with A. I'd use str.partition() here instead of a regular expression:
coalesce = (g for key, g in groupby(f, key=lambda x: x[:1] == "A") if key)
runs = [[res.partition(':')[-1].strip() for res in group] for group in coalesce]
Since your str.startswith() argument is a fixed-width string literal, you may as well use slicing; x[:1] slices of the first character and compares that to 'A', which gives you the same test as x.startswith('A').
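A quick check that the slice test agrees with str.startswith, including on an empty string, where x[0] would raise IndexError but x[:1] safely yields '':

```python
samples = ['A: 1', 'B: 4', '']
# For every sample, the slice comparison and startswith give the same answer.
checks = [(x[:1] == 'A') == x.startswith('A') for x in samples]
print(checks)
```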
I used a generator expression to group the groupby() filtering; you could just inline that into just the one list comprehension:
runs = [[res.partition(':')[-1].strip() for res in group]
        for key, group in groupby(f, key=lambda x: x[:1] == "A") if key]
Demo:
>>> from itertools import groupby
>>> f = '''\
... A: 1
... A: 2
... A: 3
... B: 4
... B: 5
... A: 6
... C: 7
... D: 8
... A: 9
... A: 10
... A: 11
... '''.splitlines(True)
>>> coalesce = (g for key, g in groupby(f, key=lambda x: x[:1] == "A") if key)
>>> [[res.partition(':')[-1].strip() for res in group] for group in coalesce]
[['1', '2', '3'], ['6'], ['9', '10', '11']]
You want terse? OK, you got it.
>>> lst = ['A: 1', 'A: 2', 'A: 3', 'B: 4', 'B: 5', 'A: 6', 'C: 7', 'D: 8', 'A: 9', 'A: 10', 'A: 11']
>>> [[x[1] for x in group[1]] for group in itertools.groupby((line.split(': ') for line in lst), key=lambda a:a[0]) if group[0]=='A']
[['1', '2', '3'], ['6'], ['9', '10', '11']]
Breaking it down, from the inside out:
(line.split(': ') for line in lst)
This is a generator expression that splits each element into its alpha key and the associated string value.
for group in itertools.groupby(..., key=lambda a:a[0])
This simply groups the elements by the alpha key that was determined in the first step.
... if group[0]=='A'
This simply excludes any results that don't match the criteria specified in the question. You could also use if group[0].startswith('A') if the key string isn't a single character.
[x[1] for x in group[1]] for ...]
This is a list comprehension that builds a list from the results of groupby that match the earlier condition. groupby returns an iterator as the second return value (group[1]), so we simply turn that iterator into a list with a list comprehension. x[0] is the key value, and x[1] is the string that follows it.
[...]
The desired output is a list, so a list comprehension makes it so. The whole operation occurs with a single pass over the input.
Probably not the most Pythonic way, in one loop without itertools:
lines = '''
A: 1
A: 2
A: 3
B: 4
B: 5
A: 6
C: 7
D: 8
A: 9
A: 10
A: 11
'''
res = []
cont_last = []
for line in lines.splitlines():
    if line.startswith('A: '):
        cont_last.append(line.replace('A: ', ''))
    else:
        if cont_last:
            res.append(cont_last)
            cont_last = []
if cont_last:
    res.append(cont_last)
print(res)
Without needing itertools, this does the full file with only one iteration:
lines = open('logfile.txt', 'r').readlines()

out_list = []
temp_list = []
for line in lines:
    if line.split(':')[0].strip() == 'A':
        temp_list.append(line.split(':')[1].strip())
    elif temp_list:
        out_list.append(temp_list)
        temp_list = []
if temp_list:
    out_list.append(temp_list)
    temp_list = []
print(out_list)
I know you asked for itertools; I just don't have it handy, so I couldn't have debugged an itertools version. Hope this helps.

Getting elements of a csv file

I have a csv file. Each column represents a parameter and contains a few distinct values (e.g. 1, 2, 3, 5) repeated hundreds of times.
I want to write a Python program that reads each column and stores its content in a dictionary {column_header: list_numbers} (without repeating the numbers).
I tried to adapt the example given in the python documentation:
from csv import reader

def getlist(file):
    content = dict()
    with open(file, newline='') as inp:
        my_reader = reader(inp, delimiter=' ')
        for col in zip(*my_reader):
            l = []
            for k in col:
                if k not in l:
                    l.append(k)
                print(k)  # for debugging purposes
            content[col[0]] = l
I was expecting, by printing k, to see each element of the column. Instead, I get several columns at a time.
Any idea about what is wrong?
It looks like you are almost there. I'd use a set to detect repeated numbers (more efficient):
from csv import reader

def getlist(file):
    content = {}
    with open(file, newline='') as inp:
        my_reader = reader(inp, delimiter=' ')
        for col in zip(*my_reader):
            content[col[0]] = l = []
            seen = set()
            for k in col[1:]:
                if k not in seen:
                    l.append(k)
                    seen.add(k)
    return content
Make sure you get your delimiter right; if the above doesn't work for you then print() may show you whole rows with the delimiters still in them, as strings.
Say, your file uses , as a delimiter instead, the output would look something like:
{'a,b,c,d': ['0,1,2,3', '1,2,3,4']}
while configuring the correct delimiter would give you:
{'d': ['3', '4'], 'c': ['2', '3'], 'b': ['1', '2'], 'a': ['0', '1']}
Does the following python script work for you?
import csv

test_file = 'test.csv'
csv_file = csv.DictReader(open(test_file, 'rb'), delimiter=',')
for line in csv_file:
    print line['x']
