I am new to Python and am trying to do the following in Python 3.
I have a text file like this:
1 2 3
4 5 6
7 8 9
.
.
I want this converted into a tuple of tuples like this:
((1,2,3),(4,5,6),(7,8,9),...)
I have tried using
f = open('text.txt', 'r')
f.readlines()
but this is giving me a list of individual words.
Could anyone help me with this?
A method using the csv module:
>>> import csv
>>> f = open('a.txt','r')
>>> c = csv.reader(f, delimiter='\t')  # Use the delimiter from your file: a single space if the columns are space-separated, etc.
>>> l = []
>>> for row in c:
...     l.append(tuple(map(int, row)))
...
>>> l = tuple(l)
>>> l
((1, 2, 3), (4, 5, 6), (7, 8, 9))
Though if you do not really need tuples, do not use them; it may be better to leave the rows as lists.
In the code above, both row and l start out as lists.
You may try this,
>>> s = '''1 2 3
4 5 6
7 8 9'''.splitlines()
>>> tuple(tuple(int(j) for j in i.split()) for i in s)
((1, 2, 3), (4, 5, 6), (7, 8, 9))
For your case,
tuple(tuple(int(j) for j in i.split()) for i in f.readlines())
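Putting the pieces together for Python 3, a minimal sketch might look like the following. The helper name read_tuples and the temporary file standing in for text.txt are assumptions made for illustration; blank lines are skipped so a trailing newline does not produce an empty tuple.

```python
import os
import tempfile


def read_tuples(path):
    """Return the whitespace-separated integers in `path` as a tuple of tuples."""
    with open(path) as f:
        # Iterating over the file yields one line at a time; skip blank lines.
        return tuple(
            tuple(int(x) for x in line.split())
            for line in f
            if line.strip()
        )


# Demo on a temporary file that stands in for text.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 5 6\n7 8 9\n")

result = read_tuples(tmp.name)
os.remove(tmp.name)
print(result)  # ((1, 2, 3), (4, 5, 6), (7, 8, 9))
```

The with block closes the file automatically, which the bare open() call in the question does not.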
After opening and reading an input file, I'm trying to split the input on different characters. This works well, although I seem to be getting a nested list which I don't want. My list does not look like [[list]], but like ["[list]"]. What did I do wrong here?
The input looks like this:
name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;
The output looks like this:
["['name1", '', '', "1 2 3 4 5', 'name2", '', '', "1 2 3 4']"]
Here is my code:
file = open("file.txt")
input_of_this_file = file.read()
a = input_of_this_file.split("\n")
b = a[0::2] # so i get only the even lines
c = str(b) # to make it a string so the .strip() works
d = c.strip() # because there were whitespaces
e = d.split("_")
print e
If I then do:
x = e[0]
I get:
['name1
This removes the outer list, but also removes the last ].
I would like the result to contain only the names: name1, name2
Use itertools.islice and a list comprehension.
>>> from itertools import islice
>>> with open("tmp.txt") as f:
...     [line.rstrip("\n").split("_") for line in islice(f, None, None, 2)]
...
[['name1', '', '', '1 2 3 4 5'], ['name2', '', '', '1 2 3 4']]
Keeping your code syntax without imports:
c=[]
input_of_file = '''name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;'''
a = input_of_file.split("\n")
b = a[::2]
for item in b:
    new_item = item.split('__')
    c.append(new_item)
Results
c = [['name1', '_1 2 3 4 5'], ['name2', '_1 2 3 4']]
c[0][0] = 'name1'
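If only the names are needed, one possible shortcut is to split each even-indexed line once on the first underscore and keep the left part. This is a sketch that assumes the names themselves never contain an underscore; the variable names are illustrative.

```python
input_of_file = """name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;"""

# Lines 0, 2, 4, ... hold a name; keep the text before the first underscore.
names = [line.split("_", 1)[0] for line in input_of_file.split("\n")[::2]]

print(", ".join(names))  # name1, name2
```

Working on the list of lines directly avoids the str(b) step in the question, which is what turned the list into one long string with brackets and quotes inside it.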
I am writing code that should return a dictionary with names as the keys and tuples of numbers as the corresponding values. The first number after the name controls how many numbers go into the tuple (the numbers are taken from left to right). For example, the line of text "Ali 6 7 6 5 12 31 61 9" has 6 as the first number after the name, so it becomes the dictionary entry with the key "Ali" and a value made up of the next six integers: "Ali": (7, 6, 5, 12, 31, 61).
This is the file I'm taking the data from:
Bella 5 2 6 2 2 30 4 8 9 2
Gill 2 9 7 54 67
Jin 3 26 51 3 344 23
Elmo 4 3 8 6 8
Ali 6 7 6 5 12 31 61 9
the expected output is
Ali : (7, 6, 5, 12, 31, 61)
Bella : (2, 6, 2, 2, 30)
Elmo : (3, 8, 6, 8)
Gill : (9, 7)
Jin : (26, 51, 3)
So I've written this:
def get_names_num_tuple_dict(filename):
    file_in = open(filename, 'r')
    contents = file_in.read()
    file_in.close()
    emty_dict = {}
    for line in contents:
        data = line.strip().split()
        key = data[0]
        length = int(data[1])
        data = tuple(data[2:length + 2])
        emty_dict[key] = data
    return emty_dict
But I'm getting this error:
    length = int(data[1])
IndexError: list index out of range
Can anyone please help? I'm still a bit shaky with dictionaries, as I'm learning them for the first time.
Use the following code:
def get_names_num_tuple_dict(filename):
    emty_dict = {}
    with open(filename) as f:
        for line in f:
            data = line.strip().split()
            key = data[0]
            length = int(data[1])
            data = tuple(data[2:length + 2])
            emty_dict[key] = data
    return emty_dict
print(get_names_num_tuple_dict('my_filename'))
Output:
{'Bella': ('2', '6', '2', '2', '30'), 'Gill': ('9', '7'), 'Jin': ('26', '51', '3'), 'Elmo': ('3', '8', '6', '8'), 'Ali': ('7', '6', '5', '12', '31', '61')}
Here is what happens:
contents = file_in.read()
reads the whole file into a single string. Looping over that string yields one character at a time, so line.strip().split() produces a list with at most one element, and data[1] raises IndexError: list index out of range.
Iterate over the file object instead:
for line in file_in:
    data = line.strip().split()
    ...
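The answers above leave the tuple items as strings, while the expected output shows integers sorted by name. A hedged variant that converts the values and prints them in that order could look like this. It takes any iterable of lines (an open file works too), and the in-memory sample is an assumption that stands in for the real file.

```python
def get_names_num_tuple_dict(lines):
    """Map each name to a tuple of the first `count` integers that follow it."""
    result = {}
    for line in lines:
        data = line.split()
        if len(data) < 2:
            continue  # skip blank or malformed lines
        name, count = data[0], int(data[1])
        # data[2:count + 2] is the slice of `count` values after the count itself.
        result[name] = tuple(int(x) for x in data[2:count + 2])
    return result


sample = """Bella 5 2 6 2 2 30 4 8 9 2
Gill 2 9 7 54 67
Jin 3 26 51 3 344 23
Elmo 4 3 8 6 8
Ali 6 7 6 5 12 31 61 9"""

names = get_names_num_tuple_dict(sample.splitlines())
for name in sorted(names):
    print(name, ":", names[name])
```

With a real file, the call would be get_names_num_tuple_dict(f) inside a with open(filename) as f: block.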
I have the following list with 2 elements:
['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
I need to make a list (or zip) in which each letter is paired with the number at the same position later in the string. For example, for list[0] the result should read
{"A":"6", "G":"6", "C":"35","T":"25","T":"10"}
Can I make a list of such lists/zips that stores the corresponding values for list[0], list[1], ..., list[n]?
Note: The letters can only be A, G, C or T, and the numbers can take any value.
Edit 1: Previously, I thought I could use a dictionary, but several members pointed out that this cannot be done because the keys repeat. So I just want to make a list, zip, or anything else recommended to pair each letter with its corresponding number.
Split each string once to separate the letters from the numbers, then split the numbers and zip the two together into tuples:
l = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
pairs = [list(zip(a, b.split())) for a, b in (sub.split(None, 1) for sub in l)]
Which would give you:
[[('A', '6'), ('G', '6'), ('C', '35'), ('T', '25'), ('T', '10')], [('A', '7'), ('G', '7'), ('G', '28'), ('G', '29'), ('T', '2')]]
Or using a for loop with list.append:
l = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
out = []
for a, b in (sub.split(None, 1) for sub in l):
    out.append(list(zip(a, b.split())))
If you want to convert any letter to Z where the digit is < 10, you just need another loop where we check the digit in each pairing:
pairs = [[("Z", i ) if int(i) < 10 else (c, i) for c,i in zip(a, b.split())]
for a,b in (sub.split(None, 1) for sub in l)]
print(pairs)
Which would give you:
[[('Z', '6'), ('Z', '6'), ('C', '35'), ('T', '25'), ('T', '10')], [('Z', '7'), ('Z', '7'), ('G', '28'), ('G', '29'), ('Z', '2')]]
To break it into a regular loop:
pairs = []
for a, b in (sub.split(None, 1) for sub in l):
    pairs.append([("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())])
print(pairs)
[("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())] sets the letter to Z if the corresponding digit i is < 10 or else we just leave the letter as is.
If you want to get back to the original layout afterwards, you just need to transpose with zip:
In [13]: l = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
In [14]: pairs = [[("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())] for a, b in
....: (sub.split(None, 1) for sub in l)]
In [15]: pairs
Out[15]:
[[('Z', '6'), ('Z', '6'), ('C', '35'), ('T', '25'), ('T', '10')],
[('Z', '7'), ('Z', '7'), ('G', '28'), ('G', '29'), ('Z', '2')]]
In [16]: unzipped = [["".join(a), " ".join(b)] for a, b in (zip(*tup) for tup in pairs)]
In [17]: unzipped
Out[17]: [['ZZCTT', '6 6 35 25 10'], ['ZZGGZ', '7 7 28 29 2']]
zip(*...) transposes each list of pairs back into a tuple of letters and a tuple of numbers; we then just need to join the strings back together. To get all the way back to the original string format, join once more:
In [18]: [" ".join(["".join(a), " ".join(b)]) for a, b in (zip(*tup) for tup in pairs)]
Out[18]: ['ZZCTT 6 6 35 25 10', 'ZZGGZ 7 7 28 29 2']
If you consider using tuples to pair the items, then this works:
>>> from pprint import pprint
>>> lst = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
>>> new_lst = [list(zip(sub[0], sub[1:])) for sub in [i.split() for i in lst]]
>>> pprint(new_lst)
[[('A', '6'), ('G', '6'), ('C', '35'), ('T', '25'), ('T', '10')],
[('A', '7'), ('G', '7'), ('G', '28'), ('G', '29'), ('T', '2')]]
[i.split() for i in lst]: An initial split on each string.
zip(sub[0], sub[1:]): Zips the string of letters with the list of numbers.
Iterate through the list, split each item into its letters and its numbers, and then build a list of (letter, number) tuples.
alphanum = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
list_of_tuple = []
for s in alphanum:
    ints = []
    chars = []
    for i in s.split():
        if i.isdigit():
            ints.append(i)
        else:
            chars.append(i)
    new_tuple = []
    for (n, item) in enumerate(list(chars[0])):
        new_tuple.append((item, ints[n]))
    list_of_tuple.append(new_tuple)
print(list_of_tuple)
This code would work, assuming the elements in the list are correctly formed.
This means the number of letters and numbers must match!
And it will overwrite the value if the key already exists.
items = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']  # renamed from list to avoid shadowing the builtin
dictionary = {}
for line in items:
    split_line = line.split()
    letters = split_line[0]
    iterator = 1
    for letter in letters:
        dictionary[letter] = split_line[iterator]
        iterator += 1
print(dictionary)
This modified one will check if the key exists and add it to a list with that key:
items = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']  # renamed from list to avoid shadowing the builtin
dictionary = {}
for line in items:
    split_line = line.split()
    letters = split_line[0]
    iterator = 1
    for letter in letters:
        if letter in dictionary:
            dictionary[letter].append(split_line[iterator])
        else:
            dictionary[letter] = [split_line[iterator]]
        iterator += 1
print(dictionary)
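The same grouping can be sketched more compactly with collections.defaultdict and zip, which removes both the manual counter and the key-existence check. This is an alternative for illustration, not the method of the answer above, and the names items and grouped are assumptions.

```python
from collections import defaultdict

items = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']

grouped = defaultdict(list)
for line in items:
    # The first token is the string of letters, the rest are the numbers.
    letters, *numbers = line.split()
    # zip pairs the i-th letter with the i-th number and stops at the shorter one.
    for letter, number in zip(letters, numbers):
        grouped[letter].append(number)

print(dict(grouped))
```

Because zip stops at the shorter input, a line with mismatched letter and number counts is truncated silently rather than raising an IndexError; whether that is desirable depends on how much the input can be trusted.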
I have a list of tuples that looks like this: B=[('dict1', 'dict2'), (1, 5), (2, 6), (3, 7), (4, 8)]. Here dict1 and dict2 refer to two dictionaries whose values are shown in the table-like view below.
I want to reshape it so that a table-like view is produced, with the purpose of later writing it to a csv file:
dict1 dict2
1 5
2 6
3 7
4 8
I have tried with data=B.reshape(2,5) but to no avail since this is not the way to reshape a list.
How could this be done in a pythonic way? Thanks!
Try
In [22]: import pandas as pd
In [23]: B=[('dict1', 'dict2'), (1, 5), (2, 6), (3, 7),]
In [24]: pd.DataFrame(B).to_csv("my_file.csv", header=False, index=False, sep="\t")
Result:
$ cat my_file.csv
dict1 dict2
1 5
2 6
3 7
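If you want to write it with the csv module, try:
import csv
# Python 3: open in text mode with newline=''; on Python 2 use open('file.csv', 'wb') instead.
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    writer.writerows(B)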
Result:
dict1 dict2
1 5
2 6
3 7
4 8
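To check the tab-separated layout without touching the disk, the same csv.writer call can be pointed at an in-memory buffer. This is only a sketch: io.StringIO stands in for the real file, and lineterminator='\n' is chosen here so the printed text is easy to compare (the csv module's default is '\r\n').

```python
import csv
import io

B = [('dict1', 'dict2'), (1, 5), (2, 6), (3, 7), (4, 8)]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerows(B)  # one output row per tuple in B

table = buffer.getvalue()
print(table)
```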
-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
I want to iterate through both the years and the fields under each year (e.g. 1, 2, 3, 4, and 5). a, b, and other alphabet letters after some fields are subfields.
The lines with dashes in my data indicate the year of the entry. Each record group starts at ---year--- and ends at the line before the next ---year---.
Also, fields is a list:
fields=["1", "2", "3", "4", "5"].
I'm eventually trying to retrieve the values next to the fields for each entry/year. For example, if my current field is 1, which is equivalent to fields[0], I would iterate through all the years (2000, 2001, and 2002) to get the values for the field 1. The output would be
17824
3224
(Blank space for Year 2002)
How can I iterate through the years (indicated by the dashes)? I can't work out code that generates the desired output.
You can first use a regex to split your text (held in s below), then use itertools.izip_longest within a nested list comprehension to get your expected columns (this is Python 2; in Python 3 the function is itertools.zip_longest):
>>> import re
>>> blocks=re.split(r'-+\d+-+',s)
>>> from itertools import izip_longest
>>> z=[list(izip_longest(*[k for k in sub if k])) for sub in izip_longest(*[[j.split() for j in i.split('\n')] for i in blocks])]
>>> z
[[], [('1', '1', '2'), ('17824', '3224', '2013341524626245.9')], [('2', '2', '3'), ('20131125192004.9', '20w125192004.9', '484914s1969'), (None, None, 'dcu'), (None, None, '000'), (None, None, '0'), (None, None, 'eng')], [('3', '3', '4'), ('690714s1969', '690714s1969', 'a'), ('dcu', 'dcu', '75601809'), ('000', '000', None), ('0', '0', None), ('eng', 'eng', None)], [('4', '5', '4'), ('a', 'a', 'a'), ('75601809', 'WA', 'eng'), (None, '120', None)], [('4', '4'), ('a', 'c'), ('DLC', 'DLC')], [('4', '5'), ('b', 'a'), ('eng', 'WA'), (None, '345')], [('4',), ('c',), ('DLC',)], [('5',), ('a',), ('WA',), ('750',)], []]
Each sublist represents a specific line position within each block; for example, the first sublist holds the first line of each block:
>>> z=[i for i in z if i] # remove the empty lists
>>> z[0]
[('1', '1', '2'), ('17824', '3224', '2013341524626245.9')]
>>> z[0][1]
('17824', '3224', '2013341524626245.9')
So I'm writing a pretty involved answer that uses a helper function, but I think you'll find it pretty flexible. It uses an iterutil type helper function that I wrote called groupby. The groupby function accepts a key function to specify which group each item belongs to. In your case the key function was a little fancy because it had to maintain state to know which year each element belonged to. The code below is totally runnable. Just copy and paste into a script and let me know what you think.
EDIT
It turns out the groupby function is already implemented in the itertools module and I had been missing it all along. I edited the code to use the itertools version.
#!/usr/bin/env python
import io
import re
import itertools as it
data = '''-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345'''
def group_year():
    '''
    A stateful closure to group the year blobs together
    '''
    # Hack to update a variable from the closure
    g = [0]
    def closure(e):
        if re.findall(r'-----[0-9]{4}------', e):
            g[0] += 1
        return g[0]
    return closure

if __name__ == "__main__":
    # io.StringIO works on Python 3 str data (the Python 2 original used io.BytesIO)
    f = io.StringIO(data)
    gy = group_year()
    for k, group in it.groupby(f, key=gy):
        # group is now an iter of lines for each year group in the data
        # Now you can iterate on each group like so:
        for line in group:
            rec = line.strip().split()
            if rec[0] == '1':
                print(rec[1])
        # You could also use nested groupby's at this point to perform
        # further grouping on the different columns or whatever
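Neither answer above leaves a blank for a year that lacks the requested field, which the question asks for. One way to get that, sketched here under assumptions, is to parse the text into a nested dictionary keyed by year and then by field. The helper parse_records and the shortened sample data are made up for this example, and re.fullmatch requires Python 3.4 or later.

```python
import re

# Abbreviated sample in the same layout as the question's data.
data = """-------------2000--------------
1 17824
2 20131125192004.9
-------------2001--------------
1 3224
5 a WA 120
-------------2002--------------
2 2013341524626245.9
5 a WA 345"""


def parse_records(text):
    """Return {year: {field: [values]}} for dash-delimited year blocks."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"-+(\d{4})-+", line)
        if header:
            # A new year block starts; later lines belong to it.
            current = records.setdefault(header.group(1), {})
        elif current is not None and line:
            # The first token is the field number, the rest is its value.
            field, _, value = line.partition(" ")
            current.setdefault(field, []).append(value)
    return records


records = parse_records(data)
fields = ["1", "2", "3", "4", "5"]

# Values of field "1" per year; an empty string marks a year without it.
column = [" ".join(records[year].get(fields[0], [])) for year in sorted(records)]
print(column)  # ['17824', '3224', '']
```

Repeated fields (such as the several 4 lines in the full data) end up as multiple entries in the same list, so subfields can be processed afterwards without re-reading the text.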