I have code that should return a dictionary with names as the keys and
tuples of numbers as the corresponding values. The first number after the
name controls how many numbers go into the tuple (the numbers are taken
from left to right). For example, the line of text "Ali 6 7 6 5 12 31 61 9"
has 6 as the first number after the name, so it becomes the dictionary
entry with the key "Ali" and a tuple of the next six integers as its value:
"Ali": (7, 6, 5, 12, 31, 61).
This is the file I'm reading the data from:
Bella 5 2 6 2 2 30 4 8 9 2
Gill 2 9 7 54 67
Jin 3 26 51 3 344 23
Elmo 4 3 8 6 8
Ali 6 7 6 5 12 31 61 9
The expected output is:
Ali : (7, 6, 5, 12, 31, 61)
Bella : (2, 6, 2, 2, 30)
Elmo : (3, 8, 6, 8)
Gill : (9, 7)
Jin : (26, 51, 3)
So I've written this:
def get_names_num_tuple_dict(filename):
    file_in = open(filename, 'r')
    contents = file_in.read()
    file_in.close()
    emty_dict = {}
    for line in contents:
        data = line.strip().split()
        key = data[0]
        length = int(data[1])
        data = tuple(data[2:length + 2])
        emty_dict[key] = data
    return emty_dict
But I'm getting this error:
    length = int(data[1])
IndexError: list index out of range
Can anyone please help? I'm still a bit weak with dictionaries, as I'm learning them for the first time.
Use the following code:
def get_names_num_tuple_dict(filename):
    emty_dict = {}
    with open(filename) as f:
        # Iterating over the file object yields one line at a time
        for line in f:
            data = line.strip().split()
            key = data[0]
            length = int(data[1])
            data = tuple(data[2:length + 2])
            emty_dict[key] = data
    return emty_dict

print(get_names_num_tuple_dict('my_filename'))
Output:
{'Bella': ('2', '6', '2', '2', '30'), 'Gill': ('9', '7'), 'Jin': ('26', '51', '3'), 'Elmo': ('3', '8', '6', '8'), 'Ali': ('7', '6', '5', '12', '31', '61')}
Here is what happens:
contents = file_in.read()
reads your whole file into a single string. When you loop over a string, you get one character at a time, so line.strip().split() produces a list with at most one element and data[1] raises IndexError: list index out of range.
Instead, iterate over the file object itself while it is still open:
for line in file_in:
    data = line.strip().split()
    ...
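To make the difference concrete, here is a minimal sketch. The helper name parse_line and the sample string are made up for illustration; it also converts the values to int, since the expected output in the question shows numbers rather than strings.

```python
def parse_line(line):
    """Turn 'Ali 6 7 6 5 12 31 61 9' into ('Ali', (7, 6, 5, 12, 31, 61))."""
    data = line.split()
    name, count = data[0], int(data[1])
    # Convert the next `count` fields to int so the tuple holds numbers, not strings.
    return name, tuple(int(x) for x in data[2:count + 2])


# Hypothetical sample standing in for the result of file_in.read().
contents = "Gill 2 9 7 54 67\nJin 3 26 51 3 344 23\n"

# Iterating over a string yields single characters, which is what led to the IndexError.
first_items = [item for item in contents][:4]
print(first_items)

# Iterating over lines instead gives one record per step.
result = dict(parse_line(line) for line in contents.splitlines())
for name in sorted(result):
    print(name, ":", result[name])
```

Sorting the keys before printing reproduces the alphabetical order shown in the expected output.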
I have a variable that is mixed with letters and numbers. The letters range from A:Z and the numbers range from 2:8. I want to re-code this variable so that it is all numeric with the letters A:Z now becoming numbers 1:26 and the numbers 2:8 becoming numbers 27:33.
For example, I would like this variable:
Var1 = c('A',2,3,8,'C','W',6,'T')
To become this:
Var1 = c(1,27,28,33,3,23,31,20)
In R I can do this using 'match' like this:
Var1 = as.numeric(match(Var1, c(LETTERS, 2:8)))
How can I do this using python? Pandas?
Thank you
Make a dictionary and map the values:
import string
import numpy as np
dct = dict(zip(list(string.ascii_uppercase) + list(np.arange(2, 9)), np.arange(1, 34)))
# If they are strings of numbers, not integers use:
#dct = dict(zip(list(string.ascii_uppercase) + ['2', '3', '4', '5', '6', '7', '8'], np.arange(1, 34)))
df.col_name = df.col_name.map(dct)
An example:
import pandas as pd
df = pd.DataFrame({'col': [2, 4, 6, 3, 5, 'A', 'B', 'D', 'F', 'Z', 'X']})
df.col.map(dct)
Outputs:
0 27
1 29
2 31
3 28
4 30
5 1
6 2
7 4
8 6
9 26
10 24
Name: col, dtype: int64
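If pandas is not needed, the same lookup idea can be sketched in plain Python. This is only an illustration under the assumption that the numbers are stored as integers; the names symbols, position and recoded are made up here.

```python
import string

# Lookup table: 'A'..'Z' -> 1..26, then 2..8 -> 27..33
# (same ordering as c(LETTERS, 2:8) in the R version).
symbols = list(string.ascii_uppercase) + list(range(2, 9))
position = {symbol: index for index, symbol in enumerate(symbols, start=1)}

var1 = ['A', 2, 3, 8, 'C', 'W', 6, 'T']
recoded = [position[value] for value in var1]
print(recoded)
```

A value that is not in the table raises KeyError here, whereas Series.map would produce NaN for it.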
I think this could help you:
Replacing letters with numbers with its position in alphabet
Then you just need to apply the function from that question to your DataFrame column:
dt.Var1.apply(alphabet_position)
You can also try this:
for i in range(len(var1)):
    if type(var1[i]) == int:
        var1[i] = var1[i] + 25
    else:
        var1[i] = ord(var1[i].lower()) - 96
I have the following list with 2 elements:
['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
I need to make a list or zip such that each letter corresponds to the number at the same position later in the string. For example, for list[0] the result should read
{"A":"6", "G":"6", "C":"35","T":"25","T":"10"}
Can I make a list of such lists/zips that stores the corresponding values for list[0], list[1], ..., list[n]?
Note: The letters can only be A, G, C or T, and the numbers can take any value.
Edit 1: Previously, I thought I could use a dictionary, but several members pointed out that this cannot be done because of the repeated keys. So I just want to make a list or zip or anything else recommended to pair each letter with its corresponding number.
Use tuples. Split once to get the pairs, then split the second element of each pair and zip them together:
l = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
pairs = [list(zip(a, b.split())) for a, b in (sub.split(None, 1) for sub in l)]
Which would give you:
[[('A', '6'), ('G', '6'), ('C', '35'), ('T', '25'), ('T', '10')], [('A', '7'), ('G', '7'), ('G', '28'), ('G', '29'), ('T', '2')]]
Or using a for loop with list.append:
l = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
out = []
for a, b in (sub.split(None, 1) for sub in l):
    out.append(list(zip(a, b.split())))
If you want to convert any letter to Z where the digit is < 10, you just need another loop where we check the digit in each pairing:
pairs = [[("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())]
         for a, b in (sub.split(None, 1) for sub in l)]
print(pairs)
Which would give you:
[[('Z', '6'), ('Z', '6'), ('C', '35'), ('T', '25'), ('T', '10')], [('Z', '7'), ('Z', '7'), ('G', '28'), ('G', '29'), ('Z', '2')]]
To break it into a regular loop:
pairs = []
for a, b in (sub.split(None, 1) for sub in l):
    pairs.append([("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())])
print(pairs)
[("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())] sets the letter to Z if the corresponding number i is < 10; otherwise it leaves the letter as is.
If you want to get back to the original string form afterwards, you just need to transpose with zip:
In [13]: l = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
In [14]: pairs = [[("Z", i) if int(i) < 10 else (c, i) for c, i in zip(a, b.split())] for a, b in
....: (sub.split(None, 1) for sub in l)]
In [15]: pairs
Out[15]:
[[('Z', '6'), ('Z', '6'), ('C', '35'), ('T', '25'), ('T', '10')],
[('Z', '7'), ('Z', '7'), ('G', '28'), ('G', '29'), ('Z', '2')]]
In [16]: unzipped = [["".join(a), " ".join(b)] for a, b in (zip(*tup) for tup in pairs)]
In [17]: unzipped
Out[17]: [['ZZCTT', '6 6 35 25 10'], ['ZZGGZ', '7 7 28 29 2']]
zip(*...) transposes the pairs back into one tuple of letters and one tuple of numbers; we then just need to join the strings back together. If you want to get all the way back to the original format, you can join again:
In [18]: [" ".join(["".join(a), " ".join(b)]) for a, b in (zip(*tup) for tup in pairs)]
Out[18]: ['ZZCTT 6 6 35 25 10', 'ZZGGZ 7 7 28 29 2']
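As a small standalone illustration of the transposition step, here is a sketch on a single record. The variable names letters, numbers and restored are made up for this example.

```python
pairs = [('Z', '6'), ('Z', '6'), ('C', '35'), ('T', '25'), ('T', '10')]

# zip(*pairs) transposes the list of pairs into one tuple of letters
# and one tuple of numbers.
letters, numbers = zip(*pairs)
print(letters)
print(numbers)

# Joining the pieces rebuilds the single-string form.
restored = "".join(letters) + " " + " ".join(numbers)
print(restored)
```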
If you consider using tuples to pair the items, then this works:
>>> from pprint import pprint
>>> lst = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
>>> new_lst = [list(zip(sub[0], sub[1:])) for sub in [i.split() for i in lst]]
>>> pprint(new_lst)
[[('A', '6'), ('G', '6'), ('C', '35'), ('T', '25'), ('T', '10')],
[('A', '7'), ('G', '7'), ('G', '28'), ('G', '29'), ('T', '2')]]
[i.split() for i in lst]: an initial split on each string.
zip(sub[0], sub[1:]): zips the letters with the list of numbers.
Iterate through the list, split each item into its letters and its numbers, and then build a list of tuples from them:
alphanum = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
list_of_tuple = []
for s in alphanum:
    ints = []
    chars = []
    # Separate the numeric fields from the letter field
    for i in s.split():
        if i.isdigit():
            ints.append(i)
        else:
            chars.append(i)
    # Pair each letter with the number at the same position
    new_tuple = []
    for (n, item) in enumerate(list(chars[0])):
        new_tuple.append((item, ints[n]))
    list_of_tuple.append(new_tuple)
print(list_of_tuple)
This code would work, assuming the elements in the list are correctly formed.
This means the number of letters and numbers must match!
And it will overwrite the value if the key already exists.
lines = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
dictionary = {}
for line in lines:
    split_line = line.split()
    letters = split_line[0]
    iterator = 1
    for letter in letters:
        # A repeated letter overwrites the value stored earlier
        dictionary[letter] = split_line[iterator]
        iterator += 1
print(dictionary)
This modified one will check if the key exists and add it to a list with that key:
lines = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']
dictionary = {}
for line in lines:
    split_line = line.split()
    letters = split_line[0]
    iterator = 1
    for letter in letters:
        if letter in dictionary:
            dictionary[letter].append(split_line[iterator])
        else:
            dictionary[letter] = [split_line[iterator]]
        iterator += 1
print(dictionary)
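The "append to a list per key" pattern can also be sketched with collections.defaultdict, which removes the explicit key check. This is an alternative illustration, not the answer's own code; the names records and grouped are made up.

```python
from collections import defaultdict

records = ['AGCTT 6 6 35 25 10', 'AGGGT 7 7 28 29 2']

# Missing keys start out as an empty list automatically.
grouped = defaultdict(list)
for record in records:
    # First field is the letters, the remaining fields are the numbers.
    letters, *numbers = record.split()
    for letter, number in zip(letters, numbers):
        grouped[letter].append(number)

print(dict(grouped))
```

Note that grouping by letter loses the information about which record and position each number came from, so the list-of-tuples approaches are a better fit when that matters.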
I am new to Python and trying to do the following in Python 3.
I have a text file like this:
1 2 3
4 5 6
7 8 9
.
.
I want this to be converted into a tuple of tuples like this:
((1,2,3),(4,5,6),(7,8,9),...)
I have tried using
f = open('text.txt', 'r')
f.readlines()
but this is giving me a list of individual words.
Could anyone help me with this?
A method using the csv module:
>>> import csv
>>> f = open('a.txt', 'r')
>>> c = csv.reader(f, delimiter='\t')  # Use the delimiter from the file: if it is a single space, use ' ', etc.
>>> l = []
>>> for row in c:
...     l.append(tuple(map(int, row)))
...
>>> l = tuple(l)
>>> l
((1, 2, 3), (4, 5, 6), (7, 8, 9))
If you do not really need tuples, though, it may be better to just leave them as lists.
In the code above, both row and l are initially lists.
You may try this,
>>> s = '''1 2 3
4 5 6
7 8 9'''.splitlines()
>>> tuple(tuple(int(j) for j in i.split()) for i in s)
((1, 2, 3), (4, 5, 6), (7, 8, 9))
For your case,
tuple(tuple(int(j) for j in i.split()) for i in f.readlines())
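Putting the one-liner into a function that opens and closes the file itself might look like the sketch below. The function name read_rows is hypothetical, and a temporary file stands in for text.txt so the example runs on its own; skipping blank lines is an added assumption.

```python
import os
import tempfile


def read_rows(path):
    """Return the file's rows as a tuple of tuples of ints, skipping blank lines."""
    with open(path) as handle:
        return tuple(
            tuple(int(x) for x in line.split())
            for line in handle
            if line.strip()
        )


# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 5 6\n\n7 8 9\n")

rows = read_rows(tmp.name)
os.remove(tmp.name)
print(rows)
```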
-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
I want to iterate through both the years and the fields under each year (e.g. 1, 2, 3, 4, and 5). The letters a, b, and so on that follow some field numbers are subfields.
The lines with dashes in my data indicate the year of the entry. Each record group starts at ---year--- and ends at the line before the next ---year---.
Also, fields is a list:
fields=["1", "2", "3", "4", "5"]
I'm eventually trying to retrieve the values next to the fields for each entry/year. For example, if my current field is 1, which is equivalent to fields[0], I would iterate through all the years (2000, 2001, and 2002) to get the values for field 1. The output would be
17824
3224
(Blank space for Year 2002)
How can I iterate through the years (indicated by the dashes)? I can't seem to think of a code to generate the desired output.
You can first use a regex to split your text into blocks, then use itertools.izip_longest (named zip_longest in Python 3) within a nested list comprehension to get your expected columns, where s is the text shown in the question:
>>> import re
>>> blocks=re.split(r'-+\d+-+',s)
>>> from itertools import izip_longest
>>> z=[list(izip_longest(*[k for k in sub if k])) for sub in izip_longest(*[[j.split() for j in i.split('\n')] for i in blocks])]
[[], [('1', '1', '2'), ('17824', '3224', '2013341524626245.9')], [('2', '2', '3'), ('20131125192004.9', '20w125192004.9', '484914s1969'), (None, None, 'dcu'), (None, None, '000'), (None, None, '0'), (None, None, 'eng')], [('3', '3', '4'), ('690714s1969', '690714s1969', 'a'), ('dcu', 'dcu', '75601809'), ('000', '000', None), ('0', '0', None), ('eng', 'eng', None)], [('4', '5', '4'), ('a', 'a', 'a'), ('75601809', 'WA', 'eng'), (None, '120', None)], [('4', '4'), ('a', 'c'), ('DLC', 'DLC')], [('4', '5'), ('b', 'a'), ('eng', 'WA'), (None, '345')], [('4',), ('c',), ('DLC',)], [('5',), ('a',), ('WA',), ('750',)], []]
Each sublist represents one line position across the blocks; for example, the first sublist holds the first line of each block:
>>> z=[i for i in z if i] # remove the empty lists
>>> z[0]
[('1', '1', '2'), ('17824', '3224', '2013341524626245.9')]
>>> z[0][1]
('17824', '3224', '2013341524626245.9')
So I'm writing a fairly involved answer that uses a helper function, but I think you'll find it pretty flexible. It uses an itertools-style helper function called groupby, which accepts a key function that specifies which group each item belongs to. In your case the key function is a little fancy because it has to maintain state to know which year each line belongs to. The code below is runnable as is. Just copy and paste it into a script and let me know what you think.
EDIT
It turns out the groupby function is already implemented in the itertools module and I had been missing it all along. I edited the code to use the itertools version.
#!/usr/bin/env python
import io
import re
import itertools as it
data = '''-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345'''
def group_year():
    '''
    A stateful closure to group the year blobs together
    '''
    # Hack to update a variable from the closure
    g = [0]

    def closure(e):
        # Each dashed year line starts a new group
        if re.findall(r'-----[0-9]{4}------', e):
            g[0] += 1
        return g[0]
    return closure


if __name__ == "__main__":
    f = io.StringIO(data)
    gy = group_year()
    for k, group in it.groupby(f, key=gy):
        # group is now an iter of lines for each year group in the data
        # Now you can iterate on each group like so:
        for line in group:
            rec = line.strip().split()
            if rec[0] == '1':
                print(rec[1])
        # You could also use nested groupby's at this point to perform
        # further grouping on the different columns or whatever
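For the specific goal of getting one field across all years, including a blank for years where it is missing, a plain dictionary-based sketch may be easier to follow. The function name parse_records and the shortened sample data are made up for illustration; the regex assumes the year is always four digits surrounded by dashes.

```python
import re

# Shortened, hypothetical sample in the same layout as the question's data.
sample = """-------------2000--------------
1 17824
2 20131125192004.9
-------------2001--------------
1 3224
5 a WA 120
-------------2002--------------
2 2013341524626245.9
"""


def parse_records(text):
    """Map each year to {field: [values]}, using the dashed lines as separators."""
    records = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"-+(\d{4})-+$", line.strip())
        if header:
            current = records.setdefault(header.group(1), {})
        elif line.strip() and current is not None:
            # Everything after the first space is kept as the field's value.
            field, _, value = line.strip().partition(" ")
            current.setdefault(field, []).append(value)
    return records


records = parse_records(sample)
# One entry per year for field "1"; an empty string stands in for a missing field.
field_one = [" ".join(records[year].get("1", [])) for year in records]
print(field_one)
```

Keeping a list of values per field allows for repeated fields such as the several "4" lines in the original data.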
I have a file like this:
Ben
0 1 5 2 0 1 0 1
Tim
3 2 1 5 4 0 0 1
I would like to make a dictionary that looks like this:
{Ben: 0 1 5 2 0 1 0 1, Tim : 3 2 1 5 4 0 0 1}
So I was thinking of something like:
for line in file:
    dict[line] = line + 1
but you can't get the next line of a file like that, so how would I go about
doing this?
This might be what you want:
dict_data = {}
with open('data.txt') as f:
    for key in f:
        # key is a name line; next(f) consumes the numbers line after it
        dict_data[key.strip()] = next(f).split()
print(dict_data)
Output:
{'Tim': ['3', '2', '1', '5', '4', '0', '0', '1'], 'Ben': ['0', '1', '5', '2', '0', '1', '0', '1']}
Discussion
The for loop assumes each line it sees is a key; the following line is read in the body of the loop.
key.strip() will turn 'Tim\n' into 'Tim'.
next(f) reads and returns the next line, that is, the line after the key line.
next(f).split() therefore splits that line into a list.
dict_data[key.strip()] = ... will do something like: dict_data['Tim'] = [ ... ]
Update
Thanks to Blckknght for the pointer. I changed f.next() to next(f).
Update 2
If you want to turn the list into a list of integers instead of string, then instead of:
dict_data[key.strip()] = next(f).split()
Do this:
dict_data[key.strip()] = [int(i) for i in next(f).split()]
state = 0
d = {}
for line in file:
    if state == 0:
        key = line.strip()
        state = 1
    elif state == 1:
        d[key] = line.split()
        state = 0
I think the easiest approach is to first load the full file with file.readlines(), which loads the whole file and returns a list of the lines. Then you can create your dictionary with a comprehension:
lines = my_file.readlines()
my_dict = dict(lines[i:i+2] for i in range(0, len(lines), 2))
For your example file, this will give my_dict the contents:
{"Ben\n": "0 1 5 2 0 1 0 1\n", "Tim\n": "3 2 1 5 4 0 0 1\n"}
An alternative approach would be to use a while loop that reads two lines at a time:
my_dict = {}
while True:
    name = file.readline().strip()
    if not name:  # detect the end of the file, where readline returns ""
        break
    numbers = [int(n) for n in file.readline().split()]
    my_dict[name] = numbers
This approach lets you process the lines more easily than the comprehension in the earlier version, for example by stripping newlines and splitting the line of numbers into a list of actual int objects.
The result for the example file would be:
{"Ben": [0, 1, 5, 2, 0, 1, 0, 1], "Tim": [3, 2, 1, 5, 4, 0, 0, 1]}
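Another way to pair alternating lines, sketched here as an assumption-labelled alternative, is to zip the file iterator with itself so that each step consumes two consecutive lines. io.StringIO stands in for the opened file, and the name scores is made up for this example.

```python
import io

# io.StringIO stands in for an open file so the sketch runs without data.txt.
sample = io.StringIO("Ben\n0 1 5 2 0 1 0 1\nTim\n3 2 1 5 4 0 0 1\n")

# zip(sample, sample) pulls two consecutive lines per step from the same iterator:
# the first is the name line, the second is the numbers line.
scores = {
    name.strip(): [int(n) for n in numbers.split()]
    for name, numbers in zip(sample, sample)
}
print(scores)
```

If the file has an odd number of lines, zip silently drops the unpaired final line.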