I have a list of lists of sequences, and a corresponding list of lists of names.
testSequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
testNames = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
I also have a list of all the identifying parts of the names:
taxonNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
I am trying to produce a new list, where each item in the list will correspond to one of the "identifying parts of the names", and the string will be made up of all the sequences for that name.
If a name and sequence does not appear in one of the lists in the lists (i.e. no redFish or blueFish in the first list of testNames) I want to add in a string of hyphens the same length as the sequences in that list. This would give me this output:
['aaaa--AAAAAA', 'cccc--CCCCCC', '----ttTTTTTT', '----ggGGGG']
I have this piece of code to do this.
complete = [''] * len(taxonNames)
for i in range(len(testSequences)):
    for j in range(len(taxonNames)):
        sequenceLength = len(testSequences[i][0])
        for k in range(len(testSequences[i])):
            if taxonNames[j] in testNames[i][k]:
                complete[j].join(testSequences[i][k])
            if taxonNames[j] not in testNames[i][k]:
                hyphenString = "-" * sequenceLength
                complete[j].join(hyphenString)
print complete
"complete" should give my final output as explained above, but it comes out looking like this:
['', '', '', '']
How can I fix my code to give me the correct answer?
The main issue with your code, which makes it very hard to understand, is you're not really leveraging the language elements that make Python so strong.
Here's a solution to your problem that works:
test_sequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
test_names = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']
def get_seqs(taxon_name, sequences_list, names_list):
    for seqs, names in zip(sequences_list, names_list):
        found_seq = None
        for seq, name in zip(seqs, names):
            if taxon_name in name:
                found_seq = seq
                break
        yield found_seq if found_seq else '-' * len(seqs[0])

result = [''.join(get_seqs(taxon_name, test_sequences, test_names))
          for taxon_name in taxon_names]
print(result)
The generator get_seqs pairs up lists from test_sequences and test_names and for each pair, tries to find the sequence (seq) for the name (name) that matches and yields it, or yields a string of the right number of hyphens for that list of sequences.
The generator (a function that yields multiple values) has code that quite literally follows the explanation above.
The result is then simply a matter of, for each taxon_name, getting all the resulting sequences that match in order and joining them together into a string, which is the result = ... line.
You could make it work with list indexing loops and string concatenation, but this is not a PHP question, now is it? :)
Note: for brevity, you could just access the global test_sequences and test_names instead of passing them in as parameters, but I think that would come back to haunt you if you were to actually use this code. Also, I think it makes semantic sense to change the order of names and sequences in the entire example, but I didn't to avoid further deviating from your example.
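As for why the original loop printed four empty strings: str.join returns a new string rather than changing complete[j], and that return value was never stored. Below is a minimal sketch of a loop-based version with that fixed, reusing the test data from the question. The names filler and match are only illustrative, and the sketch assumes each taxon appears at most once per list.

```python
test_sequences = [
    ['aaaa', 'cccc'],
    ['tt', 'gg'],
    ['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
test_names = [
    ['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
    ['>xx_redFish |zxx', '>xx_blueFish |zxx'],
    ['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']

complete = [''] * len(taxon_names)
for seqs, names in zip(test_sequences, test_names):
    # Hyphen filler as long as the first sequence in this list.
    filler = '-' * len(seqs[0])
    for j, taxon in enumerate(taxon_names):
        # First sequence whose name contains the taxon, else the filler.
        match = next((s for s, n in zip(seqs, names) if taxon in n), filler)
        # Strings are immutable, so the result has to be assigned back.
        complete[j] += match

print(complete)
```

The key change is `complete[j] += match`: concatenation builds a new string, and the assignment stores it.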
Here is a solution that may do what you want. It begins, not with your data structures from this post, but with the three example files from your previous post (which you used to build this post's data structures).
The only thing I couldn't figure out was how many hyphens to use for a missing sequence from a file.
differentNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
files = ['f1.txt', 'f2.txt', 'f3.txt']
data = [[] for _ in range(len(differentNames))]
final = []
for file in files:
    d = dict()
    with open(file, 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):  # for ex., >xx_oneFish |xxx
                underscore = line.index('_')
                space = line.index(' ')
                key = line[underscore+1:space]
            else:
                d[key] = line
    for i, key in enumerate(differentNames):
        data[i].append(d.get(key, '-' * 4))
for array in data:
    final.append(''.join(array))
print(final)
Prints:
['AAAAAAAaaaa----', 'CCCCCCcccc----', 'TTTTTT----tt', 'GGGGGG----gg']
I am trying to match elements from two lists and write them to a file: match rows on the first column (col[0]) of both files and print certain columns into a new file.
with open('~/gf_out.txt', 'w') as w:
    w.write('\t'.join(headers) + '\n')
    for i in d1:  # list1
        for j in d2:  # list2
            if i[0] == j[0]:
                out = ((j[0:10]),i[1],i[2],j[11],j[12])
                # print out
                w.write('\t'.join(out) + '\n')
TypeError: sequence item 0: expected string, list found
If out is changed to
out = (str(j[0:10]),i[1],i[2],j[11],j[12])
the final output has [ ] around the first 10 columns. How can this be fixed?
ANALYSIS
Your problem is right where the error message (certainly) told you, and just what it described ... once you're comfortable enough with Python to interpret the description.
out = ((j[0:10]),i[1],i[2],j[11],j[12])
w.write('\t'.join(out) + '\n')
join operates on a sequence of strings. You gave it a sequence, but the first element of that sequence is the list j[0:10]; the parentheses around it do not change that.
REMEDY
You have nested lists, so you need nested joins.
sep = '\t'  # separator
out_0 = sep.join(j[0:10])  # join the first ten columns into one string
out_line = sep.join([out_0, i[1], i[2], j[11], j[12]])  # join takes a single iterable
w.write(out_line + '\n')
Yes, you can recombine this to a single-line write; I broke it down to make the logic clear.
If this doesn't match your needs, then please provide the required MCVE to clarify the problems.
What exactly do you want it to do? j[0:10] is a list, so if you convert it to a string with str(), it will have square brackets. If you want those elements to be joined by tabs as well, you need to either do that explicitly or concatenate the slice with the other fields instead of embedding it.
out = ('\t'.join(j[0:10]),i[1],i[2],j[11],j[12])
or
out = j[0:10] + [i[1],i[2],j[11],j[12]]
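To make the two options concrete, here is a small sketch with made-up rows standing in for one matching pair from d1 and d2 (the values are hypothetical; only the shapes matter). Both variants produce the same tab-separated line.

```python
# Hypothetical rows: i comes from list1, j from list2 (13 columns, j[0]..j[12]).
i = ['id1', 'x', 'y']
j = ['id1'] + ['c%d' % n for n in range(1, 13)]

# Option 1: join the slice first, then join it with the remaining fields.
out_nested = '\t'.join(['\t'.join(j[0:10]), i[1], i[2], j[11], j[12]])

# Option 2: build one flat list of strings and join once.
out_flat = '\t'.join(j[0:10] + [i[1], i[2], j[11], j[12]])

print(out_flat)
print(out_nested == out_flat)
```

The flat version is usually easier to read, since every element handed to join is visibly a string.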
We have a homework that I have a serious problem on.
The key is to turn each line into a tuple and collect these tuples in a list,
like list=[tuple(line1),tuple(line2),tuple(line3),...].
Besides, each line contains many strings separated by commas, like "aei","1433","lincoln",...
Here is the question:
A book can be represented as a tuple of the author's lastName, the author's firstName, the title, date, and ISBN.
Write a function, readBook(), that, given a comma-separated string containing this information, returns a tuple representing the book.
Write a function, readBooks(), that, given the name of a text file containing one comma-separated line per book, uses readBook() to return a list of tuples, each of which describes one book.
Write a function, buildIndex(), that, given a list of books as returned by readBooks(), builds a map from key word to book title. A key word is any word in a book's title, except "a", "an", or "the".
Here is my code:
RC=("Chann", "Robbbin", "Pride and Prejudice", "2013", "19960418")
RB=("Benjamin","Franklin","The Death of a Robin Thickle", "1725","4637284")
def readBook(lastName, firstName, booktitle, date, isbn):
    booktuple=(lastName, firstName, booktitle, date, isbn)
    return booktuple
# print readBook("Chen", "Robert", "Pride and Prejudice", "2013", "19960418")
def readBooks(file1):
    inputFile = open(file1, "r")
    lines = inputFile.readlines()
    book = (lines)
    inputFile.close()
    return book
print readBooks("book.txt")
BooklistR=[RC,RB]
def buildIndex(file2):
    inputFile= open("book.txt","r")
    Blist = inputFile.readlines()
    dictbooks={}
    for bookinfo in Blist:
        title=bookinfo[2].split()
        for infos in title:
            if infos.upper()=="A":
                title.remove(infos)
            elif infos.upper()=="THE":
                title.remove(infos)
            elif infos.upper()=="AN":
                title.remove(infos)
            else:
                pass
        dictbooks[tuple(title)]= bookinfo[2]
    return dictbooks
print buildIndex("book.txt")
#Queries#
def lookupKeyword(keywords):
    dictbooks=buildIndex(BooklistR)
    keys=dictbooks.viewkeys()
    values=dictbooks.viewvalues()
    for keybook in list(keys):
        for keyw in keywords:
            for keyk in keybook:
                if keyw== keyk:
                    printoo= dictbooks[keybook]
                else:
                    pass
    return printoo
print lookupKeyword("Robin")
What's wrong with something like this?:
with open(someFile) as inputFile:
    myListofTuples = [tuple(line.split(',')) for line in inputFile.readlines()]
[Explanation added based on Robert's comment]
The first line opens the file in a with statement. Python with statements are a fairly new feature and rather advanced. They set up a context in which code executes with certain guarantees about how clean-up and finalization code will be executed as the Python engine exits that context (whether by completing the work or encountering an un-handled exception).
You can read about the ugly details at: Python Docs: Context Managers but the gist of it all is that we're opening someFile with a guarantee that it'll be closed properly after the execution of the code leaves that context (the suite of statements after the with statement). That'll be done even if we encounter some error or if our code inside that suite raises some exception that we fail to catch.
In this case we use the as clause to give us a local name by which we can refer to the opened file object. (The filename is just a string, passed as an argument to the open() built-in function ... the object returned by that function needs to have a name by which we can refer to it. This is similar to how a for i in whatever statement binds each item in whatever to the name i for each iteration through the loop.)
The suite of our with statement (that's the set of indented statements which is run within the context of the context manager) consists of a single statement ... a list comprehension which is bound to the name myListofTuples.
A list comprehension is another fairly advanced programming concept. There are a number of very high level languages which implement them in various ways. In the case of Python they date back to much earlier versions than the with statement: list comprehensions were introduced in Python 2.0, while the with statement arrived in 2.5.
Consequently, list comprehensions are fairly common in Python code while with statements are only slowly being adopted.
A list literal in Python looks like: [something, another_thing, etc, ...]. A list comprehension is similar but replaces the list of item literals with an expression, a line of code, which evaluates into a list. For example: [x*x for x in range(100) if x % 2] is a list comprehension which evaluates into a list of integers which are the squares of odd integers between 1 and 99. (Notice the absence of commas in the list comprehension. An expression takes the place of the comma delimited sequence which would have been used in a list literal).
In my example I'm using for line in inputFile.readlines() as the core of the expression and I'm splitting each of those on the comma (line.split(',')) and then converting the resulting list into a tuple().
This is just a very concise way of saying:
myListofTuples = list()
for line in inputFile.readlines():
    myListofTuples.append(tuple(line.split(',')))
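A self-contained sketch of the same idea is below. It writes a small, made-up data file first so it can be run as-is (the file name and contents are assumptions). It also strips the trailing newline from each line, which the one-liner above leaves attached to the last field.

```python
import os
import tempfile

# Write a small hypothetical data file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'books.txt')
with open(path, 'w') as out:
    out.write('Austen,Jane,Pride and Prejudice,1811,123456789012X\n')
    out.write('Austen,Jane,Sense and Sensibility,1813,21234567892\n')

with open(path) as inputFile:
    # rstrip('\n') keeps the newline out of the last field of each tuple.
    myListofTuples = [tuple(line.rstrip('\n').split(','))
                      for line in inputFile]

# The file is already closed here, even though close() was never called.
print(inputFile.closed)
print(myListofTuples[0])
```

Iterating over the file object directly yields the same lines as readlines() without building the intermediate list.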
One possible program:
import fileinput
def readBook(text):
    # Drop the trailing newline, split on commas, keep the five fields as a tuple.
    l = text.rstrip('\n').split(',')
    t = tuple(l[0:5])
    return t
#b = readBook("First,Last,Title,2013,ISBN")
#print b
def readBooks(file):
    l = []
    for line in fileinput.input(file):
        t = readBook(line)
        # print t
        l.append(t)
    return l
books = readBooks("data")
#for t in books:
#    for f in t:
#        print f
def buildIndex(books):
    # Map each key word to the list of titles containing it.
    i = {}
    for b in books:
        for w in b[2].split():
            if w.lower() not in ('a', 'an', 'the'):
                if w not in i:
                    i[w] = []
                i[w].append(b[2])
    return i
index = buildIndex(books)
for w in sorted(index):
    print "Word: ", w
    for t in index[w]:
        print "Title: ", t
Sample data file (called "data" in the code):
Austen,Jane,Pride and Prejudice,1811,123456789012X
Austen,Jane,Sense and Sensibility,1813,21234567892
Rice-Burroughs,Edgar,Tarzan and the Apes,1911,302912341234X
Sample output:
Word: Apes
Title: Tarzan and the Apes
Word: Prejudice
Title: Pride and Prejudice
Word: Pride
Title: Pride and Prejudice
Word: Sense
Title: Sense and Sensibility
Word: Sensibility
Title: Sense and Sensibility
Word: Tarzan
Title: Tarzan and the Apes
Word: and
Title: Pride and Prejudice
Title: Sense and Sensibility
Title: Tarzan and the Apes
Note that the data format can't support book titles such as "The Lion, The Witch, and the Wardrobe" because of the embedded commas. If the file was in CSV format with quotes around the strings, then it could manage that.
I'm not sure that's perfectly minimally Pythonic code (not at all sure), but it does seem to match the requirements.
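If the data file did use quoted CSV, the standard csv module could take over the splitting. The following is only a sketch of that variant: the function name read_books, the in-memory sample line, and the placeholder ISBN are all made up for illustration.

```python
import csv
import io

def read_books(lines):
    """Return a list of 5-field tuples, one per non-empty CSV row."""
    return [tuple(row[:5]) for row in csv.reader(lines) if row]

# Hypothetical sample; the title is quoted because it contains a comma.
sample = io.StringIO(
    'Lewis,C. S.,"The Lion, the Witch and the Wardrobe",1950,0000000000\n'
    'Austen,Jane,Pride and Prejudice,1811,123456789012X\n'
)

books = read_books(sample)
for book in books:
    print(book[2])
```

csv.reader accepts any iterable of lines, so an open file object can be passed in place of the StringIO.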
I have an assessment to do, and here's my code so far:
number1 = input("Number1? ")
number2 = input("Number2? ")
packages = csv.reader(open('lol.txt', newline='\n'), delimiter=',')
for PackName,PriceAdultString,PriceChildString in packages:
    n += 1
    PriceAdult = float(PriceAdultString)
    PriceChild = float(PriceChildString)
    print("%i. %17s - $%4.2d / $%4.2d" % (n, PackName, PriceAdult, PriceChild))
NameChoice = input("Which name would you like? Choose using a number: ")
The lol.txt used by csv.reader consists of the following:
herp,123,456
derp,321,654
hurr,213,546
Now, I need to be able to use NameChoice to retrieve a row from the file, and use the data within as name, number1, and number2, so for NameChoice == 1, name = herp, number1 = 123 and number2 = 456, and the numbers must be floating point numbers.
I'm having trouble figuring this out, and could use some guidance if that's possible.
Thanks all.
Before it's asked, I realised I forgot to mention: I have googled, and trawled through the Python guides, and my textbooks. I'm not entirely convinced I know what I'm looking for, though.
Run into a new problem:
I need to be able to take CSV text with '\n\n' instead of '\n', so the text is more like the following:
herp,123,456

derp,321,654

hurr,213,546
My (very slightly adjusted) version of the code Li-Aung used:
import csv
with open ("lol.txt",'rt', newline = '\n\n') as f:
    csv_r = csv.reader (f)
    entries = [ (name, float(p1), float(p2)) for name, p1, p2 in csv_r]
for index, entry in enumerate(entries):
    print ("%2i. %-10s %5.2f %5.2f" % (index, entry[0], entry[1], entry[2]))
choice = int(input("Choose a number: "))
print (entries[choice])
Which returns the exception:
Traceback (most recent call last):
File "C:/Python32/eh", line 2, in <module>
with open ("lol.txt",'rt', newline = '\n\n') as f:
ValueError: illegal newline value:
Now, the debug is pretty clear - '\n\n' is not acceptable as a newline specifier, but I was wondering if there is a way around this?
Note: Screwed up the previous edit, the debug from the code with " newline = '\n'" would have been:
Traceback (most recent call last):
File "C:/Python32/eh", line 4, in <module>
entries = [ (name, float(p1), float(p2)) for name, p1, p2 in csv_r]
File "C:/Python32/eh", line 4, in <listcomp>
entries = [ (name, float(p1), float(p2)) for name, p1, p2 in csv_r]
ValueError: need more than 0 values to unpack
Which is because it treated the blank space with 0 values between each useful row as a row, as it was told to do, and there was nothing in there.
@mata has the right of it, but I feel the need to elaborate on their answer more than I can in a comment.
Since you need to refer back to your data instead of just printing it, it makes sense to have it stick around somehow. Once you reach that point, the biggest thing you need to worry about is what kind of data structure to use - if your chosen data structure is a close match to how you want to use the data, the code becomes quite simple.
So, your logic will look like:
Parse the data into some kind of data structure
Walk this data structure to print out the menu
Get the user input, and use it to select the right bit of data
Since the user input is a number representing how far down the file the data is, a list is an obvious choice. If you were using one of the existing fields as a lookup, a dict would serve you better.
If you do:
data = list(csv.reader(open('lol.txt', newline='\n'), delimiter=','))
then you can walk it to print the menu the same way you currently use the file, except that the data sticks around, and the number you get from the user is directly meaningful as an index.
You might prefer to store the numbers as number types than strings; it would make good sense to, but figuring out how to adjust the code above to achieve it is left as an exercise for the reader. :-)
Store the entire file to a list:
import csv
with open ("lol.txt",'rb') as f:
    csv_r = csv.reader (f)
    entries = [ (name, float(p1), float(p2)) for name, p1, p2 in csv_r]
for index, entry in enumerate(entries):
    print ("%2i. %-10s %5.2f %5.2f" % (index, entry[0], entry[1], entry[2]))
choice = int(raw_input("Choose a number: "))
print (entries[choice])
Output:
0. herp 123.00 456.00
1. derp 321.00 654.00
2. hurr 213.00 546.00
Choose a number: 0
('herp', 123.0, 456.0)
>>>
You could, for example, store the input in a list instead of just printing it out.
After that it's trivial to get the right tuple from the list and assign it to your variables...
with open ("lol.txt",'rt', newline = '\n\n') as f:
change it to '\n'
with open ("lol.txt",'rt', newline = '\n') as f:
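With newline = '\n' the blank lines are still there, and csv.reader returns an empty list for each of them, which is what triggers the "need more than 0 values to unpack" error. One possible way around it is to skip the empty rows before unpacking. This is a sketch only: the data is held in an in-memory string standing in for lol.txt.

```python
import csv
import io

# Stand-in for lol.txt, with a blank line between the useful rows.
raw = "herp,123,456\n\nderp,321,654\n\nhurr,213,546\n"

f = io.StringIO(raw)
# csv.reader yields [] for a blank line; "if row" filters those out.
rows = [row for row in csv.reader(f) if row]
entries = [(name, float(p1), float(p2)) for name, p1, p2 in rows]

for index, entry in enumerate(entries):
    print("%2i. %-10s %5.2f %5.2f" % (index, entry[0], entry[1], entry[2]))
```

With a real file, the csv documentation recommends opening it with newline='' and passing the file object to csv.reader in the same way.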
I'm a newbie to Python and I'm looking at using it to write some hairy EDI stuff that our supplier requires.
Basically they need an 80-character fixed width text file, with certain "chunks" of the field with data and others left blank. I have the documentation so I know what the length of each "chunk" is. The response that I get back is easier to parse since it will already have data and I can use Python's "slices" to extract what I need, but I can't assign to a slice - I tried that already because it sounded like a good solution, and it didn't work since Python strings are immutable :)
Like I said I'm really a newbie to Python but I'm excited about learning it :) How would I go about doing this? Ideally I'd want to be able to say that range 10-20 is equal to "Foo" and have it be the string "Foo" with 7 additional whitespace characters (assuming said field has a length of 10) and have that be a part of the larger 80-character field, but I'm not sure how to do what I'm thinking.
You don't need to assign to slices, just build the string using % formatting.
An example with a fixed format for 3 data items:
>>> fmt="%4s%10s%10s"
>>> fmt % (1,"ONE",2)
'   1       ONE         2'
>>>
Same thing, field width supplied with the data:
>>> fmt2 = "%*s%*s%*s"
>>> fmt2 % (4,1, 10,"ONE", 10,2)
'   1       ONE         2'
>>>
Separating data and field widths, and using zip() and str.join() tricks:
>>> widths=(4,10,10)
>>> items=(1,"ONE",2)
>>> "".join("%*s" % i for i in zip(widths, items))
'   1       ONE         2'
>>>
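The examples above right-justify. For the case in the question ("Foo" followed by 7 spaces in columns 10-20 of an 80-character line), a left-justified variant might look like the sketch below. The three-field layout is invented for illustration; the real widths would come from the supplier's documentation.

```python
# Hypothetical layout: three fields that add up to 80 characters.
widths = (10, 10, 60)
items = ("", "Foo", "")

# "%-*.*s" left-justifies ("-"), pads to the given width (first "*"),
# and truncates anything longer than that width (".*" precision).
line = "".join("%-*.*s" % (w, w, item) for w, item in zip(widths, items))

print(len(line))
print(repr(line[10:20]))
```

The precision part is optional, but without it an over-long value would silently push the following fields out of position.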
Hopefully I understand what you're looking for: some way to conveniently identify each part of the line by a simple variable, but output it padded to the correct width?
The snippet below may give you what you want
class FixWidthFieldLine(object):

    fields = (('foo', 10),
              ('bar', 30),
              ('ooga', 30),
              ('booga', 10))

    def __init__(self):
        self.foo = ''
        self.bar = ''
        self.ooga = ''
        self.booga = ''

    def __str__(self):
        return ''.join([getattr(self, field_name).ljust(width)
                        for field_name, width in self.fields])

f = FixWidthFieldLine()
f.foo = 'hi'
f.bar = 'joe'
f.ooga = 'howya'
f.booga = 'doin?'
print f
This yields:
hi        joe                           howya                         doin?
It works by storing a class-level variable, fields which records the order in which each field should appear in the output, together with the number of columns that field should have. There are correspondingly-named instance variables in the __init__ that are set to an empty string initially.
The __str__ method outputs these values as a string. It uses a list comprehension over the class-level fields attribute, looking up the instance value for each field by name, and then left-justifying its output according to the columns. The resulting list of fields is then joined together by an empty string.
Note this doesn't parse input, though you could easily override the constructor to take a string and parse the columns according to the field and field widths in fields. It also doesn't check for instance values that are longer than their allotted width.
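A rough sketch of the parsing direction mentioned above, written as a standalone function rather than a constructor override. The function name parse_line is an assumption; it reuses the same fields layout and simply walks the line by field width.

```python
fields = (('foo', 10),
          ('bar', 30),
          ('ooga', 30),
          ('booga', 10))

def parse_line(line, fields=fields):
    """Slice a fixed-width line into a dict of field name -> value."""
    values = {}
    start = 0
    for name, width in fields:
        # Take this field's columns and drop the padding on the right.
        values[name] = line[start:start + width].rstrip()
        start += width
    return values

line = 'hi'.ljust(10) + 'joe'.ljust(30) + 'howya'.ljust(30) + 'doin?'.ljust(10)
parsed = parse_line(line)
print(parsed)
```

The same loop could live in __init__, setting each value with setattr(self, name, ...) instead of filling a dict.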
You can use justify functions to left-justify, right-justify and center a string in a field of given width.
'hi'.ljust(10) -> 'hi        '
I know this thread is quite old, but we use a library called django-copybook. It has nothing to do with django (anymore). We use it to go between fixed width cobol files and python. You create a class to define your fixed width record layout and can easily move between typed python objects and fixed width files:
USAGE:
class Person(Record):
    first_name = fields.StringField(length=20)
    last_name = fields.StringField(length=30)
    siblings = fields.IntegerField(length=2)
    birth_date = fields.DateField(length=10, format="%Y-%m-%d")
>>> fixedwidth_record = 'Joe                 Smith                         031982-09-11'
>>> person = Person.from_record(fixedwidth_record)
>>> person.first_name
'Joe'
>>> person.last_name
'Smith'
>>> person.siblings
3
>>> person.birth_date
datetime.date(1982, 9, 11)
It can also handle situations similar to Cobol's OCCURS functionality, such as when a particular section is repeated X times.
I used Jarret Hardie's example and modified it slightly. This allows for selection of type of text alignment(left, right or centered.)
class FixedWidthFieldLine(object):
    def __init__(self, fields, justify = 'L'):
        """ Returns line from list containing tuples of field values and lengths. Accepts
        justification parameter.
        FixedWidthFieldLine(fields[, justify])
        fields = [(value, fieldLength)[, ...]]
        """
        self.fields = fields
        if (justify in ('L','C','R')):
            self.justify = justify
        else:
            self.justify = 'L'

    def __str__(self):
        if(self.justify == 'L'):
            return ''.join([field[0].ljust(field[1]) for field in self.fields])
        elif(self.justify == 'R'):
            return ''.join([field[0].rjust(field[1]) for field in self.fields])
        elif(self.justify == 'C'):
            return ''.join([field[0].center(field[1]) for field in self.fields])

fieldTest = [('Alex', 10),
             ('Programmer', 20),
             ('Salem, OR', 15)]
f = FixedWidthFieldLine(fieldTest)
print f
f = FixedWidthFieldLine(fieldTest,'R')
print f
Returns:
Alex      Programmer          Salem, OR
      Alex          Programmer      Salem, OR
It's a little difficult to parse your question, but I'm gathering that you are receiving a file or file-like-object, reading it, and replacing some of the values with some business logic results. Is this correct?
The simplest way to overcome string immutability is to write a new string:
# Won't work:
test_string[3:6] = "foo"
# Will work:
test_string = test_string[:3] + "foo" + test_string[6:]
Having said that, it sounds like it's important to you that you do something with this string, but I'm not sure exactly what that is. Are you writing it back to an output file, trying to edit a file in place, or something else? I bring this up because the act of creating a new string (which happens to have the same variable name as the old string) should emphasize the necessity of performing an explicit write operation after the transformation.
You can convert the string to a list and do the slice manipulation.
>>> text = list("some text")
>>> text[0:4] = list("fine")
>>> text
['f', 'i', 'n', 'e', ' ', 't', 'e', 'x', 't']
>>> text[0:4] = list("all")
>>> text
['a', 'l', 'l', ' ', 't', 'e', 'x', 't']
>>> "".join(text)
'all text'
It is easy to write a function to "modify" a string.
def change(string, start, end, what):
    length = end - start
    if len(what)<length: what = what + " "*(length-len(what))
    return string[0:start]+what[0:length]+string[end:]
Usage:
test_string = 'This is test string'
print test_string[5:7]
# is
test_string = change(test_string, 5, 7, 'IS')
# This IS test string
test_string = change(test_string, 8, 12, 'X')
# This IS X    string
test_string = change(test_string, 8, 12, 'XXXXXXXXXXXX')
# This IS XXXX string