I have a file of paths called test.txt
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_2_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_2_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_2_Clean.fastq.gz
Notice that the number of lines is even and always even, my final goal is to parse this file and create a new one looping through these paths on a two by two basis. I am trying enumerate function but this will not parse two by two. Furthermore, I'm going out of range because indexing the way I'm doing is wrong. It would also be great if someone could tell me how to index properly with enumerate.
with open('./src/test.txt') as f:
for index,line in enumerate(f):
sample = re.search(r'pfg[\dGT]+',line)
sample_string = sample.group(0)
#print(sample_string)
print('{{"name":"{0}","readgroup":"{0}","platform_unit":"{0}","fastq_1":"{1}","fastq_2":"{2}","library":"{0}"}},'.format(sample_string,line,line[index+1]))
The result is something like this:
{"name":"pfg001G","readgroup":"pfg001G","platform_unit":"pfg001G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_1_Clean.fastq.gz
","fastq_2":"g","library":"pfg001G"},
{"name":"pfg001G","readgroup":"pfg001G","platform_unit":"pfg001G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz
","fastq_2":"r","library":"pfg001G"},
{"name":"pfg001T","readgroup":"pfg001T","platform_unit":"pfg001T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_1_Clean.fastq.gz
","fastq_2":"o","library":"pfg001T"},
{"name":"pfg001T","readgroup":"pfg001T","platform_unit":"pfg001T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_2_Clean.fastq.gz
","fastq_2":"u","library":"pfg001T"},
{"name":"pfg002G","readgroup":"pfg002G","platform_unit":"pfg002G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_1_Clean.fastq.gz
","fastq_2":"p","library":"pfg002G"},
{"name":"pfg002G","readgroup":"pfg002G","platform_unit":"pfg002G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_2_Clean.fastq.gz
","fastq_2":"s","library":"pfg002G"},
{"name":"pfg002T","readgroup":"pfg002T","platform_unit":"pfg002T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_1_Clean.fastq.gz
","fastq_2":"/","library":"pfg002T"},
{"name":"pfg002T","readgroup":"pfg002T","platform_unit":"pfg002T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_2_Clean.fastq.gz","fastq_2":"c","library":"pfg002T"},
Clearly the indexation is wrong since it's going through every element of my path that is g r etc instead of printing the next path. For the first iteration the next path printed should be: "fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz".
I believe the problem itself can be tackled with itertools more elegantly I just don't know how to do it. Would also be great if someone could tell me if an indexation with enumerate could also work.
One problem is that you are trying to access the data from the second line of the pair before you have read it. Additionally you can not access the second line with line[index + 1] because that refers to a character in the current line, not the next line which hasn't yet been read.
So you need to keep track of pairs of lines. You can use the index provided by enumerate() to determine whether the current line is the first (because it is an even number) or the second (because it's odd). Store the name and path for fastq_1 when you read the first line. Only write the output on the second line. Like this:
import re
with open('test.txt') as f:
for index, line in enumerate(f):
if index % 2 == 0: # even, so this is the first line of a pair
name = re.search(r'pfg[\dGT]+',line).group(0)
fastq_1 = line.rstrip()
else: # odd, so second line. Emit result
fastq_2 = line.rstrip()
print('{{"name":"{0}","readgroup":"{0}","platform_unit":"{0}","fastq_1":"{1}","fastq_2":"{2}","library":"{0}"}},'.format(name, fastq_1, fastq_2))
line.rstrip() is required to remove the trailing new line character at the end of each line.
#mhawke already provided a good solution, but to give another approach, "looping through these ... on a two by two basis" can be done with the more_itertools.chunked function from the more_itertools library or with the grouper() recipe from the Python manual.
This also gives options for what should happen when the last line is an odd one; whether that should raise an error or pair it with a default value.
You may want to consider that when you're assigning index to variable, you're getting the index character of that string not the indexation of it.
What you can do is to assign th e file to a list then get the index location so, you can switch between line as you want.
Still don't understand point, do you want to switch between lines in both fastq_1 and fastq_2 or you each path be according to its key?
Code Syntax
with open(path) as f:
lis = list(f)
for index, line in enumerate(lis):
try:
sample = re.search(r'pfg[\dGT]+',line)
sample_string = sample.group(0)
print(f'{{"name":"{sample_string}","readgroup":"{sample_string}","platform_unit":"{sample_string}","fastq_1":"{line}","fastq_2":"{lis[index+1]}","library":"{sample_string}"}},')
except IndexError:
break
Output
{"name":"pfg001G","readgroup":"pfg001G","platform_unit":"pf
g001G","fastq_1":"/groups/cgsd/javed/validation_set/Leung
SY_Targeted_SS-190528-01a/Clean/pfg001G_1_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Ta
rgeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz
","library":"pfg001G"},
{"name":"pfg001G","readgroup":"pfg001G","platform_unit":"pf
g001G","fastq_1":"/groups/cgsd/javed/validation_set/Leung
SY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_1_Clean.fastq.gz
","library":"pfg001G"},
{"name":"pfg001T","readgroup":"pfg001T","platform_unit":"pf
g001T","fastq_1":"/groups/cgsd/javed/validation_set/Leung
SY_Targeted_SS-190528-01a/Clean/pfg001T_1_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_2_Clean.fastq.gz
","library":"pfg001T"},
{"name":"pfg001T","readgroup":"pfg001T","platform_unit":"pf
g001T","fastq_1":"/groups/cgsd/javed/validation_set/Leung
SY_Targeted_SS-190528-01a/Clean/pfg001T_2_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_1_Clean.fastq.gz
","library":"pfg001T"},
{"name":"pfg002G","readgroup":"pfg002G","platform_unit":"pf
g002G","fastq_1":"/groups/cgsd/javed/validation_set/Leung
SY_Targeted_SS-190528-01a/Clean/pfg002G_1_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_2_Clean.fastq.gz
","library":"pfg002G"},
{"name":"pfg002G","readgroup":"pfg002G","platform_unit":"pf
g002G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_2_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_1_Clean.fastq.gz
","library":"pfg002G"},
{"name":"pfg002T","readgroup":"pfg002T","platform_unit":"pfg002T","fastq_1":"/groups/cgsd/javed/validation_set/Leung
SY_Targeted_SS-190528-01a/Clean/pfg002T_1_Clean.fastq.gz
","fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_2_Clean.fastq.gz
","library":"pfg002T"},
[Program finished]
tambola_callout = {}
for line in open("bingo-call-out.txt"):
num, callout = line.split(";")
tambola_callout[num] = callout
don't know what the problem, what can I do?
This line:
num, callout = line.split(';')
is expecting that your call to split will return a list of exactly two elements. Python will error out if you try to unpack too few or too many values during an assignment.
For example, this will return a single element list:
'something'.split(';') # == ['something']
Make sure your string is what you expect it to be.
it simply means that one of the lines in your file bingo-call-out.txt has the format of <all characters here> instead <some characters here>;<some characters here>
What this error means is that in a general scenario, suppose line = abcd;efgh
line.split(";") will return an array of two elements [abcd,efgh] which will be assigned tonum and callout respectively.
Now if there is a line = abcde, so line.split(';') is returning just ['abcde'] which is a single element list, which can be unpacked into 2 variables like your syntax intends to.
I wrote a script for system automation, but I'm getting the error described in the title. My code below is the relevant portion of the script. What is the problem?
import csv
import os
DIR = "C:/Users/Administrator/Desktop/key_list.csv"
def Customer_List(csv):
customer = open(DIR)
for line in customer:
row = []
(row['MEM_ID'],
row['MEM_SQ'],
row['X_AUTH_USER'],
row['X_AUTH_KEY'],
row['X_STORAGE_URL'],
row['ACCESSKEY'],
row['ACCESSKEYID'],
row['ACCESSKEY1'],
row['ACCESSKEYID1'],
row['ACCESSKEY2'],
row['ACCESSKEYID2'])=line.split()
if csv == row['MEM_ID']:
customer.close()
return(row)
else:
print ("Not search for ID")
return([])
id_input = input("Please input the Customer ID(Email): ")
result = Customer_List(id_input)
if result:
print ("iD: " + id['MEM_ID']
For the line
line.split()
What are you splitting on? Looks like a CSV, so try
line.split(',')
Example:
"one,two,three".split() # returns one element ["one,two,three"]
"one,two,three".split(',') # returns three elements ["one", "two", "three"]
As #TigerhawkT3 mentions, it would be better to use the CSV module. Incredibly quick and easy method available here.
The error message is fairly self-explanatory
(a,b,c,d,e) = line.split()
expects line.split() to yield 5 elements, but in your case, it is only yielding 1 element. This could be because the data is not in the format you expect, a rogue malformed line, or maybe an empty line - there's no way to know.
To see what line is causing the issue, you could add some debug statements like this:
if len(line.split()) != 11:
print line
As Martin suggests, you might also be splitting on the wrong delimiter.
Looks like something is wrong with your data, it isn't in the format you are expecting. It could be a new line character or a blank space in the data that is tinkering with your code.
I'm trying to join each word from a .txt list to a string eg{ 'word' + str(1) = 'word1' }
with this code:
def function(item):
for i in range(2):
print item + str(i)
with open('dump/test.txt') as f:
for line in f:
function(str(line))
I'll just use a txt file containing just two words ('this', 'that').
What I get is:
this
0
this
1
that0
that1
What I was expecting:
this0
this1
that0
that1
It works fine if I just use function('this') and function('that') but why doesn't it work with the txt input?
--- Edit:
Solved, thank you!
Problem was caused by
newline characters in the strings received
Solution: see Answers
You should change
print item + str(i)
to
print item.rstrip() + str(i)
This will remove any newline characters in the strings received in function.
A couple of other tips:
A better way to print data is to use the .format method, e.g. in your case:
print '{}{}'.format(item.strip(), i)
This method is very flexible if you have a more complicated task.
All rows read from a file are strings - you don't have to call str() on them.
The first line read by python is this\n and when you append 0 and 1 to this you get this\n0 and this\n1. Whereas in case in the second line you are not having a new line at the end of the file (inferring from what you are getting on printing). So appending is working fine for it.
For removing the \n from the right end of the string you should use rstrip('\n')
print (item.rstrip('\n') + str(i))
In the first "This\n" is there that's why you are getting not formatted output. pass the argument after removing new line char.
I am just two days to Python and also to this Forum. Please bear if my question looks silly. I tried searching the stack overflow, but couldn't correlate the info whats been given.
Please help me on this
>>> import re
>>> file=open("shortcuts","r")
>>> for i in file:
... i=i.split(' ',2)
... if i[1] == '^/giftfordad$':
... print i[1]
...
^/giftfordad$
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
IndexError: list index out of range
I am receiving my desire output but along with list index out of range error.
My file shortcuts contains four columns which are delimited with SPACE.
Also please help me how to find the pattern if the value resides in a variable.
For example
var1=giftcard
How to find ^/var$ in my file. Thanks for help in advance.
From Jon's point, I arrive at this. But still looking for answer. the below gives me an empty result. Need some tweak in regex for [^/ and $]
>>> import re
>>> with open('shortcut','r') as fin:
... lines = (line for line in fin if line.strip() and not line.strip().startswith('#'))
... for line in lines:
... if re.match("^/giftfordad$",line):
... print line
From my shell command, i arrive at the answer easily. Can somebody please write/correct this piece of code achieving the results, I'm looking for. Many thanks
$grep "\^\/giftfordad\\$" shortcut
RewriteRule ^/giftfordad$ /home/sathya/?id=456 [R,L]
In this:
i=i.split(' ',2)
if i[1] == '^/giftfordad$':
It would imply that i is now a list of length 1 (ie, there was no ' ' character to split on).
Also, it looks like you might be trying to use a regular expression and if i[1] == '^/giftfordad$' is not the way Python does those. That comparison would be written as:
if i[1] == '/giftfordad':
However, that's a completely valid string if you're grabbing it from a file of a list of regular expressions ;)
Just seen your example:
If you're processing an .htaccess like file, you'll want to ignore blank lines and presumably commented lines...
with open('shortcuts') as fin:
lines = (line for line in fin if line.strip() and not line.strip().startswith('#'))
for line in lines:
stuff = line.split(' ', 2)
# etc...
I took the line you gave as an example, and here is what I found, hoping that fits your expectations (I understood you wanted to replace ^/tv$ by ^/giftfordad$ in column #2):
>>> s = 'RewriteRule ^/tv$ /home/sathya?id=123 [R=301,L]'
>>> parts = s.split()
>>> parts
['RewriteRule', '^/tv$', '/home/sathya?id=123', '[R=301,L]']
>>> if len(parts) > 1:
part = parts[1]
if not "^/giftfordad$" in part:
print ' '.join([parts[0]] + ["^/giftfordad$"] + parts[2:])
else:
print s
RewriteRule ^/giftfordad$ /home/sathya?id=123 [R=301,L]
The line with join is the most complex: I recreate a list by concatenating:
the 1st column unchanged
the 2nd column replaced by ^/giftfordad$
the rest of the columns unchanged
join is then used to join all these elements as a string.