f = open('students.csv', 'r')
a = f.readline()
length = len(a.split(","))
fw = open('output.csv', 'w')
lst = []
while a:
    lst.append(a)
    a = f.readline()
for counter in range(length):
    for item in lst:
        x = len(item.split(","))
        if x == length:
            x = item.split(",")
            # here I want an if condition to check whether this is the last element of the row, and add ","?
            fw.write(x[counter].split("\n")[0] + ",")
            # elif the condition that it is the last element of each row, to not add ","?
    fw.write("\n")
fw.close()
f.close()
join will be your friend here, if you cannot use the csv module:
for counter in range(length):
    fw.write(','.join(x[counter] for x in (item.split(',') for item in lst)))
    fw.write('\n')
But you should first strip the end of line characters:
a = f.readline().strip()
length = len(a.split(","))
fw = open('output.csv', 'w')
lst = []
while a:
    lst.append(a)
    a = f.readline().strip()
But your code is neither Pythonic nor efficient.
You split the same string in every iteration of counter, when you could have split it once at read time. Next, the Pythonic way to iterate over the lines of a text file is to iterate the file object itself. And finally, the with statement ensures that the files will be properly closed at the end of the block. Your code could become:
with open('students.csv', 'r') as f, open('output.csv', 'w') as fw:
    lst = [a.strip().split(',') for a in f]
    length = len(lst[0])
    for counter in range(length):
        fw.write(','.join(item[counter] for item in lst))
        fw.write('\n')
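If the csv module turns out to be allowed after all, the same transposition can be written with csv plus zip. This is just a sketch, assuming every row has the same number of fields and the same file names as above:

import csv

with open('students.csv', newline='') as f, open('output.csv', 'w', newline='') as fw:
    rows = list(csv.reader(f))
    # zip(*rows) pairs up the n-th field of every row, i.e. it transposes the table
    csv.writer(fw).writerows(zip(*rows))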
I want to separate list data into two parts based on a condition: if the value is less than "H1000" it should go into a first dataframe (output for list 1), and if it is greater than or equal to "H1000" it should go into a second dataframe (output for list 2). The first column starts the value with H followed by four digits.
Here is my Python code:
with open(fn) as f:
    text = f.read().strip()
print(text)
lines = [[(Path(fn.name), line_no + 1, col_no + 1, cell)
          for col_no, cell in enumerate(re.split('\t', l.strip())) if cell != '']
         for line_no, l in enumerate(re.split(r'[\r\n]+', text))]
print(lines)
if (lines[:][:][3] == "H1000"):
    list1
    list2
I am not able to write the Python logic to divide the list data into two parts. Python code and file are attached here.
So basically you want to check whether the number after the H is greater than 1000 or not, right? If so, then just do it like this:
with open(fn) as f:
    text = f.read().strip()
print(text)
lines = [[(Path(fn.name), line_no + 1, col_no + 1, cell)
          for col_no, cell in enumerate(re.split('\t', l.strip())) if cell != '']
         for line_no, l in enumerate(re.split(r'[\r\n]+', text))]
print(lines)
value = lines[:][:][3]
if value[1:].isdigit():
    if int(value[1:]) < 1000:
        pass  # list 1
    else:
        pass  # list 2
We simply take the numerical part of the "Hxxxx" value with a slice, convert it to an integer, and compare it with 1000.
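A minimal sketch of that idea applied row by row, assuming lines keeps the (path, line_no, col_no, cell) structure built above and that the Hxxxx code is the first cell of each row; the variable names here are only illustrative:

list1, list2 = [], []
for row in lines:
    code = row[0][3]  # cell text of the first column, e.g. "H0999"
    if code[1:].isdigit() and int(code[1:]) < 1000:
        list1.append(row)
    else:
        list2.append(row)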
with open(fn) as f:
    text = f.read().strip()
lines = text.split('\n')
list1 = []
list2 = []
for i in lines:
    if int(i.split(' ')[0].replace("H", "")) >= 1000:
        list2.append(i)
    else:
        list1.append(i)
print(list1)
print("***************************************")
print(list2)
I'm not sure exactly where the problem lies. Assuming you read the above text file line by line, you can simply make use of str.__le__ to check your condition, e.g.
lines = """
H0002 Version 3
H0003 Date_generated 5-Aug-81
H0004 Reporting_period_end_date 09-Jun-99
H0005 State WAA
H0999 Tene_no/Combined_rept_no E79/38975
H1001 Tene_holder Magnetic Resources NL
""".strip().split("\n")
# Or
# with open(fn) as f: lines = f.readlines()
list_1, list_2 = [], []
for line in lines:
    if line[:6] <= "H1000":
        list_1.append(line)
    else:
        list_2.append(line)
print(list_1, list_2, sep="\n")
# ['H0002 Version 3', 'H0003 Date_generated 5-Aug-81', 'H0004 Reporting_period_end_date 09-Jun-99', 'H0005 State WAA', 'H0999 Tene_no/Combined_rept_no E79/38975']
# ['H1001 Tene_holder Magnetic Resources NL']
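In case the comparison looks magic: it is plain lexicographic string ordering, which you can check quickly (the trailing spaces come from slicing six characters of each line):

print("H0999 " <= "H1000")  # True  -> list_1
print("H1000 " <= "H1000")  # False -> list_2 (a longer string with an equal prefix sorts after)
print("H1001 " <= "H1000")  # False -> list_2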
I want to get all oneKmers that don't exist in in.txt.
in.txt is sorted in the same order as oneKmer at column 0.
It should be doable in O(N) instead of O(N^2) since both lists are in the same order.
How can I write this?
import csv
import itertools

tsvfile = open('in.txt', "r")
tsvreader = csv.reader(tsvfile, delimiter=" ")
for i in itertools.product('ACTG', repeat=18):
    oneKmer = ''.join(i)
    flag = 1
    with open(InFile) as tsvfile:
        tsvreader = csv.reader(tsvfile, delimiter=" ")
        for line in tsvreader:
            if line[0] == oneKmer:
                flag = 0
                break
    if flag:
        print(oneKmer)
in.txt:
AAAAAAAAAAAAAAAAAA 1400100
AAAAAAAAAAAAAAAAAC 37055
AAAAAAAAAAAAAAAAAT 70686
AAAAAAAAAAAAAAAAAG 192363
AAAAAAAAAAAAAAAACA 20042
AAAAAAAAAAAAAAAACC 12965
AAAAAAAAAAAAAAAACT 10596
AAAAAAAAAAAAAAAACG 1732
AAAAAAAAAAAAAAAATA 16440
AAAAAAAAAAAAAAAATC 18461
...
The whole in.txt file is 38,569,002,592 bytes with 1,836,020,688 lines.
The expected result should be (4^18 - 1,836,020,688) lines of strings. Of course I will further filter them later in the script.
For an easy example, say I want to print the integers <16 that don't exist in a given sorted list [3,5,6,8,10,11]. The result should be [1,2,4,7,9,12,13,14,15]. The given list is huge, so I want to read it one element at a time. So when I read 3, I know I can print out 1 and 2. Then skip 3, and read the next 5, now I can print out 4 and skip 5.
A few solutions, all processing the supersequence and the subsequence in parallel, taking linear time and constant memory.
Using your easy example:
full = iter(range(1, 16))
skip = iter([3,5,6,8,10,11])
Solution 0: (the one I came up with last, but should've done first)
s = next(skip, None)
for x in full:
    if x == s:
        s = next(skip, None)
    else:
        print(x)
Solution 1:
from heapq import merge
from itertools import groupby

for x, g in groupby(merge(full, skip)):
    if len(list(g)) == 1:
        print(x)
Solution 2:
for s in skip:
    # two-argument iter(): keep calling full.__next__ until the sentinel value s is returned
    for x in iter(full.__next__, s):
        print(x)
for x in full:
    print(x)
Solution 3:
from functools import partial

until = partial(iter, full.__next__)
for s in skip:
    for x in until(s):
        print(x)
for x in full:
    print(x)
Solution 4:
from itertools import takewhile

for s in skip:
    for x in takewhile(s.__ne__, full):
        print(x)
for x in full:
    print(x)
Output of all solutions:
1
2
4
7
9
12
13
14
15
Solution 0 for your actual problem:
import csv
import itertools

with open('in.txt') as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter=' ')
    skip = next(tsvreader, [None])[0]
    for i in itertools.product('ACTG', repeat=18):
        oneKmer = ''.join(i)
        if oneKmer == skip:
            skip = next(tsvreader, [None])[0]
        else:
            print(oneKmer)
Slight variation:
import csv
from itertools import product
from operator import itemgetter

with open('in.txt') as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter=' ')
    skips = map(itemgetter(0), tsvreader)
    skip = next(skips, None)
    for oneKmer in map(''.join, product('ACTG', repeat=18)):
        if oneKmer == skip:
            skip = next(skips, None)
        else:
            print(oneKmer)
First, opening the file many times is slow, so the ACTG loop must go inside a single pass over the file rather than reopening it for every k-mer. Second, stdout is slower than you think, so instead of print(oneKmer), write the output to a file directly. Both changes should improve the speed.
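A minimal sketch combining that advice with Solution 0 above; the output file name missing_kmers.txt is just an assumption:

import csv
import itertools

with open('in.txt') as tsvfile, open('missing_kmers.txt', 'w') as out:
    tsvreader = csv.reader(tsvfile, delimiter=' ')
    skip = next(tsvreader, [None])[0]
    for i in itertools.product('ACTG', repeat=18):
        oneKmer = ''.join(i)
        if oneKmer == skip:
            skip = next(tsvreader, [None])[0]  # advance the sorted skip list
        else:
            out.write(oneKmer + '\n')          # write to a file instead of printing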
I have a .txt file like this:
ancient 0.4882
detained 5.5512
neighboring 2.9644
scores 0.5951
eggs 0.918
excesses 3.0974
proceedings 0.7446
menem 1.7971
I want to display the top 3 words (comparing their values) in one list and the remaining words in another list.
i.e., the output for this example should be:
[detained, excesses, neighboring] & [menem, eggs, proceedings, scores, ancient]
How to do that?
EDIT:
I forgot to mention one thing: I want to consider only those words that have a value greater than 0.5. How to do that?
import csv

with open('x.txt') as f:
    # use space as delimiter
    reader = csv.reader(f, delimiter=' ')
    # sort by the numeric value in the second place of each line, i.e. float(x[1])
    s = sorted(reader, key=lambda x: float(x[1]), reverse=True)
    # keep only rows with a value greater than 0.5 and take the word only
    l = [x[0] for x in s if float(x[1]) > 0.5]
    print l[:3]
    print l[3:]
import csv
with open('inputFile.csv', 'r') as inputFile:
    reader = csv.reader(inputFile, delimiter=" ")
    word = dict()
    for line in reader:
        if float(line[1]) > 0.5:
            word[line[0]] = float(line[1])

sortedArray = sorted(word.iteritems(), key=lambda x: -x[1])
maxWords = sortedArray[:3]
Remaining = sortedArray[3:]
print maxWords
print Remaining
The answers using csv are more concise than mine but here is another approach.
from operator import itemgetter

with open('file_list_data.txt', 'r') as f:
    lines = f.readlines()

records = [l.split() for l in lines]
records_with_numbers = [(r[0], float(r[1])) for r in records if float(r[1]) > 0.5]
sorted_records = sorted(records_with_numbers, key=itemgetter(1), reverse=True)
top_3 = [word for (word, score) in sorted_records[0:3]]
rest = [word for (word, score) in sorted_records[3:]]
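For reference, all three answers should give the same word split on the sample data above (the second keeps (word, value) pairs rather than bare words); ancient (0.4882) drops out because of the greater-than-0.5 filter from the edit:

print top_3    # ['detained', 'excesses', 'neighboring']
print rest     # ['menem', 'eggs', 'proceedings', 'scores']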
I am looking to remove lines from a list that have a 0 in the 4th position (index 4). When I write out the file now it is not eliminating all the zero lines.
counter = 0
for j in all_decisions:
    if all_decisions[counter][4] == 0:
        all_decisions.remove(j)
    counter += 1

ofile = open("non_zero_decisions1.csv", "a")
writer = csv.writer(ofile, delimiter=',')
for each in all_decisions:
    writer.writerow(each)
ofile.close()
Use a list comprehension.
all_decisions = [x for x in all_decisions if x[4] != 0]
Or, use filter.
all_decisions = filter(lambda x: x[4] != 0, all_decisions)
The way you're doing this is not a good idea because you're modifying all_decisions while you're iterating over it. If you wanted to do it in a loop, I would suggest something like:
temp = []
for x in all_decisions:
    if x[4] != 0:
        temp.append(x)
all_decisions = temp
But this is basically just a more verbose equivalent of the list comprehension and filter approaches I showed above.
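To see why modifying the list while iterating over it goes wrong, here is a quick trace on a tiny made-up list; the row starting with 'b' keeps its 0 even though it should have been removed, because removing 'a' shifts the remaining rows under the iterator:

all_decisions = [['a', 1, 1, 1, 0], ['b', 1, 1, 1, 0], ['c', 1, 1, 1, 7]]
counter = 0
for j in all_decisions:
    if all_decisions[counter][4] == 0:
        all_decisions.remove(j)
    counter += 1
print(all_decisions)  # [['b', 1, 1, 1, 0], ['c', 1, 1, 1, 7]]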
I think the problem is in your loop which eliminates the lines:
counter = 0
for j in all_decisions:
    if all_decisions[counter][4] == 0:
        all_decisions.remove(j)
    counter += 1
If you remove an element, you also bump the counter. The consequence of that is that you're skipping lines. So you might miss lines to be removed. Try only bumping the counter if you didn't remove an element, i.e.
counter = 0
for j in all_decisions:
    if all_decisions[counter][4] == 0:
        all_decisions.remove(j)
    else:
        counter += 1
That being said, a more concise way to do what you want would be
with open("non_zero_decisions1.csv","a") as ofile:
writer = csv.writer(ofile, delimiter=',')
writer.writerows(d for d in all_decisions if d[4] != 0)
The with statement will take care of calling close on ofile after the block executes, even if an exception is thrown. Also, the csv writer has a writerows method which takes an iterable of rows. Thirdly, you can use a generator expression, d for d in all_decisions if d[4] != 0, to replace your filtering loop.
I'm writing a large program and one of the little things I need it to do is go over a text file that is divided into lines.
I need it to create a new list of lines every time a line is empty. For example, if the text is (each number is on a new line):
1
2
3
4

5
6
3

1
2
It should build 3 different lists: [1,2,3,4], [5,6,3], [1,2].
This is my code so far (just getting started):
new_list = []
my_list = []
doc = open(filename, "r")
for line in doc:
    line = line.rstrip()
    if line != "":
        new_list.append(line)
return new_list
OK, this should work now:
initial_list, temp_list = [], []
for line in open(filename):
    if line.strip() == '':
        initial_list.append(temp_list)
        temp_list = []
    else:
        temp_list.append(line.strip())
if len(temp_list) > 0:
    initial_list.append(temp_list)
final_list = [item for item in initial_list if len(item) > 0]
print final_list
You could do something like:
[x.split() for x in fileobject if x.strip()]
To get integers, you could use map:
[map(int,x.split()) for x in fileobject if x.strip()]
where fileobject is the object returned by open. This is probably best to do in a context manager:
with open(filename) as fileobject:
    data_list = [map(int, x.split()) for x in fileobject if x.strip()]
Reading some of the comments on the other post, it seems that I also didn't understand your question properly. Here's my stab at correcting it:
with open(filename) as fileobject:
    current = []
    result = [current]
    for line in fileobject:
        if line.strip():  # Non-blank line -- extend the current working list.
            current.extend(map(int, line.split()))
        else:  # Blank line -- start a new list to work with.
            current = []
            result.append(current)
Now your resulting list should be contained in result.
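One caveat worth a note: consecutive blank lines (or a trailing one) leave empty sublists in result, so a final filter like in the earlier answer cleans them up. A quick check against the sample data from the question (the name final is just illustrative):

final = [group for group in result if group]  # drop empties from repeated blank lines
print(final)  # [[1, 2, 3, 4], [5, 6, 3], [1, 2]] for the example file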