I'm not particularly experienced with Python, so I may be doing something silly below. I have the following program:
import os
import re
import linecache
LINENUMBER = 2
angles_file = open("d:/UserData/Robin Wilson/AlteredData/ncaveo/16-June/scan1_high/000/angles.txt")
lines = angles_file.readlines()
for line in lines:
    splitted_line = line.split(";")
    DN = float(linecache.getline(splitted_line[0], LINENUMBER))
    Zenith = splitted_line[2]
    output_file = open("d:/UserData/Robin Wilson/AlteredData/ncaveo/16-June/scan1_high/000/DNandZenith.txt", "a")
    output_file.write("0\t" + str(DN) + "\t" + Zenith + "\n")
    #print >> output_file, str(DN) + "\t" + Zenith
    #print DN, Zenith
    output_file.close()
When I look at the output file I get the following:
0 105.5 0.0
0 104.125 18.0
0 104.0 36.0
0 104.625 54.0
0 104.25 72.0
0 104.0 90.0
0 104.75 108.0
0 104.125 126.0
0 104.875 144.0
0 104.375 162.0
0 104.125 180.0
These are the right numbers; the file just has blank lines between each line. I've tried and tried to remove them, but I can't seem to. What am I doing wrong?
Robin
For a GENERAL solution, remove the trailing newline from your INPUT:
splitted_line = line.rstrip("\n").split(";")
Removing the extraneous newline from your output "works" in this case but it's a kludge.
ALSO: (1) It's not a good idea to open your output file in the middle of a loop; do it once, otherwise you are just wasting resources. With a long enough loop, you will run out of file handles and crash. (2) It's not a good idea to hard-wire file names like that, especially hidden in the middle of your script; try to make your scripts reusable.
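Applying both points, here is a minimal sketch of the loop with the output file opened once and the paths passed in as arguments. `convert` is a hypothetical function name, and the field layout (field 0 names the DN file, field 2 is the zenith) is taken from the question's code:

```python
import linecache


def convert(angles_path, output_path, line_number=2):
    """Append one '0<TAB>DN<TAB>Zenith' row per line of angles_path to output_path.

    Assumes the layout from the question: field 0 of each ';'-separated input
    line names a file whose line `line_number` holds DN, and field 2 is the zenith.
    """
    # Open both files once, outside the loop, rather than once per line.
    with open(angles_path) as angles_file, open(output_path, "a") as output_file:
        for line in angles_file:
            # Remove the trailing newline from the INPUT before splitting.
            fields = line.rstrip("\n").split(";")
            dn = float(linecache.getline(fields[0], line_number))
            zenith = fields[2]
            output_file.write("0\t" + str(dn) + "\t" + zenith + "\n")
```

Usage would be something like `convert("angles.txt", "DNandZenith.txt")` with your own paths, so the same script works for every scan folder.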
Change this:
output_file.write("0\t" + str(DN) + "\t" + Zenith + "\n")
to this:
output_file.write("0\t" + str(DN) + "\t" + Zenith)
The Zenith string already contains the trailing \n from the original file when you read it in.
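To see this for yourself, print the `repr()` of the field. The input line below is made up to match the question's `;`-separated layout:

```python
# A made-up input line in the question's ';'-separated layout.
line = "dn_file.txt;x;18.0\n"
zenith = line.split(";")[2]
print(repr(zenith))           # '18.0\n' (the newline came from the input file)
print(repr(zenith.rstrip()))  # '18.0'
```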
An alternative solution (handy if you are processing lines from a file) is to strip the whitespace:
Zenith = Zenith.strip()
EDIT: See the comments for details, but there's definitely a better way: [:-1] isn't the best choice, no matter how cool it looks. Use line.rstrip('\n') instead.
The problem is that, unlike file_text.split('\n'), file.readlines() does not remove the \n from the end of each line of input. My default pattern for parsing lines of text goes like this:
with open(filename) as f:
    for line in f.readlines():
        parse_line(line[:-1]) # funny face trims the '\n'
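The reason [:-1] is fragile: if the last line of a file has no trailing newline, slicing chops off a real character. A small demonstration with made-up file contents (`io.StringIO` stands in for a real file):

```python
import io

# Made-up file contents: the last line has no trailing newline.
text = "first;1\nsecond;2\nthird;3"

with io.StringIO(text) as f:
    lines = f.readlines()

chopped = [line[:-1] for line in lines]           # always removes the last character
stripped = [line.rstrip("\n") for line in lines]  # removes a newline only if present

print(chopped)   # ['first;1', 'second;2', 'third;']
print(stripped)  # ['first;1', 'second;2', 'third;3']
```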
If you want to make sure there's no whitespace on any of your tokens (not just the first and last), try this:
splitted_line = list(map(str.strip, line.split(';')))
I have a script that writes the line starting with #Solution 1 to a new file, together with the name of the input file. But I also want to add the value that belongs to the Major column of the input file. Can someone please help me figure out how to get that piece of text?
The script now:
#!/usr/bin/env python3
import os
dr = "/home/nwalraven/Result_pgx/Runfolder/Runres_Aldy"
outdr = "/home/nwalraven/Result_pgx/Runfolder/Aldy_res_txt"
tag = ".aldy"
for f in os.listdir(dr):
    if f.endswith(tag):
        print(f)
        new_file_name = f.split('_')[0]+'.txt' # get the name of the file before the '_' and add '.txt' to it
        with open(dr+"/"+f) as file:
            for line in file.readlines():
                f
                if line.startswith("#Solution 1"):
                    with open(outdr+"/"+new_file_name,"a",newline='\n') as new_file:
                        new_file.write(f.split('.')[0] + "\n")
                        new_file.write(line + "\n")
                if line.startswith("#Solution 2"):
                    with open(outdr+"/"+new_file_name,"a",newline='\n') as new_file:
                        new_file.write(line + "\n")
                        print("Meerdere oplossingen gevonden! Check Aldy bestand" )  # "Multiple solutions found! Check the Aldy file"
The input:
file = EMQN3-S3_COMT.aldy
#Sample Gene SolutionID Major Minor Copy Allele Location Type Coverage Effect dbSNP Code Status
#Solution 1: *Met, *ValB
EMQN3-S3 COMT 1 *Met/*ValB Met;ValB 0 Met 19950234 C>T 530 H62= rs4633
EMQN3-S3 COMT 1 *Met/*ValB Met;ValB 0 Met 19951270 G>A 651 V158M rs4680
EMQN3-S3 COMT 1 *Met/*ValB Met;ValB 1 ValB
file = EMQN3-S3_CYP2B6.aldy
#Sample Gene SolutionID Major Minor Copy Allele Location Type Coverage Effect dbSNP Code Status
#Solution 1: *1.001, *1.001
EMQN3-S3 CYP2B6 1 *1/*1 1.001;1.001 0 1.001
EMQN3-S3 CYP2B6 1 *1/*1 1.001;1.001 1 1.001
The result it gives right now:
EMQN3-S3_COMT.aldy
#Solution 1: *Met, *ValB
EMQN3-S3_CYP2B6.aldy
#Solution 1: *1.001, *1.001
The result I need:
EMQN3-S3_COMT.aldy
#Solution 1: *Met/*ValB
EMQN3-S3_CYP2B6.aldy
#Solution 1: *1/*1
Before writing the line out, you could use a regular expression to replace the text.
On the other hand, if you know the line always starts with a fixed number of characters, it's easier and faster to edit it manually.
With regex:
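# Importing regular expressions
import re
# Setting up the regex replacement: replace ", " with "/"
regex = r", "
replacement = "/"
...
# Format the line before writing it (rstrip drops the newline the line already ends with)
line_formatted = re.sub(regex, replacement, line.rstrip("\n"))
new_file.write(line_formatted + "\n")
...
Note that this only replaces the separator: #Solution 1: *1.001, *1.001 becomes #Solution 1: *1.001/*1.001, so the .001 part still has to be removed.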
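A regex-only variant that also drops the decimal suffix, so both sample lines come out as the question expects. This is a sketch assuming allele suffixes always look like `.<digits>`; `format_solution` is a hypothetical name:

```python
import re


def format_solution(line):
    """Turn '#Solution 1: *1.001, *1.001' into '#Solution 1: *1/*1'."""
    # Drop the trailing newline and any decimal suffix such as '.001'.
    line = re.sub(r"\.\d+", "", line.rstrip("\n"))
    # Replace the ', ' separator between alleles with '/'.
    return re.sub(r",\s*", "/", line)


print(format_solution("#Solution 1: *Met, *ValB\n"))     # #Solution 1: *Met/*ValB
print(format_solution("#Solution 1: *1.001, *1.001\n"))  # #Solution 1: *1/*1
```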
Try replacing this part of your script:
...
if line.startswith("#Solution 1"):
    with open(outdr+"/"+new_file_name,"a",newline='\n') as new_file:
        new_file.write(f.split('.')[0] + "\n")
        solution = "/".join([x.strip().split(".")[0] for x in line.split(",")])
        new_file.write(solution + "\n")
...
It will do the following:
split the string into two tokens, based on the comma
strip them
remove the decimal part (if any) from the token
rejoin the string using the slash.
Hope it helps.
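The question literally asks for the value in the Major column, which can also be read straight from the first data row of the solution instead of being rebuilt from the #Solution line. A minimal sketch, assuming the column order in the question's header and whitespace- or tab-separated data rows; `major_for_solution` is a hypothetical helper:

```python
def major_for_solution(lines, solution_id="1"):
    """Return the Major column of the first data row whose SolutionID matches.

    Assumes the Aldy layout shown in the question: header and '#Solution'
    lines start with '#', and data rows are Sample, Gene, SolutionID, Major, ...
    separated by whitespace (tabs in the real file).
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header, the '#Solution' lines and blank lines
        columns = line.split()
        if columns[2] == solution_id:
            return columns[3]  # the Major column, e.g. '*Met/*ValB'
    return None  # no data row for that solution


sample = [
    "#Sample\tGene\tSolutionID\tMajor\tMinor\tCopy\tAllele\n",
    "#Solution 1: *1.001, *1.001\n",
    "EMQN3-S3\tCYP2B6\t1\t*1/*1\t1.001;1.001\t0\t1.001\n",
    "EMQN3-S3\tCYP2B6\t1\t*1/*1\t1.001;1.001\t1\t1.001\n",
]
print("#Solution 1: " + major_for_solution(sample))  # #Solution 1: *1/*1
```

In the question's loop you could call it on the open file (`major = major_for_solution(file)`) and write `"#Solution 1: " + major + "\n"`.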
I have a text file like this:
pid,party,state,res
SC5,Republican,NY,Donald Trump 45%-Marco Rubio 18%-John Kasich 18%-Ted Cruz 11%
TB1,Republican,AR,Ted Cruz 27%-Marco Rubio 23%-Donald Trump 23%-Ben Carson 11%
FX2,Democratic,MI,Hillary Clinton 61%-Bernie Sanders 34%
BN1,Democratic,FL,Hillary Clinton 61%-Bernie Sanders 30%
PB2,Democratic,OH,Hillary Clinton 56%-Bernie Sanders 35%
What I want to do is check that the percentages in each "res" add up to 100%.
def addPoll(pid,party,state,res,filetype):
    with open('Polls.txt', 'a+') as file: # open file temporarily for writing and reading
        lines = file.readlines() # get all lines from file
        file.seek(0)
        next(file) # go to next line --
        #this is supposed to skip the 1st line with pid/party/state/res
        for line in lines: # loop
            line = line.split(',', 3)[3]
            y = line.split()
            print y
        #else:
        #file.write(pid + "," + party + "," + state + "," + res+"\n")
        #file.close()
    return "pass"
print addPoll("123","Democratic","OH","bla bla 50%-Asd ASD 50%",'f')
So in my code I manage to split off the last field after the last ',' and put it into a list, but I'm not sure how to get only the numbers out of that text.
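You can use a regex to find all the percentages:
import re
for line in lines:
    # Only digits immediately followed by '%' count, so the pid (e.g. SC5) is ignored
    numbers = re.findall(r'(\d+)%', line)
    numbers = [int(n) for n in numbers]
    print(sum(numbers))
This will print
0 # no percentages in the first line
92
84
95
91
91
The re.findall() function finds all substrings matching the specified pattern, which in this case is (\d+)%, meaning a continuous string of digits followed by a % sign. Because the pattern contains one group, findall returns only the digits, as a list of strings, which we convert to a list of ints and then sum. Requiring the % keeps digits from other columns, such as the 5 in SC5, out of the total.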
It seems like what you have is CSV. Instead of trying to parse it on your own, use Python's built-in csv module, which will give you back nice dictionaries (so you can do row['res']):
import csv
with open('Polls.txt') as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Do something with row['res']
        pass
For the # Do something part, you can either parse the field manually (it appears to be structured): split('-') and then rsplit(' ', 1) each '-'-separated part (the last piece should be the percentage). If you're trying to enforce a format, I'd definitely go this route, but a regex is also a fine solution for quickly pulling out what you want. You'll want to read up on regexes, but in your case you want \d+%:
# Manually parse (throws IndexError if there isn't a space separating candidate name and %)
percents = [candidate.rsplit(' ', 1)[1] for candidate in row['res'].split('-')]
if not all(p.endswith('%') for p in percents):
    # Handle bad percent (not ending in %)
    pass
else:
    # Throws ValueError if any of the percents aren't integers
    percents = [int(p[:-1]) for p in percents]
    if sum(percents) != 100:
        # Handle bad total
        pass
Or with regex:
import re  # needed for re.finditer
percents = [int(match.group(1)) for match in re.finditer(r'(\d+)%', row['res'])]
if sum(percents) != 100:
    # Handle bad total here
    pass
The regex is certainly shorter, but the manual approach enforces stricter formatting requirements on row['res'] and will also let you extract things like candidate names later.
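Putting the pieces together, here is a minimal sketch that checks every row using the regex approach and csv.DictReader. `bad_totals` is a hypothetical name, and io.StringIO stands in for Polls.txt so the example is self-contained:

```python
import csv
import io
import re

# Stand-in for Polls.txt: two rows from the question plus the row from addPoll().
POLLS = """pid,party,state,res
SC5,Republican,NY,Donald Trump 45%-Marco Rubio 18%-John Kasich 18%-Ted Cruz 11%
FX2,Democratic,MI,Hillary Clinton 61%-Bernie Sanders 34%
123,Democratic,OH,bla bla 50%-Asd ASD 50%
"""


def bad_totals(f):
    """Yield (pid, total) for each row whose res percentages don't sum to 100."""
    for row in csv.DictReader(f):  # DictReader consumes the header row itself
        total = sum(int(p) for p in re.findall(r"(\d+)%", row["res"]))
        if total != 100:
            yield row["pid"], total


print(list(bad_totals(io.StringIO(POLLS))))  # [('SC5', 92), ('FX2', 95)]
```

With the real file, pass the open file object instead: `with open('Polls.txt') as f: print(list(bad_totals(f)))`.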
Also some random notes:
You don't need to open with 'a+' unless you plan to append to the file; 'r' will do (and 'r' is the default, so you don't have to specify it).
Instead of calling next() on the file to skip the header, use a for loop over csv.DictReader, which reads the header row for you!
I have a text file which stores data like name : score e.g.:
bob : 10
fred : 3
george : 5
However, I want to make it so it says
10 : bob
3 : fred
5 : george
What would the code be to flip it like that?
Would I need to separate them first by removing the colon? I have managed that with this code:
file = open("Class 3.txt", "r")
t4 = (file.read())
test =''.join(t4.split(':')[0:10])
print (test)
How would I finish it and make it say the reverse?
This code handles fractional scores (e.g. 9.5), and doesn't care whether there are extra spaces around the : delimiter. It should be much easier to maintain than your current code.
Class 3.txt:
bob : 10
fred : 3
george : 5
Code:
class_num = input('Which class (1, 2, or 3)? ')
score_sort = input('Sort by name or score? ').lower().startswith('s')
with open("Class " + class_num + ".txt", "r") as f:
    scores = {name.strip():float(score) for
              name,score in (line.strip().split(':') for line in f)}
if score_sort:
    for name in sorted(scores, key=scores.get, reverse=True):
        print(scores.get(name), ':', name)
else:
    for name in sorted(scores):
        print(name, ':', scores.get(name))
Input:
3
scores
Output:
10.0 : bob
5.0 : george
3.0 : fred
Input:
3
name
Output:
bob : 10.0
fred : 3.0
george : 5.0
First, this is going to be a lot harder to do whole-file-at-once than line-at-a-time.
But, either way, you obviously can't just split(':') and then ''.join(…). All that's going to do is replace colons with nothing. You obviously need ':'.join(…) to put the colons back in.
And meanwhile, you have to swap the values around on each side of each colon.
So, here's a function that takes just one line, and swaps the sides:
def swap_sides(line):
    left, right = line.split(':')
    return ':'.join((right, left))
But you'll notice there are a few problems here. The left side has a space before the colon; the right side has a space after the colon and a newline at the end. How are you going to deal with that?
The simplest way is to just strip out all the whitespace on both sides, then add back in the whitespace you want:
def swap_sides(line):
    left, right = line.split(':')
    return ':'.join((right.strip() + ' ', ' ' + left.strip())) + '\n'
But a smarter idea is to treat the space around the colon as part of the delimiter. (The newline, you'll still need to handle manually.)
def swap_sides(line):
    left, right = line.strip().split(' : ')
    return ' : '.join((right.strip(), left.strip())) + '\n'
But if you think about it, do you really need to add the newline back on? If you're just going to pass it to print, the answer is obviously no. So:
def swap_sides(line):
    left, right = line.strip().split(' : ')
    return ' : '.join((right.strip(), left.strip()))
Anyway, once you're happy with this function, you just write a loop that calls it once for each line. For example:
with open("Class 3.txt", "r") as file:
    for line in file:
        swapped_line = swap_sides(line)
        print(swapped_line)
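If the file might contain lines without spaces around the colon, or blank lines, a variant built on str.partition is a bit more forgiving. A minimal sketch; `swap_line` and `swap_file` are hypothetical names, and writing to a second file (rather than printing) is an assumption about what you want:

```python
def swap_line(line):
    """Swap 'name : score' to 'score : name', tolerating missing spaces around ':'."""
    left, _, right = line.partition(":")  # split on the first ':' only
    return right.strip() + " : " + left.strip()


def swap_file(in_path, out_path):
    """Write every non-blank line of in_path, swapped, to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines, such as an empty last line
                dst.write(swap_line(line) + "\n")


print(swap_line("bob : 10\n"))  # 10 : bob
print(swap_line("fred:3"))      # 3 : fred
```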
Let's learn how to reverse a single line:
line = 'bob : 10'
line.partition(' : ')                     # ('bob', ' : ', '10')
''.join(reversed(line.partition(' : ')))  # '10 : bob'
Now, combine with reading lines from a file:
for line in open('Class 3.txt').read().splitlines():
    print(''.join(reversed(line.partition(' : '))))
Update
I've rewritten the code to read the file line by line:
with open('Class 3.txt') as input_file:
    for line in input_file:
        line = line.strip()
        print(''.join(reversed(line.partition(' : '))))
I have a file which is filled like this:
Samsung CLP 680/ CLX6260 + CLT-C506S/ELS + CLT-M506S/ELS + CLT-Y506S/ELS + 39.50
Xerox Phaser 6000/6010/6015 + 106R01627 + 106R01628 + 106R01629 + 8.43
Xerox DocuPrint 6110/6110mfp + 106R01206 + 106R01204 + 106R01205 + 7.60
Xerox Phaser 6121/6121D + 106R01466 + 106R01467 + 106R01468 + 18.20
When I read it with:
for line in excelRead:
    title=line.split("+")
    title=[lines.strip()for lines in title]
Sometimes there is a "\n" at the end of a line and sometimes there is not. If the line ends with \n, splitting gives me 5 elements; if not, 9, and so on until it finds a "\n", I guess.
So the question is: how do I read only one line of the file at a time and get 5 elements every time, with or without a "\n" at the end? I can't check the whole file for whether each line ends with a "\n".
Thanks
You might consider using the csv module to parse this, placing the results into a dict keyed by model:
import csv
data = {}
with open('/tmp/excel.csv', newline='') as f:
    for line in csv.reader(f, delimiter='+', skipinitialspace=True):
        data[line[0].strip()] = [e.strip() for e in line[1:]]
print(data)
# {'Samsung CLP 680/ CLX6260': ['CLT-C506S/ELS', 'CLT-M506S/ELS', 'CLT-Y506S/ELS', '39.50'],
#  'Xerox Phaser 6000/6010/6015': ['106R01627', '106R01628', '106R01629', '8.43'],
#  'Xerox DocuPrint 6110/6110mfp': ['106R01206', '106R01204', '106R01205', '7.60'],
#  'Xerox Phaser 6121/6121D': ['106R01466', '106R01467', '106R01468', '18.20']}
for line in excelRead:
    title = [x.strip() for x in line.rstrip('\n').split('+')]
It's better to avoid making one variable (title) mean two different things. Rather than give it a different name in your second line, I just removed the line entirely and put the split inside the list comprehension.
Instead of feeding line straight into split, I first rstrip the \n (which removes that character from the end).
When \n is missing, this splits the merged title[4] into this record's price and the next record's model name:
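import re
data = []
with open('aa.txt') as excelRead:
    for line in excelRead:
        title = line.split("+")
        title = [lines.strip() for lines in title]
        # More than 5 fields means records are glued together:
        # title[4] is then "<price><next model>", e.g. "39.50Xerox Phaser 6000/6010/6015"
        while len(title) > 5:
            one = re.sub(r'(\d+\.\d+)', '', title[4])  # the next record's model name
            five = title[4].replace(one, '')            # this record's price
            title1 = title[:4] + [five]
            title = [one] + title[5:]
            data.append(title1)
        data.append(title)
for item in data:
    print(item)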
You could easily make data a dictionary instead of a list.
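A different technique, swapped in here rather than taken from the answer above: match whole records with one regular expression over the full text, so it doesn't matter whether records are separated by newlines. This sketch assumes every record has exactly five '+'-separated fields, part numbers contain no spaces, and the price always has two decimals; `parse_records` is a hypothetical name:

```python
import re

# Assumed record layout: model + part + part + part + price (two decimals).
RECORD = re.compile(
    r"(.+?)\s*\+\s*(\S+)\s*\+\s*(\S+)\s*\+\s*(\S+)\s*\+\s*(\d+\.\d{2})"
)


def parse_records(text):
    """Return one 5-tuple per record, whether or not records are newline-separated."""
    return [tuple(field.strip() for field in match) for match in RECORD.findall(text)]


# Made-up input: the second and third records are glued together (no newline).
text = (
    "Samsung CLP 680/ CLX6260 + CLT-C506S/ELS + CLT-M506S/ELS + CLT-Y506S/ELS + 39.50\n"
    "Xerox Phaser 6000/6010/6015 + 106R01627 + 106R01628 + 106R01629 + 8.43"
    "Xerox DocuPrint 6110/6110mfp + 106R01206 + 106R01204 + 106R01205 + 7.60\n"
)
for record in parse_records(text):
    print(record)
```

With a real file you would pass `open('aa.txt').read()` as the text.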
I have searched high and low for a resolution to this situation, and tested a few different methods, but I haven't had any luck thus far. Basically, I have a file with data in the following format that I need to convert into a CSV:
(previously known as CyberWay Pte Ltd)
0 2019
01.com
0 1975
1 TRAVEL.COM
0 228
1&1 Internet
97 606
1&1 Internet AG
0 1347
1-800-HOSTING
0 8
1Velocity
0 28
1st Class Internet Solutions
0 375
2iC Systems
0 192
I've tried using re.sub and replacing the whitespace between the numbers on every other line with a comma, but haven't had any success so far. I admit that I normally parse from CSVs, so raw text has been a bit of a challenge for me. I would need to maintain the string formats that are above each respective set of numbers.
I'd prefer the CSV to be formatted as such:
foo bar
0,8
foo bar
0,9
foo bar
0,10
foo bar
0,11
There are about 50,000 entries, so editing this manually would take an obscene amount of time.
If anyone has any suggestions, I'd be most grateful.
Thank you very much.
If you just want to replace whitespace with comma, you can just do:
line = ','.join(line.split())
You'll have to do this only on every other line, but from your question it sounds like you already figured out how to work with every other line.
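For the "every other line" part, enumerate() gives you the line index. A minimal sketch, assuming the input strictly alternates name line / numbers line and starts with a name line; `comma_separate_numbers` is a hypothetical name:

```python
def comma_separate_numbers(lines):
    """Yield each name line unchanged and each numbers line as 'a,b'.

    Assumes the input strictly alternates name line / numbers line,
    starting with a name line, as in the question's sample.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 2 == 1:  # every other line holds the numbers
            line = ",".join(line.split())
        yield line


sample = ["1-800-HOSTING\n", "0 8\n", "1Velocity\n", "0 28\n"]
print("\n".join(comma_separate_numbers(sample)))
```

To convert a file, iterate over it directly, e.g. write each `out_line + "\n"` from `comma_separate_numbers(src)` to the output file (file names are up to you).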
If I have correctly understood your requirement, you need a strip() on all lines and a split on whitespace on the even lines (counting lines from 1):
import re
fp = open("csv.txt", "r")
while True:
    line = fp.readline()
    if '' == line:
        break
    line = line.strip()
    fields = re.split(r"\s+", fp.readline().strip())
    print("\"%s\",%s,%s" % (line, fields[0], fields[1]))
fp.close()
The output is a CSV (you might need to escape quotes if they occur in your input):
"Content of odd line",Number1,Number2
I do not understand the 'foo bar' you show on the odd lines of your example, though.
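If quotes (or commas) can appear in the names, Python's csv.writer handles the quoting and escaping for you instead of a hand-built format string. A minimal sketch, assuming the input strictly alternates name and number lines; `pairs_to_csv` is a hypothetical helper name:

```python
import csv
import io


def pairs_to_csv(lines, out):
    """Write one CSV row (name, number, number) per pair of input lines.

    Assumes the input alternates: a name line, then a line of
    whitespace-separated numbers.
    """
    writer = csv.writer(out)
    it = iter(lines)
    for name in it:
        numbers = next(it, "").split()  # the line after the name ("" at end of input)
        writer.writerow([name.strip()] + numbers)


buf = io.StringIO()
pairs_to_csv(['Foo "Bar" Ltd\n', "0 8\n", "1Velocity\n", "0 28\n"], buf)
print(buf.getvalue())
```

The name containing quotes comes out as `"Foo ""Bar"" Ltd"`. When writing to a real file, open it with `newline=''`, as the csv documentation recommends.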