I have to parse some data. I've successfully written a script that finds the start and end of the test and prints the data in between.
The language is Python; line here comes from the csv library.
else:
    print(line)
    # csv_file = open(title+'.txt', "w")
    # writer = csv.writer(csv_file)
    # csv.writer(csv_file).writerow(line)
The csv writer produces extremely funky data instead of writing the line as a row. Printing out the line looks exactly as it does in the txt file.
Printing out the line looks exactly as it does in the txt file.
The print function accepts an optional file argument, so if you want the file to contain exactly what would appear on stdout, you can do:
...
with open("file.txt","w") as f:
    print(line, file=f)
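To see why print with file= reproduces the line while csv.writer.writerow does not, here is a minimal in-memory sketch. It assumes line is a plain string; the sample value is made up for illustration, and io.StringIO stands in for the real output file.

```python
import csv
import io

# Hypothetical sample line; the real data comes from the parsed test file.
line = "12.5 volts, channel 3"

# print() writes the string unchanged, followed by a newline.
printed = io.StringIO()
print(line, file=printed)

# writerow() expects a sequence of fields. A bare string is treated as a
# sequence of characters, so every character becomes its own column.
as_string = io.StringIO()
csv.writer(as_string).writerow(line)

# Wrapping the string in a list writes it as a single (quoted) field instead.
as_list = io.StringIO()
csv.writer(as_list).writerow([line])
```

If the goal is a faithful copy of the text, print (or f.write) is the simpler tool; the csv writer is only useful once the line has been split into fields.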
I have an input file that contains data in the same format, repeated across 5 rows. I need to format this data into one row (of a CSV file) and keep only the few fields relevant to me. How do I achieve the output below from the input file provided?
Note - I'm very new to programming and haven't yet learned enough to write this myself. I have already written code that imports the input file, reaches a specific word, and then prints the rest of the data. This is where I need help: I don't need all the information in the input, and using a space as the delimiter does not put the output in the correct columns. I have also written the code to write the output to a CSV file.
Note 2 - I'm very new to this forum as well, so kindly excuse any mistakes I have made in posting my query.
Input -
Input File
Output -
Output File
import itertools, csv
You should read in the file and parse it manually, then use the csv module to write it to a .csv file:
import csv
import re
with open('myfile.txt', 'r') as f:
    lines = f.readlines()
# divide on runs of two or more whitespace characters, but not single spaces
lines = [re.split(r"\s\s+", line.strip()) for line in lines]
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for line in lines:
        # write one parsed row at a time, not the whole list of rows
        writer.writerow(line)
But this will include every piece of data. You can iterate through lines and remove the fields you don't want to keep. So before you do the csv writing, you could do:
def filter_line(line):
    # see how the input file was parsed
    print(line)
    # for example, only keep the first 2 columns
    return [line[0], line[1]]
lines = [filter_line(line) for line in lines]
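Putting the two steps together, a minimal sketch might look like the following. The sample text and the choice of columns are invented for illustration, since the real input file is not shown, and the in-memory buffer stands in for output.csv.

```python
import csv
import io
import re

# Made-up sample: columns separated by two or more spaces,
# single spaces allowed inside a field.
sample = (
    "John Smith   New York   42   extra\n"
    "Jane Doe     Boston     37   extra\n"
)

def parse_line(line):
    """Split one line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())

def filter_line(fields):
    """Keep only the first two columns (an arbitrary choice for this sketch)."""
    return fields[:2]

rows = [filter_line(parse_line(line)) for line in sample.splitlines()]

# Write all filtered rows with the default comma delimiter.
out = io.StringIO()
csv.writer(out).writerows(rows)
```

Which indexes to keep depends on how the real file splits, so printing a parsed line first, as suggested above, is still the way to find them.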
I have a file in the following format
name#company.com, information
name#company2.com, information
....
What I need to do is read in the file and write only the email addresses to a file. I have written the following code:
with open ('n-emails.txt') as f:
    lines = f.readlines()
    print (lines)
Can someone please show me how to get only the email part of each line and how to write it to a file? This is all done on a Mac.
Two different ways of doing it:
without csv module: read each line, split according to tokens, strip the blanks, print:
with open ('n-emails.txt') as f:
    for line in f:
        toks = line.split(",")
        if toks:
            print(toks[0].strip())
with the csv module: wrap the opened file in a csv reader, iterate over the rows, and print the first (stripped) field of each row.
import csv
with open ('n-emails.txt') as f:
    # the reader needs the file object as its first argument
    cr = csv.reader(f, delimiter=",")
    for row in cr:
        print(row[0].strip())
The second method has the advantage of being robust to commas contained in cells, quotes, and so on; that's why I recommend it.
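The difference between the two approaches shows up as soon as a field contains a quoted comma. The sketch below uses invented sample lines to compare them, and adds a small hypothetical helper, write_addresses, for the "output it to a file" part of the question.

```python
import csv

# Invented sample lines; the second one has a quoted first field
# that itself contains a comma.
sample_lines = [
    'name#company.com, information',
    '"smith, john#company2.com", more information',
]

# Naive approach: split on every comma, quoted or not.
naive = [line.split(",")[0].strip() for line in sample_lines]

# csv approach: the reader honours the quotes around the first field.
parsed = [row[0].strip() for row in csv.reader(sample_lines)]

def write_addresses(addresses, path):
    """Write one address per line to the given path (hypothetical helper)."""
    with open(path, "w") as out:
        for address in addresses:
            out.write(address + "\n")
```

Here naive cuts the second entry off at the comma inside the quotes, while parsed keeps the whole quoted field.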
I tried to write the output file as a CSV file, but I get either an error or not the expected result. I am using Python 3.5.2 and also 2.7.
Getting error in Python 3.5:
wr.writerow(var)
TypeError: a bytes-like object is required, not 'str'
and
In Python 2.7, all the columns end up in a single column.
Expected Result:
An output file in the same format as the input file.
Code:
import csv
f1 = open("input_1.csv", "r")
resultFile = open("out.csv", "wb")
wr = csv.writer(resultFile, quotechar=',')
def sort_duplicates(f1):
    for i in range(0, len(f1)):
        f1.insert(f1.index(f1[i])+1, f1[i])
        f1.pop(i+1)
for var in f1:
    #print (var)
    wr.writerow([var])
If I use resultFile = open("out.csv", "w"), I get one extra row in the output file.
If I use the code above, I get one extra row and one extra column.
On Python 3, csv requires that you open the file in text mode, not binary mode. Drop the b from your file mode. You should really use newline='' too:
resultFile = open("out.csv", "w", newline='')
Better still, use the file object as a context manager to ensure it is closed automatically:
with open("input_1.csv", "r") as f1, \
     open("out.csv", "w", newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for var in f1:
        wr.writerow([var.rstrip('\n')])
I've also stripped the lines from f1 (just to remove the newline) and put the line in a list; csv.writer.writerow wants a sequence with columns, not a single string.
Quoting the csv.writer() documentation:
If csvfile is a file object, it should be opened with newline='' [1]. [...] All other non-string data are stringified with str() before being written.
[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
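The reason for newline='' can be seen in what the writer itself emits. In this minimal sketch, an in-memory buffer stands in for the file; with the default 'excel' dialect the writer already terminates each row with \r\n, so any further newline translation by the file object would turn that into \r\r\n on platforms that translate \n on write.

```python
import csv
import io

# io.StringIO does no newline translation by default, so it behaves like
# a file opened with newline='' for the purposes of this sketch.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect='excel')
writer.writerow(['a', 'b'])
writer.writerow([1, 2.5])  # non-string data is converted to text

written = buffer.getvalue()
```

Inspecting written shows the \r\n terminators the csv module adds on its own.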
Others have answered that you should open the output file in text mode when using Python 3, i.e.
with open('out.csv', 'w', newline='') as resultFile:
    ...
But you also need to parse the incoming CSV data. As it is, your code reads each line of the input CSV file as a single string. Then, without splitting that line into its constituent fields, it passes the string to the CSV writer. As a result, the csv.writer will treat the string as a sequence and output each character, including any terminating newline character, as a separate field. For example, if your input CSV file contains:
1,2,3,4
Your output file would be written like this:
1,",",2,",",3,",",4,"
"
You should change the for loop to this:
for row in csv.reader(f1):
    # process the row
    wr.writerow(row)
Now the input CSV file will be parsed into fields and row will contain a list of strings - one for each field. For the previous example, row would be:
for row in csv.reader(f1):
    print(row)
['1', '2', '3', '4']
And when that list is passed to the csv.writer the output to the file will be:
1,2,3,4
Putting all of that together you get this code:
import csv
with open('input_1.csv') as f1, open('out.csv', 'w', newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for row in csv.reader(f1):
        wr.writerow(row)
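The behaviour described in this answer can be reproduced in memory. This sketch writes the same input line twice, once as a raw string and once as a parsed row; the io.StringIO buffers stand in for input_1.csv and out.csv.

```python
import csv
import io

raw_line = "1,2,3,4\n"

# Passing the raw string: each character, including the commas and the
# trailing newline, becomes its own field.
wrong = io.StringIO()
csv.writer(wrong).writerow(raw_line)

# Parsing first: csv.reader yields one list of fields per row.
right = io.StringIO()
writer = csv.writer(right)
for row in csv.reader(io.StringIO(raw_line)):
    writer.writerow(row)
```

Comparing wrong.getvalue() with right.getvalue() shows the quoted single-character fields in the first case and the original four fields in the second.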
Open the file without the b mode.
The b mode opens your file as binary.
You can open the file with w instead:
open_file = open("filename.csv", "w")
You are opening the input file in normal read mode, but the output file is opened in binary mode. The correct way is:
resultFile = open("out.csv", "w")
As shown above, if you replace "wb" with "w", it will work.
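The error from the question can be reproduced without touching the disk. In this sketch, io.BytesIO plays the role of a file opened with "wb" and io.StringIO the role of one opened with "w"; on Python 3 the csv writer produces str, which only the text buffer accepts.

```python
import csv
import io

row = ["a", "b", "c"]

# Binary target: csv.writer hands it a str, which raises TypeError on Python 3.
binary_target = io.BytesIO()
try:
    csv.writer(binary_target).writerow(row)
    error = None
except TypeError as exc:
    error = exc

# Text target: the same call succeeds.
text_target = io.StringIO()
csv.writer(text_target).writerow(row)
```

On Python 2 the situation was reversed, which is why older examples open csv output files with "wb".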
I'm writing a program where I export an Excel file to .txt and then have to import this .txt file into my program. The main goal is to extract the same part from each line, but the problem is that in the .txt file the rows of the Excel sheet are turned into one huge string with no \n. Do you know if there is a way to separate them within the program and, if so, how I can do it?
The file I'm working with can be downloaded in http://we.tl/YtixI1ck6l
and so far I was trying something like
ppi = []
for line in read_text:
    prot_interaction = line[0:14]
    ppi.append(prot_interaction)
result_ppi = []
for line in read_text:
    result = line[-1]
    result_ppi.append(result)
But since it's not formatted in lines but just in a single one I'm not getting any good results.
Using that file as an example, use the csv module to parse it.
Example:
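import csv
# In Python 3, text mode applies universal newlines by default,
# so CR-only line endings are split correctly
with open('/tmp/Model_Oralome.txt', 'r') as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row[0])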
Prints:
ppi
C4FQL5;Q08426
C8PB60;D2NP19
P40189;Q05655
P22712;Q9NR31
...
P05783;P02751
B5E709;D2NPK7
Q8N7J2;Q9UKZ4
(BTW, the issue you may be having with this particular file is that the line terminators are CR only, as used by classic Mac OS. You can fix that in Python by using universal newline mode when you open the file...)
Excel is exporting the text file with carriage returns (\r) instead of newlines (\n).
ppi = []
with open("Model_Oralome.txt",'r') as f:
    # splitlines() splits on \r as well as \n, whichever the file uses
    lines = f.read().splitlines()
From here you can iterate through each line of lines. Since it looks like you want the value of the first column:
lines = lines[1:]
for line in lines:
    content = line.split('\t')
    ppi.append(content[0])
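Both answers come down to treating \r as a line break. Here is a minimal sketch using a short made-up stand-in for the exported file; the second column and its values are assumptions, since only the first column is shown above. str.splitlines() recognises \r, \n and \r\n, and csv.reader accepts any iterable of lines.

```python
import csv

# Made-up stand-in for the exported text: tab-separated, CR-only line endings.
text = "ppi\tresult\rC4FQL5;Q08426\t1\rC8PB60;D2NP19\t0\r"

# Split on any kind of line ending.
lines = text.splitlines()

# Parse the tab-separated fields, skip the header, keep the first column.
reader = csv.reader(lines, delimiter="\t")
header = next(reader)
ppi = [row[0] for row in reader]
```

The last column of each row is available as row[-1] in the same loop, which covers the second list the question was trying to build.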