From excel to txt - Separate lines - python

I'm writing a program where I export an Excel file to .txt and then import that .txt file into my program. The main goal is to extract the same part from each line, but the problem is that in the .txt file the lines of the Excel sheet are being turned into one huge string with no \n. Do you know if there is a way to separate them within the program, and if so, how can I do it?
The file I'm working with can be downloaded at http://we.tl/YtixI1ck6l
So far I have been trying something like:
ppi = []
for line in read_text:
    prot_interaction = line[0:14]
    ppi.append(prot_interaction)

result_ppi = []
for line in read_text:
    result = line[-1]
    result_ppi.append(result)
But since the text isn't split into separate lines but is all on a single one, I'm not getting any good results.

Using that file as an example, use the csv module to parse it.
Example:
import csv

with open('/tmp/Model_Oralome.txt', 'rU') as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row[0])
Prints:
ppi
C4FQL5;Q08426
C8PB60;D2NP19
P40189;Q05655
P22712;Q9NR31
...
P05783;P02751
B5E709;D2NPK7
Q8N7J2;Q9UKZ4
(BTW, the issue you may be having with this particular file is that the line terminators are a CR only, from classic Mac OS. You can fix that in Python by using universal newline mode when you open the file, as above...)
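In Python 3 the 'rU' mode is deprecated; the equivalent is the newline parameter of open(). A small self-contained sketch, using an invented file name:

```python
# A file written with classic-Mac CR-only line endings.
with open("cr_only.txt", "w", newline="") as f:
    f.write("ppi\rC4FQL5;Q08426\r")

# newline=None (the default) enables universal newline translation,
# so '\r', '\n' and '\r\n' are all recognised as line breaks.
with open("cr_only.txt", "r", newline=None) as f:
    lines = f.read().splitlines()

print(lines)  # ['ppi', 'C4FQL5;Q08426']
```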

Excel is exporting the text file with carriage returns (\r) instead of newlines (\n).
ppi = []
with open("Model_Oralome.txt", 'r') as f:
    lines = f.readlines()
lines = lines[0].split('\r')
From here you can iterate through each line of lines. Since it looks like you want the value of the first column:
lines = lines[1:]
for line in lines:
    content = line.split('\t')
    ppi.append(content[0])
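A self-contained illustration of that idea (the sample rows here are invented, since the linked file is no longer available):

```python
# Simulate the exported file: rows joined by bare carriage returns.
raw = "ppi\textra\rC4FQL5;Q08426\t1\rC8PB60;D2NP19\t0"

lines = raw.split('\r')

ppi = []
for line in lines[1:]:           # skip the header row
    content = line.split('\t')
    ppi.append(content[0])

print(ppi)  # ['C4FQL5;Q08426', 'C8PB60;D2NP19']
```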

Related

How to delete blank lines in a text file on a windows machine

I am trying to delete all blank lines in all YAML files in a folder. I have multiple lines with nothing but CRLF (using Notepad++), and I can't seem to eliminate these blank lines. I researched this before posting, as always, but I can't seem to get this working.
import glob
import re

path = 'C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml'

for fname in glob.glob(path):
    with open(fname, 'r') as f:
        sfile = f.read()
        for line in sfile.splitlines(True):
            line = sfile.rstrip('\r\n')
            f = open(fname, 'w')
            f.write(line)
            f.close()
Here is a view in Notepad++
I want to delete the very first row shown here, as well as all other blank rows. Thanks.
If you use Python, you can update the line using:
re.sub(r'[\s\r\n]','',line)
Close the reading file handler before writing.
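Putting those two points together, a minimal corrected version of the script might look like this (remove_blank_lines is a name I've introduced for illustration):

```python
import glob
import re

def remove_blank_lines(pattern):
    """Strip blank (whitespace-only) lines from every file matching pattern."""
    for fname in glob.glob(pattern):
        # Read everything first, so the handle is closed before rewriting.
        with open(fname, 'r') as f:
            lines = f.read().splitlines()
        # Keep only lines that still contain text once whitespace is removed.
        kept = [line for line in lines if re.sub(r'\s', '', line)]
        with open(fname, 'w') as f:
            f.write('\n'.join(kept) + '\n')
```

Calling remove_blank_lines('C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml') would then clean every YAML file in the folder.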
If you use Notepad++, install the plugin called TextFX, then:
1. Replace all occurrences of \r\n with blank.
2. Select all the text.
3. Use the new menu TextFX -> TextFX Edit -> E:Delete Blank Lines.
I hope this helps.
You can't write to the file you are currently reading. Also, you are stripping '\r\n' from every line - this way you'll remove all \r\n, not only those in empty lines. Store the content under a new name and delete/rename the file afterwards:
Create demo file:
with open("t.txt", "w") as f:
    f.write("""
asdfb

adsfoine
""")
Load / create new file from it:
with open("t.txt", 'r') as r, open("q.txt", "w") as w:
    for l in r:
        if l.strip():  # only write the line if it is not empty when stripped
            w.write(l)

with open("q.txt", "r") as f:
    print(f.read())
Output:
asdfb
adsfoine
(You need to strip() lines to catch lines that contain only spaces and a newline.)
For rename/delete see e.g. How to rename a file using Python and Delete a file or folder:
import os

os.remove("t.txt")            # remove the original
os.rename("q.txt", "t.txt")   # rename the cleaned one
It's nice and easy...
import glob

for file_path in glob.glob("C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml"):
    with open(file_path, "r+") as file:
        lines = file.readlines()
        file.seek(0)
        for i in lines:
            if i.rstrip():
                file.write(i)
        file.truncate()  # drop whatever is left of the old contents
Where you open each file, read the lines, and if they're not blank write them back out again. Note that truncate() is needed to discard any leftover text past the new end, and a wildcard path has to go through glob - open() will not expand it.

Trying to append quotes to each item in list.

I am trying to create a script that will take each line in my text file, which includes one rule name per line. The first script I created worked (finished) but would delete everything in the file. I have been googling for the past hour or so, trying to take examples and apply them on my own, but I keep failing. The current script is as follows:
with open('TDAppendlist.txt', 'w') as file:
    for line in file:
        s = ('""')
        seq = (file)
        s.join(seq)
with open('TDAppendlist.txt') as file:
    line = file.readlines()
    for line in file:
        line.join('"' + line + '"')
Neither of them is working. Could someone please point me in the right direction? Thank you all for reading.
First, we'll read all the lines of the file into a list, then we can change them, and finally write them back to the file.
with open('TDAppendlist.txt') as file:
    lines = list(file)

with open('TDAppendlist.txt', 'w') as file:
    file.write('\n'.join(['"{}"'.format(line.rstrip('\n')) for line in lines]))
That last line can be written out to be clearer:
lines = (line.rstrip('\n') for line in lines)
lines = ('"{}"'.format(line) for line in lines)
lines = '\n'.join(lines)
file.write(lines)
This produces an output file TDAppendlist_out that is just like the input, but with quotes surrounding the lines:
with open('TDAppendlist.txt', 'r') as f:
    with open('TDAppendlist_out.txt', 'w') as f_out:
        for line in f:
            # strip the newline first, so the closing quote stays on the same line
            f_out.write('"{}"\n'.format(line.rstrip('\n')))
This keeps the input file intact as-is, should you need it later, and avoids reading everything in the input file into memory at once.

Extracting a substring from string in Python and putting it to a file

I have a file in the following format
name#company.com, information
name#company2.com, information
....
What I need to do is read in the file and output the email address only to a file. I have the following code created
with open('n-emails.txt') as f:
    lines = f.readlines()
print(lines)
Can someone please show me how to get only the email part of each line and how to output it to a file? This is all being done on a Mac.
Two different ways of doing it:
Without the csv module: read each line, split on commas, strip the blanks, print:
with open('n-emails.txt') as f:
    for line in f:
        toks = line.split(",")
        if toks:
            print(toks[0].strip())
With the csv module: map the opened file onto a csv reader, iterate over the rows, print the first (stripped) column:
import csv

with open('n-emails.txt') as f:
    cr = csv.reader(f, delimiter=",")
    for row in cr:
        print(row[0].strip())
The second method has the advantage of being robust to commas contained in cells, quoted fields, and so on; that's why I recommend it.
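For example, a quoted field containing a comma survives csv parsing but not a naive split (a small illustration with invented data):

```python
import csv
import io

line = '"name#company.com, alias", information\n'

# A naive split breaks the quoted field in two.
naive = line.split(",")
print(len(naive))  # 3 pieces

# csv understands the quoting and keeps the field whole.
row = next(csv.reader(io.StringIO(line)))
print(row[0])  # name#company.com, alias
```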

Writing results into a .txt file

I created a script to take two .txt files, compare them, and export the results to another .txt file. Below is my code (sorry about the mess).
Any ideas? Or am I just an imbecile?
Using Python 3.5.2:
# Barcodes Search (V3actual)

# Import the text files, putting them into arrays/lists
with open('Barcodes1000', 'r') as f:
    barcodes = {line.strip() for line in f}

with open('EANstaging1000', 'r') as f:
    EAN_staging = {line.strip() for line in f}

##diff = barcodes ^ EAN_staging
##print (diff)

in_barcodes_but_not_in_EAN_staging = barcodes.difference(EAN_staging)
print(in_barcodes_but_not_in_EAN_staging)

# Exporting in_barcodes_but_not_in_EAN_staging to a .txt file
with open("BarcodesSearch29_06_16", "wt") as BarcodesSearch29_06_16:  # Create .txt file
    BarcodesSearch29_06_16.write(in_barcodes_but_not_in_EAN_staging)  # Write results to the .txt file
From the comments to your question, it sounds like your issue is that you want to save your set of strings to a file. file.write expects a single string as input, while file.writelines expects an iterable of strings, which is what your data is. Note that writelines does not add line separators itself, so append a newline to each element:
with open("BarcodesSearch29_06_16", "wt") as BarcodesSearch29_06_16:
    BarcodesSearch29_06_16.writelines(line + "\n" for line in in_barcodes_but_not_in_EAN_staging)
That will iterate through your set in_barcodes_but_not_in_EAN_staging and write each element as a separate line in the file BarcodesSearch29_06_16.
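One thing to keep in mind: writelines concatenates the strings exactly as given and inserts no separators, so stripped elements need the newline appended explicitly (a small illustration with invented data):

```python
items = {"123", "456"}

# Add "\n" to each element; writelines will not do it for you.
with open("demo_out.txt", "w") as f:
    f.writelines(item + "\n" for item in sorted(items))

with open("demo_out.txt") as f:
    print(f.read())  # 123 and 456 on separate lines
```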
Try BarcodesSearch29_06_16.write(str(in_barcodes_but_not_in_EAN_staging)). Also, if you're not using a with block, you'll want to close the file after you're done writing to it with BarcodesSearch29_06_16.close().

Reading specific data from a text file

I have a .dat file that contains over 40,000 lines containing text and data. I want to extract specific data from this file according to the following:
I need a line counter, obviously, so I know when I reach the end of the file.
I want to open the file for reading and another for writing, and read the first line. If the line 2 positions from the first line begins with "Model", I want to print a blank line to the file open for writing and then skip two lines ahead in the file. If the line two positions from the opening line does not start with "Model", then I wish to select the text that is 8 positions from this first line and print that to the file opened for writing. I will then move 11 positions from the first line and so on.
infile = open("ratios.dat", "r")
outfile = open("corr_ratios.txt", "w")

for aline in infile:
    items = (aline+2).split()
    if items[0] = "Model"
        outfile.write("\n")
        aline = aline+2
    else
        items = aline+8
        outfile.write(items)
Files in Python are their own iterators and can be worked with / advanced a line at a time like so:
with open('path-to-file.txt') as infile:
    for line in infile:
        ...  # code here to deal with line
Additionally, because the file handle is an iterator, it can be advanced explicitly as well:
with open('path-to-file.txt') as infile:
    for line in infile:
        if condition:
            # skip a line
            next(infile)
Combining the two, you should be able to use lines, skip lines, etc.
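A rough sketch of that pattern applied to the question's "Model" check (the exact layout of ratios.dat is an assumption, so this only shows the skip mechanics, and collect is a name I've introduced):

```python
def collect(lines):
    """Skip the line that follows any line starting with 'Model'."""
    out = []
    it = iter(lines)
    for line in it:
        if line.startswith("Model"):
            next(it, None)  # advance past the following line
        else:
            out.append(line)
    return out

print(collect(["a", "Model X", "skipped", "b"]))  # ['a', 'b']
```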
Having reviewed your posted code more closely: you're attempting to add an integer to a string (aline + 2). To come closer to your attempted approach, you'd actually do something like this:
lines = infile.readlines()
for lineno, line in enumerate(lines):
    targetline = lines[lineno + 2]  # careful: raises IndexError for the last two lines
This approach loads the entire file into memory, which may or may not be suitable depending on your file size.
