Save rows of a DataFrame to separate txt files - python

Output displayed on the terminal screen has to be saved into separate txt files. The terminal output looks like this:
A1001
2
B1001
6
C1001
4
which is produced inside a while loop by the commands:
print(apartmentno)
print(df)
How can I print each pair of output into a text file named output1.txt, output2.txt, output3.txt.... outputn.txt?
Expected output:
output1.txt
A1001
2
output2.txt
B1001
6
output3.txt
C1001
4
So far I have tried the following code:
.
.
.
df = disp_date.ClassAttend.sum()
print(apartmentno)
print(final_df)
z = []
z.append(apartmentno)
z.append(df)
print(z)
ctr = 0
for i in z:
    ctr += 1
    f = open('d:\\output' + str(ctr) + '.txt', mode='w')
    f.write(apartmentno)
    f.write(df)
    f.close()
Also, I get an error FileNotFoundError: [Errno 2] No such file or directory although I am opening in 'w' mode.
How can I get my required output?

It is not entirely clear to me how you are constructing your loop, but hopefully the following information is helpful to you.
You can use enumerate for the index numbers in your loop, and use the with keyword when dealing with file objects. This way you can create your files. Regarding with:
The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point.
As of Python 3.6, you can also use f-strings, which makes the construction of your filename more readable:
for index, value in enumerate(z):
    with open(f'output{index}.txt', 'w') as f:
        f.write(str(value))  # write() only accepts strings, so convert first
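Note that the loop above writes one value per file, while your expected output pairs the apartment number with its total. A minimal sketch of the paired version, assuming you collect (apartmentno, total) tuples in your while loop (the sample values are from your question):
pairs = [('A1001', 2), ('B1001', 6), ('C1001', 4)]  # collected inside your while loop

for index, (apartmentno, total) in enumerate(pairs, start=1):
    with open(f'output{index}.txt', 'w') as f:
        f.write(f'{apartmentno}\n{total}\n')  # the f-string converts the number to text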

Related

Using python to search for strings in a file and use the output to group the content of a second file

I tried writing Python code that searches for one or more strings in File1.txt and then modifies the findall output (e.g., changes cap00001 to 1). Next, the code uses the modified output to group the content of File2.txt based on matches to the "capNo" column in File2.txt.
File1.txt:
>cap00001 supr2
x2shh qewrrw
dsfff rggfdd
>cap00002 supr5
dadamic adertsy
waeee ddccmet
File2.txt
Ref capNo qual
AM1 1 Good
AM8 1 Good
AM7 2 Poor
AM2 2 Good
AM9 2 Good
AM6 3 Poor
AM1 3 Poor
AM2 3 Good
Require output:
capNo counts
1 2
2 3
The following code did not work for me:
import re
With open("File1.txt","r") as InFile1:
for line in InFile1:
match=re.findall(r'cap\d+',line)
if len(match) > 0:
match=match.remove(cap0000)
With open("File2.txt","r") as InFile2:
df=InFile2.read()
df2=df.groupby(match)["capNo"].value_counts()
print(df2)
How can I get this code working? Thanks
Change the Withs to with.
Call the read function, e.g.:
with open('File1.txt') as f:
    InFile1 = f.read()
    # Do something with InFile1
In your code df is a string - you can't call groupby on it (did you mean to convert it to a pandas DataFrame?)
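For the whole task, a minimal sketch of one way to do it with pandas, assuming File2.txt is whitespace-delimited as shown and you only want counts for the caps that appear in File1.txt:
import re
import pandas as pd

with open("File1.txt") as infile1:
    # 'cap00001' -> 1, 'cap00002' -> 2, ...
    caps = [int(num) for num in re.findall(r"cap0*(\d+)", infile1.read())]

df = pd.read_csv("File2.txt", sep=r"\s+")  # columns: Ref, capNo, qual
counts = df[df["capNo"].isin(caps)]["capNo"].value_counts().sort_index()
print(counts)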

searching specific values from one file in another file using a nested for loop [duplicate]

This question already has an answer here:
Script skips second for loop when reading a file
(1 answer)
Closed 2 years ago.
I have two files: file A.txt has hundreds of rows in the format (ip,mac), and file B.txt has hundreds of rows in the format (mac). What I am looking for is to search for the macs from file B in file A and, if found, print the matching (ip,mac) line from file A. There are more than 100 mac matches between the two files, but the code I wrote returns only the first match.
Below is my simple code:
with open("B.txt", "r") as out_mac_file, open("A.txt", "r") as out_arp_file:
for x in out_mac_file:
for y in out_arp_file:
if x in y:
print(y)
Any idea what could be wrong with the code, or whether there are other ways to do this?
Edit: Adding the format of file A and file B
File B
64167f18cd3d
64167f18c77a
64167f067082
64167f0670b5
64167f067400
64167f0674e5
64167f06740d
File A
10.55.14.160,64167f869f18
10.55.20.59,64167f37a5f4
10.55.20.62,64167f8866e0
10.55.20.65,64167f8b4bd8
10.55.20.66,64167f372a72
10.55.20.67,64167f371436
If you are OK with using pandas (since your data is in comma-separated format):
import pandas as pd
a = pd.read_csv("A.txt", header=None, names=["ip", "mac"])  # A.txt holds (ip,mac)
b = pd.read_csv("B.txt", header=None, names=["mac"])        # B.txt holds (mac)
for mac in b["mac"]:
    result = a[a["mac"] == mac]
    if len(result) > 0:
        print(result)
Or just a one-liner instead of the loop:
b.merge(a, on="mac")

Concatenate multiple text files of DNA sequences in Python or R?

I was wondering how to concatenate exon/DNA fasta files using Python or R.
So far I have really liked using the R ape package's cbind method, solely because of its fill.with.gaps=TRUE option. I really need gaps inserted when a species is missing an exon.
My code:
ex1 <- read.dna("exon1.txt", format="fasta")
ex2 <- read.dna("exon2.txt", format="fasta")
output <- cbind(ex1, ex2, fill.with.gaps=TRUE)
write.dna(output, "Output.txt", format="fasta")
Example:
exon1.txt
>sp1
AAAA
>sp2
CCCC
exon2.txt
>sp1
AGG-G
>sp2
CTGAT
>sp3
CTTTT
Output file:
>sp1
AAAAAGG-G
>sp2
CCCCCTGAT
>sp3
----CTTTT
So far I am having trouble applying this technique to multiple exon files (I am trying to figure out a loop that opens and cbinds all files ending with .fa in the directory), and sometimes the exons are not all identical in length, so DNAbin stops working.
So far I have:
file_list <- list.files(pattern=".fa")
myFunc <- function(x) {
    for (file in file_list) {
        x <- read.dna(file, format="fasta")
        out <- cbind(x, fill.with.gaps=TRUE)
        write.dna(out, "Output.txt", format="fasta")
    }
}
However, when I run this and check my output text file, it is missing many exons. I think that is because not all files have the same exon length... or my script is failing somewhere and I can't figure out where. :(
Any ideas? I can also try Python.
If you prefer Linux one-liners, you have:
cat exon1.txt exon2.txt > outfile
If you want only the unique records from the outfile, use:
awk '/^>/{f=!d[$1];d[$1]=1}f' outfile > sorted_outfile
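That awk one-liner prints each record only the first time its header (the first whitespace-delimited field) appears, dropping later duplicates. If you would rather stay in Python, a rough equivalent might look like this (a sketch, assuming headers start with >):
seen = set()
keep = False
with open("outfile") as src, open("sorted_outfile", "w") as dst:
    for line in src:
        if line.startswith(">"):
            header = line.split()[0]
            keep = header not in seen  # keep only the first record per header
            seen.add(header)
        if keep:
            dst.write(line)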
I just came up with this answer in Python 3:
def read_fasta(fasta):  # function that parses a fasta string into a dict
    output = {}
    for line in fasta.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            active_sequence_name = line[1:]
            if active_sequence_name not in output:
                output[active_sequence_name] = []
            continue
        sequence = line
        output[active_sequence_name].append(sequence)
    return output

with open("exon1.txt", 'r') as file:  # read exon1.txt
    file1 = read_fasta(file.read())
with open("exon2.txt", 'r') as file:  # read exon2.txt
    file2 = read_fasta(file.read())

finaldict = {}  # concatenate the contents of both files
for i in list(file1.keys()) + list(file2.keys()):
    if i not in file1.keys():
        file1[i] = ["-" * len(file2[i][0])]
    if i not in file2.keys():
        file2[i] = ["-" * len(file1[i][0])]
    finaldict[i] = file1[i] + file2[i]

with open("output.txt", 'w') as file:  # write the result to output.txt
    for k, i in finaldict.items():
        file.write(">{}\n{}\n".format(k, "".join(i)))  # proper fasta formatting
It's pretty hard to comment and explain it completely, and it might not help you, but this is better than nothing :P
I used Łukasz Rogalski's code from the answer to Reading a fasta file format into Python dict.
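If you need to handle more than two exon files, here is a minimal sketch that generalizes the same idea; it reuses read_fasta from above and assumes every sequence within one exon file has the same aligned length:
import glob

merged = {}    # species -> concatenated sequence
total_len = 0  # combined length of the exons processed so far
for path in sorted(glob.glob("exon*.txt")):
    with open(path) as fh:
        records = {k: "".join(v) for k, v in read_fasta(fh.read()).items()}
    exon_len = len(next(iter(records.values())))
    for name in set(merged) | set(records):
        prefix = merged.get(name, "-" * total_len)                  # gap-fill species absent earlier
        merged[name] = prefix + records.get(name, "-" * exon_len)   # gap-fill this exon
    total_len += exon_len

with open("Output.txt", "w") as out:
    for name, seq in merged.items():
        out.write(">{}\n{}\n".format(name, seq))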

How to get os.system() output as a string and not a set of characters? [duplicate]

This question already has an answer here:
How can I make a for-loop loop through lines instead of characters in a variable?
(1 answer)
Closed 6 years ago.
I'm trying to get output from os.system using the following code:
p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
ls = p.communicate()[0]
when I print the output I get:
> print (ls)
file1.txt
file2.txt
The output displays as two separate strings. However, when I try to print the filenames using a for loop, I get a list of characters instead:
for i in range(len(ls)):
    print i, ls[i]
Output:
0 f
1 i
2 l
3 e
4 1
5 .
6 t
7 x
8 t
9 f
10 i
11 l
12 e
13 2
14 .
15 t
16 x
17 t
I need help ensuring the os.system() output returns as strings and
not a set of characters.
p.communicate returns a string. It may look like a list of filenames, but it is just a string. You can convert it to a list of filenames by splitting on the newline character:
s = p.communicate()[0]
for line in s.split("\n"):
    print "line:", line
Are you aware that there are built-in functions to get a list of files in a directory?
for i in range(len(...)): is usually a code smell in Python. If you want to iterate over the numbered elements of a collection, the canonical method is for i, element in enumerate(...):.
The code you quote clearly isn't the code you ran, since when you print ls you see two lines separated by a newline, but when you iterate over the characters of the string the newline doesn't appear.
The bottom line is that you are getting a string back from communicate()[0], but you are then iterating over it, giving you the individual characters. I suspect what you would like to do is use the .split() or .splitlines() method on ls to get the individual file names, but you are trying to run before you can walk. First of all, get a clear handle on what the communicate method is returning to you.
Apparently, in Python 3.6, p.communicate returns a bytes object:
In [16]: type(ls)
Out[16]: bytes
Following seems to work better:
In [22]: p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
In [23]: ls = p.communicate()[0].split()
In [25]: for i in range(len(ls)):
    ...:     print(i, ls[i])
    ...:
0 b'file1.txt'
1 b'file2.txt'
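If you want real strings rather than bytes, you can decode the output before splitting it into lines; a minimal sketch:
p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
out = p.communicate()[0].decode("utf-8")  # bytes -> str
for i, name in enumerate(out.splitlines()):
    print(i, name)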
But I would rather use os.listdir() instead of subprocess:
import os
for line in os.listdir():
    print(line)

matching and displaying specific lines through python

I have 15 lines in a log file, and I want to read, for example, the 4th and 10th lines through Python and display them in the output, saying the string is found:
abc
def
aaa
aaa
aasd
dsfsfs
dssfsd
sdfsds
sfdsf
ssddfs
sdsf
f
dsf
s
d
Please suggest code showing how to achieve this in Python.
Just to elaborate more on this example: the first string (or line) is unique and can be found easily in the log file. The next string, B, occurs within 40 lines of the first one, but it also occurs in lots of other places in the log file, so I need to search for this string within the first 40 lines after finding string A, and print that these strings were found.
Also, I can't use Python's with statement, as it gives me warnings like "'with' will become a reserved keyword in Python 2.6". I am using Python 2.5.
You can use this:
fp = open("file")
for i, line in enumerate(fp):
if i == 3:
print line
elif i == 9:
print line
break
fp.close()
def bar(start, end, search_term):
    with open("foo.txt") as fil:
        lines = [line.strip() for line in fil.readlines()[start:end]]
        if search_term in lines:
            print search_term + " was found"

>>> bar(4, 10, "dsfsfs")
"dsfsfs was found"
#list of random characters
from random import randint
a = list(chr(randint(0,100)) for x in xrange(100))
#look for this
lookfor = 'b'
for element in xrange(100):
    if lookfor == a[element]:
        print a[element], 'on', element
#b on 33
#b on 34
This is one simple, easy-to-read way to do it. Can you give part of your log file as an example? There are other ways that may work better :).
After the author's edits:
The easiest thing you can do then is:
looking_for = 'findthis'
i = 1
for line in open('filename.txt', 'r'):
    if looking_for == line.strip():  # strip the newline so the comparison can match
        print i, line
    i += 1
it's efficient and easy :)
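None of the snippets above address the follow-up requirement (the unique string A, then string B somewhere in the next 40 lines). A minimal sketch that stays Python 2.5-friendly by avoiding with; find_pair and the example arguments are hypothetical:
def find_pair(path, a, b, window=40):
    f = open(path)  # no 'with' here, so this works on Python 2.5
    try:
        lines = f.readlines()
    finally:
        f.close()
    for i, line in enumerate(lines):
        if a in line:
            print "string A found on line", i + 1
            # only scan the window of lines that follow the unique string A
            for j in range(i + 1, min(i + 1 + window, len(lines))):
                if b in lines[j]:
                    print "string B found on line", j + 1
                    return True
    return False

find_pair("logfile.txt", "abc", "ssddfs")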
