Read text file values and assign each value to a variable - python

I want to read a text file, find the words that start with 56 in the text below, and pass each word as a parameter to a python file.
My sample text file content -
51:40:2e:c0:01:c9:53:e8
56:c9:ce:90:4d:77:c6:03
56:c9:ce:90:4d:77:c6:07
51:40:2e:c0:01:c9:54:80
56:c9:ce:90:12:b4:19:01
56:c9:ce:90:12:b4:19:03
I'd like to pass them to the python file as
mytestfile.py var1 var2 var3
var1 should have the value 56:c9:ce:90:4d:77:c6:03
var2 should have the value 56:c9:ce:90:4d:77:c6:07
var3 should have the value 56:c9:ce:90:12:b4:19:01
and so on.
I wrote code something like the below, but it is not working:
#var1 = "51:40:2e:c0:01:c9:53:e8"
#var2 = "51:40:2e:c0:01:c9:53:ea"
filepath = '/root/SDFlex/work/cookbooks/ilorest/files/file.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1

execute "run create volume script" do
  command "python SDFlexnimblevolcreate.py #{var1} #{node['ilorest']['Test0']} #{var2} #{node['ilorest']['Test1']}"
  cwd "#{platformdirectory}"
  live_stream true
end
Thanks in advance

This code prints a string of the required variables:
# a.py
with open('file.txt') as f:
    result = ' '.join(map(lambda x: f'<{x}>',
                          filter(lambda x: x.startswith('56'),
                                 map(str.strip, f))))
print(result, end='')
Output:
<56:c9:ce:90:4d:77:c6:03> <56:c9:ce:90:4d:77:c6:07> <56:c9:ce:90:12:b4:19:01> <56:c9:ce:90:12:b4:19:03>
You can pass the result of this program to another Python program using xargs. For example, if you have a simple program which prints its arguments:
# b.py
import sys
print(sys.argv)
Then just type in the shell:
python3 a.py | xargs python3 b.py
Output:
['b.py', '<56:c9:ce:90:4d:77:c6:03>', '<56:c9:ce:90:4d:77:c6:07>', '<56:c9:ce:90:12:b4:19:01>', '<56:c9:ce:90:12:b4:19:03>']

Are you trying to run your mytestfile.py once for each line that starts with '56:', like this:
python mytestfile.py 56:c9:ce:90:4d:77:c6:03
python mytestfile.py 56:c9:ce:90:4d:77:c6:07
python mytestfile.py 56:c9:ce:90:12:b4:19:01
Or do you want to run it one time only, giving it all values from the file as arguments, e.g., with your example data:
python mytestfile.py 56:c9:ce:90:4d:77:c6:03 56:c9:ce:90:4d:77:c6:07 56:c9:ce:90:12:b4:19:01
If it is the former (one run per matching line), the easiest way is:
grep '^56' textfile | xargs python mytestfile.py
otherwise, you could do this:
python mytestfile.py `grep '^56' textfile`
Note that the latter depends on the file containing no spaces - and it is a bit dangerous if the file is very large (you could run into the command-line length limit).
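If you would rather stay in Python instead of shelling out through xargs, both variants can be sketched with subprocess. This is only a sketch: mytestfile.py is replaced by a tiny python -c stand-in, and the sample file is written out inline so the snippet runs on its own.

```python
import subprocess
import sys

# Sample data from the question, written out so the sketch is self-contained
with open("textfile", "w") as f:
    f.write("51:40:2e:c0:01:c9:53:e8\n"
            "56:c9:ce:90:4d:77:c6:03\n"
            "56:c9:ce:90:4d:77:c6:07\n"
            "51:40:2e:c0:01:c9:54:80\n"
            "56:c9:ce:90:12:b4:19:01\n"
            "56:c9:ce:90:12:b4:19:03\n")

# Equivalent of: grep '^56' textfile
with open("textfile") as f:
    values = [line.strip() for line in f if line.startswith("56")]

# Variant 1: one run per matching line (the stand-in just echoes its argument)
for v in values:
    subprocess.run([sys.executable, "-c", "import sys; print(sys.argv[1])", v],
                   check=True)

# Variant 2: a single run with every value as a separate argument
subprocess.run([sys.executable, "-c", "import sys; print(sys.argv[1:])", *values],
               check=True)
```

Passing the argument list directly avoids the shell entirely, so quoting and word-splitting are not a concern.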

\n not working in the output

Hi, I currently have an output of:
'root:$6$aYGtvxKp/bl6Fv2y$sdZ3FbdJYQlP8VcfFZT.Y67We5EQmqcHW4I9Gl/3pXp8v4.nu9qMIEkmOcdRuD0lBTvEtnMHosEo7OEaYgG4E0::0:99999:7:::\nbin::17110:0:99999:7:::\ndaemon::17110:0:99999:7:::\nadm::17110:0:99999:7:::\nlp::17110:0:99999:7:::\nsync::17110:0:99999:7:::\nshutdown::17110:0:99999:7:::\nhalt::17110:0:99999:7:::\nmail::17110:0:99999:7:::\noperator::17110:0:99999:7:::\ngames::17110:0:99999:7:::\nftp::17110:0:99999:7:::\nnobody::17110:0:99999:7:::\nsystemd-bus-proxy:!!:17572::::::\nsystemd-network:!!:17572::::::\ndbus:!!:17572::::::\npolkitd:!!:17572::::::\ntss:!!:17572::::::\nsshd:!!:17572::::::\npostfix:!!:17572::::::\nchrony:!!:17572::::::\funky:$1$EgZiG263$4W/wMljYzhOqnupg9cJ7W/:17599:0:99999:7:::\n'
From my code:
command = "cat /etc/shadow "
process = os.popen(command)
results = str(process.read())
I'm trying to make it look like the output in the command prompt, where it is in table form, but for some reason when I bring it into python the newline escape "\n" does not work. What is wrong with my code?
You should probably just read the file directly:
filename = '/etc/shadow'
with open(filename) as shadowfile:
    content = shadowfile.read()
    # or possibly lines = shadowfile.readlines()
Did you try printing the output, or did you just look at the contents of the results variable in the interpreter? In the latter case, the line breaks will be shown as \n, while print(results) will produce the result you expect.
l = 'root:$6$aYGtvxKp/bl6Fv2y$sdZ3FbdJYQlP8VcfFZT.Y67We5EQmqcHW4I9Gl/3pXp8v4.nu9qMIEkmOcdRuD0lBTvEtnMHosEo7OEaYgG4E0::0:99999:7:::\nbin::17110:0:99999:7:::\ndaemon::17110:0:99999:7:::\nadm::17110:0:99999:7:::\nlp::17110:0:99999:7:::\nsync::17110:0:99999:7:::\nshutdown::17110:0:99999:7:::\nhalt::17110:0:99999:7:::\nmail::17110:0:99999:7:::\noperator::17110:0:99999:7:::\ngames::17110:0:99999:7:::\nftp::17110:0:99999:7:::\nnobody::17110:0:99999:7:::\nsystemd-bus-proxy:!!:17572::::::\nsystemd-network:!!:17572::::::\ndbus:!!:17572::::::\npolkitd:!!:17572::::::\ntss:!!:17572::::::\nsshd:!!:17572::::::\npostfix:!!:17572::::::\nchrony:!!:17572::::::\funky:$1$EgZiG263$4W/wMljYzhOqnupg9cJ7W/:17599:0:99999:7:::\n'
for i in l.split('\n'):
    print(i)
Output:
root:$6$aYGtvxKp/bl6Fv2y$sdZ3FbdJYQlP8VcfFZT.Y67We5EQmqcHW4I9Gl/3pXp8v4.nu9qMIEkmOcdRuD0lBTvEtnMHosEo7OEaYgG4E0::0:99999:7:::
bin::17110:0:99999:7:::
daemon::17110:0:99999:7:::
adm::17110:0:99999:7:::
lp::17110:0:99999:7:::
sync::17110:0:99999:7:::
shutdown::17110:0:99999:7:::
halt::17110:0:99999:7:::
mail::17110:0:99999:7:::
operator::17110:0:99999:7:::
games::17110:0:99999:7:::
ftp::17110:0:99999:7:::
nobody::17110:0:99999:7:::
systemd-bus-proxy:!!:17572::::::
systemd-network:!!:17572::::::
dbus:!!:17572::::::
polkitd:!!:17572::::::
tss:!!:17572::::::
sshd:!!:17572::::::
postfix:!!:17572::::::
chrony:!!:17572:::::: unky:$1$EgZiG263$4W/wMljYzhOqnupg9cJ7W/:17599:0:99999:7:::
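The point about the interpreter display can be demonstrated in two lines: the interactive interpreter shows a string's repr, which leaves \n as a visible escape, while print() interprets it:

```python
s = "first\nsecond"

# What the REPL would display when you evaluate `s`: the repr, with \n escaped
print(repr(s))

# What print(s) displays: the escape is interpreted, giving two lines
print(s)
```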

Concatenate multiple text files of DNA sequences in Python or R?

I was wondering how to concatenate exon/DNA fasta files using Python or R.
Example files:
So far I have really liked using the R ape package for the cbind method, solely because of the fill.with.gaps=TRUE attribute. I really need gaps inserted when a species is missing an exon.
My code:
ex1 <- read.dna("exon1.txt", format="fasta")
ex2 <- read.dna("exon2.txt", format="fasta")
output <- cbind(ex1, ex2, fill.with.gaps=TRUE)
write.dna(output, "Output.txt", format="fasta")
Example:
exon1.txt
>sp1
AAAA
>sp2
CCCC
exon2.txt
>sp1
AGG-G
>sp2
CTGAT
>sp3
CTTTT
Output file:
>sp1
AAAAAGG-G
>sp2
CCCCCTGAT
>sp3
----CTTTT
So far I am having trouble applying this technique when I have multiple exon files (I'm trying to figure out a loop to open and run the cbind method on all files ending with .fa in the directory), and sometimes not all files have exons that are identical in length - hence DNAbin stops working.
So far I have:
file_list <- list.files(pattern=".fa")
myFunc <- function(x) {
  for (file in file_list) {
    x <- read.dna(file, format="fasta")
    out <- cbind(x, fill.with.gaps=TRUE)
    write.dna(out, "Output.txt", format="fasta")
  }
}
However, when I run this and check my output text file, it misses many exons. I think that is because not all files have the same exon length... or my script is failing somewhere and I can't figure it out :(
Any ideas? I can also try Python.
If you prefer using Linux one-liners, you have:
cat exon1.txt exon2.txt > outfile
if you want only the unique records from the outfile use
awk '/^>/{f=!d[$1];d[$1]=1}f' outfile > sorted_outfile
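The same deduplication can be sketched in Python if that is easier to maintain. Note one difference from the awk one-liner, which keys on the first whitespace-separated field of the header: this sketch deduplicates on the full header line.

```python
def unique_records(lines):
    """Yield FASTA lines, skipping any record whose header was seen before."""
    seen = set()
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = line not in seen  # keep only the first record per header
            seen.add(line)
        if keep:
            yield line

records = [">sp1", "AAAA", ">sp2", "CCCC", ">sp1", "AGG-G"]
print(list(unique_records(records)))  # the second sp1 record is dropped
```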
I just came up with this answer in Python 3:
def read_fasta(fasta):  # Function that reads the files
    output = {}
    for line in fasta.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            active_sequence_name = line[1:]
            if active_sequence_name not in output:
                output[active_sequence_name] = []
            continue
        sequence = line
        output[active_sequence_name].append(sequence)
    return output

with open("exon1.txt", 'r') as file:  # read exon1.txt
    file1 = read_fasta(file.read())
with open("exon2.txt", 'r') as file:  # read exon2.txt
    file2 = read_fasta(file.read())

finaldict = {}  # concatenate the content of both files
for i in list(file1.keys()) + list(file2.keys()):
    if i not in file1.keys():
        file1[i] = ["-" * len(file2[i][0])]
    if i not in file2.keys():
        file2[i] = ["-" * len(file1[i][0])]
    finaldict[i] = file1[i] + file2[i]

with open("output.txt", 'w') as file:  # write the result to output.txt
    for k, i in finaldict.items():
        file.write(">{}\n{}\n".format(k, "".join(i)))  # proper formatting
It's pretty hard to comment on and explain it completely, and it might not help you, but this is better than nothing :P
I used Łukasz Rogalski's code from the answer to Reading a fasta file format into Python dict.
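The same idea extends to any number of exon files. The following is only a sketch, not the answer's code: it assumes every sequence within a file has the same aligned length, tracks the total length accumulated so far, and gap-fills species that are missing from a given file (file discovery could be done with glob in place of R's list.files).

```python
def read_fasta_file(path):
    """Parse one FASTA file into a {name: sequence} dict."""
    seqs, name = {}, None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = ""
            elif line and name is not None:
                seqs[name] += line
    return seqs

def concat_alignments(paths):
    """Concatenate alignments, gap-filling species missing from a file."""
    combined, total = {}, 0  # total = alignment length accumulated so far
    for path in paths:
        seqs = read_fasta_file(path)
        width = len(next(iter(seqs.values())))  # this file's alignment length
        for name in set(combined) | set(seqs):
            prefix = combined.get(name, "-" * total)  # new species: pad the past
            combined[name] = prefix + seqs.get(name, "-" * width)  # absent here: pad now
        total += width
    return combined

# Demo with the example data from the question
with open("exon1.txt", "w") as f:
    f.write(">sp1\nAAAA\n>sp2\nCCCC\n")
with open("exon2.txt", "w") as f:
    f.write(">sp1\nAGG-G\n>sp2\nCTGAT\n>sp3\nCTTTT\n")
result = concat_alignments(["exon1.txt", "exon2.txt"])
print(result)  # sp3 comes out as ----CTTTT
```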

bash 4.4 inside python with os.system

I have problems running a bash script inside a python script, script.py:
import os
bashCommand = """
sed "s/) \['/1, color=\"#ffcccc\", label=\"/g" list.txt | sed 's/\[/ GraphicFeature(start=/g' | sed 's/\:/, end=/g' | sed 's/>//g' | sed 's/\](/, strand=/g' | sed "s/'\]/\"),/g" >list2.txt"""
os.system("bash %s" % bashCommand)
When I run this as python script.py, no list2.txt is written, but on the terminal I see that I am inside bash-4.4 instead of the native macOS bash.
Any ideas what could cause this?
The script I posted above is part of a bigger script, which first reads in some file and outputs list.txt.
Edit: here is some more description.
In a first python script, I parsed a file (a genbank file, to be specific) to write out a list of items (location, strand, name) into list.txt.
This list.txt has to be transformed to be parsable by a second python script, hence the sed.
list.txt
[0:2463](+) ['bifunctional aspartokinase/homoserine dehydrogenase I']
[2464:3397](+) ['Homoserine kinase']
[3397:4684](+) ['Threonine synthase']
All the brackets, colons, and quotes have to be replaced so it looks like the desired output, list2.txt:
GraphicFeature(start=0, end=2463, strand=+1, color="#ffcccc", label="bifunctional aspartokinase/homoserine dehydrogenase I"),
GraphicFeature(start=2464, end=3397, strand=+1, color="#ffcccc", label="Homoserine kinase"),
GraphicFeature(start=3397, end=4684, strand=+1, color="#ffcccc", label="Threonine synthase"),
Read the file in Python, parse each line with a single regular expression, and output an appropriate line constructed from the captured pieces.
import re
import sys
# 1 2 3
# --- --- --
regex = re.compile(r"^\[(\d+):(\d+)\]\(\+\) \['(.*)'\]$")
# 1 - start value
# 2 - end value
# 3 - text value
with open("list2.txt", "w") as out:
    for line in sys.stdin:
        line = line.strip()
        m = regex.match(line)
        if m is None:
            print(line, file=out)
        else:
            print('GraphicFeature(start={}, end={}, strand=+1, color="#ffcccc", label="{}"),'.format(*m.groups()), file=out)
I output lines that don't match the regular expression unmodified; you may want to ignore them altogether or report an error instead.
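To see what the regular expression captures, it can be exercised on one of the sample lines from list.txt; the three groups then slot straight into the format string:

```python
import re

regex = re.compile(r"^\[(\d+):(\d+)\]\(\+\) \['(.*)'\]$")

# One of the sample lines from the question
m = regex.match("[2464:3397](+) ['Homoserine kinase']")
print(m.groups())  # ('2464', '3397', 'Homoserine kinase')
```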

I need to replace a string in one file using key-value pairs from another file

I have a single attribute file that has two columns. The string in column 1 matches the string in the files that need to be changed. The string in file 2 needs to become the string in file 1, column 2.
I'm not sure of the best way to approach this - sed? awk? There is only a single File 1, which has every key-value pair, and they are all unique. There are over 10,000 File 2s, each different but with the same format, in which I need to change the numbers to the names. Every number in any of the File 2s will be in File 1.
File 1
1000079541 ALBlai_CCA27168
1000079542 ALBlai_CCA27169
1000082614 PHYsoj_128987
1000082623 PHYsoj_128997
1000112581 PHYcap_Phyca_508162
1000112588 PHYcap_Phyca_508166
1000112589 PHYcap_Phyca_508170
1000112592 PHYcap_Phyca_549547
1000120087 HYAara_HpaP801280
1000134210 PHYinf_PITG_01218T0
1000134213 PHYinf_PITG_01223T0
1000134221 PHYinf_PITG_01231T0
1000144497 PHYinf_PITG_13921T0
1000153541 PYTultPYU1_T002777
1000162512 PYTultPYU1_T013706
1000163504 PYTultPYU1_T014907
1000168326 PHYram_79731
1000168327 PHYram_79730
1000168332 PHYram_79725
1000168335 PHYram_79722
...
File 2
(1000079542:0.60919245567850022205,((1000162512:0.41491233674846345059,(1000153541:0.39076742568979516701,1000163504:0.52813999143574519302):0.14562273102476630537):0.28880212838980307000,(((1000144497:0.20364901110426453235,1000168327:0.22130795712572320921):0.35964649479701132906,((1000120087:0.34990382691181332042,(1000112588:0.08084123331549526725,(1000168332:0.12176200773214326811,1000134213:0.09481932223544080329):0.00945982345360765406):0.01846847662360769429):0.19758412044470402558,((1000168326:0.06182031367986642878,1000112589:0.07837371928562210377):0.03460740736793390532,(1000134210:0.13512192366876615846,(1000082623:0.13344777464787777044,1000112592:0.14943677128375676411):0.03425386814075986885):0.05235436818005634318):0.44112430521695145114):0.21763784827666701749):0.22507080810857052477,(1000112581:0.02102132893524749635,(1000134221:0.10938436290969000275,(1000082614:0.05263067805665807425,1000168335:0.07681947209386902342):0.03562545894572662769):0.02623229853693959113):0.49114147006852687527):0.23017851954961116023):0.64646763541457552549,1000079541:0.90035900920746847476):0.0;
Desired Result
(ALBlai_CCA27169:0.60919245567850022205,((PYTultPYU1_T013706:0.41491233674846345059, ...
Python:
import re

# Build a dictionary of replacements:
with open('File 1') as f:
    repl = dict(line.split() for line in f)

# Read in the file and make the replacements:
with open('File 2') as f:
    data = f.read()
data = re.sub(r'(\d+):', lambda m: repl[m.group(1)] + ':', data)

# Write it back out:
with open('File 2', 'w') as f:
    f.write(data)
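The substitution can be checked in memory before touching the real files. Here a two-entry mapping stands in for File 1 and a shortened tree for File 2; only numbers immediately followed by a colon (the IDs, not the branch lengths) are rewritten:

```python
import re

repl = {"1000079541": "ALBlai_CCA27168", "1000079542": "ALBlai_CCA27169"}
tree = "(1000079542:0.609,1000079541:0.900):0.0;"

# Branch lengths like 0.609 are untouched: they are not followed by ':'
out = re.sub(r"(\d+):", lambda m: repl[m.group(1)] + ":", tree)
print(out)  # (ALBlai_CCA27169:0.609,ALBlai_CCA27168:0.900):0.0;
```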
Full running awk solution. Hope it helps.
awk -F":" 'BEGIN {
    while (getline < "file1") {
        split($0, dat, " ");
        a[dat[1]] = dat[2];
    }
}
{
    gsub(substr($1, 2, length($1)), a[substr($1, 2, length($1))], $0); print
}' file2
I'll do something like that in bash:
while read -r key value
do
    echo "s/($key:/($value:/g" >> sedtmpfile
done < file1
sed -f sedtmpfile file2 > result
rm sedtmpfile

grep in python properly

I am used to scripting in bash, but I am also learning python.
So, as a way of learning, I am trying to rewrite a few of my old bash scripts in python. Say I have a file with lines like:
TOTDOS= 0.38384E+02n_Ef= 0.81961E+02 Ebnd 0.86883E+01
to get the value of TOTDOS in bash, I just do:
grep "TOTDOS=" 630/out-Dy-eos2|head -c 19|tail -c 11
but by python, I am doing:
#!/usr/bin/python3
import re
import os.path
import sys
f1 = open("630/out-Dy-eos2", "r")
re1 = r'TOTDOS=\s*(.*)n_Ef=\s*(.*)\sEbnd'
for line in f1:
    match1 = re.search(re1, line)
    if match1:
        TD = match1.group(1)
f1.close()
print(TD)
This surely gives the correct result, but it seems like much more work than the bash version (not to mention the trouble with the regex).
The question is: am I overworking this in python, or am I missing something?
A python script that matches your bash line would be more like this:
with open('630/out-Dy-eos2', 'r') as f1:
    for line in f1:
        if "TOTDOS=" in line:
            print(line[8:19])
Looks a little bit better now.
[...] but seems to be much more than bash
Maybe (?) generators are the closest Python concept to the "pipe filtering" used in shell.
import itertools

# Simple generator to iterate through a file,
# the equivalent of line-by-line reading from an input file
def source(fname):
    with open(fname, "r") as f:
        for l in f:
            yield l

src = source("630/out-Dy-eos2")

# First filter to keep only lines containing the required word,
# equivalent to `grep -F`
filter1 = (l for l in src if "TOTDOS=" in l)

# Second filter to keep only lines in the required range,
# equivalent of `head -n ... | tail -n ...`
filter2 = itertools.islice(filter1, 10, 20, 1)

# Finally output
output = "".join(filter2)
print(output)
Concerning your specific example, if you need it, you could use a regexp in a generator:
re1 = r'TOTDOS=\s*(.*)n_Ef=\s*(.*)\sEbnd'
filter1 = (m.group(1) for m in (re.match(re1, l) for l in src) if m)
Those are only (some of the) basic building blocks available to you.
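For the specific TOTDOS case, those building blocks collapse into a couple of lines. This sketch uses an in-memory sample line (copied from the question) in place of the file, and a slightly different regex of my own (lazy non-space match up to n_Ef=), so the idea stands alone:

```python
import re

# Stand-in for the file: the sample line from the question
lines = ["TOTDOS=  0.38384E+02n_Ef=  0.81961E+02 Ebnd  0.86883E+01"]

pattern = re.compile(r"TOTDOS=\s*(\S+?)n_Ef=")
# next() pulls the first match out of the generator pipeline
totdos = next(pattern.search(l).group(1) for l in lines if "TOTDOS=" in l)
print(totdos)  # 0.38384E+02
```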
