Python Classify commands - python

I am writing a Python script that classifies IP addresses by country, using groups defined in another file. For example, I have two files in the script directory.
IPCountries.txt contains :-
192.168.1.1 | US,
188.100.0.0 | AU,
and the file arrange.txt contains :-
0="US,CA,UK,GE,"
1="AU,EG,"
The script should read each line of IPCountries.txt, take the value after "|" (for example "US"), match it against the values in arrange.txt, and write it to a new file named after the matching group (0.txt for "US").
The problem is that I do not know how to do this. I have pieced together the following code, but I am stuck at the loop at the end, as you can see here:
import re
import os
filepath = 'arrange.txt'
with open(filepath) as file:
    txt = file.read()
mapping = re.findall(r'(\d+)="(.*)"', txt)
ip = open("IPCountries.txt",'r')
for line in ip:
    pass  # this is where I am stuck
Any help with the loop, or a suggestion for how to do this with the same approach and files?
Thanks

You could use something like
for line in ip:
    address, country = [e.strip() for e in line.split("|")]
    country = country.rstrip(",")  # strip off the comma at the end
I'm not sure what you intend to do with these variables, but the basic extraction process could look like my example code.
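Building on the extraction above, the whole classification could be sketched as follows (Python 3). This is only one possible reading of the question: it assumes each IPCountries.txt line should end up in a file named after its group number from arrange.txt (0.txt, 1.txt, and so on). The function names parse_groups, classify and write_groups are made up for this illustration.

```python
import re
from collections import defaultdict


def parse_groups(arrange_text):
    """Map each country code to its group id, e.g. {'US': '0', 'AU': '1'}."""
    country_to_group = {}
    for group_id, codes in re.findall(r'(\d+)="(.*?)"', arrange_text):
        for code in codes.split(","):
            code = code.strip()
            if code:  # skip the empty piece after the trailing comma
                country_to_group[code] = group_id
    return country_to_group


def classify(ip_lines, country_to_group):
    """Group the 'address | CC,' lines by the group id of their country."""
    grouped = defaultdict(list)
    for line in ip_lines:
        if "|" not in line:
            continue  # ignore blank or malformed lines
        country = line.split("|", 1)[1].strip().rstrip(",")
        group_id = country_to_group.get(country)
        if group_id is not None:  # countries missing from arrange.txt are skipped
            grouped[group_id].append(line.strip())
    return dict(grouped)


def write_groups(grouped):
    """Write each group's lines to '<group id>.txt' (assumed output naming)."""
    for group_id, lines in grouped.items():
        with open(group_id + ".txt", "w") as out:
            out.write("\n".join(lines) + "\n")


# Sample data copied from the question.
arrange_text = '0="US,CA,UK,GE,"\n1="AU,EG,"\n'
ip_lines = ["192.168.1.1 | US,\n", "188.100.0.0 | AU,\n"]

groups = classify(ip_lines, parse_groups(arrange_text))
print(groups)
```

With the real files, arrange_text would come from reading arrange.txt, the open IPCountries.txt file could be passed as ip_lines, and write_groups(groups) would create the output files.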

Related

Python: Using a variable in a filename output [duplicate]

Sorry for this very basic question. I am new to Python and trying to write a script which can print the URL links. The IP addresses are stored in a file named list.txt. How should I use the variable in the link? Could you please help?
# cat list.txt
192.168.0.1
192.168.0.2
192.168.0.9
script:
import sys
import os
file = open('/home/list.txt', 'r')
for line in file.readlines():
    source = line.strip('\n')
    print source
link = "https://(source)/result"
print link
output:
192.168.0.1
192.168.0.2
192.168.0.9
https://(source)/result
Expected output:
192.168.0.1
192.168.0.2
192.168.0.9
https://192.168.0.1/result
https://192.168.0.2/result
https://192.168.0.9/result
You need to pass the actual variable into the string. You can iterate over the file object directly, so you don't need readlines, and opening the file in a with statement closes it automatically. The print also has to be inside the loop if you want to see each line, and str.rstrip() removes the newline from the end of each line:
with open('/home/list.txt') as f:
    for ip in f:
        print "https://{0}/result".format(ip.rstrip())
If you want to store all the links, use a list comprehension:
with open('/home/list.txt') as f:
    links = ["https://{0}/result".format(ip.rstrip()) for ip in f]
On Python 2.6 you have to pass the numeric index of a positional argument, i.e. {0}, when using str.format.
You can also pass names to str.format:
with open('/home/list.txt') as f:
    for ip in f:
        print "https://{ip}/result".format(ip=ip.rstrip())
Build the link inside the loop. You are not appending data to it; you are assigning to it on every iteration. Use something like this:
file = open('/home/list.txt', 'r')
for line in file.readlines():
    source = line.strip('\n')
    print source
    link = "https://%s/result" %(source)
    print link
Try this:
with open('/home/list.txt') as file:
    lines = [line.strip('\n') for line in file]
for source in lines:
    print source
for source in lines:
    link = "https://{}/result".format(source)
    print link
The feature you just described is often called string interpolation.
In Python, this is called string formatting.
There are two styles of string formatting in Python: the old style and the new style.
What I've shown in the example above is the new style, which formats with the string method format.
The old style uses the % operator, e.g. "https://%s/result" % source
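On Python 3.6 or later there is a third option, the f-string, which puts the variable directly inside the string literal. A minimal sketch, assuming the addresses have already been read into a list; build_links is a name made up for this example:

```python
def build_links(lines):
    """Return one URL per non-empty line, with surrounding whitespace removed."""
    links = []
    for line in lines:
        source = line.strip()
        if source:
            links.append(f"https://{source}/result")
    return links


# The three addresses from list.txt in the question.
sample = ["192.168.0.1\n", "192.168.0.2\n", "192.168.0.9\n"]
for link in build_links(sample):
    print(link)
```

An open file object can be passed in place of sample, since the function only iterates over lines.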
Use a format specifier for the string, and put the link-printing section inside the for loop,
something like this:
import sys
import os
file = open('/home/list.txt', 'r')
for line in file.readlines():
    source = line.strip('\n')
    print source
    link = "https://%s/result"%source
    print link
import sys
import os
file = open('/home/list.txt', 'r')
for line in file.readlines():
    source = line.strip('\n')
    print source
    link = "https://" + str(source) + "/result"
    print link

how to save your edited headers in original fasta files?

Hi, I am trying to edit the headers of my FASTA files using seqkit. I have been able to do it, but I am not able to save the result.
This is the command I am using to edit multiple FASTA files (from RefSeq) according to their filenames:
for i in $(find -name \genomid); do seqkit replace -p "^(.+?) (.+?)$" --replacement '{kv}' -k proid_unique *.faa; done
The directory having all my fasta files is like this-
PATH:
~/PANGENOMICS/DATA1/test
FILES in the directory:
GCF_000016305.1_ASM1630v1_protein.faa
GCF_000220485.1_ASM22048v1_protein.faa
GCF_900635735.1_32875_B01_protein.faa
proid_unique
genomid
I am finding the filenames using a list file, genomid:
GCF_900635735.1_32875_B01_protein.faa:WP_151362402.1:WP_151362402.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362403.1:WP_151362403.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362404.1:WP_151362404.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362405.1:WP_151362405.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362406.1:WP_151362406.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362407.1:WP_151362407.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362408.1:WP_151362408.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362409.1:WP_151362409.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362410.1:WP_151362410.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362411.1:WP_151362411.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362412.1:WP_151362412.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362413.1:WP_151362413.1#0940
GCF_900635735.1_32875_B01_protein.faa:WP_151362414.1:WP_151362414.1#0940
The file (proid_unique) that I used as the key-value file to edit the FASTA headers looks like this:
WP_151362399.1 WP_151362399.1#0940
WP_151362400.1 WP_151362400.1#0940
WP_151362401.1 WP_151362401.1#0940
WP_151362402.1 WP_151362402.1#0940
WP_151362409.1 WP_151362409.1#0940
WP_151362410.1 WP_151362410.1#0940
WP_151362411.1 WP_151362411.1#0940
WP_151362412.1 WP_151362412.1#0940
WP_151362413.1 WP_151362413.1#0940
WP_151362414.1 WP_151362414.1#0940
WP_094096600.1 WP_094096600.1#0945
WP_016530940.1 WP_016530940.1#0950
WP_000940121.1 WP_000940121.1#0951
WP_012540940.1 WP_012540940.1#0951
example of input-
>WP_151362411.1 YoaH family protein [Klebsiella pneumoniae]
MYAPQCSRSKRCFAGLPSLSHEQQQQAVERIHELMAQGISSGQAIALVAEELRATHTGEQ
IVARFEDEDEDE
>WP_151362412.1 gamma-glutamylcyclotransferase [Klebsiella pneumoniae]
MLEAIGGEWRPGYVTGTFYARGWGAAADFPGIVLDAHGPRVNGYLFLSDRLARTGPCWTT
LRRGYDRVPVEVTTDDGQQISAWIYQLQPRG
>WP_151362413.1 acid resistance repetitive basic protein Asr [Klebsiella pneumoniae]
MKKVLALVVAAAMGLSSVAFAADAASTTPSAAASHTTVHHKKHHKAAAKPAAEQKAQAAK
KHHKTAAKTGSRAESAGCKETS
>WP_151362414.1 ABC transporter permease [Klebsiella pneumoniae]
MKRAPWYLRLATWGGVIFLHFPLLIIAIYAFNTEDAAFSFPPQGLTLRWFSEAAGRSDIL
QAVTLSLKIAALSTAIALVLGTLAAGALWRSAFFGKNAVSLLLLLPIALPGIITGLALLT
AFKAVGLEPGLLTIVVGHATFCVVVVFNNVIARFRRTSWSMVEASMDLGATGWQTFRYVV
LPNLGSALLAGGMLAFALSFDEIIVTTFTAGHERTLPLWLLNQLGRPRDVPVTNVVALLV
MLVTTIPILGAWWLTRDGDSDAGNGK
example of output- expected and correct with above command
>WP_151362411.1#0940
MYAPQCSRSKRCFAGLPSLSHEQQQQAVERIHELMAQGISSGQAIALVAEELRATHTGEQ
IVARFEDEDEDE
>WP_151362412.1#0940
MLEAIGGEWRPGYVTGTFYARGWGAAADFPGIVLDAHGPRVNGYLFLSDRLARTGPCWTT
LRRGYDRVPVEVTTDDGQQISAWIYQLQPRG
>WP_151362413.1#0940
MKKVLALVVAAAMGLSSVAFAADAASTTPSAAASHTTVHHKKHHKAAAKPAAEQKAQAAK
KHHKTAAKTGSRAESAGCKETS
>WP_151362414.1#0940
MKRAPWYLRLATWGGVIFLHFPLLIIAIYAFNTEDAAFSFPPQGLTLRWFSEAAGRSDIL
QAVTLSLKIAALSTAIALVLGTLAAGALWRSAFFGKNAVSLLLLLPIALPGIITGLALLT
AFKAVGLEPGLLTIVVGHATFCVVVVFNNVIARFRRTSWSMVEASMDLGATGWQTFRYVV
LPNLGSALLAGGMLAFALSFDEIIVTTFTAGHERTLPLWLLNQLGRPRDVPVTNVVALLV
MLVTTIPILGAWWLTRDGDSDAGNGK
I am getting the expected result, but the edits are not saved by this command. Can someone help me figure out how to save these edits in the original files? When I open the files again, they are the same as before, with no edited headers.
A Python alternative to the above command would also be helpful.
Assuming:
The relevant files are proid_unique and the *.faa files in the current directory.
We want to replace the *.faa files by editing the header lines according to the key-value pairs described in proid_unique.
We can ignore the genomid file for now.
As I'm not familiar with the seqkit command, here is a Python alternative:
#!/usr/bin/python
import glob
import os
with open('proid_unique') as f:  # open the key-value file
    # create a dictionary of key-value pairs to edit
    m = {k: v for k, v in [line.split() for line in f]}
for fasta in glob.glob('*.faa'):
    org = fasta + '.O'  # backup filename appending '.O' suffix
    os.rename(fasta, org)  # rename the file
    # open files to read and write
    with open(org) as f, open(fasta, 'w') as w:
        for line in f:  # process line by line
            line = line.rstrip()  # remove a newline character
            if line.startswith('>'):  # header line
                header = line.split()[0]  # extract the substring before a whitespace
                if header[1:] in m:  # if the header is a key in the dictionary
                    line = '>' + m[header[1:]]  # then replace the line
            w.write(line + '\n')  # write to the new fasta file
It backs up the old *.faa files as *.faa.O.
If my assumption is incorrect, please let me know.
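The header rewrite itself can also be kept separate from the file handling, which makes it easy to check against the sample record before touching any real file. A sketch in Python 3, under the same assumptions as above; rename_headers is a hypothetical helper name, and headers whose ID is not in the mapping are left unchanged:

```python
def rename_headers(lines, mapping):
    """Yield FASTA lines, replacing '>ID description' with '>' + mapping[ID]."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # text before the first whitespace
            if seq_id in mapping:
                line = ">" + mapping[seq_id]
        yield line


# One record and one key-value pair taken from the question.
mapping = {"WP_151362411.1": "WP_151362411.1#0940"}
record = [
    ">WP_151362411.1 YoaH family protein [Klebsiella pneumoniae]\n",
    "MYAPQCSRSKRCFAGLPSLSHEQQQQAVERIHELMAQGISSGQAIALVAEELRATHTGEQ\n",
    "IVARFEDEDEDE\n",
]
result = list(rename_headers(record, mapping))
print("\n".join(result))
```

The generator can be fed an open file and its output written to a new file, in the same rename-then-write pattern as the script above.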

How to check if a block of lines has a particular keyword using python?

I am checking a text file with blocks of commands as follows:
File start -
!
interface Vlan100
description XYZ
ip vrf forwarding XYZ
ip address 10.208.56.62 255.255.255.192
!
interface Vlan101
description ABC
ip vrf forwarding ABC
ip address 10.208.55.126 255.255.255.192
no ip redirects
no ip unreachables
no ip proxy-arp
!
File End
I want to create a text file such that, when the source file contains the pattern vrf forwarding ABC, the output is the interface line of that block, interface Vlan101.
So far I have written the following script, but it only writes the line that contains the pattern.
import re
f = open("output_file.txt","w") #output file to be generated
shakes = open("input_file.txt","r") #input file to read
for lines in shakes:
    if re.match("(.*)ABC(.*)",lines):
        f.write(lines)
f.close()
Easiest: read the file, cut where ! is, then for each of those, if there's the desired text, get the first line:
with open("input_file.txt") as r, open("output_file.txt", "w") as w:
    txt = r.read()
    result = [block.strip().split("\n")[0]
              for block in txt.split('!')
              if 'vrf forwarding ABC' in block]
    w.write("\n".join(result))
Just to be clear, I assume that you want to replace every instance of "interface Vlan101" with "vrf forwarding ABC". In this case, I used test.txt as the input file and out.txt as the output file, which receives all the lines with the replacements applied. I used a list comprehension with the str.replace string method to replace each "interface Vlan101" substring with "vrf forwarding ABC".
with open("test.txt") as f:
    lines = f.readlines()
new_lines = [line.replace("interface Vlan101", "vrf forwarding ABC") for line in lines]
with open("out.txt", "w") as f1:
    f1.writelines(new_lines)
Hope this helps.
If you are just interested in the interfaces, you can do the following as well.
#Read File
with open('sample.txt', 'r') as f:
    lines = f.readlines()
#Capture 'interfaces'
interfaces = [i for i in lines if i.strip().startswith('inter')]
#Write it to a file
with open('output.txt', 'w') as f:
    f.writelines(interfaces)
With your code you are going through the document line by line.
If you want to parse blocks (between the "!" signs), you could first join each block into a single line (though if it's a really large document you may need to consider something else, as this reads the entire document into memory).
import re
f = open("output_file.txt","w") #output file to be generated
source = open("input_file.txt","r") #input file to read
lines = "".join(source) #creates a string from the document
source.close()
# turn newlines into spaces, then start a new line at each "!" block delimiter
shakes = lines.replace("\n"," ").replace("!","\n")
# retrieve all text before "vrf forwarding ABC"
finds = re.findall("(.*)vrf forwarding ABC",shakes)
# each match is the start of a block's line.
# if the part you want is the same length in all blocks,
# then you could use find.strip()[:17] instead of find.strip()
# to get only the beginning. otherwise you need to modify your
# regex to only take the first 2 words of the line.
for find in finds:
    f.write(find.strip() + "\n")
f.close()
Alternatively, if you want to use match per line, you can do the same as above, but instead of replacing "!" with a newline, just split on it, and then use your previous code and go through the resulting blocks one by one.
Hope this helps!
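If the file is too large to read at once, the memory concern raised above can be avoided with a single pass that remembers the most recent interface line. A sketch in Python 3; find_interfaces is a name invented for this example, and it assumes every block starts with an interface line, as in the sample:

```python
def find_interfaces(lines, pattern="vrf forwarding ABC"):
    """Return the interface line of every block that contains pattern."""
    matches = []
    current = None  # interface line of the block being read
    for raw in lines:
        line = raw.strip()
        if line == "!":
            current = None  # block delimiter: forget the previous interface
        elif line.startswith("interface "):
            current = line
        elif pattern in line and current is not None:
            matches.append(current)
            current = None  # avoid reporting the same block twice
    return matches


sample = """!
interface Vlan100
description XYZ
ip vrf forwarding XYZ
ip address 10.208.56.62 255.255.255.192
!
interface Vlan101
description ABC
ip vrf forwarding ABC
ip address 10.208.55.126 255.255.255.192
no ip redirects
!
"""
print(find_interfaces(sample.splitlines()))
```

Passing an open file object instead of sample.splitlines() works the same way, because the function only iterates over lines.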

error in ip2location python library

I am using the ip2location Python library to find the location that corresponds to an IP address. I am trying to open a file containing a list of IP addresses and look up the location of each one.
import IP2Location;
IP2LocObj = IP2Location.IP2Location();
IP2LocObj.open("data/IP-COUNTRY-REGION-CITY-LATITUDE-LONGITUDE-ZIPCODE-TIMEZONE-ISP-DOMAIN-NETSPEED-AREACODE-WEATHER-MOBILE-ELEVATION-USAGETYPE-SAMPLE.BIN");//This is sample database
File1=open('test_ip.txt','r');//This is file containing ipaddress
Line=File1.readline();
While line:
    rec = IP2LocObj.get_all(Line);
    Line=File1.readline();
    print rec.country_short
This code gives an error. You can check out the sample code here: http://www.ip2location.com/developers/python
Please use the following Python code.
import IP2Location
IP2LocObj = IP2Location.IP2Location()
IP2LocObj.open("IP-COUNTRY-REGION-CITY-LATITUDE-LONGITUDE-ZIPCODE-TIMEZONE-ISP-DOMAIN-NETSPEED-AREACODE-WEATHER-MOBILE-ELEVATION-USAGETYPE-SAMPLE.BIN")  # This is sample database
with open('test_ip.txt') as f:  # file containing ip addresses
    for line_terminated in f:
        line = line_terminated.rstrip('\r\n')  # strip newline
        if line:  # non-blank lines
            print line
            rec = IP2LocObj.get_all(line)
            print rec.country_short

Python Find script

I have a script that looks into a .txt file like this:
house.txt:
1289
534
9057
12873
(every line is meant to be a "CODE" for a product)
and it looks for a filename with that code in a given folder and copies it to another folder.
Everything works fine, except if this happens:
0001_filename_blablalba.jpg
00011 filename.jpg
000123Filename.jpg
I want to copy the file with the string "0001" but the script copies all the above because indeed they have 0001, but it's not the whole code.
Here's my script:
import subprocess
with open('CASA.txt','r') as f:
    lines = [line.rstrip('\n') for line in f]
for ID in lines:
    id_produto = str(ID+'*')
    command = "find . -maxdepth 1 -name '%s' -exec ditto -v {} ./imagenss/ \;"%id_produto
    print "A copiar: %s"%id_produto
    proc = subprocess.Popen(command,shell=True,stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Is there a simple way to do this?
You are somewhat mixing Python and shell scripting, but you could still try another filename pattern:
Instead of
id_produto = str(ID+'*')
try
id_produto = str(ID+'[!0-9]*')
This will match any name that starts with the ID, followed by one character that is not a digit, followed by anything else.
If you want to do it the Pythonic way, use the glob package for filename matching and os for copying.
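A pure-Python version of that idea might look like the sketch below (Python 3). It uses re and shutil rather than glob, and the names matches_code and copy_matching are hypothetical. The rule it applies is that the character right after the code must not be a digit.

```python
import os
import re
import shutil


def matches_code(filename, code):
    """True if filename starts with code and the next character is not a digit."""
    return re.match(re.escape(code) + r"(?!\d)", filename) is not None


def copy_matching(codes, src_dir, dst_dir):
    """Copy files from src_dir whose leading code is in codes; return their names."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path) and any(matches_code(name, c) for c in codes):
            shutil.copy(path, dst_dir)
            copied.append(name)
    return copied


# The three filenames from the question, checked against the code "0001".
names = ["0001_filename_blablalba.jpg", "00011 filename.jpg", "000123Filename.jpg"]
print([n for n in names if matches_code(n, "0001")])
```

copy_matching would be called with the codes read from the text file, the folder to search, and the destination folder.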
You can try using the split function to split your filename string. Then look at the remaining part of the string: if it starts with a digit, the ID matches only part of the file's complete ID; if it does not, you have found the full ID and can copy the file:
completeFilename = '12345_filename.jpg'
ID = '123'
fileName = completeFilename.split(ID, 1)[1]
if fileName[:1].isdigit():
    # A digit follows the ID, so this file should not be copied
    print('skip')
else:
    # No digit follows the ID, so copy this file
    print('copy')
It's better to use Python code instead of a shell command to find the file.
import os
def get_base(filename):
    'get the "code" for a filename'
    out=''
    for char in filename:
        if char.isdigit():
            out+=char
        else:
            break
    return out
with open('path/to/the/txt_file.txt','r') as f:
    lines=f.read().splitlines()
files=os.listdir('path/to/the/folder')
files_dict={get_base(x):x for x in files}
for line in lines:
    print('copy %s'%files_dict.get(line,None))
