I want to take an IP address string, e.g. 192.168.1.23, keep only the first three octets, and append 0-255. The goal is to turn that address into a range of IP addresses I can pass to Nmap for a sweep scan.
The easiest solution would be to trim the last two characters off the string, but that won't work if the IP is 192.168.1.1 or 192.168.1.123.
Here is the solution I came up with:
lhost = "192.168.1.23"
# Split lhost on each '.' then re-assemble the first three parts
lip = lhost.split('.')
trange = ""
for i, val in enumerate(lip):
    if i < len(lip) - 1:
        trange += val + "."
# Append "0-255" at the end, we now have target range trange = "XX.XX.XX.0-255"
trange += "0-255"
It works fine but feels ugly and not efficient to me. What is a better way to do this?
You could use the rfind method of the string object.
>>> lhost = "192.168.1.23"
>>> lhost[:lhost.rfind(".")] + ".0-255"
'192.168.1.0-255'
The rfind method is similar to find(), but it searches from the end.
rfind(...)
S.rfind(sub [,start [,end]]) -> int
Return the highest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
A more complicated solution could use a regular expression:
>>> import re
>>> re.sub(r"\d{1,3}$", "0-255", lhost)
'192.168.1.0-255'
Hope this helps!
You could split and get the first three values, join by a '.', and then add ".0-255"
>>> lhost = "192.168.1.23"
>>> '.'.join(lhost.split('.')[0:-1]) + ".0-255"
'192.168.1.0-255'
>>>
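A further variant of the same idea, sketched here as an illustration: rsplit with maxsplit=1 cuts only at the last dot, so no join is needed. The helper name to_sweep_range is hypothetical, and the sketch assumes the input is already a well-formed dotted IPv4 string.

```python
def to_sweep_range(ip):
    """Replace the last octet of a dotted IPv4 string with '0-255'.

    Assumes `ip` is a well-formed dotted-quad string; no validation is done.
    """
    # rsplit(".", 1) splits once, starting from the right-hand side
    prefix, _last_octet = ip.rsplit(".", 1)
    return prefix + ".0-255"


print(to_sweep_range("192.168.1.23"))
```

Because only the last dot is considered, the length of the final octet does not matter.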
Not all IPs belong to class C. I think the code must be flexible enough to accommodate various IP ranges and their masks.
I previously wrote a tiny Python module to calculate the network ID and broadcast ID for a given IP address with any network mask.
The code can be found here: https://github.com/brownbytes/tamepython/blob/master/subnet_calculator.py
I think networkSubnet() and hostRange() are the functions that could be of some help to you.
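For arbitrary masks, the standard library's ipaddress module (Python 3.3+) can do the same calculation. The following is only a sketch and is not the linked module: the helper names scan_target and first_and_last are made up for illustration, and the default prefix length of 24 is an assumption.

```python
import ipaddress


def scan_target(ip, prefixlen=24):
    """Return the network containing `ip` in CIDR notation.

    strict=False lets us pass a host address (host bits set) and have
    ipaddress mask it down to the network address.
    """
    network = ipaddress.ip_network("{}/{}".format(ip, prefixlen), strict=False)
    return str(network)


def first_and_last(ip, prefixlen=24):
    """Return the network and broadcast addresses as strings."""
    network = ipaddress.ip_network("{}/{}".format(ip, prefixlen), strict=False)
    return str(network.network_address), str(network.broadcast_address)


print(scan_target("192.168.1.23"))
print(first_and_last("10.20.30.40", 20))
```

Nmap accepts CIDR notation as a target, so the string from scan_target can be passed to it directly instead of an octet range.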
I like this:
#!/usr/bin/python3
ip_address = '128.200.34.1'
list_ = ip_address.split('.')
assert len(list_) == 4
list_[3] = '0-255'
print('.'.join(list_))
I was wondering if there's a good way to find the next available gap to create a network block given a list of existing ones?
For example, I have these networks in my list:
[
'10.0.0.0/24',
'10.0.0.0/20',
'10.10.0.0/20',
]
and then someone comes along and asks: "Do you have enough space for one /22 for me?"
I'd like to be able to suggest something along the line:
"Here's a space: x.x.x.x/22" (x.x.x.x is something that comes before 10.0.0.0)
or
"Here's a space: x.x.x.x/22" (x.x.x.x is something in between 10.0.0.255 and 10.10.0.0)
or
"Here's a space: x.x.x.x/22" (x.x.x.x is something that comes after 10.10.15.255)
I'd really appreciate any suggestions.
The ipaddress library is good for this sort of use case. You can use the IPv4Network class to define subnet ranges, and the IPv4Address objects it can return can be converted into integers for comparison.
What I do below:
Establish your given list as a list of IPv4Networks
Determine the size of the block we're looking for
Iterate through the list, computing the amount of space between consecutive blocks, and checking if our wanted block fits.
You could also return an IPv4Network with the subnet built into it, instead of an IPv4Address, but I'll leave that as an exercise to the reader.
from ipaddress import IPv4Network, IPv4Address
# the list is assumed to be sorted by network address
networks = [
    IPv4Network('10.0.0.0/24'),
    IPv4Network('10.0.0.0/20'),
    IPv4Network('10.10.0.0/20'),
]
wanted = 22
wanted_size = 2 ** (32 - wanted) # number of addresses in a /22
space_found = None
for i in range(1, len(networks)):
    # last address of the previous block and first address of the next one
    previous_network_end = int(networks[i-1].network_address + int(networks[i-1].hostmask))
    next_network_start = int(networks[i].network_address)
    # addresses strictly between the two blocks (negative if they overlap)
    free_space_size = next_network_start - previous_network_end - 1
    if free_space_size >= wanted_size:
        space_found = IPv4Address(previous_network_end + 1) # first available address
        break
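As a sketch of the "return an IPv4Network" variant mentioned above, one simple (brute-force) option is to walk the candidate subnets of a parent pool and return the first one that overlaps nothing in use. The function name first_free_block and the 10.0.0.0/8 pool are assumptions made for this illustration; the question does not say which pool the blocks are carved from.

```python
from ipaddress import IPv4Network, collapse_addresses


def first_free_block(used, prefixlen, pool):
    """Return the first subnet of `pool` with the given prefix length
    that overlaps none of the `used` networks, or None if there is none."""
    # collapse_addresses merges overlapping/adjacent networks, e.g. it
    # folds 10.0.0.0/24 into 10.0.0.0/20
    used = list(collapse_addresses(used))
    for candidate in pool.subnets(new_prefix=prefixlen):
        if not any(candidate.overlaps(net) for net in used):
            return candidate
    return None


used = [
    IPv4Network('10.0.0.0/24'),
    IPv4Network('10.0.0.0/20'),
    IPv4Network('10.10.0.0/20'),
]
pool = IPv4Network('10.0.0.0/8')  # assumed parent pool
print(first_free_block(used, 22, pool))
```

Every candidate produced by subnets() is already aligned on a /22 boundary, so the result is always a valid network. For very large pools a gap-based search like the loop above would be faster.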
I am running the function below in Python (3.x) to generate a random IP address:
def get_ip_address(self):
    x = ".".join(map(str, (random.randint(0, 255)
                           for _ in range(4))))
    return x
However, I need to convert the generated IP address into a hex value. It doesn't need to be complicated, and I am happy either to convert x after creation or to do it inside the function that creates the IP address. I need to be able to easily see which IP address I am creating and converting, because this is part of a test suite and the hex IP address is converted back at the other end. I also need it in the format 0xb15150ca.
You're complicating things. Just take the IP address as an int and convert it into hex:
# Generate a random 32-bit integer and show it as hex
>>> hex(random.randint(0,(1<<32)-1))
'0x85c90851'
>>> hex(random.randint(0,(1<<32)-1))
'0xfb4f592d'
If you always want exactly 8 hex digits without the 0x prefix, you can format it directly like this (the 08 pads with leading zeros to a width of 8):
>>> "{:08X}".format(random.randint(0,(1<<32)-1))
'4CC27A5E'
If you wish to know the IP, use the ipaddress module like so:
>>> import ipaddress
>>> ip = random.randint(0,(1<<32)-1)
>>> ipaddress.ip_address(ip)
IPv4Address('238.53.246.162')
>>> "{:08X}".format(ip)
'EE35F6A2'
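Since the test suite converts the hex value back at the other end, a round trip can be sketched with the same ipaddress module. The helper names ip_to_hex and hex_to_ip are hypothetical; the lowercase 0x-prefixed format follows the 0xb15150ca example from the question.

```python
import ipaddress


def ip_to_hex(ip):
    """Dotted IPv4 string -> '0x' plus exactly 8 lowercase hex digits."""
    return "0x{:08x}".format(int(ipaddress.IPv4Address(ip)))


def hex_to_ip(hex_str):
    """'0x...' hex string -> dotted IPv4 string."""
    # int(..., 16) accepts an optional '0x' prefix
    return str(ipaddress.IPv4Address(int(hex_str, 16)))


print(ip_to_hex("177.81.80.202"))
print(hex_to_ip("0xb15150ca"))
```

The 08 in the format spec matters: without it, an address with a small first octet would yield fewer than 8 digits.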
You can extend your function as follows:
import random
def get_ip_address():
    x = ".".join(map(str, (random.randint(0, 255)
                           for _ in range(4))))
    # Split the generated dotted-decimal IP into its four octets
    ip_split_dec = x.split('.')
    # Convert each octet string to an int
    ip_split_int = [int(t) for t in ip_split_dec]
    # Build the hex IP address; each octet is zero-padded to two hex digits
    # ip_hex = '.'.join(format(o, '02x') for o in ip_split_int) # Format hh.hh.hh.hh
    ip_hex = '0x' + ''.join(format(o, '02x') for o in ip_split_int) # Format 0xhhhhhhhh
    return ip_hex
which will give you
>>> address = get_ip_address()
>>> address
'0xa535f08b'
You can also combine this with the construction of your decimal IP to save a few lines of code.
By the way: as long as your function is not a method of a class, there is no need for self in the function definition.
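One way to read "combine this with the construction of your decimal IP" is to generate the four octets once and build both representations from them. This is only a sketch; the function name get_ip_address_and_hex is made up, and returning a tuple is an assumption about what the test suite needs.

```python
import random


def get_ip_address_and_hex():
    """Return (dotted_decimal, hex_string) for one random IPv4 address."""
    octets = [random.randint(0, 255) for _ in range(4)]
    dotted = ".".join(str(o) for o in octets)
    # '02x' zero-pads each octet so the result always has 8 hex digits
    as_hex = "0x" + "".join(format(o, "02x") for o in octets)
    return dotted, as_hex


dotted, as_hex = get_ip_address_and_hex()
print(dotted, as_hex)
```

Returning both values keeps the human-readable address available for logging next to the hex form.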
I am cleaning a data set by removing fraudulent email addresses.
I established multiple rules for catching duplicates and fraudulent domains, but there is one scenario where I can't think of how to code a rule in Python to flag them.
So I have, for example, rules like this:
#delete punctuation
df['email'].apply(lambda x:''.join([i for i in x if i not in string.punctuation]))
#flag yopmail
pattern = "yopmail"
match = df['email'].str.contains(pattern)
df['yopmail'] = np.where(match, 'Y', '0')
#flag duplicates
df['duplicate']=df.email.duplicated(keep=False)
This is the data I can't figure out a rule for. Basically I am looking for a way to flag addresses that start the same way but then have consecutive numbers at the end.
abc7020#gmail.com
abc7020.1#gmail.com
abc7020.10#gmail.com
abc7020.11#gmail.com
abc7020.12#gmail.com
abc7020.13#gmail.com
abc7020.14#gmail.com
abc7020.15#gmail.com
attn1#gmail.com
attn12#gmail.com
attn123#gmail.com
attn1234#gmail.com
attn12345#gmail.com
attn123456#gmail.com
attn1234567#gmail.com
attn12345678#gmail.com
My solution isn't efficient, nor pretty. But check it out and see if it works for you #jeangelj. It definitely works for the examples you provided. Good luck!
import os
from random import shuffle
from difflib import SequenceMatcher
emails = [... ...] # for example the 16 email addresses you gave in your question
shuffle(emails) # simulate unordered input
emails = sorted(emails) # sort so that similar addresses end up next to each other
names = [email.split('#')[0] for email in emails]
T = 0.7 # <- set your string similarity threshold here!!
split_indices=[]
for i in range(1,len(emails)):
    if SequenceMatcher(None, emails[i], emails[i-1]).ratio() < T:
        split_indices.append(i) # we want to remember where dissimilar email address occurs
# cut the sorted list at those positions to get groups of similar addresses
grouped=[]
start = 0
for i in split_indices:
    grouped.append(emails[start:i])
    start = i
grouped.append(emails[start:])
# now we have similar email addresses grouped, we want to find the common prefix for each group
prefix_strings=[]
for group in grouped:
    prefix_strings.append(os.path.commonprefix(group))
# finally: the address whose name equals its group's prefix is kept, the rest are flagged
ham=[]
spam=[]
true_ids = [names.index(p) for p in prefix_strings]
for i in range(len(emails)):
    if i in true_ids:
        ham.append(emails[i])
    else:
        spam.append(emails[i])
In [30]: ham
Out[30]: ['abc7020#gmail.com', 'attn1#gmail.com']
In [31]: spam
Out[31]:
['abc7020.10#gmail.com',
'abc7020.11#gmail.com',
'abc7020.12#gmail.com',
'abc7020.13#gmail.com',
'abc7020.14#gmail.com',
'abc7020.15#gmail.com',
'abc7020.1#gmail.com',
'attn12345678#gmail.com',
'attn1234567#gmail.com',
'attn123456#gmail.com',
'attn12345#gmail.com',
'attn1234#gmail.com',
'attn123#gmail.com',
'attn12#gmail.com']
# THE TRUTH YALL!
You can use a regular expression to do this; example below:
import re
a = "attn12345#gmail.comf"
b = "abc7020.14#gmail.com"
c = "abc7020#gmail.com"
d = "attn12345678#gmail.com"
pattern = re.compile(r"[0-9]{3,500}\.?[0-9]{0,500}?#")
if pattern.search(a):
    print("spam1")
if pattern.search(b):
    print("spam2")
if pattern.search(c):
    print("spam3")
if pattern.search(d):
    print("spam4")
If you run the code you will see:
$ python spam.py
spam1
spam2
spam3
spam4
The benefit of this method is that it's standardized (regular expressions) and that you can easily adjust the strength of the match by changing the values within {}, which means you can keep those values in a global configuration file. You can also adjust the regular expression easily without having to rewrite code.
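To illustrate the "adjustable values" point, here is a minimal sketch that builds the pattern from a setting. The function name build_pattern and the min_digits parameter are hypothetical, and the '#' separator simply mirrors the sample data shown in this thread.

```python
import re


def build_pattern(min_digits=3, separator="#"):
    """Compile a pattern matching a run of at least `min_digits` digits
    (optionally followed by '.' and more digits) right before the separator."""
    return re.compile(
        r"[0-9]{%d,}\.?[0-9]*%s" % (min_digits, re.escape(separator))
    )


strict = build_pattern(min_digits=3)
loose = build_pattern(min_digits=2)
for address in ("attn12345#gmail.com", "abc7020.14#gmail.com", "attn12#gmail.com"):
    print(address, bool(strict.search(address)), bool(loose.search(address)))
```

A value such as min_digits could then be read from a configuration file, so the matching strength changes without touching the code.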
First take a look at regexp question here
Second, try filtering the email addresses like this:
# Suppose the email is 'attn1234#gmail.com'
email = 'attn1234#gmail.com'
email_name = email.split('#', maxsplit=1)[0]
# Here you get email_name = 'attn1234'
import re
m = re.search(r'\d+$', email_name)
# if the string ends in digits m will be a Match object, or None otherwise.
if m is not None:
    # a trailing run of digits is the pattern we want to flag
    print ('%s is BAD' % email)
else:
    print ('%s is good' % email)
You could pick a difference threshold using edit distance (also known as Levenshtein distance). In Python:
$ pip install editdistance
$ ipython2
>>> import editdistance
>>> threshold = 5 # This could be anything, really
>>> data = ["attn1#gmail.com...", ...]# set up data to be the set you gave
>>> fraudulent_emails = set([email for email in data for other in data if email != other and editdistance.eval(email, other) < threshold])
If you wanted to be smarter about it, you could run through the resulting list and, instead of turning it into a set, keep track of how many other email addresses each one was near, then use that count as a 'weight' to determine fake-ness.
This catches not only the given cases (where the fraudulent addresses all share a common start and differ only in a numerical suffix), but also number or letter padding, e.g. at the beginning or in the middle of an email address.
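The "weight" idea can be sketched as a count of near neighbours per address. To keep the example self-contained it uses a small pure-Python Levenshtein function instead of the editdistance package; the names levenshtein and neighbour_counts, and the threshold values, are assumptions for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def neighbour_counts(emails, threshold=5):
    """Map each address to the number of other addresses closer than threshold."""
    counts = {email: 0 for email in emails}
    for i, first in enumerate(emails):
        for second in emails[i + 1:]:
            if levenshtein(first, second) < threshold:
                counts[first] += 1
                counts[second] += 1
    return counts


sample = ["attn1#gmail.com", "attn12#gmail.com", "attn123#gmail.com",
          "someone.else#example.org"]
print(neighbour_counts(sample, threshold=3))
```

Addresses with a high count sit in a dense cluster of look-alikes; the comparison is quadratic in the number of addresses, so it only suits moderately sized data sets.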
import numpy as np
ids = [s.split('#')[0] for s in email_list]
# det[i, j] is True when one name equals the other plus a single trailing digit
det = np.zeros((len(ids), len(ids)), dtype=bool)
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        mi = ids[i]
        mj = ids[j]
        if len(mj) == len(mi) + 1 and mj.startswith(mi):
            try:
                int(mj[-1])
                det[j,i] = True
                det[i,j] = True
            except ValueError:
                continue
spam_indices = np.where(np.sum(det, axis=0) != 0)[0].tolist()
Here's one way to approach it, that should be pretty efficient.
We do it by grouping the email addresses by length, so that we only need to check whether each address matches one from the next length down, using a slice and a set membership check.
The code:
First, read in the data:
import pandas as pd
import numpy as np
string = '''
abc7020#gmail.com
abc7020.1#gmail.com
abc7020.10#gmail.com
abc7020.11#gmail.com
abc7020.12#gmail.com
abc7020.13#gmail.com
abc7020.14#gmail.com
abc7020.15#gmail.com
attn1#gmail.com
attn12#gmail.com
attn123#gmail.com
attn1234#gmail.com
attn12345#gmail.com
attn123456#gmail.com
attn1234567#gmail.com
attn12345678#gmail.com
foo123#bar.com
foo1#bar.com
'''
x = pd.DataFrame({'x':string.split()})
#remove duplicates:
x = x[~x.x.duplicated()]
We strip off the #foo.bar part, filter to only those that end with a digit, and add a 'lengths' column:
#split on #, expand means into two columns
emails = x.x.str.split('#', expand = True)
#keep only the rows whose local part ends in a digit
emails = emails.loc[emails.loc[:,0].str[-1].str.isdigit(), :].copy()
#add a length of email column for the next step
emails['lengths'] = emails.loc[:,0].str.len()
Now all we have to do is take each length j and check, for every name of that length, whether the name with its last character dropped appears in the set of names of length j - 1 (we also have to check the opposite direction, otherwise the shortest address of a group is missed):
#unique lengths to check
lengths = emails.lengths.unique()
#mask to hold results
mask = pd.Series([0]*len(emails), index = emails.index)
#for each length
for j in lengths:
    #we subset those of that length
    totest = emails['lengths'] == j
    #and those who might be the shorter version
    against = emails['lengths'] == j -1
    #we make a set of unique values, for a hashed lookup
    againstset = set([i for i in emails.loc[against,0]])
    #we cut off the last char of each in to test
    tests = emails.loc[totest,0].str[:-1]
    #we check matches, by checking the set
    mask = mask.add(tests.apply(lambda x: x in againstset), fill_value = 0)
    #viceversa, otherwise we miss the smallest one in the group
    againstset = set([i for i in emails.loc[totest,0].str[:-1]])
    tests = emails.loc[against,0]
    mask = mask.add(tests.apply(lambda x: x in againstset), fill_value = 0)
The resulting mask can be converted to boolean, and used to subset the original (deduplicated) dataframe, and the indices should match the original indices to subset like that:
x.loc[~mask.astype(bool),:]
                    x
0   abc7020#gmail.com
16     foo123#bar.com
17       foo1#bar.com
You can see that we have not removed your first value, as the '.' means it did not match - you can remove the punctuation first.
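The same slice-and-set-membership idea can be sketched without pandas, which may be easier to follow. The helper name flag_incremental is made up, and the sketch works on local parts only (the text before the '#' separator used in the sample data). Removing the punctuation first, as suggested, is shown in the second call.

```python
import string


def flag_incremental(local_parts):
    """Return the names that form a chain where one name is another
    name plus a single trailing digit."""
    names = set(local_parts)
    flagged = set()
    for name in names:
        # a name is suspicious if dropping its last digit gives another name
        if name and name[-1].isdigit() and name[:-1] in names:
            flagged.add(name)
            flagged.add(name[:-1])
    return flagged


raw = ["attn1", "attn12", "attn123", "foo123", "foo1", "abc7020", "abc7020.1"]
print(sorted(flag_incremental(raw)))

# strip punctuation first so that 'abc7020.1' becomes 'abc70201'
table = str.maketrans("", "", string.punctuation)
print(sorted(flag_incremental([name.translate(table) for name in raw])))
```

Each name costs one slice and one set lookup, so the whole pass is linear in the number of addresses.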
I have an idea on how to solve this:
fuzzywuzzy
Create a set of unique emails, for-loop over them and compare them with fuzzywuzzy.
Example:
import re
from fuzzywuzzy import fuzz
for email in emailset:
    for row in data:
        emailcomp = re.search(pattern=r'(.+)#.+',string=email).groups()[0]
        rowemail = re.search(pattern=r'(.+)#.+',string=row['email']).groups()[0]
        if row['email']==email:
            continue
        elif fuzz.partial_ratio(emailcomp,rowemail)>80:
            'flagging operation'  # placeholder: put your flagging logic here
I took some liberties with how the data is represented, but I feel the variable names are mnemonic enough for you to understand what I am getting at. It is a very rough piece of code, in that I have not thought through how to stop repetitive flagging.
Anyways, the elif part compares the two email addresses without #gmail.com (or any other email e.g. #yahoo.com), if the ratio is above 80 (play around with this number) use your flagging operation.
For example:
fuzz.partial_ratio("abc7020.1", "abc7020")
100
In my Python application I have an array of IP address strings which looks something like this:
[
"50.28.85.81-140",  # Matches any IP address whose first 3 octets match and whose final octet is between 81 and 140
"26.83.152.12-194"  # Same idea: 26.83.152.12 would match, 26.83.152.120 would match, 26.83.152.195 would not match
]
I installed netaddr and although the documentation seems great, I can't wrap my head around it. This must be really simple - how do I check if a given IP address matches one of these ranges? Don't need to use netaddr in particular - any simple Python solution will do.
The idea is to split the IP and check every component separately.
mask = "26.83.152.12-192"
IP = "26.83.152.19"
def match(mask, IP):
    splitted_IP = IP.split('.')
    for index, current_range in enumerate(mask.split('.')):
        if '-' in current_range:
            mini, maxi = map(int,current_range.split('-'))
        else:
            mini = maxi = int(current_range)
        if not (mini <= int(splitted_IP[index]) <= maxi):
            return False
    return True
Not sure this is optimal, but it is plain Python with no need for extra packages.
Parse the ip_range, creating a one-element list for a simple value and a range object for a range. This gives a list of 4 list/range objects.
Then zip it with a split version of your address and test that each value is in the corresponding list or range.
Note: Using range ensures a super-fast "in" test in Python 3 (see: Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?)
ip_range = "50.28.85.81-140"
toks = [[int(d)] if d.isdigit() else range(int(d.split("-")[0]),int(d.split("-")[1])+1) for d in ip_range.split(".")]
print(toks) # debug
for test_ip in ("50.28.85.86","50.284.85.200","1.2.3.4"):
    print (all(int(a) in b for a,b in zip(test_ip.split("."),toks)))
result (as expected):
[[50], [28], [85], range(81, 141)]
True
False
False
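Another option, sketched here with only the standard library: turn each "a.b.c.x-y" entry into a pair of ipaddress.IPv4Address objects and compare, which also validates the octets. The helper names parse_last_octet_range and ip_in_ranges are hypothetical, and the sketch assumes the range only ever appears in the last octet, as in the question.

```python
import ipaddress


def parse_last_octet_range(spec):
    """'50.28.85.81-140' -> (IPv4Address('50.28.85.81'), IPv4Address('50.28.85.140'))."""
    prefix, _, last = spec.rpartition(".")
    low, _, high = last.partition("-")
    high = high or low  # a plain address is a range of one
    start = ipaddress.IPv4Address("{}.{}".format(prefix, low))
    end = ipaddress.IPv4Address("{}.{}".format(prefix, high))
    return start, end


def ip_in_ranges(ip, specs):
    """True if `ip` falls inside any of the range specs."""
    address = ipaddress.IPv4Address(ip)
    return any(start <= address <= end
               for start, end in map(parse_last_octet_range, specs))


ranges = ["50.28.85.81-140", "26.83.152.12-194"]
for candidate in ("26.83.152.12", "26.83.152.120", "26.83.152.195"):
    print(candidate, ip_in_ranges(candidate, ranges))
```

IPv4Address raises a ValueError subclass for malformed input such as an octet above 255, so bad data fails loudly instead of silently not matching.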
I would like to increment an ip address by a fixed value.
Precisely, this is what I am trying to achieve: I have an IP address, say 192.168.0.3, and I want to increment it by 1, which would result in 192.168.0.4, or more generally by a fixed value x, so that I end up with a host like 192.168.0.3+x.
I just want to know if any modules already exist for this conversion.
I tried socket.inet_aton and then socket.inet_ntoa, but I don't know how to get that working properly. Need some help or advice on that.
In Python 3:
>>> import ipaddress
>>> ipaddress.ip_address('192.168.0.4') # accept both IPv4 and IPv6 addresses
IPv4Address('192.168.0.4')
>>> int(_)
3232235524
>>> ipaddress.ip_address('192.168.0.4') + 256
IPv4Address('192.168.1.4')
In reverse:
>>> ipaddress.ip_address(3232235524)
IPv4Address('192.168.0.4')
>>> str(_)
'192.168.0.4'
>>> ipaddress.ip_address('192.168.0.4') -1
IPv4Address('192.168.0.3')
Python 2/3
You could use struct module to unpack the result of inet_aton() e.g.,
import struct, socket
# x.x.x.x string -> integer
ip2int = lambda ipstr: struct.unpack('!I', socket.inet_aton(ipstr))[0]
print(ip2int("192.168.0.4"))
# -> 3232235524
In reverse:
int2ip = lambda n: socket.inet_ntoa(struct.pack('!I', n))
print(int2ip(3232235525))
# -> 192.168.0.5
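Putting the two conversions together gives the increment the question asks for. This is a small sketch; the function name increment_ip and the explicit range check are additions for illustration.

```python
import socket
import struct


def increment_ip(ipstr, step=1):
    """Add `step` to a dotted IPv4 string and return the new dotted string."""
    as_int = struct.unpack("!I", socket.inet_aton(ipstr))[0]
    new_value = as_int + step
    # struct.pack('!I', ...) only accepts values that fit in 32 bits
    if not 0 <= new_value <= 0xFFFFFFFF:
        raise ValueError("result is outside the IPv4 address space")
    return socket.inet_ntoa(struct.pack("!I", new_value))


print(increment_ip("192.168.0.3"))
print(increment_ip("192.168.0.255"))
```

Because the arithmetic happens on the 32-bit integer, the carry into the next octet is handled automatically.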
From python 3.4 onwards:
>>> import ipaddress
>>> a = ipaddress.IPv4Address('192.168.0.1')
>>> a+500
IPv4Address('192.168.1.245')
>>> a = ipaddress.IPv6Address('2001:1900:2254:206a::50:0')
>>> a+200
IPv6Address('2001:1900:2254:206a::50:c8')
>>>
There's a module that makes this and other tasks very easy: pip install iptools.
In [1]: import iptools
In [3]: iptools.ip2long('127.0.0.1')
Out[3]: 2130706433
In [4]: p = iptools.ip2long('127.0.0.1') + 1
In [6]: iptools.long2ip(p)
Out[6]: '127.0.0.2'
Convert the last part of your IP address into a number, add 1 to it, and call ifconfig.
I think the approach of incrementing the last bit will not scale well as we span across networks. –OP
I thought of mentioning that in my original answer, but didn't, for various reasons. These reasons are as follows:
I thought it is unlikely you would need to do this, and could not guess why you'd want to.
Even if you did need to do this, you could just parse the second-to-last number.
This is only valid for those bits where the netmask is 0.
You also have to worry about "special" reserved IP ranges, such as 192.168.etc.etc. Also hex doublets with 0 and possibly ff/255 have special meaning. There are different rules in IPv6.
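If staying inside one subnet matters, the netmask concern can be checked explicitly with the standard ipaddress module. This is a sketch under the assumption that the caller knows the network in CIDR form; the function name increment_in_network is made up.

```python
import ipaddress


def increment_in_network(ip, step, network):
    """Add `step` to `ip`, raising ValueError if the result leaves `network`."""
    net = ipaddress.ip_network(network)
    new_address = ipaddress.ip_address(ip) + step
    if new_address not in net:
        raise ValueError("{} is outside {}".format(new_address, net))
    return new_address


print(increment_in_network("192.168.0.3", 1, "192.168.0.0/24"))
```

Membership testing with "in" respects the prefix length, so the same function works for any mask rather than only for the last octet.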
It might be quicker to just use simple addition and iteration, something like:
ip = [192,168,0,0]
ip_dict = {}
ip_list = []
for i in range(100):
    ip[3] += 1  # note: no carry into the third octet once this passes 255
    new_ip = '.'.join(map(str, ip))
    ip_dict[i] = new_ip
    ip_list.append(new_ip)
EDIT: This is buggy and shouldn't be used as is.
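A version of the same loop that handles the carry between octets can be sketched with the standard ipaddress module (Python 3.3+). The function name consecutive_addresses is hypothetical.

```python
import ipaddress


def consecutive_addresses(start, count):
    """Return the `count` addresses that follow `start`, as strings."""
    base = ipaddress.ip_address(start)
    # adding an int to an address object carries across octets correctly
    return [str(base + offset) for offset in range(1, count + 1)]


ip_list = consecutive_addresses("192.168.0.0", 100)
print(ip_list[0], ip_list[-1])
print(consecutive_addresses("192.168.0.254", 3))
```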
I would use ipaddr for this
>>> import ipaddr
>>> a = ipaddr.IPAddress('192.168.0.3')
>>> a
IPv4Address('192.168.0.3')
>>> a + 1
IPv4Address('192.168.0.4')
The library ipcalc has routines to make math on ip addresses fairly easy. As an example an iterator for an address range can be done like:
Code:
import ipcalc
network = ipcalc.Network('10.1.0.0/16')
host_first = network.host_first()
addresses = (host_first + i for i in range(network.size()-2))
Test Code:
print(next(addresses))
print(next(addresses))
print(next(addresses))
print(max(list(addresses)))
Results:
10.1.0.1
10.1.0.2
10.1.0.3
10.1.255.254
def FunIncrementIp(IPADDRESS,IPADDRESSES):
    #import the ipaddress module and check whether this is an IPv6 or IPv4 address
    import ipaddress
    if ':' in IPADDRESS:
        IPADDRESSMOD = ipaddress.IPv6Address(IPADDRESS)
        print ('this is ipv6 address')
    else:
        IPADDRESSMOD = ipaddress.IPv4Address(IPADDRESS)
        print ('this is ipv4 address')
    #the count arrives as a string, so convert it before doing arithmetic
    IPADDRESSES = int(IPADDRESSES)
    IPADDRESSES = IPADDRESSMOD+IPADDRESSES
    while IPADDRESSMOD < IPADDRESSES:
        IPADDRESSMOD += 1
        print(IPADDRESSMOD)
This should do it.
FunIncrementIp('1.1.1.1','10')
This prints the next 10 IPv4 addresses after the given one.
FunIncrementIp('2001:db8:0:1:1:1:1:1','10')
This prints the next 10 IPv6 addresses after the given one.
The function also detects the address type automatically, so you don't need separate scripts for IPv4 and IPv6.
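For comparison, a shorter sketch of the same behaviour: ipaddress.ip_address() already detects whether a string is IPv4 or IPv6, so the ':' check is not strictly needed. The generator name next_addresses is hypothetical, and yielding the addresses instead of printing them is a design choice of this sketch.

```python
import ipaddress


def next_addresses(ip, count):
    """Yield the `count` addresses following `ip` (IPv4 or IPv6)."""
    address = ipaddress.ip_address(ip)  # picks IPv4Address or IPv6Address
    for offset in range(1, int(count) + 1):
        yield address + offset


for address in next_addresses('1.1.1.1', '3'):
    print(address)
for address in next_addresses('2001:db8::1', 2):
    print(address)
```

Yielding lets the caller decide whether to print, collect, or filter the addresses.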