I have a script for creating accounts that outputs the following:
creating user in XYZ: username: testing firstName: Bob lastName:Test email:auto999#nowhere.com password:gWY6*Pja&4
So I need to create a Python script that stores the username and password in a CSV file.
I tried splitting this string by spaces and colons then indexing it, but this isn't working quite properly and could fail if the message is different. Does anyone have any idea how to do this?
Regex is almost always the answer to this type of issue:
import re
text = 'creating user in XYZ: username: testing firstName: Bob lastName:Test email:auto999#nowhere.com password:gWY6*Pja&4'
pattern = r'.*username:\s*(\S+)\s*firstName:\s*(\S+)\s*lastName:\s*(\S+)\s*email:\s*(\S+)\s*password:\s*(\S+)'
values = re.findall(pattern, text)
print(values)
Output:
[('testing', 'Bob', 'Test', 'auto999#nowhere.com', 'gWY6*Pja&4')]
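The question also asks for the username and password to end up in a CSV file. A minimal sketch of that last step, assuming only those two fields are wanted: it writes to an in-memory buffer so that it runs anywhere, and the file name accounts.csv mentioned in the comment is hypothetical.

```python
import csv
import io
import re

text = ('creating user in XYZ: username: testing firstName: Bob '
        'lastName:Test email:auto999#nowhere.com password:gWY6*Pja&4')

# Capture only the two fields that should be stored.
match = re.search(r'username:\s*(\S+).*?password:\s*(\S+)', text)

# Stand-in for a real file such as open('accounts.csv', 'a', newline='')
# (hypothetical file name).
buffer = io.StringIO()
writer = csv.writer(buffer)
if match:
    writer.writerow(match.groups())

csv_text = buffer.getvalue()
print(csv_text)
```

Using csv.writer rather than joining with commas by hand means a password that itself contains a comma or a quote is quoted correctly.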
I don't see the need for a regex here; simple but robust parsing is enough:
def get_data(account: str, attribute: str) -> str:
    data = ' '.join(account.split()).strip()
    for k, v in {' :': ':', ' : ': ':', ': ': ':'}.items():
        data = data.replace(k, v)
    index1 = data.find(attribute)
    index2 = data.find(' ', index1)
    return data[index1 + len(attribute + ':'): len(account) if index2 == -1 else index2]
example of use:
acc = "username: testing firstName: Bob lastName:Test email:auto999#nowhere.com password:gWY6*Pja&4"
print(get_data(acc, 'username'))
print(get_data(acc, 'password'))
output:
testing
gWY6*Pja&4
Since the generator is yours, you control the format in which the accounts are printed, and I personally find regexes hard to maintain.
This approach still works if extra spaces are added or the order of the attributes changes, e.g.:
acc = " username: testing firstName: Bob lastName :Test email:auto999#nowhere.com password : gWY6*Pja&4 "
acc = "firstName: Bob username: testing email:auto999#nowhere.com password:gWY6*Pja&4 lastName:Test "
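For comparison, the same order-independent lookup can be sketched with one small pattern per attribute. This is an alternative, not the approach of the answer above; find_attribute is a hypothetical helper, and it assumes values never contain whitespace.

```python
import re

def find_attribute(account, attribute):
    """Return the value following 'attribute:' or None if it is absent."""
    # Allow optional spaces on either side of the colon.
    pattern = r'\b' + re.escape(attribute) + r'\s*:\s*(\S+)'
    match = re.search(pattern, account)
    return match.group(1) if match else None

spaced = (" username:   testing  firstName: Bob lastName :Test "
          "email:auto999#nowhere.com password :  gWY6*Pja&4 ")
reordered = ("firstName: Bob username: testing email:auto999#nowhere.com "
             "password:gWY6*Pja&4 lastName:Test ")

print(find_attribute(spaced, 'username'))
print(find_attribute(reordered, 'password'))
```

Unlike a find()-based lookup, a missing attribute gives None here instead of a slice taken from the wrong position.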
import getpass
import sys
import telnetlib
import re
import smtplib
print "Pasul 1"
HOST = "route-views.routeviews.org"
user = "rviews"
password = ""
tn = telnetlib.Telnet(HOST)
tn.read_until("login: ", 5)
tn.write(user + "\r\n")
tn.read_until("Password: ", 5)
tn.write(password + "\r\n")
print tn.read_until(">", 10)
tn.write("show ip route 192.0.2.1"+"\r\n")
y = tn.read_until("free", 10)
print y
tn.write("exit"+ "\r\n")
tn.close()
print "Pasul 2"
for x in range(1,99999):
    m = re.search(' Known via "bgp xxxxx"', y)
    if m:
        print (m.group(0))
        break
    else:
        print False
        break
x has to be a number between 1 and 99999.
If I write ' Known via "bgp 6447"' it finds and prints it, but if I write ' Known via "bgp xxxxx"' it returns False. Does anybody know why?
The output is this:
route-views>
show ip route 192.0.2.1
Routing entry for 192.0.2.1/32
Known via "bgp 6447", distance 20, metric 0
Tag 19214, type external
Last update from 208.74.64.40 4w0d ago
Routing Descriptor Blocks:
208.74.64.40, from 208.74.64.40, 4w0d ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 19214
MPLS label: none
route-views>
You're using the regexp in the wrong way. Try replacing this whole section:
for x in range(1,99999):
    m = re.search(' Known via "bgp xxxxx"', y)
    if m:
        print (m.group(0))
        break
    else:
        print False
        break
with the following:
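m = re.search(r'Known via "bgp \d{1,5}"', y)
if m:
    print m.group(0)
else:
    print False
Note the r before the expression; it is important here. It makes the string raw, so the backslash in \d reaches the regex engine unchanged. \d{1,5} matches one to five digits, which covers the range 1 to 99999.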
You should probably read the docs for the Python re module: https://docs.python.org/2/library/re.html
Upd. By the way, your version does not work because the x inside the string is interpreted as a literal "x", not as the value of the variable x. If you want to put a variable's value inside a string, use formatting as in this example:
x = 12345
print ' Known via "bgp {}"'.format(x)
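If the number itself is needed rather than just a yes/no answer, a capture group can pull it out of the match. A small sketch, assuming the router output looks like the sample in the question; it is written with print() so it also runs on Python 3, and as_number is just an illustrative variable name.

```python
import re

# Shortened copy of the output shown in the question.
y = '''Routing entry for 192.0.2.1/32
  Known via "bgp 6447", distance 20, metric 0
  Tag 19214, type external'''

# The parentheses capture the one-to-five-digit number after "bgp ".
m = re.search(r'Known via "bgp (\d{1,5})"', y)
as_number = int(m.group(1)) if m else None
print(as_number)
```

No loop over 1 to 99999 is needed: the pattern describes the shape of the number, and the regex engine finds whichever value is present.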
It evaluates as true for me if I test it against that literal string:
>>> y = ' Known via "bgp xxxxx"'
>>> x = re.search('Known via "bgp xxxxx"', y)
>>> if x:
...     print "yes"
...
yes
I am trying to write a login routine for a python script. In doing so, I find the need to pattern match the credentials on a whole word basis. I have attempted to RegEx this, but it is failing for reasons that are unclear to me, but I hope are obvious to someone here. The code and output:
import re
authentry = "testusertestpass"
username = "testuser"
password = "testpass"
combo = "r\'\\b"+username + password + "\\b\'"
testcred = re.search(combo, authentry)
print combo
print authentry
print testcred
r'\btestusertestpass\b'
testusertestpass
None
So my regex test appears, at least to me, to be properly formatted, and should be a direct match against the test string, but is not. Any ideas? Thanks so much for any insight!
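Try this; it may work. The r prefix and the quotes are part of the Python string literal, so they must not be placed inside the pattern itself: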
import re
authentry = "testusertestpass with another text"
username = "testuser"
password = "testpass"
combo = username + password + r'\b'
testcred = re.search(combo, authentry)
print combo
print authentry
print testcred
output:
testusertestpass\b
testusertestpass with another text
<_sre.SRE_Match object at 0x1b8a030>
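To keep the whole-word check on both sides, the boundary markers can be added as raw strings around the credentials. A minimal sketch, assuming the credentials begin and end with word characters (otherwise \b will not behave as a word boundary there); re.escape is used so that any regex metacharacters in a username or password are matched literally.

```python
import re

authentry = "testusertestpass"
username = "testuser"
password = "testpass"

# r"\b" is a two-character string: a backslash followed by b.
# The credentials are escaped so they are treated as plain text.
combo = r"\b" + re.escape(username + password) + r"\b"

testcred = re.search(combo, authentry)
print(combo)
print(testcred is not None)
```

With boundaries on both sides, a string such as "xtestusertestpassx" no longer matches.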
I'm new to programming and Python.
Background
My program accepts a url. I want to extract the username from the url.
The username is the subdomain.
If the subdomain is 'www', the username should be the main part of the domain. The rest of the domain should be discarded (e.g. '.com/', '.org/').
I've tried the following:
def get_username_from_url(url):
    if url.startswith(r'http://www.'):
        user = url.replace(r'http://www.', '', 1)
        user = user.split('.')[0]
        return user
    elif url.startswith(r'http://'):
        user = url.replace(r'http://', '', 1)
        user = user.split('.')[0]
        return user
easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print get_username_from_url(easy_url)
# output = httpwwwweirdusername (good! expected.)
print get_username_from_url(hard_url)
# output = weirdusername (bad! username should = httpwwwweirdusername)
I've tried many other combinations using strip(), split(), and replace().
Could you advise me on how to solve this relatively simple problem?
There is a module called urlparse that is specifically for the task:
>>> from urlparse import urlparse
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> urlparse(url).hostname.split('.')[0]
'httpwwwweirdusername'
For http://www.httpwwwweirdusername.com/ this would output www, which is not desired. One workaround is to take the first item of the split hostname that is not equal to www:
>>> from urlparse import urlparse
>>> url = "http://www.httpwwwweirdusername.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
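On Python 3 the urlparse module lives in urllib.parse. A sketch of the same workaround wrapped in a function, reusing the get_username_from_url name from the question; returning None when no label is found is an assumption about the desired behaviour.

```python
from urllib.parse import urlparse

def get_username_from_url(url):
    """Return the first hostname label that is not 'www'."""
    hostname = urlparse(url).hostname or ""
    labels = hostname.split(".")
    # next() with a default avoids StopIteration when nothing qualifies.
    return next((label for label in labels if label != "www"), None)

easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print(get_username_from_url(easy_url))
print(get_username_from_url(hard_url))
```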
It is also possible to do this with regular expressions (the regex could probably be modified to be more accurate/efficient).
import re
# The dot after www is escaped so that it only matches a literal "."
url_pattern = re.compile(r'.*/(?:www\.)?(\w+)')
def get_username_from_url(url):
    match = re.match(url_pattern, url)
    if match:
        return match.group(1)
easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print get_username_from_url(easy_url)
print get_username_from_url(hard_url)
Which yields:
httpwwwweirdusername
httpwwwweirdusername
I am using .rsplit() to take the digits after the last comma in a string and separate them with further commas. The transformation should look like this:
Before:
,000
After:
,0,0,0
I am using the following method to do this:
upl = line.rsplit(",",1)[1:]
upl2 = "{}".format(",".join(list(upl[0])))
As a comparison, to ensure that the correct substring is being selected to begin with, I am also using this statement:
upl1 = "{}".format("".join(list(upl[0])))
I then print both to ensure that they are both as expected. In this example I get:
up1 = ,000
up2 = ,0,0,0,
I then use a .replace() statement to substitute out my before substring with my after one:
new_var = ''
for line in new_var.split("\n"):
    upl = line.rsplit(",",1)[1:]
    upl1 = "{}".format("".join(list(upl[0])))
    upl2 = "{}".format(",".join(list(upl[0])))
    upl2 = str(upl2)
    upl1 = str(upl1)
    new_var += line.replace(upl1, upl2) + '\n'
In almost all instances of the parsed data the old substring is correctly replaced with the new one. However, in a few the substituted string displays as:
,0,00 when it should be ,0,0,0,
Can anyone see an obvious reason why this might be? I am at a bit of a loss.
Thanks
EDIT:
Here is the Scrapy code I am using to generate the data I am manipulating. The issues come from line:
new_match3g += line.replace(spl1, spl2).replace(tpl1, tpl2).replace(upl1, upl2) + '\n'
The full code is:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.cmdline import execute
from scrapy.utils.markup import remove_tags
import time
import re
import json
class ExampleSpider(CrawlSpider):
    name = "mrcrawl2"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com"]
    download_delay = 5
    rules = [Rule(SgmlLinkExtractor(allow=('/Seasons'),deny=('/News', '/Fixtures', '/Graphics', '/Articles', '/Live', '/Matches', '/Explanations', '/Glossary', '/Players', 'ContactUs', 'TermsOfUse', 'Jobs', 'AboutUs', 'RSS'),), follow=False, callback='parse_item')]

    def parse_item(self, response):
        sel = Selector(response)
        regex = re.compile('DataStore\.prime\(\'history\', { stageId: \d+ },\[\[.*?\]\]?\)?;', re.S)
        match2g = re.search(regex, response.body)
        if match2g is not None:
            match3g = match2g.group()
            match3g = str(match3g)
            match3g = match3g.replace("'", '').replace("'", '').replace('[', '').replace(']', '').replace('] );', '')
            match3g = re.sub("DataStore\.prime\(history, { stageId: \d+ },", '', match3g)
            match3g = match3g.replace(');', '')
            #print'-' * 170, '\n', match3g.decode('utf-8'), '-' * 170, '\n'
            new_match3g = ''
            for line in match3g.split("\n"):
                upl = line.rsplit(",",1)[1:]
                if upl:
                    upl1 = "{}".format("".join(list(upl[0])))
                    upl2 = "{}".format(",".join(list(upl[0])))
                    upl2 = str(upl2)
                    upl1 = str(upl1)
                    new_match3g += line.replace(upl1, upl2) + '\n'
                    print "UPL1 = ", upl1
                    print "UPL2 = ", upl2
            print'-' * 170, '\n', new_match3g.decode('utf-8'), '-' * 170, '\n'
            print'-' * 170, '\n', match3g.decode('utf-8'), '-' * 170, '\n'

execute(['scrapy','crawl','mrcrawl2'])
Since you've given us an example, let's trace it through:
>>> line = ',9243,46,Unterhaching,2,11333,8,13,1,133'
>>> split = line.rsplit(",",1)
>>> split
[',9243,46,Unterhaching,2,11333,8,13,1', '133']
>>> upl = split[1:]
>>> upl
['133']
>>> upl0 = upl[0]
>>> upl0
'133'
>>> upl0_list = list(upl0)
>>> upl0_list
['1', '3', '3']
>>> joined1 = "".join(upl0_list)
>>> joined1
'133'
>>> upl1 = "{}".format(joined1)
>>> upl1
'133'
>>> joined2 = ",".join(upl0_list)
>>> joined2
'1,3,3'
>>> upl2 = "{}".format(joined2)
>>> upl2
'1,3,3'
>>> upl2 = str(upl2)
>>> upl2
'1,3,3'
>>> upl1 = str(upl1)
>>> upl1
'133'
>>> r = line.replace(upl1, upl2)
>>> r
',9243,46,Unterhaching,2,11,3,33,8,13,1,1,3,3'
Again, notice that more than half of the steps don't actually do anything at all. You're converting strings to the same strings, then converting them to the same strings again; you're converting them to lists just to join them back together; etc. If you can't explain what each step is supposed to do, why are you doing them? Your code is supposed to be instructions to the computer to do something; just giving it random instructions that you don't understand isn't going to do any good.
More importantly, that's not the output you described. It has a different problem than the one you described: in addition to correctly replacing the 133 at the end with 1,3,3, it's also replacing the embedded 133 in the middle of 11333 with 11,3,33. Because that's exactly what you're asking it to do.
So, assuming that's your actual problem, rather than the problem you asked about, how do you fix that?
Well, you don't. You don't want to replace every '133' substring with '1,3,3', so don't ask it to do that. You want to make a string with everything up to the last comma, followed by the processed version of everything after the last comma. In other words:
>>> ",".join([split[0], upl2])
',9243,46,Unterhaching,2,11333,8,13,1,1,3,3'
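The same idea can be wrapped in a small function. This is a sketch rather than a drop-in fix for the spider: expand_last_field is a hypothetical name, and str.rpartition is used in place of rsplit so that a line without any comma is returned unchanged.

```python
def expand_last_field(line):
    """Comma-separate the characters after the last comma; leave the rest alone."""
    head, sep, tail = line.rpartition(",")
    if not sep:
        # No comma at all: nothing to expand.
        return line
    # Joining a string with "," puts a comma between each character.
    return head + sep + ",".join(tail)

line = ',9243,46,Unterhaching,2,11333,8,13,1,133'
result = expand_last_field(line)
print(result)
```

Because only the part after the last comma is rebuilt, the 133 embedded in 11333 is never touched.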
I'd do it this way:
>>> ",000".replace("", ",")[2:]
',0,0,0,'