Convert spaces to %20 in list - python

I need to convert spaces to %20 for api posts in a python array
tree = et.parse(os.environ['SPRINT_XML'])
olp = tree.findall(".//string")
if not olp:
    print colored('FAILED', 'red') + " No jobs associated with this view"
    exit(1)
joblist = [t.text for t in olp]
How can I do that to t.text above?

I would recommend using the urllib.parse module and its quote() function.
https://docs.python.org/3.6/library/urllib.parse.html#urllib.parse.quote
Example for Python3:
from urllib.parse import quote
text_encoded = quote(t.text)
Note: quote_plus() won't work in your case, because it replaces spaces with a plus sign (+) rather than %20.
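Applied to the whole list from the question, this might look like the sketch below. The job names are made up for illustration, and encode_names is a hypothetical helper, not part of urllib. Passing safe="" is an assumption that the names end up in a single URL path segment, so "/" should be escaped too.

```python
from urllib.parse import quote

def encode_names(names):
    """Percent-encode each name so that spaces become %20."""
    # safe="" also escapes "/", which quote() leaves untouched by default
    return [quote(name, safe="") for name in names]

# Hypothetical job names standing in for [t.text for t in olp]
joblist = encode_names(["nightly build", "deploy to prod"])
print(joblist)  # ['nightly%20build', 'deploy%20to%20prod']
```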

Use the str.replace() method as described here: http://www.tutorialspoint.com/python/string_replace.htm
So for t.text, it would be t.text.replace(" ", "%20")
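One thing to keep in mind with plain replace() is that it only handles spaces. A small comparison, using a made-up job name, shows the difference from urllib.parse.quote (Python 3):

```python
from urllib.parse import quote

name = "R&D nightly build"  # hypothetical job name

# str.replace only touches the spaces
replaced = name.replace(" ", "%20")
print(replaced)  # R&D%20nightly%20build

# quote() also escapes other reserved characters such as "&"
quoted = quote(name)
print(quoted)  # R%26D%20nightly%20build
```

If the names can only ever contain letters, digits and spaces, both give the same result.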

Use urllib.quote for this (Python 2). Note that urllib.quote_plus would turn spaces into + rather than %20:
import urllib
...
joblist = [urllib.quote(t.text) for t in olp]

Here is something I found that might help you: https://www.youtube.com/watch?v=qqxKQbKTO7o
def spaceReplace(s):
    strArr = list(s)
    for i, c in enumerate(strArr):
        if c == ' ':
            strArr[i] = '%20'
    return "".join(strArr)
df["Name"] = df["Name"].apply(spaceReplace)

Related

How do I print matches from a regex given a string value in Python?

I have the string "/browse/advanced-computer-science-modules?title=machine-learning" in Python. I want to print the string in between the second "/" and the "?", which is "advanced-computer-science-modules".
I've created a regular expression that is as follows ^([a-z]*[\-]*[a-z])*?$ but it prints nothing when I run the .findall() function from the re module.
I created my own regex and imported the re module in python. Below is a snippet of my code that returned nothing.
regex = re.compile(r'^([a-z]*[\-]*[a-z])*?$')
str = '/browse/advanced-computer-science-modules?title=machine-learning'
print(regex.findall(str))
Since this appears to be a URL, I'd suggest you use URL-parsing tools instead:
>>> from urllib.parse import urlsplit
>>> url = '/browse/advanced-computer-science-modules?title=machine-learning'
>>> s = urlsplit(url)
>>> s
SplitResult(scheme='', netloc='', path='/browse/advanced-computer-science-modules', query='title=machine-learning', fragment='')
>>> s.path
'/browse/advanced-computer-science-modules'
>>> s.path.split('/')[-1]
'advanced-computer-science-modules'
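Wrapped up as a function, the same idea could look like this. It is only a sketch: last_path_segment is a hypothetical name, and skipping empty segments (so a trailing "/" is ignored) is an assumption about what you want.

```python
from urllib.parse import urlsplit

def last_path_segment(url):
    """Return the final non-empty segment of the URL's path, or '' if none."""
    path = urlsplit(url).path
    # Drop empty pieces produced by leading, trailing or doubled slashes
    segments = [seg for seg in path.split("/") if seg]
    return segments[-1] if segments else ""

url = "/browse/advanced-computer-science-modules?title=machine-learning"
print(last_path_segment(url))  # advanced-computer-science-modules
```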
The regex is as follows:
\/[a-zA-Z\-]+\?
Then take the first match and strip the leading "/" and the trailing "?":
regex.findall(str)[0][1:-1]
This is very specific to this problem, but it should work.
Alternatively, you can use the split method of a string:
str = '/browse/advanced-computer-science-modules?title=machine-learning'
result = str.split('/')[-1].split('?')[0]
print(result)
#advanced-computer-science-modules

How can I split string between group of word in python?

How can I split "Value1" and "Value2" out of this string?
my_str = 'Value1Value2'
I tried this, but it doesn't work:
my_str = 'Value1Value2'
for i in my_str:
    i = str(i).split('^<a.*>$|</a>')
    print(i)
You can use bs4.BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(my_str, 'html.parser')
out = [st.string for st in soup.find_all('a')]
Output:
['Value1', 'Value2']
Another way is to clean the string manually: split on one delimiter, then strip out the unwanted characters.
Here's the code I used:
my_str = 'Value1Value2'
strList = my_str.split('/a>', maxsplit=2)
for i in strList:
    try:
        print(i.split('>')[1].replace('<', ''))
    except IndexError:
        pass
This will get you Value1 and Value2.
If you want to do regex splitting on HTML (which, again, you shouldn't; see the bs4 answer above for a much better approach):
import re
my_str = 'Value1Value2'
split_str = re.findall(r'(?<=>)\w*?(?=<\/a>)', my_str)
This works if you want the entire HTML element for each:
import re
re.sub(r"(a>)(<a)", r"\1[SEP]\2", my_str).split("[SEP]")
If you just want the values, do this:
re.findall(r">(.[^<]+)</a>", my_str)
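If installing bs4 is not an option, the standard library's html.parser can do something similar. This is a rough sketch, not a drop-in replacement: AnchorText is a hypothetical class name, and the markup below is an assumption, since the tags in the question's string did not survive.

```python
from html.parser import HTMLParser

class AnchorText(HTMLParser):
    """Collect the text found inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.values.append("")  # start a new entry for this anchor

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.values[-1] += data

# Assumed markup, for illustration only
my_str = '<a href="#">Value1</a><a href="#">Value2</a>'
parser = AnchorText()
parser.feed(my_str)
parser.close()
print(parser.values)  # ['Value1', 'Value2']
```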

how to use python re findall and regex so that 2 conditions can be run simultaneously?

Here is my data string:
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
I use this code, but when I run it, it shows nothing for DATANORMAL:
mydata = re.findall(r'MYDATA=(.*)' r'_.*', mystring)
print mydata
and it only shows: NOTNORMAL
I want both to work and display the data like this:
DATANORMAL
NOTNORMAL
How do I do it? Thanks.
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""
mydata = re.findall(r'^\s*MYDATA=(?:.+_)?(.+?)\s*$', mystring, re.M)
print(mydata)
If you need the word before the _ rather than the one after it, use the regex r'^\s*MYDATA=(.+?)(?:_.+)?\s*$' in the code above.
Based on what you describe, you might want to use an alternation here:
\bMYDATA=((?:DATA|(?:DATA_))\S+)\b
Script:
import re
inp = "some text MYDATA=DATANORMAL more text MYDATA=DATA_NOTNORMAL"
mydata = re.findall(r'\bMYDATA=((?:DATA|(?:DATA_))\S+)\b', inp)
print(mydata)
This prints:
['DATANORMAL', 'DATA_NOTNORMAL']
I guess you need to add flags=re.M?
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL"""
pattern = re.compile(r"MYDATA=(?:DATA_)?(\w+)", flags=re.M)
print(pattern.findall(mystring))
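To see the difference between keeping the part after the underscore and the part before it, here is a small side-by-side run of the two multiline patterns from the first answer on the same sample data. The variable names after and before are just for illustration.

```python
import re

mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""

# Keeps the part after the underscore (or the whole value if there is none)
after = re.findall(r'^\s*MYDATA=(?:.+_)?(.+?)\s*$', mystring, re.M)

# Keeps the part before the underscore (or the whole value if there is none)
before = re.findall(r'^\s*MYDATA=(.+?)(?:_.+)?\s*$', mystring, re.M)

print(after)   # ['DATANORMAL', 'NOTNORMAL']
print(before)  # ['DATANORMAL', 'DATA']
```

Here re.M matters because the patterns use ^ and $, which then match at the start and end of every line rather than only at the ends of the whole string.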

i want to change the url using python

I'm new to Python and can't figure out a way to do this, so I'm asking for help.
I have a URL like this: https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4. I want to remove the last part of the URL (go_cc_Jpterxvid_avi_mp4) and also replace /f/ with /d/, so that the URL becomes https://abc.xyz/d/b
The /b part changes regularly. I tried something like this, but it didn't work:
newurl = oldurl.replace('/f/','/d/').rsplit("/", 1)[0])
Late answer, but you can use re.sub to replace "/f/.+" with "/d/b", i.e.:
old_url = "https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4"
new_url = re.sub("/f/.+", r"/d/b", old_url)
# https://abc.xyz/d/b
You can apply re.sub twice:
import re
s = 'https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4'
new_s = re.sub(r'(?<=\.\w{3}/)\w', 'd', re.sub(r'(?<=/)\w+$', '', s))
Output:
'https://abc.xyz/d/b/'
import re
domain_str = 'https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4'
# find all appearances of the first part of the URL
matches = re.findall(r'(https?://\w*\.\w*/?)', domain_str)
# add your domain extension to each of the results
d_extension = 'd'
altered_domains = []
for res in matches:
    altered_domains.append(res + d_extension)
print(altered_domains)
Example input:
'https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4'
and output:
['https://abc.xyz/d']
What you had almost worked. The change is to remove the trailing right paren ) at the end of your assignment to newurl. The following works in both Python 2 and 3:
oldurl = "https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4"
newurl = oldurl.replace('/f/','/d/').rsplit("/", 1)[0]
print(newurl)
But a more idiomatic solution can be obtained through the re module of the standard library:
import re
old_url = "https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4"
new_url = re.sub("/f/.+", r"/d/b", old_url)
print(new_url)
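Since the /b part changes regularly, an approach that does not hard-code it may be more robust. Below is a sketch using urllib.parse (Python 3). rewrite_url and its new_first parameter are hypothetical names, and the rule "swap the first path segment, drop the last one" is my reading of the question.

```python
from urllib.parse import urlsplit, urlunsplit

def rewrite_url(url, new_first="d"):
    """Swap the first path segment for new_first and drop the last segment."""
    parts = urlsplit(url)
    segments = [seg for seg in parts.path.split("/") if seg]
    if len(segments) < 2:
        return url  # too short to rewrite; return unchanged
    segments = [new_first] + segments[1:-1]
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))

print(rewrite_url("https://abc.xyz/f/b/go_cc_Jpterxvid_avi_mp4"))
# https://abc.xyz/d/b
```

Any query string or fragment is carried over unchanged, because only the path component is replaced.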

Deleting all occurrences of '/' after its 2nd occurrence in python

I have a URL string: https://example.com/about/hello/
I want to split it into 'https://example.com', 'about', 'hello'.
How can I do this?
Use the urlparse module to correctly parse a URL (Python 2):
import urlparse
url = 'https://example.com/about/hello/'
parts = urlparse.urlparse(url)
paths = [p for p in parts.path.split('/') if p]
print 'Scheme:', parts.scheme # https
print 'Host:', parts.netloc # example.com
print 'Path:', parts.path # /about/hello/
print 'Paths:', paths # ['about', 'hello']
At the end of the day, the information you want is in the parts.scheme, parts.netloc and paths variables.
You may do this:
First split by '/'
Then join by '/' only before the 3rd occurrence
Code:
text="https://example.com/about/hello/"
groups = text.split('/')
print( "/".join(groups[:3]),groups[3],groups[4])
Output:
https://example.com about hello
Inspired by Hai Vu's answer. This solution is for Python 3:
from urllib.parse import urlparse
url = 'https://example.com/about/hello/'
parts = [p for p in urlparse(url).path.split('/') if p]
parts.insert(0, '/'.join(url.split('/')[:3]))
There are lots of ways to do this. You could use re.split() to split on a regular expression, for instance.
>>> import re
>>> re.split(r'\b/\b', 'https://example.com/about/hello/'.rstrip('/'))
['https://example.com', 'about', 'hello']
re is part of the standard library, documented here.
https://docs.python.org/3/library/re.html#re.split
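The regex itself uses \b, which means a boundary between a "word" character and a "non-word" character. A trailing "/" at the very end of the string has no such boundary after it, which is why it is stripped first. You can use regex101 to explore how it works. https://regex101.com/r/mY8fV8/1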
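For completeness, the pieces the question asks for can also be assembled without a regex. This is a sketch for Python 3; split_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return ['scheme://host', segment1, segment2, ...] for a URL."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    # Skip empty pieces so leading and trailing slashes are ignored
    return [origin] + [seg for seg in parts.path.split("/") if seg]

print(split_url("https://example.com/about/hello/"))
# ['https://example.com', 'about', 'hello']
```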
