How to cut link in python? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I have the following link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg
How can I keep just this part of the link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
and remove everything else? I also want to keep the extension.
I want to remove this part:
._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_
and keep this part:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
How can I do this in python?

You could use:
re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
This makes some assumptions but works on your input. The match starts at the ._ sequence, takes everything after it that is a letter, digit, dash, underscore, dot or comma, and then matches the extension. I picked a small explicit group of possible extensions; you could instead use (\.\w+)$ at the end to accept any extension made of word characters.
Demo:
>>> import re
>>> inputurl = 'http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg'
>>> re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
'http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg'
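The answer suggests widening the extension group to accept any word-character extension. A minimal sketch of that variant using (\.\w+)$, wrapped in a function; the name strip_size_suffix is hypothetical, and it assumes the unwanted part always starts at the first ._ in the URL and contains only word characters, dots, commas and dashes:

```python
import re

# Hypothetical helper: drop everything from the first "._" up to the
# final extension, keeping the extension itself (group 1).
SUFFIX_RE = re.compile(r'\._[\w.,-]*(\.\w+)$')

def strip_size_suffix(url):
    # re.sub returns the URL unchanged when there is no "._" suffix
    return SUFFIX_RE.sub(r'\1', url)

url = ('http://ecx.images-amazon.com/images/I/51JXXb2vpDL.'
       '_SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg')
print(strip_size_suffix(url))
print(strip_size_suffix('http://example.com/a/b._SX300_.webp'))
```

URLs that already have no suffix pass through unchanged, so the function can be applied to a mixed list of links.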

url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
l = url.split(".")
# Drop the last two dot-separated pieces (the size suffix and the extension), then re-append the extension
print(".".join(l[:-2]) + ".{}".format(l[-1]))
This prints:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
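The split approach assumes the file name always carries exactly one extra dotted suffix. On a link that has none, such as http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg, it would cut into the host name and return http://ecx.images-amazon.jpg. A minimal sketch that only touches the last path segment, using urllib.parse; the function name clean_image_url is hypothetical, and it assumes the image ID is everything before the first dot of the file name:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def clean_image_url(url):
    # Work on the path only, so dots in the host name are never touched
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    # Assumption: the image ID is everything before the first dot of the file name
    image_id = filename.split(".", 1)[0]
    # The extension is everything from the last dot onwards (e.g. ".jpg")
    extension = posixpath.splitext(filename)[1]
    new_path = posixpath.join(directory, image_id + extension)
    return urlunsplit(parts._replace(path=new_path))

url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
print(clean_image_url(url))
print(clean_image_url("http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg"))
```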

The following should work:
import re
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
print(re.sub(r"(https?://.+?)\._.+(\.\w+)", r'\1\2', url))
The above code prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
An important detail: more example links would be needed to pin down the correct pattern. This assumes you want everything up to the first ._ plus the final extension.

# Keep the file name up to its first dot (group 1) and the final extension (group 2)
url = re.sub(r"(/[^./]+)\.[^/]*?(\.[^.]+)$", r"\1\2", url)
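The one-line answer above comes without an explanation. Here is a sketch of the same pattern written with re.VERBOSE so each part can carry a comment; the behavior should be identical, since verbose mode only ignores whitespace and # comments outside character classes:

```python
import re

# The same pattern, split into commented pieces with re.VERBOSE
PATTERN = re.compile(r"""
    ( / [^./]+ )    # group 1: a slash and the file name up to its first dot
    \. [^/]*?       # that dot plus the suffix, matched lazily and never crossing a slash
    ( \. [^.]+ )    # group 2: the last dot and the extension (no further dots)
    $               # anchored at the end of the URL
""", re.VERBOSE)

url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
print(PATTERN.sub(r"\1\2", url))
```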

Related

Regex syntax to match equal sign in each word but do not take in consideration the first equal sign [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 27 days ago.
This might be confusing, and I don't know if it is even possible. I have some knowledge of regex in Python, but I couldn't solve this issue.
I have logs in which I want to replace the equal signs in the URL with their URL encoding (%3D) using a regex.
For example, I have logs like this:
request=www.google.com/fgdsg=gfsdg=gfdsd
request_access=https://regex101.com/eawf/?=dasf
I want to match every equal sign after the first one (which assigns the value to the variable) and replace it with %3D, like this:
request=www.google.com/fgdsg%3Dgfsdg%3Dgfdsd
request_access=https://regex101.com/eawf/?%3Ddasf
This is the result I want.
The text above is only an example; these are not real logs.
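You can use a negative lookbehind like so: (?<!request)=. This matches every = that is not immediately preceded by request. Note that it only protects a key spelled exactly request; on a line such as request_access=... the first = would be replaced as well. A small example: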
import re
sample = "request=www.google.com/fgdsg=gfsdg=gfdsd"
replaced = re.sub('(?<!request)=', '%3D', sample)
print(replaced)
First remove everything up to and including the first =, leaving only the request URL. Then you can simply replace every = in that string with %3D.
import re
log = "request=www.google.com/fgdsg=gfsdg=gfdsd"
request_url = re.sub(r'^\w+=', '', log, count=1)
urlencoded_request_url = re.sub(r'=', '%3D', request_url)
print(urlencoded_request_url) # www.google.com/fgdsg%3Dgfsdg%3Dgfdsd
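The lookbehind answer only protects a key spelled exactly request, and the second answer drops the key altogether. To get the output shown in the question for any key name, here is a sketch that swaps the regex for plain str.partition: split each line once at the first = and encode only the value. The helper name encode_extra_equals is hypothetical, and it assumes one key=value pair per line:

```python
def encode_extra_equals(line):
    # Split once at the first "=": the key, the separator, and the raw value
    key, sep, value = line.partition("=")
    # Percent-encode every remaining "=" in the value only
    return key + sep + value.replace("=", "%3D")

logs = [
    "request=www.google.com/fgdsg=gfsdg=gfdsd",
    "request_access=https://regex101.com/eawf/?=dasf",
]
for line in logs:
    print(encode_extra_equals(line))
```

A line without any = is returned unchanged, because partition then yields an empty separator and value.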

Replace a string in a URL with python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a text containing a URL that needs to be reworked.
text='dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
I need to programmatically replace the id value (1812 in this example, which is not known before execution) with a fixed substring (e.g. 189), so the end result must be:
'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}'
Since I'm programming in Python, I guess I should use a regular expression (the re module) to replace the value between "id": and }}, but I couldn't find one that works for this use case.
I assume you always generate the URL with the same pattern, and the value to change is always in {"id":X}. One way to solve this particular problem is a positive lookbehind combined with re.sub:
import re
pattern = re.compile(r"(?<=\"id\":)\d+")
string = "dfs:/?url=https://myserver/c12&ofg={\"tes\":{\"id\":1812}}"
print(pattern.sub("desired_value", string))
The output contains desired_value in place of 1812. regex101 gives a detailed breakdown of what is happening, but in short the pattern:
matches one or more digits, but only if they are immediately preceded by "id":; the lookbehind checks this without consuming any characters.
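A variant of the same approach, sketched with a capturing group instead of the lookbehind. Python's re requires lookbehinds to be fixed-width, so a group is a simple way to tolerate optional whitespace after the colon (an assumption; the string in the question has none). The helper name replace_id is hypothetical:

```python
import re

def replace_id(text, new_id):
    # Group 1 keeps the '"id":' prefix (plus any spaces); only the digits change.
    # \g<1> is used instead of \1 because a replacement like "\1189" would be
    # read as a reference to group 11 rather than group 1 followed by "189".
    return re.sub(r'("id":\s*)\d+', r'\g<1>' + str(new_id), text, count=1)

text = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
print(replace_id(text, 189))
```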
What about simply splitting the string twice? For example:
my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
substring = my_string.split('"id":',1)[1]
substring = substring.split('}}')[0]
print(my_string.replace(substring, "189"))
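One caveat with str.replace: it replaces every occurrence of the old value, so if 1812 also appeared elsewhere in the URL it would be changed too. A minimal sketch of the same split-twice idea that swaps only that one slice, using str.partition; the function name replace_id_value is hypothetical, and it assumes "id": is present and followed by }}:

```python
def replace_id_value(text, new_value):
    # Split at the first '"id":' ...
    head, marker, tail = text.partition('"id":')
    # ... and at the first '}}' after it; the middle piece is the old value
    _old_value, closing, rest = tail.partition('}}')
    if not marker or not closing:
        # Pattern not found: leave the text unchanged
        return text
    # Rebuild the string with only that one slice swapped out
    return head + marker + new_value + closing + rest

my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
print(replace_id_value(my_string, "189"))
```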

Python: how to extract the variables between 2 constant substring [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I am trying to extract the values between two constant substrings in a string. For example, I wish to extract the values Apple, Orange, Watermelon, Kiwi ... 13cups, 14cups ... 19cups from the string below. As a first step I am using a regular expression to take the values between $ signs, but I do not get any results.
Can anyone advise on the correct expression, or is there a better way to extract them?
Thanks.
import re
file = '$n$n$n$xa0$n$nSHOWALL$nSHOWALL%GROWTH$n$n$xa0$n$xa0$n$n$n$nApple$na$nOrange$n$nWatermelon$nKiwi$n$nBanana$nJackfruit$n$nGuava$na$nGrape$n$nPlum$na$nOrange$n$nCoconut$nWatermelon$n$n12cups$n13cups$n$n14cups$na$n15cups$n$n16cups$na$n17cups$n$n18cups$n19cups$n'
found = re.findall(r'(?=$(.*?)$)',file)
print(found)
Given that the rules for identifying the required character sequences are ambiguous, I contend that a regular expression is impractical here. No doubt it could be done, but here's a quick'n'dirty approach to the problem:
data = '$n$n$n$xa0$n$nSHOWALL$nSHOWALL%GROWTH$n$n$xa0$n$xa0$n$n$n$nApple$na$nOrange$n$nWatermelon$nKiwi$n$nBanana$nJackfruit$n$nGuava$na$nGrape$n$nPlum$na$nOrange$n$nCoconut$nWatermelon$n$n12cups$n13cups$n$n14cups$na$n15cups$n$n16cups$na$n17cups$n$n18cups$n19cups$n'
for token in data.split('$n'):
    if token not in ('SHOWALL%GROWTH', 'SHOWALL', '$xa0', 'a', ''):
        print(token)
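If you do want a regular expression, the main problem with the pattern in the question is that an unescaped $ is the end-of-string anchor, not a literal dollar sign; escaping it as \$ makes it literal. A minimal sketch, assuming (as the answer above does) that SHOWALL, SHOWALL%GROWTH and the single letter a are noise to discard; the NOISE set is an illustrative name:

```python
import re

data = '$n$n$n$xa0$n$nSHOWALL$nSHOWALL%GROWTH$n$n$xa0$n$xa0$n$n$n$nApple$na$nOrange$n$nWatermelon$nKiwi$n$nBanana$nJackfruit$n$nGuava$na$nGrape$n$nPlum$na$nOrange$n$nCoconut$nWatermelon$n$n12cups$n13cups$n$n14cups$na$n15cups$n$n16cups$na$n17cups$n$n18cups$n19cups$n'

# Assumed noise tokens, taken from the answer above
NOISE = {'SHOWALL', 'SHOWALL%GROWTH', 'a'}

# \$n is a literal "$n"; inside a character class $ is literal, so [^$]+
# captures everything up to the next dollar sign. Empty tokens and "$xa0"
# are skipped automatically because [^$]+ needs at least one character.
tokens = re.findall(r'\$n([^$]+)', data)
values = [t for t in tokens if t not in NOISE]
print(values)
```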

negative lookbehind not working as expected [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have strings of this form:
FPLBX(2x3)ZE(53x13)(4x7)ZGQO
I want to find the blocks in parentheses, but only when they're not preceded by another group.
The other way around works perfectly fine, but I can't make it work for groups that are preceded by another one.
Current regex:
(\(\d*x\d*\))(?<!\))
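You simply need to put the negative lookbehind assertion, i.e. the (?<!\)) part, in front of the rest of your pattern. Placed after the group, the lookbehind inspects the ) that the group has just consumed, so it can never succeed: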
>>> import re
>>> txt = "FPLBX(2x3)ZE(53x13)(4x7)ZGQO"
>>> re.findall(r"(?<!\))(\(\d*x\d*\))", txt)
['(2x3)', '(53x13)']
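To see why the position matters, here is a small sketch comparing the question's placement, the corrected one, and a lookahead. The lookahead line is only a guess at the "other way around" pattern the question mentions; that pattern is not shown there:

```python
import re

txt = "FPLBX(2x3)ZE(53x13)(4x7)ZGQO"
group = r"\(\d*x\d*\)"  # one (NxM) block, without a capturing group

# Lookbehind placed AFTER the block: it inspects the ")" the block just
# consumed, so the negative assertion always fails
after_group = re.findall(group + r"(?<!\))", txt)
# Lookbehind placed BEFORE the block: it inspects the character preceding "("
before_group = re.findall(r"(?<!\))" + group, txt)
# Lookahead after the block: "not followed by another block"
not_followed = re.findall(group + r"(?!\()", txt)

print(after_group)    # []
print(before_group)   # ['(2x3)', '(53x13)']
print(not_followed)   # ['(2x3)', '(4x7)']
```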

Python Regular Expression for pattern containing multiple lines [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I want to extract all the text printed after "AAAAAAAAAAAAAAAAAA" in output like this:
Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp
The following does not work:
import re
m = re.findall(r'AAAAAAAAAAAAAAAAAA(.*)', result)
print m[0]
Also, can I use a variable in a regular expression instead of a hard-coded string like "AAAAAAAAAAAAAAAAAA"?
The reason is that the text "AAAAAAAAAAAAAAAAAA" is held in a variable and changes, so I would like to search for the variable's current value and then extract all the text after it.
Use re.S or re.DOTALL (they are synonyms) so that . also matches newlines; without it, (.*) stops at the end of the AAAAAAAAAAAAAAAAAA line. Also, since you only want one match, re.search is probably more appropriate here than findall. To use a non-hard-coded string, simply build the pattern with string formatting or concatenation, and run the string through re.escape so that any regex metacharacters in it are matched literally.
import re
result = """Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp"""
s = 'AAAAAAAAAAAAAAAAAA'
# With formatting
m = re.search(r'{}(.*)'.format(re.escape(s)), result, re.S)
# With concatenation
m = re.search(re.escape(s) + r'(.*)', result, re.S)
print(m.group(1))
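If the marker might be missing, re.search returns None and m.group(1) raises AttributeError. A minimal sketch that wraps the same search in a helper and handles that case; the name text_after is hypothetical:

```python
import re

def text_after(marker, text):
    # re.escape makes regex metacharacters in the marker literal;
    # re.S lets "." match newlines so the capture runs to the end of the text
    m = re.search(re.escape(marker) + r'(.*)', text, re.S)
    # re.search returns None when the marker does not occur
    return m.group(1) if m else None

result = "Give me some text!\nAAAAAAAAAAAAAAAAAA\nS\np\np\np\nEpppp"
print(repr(text_after('AAAAAAAAAAAAAAAAAA', result)))
print(text_after('not present', result))
```

The captured text starts with the newline that follows the marker; call .lstrip('\n') on it if that is unwanted. When no regex features are needed at all, result.partition(marker)[2] returns the same text (or an empty string if the marker is missing).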
