How do I clean a URL using Python [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
When I extract a URL, it is displayed as below:
https://tv.line.me/v/14985624_%E0%B8%A0%E0%B8%B9%E0%B8%95%E0%B8%A3%E0%B8%B1%E0%B8%95%E0%B8%95%E0%B8%B4%E0%B8%81%E0%B8%B2%E0%B8%A5-ep3-6-6-%E0%B8%8A%E0%B9%88%E0%B8%AD%E0%B8%878
How do I convert this to a more readable format in Python, like the one the browser shows? The readable link points to the same page as the one above.
(Link to an image of how the URL appears in the browser address bar.)

You can use the unquote function from the urllib.parse module to decode the percent-encoded URL:
from urllib.parse import unquote
url = unquote('https://tv.line.me/v/14985624_%E0%B8%A0%E0%B8%B9%E0%B8%95%E0%B8%A3%E0%B8%B1%E0%B8%95%E0%B8%95%E0%B8%B4%E0%B8%81%E0%B8%B2%E0%B8%A5-ep3-6-6-%E0%B8%8A%E0%B9%88%E0%B8%AD%E0%B8%878')
print(url)
This gives the following result:
https://tv.line.me/v/14985624_ภูตรัตติกาล-ep3-6-6-ช่อง8
Thank you
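Calling unquote on the whole URL is fine for display, but it also decodes escapes in the query string, where something like %26 would become a literal & and change the meaning of the URL. A minimal sketch that decodes only the path is shown below; the helper name readable_url is made up for this example.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def readable_url(url):
    """Decode percent-escapes in the path only; query and fragment stay as-is."""
    parts = urlsplit(url)
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(path=unquote(parts.path)))

encoded = (
    "https://tv.line.me/v/14985624_%E0%B8%A0%E0%B8%B9%E0%B8%95%E0%B8%A3"
    "%E0%B8%B1%E0%B8%95%E0%B8%95%E0%B8%B4%E0%B8%81%E0%B8%B2%E0%B8%A5"
    "-ep3-6-6-%E0%B8%8A%E0%B9%88%E0%B8%AD%E0%B8%878"
)
readable = readable_url(encoded)
print(readable)
```

The decoded form is meant for showing to people. To send a request, keep using the original encoded URL, or re-encode the path with urllib.parse.quote.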

Related

Hi, I am trying to scrape some data from a webpage through 'requests' and 'BeautifulSoup' libraries in python via following line of code [closed]

I am trying to scrape some data from a webpage through requests and BeautifulSoup libraries in python via following lines of code:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?")
c = r.content
soup=BeautifulSoup(c,'html.parser')
all=soup.find_all("div",{'class' : "infinite-item property-card clearfix property-card-CBR52611979 initialized visited" })
all
In the output I am getting an empty list:
[]
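One possible cause, offered as an assumption rather than a confirmed diagnosis: when find_all is given a class string containing several classes, BeautifulSoup compares it against the exact value of the class attribute. Classes such as "initialized" and "visited" look like state added by JavaScript in the browser, so they may be missing from the HTML that requests receives. The sketch below uses hypothetical, trimmed-down markup instead of the live page to show the difference between matching the full string and matching one stable class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for what the server might return
# (no "initialized visited" classes, which a browser script may add later).
html = """
<div class="infinite-item property-card clearfix property-card-CBR52611979">A</div>
<div class="infinite-item property-card clearfix property-card-CBR00000000">B</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact multi-class string: matches only if the attribute is exactly this string.
exact = soup.find_all(
    "div",
    {"class": "infinite-item property-card clearfix property-card-CBR52611979 initialized visited"},
)

# A single class name: matches every div that carries that class.
cards = soup.find_all("div", class_="property-card")

print(len(exact), len(cards))
```

If the listings are loaded entirely by JavaScript, they will not be in r.content at all, and no selector will find them; printing part of r.text is a quick way to check.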

How to use proxy in requests [closed]

I have got proxies from leafproxies, and they are giving me something like this. I don't really understand what this is. At the very end, I want to use it in requests.
Here's what they provide.
ipaddress:port:xxx:xxx
I believe that xxx:xxx part is username and password but I am not really sure.
Can anyone guide me, how can I use it with python requests?
You can try this:
import requests
import os
# Syntax is http://username:password@ipaddress:port
http_proxyf = 'http://xxx:xxx@gPtYMLoPX84.5.maxrainbow.net:8877'
# Replace xxx:xxx with the username:password pair from your provider;
# it goes first, before the @, followed by the proxy host and port.
os.environ["http_proxy"] = http_proxyf
os.environ["https_proxy"] = http_proxyf
page = requests.get("https://google.com")
To check that the proxy is being used, you can do this:
print(requests.get('https://api64.ipify.org?format=json').json())
This returns the IP address the request came from, which should be the proxy's address rather than your own.
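As a sketch of how the ipaddress:port:xxx:xxx line from the question could be converted, the helper below assumes the four fields are host, port, username and password, in that order. That is the asker's guess and is not confirmed by the provider. The function name proxies_from_line and the sample values are invented for illustration.

```python
from urllib.parse import quote

def proxies_from_line(line, scheme="http"):
    """Turn 'host:port:username:password' into a proxies mapping.

    Assumes the provider's field order is host, port, username, password.
    """
    # maxsplit=3 keeps any extra colons inside the password field.
    host, port, username, password = line.strip().split(":", 3)
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{auth}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}

proxies = proxies_from_line("203.0.113.10:8877:alice:s3cret")
print(proxies["https"])  # http://alice:s3cret@203.0.113.10:8877
```

The resulting dictionary can be passed per request, for example requests.get(url, proxies=proxies, timeout=10), which avoids changing environment variables for the whole process.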

How to check if user input is an url? [closed]

I'm currently working on a small side project using the Steam API, and I want to check whether the user's input is a URL or not.
I could not find anything on this specific problem, so if anyone knows how to do it, please help!
Use a regular expression.
Sample code below:
import re
regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\"., <>?«»“”‘’]))"
def is_url(input_string):
    # re.match anchors at the start of the string; convert the match to a bool
    return re.match(regex, input_string) is not None
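As an alternative to a regular expression, the standard library's urllib.parse.urlparse can split the input, and the check then only asks whether a scheme and a host are present. This is a rough sketch, not a full validator: the name looks_like_url and the choice to accept only http and https are assumptions made for the example.

```python
from urllib.parse import urlparse

def looks_like_url(text, schemes=("http", "https")):
    """Return True if text parses as a URL with an allowed scheme and a host."""
    try:
        parsed = urlparse(text.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parsed.scheme in schemes and bool(parsed.netloc)

print(looks_like_url("https://steamcommunity.com/id/example"))  # True
print(looks_like_url("just some text"))                         # False
```

Unlike the regular expression, this version rejects input without a scheme, such as www.example.com, so which one fits depends on what users are expected to type.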

Read first line of a raw text [closed]

Hey, I'm using my little scraper and I want to save the results with one detail: the first line of a raw Pastebin (or any other bin) webpage.
Let's say I have this code:
r=requests.get("https://pastebin.com/raw/qH03hKGU") #random link
text=r.text
I want to get the first line of the variable text without saving the whole text (I will save just that one line).
You can use partition('\n')[0] to get the first line:
import requests
r=requests.get("https://pastebin.com/raw/qH03hKGU") #random link
text=r.text
print(text.partition('\n')[0])
OUT: import glob
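One caveat with partition('\n'): if the text uses Windows line endings (\r\n), the first line keeps a trailing \r. Whether a given paste does so is an assumption that is not checked here. A small sketch using str.splitlines, which handles both kinds of line ending, is shown below; the helper name first_line is made up for the example.

```python
def first_line(text):
    """Return the first line without its line ending, or '' for empty text."""
    # splitlines() splits on \n, \r\n and other line boundaries and drops them.
    return next(iter(text.splitlines()), "")

sample = "import glob\r\nimport os"
print(repr(sample.partition("\n")[0]))  # 'import glob\r'
print(repr(first_line(sample)))         # 'import glob'
```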

Trying to read the text of an FTP website into a string in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
This is the site I can open in Chrome and see text:
ftp://ftp.cmegroup.com/pub/settle/stlags
Any idea how to read this into a string in python?
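I don't know if this helps, but urllib.request.urlopen also accepts ftp:// URLs, so this will read the contents of the file into a string:
import urllib.request
url = "ftp://ftp.cmegroup.com/pub/settle/stlags"
# urlopen handles ftp:// URLs as well as http(s):// ones
with urllib.request.urlopen(url) as response:
    raw_bytes = response.read()
# decode() assumes the file is UTF-8 text
text = raw_bytes.decode()
print(text)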
