Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I have a project and want to access a URL with Python. If I have variable1=1 and variable2=2, I want the output to look like this:
www.example.com/data.php?variable1=1&variable2=2
How do I achieve this? Thanks!
Check this out:
try:
    from urllib2 import urlopen  # python 2
except ImportError:
    from urllib.request import urlopen  # python 3
vars = ['variable1=1', 'variable2=2']
for i in vars:
    url = 'http://www.example.com/data.php?' + i
    response = urlopen(url)
    html = response.read()
    print(html)
The first four lines import urlopen, which we use to make an HTTP request.
Then we create a list of variables named vars.
Then we loop over that list; the loop runs once for each item in vars.
Next we build the URL from the current item, so each request carries only that one parameter.
Finally we fetch the HTML at that address and print it to the terminal.
You can use Python's string format operation.
For example:
variable1 = 1
variable2 = 2
url = 'www.example.com/data.php?variable1={}&variable2={}'.format(variable1, variable2)
Or, if you want to send the request with the requests library, you can put the variables in a dict and pass it along like this:
import requests
url = 'http://www.example.com/data.php'
data = {'variable1':'1','variable2':'2'}
r = requests.get(url, params=data)
and it will make a request to this URL:
http://www.example.com/data.php?variable1=1&variable2=2
Try string formatting...
url = 'www.example.com/data.php?variable1={}&variable2={}'.format(variable1, variable2)
This means the two {} placeholders will be replaced, in order, with whatever you pass to .format(), which in this case is just the variables' values.
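If the values might ever contain spaces, ampersands or other special characters, plain string formatting can produce a broken URL. Here is a minimal sketch using the standard library's urlencode instead; the helper name build_url is made up for this example and is not from any of the answers above.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Return base with params appended as a percent-encoded query string."""
    return base + "?" + urlencode(params)


variable1 = 1
variable2 = 2
url = build_url(
    "http://www.example.com/data.php",
    {"variable1": variable1, "variable2": variable2},
)
print(url)
```

urlencode escapes each value for you, so a value such as 'a b&c' cannot be mistaken for an extra parameter.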
Closed 1 year ago.
I tried to access the 'gold_spent' key in a dictionary built from a JSON response.
Here is my code:
import json
import requests
response = requests.get("https://sky.shiiyu.moe/api/v2/profile/tProfile")
json_data = json.loads(response.text)
print(json_data['gold_spent'])
When I run this I get "KeyError: 'gold_spent'".
I don't know what I am doing wrong, any help would be appreciated.
The data you are looking for is nested. See below.
print(json_data['profiles']['590cedda63e145ea98d44015649aba30']['data']['misc']['auctions_buy']['gold_spent'])
output
46294255
You got an exception because gold_spent is not a top-level key; you need to inspect the structure to find where it lives. Accessing a key that does not exist in a dictionary with [] always raises KeyError.
import json
import requests
response = requests.get("https://sky.shiiyu.moe/api/v2/profile/tProfile")
json_data = json.loads(response.text)
print(json_data.keys())
# dict_keys(['profiles'])
print(json_data['profiles'].keys())
# dict_keys(['590cedda63e145ea98d44015649aba30'])
print(json_data['profiles']['590cedda63e145ea98d44015649aba30'].keys())
# dict_keys(['profile_id', 'cute_name', 'current', 'last_save', 'raw', 'items', 'data'])
print(json_data['profiles']['590cedda63e145ea98d44015649aba30']['data']['misc']['auctions_buy']['gold_spent'])
# 46294255
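When you do not know how deep a key is buried, walking the structure by hand gets tedious. Below is a rough sketch of a recursive search that reports every path leading to a given key. The function name find_key is invented for this example, and the sample dict is a cut-down stand-in for the real response, keeping only the path shown above.

```python
def find_key(obj, target, path=()):
    """Yield (path, value) for every occurrence of target in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield path + (key,), value
            # Keep descending: the same key may also appear deeper down.
            yield from find_key(value, target, path + (key,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key(item, target, path + (index,))


# Cut-down stand-in for the parsed API response (assumption: same nesting).
sample = {
    "profiles": {
        "590cedda63e145ea98d44015649aba30": {
            "data": {"misc": {"auctions_buy": {"gold_spent": 46294255}}}
        }
    }
}

matches = list(find_key(sample, "gold_spent"))
for found_path, found_value in matches:
    print(found_path, found_value)
```

With the real data you would call find_key(json_data, 'gold_spent') and then index json_data with the path it prints.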
I have 2 function blocks in my scraper:
1. parse
2. parse_info
In the 1st block, I get a list of URLs.
Some of the URLs work (they already have the 'https://www.example.com/' part).
The rest do not work (they are missing the 'https://www.example.com/' part).
So before passing a URL to the 2nd block, i.e. parse_info, I want to validate it,
and if it does not work I want to edit it and add the required part ('https://www.example.com/').
You could use the requests module to fetch each URL and check the status code of the response.
Alternatively, if you just need to check whether the URL contains a specific portion, i.e. 'https://www.example.com/', a regular expression (or a simple str.startswith check) will do.
My interpretation from your question is that you have a list of URLs, some of which have an absolute address like 'https://www.example.com/xyz' and some only have a relative reference like '/xyz' that belongs to the 'https://www.example.com' site.
If that is the case, you can use 'urljoin' to rationalise each of the URLs, for example:
>>> from urllib.parse import urljoin
>>> url = 'https://www.example.com/xyz'
>>> print(urljoin('https://www.example.com', url))
https://www.example.com/xyz
>>> url = '/xyz'
>>> print(urljoin('https://www.example.com', url))
https://www.example.com/xyz
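Applied to the whole list from the first function, that could look something like the sketch below. The names BASE, normalise and scraped are made up for illustration, and the sample URLs are placeholders rather than anything from the question.

```python
from urllib.parse import urljoin

BASE = "https://www.example.com/"  # assumption: the site every relative link belongs to


def normalise(urls, base=BASE):
    """Return the URLs with relative references resolved against base."""
    return [urljoin(base, url) for url in urls]


scraped = ["https://www.example.com/xyz", "/abc", "def/ghi"]
fixed = normalise(scraped)
for url in fixed:
    print(url)
```

Absolute URLs pass through unchanged, so there is no need to test each one before joining.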
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
So I am trying to get into Python, and am using examples that I find online to understand certain functions better.
I found a post online that shared a way to check prices on an item through CamelCamelCamel.
They had it set to request a specific URL, so I decided to change it to take user input instead.
How can I simply loop this function?
As far as I know it runs fine once, but after the initial run I get 'Process finished with exit code 0', which isn't necessarily a problem.
For the script to behave the way I would like, it would be nice to have a way to break out, maybe by typing 'quit' or something, but after it processes the URL that was given, I would like it to ask for a new URL.
I'm sure there's a way to check for a specific URL, i.e. this should only work for CamelCamelCamel, so I'd like to limit it to only that domain.
I'm more familiar with Batch, and have kind of gotten away with using Batch to run my Python files to circumvent what I don't understand.
Personally, if I could, I would just mark the function as 'top:' and put 'goto top' at the bottom of the script.
from bs4 import BeautifulSoup
import requests
print("Enter CamelCamelCamel Link: ")
plink = input("")
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(plink,headers=headers)
data = r.text
soup = BeautifulSoup(data,'html.parser')
table_data = soup.select('table.product_pane tbody tr td')
hprice = table_data[1].string
hdate = table_data[2].string
lprice = table_data[7].string
ldate = table_data[8].string
print('High price-', hprice)
print("[H-Date]", hdate)
print('---------------')
print('Low price-', lprice)
print("[L-Date]", ldate)
Also, how could I find the difference between the date I obtain from either hdate or ldate and today/now? The dates I parsed are strings, and when I tried I got: TypeError: unsupported operand type(s) for +=: 'int' and 'str'.
This is really just for learning, so any example works. It doesn't have to be that site specifically.
In Python, you have access to several different looping control structures, including:
while statements
while condition:  # Runs until condition is no longer true (or until break is executed)
    <statements to execute while looping>
for statements
for i in range(10):  # Runs 10 times (or until break is executed)
    <statements to execute while looping>
Each one has its strengths and weaknesses, and the documentation at Python.org is very thorough but easy to assimilate.
https://docs.python.org/3/tutorial/controlflow.html
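To tie this back to the question, here is one possible shape for the "goto top" behaviour, plus the domain check and the date difference. Treat it as a sketch under assumptions: the function names are invented, the handle callback stands in for the scraping code from the question, the allowed host is guessed from the site name, and the date format "%b %d, %Y" (e.g. "Jul 13, 2019") is a guess at what the page shows, so adjust it to the real strings.

```python
from datetime import date, datetime
from urllib.parse import urlparse

ALLOWED_HOST = "camelcamelcamel.com"  # assumption: the only domain to accept


def is_allowed(url, host=ALLOWED_HOST):
    """True if the URL's host is the allowed domain or one of its subdomains."""
    hostname = urlparse(url).hostname or ""
    return hostname == host or hostname.endswith("." + host)


def days_since(date_text, fmt="%b %d, %Y", today=None):
    """Convert a date string to a date and return the number of days up to today."""
    parsed = datetime.strptime(date_text.strip(), fmt).date()
    today = today or date.today()
    return (today - parsed).days


def prompt_loop(handle, read=input):
    """Keep asking for URLs and pass valid ones to handle until the user types 'quit'."""
    while True:
        link = read("Enter CamelCamelCamel link (or 'quit'): ").strip()
        if link.lower() == "quit":
            break
        if not is_allowed(link):
            print("Not a CamelCamelCamel URL, try again.")
            continue
        handle(link)
```

You would wrap the requests/BeautifulSoup code from the question in a function that takes the link and call prompt_loop with it. The TypeError came from mixing an int with a string; parsing the string with strptime first gives you date objects that can be subtracted.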
This question already has answers here:
Extract domain name from URL in Python
(8 answers)
Closed 3 years ago.
I've seen questions similar to this but not really getting at what I'm looking for so I was wondering. I'm trying to extract the main domain of a server from its URL, but just that, without any subdomains. So if the URL was, for example, "http://forums.example.com/" I want to know how to extract just the "example.com" portion from it. I've tried splitting at the second-to-last dot but that brings trouble when dealing with URLs like "http://forums.example.co.uk/", as it extracts just the "co.uk" when I would want "example.co.uk". Is there a way I can parse URLs this way without having to find a list of TLDs to compare?
PS: In case it matters, I will be using this in the context of mail servers, so the URLs will likely look more like "mail.example.co.uk" or "message-ID#user.mail.example.co.uk"
Edit: Okay, so I know that the answer to this question is the same as one of the answers in the "duplicate" question, but I believe it is different because the questions are different. In the other question the asker was asking regardless of subdomains, and so the selected answer used urlparse, which doesn't distinguish subdomain from domain. In addition, this question asks about email addresses as well, and urlparse doesn't work on email addresses (it throws an invalid URL exception). So I believe this question is distinct from the other and not a duplicate.
You want to check out tldextract. With it you can do everything you want easily. For example:
>>> import tldextract
>>> extracted_domain = tldextract.extract('forums.example.com')
>>> extracted_domain
ExtractResult(subdomain='forums', domain='example', suffix='com')
Then you can just:
>>> domain = "{}.{}".format(extracted_domain.domain, extracted_domain.suffix)
>>> domain
'example.com'
It also works with emails:
>>> tldextract.extract('message-ID#user.mail.example.co.uk')
ExtractResult(subdomain='user.mail', domain='example', suffix='co.uk')
Just use pip to install it: pip install tldextract
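For a feel of why a suffix list is needed at all, here is a toy, standard-library-only illustration of the suffix-matching idea. It is not how tldextract is implemented internally; the tiny KNOWN_SUFFIXES set and the registered_domain function are made up for this sketch, whereas the real library works from the Public Suffix List, which has thousands of entries.

```python
# Toy subset only; a real list is far longer and changes over time.
KNOWN_SUFFIXES = {"com", "org", "net", "uk", "co.uk"}


def registered_domain(host, suffixes=KNOWN_SUFFIXES):
    """Return the label before the longest known suffix, joined with that suffix."""
    labels = host.lower().strip(".").split(".")
    # Scan from the longest candidate suffix to the shortest,
    # so that "co.uk" is matched before "uk".
    for start in range(len(labels)):
        if ".".join(labels[start:]) in suffixes:
            # start == 0 means the host itself is a bare suffix.
            return ".".join(labels[start - 1:]) if start > 0 else None
    return None


print(registered_domain("forums.example.com"))
print(registered_domain("mail.example.co.uk"))
```

Splitting at the second-to-last dot fails precisely because the code has no way to know that "co.uk" is a suffix unless it has a list to look it up in.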
Closed 6 years ago.
I want to crawl some data from this type of url:
http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?start=0&count=5&currency=&language=english
It contains some kind of HTML tags, but I don't know how to actually scrape this page (I used BeautifulSoup for my other URLs).
Hope you can help me out.
The page you loaded is a JSON file. Use the JSON library like so:
import requests
import json
html = requests.get('http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?start=0&count=5&currency=&language=english')
# Parse the response body (JSON text) into a Python dict.
steam_json = json.loads(html.text)
# Extract whatever you want like this:
success_status = steam_json['success']
You may want to do it with Python (jsoup, by the way, is a BeautifulSoup-like library for Java). The URL returns JSON. You first have to load it as a native Python object, which in this case is a dictionary, using the json library:
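import json
from urllib.request import Request, urlopen  # Python 3 (urllib2 on Python 2)
user_agent = 'Mozilla/5.0'  # example value; send whatever suits your client
request = Request(url=your_url)  # your_url is built at the end of this answer
request.add_header('User-agent', user_agent)  # let's say you want to add headers like user-agent etc...
response = urlopen(request)
dico = json.loads(response.read())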
Then you have to explore the key-value pairs which are of interest for you, and parse the values containing html as you usually do with beautifulSoup.
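As a rough illustration of that last step, the sketch below pulls the text out of an HTML fragment stored inside a JSON value. The payload is invented for the example, including the key name results_html, so check the real response for the actual keys. It uses the standard library's html.parser to stay dependency-free; BeautifulSoup would be fed the same string.

```python
import json
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Hypothetical payload: the key names and markup are made up for illustration.
raw = json.dumps({
    "success": True,
    "results_html": '<div class="row"><span>Item A</span><span>12.50</span></div>',
})

payload = json.loads(raw)             # JSON text -> Python dict
collector = TextCollector()
collector.feed(payload["results_html"])  # the value is ordinary HTML
collector.close()
print(collector.chunks)
```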
Also, note that the site you want to get data from can be hypermedia-driven (see HATEOAS), which is a kind of AJAX implemented with no graphical interface. Whatever the case, it allows you to be more precise (and thus more server-friendly) in the data you request.
url_base = "http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?"
start = 0
count = 5
currency = ''
language = 'english'
your_url = url_base + "start={0}&count={1}&currency={2}&language={3}".format(start, count, currency, language)
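If you would rather not hand-write the percent-encoding, the same URL can be assembled from readable pieces. This is only a sketch: the helper name listing_url is made up, and the item name is simply the decoded form of the path segment in the URL above.

```python
from urllib.parse import quote, urlencode


def listing_url(app_id, item_name, **params):
    """Build a market listing render URL with an encoded path and query string."""
    path = "/market/listings/{}/{}/render".format(app_id, quote(item_name, safe=""))
    return "http://steamcommunity.com" + path + "?" + urlencode(params)


your_url = listing_url(
    730,
    "AK-47 | Redline (Field-Tested)",
    start=0,
    count=5,
    currency="",
    language="english",
)
print(your_url)
```

quote handles the spaces, the pipe and the parentheses in the item name, and urlencode joins the parameters, so changing start or count for the next page is a one-argument change.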