How to iterate through URLs in Python

I have the following base url that I would like to iterate:
http://www.blabla.com/?mode_id=1
Basically, I would like a Python for loop to step through the mode_id value like this:
http://www.blabla.com/?mode_id=1
http://www.blabla.com/?mode_id=2
http://www.blabla.com/?mode_id=3
http://www.blabla.com/?mode_id=4, etc.
I tried with my loop below, but it does not work:
for i in range(0, 200,1):
    url = 'http://www.blabla.com/?mode_id= + str(i)'
    driver.get(url)
How can I make it run properly? Thank you

You could use:
for i in range(200):
    url = 'http://www.blabla.com/?mode_id={}'.format(i)
    driver.get(url)
Remarks:
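In your version the closing quote comes after str(i), so + str(i) is part of the string literal and every iteration requests the same URL.
If you're going to iterate one by one from zero, you can just use range(200); the other arguments are unnecessary. If the ids should start at 1, as in your example, use range(1, 200) instead.
String concatenation works, but formatting is usually easier to read and less error-prone; str.format (as in my example) is one option.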
Make sure your indentation is correct.
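As a rough sketch of the remarks above, the URLs can also be built up front with an f-string, starting at 1 to match the question. The names BASE_URL and build_urls are made up for illustration, and the driver call is left as a comment because it assumes the Selenium WebDriver from the question.

```python
BASE_URL = 'http://www.blabla.com/?mode_id='  # base URL from the question


def build_urls(first_id, last_id):
    """Return one URL per id from first_id to last_id, inclusive."""
    return [f'{BASE_URL}{i}' for i in range(first_id, last_id + 1)]


urls = build_urls(1, 4)
for url in urls:
    print(url)
    # driver.get(url)  # assumes `driver` is the Selenium WebDriver from the question
```

Building the list first makes it easy to check the generated URLs before any page is actually requested.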

Related

How do I get a normal list with strings instead of generator objects when I perform a googlesearch

Hi, I am trying to get the first URL of a Google search for each query in a list. For simplicity, I am using the same code as a similar question from two years ago.
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.append(search(query, tld="co.in", num=1, stop=1, pause=2))
print(results)
Now this returns a list of generator objects. A solution was found to print out the list of results by adding
for result in results:
    print(list(result))
However I want the results list to be in the form of a list of strings in order to web scrape the urls for data. One solution I found was to add
results_str = []
for result in results:
    results_str.append(list(result))
When I print results_str I get this as an output:
[['https://www.geeksforgeeks.org/'], ['https://stackoverflow.com/'], ['https://github.com/']]
As one can see I cannot even use results_str directly as a list of urls to webscrape because of the additional brackets around each url. I thought I could work around it by removing the brackets by following this answer and thus adding
results_str_new = [s.replace('[' and ']', '') for s in results_str]
But this simply results in an AttributeError
AttributeError: 'list' object has no attribute 'replace'
Either way, even if I did get it to work, it seems unnecessarily complicated to do all this just to convert a list of generator objects to strings to use as URLs for web scraping, so I was wondering if there are any alternatives. I know that one option is to use Selenium, but I don't really want to do that because I don't want the hassle of a Chrome instance opening whenever I run my script.
Thanks in advance
You are getting back a list of lists of strings. To flatten it, you can use a list comprehension like this:
results_str = [url for result in results for url in result]
Alternatively, change append to extend if you don't want to use a list comprehension. extend adds each item of its argument to the list, whereas append inserts the argument itself (here, a whole list) as a single element.
results_str = []
for result in results:
    results_str.extend(result)
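To see the difference between append and extend without hitting the network, here is a minimal sketch. fake_search is a hypothetical stand-in for googlesearch's search (it just yields one example.com URL per query), so the URLs below are placeholders, not real search results.

```python
from itertools import chain


def fake_search(query):
    """Hypothetical stand-in for googlesearch.search: yields URLs lazily."""
    yield f'https://example.com/{query.lower()}'


queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

# append keeps each query's results as one element, giving a nested list
nested = []
for q in queries:
    nested.append(list(fake_search(q)))

# extend consumes the generator and adds its items one by one
flat = []
for q in queries:
    flat.extend(fake_search(q))

# chain.from_iterable flattens results that were already collected as lists
flat_again = list(chain.from_iterable(nested))

print(nested)
print(flat)
```

Note that a generator can only be consumed once, so it is usually simplest to flatten at the point where the search is called rather than afterwards.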
Looks like you may be using a different version of googlesearch. I'm using googlesearch-python 1.1.0 so the call parameters are different. However, this should help:
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.extend([r for r in search(query, 1, 'en')])
print(results)
Output:
['https://www.youtube.com/c/GeeksforGeeksVideos/videos', 'https://stackoverflow.com/', 'https://stackoverflow.blog/', 'https://github.com/']
Which, as you can see, is a simple list of strings (URLs in this case)

Add a variable to a link to make an Api call Python

I want to put the id in the link because I want to make an api call
id= 156
url1 = 'https://comtrade.un.org/api/get?r='<id>'&px=HS&ps=2020&p=0&rg=1&cc=total'
response1 = requests.get(url1)
print(response1.url)
You have several options here. If you want to add many variables to the URL or want your code to look clean, I suggest using an f-string, as shown below:
url1 = f'https://comtrade.un.org/api/get?r={id}&px=HS&ps=2020&p=0&rg=1&cc=total'
This way you can put any variable in your string just by writing {variable}.
Don't forget to put the f before the quotes.
On Python 3, I would suggest the f-string method described in the other answer.
I personally like str.format, as it works on both Python 2 and 3:
url1 = 'https://comtrade.un.org/api/get?r={0}&px=HS&ps=2020&p=0&rg=1&cc=total'.format(id)
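Another option, sketched here with only the standard library, is to keep the query parameters in a dict and let urllib.parse.urlencode build the query string. The names BASE, build_url and reporter_id are made up for this example (reporter_id also avoids shadowing the built-in id); the parameter values are the ones from the question.

```python
from urllib.parse import urlencode

BASE = 'https://comtrade.un.org/api/get'  # endpoint from the question


def build_url(reporter_id):
    """Build the query URL; reporter_id fills the `r` parameter."""
    params = {
        'r': reporter_id,
        'px': 'HS',
        'ps': 2020,
        'p': 0,
        'rg': 1,
        'cc': 'total',
    }
    # urlencode converts the values to strings and joins them with '&'
    return f'{BASE}?{urlencode(params)}'


url1 = build_url(156)
print(url1)
```

If you are calling the API with requests anyway, requests.get also accepts a params argument and does this encoding for you, e.g. requests.get(BASE, params=params).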

How to add numbers in link (loop)

I'm writing a script where I try to scrape data from json files. The website link structure looks like this:
https://go.lime-go.com/395012/Organization/pase1009/
I want the Python script to go through a range of numbers and visit each link. For example, right now the link is at pase1009. After the script has visited this link, I want it to go to pase1010 and so on.
I'm really new to Python and trying to learn how to use loops, count, etc. but don't get it.
My PY code:
rlista = "https://go.lime-go.com/395012/Organization/pase1009/getEmployees"
page = self.driver.get(rlista)
time.sleep(2)
Best regards,
Tobias
You can join several strings into one with the + operator.
So you could save your base link in a variable and append the number inside the loop.
It would look something like this:
baseLink = "https://your-link.com/any/further/stuff/pase"
for k in range(1000,1010,2):
    link = baseLink + str(k)
    print(link)
Your links would then be:
https://your-link.com/any/further/stuff/pase1000
https://your-link.com/any/further/stuff/pase1002
https://your-link.com/any/further/stuff/pase1004
https://your-link.com/any/further/stuff/pase1006
https://your-link.com/any/further/stuff/pase1008
because k starts at 1000, increments by 2, and stops before 1010 (range(start, stop, step)).
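Applied to the URL structure from the question, the same idea might look like the sketch below. BASE_LINK and employee_urls are names invented for this example, and the driver call is only shown as a comment because it depends on the Selenium setup in the question.

```python
BASE_LINK = 'https://go.lime-go.com/395012/Organization/pase'  # prefix from the question


def employee_urls(first, count):
    """Yield `count` consecutive getEmployees URLs starting at pase<first>."""
    for number in range(first, first + count):
        yield f'{BASE_LINK}{number}/getEmployees'


for link in employee_urls(1009, 3):
    print(link)
    # self.driver.get(link); time.sleep(2)  # as in the question's snippet
```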

Processing all data in a for loop instead of only one element

I wrote some code in order to scrape some data from a website. When I run the code manually I can get all the information for all the shoes, but when I run my script it only gives me one result for each variable.
What can I change to get all the results I want?
For example, when I run the following, I only get one result for marque and one for modele, but when I do it in my terminal I can see that vignette contains multiple values.
import requests
from bs4 import BeautifulSoup
r=requests.get('https://www.sarenza.com/store/product/gender-type/list/view?gender=1&type=76&index=0&count=99')
soup=BeautifulSoup(r.text,'lxml')
vignette=soup.find_all('li',class_='vignette')
for i in range(len(vignette)):
    marque=vignette[i].contents[3].text
    modele=vignette[i].contents[5].contents[3].text
You're reassigning your marque and modele variables on each iteration of the loop, overwriting their previous values. At the end of the loop, they will only contain the last values that were assigned to them.
If you want to extract all the values, you need to use two lists, and append values to them like this:
marques = []
modeles = []
for i in range(len(vignette)):
    marques.append(vignette[i].contents[3].text)
    modeles.append(vignette[i].contents[5].contents[3].text)
Or, in a more Pythonic way, with list comprehensions:
marques = [v.contents[3].text for v in vignette]
modeles = [v.contents[5].contents[3].text for v in vignette]
Now you'll have all the values you need, and you can process them or print them out, like this:
for marque, modele in zip(marques, modeles):
    print('Marque:', marque, 'Modèle:', modele)
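The overwrite-versus-accumulate difference can also be sketched without any HTML parsing. The rows list below is made-up placeholder data standing in for the parsed vignette elements; collecting (marque, modele) tuples in one list is just one possible design that keeps each brand next to its model.

```python
# Made-up placeholder data standing in for the parsed `vignette` elements
rows = [
    {'marque': 'Brand A', 'modele': 'Model 1'},
    {'marque': 'Brand B', 'modele': 'Model 2'},
    {'marque': 'Brand C', 'modele': 'Model 3'},
]

# Overwriting: after the loop only the last value survives
for row in rows:
    marque = row['marque']

# Accumulating: one (marque, modele) tuple per element
pairs = [(row['marque'], row['modele']) for row in rows]

print(marque)
for brand, model in pairs:
    print('Marque:', brand, 'Modèle:', model)
```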

How to integrate this script with this function in Python(Instagram)

I am doing a little script where I want to collect all the "code:" regarding a tag.
For example:
https://www.instagram.com/explore/tags/%s/?__a=1
The next page will be:
https://www.instagram.com/explore/tags/plebiscito/?__a=1&max_id=end_cursor
However, my problem is making each URL return what I need (the comments and usernames of the people).
As it stands, the script does not do what I need.
The "obtain_max_id" function works, getting the following end_cursors, but I do not know how to adapt it.
I appreciate your help!
In conclusion, I need to adapt the "obtain_max_id" function in my "connect_main" function to extract the information I need with each of the URLs.
This is simple.
import requests
import json
host = "https://www.instagram.com/explore/tags/plebiscito/?__a=1"
# Fetch the first page and print the code of every node
r = requests.get(host).json()
for x in r['tag']['media']['nodes']:
    print(x['code'])
# end_cursor identifies the next page; the loop stops once it is empty
end_cursor = r['tag']['media']['page_info']['end_cursor']
while end_cursor:
    r = requests.get(host + "&max_id=" + end_cursor).json()
    for x in r['tag']['media']['nodes']:
        print(x['code'])
    end_cursor = r['tag']['media']['page_info']['end_cursor']
You have all the data you want in your data variable (in JSON form), right after you execute the line:
data = json.loads(finish.text)
in the while loop inside your obtain_max_id() method. Just use that.
Assuming everything inside the else block of your connect_main() method works, you could simply use that code inside the above while loop, right after you have all the data in your data variable.
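One way to keep the paging logic separate from the extraction is a small generator. This is only a sketch: it assumes the JSON shape used in the first answer (tag, media, nodes, page_info, end_cursor), and iter_codes, fetch_page and FAKE_PAGES are hypothetical names. The pages are made-up data so the example runs without network access.

```python
def iter_codes(fetch_page):
    """Yield every node's code, following end_cursor until it is empty.

    fetch_page(cursor) must return one decoded JSON page; cursor is None
    for the first page.
    """
    cursor = None
    while True:
        media = fetch_page(cursor)['tag']['media']
        for node in media['nodes']:
            yield node['code']
        cursor = media['page_info']['end_cursor']
        if not cursor:
            break


# Made-up pages keyed by cursor, standing in for real responses
FAKE_PAGES = {
    None: {'tag': {'media': {'nodes': [{'code': 'a1'}, {'code': 'b2'}],
                             'page_info': {'end_cursor': 'next'}}}},
    'next': {'tag': {'media': {'nodes': [{'code': 'c3'}],
                               'page_info': {'end_cursor': None}}}},
}

codes = list(iter_codes(FAKE_PAGES.get))
print(codes)
```

With requests, fetch_page could be a small function that requests the host URL, adding "&max_id=" plus the cursor whenever the cursor is set, and returns the decoded JSON.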
