Nested list or list of string pairs - python

I have some pairs of strings. The first string in each pair is a name, the second is a city of birth.
I use them in web scraping. When I find the appropriate element on the web page, I want to loop over the pairs, call send_keys(name) and perform other operations such as click or enter. For a second element on the page, I also want to loop and call send_keys(city). How can I do it?
Should I use a list of string pairs (tuples) or a nested list?
Like:
list_1 = [["Ann", "London"], ["John", "Barcelona"], ["Kate", "Paris"]]
list_2 = [("Ann", "London"), ("John", "Barcelona"), ("Kate", "Paris")]
Which is better if my double iteration should look like this:
for element in list_1:
    el_scraped = driver.find.....
    el_scraped.send_keys(element)
    el_scraped.click()
    for element2 in element:
        el2_scraped = driver.find ....
        el2_scraped.send_keys(element2)
        el2_scraped.click()
I have a problem with the for loop construction. I have only posted some of the operations inside the loops. Can someone help me with the for loops and suggest an appropriate list?

You can store the data in any iterable, as long as you unpack it appropriately.
I don't see any need for a nested for loop.
With the data format of list_1, you can unpack each pair like this:
for name, city in list_1:
    el_scraped = driver.find.....
    el_scraped.send_keys(name)
    el_scraped.click()
    el2_scraped = driver.find ....
    el2_scraped.send_keys(city)
    el2_scraped.click()
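To show the unpacking without a browser, here is a minimal sketch that replaces the Selenium elements with a stand-in. FakeElement, name_field and city_field are hypothetical names used only for this illustration; in real code they would be the elements returned by the driver.

```python
# Hypothetical stand-in for a Selenium element: it only records what was typed.
class FakeElement:
    def __init__(self):
        self.typed = []

    def send_keys(self, text):
        self.typed.append(text)


pairs = [("Ann", "London"), ("John", "Barcelona"), ("Kate", "Paris")]
name_field = FakeElement()
city_field = FakeElement()

# One loop is enough: each pair is unpacked into two names.
for name, city in pairs:
    name_field.send_keys(name)
    city_field.send_keys(city)

# A nested list unpacks in exactly the same way.
nested = [["Ann", "London"], ["John", "Barcelona"]]
names_from_nested = [name for name, _city in nested]
```

Both formats behave the same in the loop. Tuples are a common choice when each record always has the same fixed fields, because they cannot be modified by accident.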

Related

Hashing String List Elements and Saving to a New List

I'm trying to take 100 names, hash each name to 8 bits and save it to a new list. I understand that using 8 bits will most likely result in collisions, I'm trying to see at what rate they will collide and I'm hoping to include this code snippet in my paper.
I believe my logic is okay, it's just syntax that's causing my issues. Any help is appreciated.
import hashlib
list = ["Cammy", "Maisha", "Lizette", "Marjorie", "Shaquita", "Rueben", "Fatima", "Maynard",
"Laurena", "Lauren", "Allyson", "Pearlie", "Bethel", "Daniell", "Laurinda", "Crista",
"Ching", "Kareen", "Beth", "Stephnie", "Manie", "Kareem", "Titus", "Humberto",
"Lauretta", "Rob", "Raul", "Damion", "Stephani", "Carin", "Sharla", "Eleonor", "Naida",
"Ashley", "Rachel", "Graig", "Raymonde", "Shalanda", "Annetta", "Lissette", "Sandi",
"Alda", "Arlinda", "Ashlee", "Marguerite", "Tammi", "Denisha", "Genie", "Elizbeth",
"Elvie", "Markus", "Marquitta", "Arla", "Vanda", "Devon", "Meagan", "Taryn", "Lina",
"Shea", "Leighann", "Janel", "Sanora", "Harmony", "Concetta", "Dwayne", "Kyla",
"Evonne", "Mauro", "Deane", "Chester", "Inez", "Tari", "Maribeth", "Ariel", "Elisa",
"Maurice", "Dung", "Mona", "Hung", "Maximina", "Demarcus", "Jayson", "Jenny", "Duane",
"Reginia", "Gennie", "Orval", "Venus", "Craig", "Lessie", "Madaline", "Paulina",
"Aletha", "Gisele", "Sheena", "Devora", "Arcelia", "Ericka", "Colene", "Hildegard"]
newlist = []
for i in list:
    newlist = hash(list[i] % 10**8)
for i in newlist:
    print(i)
Without touching your logic, to make your code work you want to replace these lines:
for i in list:
    newlist = hash(list[i] % 10**8)
with
for i in list:
    newlist.append(hash(i) % 10**8)
Some clarification:
In Python, you can use .append() on any list object to add elements to the end of that list. In this case, you're filling the empty list you initialized above with elements inside a loop. Further, unlike in, e.g., a classic Java loop, in Python you can iterate over a list directly, so that i refers to a different element of the list on each iteration. Thus, there is no need to access the list at a certain index each time. Hope this helps!
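Two caveats about the logic itself. Taking the result modulo 10**8 keeps up to eight decimal digits, not 8 bits (8 bits would give 256 possible values). Also, the built-in hash() of a string is salted per process by default, so the numbers change between runs. A minimal sketch of a reproducible 8-bit hash, assuming that taking the first byte of a SHA-256 digest is acceptable for the experiment; hash_to_8_bits is a hypothetical helper name and the short name list is only a sample:

```python
import hashlib
from collections import Counter


def hash_to_8_bits(name):
    """Return a stable value in 0..255 derived from the name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return digest[0]  # first byte of the digest, i.e. 8 bits


names = ["Cammy", "Maisha", "Lizette", "Marjorie", "Lauren", "Ashley"]

# Count how many names land on each 8-bit value.
buckets = Counter(hash_to_8_bits(n) for n in names)

# Every name beyond the first in a bucket counts as one collision.
collisions = sum(count - 1 for count in buckets.values())
print(f"{collisions} collisions among {len(names)} names")
```

Because hashlib does not depend on the per-process salt, the collision count is the same on every run, which matters if the snippet is going into a paper.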

multiple findAll in one for loop

I'm using BeautifulSoup to read some data from a web page.
This code works fine, but I would like to improve it.
How do I make the for loop extract more than one piece of data per iteration? Here I have 3 for loops to get values from:
for elem in bsObj.findAll('div', class_="grad"): ...
for elem in bsObj.findAll('div', class_="ulica"): ...
for elem in bsObj.findAll('div', class_="kada"): ...
How can I change this to work in one for loop? Ideally the solution should be simple.
The output can be a list.
My code so far:
from bs4 import BeautifulSoup
# get data from a web page into the ``html`` variable here
bsObj = BeautifulSoup(html.read(),'lxml')
mj=[]
adr=[]
vri=[]
for mjesto in bsObj.findAll('div', class_="grad"):
    print (mjesto.get_text())
    mj.append(mjesto.get_text())
for adresa in bsObj.findAll('div', class_="ulica"):
    print (adresa.get_text())
    adr.append(adresa.get_text())
for vrijeme in bsObj.findAll('div', class_="kada"):
    print (vrijeme.get_text())
    vri.append(vrijeme.get_text())
You can use BeautifulSoup's select method to target your various desired elements, and do whatever you want with them. In this case we are going to simplify the CSS selector pattern by using the :is() pseudo-class, but basically we are searching for any div that has class grad, ulica, or kada. As each element is returned that matches the pattern, we just sort them by which class they correspond to:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
lokacija="http://www.hep.hr/ods/bez-struje/19?dp=koprivnica&el=124"
datum="12.02.2019"
lokacija=lokacija+"&datum="+datum
print(lokacija)
r = requests.get(lokacija)
print(type(str(r)))
print(r.status_code)
html = urlopen(lokacija)
bsObj = BeautifulSoup(html.read(),'lxml')
print("Datum radova:",datum)
print("HEP područje:",bsObj.h3.get_text())
mj=[]
adr=[]
vri=[]
hep_podrucje=bsObj.h3.get_text()
for el in bsObj.select('div:is(.grad, .ulica, .kada)'):
    if 'grad' in el.get('class'):
        print (el.get_text())
        mj.append(el.get_text())
    elif 'ulica' in el.get('class'):
        print(el.get_text())
        adr.append(el.get_text())
    elif 'kada' in el.get('class'):
        print (el.get_text())
        vri.append(el.get_text())
Note: basic explanation ahead. If you know this, skip directly to the listing of possibilities.
To change the code into a loop, you have to look at the part that stays the same and the part that varies. In your case, you find a div, get the text and append it to a list.
The class attribute of the div objects varies each time, and so does the list you append to. A for loop works by having one variable that is assigned a different value on each iteration, then executing the code within.
We get a basic structure:
for div_class in <div classes>:
    <stuff to do>
Now, in <stuff to do>, we have a different list each time. We need some way of getting a different list into the loop. For this, there are multiple possibilities:
Put the list into a dict and use item lookup
zip the lists with <div classes> and iterate over them
Both options involve nested loops, with the result looking similar to this:
list_1 = []
list_2 = []
list_3 = []
for div_class, the_list in zip(['div_cls1', 'div_cls2', 'div_cls3'], [list_1, list_2, list_3]):
    for elem in bsObj.find_all('div', class_=div_class):
        the_list.append(elem.get_text())
or
lists = {'div_cls1': [], 'div_cls2': [], 'div_cls3': []}
for div_class in lists: # note: keys MUST match the class of div elements
    for elem in bsObj.find_all('div', class_=div_class):
        lists[div_class].append(elem.get_text())  # call get_text(), don't append the method itself
Of course, the inner loop could be replaced by list comprehension (works for the dict approach): lists[div_class] = [elem.get_text() for elem in bsObj.find_all('div', class_=div_class)]
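As a rough single-pass alternative, the sketch below collects the text of each matching div into a dict keyed by class name while the document is parsed once. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs. DivTextCollector and the sample HTML are made up for illustration, and the sketch assumes the wanted divs are not nested inside each other.

```python
from html.parser import HTMLParser

WANTED = ("grad", "ulica", "kada")


class DivTextCollector(HTMLParser):
    """Collect the text of <div> elements, grouped by their class."""

    def __init__(self):
        super().__init__()
        self.lists = {name: [] for name in WANTED}
        self._current = None  # class of the wanted div we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self._current = next((c for c in classes if c in WANTED), None)

    def handle_endtag(self, tag):
        if tag == "div":
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.lists[self._current].append(text)


# Made-up sample markup, only to exercise the collector.
sample_html = """
<div class="grad">Koprivnica</div>
<div class="ulica">Ulica 1</div>
<div class="kada">08:00 - 12:00</div>
<div class="grad">Sokolovac</div>
<div class="other">ignored</div>
"""

collector = DivTextCollector()
collector.feed(sample_html)
print(collector.lists)
```

The dict plays the same role as the lists dict above: the key is the div class, the value is the list of texts found for it.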

Multiple python nested for loops, possibly recursively traverse lists. python 3

I am trying to match the first item in participantId to the first list in str(each['participantId']). And then the second item in participantId to second list, and so on.
participantId = ['2','5','7','4','10','9','2']
each['participantId'] = [[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10],
[1,2,3,4,5,6,7,8,9,10]]
So I want to use a for loop on participantId, but have only the first item in the list checked in an if statement against the first list, then the second item in participantId checked against the second list, and so on. As of now, my code just matches every participantId against each list. Here is a snippet of my code:
for each in json['participants']:
    for x in str(each['participantId']):
        print(x)
        for i in participantId:
            if x == i:
                kills = each['stats']['deaths']
                print('kills')
                print(kills)
I have looked at recursively traversing lists as a solution, but I can't seem to make that work. I'm very new to Python and coding, so maybe there is some function I'm missing.
I would recommend using a dictionary comprehension for something like this.
For example:
participant_data = {
    _id: data
    for _id, data in zip(participantId, each['participantId'])
}
will make participant_data a dictionary where the keys are the items in participantId, and the values are the items in each['participantId']. To get the data for a specific id, you just need to use participant_data[id_of_participant].
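One thing to watch with the sample data: '2' appears twice in participantId, and a dict keeps only the last value for a repeated key. A minimal sketch that pairs the i-th id with the i-th list by iterating over zip directly; participant_ids, candidate_lists and matches are illustrative names, not part of the original code:

```python
participant_ids = ['2', '5', '7', '4', '10', '9', '2']
candidate_lists = [list(range(1, 11)) for _ in participant_ids]

matches = []
# zip pairs the first id with the first list, the second with the second, ...
for wanted, candidates in zip(participant_ids, candidate_lists):
    # the ids are strings while the lists hold ints, so convert before comparing
    matches.append(int(wanted) in candidates)

# Building a dict from the same pairs silently drops the repeated key '2'.
by_id = dict(zip(participant_ids, candidate_lists))
print(len(participant_ids), len(by_id))
```

If the ids are guaranteed to be unique, the dictionary gives convenient lookup by id; otherwise looping over zip keeps every pair.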

Cleaner or easier way to write this?

I'm scraping from here: https://www.usatoday.com/sports/ncaaf/sagarin/ and the page is just a mess of font tags. I've been able to successfully scrape the data that I need, but I'm curious whether I could have written this 'cleaner', for lack of a better word. It just seems silly that I have to use three different temporary lists as I stage the cleanup of the scraped data.
For example, here is my snippet of code that gets the overall rating for each team in the "table" on that page:
import re
import urllib.request
import bs4 as bs
source = urllib.request.urlopen('https://www.usatoday.com/sports/ncaaf/sagarin/').read()
soup = bs.BeautifulSoup(source, "lxml")
page_source = soup.find("font", {"color": "#000000"})
sagarin_raw_rating_list = page_source.find_all("font", {"color": "#9900ff"})
raw_ratings = sagarin_raw_rating_list[:-1]
temp_list = [element.text for element in raw_ratings]
temp_list_cleanup1 = [element for element in temp_list if element != 'RATING']
temp_list_cleanup2 = re.findall(r"&nbsp\s*(-?\d+\.\d+)", str(temp_list_cleanup1))
final_ratings_list = [element for element in temp_list_cleanup2 if element != home_team_advantage] # This variable is scraped by another piece of code
print(final_ratings_list)
This is for a private program for me and some friends so I'm the only one ever maintaining it, but it just seems a bit convoluted. Part of the problem is the site because I have to do so much work to extract the relevant data.
The main thing I see is that you turn temp_list_cleanup1 into a string, which seems unnecessary. I don't think there will be much of a difference between re.findall on one giant string and re.search on a bunch of smaller strings. After that you can swap most of the list comprehensions [...] for generator expressions (...). It doesn't eliminate any lines of code, but you no longer store extra lists that you won't need again:
temp_iter = (element.text for element in raw_ratings)
temp_iter_cleanup1 = (element for element in temp_iter if element != 'RATING')
# search each element individually, rather than one large string
# note: re.search returns None for an element without a match, so .group(1) would raise there
temp_iter_cleanup2 = (re.search(r"&nbsp\s*(-?\d+\.\d+)", element).group(1)
                      for element in temp_iter_cleanup1)
# here do a list comprehension so that you have the scrubbed data stored
final_ratings_list = [element for element in temp_iter_cleanup2 if element != home_team_advantage]
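Another way to avoid the intermediate lists is a single loop that filters and extracts in one place. This is only a sketch under assumptions: extract_ratings and its skip parameter are hypothetical names, the sample strings are made up, and elements without a match are skipped (as re.findall on the joined string would do) rather than raising.

```python
import re

# Same pattern as above, compiled once and written as a raw string.
RATING_RE = re.compile(r"&nbsp\s*(-?\d+\.\d+)")


def extract_ratings(texts, skip=()):
    """Return the rating found in each text, ignoring headers and unwanted values."""
    ratings = []
    for text in texts:
        if text == "RATING":
            continue  # header cell
        match = RATING_RE.search(text)
        if match is None:
            continue  # no rating in this element
        value = match.group(1)
        if value not in skip:
            ratings.append(value)
    return ratings


# Made-up sample input standing in for the scraped element texts.
sample = ["RATING", "&nbsp 101.25", "&nbsp  2.41", "no number here", "&nbsp-3.50"]
ratings = extract_ratings(sample, skip={"2.41"})
print(ratings)
```

In the original code, skip would hold home_team_advantage, and texts would be the element.text values from raw_ratings.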

How can I take a text file and create a triple nested list from it with tkinter python

I'm making a program that allows the user to log loot they receive from monsters in an MMO. I have the drop tables for each monster stored in text files. I've tried a few different formats but I still can't pin down exactly how to take that information into python and store it into a list of lists of lists.
The text file is formatted like this
item 1*4,5,8*ns
item 2*2,3*s
item 3*90,34*ns
The item # is the name of the item, the numbers are different quantities that can be dropped, and the s/ns is whether the item is stackable or not stackable in game.
I want the entire drop table of the monster to be stored in a list called currentDropTable so that I can reference the names and quantities of the items to pull photos and log the quantities dropped and stuff.
The list for the above example should look like this
[["item 1", ["4","5","8"], "ns"], ["item 2", ["2","3"], "s"], ["item 3", ["90","34"], "ns"]]
That way, I can reference currentDropTable[0][0] to get the name of an item, or if I want to log a drop of 4 of item 1, I can use currentDropTable[0][1][0].
I hope this makes sense, I've tried the following and it almost works, but I don't know what to add or change to get the result I want.
def convert_drop_table(list):
    global currentDropTable
    currentDropTable = []
    for i in list:
        item = i.split('*')
        currentDropTable.append(item)
dropTableFile = open("droptable.txt", "r").read().split('\n')
convert_drop_table(dropTableFile)
print(currentDropTable)
This prints everything properly except the quantities are still an entity without being a list, so it would look like
[['item 1', '4,5,8', 'ns'], ['item 2', '2,3', 's']...etc]
I've tried nesting another for j in i, split(',') but then that breaks up everything, not just the list of quantities.
I hope I was clear, if I need to clarify anything let me know. This is the first time I've posted on here, usually I can just find another solution from the past but I haven't been able to find anyone who is trying to do or doing what I want to do.
Thank you.
You want to split only the second entity by ',', so you don't need another loop. Since you know that item = i.split('*') returns a list of 3 items, you can simply change your for loop as follows:
for i in list:
    item = i.split('*')
    item[1] = item[1].split(',')
    currentDropTable.append(item)
Here you replace the second element of item with a list of the quantities.
You only need to split the second element of that list.
def convert_drop_table(list):
    global currentDropTable
    currentDropTable = []
    for i in list:
        item = i.split('*')
        item[1] = item[1].split(',')
        currentDropTable.append(item)
The first thing I feel bound to say is that it's usually a good idea to avoid using global variables in any language. Errors involving them can be hard to track down. In fact you could simply omit that function convert_drop_table from your code and do what you need in-line. Then readers aren't obliged to look elsewhere to find out what it does.
And here's yet another way to parse those lines! :) Look for the asterisks, then use their positions to select what you want.
currentDropTable = []
with open('droptable.txt') as droptable:
    for line in droptable:
        line = line.strip()
        p = line.find('*')   # position of the first asterisk
        q = line.rfind('*')  # position of the last asterisk
        # name, list of quantities, stackable flag
        currentDropTable.append([line[0:p], line[1+p:q].split(','), line[1+q:]])
print(currentDropTable)
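Putting the suggestions together, here is a minimal sketch of a version without a global variable. parse_drop_table is a hypothetical name, and the sample lines stand in for the contents of droptable.txt; it assumes every non-empty line has exactly two asterisks.

```python
def parse_drop_table(lines):
    """Turn 'name*q1,q2*flag' lines into [name, [q1, q2], flag] entries."""
    table = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline in the file
        name, quantities, stackable = line.split('*')
        table.append([name, quantities.split(','), stackable])
    return table


# Sample lines in the format described in the question.
sample_lines = ["item 1*4,5,8*ns", "item 2*2,3*s", "", "item 3*90,34*ns"]
current_drop_table = parse_drop_table(sample_lines)
print(current_drop_table[0][0], current_drop_table[0][1][0])
```

With a real file, the same function can be fed the open file object directly, since iterating over a file yields its lines.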
