My code is meant to find a specific class element on a page labelled "lh-copy truncate silve" and then copy all links within the attribute as well as info into a list. As of right now, the code simply saves the list into a variable instead and I am having issues making the conversion.
Here is the code that I have so far:
age_sex = browser.find_elements_by_xpath('//*[#class="lh-copy truncate silver"]')
for ii in age_sex:
link = ii.find_element_by_xpath('.//a').get_attribute('href')
sex = ii.find_element_by_xpath('.//span').text
print(link, sex)
The code returns the information that I need in variable as opposed to list format.
Edit: The reason why I need it to be a list as opposed to a variable with a variable if I type variable[1], it'll just give me the second letter of the https:// link which is 't". Whereas if it is in list format, list[1] will return to me the full link. It's the only way that I know to be able to divide the block of text in a variable into separate links that can be accessed separately by my script.
It appears that your for loop is only printing individual elements. If you want lists of links and sexs, this may be helpful:
age_sex = browser.find_elements_by_xpath('//*[#class="lh-copy truncate silver"]')
link_list = []
sex_list = []
for ii in age_sex:
link = ii.find_element_by_xpath('.//a').get_attribute('href')
link_list.append(link)
sex = ii.find_element_by_xpath('.//span').text
sex_list.append(sex)
print(link_list, sex_list)
If you want to keep things together (i.e. list of link and sex pairs), you can have the following:
age_sex = browser.find_elements_by_xpath('//*[#class="lh-copy truncate silver"]')
result_list = []
for ii in age_sex:
link = ii.find_element_by_xpath('.//a').get_attribute('href')
sex = ii.find_element_by_xpath('.//span').text
result_list.append([link, sex])
print(result_list)
I hope I'm understanding your problem correctly.
# If your info in the variable are separated by something, a space for example or any specific char, try the following.
new_list = varibale.split(char)
# if it's a space:
new_list = varibale.split(' ')
Could you please explain clearer the problem?
Related
My code as it is right now looks like this:
def read_in_movie_preference():
"""Read the move data, and return a
preference dictionary."""
preference = {}
movies = []
# write code here:
file_location="./data/"
f = open(file_location+"preference.csv","r")
df = f.readlines()
#names as keys and prefrences
for line in df:
name = line[1].strip("\n").split(",")
prefs = line[2:].strip("\n").split(",")
preference[line[1]] = line[2:]
#print(test)
#movie names`
movietitles = df[0].strip("\n").split(",")
for movie in movietitles:
movie=movie.rstrip()
#can't seem to get rid of the spaces at the end
movies+=movietitles[2:]
print(movies)
return [movies, preference]
I cant seem to get the movie titles into the list without spaces at the end of some of them & I also cant add the names and preferences into the dictionary... I am supposed to do this task with basic python and no pandas .. very stuck would appreciate any help!
the dictionary would have names as keys and the preference numbers in number format instead of strings so it would theoretically look like this:
key: pref:
dennis, 0 1 0 1 0 ect
[![enter image description here][1]][1]this is what the data set looks like
here is the data pasted:
So the issue here is that you are using rstrip on a copy of the data but never apply it to the original.
The issue
for movie in movietitles:
movie=movie.rstrip() # Changes the (copy) of the data rather than the original
# We still need to apply this back to movietitles
There are a couple ways to fix this!
# Using indexing
for _ in range(len(movietitles)):
movietitles[_] = movietitles[_].rstrip()
Or we can do this inline with list comprehension
# Using list comprehension
movietitles = [movie.rstrip() for movie in movietitles]
As stated in the other answer, when working with csv data it's recomended to use a csv parser, but completely unnecessary for this scale! Hope this helps
I am trying to write a Python program as following:
list_a = []
list_b = []
list_c = []
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
listName.append(listValue)
But unfortunately "listName.append()" won't work.
Using only IF functions as in:
if listName == 'list_a':
list_a.append(listValue)
is impractical because I am actually working with 600+ lists...
Is there any other way I could do to make something like this work properly??
Help is much appreciated!
Thank you very much in advance!!
When you're tempted to use variable names to hold data — like the names of stocks — you should almost certainly be using a dictionary with the data as keys. You can still pre-populate the dictionary if you want to (although you don't have to).
You can change your existing to code to something like:
# all the valid names
names = ['list_a', 'list_b', 'list_c']
# pre-populate a dictionary with them
names = {name:[] for name in names}
# later you can add data to the arrays:
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
# append value
names[listName].append(listValue)
With this, all your data is in one structure. It will be easy to loop through it, make summary statistics like sums/averages act. Having a dictionary with 600 keys is no big deal, but having 600 individual variables is a recipe for a mess.
This will raise a key error if you try to add a name that doesn't exists, which you can catch in a try block if that's a possibility.
Keep your lists in a dict. You could initialize your dict from a file, db, etc.
lists = {"GOOG": []}
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
lists.setdefault(listName,[]).append(listValue)
# Just a little output to see what you've got in there...
for list_name, list_value in lists.items():
print(list_name, list_value)
So, following Mark Meyer's and MSlimmer's suggestions above, I am using a dictionary of lists to simplify data input, which has made this section of my program work flawlessly (thanks again, guys!).
However, I am experiencing problems with the next section of my code (which was working before when I had it all as lists haha). I have a dictionary as below:
names = {'list_a':[5, 7, -3], 'list_b':[10, 12, -10, -10]}
Now, I have to add up all positives and all negatives to each list. I want it to have the following result:
names_positives = {'list_a': 12, 'list_b': 22}
names_negatives = {'list_a': -3, 'list_b': -20}
I have tried three different ways, but none of them worked:
## first try:
names_positives = sum(a for a in names.values() if a > 0)
## second try:
names_positives = []
names_positives.append(a for a in names.values() if compras > 0)
## third try:
names_positives = dict(zip(names.keys(), [[sum(a)] for a in names.values() if compra > 0]))
To be honest, I have no idea how to proceed -- I am getting errors due to mixing strings and integers in those lines, and I am not being able to work some way around this problem... Any help is much appreciated. It could result in a dictionary, in a list or even in only the total sum (as in the first try, under no better circumstances).
Cheers!
You can try this one:
I just added one line for your code to work.
list_a = ['a']
list_b = []
list_c = []
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
eval(listName).append(listValue)
the eval function evaluates the string into an actual python expression. It should be noted however that there are security issues regarding the use of eval. But for the exact question that you were trying to answer, this would be the easiest way that wouldn't require much refactoring of the existing code.
I'm making a program that allows the user to log loot they receive from monsters in an MMO. I have the drop tables for each monster stored in text files. I've tried a few different formats but I still can't pin down exactly how to take that information into python and store it into a list of lists of lists.
The text file is formatted like this
item 1*4,5,8*ns
item 2*3*s
item 3*90,34*ns
The item # is the name of the item, the numbers are different quantities that can be dropped, and the s/ns is whether the item is stackable or not stackable in game.
I want the entire drop table of the monster to be stored in a list called currentDropTable so that I can reference the names and quantities of the items to pull photos and log the quantities dropped and stuff.
The list for the above example should look like this
[["item 1", ["4","5","8"], "ns"], ["item 2", ["2","3"], "s"], ["item 3", ["90","34"], "ns"]]
That way, I can reference currentDropTable[0][0] to get the name of an item, or if I want to log a drop of 4 of item 1, I can use currentDropTable[0][1][0].
I hope this makes sense, I've tried the following and it almost works, but I don't know what to add or change to get the result I want.
def convert_drop_table(list):
global currentDropTable
currentDropTable = []
for i in list:
item = i.split('*')
currentDropTable.append(item)
dropTableFile = open("droptable.txt", "r").read().split('\n')
convert_drop_table(dropTableFile)
print(currentDropTable)
This prints everything properly except the quantities are still an entity without being a list, so it would look like
[['item 1', '4,5,8', 'ns'], ['item 2', '2,3', 's']...etc]
I've tried nesting another for j in i, split(',') but then that breaks up everything, not just the list of quantities.
I hope I was clear, if I need to clarify anything let me know. This is the first time I've posted on here, usually I can just find another solution from the past but I haven't been able to find anyone who is trying to do or doing what I want to do.
Thank you.
You want to split only the second entity by ',' so you don't need another loop. Since you know that item = i.split('*') returns a list of 3 items, you can simply change your innermost for-loop as follows,
for i in list:
item = i.split('*')
item[1] = item[1].split(',')
currentDropTable.append(item)
Here you replace the second element of item with a list of the quantities.
You only need to split second element from that list.
def convert_drop_table(list):
global currentDropTable
currentDropTable = []
for i in list:
item = i.split('*')
item[1] = item[1].split(',')
currentDropTable.append(item)
The first thing I feel bound to say is that it's usually a good idea to avoid using global variables in any language. Errors involving them can be hard to track down. In fact you could simply omit that function convert_drop_table from your code and do what you need in-line. Then readers aren't obliged to look elsewhere to find out what it does.
And here's yet another way to parse those lines! :) Look for the asterisks then use their positions to select what you want.
currentDropTable = []
with open('droptable.txt') as droptable:
for line in droptable:
line = line.strip()
p = line.find('*')
q = line.rfind('*')
currentDropTable.append([line[0:p], line[1+p:q], line[1+q:]])
print (currentDropTable)
I have this part of code isolated for testing purposes and this question
noTasks = int(input())
noOutput = int(input())
outputClist = []
outputCList = []
for i in range(0, noTasks):
for w in range(0, noOutput):
outputChecked = str(input())
outputClist.append(outputChecked)
outputCList.append(outputClist)
outputClist[:] = []
print(outputCList)
I have this code here, and i get this output
[[], []]
I can't figure out how to get the following output, and i must clear that sublist or i get something completely wrong...
[["test lol", "here can be more stuff"], ["test 2 lol", "here can be more stuff"]]
In Python everything is a object. A list is a object with elements. You only create one object outputclist filling and clearing its contents. In the end, you have one list multiple times in outputCList, and as your last thing is clearing the list, this list is empty.
Instead, you have to create a new list for every task:
noTasks = int(input())
noOutput = int(input())
output = []
for i in range(noTasks):
checks = []
for w in range(noOutput):
checks.append(input())
output.append(checks)
print(output)
Instead of passing the contained elements in outputClist to outputCList (not the greatest naming practice either to just have one capitalization partway through be the only difference in variable names), you are passing a reference to the list itself. To get around this important and useful feature of Python that you don't want to make use of, you can pretty easily just pass a new list containing the elements of outputClist by changing this line
outputCList.append(outputClist)
to
outputCList.append(list(outputClist))
or equivalently, as #jonrsharpe states in his comment
outputCList.append(outputClist[:])
I'm quite new to programming so I'm sure there's a terser way to pose this, but I'm trying to create a personal bookmarking program. Given multiple urls each with a list of tags ordered by relevance, I want to be able to create a search consisting of a list of tags that returns a list of most relevant urls. My first solution, below, is to give the first tag a value of 1, the second 2, and so on & let the python list sort function do the rest. 2 questions:
1) Is there a much more elegant/efficient way of doing this (embarrass me!)
2) Any other general approaches to the sorting by relevance given the inputs above problem?
Much obliged.
# Given a list of saved urls each with a corresponding user-generated taglist
# (ordered by relevance), the user enters a "search" list-of-tags, and is
# returned a sorted list of urls.
# Generate sample "content" linked-list-dictionary. The rationale is to
# be able to add things like 'title' etc at later stages and to
# treat each url/note as in independent entity. But a single dictionary
# approach like "note['url1']=['b','a','c','d']" might work better?
content = []
note = {'url':'url1', 'taglist':['b','a','c','d']}
content.append(note)
note = {'url':'url2', 'taglist':['c','a','b','d']}
content.append(note)
note = {'url':'url3', 'taglist':['a','b','c','d']}
content.append(note)
note = {'url':'url4', 'taglist':['a','b','d','c']}
content.append(note)
note = {'url':'url5', 'taglist':['d','a','c','b']}
content.append(note)
# An example search term of tags, ordered by importance
# I'm using a dictionary with an ordinal number system
# This seems clumsy
search = {'d':1,'a':2,'b':3}
# Create a tagCloud with one entry for each tag that occurs
tagCloud = []
for note in content:
for tag in note['taglist']:
if tagCloud.count(tag) == 0:
tagCloud.append(tag)
# Create a dictionary that associates an integer value denoting
# relevance (1 is most relevant etc) for each existing tag
d={}
for tag in tagCloud:
try:
d[tag]=search[tag]
except KeyError:
d[tag]=100
# Create a [[relevance, tag],[],[],...] result list & sort
result=[]
for note in content:
resultNote=[]
for tag in note['taglist']:
resultNote.append([d[tag],tag])
resultNote.append(note['url'])
result.append(resultNote)
result.sort()
# Remove the relevance values & recreate a list containing
# the url string followed by corresponding tags.
# Its so hacky i've forgotten how it works!
# It's mostly for display, but suggestions on "best-practice"
# intermediate-form data storage?
finalResult=[]
for note in result:
temp=[]
temp.append(note.pop())
for tag in note:
temp.append(tag[1])
finalResult.append(temp)
print "Content: ", content
print "Search: ", search
print "Final Result: ", finalResult
1) Is there a much more elegant/efficient way of doing this (embarrass me!)
Sure thing. The basic idea: quit trying to tell Python what to do, and just ask it for what you want.
content = [
{'url':'url1', 'taglist':['b','a','c','d']},
{'url':'url2', 'taglist':['c','a','b','d']},
{'url':'url3', 'taglist':['a','b','c','d']},
{'url':'url4', 'taglist':['a','b','d','c']},
{'url':'url5', 'taglist':['d','a','c','b']}
]
search = {'d' : 1, 'a' : 2, 'b' : 3}
# We can create the tag cloud like this:
# tagCloud = set(sum((note['taglist'] for note in content), []))
# But we don't actually need it: instead, we'll just use a default value
# when looking things up in the 'search' dict.
# Create a [[relevance, tag],[],[],...] result list & sort
result = sorted(
[
[search.get(tag, 100), tag]
for tag in note['taglist']
] + [[note['url']]]
# The result will look like [ [relevance, tag],... , [url] ]
# Note that the url is wrapped in a list too. This makes the
# last processing step easier: we just take the last element of
# each nested list.
for note in content
)
# Remove the relevance values & recreate a list containing
# the url string followed by corresponding tags.
finalResult = [
[x[-1] for x in note]
for note in result
]
print "Content: ", content
print "Search: ", search
print "Final Result: ", finalResult
I suggest you also give a weight to each tag, depending on how rare it is (e.g. a “tarantula” tag would weigh more than a “nature” tag¹). For a given URL, rare tags that are common with other URLs should mark a stronger relevance, while frequently used tags of the given URL not existing in another URL should mark down the relevance.
It's easy to convert the rules I describe above as calculations of a numerical relevance for every other URL.
¹ unless all your URLs are related to “tarantulas”, of course :)