I can't seem to find a question on SO about my particular problem, so forgive me if this has been asked before!
Anyway, I'm writing a script to loop through a set of URLs and give me a list of unique URLs with unique parameters.
The trouble I'm having is actually comparing the parameters to eliminate multiple duplicates. It's a bit hard to explain, so some examples are probably in order:
Say I have a list of URLs like this:
hxxp://www.somesite.com/page.php?id=3&title=derp
hxxp://www.somesite.com/page.php?id=4&title=blah
hxxp://www.somesite.com/page.php?id=3&c=32&title=thing
hxxp://www.somesite.com/page.php?b=33&id=3
I have it parsing each URL into a list of lists, so eventually I have a list like this:
sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
I need to figure out a way to end up with just 2 lists in my list at that point:
new = [['id', 'c', 'title'], ['b', 'id']]
Right now I have a bit of code that sorts it out a little. I know I'm close, and I've been banging my head against this for a couple of days now :(. Any ideas?
Thanks in advance! :)
EDIT: Sorry for not being clear! This script is aimed at finding unique entry points for web applications post-spidering. Basically if a URL has 3 unique entry points
['id', 'c', 'title']
I'd prefer that to the same link with 2 unique entry points, such as:
['id', 'title']
So I need my new list of lists to eliminate the one with 2 and prefer the one with 3 ONLY if the smaller variables are in the larger set. If it's still unclear let me know, and thank you for the quick responses! :)
I'll assume that subsets are considered "duplicates" (non-commutatively, of course)...
Start by converting each query into a set and ordering them all from largest to smallest. Then add each query to a new list if it isn't a subset of an already-added query. Since any set is a subset of itself, this logic covers exact duplicates:
a = []
for q in sorted((set(q) for q in sort), key=len, reverse=True):
    if not any(q.issubset(Q) for Q in a):
        a.append(q)
a = [list(q) for q in a]  # Back to lists, if you want
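As an end-to-end illustration of the subset filtering described above, here is a minimal Python 3 sketch. The helper names param_names and drop_subsets are hypothetical, and it assumes all URLs point at the same page, so the host and path are ignored and only parameter names are compared.

```python
from urllib.parse import urlsplit, parse_qsl

def param_names(url):
    """Return the set of query parameter names in a URL (hypothetical helper)."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {name for name, _ in pairs}

def drop_subsets(param_sets):
    """Keep only sets that are not subsets of an already kept set (hypothetical helper)."""
    kept = []
    # Largest sets first, so every possible superset is seen before its subsets.
    for candidate in sorted(param_sets, key=len, reverse=True):
        if not any(candidate <= other for other in kept):
            kept.append(candidate)
    return kept

urls = [
    'hxxp://www.somesite.com/page.php?id=3&title=derp',
    'hxxp://www.somesite.com/page.php?id=4&title=blah',
    'hxxp://www.somesite.com/page.php?id=3&c=32&title=thing',
    'hxxp://www.somesite.com/page.php?b=33&id=3',
]

unique = drop_subsets(param_names(u) for u in urls)
# Sets are unordered, so sort each one to get a stable display.
print([sorted(s) for s in unique])
# [['c', 'id', 'title'], ['b', 'id']]
```

Sets lose the original parameter order; if that order matters, one option is to keep the original lists alongside the sets and return those instead.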
Thank you for looking at my issue.
I'm trying to compare cells from three CSV files to make sure they contain exactly the same information. The cells in the CSV files can contain names, dates or ID numbers. All of them have to match.
compile = []
for a in Treader,Vreader,Dreader:
    for b in a:
        compile.append(b[0])
However, the number of variables will fluctuate, and I don't want to keep adding index slicing every time (see "compile.append(b[0])"). The question now is: how can I construct this so that it handles an arbitrary number of variables and an arbitrary number of indexes, based on the length ("len") of the original list? Can I use the range function for that? I'm not sure how to create something like this.
The current question I have is
List = [[sally,john,jim], [sally,john,jim], [sally,john,jim]]
If I have the list above how could I get it to show
List =[sally,sally,sally]
List1 = [john,john,john]
List2 = [jim,jim,jim]
I also want to be able to produce an arbitrary number of lists, based on the length of the lists inside the outer list. In this case it's 3, for three names.
Some of my lists have 30 items and some have 5, so it's important that I can assign them without having to type List1 to List30 and assign each one manually.
You may use:
compile = list(zip(Treader,Vreader,Dreader))
This will create a list of tuples, where each tuple holds the corresponding row from each of the three readers.
After your edit, you may use:
list(zip(*List))
output:
[('sally', 'sally', 'sally'), ('john', 'john', 'john'), ('jim', 'jim', 'jim')]
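To make the transposition concrete, here is a small Python 3 sketch. The name rows stands in for List from the question, and the all_match check is an assumption about how the "all have to match" comparison might then be done.

```python
rows = [
    ['sally', 'john', 'jim'],
    ['sally', 'john', 'jim'],
    ['sally', 'john', 'jim'],
]

# zip(*rows) pairs up the first items, then the second items, and so on,
# however many items each inner list has.
columns = [list(col) for col in zip(*rows)]
print(columns)
# [['sally', 'sally', 'sally'], ['john', 'john', 'john'], ['jim', 'jim', 'jim']]

# A column matches everywhere if it collapses to a single distinct value.
all_match = [len(set(col)) == 1 for col in columns]
print(all_match)
# [True, True, True]
```

This avoids creating List1 to List30 by hand: the columns live in one list and can be reached by index, for example columns[1] for the second name. Note that zip stops at the shortest inner list, so rows of unequal length are silently truncated.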
I am new to data wrangling in python.
I have a column in a dataframe that has text like:
I really like Product A!
I think Product B is for me!
I will go with Product C.
My objective is to create a new column with Product Name (Including the word 'Product'). I do not want to use Regex. Product name is unique in a row. So there will be no row with string such as
I really like Product A and Product B
Problem in generic form: I have a list of unique items. Let's call it list A. I have another list of strings where each string includes at most one of the items from list A. How do I create a new list with the matched items?
I have written the following code. It works fine. But even I (new to programming) can tell this is highly inefficient.
Any better and elegant solution?
product_type = ['Product A', 'Product B', 'Product C', 'Product D']
product_list = [None] * len(fed_df['product_line'])
for i in range(len(product_list)):
    for product in product_type:
        if product in fed_df['product_line'][i]:
            product_list[i] = product
fed_df['product_line'] = product_list
Short Background
Fundamentally, at some point, each element of each list will need to be compared much as you've written it (although you can skip to the next iteration once a match has been found). But the trick to writing good Python code is to use functionality implemented at a lower level for efficiency, rather than trying to write it yourself. For example, you should try to avoid using
for i in range(len(myList)): #code which accesses myList[i]
when you can use
for myListElement in myList: #code which uses myListElement
since in the latter, the accessing of myList is handled internally, and more efficiently than Python calculating i manually and then accessing the ith element of myList. This is true of some other high-level programming languages too.
Actual Answer
Anyway, to answer your question, I came up with the following and I believe it would be more efficient:
answer = map(lambda product_line_element: next(filter(lambda product: product in product_line_element,product_type),None), fed_df['product_line'])
This maps over each element of fed_df['product_line'] (map) and replaces it with the first match (next) among the product types that occur in that element (filter), or None if there is no match.
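The same "first match or None" idea can also be written without lambdas. This is only a sketch in Python 3: it uses a plain list of strings (lines) in place of the fed_df['product_line'] column, and first_match is a hypothetical helper name.

```python
product_type = ['Product A', 'Product B', 'Product C', 'Product D']

# Stand-in for the DataFrame column; the last entry has no product in it.
lines = [
    'I really like Product A!',
    'I think Product B is for me!',
    'I will go with Product C.',
    'No product mentioned here.',
]

def first_match(text, candidates):
    """Return the first candidate found in text, or None (hypothetical helper)."""
    # next() stops at the first hit, so later candidates are never checked.
    return next((c for c in candidates if c in text), None)

matched = [first_match(line, product_type) for line in lines]
print(matched)
# ['Product A', 'Product B', 'Product C', None]
```

The generator expression plays the role of filter, and the default argument to next supplies None when nothing matches.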
How I tested
To test this I made a list of lists to use as fed_df['product_line']
[['h', 'a', 'g'], ['k', 'b', 'l'], ['u', 't', 'a'], ['r', 'e', 'p'], ['g', 'e', 'b']]
and searched for "a" and "b" "product_types", which gave
['a', 'b', 'a', None, 'b']
as a result, which I think is what you are after...
These mapping functions are often preferred over for loops, since they avoid mutation and can be made multi-threaded/multi-process more easily.
Another bonus of this solution is that, in Python 3, map is lazy: the result isn't calculated until later code iterates over answer, which spreads the CPU usage a bit better. You can force it to be calculated by converting answer into a list (list(answer)), but it shouldn't be necessary.
I hope I understood your problem correctly. Let me know if you have any questions :)
Alright, so I have a question. I am working on creating a script that grabs a random name from a list of provided names, and generates them in a list of 5. I know that you can use the command
items = ['names','go','here']
rand_item = items[random.randrange(len(items))]
This, if I am not mistaken, should grab one random item from the list (correct me if I am wrong). My question is: how would I get it to generate, say, a list of 5 names, printed one per line like below?
random
names
generated
using
code
Also is there a way to make it where if I run this 5 days in a row, it doesn't repeat the names in the same order?
I appreciate any help you can give, or any errors in my existing code.
Edit:
The general use for my script will be to generate task assignments for a group of users every day, 5 days a week. What I am looking for is a way to generate these names in 5 different rotations.
I apologize for any confusion. Though some of the returned answers will be helpful.
Edit2:
Alright so I think I have mostly what I want, thank you Markus Meskanen & mescalinum, I used some of the code from both of you to resolve most of this issue. I appreciate it greatly. Below is the code I am using now.
import random
items = ['items', 'go', 'in', 'this', 'string']
rand_item = random.sample(items, 5)
for item in random.sample(items, 5):
    print(item)
random.choice() is good for selecting one element at random.
However if you want to select multiple elements at random without repetition, you could use random.sample():
for item in random.sample(items, 5):
    print(item)
For the last question, you should trust the (pseudo-)random generator not to give the same sequence on two consecutive days. By default the generator is seeded from an operating-system randomness source, or from the current time if none is available, so it's unlikely to produce the same sequence on two consecutive days, although not impossible, especially if the number of items is small.
If you absolutely need to avoid this, save the last sequence to a file, and load it before shuffling, and keep shuffling until it gives you a different order.
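A minimal sketch of that save-and-reshuffle idea in Python 3 follows. The function name next_rotation, the JSON file format and the file name are all assumptions made for illustration, and the items are assumed to be strings.

```python
import json
import os
import random
import tempfile

def next_rotation(items, state_path):
    """Shuffle items, avoiding the order saved in state_path (hypothetical helper)."""
    previous = None
    if os.path.exists(state_path):
        with open(state_path) as fh:
            previous = json.load(fh)

    order = list(items)
    random.shuffle(order)
    # Reshuffle only if a different order can exist (at least two distinct items).
    while order == previous and len(set(items)) > 1:
        random.shuffle(order)

    # Remember this order for the next run.
    with open(state_path, 'w') as fh:
        json.dump(order, fh)
    return order

# Usage: a temporary directory stands in for wherever the script keeps its state.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'last_rotation.json')
    names = ['items', 'go', 'in', 'this', 'string']
    monday = next_rotation(names, path)
    tuesday = next_rotation(names, path)
    print(monday)
    print(tuesday)
```

This only guards against repeating the immediately preceding order. Avoiding every order used during the week would mean storing a list of past orders instead of just the last one.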
You could use random.choice() to get one item only:
items = ['names','go','here']
rand_item = random.choice(items)
Now just repeat this 5 times (a for loop!)
If you want the names just in a random order, use random.shuffle() to get a different result every time.
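The three options mentioned in this thread behave differently, which a short Python 3 sketch may make clearer. The five-item list is a placeholder for the real names.

```python
import random

items = ['names', 'go', 'here', 'as', 'placeholders']

# random.choice in a loop: five independent draws, so a name can repeat.
with_repeats = [random.choice(items) for _ in range(5)]

# random.sample: five picks without repetition, in random order.
without_repeats = random.sample(items, 5)

# random.shuffle reorders a list in place, so work on a copy
# if the original order is still needed.
shuffled = items[:]
random.shuffle(shuffled)

for name in shuffled:
    print(name)
```

For assigning each person exactly one task per day, sample or shuffle is probably the better fit, since choice in a loop may pick the same name twice.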
It is not clear from your question whether you simply want to shuffle the items or choose a subset. From what I understand, you want the second case.
You can use random.sample to get a given number of random items from a list in Python. If I wanted to get 3 random items from a list of five letters, I would do:
>>> import random
>>> random.sample(['a', 'b', 'c', 'd', 'e'], 3)
['b', 'a', 'e']
Note that the letters are not necessarily returned in the same order - 'b' is returned before 'a', although that wasn't the case in the original list.
Regarding the second part of your question, preventing it from generating the same letters in the same order: you can append every newly generated sublist to a file, read this file during your script's execution, and keep generating a new sublist until it is different from every previously generated one.
random.shuffle(items) will handle the random order generation
In [15]: print(items)
['names', 'go', 'here']
In [16]: for item in items: print(item)
names
go
here
In [17]: random.shuffle(items)
In [18]: for item in items: print(item)
here
names
go
For completeness, I agree with the above poster on random.choice().
I have a list of the form
['A', 'B', 'C', 'D']
which I want to mutate into:
[('Option1','A'), ('Option2','B'), ('Option3','C'), ('Option4','D')]
I can iterate over the original list and mutate successfully, but the closest that I can come to what I want is this:
["('Option1','A')", "('Option2','B')", "('Option3','C')", "('Option4','D')"]
I need the single quotes but don't want the double quotes around each element.
[EDIT] - here is the code that I used to generate the list; although I've tried many variations. Clearly, I've turned 'element' into a string--obviously, I'm not thinking about it the right way here.
array = ['A', 'B', 'C', 'D']
listOption = 0
finalArray = []
for a in array:
    listOption += 1
    element = "('Option" + str(listOption) + "','" + a + "')"
    finalArray.append(element)
Any help would be most appreciated.
[EDIT] - a question was asked (rightly) why I need it this way. The final array will be fed to an application (Indigo home control server) to populate a drop-down list in a config dialog.
[('Option{}'.format(i+1),item) for i,item in enumerate(['A','B','C','D'])]
# EDIT FOR PYTHON 2.5
[('Option%s' % (i+1), item) for i,item in enumerate(['A','B','C','D'])]
This is how I'd do it, but honestly I'd probably try not to do this and instead want to know why I NEEDED to do this. Any time you're making a variable with a number in it (or in this case a tuple with one element of data and one element naming the data BY NUMBER) think instead how you could organize your consuming code to not need that instead.
For instance: when I started coding professionally, the company I work for had an issue with files not being purged on time at a few of our locations. Not all the files, mind you, just a few. In order to give our software developer the information needed to resolve the problem, we needed a list of the files, and the sites, for which the purge process was failing.
Because I was still wet behind the ears, instead of doing something SANE like making a dictionary with keys of the files and values of the sizes, I used locals() to create new variables WITH MEANING. Don't do this -- your variables should mean nothing to anyone but future coders. Basically I had a whole bunch of variables named "J_ITEM" and "J_INV" and etc with a value 25009 and etc, one for each file, then I grouped them all together with [item for item in locals() if item.startswith("J_")]. THAT'S INSANITY! Don't do this, build a saner data structure instead.
That said, I'm interested in how you put it all together. Do you mind sharing your code by editing your question? Maybe we can work together on a better solution than this hackjob.
x = ['A','B','C','D']
option = 1
answer = []
for element in x:
    t = ('Option'+str(option),element) #Creating the tuple
    answer.append(t)
    option+=1
print(answer)
A tuple is different from a string, in that a tuple is an immutable list. You define it by writing:
t = (something, something_else)
You probably defined t to be a string "(something, something_else)" which is indicated by the quotations surrounding the expression.
In addition to adsmith's great answer, I would add the map way:
>>> map(lambda (index, item): ('Option{}'.format(index+1),item), enumerate(['a','b','c', 'd']))
[('Option1', 'a'), ('Option2', 'b'), ('Option3', 'c'), ('Option4', 'd')]
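The lambda above unpacks a tuple in its parameter list, which is Python 2 syntax and was removed in Python 3. A possible Python 3 equivalent, as a sketch, indexes the pair instead and wraps map in list, because map returns an iterator there:

```python
letters = ['a', 'b', 'c', 'd']

# enumerate(..., start=1) yields (1, 'a'), (2, 'b'), ... so no "+ 1" is needed.
pairs = list(map(
    lambda pair: ('Option{}'.format(pair[0]), pair[1]),
    enumerate(letters, start=1),
))
print(pairs)
# [('Option1', 'a'), ('Option2', 'b'), ('Option3', 'c'), ('Option4', 'd')]
```

The result is a list of real tuples, not strings, so nothing is wrapped in double quotes.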
I have a dictionary in the form.
dictName = {'Hepp': [-1,0,1], 'Fork': [-1,-1,-1], 'Dings': [0,0,1]}
and I basically want to pull out the values ( the lists )
and add them together elementwise and get a vector as a result, like
[-2,-1,1]
I am having a hard time figuring out how to code this, and all the examples I have found for adding lists assume that I can make them into tuples, but I might have to add something like 100 lists together.
Can anyone of you guys help out?
You can use a list comprehension, and zip:
[sum(t) for t in zip(*dictName.itervalues())]
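dict.itervalues() exists only in Python 2. In Python 3 the same idea can be sketched with dict.values(), using the dictionary from the question:

```python
dictName = {'Hepp': [-1, 0, 1], 'Fork': [-1, -1, -1], 'Dings': [0, 0, 1]}

# zip(*...) groups the first elements, then the second, and so on;
# it works the same for 3 lists or 100.
totals = [sum(column) for column in zip(*dictName.values())]
print(totals)
# [-2, -1, 1]
```

As with any use of zip, the result is as long as the shortest list, so this assumes all the lists have the same length.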