Wildcard for nested list-query

Wildcard for nested list-query - python

I've got a nested list and I'd like to check whether i is contained on the lowest level of my list (i is the first of two elements of one "sublist").
1) Is there a direct way to do this?
2) I tried the following:
for i in randomlist:
    if [i, randomlist.count(i)] in list1:
Is there a way to replace randomlist.count(i) with a wildcard? I tried *, %, and so on, but none of these worked. Any ideas?
Thanks in advance!

I think what you want is:
if any(l[0] == i for l in list1):
This will only check the first item in each sub-list, which is effectively the same as having a wild-card second element.
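As a minimal sketch of that check, assuming list1 holds [number, count] pairs like the ones described in the question (the helper name has_number is made up for illustration):

```python
def has_number(pairs, number):
    """Return True if any sub-list in pairs starts with number."""
    return any(pair[0] == number for pair in pairs)

# Sample data in the [number, count] shape from the question.
list1 = [[86, 4], [67, 1], [89, 1]]

print(has_number(list1, 67))  # True
print(has_number(list1, 14))  # False
```

Because any() stops at the first match, the count in the second position is never inspected.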

It seems that this is the actual problem:
The input is a nested list of numbers and their counts in sub-lists:
[[86, 4], [67, 1], [89, 1], ...]. Output: I need to know whether a
number with its count is already in the list (in order not to add it a
second time), but the count is unknown during the for loop.
There are two ways to approach this problem. First, if the list does not have duplicates, simply convert it to a dictionary:
numbers = dict([[86,4],[67,1],[89,1]])
Now each number is the key, and the count a value. Next, if you want to know if a number is not in the dictionary, you have many ways to do that:
# Fetch the number; a missing key raises KeyError
try:
    count = numbers[14]
except KeyError:
    print('{} does not exist'.format(14))
# Another way to write the above is:
count = numbers.get(14)
if count is None:  # "if not count" would also reject a stored count of 0
    print('{} does not exist'.format(14))
# From a list of numbers, add them only if they don't
# exist in the dictionary:
for i in list_of_numbers:
    if i not in numbers:
        numbers[i] = some_value
If there are already duplicates in the original list, you can still convert it into a dictionary but you need to do some extra work if you want to preserve all the values for the numbers:
from collections import defaultdict
numbers = defaultdict(list)
for key, value in original_list:
    numbers[key].append(value)
Now if you have duplicate numbers, all their values are stored in a list. You can still follow the same logic:
for i in new_numbers:
    numbers[i].append(new_value)
Except now if the number already existed, the new_value will just be added to the list of existing values.
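A small runnable sketch of the defaultdict approach, using made-up sample data in which 86 appears twice:

```python
from collections import defaultdict

# Sample data (an assumption for illustration); 86 appears twice.
original_list = [[86, 4], [67, 1], [86, 2]]

numbers = defaultdict(list)
for key, value in original_list:
    numbers[key].append(value)

# 14 is not present yet, so defaultdict creates an empty list for it first.
numbers[14].append(1)

print(dict(numbers))  # {86: [4, 2], 67: [1], 14: [1]}
```

Note that merely reading numbers[some_key] also creates an empty entry, so use "some_key in numbers" when you only want to test for membership.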
Finally, if all you want to do is add to the list if the first number doesn't exist:
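numbers = set(i[0] for i in original_list)
for i in new_numbers:
    if i not in numbers:
        # append adds one [number, value] sub-list; += would add two separate items
        original_list.append([i, some_value])
        numbers.add(i)  # remember it so a repeat in new_numbers is skipped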

Related

python: Simplest way to sort a list by a calculated value

If I have a master_list (a list of lists), I want to know what the simplest way would be to sort it by some value (I'll call it the score). The score (a float) is derived by calling a function that calculates it based on the items in master_list[i] (a list).
The way I was going about is like this:
for i in range(len(master_list)):
    # call get_score(master_list[i]) (the function that calculates the score for master_list[i])
    # and insert the result at index 0 in master_list[i]
    master_list[i].insert(0, get_score(master_list[i]))
sorted_list = sorted(master_list)
for i in range(len(master_list)):
    master_list[i].pop(0)
My function get_score() returns a single float for one of the lists in master_list.
It is important not to modify the original master_list, which is why I am removing the score from master_list[i] afterwards. What I wish to know is whether there is any way to accomplish this without adding the score to each master_list[i] and then removing it.
I also tried something like this but I don't believe it would work:
score_sorted = sorted(master_list, key=get_score(item for item in master_list))
Expected output:
master_list = [['b', 1], ['a', 2], ['c', 3]]
If the scores for the items of master_list are 3, 2, 1 respectively, then the output would be:
master_list_with_score = [[3, 'b', 1], [2, 'a', 2], [1, 'c', 3]]
sorted_by_score_master_list = [['c', 3], ['a', 2], ['b', 1]]
Sorry about the formatting, it's my first time posting. Let me know if there is need for any clarification.

You're just supposed to provide the function itself; sorted will call it on each list element.
sorted(master_list, key=get_score)
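To make that concrete, here is a sketch with a stand-in get_score. The real scoring function is not shown in the question, so the one below is purely hypothetical and was chosen only because it reproduces the scores 3, 2, 1 from the example:

```python
def get_score(item):
    """Hypothetical score: 4 minus the second element, as a float."""
    return float(4 - item[1])

master_list = [['b', 1], ['a', 2], ['c', 3]]

# sorted() calls get_score once per element and returns a new list.
sorted_by_score = sorted(master_list, key=get_score)

print(sorted_by_score)  # [['c', 3], ['a', 2], ['b', 1]]
print(master_list)      # [['b', 1], ['a', 2], ['c', 3]]
```

The scores are only used for ordering and are never stored in the items, so master_list is left untouched.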

I would keep a separate list of scores.
scores = [get_score(item) for item in master_list]  # one score per item, in the same order as master_list
sorted(zip(scores, master_list))  # pairs sorted by score (ties fall back to comparing the items)
If you want a separate sorted list,
sorted_master_list = [item for (score, item) in sorted(zip(scores, master_list))]

Need help speeding up this function

Input: A list of lists of various positions.
[['61097', '12204947'],
['61097', '239293'],
['61794', '37020977'],
['61794', '63243'],
['63243', '5380636']]
Output: A sorted list that contains the count of unique numbers in a list.
[4, 3, 3, 3, 3]
The idea is fairly simple: I have a list of lists where each list contains a variable number of positions (in our example there are only 2 in each list, but lists of up to 10 exist). I want to loop through each list, and if there exists ANY other list that contains the same number, then that list gets appended to the original list.
Example: Taking the input data from above and using the following code:
import itertools
from IPython.display import clear_output, display  # notebook progress output

def gen_haplotype_blocks(df):
    counts = []
    for i in range(len(df)):
        my_list = [item for item in df if any(x in item for x in df[i])]
        my_list = list(itertools.chain.from_iterable(my_list))
        uniq_counts = len(set(my_list))
        counts.append(uniq_counts)
        clear_output()
        display('Currently Running ' + str(i))
    return sorted(counts, reverse=True)
I get the output that is expected. In this case, when I loop through the first list ['61097', '12204947'], I find that it and my second list ['61097', '239293'] both contain '61097', so these two lists get concatenated together and form ['61097', '12204947', '61097', '239293']. This is done for every single list, outputting the following:
['61097', '12204947', '61097', '239293']
['61097', '12204947', '61097', '239293']
['61794', '37020977', '61794', '63243']
['61794', '37020977', '61794', '63243', '63243', '5380636']
['61794', '63243', '63243', '5380636']
Once this list is complete, I then count the number of unique values in each list, append that to another list, then sort the final list and return that.
So in the case of ['61097', '12204947', '61097', '239293'], we have two '61097', one '12204947' and one '239293' which equals to 3 unique numbers.
While my code works, it is VERY slow: it has been running for nearly two hours and is still only on line ~44k.
I am looking for a way to speed up this function considerably. Preferably without changing the original data structure. I am very new to python.
Thanks in advance!

To considerably improve the speed of your program, especially for larger data sets, the key is to use a hash table (a dictionary, in Python's terms) that stores each distinct number as a key and the indices of the lines in which that number appears as the value. Then, in a second pass, merge the lists for each line based on the dictionary and count the unique elements.
def gen_haplotype_blocks(data):  # renamed from "input" to avoid shadowing the built-in
    # First pass: map each number to the indices of the lines that contain it.
    unique_numbers = {}
    for i, numbers in enumerate(data):
        for number in numbers:
            if number in unique_numbers:
                unique_numbers[number].append(i)
            else:
                unique_numbers[number] = [i]
    # Second pass: merge every line with the lines that share a number with it.
    output = [[] for _ in range(len(data))]
    for i, numbers in enumerate(data):
        for number in numbers:
            for line in unique_numbers[number]:
                output[i] += data[line]
    counts = [len(set(x)) for x in output]
    return sorted(counts, reverse=True)
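In theory, the time complexity of your algorithm is O(N*N), with N being the number of lists in the input, because each list is compared with every other list. With this approach, each list is only merged with the lists that share a number with it, so the running time is close to O(N) as long as each number appears in only a few lists, which should be considerably faster for a larger data set. The trade-off is the extra memory used by the dictionary.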
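The same two-pass idea can also be sketched with sets, which avoids building long concatenated lists. This is only an illustrative variant: the function name count_linked_positions and the use of defaultdict(set) are my own choices, not part of the original code.

```python
from collections import defaultdict

def count_linked_positions(rows):
    """For each row, count the distinct values in it and in every row sharing a value with it."""
    # Pass 1: map each value to the indices of the rows that contain it.
    rows_by_value = defaultdict(set)
    for index, row in enumerate(rows):
        for value in row:
            rows_by_value[value].add(index)

    counts = []
    for row in rows:
        # Pass 2: collect the indices of all rows sharing at least one value with this row.
        linked = set()
        for value in row:
            linked |= rows_by_value[value]
        # Merge the values of those rows and count the distinct ones.
        merged = set()
        for index in linked:
            merged.update(rows[index])
        counts.append(len(merged))
    return sorted(counts, reverse=True)

data = [['61097', '12204947'],
        ['61097', '239293'],
        ['61794', '37020977'],
        ['61794', '63243'],
        ['63243', '5380636']]
print(count_linked_positions(data))  # [4, 3, 3, 3, 3]
```

The input list of lists is only read, never modified, so the original data structure stays as it is.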

Not sure how much you expect by saying "considerably", but converting your inner lists to sets from the beginning should speed up things. The following works approximately 2.5x faster in my testing:
def gen_haplotype_blocks_improved(df):
    df_set = [set(d) for d in df]
    counts = []
    for d1 in df_set:
        row = d1
        for d2 in df_set:
            if d1.intersection(d2) and d1 != d2:
                row = row.union(d2)
        counts.append(len(row))
    return sorted(counts, reverse=True)

python : recognising if a list has no duplicate numbers

I'm currently working on a game (a really basic text-based one) which uses lists and gives users choices depending on whether or not there are duplicates in a list. However, I can't seem to code something that recognises that there are no duplicates without looping through and checking whether the count for each number is greater than 1. I really just need something that checks that the list contains no duplicates, so that I can then write the rest of the program.
So the pseudocode would look something like this:
numbers = [1, 3, 5, 6]
check list for duplicates
if no duplicates:
    do x

Use the set function with sorted:
if sorted(set(y)) == sorted(y):
    pass  # no duplicates
set removes duplicates from the given list, so it is easy to check whether your list has any. Sorting both sides makes the comparison independent of the order in which the user entered the numbers.
A simpler solution that does not need sorted:
len(y) != len(set(y))
It is faster because you don't need to sort both lists. It returns True if the list contains duplicates and False otherwise.

You can check the length of the set of the list and compare with its regular length:
l = [1, 3, 32, 4]
if len(set(l)) == len(l):
    pass
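Wrapped in a small function, the length comparison could look like the sketch below. The name has_no_duplicates is made up for illustration, and the approach assumes the list elements are hashable (numbers are):

```python
def has_no_duplicates(values):
    """Return True if every element of values occurs exactly once."""
    return len(set(values)) == len(values)

numbers = [1, 3, 5, 6]
if has_no_duplicates(numbers):
    print("no duplicates, do x")
else:
    print("duplicates found")
```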

Optimized code to check the element of list unique or not

I need a Python 3 program that takes a list as input and outputs whether its elements are unique or not. The following is an example:
list_a = [1,2,3,4,5] #unique
list_b = [1,2,2,3,4] #not unique
I have written a Python 3 script for this problem:
unique = True
for i in range(len(list_a)):
    for j in range(i + 1, len(list_a)):  # compare each element only with the later ones
        if list_a[i] == list_a[j]:
            unique = False
if unique:
    print("unique")
else:
    print("not unique")
Is this the only way to check it? I bet it isn't! I want some optimized code that is equivalent to the above, or that simply outputs "unique" or "not unique" for a given list.
Thank you in advance.

The easiest way to do this is to compare the length of the set of the given list with the length of the list:
if len(l) != len(set(l)):
    print("not unique")

You can use all() with a set; this will short-circuit as soon as a repeated item is found.
>>> def solve(lis):
...     seen = set()
...     return all(item not in seen and not seen.add(item) for item in lis)
...
>>> solve(range(5))
True
>>> solve([1,2,2,3,4])
False
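The one-liner relies on seen.add(item) returning None, so "not seen.add(item)" is always True and is only there for its side effect. If that feels too clever, the same short-circuiting behaviour can be written as an explicit loop. This is just an equivalent sketch; the name is_unique is made up:

```python
def is_unique(items):
    """Return True if no item repeats; stop at the first repeated item."""
    seen = set()
    for item in items:
        if item in seen:
            return False  # repeat found, no need to look further
        seen.add(item)
    return True

print(is_unique(range(5)))         # True
print(is_unique([1, 2, 2, 3, 4]))  # False
```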

Throw the list into a set:
set_a = set(list_a)
if len(set_a) == len(list_a):
    print("unique")
else:
    print("not unique")

You could use an AVL tree: add the elements to the tree one by one for as long as the inserted element is not yet in the tree.
For big lists, this will be very fast compared with your current code.

If you want to easily find duplicate elements you can use collections.Counter:
import collections
a = [1, 2, 2, 3]
b = collections.Counter(a)
duplicates = [i for i in b if b[i] > 1]
Variable b is an object which acts a bit like a dictionary with keys being values from a and values being numbers indicating how many times this value appeared in the original list.
print(b[2])
would give 2.
Variable duplicates has all duplicate elements and you can use it like this:
if duplicates:
    print("not unique")
else:
    print("unique")
This is longer than generating a set, but you have much more information you can use.

Trying to add to dictionary values by counting occurrences in a list of lists (Python)

I'm trying to get a count of items in a list of lists and add those counts to a dictionary in Python. I have successfully made the list (it's a list of all possible combos of occurrences for individual ad viewing records) and a dictionary with keys equal to all the values that could possibly appear, and now I need to count how many times each occur and change the values in the dictionary to the count of their corresponding keys in the list of lists. Here's what I have:
import itertools
stuff=(1,2,3,4)
n=1
combs=list()
while n<=len(stuff):
    combs.append(list(itertools.combinations(stuff,n)))
    n = n+1
viewers=((1,3,4),(1,2,4),(1,4),(1,2),(1,4))
recs=list()
h=1
while h<=len(viewers):
    j=1
    while j<=len(viewers[h-1]):
        recs.append(list(itertools.combinations(viewers[h-1],j)))
        j=j+1
    h=h+1
showcount={}
for list in combs:
    for item in list:
        showcount[item]=0
for k, v in showcount:
    for item in recs:
        for item in item:
            if item == k:
                v = v+1
I've tried a bunch of different ways to do this, and I usually either get 'too many values to unpack' errors or it simply doesn't populate. There are several similar questions posted but I'm pretty new to Python and none of them really addressed what I needed close enough for me to figure it out. Many thanks.

Use a Counter instead of an ordinary dict to count things:
from collections import Counter
showcount = Counter()
for item in recs:
showcount.update(item)
or even:
from collections import Counter
from itertools import chain
showcount = Counter(chain.from_iterable(recs))
As you can see that makes your code vastly simpler.
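For reference, here is a self-contained sketch that rebuilds recs from the viewers tuple in the question and then counts with Counter. The list comprehension is my own compact rewrite of the question's while loops:

```python
from collections import Counter
from itertools import chain, combinations

viewers = ((1, 3, 4), (1, 2, 4), (1, 4), (1, 2), (1, 4))

# Every non-empty combination of the shows seen by each viewer,
# grouped by viewer and combination size as in the question.
recs = [
    list(combinations(viewer, size))
    for viewer in viewers
    for size in range(1, len(viewer) + 1)
]

showcount = Counter(chain.from_iterable(recs))

print(showcount[(1,)])    # 5
print(showcount[(1, 4)])  # 4
print(showcount[(2, 3)])  # 0
```

A Counter returns 0 for keys it has never seen, so there is no need to pre-populate it with every possible combination.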

If all you want to do is flatten your list of lists you can use itertools.chain()
>>> import itertools
>>> listOfLists = ((1,3,4),(1,2,4),(1,4),(1,2),(1,4))
>>> flatList = itertools.chain.from_iterable(listOfLists)
The Counter object from the collections module will probably do the rest of what you want.
>>> from collections import Counter
>>> Counter(flatList)
Counter({1: 5, 4: 4, 2: 2, 3: 1})

I have some old code that resembles the issue, it might prove useful to people facing a similar problem.
import sys
file = open(sys.argv[-1], "r").read()
wordictionary={}
for word in file.split():
    if word not in wordictionary:
        wordictionary[word] = 1
    else:
        wordictionary[word] += 1
sortable = [(wordictionary[key], key) for key in wordictionary]
sortable.sort()
sortable.reverse()
for member in sortable: print (member)

First, 'flatten' the list using a generator expression: (item for sublist in combs for item in sublist).
Then, iterate over the flattened list. For each item, you either add an entry to the dict (if it doesn't already exist), or add one to the value.
d = {}
for key in (item for sublist in combs for item in sublist):
    try:
        d[key] += 1
    except KeyError:  # a missing key raises KeyError; an unhashable key would raise TypeError instead
        d[key] = 1
This technique assumes all the elements of the sublists are hashable and can be used as keys.
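If you would rather avoid the try/except, dict.get with a default of 0 gives the same result. A short sketch with made-up sample data standing in for combs:

```python
# Sample data (an assumption for illustration): sub-lists of hashable tuples.
combs = [[(1,), (2,)], [(1, 2)], [(1,)]]

counts = {}
for key in (item for sublist in combs for item in sublist):
    # get() returns 0 when the key has not been seen yet.
    counts[key] = counts.get(key, 0) + 1

print(counts)  # {(1,): 2, (2,): 1, (1, 2): 1}
```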
