Unexpected diff results from difflib in Python

I'm a novice in Python and am learning difflib. I want to know why
import difflib
for x in difflib.Differ().compare([1,2,3],[0,2,1]):
    print(x)
gives this result:
+ 0
+ 2
  1
- 2
- 3
Why not:
+ 0
  2
  1

Difflib respects the ordering of arguments. It essentially shows the edits that would transform one sequence into another.
When you don't care about order, a set difference may be what you want:
>>> {1, 2, 3} - {0, 2, 1}
set([3])
>>> {0, 2, 1} - {1, 2, 3}
set([0])
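To see why 1 rather than 2 is reported as the unchanged element, it can help to look at the opcodes of difflib.SequenceMatcher, which Differ uses internally. The following is only a minimal sketch; the variable names are mine and not from the question.

```python
import difflib

a = [1, 2, 3]
b = [0, 2, 1]

# SequenceMatcher looks for the longest contiguous matching block; among
# equally long blocks it picks the one that starts earliest in `a`.
matcher = difflib.SequenceMatcher(None, a, b)
opcodes = matcher.get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    print(tag, a[i1:i2], b[j1:j2])
# insert [] [0, 2]
# equal [1] [1]
# delete [2, 3] []
```

Both 1 and 2 are one-element matches, and 1 comes first in the first sequence, so 1 is kept. Because 2 precedes 1 in the second sequence, 2 cannot also be kept without reordering, so it shows up as both an insertion and a deletion.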

Related

How to return the intersection of all possible combinations of n different lists

I was wondering if there is an algorithm that can return the intersection of all possible combinations of n different lists. My example is the following with n = 3 different lists:
list1 = [1,2,3,4,5]
list2 = [1,3,5]
list3 = [1,2,5]
the outcome should look like this:
list1_2_intersection = [1,3,5]
list1_3_intersection = [1,2,5]
list2_3_intersection = [1,5]
list1_2_3_intersection = [1,5]
I was thinking to first use combination to get all possible combinations of n sets and use that to create intersections using intersection manually. However, since I have 6 different sets this seems very time consuming, which is why I was wondering if there is a more efficient way to compute this. Thanks in advance! :)
If you have all sets in a list, you can use the more-itertools package (pip install more-itertools), which builds on itertools, to get all combinations of those elements using more_itertools.powerset, as is done in this post.
Getting the intersection is then a matter of using set.intersection, as you point out yourself. So a solution can look like this:
from more_itertools import powerset
sets = [{1,2,3,4,5},{1,3,5},{1,2,5}]
pwset = powerset(sets)
res = [c[0].intersection(*c[1:]) if len(c)>1 else c for c in pwset]
If you just load or define the sets in an iterable like a list:
my_sets = [
    {1, 2, 3, 4, 5},
    {1, 3, 5},
    {1, 2, 5}
]
my_set_intersection = set.intersection(*my_sets)
print(my_set_intersection)
Of course, the print is just there to show the result, which is in my_set_intersection.
If you just have a couple of sets:
set1 = {1, 2, 3, 4, 5}
set2 = {1, 3, 5}
set3 = {1, 2, 5}
intersection_123 = set1 & set2 & set3
# or:
intersection_123 = set.intersection(set1, set2, set3)
For all possible intersections of combinations of all possible sizes of a group of sets:
from itertools import combinations
my_sets = [
    {1, 2, 3, 4, 5},
    {1, 3, 5},
    {1, 2, 5}
]
all_intersections = [set.intersection(*x) for n in range(len(my_sets)) for x in combinations(my_sets, n+1)]
print(all_intersections)
Result:
[{1, 2, 3, 4, 5}, {1, 3, 5}, {1, 2, 5}, {1, 3, 5}, {1, 2, 5}, {1, 5}, {1, 5}]
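If you don't like the duplicates (because different combinations of sets can yield the same intersection), you can deduplicate them, for example by collecting the results as frozensets in a set (plain sets are unhashable, so they cannot be members of a set directly).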
Note that this is similar to the solution using more_itertools, but I don't like requiring a third party library for something as trivial as generating a powerset (although it may perform better if well-written, which may matter for extremely large sets).
Also note that this leaves out the empty set (the intersection of a combination of size 0), but that's trivial to add of course.
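If you want each intersection labelled with the lists it came from, as in the desired output of the question, one option is a dictionary keyed by the combination of names. This is a sketch under my own assumptions: the `lists` dictionary and the tuple keys are illustrative choices, not something from the question.

```python
from itertools import combinations

# Hypothetical container: the question's three lists, stored by name.
lists = {
    "list1": [1, 2, 3, 4, 5],
    "list2": [1, 3, 5],
    "list3": [1, 2, 5],
}

intersections = {}
# Sizes 2..n: every pair, every triple, and so on.
for size in range(2, len(lists) + 1):
    for names in combinations(lists, size):
        common = set.intersection(*(set(lists[name]) for name in names))
        intersections[names] = sorted(common)

for names, common in intersections.items():
    print("_".join(names), common)
```

With 6 lists this produces 57 combinations of size 2 or more, which is still cheap to compute.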

pandas: Find overlap of clubs

I am given a (pandas) dataframe telling me about membership relations of people and clubs. What I want to find is the number of members that any two clubs have in common.
Example Input:
Person Club
1 A
1 B
1 C
2 A
2 C
3 A
3 B
4 C
In other words, A = {1,2,3}, B = {1,3}, and C = {1,2,4}.
Desired output:
Club 1 Club 2 Num_Overlaps
A B 2
A C 2
B C 1
I can of course write python code that calculates those numbers, but I guess there must be a more dataframe-ish way using groupby or so to accomplish the same.
First, I grouped the dataframe on the club to get a set of each person in the club.
grouped = df.groupby("Club").agg({"Person": set}).reset_index()
  Club     Person
0    A  {1, 2, 3}
1    B     {1, 3}
2    C  {1, 2, 4}
Then, I created a Cartesian product of this dataframe. I didn't have pandas 1.2.0, so I couldn't use the cross join available in df.merge(). Instead, I used the idea from this answer: pandas two dataframe cross join
grouped["key"] = 0
product = grouped.merge(grouped, on="key", how="outer").drop(columns="key")
  Club_x   Person_x Club_y   Person_y
0      A  {1, 2, 3}      A  {1, 2, 3}
1      A  {1, 2, 3}      B     {1, 3}
2      A  {1, 2, 3}      C  {1, 2, 4}
3      B     {1, 3}      A  {1, 2, 3}
4      B     {1, 3}      B     {1, 3}
5      B     {1, 3}      C  {1, 2, 4}
6      C  {1, 2, 4}      A  {1, 2, 3}
7      C  {1, 2, 4}      B     {1, 3}
8      C  {1, 2, 4}      C  {1, 2, 4}
I then kept only the rows where Club_x < Club_y, which removes self-pairs and mirrored duplicates.
filtered = product[product["Club_x"] < product["Club_y"]]
  Club_x   Person_x Club_y   Person_y
1      A  {1, 2, 3}      B     {1, 3}
2      A  {1, 2, 3}      C  {1, 2, 4}
5      B     {1, 3}      C  {1, 2, 4}
Finally, I added the column with the overlap size and renamed columns as necessary.
result = filtered.assign(Num_Overlaps=filtered.apply(lambda row: len(row["Person_x"].intersection(row["Person_y"])), axis=1))
result = result.rename(columns={"Club_x": "Club 1", "Club_y": "Club 2"}).drop(["Person_x", "Person_y"], axis=1)
  Club 1 Club 2  Num_Overlaps
1      A      B             2
2      A      C             2
5      B      C             1
You can indeed do this with groupby and some set manipulation. I would also use itertools.combinations, to get the list of club pairs.
import pandas as pd
from itertools import combinations
df = pd.DataFrame({'Person': [1, 1, 1, 2, 2, 3, 3, 4],
                   'Club': list('ABCACABC')})
members = df.groupby('Club').agg(set)
clubs = sorted(list(set(df.Club)))
overlap = pd.DataFrame(list(combinations(clubs, 2)),
                       columns=['Club 1', 'Club 2'])
def n_overlap(row):
    club1, club2 = row
    members1 = members.loc[club1, 'Person']
    members2 = members.loc[club2, 'Person']
    return len(members1.intersection(members2))
overlap['Num_Overlaps'] = overlap.apply(n_overlap, axis=1)
overlap
  Club 1 Club 2  Num_Overlaps
0      A      B             2
1      A      C             2
2      B      C             1
Note there is one difference from your desired output, but that is probably as it should be, as noted by @rchome in the comment above.
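Another way to get the same counts, sketched here as an alternative rather than as what either answer does, is to build a Person x Club indicator matrix with pd.crosstab and multiply it by its transpose. This assumes each (Person, Club) pair appears at most once in the dataframe; the names `membership`, `shared` and `pairs` are mine.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"Person": [1, 1, 1, 2, 2, 3, 3, 4],
                   "Club": list("ABCACABC")})

# Person x Club indicator matrix: 1 where the person belongs to the club.
membership = pd.crosstab(df["Person"], df["Club"])

# Club x Club matrix: entry (i, j) counts the members clubs i and j share.
# The diagonal holds each club's own size.
shared = membership.T.dot(membership)

# Long format, one row per unordered pair of clubs.
pairs = [(c1, c2, int(shared.loc[c1, c2]))
         for c1, c2 in combinations(shared.index, 2)]
print(pairs)
```

The matrix product avoids the row-wise apply, which may matter when there are many clubs, but it materializes the full indicator matrix in memory.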

Summing up collections.Counter objects using `groupby` in pandas

I am trying to group the words_count column by both essay_set and domain1_score and sum the Counter objects in words_count, using Counter addition as described here:
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
I grouped them using this command:
words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"])
but I do not know how to apply the Counter addition (which is simply +) to the words_count column.
Here is my dataframe:
GroupBy.sum works with Counter objects. However I should mention the process is pairwise, so this may not be very fast. Let's try
words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()
from collections import Counter
import pandas as pd
df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
})
df
   a             b
0  1  {1: 1, 2: 1}
1  1  {1: 1, 3: 1}
2  2  {2: 1, 3: 1}
df.groupby(by=['a'])['b'].sum()
a
1    {1: 2, 2: 1, 3: 1}
2          {2: 1, 3: 1}
Name: b, dtype: object
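If the pairwise addition turns out to be too slow, the same totals can be accumulated in place outside pandas with Counter.update, which adds counts into an existing Counter instead of creating a new one per addition. This is a plain-Python sketch; the `rows` list is a hypothetical stand-in for the (group key, Counter) pairs of the example dataframe above.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the rows of the example dataframe: (a, b) pairs.
rows = [
    (1, Counter([1, 2])),
    (1, Counter([1, 3])),
    (2, Counter([2, 3])),
]

totals = defaultdict(Counter)
for key, counts in rows:
    # update() adds the counts into the running total for this group.
    totals[key].update(counts)

for key, counts in totals.items():
    print(key, dict(counts))
```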

Python: check if one set can be created from other sets

I have a list of sets:
graphs = [{1, 2, 3}, {4, 5}, {6}]
I have to check whether an input set can be created as a union of sets inside graphs.
For example:
input1 = {1, 2, 3, 6} # answer - True
input2 = {1, 2, 3, 4} # answer - False, because "4" is only a part of another set, only combinations of full sets are required
In other words, these are all the combinations of sets inside graphs:
{1, 2, 3}
{4, 5}
{6}
{1, 2, 3, 6}
{1, 2, 3, 4, 5}
{4, 5, 6}
{1, 2, 3, 4, 5, 6}
I need to know, if one of these combinations is equal to input.
How should I correctly iterate through graphs elements to get answer? If graphs is bigger, there would be some problems with finding all the combinations.
I think you are looking at this the wrong way. It is better to remove any set that contains an element you cannot use (i.e. remove the set {4,5} when you are looking for {1,2,3,4}). Then create the union of all remaining sets and see whether it equals your input set.
This way you do not need to find all combinations, just an elimination step up front that costs at most O(n*len(sets)).
graphs = [i for i in graphs if i.issubset(input1)]
Then check for the answer:
result = set().union(*graphs) == input1
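The two steps above can be wrapped in a function so that the original graphs list is not overwritten and several inputs can be checked. This is a minimal sketch; the name `can_build` is my own.

```python
def can_build(graphs, target):
    """Return True if target equals the union of some of the sets in graphs."""
    # Keep only the sets that fit entirely inside the target.
    usable = [g for g in graphs if g.issubset(target)]
    # If the target can be built at all, the union of all usable sets builds it.
    return set().union(*usable) == target


graphs = [{1, 2, 3}, {4, 5}, {6}]
print(can_build(graphs, {1, 2, 3, 6}))  # True
print(can_build(graphs, {1, 2, 3, 4}))  # False
```

Note that an empty target returns True here (the union of no sets), which you may want to treat as a special case.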
You can find all combinations with itertools.combinations, then simply compare the sets:
from itertools import combinations, chain
def check(graphs, inp):
    for i in range(1, len(graphs)+1):
        for p in combinations(graphs, i):
            if set(chain(*p)) == inp:
                return True
    return False
graphs = [{1, 2, 3}, {4, 5}, {6}]
input1 = {1, 2, 3, 6}
input2 = {1, 2, 3, 4}
print(check(graphs, input1))
print(check(graphs, input2))
Prints:
True
False

How to compare values within an array in Python - find out whether 2 values are the same

I basically have an array of 50 integers, and I need to find out whether any of the 50 integers are equal, and if they are, I need to carry out an action.
How would I go about doing this? As far as I know there isn't currently a function in Python that does this is there?
If you mean you have a list and you want to know if there are any duplicate values, then make a set from the list and see if it's shorter than the list:
if len(set(my_list)) < len(my_list):
    print("There's a dupe!")
This won't tell you what the duplicate value is, though.
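If you do need the value, a variation on the same idea is to track the values seen so far in a set and stop at the first repeat. This is a sketch assuming the values are hashable (integers are); the function name `first_duplicate` is mine.

```python
def first_duplicate(values):
    """Return the first value that occurs a second time, or None."""
    seen = set()
    for value in values:
        if value in seen:
            return value  # stop as soon as a repeat is found
        seen.add(value)
    return None


print(first_duplicate([1, 3, 6, 4, 8, 8, 5, 6]))  # 8
print(first_duplicate([1, 2, 3]))                 # None
```

Because it returns early, it does not have to build a set of the whole list when a duplicate appears near the start.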
If you have Python 2.7+ you can use Counter.
>>> import collections
>>> input = [1, 1, 3, 6, 4, 8, 8, 5, 6]
>>> c = collections.Counter(input)
>>> c
Counter({1: 2, 6: 2, 8: 2, 3: 1, 4: 1, 5: 1})
>>> duplicates = [i for i in c if c[i] > 1]
>>> duplicates
[1, 6, 8]
If your actions need to know the number or how many times that number gets repeated over your input list then groupby is a good choice.
>>> from itertools import groupby
>>> for x in groupby([1,1,2,2,2,3]):
...     print(x[0], len(list(x[1])))
...
1 2
2 3
3 1
The first number is the element and the second is the number of repetitions. groupby only groups consecutive equal elements, so make sure you sort your input list first, for instance:
>>> for x in groupby(sorted([1,1,2,4,2,2,3])):
...     print(x[0], len(list(x[1])))
...
1 2
2 3
3 1
4 1
You can convert the list to a set and compare their lengths.
>>> a = [1, 2, 3]
>>> len(set(a)) == len(a)
True
>>> a = [1, 2, 3, 4, 4]
>>> len(set(a)) == len(a)
False
>>> arbitrary_list = [1, 1, 2, 3, 4, 4, 4]
>>> item_occurence = dict([(item, arbitrary_list.count(item)) for item in arbitrary_list])
>>> item_occurence
{1: 2, 2: 1, 3: 1, 4: 3}
If you want to see which values are not unique, you can get a list of those values with
>>> list(filter(lambda item: item_occurence[item] > 1, item_occurence))
[1, 4]
