I was wondering if you could help me simplify my code or find a more efficient method. I am given a list of tuples and I want to sum the second elements, grouped by the first element of each tuple.
[('dog','1'),('dog','2'),('cat','1'),('cat','2')]
This will result in:
[('dog','3'),('cat','3')]
I want the resulting numbers to be strings rather than ints. Here is my code:
dddd=[]
dddd=result_1_ce+result_2_ce+result_3_ce+result_4_ce
# Combine the results of the earlier searches; dddd stores [('dog','1'),('dog','2'),('cat','1'),('cat','2')]
newlist = [[int(element) if element.isdigit() else element for element in sub] for sub in dddd]
grouped = dict()
grouped.update((name,grouped.get(name,0)+value) for name,value in newlist)
dddd = [*map(list,grouped.items())]
# Sort the result in descending order of the summed value
dddd=sorted(dddd,key=lambda x:x[1],reverse=True)
X = [tuple(i) for i in dddd]
print("Findings:",X)
This code works.
I have added a comment on the line where I changed the code.
dddd=result_1_ce+result_2_ce+result_3_ce+result_4_ce
# Combine the results of the earlier searches; dddd stores [('dog','1'),('dog','2'),('cat','1'),('cat','2')]
newlist = [[int(element) if element.isdigit() else element for element in sub] for sub in dddd]
grouped = dict()
grouped.update((name,grouped.get(name,0)+value) for name,value in newlist)
dddd = [*map(list,grouped.items())]
# Sort the result in descending order of the summed value
dddd=sorted(dddd,key=lambda x:x[1],reverse=True)
X = [(f, str(s)) for f, s in dddd]  # changed line: unpack each [name, total] pair and convert the total to str
print("Findings:",X)
OUTPUT
Findings: [('dog', '3'), ('cat', '3')]
Your dddd list looks like this: [['dog', 3], ['cat', 3]].
# If I write this:
dddd = [['dog', 3], ['cat', 3]]
for f, s in dddd:  # f is 'dog', then 'cat'; s is 3, then 3
    print(f)
    print(s)
It seems to me that a very simple approach would be to convert to a dictionary first, since it is a good data structure for grouping. Also, sum the numbers as integers. You can use int(str(value)) if you are unsure whether each value will be an int or a str. Then, to get the output as a list of tuples, you just convert with a simple comprehension.
l = [("dog", "1"), ("dog", "2"), ("cat", 1), ("cat", "2")]
d = {}
for t in l:
    d[t[0]] = d.setdefault(t[0], 0) + int(str(t[1]))
print([(k, str(v)) for k, v in d.items()])
Output:
[('dog', '3'), ('cat', '3')]
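The grouping idea above can also be sketched with collections.defaultdict, including the descending sort from the question. This is only a sketch under assumptions: the function name sum_by_key is hypothetical, and every value is assumed to be either an int or a digit-only string.

```python
from collections import defaultdict


def sum_by_key(pairs):
    """Sum the second item of each pair, grouped by the first item."""
    totals = defaultdict(int)
    for name, value in pairs:
        totals[name] += int(value)  # int() accepts both '2' and 2
    # Sort by total, largest first (ties keep first-seen order),
    # then convert the totals back to strings.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, str(total)) for name, total in ranked]


pairs = [("dog", "1"), ("dog", "2"), ("cat", "1"), ("cat", "2")]
print(sum_by_key(pairs))
```

Keeping the totals as ints until the last step means the sort compares numbers rather than strings.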
Convert a nested list from [[...],[...]] to [(...),(...)]. I wish to convert my list below:
x=[['dog', 2], ['bird', 1],['dog',1]]
to
x=[('dog', 3), ('bird', 1)]
Here is my code for reference.
# Convert the last element of each nested list to int (skip values that are already ints)
newlist = [[int(element) if isinstance(element, str) and element.isdigit() else element for element in sub] for sub in x]
# Sum the values whose names match
grouped = dict()
grouped.update((name,grouped.get(name,0)+value) for name,value in newlist)
x = [*map(list,grouped.items())]
Could this be due to my use of dict()?
I have been successful in summing the second elements where the first ones match; however, the result is formatted like this:
x=[['dog', 3], ['bird', 1]]
I would like it formatted as below. Any advice on how to get this output?
x=[('dog', 3), ('bird', 1)]
I guess you are looking for collections.Counter:
from collections import Counter
x=[['dog', 2], ['bird', 1],['dog',1]]
c = Counter()
for k, v in x:
    c[k] += v
print(c)
# as pointed out by wim in the comments, use the below
# to get a list of tuples:
print([*c.items()])
Here is one way to do so:
x = [['dog', 2], ['bird', 1], ['dog', 1]]
data = {k: 0 for k, _ in x}
for key, num in x:
    data[key] += num
print(list(data.items())) # [('dog', 3), ('bird', 1)]
You can also use setdefault():
data = {}
for key, num in x:
    data.setdefault(key, 0)
    data[key] += num
print(list(data.items()))
It looks like this works:
newlist = [[int(element) if isinstance(element, str) and element.isdigit() else element for element in sub] for sub in x]
# add the 2 columns that match
grouped = dict()
grouped.update((name, grouped.get(name, 0) + value) for name, value in newlist)
x = [*map(tuple, grouped.items())]
Don't make it a list in the first place. The only real thing to note here is replacing list with tuple; however, I also removed the unpacking ([*...]) and converted the map result directly with list().
Change:
x = [*map(list,grouped.items())]
to:
x = list(map(tuple, grouped.items()))
x=[['dog', 3], ['bird', 1]]
# You want it to be...
x=[('dog', 3), ('bird', 1)]
So you should first know how to convert ['dog', 3] to ('dog', 3):
>>> x = ['dog', 3]
>>> tuple(x)
('dog', 3)
To make it a tuple you just have to use the tuple's class constructor.
Then you have to apply this to the whole x list:
x = [tuple(i) for i in x]
I'm having some trouble figuring this out (I'm a complete beginner in Python).
I have a list of names:
names_2 = ["Lars", "Per", "Henrik"]
I need to convert it into a tuple that holds each element followed by its length.
I tried this:
namesTuple = tuple(names_2 + [len(name) for name in names_2])
The output of this is: ('Lars', 'Per', 'Henrik', 4, 3, 6)
The output I'm looking for is ('Lars', 4, 'Per', 3, 'Henrik', 6)
Can anyone help?
You can use a nested generator expression in the tuple constructor, for instance:
names_tuple = tuple(x for name in names_2 for x in (name, len(name)))
# ('Lars', 4, 'Per', 3, 'Henrik', 6)
If you were to build it in a looping approach, it makes sense to build a list first (tuples are immutable):
names = []
for name in names_2:
    # extend (both at once)
    names.extend((name, len(name)))
    # OR append one by one (no spurious intermediate tuple)
    # names.append(name)
    # names.append(len(name))
names_tuple = tuple(names)
names_2 = ["Lars", "Per", "Henrik"]
names = []
for name in names_2:
    names.append(name)
    names.append(len(name))
names = tuple(names)
Iterate over the names, append the name itself and its length to a list, and convert the list to tuple.
Or as a one-liner (but you'll end up with a tuple of tuples):
names_2 = ["Lars", "Per", "Henrik"]
names = tuple((name, len(name)) for name in names_2)
Zip the list of names with the list of lengths, then flatten the resulting list and convert that to a tuple.
from itertools import chain
namesTuple = tuple(chain.from_iterable(zip(names_2, map(len, names_2))))
If you prefer something a little less "functional", you can use a generator expression.
namesTuple = tuple(chain.from_iterable((x, len(x)) for x in names_2))
or (repeating @schwobaseggl's answer)
namesTuple = tuple(value for name in names_2 for value in (name, len(name)))
First, create a tuple of tuples: ((name_1, length_1), (name_2, length_2), ...)
The zip function exists for exactly that.
Second, flatten this tuple of tuples.
[In]
names = ["Lars", "Per", "Henrik"]
[In]
zip_tupled = tuple(zip(names, [len(x) for x in names]))
[Out]
zip_tupled = (('Lars', 4), ('Per', 3), ('Henrik', 6))
[In]
final = tuple(item for subtuple in zip_tupled for item in subtuple)
[Out]
final = ('Lars', 4, 'Per', 3, 'Henrik', 6)
This solution is quite close to schwobaseggl's, but less direct.
See also on Stack Overflow: how to flatten a list
In a DataFrame column, I have lists of tuples containing int, str, and float values.
My objective is to extract the numeric values and store them in new columns.
If there are two numeric values in a list of tuples, two variables should be created for the two extracted values.
Input data -
List_Tuple
[('watch','price','is','$','100')]
[('there', 'was', '2','apple','and','2','mango')]
[('2','cat'),('3','mouse')]
I am not sure whether it can be done, and I cannot work out the next step.
Please advise.
Expected Output -
Var1  Var2
100
2     2
2     3
Let us use the following test data:
List_Tuple = [
    [('watch','price','is','$','100')],
    [('there', 'were', '2','apples','and','2','mangos')],
    [('2','cats'),('3','mice')],
]
Note that some of your lists contain one tuple, and some contain two tuples.
To search for the numeric values, it helps to merge them together.
chain.from_iterable from the itertools library is useful for this purpose.
Consider the following code:
import itertools as itts
for row in List_Tuple:
    print(*itts.chain.from_iterable(row))
The above code prints as follows:
watch price is $ 100
there were 2 apples and 2 mangos
2 cats 3 mice
All that remains is to extract the numbers:
import itertools as itts
import re  # regular expressions
def extract_numbers(in_data):
    out_data = list()
    for row in in_data:
        # Flatten the row's tuples and join the tokens into one string
        merged_row = itts.chain.from_iterable(row)
        merged_row = ''.join(merged_row)
        print(merged_row)
        # Capture the first run of digits and, if present, the second
        match = re.search(r"\D*(\d+)\D*(\d*)", merged_row)
        groups = match.groups() if match is not None else None
        out_data.append(groups)
    return out_data
print('\n'.join((str(x) for x in extract_numbers(List_Tuple))))
The last print statement displays:
('100', '')
('2', '2')
('2', '3')
final = []
for tup in my_list:
    for item in tup:
        if item.isdigit():
            final.append(item)
or as a list comprehension:
[item for tup in my_list for item in tup if item.isdigit()]
If the tuples hold actual int or float values rather than strings, check with isinstance(item, (int, float)) instead, e.g.:
[item for tup in my_list for item in tup if isinstance(item, (int, float))]
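For numeric strings such as '2.5', which str.isdigit() rejects, one option is to attempt a float() conversion. The sketch below assumes the items are strings; the helper name is_number and the sample rows are hypothetical. Note that float() also accepts strings like 'nan' and 'inf', which may or may not be what you want.

```python
def is_number(token):
    """Return True if token can be parsed by float(), e.g. '100', '2.5', '-3'."""
    try:
        float(token)
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical sample data, not taken from the question
rows = [("watch", "price", "is", "$", "100"), ("weight", "2.5", "kg")]
numbers = [item for tup in rows for item in tup if is_number(item)]
print(numbers)  # ['100', '2.5']
```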
Edit: I believe this gets you the functionality you want?
import pandas as pd
df = pd.DataFrame([[[('watch','price','is','$','100')]],
                   [[('there', 'was', '2','apple','and','2','mango')]],
                   [[('2','cat'),('3','mouse')]]])
df.columns = ['x1']
def tuple_join(row):
    # Collect the digit-only items from every tuple in the row, not just the first tuple
    tup_int = [item for tup in row for item in tup if item.isdigit()]
    return tup_int
df['a1'] = df.x1.apply(tuple_join)
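To get from a list of extracted values to the two columns (Var1, Var2) in the expected output, each row's values can be padded to a fixed width. This is a plain-Python sketch under assumptions: the name numeric_columns and the width parameter are hypothetical, only digit-only strings count as numeric, and writing the result back into the DataFrame is left out.

```python
def numeric_columns(row, width=2):
    """Return the first `width` digit-only items of a row, padded with None."""
    found = [item for tup in row for item in tup if item.isdigit()]
    found = found[:width]
    return found + [None] * (width - len(found))


List_Tuple = [
    [('watch', 'price', 'is', '$', '100')],
    [('there', 'was', '2', 'apple', 'and', '2', 'mango')],
    [('2', 'cat'), ('3', 'mouse')],
]
table = [numeric_columns(row) for row in List_Tuple]
for var1, var2 in table:
    print(var1, var2)
```

Each inner list of table then corresponds to one row of the Var1/Var2 layout, with None where a row has only one number.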
I'm trying to make a list that contains the most frequent tuple of a dictionary according to the first element. For example:
If d is my dictionary:
d = {('Hello', 'my'): 1, ('Hello', 'world'): 2, ('my', 'name'): 3, ('my', 'house'): 1}
I want to obtain a list like this:
L = [('Hello', 'world'), ('my', 'name')]
So I tried this:
L = [k for k, val in d.items() if val == max(d.values())]
But that only gives me the max of all the tuples:
L = [('my', 'name')]
I was thinking that maybe I have to go through my dictionary, make a new one for each first word of the tuples, and then find the most frequent entry and put it in a list, but I'm having trouble translating that into code.
from itertools import groupby
# your input data
d = {('Hello', 'my'): 1,('Hello', 'world'):2, ('my', 'name'):3, ('my','house'):1}
key_fu = lambda x: x[0][0]  # first element of first element,
                            # i.e. of ((a, b), c), return a
groups = groupby(sorted(d.items(), key=key_fu), key_fu)
l = [max(g, key=lambda x: x[1])[0] for _, g in groups]
This is achievable in O(n) if you just re-key the mapping off the first word:
>>> d = {('Hello','my'): 1, ('Hello','world'): 2, ('my','name'): 3, ('my','house'): 1}
>>> d_max = {}
>>> for (first, second), count in d.items():
...     if count >= d_max.get(first, (None, 0))[1]:
...         d_max[first] = (second, count)
...
>>> d_max
{'Hello': ('world', 2), 'my': ('name', 3)}
>>> output = [(first, second) for (first, (second, count)) in d_max.items()]
>>> output
[('Hello', 'world'), ('my', 'name')]
In my opinion, you should not just take the max over all the values of d; that only gives you the largest value in the whole dictionary, which is 3 in this case.
What I would do is create an intermediate list (maybe this can be hidden) that holds the counter as the first element and the first part of the key as the second element. That way, you can sort the list and take the first entry for each first word to get the real max key.
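One possible reading of the intermediate-list idea is sketched below. The names entries and best are illustrative, and the tie-breaking (the lexicographically larger second word wins on equal counts) is a side effect of the reverse sort rather than something the question asks for.

```python
d = {('Hello', 'my'): 1, ('Hello', 'world'): 2, ('my', 'name'): 3, ('my', 'house'): 1}

# Intermediate list: the counter first, then the two parts of the key,
# sorted so that the largest counts come first.
entries = sorted(
    ((count, first, second) for (first, second), count in d.items()),
    reverse=True,
)

best = {}
for count, first, second in entries:
    # Counts are descending, so the first entry seen for each word is its max;
    # setdefault leaves later (smaller) entries untouched.
    best.setdefault(first, (first, second))

L = sorted(best.values())
print(L)  # [('Hello', 'world'), ('my', 'name')]
```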
You have pairs of words and a count associated to each of them. You could store your information in (or convert it to) 3-tuples:
d = [
    ('Hello', 'my', 1),
    ('Hello', 'world', 2),
    ('my', 'name', 3),
    ('my', 'house', 1)
]
For each word in the first position, you want to find the word in the second position that occurs most frequently. Sort the data by the first word (any order, just to group them), then by the count (descending).
d.sort(key=lambda t: (t[0], -t[2]))
Finally, iterate through the resulting array, keeping track of the last word encountered, and append only when encountering a new word in 1st position.
L = []
last_word = ""
for word1, word2, count in d:
    if word1 != last_word:
        L.append((word1, word2))
        last_word = word1
print(L)
By running this code, you obtain [('Hello', 'world'), ('my', 'name')].
I have list a = ["string2", "string4"] and list b = ["string1", "string2", "string3", "string4", "string5"], and I want to check whether "string2" and "string4" from list a match elements in list b. If they do, I want to append each one's corresponding index in list b to list c, so list c should be [1, 3].
My code so far:
for x in a:
    for y in b:
        if x == y:
            print(x)
So I managed to print them out, but I don't know how to get the index.
This is the simpler version of my problem, and I could solve it like this, but just for fun I will describe the whole thing.
I have a list of tuples generated with nltk.word_tokenize in the following format: [('string1', 'DT'), ('string2', 'NNP'), ('string3', 'NNP'), ('string4', 'NNP'), ('string5', 'VBZ'), ('string6', 'RB')], and I want to check which of the words (string1, string2, string3, etc.) are found in another list of words (the stopwords list, e.g. stopwords = ["string312", "string552", "string631"]). If any are found, I would like to know their indexes in my list of tuples by creating another list that stores those indexes, or remains empty if none are found.
You can use index from your second list, while iterating over elements of the first list in a list comprehension.
>>> a = ["string2" , "string4"]
>>> b = ["string1" , "string2" , "string3" , "string4" , "string5"]
>>> c = [b.index(i) for i in a]
>>> c
[1, 3]
If there is a possibility that an element may be in a but not in b then you can modify this slightly
>>> [b.index(i) for i in a if i in b]
[1, 3]
A continuation to your posted code:
c = []
for x in a:
    for y in b:
        if x == y:
            print(x)
            c.append(b.index(x))
Use enumerate combined with list comprehension to get the indexes directly in a list.
>>> [i for i,j in enumerate(b) if j in a]
[1, 3]
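The same enumerate idea can be applied to the list of (word, tag) tuples from the full version of the question. This is a sketch under assumptions: the stopword values here are hypothetical and chosen so that two words match, and a set is used so each membership test is fast.

```python
tagged = [('string1', 'DT'), ('string2', 'NNP'), ('string3', 'NNP'),
          ('string4', 'NNP'), ('string5', 'VBZ'), ('string6', 'RB')]
stopwords = {"string2", "string4", "string631"}  # hypothetical stopword set

# Unpack each (word, tag) pair and keep the position when the word is a stopword.
indexes = [i for i, (word, _tag) in enumerate(tagged) if word in stopwords]
print(indexes)  # [1, 3]

# With no matching stopwords, the result is simply an empty list.
no_match = [i for i, (word, _tag) in enumerate(tagged) if word in {"string312"}]
print(no_match)  # []
```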
You can make a dictionary of element->index by using enumerate on b. This has linear time complexity, but after you complete this step, all of your index lookups will be in constant time O(1), and you'll also have an easy way to see if the value from a could not be found in b, because dict.get will return None. You will also be able to do an O(1) filter operation on a by checking the existence of its elements in the dictionary first, which also makes your second loop have linear time complexity.
>>> a = [50, 150, 250]
>>> b = list(range(200))
>>> bindex = {x: i for i, x in enumerate(b)}
>>> [bindex.get(x) for x in a]
[50, 150, None]
>>> [bindex[x] for x in a if x in bindex]
[50, 150]
If you are comfortable with sets, you can use set intersection.
set1 = set(a)
set2 = set(b)
set3 = set1 & set2  # intersection
You can convert set3 back to a list and use a list comprehension to look up each element's index in b. Sets are unordered, so sort the result if the order matters.
c = list(set3)
sorted(b.index(i) for i in c)