Conditionally access list of tuples and sum - python

I have some data that is contained within a list of tuples. I want to sum one part of each tuple if the other part meets a certain set of conditions. Here is some example data:
var = [("car", '1'), ("dog", '1'), ("mercedes", '1'), ("cat", '1'), ("ferrari", '1'), ("bird", '1')]
I have the following code that will allow me to access all the numeric data in the above structure:
var = [x[1] for x in var]
print ",".join(map(lambda x: str(x).strip(), var))
This will print out data in the following format:
1,1,1,1,1,1
If I instead used x[0] in the list comprehension I would get an output of:
car, dog, mercedes, cat, ferrari, bird
What I would like to have though is something that says:
if x[0] == "car" or x[0] == "mercedes" or x[0] == "ferrari" then var2 == x[1] + x[1] + x[1]
print var2
I'm assuming that the above won't work, but I'm not really sure how to code it in a way that will.
The above is a simple demonstration. The full string I am parsing is:
[("'goal','corner','rightfoot'", '1'), ("'goal','directfreekick','leftfoot'", '1'),
("'goal','openplay','leftfoot'", '1'), ("'goal','openplay','rightfoot'", '2'),
("'miss','corner','header'", '3'), ("'miss','directfreekick','leftfoot'", '1'),
("'miss','directfreekick','rightfoot'", '1'), ("'miss','openplay','header'", '3'),
("'miss','openplay','leftfoot'", '8'), ("'miss','openplay','rightfoot'", '11')]
...and the exact syntax I am using to parse is:
matching = {"'goal','openplay','leftfoot'", "'goal','openplay','rightfoot'", "'goal','corner','leftfoot'", "'goal','corner','rightfoot'"}
regex2 = [value for key, value in regex2 if key in matching]
regex2 = sum(int(value) for key, value in regex2 if key in matching)
print regex2
...where regex2 is assigned the value of the list of tuples above. The sum line is the one that causes the error. The line above it prints as so:
['1', '1', '2']

Use sum() with a generator expression, testing for your conditions:
matching = {'car', 'mercedes', 'ferrari'}
sum(int(value) for key, value in var if key in matching)
The generator expression does much the same as your list comprehension does: it loops over the list and does something with each element. I chose to use tuple assignment in the loop; the two elements in each tuple are assigned to key and value respectively. We can then filter with an if clause (keeping only keys that are members of the matching set) and use only the value part, converted to an integer, in the sum.
A quick demo to show you what happens, including a list comprehension version to show you that only a subset of values are picked:
>>> var = [("car", '1'), ("dog", '1'), ("mercedes", '1'), ("cat", '1'), ("ferrari", '1'), ("bird", '1')]
>>> matching = {'car', 'mercedes', 'ferrari'}
>>> [value for key, value in var if key in matching]
['1', '1', '1']
>>> sum(int(value) for key, value in var if key in matching)
3
Of course, this gets a little more interesting when you use values other than '1':
>>> var = [("car", '8'), ("dog", '2'), ("mercedes", '16'), ("cat", '4'), ("ferrari", '32'), ("bird", '64')]
>>> [value for key, value in var if key in matching]
['8', '16', '32']
>>> sum(int(value) for key, value in var if key in matching)
56
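For comparison, here is a minimal sketch of the same computation unrolled into an explicit loop. It is only meant to show what the generator expression does step by step; the variable name total is introduced here for illustration.

```python
var = [("car", '8'), ("dog", '2'), ("mercedes", '16'),
       ("cat", '4'), ("ferrari", '32'), ("bird", '64')]
matching = {'car', 'mercedes', 'ferrari'}

total = 0  # hypothetical name, not from the original answer
for key, value in var:       # tuple assignment, as in the generator expression
    if key in matching:      # the filter
        total += int(value)  # the values are strings, so convert before adding

print(total)  # 56
```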
As for your attempt to implement my solution, you replaced your original list with a list with only the values. Remove the list comprehension line rebinding regex2 and run just the sum() line:
>>> regex2 = [("'goal','corner','rightfoot'", '1'), ("'goal','directfreekick','leftfoot'", '1'),
... ("'goal','openplay','leftfoot'", '1'), ("'goal','openplay','rightfoot'", '2'),
... ("'miss','corner','header'", '3'), ("'miss','directfreekick','leftfoot'", '1'),
... ("'miss','directfreekick','rightfoot'", '1'), ("'miss','openplay','header'", '3'),
... ("'miss','openplay','leftfoot'", '8'), ("'miss','openplay','rightfoot'", '11')]
>>> matching = {"'goal','openplay','leftfoot'", "'goal','openplay','rightfoot'", "'goal','corner','leftfoot'", "'goal','corner','rightfoot'"}
>>> sum(int(value) for key, value in regex2 if key in matching)
4
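To see why the rebinding gets in the way, here is a small sketch in Python 3 syntax using a shortened version of the data. The names pairs, values_only and error are assumptions made for this illustration. Once the list holds only strings such as '1', the loop can no longer unpack each element into key and value.

```python
pairs = [("'goal','corner','rightfoot'", '1'),
         ("'goal','openplay','rightfoot'", '2')]
matching = {"'goal','corner','rightfoot'", "'goal','openplay','rightfoot'"}

# This is what the extra list comprehension line leaves behind: values only.
values_only = [value for key, value in pairs if key in matching]
print(values_only)  # ['1', '2']

error = None
try:
    # Each element is now a one-character string, not a (key, value) tuple.
    sum(int(value) for key, value in values_only if key in matching)
except ValueError as exc:
    error = exc
    print("unpacking failed:", exc)

# Running sum() against the original list of tuples works.
total = sum(int(value) for key, value in pairs if key in matching)
print(total)  # 3
```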

Related

Delete item from list of tuples and move items up

I want to delete an element from a list of tuples and move the other items that had the same position.
Input:
a=[('201001', '-4'), ('201002', '2'), ('201003', '6')]
Desired output:
a=[('201001', '2'), ('201002', '6'), ('201003', 'na')]
I have tried the following code for it:
a[0](:- 1)
But I get SyntaxError: invalid syntax
I would appreciate it if you could suggest ways to solve this case.
Iterate through each element and set the tuple so that the second value is the value of the next element (except for the last element, because there is no element after it):
for i, val in enumerate(a):
    try:
        a[i] = (val[0], a[i+1][1])
    except IndexError:
        a[i] = (val[0], "na")
Instead of catching the error, you could also check the index:
arr_len = len(a) - 1
for i, val in enumerate(a):
    if i == arr_len:
        a[i] = (val[0], "na")
        break
    a[i] = (val[0], a[i+1][1])
Another way using zip:
a = [('201001', '-4'), ('201002', '2'), ('201003', '6')]
output = [(x, y) for (x, _), (_, y) in zip(a, [*a[1:], (None, 'na')])]
print(output) # [('201001', '2'), ('201002', '6'), ('201003', 'na')]
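The same idea can be wrapped in a small helper that returns a new list instead of modifying the input. This is a sketch only; the function name shift_values_up and the fill parameter are hypothetical and do not appear in the answers above.

```python
def shift_values_up(pairs, fill='na'):
    """Keep every key in place, move each value up one slot, pad the end."""
    keys = [k for k, _ in pairs]
    # Drop the first value and append the filler so the lengths still match.
    values = [v for _, v in pairs][1:] + [fill]
    return list(zip(keys, values))

a = [('201001', '-4'), ('201002', '2'), ('201003', '6')]
print(shift_values_up(a))
# [('201001', '2'), ('201002', '6'), ('201003', 'na')]
```

Because the helper builds a new list, the original a is left unchanged.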
Here's a different way that lets you choose where you want to delete your number:
a = [('201001', '-4'), ('201002', '2'), ('201003', '6'), ('201004', '8'), ('201005', '3')]
def delete_item(position, arr):  # position starting at 0, [(0,0), (1,1), (2,2), etc]
    newNums = ([arr[x][1] for x in range(0, position)]
               + [arr[x][1] for x in range(position+1, len(arr))]
               + ['na'])
    arr = [(arr[x][0], y) for x, y in zip(range(len(arr)), newNums)]
    return arr
newTuple = delete_item(3, a)
Output:
[('201001', '-4'),
('201002', '2'),
('201003', '6'),
('201004', '3'),
('201005', 'na')]
You can then pass the resulting list back in to remove another value at a new position:
newTuple = delete_item(1, newTuple)
Output:
[('201001', '-4'),
('201002', '6'),
('201003', '3'),
('201004', 'na'),
('201005', 'na')]
Tuples are indexed the same way as lists, so a[0][-1] is the last element of the first tuple.
As tuples are immutable, you would be creating a new tuple every time you want to edit one.
One answer to your question would be to iterate through the values and update them like in the other answers.
A simpler way could be to first convert it into two lists, update and zip them back, like so:
from itertools import zip_longest
fst = list(map(lambda x: x[0], a))
snd = list(map(lambda x: x[1], a))
snd.pop(0)
a = list(zip_longest(fst, snd, fillvalue='na'))
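The remarks about indexing and immutability at the start of this answer can be checked with a short sketch. Nothing here is specific to the question's data beyond the sample list.

```python
a = [('201001', '-4'), ('201002', '2'), ('201003', '6')]

print(a[0][-1])  # -4, the last element of the first tuple

try:
    a[0][1] = '2'  # tuples do not support item assignment
except TypeError as exc:
    print("cannot modify in place:", exc)

# Build a new tuple and rebind the list slot instead.
a[0] = (a[0][0], '2')
print(a[0])  # ('201001', '2')
```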

Split a sentence and group each value by key

I have input data in the format below, which I'm trying to split into key-value pairs:
Input:
"SQL",1,2,3,4,5
"ORACLE",2,5,6,7
Intended data to write into RDD:
SQL,1
SQL,2
SQL,3
SQL,4
SQL,5
ORACLE,2
ORACLE,5
ORACLE,6
ORACLE,7
I'm trying to create key-value pairs using the code below, which does not work:
data_rdd = f.zipWithIndex() \
.map(lambda row: (row[0].replace('"', '').split(',')[0], (dst for dst in row[1:len(row[0])]))) \
.aggregateByKey([], lambda a, b: a + [b], lambda a, b: a + b)
Input data:
inp = '''"SQL",1,2,3,4,5
"ORACLE",2,5,6,7'''
Code:
res = []
for line in inp.splitlines():
    values = line.split(',')
    key = values[0].replace('"', '')
    res.extend((key, v) for v in values[1:])
print(res)
Note: values[1:] creates a copy of values without the first element, which is how the key is skipped.
You can also skip the first element by accessing the elements of values by index:
res = []
for line in inp.splitlines():
    values = line.split(',')
    key = values[0].replace('"', '')
    res.extend((key, values[i]) for i in range(1, len(values)))
print(res)
Output:
[('SQL', '1'), ('SQL', '2'), ('SQL', '3'), ('SQL', '4'), ('SQL', '5'), ('ORACLE', '2'), ('ORACLE', '5'), ('ORACLE', '6'), ('ORACLE', '7')]
If you want to collect them into a list of strings in the format you provided, just replace
res.extend((key, v) for v in values[1:])
with
res.extend('{},{}'.format(key, v) for v in values[1:])
Use flatMap():
data_rdd.flatMap(lambda row: [
    (k, v) for k, vs in [row.replace('"','').split(',', 1)] for v in vs.split(',')
]).collect()
#[('SQL', '1'),
# ('SQL', '2'),
# ('SQL', '3'),
# ('SQL', '4'),
# ('SQL', '5'),
# ('ORACLE', '2'),
# ('ORACLE', '5'),
# ('ORACLE', '6'),
# ('ORACLE', '7')]
Where:
[row.replace('"','').split(',', 1)] strips the quotes and splits a row like "SQL",1,2,3,4,5 once, giving the two elements SQL and 1,2,3,4,5; wrapping the result in a list lets the comprehension unpack them into k and vs
vs.split(',') then splits the second item into a new list
the list of (k, v) tuples produced by the comprehension is then flattened by flatMap()
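The three steps above can be tried without a Spark cluster. The following is a plain-Python sketch, not PySpark code: itertools.chain.from_iterable stands in for the flattening that flatMap() performs, and the names lines, explode and pairs are assumptions made for this illustration.

```python
from itertools import chain

lines = ['"SQL",1,2,3,4,5', '"ORACLE",2,5,6,7']

def explode(row):
    # Split off the key once, then pair it with every remaining value.
    k, vs = row.replace('"', '').split(',', 1)
    return [(k, v) for v in vs.split(',')]

# Map each row to a list of pairs, then flatten one level.
pairs = list(chain.from_iterable(explode(row) for row in lines))
print(pairs)
```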

Python Prepend a list of dicts

I have this list:
[('1', '1')]
and I want to prepend the list with a dict object to look like:
[('All', 'All'), ('1', '1')]
I'm trying:
myList[:0] = dict({'All': 'All'})
But that gives me:
['All', ('1', '1')]
What am I doing wrong?
Use items() of dictionary to get key, value and prepend them to list:
lst = [('1', '1')]
lst = list({'All': 'All'}.items()) + lst
print(lst)
# [('All', 'All'), ('1', '1')]
Note: {'All': 'All'} is a dictionary itself, so dict({'All': 'All'}) in your code is unnecessary.
When you use a dict as an iterable, you only iterate over its keys. If you instead want to iterate over its key/value pairs, you have to use the dict.items() view.
l = [('1', '1')]
d = dict({'All': 'All'})
print([*d.items(), *l])
# [('All', 'All'), ('1', '1')]
The * syntax is available in Python 3.5 and later.
l[:0] = d.items()
also works
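A short sketch of the difference, assuming Python 3. The names keys_first and pairs_first are introduced here only to keep the two results apart.

```python
d = {'All': 'All'}
lst = [('1', '1')]

print(list(d))          # ['All'], iterating a dict yields its keys only
print(list(d.items()))  # [('All', 'All')]

keys_first = list(lst)
keys_first[:0] = d      # slice assignment iterates d, so only the key is inserted
print(keys_first)       # ['All', ('1', '1')]

pairs_first = list(lst)
pairs_first[:0] = d.items()
print(pairs_first)      # [('All', 'All'), ('1', '1')]
```

The first result reproduces the output from the question; the second is the desired one.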
You can also have a look at the example below.
>>> myList = [('1', '1')]
>>>
>>> myList[:0] = dict({'All': 'All'}).items()
>>> myList
[('All', 'All'), ('1', '1')]
>>>
For an answer like [('All', 'All'), ('1', '1')], do:
myList = [('1', '1')]
myList = [('All', 'All')] + myList
For more, reference this.
You can use the function below to add the items of any dict in front of an existing list. Just pass it the new dict and the list you already have.
def append_dict_to_list(new_dict, old_list):
    list_to_append = list(new_dict.items())
    new_list = list_to_append + old_list
    return new_list
print(append_dict_to_list({'All': 'All'}, [('1', '1')]))
P.S.: If you want the new dict's items to come after the existing list, just change the order in the code to new_list = old_list + list_to_append.

Iterating over a Python 2D list to find the value

I am trying to iterate over a Python 2D list. As the algorithm iterates over the list, it will add the key to a new list until a new value is detected. An operation is then applied to the list and then the list is emptied so that it can be used again as follows:
original_list = [('4', 'a'), ('3', 'a'), ('2', 'a'), ('1', 'b'), ('6', 'b')]
When the original_list is read by the algorithm it should evaluate the second value of each object and decipher if it is different from the previous value; if not, add it to a temporary list.
Here is the pseudocode:
temp_list = []
new_value = original_list[0][1] #find the first value
for key, value in original_list:
    if value != new_value:
        temp_list.append(new_value)
Should output
temp_list = ['4', '3', '2']
temp_list = []
prev_value = original_list[0][1]
for key, value in original_list:
    if value == prev_value:
        temp_list.append(key)
    else:
        do_something(temp_list)
        print temp_list
        temp_list = [key]
        prev_value = value
do_something(temp_list)
print temp_list
# prints ['4', '3', '2']
# prints ['1', '6']
Not entirely sure what you are asking, but I think itertools.groupby could help:
>>> from itertools import groupby
>>> original_list = [('4', 'a'), ('3', 'a'), ('2', 'a'), ('1', 'b'), ('6', 'b')]
>>> [(zip(*group)[0], k) for k, group in groupby(original_list, key=lambda x: x[1])]
[(('4', '3', '2'), 'a'), (('1', '6'), 'b')]
What this does: It groups the items in the list by their value with key=lambda x: x[1] and gets tuples of keys corresponding to one value with (zip(*group)[0], k).
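The session above relies on zip() returning a list, which is Python 2 behaviour. In Python 3, zip() returns an iterator that cannot be indexed, so here is a sketch of an equivalent that collects the keys of each group with a generator expression instead. The name grouped is introduced here for illustration.

```python
from itertools import groupby

original_list = [('4', 'a'), ('3', 'a'), ('2', 'a'), ('1', 'b'), ('6', 'b')]

grouped = [
    (tuple(key for key, _ in group), value)  # keys of one consecutive run
    for value, group in groupby(original_list, key=lambda x: x[1])
]
print(grouped)
# [(('4', '3', '2'), 'a'), (('1', '6'), 'b')]
```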
In case your "keys" do not repeat themselves, you could just use a defaultdict to "sort" the values based on keys, then extract what you need:
from collections import defaultdict
ddict = defaultdict(list)
for v1, v2 in original_list:
    ddict[v2].append(v1)
The values of ddict are now the temp_list groups:
>>> ddict["a"]
['4', '3', '2']
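One difference between the two approaches is worth checking before choosing one: a defaultdict merges every occurrence of a value, while groupby only groups consecutive runs. The sketch below uses a made-up input (data) in which 'a' appears in two separate runs.

```python
from collections import defaultdict
from itertools import groupby

data = [('4', 'a'), ('1', 'b'), ('3', 'a')]  # hypothetical input, 'a' is split

ddict = defaultdict(list)
for v1, v2 in data:
    ddict[v2].append(v1)
print(dict(ddict))  # {'a': ['4', '3'], 'b': ['1']}

# groupby starts a new group every time the key changes.
runs = [(k, [v1 for v1, _ in g]) for k, g in groupby(data, key=lambda x: x[1])]
print(runs)  # [('a', ['4']), ('b', ['1']), ('a', ['3'])]
```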

Extract information from defaultdict

I have a defaultdict that contains the calculated average position of each number (for an Euler problem):
[('1', 0.6923076923076923), ('0', 2.0), ('3', 0.2222222222222222),
('2', 1.0909090909090908), ('7', 0.0), ('6', 0.875),
('9', 1.6923076923076923),('8', 1.3333333333333333)]
I'm trying to get this information into a simple string instead of doing it manually from 0 - 2.
The end result I'm looking for is something like
73162890
I don't know of any good way to extract them without using many if-else statements and for-loops.
Is there any simple and good way of doing this in python?
If your dict is d, then items = d.items() gives you a list of pairs, like you have. Once you have this list, you can sort it by the second element:
ordered = sorted(items, key=lambda (_, value): value) # Not Python 3
# or,
ordered = sorted(items, key=lambda x: x[1])
# or,
import operator
ordered = sorted(items, key=operator.itemgetter(1))
Once we have the list in sorted order, we just need to extract the strings from each one, and glue them all together:
result = ''.join(string for (string, _) in ordered)
(Note that I'm calling unused parameters _, there's nothing special about the _ in a Python program.)
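Put together, the sort and the join can be run end to end as below. This is a sketch in Python 3 syntax using the data from the question stored in a plain dict; the one-line variant at the end (named compact here) is an alternative I am adding, not part of the answer above.

```python
from operator import itemgetter

d = {'1': 0.6923076923076923, '0': 2.0, '3': 0.2222222222222222,
     '2': 1.0909090909090908, '7': 0.0, '6': 0.875,
     '9': 1.6923076923076923, '8': 1.3333333333333333}

# Sort the (key, value) pairs by value, then glue the keys together.
ordered = sorted(d.items(), key=itemgetter(1))
result = ''.join(key for key, _ in ordered)
print(result)  # 73162890

# Alternative: sort the keys directly, using each key's value as the sort key.
compact = ''.join(sorted(d, key=d.get))
print(compact)  # 73162890
```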
In [36]: ''.join([key for key, val in sorted(data, key = lambda item: item[1])])
Out[36]: '73162890'
Explanation:
This sorts the data according to the second value for each item in data.
In [37]: sorted(data, key = lambda item: item[1])
Out[37]:
[('7', 0.0),
('3', 0.2222222222222222),
('1', 0.6923076923076923),
('6', 0.875),
('2', 1.0909090909090908),
('8', 1.3333333333333333),
('9', 1.6923076923076923),
('0', 2.0)]
Now we can collect the first value in each item using a list comprehension:
In [38]: [key for key, val in sorted(data, key = lambda item: item[1])]
Out[38]: ['7', '3', '1', '6', '2', '8', '9', '0']
And join these items into a string using ''.join:
In [39]: ''.join([key for key, val in sorted(data, key = lambda item: item[1])])
Out[39]: '73162890'
