Extract lists from a list in dictionary - python

Suppose I have this dictionary:
self.dict = {'A':[[10, 20],[23,76,76],[23,655,54]], 'B':[30, 40, 50], 'C':[60, 100]}
Here the value for key 'A' is a list of lists. I want to get only the first two lists of 'A', i.e. [10, 20] and [23, 76, 76]. I tried looping, but it does not work well:
class T(object):
    def __init__(self):
        self.dict = {'A':[[10, 20],[23,76,76],[23,655,54]], 'B':[30, 40, 50], 'C':[60, 100]}
    def output(self):
        for i in self.dict:
            for j in self.dict[i]:
                first_two_lists = j
                print("%s" % (first_two_lists))
if __name__ == '__main__':
    T().output()
How can I get that?

>>> d = {'A':[[10, 20],[23,76,76],[23,655,54]], 'B':[30, 40, 50], 'C':[60, 100]}
>>> d['A'][:2]
[[10, 20], [23, 76, 76]]

Using list slicing:
>>> d = {'A':[[10, 20],[23,76,76],[23,655,54]], 'B':[30, 40, 50], 'C':[60, 100]}
>>> d.get('A')[:2]
[[10, 20], [23, 76, 76]]
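To tie the slicing answers back to the class in the question, here is a minimal sketch. The method name `first_two` and the `[]` default for a missing key are assumptions added for illustration; they are not part of the original code.

```python
class T(object):
    def __init__(self):
        self.dict = {'A': [[10, 20], [23, 76, 76], [23, 655, 54]],
                     'B': [30, 40, 50],
                     'C': [60, 100]}

    def first_two(self, key='A'):
        # Slicing never raises IndexError, even if the list has fewer than two items.
        # The [] default (an assumption) avoids a TypeError when the key is missing.
        return self.dict.get(key, [])[:2]


if __name__ == '__main__':
    print(T().first_two())     # [[10, 20], [23, 76, 76]]
    print(T().first_two('Z'))  # []
```

Without the default, `d.get('A')[:2]` behaves like `d['A'][:2]` for existing keys, but fails with a TypeError instead of a KeyError when the key is absent.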

recreate multi-token strings from tokens given indices and text source

I'm preparing a script that reconstitutes multi-token strings from a tokenized text for tokens that have specific labels. My tokens are associated with their start and end indices in the original text.
This is an example piece of text:
t = "Breakfast at Tiffany's is a novella by Truman Capote."
The tokens data structure containing the original text indices and labels:
[(['Breakfast', 0, 9], 'BOOK'),
(['at', 10, 12], 'BOOK'),
(['Tiffany', 13, 20], 'BOOK'),
(["'", 20, 21], 'BOOK'),
(['s', 21, 22], 'BOOK'),
(['is', 23, 25], 'O'),
(['a', 26, 27], 'O'),
(['novella', 28, 35], 'O'),
(['by', 36, 38], 'O'),
(['Truman', 39, 45], 'PER'),
(['Capote', 46, 52], 'PER'),
(['.', 52, 53], 'O')]
This data structure was generated from t as follows
import re
tokens = [[m.group(0), m.start(), m.end()] for m in re.finditer(r"\w+|[^\w\s]", t, re.UNICODE)]
tags = ['BOOK', 'BOOK', 'BOOK', 'BOOK', 'BOOK', 'O', 'O', 'O', 'O', 'PER', 'PER', 'O']
token_tuples = list(zip(tokens, tags))
What I would like my script to do is to iterate through token_tuples and if it encounters a non-O token, it breaks off from the main iteration and reconstitutes the tagged multi-token span until it hits the nearest token with O.
This is the current script:
for i in range(len(token_tuples)):
    if token_tuples[i][1] != 'O':
        tag = token_tuples[i][1]
        start_ix = token_tuples[i][0][1]
        slider = i+1
        while slider < len(token_tuples):
            if tag != token_tuples[slider][1]:
                end_ix = token_tuples[slider][0][2]
                print((t[start_ix:end_ix], tag))
                break
            else:
                slider+=1
This prints:
("Breakfast at Tiffany's is", 'BOOK')
("at Tiffany's is", 'BOOK')
("Tiffany's is", 'BOOK')
("'s is", 'BOOK')
('s is', 'BOOK')
('Truman Capote.', 'PER')
('Capote.', 'PER')
What needs to be modified so that the output for this example is:
> ("Breakfast at Tiffany's", "BOOK")
> ("Truman Capote", "PER")
Here's one solution. If you can come up with something less long-winded, I'd be happy to choose your answer instead!
def extract_entities(t, token_tuples):
    entities = []
    tag = ''
    for i in range(len(token_tuples)):
        if token_tuples[i][1] != 'O':
            # A new span starts whenever the tag changes
            if token_tuples[i][1] != tag:
                tag = token_tuples[i][1]
                start_ix = token_tuples[i][0][1]
            # Close the span at the last token or when the next tag differs
            if i+1 == len(token_tuples) or tag != token_tuples[i+1][1]:
                end_ix = token_tuples[i][0][2]
                entities.append((t[start_ix:end_ix], tag))
                tag = ''
    return entities
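A shorter variant is possible with itertools.groupby, which groups consecutive items that share a key. This is only a sketch under the assumption that every run of identical non-O tags forms one entity; the function name `extract_entities_grouped` is made up for illustration.

```python
import re
from itertools import groupby


def extract_entities_grouped(text, token_tuples):
    entities = []
    # groupby yields one group per run of consecutive tokens with the same tag
    for tag, group in groupby(token_tuples, key=lambda pair: pair[1]):
        if tag == 'O':
            continue
        group = list(group)
        start_ix = group[0][0][1]   # start index of the first token in the run
        end_ix = group[-1][0][2]    # end index of the last token in the run
        entities.append((text[start_ix:end_ix], tag))
    return entities


t = "Breakfast at Tiffany's is a novella by Truman Capote."
tokens = [[m.group(0), m.start(), m.end()]
          for m in re.finditer(r"\w+|[^\w\s]", t, re.UNICODE)]
tags = ['BOOK', 'BOOK', 'BOOK', 'BOOK', 'BOOK', 'O', 'O', 'O', 'O', 'PER', 'PER', 'O']
print(extract_entities_grouped(t, list(zip(tokens, tags))))
# [("Breakfast at Tiffany's", 'BOOK'), ('Truman Capote', 'PER')]
```

Because the span is sliced from the original text, the spacing and punctuation inside each entity are preserved exactly.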

inherit a function from one class to another in python?

I have a class with two methods that each build a DataFrame, plus a third method that returns both.
import pandas as pd
class MyClass(object):
    def df1(self):
        raw_data = {'preTestScore': [4, 24, '', 2, 3],'postTestScore': [25, 94, 57, 62, 70]}
        df1 = pd.DataFrame(raw_data, columns = ['preTestScore', 'postTestScore'])
        return df1
    def df2(self):
        raw_data = {'preTestScore': [14, 4, 15, 12, 13],'postTestScore': ['', 4, 7, 2, 7]}
        df2 = pd.DataFrame(raw_data, columns = ['preTestScore', 'postTestScore'])
        return df2
    def df1_and_df2(self):
        return (self.df1(), self.df2())
How can I inherit those two DataFrames in another class?
class MySecond():
    # call df1 and df2 from MyClass
    # work on those two dfs
Inside the class you could do
obj = MyClass()
obj.df1()
obj.df2()
You could also have MySecond inherit MyClass by doing
class MySecond(MyClass):
    #code
Your MyClass.df1 and MyClass.df2 methods do not depend on a MyClass object, meaning that they could be declared as static methods:
class MyClass(object):
    @staticmethod
    def df1():
        raw_data = {'preTestScore': [4, 24, '', 2, 3],'postTestScore': [25, 94, 57, 62, 70]}
        df1 = pd.DataFrame(raw_data, columns = ['preTestScore', 'postTestScore'])
        return df1
    @staticmethod
    def df2():
        raw_data = {'preTestScore': [14, 4, 15, 12, 13],'postTestScore': ['', 4, 7, 2, 7]}
        df2 = pd.DataFrame(raw_data, columns = ['preTestScore', 'postTestScore'])
        return df2
From there they could be called directly from the class anywhere without requiring an instantiated object:
MyClass.df1()
MyClass.df2()
EDIT: To inherit from MyClass, you would do the following:
class MySecond(MyClass):
    pass
Now an instance of MySecond inherits the methods of MyClass, e.g.:
instance = MySecond()
instance.df1_and_df2()
More info about inheritance: https://docs.python.org/3/tutorial/classes.html#inheritance
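To show how a subclass can actually work on the inherited results, here is a small sketch. It deliberately uses plain lists as stand-ins for the DataFrames so that it runs without pandas; the values and the method name `combined_total` are assumptions made up for illustration.

```python
class MyClass(object):
    def df1(self):
        # Plain list standing in for the first DataFrame (assumption for this sketch)
        return [4, 24, 2, 3]

    def df2(self):
        # Plain list standing in for the second DataFrame (assumption for this sketch)
        return [14, 4, 15, 12]

    def df1_and_df2(self):
        return (self.df1(), self.df2())


class MySecond(MyClass):
    def combined_total(self):
        # Inherited methods are reached through self, like any other method
        first, second = self.df1_and_df2()
        return sum(first) + sum(second)


print(MySecond().combined_total())  # 78
```

With real DataFrames the pattern is the same: the subclass calls `self.df1()` and `self.df2()` and then applies whatever pandas operations it needs.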

How to handle no unique mode; found 2 equally common values in below function

Here is the snippet of my function used to find the mode.
import statistics
import numpy
unique = numpy.unique(aggr_group)
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
    x[i] = statistics.mode(val[aggr_group==unique[i]])
How can I handle the "no unique mode; found 2 equally common values" situation here? I tried multiple options, but none of them worked. Can anyone suggest something?
The following code groups the data by column 1 and finds the first mode of column 0 in each group:
import numpy as np
from collections import Counter
data = np.array([[115, 26925055],
                 [115, 26925055],
                 [115, 26925055],
                 [115, 26925055],
                 [114, 26925055],
                 [115, 26925055],
                 [115, 26925055],
                 [114, 26925055],
                 [115, 26925055],
                 [84, 25471149],
                 [111, 25471149],
                 [84, 25471149],
                 [84, 25471149],
                 [84, 25471149],
                 [84, 25471149],
                 [111, 25471149],
                 [111, 25471149],
                 [111, 25471149],
                 [84, 25471149]])
# ans = [[115, 26925055], [84, 25471149]]
def get_first_mode(a):
    c = Counter(a)
    mode_count = max(c.values())
    mode = {key for key, count in c.items() if count == mode_count}
    first_mode = next(x for x in a if x in mode)
    return first_mode
col0 = data[:, 0]
col1 = data[:, 1]
ans = []
for v in np.unique(col1):
    mode = get_first_mode(col0[col1 == v])
    print((mode, v))
    ans.append((mode, v))
Here is another function for getting all modes
def get_all_modes(a):
    c = Counter(a)
    mode_count = max(c.values())
    mode = {key for key, count in c.items() if count == mode_count}
    return mode
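On Python 3.8 or newer the standard library already covers both cases: statistics.multimode returns every most-common value, and statistics.mode returns the first mode encountered instead of raising. A minimal sketch, where the helper name `first_mode` and the sample values are made up for illustration:

```python
from statistics import multimode


def first_mode(values):
    # multimode returns all most-common values in order of first appearance
    return multimode(values)[0]


sample = [84, 111, 84, 111, 7]
print(multimode(sample))   # [84, 111]
print(first_mode(sample))  # 84
```

On older Python versions the Counter-based functions above remain the portable option.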

How do I use threads on a generator (multiple threads per item) while keeping the order?

I have code that mimics a REST API call (see below).
For every key in each item the generator yields, it needs to run a REST call. So in my example, a record could be
{"a": 2, "b": 36, "c": 77}
I need to run a REST call for every key (a, b, and c) individually, then output the results (which just negates the number):
{"a": 2, "a_neg": -2, "b": 36, "b_neg": -36, "c": 77, "c_neg": -77}
Right now my code works for one key, but with multiple keys it repeats the items (so I'm getting triple the results for 3 keys).
There is also some funky race condition. I guess I could keep only the last record, but I'm not good with threads and am concerned about thread safety or other advanced stuff.
Here is an example output:
{'a': 89, 'a_neg': -89, 'b': 69, 'c': 38}
{'a': 89, 'a_neg': -89, 'b': 69, 'c': 38, 'c_neg': -38}
{'a': 89, 'a_neg': -89, 'b': 69, 'b_neg': -69, 'c': 38, 'c_neg': -38}
{'a': 90, 'a_neg': -90, 'b': 43, 'c': 16}
{'a': 90, 'a_neg': -90, 'b': 43, 'c': 16, 'c_neg': -16}
{'a': 90, 'a_neg': -90, 'b': 43, 'b_neg': -43, 'c': 16, 'c_neg': -16}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
{'a': 91, 'a_neg': -91, 'b': 49, 'b_neg': -49, 'c': 77, 'c_neg': -77}
Finally here is my source code (you can run it yourself):
#!/usr/bin/env python
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from pprint import pprint
import random
def records():
    # simulates records generator
    for i in range(100):
        yield {"a": i, "b": random.randint(0,100), "c": random.randint(0,100)}
def stream(records):
    threads = 8
    pool = ThreadPoolExecutor(threads)
    def rest_api_lookup(record_dict):
        # simulates REST call :)
        sleep(0.1)
        key = record_dict["key"]
        record = record_dict["record"]
        record[key + "_neg"] = -record[key]
        return record
    def thread(records):
        chunk = []
        for record in records:
            for key in record:
                chunk.append(pool.submit(rest_api_lookup, {"record": record, "key": key}))
            if len(chunk) == threads:
                yield chunk
                chunk = []
        if chunk:
            yield chunk
    def unchunk(chunk_gen):
        """Flattens a generator of Future chunks into a generator of Future results."""
        for chunk in chunk_gen:
            for f in chunk:
                yield f.result() # get result from Future
    # Now iterate over all results in same order as records
    for result in unchunk(thread(records)):
        #yield result
        pprint(result)
stream(records())
The first issue is that you're looping over the keys of a record that grows while the worker threads add keys to it:
for key in list(record): # make a copy of the keys!
I think the second issue is that you have 3 keys and 8 threads. len(chunk) will be 3, 6, 9, ... while threads is 8, so the following condition is never met:
if len(chunk) == threads: # try len(chunk) >= threads
    yield chunk
    chunk = []
The last issue is that you yield incomplete records before all threads have finished. Here is a possible fix:
def unchunk(chunk_gen):
    """Flattens a generator of Future chunks into a generator of Future results."""
    for chunk in chunk_gen:
        old_res = None
        for f in chunk:
            res = f.result() # get result from Future
            if old_res and res is not old_res:
                yield old_res
            old_res = res
        if old_res:
            yield old_res
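An alternative that sidesteps the shared-dict mutation entirely is to let each worker return only its value and to assemble each record in the main thread. The sketch below is one possible arrangement, not the original poster's design: `negate_lookup`, `finish`, and the `window` parameter are hypothetical names introduced for illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def negate_lookup(value):
    # Hypothetical stand-in for the REST call: it only returns a value
    # and never touches the shared record.
    sleep(0.01)
    return -value


def finish(record, futures):
    # Build a fresh dict once every lookup for this record is done.
    enriched = dict(record)
    for key, future in futures.items():
        enriched[key + "_neg"] = future.result()
    return enriched


def stream(records, max_workers=8, window=4):
    # window (an assumed tuning knob) caps how many records are in flight.
    with ThreadPoolExecutor(max_workers) as pool:
        pending = deque()
        for record in records:
            futures = {key: pool.submit(negate_lookup, value)
                       for key, value in record.items()}
            pending.append((record, futures))
            if len(pending) >= window:
                yield finish(*pending.popleft())
        # Drain whatever is still in flight, oldest first.
        while pending:
            yield finish(*pending.popleft())


if __name__ == "__main__":
    sample = ({"a": i, "b": i * 2, "c": i * 3} for i in range(5))
    for result in stream(sample):
        print(result)
```

Records come out in input order because the deque is consumed from the left, and each record is yielded exactly once, after all of its keys have been looked up.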

Turning list of lists into objects

I currently have a list of lists where every list consists of the same kind of information, say:
[['Planet Name', 16, 19, 27, 11], ['Planet Name 2', 12, 22, 11, 42], ....]
and I would like to use a class to make this into a list of objects with the same information, where index 0 is self.name, index 1 is self.distance and so on for every separate list.
I know that I need to use some kind of a for loop, but have no idea how to go about and do this.
I would really appreciate some help, trying to learn Python and currently classes!
You can use namedtuple like this to create a lightweight class from a list of field names. The *item in this code unpacks each inner list into positional arguments.
from collections import namedtuple
Planet = namedtuple("Planet", ["name", "distance", "a", "b", "c"])
data = [['Planet Name', 16, 19, 27, 11],['Planet Name 2', 12, 22, 11, 42]]
for item in data:
    planet = Planet(*item)
    print(planet.name, planet.distance, planet)
Output
Planet Name 16 Planet(name='Planet Name', distance=16, a=19, b=27, c=11)
Planet Name 2 12 Planet(name='Planet Name 2', distance=12, a=22, b=11, c=42)
Note: namedtuple is a subclass of tuple. So, all the objects created with namedtuple are immutable. It means that, once the object is created, data in the member variables cannot be changed.
Well... To make a class like you want you can do something like this:
class Planet(object):
    def __init__(self, *args, **kwargs):
        self.name = args[0]
        self.distance = args[1]
        # ... etc ...
Or something like this:
class Planet(object):
    def __init__(self, name, distance, ...):
        self.name = name
        self.distance = distance
        # ... etc ...
And then you call it like this:
p = Planet(*['Planet Name', 16, 19, 27, 11])
In a loop that would be:
l = [['Planet Name', 16, 19, 27, 11], ['Planet Name 2', 12, 22, 11, 42], ....]
planets = [Planet(*data) for data in l]
I'm confused. Have you created the Planet constructor yet?
The code would be something like:
class Planet(object):
    def __init__(self, ....):
        ....
planets = [['Planet Name', 16, 19, 27, 11], ['Planet Name 2', 12, 22, 11, 42]....]
planet_list = [Planet(*p) for p in planets]
If you don't want a constructor (__init__) that knows about the specifics of your lists, you could do it like this:
lists = [['Planet Name', 16, 19, 27, 11], ['Planet Name 2', 12, 22, 11, 42]]
class Planet(object):
    pass
for l in lists:
    planet = Planet()
    setattr(planet, 'name', l[0])
    setattr(planet, 'distance', l[1])
    setattr(planet, 'size', l[2])
    print(planet.name, planet.distance, planet.size)
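If mutable objects are wanted with less boilerplate than a hand-written __init__, a dataclass (Python 3.7+) is another option. This is only a sketch: the question names just the first two fields, `size` is borrowed from the setattr answer, and `mass` and `moons` are hypothetical names for the remaining columns.

```python
from dataclasses import dataclass


@dataclass
class Planet:
    name: str
    distance: int
    size: int
    mass: int    # hypothetical field name
    moons: int   # hypothetical field name


rows = [['Planet Name', 16, 19, 27, 11], ['Planet Name 2', 12, 22, 11, 42]]
planets = [Planet(*row) for row in rows]

# Unlike a namedtuple, the attributes can be reassigned after creation.
planets[0].distance = 17
print(planets[0])
```

The generated __init__ takes the fields in declaration order, so `Planet(*row)` works the same way as in the answers above.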
