I have an example of an annotation file:
{'text': "BELGIE BELGIQUE BELGIEN\nIDENTITEITSKAART CARTE D'IDENTITE PERSONALAUSWEIS\nBELGIUM\nIDENTITY CARD\nNaam / Name\nDermrive\nVoornamen / Given names\nBrando Jerom L\nGeslacht / Nationaliteit /\nGeboortedatum /\nSex\nNationality\nDate of birth\nM/M\nBEL\n19 05 1982\nRijksregisternr. 7 National Register Nº\n85.08.23-562.77\nKaartnr. / Card Nº\n752-0465474-34\nVervalt op / Expires on\n23 07 2025\n", 'spans': [{'start': 24, 'end': 40, 'token_start': 16, 'token_end': 16, 'label': 'CardType'}, {'start': 41, 'end': 57, 'token_start': 16, 'token_end': 16, 'label': 'CardType'}, {'start': 58, 'end': 73, 'token_start': 15, 'token_end': 15, 'label': 'CardType'}, {'start': 108, 'end': 116, 'token_start': 8, 'token_end': 8, 'label': 'LastName'}, {'start': 141, 'end': 155, 'token_start': 14, 'token_end': 14, 'label': 'FirstName'}, {'start': 229, 'end': 232, 'token_start': 3, 'token_end': 3, 'label': 'Gender_nid'}, {'start': 233, 'end': 236, 'token_start': 3, 'token_end': 3, 'label': 'Nationality_nid'}, {'start': 237, 'end': 247, 'token_start': 10, 'token_end': 10, 'label': 'DateOfBirth_nid'}, {'start': 288, 'end': 303, 'token_start': 15, 'token_end': 15, 'label': 'Ssn'}, {'start': 323, 'end': 337, 'token_start': 14, 'token_end': 14, 'label': 'CardNumber'}, {'start': 362, 'end': 372, 'token_start': 10, 'token_end': 10, 'label': 'ValidUntil_nid'}]}
I have the start and end position of the "LastName" entity, which in the example is "Dermrive". When I substitute a shorter or longer LastName, for example "Brad", I need to shift all the following spans by the difference in length between the two words, so that the other labels stay in the correct position. It works perfectly with one entity, but when I try to change all of them, the output is messy and the labels are no longer correct.
def replace_text_by_index_and_type(self, new_text, type):
    label_position = self.search_label_position_in_spans(self.annotation['spans'], type.value)
    label = self.annotation['spans'][label_position]
    begin_new_string = self.annotation["text"][:label["start"]]
    end_new_string = self.annotation["text"][label["end"]:]
    new_string = begin_new_string + new_text + end_new_string
    for to_change_ent in self.annotation['spans'][label_position+1:]:
        diff = len(new_text) - (label["end"] - label["start"])
        self.annotation['spans'][label_position]["end"] = self.annotation['spans'][label_position]["end"] + diff
        #print(f"Diff between original {to_change_ent} and new_string: {diff}")
        to_change_ent["start"] += diff
        to_change_ent["end"] += diff
    return new_string
I start shifting entities from the one after the replaced entity, so that the replaced entity keeps its start position, and I add diff to the end position of the replaced entity. As a result, the first name and last name are correct, but the other entities end up shifted to the wrong positions.
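One possible cause, judging only from the code as posted: if the length difference is recomputed after the replaced span's own end has already been adjusted, it becomes 0 and the remaining spans stop shifting. Below is a minimal sketch, not a drop-in fix, that computes the difference once and also stores the new text. The function name replace_span_text and the shortened annotation are hypothetical, the spans are assumed not to overlap, and the token_start/token_end fields are left out.

```python
def replace_span_text(annotation, label, new_text):
    """Replace the text of the first span carrying `label`, shifting later spans."""
    # Assumption: spans do not overlap; sort them so "later" means "further right".
    spans = sorted(annotation["spans"], key=lambda s: s["start"])
    idx = next(i for i, s in enumerate(spans) if s["label"] == label)
    target = spans[idx]

    # Compute the length difference once, before any offset is modified.
    diff = len(new_text) - (target["end"] - target["start"])

    text = annotation["text"]
    annotation["text"] = text[:target["start"]] + new_text + text[target["end"]:]

    target["end"] += diff
    for span in spans[idx + 1:]:
        span["start"] += diff
        span["end"] += diff
    annotation["spans"] = spans
    return annotation


# Shortened, made-up annotation in the same shape as the one in the question.
annotation = {
    "text": "Name\nDermrive\nSex\nM\n",
    "spans": [
        {"start": 5, "end": 13, "label": "LastName"},
        {"start": 18, "end": 19, "label": "Gender_nid"},
    ],
}
replace_span_text(annotation, "LastName", "Brad")
replace_span_text(annotation, "Gender_nid", "Female")

# Each span should still slice out exactly its own entity text.
for span in annotation["spans"]:
    print(span["label"], repr(annotation["text"][span["start"]:span["end"]]))
```

Because the text is updated inside the function, each later replacement works against offsets that already match the current text.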
I am new to Python and I am trying to construct a data structure from existing data.
I have the following:
[
{'UserName': 'aaa', 'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active', 'CreateDate': datetime.datetime(2022, 9, 8, 15, 56, 39, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 12, 30, 59, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 16, 58, 38, tzinfo=tzutc())}
]
I need to get this:
{
"aaa": [
{'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active', 'CreateDate': datetime.datetime(2022, 9, 8, 15, 56, 39, tzinfo=tzutc())}],
"eee": [
{'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 12, 30, 59, tzinfo=tzutc())},
{'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 16, 58, 38, tzinfo=tzutc())}
]
}
I tried the following:
list_per_user = {i['UserName']: copy.deepcopy(i) for i in key_list}
for obj in list_per_user:
    del list_per_user[obj]['UserName']
but the list is missing here, so with two keys per user I end up with only the last one. I don't know how to build the list I need for each user.
Thanks!
Create an external dict that maps username -> list of entries.
import datetime
from dateutil.tz import tzutc

data = [
{'UserName': 'aaa', 'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active', 'CreateDate': datetime.datetime(2022, 9, 8, 15, 56, 39, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 12, 30, 59, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 16, 58, 38, tzinfo=tzutc())}
]
new_data = {}
for entry in data:
    new_data.setdefault(entry["UserName"], []).append(
        {k: v for k, v in entry.items() if k != "UserName"}
    )
print(new_data)
Output (some fields hidden because I don't want to import those libraries in my repl, but they'll be there when you run it)
{'aaa': [{'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active'}],
 'eee': [{'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active'},
         {'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active'}]}
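The same username -> list mapping can also be built with collections.defaultdict, which creates the empty list on first access. This is only an equivalent sketch: the function name group_keys_by_user is made up for illustration, and the sample entries are shortened (no CreateDate field) to keep it self-contained.

```python
from collections import defaultdict


def group_keys_by_user(entries):
    """Group entries by 'UserName', dropping that key from each grouped entry."""
    grouped = defaultdict(list)
    for entry in entries:
        # Copy every field except the one used as the grouping key.
        grouped[entry["UserName"]].append(
            {k: v for k, v in entry.items() if k != "UserName"}
        )
    return dict(grouped)


# Shortened sample data in the same shape as the question's list.
sample = [
    {"UserName": "aaa", "AccessKeyId": "AKIAYWQTISJD6X27YVK", "Status": "Active"},
    {"UserName": "eee", "AccessKeyId": "AKIAYWQTISJD6QXMAKY", "Status": "Active"},
    {"UserName": "eee", "AccessKeyId": "AKIAYWQTISJDUARK6FV", "Status": "Active"},
]
grouped = group_keys_by_user(sample)
print(grouped)
```

The dict comprehension builds a new dict per entry, so the original list is left unmodified and no deepcopy is needed for this step.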
How can I take a string like this
string = "image1 [{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, {'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}, {'box': [0, 81, 80, 149], 'score': 0.7892318964004517, 'label': 'FACE_F'}]"
and turn it into variables like this?
filename = "image1"
box = [35, 0, 112, 36]
score = 0.8010843992233276
label = "FACE_F"
or, if there is more than one box, score, or label:
filename = "image1"
box = [[71, 80, 149, 149], [35, 0, 112, 36], [0, 81, 80, 149]]
score = [0.8010843992233276, 0.8626706004142761, 0.7892318964004517]
label = ["FACE_F", "FACE_F", "FACE_F"]
This is how far I've gotten:
log = open(r'C:\Users\15868\Desktop\python\log.txt', "r")
data = log.readline()
log.close()
print(data)
filename = data.split(" ")[0]
info = data.rsplit(" ")[1]
print(filename)
print(info)
Output:
[{'box':
image1
Here is how I would do it:
import ast
string = "image1 [{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, {'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}, {'box': [0, 81, 80, 149], 'score': 0.7892318964004517, 'label': 'FACE_F'}]"
filename, data = string.split(' ', 1)
data = ast.literal_eval(data)
print(filename)
print(data)
Output:
image1
[{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, {'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}, {'box': [0, 81, 80, 149], 'score': 0.7892318964004517, 'label': 'FACE_F'}]
(updated to follow your example of combining the keys):
From there I'd just write some simple code, something like:
box = []
score = []
label = []
for row in data:
    box.append(row['box'])
    score.append(row['score'])
    label.append(row['label'])
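There are fancier ways to unpack that data, but the loop above is the most straightforward. For example, the following gives tuples instead of lists and relies on every dict having its keys in the same order (a generator is used rather than a set so that the row order is preserved):
box, score, label = zip(*(x.values() for x in data))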
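Putting the pieces together, here is a minimal sketch that goes from one log line to the filename plus one list per key. The function name parse_log_line is hypothetical, the sample line is shortened with rounded scores, and each row is assumed to contain all three keys. Looking values up by key avoids depending on the key order inside each dict.

```python
import ast


def parse_log_line(line):
    """Split 'name [list of dicts]' into the name and one list per key."""
    # Split only on the first space, so the list literal stays in one piece.
    filename, payload = line.strip().split(" ", 1)
    rows = ast.literal_eval(payload)
    columns = {key: [row[key] for row in rows] for key in ("box", "score", "label")}
    return filename, columns


# Shortened sample line; the scores are rounded for readability.
line = ("image1 [{'box': [35, 0, 112, 36], 'score': 0.86, 'label': 'FACE_F'}, "
        "{'box': [71, 80, 149, 149], 'score': 0.8, 'label': 'FACE_F'}]")
filename, columns = parse_log_line(line)
print(filename)
print(columns["box"])
print(columns["score"])
print(columns["label"])
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file.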
I am brand new to Python. I am starting a BS in data analytics in August and am trying to get a head start on learning. Can anyone solve this for me?
from collections import Counter
Counter(one_d)
returns the following
Counter({'Action': 303,
'Adventure': 259,
'Sci-Fi': 120,
'Mystery': 106,
'Horror': 119,
'Thriller': 195,
'Animation': 49,
'Comedy': 279,
'Family': 51,
'Fantasy': 101,
'Drama': 513,
'Music': 16,
'Biography': 81,
'Romance': 141,
'History': 29,
'Crime': 150,
'Western': 7,
'War': 13,
'Musical': 5,
'Sport': 18})
I would like to create a bar plot but am unsure how to do this. Is a bar plot even the best choice for this data?
The pandas library is quite useful for data analytics and visualization:
from collections import Counter
import pandas as pd
counts = Counter({'Action': 303, 'Adventure': 259, 'Sci-Fi': 120, 'Mystery': 106, 'Horror': 119, 'Thriller': 195, 'Animation': 49, 'Comedy': 279, 'Family': 51, 'Fantasy': 101, 'Drama': 513, 'Music': 16, 'Biography': 81, 'Romance': 141, 'History': 29, 'Crime': 150, 'Western': 7, 'War': 13, 'Musical': 5, 'Sport': 18})
data = pd.Series(counts)
ax = data.plot.bar()
ax.set(xlabel='Genre', ylabel='Count', title='Good luck on your BS')
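Bars of category counts are usually easier to compare when they are ordered by height. A small sketch of that ordering step using Counter.most_common, with only a handful of the genres from the question to keep it short:

```python
from collections import Counter

# A subset of the genre counts from the question.
counts = Counter({'Drama': 513, 'Action': 303, 'Comedy': 279, 'Western': 7, 'Musical': 5})

# most_common() returns (genre, count) pairs, largest count first.
ordered = counts.most_common()
genres = [genre for genre, _ in ordered]
totals = [total for _, total in ordered]
print(genres)
print(totals)
```

With the pandas Series built above, the same ordering should be available through data.sort_values(ascending=False) before calling plot.bar().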
What is the difference between studentsDict.values() and studentsDict[key].values() in the following code?
studentsDict = {'Ayush': {'maths': 24, 'english': 19, 'hindi': 97, 'bio': 20, 'science': 0}, 'Pankaj': {'maths': 52, 'english': 76, 'hindi': 68, 'bio': 97, 'science': 66}, 'Raj': {'maths': 85, 'english': 79, 'hindi': 51, 'bio': 36, 'science': 75}, 'iC5z4DK': {'maths': 24, 'english': 92, 'hindi': 31, 'bio': 29, 'science': 91}, 'Zf1WSV6': {'maths': 81, 'english': 58, 'hindi': 85, 'bio': 31, 'science': 7}}
for key in studentsDict.keys():
    for marks in studentsDict[key].values():
        if marks < 33:
            print(key, "FAILED")
            break
studentsDict.keys() gives you each of the keys in the outer dict: "Ayush", "Pankaj", "Raj", "iC5z4DK" and "Zf1WSV6".
studentsDict[key].values() gives you the values for the entry in studentsDict corresponding to key. For example, if key is "Ayush", you would get 24, 19, 97, 20, and 0.
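For comparison, calling .values() on the outer dict gives the inner dicts themselves, not the marks. A short sketch on a cut-down version of the data (two students, two subjects; the variable names are chosen for illustration):

```python
students = {
    "Ayush": {"maths": 24, "english": 19},
    "Pankaj": {"maths": 52, "english": 76},
}

# Outer .values(): one inner dict per student.
outer_values = list(students.values())

# Inner .values(): the marks of a single student.
inner_values = list(students["Ayush"].values())

# items() yields the key and the inner dict together, so no second lookup is needed.
failed = [name for name, marks in students.items()
          if any(mark < 33 for mark in marks.values())]

print(outer_values)
print(inner_values)
print(failed)
```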
I am new to pandas and am trying to write some basic code that builds a data frame. I wrote two rows of the data frame as a test, but it is not working. I do not know whether the problem is with continuing the dictionaries and the list onto a new line or with something else. Do I need to use a backslash when moving to a new line? Any help is appreciated.
Here is the code:
import pandas as pd
data = [{'#':1, 'Name': 'BS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 318, 'HP': 45, 'Attack': 49, 'Defense': 49, 'Sp. Atk': 65, 'Sp. Def': 65, 'Speed': 45, 'Generation': 1, 'Legendary':'false'}, {'#':2, 'Name': 'IS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 405, 'HP': 60, 'Attack': 62, Defense': 63, 'Sp. Atk': 80, 'Sp. Def': 80, 'Speed': 60, 'Generation': 1, 'Legendary':'false'}]
df = pd.DataFrame(data)
Your problem is a syntax error in the 'Defense' key of the second dictionary: its opening apostrophe is missing.
data = [{'#':1, 'Name': 'BS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 318, 'HP': 45, 'Attack': 49,
         'Defense': 49, 'Sp. Atk': 65, 'Sp. Def': 65, 'Speed': 45, 'Generation': 1, 'Legendary':'false'},
        {'#':2, 'Name': 'IS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 405, 'HP': 60, 'Attack': 62,
         'Defense': 63, 'Sp. Atk': 80, 'Sp. Def': 80, 'Speed': 60, 'Generation': 1, 'Legendary':'false'}]
>>> pd.DataFrame(data)
   #  Name  Type 1  type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1    BS   grass  poison    318  45      49       49       65       65     45           1      false
1  2    IS   grass  poison    405  60      62       63       80       80     60           1      false
import pandas as pd
data = [{'#':1, 'Name': 'BS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 318,
'HP': 45, 'Attack': 49, 'Defense': 49, 'Sp. Atk': 65, 'Sp. Def': 65, 'Speed': 45,
'Generation': 1, 'Legendary':'false'}, {'#':2, 'Name': 'IS', 'Type 1': 'grass',
'type 2': 'poison', 'Total': 405, 'HP': 60,
'Attack': 62, 'Defense': 63,
'Sp. Atk': 80, 'Sp. Def': 80,
'Speed': 60, 'Generation': 1, 'Legendary':'false'}]
df = pd.DataFrame(data)
print(df)
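On the backslash question: Python joins lines implicitly inside parentheses, square brackets and curly braces, so a list of dicts can be broken across lines without any backslash. A minimal sketch with a reduced set of columns (the variable name rows is chosen for illustration):

```python
# Line breaks inside [...] and {...} need no backslash.
rows = [
    {
        '#': 1, 'Name': 'BS',
        'Attack': 49, 'Defense': 49,
    },
    {
        '#': 2, 'Name': 'IS',
        'Attack': 62, 'Defense': 63,
    },
]

print(len(rows))
print(rows[1]['Defense'])
```

A list laid out this way can be passed to pd.DataFrame exactly like the single-line version.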