Shifting label numbers by new string - python

Here is an example of my annotation file:
{'text': "BELGIE BELGIQUE BELGIEN\nIDENTITEITSKAART CARTE D'IDENTITE PERSONALAUSWEIS\nBELGIUM\nIDENTITY CARD\nNaam / Name\nDermrive\nVoornamen / Given names\nBrando Jerom L\nGeslacht / Nationaliteit /\nGeboortedatum /\nSex\nNationality\nDate of birth\nM/M\nBEL\n19 05 1982\nRijksregisternr. 7 National Register Nº\n85.08.23-562.77\nKaartnr. / Card Nº\n752-0465474-34\nVervalt op / Expires on\n23 07 2025\n", 'spans': [{'start': 24, 'end': 40, 'token_start': 16, 'token_end': 16, 'label': 'CardType'}, {'start': 41, 'end': 57, 'token_start': 16, 'token_end': 16, 'label': 'CardType'}, {'start': 58, 'end': 73, 'token_start': 15, 'token_end': 15, 'label': 'CardType'}, {'start': 108, 'end': 116, 'token_start': 8, 'token_end': 8, 'label': 'LastName'}, {'start': 141, 'end': 155, 'token_start': 14, 'token_end': 14, 'label': 'FirstName'}, {'start': 229, 'end': 232, 'token_start': 3, 'token_end': 3, 'label': 'Gender_nid'}, {'start': 233, 'end': 236, 'token_start': 3, 'token_end': 3, 'label': 'Nationality_nid'}, {'start': 237, 'end': 247, 'token_start': 10, 'token_end': 10, 'label': 'DateOfBirth_nid'}, {'start': 288, 'end': 303, 'token_start': 15, 'token_end': 15, 'label': 'Ssn'}, {'start': 323, 'end': 337, 'token_start': 14, 'token_end': 14, 'label': 'CardNumber'}, {'start': 362, 'end': 372, 'token_start': 10, 'token_end': 10, 'label': 'ValidUntil_nid'}]}
When I have the start and end positions of the "LastName" entity ("Dermrive" in the example) and replace it with a shorter or longer last name, for example "Brad", I need to shift all the following spans by the difference in length, so that the other labels stay in the correct position. It works perfectly with one entity, but when I try to change all of them, the output is messy and the labels are no longer correct.
def replace_text_by_index_and_type(self, new_text, type):
    label_position = self.search_label_position_in_spans(self.annotation['spans'], type.value)
    label = self.annotation['spans'][label_position]
    begin_new_string = self.annotation["text"][:label["start"]]
    end_new_string = self.annotation["text"][label["end"]:]
    new_string = begin_new_string + new_text + end_new_string
    for to_change_ent in self.annotation['spans'][label_position+1:]:
        diff = len(new_text) - (label["end"] - label["start"])
        self.annotation['spans'][label_position]["end"] = self.annotation['spans'][label_position]["end"] + diff
        #print(f"Diff between original {to_change_ent} and new_string: {diff}")
        to_change_ent["start"] += diff
        to_change_ent["end"] += diff
    return new_string
I shift all entities that come after the replaced one, so that the replaced entity keeps its start position, and I add diff to its end position. As a result, the first name and last name are correct, but the other entities end up shifted to the wrong positions.
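One likely cause, assuming diff and the end update are both computed inside the for loop: after the first iteration the replaced span already has its new end, so diff evaluates to 0 and only the first following span gets shifted. Also, unless the caller stores the returned string, self.annotation["text"] still holds the old text on the next call. A minimal sketch of a fix, written as a standalone function for illustration (the name replace_span_text and the trimmed-down annotation are hypothetical; it assumes the spans are sorted by start and do not overlap, and it leaves token_start/token_end alone):

```python
def replace_span_text(annotation, label, new_text):
    """Replace the text of the first span with `label` and shift later spans."""
    spans = annotation["spans"]
    index = next(i for i, span in enumerate(spans) if span["label"] == label)
    span = spans[index]

    # Compute the length difference once, before any offset is modified.
    diff = len(new_text) - (span["end"] - span["start"])

    # Store the new text so the next replacement slices the updated string.
    text = annotation["text"]
    annotation["text"] = text[:span["start"]] + new_text + text[span["end"]:]

    # The replaced span keeps its start; every later span moves by diff.
    span["end"] += diff
    for later in spans[index + 1:]:
        later["start"] += diff
        later["end"] += diff
    return annotation["text"]


# Hypothetical, shortened annotation in the same shape as the one above.
annotation = {
    "text": "Name\nDermrive\nGiven names\nBrando\nBEL\n",
    "spans": [
        {"start": 5, "end": 13, "label": "LastName"},
        {"start": 26, "end": 32, "label": "FirstName"},
        {"start": 33, "end": 36, "label": "Nationality_nid"},
    ],
}
replace_span_text(annotation, "LastName", "Brad")
replace_span_text(annotation, "FirstName", "Jonathan")
for span in annotation["spans"]:
    print(span["label"], repr(annotation["text"][span["start"]:span["end"]]))
```

Slicing the text with each span after every replacement, as in the final loop, is a cheap way to check that the offsets still line up.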

Related

Python create list of dicts

I am new to Python and I am trying to construct a data structure from existing data.
I have the following:
[
{'UserName': 'aaa', 'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active', 'CreateDate': datetime.datetime(2022, 9, 8, 15, 56, 39, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 12, 30, 59, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 16, 58, 38, tzinfo=tzutc())}
]
I need to get this:
{
    "aaa": [
        {'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active', 'CreateDate': datetime.datetime(2022, 9, 8, 15, 56, 39, tzinfo=tzutc())}],
    "eee": [
        {'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 12, 30, 59, tzinfo=tzutc())},
        {'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 16, 58, 38, tzinfo=tzutc())}
    ]
}
I tried the following:
list_per_user = {i['UserName']: copy.deepcopy(i) for i in key_list}
for obj in list_per_user:
    del list_per_user[obj]['UserName']
but I am missing the list here: with two keys per user, I only get the last one. I don't know how to get the list I need per user.
Thanks!
Create an external dict that maps username -> list of entries.
data = [
{'UserName': 'aaa', 'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active', 'CreateDate': datetime.datetime(2022, 9, 8, 15, 56, 39, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 12, 30, 59, tzinfo=tzutc())},
{'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active', 'CreateDate': datetime.datetime(2023, 1, 24, 16, 58, 38, tzinfo=tzutc())}
]
new_data = {}
for entry in data:
    new_data.setdefault(entry["UserName"], []).append(
        {k: v for k, v in entry.items() if k != "UserName"}
    )
print(new_data)
Output (some fields hidden because I don't want to import those libraries in my repl, but they'll be there when you run it)
{'aaa': [{'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active'}],
 'eee': [{'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active'},
         {'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active'}]}
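For a version that runs without extra libraries and keeps CreateDate, here is a hedged sketch of the same grouping using collections.defaultdict. It assumes datetime.timezone.utc as a stand-in for dateutil's tzutc(), and the name keys_per_user is only illustrative:

```python
from collections import defaultdict
from datetime import datetime, timezone

key_list = [
    {'UserName': 'aaa', 'AccessKeyId': 'AKIAYWQTISJD6X27YVK', 'Status': 'Active',
     'CreateDate': datetime(2022, 9, 8, 15, 56, 39, tzinfo=timezone.utc)},
    {'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJD6QXMAKY', 'Status': 'Active',
     'CreateDate': datetime(2023, 1, 24, 12, 30, 59, tzinfo=timezone.utc)},
    {'UserName': 'eee', 'AccessKeyId': 'AKIAYWQTISJDUARK6FV', 'Status': 'Active',
     'CreateDate': datetime(2023, 1, 24, 16, 58, 38, tzinfo=timezone.utc)},
]

# A missing user name starts out as an empty list automatically.
keys_per_user = defaultdict(list)
for entry in key_list:
    # Build a new dict without 'UserName' so the input is left untouched.
    details = {k: v for k, v in entry.items() if k != 'UserName'}
    keys_per_user[entry['UserName']].append(details)

keys_per_user = dict(keys_per_user)  # back to a plain dict
print(keys_per_user)
```

The dict comprehension makes a shallow copy of each entry, which is enough here because the values are not modified afterwards.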

how to convert a string into an array of variables

How can I take a string like this
string = "image1 [{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, {'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}, {'box': [0, 81, 80, 149], 'score': 0.7892318964004517, 'label': 'FACE_F'}]"
and turn it into variables like this?
filename = "image1"
box = [35, 0, 112, 36]
score = 0.8010843992233276
label = "FACE_F"
or, if there is more than one box, score, or label,
filename = "image1"
box = [[71, 80, 149, 149], [35, 0, 112, 36], [0, 81, 80, 149]]
score = [0.8010843992233276, 0.8626706004142761, 0.7892318964004517]
label = ["FACE_F", "FACE_F", "FACE_F"]
This is how far I've gotten:
log = open(r'C:\Users\15868\Desktop\python\log.txt', "r")
data = log.readline()
log.close()
print(data)
filename = data.split(" ")[0]
info = data.rsplit(" ")[1]
print(filename)
print(info)
output
[{'box':
image1
Here is how I would do it:
import ast
string = "image1 [{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, {'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}, {'box': [0, 81, 80, 149], 'score': 0.7892318964004517, 'label': 'FACE_F'}]"
filename, data = string.split(' ', 1)
data = ast.literal_eval(data)
print(filename)
print(data)
Output:
image1
[{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, {'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}, {'box': [0, 81, 80, 149], 'score': 0.7892318964004517, 'label': 'FACE_F'}]
From there (updated to follow your example of combining the keys), I'd just write some simple code, something like:
box = []
score = []
label = []
for row in data:
    box.append(row['box'])
    score.append(row['score'])
    label.append(row['label'])
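The loop above is the most straightforward way to unpack that data, but there are fancier ones. For example, assuming every row has its keys in the same order (this gives tuples rather than lists, and uses a generator instead of a set so the row order is preserved):
box, score, label = zip(*(x.values() for x in data))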
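Since the original input comes from a log file, the two steps can be wrapped into one helper that is applied to each line. This is only a sketch: the name parse_detection_line is hypothetical, and it assumes every line has the shape "filename [list of dicts]" with the keys box, score, and label.

```python
import ast

def parse_detection_line(line):
    """Split 'filename [ {...}, ... ]' into the filename and per-key lists."""
    # Split only on the first space; the rest is the Python literal.
    filename, raw = line.strip().split(" ", 1)
    rows = ast.literal_eval(raw)  # safe parsing of literals, unlike eval()
    columns = {key: [row[key] for row in rows] for key in ("box", "score", "label")}
    return filename, columns

line = ("image1 [{'box': [35, 0, 112, 36], 'score': 0.8626706004142761, 'label': 'FACE_F'}, "
        "{'box': [71, 80, 149, 149], 'score': 0.8010843992233276, 'label': 'FACE_F'}]\n")
filename, columns = parse_detection_line(line)
print(filename)
print(columns["box"])
print(columns["score"])
print(columns["label"])
```

Looking the keys up by name, rather than relying on values() order, keeps the columns correct even if a row lists its keys in a different order.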

Python barplot from Counter

I am brand new to Python. I am starting a BS in data analytics in August and am trying to get a head start on learning. Can anyone solve this for me?
from collections import Counter
Counter(one_d)
returns the following
Counter({'Action': 303,
         'Adventure': 259,
         'Sci-Fi': 120,
         'Mystery': 106,
         'Horror': 119,
         'Thriller': 195,
         'Animation': 49,
         'Comedy': 279,
         'Family': 51,
         'Fantasy': 101,
         'Drama': 513,
         'Music': 16,
         'Biography': 81,
         'Romance': 141,
         'History': 29,
         'Crime': 150,
         'Western': 7,
         'War': 13,
         'Musical': 5,
         'Sport': 18})
I would like to create a bar plot but am unsure how to do this. Is a bar plot even the best choice for this data?
The Pandas library is quite useful for data analytics and visualization:
from collections import Counter
import pandas as pd
counts = Counter({'Action': 303, 'Adventure': 259, 'Sci-Fi': 120, 'Mystery': 106, 'Horror': 119, 'Thriller': 195, 'Animation': 49, 'Comedy': 279, 'Family': 51, 'Fantasy': 101, 'Drama': 513, 'Music': 16, 'Biography': 81, 'Romance': 141, 'History': 29, 'Crime': 150, 'Western': 7, 'War': 13, 'Musical': 5, 'Sport': 18})
data = pd.Series(counts)
ax = data.plot.bar()
ax.set(xlabel='Genre', ylabel='Count', title='Good luck on your BS')
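A bar plot is a reasonable fit for counts per category, and with 20 genres it tends to read more easily when the bars are sorted. As a hedged sketch without pandas: Counter.most_common() returns the pairs in descending order of count, and the hypothetical helper plot_sorted_counts draws them as horizontal bars (matplotlib is imported inside the helper, so the sorting part runs even where it is not installed).

```python
from collections import Counter

counts = Counter({'Action': 303, 'Adventure': 259, 'Sci-Fi': 120, 'Mystery': 106,
                  'Horror': 119, 'Thriller': 195, 'Animation': 49, 'Comedy': 279,
                  'Family': 51, 'Fantasy': 101, 'Drama': 513, 'Music': 16,
                  'Biography': 81, 'Romance': 141, 'History': 29, 'Crime': 150,
                  'Western': 7, 'War': 13, 'Musical': 5, 'Sport': 18})

# most_common() yields (genre, count) pairs, largest count first.
genres, values = zip(*counts.most_common())

def plot_sorted_counts(genres, values):
    """Draw a horizontal bar chart with the largest count at the top."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting
    fig, ax = plt.subplots()
    # barh draws from the bottom up, so reverse to put the largest on top.
    ax.barh(list(genres[::-1]), list(values[::-1]))
    ax.set(xlabel='Count', ylabel='Genre')
    plt.show()

# plot_sorted_counts(genres, values)  # uncomment to display the chart
```

Horizontal bars leave room for the genre names, which would otherwise overlap or need rotating on a vertical chart.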

Difference between dict.values and dict[key].values

What is the difference between studentsDict.values() and studentsDict[key].values() in the following code?
studentsDict = {'Ayush': {'maths': 24, 'english': 19, 'hindi': 97, 'bio': 20, 'science': 0}, 'Pankaj': {'maths': 52, 'english': 76, 'hindi': 68, 'bio': 97, 'science': 66}, 'Raj': {'maths': 85, 'english': 79, 'hindi': 51, 'bio': 36, 'science': 75}, 'iC5z4DK': {'maths': 24, 'english': 92, 'hindi': 31, 'bio': 29, 'science': 91}, 'Zf1WSV6': {'maths': 81, 'english': 58, 'hindi': 85, 'bio': 31, 'science': 7}}
for key in studentsDict.keys():
    for marks in studentsDict[key].values():
        if marks < 33:
            print(key, "FAILED")
            break
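studentsDict.keys() gives you each of the keys in the outer dict: "Ayush", "Pankaj", "Raj", "iC5z4DK" and "Zf1WSV6".
studentsDict[key].values() gives you the values for the entry in studentsDict corresponding to key. For example, if key is "Ayush", you would get 24, 19, 97, 20, and 0.
studentsDict.values(), by contrast, gives you the values of the outer dict, which are the inner dicts themselves: one dict of marks per student, without the student's name.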
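A small sketch of the difference, using a shortened, hypothetical version of the dict (two students, two subjects):

```python
students = {
    'Ayush': {'maths': 24, 'english': 19},
    'Pankaj': {'maths': 52, 'english': 76},
}

# Values of the outer dict: the inner dicts, one per student.
outer_values = list(students.values())

# Values of one inner dict: that student's marks.
ayush_marks = list(students['Ayush'].values())

# items() gives name and marks together, which avoids the students[key] lookup.
failed = [name for name, marks in students.items()
          if any(mark < 33 for mark in marks.values())]

print(outer_values)
print(ayush_marks)
print(failed)
```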

Printing a data frame in Pandas and Python [closed]

I am new to pandas and trying to write some basic code to build a data frame. I wrote two rows of the data frame as a test, but it is not working. I do not know whether the problem is the way the dictionaries and the list continue onto a new line, or something else. Do I need to use a backslash when moving to a new line? Any help is appreciated.
Here is the code:
import pandas as pd
data = [{'#':1, 'Name': 'BS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 318, 'HP': 45, 'Attack': 49, 'Defense': 49, 'Sp. Atk': 65, 'Sp. Def': 65, 'Speed': 45, 'Generation': 1, 'Legendary':'false'}, {'#':2, 'Name': 'IS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 405, 'HP': 60, 'Attack': 62, Defense': 63, 'Sp. Atk': 80, 'Sp. Def': 80, 'Speed': 60, 'Generation': 1, 'Legendary':'false'}]
df = pd.DataFrame(data)
Your problem is a syntax error in the 'Defense' key of the second dictionary: its opening apostrophe is missing.
data = [{'#':1, 'Name': 'BS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 318, 'HP': 45, 'Attack': 49,
'Defense': 49, 'Sp. Atk': 65, 'Sp. Def': 65, 'Speed': 45, 'Generation': 1, 'Legendary':'false'},
{'#':2, 'Name': 'IS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 405, 'HP': 60, 'Attack': 62,
'Defense': 63, 'Sp. Atk': 80, 'Sp. Def': 80, 'Speed': 60, 'Generation': 1, 'Legendary':'false'}]
>>> pd.DataFrame(data)
   #  Name  Type 1  type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1    BS   grass  poison    318  45      49       49       65       65     45           1      false
1  2    IS   grass  poison    405  60      62       63       80       80     60           1      false
import pandas as pd
data = [{'#':1, 'Name': 'BS', 'Type 1': 'grass', 'type 2': 'poison', 'Total': 318,
'HP': 45, 'Attack': 49, 'Defense': 49, 'Sp. Atk': 65, 'Sp. Def': 65, 'Speed': 45,
'Generation': 1, 'Legendary':'false'}, {'#':2, 'Name': 'IS', 'Type 1': 'grass',
'type 2': 'poison', 'Total': 405, 'HP': 60,
'Attack': 62, 'Defense': 63,
'Sp. Atk': 80, 'Sp. Def': 80,
'Speed': 60, 'Generation': 1, 'Legendary':'false'}]
df = pd.DataFrame(data)
print(df)
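On the backslash question: inside parentheses, square brackets, or curly braces Python joins lines implicitly, so a list of dicts can be broken across lines freely. A backslash (or a pair of parentheses) is only needed when a statement is split outside of any brackets. A minimal sketch with a trimmed-down, illustrative version of the rows (the names rows and total_hp are not from the question):

```python
# Inside [...] and {...} a line break needs no backslash.
rows = [
    {'#': 1, 'Name': 'BS', 'Type 1': 'grass',
     'Total': 318, 'HP': 45},
    {'#': 2, 'Name': 'IS', 'Type 1': 'grass',
     'Total': 405, 'HP': 60},
]

# Outside brackets, a backslash continues the statement on the next line.
total_hp = rows[0]['HP'] + \
    rows[1]['HP']

print(len(rows), total_hp)
```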
