I have a Python dict that looks like this:
{'data': [{'data': [{'data': 'gen1', 'name': 'objectID'},
{'data': 'familyX', 'name': 'family'}],
'name': 'An-instance-of-A'},
{'data': [{'data': 'gen2', 'name': 'objectID'},
{'data': 'familyY', 'name': 'family'},
{'data': [{'data': [{'data': '21',
'name': 'objectID'},
{'data': 'name-for-21',
'name': 'name'},
{'data': 'no-name', 'name': None}],
'name': 'An-instance-of-X:'},
{'data': [{'data': '22',
'name': 'objectID'}],
'name': 'An-instance-of-X:'}],
'name': 'List-of-2-X-elements:'}],
'name': 'An-instance-of-A'}],
'name': 'main'}
The structure repeats according to these rules:
A dict contains 'name' and 'data'.
'data' can contain a list of dicts.
If 'data' is not a list, it is a value I need.
'name' is just a name.
The problem is that for each value, I need to know the information from every one of its parents.
So at the end, I need to print a list with items that looks something like:
objectID=gen2 family=familyY An-instance-of-X_objectID=21 An-instance-of-X_name=name-for-21
Edit: This is only one of several lines I want as the output. I need one line like this for each item that doesn’t have a dict as 'data'.
So, for each 'data' that is not a dict, traverse up, collect the parent info and print it.
I don't know every function in modules like itertools and collections. Is there something in there I can use? What is this called, so that I can research it on my own?
I can find many "flatten dict" methods, but none that fit a structure built from 'data' and 'name' keys like this one.
This is a wonderful example of what recursion is good for:
input_ = {'data': [{'data': [{'data': 'gen1', 'name': 'objectID'},
{'data': 'familyX', 'name': 'family'}],
'name': 'An-instance-of-A'},
{'data': [{'data': 'gen2', 'name': 'objectID'},
{'data': 'familyY', 'name': 'family'},
{'data': [{'data': [{'data': '21',
'name': 'objectID'},
{'data': 'name-for-21',
'name': 'name'},
{'data': 'no-name', 'name': None}],
'name': 'An-instance-of-X:'},
{'data': [{'data': '22',
'name': 'objectID'}],
'name': 'An-instance-of-X:'}],
'name': 'List-of-2-X-elements:'}],
'name': 'An-instance-of-A'}],
'name': 'main'}
def parse_dict(d, predecessors, output):
    """Recurse into dict and fill list of path-value-pairs"""
    data = d["data"]
    name = d["name"]
    # Strip the trailing colon from names such as 'An-instance-of-X:'
    name = name.strip(":") if isinstance(name, str) else name
    if isinstance(data, list):
        for d_ in data:
            parse_dict(d_, predecessors + [name], output)
    else:
        # Leaf: join the names of all ancestors into one path string
        output.append(("_".join(map(str, predecessors + [name])), data))

result = []
parse_dict(input_, [], result)
print("\n".join("%s=%s" % (path, value) for path, value in result))
Output:
main_An-instance-of-A_objectID=gen1
main_An-instance-of-A_family=familyX
main_An-instance-of-A_objectID=gen2
main_An-instance-of-A_family=familyY
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_objectID=21
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_name=name-for-21
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_None=no-name
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_objectID=22
I hope I understood your requirements correctly. If you don't want to join the paths into strings, you can keep the list of predecessors instead.
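A minimal sketch of that last variant, assuming Python 3: instead of joining the names into a string, a generator yields each leaf value together with a tuple of all its parent names. The name iter_leaves and the shortened sample dict are only illustrative.

```python
def iter_leaves(node, parents=()):
    """Yield (path, value) pairs; path is a tuple of names from the root to the leaf."""
    name = node["name"]
    if isinstance(name, str):
        name = name.strip(":")
    path = parents + (name,)
    data = node["data"]
    if isinstance(data, list):
        # Inner node: descend into every child, passing the path so far
        for child in data:
            yield from iter_leaves(child, path)
    else:
        # Leaf: 'data' is the value itself
        yield path, data


# Shortened sample in the same 'name'/'data' shape as the question
sample = {"name": "main", "data": [
    {"name": "An-instance-of-A", "data": [
        {"name": "objectID", "data": "gen1"},
        {"name": "family", "data": "familyX"},
    ]},
]}

leaves = list(iter_leaves(sample))
for path, value in leaves:
    print(path, value)
```

Keeping the path as a tuple makes it easy to drop or rename individual levels before formatting the output line.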
Greetings,
Thorsten
I have a (Python) list of dictionaries looking like this:
[
{
"data": "somedata1",
"name": "prefix1.7.9"
},
{
"data": "somedata2",
"name": "prefix1.7.90"
},
{
"data": "somedata3",
"name": "prefix1.1.1"
},
{
"data": "somedata4",
"name": "prefix4.1.1"
},
{
"data": "somedata5",
"name": "prefix4.1.2"
},
{
"data": "somedata5",
"name": "other 123"
},
{
"data": "somedata6",
"name": "different"
},
{
"data": "somedata7",
"name": "prefix1.7.11"
},
{
"data": "somedata7",
"name": "prefix1.11.9"
},
{
"data": "somedata7",
"name": "prefix1.17.9"
}
]
Now I want to sort it by the "name" key.
If the postfix consists of numbers (separated by two dots), I want to sort it numerically,
e.g. with a resulting order:
different
other 123
prefix1.1.1
prefix1.1.9
prefix1.7.11
prefix1.7.90
prefix1.11.9
prefix1.17.9
prefix4.1.1
prefix4.1.2
Do you have an idea how to do this concisely and efficiently?
The only idea I had was to build a completely new list, but perhaps this could also be done using a lambda function?
You can use re.findall with a regex that extracts either non-numerical words or digits from each name, and convert those that are digits to integers for numeric comparisons. To avoid comparisons between strings and integers, make the key a tuple where the first item is a Boolean of whether the token is numeric and the second item is the actual key for comparison:
import re

# initialize your input list as the lst variable
lst.sort(
    key=lambda d: [
        (s.isdigit(), int(s) if s.isdigit() else s)
        for s in re.findall(r'[^\W\d]+|\d+', d['name'])
    ]
)
Demo: https://replit.com/#blhsing/ToughWholeInformationtechnology
You need to come up with a way of extracting your prefix, and your postfix from the 'name' values. This can be achieved using something like:
import math

def extract_prefix(s: str) -> str:
    return s.split('.')[0]

def extract_postfix(s: str) -> float:
    try:
        return float('.'.join(s.split('.')[1:]))
    except ValueError:
        # if we cannot form a float i.e. no postfix exists, it'll be before some value with same prefix
        return -math.inf
arr = [{'data': 'somedata1', 'name': 'prefix1.7.9'},
{'data': 'somedata2', 'name': 'prefix1.7.90'},
{'data': 'somedata3', 'name': 'prefix1.1.1'},
{'data': 'somedata4', 'name': 'prefix4.1.1'},
{'data': 'somedata5', 'name': 'prefix4.1.2'},
{'data': 'somedata5', 'name': 'other 123'},
{'data': 'somedata6', 'name': 'different'},
{'data': 'somedata7', 'name': 'prefix1.7.11'},
{'data': 'somedata7', 'name': 'prefix1.11.9'},
{'data': 'somedata7', 'name': 'prefix1.17.9'}]
result = sorted(sorted(arr, key=lambda d: extract_postfix(d['name'])), key=lambda d: extract_prefix(d['name']))
result:
[{'data': 'somedata6', 'name': 'different'},
{'data': 'somedata5', 'name': 'other 123'},
{'data': 'somedata3', 'name': 'prefix1.1.1'},
{'data': 'somedata7', 'name': 'prefix1.7.11'},
{'data': 'somedata1', 'name': 'prefix1.7.9'},
{'data': 'somedata2', 'name': 'prefix1.7.90'},
{'data': 'somedata7', 'name': 'prefix1.11.9'},
{'data': 'somedata7', 'name': 'prefix1.17.9'},
{'data': 'somedata4', 'name': 'prefix4.1.1'},
{'data': 'somedata5', 'name': 'prefix4.1.2'}]
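Note that a float-based key compares 7.11 < 7.9, which is why prefix1.7.11 appears before prefix1.7.9 in the result above. A minimal sketch that compares the numeric parts as a tuple of integers instead (the helper name name_key is hypothetical, and the prefix is still compared as a plain string):

```python
def name_key(name):
    """Sort key: (prefix, tuple of integer parts after the first dot)."""
    prefix, *rest = name.split(".")
    try:
        numbers = tuple(int(part) for part in rest)
    except ValueError:
        # Assumption: a non-numeric postfix is treated like a missing postfix
        numbers = ()
    return prefix, numbers


names = ["prefix1.7.9", "prefix1.7.90", "prefix1.1.1", "prefix4.1.1",
         "other 123", "different", "prefix1.7.11", "prefix1.11.9"]
ordered = sorted(names, key=name_key)
print(ordered)
```

With a list of dicts, the same key can be passed as key=lambda d: name_key(d['name']).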
Since you want to sort numerically, you will need a helper function:
def split_name(s):
    nameparts = s.split('.')
    for i, p in enumerate(nameparts):
        if p.isdigit():
            nameparts[i] = int(p)
    return nameparts

# list.sort() sorts in place and returns None, so do not assign its result
obj.sort(key=lambda x: split_name(x['name']))
Here I first sort the names by version and store them in another list called rank. That list then supplies each name's ranking position for the custom sort.
Code using pkg_resources:
from pkg_resources import parse_version
rank=sorted([v['name'] for v in Mydata], key=parse_version)
or
rank = sorted(sorted([v['name'] for v in Mydata], key=parse_version), key = lambda s: s[:3]=='pre') #To avoid the prefix value in sorting
sorted(Mydata, key = lambda x: rank.index(x['name']))
Output:
[{'data': 'somedata6', 'name': 'different'},
{'data': 'somedata5', 'name': 'other 123'},
{'data': 'somedata3', 'name': 'prefix1.1.1'},
{'data': 'somedata1', 'name': 'prefix1.7.9'},
{'data': 'somedata7', 'name': 'prefix1.7.11'},
{'data': 'somedata2', 'name': 'prefix1.7.90'},
{'data': 'somedata7', 'name': 'prefix1.11.9'},
{'data': 'somedata7', 'name': 'prefix1.17.9'},
{'data': 'somedata4', 'name': 'prefix4.1.1'},
{'data': 'somedata5', 'name': 'prefix4.1.2'}]
With other inputs:
[{'data': 'somedata6', 'name': 'Aop'},
{'data': 'somedata6', 'name': 'different'},
{'data': 'somedata5', 'name': 'other 123'},
{'data': 'somedata7', 'name': 'pop'},
{'data': 'somedata3', 'name': 'prefix1.hello'},
{'data': 'somedata3', 'name': 'prefix1.1.1'},
{'data': 'somedata4', 'name': 'prefix1.2.hello'},
{'data': 'somedata1', 'name': 'prefix1.7.9'},
{'data': 'somedata7', 'name': 'prefix1.7.11'},
{'data': 'somedata2', 'name': 'prefix1.7.90'},
{'data': 'somedata7', 'name': 'prefix1.17.9'},
{'data': 'somedata7', 'name': 'prefix1.17.9'},
{'data': 'somedata5', 'name': 'prefix4.1.2'},
{'data': 'somedata7', 'name': 'prefix9.1.1'},
{'data': 'somedata7', 'name': 'prefix10.11.9'}]
This is from an R guy.
I have this mess in a Pandas column: data['crew'].
array(["[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', 'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, {'credit_id': '56407fa89251417055000b58', 'department': 'Sound', 'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}, {'credit_id': '5789212392514135d60025fd', 'department': 'Production', 'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production', 'name': 'Jeffrey Stott', 'profile_path': None}, {'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 23783, 'job': 'Makeup Artist', 'name': 'Heather Plott', 'profile_path': None}
It goes on for quite some time. Each new dict starts with a credit_id field. One cell can hold several dicts in an array.
Assume I want the names of all Casting directors, as shown in the first entry. I need to check the job entry in every dict and, if it's Casting, grab what's in the name field and store it in my data frame in data['crew'].
I tried several strategies, then backed off and went for something simple.
Running the following shut me down, so I can't even access a simple field. How can I get this done in Pandas?
for row in data.head().iterrows():
    if row['crew'].job == 'Casting':
        print(row['crew'])
EDIT: Error Message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-138-aa6183fdf7ac> in <module>()
1 for row in data.head().iterrows():
----> 2 if row['crew'].job == 'Casting':
3 print(row['crew'])
TypeError: tuple indices must be integers or slices, not str
EDIT: Code used to get the array of dict (strings?) in the first place.
import ast

def convert_JSON(data_as_string):
    try:
        dict_representation = ast.literal_eval(data_as_string)
        return dict_representation
    except ValueError:
        return []

data["crew"] = data["crew"].map(lambda x: sorted([d['name'] if d['job'] == 'Casting' else '' for d in convert_JSON(x)])).map(lambda x: ','.join(map(str, x)))
To create a DataFrame from your sample data, write:
df = pd.DataFrame(data=[
{ 'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production',
'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor',
'profile_path': None},
{ 'credit_id': '56407fa89251417055000b58', 'department': 'Sound',
'gender': 0, 'id': 6745, 'job': 'Music Editor',
'name': 'Richard Henderson', 'profile_path': None},
{ 'credit_id': '5789212392514135d60025fd', 'department': 'Production',
'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production',
'name': 'Jeffrey Stott', 'profile_path': None},
{ 'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up',
'gender': 0, 'id': 23783, 'job': 'Makeup Artist',
'name': 'Heather Plott', 'profile_path': None}])
Then you can get your data with a single instruction:
df[df.job == 'Casting'].name
The result is:
0 Terri Taylor
Name: name, dtype: object
The above result is a Pandas Series object containing the names found.
In this case, 0 is the index value of the record found and
Terri Taylor is the name of the only Casting director in your data.
Edit
If you want just a list (not Series), write:
df[df.job == 'Casting'].name.tolist()
The result is ['Terri Taylor'] - just a list.
I think both of my solutions should be quicker than an "ordinary" loop
based on iterrows().
When comparing execution times, you may also try yet another solution:
df.query("job == 'Casting'").name.tolist()
==========
And as far as your code is concerned:
iterrows() yields, for each row, a pair containing:
the index (key) of the current row,
a Series with the content of this row.
So your loop should look something like:
for row in df.iterrows():
    if row[1].job == 'Casting':
        print(row[1]['name'])
You cannot write row[1].name because it refers to the index value
(the name attribute of the row Series collides with the column called name).
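For the original column, where each cell holds the string form of a list of dicts, a per-cell helper may be closer to what the question needs. This is only a sketch under that assumption; the function name casting_names is hypothetical and the sample cell is shortened from the question.

```python
import ast


def casting_names(cell):
    """Return the names of all crew entries in one cell whose job is 'Casting'."""
    if not isinstance(cell, str):
        return []  # e.g. a missing value
    try:
        crew = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []  # the cell is not a valid Python literal
    return [member["name"] for member in crew if member.get("job") == "Casting"]


cell = ("[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', "
        "'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', "
        "'profile_path': None}, "
        "{'credit_id': '56407fa89251417055000b58', 'department': 'Sound', "
        "'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', "
        "'profile_path': None}]")
print(casting_names(cell))
```

Applied to the whole column, this would be something like data['crew'].map(casting_names), which gives a list of names per row.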
I have two sets of data: one returned from a database and one from an API response.
The one returned from the database is a list of tuples (variable Db).
The one returned from the API response is a list of dictionaries (variable Api).
Is there a way to check whether each tuple element in Db matches the corresponding dictionary element in Api? I tried multiple for loops, but it did not work.
I would like to know if there is an elegant way (maybe assert) to do this.
Db=[('D967E735-070D-48F9-A3BB-D00766D39F57', 'test1', '51-00401'),
('94F903D1-2EE7-4BD2-A0C6-B464D9F2939C', 'test4', '51-00404'),
('FE0CC34C-BA6A-4123-B72C-617ADC0A93E7', 'test10', '51-00409')]
Api=[{'Id': 'd967e735-070d-48f9-a3bb-d00766d39f57',
'name': 'test1',
'Number': '51-00401'},
{'Id': '94f903d1-2ee7-4bd2-a0c6-b464d9f2939c',
'name': 'test4',
'Number': '51-00404'},
{'Id': 'fe0cc34c-ba6a-4123-b72c-617adc0a93e7',
'name': 'test10',
'Number': '51-00409'}]
This is one way:
Db = [('D967E735-070D-48F9-A3BB-D00766D39F57', 'test1', '51-00401'),
('94F903D1-2EE7-4BD2-A0C6-B464D9F2939C', 'test4', '51-00404'),
('FE0CC34C-BA6A-4123-B72C-617ADC0A93E7', 'test10', '51-00409')]
Api = [{'Id': 'd967e735-070d-48f9-a3bb-d00766d39f57', 'name': 'test1', 'Number': '51-00401'},
{'Id': '94f903d1-2ee7-4bd2-a0c6-b464d9f2939c', 'name': 'test4', 'Number': '51-00404'},
{'Id': 'fe0cc34c-ba6a-4123-b72c-617adc0a93e7', 'name': 'test10', 'Number': '51-00409'}]
for i in Db:
    match = i[0].lower() in {d['Id'].lower() for d in Api}
    print(i[0], match)
# D967E735-070D-48F9-A3BB-D00766D39F57 True
# 94F903D1-2EE7-4BD2-A0C6-B464D9F2939C True
# FE0CC34C-BA6A-4123-B72C-617ADC0A93E7 True
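The loop above only compares the IDs. If the name and number should match as well, one possible sketch is to normalise both sides into sets of tuples and compare them with a single assert. It assumes the tuple order in Db is (Id, name, Number) and that only the Id differs in letter case.

```python
Db = [('D967E735-070D-48F9-A3BB-D00766D39F57', 'test1', '51-00401'),
      ('94F903D1-2EE7-4BD2-A0C6-B464D9F2939C', 'test4', '51-00404'),
      ('FE0CC34C-BA6A-4123-B72C-617ADC0A93E7', 'test10', '51-00409')]
Api = [{'Id': 'd967e735-070d-48f9-a3bb-d00766d39f57', 'name': 'test1', 'Number': '51-00401'},
       {'Id': '94f903d1-2ee7-4bd2-a0c6-b464d9f2939c', 'name': 'test4', 'Number': '51-00404'},
       {'Id': 'fe0cc34c-ba6a-4123-b72c-617adc0a93e7', 'name': 'test10', 'Number': '51-00409'}]

# Bring both sides into the same shape: (lower-cased id, name, number)
db_rows = {(row_id.lower(), name, number) for row_id, name, number in Db}
api_rows = {(d['Id'].lower(), d['name'], d['Number']) for d in Api}

# The message lists the records that exist on one side only
assert db_rows == api_rows, (
    f"only in Db: {db_rows - api_rows}, only in Api: {api_rows - db_rows}"
)
print("Db and Api match")
```

Because sets are used, the order of the records does not matter and duplicate records are ignored.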
I've created an array for my output by doing this:
for i in nameList:
    test_array.append({'Name': i, 'Email': memberMail, 'Department': memberDepartment})
However, later in the code, I must remove designated values in my test_array depending on their email. After that, I can easily print out what I need to my csv file.
How do I delete a specific entry from this sort of dictionary list?
For those curious, when I print the array currently it looks like this:
[{'Department': 'Public Works', 'Email': 'joe#xyz.gov', 'Name': 'Joe'}, {'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'}, {'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'}]
If you do not want to modify test_array:
filtered_test_array = list(filter(lambda entry: entry['Email'] != 'email#example.com', test_array))
Try this:
for i in range(len(test_array)):
    if test_array[i]['Email'] == 'abc#pqr.com':
        del test_array[i]
        break  # stop after deleting the matching entry
Try it like this:
list_ = [{'Department': 'Public Works', 'Email': 'joe#xyz.gov', 'Name': 'Joe'}, {'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'}, {'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'}]
for val in list_[:]:  # iterate over a copy so that removing does not skip items
    if val['Email'] == 'joe#xyz.gov':
        list_.remove(val)
Result
[{'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'},
{'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'}]
You can iterate over the list and delete with respect to a unique value or a key. Something like this:
for entry in test_array[:]:  # iterate over a copy so that removing does not skip items
    if entry['Email'] == 'suzanne#xyz.gov':
        test_array.remove(entry)
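Removing items from a list while looping over that same list can skip the element right after each removal. A common alternative, sketched here with the sample data from the question and a hypothetical set called unwanted, is to rebuild the list with a comprehension:

```python
test_array = [
    {'Department': 'Public Works', 'Email': 'joe#xyz.gov', 'Name': 'Joe'},
    {'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'},
    {'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'},
]

# Emails whose entries should be dropped (hypothetical selection)
unwanted = {'joe#xyz.gov', 'suzanne#xyz.gov'}

# Slice assignment replaces the contents in place, so every other
# reference to test_array sees the filtered list as well
test_array[:] = [entry for entry in test_array if entry['Email'] not in unwanted]
print(test_array)
```

This removes all matching entries in one pass, including several with the same email.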
I have a YAML file that parses into an object, e.g.:
{'name': [{'proj_directory': '/directory/'},
{'categories': [{'quick': [{'directory': 'quick'},
{'description': None},
{'table_name': 'quick'}]},
{'intermediate': [{'directory': 'intermediate'},
{'description': None},
{'table_name': 'intermediate'}]},
{'research': [{'directory': 'research'},
{'description': None},
{'table_name': 'research'}]}]},
{'nomenclature': [{'extension': 'nc'},
{'handler': 'script'},
{'filename': [{'id': [{'type': 'VARCHAR'}]},
{'date': [{'type': 'DATE'}]},
{'v': [{'type': 'INT'}]}]},
{'data': [{'time': [{'variable_name': 'time'},
{'units': 'minutes since 1-1-1980 00:00 UTC'},
{'latitude': [{'variable_n...
I'm having trouble accessing the data in python and regularly see the error TypeError: list indices must be integers, not str
I want to be able to access all elements corresponding to 'name' so to retrieve each data field I imagine it would look something like:
import yaml
settings_stream = open('file.yaml', 'r')
settingsMap = yaml.safe_load(settings_stream)
yaml_stream = True
print 'loaded settings for: ',
for project in settingsMap:
    print project + ', ' + settingsMap[project]['project_directory']
and I would expect each element would be accessible via something like ['name']['categories']['quick']['directory']
and something a little deeper would just be:
['name']['nomenclature']['data']['latitude']['variable_name']
or am I completely wrong here?
The brackets, [], indicate that you have lists of dicts, not just a dict.
For example, settingsMap['name'] is a list of dicts.
Therefore, you need to select the correct dict in the list using an integer index, before you can select the key in the dict.
So, given your current data structure, you'd need to use:
settingsMap['name'][1]['categories'][0]['quick'][0]['directory']
Or, revise the underlying YAML data structure.
For example, if the data structure looked like this:
settingsMap = {
    'name': {
        'proj_directory': '/directory/',
        'categories': {
            'quick': {'directory': 'quick',
                      'description': None,
                      'table_name': 'quick'},
            'intermediate': {'directory': 'intermediate',
                             'description': None,
                             'table_name': 'intermediate'},
            'research': {'directory': 'research',
                         'description': None,
                         'table_name': 'research'}},
        'nomenclature': {
            'extension': 'nc',
            'handler': 'script',
            'filename': {'id': {'type': 'VARCHAR'},
                         'date': {'type': 'DATE'},
                         'v': {'type': 'INT'}},
            'data': {'time': {'variable_name': 'time',
                              'units': 'minutes since 1-1-1980 00:00 UTC'}}}}}
then you could access the same value as above with
settingsMap['name']['categories']['quick']['directory']
# quick
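If changing the YAML file is not an option, the loaded object could also be converted after loading. The following is a rough sketch, not part of PyYAML: the helper collapse is hypothetical and assumes that every list in the structure is a list of dicts whose keys do not repeat (later duplicates would overwrite earlier ones).

```python
def collapse(node):
    """Recursively merge each non-empty list of dicts into a single dict."""
    if isinstance(node, list) and node and all(isinstance(item, dict) for item in node):
        merged = {}
        for item in node:
            for key, value in item.items():
                merged[key] = collapse(value)
        return merged
    if isinstance(node, dict):
        return {key: collapse(value) for key, value in node.items()}
    return node  # scalars and other lists are kept as they are


# Shortened version of the structure from the question
loaded = {'name': [{'proj_directory': '/directory/'},
                   {'categories': [{'quick': [{'directory': 'quick'},
                                              {'description': None},
                                              {'table_name': 'quick'}]}]}]}

settings = collapse(loaded)
print(settings['name']['categories']['quick']['directory'])
```

After this conversion, the key-only access pattern from the question works without integer indices.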