How to retrieve particular values from a list: python

I have a list of items
list = [{
'id': '1',
'elements': 'A',
'table': 'path/to/table1',
'chart': 'path/to/chart1',
},
{
'id': '2',
'elements': 'B',
'table': 'path/to/table2',
'chart': 'path/to/chart2',
},
{
'id': '3',
'elements': 'C',
'table': 'path/to/table3',
'chart': 'path/to/chart3',
},]
selectionsFromTable = [{('A','2011','Table','Total'),
('C','2011','Bar','Total'),
('B','Pie','2001','Total')}]
Compare each item's 'elements' value with the entries in selectionsFromTable; if an entry contains that value, append the item's respective table or chart to arr.
Assume selectionsFromTable is form data received from jQuery. The index values and positions of its items always change.
This is what I am doing:
arr = []
for data in list:
    if data['elements'] in selectionsFromTable: # Suggest condition here
        inner = []
        if 'Table' in selectionsFromTable:
            print("table")
            inner.append({'id': data['id'],
                          'section_title': data['elements'],
                          'tableorChart': data['table'],
                          })
        elif 'Bar' in selectionsFromTable or 'Pie' in selectionsFromTable:
            print("chart")
            inner.append({'id': data['id'],
                          'section_title': data['elements'],
                          'tableorChart': data['chart'],
                          })
        arr.append(inner)
I believe it is wrong; kindly suggest the right logic here. I am not able to reach the "elif" branch, as my selectionsFromTable contains "Table".

According to the example you have given, the following check would not work:
data['elements'] in selectionsFromTable
This is because selectionsFromTable contains sets, whereas data['elements'] is a string. You want to check inside each element (set) of selectionsFromTable.
A simple way to do this:
arr = []
for data in list:
    elem = next((s for s in selectionsFromTable if data['elements'] in s), None)
    if elem:
        inner = []
        if 'Table' in elem:
            print("table")
            inner.append({'id': data['id'],
                          'section_title': data['elements'],
                          'tableorChart': data['table'],
                          })
        elif ('Bar' in elem) or ('Pie' in elem):
            print("chart")
            inner.append({'id': data['id'],
                          'section_title': data['elements'],
                          'tableorChart': data['chart'],
                          })
        arr.append(inner)
The main change is the line:
elem = next((s for s in selectionsFromTable if data['elements'] in s), None)
This takes the first element of selectionsFromTable that contains data['elements']; if no such element exists (that is, the generator expression did not yield even a single value), it returns None.
The next line checks that elem is not None, and the logic after that is the same as before, but based on elem (not selectionsFromTable).
Also, you should not use list as the name of a variable; it masks the built-in list type, and you would not be able to use it afterwards in the same script.
Example/Demo -
>>> list = [{
... 'id': '1',
... 'elements': 'A',
... 'table': 'path/to/table1',
... 'chart': 'path/to/chart1',
... },
... {
... 'id': '2',
... 'elements': 'B',
... 'table': 'path/to/table2',
... 'chart': 'path/to/chart2',
... },
... {
... 'id': '3',
... 'elements': 'C',
... 'table': 'path/to/table3',
... 'chart': 'path/to/chart3',
... },]
>>>
>>> selectionsFromTable = [{'A','2011','Table','Total'},
... {'C','2011','Bar','Total'},
... {'B','Pie','2001','Total'}]
>>> arr = []
>>> for data in list:
...     elem = next((s for s in selectionsFromTable if data['elements'] in s), None)
...     if elem:
...         inner = []
...         if 'Table' in elem:
...             print("table")
...             inner.append({'id': data['id'],
...                           'section_title': data['elements'],
...                           'tableorChart': data['table'],
...                           })
...         elif ('Bar' in elem) or ('Pie' in elem):
...             print("chart")
...             inner.append({'id': data['id'],
...                           'section_title': data['elements'],
...                           'tableorChart': data['chart'],
...                           })
...         arr.append(inner)
...
table
chart
chart
>>> import pprint
>>> pprint.pprint(arr)
[[{'id': '1', 'section_title': 'A', 'tableorChart': 'path/to/table1'}],
[{'id': '2', 'section_title': 'B', 'tableorChart': 'path/to/chart2'}],
[{'id': '3', 'section_title': 'C', 'tableorChart': 'path/to/chart3'}]]
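The loop above rescans selectionsFromTable for every item. If the form data grows, one option is to index the selections once up front. The following is only a sketch, assuming each selection set mentions at most one element label; the names items, labels and by_label are illustrative and not from the question.

```python
items = [
    {'id': '1', 'elements': 'A', 'table': 'path/to/table1', 'chart': 'path/to/chart1'},
    {'id': '2', 'elements': 'B', 'table': 'path/to/table2', 'chart': 'path/to/chart2'},
    {'id': '3', 'elements': 'C', 'table': 'path/to/table3', 'chart': 'path/to/chart3'},
]
selectionsFromTable = [{'A', '2011', 'Table', 'Total'},
                       {'C', '2011', 'Bar', 'Total'},
                       {'B', 'Pie', '2001', 'Total'}]

# Hypothetical index: element label -> the selection set that mentions it.
labels = {item['elements'] for item in items}
by_label = {label: sel for sel in selectionsFromTable for label in sel & labels}

arr = []
for item in items:
    sel = by_label.get(item['elements'])
    if sel is None:
        continue  # nothing was selected for this element
    # 'Table' picks the table path; 'Bar' or 'Pie' picks the chart path.
    if 'Table' in sel:
        key = 'table'
    elif sel & {'Bar', 'Pie'}:
        key = 'chart'
    else:
        continue  # assumption: skip selections that name neither kind
    arr.append([{'id': item['id'],
                 'section_title': item['elements'],
                 'tableorChart': item[key]}])
```

Unlike the loop above, this sketch skips an item whose selection names neither a table nor a chart instead of appending an empty list, and if two selections mention the same label the later one wins.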

Related

Grab specific strings within a for loop with variable nested length

I have the following telegram export JSON dataset:
import pandas as pd
df = pd.read_json("data/result.json")
>>> df.columns
Index(['name', 'type', 'id', 'messages'], dtype='object')
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
# Sample output
sample_df = pd.DataFrame({"messages": [
{"id": 11, "from": "user3984", "text": "Do you like soccer?"},
{"id": 312, "from": "user837", "text": ['Not sure', {'type': 'hashtag', 'text': '#confused'}]},
{"id": 4324, "from": "user3984", "text": ['O ', {'type': 'mention', 'text': '#user87324'}, ' really?']}
]})
Within df, there's a "messages" column, which has the following output:
>>> df["messages"]
0 {'id': -999713937, 'type': 'service', 'date': ...
1 {'id': -999713936, 'type': 'service', 'date': ...
2 {'id': -999713935, 'type': 'message', 'date': ...
3 {'id': -999713934, 'type': 'message', 'date': ...
4 {'id': -999713933, 'type': 'message', 'date': ...
...
22377 {'id': 22102, 'type': 'message', 'date': '2022...
22378 {'id': 22103, 'type': 'message', 'date': '2022...
22379 {'id': 22104, 'type': 'message', 'date': '2022...
22380 {'id': 22105, 'type': 'message', 'date': '2022...
22381 {'id': 22106, 'type': 'message', 'date': '2022...
Name: messages, Length: 22382, dtype: object
Within messages, there's a particular key named "text", and that's the one I want to focus on. When you explore the data, it turns out the text value can be:
A single text:
>>> df["messages"][5]["text"]
'JAJAJAJAJAJAJA'
>>> df["messages"][22262]["text"]
'No creo'
But sometimes it's nested. Like the following:
>>> df["messages"][22373]["text"]
['O ', {'type': 'mention', 'text': '#user87324'}, ' really?']
>>> df["messages"][22189]["text"]
['The average married couple has sex roughly once a week. ', {'type': 'mention', 'text': '#googlefactss'}, ' ', {'type': 'hashtag', 'text': '#funfact'}]
>>> df["messages"][22345]["text"]
[{'type': 'mention', 'text': '#user817430'}]
For nested data, if I want to grab the main text, I can do the following:
>>> df["messages"][22373]["text"][0]
'O '
>>> df["messages"][22189]["text"][0]
'The average married couple has sex roughly once a week. '
>>>
Up to here, everything seems OK. However, the problem arises when I run the for loop. If I try the following:
for item in df["messages"]:
    tg_id = item.get("id", "None")
    tg_type = item.get("type", "None")
    tg_date = item.get("date", "None")
    tg_from = item.get("from", "None")
    tg_text = item.get("text", "None")
    print(tg_id, tg_from, tg_text)
A sample output is:
21263 user3984 jajajajaja
21264 user837 ['Not sure', {'type': 'hashtag', 'text': '#confused'}]
21265 user3984 What time is it?✋
MY ASK: How to flatten the rows? I need the following (and store that in a data frame):
21263 user3984 jajajajaja
21264 user837 Not sure
21265 user837 type: hashtag
21266 user837 text: #confused
21267 user3984 What time is it?✋
I tried to detect "text" type like this:
for item in df["messages"]:
    tg_id = item.get("id", "None")
    tg_type = item.get("type", "None")
    tg_date = item.get("date", "None")
    tg_from = item.get("from", "None")
    tg_text = item.get("text", "None")
    if type(tg_text) == list:
        tg_text = tg_text[0]
    print(tg_id, tg_from, tg_text)
With this I only grab the first text, but I'm expecting to grab the other fields as well or to 'flatten' the data.
I also tried:
for item in df["messages"]:
    tg_id = item.get("id", "None")
    tg_type = item.get("type", "None")
    tg_date = item.get("date", "None")
    tg_from = item.get("from", "None")
    tg_text = item.get("text", "None")
    if type(tg_text) == list:
        tg_text = tg_text[0]
        tg_second = tg_text[1]["text"]
    print(tg_id, tg_from, tg_text, tg_second)
But no luck, because the indices are variable and the length of the messages is variable too.
In addition, even though the output wasn't close to my desired solution, I also tried:
for item in df["messages"]:
    tg_text = item.get("text", "None")
    if type(tg_text) == list:
        for i in tg_text:
            print(item, i)
mydict = {}
for k, v in df.items():
    print(k, v)
    mydict[k] = v
# Used df["text"].explode()
# Used json_normalize but no luck
Any thoughts?
Assuming a dataframe like the following:
df = pd.DataFrame({"messages": [
{"id": 21263, "from": "user3984", "text": "jajajajaja"},
{"id": 21264, "from": "user837", "text": ['Not sure', {'type': 'hashtag', 'text': '#confused'}]},
{"id": 21265, "from": "user3984", "text": ['O ', {'type': 'mention', 'text': '#user87324'}, ' really?']}
]})
First, expand the messages dictionaries into separate id, from and text columns.
expanded = pd.concat([df.drop("messages", axis=1), pd.json_normalize(df["messages"])], axis=1)
Then explode the dataframe to have a row for each entry in text:
exploded = expanded.explode("text")
Then expand the dictionaries that are in some of the entries, converting them to lists of text:
def convert_dict(entry):
    if type(entry) is dict:
        return [f"{k}: {v}" for k, v in entry.items()]
    else:
        return entry
exploded["text"] = exploded["text"].apply(convert_dict)
Finally, explode again to separate the converted dicts to separate rows.
final = exploded.explode("text")
The resulting output should look like this
id from text
0 21263 user3984 jajajajaja
1 21264 user837 Not sure
1 21264 user837 type: hashtag
1 21264 user837 text: #confused
2 21265 user3984 O
2 21265 user3984 type: mention
2 21265 user3984 text: #user87324
2 21265 user3984 really?
Just to share some ideas to flatten your list,
def flatlist(srclist):
    flat = []
    if isinstance(srclist, str):  # a plain string is a single text, keep it whole
        return [srclist]
    if srclist:  # check that srclist is not None or empty
        for item in srclist:
            if isinstance(item, str):  # plain text fragment
                flat.append(item)
            elif isinstance(item, dict):  # nested entity
                for x in item:
                    flat.append(x + ' ' + str(item[x]))  # combine key and value
    return flat
for item in df["messages"]:
    tg_text = item.get("text", "None")
    flat_list = flatlist(tg_text)  # get the flattened list
    for tg in flat_list:  # loop through the list and get the data you want
        tg_id = item.get("id", "None")
        tg_from = item.get("from", "None")
        print(tg_id, tg_from, tg)
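Since the question asks for the flattened rows to be stored rather than printed, the same idea can be written as a generator that yields one record per fragment. This is a minimal sketch in plain Python; flatten_messages is a hypothetical helper name, and the "key: value" formatting is an assumption taken from the desired output in the question.

```python
def flatten_messages(messages):
    """Yield one (id, sender, text) tuple per text fragment."""
    for msg in messages:
        text = msg.get("text", "")
        # A plain string is treated as a one-fragment list.
        parts = [text] if isinstance(text, str) else text
        for part in parts:
            if isinstance(part, dict):
                # Each key/value pair of a nested entity becomes its own row.
                for key, value in part.items():
                    yield msg.get("id"), msg.get("from"), f"{key}: {value}"
            else:
                yield msg.get("id"), msg.get("from"), part

messages = [
    {"id": 21263, "from": "user3984", "text": "jajajajaja"},
    {"id": 21264, "from": "user837",
     "text": ["Not sure", {"type": "hashtag", "text": "#confused"}]},
]
rows = list(flatten_messages(messages))
for row in rows:
    print(*row)
```

The resulting rows list can then be handed to pd.DataFrame(rows, columns=["id", "from", "text"]) if a data frame is needed.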

How to optimize a nested for loop with filtering in python

I'm trying to optimize a nested for-loop with filtering, the code looks like:
user_ids = ['A', 'B', 'C']
all_dict_1 = [
{
'id': 'all',
'user_id': 'B',
},
{
'id': 'foo',
'user_id': 'B',
},
{
'id': 'bar',
'user_id': 'A',
},
{
'id': 'bar',
'user_id': 'D',
},
]
all_dict_2 = [
{
'id': 'all',
'percentage': 0.2,
},
{
'id': 'foo',
'percentage': 0.3,
},
]
def _filter(dict_1, dict_2, user_ids):
    if str(dict_1['user_id']) in user_ids:
        if dict_2['id'] == 'all':
            dict_1['percentage'] = dict_2['percentage']
            return dict_1
        if dict_1['id'] == dict_2['id']:
            dict_1['percentage'] = dict_2['percentage']
            return dict_1
    return None
hits = [_filter(x, y, user_ids) for x in all_dict_1 for y in all_dict_2]
hits = [i for i in hits if i] # Removing None values
The all_dict_1 list is particularly long (thousands of objects), so the function takes more than 1s to run 😕
Are there any libraries or techniques to make it quicker?
The logic in your question can be reduced to the following list-comprehension, which should be slightly faster:
>>> hits = [{**x, 'percentage': y['percentage']}
...         for x in all_dict_1 for y in all_dict_2
...         if x['user_id'] in user_ids and
...         (y['id'] == 'all' or x['id'] == y['id'])]
>>> hits
[{'id': 'all', 'user_id': 'B', 'percentage': 0.2},
{'id': 'foo', 'user_id': 'B', 'percentage': 0.2},
{'id': 'foo', 'user_id': 'B', 'percentage': 0.3},
{'id': 'bar', 'user_id': 'A', 'percentage': 0.2}]
Make user_ids a set to speed up item in user_ids tests. Filter with this first, since it rejects entries that you don't have to process at all. Use filter to avoid repeated global name lookups.
user_ids = {'A', 'B', 'C'}
filtered_dict_1 = filter(
lambda item, ids=user_ids: item['user_id'] in ids,
all_dict_1
)
Change all_dict_2 into an actual dict to allow O(1) access instead of O(n) scanning. When iterating over your entries to change them, directly access the required percentage or use an explicit default.
all_dict_2 = {
'foo': 0.3,
}
def add_percentage(item, default=0.2, percentages=all_dict_2):
    item["percentage"] = percentages.get(item['id'], default)
    return item
Apply the transformation using map to avoid repeated lookups of your transformation function.
hits = list(map(add_percentage, filtered_dict_1))
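Put together, the set and dict ideas might look like the sketch below. It assumes all_dict_2 always contains an 'all' entry to serve as the default, and it builds new dicts instead of mutating the originals; the names percentages and default are illustrative.

```python
user_ids = {'A', 'B', 'C'}  # a set gives O(1) membership tests
all_dict_1 = [
    {'id': 'all', 'user_id': 'B'},
    {'id': 'foo', 'user_id': 'B'},
    {'id': 'bar', 'user_id': 'A'},
    {'id': 'bar', 'user_id': 'D'},
]
all_dict_2 = [
    {'id': 'all', 'percentage': 0.2},
    {'id': 'foo', 'percentage': 0.3},
]

# Index the percentages once so each lookup is O(1) instead of a scan.
percentages = {d['id']: d['percentage'] for d in all_dict_2}
default = percentages.get('all')  # assumption: an 'all' entry exists

hits = [
    {**item, 'percentage': percentages.get(item['id'], default)}
    for item in all_dict_1
    if item['user_id'] in user_ids
]
print(hits)
```

Note that this returns one hit per item, with the specific percentage taking priority over the 'all' default, whereas the nested comprehension returns one hit per matching pair.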

KeyError: 'name' Why can't I use 'name'?

I want to make a dictionary keyed by name, with each row's data as the value. In views.py I wrote:
import xlrd
from collections import OrderedDict
data_dict = {}
def try_to_int(arg):
    try:
        return int(arg)
    except (ValueError, TypeError):
        return arg
def main():
    book4 = xlrd.open_workbook('./data/excel1.xlsx')
    sheet4 = book4.sheet_by_index(0)
    data_dict_origin = OrderedDict()
    tag_list = sheet4.row_values(0)[1:]
    for row_index in range(1, sheet4.nrows):
        row = sheet4.row_values(row_index)[1:]
        row = list(map(try_to_int, row))
        data_dict_origin[row_index] = dict(zip(tag_list, row))
    if data_dict_origin['name'] in data_dict:
        data_dict[data_dict_origin['name']].update(data_dict_origin)
    else:
        data_dict[data_dict_origin['name']] = data_dict_origin
main()
When I printed out data_dict, it is:
OrderedDict([(1, {'user_id': '100', 'group': 'A', 'name': 'Tom', 'dormitory': 'C'}), (2, {'user_id': '50', 'group': 'B', 'name': 'Blear', 'dormitory': 'E'})])
My ideal dictionary is
dicts = {
    'Tom': {
        'user_id': '100',
        'group': 'A',
        'name': 'Tom',
        'dormitory': 'C'
    },
    'Blear': {
    },
}
How should I fix this? What should I write?
The code is using the wrong key. The keys of data_dict_origin are 1 and 2; it has no 'name' key. You can loop over its values instead:
for value in data_dict_origin.values():
    if value['name'] in data_dict:
        data_dict[value['name']].update(value)
    else:
        data_dict[value['name']] = value
Your data_dict_origin has numbers as keys and dicts as values (which technically makes it a sparse array of dicts). The "name" key exists in those dicts, not in data_dict_origin itself.
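As an illustration of the re-keying, here is a small sketch that uses the two rows printed in the question in place of the spreadsheet. Using setdefault is just one possible way to merge rows that share a name, not something the question requires.

```python
from collections import OrderedDict

# Stand-in for the data read from the spreadsheet (rows from the question).
data_dict_origin = OrderedDict([
    (1, {'user_id': '100', 'group': 'A', 'name': 'Tom', 'dormitory': 'C'}),
    (2, {'user_id': '50', 'group': 'B', 'name': 'Blear', 'dormitory': 'E'}),
])

data_dict = {}
for row in data_dict_origin.values():
    # setdefault creates an empty dict the first time a name is seen,
    # then update merges the row into it.
    data_dict.setdefault(row['name'], {}).update(row)

print(data_dict['Tom'])
```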

list of dict from existing list of dict in python

I am very new to Python and am looking for a solution to the issue below.
original_list = [{'Table':'A', 'Column':'C1','Data_Type':'int','Column_Style':None, 'others':'O1'},
{'Table':'A', 'Column':'C2', 'Data_Type':'varchar','Column_Style': '20','others':'O2'},
{'Table':'A', 'Column':'C2', 'Data_Type':'numeric','Column_Style': '10,2','others':'O3'}
]
I want to return a list of dictionaries whose keys are limited to ['Table', 'Data_Type', 'Column'], where the value of Data_Type is the concatenation of Data_Type and Column_Style.
# expecting output like below
new_list = [{'Table':'A', 'Column':'C1', 'Data_Type':'int'},
{'Table':'A', 'Column':'C2', 'Data_Type':'varchar(20)'},
{'Table':'A', 'Column':'C2', 'Data_Type':'numeric(10,2)'}
]
new_list = []
for innerDict in original_list:
    newDict = {}
    for key in innerDict:
        if key not in ['Data_Type', 'Column_Style', 'others']:
            newDict[key] = innerDict[key]
        elif key == 'Data_Type':
            if innerDict['Column_Style']:
                newDict['Data_Type'] = innerDict['Data_Type'] + '(' + innerDict['Column_Style'] + ')'
            else:
                newDict['Data_Type'] = innerDict['Data_Type']
    new_list.append(newDict)
new_list will contain the output that you requested, assuming that original_list is the input list as you have provided it above.
Alternatively, you can use a generator function that yields a dict matching your criteria for each element in your original list of dicts:
def gen_dict(ori_dict_list):
    columns = ['Table', 'Data_Type', 'Column']
    for element in ori_dict_list:
        d = {}
        for field in columns:
            if field == 'Data_Type':
                if element['Column_Style'] is None:
                    d['Data_Type'] = element['Data_Type']
                else:
                    d['Data_Type'] = "{}({})".format(element['Data_Type'], element["Column_Style"])
            else:
                d[field] = element[field]
        yield d
Demo:
>>> from pprint import pprint # Just to pretty print nothing special
>>> pprint(list(gen_dict(original_list)))
[{'Column': 'C1', 'Data_Type': 'int', 'Table': 'A'},
{'Column': 'C2', 'Data_Type': 'varchar(20)', 'Table': 'A'},
{'Column': 'C2', 'Data_Type': 'numeric(10,2)', 'Table': 'A'}]
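For comparison, the same transformation can be written as a list comprehension with a small helper. This is only a sketch; full_type is a hypothetical name, and it treats an empty or missing Column_Style the same as None.

```python
original_list = [
    {'Table': 'A', 'Column': 'C1', 'Data_Type': 'int', 'Column_Style': None, 'others': 'O1'},
    {'Table': 'A', 'Column': 'C2', 'Data_Type': 'varchar', 'Column_Style': '20', 'others': 'O2'},
    {'Table': 'A', 'Column': 'C2', 'Data_Type': 'numeric', 'Column_Style': '10,2', 'others': 'O3'},
]

def full_type(row):
    """Return Data_Type with Column_Style appended in parentheses, if any."""
    style = row.get('Column_Style')
    return f"{row['Data_Type']}({style})" if style else row['Data_Type']

new_list = [
    {'Table': row['Table'], 'Column': row['Column'], 'Data_Type': full_type(row)}
    for row in original_list
]
print(new_list)
```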

How to retrieve values from list: python

Summary: I have a list
list = [
{
'id': '1',
'elements': 'A',
'table': 'maps/partials/a.html',
'chart': 'maps/partials/charts/a.html',
},
{
'id': '2',
'elements': 'B',
'table': 'maps/partials/census/b.html',
'chart': 'maps/partials/charts/b.html',
},
{
'id': '3',
'elements': ('C','D','E','F'), # I believe this is wrong
'table': 'maps/partials/census/common.html',
'chart': 'maps/partials/charts/common.html',
},]
some_arr = ['E','2011','English','Total']
I want to compare elements with the items in some_arr.
This is what I am doing to loop over the list and compare it with some_arr:
for data in list:
    for i in range(len(some_arr)):
        if data['elements'] == some_arr[i]:
            print(data['id'])
As you can see, in the item with 'id': '3', elements has 4 values. How do I compare elements with some_arr in that case?
If you want to find any matches:
lst = [
{
'id': '1',
'elements': 'A',
'table': 'maps/partials/a.html',
'chart': 'maps/partials/charts/a.html',
},
{
'id': '2',
'elements': 'B',
'table': 'maps/partials/census/b.html',
'chart': 'maps/partials/charts/b.html',
},
{
'id': '3',
'elements': ('C','D','E','F'),
'table': 'maps/partials/census/common.html',
'chart': 'maps/partials/charts/common.html',
}]
some_arr = ['E','2011','English','Total']
st = set(some_arr)
from collections.abc import Iterable  # plain "collections" no longer exports Iterable on Python 3.10+
for d in lst:
    val = d["elements"]
    if isinstance(val, Iterable) and not isinstance(val, str):
        if any(ele in st for ele in val):
            print(d["id"])
    else:
        if val in st:
            print(d["id"])
Making a set of all the elements in some_arr gives you O(1) lookups. Using isinstance(val, Iterable) and not isinstance(val, str) catches any iterable value such as a list or tuple, and avoids iterating over a string, which could give false positives, as "F" is in "Foo".
any short-circuits on the first match, so if you actually want to print the id for every match, use a for loop. Lastly, if you are using Python 2, use basestring instead of str.
You can use set.intersection to get the intersection between them, but first you need to check whether the value is a tuple.
In the other case you can use in to check membership, that is, whether data['elements'] is in some_arr, and then print data['id']:
for data in list:
    d = data['elements']
    if isinstance(d, tuple):
        if set(d).intersection(some_arr):
            print(data['id'])
    elif d in some_arr:
        print(data['id'])
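Both answers boil down to asking whether two collections share a value. A compact sketch of that test, using set.isdisjoint and collecting the ids instead of printing them, could look like this; matching_ids is a hypothetical helper, and the records are trimmed to the keys that matter here.

```python
def matching_ids(records, wanted):
    """Return ids of records whose 'elements' share a value with wanted."""
    wanted = set(wanted)
    ids = []
    for rec in records:
        val = rec['elements']
        # Wrap a lone string so it is compared whole, not char by char.
        candidates = {val} if isinstance(val, str) else set(val)
        if not candidates.isdisjoint(wanted):
            ids.append(rec['id'])
    return ids

lst = [
    {'id': '1', 'elements': 'A'},
    {'id': '2', 'elements': 'B'},
    {'id': '3', 'elements': ('C', 'D', 'E', 'F')},
]
some_arr = ['E', '2011', 'English', 'Total']
print(matching_ids(lst, some_arr))
```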
