Removing keys and values from a list of strings - python

Edit: Sorry for misleading you all; I have amended the data. This is what the data looks like after loading it with pandas.
Thanks for always helping out. I have a list of strings like this:
Index Data
0 "[{"name": "bob", "age":"11", "id":"94884-0abdvnd-90", "participantid":"Me", "sentiment":"NEUTRAL", "content":"Hey, how you doing."}]"
1 "[{"name": "Roland", "age":"16", "id":"94884-0abdvnd-90", "participantid":"boy", "sentiment":"NEUTRAL", "content":"Hey, I'm doing good and you?."}]"]
And my goal is to remove certain keys and values so I only have the content left. That is:
Index Data
0 "[{"content":"Hey, how you doing."}]"
1 "[{"content":"Hey, I'm doing good and you?."}]"]
My initial approach was to convert each string to a list using eval and then loop over it, but that only works for one string at a time, i.e. I can only call eval on mylist[0], then mylist[1], manually.
Here is the sample of my code:
import ast
x = ast.literal_eval(mylist)
keys_to_keep = ["content"]
new_list = [{ key: item[key] for key in keys_to_keep } for item in x]
The above code raises an error unless I use x[0], x[1], etc. Is there a better way of doing this?
Thanks.

You can use the json module from the Python standard library here. It is safer than calling eval on input that might contain arbitrary code.
For example (assuming mylist is a list of strings, each of which is valid JSON holding a one-element list):
import json
keys_to_keep = ["content"]
new_list = []
for x in mylist:
    # json.loads returns the one-element list, so take its only dict.
    item = json.loads(x)[0]
    new_list.append({key: item[key] for key in keys_to_keep})
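If the result should stay a JSON string, as in the desired output in the question, a minimal sketch could decode each string, trim every dict, and encode it again. The helper name keep_keys is made up for illustration, and the sample rows are copied from the question:

```python
import json

# Sample rows copied from the question; each string is a JSON list holding one dict.
rows = [
    '[{"name": "bob", "age":"11", "id":"94884-0abdvnd-90", "participantid":"Me", "sentiment":"NEUTRAL", "content":"Hey, how you doing."}]',
    '[{"name": "Roland", "age":"16", "id":"94884-0abdvnd-90", "participantid":"boy", "sentiment":"NEUTRAL", "content":"Hey, I\'m doing good and you?."}]',
]

def keep_keys(raw, keys):
    """Decode one JSON string, keep only `keys` in each dict, and re-encode it.

    keep_keys is a hypothetical helper, not part of the json module.
    """
    records = json.loads(raw)
    trimmed = [{key: record[key] for key in keys} for record in records]
    return json.dumps(trimmed)

cleaned = [keep_keys(raw, ["content"]) for raw in rows]
print(cleaned[0])
```

If the strings sit in a pandas column named Data, as in the question, passing keep_keys to Series.apply should presumably give the same result per row; that part is untested here.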

Related

Python: Index slicing from a list for each index in for loop

I got stuck slicing data from a list inside a for loop.
list = ['[init.svc.logd]: [running]', '[init.svc.logd-reinit]: [stopped]']
What I am looking for is to print only the key, without its value (running/stopped).
Overall code:
for each in list:
    print(each[:]) #not really sure what may work here
Expected result:
init.svc.logd
Does anyone have a quick solution?
If you want to print only the key, you could use the split function to take whatever comes before the : and then replace [ and ] with nothing if you don't want them:
list = ['[init.svc.logd]: [running]', '[init.svc.logd-reinit]: [stopped]']
for each in list:
    print(each.split(":")[0].replace('[','').replace(']',''))
which gives:
init.svc.logd
init.svc.logd-reinit
You should probably be using a regular expression. The concept of 'key' in the question is ambiguous, as no data constructs that have keys are shown; it's merely a list of strings. So...
import re
list_ = ['[init.svc.logd]: [running]', '[init.svc.logd-reinit]: [stopped]']
for e in list_:
    # Raw string, so the backslashes reach the regex engine unchanged.
    if r := re.findall(r'\[(.*?)\]', e):
        print(r[0])
Output:
init.svc.logd
init.svc.logd-reinit
Note:
This is more robust than string-splitting solutions in cases where the data are unexpectedly malformed.
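If both the name and its state are needed, one possible sketch matches the whole line and builds a dictionary. The pattern assumes every line looks exactly like "[name]: [state]"; the variable names are illustrative only:

```python
import re

lines = ['[init.svc.logd]: [running]', '[init.svc.logd-reinit]: [stopped]']

# Two bracketed groups separated by ": "; lines that do not fit are skipped.
PATTERN = re.compile(r'\[(.*?)\]: \[(.*?)\]')

states = {}
for line in lines:
    match = PATTERN.fullmatch(line)
    if match:
        key, value = match.groups()
        states[key] = value

# Iterating over a dict yields its keys, i.e. the names only.
for key in states:
    print(key)
```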

Output a dictionary based on inputs from another dictionary and two lists

I have a dictionary and two lists, and I would like to output another dictionary that uses each list's name as the key and the sum of that list's entries as the value. However, I have no clue how to do this.
results = {'Apple':'14.0', 'Banana':'12.0', 'Orange':'2.0', 'Pineapple':'9.0'}
ListA = ['Apple','Pineapple']
ListB = ['Banana','Orange']
Output:
dicttotal = {'ListA':'23.0', 'ListB':'14.0'}
Edit: I have decided to use pandas to work with the above data as I find that the simplicity of pandas is more suited for my level of understanding. Thanks for the help everyone!
In Python you can use a list comprehension to make this easy to read:
items_for_a = [float(v) for i, v in results.items() if i in ListA]
total_a = sum(items_for_a)
The dicttotal you want to print is strange, though. I don't think you want to use variable names as dictionary keys.
In Python 2 you should use .iteritems() instead of .items().
You can use the following code to get ListA's sum. The same approach works for ListB; just try it yourself.
dicttotal = {}
ListASum = 0.0
ListBSum = 0.0
for item in ListA:
    if item in results:
        ListASum += float(results[item])
dicttotal['ListA'] = ListASum
Use reduce inside a dictionary literal to create the dictionary with an initial entry, then add the second entry. Had the key names not been variable names, a dict comprehension could perhaps have handled both. Passing 0.0 as the initial value keeps the accumulator a float, so this also works for lists with more than two items.
from functools import reduce
# lista and listb are assumed to be ListA and ListB, renamed to follow PEP 8.
d_total = {'ListA': str(reduce(lambda acc, key: acc + float(results[key]), lista, 0.0))}
d_total['ListB'] = str(reduce(lambda acc, key: acc + float(results[key]), listb, 0.0))
Result:
{'ListA': '23.0', 'ListB': '14.0'}
Note: PEP 8 snake_case is used for naming.
A one-liner using the (ugly) eval() function, although, as Eduard said, it's better not to use variable names as keys:
{list_name: str(sum([float(results[item]) for item in eval(list_name)])) for list_name in ['ListA', 'ListB']}
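Following the advice above about not using variable names as keys, one way to avoid eval() is to keep the two lists in a dictionary of their own. This is only a sketch; the groups dictionary is an assumption and not part of the original question:

```python
results = {'Apple': '14.0', 'Banana': '12.0', 'Orange': '2.0', 'Pineapple': '9.0'}

# Hypothetical container: the list names become ordinary dictionary keys.
groups = {
    'ListA': ['Apple', 'Pineapple'],
    'ListB': ['Banana', 'Orange'],
}

# Sum the float value of every item in each group and store it as a string.
totals = {
    name: str(sum(float(results[item]) for item in items))
    for name, items in groups.items()
}
print(totals)  # {'ListA': '23.0', 'ListB': '14.0'}
```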

Sorting a list of dict from redis in python

In my current project I generate a list of data; each entry comes from a key in redis, in a dedicated DB where only one type of key exists.
import operator
import redis
r = redis.StrictRedis(host=settings.REDIS_AD, port=settings.REDIS_PORT, db='14')
item_list = []
keys = r.keys('*')
for key in keys:
    item = r.hgetall(key)
    item_list.append(item)
newlist = sorted(item_list, key=operator.itemgetter('Id'))
The code above lets me retrieve the data and create a list of dicts, each containing the information of one entry. The problem is that I would like to sort them by ID so they come out in order when displayed in the HTML table in my template, but the sorted function doesn't seem to work, since the table isn't sorted.
Any idea why the sorted line doesn't work? I suppose I'm missing something to make it work, but I can't find what.
EDIT:
Thanks to the answer in the comments: the problem was that my 'Id' comes out of redis as a string and needed to be cast to int to be sorted.
key=lambda d: int(d['Id'])
Values returned from redis are apparently strings, and strings do not sort numerically (for example, "10" < "2" evaluates to True).
Therefore you need to cast it to a numerical value, probably to int (since they seem to be IDs):
newlist = sorted(item_list, key=lambda d: int(d['Id']))
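The difference is easy to reproduce without a redis server. The sketch below uses a hand-written list as a stand-in for the hashes; the 'name' field and its values are invented for the example:

```python
# Stand-in for the hashes returned by r.hgetall(); all values are strings.
item_list = [
    {'Id': '10', 'name': 'c'},
    {'Id': '2', 'name': 'a'},
    {'Id': '1', 'name': 'b'},
]

# String keys compare character by character, so '10' sorts before '2'.
by_text = sorted(item_list, key=lambda d: d['Id'])

# Converting to int gives the numeric order.
by_number = sorted(item_list, key=lambda d: int(d['Id']))

print([d['Id'] for d in by_text])    # ['1', '10', '2']
print([d['Id'] for d in by_number])  # ['1', '2', '10']
```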

Finding the same second elements in nested lists - recursive function

I have nested lists that look like this:
[['CELTIC AMBASSASDOR', 'Warrenpoint'],['HAV SNAPPER', 'Silloth'],['BONAY', 'Antwerp'],['NINA', 'Antwerp'],['FRI SKIEN', 'Warrenpoint']]
and so on. How can I find the lists that have the same second element? For example:
['CELTIC AMBASSASDOR', 'Warrenpoint']
['FRI SKIEN', 'Warrenpoint']
['BONAY', 'Antwerp']
['NINA', 'Antwerp']
The list is too long (I'm reading it from a .csv file) and I can't know in advance what to search for (e.g. I can't search for 'Antwerp' to find all Antwerps, because I don't know all of the texts in the csv file). So I thought I needed a recursive function that keeps searching until it has found all nested lists, grouped by their second items. I couldn't figure out how to write the recursive function. If anyone has a better solution, it would be much appreciated.
There's no need to use recursion here. Create a dictionary with a key of the second element and values of the whole sublist, then create a result that only includes the matches you're interested in:
import collections
l = [['CELTIC AMBASSASDOR', 'Warrenpoint'],['HAV SNAPPER', 'Silloth'],['BONAY', 'Antwerp'],['NINA', 'Antwerp'],['FRI SKIEN', 'Warrenpoint']]
d = collections.defaultdict(list)
for item in l:
    d[item[1]].append(item)
result = dict(item for item in d.items() if len(d[item[0]]) > 1)
Result:
>>> import pprint
>>> pprint.pprint(result)
{'Antwerp': [['BONAY', 'Antwerp'], ['NINA', 'Antwerp']],
 'Warrenpoint': [['CELTIC AMBASSASDOR', 'Warrenpoint'],
                 ['FRI SKIEN', 'Warrenpoint']]}
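A one-liner alternative (as written it only runs on Python 2, where zip returns a list, and it only matches second elements that occur exactly twice):
filter(lambda x:x[1] in set(filter(lambda x:zip(*l)[1].count(x)==2,zip(*l)[1])),l)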
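For Python 3, and when a flat list of matching sublists is enough, a possible sketch counts the second elements first and then filters. The names ships, port_counts and shared are chosen for this example only:

```python
from collections import Counter

ships = [['CELTIC AMBASSASDOR', 'Warrenpoint'], ['HAV SNAPPER', 'Silloth'],
         ['BONAY', 'Antwerp'], ['NINA', 'Antwerp'], ['FRI SKIEN', 'Warrenpoint']]

# Count how often each second element occurs.
port_counts = Counter(port for _, port in ships)

# Keep the sublists whose second element appears more than once, in input order.
shared = [ship for ship in ships if port_counts[ship[1]] > 1]
print(shared)
```

This walks the data twice but avoids calling count() once per element, which may matter for a long csv file.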

Python split dictionary key at comma

I have a dictionary called muncounty. The keys are a municipality and a county, separated by a comma, and the value is a zip code:
muncounty['mun'+','+'county'] = 12345
My goal is to split the keys at the comma separating the mun and county and extract only the mun.
I have tried
muncounty.keys().split(',')
I know this does not work because you cannot use the split function on a list.
You need some kind of looping, e.g. a list comprehension:
[key.split(',') for key in muncounty.keys()]
Your question and example code aren't very clear, but I think what you want is this:
for key in muncounty.keys():
    mun, county = key.split(',')
Your current code is trying to perform split on a list, which, as you quite rightly point out, can't be done. The code above goes through each key and performs the split on it individually.
You could use map and a lambda function.
di = {'a.b':1}
map(lambda k: k.split('.'), di.keys())
[x.split(',')[0] for x in muncounty.keys()]
But I would recommend storing your keys as tuples (municipality, county).
Well, verbose mode for that:
muncounty = {}
muncounty['mun'+','+'county'] = 12345
muncounty['mun2'+','+'county2'] = 54321
l = []
for i in muncounty:
    l.append(i)
muns = []
for k in l:
    muns.append(k.split(',')[0])
But dude... this is a really bad way to store mun/counties ;-)
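As a sketch of the tuple-key suggestion made above, assuming the dictionary can be built with (municipality, county) tuples from the start (the sample entries mirror the ones in the verbose example):

```python
# Sample data keyed by (municipality, county) tuples instead of joined strings.
muncounty = {
    ('mun', 'county'): 12345,
    ('mun2', 'county2'): 54321,
}

# No string splitting needed: unpack each key directly.
muns = [mun for mun, _county in muncounty]
print(muns)  # ['mun', 'mun2']

# Lookups use the same tuple form.
print(muncounty[('mun', 'county')])  # 12345
```

Tuples also avoid trouble when a municipality name itself contains a comma.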
