Refresh a list content with another list in Python - python

How would I extend the content of a given list with another given list without using the method .extend()? I imagine that I could use something with dictionaries.
Code
>>> tags =['N','O','S','Cl']
>>> itags =[1,2,4,3]
>>> anew =['N','H']
>>> inew =[2,5]
I need a function which returns the refreshed lists
tags =['N','O','S','Cl','H']
itags =[3,2,4,3,5]
When an element is already in the list, the number in the other list is added. If I use the extend() method, the element N will appear in list tags twice:
>>> tags.extend(anew)
>>> itags.extend(inew)
>>> print tags,itags
['N', 'O', 'S', 'Cl', 'N', 'H'] [1, 2, 4, 3, 2, 5]

You probably want a Counter for this.
from collections import Counter
tags = Counter({"N":1, "O":2, "S": 4, "Cl":3})
new = Counter({"N": 2, "H": 5})
tags = tags + new
print tags
output:
Counter({'H': 5, 'S': 4, 'Cl': 3, 'N': 3, 'O': 2})

If the order of elements matters, I'd use collections.Counter like so:
from collections import Counter
tags = ['N','O','S','Cl']
itags = [1,2,4,3]
new = ['N','H']
inew = [2,5]
cnt = Counter(dict(zip(tags, itags))) + Counter(dict(zip(new, inew)))
out = tags + [el for el in new if el not in tags]
iout = [cnt[el] for el in out]
print(out)
print(iout)
If the order does not matter, there is a simpler way to obtain out and iout:
out = cnt.keys()
iout = cnt.values()
If you don't have to use a pair of lists, then working with Counter directly is a natural fit for your problem.

If you need to maintain the order, you may want to use an OrderedDict instead of a Counter:
from collections import OrderedDict
tags = ['N','O','S','Cl']
itags = [1,2,4,3]
new = ['N','H']
inew = [2,5]
od = OrderedDict(zip(tags, itags))
for x, i in zip(new, inew):
    od[x] = od.setdefault(x, 0) + i
print od.keys()
print od.values()
On Python 3.x, use list(od.keys()) and list(od.values()).
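Since the question asks for a function that returns the refreshed lists, the same idea can be wrapped up as a minimal sketch. The name merge_counts and its parameter names are made up for illustration. It assumes Python 3.7+, where a plain dict keeps insertion order, and it assumes the tags in the first list are unique.

```python
def merge_counts(tags, itags, new_tags, new_counts):
    """Return (tags, counts) with new_counts added to the matching tags."""
    # A plain dict keeps insertion order on Python 3.7+, so the original
    # tag order is preserved and unseen tags are appended at the end.
    totals = dict(zip(tags, itags))
    for tag, count in zip(new_tags, new_counts):
        totals[tag] = totals.get(tag, 0) + count
    return list(totals), list(totals.values())

tags, itags = merge_counts(['N', 'O', 'S', 'Cl'], [1, 2, 4, 3], ['N', 'H'], [2, 5])
print(tags)   # ['N', 'O', 'S', 'Cl', 'H']
print(itags)  # [3, 2, 4, 3, 5]
```

On older Python versions the OrderedDict shown above would be the safer container.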

Related

Algorithm to split the values of a list into a specific format

Can you help me with my algorithm in Python to parse a list, please?
List = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU', 'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_TITI_TUTU', 'PPPP_TOTO_EHEH_OOOO_AAAAA', 'PPPP_TOTO_EHEH_IIII_SSSS_RRRR']
In this list, I have to get the last two words (PARENT_CHILD). For example for PPPP_TOTO_TATA_TITI_TUTU, I only get TITI_TUTU
When there are duplicates, for example when my list contains both PPPP_TOTO_TATA_TITI_TUTU and PPPP_TOTO_EHEH_TITI_TUTU, I would get TITI_TUTU twice. In that case I want to recover the GRANDPARENT for each of them, that is: TATA_TITI_TUTU and EHEH_TITI_TUTU
As long as the names are duplicated, we take the level above.
But in this case, if I added the GRANDPARENT for EHEH_TITI_TUTU, I also want it to be added for all those that have EHEH in the name, so instead of having OOOO_AAAAA, I would like to have EHEH_OOOO_AAAAA, and likewise EHEH_IIII_SSSS_RRRR
My final list =
['ZZZZ_XXXX', 'TATA_TITI_TUTU', 'MMMM_TITI_TUTU', 'EHEH_TITI_TUTU', 'EHEH_OOOO_AAAAA', 'EHEH_IIII_SSSS_RRRR']
Thank you in advance.
Here is the code I started to write:
json_paths = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU',
              'PPPP_TOTO_EHEH_TITI_TUTU', 'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_OOOO_AAAAA']
cols_name = []
for path in json_paths:
    acc=2
    col_name = '_'.join(path.split('_')[-acc:])
    tmp = cols_name
    while col_name in tmp:
        acc += 1
        idx = tmp.index(col_name)
        cols_name[idx] = '_'.join(json_paths[idx].split('_')[-acc:])
        col_name = '_'.join(path.split('_')[-acc:])
        tmp = ['_'.join(item.split('_')[-acc:]) for item in json_paths].pop()
    cols_name.append(col_name)
    print(cols_name.index(col_name), col_name)
cols_name
help ... with ... algorithm
use a dictionary for the initial container while iterating
keys will be PARENT_CHILD's and values will be lists containing grandparents.
>>> import collections
>>> s = 'PPPP_TOTO_TATA_TITI_TUTU'
>>> d = collections.defaultdict(list)
>>> *_,grandparent,parent,child = s.rsplit('_',maxsplit=3)
>>> d['_'.join([parent,child])].append(grandparent)
>>> d
defaultdict(<class 'list'>, {'TITI_TUTU': ['TATA']})
>>> s = 'PPPP_TOTO_EHEH_TITI_TUTU'
>>> *_,grandparent,parent,child = s.rsplit('_',maxsplit=3)
>>> d['_'.join([parent,child])].append(grandparent)
>>> d
defaultdict(<class 'list'>, {'TITI_TUTU': ['TATA', 'EHEH']})
After iteration, determine whether any value contains multiple grandparents.
If it does, join the parent_child to each grandparent.
Additionally, find all the other parent_child's that have these grandparents and prepend the grandparent to them. To make this easier, build a second dictionary during iteration: {grandparent:[list_of_children],...}.
If the parent_child has only one grandparent, use it as-is.
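The following is a rough sketch of one possible reading of the rules in the question. It is not the two-dictionary approach outlined above: instead it tracks, for each path, how many trailing tokens are kept. The function name shorten_paths and the two-pass design are assumptions made for illustration, and the second pass does not re-check for new collisions.

```python
from collections import Counter

def shorten_paths(paths, sep="_"):
    parts = [p.split(sep) for p in paths]
    # Number of trailing tokens kept for each path (PARENT_CHILD to start).
    depth = [min(2, len(p)) for p in parts]

    def name(i):
        return sep.join(parts[i][-depth[i]:])

    # Pass 1: while shortened names collide, take one more level for the
    # colliding paths (as long as they still have a level left to take).
    while True:
        counts = Counter(name(i) for i in range(len(parts)))
        clashing = [i for i in range(len(parts))
                    if counts[name(i)] > 1 and depth[i] < len(parts[i])]
        if not clashing:
            break
        for i in clashing:
            depth[i] += 1

    # Pass 2: every token that was added as a disambiguating prefix is also
    # applied to the other paths that contain it further up.
    prefixes = {parts[i][-depth[i]] for i in range(len(parts)) if depth[i] > 2}
    for i, p in enumerate(parts):
        for pos, token in enumerate(p[:-depth[i]]):
            if token in prefixes:
                depth[i] = len(p) - pos
                break

    return [name(i) for i in range(len(parts))]

paths = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU',
         'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_TITI_TUTU',
         'PPPP_TOTO_EHEH_OOOO_AAAAA', 'PPPP_TOTO_EHEH_IIII_SSSS_RRRR']
print(shorten_paths(paths))
```

With the list from the question, this reproduces the asker's expected final list, including EHEH_IIII_SSSS_RRRR, where EHEH sits two levels above SSSS_RRRR.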
Instead of splitting each string the info could be extracted with a regular expression.
pattern = r'^.*?_([^_]*)_([^_]*_[^_]*)$'
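As a quick illustration of how that pattern could be used (a sketch, not part of the original answer): group 1 captures the grandparent and group 2 captures PARENT_CHILD. Note that it only matches strings with at least four underscore-separated tokens, so shorter names would need separate handling.

```python
import re

# Pattern quoted from the answer: group 1 is the grandparent,
# group 2 is PARENT_CHILD.
pattern = re.compile(r'^.*?_([^_]*)_([^_]*_[^_]*)$')

match = pattern.match('PPPP_TOTO_TATA_TITI_TUTU')
grandparent, parent_child = match.groups()
print(grandparent)   # TATA
print(parent_child)  # TITI_TUTU

# Fewer than four tokens: no match, so the result is None.
print(pattern.match('A_B_C'))  # None
```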

Is there a way I can make a list out of this?

I wrote this code to print how many times each number occurs in the list I provided. The output works, but I want to put all the values I get into a single list. How can I do that?
This is my code...
i = [5,5,7,9,9,9,9,9,8,8]
def num_list(i):
    return [(i.count(x),x) for x in set(i)]
for tv in num_list(i):
    if tv[1] > 1:
        print(tv)
The output that I get is
(2, 8)
(5, 9)
(2, 5)
(1, 7)
but I want the output to be like
[2,8,5,9,2,5,1,7]
How can I do that??
Just do:
tvlist = []
for tv in num_list(i):
    if tv[1] > 1:
        tvlist.extend(tv)
print(tvlist)
Or a list comprehension:
tvlist = [x for tv in num_list(i) if tv[1] > 1 for x in tv]
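Also, your function could simply use collections.Counter. Note that Counter(i).items() yields (value, count) pairs, so swap them to keep the (count, value) order of your original function:
from collections import Counter
def num_list(i):
    return [(count, x) for x, count in Counter(i).items()]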
This is how I would flatten the list:
import itertools
flattened_iter = itertools.chain.from_iterable(num_list(i))
print(list(flattened_iter))
As everyone else has mentioned, collections.Counter is likely to perform significantly better on large lists.
If you would rather implement it yourself, you can do so fairly easily:
def myCounter(a_list):
    counter = {}
    for item in a_list:
        # in modern python versions order is preserved in dicts
        counter[item] = counter.get(item,0) + 1
    for unique_item in counter:
        # make it a generator just for ease
        # we will just yield twice to create a flat list
        yield counter[unique_item]
        yield unique_item
i = [5,5,7,9,9,9,9,9,8,8]
print(list(myCounter(i)))
Using a collections.Counter is more efficient. This paired with itertools.chain will get you your desired result:
from collections import Counter
from itertools import chain
i = [5,5,7,9,9,9,9,9,8,8]
r = list(chain(*((v, k) for k, v in Counter(i).items() if v > 1)))
print(r)
[2, 5, 5, 9, 2, 8]
Without itertools.chain:
r = []
for k, v in Counter(i).items():
    if v > 1:
        r.extend((v, k))

Python - How do i build a dictionary from a text file?

For the class Data Structures and Algorithms at Tilburg University, I got this question in an in-class test:
Build a dictionary from testfile.txt with only unique keys; if a key appears again, its number should be added to the total for that product class.
The text file looked like this (it was not a .csv file):
apples,1
pears,15
oranges,777
apples,-4
oranges,222
pears,1
bananas,3
So apples will be -3 and the output would be {"apples": -3, "oranges": 999...}
In the exams I am not allowed to import any external packages besides the normal ones (pcinput, math, etc.). I am also not allowed to use the internet.
I have no idea how to accomplish this, and it seems to be a big gap in my Python skills: this kind of question is not covered in a 'dictionaries in Python' video on YouTube (maybe it would be too hard), but it is also not covered in an expert course, because there it would be too simple.
Hope you guys can help!
from collections import Counter
from sys import exit
from os.path import exists, isfile
## I did not finish it, but what I wanted to achieve was to build a list of the
## strings and their corresponding integers, then use the Counter class to add
## them together
## by splitting the string, using the comma as the split point.
filename = input("filename for input: ")
if not isfile(filename):
    print(filename, "does not exist")
    exit()
keys = []
values = []
with open(filename) as f:
    xs = f.read().split()
    for i in xs:
        keys.append([i])
print(keys)
my_dict = {}
for i in range(len(xs)):
    my_dict[xs[i]] = xs.count(xs[i])
print(my_dict)
word_and_integers_dict = dict(zip(keys, values))
print(word_and_integers_dict)
values2 = my_dict.split(",")
for j in values2:
    print( value2 )
The output is this:
[['schijndel,-3'], ['amsterdam,0'], ['tokyo,5'], ['tilburg,777'], ['zaandam,5']]
{'zaandam,5': 1, 'tilburg,777': 1, 'amsterdam,0': 1, 'tokyo,5': 1, 'schijndel,-3': 1}
{}
So I got a dictionary from it, but I did not separate the values.
The error message is this:
28 values2 = my_dict.split(",") <-- here was the error
29 for j in values2:
30 print( value2 )
AttributeError: 'dict' object has no attribute 'split'
I don't understand what your code is actually doing, and I think you don't know what your variables contain, but this is an easy problem to solve in Python. Split into a list, split each item again, and count:
>>> text = "apples,1 pears,15 oranges,777 apples,-4 oranges,222 pears,1 bananas,3"
>>> parts = text.split()
>>> parts
['apples,1', 'pears,15', 'oranges,777', 'apples,-4', 'oranges,222', 'pears,1', 'bananas,3']
Then split again. Behold the list comprehension. This is an idiomatic way to transform one list into another in Python. Note that the numbers are strings, not ints yet.
>>> pairs = [s.split(',') for s in parts]
>>> pairs
[['apples', '1'], ['pears', '15'], ['oranges', '777'], ['apples', '-4'], ['oranges', '222'], ['pears', '1'], ['bananas', '3']]
Now you want to iterate over the pairs and sum the counts for each fruit. This calls for a dict:
>>> result = {}
>>> for fruit, countstr in pairs:
...     if fruit not in result:
...         result[fruit] = 0
...     result[fruit] += int(countstr)
...
>>> result
{'pears': 16, 'apples': -3, 'oranges': 999, 'bananas': 3}
This pattern of adding an element if it doesn't exist comes up frequently. You should check out defaultdict in the collections module. If you use that, you don't even need the if.
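A minimal sketch of that defaultdict variant, using the same sample data as above (the variable names text and totals are chosen here for illustration):

```python
from collections import defaultdict

# Same sample data as in the answer, one "name,count" pair per token.
text = "apples,1 pears,15 oranges,777 apples,-4 oranges,222 pears,1 bananas,3"

totals = defaultdict(int)  # missing keys start at int() == 0
for item in text.split():
    fruit, count_str = item.split(',')
    totals[fruit] += int(count_str)  # no "if fruit not in totals" needed

print(dict(totals))
# {'apples': -3, 'pears': 16, 'oranges': 999, 'bananas': 3}
```

If imports are off limits in the exam, the plain dict with the explicit if does the same job.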
Let's walk through what you need to do. First, check if the file exists and read the contents to a variable. Second, parse each line: you need to split the line on the comma, convert the number from a string to an integer, and then pass the values to a dictionary. In this case I would recommend using defaultdict from collections, but we can also do it with a standard dictionary.
from os.path import isfile
from sys import exit
filename = input("filename for input: ")
if not isfile(filename):
    print(filename, "does not exist")
    exit()
# this reads the file to a list, removing newline characters
# and skipping blank lines
with open(filename) as f:
    line_list = [x.strip() for x in f if x.strip()]
# create a dictionary
my_dict = {}
# update the value in the dictionary if it already exists,
# otherwise add it to the dictionary
for line in line_list:
    k, v_str = line.split(',')
    if k in my_dict:
        my_dict[k] += int(v_str)
    else:
        my_dict[k] = int(v_str)
# print the dictionary
table_str = '{:<30}{}'
print(table_str.format('Item','Count'))
print('='*35)
for k,v in sorted(my_dict.items()):
    print(table_str.format(k,v))

Keep duplicates in a list in Python

I know this is probably an easy answer but I can't figure it out. What is the best way in Python to keep the duplicates in a list:
x = [1,2,2,2,3,4,5,6,6,7]
The output should be:
[2,6]
I found this link: Find (and keep) duplicates of sublist in python, but I'm still relatively new to Python and I can't get it to work for a simple list.
I'd use a collections.Counter:
from collections import Counter
x = [1, 2, 2, 2, 3, 4, 5, 6, 6, 7]
counts = Counter(x)
output = [value for value, count in counts.items() if count > 1]
Here's another version that keeps the order in which items were first duplicated. It only assumes that the sequence passed in contains hashable items, and it will work back to when set or yield was introduced to the language (whenever that was).
def keep_dupes(iterable):
    seen = set()
    dupes = set()
    for x in iterable:
        if x in seen and x not in dupes:
            yield x
            dupes.add(x)
        else:
            seen.add(x)
print(list(keep_dupes([1,2,2,2,3,4,5,6,6,7])))
This is a short way to do it if the list is sorted already:
x = [1,2,2,2,3,4,5,6,6,7]
from itertools import groupby
print([key for key,group in groupby(x) if len(list(group)) > 1])
List Comprehension in combination with set() will do exactly what you want.
list(set([i for i in x if x.count(i) >= 2]))
>>> [2,6]
keepin' it simple:
array2 = []
aux = 0
aux2=0
for i in x:
    aux2 = i
    if(aux2==aux):
        array2.append(i)
    aux= i
list(set(array2))
That should work
Not efficient but just to get the output, you could try:
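import numpy as np
def check_for_repeat(check_list):
    # work on a copy so the caller's list is not modified
    temp_list = list(check_list)
    repeated_list = []
    for idx in range(len(temp_list)):
        elem = temp_list[idx]
        # blank out the current slot, then see if the value occurs elsewhere
        temp_list[idx] = None
        if elem in temp_list:
            repeated_list.append(elem)
    repeated_list = np.array(repeated_list)
    return list(np.unique(repeated_list))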

Python NLTK - counting occurrence of word in brown corpora based on returning top results by tag

I'm trying to return the top occurring values from a corpora for specific tags. I can get the tag and the word themselves to return fine however I can't get the count to return within the output.
import itertools
import collections
import nltk
from nltk.corpus import brown
words = brown.words()
def findtags(tag_prefix, tagged_text):
    cfd = nltk.ConditionalFreqDist((tag, word) for (word, tag) in tagged_text
                                   if tag.startswith(tag_prefix))
    return dict((tag, cfd[tag].keys()[:5]) for tag in cfd.conditions())
tagdictNNS = findtags('NNS', nltk.corpus.brown.tagged_words())
This returns the following fine
for tag in sorted(tagdictNNS):
    print tag, tagdictNNS[tag]
I have managed to return the count of every NN based word using this:
pluralLists = tagdictNNS.values()
pluralList = list(itertools.chain(*pluralLists))
for s in pluralList:
    sincident = words.count(s)
    print s
    print sincident
That returns everything.
Is there a better way of inserting the occurrence count into the dict tagdictNNS[tag]?
edit 1:
pluralLists = tagdictNNS.values()[:5]
pluralList = list(itertools.chain(*pluralLists))
returns them in size order from the for s loop. still not the right way to do it though.
edit 2: updated dictionaries so they actually search for NNS plurals.
I might not understand, but given your tagdictNNS:
>>> from operator import itemgetter
>>> new = {}
>>> for k,v in tagdictNNS.items():
...     new[k] = len(tagdictNNS[k])
...
>>> new
{'NNS$-TL-HL': 1, 'NNS-HL': 5, 'NNS$-HL': 4, 'NNS-TL': 5, 'NNS-TL-HL': 5, 'NNS+MD': 2, 'NNS$-NC': 1, 'NNS-TL-NC': 1, 'NNS$-TL': 5, 'NNS': 5, 'NNS$': 5, 'NNS-NC': 5}
Then you can do something like:
>>> sorted(new.items(), key=itemgetter(1), reverse=True)[:2]
[('NNS-HL', 5), ('NNS-TL', 5)]
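To get the word counts themselves into the per-tag dictionary, one option is to keep a Counter per tag and ask it for its most common entries. The sketch below uses only the standard library and a small made-up list of (word, tag) pairs in place of brown.tagged_words(); the function name top_words_by_tag is hypothetical. As far as I know, FreqDist in NLTK 3 is a Counter subclass, so cfd[tag].most_common(5) should behave the same way there, but treat that as an assumption to check against your NLTK version.

```python
from collections import Counter, defaultdict

# Made-up (word, tag) pairs standing in for brown.tagged_words().
tagged_text = [
    ("years", "NNS"), ("members", "NNS"), ("years", "NNS"),
    ("States", "NNS-TL"), ("years", "NNS"), ("members", "NNS"),
    ("people", "NNS"), ("States", "NNS-TL"), ("run", "VB"),
]

def top_words_by_tag(tag_prefix, tagged_text, n=5):
    """Map each matching tag to its n most frequent (word, count) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_text:
        if tag.startswith(tag_prefix):
            counts[tag][word] += 1
    return {tag: counter.most_common(n) for tag, counter in counts.items()}

for tag, pairs in sorted(top_words_by_tag("NNS", tagged_text).items()):
    print(tag, pairs)
# NNS [('years', 3), ('members', 2), ('people', 1)]
# NNS-TL [('States', 2)]
```

Counting while iterating over the tagged words once avoids calling words.count(s) for every word, which rescans the whole corpus each time.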
