Create Dictionary With Position of Character and its Respective Count - python

I have multiple strings like:
0000NNN
000ANNN
I want a dictionary whose keys are the character positions (starting at 1) and whose values are the number of strings that have a 0 at that position (characters other than 0 can be ignored). So for the above strings, the output would be:
1:2
2:2
3:2
4:1
5:0
6:0
7:0
So far I tried this:
ctr=1
my_dict={}
for word in string_list:
    for letter in word:
        if letter == "0":
            if ctr not in my_dict.keys():
                my_dict[ctr]=1
            else:
                my_dict[ctr]+=1
        else:
            pass
print(my_dict)
What am I doing wrong? The output is not correct.

It looks like you never increment or reset ctr, and you never add my_dict[ctr]=0 for positions 5, 6 and 7. Something like this should work:
string_list = ['0000NNN','000ANNN']
my_dict={}
for word in string_list:
    ctr=1 #Moved: restart at position 1 for every word
    for letter in word:
        if letter == "0":
            if ctr not in my_dict:
                my_dict[ctr]=1
            else:
                my_dict[ctr]+=1
        elif ctr not in my_dict:
            my_dict[ctr]=0 #Added: record the position without overwriting an earlier count
        ctr+=1 #Added: move to the next position
print(my_dict) #{1: 2, 2: 2, 3: 2, 4: 1, 5: 0, 6: 0, 7: 0}

You can use collections.Counter in the following way:
>>> from collections import Counter
>>> strings = ['0000NNN', '000ANNN']
>>> Counter(i for string in strings for i, c in enumerate(string, start=1) if c == '0')
Counter({1: 2, 2: 2, 3: 2, 4: 1})
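Counter leaves out positions that never hold a '0' (5 to 7 here), but looking up a missing key on a Counter returns 0 instead of raising KeyError. A minimal sketch that uses this to fill in every position, assuming all strings have the same length as the first one:

```python
from collections import Counter

strings = ['0000NNN', '000ANNN']

# Count '0's per 1-based position; positions with no '0' are simply absent.
counts = Counter(i for string in strings
                 for i, c in enumerate(string, start=1) if c == '0')

# Missing keys on a Counter read as 0 (without being inserted),
# so every position gets an entry.
# Assumes every string has the same length as strings[0].
my_dict = {pos: counts[pos] for pos in range(1, len(strings[0]) + 1)}
print(my_dict)  # {1: 2, 2: 2, 3: 2, 4: 1, 5: 0, 6: 0, 7: 0}
```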

You're not incrementing ctr, so ctr == 1 always. But that alone won't get you what you want: right now, you're counting all of the '0's in all of your words and storing them under the dictionary key 1.
Instead, you want to keep track of the position and the count separately.
my_dict = {}
for pos in range(1, 8): # assuming your "words" are all the same length - 7 chars; positions start at 1
    my_dict[pos] = 0
    for word in string_list:
        if word[pos - 1] == '0': # string indices start at 0
            my_dict[pos] += 1

You could use zip() like this.
s = ["0000NNN", "000ANNN"]
d = {}
for i,v in enumerate(zip(s[0], s[1]),1):
    d[i] = v.count('0')
print(d)
{1: 2, 2: 2, 3: 2, 4: 1, 5: 0, 6: 0, 7: 0}
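zip(s[0], s[1]) only handles exactly two strings. Unpacking the list with zip(*s) groups the characters at each position for any number of strings. A sketch, assuming all strings have the same length (zip stops at the shortest one); the third string is made up for illustration:

```python
s = ["0000NNN", "000ANNN", "0NNNNNN"]  # third string added for illustration

# zip(*s) yields one tuple per position: ('0', '0', '0'), ('0', '0', 'N'), ...
d = {i: column.count('0') for i, column in enumerate(zip(*s), 1)}
print(d)  # {1: 3, 2: 2, 3: 2, 4: 1, 5: 0, 6: 0, 7: 0}
```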

Something along the following lines should point you in the right direction:
strings = ["0000NNN", "000ANNN"]
d = {i+1: sum(s[i] == "0" for s in strings) for i in range(7)}
# {1: 2, 2: 2, 3: 2, 4: 1, 5: 0, 6: 0, 7: 0}
The 7 would be len(strings[0]) in the general case (assuming all strings have the same length).

Related

Empty Dictionary while trying to count number of different characters in Input String using a dictionary

I get an empty dictionary when I try to count the number of different characters (upper case and lower case) in a given string.
Here is the code I tried. In the if branch I put a = 1 just so that the branch does nothing:
input_str = "AAaabbBBCC"
histogram = dict()
for idx in range(len(input_str)):
    val = input_str[idx]
    # print(val)
    if val not in histogram:
        # do nothing
        a = 1
    else:
        histogram[val] = 1
print(histogram)
#print("number of different are :",len(histogram))
Here is my code's output:
{}
I am expecting output like this:
{ 'A': 1,
'a': 1,
'b': 1,
'B': 1,
'C': 1
}
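Your if/else branches are the wrong way round: the only assignment is in the else branch, which only runs when val is already a key, so nothing is ever added and the dictionary stays empty. If you want to count how many times each distinct character occurs, you could do it this way: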
input_str = "AAaabbBBCC"
histogram = dict()
for idx in range(len(input_str)):
    val = input_str[idx]
    if val not in histogram:
        #add to dictionary
        histogram[val] = 1
    else:
        #increase count
        histogram[val] += 1
>>> histogram
{'A': 2, 'a': 2, 'b': 2, 'B': 2, 'C': 2}
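If what you actually need is the output from the question (each distinct character mapped to 1) or just the number of different characters, no explicit loop is needed. A sketch using dict.fromkeys, len and collections.Counter:

```python
from collections import Counter

input_str = "AAaabbBBCC"

# Each distinct character mapped to 1, in order of first appearance.
distinct = dict.fromkeys(input_str, 1)
print(distinct)         # {'A': 1, 'a': 1, 'b': 1, 'B': 1, 'C': 1}

# Number of different characters (case-sensitive, so 'A' and 'a' differ).
print(len(distinct))    # 5

# Full occurrence counts, equivalent to the loop above.
histogram = Counter(input_str)
print(dict(histogram))  # {'A': 2, 'a': 2, 'b': 2, 'B': 2, 'C': 2}
```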

returning difference of two dicts and subtracting its values

The following is my code.
I have a list of elements, [4,5,11,5,6,11]. The output I expect is the unbalanced elements in the list.
from collections import Counter
list_elem = [4,5,11,5,6,11]
dict_elem = dict(Counter(list_elem))
max_val = dict([max(dict_elem.items(), key=lambda x: x[1])])
o={k:v for k,v in dict_elem.items() if k not in max_val or v != max_val[k]}
I expect o to be {4: 1, 6: 1}, not {4: 1, 11: 2, 6: 1}.
If list_elem is [1,5,6,7,1,6,1], then I want the output to be {5:2,7:2,6:1}, i.e. 3 is the highest count (for key 1), and every other key should get that maximum minus its own count.
Create a Counter and subtract it from what the counts would be if the list were balanced:
from collections import Counter
ctr = Counter(list_elem)
bal = Counter(dict.fromkeys(ctr, max(ctr.values())))
o = dict(bal - ctr)
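Counter subtraction keeps only keys whose result is positive, which is why the most frequent elements (difference 0) drop out of the result. A sketch wrapping the answer in a function, named unbalanced here purely for illustration, and checking both examples from the question (it assumes a non-empty list, since max() of an empty sequence raises ValueError):

```python
from collections import Counter

def unbalanced(list_elem):
    """Map each element to how many copies it is short of the most frequent one."""
    ctr = Counter(list_elem)
    # What every count would be if the list were balanced: all equal to the maximum.
    bal = Counter(dict.fromkeys(ctr, max(ctr.values())))
    # Counter subtraction discards zero and negative results.
    return dict(bal - ctr)

print(unbalanced([4, 5, 11, 5, 6, 11]))   # {4: 1, 6: 1}
print(unbalanced([1, 5, 6, 7, 1, 6, 1]))  # {5: 2, 6: 1, 7: 2}
```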

Finding an unknown pattern in a string python

I am aware of the Stack Overflow question "String Unknown pattern Matching", but the answer there doesn't really work for me.
My problem is the following. I get a string of characters, for example:
1. For '1211', I need to see that 1 is the most often repeated, 2 times in a row.
2. But it can also be "121212112", where 12 is repeated 3 times in a row.
3. But with 12221221 it is 221 that is repeated 2 times, rather than 2, which repeats 3 times.
Here are some results I would like to get (the only digits ever used are 1 and 2):
>>> counter('1211')
1
>>> counter('1212')
2
>>> counter('21212')
2
The outcome I want is how many times the pattern occurs.
I have no idea how to even start looking for a pattern, since it is not known beforehand. I did some research online but didn't find anything similar.
Does anyone have any idea how I could even start to tackle this problem? All help is welcome, and if you want more information, don't hesitate to let me know.
Really inefficient, but you can:
1. find all substrings (https://stackoverflow.com/a/22470047/264596);
2. put them into a set to avoid duplicates;
3. for each substring, find all its occurrences, and use some function to find the max (I am not sure how you choose between short strings occurring many times and long strings occurring few times).
Obviously you could use some data structure to pass through the string once and count along the way, but since I am not sure what your constraints and desired output are, this is all I can give you.
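The three brute-force steps listed above could look roughly like this. It is only a sketch for short strings like these (the work grows very quickly with string length), the function names are hypothetical, and the final tie-break (highest count first, then the longer pattern) is an assumption, since the scoring is exactly what the answer leaves open:

```python
def all_substrings(s):
    # Steps 1 and 2: every substring, collected in a set to drop duplicates.
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def occurrences(s, sub):
    # Step 3: count all occurrences of sub in s, overlapping ones included.
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))

def substring_counts(s):
    return {sub: occurrences(s, sub) for sub in all_substrings(s)}

counts = substring_counts('21212')
# Assumed scoring: highest count first, longer pattern breaks ties.
best = max(counts.items(), key=lambda kv: (kv[1], len(kv[0])))
print(best)  # ('2', 3)
```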
I agree with Jirka: I'm not sure how you score long vs. short patterns to select the optimal result, but this function will give you the menu:
#Func1
def sub_string_cts(string):
    combos = {}
    for i in range(len(string)):
        u_start = len(string) - i
        for start in range(u_start):
            c_str = string[start:i+start+1]
            if c_str in combos:
                combos[c_str] += 1
            else:
                combos[c_str] = 1
    return combos
sub_string_cts('21212')
{'2': 3,
'1': 2,
'21': 2,
'12': 2,
'212': 2,
'121': 1,
'2121': 1,
'1212': 1,
'21212': 1}
After your comment I think this is more what you're looking for:
#Func2
import re

def sub_string_cts(string):
    combos = {}
    for i in range(len(string)):
        u_start = len(string) - i
        substrs = set([string[start:i+start+1] for start in range(u_start)])
        for substring in substrs:
            # longest run of back-to-back copies, measured in copies of substring
            runs = re.findall("((?:{})+)".format(re.escape(substring)), string)
            combos[substring] = max([len(run) for run in runs])//len(substring)
    return combos
sub_string_cts('21212')
{'2': 1,
'1': 1,
'21': 2,
'12': 2,
'212': 1,
'121': 1,
'2121': 1,
'1212': 1,
'21212': 1}
You could narrow that down to the 'best' candidates by keeping only the highest-occurring pattern(s) of each string length:
def max_by_len(result_dict):
    results = {}
    for k, v in result_dict.items():
        if len(k) not in results:
            results[len(k)] = {}
    for c_len in [ln for ln in results]:
        len_max_count = max([v for (k, v) in result_dict.items() if len(k) == c_len])
        for k,v in result_dict.items():
            if len(k) == c_len:
                if v == len_max_count:
                    results[c_len][k] = v
    return results
#Func1:
max_by_len(sub_string_cts('21212'))
{1: {'2': 3},
2: {'21': 2, '12': 2},
3: {'212': 2},
4: {'2121': 1, '1212': 1},
5: {'21212': 1}}
#Func2:
max_by_len(sub_string_cts('21212'))
{1: {'2': 1, '1': 1},
2: {'21': 2, '12': 2},
3: {'212': 1, '121': 1},
4: {'2121': 1, '1212': 1},
5: {'21212': 1}}
Assuming we wouldn't select '2121' or '1212', because they occur as often as '21212' but are shorter, and similarly that we wouldn't select '21' or '12', because they occur as often as '212', we could narrow the viable candidates down to '2', '212' and '21212' with the following code:
def remove_lesser_patterns(result_dict):
    len_lst = sorted([k for k in result_dict], reverse=True)
    #len_lst = sorted([k for k in max_len_results])
    len_crosswalk = {i_len: max([v for (k,v) in result_dict[i_len].items()]) for i_len in len_lst}
    for i_len in len_lst[:-1]:
        eval_lst = [i for i in len_lst if i < i_len]
        for i in eval_lst:
            if len_crosswalk[i] <= len_crosswalk[i_len]:
                if i in result_dict:
                    del result_dict[i]
    return result_dict
#Func1
remove_lesser_patterns(max_by_len(sub_string_cts('21212')))
{1: {'2': 3}, 3: {'212': 2}, 5: {'21212': 1}}
#Func2
remove_lesser_patterns(max_by_len(sub_string_cts('21212')))
{2: {'21': 2, '12': 2}, 5: {'21212': 1}}
Results:
test_string = ["1211", "1212", "21212", "12221221"]
for string in test_string:
    print("<Input: '{}'".format(string))
    c_answer = remove_lesser_patterns(max_by_len(sub_string_cts(string)))
    print("<Output: {}\n".format(c_answer))
<Input: '1211'
<Output: {1: {'1': 2}, 4: {'1211': 1}}
# '1' is repeated twice
<Input: '1212'
<Output: {2: {'12': 2}, 4: {'1212': 1}}
# '12' is repeated twice
<Input: '21212'
<Output: {2: {'21': 2, '12': 2}, 5: {'21212': 1}}
# '21' and '12' are both repeated twice
<Input: '12221221'
<Output: {1: {'2': 3}, 3: {'221': 2}, 8: {'12221221': 1}}
# '2' is repeated 3 times, '221' is repeated twice
These functions together give you the highest-occurring patterns for each length. The dictionary key is the pattern length, and the value is a sub-dictionary of the highest-occurring pattern(s) of that length (several if tied).
Func2 requires the repetitions to be consecutive, whereas Func1 does not; Func1 is strictly occurrence-based.
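To make that difference concrete for a single pattern: Func1 counts every occurrence (overlaps included), while Func2 measures the longest run of back-to-back copies using the same regex idea. A small sketch of both measures; the helper names are hypothetical and not part of the functions above:

```python
import re

def occurrence_count(s, pattern):
    # Func1-style: every occurrence, overlaps allowed.
    return sum(1 for i in range(len(s)) if s.startswith(pattern, i))

def longest_run(s, pattern):
    # Func2-style: longest stretch of consecutive copies of pattern.
    runs = re.findall("(?:{})+".format(re.escape(pattern)), s)
    return max((len(r) for r in runs), default=0) // len(pattern)

print(occurrence_count('21212', '212'), longest_run('21212', '212'))    # 2 1
print(occurrence_count('12221221', '2'), longest_run('12221221', '2'))  # 5 3
```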
Note:
With your example:
3. But with 12221221 it is 221 that is repeated 2 times rather than 2 that repeats 3 times.
the code resolves this ambiguity in your desired output (2 or 3) by giving you both:
<Input: '12221221'
<Output: {1: {'2': 3}, 3: {'221': 2}, 8: {'12221221': 1}}
# '2' is repeated 3 times, '221' is repeated twice
If you're only interested in 2-character patterns, you can easily pull those out of the max_by_len results as follows:
test_string = ["1211", "1212", "21212", "12221221"]
for string in test_string:
    print("<Input: '{}'".format(string))
    c_answer = remove_lesser_patterns({k:v for (k,v) in max_by_len(sub_string_cts(string)).items() if k == 2})
    print("<Output: {}\n".format(max([v for (k,v) in c_answer[2].items()])))
#Func2
<Input: '1211'
<Output: 1
<Input: '1212'
<Output: 2
<Input: '21212'
<Output: 2
<Input: '12221221'
<Output: 1

Fast/Pythonic way to count intervals between repeated list values

I want to make a histogram of all the intervals between repeated values in a list. I wrote some code that works, but it uses a for loop with if statements. I often find that if one can manage to write a version using clever slicing and/or predefined Python (numpy) methods, one can get much faster code than with for loops, but in this case I can't think of any way of doing that. Can anyone suggest a faster or more Pythonic way of doing this?
# make a 'histogram'/count of all the intervals between repeated values
def hist_intervals(a):
    values = sorted(set(a)) # get list of which values are in a
    # setup the dict to hold the histogram
    hist, last_index = {}, {}
    for i in values:
        hist[i] = {}
        last_index[i] = -1 # some default value
    # now go through the array and find intervals
    for i in range(len(a)):
        val = a[i]
        if last_index[val] != -1: # do nothing if it's the first time
            interval = i - last_index[val]
            if interval in hist[val]:
                hist[val][interval] += 1
            else:
                hist[val][interval] = 1
        last_index[val] = i
    return hist
# example list/array
a = [1,2,3,1,5,3,2,4,2,1,5,3,3,4]
histdict = hist_intervals(a)
print("histdict = ",histdict)
# correct answer for this example
answer = { 1: {3:1, 6:1},
2: {2:1, 5:1},
3: {1:1, 3:1, 6:1},
4: {6:1},
5: {6:1}
}
print("answer = ",answer)
Sample output:
histdict = {1: {3: 1, 6: 1}, 2: {5: 1, 2: 1}, 3: {3: 1, 6: 1, 1: 1}, 4: {6: 1}, 5: {6: 1}}
answer = {1: {3: 1, 6: 1}, 2: {2: 1, 5: 1}, 3: {1: 1, 3: 1, 6: 1}, 4: {6: 1}, 5: {6: 1}}
^ note: I don't care about the ordering in the dict, so this solution is acceptable, but I want to be able to run on really large arrays/lists and I'm suspecting my current method will be slow.
You can eliminate the setup loop with a carefully constructed defaultdict. Then you're left with a single scan over the input list, which is as good as it gets. When printing, I convert each inner defaultdict back to a regular dict, but that's just so it prints nicely.
from collections import defaultdict
def count_intervals(iterable):
    # setup
    last_seen = {}
    hist = defaultdict(lambda: defaultdict(int))
    # The actual work
    for i, x in enumerate(iterable):
        if x in last_seen:
            hist[x][i-last_seen[x]] += 1
        last_seen[x] = i
    return hist
a = [1,2,3,1,5,3,2,4,2,1,5,3,3,4]
hist = count_intervals(a)
for k, v in hist.items():
    print(k, dict(v))
# 1 {3: 1, 6: 1}
# 3 {3: 1, 6: 1, 1: 1}
# 2 {5: 1, 2: 1}
# 5 {6: 1}
# 4 {6: 1}
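Since the question asks about numpy, here is a vectorised sketch, assuming the values are integers that fit in a numpy array. A stable argsort groups equal values while keeping their original positions in ascending order, so np.diff on those positions gives the intervals, and np.unique(..., axis=0, return_counts=True) counts each (value, interval) pair. It has not been benchmarked against the loop versions, and unlike the question's code, values that never repeat get no entry at all:

```python
import numpy as np

def hist_intervals_np(a):
    a = np.asarray(a)
    # Positions sorted by value; "stable" keeps equal values in original order.
    order = np.argsort(a, kind="stable")
    vals = a[order]
    # True where two neighbours in sorted order share the same value.
    same = vals[1:] == vals[:-1]
    if not same.any():
        return {}
    # Distance between successive occurrences of the same value.
    gaps = np.diff(order)[same]
    keys = vals[1:][same]
    # Count each distinct (value, interval) pair; rows come back sorted.
    pairs, counts = np.unique(np.stack([keys, gaps], axis=1), axis=0,
                              return_counts=True)
    hist = {}
    for (val, gap), n in zip(pairs.tolist(), counts.tolist()):
        hist.setdefault(val, {})[gap] = n
    return hist

a = [1, 2, 3, 1, 5, 3, 2, 4, 2, 1, 5, 3, 3, 4]
print(hist_intervals_np(a))
# {1: {3: 1, 6: 1}, 2: {2: 1, 5: 1}, 3: {1: 1, 3: 1, 6: 1}, 4: {6: 1}, 5: {6: 1}}
```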
There is an obvious change to make in terms of data structures: instead of using a dictionary of dictionaries for hist, use a defaultdict of Counters. This lets the code become:
from collections import defaultdict, Counter
# make a 'histogram'/count of all the intervals between repeated values
def hist_intervals(a):
    # setup the dict to hold the histogram
    hist, last_index = defaultdict(Counter), {}
    # now go through the array and find intervals
    for i, val in enumerate(a):
        if val in last_index:
            interval = i - last_index[val]
            hist[val].update((interval,))
        last_index[val] = i
    return hist
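This is likely to be faster, since there is no setup loop and the missing-key handling happens inside defaultdict (implemented in C) instead of in explicit if statements; it is also cleaner.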

keyerror 1 in my code

I am writing a function that takes a dictionary as input and returns a list of the keys whose values are unique in that dictionary. Consider:
ip = {1: 1, 2: 1, 3: 3}
The output should be [3], because the value of key 3 does not appear anywhere else in the dict.
There is a problem in this function, though:
def uniqueValues(aDict):
    dicta = aDict
    dum = 0
    for key in aDict.keys():
        for key1 in aDict.keys():
            if key == key1:
                dum = 0
            else:
                if aDict[key] == aDict[key1]:
                    if key in dicta:
                        dicta.pop(key)
                    if key1 in dicta:
                        dicta.pop(key1)
    listop = dicta.keys()
    print listop
    return listop
I am getting this error:
File "main.py", line 14, in uniqueValues
    if aDict[key] == aDict[key1]:
KeyError: 1
Where am I going wrong?
Your main problem is this line:
dicta = aDict
You think you're making a copy of the dictionary, but you still have just one dictionary, so operations on dicta also change aDict: when you remove keys from dicta, they are also removed from aDict, and so you get your KeyError.
One solution would be
dicta = aDict.copy()
(You should also give your variables clearer names to make it more obvious to yourself what you're doing)
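A quick way to see the difference between plain assignment and .copy(), using the dictionary from the question:

```python
ip = {1: 1, 2: 1, 3: 3}

alias = ip              # just a second name for the same dict object
print(alias is ip)      # True
alias.pop(1)
print(ip)               # {2: 1, 3: 3}: removing from alias changed ip as well

copied = ip.copy()      # a new, independent (shallow) copy
copied.pop(2)
print(ip)               # {2: 1, 3: 3}: ip is unaffected
print(copied)           # {3: 3}
```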
(edit) Also, an easier way of doing what you're doing:
def iter_unique_keys(d):
    values = list(d.values())
    for key, value in d.items():
        if values.count(value) == 1:
            yield key
print(list(iter_unique_keys({1: 1, 2: 1, 3: 3})))
Use Counter from collections library:
from collections import Counter
ip = {
1: 1,
2: 1,
3: 3,
4: 5,
5: 1,
6: 1,
7: 9
}
# Generate a dict with the number of occurrences of each value in the 'ip' dict
count = Counter(ip.values())
# For each (key, value) item in the ip dict, look up how many times its value occurs.
# Add the key to the 'results' list only if that value occurs exactly once.
results = [x for x,y in ip.items() if count[y] == 1]
# Finally, print the results list
print(results)
Output:
[3, 4, 7]
