Converting (and back) discontinuous values to continuous values in Python 2 - python

Let's say I have a list of:
5 10 10 20 50 50 20
(there are 4 distinct numbers).
I want to convert them to:
0 1 1 2 3 3 2
(then convert back to the original form).
There are tons of ways to do that, but I am not sure which is the best and most Pythonic.
(One way is to generate a set, convert the set to a list, sort the list, then generate the output from the sorted list, but I don't think it is the best one.)

I think this is a good problem for making use of collections.defaultdict() and itertools.count().
from itertools import count
from collections import defaultdict
c = count()
dct = defaultdict(lambda: next(c))
lst = [5, 10, 10, 20, 50, 50, 20]
conv = [dct[i] for i in lst]
# [0, 1, 1, 2, 3, 3, 2]
back = [k for idx in conv for k, v in dct.items() if v == idx]  # idx avoids shadowing the counter c
# [5, 10, 10, 20, 50, 50, 20]

The suggested answer by Delgan is O(n^2) due to the nested loops in back. This solution is O(n).
An alternative solution is as follows:
lst = [5, 10, 10, 20, 50, 50, 20]
# Convert (and build reverse mapping)
mapping = {}
reverse_mapping = {}
conv = []
for i in lst:
    v = mapping.setdefault(i, len(mapping))
    reverse_mapping[v] = i
    conv.append(v)
# Convert back
back = [reverse_mapping[v] for v in conv]
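
For comparison, the set-and-sort approach mentioned in the question can be sketched as below. This is a minimal sketch, assuming the values are hashable and mutually comparable; the names `encode_sorted` and `decode` are illustrative and do not come from the answers above. It is written for Python 3.

```python
def encode_sorted(values):
    """Map each value to its rank among the distinct values."""
    uniques = sorted(set(values))                 # e.g. [5, 10, 20, 50]
    index_of = {v: i for i, v in enumerate(uniques)}
    return [index_of[v] for v in values], uniques


def decode(codes, uniques):
    """Invert encode_sorted: a code is simply an index into uniques."""
    return [uniques[c] for c in codes]


lst = [5, 10, 10, 20, 50, 50, 20]
conv, uniques = encode_sorted(lst)
back = decode(conv, uniques)
print(conv)   # [0, 1, 1, 2, 3, 3, 2]
print(back)   # [5, 10, 10, 20, 50, 50, 20]
```

Note that this numbers values by rank, while the two answers above number them by first appearance. The results coincide for the example list but differ for input such as [50, 5, 50], which gives [1, 0, 1] here and [0, 1, 0] with the first-appearance approaches.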

Related

Obtaining a list of ordered integers from a list of "pairs" in Python

Hello, I am currently working with a large set of data which contains an even number of integers, all of which have a matching value. I am trying to create a list which is made up of "one of a pair" in Python. I am able to have multiple pairs of the same value, thus simply using the set function does not work. For example, if I have a list:
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
In this example, indices 0 and 1 would be a pair, then 2 and 7, 3 and 5, 4 and 6, 8 and 9.
I want to extract from that list the values that make up each pair and create a new list with said values to produce something such as:
newList = [10, 11, 20, 15, 10]
Using the set function makes it such that only one element from the entire set of data is put into the list, where I need half of the total data from List. For situations where I have more than one pair of the same value, it would look something such as:
List = [10, 10, 11, 10, 11, 10]
Would need to produce a list such as:
newList = [10, 11, 10]
Any insight would be great as I am new to Python and there are a lot of functions I may not be aware of.
Thank you
Just try:
new_list = set(list)
This should return your desired output.
If I've understood correctly, you don't want to have any duplicated value, want to retain a list with unique values from a particular list.
If I'm right, a simple way to do so would be:
List = [10, 10, 11, 11, 15, 20, 15, 20]
newList = []
for x in List:
    if x not in newList:
        newList.append(x)
print(newList)
A python-like way to do so would be:
newList = set(List)
Here is a slight variation on one of @Alain T's answers:
[i for s in [set()] for i in List if (s.remove(i) if i in s else (not s.add(i)))]
NB: the following was my answer before you added the ordering requirement
sorted(List)[::2]
This sorts the input List and then takes only one value out of every two consecutive ones.
As a general approach, this'll do:
l = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
i = 0
while i < len(l):
    del l[l.index(l[i], i + 1)]
    i += 1
It iterates through the list one by one, finding the index of the next occurrence of the current value, and deletes it, shortening the list. This can probably be dressed up in various ways, but is a simple algorithm. Should a number not have a matching pair, this will raise a ValueError.
The following code creates a new list with half the number of items occurring in the input list. The order is the order of first occurrence in the input list.
>>> from collections import Counter
>>> d = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
>>> c = Counter(d)
>>> c
Counter({10: 4, 11: 2, 20: 2, 15: 2})
>>> answer = sum([[key] * (val // 2) for key, val in c.items()], [])
>>> answer
[10, 10, 11, 20, 15]
>>>
If you need to preserve the order of the first occurrence of each pair, you could use a set with an XOR operation on values to alternate between first and second occurrences.
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
paired = [ i for pairs in [set()] for i in List if pairs.symmetric_difference_update({i}) or i in pairs]
print(paired)
# [10, 11, 20, 15, 10]
You could also do this with the accumulate function from itertools:
from itertools import accumulate
paired = [a for a,b in zip(List,accumulate(({n} for n in List),set.__xor__)) if a in b]
print(paired)
# [10, 11, 20, 15, 10]
Or use a bitmap instead of a set (if your values are relatively small positive integers, e.g. between 0 and 64):
paired = [ n for n,m in zip(List,accumulate((1<<n for n in List),int.__xor__)) if (1<<n)&m ]
print(paired)
# [10, 11, 20, 15, 10]
Or you could use a Counter from collections
from collections import Counter
paired = [ i for c in [Counter(List)] for i in List if c.update({i:-1}) or c[i]&1 ]
print(paired)
# [10, 11, 20, 15, 10]
And, if you're not too worried about efficiency, a double sort with a two-step stride could do it:
paired = [List[i] for i,_ in sorted(sorted(enumerate(List),key=lambda n:n[1])[::2])]
print(paired)
# [10, 11, 20, 15, 10]
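
The set-toggling idea used in the comprehensions above can also be spelled out as an explicit loop, which may be easier to read and makes the unpaired case visible. This is a minimal sketch, assuming the values are hashable and sortable (as the integers in the question are); the function name `one_of_each_pair` is illustrative and not taken from the answers.

```python
def one_of_each_pair(values):
    """Return the first member of every pair, in order of appearance."""
    open_values = set()   # values currently waiting for their partner
    firsts = []
    for v in values:
        if v in open_values:
            open_values.remove(v)   # second member closes the pair
        else:
            open_values.add(v)      # first member opens a new pair
            firsts.append(v)
    if open_values:
        raise ValueError("unpaired values: %r" % sorted(open_values))
    return firsts


print(one_of_each_pair([10, 10, 11, 20, 15, 20, 15, 11, 10, 10]))
# [10, 11, 20, 15, 10]
print(one_of_each_pair([10, 10, 11, 10, 11, 10]))
# [10, 11, 10]
```

Each element is handled with one set lookup, so the loop makes a single pass over the input. Raising on leftover values is a design choice made here for illustration; the comprehensions above silently keep an unmatched value instead.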

Eliminating Consecutive Numbers

If you have a range of numbers from 1-49 with 6 numbers to choose from, there are nearly 14 million combinations. Using my current script, I currently have only 7.2 million combinations remaining. Of the 7.2 million remaining combinations, I want to eliminate all 3, 4, 5, 6, dual, and triple consecutive numbers.
Example:
3 consecutive: 1, 2, 3, x, x, x
4 consecutive: 3, 4, 5, 6, x, x
5 consecutive: 4, 5, 6, 7, 8, x
6 consecutive: 5, 6, 7, 8, 9, 10
double separate consecutive: 1, 2, 5, 6, 14, 18
triple separate consecutive: 1, 2, 9, 10, 22, 23
Note: combinations such as 1, 2, 12, 13, 14, 15 must also be eliminated, or else they conflict with the rule that double and triple consecutive combinations are to be eliminated.
I'm looking to find how many combinations of the 7.2 million remaining combinations have zero consecutive numbers (all mixed) and only 1 consecutive pair.
Thank you!
import functools
_MIN_SUM = 120
_MAX_SUM = 180
_MIN_NUM = 1
_MAX_NUM = 49
_NUM_CHOICES = 6
_MIN_ODDS = 2
_MAX_ODDS = 4
@functools.lru_cache(maxsize=None)
def f(n, l, s = 0, odds = 0):
    if s > _MAX_SUM or odds > _MAX_ODDS:
        return 0
    if n == 0 :
        return int(s >= _MIN_SUM and odds >= _MIN_ODDS)
    return sum(f(n-1, i+1, s+i, odds + i % 2) for i in range(l, _MAX_NUM+1))
result = f(_NUM_CHOICES, _MIN_NUM)
print('Number of choices = {}'.format(result))
While my answer should work, I think someone might be able to offer a faster solution.
Consider the following code:
not_allowed = []
for x in range(48):
    not_allowed.append([x, x+1, x+2])
# not_allowed = [ [0,1,2], [1,2,3], ... [11,12,13], ... [47,48,49] ]
my_numbers = [[1, 2, 5, 9, 11, 33], [1, 3, 7, 8, 9, 31], [12, 13, 14, 15, 23, 43]]
kept = []
for x in my_numbers:
    # keep x only if no run of three, e.g. [7,8,9], is a subset of it
    if not any(set(y) <= set(x) for y in not_allowed):
        kept.append(x)
# kept = [[1, 2, 5, 9, 11, 33]]
This code drops every combination that contains three consecutive numbers, which also covers runs of four, five, and six, because each of those contains a run of three. It does not catch two separate consecutive pairs such as 1, 2, 5, 6, 14, 18; those need an extra check. Try implementing this and let me know how it works.
The easiest approach is probably to generate and filter. I used numpy to try to vectorize as much of this as I could:
import numpy as np
from itertools import combinations
combos = np.array(list(combinations(range(1, 50), 6))) # build all combos
# combos is shape (13983816, 6)
filt = np.where(np.bincount(np.where(np.abs(
np.subtract(combos[:, :-1], combos[:, 1:])) == 1)[0]) <= 1)[0] # magic!
filtered = combos[filt]
# filtered is shape (12489092, 6)
Breaking down that "magic" line
First we subtract the last five items of each row from the first five items to get the differences between neighbours. We do this for the entire set of combinations in one shot with np.subtract(combos[:, :-1], combos[:, 1:]). Note that itertools.combinations produces sorted combinations, on which this depends.
Next we take the absolute value of these differences to make sure we only look at positive distances between numbers with np.abs(...).
Next we grab the indices from this operation for the entire dataset that indicate a difference of 1 (consecutive numbers) with np.where(... == 1)[0]. Note that np.where returns a tuple where the first item holds all of the rows, and the second item holds all of the corresponding columns for our condition. This is important because any row value that shows up more than once tells us that we have more than one consecutive pair in that row!
So we count how many times each row shows up in our results with np.bincount(...), which will return something like [5, 4, 4, 4, 3, 2, 1, 0] indicating how many consecutive pairs are in each row of our combinations dataset.
Finally we grab only the row numbers where there are 0 or 1 consecutive values with np.where(... <= 1)[0].
I am returning way more combinations than you seem to indicate, but I feel fairly confident that this is working. By all means, poke holes in it in the comments and I will see if I can find fixes!
Bonus, because it's all vectorized, it's super fast!
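
As a cross-check on the count reported above, here is a small pure-Python sketch. It ignores the sum and odd-count filters from the question's own script, so it only covers the unfiltered 49-choose-6 case. The helper names are illustrative, and the closed-form expression is a counting argument added here (treating each consecutive run as one block), not something stated in the answer. It assumes Python 3.8+ for `math.comb`.

```python
from itertools import combinations
from math import comb  # available from Python 3.8


def adjacent_pairs(combo):
    """Count neighbours that differ by exactly 1 in a sorted combination."""
    return sum(1 for a, b in zip(combo, combo[1:]) if b - a == 1)


def count_allowed(n, k):
    """Brute force: k-subsets of 1..n with at most one adjacent pair."""
    return sum(1 for c in combinations(range(1, n + 1), k)
               if adjacent_pairs(c) <= 1)


def closed_form(n, k):
    """No adjacent pair: C(n-k+1, k); exactly one: (k-1) * C(n-k+1, k-1)."""
    return comb(n - k + 1, k) + (k - 1) * comb(n - k + 1, k - 1)


print(adjacent_pairs((1, 2, 5, 6, 14, 18)))       # 2, the "double separate" case
print(count_allowed(10, 3), closed_form(10, 3))   # 112 112
print(closed_form(49, 6))                         # 12489092
```

The closed form agrees with the (12489092, 6) shape of the filtered array, which supports the answer's suspicion that the larger count is correct for the unfiltered set. To apply the same rule to the 7.2 million pre-filtered combinations, `adjacent_pairs(combo) <= 1` could be used as an additional per-combination test.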

Count the number of times values appear within a range of values

How do I output a list which counts and displays the number of times different values fit into a range?
Based on the below example, the output would be x = [0, 3, 2, 1, 0] as there are 3 Pro scores (11, 24, 44), 2 Champion scores (101, 888), and 1 King score (1234).
- P1 = 11
- P2 = 24
- P3 = 44
- P4 = 101
- P5 = 1234
- P6 = 888
totalsales = [11, 24, 44, 101, 1234, 888]
Here is ranking corresponding to the sales :
Sales               Ranking
0-10                Noob
11-100              Pro
101-1000            Champion
1001-10000          King
100001-200000       Lord
This is one way, assuming your values are integers and ranges do not overlap.
from collections import Counter
# Ranges go to end + 1
score_ranges = [
    range(0, 11),         # Noob
    range(11, 101),       # Pro
    range(101, 1001),     # Champion
    range(1001, 10001),   # King
    range(10001, 200001)  # Lord
]
total_sales = [11, 24, 44, 101, 1234, 888]
# This counter counts how many values fall into each score range (by index).
# It works by taking the index of the first range containing each value (or -1 if none found).
c = Counter(next((i for i, r in enumerate(score_ranges) if s in r), -1) for s in total_sales)
# This converts the above counter into a list, taking the count for each index.
result = [c[i] for i in range(len(score_ranges))]
print(result)
# [0, 3, 2, 1, 0]
As a general rule homework should not be posted on stackoverflow. As such, just a pointer on how to solve this, implementation is up to you.
Iterate over the totalsales list and check if each number is in range(start,stop). Then for each matching check increment one per category in your result list (however using a dict to store the result might be more apt).
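
One way the pointer above could be turned into code is with `bisect` from the standard library, which finds the slot by binary search instead of testing each range in turn. This is a sketch under assumptions: the upper bounds are read off the table in the question, the Lord range is taken to start right after King (as the other answers do), and the name `count_by_rank` is made up for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each rank, in the order Noob, Pro, Champion, King, Lord.
UPPER_BOUNDS = [10, 100, 1000, 10000, 200000]


def count_by_rank(sales):
    """Return a list with the number of sales falling into each rank."""
    counts = [0] * len(UPPER_BOUNDS)
    for s in sales:
        slot = bisect_left(UPPER_BOUNDS, s)   # index of the first bound >= s
        if slot == len(UPPER_BOUNDS):
            raise ValueError("sale %r is above the highest range" % s)
        counts[slot] += 1
    return counts


totalsales = [11, 24, 44, 101, 1234, 888]
print(count_by_rank(totalsales))   # [0, 3, 2, 1, 0]
```

Values at a boundary land in the lower rank (10 is Noob, 100 is Pro), matching the inclusive ranges in the table. Negative values would be counted as Noob here; add an explicit check if that is not wanted.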
Here is a possible solution that uses no modules such as numpy or collections:
totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 200000]  # inclusive upper bound of each rank
output = [0]*len(bins)
for s in totalsales:
    slot = next(i for i, x in enumerate(bins) if s <= x)
    output[slot] += 1
print(output)
# [0, 3, 2, 1, 0]
If your sales-to-ranking mapping always follows a logarithmic curve, the desired output can be calculated in linear time using math.log10 with collections.Counter. Use an offset of 0.5 and the abs function to handle sales of 0 and 1:
from collections import Counter
from math import log10
counts = Counter(int(abs(log10(abs(s - .5)))) for s in totalsales)
[counts.get(i, 0) for i in range(5)]
This returns:
[0, 3, 2, 1, 0]
Here, I have used a pandas DataFrame to store the values, then bins and pd.cut to group the values into the right categories, and finally extracted the value counts into a list.
Let me know if it is okay.
import pandas as pd
import numpy
df = pd.DataFrame([11, 24, 44, 101, 1234, 888], columns=['P'])# Create dataframe
bins = [0, 10, 100, 1000, 10000, 200000]
labels = ['Noob','Pro', 'Champion', 'King', 'Lord']
df['range'] = pd.cut(df.P, bins, labels = labels)
df
outputs:
      P     range
0    11       Pro
1    24       Pro
2    44       Pro
3   101  Champion
4  1234      King
5   888  Champion
Finally, to get the value count. Use:
my = df['range'].value_counts().sort_index()  # count the occurrences of each category
output = list(map(int, my.tolist()))  # we want the output as a list of integers
output
The result below:
[0, 3, 2, 1, 0]
You can use collections.Counter and a dict:
from collections import Counter
totalsales = [11, 24, 44, 101, 1234, 888]
ranking = {
    0: 'noob',
    10: 'pro',
    100: 'champion',
    1000: 'king',
    10000: 'lord'
}
c = Counter()
for sale in totalsales:
    for k in sorted(ranking.keys(), reverse=True):
        if sale > k:
            c[ranking[k]] += 1
            break
Or as a two-liner (credits to @jdehesa for the idea):
Or as a two-liner (credits to #jdehesa for the idea):
thresholds = sorted(ranking.keys(), reverse=True)
c = Counter(next((ranking[t] for t in thresholds if s > t)) for s in totalsales)

Python: how to move values in an array from some position to another?

I have an array of values
a = np.array([0,3,4,5,12,3,78,53,52])
I would like to move the last three elements of the array so that they start at index 3, giving
a
array([ 0, 3, 4, 78, 53, 52, 5, 12, 3])
You can use slicing and concatenation.
np.concatenate((a[:3], a[-3:], a[3:-3]))
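
The same slice arithmetic can be wrapped in a small helper so that the count and target position are parameters. The sketch below uses a plain Python list so it runs without NumPy; for an array, the three slices would be passed to np.concatenate as in the line above. The name `move_tail` and its argument checks are illustrative assumptions, not part of the answer.

```python
def move_tail(seq, n, pos):
    """Return a copy of seq with its last n items moved to start at index pos."""
    if n <= 0 or n > len(seq) or not 0 <= pos <= len(seq) - n:
        raise ValueError("need 0 < n <= len(seq) and 0 <= pos <= len(seq) - n")
    # head stays, tail is spliced in, the middle shifts right by n
    return seq[:pos] + seq[-n:] + seq[pos:-n]


a = [0, 3, 4, 5, 12, 3, 78, 53, 52]
print(move_tail(a, 3, 3))   # [0, 3, 4, 78, 53, 52, 5, 12, 3]
```

The check on `n` matters because `seq[-0:]` is the whole sequence, so `n = 0` would silently duplicate data instead of moving nothing.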
Try this using np.delete() and np.insert():
a = np.array([0,3,4,5,12,3,78,53,52])
index = 6
another_index = 0
v = a[index]
# np.delete and np.insert return new arrays, so the results must be reassigned
a = np.delete(a, index)
a = np.insert(a, another_index, v)
This is just a number-swapping problem -- not a numpy problem.
Any solution to this problem that involves numpy functions such as concatenate, delete, or insert is inefficient, because each of them allocates a new array and copies the data.
This should work, with minimum copying of data:
a[3],a[4],a[5], a[-3],a[-2],a[-1] = a[-3],a[-2],a[-1], a[3],a[4],a[5]
print(a)
Output:
[ 0 3 4 78 53 52 5 12 3]
Starting with
a = np.array([0,3,4,5,12,3,78,53,52])
You can just do:
newa=[]
for index, each in enumerate(a):
    if index<3:
        newa.append(a[index])
    else:
        newa.append(a[3+index%6])
giving the resulting newa to be:
[0, 3, 4, 78, 53, 52, 5, 12, 3]

Python: Convert this list into dictionary

I've got a problem and do not know how to code it in Python.
I've got a list [10, 10, 10, 20, 20, 20, 30]
I want it to be in a dictionary like this
{"10": 3, "20": 3, "30": 1}
How could I achieve this?
from collections import Counter
a = [10, 10, 10, 20, 20, 20, 30]
c = Counter(a)
# Counter({10: 3, 20: 3, 30: 1})
If you really want to convert the keys to strings, that's a separate step:
dict((str(k), v) for k, v in c.items())
Counter is new in Python 2.7; for earlier versions, use this implementation:
http://code.activestate.com/recipes/576611/
Edit: Dropping this here since SO won't let me paste code into comments,
from collections import defaultdict
def count(it):
    d = defaultdict(int)
    for j in it:
        d[j] += 1
    return d
Another way that does not use set or Counter:
d = {}
x = [10, 10, 10, 20, 20, 20, 30]
for j in x:
    d[j] = d.get(j,0) + 1
EDIT: For a list of size 1000000 with 100 unique items, this method runs on my laptop in 0.37 sec, while the answer using set takes 2.59 sec. For only 10 unique items, the former method takes 0.36 sec, while the latter method only takes 0.25 sec.
EDIT: The method using defaultdict takes 0.18 sec on my laptop.
Like this
l = [10, 10, 10, 20, 20, 20, 30]
uniques = set(l)
answer = {}
for i in uniques:
    answer[i] = l.count(i)
answer is now the dictionary that you want
Hope this helps
in Python >= 2.7 you can use dict comprehensions, like:
>>> l = [10, 10, 10, 20, 20, 20, 30]
>>> {x: l.count(x) for x in l}
{10: 3, 20: 3, 30: 1}
not the fastest way, but pretty suitable for small lists
UPDATE
or, inspired by inspectorG4dget, this is better:
{x: l.count(x) for x in set(l)}
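
To see that the single-pass approaches from the answers above agree with each other, and to produce the string keys the question asks for, a short sketch follows. It is written for Python 3; the function names `count_with_get` and `count_with_defaultdict` are illustrative labels for the techniques shown earlier.

```python
from collections import Counter, defaultdict


def count_with_get(items):
    """Plain dict with dict.get, one pass over the input."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def count_with_defaultdict(items):
    """defaultdict(int) supplies the initial 0 automatically."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


values = [10, 10, 10, 20, 20, 20, 30]
by_counter = dict(Counter(values))
assert by_counter == count_with_get(values) == count_with_defaultdict(values)

# String keys, as in the question's desired output.
string_keyed = {str(k): v for k, v in by_counter.items()}
print(string_keyed)   # {'10': 3, '20': 3, '30': 1}
```

All three make one pass over the list, whereas the `l.count(x)` variants rescan the list once per distinct value, which is consistent with the timing differences reported above for inputs with many unique items.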
