I cannot catch index from random.shuffle method - python

I cannot get an index out of the random.shuffle method.
This error occurs:
TypeError: %d format: a number is required, not list
My code is:
if __name__ =="__main__":
    bell_nums = range(1,6)
    pairs = list(itertools.combinations(bell_nums,2))
    for pair in pairs:
        bell_num1=int(pair[0])
        bell_num2 = int(pair[1])
        train_data = np.empty((0,12),float)
        train_label = np.array([])
        test_data = np.empty((0,12),float)
        test_label = np.array([])
        noise_nums = list(range(1,12))
        level_nums = list(range(0,10))
        random.shuffle(noise_nums)
        nfft=2048
        nceps = 12
        for noise_nums_index in noise_nums[0:10]:
            random.shuffle(level_nums)
            files_name = glob.glob("learning_sample/%d_%d_%d.wav" % (bell_num1,noise_nums_index,level_nums))
            for file_name in files_name:
                feature = get_feature(files_name,nfft,nceps)
                if len(train_data) ==0:
                    train_data=feature
                else:
                    train_data=np.vstack((train_data,feature))
                train_label=np.append(train_label,bell_num1)
            files_name="learning_sample/%d_%d_%d.wav"% (bell_num1,noise_num,level_nums[8])
            feature = get_feature(file_name,nfft,nceps)
            if len(test_data) ==0:
                test_data=feature
            else:
                test_data=np.vstack((test_data,feature))
            test_label=np.append(test_label,bell_num1)
I think this error happens because level_nums is a list.
But I cannot figure out how to get a single index out of random.shuffle in this case.
I want to build the file name "learning_sample/%d_%d_%d.wav" from noise_nums_index and a randomly chosen value from level_nums. How can I write this part? Should I loop over level_nums after random.shuffle(level_nums)?

To select a random element from level_nums, use random.choice(level_nums).
In your code:
for noise_nums_index in noise_nums[0:10]:
    files_name = glob.glob("learning_sample/%d_%d_%d.wav" % (bell_num1,noise_nums_index,random.choice(level_nums)))
Note that since noise_num is not defined in the code you provided, I was not able to check the full code. There might be other errors.

You can use the random.randrange method from the Python standard library.
For example:
random.randrange(1, 12, 1)
Or use the random.choice method:
a = list(range(1,12))
random.choice(a)

level_nums is a list, and you are passing it to a %d format specifier, which expects a number.
If you are just looking for a random int, and since you are already using numpy, have you considered np.random.choice()? That would remove the need to use the shuffle method at all.
>>> np.random.choice(level_nums)
3
>>> np.random.choice(level_nums)
8
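Or just use the random integer function and get rid of level_nums completely (the upper bound is exclusive, so this returns 1 through 10; use np.random.randint(0, 10) to match level_nums):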
>>> np.random.randint(1, 11)
6
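The cause of the TypeError and the random.choice fix discussed above can be reproduced without the audio files. This is a minimal sketch: build_name is a hypothetical helper, not part of the question's code, and the values passed to it are made up for illustration.

```python
import random

def build_name(bell_num, noise_num, level_num):
    # Hypothetical helper: each %d placeholder needs a single integer.
    return "learning_sample/%d_%d_%d.wav" % (bell_num, noise_num, level_num)

level_nums = list(range(0, 10))
random.shuffle(level_nums)  # shuffles in place and returns None

try:
    build_name(1, 3, level_nums)  # passing the whole list reproduces the error
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Pick one element instead of passing the list.
name = build_name(1, 3, random.choice(level_nums))
print(name)
```

Because random.shuffle only reorders the list, indexing the shuffled list (for example level_nums[0]) would work equally well as a way to get one random element.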

Related

Fastest way to extract and increase latest number from end of string

I have a list of strings that have numbers as suffixes. I'm trying to extract the highest number so I can increase it by 1. Here's what I came up with but I'm wondering if there's a faster way to do this:
data = ["object_1", "object_2", "object_3", "object_blah", "object_123asdfd"]
numbers = [int(obj.split("_")[-1]) for obj in data if obj.split("_")[-1].isdigit()] or [0]
print sorted(numbers)[-1] + 1 # Output is 4
A few conditions:
It's very possible that the suffix is not a number at all, and should be skipped.
If no input is valid, then the output should be 1 (this is why I have or [0])
No Python 3 solutions, only 2.7.
Maybe some regex magic would be faster to find the highest number to increment on? I don't like the fact that I have to split twice.
Edit
I did some benchmarks on the current answers using 100 iterations on data that has 10000 items:
Alex Noname's method: 1.65s
Sushanth's method: 1.95s
Balaji Ambresh method: 2.12s
My original method: 2.16s
I've accepted an answer for now, but feel free to contribute.
Using heapq.nlargest is a pretty efficient way. Maybe someone will compare it with other methods.
import heapq
a = heapq.nlargest(1, map(int, filter(lambda b: b.isdigit(), (c.split('_')[-1] for c in data))))[0]
Comparing with the original method (Python 3.8)
import heapq
import random
from time import time
data = []
for i in range(0, 1000000):
    data.append(f'object_{random.randrange(10000000)}')
begin = time()
a = heapq.nlargest(1, map(int, filter(lambda b: b.isdigit(), (c.split('_')[-1] for c in data))))[0]
print('nlargest method: ', time() - begin)
print(a)
begin = time()
numbers = [int(obj.split("_")[-1]) for obj in data if obj.split("_")[-1].isdigit()] or [0]
a = sorted(numbers)[-1]
print('original method: ', time() - begin)
print(a)
nlargest method: 0.4306185245513916
9999995
original method: 0.8409149646759033
9999995
Try this: a list comprehension extracts every numeric suffix (using 0 for non-numeric ones), and max returns the highest value.
max([
int(x.split("_")[-1]) if x.split("_")[-1].isdigit() else 0 for x in data
]) + 1
Try:
import re
res = max([int( (re.findall(r'_(\d+)$', item) or [0])[0] ) for item in data]) + 1
Value:
4
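To avoid splitting each string twice, the suffix can be matched once with a precompiled regular expression. The sketch below is an illustration rather than a benchmarked answer: next_suffix is a hypothetical name, the code is written to run on both Python 2.7 and Python 3, and it assumes every numeric suffix follows an underscore.

```python
import re

# Matches an underscore followed only by digits at the end of the string.
SUFFIX_RE = re.compile(r'_(\d+)$')

def next_suffix(items):
    # Start from 0 so that input with no valid suffix yields 1.
    highest = 0
    for item in items:
        match = SUFFIX_RE.search(item)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1

data = ["object_1", "object_2", "object_3", "object_blah", "object_123asdfd"]
print(next_suffix(data))  # 4
```

Tracking a running maximum avoids building and sorting an intermediate list; whether that is faster on real data would need to be measured.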

Python List Append with numpy array error

I am trying to use the list append function to append a list to a list.
But I get an error saying "list indices must be integers or slices, not tuple". I'm not sure why.
pca_components = range(1,51)
gmm_components = range(1,5)
covariance_types = ['spherical', 'diag', 'tied', 'full']
# Spherical
spherical_results = []
for i in pca_components:
    pca_model = PCA(n_components=i)
    pca_train = pca_model.fit(train_data).transform(train_data)
    for j in gmm_components:
        parameters = (i+i)*j*2
        if parameters > 50:
            pass
        else:
            gmm_model = GMM(n_components=j, covariance_type='spherical')
            gmm_model.fit(pca_train)
            pca_test = pca_model.transform(test_data)
            predictions = gmm_model.predict(pca_test)
            accuracy = np.mean(predictions.ravel() == test_labels.ravel())
            accuracy=int(accuracy)
            spherical_results.append([accuracy, i,j, parameters])
spher_results = np.array(spherical_results)
max_accuracy = np.amax(spherical_results[:,0])
print(f"highest accuracy score for spherical is {max_accuracy}")
What's the purpose of this line?
spher_results = np.array(spherical_results)
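It makes an array from a list, but you don't use spher_results in the following code. The next line indexes spherical_results, which is still a plain list, and a list does not accept the tuple index [:,0]; that is what raises the error. Index the array instead: spher_results[:,0].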
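The error can be reproduced with a plain list of lists, with no PCA or GMM involved. The rows below are made-up values in the question's [accuracy, i, j, parameters] layout, so treat this as an illustrative sketch only.

```python
# Made-up rows in the layout [accuracy, i, j, parameters].
results = [[0, 1, 1, 4], [1, 2, 1, 8], [0, 3, 2, 24]]

try:
    results[:, 0]  # a list cannot be indexed with the tuple (slice, 0)
except TypeError as exc:
    message = str(exc)
    print(message)

# Without converting to an array, take the first column row by row.
max_accuracy = max(row[0] for row in results)
print(max_accuracy)  # 1
```

Converting the list with np.array first, as the question's code already does, makes the [:,0] column slice valid.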

Generating random data

I'm trying to generate random 3-mers (like AEF) from the alphabet 'ACDEFGHIKLMNPQRSTVWY' using the following script, but the output contains many repeated 3-mers. Could you please advise me on how to avoid getting duplicates, or how to remove them?
Thanks in advance,
Berk
import random
def random_AA_seq(length):
    return ''.join(random.choice('ACDEFGHIKLMNPQRSTVWY') for i in range(length))
list_size = 10000
lengths = []
for j in range(list_size):
    a = int(random.normalvariate(3, 0))
    print random_AA_seq(a)
To remove duplicates, collect the generated sequences in a set:
print set(random_AA_seq(3) for j in range(list_size))
To get all possible permutations, you could also use itertools...
from itertools import permutations
length = 3
print permutations('ACDEFGHIKLMNPQRSTVWY', length)
... and pick your 3-mers randomly afterwards.
Per comments:
import itertools
import random
alphabet = "ACDEFGHIKLMNPQRSTVWY"
all_trimers = map("".join, itertools.product(* [alphabet] * 3))
a_few_distinct_trimers = random.sample(all_trimers, 42)
Just
''.join(random.choice(string.ascii_uppercase) for _ in range(3))
should be fine
Updated answer: the following script returns a list of 3-mers of the required length. Each 3-mer occurs in the list only once:
import random
def random_3mers(length):
    seqs = set()
    while len(seqs) < length:
        seqs.add("".join(random.sample("ACDEFGHIKLMNPQRSTVWY", 3)))
    lseqs = list(seqs)
    random.shuffle(lseqs)
    return lseqs
for three_mer in random_3mers(10):
    print three_mer
For a length of 10, the following type of output will be displayed:
MKY
KWV
PRY
WKQ
YGI
ANQ
GFL
RQE
SCN
GRY
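The itertools.product approach mentioned above can be written so that it also runs on Python 3, where map returns an iterator that has to be turned into a list before sampling. This is a sketch under that assumption; distinct_trimers is a hypothetical name. Unlike random.sample applied to the alphabet itself, it allows a letter to repeat within a 3-mer (for example AAC) while still guaranteeing that no 3-mer appears twice.

```python
import itertools
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def distinct_trimers(count):
    # All 20**3 = 8000 possible 3-mers; letters may repeat within one.
    all_trimers = ["".join(p) for p in itertools.product(ALPHABET, repeat=3)]
    # random.sample draws without replacement, so no 3-mer is returned twice.
    return random.sample(all_trimers, count)

trimers = distinct_trimers(42)
print(trimers[:5])
```

random.sample raises ValueError if count exceeds the 8000 available 3-mers, which is a useful guard against asking for more distinct items than exist.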

Python: How to generate a 12-digit random number?

In Python, how do I generate a 12-digit random number? Is there a function where we can specify a range, like random.range(12)?
import random
random.randint()
The output should be a string with 12 digits in the range 0-9 (leading zeros allowed).
What's wrong with a straightforward approach?
>>> import random
>>> random.randint(100000000000,999999999999)
544234865004L
And if you want it with leading zeros, you need a string.
>>> "%0.12d" % random.randint(0,999999999999)
'023432326286'
Edit:
My own solution to this problem would be something like this:
import random
def rand_x_digit_num(x, leading_zeroes=True):
    """Return an X digit number, leading_zeroes returns a string, otherwise int"""
    if not leading_zeroes:
        # wrap with str() for uniform results
        return random.randint(10**(x-1), 10**x-1)
    else:
        if x > 6000:
            return ''.join([str(random.randint(0, 9)) for i in xrange(x)])
        else:
            return '{0:0{x}d}'.format(random.randint(0, 10**x-1), x=x)
Testing Results:
>>> rand_x_digit_num(5)
'97225'
>>> rand_x_digit_num(5, False)
15470
>>> rand_x_digit_num(10)
'8273890244'
>>> rand_x_digit_num(10)
'0019234207'
>>> rand_x_digit_num(10, False)
9140630927L
Timing methods for speed:
from datetime import datetime
def timer(x):
    s1 = datetime.now()
    a = ''.join([str(random.randint(0, 9)) for i in xrange(x)])
    e1 = datetime.now()
    s2 = datetime.now()
    b = str("%0." + str(x) + "d") % random.randint(0, 10**x-1)
    e2 = datetime.now()
    print "a took %s, b took %s" % (e1-s1, e2-s2)
Speed test results:
>>> timer(1000)
a took 0:00:00.002000, b took 0:00:00
>>> timer(10000)
a took 0:00:00.021000, b took 0:00:00.064000
>>> timer(100000)
a took 0:00:00.409000, b took 0:00:04.643000
>>> timer(6000)
a took 0:00:00.013000, b took 0:00:00.012000
>>> timer(2000)
a took 0:00:00.004000, b took 0:00:00.001000
What it tells us:
For any number under around 6000 digits in length, my method is faster, sometimes MUCH faster, but for larger numbers the method suggested by arshajii looks better.
Do random.randrange(10**11, 10**12). It works like randint meets range
From the documentation:
randrange(self, start, stop=None, step=1, int=<type 'int'>, default=None, maxwidth=9007199254740992L) method of random.Random instance
Choose a random item from range(start, stop[, step]).
This fixes the problem with randint() which includes the
endpoint; in Python this is usually not what you want.
Do not supply the 'int', 'default', and 'maxwidth' arguments.
This is effectively like doing random.choice(range(10**11, 10**12)) or random.randint(10**11, 10**12-1). Since it conforms to the same syntax as range(), it's a lot more intuitive and cleaner than these two alternatives.
If leading zeros are allowed:
"%012d" %random.randrange(10**12)
Since leading zeros are allowed (by your comment), you could also use:
int(''.join(str(random.randint(0,9)) for _ in xrange(12)))
EDIT: Of course, if you want a string, you can just leave out the int part:
''.join(str(random.randint(0,9)) for _ in xrange(12))
This seems like the most straightforward way to do it in my opinion.
There are many ways to do that:
import random
rnumber1 = random.randint(10**11, 10**12-1) # randint includes endpoint
rnumber2 = random.randrange(10**11, 10**12) # randrange does not
# useful if you want to generate some random string from your choice of characters
digits = "123456789"
digits_with_zero = digits + "0"
rnumber3 = random.choice(digits) + ''.join(random.choice(digits_with_zero) for _ in range(11))
from random import randint
def random_with_N_digits(n):
    range_start = 10**(n-1)
    range_end = (10**n)-1
    return randint(range_start, range_end)
print random_with_N_digits(12)
This may not be exactly what you're looking for, but a library like rstr lets you generate random strings. With it, all you need is (leading zeros allowed):
import rstr
foo = rstr.digits(12)
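For the case stated in the question (a 12-character string of digits, leading zeros allowed), the zero-padding approach from the answers above can be wrapped in a small function. This is a sketch written to run on Python 2.7 and Python 3 alike; random_digit_string is a hypothetical name.

```python
import random

def random_digit_string(n=12):
    # randrange excludes the upper bound, so values run from 0 to 10**n - 1.
    value = random.randrange(10 ** n)
    # Zero-pad to exactly n characters so leading zeros are preserved.
    return "{0:0{width}d}".format(value, width=n)

s = random_digit_string(12)
print(s)
```

Returning a string rather than an int is what keeps leading zeros; converting the result with int() would drop them.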

Code: Code isn't working to sort through a list of 1 million integers, printing top 10

This is for homework, so I must try to use as few Python functions as possible, while still allowing a computer to process a list of 1 million numbers efficiently.
#!/usr/bin/python3
#Find the 10 largest integers
#Don't store the whole list
import sys
import heapq
def fOpen(fname):
    try:
        fd = open(fname,"r")
    except:
        print("Couldn't open file.")
        sys.exit(0)
    all = fd.read().splitlines()
    fd.close()
    return all
words = fOpen(sys.argv[1])
numbs = map(int,words)
print(heapq.nlargest(10,numbs))
li=[]
count = 1
#Make the list
for x in words:
    li.append(int(x))
    count += 1
    if len(li) == 10:
        break
#Selection sort, largest-to-smallest
for each in range(0,len(li)-1):
    pos = each
    for x in range(each+1,10):
        if li[x] > li[pos]:
            pos = x
    if pos != each:
        li[each],li[pos] = li[pos],li[each]
for each in words:
    print(li)
    each = int(each)
    if each > li[9]:
        for x in range(0,9):
            pos = x
            if each > li[x]:
                li[x] = each
                for i in range(x+1,10):
                    li[pos],li[i] = li[i],li[pos]
                break
#Selection sort, largest-to-smallest
for each in range(0,len(li)-1):
    pos = each
    for x in range(each+1,10):
        if li[x] > li[pos]:
            pos = x
    if pos != each:
        li[each],li[pos] = li[pos],li[each]
print(li)
The code is working ALMOST the way I want it to. I tried to create a list from the first 10 numbers and sort them in descending order. Then I have Python check the list ONLY if the new number is larger than the smallest one (instead of reading through the list 10 * len(x) times).
This is the output I should be getting:
>>>[9932, 9885, 9779, 9689, 9682, 9600, 9590, 9449, 9366, 9081]
This is the output I am getting:
>>>[9932, 9689, 9885, 9779, 9682, 9025, 9600, 8949, 8612, 8575]
If you only need the top 10 numbers and don't care about sorting the whole list, and if the homework constraint means that you (or your teacher) prefer to avoid heapq, another way is to keep track of the top 10 numbers while you parse the whole file only once:
top = []
with open('numbers.txt') as f:
    # the first ten numbers are going directly in
    for line in f:
        top.append(int(line.strip()))
        if len(top) == 10:
            break
    for line in f:
        num = int(line.strip())
        min_top = min(top)
        if num > min_top: # check if the new number is a top one
            top.remove(min_top)
            top.append(num)
print(sorted(top))
Update: If you don't really need an in-place sort, and since you're only going to sort 10 numbers, I'd avoid the pain of reordering.
I'd just build a new list, for example:
sorted_top = []
while top:
    max_top = max(top)
    sorted_top.append(max_top)
    top.remove(max_top)
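The two steps above (keep only the ten largest values seen so far, then order them largest first) can be combined into one function that takes any iterable, which makes the idea easy to test without a file. This is a sketch; top_n is a hypothetical name and the sample values are made up.

```python
def top_n(numbers, n=10):
    top = []
    for num in numbers:
        if len(top) < n:
            # The first n values go straight in.
            top.append(num)
        else:
            smallest = min(top)
            if num > smallest:
                # Replace the current smallest with the larger newcomer.
                top.remove(smallest)
                top.append(num)
    # Build the descending result by repeatedly pulling out the maximum.
    ordered = []
    while top:
        largest = max(top)
        ordered.append(largest)
        top.remove(largest)
    return ordered

print(top_n([5, 1, 9, 3, 7, 8], n=3))  # [9, 8, 7]
```

Only n values are ever held at once, so memory use does not grow with the size of the input.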
Well, by reading in the entire file, splitting it, and then using map(), you are keeping a lot of data in memory.
As Adrien pointed out, files are iterators in py3k, so you can just use a generator expression to provide the iterable for nlargest:
nums = (int(x) for x in open(sys.argv[1]))
Then, using
heapq.nlargest(10, nums)
should get you what you need, and you haven't stored the entire list even once.
The program is even shorter than the original, as well!
#!/usr/bin/env python3
from heapq import nlargest
import sys
nums = (int(x) for x in open(sys.argv[1]))
print(nlargest(10, nums))
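To check that heapq.nlargest works from a lazily read source, the file can be stood in for by an in-memory text stream. This sketch assumes Python 3 and uses io.StringIO with made-up numbers in place of the file named by sys.argv[1].

```python
import heapq
import io

# Stand-in for the input file: one integer per line.
fake_file = io.StringIO("12\n7\n99\n3\n42\n")

# The generator expression converts lines one at a time as nlargest consumes them.
nums = (int(line) for line in fake_file)
largest = heapq.nlargest(3, nums)
print(largest)  # [99, 42, 12]
```

With a real file, opening it in a with block would also make sure it is closed once the numbers have been read.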
