I'm trying to code a 4x4 matrix in Python with random integers 1-4.
That's easy enough; my problem is that I want each digit 1-4 to be used only once per row and once per column. Example:
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
My code only manages this about 33% of the time. In my loop, something like this happens:
2 1 4 3
3 4 2 1
1 3 X <-------- because of this the program can't continue and I end up in an infinite loop. Could someone help me get out of it?
My code is below:
""" Programm for playing the game skyline """
from random import randrange
row1 = []
row2 = []
row3 = []
row4 = []
allrows = [row1, row2, row3, row4]
column1 = []
column2 = []
column3 = []
column4 = []
allcolumns = [column1, column2, column3, column4]
def board():
for i in range(4):
j = 0
while len(allrows[i]) != 4:
x = randrange(1,5)
print(i, j)
if x not in allrows[i] and x not in allcolumns[j]:
allrows[i].append(x)
allcolumns[j].append(x)
j += 1
else:
continue
board()
You seem to be looking for permutations, and here is how to get them:
from itertools import permutations

a = list(permutations([1, 2, 3, 4]))
Now, to get 4 random lists:
import random
from itertools import permutations

a = list(permutations([1, 2, 3, 4]))
for _ in range(4):
    print(a[random.randint(0, len(a) - 1)])
EDIT: is this the one you were looking for?
import random
import numpy as np
from itertools import permutations

a = list(permutations([1, 2, 3, 4]))
i = 0
result = [a[random.randint(0, len(a) - 1)]]
a.remove(result[0])
print(result)
while i < 3:
    b = a[random.randint(0, len(a) - 1)]
    if not any(any(np.equal(b, x)) for x in result):
        result.append(b)
        i += 1
        a.remove(b)
print(result)
Basically, what you do is put the numbers you want to select from in a list, randomly pick an index, use the value, and remove it.
The next time through, you pick from the remaining ones.
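To illustrate the pick-and-remove idea in isolation, here is a minimal sketch (the names are mine):

import random

pool = list(range(1, 5))     # numbers to select from
picked = []
while pool:
    x = random.choice(pool)  # randomly pick one of the remaining numbers
    pool.remove(x)           # remove it so it cannot be picked again
    picked.append(x)
print(picked)                # a random permutation of 1-4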
I have tried this using for, if, and elif; it works for ranges greater than 4.
x = int(input("enter your range"))
for i in range(x + 1):
    if i + 1 < x + 1:
        print(i + 1, end='')
        if i + 2 < x + 1:
            print(i + 2, end='')
            if i + 3 < x + 1:
                print(i + 3, end='')
                if i + 4 < x + 1:
                    print(i + 4)
                elif i != 0 and i + 4 >= x + 1:
                    print(i)
            elif i != 0 and i + 3 >= x + 1:
                print(i - 1, end='')
                print(i)
        elif i != 0 and i + 2 >= x + 1:
            print(i - 2, end='')
            print(i - 1, end='')
            print(i)
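For what it's worth, the example square in the original question is just cyclic rotations of its first row, which suggests another simple construction. A sketch (shuffling adds some randomness, though it does not generate every possible Latin square):

import random

base = [1, 2, 3, 4]
random.shuffle(base)                             # random first row
rows = [base[i:] + base[:i] for i in range(4)]   # cyclic rotations form a Latin square
random.shuffle(rows)                             # permuting rows keeps every column valid
for row in rows:
    print(*row)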
For each string in a list, I need to find the number of strings in that list that are one Levenshtein distance away. The Levenshtein distance is the smallest number of character substitutions, additions, or removals needed to derive one word from another. For illustration, please see the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'word': ['can', 'cans', 'canse', 'canpe', 'canp', 'camp'],
    'code': ['k#n', 'k#n}', 'k#(z', np.nan, 'k#()', np.nan]})
    word  code
0    can   k#n
1   cans  k#n}
2  canse  k#(z
3  canpe   NaN
4   canp  k#()
5   camp   NaN
My current implementation is way too slow:
from Levenshtein import distance as lev

df = df.fillna('')

# get unique strings
wordAll = df['word'].dropna().to_list()
codeAll = list(set(df['code'].dropna().to_list()))

# prepare dataframe for storage
df['wordLev'] = np.nan
df['codeLev'] = np.nan

# find neighbors
for idx, row in df.iterrows():
    i = 0
    j = 0
    # get word and code
    word = row['word']
    code = row['code']
    # remove word and code from all-strings lists
    wordSubset = [w for w in wordAll if w != word]
    codeSubset = [c for c in codeAll if c != code]
    # compute number of neighbors
    for item in wordSubset:
        if lev(word, item) == 1:
            i += 1
    for item in codeSubset:
        if lev(code, item) == 1:
            j += 1
    # add number of neighbors to df
    df.loc[df['code'] == code, 'wordLev'] = i
    if code != '':
        df.loc[df['code'] == code, 'codeLev'] = j
    else:
        df.loc[df['code'] == code, 'codeLev'] = ''
df
    word  code wordLev codeLev
0    can   k#n       2       1
1   cans  k#n}       3       1
2  canse  k#(z       2       1
3  canpe             2
4   canp  k#()       3       1
5   camp             1
How can I speed it up? The DataFrame has ~500k rows...
The following code seems to be ~5x faster than your code, at 1.8 ms vs 9.6 ms (at least on the df you've provided).
df = df.fillna('')
df['wordLev'] = [sum(1 for item in df['word'] if item!=word and lev(word, item)==1) for word in df['word']]
df['codeLev'] = [sum(1 for item in df['code'] if item!=code and lev(code, item)==1) or '' for code in df['code']]
This code is really very similar to yours. The biggest difference is that instead of creating wordSubset or codeSubset and then iterating over them again to apply the Levenshtein distance function, it does it all in a single generator expression. Since you're checking each word against every word in the column, you can't escape a double loop, imo.
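If that is still too slow at ~500k rows, one further idea (my own addition, untested at that scale) is to bucket the strings by length first: two strings at Levenshtein distance 1 can differ in length by at most 1, so each word only needs to be compared against three buckets. A rough sketch:

from collections import defaultdict
from Levenshtein import distance as lev

def lev1_neighbor_counts(strings):
    # group strings by length; distance-1 pairs differ in length by at most 1
    by_len = defaultdict(list)
    for s in strings:
        by_len[len(s)].append(s)
    counts = []
    for s in strings:
        candidates = by_len[len(s) - 1] + by_len[len(s)] + by_len[len(s) + 1]
        counts.append(sum(1 for t in candidates if t != s and lev(s, t) == 1))
    return counts

Applied to df['word'] and df['code'], this skips every comparison between strings whose lengths differ by 2 or more.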
I have this task I need to complete:
"There are N athletes competing in a hammer throw. Each of them made M
throws. The athlete with the highest best throw wins. If there are
several of them, then the one with the best sum of results for all
attempts wins. If there are several of them, the athlete with the
minimum number is considered the winner. Determine the number of the winner of the competition."
I can find the highest best throw, but I can't break the ties down to the athlete with the minimum number.
Sample Input:
4 3
4 2 7
1 2 7
1 3 5
4 1 6
Sample Output:
1
My code so far:
row, columns = map(int, input().split())
matrix = [[int(i) for i in input().split()] for j in range(row)]
numbers = []
suma = []
for i in range(row):
    numbers.append(matrix[i][0])
    sumaa = sum(matrix[i]) - matrix[i][0]
    suma.append(sumaa)
new_matrix = [numbers, suma]
print(new_matrix.index(max(new_matrix)))
input = """4 3
4 2 7
1 2 7
1 3 5
4 1 6
"""
def winner(input):
athletes = input.split("\n")
best_throw = max(" ".join(athletes).split(" "))
best_total = max(map(lambda s: sum(list(map(lambda n: int(n) if n != '' else 0, s.split(" ")))), athletes))
best_athletes_indexes = []
for i, athlete in enumerate(athletes):
if best_throw in athlete.split(" ") and sum(map(int, athlete.split(" "))) == best_total:
best_athletes_indexes.append(i)
best_athletes_attempts = list(map(lambda i: len(athletes[i].split(" ")), best_athletes_indexes))
return best_athletes_indexes[best_athletes_attempts.index(min(best_athletes_attempts))]
print(winner(input))
Please don't ask me to explain this; it is the first Python I have written in 2 years, and I come from a world of type safety.
My search history is literally "remove item from list python"; the Python standard library is strange.
Here's an answer:
a = []
b = []
row, columns = map(int, input().split())
for i in range(row):
    a.append(list(map(int, input().split())))
for i in range(row):
    b.append([max(a[i]), sum(a[i])])
print(b.index(max(b)) + 1)  # +1 because athletes are numbered from 1
Try this code:
row, columns = map(int, input().split())
matrix = [list(map(int, input().split())) for _ in range(row)]
matrix_max_sum_elms = [[max(row), sum(row)] for row in matrix]
best_athlete_ind = matrix_max_sum_elms.index(max(matrix_max_sum_elms)) + 1
print(best_athlete_ind)
Explanation:
First, we read the input into a list of lists. Then we build a new list in which each entry holds the maximum value and the sum of one athlete's throws. Python compares these [max, sum] pairs lexicographically, so max() picks the highest best throw and breaks ties by total, and .index() returns the first match, which is the lowest-numbered athlete. Finally we add 1, since indexing starts from 0.
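A quick demonstration of that comparison on the sample input:

pairs = [[7, 13], [7, 10], [5, 9], [6, 11]]  # [best throw, total] per athlete
print(max(pairs))                    # [7, 13] -> best throw compared first, then total
print(pairs.index(max(pairs)) + 1)   # 1 -> the first, i.e. lowest-numbered, winner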
I know there are multiple questions asked regarding finding the most frequent numbers and how many times they have been repeated. However, I have a problem that requires solving the question using only for loops, if statements, etc.
I'm not allowed to use .count, dicts, arrays, or any other fancy functions.
My_list = [1,1,1,1,1,1,1,1,2,2,2,3,4,5,6,6,7,7,8,7,8,8,8,8,8,8,8]
The required printed answer would be
1, 8 times and 8, 8 times
I know it may be a pain to use only for loops, but it's killing me and I'm craving help :(
There are a lot of existing questions for practicing iteration and lists, and I don't think this is a good exercise. Still, to ease your pain, here is a bit of a messy answer (messy meaning it uses a lot of variables).
You haven't mentioned the length of your list, so I have written this code to work with any range.
Code with comments:
My_list = [1,1,1,1,1,1,1,1,2,2,2,3,4,5,6,6,7,7,8,7,8,8,8,8,8,8,8]
list1 = []
list2 = []
list3 = []
c = 0
k = 1
y = 1
while y == 1:                    # loop until all digits have been read and stored in the other lists
    for i in My_list:            # read objects in My_list one by one
        if i == k:
            list1.append(k)      # append all copies of the same digit into list1
    list2.append(len(list1))     # store the count of that digit in list2
    list3.append(list1[0])       # store the digit itself in list3
    list1 = []                   # reset list1 for the next digit
    k = k + 1
    if k == My_list[-1] + 1:     # stop after the last digit of the list
        y = 0
m = 0
for j in list2:                  # use this loop to print the final outcome
    print(m, ",", j, "times", list3[m], ",", j, "times")
    m = m + 1
Code without comments:
My_list = [1,1,1,1,1,1,1,1,2,2,2,3,4,5,6,6,7,7,8,7,8,8,8,8,8,8,8]
list1 = []
list2 = []
list3 = []
c = 0
k = 1
y = 1
while y == 1:
    for i in My_list:
        if i == k:
            list1.append(k)
    list2.append(len(list1))
    list3.append(list1[0])
    list1 = []
    k = k + 1
    if k == My_list[-1] + 1:
        y = 0
m = 0
for j in list2:
    print(m, ",", j, "times", list3[m], ",", j, "times")
    m = m + 1
Output:
0 , 8 times 1 , 8 times
1 , 3 times 2 , 3 times
2 , 1 times 3 , 1 times
3 , 1 times 4 , 1 times
4 , 1 times 5 , 1 times
5 , 2 times 6 , 2 times
6 , 3 times 7 , 3 times
7 , 8 times 8 , 8 times
Note:
You can add print(list2) and print(list3) at the end of the code to see what happens. You can also try to understand the code by deleting it part by part.
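As an aside, here is a more compact loop-only sketch of my own (still no .count, dicts, or other fancy functions) that first finds the highest frequency and then prints every number that reaches it:

My_list = [1,1,1,1,1,1,1,1,2,2,2,3,4,5,6,6,7,7,8,7,8,8,8,8,8,8,8]

# first pass: find the highest frequency of any value
best = 0
for x in My_list:
    c = 0
    for y in My_list:
        if y == x:
            c = c + 1
    if c > best:
        best = c

# second pass: print each value with that frequency, once
printed = []
for x in My_list:
    if x not in printed:
        c = 0
        for y in My_list:
            if y == x:
                c = c + 1
        if c == best:
            print(x, ",", best, "times")
        printed.append(x)

This prints 1 , 8 times and 8 , 8 times for the list above.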
I am trying to save each iteration's output from a for loop for further operations.
Here is my code:
#!/usr/bin/python
import io
from operator import itemgetter

with open('test.in') as f:
    content = f.readlines()
content = [int(x) for x in content]
content = tuple(content)

nClus = input("Number of Clusters: ")
nEig = input("Number of eigen values: ")

j = 0
k = nClus + 1
content1 = ""
for i in range(1, k):
    print content[j*(nEig+1):i*(nEig+1)]
    j = j + 1
The file test.in looks like this (this is an example; the actual test.in contains a huge amount of data):
40
1
4
3
5
7
29
6
9
4
7
3
50
1
2
3
4
5
57
9
8
7
6
5
The values nClus = 4, nEig = 5.
Any suggestions on how to proceed?
Why not just save them to an array (mydata below)? I don't see where j stops, so the shapes below are assumptions based on your nClus = 4, nEig = 5; the pattern saves each iteration's slice as one row of a numpy array:
import numpy as np
# ... [your code]
mydata = np.zeros([nClus, nEig + 1])  # one row per cluster, nEig + 1 values each
j = 0
for i in range(1, k):
    mydata[i - 1, :] = content[j * (nEig + 1):i * (nEig + 1)]  # store this slice as row i-1
    j = j + 1
print(mydata)
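Alternatively, since test.in holds exactly nClus chunks of nEig + 1 numbers each (4 x 6 = 24 values in the example), numpy can reshape the whole tuple in one step. A sketch assuming the same content, nClus, and nEig as in the question:

import numpy as np

data = np.array(content[:nClus * (nEig + 1)]).reshape(nClus, nEig + 1)
print(data)  # row i holds the i-th chunk, ready for further operations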
I am looking for a module in sklearn that lets you derive the word-word co-occurrence matrix.
I can get the document-term matrix, but I am not sure how to go about obtaining a word-word matrix of co-occurrences.
Here is my example solution using CountVectorizer in scikit-learn. Referring to this post, you can simply use matrix multiplication to get the word-word co-occurrence matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = ['this this this book',
        'this cat good',
        'cat good shit']
count_model = CountVectorizer(ngram_range=(1,1))  # default unigram model
X = count_model.fit_transform(docs)
# X[X > 0] = 1  # run this line if you don't want extra within-text cooccurrence (see below)
Xc = (X.T * X)  # this is the co-occurrence matrix in sparse csr format
Xc.setdiag(0)   # sometimes you want to fill same-word cooccurrence with 0
print(Xc.todense())  # print out the matrix in dense format
You can also refer to the dictionary of words in count_model:
count_model.vocabulary_
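For readability, an optional sketch that labels the matrix axes with the fitted vocabulary (vocabulary_ maps each word to its row/column index in Xc):

import pandas as pd

words = sorted(count_model.vocabulary_, key=count_model.vocabulary_.get)
print(pd.DataFrame(Xc.todense(), index=words, columns=words))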
Or, if you want to normalize by the diagonal component (referring to the answer in the previous post):
import scipy.sparse as sp

Xc = (X.T * X)
g = sp.diags(1. / Xc.diagonal())
Xc_norm = g * Xc  # normalized co-occurrence matrix
Further to @Federico Caccia's answer: if you don't want co-occurrence that is spurious within a single text, set occurrences greater than 1 to 1, e.g.
X[X > 0] = 1  # do this line first before computing cooccurrence
Xc = (X.T * X)
...
None of the provided answers took the moving-window concept into consideration, so I wrote my own function that builds the co-occurrence matrix by applying a moving window of a defined size.
This function takes a list of sentences and a window_size number, and it returns a pandas.DataFrame object representing the co-occurrence matrix:
import numpy as np
import pandas as pd
from collections import defaultdict

def co_occurrence(sentences, window_size):
    d = defaultdict(int)
    vocab = set()
    for text in sentences:
        # preprocessing (use tokenizer instead)
        text = text.lower().split()
        # iterate over the tokens of each sentence
        for i in range(len(text)):
            token = text[i]
            vocab.add(token)  # add to vocab
            next_token = text[i+1 : i+1+window_size]
            for t in next_token:
                key = tuple(sorted([t, token]))
                d[key] += 1

    # formulate the dictionary into a dataframe
    vocab = sorted(vocab)  # sort vocab
    df = pd.DataFrame(data=np.zeros((len(vocab), len(vocab)), dtype=np.int16),
                      index=vocab,
                      columns=vocab)
    for key, value in d.items():
        df.at[key[0], key[1]] = value
        df.at[key[1], key[0]] = value
    return df
Let's try it out given the following two simple sentences:
>>> text = ["I go to school every day by bus .",
...         "i go to theatre every night by bus"]
>>>
>>> df = co_occurrence(text, 2)
>>> df
. bus by day every go i night school theatre to
. 0 1 1 0 0 0 0 0 0 0 0
bus 1 0 2 1 0 0 0 1 0 0 0
by 1 2 0 1 2 0 0 1 0 0 0
day 0 1 1 0 1 0 0 0 1 0 0
every 0 0 2 1 0 0 0 1 1 1 2
go 0 0 0 0 0 0 2 0 1 1 2
i 0 0 0 0 0 2 0 0 0 0 2
night 0 1 1 0 1 0 0 0 0 1 0
school 0 0 0 1 1 1 0 0 0 0 1
theatre 0 0 0 0 1 1 0 1 0 0 1
to 0 0 0 0 2 2 2 0 1 1 0
[11 rows x 11 columns]
Now, we have our co-occurrence matrix.
@titipata: I think your solution is not a good metric because we are giving the same weight to real co-occurrences and to occurrences that are just spurious.
For example, if I have 5 texts and the words apple and house appear with these frequencies:
text1: apple:10, house:1
text2: apple:10, house:0
text3: apple:10, house:0
text4: apple:10, house:0
text5: apple:10, house:0
The co-occurrence we are going to measure is 10*1 + 10*0 + 10*0 + 10*0 + 10*0 = 10, but it is just spurious.
And in other important cases, like the following:
text1: apple:1, banana:1
text2: apple:1, banana:1
text3: apple:1, banana:1
text4: apple:1, banana:1
text5: apple:1, banana:1
we are going to get a co-occurrence of just 1*1 + 1*1 + 1*1 + 1*1 + 1*1 = 5, when in fact that co-occurrence is really important.
@Guiem Bosch: in that case co-occurrences are measured only when the two words are contiguous.
I propose using something like @titipata's solution to compute the matrix:
Xc = (Y.T * Y)  # this is the co-occurrence matrix in sparse csr format
where, instead of using X, we use a matrix Y with ones in the positions greater than 0 and zeros in the other positions.
Using this, in the first example we are going to have
co-occurrence: 1*1 + 1*0 + 1*0 + 1*0 + 1*0 = 1
and in the second example
co-occurrence: 1*1 + 1*1 + 1*1 + 1*1 + 1*1 = 5
which is what we are really looking for.
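A minimal sketch of that binarization, assuming X comes from CountVectorizer as in @titipata's answer:

Y = (X > 0).astype(int)  # 1 where a word occurs in a text at all, 0 otherwise
Xc = Y.T * Y             # each text now contributes at most 1 per word pair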
You can use the ngram_range parameter in the CountVectorizer or TfidfVectorizer.
Code example:
bigram_vectorizer = CountVectorizer(ngram_range=(2, 2))  # by saying (2, 2) you are telling it you only want pairs of 2 words
In case you want to explicitly say which co-occurrences of words you want to count, use the vocabulary param, i.e.: vocabulary = {'awesome unicorns': 0, 'batman forever': 1}
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
Here is self-explanatory, ready-to-use code with predefined word-word co-occurrences. In this case we are tracking co-occurrences of awesome unicorns and batman forever:
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np

samples = ['awesome unicorns are awesome', 'batman forever and ever', 'I love batman forever']
bigram_vectorizer = CountVectorizer(ngram_range=(1, 2), vocabulary={'awesome unicorns': 0, 'batman forever': 1})
co_occurrences = bigram_vectorizer.fit_transform(samples)
print('Printing sparse matrix:', co_occurrences)
print('Printing dense matrix (cols are vocabulary keys 0 -> "awesome unicorns", 1 -> "batman forever"):', co_occurrences.todense())
sum_occ = np.sum(co_occurrences.todense(), axis=0)
print('Sum of word-word occurrences:', sum_occ)
print('Pretty printing of co_occurrences count:', list(zip(bigram_vectorizer.get_feature_names(), np.array(sum_occ)[0].tolist())))
The final output is ('awesome unicorns', 1), ('batman forever', 2), which corresponds exactly to the data in our samples.
I used the code below to create a co-occurrence matrix with a window size:
# https://stackoverflow.com/questions/4843158/check-if-a-python-list-item-contains-a-string-inside-another-string
import pandas as pd

def co_occurance_matrix(input_text, top_words, window_size):
    co_occur = pd.DataFrame(index=top_words, columns=top_words)
    for row, nrow in zip(top_words, range(len(top_words))):
        for colm, ncolm in zip(top_words, range(len(top_words))):
            count = 0
            if row == colm:
                co_occur.iloc[nrow, ncolm] = count
            else:
                for single_essay in input_text:
                    essay_split = single_essay.split(" ")
                    max_len = len(essay_split)
                    top_word_index = [index for index, split in enumerate(essay_split) if row in split]
                    for index in top_word_index:
                        if index == 0:
                            count = count + essay_split[:window_size + 1].count(colm)
                        elif index == (max_len - 1):
                            count = count + essay_split[-(window_size + 1):].count(colm)
                        else:
                            count = count + essay_split[index + 1:(index + window_size + 1)].count(colm)
                            if index < window_size:
                                count = count + essay_split[:index].count(colm)
                            else:
                                count = count + essay_split[(index - window_size):index].count(colm)
                co_occur.iloc[nrow, ncolm] = count
    return co_occur
Then I used the code below to run a test:
corpus = ['ABC DEF IJK PQR', 'PQR KLM OPQ', 'LMN PQR XYZ ABC DEF PQR ABC']
words = ['ABC', 'PQR', 'DEF']
window_size = 2

result = co_occurance_matrix(corpus, words, window_size)
result
With numpy, where corpus is a list of lists (each inner list a tokenized document):
corpus = [['<START>', 'All', 'that', 'glitters', "isn't", 'gold', '<END>'],
          ['<START>', "All's", 'well', 'that', 'ends', 'well', '<END>']]
and we build a word -> row/column mapping:
import numpy as np

def compute_co_occurrence_matrix(corpus, window_size):
    words = sorted(list(set([word for words_list in corpus for word in words_list])))
    num_words = len(words)
    M = np.zeros((num_words, num_words))
    word2Ind = dict(zip(words, range(num_words)))
    for doc in corpus:
        cur_idx = 0
        doc_len = len(doc)
        while cur_idx < doc_len:
            left = max(cur_idx - window_size, 0)
            right = min(cur_idx + window_size + 1, doc_len)
            words_to_add = doc[left:cur_idx] + doc[cur_idx+1:right]
            focus_word = doc[cur_idx]
            for word in words_to_add:
                outside_idx = word2Ind[word]
                M[outside_idx, word2Ind[focus_word]] += 1
            cur_idx += 1
    return M, word2Ind
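For completeness, a small usage sketch with the corpus above (window size 2 is an arbitrary choice):

M, word2Ind = compute_co_occurrence_matrix(corpus, window_size=2)
print(word2Ind['well'])                       # row/column index of 'well'
print(M[word2Ind['that'], word2Ind['well']])  # how often 'well' falls within 2 words of 'that'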