Python: duplicating values when appending to a list in a class

This is my first StackOverflow post, so please tell me if I have done anything wrong!
I am trying to make a card game in Python and have been told that a class-based design would be best.
While doing so, I found that when I put all the cards into the deck, the values also get duplicated into board.cards.
#definitions and imports
import random

class Card:
    suit = ""
    base = ""

class Hand:
    cards = []
    poweri = 0
    powerii = 0

class Stack:
    cards = []

#instantiates classes
deck = Stack()
board = Stack()
player = Hand()
dealer = Hand()

#creates ordered empty deck
def newdeck(obj):
    for x in ["2","3","4","5","6","7","8","9","A","B","C","D","E"]:
        for y in ["C","D","H","S"]:
            card = Card()
            card.base = x
            card.suit = y
            obj.cards.append(card)

#shuffles deck
def shuffle():
    random.shuffle(deck.cards)

newdeck(deck)
#disabled to make debug easier
#shuffle()

#prints entire deck
print("\nDeck")
for i in range(len(deck.cards)):
    print(deck.cards[i].base, deck.cards[i].suit)
print(len(deck.cards))

#prints entire board
print("\nBoard")
for i in range(len(board.cards)):
    print(board.cards[i].base, board.cards[i].suit)
The program returns this:
Deck
2 C
2 D
2 H
2 S
3 C
3 D
3 H
3 S
4 C
4 D
4 H
4 S
5 C
5 D
5 H
5 S
6 C
6 D
6 H
6 S
7 C
7 D
7 H
7 S
8 C
8 D
8 H
8 S
9 C
9 D
9 H
9 S
A C
A D
A H
A S
B C
B D
B H
B S
C C
C D
C H
C S
D C
D D
D H
D S
E C
E D
E H
E S
52
Board
2 C
2 D
2 H
2 S
3 C
3 D
3 H
3 S
4 C
4 D
4 H
4 S
5 C
5 D
5 H
5 S
6 C
6 D
6 H
6 S
7 C
7 D
7 H
7 S
8 C
8 D
8 H
8 S
9 C
9 D
9 H
9 S
A C
A D
A H
A S
B C
B D
B H
B S
C C
C D
C H
C S
D C
D D
D H
D S
E C
E D
E H
E S
Board should be empty???

Stack.cards is a mutable class attribute. That means all instances of the class share a reference to the same object, with the same data.
You probably want each object to have its own data. To get that, create cards inside the __init__() method:
class Stack:
    def __init__(self):
        self.cards = []
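The same fix applies to Hand (and, for consistency, Card). A minimal sketch of all three classes rewritten with per-instance attributes (the keyword defaults on Card are my addition, not from the original post):

class Card:
    def __init__(self, base="", suit=""):
        self.base = base    # rank: "2".."9", "A".."E"
        self.suit = suit    # "C", "D", "H" or "S"

class Hand:
    def __init__(self):
        self.cards = []     # each Hand now owns its own list
        self.poweri = 0
        self.powerii = 0

class Stack:
    def __init__(self):
        self.cards = []     # deck and board no longer share one list

With these definitions, newdeck(deck) fills only deck.cards and board.cards stays empty.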

Related

Slow NetworkX graph creation

I have to create a graph, starting from a document-term matrix loaded into a pandas dataframe, where nodes are terms and where edges carry the number of documents in which the two terms appear together.
The code works, but it is really, really slow.
edges = []
edges_attrs = {}
columns = list(dtm.columns)
for key in dtm.columns:
    for key1 in columns:
        # skip self-loops
        if key == key1:
            continue
        df = dtm.loc[(dtm[key] != 0) & (dtm[key1] != 0), [key, key1]]
        docs = df.shape[0]
        edges.append((key, key1))
        edges_attrs[(key, key1)] = {'docs': docs}
    # no duplicate edges: (u, v) == (v, u)
    columns.remove(key)
graph.add_edges_from(edges)
nx.set_edge_attributes(graph, edges_attrs)
For a dtm with 2k terms (columns), it takes more than 3 hours, which seems far too long for that size.
Any hints on how to speed it up?
Don't use for loops. Learn about inner and outer joins in databases; an introductory course in SQL would cover these concepts. Applying them to a pandas dataframe is then pretty straightforward:
#!/usr/bin/env python
"""
https://stackoverflow.com/q/62406586/2912349
"""
import numpy as np
import pandas as pd
# simulate some data
x = pd.DataFrame(np.random.normal(0, 1, (4,4)), index=['a', 'b', 'c', 'd'], columns=['e', 'f', 'g', 'h'])
x[:] = x > 0
# e f g h
# a False False True False
# b False False False True
# c True True True True
# d False True True True
sparse = pd.DataFrame(x[x > 0].stack().index.tolist(), columns=['Documents', 'Terms'])
# Documents Terms
# 0 a g
# 1 b h
# 2 c e
# 3 c f
# 4 c g
# 5 c h
# 6 d f
# 7 d g
# 8 d h
cooccurrences = pd.merge(sparse, sparse, how='inner', on='Documents')
# Documents Terms_x Terms_y
# 0 a g g
# 1 b h h
# 2 c e e
# 3 c e f
# 4 c e g
# 5 c e h
# 6 c f e
# 7 c f f
# 8 c f g
# 9 c f h
# 10 c g e
# 11 c g f
# 12 c g g
# 13 c g h
# 14 c h e
# 15 c h f
# 16 c h g
# 17 c h h
# 18 d f f
# 19 d f g
# 20 d f h
# 21 d g f
# 22 d g g
# 23 d g h
# 24 d h f
# 25 d h g
# 26 d h h
# remove self loops and repeat pairings such as the second tuple in (u, v), (v, u)
valid = cooccurrences['Terms_x'] > cooccurrences['Terms_y']
valid_cooccurrences = cooccurrences[valid]
# Documents Terms_x Terms_y
# 6 c f e
# 10 c g e
# 11 c g f
# 14 c h e
# 15 c h f
# 16 c h g
# 21 d g f
# 24 d h f
# 25 d h g
counts = valid_cooccurrences.groupby(['Terms_x', 'Terms_y']).count()
# Documents
# Terms_x Terms_y
# f e 1
# g e 1
# f 2
# h e 1
# f 2
# g 2
documents = valid_cooccurrences.groupby(['Terms_x', 'Terms_y']).aggregate(lambda x : set(x))
# Documents
# Terms_x Terms_y
# f e {c}
# g e {c}
# f {d, c}
# h e {c}
# f {d, c}
# g {d, c}
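To close the loop on the original goal, the pair counts can be pushed straight back into NetworkX. A short sketch under the same setup (the edge_list and docs names are mine):

import networkx as nx

# one row per undirected term pair, with the document count as an edge attribute
edge_list = (valid_cooccurrences
             .groupby(['Terms_x', 'Terms_y'])
             .count()
             .reset_index()
             .rename(columns={'Documents': 'docs'}))
graph = nx.from_pandas_edgelist(edge_list, 'Terms_x', 'Terms_y', edge_attr='docs')
print(graph.edges(data=True))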

Sorting a dataframe by another

I have an initial dataframe X:
x y z w
0 1 a b c
1 1 d e f
2 0 g h i
3 0 k l m
4 -1 n o p
5 -1 q r s
6 -1 t v à
with many columns and rows (this is a toy example). After applying some Machine Learning procedures, I get back a similar dataframe, but with the -1s changed to 0s or 1s and the rows sorted in a different way; for example:
x y z w
4 1 n o p
0 1 a b c
6 0 t v à
1 1 d e f
2 0 g h i
5 0 q r s
3 0 k l m
How can I sort the second dataframe so that its rows are in the same order as the first one? For example, like
x y z w
0 1 a b c
1 1 d e f
2 0 g h i
3 0 k l m
4 1 n o p
5 0 q r s
6 0 t v à
If you can't trust just sorting the indexes (e.g. if the first df's indexes are not sorted, or if you have something other than a RangeIndex), just use loc:
df2.loc[df.index]
x y z w
0 1 a b c
1 1 d e f
2 0 g h i
3 0 k l m
4 1 n o p
5 0 q r s
6 0 t v à
Use:
df2.sort_index(inplace=True)
It restores the original order just by sorting on the index.
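A quick end-to-end check of both suggestions on a tiny frame (a sketch; the random shuffle stands in for the ML step):

import pandas as pd

df = pd.DataFrame({'x': [1, 1, 0], 'z': ['a', 'd', 'g']})  # original order
df2 = df.sample(frac=1, random_state=0)                    # shuffled, like the ML output
print(df2.loc[df.index])   # rows back in df's order
print(df2.sort_index())    # same result, since df's index is a sorted RangeIndex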

Clustering geodata into same size group with K-means in Python

I have the following sample geo-data (100 rows), and I want to cluster the POIs into 10 groups with 10 points in each group; if possible, the data in each group should also come from the same area.
id areas lng lat
1010094160 A 116.31967 40.03229
1010737675 A 116.28941 40.03968
1010724217 A 116.32256 40.048
1010122181 A 116.28683 40.09652
1010732739 A 116.33482 40.06456
1010730289 A 116.3724 40.04066
1010737817 A 116.24174 40.074
1010124109 A 116.2558 40.08371
1010732695 B 116.31591 40.07096
1010112361 B 116.33331 39.96539
1010042095 B 116.31283 39.98804
1010097579 B 116.37637 39.98865
1010110203 B 116.41351 40.00851
1010085120 B 116.41364 39.98069
1010310183 B 116.42757 40.03738
1010087029 B 116.38947 39.97715
1010737155 B 116.38391 39.9849
1010729305 B 116.37803 40.04512
1010085100 B 116.37679 39.98838
1010750159 B 116.32162 39.98518
1010061742 B 116.31618 39.99087
1010091848 B 116.37617 39.97739
1010104343 C 116.3295 39.98156
1010091704 C 116.37236 39.9943
1010086652 C 116.36102 39.92978
1010030017 C 116.39017 39.99287
1010091851 C 116.35854 40.0063
1010705229 C 116.39114 39.97511
1010107321 C 116.42535 39.95417
1010130423 C 116.31651 40.04164
1010126133 C 116.29051 40.05081
1010177543 C 116.41114 39.99635
1010123271 C 116.35923 40.02031
1010315589 C 116.33906 39.99895
Here is the expected result:
id areas lng lat clusterNumber
1010094160 A 116.31967 40.03229 0
1010737675 A 116.28941 40.03968 0
1010724217 A 116.32256 40.048 0
1010122181 A 116.28683 40.09652 0
1010732739 A 116.33482 40.06456 0
1010730289 A 116.3724 40.04066 0
1010737817 A 116.24174 40.074 0
1010124109 A 116.2558 40.08371 0
1010732695 B 116.31591 40.07096 0
1010112361 B 116.33331 39.96539 1
1010042095 B 116.31283 39.98804 1
1010097579 B 116.37637 39.98865 1
1010110203 B 116.41351 40.00851 1
1010085120 B 116.41364 39.98069 1
1010310183 B 116.42757 40.03738 1
1010087029 B 116.38947 39.97715 1
1010737155 B 116.38391 39.9849 1
1010729305 B 116.37803 40.04512 1
1010085100 B 116.37679 39.98838 1
1010750159 B 116.32162 39.98518 2
1010061742 B 116.31618 39.99087 2
1010091848 B 116.37617 39.97739 2
1010104343 C 116.3295 39.98156 2
1010091704 C 116.37236 39.9943 2
1010086652 C 116.36102 39.92978 2
1010030017 C 116.39017 39.99287 2
1010091851 C 116.35854 40.0063 2
1010705229 C 116.39114 39.97511 2
1010107321 C 116.42535 39.95417 2
1010130423 C 116.31651 40.04164 3
1010126133 C 116.29051 40.05081 3
I have tried K-means, but I can't make each group the same size. Is there a better method I can use in Python? Please share your ideas and hints. Thanks.
Here is what I have tried:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = []
for row in result:
    X.append([float(row['lng']), float(row['lat'])])
X = np.array(X)
n_clusters = 100
cls = KMeans(n_clusters, random_state=0).fit(X)
#cls = EqualGroupsKMeans(n_clusters, random_state=0).fit(X)
#km1 = KMeans(n_clusters=6, n_init=25, max_iter = 600, random_state=0)
cls.labels_
markers = ['^','x','o','*','+', '+']
colors = ['b', 'c', 'g', 'k', 'm', 'r']
for i in range(n_clusters):
    members = cls.labels_ == i
    print(len(X[members,0]))
    #plt.scatter(X[members,0],X[members,1],s=6,marker=markers[i],c=colors[i],alpha=0.5)
    plt.scatter(X[members,0],X[members,1],s=6,marker="^",c=colors[i%6],alpha=0.5)
plt.title(' ')
plt.show()
Here is a reference I found on GitHub for Same-Size-K-Means:
https://github.com/ndanielsen/Same-Size-K-Means
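For what it's worth, one simple way to force equal group sizes is a greedy, capacity-constrained assignment on top of ordinary K-means centroids. A rough sketch of that idea (my own illustration, not the algorithm from the linked repository; it assumes len(X) divides evenly by n_clusters and ignores the same-area wish):

import numpy as np
from sklearn.cluster import KMeans

def equal_size_labels(X, n_clusters, random_state=0):
    # fit plain K-means just to get centroids
    centers = KMeans(n_clusters, random_state=random_state).fit(X).cluster_centers_
    capacity = len(X) // n_clusters                 # e.g. 100 points / 10 clusters = 10 each
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = np.full(len(X), -1)
    counts = np.zeros(n_clusters, dtype=int)
    # assign the most clear-cut points first
    for i in np.argsort(d.min(axis=1)):
        for c in np.argsort(d[i]):                  # nearest centroid with spare capacity wins
            if counts[c] < capacity:
                labels[i] = c
                counts[c] += 1
                break
    return labels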

How to extract data recursively on Linux?

I'm attempting to work on a large dataset; however, the data has been split up into hundreds of directories.
data/:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v w x y z
data/0:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/1:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/2:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/3:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/4:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/5:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/6:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/7:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/8:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/9:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
data/a:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s symbols t u v w x y z
Furthermore, the file types are also completely random.
0: UTF-8 Unicode text
1: UTF-8 Unicode text
2: UTF-8 Unicode text
3: UTF-8 Unicode text
4: UTF-8 Unicode text
5: Non-ISO extended-ASCII text, with LF, NEL line terminators
6: UTF-8 Unicode text
7: UTF-8 Unicode text
8: UTF-8 Unicode text
9: UTF-8 Unicode text
a: UTF-8 Unicode text
...
z: UTF-8 Unicode text
The files contain data in an email:password format.
How can I get all of the content into a JSON file or a CSV file?
I'm looking to import the data to MongoDB.
Thanks.
I'm sure someone will help you better than I can, but I'll try to point you in the right direction.
Have you tried making a Perl script? I.e.
opendir(DIR, ".");
@files = grep(/\.cnf$/, readdir(DIR));
closedir(DIR);
foreach $file (@files) {
    # shove into a JSON file
}
Something like that?
The question was tagged with python, so I would recommend os.walk() (documentation) for recursively reading files. Something like:
import os

# path is the path to the data
for subdir, dirs, files in os.walk(path):
    for file in files:
        file_path = os.path.join(subdir, file)
        try:
            read_file(file_path)  # this is where you read the file and push to mongo etc.
        except:
            continue
For the second part about reading Non-ISO extended-ASCII English text, there are some answers that might be helpful here: File encoding from English text to UTF-8
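Putting both parts together, here is a hedged end-to-end sketch that walks the tree, parses email:password lines, and writes newline-delimited JSON, which mongoimport accepts directly (the file and field names are my assumptions):

import json
import os

path = 'data'  # root of the dataset
with open('records.json', 'w', encoding='utf-8') as out:
    for subdir, dirs, files in os.walk(path):
        for name in files:
            # errors='replace' tolerates the non-UTF-8 files listed above
            with open(os.path.join(subdir, name), encoding='utf-8', errors='replace') as f:
                for line in f:
                    email, sep, password = line.strip().partition(':')
                    if sep:  # skip lines without an email:password separator
                        out.write(json.dumps({'email': email, 'password': password}) + '\n')

The result can then be loaded with something like mongoimport --db mydb --collection creds --file records.json.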

Regex pattern won't return matches in Python script

Why does the first snippet return digits, but the second does not? I have tried more complicated expressions without success. The expressions I use are valid according to pythex.org, but they do not work in the script.
(\d{6}-){7}\d{6} is one such expression. I've tested it against this string: 123138-507716-007469-173316-534644-033330-675057-093280
import re
pattern = re.compile('(\d{1})')
load_file = open('demo.txt', 'r')
search_file = load_file.read()
result = pattern.findall(search_file)
print(result)
==============
import re
pattern = re.compile('(\d{6})')
load_file = open('demo.txt', 'r')
search_file = load_file.read()
result = pattern.findall(search_file)
print(result)
When I put the string into a variable and then search the variable it works just fine. This should work as is. But it doesn't help if I want to read a text file. I've tried to read each line of the file and that seems to be where the script breaks down.
import re
pattern = re.compile('((\d{6}-){7})')
#pattern = re.compile('(\d{6})')
#load_file = open('demo.txt', 'r')
#search_file = load_file.read()
test_string = '123138-507716-007469-173316-534644-033330-675057-093280'
result = pattern.findall(test_string)
print(result)
=========
Printout:
Search File:
ÿþB i t L o c k e r D r i v e E n c r y p t i o n R e c o v e r y K e y
T h e r e c o v e r y k e y i s u s e d t o r e c o v e r t h e d a t a o n a B i t L o c k e r p r o t e c t e d d r i v e .
T o v e r i f y t h a t t h i s i s t h e c o r r e c t r e c o v e r y k e y c o m p a r e t h e i d e n t i f i c a t i o n w i t h w h a t i s p r e s e n t e d o n t h e r e c o v e r y s c r e e n .
R e c o v e r y k e y i d e n t i f i c a t i o n : f f s d f a - f s d f - s f
F u l l r e c o v e r y k e y i d e n t i f i c a t i o n : 8 8 8 8 8 8 8 8 - 8 8 8 8 - 8 8 8 8 - 8 8 8 8 - 8 8 8 8 8 8 8 8 8 8 8
B i t L o c k e r R e c o v e r y K e y :
1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1 - 1 1 1 1 1 1
6 6 6 6 6 6
Search Results:
[]
================
This is where I ended up. It finds the string just fine and without the commas.
import re
pattern = re.compile('(\w{6}-\w{6}-\w{6}-\w{6}-\w{6}-\w{6}-\w{6}-\w{6})')
load_file = open('demo3.txt', 'r')
for line in load_file:
    print(pattern.findall(line))
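A likely explanation for the original problem: the ÿþ at the start of the printout is a UTF-16 byte order mark, and the spaced-out letters are the tell-tale sign of UTF-16 text being read as 8-bit characters, so null bytes sit between the digits and \d{6} can never match six in a row. A small sketch, assuming the file really is UTF-16 (demo.txt as in the question):

import re

pattern = re.compile(r'(?:\d{6}-){7}\d{6}')
# decoding as UTF-16 makes the digits adjacent again
with open('demo.txt', 'r', encoding='utf-16') as f:
    print(pattern.findall(f.read()))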
