In Chapter 2 of Machine Learning in Action, one example reads records from a file, where each line looks like:
124 110 223 largeDoses
(never mind what the fields actually mean)
One function in kNN.py is:
def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    fr = open(filename)
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
The problem is that listFromLine[-1] is a string ('largeDoses', etc.), so how can it be converted to an int?
The book says NumPy can handle this.
(From the book: You have to explicitly tell the interpreter that you’d like the integer version of the last item in the list, or it will give you the string version. Usually, you’d have to do this, but NumPy takes care of those details for you.)
However,
ValueError: invalid literal for int() with base 10: 'largeDoses'
occurs for
import kNN
kNN.file2matrix('dataset.txt')
BTW, the Chinese edition of the book differs from the English edition.
A string like this (indeed) cannot be converted to an int, neither in Python nor in any other environment.
However, the solution is to put Machine Learning (indeed) in action.
If all kNN input training / cross-validation records (a.k.a. observations, examples)
conform to the convention [3x FEATURE, 1x LABEL],
use:
classLabelVector.append(listFromLine[-1])  # append the LABEL itself, not int(LABEL)
You should convert 'largeDoses', 'smallDoses' and 'didntLike' to numbers by hand. A string cannot be converted to an int unless it contains an integer literal.
if listFromLine[-1] == 'largeDoses':
    listFromLine[-1] = '3'
elif listFromLine[-1] == 'smallDoses':
    listFromLine[-1] = '2'
else:
    listFromLine[-1] = '1'
Rather than converting the string directly to an integer, look it up in a table (a dict) that maps each label to a number. The modified code is as follows.
labels = {'didntLike':1,'smallDoses':2,'largeDoses':3}
classLabelVector.append(labels[listFromLine[-1]])
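Putting the lookup-table idea into a complete function, here is a minimal sketch. It assumes tab-separated records with three numeric features followed by one of the three label strings. The name `parse_records` and the `LABELS` table are illustrative and not from the book, and plain lists are used instead of a NumPy array so the example has no dependencies.

```python
import io

# Hypothetical label table: maps each class name in the file to an integer code.
LABELS = {'didntLike': 1, 'smallDoses': 2, 'largeDoses': 3}

def parse_records(lines):
    """Return (features, labels) from tab-separated lines: 3 numbers + 1 label."""
    features, labels = [], []
    for line in lines:
        fields = line.strip().split('\t')
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        features.append([float(value) for value in fields[:3]])
        labels.append(LABELS[fields[-1]])  # dict lookup instead of int('largeDoses')
    return features, labels

# An in-memory stand-in for the data file.
sample = io.StringIO("124\t110\t223\tlargeDoses\n40\t8\t1\tdidntLike\n")
features, labels = parse_records(sample)
print(features)  # [[124.0, 110.0, 223.0], [40.0, 8.0, 1.0]]
print(labels)    # [3, 1]
```

An unknown label raises a KeyError here, which is usually preferable to silently assigning it a default class.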
So I want to store a long integer that is too big for one line in Python. Do I just ignore PEP 8 and make the line longer than 120 characters? Because if I do it like this:
num="""7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843
8586156078911294949545950173795833195285320880551112540698747158523863050715693290963295227443043557
6689664895044524452316173185640309871112172238311362229893423380308135336276614282806444486645238749
3035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776
6572733300105336788122023542180975125454059475224352584907711670556013604839586446706324415722155397
5369781797784617406495514929086256932197846862248283972241375657056057490261407972968652414535100474
8216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586
1786645835912456652947654568284891288314260769004224219022671055626321111109370544217506941658960408
0719840385096245544436298123098787992724428490918884580156166097919133875499200524063689912560717606
0588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"""
and try to access a specific index of that integer or use len() on it I get a length of 1009 instead of the 1000 digits the number actually has. And putting everything into one line would make that line 1004 characters long which doesn't seem that great either.
I would use the following literal spread over multiple lines in parentheses for cleanliness:
num = (
    '7316717653'
    '1330624919'
    '2251196744'
)
so that len(num) for the example above returns 30.
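To see why the two forms give different lengths, the sketch below compares a triple-quoted string with adjacent literals in parentheses. The short digit groups are made up for illustration.

```python
# A triple-quoted string keeps the line breaks between its lines.
triple_quoted = """12345
67890
12345"""

# Adjacent string literals inside parentheses are joined at compile time.
joined = (
    '12345'
    '67890'
    '12345'
)

print(len(triple_quoted))  # 17: 15 digits plus 2 newline characters
print(len(joined))         # 15
print(int(joined) + 1)     # 123456789012346
```

The same arithmetic explains the 1009 in the question: ten lines of 100 digits are separated by nine newline characters.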
Another option you have is to put the number into another file (say number.txt) and read it at runtime:
number.txt
7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450
main.py
with open("number.txt", "r") as f:
    number = f.read()
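One detail worth guarding against with this approach: text files usually end with a newline, so `f.read()` may return one extra character. A minimal sketch, using a temporary file in place of number.txt and a short made-up number:

```python
import os
import tempfile

# Write a stand-in for number.txt; most editors add the trailing newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("7316717653\n")
    path = tmp.name

with open(path, "r") as f:
    raw = f.read()

number = raw.strip()  # drop the trailing newline and any stray whitespace
os.remove(path)

print(len(raw))     # 11
print(len(number))  # 10
print(int(number))  # 7316717653
```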
I wouldn't use this personally, but one option is to remove the newlines:
num = """
123
456
""".replace('\n', '')
print(repr(num)) # -> '123456'
There's lots of good answers already, but here's one that will give you a bit of extra convenience. You just have to put in a number and the size of the chunks per line, and you can reuse it for lots of long numbers, if needed:
Format your number into multiple strings using a for loop and string concatenation:
x = str(7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450)
y = []
y.append("long_num = (")
chunksize = 10
for i in range(0, len(x), chunksize):
    y.append("\t" + "\"" + x[i:i+chunksize] + "\"")
y.append(")")
for part in y:
    print(part)
This outputs the following text, which you can paste into your code (see @blhsing's answer):
long_num = (
"7316717653"
"1330624919"
"2251196744"
"2657474235"
"5349194934"
"9698352031"
"2774506326"
"2395783180"
"1698480186"
...
)
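The same idea can be wrapped in a function so it is easy to reuse. This is only a sketch: the name `format_long_number` and its parameters are made up here, it indents with four spaces rather than a tab, and a shorter number is used to keep the example readable.

```python
def format_long_number(number, chunk_size=10, name="long_num"):
    """Return source text that defines `name` as parenthesised string chunks."""
    digits = str(number)
    lines = [f"{name} = ("]
    for start in range(0, len(digits), chunk_size):
        lines.append(f'    "{digits[start:start + chunk_size]}"')
    lines.append(")")
    return "\n".join(lines)

source = format_long_number(73167176531330624919225119674426, chunk_size=10)
print(source)

# The generated text is valid Python that rebuilds the original digits.
namespace = {}
exec(source, namespace)
print(namespace["long_num"])  # 73167176531330624919225119674426
```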
You can take a look at this post: Is there a way to implement methods like __len__ or __eq__ as classmethods?
Simply make a class for your long integer and override its __len__ method so that it does not count \n.
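A minimal sketch of that idea, assuming a `str` subclass is acceptable; the class name `DigitString` is made up. Note that only `len()` changes here: indexing would still see the newline characters, so stripping them once is usually simpler.

```python
class DigitString(str):
    """A str whose len() ignores newline characters."""

    def __len__(self):
        # Count every character except the line breaks.
        return sum(1 for ch in str(self) if ch != "\n")

num = DigitString("""123
456
789""")

print(len(num))       # 9
print(len(str(num)))  # 11
```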
I'm using a function to build an array of strings (which happens to be 0s and 1s only), which are rather large. The function works when I am building smaller strings, but somehow the data type seems to be restricting the size of the string to 32 characters long (U32), without my having asked for it. Am I missing something simple?
As I build the strings, I first convert them to lists so that I can more easily manipulate individual characters before joining them into a string again. Am I somehow limiting my ability to use 'larger' data types by my method? The value of np.max(CM1) in this case is something like ~300 (one recent run yielded 253), but the strings only come out 32 characters long...
''' Function to derive genome and count mutations in provided list of cells '''
def derive_genome_biopsy(biopsy_list, family_dict, CM1):
    derived_genomes_inBx = np.zeros(len(biopsy_list)).astype(str)
    for position, cell in np.ndenumerate(biopsy_list):
        if cell == 0: continue
        temp_parent = 2
        bitstring = list('1')
        bitstring += (np.max(CM1)-1)*'0'
        if cell == 1:
            derived_genomes_inBx[position] = ''.join(bitstring)
            continue
        else:
            while temp_parent > 1:
                temp_parent = family_dict[cell]
                bitstring[cell-1] = '1'
                if temp_parent == 1: break
                cell = family_dict[cell]
            derived_genomes_inBx[position] = ''.join(bitstring)
    return derived_genomes_inBx
The specific error message I get is:
Traceback (most recent call last):
File "biopsyCA.py", line 77, in <module>
if genome[site] == '1':
IndexError: string index out of range
family_dict is a dictionary which carries a list of parents and children that the algorithm above works through to reconstruct the 'genome' of individuals from the branching family tree. It basically sets positions in the bitstring to '1' if your parent had it, then your grandparent, and so on, until you get to the first bit, which is always '1'; then it should be done.
The 32-character limit comes from converting a float64 array to a string array in this line:
derived_genomes_inBx = np.zeros(len(biopsy_list)).astype(str)
The resulting array has a fixed-width string dtype, S32 (U32 on Python 3), which limits each element to 32 characters; longer strings are silently truncated on assignment.
To raise this limit, use 'S300' (or 'U300' on Python 3) or larger instead of str.
You can also use list(map(str, np.zeros(len(biopsy_list)))) to get a more flexible list of strings and convert it back to a NumPy array with numpy.array() after you have populated it.
Thanks to help from a number of folks here and local, I finally got this working and the working function is:
''' Function to derive genome and count mutations in provided list of cells '''
def derive_genome_biopsy(biopsy_list, family_dict, CM1):
    derived_genomes_inBx = list(map(str, np.zeros(len(biopsy_list))))
    for biopsy in range(0,len(biopsy_list)):
        if biopsy_list[biopsy] == 0:
            bitstring = (np.max(CM1))*'0'
            derived_genomes_inBx[biopsy] = ''.join(bitstring)
            continue
        bitstring = list('1')
        bitstring += (np.max(CM1)-1)*'0'
        if biopsy_list[biopsy] == 1:
            derived_genomes_inBx[biopsy] = ''.join(bitstring)
            continue
        else:
            temp_parent = family_dict[biopsy_list[biopsy]]
            bitstring[biopsy_list[biopsy]-1] = '1'
            while temp_parent > 1:
                # NOTE: `position` is not defined in this function (unless it is a
                # global); `temp_parent` was probably intended here.
                temp_parent = family_dict[position]
                bitstring[temp_parent-1] = '1'
                if temp_parent == 1: break
            derived_genomes_inBx[biopsy] = ''.join(bitstring)
    return derived_genomes_inBx
The original problem was, as Teppo Tammisto pointed out, that astype(str) produced a fixed-width 'S32' dtype. Once I switched to list(map(str, ...)), a few more issues with the original code surfaced, which I've now fixed. When I finish this thesis chapter I'll publish the whole family of functions used to virtually 'biopsy' a cellular automaton model (well, just an array really) and reconstruct 'genomes' from family tree data and the current automaton state vector.
Thanks all!
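For readers who want the core idea without NumPy, here is a simplified sketch of the ancestry walk using plain Python strings, which have no fixed width. It assumes `family_dict` maps each cell id to its parent id and that cell 1 is the root. The function name `derive_genome` and the toy family tree are invented for illustration and are not the thesis code.

```python
def derive_genome(cell, family_dict, genome_length):
    """Return a bitstring with a '1' for the cell and each of its ancestors."""
    if cell == 0:
        return "0" * genome_length  # empty site: no genome
    bits = ["0"] * genome_length
    bits[0] = "1"  # the root (cell 1) is an ancestor of every cell
    while cell > 1:
        bits[cell - 1] = "1"
        cell = family_dict[cell]  # step up to the parent
    return "".join(bits)

# Toy family tree (child -> parent); ids above 32 show there is no width limit.
family_dict = {2: 1, 3: 2, 5: 3, 40: 5}
genome = derive_genome(40, family_dict, genome_length=40)
print(genome)
print(len(genome))  # 40
```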
I want to read hexadecimal data from the user in Python. How do I read the data and convert it to hex?
#to read varibales from Python
STX = '\xF7' #hex(input("enter STX Value"))
Deviceid = hex(input("enter device id"))
subid = hex(input("enter address of the Device and load details"))
Comnd = hex(41)
Data = hex(01)
EorCode = input("enter EOR Code")
ADD_sum = '\xF2' #hex(input("Enter Add sum value"))
tuple = (STX, Deviceid,subid,Comnd,Data,EorCode,ADD_sum)
print tuple
I am reading the above data from the user, but I am getting the following output:
enter device id03
enter address of the Device and load details81
enter EOR Code32
('\xf7', '0x3', '0x51', '0x29', '0x1', '0x20', '\xf2')
But I need the values to be printed as 0x03 and 0x01.
I am very new to Python, please help.
You're looking for string formatting:
>>> "0x{0:04x}".format(42)
'0x002a'
So you'll want to modify your lines like so:
Deviceid = "0x{0:02x}".format(input("enter device id"))
Also, if other Python developers will be looking at this code, you may want to look at the Python style guide, PEP 8.
Following the style guide, your code might look like this:
stx = '\xF7' # hex(input("enter STX Value"))
device_id = hex(input("enter device id")) # deviceid might also be fine
sub_id = hex(input("enter address of the Device and load details"))
comnd = hex(41)
data = hex(01)
eor_code = input("enter EOR Code")
add_sum = '\xF2' # hex(input("Enter Add sum value"))
values = (stx, device_id, sub_id, comnd, data, eor_code, add_sum)
print values  # tuple is a built-in name - it's best *not* to shadow it if possible
Of course, as PEP 8 itself says:
A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
It seems to me that all you really need is to specify how the numbers are printed; note that the hex function returns a string.
In Python, '10' is a string, which is different from 10, which is an int. Python is a dynamically but strongly typed language.
So, to get the output you want, you can choose from two options:
1) Write your own function that converts numbers to hexadecimal in the format you want, and use it instead of hex:
def myhex(num):
    return '0x%02x' % num
Here, 0x is plain text, the prefix you probably want on all your hexadecimal numbers, and %02x means: print the argument as a hexadecimal number of width 2, padded with a leading 0 if it is too short (a one-digit hexadecimal number).
2) Do not convert the numbers to hex when reading the values (it is probably a good thing to work with numbers represented as numbers), and print them formatted to your specification at the end:
print '(' + ', '.join('0x%02x' % x for x in tuple) + ')'
This converts every value in tuple (by the way, avoid using built-in names as variable names when possible) to a 2-digit hexadecimal number with a 0x prefix, joins the results with ', ' and surrounds them with parentheses. Feel free to change it; I'm just building on your example and trying to reproduce your output.
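In Python 3, `input()` always returns a string, so the text has to be parsed before it is formatted. A minimal sketch, assuming the user types hexadecimal digits such as 03 or 81; the helper names `parse_hex` and `format_byte` are made up, and the input values are hard-coded so the example runs without a prompt.

```python
def parse_hex(text):
    """Convert text such as '03', '0x29' or 'F7' to an int."""
    return int(text, 16)

def format_byte(value):
    """Format an int as a zero-padded two-digit hex string, e.g. 3 -> '0x03'."""
    return "0x{:02x}".format(value)

# Stand-ins for values that would normally come from input().
typed = ["F7", "03", "81", "0x29", "1"]
values = [parse_hex(item) for item in typed]

print(values)                            # [247, 3, 129, 41, 1]
print([format_byte(v) for v in values])  # ['0xf7', '0x03', '0x81', '0x29', '0x01']
```

Keeping the values as ints until the final print makes it easy to do arithmetic on them, such as computing a checksum, before formatting.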
About a year back, I wrote a little program in Python that basically automates a part of my job (with quite a bit of assistance from you guys!). However, I ran into a problem. As I kept making the program better and better, I realized that Python did not want to play nice with Excel, and (without boring you with the details, suffice it to say xlutils will not copy formulas) I NEED to have more access to Excel for my intentions.
So I am starting back at square one with VB (2010 Express, if it helps). The only programming course I ever took in my life was on it, and it was pretty straightforward, so I decided I'd go back to it for this. Unfortunately, I've forgotten much of what I had learned, and we never really got this far down the rabbit hole in the first place. So, long story short, I am trying to:
1) Read data from a .csv structured as so:
41,332.568825,22.221759,-0.489714,eow
42,347.142926,-2.488763,-0.19358,eow
46,414.9969,19.932693,1.306851,r
47,450.626074,21.878299,1.841957,r
48,468.909171,21.362568,1.741944,r
49,506.227269,15.441723,1.40972,r
50,566.199838,17.656284,1.719818,r
51,359.069935,-11.773073,2.443772,l
52,396.321911,-8.711589,1.83507,l
53,423.766684,-4.238343,1.85591,l
2) Sort that data alphabetically by column 5
3) Then selecting only the ones with an "l" in column 5, sort THOSE numerically by column 2 (ascending order) AND copy them to a new file called coil.csv
4) Then selecting only the ones that have an "r" in column 5, sort those numerically by column 2 (descending order) and copy them to the SAME file coil.csv (appended after the others obviously)
After all of that hoopla I wish to get out:
51,359.069935,-11.773073,2.443772,l
52,396.321911,-8.711589,1.83507,l
53,423.766684,-4.238343,1.85591,l
50,566.199838,17.656284,1.719818,r
49,506.227269,15.441723,1.40972,r
48,468.909171,21.362568,1.741944,r
47,450.626074,21.878299,1.841957,r
46,414.9969,19.932693,1.306851,r
I realize that this may be a pretty involved question, and I certainly understand if no one wants to deal with all this bs, lol. Anyway, some full on code, snippets, ideas or even relevant links would be GREATLY appreciated. I've been, and still am googling, but it's harder than expected to find good reliable information pertaining to this.
P.S. Here is the piece of Python code that did what I am talking about (although it created two separate files for the lefts and rights, which I don't really need), if it helps you at all.
msgbox(msg="Please locate your survey file in the next window.")
mainfile = fileopenbox(title="Open survey file")
toponame = boolbox(msg="What is the name of the shots I should use for topography? Note: TOPO is used automatically",choices=("Left","Right"))
fieldnames = ["A","B","C","D","E"]
surveyfile = open(mainfile, "r")
left_file = open("left.csv",'wb')
right_file = open("right.csv",'wb')
coil_file = open("coil1.csv","wb")
reader = csv.DictReader(surveyfile, fieldnames=fieldnames, delimiter=",")
left_writer = csv.DictWriter(left_file, fieldnames + ["F"], delimiter=",")
sortedlefts = sorted(reader,key=lambda x:float(x["B"]))
surveyfile.seek(0,0)
right_writer = csv.DictWriter(right_file, fieldnames + ["F"], delimiter=",")
sortedrights = sorted(reader,key=lambda x:float(x["B"]), reverse=True)
coil_writer = csv.DictWriter(coil_file, fieldnames, delimiter=",",extrasaction='ignore')
for row in sortedlefts:
    if row["E"] == "l" or row["E"] == "cl+l":
        row['F'] = '%s,%s' % (row['B'], row['D'])
        left_writer.writerow(row)
        coil_writer.writerow(row)
for row in sortedrights:
    if row["E"] == "r":
        row['F'] = '%s,%s' % (row['B'], row['D'])
        right_writer.writerow(row)
        coil_writer.writerow(row)
One option you have is to start with a class to hold the fields. This allows you to override the ToString method to facilitate the output. Then it's a fairly simple matter of reading each line and adding the values to a list of that class. In your case you'll want the extra step of making two lists, sorting one in descending order, and combining them:
Class Fields
    Property A As Double = 0
    Property B As Double = 0
    Property C As Double = 0
    Property D As Double = 0
    Property E As String = ""
    Public Overrides Function ToString() As String
        Return Join({A.ToString, B.ToString, C.ToString, D.ToString, E}, ",")
    End Function
End Class
Function SortedFields(filename As String) As List(Of Fields)
    SortedFields = New List(Of Fields)
    Dim test As New List(Of Fields)
    ' The Using block declares and disposes the reader; a separate Dim for sr is not needed.
    Using sr As New IO.StreamReader(filename)
        Do Until sr.EndOfStream
            Dim fieldarray() As String = sr.ReadLine.Split(","c)
            ' Skip malformed lines and rows whose fifth column starts with "e" (eow).
            If fieldarray.Length = 5 AndAlso Not fieldarray(4)(0) = "e"c Then
                If fieldarray(4) = "r" Then
                    test.Add(New Fields With {.A = Double.Parse(fieldarray(0)), .B = Double.Parse(fieldarray(1)), .C = Double.Parse(fieldarray(2)), .D = Double.Parse(fieldarray(3)), .E = fieldarray(4)})
                Else
                    SortedFields.Add(New Fields With {.A = Double.Parse(fieldarray(0)), .B = Double.Parse(fieldarray(1)), .C = Double.Parse(fieldarray(2)), .D = Double.Parse(fieldarray(3)), .E = fieldarray(4)})
                End If
            End If
        Loop
    End Using
    ' "l" rows ascending by B, followed by "r" rows descending by B.
    SortedFields = SortedFields.OrderBy(Function(x) x.B).Concat(test.OrderByDescending(Function(x) x.B)).ToList
End Function
One simple way of writing the data to a csv file is to use the IO.File.WriteAllLines method and the ConvertAll method of the List:
IO.File.WriteAllLines("coil.csv", SortedFields("textfile1.txt").ConvertAll(New Converter(Of Fields, String)(Function(x As Fields) x.ToString)))
You'll notice how the ToString method facilitates this quite easily.
If the class will only be used for this you do have the option to make all the fields string.
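For reference, the selection and ordering described in the question can be sketched in a few lines of Python 3 as well. This is not a replacement for the VB code above, just a compact statement of the logic. The function name `sort_coil_rows` is made up, and the rows are read from an in-memory string (a subset of the sample data) instead of the survey file.

```python
import csv
import io

SURVEY = """41,332.568825,22.221759,-0.489714,eow
46,414.9969,19.932693,1.306851,r
47,450.626074,21.878299,1.841957,r
50,566.199838,17.656284,1.719818,r
51,359.069935,-11.773073,2.443772,l
53,423.766684,-4.238343,1.85591,l
52,396.321911,-8.711589,1.83507,l
"""

def sort_coil_rows(rows):
    """Return 'l' rows ascending by column 2, then 'r' rows descending."""
    rows = [row for row in rows if len(row) == 5]
    lefts = sorted((r for r in rows if r[4] == "l"), key=lambda r: float(r[1]))
    rights = sorted((r for r in rows if r[4] == "r"),
                    key=lambda r: float(r[1]), reverse=True)
    return lefts + rights

result = sort_coil_rows(csv.reader(io.StringIO(SURVEY)))

out = io.StringIO()  # stands in for open("coil.csv", "w", newline="")
csv.writer(out, lineterminator="\n").writerows(result)
print(out.getvalue())
```

Rows with any other code in column 5, such as eow, are dropped because they match neither filter.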
I'm rewriting some code from Ruby to Python. The code is for a Perceptron, listed in section 8.2.6 of Clever Algorithms: Nature-Inspired Programming Recipes. I've never used Ruby before and I don't understand this part:
def test_weights(weights, domain, num_inputs)
  correct = 0
  domain.each do |pattern|
    input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
    output = get_output(weights, input_vector)
    correct += 1 if output.round == pattern.last
  end
  return correct
end
Some explanation: num_inputs is an integer (2 in my case), and domain is a list of arrays: [[1,0,1], [0,0,0], etc.]
I don't understand this line:
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
It creates an array with 2 values, where the value at each index k is pattern[k].to_f, but what is pattern[k].to_f?
Try this:
input_vector = [float(pattern[i]) for i in range(num_inputs)]
pattern[k].to_f
converts pattern[k] to a float.
I'm not a Ruby expert, but I think it would be something like this in Python:
def test_weights(weights, domain, num_inputs):
    correct = 0
    for pattern in domain:
        output = get_output(weights, pattern[:num_inputs])
        if round(output) == pattern[-1]:
            correct += 1
    return correct
There is plenty of scope for optimising this: if num_inputs is always one less than the length of the lists in domain, then you may not need that parameter at all.
Be careful about doing line-by-line translations from one language to another: that tends not to give good results, no matter which languages are involved.
Edit: since you said you don't think you need to convert to float you can just slice the required number of elements from the domain value. I've updated my code accordingly.
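To make the translation concrete, here is a self-contained sketch. The question does not show `get_output`, so the version below (a weighted sum with the last weight as bias, followed by a step function) is an assumption, as are the example weights. The counting function mirrors `test_weights` but is called `count_correct` here so that test runners do not mistake it for a unit test.

```python
def get_output(weights, input_vector):
    """Assumed perceptron output: weighted sum plus bias, then a step function."""
    total = weights[-1]  # the last weight is treated as the bias
    for weight, value in zip(weights, input_vector):
        total += weight * value
    return 1.0 if total >= 0 else 0.0

def count_correct(weights, domain, num_inputs):
    """Count the patterns whose rounded output equals the expected last element."""
    correct = 0
    for pattern in domain:
        # Same as Ruby's Array.new(num_inputs) {|k| pattern[k].to_f}
        input_vector = [float(pattern[k]) for k in range(num_inputs)]
        output = get_output(weights, input_vector)
        if round(output) == pattern[-1]:
            correct += 1
    return correct

or_problem = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
print(count_correct([1.0, 1.0, -0.5], or_problem, 2))  # 4
```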