I know there are related threads, but I've searched and none of them helped me with my problem.
I have an input text file with square matrices that looks as follows:
1 2 3
4 5 6
7 8 9
*
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
25 26 27 28 29
30 31 32 33 34
Now there could be more matrices following, all with a * in between them.
I want to separate the matrices (without using numpy!) and then work with them as lists within lists (list comprehension might be useful, I guess); all the entries would be integers. The first matrix would then look like [[1,2,3],[4,5,6],[7,8,9]] and the second one like [[10,11,12,13,14],[15,16,17,18,19],[20,21,22,23,24],[25,26,27,28,29],[30,31,32,33,34]], and so on.
My plan was to separate the matrices into their own list entries and then (using the fact that the matrices are square, so it can easily be determined how many integers each row should contain) use some list comprehension to change the strings into integers. But I'm stuck at the very beginning.
matrix = open('input.txt','r').read()
matrix = matrix.replace('\n',' ')
list = matrix.split('* ')
Now if I print list I get ['1 2 3 4 5 6 7 8 9', '10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34']
The problem is that I'm now stuck with two strings instead of a list of integers.
Any suggestions?
mat_list = [[[int(num_str) for num_str in line.split()] for line in inner_mat.split('\n')] for inner_mat in open('input_mat.txt','r').read().split('\n*\n')]
[[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29], [30, 31, 32, 33, 34]]]
Now that you have strings, split each string on space and convert each little string to an integer. After that, you can convert each list of strings to a square "matrix" (list of lists).
An additional point: do not use list as a variable name. That conflicts with the built-in list type. In this code I used the name mylist.
newlist = [[int(nstr) for nstr in v.split()] for v in mylist]
# Now convert each list of strings to a square "matrix" (list of lists)
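A sketch of that last step (assuming Python 3.8+ for math.isqrt; mylist holds the two strings from the question), recovering the row length from the fact that each block has n*n integers:

```python
from math import isqrt

mylist = ['1 2 3 4 5 6 7 8 9',
          '10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34']
matrices = []
for flat in ([int(nstr) for nstr in v.split()] for v in mylist):
    n = isqrt(len(flat))  # matrices are square, so each block has n*n integers
    matrices.append([flat[i:i + n] for i in range(0, len(flat), n)])
print(matrices[0])  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```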
However, note that the approach given by @Selcuk in his comment is a better way to solve your overall problem, though it would be a little easier to read the entire file into memory, split on the stars, then split each line into its integers. That would result in easier code but larger memory usage.
Here is an example of code that works great:
s="abc123"
swap_seq="103254"
swapped=''.join([s[int(i)] for i in swap_seq])
if s[0].isupper(): swapped.capitalize()
print (swapped)
but I have a large number of characters that need to be swapped, 18 exactly, and I'm wondering how to handle the double-digit indices.
The example I'm trying to get working is the following:
s="0123456789abcdefgh"
swap_seq="1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 17 16"
swapped=''.join([s[int(i)] for i in swap_seq])
if s[0].isupper(): swapped.capitalize()
print (swapped)
I tried ["0", "1", "2"], but then the input would have to be exactly the same length or I get an error.
What I'm trying to do is have a user input anything up to 18 characters and have the code swap the letters/numbers around.
I think this is what you want (I didn't try it). The way I read your code, you are indexing s using the sequence outlined in swap_seq; splitting gives you a list of the positions, so i, instead of being each character in swap_seq (which includes spaces), is now each position string with the spaces taken out:
swapped=''.join([s[int(i)] for i in swap_seq.split()])
swap_seq shouldn't be a string. It's a sequence of index values, so use a more appropriate data type, like a list of integers:
swap_seq=[1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16]
swapped=''.join([s[i] for i in swap_seq])
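To also handle user input shorter than 18 characters (the error described above), one sketch is to keep only the swap positions that fall inside the string; the helper name swap_chars is hypothetical:

```python
def swap_chars(s, swap_seq):
    """Reorder s according to swap_seq, skipping positions that fall
    outside the string (so inputs shorter than 18 characters work)."""
    return ''.join(s[i] for i in swap_seq if i < len(s))

swap_seq = [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16]
print(swap_chars("0123456789abcdefgh", swap_seq))  # 1032547698badcfehg
print(swap_chars("abcd", swap_seq))                # badc
```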
I was recently wondering how I could bypass the following numpy behavior.
Starting with a simple example:
import numpy as np
a = np.array([[1,2,3,4,5,6,7,8,9,0], [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])
then:
b = a.copy()
b[:, [0,1,4,8]] = b[:, [0,1,4,8]] + 50
print(b)
...results in printing:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
but when taking one index twice in the fancy index:
c = a.copy()
c[:, [0,1,4,4,8]] = c[:, [0,1,4,4,8]] + 50
print(c)
giving:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
(in short; they do the same thing)
Could I have the addition executed twice for index 4?
Or more practically: if a fancy-index element i is given r times, can the expression be applied r times, instead of numpy taking it into account just once? And what if we replace "50" by something that differs for every occurrence of i?
For my current code, I used:
w[p1] = w[p1] + D[pix]
where I define "pix" and "p1" as numpy arrays with dtype int and the same length, in which some integers may appear multiple times.
(So one may have pix = [..., 1,1,1,2,2,3,...] at the same time as p1 = [..., 21,32,13,23,11,78,...]; on its own this takes, for index 1, only the first occurrence and the corresponding 21, discarding the rest of the ones.)
Of course using a for loop would solve the problem easily. The point is that both the integers and the sizes of the arrays are huge, so it would cost a lot of computational resources to use for-loops instead of efficient numpy-array routines. Any ideas, links to existing documentation etc.?
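NumPy provides np.add.at for exactly this: an unbuffered in-place addition that is applied once per occurrence of each index, without a Python-level loop. A sketch on the example above (the w/p1/D/pix names follow the question):

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])
c = a.copy()
# Unbuffered in-place add: index 4 appears twice, so column 4 gets +50 twice.
np.add.at(c, (slice(None), [0, 1, 4, 4, 8]), 50)
print(c)
# For the  w[p1] = w[p1] + D[pix]  pattern with repeated indices in p1:
# np.add.at(w, p1, D[pix])
```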
I am doing some machine learning stuff in python/numpy in which I want to index a 2-dimensional ndarray with a 1-D ndarray, so that I get a 1-D array with the indexed values.
I got it to work with some ugly piece of code and I would like to know if there is a better way, because this just seems unnatural for such a nice language and module combination as python+numpy.
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print(a)
print(b)
print(a[b, np.arange(0, a.shape[1]).reshape(1,a.shape[1])])
#Prints:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]
[9 7 5 3 1]
[[45 36 27 18 9]]
This is exactly what I want (even though it is technically a 2-D ndarray), but it seems very complicated. Is there a neater, tidier way?
Edit:
To clarify, I actually do not want a 1-D array; that was very poorly explained. I do want one dimension with length 1, because that is what I need for processing it later, but that is easily achieved with a reshape() call. Sorry for the confusion; I mixed my actual code with the more general question.
You want a 1D array, yet you included a reshape call whose only purpose is to take the array from the format you want to a format you don't want.
Stop reshaping the arange output. Also, you don't need to specify the 0 start value explicitly:
result = a[b, np.arange(a.shape[1])]
You can just use np.diagonal to get what you want, with no need for reshape or extra indexing. The tricky part here was to identify the pattern you want to obtain, which is basically the diagonal elements of the a[b] matrix.
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print (np.diagonal(a[b]))
# [45 36 27 18 9]
As @user2357112 mentioned in the comments, the return value of np.diagonal is read-only. In my opinion, that would be a problem if you plan to append to or modify the values in this final desired list. If you just want to print them or use them for further indexing, it should be fine.
As per the docs
Starting in NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error.
In some future release, it will return a read/write view and writing to the returned array will alter your original array. The returned array will have the same type as the input array.
If you don’t write to the array returned by this function, then you can just ignore all of the above.
I couldn't get my 4 arrays of year, day of year, hour, and minute to concatenate the way I wanted, so I decided to test several variations on shorter arrays than my data.
I found that it worked using method "t" from my test code:
import numpy as np
a=np.array([[1, 2, 3, 4, 5, 6]])
b=np.array([[11, 12, 13, 14, 15, 16]])
c=np.array([[21, 22, 23, 24, 25, 26]])
d=np.array([[31, 32, 33, 34, 35, 36]])
print a
print b
print c
print d
q=np.concatenate((a, b, c, d), axis=0)
#concatenation along 1st axis
print q
t=np.concatenate((a.T, b.T, c.T, d.T), axis=1)
#transpose each array before concatenation along 2nd axis
print t
x=np.concatenate((a, b, c, d), axis=1)
#concatenation along 2nd axis
print x
But when I tried this with the larger arrays it behaved the same as method "q".
I found an alternative approach of using vstack over here that did what I wanted, but I am trying to figure out why concatenation sometimes works for this, but not always.
Thanks for any insights.
Also, here are the outputs of the code:
q:
[[ 1 2 3 4 5 6]
[11 12 13 14 15 16]
[21 22 23 24 25 26]
[31 32 33 34 35 36]]
t:
[[ 1 11 21 31]
[ 2 12 22 32]
[ 3 13 23 33]
[ 4 14 24 34]
[ 5 15 25 35]
[ 6 16 26 36]]
x:
[[ 1 2 3 4 5 6 11 12 13 14 15 16 21 22 23 24 25 26 31 32 33 34 35 36]]
EDIT: I added method t to the end of a section of the code that was already fixed with vstack, so you can compare how vstack works with this data while concatenate does not. Again, to clarify, I have already found a workaround; I just don't know why the concatenate method doesn't seem to be consistent.
Here is the code:
import numpy as np
BAO10m=np.genfromtxt('BAO_010_2015176.dat', delimiter=",", usecols=range(0, 5), dtype=[('h', int), ('year', int), ('day', int), ('time', int), ('temp', float)])
#10 meter weather readings at BAO tower site for June 25, 2015
hourBAO=BAO10m['time']/100
minuteBAO=BAO10m['time']%100
#print hourBAO
#print minuteBAO
#time arrays
dayBAO=BAO10m['day']
yearBAO=BAO10m['year']
#date arrays
datetimeBAO=np.vstack((yearBAO, dayBAO, hourBAO, minuteBAO))
#t=np.concatenate((a.T, b.T, c.T, d.T), axis=1) <this gave desired results in simple tests
#not working for this data, use vstack instead, with transposition after stack
print datetimeBAO
test=np.transpose(datetimeBAO)
#rotate array
print test
#this prints something that can be used for datetime
t=np.concatenate((yearBAO.T, dayBAO.T, hourBAO.T, minuteBAO.T), axis=1)
print t
#this prints a 1D array of all the year values, then all the day values, etc...
#but this method worked for shorter 1D arrays
The file I used can be found at this site.
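A likely explanation for the inconsistency above: the test arrays a-d were created with double brackets, so they are 2-D with shape (1, 6), and .T turns them into columns; the arrays read from the file (yearBAO etc.) are 1-D, and transposing a 1-D array is a no-op, so method t degenerates into method q/x. A sketch (array names here are placeholders, not the question's data):

```python
import numpy as np

row = np.array([[1, 2, 3]])   # shape (1, 3): .T gives shape (3, 1)
flat = np.array([1, 2, 3])    # shape (3,):   .T is a no-op
print(row.T.shape)   # (3, 1)
print(flat.T.shape)  # (3,)

# For 1-D arrays, stack them as columns instead of transposing:
cols = np.column_stack((flat, flat * 10))
print(cols)          # [[ 1 10]
                     #  [ 2 20]
                     #  [ 3 30]]
```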
Suppose I had a pandas series of dollar values and wanted to discretize it into 9 groups using qcut. The number of observations is not divisible by 9. SQL Server's NTILE function has a standard approach for this case: it makes the first n of the 9 groups one observation larger than the remaining (9-n) groups.
I noticed in pandas that the assignment of which groups had x observations vs x + 1 observations seemed random. I tried to decipher the code in algos to figure out how the quantile function deals with this issue but could not figure it out.
I have three related questions:
Can any pandas developers out there explain qcut's behavior? Is it random which groups get the larger number of observations?
Is there a way to force qcut to behave similarly to NTILE (i.e., first groups get x + 1 observations)?
If the answer to #2 is no, any ideas on a function that would behave like NTILE? (If this is a complicated endeavor, just an outline to your approach would be helpful.)
Here is an example of SQL Server's NTILE output.
Bin |# Observations
1 26
2 26
3 26
4 26
5 26
6 26
7 26
8 25
9 25
Here is pandas:
Bin |# Observations
1 26
2 26
3 26
4 25 (Why is this 25 vs others?)
5 26
6 26
7 25 (Why is this 25 vs others?)
8 26
9 26
qcut behaves like this because it is more accurate: with 10 groups, the ith bin starts at the (i-1)*10% quantile. Here is an example:
import pandas as pd
import numpy as np
a = np.random.rand(26*10+3)
r = pd.qcut(a, 10)
np.bincount(r.labels)
the output is:
array([27, 26, 26, 26, 27, 26, 26, 26, 26, 27])
If you want NTILE, you can calculate the quantiles yourself:
n = len(a)
ngroup = 10
counts = np.ones(ngroup, int)*(n//ngroup)
counts[:n%ngroup] += 1
q = np.r_[0, np.cumsum(counts / float(n))]
q[-1] = 1.0
r2 = pd.qcut(a, q)
np.bincount(r2.labels)
the output is:
array([27, 27, 27, 26, 26, 26, 26, 26, 26, 26])
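The same idea can be wrapped as a reusable helper (the name ntile is hypothetical) that computes NTILE-style labels directly with integer arithmetic, avoiding the quantile-edge computation entirely:

```python
import numpy as np

def ntile(values, ngroup):
    """NTILE-style binning: rank the values, then give the first
    n % ngroup groups one extra observation, like SQL Server."""
    values = np.asarray(values)
    n = len(values)
    order = np.argsort(values, kind="mergesort")  # stable sort of the ranks
    base, extra = divmod(n, ngroup)
    sizes = np.full(ngroup, base)
    sizes[:extra] += 1                  # first n % ngroup groups get one more
    labels = np.empty(n, dtype=int)
    labels[order] = np.repeat(np.arange(ngroup), sizes)
    return labels

a = np.random.rand(26 * 10 + 3)
print(np.bincount(ntile(a, 10)))  # [27 27 27 26 26 26 26 26 26 26]
```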