In Python, is it possible to write a program that reads data from a .txt file, turns that data into lists, and performs calculations on it?
For example if you had a .txt file that read:
LINE1| 2011 18.5 11.8 19.7 31.6 26.6 37.3 37.4 24.6 34.0 71.3 46.3 28.4
LINE2| 2012 55.2 92.0 87.5 81.0 83.5 79.6 115.1 112.7 115.7 112.7 136.2 127.4
Could you compute the average for the numbers assigned to each year?
(Note: I'm using version 3.4)
You can open the file, read its lines, split each one, and convert the pieces to a list of floats. NumPy can then compute the mean. Hope the following code helps:
import numpy as np
with open('file.txt') as f:
    text_f = [line for line in f if line.strip()]  # skip blank lines
arrays = []
for i in text_f:
    new = i[6:]  # drop the 'LINE1|' label at the start of the line
    arrays.append(list(map(float, new.split())))
avrgs = {}
for i in arrays:
    avrgs[i[0]] = np.mean(i[1:])  # first value is the year, the rest are the data
avrgs
Output:
{2011.0: 32.291666666666664, 2012.0: 99.88333333333334}
First you need to read the txt file and format properly by stripping any newline or whitespace characters:
with open("name.txt") as f:
    c = f.readlines()
c = [i.strip() for i in c]
Right now,
c = ['2011 18.5 11.8 19.7 31.6 26.6 37.3 37.4 24.6 34.0 71.3 46.3 28.4', '2012 55.2 92.0 87.5 81.0 83.5 79.6 115.1 112.7 115.7 112.7 136.2 127.4']
Now that each line is a string in the list, split each string into a list and convert its items to floats (in Python 3, map returns an iterator, so wrap it in list):
for i in range(len(c)):
    c[i] = list(map(float, c[i].split()))
Now, we have
c = [[2011.0, 18.5, 11.8, 19.7, 31.6, 26.6, 37.3, 37.4, 24.6, 34.0, 71.3, 46.3, 28.4], [2012.0, 55.2, 92.0, 87.5, 81.0, 83.5, 79.6, 115.1, 112.7, 115.7, 112.7, 136.2, 127.4]]
The first element of each sublist in c is the year. A convenient data structure for the result is a dictionary where the key is the year and the value is the average.
year_avg = dict()
for arr in c:
    year_avg[arr[0]] = sum(arr[1:]) / len(arr[1:])
And you now have:
year_avg = {2011.0: 32.291666666666664, 2012.0: 99.88333333333334}
For reference, the entire code:
with open("file_name.txt") as f:  # Open the file
    c = f.readlines()  # Read all the lines into a list
c = [i.strip() for i in c]  # Strip newlines and surrounding whitespace
for i in range(len(c)):
    c[i] = list(map(float, c[i].split()))  # Split each line and convert the values to floats
year_avg = dict()  # Initialize dictionary to store averages
for arr in c:  # Iterate over the rows
    # The first element is the year (the key); the average is taken over the remaining numbers.
    year_avg[arr[0]] = sum(arr[1:]) / len(arr[1:])
print(year_avg)
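For what it's worth, the same steps can be folded into one small function. This is only a sketch: the name yearly_averages is made up for illustration, it assumes the optional 'LINE1|'-style label is separated from the numbers by a '|', and it stores the year as an int rather than a float.

```python
def yearly_averages(lines):
    """Map each year (int) to the mean of the values that follow it on the line."""
    averages = {}
    for line in lines:
        # Assumption: an optional label such as 'LINE1|' precedes the numbers.
        fields = line.split("|")[-1].split()
        if len(fields) < 2:
            continue  # skip blank lines and lines with a year but no values
        values = [float(v) for v in fields[1:]]
        averages[int(fields[0])] = sum(values) / len(values)
    return averages


sample = [
    "LINE1| 2011 18.5 11.8 19.7 31.6 26.6 37.3 37.4 24.6 34.0 71.3 46.3 28.4",
    "",
    "LINE2| 2012 55.2 92.0 87.5 81.0 83.5 79.6 115.1 112.7 115.7 112.7 136.2 127.4",
]
result = yearly_averages(sample)
print(result)
```

Since an open file is also an iterable of lines, it should be possible to pass the file object straight in, e.g. inside a `with open(...) as f:` block.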
I have a data file that looks like this:
# some text
# some text
# some text
100000 3 4032
1 0.0125 101.27 293.832
2 0.0375 108.624 292.285
3 0.0625 84.13 291.859
200000 3 4032
4 0.0125 101.27 293.832
5 0.0375 108.624 292.285
6 0.0625 84.13 291.859
300000 3 4032
7 0.0125 101.27 293.832
8 0.0375 108.624 292.285
9 0.0625 84.13 291.859
........
I want to read this data into an array for further processing, but I only need the rows with four columns. So I either have to skip the three-column rows or store them in a different array. Since the file is large and repeats the same pattern, it would be easiest to read it in one pass.
I have tried numpy.genfromtxt(file) with itertools.islice(file,4,7), but couldn't find a way to store all the four-column data in a single array (because of the three-column rows in between).
Any help would be greatly appreciated.
Thanks!
import itertools as IT
import numpy as np
arr = []
with open('data.txt', 'rb') as f:
    ln = IT.islice(f, 4, 7)
    arr.append(np.genfromtxt(ln))
    ln = IT.islice(f, 1, 4)
    arr.append(np.genfromtxt(ln))
    ln = IT.islice(f, 1, 4)
    arr.append(np.genfromtxt(ln))
print(arr)
This code works, but my data file is much larger than the example above, so I don't want to repeat the code for every block. Is there a more elegant way to achieve this?
This seems to be what you want.
from io import StringIO
dataFile = StringIO('''\
# some text
# some text
# some text
100000 3 4032
1 0.0125 101.27 293.832
2 0.0375 108.624 292.285
3 0.0625 84.13 291.859
200000 3 4032
4 0.0125 101.27 293.832
5 0.0375 108.624 292.285
6 0.0625 84.13 291.859
300000 3 4032
7 0.0125 101.27 293.832
8 0.0375 108.624 292.285
9 0.0625 84.13 291.859''')
def wantedLines():
    count = -1
    with dataFile as data:
        while True:
            line = data.readline()
            if line: line = line.strip()
            else: break
            if line.startswith('#'): continue
            else:
                count += 1
                if count % 4 == 0: continue
                else: yield line.encode()
import numpy as np
result = np.genfromtxt(wantedLines())
print (result)
result:
[[ 1.00000000e+00 1.25000000e-02 1.01270000e+02 2.93832000e+02]
[ 2.00000000e+00 3.75000000e-02 1.08624000e+02 2.92285000e+02]
[ 3.00000000e+00 6.25000000e-02 8.41300000e+01 2.91859000e+02]
[ 4.00000000e+00 1.25000000e-02 1.01270000e+02 2.93832000e+02]
[ 5.00000000e+00 3.75000000e-02 1.08624000e+02 2.92285000e+02]
[ 6.00000000e+00 6.25000000e-02 8.41300000e+01 2.91859000e+02]
[ 7.00000000e+00 1.25000000e-02 1.01270000e+02 2.93832000e+02]
[ 8.00000000e+00 3.75000000e-02 1.08624000e+02 2.92285000e+02]
[ 9.00000000e+00 6.25000000e-02 8.41300000e+01 2.91859000e+02]]
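A variation on the same idea, in case the block length is not always three rows: filter on the number of fields instead of on the line position. This is a sketch under the assumption that every wanted row has exactly four whitespace-separated fields and every header row has a different count; four_column_rows is a name invented here.

```python
import io

import numpy as np


def four_column_rows(handle, ncols=4):
    """Yield only the lines that have exactly `ncols` whitespace-separated fields."""
    for line in handle:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        if len(stripped.split()) == ncols:
            yield stripped


# A shortened stand-in for the real data file.
sample = io.StringIO("""\
# some text
100000 3 4032
1 0.0125 101.27 293.832
2 0.0375 108.624 292.285
200000 3 4032
3 0.0625 84.13 291.859
""")

data = np.loadtxt(four_column_rows(sample))
print(data.shape)
```

With a real file, the StringIO object would be replaced by the handle from `open(...)`.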
I am implementing the Jacobi iterative method.
The problem is that I cannot store the calculated matrix after each iteration. I tried appending it to an empty list, but it keeps overwriting the previous elements in that list, and I end up with a single matrix repeated K times.
I need to subtract and operate on those matrices for the convergence criteria.
# Iterate Jacobi until convergence
U = np.array([[8.9,8.9,8.9,8.9,8.9],[8.4,0,0,0,9.2],[7.2,0,0,0,9.4],[6.1,6.8,7.7,8.7,6.1]])
UI=U
UF=U
UFK=[]
k=0
while k<3:
    k=k+1 # update the iteration counter
    for i in range (1,Nx-1):
        for j in range (1,Ny-1):
            UF[j,i] = (UI[j+1,i]+UI[j,i+1]+UI[j-1,i]+UI[j,i-1])*0.25 #the matrix i want to store after each iteration
    UFK.append(UF) #
    print (UF) # when i print UF i get the correct matrix at each iteration displayed
[[ 8.9 8.9 8.9 8.9 8.9 ]
[ 8.4 4.325 3.30625 5.3515625 9.2 ]
[ 7.2 4.58125 3.896875 6.83710938 9.4 ]
[ 6.1 6.8 7.7 8.7 6.1 ]]
[[ 8.9 8.9 8.9 8.9 8.9 ]
[ 8.4 6.296875 6.11132812 7.76210937 9.2 ]
[ 7.2 6.0484375 6.67421875 8.13408203 9.4 ]
[ 6.1 6.8 7.7 8.7 6.1 ]]
[[ 8.9 8.9 8.9 8.9 8.9 ]
[ 8.4 7.36494141 7.67531738 8.47734985 9.2 ]
[ 7.2 7.00979004 7.62979736 8.5517868 9.4 ]
[ 6.1 6.8 7.7 8.7 6.1 ]]
print(UFK) # when i display the appended UFK it is just repeating a single matrix 3 times
[array([[ 8.9 , 8.9 , 8.9 , 8.9 , 8.9 ],
[ 8.4 , 7.36494141, 7.67531738, 8.47734985, 9.2 ],
[ 7.2 , 7.00979004, 7.62979736, 8.5517868 , 9.4 ],
[ 6.1 , 6.8 , 7.7 , 8.7 , 6.1 ]]),
array([[ 8.9 , 8.9 , 8.9 , 8.9 , 8.9 ],
[ 8.4 , 7.36494141, 7.67531738, 8.47734985, 9.2 ],
[ 7.2 , 7.00979004, 7.62979736, 8.5517868 , 9.4 ],
[ 6.1 , 6.8 , 7.7 , 8.7 , 6.1 ]]),
array([[ 8.9 , 8.9 , 8.9 , 8.9 , 8.9 ],
[ 8.4 , 7.36494141, 7.67531738, 8.47734985, 9.2 ],
[ 7.2 , 7.00979004, 7.62979736, 8.5517868 , 9.4 ],
[ 6.1 , 6.8 , 7.7 , 8.7 , 6.1 ]])]
UI=U # why? UI is not a copy of U, it IS U
# UF=U # another why? Changes of UF will change UI and U as well
UFK=[] # appending to a list is great
k=0
while k<3:
    k=k+1 # update the iteration counter
    UF = np.zeros_like(U) # a fresh array for each iteration
    for i in range (1,Nx-1):
        for j in range (1,Ny-1):
            UF[j,i] = (UI[j+1,i]+UI[j,i+1]+UI[j-1,i]+UI[j,i-1])*0.25
    UFK.append(UF) #
    print (UF)
print(UFK)
UFK should now be a list of the k UF arrays.
Since the loop overwrites every interior element of UF, it doesn't much matter how UF is initialized, as long as it does not share memory with other arrays, including the UF from previous iterations. (The boundary rows and columns are never assigned, so they keep whatever the fresh array held.)
But on further thought, maybe changing UI in place is part of the plan. If so, why obscure the fact with the UF and UI variables? In that case you can collect the intermediate iterations with U.copy(), that is, save a copy of U to the list rather than U itself.
for i... :
    for j....:
        U[j,i] = (U[j+1,i]+U[j,i+1]+U[j-1,i]+U[j,i-1])*0.25
UFK.append(U.copy())
print (U)
A list holds references to objects, not copies of them. If I write
alist = [U, U, U]
U[0,0] = 10000
that 10000 will appear in all 3 elements of the list, because they are the same object.
In your code you append UF to the list and then modify it at each iteration. The result is that your list just contains k references to the same array.
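A tiny demonstration of the difference between storing the array itself and storing a copy. The names aliased and copied are just for illustration.

```python
import numpy as np

U = np.zeros((2, 2))

aliased = [U, U]                # two references to the same array object
copied = [U.copy(), U.copy()]   # two independent snapshots

U[0, 0] = 10000                 # modify the original after storing it

print(aliased[0][0, 0])   # follows the change, because aliased[0] is U
print(copied[0][0, 0])    # keeps the value it had when the copy was taken
print(aliased[0] is U, copied[0] is U)
```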
Another option is to collect the results in a NumPy array instead of a list. np.append copies its inputs into a new array on every call, so later changes to UF do not alter the rows already stored. Start with an empty array that has the right number of columns:
UFK = np.array([]).reshape(0,5)
k = 0
while k < 3:
    k += 1
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            UF[j, i] = (UI[j+1, i] + UI[j, i+1] + UI[j-1, i] + UI[j, i-1]) * 0.25
    UFK = np.append(UFK, UF, axis=0)
Another way to append the array is UFK = np.vstack((UFK, UF)), which gives the same result. Either way, UFK ends up as a single 2-D array with the iterations stacked vertically, four rows per iteration.
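Since the stated goal is a convergence check, here is a rough sketch of how the stored iterates might be used for that. Note the assumptions: it uses a plain Jacobi update in which each new iterate is computed only from the previous one (unlike the in-place update in the question), and tol and max_iter are made-up values.

```python
import numpy as np

U = np.array([[8.9, 8.9, 8.9, 8.9, 8.9],
              [8.4, 0.0, 0.0, 0.0, 9.2],
              [7.2, 0.0, 0.0, 0.0, 9.4],
              [6.1, 6.8, 7.7, 8.7, 6.1]])

tol = 1e-6        # hypothetical convergence tolerance
max_iter = 500    # hypothetical safety cap on the number of sweeps

history = [U.copy()]          # snapshots, one per iterate
for k in range(max_iter):
    old = history[-1]
    new = old.copy()          # boundary values carry over unchanged
    # Each interior point becomes the mean of its four neighbours in `old`.
    new[1:-1, 1:-1] = 0.25 * (old[2:, 1:-1] + old[:-2, 1:-1]
                              + old[1:-1, 2:] + old[1:-1, :-2])
    history.append(new)
    if np.max(np.abs(new - old)) < tol:
        break                 # largest change is below the tolerance

history = np.stack(history)   # shape: (number of iterates, 4, 5)
print(history.shape)
```

Stacking into a 3-D array keeps each iterate addressable as history[n], which makes differences such as history[n] - history[n-1] straightforward.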
I'm trying to create a table of cosines using numpy in python. I want to have the angle next to the cosine of the angle, so it looks something like this:
0.0 1.000
5.0 0.996
10.0 0.985
15.0 0.966
20.0 0.940
25.0 0.906
and so on.
I'm trying to do it using a for loop but I'm not sure how to get this to work.
Currently, I have .
Any suggestions?
Let's say you have:
>>> d = np.linspace(0, 360, 10, endpoint=False)
>>> c = np.cos(np.radians(d))
If you don't mind having some brackets and such on the side, then you can simply concatenate column-wise using np.c_, and display:
>>> print(np.c_[d, c])
[[ 0.00000000e+00 1.00000000e+00]
[ 3.60000000e+01 8.09016994e-01]
[ 7.20000000e+01 3.09016994e-01]
[ 1.08000000e+02 -3.09016994e-01]
[ 1.44000000e+02 -8.09016994e-01]
[ 1.80000000e+02 -1.00000000e+00]
[ 2.16000000e+02 -8.09016994e-01]
[ 2.52000000e+02 -3.09016994e-01]
[ 2.88000000e+02 3.09016994e-01]
[ 3.24000000e+02 8.09016994e-01]]
But if you care about removing them, one possibility is to use a simple regex:
>>> import re
>>> print(re.sub(r' *\n *', '\n',
np.array_str(np.c_[d, c]).replace('[', '').replace(']', '').strip()))
0.00000000e+00 1.00000000e+00
3.60000000e+01 8.09016994e-01
7.20000000e+01 3.09016994e-01
1.08000000e+02 -3.09016994e-01
1.44000000e+02 -8.09016994e-01
1.80000000e+02 -1.00000000e+00
2.16000000e+02 -8.09016994e-01
2.52000000e+02 -3.09016994e-01
2.88000000e+02 3.09016994e-01
3.24000000e+02 8.09016994e-01
I'm removing the brackets, and then passing it to the regex to remove the spaces on either side in each line.
np.array_str also lets you set the precision. For more control, you can use np.array2string instead.
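As a rough illustration of the np.array2string route: its formatter argument takes a dict of per-type formatting callables. The '%8.3f' format and the 5-degree step below are arbitrary choices for this sketch, and the brackets are still part of the output.

```python
import numpy as np

d = np.arange(0.0, 20.0, 5.0)     # 0, 5, 10, 15 degrees
c = np.cos(np.radians(d))
table = np.c_[d, c]               # two columns: angle, cosine

# 'float_kind' applies the callable to every floating-point element.
text = np.array2string(table, formatter={"float_kind": lambda x: "%8.3f" % x})
print(text)
```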
Side-by-Side Array Comparison using NumPy
NumPy has a built-in way to do this: numpy.column_stack.
numpy.column_stack((A, B)) stacks arrays as columns, which lets you compare two or more arrays side by side.
Pass the arrays as a single tuple argument, hence the double parentheses. The tuple can hold as many arrays as you want.
import numpy as np
A = np.random.uniform(size=(10,1))
B = np.random.uniform(size=(10,1))
C = np.random.uniform(size=(10,1))
np.column_stack((A, B, C)) ## <-- Compare Side-by-Side
The result looks like this:
array([[0.40323596, 0.95947336, 0.21354263],
[0.18001121, 0.35467198, 0.47653884],
[0.12756083, 0.24272134, 0.97832504],
[0.95769626, 0.33855075, 0.76510239],
[0.45280595, 0.33575171, 0.74295859],
[0.87895151, 0.43396391, 0.27123183],
[0.17721346, 0.06578044, 0.53619146],
[0.71395251, 0.03525021, 0.01544952],
[0.19048783, 0.16578012, 0.69430883],
[0.08897691, 0.41104408, 0.58484384]])
np.column_stack is handy in AI/ML work for comparing predicted results with the expected answers. Putting the two side by side is a quick way to spot where a network's outputs go wrong.
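A small sketch of that kind of comparison. The expected and predicted values below are invented purely for illustration; the third column is the absolute error.

```python
import numpy as np

# Made-up values standing in for labels and model outputs.
expected = np.array([1.0, 0.0, 1.0, 1.0])
predicted = np.array([0.9, 0.2, 0.4, 1.0])

# 1-D inputs become the columns of a 2-D array.
comparison = np.column_stack((expected, predicted, np.abs(expected - predicted)))
print(comparison)

worst = int(np.argmax(comparison[:, 2]))   # row with the largest absolute error
print(worst)
```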
Pandas is a very convenient module for such tasks:
In [174]: import pandas as pd
...:
...: x = pd.DataFrame({'angle': np.linspace(0, 355, 355//5+1),
...: 'cos': np.cos(np.deg2rad(np.linspace(0, 355, 355//5+1)))})
...:
...: pd.options.display.max_rows = 20
...:
...: x
...:
Out[174]:
angle cos
0 0.0 1.000000
1 5.0 0.996195
2 10.0 0.984808
3 15.0 0.965926
4 20.0 0.939693
5 25.0 0.906308
6 30.0 0.866025
7 35.0 0.819152
8 40.0 0.766044
9 45.0 0.707107
.. ... ...
62 310.0 0.642788
63 315.0 0.707107
64 320.0 0.766044
65 325.0 0.819152
66 330.0 0.866025
67 335.0 0.906308
68 340.0 0.939693
69 345.0 0.965926
70 350.0 0.984808
71 355.0 0.996195
[72 rows x 2 columns]
You can use Python's zip function to go through the elements of both arrays simultaneously.
import numpy as np
degreesVector = np.linspace(0.0, 360.0, 73)  # the number of samples must be an integer
cosinesVector = np.cos(np.radians(degreesVector))
for d, c in zip(degreesVector, cosinesVector):
    print(d, c)
And if you want to make a NumPy array out of the degree and cosine values, you can modify the for loop in this way:
table = []
for d, c in zip(degreesVector, cosinesVector):
    table.append([d, c])
table = np.array(table)
And now on one line!
np.array([[d, c] for d, c in zip(degreesVector, cosinesVector)])
You were close - but if you iterate over angles, just generate the cosine for that angle:
In [293]: for angle in range(0,60,10):
     ...:     print('{0:8}{1:8.3f}'.format(angle, np.cos(np.radians(angle))))
...:
0 1.000
10 0.985
20 0.940
30 0.866
40 0.766
50 0.643
To work with arrays, you have lots of options:
In [294]: angles=np.linspace(0,60,7)
In [295]: cosines=np.cos(np.radians(angles))
Iterate over an index:
In [297]: for i in range(angles.shape[0]):
     ...:     print('{0:8}{1:8.3f}'.format(angles[i],cosines[i]))
Use zip to hand out the values in pairs:
for a,c in zip(angles, cosines):
    print('{0:8}{1:8.3f}'.format(a,c))
A slight variant on that:
for ac in zip(angles, cosines):
    print('{0:8}{1:8.3f}'.format(*ac))
You could concatenate the arrays together into a 2d array, and display that:
In [302]: np.vstack((angles, cosines)).T
Out[302]:
array([[ 0. , 1. ],
[ 10. , 0.98480775],
[ 20. , 0.93969262],
[ 30. , 0.8660254 ],
[ 40. , 0.76604444],
[ 50. , 0.64278761],
[ 60. , 0.5 ]])
In [318]: print(np.vstack((angles, cosines)).T)
[[ 0. 1. ]
[ 10. 0.98480775]
[ 20. 0.93969262]
[ 30. 0.8660254 ]
[ 40. 0.76604444]
[ 50. 0.64278761]
[ 60. 0.5 ]]
np.column_stack can do that without the transpose.
And you can pass that array to your formatting with:
for ac in np.vstack((angles, cosines)).T:
    print('{0:8}{1:8.3f}'.format(*ac))
or you could write that to a csv style file with savetxt (which just iterates over the 'rows' of the 2d array and writes with fmt):
In [310]: np.savetxt('test.txt', np.vstack((angles, cosines)).T, fmt='%8.1f %8.3f')
In [311]: cat test.txt
0.0 1.000
10.0 0.985
20.0 0.940
30.0 0.866
40.0 0.766
50.0 0.643
60.0 0.500
Unfortunately savetxt requires the old style formatting. And trying to write to sys.stdout runs into byte v unicode string issues in Py3.
Just in NumPy, with some formatting ideas, as a complement to @MaxU's syntax:
a = np.array([[i, np.cos(np.deg2rad(i)), np.sin(np.deg2rad(i))]
              for i in range(0,361,30)])
args = ["Angle", "Cos", "Sin"]
frmt = ("{:>8.0f}"+"{:>8.3f}"*2)
print(("{:^8}"*3).format(*args))
for i in a:
    print(frmt.format(*i))
Angle Cos Sin
0 1.000 0.000
30 0.866 0.500
60 0.500 0.866
90 0.000 1.000
120 -0.500 0.866
150 -0.866 0.500
180 -1.000 0.000
210 -0.866 -0.500
240 -0.500 -0.866
270 -0.000 -1.000
300 0.500 -0.866
330 0.866 -0.500
360 1.000 -0.000
I would like to read a data grid (a 3D array of floats) from an .xsf file. (The format documentation is at http://www.xcrysden.org/doc/XSF.html, see the BEGIN_BLOCK_DATAGRID_3D block.)
The problem is that the data are in 5 columns, and if the number of elements Nx*Ny*Nz is not divisible by 5, then the last line has fewer than 5 values.
For this reason I'm not able to use numpy.genfromtxt() or numpy.loadtxt().
I wrote a subroutine that solves the problem, but it is terribly slow (probably because it uses tight loops). The files I want to read are large (>200 MB; 200x200x200 = 8000000 numbers in ASCII).
Is there a really fast way to read such unfriendly formats in Python / NumPy into an ndarray?
XSF datagrids look like this (example for shape=(3,3,3)):
BEGIN_BLOCK_DATAGRID_3D
BEGIN_DATAGRID_3D_this_is_3Dgrid
3 3 3 # number of elements Nx Ny Nz
0.0 0.0 0.0 # grid origin in real space
1.0 0.0 0.0 # grid size in real space
0.0 1.0 0.0
0.0 0.0 1.0
0.000 1.000 2.000 5.196 8.000 # data in 5 columns
1.000 1.414 2.236 5.292 8.062
2.000 2.236 2.828 5.568 8.246
3.000 3.162 3.606 6.000 8.544
4.000 4.123 4.472 6.557 8.944
1.000 1.414 # this is the problem
END_DATAGRID_3D
END_BLOCK_DATAGRID_3D
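I got something working with Pandas and NumPy. Pandas will fill in NaN values for the missing data.
import pandas as pd
import numpy as np
# skipfooter needs the Python parser engine; np.float was removed from NumPy, so use the builtin float
df = pd.read_csv("xyz.data", header=None, delimiter=r'\s+', dtype=float, skiprows=7, skipfooter=2, engine='python')
data = df.values.flatten()
data = data[~np.isnan(data)]
result = data.reshape((data.size // 3, 3))  # integer division: reshape needs int dimensions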
Output
>>> result
array([[ 0. , 1. , 2. ],
[ 5.196, 8. , 1. ],
[ 1.414, 2.236, 5.292],
[ 8.062, 2. , 2.236],
[ 2.828, 5.568, 8.246],
[ 3. , 3.162, 3.606],
[ 6. , 8.544, 4. ],
[ 4.123, 4.472, 6.557],
[ 8.944, 1. , 1.414]])
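Another possibility, sketched here without any timing claims, is to ignore the line structure altogether: collect every whitespace-separated token in the data section, convert them in one call, and reshape using the Nx Ny Nz header. The function name read_datagrid is made up, the '#' annotations are assumed to be strippable comments, and the reshape to (Nz, Ny, Nx) assumes the x index varies fastest, which is my reading of the XSF documentation and should be checked against your files.

```python
import numpy as np


def read_datagrid(lines):
    """Parse one BEGIN_DATAGRID_3D ... END_DATAGRID_3D section into a 3D array."""
    # Strip trailing '#' annotations and drop empty lines.
    rows = [ln.split("#")[0].split() for ln in lines]
    rows = [r for r in rows if r]
    start = next(i for i, r in enumerate(rows)
                 if r[0].startswith("BEGIN_DATAGRID_3D"))
    nx, ny, nz = (int(v) for v in rows[start + 1])
    tokens = []
    # Skip the shape line, the origin line and the three spanning vectors.
    for r in rows[start + 6:]:
        if r[0].startswith("END_DATAGRID"):
            break
        tokens.extend(r)          # ragged last line is no problem here
    values = np.array(tokens, dtype=float)
    # Assumption: x varies fastest, so the last axis is x.
    return values.reshape(nz, ny, nx)


sample = """\
BEGIN_BLOCK_DATAGRID_3D
BEGIN_DATAGRID_3D_this_is_3Dgrid
3 3 3 # number of elements Nx Ny Nz
0.0 0.0 0.0 # grid origin in real space
1.0 0.0 0.0 # grid size in real space
0.0 1.0 0.0
0.0 0.0 1.0
0.000 1.000 2.000 5.196 8.000 # data in 5 columns
1.000 1.414 2.236 5.292 8.062
2.000 2.236 2.828 5.568 8.246
3.000 3.162 3.606 6.000 8.544
4.000 4.123 4.472 6.557 8.944
1.000 1.414 # this is the problem
END_DATAGRID_3D
END_BLOCK_DATAGRID_3D
""".splitlines()

grid = read_datagrid(sample)
print(grid.shape)
```

For a real file, the list of lines would come from the open file handle instead of the sample string.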