Creating columns with numpy Python - python

I have some elements stored in a numpy array and want to write them to a ".txt" file. The file has to follow a fixed layout, which means each element must start at a specific column of its line.
Example:
numpy.array[0] needs to start in line 1, col 26.
numpy.array[1] needs to start in line 1, col 34.
I use numpy.savetxt() to save the arrays to file.
Later I will implement this in a loop to create a large ".txt" file with coordinates.
Edit: This good example from the answer below illustrates my struggle:
In [117]: np.savetxt('test.txt',A.T,'%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
The fmt option '%20d %10d' gives spacing that depends on the preceding integer. What I need is an option that lets me set the spacing from the left side, regardless of the other integers.
The template I need to fit the numbers into:
XXXXXXXX.XXX YYYYYYY.YYY ZZZZ.ZZZ
Final Edit:
I solved it by creating a test which checks how many spaces the last float used. I was then able to predict the number of spaces the next float needed to fit the template.

Have you played with the fmt of np.savetxt?
Let me illustrate with a concrete example (the sort that you should have given us)
Make a 2 row array:
In [111]: A=np.arange((12)).reshape(2,6)
In [112]: A
Out[112]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
Save it, and get 2 rows, 6 columns
In [113]: np.savetxt('test.txt',A,'%d')
In [114]: cat test.txt
0 1 2 3 4 5
6 7 8 9 10 11
Save its transpose, and get 6 rows, 2 columns
In [115]: np.savetxt('test.txt',A.T,'%d')
In [116]: cat test.txt
0 6
1 7
2 8
3 9
4 10
5 11
Put more detail into fmt to space out the columns
In [117]: np.savetxt('test.txt',A.T,'%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
I think you can figure out how to make a fmt string that puts your numbers in the correct columns (join 26 spaces etc, or use left and right justification - the usual Python formatting issues).
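As a rough sketch of such a fmt string, assuming right-justified fields whose widths are read off the template in the question (12, 11 and 8 characters), plain % formatting keeps the decimal points in the same columns no matter how many digits each number has. The sample values are made up for illustration, and the same string should also be usable as the fmt argument of np.savetxt.

```python
# Field widths taken from the template in the question:
# XXXXXXXX.XXX (12 chars), YYYYYYY.YYY (11 chars), ZZZZ.ZZZ (8 chars).
FMT = "%12.3f %11.3f %8.3f"

# Made-up coordinates, only for illustration.
coords = [
    (1.5, 22.25, 3.0),
    (12345678.125, 7654321.5, 9999.999),
]

# Each row is padded on the left, so every line has the same length.
lines = [FMT % row for row in coords]
for line in lines:
    print(line)
```

Values that need more characters than their field allows would still be written in full and push the rest of the line to the right, so the widths have to match the largest expected number.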
savetxt also takes an opened file. So you can open a file for writing, write one array, add some filler lines, and write another. Also, savetxt doesn't do anything fancy. It just iterates through the rows of the array, and writes each row to a line, e.g.
for row in A:
    file.write(fmt % tuple(row) + '\n')  # savetxt also ends each row with a newline
So if you don't like the control that savetxt gives you, write the file directly.
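If the values have to start at fixed columns instead (column 26 and column 34 in the question), left-justified fields are one option. The sketch below writes the rows directly, in the spirit of the loop above. Here io.StringIO stands in for an opened file, and the sample rows and the 8-character field width are assumptions made for illustration.

```python
import io

# Assumed layout: first value starts at column 26, second at column 34
# (counting columns from 1), as in the question's example.
LEAD = " " * 25           # columns 1-25 stay blank
FMT = LEAD + "%-8d%-8d"   # "-" left-justifies each value in an 8-wide field

# Made-up rows, only for illustration.
rows = [(0, 6), (4, 10), (123, 4567)]

buf = io.StringIO()       # stands in for open('test.txt', 'w')
for row in rows:
    # rstrip() drops the padding after the last value on the line
    buf.write((FMT % tuple(row)).rstrip() + "\n")

lines = buf.getvalue().splitlines()
for line in lines:
    print(line)
```

Because the padding comes after each number rather than before it, the starting column no longer depends on how many digits the number has.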

Related

Printing the number of different numbers in python

I would like to ask about printing the number of different numbers in a list in Python.
For example:
Let us say that I have the following list:
X = [5, 5, 5]
Since there is only one distinct number here, I want code that recognizes this, so the output must be:
1
The number is: 5
Let us say that I have the following list:
X = [5,4,5]
Since there are two distinct numbers here (5 and 4), I want the code to recognize that, so the output must be:
2
The numbers are: 4, 5
Let us say that I have the following list:
X = [24,24,24,24,24,24,24,24,26,26,26,26,26,26,26,26]
Since there are two distinct numbers here (24 and 26), I want the code to recognize that, so the output must be:
2
The numbers are: 24, 26
You could keep track of unique numbers with a set object:
X = [1,2,3,3,3]
S = set(X)
n = len(S)
print(n, S)  # 3 {1, 2, 3}
Bear in mind sets are unordered, so you would need to convert back to a list and sort them if needed.
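Putting the set and the sort together, a minimal sketch that produces the two output lines asked for in the question could look like this. The helper name describe_unique is hypothetical, chosen only for this example.

```python
def describe_unique(values):
    """Return the count line and the description line for the distinct values."""
    unique = sorted(set(values))  # set() drops duplicates, sorted() fixes the order
    noun = "number is" if len(unique) == 1 else "numbers are"
    return str(len(unique)), f"The {noun}: {', '.join(map(str, unique))}"

for X in ([5, 5, 5], [5, 4, 5]):
    count, description = describe_unique(X)
    print(count)
    print(description)
```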
You can convert the list into a set, which removes the duplicates, and then convert it back into a list:
list(set(X))
You can also try numpy.unique and call len() on the result.
May I ask whether set() can be used to read the data in a specific column of a pandas DataFrame?
For example, I have the following DataFrame:
df1 = [ 0  -10  2   5
        1   24  5  10
        2   30  3   6
        3   30  2   1
        4   30  4   5 ]
where the first column is the index.
I first tried to isolate the second column
[-10
24
30
30
30]
using the following: x = pd.DataFrame(df1, columns=[0])
Then I transposed the column using XX = x.T
and applied the set() function to the result.
However, instead of obtaining
[-10 24 30]
I got [0 1 2 3 4]
So set() read the index instead of the first column.
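A likely explanation is that iterating over a DataFrame yields its column labels rather than its values, and after the transpose those labels are the old row index. The sketch below rebuilds the frame from the question (the default integer column labels 0, 1, 2 are an assumption) and compares that with calling set() on a single column selected as a Series.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question,
# with default integer row and column labels.
df1 = pd.DataFrame([[-10, 2, 5], [24, 5, 10], [30, 3, 6], [30, 2, 1], [30, 4, 5]])

# Iterating a DataFrame yields its column labels, not its values.
# After transposing a one-column frame, those labels are the old row index.
labels = set(df1[[0]].T)

# Selecting the column as a Series and iterating that yields the values.
values = sorted(set(df1[0]))

print(labels)
print(values)
```

Series.unique() is another way to get the distinct values of a single column.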

Creating a DataFrame from a dictionary of Series results in lost indices and NaNs

import pandas as pd
dict_with_series = {'Even': pd.Series([2,4,6,8,10]), 'Odd': pd.Series([1,3,5,7,9])}
Data_frame_using_dic_Series = pd.DataFrame(dict_with_series)
# Data_frame_using_dic_Series = pd.DataFrame(dict_with_series, index=[1,2,3,4,5])  # gives NaN values, I don't know why
display(Data_frame_using_dic_Series)
I tried labeling the index, but when I did, the first row disappeared and an extra row of NaN values appeared at the bottom. Can anyone explain why it behaves like this? Have I done something wrong?
If I don't use the index argument, it works fine.
When you run:
Data_frame_using_dic_Series = pd.DataFrame(dict_with_series,index=[1,2,3,4,5])
You ask pandas to use only the labels 1-5 from the provided Series, but a Series is indexed from 0 by default, so the data is reindexed: label 0 is dropped and label 5, which does not exist in the Series, is filled with NaN.
If you want to change the index, do it afterwards:
Data_frame_using_dic_Series = (pd.DataFrame(dict_with_series)
                               .set_axis([1, 2, 3, 4, 5])
                               )
Output:
   Even  Odd
1     2    1
2     4    3
3     6    5
4     8    7
5    10    9

how to flip columns in reverse order using shell scripting/python

Dear experts, I have a small problem: I just want to reverse the order of the columns. For example, I have a data set arranged in 4 columns and need to put the last column first, and so on in reverse. How can this be done? Thanks.
Input data example:
1 2 3 4 5
6 7 8 9 0
3 4 5 2 1
5 6 7 2 3
I need output like this:
5 4 3 2 1
0 9 8 7 6
1 2 5 4 3
3 2 7 6 5
Perl to the rescue!
perl -lane 'print join " ", reverse @F' < input-file
-n reads the file line by line, running the code specified after -e for each line
-l removes newlines from input and adds them to output
-a splits the input line on whitespace, populating the @F array
reverse reverses the array
join turns a list into a string
What is the type of your data? In Python, if your data is a NumPy array, just do data[:, ::-1]. Reversing with [::-1] also works on a list (but only for the first dimension, obviously). This is the general behavior of a Python slice (first:last:stride) with first and last omitted, and it works with any object that supports slice indexing.
But if this is the only data manipulation you have to do, using Python may be overkill. However, it may be more efficient than raw string manipulation (using Perl or whatever), depending on the size of your data.
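For completeness, a minimal pure-Python sketch of the same idea, assuming the data arrives as whitespace-separated text like the sample in the question. The function name reverse_columns is made up for this example.

```python
def reverse_columns(text):
    """Reverse the whitespace-separated fields of every line in text."""
    return "\n".join(" ".join(reversed(line.split())) for line in text.splitlines())

# Sample input copied from the question.
sample = """1 2 3 4 5
6 7 8 9 0
3 4 5 2 1
5 6 7 2 3"""

print(reverse_columns(sample))
```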

Read file with last col header spanning 2 column values in python

I have a tab-delimited file and want to read all the column headers, but the last two columns share a single column header.
Example of the first rows of the file:
xx yy zz ii jj
5 5 10 2 a d
In my example, the column header jj covers the two values a and d, which span two tab-separated fields. I tried genfromtxt, but it gives:
ValueError: Some errors were detected !
Line #2 (got 6 columns instead of 5).
I would like to keep using numpy's genfromtxt because of my existing code, but any method will do right now. It seems difficult to do with genfromtxt.
I expect a tuple of rows. At one point I got
[(5, 5, 10, 2, b'a') for the first row, but I would like to get [(5, 5, 10, 2, ['a','d']) if possible.
Thank you
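Since any method will do, one possible approach is to skip genfromtxt and split the lines by hand: every header except the last takes one field, and the last header collects whatever fields remain. This is only a sketch under assumptions not stated in the question, namely that the leading columns hold integers and that the separator is a single tab. The function name read_ragged is hypothetical.

```python
import io

def read_ragged(handle, sep="\t"):
    """Read a delimited file whose last header covers all trailing fields."""
    header = handle.readline().rstrip("\n").split(sep)
    n_fixed = len(header) - 1  # columns that hold exactly one value
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split(sep)
        if fields == [""]:
            continue  # skip blank lines
        fixed = [int(value) for value in fields[:n_fixed]]  # assumes integer columns
        rows.append(tuple(fixed) + (fields[n_fixed:],))
    return header, rows

# Sample rows from the question, with tabs written out explicitly.
sample = io.StringIO("xx\tyy\tzz\tii\tjj\n5\t5\t10\t2\ta\td\n")
header, rows = read_ragged(sample)
print(header)
print(rows)
```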

How to save a large array in txt file in python

I have 2D array:
import numpy as np
output = np.array([1,1,6])*np.arange(6)[:,None]+1
output
Out[32]:
array([[ 1,  1,  1],
       [ 2,  2,  7],
       [ 3,  3, 13],
       [ 4,  4, 19],
       [ 5,  5, 25],
       [ 6,  6, 31]])
I tried to use np.savetxt('file1.txt', output, fmt='%10d'),
but I got the result on one line only.
How can I save it in a txt file similar to:
x  y   z
1  1   1
2  2   7
3  3  13
4  4  19
5  5  25
6  6  31
That is, 3 separate columns, each with a name (x, y, z).
Please note: the original array is very large (40000000 rows and 3 columns). I am using Python 3.6.
I have tried the solutions here and here, but they do not work for me.
Noor, let me guess - you are using Windows Notepad to view the file?
I use Notepad++, which is smart enough to understand the Unix-style line endings that np.savetxt() writes by default, even when run under Windows.
You might want to explicitly specify newline="\r\n" when calling savetxt.
np.savetxt('file1.txt', output, fmt='%10d' ,header= " x y z", newline="\r\n")
Docs: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
I am not sure about your data, but this:
import numpy as np
output = np.array([1,1,6])*np.arange(60)[:,None]+1
print(output)
np.savetxt('file1.txt', output, fmt='%10d' ,header= " x y z")
Produces this output:
# x y z
         1          1          1
         2          2          7
         3          3         13
=== snipped a few lines ===
        58         58        343
        59         59        349
        60         60        355
for me.
For np.arange(1000000) the file is about 32MB and similarly formatted.
For np.arange(10000000) it is about 322MB and similarly formatted.
willem-van-onsem: 1+ GB was far closer.
I did not account for the fixed spacing of 10 characters per number, my bad.
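The fixed spacing makes the file size easy to estimate. As a back-of-the-envelope sketch, assuming every value fits in its 10-character field, the default single-space delimiter is used, and the header line is ignored, each row takes 3*10 + 2 + 1 = 33 bytes with "\n" line endings. The helper name savetxt_size_bytes is made up for this example.

```python
def savetxt_size_bytes(n_rows, n_cols, width=10, delimiter=" ", newline="\n"):
    """Estimate the file size when every value fits in a fixed-width field."""
    row_bytes = n_cols * width + (n_cols - 1) * len(delimiter) + len(newline)
    return n_rows * row_bytes

for rows in (1_000_000, 10_000_000, 40_000_000):
    size = savetxt_size_bytes(rows, 3)
    print(f"{rows:>11,d} rows: {size / 1e9:.2f} GB")
```

For the 40000000 rows in the question this gives roughly 1.3 GB, and one extra byte per row when newline="\r\n" is used.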
