Read file with last col header spanning 2 column values in python


I have a tab-delimited file and I want to read all of its column headers, but the last two columns share a single column header.
Example 1st row of file:
xx yy zz ii jj
5 5 10 2 a d
In my example, the last column header is jj and its values are a and d, which span two tab-separated fields. I tried genfromtxt, but it raises:
ValueError: Some errors were detected !
Line #2 (got 6 columns instead of 5).
I would prefer to use NumPy's genfromtxt because of my existing code, but
any method will do right now. genfromtxt seems difficult to use for this.
I expect a list of row tuples. At one point I got
[(5, 5, 10, 2, b'a')] for the first row, but I would like to get [(5, 5, 10, 2, ['a', 'd'])] if possible.
Thank you
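A minimal sketch of one way to get the requested tuples without genfromtxt, using only plain Python string splitting. The helper name read_spanning_last_header is hypothetical, and the code assumes that every column except the last holds integers and that all surplus fields on a row belong to the last header:

```python
import io


def read_spanning_last_header(lines, delimiter="\t"):
    """Parse rows whose trailing fields all belong to the last header.

    Hypothetical helper: assumes the leading columns are integers.
    """
    rows = iter(lines)
    headers = next(rows).rstrip("\n").split(delimiter)
    n_fixed = len(headers) - 1  # columns that map one-to-one onto a header
    records = []
    for line in rows:
        fields = line.rstrip("\n").split(delimiter)
        if not any(fields):
            continue  # skip blank lines
        fixed = tuple(int(value) for value in fields[:n_fixed])
        # Everything left over is collected under the last header.
        records.append(fixed + (fields[n_fixed:],))
    return headers, records


# In-memory stand-in for the tab-delimited file from the question.
sample = io.StringIO("xx\tyy\tzz\tii\tjj\n5\t5\t10\t2\ta\td\n")
headers, records = read_spanning_last_header(sample)
print(headers)
print(records)
```

With a real file, the same function can be called on an open file object, for example inside a with open(path) as fh: block.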

Related

Creating a DataFrame from a dictionary of Series results in lost indices and NaNs

import pandas as pd
dict_with_series = {'Even': pd.Series([2, 4, 6, 8, 10]), 'Odd': pd.Series([1, 3, 5, 7, 9])}
Data_frame_using_dic_Series = pd.DataFrame(dict_with_series)
# Data_frame_using_dic_Series = pd.DataFrame(dict_with_series, index=[1, 2, 3, 4, 5])  # gives a NaN value, I don't know why
display(Data_frame_using_dic_Series)
I tried labeling the index, but when I did, the first row disappeared and an extra row with NaN values appeared at the bottom instead. Can anyone explain why it behaves like this? Have I done something wrong?
If I don't use the index argument, it works fine.

When you run:
Data_frame_using_dic_Series = pd.DataFrame(dict_with_series,index=[1,2,3,4,5])
This asks pandas to use only the index labels 1 to 5 from the provided Series. A Series is labeled from 0 by default, so the data is reindexed: label 0 is dropped, and label 5, which does not exist in the Series, is filled with NaN.
If you want to change the index, do it afterwards:
Data_frame_using_dic_Series = (pd.DataFrame(dict_with_series)
                               .set_axis([1, 2, 3, 4, 5])
                               )
Output:
   Even  Odd
1     2    1
2     4    3
3     6    5
4     8    7
5    10    9
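The two behaviors can be compared side by side. This is a small sketch built on the question's data; the variable names reindexed and relabeled are illustrative only:

```python
import pandas as pd

dict_with_series = {
    "Even": pd.Series([2, 4, 6, 8, 10]),
    "Odd": pd.Series([1, 3, 5, 7, 9]),
}

# Passing index= selects labels 1..5 from Series that are labeled 0..4:
# label 0 is dropped and label 5 has no data, so it becomes NaN.
reindexed = pd.DataFrame(dict_with_series, index=[1, 2, 3, 4, 5])
print(reindexed)

# Relabeling after construction keeps all five rows and only renames them.
relabeled = pd.DataFrame(dict_with_series).set_axis([1, 2, 3, 4, 5], axis=0)
print(relabeled)
```

Because of the NaN in the last row, the columns of the reindexed frame are stored as floats rather than integers.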

Pandas query not working with square brackets in column name

I have to evaluate a lot of csv files. The columns of the files are always in a different order because some columns were removed and some new ones were added. Some columns are in every file and have the same name, so I want to switch from numpy to pandas, because pandas makes it possible to access the data by column name.
I want to calculate the average of a column dependent on the values in another column.
First I want to filter the values:
import pandas as pd
d = {"Y Position [0] [mm]": [1, 2, 3, 4, 5], "Y Position [1] [mm]": [6, 7, 8, 9, 0]}
df = pd.DataFrame(data=d)
dq = df.query("`Y Position [0] [mm]` > 2")
print(dq)
But I get this error:
File "<unknown>", line 1
Y_Position_[_0_]_[_mm_]_BACKTICK_QUOTED_STRING >2
^
SyntaxError: invalid syntax
When I remove the square brackets it works fine:
   Y Position 0  Y Position [1] [mm]
2             3                    8
3             4                    9
4             5                    0
I checked the documentation but I could not find a reason why it should not work.
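One possible workaround, sketched here without explaining why query rejects the name, is plain boolean indexing. It never passes the column name through the query parser, so the square brackets are not an issue. The variable names mask and mean_y1 are illustrative:

```python
import pandas as pd

d = {"Y Position [0] [mm]": [1, 2, 3, 4, 5], "Y Position [1] [mm]": [6, 7, 8, 9, 0]}
df = pd.DataFrame(data=d)

# Build a boolean mask from one column and use it to select rows.
mask = df["Y Position [0] [mm]"] > 2
dq = df[mask]
print(dq)

# Average of the other column over the filtered rows.
mean_y1 = dq["Y Position [1] [mm]"].mean()
print(mean_y1)
```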

How do I reverse the first four elements of the 1st axis and reversing the 2nd axis of a numpy array in a single operation?

I have a numpy array M of shape (n, 1000, 6). This can be thought of as n matrices with 1000 rows and 6 columns. For each matrix I would like to reverse the order of the rows (i.e. the top row is now at the bottom and vice versa) and then reverse the order of just the first 4 columns (so column 0 is now column 3, column 1 is column 2, column 2 is column 1 and column 3 is column 0 but column 4 is still column 4 and column 5 is still column 5). I would like to do this in a single operation, without doing indexing on the left side of the expression, so this would not be acceptable:
M[:,0:4,:] = M[:,0:4,:][:,::-1,:]
M[:,:,:] = M[:,:,::-1]
The operation needs to be achievable using the Keras backend, which disallows this. It must be of the form
M = M[indexing here that solves the task]
If I wanted to reverse the order of all the columns instead of just the first 4, this could easily be achieved with M = M[:,::-1,::-1], so I've been trying to modify this to achieve my goal, but unfortunately I can't work out how. Is this even possible?

M[:, ::-1, [3, 2, 1, 0, 4, 5]]
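A quick NumPy check of the one-liner above, using a small array instead of 1000 rows. Whether a Keras backend tensor accepts a list index like this is an assumption that has not been verified here; the slice-and-concatenate form used to build expected below is an alternative that relies only on basic slicing:

```python
import numpy as np

n = 2
M = np.arange(n * 5 * 6).reshape(n, 5, 6)  # 5 rows stand in for 1000

# Reverse the rows, and reorder the columns so only the first four are reversed.
result = M[:, ::-1, [3, 2, 1, 0, 4, 5]]

# Same result from basic slices only: columns 3..0 followed by columns 4..5.
rows_reversed = M[:, ::-1, :]
expected = np.concatenate(
    [rows_reversed[:, :, 3::-1], rows_reversed[:, :, 4:]], axis=2
)

assert np.array_equal(result, expected)
print(result[0, 0])  # last row of the first matrix, first four columns reversed
```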

Import .dat file in Python 3

I would like to import a .dat file that contains text lines, a header, numbers, and more text lines,
something like this example:
start using data to calculate something
x y z g h
1 4 6 8 3
4 5 6 8 9
2 3 6 8 5
end the data that I should import.
Now I am trying to read this file, remove the first and last lines, put the numbers in an array, and do some basic calculations on them, but I could not get rid of the text lines. I used data = np.genfromtxt('sample.dat') to import the data, but with those lines present I cannot do anything. Can anyone help me?

Maybe this helps you:
import numpy as np
data = np.genfromtxt('sample.dat',
                     skip_header=1,
                     skip_footer=1,
                     names=True,
                     dtype=None,
                     delimiter=' ')
print(data)
# Output: [(1, 4, 6, 8, 3) (4, 5, 6, 8, 9) (2, 3, 6, 8, 5)]
Please refer to the numpy documentation for further information about the parameters used: https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html
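Since names=True produces a structured array, each column can then be accessed by its header name for the basic calculations mentioned in the question. A small sketch, using an in-memory io.StringIO copy of the sample in place of sample.dat (the col_sums dictionary is illustrative only):

```python
import io

import numpy as np

# In-memory stand-in for sample.dat from the question.
sample = io.StringIO(
    "start using data to calculate something\n"
    "x y z g h\n"
    "1 4 6 8 3\n"
    "4 5 6 8 9\n"
    "2 3 6 8 5\n"
    "end the data that I should import.\n"
)

data = np.genfromtxt(sample,
                     skip_header=1,
                     skip_footer=1,
                     names=True,
                     dtype=None,
                     delimiter=' ')

print(data.dtype.names)   # field names taken from the header line
print(data['x'].mean())   # a column is selected by its name

# Sum of every column, keyed by header name.
col_sums = {name: int(data[name].sum()) for name in data.dtype.names}
print(col_sums)
```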

Creating columns with numpy Python

I have some elements stored in numpy.array[]. I wish to store them in a ".txt" file. The case is it needs to fit a certain standard, which means each element needs to be stored x lines into the file.
Example:
numpy.array[0] needs to start in line 1, col 26.
numpy.array[1] needs to start in line 1, col 34.
I use numpy.savetxt() to save the arrays to file.
Later I will implement this in a loop to create a large ".txt" file with coordinates.
Edit: This good example was provided below, it does point out my struggle:
In [117]: np.savetxt('test.txt',A.T,'%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
The fmt option '%20d %10d' gives spacing that depends on the last integer. What I need is an option that lets me set the spacing from the left side, regardless of the other integers.
The template I need to fit the numbers into is:
XXXXXXXX.XXX YYYYYYY.YYY ZZZZ.ZZZ
Final Edit:
I solved it by creating a test which checks how many spaces the last float used. I was then able to predict the number of spaces the next float needed to fit the template.

Have you played with the fmt of np.savetxt?
Let me illustrate with a concrete example (the sort that you should have given us)
Make a 2 row array:
In [111]: A=np.arange((12)).reshape(2,6)
In [112]: A
Out[112]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
Save it, and get 2 rows, 6 columns
In [113]: np.savetxt('test.txt',A,'%d')
In [114]: cat test.txt
0 1 2 3 4 5
6 7 8 9 10 11
Save its transpose, and get 6 rows, 2 columns
In [115]: np.savetxt('test.txt',A.T,'%d')
In [116]: cat test.txt
0 6
1 7
2 8
3 9
4 10
5 11
Put more detail into fmt to space out the columns
In [117]: np.savetxt('test.txt',A.T,'%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
I think you can figure out how to make a fmt string that puts your numbers in the correct columns (join 26 spaces etc, or use left and right justification - the usual Python formatting issues).
savetxt also takes an opened file. So you can open a file for writing, write one array, add some filler lines, and write another. Also, savetxt doesn't do anything fancy. It just iterates through the rows of the array, and writes each row to a line, e.g.
for row in A:
    file.write(fmt % tuple(row))
So if you don't like the control that savetxt gives you, write the file directly.
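As a sketch of the left-justification idea mentioned above: fixed padding plus '%-8d' fields make every value start at the same column regardless of how many digits it has. The column positions 26 and 34 come from the question; the 8-character field width and the sample array are assumptions made for illustration:

```python
import io

import numpy as np

A = np.array([[0, 6], [1, 7], [4, 10]])

# 25 spaces of padding put the first value at column 26 (1-based).
# '%-8d' left-justifies in an 8-character field, so the second value
# starts at column 34 no matter how wide the first value is.
fmt = " " * 25 + "%-8d%-8d"

buffer = io.StringIO()  # savetxt accepts an open file-like object
np.savetxt(buffer, A, fmt=fmt)

lines = buffer.getvalue().splitlines()
for line in lines:
    print(repr(line))
```

For the float template in the question, a left-justified float specifier such as '%-13.3f' could play the same role, with the field widths adjusted to the required start columns.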
