Naming pandas columns from arbitrary row data

I have a Python script in which I read a CSV file using pandas:
colnames = ['col1','col2','col3','col4','col5','col6','col7','col8','col9','col10']
csv_input = pd.read_csv(ifile, names=colnames)
The CSV file is filled with lots of unneeded junk, but the column names I want to use are defined by a row with DataName in col1.
csv_names = csv_input[csv_input.col1 == 'DataName']
The actual data is in rows with DataValue in col1, and I don't need the rest.
csv_input = csv_input[csv_input.col1 == 'DataValue']
What I'd like to do is rename the columns in csv_input with the values of csv_names, but I can't find the right syntax to do this. I have tried
csv_input.columns = csv_names.values
Which gives the error
ValueError: Length mismatch: Expected axis has 10 elements, new values have 1 elements
Any suggestions greatly appreciated.

You should be able to just directly assign them like so:
In [28]:
df = pd.DataFrame({'a':[0,'e',1], 'b':[0,'f',2],'c':[0,'g',2]})
df
Out[28]:
a b c
0 0 0 0
1 e f g
2 1 2 2
In [29]:
df.columns = df.loc[1]
df
Out[29]:
1 e f g
0 0 0 0
1 e f g
2 1 2 2
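In your case csv_names is a one-row DataFrame (its .values has shape (1, 10), which is where the length mismatch comes from), so assign its first row:
csv_input.columns = csv_names.iloc[0]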
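A minimal end-to-end sketch of the same idea. It assumes a hypothetical in-memory file (the junk, DataName and DataValue rows below are made up for illustration) and only three columns instead of ten:

```python
import io

import pandas as pd

# Hypothetical file contents: junk rows, one DataName row, DataValue rows
raw = io.StringIO(
    "junk,x,y\n"
    "DataName,time,temp\n"
    "DataValue,1,20.5\n"
    "other,junk,line\n"
    "DataValue,2,21.0\n"
)
colnames = ['col1', 'col2', 'col3']
csv_input = pd.read_csv(raw, names=colnames)

csv_names = csv_input[csv_input.col1 == 'DataName']  # a one-row DataFrame
print(csv_names.values.shape)  # (1, 3): one row, which is why assigning .values fails

data = csv_input[csv_input.col1 == 'DataValue'].copy()
data.columns = csv_names.iloc[0]   # the first (only) row supplies the labels
data.columns.name = None           # drop the leftover row label (1) from the header
data = data.drop(columns='DataName').reset_index(drop=True)
data = data.apply(pd.to_numeric)   # values were read as text because of the junk rows
print(data)
```

The pd.to_numeric step is only needed because the junk rows force every column to be read as text; if the real data is already numeric it can be dropped.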

Related

How to Pivot/Stack for multi header column dataframe

import numpy as np
import pandas as pd
np.random.seed(2022) # added to make the data the same each time
cols = pd.MultiIndex.from_arrays([['A','A' ,'B','B'], ['min','max','min','max']])
df = pd.DataFrame(np.random.rand(3,4),columns=cols)
df.index.name = 'item'
print(df)
A B
min max min max
item
0 0.009359 0.499058 0.113384 0.049974
1 0.685408 0.486988 0.897657 0.647452
2 0.896963 0.721135 0.831353 0.827568
There are two column header rows, and when working with CSV I get a blank column name for every other column after unmerging.
I want a result that looks like this (the expected-output image was not preserved). How can I do it?
I tried to use pivot_table but couldn't get it to work.

Try:
df = (
df.stack(level=0)
.reset_index()
.rename(columns={"level_1": "title"})
.sort_values(by=["title", "item"])
)
print(df)
Prints (the numbers below do not match the question's table, but the layout is the point):
item title max min
0 0 A 0.762221 0.737758
2 1 A 0.930523 0.275314
4 2 A 0.746246 0.123621
1 0 B 0.044137 0.264969
3 1 B 0.577637 0.699877
5 2 B 0.601034 0.706978
Then to CSV:
df.to_csv('out.csv', index=False)
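If the blank names appear when reading the two-header CSV back, one option worth trying is to tell read_csv that the first two lines are both header rows. A sketch, assuming the file was written with to_csv from the df above (an io.StringIO buffer stands in for the real file):

```python
import io

import numpy as np
import pandas as pd

np.random.seed(2022)
cols = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], ['min', 'max', 'min', 'max']])
df = pd.DataFrame(np.random.rand(3, 4), columns=cols)
df.index.name = 'item'

buf = io.StringIO()
df.to_csv(buf)   # writes two header lines (A,A,B,B and min,max,min,max) plus an "item" line
buf.seek(0)

# header=[0, 1] rebuilds the two-level columns instead of leaving blank names
restored = pd.read_csv(buf, header=[0, 1], index_col=0)
print(restored)
```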

Can you print a column using its index values instead of its name? [duplicate]

I have a pandas dataframe and a numpy array of values of that dataframe.
I have the index of a specific column and I already have the row index of an important value. Now I need to get the column name of that particular value from my dataframe.
After searching through the documentation, I found out how to do the opposite, but not what I want.

I think you need to index the column names by position (Python counts from 0, so for the fourth column use 3):
colname = df.columns[pos]
Sample:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df)
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
pos = 3
colname = df.columns[pos]
print (colname)
D
pos = [3,5]
colname = df.columns[pos]
print (colname)
Index(['D', 'F'], dtype='object')
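The same positional indexing also prints the column's data, and since the question starts from a NumPy array of the frame's values, the same (row, column) positions work on df.to_numpy(). A short sketch; the row/column positions below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                   'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})

pos = 3
print(df.iloc[:, pos])            # the whole fourth column, selected by position
print(df.columns.get_loc('D'))    # back from label to position: 3

row, col = 1, 3                   # hypothetical positions of "the important value"
arr = df.to_numpy()
value = arr[row, col]             # same value as df.iat[row, col]
name = df.columns[col]            # the column name for that value
print(name, value)
```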

Series.name (each column is a Series whose name is the column label)
It works wonders, especially when iterating!
For example:
Third_Column = DF.iloc[:, 2]  # where its name is "Third"
Third_Column.name == 'Third'  # returns True
for label, column in DF.items():
    column.name
    # this returns the name of each column and can be used in a condition
    # to apply a different rule, such as a different scale when plotting a
    # certain column
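A sketch of such a per-column rule, assuming a hypothetical rule of multiplying only the column named "Third" by 10 (no plotting library needed to show the idea):

```python
import pandas as pd

DF = pd.DataFrame({'First': [1, 2], 'Second': [3, 4], 'Third': [1, 2]})

scaled = {}
for label, column in DF.items():      # items() yields (label, Series) pairs
    if column.name == 'Third':        # Series.name is the column label
        scaled[label] = column * 10   # hypothetical special rule for one column
    else:
        scaled[label] = column
result = pd.DataFrame(scaled)
print(result)
```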

Drop / move header of a dataframe into first row

I have a dataframe that looks like this:
A B 10
0 A B 20
1 C A 10
The headers are not the real headers of the dataframe (I have to map them from another dataframe). How can I move the headers down into the first row so that it looks like this:
0 1 2
0 A B 10
1 A B 20
2 C A 10
Note that pd.read_csv(..., header=None) leads to an error in this case. I don't know why, so I am looking for a way to fix it after loading the file.

The best approach is to avoid the problem with the header=None parameter in read_csv:
df = pd.read_csv(file, header=None)
If that is not possible, convert the column names to a one-row DataFrame, put it in front of the original data with pd.concat, and then set the column names to a range:
df = pd.concat([df.columns.to_frame().T, df], ignore_index=True)
df.columns = range(len(df.columns))
print (df)
0 1 2
0 A B 10
1 A B 20
2 C A 10

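Alternatively, try reset_index on the transposed frame; the final reset_index(drop=True) restores a 0, 1, 2 row index:
df = df.T.reset_index().T.reset_index(drop=True)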
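Both routes can be checked on the question's three rows, supplied here as hypothetical CSV text through io.StringIO. Note that labels read from a header come back as strings, so the first row of the after-the-fact version holds the text '10' rather than the number 10:

```python
import io

import pandas as pd

raw = "A,B,10\nA,B,20\nC,A,10\n"   # the question's rows as CSV text

# Preferred: treat every line as data
clean = pd.read_csv(io.StringIO(raw), header=None)

# After the fact: the first line was consumed as the header
df = pd.read_csv(io.StringIO(raw))
header_row = df.columns.to_frame().T          # one-row frame holding the old labels
fixed = pd.concat([header_row, df], ignore_index=True)
fixed.columns = range(len(fixed.columns))
print(fixed)
```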

dictionary values to excel columns

I would like to convert a dictionary of key-value pairs into an Excel file, counting each value in the column with the same name.
For example:
I have an Excel file with the column names
a, b, c, d, e, f, g and h.
I have a dictionary like:
{1:['c','d'],2:['a','h'],3:['a','b','b','f']}.
I need the output to be:
  a b c d e f g h
1 0 0 1 1 0 0 0 0
2 1 0 0 0 0 0 0 1
3 1 2 0 0 0 1 0 0
the 1,2,3 are the keys from the dictionary.
The rest of the columns could be either 0 or null.
I have tried splitting the dictionary and am getting
1 = ['c','d']
2 = ['a','h']
3 = ['a','b','b','f']
but I don't know how to match these lists to the columns of the Excel file.

Your problem can be solved with pandas and collections (there may exist a more efficient solution):
import pandas as pd
from collections import Counter
d = {...} # Your dictionary
series = pd.Series(d) # Convert the dict into a Series
counts = series.apply(Counter) # Count items row-wise
counts = counts.apply(pd.Series) # Convert the counters to Series
table = counts.fillna(0).astype(int) # Fill the gaps and make the counts integer
print(table)
# a b c d f h
1 0 0 1 1 0 0
2 1 0 0 0 0 1
3 1 2 0 0 1 0
It is not clear what type of output you expect, so I leave it to you to convert the DataFrame to the output of your choice.
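A variation on the same idea that also keeps the letters that never occur (e and g) and the Excel column order, sketched with the question's dictionary; the output file name in the comment is hypothetical:

```python
from collections import Counter

import pandas as pd

d = {1: ['c', 'd'], 2: ['a', 'h'], 3: ['a', 'b', 'b', 'f']}
excel_columns = list('abcdefgh')   # the column names from the question's Excel file

# One Counter per key; the DataFrame constructor aligns the counts by letter
table = pd.DataFrame([Counter(v) for v in d.values()], index=list(d))
# Add missing letters, fill the gaps with 0 and keep the Excel order
table = table.reindex(columns=excel_columns).fillna(0).astype(int)
print(table)
# table.to_excel('out.xlsx')  # needs an Excel writer such as openpyxl installed
```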

A simple solution based only on standard lists and dictionaries. It generates a 2D list, which is then easy to convert into a CSV file that can be loaded by Excel.
d = {1:['c','d'],2:['a','h'],3:['a','b','b','f']}
cols = dict((c,n) for n,c in enumerate('abcdefgh'))
rows = dict((k,n) for n,k in enumerate(d))  # keys are the integers 1, 2, 3
table = [[0 for col in cols] for row in rows]
for row, values in d.items():
    for col in values:
        table[rows[row]][cols[col]] += 1
print(table)
# output:
# [[0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 1], [1, 2, 0, 0, 0, 1, 0, 0]]
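Writing that 2D list out as CSV can be done with the standard csv module. A sketch that adds a header row and the dictionary key in front of each row; an io.StringIO buffer stands in for the file, and the name table.csv in the comment is hypothetical:

```python
import csv
import io

d = {1: ['c', 'd'], 2: ['a', 'h'], 3: ['a', 'b', 'b', 'f']}
letters = 'abcdefgh'
cols = {c: n for n, c in enumerate(letters)}
rows = {k: n for n, k in enumerate(d)}

table = [[0] * len(cols) for _ in rows]
for key, values in d.items():
    for col in values:
        table[rows[key]][cols[col]] += 1

out = io.StringIO()   # use open('table.csv', 'w', newline='') for a real file
writer = csv.writer(out)
writer.writerow(['key', *letters])             # header row
for key in d:
    writer.writerow([key, *table[rows[key]]])  # key first, then the counts
print(out.getvalue())
```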

How to write a pivot_table to a txt file in Python

I have created a pivot_table as follows (screenshot not preserved). There are spaces in the table.
What I want to write to the txt file is shown in a second screenshot (also not preserved).
How can I get it?
# 故障发生地市 = city where the fault occurred, 工单号 = work order number
chaoshidishi=pd.pivot_table(clsc,index="故障发生地市",values="工单号",aggfunc=len)
chaoshidishi=chaoshidishi.to_frame()
f=open(r'E:\gaotie\dishi.txt','w')
for row in chaoshidishi:
    f.write(row[0]+row[1])
f.close()

Following up on @shanmuga's comment, you should be able to use to_csv() without calling to_frame() first.
First, here's some sample data that seems to reflect your setup:
import pandas as pd
group = ['a','a','b','c','c']
value = [1,2,3,4,5]
df = pd.DataFrame({'group':group,'value':value})
print(df)
group value
0 a 1
1 a 2
2 b 3
3 c 4
4 c 5
Now apply pivot_table(), putting the grouping column in the index so the result is a one-column DataFrame:
df.pivot_table(index='group', values='value', aggfunc=len)
value
group
a 2
b 1
c 2
You can save to a file directly from this output. If you don't want to preserve the index and column names, drop the header on save and use header=None on load:
(df.pivot_table(index='group', values='value', aggfunc=len)
.to_csv('foo.txt', header=False))
newdf = pd.read_csv('foo.txt', header=None)
print(newdf)
0 1
0 a 2
1 b 1
2 c 2
To preserve the column and index names, keep the default header on save and use the index_col argument on load:
(df.pivot_table(index='group', values='value', aggfunc=len)
.to_csv('foo.txt'))
newdf = pd.read_csv('foo.txt', index_col='group')
print(newdf)
value
group
a 2
b 1
c 2
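If the goal is a plain txt file with one "name, count" pair per line, the sep argument of to_csv can replace the hand-written loop from the question. A sketch using the sample group/value data from this answer in place of the question's columns; the io.StringIO buffer stands in for the real file path:

```python
import io

import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'c', 'c'], 'value': [1, 2, 3, 4, 5]})
counts = df.pivot_table(index='group', values='value', aggfunc=len)

out = io.StringIO()   # stand-in for open(r'E:\gaotie\dishi.txt', 'w')
counts.to_csv(out, sep='\t', header=False)   # one "group<TAB>count" line per row
print(out.getvalue())
```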
