How to make table with multi-tier row header (index) using Pandas - python

I have the following data:
# colh1 rh1 rh2 rh3/up rh4/down
AddaVax ID LV 29 18
AddaVax ID SP 16 13
AddaVax ID LN 61 73
ADX ID LV 11 14
ADX IP LV 160 88
ADX ID SP 14 13
ADX IP SP 346 129
ADX ID LN 25 25
What I'd like to do is to make a table that looks like this
(later to be written in text or Excel file):
The actual data contain more than 2 columns but the number of rows
is always fixed (i.e. 10 rows).
I'm stuck with the following code:
import csv
import pandas as pd
from collections import defaultdict

dod = defaultdict(dict)
with open("mediate.txt", 'r') as tsvfile:
    tabreader = csv.reader(tsvfile, delimiter=' ')
    for row in tabreader:
        if "#" in row[0]: continue
        colh1, rh1, rh2, rhup, rhdown = row
        dod["colh1"] = colh1
        dod["rh1"] = rh1
        dod["rh2"] = rh2
        dod["rhup"] = rhup
        dod["rhdown"] = rhdown
What's the way to do it?

Just using Pandas:
import pandas as pd

df = pd.read_csv('mediate.txt', sep='\t')  # or sep=',' if comma delimited
df.rename(columns={'rh3/up': 'Up', 'rh4/down': 'Down'}, inplace=True)
result = df.pivot_table(values=['Up', 'Down'],
                        columns='colh1',
                        index=['rh1', 'rh2']).stack(0)  # stack Up/Down
>>> result
colh1 ADX AddaVax
rh1 rh2
ID LN Up 25 61
Down 25 73
LV Up 11 29
Down 14 18
SP Up 14 16
Down 13 13
IP LV Up 160 NaN
Down 88 NaN
SP Up 346 NaN
Down 129 NaN
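Since the question mentions writing the result to a text or Excel file later, here is a minimal end-to-end sketch. It uses an in-memory copy of the sample data and a file name of my choosing; `result.to_excel('result.xlsx')` would target Excel instead (openpyxl required):

```python
import io
import pandas as pd

# In-memory stand-in for mediate.txt (the sample data from the question)
raw = """colh1 rh1 rh2 rh3/up rh4/down
AddaVax ID LV 29 18
AddaVax ID SP 16 13
AddaVax ID LN 61 73
ADX ID LV 11 14
ADX IP LV 160 88
ADX ID SP 14 13
ADX IP SP 346 129
ADX ID LN 25 25"""

df = pd.read_csv(io.StringIO(raw), sep=' ')
df = df.rename(columns={'rh3/up': 'Up', 'rh4/down': 'Down'})
result = df.pivot_table(values=['Up', 'Down'],
                        columns='colh1',
                        index=['rh1', 'rh2']).stack(0)

# Persist the multi-index table as plain text
with open('result.txt', 'w') as f:
    f.write(result.to_string())
```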


Manipulating data in Pandas

That is my database:
Number Name Points Math Points BG Wish
0 1 Огнян 50 65 MT
1 2 Момчил 61 27 MT
2 3 Радослав 68 68 MT
3 4 Павел 28 16 MT
4 10 Виктор 67 76 MT
5 11 Петър 26 68 BT
6 12 Антон 64 58 BT
7 13 Васил 29 42 BT
8 20 Виктория 62 67 BT
That's my code:
df = pd.read_csv('Input_data.csv', encoding='utf-8-sig')
df['Total'] = df.iloc[:, 2:].sum(axis=1)
df = df.sort_values(['Total', 'Name'], ascending=[0, 1])
df_5.to_excel("BT RANKING_5.xlsx", encoding='utf-8-sig', index=False)
I want for each person who has Wish == MT to double the score in Points Math column.
I tried:
df.loc[df['Wish'] == 'MT', 'Points Math'] = df.loc[df['Points Math'] * 2]
but this didn't work. I also tried an if statement and a for loop, but they didn't work either.
What's the appropriate syntax to do this?
Use this:
import numpy as np

df['Points_Math'] = np.where(df['Wish'] == 'MT', df['Points Math'] * 2, df['Points Math'])
A new column 'Points_Math' is created with the desired results, or you can overwrite the original by assigning to 'Points Math' instead.
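Alternatively, the asker's .loc attempt can be repaired by filtering the right-hand side with the same boolean mask. A sketch on a toy frame using the column names from the post:

```python
import pandas as pd

# Toy frame mirroring the question's columns
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Points Math': [50, 61, 26],
    'Wish': ['MT', 'MT', 'BT'],
})

# The boolean mask on the left selects the rows to assign; the right-hand
# side must be the same filtered column doubled, not df.loc[df['Points Math'] * 2]
mask = df['Wish'] == 'MT'
df.loc[mask, 'Points Math'] = df.loc[mask, 'Points Math'] * 2
```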

How to get a specific field for parsing log files using pandas regular expressions [duplicate]

I have a pandas DataFrame like this:
X Y Z Value
0 18 55 1 70
1 18 55 2 67
2 18 57 2 75
3 18 58 1 35
4 19 54 2 70
I want to write this data to a text file that looks like this:
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
I have tried something like
f = open(writePath, 'a')
f.writelines(['\n', str(data['X']), ' ', str(data['Y']), ' ', str(data['Z']), ' ', str(data['Value'])])
f.close()
It's not correct. How to do this?
You can just use np.savetxt with the DataFrame's .values attribute:
np.savetxt(r'c:\data\np.txt', df.values, fmt='%d')
yields:
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
or to_csv:
df.to_csv(r'c:\data\pandas.txt', header=None, index=None, sep=' ', mode='a')
Note for np.savetxt you'd have to pass a filehandle that has been created with append mode.
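A minimal sketch of that append-mode note (file name is my choice): np.savetxt truncates when given a filename, but it also accepts an open file handle, so opening in binary-append mode appends instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [18, 19], 'Y': [55, 54], 'Z': [1, 2], 'Value': [70, 70]})

# Two separate writes through 'ab' handles accumulate rows in the file
with open('np_append.txt', 'ab') as f:
    np.savetxt(f, df.values, fmt='%d')
with open('np_append.txt', 'ab') as f:
    np.savetxt(f, df.values, fmt='%d')
```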
The native way to do this is to use df.to_string():
with open(writePath, 'a') as f:
    dfAsString = df.to_string(header=False, index=False)
    f.write(dfAsString)
Will output the following
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
This method also lets you easily choose which columns to print with the columns parameter, lets you keep the column and index labels if you wish, and has other parameters for spacing etc.
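For example, on a toy frame, the columns parameter restricts and orders the printed columns:

```python
import pandas as pd

df = pd.DataFrame({'X': [18, 18], 'Y': [55, 55], 'Z': [1, 2], 'Value': [70, 67]})

# Only X and Value appear in the output, in that order
s = df.to_string(columns=['X', 'Value'], header=False, index=False)
```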
You can use pandas.DataFrame.to_csv(), setting both index and header to False:
In [97]: print(df.to_csv(sep=' ', index=False, header=False))
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
pandas.DataFrame.to_csv can write to a file directly, for more info you can refer to the docs linked above.
Late to the party: try this:
import os

base_filename = 'Values.txt'
with open(os.path.join(WorkingFolder, base_filename), 'w') as outfile:
    df.to_string(outfile)  # neatly writes all columns and rows to a .txt file
@AHegde - to get tab-delimited output, use the separator sep='\t'.
For df.to_csv:
df.to_csv(r'c:\data\pandas.txt', header=None, index=None, sep='\t', mode='a')
For np.savetxt:
np.savetxt(r'c:\data\np.txt', df.values, fmt='%d', delimiter='\t')
Way to get Excel data to a text file in tab-delimited form.
Needs Pandas as well as xlrd.
import pandas as pd
import xlrd
import os

Path = r"C:\downloads"
wb = pd.ExcelFile(os.path.join(Path, "input.xlsx"), engine=None)
sheet2 = pd.read_excel(wb, sheet_name="Sheet1")
Excel_Filter = sheet2[sheet2['Name'] == 'Test']
Excel_Filter.to_excel(os.path.join(Path, "output.xlsx"), index=None)
wb2 = xlrd.open_workbook(os.path.join(Path, "output.xlsx"))
df = wb2.sheet_by_name("Sheet1")
x = df.nrows
y = df.ncols
with open(os.path.join(Path, "emails.txt"), "a") as f:
    for i in range(x):
        for j in range(y):
            f.write(str(df.cell_value(i, j)) + "\t")
        f.write("\n")
os.remove(os.path.join(Path, "output.xlsx"))
print(Excel_Filter)
We need to first generate the xlsx file with the filtered data and then convert that information into a text file.
Depending on requirements, we can use \n and \t, loops, and whatever type of data we want in the text file.
(Note: xlrd 2.0+ dropped .xlsx support, so this snippet needs an older xlrd, or openpyxl instead.)
I used a slightly modified version:
with open(file_name, 'w', encoding='utf-8') as f:
    for rec_index, rec in df.iterrows():
        f.write(rec['<field>'] + '\n')
I had to write the contents of a dataframe field (that was delimited) as a text file.
If you have a DataFrame that is the output of the pandas compare method, it looks like the below when printed:
grossRevenue netRevenue defaultCost
self other self other self other
2098 150.0 160.0 NaN NaN NaN NaN
2110 1400.0 400.0 NaN NaN NaN NaN
2127 NaN NaN NaN NaN 0.0 909.0
2137 NaN NaN 0.000000 8.900000e+01 NaN NaN
2150 NaN NaN 0.000000 8.888889e+07 NaN NaN
2162 NaN NaN 1815.000039 1.815000e+03 NaN NaN
I was looking to persist the whole dataframe into a text file as it is visible above. Using pandas' to_csv or numpy's savetxt does not achieve this goal. I used plain old print to log the same into a text file:
with open('file1.txt', mode='w') as file_object:
    print(data_frame, file=file_object)

How to read specific columns and Rows in Python?

Timestamp SP DP
20-03-2017 10:00:01 50 60.5
20-03-2017 10:10:00 60 70
20-03-2017 10:40:01 75 80
20-03-2017 11:05:00 44 65
20-03-2017 11:25:01 98 42
20-03-2017 11:50:01 12 99
20-03-2017 12:00:05 13 54
20-03-2017 12:05:01 78 78
20-03-2017 12:59:01 15 89
20-03-2017 13:00:00 46 99
20-03-2017 13:23:01 44 45
20-03-2017 13:45:08 80 39
import csv

output = []
f = open('test.csv', 'r')  # open the file in read mode
for line in f:
    cells = line.split(",")
    output.append((cells[0], cells[1]))  # we want the first and second columns
print(output)
How do I read specific columns and specific rows?
Desired output: I want only the first column and 2 rows:
Timestamp SP
20-03-2017 10:00:01 50
20-03-2017 10:10:00 60
How to do that?
Use the csv module, and either count your rows (using the enumerate() function) or use itertools.islice() to limit how much is read:
import csv

output = []
with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for counter, row in enumerate(reader):
        if counter > 2:
            # read only the header and first two rows
            break
        output.append(row[:2])
or using islice():
import csv
from itertools import islice

with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    output = list(islice((row[:2] for row in reader), 3))
You can use index slicing. Just read the csv from the source.
import pandas as pd

df = pd.read_csv("Name of csv file.")
df2 = df.iloc[:2, 0:2]  # .ix has been removed from pandas; .iloc selects the first two rows and columns
print(df2)
Try it.
You need to use pandas to read it.
import pandas as pd

df = pd.read_csv("filepath", index_col=0)
Then you can get the first column and 2 rows with
df.SP.head(2)
or
df.iloc[:2, 0:2]  # first two rows of the first two columns (.ix has been removed from pandas)
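Since .ix is gone from modern pandas, another hedged equivalent is to let read_csv itself do the selection with usecols and nrows, shown here on an in-memory copy of the sample data (assumed comma-separated):

```python
import io
import pandas as pd

# In-memory stand-in for test.csv
raw = """Timestamp,SP,DP
20-03-2017 10:00:01,50,60.5
20-03-2017 10:10:00,60,70
20-03-2017 10:40:01,75,80"""

# usecols limits which columns are parsed; nrows limits how many rows are read
df = pd.read_csv(io.StringIO(raw), usecols=['Timestamp', 'SP'], nrows=2)
```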

Error in using knn for multidimensional data

I am a beginner in Machine Learning, trying to classify multi-dimensional data into two classes. Each data point is 40x6 float values. To begin with, I have read my csv file. In this file, the shot number represents a data point.
https://docs.google.com/spreadsheets/d/1tW1xJqnNZa1PhVDAE-ieSVbcdqhT8XfYGy8ErUEY_X4/edit?usp=sharing
Here is the code in python:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plot

from sklearn.neighbors import KNeighborsClassifier

# Read csv data into pandas data frame
data_frame = pd.read_csv('data.csv')

extract_columns = ['LinearAccX', 'LinearAccY', 'LinearAccZ', 'Roll', 'pitch', 'compass']

# Number of samples in one shot
samples_per_shot = 40

# Calculate number of shots in dataframe
count_of_shots = len(data_frame.index) // samples_per_shot

# Initialize empty data frame
training_index = range(count_of_shots)
training_data_list = []

# Flag for backward compatibility
make_old_data_compatible_with_new = 0

if make_old_data_compatible_with_new:
    # Convert 40-sample shot data to 25-sample shot data
    # (new logic takes 25 samples/shot, old logic takes 40)
    start_shot_sample_index = 9
    end_shot_sample_index = 34
else:
    # Start index from 1 and continue till, let's say, 40
    start_shot_sample_index = 1
    end_shot_sample_index = samples_per_shot

# Extract each shot into a pandas series
for shot in range(count_of_shots):
    # Extract current shot
    current_shot_data = data_frame[data_frame['shot_no'] == (shot + 1)]

    # Select only the following columns
    selected_columns_from_shot = current_shot_data[extract_columns]

    # Select columns from selected rows
    # Find start and end row indexes
    current_shot_data_start_index = shot * samples_per_shot + start_shot_sample_index
    current_shot_data_end_index = shot * samples_per_shot + end_shot_sample_index
    selected_rows_from_shot = selected_columns_from_shot.ix[current_shot_data_start_index:current_shot_data_end_index]

    # Append to list of lists
    # Convert selected shot into a multi-dimensional array
    training_data_list.append([selected_columns_from_shot[extract_columns[index]].values.tolist() for index in range(len(extract_columns))])

# Append each sliced shot into training data
training_data = pd.DataFrame(training_data_list, columns=extract_columns)
training_features = [1 for i in range(count_of_shots)]
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(training_data, training_features)
After running the above code, I am getting an error
ValueError: setting an array element with a sequence.
for the line
knn.fit(training_data, training_features)
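This ValueError typically means scikit-learn received a column of Python lists rather than a plain 2-D numeric array. One common fix (a sketch on synthetic data, not the asker's csv) is to flatten each 40x6 shot into a single feature vector before fitting:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 10 shots, each 40 samples x 6 channels
rng = np.random.default_rng(0)
shots = rng.normal(size=(10, 40, 6))
labels = [0, 1] * 5  # two classes, as in the question

# Flatten each 40x6 shot into one 240-long feature vector so the
# training matrix is a 2-D array of numbers, not an array of lists
X = shots.reshape(len(shots), -1)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, labels)
```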

How to write values to a csv file from another csv file

For index.csv file, its fourth column has ten numbers ranging from 1-5. Each number can be regarded as an index, and each index corresponds with an array of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is about using a nesting loop to transfer the numbers in filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
f = open('index.csv','wb')
write = csv.writer(f, delimiter=',',quoting=csv.QUOTE_ALL)
for row in data2:
for ch_row in data1:
if ( data2[row,3] == ch_row ):
write.writerow(data1[data2[row,3],:])
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these number in 5th, 6th and 7th column:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
Can anyone help me solve this problem?
You need to indent your last 2 lines. Also, it looks like you are writing to the same file you are reading from.
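Building on that advice, a sketch of one working approach: look the indices up with NumPy fancy indexing and write the result to a new file. The file name and the in-memory stand-ins for the two csv files are my choice:

```python
import numpy as np

# Stand-in for filename.csv (row number = index)
data1 = np.array([[20, 30, 50],
                  [70, 60, 45],
                  [35, 26, 77],
                  [93, 37, 68],
                  [13, 8, 55]])
# Stand-in for the fourth column of index.csv (1-based indices)
index_col = np.array([1, 2, 5, 3, 4, 1, 4, 5, 2, 3])

# Fancy indexing pulls the matching row of data1 for each 1-based index;
# results go to a new file rather than back into the file being read
rows = data1[index_col - 1]
np.savetxt('index_out.csv', np.column_stack([index_col, rows]),
           fmt='%d', delimiter=',')
```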
