How to read specific columns and rows in Python?

Timestamp SP DP
20-03-2017 10:00:01 50 60.5
20-03-2017 10:10:00 60 70
20-03-2017 10:40:01 75 80
20-03-2017 11:05:00 44 65
20-03-2017 11:25:01 98 42
20-03-2017 11:50:01 12 99
20-03-2017 12:00:05 13 54
20-03-2017 12:05:01 78 78
20-03-2017 12:59:01 15 89
20-03-2017 13:00:00 46 99
20-03-2017 13:23:01 44 45
20-03-2017 13:45:08 80 39
import csv
output = []
with open('test.csv', 'r') as f:  # open the file for reading
    for line in f:
        cells = line.split(",")
        output.append((cells[0], cells[1]))  # keep the first and second columns
print(output)
How can I read only specific columns and specific rows?
Desired output:
I want only the first two columns and the first two rows:
Timestamp SP
20-03-2017 10:00:01 50
20-03-2017 10:10:00 60
How can I do that?

Use the csv module, and either count your rows (using the enumerate() function) or use itertools.islice() to limit how much is read:
import csv
output = []
with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for counter, row in enumerate(reader):
        if counter > 2:
            # read only the header and first two rows
            break
        output.append(row[:2])
or using islice():
import csv
from itertools import islice
with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    output = list(islice((row[:2] for row in reader), 3))
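If you would rather pick columns by header name than by position, the same islice() idea works with csv.DictReader. This is only a minimal sketch, assuming the file is comma-separated with a header row; the helper name first_rows and the in-memory sample standing in for test.csv are illustrative and not part of the question.

```python
import csv
import io
from itertools import islice


def first_rows(f, columns, n_rows):
    """Return the first n_rows data rows, keeping only the named columns."""
    reader = csv.DictReader(f)
    # islice stops after n_rows, so the rest of the file is never parsed
    return [[row[name] for name in columns] for row in islice(reader, n_rows)]


# In-memory stand-in for test.csv (assumed to be comma-separated)
sample = io.StringIO(
    "Timestamp,SP,DP\n"
    "20-03-2017 10:00:01,50,60.5\n"
    "20-03-2017 10:10:00,60,70\n"
    "20-03-2017 10:40:01,75,80\n"
)
rows = first_rows(sample, ["Timestamp", "SP"], 2)
print(rows)
```

Unlike the positional row[:2] slice, this keeps working if the column order in the file changes, at the cost of requiring a header row.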

You can use index slicing. Just read the CSV from the source:
import pandas as pd
df = pd.read_csv("Name of csv file.")
df2 = df.iloc[:2, 0:2]  # first two rows, first two columns (.ix was removed in pandas 1.0)
print(df2)

You can use pandas to read it.
import pandas
df = pandas.read_csv("filepath", index_col=0)
Then you can get the SP column for the first two rows with
df.SP.head(2)
or
df.iloc[:2, 0:2]  # first two rows and first two data columns (.ix was removed in pandas 1.0)
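pandas can also restrict the read itself, so the unwanted rows and columns are never loaded. The sketch below uses the usecols and nrows parameters of read_csv; it assumes a comma-separated file with a header row, and the in-memory sample is only a stand-in for the real file.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (assumed to be comma-separated)
sample = io.StringIO(
    "Timestamp,SP,DP\n"
    "20-03-2017 10:00:01,50,60.5\n"
    "20-03-2017 10:10:00,60,70\n"
    "20-03-2017 10:40:01,75,80\n"
)

# usecols keeps only the named columns; nrows stops after two data rows
df = pd.read_csv(sample, usecols=["Timestamp", "SP"], nrows=2)
print(df)
```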

Related

better way to write a csv into a StringIO from another StringIO object

I have the following stringIO object:
s = io.StringIO("""idx Exam_Results Hours_Studied
0 93 8.232795
1 94 7.879095
2 92 6.972698
3 88 6.854017
4 91 6.043066
5 87 5.510013
6 89 5.509297""")
I want to transform it into CSV format and dump it into a new StringIO object. I'm currently using this strategy, but it seems a bit clumsy to me.
import io
import re
output = ""
for line in s.getvalue().split('\n'):
    output += re.sub(r'\s+', ',', line) + '\n'
output = io.StringIO(output)
print(output.getvalue())
Result:
idx,Exam_Results,Hours_Studied
0,93,8.232795
1,94,7.879095
2,92,6.972698
3,88,6.854017
4,91,6.043066
5,87,5.510013
6,89,5.509297
Is there a cleverer way to achieve this?
You can use the csv module:
import csv
from io import StringIO
s = StringIO(
"""idx Exam_Results Hours_Studied
0 93 8.232795
1 94 7.879095
2 92 6.972698
3 88 6.854017
4 91 6.043066
5 87 5.510013
6 89 5.509297"""
)
def convert(origin: str) -> StringIO:
    si = StringIO(newline="")
    spamwriter = csv.writer(
        si, delimiter=",", quotechar="|", quoting=csv.QUOTE_MINIMAL
    )
    for line in origin.splitlines():
        spamwriter.writerow(line.split())
    return si
def main():
    sio = convert(s.getvalue())
    print(sio.getvalue())
if __name__ == "__main__":
    main()
Alternatively, let csv.reader do the splitting:
from io import StringIO
import csv
text = StringIO("""idx Exam_Results Hours_Studied
0 93 8.232795
1 94 7.879095
2 92 6.972698
3 88 6.854017
4 91 6.043066
5 87 5.510013
6 89 5.509297""")
output = StringIO('')
writer = csv.writer(output, delimiter=',')
writer.writerows(csv.reader(text, delimiter=' ', skipinitialspace=True))
print(output.getvalue())
Output:
idx,Exam_Results,Hours_Studied
0,93,8.232795
1,94,7.879095
2,92,6.972698
3,88,6.854017
4,91,6.043066
5,87,5.510013
6,89,5.509297
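One detail worth knowing with the csv approach: csv.writer ends each row with "\r\n" by default, whereas the output in the question uses plain "\n". A minimal sketch of the same conversion with lineterminator="\n" follows; the function name whitespace_to_csv is made up for illustration.

```python
import csv
import io


def whitespace_to_csv(source: io.StringIO) -> io.StringIO:
    """Copy whitespace-separated rows from source into a new comma-separated buffer."""
    target = io.StringIO()
    # csv.writer defaults to "\r\n"; "\n" keeps the output like the question's
    writer = csv.writer(target, lineterminator="\n")
    # str.split() with no argument collapses any run of whitespace
    writer.writerows(line.split() for line in source if line.strip())
    target.seek(0)  # rewind so the caller can read from the start
    return target


s = io.StringIO("idx Exam_Results Hours_Studied\n0 93 8.232795\n1 94 7.879095")
out = whitespace_to_csv(s)
print(out.getvalue())
```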
You can try the pandas package:
import io
import pandas as pd
s = io.StringIO("""idx Exam_Results Hours_Studied
0 93 8.232795
1 94 7.879095
2 92 6.972698
3 88 6.854017
4 91 6.043066
5 87 5.510013
6 89 5.509297""")
out = io.StringIO()
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
pd.read_csv(s, sep=r"\s+").to_csv(out, index=False, sep=';')
print(out.getvalue())
Output:
idx;Exam_Results;Hours_Studied
0;93;8.232795
1;94;7.879095
2;92;6.972698
3;88;6.854017
4;91;6.043066
5;87;5.510013
6;89;5.509297

Open a Latex file with Pandas?

I am trying to replicate using Python the content of the "Tidy Data" paper available here.
However, the datasets are available on github as .tex files, and I can't seem to be able to open them with pandas.
To the extent of my searches so far, it seems that pandas can export to latex, but not import from it...
1) Am I correct ?
2) If so, how would you advise me to open those files ?
Thank you for your time !
Using this as an example:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is gone from newer pandas releases
with open('test.tex') as input_file:
    text = ""
    for line in input_file:
        if '&' in line:
            text += line.replace('\\', '') + '\n'
data = StringIO(text)
df = pd.read_csv(data, sep="&")
data.close()
Returns :
year artist track time date.entered wk1 wk2 wk3
0 2000 2 Pac Baby Don't Cry 4:22 2000-02-26 87 82 72
1 2000 2Ge+her The Hardest Part Of ... 3:15 2000-09-02 91 87 92
2 2000 3 Doors Down Kryptonite 3:53 2000-04-08 81 70 68
3 2000 98verb|^|0 Give Me Just One Nig... 3:24 2000-08-19 51 39 34
4 2000 A*Teens Dancing Queen 3:44 2000-07-08 97 97 96
5 2000 Aaliyah I Don't Wanna 4:15 2000-01-29 84 62 51
6 2000 Aaliyah Try Again 4:03 2000-03-18 59 53 38
7 2000 Adams, Yolanda Open My Heart 5:30 2000-08-26 76 76 74
You can also write one script that transforms the file:
with open('test.tex') as input_file:
    with open('test.csv', 'w') as output_file:
        for line in input_file:
            if '&' in line:
                output_file.write(line.replace('\\', '') + '\n')
Then another script which uses pandas:
import pandas as pd
pd.read_csv('test.csv', sep="&")
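The row filtering above can also be done without an intermediate file by splitting each table row into cells directly. This is a rough sketch under simple assumptions: every data row contains "&", rows end with the LaTeX terminator \\, and no cell contains an escaped \&. The name parse_tabular_rows and the sample lines are illustrative, not taken from the actual dataset files.

```python
def parse_tabular_rows(lines):
    """Split LaTeX table rows into lists of stripped cell strings."""
    rows = []
    for line in lines:
        if "&" not in line:
            continue  # skip \begin{tabular}, \hline, blank lines and so on
        body = line.strip()
        if body.endswith("\\\\"):
            body = body[:-2]  # drop the trailing row terminator
        rows.append([cell.strip() for cell in body.split("&")])
    return rows


# Illustrative sample; a real file would be read with open('test.tex')
tex = [
    r"\begin{tabular}{rll}",
    r"\hline",
    r"year & artist & track \\",
    r"\hline",
    r"2000 & 2 Pac & Baby Don't Cry \\",
    r"2000 & Aaliyah & Try Again \\",
    r"\end{tabular}",
]
header, *records = parse_tabular_rows(tex)
print(header)
print(records)
```

The result can then be handed to pandas with pd.DataFrame(records, columns=header); all values arrive as strings, so numeric columns still need converting.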
1) To my knowledge, you can open any standard type of file with Python.
2) You could try:
with open('test.tex', 'r') as text_file:
    pass  # do something with text_file here ('w' would erase the file)

How to get a specific field for parsing log files using pandas regular expressions [duplicate]

I have pandas DataFrame like this
X Y Z Value
0 18 55 1 70
1 18 55 2 67
2 18 57 2 75
3 18 58 1 35
4 19 54 2 70
I want to write this data to a text file that looks like this:
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
I have tried something like
f = open(writePath, 'a')
f.writelines(['\n', str(data['X']), ' ', str(data['Y']), ' ', str(data['Z']), ' ', str(data['Value'])])
f.close()
This does not produce the right output. How can I do this?
You can just use np.savetxt and pass it the DataFrame's .values attribute:
np.savetxt(r'c:\data\np.txt', df.values, fmt='%d')
yields:
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
or to_csv:
df.to_csv(r'c:\data\pandas.txt', header=None, index=None, sep=' ', mode='a')
Note that to append with np.savetxt you have to pass it a file handle that was opened in append mode.
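Appending with np.savetxt could look roughly like this. The sketch writes two chunks to the same file through handles opened with mode "a"; the temporary directory and the sample array are placeholders chosen only so the example runs anywhere.

```python
import os
import tempfile

import numpy as np

values = np.array([[18, 55, 1, 70], [18, 55, 2, 67]])
# Hypothetical scratch location, used here instead of c:\data\np.txt
path = os.path.join(tempfile.mkdtemp(), "np.txt")

# Write the rows in two separate calls; mode "a" adds to the end each time
for chunk in (values[:1], values[1:]):
    with open(path, "a") as f:
        np.savetxt(f, chunk, fmt="%d")

with open(path) as f:
    content = f.read()
print(content)
```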
The native way to do this is to use df.to_string():
with open(writePath, 'a') as f:
    dfAsString = df.to_string(header=False, index=False)
    f.write(dfAsString)
This writes the following:
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
This method also lets you easily choose which columns to print with the columns parameter, lets you keep the column and index labels if you wish, and has other parameters for spacing, etc.
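As a small illustration of the columns parameter of to_string(), the sketch below prints only two of the four columns. The DataFrame is a shortened copy of the question's data, rebuilt here so the example is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": [18, 18, 19], "Y": [55, 57, 54], "Z": [1, 2, 2], "Value": [70, 75, 70]}
)

# columns= selects which columns are rendered; header/index=False drop the labels
text = df.to_string(columns=["X", "Value"], header=False, index=False)
print(text)
```

The exact padding between values depends on the pandas version, so treat the spacing as presentation rather than a fixed delimiter.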
You can use pandas.DataFrame.to_csv() with both index and header set to False:
In [97]: print(df.to_csv(sep=' ', index=False, header=False))
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
pandas.DataFrame.to_csv can also write to a file directly; for more information, refer to the pandas documentation.
Late to the party, but try this:
import os
base_filename = 'Values.txt'
with open(os.path.join(WorkingFolder, base_filename), 'w') as outfile:
    df.to_string(outfile)
# Neatly writes all columns and rows to a .txt file
To get tab-delimited output, use the separator sep='\t'.
For df.to_csv:
df.to_csv(r'c:\data\pandas.txt', header=None, index=None, sep='\t', mode='a')
For np.savetxt:
np.savetxt(r'c:\data\np.txt', df.values, fmt='%d', delimiter='\t')
Here is a way to get Excel data into a tab-delimited text file.
It uses pandas as well as xlrd (note that xlrd 2.0 and later reads only .xls files, so the xlrd step needs an older xlrd).
import pandas as pd
import xlrd
import os
Path = r"C:\downloads"  # raw string so the backslash is not treated as an escape
wb = pd.ExcelFile(Path + "\\input.xlsx", engine=None)
sheet2 = pd.read_excel(wb, sheet_name="Sheet1")
Excel_Filter = sheet2[sheet2['Name'] == 'Test']
Excel_Filter.to_excel(Path + "\\output.xlsx", index=None)
wb2 = xlrd.open_workbook(Path + "\\output.xlsx")
df = wb2.sheet_by_name("Sheet1")
x = df.nrows
y = df.ncols
# open the text file once instead of reopening it for every cell
with open(Path + "\\emails.txt", "a") as f:
    for i in range(0, x):
        for j in range(0, y):
            A = str(df.cell_value(i, j))
            f.write(A + "\t")
        f.write("\n")
os.remove(Path + "\\output.xlsx")
print(Excel_Filter)
We first generate an .xlsx file with the filtered data and then convert its contents into a text file.
Depending on requirements, we can adjust the \n and \t separators, the loops, and the type of data written to the text file.
I used a slightly modified version:
with open(file_name, 'w', encoding='utf-8') as f:
    for rec_index, rec in df.iterrows():
        f.write(rec['<field>'] + '\n')
I had to write the contents of a dataframe field (that was delimited) as a text file.
If you have a DataFrame that is the output of the pandas compare method, it looks like this when printed:
grossRevenue netRevenue defaultCost
self other self other self other
2098 150.0 160.0 NaN NaN NaN NaN
2110 1400.0 400.0 NaN NaN NaN NaN
2127 NaN NaN NaN NaN 0.0 909.0
2137 NaN NaN 0.000000 8.900000e+01 NaN NaN
2150 NaN NaN 0.000000 8.888889e+07 NaN NaN
2162 NaN NaN 1815.000039 1.815000e+03 NaN NaN
I wanted to persist the whole DataFrame to a text file exactly as it appears above. pandas's to_csv and numpy's savetxt do not achieve this, so I used plain old print to write it to a text file:
with open('file1.txt', mode='w') as file_object:
    print(data_frame, file=file_object)

Save groupedby items to different excel sheet

I have an Excel file that I want to group by the column 'Step No.' and get the corresponding values. Here is the code I wrote:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fpath=('/Users/Anil/Desktop/Test data.xlsx')
df=pd.read_excel(fpath)
data=df.loc[:,['Step No.','Parameter','Values']]
grp_data=pd.DataFrame(data.groupby(['Step No.','Values']).size().reset_index())
grp_data.to_excel('/Users/Anil/Desktop/Test1 data.xlsx')
The data gets grouped just as I want it to.
Step No. Values
1 62
1 62.5
1 63
1 66.5
1 68
1 70
1 72
1 76.5
1 77
2 66.5
2 67
2 69
3 75.5
3 77
But, I want data corresponding to each Step No. in a different excel sheet, i.e all values corresponding to Step No.1 in one sheet, Step No. 2 in another sheet and so on. I think I should use some sort of iteration, but don't know what kind exactly.
This should do it:
from pandas import ExcelWriter
steps = df['Step No.'].unique()
dfs = [df.loc[df['Step No.'] == step] for step in steps]
def save_xls(list_dfs, xls_path):
    # the context manager saves the file on exit (writer.save() was removed in pandas 2.0)
    with ExcelWriter(xls_path) as writer:
        for n, df in enumerate(list_dfs):
            df.to_excel(writer, sheet_name='sheet%s' % n)
save_xls(dfs, 'YourFile.xlsx')
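The "some sort of iteration" the question asks about can also come straight from groupby, which yields one (key, sub-frame) pair per step. The following is a sketch only: the helper names frames_by_step and write_sheets and the sheet naming scheme are assumptions for illustration, and actually writing the workbook needs an Excel engine such as openpyxl installed.

```python
import pandas as pd


def frames_by_step(df, key="Step No."):
    """Map a sheet name to the rows belonging to each step."""
    # iterating over a groupby yields (group key, sub-DataFrame) pairs
    return {f"Step {step}": group for step, group in df.groupby(key)}


def write_sheets(frames, xls_path):
    """Write each frame to its own sheet of one workbook."""
    with pd.ExcelWriter(xls_path) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Shortened stand-in for the grouped data in the question
df = pd.DataFrame({"Step No.": [1, 1, 2, 3], "Values": [62, 62.5, 66.5, 77]})
frames = frames_by_step(df)
print(list(frames))
# write_sheets(frames, "YourFile.xlsx")  # uncomment to create the workbook
```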

How to make table with multi-tier row header (index) using Pandas

I have the following data:
# colh1 rh1 rh2 rh3/up rh4/down
AddaVax ID LV 29 18
AddaVax ID SP 16 13
AddaVax ID LN 61 73
ADX ID LV 11 14
ADX IP LV 160 88
ADX ID SP 14 13
ADX IP SP 346 129
ADX ID LN 25 25
What I'd like to do is to make a table that looks like this
(later to be written in text or Excel file):
The actual data contain more than 2 columns but the number of rows
is always fixed (i.e. 10 rows).
I'm stuck with the following code:
import csv
import pandas as pd
from collections import defaultdict
dod = defaultdict(dict)
with open("mediate.txt", 'r') as tsvfile:
    tabreader = csv.reader(tsvfile, delimiter=' ')
    for row in tabreader:
        if "#" in row[0]: continue
        colh1, rh1, rh2, rhup, rhdown = row
        dod["colh1"] = colh1
        dod["rh1"] = rh1
        dod["rh2"] = rh2
        dod["rhup"] = rhup
        dod["rhdown"] = rhdown
What's the way to do it?
Just using Pandas:
import pandas as pd
df = pd.read_csv('mediate.txt', sep='\t')  # or sep=',' if comma delimited.
df.rename(columns={'rh3/up': 'Up', 'rh4/down': 'Down'}, inplace=True)
result = df.pivot_table(values=['Up', 'Down'],
                        columns='colh1',
                        index=['rh1', 'rh2']).stack(0)  # Stack Up/Down
>>> result
colh1          ADX  AddaVax
rh1 rh2
ID  LN  Up      25       61
        Down    25       73
    LV  Up      11       29
        Down    14       18
    SP  Up      14       16
        Down    13       13
IP  LV  Up     160      NaN
        Down    88      NaN
    SP  Up     346      NaN
        Down   129      NaN
