Parsing a pipe-delimited file in Python - python

I'm trying to parse a pipe-delimited file and pass the values into a list, so that later I can print selective values from the list.
The file looks like:
name|age|address|phone|||||||||||..etc
It has more than 100 columns.

Use the 'csv' library.
First, register your dialect:
import csv
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
Then, use your dialect on the file:
with open(myfile, newline='') as csvfile:
    for row in csv.DictReader(csvfile, dialect='piper'):
        print(row['name'])

Use Pandas:
import pandas as pd
df = pd.read_csv(filename, sep="|")
This loads the file into a DataFrame. You can then apply conditions to each column to select the values you want to print. It runs quickly: I tried it on a file with 111,047 rows.

If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:
with open('file', 'r') as fileHandle:
    for line in fileHandle:
        fields = line.rstrip('\n').split('|')
        print(fields[0])  # prints the first field's value
        print(fields[1])  # prints the second field's value
A more robust way to parse tabular data is to use the csv library, as mentioned in Spencer Rathbun's answer.
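To illustrate why split is only safe for very simple files, here is a minimal sketch comparing the two approaches on a quoted field that contains the delimiter. The sample text and the helper names parse_with_split and parse_with_csv are made up for this example.

```python
import csv
import io

# Hypothetical sample: the second field is quoted and contains a literal '|'.
sample = 'name|note|age\nAnn|"likes a|b tests"|31\n'

def parse_with_split(text):
    """Naive parsing: split every line on '|'."""
    return [line.split('|') for line in text.splitlines()]

def parse_with_csv(text):
    """csv-based parsing: quoted fields may contain the delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter='|'))

split_rows = parse_with_split(sample)
csv_rows = parse_with_csv(sample)

print(split_rows[1])  # the quoted field is broken into two pieces
print(csv_rows[1])    # the quoted field stays in one piece
```

With split the data row comes back with four fields instead of three, which silently shifts every later column.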

In 2022, with Python 3.8 or above, you can simply do:
import csv
with open(file_path, "r") as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        print(row[0], row[1])
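Since the question asks for the values in a list so that selected ones can be printed later, a possible sketch is to look up the wanted column names in the header row and keep only those positions. The sample data and the helper select_columns are assumptions for illustration; the real file has more than 100 columns.

```python
import csv
import io

# Hypothetical stand-in for the real file; the actual one has 100+ columns.
sample = (
    "name|age|address|phone\n"
    "Ann|31|1 Main St|555-0100\n"
    "Bob|45|2 Oak Ave|555-0101\n"
)

def select_columns(lines, wanted, delimiter='|'):
    """Return a list of rows containing only the `wanted` columns, in that order."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)  # first row holds the column names
    positions = [header.index(name) for name in wanted]
    return [[row[i] for i in positions] for row in reader]

rows = select_columns(io.StringIO(sample), ["name", "phone"])
for row in rows:
    print(row)
```

For a real file, pass the open file object instead of the io.StringIO wrapper.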

Related

How to append a column from a CSV file to another CSV file without using pandas?

I want to append a column from 'b.csv' to 'a.csv', but my code only adds a single character instead of the whole string. I searched on Google but found no answer. I want to put the column under the "number" header. This is my code:
from csv import reader, writer
f = open('b.csv')
default_text = f.read()
with open('a.csv', 'r') as read_obj, \
        open('output_1.csv', 'w', newline='') as write_obj:
    csv_reader = reader(read_obj)
    csv_writer = writer(write_obj)
    for row in csv_reader:
        row.append(default_text[8])
        csv_writer.writerow(row)
This is the info in 'a.csv'
name,age,course,school,number
Leo,18,BSIT,STI
Rommel,23,BSIT,STI
Gaby,33,BSIT,STI
Ranel,31,BSIT,STI
This is the info in 'b.csv'
1212121
1094534
1345684
1093245
You can concatenate the rows read from both CSV files and pass them straight to the writer:
import csv
from operator import concat
with open(r'a.csv') as f1, \
        open(r'b.csv') as f2, \
        open(r'output_1.csv', 'w', newline='') as out:
    f1_reader = csv.reader(f1)
    f2_reader = csv.reader(f2)
    writer = csv.writer(out)
    writer.writerow(next(f1_reader))  # write column names
    writer.writerows(map(concat, f1_reader, f2_reader))
We initialize csv.reader() for both CSV files and csv.writer() for the output. Because the first file (a.csv) contains the column names, we read that row with next() and pass it to .writerow() to write it to the output unmodified. Then map() iterates over both readers simultaneously, applying operator.concat(), which concatenates the rows returned by the two readers. We can pass the result directly to .writerows() and let it consume the iterator returned by map().
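The same idea can be sketched with an explicit zip() loop, which some readers may find easier to follow than map() with operator.concat(). The in-memory strings below are shortened copies of the files from the question, and append_column is a hypothetical helper name.

```python
import csv
import io

# In-memory copies of the two files from the question (shortened).
a_text = "name,age,course,school,number\nLeo,18,BSIT,STI\nRommel,23,BSIT,STI\n"
b_text = "1212121\n1094534\n"

def append_column(a_file, b_file, out_file):
    """Write each row of a_file with the matching row of b_file appended."""
    a_reader = csv.reader(a_file)
    b_reader = csv.reader(b_file)
    writer = csv.writer(out_file)
    writer.writerow(next(a_reader))  # header row is copied unchanged
    for a_row, b_row in zip(a_reader, b_reader):
        writer.writerow(a_row + b_row)  # list concatenation, like operator.concat

out = io.StringIO()
append_column(io.StringIO(a_text), io.StringIO(b_text), out)
print(out.getvalue())
```

Note that zip() stops at the shorter input, so extra rows in either file are dropped without warning.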
If pandas is the only library you cannot use, the Table helper from the convtools library (GitHub) is convenient:
from convtools.contrib.tables import Table
from convtools import conversion as c
(
    Table.from_csv("tmp/1.csv", header=True)
    # this step wouldn't be needed if your first file didn't already have
    # a "number" header with no data under it
    .drop("number")
    .zip(Table.from_csv("tmp/2.csv", header=["number"]))
    .into_csv("tmp/results.csv")
)

Accessing Data in csv.reader

I'm trying to access a CSV file of currency pairs using csv.reader. The first column shows dates and the first row shows the currency pair, e.g. USD/CAD. I can read in the file but cannot access the currency pair data to perform simple calculations.
I've tried using next(x) to skip the header row (currency pairs). If I do this, I get a TypeError: csv reader is not subscriptable.
path = x
file = open(path)
dataset = csv.reader(file, delimiter='\t')
header = next(dataset)
header
The output shows the header row, which is
['Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR']
I expect to be able to access the underlying currency pairs, but I'm getting the TypeError noted above. Is there a simple way to access the currency pairs? For example, I want to use USD.describe() to get simple statistics on the USD currency pair.
How can I move from this stage to accessing the data underlying the header row?
Try this example:
import csv
with open('file.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')
    for row in csv_reader:
        print(f'\t{row[0]} {row[1]} {row[3]}')
It's apparent from the output of your header row that the columns are comma-delimited rather than tab-delimited, so instead of passing delimiter = '\t' to csv.reader, you should let it use the default delimiter ',' instead:
dataset = csv.reader(file)
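A quick way to see the effect of the wrong delimiter is to parse the same header line both ways. This is a small sketch on a made-up excerpt; only the comma-separated layout comes from the question.

```python
import csv
import io

# Hypothetical two-line excerpt; only the header layout comes from the question.
sample = "Date,USD,CNY\n2019-05-01,0.70,4.72\n"

wrong = next(csv.reader(io.StringIO(sample), delimiter='\t'))
right = next(csv.reader(io.StringIO(sample)))  # default delimiter is ','

print(wrong)  # one element holding the whole line
print(right)  # one element per column
```

A header that comes back as a single-element list, as in the question, is a strong hint that the delimiter does not match the file.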
If you need to compute statistics, pandas is your friend. There is no need for the csv module; use pandas.read_csv.
import pandas
filename = 'path/of/file.csv'
dataset = pandas.read_csv(filename, sep = '\t') #or whatever the separator is
pandas.read_csv uses the first line as the header automatically.
To see statistics, simply do:
dataset.describe()
Or for a single column:
dataset['column_name'].describe()
Are you sure that your delimiter is '\t'? In the first row your delimiter is ','... Anyway, you can skip the first row by calling file.readline() before passing the file to csv.reader:
import csv
example = """Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR
1-2-3\tabc\t1.1\t1.2
4-5-6\txyz\t2.1\t2.2
"""
with open('demo.csv', 'w') as f:
    f.write(example)
with open('demo.csv') as f:
    f.readline()
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)
        # ['1-2-3', 'abc', '1.1', '1.2']
        # ['4-5-6', 'xyz', '2.1', '2.2']
I think that you need something else... Can you add to your question:
an example of the first 3 lines in your csv
an example of what you'd like to access:
is using row[0], row[1] enough for you?
or do you want "named" access like row['Date'], row['USD'],
or do you want something more complex like data_by_date['2019-05-01']['USD']?
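For reference, the last two access styles listed above could be sketched with csv.DictReader as follows. The sample values are invented; only the column names follow the header shown in the question.

```python
import csv
import io

# Hypothetical sample values; the column names follow the header in the question.
sample = "Date,USD,CNY\n2019-05-01,0.70,4.72\n2019-05-02,0.71,4.75\n"

rows = list(csv.DictReader(io.StringIO(sample)))  # header row becomes the dict keys

# Named access per row: row['Date'], row['USD'] (values are strings until converted)
usd_values = [float(row['USD']) for row in rows]

# Lookup keyed by date: data_by_date['2019-05-01']['USD']
data_by_date = {row['Date']: row for row in rows}

print(usd_values)
print(data_by_date['2019-05-01']['USD'])
```

The csv module returns every field as a string, so numeric columns need an explicit float() before any calculation.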

Retrieve columns from a CSV file and write the values to another CSV file

I have a CSV file as follows:
lat,lon,date,data1,data2
1,2,3,4,5
6,7,8,9,10
From this CSV file I want to extract the columns date and data1 and write them to another CSV file. I have the following code:
import csv
import os
os.chdir(mydir)
column_names = ["date", "data1"]
index=[]
with open("my.csv", "r") as f:
    mycsv = csv.DictReader(f)
    for row in mycsv:
        for col in column_names:
            try:
                data=print(row[col])
                with open("test2.txt", "w") as f:
                    print(data, file=f)
            except KeyError:
                pass
Unfortunately, the output is a file containing only "None"... Does anyone know how to retrieve the data I want and write it to another file?
There are a few issues with your code:
Every time you call open("test2.txt", "w"), the "w" mode truncates the file and deletes all its contents.
You are storing the return value of print, which is None, and then printing that into your file.
Read your CSV into a list of dicts, as below:
import csv
with open('your_csv.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    read_l = [{key: value for key, value in row.items() if key in ('date', 'data1')}
              for row in reader]
and then use DictWriter to write to a new CSV:
with open('new.csv', 'w', newline='') as csvfile:
    fieldnames = list(read_l[0].keys())
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in read_l:  # DictReader already consumed the header, so write every row
        writer.writerow(row)
The steps below may help. They require the pandas library, so install it first. input.csv contains the data you mentioned.
import pandas as pd
df=pd.read_csv('input.csv')
df_new=df.iloc[0:,2:4]
df_new.to_csv("output.csv",index=False)
The reason why you see None in your file is because you're assigning the result of print(row[col]) to your data variable:
data=print(row[col])
print() doesn't return anything, therefore the content of data is None. If you remove the print() and just have data = row[col], you will get something useful.
There is one more issue that I see in your code, which you probably want to fix:
You're opening the file over and over again with each iteration of the inner loop. Therefore, each time you overwrite the entire file with that row's value. If you want the entire column, you have to open the file once, before the loop.
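Putting both fixes together, a minimal sketch might look like the following: the output is opened once and the values are written directly instead of going through print(). The helper name copy_columns and the in-memory files are assumptions for illustration; with real files you would pass the objects returned by open(..., newline='').

```python
import csv
import io

def copy_columns(in_file, out_file, column_names):
    """Copy only `column_names` from in_file to out_file, header included."""
    reader = csv.DictReader(in_file)
    # extrasaction='ignore' makes DictWriter skip the columns not listed.
    writer = csv.DictWriter(out_file, fieldnames=column_names, extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# In-memory stand-ins for my.csv and the output file.
source = io.StringIO("lat,lon,date,data1,data2\n1,2,3,4,5\n6,7,8,9,10\n")
target = io.StringIO()
copy_columns(source, target, ["date", "data1"])
print(target.getvalue())
```

Because the writer is created once outside the loop, every row is appended to the same output rather than replacing the previous one.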
I would recommend using pandas. I haven't run this script, but something like this should work.
import pandas as pd
import csv
frame = pd.read_csv('my.csv')
df = frame[['date', 'data1']]
with open('test2.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(df.columns)           # header row
    writer.writerows(df.values.tolist())  # data rows
import pandas as pd
df = pd.read_csv("my.csv")  # the first line is used as the header by default
new_df = df[["date", "data1"]]
new_df.to_csv("new_csv_name.csv")  # also writes the index column
# if you don't need the index
new_df.to_csv('new_csv_name.csv', index=False)

Convert from CSV to array in Python

I have a CSV file containing the following.
0.000264,0.000352,0.000087,0.000549
0.00016,0.000223,0.000011,0.000142
0.008853,0.006519,0.002043,0.009819
0.002076,0.001686,0.000959,0.003107
0.000599,0.000133,0.000113,0.000466
0.002264,0.001927,0.00079,0.003815
0.002761,0.00288,0.001261,0.006851
0.000723,0.000617,0.000794,0.002189
I want to convert the values into an array in Python and keep the same order (rows and columns). How can I achieve this?
I have tried different functions but ended up with errors.
You should use the csv module:
import csv
results = []
with open("input.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
This gives:
[[0.000264, 0.000352, 8.7e-05, 0.000549],
[0.00016, 0.000223, 1.1e-05, 0.000142],
[0.008853, 0.006519, 0.002043, 0.009819],
[0.002076, 0.001686, 0.000959, 0.003107],
[0.000599, 0.000133, 0.000113, 0.000466],
[0.002264, 0.001927, 0.00079, 0.003815],
[0.002761, 0.00288, 0.001261, 0.006851],
[0.000723, 0.000617, 0.000794, 0.002189]]
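If your file contains only plain numbers (no parentheses or quoting), you can split each line yourself; the nested comprehension keeps one inner list per row:
with open('input.csv') as f:
    output = [[float(s) for s in line.strip().split(',')] for line in f if line.strip()]
print(output)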
The csv module was created to do just this. The following usage is adapted from the Python docs.
import csv
rows = []
with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in reader:
        rows.append(row)  # add data to a list or other data structure
The delimiter is the character that separates data entries, and the quotechar is the character used to quote fields that contain special characters such as the delimiter.
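To make the roles of delimiter and quotechar concrete, here is a small sketch that parses one made-up line twice: once with '|' as the quote character and once with quoting disabled.

```python
import csv
import io

# Hypothetical row: the middle field contains a comma, so it is wrapped in '|'.
sample = "a,|b,c|,d\n"

with_quotechar = next(csv.reader(io.StringIO(sample), delimiter=',', quotechar='|'))
without_quoting = next(csv.reader(io.StringIO(sample), delimiter=',', quoting=csv.QUOTE_NONE))

print(with_quotechar)   # the comma inside |...| is kept as data
print(without_quoting)  # every comma splits the line
```

For the purely numeric file in the question neither setting matters, since no field needs quoting.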

How to read a column without header from csv and save the output in a txt file using Python?

I have a file "TAB.csv" with many columns. I would like to pick one column (its index is 3) from the CSV file, without its header. Then I want to create a new text file "NEW.txt" and write that column to it (without the header).
The code below reads that column, but with the header. How do I omit the header and save that column in a new text file?
import csv
with open('TAB.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row[3])
This is the solution @tmrlvi was talking about: it skips the first row (header) via the next function:
import csv
with open('TAB.csv', newline='') as input_file, open('output.csv', 'w') as output_file:
    reader = csv.reader(input_file)
    next(reader, None)  # skip the header row
    for row in reader:
        output_file.write(row[3] + '\n')
Try this:
import csv
with open('TAB.csv', newline='') as f, open('out.txt', 'w') as g:
    reader = csv.reader(f)
    next(reader)  # skip header
    g.writelines(row[3] + '\n' for row in reader)
enumerate is a nice function that yields (index, item) tuples. It lets you see the index while running over an iterator.
import csv
with open('NEW.txt', 'w') as outfile:
    with open('TAB.csv', newline='') as f:
        reader = csv.reader(f)
        for index, row in enumerate(reader):
            if index > 0:  # index 0 is the header row
                outfile.write(row[3])
                outfile.write("\n")
Another solution would be to read one line from the file (in order to skip the header).
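That alternative could look roughly like this sketch, which consumes the header with readline() before handing the file to csv.reader. The helper write_column and the sample data are made up, and the approach assumes the header fits on a single physical line (no quoted newlines).

```python
import csv
import io

def write_column(in_file, out_file, index=3):
    """Write column `index` of in_file to out_file, one value per line, skipping the header."""
    in_file.readline()  # consume the header line before csv sees it
    for row in csv.reader(in_file):
        out_file.write(row[index] + "\n")

# In-memory stand-ins for TAB.csv and NEW.txt (sample values are made up).
tab = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
new = io.StringIO()
write_column(tab, new)
print(new.getvalue())
```

With real files, open TAB.csv with newline='' and NEW.txt with mode 'w', then pass both file objects to the helper.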
It's an old question, but I would like to add an answer using the pandas library. For tasks like this it's better to use pandas than to write your own parsing code. The pandas version is simply:
import pandas as pd
df = pd.read_csv('TAB.csv')  # the first row is treated as the header
df.iloc[:, 3].to_csv('NEW.txt', index=False, header=False)  # write column 3 only
