I am working on implementation of a data mining algorithm in python. I have a large csv file which I am using as the input file to get the itemsets. I want to split the csv file into rows through program. Can someone tell how to make it possible?
import pandas as pd
pd.read_csv(file_name,sep='rows separator')
see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html for details.
I assume the rows are delimited by new-lines and that columns are delimited by commas. In which case just python already knows how to read it line by line which in your case means row by row. Then each row can be split where there are commas.
item_sets=[] #Will put the data in here
with open(filename, "r") as file: # open the file
for data_row in file: #get data one row at a time
# split up the row into columns, stripping whitespace from each one
# and store it in item_sets
item_sets.append( [x.strip() for x in data_row.split(",")] )
import csv
with open('eggs.csv', 'rb') as csvfile:
spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
for row in spamreader:
print row
will printout all rows of a csv file as lists
I assume pandas impelmentation of read_csv is more efficient, but the csv module is built into python so if you don't want any dependencies, you can use it.
Related
I'm beginner in python, I'm trying to read a csv and to extract some of the result in another file:
import csv
with open('test.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
for row in readCSV:
print(row[0])
I get the error IndexError: list index out of range. It happens when I select a row which doesn't exist. However, my csv as 5 columns and I can't isolate any of them.
Use the python Pandas library for File reading.
Make sure the encoding format for the CSV File.
import pandas as pd
data = pd.read_csv("file_name.csv")
data.head() //it will print the first 5 rows
//for 1 row
data.head(1)
check this, and you'll get the answer for yoru question
I'm trying to access a csv file of currency pairs using csv.reader. The first column shows dates, the first row shows the currency pair eg.USD/CAD. I can read in the file but cannot access the currency pairs data to perform simple calculations.
I've tried using next(x) to skip header row (currency pairs). If i do this, i get a Typeerror: csv reader is not subscriptable.
path = x
file = open(path)
dataset = csv.reader(file, delimiter = '\t',)
header = next(dataset)
header
Output shows the header row which is
['Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR']
I expect to be able to access the underlying currency pairs but i'm getting the type error as noted above. Is there a simple way to access the currency pairs, for example I want to use USD.describe() to get simple statistics on the USD currency pair.
How can i move from this stage to accessing the data underlying the header row?
try this example
import csv
with open('file.csv') as csv_file:
csv_reader = csv.Reader(csv_file, delimiter='\t')
line_count = 0
for row in csv_reader:
print(f'\t{row[0]} {row[1]} {row[3]}')
It's apparent from the output of your header row that the columns are comma-delimited rather than tab-delimited, so instead of passing delimiter = '\t' to csv.reader, you should let it use the default delimiter ',' instead:
dataset = csv.reader(file)
If you need to elaborate some statistics pandas is your friend. No need to use the csv module, use pandas.read_csv.
import pandas
filename = 'path/of/file.csv'
dataset = pandas.read_csv(filename, sep = '\t') #or whatever the separator is
pandas.read_csv uses the first line as the header automatically.
To see statistics, simply do:
dataset.describe()
Or for a single column:
dataset['column_name'].describe()
Are you sure that your delimiter is '\t'? In first row your delimiter is ','... Anyway you can skip first row by doing file.readline() before using it by csv.reader:
import csv
example = """Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR
1-2-3\tabc\t1.1\t1.2
4-5-6\txyz\t2.1\t2.2
"""
with open('demo.csv', 'w') as f:
f.write(example)
with open('demo.csv') as f:
f.readline()
reader = csv.reader(f, delimiter='\t')
for row in reader:
print(row)
# ['1-2-3', 'abc', '1.1', '1.2']
# ['4-5-6', 'xyz', '2.1', '2.2']
I think that you need something else... Can you add to your question:
example of first 3 lines in your csv
Example of what you'd like to access:
is using row[0], row[1] enough for you?
or do you want "named" access like row['Date'], row['USD'],
or you want something more complex like data_by_date['2019-05-01']['USD']
I have a CSV file containing the following.
0.000264,0.000352,0.000087,0.000549
0.00016,0.000223,0.000011,0.000142
0.008853,0.006519,0.002043,0.009819
0.002076,0.001686,0.000959,0.003107
0.000599,0.000133,0.000113,0.000466
0.002264,0.001927,0.00079,0.003815
0.002761,0.00288,0.001261,0.006851
0.000723,0.000617,0.000794,0.002189
I want convert the values into an array in Python and keep the same order (row and column). How I can achieve this?
I have tried different functions but ended with error.
You should use the csv module:
import csv
results = []
with open("input.csv") as csvfile:
reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC) # change contents to floats
for row in reader: # each row is a list
results.append(row)
This gives:
[[0.000264, 0.000352, 8.7e-05, 0.000549],
[0.00016, 0.000223, 1.1e-05, 0.000142],
[0.008853, 0.006519, 0.002043, 0.009819],
[0.002076, 0.001686, 0.000959, 0.003107],
[0.000599, 0.000133, 0.000113, 0.000466],
[0.002264, 0.001927, 0.00079, 0.003815],
[0.002761, 0.00288, 0.001261, 0.006851],
[0.000723, 0.000617, 0.000794, 0.002189]]
If your file doesn't contain parentheses
with open('input.csv') as f:
output = [float(s) for line in f.readlines() for s in line[:-1].split(',')]
print(output);
The csv module was created to do just this. The following implementation of the module is taken straight from the Python docs.
import csv
with open('file.csv','rb') as csvfile:
reader = csv.reader(csvfile, delimiter=',', quotechar='|')
for row in reader:
#add data to list or other data structure
The delimiter is the character that separates data entries, and the quotechar is the quotechar.
I have a code to read csv file by row
import csv
with open('example.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
for row in readCSV:
print(row)
print(row[0])
But i want only selected columns what is the technique could anyone give me a script?
import csv
with open('example.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
column_one = [row[0] for row in readCSV ]
Will give you list of values from the first column. That being said - you'll have to read the entire file anyway.
You can't do that, because files are written byte-by-byte to your filesystem. To know where one line ends, you will have to read all the line to detect the presence of a line-break character. There's no way around this in a CSV.
So you'll have to read all the file -- but you can choose which parts of each row you want to keep.
I would definitely use pandas for that.
However, in plain python this one of the way to do it.
In this example I am extracting the content of row 3, column 4.
import csv
target_row = 3
target_col = 4
with open('yourfile.csv', 'rb') as csvfile:
reader = csv.reader(csvfile)
n = 0
for row in reader:
if row == target_row:
data = row.split()[target_col]
break
print data
read_csv in pandas module can load a subset of columns.
Assume you only want to load columns 1 and 3 in your .csv file.
import pandas as pd
usecols = [1, 3]
df = pd.read_csv('example.csv',usecols=usecols, sep=',')
Here is Doc for read_csv.
In addition, if your file is big, you can read the file piece by piece by specifying chucksize in read_csv
So I have a text file that looks like this:
1,989785345,"something 1",,234.34,254.123
2,234823423,"something 2",,224.4,254.123
3,732847233,"something 3",,266.2,254.123
4,876234234,"something 4",,34.4,254.123
...
I'm running this code right here:
file = open("file.txt", 'r')
readFile = file.readline()
lineID = readFile.split(",")
print lineID[1]
This lets me break up the content in my text file by "," but what I want to do is separate it into columns because I have a massive number of IDs and other things in each line. How would I go about splitting the text file into columns and call each individual row in the column one by one?
You have a CSV file, use the csv module to read it:
import csv
with open('file.txt', 'rb') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
This still gives you data by row, but with the zip() function you can transpose this to columns instead:
import csv
with open('file.txt', 'rb') as csvfile:
reader = csv.reader(csvfile)
for column in zip(*reader):
Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.