How to know CSV line count before loading in python?

I am new to Python and need to load dataframes from various CSV files. Some business logic depends on the number of rows in each CSV. Can I find out the total number of rows in a CSV beforehand, without calling read_csv?

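Yes, you can:
lines = sum(1 for line in open('/path/to/file.csv'))
but be aware that Pandas will then read the whole file again.
If you are sure that the whole file will fit into memory, you can read it once and reuse the contents:
import io
import pandas as pd
with open('/path/to/file.csv') as f:
    data = f.readlines()
lines = len(data)
# read_csv needs a path or a file-like object, not a list of lines
df = pd.read_csv(io.StringIO(''.join(data)), ...)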

You can read the file without saving the contents. Try:
with open(filename, "r") as filehandle:
    number_of_lines = len(filehandle.readlines())
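Both answers above count physical lines. That number can differ from the number of data rows when the file has a header or when a quoted field contains a line break. A minimal sketch that counts records with the standard csv module instead; the function name count_csv_rows and its parameters are made up for this illustration, and UTF-8 is only an assumed default encoding:

```python
import csv

def count_csv_rows(path, has_header=True, encoding="utf-8"):
    """Count CSV records (not physical lines) without building a DataFrame."""
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding=encoding) as f:
        total = sum(1 for _ in csv.reader(f))
    # Do not count the header row as data, if there is one.
    return max(total - 1, 0) if has_header else total
```

This still reads the file once from start to finish, but it holds only one record in memory at a time.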

Related

Reading CSV with newlines within row

I'm using Pandas to read a CSV file, but some of the rows continue on the next line, with the closing quote (") at the start of that next line. I'm trying to find a parameter for pd.read_csv that will fix this, but after reading the documentation I am still not sure whether there is one.
Ex:
"0","","0","0","0","0","0
","0","0","0","0","0","0"
In other words,
"0","","0","0","0","0","0\r\n","0","0","0","0","0","0"

import csv
with open(file, 'r') as myfile:  # the 'U' mode flag was removed in Python 3.11
    filtered = (line.replace('\n', '') for line in myfile)
    for row in csv.reader(filtered):
        print(row)  # process each row here
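To see why the csv module copes with this layout, here is a small sketch that feeds it the sample row from the question. The names raw_lines, rows and cleaned are only illustrative:

```python
import csv

# The sample row from the question, split across two physical lines.
raw_lines = [
    '"0","","0","0","0","0","0\n',
    '","0","0","0","0","0","0"\n',
]

# csv.reader accepts any iterable of strings and keeps reading until the
# quoted field is closed, so both lines come back as a single record.
rows = list(csv.reader(raw_lines))

# Drop the stray line break that ended up inside the seventh field.
cleaned = [
    [field.replace("\n", "").replace("\r", "") for field in row]
    for row in rows
]
```

Cleaning the fields after parsing, rather than the lines before parsing, leaves the record boundaries for the csv module to work out.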

Read CSV lines and save each as a separate txt file, named after the line - python

I have a problem with some simple code.
I have a CSV file with one column and hundreds of rows. I would like code that reads each line of the CSV and saves it as a separate txt file. Importantly, each txt file should be named after the line that was read.
Example:
1.Adam
2. Doroty
3. Pablo
will give me adam.txt, doroty.txt and pablo.txt. Please help.

This should do what you need on Python 3.6:
with open('file.csv') as f:  # Open file with hundreds of rows
    for name in f.read().split('\n'):  # Get list of all names
        if not name.strip():  # Skip blank lines (e.g. after the final newline)
            continue
        with open(f'{name.strip()}.txt', 'w') as s:  # Create file per name
            pass

Alternatively, you can use the built-in csv library to avoid any complications with parsing CSV files:
import csv
with open('names.csv') as csvfile:
    reader = csv.DictReader(csvfile)  # assumes a header row with a 'first_name' column
    for row in reader:
        file_name = '{0}.txt'.format(row['first_name'])
        with open(file_name, 'w') as f:
            pass
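Neither snippet above produces the lowercase names from the question (adam.txt rather than "1.Adam.txt"). If the numbering really is part of each line, which is an assumption about the data, a small helper along these lines could derive the file name first. The function name_to_filename is hypothetical:

```python
import re

def name_to_filename(entry):
    """Turn a list entry such as '2. Doroty' into 'doroty.txt'."""
    # Remove a leading "1." / "2. " style counter, then surrounding spaces.
    name = re.sub(r"^\s*\d+\.\s*", "", entry).strip()
    # Return None for blank entries so the caller can skip them.
    return f"{name.lower()}.txt" if name else None
```

The caller would then open the returned name for writing whenever it is not None.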

How to import data from a CSV file and store it in a variable?

I am extremely new to python 3 and I am learning as I go here. I figured someone could help me with a basic question: how to store text from a CSV file as a variable to be used later in the code. So the idea here would be to import a CSV file into the python interpreter:
import csv
with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        ...
and then extract the text from that file and store it as a variable (i.e. w = ["csv file text"]) to then be used later in the code to create permutations:
print (list(itertools.permutations(["w"], 2)))
If someone could please help and explain the process, it would be very much appreciated as I am really trying to learn. Please let me know if any more explanation is needed!

itertools.permutations() wants an iterable (e.g. a list) and a length as its arguments, so your data structure needs to reflect that, but you also need to define what you are trying to achieve here. For example, if you wanted to read a CSV file and produce permutations on every individual CSV field you could try this:
import csv
import itertools
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    w = []
    for row in reader:
        w.extend(row)
print(list(itertools.permutations(w, 2)))
The key thing here is to create a flat list that can be passed to itertools.permutations(). This is done by initialising w to an empty list and then extending it with the fields from each row of the CSV file.
Note: As pointed out by @martineau, for the reasons explained here, the file should be opened with newline='' when used with the Python 3 csv module.

If you want to use Python 3 (as you state in the question) and to process the CSV file using the standard csv module, you should be careful about how you open the file. The code in your question opens the CSV file the Python 2 way (binary mode, 'rb'). Things have changed in Python 3.
As shengy wrote, the CSV file is just a text file, and the csv module gets the elements as strings. Strings in Python 3 are Unicode strings. Because of that, you should open the file in text mode and supply the encoding. Because of the nature of CSV file processing, you should also pass newline='' when opening the file.
Now, extending the explanation of Burhan Khalid: when reading the CSV file, you get the rows as lists of strings. If you want to read the whole content of the CSV file into memory and store it in a variable, you probably want a list of rows (i.e. a list of lists, where the nested lists are the rows). The for loop iterates through the rows. In the same way, the list() function iterates through a sequence (here, the sequence of rows) and builds a list of the items. To combine that with the wish to store everything in the content variable, you can write:
import csv
with open('some.csv', newline='', encoding='utf_8') as f:
    reader = csv.reader(f)
    content = list(reader)
Now you can do your permutations as you wish. The itertools module is the right tool for that.
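Since content is a list of rows, it has to be flattened before permuting individual fields. A short sketch of that step, using io.StringIO with made-up sample data as a stand-in for the opened file:

```python
import csv
import io
import itertools

# Stand-in for open('some.csv', newline='', encoding='utf_8'); the data is made up.
sample = io.StringIO("a,b\nc\n", newline="")
content = list(csv.reader(sample))  # list of rows: [['a', 'b'], ['c']]

# Flatten the rows into one list of fields before permuting.
fields = list(itertools.chain.from_iterable(content))
pairs = list(itertools.permutations(fields, 2))
```

Permuting content directly would instead give ordered pairs of whole rows, which may or may not be what is wanted.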

import csv
with open('FileName.csv', newline='') as f:
    data = csv.DictReader(f)
    print(data.fieldnames)
    output = []
    for each_row in data:
        row = {}
        try:
            # drop 'null' values and strip whitespace from the column names
            p = {k.strip(): v for k, v in each_row.items() if v.lower() != 'null'}
        except AttributeError as e:
            print(e)
            print(each_row)
            raise
        # copy the columns you need (repeat for each column)
        if p.get('col1'):
            row['col1'] = p['col1']
        if p.get('col2'):
            row['col2'] = p['col2']
        output.append(row)
Finally, all the data is stored in the output variable.

Is this what you need?
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    rows = list(reader)
print('The csv file had {} rows'.format(len(rows)))
for row in rows:
    do_stuff(row)
do_stuff_to_all_rows(rows)
The interesting line is rows = list(reader), which collects every row from the CSV file (each row is itself a list) into the list rows, in effect giving you a list of lists.
If you had a CSV file with three rows, rows would be a list with three elements, each representing one row of the original CSV file.

If all you care about is reading the raw text in the file (CSV or not), then:
with open('some.csv') as f:
    w = f.read()
is a simple way to get w="csv, file, text\nwithout, caring, about columns\n"

You should try pandas (which, when this was written, supported both Python 2.7 and Python 3.2+):
import pandas as pd
df = pd.read_csv("your_file.csv")
Then you can handle your data easily.
More fun here

First, a CSV file is a text file too, so everything you can do with a text file, you can do with a CSV file. That means f.read(), f.readline() and f.readlines() can all be used. See detailed information about these functions here.
But, as your file is a CSV file, you can make use of the csv module.
# input.csv
# 1,david,enterprise
# 2,jeff,personal
import csv
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    for serial, name, version in reader:
        # The csv module already extracts the information for you
        print(serial, name, version)
More details about the csv module are here.

Reading a CSV in Python - columns not at start of file

I'm trying to adjust a script that previously took in a CSV file where the columns were at the start of a file, however now the CSV it reads has changed so that there is a load of spiel before the column headers are given.
Is there a way using DictReader (or even any other method) to skip down to where the columns are (line 15) and use these?
Currently I'm using the below code, but it will always take the first line in the file.
f = open(fileName)
reader = csv.DictReader(f)
lineU = 0
for underlyer in reader:
    lineU = lineU + 1
    if(lineU == 6):
        #start the code
Appreciate any help given.

Try reading the first 14 lines from f (everything before the header on line 15) before passing it to the DictReader.

The csv.reader iterates over the file object, so you can consume those lines with file.readline() before creating the reader; the reader then never sees them.
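A minimal sketch of that idea, assuming the header sits on a known line number. The helper name dict_reader_after_preamble and the sample data are made up for illustration:

```python
import csv
import io
import itertools

def dict_reader_after_preamble(f, header_line=15):
    """Return a DictReader whose field names come from line `header_line` (1-based)."""
    # Consume the lines before the header; islice stops early if the file is short.
    for _ in itertools.islice(f, header_line - 1):
        pass
    return csv.DictReader(f)

# Small made-up sample: two lines of preamble, header on line 3.
sample = io.StringIO("report\ngenerated today\nname,qty\nwidget,4\n")
rows = list(dict_reader_after_preamble(sample, header_line=3))
```

With a real file, the same call would take the object returned by open(fileName, newline='') and the default header_line of 15.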

Replace a word in a file

I am new to Python programming.
I have a .txt file. It looks like this:
0,Salary,14000
0,Bonus,5000
0,gift,6000
I want to replace the first value, '0', with '1' in each line. How can I do this? Can anyone help, with sample code?
Thanks in advance.
Nimmyliji

I know that you're asking about Python, but forgive me for suggesting that perhaps a different tool is better for the job. :) It's a one-liner via sed:
sed 's/^0,/1,/' yourtextfile.txt > output.txt
This applies the regex /^0,/ (which matches any 0, that occurs at the beginning of a line) to each line and replaces the matched text with 1, instead. The output is directed into the file output.txt specified.
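For comparison, the same regex can be applied from Python with re.sub. This is only a sketch of the equivalent call on an in-memory string holding the sample data from the question, not a replacement for the sed command:

```python
import re

text = "0,Salary,14000\n0,Bonus,5000\n0,gift,6000\n"

# re.MULTILINE makes ^ match at the start of every line, as sed does per line.
result = re.sub(r"^0,", "1,", text, flags=re.MULTILINE)
```

Without the re.MULTILINE flag, ^ would match only at the very start of the string, so only the first line would change.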

inFile = open("old.txt", "r")
outFile = open("new.txt", "w")
for line in inFile:
    outFile.write(",".join(["1"] + (line.split(","))[1:]))
inFile.close()
outFile.close()
If you would like something more general, take a look at Python's csv module. It contains utilities for processing comma-separated values (abbreviated as CSV) in files, but it can work with an arbitrary delimiter, not only a comma. Since your sample is obviously a CSV file, you can use it as follows:
import csv
with open("old.txt", newline="") as src, open("new.txt", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerows(["1"] + line[1:] for line in reader)
To overwrite original file with new one:
import os
os.remove("old.txt")
os.rename("new.txt", "old.txt")
I think that writing to a new file and then renaming it is more fault-tolerant and less likely to corrupt your data than overwriting the source file directly. Imagine that your program raised an exception after the source file had been read into memory and reopened for writing: you would lose the original data, and the new data would not be saved because of the crash. With my approach, a crash loses only the new data and preserves the original.
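The write-then-rename idea can be tightened a little with os.replace, which overwrites the target in a single call, so there is no window in which old.txt has been removed but new.txt not yet renamed. The following is a sketch under assumptions rather than the answer's own code: the function name rewrite_first_column is made up, and "\n" line endings are assumed for the output:

```python
import csv
import os
import tempfile

def rewrite_first_column(path, new_value="1"):
    """Rewrite `path` so the first field of every row becomes `new_value`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final swap
    # stays on one filesystem.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            # Blank lines come back as empty rows; pass them through unchanged.
            writer.writerow([new_value] + row[1:] if row else row)
        tmp_name = dst.name
    # Both files are closed here; os.replace overwrites the target in one step.
    os.replace(tmp_name, path)
```

If the loop raises, the original file is untouched, although the temporary file is left behind for inspection or cleanup.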

o = open("output.txt", "w")
for line in open("file"):
    s = line.split(",")
    s[0] = "1"
    o.write(','.join(s))
o.close()
Or you can use fileinput with in-place editing:
import fileinput
for line in fileinput.FileInput("file", inplace=True):
    s = line.split(",")
    s[0] = "1"
    print(','.join(s), end='')  # the line already ends with a newline

f = open(filepath,'r')
data = f.readlines()
f.close()
edited = []
for line in data:
    edited.append( '1'+line[1:] )
f = open(filepath,'w')
f.writelines(edited)
f.flush()
f.close()
Or in Python 2.5+:
with open(filepath,'r') as f:
    data = f.readlines()
with open(outfilepath, 'w') as f:
    for line in data:
        f.write( '1' + line[1:] )
This should do it. I wouldn't recommend it for a truly big file though ;-)
What is going on (ex. 1):
1: Open the file in read mode.
2,3: Read all the lines into a list (each line is a separate element) and close the file.
4,5,6: Iterate over the list, constructing a new list where each line has its first character replaced by a 1. The slice line[1:] takes the string from index 1 onward; we concatenate the 1 with that truncated string.
7-10: Reopen the file in write mode, write the list to the file (overwriting it), flush the buffer, and close the file handle.
In ex. 2:
I use the with statement, which closes the file handle automatically, but otherwise do essentially the same thing.
