Skip first row in a file

Skip first row in a file - python

I'm trying to open 2 files, read the contents of both files, and export the rows of both files to an output file.
The issue I'm having is that the 2nd file that I'm reading in has the same headers in it's first row as the 1st, so I'm trying to skip writing the 1st row of the 2nd file to my new outputFile, but the code written below doesn't execute that.
I know my code says if row[1]=='Full Name' and not column[1], but if I skip writting row[1] to the outputFile, it skips the first column and not the first row, so I figured I'd use that first column in my if statement. The first column's header in my input data is Full Name, so that's why I used that particular if statement.
I didn't include incoming data because I didn't think it was necesary to answer this question, but if you feel it would be helpful, I'm more than glad to post it up here.
If anyone can help me skip writing the first row of the second incoming into my outputFile it would be greatly appreciated.
import csv, sys
firstFile = open(sys.argv[1],'rU')# U to parse
reader1 = csv.reader(firstFile, delimiter=',')
outputFile = open((sys.argv[3]), 'w')
for row in reader1:
row=(str(row))
row=row.replace("'", "")
row=row.replace("[", "")
row=row.replace("]", "")
outputFile.write(row)
outputFile.write('\n')
secondFile = open(sys.argv[2],'rU')
reader2 = csv.reader(secondFile, delimiter=',')
for row in reader2:
row=(str(row))
row=row.replace("'", "")
row=row.replace("[", "")
row=row.replace("]", "")
if row[1]=='Full Name':
next(reader2)
else:
outputFile.write(row)
outputFile.write('\n')

You can use the next function:
secondFile = open(sys.argv[2],'rU')
# skip first line
next(secondFile)
# csv parser does not see the first line of the file
reader2 = csv.reader(secondFile, delimiter=',')

Related

Python If Statement and Lists

i'm fairly new to python and am looking for some help. What i would like to do is read a csv file and then use a for loop with an if statement to locate at rows of that data contain a value and print it out with a header and some formatting using f'.
The issue i seem to have it when finding the data using the if statement, im unsure what i can output the data to, which will then enable it to be printed out (the search output could contain multiple rows and columns):
with open(r'data.csv', 'r') as csv_file:
# loop through the csv file using for loop
for row in csv_file:
# search each row of data for the input from the user
if panel_number in row:
??

Use the csv module. Then in your if statement you can append the row to a list of matches
import csv
matched_rows = []
with open(r'data.csv', 'r') as file:
file.readline() # skip over header line -- remove this if there's no header
csv_file = csv.reader(file)
for row in csv_file:
# search each row of data for the input from the user
if row[0] == panel_number:
matched_rows.append(row)
print(matched_rows)

How to write on nth row in csv using Python?

I have a following problem. Lets say I want to write a word on the cell in column = 1 & and row = 3.
I have written this function:
import csv
def write_to_csv(myfile, word):
with open(myfile, "w", newline="") as csv_file:
csv_writer = csv.writer(csv_file)
write = [word]
csv_writer.writerow(elt for elt in write)
write_to_csv("output.csv", "hello")
My function writes the word "hello" into the cell in column = 1 & and row = 1.
Now imagine that my output.csv already has someting in the first cell. And I don`t want to rewrite it. So how can I modify my function to write the word "hello" on column = 1 & and row = 3 ?
I found this queston, but it did not help me: How to select every Nth row in CSV file using python
Thank you very much!

A CSV file is a text file. That means that you should not try to overwrite it in place. The common way is to copy it to a new file introducing your changes at that time. When done, you move the new file with the old name.
Here is a possible code. It just hope that the file output.csv.tmp does not exist, but can be created and that output.csv has at least 4 lines:
def write_to_csv(myfile, word, row_nb, col_nb):
"""Updates a csv file by writing word at row_nb row and col_nb column"""
with open(myfile) as csv_file, open(myfile+'.tmp', "w", newline='') as out:
csv_reader = csv.reader(csv_file)
csv_writer = csv.writer(out)
#skip row_nb rows
for i in range(row_nb):
csv_writer.writerow(next(csv_reader))
# read and change the expected row
row = next(csv_reader)
row[col_nb] = word
# print(row) # uncomment for debugging
csv_writer.writerow(row)
# copy last rows
for row in csv_reader:
csv_writer.writerow(row)
# rename the tmp file
os.remove(myfile)
os.rename(myfile+'.tmp', myfile)
# write hello at first column of fourth row in output.csv
write_to_csv('output.csv', 'hello', 3, 0)

Making Python ignore CSV separator instruction [duplicate]

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
incsv = csv.reader(inf)
column = 1
datatype = float
data = (datatype(column) for row in incsv)
least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.

You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
has_header = csv.Sniffer().has_header(file.read(1024))
file.seek(0) # Rewind.
reader = csv.reader(file)
if has_header:
next(reader) # Skip header row.
column = 1
datatype = float
data = (datatype(row[column]) for row in reader)
least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:

To skip the first line just call:
next(inf)
Files in Python are iterators over lines.

Borrowed from python cookbook,
A more concise template code might look like this:
import csv
with open('stocks.csv') as f:
f_csv = csv.reader(f)
headers = next(f_csv)
for row in f_csv:
# Process row ...

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
with open('all16.csv') as tmp:
# Skip first line (if any)
next(tmp, None)
# {line_num: row}
data = dict(enumerate(csv.DictReader(tmp)))

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
# process

use csv.DictReader instead of csv.Reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row["1"] etc

Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
print(row) # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
print(row) # should print second row

The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
if sniffer.has_header():
for header_row in reader:
break
for data_row in reader:
# do something with the row

this might be a very old question but with pandas we have a very easy solution
import pandas as pd
data=pd.read_csv('all16.csv',skiprows=1)
data['column'].min()
with skiprows=1 we can skip the first row then we can find the least value using data['column'].min()

The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()

Because this is related to something I was doing, I'll share here.
What if we're not sure if there's a header and you also don't feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
# Let's say there's 4 columns
with open('file.csv') as csvfile:
csvreader = csv.reader(csvfile)
# read first line
first_line = next(csvreader)
# My headers were just text. You can use any suitable conditional here
if len(first_line) == 4:
array.append(first_line)
# Now we'll just iterate over everything else as usual:
for row in csvreader:
array.append(row)

Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])

For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
reader = csv.reader(I)
fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
# Print each row using "item" as the index value
print (fulllist[item])

I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
csvreader = csv.reader(csvfile)
data = list(csvreader) # Convert to list
data.pop(0) # Removes the first row
for row in data:
print(row)

I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py

just add [1:]
example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]**
that works for me in iPython

Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header, there is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
with open(csv_path, newline='', encoding='utf-8') as f:
# https://bugs.python.org/issue7185
# Remove UTF8 BOM.
txt = f.read()[1:]
# Remove header line.
header = txt.splitlines()[:1]
lines = txt.splitlines()[1:]
# Convert to list.
csv_rows = list(csv.reader(lines, delimiter=delimiter))
for row in csv_rows:
value = row[INDEX_HERE]

Simple Solution is to use csv.DictReader()
import csv
def read_csv(file): with open(file, 'r') as file:
reader = csv.DictReader(file)
for row in reader:
print(row["column_name"]) # Replace the name of column header.

Delete blank columns from header row

I'm pretty new to python and I'm having trouble deleting the header columns after the 25th column. There are 8 more extra columns that have no data so I'm trying to delete those columns. Columns 1-25 have like 50,000k of data and the rest of the columns are blank.How would I do this? My code for now is able to clean up the file but I cant delete the headers for row[0] AFTER COLUMN 25.
Thanks
import csv
my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
writer = csv.writer(outfile)
cr = csv.reader(infile, delimiter='|')
writer.writerow(next(cr)) #I think this is why is not working
for line in (r[0:25] for r in cr):
#del line [26:32]
if not any(remove_word in element for element in line for remove_word in remove_words):
line[11]= line[11][:5]
writer.writerow(line)

You've found the line with the problem - all you have to do is only print the headers you want. next(cr) reads the header line, but you pass the entire line to writer.writerow().
Instead of
writer.writerow(next(cr))
you want:
writer.writerow(next(cr)[:25])
([:25] and [0:25] are the same in Python)

Python: Adding a column and row of data to exisitng csv file

I am able to create a csv file with the necessary header and then append it with output. Now I would like to reopen the csv, create another header and then add data to the rows based on an if, else condition.
When I print the results out to console, I get the desired output (as seen below), but when I try to append the output to the csv file, I'm not seeing the same results.
Title: update 1
Added or Deleted Files: True
Title: update 2
Added or Deleted Files: False
Title: update 3
Added or Deleted Files: False
I believe it's how the if condition is executed when opening the csv file, but I can't seem to figure where I'm going wrong. The new column, Add or Deleted Files is created, but the values added to the rows beneath it don't match the output I get in the console, which is the correct output. The output under the column for Added or Deleted Files are all True and not True, False, False as shown in the console output. The Title column and titles of the pull request are all captured correctly as well as the new column in the new csv file, it's the values under Added or Deleted Files that are incorrect (as seen in the output below).
Title,Added or Deleted Files
update 1,True
update 2,True
update 3,True
The code contains print to console and output to csv. Thanks in advance for any help. It's the last with open statements that open the existing csv, creates a new one and then adds the column, but incorrect row data that's giving me trouble.
with open(filename, 'w+', newline='') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(['Title'])
for prs in repo.pull_requests():
getlabels = repo.issue(prs.number).as_dict()
if 'ready-to-merge' in [getlabels['name'] for getlabels in getlabels['labels']] and 'Validation Succeeded' in [getlabels['name'] for getlabels in getlabels['labels']]:
changes = repo.pull_request(prs.number).as_dict()
#print to console statement
print('Title: ', changes['title'])
#output to csv
with open(filename,'a+',newline='') as f:
csv_writer = csv.writer(f)
csv_writer.writerow([changes['title']])
#print to console
if 'added' in (data.status for data in repo.pull_request(prs.number).files()) or 'removed' in (data.status for data in repo.pull_request(prs.number).files()):
print('Added or Deleted Files: True')
else:
print('Added or Deleted Files: False')
#output to new csv with added column and new data
with open(filename, 'r') as csvinput:
with open(filename2, 'w') as csvoutput:
writer = csv.writer(csvoutput, lineterminator="\n")
reader = csv.reader(csvinput)
all = []
row = next(reader)
row.append('Added or Deleted Files')
all.append(row)
for row in reader:
all.append(row)
if 'added' in (data.status for data in repo.pull_request(prs.number).files()) or 'removed' in (data.status for data in repo.pull_request(prs.number).files()):
row.append('True')
else:
row.append('False')
writer.writerows(all)

The structure of your code is broken. Here is what happens:
open csv file, write header line, close
loop over request
add request to csv file (by open, append, close)
open second file and erase it (because of "w" mode)
for each line in csv
copy field from csv file
copy status of current request
So your result file was written in totality in last request iteration, and the value of 2nd column for that last request is consistenly copied to every line.
Your code should be:
with open(filename, 'w+', newline='') as f, open(filename2, 'w') as csvoutput:
csv_writer = csv.writer(f)
writer = csv.writer(csvoutput, lineterminator="\n")
row = ['Title']
csv_writer.writerow(row)
row.append('Added or Deleted Files')
writer.writerow(row)
for prs in repo.pull_requests():
...
row = [changes['title']]
csv_writer.writerow(row)
csv_writer.writerow([changes['title']])
...
if 'added' in (data.status for data in repo.pull_request(prs.number).files()) or 'removed' in (data.status for data in repo.pull_request(prs.number).files()):
row.append('True')
else:
row.append('False')
writer.writerow(row)
That is:
open the files once at the beginning of block and only close them at the end.
write the two files one row at a time, when processing elements from repo.pull_requests()
append the second column to row after writing to csv file and before writing to second file.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Skip first row in a file - python

You can use the next function: secondFile = open(sys.argv[2],'rU') # skip first line next(secondFile) # csv parser does not see the first line of the file reader2 = csv.reader(secondFile, delimiter=',')

Related

Python If Statement and Lists

How to write on nth row in csv using Python?

Making Python ignore CSV separator instruction [duplicate]

Delete blank columns from header row

Python: Adding a column and row of data to exisitng csv file

Categories

Resources