Using python to print strings between csv values - python

My overarching goal is to write a Python script that transforms each row of a spreadsheet into a standalone markdown file, using each column as a value in the file's YAML header. Right now, the final for loop I've written not only keeps going and going and going… it also doesn't seem to place the values correctly.
import csv
f = open('data.tsv')
csv_f = csv.reader(f, dialect=csv.excel_tab)
date = []
title = []
for column in csv_f:
    date.append(column[0])
    title.append(column[1])
for year in date:
    for citation in title:
        print "---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation)
I'm using tab-separated values because some of the fields in my spreadsheet are chunks of text with commas. So ideally, the script should output something like the following (I figured I'd tackle splitting this output into individual markdown files later. One thing at a time):
---
date: 2015
title: foo
---
---
date: 2016
title: bar
---
But instead I'm getting misplaced values and output that never ends. I'm obviously learning as I go along here, so any advice is appreciated.

import csv
with open('data.tsv', newline='') as f:
    csv_f = csv.reader(f, dialect=csv.excel_tab)
    for row in csv_f:
        # each row is a list of its column values; unpacking assumes exactly two columns
        year, citation = row
        print("---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation))
This is all I can do without the sample CSV file.
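The nested loops in the question pair every date with every title, so N rows produce N × N blocks, which on a large spreadsheet looks like output that never ends. The following is a minimal Python 3 sketch of the per-row approach, using an in-memory stand-in for data.tsv. The sample data and the helper name `render_front_matter` are assumptions made for illustration, not part of the original post.

```python
import csv
import io

# Hypothetical sample standing in for data.tsv (two tab-separated columns).
SAMPLE_TSV = "2015\tfoo\n2016\tbar\n"

def render_front_matter(tsv_file):
    """Return one YAML-header string per row of a two-column TSV file."""
    reader = csv.reader(tsv_file, dialect=csv.excel_tab)
    blocks = []
    for row in reader:
        # Values taken from the same row stay paired with each other.
        year, citation = row[0], row[1]
        blocks.append("---\ndate: %s\ntitle: %s\n---\n" % (year, citation))
    return blocks

blocks = render_front_matter(io.StringIO(SAMPLE_TSV))
print("".join(blocks))
```

Because each block is built from a single row, writing one markdown file per row later only requires opening a file inside the same loop.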

Related

copy all except first row into new csv file python

I generate several csv files each day, and I'm trying to write a python script that will open each csv file, one by one, rewrite the csv to a new location and fill in the mission information that is in the filename.
I've been able to cobble together separate python scripts that can do most of it, but I'm hitting brick walls.
Each CSV has a standard filename:
"Number_Name_Location.csv"
Row 1 is a header that labels each column (Name, Location, Date, etc.), and each .csv can have any number of pre-filled rows.
What I'm trying to automate is the following steps:
Open the first .csv in the “/ToParse” folder
Fill in the remaining details for all rows, which are gathered from the filename itself:
Column A: Number
Column B: Date of Parsing
Column k: Name
Column L: Location
writes a new csv in the folder “/Parsed” w
Continue to parse the next .csv until no .csv files are left in the "/ToParse" folder
Move all original parsed files to /Parsed/Original
Below is the cobbled-together code, which is overwriting the header row.
How can I adjust the code to ignore the first row of the csv it's opening and only process the rows after it?
import glob
import csv
from datetime import date
with open('Parsed/output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob('*.csv'):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            filename_parts = filename.split("_")
            #print(filename_parts[0])
            for row in csv_input:
                row[0] = filename_parts[0]
                row[1] = date.today()
                row[10] = filename_parts[1]
                row[11] = filename_parts[2]
                csv_output.writerow(row)
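The open(filename, newline='') call is fine: the csv module documentation recommends opening files with newline='' so that the reader and writer handle line endings themselves.
You can skip the first line by calling next(csv_input) before the for loop. This reads (and so skips) the header row, and the for loop then starts on the next row. If you want to keep the header in the output, pass the value that next() returns to csv_output.writerow() before looping.
Note that for row in csv_input[1:] does not work: a csv.reader is an iterator, not a list, so it cannot be sliced ([1:] skips the first item only for sequences such as lists). itertools.islice(csv_input, 1, None) is the iterator equivalent.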
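As a rough illustration of skipping the header with next(), here is a small self-contained sketch that works on an in-memory file. The file name, the sample contents, the shortened column positions, and the helper `fill_rows` are all hypothetical. The sketch also strips the .csv extension before splitting, since a plain split("_") leaves ".csv" attached to the last part.

```python
import csv
import io
from datetime import date

# Hypothetical file name and contents following the Number_Name_Location.csv pattern.
FILENAME = "42_Apollo_Houston.csv"
SAMPLE = "Number,Date,Name,Location\n,,,\n,,,\n"

def fill_rows(csv_file, filename, today):
    """Copy the header unchanged and fill the data rows from the file name."""
    number, name, location = filename[:-len(".csv")].split("_")
    reader = csv.reader(csv_file)
    header = next(reader, None)  # consume the header so the loop starts at the data
    rows = [] if header is None else [header]
    for row in reader:
        # Column positions are shortened for this sample;
        # the question's files use indexes 0, 1, 10 and 11.
        row[0], row[1], row[2], row[3] = number, today.isoformat(), name, location
        rows.append(row)
    return rows

rows = fill_rows(io.StringIO(SAMPLE), FILENAME, date(2020, 1, 2))
for row in rows:
    print(row)
```

Passing a default to next() (here None) avoids a StopIteration error when a file is completely empty.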

How to extract a month from date in csv file?

I'm trying to get an output of all the employees who worked this month by extracting the month from the date but I get this error:
month = int(row[1].split('-')[1])
IndexError: list index out of range
A row in the attendance log csv looks like this:
"404555403","2020-10-14 23:58:15.668520","Chandler Bing"
I don't understand why it's out of range?
Thanks for any help!
import csv
import datetime
def monthly_attendance_report():
    """
    The function prints the attendance data of all employees from this month.
    """
    this_month = datetime.datetime.now().month
    with open('attendance_log.csv', 'r') as csvfile:
        content = csv.reader(csvfile, delimiter=',')
        for row in content:
            month = int(row[1].split('-')[1])
            if month == this_month:
                return row
monthly_attendance_report()
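It works for me with the sample row. The problem is probably elsewhere in the csv file: a header line (whose text contains no '-' to split on) or a blank line (which the reader returns as an empty list) will both raise this IndexError. A csv.reader cannot be sliced with [1:], so to ignore a header line, call next() on the reader once before the for loop:
next(content, None)
for row in content:
Also, extracting the month by splitting the string is fragile. Parse the timestamp with the datetime module instead, for example with datetime.datetime.strptime.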
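A minimal sketch of the datetime-based approach follows, assuming every timestamp uses the format of the sample row. The sample contents and the helper `rows_for_month` are hypothetical. It skips blank or short lines instead of failing on them, and it collects every match, whereas the posted function returns at the first matching row.

```python
import csv
import datetime
import io

# Hypothetical log contents; the second line is blank, as a stray line in a file might be.
SAMPLE = (
    '"404555403","2020-10-14 23:58:15.668520","Chandler Bing"\n'
    '\n'
    '"404555404","2020-09-01 08:00:00.000000","Monica Geller"\n'
)

def rows_for_month(csv_file, year, month):
    """Return every well-formed row whose timestamp falls in the given month."""
    matches = []
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # blank or short line: row[1] would raise IndexError
        try:
            stamp = datetime.datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # header text or a malformed timestamp
        if (stamp.year, stamp.month) == (year, month):
            matches.append(row)
    return matches

october = rows_for_month(io.StringIO(SAMPLE), 2020, 10)
print(october)
```

Comparing the year as well as the month keeps rows from the same month of an earlier year out of the report.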

Get a non-blank cell recursively from previous columns of a csv using Python

I am new to both Python and Stack Overflow.
I extract a few columns from a csv file into an interim csv file and clean up the data to remove the nan entries. Once I have extracted them, I end up with the two csv files below.
Main CSV File:
Sort,Parent 1,Parent 2,Parent 3,Parent 4,Parent 5,Name,Parent 6
1,John,,,Ned,,Dave
2,Sam,Mike,,,,Ken
3,,,Pete,,,Steve
4,,Kerry,,Rachel,,Rog
5,,,Laura,Mitchell,,Kim
Extracted CSV:
Name,ParentNum
Dave,Parent 4
Ken,Parent 2
Steve,Parent 3
Rog,Parent 4
Kim,Parent 4
What I am trying to accomplish is to recurse through the main csv using the name and parent number. But if I write a for loop, it prints empty rows because it looks up every row for the first value. What is the best approach instead of a for loop? I tried a dictionary reader to read the csv but could not get far. Any help will be appreciated.
CODE:
import xlrd
import csv
import pandas as pd
print('Opening and Reading the msl sheet from the xlsx file')
with xlrd.open_workbook('msl.xlsx') as wb:
    sh = wb.sheet_by_index(2)
    print("The sheet name is :", sh.name)
    # the file name must be a quoted string
    with open('msl.csv', 'w', newline="") as f:
        c = csv.writer(f)
        print('Writing to the CSV file')
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))
df1 = pd.read_csv('msl.csv', index_col='Sort')
with open('dirty-processing.csv', 'w', newline="") as tbl_writer1:
    c2 = csv.writer(tbl_writer1)
    c2.writerow(['Name','Parent'])
    # note: first_row is not defined anywhere in the posted snippet
    for list_item in first_row:
        for item in df1[list_item].unique():
            row_content = [item, list_item]
            c2.writerow(row_content)
Expected Result:
Input Main CSV:
[image not available]
In the above CSV, I would like to grab unique values from each column into a separate file or any other data type. Then also capture the header of the column they are taken from.
Ex:
Negarnaviricota,Phylum
Haploviricotina,Subphylum
...
so on
The next thing I would like to do is get each value's parent, which is where I am stuck. Also, as you can see, not all columns have data, so I want to get the last non-blank column. Up to this point everything is accomplished using the above code. The sample output should look like the following.
[image not available]
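One way to find the last non-blank parent for each row, without recursion, is to walk the parent columns from right to left and stop at the first one with a value. The sketch below uses the main CSV shown earlier in the question. The helper `last_parent` is hypothetical, and the sketch assumes that only the parent columns to the left of "Name" should be searched.

```python
import csv
import io

# The main CSV from the question, reproduced as an in-memory file.
MAIN_CSV = """Sort,Parent 1,Parent 2,Parent 3,Parent 4,Parent 5,Name,Parent 6
1,John,,,Ned,,Dave
2,Sam,Mike,,,,Ken
3,,,Pete,,,Steve
4,,Kerry,,Rachel,,Rog
5,,,Laura,Mitchell,,Kim
"""

def last_parent(row, parent_columns):
    """Return (column, value) for the right-most non-blank parent, or None."""
    for column in reversed(parent_columns):
        value = row.get(column)
        if value:  # skips both '' and missing (None) cells
            return column, value
    return None

reader = csv.DictReader(io.StringIO(MAIN_CSV))
# Assumption: only the parent columns to the left of "Name" are searched.
name_index = reader.fieldnames.index("Name")
parent_columns = [c for c in reader.fieldnames[:name_index] if c.startswith("Parent")]

result = {row["Name"]: last_parent(row, parent_columns) for row in reader}
for name, parent in result.items():
    print(name, parent)
```

For the sample data this yields the same column names as the extracted CSV in the question (for example, Dave maps to Parent 4), along with the parent's value.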

Python Looping through CSV files and their columns

So I've seen this done in other questions asked here, but I'm still a little confused. I've been learning python3 for the last few days and figured I'd start working on a project to really get my hands dirty. I need to loop through a certain number of CSV files and make edits to those files. I'm having trouble with going to a specific column and also with for loops in python in general. I'm used to the convention for (int i = 0; i < expression; i++), but in python it's a little different. Here's my code so far, and I'll explain where my issue is.
import os
import csv
pathName = os.getcwd()
numFiles = []
fileNames = os.listdir(pathName)
for fileNames in fileNames:
    if fileNames.endswith(".csv"):
        numFiles.append(fileNames)
for i in numFiles:
    file = open(os.path.join(pathName, i), "rU")
    reader = csv.reader(file, delimiter=',')
    for column in reader:
        print(column[4])
My issue falls on this line:
for column in reader:
    print(column[4])
So in the Docs it says column is the variable and reader is what I'm looping through. But when I write 4 I get this error:
IndexError: list index out of range
What does this mean? If I write 0 instead of 4 it prints out all of the values in column 0 cell 0 of each CSV file. I basically need it to go through the first row of each CSV file and find a specific value and then go through that entire column. Thanks in advance!
It could be that at least one row in your .csv file has fewer than 5 columns.
Python is zero-indexed, meaning it starts counting at 0, so the first column is column[0], the second is column[1], and column[4] is the fifth.
Also you may want to change your
for column in reader:
to
for row in reader:
because reader iterates through the rows, not the columns.
This code loops through each row and then each column in that row allowing you to view the contents of each cell.
for i in numFiles:
    # the "rU" mode is deprecated; newline='' is what the csv module recommends
    with open(os.path.join(pathName, i), newline='') as file:
        reader = csv.reader(file, delimiter=',')
        for row in reader:
            for column in row:
                print(column)
                if column == "SPECIFIC VALUE":
                    pass  # do stuff
Welcome to Python! I suggest printing some debugging messages.
You could add this to your printing loop:
for row in reader:
    try:
        print(row[4])
    except IndexError as ex:
        print("ERROR: %s in file %s doesn't contain 5 columns" % (row, i))
This will print the bad lines (as lists, because that is how csv.reader represents rows) so you can fix the CSV files.
Some notes:
It is common to use snake_case in Python rather than camelCase
Name your variables appropriately (csv_filename instead of i, row instead of column, etc.)
Use the with statement to handle files
Enjoy!
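For the stated goal of finding a specific value in the first row and then walking that whole column, a minimal sketch could look like the following. The sample contents, the header name "price", and the helper `column_values` are hypothetical. Rows that are too short are skipped rather than raising IndexError.

```python
import csv
import io

# Hypothetical CSV contents; "price" is the header value being searched for.
SAMPLE = "id,name,price\n1,apple,0.50\n2,pear\n3,plum,0.75\n"

def column_values(csv_file, wanted_header):
    """Return the cells under wanted_header, skipping rows that are too short."""
    reader = csv.reader(csv_file)
    header = next(reader, None)  # the first row holds the column names
    if header is None or wanted_header not in header:
        return []
    index = header.index(wanted_header)
    return [row[index] for row in reader if len(row) > index]

prices = column_values(io.StringIO(SAMPLE), "price")
print(prices)
```

The same function can be called once per file inside the loop over the .csv file names.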

manipulating a csv file and writing its output to a new csv file in python

I have a simple file named saleem.csv which contains the following lines of csv information:
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
I want to skip the first line, read this file, and write only column[2] and column[4] to a new csv file named out.csv. I have written the following script to do the job.
import csv
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print dele
with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)
f.close()
j.close()
This produces the following output:
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
Please help me. Sorry for the earlier mistake; I mistakenly wrote row.
Edited to reflect revised question
Some problems I can see:
P1: writerows(...)
for row in dele:
    writecsv.writerows(dele)
writerows takes a list of rows to write to the csv file. So it shouldn't be inside a loop where you iterate over all rows and attempt to write them individually.
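The character-by-character output in the question comes from passing a single tuple of strings to writerows: each string is treated as a row, and each character as a field. The sketch below contrasts the three cases using an in-memory buffer. The helper `written` is hypothetical and exists only to capture the text that gets written.

```python
import csv
import io

def written(write):
    """Run write(writer) against an in-memory file and return the resulting text."""
    buffer = io.StringIO()
    write(csv.writer(buffer, lineterminator="\n"))
    return buffer.getvalue()

pair = ("MyNetwork.node[0].appl", "3")

# writerow: the tuple is one row with two fields.
one_row = written(lambda w: w.writerow(pair))

# writerows on the same tuple: each string is treated as a row,
# and each character becomes its own field.
split_chars = written(lambda w: w.writerows(pair))

# writerows on a list of rows: one line per inner sequence.
many_rows = written(lambda w: w.writerows([pair, ("MyNetwork.node[0].appl", "0")]))

print(one_row, split_chars, many_rows, sep="---\n")
```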
P2: overwriting
for row in readcsv:
    dele = (row[2], row[4])
You are continuously overwriting dele, so you aren't going to be keeping track of row[2] and row[4] from every row.
What you could do instead:
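import csv

dele = []
with open('saleem.csv', newline='') as f:
    readcsv = csv.reader(f)
    next(readcsv, None)  # skip the header line
    for row in readcsv:
        dele.append([row[2], row[4]])
        print([row[2], row[4]])
with open('out.csv', 'w', newline='') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)  # write all collected rows in one call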
This produced output:
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
Also, unrelated to your issue at hand, the following code is unnecessary:
f.close()
j.close()
The reason why with open(...): syntax is so widely used, is because it handles gracefully closing the file for you. You don't need to separately close it yourself. As soon as the with block ends, the file will be closed.
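A quick way to confirm this behaviour is to check the file object's closed attribute inside and after the with block. This is a small sketch that writes to a temporary location; the path is chosen only for illustration.

```python
import os
import tempfile

# A throwaway path so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "out.csv")

with open(path, "w", newline="") as j:
    j.write("a,b\n")
    closed_inside = j.closed   # False while the block is running

closed_after = j.closed        # True once the block has ended

print(closed_inside, closed_after)
```

The file is also closed if an exception is raised inside the block, which an explicit close() call at the end of a script would not guarantee.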
I would suggest using the pandas library.
It makes working with csv files very easy.
import pandas as pd #standard convention for importing pandas
# reads the csv file into a pandas dataframe
dataframe = pd.read_csv('saleem.csv')
# make a new dataframe with just columns 2 and 4
print_dataframe = dataframe.iloc[:,[2,4]]
# output the csv file, but don't include the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
If you use IPython or Jupyter Notebook, you can type
dataframe.head()
to see the first few values of the dataframe. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process csv data.
