Using Python to print strings between CSV values
My overarching goal is to write a Python script that transforms each row of a spreadsheet into a standalone markdown file, using each column as a value in the file's YAML header. Right now, the final for loop I've written not only keeps going and going and going… it also doesn't seem to place the values correctly.
```python
import csv

f = open('data.tsv')
csv_f = csv.reader(f, dialect=csv.excel_tab)

date = []
title = []
for column in csv_f:
    date.append(column[0])
    title.append(column[1])

for year in date:
    for citation in title:
        print("---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation))
```
I'm using tab-separated values because some of the fields in my spreadsheet are chunks of text with commas. So ideally, the script should output something like the following (I figured I'd tackle splitting this output into individual markdown files later; one thing at a time):
```
---
date: 2015
title: foo
---
---
date: 2016
title: bar
---
```
But instead I'm getting misplaced values and output that never ends. I'm obviously learning as I go, so any advice is appreciated.
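The nested loops are what make the output both too long and mismatched: for each date, the inner loop walks through every title, so n rows print n * n blocks, and most of them pair a date with a title from a different row. A minimal sketch with made-up lists (dates and titles are hypothetical stand-ins for the two columns) contrasts nesting with walking the lists in step using zip:

```python
# Hypothetical sample data standing in for the first two TSV columns.
dates = ["2015", "2016", "2017"]
titles = ["foo", "bar", "baz"]

# Nested loops: every date is combined with every title (3 * 3 = 9 pairs).
nested_pairs = []
for year in dates:
    for citation in titles:
        nested_pairs.append((year, citation))

# zip walks both lists in step, so each date keeps its own title (3 pairs).
zipped_pairs = list(zip(dates, titles))

print(len(nested_pairs))  # 9
print(zipped_pairs)       # [('2015', 'foo'), ('2016', 'bar'), ('2017', 'baz')]
```

With thousands of rows the nested version prints millions of blocks, which is why it seems to run forever.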
```python
import csv

with open('data.tsv', newline='') as f:
    csv_f = csv.reader(f, dialect=csv.excel_tab)
    for row in csv_f:
        year, citation = row  # each row is a list; unpack it directly (assumes exactly two columns)
        print("---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation))
```
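Reading each row once keeps every date paired with the title from the same row, instead of looping over every title for every date. This is as far as I can go without a sample of the TSV file.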
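Since the overarching goal is one markdown file per row, here is a minimal sketch of that next step, assuming the date is in the first column and the title in the second. The function name tsv_to_markdown, the skip_header option, and the 001.md, 002.md naming scheme are hypothetical choices, not anything from the question:

```python
import csv
from pathlib import Path


def tsv_to_markdown(tsv_path, out_dir, skip_header=False):
    """Write one markdown file per TSV row, with the row's values in a YAML header.

    Assumes column 0 is the date and column 1 is the title. The numbered
    file names (001.md, 002.md, ...) are a hypothetical naming scheme.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(tsv_path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f, dialect=csv.excel_tab)
        if skip_header:
            next(reader, None)  # drop the column-name row, if the file has one
        for number, row in enumerate(reader, start=1):
            if len(row) < 2:
                continue  # skip blank or incomplete lines
            year, citation = row[0], row[1]
            text = "---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation)
            path = out_dir / ("%03d.md" % number)
            path.write_text(text, encoding='utf-8')
            written.append(path)
    return written
```

Values are written as-is; a title containing a colon followed by a space may need quoting for strict YAML parsers.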
Related
copy all except first row into new csv file python
I generate several CSV files each day, and I'm trying to write a Python script that will open each CSV file, one by one, rewrite the CSV to a new location, and fill in the missing information that is in the filename. I've been able to cobble together separate Python scripts that can do most of it, but I'm hitting brick walls.

Each CSV has a standard filename: "Number_Name_Location.csv". Row 1 has a header that labels each column (Name, Location, Date, etc.), and each .csv can have any number of pre-filled rows.

What I'm trying to automate is the following steps:

1. Open the first .csv in the "/ToParse" folder.
2. Fill in the remaining details for all rows, which are gathered from the filename itself:
   - Column A: Number
   - Column B: Date of parsing
   - Column K: Name
   - Column L: Location
3. Write a new CSV in the folder "/Parsed".
4. Continue to parse the next .csv until no .csv files are left in the "/ToParse" folder.
5. Move all original parsed files to /Parsed/Original.

Below is the cobbled-together code, which is overwriting the header row. How can I adjust the code to ignore the first row in the CSV it's opening and copy only from the second row onward?

```python
import glob
import csv
from datetime import date

with open('Parsed/output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    for filename in glob.glob('*.csv'):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            filename_parts = filename.split("_")
            #print(filename_parts[0])

            for row in csv_input:
                row[0] = filename_parts[0]
                row[1] = date.today()
                row[10] = filename_parts[1]
                row[11] = filename_parts[2]
                csv_output.writerow(row)
```
The open(filename, newline='') part is actually fine: the csv module documentation recommends opening files with newline='' when they are passed to csv.reader or csv.writer. To skip the first line, call next(csv_input) (or f_input.readline()) before the for loop. This reads, and thereby skips, the header line, so the for loop starts on the next line. Note that for row in csv_input[1:] does not work: a csv.reader object is an iterator, not a list, so slicing it raises a TypeError. The [1:] slice (meaning "skip the first item") only works on sequences such as lists, tuples, and strings, not on every iterable.
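Putting the steps from the question together, a sketch might look like the following. It assumes the folder names from the question, filenames of exactly the form Number_Name_Location.csv, and that columns A, B, K and L are indexes 0, 1, 10 and 11; the function name parse_folder is hypothetical. Unlike the posted code, it writes one output file per input (rather than a single output.csv) and copies the header row through unchanged:

```python
import csv
import shutil
from datetime import date
from pathlib import Path


def parse_folder(to_parse="ToParse", parsed="Parsed"):
    """Copy every CSV in to_parse into parsed, filling columns from the filename.

    Assumes names like Number_Name_Location.csv and that column A is index 0,
    B is 1, K is 10 and L is 11. Originals are moved to parsed/Original.
    """
    src_dir, out_dir = Path(to_parse), Path(parsed)
    originals = out_dir / "Original"
    originals.mkdir(parents=True, exist_ok=True)  # also creates out_dir

    for src in sorted(src_dir.glob("*.csv")):
        # Path.stem drops ".csv", so the location part has no extension.
        number, name, location = src.stem.split("_", 2)
        with open(src, newline="") as f_in, \
             open(out_dir / src.name, "w", newline="") as f_out:
            reader = csv.reader(f_in)
            writer = csv.writer(f_out)
            header = next(reader, None)  # consume the header row...
            if header is not None:
                writer.writerow(header)  # ...and copy it through unchanged
            for row in reader:
                row += [""] * (12 - len(row))  # pad short rows up to column L
                row[0] = number
                row[1] = date.today().isoformat()
                row[10] = name
                row[11] = location
                writer.writerow(row)
        shutil.move(str(src), str(originals / src.name))
```

Using src.stem rather than splitting the whole filename keeps ".csv" out of the Location value, which filename.split("_") in the original would include.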
How to extract a month from date in csv file?
I'm trying to get an output of all the employees who worked this month by extracting the month from the date, but I get this error:

```
month = int(row[1].split('-')[1])
IndexError: list index out of range
```

A row in the attendance log CSV looks like this:

```
"404555403","2020-10-14 23:58:15.668520","Chandler Bing"
```

I don't understand why it's out of range. Thanks for any help!

```python
import csv
import datetime

def monthly_attendance_report():
    """
    The function prints the attendance data of all employees from this month.
    """
    this_month = datetime.datetime.now().month
    with open('attendance_log.csv', 'r') as csvfile:
        content = csv.reader(csvfile, delimiter=',')
        for row in content:
            month = int(row[1].split('-')[1])
            if month == this_month:
                return row

monthly_attendance_report()
```
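It works for me. The problem is probably in the CSV file itself: most CSV files start with a header row, and a header cell such as date contains no '-', so split('-') returns a one-element list and [1] is out of range. A blank line triggers the same error on the same line, because csv.reader yields an empty list for it and row[1] does not exist. Skip the header by calling next(content) before the for loop; content[1:] would not work, because a csv.reader object cannot be sliced. Also note that return row exits the function at the first match, so you only ever get one employee back. Finally, extracting the month by splitting the string is fragile; parse the timestamp with the datetime module instead.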
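Following that advice, a sketch that parses the timestamp with datetime instead of splitting the string might look like this. It assumes the timestamps are in the ISO-like format shown in the question (datetime.fromisoformat accepts it from Python 3.7 on); the function name monthly_attendance and its today parameter are hypothetical. It skips anything that does not parse, collects every match instead of returning the first one, and compares the year as well as the month:

```python
import csv
from datetime import datetime


def monthly_attendance(path, today=None):
    """Return every attendance row whose timestamp falls in the current month.

    Assumes rows like "404555403","2020-10-14 23:58:15.668520","Chandler Bing".
    Rows whose second field is not such a timestamp (a header, a blank line)
    are skipped instead of raising IndexError or ValueError.
    """
    if today is None:
        today = datetime.now()
    matches = []
    with open(path, newline="") as csvfile:
        for row in csv.reader(csvfile):
            if len(row) < 2:
                continue  # blank line: csv.reader yields an empty list
            try:
                stamp = datetime.fromisoformat(row[1])
            except ValueError:
                continue  # header row or malformed timestamp
            # Compare the year too, so October 2019 is not counted in October 2020.
            if (stamp.year, stamp.month) == (today.year, today.month):
                matches.append(row)  # keep all matches, not just the first
    return matches
```

Passing an explicit today makes the function easy to check against a fixed date; leaving it out uses the current date.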
Get a non-blank cell recursively from previous columns of a csv using Python
I am new to both Python and Stack Overflow. I extract a few columns from a CSV file into an interim CSV file and clean up the data to remove the NaN entries. Once I have extracted them, I end up with the two CSV files below.

Main CSV file:

```
Sort,Parent 1,Parent 2,Parent 3,Parent 4,Parent 5,Name,Parent 6
1,John,,,Ned,,Dave
2,Sam,Mike,,,,Ken
3,,,Pete,,,Steve
4,,Kerry,,Rachel,,Rog
5,,,Laura,Mitchell,,Kim
```

Extracted CSV:

```
Name,ParentNum
Dave,Parent 4
Ken,Parent 2
Steve,Parent 3
Rog,Parent 4
Kim,Parent 4
```

What I am trying to accomplish is to recurse through the main CSV using the name and parent number. But if I write a for loop, it prints empty rows, because it looks up every row for the first value. What is the best approach instead of a for loop? I tried reading the CSV with a dictionary reader but could not get far. Any help will be appreciated.

Code:

```python
import xlrd
import csv
import pandas as pd

print('Opening and Reading the msl sheet from the xlsx file')
with xlrd.open_workbook('msl.xlsx') as wb:
    sh = wb.sheet_by_index(2)
    print("The sheet name is :", sh.name)
    with open('msl.csv', 'w', newline="") as f:
        c = csv.writer(f)
        print('Writing to the CSV file')
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))

df1 = pd.read_csv('msl.csv', index_col='Sort')

# first_row is not defined in the posted snippet; it appears to hold the
# column names whose unique values are extracted.
with open('dirty-processing.csv', 'w', newline="") as tbl_writer1:
    c2 = csv.writer(tbl_writer1)
    c2.writerow(['Name', 'Parent'])
    for list_item in first_row:
        for item in df1[list_item].unique():
            row_content = [item, list_item]
            c2.writerow(row_content)
```

Expected result:

Input main CSV: (image not included)

In the above CSV, I would like to grab the unique values from each column into a separate file (or any other data type) and also capture the header of the column they are taken from. For example:

```
Negarnaviricota,Phylum
Haploviricotina,Subphylum
...
```

The next thing I would like to do is get each value's parent, which is where I am stuck. Also, as you can see, not all columns have data, so I want to get the last non-blank column.
Everything up to this point is accomplished by the code above. The sample output should look like this: (image not included)
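One way to get each name's parent without looking up every row for every value is to handle each row on its own: scan its Parent columns from right to left and stop at the first non-blank one. This minimal sketch uses csv.DictReader on the main CSV from the question and assumes that "parent" means the right-most non-blank column whose header starts with Parent, which reproduces the Extracted CSV in the question; the function name last_parents is hypothetical:

```python
import csv
import io


def last_parents(f):
    """For each row of the main CSV, return (name, parent column, parent value).

    The parent is taken to be the right-most non-blank column whose header
    starts with "Parent". Rows with no parent at all get (name, None, None).
    """
    reader = csv.DictReader(f)
    parent_cols = [c for c in reader.fieldnames if c.startswith("Parent")]
    result = []
    for row in reader:
        found = (None, None)
        for col in reversed(parent_cols):
            value = (row.get(col) or "").strip()  # short rows give None here
            if value:
                found = (col, value)
                break
        result.append((row["Name"], found[0], found[1]))
    return result


if __name__ == "__main__":
    sample = (
        "Sort,Parent 1,Parent 2,Parent 3,Parent 4,Parent 5,Name,Parent 6\n"
        "1,John,,,Ned,,Dave\n"
        "2,Sam,Mike,,,,Ken\n"
        "3,,,Pete,,,Steve\n"
        "4,,Kerry,,Rachel,,Rog\n"
        "5,,,Laura,Mitchell,,Kim\n"
    )
    for name, column, parent in last_parents(io.StringIO(sample)):
        print(name, column, parent)
```

For a real file, pass open('main.csv', newline='') instead of the StringIO (the file name is a placeholder). Because each row is handled once, there is no repeated lookup that produces empty rows.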
Python Looping through CSV files and their columns
So I've seen this done in other questions asked here, but I'm still a little confused. I've been learning Python 3 for the last few days and figured I'd start working on a project to really get my hands dirty. I need to loop through a certain number of CSV files and make edits to those files. I'm having trouble with going to a specific column, and with for loops in Python in general. I'm used to the convention for (int i = 0; i < expression; i++), but in Python it's a little different. Here's my code so far, and I'll explain where my issue is.

```python
import os
import csv

pathName = os.getcwd()

numFiles = []
fileNames = os.listdir(pathName)
for fileNames in fileNames:
    if fileNames.endswith(".csv"):
        numFiles.append(fileNames)

for i in numFiles:
    file = open(os.path.join(pathName, i), "rU")
    reader = csv.reader(file, delimiter=',')
    for column in reader:
        print(column[4])
```

My issue falls on this line:

```python
for column in reader:
    print(column[4])
```

So in the docs it says column is the variable and reader is what I'm looping through. But when I write 4, I get this error:

```
IndexError: list index out of range
```

What does this mean? If I write 0 instead of 4, it prints all of the values in column 0 (cell 0 of each row) of each CSV file. I basically need it to go through the first row of each CSV file, find a specific value, and then go through that entire column. Thanks in advance!
It could be that your .csv file doesn't have 5 columns (or that some of its rows don't). Python indexing is zero-based, meaning it starts counting at 0, so the first column is row[0] and the second is row[1]. You may also want to change for column in reader: to for row in reader:, because the reader iterates through the rows, not the columns. This code loops through each row and then each cell in that row, letting you view the contents of every cell:

```python
import csv
import os

pathName = os.getcwd()
numFiles = [name for name in os.listdir(pathName) if name.endswith(".csv")]

for i in numFiles:
    # newline='' is what the csv docs recommend; the "rU" mode was removed in Python 3.11
    with open(os.path.join(pathName, i), newline='') as file:
        reader = csv.reader(file, delimiter=',')
        for row in reader:
            for column in row:
                print(column)
                if column == "SPECIFIC VALUE":
                    pass  # do stuff
```
Welcome to Python! I suggest printing some debugging messages. You could add this to your printing loop:

```python
for row in reader:
    try:
        print(row[4])
    except IndexError:
        print("ERROR: %s in file %s doesn't contain 5 columns" % (row, i))
```

This prints the bad lines (as lists, because that is how csv.reader represents rows), so you can fix the CSV files.

Some notes:

- It is common to use snake_case in Python, not camelCase.
- Name your variables appropriately (csv_filename instead of i, row instead of column, etc.).
- Use the with statement to handle files.

Enjoy!
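For the actual goal (find a value in the first row of each file, then walk that column), a sketch could read the header with next(), look the index up with list.index() instead of hard-coding 4, and ignore rows that are too short. The function name column_values and the choice to skip files that lack the header are assumptions:

```python
import csv
from pathlib import Path


def column_values(folder, header_name):
    """Yield (file name, value) for one named column across every CSV in folder.

    The first row of each file is treated as the header, and the column
    index is looked up there. Files without that header are skipped, and
    rows too short to reach the column are ignored instead of raising
    IndexError.
    """
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader, [])  # empty file -> empty header
            if header_name not in header:
                continue
            index = header.index(header_name)
            for row in reader:
                if len(row) > index:
                    yield path.name, row[index]
```

Usage: for name, value in column_values(os.getcwd(), "SPECIFIC VALUE"): print(name, value), where "SPECIFIC VALUE" is the header text you are looking for.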
manipulating a csv file and writing its output to a new csv file in python
I have a simple file named saleem.csv, which contains the following lines of CSV data:

```
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
```

I want to skip the first line, read this file, and write only column[2] and column[4] to a new CSV file named out.csv. I have written the following script to do the job:

```python
import csv

with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print(dele)

with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)

f.close()
j.close()
```

This produces the following output:

```
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
```

Please help. (Sorry for the earlier mistake; I had mistakenly written row.)
Edited to reflect the revised question.

Some problems I can see:

P1: writerows(...)

```python
for row in dele:
    writecsv.writerows(dele)
```

writerows takes a list of rows to write to the CSV file, so it shouldn't be inside a loop where you iterate over the rows and attempt to write them individually. Because dele is a tuple of two strings, writerows treats each string as a row, and since a string is itself iterable, each character becomes a separate field. That is where the M,y,N,e,t,w,o,r,k,... output comes from.

P2: overwriting

```python
for row in readcsv:
    dele = (row[2], row[4])
```

You are continuously overwriting dele, so you aren't keeping track of row[2] and row[4] from every row.

What you could do instead:

```python
import csv

dele = []
with open('saleem.csv', newline='') as f:
    readcsv = csv.reader(f)
    next(readcsv)  # skip the header line
    for row in readcsv:
        dele.append([row[2], row[4]])
        print([row[2], row[4]])

with open('out.csv', 'w', newline='') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)
```

This produced the output:

```
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
```

Also, unrelated to your issue at hand, the following code is unnecessary:

```python
f.close()
j.close()
```

The with open(...) syntax is widely used because it closes the file for you. You don't need to close it separately yourself: as soon as the with block ends, the file is closed.
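To see the character splitting from P1 in isolation, this small sketch writes the same tuple once with writerows and once with writerow into an in-memory buffer (io.StringIO), so no file is needed; the sample value is taken from the data in the question:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

dele = ("MyNetwork.node[0].nic.phy", "0")  # one (column 2, column 4) pair

# writerows expects an iterable of rows; given a tuple of two strings, it
# treats each string as a row and each character of it as a field.
writer.writerows(dele)
# writerow writes the tuple as a single row with two fields.
writer.writerow(dele)

print(buf.getvalue())
```

The first line of output is the comma-separated characters, the second is 0 on its own, and only the third is the intended MyNetwork.node[0].nic.phy,0 row.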
I would suggest using the pandas library. It makes working with CSV files very easy.

```python
import pandas as pd  # standard convention for importing pandas

# read the CSV file into a pandas DataFrame; the first line becomes the header
dataframe = pd.read_csv('saleem.csv')

# make a new DataFrame with just columns 2 and 4 (selected by position)
print_dataframe = dataframe.iloc[:, [2, 4]]

# write the CSV file without the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
```

If you use IPython or Jupyter Notebook, you can type dataframe.head() to see the first few rows of the DataFrame. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process CSV data.