I am trying to read each line of the csv file below.
csv file
I would like to extract the data from each row and append to a new text file.
Here is the code I use to read the csv file:
from datetime import datetime
import pandas as pd
#Count number of lines in the csv file
cnd = datetime.now().strftime("%m%d%y")
df = pd.read_csv(cnd+'sr.csv')
numline = len(df)
ma = df['Make'][:numline]
mo = df['Model'][:numline]
yr = df['Year'][:numline]
cl = df['Color'][:numline]
pr = df['Price'][:numline]
#For loop to break the csv file into 10 lines at a time
for x in range(0,numline,10):
    if x < 10:
        mk = ma[:10]
        mod = mo[:10]
        ye = yr[:10]
        clr = cl[:10]
        prc = pr[:10]
I'm using the snippet of code below to create the new text file, extract the data from the csv file, and append it to the new text file. The problem lies in the third block of code.
#Create text File
facts = datetime.now().strftime("%m-%d-%y-%H%M%S.txt")
f = open(facts, 'w+')
#Write generic Title
f.write('Car Facts' + '\r')
f.close()
#Open the text file. Iterate thru each line of the csv file and append to the text file
of = open(facts, "a")
for k in mk:
    for m in mod:
        for y in ye:
            for c in clr:
                for p in prc:
                    of.write('\r' + 'Make: ' + str(k) + '\r' + 'Model: ' + str(m) + '\r' + 'Year: ' + str(y) + '\r' + 'Color: ' + str(c) + '\r' + 'Price: ' + str(p) + '\r')
of.close()
I end up with a text file that has about 30 blocks of random combinations.
The very first and last block written to the text file are correct.
Everything else in between is just random combinations that I don't need.
How can I fix the code to get the desired end result below?
Thank you
Car Facts
Make: Ford
Model: Bronco
Year: 1978
Color: Blue
Price: $24,000.00
Make: Land Rover
Model: Defender
Year: 1985
Color: Black
Price: $21,000.00
This opens the existing CSV file and loops through it, writing each line to the new text file. You might have to delete the first entry manually, since it consists of the column titles.
File1 = open("Existing CSV File.csv", "r")
File2 = open("New text file.txt", "w")
for X in File1:
    Line = X.split(",")
    File2.write("Make: " + Line[0] + "\n")
    File2.write("Model: " + Line[1] + "\n")
    File2.write("Year: " + Line[2] + "\n")
    File2.write("Colour: " + Line[3] + "\n")
    File2.write("Price: " + Line[4] + "\n")
    File2.write("")
File1.close()
File2.close()
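The nested loops in the question write every combination of make, model, year, color and price, which is why only the first and last blocks look right. Reading the file one row at a time avoids that. Below is a minimal sketch using the standard csv module, which also copes with quoted prices such as "$24,000.00" that a plain split(",") would cut in two. The column names come from the question; the function name rows_to_blocks and the in-memory sample are made up for illustration.

```python
import csv
import io

FIELDS = ["Make", "Model", "Year", "Color", "Price"]  # column names taken from the question

def rows_to_blocks(csv_file):
    """Return one 'Label: value' text block per CSV row."""
    blocks = []
    for row in csv.DictReader(csv_file):  # the header row becomes the dict keys
        lines = [f"{name}: {row[name]}" for name in FIELDS]
        blocks.append("\n".join(lines))
    return blocks

# In-memory stand-in for the real file (hypothetical data in the question's layout).
sample = io.StringIO(
    'Make,Model,Year,Color,Price\n'
    'Ford,Bronco,1978,Blue,"$24,000.00"\n'
    'Land Rover,Defender,1985,Black,"$21,000.00"\n'
)
blocks = rows_to_blocks(sample)
report = "Car Facts\n" + "\n".join(blocks) + "\n"
print(report)
```

With a real file, open(path, newline="") can be passed in place of the StringIO object, and report can be written out with a single write() call.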
I would like to load data consisting of 10 categories of documents, where each category contains text files, but I keep getting the following error:
IndexError: list index out of range
This is the code:
import os
from os.path import join
def load_data(folder):
    data = []
    files = [join(folder, x) for x in os.listdir(folder)]
    for file in files:
        topic = file.split("/")[9] # this is where the error occurs
        label = topic.replace(" ", "_")
        name = "__label__" + label
        with open(file, "rb") as f:
            content = f.read()
            content = content.decode('utf-16')
            content = " ".join(i for i in content.split())
            data.append(name + " " + content)
    return data
An easy way to debug this is to add print statements and check what the objects hold. In this case, you can add two print statements at the beginning of the for loop. They will show how many parts each path splits into, which should help you figure out why you are getting the IndexError.
def load_data(folder):
    data = []
    files = [join(folder, x) for x in os.listdir(folder)]
    for file in files:
        print(file)
        print(file.split("/"))
        topic = file.split("/")[9] # this is where the error occurs
        label = topic.replace(" ", "_")
        name = "__label__" + label
        with open(file, "rb") as f:
            content = f.read()
            content = content.decode('utf-16')
            content = " ".join(i for i in content.split())
            data.append(name + " " + content)
    return data
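A hard-coded index such as [9] only works when the path has exactly the expected number of components. If the topic is meant to be the name of the folder that contains the file (an assumption, since the question does not show the directory layout), a sketch like the following derives it with os.path instead. The helper name label_for and the example path are hypothetical.

```python
import os

def label_for(path):
    """Build a '__label__<topic>' prefix from the name of the file's parent folder."""
    topic = os.path.basename(os.path.dirname(path))
    return "__label__" + topic.replace(" ", "_")

# Hypothetical path: the category folder is 'sports news'.
example = "data/sports news/doc1.txt"
print(label_for(example))
```

This keeps working no matter how deep the data folder sits in the file system.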
Trouble with a really annoying homework assignment. I have a csv file with lots of comma-delimited fields per row. I need to take the last two fields from every row and write them into a new txt file. The problem is that some of the latter fields contain sentences: those with commas are in double quotes, those without them aren't. For example:
180,easy
240min,"Quite easy, but number 3, wtf?"
300,much easier than the last assignment
I did this and it worked just fine, but the double quotes disappear. The assignment is to copy the fields to the txt-file, use semicolon as delimiter and remove possible line breaks. The text must remain exactly the same. We have an automatic check system, so it's no use arguing if this makes any sense.
import csv
file = open('myfile.csv', 'r')
output= open('mytxt.txt', 'w')
csvr = csv.reader(file)
headline = next(csvr)
for line in csvr:
    lgt = len(line)
    time = line[lgt - 2].replace('\n', '')
    feedb = line[lgt - 1].replace('\n', '')
    if time != '' and feedb != '':
        output.write(time + ';' + feedb + '\n')
output.close()
file.close()
Is there some easy solution for this? Can I use csv module at all? No one seems to have exactly the same problem.
Thank you all beforehand.
Try this,
import csv
file = open('myfile.csv', 'r')
output= open('mytxt.txt', 'w')
csvr = csv.reader(file)
headline = next(csvr)
for line in csvr:
    lgt = len(line)
    time = line[lgt - 2].replace('\n', '')
    feedb = line[lgt - 1].replace('\n', '')
    if time != '' and feedb != '':
        if ',' in feedb:
            output.write(time + ';"' + feedb + '"\n')
        else:
            output.write(time + ';' + feedb + '\n')
output.close()
file.close()
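The same idea can be pulled into a small helper so the re-quoting rule lives in one place. This is only a sketch: csv.reader does not report whether a field was quoted in the source, so the helper assumes quotes were present exactly when the field contains a comma or a double quote. A field that was quoted without needing it would come out unquoted. The names requote and last_two_fields and the sample data are made up for illustration.

```python
import csv
import io

def requote(field):
    """Put back the quotes csv.reader removed, but only where the CSV needed them."""
    if "," in field or '"' in field:
        return '"' + field.replace('"', '""') + '"'
    return field

def last_two_fields(csv_file):
    """Return 'time;feedback' strings built from the last two fields of each row."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    lines = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        time, feedb = (f.replace("\n", "") for f in row[-2:])
        if time != "" and feedb != "":
            lines.append(requote(time) + ";" + requote(feedb))
    return lines

# In-memory stand-in for myfile.csv (hypothetical rows in the question's style).
sample = io.StringIO(
    "id,time,feedback\n"
    "1,180,easy\n"
    '2,240min,"Quite easy, but number 3, wtf?"\n'
)
result = last_two_fields(sample)
print("\n".join(result))
```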
Had to do it the ugly way, the file was too irregular. I talked with some colleagues on the same course, and apparently the idea was NOT to use the csv module here, but to practice basic file handling in Python.
file = open('myfile.csv','r')
output = open('mytxt.txt', 'w')
headline = file.readline()
feedb_lst = []
count = 0
for line in file:
    if line.startswith('1'): #found out all lines should start with an ID number,
        data_lst = line.split(',', 16) #that always starts with '1'
        lgt = len(data_lst)
        time = data_lst[lgt - 2]
        feedb = data_lst[lgt - 1].rstrip()
        feedback = [time, feedb]
        feedb_lst.append(feedback)
        count += 1
    else:
        feedb_lst[count - 1][1] = feedb_lst[count - 1][1] + line.rstrip()
i = 1
for item in feedb_lst:
    if item[0] != '' and item[1] != '':
        if i == len(feedb_lst):
            output.write(item[0] + ';' + item[1])
        else:
            output.write(item[0] + ';' + item[1] + '\n')
    i += 1
output.close()
file.close()
Thank you for your help!
I'm trying to work out how to structure a script so that I can use inheritance. I'm fairly new to Python, and my problem is using variables created in one class's methods from another class. I only recently learned about the super function, and I don't think I'm using it right, because it keeps printing and recalculating everything that it pulls from.
Let's say I have a bunch of messages coming in a text file delimited by commas that give me different information. I want to be able to take that text file and...
1. be able to read the content delimited by commas (done)
2. tell me how many of each type of message there are (done)
3. then create a class called messages that has defs for each type of message with its respective calculations and variables it creates in those instances (done)
4. create class to print and write those calculations and variables in the client and xls (partially done due to my issue)
5. create class to convert xls to csv and kml (somewhat done)
Here is a toy structure of what I'm working with
import bunch of stuff
data = [] #empty because we will store data into it
#Reads a CSV file and return it as a list of rows
def read_csv_file(filename):
    """Reads a CSV file and return it as a list of rows."""
    for row in csv.reader(open(filename)):
        data.append(row)
    return data
with open(path_in + data_file) as csvfile:
    read_it = list(csv.reader(csvfile, delimiter=','))
#Counts the number of times a GPS command is observed
def list_msg_type_count(data):
    """Counts the number of times a GPS command is observed.
    Returns a dictionary object."""
    msg_count = dict()
    for row in data:
        try:
            msg_count[row[0]] += 1
        except KeyError:
            msg_count[row[0]] = 1
    return msg_count
print(list_msg_type_count(read_it))
print ("- - - - - - - - - - - - -")
class CreateWorkbook:
    def openworkbook(self, data):
        global output_filename
        output_filename = input('output filename:')
        global workbook
        workbook = xlsxwriter.Workbook(path_out + output_filename + '_' + command_type +'.xlsx')
        self.worksheet = workbook.add_worksheet()
        #formatting definitions
        global bold
        bold = workbook.add_format({'bold': True})
        global date_format
        date_format = workbook.add_format({'num_format': "m/d/yyyy hh:mm:ss"})
        global time_format
        time_format = workbook.add_format({'num_format': "hh:mm:ss"})
    def closeworkbook_msg1(self, data):
        print('closeworkbook')
        #pull data from process_msg1
        (i1, i2, i3) = messagetype.process_msg1(data)
        #sets up the header row
        self.worksheet.write('A1','item1',bold)
        self.worksheet.write('B1', 'item2',bold)
        self.worksheet.write('C1', 'item3',bold)
        self.worksheet.autofilter('A1:C1') #dropdown menu created for filtering
        # Create a For loop to iterate through each row in the XLS file, starting at row 2 to skip the headers
        for r, row in enumerate(data, start=1): #where you want to start printing results inside workbook
            for c, col in enumerate(data):
                self.worksheet.write_column(r,0, i1)
                self.worksheet.write_column(r,1, i2)
                self.worksheet.write_column(r,2, i3)
        workbook.close()
        f.close()
        print('XLSX file named ' + output_filename + '_' + command_type +' was created')
    def closeworkbook_msg2(self, data):
        #pull data from process_msg2
        (i1, i2, i3, i4) = messagetype.process_msg2(data)
        #sets up the header row
        self.worksheet.write('A1','item1',bold)
        self.worksheet.write('B1', 'item2',bold)
        self.worksheet.write('C1', 'item3',bold)
        self.worksheet.write('D1', 'item4',bold)
        self.worksheet.autofilter('A1:D1') #dropdown menu created for filtering
        # Create a For loop to iterate through each row in the XLS file, starting at row 2 to skip the headers
        for r, row in enumerate(data, start=1): #where you want to start printing results inside workbook
            for c, col in enumerate(data):
                self.worksheet.write_column(r,0, i1)
                self.worksheet.write_column(r,1, i2)
                self.worksheet.write_column(r,2, i3)
                self.worksheet.write_column(r,3, i4)
        workbook.close()
        f.close()
        print('XLSX file named ' + output_filename + '_' + command_type + ' was created')
class ConvertFile:
    def convert2csv(self, data):
        # set path to folder containing xlsx files
        os.chdir(path_out)
        # find the file with extension .xlsx
        xlsx = glob.glob(output_filename + '_' + command_type + '.xlsx')
        # create output filenames with extension .csv
        csvs = [x.replace('.xlsx','.csv') for x in xlsx]
        # zip into a list of tuples
        in_out = zip(xlsx,csvs)
        # loop through each file, calling the in2csv utility from subprocess
        for xl,csv in in_out:
            out = open(csv,'w')
            command = 'c:/python34/scripts/in2csv %s\\%s' % (path_out,xl)
            proc = subprocess.Popen(command,stdout=out)
            proc.wait()
            out.close()
        print('CSV file named ' + output_filename + '_' + command_type + ' was created')
    def convert2kml(self, data):
        #Input the file name.
        h = open(path_out + output_filename + '_' + command_type + '.csv')
        with h as csvfile2:
            data2 = csv.reader(csvfile2,delimiter=',')
            next(data2)
            #Open the file to be written.
            g = open(output_filename + '_' + command_type +'.kml','w')
            g.write("<?xml version='1.0' encoding='UTF-8'?>\n")
            g.write("<kml xmlns='http://earth.google.com/kml/2.1'>\n")
            g.write("<Document>\n")
            g.write(" <name>" + output_filename + '_' + command_type + '.kml' +"</name>\n")
            for row in data2:
                g.write(" <Placemark>\n")
                g.write("<TimeStamp><when>" + str(row[0]) + "</when></TimeStamp>\n")
                g.write(" <Point>\n")
                g.write(" <coordinates>" + str(row[2]) + "," + str(row[1]) + "</coordinates>\n")
                g.write(" </Point>\n")
                g.write(" </Placemark>\n")
            g.write("</Document>\n")
            g.write("</kml>\n")
            g.close()
        h.close()
        print('and ' + output_filename + '_' + command_type +'.kml was created,too!')
class MessageType:
    def process_msg1(self,data):
        item1 = []
        item2 = []
        item3 = []
        print('printing stuff')
        for row in data:
            if row[0] == 'msg type1':
                item1.append('calculations')
                item2.append('calculations')
                item3.append('calculations')
        print('calculations done')
        return(array(item1),array(item2),array(item3))
    def process_msg2(self,data):
        item1 = []
        item2 = []
        item3 = []
        item4 = []
        print('printing stuff')
        for row in data:
            if row[0] == 'msg type2':
                item1.append('calculations')
                item2.append('calculations')
                item3.append('calculations')
                item4.append('calculations')
        print('calculations done')
        return(array(item1),array(item2),array(item3),array(item4))
class PrintMSG(MessageType):
    def process_msg1(self, data):
        (i1, i2, i3) = super(PrintMSG, self).process_msg1(data)
        print('printing plus plotting using variables from class Message')
    def process_msg2(self, data):
        (i1, i2, i3,i4) = super(PrintMSG, self).process_msg2(data)
        print('printing plus plotting using variables from class Message')
#processing piece
keep_asking = True
while keep_asking:
    command_type = input("What message type do you want to look at?")
    if command_type == 'msg type1':
        createworkbook = CreateWorkbook()
        createworkbook.openworkbook(data)
        msg = MessageType()
        print_msg = PrintMSG()
        print_msg.process_msg1(data)
        createworkbook.closeworkbook_msg1(data)
        convert2csv(data)
        convert2kml(data)
    elif command_type == 'msg type2':
        createworkbook = CreateWorkbook()
        createworkbook.openworkbook(data)
        msg = MessageType()
        print_msg = PrintMSG()
        print_msg.process_msg2(data)
        createworkbook.closeworkbook_msg2(data)
        convert2csv(data)
        convert2kml(data)
    else:
        print("Invalid type:", command_type)
    wannalook = input('Want to look at another message or no?')
    if not wannalook.startswith('y'):
        keep_asking = False
Class definition
The code is fairly long, and there are many things that do not work or could be improved. As a start, take the class CreateWorkbook. You always need to use self as the first argument of a method. (There are a few exceptions, but they are not relevant here.) To be able to use variables defined in one method in another, you need to prefix them with self.:
class CreateWorkbook:
    def openworkbook(self, data):
        self.output_filename = input('output filename:')
        self.workbook = xlsxwriter.Workbook(path_out + self.output_filename + '_' + command_type +'.xlsx')
        self.worksheet = self.workbook.add_worksheet()
        self.bold = self.workbook.add_format({'bold': True}) #shared with the close methods via self
    def closeworkbook_msg1(self, data):
        #pull data from process_msg1, as in the question
        i1, i2, i3 = MessageType().process_msg1(data)
        #sets up the header row
        self.worksheet.write('A1','item1',self.bold)
        self.worksheet.write('B1', 'item2',self.bold)
        self.worksheet.write('C1', 'item3',self.bold)
        self.worksheet.autofilter('A1:C1') #dropdown menu created for filtering
        # Create a For loop to iterate through each row in the XLS file, starting at row 2 to skip the headers
        for r, row in enumerate(data, start=1): #where you want to start printing results inside workbook
            for c, col in enumerate(data):
                self.worksheet.write_column(r,0, i1)
                self.worksheet.write_column(r,1, i2)
                self.worksheet.write_column(r,2, i3)
        self.workbook.close()
        print('XLSX file named ' + self.output_filename + '_' + command_type +' was created')
    def closeworkbook_msg2(self, data):
        #pull data from process_msg2, as in the question
        i1, i2, i3, i4 = MessageType().process_msg2(data)
        #sets up the header row
        self.worksheet.write('A1','item1',self.bold)
        self.worksheet.write('B1', 'item2',self.bold)
        self.worksheet.write('C1', 'item3',self.bold)
        self.worksheet.write('D1', 'item4',self.bold)
        self.worksheet.autofilter('A1:D1') #dropdown menu created for filtering
        # Create a For loop to iterate through each row in the XLS file, starting at row 2 to skip the headers
        for r, row in enumerate(data, start=1): #where you want to start printing results inside workbook
            for c, col in enumerate(data):
                self.worksheet.write_column(r,0, i1)
                self.worksheet.write_column(r,1, i2)
                self.worksheet.write_column(r,2, i3)
                self.worksheet.write_column(r,3, i4)
        self.workbook.close()
        print('XLSX file named ' + self.output_filename + '_' + command_type + ' was created')
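To illustrate the self. prefix without needing xlsxwriter, here is a stripped-down sketch. It also shows one way to avoid the recalculation mentioned in the question: compute the message data once and pass the results to the method that writes them. WorkbookSketch, its in-memory columns dictionary and the sample rows are hypothetical stand-ins, not part of the original script.

```python
class MessageType:
    def process_msg1(self, data):
        """Collect the second and third fields of every 'msg type1' row."""
        item1, item2 = [], []
        for row in data:
            if row[0] == "msg type1":
                item1.append(row[1])
                item2.append(row[2])
        return item1, item2


class WorkbookSketch:
    """Stand-in for CreateWorkbook that keeps columns in memory instead of an .xlsx file."""

    def openworkbook(self, output_filename):
        # Attributes set on self stay available to every other method of this object.
        self.output_filename = output_filename
        self.columns = {}

    def closeworkbook_msg1(self, item1, item2):
        # The results are passed in, so nothing is recalculated here.
        self.columns["item1"] = item1
        self.columns["item2"] = item2
        return self.output_filename + ".xlsx"


# Made-up rows in the 'message type first' layout used by the question.
data = [["msg type1", "10", "20"], ["msg type2", "x", "y"], ["msg type1", "30", "40"]]
item1, item2 = MessageType().process_msg1(data)  # computed exactly once
book = WorkbookSketch()
book.openworkbook("report")
created = book.closeworkbook_msg1(item1, item2)
print(created, book.columns)
```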
Reading csv
This doesn't make much sense:
f = open(path_in + data_file)
read_it = read_csv_file(path_in + data_file)
with f as csvfile:
    readCSV = csv.reader(csvfile,delimiter=',')
I would interpret it as something like this:
with open(path_in + data_file) as csvfile:
    read_it = list(csv.reader(csvfile, delimiter=','))
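Once the rows are in a list, counting the message types does not need the try/except bookkeeping from the question; collections.Counter from the standard library does the same job. A minimal sketch, with a made-up in-memory sample standing in for the real file and a hypothetical function name count_msg_types:

```python
import csv
import io
from collections import Counter

def count_msg_types(csv_file):
    """Read all rows and count how often each value appears in the first column."""
    rows = list(csv.reader(csv_file, delimiter=","))
    counts = Counter(row[0] for row in rows if row)  # skip empty rows
    return rows, counts

# Hypothetical sample: the message type is the first field of each row.
sample = io.StringIO("msg type1,1,2\nmsg type2,3,4\nmsg type1,5,6\n")
rows, counts = count_msg_types(sample)
print(dict(counts))
```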
I successfully simplified a python module that imports data from a spectrometer
(I'm a total beginner, somebody else wrote the model of the code for me...)
I only have one problem: half of the output data (in a .csv file) is surrounded by brackets: []
I would like the file to contain a structure like this:
name, wavelength, measurement
i.e.
a,400,0.34
a,410,0.65
...
but what I get is:
a,400,[0.34]
a,410,[0.65]
...
Is there any simple fix for this?
Is it because measurement is a string?
Thank you
import serial # requires pyserial library
ser = serial.Serial(0)
ofile = file( 'spectral_data.csv', 'ab')
while True:
    name = raw_input("Pigment name [Q to finish]: ")
    if name == "Q":
        print "bye bye!"
        ofile.close()
        break
    first = True
    while True:
        line = ser.readline()
        if first:
            print " Data incoming..."
            first = False
        split = line.split()
        if 10 <= len(split):
            try:
                wavelength = int(split[0])
                measurement = [float(split[i]) for i in [6]]
                ofile.write(str(name) + "," + str(wavelength) + "," + str(measurement) + '\n')
            except ValueError:
                pass # handles the table heading
        if line[:3] == "110":
            break
    print " Data gathered."
    ofile.write('\n')
The brackets appear because measurement is a list, and str() of a list includes them. Do this:
measurement = [float(split[i]) for i in [6]]
ofile.write(str(name) + "," + str(wavelength) + "," + ",".join(str(m) for m in measurement) + '\n')
OR
ofile.write(str(name) + "," + str(wavelength) + "," + split[6] + '\n')
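A small self-contained sketch of the difference, written for Python 3 (the question's script is Python 2, but the string handling is the same). The helper name format_row is made up for illustration.

```python
def format_row(name, wavelength, measurements):
    """Join the name, wavelength and each measurement into one CSV line (no newline)."""
    fields = [str(name), str(wavelength)] + [str(m) for m in measurements]
    return ",".join(fields)

print(str([0.34]))                   # [0.34]  (the brackets come from the list)
print(format_row("a", 400, [0.34]))  # a,400,0.34
```

Because the helper joins every element of the list, it also works if more than one column index is collected later.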