Condensing Repetitive Code - Python

I created a simple application that loads multiple CSVs and stores them in lists.
import csv
import collections

list1 = []
list2 = []
list3 = []

l = open("file1.csv")
n = open("file2.csv")
m = open("file3.csv")

csv_l = csv.reader(l)
csv_n = csv.reader(n)
csv_p = csv.reader(m)

for row in csv_l:
    list1.append(row)
for row in csv_n:
    list2.append(row)
for row in csv_p:
    list3.append(row)

l.close()
n.close()
m.close()
I wanted to create a function responsible for this, to avoid repetition and clean up the code, so I was thinking of something like this:
def read(filename):
    x = open(filename)
    y = csv.reader(x)
    for row in y:
        list1.append(row)
    x.close()
However, it gets tough for me at the for loop which appends to the list. This works for appending to one list, but if I pass another file name into the function it will append to the same list. I'm not sure of the best way to go about this.

You just need to create a new list each time, and return it from your function:
def read(filename):
    rows = []
    x = open(filename)
    y = csv.reader(x)
    for row in y:
        rows.append(row)
    x.close()
    return rows
Then call it as follows:
list1 = read("file1.csv")
Another option is to pass the list in as an argument to your function - then you can choose whether to create a new list each time, or append multiple CSVs to the same list:
def read(filename, rows):
    x = open(filename)
    y = csv.reader(x)
    for row in y:
        rows.append(row)
    x.close()
    return rows

# One list per file:
list1 = []
read("file1.csv", list1)

# Multiple files combined into one list:
listCombined = []
read("file2.csv", listCombined)
read("file3.csv", listCombined)
I have used your original code in my answer, but see also Malik Brahimi's answer for a better way to write the function body itself using with and list(), and DogWeather's comments - there are lots of different choices here!

You can make a single function, but use a with statement to condense even further:
def parse_csv(path):
    with open(path) as csv_file:
        return list(csv.reader(csv_file))

I like @DNA's approach. But consider a purely functional style. This can be framed as a map operation which converts
["file1.csv", "file2.csv", "file3.csv"]
to...
[list_of_rows, list_of_rows, list_of_rows]
This function would be invoked like this:
l, n, m = map_to_csv(["file1.csv", "file2.csv", "file3.csv"])
And map_to_csv could be implemented something like this:
def map_to_csv(filenames):
    return [list(csv.reader(open(filename))) for filename in filenames]
The functional solution is shorter and doesn't need temporary variables.
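One caveat with the comprehension above: it never explicitly closes the files it opens. A variant of the same functional idea that closes each file promptly (a sketch; `read_rows` is a helper name introduced here, not from the original answers):

```python
import csv

def read_rows(filename):
    # Open each file in a with block so the handle is closed promptly
    with open(filename) as f:
        return list(csv.reader(f))

def map_to_csv(filenames):
    # Same shape as before: one list of rows per input file
    return [read_rows(name) for name in filenames]
```

It is invoked exactly as before: l, n, m = map_to_csv(["file1.csv", "file2.csv", "file3.csv"]).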

Appending array to a larger one and then returning it

My goal is to create a function that takes a CSV file as input. From that CSV file, I want it to create an array where the data in the first column is the x-coordinate and the data in the second column is the y-coordinate. For every row in the data file, I want it to add the data to the array.
import numpy as np

doc = open("d.csv")
headers = doc.readline()

def generateArray(doc):
    for theData in doc:
        splitDocument = theData.strip().split(",")
        x = splitDocument[0]
        y = splitDocument[1]
        createArray = np.array((x, y))
    return createArray

print(generateArray(doc))
When I return it, it simply returns the last row of the CSV file, when in fact I want all of the arrays returned. Functions I have tried, like .append(), give me an error saying "append cannot be used on a 0 dimensional array." Any suggestions on how I can edit my code to return all the arrays?
As a fast (in terms of changing your code) solution, use a generator:
def generateArray(doc):
    for theData in doc:
        splitDocument = theData.strip().split(",")
        x = splitDocument[0]
        y = splitDocument[1]
        createArray = [x, y]
        yield createArray

arr = np.array(list(generateArray(doc)))
print(arr)
As a better solution, I would suggest that you check how to use np.loadtxt.
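For instance, assuming d.csv has a one-line header followed by comma-separated numeric columns, the whole function above collapses to a single call (a sketch; `load_xy` is a hypothetical wrapper name):

```python
import numpy as np

def load_xy(path):
    # skiprows=1 drops the header line; loadtxt converts fields to floats,
    # and ndmin=2 keeps the result 2-D even for a one-row file
    return np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
```

Then data = load_xy("d.csv") gives an array where data[:, 0] is x and data[:, 1] is y.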

Writing to textfile in a specific way using a list in python

I am trying to write a list to a text file in Python in a particular output format.
I have a class called Phonebook, and a list containing objects of the Phonebook class.
My constructor looks like this:
def __init__(self, name, number):
    self.name = name
    self.number = number
When I add a new object to the list, it looks like this:
def add(self):
    name = input()
    number = input()
    p = Phonebook(name, number)
    list.append(p)
The function that writes my list to the text file looks like this:
def save():
    f = open("textfile.txt", "w")
    for x in list:
        f.write(x.number + ";" + x.name + ";")
    f.close()
And it writes out:
12345;david;12345;dave;12345;davey;09876;cathryn;09876;cathy; and so on..
It should look like this:
12345;david,dave,davey
09876;cathryn,cathy,
78887;peter,pete,petr,petemon
My question is then: how do I implement this save function so it writes out each unique number only once, followed by all the names connected to that number?
It feels like it's impossible with only a list containing names and numbers, but maybe I'm wrong.
Dictionaries in Python give you fast access to items based on their key. So a good solution to your problem would be to index the Phonebook objects using the Phonebook.number as the key to store a list of Phonebooks as the values. Then at the end just handle the printing based on however you want each line to appear.
This example should work in your case:
phone_dict = dict() # Used to store Phonebook objects instead of a list

def add(self):
    name = input()
    number = input()
    p = Phonebook(name, number)
    if p.number in phone_dict:
        phone_dict[p.number].append(p) # Append p to list of Phonebooks for same number
    else:
        phone_dict[p.number] = [p] # Create list for new phone number key
def save():
    f = open("textfile.txt", "w")
    # Loop through all keys in dict
    for number in phone_dict:
        f.write(number + ";") # Write out number
        phone_books = phone_dict[number]
        # Loop through all phone_books associated with number
        for i, pb in enumerate(phone_books):
            f.write(pb.name)
            # Only append comma if not last value
            if i < len(phone_books) - 1:
                f.write(",")
        f.write("\n") # Go to next line for next number
    f.close()
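The membership test in add can also be condensed with collections.defaultdict, which creates the list the first time a number is seen (a sketch; `add_entry` and the minimal Phonebook class here are stand-ins for the asker's method and class):

```python
from collections import defaultdict

class Phonebook:
    # Minimal stand-in for the asker's class
    def __init__(self, name, number):
        self.name = name
        self.number = number

phone_dict = defaultdict(list)  # missing keys start as an empty list

def add_entry(name, number):
    # No if/else needed: defaultdict creates the list on first access
    phone_dict[number].append(Phonebook(name, number))
```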
So how would the load function look?
I have tried writing one, and it loads everything into the dictionary, but the program doesn't work with my other functions like it did before I saved the data and reloaded it into the program:
def load(self, filename):
    self.dictList = {}
    f = open(filename, "r")
    for readLine in f:
        readLine = readLine.split(";")
        number = readLine[0]
        nameLength = len(readLine[1:])
        name = readLine[1:nameLength]
        p = phonebook(name)
        self.dictList[number] = [p]
    print(self.dictList)
    f.close()
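One problem with the attempt above is that save writes the names comma-separated after the number, while this load splits only on ";", so the whole name group ends up in a single field. A hedged sketch of a load that mirrors the save format (one "number;name1,name2,..." record per line; the minimal Phonebook class stands in for the asker's):

```python
class Phonebook:
    # Minimal stand-in for the asker's class
    def __init__(self, name, number):
        self.name = name
        self.number = number

def load(filename):
    # Assumes each saved line looks like "12345;david,dave,davey"
    phone_dict = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            number, names = line.split(";", 1)
            # Rebuild one Phonebook per name, mirroring what add() stored
            phone_dict[number] = [Phonebook(n, number) for n in names.split(",") if n]
    return phone_dict
```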

Python Dynamic Data Structures

I am going to read the lines of a given text file and select several chunks of data whose format is (int, int\n). The number of lines is different every time, so I need a dynamically sized data structure in Python. I would also like to store those chunks in a 2D data structure. If you are familiar with MATLAB programming, I'd like to have something like a structure A{n}, where n = the number of chunks of data and each chunk includes several lines of the data mentioned above.
Which type of data structure would you recommend? and how to implement with it?
i.e. A{0} = ([1,2],[2,3],[3,4]) A{1} = ([1,1],[2,2],[5,5],[7,4]) and so on.
Thank you
A Python list can contain lists as well as any other data type.
l = []
l.append(2)      # l is now [2]
l.extend([3, 2]) # l is now [2, 3, 2]
l.append([4, 5]) # l is now [2, 3, 2, [4, 5]]

list.append adds its argument to the list as a single element, while list.extend appends each element of its argument to the end of the list.
I guess your required list would appear somewhat like this:

l = [[[1, 2], [2, 3], [3, 4]], [[1, 1], [2, 2], [5, 5], [7, 4]]]
PS: Here's a link to jump-start learning Python:
https://learnxinyminutes.com/docs/python/
Just keep in mind that if you are reading data from a text file, the fields come in as strings; you need int() to convert them to integers.
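For example, one line in the (int, int) format could be converted like this (a sketch; `parse_pair` is a hypothetical helper, and the field separator is assumed to be a comma):

```python
def parse_pair(line):
    # "1,2\n" -> [1, 2]; int() raises ValueError if a field is not an integer
    return [int(field) for field in line.strip().split(",")]
```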
The issue was resolved in two steps of appending to lists:

import numpy as np

f = open('data.txt')
str3 = '.PEN_UP\n'  # marks the end of a chunk
A = []  # current chunk
B = []  # list of completed chunks
for line in f.readlines():
    if line == str3:
        B.append(A)  # chunk finished: store it and start a new one
        A = []
    elif line[0].isdigit():
        A.append(line[:-1])  # keep data lines, minus the trailing newline
B.append(A)
f.close()

print(np.shape(A))
print(np.shape(B))

Custom sort method in Python is not sorting list properly

I'm a student in a Computing class and we have to write a program which contains file handling and a sort. I've got the file handling done, and I wrote out my sort (it's a simple sort), but it doesn't sort the list. My code is this:
namelist = []
scorelist = []

hs = open("hst.txt", "r")
namelist = hs.read().splitlines()

hss = open("hstscore.txt", "r")
for line in hss:
    scorelist.append(int(line))

scorelength = len(scorelist)
for i in range(scorelength):
    for j in range(scorelength + 1):
        if scorelist[i] > scorelist[j]:
            temp = scorelist[i]
            scorelist[i] = scorelist[j]
            scorelist[j] = temp
return scorelist
I've not been doing Python for very long, so I know the code may not be efficient, but I really don't want to use a completely different method for sorting it, and we're not allowed to use .sort() or sorted() since we have to write our own sort function. Is there something I'm doing wrong?
def super_simple_sort(my_list):
    switched = True
    while switched:
        switched = False
        for i in range(len(my_list) - 1):
            if my_list[i] > my_list[i + 1]:
                my_list[i], my_list[i + 1] = my_list[i + 1], my_list[i]
                switched = True

super_simple_sort(some_list)
print(some_list)
This is a very simple sorting implementation that is equivalent to yours but takes advantage of a few things to speed it up: we only need one for loop, and we only need to repeat as long as the list is still out of order. Also, Python doesn't require a temp variable for swapping values.
Since it changes the actual list values in place, you don't even need to return.
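For completeness, the original nested-loop approach can also be made to work; the main bug is that range(scorelength + 1) runs one index past the end of the list. A sketch of a corrected version, wrapped in a function so the return is legal:

```python
def simple_sort(scorelist):
    # Compare each position only against the positions after it
    for i in range(len(scorelist)):
        for j in range(i + 1, len(scorelist)):
            if scorelist[i] > scorelist[j]:
                # Tuple assignment swaps without a temp variable
                scorelist[i], scorelist[j] = scorelist[j], scorelist[i]
    return scorelist
```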

map function to columns of list iterator

Imagine I'm reading in a csv file of numbers that looks like this:
1,6.2,10
5.4,5,11
17,1.5,5
...
And it's really really long.
I'm going to iterate through this file with a csv reader like this:

import csv
reader = csv.reader(open('numbers.csv'))
Now assume I have some function that can take an iterator like max:
max((float(rec[0]) for rec in reader))
This finds the max of the first column and doesn't need to read the whole file into memory.
But what if I want to run max on each column of the csv file, still without reading the whole file into memory?
If max were rewritten like this:

def max(iterator):
    themax = float('-inf')
    for i in iterator:
        themax = i if i > themax else themax
        yield
    yield themax
I could then do some fancy work (and have) to make this happen.
But what if I constrain the problem and don't allow max to be rewritten? Is this possible?
Thanks!
If you're comfortable with a more functional approach you can use functools.reduce to iterate through the file, pulling only two rows into memory at once, and accumulating the column-maximums as it goes.
import csv
from functools import reduce

def column_max(row1, row2):
    # zip contiguous rows and apply max to each of the column pairs
    return [max(float(c1), float(c2)) for (c1, c2) in zip(row1, row2)]

reader = csv.reader(open('numbers.csv'))
# calling `next` on reader advances its state by one row
first_row = next(reader)
column_maxes = reduce(column_max, reader, first_row)

# another way to write this code is to unpack the reduction into explicit iteration
column_maxes = next(reader)  # advances `reader` to its second row
for row in reader:
    column_maxes = [max(float(c1), float(c2)) for (c1, c2) in zip(column_maxes, row)]
I would just move away from passing the iterator to a function, and instead iterate over the reader yourself:
maxes = []
for row in reader:
    for i in range(len(row)):
        if i >= len(maxes):  # first row seen: no running max for this column yet
            maxes.append(float(row[i]))
        else:
            maxes[i] = max(maxes[i], float(row[i]))

At the end, you will have the list maxes containing each column's maximum value, without having had the whole file in memory.
def col_max(x0, x1):
    """x0 is a list of the accumulated maxes so far,
    x1 is a line from the file."""
    # float(b) is needed because the csv reader yields strings
    return [max(a, float(b)) for a, b in zip(x0, x1)]

Now functools.reduce(col_max, reader, initializer) will return just what you want. You will have to supply initializer as a list of -inf's of the correct length.
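Putting that together (a sketch; `column_maxes` is a hypothetical wrapper, the column count must be known up front, and float(b) handles the strings the reader yields):

```python
import csv
from functools import reduce

def col_max(x0, x1):
    # x0: accumulated per-column maxes; x1: one row of strings from the reader
    return [max(a, float(b)) for a, b in zip(x0, x1)]

def column_maxes(path, ncols):
    # One -inf per column, so the first real row always wins the comparison
    initializer = [float("-inf")] * ncols
    with open(path, newline="") as f:
        return reduce(col_max, csv.reader(f), initializer)
```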
