Appending to lists created from range - python

I'm trying to append multiple columns of a csv to multiple lists. Column 1 will go in list 1, column 2 will go in list 2 etc...
However, I don't want to hard-code the number of columns, so that the script works with different csv files. So I've used a column count to decide how many lists there should be.
I'm coming unstuck when trying to append values to these lists, though. I've initialised a count that should assign the right column to the right list, but the loop seems to exit after the first pass and won't append the other columns to their lists.
import csv
#open csv
f = open('attendees1.csv')
csv_f = csv.reader(f)
#count columns
first_row = next(csv_f)
num_cols = len(first_row)
#create multiple lists (within lists) based on column count
d = [[] for x in xrange(num_cols)]
#initiate count
count = 0
#im trying to state that whilst the count is less than the amount of columns, rows should be appended to lists, which list and which column will be defined by the [count] value.
while count < (num_cols):
    for row in csv_f:
        d[count].append(row[count])
    count += 1
    print count
print d

csv_f is an iterator, so the for row in csv_f: loop consumes it completely on the first pass of the while loop, and it does not reset afterwards. On every later pass the for loop has nothing left to read, so only the first list is ever filled.
You can read in everything as a list of rows, then transpose it to create a list of columns:
import csv
with open('attendees1.csv', 'r') as f:
    csv_f = csv.reader(f)
    first_row = next(csv_f) # Throw away the first row
    d = [row for row in csv_f]
d = list(zip(*d)) # list() is needed on Python 3, where zip returns an iterator
See Transpose a matrix in Python.
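As a rough, self-contained illustration of the transpose approach on Python 3, here is a minimal sketch. The sample data, the SAMPLE name and the read_columns helper are made up for the example and stand in for attendees1.csv; they are not part of the original question.

```python
import csv
import io

# Hypothetical sample standing in for attendees1.csv.
SAMPLE = "name,age,city\nAnn,31,Oslo\nBob,45,Rome\n"

def read_columns(fileobj):
    """Return (header, columns), where columns[i] holds every value of column i."""
    reader = csv.reader(fileobj)
    header = next(reader)   # the first row is treated as the header
    rows = list(reader)     # remaining rows, read in a single pass
    # zip(*rows) transposes the rows into columns; each column becomes a list.
    columns = [list(col) for col in zip(*rows)]
    return header, columns

header, columns = read_columns(io.StringIO(SAMPLE))
print(header)
print(columns)
```

Two things to keep in mind with this sketch: zip stops at the shortest row, so ragged rows are silently truncated, and a file with only a header row yields an empty list of columns rather than one empty list per column.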
If you want to keep re-reading the CSV file in the same manner as the OP, you can do that as well (but this is extremely inefficient):
while count < (num_cols):
    for row in csv_f:
        d[count].append(row[count])
    count += 1
    print count
    f.seek(0) # rewind to the beginning of the file
    next(csv_f) # throw away the first line again
See Python csv.reader: How do I return to the top of the file?.

Transposing the list of rows is a very elegant answer. There is another solution, not so elegant, but a little more transparent for a beginner.
Read rows, and append each element to the corresponding list, like so:
for row in csv_f:
    for i in range(len(d)):
        d[i].append(row[i])

Related

How to store data in python with number of row limit

For a project I have devices that send payloads, and I need to store them in a local file. I have a memory limitation, so I don't want to store more than 2000 rows of data. For the same reason I cannot use a database, so I chose to store the data in a csv file.
I tried with open('output.csv', 'r+') as f:, appending the rows to the end of my csv, and each time I have to check the length with sum(1 for line in f) to make sure it is not more than 2000.
The big problem starts when I reach 2000 rows. Ideally I want to delete the first row and add another row to the end, or start writing rows from the beginning of the file and overwrite the old rows without deleting everything, but I don't know how to do it. I tried open('output.csv', 'w+') and open('output.csv', 'a+'), but w+ deletes all the contents while writing only one row, and a+ just continues to append to the end. On the other hand, I can no longer count the number of rows with either of them. Can you please tell me which command I should use to start rewriting each line from the beginning, or to delete one line from the beginning and append one to the end? I would also appreciate it if you could tell me whether there is a better choice than csv files for storing this much data, or a better way to count the number of rows.
This should help. See the comments inline.
import pandas as pd
allowed_length = 2 # Set it to the required value
df = pd.read_csv('output.csv') # Read your csv file into df
row_count = df.shape[0] # Get row count
df.loc[row_count] = ['Fridge', 15] # Insert row at end of df. In my case it has only 2 values
# If the row count is greater than or equal to allowed_length, then delete the first row
if row_count >= allowed_length:
    df = df.drop(df.head(1).index)
df.to_csv('output.csv', index=False)
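If pandas is too heavy for a memory-constrained device, one possible standard-library alternative is a collections.deque with maxlen, which discards its oldest item automatically once it is full. The sketch below is only an illustration under a few assumptions: the file has no header row, rewriting the whole file on each append is acceptable for about 2000 rows, and the append_capped helper and MAX_ROWS name are hypothetical.

```python
import csv
import os
import tempfile
from collections import deque

MAX_ROWS = 2000  # row cap taken from the question

def append_capped(path, new_row, max_rows=MAX_ROWS):
    """Append new_row to the CSV at path, keeping only the newest max_rows rows."""
    rows = deque(maxlen=max_rows)   # a full deque drops its oldest item on append
    if os.path.exists(path):
        with open(path, newline="") as f:
            rows.extend(csv.reader(f))
    rows.append([str(v) for v in new_row])
    # Rewrite the file with the surviving rows (assumes no header row).
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)

# Small demonstration with a cap of 3 rows in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.csv")
    for i in range(5):
        count = append_capped(demo_path, ["device", i], max_rows=3)
    with open(demo_path, newline="") as f:
        kept = list(csv.reader(f))
print(count, kept)
```

The return value doubles as the current row count, so there is no need for a separate sum(1 for line in f) pass.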

Openpyxl, Pandas or both

I'm trying to process an excel file so that i can use each row and column for specific operations later on. 
My problem is as follows:
Using Openpyxl made it easier for me to load the file and be able to iterate over the rows
from openpyxl import load_workbook
#reading the excel file
path = r'Datasets/Chapter 1/Table B1.1.xlsx'
wb = load_workbook(path) #loading the excel table
ws = wb.active #grab the active worksheet
#Setting the doc Header
for h in ws.iter_rows(max_row = 1, values_only = True): #getting the first row (Headers) in the table
    header = list(h)
for sh in ws.iter_rows(min_row = 1 ,max_row = 2, values_only = True): #after the loop, sub_header holds the second row
    sub_header = list(sh)
#removing all of the none Values
header = list(filter(None, header))
sub_header = list(filter(None, sub_header))
#creating a list of all the rows in the excel file
row_list = []
for row in ws.iter_rows(min_row=3): #Iteration over every single row starting from the third row since first two are the headers
    row = [cell.value for cell in row] #Creating a list from each row
    row = list(filter(None, row)) #removing the none values from each row
    row_list.append(row) #creating a list of all rows (starting from the 3d one)
colm = []
for col in ws.iter_cols(min_row=3,min_col = 1): #Iteration over every single column, starting from the third row since first two are the headers
    col = [cell.value for cell in col] #Creating a list from each column
    col = list(filter(None, col)) #removing the none values from each column
    colm.append(col) #creating a list of all columns (from the 3rd row down)
but at the same time (as far as I've read in the docs), I can't visualize it or do direct operations on the rows or columns.
Pandas is more efficient for direct operations on rows and columns, but I've read that iterating over a dataframe to get the rows into lists is not recommended. Even with df.iloc[2:] it would not give me the same result (each row saved in its own list), since the headers would always be there. However, unlike Openpyxl, pandas makes direct operations on columns much easier, for example df[col1]-df[col2] using the column names, which is something I need to do (just putting all the column values in a list won't do it for me).
So my question is whether there is a way to do what I want using only one of them, or whether using both of them isn't that bad, keeping in mind I'd have to load the excel file twice.
Thanks in advance!
There is no problem with reading the excel file once using openpyxl and then loading the rows into pandas:
pandas.DataFrame(row_list, columns=header)
You are right, iterating over a DataFrame using indexes is quite slow, but you have other options: apply(), iterrows(), itertuples()
Link: Different ways to iterate over rows in pandas DataFrame
I would also like to point out that your code probably does not do what you intend:
list(filter(None, header)) filters out not only None but all falsy values, such as 0 or "".
Such filtering shifts the columns. For example, say you have a row [1, None, 3] and columns ['a', 'b', 'c']. After filtering out None you get [1, 3], which will be matched to columns 'a' and 'b'.
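The two pitfalls described above can be reproduced with plain Python, without openpyxl or pandas. This is only a small sketch; the zero_row sample and the is_blank helper are made up for illustration.

```python
header = ["a", "b", "c"]
row = [1, None, 3]
zero_row = [0, "", 5]   # hypothetical row containing legitimate falsy values

# filter(None, ...) drops every falsy value, not just None.
only_truthy = list(filter(None, zero_row))

# Misaligned: 3 came from column 'c' but is now paired with 'b'.
shifted = dict(zip(header, filter(None, row)))

# Keeping the empty cells preserves the column positions.
aligned = dict(zip(header, row))

def is_blank(values):
    """True only when every cell in the row is None."""
    return all(v is None for v in values)

print(only_truthy)
print(shifted)
print(aligned)
```

If the goal is only to skip rows that are completely empty, a check along the lines of is_blank leaves partially filled rows intact, so they still line up with the header.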

Only getting every nth value of each row in a csv file

So I have a .csv file with 35 columns, some of which I want to write to a database.
I only need about 4 of these columns - is it possible to just write say the 3rd value, the 25th value, and the 29th value in each row to a MySQL database?
Either that, or can I only write where the values are "Year", "Amount", and "Whatever"?
Now I know I could just truncate the Excel file, but it's for a college assignment, so I wanted to show a "techy" solution.
Maybe something like this?
desired_rows = [...] # Rows you'd like to read, 0-based
for number, row in enumerate(reader):
    if number not in desired_rows:
        continue
    # Do stuff with the rows you want
    ....
You could use operator.itemgetter to create a function that retrieves the selected elements from each row each time it's called.
Something like the following. Note that I subtract 1 from each column number because the first column is at index 0 of each row, the second at index 1, etc.
import csv
from operator import itemgetter
COLS = 3, 25, 29
filename = 'columns.csv'
getters = itemgetter(*(col-1 for col in COLS))
with open(filename, newline='') as csvfile:
    for row in csv.reader(csvfile):
        print(getters(row))
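For the other half of the question, selecting by header name rather than by position, csv.DictReader may be a good fit, assuming the first row of the file holds the column names. The sample data, the WANTED tuple and the select_columns helper below are hypothetical; the column names are the ones mentioned in the question.

```python
import csv
import io

# Hypothetical sample; the real file has 35 columns.
SAMPLE = (
    "Id,Year,Region,Amount,Whatever\n"
    "1,2019,North,10.5,x\n"
    "2,2020,South,7.25,y\n"
)
WANTED = ("Year", "Amount", "Whatever")

def select_columns(fileobj, wanted):
    """Yield one tuple per data row containing only the wanted columns."""
    reader = csv.DictReader(fileobj)   # uses the first row as the field names
    for record in reader:
        yield tuple(record[name] for name in wanted)

selected = list(select_columns(io.StringIO(SAMPLE), WANTED))
print(selected)
```

Every value comes back as a string, so numeric columns would still need converting before being written to a database.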

How to remove duplicated rows in a csv file without pandas in python?

Pandas is not allowed in the solution; only the Python standard library is allowed. I have a csv file that contains one column (left side). How do I remove the duplicated rows to make the csv look exactly like the right side? "25,60" and "60,25" should be seen as a pair of duplicated rows. For each pair of duplicated rows, the row kept should be the one in the format "A,B" where A < B, and the row removed should be the one where A > B. In this case, "25,60" and "80,123" should be kept. A unique row should stay exactly as it is.
k = []
with open('file.csv','r') as dat, open('newfile.csv','w') as f:
    for i in dat:
        a = sorted(int(j) for j in i.split(','))
        if a not in k:
            k.append(a)
            f.write(','.join([str(m) for m in a]) +'\n')
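Note that the snippet above writes every row in sorted order, including rows that have no duplicate, whereas the question asks for unique rows to stay as they are. A possible variant that respects this, using a set for faster membership tests, is sketched below. The sample data and the dedupe_pairs helper are made up for illustration, and the sketch assumes every field is an integer.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: two duplicated pairs and one unique row.
SAMPLE = "60,25\n25,60\n123,80\n80,123\n90,30\n"

def dedupe_pairs(lines):
    """Collapse rows that match after sorting; leave unique rows unchanged."""
    rows = [tuple(int(v) for v in rec) for rec in csv.reader(lines) if rec]
    counts = Counter(tuple(sorted(r)) for r in rows)
    seen = set()
    result = []
    for r in rows:
        key = tuple(sorted(r))
        if key in seen:
            continue   # a row with the same values was already kept
        seen.add(key)
        # Duplicated rows are kept in ascending order; unique rows stay as read.
        result.append(key if counts[key] > 1 else r)
    return result

deduped = dedupe_pairs(io.StringIO(SAMPLE))
print(deduped)
```

The result keeps the order in which each pair first appears; writing it back out could be done with csv.writer(...).writerows(deduped).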

Creating a list using csv.reader()

I'm trying to create a list in python from a csv file. The CSV file contains only one column, with about 300 rows of data. The list should (ideally) contain a string of the data in each row.
When I execute the below code, I end up with a list of lists (each element is a list, not a string). Is the CSV file I'm using formatted incorrectly, or is there something else I'm missing?
import csv
filelist = []
with open(r'D:\blah\blahblah.csv', 'r') as expenses:
    reader = csv.reader(expenses)
    for row in reader:
        filelist.append(row)
Each row is a list with one field. You need to get the first item in that row:
filelist.append(row[0])
Or more concisely:
filelist = [row[0] for row in csv.reader(expenses)]
It seems your "csv" doesn't contain any separator like ";" or ",".
You said it only contains one column, so there is nothing to separate and the csv module isn't strictly needed.
So you could simply read the file line by line:
filelist = []
with open(r'D:\blah\blahblah.csv', 'r') as f:
    for line in f:
        filelist.append(line.strip())
Each row is read as a list of cells.
So what you want to do is
output = [ row[0] for row in reader ]
since you only have the first cell filled out in each row.
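To see the difference between the two results side by side, here is a small self-contained sketch. The SAMPLE string is hypothetical one-column data; it includes a blank line because csv.reader yields an empty list for blank lines, and row[0] would fail on it.

```python
import csv
import io

# Hypothetical one-column data, including a blank line.
SAMPLE = "rent\ngroceries\n\ncoffee\n"

# csv.reader yields one list per line, even when there is only one column.
nested = list(csv.reader(io.StringIO(SAMPLE)))

# Take the first cell of each row, skipping empty rows to avoid an IndexError.
flat = [row[0] for row in csv.reader(io.StringIO(SAMPLE)) if row]

print(nested)
print(flat)
```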
