So I have a .csv file with 35 columns, some of which I want to write to a database.
I only need about 4 of these columns - is it possible to write just, say, the 3rd, the 25th, and the 29th value in each row to a MySQL database?
Either that, or can I write only the columns whose headers are "Year", "Amount", and "Whatever"?
Now I know I could just truncate the Excel file, but it's for a college assignment, so I wanted to show a "techy" solution.
Maybe something like this?
import csv

desired_rows = [...]  # rows you'd like to read, 0-based

with open('data.csv', newline='') as csvfile:  # file name is a placeholder
    for number, row in enumerate(csv.reader(csvfile)):
        if number not in desired_rows:
            continue
        # do stuff with the rows you want
You could use operator.itemgetter to create a function that retrieves those elements from a row each time it's called.
Something like the following; note that I subtract 1 from each column number because the first column is at index 0, the second at index 1, etc.
import csv
from operator import itemgetter

COLS = 3, 25, 29
filename = 'columns.csv'
getters = itemgetter(*(col - 1 for col in COLS))

with open(filename, newline='') as csvfile:
    for row in csv.reader(csvfile):
        print(getters(row))
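If you then want to get those selected values into MySQL, here is a minimal sketch using the mysql-connector-python package; the connection details and the table/column names (results, year, amount, whatever) are made-up placeholders, not something from your schema:

import csv
from operator import itemgetter

import mysql.connector  # pip install mysql-connector-python

COLS = 3, 25, 29
getter = itemgetter(*(col - 1 for col in COLS))

with open('columns.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row, if there is one
    rows = [getter(row) for row in reader]

# connection details and table/column names below are placeholders
conn = mysql.connector.connect(host='localhost', user='me',
                               password='secret', database='mydb')
cur = conn.cursor()
cur.executemany(
    'INSERT INTO results (year, amount, whatever) VALUES (%s, %s, %s)',
    rows)
conn.commit()
conn.close()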
For a project I have devices that send payloads, and I need to store them in a local file. I have a memory limitation, so I don't want to store more than 2000 data rows. Because of the same limitation I cannot use a database, so I chose to store the data in a CSV file.
I tried to use open('output.csv', 'r+') as f:; I'm appending the rows to the end of my CSV, and each time I have to check the length with sum(1 for line in f) to be sure it's not more than 2000.
The big problem starts when I reach 2000 rows: ideally I want to delete the first row and add another row to the end, or start writing rows from the beginning of the file and overwrite the old rows without deleting everything, but I don't know how to do it. I tried open('output.csv', 'w+') and open('output.csv', 'a+'), but w+ deletes all the contents while writing only one row, and a+ just continues to append to the end. On top of that, I cannot count the number of rows anymore with either mode. Can you please tell me which mode I should use to rewrite each line from the beginning, or to delete one line from the beginning and append one to the end? I would also appreciate it if you could tell me whether there is a better choice than CSV files for storing this much data, or a better way to count the number of rows.
This should help; see the comments inline.
import pandas as pd

allowed_length = 2  # set this to the required value (e.g. 2000)

df = pd.read_csv('output.csv')      # read your csv file into a DataFrame
row_count = df.shape[0]             # get the row count
df.loc[row_count] = ['Fridge', 15]  # insert a row at the end; in my case it has only 2 values
# if the row count is greater than or equal to allowed_length, delete the first row
if row_count >= allowed_length:
    df = df.drop(df.head(1).index)
df.to_csv('output.csv', index=False)
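As an alternative sketch without pandas (assuming the same output.csv and two-field rows as above): keep the rows in a collections.deque with maxlen set to the limit, so the oldest row is dropped automatically, and rewrite the file after each append:

import csv
from collections import deque

MAX_ROWS = 2000

# load any existing rows, keeping only the newest MAX_ROWS
try:
    with open('output.csv', newline='') as f:
        rows = deque(csv.reader(f), maxlen=MAX_ROWS)
except FileNotFoundError:
    rows = deque(maxlen=MAX_ROWS)

def append_row(row):
    # the deque silently drops its oldest entry once MAX_ROWS is reached
    rows.append(row)
    with open('output.csv', 'w', newline='') as f:
        csv.writer(f).writerows(rows)

append_row(['Fridge', 15])

Rewriting the whole file on every append is O(n), but with at most 2000 small rows that is usually acceptable, and it keeps the file consistent at all times.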
I'm just starting out with Python and am struggling to extract a value from the first column of the last row of my dataframe.
So let's say I have a .csv file with 3 columns:
id,name,country
1,bob,USA
2,john,Brazil
3,brian,austria
I'm trying to extract '3' from the id column (the ID value in the last row):
import csv

fileName = open('data.csv')
reader = csv.reader(fileName, delimiter=',')
count = 0
for row in reader:
    count = count + 1
I'm able to get the row count, but am unsure how to get the value from that particular column.
This should do the job:
import csv

fileName = open('123.csv')
reader = csv.reader(fileName, delimiter=',')
count = 0
for row in reader:
    if count == 3:  # the header is row 0, so the last data row is row 3
        print(row[0])
    count = count + 1
But it's better to import pandas and convert your csv file to a dataframe like this:
import csv
import pandas as pd

fileName = open('123.csv')
reader = csv.reader(fileName, delimiter=',')
df = pd.DataFrame(reader)
print(df.loc[3][0])
It would be easier to grab whatever element you want.
Using loc, you can access any element by its row number and column number; for example, you wanted to grab the element 3, which is in row 3, column 0, so you grab it with df.loc[3][0].
If you don't have pandas installed, install it from the command prompt using the command:
pip install pandas
I found your question a bit ambiguous, so I'm answering for both cases.
If you need the first column, third row value:
import csv

value = None
with open('data.csv') as fileName:
    reader = csv.reader(fileName, delimiter=',')
    for row_number, row in enumerate(reader, 1):
        if row_number == 3:
            value = row[0]
If you need the first column, last row value:
value = None
with open('data.csv') as fileName:
    reader = csv.reader(fileName, delimiter=',')
    for row in reader:
        value = row[0]
In both cases, value has the value you want.
As mentioned in the comments, df['id'].iloc[-1] will return the last id value in the DataFrame, which in this case is what you want.
You can also access rows based on the values in other columns. For example:
df.id[(df.name == 'brian')] would also give you a value of 3, because brian is the name associated with an id of 3.
You also don't have to loop through the DataFrame rows to get its size; once the DataFrame is loaded you can simply do count = df.shape[0], which returns the number of rows.
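Putting those pieces together, a minimal sketch (assuming the three-row data.csv from the question):

import pandas as pd

df = pd.read_csv('data.csv')        # the header row becomes the column names

print(df['id'].iloc[-1])            # 3 -- the id value in the last row
print(df.id[(df.name == 'brian')])  # the id(s) where name == 'brian', as a Series
print(df.shape[0])                  # 3 -- the number of rows, no loop needed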
Given that you are starting with Python, and looking at the code provided, I think this Idiomatic Python video will be super helpful: Transforming Code Into Beautiful, Idiomatic Python | Raymond Hettinger.
In addition to the pandas documentation linked below, this summary is pretty helpful as well:
Select rows in pandas MultiIndex DataFrame.
Pandas indexing documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
I want to create a program that reads a CSV file and writes to another file. My problem is, the file I'm reading is kind of big and I don't want to go through every column by doing this:
import csv
from collections import defaultdict

columns = defaultdict(list)
reader = csv.DictReader(csvfile)
for row in reader:
    for (k, v) in row.items():
        columns[k].append(v)

print(columns['name'])
print(columns['id'])
...
I wanted to, instead, do columns[0] to find 'name', and so on. Is there any way I can do this?
You are currently reading the CSV with a DictReader, which creates the columns based on their names; in your case you could just use the plain reader:
columns = defaultdict(list)
reader = csv.reader(csvfile)
next(reader)  # skip the header row
for row in reader:
    for i, v in enumerate(row):
        columns[i].append(v)

print(columns[0])
print(columns[1])
I'm not sure that I understand your question. If you are asking "can I read only the first column?", then the short answer is no. CSV is specifically designed to read a fixed number of columns from variable-length records. More specifically, the data is organized as a list of rows, not a list of columns; you can't just seek past what you don't want to read. It sounds like what you are trying to do is reorganize your data into columns.
If you want to minimize the processing of what you do read, it sounds like all you need to do is use csv.reader and skip the first row containing the header. Each row from the reader will return a list of strings, and the construction of this list should be less expensive than a map.
If you collect the list of rows you can then put it in a numpy array. A numpy array will allow you to access columns (e.g., x[:, 0]) or rows (e.g., x[0, :]).
Given that I am not entirely sure what you are asking, my answers may not be what you are looking for; however, whatever your problem is, I am certain you cannot avoid reading the entire file.
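For instance, a minimal sketch of that numpy approach (the file name is an assumption):

import csv
import numpy as np

with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)                      # skip the header row
    x = np.array([row for row in reader])

print(x[:, 0])  # the first column
print(x[0, :])  # the first row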
I'm trying to append multiple columns of a CSV to multiple lists: column 1 will go in list 1, column 2 will go in list 2, etc.
However, I don't want to hard-code the number of columns, so it can work with multiple CSV files. So I've used a column count to decide how many lists there should be.
I'm coming unstuck when trying to append values to these lists, though. I've initiated a count that should assign the right column to the right list, but it seems like the loop just exits after the first pass and won't append the other columns to the lists.
import csv

# open csv
f = open('attendees1.csv')
csv_f = csv.reader(f)

# count columns
first_row = next(csv_f)
num_cols = len(first_row)

# create multiple lists (within a list) based on the column count
d = [[] for x in xrange(num_cols)]

# initiate count
count = 0

# I'm trying to state that while the count is less than the number of columns,
# rows should be appended to the lists; which list and which column are
# defined by the [count] value.
while count < num_cols:
    for row in csv_f:
        d[count].append(row[count])
    count += 1
    print count

print d
The iteration for row in csv_f: does not reset between passes of the while loop; the reader is exhausted after the first pass, so the inner loop body never runs again and only the first column gets filled.
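A quick way to see this behaviour (a small self-contained example, not the OP's file):

import csv
import io

csv_f = csv.reader(io.StringIO('a,b\n1,2\n3,4\n'))
print([row for row in csv_f])  # [['a', 'b'], ['1', '2'], ['3', '4']]
print([row for row in csv_f])  # [] -- the reader is already exhausted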
You can read in everything as a list of rows, then transpose it to create a list of columns:
import csv

with open('attendees1.csv', 'r') as f:
    csv_f = csv.reader(f)
    first_row = next(csv_f)  # throw away the header row
    d = [row for row in csv_f]
d = zip(*d)
See Transpose a matrix in Python.
If you want to keep re-reading the CSV file in the same manner as the OP, you can do that as well (but this is extremely inefficient):
while count < num_cols:
    for row in csv_f:
        d[count].append(row[count])
    count += 1
    print count
    f.seek(0)    # rewind to the beginning of the file
    next(csv_f)  # throw away the header line again
See Python csv.reader: How do I return to the top of the file?.
Transposing the list of rows is a very elegant answer. There is another solution, not so elegant, but a little more transparent for a beginner.
Read rows, and append each element to the corresponding list, like so:
for row in csv_f:
    for i in range(len(d)):
        d[i].append(row[i])
I'm "pseudo" creating a .bib file by reading a csv file and then following this structure writing down every thing including newline characters. It's a tedious process but it's a raw form on converting csv to .bib in python.
I'm using Pandas to read csv and write row by row, (and since it has special characters I'm using latin1 encoder) but I'm getting a huge problem: it only reads the first row. From the official documentation I'm using their method on reading row by row, which only gives me the first row (example 1):
row = next(df.iterrows())[1]
But if I remove the next() and the [1], it gives me the content of every column concatenated into one field (example 2).
Why is this happening? Why does the method from the docs not iterate through all rows nicely? What would the solution from example 1 look like for all rows?
My code:
import csv
import pandas
import bibtexparser
import codecs

colnames = ['AUTORES', 'TITULO', 'OUTROS', 'DATA', 'NOMEREVISTA', 'LOCAL', 'VOL',
            'NUM', 'PAG', 'PAG2', 'ISBN', 'ISSN', 'ISSN2', 'ERC', 'IF', 'DOI',
            'CODEN', 'WOS', 'SCOPUS', 'URL', 'CODIGO BIBLIOGRAFICO', 'INDEXAÇÕES',
            'EXTRAINFO', 'TESTE']
data = pandas.read_csv('test1.csv', names=colnames, delimiter=r";", encoding='latin1')  # , nrows=1
df = pandas.DataFrame(data=data)

with codecs.open('test1.txt', 'w', encoding='latin1') as fh:
    fh.write('#Book{Arp, ')
    fh.write('\n')
    rl = data.iterrows()
    for i in rl:
        ix = str(i)
        fh.write(' Title = {')
        fh.write(ix)
        fh.write('}')
        fh.write('\n')
PS: I'm new to Python and programming; I know this code has flaws and it's not the most effective way to convert CSV to .bib.
The example row = next(df.iterrows())[1] intentionally only returns the first row.
df.iterrows() returns a generator over tuples describing the rows. Each tuple's first entry is the row index and its second entry is a pandas Series with that row's data.
Hence, next(df.iterrows()) returns the next entry of the generator; if next has not been called before, this is the very first tuple.
Accordingly, next(df.iterrows())[1] returns the first row (i.e. the second tuple entry) as a pandas Series.
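A tiny illustration with a made-up two-row DataFrame:

import pandas as pd

df = pd.DataFrame({'Title': ['First', 'Second']})

row_index, row = next(df.iterrows())  # the very first (index, Series) tuple
print(row_index)  # 0
print(row)        # the first row as a pandas Series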
What you are looking for is probably something like this:
for row_index, row in df.iterrows():
    convert_to_bib(row)
Secondly, all your writing to the file handle fh must happen within the with codecs.open('test1.txt', 'w', encoding='latin1') as fh: block, because at the end of that block the file handle is closed.
For example:
with codecs.open('test1.txt', 'w', encoding='latin1') as fh:
    # iterate through all rows
    for row_index, row in df.iterrows():
        # iterate through all elements in the row
        for colname in df.columns:
            row_element = row[colname]
            fh.write('%s = {%s},\n' % (colname, str(row_element)))
Still, I am not sure whether the names of the columns exactly match the BibTeX fields you have in mind; you probably have to map those first. But I hope you get the principle behind the iterations :-)