I'm trying to skip the first pipe-delimited piece of data on each line of my .txt file when reading it with a csv.DictReader. Here is a sample of the data I'm working with:
someCSVfile.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|
someCSVfile1.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|
And here is my code so far:
import csv
reader = csv.reader(open('match_log.txt','rb'), dialect='excel', delimiter='|')
for row in reader:
    skipfirstRow=reader.next()
    skipfirstRowAgain=reader.next()
    Dictreader=csv.DictReader(reader,skipfirstRow)
    print row
I've been researching .next() pretty thoroughly, but that doesn't seem to work. When I print my rows, it prints every row, when I don't want the first row (the .csv files) to be printed. Is there another method that may work?
EDIT: Here is my latest code:
import csv
reader = csv.reader(open('match_log.txt','rb'), dialect='excel', delimiter='|')
data = {}
for row in reader:
    filenameVariable = row[0]
    data = dict(item.split(',') for item in row[1:])
print data
print filenameVariable
Right now, data and filenameVariable are printing the final row when I need all rows. I tried .append but that didn't work. What else could I use?
The .csv parts are the first column/field, not the first row. Advancing reader will indeed skip rows, but won't affect what's in each individual row. (Rows go across!)
If you want to leave off the first item in a sequence, print row[1:] instead of row.
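A minimal sketch of that idea, written for Python 3 (the question uses Python 2) and run against an in-memory, shortened copy of the sample lines rather than match_log.txt. The names sample and records are made up for illustration. It also collects every row in a list instead of overwriting a single variable, and it skips the empty field that the trailing pipe produces.

```python
import csv
import io

# In-memory stand-in for match_log.txt (assumption: same layout as the sample).
sample = (
    "someCSVfile.csv|cust_no,0|streetaddr,1|city,2|\n"
    "someCSVfile1.csv|cust_no,0|streetaddr,1|city,2|\n"
)

records = []  # one (filename, mapping) pair per input row
for row in csv.reader(io.StringIO(sample), delimiter="|"):
    filename = row[0]  # first field: the .csv name
    # row[1:] drops the first field; "if item" skips the empty trailing field.
    mapping = dict(item.split(",") for item in row[1:] if item)
    records.append((filename, mapping))

for filename, mapping in records:
    print(filename, mapping)
```

Appending inside the loop is what keeps all rows; assigning to data on every pass keeps only the last one.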
Related
I have a few thousand Twitter tweets in a csv with one tweet per row (there are blank rows between each tweet). Each column of each row contains different parts of the tweet (like time, text, language, location, etc.) but not each column has the same information (ie: sometimes language appears in column AG or AH or some other one). I'm trying to clean up the data by creating a new CSV containing only English tweets and also filtering out the punctuations from each of these (English) tweets.
I'm currently stuck on how to filter out only the English tweets. This is what I have so far:
import csv
f = open('twitDB.csv')
csv_f = csv.reader(f) # csv_f is a list of lists
for row in csv_f:
    for col in row:
        if col == 'lang:"en"':
            with open('cleaned.csv', 'w') as fp:
                wr = csv.writer(fp, delimiter = ',')
                wr.writerow(row)
                wr.writerow('\n')
The new cleaned.csv only contains the last English tweet (of thousands) in its Row 1. I have a feeling that my code is continuously overwriting the first row of cleaned.csv and not writing each tweet onto the next row but I'm unsure how to fix this.
You need to use open('cleaned.csv', 'a'). The 'a' mode appends each time the file is opened, whereas 'w' truncates the file and overwrites whatever is there each time. That is why you are only seeing one row.
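An alternative that avoids reopening the file for every match is to open the output once, before the loop. The sketch below is for Python 3 and uses in-memory buffers with made-up sample rows in place of twitDB.csv and cleaned.csv; only the lang:"en" marker is taken from the question.

```python
import csv
import io

# Made-up stand-in for twitDB.csv; blank lines separate tweets as in the question.
# In CSV, a field containing quotes is itself quoted with the inner quotes doubled.
source = io.StringIO(
    'hello world,"lang:""en"""\n'
    '\n'
    'hola mundo,"lang:""es"""\n'
    '\n'
    '"lang:""en""",good morning\n'
)
cleaned = io.StringIO()  # stand-in for open('cleaned.csv', 'w', newline='')

writer = csv.writer(cleaned)  # created once, outside the loop
for row in csv.reader(source):
    # The language marker may sit in any column, so test membership.
    if 'lang:"en"' in row:
        writer.writerow(row)

print(cleaned.getvalue())
```

With a real file, the same structure would put the loop inside a single with open(...) block, so 'w' mode truncates the file only once.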
Started learning python after lots of ruby experience. With that context in mind:
I have a csv file that looks something like this:
city_names.csv
"abidjan","addis_ababa","adelaide","ahmedabad"
With the following python script I'd like to read this into a list:
city_names_reader.py
import csv
city_name_file = r"./city_names.csv"
with open(city_name_file, 'rb') as file:
    reader = csv.reader(file)
    city_name_list = list(reader)
print city_name_list
The result surprised me:
[['abidjan', 'addis_ababa', 'adelaide', 'ahmedabad']]
Any idea why I'm getting a nested list rather than a 4-element list? I must be overlooking something self-evident.
A CSV file represents a table of data. A table contains both columns and rows, like a spreadsheet. Each line in a CSV file is one row in the table. One row contains multiple columns, separated by ,
When you read a CSV file you get a list of rows. Each row is a list of columns.
If your file has only one row, you can simply read that row from the list:
city_name_list = city_name_list[0]
Usually each column represents some kind of data (think "column of email addresses"). Each row then represents a different object (think "one object per row, each row can have one email address"). You add more objects to the table by adding more rows.
Wide tables, which grow by adding more columns instead of rows, are uncommon. In your case you have only one kind of data: city names. So you should have one column ("name"), with one row per city. To get city names from such a file you could then read the first element from each row:
city_name_list = [row[0] for row in city_name_list]
In both cases you can flatten the list by using itertools.chain.from_iterable (this needs import itertools):
city_name_list = list(itertools.chain.from_iterable(city_name_list))
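As others suggest, your file is not an idiomatic CSV file. You can simply do the following (note that, unlike csv.reader, this leaves the quote characters and any trailing newline in the strings):
with open(city_name_file, "rb") as fp:
    city_names_list = fp.read().split(",")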
Based on comments, here is a possible solution:
import csv
city_name_file = r"./city_names.csv"
city_name_list = []
with open(city_name_file, 'rb') as file:
    reader = csv.reader(file)
    for item in reader:
        city_name_list += item
print city_name_list
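For comparison, here is a small Python 3 sketch of the same flattening using itertools.chain.from_iterable. It reads the sample line from an in-memory buffer instead of city_names.csv, and the name rows is made up for illustration.

```python
import csv
import io
from itertools import chain

# In-memory stand-in for city_names.csv.
data = io.StringIO('"abidjan","addis_ababa","adelaide","ahmedabad"\n')

rows = list(csv.reader(data))  # a list of rows, each row a list of fields
city_name_list = list(chain.from_iterable(rows))  # join all rows into one flat list
print(city_name_list)
```

This works for any number of rows, whereas indexing with [0] only covers the single-row case.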
I'm "pseudo" creating a .bib file by reading a csv file and then writing everything out following the .bib structure, including newline characters. It's a tedious process, but it's a raw way of converting csv to .bib in Python.
I'm using Pandas to read the csv and write it row by row (and since it has special characters I'm using the latin1 encoding), but I have a big problem: it only reads the first row. I'm using the method from the official documentation for reading row by row, which only gives me the first row (example 1):
row = next(df.iterrows())[1]
But if I remove the next() and [1], it gives me the content of every column concatenated into one field (example 2).
Why is this happening? Why doesn't the method in the docs iterate through all rows nicely? What would the solution for example 1 look like for all rows?
My code:
import csv
import pandas
import bibtexparser
import codecs
colnames = ['AUTORES', 'TITULO', 'OUTROS', 'DATA','NOMEREVISTA','LOCAL','VOL','NUM','PAG','PAG2','ISBN','ISSN','ISSN2','ERC','IF','DOI','CODEN','WOS','SCOPUS','URL','CODIGO BIBLIOGRAFICO','INDEXAÇÕES',
            'EXTRAINFO','TESTE']
data = pandas.read_csv('test1.csv', names=colnames, delimiter =r";", encoding='latin1')#, nrows=1
df = pandas.DataFrame(data=data)
with codecs.open('test1.txt', 'w', encoding='latin1') as fh:
    fh.write('@Book{Arp, ')
    fh.write('\n')
rl = data.iterrows()
for i in rl:
    ix = str(i)
    fh.write(' Title = {')
    fh.write(ix)
    fh.write('}')
    fh.write('\n')
PS: I'm new to python and programming, I know this code has flaws and it's not the most effective way to convert csv to bib.
The example row = next(df.iterrows())[1] intentionally only returns the first row.
df.iterrows() returns a generator over tuples describing the rows. The tuple's first entry contains the row index and the second entry is a pandas series with your data of the row.
Hence, next(df.iterrows()) returns the next entry of the generator. If next has not been called before, this is the very first tuple.
Accordingly, next(df.iterrows())[1] returns the first row (i.e. the second tuple entry) as a pandas series.
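The same behaviour can be reproduced without pandas. In the sketch below, iter_rows is a made-up plain-Python stand-in for df.iterrows() (it is not the pandas implementation) and the records are invented; it only illustrates why next() on a freshly created generator always gives the first row.

```python
def iter_rows(records):
    """Hypothetical stand-in for df.iterrows(): lazily yields (index, row) tuples."""
    for index, record in enumerate(records):
        yield index, record


records = [{"TITULO": "A"}, {"TITULO": "B"}, {"TITULO": "C"}]

# Each call builds a fresh generator, so next() returns the first tuple every time.
first = next(iter_rows(records))[1]
first_again = next(iter_rows(records))[1]

# Looping over a single generator visits every row exactly once.
titles = [row["TITULO"] for _, row in iter_rows(records)]
print(first, first_again, titles)
```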
What you are looking for is probably something like this:
for row_index, row in df.iterrows():
    convert_to_bib(row)
Secondly, all your writing to the file handle fh must happen within the block with codecs.open('test1.txt', 'w', encoding='latin1') as fh: because the file handle is closed at the end of that block.
For example:
with codecs.open('test1.txt', 'w', encoding='latin1') as fh:
    # iterate through all rows
    for row_index, row in df.iterrows():
        # iterate through all elements in the row
        for colname in df.columns:
            row_element = row[colname]
            fh.write('%s = {%s},\n' % (colname, str(row_element)))
Still I am not sure if the names of the columns exactly match the bibtex fields you have in mind. Probably you have to convert these first. But I hope you get the principle behind the iterations :-)
Python 2.6 (necessary for the job)
import csv
list = ['apple,whiskey,turtle', 'orange,gin,wolf', 'banana,vodka,sparrow']
fieldNames = ['Fruit', 'Spirit', 'Animal']
reader = csv.DictReader(list,fieldnames= fieldNames)
for row in reader:
    print row['Fruit']
for row in reader:
    print row['Fruit']
I have some code that generates a uniform list of items per row, making a list object. For ease of use I used the csv module's DictReader to step through the rows and do any calculations I need to but when I try to iterate a second time, I get no output. I suspect the end of the list is being treated like an EOF but I am unable to 'seek' to the beginning of the list to do the iteration again.
Any suggestions on what I can do? Perhaps there is a better way than using the CSV, it just seemed really convenient.
New Code
import csv
list = ['apple,"whiskey,rum",turtle', 'orange,gin,wolf', 'banana,vodka,sparrow']
processed = []
fieldNames = ['Fruit', 'Spirit', 'Animal']
reader = csv.DictReader(list,fieldnames= fieldNames, quotechar = '"')
for row in reader:
    processed.append(row)
    print row
for row in processed:
    print row['Fruit']
for row in processed:
    print row['Spirit']
@jonrsharpe suggested placing the rows of reader into a list. It works perfectly for what I had in mind. Thank you everyone.
You're indeed correct that iterating over the rows once is what the DictReader provides. So your options are:
Create a new DictReader and iterate again (seems wasteful)
Iterate over the rows once and perform all computations that you want to perform
Iterate over the rows once, store the data in another data structure and iterate over that data structure as many times as you wish.
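A short Python 3 sketch of the single-pass behaviour and of the third option, using the sample data from the question. The names lines, first_pass, second_pass and processed are chosen here for illustration.

```python
import csv

lines = ['apple,whiskey,turtle', 'orange,gin,wolf', 'banana,vodka,sparrow']
field_names = ['Fruit', 'Spirit', 'Animal']

reader = csv.DictReader(lines, fieldnames=field_names)
first_pass = [row['Fruit'] for row in reader]
second_pass = [row['Fruit'] for row in reader]  # reader is already exhausted

# Option 3: materialise the rows once, then reuse the list as often as needed.
processed = list(csv.DictReader(lines, fieldnames=field_names))
fruits = [row['Fruit'] for row in processed]
spirits = [row['Spirit'] for row in processed]
print(first_pass, second_pass, fruits, spirits)
```

Storing the rows costs memory proportional to the data, which is usually acceptable for small inputs like this one.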
Also, if you have only the list and the field names, you don't need a DictReader to do the same thing. If you know the data is relatively straightforward (no commas inside the data, for example, and the same number of items in every row), then you can simply do:
merged = [dict(zip(fieldnames, row.split(","))) for row in my_list]
print merged
I'm trying to create a list in python from a csv file. The CSV file contains only one column, with about 300 rows of data. The list should (ideally) contain a string of the data in each row.
When I execute the below code, I end up with a list of lists (each element is a list, not a string). Is the CSV file I'm using formatted incorrectly, or is there something else I'm missing?
import csv
filelist = []
with open(r'D:\blah\blahblah.csv', 'r') as expenses:
    reader = csv.reader(expenses)
    for row in reader:
        filelist.append(row)
row is a row with one field. You need to get the first item in that row:
filelist.append(row[0])
Or more concisely:
filelist = [row[0] for row in csv.reader(expenses)]
It seems your "csv" doesn't contain any separator like ";" or ",", since you said it only contains one column. So it isn't really a CSV, and there is no need for a separator.
So you could simply read the file line by line:
filelist = []
for line in open(r'D:\blah\blahblah.csv', 'r').readlines():
    filelist.append(line.strip())
Each row is read as list of cells.
So what you want to do is
output = [ row[0] for row in reader ]
since you only have the first cell filled out in each row.
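One caveat worth sketching: csv.reader returns an empty list for a blank line, so row[0] raises an IndexError if the file contains any. The Python 3 example below uses made-up in-memory data in place of the real file and filters such rows out.

```python
import csv
import io

# Made-up stand-in for the one-column file, including one blank line.
expenses = io.StringIO("rent\ngroceries\n\nfuel\n")

# "if row" skips the empty lists produced by blank lines.
filelist = [row[0] for row in csv.reader(expenses) if row]
print(filelist)
```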