I have a question. If I run this code, I get exactly what I have in the CSV. My goal is to get a text/data/csv file which would look like this:
['red', 'green', 'blue']
In other words: converting a single column into a row, and inserting a comma between values while doing so. Is it possible to do this with Python? Are there any online materials I can look into?
The csv file is read sequentially; that is, you will always get the data back one row at a time. However, you can build a list of just the values you want as you read the file and discard the rest of the data.
import csv

with open('example.csv') as csvfile:
    colours = []
    for row in csv.reader(csvfile, delimiter=','):
        # skip rows that do not have a fourth column
        if len(row) <= 3:
            continue
        colours.append(row[3])
You can achieve this using pandas. Try this code:
import pandas as pd
df = pd.read_csv("input csv file")
df = df.T
df.to_csv("output csv file")
Let me know if this is what you are looking for.
Without using any extra libraries, you could write a function that takes a specific column, and returns that column as an array.
Such a function might look like this:

def column_to_row(col_num, csv):
    return [row[col_num] for row in csv]
Then you can extract whatever column you want, or iterate through the whole csv like this.
new_csv = []
for i in range(len(csv[0])):
    new_csv.append(column_to_row(i, csv))
import csv

with open('doc.csv', 'r') as f:
    file = csv.reader(f)
    for row in file:
        if row == ['NAME']:
            print(row)
I wanted to print all the names from a csv file in Python. I tried this method but I got blank output. Can anyone help me out?
row is just a list; if you want the first column from each row, try:
print(row[0])
if you want the whole row, just
print(row[:])
if you want cells number 2 and 4:
print(row[1],row[3])
if you really want better control over the csv, try the read_csv() method from pandas:
import pandas as pd
df = pd.read_csv('AAPL.csv')
print(df['your_field_name_here'])
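If the file has a header row and the blank output comes from comparing whole rows to ['NAME'], csv.DictReader may be a cleaner option: it consumes the header and lets you address each cell by column name. A minimal sketch (the file name, the NAME column, and the sample data are assumptions for illustration):

```python
import csv

# build a tiny stand-in for the asker's file, assuming a header column called NAME
with open('doc.csv', 'w', newline='') as f:
    f.write('NAME,AGE\nAlice,30\nBob,25\n')

# DictReader consumes the header row and maps each cell to its column name
names = []
with open('doc.csv', newline='') as f:
    for record in csv.DictReader(f):
        names.append(record['NAME'])

print(names)  # → ['Alice', 'Bob']
```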
I have a csv file that stores the following information on each line: name, phone number, class time, and duration of the class. I am trying to store only the phone number from each line of the csv file into a list. I am currently trying to get it to work using regex, but if there are better suggestions, I am all ears. I am relatively new to coding in Python, so any other advice would be much appreciated.
import re

def get_numbers():
    file = open("students.csv")
    regex = r"(\d+)"
    for row in file:
        if row:
            result = re.search(regex, row)
            print(result[0])
This is a sample of what each line in the csv file looks like:
James Example,611-544-3091,8:00pm,1hr
Carl Example,900-122-818,12:15pm,30 mins
There are quite a few ways to do this.
1.
Pandas offers a very elegant solution. You can read the csv file, and extract only the phone numbers. Here is how to do it.
import pandas as pd
df = pd.read_csv('file.csv', names=['name', 'phone number', 'class time', 'duration'])
phno = df['phone number'].tolist()
What this essentially does is take your entire data and make it into a table. Each line of your file corresponds to one row, and each entry in a line corresponds to a column entry. Once you make it into a table using the read_csv instruction, you can extract any column. You want the 'phone number' column (the label must match the name you passed to read_csv exactly), so you pick up that column using df['phone number'] and convert it into a list.
2.
If you do not want to use pandas, here is another method:

with open('file.csv') as file:
    for row in file:
        phno = row.split(',')[1]
        print(phno)
        # or append it to some master list if you wish
The best way would be to use pandas:

import pandas as pd
df = pd.read_csv("path/to/file.csv")
https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
It also lets you easily manipulate rows and much more, and there are plenty of tutorials on the web.
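For instance, column extraction and row filtering are one-liners once the data is in a dataframe. A small sketch with made-up data (the column names mirror the question; nothing here is from the original file):

```python
import pandas as pd

# hypothetical stand-in for the csv contents
df = pd.DataFrame({
    'name': ['James Example', 'Carl Example'],
    'phone number': ['611-544-3091', '900-122-818'],
})

numbers = df['phone number'].tolist()          # pull one column as a list
long_names = df[df['name'].str.len() > 12]     # filter rows by a condition

print(numbers)          # → ['611-544-3091', '900-122-818']
print(len(long_names))  # → 1 (only 'James Example' has more than 12 characters)
```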
The Python standard library has a csv module which is intended for exactly this; you can use either csv.reader or csv.DictReader:
import csv

def get_numbers():
    with open("students.csv") as fh:
        for row in csv.reader(fh):
            if not row:
                # row is empty; skip
                continue
            # unpack the row into four variables
            name, number, time, duration = row
            print(number)
I have a very large csv file with millions of rows and a list of the row numbers that I need, like:
rownumberList = [1,2,5,6,8,9,20,22]
I know there is something called skiprows that helps to skip several rows when reading a csv file, like this:

df = pd.read_csv('myfile.csv', skiprows=skiplist)
# skiplist would be the full list of rows minus rownumberList

However, since the csv file is very large, directly selecting the rows that I need could be more efficient. So I was wondering: are there any methods to select rows while using read_csv, rather than selecting rows from the dataframe afterwards? I am trying to minimize the time spent reading the file. Thanks.
There is a parameter called nrows (int, default None): the number of rows of the file to read. Useful for reading pieces of large files (Docs).

pd.read_csv(file_name, nrows=int)

In case you need some part in the middle, use both skiprows and nrows in read_csv: skiprows skips the beginning rows and nrows reads the next number of rows after skipping, e.g.
Example:
pd.read_csv('../input/sample_submission.csv',skiprows=5,nrows=10)
This will read lines 6 through 16 of the file: with the default header behaviour, line 6 becomes the header and lines 7-16 become the 10 data rows (pass header=None if the file has no header row).
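A quick self-contained check of that interaction, using a throwaway file and header=None so that every line counts as data (the file name and contents are made up):

```python
import pandas as pd

# write a 20-line file: row1 ... row20, one value per line
with open('sample.csv', 'w') as f:
    f.write('\n'.join('row%d' % i for i in range(1, 21)))

# skip the first 5 lines, then read the next 10; header=None keeps
# line 6 as data instead of promoting it to the header
df = pd.read_csv('sample.csv', skiprows=5, nrows=10, header=None)
print(df[0].tolist())  # → ['row6', 'row7', ..., 'row15']
```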
Edit based on comment:
Since there is a list, this might help, i.e.

li = [1,2,3,5,9]
# skip every row index up to max(li) that is not in the list
r = [i for i in range(max(li)) if i not in li]
df = pd.read_csv('../input/sample_submission.csv', skiprows=r, nrows=max(li))
# This will skip the rows you don't want as well as limit the
# number of rows read to the maximum of the list.
For pandas 0.25.1 (pandas read_csv), you can pass a callable to skiprows:

import pandas as pd
rownumberList = [1,2,5,6,8,9,20,22]
df = pd.read_csv('myfile.csv', skiprows=lambda x: x not in rownumberList)
I am not sure about read_csv() from pandas (though there is a way to use an iterator for reading a large file in chunks), but you can read the file line by line (lazy loading, not reading the whole file into memory) with csv.reader (or csv.DictReader), keeping only the desired rows with the help of enumerate():
import csv
import pandas as pd
DESIRED_ROWS = {1, 17, 28}
with open("input.csv") as input_file:
    reader = csv.reader(input_file)
    desired_rows = [row for row_number, row in enumerate(reader)
                    if row_number in DESIRED_ROWS]

df = pd.DataFrame(desired_rows)
(assuming you would like to pick random/discontinuous rows and not a "continuous chunk" from somewhere in the middle; in that case @James's idea to have a "start" and "stop" would generally work better).
import pandas as pd
df = pd.read_csv('Data.csv')
df.iloc[3:6]
Returns rows 3 through 5 and all columns.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html
From the documentation you can see that skiprows can take an integer or a list of line numbers to skip.
So basically you can tell it to remove all rows but those you want. For this you first need to know the number of lines in the file (best if you know it beforehand), by opening it and counting as follows:
with open('myfile.csv') as f:
    row_count = sum(1 for row in f)
Now you need to create the complementary collection (sets work here too, and make the subtraction convenient). First you create the full range from 1 to the number of rows, then you subtract the numbers of the rows you want to read:
skiplist = set(range(1, row_count+1)) - set(rownumberList)
Finally you can read the csv as normal.
df = pd.read_csv('myfile.csv',skiprows = skiplist)
Here is the full code:

import pandas as pd

rownumberList = [1,2,5,6,8,9,20,22]

with open('myfile.csv') as f:
    row_count = sum(1 for row in f)

skiplist = set(range(1, row_count+1)) - set(rownumberList)
df = pd.read_csv('myfile.csv', skiprows=skiplist)
You could try this:

import pandas as pd

# making a data frame from a csv file
data = pd.read_csv("your_csv_file.csv", index_col="What_you_want")

# retrieving multiple rows by the iloc method
rows = data.iloc[[1,2,5,6,8,9,20,22]]
You will not be able to circumvent the read time when accessing a large file. If you have a very large CSV file, any program will need to read through it at least up to the point where you want to begin extracting rows. Really, that is what databases are designed for.
However, if you want to extract rows 300,000 to 300,123 from a 10,000,000 row CSV file, you are better off reading just the data you need into Python before converting it to a data frame in Pandas. For this you can use the csv module.
import csv
import pandas as pd

start = 300000
stop = start + 123

data = []
with open('/very/large.csv', 'r') as fp:
    reader = csv.reader(fp)
    for i, line in enumerate(reader):
        if i >= start:
            data.append(line)
        if i > stop:
            break

df = pd.DataFrame(data)
You can also build the row numbers with range:

for i in range(1, 20):

where the first parameter is the first row and the second parameter is one past the last row.
I have an array named genrelist that contains all the different genres of a movie. How do I write them out to the csv such that each element of genrelist is in one cell and they form a row instead of a column, in Python 3.6?
Currently, I can write them out in a column by using this code:
import csv

with open('data.csv', 'a') as csvFile:
    csvFileWriter = csv.writer(csvFile)
    for genre in genrelist:
        csvFileWriter.writerow([genre])
# an explicit csvFile.close() is unnecessary: the with block closes the file
This will produce an output of:
|shonen|
|action|
|Adventure|
Desired output: |shonen| |action| |adventure|
The for loop writes a single genre to a row, as you intend, but you start a new row every single time! This makes the multiple rows look like a column. Your desired output can be generated by passing the entire genrelist to the writerow function, like so:
with open('data.csv', 'a') as csvFile:
    csvFileWriter = csv.writer(csvFile)
    csvFileWriter.writerow(genrelist)
The pandas module happens to have a really nice read_csv() function and also a df.to_csv() method.
What you would do is create a dataframe like so:

import pandas as pd
df = pd.read_csv('data.csv')

(read_csv already returns a DataFrame, so there is no need to wrap it in pd.DataFrame). To change the columns to rows, just use:

df = df.transpose()

(transpose() returns a new dataframe; it does not modify df in place). Then you can write it to a file like this:

df.to_csv('transposeddata.csv')
The full documentation can be found here:
Pandas Documentation
I'm "pseudo" creating a .bib file by reading a csv file and then, following this structure, writing everything down, including newline characters. It's a tedious process, but it's a raw way of converting csv to .bib in Python.
I'm using pandas to read the csv and write it row by row (and since it has special characters, I'm using the latin1 encoding), but I'm running into a big problem: it only reads the first row. From the official documentation I'm using their method of reading row by row, which only gives me the first row (example 1):
row = next(df.iterrows())[1]
But if I remove the next() and [1], it gives me the content of every column concentrated in one field (example 2).
Why is this happening? Why does the method from the docs not iterate through all rows nicely? What would the solution for example 1 look like, but for all rows?
My code:
import csv
import pandas
import bibtexparser
import codecs

colnames = ['AUTORES', 'TITULO', 'OUTROS', 'DATA', 'NOMEREVISTA', 'LOCAL', 'VOL', 'NUM',
            'PAG', 'PAG2', 'ISBN', 'ISSN', 'ISSN2', 'ERC', 'IF', 'DOI', 'CODEN', 'WOS',
            'SCOPUS', 'URL', 'CODIGO BIBLIOGRAFICO', 'INDEXAÇÕES', 'EXTRAINFO', 'TESTE']
data = pandas.read_csv('test1.csv', names=colnames, delimiter=r";", encoding='latin1')  # , nrows=1
df = pandas.DataFrame(data=data)
with codecs.open('test1.txt', 'w', encoding='latin1') as fh:
    fh.write('#Book{Arp, ')
    fh.write('\n')
rl = data.iterrows()
for i in rl:
    ix = str(i)
    fh.write(' Title = {')
    fh.write(ix)
    fh.write('}')
    fh.write('\n')
PS: I'm new to Python and programming; I know this code has flaws and it's not the most effective way to convert csv to bib.
The example row = next(df.iterrows())[1] intentionally only returns the first row.
df.iterrows() returns a generator over tuples describing the rows. Each tuple's first entry is the row index and the second entry is a pandas Series with that row's data.
Hence, next(df.iterrows()) returns the next entry of the generator. If next has not been called before, this is the very first tuple.
Accordingly, next(df.iterrows())[1] returns the first row (i.e. the second tuple entry) as a pandas series.
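A tiny illustration of that behaviour, with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20], 'b': [30, 40]})

# iterrows() yields (index, Series) tuples one at a time
it = df.iterrows()
index, row = next(it)          # the very first tuple from the generator
print(index)                   # → 0
print(row['a'], row['b'])      # → 10 30

# so next(df.iterrows())[1] is just the first row as a Series
first_row = next(df.iterrows())[1]
print(list(first_row))         # → [10, 30]
```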
What you are looking for is probably something like this:
for row_index, row in df.iterrows():
    convert_to_bib(row)
Secondly, all your writing to the file handle fh must happen within the with codecs.open('test1.txt', 'w', encoding='latin1') as fh: block, because at the end of the block the file handle is closed.
For example:
with codecs.open('test1.txt', 'w', encoding='latin1') as fh:
    # iterate through all rows
    for row_index, row in df.iterrows():
        # iterate through all elements in the row
        for colname in df.columns:
            row_element = row[colname]
            fh.write('%s = {%s},\n' % (colname, str(row_element)))
Still, I am not sure whether the column names exactly match the bibtex fields you have in mind; you probably have to convert them first. But I hope you get the principle behind the iterations :-)