I have a csv file like the following:
A, B, C, D
2,3,4,5
4,3,5,2
5,8,3,9
7,4,2,6
8,6,3,7
I want to fetch the B values from 3 rows at a time (for the first iteration the values would be 3, 3, 8), save them in variables (value1=3, value2=3, value3=8) and pass them to a function. Once those values are processed, I want to fetch the values from the next 3 rows (value1=3, value2=8, value3=4), and so on.
The csv file is large.
I am a Java developer; if possible, suggest the simplest possible code.
An easy solution would be the following:
import pandas as pd

data = pd.read_csv("path.csv")
for i in range(len(data) - 2):
    value1 = data.loc[i, "B"]
    value2 = data.loc[i + 1, "B"]
    value3 = data.loc[i + 2, "B"]
    function(value1, value2, value3)
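Since the file is large, a streaming variant may be preferable: it keeps only a three-value window in memory instead of loading the whole file into a DataFrame. This is a sketch, with io.StringIO standing in for the real file and `results` collecting what would be passed to `function(...)`:

```python
import csv
import io
from collections import deque

# Inline stand-in for the large CSV file from the question
f = io.StringIO("A,B,C,D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n")

reader = csv.reader(f)
next(reader)                    # skip the header row
window = deque(maxlen=3)        # holds the three most recent B values
results = []
for row in reader:
    window.append(int(row[1]))  # column B is index 1
    if len(window) == 3:
        value1, value2, value3 = window
        results.append((value1, value2, value3))  # stand-in for function(...)

print(results)
```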
This is a possible solution (I have used the function proposed in this answer):
import csv
import itertools

# Function to iterate the csv file in chunks (of any size)
def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
# Open the csv file
with open('myfile.csv') as f:
    csvreader = csv.reader(f)
    # Read the headers: ['A', 'B', 'C', 'D']
    headers = next(csvreader, None)
    # Read the rest of the file in chunks of 3 rows
    for chunk in grouper(3, csvreader):
        # do something with your chunk of rows
        print(chunk)
Printed result:
(['2', '3', '4', '5'], ['4', '3', '5', '2'], ['5', '8', '3', '9'])
(['7', '4', '2', '6'], ['8', '6', '3', '7'])
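Note that these chunks do not overlap (unlike the sliding windows described in the question), and the last chunk may hold fewer than 3 rows. To pull just the B column out of each chunk, a sketch with the file contents inlined via io.StringIO:

```python
import csv
import io
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# Inline stand-in for myfile.csv
f = io.StringIO("A,B,C,D\n2,3,4,5\n4,3,5,2\n5,8,3,9\n7,4,2,6\n8,6,3,7\n")
csvreader = csv.reader(f)
next(csvreader, None)  # skip the headers

b_chunks = []
for chunk in grouper(3, csvreader):
    # column B is index 1; the final chunk may be short
    b_chunks.append([int(row[1]) for row in chunk])

print(b_chunks)
```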
You can use pandas to read your CSV with the chunksize argument, as described here (How can I partially read a huge CSV file?)
import pandas as pd
# Function that you want to apply to your arguments
def fn(A, B, C, D):
    print(sum(A), sum(B), sum(C), sum(D))

# Iterate through the chunks
for chunk in pd.read_csv('test.csv', chunksize=3):
    # Convert the dataframe to a dict of lists
    chunk_dict = chunk.to_dict(orient='list')
    # Pass the arguments to your function
    fn(**chunk_dict)
You can use the csv module:
import csv

with open('data.txt') as fp:
    reader = csv.reader(fp)
    next(reader)  # skips the header
    res = [int(row[1]) for row in reader]

groups = (res[idx: idx + 3] for idx in range(0, len(res) - 2))
for a, b, c in groups:
    print(a, b, c)
Output:
3 3 8
3 8 4
8 4 6
This is my code:
data = [['a', 'b', 'c'], ['1', '2', '3']]
col_width = max(len(word) for row in data for word in row) + 2
for row in data:
    print("".join(word.ljust(col_width) for word in row))
this is what I tried:
import sys

data = [['a', 'b', 'c'], ['1', '2', '3']]
col_width = max(len(word) for row in data for word in row) + 2
for row in data:
    print("".join(word.ljust(col_width) for word in row))

original_stdout = sys.stdout
with open('filename.txt', 'w') as f:
    sys.stdout = f
    print("".join(word.ljust(col_width) for word in row))
sys.stdout = original_stdout
this is the output I want in the file:
a  b  c
1  2  3
this is the output I get when I run my code:
1 2 3
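For what it's worth, the file ends up with only the last row because the print inside the with-block sits outside the for loop, so `row` still holds the final row from the earlier loop. One fix is to loop over every row again and use print's file argument, which avoids swapping sys.stdout entirely; a sketch, with io.StringIO standing in for the real file:

```python
import io

data = [['a', 'b', 'c'], ['1', '2', '3']]
col_width = max(len(word) for row in data for word in row) + 2

f = io.StringIO()  # stand-in for open('filename.txt', 'w')
for row in data:   # loop over every row when writing the file
    print("".join(word.ljust(col_width) for word in row), file=f)

contents = f.getvalue()
print(contents)
```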
I have a list of {n} dictionaries in a txt file, one dictionary per line as illustrated below, which I want exported in CSV format with each key presented as a column.
{'a':'1','b':'2','c':'3'}
{'a':'4','b':'5','c':'6'}
{'a':'7','b':'8','c':'9'}
{'a':'10','b':'11','c':'12'}
...
{'a':'x','b':'y','c':'z'}
I want CSV output for {n} rows as below, with an index:
     a    b    c
0    1    2    3
1    4    5    6
2    7    8    9
...  ...  ...  ...
n    x    y    z
You can use ast.literal_eval (doc) to load your data from the text file.
With contents of input file file.txt:
{'a':'1','b':'2','c':'3'}
{'a':'4','b':'5','c':'6'}
{'a':'7','b':'8','c':'9'}
{'a':'10','b':'11','c':'12'}
{'a':'x','b':'y','c':'z'}
You could use this script to load the data and produce file.csv:
import csv
from ast import literal_eval

with open('file.txt', 'r') as f_in:
    lst = [literal_eval(line) for line in f_in if line.strip()]

with open('file.csv', 'w', newline='') as csvfile:
    fieldnames = ['a', 'b', 'c']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(lst)
file.csv will become:
a,b,c
1,2,3
4,5,6
7,8,9
10,11,12
x,y,z
Importing the file into LibreOffice (screenshot omitted):
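For what it's worth, the DictWriter output above has no index column; if the numbered index from the desired output matters, pandas can supply it. A sketch with the dictionary lines inlined (to_csv with no path returns the CSV as a string):

```python
import pandas as pd
from ast import literal_eval

# Inline stand-in for the lines of file.txt
lines = [
    "{'a':'1','b':'2','c':'3'}",
    "{'a':'4','b':'5','c':'6'}",
    "{'a':'x','b':'y','c':'z'}",
]
df = pd.DataFrame([literal_eval(line) for line in lines])
csv_text = df.to_csv(index=True)  # index=True writes the 0..n-1 row numbers

print(csv_text)
```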
x = [{'a': '1', 'b': '2', 'c': '3'},
     {'a': '4', 'b': '5', 'c': '6'},
     {'a': '7', 'b': '8', 'c': '9'},
     {'a': '10', 'b': '11', 'c': '12'}]

n = len(x)
keys = list(x[0].keys())
newdict = dict()
for m in keys:
    newdict[m] = []
    for i in range(n):
        newdict[m].append(x[i][m])
newdict
Output is
{'a': ['1', '4', '7', '10'],
'b': ['2', '5', '8', '11'],
'c': ['3', '6', '9', '12']}
Or you can use pandas.concat, which combines DataFrames that share the same columns.
import pandas as pd

x = [{'a': '1', 'b': '2', 'c': '3'},
     {'a': '4', 'b': '5', 'c': '6'},
     {'a': '7', 'b': '8', 'c': '9'},
     {'a': '10', 'b': '11', 'c': '12'}]

xpd = []
for i in x:
    df = pd.DataFrame(i, index=[0])
    xpd.append(df)
pd.concat(xpd, ignore_index=True)
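For what it's worth, the concat loop isn't strictly necessary here: a list of dicts can be passed straight to the DataFrame constructor, which builds the same frame in one step:

```python
import pandas as pd

x = [{'a': '1', 'b': '2', 'c': '3'},
     {'a': '4', 'b': '5', 'c': '6'},
     {'a': '7', 'b': '8', 'c': '9'},
     {'a': '10', 'b': '11', 'c': '12'}]

df = pd.DataFrame(x)  # one row per dict; the keys become the columns
print(df)
```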
I have a 2d list saved in a text file that looks like this (showing the first 2 entries):
('9b7dad', "text", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)
('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)
How should this be loaded into a list? (so for example list[0][0] = '9b7dad', list[1][1] = 'text2' etc)
You could try this:
f = open(<your file path>)
result = [
    [g.replace("'", "")
     for g in l.strip('()\n').replace(' ', '').replace('"', '').split(',')]
    for l in f.readlines()]
f.close()
Given a text file with each line in the form you've shown:
('9b7dad', "text", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)
('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)
You can use pandas, which offers a more straightforward way to handle and manipulate different data types.
Import pandas and read in the file, here called 'stack.txt':
import pandas as pd
data = pd.read_csv('stack.txt', sep=",", header=None)
Return just the list of lists:
alist = data.values.tolist()
Print to check:
print(alist)
[['9b7dad', 'text', 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5],
['2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6]]
If you need to process the columns:
for i in range(len(data.columns)):
    if i == 0:
        data[i] = data[i].map(lambda x: str(x)[1:])
        data[i] = data[i].map(lambda x: str(x)[1:-1])
    if i == 5:
        data[i] = data[i].map(lambda x: str(x)[:-1])
        data[i] = data[i].astype(int)
    if 0 < i < 5:
        data[i] = data[i].map(lambda x: str(x)[2:-1])
#!/usr/bin/env python
import sys
myList = []
for line in sys.stdin:
    elems = line.strip('()\n').replace(' ', '').split(',')
    elems = [x.strip('\'\"') for x in elems]
    myList.append(elems)

print(myList[0][0])
print(myList[1][1])
To use:
$ python ./load.py < someText.txt
9b7dad
text2
Use int(), float(), or str() to coerce fields in elems to certain types, as needed. Use a try..except block to catch malformed input.
import ast

with open(file_name) as f:
    content = f.readlines()
content = [list(ast.literal_eval(x)) for x in content]
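Applied to the two example lines (inlined here instead of read from a file), this yields the indexable 2-D list the question asks for:

```python
import ast

# Inline stand-in for the file's lines
lines = [
    "('9b7dad', \"text\", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)",
    "('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)",
]
content = [list(ast.literal_eval(x)) for x in lines]

print(content[0][0])  # 9b7dad
print(content[1][1])  # text2
```

Note that literal_eval also preserves the types, so the trailing 5 and 6 come back as ints rather than strings.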
How to read files:
In Python, how do I read a file line-by-line into a list?
More about eval:
Convert string representation of list to list
try this (converting each tuple to a list):
my_list = []
my_list.append(list(('9b7dad', "text", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)))
my_list.append(list(('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)))
The result is a list of lists, i.e. a two-dimensional list. You can easily modify the code to fetch a line at a time in a for loop and append it to the list. Consider using split(',') if each line is comma-separated text rather than a tuple literal, e.g.
my_list = []
with open(filename, 'r') as my_file:
    for text in my_file.readlines():
        my_list.append(text.split(','))
I have a recursive function that reads a list of scout records from a file and adds them, in order of their IDs, to a list box. The function is called with addScouts(1). The function is below:
def addScouts(self, I):
    i = I
    with open(fileName, "r") as f:
        lines = f.readlines()
    for line in lines:
        if str(line.split(",")[3])[:-1] == str(i):
            self.scoutList.insert(END, line[:-1])
            i += 1
            return self.addScouts(i)
    return
My issue is that my file ID's are ordered 1,2,4,5 as at some point I removed the scout with ID of 3. However, when I run the function to re-order the scouts in the list box (the function above), it only lists the scouts up to and including ID 3. This is because when i = 3, none of the items in the file are equal to 3, so the function reaches the end and returns before it gets a chance to check the remaining records.
File contents:
Kris,Rice,17,1
Olly,Fallows,17,2
Olivia,Bird,17,4
Louis,Martin,18,5
Any ideas how to fix this?
Just sort on the last column:
sorted(f,key=lambda x: int(x.split(",")[-1]))
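Applied to the question's lines (inlined here rather than read from the open file handle), that one-liner looks like:

```python
# Inline stand-in for the file's lines, deliberately out of ID order
lines = ["Kris,Rice,17,1", "Olivia,Bird,17,4", "Olly,Fallows,17,2", "Louis,Martin,18,5"]

# Sort on the last comma-separated column, compared as an integer
ordered = sorted(lines, key=lambda x: int(x.split(",")[-1]))
print(ordered)
```

Sorting sidesteps the gap at ID 3 entirely, since no row is ever looked up by its expected ID.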
You can use bisect to find where to put the new data to keep the data ordered after it is sorted once:
from bisect import bisect
import csv

with open("foo.txt") as f:
    r = list(csv.reader(f))

keys = [int(row[-1]) for row in r]
new = ["foo", "bar", "12", "3"]
ind = bisect(keys, int(new[-1]))
r.insert(ind, new)
print(r)
Output:
[['Kris', 'Rice', '17', '1'], ['Olly', 'Fallows', '17', '2'], ['foo', 'bar', '12', '3'], ['Olivia', 'Bird', '17', '4'], ['Louis', 'Martin', '18', '5']]
A simpler way is to check for the first row that has a higher id; if none are higher, just append to the end:
import csv

with open("foo.txt") as f:
    r = list(csv.reader(f))

new = ["foo", "bar", "12", "3"]
key = int(new[-1])
ind = None
for i, row in enumerate(r):
    if int(row[-1]) >= key:
        ind = i
        break
r.insert(ind, new) if ind is not None else r.append(new)
print(r)
Output:
[['Kris', 'Rice', '17', '1'], ['Olly', 'Fallows', '17', '2'], ['foo', 'bar', '12', '3'], ['Olivia', 'Bird', '17', '4'], ['Louis', 'Martin', '18', '5']]
To always keep the file in order when adding a new value, we just need to write to a temp file, inserting the new line in the correct place, then replace the original with the updated file:
import csv
from tempfile import NamedTemporaryFile
from shutil import move

new = ["foo", "bar", "12", "3"]
key = int(new[-1])

# mode "w" and newline="" so the csv writer gets a text-mode file
with open("foo.csv") as f, NamedTemporaryFile("w", dir=".", delete=False, newline="") as temp:
    r = csv.reader(f)
    wr = csv.writer(temp)
    for row in r:
        if int(row[-1]) >= key:
            wr.writerow(new)
            wr.writerow(row)
            wr.writerows(r)  # copy the rest of the file unchanged
            break
        wr.writerow(row)
    else:
        wr.writerow(new)  # nothing higher found: append at the end

move(temp.name, "foo.csv")
foo.csv after will have the data in order:
Kris,Rice,17,1
Olly,Fallows,17,2
foo,bar,12,3
Olivia,Bird,17,4
Louis,Martin,18,5
You can check whether your list box has as many entries as your file has lines: if not, run addScouts again with the next ID; if so, end. Like this:
def addScouts(self, I):
    i = I
    with open(fileName, "r") as f:
        lines = f.readlines()
    for line in lines:
        if str(line.split(",")[3])[:-1] == str(i):
            self.scoutList.insert(END, line[:-1])
            i += 1
            return self.addScouts(i)
    if self.scoutList.size() < len(lines):  # Listbox length via .size()
        return self.addScouts(i + 1)
    else:
        return
I have a function, table(), that reads a csv file and returns the rows as individual lists. I want to map these lists to create a dictionary, with the field headers as the keys and the underlying rows as the values. I cannot seem to do this, however. When I try to call only the first element of the list of lists I created from the function (l) in the command prompt, it returns all the lists up to 'N', the first letter of the word 'None', despite me breaking (return) if reader is None. When I do the same with sys.stdout to a text file, it does the same, but the 'N' is replaced with <type 'list'>. Does anyone know what I'm doing wrong, and how I can go about creating a dictionary (or a list of columns, for that matter) from a CSV file given my table() function?
import csv

def table():
    with open('C:\\Python27\\test.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print row
        if reader is None:
            return

l = list(str(table()))
keys = l[0]
print keys
Output text file:
['Field1', 'Field2', 'Field3', 'Field4']
['a', '1', 'I', 'access']
['b', '2', 'II', 'accessing\n']
['c', '3', 'III', 'accommodation']
['d', '4', 'IIII', 'basically']
<type 'list'>
More pythonically:
def table():
    with open('C:\\Python27\\test.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            yield row
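With table() rewritten as a generator, the header-to-columns dictionary the question is after can be built by taking the first yielded row as the keys. A sketch with inline data via io.StringIO (standing in for the Python 2 file path above), and the file handle passed as a parameter for the sake of the example:

```python
import csv
import io

def table(f):
    # generator version: yields one row (a list of strings) at a time
    reader = csv.reader(f)
    for row in reader:
        yield row

f = io.StringIO("Field1,Field2\na,1\nb,2\nc,3\n")
rows = table(f)
headers = next(rows)                 # the first yielded row is the header
columns = {h: [] for h in headers}   # one empty column per field header
for row in rows:
    for h, value in zip(headers, row):
        columns[h].append(value)

print(columns)
```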
You don't actually return anything from the table function. Try it like this:
def table():
    with open('C:\\Python27\\test.csv', 'rb') as f:
        lines = []
        reader = csv.reader(f)
        if reader:
            for row in reader:
                lines.append(row)
            return lines
        else:
            return []