How to split a file into smaller files by the first number in the second column - Python

So my data looks like:
1 3456542 5 may 2014
2 1245678 4 may 2014
3 4256876 2 may 2014
4 5643156 6 may 2014
.....
The goal is to sort it by the 2nd column, then separate the rows based on the first digit of the 2nd column (e.g. 3456542 goes to subs_3.txt, 1245678 goes to subs_1.txt, ...). My output is totally wrong: it gives me 6 files with what appear to be random rows in them. Any suggestions?
import csv
from operator import itemgetter

file_lines = []
with open("subs.txt", "r") as csv_file:
    reader = csv.reader(csv_file, delimiter=" ")
    for row in reader:
        file_lines.append(row)
file_lines.sort(key=itemgetter(1))
with open("sorted_subs.txt", "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=" ")
    for row in file_lines:
        writer.writerow(row)
for row in file_lines:
    file_num = row[1][0]
    with open("file_{0}.txt".format(file_num), "w") as f:
        writer = csv.writer(f, delimiter=" ")
        writer.writerow(row)

You could use itertools.groupby to group the lines that go to the same file together, and then just loop over the groups in order to write the files:
from itertools import groupby

for k, g in groupby(file_lines, key=lambda x: x[1][0]):
    with open("file_{0}.txt".format(k), "w") as f:
        csv.writer(f, delimiter=" ").writerows(g)
Update: groupby will group the lines based on the first number in the second column. It returns the key used for grouping and an iterator containing the grouped items. Since file_lines is already sorted, we know that all items belonging to the same group will be returned within one group. Here's a short example of how it works; note that the test data differs from the original question in order to demonstrate the grouping:
from itertools import groupby

lst = [
    ['2', '1245678', '', '4', 'may', '2014'],
    ['1', '3456542', '', '5', 'may', '2014'],
    ['3', '3256876', '', '2', 'may', '2014'],
    ['4', '5643156', '', '6', 'may', '2014']
]
for k, g in groupby(lst, key=lambda x: x[1][0]):
    print('key: {0}, items: {1}'.format(k, list(g)))
Output:
key: 1, items: [['2', '1245678', '', '4', 'may', '2014']]
key: 3, items: [['1', '3456542', '', '5', 'may', '2014'], ['3', '3256876', '', '2', 'may', '2014']]
key: 5, items: [['4', '5643156', '', '6', 'may', '2014']]
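Putting it all together, here's a minimal sketch of the whole pipeline (read, sort, write the sorted copy, then one file per leading digit; the subs_N.txt naming follows the question):

import csv
from itertools import groupby
from operator import itemgetter

# Read all rows of the space-delimited input file.
with open("subs.txt", "r") as csv_file:
    file_lines = list(csv.reader(csv_file, delimiter=" "))

# Sort on the second column so each group is contiguous for groupby.
file_lines.sort(key=itemgetter(1))

with open("sorted_subs.txt", "w") as out_file:
    csv.writer(out_file, delimiter=" ").writerows(file_lines)

# One output file per first digit of the second column.
for k, g in groupby(file_lines, key=lambda x: x[1][0]):
    with open("subs_{0}.txt".format(k), "w") as f:
        csv.writer(f, delimiter=" ").writerows(g)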

Related

Python: I am not able to access individual elements inside a sublist. The entire sublist is displayed as a single element

My code:
import ast

with open('input.txt', 'r') as file:
    filedata = file.read()
filedata = filedata.replace('|', ',')
out = []
buff = []
for c in filedata:
    if c == '\n':
        out.append(''.join(buff))
        buff = []
    else:
        buff.append(c)
else:
    if buff:
        out.append(''.join(buff))
list = [[i] for i in out]
print(list)
Input:
10|1|SELL|toaster_1|10.00|20 12|8|BID|toaster_1|7.50
13|5|BID|toaster_1|12.50 15|8|SELL|tv_1|250.00|20 16
17|8|BID|toaster_1|20.00 18|1|BID|tv_1|150.00 19|3|BID|tv_1|200.00
20 21|3|BID|tv_1|300.00
Expected output:
[["10","1","SELL","toaster_1","10.00","20"],
["12","8","BID","toaster_1","7.50"],
["13","5","BID","toaster_1","12.50"],
["15","8","SELL","tv_1","250.00","20"], ["16"],
["17","8","BID","toaster_1","20.00"],
["18","1","BID","tv_1","150.00"], ["19","3","BID","tv_1","200.00"],
["20"], ["21","3","BID","tv_1","300.00"]] "
The output I am getting:
[['10,1,SELL,toaster_1,10.00,20'],
['12,8,BID,toaster_1,7.50'], ['13,5,BID,toaster_1,12.50'],
['15,8,SELL,tv_1,250.00,20'], ['16'], ['17,8,BID,toaster_1,20.00'],
['18,1,BID,tv_1,150.00'], ['19,3,BID,tv_1,200.00'], ['20'],
['21,3,BID,tv_1,300.00']]
I want to access individual elements within a sublist, e.g. SELL or toaster_1, but I am not able to access them. Can someone advise please?
Use:
# filedata = file.read()
filedata = """10|1|SELL|toaster_1|10.00|20 12|8|BID|toaster_1|7.50
13|5|BID|toaster_1|12.50 15|8|SELL|tv_1|250.00|20 16
17|8|BID|toaster_1|20.00 18|1|BID|tv_1|150.00 19|3|BID|tv_1|200.00
20 21|3|BID|tv_1|300.00 """

result = []
for i in filedata.split():          # split on whitespace (spaces and newlines)
    result.append(i.split("|"))     # split each record on `|` and collect it
print(result)
Or, as a list comprehension:
result = [i.split("|") for i in filedata.split()]
Output:
[['10', '1', 'SELL', 'toaster_1', '10.00', '20'],
['12', '8', 'BID', 'toaster_1', '7.50'],
['13', '5', 'BID', 'toaster_1', '12.50'],
['15', '8', 'SELL', 'tv_1', '250.00', '20'],
['16'],
['17', '8', 'BID', 'toaster_1', '20.00'],
['18', '1', 'BID', 'tv_1', '150.00'],
['19', '3', 'BID', 'tv_1', '200.00'],
['20'],
['21', '3', 'BID', 'tv_1', '300.00']]
Well, your code never splits a line into comma-separated values. You just read the line character by character, join all those characters back into a string, and append that string to the out list.
The following code should work (I changed your own code as little as possible; I would instead use a cleaner solution like the one by Rakesh above):
with open('input.txt', 'r') as file:
    filedata = file.read()
filedata = filedata.replace('|', ',')
out = []
buff = []
for c in filedata:
    if c in ('\n', ' '):    # records are separated by newlines and spaces
        if buff:
            out.append(''.join(buff).split(","))    # store the fields as a sublist
        buff = []
    else:
        buff.append(c)
else:
    if buff:    # flush the final record
        out.append(''.join(buff).split(","))
# l = [[i] for i in out]   # no longer needed: out already holds sublists
print(out)
By the way, it is recommended not to use list as a variable name, since it shadows the built-in type.
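For example, here's what goes wrong once the built-in is shadowed:

list = [['10'], ['12']]    # rebinding the name hides the built-in list type
try:
    list("abc")            # this call now targets our data, not the type
except TypeError as e:
    print(e)               # prints: 'list' object is not callable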

How to export multiple dictionaries with the same keys but different values from a txt file to CSV

I have a list of {n} dictionaries in a txt file, one dictionary per line as illustrated below, which I want exported in CSV format with one column per key.
{'a':'1','b':'2','c':'3'}
{'a':'4','b':'5','c':'6'}
{'a':'7','b':'8','c':'9'}
{'a':'10','b':'11','c':'12'}
...
{'a':'x','b':'y','c':'z'}
I want CSV output for the {n} rows as below, with an index:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
... ... ... ...
n x y z
You can use ast.literal_eval to load your data from the text file.
With contents of input file file.txt:
{'a':'1','b':'2','c':'3'}
{'a':'4','b':'5','c':'6'}
{'a':'7','b':'8','c':'9'}
{'a':'10','b':'11','c':'12'}
{'a':'x','b':'y','c':'z'}
You could use this script to load the data and write file.csv:
import csv
from ast import literal_eval

with open('file.txt', 'r') as f_in:
    lst = [literal_eval(line) for line in f_in if line.strip()]

with open('file.csv', 'w', newline='') as csvfile:
    fieldnames = ['a', 'b', 'c']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(lst)
file.csv will become:
a,b,c
1,2,3
4,5,6
7,8,9
10,11,12
x,y,z
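If the keys aren't known up front, a small variation (a sketch, assuming every line holds the same keys) derives the fieldnames from the first parsed dictionary instead of hardcoding them:

import csv
from ast import literal_eval

with open('file.txt', 'r') as f_in:
    lst = [literal_eval(line) for line in f_in if line.strip()]

with open('file.csv', 'w', newline='') as csvfile:
    # Take the column order from the first dictionary's keys.
    writer = csv.DictWriter(csvfile, fieldnames=list(lst[0]))
    writer.writeheader()
    writer.writerows(lst)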
Importing file.csv into LibreOffice then shows one column per key (screenshot omitted).
Alternatively, you can reshape the list of dictionaries into a dictionary of columns:
x = [{'a':'1','b':'2','c':'3'},
     {'a':'4','b':'5','c':'6'},
     {'a':'7','b':'8','c':'9'},
     {'a':'10','b':'11','c':'12'}]
n = len(x)
keys = list(x[0].keys())
newdict = dict()
for m in keys:
    newdict[m] = []
    for i in range(n):
        newdict[m].append(x[i][m])
newdict
Output is:
{'a': ['1', '4', '7', '10'],
'b': ['2', '5', '8', '11'],
'c': ['3', '6', '9', '12']}
Or you can use pandas.concat, which combines DataFrames with the same columns:
import pandas as pd

x = [{'a':'1','b':'2','c':'3'},
     {'a':'4','b':'5','c':'6'},
     {'a':'7','b':'8','c':'9'},
     {'a':'10','b':'11','c':'12'}]
xpd = []
for i in x:
    df = pd.DataFrame(i, index=[0])
    xpd.append(df)
pd.concat(xpd, ignore_index=True)
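Note that pandas can also build the frame in a single step, since the DataFrame constructor accepts a list of dictionaries directly; to_csv then writes it out with the index the question asks for (a sketch, with a made-up output path):

import pandas as pd

x = [{'a':'1','b':'2','c':'3'},
     {'a':'4','b':'5','c':'6'}]
df = pd.DataFrame(x)     # one row per dictionary, one column per key
df.to_csv('out.csv')     # the index is written by default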

When calling a recursive function to order values, it misses one. How do I fix this?

I have a recursive function that reads a list of scout records from a file and adds them, in order of their IDs, to a list box. The function is called with addScouts(1) and is shown below:
def addScouts(self, I):
    i = I
    with open(fileName, "r") as f:
        lines = f.readlines()
        for line in lines:
            if str(line.split(",")[3])[:-1] == str(i):
                self.scoutList.insert(END, line[:-1])
                i += 1
                return self.addScouts(i)
    return
My issue is that my file's IDs are ordered 1, 2, 4, 5, as at some point I removed the scout with ID 3. However, when I run the function to re-order the scouts in the list box, it only lists the scouts with IDs before 3. This is because when i = 3, none of the items in the file match, so the function reaches the end and returns before it gets a chance to check the remaining records.
File contents:
Kris,Rice,17,1
Olly,Fallows,17,2
Olivia,Bird,17,4
Louis,Martin,18,5
Any ideas how to fix this?
Just sort on the last column:
sorted(f, key=lambda x: int(x.split(",")[-1]))
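Applied to the method from the question, a non-recursive sketch (fileName, scoutList and END as in the question) could look like:

def addScouts(self):
    with open(fileName, "r") as f:
        # Sort the lines numerically on their trailing ID column.
        for line in sorted(f, key=lambda x: int(x.split(",")[-1])):
            self.scoutList.insert(END, line.rstrip("\n"))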
You can use bisect to find where to put the new data to keep the data ordered after it is sorted once:
from bisect import bisect
import csv

with open("foo.txt") as f:
    r = list(csv.reader(f))

keys = [int(row[-1]) for row in r]     # the ID column, as ints
new = ["foo", "bar", "12", "3"]
ind = bisect(keys, int(new[-1]))       # insertion point that keeps the order
r.insert(ind, new)
print(r)
Output:
[['Kris', 'Rice', '17', '1'], ['Olly', 'Fallows', '17', '2'], ['foo', 'bar', '12', '3'], ['Olivia', 'Bird', '17', '4'], ['Louis', 'Martin', '18', '5']]
A simpler way is to check for the first row that has a higher ID; if none is higher, just append to the end:
import csv

with open("foo.txt") as f:
    r = list(csv.reader(f))

new = ["foo", "bar", "12", "3"]
key = int(new[-1])
ind = None
for i, row in enumerate(r):
    if int(row[-1]) >= key:
        ind = i
        break
if ind is not None:
    r.insert(ind, new)
else:
    r.append(new)
print(r)
Output:
[['Kris', 'Rice', '17', '1'], ['Olly', 'Fallows', '17', '2'], ['foo', 'bar', '12', '3'], ['Olivia', 'Bird', '17', '4'], ['Louis', 'Martin', '18', '5']]
To always keep the file in order when adding a new value, we just write to a temp file, inserting the new line in the correct place, then replace the original with the updated file:
import csv
from tempfile import NamedTemporaryFile
from shutil import move

# On Python 3, open the temp file with mode="w", newline="" so csv can write text.
with open("foo.csv") as f, NamedTemporaryFile(dir=".", delete=False) as temp:
    r = csv.reader(f)
    wr = csv.writer(temp)
    new = ["foo", "bar", "12", "3"]
    key = int(new[-1])
    for row in r:
        if int(row[-1]) >= key:
            wr.writerow(new)      # found the spot: write the new row,
            wr.writerow(row)      # then the current row,
            wr.writerows(r)       # then whatever is left of the reader,
            break
        wr.writerow(row)
    else:                         # no larger key found: the new row goes last
        wr.writerow(new)
move(temp.name, "foo.csv")
Afterwards, foo.csv will have the data in order:
Kris,Rice,17,1
Olly,Fallows,17,2
foo,bar,12,3
Olivia,Bird,17,4
Louis,Martin,18,5
You can check whether your list box has as many entries as the file has lines; if not, run addScouts again, otherwise stop. Like this:
def addScouts(self, I):
    i = I
    with open(fileName, "r") as f:
        lines = f.readlines()
        for line in lines:
            if str(line.split(",")[3])[:-1] == str(i):
                self.scoutList.insert(END, line[:-1])
                i += 1
                return self.addScouts(i)
    if self.scoutList.size() < len(lines):   # a Tkinter Listbox reports its length via size()
        return self.addScouts(i + 1)
    return

.csv data into a dictionary in Python: Duplicate values

I'm attempting to turn .csv data into a dictionary in Python but I appear to be getting duplicate dictionary entries.
This is an example of what the .csv data looks like:
ticker,1,2,3,4,5,6
XOM,10,15,17,11,13,20
AAPL,12,11,12,13,11,22
My intention is to use the first column as the key and the remaining columns as the values. Ideally I should have 3 entries: ticker, XOM, and AAPL. But instead I get this:
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'XOM': ['10', '15', '17', '11', '13', '20']}
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'XOM': ['10', '15', '17', '11', '13', '20']}
{'AAPL': ['12', '11', '12', '13', '11', '22']}
So it looks like I'm getting row 1, then row 1 & 2, then row 1, 2 & 3.
This is the code I'm using:
def data_pull():
    # gets data out of a .csv file
    datafile = open("C:\sample.csv")
    data = []   # blank list
    dict = {}   # blank dictionary
    for row in datafile:
        data.append(row.strip().split(","))   # removes whitespace and commas
        for x in data:   # organizes data from list into dictionary
            k = x[0]
            v = x[1:]
            dict = {k:v for x in data}
            print dict

data_pull()
I'm trying to figure out why the duplicate entries are showing up.
You have too many loops; you extend data then loop over the whole data list with all entries gathered so far:
for row in datafile:
    data.append(row.strip().split(","))   # removes whitespace and commas
    for x in data:
        # will loop over all entries parsed so far
so you'd append a row to data, then loop over the list, with one item:
data = [['ticker', '1', '2', '3', '4', '5', '6']]
then you'd read the next line, append it to data, and loop over data again:
data = [
    ['ticker', '1', '2', '3', '4', '5', '6'],
    ['XOM', '10', '15', '17', '11', '13', '20'],
]
so you iterate twice, then add the next line and loop three times, and so on.
You could simplify this to:
for row in datafile:
    x = row.strip().split(",")
    dict[x[0]] = x[1:]
You can save yourself some work by using the csv module:
import csv

def data_pull():
    results = {}
    with open("C:\sample.csv", 'rb') as datafile:
        reader = csv.reader(datafile)
        for row in reader:
            results[row[0]] = row[1:]
    return results
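Usage then looks like this (a sketch):

data = data_pull()
print data['XOM']    # ['10', '15', '17', '11', '13', '20']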
Use the built-in csv module:
import csv

output = {}
with open("C:\sample.csv") as f:
    freader = csv.reader(f)
    for row in freader:
        output[row[0]] = row[1:]
The loop for x in data should be outside of the loop for row in datafile:
for row in datafile:
    data.append(row.strip().split(","))   # removes whitespace and commas
for x in data:   # organizes data from list into dictionary
    k = x[0]
Or, the csv module can be your friend:
with open("text.csv") as lines:
    print {row[0]: row[1:] for row in csv.reader(lines)}
A side note: it's always a good idea to use raw strings for Windows paths:
open(r"C:\sample.csv")
If your file were named, e.g., C:\text.csv, then \t would be interpreted as a tab character.
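A quick demonstration of the difference:

print len("C:\text.csv")     # 10 -- the "\t" collapsed into a single tab character
print len(r"C:\text.csv")    # 11 -- the backslash is preserved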

python reading text file

I have a text file from which I need each column, preferably in a dictionary or list. The format is:
N ID REMAIN VERS
2 2343333 bana twelve
3 3549287 moredp twelve
3 9383737 hinsila twelve
3 8272655 hinsila eight
I have tried:
crs = open("file.txt", "r")
for columns in (raw.strip().split() for raw in crs):
    print columns[0]
Result: an "index out of range" error.
Also tried:
crs = csv.reader(open("file.txt", "r"), delimiter=',', quotechar='|', skipinitialspace=True)
for row in crs:
    for columns in row:
        print columns[3]
This seems to read each char as a column, instead of each 'word'.
I would like to get the four columns, i.e.:
2
2343333
bana
twelve
into separate dictionaries or lists.
Any help is great, thanks!
This works fine for me:
>>> crs = open("file.txt", "r")
>>> for columns in ( raw.strip().split() for raw in crs ):
... print columns[0]
...
N
2
3
3
3
If you want to convert columns to rows, use zip.
>>> crs = open("file.txt", "r")
>>> rows = (row.strip().split() for row in crs)
>>> zip(*rows)
[('N', '2', '3', '3', '3'),
('ID', '2343333', '3549287', '9383737', '8272655'),
('REMAIN', 'bana', 'moredp', 'hinsila', 'hinsila'),
('VERS', 'twelve', 'twelve', 'twelve', 'eight')]
If you have blank lines, filter them before using zip.
>>> crs = open("file.txt", "r")
>>> rows = (row.strip().split() for row in crs)
>>> zip(*(row for row in rows if row))
[('N', '2', '3', '3', '3'), ('ID', '2343333', '3549287', '9383737', '8272655'), ('REMAIN', 'bana', 'moredp', 'hinsila', 'hinsila'), ('VERS', 'twelve', 'twelve', 'twelve', 'eight')]
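From there it's one more step to the separate lists the question asks for, keyed by the header row (a sketch in the same style; dictionary order may vary):
>>> crs = open("file.txt", "r")
>>> rows = [row.strip().split() for row in crs if row.strip()]
>>> header, body = rows[0], rows[1:]
>>> dict(zip(header, [list(col) for col in zip(*body)]))
{'VERS': ['twelve', 'twelve', 'twelve', 'eight'], 'REMAIN': ['bana', 'moredp', 'hinsila', 'hinsila'], 'N': ['2', '3', '3', '3'], 'ID': ['2343333', '3549287', '9383737', '8272655']}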
>>> with open("file.txt") as f:
... c = csv.reader(f, delimiter=' ', skipinitialspace=True)
... for line in c:
... print(line)
...
['N', 'ID', 'REMAIN', 'VERS', '']   # the trailing '' comes from the space at the end of each line
['2', '2343333', 'bana', 'twelve', '']
['3', '3549287', 'moredp', 'twelve', '']
['3', '9383737', 'hinsila', 'twelve', '']
['3', '8272655', 'hinsila', 'eight', '']
Or, the old-fashioned way:
>>> with open("file.txt") as f:
... [line.split() for line in f]
...
[['N', 'ID', 'REMAIN', 'VERS'],
['2', '2343333', 'bana', 'twelve'],
['3', '3549287', 'moredp', 'twelve'],
['3', '9383737', 'hinsila', 'twelve'],
['3', '8272655', 'hinsila', 'eight']]
And for getting column values (with the list above bound to l):
>>> l
[['N', 'ID', 'REMAIN', 'VERS'],
['2', '2343333', 'bana', 'twelve'],
['3', '3549287', 'moredp', 'twelve'],
['3', '9383737', 'hinsila', 'twelve'],
['3', '8272655', 'hinsila', 'eight']]
>>> {l[0][i]: [line[i] for line in l[1:]] for i in range(len(l[0]))}
{'ID': ['2343333', '3549287', '9383737', '8272655'],
'N': ['2', '3', '3', '3'],
'REMAIN': ['bana', 'moredp', 'hinsila', 'hinsila'],
'VERS': ['twelve', 'twelve', 'twelve', 'eight']}
You could use a list comprehension like this:
with open("split.txt","r") as splitfile:
for columns in [line.split() for line in splitfile]:
print(columns)
You will then have it in a 2D array, allowing you to group it any way you like.
How about this?
f = open("file.txt")
for i in f:
    k = i.split()
    for j in k:
        print j
Just use a list of lists:
import csv

columns = [[] for _ in range(4)]   # 4 columns expected
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        for i, col in enumerate(row):
            columns[i].append(col)
or if the number of columns needs to grow dynamically:
import csv

columns = []
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        while len(row) > len(columns):
            columns.append([])
        for i, col in enumerate(row):
            columns[i].append(col)
In the end, you can then print your columns with:
for i, col in enumerate(columns, 1):
    print 'List{}: {{{}}}'.format(i, ','.join(col))
