python reading text file - python

I have a text file and need to read each column, preferably into a dictionary or list. The format is:
N ID REMAIN VERS
2 2343333 bana twelve
3 3549287 moredp twelve
3 9383737 hinsila twelve
3 8272655 hinsila eight
I have tried:
crs = open("file.txt", "r")
for columns in ( raw.strip().split() for raw in crs ):
    print columns[0]
Result = 'Out of index error'
Also tried:
crs = csv.reader(open("file.txt", "r"), delimiter=',', quotechar='|', skipinitialspace=True)
for row in crs:
    for columns in row:
        print columns[3]
Which seems to read each char as a column, instead of each 'word'
I would like to get the four columns, ie:
2
2343333
bana
twelve
into separate dictionaries or lists
Any help is great, thanks!

This works fine for me:
>>> crs = open("file.txt", "r")
>>> for columns in ( raw.strip().split() for raw in crs ):
...     print columns[0]
...
N
2
3
3
3
If you want to convert rows to columns, use zip.
>>> crs = open("file.txt", "r")
>>> rows = (row.strip().split() for row in crs)
>>> zip(*rows)
[('N', '2', '3', '3', '3'),
('ID', '2343333', '3549287', '9383737', '8272655'),
('REMAIN', 'bana', 'moredp', 'hinsila', 'hinsila'),
('VERS', 'twelve', 'twelve', 'twelve', 'eight')]
If you have blank lines, filter them before using zip.
>>> crs = open("file.txt", "r")
>>> rows = (row.strip().split() for row in crs)
>>> zip(*(row for row in rows if row))
[('N', '2', '3', '3', '3'), ('ID', '2343333', '3549287', '9383737', '8272655'), ('REMAIN', 'bana', 'moredp', 'hinsila', 'hinsila'), ('VERS', 'twelve', 'twelve', 'twelve', 'eight')]
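On Python 3, zip returns an iterator rather than a list, so the transposed columns have to be materialised before they can be printed or indexed. Here is a minimal sketch of the same idea that also keys each column by its header. It assumes the question's sample data, held in an in-memory io.StringIO instead of file.txt; the name columns_by_name is made up for illustration.

```python
import io

# Stand-in for open("file.txt"); the contents mirror the question's sample,
# with one blank line added to show the filtering.
sample = io.StringIO(
    "N ID REMAIN VERS\n"
    "2 2343333 bana twelve\n"
    "\n"
    "3 3549287 moredp twelve\n"
)

# Split each line on whitespace and drop blank lines.
rows = [line.split() for line in sample if line.strip()]

# The first row is the header; zip(*body) transposes the remaining rows.
header, *body = rows
columns_by_name = {name: list(col) for name, col in zip(header, zip(*body))}

print(columns_by_name["REMAIN"])
```

A dictionary keyed by header lets you ask for a column by name instead of by position, which is what the question asked for.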

>>> with open("file.txt") as f:
...     c = csv.reader(f, delimiter=' ', skipinitialspace=True)
...     for line in c:
...         print(line)
...
['N', 'ID', 'REMAIN', 'VERS', ''] # the trailing '' comes from the space at the end of each line.
['2', '2343333', 'bana', 'twelve', '']
['3', '3549287', 'moredp', 'twelve', '']
['3', '9383737', 'hinsila', 'twelve', '']
['3', '8272655', 'hinsila', 'eight', '']
Or, old-fashioned way:
>>> with open("file.txt") as f:
...     [line.split() for line in f]
...
[['N', 'ID', 'REMAIN', 'VERS'],
['2', '2343333', 'bana', 'twelve'],
['3', '3549287', 'moredp', 'twelve'],
['3', '9383737', 'hinsila', 'twelve'],
['3', '8272655', 'hinsila', 'eight']]
And for getting column values:
>>> l
[['N', 'ID', 'REMAIN', 'VERS'],
['2', '2343333', 'bana', 'twelve'],
['3', '3549287', 'moredp', 'twelve'],
['3', '9383737', 'hinsila', 'twelve'],
['3', '8272655', 'hinsila', 'eight']]
>>> {l[0][i]: [line[i] for line in l[1:]] for i in range(len(l[0]))}
{'ID': ['2343333', '3549287', '9383737', '8272655'],
'N': ['2', '3', '3', '3'],
'REMAIN': ['bana', 'moredp', 'hinsila', 'hinsila'],
'VERS': ['twelve', 'twelve', 'twelve', 'eight']}

You could use a list comprehension like this:
with open("split.txt","r") as splitfile:
    for columns in [line.split() for line in splitfile]:
        print(columns)
Assign the comprehension's result to a variable and you will have the data in a 2D list, which you can regroup any way you like.

How about this?
f = open("file.txt")
for i in f:
    k = i.split()
    for j in k:
        print j

Just use a list of lists:
import csv
columns = [[] for _ in range(4)] # 4 columns expected
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        for i, col in enumerate(row):
            columns[i].append(col)
Or, if the number of columns needs to grow dynamically:
import csv
columns = []
with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        while len(row) > len(columns):
            columns.append([])
        for i, col in enumerate(row):
            columns[i].append(col)
In the end, you can then print your columns with:
for i, col in enumerate(columns, 1):
    print 'List{}: {{{}}}'.format(i, ','.join(col))

Related

Python : I am not able to access individual elements inside sublist. The entire sublist is displayed as single element

My Code :
import ast
with open('input.txt', 'r') as file :
    filedata = file.read()
filedata = filedata.replace('|', ',')
out = []
buff = []
for c in filedata :
    if c == '\n':
        out.append(''.join(buff))
        buff = []
    else:
        buff.append(c)
else:
    if buff:
        out.append(''.join(buff))
list = [[i] for i in out]
print(list)
Input :
10|1|SELL|toaster_1|10.00|20 12|8|BID|toaster_1|7.50
13|5|BID|toaster_1|12.50 15|8|SELL|tv_1|250.00|20 16
17|8|BID|toaster_1|20.00 18|1|BID|tv_1|150.00 19|3|BID|tv_1|200.00
20 21|3|BID|tv_1|300.00
Expected Output
[["10","1","SELL","toaster_1","10.00","20"],
["12","8","BID","toaster_1","7.50"],
["13","5","BID","toaster_1","12.50"],
["15","8","SELL","tv_1","250.00","20"], ["16"],
["17","8","BID","toaster_1","20.00"],
["18","1","BID","tv_1","150.00"], ["19","3","BID","tv_1","200.00"],
["20"], ["21","3","BID","tv_1","300.00"]]
The Output I am getting:
[['10,1,SELL,toaster_1,10.00,20'],
['12,8,BID,toaster_1,7.50'], ['13,5,BID,toaster_1,12.50'],
['15,8,SELL,tv_1,250.00,20'], ['16'], ['17,8,BID,toaster_1,20.00'],
['18,1,BID,tv_1,150.00'], ['19,3,BID,tv_1,200.00'], ['20'],
['21,3,BID,tv_1,300.00']]
I want to access individual elements within a sublist, e.g. SELL or
toaster, but I am not able to access them. Can someone advise, please?
Use:
# filedata = file.read()
filedata = """10|1|SELL|toaster_1|10.00|20 12|8|BID|toaster_1|7.50
13|5|BID|toaster_1|12.50 15|8|SELL|tv_1|250.00|20 16
17|8|BID|toaster_1|20.00 18|1|BID|tv_1|150.00 19|3|BID|tv_1|200.00
20 21|3|BID|tv_1|300.00 """
result = []
for i in filedata.split(): #split on whitespace
    result.append(i.split("|")) #split by `|` and append to result
print(result)
Or a list comprehension
Ex:
result = [i.split("|") for i in filedata.split()]
Output:
[['10', '1', 'SELL', 'toaster_1', '10.00', '20'],
['12', '8', 'BID', 'toaster_1', '7.50'],
['13', '5', 'BID', 'toaster_1', '12.50'],
['15', '8', 'SELL', 'tv_1', '250.00', '20'],
['16'],
['17', '8', 'BID', 'toaster_1', '20.00'],
['18', '1', 'BID', 'tv_1', '150.00'],
['19', '3', 'BID', 'tv_1', '200.00'],
['20'],
['21', '3', 'BID', 'tv_1', '300.00']]
Well your code never handles splitting the line into comma separated values. You just read the line character by character, join all those characters together into a string, and append it to the out list.
The following code should work (I changed your own code minimally; I would instead use a cleaner solution like the one by Rakesh):
import ast
with open('input.txt', 'r') as file :
    filedata = file.read()
filedata = filedata.replace('|', ',')
out = []
buff = []
for c in filedata :
    if c == '\n':
        line = ''.join(buff)
        out.append(line.split(","))  # one sublist of fields per line
        buff = []
    else:
        buff.append(c)
else:
    if buff:
        out.append(''.join(buff).split(","))  # last line without a trailing newline
# l = [[i] for i in out]
print(out)
By the way, it is recommended not to use list as a variable name.
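Once every record is its own list, the individual fields the question asks about can be reached by index. Below is a small sketch using a few records from the question's input. The names records, first_action, first_item and actions are made up for illustration, and the length check is there because some records (such as "16") have only one field.

```python
# A few sample records taken from the question's input.
filedata = "10|1|SELL|toaster_1|10.00|20\n12|8|BID|toaster_1|7.50\n16\n"

# Split on whitespace first, then split each record on '|'.
records = [token.split("|") for token in filedata.split()]

# Each record is now a list, so fields are reachable by position.
first_action = records[0][2]
first_item = records[0][3]

# Short records such as "16" have no third field, so guard the index.
actions = [rec[2] for rec in records if len(rec) > 2]

print(first_action, first_item, actions)
```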

How to export multiple dictionaries with same keys, different values in txt file to csv

I have a list of {n} dictionaries in a txt file, one dictionary per line as illustrated below, which I want to export in CSV format with each key in its own column.
{'a':'1','b':'2','c':'3'}
{'a':'4','b':'5','c':'6'}
{'a':'7','b':'8','c':'9'}
{'a':'10','b':'11','c':'12'}
...
{'a':'x','b':'y','c':'z'}
I want CSV output for {n} rows as below, with an index:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
... ... ... ...
n x y z
You can use ast.literal_eval (doc) to load your data from the text file.
With contents of input file file.txt:
{'a':'1','b':'2','c':'3'}
{'a':'4','b':'5','c':'6'}
{'a':'7','b':'8','c':'9'}
{'a':'10','b':'11','c':'12'}
{'a':'x','b':'y','c':'z'}
You could use this script to load the data and write file.csv:
import csv
from ast import literal_eval
with open('file.txt', 'r') as f_in:
    lst = [literal_eval(line) for line in f_in if line.strip()]
with open('file.csv', 'w', newline='') as csvfile:
    fieldnames = ['a', 'b', 'c']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(lst)
file.csv will become:
a,b,c
1,2,3
4,5,6
7,8,9
10,11,12
x,y,z
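The question also asks for an index column, which the output above does not have. One possible way to add it, as a sketch only: include an extra field in fieldnames and fill it from enumerate. The column name "index" and the in-memory io.StringIO target are assumptions made so the example runs on its own; with a real file you would open it with newline='' as in the script above.

```python
import csv
import io
from ast import literal_eval

# Two sample lines in the same format as file.txt.
lines = [
    "{'a':'1','b':'2','c':'3'}",
    "{'a':'4','b':'5','c':'6'}",
]
rows = [literal_eval(line) for line in lines]

# In-memory stand-in for open('file.csv', 'w', newline='').
out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=["index", "a", "b", "c"])
writer.writeheader()
for i, row in enumerate(rows):
    # Merge the running index into each dictionary before writing it.
    writer.writerow({"index": i, **row})

csv_text = out.getvalue()
print(csv_text)
```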
Importing the file to LibreOffice: [screenshot omitted]
x = [{'a':'1','b':'2','c':'3'},
     {'a':'4','b':'5','c':'6'},
     {'a':'7','b':'8','c':'9'},
     {'a':'10','b':'11','c':'12'}]
n = len(x)
keys = list(x[0].keys())
newdict = dict()
for m in keys:
    newdict[m] = []
    for i in range(n):
        newdict[m].append(x[i][m])
newdict
Output is
{'a': ['1', '4', '7', '10'],
'b': ['2', '5', '8', '11'],
'c': ['3', '6', '9', '12']}
Or you can use pandas.concat which is used to combine DataFrames with the same columns.
import pandas as pd
x = [{'a':'1','b':'2','c':'3'},
     {'a':'4','b':'5','c':'6'},
     {'a':'7','b':'8','c':'9'},
     {'a':'10','b':'11','c':'12'}]
xpd = []
for i in x:
    df = pd.DataFrame(i, index=[0])
    xpd.append(df)
pd.concat(xpd, ignore_index=True)

Split a string which is inside a list of a list in two elements which stay in the same list

This is a series of numbers in a text file which I import and want to convert into a specific list.
3 04,24
4 04,75
4 05,11
4 05,47
4 05,78
4 06,80
3 07,25
3 07,92
3 08,23
2 09,76
With my current code I get this far:
[['3 04,24'], ['4 04,75'], ['4 05,11'], ['4 05,47'], ['4 05,78'], ['4 06,80'], ['3 07,25'], ['3 07,92'], ['3 08,23'], ['2 09,76']]
But I want to split each element of the inner lists in two, to get something like this:
[['3','04,24'], ['4','04,75']] etc...
After a lot of research I still can't find the solution. If you could also tell me how to convert these elements from string to int, that would be very helpful!
Here's my code:
with open("myfile.txt") as f:
    mylist = [line.rstrip('\n') for line in f]
    mylist = [mylist[x:x+1] for x in range(0, len(mylist), 1)]
    print(mylist)
Thanks.
This is one solution using the csv module from the standard library:
import csv
with open('myfile.txt', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    res = list(reader)
Example with your data:
from io import StringIO
import csv
mystr = StringIO("""3 04,24
4 04,75
4 05,11
4 05,47
4 05,78
4 06,80
3 07,25
3 07,92
3 08,23
2 09,76""")
with mystr as f:
    reader = csv.reader(f, delimiter=' ')
    res = list(reader)
print(res)
# [['3', '04,24'],
# ['4', '04,75'],
# ['4', '05,11'],
# ['4', '05,47'],
# ['4', '05,78'],
# ['4', '06,80'],
# ['3', '07,25'],
# ['3', '07,92'],
# ['3', '08,23'],
# ['2', '09,76']]
Or if you need to convert data to numeric:
# Re-create mystr first if a previous with-block has already closed it.
with mystr as f:
    reader = csv.reader(f, delimiter=' ')
    res = [[int(i), float(j.replace(',', '.'))] for i, j in reader]
print(res)
[[3, 4.24],
[4, 4.75],
[4, 5.11],
...
Use a list-comprehension:
>>> lst = [['3 04.24'], ['4 04.75'], ['4 05.11'], ['4 05.47'], ['4 05.78'], ['4 06.80'], ['3 07.25'], ['3 07.92'], ['3 08.23'], ['2 09.76']]
>>> [x[0].split() for x in lst]
Outputs:
[['3', '04.24'],
['4', '04.75'],
['4', '05.11'],
['4', '05.47'],
['4', '05.78'],
['4', '06.80'],
['3', '07.25'],
['3', '07.92'],
['3', '08.23'],
['2', '09.76']]
To convert the strings into numbers:
[[int(i) if '.' not in i else float(i) for i in x[0].split()] for x in lst]
Use the str.split() method:
with open("myfile.txt") as f:
    mylist = [line.rstrip('\n') for line in f]
my_structured_list = [line.split(" ") for line in mylist]
print(my_structured_list)
For the second part of your question about converting the elements to int, you can use str.split() again and convert the resulting elements to int:
with open("myfile.txt") as f:
    mylist = [line.rstrip('\n') for line in f]
my_structured_list = [line.split(" ") for line in mylist]
my_structured_int_list = []
for line_tuple in my_structured_list:
    input_first_element = line_tuple[0]
    input_second_element, input_third_element = line_tuple[1].split(",")
    output_first_half = int(input_first_element)
    output_second_half = int(input_second_element), int(input_third_element)
    my_structured_int_list.append((output_first_half, output_second_half))
print(my_structured_int_list)
A simple solution is as follows:
with open(file,'r') as f:
    print([each.split() for each in f])
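The answers above disagree on what the comma means: one treats "04,24" as a decimal number, another as two separate integers. As a sketch under the first reading (comma as decimal mark, which is an assumption about the data), splitting and conversion can be combined in one small helper. The name parse_line is hypothetical.

```python
# A few sample lines in the same format as myfile.txt.
lines = ["3 04,24", "4 04,75", "2 09,76"]

def parse_line(line):
    """Return [int, float] for a line like '3 04,24' (comma read as decimal mark)."""
    count, value = line.split()
    # float() does not accept a decimal comma, so swap it for a dot first.
    return [int(count), float(value.replace(",", "."))]

parsed = [parse_line(line) for line in lines]
print(parsed)
```

If the comma actually separates two integers, replace the float conversion with a second split on "," and int() on each part, as in the longer answer above.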

How to split file into smaller by first number in second column

So my data looks like:
1 3456542 5 may 2014
2 1245678 4 may 2014
3 4256876 2 may 2014
4 5643156 6 may 2014
.....
The goal is to sort it by the 2nd column then separate the rows based on the first number in the 2nd column (i.e. 3456542 goes to subs_3.txt, 1245678 goes to subs_1.txt...). The output is totally wrong and gives me 6 files with what appears to be random rows in them. Any suggestions?
import csv
from operator import itemgetter
file_lines = []
with open("subs.txt", "r") as csv_file:
    reader = csv.reader(csv_file, delimiter=" ")
    for row in reader:
        file_lines.append(row)
file_lines.sort(key=itemgetter(1))
with open("sorted_subs.txt", "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=" ")
    for row in file_lines:
        writer.writerow(row)
for row in file_lines:
    file_num = row[1][1]
    with open("file_{0}.txt".format(file_num), "w") as f:
        writer = csv.writer(f, delimiter=" ")
        writer.writerow(row)
You could use itertools.groupby to group the lines that go to same file together and then just loop over the groups in order to write the files:
from itertools import groupby
for k, g in groupby(file_lines, key=lambda x: x[1][0]):
    with open("file_{0}.txt".format(k), "w") as f:
        csv.writer(f, delimiter=" ").writerows(g)
Update: groupby groups the lines by the first digit of the second column. It yields the key used for grouping and an iterator over the grouped items. Since file_lines is already sorted, all items belonging to the same group are returned together. Here's a short example of how it works; note that the test data differs from the original question in order to demonstrate grouping:
from itertools import groupby
lst = [
    ['2', '1245678', '', '4', 'may', '2014'],
    ['1', '3456542', '', '5', 'may', '2014'],
    ['3', '3256876', '', '2', 'may', '2014'],
    ['4', '5643156', '', '6', 'may', '2014']
]
for k, g in groupby(lst, key=lambda x: x[1][0]):
    print('key: {0}, items: {1}'.format(k, list(g)))
Output:
key: 1, items: [['2', '1245678', '', '4', 'may', '2014']]
key: 3, items: [['1', '3456542', '', '5', 'may', '2014'], ['3', '3256876', '', '2', 'may', '2014']]
key: 5, items: [['4', '5643156', '', '6', 'may', '2014']]
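The reason the sort matters: groupby only merges items that sit next to each other, so unsorted input can produce the same key more than once, and opening a file in "w" mode for a repeated key would overwrite what was written earlier. Here is a small sketch of the difference, using made-up rows in the question's format. The helper first_digit is a hypothetical name.

```python
from itertools import groupby

# Rows deliberately left unsorted: keys '3', '1', '3'.
rows = [
    ["1", "3456542", "5", "may", "2014"],
    ["2", "1245678", "4", "may", "2014"],
    ["3", "3256876", "2", "may", "2014"],
]

def first_digit(row):
    """Grouping key: first character of the second column."""
    return row[1][0]

# Without sorting, the key '3' shows up in two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=first_digit)]

# Sorting by the same key first yields exactly one group per key.
grouped = {
    k: list(g)
    for k, g in groupby(sorted(rows, key=first_digit), key=first_digit)
}

print(unsorted_keys)
print(grouped)
```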

.csv data into a dictionary in Python: Duplicate values

I'm attempting to turn .csv data into a dictionary in Python but I appear to be getting duplicate dictionary entries.
This is an example of what the .csv data looks like:
ticker,1,2,3,4,5,6
XOM,10,15,17,11,13,20
AAPL,12,11,12,13,11,22
My intention is to use the first column as the key and the remaining columns as the values. Ideally I should have 3 entries: ticker, XOM, and AAPL. But instead I get this:
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'XOM': ['10', '15', '17', '11', '13', '20']}
{'ticker': ['1', '2', '3', '4', '5', '6']}
{'XOM': ['10', '15', '17', '11', '13', '20']}
{'AAPL': ['12', '11', '12', '13', '11', '22']}
So it looks like I'm getting row 1, then row 1 & 2, then row 1, 2 & 3.
This is the code I'm using:
def data_pull():
    #gets data out of a .csv file
    datafile = open("C:\sample.csv")
    data = [] #blank list
    dict = {} #blank dictionary
    for row in datafile:
        data.append(row.strip().split(",")) #removes whitespace and commas
        for x in data: #organizes data from list into dictionary
            k = x[0]
            v = x[1:]
            dict = {k:v for x in data}
            print dict
data_pull()
I'm trying to figure out why the duplicate entries are showing up.
You have too many loops; you extend data then loop over the whole data list with all entries gathered so far:
for row in datafile:
    data.append(row.strip().split(",")) #removes whitespace and commas
    for x in data:
        # will loop over all entries parsed so far
so you'd append a row to data, then loop over the list, with one item:
data = [['ticker', '1', '2', '3', '4', '5', '6']]
then you'd read the next line and append to data, so then you loop over data again and process:
data = [
    ['ticker', '1', '2', '3', '4', '5', '6'],
    ['XOM', '10', '15', '17', '11', '13', '20'],
]
so iterate twice, then add the next line, loop three times, etc.
You could simplify this to:
for row in datafile:
    x = row.strip().split(",")
    dict[x[0]] = x[1:]
You can save yourself some work by using the csv module:
import csv
def data_pull():
    results = {}
    with open("C:\sample.csv", 'rb') as datafile:
        reader = csv.reader(datafile)
        for row in reader:
            results[row[0]] = row[1:]
    return results
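The 'rb' mode above is the Python 2 convention. On Python 3, csv.reader expects text, and the csv documentation recommends opening files with newline=''. A Python 3 sketch of the same one-entry-per-row idea follows. It assumes the question's sample data, held in an in-memory io.StringIO so the example runs without C:\sample.csv.

```python
import csv
import io

# Stand-in for open(r"C:\sample.csv", newline=""); same data as the question.
sample = io.StringIO(
    "ticker,1,2,3,4,5,6\n"
    "XOM,10,15,17,11,13,20\n"
    "AAPL,12,11,12,13,11,22\n"
)

# One dictionary entry per row: first field is the key, the rest the value.
# The "if row" guard skips any blank lines the reader returns as empty lists.
results = {row[0]: row[1:] for row in csv.reader(sample) if row}

print(results["XOM"])
```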
Use the built-in csv module:
import csv
output = {}
with open("C:\sample.csv") as f:
    freader = csv.reader(f)
    for row in freader:
        output[row[0]] = row[1:]
The loop for x in data should be outside of the loop for row in datafile:
for row in datafile:
    data.append(row.strip().split(",")) #removes whitespace and commas
for x in data: #organizes data from list into dictionary
    k = x[0]
Or, csv module can be your friend:
with open("text.csv") as lines:
    print {row[0]: row[1:] for row in csv.reader(lines)}
A side note: it's always a good idea to use raw strings for Windows paths:
open(r"C:\sample.csv")
If your file were named, e.g., C:\text.csv, then \t would be interpreted as a tab character.
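A quick way to see the difference is to compare the two literals directly. This is only an illustration; the path is the hypothetical one from the note above and no file is opened.

```python
plain = "C:\text.csv"   # "\t" is read as a single tab character
raw = r"C:\text.csv"    # backslash and "t" stay as two separate characters

# repr() makes the tab visible as \t and the literal backslash as \\.
print(repr(plain))
print(repr(raw))

# The plain string is one character shorter because "\t" collapsed into a tab.
print(len(plain), len(raw))
```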
