Python: Separate text file data into tuples?

I'm trying to separate the values in a .txt file into tuples so that, later on, I can build a simple database that uses these tuples to look up the data. Here is my current code:
with open("data.txt") as load_file:
    data = [tuple(line.split()) for line in load_file]
c = 0
pts = []
while c < len(data):
    pts.append(data[c][0])
    c += 1
    print(pts)
    pts = []
Here is the text file:
John|43|123 Apple street|514 428-3452
Katya|26|49 Queen Mary Road|514 234-7654
Ahmad|91|1888 Pepper Lane|
I want to split each line on "|" and store the resulting values in a tuple so that the database will work. Here is my current output:
['John|43|123']
['Katya|26|49']
['Ahmad|91|1888']
So it only keeps the first whitespace-separated chunk of each line, as a single string, and I can't figure out how to fix this. My desired end result is something like this:
['John', 43, '123 Apple street', '514 428-3452']
['Katya', 26, '49 Queen Mary Road', '514 234-7654']
['Ahmad', 91, '1888 Pepper Lane', '']

Try this:
with open("data.txt") as load_file:
    data = [line.strip('\n').split('|') for line in load_file]
for elem in data:
    print(elem)

Try using the csv module with a custom delimiter:
import csv
with open("your_file.txt", "r") as f_in:
    reader = csv.reader(f_in, delimiter="|")
    for a, b, c, d in reader:
        print([a, int(b), c, d])
Prints:
['John', 43, '123 Apple street', '514 428-3452']
['Katya', 26, '49 Queen Mary Road', '514 234-7654']
['Ahmad', 91, '1888 Pepper Lane', '']
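Building on the csv answer, here is a minimal sketch of the tuple-based lookup the question is working toward. It assumes every line has exactly four |-separated fields and that the age is always an integer; the load_records helper, the by_name dictionary and the in-memory SAMPLE text (standing in for data.txt) are illustrative names, not part of either answer.

```python
import csv
import io

# Stand-in for data.txt (assumption: same three lines as in the question).
SAMPLE = """John|43|123 Apple street|514 428-3452
Katya|26|49 Queen Mary Road|514 234-7654
Ahmad|91|1888 Pepper Lane|
"""

def load_records(f):
    """Parse pipe-delimited lines into (name, age, address, phone) tuples."""
    records = []
    for name, age, address, phone in csv.reader(f, delimiter="|"):
        records.append((name, int(age), address, phone))
    return records

records = load_records(io.StringIO(SAMPLE))

# A very small "database": look a record up by name.
by_name = {rec[0]: rec for rec in records}

for rec in records:
    print(rec)
print(by_name["Katya"])
```

To read the real file, replace io.StringIO(SAMPLE) with open("data.txt", newline=""). Keying the dictionary by name only works if names are unique.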

Related

Is there a way to split a line by multiple characters using the split method in Python?

So far I have this code to split the lines of my file:
with open("example.dat", 'r') as f:
    lines = [line.strip().split(',') for line in f]
print(lines)
I want to split the lines so that I get a multidimensional array where each row is [city, state, latitude, longitude, population]. However, str.split only accepts a single separator, so after some research I imported re and tried to use that, since the file I am working with follows a pattern. But the result does not split the data from the file into the array the way I would like.
For example, if the file contains the line
New York City, NY[40,74]11000000
the code above would print [['New York City', ' NY[40', '74]11000000'], etc.].
I want it to print [['New York City', 'NY', 40, 74, 11000000], etc.].
Since I didn't get the results I wanted, I tried the following code.
import re
with open("example.dat", 'r') as f:
    lines = [re.split(r',[,]', line) for line in f]
print(lines)
This code outputs the data like this:
[['New York City, NY[40,74]11000000\n'], etc.]
So can I use re or the split method to split a line on several different characters?
The easiest solution may be to replace all of the different separator characters with a single one:
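with open("example.dat", "r") as fh:
    lines = []
    for line in fh:
        # Turn "[" and "]" into commas, split on commas, then trim spaces from each field
        fields = line.strip().replace("[", ",").replace("]", ",").split(",")
        lines.append([field.strip() for field in fields])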
You can use named groups in a regular expression to extract the information more precisely (read more here: https://www.regular-expressions.info/refext.html):
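import re

# Named groups label each field; \s* after the comma drops the space before the state
pat = r"(?P<city>[^,]*),\s*(?P<state>[^\[]*)\[(?P<lat>\d+),(?P<lon>\d+)\](?P<pop>\d+)"
pat = re.compile(pat)
lines = []
with open("example.dat", "r") as f:
    for text in f:
        match = pat.match(text.strip())
        if match is None:
            continue  # skip lines that don't follow the pattern
        city = match.group("city")
        state = match.group("state")
        lat = float(match.group("lat"))
        lon = float(match.group("lon"))
        population = int(match.group("pop"))
        lines.append([city, state, lat, lon, population])
# For "New York City, NY[40,74]11000000" => ['New York City', 'NY', 40.0, 74.0, 11000000]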
Regex is pretty useful in such cases:
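import re
x = 'New York City, NY[40,74]11000000'
# Split on ", " or "[" or "]" or "," (", " comes first so the space after the comma is consumed)
res = re.split(r', |\[|\]|,', x)
print(res)
# Output:
# ['New York City', 'NY', '40', '74', '11000000']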
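If you also want the numbers converted, as in the desired output, one option is to compile the separator pattern from the previous answer once and convert the last three fields. This sketch assumes every line has exactly five fields and that the coordinates are whole numbers as in the example; a city name containing a comma would break it, and real latitudes may need float instead of int. parse_line is a hypothetical helper name.

```python
import re

# ", " is listed before "," so the space after the comma is consumed too.
SEPARATORS = re.compile(r", |,|\[|\]")

def parse_line(line):
    """Split 'City, ST[lat,lon]population' into [city, state, lat, lon, population]."""
    city, state, lat, lon, pop = SEPARATORS.split(line.strip())
    return [city, state, int(lat), int(lon), int(pop)]

print(parse_line("New York City, NY[40,74]11000000\n"))
# ['New York City', 'NY', 40, 74, 11000000]
```

For a whole file, something like [parse_line(l) for l in f if l.strip()] would build the multidimensional list while skipping blank lines.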

Writing to a file the same way you read from it

The title may be a little confusing, but I couldn't find a better way to explain it. I first import scores from my file, then sort them, and then I try to write them back to the file in the same format they were read in, not as a list.
Here's a little sketch:
Import ' James 120 ' into the list ['James', 120]
Export it as James 120
Here's what I have so far:
from operator import itemgetter

def Leaderboard(User, Score):
    Scores = open("Scores.txt", "r+")
    content = Scores.readlines()
    new = []
    for i in content:
        temp = []
        newlist = i.split(" ")
        temp.append(newlist[0])
        temp.append(int(newlist[1]))
        new.append(temp)
    temp = []
    temp.append(User)
    temp.append(Score)
    new.append(temp)
    new = sorted(new, key=itemgetter(1), reverse=True)
    print(new)
    Scores.close()

Leaderboard('Hannah', 3333)
The file currently looks like this:
Olly 150
Billy 290
Graham 320
James 2
Alex 333
Here is the end result:
[['Hannah', 3333], ['Alex', 333], ['Graham', 320], ['Billy', 290], ['Olly', 150], ['James', 2]]
Here's how I want it written back to my file:
Hannah 3333
Alex 333
Graham 320
Billy 290
Olly 150
James 2
The writing code would be something like this:
Scores.seek(0)  # go back to the beginning of the file
for name, number in new:
    print(name, number, file=Scores)
Scores.truncate()  # drop any old content left past the new end
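Putting the reading, sorting and writing steps together, a sketch might look like the following. The file is opened once in "r+" mode, rewound with seek(0) before writing, and truncate() removes leftovers in case the new content is shorter than the old. The lowercase leaderboard function, the rsplit-based parsing (which lets names contain spaces) and the temporary demo file are assumptions made for illustration.

```python
import os
import tempfile
from operator import itemgetter

def leaderboard(path, user, score):
    """Add a score, sort descending, and rewrite the file as 'name score' lines."""
    with open(path, "r+") as scores:
        entries = []
        for line in scores.readlines():
            if not line.strip():
                continue  # ignore blank lines
            name, number = line.rsplit(maxsplit=1)  # last field is the score
            entries.append([name, int(number)])
        entries.append([user, score])
        entries.sort(key=itemgetter(1), reverse=True)

        scores.seek(0)  # rewind so the new content overwrites the old
        for name, number in entries:
            print(name, number, file=scores)  # same "name score" format as the input
        scores.truncate()  # drop leftovers if the new content is shorter
    return entries

# Demo on a temporary copy of the file from the question.
path = os.path.join(tempfile.mkdtemp(), "Scores.txt")
with open(path, "w") as f:
    f.write("Olly 150\nBilly 290\nGraham 320\nJames 2\nAlex 333\n")

leaderboard(path, "Hannah", 3333)
with open(path) as f:
    print(f.read())
```

Using with closes the file even if a line fails to parse, which the original open/close pair does not guarantee.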
You can use file.read(), which reads the whole file and returns its contents as a single string, e.g. contents = file.read(). There is also file.readlines(), which returns a list of strings, one per line.
If this does not completely answer what you want, read more here.

How to check how close a number is to another number?

I have a text file that looks like this:
John: 27
Micheal8483: 160
Mary Smith: 57
Adam 22: 68
Patty: 55
and so on. These are usernames, which is why some of them contain numbers. I want to check each user's number (the one after the ":") and get the 3 names whose numbers are closest in value to an integer named targetNum, which is always positive.
I have tried multiple things but I am new to Python and I am not really sure how to go about this problem. Any help is appreciated!
You can parse the file into a list of name/number pairs, then sort the list by the absolute difference between each number and targetNum. The first three items of the list will then contain the desired names:
users = []
with open("file.txt") as f:
    for line in f:
        name, num = line.split(":")
        users.append((name, int(num)))
targetNum = 50
users.sort(key=lambda pair: abs(pair[1] - targetNum))
print([pair[0] for pair in users[:3]])  # ['Patty', 'Mary Smith', 'Adam 22']
You could also use a regex here:
import re
pattern = r'(\w.+)?:\s(\d+)'
data_1 = []
targetNum = 50
with open('new_file.txt', 'r') as f:
    for line in f:
        data = re.findall(pattern, line)
        for i in data:
            # store (distance from targetNum, name)
            data_1.append((abs(int(i[1]) - targetNum), i[0]))
data_1.sort()  # smallest distance first
print(list(map(lambda x: x[1], data_1[:3])))
output:
['Patty', 'Mary Smith', 'Adam 22']
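As an alternative to sorting the whole list, heapq.nsmallest can pick the k closest entries directly. This sketch assumes each line has the form name: number, splits on the last colon so a username containing ":" would still parse, and uses a hard-coded sample list instead of the file; closest_names is a hypothetical helper.

```python
import heapq

def closest_names(lines, target, k=3):
    """Return the k usernames whose numbers are closest to target."""
    users = []
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, num = line.rsplit(":", 1)  # split on the LAST colon only
        users.append((name.strip(), int(num)))
    best = heapq.nsmallest(k, users, key=lambda pair: abs(pair[1] - target))
    return [name for name, _ in best]

sample = ["John: 27", "Micheal8483: 160", "Mary Smith: 57", "Adam 22: 68", "Patty: 55"]
print(closest_names(sample, 50))  # ['Patty', 'Mary Smith', 'Adam 22']
```

An open file object can be passed in place of sample, since iterating over it also yields lines.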

How to split a string that is inside a 2D array in Python?

I'm fairly new to Python and I have a task where I have to import some addresses from a text file and then find and abbreviate the road type (e.g. change 'Road' to 'RD').
So far I've managed to import the file and build a 2D array in which each address is a separate array inside the main array. I've been looking for a way to get inside those inner arrays so I can split the strings, do the abbreviation, and then output each sub-array on its own line in Excel.
This is my code at the moment:
import csv

def sec_2():
    addresses = []
    with open('Addresses2.txt', newline='') as Addresses2:
        for row in csv.reader(Addresses2):
            addresses.append(row)
    print(addresses)
The code outputs this so far:
[['52 Corinthian Road', ' First Floor'], ['20 Ingram Street', ' Forest Hills', ' New York'], ['14 Westbourne Terrace Road', ' Buxton'], ['The Terrace Restaurant', ' 81 Royal Street', ' Solihull']]
I need it to be:
[['52', 'Corinthian', 'Road', 'First', 'Floor'], ['20', 'Ingram', 'Street', 'Forest', 'Hills', 'New', 'York'], ...]
You can call str.split on each string in your list.
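split_strings = []
for string in string_list:  # string_list: one row, e.g. ['52 Corinthian Road', ' First Floor']
    # split() with no argument splits on runs of whitespace and ignores the leading space
    split_strings.append(string.split())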
You can use itertools:
import csv
import itertools

with open('filename.csv', newline='') as f:
    data = csv.reader(f)
    final_data = [list(itertools.chain.from_iterable(b.split() for b in i)) for i in data]
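Neither answer covers the abbreviation step itself. A minimal sketch, assuming a hypothetical ABBREVIATIONS dictionary (only 'Road' to 'RD' comes from the question) and whole-word, case-insensitive matching, could flatten each row as in the itertools answer and then swap words. The sample text stands in for Addresses2.txt, and csv.writer produces text that Excel can open with one address per row.

```python
import csv
import io
import itertools

# Hypothetical mapping: only "Road" -> "RD" comes from the question; add more entries as needed.
ABBREVIATIONS = {"road": "RD"}

def abbreviate_words(words):
    """Replace each word that appears in ABBREVIATIONS (case-insensitive)."""
    return [ABBREVIATIONS.get(word.lower(), word) for word in words]

def process(rows):
    """Flatten each CSV row into single words, then abbreviate road types."""
    result = []
    for row in rows:
        words = itertools.chain.from_iterable(field.split() for field in row)
        result.append(abbreviate_words(words))
    return result

# Stand-in for Addresses2.txt: the first two addresses from the question.
sample = io.StringIO("52 Corinthian Road, First Floor\n20 Ingram Street, Forest Hills, New York\n")
rows = process(csv.reader(sample))
for row in rows:
    print(row)

# One address per line in CSV form, which Excel can open.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

When writing to a real file, open it with newline='' (for example open('output.csv', 'w', newline=''), where output.csv is a placeholder name) so csv.writer controls the line endings.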

Python: parsing texts in a .txt file

I have a text file like this.
1 firm A Manhattan (company name) 25,000
SK Ventures 25,000
AEA investors 10,000
2 firm B Tencent collaboration 16,000
id TechVentures 4,000
3 firm C xxx 625
(and so on)
I want to build a matrix and put each item into it.
For example, the first row of matrix would be like:
[[1,Firm A,Manhattan,25,000],['','',SK Ventures,25,000],['','',AEA investors,10,000]]
or,
[[1,'',''],[Firm A,'',''],[Manhattan,SK Ventures,AEA Investors],[25,000,25,000,10,000]]
To do so, I want to parse the text on each line of the file. For example, from the first line I would create [1, firm A, Manhattan, 25,000]. However, I can't figure out exactly how to do it. Each field starts at the same column on every line, but ends at a different position. Is there a good way to do this?
Thank you.
Well, if you know all of the start positions:
# 0123456789012345678901234567890123456789012345678901234567890
# 1 firm A Manhattan (company name) 25,000
# SK Ventures 25,000
# AEA investors 10,000
# 2 firm B Tencent collaboration 16,000
# id TechVentures 4,000
# 3 firm C xxx 625
# Field #1 is 8 wide (0 -> 7)
# Field #2 is 15 wide (8 -> 22)
# Field #3 is 19 wide (23 -> 41)
# Field #4 is arbitrarily wide (42 -> end of line)
field_lengths = [8, 15, 19]
data = []
with open('/path/to/file', 'r') as f:
    for row in f:
        row = row.rstrip('\n')  # keep leading spaces so the column positions stay valid
        pieces = []
        for x in field_lengths:
            piece = row[:x].strip()
            pieces.append(piece)
            row = row[x:]
        pieces.append(row.strip())  # field #4: whatever is left
        data.append(pieces)
From the data you've given*, the format depends on whether a line starts with a number or a space, and the fields can be separated as
(numbers)(spaces)(letters with 1 space)(spaces)(letters with 1 space)(spaces)(numbers+commas)
or
(spaces)(letters with 1 space)(spaces)(numbers+commas)
That's what the two regexes below look for. They build a dictionary keyed by the leading numbers, where each entry holds a firm name and a list of (company, value) pairs.
I can't really tell what your matrix arrangement is.
import re
import pprint

data = {}
with open('data.txt') as f:
    for line in f:
        if re.match(r'^\d', line):
            # Numbered line: index, firm, first company, value
            matches = re.findall(r'^(\d+)\s+((\S\s|\s\S|\S)+)\s\s+((\S\s|\s\S|\S)+)\s\s+([0-9,]+)', line)
            idx, firm, x, company, y, value = matches[0]
            data[idx] = {}
            data[idx]['firm'] = firm.strip()
            data[idx]['company'] = [(company.strip(), value)]
        else:
            # Continuation line: another company and value for the last index
            matches = re.findall(r'\s+((\S\s|\s\S|\S)+)\s\s+([0-9,]+)', line)
            company, x, value = matches[0]
            data[idx]['company'].append((company.strip(), value))

pprint.pprint(data)
->
{'1': {'company': [('Manhattan (company name)', '25,000'),
('SK Ventures', '25,000'),
('AEA investors', '10,000')],
'firm': 'firm A'},
'2': {'company': [('Tencent collaboration', '16,000'),
('id TechVentures', '4,000')],
'firm': 'firm B'},
'3': {'company': [('xxx', '625')],
'firm': 'firm C'}
}
* This works on your example, but it may not work on your real data very well. YMMV.
If I understand you correctly (although I'm not totally sure I do), this will produce the output I think you're looking for.
import re
with open('data.txt', 'r') as f:
    f_txt = f.read()  # Read the whole file into one string
f_lines = re.split(r'\n(?=\d)', f_txt)  # start a new group at each line beginning with a digit
matrix = []
for line in f_lines:
    inner1 = line.split('\n')
    inner2 = [re.split(r'\s{2,}', l) for l in inner1]  # fields are separated by 2+ spaces
    matrix.append(inner2)
print(matrix)
print('')
for row in matrix:
    print(row)
Output of the program:
[[['1', 'firm A', 'Manhattan (company name)', '25,000'], ['', 'SK Ventures', '25,000'], ['', 'AEA investors', '10,000']], [['2', 'firm B', 'Tencent collaboration', '16,000'], ['', 'id TechVentures', '4,000']], [['3', 'firm C', 'xxx', '625']]]
[['1', 'firm A', 'Manhattan (company name)', '25,000'], ['', 'SK Ventures', '25,000'], ['', 'AEA investors', '10,000']]
[['2', 'firm B', 'Tencent collaboration', '16,000'], ['', 'id TechVentures', '4,000']]
[['3', 'firm C', 'xxx', '625']]
I am basing this on the fact that you wanted the first row of your matrix to be:
[[1,Firm A,Manhattan,25,000],['',SK Ventures,25,000],['',AEA investors,10,000]]
However, with more than one row this gives a list nested 3 levels deep, which is what print(matrix) shows. That can be a little unwieldy to work with, which is why TessellatingHeckler's answer stores the data in a dictionary; I think that is a much better way to access what you need. But if a list of "matrices" is what you're after, the code I wrote above does that.
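If you want the first matrix layout from the question (blank index and firm on continuation lines) with the values converted to numbers, a sketch that combines the two-or-more-spaces split from the last answer with a carried-forward group might look like this. It assumes fields are separated by at least two spaces and that continuation lines start with whitespace; the spacing in SAMPLE is made up, because the original column widths are not preserved here.

```python
import re

# Assumed layout: fields separated by two or more spaces, and continuation
# lines start with whitespace (the column widths here are illustrative).
SAMPLE = (
    "1  firm A  Manhattan (company name)  25,000\n"
    "           SK Ventures               25,000\n"
    "           AEA investors             10,000\n"
    "2  firm B  Tencent collaboration     16,000\n"
    "           id TechVentures            4,000\n"
    "3  firm C  xxx                          625\n"
)

def parse(text):
    """Return one group per numbered firm; each group is a list of 4-item rows."""
    groups = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = re.split(r"\s{2,}", line.strip())
        value = int(fields[-1].replace(",", ""))  # "25,000" -> 25000
        if line[0].isdigit():
            idx, firm, company = fields[0], fields[1], fields[2]
            groups.append([[idx, firm, company, value]])
        else:
            company = fields[0]
            groups[-1].append(["", "", company, value])  # belongs to the last firm
    return groups

for group in parse(SAMPLE):
    print(group)
```

Keeping the values as strings (dropping the int conversion) would preserve the "25,000" formatting if that matters more than doing arithmetic on them.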
