Split each line in a file based on delimiters

Split each line in a file based on delimiters - python

This is the sample data in a file. I want to split each line in the file and add it to a dataframe. Some lines have more than one child; whenever there is more than one child, a new set of columns has to be added (Child2 name and DOB).
(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024
(P324) Shiva Bhupati 01/01/1994 – Vinitha B 04/08/2024
(P356) Karthikeyan chandrashekar 22/02/1991 – Kanishka P 10/03/2014
(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998
This is the code I have tried but this splits only by taking "-" into consideration
with open("text.txt") as read_file:
    file_contents = read_file.readlines()
content_list = []
temp = []
for each_line in file_contents:
    temp = each_line.replace("â€“", " ").split()
    content_list.append(temp)
print(content_list)
Current output:
[['(P322)', 'Rashmika', 'Chadda', '15/05/1995', 'Rashmi', 'Chadda', 'Teega', '12/02/2024'], ['(P324)', 'Shiva', 'Bhupati', '01/01/1994', 'Vinitha', 'B', 'Sahu', '04/08/2024'], ['(P356)', 'Karthikeyan', 'chandrashekar', '22/02/1991', 'Kanishka', 'P', '10/03/2014'], ['(P366)', 'Kalyani', 'Manoj', '23/01/1975', '-', 'Vandana', 'M', '15/05/1995', '-', 'Chandana', 'M', '18/11/1998']]
Final output should be like below
Code | Parent_Name               | DOB        | Child1_Name | DOB        | Child2_Name | DOB
P322 | Rashmika Chadda           | 15/05/1995 | Rashmi C    | 12/02/2024 |             |
P324 | Shiva Bhupati             | 01/01/1994 | Vinitha B   | 04/08/2024 |             |
P356 | Karthikeyan chandrashekar | 22/02/1991 | Kanishka P  | 10/03/2014 |             |
P366 | Kalyani Manoj             | 23/01/1975 | Vandana M   | 15/05/1995 | Chandana M  | 18/11/1998

I'm not sure if you want it as a list or something else.
To get lists:
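import re

# read the lines as in the question
with open("text.txt", encoding="utf-8") as read_file:
    text = read_file.readlines()

result = []
for t in text:
    # remove the \n at the end of each line
    t = t.strip()
    # remove the parentheses you don't want
    t = t.replace("(", "").replace(")", "")
    # split into people on " – " (en dash) or " - " (hyphen); the sample uses both
    people = re.split(r"\s+[–-]\s+", t)
    # reconstruct
    for i, person in enumerate(people):
        person = person.split(" ")
        # the first chunk starts with the code: begin a new row with it
        if i == 0:
            res = [person.pop(0)]
        res.extend([" ".join(person[:2]), person[2]])
    result.append(res)
print(result)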
Which would give the below output:
[['P322', 'Rashmika Chadda', '15/05/1995', 'Rashmi C', '12/02/2024'], ['P324', 'Shiva Bhupati', '01/01/1994', 'Vinitha B', '04/08/2024'], ['P356', 'Karthikeyan chandrashekar', '22/02/1991', 'Kanishka P', '10/03/2014'], ['P366', 'Kalyani Manoj', '23/01/1975', 'Vandana M', '15/05/1995', 'Chandana M', '18/11/1998']]
You can organise the data a bit more using a dictionary:
import re

result = {}
# text is the list of lines read from the file (e.g. via readlines())
for t in text:
    # remove the \n at the end of each line
    t = t.strip()
    # remove the parentheses you don't want
    t = t.replace("(", "").replace(")", "")
    # split into people on " – " (en dash) or " - " (hyphen)
    people = re.split(r"\s+[–-]\s+", t)
    for i, person in enumerate(people):
        # split name
        person = person.split(" ")
        if i == 0:
            # the first chunk is the parent, preceded by the code
            code = person.pop(0)
            result[code] = {"parent_name": " ".join(person[:2]), "parent_DOB": person[2], "children": []}
        else:
            result[code]['children'].append({f"child{i}_name": " ".join(person[:2]), f"child{i}_DOB": person[2]})
print(result)
Which would give this output:
{'P322': {'children': [{'child1_DOB': '12/02/2024',
                        'child1_name': 'Rashmi C'}],
          'parent_DOB': '15/05/1995',
          'parent_name': 'Rashmika Chadda'},
 'P324': {'children': [{'child1_DOB': '04/08/2024',
                        'child1_name': 'Vinitha B'}],
          'parent_DOB': '01/01/1994',
          'parent_name': 'Shiva Bhupati'},
 'P356': {'children': [{'child1_DOB': '10/03/2014',
                        'child1_name': 'Kanishka P'}],
          'parent_DOB': '22/02/1991',
          'parent_name': 'Karthikeyan chandrashekar'},
 'P366': {'children': [{'child1_DOB': '15/05/1995',
                        'child1_name': 'Vandana M'},
                       {'child2_DOB': '18/11/1998', 'child2_name': 'Chandana M'}],
          'parent_DOB': '23/01/1975',
          'parent_name': 'Kalyani Manoj'}}
In the end, to get an actual table you would need pandas, but that requires fixing the maximum number of children so that you can pad the empty cells.
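The padding step can be sketched with plain Python. This is only an illustration under some assumptions: the helper name parse_line and the column names are made up here, each person is assumed to end with a date, and people are assumed to be separated by a dash surrounded by spaces. The resulting header and padded rows could then be handed to something like pandas.DataFrame(padded, columns=header).

```python
import re

# Two sample lines from the question (one child, two children).
lines = [
    "(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024",
    "(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998",
]


def parse_line(line):
    """Return [code, parent_name, parent_dob, child1_name, child1_dob, ...]."""
    code, rest = line.strip().split(" ", 1)
    row = [code.strip("()")]
    # People are separated by an en dash or a hyphen with spaces around it.
    for person in re.split(r"\s+[–-]\s+", rest):
        # The date is the last space-separated token; the rest is the name.
        name, dob = person.rsplit(" ", 1)
        row.extend([name, dob])
    return row


rows = [parse_line(line) for line in lines]

# Each row is code + parent (name, DOB) + one (name, DOB) pair per child.
max_children = max((len(row) - 3) // 2 for row in rows)

header = ["Code", "Parent_Name", "Parent_DOB"]
for n in range(1, max_children + 1):
    header.extend([f"Child{n}_Name", f"Child{n}_DOB"])

# Pad shorter rows with empty strings so every row matches the header.
padded = [row + [""] * (len(header) - len(row)) for row in rows]

for row in padded:
    print(row)
```

Computing max_children from the data avoids hard-coding the number of child columns.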

Related

Nested loops in Python with two list

Why does my loop not return the first values? I want to remove specific text if it exists in my value; if it does not exist, I want to keep the initial value. I get what I need for the last values, but my code misses the first ones.
import itertools

p = ["Adams","Tonny","Darjus FC", "Marcus FC", "Jessie AFC", "John CF", "Miler SV","Redgard"]
o = [' FC'," CF"," SSV"," SV"," CM", " AFC"]
for i, j in itertools.product(p, o):
    if j in i:
        name = i.replace(f"{j}","")
        print(name)
    elif j not in i:
        pass
print(i)
I got this:
Darjus
Marcus
Jessie
John
Miler
Redgard
but I want this:
Adams
Tonny
Darjus
Marcus
Jessie
John
Miler
Redgard

The use of product() is going to make solving this problem a lot harder than it needs to be. It would be easier to use a nested loop.
p = ["Adams", "Tonny", "Darjus FC", "Marcus FC",
     "Jessie AFC", "John CF", "Miler SV", "Redgard"]
o = [' FC', " CF", " SSV", " SV", " CM", " AFC"]
for i in p:
    # Store name, for if no match found
    name = i
    for j in o:
        if j in i:
            # Reformat name if match
            name = i.replace(j, "")
    print(name)

If you would like to store the names in a list, here's one way to do it:
p = ['Adams', 'Tonny', 'Darjus FC', 'Marcus FC', 'Jessie AFC', 'John CF', 'Miler SV', 'Redgard']
o = ['FC', 'CF', 'SSV', 'SV', 'CM', 'AFC']
result = []
for name in p:
    if name.split()[-1] in o:
        result.append(name.split()[0])
    else:
        result.append(name)
print(result)
['Adams', 'Tonny', 'Darjus', 'Marcus', 'Jessie', 'John', 'Miler', 'Redgard']
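As a variation on the answers above, here is a minimal sketch that removes a suffix only when it sits at the very end of the name, so names with more than one word before the suffix stay intact. The helper name strip_suffix is an assumption made up for this illustration.

```python
p = ["Adams", "Tonny", "Darjus FC", "Marcus FC",
     "Jessie AFC", "John CF", "Miler SV", "Redgard"]
o = [" FC", " CF", " SSV", " SV", " CM", " AFC"]


def strip_suffix(name, suffixes):
    """Return name without its trailing suffix, or unchanged if none matches."""
    for suffix in suffixes:
        if name.endswith(suffix):
            # Cut off exactly the matched suffix.
            return name[:-len(suffix)]
    return name


result = [strip_suffix(name, o) for name in p]
print(result)
```

Because every suffix in o starts with a space, " FC" does not match the end of "Jessie AFC"; only " AFC" does.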

add values to a list from specific part of a text file

I have this text:
/** Goodmorning
Alex
Dog
House
Red
*/
/** Goodnight
Maria
Cat
Office
Green
*/
I would like to have Alex, Dog, House and Red in one list and Maria, Cat, Office and Green in another list.
I have this code:
with open(filename) as f:
    for i in f:
        if i.startswith("/** Goodmorning"):
            # add items to list
        elif i.startswith("/** Goodnight"):
            # add items to other list
So, is there any way to write the script so that it understands that Alex belongs to the part of the text that starts with Goodmorning?

I'd recommend using a dict, where the section name is the key:
with open(filename) as f:
    result = {}
    current_list = None
    for line in f:
        # drop the trailing newline so the comparison with "*/" works
        line = line.strip()
        if line.startswith("/**"):
            current_list = []
            result[line[3:].strip()] = current_list
        elif line and line != "*/":
            current_list.append(line)
Result:
{'Goodmorning': ['Alex', 'Dog', 'House', 'Red'], 'Goodnight': ['Maria', 'Cat', 'Office', 'Green']}
To find which key a value belongs to, you can use the following code:
search_value = "Alex"
for key, values in result.items():
    if search_value in values:
        print(search_value, "belongs to", key)
        break

I would recommend using regular expressions. Python has a module for this called re:
import re
s = """/** Goodmorning
Alex
Dog
House
Red
*/
/** Goodnight
Maria
Cat
Office
Green
*/"""
pattern = r'/\*\*([\w \n]+)\*/'
word_groups = re.findall(pattern, s, re.MULTILINE)
d = {}
for word_group in word_groups:
    # one word per line, so split on single newlines
    words = word_group.strip().split('\n')
    d[words[0]] = words[1:]
print(d)
Output:
{'Goodmorning': ['Alex', 'Dog', 'House', 'Red'], 'Goodnight': ['Maria', 'Cat', 'Office', 'Green']}

Expanding on Olvin Roght's answer (sorry, I can't comment; not enough reputation), I would keep a second dictionary for the reverse lookup:
with open(filename) as f:
    key_to_list = {}
    name_to_key = {}
    current_list = None
    current_key = None
    for line in f:
        # drop the trailing newline so the comparison with "*/" works
        line = line.strip()
        if line.startswith("/**"):
            current_list = []
            current_key = line[3:].strip()
            key_to_list[current_key] = current_list
        elif line and line != "*/":
            current_name = line
            name_to_key[current_name] = current_key
            current_list.append(current_name)
print(key_to_list)
print(name_to_key['Alex'])
An alternative is to convert the dictionary afterwards:
name_to_key = {n : k for k in key_to_list for n in key_to_list[k]}
(i.e. if you want to go with the regex version from ashwani)
The limitation is that this permits only one membership per name.
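If a name may appear in more than one section, one way around that limitation is to map each name to a list of keys. The sketch below is an illustration only: the name name_to_keys is made up here, and the extra "Alex" entry under Goodnight is hypothetical data added to show a double membership.

```python
from collections import defaultdict

# Section dictionary as produced by the parsers above; the second "Alex"
# is hypothetical and only demonstrates a name with two memberships.
key_to_list = {
    "Goodmorning": ["Alex", "Dog", "House", "Red"],
    "Goodnight": ["Maria", "Cat", "Office", "Green", "Alex"],
}

# Reverse lookup: every name maps to the list of sections that contain it.
name_to_keys = defaultdict(list)
for key, names in key_to_list.items():
    for name in names:
        name_to_keys[name].append(key)

print(name_to_keys["Alex"])
print(name_to_keys["Maria"])
```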

Making a list from a loop output

import re
data = ['network 10.185.16.64 255.255.255.224','network 55.242.33.0 255.255.255.0','network 55.242.154.0 255.255.255.252']
pref_network_find = re.findall(r'(\S+\s+255.255.255.\w+)',str(data))
mydict = {"255.255.255.0":24,"255.255.255.128":25,"255.255.255.192":26,"255.255.255.224":27,"255.255.255.240":28,"255.255.255.248":29,"255.255.255.252":30}
for i in pref_network_find:
    splitlines = i.split()
    for word in splitlines:
        if word in mydict:
            i = i.replace(word,str(mydict[word]))
    pref = print (i)
listi = []
for line in pref_network_find:
    listi.append(i)
print (listi)
10.185.16.64 27
55.242.33.0 24
55.242.154.0 30
['55.242.154.0 30', '55.242.154.0 30', '55.242.154.0 30']
Process finished with exit code 0
I'm trying to get ['55.242.154.0 30', '55.242.33.0 24', '10.185.16.64 27'] as list1 at the end, but I can't understand my mistake here. Could you help me with that?

You do not need a regex to extract the address and mask from each entry; just use str.split():
import re
data = ['network 10.185.16.64 255.255.255.224','network 55.242.33.0 255.255.255.0','network 55.242.154.0 255.255.255.252']
mydict = {"255.255.255.0":24,"255.255.255.128":25,"255.255.255.192":26,"255.255.255.224":27,"255.255.255.240":28,"255.255.255.248":29,"255.255.255.252":30}
# list(...) is needed in Python 3, where map objects cannot be compared when sorting
final_list = sorted(['{} {}'.format(b, mydict[c]) for a, b, c in [i.split() for i in data]], key=lambda x: list(map(int, re.split(r'\.|\s', x))), reverse=True)
Output:
['55.242.154.0 30', '55.242.33.0 24', '10.185.16.64 27']

It prints the last entry three times because of this code:
for i in pref_network_find:
    splitlines = i.split()
    for word in splitlines:
        if word in mydict:
            i = i.replace(word,str(mydict[word]))
    pref = print (i)
After this loop finishes, i still holds the last converted entry, '55.242.154.0 30'. You then reuse that old variable i here:
for line in pref_network_find:
    listi.append(i)
So the code is doing exactly what it says: i is '55.242.154.0 30', and that value is appended once per iteration.
The corrected code looks like this:
import re
data = ['network 10.185.16.64 255.255.255.224','network 55.242.33.0 255.255.255.0','network 55.242.154.0 255.255.255.252']
pref_network_find = re.findall(r'(\S+\s+255.255.255.\w+)',str(data))
mydict = {"255.255.255.0":24,"255.255.255.128":25,"255.255.255.192":26,"255.255.255.224":27,"255.255.255.240":28,"255.255.255.248":29,"255.255.255.252":30}
listi = []
for i in pref_network_find:
    splitlines = i.split()
    for word in splitlines:
        if word in mydict:
            i = i.replace(word,str(mydict[word]))
    print(i)
    # append inside the loop, while i still holds the current entry
    listi.append(i)
print(listi)
Correct me if I am wrong here, maybe you want something else, however, this is what I understood by your question.

Your code is wrong because you are appending the wrong variable, i, here:
for line in pref_network_find:
    listi.append(i)
At that point i still holds the last value from the previous loop ('55.242.154.0 30'). Simply using line instead of i would append the unconverted strings, so append inside the first for loop directly:
import re
data = ['network 10.185.16.64 255.255.255.224','network 55.242.33.0 255.255.255.0','network 55.242.154.0 255.255.255.252']
pref_network_find = re.findall(r'(\S+\s+255.255.255.\w+)',str(data))
mydict = {"255.255.255.0":24,"255.255.255.128":25,"255.255.255.192":26,"255.255.255.224":27,"255.255.255.240":28,"255.255.255.248":29,"255.255.255.252":30}
listi = []
for i in pref_network_find:
    splitlines = i.split()
    for word in splitlines:
        if word in mydict:
            listi.append(i.replace(word, str(mydict[word])))
print(listi)
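As an alternative to maintaining the mask-to-prefix dictionary by hand, the standard library's ipaddress module can derive the prefix length from a dotted netmask. This is a sketch under the assumption that every entry has the form "network ADDRESS NETMASK"; it is not what the answers above use.

```python
import ipaddress

data = ['network 10.185.16.64 255.255.255.224',
        'network 55.242.33.0 255.255.255.0',
        'network 55.242.154.0 255.255.255.252']

result = []
for entry in data:
    # Each entry is assumed to be "network <address> <netmask>".
    _, address, netmask = entry.split()
    # strict=False tolerates addresses that have host bits set.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    result.append(f"{network.network_address} {network.prefixlen}")

print(result)
```

This also handles masks that are missing from the dictionary, such as 255.255.254.0.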

Adding on values to a key based on different lengths

I'm trying to add on values to a key after making a dictionary.
This is what I have so far:
movie_list = "movies.txt" # using a file that contains this order on first line: Title, year, genre, director, actor
in_file = open(movie_list, 'r')
in_file.readline()
def list_maker(in_file):
    movie1 = str(input("Enter in a movie: "))
    movie2 = str(input("Enter in another movie: "))
    d = {}
    for line in in_file:
        l = line.split(",")
        title_year = (l[0], l[1]) # only then making the tuple ('Title', 'year')
        for i in range(4, len(l)):
            d = {title_year: l[i]}
            if movie1 or movie2 == l[0]:
                print(d.values())
The output I get is:
Enter in a movie: 13 B
Enter in another movie: 1920
{('13 B', '(2009)'): 'R. Madhavan'}
{('13 B', '(2009)'): 'Neetu Chandra'}
{('13 B', '(2009)'): 'Poonam Dhillon\n'}
{('1920', '(2008)'): 'Rajneesh Duggal'}
{('1920', '(2008)'): 'Adah Sharma'}
{('1920', '(2008)'): 'Anjori Alagh\n'}
{('1942 A Love Story', '(1994)'): 'Anil Kapoor'}
{('1942 A Love Story', '(1994)'): 'Manisha Koirala'}
{('1942 A Love Story', '(1994)'): 'Jackie Shroff\n'}
.... so on and so forth. I get the whole list of movies.
How would I go about this if I wanted to enter two movies (any two movies) and get all the values for each key collected together?
Example:
{('13 B', '(2009)'): 'R. Madhavan', 'Neetu Chandra', 'Poonam Dhillon'}
{('1920', '(2008)'): 'Rajneesh Duggal', 'Adah Sharma', 'Anjori Alagh'}

Sorry if the output isn't completely what you want, but here's how you should do it:
d = {}
for line in in_file:
    # strip the trailing newline before splitting
    l = line.strip().split(",")
    title_year = (l[0], l[1])
    people = []
    for i in range(4, len(l)):
        people.append(l[i]) # we append items to the list...
    d = {title_year: people} # ...and then make the dict so that the list is in it.
    # "movie1 or movie2 == l[0]" is always true when movie1 is non-empty, so test membership instead
    if l[0] in (movie1, movie2):
        print(d)
Basically, we build a list and then store that list under a key inside the dict.
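To collect both movies into a single dictionary rather than printing one dictionary per line, something like the following sketch may help. The sample lines are an assumption: the titles, years and actors come from the output in the question, while "genre" and "director" are placeholder strings because those fields are not shown.

```python
# Hypothetical file contents in the order Title, year, genre, director, actors...
lines = [
    "13 B,(2009),genre,director,R. Madhavan,Neetu Chandra,Poonam Dhillon\n",
    "1920,(2008),genre,director,Rajneesh Duggal,Adah Sharma,Anjori Alagh\n",
    "1942 A Love Story,(1994),genre,director,Anil Kapoor,Manisha Koirala,Jackie Shroff\n",
]

# Stand-ins for the two titles typed in by the user.
wanted = {"13 B", "1920"}

cast = {}
for line in lines:
    fields = line.strip().split(",")
    if fields[0] in wanted:
        # Key is (title, year); value is the list of actors (fields 4 onwards).
        cast[(fields[0], fields[1])] = fields[4:]

# Union of the actors of all selected movies.
all_actors = set().union(*cast.values())

print(cast)
print(sorted(all_actors))
```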

How to search for multiple data from multiple lines and store them in dictionary?

Say I have a file with the following:
/* Full name: abc */
.....
.....(.....)
.....(".....) ;
/* .....
/* .....
..... : "....."
}
"....., .....
Car : true ;
House : true ;
....
....
Age : 33
....
/* Full name: xyz */
....
....
Car : true ;
....
....
Age : 56
....
I am only interested in the full name, car, house and age of each person. There are many other lines of data, in different formats, between the variables/attributes that I am interested in.
My code so far:
import re
initial_val = {'House': 'false', 'Car': 'false'}
with open('input.txt') as f:
    records = []
    current_record = None
    for line in f:
        if not line.strip():
            continue
        elif current_record is None:
            people_name = re.search('.+Full name ?: (.+) ', line)
            if people_name:
                current_record = dict(initial_val, Name = people_name.group(1))
            else:
                continue
        elif current_record is not None:
            house = re.search(' *(House) ?: ?([a-z]+)', line)
            if house:
                current_record['House'] = house.group(2)
            car = re.search(' *(Car) ?: ?([a-z]+)', line)
            if car:
                current_record['Car'] = car.group(2)
            people_name = re.search('.+Full name ?: (.+) ', line)
            if people_name:
                records.append(current_record)
                current_record = dict(initial_val, Name = people_name.group(1))
print records
What I get:
[{'Name': 'abc', 'House': 'true', 'Car': 'true'}]
My question:
How am I supposed to extract the data and store it in a dictionary like:
{'abc': {'Car': true, 'House': true, 'Age': 33}, 'xyz':{'Car': true, 'House': false, 'Age': 56}}
My purpose:
Check whether each person has a car, house and age; if not, return false.
Then I could print them in a table like this:
Name  Car   House  Age
abc   true  true   33
xyz   true  false  56
Note that I am using Python 2.7 and I do not know what is the actual value of each variable/attribute (Eg. abc, true, true, 33) of each person.
What is the best solution to my question? Thanks.

Well, you just have to keep track of the current record:
# makes print(..., sep=...) work on Python 2.7 as well
from __future__ import print_function

def parse_name(line):
    # first remove the initial '/* ' and final ' */' (and the trailing newline)
    stripped_line = line.strip().strip('/* ')
    return stripped_line.split(':')[-1].strip()

WANTED_KEYS = ('Car', 'Age', 'House')
# default values for when the lines are not present for a record
INITIAL_VAL = {'Car': False, 'House': False, 'Age': -1}

with open('the_filename') as f:
    records = []
    current_record = None
    for line in f:
        if not line.strip():
            # skip empty lines
            continue
        elif current_record is None:
            # first record in the file
            if line.startswith('/*'):
                current_record = dict(INITIAL_VAL, name=parse_name(line))
            else:
                # this should probably be an error in the file contents
                continue
        elif line.startswith('/*'):
            # this means that the current record finished, and a new one is starting
            records.append(current_record)
            current_record = dict(INITIAL_VAL, name=parse_name(line))
        elif ':' in line:
            key, val = line.split(':', 1)
            if key.strip() in WANTED_KEYS:
                # we want to keep track of this field (dropping a trailing ' ;')
                current_record[key.strip()] = val.strip().rstrip(' ;')
        # otherwise just ignore the line
    # the last record is not followed by a new header, so store it here
    if current_record is not None:
        records.append(current_record)

print('Name\tCar\tHouse\tAge')
for record in records:
    print(record['name'], record['Car'], record['House'], record['Age'], sep='\t')
Note that for Age you may want to convert it to an integer using int:
if key.strip() == 'Age':
    current_record['Age'] = int(val)
The above code produces a list of dictionaries, but it is easy enough to convert it to a dictionary of dicts:
new_records = {r['name']: dict(r) for r in records}
for val in new_records.values():
    del val['name']
After this new_records will be something like:
{'abc': {'Car': True, 'House': True, 'Age': 20}, ...}
If you have other lines with a different format in between the interesting ones you can simply write a function that returns True or False depending on whether the line is in the format you require and use it to filter the lines of the file:
def is_interesting_line(line):
    if line.startswith('/*'):
        return True
    elif ':' in line:
        return True
    return False

for line in filter(is_interesting_line, f):
    # code as before
Change is_interesting_line to suit your needs. In the end, if you have to handle several different formats etc. maybe using a regex would be better, in that case you could do something like:
import re
LINE_REGEX = re.compile(r'(/\*.*\*/)|(\w+\s*:.*)| <other stuff>')
def is_interesting_line(line):
    return LINE_REGEX.match(line) is not None
If you want you can obtain fancier formatting for the table, but you probably first need to determine the maximum length of the name etc. or you can use something like tabulate to do that for you.
For example something like (not tested):
max_name_length = max(max(len(r['name']) for r in records), 4)
format_string = '{:<{}}\t{:<{}}\t{}\t{}'
print(format_string.format('Name', max_name_length, 'Car', 5, 'House', 'Age'))
for record in records:
    print(format_string.format(record['name'], max_name_length, record['Car'], 5, record['House'], record['Age']))
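For reference, here is a compact Python 3 sketch that builds the dictionary of dictionaries directly with two regular expressions. It is an illustration under stated assumptions, not the method from the answer above: SAMPLE is a shortened stand-in for the file in the question, the names NAME_RE, FIELD_RE and parse_people are made up here, and every record is assumed to start with a "/* Full name: ... */" line.

```python
import re

# Shortened, hypothetical stand-in for the input file.
SAMPLE = """\
/* Full name: abc */
.....
Car : true ;
House : true ;
Age : 33
/* Full name: xyz */
....
Car : true ;
Age : 56
"""

NAME_RE = re.compile(r"/\*\s*Full name\s*:\s*(.+?)\s*\*/")
FIELD_RE = re.compile(r"\s*(Car|House|Age)\s*:\s*(\w+)")


def parse_people(text):
    """Return {name: {'Car': ..., 'House': ..., 'Age': ...}} for each record."""
    people = {}
    current = None
    for line in text.splitlines():
        name_match = NAME_RE.match(line)
        if name_match:
            # A new record starts; fill in defaults for missing fields.
            current = {"Car": "false", "House": "false", "Age": None}
            people[name_match.group(1)] = current
            continue
        field_match = FIELD_RE.match(line)
        if field_match and current is not None:
            key, value = field_match.groups()
            current[key] = int(value) if key == "Age" else value
        # Any other line is ignored.
    return people


people = parse_people(SAMPLE)
print(people)
```

Matching the header on "Full name" rather than on any line starting with "/*" keeps the other comment lines in the file from starting bogus records.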
