Why my loop not return first values? I want replaces specific text if this text exist in my value, but if not exist, i want get a initial meaning. In last value I get that i need, but first value my code miss.
p = ["Adams","Tonny","Darjus FC", "Marcus FC", "Jessie AFC", "John CF", "Miler
SV","Redgard"]
o = [' FC'," CF"," SSV"," SV"," CM", " AFC"]
for i, j in itertools.product(p, o):
if j in i:
name = i.replace(f"{j}","")
print(name)
elif j not in i:
pass
print(i)
I got this:
Darjus
Marcus
Jessie
John
Miler
Redgard
but i want this:
Adams
Tonny
Darjus
Marcus
Jessie
John
Miler
Redgard
The use of product() is going to make solving this problem a lot harder than it needs to be. It would be easier to use a nested loop.
p = ["Adams", "Tonny", "Darjus FC", "Marcus FC",
"Jessie AFC", "John CF", "Miler SV", "Redgard"]
o = [' FC', " CF", " SSV", " SV", " CM", " AFC"]
for i in p:
# Store name, for if no match found
name = i
for j in o:
if j in i:
# Reformat name if match
name = i.replace(j, "")
print(name)
If you would like to store the names in a list, here's one way to do it:
p = ['Adams', 'Tonny', 'Darjus FC', 'Marcus FC', 'Jessie AFC', 'John CF', 'Miler SV', 'Redgard']
o = ['FC', 'CF', 'SSV', 'SV', 'CM', 'AFC']
result = []
for name in p:
if name.split()[-1] in o:
result.append(name.split()[0])
else:
result.append(name)
print(result)
['Adams', 'Tonny', 'Darjus', 'Marcus', 'Jessie', 'John', 'Miler', 'Redgard']
Related
This is the sample data in a file. I want to split each line in the file and add to a dataframe. In some cases they have more than 1 child. So whenever they have more than one child new set of column have to be added child2 Name and DOB
(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024
(P324) Shiva Bhupati 01/01/1994 – Vinitha B 04/08/2024
(P356) Karthikeyan chandrashekar 22/02/1991 – Kanishka P 10/03/2014
(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998
This is the code I have tried but this splits only by taking "-" into consideration
with open("text.txt") as read_file:
file_contents = read_file.readlines()
content_list = []
temp = []
for each_line in file_contents:
temp = each_line.replace("–", " ").split()
content_list.append(temp)
print(content_list)
Current output:
[['(P322)', 'Rashmika', 'Chadda', '15/05/1995', 'Rashmi', 'Chadda', 'Teega', '12/02/2024'], ['(P324)', 'Shiva', 'Bhupati', '01/01/1994', 'Vinitha', 'B', 'Sahu', '04/08/2024'], ['(P356)', 'Karthikeyan', 'chandrashekar', '22/02/1991', 'Kanishka', 'P', '10/03/2014'], ['(P366)', 'Kalyani', 'Manoj', '23/01/1975', '-', 'Vandana', 'M', '15/05/1995', '-', 'Chandana', 'M', '18/11/1998']]
Final output should be like below
Code
Parent_Name
DOB
Child1_Name
DOB
Child2_Name
DOB
P322
Rashmika Chadda
15/05/1995
Rashmi C
12/02/2024
P324
Shiva Bhupati
01/01/1994
Vinitha B
04/08/2024
P356
Karthikeyan chandrashekar
22/02/1991
Kanishka P
10/03/2014
P366
Kalyani Manoj
23/01/1975
Vandana M
15/05/1995
Chandana M
18/11/1998
I'm not sure if you want it as a list or something else.
To get lists:
result = []
for t in text[:]:
# remove the \n at the end of each line
t = t.strip()
# remove the parenthesis you don't wnt
t = t.replace("(", "")
t = t.replace(")", "")
# split on space
t = t.split(" – ")
# reconstruct
for i, person in enumerate(t):
person = person.split(" ")
# print(person)
# remove code
if i==0:
res = [person.pop(0)]
res.extend([" ".join(person[:2]), person[2]])
result.append(res)
print(result)
Which would give the below output:
[['P322', 'Rashmika Chadda', '15/05/1995', 'Rashmi C', '12/02/2024'], ['P324', 'Shiva Bhupati', '01/01/1994', 'Vinitha B', '04/08/2024'], ['P356', 'Karthikeyan chandrashekar', '22/02/1991', 'Kanishka P', '10/03/2014'], ['P366', 'Kalyani Manoj', '23/01/1975', 'Vandana M', '15/05/1995', 'Chandana M', '18/11/1998']]
You can organise a bit more the data using dictionnary:
result = {}
for t in text[:]:
# remove the \n at the end of each line
t = t.strip()
# remove the parenthesis you don't wnt
t = t.replace("(", "")
t = t.replace(")", "")
# split on space
t = t.split(" – ")
for i, person in enumerate(t):
# split name
person = person.split(" ")
# remove code
if i==0:
code = person.pop(0)
if i==0:
result[code] = {"parent_name": " ".join(person[:2]), "parent_DOB": person[2], "children": [] }
else:
result[code]['children'].append({f"child{i}_name": " ".join(person[:2]), f"child{i}_DOB": person[2]})
print(result)
Which would give this output:
{'P322': {'children': [{'child1_DOB': '12/02/2024',
'child1_name': 'Rashmi C'}],
'parent_DOB': '15/05/1995',
'parent_name': 'Rashmika Chadda'},
'P324': {'children': [{'child1_DOB': '04/08/2024',
'child1_name': 'Vinitha B'}],
'parent_DOB': '01/01/1994',
'parent_name': 'Shiva Bhupati'},
'P356': {'children': [{'child1_DOB': '10/03/2014',
'child1_name': 'Kanishka P'}],
'parent_DOB': '22/02/1991',
'parent_name': 'Karthikeyan chandrashekar'},
'P366': {'children': [{'child1_DOB': '15/05/1995',
'child1_name': 'Vandana M'},
{'child2_DOB': '18/11/1998', 'child2_name': 'Chandana M'}],
'parent_DOB': '23/01/1975',
'parent_name': 'Kalyani Manoj'}}
In the end, to have an actual table, you would need to use pandas but that will require for you to fix the number of children max so that you can pad the empty cells.
I am having this text
/** Goodmorning
Alex
Dog
House
Red
*/
/** Goodnight
Maria
Cat
Office
Green
*/
I would like to have Alex , Dog , House and red in one list and Maria,Cat,office,green in an other list.
I am having this code
with open(filename) as f :
for i in f:
if i.startswith("/** Goodmorning"):
#add files to list
elif i.startswith("/** Goodnight"):
#add files to other list
So, is there any way to write the script so it can understands that Alex belongs in the part of the text that has Goodmorning?
I'd recommend you to use dict, where "section name" will be a key:
with open(filename) as f:
result = {}
current_list = None
for line in f:
if line.startswith("/**"):
current_list = []
result[line[3:].strip()] = current_list
elif line != "*/":
current_list.append(line.strip())
Result:
{'Goodmorning': ['Alex', 'Dog', 'House', 'Red'], 'Goodnight': ['Maria', 'Cat', 'Office', 'Green']}
To search which key one of values belongs you can use next code:
search_value = "Alex"
for key, values in result.items():
if search_value in values:
print(search_value, "belongs to", key)
break
I would recommend to use Regular expressions. In python there is a module for this called re
import re
s = """/** Goodmorning
Alex
Dog
House
Red
*/
/** Goodnight
Maria
Cat
Office
Green
*/"""
pattern = r'/\*\*([\w \n]+)\*/'
word_groups = re.findall(pattern, s, re.MULTILINE)
d = {}
for word_group in word_groups:
words = word_group.strip().split('\n\n')
d[words[0]] = words[1:]
print(d)
Output:
{'Goodmorning': ['Alex', 'Dog', 'House', 'Red'], 'Goodnight':
['Maria', 'Cat', 'Office', 'Green']}
expanding on Olvin Roght (sorry can't comment - not enough reputation) I would keep a second dictionary for the reverse lookup
with open(filename) as f:
key_to_list = {}
name_to_key = {}
current_list = None
current_key = None
for line in f:
if line.startswith("/**"):
current_list = []
current_key = line[3:].strip()
key_to_list[current_key] = current_list
elif line != "*/":
current_name=line.strip()
name_to_key[current_name]=current_key
current_list.append(current_name)
print key_to_list
print name_to_key['Alex']
alternative is to convert the dictionary afterwards:
name_to_key = {n : k for k in key_to_list for n in key_to_list[k]}
(i.e if you want to go with the regex version from ashwani)
Limitation is that this only permits one membership per name.
I want to take a dictionary, where the key is a string and the value a list of strings, and be able to print it in a new file where the keys are alphabetized, as well as the values. I'm not having any issue with the values. The problem lies in figuring out how to get the keys to print in the file alphabetically. Here is what I have:
def write_movie_info(string, aDict):
newFile = open(string, 'w')
myList = []
for movie in aDict:
aDict[movie].sort()
myList.append([movie] + aDict[movie])
for aList in myList:
joiner = ", ".join(aList[1:])
newFile.write(aList[0] + ': ' + joiner + '\n')
and the dictionary is:
movies = {"Chocolat": ["Juliette Binoche", "Judi Dench", "Johnny Depp", "Alfred Molina"], "Skyfall": ["Judi Dench", "Daniel Craig", "Javier Bardem", "Naomie Harris"]}
write_movie_info("Test.txt", movies)
Rather than iterating over the dictionary, you can iterate over its sorted keys created with:
sorted(aDict.keys())
Here's the function modified to just print the list:
def write_movie_info(string, aDict):
myList = []
sorted_keys = sorted(aDict.keys())
for movie in sorted_keys:
aDict[movie].sort()
myList.append([movie] + aDict[movie])
print(myList)
movies = {"Lawrence of Arabia": ["Peter O'Toole", "Omar Sharif"], "Chocolat": ["Juliette Binoche", "Judi Dench", "Johnny Depp", "Alfred Molina"], "Skyfall": ["Judi Dench", "Daniel Craig", "Javier Bardem", "Naomie Harris"]}
write_movie_info("Test.txt", movies)
Prints:
[['Chocolat', 'Alfred Molina', 'Johnny Depp', 'Judi Dench', 'Juliette Binoche'], ['Lawrence of Arabia', 'Omar Harif', "Peter O'Toole"], ['Skyfall', 'Daniel Craig', 'Javier Bardem', 'Judi Dench', 'Naomie Harris']]
You can use sorted method to sort your dictionary using key
sorted(your_dict.items(), key=lambda x: x[0])
and you can directly iterate it.
for key, value in sorted(your_dict.items(), key=lambda x: x[0]):
# do something
ref: https://docs.python.org/3/library/functions.html#sorted
OrderedDict() could be used to maintain the insertion-ordering, once the keys are sorted.
Ex.
import collections
movies = {"Skyfall": ["Judi Dench", "Daniel Craig", "Javier Bardem", "Naomie Harris"],
"Chocolat": ["Juliette Binoche", "Judi Dench", "Johnny Depp", "Alfred Molina"]}
order_dict = collections.OrderedDict(sorted(movies.items(), key=lambda x: x[0]))
print(order_dict)
O/P:
OrderedDict([('Chocolat', ['Juliette Binoche', 'Judi Dench', 'Johnny Depp', 'Alfred Molina']), ('Skyfall', ['Judi Dench', 'Daniel Craig', 'Javier Bardem', 'Naomie Harris'])])
Suppose I have a big text file in the following form
[Surname: "Gordon"]
[Name: "James"]
[Age: "13"]
[Weight: "46"]
[Height: "12"]
[Quote: "I want to be a pilot"]
[Name: "Monica"]
[Weight: "33"]
[Quote: "I am looking forward to christmas"]
There are in total 8 keys which will always be in the order of "Surname","Name","Age","Weight","Height","School","Siblings","Quote" which I know beforehand. As you can see, some profiles do not have the full set of variables. The only thing you can be sure will exist is the name.
I want to create a pandas dataframe with each observation as a row and each column as a key. In the case of James, since he does not have the entries in "School" and "Sibling" I would like the entries of those cells to be the numpy nan object.
My attempt is using something like (?:\[Surname: \"()\"\]) for every variable. But even for the single case of surname I run into problems. If surname does not exist, it returns no place holders just the empty list.
Update:
As an example, I would like the return for monica's profile to be
('','Monica','','33','','','','I am looking forward to christmas')
You can parse the file data, group the results, and pass to a dataframe:
import re
import pandas as pd
def group_results(d):
_group = [d[0]]
for a, b in d[1:]:
if a == 'Name' and not any(c == 'Name' for c, _ in _group):
_group.append([a, b])
elif a == 'Surname' and any(c == 'Name' for c, _ in _group):
yield _group
_group = [[a, b]]
else:
if a == 'Name':
yield _group
_group = [[a, b]]
else:
_group.append([a, b])
yield _group
headers = ["Surname","Name","Age","Weight","Height","School","Siblings","Quote"]
data = list(filter(None, [i.strip('\n') for i in open('filename.txt')]))
parsed = [(lambda x:[x[0], x[-1][1:-1]])(re.findall('(?<=^\[)\w+|".*?"(?=\]$)', i)) for i in data]
_grouped = list(map(dict, group_results(parsed)))
result = pd.DataFrame([[c.get(i, "") for i in headers] for c in _grouped], columns=headers)
Output:
Surname Name ... Siblings Quote
0 Gordon James ... I want to be a pilot
1 Monica ... I am looking forward to christmas
[2 rows x 8 columns]
Building on #WiktorStribiżew comment, you could use groupby (from itertools) to group the lines into empty lines and data lines, for instance like this:
import re
from itertools import groupby
text = '''[Surname: "Gordon"]
[Name: "James"]
[Age: "13"]
[Weight: "46"]
[Height: "12"]
[Quote: "I want to be a pilot"]
[Name: "Monica"]
[Weight: "33"]
[Quote: "I am looking forward to christmas"]
[Name: "John"]
[Height: "33"]
[Quote: "I am looking forward to christmas"]
[Surname: "Gordon"]
[Name: "James"]
[Height: "44"]
[Quote: "I am looking forward to christmas"]'''
patterns = [re.compile('(\[Surname: "(?P<surname>\w+?)"\])'),
re.compile('(\[Name: "(?P<name>\w+?)"\])'),
re.compile('(\[Age: "(?P<age>\d+?)"\])'),
re.compile('\[Weight: "(?P<weight>\d+?)"\]'),
re.compile('\[Height: "(?P<height>\d+?)"\]'),
re.compile('\[Quote: "(?P<quote>.+?)"\]')]
records = []
for non_empty, group in groupby(text.splitlines(), key=lambda l: bool(l.strip())):
if non_empty:
lines = list(group)
record = {}
for line in lines:
for pattern in patterns:
match = pattern.search(line)
if match:
record.update(match.groupdict())
break
records.append(record)
for record in records:
print(record)
Output
{'weight': '46', 'quote': 'I want to be a pilot', 'age': '13', 'name': 'James', 'height': '12', 'surname': 'Gordon'}
{'weight': '33', 'quote': 'I am looking forward to christmas', 'name': 'Monica'}
{'height': '33', 'quote': 'I am looking forward to christmas', 'name': 'John'}
{'height': '44', 'surname': 'Gordon', 'quote': 'I am looking forward to christmas', 'name': 'James'}
Note: This creates a dictionary where the keys are the field names and the values are the values of each, this format does not match your intended output, but I believe is more complete that what you requested. In any case you can easily convert from this format into the desired tuple format.
Explanation
The groupby function from itertools groups the input data into contiguous groups of empty lines and record lines. Then you only need to process the groups that are not empty. The processing is simple for each line try to match a pattern if the pattern is matched break, assuming the lines are exclusive for each match update the record dictionary with the value of the field, leveraging named groups.
You can rewrite your data file. The code parses your original file into classes D, then uses csv.DictWriter to write it into a normal style csv that should be readable by pandas:
Create demo file:
fn = "t.txt"
with open (fn,"w") as f:
f.write("""
[Surname: "Gordon"]
[Name: "James"]
[Age: "13"]
[Weight: "46"]
[Height: "12"]
[Quote: "I want to be a pilot"]
[Name: "Monica"]
[Weight: "33"]
[Quote: "I am looking forward to christmas"]
""")
Itermediate class:
class D:
fields = ["Surname","Name","Age","Weight","Height","Quote"]
def __init__(self,textlines):
t = [(k.strip(),v.strip()) for k,v in (x.strip().split(":",1) for x in textlines)]
self.data = {k:"" for k in D.fields}
self.data.update(t)
def surname(self): return self.data["Surname"]
def name(self): return self.data["Name"]
def age(self): return self.data["Age"]
def weight(self): return self.data["Weight"]
def height(self): return self.data["Height"]
def quote(self): return self.data["Quote"]
def get_data(self):
return self.data
Parsing and rewriting:
fn = "t.txt"
# list of all collected D-Instances
data = []
with open(fn) as f:
# each dataset contains all lines belonging to one "person"
dataset = []
surname = False
for line in f.readlines():
clean = line.strip().strip("[]")
if clean and (clean.startswith("Surname") or clean.startswith("Name")):
if any(e.startswith("Name") for e in dataset):
data.append(D(dataset))
dataset = []
if clean:
dataset.append(clean)
else:
if clean:
dataset.append(clean)
elif clean:
dataset.append(clean)
if dataset:
data.append(D(dataset))
import csv
with open("other.txt", "w", newline="") as f:
dw = csv.DictWriter(f,fieldnames=D.fields)
dw.writeheader()
for entry in data:
dw.writerow(entry.get_data())
Check what was written:
with open("other.txt","r") as f:
print(f.read())
Output:
Surname,Name,Age,Weight,Height,Quote
"""Gordon""","""James""","""13""","""46""","""12""","""I want to be a pilot"""
,"""Monica""",,"""33""",,"""I am looking forward to christmas"""
Create a list of (key,value) tuples for each info block with re.findall(), and put them in separate dictionaries:
text="""[Surname: "Gordon"]
[Name: "James"]
[Age: "13"]
[Weight: "46"]
[Height: "12"]
[Quote: "I want to be a pilot"]
[Name: "Monica"]
[Weight: "33"]
[Quote: "I am looking forward to christmas"]"""
keys=['Surname','Name','Age','Weight','Height','Quote']
rslt=[{}]
for k,v in re.findall(r"(?m)(?:^\s*\[(\w+):\s*\"\s*([^\]\"]+)\"\s*\])+",text):
d=rslt[-1]
if (k=="Surname" and d) or (k=="Name" and "Name" in d):
d={}
rslt.append(d)
d[k]=v
for d in rslt:
print( [d.get(k,'') for k in keys] )
Out:
['Gordon', 'James', '13', '46', '12', 'I want to be a pilot']
['', 'Monica', '', '33', '', 'I am looking forward to christmas']
I'm trying to add on values to a key after making a dictionary.
This is what I have so far:
movie_list = "movies.txt" # using a file that contains this order on first line: Title, year, genre, director, actor
in_file = open(movie_list, 'r')
in_file.readline()
def list_maker(in_file):
movie1 = str(input("Enter in a movie: "))
movie2 = str(input("Enter in another movie: "))
d = {}
for line in in_file:
l = line.split(",")
title_year = (l[0], l[1]) # only then making the tuple ('Title', 'year')
for i in range(4, len(l)):
d = {title_year: l[i]}
if movie1 or movie2 == l[0]:
print(d.values())
The output I get it:
Enter in a movie: 13 B
Enter in another movie: 1920
{('13 B', '(2009)'): 'R. Madhavan'}
{('13 B', '(2009)'): 'Neetu Chandra'}
{('13 B', '(2009)'): 'Poonam Dhillon\n'}
{('1920', '(2008)'): 'Rajneesh Duggal'}
{('1920', '(2008)'): 'Adah Sharma'}
{('1920', '(2008)'): 'Anjori Alagh\n'}
{('1942 A Love Story', '(1994)'): 'Anil Kapoor'}
{('1942 A Love Story', '(1994)'): 'Manisha Koirala'}
{('1942 A Love Story', '(1994)'): 'Jackie Shroff\n'}
.... so on and so forth. I get the whole list of movies.
How would I go about doing so if I wanted to enter in those two movies (any 2 movies as a union of the values to the key (movie1, movie2) )?
Example:
{('13 B', '(2009)'): 'R. Madhavan', 'Neetu Chandra', 'Poonam Dhillon'}
{('1920', '(2008)'): 'Rajneesh Duggal', 'Adah Sharma', 'Anjori Alagh'}
Sorry if the output isn't completely what you want, but here's how you should do it:
d = {}
for line in in_file:
l = line.split(",")
title_year = (l[0], l[1])
people = []
for i in range(4, len(l)):
people.append(l[i]) # we append items to the list...
d = {title_year: people} # ...and then make the dict so that the list is in it.
if movie1 or movie2 == l[0]:
print(d.values())
Basically, what we are doing here is that we are making a list, and then setting the list to a key inside of the dict.