Parse name, age, gender from given str.
input:
str = "musleh23malejemi22femaletanjir26male"
output:
Name: musleh, Age: 23, Gender: male
Name: jemi, Age: 22, Gender: female
Name: tanjir, Age: 26, Gender: male
Here's what I have so far:
import re
text = "musleh23malejemi22femaletanjir26male"
chunks = re.split(r"\d+}", text)
This isn't working, however. How should I approach this?
You could use a regular expression:
import re
s1 = "musleh23malejemi22femaletanjir26male"
# name (letters), age (digits), gender; "female" is listed first for clarity
pattern = r"([a-z]+)([0-9]+)(female|male)"
for match in re.finditer(pattern, s1):
    print(match.groups())
Out:
('musleh', '23', 'male')
('jemi', '22', 'female')
('tanjir', '26', 'male')
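To get from those tuples to the exact output format in the question, something like the following might work. It is only a sketch: `parse_people` and `record_pattern` are names made up for this example, and it assumes names are lowercase letters only, as in the sample string.

```python
import re

text = "musleh23malejemi22femaletanjir26male"

# One record = letters (name), digits (age), then a gender keyword.
record_pattern = re.compile(r"([a-z]+)(\d+)(female|male)")


def parse_people(s):
    """Return one dict per name/age/gender record found in s."""
    return [
        {"name": name, "age": int(age), "gender": gender}
        for name, age, gender in record_pattern.findall(s)
    ]


people = parse_people(text)
lines = [
    f"Name: {p['name']}, Age: {p['age']}, Gender: {p['gender']}"
    for p in people
]
print("\n".join(lines))
```

Converting the age to `int` here is optional, but it makes later sorting or arithmetic on the parsed records easier.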
Related
I have a list in the following format:
list_names = ['Name: Mark - Age: 42 - Country: NL',
'Name: Katherine - Age: 23 - Country: NL',
'Name: Tom - Age: 31 - Country: NL']
As you can see, all the information is set in one string. What I need is to order this list based on the age, which is located somewhere in the middle of the string.
How can I do this?
The key to sorting the list by the ages stored in the strings is to extract the age from each string. Once you have defined a function that does that, you can pass it as the key argument of the .sort method. Using regular expressions, extracting the age is simple. A solution could look as follows.
import re
pattern = re.compile(r'age:\s*(\d+)', re.IGNORECASE)
def extract_age(s):
    return int(pattern.search(s).group(1))
list_names = ['Name: Mark - Age: 42 - Country: NL',
'Name: Katherine - Age: 23 - Country: NL',
'Name: Tom - Age: 31 - Country: NL']
list_names.sort(key=extract_age)
print(list_names)
You can use regex to capture the age and use it as the sort key.
import re
list_names = ['Name: Mark - Age: 42 - Country: NL',
'Name: Katherine - Age: 23 - Country: NL',
'Name: Tom - Age: 31 - Country: NL']
def get_age(value):
    # raw string so that \d is not treated as a string escape
    match = re.search(r"Age: (\d+)", value)
    return int(match.group(1))
list_names_sorted = sorted(list_names, key=get_age)
print(list_names_sorted)
Output (pretty printed):
[
'Name: Katherine - Age: 23 - Country: NL',
'Name: Tom - Age: 31 - Country: NL',
'Name: Mark - Age: 42 - Country: NL'
]
I am practicing with a dataset with customers. Each customer has a first name, last name, city, age, gender and invoice number.
I want to create a dictionary with the customer's first and last name as the key and store the rest of the information as its value. There can be many invoices per customer, so each customer should be counted only once but can have many invoice numbers.
City         FirstName  LastName  Gender  Age  InvoiceNum
NYC          Jane       Doe       Female  35   1023
NYC          Jane       Doe       Female  35   6523
Jersey City  John       Smith     Male    54   6985
Houston      Kay        Johnson   Female  45   2357
To do so, I want to create a for loop.
class Customers:
    city = ""
    age = 0
    invoices = []
f = open("customers.csv")
import csv
reader = csv.reader(f)
next(reader)
customers = {}
for row in reader:
This is where I am stuck. For every row in reader, I want to check if the customer already exists. If it exists, I want to add the repeating invoice numbers. If it does not exist, this will be a new customer where I will have to append the other values (city, gender, age, single invoice number).
Desired Output:
There are 3 customers. 2 are female, 1 is male. Their average age is xxxx.
The count of customers does not repeat Jane Doe. The count of females does not repeat Jane Doe. The average age does not sum Jane Doe's age twice.
I came up with this:
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List
@dataclass
class Customer:
    first_name: str = ''
    last_name: str = ''
    city: str = ''
    age: int = 0
    invoices: List = field(init=False, default_factory=list)
    def process_entry(self, **row):
        # overwrite the defaults with the row's values and collect the invoice
        self.first_name = row['FirstName']
        self.last_name = row['LastName']
        self.city = row['City']
        self.age = row['Age']
        self.invoices.append(row['InvoiceNum'])
fake_reader = [
    {
        'FirstName': 'John',
        'LastName': 'Doe',
        'City': 'New York',
        'Age': 30,
        'InvoiceNum': 1
    },
    {
        'FirstName': 'John',
        'LastName': 'Doe',
        'City': 'New York',
        'Age': 30,
        'InvoiceNum': 2
    },
    {
        'FirstName': 'Clark',
        'LastName': 'Kent',
        'City': 'Metropolis',
        'Age': 35,
        'InvoiceNum': 3
    }
]
customers = defaultdict(Customer)
for row in fake_reader:
    customers[(row['FirstName'], row['LastName'])].process_entry(**row)
print(customers)
Output:
defaultdict(<class '__main__.Customer'>, {('John', 'Doe'): Customer(first_name='John', last_name='Doe', city='New York', age=30, invoices=[1, 2]), ('Clark', 'Kent'): Customer(first_name='Clark', last_name='Kent', city='Metropolis', age=35, invoices=[3])})
The "trick" here is to define the Customer class with default values, so that defaultdict can create an empty Customer for each new key and the real values get filled in by the process_entry method.
I think you're looking for something of the sort:
if name not in customers:
    customers[name] = [invoice]
else:
    customers[name].append(invoice)
This creates a key-value pair whose value is a list, which can then be appended to every time the for loop finds a new invoice for that name.
Edit: update to match your csv file
customers = {}
# assumes reader is a csv.reader and the header row was already skipped with next(reader)
for row in reader:
    # each row is already a list of fields; strip stray whitespace from each one
    City, FirstName, LastName, Gender, Age, InvoiceNum = [value.strip() for value in row]
    newEntry = {'InvoiceNum': int(InvoiceNum), 'City': City, 'Gender': Gender, 'Age': int(Age)}
    if (FirstName, LastName) not in customers:
        customers[(FirstName, LastName)] = [newEntry]
    else:
        customers[(FirstName, LastName)].append(newEntry)
Immutable types can be dictionary keys, so I chose a tuple of the first and last name.
Edit: I'm hoping my answer takes you in the right direction. I left the 'csv' details to you, as your rows may not correspond to what I did there.
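For the summary line the question asks for, a sketch along these lines might help. It assumes the file is comma-separated with the header shown in the question (the post does not say which delimiter is used), and it uses io.StringIO with the sample rows in place of open("customers.csv"). The names `load_customers` and `summarize` are made up for this example.

```python
import csv
import io

# Sample rows from the question; the comma delimiter is an assumption.
SAMPLE_CSV = """City,FirstName,LastName,Gender,Age,InvoiceNum
NYC,Jane,Doe,Female,35,1023
NYC,Jane,Doe,Female,35,6523
Jersey City,John,Smith,Male,54,6985
Houston,Kay,Johnson,Female,45,2357
"""


def load_customers(handle):
    """Group rows by (first name, last name), collecting invoice numbers."""
    customers = {}
    for row in csv.DictReader(handle):
        key = (row["FirstName"], row["LastName"])
        # setdefault creates the record only the first time a customer is seen
        record = customers.setdefault(key, {
            "city": row["City"],
            "gender": row["Gender"],
            "age": int(row["Age"]),
            "invoices": [],
        })
        record["invoices"].append(int(row["InvoiceNum"]))
    return customers


def summarize(customers):
    """Return (total, female count, male count, average age)."""
    records = list(customers.values())
    total = len(records)
    female = sum(1 for r in records if r["gender"].lower() == "female")
    male = sum(1 for r in records if r["gender"].lower() == "male")
    average_age = sum(r["age"] for r in records) / total
    return total, female, male, average_age


customers = load_customers(io.StringIO(SAMPLE_CSV))
total, female, male, average_age = summarize(customers)
print(f"There are {total} customers. {female} are female, {male} is male. "
      f"Their average age is {average_age:.1f}.")
```

Because each customer appears once as a dictionary key, Jane Doe is counted once for the totals and the average, while both of her invoice numbers end up in her `invoices` list.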
When I read multiple files and export them, the values in these 4 columns get overwritten by the values from the latest file. Every file has the values at the same iat cell locations. I would like to know whether this can be done in a loop without the values being overwritten.
name = df.iat[1,1]
age = df.iat[2,1]
height = df.iat[2,2]
address = df.iat[2,3]
Details = {'Name':name, 'Age':age,'Height':height,'Address':address}
df1 = pd.Series(Details).to_frame()
df1 = df1.T
For example,
(1st Data):
Name: John
Age: 20
Height: 1.7m
Address: Bla Bla Bla
(2nd Data):
Name: Jack
Age: 21
Height: 1.7m
Address: Blah Blah Blah
(3rd Data):
Name: Jane
Age: 20
Height: 1.62m
Address: Blah Blah
You can loop over the DataFrames (here assumed to be collected in a list called dfs) and append the values to lists.
name, age, height, address = [], [], [], []
for df in dfs:
    name.append(df.iat[1,1])
    age.append(df.iat[2,1])
    height.append(df.iat[2,2])
    address.append(df.iat[2,3])
Details = {'Name':name, 'Age':age,'Height':height,'Address':address}
df1 = pd.DataFrame(Details)
What would be the best way to define a config file, and parse it using ConfigParser, that specifies the initial values (i.e. constructor values) for a bunch of objects?
Example:
[Person-Objects]
Name: X
Age: 12
Profession: Student
Address: 555 Tortoise Drive
Name: Y
Age: 29
Profession: Programmer
Address: The moon
And then be able to parse it in Python so I can have something like:
People = []
for person in config:
    People.append(person)
Person1 = People[0]
print(Person1.Profession) # Prints Student
You could do something like:
[person:X]
Age: 12
Profession: Student
Address: 555 Tortoise Drive
[person:Y]
Age: 29
Profession: Programmer
Address: The moon
And then in your code:
from configparser import ConfigParser
config = ConfigParser()
config.read('people.ini')
people = []
for s in config.sections():
    if not s.startswith('person:'):
        continue
    name = s[7:]  # strip the 'person:' prefix
    person = dict(config.items(s))  # option names come back lowercased
    person['name'] = name
    people.append(person)
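If you want attribute access as in the question (Person1.Profession), one possible variation is to build small objects instead of dicts. This is a sketch only: the `Person` dataclass and `load_people` are names invented for this example, and the config is read from a string with read_string so the snippet runs without a people.ini file. Note that ConfigParser lowercases option names by default, so the attributes are lowercase here.

```python
from configparser import ConfigParser
from dataclasses import dataclass

INI_TEXT = """
[person:X]
Age: 12
Profession: Student
Address: 555 Tortoise Drive

[person:Y]
Age: 29
Profession: Programmer
Address: The moon
"""


@dataclass
class Person:
    name: str
    age: int
    profession: str
    address: str


def load_people(text):
    """Build one Person per [person:...] section of the config text."""
    config = ConfigParser()
    config.read_string(text)
    people = []
    for section in config.sections():
        if not section.startswith("person:"):
            continue
        options = config[section]
        people.append(Person(
            name=section[len("person:"):],
            age=options.getint("age"),  # convert the age to an int
            profession=options["profession"],
            address=options["address"],
        ))
    return people


people = load_people(INI_TEXT)
print(people[0].profession)  # Student
```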
I have a little database text file db.txt:
(peter)
name = peter
surname = asd
year = 23
(tom)
name = tom
surname = zaq
year = 22
hobby = sport
(paul)
name = paul
surname = zxc
hobby = music
job = teacher
How can I get the whole data section for one entry, for example tom? I want to get this into a variable:
(tom)
name = tom
surname = zaq
year = 22
hobby = sport
Then I want to change the data:
replace("year = 22", "year = 23")
and get:
(tom)
name = tom
surname = zaq
year = 23
hobby = sport
Now add (job) and delete (surname) data:
(tom)
name = tom
year = 23
hobby = sport
job = taxi driver
And finally write the changed section back to the old db.txt file:
(peter)
name = peter
surname = asd
year = 23
(tom)
name = tom
year = 23
hobby = sport
job = taxi driver
(paul)
name = paul
surname = zxc
hobby = music
job = teacher
Any solutions or hints how to do it? Thanks a lot!
Using PyYAML as suggested by @aitchnyu and making a few modifications to the original format makes this an easy task:
import yaml
text = """
peter:
  name: peter
  surname: asd
  year: 23
tom:
  name: tom
  surname: zaq
  year: 22
  hobby: sport
paul:
  name: paul
  surname: zxc
  hobby: music
  job: teacher
"""
# safe_load parses plain data; recent PyYAML versions require a Loader for yaml.load
persons = yaml.safe_load(text)
persons["tom"]["year"] = persons["tom"]["year"]*4 # Tom is older now
print(yaml.dump(persons, default_flow_style=False))
Result:
paul:
  hobby: music
  job: teacher
  name: paul
  surname: zxc
peter:
  name: peter
  surname: asd
  year: 23
tom:
  hobby: sport
  name: tom
  surname: zaq
  year: 88
Of course, you should read "text" from your file (db.txt) and write the dumped result back to it when you are finished.
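If the original (name) / key = value format has to stay as it is, a small hand-written parser may be enough. The sketch below is not part of the YAML approach above: `parse_db` and `dump_db` are names made up for this example, the file content is held in a string instead of being read from db.txt, and all values are kept as strings.

```python
import re

DB_TEXT = """(peter)
name = peter
surname = asd
year = 23
(tom)
name = tom
surname = zaq
year = 22
hobby = sport
(paul)
name = paul
surname = zxc
hobby = music
job = teacher
"""

# A section header is a line such as "(tom)".
SECTION_RE = re.compile(r"^\((?P<name>[^)]+)\)$")


def parse_db(text):
    """Parse the text into {section name: {key: value}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(header.group("name"), {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


def dump_db(sections):
    """Serialize the sections back into the original text format."""
    lines = []
    for name, fields in sections.items():
        lines.append(f"({name})")
        lines.extend(f"{key} = {value}" for key, value in fields.items())
    return "\n".join(lines) + "\n"


db = parse_db(DB_TEXT)
db["tom"]["year"] = "23"          # change
db["tom"]["job"] = "taxi driver"  # add
del db["tom"]["surname"]          # delete
new_text = dump_db(db)
print(new_text)
```

Since dicts keep insertion order, the sections and the remaining fields come out in the same order as in the file, with the new job field appended at the end of the tom section. Writing new_text back with open("db.txt", "w") would complete the round trip.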
Addendum to Sebastien's comment: use an in-memory SQLite DB. SQLite support is already built into Python (the sqlite3 module), so it's just a few lines to set up.
Also, unless that format cannot be changed, consider YAML for the text. With PyYAML, Python can readily translate between YAML and Python objects (an object composed of Python dicts, lists, strings, real numbers etc.) in a single step.
http://pyyaml.org/wiki/PyYAML
So my suggestion is a YAML -> Python object -> SQLite DB and back again.
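As a rough sketch of the SQLite half of that suggestion: since each person has a different set of fields, one option is a simple key/value table. The table name `fields` and its columns are assumptions made for this example, and the `persons` dict below stands in for whatever the YAML step would produce.

```python
import sqlite3

# Stand-in for the Python object loaded from the text file.
persons = {
    "peter": {"name": "peter", "surname": "asd", "year": 23},
    "tom": {"name": "tom", "surname": "zaq", "year": 22, "hobby": "sport"},
    "paul": {"name": "paul", "surname": "zxc", "hobby": "music", "job": "teacher"},
}

conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
conn.execute(
    "CREATE TABLE fields ("
    " person TEXT, key TEXT, value TEXT,"
    " PRIMARY KEY (person, key))"
)
conn.executemany(
    "INSERT INTO fields VALUES (?, ?, ?)",
    [(person, key, str(value))
     for person, data in persons.items()
     for key, value in data.items()],
)

# Change, add and delete fields for tom.
conn.execute(
    "UPDATE fields SET value = ? WHERE person = ? AND key = ?",
    ("23", "tom", "year"),
)
conn.execute("INSERT INTO fields VALUES (?, ?, ?)", ("tom", "job", "taxi driver"))
conn.execute("DELETE FROM fields WHERE person = ? AND key = ?", ("tom", "surname"))

# Read tom's section back as a dict of key/value pairs.
tom = dict(conn.execute(
    "SELECT key, value FROM fields WHERE person = ?", ("tom",)
))
print(tom)
```

All values are stored as text here for simplicity; going back to the file would mean selecting every row, regrouping by person, and dumping the result.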