Get Subject and grade from string - python
Given this string
result = '''Check here to visit our corporate website
Results
Candidate Information
Examination Number
986542346
Candidate Name
JOHN DOE JAMES
Examination
MFFG FOR SCHOOL CANDIDATES 2021
Centre
LORDYARD
Subject Grades
DATA PROCESSING
B3
ECONOMICS
B3
CIVIC EDUCATION
B3
ENGLISH LANGUAGE
A1
MATHEMATICS
B3
AGRICULTURAL SCIENCE
OUTSTANDING
BIOLOGY
A1
CHEMISTRY
B2
PHYSICS
C5
C Information
Card Use
1 of 5'''
How can I extract the NAME (JOHN DOE JAMES), the SUBJECTS and the GRADES into separate lists?
I have tried this for the subjects and grades, but it is not giving me the desired results. Firstly, where a subject name is more than one word, it only returns the last word, e.g. instead of DATA PROCESSING I am getting PROCESSING. Secondly, it is skipping AGRICULTURAL SCIENCE (subject) and OUTSTANDING (grade).
Please note that I am new to using regex. Thanks in advance.
import re

pattern = re.compile(r'[A-Z]+\n{1}[A-Z][0-9]')
searches = pattern.findall(result)
if searches:
    print(searches)
for search in searches:
    print(search)
OUTPUT FOR THE FIRST PRINT STATEMENT:
['PROCESSING\nB3', 'ECONOMICS\nB3', 'EDUCATION\nB3', 'LANGUAGE\nA1', 'MATHEMATICS\nB3', 'BIOLOGY\nA1', 'CHEMISTRY\nB2', 'PHYSICS\nC5']
OUTPUT FOR THE SECOND PRINT STATEMENT:
PROCESSING
B3
ECONOMICS
B3
EDUCATION
B3
LANGUAGE
A1
MATHEMATICS
B3
BIOLOGY
A1
CHEMISTRY
B2
PHYSICS
C5
Here's a way to do this without using regexes. Note that I am assuming "OUTSTANDING" is intended to be a grade. That takes special processing.
result = '''Check here to visit our corporate website Results Candidate Information Examination Number 986542346 Candidate Name JOHN DOE JAMES Examination MFFG FOR SCHOOL CANDIDATES 2021 Centre LORDYARD Subject Grades DATA PROCESSING B3 ECONOMICS B3 CIVIC EDUCATION B3 ENGLISH LANGUAGE A1 MATHEMATICS B3 AGRICULTURAL SCIENCE OUTSTANDING BIOLOGY A1 CHEMISTRY B2 PHYSICS C5 C Information Card Use 1 of 5'''
# Locate the labels that delimit each field.
i = result.find('Name')
j = result.find('Examination', i)
k = result.find('Centre')
l = result.find('Subject Grades')
m = result.find('Information Card')

# Slice out each field, skipping the label and the separator after it.
name = result[i+5:j-1]
exam = result[j+12:k-1]
grades = result[l+15:m].split()

print("Name:", name)
print("Exam:", exam)
print("Grades:")

# Collect words into a subject name until a grade is reached.
subject = []
for word in grades:
    if len(word) == 2 or word == 'OUTSTANDING':
        print(' '.join(subject), "......", word)
        subject = []
    else:
        subject.append(word)
Output:
Name: JOHN DOE JAMES
Exam: MFFG FOR SCHOOL CANDIDATES 2021
Grades:
DATA PROCESSING ...... B3
ECONOMICS ...... B3
CIVIC EDUCATION ...... B3
ENGLISH LANGUAGE ...... A1
MATHEMATICS ...... B3
AGRICULTURAL SCIENCE ...... OUTSTANDING
BIOLOGY ...... A1
CHEMISTRY ...... B2
PHYSICS ...... C5
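The question also asked for a regex and for separate lists. The original pattern fails for two reasons: `[A-Z]+` cannot match the space inside a multi-word subject, and `[A-Z][0-9]` cannot match OUTSTANDING. A minimal sketch of one possible fix follows. It assumes the string is newline-separated as in the question, that a subject line contains only capital letters and spaces, and that a grade is either a letter followed by a digit or the word OUTSTANDING. The variable names are illustrative.

```python
import re

result = '''Check here to visit our corporate website
Results
Candidate Information
Examination Number
986542346
Candidate Name
JOHN DOE JAMES
Examination
MFFG FOR SCHOOL CANDIDATES 2021
Centre
LORDYARD
Subject Grades
DATA PROCESSING
B3
ECONOMICS
B3
CIVIC EDUCATION
B3
ENGLISH LANGUAGE
A1
MATHEMATICS
B3
AGRICULTURAL SCIENCE
OUTSTANDING
BIOLOGY
A1
CHEMISTRY
B2
PHYSICS
C5
C Information
Card Use
1 of 5'''

# The name is assumed to be the whole line after "Candidate Name".
name_match = re.search(r'Candidate Name\n(.+)', result)
name = name_match.group(1) if name_match else None

# A subject line (capitals and spaces only) followed by a grade line.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
pair_pattern = re.compile(
    r'^([A-Z][A-Z ]*)\n([A-Z][0-9]|OUTSTANDING)$',
    flags=re.MULTILINE,
)
pairs = pair_pattern.findall(result)

subjects = [subject for subject, grade in pairs]
grades = [grade for subject, grade in pairs]

print(name)
print(subjects)
print(grades)
```

Because the space in `[A-Z ]` is a literal space rather than `\s`, a subject can never run across a line break, which keeps lines such as "MFFG FOR SCHOOL CANDIDATES 2021" from being picked up.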
Related
Data Cleaning How to split Pandas column
It has been some time since I worked in Python. I have the data frame below, with too many columns to name.

last/first    location   job  department
smith john    Vancouver  A1   servers
rogers steve  Toronto    A2   eng
Rogers Dave   Toronto    A4   HR

How do I remove caps in the last/first column and also split the last/first column by " "?

Goal:

last    first  location   job  department
smith   john   Vancouver  A1   servers
rogers  steve  Toronto    A2   eng
rogers  dave   Toronto    A4   HR
IIUC, you could use str.lower and str.split:

df[['last', 'first']] = (df.pop('last/first')
                           .str.lower()
                           .str.split(n=1, expand=True)
                         )

output:

    location job department    last  first
0  Vancouver  A1    servers   smith   john
1    Toronto  A2        eng  rogers  steve
2    Toronto  A4         HR  rogers   dave
How to create a transition matrix out of a Pandas dataframe
I have customers at a specific location. The customer may change the location from year to year. I would like to create a transition matrix that shows me the customers that transited from one location to another.

Tidy dataframe:

year  cust  loc
2019  C1    LA
2019  C2    LA
2019  C3    LB
2019  C4    LC
2019  C5    LA
2019  C6    LA
2020  C1    LB
2020  C2    LA
2020  C4    LC
2020  C5    LC
2020  C6    LC
2020  C7    LD

      LA   LB   LC      LD   drop
LA         C1   C5,C6
LB                           C3
LC
LD

I am looking for an elegant way to achieve that in pandas. Any clever idea where extensive nested looping is not needed?
Lookup values from one DataFrame to create a dict from another
I am very new to Python and came across a problem that I could not solve. I have two DataFrames; only the extracted columns below need to be considered, for example:

df1

   Student ID  Subjects
0  S1          Maths, Physics, Chemistry, Biology
1  S2          Maths, Chemistry, Computing
2  S3          Maths, Chemistry, Computing
3  S4          Biology, Chemistry, Maths
4  S5          English Literature, History, French
5  S6          Economics, Maths, Geography
6  S7          Further Mathematics, Maths, Physics
7  S8          Arts, Film Studies, Psychology
8  S9          English Literature, English Language, Classical
9  S10         Business, Computing, Maths

df2

    Subject ID  Subjects
58  Che13       Chemistry
59  Bio13       Biology
60  Mat13       Maths
61  FMat13      Further Mathematics
62  Phy13       Physics
63  Eco13       Economics
64  Geo13       Geography
65  His13       History
66  EngLang13   English Langauge
67  EngLit13    English Literature

For every subject in df2, how can I check whether there is a student taking that subject and build a dictionary with "Subject ID" as the key and the matching "Student ID" values as the value? The desired output would be something like:

Che13:[S1, S2, S3, ...]
Bio13:[S1,S4,...]
Use explode and map, then you can do a little grouping to get your output:

(df.set_index('Student ID')['Subjects']
   .str.split(', ')
   .explode()
   .map(df2.set_index('Subjects')['Subject ID'])
   .reset_index()
   .groupby('Subjects')['Student ID']
   .agg(list))

Subjects
Bio13                            [S1, S4]
Che13                    [S1, S2, S3, S4]
Eco13                                [S6]
EngLit13                         [S5, S9]
FMat13                               [S7]
Geo13                                [S6]
His13                                [S5]
Mat13       [S1, S2, S3, S4, S6, S7, S10]
Phy13                            [S1, S7]
Name: Student ID, dtype: object

From here, call .to_dict() if you want the result in a dictionary.
Not pythonic but simple:

{row['Subject ID']:
     df1[df1.Subjects.str.contains(row['Subjects'])]['Student ID'].to_list()
 for _, row in df2.iterrows()}

What we are doing: iterate over all the subjects and check whether the subject string appears in the subjects taken by a student. If so, get that student's ID.
Python Program to split a new file from a master file
I have a master file which has 4 columns.

Name  Parent  Child    Property
A1    World   USA      1
A2    USA     Texas    2
A3    Texas   Houston  3
A4    USA     Austin   4
A5    World   USA      5
A6    World   Canada   6
A7    Canada  Toronto  7

I need to create a new file and extract those records which are in between the keyword (USA) in column 3. The output file should be:

Name  Parent  Child    Property
A1    World   USA      1
A2    USA     Texas    2
A3    Texas   Houston  3
A4    USA     Austin   4
A5    World   USA      5
Here is some sample code; it works fine on my test box:

#!/usr/bin/python
import re

# old.txt: source file with all contents
# new.txt: new file to write the output
with open("old.txt", "r") as oldfile, open("new.txt", "w") as newfile:
    for line in oldfile:
        # Keep every line that contains USA anywhere.
        if re.match("(.*)USA(.*)", line):
            newfile.write(line)

Output file:

cat new.txt
A1 World USA 1
A2 USA Texas 2
A4 USA Austin 4
A5 World USA 5
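Note that the filter above keeps only lines that contain USA, so the A3 row from the question's expected output is missing. A minimal sketch of another reading follows, assuming "in between the keyword" means every row from the first to the last row whose third column equals USA. The function name `rows_between_keyword` and its parameters are hypothetical, and the data is held in a string here instead of being read from old.txt.

```python
def rows_between_keyword(lines, keyword="USA", column=2):
    """Return the header plus every row from the first to the last row
    whose value in `column` (0-based) equals `keyword`."""
    header, *rows = [line.split() for line in lines if line.strip()]
    # Indexes of the rows that carry the keyword in the chosen column.
    hits = [i for i, row in enumerate(rows)
            if len(row) > column and row[column] == keyword]
    if not hits:
        return [header]
    return [header] + rows[hits[0]:hits[-1] + 1]


master = """Name Parent Child Property
A1 World USA 1
A2 USA Texas 2
A3 Texas Houston 3
A4 USA Austin 4
A5 World USA 5
A6 World Canada 6
A7 Canada Toronto 7""".splitlines()

selected = rows_between_keyword(master)
for row in selected:
    print(" ".join(row))
```

Splitting on whitespace assumes no field contains a space; a file with quoted or multi-word fields would need the csv module or a fixed delimiter instead.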
How can I split a pandas dataframe in such a way that for each split value it creates a column
For example:

Input data frame

Name    Subjects
Ramesh  Maths,Science
Rakesh  MAths,Science,Social Studies
John    Social Science, Lietrature

Output data frame

Name    Subject1        Subject2    Subject3
Ramesh  Maths           Science     NaN
Rakesh  MAths           Science     Social Studies
John    Social Science  Literature  NaN
You can create a new df from the result of str.split and then concat them:

In [66]:
subjects = df['Subjects'].str.split(',', expand=True)
subjects

Out[66]:
                0            1               2
0           Maths      Science            None
1           MAths      Science  Social Studies
2  Social Science   Lietrature            None

In [71]:
subjects.columns = ['Subject ' + str(x + 1) for x in range(len(subjects.columns))]
subjects

Out[71]:
        Subject 1    Subject 2       Subject 3
0           Maths      Science            None
1           MAths      Science  Social Studies
2  Social Science   Lietrature            None

In [74]:
concatenated = pd.concat([df,subjects], axis=1)
concatenated.drop('Subjects',axis=1,inplace=True)
concatenated

Out[74]:
     Name       Subject 1    Subject 2       Subject 3
0  Ramesh           Maths      Science            None
1  Rakesh           MAths      Science  Social Studies
2    John  Social Science   Lietrature            None