I have an XML file that I would like to parse into a table (a pandas DataFrame).
Below is a sample of the XML file, containing just two of the records.
<?xml version="1.0" encoding="UTF-8"?>
<file>
<C13_335010X321A1_837Y6>
<BHT_BeginningOfHierarchicalTransaction>
<BHT01__HierarchicalStructureCode>0011</BHT01__HierarchicalStructureCode>
<BHT02__TransactionSetPurposeCode>00</BHT02__TransactionSetPurposeCode>
<BHT03__OriginatorApplicationTransactionIdentifier>513513TR</BHT03__OriginatorApplicationTransactionIdentifier>
<BHT04__TransactionSetCreationDate>20200212</BHT04__TransactionSetCreationDate>
<BHT05__TransactionSetCreationTime>1287</BHT05__TransactionSetCreationTime>
<BHT06__ClaimOrEncounterIdentifier>DD</BHT06__ClaimOrEncounterIdentifier>
</BHT_BeginningOfHierarchicalTransaction>
<Loop_1000A>
<NM1_SubmitterName_1000A>
<NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>9</NM102__EntityTypeQualifier>
<NM103__SubmitterLastOrOrganizationName>AAA</NM103__SubmitterLastOrOrganizationName>
<NM108__IdentificationCodeQualifier>22</NM108__IdentificationCodeQualifier>
<NM109__SubmitterIdentifier>55555500</NM109__SubmitterIdentifier>
</NM1_SubmitterName_1000A>
<PER_SubmitterEDIContactInformation_1000A>
<PER01__ContactFunctionCode>LK</PER01__ContactFunctionCode>
<PER02__SubmitterContactName>John Smith</PER02__SubmitterContactName>
<PER03__CommunicationNumberQualifier>WW</PER03__CommunicationNumberQualifier>
<PER04__CommunicationNumber>2132220011</PER04__CommunicationNumber>
<PER05__CommunicationNumberQualifier>DD</PER05__CommunicationNumberQualifier>
<PER06__CommunicationNumber>DD_2#GMAIL.COM</PER06__CommunicationNumber>
</PER_SubmitterEDIContactInformation_1000A>
</Loop_1000A>
<Loop_1000B>
<NM1_ReceiverName_1000B>
<NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>0</NM102__EntityTypeQualifier>
<NM103__ReceiverName>AAA</NM103__ReceiverName>
<NM108__IdentificationCodeQualifier>32</NM108__IdentificationCodeQualifier>
<NM109__ReceiverPrimaryIdentifier>2514521</NM109__ReceiverPrimaryIdentifier>
</NM1_ReceiverName_1000B>
</Loop_1000B>
<Loop_2000A>
<HL_BillingProviderHierarchicalLevel_2000A>
<HL01__HierarchicalIDNumber>32</HL01__HierarchicalIDNumber>
<HL03__HierarchicalLevelCode>54</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>32</HL04__HierarchicalChildCode>
</HL_BillingProviderHierarchicalLevel_2000A>
<Loop_2010AA>
<NM1_BillingProviderName_2010AA>
<NM101__EntityIdentifierCode>54</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>21</NM102__EntityTypeQualifier>
<NM103__BillingProviderLastOrOrganizationalName>AAA</NM103__BillingProviderLastOrOrganizationalName>
<NM108__IdentificationCodeQualifier>XX</NM108__IdentificationCodeQualifier>
<NM109__BillingProviderIdentifier>515151325</NM109__BillingProviderIdentifier>
</NM1_BillingProviderName_2010AA>
<N3_BillingProviderAddress_2010AA>
<N301__BillingProviderAddressLine>214 SS STREET</N301__BillingProviderAddressLine>
</N3_BillingProviderAddress_2010AA>
<N4_BillingProviderCityStateZIPCode_2010AA>
<N401__BillingProviderCityName>LA</N401__BillingProviderCityName>
<N402__BillingProviderStateOrProvinceCode>CA</N402__BillingProviderStateOrProvinceCode>
<N403__BillingProviderPostalZoneOrZIPCode>93500</N403__BillingProviderPostalZoneOrZIPCode>
</N4_BillingProviderCityStateZIPCode_2010AA>
<REF_BillingProviderTaxIdentification_2010AA>
<REF01__ReferenceIdentificationQualifier>OI</REF01__ReferenceIdentificationQualifier>
<REF02__BillingProviderTaxIdentificationNumber>5135151315</REF02__BillingProviderTaxIdentificationNumber>
</REF_BillingProviderTaxIdentification_2010AA>
</Loop_2010AA>
<Loop_2000B>
<HL_SubscriberHierarchicalLevel_2000B>
<HL01__HierarchicalIDNumber>5</HL01__HierarchicalIDNumber>
<HL02__HierarchicalParentIDNumber>5</HL02__HierarchicalParentIDNumber>
<HL03__HierarchicalLevelCode>55</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>5</HL04__HierarchicalChildCode>
</HL_SubscriberHierarchicalLevel_2000B>
<SBR_SubscriberInformation_2000B>
<SBR01__PayerResponsibilitySequenceNumberCode>L</SBR01__PayerResponsibilitySequenceNumberCode>
<SBR02__IndividualRelationshipCode>32</SBR02__IndividualRelationshipCode>
<SBR03__SubscriberGroupOrPolicyNumber>252525Z125</SBR03__SubscriberGroupOrPolicyNumber>
<SBR09__ClaimFilingIndicatorCode>NM</SBR09__ClaimFilingIndicatorCode>
</SBR_SubscriberInformation_2000B>
<Loop_2010BA>
<NM1_SubscriberName_2010BA>
<NM101__EntityIdentifierCode>DCX</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>5</NM102__EntityTypeQualifier>
<NM103__SubscriberLastName>SMITH</NM103__SubscriberLastName>
<NM104__SubscriberFirstName>JOHN</NM104__SubscriberFirstName>
<NM108__IdentificationCodeQualifier>CA</NM108__IdentificationCodeQualifier>
<NM109__SubscriberPrimaryIdentifier>3656361.</NM109__SubscriberPrimaryIdentifier>
</NM1_SubscriberName_2010BA>
<N3_SubscriberAddress_2010BA>
<N301__SubscriberAddressLine>111 STREET</N301__SubscriberAddressLine>
</N3_SubscriberAddress_2010BA>
<N4_SubscriberCityStateZIPCode_2010BA>
<N401__SubscriberCityName>LA</N401__SubscriberCityName>
<N402__SubscriberStateCode>CA</N402__SubscriberStateCode>
<N403__SubscriberPostalZoneOrZIPCode>93000</N403__SubscriberPostalZoneOrZIPCode>
</N4_SubscriberCityStateZIPCode_2010BA>
<DMG_SubscriberDemographicInformation_2010BA>
<DMG01__DateTimePeriodFormatQualifier>K5</DMG01__DateTimePeriodFormatQualifier>
<DMG02__SubscriberBirthDate>19851010</DMG02__SubscriberBirthDate>
<DMG03__SubscriberGenderCode>U</DMG03__SubscriberGenderCode>
</DMG_SubscriberDemographicInformation_2010BA>
</Loop_2010BA>
<Loop_2010BB>
<NM1_PayerName_2010BB>
<NM101__EntityIdentifierCode>FF</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>3</NM102__EntityTypeQualifier>
<NM103__PayerName>AAA</NM103__PayerName>
<NM108__IdentificationCodeQualifier>GF</NM108__IdentificationCodeQualifier>
<NM109__PayerIdentifier>32514</NM109__PayerIdentifier>
</NM1_PayerName_2010BB>
</Loop_2010BB>
<Loop_2300>
<CLM_ClaimInformation_2300>
<CLM01__PatientControlNumber>5413</CLM01__PatientControlNumber>
<CLM02__TotalClaimChargeAmount>651</CLM02__TotalClaimChargeAmount>
<CLM05_HealthCareServiceLocationInformation_2300>
<CLM05_01_PlaceOfServiceCode>13</CLM05_01_PlaceOfServiceCode>
<CLM05_02_FacilityCodeQualifier>D</CLM05_02_FacilityCodeQualifier>
<CLM05_03_ClaimFrequencyCode>3</CLM05_03_ClaimFrequencyCode>
</CLM05_HealthCareServiceLocationInformation_2300>
<CLM06__ProviderOrSupplierSignatureIndicator>N</CLM06__ProviderOrSupplierSignatureIndicator>
<CLM07__AssignmentOrPlanParticipationCode>R</CLM07__AssignmentOrPlanParticipationCode>
<CLM08__BenefitsAssignmentCertificationIndicator>N</CLM08__BenefitsAssignmentCertificationIndicator>
<CLM09__ReleaseOfInformationCode>N</CLM09__ReleaseOfInformationCode>
<CLM10__PatientSignatureSourceCode>X</CLM10__PatientSignatureSourceCode>
</CLM_ClaimInformation_2300>
<REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<REF01__ReferenceIdentificationQualifier>J1</REF01__ReferenceIdentificationQualifier>
<REF02__ValueAddedNetworkTraceNumber>FVC2514543254</REF02__ValueAddedNetworkTraceNumber>
</REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<HI_HealthCareDiagnosisCode_2300>
<HI01_HealthCareCodeInformation_2300>
<HI01_01_DiagnosisTypeCode>CCC</HI01_01_DiagnosisTypeCode>
<HI01_02_DiagnosisCode>N111</HI01_02_DiagnosisCode>
</HI01_HealthCareCodeInformation_2300>
</HI_HealthCareDiagnosisCode_2300>
<Loop_2310B>
<NM1_RenderingProviderName_2310B>
<NM101__EntityIdentifierCode>32</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>2</NM102__EntityTypeQualifier>
<NM103__RenderingProviderLastOrOrganizationName>JOHN</NM103__RenderingProviderLastOrOrganizationName>
<NM104__RenderingProviderFirstName>SMITH</NM104__RenderingProviderFirstName>
<NM108__IdentificationCodeQualifier>TT</NM108__IdentificationCodeQualifier>
<NM109__RenderingProviderIdentifier>25431251</NM109__RenderingProviderIdentifier>
</NM1_RenderingProviderName_2310B>
<PRV_RenderingProviderSpecialtyInformation_2310B>
<PRV01__ProviderCode>TR</PRV01__ProviderCode>
<PRV02__ReferenceIdentificationQualifier>VFD</PRV02__ReferenceIdentificationQualifier>
<PRV03__ProviderTaxonomyCode>135454353L</PRV03__ProviderTaxonomyCode>
</PRV_RenderingProviderSpecialtyInformation_2310B>
</Loop_2310B>
<Loop_2400>
<LX_ServiceLineNumber_2400>
<LX01__AssignedNumber>2</LX01__AssignedNumber>
</LX_ServiceLineNumber_2400>
<SV1_ProfessionalService_2400>
<SV101_CompositeMedicalProcedureIdentifier_2400>
<SV101_01_ProductOrServiceIDQualifier>EE</SV101_01_ProductOrServiceIDQualifier>
<SV101_02_ProcedureCode>99999</SV101_02_ProcedureCode>
<SV101_07_Description>BLOOD</SV101_07_Description>
</SV101_CompositeMedicalProcedureIdentifier_2400>
<SV102__LineItemChargeAmount>200</SV102__LineItemChargeAmount>
<SV103__UnitOrBasisForMeasurementCode>PP</SV103__UnitOrBasisForMeasurementCode>
<SV104__ServiceUnitCount>3.5</SV104__ServiceUnitCount>
<SV107_CompositeDiagnosisCodePointer_2400>
<SV107_01_DiagnosisCodePointer>2</SV107_01_DiagnosisCodePointer>
</SV107_CompositeDiagnosisCodePointer_2400>
</SV1_ProfessionalService_2400>
<DTP_DateServiceDate_2400>
<DTP01__DateTimeQualifier>654</DTP01__DateTimeQualifier>
<DTP02__DateTimePeriodFormatQualifier>U8</DTP02__DateTimePeriodFormatQualifier>
<DTP03__ServiceDate>20191010</DTP03__ServiceDate>
</DTP_DateServiceDate_2400>
<REF_LineItemControlNumber_2400>
<REF01__ReferenceIdentificationQualifier>5F</REF01__ReferenceIdentificationQualifier>
<REF02__LineItemControlNumber>DDD.32.123</REF02__LineItemControlNumber>
</REF_LineItemControlNumber_2400>
</Loop_2400>
</Loop_2300>
</Loop_2000B>
</Loop_2000A>
</C13_335010X321A1_837Y6>
<C13_335010X321A1_837Y6>
<BHT_BeginningOfHierarchicalTransaction>
<BHT01__HierarchicalStructureCode>0011</BHT01__HierarchicalStructureCode>
<BHT02__TransactionSetPurposeCode>00</BHT02__TransactionSetPurposeCode>
<BHT03__OriginatorApplicationTransactionIdentifier>513513TR</BHT03__OriginatorApplicationTransactionIdentifier>
<BHT04__TransactionSetCreationDate>20200212</BHT04__TransactionSetCreationDate>
<BHT05__TransactionSetCreationTime>1287</BHT05__TransactionSetCreationTime>
<BHT06__ClaimOrEncounterIdentifier>DD</BHT06__ClaimOrEncounterIdentifier>
</BHT_BeginningOfHierarchicalTransaction>
<Loop_1000A>
<NM1_SubmitterName_1000A>
<NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>9</NM102__EntityTypeQualifier>
<NM103__SubmitterLastOrOrganizationName>AAA</NM103__SubmitterLastOrOrganizationName>
<NM108__IdentificationCodeQualifier>22</NM108__IdentificationCodeQualifier>
<NM109__SubmitterIdentifier>55555500</NM109__SubmitterIdentifier>
</NM1_SubmitterName_1000A>
<PER_SubmitterEDIContactInformation_1000A>
<PER01__ContactFunctionCode>LK</PER01__ContactFunctionCode>
<PER02__SubmitterContactName>John Smith</PER02__SubmitterContactName>
<PER03__CommunicationNumberQualifier>WW</PER03__CommunicationNumberQualifier>
<PER04__CommunicationNumber>2132220011</PER04__CommunicationNumber>
<PER05__CommunicationNumberQualifier>DD</PER05__CommunicationNumberQualifier>
<PER06__CommunicationNumber>DD_2#GMAIL.COM</PER06__CommunicationNumber>
</PER_SubmitterEDIContactInformation_1000A>
</Loop_1000A>
<Loop_1000B>
<NM1_ReceiverName_1000B>
<NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>0</NM102__EntityTypeQualifier>
<NM103__ReceiverName>AAA</NM103__ReceiverName>
<NM108__IdentificationCodeQualifier>32</NM108__IdentificationCodeQualifier>
<NM109__ReceiverPrimaryIdentifier>2514521</NM109__ReceiverPrimaryIdentifier>
</NM1_ReceiverName_1000B>
</Loop_1000B>
<Loop_2000A>
<HL_BillingProviderHierarchicalLevel_2000A>
<HL01__HierarchicalIDNumber>32</HL01__HierarchicalIDNumber>
<HL03__HierarchicalLevelCode>54</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>32</HL04__HierarchicalChildCode>
</HL_BillingProviderHierarchicalLevel_2000A>
<Loop_2010AA>
<NM1_BillingProviderName_2010AA>
<NM101__EntityIdentifierCode>54</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>21</NM102__EntityTypeQualifier>
<NM103__BillingProviderLastOrOrganizationalName>AAA</NM103__BillingProviderLastOrOrganizationalName>
<NM108__IdentificationCodeQualifier>XX</NM108__IdentificationCodeQualifier>
<NM109__BillingProviderIdentifier>515151325</NM109__BillingProviderIdentifier>
</NM1_BillingProviderName_2010AA>
<N3_BillingProviderAddress_2010AA>
<N301__BillingProviderAddressLine>214 SS STREET</N301__BillingProviderAddressLine>
</N3_BillingProviderAddress_2010AA>
<N4_BillingProviderCityStateZIPCode_2010AA>
<N401__BillingProviderCityName>LA</N401__BillingProviderCityName>
<N402__BillingProviderStateOrProvinceCode>CA</N402__BillingProviderStateOrProvinceCode>
<N403__BillingProviderPostalZoneOrZIPCode>93500</N403__BillingProviderPostalZoneOrZIPCode>
</N4_BillingProviderCityStateZIPCode_2010AA>
<REF_BillingProviderTaxIdentification_2010AA>
<REF01__ReferenceIdentificationQualifier>OI</REF01__ReferenceIdentificationQualifier>
<REF02__BillingProviderTaxIdentificationNumber>5135151315</REF02__BillingProviderTaxIdentificationNumber>
</REF_BillingProviderTaxIdentification_2010AA>
</Loop_2010AA>
<Loop_2000B>
<HL_SubscriberHierarchicalLevel_2000B>
<HL01__HierarchicalIDNumber>5</HL01__HierarchicalIDNumber>
<HL02__HierarchicalParentIDNumber>5</HL02__HierarchicalParentIDNumber>
<HL03__HierarchicalLevelCode>55</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>5</HL04__HierarchicalChildCode>
</HL_SubscriberHierarchicalLevel_2000B>
<SBR_SubscriberInformation_2000B>
<SBR01__PayerResponsibilitySequenceNumberCode>L</SBR01__PayerResponsibilitySequenceNumberCode>
<SBR02__IndividualRelationshipCode>32</SBR02__IndividualRelationshipCode>
<SBR03__SubscriberGroupOrPolicyNumber>252525Z125</SBR03__SubscriberGroupOrPolicyNumber>
<SBR09__ClaimFilingIndicatorCode>NM</SBR09__ClaimFilingIndicatorCode>
</SBR_SubscriberInformation_2000B>
<Loop_2010BA>
<NM1_SubscriberName_2010BA>
<NM101__EntityIdentifierCode>DCX</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>5</NM102__EntityTypeQualifier>
<NM103__SubscriberLastName>SMITH</NM103__SubscriberLastName>
<NM104__SubscriberFirstName>JOHN</NM104__SubscriberFirstName>
<NM108__IdentificationCodeQualifier>CA</NM108__IdentificationCodeQualifier>
<NM109__SubscriberPrimaryIdentifier>3656361.</NM109__SubscriberPrimaryIdentifier>
</NM1_SubscriberName_2010BA>
<N3_SubscriberAddress_2010BA>
<N301__SubscriberAddressLine>111 STREET</N301__SubscriberAddressLine>
</N3_SubscriberAddress_2010BA>
<N4_SubscriberCityStateZIPCode_2010BA>
<N401__SubscriberCityName>LA</N401__SubscriberCityName>
<N402__SubscriberStateCode>CA</N402__SubscriberStateCode>
<N403__SubscriberPostalZoneOrZIPCode>93000</N403__SubscriberPostalZoneOrZIPCode>
</N4_SubscriberCityStateZIPCode_2010BA>
<DMG_SubscriberDemographicInformation_2010BA>
<DMG01__DateTimePeriodFormatQualifier>K5</DMG01__DateTimePeriodFormatQualifier>
<DMG02__SubscriberBirthDate>19851010</DMG02__SubscriberBirthDate>
<DMG03__SubscriberGenderCode>U</DMG03__SubscriberGenderCode>
</DMG_SubscriberDemographicInformation_2010BA>
</Loop_2010BA>
<Loop_2010BB>
<NM1_PayerName_2010BB>
<NM101__EntityIdentifierCode>FF</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>3</NM102__EntityTypeQualifier>
<NM103__PayerName>AAA</NM103__PayerName>
<NM108__IdentificationCodeQualifier>GF</NM108__IdentificationCodeQualifier>
<NM109__PayerIdentifier>32514</NM109__PayerIdentifier>
</NM1_PayerName_2010BB>
</Loop_2010BB>
<Loop_2300>
<CLM_ClaimInformation_2300>
<CLM01__PatientControlNumber>5413</CLM01__PatientControlNumber>
<CLM02__TotalClaimChargeAmount>651</CLM02__TotalClaimChargeAmount>
<CLM05_HealthCareServiceLocationInformation_2300>
<CLM05_01_PlaceOfServiceCode>13</CLM05_01_PlaceOfServiceCode>
<CLM05_02_FacilityCodeQualifier>D</CLM05_02_FacilityCodeQualifier>
<CLM05_03_ClaimFrequencyCode>3</CLM05_03_ClaimFrequencyCode>
</CLM05_HealthCareServiceLocationInformation_2300>
<CLM06__ProviderOrSupplierSignatureIndicator>N</CLM06__ProviderOrSupplierSignatureIndicator>
<CLM07__AssignmentOrPlanParticipationCode>R</CLM07__AssignmentOrPlanParticipationCode>
<CLM08__BenefitsAssignmentCertificationIndicator>N</CLM08__BenefitsAssignmentCertificationIndicator>
<CLM09__ReleaseOfInformationCode>N</CLM09__ReleaseOfInformationCode>
<CLM10__PatientSignatureSourceCode>X</CLM10__PatientSignatureSourceCode>
</CLM_ClaimInformation_2300>
<REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<REF01__ReferenceIdentificationQualifier>J1</REF01__ReferenceIdentificationQualifier>
<REF02__ValueAddedNetworkTraceNumber>FVC2514543254</REF02__ValueAddedNetworkTraceNumber>
</REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<HI_HealthCareDiagnosisCode_2300>
<HI01_HealthCareCodeInformation_2300>
<HI01_01_DiagnosisTypeCode>CCC</HI01_01_DiagnosisTypeCode>
<HI01_02_DiagnosisCode>N111</HI01_02_DiagnosisCode>
</HI01_HealthCareCodeInformation_2300>
</HI_HealthCareDiagnosisCode_2300>
<Loop_2310B>
<NM1_RenderingProviderName_2310B>
<NM101__EntityIdentifierCode>32</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>2</NM102__EntityTypeQualifier>
<NM103__RenderingProviderLastOrOrganizationName>JOHN</NM103__RenderingProviderLastOrOrganizationName>
<NM104__RenderingProviderFirstName>SMITH</NM104__RenderingProviderFirstName>
<NM108__IdentificationCodeQualifier>TT</NM108__IdentificationCodeQualifier>
<NM109__RenderingProviderIdentifier>25431251</NM109__RenderingProviderIdentifier>
</NM1_RenderingProviderName_2310B>
<PRV_RenderingProviderSpecialtyInformation_2310B>
<PRV01__ProviderCode>TR</PRV01__ProviderCode>
<PRV02__ReferenceIdentificationQualifier>VFD</PRV02__ReferenceIdentificationQualifier>
<PRV03__ProviderTaxonomyCode>135454353L</PRV03__ProviderTaxonomyCode>
</PRV_RenderingProviderSpecialtyInformation_2310B>
</Loop_2310B>
<Loop_2400>
<LX_ServiceLineNumber_2400>
<LX01__AssignedNumber>2</LX01__AssignedNumber>
</LX_ServiceLineNumber_2400>
<SV1_ProfessionalService_2400>
<SV101_CompositeMedicalProcedureIdentifier_2400>
<SV101_01_ProductOrServiceIDQualifier>EE</SV101_01_ProductOrServiceIDQualifier>
<SV101_02_ProcedureCode>99999</SV101_02_ProcedureCode>
<SV101_07_Description>BLOOD</SV101_07_Description>
</SV101_CompositeMedicalProcedureIdentifier_2400>
<SV102__LineItemChargeAmount>200</SV102__LineItemChargeAmount>
<SV103__UnitOrBasisForMeasurementCode>PP</SV103__UnitOrBasisForMeasurementCode>
<SV104__ServiceUnitCount>3.5</SV104__ServiceUnitCount>
<SV107_CompositeDiagnosisCodePointer_2400>
<SV107_01_DiagnosisCodePointer>2</SV107_01_DiagnosisCodePointer>
</SV107_CompositeDiagnosisCodePointer_2400>
</SV1_ProfessionalService_2400>
<DTP_DateServiceDate_2400>
<DTP01__DateTimeQualifier>654</DTP01__DateTimeQualifier>
<DTP02__DateTimePeriodFormatQualifier>U8</DTP02__DateTimePeriodFormatQualifier>
<DTP03__ServiceDate>20191010</DTP03__ServiceDate>
</DTP_DateServiceDate_2400>
<REF_LineItemControlNumber_2400>
<REF01__ReferenceIdentificationQualifier>5F</REF01__ReferenceIdentificationQualifier>
<REF02__LineItemControlNumber>DDD.32.123</REF02__LineItemControlNumber>
</REF_LineItemControlNumber_2400>
</Loop_2400>
</Loop_2300>
</Loop_2000B>
</Loop_2000A>
</C13_335010X321A1_837Y6>
</file>
These should end up as two rows. I am using the following Python code to convert the file into a pandas DataFrame, but I get an empty DataFrame.
import pandas as pd
import xml.etree.ElementTree as et

def xml_file(file):
    columns = file.attrib
    for xml in file.iter('C13_335010X321A1_837Y6'):
        file_dict = columns.copy()
        file_dict.update(xml.attrib)
        yield file_dict

tree = et.parse(r"C:\Users\Desktop\test1.xml")
root = tree.getroot()
df = pd.DataFrame(list(xml_file(root)))
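A likely reason the DataFrame comes out empty: the code only collects attributes (`.attrib`), and none of the elements in the sample carry attributes. The values sit in the text of the leaf elements. The sketch below is one possible approach, not the only one: it walks each record and builds one dictionary per record, using the path of parent tags as the column name because leaf names such as NM101__EntityIdentifierCode repeat in several loops. The `flatten` helper and the trimmed `SAMPLE` string are illustrative assumptions, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real file: two records, with the same
# leaf tag appearing under two different loops.
SAMPLE = """<file>
<C13_335010X321A1_837Y6>
<Loop_1000A><NM1_SubmitterName_1000A>
<NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
</NM1_SubmitterName_1000A></Loop_1000A>
<Loop_1000B><NM1_ReceiverName_1000B>
<NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
</NM1_ReceiverName_1000B></Loop_1000B>
</C13_335010X321A1_837Y6>
<C13_335010X321A1_837Y6>
<Loop_1000A><NM1_SubmitterName_1000A>
<NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
</NM1_SubmitterName_1000A></Loop_1000A>
<Loop_1000B><NM1_ReceiverName_1000B>
<NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
</NM1_ReceiverName_1000B></Loop_1000B>
</C13_335010X321A1_837Y6>
</file>"""


def flatten(elem, prefix=""):
    """Map every leaf element below elem to {path_of_tags: text}."""
    row = {}
    for child in elem:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child):
            # The child has children of its own: recurse into it.
            row.update(flatten(child, path))
        else:
            # Leaf element: its text is the value we want.
            row[path] = (child.text or "").strip()
    return row


root = ET.fromstring(SAMPLE)
rows = [flatten(record) for record in root.iter("C13_335010X321A1_837Y6")]
print(len(rows), "rows,", len(rows[0]), "columns")
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per record. One limitation to be aware of: if a record contains the same loop more than once (for example several Loop_2400 service lines), later values overwrite earlier ones under the same path, so such cases would need an index added to the path.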
I am new to Python, so apologies if this is a basic question.
I am trying to read an XML file into a Python object (preferably a pandas DataFrame).
For now I am just printing the values, to check that I can read them correctly in tabular form.
I have used xml.etree.ElementTree for this, but I might not be using it as intended.
Code:
import xml.etree.ElementTree as ET

tree = ET.parse("data.xml")
ODM = tree.getroot()
ns = {'xmlns': 'http://www.cdisc.org/ns/odm/v1.3',
      'mdsol': 'http://www.mdsol.com/ns/odm/metadata'}

for ClinicalData in ODM:
    LocationOID = None
    # print(ClinicalData.tag, ClinicalData.attrib)
    for SubjectData in ClinicalData:
        for SiteRef in SubjectData:
            LocationOID = SiteRef.attrib.get('LocationOID')
        for StudyEventData in SubjectData:
            for AuditRecord in StudyEventData:
                print(ClinicalData.attrib.get('MetaDataVersionOID'),
                      ClinicalData.attrib.get('AuditSubCategoryName'),  # null output due to namespace issue
                      SubjectData.attrib.get('SubjectKey'),
                      SubjectData.attrib.get('SubjectName'),  # null output due to namespace issue
                      LocationOID,  # not sure what the issue is
                      StudyEventData.attrib.get('StudyEventRepeatKey'),
                      AuditRecord.find('DateTimeStamp')  # not sure what the issue is
                      )
Input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
xmlns:mdsol="http://www.mdsol.com/ns/odm/metadata"
CreationDateTime="2019-08-23T12:59:09" FileOID="3b2b4161-fad8-4239-9c83-03d0e62624dd" FileType="Transactional" ODMVersion="1.3">
<ClinicalData MetaDataVersionOID="1772" StudyOID="0ACC SP3 MAPPING1(DEV)" mdsol:AuditSubCategoryName="Activated">
<SubjectData SubjectKey="7735fd9c-1792-457c-aa58-0ca26ecdc810" mdsol:SubjectKeyType="SubjectUUID" mdsol:SubjectName="ACC-SUBJ-3">
<SiteRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<StudyEventData StudyEventOID="FV" StudyEventRepeatKey="VIST[1]/FV[1]" mdsol:InstanceId="2960580">
<AuditRecord>
<UserRef UserOID="systemuser"/>
<LocationRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<DateTimeStamp>2019-07-10T07:56:54</DateTimeStamp>
<ReasonForChange>Update</ReasonForChange>
<SourceID>394263772</SourceID>
</AuditRecord>
</StudyEventData>
</SubjectData>
</ClinicalData>
</ODM>
I expect every printed variable to hold the corresponding value from the XML file. Please let me know if there is a better way of doing this than nesting so many loops.
Namespaces are a pain using ElementTree. See this discussion.
Short answer:
for ClinicalData in ODM:
    # print(ClinicalData.tag, ClinicalData.attrib)
    for SubjectData in ClinicalData:
        SiteRef = SubjectData.find('{http://www.cdisc.org/ns/odm/v1.3}SiteRef')
        LocationOID = SiteRef.attrib.get('LocationOID')
        for StudyEventData in SubjectData:
            for AuditRecord in StudyEventData:
                print(
                    ClinicalData.attrib.get('MetaDataVersionOID'),
                    # namespaced attributes need the full {uri}name key
                    ClinicalData.attrib.get(
                        '{http://www.mdsol.com/ns/odm/metadata}AuditSubCategoryName'),
                    SubjectData.attrib.get('SubjectKey'),
                    SubjectData.attrib.get(
                        '{http://www.mdsol.com/ns/odm/metadata}SubjectName'),
                    LocationOID,  # read from the SiteRef child only
                    StudyEventData.attrib.get('StudyEventRepeatKey'),
                    # child elements live in the default namespace
                    AuditRecord.find(
                        '{http://www.cdisc.org/ns/odm/v1.3}DateTimeStamp').text
                )
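If the repeated `{uri}` strings get hard to read, ElementTree also accepts a prefix map in `find`, `findall` and `findtext`. The sketch below shows that variant on a cut-down copy of the input. The prefix name `odm`, the `audit_rows` helper and the shortened `SAMPLE` are my own illustrative choices. Note that the prefix map only applies to element paths; attribute keys still need the full `{uri}name` form.

```python
import xml.etree.ElementTree as ET

# "odm" is an arbitrary local prefix for the document's default namespace.
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}
MDSOL = "{http://www.mdsol.com/ns/odm/metadata}"

SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
 xmlns:mdsol="http://www.mdsol.com/ns/odm/metadata">
<ClinicalData MetaDataVersionOID="1772" mdsol:AuditSubCategoryName="Activated">
<SubjectData SubjectKey="7735fd9c-1792-457c-aa58-0ca26ecdc810" mdsol:SubjectName="ACC-SUBJ-3">
<SiteRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<StudyEventData StudyEventOID="FV" StudyEventRepeatKey="VIST[1]/FV[1]">
<AuditRecord>
<DateTimeStamp>2019-07-10T07:56:54</DateTimeStamp>
</AuditRecord>
</StudyEventData>
</SubjectData>
</ClinicalData>
</ODM>"""


def audit_rows(odm):
    """Return one dict per AuditRecord found under each StudyEventData."""
    rows = []
    for clinical in odm.findall("odm:ClinicalData", NS):
        for subject in clinical.findall("odm:SubjectData", NS):
            site = subject.find("odm:SiteRef", NS)
            for event in subject.findall("odm:StudyEventData", NS):
                for audit in event.findall("odm:AuditRecord", NS):
                    rows.append({
                        "MetaDataVersionOID": clinical.get("MetaDataVersionOID"),
                        # Attribute keys need the namespace URI spelled out.
                        "AuditSubCategoryName": clinical.get(MDSOL + "AuditSubCategoryName"),
                        "SubjectKey": subject.get("SubjectKey"),
                        "SubjectName": subject.get(MDSOL + "SubjectName"),
                        "LocationOID": site.get("LocationOID") if site is not None else None,
                        "StudyEventRepeatKey": event.get("StudyEventRepeatKey"),
                        "DateTimeStamp": audit.findtext("odm:DateTimeStamp", namespaces=NS),
                    })
    return rows


rows = audit_rows(ET.fromstring(SAMPLE))
print(rows[0])
```

Collecting dictionaries in a list like this also makes the last step straightforward, since `pd.DataFrame(rows)` accepts a list of dicts.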
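I think you can use BeautifulSoup for parsing XML. Note that the "lxml" parser used here is an HTML parser and lowercases tag and attribute names, which is why the lookups below call .lower():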
from bs4 import BeautifulSoup
temp ="""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
xmlns:mdsol="http://www.mdsol.com/ns/odm/metadata"
CreationDateTime="2019-08-23T12:59:09" FileOID="3b2b4161-fad8-4239-9c83-03d0e62624dd" FileType="Transactional" ODMVersion="1.3">
<ClinicalData MetaDataVersionOID="1772" StudyOID="0ACC SP3 MAPPING1(DEV)" mdsol:AuditSubCategoryName="Activated">
<SubjectData SubjectKey="7735fd9c-1792-457c-aa58-0ca26ecdc810" mdsol:SubjectKeyType="SubjectUUID" mdsol:SubjectName="ACC-SUBJ-3">
<SiteRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<StudyEventData StudyEventOID="FV" StudyEventRepeatKey="VIST[1]/FV[1]" mdsol:InstanceId="2960580">
<AuditRecord>
<UserRef UserOID="systemuser"/>
<LocationRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<DateTimeStamp>2019-07-10T07:56:54</DateTimeStamp>
<ReasonForChange>Update</ReasonForChange>
<SourceID>394263772</SourceID>
</AuditRecord>
</StudyEventData>
</SubjectData>
</ClinicalData>
</ODM>"""
temp = BeautifulSoup(temp, "lxml")
ClinicalData = temp.find('ClinicalData'.lower())
SubjectData = ClinicalData.find_all('SubjectData'.lower())
LocationOID = None
for i in SubjectData:
    SiteRef = i.find('SiteRef'.lower())
    LocationOID = SiteRef.attrs['locationoid']
    print('LocationOID', LocationOID)
Output:
LocationOID 0ACCSP3MAPPING1SITE1
@Justin
I applied your suggestions and it worked, until I broke it.
Input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" xmlns:mdsol="http://www.mdsol.com/ns/odm/metadata" CreationDateTime="2019-08-23T12:59:09" FileOID="3b2b4161-fad8-4239-9c83-03d0e62624dd" FileType="Transactional" ODMVersion="1.3">
<ClinicalData MetaDataVersionOID="2965" StudyOID="0ACC SP3 MAPPING1(DEV)" mdsol:AuditSubCategoryName="Entered">
<SubjectData SubjectKey="481e4653-693c-4e15-8762-d8a66c0d2cf1" mdsol:SubjectKeyType="SubjectUUID" mdsol:SubjectName="ACC-SUBJ-1">
<SiteRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<StudyEventData StudyEventOID="FV" StudyEventRepeatKey="VIST[1]/FV[1]" mdsol:InstanceId="2960564">
<FormData FormOID="VS" FormRepeatKey="1" mdsol:DataPageId="15331229">
<ItemGroupData ItemGroupOID="VS" mdsol:RecordId="17928808">
<ItemData ItemOID="VS.WT" TransactionType="Upsert" Value="45">
<AuditRecord>
<UserRef UserOID="alscrave2"/>
<LocationRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<DateTimeStamp>2018-02-02T09:39:30</DateTimeStamp>
<ReasonForChange/>
<SourceID>122841525</SourceID>
</AuditRecord>
<MeasurementUnitRef MeasurementUnitOID="1761.Weight.1"/>
</ItemData>
</ItemGroupData>
</FormData>
</StudyEventData>
</SubjectData>
</ClinicalData>
<ClinicalData MetaDataVersionOID="2965" StudyOID="0ACC SP3 MAPPING1(DEV)" mdsol:AuditSubCategoryName="Entered">
<SubjectData SubjectKey="481e4653-693c-4e15-8762-d8a66c0d2cf1" mdsol:SubjectKeyType="SubjectUUID" mdsol:SubjectName="ACC-SUBJ-1">
<SiteRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<StudyEventData StudyEventOID="FV" StudyEventRepeatKey="VIST[1]/FV[1]" mdsol:InstanceId="2960564">
<FormData FormOID="VS" FormRepeatKey="1" mdsol:DataPageId="15331229">
<ItemGroupData ItemGroupOID="VS" mdsol:RecordId="17928809">
<ItemData ItemOID="VS.WT" TransactionType="Upsert" Value="46">
<AuditRecord>
<UserRef UserOID="alscrave2"/>
<LocationRef LocationOID="0ACCSP3MAPPING1SITE1"/>
<DateTimeStamp>2018-02-02T09:39:30</DateTimeStamp>
<ReasonForChange/>
<SourceID>122841525</SourceID>
</AuditRecord>
<MeasurementUnitRef MeasurementUnitOID="1761.Weight.1"/>
</ItemData>
</ItemGroupData>
</FormData>
</StudyEventData>
</SubjectData>
</ClinicalData>
</ODM>
Code:
import xml.etree.ElementTree as ET
import pandas as pd

def getvalueofnode(node):
    """ return node text or None """
    return node.text if node is not None else None

tree = ET.parse("data.xml")
ODM = tree.getroot()
xmlns = "{http://www.cdisc.org/ns/odm/v1.3}"
mdsol = "{http://www.mdsol.com/ns/odm/metadata}"

def data_reader():
    dfcols = ['CreationDateTime','StudyOID','MetaDataVersionOID','SubjectName','SUBJECTUUID','LocationOID','StudyEventOID',
              'StudyEventRepeatKey','FormOID','FormRepeatKey','DataPageId','ItemgroupOID','RecordId','var_name','Value',
              'DateTimeStamp','ASC_Name','Measurement_Unit','SourceID','UserOID','InstanceId']
    df_xml = pd.DataFrame(columns=dfcols)
    CreationDateTime = ODM.attrib.get('CreationDateTime')
    for ClinicalData in ODM:
        StudyOID = ClinicalData.attrib.get('StudyOID')
        MetaDataVersionOID = ClinicalData.attrib.get('MetaDataVersionOID')
        ASC_Name = ClinicalData.attrib.get('{0}AuditSubCategoryName'.format(mdsol))
        for SubjectData in ClinicalData:
            SubjectName = SubjectData.attrib.get('{0}SubjectName'.format(mdsol))
            SUBJECTUUID = SubjectData.attrib.get('SubjectKey')
            LocationOID = SubjectData.find('{0}SiteRef'.format(xmlns)).attrib.get('LocationOID')
            for StudyEventData in SubjectData:
                StudyEventOID = StudyEventData.attrib.get('StudyEventOID')
                StudyEventRepeatKey = StudyEventData.attrib.get('StudyEventRepeatKey')
                InstanceId = StudyEventData.attrib.get('{0}InstanceId'.format(mdsol))
                for FormData in StudyEventData:
                    FormOID = FormData.attrib.get('FormOID')
                    FormRepeatKey = FormData.attrib.get('FormRepeatKey')
                    DataPageId = FormData.attrib.get('{0}DataPageId'.format(mdsol))
                    for ItemGroupData in FormData:
                        ItemgroupOID = ItemGroupData.attrib.get('ItemgroupOID')
                        RecordId = ItemGroupData.attrib.get('{0}RecordId'.format(mdsol))
                        for ItemData in ItemGroupData:
                            var_name = ItemData.attrib.get('ItemOID')
                            Value = ItemData.attrib.get('Value')
                            Measurement_Unit = ItemData.find('MeasurementUnitRef'.format(xmlns)).attrib.get('MeasurementUnitOID')
                            for AuditRecord in ItemData:
                                DateTimeStamp = AuditRecord.find('{0}DateTimeStamp'.format(xmlns)).text
                                SourceID = AuditRecord.find('{0}SourceID'.format(xmlns)).text
                                UserOID = ItemData.find('{0}UserRef'.format(xmlns)).attrib.get('UserOID')
                                df_xml = df_xml.append(
                                    pd.Series([CreationDateTime,StudyOID,MetaDataVersionOID,SubjectName,
                                               SUBJECTUUID,LocationOID,StudyEventOID,
                                               StudyEventRepeatKey,FormOID,FormRepeatKey,DataPageId,ItemgroupOID,
                                               RecordId,var_name,Value,DateTimeStamp,ASC_Name,Measurement_Unit,
                                               SourceID,UserOID,InstanceId], index=dfcols),
                                    ignore_index=True)
    print(df_xml)

data_reader()
Issue: I am getting duplicate records, and the assignments to DateTimeStamp, SourceID, UserOID and Measurement_Unit raise run-time errors.
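Reading the code against the sample input, a few things look like plausible causes. `for AuditRecord in ItemData` visits every child of ItemData, including MeasurementUnitRef, so one row is produced per child and `.find(...)` returns None on the child that has no DateTimeStamp. `'MeasurementUnitRef'.format(xmlns)` contains no `{0}` placeholder, so the namespace is never added and `find` returns None. UserRef is a child of AuditRecord, not of ItemData. And the attribute is spelled ItemGroupOID in the XML, not ItemgroupOID. The sketch below is one way to address these points on a cut-down copy of the input. The `child_attr` and `item_rows` helpers and the shortened `SAMPLE` are illustrative, and only some of the columns are shown.

```python
import xml.etree.ElementTree as ET

ODM_NS = "{http://www.cdisc.org/ns/odm/v1.3}"
MDSOL_NS = "{http://www.mdsol.com/ns/odm/metadata}"

SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
 xmlns:mdsol="http://www.mdsol.com/ns/odm/metadata">
<ClinicalData MetaDataVersionOID="2965">
<SubjectData SubjectKey="481e4653-693c-4e15-8762-d8a66c0d2cf1">
<StudyEventData StudyEventOID="FV">
<FormData FormOID="VS">
<ItemGroupData ItemGroupOID="VS" mdsol:RecordId="17928808">
<ItemData ItemOID="VS.WT" TransactionType="Upsert" Value="45">
<AuditRecord>
<UserRef UserOID="alscrave2"/>
<DateTimeStamp>2018-02-02T09:39:30</DateTimeStamp>
<SourceID>122841525</SourceID>
</AuditRecord>
<MeasurementUnitRef MeasurementUnitOID="1761.Weight.1"/>
</ItemData>
</ItemGroupData>
</FormData>
</StudyEventData>
</SubjectData>
</ClinicalData>
</ODM>"""


def child_attr(parent, tag, attr):
    """Attribute of the first matching direct child, or None if absent."""
    if parent is None:
        return None
    child = parent.find(ODM_NS + tag)
    return child.get(attr) if child is not None else None


def item_rows(odm):
    """Build exactly one row per ItemData element."""
    rows = []
    for group in odm.iter(ODM_NS + "ItemGroupData"):
        for item in group.findall(ODM_NS + "ItemData"):
            # Look the audit record up by name instead of looping over
            # every child of ItemData; it may legitimately be missing.
            audit = item.find(ODM_NS + "AuditRecord")
            rows.append({
                "ItemGroupOID": group.get("ItemGroupOID"),
                "RecordId": group.get(MDSOL_NS + "RecordId"),
                "var_name": item.get("ItemOID"),
                "Value": item.get("Value"),
                "Measurement_Unit": child_attr(item, "MeasurementUnitRef", "MeasurementUnitOID"),
                "UserOID": child_attr(audit, "UserRef", "UserOID"),
                "DateTimeStamp": audit.findtext(ODM_NS + "DateTimeStamp") if audit is not None else None,
                "SourceID": audit.findtext(ODM_NS + "SourceID") if audit is not None else None,
            })
    return rows


rows = item_rows(ET.fromstring(SAMPLE))
print(rows)
```

Building the DataFrame once at the end with `pd.DataFrame(rows)` also avoids calling `DataFrame.append` inside the loop, which was deprecated and then removed in pandas 2.0.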
I want to remove some elements from an XML file, based on values looked up from a variable.
Here is my.xml:
<?xml version='1.0' encoding='UTF-8'?>
<ArrayOfSalesOrderHeader xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SalesOrderHeader>
<TenantCode>15152343</TenantCode>
<SalesOrderDetails>
<SalesOrderDetail>
<ItemCode>20072129</ItemCode>
</SalesOrderDetail>
<SalesOrderDetail>
<ItemCode>67332054</ItemCode>
</SalesOrderDetail>
<SalesOrderDetail>
<ItemCode>20206133</ItemCode>
</SalesOrderDetail>
<SalesOrderDetail>
<ItemCode>62071796</ItemCode>
</SalesOrderDetail>
</SalesOrderDetails>
</SalesOrderHeader>
</ArrayOfSalesOrderHeader>
This is my script:
from lxml import etree as ET

doc = ET.parse("my.xml")
arrDat = '20206133'
fol = doc.xpath('.//SalesOrderDetail[descendant::ItemCode[not(contains(text(),"' + arrDat + '"))]]')
for SOD in fol:
    SOD.getparent().remove(SOD)
doc.write('output.xml', xml_declaration=True, encoding='utf-8', method="xml")
The problem appears when I define arrDat as a list:
doc = ET.parse("my.xml")
arrDat = ['20072129', '67332054']
cnt = 0
while cnt < len(arrDat):
    fol = doc.xpath('.//SalesOrderDetail[descendant::ItemCode[not(contains(text(),"' + arrDat[cnt] + '"))]]')
    for SOD in fol:
        SOD.getparent().remove(SOD)
    doc.write('output.xml', xml_declaration=True, encoding='utf-8', method="xml")
    cnt += 1
I need output.xml to look like this:
<?xml version='1.0' encoding='UTF-8'?>
<ArrayOfSalesOrderHeader xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SalesOrderHeader>
<TenantCode>15152343</TenantCode>
<SalesOrderDetails>
<SalesOrderDetail>
<ItemCode>20072129</ItemCode>
</SalesOrderDetail>
<SalesOrderDetail>
<ItemCode>67332054</ItemCode>
</SalesOrderDetail>
</SalesOrderDetails>
</SalesOrderHeader>
</ArrayOfSalesOrderHeader>
I think you can simply check each ItemCode value and remove the SalesOrderDetail nodes whose code is not in your list. Here is the implementation:
from lxml import etree as ET

doc = ET.parse("data1.xml")
arrDat = ['20072129', '67332054']
for order in doc.xpath("//SalesOrderDetail"):
    item = order.xpath('ItemCode')
    item_code = item[0].text
    if item_code not in arrDat:
        order.getparent().remove(order)
doc.write('output.xml', xml_declaration=True, encoding='utf-8', method="xml")
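As far as I can tell, the while loop in the question fails because each pass removes everything that does not match the current value: the first pass (for 20072129) already deletes 67332054, and the second pass then deletes the one remaining element. Testing membership against the whole list in a single pass avoids that. For anyone without lxml installed, the same idea can be sketched with the standard library, where there is no `getparent()`, so the loop starts from the parent instead. The `keep_only` helper and the inline `SAMPLE` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ArrayOfSalesOrderHeader>
<SalesOrderHeader>
<TenantCode>15152343</TenantCode>
<SalesOrderDetails>
<SalesOrderDetail><ItemCode>20072129</ItemCode></SalesOrderDetail>
<SalesOrderDetail><ItemCode>67332054</ItemCode></SalesOrderDetail>
<SalesOrderDetail><ItemCode>20206133</ItemCode></SalesOrderDetail>
<SalesOrderDetail><ItemCode>62071796</ItemCode></SalesOrderDetail>
</SalesOrderDetails>
</SalesOrderHeader>
</ArrayOfSalesOrderHeader>"""


def keep_only(root, wanted):
    """Remove every SalesOrderDetail whose ItemCode is not in wanted."""
    wanted = set(wanted)
    # Materialise the parents first so the tree is not mutated mid-iteration.
    for parent in list(root.iter("SalesOrderDetails")):
        # findall returns a list, so removing while looping over it is safe.
        for detail in parent.findall("SalesOrderDetail"):
            if detail.findtext("ItemCode") not in wanted:
                parent.remove(detail)


root = ET.fromstring(SAMPLE)
keep_only(root, ["20072129", "67332054"])
remaining = [code.text for code in root.iter("ItemCode")]
print(remaining)
```

To write the result to disk, `ET.ElementTree(root).write('output.xml', xml_declaration=True, encoding='utf-8')` should work with the standard library as well.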
I'm using Python and I need to find sku, min-order-quantity and step-quantity for each occurrence of a product.
The input file is:
<product sku="1235997403">
<sku>1235997403</sku>
<name xml:lang="fr-FR">Huile pour entretien des destructeurs de documents HSM</name>
<short-description xml:lang="fr-FR">Flacon 250 ml. Colis de 1 flacon.</short-description>
<category-links>
<category-link name="20319647o.rjpf_20320074o.rjpf" domain="RAJA-FR-WEB-0092-21" default = "1" hotdeal = "0"/>
</category-links>
<online>1</online>
<quantity unit="pcs">
<min-order-quantity>1</min-order-quantity>
<step-quantity>1</step-quantity>
</quantity>
....
</product>
....
I tried to use lxml but failed to get min-order-quantity and step-quantity:
from lxml import etree
tree = etree.parse('./ST2CleanCourt.xml')
elem = tree.getroot()
for child in elem:
    print (child.attrib["sku"])
I tried the two solutions below. They work, but I need to read the XML from a file, so I wrote:
from lxml import etree
import codecs
f=codecs.open('./ST2CleanCourt.xml','r','utf-8')
fichier = f.read()
tree = etree.fromstring(fichier)
for child in tree:
    print ('sku:', child.attrib['sku'])
    print ('min:', child.find('.//min-order-quantity').text)
and I always get this error:
    print ('min:', child.find('.//min-order-quantity').text)
AttributeError: 'NoneType' object has no attribute 'text'
What is wrong?
You can use the xpath method to get the required values.
Example:
from lxml import etree
a = """<product sku="1235997403">
<sku>1235997403</sku>
<name xml:lang="fr-FR">Huile pour entretien des destructeurs de documents HSM</name>
<short-description xml:lang="fr-FR">Flacon 250 ml. Colis de 1 flacon.</short-description>
<category-links>
<category-link name="20319647o.rjpf_20320074o.rjpf" domain="RAJA-FR-WEB-0092-21" default = "1" hotdeal = "0"/>
</category-links>
<online>1</online>
<quantity unit="pcs">
<min-order-quantity>1</min-order-quantity>
<step-quantity>1</step-quantity>
</quantity>
</product>
"""
tree = etree.fromstring(a)
tags = tree.xpath('/product')
for b in tags:
    print(b.attrib["sku"])
    # The leading "./" keeps the search relative to the current product
    min_order = b.xpath("./quantity/min-order-quantity")
    print(min_order[0].text)
    step_quantity = b.xpath("./quantity/step-quantity")
    print(step_quantity[0].text)
Output:
1235997403
1
1
With more than one product under a products root node, you can do this:
x = """
<products>
<product sku="1235997403">
<sku>1235997403</sku>
<name xml:lang="fr-FR">Huile pour entretien des destructeurs de documents HSM</name>
<short-description xml:lang="fr-FR">Flacon 250 ml. Colis de 1 flacon.</short-description>
<category-links>
<category-link name="20319647o.rjpf_20320074o.rjpf" domain="RAJA-FR-WEB-0092-21" default = "1" hotdeal = "0"/>
</category-links>
<online>1</online>
<quantity unit="pcs">
<min-order-quantity>1</min-order-quantity>
<step-quantity>1</step-quantity>
</quantity>
</product>
<product sku="997403">
<sku>1235997403</sku>
<name xml:lang="fr-FR">Huile pour entretien des destructeurs de documents HSM</name>
<short-description xml:lang="fr-FR">Flacon 250 ml. Colis de 1 flacon.</short-description>
<category-links>
<category-link name="20319647o.rjpf_20320074o.rjpf" domain="RAJA-FR-WEB-0092-21" default = "1" hotdeal = "0"/>
</category-links>
<online>1</online>
<quantity unit="pcs">
<min-order-quantity>5</min-order-quantity>
<step-quantity>7</step-quantity>
</quantity>
</product>
</products>
"""
from lxml import etree
tree = etree.fromstring(x)
for child in tree:
    print ("sku:", child.attrib["sku"])
    print ("min:", child.find(".//min-order-quantity").text) # looks for node below
    print ("step:" ,child.find(".//step-quantity").text) # child with the given name
Essentially, find() returns the first node below child that has the given name, and you print its text.
Output:
sku: 1235997403
min: 1
step: 1
sku: 997403
min: 5
step: 7
Docs: http://lxml.de/tutorial.html#elementpath
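About the AttributeError in the question: it means find() returned None for at least one child, i.e. that child has no min-order-quantity descendant. Assuming that is what happens in the real file (a guess, since only a fragment is shown), one way to tolerate it is findtext(), which returns None instead of raising. The sketch below uses the standard library's xml.etree.ElementTree; the second, empty product and the quantities helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second product has no <quantity> block at all.
XML = """<products>
  <product sku="1235997403">
    <quantity unit="pcs">
      <min-order-quantity>1</min-order-quantity>
      <step-quantity>1</step-quantity>
    </quantity>
  </product>
  <product sku="997403"/>
</products>"""


def quantities(root):
    """Collect sku, min and step per product; missing values come back as None."""
    rows = []
    for product in root.iter("product"):
        rows.append({
            "sku": product.get("sku"),
            # findtext() returns None when no matching element exists
            "min": product.findtext(".//min-order-quantity"),
            "step": product.findtext(".//step-quantity"),
        })
    return rows


rows = quantities(ET.fromstring(XML))
for row in rows:
    print(row)
```

Iterating with root.iter("product") also skips any children of the root that are not product elements, which is another possible source of the None.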
A similar question was asked here (Python XML Parsing), but I could not get to the content I am interested in.
I need to extract all the information enclosed in each patent-classification element whose classification-scheme child has the scheme value CPC. There are multiple such elements, and they are all enclosed in the patent-classifications tag.
In the example given below, there are three such values: C 07 K 16 22 I, A 61 K 2039 505 A and C 07 K 2317 92 A
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="/3.0/style/exchange.xsl"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">
<ops:meta name="elapsed-time" value="21"/>
<exchange-documents>
<exchange-document system="ops.epo.org" family-id="39103486" country="US" doc-number="2009234106" kind="A1">
<bibliographic-data>
<publication-reference>
<document-id document-id-type="docdb">
<country>US</country>
<doc-number>2009234106</doc-number>
<kind>A1</kind>
<date>20090917</date>
</document-id>
<document-id document-id-type="epodoc">
<doc-number>US2009234106</doc-number>
<date>20090917</date>
</document-id>
</publication-reference>
<classifications-ipcr>
<classification-ipcr sequence="1">
<text>C07K 16/ 44 A I </text>
</classification-ipcr>
</classifications-ipcr>
<patent-classifications>
<patent-classification sequence="1">
<classification-scheme office="" scheme="CPC"/>
<section>C</section>
<class>07</class>
<subclass>K</subclass>
<main-group>16</main-group>
<subgroup>22</subgroup>
<classification-value>I</classification-value>
</patent-classification>
<patent-classification sequence="2">
<classification-scheme office="" scheme="CPC"/>
<section>A</section>
<class>61</class>
<subclass>K</subclass>
<main-group>2039</main-group>
<subgroup>505</subgroup>
<classification-value>A</classification-value>
</patent-classification>
<patent-classification sequence="7">
<classification-scheme office="" scheme="CPC"/>
<section>C</section>
<class>07</class>
<subclass>K</subclass>
<main-group>2317</main-group>
<subgroup>92</subgroup>
<classification-value>A</classification-value>
</patent-classification>
<patent-classification sequence="1">
<classification-scheme office="US" scheme="UC"/>
<classification-symbol>530/387.9</classification-symbol>
</patent-classification>
</patent-classifications>
</bibliographic-data>
</exchange-document>
</exchange-documents>
</ops:world-patent-data>
Install BeautifulSoup if you don't have it:
$ pip install beautifulsoup4
Try this:
from bs4 import BeautifulSoup
with open('example.xml', 'rb') as f:
    xml = f.read()
# the "xml" parser requires lxml to be installed
bs = BeautifulSoup(xml, 'xml')
# find patent-classification
patents = bs.find_all('patent-classification')
# filter the ones with CPC
for pa in patents:
    if pa.find('classification-scheme', {'scheme': 'CPC'}):
        print(pa.get_text())
You can use Python's standard xml module:
import xml.etree.ElementTree as ET
root = ET.parse('a.xml').getroot()
# select the parent (..) of every classification-scheme element with scheme="CPC"
for node in root.iterfind(".//{http://www.epo.org/exchange}classification-scheme[@scheme='CPC']/.."):
    data = []
    for d in node:  # iterate over the child elements
        if d.text:
            data.append(d.text)
    print(' '.join(data))
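The same idea can be wrapped in a function that returns the codes instead of printing them. This is only a sketch: the cpc_codes helper, the ex prefix and the trimmed-down inline XML are made up for illustration, and the namespace URI is the default namespace from the sample document.

```python
import xml.etree.ElementTree as ET

# Prefix chosen here for convenience; it maps to the document's default namespace.
NS = {"ex": "http://www.epo.org/exchange"}

# Trimmed-down sample: one CPC classification and one non-CPC classification.
XML = """<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
  <patent-classifications>
    <patent-classification sequence="1">
      <classification-scheme office="" scheme="CPC"/>
      <section>C</section><class>07</class><subclass>K</subclass>
      <main-group>16</main-group><subgroup>22</subgroup>
      <classification-value>I</classification-value>
    </patent-classification>
    <patent-classification sequence="1">
      <classification-scheme office="US" scheme="UC"/>
      <classification-symbol>530/387.9</classification-symbol>
    </patent-classification>
  </patent-classifications>
</ops:world-patent-data>"""


def cpc_codes(root):
    """Return one space-joined string per CPC patent-classification element."""
    codes = []
    for pc in root.iterfind(".//ex:patent-classification", NS):
        scheme = pc.find("ex:classification-scheme", NS)
        if scheme is None or scheme.get("scheme") != "CPC":
            continue  # skip classifications from other schemes
        # Keep the non-empty text of each child, in document order.
        parts = [c.text.strip() for c in pc if c.text and c.text.strip()]
        codes.append(" ".join(parts))
    return codes


codes = cpc_codes(ET.fromstring(XML))
print(codes)
```

Passing a prefix map to iterfind() and find() avoids repeating the full {http://www.epo.org/exchange} URI in every path.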