I have an XML file that I would like to parse into a table (a pandas DataFrame).
Below is a sample of the XML file containing just two of the records.
<?xml version="1.0" encoding="UTF-8"?>
<file>
<C13_335010X321A1_837Y6>
<BHT_BeginningOfHierarchicalTransaction>
<BHT01__HierarchicalStructureCode>0011</BHT01__HierarchicalStructureCode>
<BHT02__TransactionSetPurposeCode>00</BHT02__TransactionSetPurposeCode>
<BHT03__OriginatorApplicationTransactionIdentifier>513513TR</BHT03__OriginatorApplicationTransactionIdentifier>
<BHT04__TransactionSetCreationDate>20200212</BHT04__TransactionSetCreationDate>
<BHT05__TransactionSetCreationTime>1287</BHT05__TransactionSetCreationTime>
<BHT06__ClaimOrEncounterIdentifier>DD</BHT06__ClaimOrEncounterIdentifier>
</BHT_BeginningOfHierarchicalTransaction>
<Loop_1000A>
<NM1_SubmitterName_1000A>
<NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>9</NM102__EntityTypeQualifier>
<NM103__SubmitterLastOrOrganizationName>AAA</NM103__SubmitterLastOrOrganizationName>
<NM108__IdentificationCodeQualifier>22</NM108__IdentificationCodeQualifier>
<NM109__SubmitterIdentifier>55555500</NM109__SubmitterIdentifier>
</NM1_SubmitterName_1000A>
<PER_SubmitterEDIContactInformation_1000A>
<PER01__ContactFunctionCode>LK</PER01__ContactFunctionCode>
<PER02__SubmitterContactName>John Smith</PER02__SubmitterContactName>
<PER03__CommunicationNumberQualifier>WW</PER03__CommunicationNumberQualifier>
<PER04__CommunicationNumber>2132220011</PER04__CommunicationNumber>
<PER05__CommunicationNumberQualifier>DD</PER05__CommunicationNumberQualifier>
<PER06__CommunicationNumber>DD_2#GMAIL.COM</PER06__CommunicationNumber>
</PER_SubmitterEDIContactInformation_1000A>
</Loop_1000A>
<Loop_1000B>
<NM1_ReceiverName_1000B>
<NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>0</NM102__EntityTypeQualifier>
<NM103__ReceiverName>AAA</NM103__ReceiverName>
<NM108__IdentificationCodeQualifier>32</NM108__IdentificationCodeQualifier>
<NM109__ReceiverPrimaryIdentifier>2514521</NM109__ReceiverPrimaryIdentifier>
</NM1_ReceiverName_1000B>
</Loop_1000B>
<Loop_2000A>
<HL_BillingProviderHierarchicalLevel_2000A>
<HL01__HierarchicalIDNumber>32</HL01__HierarchicalIDNumber>
<HL03__HierarchicalLevelCode>54</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>32</HL04__HierarchicalChildCode>
</HL_BillingProviderHierarchicalLevel_2000A>
<Loop_2010AA>
<NM1_BillingProviderName_2010AA>
<NM101__EntityIdentifierCode>54</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>21</NM102__EntityTypeQualifier>
<NM103__BillingProviderLastOrOrganizationalName>AAA</NM103__BillingProviderLastOrOrganizationalName>
<NM108__IdentificationCodeQualifier>XX</NM108__IdentificationCodeQualifier>
<NM109__BillingProviderIdentifier>515151325</NM109__BillingProviderIdentifier>
</NM1_BillingProviderName_2010AA>
<N3_BillingProviderAddress_2010AA>
<N301__BillingProviderAddressLine>214 SS STREET</N301__BillingProviderAddressLine>
</N3_BillingProviderAddress_2010AA>
<N4_BillingProviderCityStateZIPCode_2010AA>
<N401__BillingProviderCityName>LA</N401__BillingProviderCityName>
<N402__BillingProviderStateOrProvinceCode>CA</N402__BillingProviderStateOrProvinceCode>
<N403__BillingProviderPostalZoneOrZIPCode>93500</N403__BillingProviderPostalZoneOrZIPCode>
</N4_BillingProviderCityStateZIPCode_2010AA>
<REF_BillingProviderTaxIdentification_2010AA>
<REF01__ReferenceIdentificationQualifier>OI</REF01__ReferenceIdentificationQualifier>
<REF02__BillingProviderTaxIdentificationNumber>5135151315</REF02__BillingProviderTaxIdentificationNumber>
</REF_BillingProviderTaxIdentification_2010AA>
</Loop_2010AA>
<Loop_2000B>
<HL_SubscriberHierarchicalLevel_2000B>
<HL01__HierarchicalIDNumber>5</HL01__HierarchicalIDNumber>
<HL02__HierarchicalParentIDNumber>5</HL02__HierarchicalParentIDNumber>
<HL03__HierarchicalLevelCode>55</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>5</HL04__HierarchicalChildCode>
</HL_SubscriberHierarchicalLevel_2000B>
<SBR_SubscriberInformation_2000B>
<SBR01__PayerResponsibilitySequenceNumberCode>L</SBR01__PayerResponsibilitySequenceNumberCode>
<SBR02__IndividualRelationshipCode>32</SBR02__IndividualRelationshipCode>
<SBR03__SubscriberGroupOrPolicyNumber>252525Z125</SBR03__SubscriberGroupOrPolicyNumber>
<SBR09__ClaimFilingIndicatorCode>NM</SBR09__ClaimFilingIndicatorCode>
</SBR_SubscriberInformation_2000B>
<Loop_2010BA>
<NM1_SubscriberName_2010BA>
<NM101__EntityIdentifierCode>DCX</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>5</NM102__EntityTypeQualifier>
<NM103__SubscriberLastName>SMITH</NM103__SubscriberLastName>
<NM104__SubscriberFirstName>JOHN</NM104__SubscriberFirstName>
<NM108__IdentificationCodeQualifier>CA</NM108__IdentificationCodeQualifier>
<NM109__SubscriberPrimaryIdentifier>3656361.</NM109__SubscriberPrimaryIdentifier>
</NM1_SubscriberName_2010BA>
<N3_SubscriberAddress_2010BA>
<N301__SubscriberAddressLine>111 STREET</N301__SubscriberAddressLine>
</N3_SubscriberAddress_2010BA>
<N4_SubscriberCityStateZIPCode_2010BA>
<N401__SubscriberCityName>LA</N401__SubscriberCityName>
<N402__SubscriberStateCode>CA</N402__SubscriberStateCode>
<N403__SubscriberPostalZoneOrZIPCode>93000</N403__SubscriberPostalZoneOrZIPCode>
</N4_SubscriberCityStateZIPCode_2010BA>
<DMG_SubscriberDemographicInformation_2010BA>
<DMG01__DateTimePeriodFormatQualifier>K5</DMG01__DateTimePeriodFormatQualifier>
<DMG02__SubscriberBirthDate>19851010</DMG02__SubscriberBirthDate>
<DMG03__SubscriberGenderCode>U</DMG03__SubscriberGenderCode>
</DMG_SubscriberDemographicInformation_2010BA>
</Loop_2010BA>
<Loop_2010BB>
<NM1_PayerName_2010BB>
<NM101__EntityIdentifierCode>FF</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>3</NM102__EntityTypeQualifier>
<NM103__PayerName>AAA</NM103__PayerName>
<NM108__IdentificationCodeQualifier>GF</NM108__IdentificationCodeQualifier>
<NM109__PayerIdentifier>32514</NM109__PayerIdentifier>
</NM1_PayerName_2010BB>
</Loop_2010BB>
<Loop_2300>
<CLM_ClaimInformation_2300>
<CLM01__PatientControlNumber>5413</CLM01__PatientControlNumber>
<CLM02__TotalClaimChargeAmount>651</CLM02__TotalClaimChargeAmount>
<CLM05_HealthCareServiceLocationInformation_2300>
<CLM05_01_PlaceOfServiceCode>13</CLM05_01_PlaceOfServiceCode>
<CLM05_02_FacilityCodeQualifier>D</CLM05_02_FacilityCodeQualifier>
<CLM05_03_ClaimFrequencyCode>3</CLM05_03_ClaimFrequencyCode>
</CLM05_HealthCareServiceLocationInformation_2300>
<CLM06__ProviderOrSupplierSignatureIndicator>N</CLM06__ProviderOrSupplierSignatureIndicator>
<CLM07__AssignmentOrPlanParticipationCode>R</CLM07__AssignmentOrPlanParticipationCode>
<CLM08__BenefitsAssignmentCertificationIndicator>N</CLM08__BenefitsAssignmentCertificationIndicator>
<CLM09__ReleaseOfInformationCode>N</CLM09__ReleaseOfInformationCode>
<CLM10__PatientSignatureSourceCode>X</CLM10__PatientSignatureSourceCode>
</CLM_ClaimInformation_2300>
<REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<REF01__ReferenceIdentificationQualifier>J1</REF01__ReferenceIdentificationQualifier>
<REF02__ValueAddedNetworkTraceNumber>FVC2514543254</REF02__ValueAddedNetworkTraceNumber>
</REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<HI_HealthCareDiagnosisCode_2300>
<HI01_HealthCareCodeInformation_2300>
<HI01_01_DiagnosisTypeCode>CCC</HI01_01_DiagnosisTypeCode>
<HI01_02_DiagnosisCode>N111</HI01_02_DiagnosisCode>
</HI01_HealthCareCodeInformation_2300>
</HI_HealthCareDiagnosisCode_2300>
<Loop_2310B>
<NM1_RenderingProviderName_2310B>
<NM101__EntityIdentifierCode>32</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>2</NM102__EntityTypeQualifier>
<NM103__RenderingProviderLastOrOrganizationName>JOHN</NM103__RenderingProviderLastOrOrganizationName>
<NM104__RenderingProviderFirstName>SMITH</NM104__RenderingProviderFirstName>
<NM108__IdentificationCodeQualifier>TT</NM108__IdentificationCodeQualifier>
<NM109__RenderingProviderIdentifier>25431251</NM109__RenderingProviderIdentifier>
</NM1_RenderingProviderName_2310B>
<PRV_RenderingProviderSpecialtyInformation_2310B>
<PRV01__ProviderCode>TR</PRV01__ProviderCode>
<PRV02__ReferenceIdentificationQualifier>VFD</PRV02__ReferenceIdentificationQualifier>
<PRV03__ProviderTaxonomyCode>135454353L</PRV03__ProviderTaxonomyCode>
</PRV_RenderingProviderSpecialtyInformation_2310B>
</Loop_2310B>
<Loop_2400>
<LX_ServiceLineNumber_2400>
<LX01__AssignedNumber>2</LX01__AssignedNumber>
</LX_ServiceLineNumber_2400>
<SV1_ProfessionalService_2400>
<SV101_CompositeMedicalProcedureIdentifier_2400>
<SV101_01_ProductOrServiceIDQualifier>EE</SV101_01_ProductOrServiceIDQualifier>
<SV101_02_ProcedureCode>99999</SV101_02_ProcedureCode>
<SV101_07_Description>BLOOD</SV101_07_Description>
</SV101_CompositeMedicalProcedureIdentifier_2400>
<SV102__LineItemChargeAmount>200</SV102__LineItemChargeAmount>
<SV103__UnitOrBasisForMeasurementCode>PP</SV103__UnitOrBasisForMeasurementCode>
<SV104__ServiceUnitCount>3.5</SV104__ServiceUnitCount>
<SV107_CompositeDiagnosisCodePointer_2400>
<SV107_01_DiagnosisCodePointer>2</SV107_01_DiagnosisCodePointer>
</SV107_CompositeDiagnosisCodePointer_2400>
</SV1_ProfessionalService_2400>
<DTP_DateServiceDate_2400>
<DTP01__DateTimeQualifier>654</DTP01__DateTimeQualifier>
<DTP02__DateTimePeriodFormatQualifier>U8</DTP02__DateTimePeriodFormatQualifier>
<DTP03__ServiceDate>20191010</DTP03__ServiceDate>
</DTP_DateServiceDate_2400>
<REF_LineItemControlNumber_2400>
<REF01__ReferenceIdentificationQualifier>5F</REF01__ReferenceIdentificationQualifier>
<REF02__LineItemControlNumber>DDD.32.123</REF02__LineItemControlNumber>
</REF_LineItemControlNumber_2400>
</Loop_2400>
</Loop_2300>
</Loop_2000B>
</Loop_2000A>
</C13_335010X321A1_837Y6>
<C13_335010X321A1_837Y6>
<BHT_BeginningOfHierarchicalTransaction>
<BHT01__HierarchicalStructureCode>0011</BHT01__HierarchicalStructureCode>
<BHT02__TransactionSetPurposeCode>00</BHT02__TransactionSetPurposeCode>
<BHT03__OriginatorApplicationTransactionIdentifier>513513TR</BHT03__OriginatorApplicationTransactionIdentifier>
<BHT04__TransactionSetCreationDate>20200212</BHT04__TransactionSetCreationDate>
<BHT05__TransactionSetCreationTime>1287</BHT05__TransactionSetCreationTime>
<BHT06__ClaimOrEncounterIdentifier>DD</BHT06__ClaimOrEncounterIdentifier>
</BHT_BeginningOfHierarchicalTransaction>
<Loop_1000A>
<NM1_SubmitterName_1000A>
<NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>9</NM102__EntityTypeQualifier>
<NM103__SubmitterLastOrOrganizationName>AAA</NM103__SubmitterLastOrOrganizationName>
<NM108__IdentificationCodeQualifier>22</NM108__IdentificationCodeQualifier>
<NM109__SubmitterIdentifier>55555500</NM109__SubmitterIdentifier>
</NM1_SubmitterName_1000A>
<PER_SubmitterEDIContactInformation_1000A>
<PER01__ContactFunctionCode>LK</PER01__ContactFunctionCode>
<PER02__SubmitterContactName>John Smith</PER02__SubmitterContactName>
<PER03__CommunicationNumberQualifier>WW</PER03__CommunicationNumberQualifier>
<PER04__CommunicationNumber>2132220011</PER04__CommunicationNumber>
<PER05__CommunicationNumberQualifier>DD</PER05__CommunicationNumberQualifier>
<PER06__CommunicationNumber>DD_2#GMAIL.COM</PER06__CommunicationNumber>
</PER_SubmitterEDIContactInformation_1000A>
</Loop_1000A>
<Loop_1000B>
<NM1_ReceiverName_1000B>
<NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>0</NM102__EntityTypeQualifier>
<NM103__ReceiverName>AAA</NM103__ReceiverName>
<NM108__IdentificationCodeQualifier>32</NM108__IdentificationCodeQualifier>
<NM109__ReceiverPrimaryIdentifier>2514521</NM109__ReceiverPrimaryIdentifier>
</NM1_ReceiverName_1000B>
</Loop_1000B>
<Loop_2000A>
<HL_BillingProviderHierarchicalLevel_2000A>
<HL01__HierarchicalIDNumber>32</HL01__HierarchicalIDNumber>
<HL03__HierarchicalLevelCode>54</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>32</HL04__HierarchicalChildCode>
</HL_BillingProviderHierarchicalLevel_2000A>
<Loop_2010AA>
<NM1_BillingProviderName_2010AA>
<NM101__EntityIdentifierCode>54</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>21</NM102__EntityTypeQualifier>
<NM103__BillingProviderLastOrOrganizationalName>AAA</NM103__BillingProviderLastOrOrganizationalName>
<NM108__IdentificationCodeQualifier>XX</NM108__IdentificationCodeQualifier>
<NM109__BillingProviderIdentifier>515151325</NM109__BillingProviderIdentifier>
</NM1_BillingProviderName_2010AA>
<N3_BillingProviderAddress_2010AA>
<N301__BillingProviderAddressLine>214 SS STREET</N301__BillingProviderAddressLine>
</N3_BillingProviderAddress_2010AA>
<N4_BillingProviderCityStateZIPCode_2010AA>
<N401__BillingProviderCityName>LA</N401__BillingProviderCityName>
<N402__BillingProviderStateOrProvinceCode>CA</N402__BillingProviderStateOrProvinceCode>
<N403__BillingProviderPostalZoneOrZIPCode>93500</N403__BillingProviderPostalZoneOrZIPCode>
</N4_BillingProviderCityStateZIPCode_2010AA>
<REF_BillingProviderTaxIdentification_2010AA>
<REF01__ReferenceIdentificationQualifier>OI</REF01__ReferenceIdentificationQualifier>
<REF02__BillingProviderTaxIdentificationNumber>5135151315</REF02__BillingProviderTaxIdentificationNumber>
</REF_BillingProviderTaxIdentification_2010AA>
</Loop_2010AA>
<Loop_2000B>
<HL_SubscriberHierarchicalLevel_2000B>
<HL01__HierarchicalIDNumber>5</HL01__HierarchicalIDNumber>
<HL02__HierarchicalParentIDNumber>5</HL02__HierarchicalParentIDNumber>
<HL03__HierarchicalLevelCode>55</HL03__HierarchicalLevelCode>
<HL04__HierarchicalChildCode>5</HL04__HierarchicalChildCode>
</HL_SubscriberHierarchicalLevel_2000B>
<SBR_SubscriberInformation_2000B>
<SBR01__PayerResponsibilitySequenceNumberCode>L</SBR01__PayerResponsibilitySequenceNumberCode>
<SBR02__IndividualRelationshipCode>32</SBR02__IndividualRelationshipCode>
<SBR03__SubscriberGroupOrPolicyNumber>252525Z125</SBR03__SubscriberGroupOrPolicyNumber>
<SBR09__ClaimFilingIndicatorCode>NM</SBR09__ClaimFilingIndicatorCode>
</SBR_SubscriberInformation_2000B>
<Loop_2010BA>
<NM1_SubscriberName_2010BA>
<NM101__EntityIdentifierCode>DCX</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>5</NM102__EntityTypeQualifier>
<NM103__SubscriberLastName>SMITH</NM103__SubscriberLastName>
<NM104__SubscriberFirstName>JOHN</NM104__SubscriberFirstName>
<NM108__IdentificationCodeQualifier>CA</NM108__IdentificationCodeQualifier>
<NM109__SubscriberPrimaryIdentifier>3656361.</NM109__SubscriberPrimaryIdentifier>
</NM1_SubscriberName_2010BA>
<N3_SubscriberAddress_2010BA>
<N301__SubscriberAddressLine>111 STREET</N301__SubscriberAddressLine>
</N3_SubscriberAddress_2010BA>
<N4_SubscriberCityStateZIPCode_2010BA>
<N401__SubscriberCityName>LA</N401__SubscriberCityName>
<N402__SubscriberStateCode>CA</N402__SubscriberStateCode>
<N403__SubscriberPostalZoneOrZIPCode>93000</N403__SubscriberPostalZoneOrZIPCode>
</N4_SubscriberCityStateZIPCode_2010BA>
<DMG_SubscriberDemographicInformation_2010BA>
<DMG01__DateTimePeriodFormatQualifier>K5</DMG01__DateTimePeriodFormatQualifier>
<DMG02__SubscriberBirthDate>19851010</DMG02__SubscriberBirthDate>
<DMG03__SubscriberGenderCode>U</DMG03__SubscriberGenderCode>
</DMG_SubscriberDemographicInformation_2010BA>
</Loop_2010BA>
<Loop_2010BB>
<NM1_PayerName_2010BB>
<NM101__EntityIdentifierCode>FF</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>3</NM102__EntityTypeQualifier>
<NM103__PayerName>AAA</NM103__PayerName>
<NM108__IdentificationCodeQualifier>GF</NM108__IdentificationCodeQualifier>
<NM109__PayerIdentifier>32514</NM109__PayerIdentifier>
</NM1_PayerName_2010BB>
</Loop_2010BB>
<Loop_2300>
<CLM_ClaimInformation_2300>
<CLM01__PatientControlNumber>5413</CLM01__PatientControlNumber>
<CLM02__TotalClaimChargeAmount>651</CLM02__TotalClaimChargeAmount>
<CLM05_HealthCareServiceLocationInformation_2300>
<CLM05_01_PlaceOfServiceCode>13</CLM05_01_PlaceOfServiceCode>
<CLM05_02_FacilityCodeQualifier>D</CLM05_02_FacilityCodeQualifier>
<CLM05_03_ClaimFrequencyCode>3</CLM05_03_ClaimFrequencyCode>
</CLM05_HealthCareServiceLocationInformation_2300>
<CLM06__ProviderOrSupplierSignatureIndicator>N</CLM06__ProviderOrSupplierSignatureIndicator>
<CLM07__AssignmentOrPlanParticipationCode>R</CLM07__AssignmentOrPlanParticipationCode>
<CLM08__BenefitsAssignmentCertificationIndicator>N</CLM08__BenefitsAssignmentCertificationIndicator>
<CLM09__ReleaseOfInformationCode>N</CLM09__ReleaseOfInformationCode>
<CLM10__PatientSignatureSourceCode>X</CLM10__PatientSignatureSourceCode>
</CLM_ClaimInformation_2300>
<REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<REF01__ReferenceIdentificationQualifier>J1</REF01__ReferenceIdentificationQualifier>
<REF02__ValueAddedNetworkTraceNumber>FVC2514543254</REF02__ValueAddedNetworkTraceNumber>
</REF_ClaimIdentifierForTransmissionIntermediaries_2300>
<HI_HealthCareDiagnosisCode_2300>
<HI01_HealthCareCodeInformation_2300>
<HI01_01_DiagnosisTypeCode>CCC</HI01_01_DiagnosisTypeCode>
<HI01_02_DiagnosisCode>N111</HI01_02_DiagnosisCode>
</HI01_HealthCareCodeInformation_2300>
</HI_HealthCareDiagnosisCode_2300>
<Loop_2310B>
<NM1_RenderingProviderName_2310B>
<NM101__EntityIdentifierCode>32</NM101__EntityIdentifierCode>
<NM102__EntityTypeQualifier>2</NM102__EntityTypeQualifier>
<NM103__RenderingProviderLastOrOrganizationName>JOHN</NM103__RenderingProviderLastOrOrganizationName>
<NM104__RenderingProviderFirstName>SMITH</NM104__RenderingProviderFirstName>
<NM108__IdentificationCodeQualifier>TT</NM108__IdentificationCodeQualifier>
<NM109__RenderingProviderIdentifier>25431251</NM109__RenderingProviderIdentifier>
</NM1_RenderingProviderName_2310B>
<PRV_RenderingProviderSpecialtyInformation_2310B>
<PRV01__ProviderCode>TR</PRV01__ProviderCode>
<PRV02__ReferenceIdentificationQualifier>VFD</PRV02__ReferenceIdentificationQualifier>
<PRV03__ProviderTaxonomyCode>135454353L</PRV03__ProviderTaxonomyCode>
</PRV_RenderingProviderSpecialtyInformation_2310B>
</Loop_2310B>
<Loop_2400>
<LX_ServiceLineNumber_2400>
<LX01__AssignedNumber>2</LX01__AssignedNumber>
</LX_ServiceLineNumber_2400>
<SV1_ProfessionalService_2400>
<SV101_CompositeMedicalProcedureIdentifier_2400>
<SV101_01_ProductOrServiceIDQualifier>EE</SV101_01_ProductOrServiceIDQualifier>
<SV101_02_ProcedureCode>99999</SV101_02_ProcedureCode>
<SV101_07_Description>BLOOD</SV101_07_Description>
</SV101_CompositeMedicalProcedureIdentifier_2400>
<SV102__LineItemChargeAmount>200</SV102__LineItemChargeAmount>
<SV103__UnitOrBasisForMeasurementCode>PP</SV103__UnitOrBasisForMeasurementCode>
<SV104__ServiceUnitCount>3.5</SV104__ServiceUnitCount>
<SV107_CompositeDiagnosisCodePointer_2400>
<SV107_01_DiagnosisCodePointer>2</SV107_01_DiagnosisCodePointer>
</SV107_CompositeDiagnosisCodePointer_2400>
</SV1_ProfessionalService_2400>
<DTP_DateServiceDate_2400>
<DTP01__DateTimeQualifier>654</DTP01__DateTimeQualifier>
<DTP02__DateTimePeriodFormatQualifier>U8</DTP02__DateTimePeriodFormatQualifier>
<DTP03__ServiceDate>20191010</DTP03__ServiceDate>
</DTP_DateServiceDate_2400>
<REF_LineItemControlNumber_2400>
<REF01__ReferenceIdentificationQualifier>5F</REF01__ReferenceIdentificationQualifier>
<REF02__LineItemControlNumber>DDD.32.123</REF02__LineItemControlNumber>
</REF_LineItemControlNumber_2400>
</Loop_2400>
</Loop_2300>
</Loop_2000B>
</Loop_2000A>
</C13_335010X321A1_837Y6>
</file>
These should end up as two rows. I am using the following Python code to convert the file into a pandas DataFrame, but I get an empty DataFrame.
import pandas as pd
import xml.etree.ElementTree as et

def xml_file(file):
    columns = file.attrib
    for xml in file.iter('C13_335010X321A1_837Y6'):
        file_dict = columns.copy()
        file_dict.update(xml.attrib)
        yield file_dict

tree = et.parse(r"C:\Users\Desktop\test1.xml")
root = tree.getroot()
df = pd.DataFrame(list(xml_file(root)))
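A likely cause of the empty result: `xml.attrib` holds XML attributes, and none of the elements in the sample carries any, because the values are stored as the text of the leaf elements. Below is a minimal sketch of one way to get one row per record. It assumes that column names built from the path of tags are acceptable and that no path repeats within a record (a repeated path would overwrite the earlier value). The helper name `record_to_row` and the trimmed sample are illustrative, not part of the original post:

```python
import xml.etree.ElementTree as et

# Trimmed, hypothetical stand-in for the file in the question (two records).
SAMPLE = """<file>
  <C13_335010X321A1_837Y6>
    <BHT_BeginningOfHierarchicalTransaction>
      <BHT01__HierarchicalStructureCode>0011</BHT01__HierarchicalStructureCode>
    </BHT_BeginningOfHierarchicalTransaction>
    <Loop_1000A>
      <NM1_SubmitterName_1000A>
        <NM101__EntityIdentifierCode>27</NM101__EntityIdentifierCode>
      </NM1_SubmitterName_1000A>
    </Loop_1000A>
  </C13_335010X321A1_837Y6>
  <C13_335010X321A1_837Y6>
    <BHT_BeginningOfHierarchicalTransaction>
      <BHT01__HierarchicalStructureCode>0011</BHT01__HierarchicalStructureCode>
    </BHT_BeginningOfHierarchicalTransaction>
    <Loop_1000A>
      <NM1_SubmitterName_1000A>
        <NM101__EntityIdentifierCode>21</NM101__EntityIdentifierCode>
      </NM1_SubmitterName_1000A>
    </Loop_1000A>
  </C13_335010X321A1_837Y6>
</file>"""


def record_to_row(record):
    """Flatten one record into {path of tags: leaf text}."""
    row = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            # Leaf element: its text is the value we want.
            row[path] = (elem.text or "").strip()
        for child in children:
            walk(child, f"{path}/{child.tag}")

    # Start below the record element so its own tag is not part of the path.
    for child in record:
        walk(child, child.tag)
    return row


root = et.fromstring(SAMPLE)
rows = [record_to_row(rec) for rec in root.iter("C13_335010X321A1_837Y6")]
print(len(rows))
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` then gives one row per `C13_335010X321A1_837Y6` element. For the real file, `et.fromstring(SAMPLE)` would be replaced by `et.parse(path).getroot()`.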
I have folders full of XML files which I want to parse to a dataframe. The following functions iterate through an XML tree recursively and return a dataframe with three columns: path, attributes and text.
import pandas as pd
from lxml import etree

def XML2DF(filename, df1, MAX_DEPTH=20):
    # Read bytes: lxml rejects str input that carries an encoding declaration
    with open(filename, "rb") as f:
        xml_bytes = f.read()
    tree = etree.fromstring(xml_bytes)
    df1 = recursive_parseXML2DF(tree, df1, MAX_DEPTH=MAX_DEPTH)
    return df1

def recursive_parseXML2DF(element, df1, depth=0, MAX_DEPTH=20):
    if depth > MAX_DEPTH:
        return df1
    df2 = pd.DataFrame([[element.getroottree().getpath(element), element.attrib, element.text]],
                       columns=["path", "attrib", "text"])
    df1 = pd.concat([df1, df2])
    # Iterating an element yields its children (getchildren() is deprecated)
    for child in element:
        df1 = recursive_parseXML2DF(child, df1, depth=depth + 1, MAX_DEPTH=MAX_DEPTH)
    return df1
The code for the function was adapted from this post.
Most of the time the function works fine and returns the entire path, but for some documents the returned paths look like this:
/*/*[1]/*[3]
/*/*[1]/*[3]/*[1]
The text tag entry remains valid and correct.
The only difference I can make out between the documents with working paths and those with wildcard paths is that the XML tags are written in all caps.
Working example:
<?xml version="1.0" encoding="utf-8"?>
<root>
<Header>
<ReceivingApplication>ReceivingApplication</ReceivingApplication>
<SendingApplication>SendingApplication</SendingApplication>
<MessageControlID>12345</MessageControlID>
<ReceivingApplication>ReceivingApplication</ReceivingApplication>
<FileCreationDate>2000-01-01T00:00:00</FileCreationDate>
</Header>
<Einsendung>
<Patient>
<PatientName>Name</PatientName>
<PatientVorname>FirstName</PatientVorname>
<PatientGebDat>2000-01-01T00:00:00</PatientGebDat>
<PatientSex>4</PatientSex>
<PatientPWID>123456</PatientPWID>
</Patient>
<Visit>
<VisitNumber>A2000.0001</VisitNumber>
<PatientPLZ>1234</PatientPLZ>
<PatientOrt>PatientOrt</PatientOrt>
<PatientAdr2>
</PatientAdr2>
<PatientStrasse>PatientStrasse 01</PatientStrasse>
<VisitEinsID>1234</VisitEinsID>
<VisitBefund>VisitBefund</VisitBefund>
<Befunddatum>2000-01-01T00:00:00</Befunddatum>
</Visit>
</Einsendung>
</root>
Example that produces the wildcard paths:
<?xml version="1.0"?>
<KRSCHWEIZ xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="krSCHWEIZ">
<KEY_VS>abcdefg</KEY_VS>
<KEY_KLR>abcdefg</KEY_KLR>
<ABSENDER>
<ABSENDER_MELDER_ID>123456</ABSENDER_MELDER_ID>
<MELDER>
<MELDER_ID>123456</MELDER_ID>
<QUELLSYSTEM>ABCDEF</QUELLSYSTEM>
<PATIENT>
<REFERENZNR>987654</REFERENZNR>
<NACHNAME>my name</NACHNAME>
<VORNAMEN>my first name</VORNAMEN>
<GEBURTSNAME />
<GEBURTSDATUM>my dob</GEBURTSDATUM>
<GESCHLECHT>XX</GESCHLECHT>
<PLZ>9999</PLZ>
<WOHNORT>Mycity</WOHNORT>
<STRASSE>mystreet</STRASSE>
<HAUSNR>99</HAUSNR>
<VERSICHERTENNR>999999999</VERSICHERTENNR>
<DATEIEN>
<DATEI>
<DATEINAME>my_attached_document.html</DATEINAME>
<DATEIBASE64>mybase_64_encoded_document</DATEIBASE64>
</DATEI>
</DATEIEN>
</PATIENT>
</MELDER>
</ABSENDER>
</KRSCHWEIZ>
How do I get correct, explicit path information for this case as well?
The presence of namespaces changes the output of .getpath(). You can use .getelementpath() instead, which includes the namespace (in {uri}tag form) instead of using wildcards.
If the namespaces should be discarded completely, you can strip them from the tags before using .getpath():
import lxml.etree
import pandas as pd

rows = []
tree = lxml.etree.parse("broken.xml")
for node in tree.iter():
    try:
        node.tag = lxml.etree.QName(node).localname
    except ValueError:
        # skip tags with no name
        continue
    rows.append([tree.getpath(node), node.attrib, node.text])
df = pd.DataFrame(rows, columns=["path", "attrib", "text"])
Resulting dataframe:
>>> df
path attrib text
0 /KRSCHWEIZ [] \n
1 /KRSCHWEIZ/KEY_VS [] abcdefg
2 /KRSCHWEIZ/KEY_KLR [] abcdefg
3 /KRSCHWEIZ/ABSENDER [] \n
4 /KRSCHWEIZ/ABSENDER/ABSENDER_MELDER_ID [] 123456
5 /KRSCHWEIZ/ABSENDER/MELDER [] \n
6 /KRSCHWEIZ/ABSENDER/MELDER/MELDER_ID [] 123456
7 /KRSCHWEIZ/ABSENDER/MELDER/QUELLSYSTEM [] ABCDEF
8 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT [] \n
9 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/REFERENZNR [] 987654
10 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/NACHNAME [] my name
11 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/VORNAMEN [] my first name
12 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/GEBURTSNAME [] None
13 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/GEBURTSDATUM [] my dob
14 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/GESCHLECHT [] XX
15 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/PLZ [] 9999
16 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/WOHNORT [] Mycity
17 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/STRASSE [] mystreet
18 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/HAUSNR [] 99
19 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/VERSICHERTENNR [] 999999999
20 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/DATEIEN [] \n
21 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/DATEIEN/DATEI [] \n
22 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/DATEIEN/DAT... [] my_attached_document.html
23 /KRSCHWEIZ/ABSENDER/MELDER/PATIENT/DATEIEN/DAT... [] mybase_64_encoded_document
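To illustrate what the namespace does to the tags, here is a sketch that uses only the standard library's `xml.etree.ElementTree` (a swapped-in technique, not the lxml calls above). ElementTree stores a namespaced tag as `{uri}localname`; the hypothetical helpers `local_name` and `local_paths` drop the `{uri}` part and join the local names into a path. Unlike `.getpath()`, this sketch adds no positional indexes, so repeated siblings share the same path:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the namespaced document in the question.
XML = """<KRSCHWEIZ xmlns="krSCHWEIZ">
  <KEY_VS>abcdefg</KEY_VS>
  <ABSENDER>
    <MELDER>
      <MELDER_ID>123456</MELDER_ID>
    </MELDER>
  </ABSENDER>
</KRSCHWEIZ>"""


def local_name(tag):
    """Strip a leading '{uri}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def local_paths(elem, parent_path=""):
    """Yield (path, text) for elem and all descendants, namespaces removed."""
    path = f"{parent_path}/{local_name(elem.tag)}"
    yield path, elem.text
    for child in elem:
        yield from local_paths(child, path)


root = ET.fromstring(XML)
print(root.tag)  # tag as stored by ElementTree: {krSCHWEIZ}KRSCHWEIZ
rows = list(local_paths(root))
for path, text in rows:
    print(path, repr(text))
```

The `rows` list has the same (path, text) shape as the lxml version and can be passed to `pd.DataFrame(rows, columns=["path", "text"])`.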
I have a text file and would like to extract <gml:pos>73664.300 836542.700</gml:pos> from it. More precisely, I would like to get the GPS coordinates [73664.300 836542.700] from the pos tag. The file contains multiple <wfs:member> elements, and each of them has a <gml:pos> (deepest layer).
<?xml version='1.0' encoding='UTF-8'?>
<wfs:FeatureCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wfs/2.0 http://schemas.opengis.net/wfs/2.0/wfs.xsd http://www.opengis.net/gml/3.2 http://schemas.opengis.net/gml/3.2.1/gml.xsd http://www.deegree.org/app https://web.de/feature_descr?SERVICE=WFS&amp;VERSION=2.0.0&amp;REQUEST=DescribeFeatureType&amp;OUTPUTFORMAT=application%2Fgml%2Bxml%3B+version%3D3.2&amp;TYPENAME=app:lsa_data&amp;NAMESPACES=xmlns(app,http%3A%2F%2Fwww.deegree.org%2Fapp)" xmlns:wfs="http://www.opengis.net/wfs/2.0" timeStamp="2020-11-18T15:01:17Z" xmlns:gml="http://www.opengis.net/gml/3.2" numberMatched="unknown" numberReturned="0">
<!--NOTE: numberReturned attribute should be 'unknown' as well, but this would not validate against the current version of the WFS 2.0 schema (change upcoming). See change request (CR 144): https://portal.opengeospatial.org/files?fact_id=6798.-->
<wfs:member>
<app:dat_set xmlns:app="http://www.deegree.org/app" gml:id="app:dat_set_1">
<app:point>2</app:point>
<app:art>K </app:art>
<app:L_Name>westt / woustest </app:L_Name>
<app:geom>
<!--Inlined geometry 'data_1_APP_GEOM'-->
<gml:MultiPoint gml:id="data_1_APP_GEOM" srsName="EPSG:25832">
<gml:pointMember>
<gml:Point gml:id="GEOMETRY_ad608059-f297-4554-8464-cdde248cb531" srsName="EPSG:25832">
<gml:pos>73664.300 836542.700</gml:pos>
</gml:Point>
</gml:pointMember>
</gml:MultiPoint>
</app:geom>
</app:dat_set>
</wfs:member>
<wfs:member>
<app:dat_set xmlns:app="http://www.deegree.org/app" gml:id="app:dat_set_2">
<app:point>3</app:point>
<app:art>K </app:art>
<app:L_Name>route / riztr </app:L_Name>
<app:geom>
<!--Inlined geometry 'data_2_APP_GEOM'-->
<gml:MultiPoint gml:id="data_2_APP_GEOM" srsName="EPSG:25832">
<gml:pointMember>
<gml:Point gml:id="GEOMETRY_440d8630-b674-4768-a5b7-3fab46d9ac8c" srsName="EPSG:25832">
<gml:pos>74354.900 837456.300</gml:pos>
</gml:Point>
</gml:pointMember>
</gml:MultiPoint>
</app:geom>
</app:dat_set>
</wfs:member>
<wfs:member>
...
...
How could I get those GPS coordinates?
Thank you in advance.
You can use lxml and XPath.
data = b'''\
<?xml version='1.0' encoding='UTF-8'?>
<wfs:FeatureCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wfs/2.0 http://schemas.opengis.net/wfs/2.0/wfs.xsd http://www.opengis.net/gml/3.2 http://schemas.opengis.net/gml/3.2.1/gml.xsd http://www.deegree.org/app https://web.de/feature_descr?SERVICE=WFS&amp;VERSION=2.0.0&amp;REQUEST=DescribeFeatureType&amp;OUTPUTFORMAT=application%2Fgml%2Bxml%3B+version%3D3.2&amp;TYPENAME=app:lsa_data&amp;NAMESPACES=xmlns(app,http%3A%2F%2Fwww.deegree.org%2Fapp)" xmlns:wfs="http://www.opengis.net/wfs/2.0" timeStamp="2020-11-18T15:01:17Z" xmlns:gml="http://www.opengis.net/gml/3.2" numberMatched="unknown" numberReturned="0">
<!--NOTE: numberReturned attribute should be 'unknown' as well, but this would not validate against the current version of the WFS 2.0 schema (change upcoming). See change request (CR 144): https://portal.opengeospatial.org/files?fact_id=6798.-->
<wfs:member>
<app:dat_set xmlns:app="http://www.deegree.org/app" gml:id="app:dat_set_1">
<app:point>2</app:point>
<app:art>K </app:art>
<app:L_Name>westt / woustest </app:L_Name>
<app:geom>
<!--Inlined geometry 'data_1_APP_GEOM'-->
<gml:MultiPoint gml:id="data_1_APP_GEOM" srsName="EPSG:25832">
<gml:pointMember>
<gml:Point gml:id="GEOMETRY_ad608059-f297-4554-8464-cdde248cb531" srsName="EPSG:25832">
<gml:pos>73664.300 836542.700</gml:pos>
</gml:Point>
</gml:pointMember>
</gml:MultiPoint>
</app:geom>
</app:dat_set>
</wfs:member>
<wfs:member>
<app:dat_set xmlns:app="http://www.deegree.org/app" gml:id="app:dat_set_2">
<app:point>3</app:point>
<app:art>K </app:art>
<app:L_Name>route / riztr </app:L_Name>
<app:geom>
<!--Inlined geometry 'data_2_APP_GEOM'-->
<gml:MultiPoint gml:id="data_2_APP_GEOM" srsName="EPSG:25832">
<gml:pointMember>
<gml:Point gml:id="GEOMETRY_440d8630-b674-4768-a5b7-3fab46d9ac8c" srsName="EPSG:25832">
<gml:pos>74354.900 837456.300</gml:pos>
</gml:Point>
</gml:pointMember>
</gml:MultiPoint>
</app:geom>
</app:dat_set>
</wfs:member>
</wfs:FeatureCollection>
'''
from lxml import etree
from io import BytesIO

f = BytesIO(data)
ns = {"gml": "http://www.opengis.net/gml/3.2"}
tree = etree.parse(f)
# ".//" searches all descendants; a bare "//" makes lxml emit a FutureWarning
for e in tree.findall(".//gml:pos", ns):
    print(e.text)
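If the coordinates are needed as numbers rather than text, each `gml:pos` string can be split and converted. The following is a minimal sketch using the standard library's `xml.etree.ElementTree` on a trimmed sample. The variable names are illustrative, and it assumes every `gml:pos` holds exactly two whitespace-separated values:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the feature collection in the question.
XML = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0"
                       xmlns:gml="http://www.opengis.net/gml/3.2">
  <wfs:member>
    <gml:Point gml:id="p1"><gml:pos>73664.300 836542.700</gml:pos></gml:Point>
  </wfs:member>
  <wfs:member>
    <gml:Point gml:id="p2"><gml:pos>74354.900 837456.300</gml:pos></gml:Point>
  </wfs:member>
</wfs:FeatureCollection>"""

# The prefix used in the search is mapped to the namespace URI here.
ns = {"gml": "http://www.opengis.net/gml/3.2"}
root = ET.fromstring(XML)

coords = []
for pos in root.findall(".//gml:pos", ns):
    # "73664.300 836542.700" -> (73664.3, 836542.7)
    x, y = (float(v) for v in pos.text.split())
    coords.append((x, y))

print(coords)
```

The same loop body works unchanged on the elements returned by the lxml `findall` call above.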
I'm trying to analyze an XML file with Python. I need to get the XML data as a pandas DataFrame.
import pandas as pd
import xml.etree.ElementTree as et

def parse_XML(xml_file, df_cols):
    xtree = et.parse(xml_file)
    xroot = xtree.getroot()
    rows = []
    for node in xroot:
        res = []
        res.append(node.attrib.get(df_cols[0]))
        for el in df_cols[1:]:
            if node is not None and node.find(el) is not None:
                res.append(node.find(el).text)
            else:
                res.append(None)
        rows.append({df_cols[i]: res[i]
                     for i, _ in enumerate(df_cols)})
    out_df = pd.DataFrame(rows, columns=df_cols)
    return out_df

parse_XML('/Users/newuser/Desktop/TESTRATP/arrets.xml', ["Name","gml"])
But I'm getting the data frame below.
Name gml
0 None None
1 None None
2 None None
My XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<PublicationDelivery version="1.09:FR-NETEX_ARRET-2.1-1.0" xmlns="http://www.netex.org.uk/netex" xmlns:core="http://www.govtalk.gov.uk/core" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:ifopt="http://www.ifopt.org.uk/ifopt" xmlns:siri="http://www.siri.org.uk/siri" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.netex.org.uk/netex">
<PublicationTimestamp>2020-08-05T06:00:01+00:00</PublicationTimestamp>
<ParticipantRef>transport.data.gouv.fr</ParticipantRef>
<dataObjects>
<GeneralFrame id="FR:GeneralFrame:NETEX_ARRET:" version="any">
<members>
<Quay id="FR:Quay:zenbus_StopPoint_SP_351400003_LOC:" version="any">
<Name>ST FELICIEN - Centre</Name>
<Centroid>
<Location>
<gml:pos srsName="EPSG:2154">828054.2068251468 6444393.512041969</gml:pos>
</Location>
</Centroid>
<TransportMode>bus</TransportMode>
</Quay>
<Quay id="FR:Quay:zenbus_StopPoint_SP_361350004_LOC:" version="any">
<Name>ST FELICIEN - Chemin de Juny</Name>
<Centroid>
<Location>
<gml:pos srsName="EPSG:2154">828747.3172982805 6445226.100290826</gml:pos>
</Location>
</Centroid>
<TransportMode>bus</TransportMode>
</Quay>
<Quay id="FR:Quay:zenbus_StopPoint_SP_343500005_LOC:" version="any">
<Name>ST FELICIEN - Darone</Name>
<Centroid>
<Location>
<gml:pos srsName="EPSG:2154">829036.2709757038 6444724.878001894</gml:pos>
</Location>
</Centroid>
<TransportMode>bus</TransportMode>
</Quay>
<Quay id="FR:Quay:zenbus_StopPoint_SP_359440004_LOC:" version="any">
<Name>ST FELICIEN - Col de Fontayes</Name>
<Centroid>
<Location>
<gml:pos srsName="EPSG:2154">829504.7993360173 6445490.57188837</gml:pos>
</Location>
</Centroid>
<TransportMode>bus</TransportMode>
</Quay>
</members>
</GeneralFrame>
</dataObjects>
</PublicationDelivery>
This is only a small part of my XML file; I can't post the full file because it exceeds Stack Overflow's character limit. I'm wondering why I get the output above, and I don't know where my error is. As I'm new to this, could someone please help me? Thank you.
My approach is to avoid XML parsing and switch straight to pandas by using xmlplain to generate JSON from the XML.
import pandas as pd
import xmlplain
with open("so_sample.xml") as f:
    js = xmlplain.xml_to_obj(f, strip_space=True, fold_dict=True)
df1 = pd.json_normalize(js).explode("PublicationDelivery.dataObjects.GeneralFrame.members")
# cleanup column names...
df1 = df1.rename(columns={c: c.replace("PublicationDelivery.", "").replace("dataObjects.GeneralFrame.", "").strip()
                          for c in df1.columns})
# drop spurious columns
df1 = df1.drop(columns=[c for c in df1.columns if c[0]=="#"])
# expand second level of dictionaries
df1 = pd.json_normalize(df1.to_dict(orient="records"))
# cleanup columns from second set of dictionaries
df1 = df1.rename(columns={c:c.replace("members.Quay.", "") for c in df1.columns})
# expand next list and dicts
df1 = pd.json_normalize(df1.explode("Centroid.Location.gml:pos").to_dict(orient="records"))
# there are some NaNs - deal with them
df1["Centroid.Location.gml:pos.#srsName"] = df1["Centroid.Location.gml:pos.#srsName"].ffill()
df1["Centroid.Location.gml:pos"] = df1["Centroid.Location.gml:pos"].bfill()
# de-dup
df1 = df1.groupby("#id", as_index=False).first()
# more columns than requested... for SO output
print(df1.loc[:,["Name", "Centroid.Location.gml:pos.#srsName", "Centroid.Location.gml:pos"]].to_string(index=False))
output
Name Centroid.Location.gml:pos.#srsName Centroid.Location.gml:pos
ST FELICIEN - Darone EPSG:2154 829036.2709757038 6444724.878001894
ST FELICIEN - Centre EPSG:2154 828054.2068251468 6444393.512041969
ST FELICIEN - Col de Fontayes EPSG:2154 829504.7993360173 6445490.57188837
ST FELICIEN - Chemin de Juny EPSG:2154 828747.3172982805 6445226.100290826
Alternative solution using pandas-read-xml
pip install pandas-read-xml
import pandas_read_xml as pdx
from pandas_read_xml import fully_flatten
df = pdx.read_xml(xml, ['PublicationDelivery', 'dataObjects', 'GeneralFrame', 'members']).pipe(fully_flatten)
The list is just the tags that you want to navigate to as the "root".
You may need to clean the column names afterwards.
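For comparison, the same Quay records can probably be pulled out with the standard library alone. The following is only a minimal sketch: it assumes every Quay has a Name and a gml:pos holding two space-separated numbers, as in the sample above. The helper name quay_rows, the prefix n and the trimmed SAMPLE string are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (first two Quay records only).
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex" xmlns:gml="http://www.opengis.net/gml/3.2">
<dataObjects><GeneralFrame id="FR:GeneralFrame:NETEX_ARRET:" version="any"><members>
<Quay id="FR:Quay:zenbus_StopPoint_SP_351400003_LOC:" version="any">
<Name>ST FELICIEN - Centre</Name>
<Centroid><Location><gml:pos srsName="EPSG:2154">828054.2068251468 6444393.512041969</gml:pos></Location></Centroid>
<TransportMode>bus</TransportMode>
</Quay>
<Quay id="FR:Quay:zenbus_StopPoint_SP_361350004_LOC:" version="any">
<Name>ST FELICIEN - Chemin de Juny</Name>
<Centroid><Location><gml:pos srsName="EPSG:2154">828747.3172982805 6445226.100290826</gml:pos></Location></Centroid>
<TransportMode>bus</TransportMode>
</Quay>
</members></GeneralFrame></dataObjects>
</PublicationDelivery>"""

# Prefix -> URI map used only for the searches below ("n" is an arbitrary prefix).
NS = {"n": "http://www.netex.org.uk/netex", "gml": "http://www.opengis.net/gml/3.2"}


def quay_rows(xml_text):
    """Return one dict per Quay element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for quay in root.iterfind(".//n:Quay", NS):
        pos = quay.find("n:Centroid/n:Location/gml:pos", NS)
        x, y = pos.text.split()  # assumes exactly two coordinates
        rows.append({
            "id": quay.get("id"),
            "Name": quay.findtext("n:Name", namespaces=NS),
            "srsName": pos.get("srsName"),
            "x": float(x),
            "y": float(y),
        })
    return rows


rows = quay_rows(SAMPLE)
for row in rows:
    print(row)
```

Passing the resulting list of dicts to pd.DataFrame(rows) should give one row per Quay with these five columns.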
Convert an XML file to a pandas DataFrame in Python
The code sample below works if there is only one node. However, in our use case we don't know how many nodes we will receive in the feed file.
A sample is shown below. How can we parse this into a DataFrame?
<?xml version = '1.0' encoding = 'UTF-8'?>
<EVENT spec="IDL:com/RfcCallEvents:1.0#Z_BAPI_UPDT_SERV_NOTIFICATION">
<eventHeader>
<objectName/>
<objectKey/>
<eventName/>
<eventId/>
</eventHeader>
<TAB_DETAIL_DATA>
<ZNEWFLAG>X</ZNEWFLAG>
<FENUM>2</FENUM>
<BAUTL>661-01727</BAUTL>
<OTEIL/>
<FECOD>KBB</FECOD>
<URCOD>B08</URCOD>
<ZCOMPMDF>A</ZCOMPMDF>
<ZOPREPL/>
<ZWRNCOV>LP</ZWRNCOV>
<ZWRNREF/>
<ZNEWPS>C07XMAAEJCLD</ZNEWPS>
<ZOLDPN/>
<ZOLDPD/>
<ZOLDPS>C07XMAACJCLD</ZOLDPS>
<MAILINFECOD/>
<ZUNITPR/>
<ZNEWPD/>
<ZNEWPN/>
<ZABUSE/>
<ZRPS>S</ZRPS>
<ZEXKGB/>
<ZKGBMM/>
<ZINSTS>000</ZINSTS>
<ZACKBB/>
<ZCHKOVR/>
<ZSNDB/>
<ZNOTAFISCAL/>
<ZCONSGMT/>
<ZPRTCONS/>
<ZZRTNTRNO/>
<ZZRTNCAR/>
<ZZINSPECT/>
<ZZPR_OPT/>
</TAB_DETAIL_DATA>
<TAB_DETAIL_DATA>
<ZNEWFLAG>X</ZNEWFLAG>
<FENUM>1</FENUM>
<BAUTL>661-01727</BAUTL>
<OTEIL/>
<FECOD>KBB</FECOD>
<URCOD>B08</URCOD>
<ZCOMPMDF>A</ZCOMPMDF>
<ZOPREPL/>
<ZWRNCOV>LP</ZWRNCOV>
<ZWRNREF/>
<ZNEWPS>C07XMAAEJCLD</ZNEWPS>
<ZOLDPN/>
<ZOLDPD/>
<ZOLDPS>C07XMAACJCLD</ZOLDPS>
<MAILINFECOD/>
<ZUNITPR/>
<ZNEWPD/>
<ZNEWPN/>
<ZABUSE/>
<ZRPS>S</ZRPS>
<ZEXKGB/>
<ZKGBMM/>
<ZINSTS>000</ZINSTS>
<ZACKBB/>
<ZCHKOVR/>
<ZSNDB/>
<ZNOTAFISCAL/>
<ZCONSGMT/>
<ZPRTCONS/>
<ZZRTNTRNO/>
<ZZRTNCAR/>
<ZZINSPECT/>
<ZZPR_OPT/>
</TAB_DETAIL_DATA>
<TAB_HEADER_DATA>
<QMNUM>030334920069</QMNUM>
<ZGSXREF>CONSUMER</ZGSXREF>
<ZVANTREF>G338005317</ZVANTREF>
<ZSHIPER/>
<ZSHPRNO/>
<ZRVREF/>
<ZTECHID>4HQ2OD6C19</ZTECHID>
<ZADREPAIR/>
<ZZKATR7/>
</TAB_HEADER_DATA>
</EVENT>
I suspect you need to parse the XML data into several DataFrames, e.g. as follows:
import pandas as pd
import xmltodict # install this module first
data = """<?xml version = '1.0' encoding = 'UTF-8'?>
<EVENT spec="IDL:com/RfcCallEvents:1.0#Z_BAPI_UPDT_SERV_NOTIFICATION">
<eventHeader>
<objectName/>
<objectKey/>
<eventName/>
<eventId/>
</eventHeader>
<TAB_DETAIL_DATA>
<ZNEWFLAG>X</ZNEWFLAG>
<FENUM>2</FENUM>
<BAUTL>661-01727</BAUTL>
<OTEIL/>
<FECOD>KBB</FECOD>
<URCOD>B08</URCOD>
<ZCOMPMDF>A</ZCOMPMDF>
<ZOPREPL/>
<ZWRNCOV>LP</ZWRNCOV>
<ZWRNREF/>
<ZNEWPS>C07XMAAEJCLD</ZNEWPS>
<ZOLDPN/>
<ZOLDPD/>
<ZOLDPS>C07XMAACJCLD</ZOLDPS>
<MAILINFECOD/>
<ZUNITPR/>
<ZNEWPD/>
<ZNEWPN/>
<ZABUSE/>
<ZRPS>S</ZRPS>
<ZEXKGB/>
<ZKGBMM/>
<ZINSTS>000</ZINSTS>
<ZACKBB/>
<ZCHKOVR/>
<ZSNDB/>
<ZNOTAFISCAL/>
<ZCONSGMT/>
<ZPRTCONS/>
<ZZRTNTRNO/>
<ZZRTNCAR/>
<ZZINSPECT/>
<ZZPR_OPT/>
</TAB_DETAIL_DATA>
<TAB_DETAIL_DATA>
<ZNEWFLAG>X</ZNEWFLAG>
<FENUM>1</FENUM>
<BAUTL>661-01727</BAUTL>
<OTEIL/>
<FECOD>KBB</FECOD>
<URCOD>B08</URCOD>
<ZCOMPMDF>A</ZCOMPMDF>
<ZOPREPL/>
<ZWRNCOV>LP</ZWRNCOV>
<ZWRNREF/>
<ZNEWPS>C07XMAAEJCLD</ZNEWPS>
<ZOLDPN/>
<ZOLDPD/>
<ZOLDPS>C07XMAACJCLD</ZOLDPS>
<MAILINFECOD/>
<ZUNITPR/>
<ZNEWPD/>
<ZNEWPN/>
<ZABUSE/>
<ZRPS>S</ZRPS>
<ZEXKGB/>
<ZKGBMM/>
<ZINSTS>000</ZINSTS>
<ZACKBB/>
<ZCHKOVR/>
<ZSNDB/>
<ZNOTAFISCAL/>
<ZCONSGMT/>
<ZPRTCONS/>
<ZZRTNTRNO/>
<ZZRTNCAR/>
<ZZINSPECT/>
<ZZPR_OPT/>
</TAB_DETAIL_DATA>
<TAB_HEADER_DATA>
<QMNUM>030334920069</QMNUM>
<ZGSXREF>CONSUMER</ZGSXREF>
<ZVANTREF>G338005317</ZVANTREF>
<ZSHIPER/>
<ZSHPRNO/>
<ZRVREF/>
<ZTECHID>4HQ2OD6C19</ZTECHID>
<ZADREPAIR/>
<ZZKATR7/>
</TAB_HEADER_DATA>
</EVENT>"""
dct = xmltodict.parse(data)
def make_df(name="TAB_DETAIL_DATA", dct=dct):
    df = pd.DataFrame()
    if isinstance(dct['EVENT'][name], list):
        for j in dct['EVENT'][name]:
            _ = pd.DataFrame({'value': [y for x, y in j.items()]}, index=j.keys())
            df = pd.concat([df, _])
    else:
        df = pd.DataFrame({'value': [y for x, y in dct['EVENT'][name].items()]}, index=dct['EVENT'][name].keys())
    return df
Now, you can experiment with the parser:
make_df(name="TAB_HEADER_DATA") # produces single df
make_df(name="TAB_DETAIL_DATA") # concatenates the content of all TAB_DETAIL_DATA sections into a single df
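If one row per repeated node is preferred over one long tag/value column, a standard-library variant along the following lines might work. This is only a sketch: the helper name section_rows and the shortened SAMPLE string are invented for illustration, and empty tags such as OTEIL simply come back as None.

```python
import xml.etree.ElementTree as ET

# Shortened version of the feed from the question (only a few child tags kept).
SAMPLE = """<EVENT spec="IDL:com/RfcCallEvents:1.0#Z_BAPI_UPDT_SERV_NOTIFICATION">
<TAB_DETAIL_DATA><ZNEWFLAG>X</ZNEWFLAG><FENUM>2</FENUM><BAUTL>661-01727</BAUTL><OTEIL/></TAB_DETAIL_DATA>
<TAB_DETAIL_DATA><ZNEWFLAG>X</ZNEWFLAG><FENUM>1</FENUM><BAUTL>661-01727</BAUTL><OTEIL/></TAB_DETAIL_DATA>
<TAB_HEADER_DATA><QMNUM>030334920069</QMNUM><ZGSXREF>CONSUMER</ZGSXREF></TAB_HEADER_DATA>
</EVENT>"""


def section_rows(xml_text, name):
    """Return one dict per direct child of the root called `name` (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # findall returns every matching child, whether there are zero, one or many
    return [{child.tag: child.text for child in node} for node in root.findall(name)]


detail = section_rows(SAMPLE, "TAB_DETAIL_DATA")
header = section_rows(SAMPLE, "TAB_HEADER_DATA")
print(len(detail), len(header))
```

Because findall always returns a list, no special case is needed for a single node; pd.DataFrame(detail) would then hold one row per TAB_DETAIL_DATA section.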
I am trying to merge two XML files using ElementTree module. Following are the XMLs:
a.xml:
<?xml version="1.0"?>
<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersResult>
<NextToken>token</NextToken>
<CreatedBefore>2014-10-07T08:13:11Z</CreatedBefore>
<Orders>
<Order>
<AmazonOrderId>12345</AmazonOrderId>
<SellerOrderId>R12345</SellerOrderId>
<PurchaseDate>2014-10-02T14:40:37Z</PurchaseDate>
<LastUpdateDate>2014-10-03T09:47:02Z</LastUpdateDate>
<OrderStatus>Shipped</OrderStatus>
<FulfillmentChannel>MFN</FulfillmentChannel>
<SalesChannel>Amazon.in</SalesChannel>
<ShipServiceLevel>IN Exp Dom 2</ShipServiceLevel>
<ShippingAddress>
<Name>name</Name>
<AddressLine1>line1</AddressLine1>
<AddressLine2>line2</AddressLine2>
<City>Pune</City>
<StateOrRegion>Maharashtra</StateOrRegion>
<PostalCode>411027</PostalCode>
<CountryCode>IN</CountryCode>
<Phone>123456789</Phone>
</ShippingAddress>
<OrderTotal>
<CurrencyCode>INR</CurrencyCode>
<Amount>520.00</Amount>
</OrderTotal>
<NumberOfItemsShipped>1</NumberOfItemsShipped>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<PaymentExecutionDetail/>
<PaymentMethod>Other</PaymentMethod>
<MarketplaceId>mid</MarketplaceId>
<BuyerEmail>email#buyer.com</BuyerEmail>
<BuyerName>name</BuyerName>
<ShipmentServiceLevelCategory>Expedited</ShipmentServiceLevelCategory>
<ShippedByAmazonTFM>false</ShippedByAmazonTFM>
<TFMShipmentStatus>Delivered</TFMShipmentStatus>
<OrderType>StandardOrder</OrderType>
<EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
<LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
<EarliestDeliveryDate>2014-10-07T18:30:00Z</EarliestDeliveryDate>
<LatestDeliveryDate>2014-10-11T18:29:59Z</LatestDeliveryDate>
</Order>
</Orders>
</ListOrdersResult>
</ListOrdersResponse>
b.xml:
<?xml version="1.0"?>
<ListOrdersByNextTokenResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersByNextTokenResult>
<NextToken>token1</NextToken>
<CreatedBefore>2014-10-07T08:13:11Z</CreatedBefore>
<Orders>
<Order>
<AmazonOrderId>oid1</AmazonOrderId>
<PurchaseDate>2014-10-04T13:37:41Z</PurchaseDate>
<LastUpdateDate>2014-10-06T09:52:21Z</LastUpdateDate>
<OrderStatus>Shipped</OrderStatus>
<FulfillmentChannel>MFN</FulfillmentChannel>
<SalesChannel>Amazon.in</SalesChannel>
<ShipServiceLevel>IN Std Dom 2_50k_cod</ShipServiceLevel>
<ShippingAddress>
<Name>name1</Name>
<AddressLine1>line1-1</AddressLine1>
<AddressLine2>line2-1</AddressLine2>
<City>WADHVANCITY,SURENDRANAGAR</City>
<StateOrRegion>Gujarat</StateOrRegion>
<PostalCode>363035</PostalCode>
<CountryCode>IN</CountryCode>
<Phone>987654321</Phone>
</ShippingAddress>
<OrderTotal>
<CurrencyCode>INR</CurrencyCode>
<Amount>242.00</Amount>
</OrderTotal>
<NumberOfItemsShipped>1</NumberOfItemsShipped>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<PaymentExecutionDetail/>
<PaymentMethod>Other</PaymentMethod>
<MarketplaceId>mid1</MarketplaceId>
<BuyerEmail>email1#buyer.com</BuyerEmail>
<BuyerName>name1</BuyerName>
<ShipmentServiceLevelCategory>Standard</ShipmentServiceLevelCategory>
<ShippedByAmazonTFM>false</ShippedByAmazonTFM>
<TFMShipmentStatus>PendingPickUp</TFMShipmentStatus>
<OrderType>StandardOrder</OrderType>
<EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
<LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
<EarliestDeliveryDate>2014-10-09T18:30:00Z</EarliestDeliveryDate>
<LatestDeliveryDate>2014-10-15T18:29:59Z</LatestDeliveryDate>
</Order>
</Orders>
</ListOrdersByNextTokenResult>
</ListOrdersByNextTokenResponse>
I want to add the elements inside the Orders element of b.xml to those in a.xml.
So, the expected output is:
<?xml version="1.0"?>
<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersResult>
<NextToken>token</NextToken>
<CreatedBefore>2014-10-07T08:13:11Z</CreatedBefore>
<Orders>
<Order>
<AmazonOrderId>12345</AmazonOrderId>
<SellerOrderId>R12345</SellerOrderId>
<PurchaseDate>2014-10-02T14:40:37Z</PurchaseDate>
<LastUpdateDate>2014-10-03T09:47:02Z</LastUpdateDate>
<OrderStatus>Shipped</OrderStatus>
<FulfillmentChannel>MFN</FulfillmentChannel>
<SalesChannel>Amazon.in</SalesChannel>
<ShipServiceLevel>IN Exp Dom 2</ShipServiceLevel>
<ShippingAddress>
<Name>name</Name>
<AddressLine1>line1</AddressLine1>
<AddressLine2>line2</AddressLine2>
<City>Pune</City>
<StateOrRegion>Maharashtra</StateOrRegion>
<PostalCode>411027</PostalCode>
<CountryCode>IN</CountryCode>
<Phone>123456789</Phone>
</ShippingAddress>
<OrderTotal>
<CurrencyCode>INR</CurrencyCode>
<Amount>520.00</Amount>
</OrderTotal>
<NumberOfItemsShipped>1</NumberOfItemsShipped>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<PaymentExecutionDetail/>
<PaymentMethod>Other</PaymentMethod>
<MarketplaceId>mid</MarketplaceId>
<BuyerEmail>email#buyer.com</BuyerEmail>
<BuyerName>name</BuyerName>
<ShipmentServiceLevelCategory>Expedited</ShipmentServiceLevelCategory>
<ShippedByAmazonTFM>false</ShippedByAmazonTFM>
<TFMShipmentStatus>Delivered</TFMShipmentStatus>
<OrderType>StandardOrder</OrderType>
<EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
<LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
<EarliestDeliveryDate>2014-10-07T18:30:00Z</EarliestDeliveryDate>
<LatestDeliveryDate>2014-10-11T18:29:59Z</LatestDeliveryDate>
</Order>
<Order>
<AmazonOrderId>oid1</AmazonOrderId>
<PurchaseDate>2014-10-04T13:37:41Z</PurchaseDate>
<LastUpdateDate>2014-10-06T09:52:21Z</LastUpdateDate>
<OrderStatus>Shipped</OrderStatus>
<FulfillmentChannel>MFN</FulfillmentChannel>
<SalesChannel>Amazon.in</SalesChannel>
<ShipServiceLevel>IN Std Dom 2_50k_cod</ShipServiceLevel>
<ShippingAddress>
<Name>name1</Name>
<AddressLine1>line1-1</AddressLine1>
<AddressLine2>line2-1</AddressLine2>
<City>WADHVANCITY,SURENDRANAGAR</City>
<StateOrRegion>Gujarat</StateOrRegion>
<PostalCode>363035</PostalCode>
<CountryCode>IN</CountryCode>
<Phone>987654321</Phone>
</ShippingAddress>
<OrderTotal>
<CurrencyCode>INR</CurrencyCode>
<Amount>242.00</Amount>
</OrderTotal>
<NumberOfItemsShipped>1</NumberOfItemsShipped>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<PaymentExecutionDetail/>
<PaymentMethod>Other</PaymentMethod>
<MarketplaceId>mid1</MarketplaceId>
<BuyerEmail>email1#buyer.com</BuyerEmail>
<BuyerName>name1</BuyerName>
<ShipmentServiceLevelCategory>Standard</ShipmentServiceLevelCategory>
<ShippedByAmazonTFM>false</ShippedByAmazonTFM>
<TFMShipmentStatus>PendingPickUp</TFMShipmentStatus>
<OrderType>StandardOrder</OrderType>
<EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
<LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
<EarliestDeliveryDate>2014-10-09T18:30:00Z</EarliestDeliveryDate>
<LatestDeliveryDate>2014-10-15T18:29:59Z</LatestDeliveryDate>
</Order>
</Orders>
</ListOrdersResult>
</ListOrdersResponse>
I tried:
import xml.etree.ElementTree as ET
import os
import shlex
import subprocess
tree = ET.parse("a.xml")
root = tree.getroot()
combined_xml = root
namespaces = {'resp': 'https://mws.amazonservices.com/Orders/2013-09-01'}
results = combined_xml.find("resp:ListOrdersResult", namespaces=namespaces)
insertion_point = results.find("resp:Orders", namespaces=namespaces)
tree1 = ET.parse("b.xml")
root1 = tree1.getroot()
results1 = root1.find("resp:ListOrdersByNextTokenResult", namespaces=namespaces)
order_array1 = results1.find("resp:Orders", namespaces=namespaces)
for order in order_array1:
    insertion_point.extend(order)
print ET.tostring(combined_xml)
But I am getting the following output:
<ns0:ListOrdersResponse xmlns:ns0="https://mws.amazonservices.com/Orders/2013-09-01">
<ns0:ListOrdersResult>
<ns0:NextToken>token</ns0:NextToken>
<ns0:CreatedBefore>2014-10-07T08:13:11Z</ns0:CreatedBefore>
<ns0:Orders>
<ns0:Order>
<ns0:AmazonOrderId>12345</ns0:AmazonOrderId>
<ns0:SellerOrderId>R12345</ns0:SellerOrderId>
<ns0:PurchaseDate>2014-10-02T14:40:37Z</ns0:PurchaseDate>
<ns0:LastUpdateDate>2014-10-03T09:47:02Z</ns0:LastUpdateDate>
<ns0:OrderStatus>Shipped</ns0:OrderStatus>
<ns0:FulfillmentChannel>MFN</ns0:FulfillmentChannel>
<ns0:SalesChannel>Amazon.in</ns0:SalesChannel>
<ns0:ShipServiceLevel>IN Exp Dom 2</ns0:ShipServiceLevel>
<ns0:ShippingAddress>
<ns0:Name>name</ns0:Name>
<ns0:AddressLine1>line1</ns0:AddressLine1>
<ns0:AddressLine2>line2</ns0:AddressLine2>
<ns0:City>Pune</ns0:City>
<ns0:StateOrRegion>Maharashtra</ns0:StateOrRegion>
<ns0:PostalCode>411027</ns0:PostalCode>
<ns0:CountryCode>IN</ns0:CountryCode>
<ns0:Phone>123456789</ns0:Phone>
</ns0:ShippingAddress>
<ns0:OrderTotal>
<ns0:CurrencyCode>INR</ns0:CurrencyCode>
<ns0:Amount>520.00</ns0:Amount>
</ns0:OrderTotal>
<ns0:NumberOfItemsShipped>1</ns0:NumberOfItemsShipped>
<ns0:NumberOfItemsUnshipped>0</ns0:NumberOfItemsUnshipped>
<ns0:PaymentExecutionDetail />
<ns0:PaymentMethod>Other</ns0:PaymentMethod>
<ns0:MarketplaceId>mid</ns0:MarketplaceId>
<ns0:BuyerEmail>email#buyer.com</ns0:BuyerEmail>
<ns0:BuyerName>name</ns0:BuyerName>
<ns0:ShipmentServiceLevelCategory>Expedited</ns0:ShipmentServiceLevelCategory>
<ns0:ShippedByAmazonTFM>false</ns0:ShippedByAmazonTFM>
<ns0:TFMShipmentStatus>Delivered</ns0:TFMShipmentStatus>
<ns0:OrderType>StandardOrder</ns0:OrderType>
<ns0:EarliestShipDate>2014-10-05T18:30:00Z</ns0:EarliestShipDate>
<ns0:LatestShipDate>2014-10-07T18:29:59Z</ns0:LatestShipDate>
<ns0:EarliestDeliveryDate>2014-10-07T18:30:00Z</ns0:EarliestDeliveryDate>
<ns0:LatestDeliveryDate>2014-10-11T18:29:59Z</ns0:LatestDeliveryDate>
</ns0:Order>
<ns0:AmazonOrderId>oid1</ns0:AmazonOrderId>
<ns0:PurchaseDate>2014-10-04T13:37:41Z</ns0:PurchaseDate>
<ns0:LastUpdateDate>2014-10-06T09:52:21Z</ns0:LastUpdateDate>
<ns0:OrderStatus>Shipped</ns0:OrderStatus>
<ns0:FulfillmentChannel>MFN</ns0:FulfillmentChannel>
<ns0:SalesChannel>Amazon.in</ns0:SalesChannel>
<ns0:ShipServiceLevel>IN Std Dom 2_50k_cod</ns0:ShipServiceLevel>
<ns0:ShippingAddress>
<ns0:Name>name1</ns0:Name>
<ns0:AddressLine1>line1-1</ns0:AddressLine1>
<ns0:AddressLine2>line2-1</ns0:AddressLine2>
<ns0:City>WADHVANCITY,SURENDRANAGAR</ns0:City>
<ns0:StateOrRegion>Gujarat</ns0:StateOrRegion>
<ns0:PostalCode>363035</ns0:PostalCode>
<ns0:CountryCode>IN</ns0:CountryCode>
<ns0:Phone>987654321</ns0:Phone>
</ns0:ShippingAddress>
<ns0:OrderTotal>
<ns0:CurrencyCode>INR</ns0:CurrencyCode>
<ns0:Amount>242.00</ns0:Amount>
</ns0:OrderTotal>
<ns0:NumberOfItemsShipped>1</ns0:NumberOfItemsShipped>
<ns0:NumberOfItemsUnshipped>0</ns0:NumberOfItemsUnshipped>
<ns0:PaymentExecutionDetail />
<ns0:PaymentMethod>Other</ns0:PaymentMethod>
<ns0:MarketplaceId>mid1</ns0:MarketplaceId>
<ns0:BuyerEmail>email1#buyer.com</ns0:BuyerEmail>
<ns0:BuyerName>name1</ns0:BuyerName>
<ns0:ShipmentServiceLevelCategory>Standard</ns0:ShipmentServiceLevelCategory>
<ns0:ShippedByAmazonTFM>false</ns0:ShippedByAmazonTFM>
<ns0:TFMShipmentStatus>PendingPickUp</ns0:TFMShipmentStatus>
<ns0:OrderType>StandardOrder</ns0:OrderType>
<ns0:EarliestShipDate>2014-10-05T18:30:00Z</ns0:EarliestShipDate>
<ns0:LatestShipDate>2014-10-07T18:29:59Z</ns0:LatestShipDate>
<ns0:EarliestDeliveryDate>2014-10-09T18:30:00Z</ns0:EarliestDeliveryDate>
<ns0:LatestDeliveryDate>2014-10-15T18:29:59Z</ns0:LatestDeliveryDate>
</ns0:Orders>
</ns0:ListOrdersResult>
</ns0:ListOrdersResponse>
Why am I getting ns0? Also, the <Order> tag is missing for the second order. How can I get the desired output without ns0? I am OK with suggestions for using another module if it makes life easier. :)
Thanks
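ns0 is an automatically generated namespace prefix: ElementTree stores every tag together with its namespace URI and, when serializing, invents prefixes such as ns0 for any namespace that has not been registered (for example with ET.register_namespace). The <Order> tag goes missing because extend(order) adds the children of order rather than order itself; append(order) adds the element.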
I'd really recommend using beautifulsoup4 for this, though - it's much nicer for working with xml:
from bs4 import BeautifulSoup
# the "xml" parser (requires lxml) keeps tag names case-sensitive
with open('a.xml') as f:
    soup = BeautifulSoup(f, 'xml')
insertion_point = soup.ListOrdersResult.Orders
with open('b.xml') as f:
    orders_b = BeautifulSoup(f, 'xml').ListOrdersByNextTokenResult.Orders
# could probably just be orders_b = BeautifulSoup(f, 'xml')
orders_to_insert = orders_b.find_all('Order')
for order in orders_to_insert:
    insertion_point.append(order)
print(soup)
import xml.etree.ElementTree as ET
namespaces = {'resp': 'https://mws.amazonservices.com/Orders/2013-09-01'}
tree = ET.parse("a.xml")
root = tree.getroot()
results = root.find("resp:ListOrdersResult", namespaces=namespaces)
orders = results.find("resp:Orders", namespaces=namespaces)
tree1 = ET.parse("b.xml")
root1 = tree1.getroot()
results1 = root1.find("resp:ListOrdersByNextTokenResult", namespaces=namespaces)
orders1 = results1.find("resp:Orders", namespaces=namespaces)
# append each <Order> element (not its children) to a.xml's <Orders>
for order in list(orders1):
    orders.append(order)
tree.write("temp.xml")
# strip the generated ns0 prefix from the serialized text
with open("temp.xml") as f:
    correct_data = f.read().replace('ns0:', '').replace(':ns0', '')
with open("combined.xml", 'w') as filewrite:
    filewrite.write(correct_data)
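ElementTree also offers register_namespace, which may make the text replacement unnecessary: registering the URI with an empty prefix lets it be written as the default namespace. The sketch below assumes cut-down, in-memory versions of a.xml and b.xml (the A_XML and B_XML strings are shortened stand-ins, not the real files) and appends each Order element itself rather than its children.

```python
import xml.etree.ElementTree as ET

NS_URI = "https://mws.amazonservices.com/Orders/2013-09-01"
NS = {"resp": NS_URI}  # prefix used only for searching

# Shortened stand-ins for a.xml and b.xml from the question.
A_XML = """<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersResult><Orders>
<Order><AmazonOrderId>12345</AmazonOrderId></Order>
</Orders></ListOrdersResult>
</ListOrdersResponse>"""

B_XML = """<ListOrdersByNextTokenResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersByNextTokenResult><Orders>
<Order><AmazonOrderId>oid1</AmazonOrderId></Order>
</Orders></ListOrdersByNextTokenResult>
</ListOrdersByNextTokenResponse>"""

# Serialize this URI as the default namespace instead of an invented ns0 prefix.
ET.register_namespace("", NS_URI)

root_a = ET.fromstring(A_XML)
root_b = ET.fromstring(B_XML)
orders_a = root_a.find("resp:ListOrdersResult/resp:Orders", NS)
orders_b = root_b.find("resp:ListOrdersByNextTokenResult/resp:Orders", NS)

# append() adds the <Order> element itself; extend(order) would add its children.
for order in list(orders_b):
    orders_a.append(order)

merged = ET.tostring(root_a, encoding="unicode")
print(merged)
```

With real files, ET.parse("a.xml") and tree.write("combined.xml", xml_declaration=True, encoding="UTF-8") would take the place of the string handling.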