I have a CSV that has been returned and the data is in a god awful state, I need to parse both the header and then the data out from each row.
This is an example of one row:
+--------------+------------+--------------------+--------------+------------+-------------+--------------------+----------+--------------+----------+----------+-----------+-------------+-------------+----------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+--------------------+--------------+----------+------------+----------+--------------+---------------+
| _c0| _c1| _c2| _c3| _c4| _c5| _c6| _c7| _c8| _c9| _c10| _c11| _c12| _c13| _c14| _c15| _c16| _c17| _c18| _c19| _c20| _c21| _c22| _c23| _c24| _c25| _c26| _c27| _c28| _c29| _c30| _c31| _c32| _c33| _c34| _c35| _c36| _c37| _c38| _c39| _c40| _c41| _c42| _c43| _c44| _c45| _c46| _c47| _c48| _c49| _c50| _c51| _c52| _c53| _c54| _c55| _c56| _c57| _c58| _c59| _c60| _c61| _c62| _c63| _c64| _c65| _c66| _c67| _c68| _c69| _c70| _c71| _c72| _c73| _c74| _c75| _c76| _c77| _c78| _c79| _c80| _c81| _c82| _c83| _c84| _c85| _c86| _c87| _c88| _c89| _c90| _c91| _c92| _c93| _c94| _c95| _c96| _c97| _c98| _c99| _c100| _c101| _c102| _c103| _c104| _c105| _c106| _c107| _c108| _c109| _c110| _c111| _c112| _c113| _c114| _c115| _c116| _c117| _c118| _c119| _c120| _c121| _c122| _c123| _c124| _c125| _c126| _c127| _c128| _c129| _c130| _c131| _c132| _c133| _c134| _c135| _c136| _c137| _c138| _c139| _c140| _c141| _c142| _c143| _c144| _c145| _c146| _c147| _c148| _c149| _c150|
+--------------+------------+--------------------+--------------+------------+-------------+--------------------+----------+--------------+----------+----------+-----------+-------------+-------------+----------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+--------------------+--------------+----------+------------+----------+--------------+---------------+
|{"MANDT":"400"|"LEDNR":"00"|"OBJNR":"KS660000...|"GJAHR":"2022"|"WRTTP":"04"|"VERSN":"000"|"KSTAR":"0051040100"|"HRKFT":""|"VRGNG":"COIN"|"VBUND":""|"PARGB":""|"BEKNZ":"H"|"TWAER":"THB"|"PERBL":"016"|"MEINH":""|"WTG001":-1854554.89|"WTG002":0.00|"WTG003":0.00|"WTG004":0.00|"WTG005":0.00|"WTG006":0.00|"WTG007":0.00|"WTG008":0.00|"WTG009":0.00|"WTG010":0.00|"WTG011":0.00|"WTG012":0.00|"WTG013":0.00|"WTG014":0.00|"WTG015":0.00|"WTG016":0.00|"WOG001":-1854554.89|"WOG002":0.00|"WOG003":0.00|"WOG004":0.00|"WOG005":0.00|"WOG006":0.00|"WOG007":0.00|"WOG008":0.00|"WOG009":0.00|"WOG010":0.00|"WOG011":0.00|"WOG012":0.00|"WOG013":0.00|"WOG014":0.00|"WOG015":0.00|"WOG016":0.00|"WKG001":-1854554.89|"WKG002":0.00|"WKG003":0.00|"WKG004":0.00|"WKG005":0.00|"WKG006":0.00|"WKG007":0.00|"WKG008":0.00|"WKG009":0.00|"WKG010":0.00|"WKG011":0.00|"WKG012":0.00|"WKG013":0.00|"WKG014":0.00|"WKG015":0.00|"WKG016":0.00|"WKF001":0.00|"WKF002":0.00|"WKF003":0.00|"WKF004":0.00|"WKF005":0.00|"WKF006":0.00|"WKF007":0.00|"WKF008":0.00|"WKF009":0.00|"WKF010":0.00|"WKF011":0.00|"WKF012":0.00|"WKF013":0.00|"WKF014":0.00|"WKF015":0.00|"WKF016":0.00|"PAG001":0.00|"PAG002":0.00|"PAG003":0.00|"PAG004":0.00|"PAG005":0.00|"PAG006":0.00|"PAG007":0.00|"PAG008":0.00|"PAG009":0.00|"PAG010":0.00|"PAG011":0.00|"PAG012":0.00|"PAG013":0.00|"PAG014":0.00|"PAG015":0.00|"PAG016":0.00|"MEG001":0.000|"MEG002":0.000|"MEG003":0.000|"MEG004":0.000|"MEG005":0.000|"MEG006":0.000|"MEG007":0.000|"MEG008":0.000|"MEG009":0.000|"MEG010":0.000|"MEG011":0.000|"MEG012":0.000|"MEG013":0.000|"MEG014":0.000|"MEG015":0.000|"MEG016":0.000|"MEF001":0.000|"MEF002":0.000|"MEF003":0.000|"MEF004":0.000|"MEF005":0.000|"MEF006":0.000|"MEF007":0.000|"MEF008":0.000|"MEF009":0.000|"MEF010":0.000|"MEF011":0.000|"MEF012":0.000|"MEF013":0.000|"MEF014":0.000|"MEF015":0.000|"MEF016":0.000|"MUV001":""|"MUV002":""|"MUV003":""|"MUV004":""|"MUV005":""|"MUV006":""|"MUV007":""|"MUV008":""|"MUV009":""|"MUV010":""|"MUV011":""|"MUV012":""|"MUV013":""|"MUV014":""|"MUV015":""|"MUV016":""|"BELTP":"1"|"TIMESTMP":101246...|"BUKRS":"6611"|"FKBER":""|"SEGMENT":""|"GEBER":""|"GRANT_NBR":""|"BUDGET_PD":""}|
+--------------+------------+--------------------+--------------+------------+-------------+--------------------+----------+--------------+----------+----------+-----------+-------------+-------------+----------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+--------------------+--------------+----------+------------+----------+--------------+---------------+
The first part for example MANDT is the column header and the bit after the : is the value. I basically need to
A) Loop all the columns and change the headers so they relate to the bit prior to the :
B) then populate the rows with the second part after.
I've attempted a small piece of code just to edit all the columns like below
from pyspark.sql.functions import split
for colname in COSPDF.columns:
print(colname)
COSPDF = COSPDF.withColumn(col(colname), lower(colname))
and I receive an error TypeError: 'str' object is not callable
I've then done the "lazy" thing and found some code like below
from pyspark.sql.functions import split
split_df = COSPDF.select(split(COSPDF._c0, ':').alias('split_text'))
split_df.selectExpr("split_text[0] as left").show() # left of delim
split_df.selectExpr("split_text[1] as right").show() # right of delim
However this code only works one column that I have to "specify" which doesn't work when the CSV has 123 columns, I'm not doing it 123 times. Any assistance would really help with this please, it's had me stuck for hours.
UPDATED
Some rows from the original file:
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040100""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""H""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":-1854554.89","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":-1854554.89","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":-1854554.89","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10124662650000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040100""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""S""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":7424891.07","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":7424891.07","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":7424891.07","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10160936750000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040105""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""H""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":-509518.63","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":-509518.63","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":-509518.63","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10124662700000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
Simply, You need to put Header name in Pandas Dataframe like...
df.columns = ["Column_Name1", "Column_Name2", "Column_Name3", "Column_Name4" and so on..]
And, If you want to use loop to append name for each col then you need iterate over the list and append based on the index and length of the list
First read csv and get each key value pair by iterating over the columns
import pandas as pd
read_df = pd.read_csv(<your csv file path>)
dict_of_pairs = {pairs: read_df[pairs] for pairs in read_df}
Write it in another file
write_df = pd.DataFrame({k: pd.Series(v) for k, v in dict_of_pairs.items()}) // this will allow you to write even if some column has no values in it
writer = pd.ExcelWriter(write_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Somename for your sheet', index=False)
Hope this answers your question.....