I have some difficulty in importing a JSON file with pandas.
import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')
This is the error that I get:
ValueError: If using all scalar values, you must pass an index
The file structure is simplified like this:
{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}
It is from the machine learning course of University of Washington on Coursera. You can find the file here.
Try
ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')
That file only contains key value pairs where values are scalars. You can convert it to a dataframe with ser.to_frame('count').
You can also do something like this:
import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
data = json.load(f)
Now data is a dictionary. You can pass it to a dataframe constructor like this:
df = pd.DataFrame({'count': data})
You can do as #ayhan mention which will give you a column base format
Or you can enclose the object in [ ] (source) as shown below to give you a row format that will be convenient if you are loading multiple values and planing on using matrix for your machine learning models.
df = pd.DataFrame([data])
I think what is happening is that the data in
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')
is being read as a string instead of a json
{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}
is actually
'{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}'
Since a string is a scalar, it wants you to load it as a json, you have to convert it to a dict which is exactly what the other response is doing
The best way is to do a json loads on the string to convert it to a dict and load it into pandas
myfile=f.read()
jsonData=json.loads(myfile)
df=pd.DataFrame(data)
{
"biennials": 522004,
"lb915": 116290
}
df = pd.read_json('values.json')
As pd.read_json expects a list
{
"biennials": [522004],
"lb915": [116290]
}
for a particular key, it returns an error saying
If using all scalar values, you must pass an index.
So you can resolve this by specifying 'typ' arg in pd.read_json
map_index_to_word = pd.read_json('Datasets/people_wiki_map_index_to_word.json', typ='dictionary')
For newer pandas, 0.19.0 and later, use the lines parameter, set it to True.
The file is read as a json object per line.
import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json', lines=True)
If fixed the following errors I encountered especially when some of the json files have only one value:
ValueError: If using all scalar values, you must pass an index
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
ValueError: Trailing data
For example
cat values.json
{
name: "Snow",
age: "31"
}
df = pd.read_json('values.json')
Chances are you might end up with this
Error: if using all scalar values, you must pass an index
Pandas looks up for a list or dictionary in the value. Something like
cat values.json
{
name: ["Snow"],
age: ["31"]
}
So try doing this. Later on to convert to html tohtml()
df = pd.DataFrame([pd.read_json(report_file, typ='series')])
result = df.to_html()
I solved this by converting it into an array like so
[{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}]
I have a CSV that has been returned and the data is in a god awful state, I need to parse both the header and then the data out from each row.
This is an example of one row:
+--------------+------------+--------------------+--------------+------------+-------------+--------------------+----------+--------------+----------+----------+-----------+-------------+-------------+----------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+--------------------+--------------+----------+------------+----------+--------------+---------------+
| _c0| _c1| _c2| _c3| _c4| _c5| _c6| _c7| _c8| _c9| _c10| _c11| _c12| _c13| _c14| _c15| _c16| _c17| _c18| _c19| _c20| _c21| _c22| _c23| _c24| _c25| _c26| _c27| _c28| _c29| _c30| _c31| _c32| _c33| _c34| _c35| _c36| _c37| _c38| _c39| _c40| _c41| _c42| _c43| _c44| _c45| _c46| _c47| _c48| _c49| _c50| _c51| _c52| _c53| _c54| _c55| _c56| _c57| _c58| _c59| _c60| _c61| _c62| _c63| _c64| _c65| _c66| _c67| _c68| _c69| _c70| _c71| _c72| _c73| _c74| _c75| _c76| _c77| _c78| _c79| _c80| _c81| _c82| _c83| _c84| _c85| _c86| _c87| _c88| _c89| _c90| _c91| _c92| _c93| _c94| _c95| _c96| _c97| _c98| _c99| _c100| _c101| _c102| _c103| _c104| _c105| _c106| _c107| _c108| _c109| _c110| _c111| _c112| _c113| _c114| _c115| _c116| _c117| _c118| _c119| _c120| _c121| _c122| _c123| _c124| _c125| _c126| _c127| _c128| _c129| _c130| _c131| _c132| _c133| _c134| _c135| _c136| _c137| _c138| _c139| _c140| _c141| _c142| _c143| _c144| _c145| _c146| _c147| _c148| _c149| _c150|
+--------------+------------+--------------------+--------------+------------+-------------+--------------------+----------+--------------+----------+----------+-----------+-------------+-------------+----------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+--------------------+--------------+----------+------------+----------+--------------+---------------+
|{"MANDT":"400"|"LEDNR":"00"|"OBJNR":"KS660000...|"GJAHR":"2022"|"WRTTP":"04"|"VERSN":"000"|"KSTAR":"0051040100"|"HRKFT":""|"VRGNG":"COIN"|"VBUND":""|"PARGB":""|"BEKNZ":"H"|"TWAER":"THB"|"PERBL":"016"|"MEINH":""|"WTG001":-1854554.89|"WTG002":0.00|"WTG003":0.00|"WTG004":0.00|"WTG005":0.00|"WTG006":0.00|"WTG007":0.00|"WTG008":0.00|"WTG009":0.00|"WTG010":0.00|"WTG011":0.00|"WTG012":0.00|"WTG013":0.00|"WTG014":0.00|"WTG015":0.00|"WTG016":0.00|"WOG001":-1854554.89|"WOG002":0.00|"WOG003":0.00|"WOG004":0.00|"WOG005":0.00|"WOG006":0.00|"WOG007":0.00|"WOG008":0.00|"WOG009":0.00|"WOG010":0.00|"WOG011":0.00|"WOG012":0.00|"WOG013":0.00|"WOG014":0.00|"WOG015":0.00|"WOG016":0.00|"WKG001":-1854554.89|"WKG002":0.00|"WKG003":0.00|"WKG004":0.00|"WKG005":0.00|"WKG006":0.00|"WKG007":0.00|"WKG008":0.00|"WKG009":0.00|"WKG010":0.00|"WKG011":0.00|"WKG012":0.00|"WKG013":0.00|"WKG014":0.00|"WKG015":0.00|"WKG016":0.00|"WKF001":0.00|"WKF002":0.00|"WKF003":0.00|"WKF004":0.00|"WKF005":0.00|"WKF006":0.00|"WKF007":0.00|"WKF008":0.00|"WKF009":0.00|"WKF010":0.00|"WKF011":0.00|"WKF012":0.00|"WKF013":0.00|"WKF014":0.00|"WKF015":0.00|"WKF016":0.00|"PAG001":0.00|"PAG002":0.00|"PAG003":0.00|"PAG004":0.00|"PAG005":0.00|"PAG006":0.00|"PAG007":0.00|"PAG008":0.00|"PAG009":0.00|"PAG010":0.00|"PAG011":0.00|"PAG012":0.00|"PAG013":0.00|"PAG014":0.00|"PAG015":0.00|"PAG016":0.00|"MEG001":0.000|"MEG002":0.000|"MEG003":0.000|"MEG004":0.000|"MEG005":0.000|"MEG006":0.000|"MEG007":0.000|"MEG008":0.000|"MEG009":0.000|"MEG010":0.000|"MEG011":0.000|"MEG012":0.000|"MEG013":0.000|"MEG014":0.000|"MEG015":0.000|"MEG016":0.000|"MEF001":0.000|"MEF002":0.000|"MEF003":0.000|"MEF004":0.000|"MEF005":0.000|"MEF006":0.000|"MEF007":0.000|"MEF008":0.000|"MEF009":0.000|"MEF010":0.000|"MEF011":0.000|"MEF012":0.000|"MEF013":0.000|"MEF014":0.000|"MEF015":0.000|"MEF016":0.000|"MUV001":""|"MUV002":""|"MUV003":""|"MUV004":""|"MUV005":""|"MUV006":""|"MUV007":""|"MUV008":""|"MUV009":""|"MUV010":""|"MUV011":""|"MUV012":""|"MUV013":""|"MUV014":""|"MUV015":""|"MUV016":""|"BELTP":"1"|"TIMESTMP":101246...|"BUKRS":"6611"|"FKBER":""|"SEGMENT":""|"GEBER":""|"GRANT_NBR":""|"BUDGET_PD":""}|
+--------------+------------+--------------------+--------------+------------+-------------+--------------------+----------+--------------+----------+----------+-----------+-------------+-------------+----------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+--------------------+--------------+----------+------------+----------+--------------+---------------+
The first part for example MANDT is the column header and the bit after the : is the value. I basically need to
A) Loop all the columns and change the headers so they relate to the bit prior to the :
B) then populate the rows with the second part after.
I've attempted a small piece of code just to edit all the columns like below
from pyspark.sql.functions import split
for colname in COSPDF.columns:
print(colname)
COSPDF = COSPDF.withColumn(col(colname), lower(colname))
and I receive an error TypeError: 'str' object is not callable
I've then done the "lazy" thing and found some code like below
from pyspark.sql.functions import split
split_df = COSPDF.select(split(COSPDF._c0, ':').alias('split_text'))
split_df.selectExpr("split_text[0] as left").show() # left of delim
split_df.selectExpr("split_text[1] as right").show() # right of delim
However this code only works one column that I have to "specify" which doesn't work when the CSV has 123 columns, I'm not doing it 123 times. Any assistance would really help with this please, it's had me stuck for hours.
UPDATED
Some rows from the original file:
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040100""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""H""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":-1854554.89","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":-1854554.89","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":-1854554.89","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10124662650000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040100""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""S""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":7424891.07","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":7424891.07","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":7424891.07","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10160936750000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040105""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""H""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":-509518.63","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":-509518.63","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":-509518.63","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10124662700000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
Simply, You need to put Header name in Pandas Dataframe like...
df.columns = ["Column_Name1", "Column_Name2", "Column_Name3", "Column_Name4" and so on..]
And, If you want to use loop to append name for each col then you need iterate over the list and append based on the index and length of the list
First read csv and get each key value pair by iterating over the columns
import pandas as pd
read_df = pd.read_csv(<your csv file path>)
dict_of_pairs = {pairs: read_df[pairs] for pairs in read_df}
Write it in another file
write_df = pd.DataFrame({k: pd.Series(v) for k, v in dict_of_pairs.items()}) // this will allow you to write even if some column has no values in it
writer = pd.ExcelWriter(write_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Somename for your sheet', index=False)
Hope this answers your question.....
I have a pyspark dataframe, this is what it looks like
+------------------------------------+-------------------+-------------+--------------------------------+---------+
|member_uuid |Timestamp |updated |member_id |easy_id |
+------------------------------------+-------------------+-------------+--------------------------------+---------+
|027130fe-584d-4d8e-9fb0-b87c984a0c20|2020-02-11 19:15:32|password_hash|ajuypjtnlzmk4na047cgav27jma6_STG|993269700|
I transformed the above dataframe to this,
+---------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+
|attribute|operation|params |timestamp |
+---------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+
|profile |UPDATE |{"member_uuid":"027130fe-584d-4d8e-9fb0-b87c984a0c20","member_id":"ajuypjtnlzmk4na047cgav27jma6_STG","easy_id":993269700,"field":"password_hash"}|2020-02-11 19:15:32|
Using the following code,
ll = ['member_uuid', 'member_id', 'easy_id', 'field']
df = df.withColumn('timestamp', col('Timestamp')).withColumn('attribute', lit('profile')).withColumn('operation', lit(col_name)) \
.withColumn('field', col('updated')).withColumn('params', F.to_json(struct([x for x in ll])))
df = df.select('attribute', 'operation', 'params', 'timestamp')
I have save this dataframe df to a text file after converting it to JSON.
I tried using the following code to do the same,
df_final.toJSON().coalesce(1).saveAsTextFile('file')
The file contains,
{"attribute":"profile","operation":"UPDATE","params":"{\"member_uuid\":\"027130fe-584d-4d8e-9fb0-b87c984a0c20\",\"member_id\":\"ajuypjtnlzmk4na047cgav27jma6_STG\",\"easy_id\":993269700,\"field\":\"password_hash\"}","timestamp":"2020-02-11T19:15:32.000Z"}
I want it to save in this format,
{"attribute":"profile","operation":"UPDATE","params":{"member_uuid":"027130fe-584d-4d8e-9fb0-b87c984a0c20","member_id":"ajuypjtnlzmk4na047cgav27jma6_STG","easy_id":993269700,"field":"password_hash"},"timestamp":"2020-02-11T19:15:32.000Z"}
to_json saves the value in the params columns as a string, is there a way to keep the json context here so I can save it as the desired output?
Don't use to_json to create params column in dataframe.
The trick here is just create struct and write to the file (using .saveAsTextFile (or) .write.json()) Spark will create JSON for the Struct field.
if we already created json object and writing in json format Spark will add \ to escape the quotes already exists in Json string.
Example:
from pyspark.sql.functions import *
#sample data
df=spark.createDataFrame([("027130fe-584d-4d8e-9fb0-b87c984a0c20","2020-02-11 19:15:32","password_hash","ajuypjtnlzmk4na047cgav27jma6_STG","993269700")],["member_uuid","Timestamp","updated","member_id","easy_id"])
df1=df.withColumn("attribute",lit("profile")).withColumn("operation",lit("UPDATE"))
df1.selectExpr("struct(member_uuid,member_id,easy_id) as params","attribute","operation","timestamp").write.format("json").mode("overwrite").save("<path>")
#{"params":{"member_uuid":"027130fe-584d-4d8e-9fb0-b87c984a0c20","member_id":"ajuypjtnlzmk4na047cgav27jma6_STG","easy_id":"993269700"},"attribute":"profile","operation":"UPDATE","timestamp":"2020-02-11 19:15:32"}
df1.selectExpr("struct(member_uuid,member_id,easy_id) as params","attribute","operation","timestamp").toJSON().saveAsTextFile("<path>")
#{"params":{"member_uuid":"027130fe-584d-4d8e-9fb0-b87c984a0c20","member_id":"ajuypjtnlzmk4na047cgav27jma6_STG","easy_id":"993269700"},"attribute":"profile","operation":"UPDATE","timestamp":"2020-02-11 19:15:32"}
A simple way to handle it is to just do a replace operation on the file
sourceData=open('file').read().replace('"{','{').replace('}"','}').replace('\\','')
with open('file','w') as final:
final.write(sourceData)
This might not be what you are looking for, but will achieve the end result.