How to iterate over a CSV file with Pywikibot - python

I wanted to try uploading a series of items to test.wikidata, creating each item and then adding an inception (P571) statement. The CSV file sometimes has a date value and sometimes does not. When no date value is given, I want to write out the placeholder 'some value'.
Imagine a dataframe like this:
df = pd.DataFrame({'Object': [1, 2, 3], 'Date': [250, None, 300]})
However, I am not sure how to iterate over a CSV file with Pywikibot to create an item for each row and add a statement. Here is the code I wrote:
import pywikibot
import pandas as pd

site = pywikibot.Site("test", "wikidata")
repo = site.data_repository()
df = pd.read_csv('experiment.csv')
item = pywikibot.ItemPage(repo)
for item in df:
    date = df['date']
    prop_date = pywikibot.Claim(repo, u'P571')
    if date == '':
        prop_date.setSnakType('somevalue')
    else:
        target = pywikibot.WbTime(year=date)
        prop_date.setTarget(target)
    item.addClaim(prop_date)
When I run this through PAWS, I get the message: KeyError: 'date'
But I think the real issue here is that I am not sure how to get Pywikibot to iterate over each row of the dataframe and create a new claim for each new date value. I would value any feedback or suggestions for good examples and documentation. Many thanks!

Looking back on this, the solution was to use .iterrows(), .itertuples(), or .loc[] to access the values in each row, and to create the item inside the loop rather than once outside it. So:
for row in df.itertuples():
    item = pywikibot.ItemPage(repo)  # a fresh item for each row
    item.editEntity({}, summary='Creating a new item')  # create it on the repo before adding claims
    prop_date = pywikibot.Claim(repo, 'P571')
    if pd.isna(row.Date):
        prop_date.setSnakType('somevalue')
    else:
        prop_date.setTarget(pywikibot.WbTime(year=int(row.Date)))
    item.addClaim(prop_date)
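One detail that matters here: pandas reads empty CSV cells as NaN, not as the empty string, so the == '' test in the original code never matches; that is why the loop above checks pd.isna(row.Date) instead. A quick way to confirm this on the file itself:

print(pd.read_csv('experiment.csv')['Date'].isna())  # True wherever the CSV cell was blank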

Related

Parse data in a new dataframe with correct headers taken from within the data

I have a CSV that has been returned and the data is in an awful state; I need to parse out both the header and the data from each row.
This is an example of one row:
+--------------+------------+--------------------+--------------+----
|           _c0|         _c1|                 _c2|           _c3| ...
+--------------+------------+--------------------+--------------+----
|{"MANDT":"400"|"LEDNR":"00"|"OBJNR":"KS660000...|"GJAHR":"2022"| ... |"GRANT_NBR":""|"BUDGET_PD":""}|
+--------------+------------+--------------------+--------------+----
(151 columns in total, _c0 through _c150; each cell holds one "KEY":"VALUE" fragment)
The first part, for example MANDT, is the column header and the bit after the : is the value. I basically need to
A) loop over all the columns and change the headers to the part before the :, and
B) then populate the rows with the part after it.
I've attempted a small piece of code just to edit all the columns, like below:

from pyspark.sql.functions import split

for colname in COSPDF.columns:
    print(colname)
    COSPDF = COSPDF.withColumn(col(colname), lower(colname))

and I receive an error: TypeError: 'str' object is not callable
I've then done the "lazy" thing and found some code like below:

from pyspark.sql.functions import split

split_df = COSPDF.select(split(COSPDF._c0, ':').alias('split_text'))
split_df.selectExpr("split_text[0] as left").show()   # left of delim
split_df.selectExpr("split_text[1] as right").show()  # right of delim

However, this code only works on one column that I have to specify, which doesn't work when the CSV has 123 columns; I'm not doing it 123 times. Any assistance would really help with this please; it's had me stuck for hours.
UPDATED
A sample row from the original file:
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040100""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""H""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":-1854554.89","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":-1854554.89","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":-1854554.89","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10124662650000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040100""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""S""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":7424891.07","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":7424891.07","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":7424891.07","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10160936750000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
"{""MANDT"":""400""","""LEDNR"":""00""","""OBJNR"":""KS66000011001070""","""GJAHR"":""2022""","""WRTTP"":""04""","""VERSN"":""000""","""KSTAR"":""0051040105""","""HRKFT"":""""","""VRGNG"":""COIN""","""VBUND"":""""","""PARGB"":""""","""BEKNZ"":""H""","""TWAER"":""THB""","""PERBL"":""016""","""MEINH"":""""","""WTG001"":-509518.63","""WTG002"":0.00","""WTG003"":0.00","""WTG004"":0.00","""WTG005"":0.00","""WTG006"":0.00","""WTG007"":0.00","""WTG008"":0.00","""WTG009"":0.00","""WTG010"":0.00","""WTG011"":0.00","""WTG012"":0.00","""WTG013"":0.00","""WTG014"":0.00","""WTG015"":0.00","""WTG016"":0.00","""WOG001"":-509518.63","""WOG002"":0.00","""WOG003"":0.00","""WOG004"":0.00","""WOG005"":0.00","""WOG006"":0.00","""WOG007"":0.00","""WOG008"":0.00","""WOG009"":0.00","""WOG010"":0.00","""WOG011"":0.00","""WOG012"":0.00","""WOG013"":0.00","""WOG014"":0.00","""WOG015"":0.00","""WOG016"":0.00","""WKG001"":-509518.63","""WKG002"":0.00","""WKG003"":0.00","""WKG004"":0.00","""WKG005"":0.00","""WKG006"":0.00","""WKG007"":0.00","""WKG008"":0.00","""WKG009"":0.00","""WKG010"":0.00","""WKG011"":0.00","""WKG012"":0.00","""WKG013"":0.00","""WKG014"":0.00","""WKG015"":0.00","""WKG016"":0.00","""WKF001"":0.00","""WKF002"":0.00","""WKF003"":0.00","""WKF004"":0.00","""WKF005"":0.00","""WKF006"":0.00","""WKF007"":0.00","""WKF008"":0.00","""WKF009"":0.00","""WKF010"":0.00","""WKF011"":0.00","""WKF012"":0.00","""WKF013"":0.00","""WKF014"":0.00","""WKF015"":0.00","""WKF016"":0.00","""PAG001"":0.00","""PAG002"":0.00","""PAG003"":0.00","""PAG004"":0.00","""PAG005"":0.00","""PAG006"":0.00","""PAG007"":0.00","""PAG008"":0.00","""PAG009"":0.00","""PAG010"":0.00","""PAG011"":0.00","""PAG012"":0.00","""PAG013"":0.00","""PAG014"":0.00","""PAG015"":0.00","""PAG016"":0.00","""MEG001"":0.000","""MEG002"":0.000","""MEG003"":0.000","""MEG004"":0.000","""MEG005"":0.000","""MEG006"":0.000","""MEG007"":0.000","""MEG008"":0.000","""MEG009"":0.000","""MEG010"":0.000","""MEG011"":0.000","""MEG012"":0.000","""MEG013"":0.000","""MEG014"":0.000","""MEG015"":0.000","""MEG016"":0.000","""MEF001"":0.000","""MEF002"":0.000","""MEF003"":0.000","""MEF004"":0.000","""MEF005"":0.000","""MEF006"":0.000","""MEF007"":0.000","""MEF008"":0.000","""MEF009"":0.000","""MEF010"":0.000","""MEF011"":0.000","""MEF012"":0.000","""MEF013"":0.000","""MEF014"":0.000","""MEF015"":0.000","""MEF016"":0.000","""MUV001"":""""","""MUV002"":""""","""MUV003"":""""","""MUV004"":""""","""MUV005"":""""","""MUV006"":""""","""MUV007"":""""","""MUV008"":""""","""MUV009"":""""","""MUV010"":""""","""MUV011"":""""","""MUV012"":""""","""MUV013"":""""","""MUV014"":""""","""MUV015"":""""","""MUV016"":""""","""BELTP"":""1""","""TIMESTMP"":10124662700000.0","""BUKRS"":""6611""","""FKBER"":""""","""SEGMENT"":""""","""GEBER"":""""","""GRANT_NBR"":""""","""BUDGET_PD"":""""}"
Simply, you need to put the header names on the pandas DataFrame, like:

df.columns = ["Column_Name1", "Column_Name2", "Column_Name3", "Column_Name4" and so on..]

And if you want to use a loop to assign the name for each column, then you need to iterate over the list and assign based on the index and length of the list, as sketched below.
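A minimal sketch of that loop, assuming a pandas DataFrame df whose cells each hold a "KEY":"VALUE" fragment as in the sample above (variable names are illustrative):

# Build the header list from the first data row: keep the text before the ':'.
new_cols = [str(cell).split(':', 1)[0].strip('{" ') for cell in df.iloc[0]]
df.columns = new_cols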
First read the CSV and get each key-value pair by iterating over the columns:

import pandas as pd

read_df = pd.read_csv(<your csv file path>)
dict_of_pairs = {pairs: read_df[pairs] for pairs in read_df}

Then write it to another file:

# pd.Series lets this work even if some column has no values in it
write_df = pd.DataFrame({k: pd.Series(v) for k, v in dict_of_pairs.items()})
writer = pd.ExcelWriter(write_path, engine='xlsxwriter')
write_df.to_excel(writer, sheet_name='Some name for your sheet', index=False)
writer.save()  # writer.close() on newer pandas

Hope this answers your question.
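For what it's worth, a sketch of a more direct route in PySpark (not from the answers above; it assumes COSPDF came from spark.read.csv as in the question, with the opening { in _c0 and the closing } in the last column, and a SparkSession named spark): since each cell already holds a "KEY":"VALUE" fragment, re-joining the columns with commas rebuilds one JSON document per row, which Spark can then parse with schema inference:

from pyspark.sql import functions as F

# Stitch the fragments back into the original JSON object, one string per row.
json_df = COSPDF.select(F.concat_ws(',', *COSPDF.columns).alias('json_str'))

# Parse the JSON strings; MANDT, LEDNR, OBJNR, ... become real columns.
parsed = spark.read.json(json_df.rdd.map(lambda r: r.json_str))
parsed.show()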

How to filter out Column data From Multiple rows data?

Good evening everyone. I got the following JSON file from Walmart describing their product items and prices.
I loaded up a Jupyter notebook, imported pandas, and then loaded the file into a DataFrame with custom columns, as shown in the code below.
Now this is what I want to do: make new columns named minPrice and maxPrice and load the data into them. How can I do that?
I also want the offer price, since some items don't have minPrice and maxPrice. :)
EDIT: here is the Python code:
import json
import pandas as pd

with open("walmart.json") as f:
    data = json.load(f)

walmart = data["items"]
wdf = pd.DataFrame(walmart, columns=["productId", "primaryOffer"])
print(wdf.loc[0, "primaryOffer"])
pd.set_option('display.max_colwidth', None)
print(wdf)
Here is the JSON File:
https://pastebin.com/sLGCFCDC
The following code snippet on top of your code would achieve the required task:
min_prices = []
max_prices = []
offer_prices = []

for i, row in wdf.iterrows():
    if 'showMinMaxPrice' in row['primaryOffer']:
        min_prices.append(row['primaryOffer']['minPrice'])
        max_prices.append(row['primaryOffer']['maxPrice'])
        offer_prices.append('N/A')
    else:
        min_prices.append('N/A')
        max_prices.append('N/A')
        offer_prices.append(row['primaryOffer']['offerPrice'])

wdf['minPrice'] = min_prices
wdf['maxPrice'] = max_prices
wdf['offerPrice'] = offer_prices
Here we check for the 'showMinMaxPrice' key from the JSON in the column named 'primaryOffer'. For cases where minPrice and maxPrice are available, the offerPrice is shown as 'N/A', and vice versa. The values are first collected in lists and later added to the DataFrame as columns.
The new minPrice, maxPrice, and offerPrice columns then show up in wdf.head().
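A possible shorthand (a sketch, not part of the original answer) is dict.get, which returns a default when a key is missing, in place of the explicit branching:

wdf['minPrice'] = wdf['primaryOffer'].apply(lambda o: o.get('minPrice', 'N/A'))
wdf['maxPrice'] = wdf['primaryOffer'].apply(lambda o: o.get('maxPrice', 'N/A'))
wdf['offerPrice'] = wdf['primaryOffer'].apply(
    lambda o: 'N/A' if 'showMinMaxPrice' in o else o.get('offerPrice', 'N/A'))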

Create Forecasts Looping over SKUs and Export to CSV using Facebook Prophet

I am new to Python so please bear with me.
I am trying to convert what I think may be a nested dictionary into a csv that I can export. Below is my code:
import pandas as pd
import os
from fbprophet import Prophet

# Read in file
df1 = pd.read_csv('File_Path.csv')

# Create loop to forecast multiple SKUs
def get_prediction(df):
    prediction = {}
    df1 = df.rename(columns={'Date': 'ds', 'qty_ordered': 'y', 'item_no': 'item'})
    list_items = df1.item.unique()
    for item in list_items:
        item_df = df1.loc[df1['item'] == item]
        # set the uncertainty interval to 95% (the Prophet default is 80%)
        my_model = Prophet(yearly_seasonality=True, seasonality_prior_scale=1.0)
        my_model.fit(item_df)
        future_dates = my_model.make_future_dataframe(periods=12, freq='M')
        forecast = my_model.predict(future_dates)
        prediction[item] = forecast
    return prediction

# Save predictions to dictionary
df2 = get_prediction(df1)

# Convert dictionary -- this is the part that fails
df3 = pd.DataFrame.from_dict(df2, orient='columns')
So the last part of the code is where I am struggling: I need to convert the df2 dictionary to a DataFrame (df3) so I can export it to a CSV. But it looks as if it is a nested dictionary? Not sure if I need to update my function or not.
This is what a snippet of the dictionary looks like
I need to export it so it will look like this
Any help would be greatly appreciated!
The following code should help flatten df2 (a dictionary of DataFrames, if I understand correctly):

def flatten(dict_of_df):
    # insert column 'item' so each row keeps its SKU
    for key, value in dict_of_df.items():
        value['item'] = key
    # return a vertically concatenated dataframe with all the items
    return pd.concat(dict_of_df.values())
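Usage would then look something like this (the output filename is illustrative):

df3 = flatten(df2)                            # one DataFrame with all SKUs stacked
df3.to_csv('sku_forecasts.csv', index=False)  # export to CSV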

How to change all columns in csv file to str?

I am working on a script that imports an Excel file, iterates through a column called "Title," and returns False if a certain keyword is present in "Title." The script runs until I get to the part where I want to export another CSV file that gives me a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I changed df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd

data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added', 'Track Item', 'Retailer Item ID',
                                 'UPC', 'Title', 'Manufacturer', 'Brand',
                                 'Client Product Group', 'Category', 'Subcategory',
                                 'Amazon Sub Category', 'Segment', 'Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)
excludes = ['chainsaw', 'pail', 'leaf blower', 'HYOUJIN', 'brush', 'dryer', 'genie',
            'Genuine Joe', 'backpack', 'curling iron', 'dog', 'cat', 'wig', 'animal',
            'dryer', ':', 'tea', 'Adidas', 'Fila', 'Reebok', 'Puma', 'Nike', 'basket',
            'extension', 'extensions', 'batteries', 'battery', '[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

def is_match(title, excludes=my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False

This is the part that returns the error:

df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv', index=False)
Use the following code to import your file:

data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype='str')

For pandas.read_excel, you can pass the optional parameter dtype.
You can also use it to pass multiple data types for different columns, e.g.:

dtype={'Retailer Item ID': int, 'Title': str}
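One more note (an addition, not part of the original answer): astype returns a new Series rather than changing the DataFrame in place, which is likely why the earlier df['Title'].astype(str) call appeared to have no effect. The result has to be assigned back:

df['Title'] = df['Title'].astype(str)  # astype returns a copy; reassign it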
At the line where you wrote

match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

Python returns an integer as the variable e, not the string you want. This happens because when you write df.Title.astype(str) you are searching the index of a new pandas DataFrame containing only the column Title, not the contents of the column. If you want to iterate through the column, you should try

match_titles = [e for e in df.ix[:, 5]
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

df.ix[:, 5] returns the fifth column of the DataFrame df, which is the column you want (note that .ix is deprecated in current pandas; .iloc[:, 5] is the equivalent). If this doesn't work, try the iteritems() function. The main idea is that if you directly assign a df[column] to something else, you are assigning its index, not its contents.

How do I add two dates that are saved in .json files?

I am having a hard time summing two dates that are saved in two separate JSON files. I want to add set dates together which are saved in separate libraries.
The first file (A1.json) contains: {"expires": "2019-09-11"}
The second file (Whitelist.json) contains: {"expires": "0000-01-00"}
These dates are created using tkcalendar and are later exported to these separate files, the idea being that summing them lets me set a date one month into the future. However, I can't seem to add them together without some form of error.
I have tried converting the JSON files to strings in Python and then adding them, and also using strptime to sum the dates.
Here is the relevant chunk of the code:
with open('A1.json') as f:
    data = json.loads(f.read())
    for material in data.items():
        A1 = (format(material[1]['expires']))

with open('Whitelist.json') as f:
    data = json.loads(f.read())
    for material in data.items():
        A2 = (format(material[1]['expires']))

print(A1 + A2)
When this is used, they just get pasted one after another. They don't get summed the way I need.
I also have tried the following code:
t1 = dt.datetime.strptime('A1', '%d-%m-%Y')
t2 = dt.datetime.strptime('Whitelist', '%d-%m-%Y')
time_zero = dt.datetime.strptime('00:00:00', '%d/%m/%Y')
print((t1 - time_zero + Whitelist).time())
However, this constantly gives out ValueError: time data does not match format '%y:%m:%d'.
What I expect is that the sum of 2019-09-11 and 0000-01-00 comes out as 2019-10-11. Instead, string concatenation gives 2019-09-110000-01-00, and the strptime method raises the ValueError above.
Thank you in advance, and I apologize if I did something wrong on my first post.
Use pandas:
The actual format of the JSON file isn't provided, so use something like the following to get the data into a DataFrame:
pd.read_json('A1.json', orient='records'), where the parameters depend on the format of the file, or json_normalize.
d2 is not a proper datetime format, so don't try to convert it.
The Code section below uses a dict to set up the DataFrame for the example.
JSON files to DataFrames:
df1 = pd.read_json('A1.json', orient='records')
df2 = pd.read_json('Whitelist.json', orient='records')
df = pd.DataFrame()
df['expires'] = df1.expires
df['d2'] = df2.expires
Code:
import pandas as pd
df = pd.DataFrame({"expires": ["2019-09-11", "2019-10-11", "2019-11-11"],
"d2": ["0000-01-00", "0000-02-00", "0000-03-00"]})
Expand d2 using str.split:
df.expires = pd.to_datetime(df.expires)
df[['y', 'm', 'd']] = df.d2.str.split('-', expand=True)
Use pd.DateOffset:
df['expires_new'] = df[['expires', 'm']].apply(lambda x: x[0] + pd.DateOffset(months=int(x[1])), axis=1)
If d2 is expected to have more than just a new m (month) value, the lambda expression can be changed to call a function that adjusts for the y, m, and d values.
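Applied to the example data above, the new column should come out as follows (one, two, and three months added respectively; output shown as a sketch):

print(df[['expires', 'd2', 'expires_new']])
#      expires          d2 expires_new
# 0 2019-09-11  0000-01-00  2019-10-11
# 1 2019-10-11  0000-02-00  2019-12-11
# 2 2019-11-11  0000-03-00  2020-02-11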
