replace None with Null, in place - python

I have a requirement to drop my test results into a csv for reporting. In my Python test code, when I don't have a value, my variables are set to None in the usual Python way.
I have been asked to replace these with "Null" in the CSV for the reporting tool. I am thinking this is easy and has probably been solved a hundred times.
Here is the code I came up with:
for field in (TestCase.productUnderTest, TestCase.versionUnderTest, TestCase.name, TestCase.results, TestCase.lastTestEnd, TestCase.parent, TestCase.level):
    if field == None:
        field = 'Null'
ME.csvoutput.write("%s,%s,%s,%s,%s,%s,%s\n" % (TestCase.productUnderTest, TestCase.versionUnderTest, TestCase.name, TestCase.results, TestCase.lastTestEnd, TestCase.parent, TestCase.level))
Unfortunately that only rebinds field within the scope of the for loop. How can I change it for the scope of the write statement?
(I would be quite happy to just write "Null" and leave my variables unchanged, but I can work either way.)

result = [TestCase.productUnderTest, TestCase.versionUnderTest, TestCase.name, TestCase.results, TestCase.lastTestEnd, TestCase.parent, TestCase.level]
result = map(lambda x: 'Null' if x is None else str(x), result)
ME.csvoutput.write(",".join(result) + '\n')

To keep your code, you can try:
for field_name in ('productUnderTest', 'versionUnderTest', 'name', 'results', 'lastTestEnd', 'parent', 'level'):
    if getattr(TestCase, field_name) is None:
        setattr(TestCase, field_name, 'Null')
I also suggest looking at the csv module.
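A minimal sketch of that csv-module suggestion, assuming ME.csvoutput is the already-open output file from the question:
import csv

writer = csv.writer(ME.csvoutput)
fields = [TestCase.productUnderTest, TestCase.versionUnderTest, TestCase.name,
          TestCase.results, TestCase.lastTestEnd, TestCase.parent, TestCase.level]
# swap None for the literal string 'Null' before handing the row to the writer
writer.writerow(['Null' if f is None else f for f in fields])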

Do it like this:
fields = ["Null" if field is None else str(field) for field in (TestCase.productUnderTest, TestCase.versionUnderTest, TestCase.name, TestCase.results, TestCase.lastTestEnd, TestCase.parent, TestCase.level)]
ME.csvoutput.write("%s\n" % ",".join(fields))
Or, even more powerful: use a generator expression instead:
fields = ("Null" if field is None else str(field) for field in (TestCase.productUnderTest, TestCase.versionUnderTest, TestCase.name, TestCase.results, TestCase.lastTestEnd, TestCase.parent, TestCase.level))
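The write call stays the same either way, since str.join accepts a generator just as happily as a list:
ME.csvoutput.write("%s\n" % ",".join(fields))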

You can use the pandas DataFrame method fillna:
DataFrame.fillna(value='NULL', inplace=True)
Example:
import pandas as pd

df = pd.read_csv(csv_file)
df.fillna(value='NULL', inplace=True)
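To hand the result back to the reporting tool, the frame can then be written out again; a minimal sketch, with the output filename as a placeholder:
df.to_csv('report.csv', index=False)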


How to iterate over a CSV file with Pywikibot

I wanted to try uploading a series of items to test.wikidata, creating the item and then adding a statement of inception P571. The csv file sometimes has a date value, sometimes not. When no date value is given, I want to write out a placeholder 'some value'.
Imagine a dataframe like this:
df = {'Object': [1, 2, 3], 'Date': [250, None, 300]}
However, I am not sure how to get Pywikibot to iterate over the csv file, creating an item for each row and adding a statement. Here is the code I wrote:
import pywikibot
import pandas as pd

site = pywikibot.Site("test", "wikidata")
repo = site.data_repository()
df = pd.read_csv('experiment.csv')
item = pywikibot.ItemPage(repo)
for item in df:
    date = df['date']
    prop_date = pywikibot.Claim(repo, u'P571')
    if date == '':
        prop_date.setSnakType('somevalue')
    else:
        target = pywikibot.WbTime(year=date)
        prop_date.setTarget(target)
    item.addClaim(prop_date)
When I run this through PAWS, I get the message: KeyError: 'date'
But I think the real issue here is that I am not sure how to get Pywikibot to iterate over each row of the dataframe and create a new claim for each new date value. I would value any feedback or suggestions for good examples and documentation. Many thanks!
Looking back on this, the solution was to use .iterrows() or .itertuples() or .loc[] to access the values in the row.
So
for row in df.itertuples():
    prop_date = pywikibot.Claim(repo, u'P571')
    if row.Date == '':
        prop_date.setSnakType('somevalue')
    else:
        target = pywikibot.WbTime(year=row.Date)
        prop_date.setTarget(target)
    item.addClaim(prop_date)
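One caveat: pandas usually reads empty CSV cells as NaN rather than the empty string, so the == '' test may never fire; pd.isna is the safer check. A sketch, assuming the column really is named Date as in the example frame:
if pd.isna(row.Date):
    prop_date.setSnakType('somevalue')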

How to replace a string in a list of strings in a DataFrame (Python)?

I have a DataFrame which consists of lists of lists in two separate columns.
import pandas as pd
data = pd.DataFrame()
data["Website"] = [["google.com", "amazon.com"], ["google.com"], ["aol.com", "no website"]]
data["App"] = [["Ok Google", "Alexa"], ["Ok Google"], ["AOL App", "Generic Device"]]
That's what the DataFrame looks like.
I need to replace certain strings in the first column (here: "no website") with the corresponding string in the second column (here: "Generic Device"). The replacing string has the same index in its list as the string that needs to be replaced.
What did not work so far:
I tried several forms of str.replace(x,y) for lists and DataFrames and nothing worked. A simple replace(x,y) does not work as I need to replace several different strings. I think I can't get my head around the indexing thing.
I already googled and stackoverflowed for two hours and haven't found a solution yet.
Many thanks in advance! Sorry for bad English or any noob mistakes, I am still learning.
-Max
Define a replacement function and use apply to run it on each row:
import pandas as pd

def replacements(websites, apps):
    """Substitute any item in websites that appears in replace_items."""
    replace_items = ["no website"]  # can add to this list of keys that trigger replacement
    for i, k in enumerate(websites):
        # check each item in websites for replacement
        if k in replace_items:
            websites[i] = apps[i]  # replace with the corresponding item in apps
    return websites

# Create DataFrame
websites = [["google.com", "amazon.com"], ["google.com"], ["aol.com", "no website"]]
app = [["Ok Google", "Alexa"], ["Ok Google"], ["AOL App", "Generic Device"]]
data = list(zip(websites, app))
df = pd.DataFrame(data, columns=['Websites', 'App'])

# Perform replacement
df['Websites'] = df.apply(lambda row: replacements(row['Websites'], row['App']), axis=1)
print(df)
Output
Websites App
0 [google.com, amazon.com] [Ok Google, Alexa]
1 [google.com] [Ok Google]
2 [aol.com, Generic Device] [AOL App, Generic Device]
Try this: you can define the replaceable values in an array and execute.
def f(x, items):
    for rep in items:
        if rep in list(x.Website):
            x.Website[list(x.Website).index(rep)] = list(x.App)[list(x.Website).index(rep)]
    return x

items = ["no website"]
data = data.apply(lambda x: f(x, items), axis=1)
Output:
Website App
0 [google.com, amazon.com] [Ok Google, Alexa]
1 [google.com] [Ok Google]
2 [aol.com, Generic Device] [AOL App, Generic Device]
First of all, Happy Holidays!
I wasn't really sure what your expected output was and I'm not really sure what you have tried previously, but I think that this may work:
data["Website"] = data["Website"].replace("no website", "Generic Device")
I really hope this helps!
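Note that Series.replace matches whole cell values, so with list-valued cells as in the question it may leave the lists untouched; a row-wise apply along these lines is one way to cover that case:
data["Website"] = data.apply(
    lambda row: [row["App"][i] if w == "no website" else w
                 for i, w in enumerate(row["Website"])],
    axis=1)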
You can create a function like this:
def f(replaced_value, col1, col2):
    def r(s):
        while replaced_value in s[col1]:
            s[col1][s[col1].index(replaced_value)] = s[col2][s[col1].index(replaced_value)]
        return s
    return r
and use apply:
df = df.apply(f("no website", "Website", "App"), axis=1)
print(df)

How to either skip lines or check the type of data within a single construction line when processing csv input into a Python dictionary

My input is a .csv file that happens to have headers.
I want to use a concise line, like this:
mydict = {custID: [parser.parse(str(date)), amount]
          for transID, custID, amount, date in reader}
to create a dictionary from the input. However, the data isn't perfectly "clean". I want to check that each row of data is the sort of data that I want the dictionary to map.
Something like:
mydict = {if custID is type int custID:[parser.parse(str(date)), amount]
for transID, custID, amount, date in reader}
would be a nice fix, but, alas, it does not work.
Any suggestions that keep the short dictionary constructor while facilitating input processing?
I think you are on the right track and filtering with dictionary comprehension should work here:
mydict = {custID: [parser.parse(str(date)), amount]
          for transID, custID, amount, date in reader
          if isinstance(custID, int)}
Note, though, that this silently ignores rows where custID is not of an integer type.
Plus, things would go wrong if custID is not unique. If custIDs can repeat, you might want to switch to a defaultdict(list) collection, collecting date and amount pairs grouped by custID, as sketched below.
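A sketch of that defaultdict variant; note that csv.reader yields every field as a string, so a string test like .isdigit() stands in here for the isinstance check:
from collections import defaultdict

mydict = defaultdict(list)
for transID, custID, amount, date in reader:
    if custID.isdigit():  # csv fields arrive as strings
        mydict[custID].append([parser.parse(str(date)), amount])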
For a similar task, I've personally used the CsvSchema third-party package: you can define which types you expect in the csv columns, plus extra validation rules:
CsvSchema is an easy to use module designed to make CSV file checking
easier. It allows one to create more complex validation rules faster
thanks to some predefined building blocks.
In your case, here is an example CSV structure class you may start with:
from datetime import datetime
from csv_schema.structure.base import BaseCsvStructure
from csv_schema.columns.base import BaseColumn
from csv_schema.exceptions import ImproperValueException
from csv_schema.columns import IntColumn, DecimalColumn, StringColumn

class DateColumn(BaseColumn):
    def convert(self, raw_val):
        try:
            return datetime.strptime(raw_val, '%Y-%m-%d') if raw_val else None
        except ValueError:
            raise ImproperValueException('Invalid date format')

class MyCsvStructure(BaseCsvStructure):
    transID = IntColumn(max_length=10)
    custID = IntColumn(max_length=10)
    amount = DecimalColumn(blank=True, fraction_digits=2)
    date = DateColumn(max_length=10, blank=True)

How do I exclude some responses from a function that pulls data from a spreadsheet?

I am trying to only show the dates to the users of this Python application. For some reason, the code returns values like "Date" and None from the spreadsheet; "Date" is the header of the column that I am trying to draw the dates from. Here is the code:
sh = gc.open("Deposits")
worksheet = sh.worksheet("Sheet2")
values_list = worksheet.col_values(3)
set = set(values_list)
result = list(set)
print "Here are all the possible dates to check:",result
Result:
['3/10/2012', '2/18/2013', '3/18/2011', '3/17/2010', 'Date', None, '2/9/2010']
How do I get this function to only return the dates and exclude 'Date' and 'None'?
Just subtract a set that contains the things you don't want to include.
myset = set(values_list) - {None, 'Date'}
Also, don't use variable names that are already assigned to built-in functions, like set, or you'll run into problems when you want to use that built-in function.
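Put together with the question's code, that looks something like:
values_list = worksheet.col_values(3)
dates = set(values_list) - {None, 'Date'}
result = list(dates)
print "Here are all the possible dates to check:", result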
You can use a list comprehension to get rid of "Date" and None
a = ['3/10/2012', '2/18/2013', '3/18/2011', '3/17/2010', 'Date', None, '2/9/2010']
r = list(set([i for i in a if i not in ("Date", None)]))
['3/10/2012', '2/18/2013', '3/18/2011', '3/17/2010', '2/9/2010']

DataFrame constructor not properly called! error

I am new to Python and I am facing a problem creating the DataFrame in the format of key and value, i.e.:
data = [{'key': '[GlobalProgramSizeInThousands]', 'value': '1000'},]
Here is my code:
columnsss = ['key', 'value']
query = "select * from bparst_tags where tag_type = 1"
result = database.cursor(db.cursors.DictCursor)
result.execute(query)
result_set = result.fetchall()
data = "["
for row in result_set:
    data += "{'value': %s , 'key': %s }," % (`row["tag_expression"]`, `row["tag_name"]`)
data += "]"
df = DataFrame(data, columns=columnsss)
But when I pass the data in DataFrame it shows me
pandas.core.common.PandasError: DataFrame constructor not properly called!
while if I print the data and assign that same printed value to the data variable by hand, then it works.
You are providing a string representation of a dict to the DataFrame constructor, and not a dict itself. So this is the reason you get that error.
So if you want to use your code, you could do:
df = DataFrame(eval(data))
But better would be to not create the string in the first place, but directly putting it in a dict. Something roughly like:
data = []
for row in result_set:
    data.append({'value': row["tag_expression"], 'key': row["tag_name"]})
df = DataFrame(data)
But probably even this is not needed: depending on what exactly is in your result_set, you could
provide it directly to a DataFrame: DataFrame(result_set)
or use the pandas read_sql_query function to do this for you (see the docs on this), as sketched below.
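A sketch of that read_sql_query route, assuming database is the open DB connection from the question and that the table columns are tag_name and tag_expression as used above:
import pandas as pd

df = pd.read_sql_query("select * from bparst_tags where tag_type = 1", database)
# rename to the desired key/value layout
df = df.rename(columns={'tag_name': 'key', 'tag_expression': 'value'})[['key', 'value']]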
Just ran into the same error, but the above answer could not help me.
My code worked fine on my computer which was like this:
test_dict = {'x': '123', 'y': '456', 'z': '456'}
df = pd.DataFrame(test_dict.items(), columns=['col1', 'col2'])
However, it did not work on another platform; it gave me the same error as mentioned in the original question. I tried the code below, simply adding list() around the dictionary items, and it worked smoothly (in Python 3, dict.items() returns a view rather than a list, which some pandas versions do not accept):
df = pd.DataFrame(list(test_dict.items()), columns=['col1', 'col2'])
Hopefully, this answer can help whoever runs into a similar situation.
import json
import pandas as pd

# Open the JSON file and load it as a dictionary
with open('data.json') as f:
    data1 = json.load(f)

# convert it into a dataframe
df = pd.DataFrame.from_dict(data1, orient='index')
