pandas how to replace only empty list data? - python

My dataframe looks like this:
variations_list1 variations_list2
["yellow","ornage"] []
["xl","xxl"] []
["Burger","pizza"] ["$25","$30"]
expected dataframe:
variations_list1 variations_list2
["yellow","ornage"] ["yellow","ornage"] #filling emty list with current row data
["xl","xxl"] ["xl","xxl"]
["Burger","pizza"] ["$25","$30"]

You can just do
df.loc[~df['variations_list2'].astype(bool),'variations_list2'] = df['variations_list1']
If, as in your earlier question, the "lists" are actually strings rather than list objects, compare against the string '[]' instead:
df.loc[df['variations_list2']=='[]','variations_list2'] = df['variations_list1']
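A minimal, self-contained sketch of the first approach, with toy values standing in for the question's data (real list objects in the column, an empty list is falsy so astype(bool) marks it):

```python
import pandas as pd

df = pd.DataFrame({
    "variations_list1": [["yellow", "orange"], ["xl", "xxl"], ["Burger", "pizza"]],
    "variations_list2": [[], [], ["$25", "$30"]],
})

# Rows where variations_list2 is an empty list get variations_list1's value
df.loc[~df["variations_list2"].astype(bool), "variations_list2"] = df["variations_list1"]

print(df["variations_list2"].tolist())
```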


pandas: while loop to simultaneously advance through multiple lists and call functions

I want my code to:
read data from a CSV and make a dataframe: "source_df"
see if the dataframe contains any columns specified in a list:
"possible_columns"
call a unique function to replace the values in each column whose header is found in the "possible_columns" list, then insert the modified values in a new dataframe: "destination_df"
Here it is:
import pandas as pd
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)
#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']
#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
'''use the counter to call a unique function from the function list to replace the values in each column whose header is found in the "possible_columns" list, insert the modified values in "destination_df", then advance the counter'''
counter = 0
while counter < len(possible_columns):
    if possible_columns[counter] in columns:
        destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
        fix_functions_list[counter]
    counter = counter + 1
#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function that is stored in a list as a string.
fix_functions_list[counter]
This will not actually run the function; it just accesses the string value.
I would find another way to run these functions, for example a dict of function objects:
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
fix_functions_list = {0:yes_no_fix,1:true_false_fix}
and change the function call to the following:
fix_functions_list[counter]()
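The difference between storing a function's name as a string and storing the function object itself can be seen in a small runnable sketch. Note this toy version replaces values in place rather than using the question's 'yes/no fixed' columns, which is an assumption for the demo:

```python
import pandas as pd

destination_df = pd.DataFrame({"yes/no": ["Yes", "No"], "true/false": ["True", "False"]})

def yes_no_fix():
    destination_df["yes/no"] = destination_df["yes/no"].replace("No", "0").replace("Yes", "1")

def true_false_fix():
    destination_df["true/false"] = destination_df["true/false"].replace("False", "1").replace("True", "0")

# Store the function objects themselves, not strings of their names
fix_functions_list = {0: yes_no_fix, 1: true_false_fix}

for counter in range(2):
    fix_functions_list[counter]()  # the trailing () actually calls the function

print(destination_df)
```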
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
possible_columns = ['yes/no','true/false']
mapping_dict = {'yes/no': {"No": "0", "Yes": "1"}, 'true/false': {'False': '1', 'True': '0'}}
old_columns = [column for column in source_df.columns if column not in possible_columns]
existed_columns = [column for column in source_df.columns if column in possible_columns]
new_df = source_df[existed_columns]
for column in new_df.columns:
    new_df[column] = new_df[column].map(mapping_dict[column])
new_df[old_columns] = source_df[old_columns]

How to create a dataframe in a for loop?

I want to create a dataframe that consists of values obtained inside the for loop.
columns = ['BIN','Date_of_registration', 'Tax','TaxName','KBK',
'KBKName','Paynum','Paytype', 'EntryType','Writeoffdate', 'Summa']
df = pd.DataFrame(columns=columns)
I have this for loop:
for elements in tree.findall('{http://xmlns.kztc-cits/sign}payment'):
    print("hello")
    tax = elements.find('{http://xmlns.kztc-cits/sign}TaxOrgCode').text
    tax_name_ru = elements.find('{http://xmlns.kztc-cits/sign}NameTaxRu').text
    kbk = elements.find('{http://xmlns.kztc-cits/sign}KBK').text
    kbk_name_ru = elements.find('{http://xmlns.kztc-cits/sign}KBKNameRu').text
    paynum = elements.find('{http://xmlns.kztc-cits/sign}PayNum').text
    paytype = elements.find('{http://xmlns.kztc-cits/sign}PayType').text
    entry_type = elements.find('{http://xmlns.kztc-cits/sign}EntryType').text
    writeoffdate = elements.find('{http://xmlns.kztc-cits/sign}WriteOffDate').text
    summa = elements.find('{http://xmlns.kztc-cits/sign}Summa').text
    print(tax, tax_name_ru, kbk, kbk_name_ru, paynum, paytype, entry_type, writeoffdate, summa)
How can I append acquired values to the initially created(outside for loop) dataframe?
A simple way if you only need the dataframe after the loop is completed is to append the data to a list of lists and then convert to a dataframe. Caveat: Responsibility is on you to make sure the list ordering matches the columns, so if you change your columns in the future you have to reposition the list.
list_of_rows = []
for elements in tree.findall('{http://xmlns.kztc-cits/sign}payment'):
    # ... same .find(...).text extractions as in the question ...
    list_of_rows.append([tax, tax_name_ru, kbk, kbk_name_ru, paynum, paytype, entry_type, writeoffdate, summa])
df = pd.DataFrame(columns=columns, data=list_of_rows)
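With toy records standing in for the XML lookups (column names and values here are illustrative, not from the question's data), the pattern looks like this:

```python
import pandas as pd

columns = ["Tax", "Paynum", "Summa"]

# Stand-in records; in the real code each value comes from elements.find(...).text
payments = [("101", "1", "2500"), ("102", "2", "3000")]

list_of_rows = []
for tax, paynum, summa in payments:
    list_of_rows.append([tax, paynum, summa])  # order must match `columns`

df = pd.DataFrame(columns=columns, data=list_of_rows)
print(df)
```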

How to grab all items from all DataFrames in Pandas

I have an issue where every row in my DataFrame contains more than one item. I would like to iterate through the whole DataFrame and append each row's items to a new list, but I'm unsure how to do this.
IPs
0 [172.16.254.1, 192.168.1.15, 255.255.255.0]
1 [192.0.2.1, 255.255.255.0, 192.0.2.1]
2 [172.16.254.1]
3 [0.0.0.0]
This is my current output, and I would like to take each item per row in the DataFrame and append it to a list.
curled_ips_list = []
ip_addresses_found = []
ip_address_format = (r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')
with open(website_file_path, 'r', encoding='utf-8-sig') as curled_ips_file:
    found_ips_reader = pd.read_csv(curled_ips_file, names=['IPs'], delimiter='\n', quoting=csv.QUOTE_NONE, engine='c')
    found_ips_reader = pd.Series(found_ips_reader['IPs'])
    curled_ips_list = found_ips_reader[found_ips_reader.str.contains(ip_address_format)]
    curled_ips_list = curled_ips_list.str.findall(ip_address_format)
    curled_ips_list = pd.DataFrame(curled_ips_list)
I'm not receiving any error messages as of yet, but I'm unsure how to go about it.
Since you have not mentioned the output you need, I am assuming it is the following.
#Load your IPs in a dataframe as the one that you have mentioned above.
iplist = df['IPs']
[ip for sublist in iplist for ip in sublist]
['172.16.254.1',
'192.168.1.15',
'255.255.255.0',
'192.0.2.1',
'255.255.255.0',
'192.0.2.1',
'172.16.254.1',
'0.0.0.0']
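On pandas 0.25+, Series.explode gives the same flat list without a nested comprehension. A sketch, assuming the IPs column holds real list objects:

```python
import pandas as pd

df = pd.DataFrame({"IPs": [["172.16.254.1", "192.168.1.15"], ["192.0.2.1"], ["0.0.0.0"]]})

# explode turns each list element into its own row, preserving order
flat = df["IPs"].explode().tolist()
print(flat)
```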

dataframe from dict resulting in empty dataframe

Hi, I wrote some code that builds a dictionary:
def makedata(filename):
    with open(filename, "r") as file:
        for x in features:
            previous = []
            count = 0
            for line in file:
                var_name = x
                regexp = re.compile(var_name + r'.*?([0-9.-]+)')
                match = regexp.search(line)
                if match and (match.group(1)) != previous:
                    previous = match.group(1)
                    count += 1
                    if count > wlength:
                        count = 1
                    target = str(str(count) + x)
                    dict.setdefault(target, []).append(match.group(1))
            file.seek(0)
    df = pd.DataFrame.from_dict(dict)
The dictionary looks good, but when I try to convert it to a dataframe, the result is empty. I can't figure it out.
dict:
{'1meanSignalLenght': ['0.5305184', '0.48961428', '0.47203177', '0.5177274'], '1amplCor': ['0.8780955002105448', '0.8634431017504487', '0.9381169983046714', '0.9407036427333355'], '1metr10.angle1': ['0.6439386643584522', '0.6555194964997434', '0.9512436169922103', '0.23789348400794422'], '1syncVar': ['0.1344131181025432', '0.08194580887223515', '0.15922251165913678', '0.28795644612520327'], '1linVelMagn': ['0.07062673289287498', '0.08792496681784517', '0.12603999663935528', '0.14791253129369603'], '1metr6.velSum': ['0.17850601560734558', '0.15855169971072014', '0.21396496345720045', '0.2739525279330513']}
df:
Empty DataFrame
Columns: []
Index: []
{}
I think part of your issue is that you are using the built-in name 'dict' as if it were a variable.
Make a dictionary in your function and call it something other than 'dict'. Have your function return that dictionary, then build the dataframe from the return value. Right now, you are creating a dataframe from an empty dictionary object.
df = pd.DataFrame(dict)
This should make a dataframe from the dictionary.
You can either pass a list of dicts simply using pd.DataFrame(list_of_dicts) (use pd.DataFrame([dict]) if your variable is not a list) or a dict of lists using pd.DataFrame.from_dict(dict). In this last case, dict should be something like dict = {"a": [1, 2, 3], "b": ["a", "b", "c"], "c": ...}.
see: Pandas Dataframe from dict with empty list value
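Both constructors side by side, with a variable named `d` instead of the shadowed built-in (toy values, purely illustrative):

```python
import pandas as pd

# dict of lists: keys become columns, list entries become rows
d = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
df1 = pd.DataFrame.from_dict(d)

# list of dicts: each dict becomes a row, keys become columns
rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
df2 = pd.DataFrame(rows)

print(df1.shape, df2.shape)
```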

Why do square brackets appear in a Pandas DataFrame only when I append to a list?

I have a loop that extracts some info from several files, each containing a list of actions taken in a game. If I run the following code, I get a dataframe with the correct output (the GameId without brackets).
GameId = ['23456']
dfWant = pd.DataFrame({'GameId': GameId})
print(dfWant)
But if I loop through the files and try to add each item to a list, the resulting dataframe contains square brackets. Why does this output differ? And how can I make the code below output the GameId without square brackets in the DataFrame?
GameId = ['12345']
GameIdList = []
GameIdList.append(GameId)
dfHave = pd.DataFrame({'GameId': GameIdList})
print(dfHave)
This works great on Python 3.7. The following solved my problem:
GameId = ['12345']
GameIdList = []
GameIdList.extend(GameId)
a = {'GameId': GameIdList}
dfHave = pd.DataFrame.from_dict(a, orient = 'index')
dfHave = dfHave.transpose()
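The brackets come from append nesting the whole list as a single element, whereas extend adds its items individually. A minimal comparison:

```python
import pandas as pd

GameId = ["12345"]

appended, extended = [], []
appended.append(GameId)  # [['12345']] -> a column of lists, printed with brackets
extended.extend(GameId)  # ['12345']   -> a column of plain strings

dfHave = pd.DataFrame({"GameId": extended})
print(dfHave)
```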
