I have two lists that are zipped together. I am able to print the zipped list to view it, but when I try to export it to a CSV file, the file is created but it's empty. I'm not sure why, as I'm using the same method to save the two lists separately and that works.
import csv
import random
import datetime
import calendar
with open("Duty Name List.csv") as CsvNameList:
    NameList = CsvNameList.read().split("\n")
date = datetime.datetime.now()
MaxNumofDays = calendar.monthrange(date.year,date.month)
print (NameList)
print(date.year)
print(date.month)
print(MaxNumofDays[1])
x = MaxNumofDays[1] + 1
daylist = list(range(1,x))
print(daylist)
ShuffledList = random.sample(NameList,len(daylist))
print(ShuffledList)
RemainderList = set(NameList) - set(ShuffledList)
print(RemainderList)
with open("remainder.csv","w") as f:
    wr = csv.writer(f,delimiter="\n")
    wr.writerow(RemainderList)
AssignedDutyList = zip(daylist,ShuffledList)
print(list(AssignedDutyList))
with open("AssignedDutyList.csv","w") as g:
    wr = csv.writer(g)
    wr.writerow(list(AssignedDutyList))
No error messages are produced.
In Python 3, this line
AssignedDutyList = zip(daylist,ShuffledList)
creates an iterator named AssignedDutyList.
This line
print(list(AssignedDutyList))
exhausts the iterator. When this line is executed
wr.writerow(list(AssignedDutyList))
the iterator has no further output, so nothing is written to the file.
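A minimal standalone demonstration of the same effect, independent of the original data:
pairs = zip([1, 2, 3], ["a", "b", "c"])
print(list(pairs))  # [(1, 'a'), (2, 'b'), (3, 'c')] -- this consumes the iterator
print(list(pairs))  # [] -- a second pass yields nothing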
The solution, in cases where the content of an iterator must be reused, is to store the result of calling list on the iterator in a variable, rather than the iterator itself.
AssignedDutyList = list(zip(daylist,ShuffledList))
print(AssignedDutyList)
with open("AssignedDutyList.csv","w") as g:
    wr = csv.writer(g)
    wr.writerow(AssignedDutyList)
As a bonus, the name AssignedDutyList now refers to an actual list, and so is less confusing for future readers of the code.
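As a further aside (not part of the original question), if the intent is one day/name pair per row rather than the whole month on a single row, csv.writer's writerows method is probably what's wanted; a sketch assuming the same variable names:
with open("AssignedDutyList.csv", "w", newline="") as g:
    wr = csv.writer(g)
    wr.writerows(AssignedDutyList)  # writes each (day, name) tuple as its own row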
I want to run multiprocessing in Python.
Here is an example:
def myFunction(name,age):
    output = paste(name,age)
    return output
names = ["A","B","C"]
ages = ["1","2","3"]
with mp.Pool(processes=no_cpus) as pool:
    results = pool.starmap(myFunction,zip(names,ages))
results_table = pd.concat(results)
results_table.to_csv(file,sep="\t",index=False)
In the real case, myFunction takes a really long time. Sometimes I have to interrupt the run and start again. However, the results are only written to the output file once the whole pool.starmap call is done. How can I store the intermediate/cached results before it finishes?
I don't want to change myFunction from return to .to_csv()
Thanks!
Instead of using starmap, use the method imap, which returns an iterator that, when iterated, gives each result one by one as it becomes available (i.e. is returned by myFunction). However, the results are still returned in order. If you do not care about the order, then use imap_unordered.
As each dataframe is returned and iterated over, it is written out to the CSV file, either with or without a header depending on whether it is the first result being processed.
import pandas as pd
import multiprocessing as mp
def paste(name, age):
    return pd.DataFrame([[name, age]], columns=['Name', 'Age'])
def myFunction(t):
    name, age = t  # unpack passed tuple
    output = paste(name, age)
    return output
# Required for Windows:
if __name__ == '__main__':
names = ["A","B","C"]
ages = ["1","2","3"]
no_cpus = min(len(names), mp.cpu_count())
csv_file = 'test.txt'
with mp.Pool(processes=no_cpus) as pool:
# Results from imap must be iterated
for index, result in enumerate(pool.imap(myFunction, zip(names,ages))):
if index == 0:
# First return value
header = True
open_flags = "w"
else:
header = False
open_flags = "a"
with open(csv_file, open_flags, newline='') as f:
result.to_csv(f, sep="\t", index=False, header=header)
Output of test.txt:
Name Age
A 1
B 2
C 3
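If the order of the results does not matter, the same loop can be written with imap_unordered; a sketch under the same assumptions as the code above:
with mp.Pool(processes=no_cpus) as pool:
    for index, result in enumerate(pool.imap_unordered(myFunction, zip(names, ages))):
        header = index == 0                  # only the first result written gets a header
        open_flags = "w" if header else "a"
        with open(csv_file, open_flags, newline='') as f:
            result.to_csv(f, sep="\t", index=False, header=header)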
X values are being taken from a Y.JSON file. The values in the Y file can change from run to run. I want the X file to keep all values without overwriting the previously saved ones.
# Initialize new dictionary/JSON for X
X = dict ()
with open('X.json', 'w') as f:
    f.write(json.dumps(X))
If these cycles do not represent any value/meaning that you'd like to include in the filename, you could try time-encoding the filenames. That way, you end up with file names that include the time at which each file was saved.
from datetime import datetime
now = datetime.now()
time = now.strftime("%y%m%d%M%S") # choose any format you like
filename = time+'_X.json'
X = dict ()
with open(filename, 'w') as f:
    f.write(json.dumps(X))
For example, if files are being created every 4 seconds, this gives the following filenames:
2106265848_X.json
2106265852_X.json
2106265856_X.json
2106265900_X.json
2106265904_X.json
2106265908_X.json
2106265912_X.json
2106265916_X.json
2106265920_X.json
However, if the cycles (or whatever experiment you are doing) do matter, I would strongly recommend including them in the file name.
e.g.
filename = f"{time}_X_c{cycle}.json"
to end up with results like this:
2106260547_X_c0.json
2106260551_X_c1.json
2106260555_X_c2.json
2106260559_X_c3.json
2106260603_X_c4.json
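Putting the two ideas together, a small helper could build the filename and save in one step. This is only a sketch: save_snapshot is a hypothetical name, and %H is added here to avoid collisions between files saved in different hours.
import json
from datetime import datetime

def save_snapshot(data, cycle):
    # Stamp the filename with the current time and the cycle number (hypothetical helper)
    stamp = datetime.now().strftime("%y%m%d%H%M%S")
    filename = f"{stamp}_X_c{cycle}.json"
    with open(filename, 'w') as f:
        json.dump(data, f)
    return filename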
I have a Python script that looks at a JSON file and writes variables to a CSV. The problem I am having is that latitude and longitude are listed twice. Therefore, when I write the row, it looks at those variables and creates an output with duplicate values.
import csv, json, sys
def find_deep_value(d, key):
    # Modified from https://stackoverflow.com/questions/48568649/convert-json-to-csv-using-python/48569129#48569129
    if key in d:
        return d[key]
    for k in d.keys():
        if isinstance(d[k], dict):
            for j in find_deep_value(d[k], key):
                return j
inputFile = open("pywu.cache.json", 'r') # open json file
outputFile = open("CurrentObs.csv", 'w') # load csv file
data = json.load(inputFile) # load json content
inputFile.close() # close the input file
output = csv.writer(outputFile) # create a csv.write
# Gives you latitude coordinates from within the json
lat = find_deep_value(data, "latitude")
# Gives you longitude coordinates from within the json
lon = find_deep_value(data, "longitude")
# Gives you a list of weather from within the json
weather = find_deep_value(data, "weather")
# Gives you a list of temperature_strings from within the json
temp = find_deep_value(data, "temperature_string")
output.writerow([lat, lon, weather, temp])
outputFile.close()
Is there a way to only list them once?
You need to use return rather than yield. Yield is for generators. Once you fix that, you'll also need to change
list(find_deep_value(data, "latitude"))
to
find_deep_value(data, "latitude")
for each of those lines. And finally, change
output.writerow(lat + lon + weather + temp)
to
output.writerow([lat, lon, weather, temp])
What's happening (you might want to read up on generators first) is that when the key is not in the top-level dictionary, you start looping through the nested dictionaries, and because the function contains the yield keyword, calling it returns a generator object rather than a single value. You have that generator wrapped in list(), which immediately unpacks the entire generator into a list. So if you have more than one sub-dictionary containing the given key, you end up looking through and finding every single one.
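For reference, here is one way to write the whole function with return-based recursion; this is a sketch, and it returns None when the key is absent anywhere in the nesting:
def find_deep_value(d, key):
    # Return the first value found for `key`, searching nested dictionaries depth-first
    if key in d:
        return d[key]
    for v in d.values():
        if isinstance(v, dict):
            found = find_deep_value(v, key)
            if found is not None:
                return found
    return None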
I have a function, which takes a CSV and processes it. I am trying to count the rows in the CSV file before running through it to ensure I get to the end.
def parse_campaigns_cutsheet(country_code, input_file, DB, DB_hist, zip_id):
    filecontent = urllib2.urlopen(input_file)
    count = 0
    csvFile = csv.DictReader(filecontent)
    rowage = len(list(csvFile))
    for row in csvFile:
        count += 1
        if 'MM' not in row['Tier']:
            continue
        if RB_COUNTRIES_new[country_code]['cut_sheet_country'] != row['Country']:
            continue
        document = DB.find_one({'rb_account_id': RB_COUNTRIES_new[country_code]['rb_account_id']})
        if document is None:
            continue
    DB.save(document)
    report_work(document, DB, DB_hist)
I keep getting the following error: UnboundLocalError: local variable 'document' referenced before assignment. If I remove the rowage = len(list(csvFile)) line, it works fine. Why is that?
This happens because the DictReader is a one-shot iterator, much like a generator.
When you call list on the DictReader, it yields all of its values into that list and is then exhausted; it can't be iterated again.
That's why, when your for loop tries to iterate over it, it gets nothing, and document is never assigned.
If you want to accomplish what you're trying to do, you can keep a reference to the list and then iterate over the list:
...
csvFile = csv.DictReader(filecontent)
filecontent_list = list(csvFile)
rowage = len(filecontent_list)
for row in filecontent_list:
    ...
Keep in mind: this means that all your data will be held in memory!
When iterating over the reader without forcing it into a list, only one row is held in memory at a time.
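If holding every parsed row is a concern, an alternative sketch (assuming the file still fits comfortably in memory as plain text) keeps only the raw lines and builds the DictReader from them, since DictReader accepts any iterable of lines:
raw_lines = filecontent.read().splitlines()
rowage = max(len(raw_lines) - 1, 0)   # subtract the header line
csvFile = csv.DictReader(raw_lines)   # DictReader accepts any iterable of strings
for row in csvFile:
    ...  # process each row as before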
I'm attempting to:
load dictionary
update/change the dictionary
save
(repeat)
Problem: I want to work with just 1 dictionary (players_scores)
but the defaultdict expression creates a completely separate dictionary.
How do I load, update, and save to one dictionary?
Code:
from collections import defaultdict  # for manipulating dict
players_scores = defaultdict(dict)
import ast  # module for removing string from dict once it's called back
a = {}
open_file = open("scores", "w")
open_file.write(str(a))
open_file.close()
open_file2 = open("scores")
open_file2.readlines()
open_file2.seek(0)
i = input("Enter new player's name: ").upper()
players_scores[i]['GOLF'] = 0
players_scores[i]['MON DEAL'] = 0
print()
scores_str = open_file2.read()
players_scores = ast.literal_eval(scores_str)
open_file2.close()
print(players_scores)
You are wiping your changes: instead of writing out your file, you read it anew and use the result to replace your players_scores dictionary. Your defaultdict worked just fine before that, even though you can't really use a defaultdict here (ast.literal_eval() does not support collections.defaultdict, only standard Python literal dict notation).
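A quick illustration of that limitation (a standalone sketch, not part of the original code):
import ast
from collections import defaultdict

plain = {'A': {'GOLF': 0}}
print(ast.literal_eval(str(plain)))   # works: standard dict literal notation

nested = defaultdict(dict, plain)
try:
    ast.literal_eval(str(nested))     # repr is "defaultdict(<class 'dict'>, ...)", not a literal
except (ValueError, SyntaxError) as exc:
    print("literal_eval rejected it:", exc)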
You can simplify your code by using the json module here:
import json
try:
    with open('scores', 'r') as f:
        players_scores = json.load(f)
except IOError:
    # no such file, create an empty dictionary
    players_scores = {}

name = input("Enter new player's name: ").upper()

# create a complete, new dictionary
players_scores[name] = {'GOLF': 0, 'MON DEAL': 0}

with open('scores', 'w') as f:
    json.dump(players_scores, f)
You don't need defaultdict here at all; you are only creating a new dictionary for every player name anyway.
I think one problem is that, to index the data structure the way you want, something like a defaultdict(defaultdict(dict)) is what's really needed, but unfortunately it's impossible to specify one directly like that. However, to work around that, all you need to do is define a simple intermediary factory function to pass to the upper-level defaultdict:
from collections import defaultdict
def defaultdict_factory(*args, **kwargs):
""" Create and return a defaultdict(dict). """
return defaultdict(dict, *args, **kwargs)
Then you can use players_scores = defaultdict(defaultdict_factory) to create one.
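As an aside (not required by the answer), functools.partial can serve as the same inner factory without a named helper:
from collections import defaultdict
from functools import partial

# partial(defaultdict, dict)() returns a defaultdict(dict), so it works as the default_factory
players_scores = defaultdict(partial(defaultdict, dict))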
However, ast.literal_eval() won't work with one that's been converted to its string representation, because it's not one of the simple literal data types the function supports. Instead, I would suggest you consider using Python's venerable pickle module, which can handle most of Python's built-in data types as well as nested structures like the one described here. Here's an example of applying it to your code (in conjunction with the code above):
import pickle
try:
    with open('scores', 'rb') as input_file:
        players_scores = pickle.load(input_file)
except FileNotFoundError:
    print('new scores file will be created')
    players_scores = defaultdict(defaultdict_factory)
player_name = input("Enter new player's name: ").upper()
players_scores[player_name]['GOLF'] = 0
players_scores[player_name]['MON DEAL'] = 0
# below is a shorter way to do the initialization for a new player
# players_scores[player_name] = defaultdict_factory({'GOLF': 0, 'MON DEAL': 0})
# write new/updated data structure (back) to disk
with open('scores', 'wb') as output_file:
    pickle.dump(players_scores, output_file)
print(players_scores)