I have code that generates a list of 28 dictionaries. It cycles through 28 files and links data points from each file in the appropriate dictionary. To make my code more flexible, I wanted to use:
tegDics = [dict() for x in range(len(files))]
But when I run the code the first 27 dictionaries are blank and only the last, tegDics[27], has data. Below is the code including the clumsy, yet functional, code I'm having to use that generates the dictionaries:
x=0
import os
files=os.listdir("DirPath")
os.chdir("DirPath")
tegDics = [{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}] # THIS WORKS!!!
#tegDics = [dict() for x in range(len(files))] - THIS WON'T WORK!!!
allRads=[]
while x<len(tegDics): # now builds dictionaries
    for line in open(files[x]):
        z=line.split('\t')
        allRads.append(z[2])
        tegDics[x][z[2]]=z[4] # pairs catNo with locNo
    x+=1
Does anybody know why the more elegant code doesn't work?
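Since you're using x within the list comprehension, it will no longer be zero by the time you reach the while loop; it will be len(files)-1 instead. (This applies to Python 2, where a list comprehension's loop variable leaks into the enclosing scope; Python 3 gives the comprehension its own scope.) I suggest changing the variable you use to something else. It's traditional to use a single underscore for a value you don't care about.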
tegDics = [dict() for _ in range(len(files))]
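The clash described above is specific to Python 2. As a minimal sketch of what Python 3 does with the same code (the three-element files list is a made-up stand-in for the real directory listing):

```python
# Hypothetical stand-in for os.listdir("DirPath")
files = ["a.txt", "b.txt", "c.txt"]

x = 0
tegDics = [dict() for x in range(len(files))]

# In Python 3 the comprehension has its own scope, so the outer x is untouched.
# (In Python 2 the same line would leave x == len(files) - 1.)
print(x)

# Using _ states that the value is unused and avoids the clash in either version.
tegDics = [dict() for _ in range(len(files))]
print(len(tegDics))
```

Either way, the underscore version is the safer habit, since it does not depend on which interpreter runs the script.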
It could be useful to eliminate your use of x entirely. It's customary in Python to iterate directly over the objects in a sequence, rather than using a counter variable. You might do something like:
for tegDic in tegDics:
    #do stuff with tegDic here
It's slightly trickier in your case, since you want to iterate through tegDics and files at the same time. You can use zip to do that.
import os
files=os.listdir("DirPath")
os.chdir("DirPath")
tegDics = [dict() for _ in range(len(files))]
allRads=[]
for file, tegDic in zip(files,tegDics):
    for line in open(file):
        z=line.split('\t')
        allRads.append(z[2])
        tegDic[z[2]]=z[4] # pairs catNo with locNo
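For anyone who wants to try the zip approach without the original data, here is a self-contained sketch. The build_dicts helper, the file names, and the sample rows are all made up for the demo, and it differs from the code above in three small ways: it joins paths instead of calling os.chdir, it closes each file with a with block, and it strips the trailing newline so it does not end up in the last column.

```python
import os
import tempfile

def build_dicts(dir_path):
    """Return (tegDics, allRads) for every tab-separated file in dir_path."""
    files = sorted(os.listdir(dir_path))
    tegDics = [dict() for _ in files]
    allRads = []
    for name, tegDic in zip(files, tegDics):
        # Build the full path instead of changing the working directory
        with open(os.path.join(dir_path, name)) as handle:
            for line in handle:
                z = line.rstrip("\n").split("\t")
                allRads.append(z[2])
                tegDic[z[2]] = z[4]  # pairs catNo with locNo
    return tegDics, allRads

# Throwaway demo data: two tiny files with five tab-separated columns each
with tempfile.TemporaryDirectory() as demo_dir:
    for name, rows in [("one.txt", ["a\tb\tcat1\td\tloc1"]),
                       ("two.txt", ["a\tb\tcat2\td\tloc2"])]:
        with open(os.path.join(demo_dir, name), "w") as out:
            out.write("\n".join(rows) + "\n")
    tegDics, allRads = build_dicts(demo_dir)

print(tegDics)
print(allRads)
```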
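A shorter form is tempting, but be careful: it puts the same dictionary object in every slot of the list, so data written through one index shows up in all of them, and it is not a drop-in replacement for the comprehension:
tegDics = [{}]*len(files)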
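One caveat with list multiplication is worth checking before relying on it: it repeats references to a single dictionary rather than creating new ones. A small sketch, with n standing in for len(files):

```python
n = 3  # stand-in for len(files)

shared = [{}] * n                      # n references to ONE dict
separate = [dict() for _ in range(n)]  # n distinct dicts

shared[0]["key"] = "value"
separate[0]["key"] = "value"

print(shared)    # every slot shows the new key
print(separate)  # only the first slot changed
print(shared[0] is shared[1], separate[0] is separate[1])
```

For per-file dictionaries, the comprehension is therefore the form to use.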
import csv
def read_prices(tikrList):
    #read each file and get the price list dictionary
    def getPriceDict():
        priceDict = {}
        TLL = len(tikrList)
        for x in range(0,TLL):
            with open(tikrList[x] + '.csv','r') as csvFile:
                csvReader = csv.reader(csvFile)
                for column in csvReader:
                    priceDict[column[0]] = float(column[1])
        return priceDict
    #populate the final dictionary with the price dictionary from the previous function
    def popDict():
        combDict = {}
        TLL = len(tikrList)
        for x in range(0,TLL):
            for y in tikrList:
                combDict[y] = getPriceDict()
        return combDict
    return(popDict())
print(read_prices(['GOOG','XOM','FB']))
What is wrong with the code is that when I return the final dictionary, the keys GOOG, XOM and FB all map to the values from the FB dictionary only.
As you can see with this output:
{'GOOG': {'2015-12-31': 104.660004, '2015-12-30': 106.220001},
 'XOM': {'2015-12-31': 104.660004, '2015-12-30': 106.220001},
 'FB': {'2015-12-31': 104.660004, '2015-12-30': 106.220001}}
I have 3 different CSV files, but every key just ends up with the data from the CSV file for FB.
I want to apologize ahead of time if my code is not easy to read or doesn't make sense. I think there is an issue with storing the values and returning the priceDict in the getPriceDict function, but I can't seem to figure it out.
Any help is appreciated, thank you!
Since this is classwork I won't provide a solution but I'll point a few things out.
You have defined three functions, two of which are defined inside the third. While structuring functions like that can make sense for some problems, I don't see any benefit in your solution. It seems to make it more complicated.
The two inner functions don't have any parameters; you might want to refactor them so that when they are called you pass them the information they need. One advantage of a function is that it encapsulates an idea or process in a self-contained code block that doesn't rely on resources external to itself. This makes it easy to test, so you know that the function works and you can concentrate on other parts of the code.
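To illustrate the point about parameters without touching the assignment itself, here is a generic sketch. The function name parse_rows and the sample rows are made up for the example; the only idea it is meant to show is that a function which receives its input as an argument can be tested on its own, with no files involved.

```python
def parse_rows(rows):
    """Turn rows like ['2015-12-31', '104.66'] into a {date: price} dict."""
    return {row[0]: float(row[1]) for row in rows}

# The function depends only on its argument, so a test needs no CSV file.
sample = [["2015-12-31", "104.66"], ["2015-12-30", "106.22"]]
prices = parse_rows(sample)
print(prices)
```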
This piece of your code doesn't make much sense - it never uses x from the outer loop:
...
for x in range(0,TLL):
    for y in tikrList:
        combDict[y] = getPriceDict()
When you iterate over a list, the iteration yields the items themselves and stops after the last one, so there is no need to iterate over numbers to access the items. Don't do for i in range(len(thelist)): print(thelist[i])
>>> tikrList = ['GOOG','XOM','FB']
>>> for name in tikrList:
...     print(name)
...
GOOG
XOM
FB
>>>
When you read through a tutorial or the documentation, don't just look at the examples; read and understand the text.
I'm new to Python and I'm trying to write a brief script. I want to run a loop in which I read many files and run a calculation on each one. In particular, I want to do a calculation across the two rows of every file and return an output whose name refers to the corresponding file.
I was able to load the files into a list ('work'). I wrote the single loop for the calculation on one of the files in the list and it runs correctly. The problem is that I'm not able to iterate it over all the files and obtain each 'integr' value from the corresponding file.
Let me show what I tried to do:
import numpy as np
#I'm loading the files that contain the values with which I want to do my calculation in a loop
work = {}
for i in range(0,100):
    work[i] = np.loadtxt('work{}.txt'.format(i), float).T
#Now I'm trying to write a double loop in which I want to iterate the second loop (the calculation) over the files (that don't have the same length) in the list
integr = 0
for k in work:
    for i in range(1, len(k[1,:])):
        integr = integr + k[1,i]*(k[0,i] - k[0,i-1])
    #I would like to print every 'integr' which comes from the calculation over each file
    print(integr)
When I try to run this, I get this error message:
Traceback (most recent call last):
  File "lavoro.py", line 11, in <module>
    for i in range(1, len(k[1,:])):
TypeError: 'int' object has no attribute '__getitem__'
Thank you in advance.
I'm guessing a bit, but if I understood correctly, you want work to be a list and not a dictionary. Even if that wasn't your intent, a list works just as well as a dictionary in this context.
This is how you can create your work list:
work = []
for i in range(0,100):
    work.append(np.loadtxt('work{}.txt'.format(i), float).T)
Or using the equivalent list comprehension of the above loop (usually the list comprehension is faster):
work = [np.loadtxt('work{}.txt'.format(i), float).T for i in range(100)]
Now you can loop over the work list to do your calculations (I assume they are correct, no way for me to check this):
for k in work:
    integr = 0
    for i in range(1, len(k[1,:])):
        integr = integr + k[1,i]*(k[0,i] - k[0,i-1])
Note that I moved integr = 0 inside the loop, so that it is reinitialized to 0 for each file; otherwise each inner loop would add to the result of the previous ones.
However, if that was the desired behaviour, move integr = 0 back outside the loop as in your original code.
Guessing from the context, you wanted:
for k in work.values():
Iterating over a dictionary produces only its keys, not its values.
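A small sketch of the difference, using plain lists as a made-up stand-in for the loaded arrays so that it runs without any data files:

```python
work = {0: [1.0, 2.0], 1: [3.0, 4.0, 5.0]}  # hypothetical stand-in for the loaded arrays

keys_seen = [k for k in work]             # iterating the dict yields the keys
values_seen = [v for v in work.values()]  # .values() yields the stored objects

# .items() gives both, which helps when each result should be labelled per file
totals = {i: sum(v) for i, v in work.items()}
print(keys_seen)
print(totals)
```

This is why the original loop failed: k was an int key, and an int cannot be indexed with k[1,:].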
I have hundreds of dataframes, let's say named df1, ..., df250, and I need to build a list from one column of each dataframe. Usually I do this manually, but today there is too much data and doing it by hand is too prone to mistakes.
Here's what I did
list1 = df1['customer_id'].tolist()
list2 = df2['customer_id'].tolist()
..
list250 = df250['customer_id'].tolist()
This is so manual. Is there an easier way to do this?
The easier way is to take a step back and make sure you put your dataframes in a collection such as list or dict. You can then perform operations easily in a scalable way.
For example:
dfs = {1: df1, 2: df2, 3: df3, ... , 250: df250}
lists = {k: v['customer_id'].tolist() for k, v in dfs.items()}
You can then access the results as lists[1], lists[2], etc.
There are other benefits. For example, you are no longer polluting the namespace, you save the effort of explicitly defining variable names, you can easily store and transport related collections of objects.
Using the exec function lets you execute Python code stored in a string:
for i in range(1,251):
    s = "list"+str(i)+" = df"+str(i)+"['customer_id'].tolist()"
    exec(s)
I'd use the following code. In this case there's no need to manually create a list of DataFrames.
cust_lists = {'list{}'.format(i): globals()['df{}'.format(i)]['customer_id'].tolist()
              for i in range(1, 251)}
Now you can access your lists from the cust_lists dict by name, like this:
`cust_lists['list1']`
or
`list1`
I have a set of filenames coming from two different directories.
currList=set(['pathA/file1', 'pathA/file2', 'pathB/file3', etc.])
My code processes the files and needs to update currList
by comparing it to its content at the previous iteration, say processList.
For that, I compute a symmetric difference:
toProcess=set(currList).symmetric_difference(set(processList))
Actually, I need the symmetric_difference to operate on the basename (file1...) not
on the complete filename (pathA/file1).
I guess I need to reimplement the __eq__ operator, but I have no clue how to do that in Python.
Is reimplementing __eq__ the right approach?
Or
is there another better/equivalent approach?
Here is a token (and likely poorly constructed) itertools version that should run a little bit faster if speed ever becomes a concern (although I agree that @Zarkonnen's one-liner is pretty sweet, so +1 there :) ).
from itertools import ifilter  # Python 2 only; in Python 3 use the built-in filter instead
currList = set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList=set(['pathA/file1', 'pathA/file9', 'pathA/file3'])
# This can also be a lambda inside the map functions - the speed stays the same
def FileName(f):
    return f.split('/')[-1]
# diff will be a set of filenames with no path that will be checked during
# the ifilter process
curr = map(FileName, list(currList))
process = map(FileName, list(processList))
diff = set(curr).symmetric_difference(set(process))
# This filters out any elements from the symmetric difference of the two sets
# where the filename is not in the diff set
results = set(ifilter(lambda x: x.split('/')[-1] in diff,
              currList.symmetric_difference(processList)))
You can do this with the magic of generator expressions.
def basename(x):
    return x.split("/")[-1]
result = set(x for x in set(currList).union(set(processList)) if (basename(x) in [basename(y) for y in currList]) != (basename(x) in [basename(y) for y in processList]))
should do the trick. It gives you all the elements X that appear in one list or the other, and whose basename-presence in the two lists is not the same.
Edit:
Running this with:
currList=set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList=set(['pathA/file1', 'pathA/file9', 'pathA/file3'])
returns:
set(['pathA/file2', 'pathA/file9'])
which would appear to be correct.
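As a further sketch for Python 3, the same result can be reached without redefining __eq__ by keying each set on its basename. This is one possible approach rather than the one from the answers above, and it assumes that basenames are unique within each set; the names curr_by_name, proc_by_name, and changed are made up for the example.

```python
import os.path

currList = {'pathA/file1', 'pathA/file2', 'pathB/file3'}
processList = {'pathA/file1', 'pathA/file9', 'pathA/file3'}

# Map basename -> full path (assumes basenames are unique within each set)
curr_by_name = {os.path.basename(p): p for p in currList}
proc_by_name = {os.path.basename(p): p for p in processList}

# Symmetric difference on the basenames only (dict key views support set operators)
changed = curr_by_name.keys() ^ proc_by_name.keys()

# Translate the surviving basenames back to full paths
toProcess = {curr_by_name.get(name, proc_by_name.get(name)) for name in changed}
print(sorted(toProcess))
```

Building the two lookup dicts once avoids recomputing the basename lists for every element, which is what the one-liner does.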
I'm doing some exploring of various languages I hadn't used before, using a simple Perl script as a basis for what I want to accomplish. I have a couple of versions of something, and I'm curious which is the preferred method when using Python -- or if neither is, what is?
Version 1:
workflowname = []
paramname = []
value = []
for line in lines:
    wfn, pn, v = line.split(",")
    workflowname.append(wfn)
    paramname.append(pn)
    value.append(v)
Version 2:
workflowname = []
paramname = []
value = []
i = 0;
for line in lines:
    workflowname.append("")
    paramname.append("")
    value.append("")
    workflowname[i], paramname[i], value[i] = line.split(",")
    i = i + 1
Personally, I prefer the second, but, as I said, I'm curious what someone who really knows Python would prefer.
A Pythonic solution might look a bit like @Bogdan's, but using zip and argument unpacking:
workflowname, paramname, value = zip(*[line.split(',') for line in lines])
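A short sketch of what the zip line produces, using two made-up input lines. The strip() call is an addition not present in the line above; it keeps the trailing newline out of the last field.

```python
lines = ["wf1,alpha,1\n", "wf2,beta,2\n"]  # made-up sample input

# strip() drops the trailing newline so it doesn't end up in the last field
rows = [line.strip().split(",") for line in lines]

# zip(*rows) transposes rows into columns; each column comes back as a tuple
workflowname, paramname, value = zip(*rows)
print(workflowname, paramname, value)
```

Note that the results are tuples rather than lists, and that the three-way unpacking fails if lines is empty, so the explicit loop may still be the better fit when the input can be empty.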
If you're determined to use a for construct, though, the 1st is better.
Of your two attempts, the 2nd one doesn't make any sense to me. Maybe in other languages it would. So of your two proposed approaches, the 1st one is better.
Still I think the pythonic way would be something like Matt Luongo suggested.
Bogdan's answer is best. In general, if you need a loop counter (which you don't in this case), you should use enumerate instead of incrementing a counter:
for index, value in enumerate(lines):
    # do something with the value and the index
Version 1 is definitely better than version 2 (why put something in a list if you're just going to replace it?) but depending on what you're planning to do later, neither one may be a good idea. Parallel lists are almost never more convenient than lists of objects or tuples, so I'd consider:
# list of (workflow,paramname,value) tuples
items = []
for line in lines:
    items.append( line.split(",") )
Or:
class WorkflowItem(object):
    def __init__(self,workflow,paramname,value):
        self.workflow = workflow
        self.paramname = paramname
        self.value = value
# list of objects
items = []
for line in lines:
    items.append( WorkflowItem(*line.split(",")) )
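A lighter-weight variant of the same idea, offered as a sketch rather than as part of the answer above, is collections.namedtuple from the standard library. It gives named attribute access without a hand-written __init__; the sample lines are made up.

```python
from collections import namedtuple

# Field names mirror the WorkflowItem class above
WorkflowItem = namedtuple("WorkflowItem", ["workflow", "paramname", "value"])

lines = ["wf1,alpha,1", "wf2,beta,2"]  # made-up sample input
items = [WorkflowItem(*line.split(",")) for line in lines]

print(items[0].workflow, items[1].value)
```

The trade-off is that namedtuple instances are immutable, so the explicit class remains the better choice if the fields need to change after creation.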
(Also, nitpick: 4-space tabs are preferable to 8-space.)