python - write formatted file without fixed number of variables to print

I have these lines of code that create a list (with a varying number of elements in it), and I want to write them to an output file. The problem is that in
outfile.write('%i ?????' % (bn, crealines[bn]))
I don't know how to write the format string, since the number of values varies from line to line.
Is there any way to write output with a different number of columns on each line?
*I looked at this: Increasing variables and numbers by one each time (python) ... but in my case they don't increase one-by-one.
Also, can I print a list without the brackets?
The code is like this:
# In this case I am creating a "cube" (matrix) of 3x3x3
nx = ny = nz = 3
vec = []
crealines = []
outfile = open('test.txt', 'a')
for bn in arange(nx*ny*nz):
    vec = neighboringcubes(bn,nx,ny,nz) #this is a defined function to see which cubes are neighbors to the cube "bn"
    crealines.append(vec)
    print bn, crealines[bn]
    outfile.write('%i, %i ....' % (bn, crealines[bn]))
outfile.close()
using print it gives me this (which is correct):
0 0 0 <---- this is the output from function neighboringcubes() -which I don't need-
0 [1, 3, 9] <---- THIS IS WHAT I WANT WRITTEN IN THE OUTPUTFILE
1 0 0
1 [2, 0, 4, 10]
2 0 0
2 [1, 5, 11]
0 1 0
3 [4, 6, 0, 12]
1 1 0
4 [5, 3, 7, 1, 13] <--- BUT YOU CAN SEE IT CHANGES
2 1 0
5 [4, 8, 2, 14]
0 2 0
6 [7, 3, 15]
1 2 0
7 [8, 6, 4, 16]
2 2 0
8 [7, 5, 17]
0 0 1
9 [10, 12, 18, 0]
1 0 1
10 [11, 9, 13, 19, 1]
...
I want the outfile to have in the first column the number of the cube, and the following columns -from lower to higher- the neighbors; like this:
0 1 3 9
1 0 2 4 10
2 1 5 11
3 0 4 6 12
4 1 3 5 7 13
5 2 4 8 14
6 3 7 15
7 4 6 8 16
8 5 7 17
9 0 10 12 18
...

Your question isn't quite clear to me, but I believe you want to print the variable bn followed by its neighbors in sorted order. If so, this code snippet illustrates how to do that:
>>> bn = 5
>>> neighbors = [10, 12, 2, 4]
>>> print bn, ' '.join(map(str, sorted(neighbors)))
Which results in this output:
5 2 4 10 12
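The same join-based formatting can be used for the file itself. Below is a minimal sketch in Python 3 syntax, assuming the neighbor lists have already been computed; format_row is a hypothetical helper name, the sample lists are copied from the question's printed output, and io.StringIO stands in for the real open('test.txt', 'a') so the snippet runs without touching the disk.

```python
import io


def format_row(cube, neighbors):
    """Return 'cube n1 n2 ...' with the neighbors sorted in ascending order."""
    return " ".join(str(value) for value in [cube] + sorted(neighbors))


# Sample neighbor lists taken from the question's printed output.
crealines = [[1, 3, 9], [2, 0, 4, 10], [1, 5, 11]]

outfile = io.StringIO()  # stand-in for open('test.txt', 'a')
for bn, neighbors in enumerate(crealines):
    # One line per cube, however many neighbors it has.
    outfile.write(format_row(bn, neighbors) + "\n")

written = outfile.getvalue()
print(written, end="")
```

Because join accepts any number of strings, no per-line format string is needed.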

A few suggestions, depending on exactly what you want (for this data they all give the same output, but they may behave differently on other data):
bn = 5
neighbours = [8, 12, -1, 4]
print "{} [{}]".format(bn, ', '.join(map(str, sorted(neighbours))))
print bn, repr(sorted(neighbours))
print bn, str(sorted(neighbours))
output:
5 [-1, 4, 8, 12]
5 [-1, 4, 8, 12]
5 [-1, 4, 8, 12]

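My first idea was that you could just use print bn, crealines[bn].sort(), but list.sort() sorts the list in place and returns None, so that would print None instead of the neighbors; print bn, sorted(crealines[bn]) avoids that. (I can't test your code. Where is the function arange imported from?)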
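A quick check of how list.sort() and sorted() differ, as a small Python 3 sketch that reuses the neighbor values from the earlier answer:

```python
neighbors = [10, 12, 2, 4]

# list.sort() sorts in place and returns None, so its return value
# is not something you want to print.
copy_of_neighbors = neighbors[:]
result = copy_of_neighbors.sort()

# sorted() leaves the input alone and returns a new sorted list.
ordered = sorted(neighbors)

print(result)   # None
print(ordered)  # [2, 4, 10, 12]
```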

Thanks #jaime!
This solved the format problem:
print "{} {}".format(bn, ' '.join(map(str,crealines[bn])))
I sorted the neighbors beforehand with vec=sorted(neighbors), so crealines[bn] is already sorted.
The output looks like this
5 2 4 8 14

Related

Turn list clockwise for one time

How can I rotate a list clockwise one time? I have a temporary solution, but I'm sure there is a better way to do it.
I want to get from this
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 2 4 4 5 6 6 7 7 7
to this:
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 0 2 4 4 5 6 6 7 7
And my temporary "solution" is just:
temporary = [0, 2, 4, 4, 5, 6, 6, 7, 7, 7]
test = [None] * len(temporary)
test[0] = temporary[0]
for index in range(1, len(temporary)):
    test[index] = temporary[index - 1]
You might use temporary.pop() to discard the last item and temporary.insert(0, 0) to add 0 to the front.
Alternatively in one line:
temporary = [0] + temporary[:-1]
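Note that the desired output is a right shift that repeats the first element, not a true rotation (a rotation would wrap the last 7 around to the front). A small Python 3 sketch contrasting the two, where the true rotation uses collections.deque.rotate from the standard library:

```python
from collections import deque

temporary = [0, 2, 4, 4, 5, 6, 6, 7, 7, 7]

# Shift right by one: repeat the first element and drop the last one.
shifted = temporary[:1] + temporary[:-1]

# True rotation: the last element wraps around to the front.
rotating = deque(temporary)
rotating.rotate(1)
rotated = list(rotating)

print(shifted)  # [0, 0, 2, 4, 4, 5, 6, 6, 7, 7]
print(rotated)  # [7, 0, 2, 4, 4, 5, 6, 6, 7, 7]
```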

How can I compare two data frames in pandas and update values based on keys?

I have two data frames and I want to use pandas syntax or methods to compare them and update values from the larger data frame to the smaller data frame based on similar keys.
import numpy
import pandas as pd
temp = pd.read_csv('.\\..\\..\\test.csv')
temp2 = pd.read_excel('.\\..\\..\\main.xlsx')
lenOfFile = len(temp.iloc[:, 1])
lenOfFile2 = len(temp2.iloc[:, 1])
dict1 = {}
dict2 = {}
for i in range(lenOfFile):
    dict1[temp.iloc[i, 0]] = temp.iloc[i, 1]
for i in range(lenOfFile2):
    dict2[temp2.iloc[i, 0]] = temp2.iloc[i, 1]
for i in dict1:
    if i in dict2:
        dict1[i] = dict2[i]
    else:
        dict1[i] = "Not in dict2"
I want the same behavior as what I wrote.
You should have included a Minimal, Complete and Verifiable Example. In the future, please make sure we can run your code just by pasting it into our IDE. I spent way too much time on this question, haha.
import pandas as pd
temp = pd.DataFrame({'A' : [20, 4, 60, 4, 8], 'B' : [2, 4, 5, 6, 7]})
temp2 = pd.DataFrame({'A' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'B' : [1, 2, 3, 10, 5, 6, 70, 8, 9, 10]})
print(temp)
print(temp2)
# A B
# 0 20 2
# 1 4 4
# 2 60 5
# 3 4 6
# 4 8 7
# A B
# 0 1 1
# 1 2 2
# 2 3 3
# 3 4 10
# 4 5 5
# 5 6 6
# 6 7 70
# 7 8 8
# 8 9 9
# 9 10 10
# Build a mapping from the keys (column A) of the second frame to its values (column B).
mapping = dict(zip(temp2['A'], temp2['B']))
# Apply the mapping to each row: use the mapped value if the key exists, otherwise a default.
temp['B'] = temp['A'].apply(lambda x: mapping[x] if x in mapping else 'No matching')
print(temp)
# A B
# 0 20 No matching
# 1 4 10
# 2 60 No matching
# 3 4 10
# 4 8 8
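The lookup-with-default inside the lambda is what dict.get does. Here is a plain-Python sketch of just that step, using the same sample values as above and no pandas so it runs anywhere; the variable names keys and updated are only illustrative.

```python
# Keys from the smaller frame (column A of temp).
keys = [20, 4, 60, 4, 8]

# Key -> value mapping built from the larger frame (columns A and B of temp2).
mapping = dict(zip([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   [1, 2, 3, 10, 5, 6, 70, 8, 9, 10]))

# dict.get returns the default when the key is missing.
updated = [mapping.get(key, 'No matching') for key in keys]

print(updated)  # ['No matching', 10, 'No matching', 10, 8]
```

With the DataFrames above, the equivalent call would be temp['A'].apply(lambda x: mapping.get(x, 'No matching')).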

Using generator items selectively

Let's say I have some arrays/lists that contains a lot of values, which means that loading several of these into memory would ultimately result in a memory error due to lack of memory. One way to circumvent this is to load these arrays/lists into a generator, and then use them when needed. However, with generators you don't have so much control as with arrays/lists - and that is my problem.
Let me explain.
As an example I have the following code, which produces a generator with some small lists. So yeah, this is not memory intensive at all, just an example:
import numpy as np
np.random.seed(10)
number_of_lists = range(0, 5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)
If I iterate over this list I get the following:
for i in generator_list:
    print(i)
>> [9 4 0 1 9 0 1 8 9 0]
>> [8 6 4 3 0 4 6 8 1 8]
>> [4 1 3 6 5 3 9 6 9 1]
>> [9 4 2 6 7 8 8 9 2 0]
>> [6 7 8 1 7 1 4 0 8 5]
What I would like to do is sum element wise for all the lists (axis = 0). So the above should in turn result in:
[36, 22, 17, 17, 28, 16, 28, 31, 29, 14]
To do this I could use the following:
total = np.zeros(10, dtype=int)  # an array, so += adds element-wise (a plain list would be extended instead)
for i in generator_list:
    total += i
where 10 is the length of one of the lists.
So far so good. I am not sure if there is a better/more optimized way of doing it, but it works.
My problem is that I would like to determine which lists in the generator_list I want to use. For example, what if I wanted to sum two of the first [0] list, one of the third, and 2 of the last, i.e.:
[9 4 0 1 9 0 1 8 9 0]
[9 4 0 1 9 0 1 8 9 0]
[4 1 3 6 5 3 9 6 9 1]
[6 7 8 1 7 1 4 0 8 5]
[6 7 8 1 7 1 4 0 8 5]
>> [34, 23, 19, 10, 37, 5, 19, 22, 43, 11]
How would I go about doing that ?
And before any questions arise why I want to do it this way, the reason is that in my real case, getting the arrays into the generator takes some time. I could then in principle just generate a new generator where I put in the order of lists as seen in the new list, but again, that would mean I would have to wait to get them in a new generator. And if this is to happen thousands of times (as seen with bootstrapping), well, it would take some time. With the first generator I have ALL lists that are available. Now I just wish to use them selectively so I don't have to create a new generator every time I want to mix it up, and sum a new set of arrays/lists.
import numpy as np
np.random.seed(10)
number_of_lists = range(5)
generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)
indices = [0, 0, 2, 4, 4]
assert sorted(indices) == indices, "only works for sorted list"
# sum_ = [0] * 10
# I prefer this:
sum_ = np.zeros((10,), dtype=int)
generator_index = -1
for index in indices:
    while generator_index < index:
        vector = next(generator_list)
        generator_index += 1
    sum_ += vector
print(sum_)
outputs
[34 23 19 10 37 5 19 22 43 11]
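If the indices are not sorted, one possible variation is to count how often each index is requested and weight each vector as it comes out of the generator. This is only a sketch under that assumption: weighted_sum is a hypothetical helper, it uses plain lists instead of NumPy arrays so it runs with the standard library alone, and the rows are the five arrays printed in the question.

```python
from collections import Counter


def weighted_sum(vectors, indices, length):
    """Sum the selected vectors in a single pass over `vectors`.

    `indices` may contain repeats and does not need to be sorted.
    """
    counts = Counter(indices)
    total = [0] * length
    for position, vector in enumerate(vectors):
        weight = counts.get(position, 0)
        if weight:
            total = [t + weight * v for t, v in zip(total, vector)]
    return total


rows = [
    [9, 4, 0, 1, 9, 0, 1, 8, 9, 0],
    [8, 6, 4, 3, 0, 4, 6, 8, 1, 8],
    [4, 1, 3, 6, 5, 3, 9, 6, 9, 1],
    [9, 4, 2, 6, 7, 8, 8, 9, 2, 0],
    [6, 7, 8, 1, 7, 1, 4, 0, 8, 5],
]

# A generator over the rows, consumed only once.
result = weighted_sum((row for row in rows), [4, 0, 2, 0, 4], 10)
print(result)  # [34, 23, 19, 10, 37, 5, 19, 22, 43, 11]
```

Each vector is pulled from the generator once, so a new bootstrap sample still needs a fresh pass over the generator.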

Building a function that divides dataframe into groups

I am interested in creating a function that does the following:
accepts 2 parameters: a DataFrame and an integer.
adds a column to the DF called "group".
gives each row an integer based on its integer location. The number of groups should equal the integer given to the function.
if the number of rows is not divisible by the integer given, the remaining rows should be split as evenly as possible between the groups. This is the part I'm having problems with.
Here is a manual example I made to clarify my intentions:
I would like to get from this DF:
d = {'value': [1,2,3,4,5,6,7,8,9,10,11,12,13],}
df_init = pd.DataFrame(data=d)
By this function:
wanted_function(df_init, 5)
To this final DF:
s = {'value': [1,2,3,4,5,6,7,8,9,10,11,12,13],'group':[1,1,1,2,2,2,3,3,3,4,4,5,5]}
df_final = pd.DataFrame(data=s)
If I can make the question any clearer, please tell me how and I'll fix it.
Use np.array_split
In [5481]: [i for i, x in enumerate(np.array_split(np.arange(len(df)), 5), 1) for _ in x]
Out[5481]: [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
Assign it
In [5487]: df['group'] = [i for i, x in
                          enumerate(np.array_split(np.arange(len(df)), 5), 1) for _ in x]
In [5488]: df
Out[5488]:
    value  group
0       1      1
1       2      1
2       3      1
3       4      2
4       5      2
5       6      2
6       7      3
7       8      3
8       9      3
9      10      4
10     11      4
11     12      5
12     13      5
Details
Original df
In [5491]: df
Out[5491]:
    value
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
9      10
10     11
11     12
12     13
The act
In [5492]: np.array_split(np.arange(len(df)), 5)
Out[5492]:
[array([0, 1, 2]),
 array([3, 4, 5]),
 array([6, 7, 8]),
 array([ 9, 10]),
 array([11, 12])]
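The split sizes shown above (the first len(df) % 5 groups get one extra row) can also be computed directly with divmod. The following is a sketch in plain Python; group_labels is a hypothetical helper name, not part of pandas or NumPy.

```python
def group_labels(n_rows, n_groups):
    """Return 1-based group labels, giving the earliest groups any extra rows."""
    base, extra = divmod(n_rows, n_groups)
    labels = []
    for group in range(1, n_groups + 1):
        # The first `extra` groups are one row longer than the rest.
        size = base + (1 if group <= extra else 0)
        labels.extend([group] * size)
    return labels


labels = group_labels(13, 5)
print(labels)  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
```

With the DataFrame from the question, the assignment would then be df['group'] = group_labels(len(df), 5).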

numpy.random.randint does not return a list separated by commas

I am running this code:
import numpy as np
Z=np.ones(10)
I = np.random.randint(0,len(Z),20).
print I
#[9 0 0 1 0 2 3 4 3 3 2 2 7 8 1 9 9 2 1 7]
#so this instruction does not work
print Z[I]
It returns an array whose elements are not separated by commas, unlike the output shown in the randint documentation.
The output on that page shows the interpreter (or repr) output. Also, I changed it to randint and removed the period that would have thrown a syntax error.
import numpy as np
I = np.random.randint(0, 10, 10)
print(I) # => [9 4 2 7 6 3 4 5 6 2]
print(repr(I)) # => array([9, 4, 2, 7, 6, 3, 4, 5, 6, 2])
print(type(I)) # => <type 'numpy.ndarray'>
L = list(I)
print(L) # => [9, 4, 2, 7, 6, 3, 4, 5, 6, 2]
Changing the randomint to randint works for me:
Z=np.arange(10)
I = np.random.randint(0,len(Z),20)
print I
#[9 0 0 1 0 2 3 4 3 3 2 2 7 8 1 9 9 2 1 7]
#so this instruction works for me
print Z[I]
# [9 0 0 1 0 2 3 4 3 3 2 2 7 8 1 9 9 2 1 7]  (the same values as I, because Z = np.arange(10))
