Changing a value in multiple files at the same time in Python

I want to change the value of beta in Test.py, a copy of which sits in each of several folders, all at once and without opening each file by hand, but I am getting an error. How do I do this?
import os
N=[8,10, 23,29, 36, 37, 41,42, 45, 46, 47]
I=[]
for i in N:
    os.read(rf'C:\Users\User\{i}\Test.py')
beta=1e-1
The error is
in <module>
os.read(rf'C:\Users\User\OneDrive - Technion\Research_Technion\Python_PNM\All_ND\var_6.0_beta_0.1\{i}\220_beta_1.0_50.0_6.0ND.py')
TypeError: read expected 2 arguments, got 1

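Syntax: os.read(fd, n)
Parameters:
fd: a file descriptor representing the file to be read (an integer such as the one returned by os.open), not a path string.
n: an integer giving the number of bytes to read from the file associated with fd.
Your call passes only one argument, so the second argument n is missing, which is exactly what the TypeError reports. Note also that os.read only reads bytes; it does not modify the file, so on its own it cannot change beta.
See https://www.geeksforgeeks.org/python-os-read-method/#:~:text=read()%20method%20in%20Python,bytes%20left%20to%20be%20read.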
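One way to do the actual edit is sketched here under a few assumptions: each Test.py contains a line that starts with `beta =` at column zero, and the names `set_beta` and `set_beta_in_folders` are illustrative, not from the original post. The sketch reads each file as text, rewrites the first such line, and writes the file back.

```python
import re
from pathlib import Path

# Matches a whole line of the form "beta = <anything>" at column zero (assumed layout).
BETA_LINE = re.compile(r"^beta\s*=.*$", flags=re.MULTILINE)


def set_beta(path, new_value):
    """Rewrite the first `beta = ...` line in one file; return True if a line was changed."""
    path = Path(path)
    text = path.read_text()
    # A function is used as the replacement so the new text is inserted literally.
    new_text, count = BETA_LINE.subn(lambda match: f"beta = {new_value}", text, count=1)
    if count:
        path.write_text(new_text)
    return bool(count)


def set_beta_in_folders(base, folders, new_value, filename="Test.py"):
    """Apply set_beta to base/<folder>/<filename> for each folder; return the paths changed."""
    changed = []
    for folder in folders:
        target = Path(base) / str(folder) / filename
        # Folders that do not contain the file are skipped silently.
        if target.is_file() and set_beta(target, new_value):
            changed.append(target)
    return changed
```

With the folder list from the question, this could be called as `set_beta_in_folders(r'C:\Users\User', N, '1e-1')`. Trying it on a copy of the folders first is probably wise, since the files are overwritten in place.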

Related

TypeError: '(slice(0, 15, None), 15)' is an invalid key

I have Python code that looks something like the code pasted below. For context, every CSV file prints as [15 rows x 16 columns]; I just changed the file names for privacy purposes.
import numpy as np
import pandas as pd
C = pd.read_csv('/Users/name/Desktop/filename1.csv')
Chome = pd.read_csv('/Users/name/Desktop/filename2.csv')
Cwork = pd.read_csv('/Users/name/Desktop/filename3.csv')
Cschool = pd.read_csv('/Users/name/Desktop/filename4.csv')
Cother = pd.read_csv('/Users/name/Desktop/filename5.csv')
Cf = np.zeros([17,17])
Cf = C
Cf[0:15,16] = C[0:15,15]
Cf[16,0:15] = C[15,0:15]
Cf[16,16] = C[15,15]
print(Cf)
When I run the code I get the following error:
runfile('/Users/name/.spyder-py3/untitled12.py', wdir='/Users/name/.spyder-py3')
Traceback (most recent call last):
  File "/Users/name/.spyder-py3/untitled12.py", line 23, in <module>
    Cf[0:15,16] = C[0:15,15]
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 116, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(0, 15, None), 15)' is an invalid key
I am not exactly sure what this error means. I am pretty new to Python, and debugging is a skill I am still trying to learn, so any advice on what the error means or how to fix it would be helpful. Thank you.

Note the following sequence in your code sample:
C = pd.read_csv(...)
... # Other cases of pd.read_csv
Cf = np.zeros([17,17])
So, at least up to this point, C is a DataFrame and Cf is a NumPy array.
Then Cf = C is probably a logical error, since it overwrites
the NumPy array (full of zeros) with another reference to C.
Now for the offending instruction (Cf[0:15,16] = C[0:15,15]):
C[0:15,15] is wrong. Given a tuple inside plain square brackets, a DataFrame
tries to look the tuple up as a column label (see self.columns.get_loc(key)
in the traceback), which is why the error calls it an invalid key.
For pandas DataFrames, positional addressing (including slices) goes through iloc.
The bare [rows, columns] notation, on the other hand, is allowed for NumPy arrays.
So, assuming that Cf = C is not needed and Cf should remain a
NumPy array, you probably should correct this instruction to:
Cf[0:15,16] = C.iloc[0:15,15]
and make analogous corrections in the remaining instructions in your code.
Edit
Another option is to refer to the underlying NumPy array of the C DataFrame,
using the values attribute.
In this case you can use NumPy-style indexing, e.g.:
C.values[0:15,15]
causes no error.
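A small, self-contained sketch of both variants follows. It uses a made-up 3 x 4 frame in place of the CSV data, so the column names and sizes are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV data: 3 rows x 4 columns holding 0.0 .. 11.0.
C = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=["a", "b", "c", "d"])

# Target array with one extra row and one extra column, as in the question.
Cf = np.zeros((4, 5))

# Positional indexing on the DataFrame goes through .iloc.
Cf[0:3, 4] = C.iloc[0:3, 3].to_numpy()

# Equivalent idea: index the underlying NumPy array directly.
Cf[3, 0:3] = C.values[2, 0:3]

# Plain C[0:3, 3] is rejected, because the tuple is treated as a column key.
try:
    C[0:3, 3]
    plain_indexing_failed = False
except Exception:  # the exact exception class depends on the pandas version
    plain_indexing_failed = True
```

Note that positional indices must exist in the frame: with 15 rows, the valid row positions are 0 through 14.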

Pack list of ints in Python

I have got a list that I am packing as bytes using struct module in Python. Here is my list:
[39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
I am packing my list as:
encoded = struct.pack(">{}H".format(len(list)), *list)
where I pass number of elements in list as a format.
Now, I need to unpack the packed struct. For that I will need a format where I again pass number of elements. For now I am doing it like so:
struct.unpack(">{}H".format(10), encoded)
However, I can't pass it as a simple parameter to function format because that struct is then written to file that I am using for compressing image. How can I add a number of elements to file, and unpack it after?
P.S. I would like to get that 10 (in unpacking) from file itself that is packed as bytes.

From what I understood from the question and the comments, maybe this will be helpful.
import struct
data = [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
encoded = struct.pack(">{}H".format(len(data)), *data)
tmp = struct.pack(">H", len(data))
encoded = tmp + encoded  # prepend the element count (2 bytes)
begin = 2
try:
    size = struct.unpack(">H", encoded[0:begin])[0]
    print(size)
    print(struct.unpack(">{}H".format(size), encoded[begin:]))
except Exception as e:
    print(e)
Let me know if it helps.

Here is my approach to adding the number of elements to the file:
file.write(len(compressed_list).to_bytes(3,'big'))
I reserve 3 bytes for the length of compressed_list, convert the length to bytes, and write it at the beginning of the file. After that, I write the remaining parts.
Later, when I need that number, I get it from the file like so:
sz = int.from_bytes(encoded[0:3],'big')
which means that I take the first three bytes of the byte array read from the file and convert them to an int.
That solved my problem.
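Put together, the round trip could look like the following sketch. The function names `write_packed` and `read_packed` are illustrative, and an in-memory `io.BytesIO` stands in for the real file. A 3-byte prefix limits the list to 2**24 - 1 elements.

```python
import io
import struct

PREFIX_BYTES = 3  # size of the length prefix, as in the answer above


def write_packed(stream, values):
    """Write a 3-byte big-endian element count, then the values as unsigned 16-bit ints."""
    stream.write(len(values).to_bytes(PREFIX_BYTES, "big"))
    stream.write(struct.pack(">{}H".format(len(values)), *values))


def read_packed(stream):
    """Read the element count first, then exactly that many unsigned 16-bit ints."""
    count = int.from_bytes(stream.read(PREFIX_BYTES), "big")
    payload = stream.read(2 * count)  # each "H" item occupies 2 bytes
    return list(struct.unpack(">{}H".format(count), payload))


buffer = io.BytesIO()
write_packed(buffer, [39, 39, 126, 126, 256, 258, 260, 259, 257, 126])
buffer.seek(0)
restored = read_packed(buffer)
```

Because the reader consumes only `2 * count` bytes after the prefix, other data can follow the packed list in the same file.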

python multiprocessing struct.error

I am looping through a set of large files and using multiprocessing for manipulation/writing. I create an iterable out of my dataframe and pass it to multiprocessing's map function. The processing is fine for the smaller files, but when I hit the larger ones (~10 GB) I get the error:
python struct.error: 'i' format requires -2147483648 <= number <= 2147483647
the code:
data = np.array_split(data, 10)
with mp.Pool(processes=5, maxtasksperchild=1) as pool1:
    pool1.map(write_in_parallel, data)
    pool1.close()
    pool1.join()
Based on this answer I thought the problem was that the data I am passing to map is too large. So I tried first splitting the dataframe into 1.5 GB chunks and passing each one independently to map, but I still receive the same error.
Full traceback:
Traceback (most recent call last):
  File "_FNMA_LLP_dataprep_final.py", line 51, in <module>
    write_files()
  File "_FNMA_LLP_dataprep_final.py", line 29, in write_files
    '.txt')
  File "/DATAPREP/appl/FNMA_LLP/code/FNMA_LLP_functions.py", line 116, in write_dynamic_columns_fannie
    pool1.map(write_in_parallel, first)
  File "/opt/Python364/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/Python364/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/opt/Python364/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/opt/Python364/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/Python364/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

The answer you mentioned also makes another point: the data should be loaded by the child function, which in your case is write_in_parallel. As your traceback shows, each task passed to map is pickled and sent to a worker with a length header packed as a signed 32-bit integer (struct.pack("!i", n)), so a pickled task larger than 2147483647 bytes fails. I recommend changing your child function like this:
def write_in_parallel(path):
    """ We'll make an assumption that your data is stored in csv file"""
    data = pd.read_csv(path)
    ...
Then your "Pool code" should look like this:
with mp.Pool(processes=(mp.cpu_count() - 1)) as pool:
    chunks = pool.map(write_in_parallel, ('/path/to/your/data',))
df = pd.concat(chunks)
I hope that helps.
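To keep several workers busy while still sending only small tasks, each task can carry a path plus a row range instead of the data itself. The following is a rough sketch of that idea, not the original poster's code: the names `make_ranges`, `process_range` and `run_in_parallel` are hypothetical, the file is assumed to be line-oriented text, and counting lines stands in for the real processing.

```python
import itertools
import multiprocessing as mp


def make_ranges(n_rows, n_chunks):
    """Split range(n_rows) into at most n_chunks contiguous (start, stop) pairs."""
    step = max(1, -(-n_rows // n_chunks))  # ceiling division, at least 1
    return [(start, min(start + step, n_rows)) for start in range(0, n_rows, step)]


def process_range(task):
    """Worker: open the file itself and read only the lines in [start, stop)."""
    path, start, stop = task
    with open(path) as handle:
        lines = list(itertools.islice(handle, start, stop))
    # Placeholder for the real manipulation/writing step.
    return len(lines)


def run_in_parallel(path, n_rows, n_chunks=10, processes=5):
    """Send only (path, start, stop) tuples through the pool, never the data."""
    tasks = [(path, start, stop) for start, stop in make_ranges(n_rows, n_chunks)]
    with mp.Pool(processes=processes) as pool:
        return pool.map(process_range, tasks)
```

Each pickled task is then a few dozen bytes regardless of file size. The call to `run_in_parallel` should sit under an `if __name__ == "__main__":` guard on platforms that start workers by spawning.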

numpy.frombuffer ValueError: buffer is smaller than requested size

I have the error listed above, but have been unable to find what it means. I am new to numpy and its frombuffer() function. The code that triggers this error is:
ARRAY_1=400000004
fid=open(fn,'rb')
fid.seek(w+x+y+z) #w+x+y+z=
if(condition==0):
    b=fid.read(struct.calcsize(fmt+str(ARRAY_1)+'b'))
    myClass.y = numpy.frombuffer(b,'b',struct.calcsize(fmt+str(ARRAY_1)+'b'))
else:
    b=fid.read(struct.calcsize(fmt+str(ARRAY_1)+'h'))
    myClass.y = numpy.frombuffer(b,'h',struct.calcsize(fmt+str(ARRAY_1)+'h')) #error this line
where fmt is '>' when condition==0 and '<' when condition!=0. This changes how the binary file is read: big-endian or little-endian. fid is a binary file that has already been opened.
Debugging up to this point, condition=1, so I have a feeling that there is also an error in the last statement of the if condition as well, I just don't see it right now.
As I said before, I tried to find what the error meant, but haven't had any luck. If anyone knows why it's erroring out on me, I'd really like the help.

calcsize gives the number of bytes that the buffer will have given the format.
In [421]: struct.calcsize('>100h')
Out[421]: 200
In [422]: struct.calcsize('>100b')
Out[422]: 100
h takes 2 bytes per item, so for 100 items, it gives 200 bytes.
For frombuffer, the 3rd argument is
count : int, optional
Number of items to read. ``-1`` means all data in the buffer.
So for 100 items the count should be 100 (the number of items), not the 200 bytes that calcsize returns.
Reading a simple bytestring (in Py3):
In [429]: np.frombuffer(b'one two three ','b',14)
Out[429]: array([111, 110, 101, 32, 116, 119, 111, 32, 116, 104, 114, 101, 101, 32], dtype=int8)
In [430]: np.frombuffer(b'one two three ','h',14)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-430-30077e924a4c> in <module>()
----> 1 np.frombuffer(b'one two three ','h',14)
ValueError: buffer is smaller than requested size
In [431]: np.frombuffer(b'one two three ','h',7)
Out[431]: array([28271, 8293, 30580, 8303, 26740, 25970, 8293], dtype=int16)
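To read the same buffer with h, the count must be half the count used with b, because each h item takes 2 bytes. In your code, that means passing ARRAY_1 as the count rather than the calcsize result.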
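Applied to the question, calcsize is only needed for the number of bytes to read, while frombuffer should get the number of items. Here is a minimal sketch, assuming the data are signed 8-bit or 16-bit integers. `read_items` is an illustrative name, and unlike the original code it encodes the byte order in the NumPy dtype.

```python
import io
import struct

import numpy as np

# struct format code -> NumPy integer type of the same width
NUMPY_CODES = {"b": "i1", "h": "i2"}


def read_items(fid, n_items, code, byteorder):
    """Read n_items integers with the given struct code and byte order from fid."""
    n_bytes = struct.calcsize("{}{}{}".format(byteorder, n_items, code))  # bytes to read
    buf = fid.read(n_bytes)
    dtype = np.dtype(byteorder + NUMPY_CODES[code])  # e.g. ">i2" for big-endian int16
    return np.frombuffer(buf, dtype=dtype, count=n_items)  # count is in items


# Demo with an in-memory "file" holding four big-endian int16 values (8 bytes).
fid = io.BytesIO(struct.pack(">4h", 1, -2, 300, 4))
values = read_items(fid, 4, "h", ">")
```

If the file is shorter than expected, frombuffer still raises the same ValueError, which then points to a truncated file rather than a wrong count.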

ValueError string to float when retrieving float32 from Netcdf file using Netcdf4 in python

I am using netCDF4 in Python 2.7 on a Windows 7 machine. I have loaded numpy recarrays into a netCDF file I created and have subsequently retrieved the data several times. Then, for some unknown reason, when I try to retrieve the data I get "ValueError: could not convert string to float:".
The code that is being used to retrieve the data is:
def getNetCDFGroupVarData(NCfilename, GroupPath, Variable):
    """ ==============================================================
    TITLE: getNetCDFGroupVarData
    DESCR: for a valid variable on the specified path in a NetCDF file
           returns a data vector
    ARGS: NCfilename : netcdf4 file path and name
          GroupPath : group path
          Variable : variable name
    RETURN: VarData: vector of variable data
    DEPEND: netCDF4.Dataset
    =======================================================================
    """
    # get rootgroup and group from which to return attributes
    if os.path.isfile(NCfilename):
        RG = Dataset(NCfilename, 'a')
        G = giveListEndGroup(RG,GroupPath)
        # retrieve variable data from group
        keyVar = G.variables.keys()
        print(keyVar)
        kvlen = len(keyVar)
        var = unicode(Variable)
        if kvlen > 0 :
            print('variable name: ',var)
            V = G.variables[var]
            print V.dtype
            print V.shape
            print V.dimensions
            VarData = V[:] #====== Error raised here ==============
        else:
            print('no keys found')
            VarData = None
        RG.close()
    return VarData
The print outputs and error stack I get when calling this function are:
[u'time', u'SECONDS', u'NANOSECONDS', u'Rg', u'Ts1', u'Ts2', u'Ts3', u'V_log', u'T_log']
('variable name: ', u'time')
float64
(88872,)
(u'time',)
variable: time does not exist
Unexpected error: <type 'exceptions.ValueError'>
Traceback (most recent call last):
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\Panel_NCTS_structure.py", line 69, in tree_path_changed
    pub.sendMessage('NetcdfTS.group.specified', arg1=pathlist )
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\publisher.py", line 27, in sendMessage
    topicObj.publish(**kwargs)
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\publishermixin.py", line 24, in publish
    self._publish(msgKwargs)
  File "C:\Python27\lib\site-packages\pubsub\core\topicobj.py", line 376, in _publish
    self.__sendMessage(data, self, iterState)
  File "C:\Python27\lib\site-packages\pubsub\core\topicobj.py", line 397, in __sendMessage
    self._mix_callListener(listener, data, iterState)
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\publishermixin.py", line 64, in _mix_callListener
    listener(iterState.filteredArgs, self, msgKwargs)
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\listenerimpl.py", line 43, in __call__
    cb(**kwargs)
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\NetcdfTimeSeries.py", line 70, in listner_group
    atime = self.GetSelectedVariableData(pathlist, u'time')
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\NetcdfTimeSeries.py", line 307, in GetSelectedVariableData
    VarData = MNU.getNetCDFGroupVarData(self.filename, GroupPathList, variable )
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\MyNetcdfUtil.py", line 304, in getNetCDFGroupVarData
    VarData = V[:]
  File "netCDF4.pyx", line 2949, in netCDF4.Variable.__getitem__ (netCDF4.c:36472)
  File "netCDF4.pyx", line 2969, in netCDF4.Variable._toma (netCDF4.c:36814)
ValueError: could not convert string to float:
When I use other netCDF utilities (e.g. Panoply) I can access the data.
Does anyone have a clue why netCDF4 would be throwing this exception or, worse, how it could have inserted a string into my float32 field in the netCDF file?

According to the traceback, the problem occurred in the netCDF4 function "_toma", which converts the data to a masked array. When reading the file with other utilities (e.g. ncdump) I had no problem accessing the data.
At the moment I believe the problem occurred because I had an unassigned 'missing_value' attribute for the variable. Apparently, if there is no 'missing_value' attribute, netCDF4 defaults to a missing value appropriate for the dtype. In my implementation the 'missing_value' attribute was exposed for editing via a wxPython grid control. When the edited attributes in the grid were written back to the netCDF file, the empty grid cell returned either a None object or wx.emptyString, which netCDF4 then attempted to store as the attribute of a float variable in the netCDF file.
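A guard of the following kind would keep empty grid cells from reaching the file. This is only a sketch: `clean_missing_value` is a hypothetical helper, not part of netCDF4 or wxPython, and it assumes the grid hands back either None, a string, or a number. It is written for Python 3, whereas the code in the question is Python 2.

```python
def clean_missing_value(raw):
    """Return the cell content as a float, or None when the cell is empty."""
    if raw is None:
        return None
    if isinstance(raw, str):
        raw = raw.strip()
        if not raw:
            return None
    # Raises ValueError for text that is not a number, before anything is written.
    return float(raw)


# Hypothetical call site: only write the attribute when a real value was entered.
# value = clean_missing_value(cell_text)
# if value is not None:
#     variable.setncattr("missing_value", value)
```

Validating at the point where the grid is read means a bad entry surfaces as an error in the editor instead of later, when the variable is read back.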
