Python/NumPy: Changing all "," to "." in an array - python

I am trying to extract data from a .txt file which contains certain measurement values that I would like to use in Python. I am doing this with the numpy module (numpy.genfromtxt), which saves the values into an array.
However, every decimal value is written with a comma (e.g. 1,456), which Python does not accept as a decimal separator. Sadly, this is the way the data has been given to me. I would now like to write Python code that goes through all elements of the array, looks for commas and changes them to dots (I have multiple files and I would like to automate this process, even though I could technically do it manually :) ).
As I started programming with C and C++, I would have done this with pointers and loops. However, the pointer concept does not seem to exist in Python, or is at least not advised. I would be very glad if any of you could tell me whether there is a way to approach this problem in Python. Thank you very much!

Welcome to SO. Please give more details. We cannot answer if you do not include the code you have written so far, sample data, and the full error messages from your attempt to solve this problem, so that we can reproduce the issue and help.
See how to build a minimal reproducible example (MRE) here: https://stackoverflow.com/help/minimal-reproducible-example

Read the file content and replace the "," characters like so:
with open('file.txt', 'r') as f:
    content = f.read().replace(',', '.')
# do whatever with "content"
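If the end goal is still a NumPy array, the cleaned text can be passed straight to numpy.genfromtxt through io.StringIO, so nothing has to be written back to disk. A minimal sketch, assuming the columns are whitespace-separated and commas only appear as decimal separators (if commas also separate columns, a blanket replace would break the format):
import io
import numpy as np

# Read the raw text and swap decimal commas for dots.
with open('file.txt', 'r') as f:
    cleaned = f.read().replace(',', '.')

# genfromtxt accepts any file-like object, so wrap the cleaned string.
data = np.genfromtxt(io.StringIO(cleaned))
print(data)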

Related

Is there a way to let the user supply Python code that will run in my program?

I'm quite new to this so apologies in advance if this is a silly question. I'm trying to build a simple module which takes a Pandas dataframe and a set of instructions (in some text format) as an input, and converts this dataframe to a nested JSON representation of the data.
My plan was to essentially leave a section of the code blank and let the user supply the code (instructions) for how to do this conversion. I'm sure this is not a good way to solve the problem, and I'd happily take pointers on how it should be done as well, but is there a way to let the user insert a section of code into the program itself and then execute it?
Take a look at the exec function.
If you need some control over the validation of the instructions, it may be better to do as @will says. Take a look at this similar question.
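For what it's worth, a minimal sketch of the exec route could look like this; the helper name run_user_code and the sample instruction string are made up for illustration, and the supplied code should be treated as completely untrusted:
import pandas as pd

def run_user_code(df, user_code):
    # Expose the dataframe to the user-supplied snippet under the name "df".
    # The snippet is expected to leave its output in a variable called "result".
    namespace = {"df": df, "pd": pd}
    exec(user_code, namespace)  # executes arbitrary code - only run trusted input
    return namespace.get("result")

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
instructions = "result = df.to_dict(orient='records')"
print(run_user_code(df, instructions))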

Python-pandas with large/disordered text files

I have a large (for my experience level, anyway) text file of astrophysical data and I'm trying to get a handle on python/pandas. As a noob to python, it's comin' along slowly. Here is a sample of the text file; it's 145 MB in total. When I try to read this into pandas I get confused, because I don't know whether to use pd.read_table(example.txt) or pd.read_csv(example.csv). In either case I can't call on a specific column without ipython freaking out, such as here. I know I'm doing something absent-minded. Can anyone explain what that might be? I've done this same procedure with smaller files and it works great, but this one seems to be limiting its output, or just not working at all.
Thanks.
It looks like your columns are separated by varying amounts of whitespace, so you'll need to specify that as the separator. Try pd.read_csv('example.csv', sep=r'\s+'). \s+ is the regular expression for "any amount of whitespace". Also, you should remove the # character from the beginning of the first line, as it will otherwise be read as an extra column and mess up the parsing.
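Putting that together, something along these lines should work; the filename is a placeholder and the column names would need to be supplied by hand:
import pandas as pd

# sep=r'\s+' matches any run of whitespace between columns.
# comment='#' makes pandas ignore everything after a '#', so the commented
# header line is skipped; pass your own column names via names= if needed.
df = pd.read_csv('example.txt', sep=r'\s+', comment='#', header=None)
print(df.head())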

writing large netCDF4 file with python?

I am trying to use the netCDF4 package with python. I am ingesting close to 20 million records of data, 28 bytes each, and then I need to write the data to a netCDF4 file. Yesterday, I tried doing it all at once, and after an hour or so of execution, python stopped running the code with the very helpful error message:
Killed.
Anyway, doing this with subsections of the data, it becomes apparent that somewhere between 2,560,000 records and 5,120,000 records, the code doesn't have enough memory and has to start swapping. Performance is, of course, greatly reduced. So two questions:
1) Does anyone know how to make this work more efficiently? One thing I am thinking of is to somehow feed in subsections of the data incrementally, instead of doing it all at once. Does anyone know how to do that?
2) I presume the "Killed" message happened when memory finally ran out, but I don't know. Can anyone shed any light on this?
Thanks.
Addendum: netCDF4 provides an answer to this problem, which you can see in the answer I have given to my own question. So for the moment, I can move forward. But here's another question: The netCDF4 answer will not work with netCDF3, and netCDF3 is not gone by a long shot. Anyone know how to resolve this problem in the framework of netCDF3? Thanks again.
It's hard to tell what you are doing without seeing code, but you could try using the sync command to flush the data in memory to disk after some amount of data has been written to the file:
http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html
There is a ready answer in netCDF4: declare the netCDF4 variable with a specified "chunksize". I used 10000, and everything proceeded very nicely. As I indicated in the addendum to my question, I would also like to find a way to resolve this in netCDF3, since netCDF3 is far from dead.
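To make the incremental idea concrete, here is a rough sketch of writing the records in slices and syncing to disk as you go; the dimension/variable names, the random placeholder data, and the 10000-record batch size are assumptions:
import numpy as np
from netCDF4 import Dataset

n_records = 20000000
batch = 10000

ds = Dataset('output.nc', 'w')
ds.createDimension('record', n_records)
# chunksizes tells the HDF5 layer how to block the variable on disk.
var = ds.createVariable('value', 'f8', ('record',), chunksizes=(batch,))

for start in range(0, n_records, batch):
    stop = min(start + batch, n_records)
    # In real code this slice would come from the ingested records.
    var[start:stop] = np.random.random(stop - start)
    ds.sync()  # flush what has been written so far to disk

ds.close()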

Help with PyEPL logging

I have never used Python before, most of my programming has been in MATLAB and Unix. However, recently I have been given a new assignment that involves fixing an old PyEPL program written by a former employee (I've tried contacting him directly but he won't respond to my e-mails). I know essentially nothing about Python, and though I am picking it up, I thought I'd just quickly ask for some advice here.
Anyway, there are two issues at hand here, really. The first is this segment of the code:
exp = Experiment()            # experiment object: configuration, subject handling
exp.setBreak()                # enable the key combination that aborts the session
vt = VideoTrack("video")      # display/video track
at = AudioTrack("audio")      # audio track
kt = KeyTrack("key")          # keyboard input track
log = LogTrack("session")     # session log (writes session.log)
clk = PresentationClock()     # clock used to time stimulus presentation
I understand what this is doing; it is creating a series of tracking files in the directory after the program is run. However, I have searched a bunch of online tutorials and can't find a reference to any of these commands in them. Maybe I'm not searching the right places or something, but I cannot find ANYTHING about this.
What I need to do is modify the
log = LogTrack("session")
segment of the code so that all of the session.log files go into a new directory, separate from the other log files. But I also need to find a way not only to concatenate them into a single session.log file, but also to add a new column to that file containing the subject number (the program is meant to be run by multiple subjects to collect data).
I am not asking anyone to do my work for me, but if anyone could give me some pointers, or any sort of advice, I would greatly appreciate it.
Thanks
I would first check if there is a line in the code
from some_module_name import *
This could easily explain why you can call these functions (classes?). It will also tell you what file to look in to modify the code for LogTrack.
Edit:
So, a little digging seems to find that LogTrack is part of PyEPL's textlog module. These other classes are from other modules. Somewhere in this person's code should be a line something like:
from PyEPL.display import VideoTrack
from PyEPL.sound import AudioTrack
from PyEPL.textlog import LogTrack
...
This means that these are classes specific to PyEPL. There are a few ways you could go about modifying how they work. You can modify the source of the LogTrack class so that it operates differently. Perhaps easier would be to simply subclass LogTrack and change some of its methods.
Either of these will require a fairly thorough understanding of how this class operates.
In any case, I would download the source from here, open up the code/textlog.py file, and start reading how LogTrack works.
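As for the second half of the question (merging the per-subject session.log files and tagging each line with the subject number), that part does not need PyEPL at all and can be done as a post-processing step. A rough sketch, assuming each subject's log ends up in a directory named after the subject ID (logs/<subject_id>/session.log is an assumed layout):
import glob
import os

# Concatenate every session.log and prepend the subject ID as a new column.
with open('combined_session.log', 'w') as out:
    for path in sorted(glob.glob(os.path.join('logs', '*', 'session.log'))):
        subject_id = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            for line in f:
                # Subject number first, then the original log fields.
                out.write(subject_id + '\t' + line)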

How to use SVG DOM in Python

This question may sound dumb, but I can't manage to find a correct answer on my own.
I am trying to use the SVG DOM interface in my Python script. I would like to use getComputedTextLength, but I can't find out how, even though I first thought it would be available through modules or packages like python-svg or something like that.
I am sure there is something I am missing, but I can't find what.
Any help would be appreciated.
Thank you.
EDIT: I forgot to talk about what my script actually does. It's a Python script used to generate an SVG file from data grabbed from the Internet. My script needs to write texts and repeat them all along a path. Since I know the exact length (in pixels) of the path, I need to know the length of the text in order to repeat it only as many times as necessary. That's why a method like getComputedTextLength would be helpful.
Try this: http://www.gnu.org/software/pythonwebkit/
I don't think this is possible. The DOM is one thing and calling the browser's functions is another. I have only seen Python modules that help you create tree structures like HTML or SVG, but they don't provide any additional functionality. (Btw., last time I looked, even browsers had problems computing getComputedTextLength correctly, but that was some time ago...)
You might have better luck with fonttools.
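If you go the fonttools route, a rough sketch of estimating a text length from the font's metrics might look like this; the font path and size are placeholders, and kerning/ligatures are ignored, so it only approximates what getComputedTextLength would report in a browser:
from fontTools.ttLib import TTFont

def text_width_px(text, font_path, font_size):
    # Sum per-glyph advance widths (in font units) and scale to pixels.
    font = TTFont(font_path)
    cmap = font.getBestCmap()        # codepoint -> glyph name
    hmtx = font['hmtx']              # horizontal metrics table
    units_per_em = font['head'].unitsPerEm
    units = sum(hmtx[cmap[ord(ch)]][0] for ch in text if ord(ch) in cmap)
    return units * font_size / units_per_em

print(text_width_px('repeated label ', 'DejaVuSans.ttf', 12))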
