How Can I Create Variables From Text and Convert Coordinates - python

I am aware there are similar questions to mine, but after several hours of trying numerous "answers" without success, I thought my best next step was to submit my conundrum here. I respect your time.
Essentially, the goal is to use the astronomy program Stellarium as a "day and night sky" to practice Celestial Navigation (CelNav) while navigating the simulated world of Microsoft Flight Simulator X (FSX). My Python script writes a "startup.ssc" script that initializes Stellarium's date, time, and position.
The process is thus...
Use FSX and save a "flight." This creates a *.FLT file, a text file that saves the complete situation, including time and location.
Run the FSXtoStellarium.py
Locate the lines of date, time, latitude, longitude, and altitude in the *.FLT text.
Read the data into variables.
Convert the Degrees(°), Minutes('), Seconds(") (DMS) to Decimal Degrees (DD).
Lastly, the script constructs a "startup.ssc" and opens Stellarium at the recorded time and place.
The Problem:
I have not been able to read the DMS into variable(s) correctly nor can I format the DMS into Decimal Degrees (DD). According to the "watches" I set in my IDE (PyScripter), the script is reading in an "int" value I can't decipher instead of the text string of the DMS (Example: W157° 27' 23.20").
Here are some excerpts of the file and script.
HMS Bounty.FLT
Various lines of data above...
[SimVars.0]
Latitude=N21° 20' 47.36"
Longitude=W157° 27' 23.20"
Altitude=+000004.93
Various lines of data below...
EOF
FSXtoStellarium.py
Various lines of script above...
# find lat & Lon in the file
start = content.find("SimVars.0")
latstart = content.find("Latitude=")
latend = content.find("Latitude=",latstart+1)
longstart = content.find("Longitude=",start)
longend = content.find(",",longstart)
# convert to dec deg
latitude = float(content[longend+1:latend])/120000
longitude = float(content[longstart+10:longend])/120000
Various lines of script below...
So, what am I missing?
FYI - I am an old man who gets confused. My professional career was in COBOL/DB2/CICS, but you can consider me a Python newbie (it shows, right?). :)
Your help is greatly appreciated and I will gladly provide any additional information.
Calvin

Here is a way to get from the text file (with multiple input lines) all the way to decimal degrees in Python 2.7:
from __future__ import print_function
content='''
[SimVars.0]
Latitude=N21° 20' 47.36"
Longitude=W157° 27' 23.20"
'''
latKey = "Latitude="
longKey = "Longitude="
latstart = content.index(latKey) + len(latKey)
latend = content.find('"', latstart) + 1
longstart = content.find(longKey, latend) + len(longKey)
longend = content.find('"', longstart) + 1
lat = content[latstart:latend]
long = content[longstart:longend]
print()
print('lat ', lat)
print('long ', long)
deg, mnt, sec = [float(x[:-1]) for x in lat[1:].split()]
latVal = deg + mnt / 60 + sec / 3600
deg, mnt, sec = [float(x[:-1]) for x in long[1:].split()]
longVal = deg + mnt / 60 + sec / 3600
print()
print('latVal ', latVal)
print('longVal ', longVal)
Explanation:
we start with a multi-line string, content
the index() call finds the start position of the substring "Latitude=" within content, to which we add the length of "Latitude=" since what we care about is the characters following the = character
the first find() call searches for the 'seconds' character " (which marks the end of the latitude substring), to which we add one (for the length of the ")
the second find() call does for Longitude= something similar to what we did for latitude, except it starts at the position latend since we expect Longitude= to follow the latitude value
the third find() call seeks the end of the longitude substring and is completely analogous to the first find() call above for latitude
the assignment to lat uses square bracket slice notation on the string content to extract the substring from the end of Latitude= to the subsequent " character
the assignment to long is analogous to the previous step
the first assignment to deg, mnt, sec unpacks a list of 3 values, built with a list comprehension, into these variables:
split lat[1:], which is to say lat with the leading cardinal direction character N removed, into space-delimited tokens 21°, 20' and 47.36"
for each token, x[:-1] uses slice notation to drop the final character which gives strings 21, 20 and 47.36
float() converts these strings to numbers of type float
the assignment to latVal does the necessary arithmetic to calculate a quantity in decimal degrees using the degrees, minutes and seconds stored in deg, mnt, sec.
the treatment of long to get to longVal is completely analogous to that for lat and latVal above.
Output:
lat N21° 20' 47.36"
long W157° 27' 23.20"
latVal 21.34648888888889
longVal 157.45644444444443
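Note that the answer above drops the leading N or W, so a western longitude comes out positive. A minimal Python 3 sketch that keeps the hemisphere as a sign follows. The helper names dms_to_decimal and read_position are hypothetical, and treating south and west as negative is the usual convention, assumed here rather than confirmed for Stellarium's script API.

```python
def dms_to_decimal(dms):
    """Convert a string like W157° 27' 23.20" to signed decimal degrees."""
    hemisphere = dms[0].upper()
    # Drop the hemisphere letter, split on spaces, strip the trailing unit mark.
    deg, mnt, sec = (float(token[:-1]) for token in dms[1:].split())
    value = deg + mnt / 60 + sec / 3600
    # Assumed convention: south and west are negative.
    return -value if hemisphere in "SW" else value


def read_position(content):
    """Collect Latitude and Longitude from the text of a *.FLT file."""
    values = {}
    for line in content.splitlines():
        key, sep, text = line.partition("=")
        if sep and key in ("Latitude", "Longitude"):
            values[key] = dms_to_decimal(text.strip())
    return values


sample = """[SimVars.0]
Latitude=N21° 20' 47.36"
Longitude=W157° 27' 23.20"
Altitude=+000004.93
"""
position = read_position(sample)
print(position)
```

Matching whole "key=value" lines avoids the manual index arithmetic, at the cost of assuming each value sits on its own line as in the excerpt.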

Related

Python for-loop to go through all rows of one column

I want to extract all coordinates out of a table which are inside a given radius.
How do I need to set the for loop?
I use the haversine formula for this and I just enter the lat and lon values of the center point and the lat and lon values of the point to be tested if it is in the given radius.
So I thought I need a for-loop where I run the haversine formula for each row of the lat and lon columns, and if the coordinates are inside the radius I save them in a list.
#Get coordinates
#Center coordinates = nearest road location
lat1 = float(lowParkingUtilization.iloc[roadIndex].toLat)
lon1 = float(lowParkingUtilization.iloc[roadIndex].toLon)
#Test coordinates = scooter coordinates
insideRadius = []
radius = 2.50 # in kilometer
for i in eScooterVOI['lat']:
    lat2 = float(eScooterVOI['lat'][i])
    lon2 = float(eScooterVOI['lon'][i])
    a = haversine(lon1, lat1, lon2, lat2)
    if a <= radius:
        insideRadius += str(lon2)+","+str(lat2)
    else:
With the given code I get the following error message:
File "<ipython-input-574-02dadebee55c>", line 18
^
SyntaxError: unexpected EOF while parsing
The correct answer to the question "How do I need to set the for loop?" is: YOU DON'T. pandas dataframes are not meant for looping over their rows. What you DO need to do is create two new columns in the dataframe, one to calculate the distance, and one to store the names in the format you want:
eScooterVOI['dist'] = eScooterVOI.apply(lambda x: haversine(lon1, lat1, x['lon'], x['lat']), axis=1)
eScooterVOI['name'] = eScooterVOI['lon'].astype(str) + ',' + eScooterVOI['lat'].astype(str)
And then, to get a list with only the names of the coordinates whose distance is less than the radius use:
insideRadius = list(eScooterVOI[eScooterVOI['dist'] <= radius]['name'])
btw: the haversine function can be built so that it receives a pandas Series instead of a single value, which would make it much faster than using df.apply, but that would require changing some code which is not shown in the question.
The SyntaxError: unexpected EOF while parsing error message means that a code block was not completed before the end of the code was reached.
Your else block requires at least one line of code in it.
For example:
else:
    lots_of_code_to_write_here
You got this error because of your empty else block.
When Python reads it, it expects some code to follow. Python does not find any, so an error occurs.
Your code might be working already: just delete the else block. You can use an if block without an else.
Anyway, if you absolutely want to use an else block, try something like this:
if a <= radius:
    # append adds the whole string; += on a list would add it character by character
    insideRadius.append(str(lon2) + "," + str(lat2))
else:
    pass
But I do not think it is recommended.
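The question does not show its haversine function, so here is a minimal standard-library sketch of one, together with the radius filter written as a plain loop over coordinate pairs. The argument order (lon1, lat1, lon2, lat2), the kilometre result, the Earth radius constant, the helper points_within, and the sample coordinates are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def points_within(center_lon, center_lat, points, radius_km):
    """Return "lon,lat" strings for every (lon, lat) pair inside the radius."""
    return [
        f"{lon},{lat}"
        for lon, lat in points
        if haversine(center_lon, center_lat, lon, lat) <= radius_km
    ]


# Made-up (lon, lat) pairs standing in for the scooter table.
scooters = [(13.40, 52.52), (13.41, 52.53), (13.80, 52.90)]
inside = points_within(13.40, 52.52, scooters, 2.5)
print(inside)
```

With a DataFrame, the pairs could come from zip(df['lon'], df['lat']), which iterates over values rather than using them as indices.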

Efficiently compute distances between thousands of coordinate pairs

I have a catalog I opened in python, which has about 70,000 rows of data (ra, dec coordinates and object name) for various objects. I also have another list of about 15,000 objects of interest, which also appear in the previously mentioned catalog. For each of these 15,000 objects, I would like to see if any other objects in the large 70,000 list have ra, dec coordinates within 10 arcseconds of the object. If this is found to be true, I'd just like to flag the object and move on to the next one. However, this process takes a long time, since the distances are computed between the current object of interest (out of 15,000) 70,000 different times. This would take days! How could I accomplish the same task more efficiently? Below is my current code, where all_objects is a list of all the 15,000 object names of interest and catalog is the previously mentioned table data for 70,000 objects.
from astropy.coordinates import SkyCoord
from astropy import units as u
for obj_name in all_objects:
    obj_ind = list(catalog['NAME']).index(obj_name)
    c1 = SkyCoord(ra=catalog['RA'][obj_ind]*u.deg, dec=catalog['DEC'][obj_ind]*u.deg, frame='fk5')
    for i in range(len(catalog['NAME'])):
        if i != obj_ind:
            # Compute distance between object and other source
            c2 = SkyCoord(ra=catalog['RA'][i]*u.deg, dec=catalog['DEC'][i]*u.deg, frame='fk5')
            sep = c1.separation(c2)
            contamination_flag = False
            if sep.arcsecond <= 10:
                contamination_flag = True
                print('CONTAMINATION FOUND')
                break
1 Create your own separation function
This step is really easy once you look at the implementation and ask yourself: "How can I make this faster?"
def separation(self, other):
    from . import Angle
    from .angle_utilities import angular_separation # I've put that in the code below so it is clearer
    if not self.is_equivalent_frame(other):
        try:
            other = other.transform_to(self, merge_attributes=False)
        except TypeError:
            raise TypeError('Can only get separation to another SkyCoord '
                            'or a coordinate frame with data')
    lon1 = self.spherical.lon
    lat1 = self.spherical.lat
    lon2 = other.spherical.lon
    lat2 = other.spherical.lat
    sdlon = np.sin(lon2 - lon1)
    cdlon = np.cos(lon2 - lon1)
    slat1 = np.sin(lat1)
    slat2 = np.sin(lat2)
    clat1 = np.cos(lat1)
    clat2 = np.cos(lat2)
    num1 = clat2 * sdlon
    num2 = clat1 * slat2 - slat1 * clat2 * cdlon
    denominator = slat1 * slat2 + clat1 * clat2 * cdlon
    return Angle(np.arctan2(np.hypot(num1, num2), denominator), unit=u.degree)
It computes many sines and cosines, then creates an Angle instance in degrees, which you then convert to arcseconds.
If you need performance, you might not want to use Angle, run the frame checks and conversions at the beginning, do the imports inside the function, or make so many variable assignments.
The separation function feels a bit heavy to me; it should just take numbers and return a number.
2 Use a quad tree (requires a complete rewrite of your code)
That said, let's look at the complexity of your algorithm: it checks every object of interest against every catalog entry, so the complexity is O(M*N), effectively quadratic (Big O notation). Can we do better...
YES. You could use a spatial index such as a quadtree. Once the tree is built, each neighbor lookup typically costs on the order of O(log N) rather than O(N). If you're not familiar with Big O, that basically means that for 15 000 objects against a 70 000-row catalog you do a handful of tree steps per object instead of about 1 050 000 000 separation computations (15 000 times 70 000)... quite an improvement, right? SciPy ships a k-d tree (scipy.spatial.cKDTree), which serves the same purpose here (I've always used my own quadtree).
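As a dependency-free illustration of the same idea (avoid comparing every pair), here is a sketch of a simpler technique than a tree: sort the catalog by declination once, then use binary search to test only the rows within 10 arcseconds in declination. This works because the angular separation is never smaller than the declination difference. The function names, the tuple layout of the catalog, and the sample rows are assumptions; the formula is the same one as in the function above, applied to plain floats.

```python
from bisect import bisect_left, bisect_right
from math import atan2, cos, degrees, hypot, radians, sin


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds; all inputs in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (ra1, dec1, ra2, dec2))
    sdlon, cdlon = sin(lon2 - lon1), cos(lon2 - lon1)
    num1 = cos(lat2) * sdlon
    num2 = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cdlon
    denominator = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cdlon
    return degrees(atan2(hypot(num1, num2), denominator)) * 3600


def flag_contaminated(catalog, targets, limit_arcsec=10.0):
    """catalog: list of (name, ra_deg, dec_deg); targets: names to check."""
    by_dec = sorted(catalog, key=lambda row: row[2])
    decs = [row[2] for row in by_dec]
    coords = {name: (ra, dec) for name, ra, dec in catalog}
    band = limit_arcsec / 3600.0  # search band half-width in degrees
    flagged = set()
    for name in targets:
        ra, dec = coords[name]
        # Only rows inside the declination band can be close enough.
        lo = bisect_left(decs, dec - band)
        hi = bisect_right(decs, dec + band)
        for other, ra2, dec2 in by_dec[lo:hi]:
            if other != name and separation_arcsec(ra, dec, ra2, dec2) <= limit_arcsec:
                flagged.add(name)
                break
    return flagged


# Made-up rows: B lies about 5 arcseconds from A; C and D are isolated.
catalog = [("A", 10.0, 20.0), ("B", 10.001, 20.001),
           ("C", 50.0, -30.0), ("D", 10.0, 20.01)]
flagged = flag_contaminated(catalog, ["A", "C", "D"])
print(flagged)
```

The dictionary lookup also replaces the repeated list(catalog['NAME']).index(obj_name) call, which by itself scans the whole catalog for every object.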

Extracting multiple values after an exact string using regular expresions

I have 100s of .txt/.sed files with lots of lines in each.
Sample input file:
Time: 10:34:51.49,15:21:39.24
Box Temperature (K): 32.82,8.88,-10.07
Silicon Temperature (K): 10.90,9.88
Voltage: 7.52,7.41
Dark Mode: AUTO,AUTO
Radiometric Calibration: RADIANCE
Units: W/m^2/sr/nm
GPS Time: n/a
Satellites: n/a
Channels: 1024
Desired output:
Time 15:21:39.24
Box Temp 32.82
8.88
-10.07
Si Temp 10.90
9.88
I was trying to write the code for identifying the string and then making a list of the values and then later tackle arranging them into a DataFrame followed by writing them to a .csv file.
Sample code
testtxt = 'Temperature (K): 32.82,8.88,-10.07,32.66,8.94,-10.07'
exp = r'^Temperature (K):(\s*) ([0-9.]+)([0-9.]+), ([0-9.-]+) , (-[0-9-.]+),([0-9-.]+) , ([0-9-.]+),(-[0-9-.]+)'
regexp = re.compile(exp)
my_temp = regexp.search(txt)
print(my_temp.group(0))
ERROR:
AttributeError: 'NoneType' object has no attribute 'group'
Basically, it finds no match!
Clarification: I want an efficient way to only extract the Time and Temperature values, not the others. It would be great to be able to stop scanning the files once those are found since each file has over 500 lines and I have lots of them.
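One likely reason the pattern above finds no match: in a regular expression an unescaped (K) is a capture group that matches the single letter K, not the literal text "(K)". The sample code also searches a variable named txt while the string is stored in testtxt. A minimal sketch that escapes the parentheses and leaves the splitting to str.split; the names label and values are only illustrative.

```python
import re

testtxt = 'Temperature (K): 32.82,8.88,-10.07,32.66,8.94,-10.07'

# \( and \) match literal parentheses; the group captures everything after the colon.
label = re.compile(r'^Temperature \(K\):\s*(.*)$')
match = label.search(testtxt)
values = [float(v) for v in match.group(1).split(',')] if match else []
print(values)
```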
My suggestion would be to use string.startswith() to determine if the string starts with "Box Temperature (K)", or whatever. Once you find that, get the rest of the string, parse it as a CSV, and then validate each of the components. Trying to do this all with regular expressions is more trouble than it's worth.
If you want to have the code stop once it's found everything, just set flags for the things you want to find, and once all the flags are set you can exit. Something like:
foundTime = 0
foundBoxTemp = 0
foundSiTemp = 0
while (not end of file AND (foundTime == 0 || foundBoxTemp == 0 || foundSiTemp == 0))
    if (line.startswith("Box Temperature (K):"))
        // parse and output
    else if (line.startswith("Time:"))
        // parse and output
    else ....
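The pseudocode above could look roughly like this in Python. This is a sketch, not a drop-in solution: the WANTED mapping, the extract_fields name, and the choice to return the values as strings are assumptions. It stops reading as soon as all three fields have been seen.

```python
# Line prefixes to look for, mapped to the short labels from the desired output.
WANTED = {
    "Time:": "Time",
    "Box Temperature (K):": "Box Temp",
    "Silicon Temperature (K):": "Si Temp",
}


def extract_fields(lines):
    """Scan lines until every wanted prefix has been seen once."""
    found = {}
    for line in lines:
        for prefix, label in WANTED.items():
            if label not in found and line.startswith(prefix):
                # Everything after the prefix is a comma-separated value list.
                found[label] = [v.strip() for v in line[len(prefix):].split(",")]
        if len(found) == len(WANTED):
            break  # everything found; stop scanning
    return found


sample = """Time: 10:34:51.49,15:21:39.24
Box Temperature (K): 32.82,8.88,-10.07
Silicon Temperature (K): 10.90,9.88
Voltage: 7.52,7.41
""".splitlines()
fields = extract_fields(sample)
print(fields)
```

An open file object can be passed directly as lines, so the rest of a 500-line file is never read once the three fields are found.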

Finding exon/ intron borders in a gene

I would like to go through a gene and get a list of 10bp long sequences containing the exon/intron borders from each feature.type =='mRNA'. It seems like I need to use compoundLocation, and the locations used in 'join' but I can not figure out how to do it, or find a tutorial.
Could anyone please give me an example or point me to a tutorial?
Assuming all the info is in the exact format you show in the comment, and that you're looking for 20 bp on either side of each intron/exon boundary, something like this might be a start:
Edit: If you're actually starting from a GenBank record, then it's not much harder. Assuming that the full junction string you're looking for is in the CDS feature info, then:
for f in record.features:
    if f.type == 'CDS':
        jct_info = str(f.location)
converts the "location" information into a string and you can continue as below.
(There are ways to work directly with the location information without converting to a string - in particular you can use "extract" to pull the spliced sequence directly out of the parent sequence -- but the steps involved in what you want to do are faster and more easily done by converting to str and then int.)
import re
jct_info = "join{[0:229](+), [11680:11768](+), [11871:12135](+), [15277:15339](+), [16136:16416](+), [17220:17471](+), [17547:17671](+)"
jctP = re.compile(r"\[\d+:\d+\]")  # raw string, so the backslashes reach the regex engine unchanged
jcts = jctP.findall(jct_info)
jcts
['[0:229]', '[11680:11768]', '[11871:12135]', '[15277:15339]', '[16136:16416]', '[17220:17471]', '[17547:17671]']
Now you can loop through the list of start:end values, pull them out of the text and convert them to ints so that you can use them as sequence indexes. Something like this:
for jct in jcts:
    start, end = jct.strip('[]').split(':')
    # Slicing never raises IndexError; a negative lower bound would instead
    # count from the end of the sequence, so clamp it at 0 (e.g. where start = 0).
    lo = max(0, int(start) - 20)
    start_20_20 = seq[lo:int(start) + 20]
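Putting the two steps together, a self-contained sketch might look like the following. The function name junction_windows, the placeholder sequence, and the shortened location string are made up for illustration; a real run would use the record's sequence and str(f.location).

```python
import re


def junction_windows(seq, location_text, flank=20):
    """Return (position, window) pairs around every start and end in the location."""
    windows = []
    for start, end in re.findall(r"\[(\d+):(\d+)\]", location_text):
        for boundary in (int(start), int(end)):
            lo = max(0, boundary - flank)         # clamp so the slice never wraps around
            hi = min(len(seq), boundary + flank)  # stay inside the sequence
            windows.append((boundary, seq[lo:hi]))
    return windows


seq = "ACGT" * 100  # placeholder 400-base sequence
jct_info = "join{[0:229](+), [300:350](+)}"
windows = junction_windows(seq, jct_info)
for position, window in windows:
    print(position, len(window), window)
```

The start of the first exon and the end of the last one are not exon/intron borders, so those two windows may need to be dropped depending on what you want.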

pyephem FixedObject() for given RA/Dec

I'm looking to determine the alt/az of (un-famous) stars at given RA/Dec at specific times from Mauna Kea. I'm trying to compute these parameters using pyephem, but the resulting alt/az don't agree with other sources. Here's the calculation for HAT-P-32 from Keck:
import ephem
telescope = ephem.Observer()
telescope.lat = '19.8210'
telescope.long = '-155.4683'
telescope.elevation = 4154
telescope.date = '2013/1/18 10:04:14'
star = ephem.FixedBody()
star._ra = ephem.degrees('02:04:10.278')
star._dec = ephem.degrees('+46:41:16.21')
star.compute(telescope)
print star.alt, star.az
which returns -28:43:54.0 73:22:55.3, though according to Stellarium, the proper alt/az should be: 62:26:03 349:15:13. What am I doing wrong?
EDIT: Corrected latitude and longitude, which were formerly reversed.
First, you've got longitude and latitude backwards; second, you need to provide the strings in sexagesimal form; and third, you need to provide the RA as hours, not degrees:
import ephem
telescope = ephem.Observer()
# Reversed longitude and latitude for Mauna Kea
telescope.lat = '19:49:28' # from Wikipedia
telescope.long = '-155:28:24'
telescope.elevation = 4154.
telescope.date = '2013/1/18 00:04:14'
star = ephem.FixedBody()
star._ra = ephem.hours('02:04:10.278') # in hours for RA
star._dec = ephem.degrees('+46:41:16.21')
star.compute(telescope)
This way, you get:
>>> print star.alt, star.az
29:11:57.2 46:43:19.6
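To see why the unit matters for RA: one hour of right ascension is 15 degrees, so reading '02:04:10.278' as degrees puts the star at about 2.07° instead of about 31.04°. A small standard-library sketch of the arithmetic, independent of PyEphem; the helper name sexagesimal_to_float is hypothetical.

```python
def sexagesimal_to_float(text):
    """Convert 'hh:mm:ss' or '+dd:mm:ss' to a decimal number in the same unit."""
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    parts = [float(p) for p in text.lstrip("+-").split(":")]
    parts += [0.0] * (3 - len(parts))  # allow 'dd' or 'dd:mm' as well
    whole, minutes, seconds = parts
    return sign * (whole + minutes / 60 + seconds / 3600)


ra_hours = sexagesimal_to_float("02:04:10.278")
ra_degrees = ra_hours * 15  # 24 hours of RA span 360 degrees
dec_degrees = sexagesimal_to_float("+46:41:16.21")
print(ra_hours, ra_degrees, dec_degrees)
```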
PyEphem always uses UTC for time, so that programs operate the same and give the same output wherever they are run. You simply need to convert the date you are using to UTC, instead of using your local time zone, and the results agree fairly closely with Stellarium; use:
telescope.date = '2013/1/18 05:04:14'
The result is this alt/az:
62:27:19.0 349:26:19.4
To know where the small remaining difference comes from, I would have to look into how the two programs handle each step of their computation; but does this get you close enough?
