From C to Python - python

I'm really sorry if this is a lame question, but I think this may potentially help others making the same transition from C to Python. I have a program that I started writing in C, but I think it's best if I did it in Python because it just makes my life a lot easier.
My program retrieves intraday stock data from Yahoo! Finance and stores it inside of a struct. Since I'm so used to programming in C, I generally try to do things the hard way. What I want to know is: what's the most "Pythonesque" way of storing the data in an organized fashion? I was thinking an array of tuples?
Here's a bit of my C program.
// Parses intraday stock quote data from a Yahoo! Finance .csv file.
void parse_intraday_data(struct intraday_data *d, char *path)
{
    char cur_line[100];
    char *csv_value;
    int i;
    FILE *data_file = fopen(path, "r");
    if (data_file == NULL)
    {
        perror("Error opening file.");
        return;
    }
    // Ignore the first 15 lines.
    for (i = 0; i < 15; i++)
        fgets(cur_line, 100, data_file);
    i = 0;
    while (fgets(cur_line, 100, data_file) != NULL) {
        csv_value = strtok(cur_line, ",");
        csv_value = strtok(NULL, ",");
        d->close[i] = atof(csv_value);
        csv_value = strtok(NULL, ",");
        d->high[i] = atof(csv_value);
        csv_value = strtok(NULL, ",");
        d->low[i] = atof(csv_value);
        csv_value = strtok(NULL, ",");
        d->open[i] = atof(csv_value);
        csv_value = strtok(NULL, "\n");
        d->volume[i] = atoi(csv_value);
        i++;
    }
    d->close[i] = 0;
    d->high[i] = 0;
    d->low[i] = 0;
    d->open[i] = 0;
    d->volume[i] = 0;
    d->count = i - 1;
    i = 0;
    fclose(data_file);
}
So far my Python program retrieves the data like this.
response = urllib2.urlopen('https://www.google.com/finance/getprices?i=' + interval + '&p=' + period + 'd&f=d,o,h,l,c,v&df=cpct&q=' + ticker)
Question is, what's the best or most elegant way of storing this data in Python?

Keep it simple. Read the line, split it by commas, and store the values inside a (named)tuple. That’s pretty close to using a struct in C.
If your program gets more elaborate it might (!) make sense to replace the tuple by a class, but not immediately.
Here’s an outline:
from collections import namedtuple

IntradayData = namedtuple('IntradayData',
                          ['close', 'high', 'low', 'open', 'volume', 'count'])

response = urllib2.urlopen('https://www.google.com/finance/getprices?q=AAPL')
result = response.read().split('\n')
result = result[15:]  # Your code does this, too. Not sure why.

all_data = []
for i, data in enumerate(result):
    if data == '':
        continue
    c, h, l, o, v, _ = map(float, data.split(','))
    all_data.append(IntradayData(c, h, l, o, v, i))

I believe it depends on how much data manipulation you will want to do after retrieving the data.
If, for example, you plan to just print it on the screen, then an array of tuples would do.
However, if you need to be able to sort, search, and do other kinds of data manipulation, I believe a custom class could help: you would then work with a list (or even a home-brewed container) of custom objects, which lets you easily add custom methods based on your needs.
Note that this is just my opinion, and I'm not an advanced Python developer.
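For illustration, a minimal sketch of what such a class might look like (the class name, field names, and the average_price helper are made up for the example, not taken from the question):
import operator

class IntradayBar(object):
    """One bar of intraday data; add methods as your needs grow."""

    def __init__(self, close, high, low, open_, volume):
        self.close = close
        self.high = high
        self.low = low
        self.open = open_
        self.volume = volume

    def average_price(self):
        # Hypothetical helper: midpoint of the bar's range.
        return (self.high + self.low) / 2.0

# You would then keep a plain list of these objects:
bars = [IntradayBar(10.0, 10.5, 9.8, 10.2, 12000),
        IntradayBar(10.2, 10.6, 10.0, 10.4, 8000)]
bars.sort(key=operator.attrgetter('volume'))  # e.g. sort by volume
print([bar.average_price() for bar in bars])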

Pandas (http://pandas.pydata.org/pandas-docs/stable/) is particularly well suited to this. Numpy is a little lower level, but may also suit your purposes. I really recommend going the pandas route, though. Either way you shouldn't lose too much of C's speed, so that's a plus.
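For example, a rough sketch of the pandas route (the column names, the number of header lines to skip, and the raw variable holding the downloaded text are assumptions; adjust them to whatever your feed actually returns):
import io
import pandas as pd

# `raw` is assumed to be the CSV-style text you already fetched with urlopen(...).read().
columns = ['date', 'close', 'high', 'low', 'open', 'volume']
df = pd.read_csv(io.StringIO(raw), skiprows=15, names=columns)

# Columns are now easy to slice and summarize.
print(df['close'].mean())
print(df.describe())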

Related

Assigning a complex value in a cupy RawKernel

I am a beginner learning how to exploit the GPU for parallel computation using Python and CuPy. I would like to implement my code to simulate some problems in physics and need to use complex numbers, but I don't know how to manage them. Although there are examples in CuPy's official documentation, they only mention including the complex.cuh library and how to declare a complex variable. I can't find any example of how to assign a complex number correctly, or of how to call the functions in the complex.cuh library to do calculations.
I am stuck at line 11 of this code. I want to make a complex value equal to x[tId_x] + j*y[tId_y], where j is the imaginary unit. I tried several ways and none of them works, so I left this one here.
import cupy as cp
import time

add_kernel = cp.RawKernel(r'''
#include <cupy/complex.cuh>
extern "C" __global__
void test(double* x, double* y, complex<float>* z){
    int tId_x = blockDim.x*blockIdx.x + threadIdx.x;
    int tId_y = blockDim.y*blockIdx.y + threadIdx.y;
    complex<float>* value = complex(x[tId_x],y[tId_y]);
    z[tId_x*blockDim.y*gridDim.y+tId_y] = value;
}''', "test")

x = cp.random.rand(1, 8, 4096, dtype=cp.float32)
y = cp.random.rand(1, 8, 4096, dtype=cp.float32)
z = cp.zeros((4096, 4096), dtype=cp.complex64)

t1 = time.time()
add_kernel((128, 128), (32, 32), (x, y, z))
print(time.time() - t1)
What is the proper way to assign a complex number in the RawKernel?
Thank you for answering this question!
#plaeonix, thank you very much for your hint. I found the answer.
This line:
complex<float>* value = complex(x[tId_x],y[tId_y])
should be replaced with:
complex<float> value = complex<float>(x[tId_x],y[tId_y])
Then the assignment of a complex number works.
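For reference, a minimal self-contained sketch with that fix applied, simplified to 1-D arrays so the dtypes and launch shape line up (the kernel name, sizes, and sanity check are illustrative, not from the original post):
import cupy as cp

# Kernel that pairs two real arrays into one complex array: z[i] = x[i] + 1j*y[i].
pair_kernel = cp.RawKernel(r'''
#include <cupy/complex.cuh>
extern "C" __global__
void pair(const float* x, const float* y, complex<float>* z, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        // The fix: construct a complex<float> value, not a pointer.
        complex<float> value = complex<float>(x[i], y[i]);
        z[i] = value;
    }
}''', 'pair')

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.random.rand(n, dtype=cp.float32)
z = cp.zeros(n, dtype=cp.complex64)

threads = 256
blocks = (n + threads - 1) // threads
pair_kernel((blocks,), (threads,), (x, y, z, cp.int32(n)))

# Sanity check against CuPy's own complex arithmetic.
assert cp.allclose(z, x + 1j * y)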

How to make every vehicle do something in Or Tools?

In several VRP problems, when you specify the total number of vehicles, the expectation is that all of them are used and that each visits at least one node. In reality this may not even be the best solution, but I would like to understand why it happens and how to adapt the model to my needs.
The example below is based on the simple VRP example from OR-Tools, with a small edit to the distance matrix and some changes following this blog post: https://activimetrics.com/blog/ortools/counting_dimension/. According to the latter, it is possible to achieve a fair distribution of routes, which seemed very appealing, since, as a rule, the solver minimizes the longest route and ends up using fewer vehicles, assigning several nodes to each. An important need was an approach that makes every vehicle act, ensuring each is used at least once.
With 5 vehicles the solver does get there: it assigns one node to each vehicle, which was not possible without this edit. The problem is that with only 4 vehicles the solver no longer does so; it manages to distribute routes, but always leaves one vehicle out.
using System;
using System.Collections.Generic;
using Google.OrTools.ConstraintSolver;

public class VrpGlobalSpan
{
    class DataModel
    {
        public long[,] DistanceMatrix = {
            {0, 9777, 10050, 7908, 10867, 16601},
            {9777, 0, 4763, 4855, 19567, 31500},
            {10050, 4763, 0, 2622, 11733, 35989},
            {7908, 4855, 2622, 0, 10966, 27877},
            {10867, 19567, 11733, 10966, 0, 27795},
            {16601, 31500, 35989, 27877, 27795, 0},
        };
        public int VehicleNumber = 4;
        public int Depot = 0;
    };

    /// <summary>
    /// Print the solution.
    /// </summary>
    static void PrintSolution(
        in DataModel data,
        in RoutingModel routing,
        in RoutingIndexManager manager,
        in Assignment solution)
    {
        // Inspect solution.
        long maxRouteDistance = 0;
        for (int i = 0; i < data.VehicleNumber; ++i)
        {
            Console.WriteLine("Route for Vehicle {0}:", i);
            long routeDistance = 0;
            var index = routing.Start(i);
            while (routing.IsEnd(index) == false)
            {
                Console.Write("{0} -> ", manager.IndexToNode((int)index));
                var previousIndex = index;
                index = solution.Value(routing.NextVar(index));
                routeDistance += routing.GetArcCostForVehicle(previousIndex, index, 0);
            }
            Console.WriteLine("{0}", manager.IndexToNode((int)index));
            Console.WriteLine("Distance of the route: {0}m", routeDistance);
            maxRouteDistance = Math.Max(routeDistance, maxRouteDistance);
        }
        Console.WriteLine("Maximum distance of the routes: {0}m", maxRouteDistance);
    }

    public static void Main(String[] args)
    {
        // Instantiate the data problem.
        DataModel data = new DataModel();

        // Create Routing Index Manager
        RoutingIndexManager manager = new RoutingIndexManager(
            data.DistanceMatrix.GetLength(0),
            data.VehicleNumber,
            data.Depot);

        // Create Routing Model.
        RoutingModel routing = new RoutingModel(manager);

        // Create and register a transit callback.
        int transitCallbackIndex = routing.RegisterTransitCallback(
            (long fromIndex, long toIndex) => {
                // Convert from routing variable Index to distance matrix NodeIndex.
                var fromNode = manager.IndexToNode(fromIndex);
                var toNode = manager.IndexToNode(toIndex);
                return data.DistanceMatrix[fromNode, toNode];
            }
        );

        // Define cost of each arc.
        routing.SetArcCostEvaluatorOfAllVehicles(transitCallbackIndex);

        double answer = 5 / data.VehicleNumber + 1;
        //double Math.Ceiling(answer);
        //double floor = (int)Math.Ceiling(answer);
        routing.AddConstantDimension(
            1,
            (int)Math.Ceiling(answer),
            true, // start cumul to zero
            "Distance");
        RoutingDimension distanceDimension = routing.GetDimensionOrDie("Distance");
        //distanceDimension.SetGlobalSpanCostCoefficient(100);
        for (int i = 0; i < data.VehicleNumber; ++i)
        {
            distanceDimension.SetCumulVarSoftLowerBound(routing.End(i), 2, 1000000);
        }

        // Setting first solution heuristic.
        RoutingSearchParameters searchParameters =
            operations_research_constraint_solver.DefaultRoutingSearchParameters();
        //searchParameters.FirstSolutionStrategy =
        //    FirstSolutionStrategy.Types.Value.PathCheapestArc;
        searchParameters.TimeLimit = new Google.Protobuf.WellKnownTypes.Duration { Seconds = 5 };
        searchParameters.LocalSearchMetaheuristic = LocalSearchMetaheuristic.Types.Value.Automatic;

        // Solve the problem.
        Assignment solution = routing.SolveWithParameters(searchParameters);

        // Print solution on console.
        PrintSolution(data, routing, manager, solution);
    }
}
Perhaps this topic has already been discussed, but I want to understand what the best path to follow is and what steps to take to turn this example, and others like it, into a better approach.
I thank you in advance for your attention and look forward to your feedback.
Thank you.
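For reference, here is the same counting-dimension idea sketched with the OR-Tools Python API; it mirrors the C# code above rather than resolving the 4-vehicle issue, and the hard-constraint variant mentioned in the comments is my assumption about how to make the requirement strict:
from ortools.constraint_solver import pywrapcp

distance_matrix = [  # same matrix as the C# code above
    [0, 9777, 10050, 7908, 10867, 16601],
    [9777, 0, 4763, 4855, 19567, 31500],
    [10050, 4763, 0, 2622, 11733, 35989],
    [7908, 4855, 2622, 0, 10966, 27877],
    [10867, 19567, 11733, 10966, 0, 27795],
    [16601, 31500, 35989, 27877, 27795, 0],
]
num_vehicles, depot = 4, 0

manager = pywrapcp.RoutingIndexManager(len(distance_matrix), num_vehicles, depot)
routing = pywrapcp.RoutingModel(manager)

def distance_callback(from_index, to_index):
    # Convert routing indices to distance-matrix node indices.
    return distance_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

transit_index = routing.RegisterTransitCallback(distance_callback)
routing.SetArcCostEvaluatorOfAllVehicles(transit_index)

# Counting dimension: every visited node adds 1 to the cumul.
routing.AddConstantDimension(1, len(distance_matrix), True, 'Count')
count_dim = routing.GetDimensionOrDie('Count')
for v in range(num_vehicles):
    # End cumul >= 2 means "depot plus at least one customer" for vehicle v.
    # The C# code uses a soft bound; a hard constraint would instead be:
    # routing.solver().Add(count_dim.CumulVar(routing.End(v)) >= 2)
    count_dim.SetCumulVarSoftLowerBound(routing.End(v), 2, 1000000)

params = pywrapcp.DefaultRoutingSearchParameters()
params.time_limit.FromSeconds(5)
solution = routing.SolveWithParameters(params)
if solution:
    for v in range(num_vehicles):
        index, route = routing.Start(v), []
        while not routing.IsEnd(index):
            route.append(manager.IndexToNode(index))
            index = solution.Value(routing.NextVar(index))
        print('Vehicle', v, route + [manager.IndexToNode(index)])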

python interaction with BPF maps

I'm wondering if there is an easy way to initialize BPF maps from Python userspace. For my project, I'll have a scary-looking NxN 2D array of floats for each process. For simplicity's sake, let's assume N is constant across processes (say 5). To get kernel support for this data, I could do something like:
b = BPF(text = """
typedef struct
{
    float transMat[5][5];
} trans_struct;

BPF_HASH(trans_mapping, char[16], trans_struct);
.....
""")
I'm wondering if there's an easy way to initialize this map from Python. Something like:
for ele in someDictionary:
    # assume someDictionary has mapping (comm -> 5x5 float matrix)
    b["trans_mapping"].insert(ele, someDictionary[ele])
I suppose the crux of my confusion is: 1) are all map methods available to the user, and 2) how do we ensure type consistency when going from Python objects to C structures?
Solution based on pchaigno's comment -- the key things to note are the use of ctypes to ensure type consistency across environments, and fetching the table by indexing the BPF program object. Because maps can be retrieved by indexing, the get_table() function is now considered out of date. This snippet shows the general structure of loading data into a map from the Python front end, but doesn't completely conform to the specifics of my question.
from time import sleep, strftime
from bcc import BPF
from bcc.utils import printb
from bcc.syscall import syscall_name, syscalls
from ctypes import *

b = BPF(text = """
BPF_HASH(start, u32, u64);

TRACEPOINT_PROBE(raw_syscalls, sys_exit)
{
    u32 syscall_id = args->id;
    u32 key = 1;
    u64 *val;
    u32 uid = bpf_get_current_uid_gid();

    if (uid == 0)
    {
        val = start.lookup(&key); // find value associated with key 1
        if (val)
            bpf_trace_printk("Hello world, I have value %d!\\n", *val);
    }
    return 0;
}
""")

thisStart = b["start"]
thisStart[c_int(1)] = c_int(9)  # insert key-value pair 1->9

while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except KeyboardInterrupt:
        print("Detaching")
        exit()
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))

Asking for clues on "It's All About the Miles"

I recently encountered a problem and I really cannot figure out how to solve it. It's a problem on Open Kattis.
Please visit https://uchicago.kattis.com/problems/uchicago.miles
By now, I know it's a recursion problem.
But how do I define this recursive procedure? I don't know where I should start.
So please give me a clue or maybe some pseudocode.
Here is my code for reading the input; it converts the input data into a dictionary.
AFItt = input().split()
A, F, I = map(int, AFItt[0:3])
tmin, tmax = map(float, AFItt[3:])

airport = []
ada = {}
ai = []

for _ in range(A):
    airport.append(input())

for _ in range(F):
    ffda = input().split()
    if ffda[0] + " " + ffda[1] not in ada.keys():
        ada[ffda[0] + " " + ffda[1]] = (float(ffda[2]), float(ffda[3]))
    else:
        ada[ffda[0] + " " + ffda[1]] += ((float(ffda[2]), float(ffda[3])))

for _ in range(I):
    ai.append(input())
I will try to give you a clue, though I am not sure whether it is efficient enough. I wrote a JavaScript version and it produces the sample outputs correctly.
The idea of my solution is very simple: starting from the beginning of the itinerary, find all possible next flights and keep appending them to the previous flight runs.
For example:
for the first 2 itinerary airports, I find all the possible flights and save them in a list of lists: [[flight1], [flight2], [flight3]]
after that, I loop over all the current possible runs and check whether there is a flight that lets each run continue. If not, the run is excluded; if yes, the flight is appended to it.
If flight1 and flight2 cannot continue, but flight3 has two possible flights to continue with, my flight list becomes [[flight3, flight4], [flight3, flight5]]
It is a bit hard for me to explain well. Following is a code skeleton:
function findAllFlights(flightMap,
                        currentFlights,
                        currentItineraryIndex,
                        itineraryList, minTime, maxTime){
    //flightMap is a map of all the flights. sample data:
    /*{'a->b': [{from: 'a', to:'b', depTime:'1', arrTime:'2'}, {another flight}, ... ],
       'b->c': [{from: 'b', to:'c', depTime:'1', arrTime:'2'}, {another flight}, ... ]}
    */
    //currentFlights is the result of current possible runs, it is a list of list of flights. each sub list means a possible run.
    //[[flight1, flight2], [flight1, flight3], ...]
    //currentItineraryIndex: this is the next airport index in the itineraryList
    //itineraryList: this is the list of airports we should travel.
    //minTime, maxTime: it is the min time and max time.
    if(currentItineraryIndex == 0){
        var from = itineraryList[0];
        var to = itineraryList[1];
        var flightMapKey = from+'->'+to;
        var possibleFlights = flightMap[flightMapKey];
        if(possibleFlights.length == 0){
            return [];
        }
        for(var i=0; i<possibleFlights.length; i++){
            //current flights should be a list of list of flights.
            //each of the sub lists denotes the journey currently.
            currentFlights.push([possibleFlights[i]]);
        }
        return findAllFlights(flightMap, currentFlights, 1, itineraryList, minTime, maxTime);
    }else if(currentItineraryIndex == itineraryList.length - 1){
        //we have searched all the required airports
        return currentFlights;
    }else{
        //this is where you need to recursively call the findAllFlights method.
        var continableFlights = [];
        //TODO: try to produce the continuable list of flights here based on the above explanation.
        //once we have the continuable flights for the current itinerary airport, we can find flights for the next airport similarly.
        return findAllFlights(flightMap, continableFlights, currentItineraryIndex + 1, itineraryList, minTime, maxTime);
    }
}
Enjoy!
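To illustrate the TODO step in Python terms (the dict representation of a flight and the [min_time, max_time] connection rule below are my assumptions, not part of the original skeleton):
def continue_runs(current_runs, possible_flights, min_time, max_time):
    """Given the runs built so far and the candidate flights for the next
    itinerary leg, keep only the runs that can be extended, and extend them.

    A flight is assumed to look like
    {'from': 'ORD', 'to': 'JFK', 'depTime': 10.0, 'arrTime': 12.5},
    and a run continues only if the connection time falls in [min_time, max_time].
    """
    next_runs = []
    for run in current_runs:
        last = run[-1]
        for flight in possible_flights:
            layover = flight['depTime'] - last['arrTime']
            if min_time <= layover <= max_time:
                next_runs.append(run + [flight])  # copy-and-extend, like the JS push
    return next_runs

# Example: two candidate continuations for a single existing run.
run = [{'from': 'ORD', 'to': 'JFK', 'depTime': 8.0, 'arrTime': 10.0}]
candidates = [
    {'from': 'JFK', 'to': 'SFO', 'depTime': 10.5, 'arrTime': 16.0},
    {'from': 'JFK', 'to': 'SFO', 'depTime': 23.0, 'arrTime': 28.0},
]
print(continue_runs([run], candidates, min_time=0.5, max_time=4.0))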

Order randomly distributed data onto a regular grid

I am trying to order data scattered over the globe on a regular lat/lon grid.
I could go through all the data, check which grid cell each point belongs to, and then append it to a list for that cell. This approach seems long and not very efficient to me.
I am sure that this is not a new problem, but I have not been able to find a solution on the web. Does anyone have a suggestion, or can you point me to an example or tutorial?
I believe geohashing would be useful; in this particular case you could use a Morton number. The blog post titled "Spatial Keys – Memory Efficient Geohashes" has an example implementation. It's in Java, but a Python version wouldn't look much different.
long hash = 0;
double minLat = minLatI;
double maxLat = maxLatI;
double minLon = minLonI;
double maxLon = maxLonI;
int i = 0;
while (true) {
    if (minLat < maxLat) {
        double midLat = (minLat + maxLat) / 2;
        if (lat > midLat) {
            hash |= 1;
            minLat = midLat;
        } else
            maxLat = midLat;
    }
    hash <<= 1;
    if (minLon < maxLon) {
        double midLon = (minLon + maxLon) / 2;
        if (lon > midLon) {
            hash |= 1;
            minLon = midLon;
        } else
            maxLon = midLon;
    }
    i++;
    if (i < iterations)
        hash <<= 1;
    else
        break;
}
return hash;
The advantage of the Morton code is that you can calculate fewer bits (fewer iterations above) for a coarser grid and more bits for a finer grid. Or you can compute fine-grained codes and just use their prefixes for the coarser grid.
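A rough Python transcription of the same encoding, where the bounding box defaults and the number of iterations are parameters you would pick for your grid resolution:
def spatial_key(lat, lon, iterations=16,
                min_lat=-90.0, max_lat=90.0, min_lon=-180.0, max_lon=180.0):
    """Interleaved (Morton-style) key: one latitude bit, one longitude bit per iteration.
    More iterations -> finer grid; prefixes of the key identify coarser cells."""
    key = 0
    for _ in range(iterations):
        mid_lat = (min_lat + max_lat) / 2.0
        key <<= 1
        if lat > mid_lat:
            key |= 1
            min_lat = mid_lat
        else:
            max_lat = mid_lat

        mid_lon = (min_lon + max_lon) / 2.0
        key <<= 1
        if lon > mid_lon:
            key |= 1
            min_lon = mid_lon
        else:
            max_lon = mid_lon
    return key

# Points whose keys share a prefix fall into the same (coarser) grid cell.
print(bin(spatial_key(48.0, 11.0)))
print(bin(spatial_key(48.001, 11.001)))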
