In Rust Polars (this might apply to Python pandas as well), assigning values to a new column with complex logic involving the values of other columns can be achieved in two ways. The default way is a nested WhenThen expression. Another way to achieve the same thing is a left join. Naturally I would expect WhenThen to be much faster than a join, but that is not the case. In this example, WhenThen is six times slower than the join. Is that actually expected? Am I using WhenThen wrong?
In this example the goal is to assign a weights/multipliers column based on three other columns: country, city and bucket.
use std::collections::HashMap;
use polars::prelude::*;
use rand::{distributions::Uniform, Rng}; // 0.6.5
pub fn bench() {
    // PREPARATION
    // This map is to be used for the left join
    let mut weights = df![
        "country" => vec!["UK"; 5],
        "city" => vec!["London"; 5],
        "bucket" => ["1", "2", "3", "4", "5"],
        "weights" => [0.1, 0.2, 0.3, 0.4, 0.5]
    ].unwrap().lazy();
    // Wrap the weights in lists so both methods produce the same dtype
    weights = weights.with_column(concat_lst([col("weights")]).alias("weights"));
    // This map is to be used in When.Then
    let weight_map = bucket_weight_map(&[0.1, 0.2, 0.3, 0.4, 0.5], 1);
    // Generate the dataset itself
    let mut rng = rand::thread_rng();
    let range = Uniform::new_inclusive(1, 5); // buckets 1..=5, matching the map
    let b: Vec<String> = (0..10_000_000).map(|_| rng.sample(&range).to_string()).collect();
    let rc = vec!["UK"; 10_000_000];
    let rf = vec!["London"; 10_000_000];
    let val = vec![1; 10_000_000];
    let frame = df!(
        "country" => rc,
        "city" => rf,
        "bucket" => b,
        "val" => val,
    ).unwrap().lazy();
    // Test with Left Join
    use std::time::Instant;
    let now = Instant::now();
    let r = frame.clone()
        .join(
            weights,
            [col("country"), col("city"), col("bucket")],
            [col("country"), col("city"), col("bucket")],
            JoinType::Left,
        )
        .collect().unwrap();
    let elapsed = now.elapsed();
    println!("Left Join took: {:.2?}", elapsed);

    // Test with nested When Then
    let now = Instant::now();
    let r1 = frame.clone().with_column(
        when(col("country").eq(lit("UK")))
            .then(
                when(col("city").eq(lit("London")))
                    .then(rf_rw_map(col("bucket"), weight_map, NULL.lit()))
                    .otherwise(NULL.lit())
            )
            .otherwise(NULL.lit())
    )
    .collect().unwrap();
    let elapsed = now.elapsed();
    println!("Chained When Then: {:.2?}", elapsed);

    // Check results are identical
    dbg!(r.tail(Some(10)));
    dbg!(r1.tail(Some(10)));
}
/// All this does is build a chained When().Then().Otherwise()
fn rf_rw_map(col: Expr, map: HashMap<String, Expr>, other: Expr) -> Expr {
    let mut it = map.into_iter();
    let (k, v) = it.next().unwrap(); // the map has at least one entry
    // Start from a never-true placeholder branch, which gives us a WhenThen to chain onto
    let mut buf = when(lit::<bool>(false))
        .then(lit::<f64>(0.).list())
        .when(col.clone().eq(lit(k)))
        .then(v);
    for (k, v) in it {
        buf = buf
            .when(col.clone().eq(lit(k)))
            .then(v);
    }
    buf.otherwise(other)
}
fn bucket_weight_map(arr: &[f64], ntenors: u8) -> HashMap<String, Expr> {
    let mut bucket_weights: HashMap<String, Expr> = HashMap::default();
    for (i, n) in arr.iter().enumerate() {
        let j = i + 1;
        bucket_weights.insert(
            format!("{j}"),
            Series::from_vec("weight", vec![*n; ntenors as usize])
                .lit()
                .list(),
        );
    }
    bucket_weights
}
The result is surprising to me: Left Join took: 561.26ms vs Chained When Then: 3.22s
Thoughts?
UPDATE
This does not make much difference; the nested WhenThen still takes over 3 s.
// Test with nested When Then
let now = Instant::now();
let r1 = frame.clone().with_column(
    when(col("country").eq(lit("UK")).and(col("city").eq(lit("London"))))
        .then(rf_rw_map(col("bucket"), weight_map, NULL.lit()))
        .otherwise(NULL.lit())
)
.collect().unwrap();
let elapsed = now.elapsed();
println!("Chained When Then: {:.2?}", elapsed);
Joins are among the most optimized algorithms in Polars. A left join is executed fully in parallel and has many performance-related fast paths. If you want to combine data based on equality, you should almost always choose a join.
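For intuition, the same comparison is easy to sketch in py-polars. This is a rough sketch, not the benchmark above: the row count is reduced and the names are mine. Each when/then branch materialises its own boolean mask over the full column, so the chained expression does roughly one pass per branch, while the join does a single parallel hash lookup over the three keys.

import random
import polars as pl

n = 1_000_000
weights = pl.DataFrame({
    "country": ["UK"] * 5,
    "city": ["London"] * 5,
    "bucket": [str(i) for i in range(1, 6)],
    "weights": [0.1, 0.2, 0.3, 0.4, 0.5],
})
frame = pl.DataFrame({
    "country": ["UK"] * n,
    "city": ["London"] * n,
    "bucket": [str(random.randint(1, 5)) for _ in range(n)],
    "val": [1] * n,
})

# join: one parallel hash join over the three key columns
joined = frame.join(weights, on=["country", "city", "bucket"], how="left")

# when/then: one boolean mask per branch
expr = pl.when(pl.col("bucket") == "1").then(pl.lit(0.1))
for k, w in [("2", 0.2), ("3", 0.3), ("4", 0.4), ("5", 0.5)]:
    expr = expr.when(pl.col("bucket") == k).then(pl.lit(w))
masked = frame.with_columns(expr.otherwise(None).alias("weights"))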
I am trying to speed up some Python code using Rust bindings via PyO3.
I have implemented the following function in both Python and Rust:
def _play_action(state, action):
    temp = state.copy()
    i1, j1, i2, j2 = action
    h1 = abs(temp[i1][j1])
    h2 = abs(temp[i2][j2])
    if temp[i1][j1] < 0:
        temp[i2][j2] = -(h1 + h2)
    else:
        temp[i2][j2] = h1 + h2
    temp[i1][j1] = 0
    return temp
#[pyfunction]
fn play_action(state: [[i32; 9]; 9], action: [usize; 4]) -> [[i32; 9]; 9] {
    let mut s = state.clone();
    let h1 = s[action[0]][action[1]];
    let h2 = s[action[2]][action[3]];
    s[action[0]][action[1]] = 0;
    s[action[2]][action[3]] = h1.signum() * (h1 + h2).abs();
    s
}
And to my great surprise, the Python version is faster... Any idea why?
This is probably caused by the overhead of the communication between Python and Rust. The data you're passing is too small, and I assume you're calling play_action many times. A better approach would be to batch your calls:
#[pyfunction]
fn play_actions(data: Vec<([[i32; 9]; 9], [usize; 4])>) -> Vec<[[i32; 9]; 9]> {
    data.into_iter()
        .map(|(state, action)| play_action(state, action))
        .collect::<Vec<_>>()
}

fn play_action(state: [[i32; 9]; 9], action: [usize; 4]) -> [[i32; 9]; 9] {
    let mut s = state.clone();
    let h1 = s[action[0]][action[1]];
    let h2 = s[action[2]][action[3]];
    s[action[0]][action[1]] = 0;
    s[action[2]][action[3]] = h1.signum() * (h1 + h2).abs();
    s
}
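For illustration, the Python side of the batched call might look like this. The module name game_rs is an assumption, not from the question; the point is that there is one crossing into Rust per batch instead of one per move.

# hypothetical usage of the batched binding; `game_rs` is an assumed module name
import game_rs

state = [[0] * 9 for _ in range(9)]
state[0][0] = -3
state[1][1] = 2

# one Python-to-Rust conversion for the whole batch
batch = [(state, (0, 0, 1, 1))] * 10_000
results = game_rs.play_actions(batch)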
If you are calling the function written in rust from Python, there will have to be a conversion from Python objects to rust data structures. The time that this takes is overhead.
Since your function seems pretty small, it could easily be that the overhead overwhelms the runtime of the function.
I would encourage you to profile your Python code (using the cProfile module) before trying to make it faster. Profiling, and the insight into the behavior of your code that it provides, can enable significant performance gains.
Here is a link to the first of a series of articles that I've written about Python profiling.
If you do a lot of number crunching, see if your problem is a good fit for numpy.
If a relatively small function takes up a lot of the execution time because it is called very often, try using the functools.cache decorator.
Keep in mind that a better algorithm generally beats optimizations.
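To illustrate the functools.cache tip above (a standard example, unrelated to the board-game code):

from functools import cache

@cache
def fib(n: int) -> int:
    # repeated calls with the same argument are answered from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))  # fast despite the naive recursion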
In several VRP problems, when you specify the total number of vehicles, the expectation is that all of them are used and that each visits at least one node. In reality this may not even be the best solution, but I would like to understand why this happens and how to adapt the model to my needs.
The example below is based on the simple VRP example from OR-Tools, with a small edit to the distance matrix and some changes following this blog post: https://activimetrics.com/blog/ortools/counting_dimension/. According to the latter, it is possible to distribute routes fairly, which seemed totally appealing, since by default the solver minimizes the longest route, ends up using fewer vehicles, and assigns several nodes to them. What I need is an approach that guarantees every vehicle is used at least once.
With 5 vehicles this works: the solver assigns one node to each vehicle, which was not possible without this change. The problem is that with only 4 vehicles the solver no longer does this; it distributes routes, but always leaves one vehicle out.
using System;
using System.Collections.Generic;
using Google.OrTools.ConstraintSolver;

public class VrpGlobalSpan
{
    class DataModel
    {
        public long[,] DistanceMatrix = {
            {0, 9777, 10050, 7908, 10867, 16601},
            {9777, 0, 4763, 4855, 19567, 31500},
            {10050, 4763, 0, 2622, 11733, 35989},
            {7908, 4855, 2622, 0, 10966, 27877},
            {10867, 19567, 11733, 10966, 0, 27795},
            {16601, 31500, 35989, 27877, 27795, 0},
        };
        public int VehicleNumber = 4;
        public int Depot = 0;
    };

    /// <summary>
    /// Print the solution.
    /// </summary>
    static void PrintSolution(
        in DataModel data,
        in RoutingModel routing,
        in RoutingIndexManager manager,
        in Assignment solution)
    {
        // Inspect solution.
        long maxRouteDistance = 0;
        for (int i = 0; i < data.VehicleNumber; ++i)
        {
            Console.WriteLine("Route for Vehicle {0}:", i);
            long routeDistance = 0;
            var index = routing.Start(i);
            while (routing.IsEnd(index) == false)
            {
                Console.Write("{0} -> ", manager.IndexToNode((int)index));
                var previousIndex = index;
                index = solution.Value(routing.NextVar(index));
                routeDistance += routing.GetArcCostForVehicle(previousIndex, index, 0);
            }
            Console.WriteLine("{0}", manager.IndexToNode((int)index));
            Console.WriteLine("Distance of the route: {0}m", routeDistance);
            maxRouteDistance = Math.Max(routeDistance, maxRouteDistance);
        }
        Console.WriteLine("Maximum distance of the routes: {0}m", maxRouteDistance);
    }
    public static void Main(String[] args)
    {
        // Instantiate the data problem.
        DataModel data = new DataModel();

        // Create Routing Index Manager.
        RoutingIndexManager manager = new RoutingIndexManager(
            data.DistanceMatrix.GetLength(0),
            data.VehicleNumber,
            data.Depot);

        // Create Routing Model.
        RoutingModel routing = new RoutingModel(manager);

        // Create and register a transit callback.
        int transitCallbackIndex = routing.RegisterTransitCallback(
            (long fromIndex, long toIndex) => {
                // Convert from routing variable Index to distance matrix NodeIndex.
                var fromNode = manager.IndexToNode(fromIndex);
                var toNode = manager.IndexToNode(toIndex);
                return data.DistanceMatrix[fromNode, toNode];
            }
        );

        // Define cost of each arc.
        routing.SetArcCostEvaluatorOfAllVehicles(transitCallbackIndex);

        // Note: 5.0, not 5 -- integer division would truncate before Math.Ceiling runs.
        double answer = 5.0 / data.VehicleNumber + 1;
        routing.AddConstantDimension(
            1,
            (int)Math.Ceiling(answer),
            true, // start cumul to zero
            "Distance");
        RoutingDimension distanceDimension = routing.GetDimensionOrDie("Distance");
        //distanceDimension.SetGlobalSpanCostCoefficient(100);
        for (int i = 0; i < data.VehicleNumber; ++i)
        {
            distanceDimension.SetCumulVarSoftLowerBound(routing.End(i), 2, 1000000);
        }

        // Setting first solution heuristic.
        RoutingSearchParameters searchParameters =
            operations_research_constraint_solver.DefaultRoutingSearchParameters();
        //searchParameters.FirstSolutionStrategy =
        //    FirstSolutionStrategy.Types.Value.PathCheapestArc;
        searchParameters.TimeLimit = new Google.Protobuf.WellKnownTypes.Duration { Seconds = 5 };
        searchParameters.LocalSearchMetaheuristic = LocalSearchMetaheuristic.Types.Value.Automatic;

        // Solve the problem.
        Assignment solution = routing.SolveWithParameters(searchParameters);

        // Print solution on console.
        PrintSolution(data, routing, manager, solution);
    }
}
Perhaps this topic has already been discussed, but I would like to understand the best path to follow and what changes would turn this example (and others like it) into a better approach.
Thank you in advance for your attention; I look forward to feedback.
Thank you.
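For reference, the counting-dimension trick from the blog post reduces to a short, self-contained sketch in the OR-Tools Python API. This is my restatement with arbitrary penalty values, not a fix for the 4-vehicle case:

from ortools.constraint_solver import pywrapcp, routing_enums_pb2

distance_matrix = [
    [0, 9777, 10050, 7908, 10867, 16601],
    [9777, 0, 4763, 4855, 19567, 31500],
    [10050, 4763, 0, 2622, 11733, 35989],
    [7908, 4855, 2622, 0, 10966, 27877],
    [10867, 19567, 11733, 10966, 0, 27795],
    [16601, 31500, 35989, 27877, 27795, 0],
]
num_vehicles, depot = 4, 0

manager = pywrapcp.RoutingIndexManager(len(distance_matrix), num_vehicles, depot)
routing = pywrapcp.RoutingModel(manager)

def distance(from_index, to_index):
    return distance_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

transit = routing.RegisterTransitCallback(distance)
routing.SetArcCostEvaluatorOfAllVehicles(transit)

# counting dimension: every visited node adds 1 to the cumul variable
routing.AddConstantDimension(1, len(distance_matrix), True, "count")
count_dim = routing.GetDimensionOrDie("count")
for v in range(num_vehicles):
    # end cumul >= 2 means the depot start plus at least one real node;
    # a soft bound charges a penalty per missing unit instead of forbidding it
    count_dim.SetCumulVarSoftLowerBound(routing.End(v), 2, 1_000_000)

params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
solution = routing.SolveWithParameters(params)
print("solved:", solution is not None)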
I have the following Python NumPy snippet; it takes X, an array with an arbitrary number of columns and rows, and outputs the Y value predicted by a least-squares fit.
What is the Math.NET equivalent of such a function?
Here is the Python code:
newdataX = np.ones([dataX.shape[0], dataX.shape[1] + 1])
newdataX[:, 0:dataX.shape[1]] = dataX

# build and save the model
self.model_coefs, residuals, rank, s = np.linalg.lstsq(newdataX, dataY)
I think you are looking for the functions on this page: http://numerics.mathdotnet.com/api/MathNet.Numerics.LinearRegression/MultipleRegression.htm
You have a few options to solve this:
Normal equations: MultipleRegression.NormalEquations(x, y)
QR decomposition: MultipleRegression.QR(x, y)
SVD: MultipleRegression.SVD(x, y)
Normal equations are faster but less numerically stable, while SVD is the most numerically stable but the slowest.
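For reference, here is a self-contained version of the NumPy side (the data is invented for illustration), so the Math.NET call has concrete numbers to check against:

import numpy as np

# invented data: 100 samples, 3 features, known coefficients plus an intercept
rng = np.random.default_rng(0)
dataX = rng.random((100, 3))
dataY = dataX @ np.array([1.5, -2.0, 0.5]) + 3.0

# append a column of ones so the last coefficient becomes the intercept
newdataX = np.ones((dataX.shape[0], dataX.shape[1] + 1))
newdataX[:, : dataX.shape[1]] = dataX

coefs, residuals, rank, s = np.linalg.lstsq(newdataX, dataY, rcond=None)
print(coefs)  # approximately [1.5, -2.0, 0.5, 3.0]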
You can call NumPy from .NET using pythonnet (the C# code below is copied from GitHub):
The only "funky" part right now with pythonnet is passing numpy arrays. It is possible to convert them to Python lists at the interface, though this reduces performance for some situations.
https://github.com/pythonnet/pythonnet/tree/develop
static void Main(string[] args)
{
    using (Py.GIL()) {
        dynamic np = Py.Import("numpy");
        dynamic sin = np.sin;
        Console.WriteLine(np.cos(np.pi * 2));
        Console.WriteLine(sin(5));
        double c = np.cos(5) + sin(5);
        Console.WriteLine(c);
        dynamic a = np.array(new List<float> { 1, 2, 3 });
        dynamic b = np.array(new List<float> { 6, 5, 4 }, Py.kw("dtype", np.int32));
        Console.WriteLine(a.dtype);
        Console.WriteLine(b.dtype);
        Console.WriteLine(a * b);
        Console.ReadKey();
    }
}
outputs:
1.0
-0.958924274663
-0.6752620892
float64
int32
[ 6. 10. 12.]
Here is an example using F#, posted on GitHub:
https://github.com/pythonnet/pythonnet/issues/112
open Python.Runtime
open FSharp.Interop.Dynamic
open System.Collections.Generic

[<EntryPoint>]
let main argv =
    //set up for garbage collection?
    use gil = Py.GIL()

    //-----
    //NUMPY
    //import numpy
    let np = Py.Import("numpy")

    //call a numpy function dynamically
    let sinResult = np?sin(5)

    //make a python list the hard way
    let list = new Python.Runtime.PyList()
    list.Append( new PyFloat(4.0) )
    list.Append( new PyFloat(5.0) )

    //run the python list through np.array dynamically
    let a = np?array( list )
    let sumA = np?sum(a)

    //again, but use a keyword to change the type
    let b = np?array( list, Py.kw("dtype", np?int32) )
    let sumAB = np?add(a, b)

    let SeqToPyFloat ( aSeq : float seq ) =
        let list = new Python.Runtime.PyList()
        aSeq |> Seq.iter( fun x -> list.Append( new PyFloat(x)) )
        list

    //Worth making some convenience functions (see below for why)
    let a2 = np?array( [|1.0; 2.0; 3.0|] |> SeqToPyFloat )

    //--------------------
    //Problematic cases: these run but don't give good results
    //make a np.array from a generic list
    let list2 = [|1; 2; 3|] |> ResizeArray
    let c = np?array( list2 )
    printfn "%A" c //gives type not value in debugger

    //make a np.array from an array
    let d = np?array( [|1; 2; 3|] )
    printfn "%A" d //gives type not value in debugger

    //use a np.array in a function
    let sumD = np?sum(d) //gives type not value in debugger
    //let sumCD = np?add(d, d) // this will crash

    //can't use primitive f# operators on the np.arrays without throwing an exception;
    //seems to work in c# https://github.com/tonyroberts/pythonnet //develop branch
    //let e = d + 1

    //-----
    //NLTK
    //import nltk
    let nltk = Py.Import("nltk")
    let sentence = "I am happy"
    let tokens = nltk?word_tokenize(sentence)
    let tags = nltk?pos_tag(tokens)
    let taggedWords = nltk?corpus?brown?tagged_words()
    let taggedWordsNews = nltk?corpus?brown?tagged_words(Py.kw("categories", "news"))
    printfn "%A" taggedWordsNews

    let tlp = nltk?sem?logic?LogicParser(Py.kw("type_check", true))
    let parsed = tlp?parse("walk(angus)")
    printfn "%A" parsed?argument

    0 // return an integer exit code
I'm really sorry if this is a lame question, but I think this may help others making the same transition from C to Python. I have a program that I started writing in C, but I think it's best if I did it in Python because it just makes my life a lot easier.
My program retrieves intraday stock data from Yahoo! Finance and stores it inside a struct. Since I'm so used to programming in C, I generally try to do things the hard way. What I want to know is: what's the most "Pythonesque" way of storing the data in an organized fashion? I was thinking an array of tuples?
Here's a bit of my C program.
// Parses intraday stock quote data from a Yahoo! Finance .csv file.
void parse_intraday_data(struct intraday_data *d, char *path)
{
    char cur_line[100];
    char *csv_value;
    int i;
    FILE *data_file = fopen(path, "r");

    if (data_file == NULL)
    {
        perror("Error opening file.");
        return;
    }

    // Ignore the first 15 lines.
    for (i = 0; i < 15; i++)
        fgets(cur_line, 100, data_file);

    i = 0;
    while (fgets(cur_line, 100, data_file) != NULL) {
        csv_value = strtok(cur_line, ",");
        csv_value = strtok(NULL, ",");
        d->close[i] = atof(csv_value);
        csv_value = strtok(NULL, ",");
        d->high[i] = atof(csv_value);
        csv_value = strtok(NULL, ",");
        d->low[i] = atof(csv_value);
        csv_value = strtok(NULL, ",");
        d->open[i] = atof(csv_value);
        csv_value = strtok(NULL, "\n");
        d->volume[i] = atoi(csv_value);
        i++;
    }
    d->close[i] = 0;
    d->high[i] = 0;
    d->low[i] = 0;
    d->open[i] = 0;
    d->volume[i] = 0;
    d->count = i - 1;
    i = 0;
    fclose(data_file);
}
So far my Python program retrieves the data like this.
response = urllib2.urlopen('https://www.google.com/finance/getprices?i=' + interval + '&p=' + period + 'd&f=d,o,h,l,c,v&df=cpct&q=' + ticker)
Question is, what's the best or most elegant way of storing this data in Python?
Keep it simple. Read the line, split it by commas, and store the values inside a (named)tuple. That’s pretty close to using a struct in C.
If your program gets more elaborate it might (!) make sense to replace the tuple by a class, but not immediately.
Here’s an outline:
from collections import namedtuple

IntradayData = namedtuple('IntradayData',
    ['close', 'high', 'low', 'open', 'volume', 'count'])

response = urllib2.urlopen('https://www.google.com/finance/getprices?q=AAPL')
result = response.read().split('\n')
result = result[15:]  # Your code does this, too. Not sure why.
all_data = []
for i, data in enumerate(result):
    if data == '':
        continue
    c, h, l, o, v, _ = map(float, data.split(','))
    all_data.append(IntradayData(c, h, l, o, v, i))
I believe it depends on how much data manipulation you want to do after retrieving the data.
If, for example, you plan to just print it on the screen, then an array of tuples would do.
However, if you need to sort, search, and do other kinds of data manipulation, I believe a custom class could help: you would then work with a list (or even a home-brew container) of custom objects, which allows you to easily add custom methods based on your needs.
Note that this is just my opinion, and I'm not an advanced Python developer.
Pandas (http://pandas.pydata.org/pandas-docs/stable/) is particularly well suited to this. Numpy is a little lower level, but may also suit your purposes. I really recommend going the pandas route, though. Either way you shouldn't lose too much of C's speed, so that's a plus.
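To make the pandas suggestion concrete, a minimal sketch (column names assumed; the real feed has a date column and header lines you would skip):

import io
import pandas as pd

csv_text = """close,high,low,open,volume
10.5,10.9,10.1,10.3,1200
10.7,11.0,10.4,10.5,900
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df["close"].mean())      # column-wise operations replace the per-field arrays
print(df.sort_values("high"))  # sorting and filtering come for free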
I am trying to order data scattered over the globe on a regular lat/lon grid.
I could go through all the data, check which grid cell each point belongs to, and then append it to a list for that cell. This approach seems long and not very efficient.
I am sure this is not a new problem, but I have not been able to find a solution on the web. Does anyone have a suggestion, or can you point me to an example or tutorial?
I believe geohashing would be useful; in this particular case you could use a Morton number. The blog post "Spatial Keys – Memory Efficient Geohashes" has an example implementation; it's in Java, but a Python version isn't much different.
// lat/lon: the point being encoded; minLatI etc.: the grid bounds;
// iterations: the number of bit pairs to emit (precision)
long hash = 0;
double minLat = minLatI;
double maxLat = maxLatI;
double minLon = minLonI;
double maxLon = maxLonI;
int i = 0;
while (true) {
    double midLat = (minLat + maxLat) / 2;
    if (lat > midLat) {
        hash |= 1;
        minLat = midLat;
    } else
        maxLat = midLat;

    hash <<= 1;
    double midLon = (minLon + maxLon) / 2;
    if (lon > midLon) {
        hash |= 1;
        minLon = midLon;
    } else
        maxLon = midLon;

    i++;
    if (i < iterations)
        hash <<= 1;
    else
        break;
}
return hash;
The advantage of the Morton code is that you can calculate fewer bits (fewer iterations above) for a coarser grid and more bits for a finer grid. Or compute fine-grained codes once and just use their prefixes for the coarser grid.
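Since a Python version isn't much different, here is a rough translation of the same loop (the default bounds and iteration count are my assumptions):

def spatial_key(lat, lon, iterations=16,
                min_lat=-90.0, max_lat=90.0, min_lon=-180.0, max_lon=180.0):
    """Interleave lat/lon bisection bits into a single Morton-style integer."""
    key = 0
    for _ in range(iterations):
        mid_lat = (min_lat + max_lat) / 2
        key <<= 1
        if lat > mid_lat:
            key |= 1
            min_lat = mid_lat
        else:
            max_lat = mid_lat
        mid_lon = (min_lon + max_lon) / 2
        key <<= 1
        if lon > mid_lon:
            key |= 1
            min_lon = mid_lon
        else:
            max_lon = mid_lon
    return key

# points whose keys share a prefix fall in the same coarse grid cell
print(bin(spatial_key(51.5, -0.12)))   # London
print(bin(spatial_key(51.6, -0.10)))   # a nearby point: shared prefix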