Posts

I recently had to solve a problem that is part theory, part Computer Science 101 in a real-world application: finding all nodes in an m-ary tree given an arbitrary starting node. The trees were stored in a SQL database using the parent pointer model, in which each row references its parent. Graph databases are better suited to modeling such data, but let's set that aside for now, because most data today still lives in relational databases of some sort. While trying to solve this seemingly simple problem in Postgres, I came across some hidden costs of common table expressions (CTEs) that I had not known about. Let's look at this simple theoretical problem, the pure SQL solution, and some things you should be aware of in how database systems implement CTEs.
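A minimal sketch of one pure-SQL approach, assuming a hypothetical `nodes(id, parent_id)` table in which roots have a NULL `parent_id`: one recursive CTE climbs from the starting node to its root, and a second descends from that root to collect every node in the tree. It uses Python's built-in `sqlite3` module so it runs anywhere; SQLite accepts the same `WITH RECURSIVE` syntax as Postgres, but this sketch says nothing about the Postgres-specific CTE costs mentioned above.

```python
import sqlite3

# Hypothetical schema: each row points at its parent; roots have parent_id NULL.
SCHEMA = "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES nodes(id))"

# Climb from the start node to its root, then descend from that root.
TREE_MEMBERS_SQL = """
WITH RECURSIVE
  ancestors(id, parent_id) AS (
    SELECT id, parent_id FROM nodes WHERE id = ?
    UNION ALL
    SELECT n.id, n.parent_id
    FROM nodes AS n JOIN ancestors AS a ON n.id = a.parent_id
  ),
  root(id) AS (
    SELECT id FROM ancestors WHERE parent_id IS NULL
  ),
  descendants(id) AS (
    SELECT id FROM root
    UNION ALL
    SELECT n.id
    FROM nodes AS n JOIN descendants AS d ON n.parent_id = d.id
  )
SELECT id FROM descendants ORDER BY id
"""


def tree_members(conn, start_id):
    """Return the ids of every node in the tree that contains start_id."""
    return [row[0] for row in conn.execute(TREE_MEMBERS_SQL, (start_id,))]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Tree A: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6}.  Tree B: 7 -> {8}.
conn.executemany(
    "INSERT INTO nodes (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, None), (8, 7)],
)

print(tree_members(conn, 5))  # [1, 2, 3, 4, 5, 6]
print(tree_members(conn, 8))  # [7, 8]
```

Splitting the query into an upward and a downward pass keeps each recursive step a simple join on the parent pointer; `UNION ALL` assumes the data really is a tree, since a cycle would make it recurse indefinitely.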
Recent frustrations with the more traditional desktop environment I have been using have led me to switch to a simpler window manager. This post isn't so much about what I learned about window managers and the X Window System as about how I set everything up so that I could debug a window manager and step through it line by line. I did not find this particularly straightforward, though that may simply be because I am nowhere near a GUI programmer, whether for desktop applications on any platform or for the web, and am not familiar with how the 'pros' do it. I am sure someone better versed in this field can explain why what I am doing is ridiculous or point to much better ways.
There is a known trick for adding bound (a.k.a. box) constraints to optimization algorithms that do not natively support them, such as the Levenberg-Marquardt algorithm. The trick is to add an internal transformation step to the objective function, so that an unbounded value is mapped into the bounded domain before it is actually passed to the objective function. In this post we will examine the standard transformation used for this purpose and compare the performance of some alternatives.
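A minimal sketch of the trick, assuming the sine-based mapping x = lo + (hi - lo)(sin u + 1)/2 that MINUIT and lmfit document for two-sided bounds (whether this is the exact transformation the post examines is an assumption). The optimizer only ever sees the unbounded internal value u. A toy finite-difference gradient descent stands in for Levenberg-Marquardt, and all helper names here are hypothetical.

```python
import math


def to_bounded(u, lo, hi):
    """Map an unbounded internal value u onto [lo, hi] with a sine transform."""
    return lo + (hi - lo) * (math.sin(u) + 1.0) / 2.0


def to_internal(x, lo, hi):
    """Inverse of to_bounded for lo <= x <= hi (used for the starting guess)."""
    return math.asin(2.0 * (x - lo) / (hi - lo) - 1.0)


def bounded_objective(f, lo, hi):
    """Wrap f so an unconstrained optimizer can search over u instead of x."""
    return lambda u: f(to_bounded(u, lo, hi))


def gradient_descent(g, u0, lr=0.1, steps=500, h=1e-6):
    """Toy unconstrained optimizer, a stand-in for e.g. Levenberg-Marquardt."""
    u = u0
    for _ in range(steps):
        grad = (g(u + h) - g(u - h)) / (2.0 * h)  # central-difference derivative
        u -= lr * grad
    return u


def minimize_with_bounds(f, x0, lo, hi):
    """Optimize f over [lo, hi] by optimizing the wrapped objective over u."""
    g = bounded_objective(f, lo, hi)
    u_opt = gradient_descent(g, to_internal(x0, lo, hi))
    return to_bounded(u_opt, lo, hi)  # map the result back into [lo, hi]


# Unconstrained minimum at x = 3 lies outside [0, 2], so the result approaches the bound 2.
print(minimize_with_bounds(lambda x: (x - 3.0) ** 2, x0=1.0, lo=0.0, hi=2.0))
# Unconstrained minimum at x = 1.5 lies inside [0, 2], so it is recovered directly.
print(minimize_with_bounds(lambda x: (x - 1.5) ** 2, x0=1.0, lo=0.0, hi=2.0))
```

Because sin u never leaves [-1, 1], every evaluation of f stays inside the bounds. Note that dx/du = (hi - lo)/2 cos u is zero at the bounds themselves, a property any comparison of alternative transformations has to take into account.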
Numerical algorithms are often developed in Python because it makes prototyping easy, and because many of the libraries that do the heavy lifting are written in fast, compiled languages, so performance is usually not an issue. However, there are often cases where you don't want to use Python. For instance, you may not want to deploy an application containing sensitive IP written in a scripting language (even in the form of Python bytecode) on a client PC. As an algorithm developer, in many such cases you will be working with another team to integrate the algorithm into a Windows GUI application, often written in something like C#. For such cases, it is useful to know how to cross-compile C++ code on Linux into a DLL for Windows deployment, and how to interoperate between C++ and C#.
I recently saw that conda-forge released a TensorFlow-GPU package. Many people, myself included, have been waiting for this since Anaconda changed the license for their conda package repositories, because having a GPU-enabled conda package trivializes the installation of GPU-enabled TensorFlow, which is notoriously tricky. This prompted me to step back and look at why efficient implementations of optimization algorithms, like gradient descent, are necessary, and why you should generally not write numerical algorithms in pure Python.
High-level packages for neural networks and optimization have greatly simplified the model development process. However, now that these packages have become the de facto approach to the task, the older methods are becoming something of a lost art. That is a problem, because there are still many relevant cases where the classic approaches just seem to work better.
Earlier in the year I encountered a bug in the SciPy optimization routine 'trust-constr'. This is SciPy's closest analogue to MATLAB's `fmincon`, supporting optimization subject to arbitrary (linear or nonlinear) constraints, a task that comes up frequently in data science. This post details the simple fix I made to the SciPy source code (which will hopefully be merged soon), as well as workarounds that can be used in the meantime.
30 Dec 2020 » Hello, world!
A brief statement of purpose