Programming by example

Sometimes I don’t know the function I need to perform a task I want.
Often its something simple like; I have two dataframes and I want to create a new dataframe which has all the rows of the first followed by all of the rows of the second.
I can for a small example create the two data frames and I can try and describe what I want to find through google, however, often the phrases that I type into google, being natural language, are ambiguous an imprecise of what I’m trying to describe.
In this example of concatenating data frames, I more often came across joining the two dataframe by adding the columns of one to columns of the other rather than adding rows.


This test of the function I’m currently building passes managing to find the function in the pandas library which concatenates the two objects.
This is still very much a pre-useful stage as in the test I’ve had to put the two dataframes in a list as, having found the solution through google, knew that is how they needed to be joined.

The idea is to try and progress this style of trying all the functions in a set of possible functions and all of the functions which are attributes on the argument themselves until the solution is found.
The solution can also be one where we would know it when we see it but don’t know per se what it is.
Since we pass a criteria function rather than an object to find equality with if we can, given the correct object, know it is the correct object, this function can be used to find it.

The structure of the finding is to try 4 derivatives of the source object.
  • The object itself (To do nothing is sufficient)
  • An attribute of the object
  • Calling member function of the object
  • Passing the object as an argument to one of a set of functions

These will then need to be composed going forward to permit arbitrary pathfinding. That is to say, there are a lot of questions which can’t be answered with a single function call and so to get the desired object more than one of the above four operations is required.

At the time of writing, I have created a first-order find function that will find a path which is a composition of two of the list of candidate operations.
The problem with creating a general form of this function is that the order becomes important. For each of the candidate classes above the size of the class is limited.
Depth as a direction to this is potentially infinite in size. Without a sensible ordering or aggressive trimming, one of the four classes will be checked but the others never will.

I have friends who would say this whole exercise is superfluous for you should simply remember the member functions of an object or read the documentations so as not to have issue. For those of us who are more humble about our abilities this can for example check the space of function in the pandas library on the simple example’s in less than 0.3 seconds. That is quicker than most people can react to anything, let alone find a path set.
It is true that the criteria and source data needs to be built but the speed of the search beyond the cost of the specification makes me think that even for searches of some depth this could be useful.

What about functions where you need to pass arguments?
This is another section which is complicated and has the ability to blow up in scope. The way I have currently solved this is with the concept of hints. The hints contain the totality of the possible arguments to be passed.
Although I don’t have a system to match the hints with the keyword arguments yet, they do provide a way to specify possible options.

Plan going forward
There are quite a few things I want to do with this project going forward. Almost all revolve around the idea of intelligently trimming the tree of possible options down and ordering those that are left.

At the moment the find function won’t be able to pass argument that are keyword only, this requires having an understanding of what the possible keyword arguments could be. It would also work well if we could as part of the hint system specify that this hint should be attempted as the value for this keyword argument, on the condition that the keyword argument is a valid option. This is what I’m currently working on using the inspect library.

Typing is a low hanging fruit when it comes to pruning the tree of possible options. There is no point passing hints as arguments if they are not the right type. Equally there is no point composing functions that don’t match in their types.

Idempotence and symmetry
Quickspec is a haskel module which I found inspiring when I was first introduce to it at university. It find relational truths about the functions given to it. It can for instance find that two functions are comutitive in their application, or associative.
If a function application is idempotent then application of the same function to the result will do nothing. That means we can remove the option of further application of idempotent function from the tree.
If the composition of two functions is symmetric that is to say one function is the inverse of the other then we can trim the composition of those two function from the tree as it is not providing us with any exploration of space.

Ordering candidates.
Beyond trimming the tree we also need to order the candidates that are left. One way to do this would be to have the criteria function return, rather than a boolean, a number between 0 and 1 representing a distance the correct answer. Candidates can functions on candidates can then be ordered base on how close they are to the correct answer. This puts a lot of onus on the criteria function and doesn’t order the functions before they are applied to the candidates.

Comments

Popular posts from this blog

Structural engineering with cardboard

The twelve fold way