Programming by example
Sometimes I don’t know the function I need to perform a task I want.
Often its something simple like; I have two dataframes and I want to create a
new dataframe which has all the rows of the first followed by all of the rows
of the second.
I can for a small example create the two data frames and I can try and
describe what I want to find through google, however, often the phrases that I
type into google, being natural language, are ambiguous an imprecise of what
I’m trying to describe.
In this example of concatenating data frames, I more often came across joining
the two dataframe by adding the columns of one to columns of the other rather
than adding rows.
This test of the function I’m currently building passes managing to find the
function in the pandas library which concatenates the two objects.
This is still very much a pre-useful stage as in the test I’ve had to put the
two dataframes in a list as, having found the solution through google, knew
that is how they needed to be joined.
The idea is to try and progress this style of trying all the functions in a set
of possible functions and all of the functions which are attributes on the
argument themselves until the solution is found.
The solution can also be one where we would know it when we see it but don’t
know per se what it is.
Since we pass a criteria function rather than an object to find equality with
if we can, given the correct object, know it is the correct object, this
function can be used to find it.
The structure of the finding is to try 4 derivatives of the source object.
- The object itself (To do nothing is sufficient)
- An attribute of the object
- Calling member function of the object
- Passing the object as an argument to one of a set of functions
These will then need to be composed going forward to permit arbitrary
pathfinding. That is to say, there are a lot of questions which can’t be
answered with a single function call and so to get the desired object more
than one of the above four operations is required.
At the time of writing, I have created a first-order find function that will
find a path which is a composition of two of the list of candidate operations.
The problem with creating a general form of this function is that the order
becomes important. For each of the candidate classes above the size of the
class is limited.
Depth as a direction to this is potentially infinite in size. Without a
sensible ordering or aggressive trimming, one of the four classes will be checked
but the others never will.
I have friends who would say this whole exercise is superfluous for you should
simply remember the member functions of an object or read the documentations
so as not to have issue. For those of us who are more humble about our
abilities this can for example check the space of function in the pandas
library on the simple example’s in less than 0.3 seconds. That is quicker than
most people can react to anything, let alone find a path set.
It is true that the criteria and source data needs to be built but the speed
of the search beyond the cost of the specification makes me think that even
for searches of some depth this could be useful.
What about functions where you need to pass arguments?
This is another section which is complicated and has the ability to blow up in
scope. The way I have currently solved this is with the concept of hints. The
hints contain the totality of the possible arguments to be passed.
Although I don’t have a system to match the hints with the keyword arguments
yet, they do provide a way to specify possible options.
Plan going forward
There are quite a few things I want to do with this project going forward.
Almost all revolve around the idea of intelligently trimming the tree of
possible options down and ordering those that are left.
At the moment the find function won’t be able to pass argument that are
keyword only, this requires having an understanding of what the possible
keyword arguments could be. It would also work well if we could as part of the
hint system specify that this hint should be attempted as the value for this
keyword argument, on the condition that the keyword argument is a valid
option. This is what I’m currently working on using the inspect library.
Typing is a low hanging fruit when it comes to pruning the tree of possible
options. There is no point passing hints as arguments if they are not the
right type. Equally there is no point composing functions that don’t match in
their types.
Idempotence and symmetry
Quickspec is a haskel module which I found inspiring when I was first
introduce to it at university. It find relational truths about the functions
given to it. It can for instance find that two functions are comutitive in
their application, or associative.
If a function application is idempotent then application of the same function
to the result will do nothing. That means we can remove the option of further
application of idempotent function from the tree.
If the composition of two functions is symmetric that is to say one function
is the inverse of the other then we can trim the composition of those two
function from the tree as it is not providing us with any exploration of
space.
Ordering candidates.
Beyond trimming the tree we also need to order the candidates that are left.
One way to do this would be to have the criteria function return, rather than
a boolean, a number between 0 and 1 representing a distance the correct
answer. Candidates can functions on candidates can then be ordered base on how
close they are to the correct answer. This puts a lot of onus on the criteria
function and doesn’t order the functions before they are applied to the
candidates.
Comments
Post a Comment