A friend once needed a small script to simplify a programmer’s daily life. Given an input path, a list of file names and an output directory, it should simply find all the files from the list in the input folder and copy them to the specified destination. Well, I didn’t have time back then (although I love to take a few minutes to create such small and helpful command line tools), so he assembled a quick ‘n’ dirty temporary solution in C#. More than a year later I needed a few sample files, so I sent him a list, because we had ‘ListCompare’ to take care of it. (Fun fact: I came up with this extremely unfitting name and I have no idea why. It doesn’t compare lists, it compares a directory with a list. Very misleading title…) And as is always the case with .NET: it just didn’t work. Finally, my chance had come to finish (…or better: start) this project. Shouldn’t take too long…
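For reference, the core task described above fits in a few lines. The following is only a minimal sketch in Python, not the actual tool (which is written in Haxe). The function name `copy_listed_files` is made up for illustration, and searching subfolders recursively is an assumption on my part. It also assumes the output directory is not inside the input folder.

```python
import shutil
from pathlib import Path


def copy_listed_files(input_dir, names, output_dir):
    """Copy every file below input_dir whose name is in names to output_dir.

    Hypothetical helper for illustration only. Returns the set of
    requested names that were not found anywhere below input_dir.
    """
    # Ignore blank entries and surrounding whitespace in the list
    wanted = {n.strip() for n in names if n.strip()}
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    found = set()
    # Assumption: subfolders are searched as well (rglob is recursive)
    for path in Path(input_dir).rglob("*"):
        if path.is_file() and path.name in wanted:
            # copy2 also preserves timestamps; a later file with the
            # same name overwrites an earlier copy
            shutil.copy2(path, out / path.name)
            found.add(path.name)
    return wanted - found
```

Calling `copy_listed_files("samples", ["a.txt", "b.txt"], "out")` would copy the matching files and hand back whatever could not be found, which is handy for reporting missing entries to the user.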
As for most of my CLI applications, I used Haxe. A great language, but still quite new - which means there aren’t many code examples or libraries around. I quickly decided to use the opportunity to create a few scripts and functions that would be of use in future projects, but that I hadn’t had time to write in the past. The first thing that came to my mind was a command line parser. A quick search on haxelib revealed that there already is a good one. Even better: hxargs comes as a simple single file - and uses a feature of Haxe that is mostly considered black magic: macros. As I only understand a small part of this extremely powerful language feature, I decided to take a look at it and extend its functionality to make it even easier to use. (The modified version can be found in the git repository of BioLBM. I didn’t contribute the changes to the main repository. It doesn’t work well as a library because of some things I still can’t get to work. Maybe it’s not black magic, but it is hard to understand completely.)
The second one is closely connected to the idea of adding fuzzy search capabilities to the tool. When searching for files or even text, usability improves a lot if not only exact matches are displayed, but also similar (e.g. misspelled) items. To achieve that, it is necessary to calculate the distance (i.e. the dissimilarity) between the search word and all the strings to search, and return all matches whose distance is below a threshold value. Doing so naively for thousands of files can result in minutes of processing time; with a whole list of search terms this can add up to a lot of waiting time for the user. The solution: index the files once, using the distances between their names to arrange them in a data structure designed for similarity-based searches - the BK-tree. As Haxe didn’t have one yet, a bit more work was required. And so here it is: a BK-tree library for Haxe.
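To illustrate the idea, here is a minimal BK-tree sketch in Python. It is not the code of the Haxe library and makes no claim about its API; the names `levenshtein`, `BKTree`, `add` and `search` are chosen for this example only, and using the Levenshtein edit distance as the metric is an assumption. Each node stores a word, and its children are keyed by their distance to that word. During a search, the triangle inequality allows skipping every child whose key is further than the threshold away from the distance between query and node.

```python
def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Illustrative BK-tree; a node is a tuple (word, {distance: child})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already indexed
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child  # descend along the edge labelled d

    def search(self, query, max_distance):
        """Return (distance, word) pairs within max_distance, sorted."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_distance:
                results.append((d, word))
            # Triangle inequality: only edges in [d - max, d + max] can match
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)
```

The tree is built once by calling `add` for every file name; each later `search` then only has to compute distances for the branches that can still contain a match, instead of for every indexed name.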
During the script creation process I also made some changes to my personal Haxe snippet library (Bio.hx) and used the opportunity to release it on GitHub. For installation and usage instructions, please refer to the readme files in the GitHub repositories.

Written on September 12th, 2016 by bioruebe