For a new project I started recently, I have been using a Ruby gem called Directory Watcher, which does just what it says: it watches directories! Specifically, it watches for changes to files. Even more specifically, it detects files being added, modified, and deleted.
It is pretty simple, from the docs:
Not only is it simple, but it works well and it works quickly. I have absolutely no complaints.
One thing I found interesting about how it works, though, is that when you register an observer, it gets passed an Array of all the changes, as opposed to each change one at a time. I asked the developer, Tim Pease, why, and he said:
No particular reason for the choice…Iterating over the events array and passing them one at a time might prove to be faster.
Please do investigate and let me know what you find out :)
And so I decided to investigate!
I tried two variations of the original code. The bulk of the original is in its run method:
As you can see, it calls a method named scan_files, which returns a Hash of all the files found. It then compares this hash to the one from the previous iteration to find the differences. In my first variation, I modified the scan_files method to call notify_observers itself with each event as it is found, rather than collecting the events into an array. I thought this would be faster because there would be less overhead.
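To make the comparison step concrete, here is a minimal sketch of diffing two scan snapshots. It is not the gem's code: the diff_snapshots helper, the Event struct, and the use of plain numbers as modification times are all assumptions made for illustration.

```ruby
# Hypothetical event record; the gem's own event class may differ.
Event = Struct.new(:type, :path)

# Compare the previous snapshot with the current one.
# Both are hashes of path => modification time (an assumption).
def diff_snapshots(previous, current)
  events = []
  # Paths only in the current snapshot are new files.
  (current.keys - previous.keys).each { |path| events << Event.new(:added, path) }
  # Paths in both snapshots changed if their recorded times differ.
  (previous.keys & current.keys).each do |path|
    events << Event.new(:modified, path) if previous[path] != current[path]
  end
  # Paths only in the previous snapshot have been deleted.
  (previous.keys - current.keys).each { |path| events << Event.new(:removed, path) }
  events
end

previous = { "a.txt" => 100, "b.txt" => 100 }
current  = { "b.txt" => 200, "c.txt" => 150 }
diff_snapshots(previous, current).each { |event| puts "#{event.type} #{event.path}" }
```

The point of the sketch is only that every change falls out of one pass over two hashes, which is why collecting the events into an array comes naturally.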
In my second variation, I did exactly as Tim suggested and just had it iterate through the array of events, calling notify_observers once for each event.
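The difference between the original behaviour and this second variation can be sketched with a small stand-in class. TinyNotifier and its notify_batched and notify_each methods are hypothetical names invented here; only add_observer and notify_observers echo names mentioned in this post, and their bodies are my own guess at the general shape, not Directory Watcher's source.

```ruby
# Hypothetical stand-in for a watcher; not Directory Watcher's actual code.
class TinyNotifier
  def initialize
    @observers = []
  end

  # Register a block to be called with one or more events.
  def add_observer(&block)
    @observers << block
  end

  # Hand the given events to every registered observer.
  def notify_observers(*events)
    @observers.each { |observer| observer.call(*events) }
  end

  # Original behaviour as described above: one call carrying every event.
  def notify_batched(events)
    notify_observers(*events) unless events.empty?
  end

  # Second variation: same events array, but one call per event.
  def notify_each(events)
    events.each { |event| notify_observers(event) }
  end
end

events = ["added c.txt", "modified b.txt", "removed a.txt"]
notifier = TinyNotifier.new
notifier.add_observer { |*args| puts "observer called with #{args.size} event(s)" }

notifier.notify_batched(events)
notifier.notify_each(events)
```

With three events, the batched style calls the observer once and the per-event style calls it three times, so any extra cost in the second variation comes from those additional calls.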
To test, I ran each of the three variations 50 times on directories of increasing size. Here is my test script.
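My exact script is not reproduced here, but the general shape of such a timing run can be sketched with Ruby's standard Benchmark module. Everything in this sketch is an assumption for illustration: the time_runs and scan helpers are hypothetical, the directory sizes are tiny so it finishes quickly, and it times a plain directory scan rather than the gem itself.

```ruby
require "benchmark"
require "tmpdir"

# Hypothetical helper: time `runs` executions of the block and return
# the mean and population standard deviation, both in seconds.
def time_runs(runs)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  mean = times.sum / runs
  variance = times.sum { |t| (t - mean)**2 } / runs
  [mean, Math.sqrt(variance)]
end

# Stand-in for one scan pass: record the mtime of every file under dir.
def scan(dir)
  Dir.glob(File.join(dir, "**", "*")).each_with_object({}) do |path, files|
    files[path] = File.mtime(path) if File.file?(path)
  end
end

Dir.mktmpdir do |dir|
  # Small sizes keep the sketch quick; real runs would use far larger directories.
  [100, 1_000].each do |count|
    # Top the directory up to `count` empty files.
    Dir.children(dir).size.upto(count - 1) do |i|
      File.write(File.join(dir, "file#{i}.txt"), "")
    end
    mean, stddev = time_runs(50) { scan(dir) }
    puts format("%6d files: mean %.4fs, stddev %.4fs", count, mean, stddev)
  end
end
```

Swapping the body of the block passed to time_runs is all it takes to compare one variation against another on the same directory.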
(All times are in seconds. I calculated standard deviations, but they didn’t differ much between variations, so I’m not bothering to post them.)
And a graph:
As you can see, the original was the fastest, but it beat the second variation by only 1.2 seconds when scanning over 200,000 files. When running my tests, I did nothing with what notify_observers passed back, so in the case of the original the events array was never iterated over, which might account for the discrepancy.
Either way, Directory Watcher’s original implementation of storing the list of ‘events’ (changed files) in an array was definitely faster than notifying for each event as it was found (my first variation).
I prefer getting my events one at a time, so I’ll be going with my second variation for the code I use. And thanks, Tim, for writing this wonderful gem. It is making my life quite a bit easier!