Trying to make Directory Watcher faster
For a new project I started recently, I have been using a Ruby gem called Directory Watcher, which does just what it says: It watches directories! Specifically, it watches for changes to files. Even more specifically, it watches for files being added, modified and deleted.
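One way a polling watcher can detect those three kinds of change is to take periodic snapshots that map each file path to its modification time, then compare consecutive snapshots. This is a minimal sketch under that assumption, not the gem's code; `snapshot` and `diff_snapshots` are hypothetical names:

```ruby
# Hypothetical helper: map every regular file under +dir+ to its mtime.
def snapshot(dir)
  Dir.glob(File.join(dir, '**', '*'))
     .select { |path| File.file?(path) }
     .to_h { |path| [path, File.mtime(path)] }
end

# Hypothetical helper: compare two snapshots and return an Array of
# [type, path] pairs, where type is :added, :modified or :removed.
def diff_snapshots(previous, current)
  events = []
  current.each do |path, mtime|
    if !previous.key?(path)
      events << [:added, path]
    elsif previous[path] != mtime
      events << [:modified, path]
    end
  end
  previous.each_key do |path|
    events << [:removed, path] unless current.key?(path)
  end
  events
end

before = { 'a.txt' => Time.at(1), 'b.txt' => Time.at(2), 'd.txt' => Time.at(3) }
after  = { 'a.txt' => Time.at(1), 'b.txt' => Time.at(5), 'c.txt' => Time.at(4) }
p diff_snapshots(before, after)
# => [[:modified, "b.txt"], [:added, "c.txt"], [:removed, "d.txt"]]
```

Relying on modification times keeps each scan cheap, with one caveat: two writes that land within the file system's timestamp resolution look like a single change.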
It is pretty simple to use. From the docs:
Not only is it simple but it works well and it works quickly. I have absolutely no complaints.
One thing I found interesting about how it works, though, is that when you register an observer, it is passed an `Array` of all the changes rather than each change one at a time. I asked the developer, Tim Pease, why, and he said:
> No particular reason for the choice…Iterating over the events array and passing them one at a time might prove to be faster.
>
> Please do investigate and let me know what you find out :)
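To make the batched style concrete, here is a minimal sketch using a hypothetical `BatchNotifier` class rather than the gem itself. The method names echo the ones in this post, but the class and its behaviour are assumptions for illustration. Each observer is called once per scan and receives a single Array holding every event from that scan:

```ruby
# Hypothetical notifier illustrating batched delivery: one call per
# observer per scan, with all of that scan's events in a single Array.
class BatchNotifier
  def initialize
    @observers = []
  end

  # Register an observer block; it will receive an Array of events.
  def add_observer(&block)
    @observers << block
  end

  def notify_observers(events)
    return if events.empty? # nothing changed, so no observer is called
    @observers.each { |observer| observer.call(events) }
  end
end

calls = []
notifier = BatchNotifier.new
notifier.add_observer { |events| calls << events.length }
notifier.notify_observers([[:added, 'a.txt'], [:removed, 'b.txt']])
notifier.notify_observers([])
p calls # => [2]
```

With this style, an observer that wants to handle events one by one has to loop over the Array itself, which is the work Tim suggested moving into the watcher.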
And so I decided to investigate!
I tried two variations of the original code. The bulk of the original logic is in its `run` method:
As you can see, it calls a method named `scan_files`, which returns a `Hash` of all the files found. It then compares this hash to the one from the previous iteration to find the differences.

In my first variation, I modified the `scan_files` method to call `notify_observers` itself with each event as it is found, rather than returning an array. I thought this would be faster because there would be less overhead.
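A rough sketch of what "notify as each change is found" looks like, with a hypothetical `StreamingScanner` class and in-memory hashes of path => mtime standing in for a real directory scan (both assumptions, not the author's modified code). No events Array is built; each detected change triggers the observers immediately:

```ruby
# Hypothetical sketch of "notify as each change is found": no events
# Array is built; every detected change triggers observer calls at once.
class StreamingScanner
  def initialize
    @observers = []
    @previous = {}
  end

  # Register an observer block; it receives (type, path) for each change.
  def add_observer(&block)
    @observers << block
  end

  # +current+ maps path => mtime. A real watcher would read this from the
  # file system; here it is passed in to keep the sketch self-contained.
  def scan(current)
    current.each do |path, mtime|
      if !@previous.key?(path)
        notify(:added, path)
      elsif @previous[path] != mtime
        notify(:modified, path)
      end
    end
    @previous.each_key do |path|
      notify(:removed, path) unless current.key?(path)
    end
    @previous = current.dup # keep our own copy for the next comparison
  end

  private

  # One call per observer for every single event.
  def notify(type, path)
    @observers.each { |observer| observer.call(type, path) }
  end
end

seen = []
scanner = StreamingScanner.new
scanner.add_observer { |type, path| seen << [type, path] }
scanner.scan({ 'a.txt' => 1 })
scanner.scan({ 'a.txt' => 2, 'b.txt' => 1 })
scanner.scan({ 'b.txt' => 1 })
p seen
# => [[:added, "a.txt"], [:modified, "a.txt"], [:added, "b.txt"], [:removed, "a.txt"]]
```

One consequence of this design is that a scan cannot finish until every observer has returned for every event, so any work done in an observer is interleaved with the directory walk itself.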
In my second variation, I did exactly as Tim suggested and just had it iterate through the array of events, calling `notify_observers` once for each event.
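A minimal sketch of that second approach, again with a hypothetical class (`PerEventNotifier`) rather than the gem's real code: the scan still produces an Array of events, and a loop then hands them out one at a time, calling `notify_observers` once per event:

```ruby
# Hypothetical sketch of the second variation: events are collected into
# an Array first, then delivered one at a time.
class PerEventNotifier
  def initialize
    @observers = []
  end

  # Register an observer block; it receives a single event per call.
  def add_observer(&block)
    @observers << block
  end

  # Called once per event; every observer receives that one event.
  def notify_observers(event)
    @observers.each { |observer| observer.call(event) }
  end

  # Walk the scan's Array of events, notifying once for each.
  def deliver(events)
    events.each { |event| notify_observers(event) }
  end
end

received = []
notifier = PerEventNotifier.new
notifier.add_observer { |event| received << event }
notifier.deliver([[:added, 'a.txt'], [:removed, 'b.txt']])
p received # => [[:added, "a.txt"], [:removed, "b.txt"]]
```

Unlike notifying from inside the scan loop, this keeps the directory walk and the observer calls in separate phases, while still giving observers one event per call.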
To test, I ran each of the three versions (the original and my two variations) 50 times on directories of increasing size. Here is my test script.
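A timing harness of this general shape would run a piece of work a fixed number of times and report the mean and standard deviation. This is a hypothetical sketch, not the author's script: `time_runs` and `elapsed` are illustrative names, and it uses Ruby's monotonic clock rather than any particular benchmarking library:

```ruby
# Time one call of the given block with a monotonic clock (seconds).
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical harness: run the block +runs+ times and return the mean and
# sample standard deviation of the wall-clock times, in seconds.
def time_runs(runs = 50, &block)
  raise ArgumentError, 'need at least 2 runs' if runs < 2

  times = Array.new(runs) { elapsed(&block) }
  mean = times.sum / runs
  variance = times.sum { |t| (t - mean)**2 } / (runs - 1)
  [mean, Math.sqrt(variance)]
end

mean, stddev = time_runs(50) { (1..100_000).reduce(:+) }
puts format('mean %.4fs, stddev %.4fs', mean, stddev)
```

In a setup like the one described, the block would perform one full scan of a test directory with a given version, and the harness would be called once per version and directory size.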
My results:
Directory size | Original | First variation | Second variation
---|---|---|---
4,887 files | 0.22 | 0.26 | 0.23
42,304 files | 1.48 | 1.91 | 1.7
249,467 files | 28.7 | 34.1 | 29.9
(All times in seconds. I calculated standard deviations, but they didn’t vary much from variation to variation, so I’m not bothering to post them.)
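Expressing each time relative to the original makes the table easier to compare across directory sizes. A quick calculation over the numbers above (the `overhead_pct` helper is just for illustration):

```ruby
# Times in seconds, copied from the results table.
RESULTS = {
  4_887   => { original: 0.22, first: 0.26, second: 0.23 },
  42_304  => { original: 1.48, first: 1.91, second: 1.7 },
  249_467 => { original: 28.7, first: 34.1, second: 29.9 }
}.freeze

# Percentage by which +variant+ was slower than the original.
def overhead_pct(times, variant)
  (times[variant] / times[:original] - 1) * 100
end

RESULTS.each do |files, times|
  puts format('%7d files: first +%.1f%%, second +%.1f%%',
              files, overhead_pct(times, :first), overhead_pct(times, :second))
end
```

By this measure, the first variation costs roughly 18 to 29 percent over the original, while the second costs roughly 4 to 15 percent (about 4.2 percent at 249,467 files).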
And a graph:
As you can see, the original was the fastest, but it beat the second variation by only 1.2 seconds when scanning over 200,000 files. When running my tests I did nothing with the events `notify_observers` passed to the observer, so in the case of the original, nothing ever iterated through the events array, which might account for the discrepancy.
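A fairer comparison would give every version an observer that does the same small amount of work per event, so that the original also pays for walking its Array. A minimal sketch of such an observer, with a hypothetical `EventCounter` class standing in for real event handling:

```ruby
# Hypothetical observer that does a small, identical amount of work per
# event (counting it) regardless of how the events are delivered.
class EventCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Batched delivery: one call with the whole Array, walked here.
  def on_batch(events)
    events.each { |event| record(event) }
  end

  # Per-event delivery: one call per event.
  def on_event(event)
    record(event)
  end

  private

  # Stand-in for real handling; a benchmark would do the same work here
  # for every version being compared.
  def record(_event)
    @count += 1
  end
end

events = Array.new(1_000) { |i| [:added, "file#{i}.txt"] }

batched = EventCounter.new
batched.on_batch(events)

per_event = EventCounter.new
events.each { |event| per_event.on_event(event) }

p [batched.count, per_event.count] # => [1000, 1000]
```

Both paths end up touching every event exactly once, so any remaining timing difference would come from the delivery style rather than from skipped work.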
Either way, Directory Watcher’s original implementation, which stores the list of ‘events’ (changed files) in an array, was definitely faster than notifying for each event as it is found (my first variation).
I prefer getting my events one at a time, so I’ll be going with my second variation for the code I use. And thanks, Tim, for writing this wonderful gem; it is making my life quite a bit easier!