Find the top 10 directories on your system that have the most files in them. Count only the files immediately under each directory; that is, don’t count files in subdirectories. By “file”, I mean just about anything that isn’t a symbolic link or a weird device. Think about how you want to show the person running your program what it’s doing.
This challenge isn’t about counting so much as traversing, remembering, and displaying. How do you know what you need to handle next, and which directory is the largest?
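One way to frame it, as a minimal sketch rather than a full solution: keep a queue of directories still to visit, count only the plain files immediately inside each one, and sort the tallies at the end. (The variable names and the lack of a progress display are my own simplifications.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %count;
    my @queue = ( shift(@ARGV) // '.' );   # starting directory

    while ( defined( my $dir = shift @queue ) ) {
        opendir my $dh, $dir or next;      # skip unreadable directories
        while ( defined( my $entry = readdir $dh ) ) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$dir/$entry";
            if    ( -l $path ) { next }                  # no symbolic links
            elsif ( -d $path ) { push @queue, $path }    # remember for later
            elsif ( -f $path ) { $count{$dir}++ }        # a plain file counts
        }
    }

    # Report the ten busiest directories (fewer, in a small tree).
    my @dirs = sort { $count{$b} <=> $count{$a} } keys %count;
    splice @dirs, 10 if @dirs > 10;
    printf "%8d  %s\n", $count{$_}, $_ for @dirs;

Note the order of the file tests: checking -l before -d keeps the traversal from following symbolic links into other parts of the tree.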
I’m actually writing my own version of this right now because I’m running some benchmarks on opendir versus glob and need some test cases. I could just create some new directories and make a bunch of fake files in them, but that’s no fun.
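For the curious, an opendir-versus-glob comparison might look roughly like this, using the core Benchmark module; this is a sketch of the idea, not the actual benchmark:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $dir = shift(@ARGV) // '.';

    cmpthese( -5, {    # run each variant for at least 5 CPU seconds
        opendir => sub {
            opendir my $dh, $dir or die "Can't open $dir: $!";
            my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        },
        glob => sub {
            my @entries = glob "$dir/*";   # note: glob skips dotfiles
        },
    } );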
I don’t care how long your program takes, although you might. Let it run in a window (or screen) on its own. Test it on a small directory first (so, there’s a hint there).
I made a Curses version (but don’t look at it until you’ve tried your own solution!):
You can see a list of all Challenges and my summaries as well as the programs that I created and put in the Learning Perl Challenges GitHub repository.
I had a script that does just that in my ~/bin, originally using File::Find, now rewritten using Path::Iterator::Rule, after reading rjbs’ file finder modules comparison.
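The Path::Iterator::Rule version of the counting loop looks something like this (a sketch, assuming the module’s not_symlink negated rule; the top-ten report is the same as in any other solution):

    use strict;
    use warnings;
    use File::Basename qw(dirname);
    use Path::Iterator::Rule;

    # Match plain files only, excluding symbolic links.
    my $rule = Path::Iterator::Rule->new->file->not_symlink;
    my $iter = $rule->iter( shift(@ARGV) // '.' );

    my %count;
    while ( defined( my $file = $iter->() ) ) {
        $count{ dirname($file) }++;   # tally the containing directory
    }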
Might as well try this challenge; it’s interesting enough. My solution runs in 4 seconds on my PC. It could probably be optimized further, but I don’t care much. Also, I hope that using CPAN is fine.
A first attempt…
Bugfix: in small trees there may be fewer than 10 directories.
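The fix matters because a Perl array slice pads with undef when the range runs past the end of the array, so @sorted[0..9] on a seven-element list yields three undefs. Assuming the sorted directory names are in @sorted, clamping the upper bound is enough:

    # Clamp the slice so small trees don't produce undef entries.
    my $last = $#sorted < 9 ? $#sorted : 9;
    my @top  = @sorted[ 0 .. $last ];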
Added some comments and tried to improve the style a bit.
It was fun :). Here’s my solution:
I’ve posted similar challenges as brainteasers at work, as well as using them to sort job applications into two categories: “knows Perl” and “knows the word ‘Perl’”. I’ve found that my beautiful, well-factored, beautifully documented code can be boiled down to a simple Unix pipeline, so in this case I began with that:

    sudo find / -type f -print 2>/dev/null | xargs -n 1 dirname | sort | uniq -c | sort -n -r | head

sudo – because we’ll have to go into all sorts of directories, not all of which are owned by the user, even on my personal desktop.
find / -type f -print – traverse all directories beginning at the root directory, and print out the path to each file.
2>/dev/null – if there are weird errors because of strange names, just discard the error messages. Might not be a suitable approach if your software is running a nuclear power plant or a Mars rover, but a great first approximation.
xargs -n 1 dirname – take each line of output from find and consider only the path; discard the filename component.
sort – get all identical values adjacent.
uniq -c – replace a sequence of identical lines with a single instance, preceded by the number of times it was seen. Non-adjacent instances are not collapsed, which is why the sorting is necessary beforehand.
sort -n -r – sort the output of uniq by the numeric count (-n), in descending order (-r).
head – brian wants the first ten.

That makes for a pretty good start, but it generates error messages about mismatched quote characters. Using tr to clean out all the expected characters in filenames, to find the odd ones, I discover filenames containing ‘, `, ~, ^, %, #, +, {, }, [, ], and |, plus some files with totally Chinese names. Re-reading the find man page, I rediscover -print0 … rediscover in the sense that I’ve read about it before but never used it. The man page says -print0 uses null-terminated strings, and -0 tells xargs to expect that input. Changing the pipeline to

    sudo find / -type f -print0 2>/dev/null | xargs -0 -n 1 dirname | sort | uniq -c | sort -n -r | head

produces much better results:
and similar numbers for other directories.
Hmm … time to delete that directory; I haven’t looked in there in nine years.