tvl-depot/scripts/optimise-store.pl
Eelco Dolstra 9488ae7357 * `show-duplication.pl', a small utility that shows the amount of
package duplication present in (e.g.) a profile.  It shows the
  number of instances of each package in a closure, along with the
  size in bytes of each instance as well as the "waste" (the
  difference between the sum of the sizes of all instances and the
  average size).

  $ ./show-duplication.pl /nix/var/nix/profiles/default
  gcc 11
    3.3.6 19293318
    3.4.4 21425257
    ...
    average 14942970, waste 149429707
  coreutils 6
  ...
  average package duplication 1.87628865979381, total size 3486330471, total waste 1335324237, 38.3017114443825% wasted

  This utility is useful for measuring the cost in terms of disk space
  of the Nix approach.
2006-09-19 13:53:35 +00:00

68 lines
1.2 KiB
Perl
Executable file

#! /usr/bin/perl -w
use strict;
{ my $ofh = select STDOUT;
$| = 1;
select $ofh;
}
my @paths = ("/nix/store");
my $tmpfile = "/tmp/nix-optimise-hash-list";
#my $tmpfile = "/data/nix-optimise-hash-list";
system("find @paths -type f -print0 | xargs -0 md5sum -- > $tmpfile") == 0
or die "cannot hash store files";
system("sort $tmpfile > $tmpfile.sorted") == 0
or die "cannot sort list";
open LIST, "<$tmpfile.sorted" or die;
my $prevFile;
my $prevHash;
my $totalSpace = 0;
my $savedSpace = 0;
my $files = 0;
while (<LIST>) {
# print "D";
/^([0-9a-f]*)\s+(.*)$/ or die;
my $curFile = $2;
my $curHash = $1;
# print "A";
my $fileSize = (stat $curFile)[7];
# print "B";
# my $fileSize = 1;
$totalSpace += $fileSize;
if (defined $prevHash && $curHash eq $prevHash) {
# print "$curFile = $prevFile\n";
$savedSpace += $fileSize;
} else {
$prevFile = $curFile;
$prevHash = $curHash;
}
print "." if ($files++ % 100 == 0);
#print ".";
# print "C";
}
print "\n";
print "total space = $totalSpace\n";
print "saved space = $savedSpace\n";
my $savings = ($savedSpace / $totalSpace) * 100.0;
print "savings = $savings %\n";
close LIST;