#duperemove

Sergei Trofimovich<p>Today's bug is a `duperemove` infinite looping bug: <a href="https://github.com/markfasheh/duperemove/pull/376" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/pull/376</span></a></p><p>There, `duperemove` was not able to dedupe against a NoCOW file:</p><p> $ dd if=/dev/urandom bs=8M count=1 &gt; a<br> $ touch b<br> $ chattr +C b<br> $ cat a &gt;&gt; b<br> $ ./duperemove -d -q --batchsize=0 --dedupe-options=partial,same a b<br> &lt;hangup&gt;</p><p>I noticed it about a month ago but only got to debug it today. It's a 0.15 regression. The fix is trivial once bisected.</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
Andreas Kilgus<p>I am now a bit curious after all whether the <code>duperemove</code> process that has been running since 29 November will ever finish or become part of my estate.</p><p><a href="https://friendica.andreaskilgus.de/search?tag=duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a></p>
Sergei Trofimovich<p>Today's `duperemove` bug is a <a href="https://github.com/markfasheh/duperemove/issues/332" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/issues/332</span></a>.</p><p>There `duperemove` crashes when the file being deduped gets truncated down to zero.</p><p>And the bug is already fixed!</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
Sergei Trofimovich<p>`duperemove-0.14` is a lot faster than `duperemove-0.13`!</p><p>Unfortunately it sometimes crashes on my input data. It takes about 10 minutes to observe the crash.</p><p>I wrote a trivial fuzzer to generate funny filesystem states for `duperemove`. Guess how long it takes to crash `duperemove` with it.</p><p>Spoiler: <a href="https://trofi.github.io/posts/305-fuzzing-duperemove.html" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">trofi.github.io/posts/305-fuzz</span><span class="invisible">ing-duperemove.html</span></a></p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
Sergei Trofimovich<p>Today's `duperemove` bug is a <a href="https://github.com/markfasheh/duperemove/pull/324" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/pull/324</span></a>.</p><p>There, the quite aggressive `--dedupe-options=partial` option used a poorly optimized `sqlite` query to fetch unique file extents. That caused a scan of the whole database each time data was queried for an individual file.</p><p>The fix replaced the `JOIN` query with a nested `SELECT` query, turning the full scan into an index lookup.</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
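The JOIN-versus-nested-SELECT change described in the post above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `extents` table, its columns, the index names, and both queries are hypothetical and are not taken from the `duperemove` source. The sketch only shows the general idea that a correlated subquery can probe an index on `digest` once per extent of the requested file, whereas joining against an aggregated subquery typically has to group the whole table first.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with a HYPOTHETICAL schema
    (not duperemove's real one)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE extents (ino INTEGER, loff INTEGER, digest TEXT)")
    db.execute("CREATE INDEX idx_extents_digest ON extents (digest)")
    db.execute("CREATE INDEX idx_extents_ino ON extents (ino)")
    rows = [
        (1, 0, "aaa"),     # digest shared with file 2
        (1, 4096, "bbb"),  # digest seen only once
        (2, 0, "aaa"),
        (2, 4096, "ccc"),
        (3, 0, "ddd"),
    ]
    db.executemany("INSERT INTO extents VALUES (?, ?, ?)", rows)
    return db


# Variant 1: join against an aggregate over the whole table.
JOIN_QUERY = """
SELECT e.loff FROM extents AS e
JOIN (SELECT digest FROM extents GROUP BY digest HAVING count(*) = 1) AS u
  ON u.digest = e.digest
WHERE e.ino = ?
ORDER BY e.loff
"""

# Variant 2: correlated (nested) SELECT evaluated per extent of one file.
NESTED_QUERY = """
SELECT e.loff FROM extents AS e
WHERE e.ino = ?
  AND (SELECT count(*) FROM extents AS o WHERE o.digest = e.digest) = 1
ORDER BY e.loff
"""


def unique_extents(db, query, ino):
    """Return offsets of extents in file `ino` whose digest occurs once."""
    return [row[0] for row in db.execute(query, (ino,))]


def query_plan(db, query, ino):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + query, (ino,))]


if __name__ == "__main__":
    db = make_db()
    for name, query in (("join", JOIN_QUERY), ("nested", NESTED_QUERY)):
        print(name, unique_extents(db, query, 1))
        for step in query_plan(db, query, 1):
            print("   ", step)
```

Both variants return the same rows; the printed plans differ between SQLite versions, so the sketch prints them rather than assuming a particular plan.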
Sergei Trofimovich<p>Today's `duperemove` bug is a minor accounting bug: <a href="https://github.com/markfasheh/duperemove/pull/323" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/pull/323</span></a></p><p> $ ls -lh /nix/var/nix/db/db.sqlite<br> 1.4G /nix/var/nix/db/db.sqlite</p><p>Before the change:</p><p> $ ./show-shared-extents /nix/var/nix/db/db.sqlite<br> /nix/var/nix/db/db.sqlite: 27065321263104 shared bytes</p><p>After the change:</p><p> $ ./show-shared-extents /nix/var/nix/db/db.sqlite<br> /nix/var/nix/db/db.sqlite: 1169276928 shared bytes</p><p>The size reduction is not as impressive as initially reported :)</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
Sergei Trofimovich<p>Today's bug is a `duperemove` quadratic slowdown: <a href="https://github.com/markfasheh/duperemove/pull/322" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/pull/322</span></a></p><p>There, `duperemove` was struggling to dedupe small files inlined into metadata entries. It kept trying to dedupe all of them as a single set (even if the files' contents did not match).</p><p>The fix is a one-liner: just don't track non-dedupable files.</p><p>Without the fix, the dedupe run never finished on my system. I always had to run it on a subset to make any progress. Now the whole run takes 20 minutes.</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
Sergei Trofimovich<p>It feels like `duperemove` could work a lot faster than it does today.</p><p>What would it take to get a 2x speedup on small files? A one-liner: <a href="https://github.com/markfasheh/duperemove/pull/318" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/pull/318</span></a></p><p>There are still a ton of low-hanging improvements hiding there.</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
Sergei Trofimovich<p>Today's `duperemove` bug is a hangup on a directory with 1 million unique 1KB files: <a href="https://github.com/markfasheh/duperemove/issues/316" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/issues/316</span></a></p><p>In theory it should take about a minute to hash every single file and less than a minute to find out that all the files have unique hashes.</p><p>In practice the process gets stuck somewhere in the middle.</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>
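A test directory of the kind described in the post above (many small files, all with distinct contents) could be produced with a sketch like the following. The function name `make_unique_files`, the file naming scheme, and the way uniqueness is enforced are assumptions for illustration; the reproducer actually used is in the linked issue and is not reproduced here.

```python
import os
import tempfile


def make_unique_files(root, count, size=1024):
    """Create `count` files of `size` bytes each directly under `root`.

    Hypothetical helper: every file starts with a zero-padded counter,
    so no two files can have the same contents. The rest of the file is
    random padding. `size` must be at least 16 bytes.
    """
    if size < 16:
        raise ValueError("size must be at least 16 bytes")
    for i in range(count):
        header = f"{i:016d}".encode()  # 16-byte unique prefix
        payload = header + os.urandom(size - len(header))
        with open(os.path.join(root, f"f{i:07d}"), "wb") as f:
            f.write(payload)


if __name__ == "__main__":
    # Small demo run; pass count=1_000_000 and a real target directory
    # to approximate the scenario from the post.
    with tempfile.TemporaryDirectory() as root:
        make_unique_files(root, 1000)
        print(len(os.listdir(root)), "files created in", root)
```

The counter prefix is what guarantees uniqueness; random padding alone would make collisions only very unlikely, not impossible.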
Sergei Trofimovich<p>Today's (or rather this month's) bug is a quadratic slowdown of incremental `duperemove` runs.</p><p>There, running `duperemove` incrementally over one directory at a time caused it to rescan all previously scanned files over and over.</p><p>In <a href="https://github.com/markfasheh/duperemove/issues/303" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/markfasheh/duperemo</span><span class="invisible">ve/issues/303</span></a> JackSlateur added the `--dedupe-options=norescan_files` option to avoid the rescans.</p><p>Meanwhile I found a few more aggressive deduping options and keep reporting various failures there. I hope we'll get it working soon.</p><p><a href="https://fosstodon.org/tags/duperemove" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>duperemove</span></a> <a href="https://fosstodon.org/tags/bug" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bug</span></a></p>