Nicholas A. Ferrell<p><a href="https://social.emucafe.org/naferrell/blocking-independent-search-crawlers-07-17-25/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">social.emucafe.org/naferrell/b</span><span class="invisible">locking-independent-search-crawlers-07-17-25/</span></a></p><p>I read an interesting Hacker News comment by <a href="https://news.ycombinator.com/item?id=44589165" rel="nofollow noopener" target="_blank">user dumbfounder</a> (great username!) today. I excerpt the pertinent part below:</p><blockquote><p>I created a search engine that crawled the web way back in 2003. I used a proper user agent that included my email address. I got SO many angry emails about my crawler, which played as nice as I was able to make it play. Which was pretty nice I believe. If it’s not Google people didn’t want it. That’s a good way to prevent anyone from ever competing with Google.</p><p><a href="https://news.ycombinator.com/user?id=dumbfounder" rel="nofollow noopener" target="_blank">dumbfounder</a></p></blockquote><p>I have had <a href="https://social.emucafe.org/naferrell/03-19-25-sbinstitutionsbot-visits/" rel="nofollow noopener" target="_blank">problems with bad crawlers</a> (especially bad AI crawlers) on my sites. At the same time, dumbfounder highlights the other side of the coin: many sites block <em>good crawlers</em>, such as robots.txt-respecting crawlers for independent search engines. 
While all webmasters are free to control access to their sites as they see fit, allowing crawlers from Google and a few other big tech search engines while excluding small and independent search crawlers has two costs: it limits search diversity, and it prevents people who rely on small or niche search engines such as <a href="https://www.mojeek.com/" rel="nofollow noopener" target="_blank">Mojeek</a>, <a href="https://marginalia-search.com/" rel="nofollow noopener" target="_blank">Marginalia</a>, or <a href="https://search.seznam.cz/" rel="nofollow noopener" target="_blank">Seznam</a> from discovering potentially interesting writing. I previously <a href="https://thenewleafjournal.com/whitelisting-independent-search-crawlers/" rel="nofollow noopener" target="_blank">published an article on this issue</a> urging webmasters who want to support an open web and search engine diversity to ensure that <em>good crawlers</em> from independent search engines and directories can access their sites.</p><p><a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://social.emucafe.org/tag/bots/" target="_blank">#bots</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://social.emucafe.org/tag/google-search/" target="_blank">#googleSearch</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://social.emucafe.org/tag/marginalia-search/" target="_blank">#marginaliaSearch</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://social.emucafe.org/tag/mojeek/" target="_blank">#mojeek</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://social.emucafe.org/tag/search-engines/" target="_blank">#searchEngines</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://social.emucafe.org/tag/seznam/" target="_blank">#seznam</a></p>