techhub.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A hub primarily for passionate technologists, but everyone is welcome

Server stats: 5.4K active users

#aiscraping

olimobu 🎶<p>👀 This morning, while we were discussing Wikimedia's problems with scraping, a programmer friend told me about the Anubis project <a href="https://github.com/TecharoHQ/anubis/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/TecharoHQ/anubis/</span><span class="invisible"></span></a> <br>"It's quite simple and easy to deploy on any reasonably serious website, and it automatically takes out any scraper (AI or otherwise). Besides, they can't come up with anything that makes scraping profitable with that in place." <a href="https://social.anartist.org/tags/aiscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscraping</span></a> <a href="https://social.anartist.org/tags/aiscrapers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscrapers</span></a> <a href="https://social.anartist.org/tags/wikimedia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>wikimedia</span></a> <a href="https://social.anartist.org/tags/anubis" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>anubis</span></a> <a href="https://social.anartist.org/tags/iahastaenlaputasopa" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>iahastaenlaputasopa</span></a></p>
Winbuzzer<p>AI Crawlers Overwhelm Open-Source Projects, Forcing Developers to Block Entire Countries</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/Web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Web</span></a> <a href="https://mastodon.social/tags/Robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Robotstxt</span></a> <a href="https://mastodon.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://mastodon.social/tags/OpenSource" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenSource</span></a> <a href="https://mastodon.social/tags/Cybersecurity" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Cybersecurity</span></a> <a href="https://mastodon.social/tags/DataScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataScraping</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> </p><p><a href="https://winbuzzer.com/2025/03/26/ai-crawlers-overwhelm-open-source-projects-forcing-developers-to-block-entire-countries-xcxwbn/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">winbuzzer.com/2025/03/26/ai-cr</span><span class="invisible">awlers-overwhelm-open-source-projects-forcing-developers-to-block-entire-countries-xcxwbn/</span></a></p>
Mic Die Duiwel<p>AI scrapers are a plague on the internet</p><p><a href="https://mastodon.social/tags/aiscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscraping</span></a> <a href="https://mastodon.social/tags/aiscrapers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscrapers</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/llm" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>llm</span></a> </p><p><a href="https://www.osnews.com/story/141969/foss-infrastructure-is-under-attack-by-ai-companies/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">osnews.com/story/141969/foss-i</span><span class="invisible">nfrastructure-is-under-attack-by-ai-companies/</span></a></p>
jbz<p>🌐 LLM crawlers continue to DDoS SourceHut | sr_ht status</p><p>「 SourceHut continues to face disruptions due to aggressive LLM crawlers. We are continuously working to deploy mitigations. We have deployed a number of mitigations which are keeping the problem contained for now. However, some of our mitigations may impact end-users 」</p><p><a href="https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">status.sr.ht/issues/2025-03-17</span><span class="invisible">-git.sr.ht-llms/</span></a></p><p><a href="https://indieweb.social/tags/sourcehut" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>sourcehut</span></a> <a href="https://indieweb.social/tags/ddos" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ddos</span></a> <a href="https://indieweb.social/tags/aiscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscraping</span></a></p>
aproposnix<p>Serious question, isn't this an issue even with decentralized systems? What's preventing AI bots from just using all of our public data on the Fediverse? Is there any difference?</p><p><a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/AITraining" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AITraining</span></a> <a href="https://mastodon.social/tags/aiscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscraping</span></a> <a href="https://mastodon.social/tags/askfedi" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>askfedi</span></a> </p><p><a href="https://techcrunch.com/2025/03/15/bluesky-users-debate-plans-around-user-data-and-ai-training/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">techcrunch.com/2025/03/15/blue</span><span class="invisible">sky-users-debate-plans-around-user-data-and-ai-training/</span></a></p>
Friedemann<p>Hi <a href="https://mastodon.online/tags/Admins" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Admins</span></a> 👋,</p><p>Can you give me quotes that explain your fight against <a href="https://mastodon.online/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a>? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on. (efforts, goals, resources...)</p><p>I intend to publish your quotes in a text on <span class="h-card" translate="no"><a href="https://mastodon.social/@campact" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>campact</span></a></span> 's blog¹ (DE, German NGO).</p><p>The quotes should make your work🙏 visible in a generally understandable way</p><p>¹ <a href="https://blog.campact.de/author/friedemann/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">blog.campact.de/author/friedem</span><span class="invisible">ann/</span></a></p><p><a href="https://mastodon.online/tags/TDM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TDM</span></a> <a href="https://mastodon.online/tags/MastoAdmin" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MastoAdmin</span></a> <a href="https://mastodon.online/tags/DataPoisoning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataPoisoning</span></a> <a href="https://mastodon.online/tags/aitxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aitxt</span></a> <a href="https://mastodon.online/tags/GPT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GPT</span></a> <a href="https://mastodon.online/tags/TDMRep" class="mention hashtag" rel="nofollow noopener noreferrer" 
target="_blank">#<span>TDMRep</span></a> <a href="https://mastodon.online/tags/Kudurru" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Kudurru</span></a> <a href="https://mastodon.online/tags/Nightshade" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Nightshade</span></a> <a href="https://mastodon.online/tags/Glaze" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Glaze</span></a> <a href="https://mastodon.online/tags/FediAdmins" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>FediAdmins</span></a></p>
PPC Land<p>Cloudflare unveils tools to give publishers control over AI scraping: New AI Audit feature allows website owners to analyze and manage how AI models access their content, with plans for a marketplace. <a href="https://ppc.land/cloudflare-unveils-tools-to-give-publishers-control-over-ai-scraping/?utm_source=dlvr.it&amp;utm_medium=mastodon" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ppc.land/cloudflare-unveils-to</span><span class="invisible">ols-to-give-publishers-control-over-ai-scraping/?utm_source=dlvr.it&amp;utm_medium=mastodon</span></a> <a href="https://mastodon.social/tags/Cloudflare" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Cloudflare</span></a> <a href="https://mastodon.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://mastodon.social/tags/PublishingTools" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PublishingTools</span></a> <a href="https://mastodon.social/tags/DigitalMarketing" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DigitalMarketing</span></a> <a href="https://mastodon.social/tags/ContentManagement" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ContentManagement</span></a></p>
beSpacific<p>How to turn off <a href="https://newsie.social/tags/AIscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIscraping</span></a> from your Word documents "<a href="https://newsie.social/tags/Microsoft" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Microsoft</span></a> Office has slyly turned on an “opt-out” feature that scrapes your <a href="https://newsie.social/tags/Word" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Word</span></a>,<a href="https://newsie.social/tags/Excel" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Excel</span></a> docs to train its internal AI systems. This setting is turned on by default, and you have to manually uncheck a box in order to opt out. If you are a writer who uses MS Word to write any proprietary content (blog posts, novels, any work you intend to protect w <a href="https://newsie.social/tags/copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>copyright</span></a> and/or sell), u want to turn this feature off immediately <a href="https://medium.com/illumination/ms-word-is-using-you-to-train-ai-86d6a4d87021" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">medium.com/illumination/ms-wor</span><span class="invisible">d-is-using-you-to-train-ai-86d6a4d87021</span></a></p>
Norobiik @Norobiik@noc.social<p>The <a href="https://noc.social/tags/WebApp" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebApp</span></a>, called <a href="https://noc.social/tags/AdobeContentAuthenticity" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AdobeContentAuthenticity</span></a>, allows artists to signal that they do not consent for their work to be used by <a href="https://noc.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> models. It also gives creators the opportunity to add what Adobe is calling “<a href="https://noc.social/tags/ContentCredentials" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ContentCredentials</span></a>,” including their verified identity, social media handles, or other online domains, to their work. <a href="https://noc.social/tags/C2PA" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>C2PA</span></a> <a href="https://noc.social/tags/DataScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataScraping</span></a></p><p><a href="https://noc.social/tags/Adobe" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Adobe</span></a> wants to make it easier for artists to blacklist their work from <a href="https://noc.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a><br><a href="https://www.technologyreview.com/2024/10/08/1105234/adobe-wants-to-make-it-easier-for-artists-to-blacklist-their-work-from-ai-scraping/?utm_source=press.coop" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">technologyreview.com/2024/10/0</span><span 
class="invisible">8/1105234/adobe-wants-to-make-it-easier-for-artists-to-blacklist-their-work-from-ai-scraping/?utm_source=press.coop</span></a></p>
𝓖𝓵𝓸𝓻𝓲𝓪<p>Hmm, interesting. I think tools like this are definitely a good thing.</p><p><a href="https://mastodon.social/tags/Adobe" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Adobe</span></a> wants to make it easier for artists to blacklist their work from <a href="https://mastodon.social/tags/AIscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIscraping</span></a> | MIT Technology Review <span class="h-card" translate="no"><a href="https://threads.net/@technologyreview/" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>technologyreview</span></a></span> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/aiart" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiart</span></a> <br> <a href="https://www.technologyreview.com/2024/10/08/1105234/adobe-wants-to-make-it-easier-for-artists-to-blacklist-their-work-from-ai-scraping/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">technologyreview.com/2024/10/0</span><span class="invisible">8/1105234/adobe-wants-to-make-it-easier-for-artists-to-blacklist-their-work-from-ai-scraping/</span></a></p>
Ecologia Digital<p>"It’s pretty crazy that not only a) these bots shamelessly harvest all your data without asking for permission and b) they do it in such a brute-force manner.<br>My coworker and security expert António pointed me to <a href="https://mato.social/tags/DarkVisitors" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DarkVisitors</span></a>, and I’ll probably be installing their <a href="https://mato.social/tags/WordPressPlugin" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WordPressPlugin</span></a> on all my sites. For what it’s worth."<br><span class="h-card" translate="no"><a href="https://mastodon.social/@john_fisherman" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>john_fisherman</span></a></span> on <a href="https://mato.social/tags/AIscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIscraping</span></a> <br><a href="https://fred-rocha.medium.com/ai-crawler-bots-on-the-hunt-caf5a59ff478" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">fred-rocha.medium.com/ai-crawl</span><span class="invisible">er-bots-on-the-hunt-caf5a59ff478</span></a></p>
3dcandy<p>Meta scraped all public posts for AI <a href="https://3dcandy.social/2024/09/meta-scraped-all-public-posts-for-ai/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">3dcandy.social/2024/09/meta-sc</span><span class="invisible">raped-all-public-posts-for-ai/</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/aiscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscraping</span></a> <a href="https://mastodon.social/tags/boost" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>boost</span></a> <a href="https://mastodon.social/tags/facebook" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>facebook</span></a> <a href="https://mastodon.social/tags/instagram" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>instagram</span></a> <a href="https://mastodon.social/tags/meta" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>meta</span></a></p>
eicker.news ᳇ tech news<p>»Online <a href="https://eicker.news/tags/publishers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>publishers</span></a> face a <a href="https://eicker.news/tags/dilemma" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>dilemma</span></a>: Allow <a href="https://eicker.news/tags/AIscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIscraping</span></a> from <a href="https://eicker.news/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a> or lose <a href="https://eicker.news/tags/searchvisibility" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>searchvisibility</span></a>: Blocking the company’s <a href="https://eicker.news/tags/AIoverviews" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIoverviews</span></a> also blocks its <a href="https://eicker.news/tags/webcrawler" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webcrawler</span></a>.« <a href="https://www.engadget.com/ai/online-publishers-face-a-dilemma-allow-ai-scraping-from-google-or-lose-search-visibility-202246891.html?eicker.news" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">engadget.com/ai/online-publish</span><span class="invisible">ers-face-a-dilemma-allow-ai-scraping-from-google-or-lose-search-visibility-202246891.html?eicker.news</span></a> <a href="https://eicker.news/tags/tech" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tech</span></a> <a href="https://eicker.news/tags/media" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>media</span></a></p>
Ecologia Digital<p>big scoop by <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>404mediaco</span></a></span>: <br>"<a href="https://mato.social/tags/Nvidia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Nvidia</span></a> employee leaked documents, Slack conversations, and emails to 404 Media showing how the company went about building a video foundational model that would feed into its other products. It's a fascinating look into how a tech giant operates as it's attempting to stay competitive in AI world, and how it gobbles up copyrighted content from around the web in the process."<br><a href="https://mato.social/tags/AIscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIscraping</span></a><br><a href="https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">404media.co/nvidia-ai-scraping</span><span class="invisible">-foundational-model-cosmos-project/</span></a></p>
jbz<p>「 OpenAI and Anthropic have stated publicly that they respect robots.txt and blocks to their specific web crawlers, GPTBot and ClaudeBot.</p><p>However, according to TollBit's findings, such blocks are not being respected, as claimed. AI companies, including OpenAI and Anthropic, are simply choosing to "bypass" robots.txt in order to retrieve or scrape all of the content from a given website or page 」</p><p><a href="https://www.businessinsider.com/openai-anthropic-ai-ignore-rule-scraping-web-contect-robotstxt" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">businessinsider.com/openai-ant</span><span class="invisible">hropic-ai-ignore-rule-scraping-web-contect-robotstxt</span></a></p><p><a href="https://indieweb.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenAI</span></a> <a href="https://indieweb.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://indieweb.social/tags/AITheft" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AITheft</span></a> <a href="https://indieweb.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://indieweb.social/tags/Cybersecurity" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Cybersecurity</span></a> <a href="https://indieweb.social/tags/Infosec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Infosec</span></a> <a href="https://indieweb.social/tags/Security" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Security</span></a></p>
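<p>To make the robots.txt mechanism in the post above concrete, here is a minimal sketch using Python's standard <code>urllib.robotparser</code>. The robots.txt body and the <code>example.org</code> URL are hypothetical, and <code>SomeOtherBot</code> is an invented name. The check shown here is what a compliant crawler runs on its own side before fetching; the file is advisory, so nothing in it stops a crawler that skips the check.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of the two crawlers named in the post
# and allows everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical page URL; any path on the site is covered by "Disallow: /".
PAGE = "https://example.org/article"

for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
    verdict = "allowed" if parser.can_fetch(agent, PAGE) else "disallowed"
    print(f"{agent}: {verdict}")
```

<p>Because the decision is made by the crawler and not by the server, a site that wants the rule enforced still needs a server-side control in addition to robots.txt.</p>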
:blahaj: Why Not Zoidberg? 🦑<p><a href="https://topspicy.social/tags/LaughAtTheFlawedTechnology" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LaughAtTheFlawedTechnology</span></a> <a href="https://topspicy.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://topspicy.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://topspicy.social/tags/Enshittification" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Enshittification</span></a> </p><p>To me it is clear that a lot of the OpenAI / Google / Copilot training data came from scraping porn and Instagram. "Realistic blonde / Japanese model face with odd hands" is a thing after all. But who looks at hands?</p>
jbz<p>🙈 Exclusive: Multiple AI companies bypassing web standard to scrape publisher sites, licensing firm says | Reuters </p><p><a href="https://www.reuters.com/technology/artificial-intelligence/multiple-ai-companies-bypassing-web-standard-scrape-publisher-sites-licensing-2024-06-21/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">reuters.com/technology/artific</span><span class="invisible">ial-intelligence/multiple-ai-companies-bypassing-web-standard-scrape-publisher-sites-licensing-2024-06-21/</span></a></p><p><a href="https://indieweb.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://indieweb.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenAI</span></a> <a href="https://indieweb.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://indieweb.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://indieweb.social/tags/Copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Copyright</span></a> <a href="https://indieweb.social/tags/AITheft" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AITheft</span></a></p>
jbz<p>🥸 Perplexity AI Is Lying about Their User Agent • <span class="h-card" translate="no"><a href="https://social.lol/@robb" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>robb</span></a></span></p><p>「 I checked a few sites and this is just Google Chrome running on Windows 10. So they're using headless browsers to scrape content, ignoring robots.txt, and not sending their user agent string. I can't even block their IP ranges because it appears these headless browsers are not on their IP ranges 」 </p><p><a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">rknight.me/blog/perplexity-ai-</span><span class="invisible">is-lying-about-its-user-agent/</span></a></p><p><a href="https://indieweb.social/tags/PerplexityAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PerplexityAI</span></a> <a href="https://indieweb.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://indieweb.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://indieweb.social/tags/AITheft" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AITheft</span></a></p>
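<p>The post above describes why blocking by user agent fails once a scraper presents itself as an ordinary browser. A minimal sketch of that failure mode follows. The token list, the helper name <code>is_blocked</code>, and both user agent strings are illustrative assumptions, not values taken from the linked article.</p>

```python
# Illustrative list of crawler tokens a site operator might try to block.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def is_blocked(user_agent: str, blocked_tokens=BLOCKED_TOKENS) -> bool:
    """Return True if the User-Agent header contains a blocked crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocked_tokens)


# A crawler that declares itself (made-up string) is caught by the filter.
declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://example.invalid/bot)"

# A headless browser sending a generic Chrome-on-Windows-10 string (also
# made up) looks like a normal visitor and passes straight through.
generic = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)

print("declared crawler blocked:", is_blocked(declared))
print("generic browser blocked:", is_blocked(generic))
```

<p>The filter only sees what the client chooses to send, which is why the author also looked at IP ranges and found those unhelpful too.</p>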
jbz<p>What if these latest hacks are really about feeding big AI scrapers under the table, using data breaches as cover?</p><p><a href="https://indieweb.social/tags/Databreach" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Databreach</span></a> <a href="https://indieweb.social/tags/AIscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIscraping</span></a> <a href="https://indieweb.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://indieweb.social/tags/Privacy" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Privacy</span></a> <a href="https://indieweb.social/tags/Infosec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Infosec</span></a></p>
Elena<p>This is my <a href="https://mastodon.social/tags/noobquestion" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>noobquestion</span></a> of the day.</p><p><a href="https://mastodon.social/tags/tumblr" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tumblr</span></a> and <a href="https://mastodon.social/tags/wordpress" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>wordpress</span></a> are apparently going to sell data for <a href="https://mastodon.social/tags/aiscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aiscraping</span></a> but what prevents the <a href="https://mastodon.social/tags/scrapers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scrapers</span></a> from freely accessing the <a href="https://mastodon.social/tags/fediverse" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fediverse</span></a> and feasting on it at no cost?</p><p>And, if the answer is "nothing", how or where can one post their shit online without having to think about being scraped?</p><p>Is there any way to prevent it at all, like those poisoning tools for images?</p><p>Scraping truly oughta be opt-in, FFS.<br><a href="https://mastodon.social/tags/mastodonadmin" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>mastodonadmin</span></a> <a href="https://mastodon.social/tags/fuckaitheft" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fuckaitheft</span></a> <a href="https://mastodon.social/tags/consent" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>consent</span></a></p>