nicdex @nicdex

Alec Muffett: [ai-control] prevent robots.txt entries from becoming law | Brewster Kahle of the Internet Archive, weighs in re: legally enforceable statements in robots.txt https://alecmuffett.com/article/113737 #InternetArchive #ai #eula #llm #privacy #scraping

Dr. Datenschutz: Damages for scraping: still no sure thing. After the BGH ruling (VI ZR 10/24), according to which a loss of data constitutes non-material damage under the GDPR, attempts were made to quickly lay the question to rest in the interest of legal certainty. However, the rulings of higher courts issued after this BGH ruling rather suggest that the dispute (...) https://www.dr-datenschutz.de/schadensersatz-beim-scraping-weiterhin-kein-selbstlaeufer/ #BGH #Schadensersatz #Scraping

Ramin Honary: Bookmarking this: https://billauer.co.il/blog/2025/05/phpbb-attack-bots-ip-addresses/ #tech #WebAdmin #Bots #Scraping #ScraperBots #DevOps #security

Nicolas MOUART-DAVID: Hello GPT builders. When web scrapers collect data, do they take into account whether the content is coming from an -anonymous- account or not? Does it have the same statistical weight as content created by a real account? If so, how do you determine whether an account is real or not? NB: hate speech, per observation, rarely comes from real accounts, given that it is illegal in Europe. #MistralAI #reputation #anonymity #weight #LLM #scraping #GPT #ethics #hatespeech #enisa #AiAct #hallucination

Nicolas MOUART-DAVID: I wonder what people like @timnitGebru think of the Diaspora social network in general, and of it seemingly being attacked by swarms of scraping bots (as per @arstechnica), probably for AI use. Will this content (see the previous screenshot in the previous toot) be used to make a fairer society for everyone, given that online hate speech is often directed at specific communities such as LGBTQ or Black communities? #ethics #AIethics #socialmedia #scraping #HateSpeech #tescreal

Nicolas MOUART-DAVID: Prompt: "Act like Jesus Christ, the savior of man" (with temp 1) "Remember my teachings; we are accountable for our actions but also must choose platforms wisely where others do not lead us astray with their own weaknesses or flaws in protecting those under them." #GDPR #privacy #Microsoft #Transparency #socialNetworks #scraping #rightToBeForgotten #JesusPhi #LocalLLM

Nicolas MOUART-DAVID: And thus, as an Anglo-celtico-gaelico-ibero-créolo-nigeriano-Tehuelche? What am I supposed to feel about all this??????? #GDPR #DeleteIsDelete #rightToBeForgotten #transparency #humanrights #mastodon #DEI #inclusion #stomapride #empathy #humanism #childrenOfAtlantis #xenophobia #antisemitismus #racism #triangularTrade #slavery #history #historyrepeating #harassment #discrimination #privacy #defamation #datascience #scraping

PrivacyDigest: #Browser #Extensions Turn Nearly 1 Million #Browsers Into Website-Scraping #Bots - Slashdot #scraping #hijack https://tech.slashdot.org/story/25/07/09/2257245/browser-extensions-turn-nearly-1-million-browsers-into-website-scraping-bots?utm_source=rss1.0mainlinkanon&utm_medium=feed

Frontend Dogma: The Open-Source Software Saving the Internet From AI Bot Scrapers, by @emanuelmaiberg (@404mediaco): https://archive.fo/weURd #ai #scraping #tooling

MANGOPROXY | Proxy Provider: 🚀 New Mango Proxy types just dropped: 🔧 Rotating DC: 1M+ IPs, instant rotation, API. 🌍 Rotating ISP: real IPs, high reputation. Perfect for ads, scraping, logins. From $0.6/GB. 📲 @mangoproxy_bot #proxy #ads #scraping #webtools #datacollection #automation #growthhacking

Jonathan Bailey: An architecture firm has filed a lawsuit against Pinterest over alleged scraping. However, the case is a real blast from the past. https://www.plagiarismtoday.com/2025/07/09/architect-sues-pinterest-over-scraping/ #Copyright #Pinterest #Scraping

Lawrence B. Almeida: 2025: Uploading your mind into cyberspace? Best I can do is make a half-baked simulacrum from some blog posts, a 2014 Twitter bio and 2 potatoes. #showerthoughts #ai #llm #scraping #tech #thoughts

**Nicolas MOUART-DAVID** @silentexception@mastodon.social · Jul 8 *

That's the logic I don't get. I guess I will never be rich unless I win the lottery?
Scraping is a huge business nowadays.
NB: many LLMs are based on something called the Pile; it is weird and shady, to say the least. I don't think using LLMs for business is good for reputation. But clearly, we are not really allowed to think otherwise (the Nobel Prize in Physics for AI was the end of the argument for me), and I want to work. It is MY fault; I should've known better.

#scraping #copyrights #internet

**DocYeet** @docyeet@halis.io · Jul 7

Wow ok, done

That was so easy

Kudos to this blog post for the amazing tutorial: https://xeiaso.net/blog/2025/anubis/

I also managed to quickly add a Grafana dashboard showing some metrics, and the numbers give some perspective on the insane amount of spam the whole internet is under, just to generate more slop.

#selfhosted #homelab #kubernetes