[ai-control] prevent robots.txt entries from becoming law | Brewster Kahle of the Internet Archive weighs in re: legally enforceable statements in robots.txt
https://alecmuffett.com/article/113737
#InternetArchive #ai #eula #llm #privacy #scraping

Damages for scraping: still not a sure thing
After the BGH (German Federal Court of Justice) ruling (VI ZR 10/24), according to which a loss of data constitutes non-material damage under the GDPR, there was an attempt to close the question quickly for the sake of legal certainty. The rulings of higher courts since that BGH decision, however, rather suggest that the dispute (...)
https://www.dr-datenschutz.de/schadensersatz-beim-scraping-weiterhin-kein-selbstlaeufer/
Hello GPT builders.
When web scrapers collect data, do they take into account whether the content is coming from an -anonymous- account or not? Does it have the same statistical weight as content created by a real account?
If so, how do you determine whether an account is real or not?
NB: hate speech, per observation, rarely comes from real accounts, given it is illegal in Europe.
#MistralAI #reputation
#anonymity #weight #LLM #scraping #GPT #ethics #hatespeech #enisa #AiAct #hallucination
I wonder what people like @timnitGebru think of the Diaspora social network in general, and of it seemingly being attacked by swarms of scraping bots (as per @arstechnica), probably for AI use.
Will this content (see the screenshot in the previous toot) be used to make a fairer society for everyone, given that hate speech online is often directed at specific communities, such as LGBTQ or Black communities? #ethics #AIethics #socialmedia #scraping #HateSpeech #tescreal
Prompt : "Act like Jesus Christ, the savior of man" (with temp 1)
"Remember my teachings; we are accountable for our actions but also must choose platforms wisely where others do not lead us astray with their own weaknesses or flaws in protecting those under them."
And thus, as an Anglo-celtico-gaelico-ibero-créolo-nigeriano-Tehuelche? What am I supposed to feel about all this???????
#GDPR #DeleteIsDelete #rightToBeForgotten #transparency #humanrights
#mastodon #DEI #inclusion #stomapride #empathy #humanism #childrenOfAtlantis
#xenophobia #antisemitismus #racism #triangularTrade #slavery #history #historyrepeating #harassment #discrimination #privacy #defamation #datascience #scraping
The Open-Source Software Saving the Internet From AI Bot Scrapers, by @emanuelmaiberg (@404mediaco):
New Mango Proxy types just dropped:
Rotating DC — 1M+ IPs, instant rotation, API
Rotating ISP — real IPs, high reputation
Perfect for Ads, scraping, logins
From $0.6/GB @mangoproxy_bot
#proxy #ads #scraping #webtools #datacollection #automation #growthhacking
An architecture firm has filed a lawsuit against Pinterest over alleged scraping. However, the case is a real blast from the past.
https://www.plagiarismtoday.com/2025/07/09/architect-sues-pinterest-over-scraping/
That's the logic I don't get, I guess I will never be rich, unless I win the lottery??..
Scraping is a huge business nowadays.
NB: many LLMs are based on something called the Pile; it is weird and shady, to say the least. I don't think using LLMs for business is good for reputation. But clearly, we are not really allowed to think otherwise (the Physics Nobel Prize for AI was the end of the argument for me), and I want to work, it is MY fault, I should've known better.
Wow ok, done
That was so easy
Kudos to this blog post for the amazing tutorial : https://xeiaso.net/blog/2025/anubis/
I also managed to quickly add a Grafana dashboard to show some metrics, and those numbers give some perspective on the insane spam the whole internet is under, just to generate more slop
Ok, time to deploy Anubis in front of Gitea, I'm done with those FAANG oligarchs scraping my repos 24/7 to check if anything changed...
F*ck off.
But that also means Gitea might get unstable for some time, woops
If you are curious : https://git.halis.io
If you see the cute furry, it worked
Watt is being Dunn about AI scraping images and descriptions?
Make RED sure you fill your gravy description meat with AI hostile get em on the beaches words.
Images uploaded to mastodon should have AI poison added to them.
Turn on the lights for internet... if you can afford cloudflare:
https://ugpl.net/blog/post/turn-on-the-lights-for-internet-if-you-can-afford-cloudflare.html
Really interesting project Anubis to protect against #LLM scraping bots : https://anubis.techaro.lol/ #Scraping #bots
Paid #scraping: toward a radical change in the business model of generative #AI?
#Cloudflare stonewalls AI crawlers unless they pay for #Scraping | heise online https://www.heise.de/news/Cloudflare-laesst-KI-Crawler-auflaufen-wenn-nicht-fuer-Scraping-bezahlt-wird-10467015.html #PayPerCrawl #ArtificialIntelligence #copyright #Urheberrecht