#scraping

Alec Muffett<p>[ai-control] prevent robots.txt entries from becoming law | Brewster Kahle of the Internet Archive, weighs in re: legally enforceable statements in robots.txt<br><a href="https://alecmuffett.com/article/113737" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">alecmuffett.com/article/113737</span><span class="invisible"></span></a><br><a href="https://mastodon.social/tags/InternetArchive" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>InternetArchive</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/eula" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>eula</span></a> <a href="https://mastodon.social/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://mastodon.social/tags/privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>privacy</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a></p>
Dr. Datenschutz<p>Damages for scraping: still no sure thing</p><p>After the BGH ruling (VI ZR 10/24), which held that a loss of data constitutes non-material damage under the GDPR, attempts were made to quickly close the question for the sake of legal certainty. The rulings of higher courts since that BGH decision, however, rather suggest that the dispute (...)<br><a href="https://www.dr-datenschutz.de/schadensersatz-beim-scraping-weiterhin-kein-selbstlaeufer/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">dr-datenschutz.de/schadensersa</span><span class="invisible">tz-beim-scraping-weiterhin-kein-selbstlaeufer/</span></a></p><p><a href="https://mastodon.social/tags/BGH" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BGH</span></a> <a href="https://mastodon.social/tags/Schadensersatz" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Schadensersatz</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraping</span></a></p>
Ramin HonaryBookmarking this: <a href="https://billauer.co.il/blog/2025/05/phpbb-attack-bots-ip-addresses/" rel="nofollow noopener" target="_blank">https://billauer.co.il/blog/2025/05/phpbb-attack-bots-ip-addresses/</a><br><br><a class="hashtag" href="https://fe.disroot.org/tag/tech" rel="nofollow noopener" target="_blank">#tech</a> <a class="hashtag" href="https://fe.disroot.org/tag/webadmin" rel="nofollow noopener" target="_blank">#WebAdmin</a> <a class="hashtag" href="https://fe.disroot.org/tag/bots" rel="nofollow noopener" target="_blank">#Bots</a> <a class="hashtag" href="https://fe.disroot.org/tag/scraping" rel="nofollow noopener" target="_blank">#Scraping</a> <a class="hashtag" href="https://fe.disroot.org/tag/scraperbots" rel="nofollow noopener" target="_blank">#ScraperBots</a> <a class="hashtag" href="https://fe.disroot.org/tag/devops" rel="nofollow noopener" target="_blank">#DevOps</a> <a class="hashtag" href="https://fe.disroot.org/tag/security" rel="nofollow noopener" target="_blank">#security</a>
Nicolas MOUART-DAVID<p>Hello GPT builders.</p><p>When web scrapers collect data, do they take into account whether the content is coming from an -anonymous- account or not? Does it have the same weight statistically as content created by a real account? <br>If so, how do you determine whether an account is real or not?</p><p>NB: hate speech, per observation, rarely comes from real accounts, given it is illegal in Europe.</p><p><a href="https://mastodon.social/tags/MistralAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MistralAI</span></a> <a href="https://mastodon.social/tags/reputation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reputation</span></a><br><a href="https://mastodon.social/tags/anonymity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>anonymity</span></a> <a href="https://mastodon.social/tags/weight" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>weight</span></a> <a href="https://mastodon.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.social/tags/GPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPT</span></a> <a href="https://mastodon.social/tags/ethics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ethics</span></a> <a href="https://mastodon.social/tags/hatespeech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hatespeech</span></a> <a href="https://mastodon.social/tags/enisa" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>enisa</span></a> <a href="https://mastodon.social/tags/AiAct" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AiAct</span></a> <a href="https://mastodon.social/tags/hallucination" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hallucination</span></a></p>
Nicolas MOUART-DAVID<p>I wonder what people like <span class="h-card" translate="no"><a href="https://dair-community.social/@timnitGebru" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>timnitGebru</span></a></span> think of the Diaspora social network in general, and of it seemingly being attacked by swarms of scraping bots (as per <span class="h-card" translate="no"><a href="https://mastodon.social/@arstechnica" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>arstechnica</span></a></span>), probably for AI use. <br>Will this content (see the screenshot in the previous toot) be used to make a fairer society for everyone, given that hate speech online is often directed at specific communities, such as LGBTQ or Black communities? <a href="https://mastodon.social/tags/ethics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ethics</span></a> <a href="https://mastodon.social/tags/AIethics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIethics</span></a> <a href="https://mastodon.social/tags/socialmedia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>socialmedia</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.social/tags/HateSpeech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HateSpeech</span></a> <a href="https://mastodon.social/tags/tescreal" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tescreal</span></a></p>
Nicolas MOUART-DAVID<p>Prompt : "Act like Jesus Christ, the savior of man" (with temp 1)</p><p>"Remember my teachings; we are accountable for our actions but also must choose platforms wisely where others do not lead us astray with their own weaknesses or flaws in protecting those under them."</p><p><a href="https://mastodon.social/tags/GDPR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GDPR</span></a> <a href="https://mastodon.social/tags/privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>privacy</span></a> <a href="https://mastodon.social/tags/Microsoft" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Microsoft</span></a> <a href="https://mastodon.social/tags/Transparency" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Transparency</span></a> <a href="https://mastodon.social/tags/socialNetworks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>socialNetworks</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.social/tags/rightToBeForgotten" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rightToBeForgotten</span></a> <a href="https://mastodon.social/tags/JesusPhi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>JesusPhi</span></a> <a href="https://mastodon.social/tags/LocalLLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LocalLLM</span></a></p>
Nicolas MOUART-DAVID<p>And thus, as an Anglo-celtico-gaelico-ibero-créolo-nigeriano-Tehuelche? What am I supposed to feel about all this???????</p><p><a href="https://mastodon.social/tags/GDPR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GDPR</span></a> <a href="https://mastodon.social/tags/DeleteIsDelete" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DeleteIsDelete</span></a> <a href="https://mastodon.social/tags/rightToBeForgotten" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rightToBeForgotten</span></a> <a href="https://mastodon.social/tags/transparency" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>transparency</span></a> <a href="https://mastodon.social/tags/humanrights" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>humanrights</span></a> <br><a href="https://mastodon.social/tags/mastodon" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>mastodon</span></a> <a href="https://mastodon.social/tags/DEI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DEI</span></a> <a href="https://mastodon.social/tags/inclusion" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>inclusion</span></a> <a href="https://mastodon.social/tags/stomapride" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>stomapride</span></a> <a href="https://mastodon.social/tags/empathy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>empathy</span></a> <a href="https://mastodon.social/tags/humanism" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>humanism</span></a> <a href="https://mastodon.social/tags/childrenOfAtlantis" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>childrenOfAtlantis</span></a> <br><a href="https://mastodon.social/tags/xenophobia" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>xenophobia</span></a> <a href="https://mastodon.social/tags/antisemitismus" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>antisemitismus</span></a> <a href="https://mastodon.social/tags/racism" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>racism</span></a> <a href="https://mastodon.social/tags/triangularTrade" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>triangularTrade</span></a> <a href="https://mastodon.social/tags/slavery" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>slavery</span></a> <a href="https://mastodon.social/tags/history" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>history</span></a> <a href="https://mastodon.social/tags/historyrepeating" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>historyrepeating</span></a> <a href="https://mastodon.social/tags/harassment" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>harassment</span></a> <a href="https://mastodon.social/tags/discrimination" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>discrimination</span></a> <a href="https://mastodon.social/tags/privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>privacy</span></a> <a href="https://mastodon.social/tags/defamation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>defamation</span></a> <a href="https://mastodon.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascience</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a></p>
PrivacyDigest<p><a href="https://mas.to/tags/Browser" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Browser</span></a> <a href="https://mas.to/tags/Extensions" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Extensions</span></a> Turn Nearly 1 Million <a href="https://mas.to/tags/Browsers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Browsers</span></a> Into Website-Scraping <a href="https://mas.to/tags/Bots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Bots</span></a> - Slashdot <br><a href="https://mas.to/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mas.to/tags/hijack" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hijack</span></a></p><p><a href="https://tech.slashdot.org/story/25/07/09/2257245/browser-extensions-turn-nearly-1-million-browsers-into-website-scraping-bots?utm_source=rss1.0mainlinkanon&amp;utm_medium=feed" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">tech.slashdot.org/story/25/07/</span><span class="invisible">09/2257245/browser-extensions-turn-nearly-1-million-browsers-into-website-scraping-bots?utm_source=rss1.0mainlinkanon&amp;utm_medium=feed</span></a></p>
Frontend Dogma<p>The Open-Source Software Saving the Internet From AI Bot Scrapers, by <span class="h-card" translate="no"><a href="https://mastodon.social/@emanuelmaiberg" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>emanuelmaiberg</span></a></span> (<span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>404mediaco</span></a></span>):</p><p><a href="https://archive.fo/weURd" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">archive.fo/weURd</span><span class="invisible"></span></a></p><p><a href="https://mas.to/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mas.to/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mas.to/tags/tooling" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tooling</span></a></p>
MANGOPROXY | Proxy Provider<p>🚀 New Mango Proxy types just dropped:<br>🔧 Rotating DC — 1M+ IPs, instant rotation, API<br>🌍 Rotating ISP — real IPs, high reputation<br>Perfect for Ads, scraping, logins<br>From $0.6/GB<br>📲 @mangoproxy_bot<br><a href="https://mastodon.social/tags/proxy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>proxy</span></a> <a href="https://mastodon.social/tags/ads" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ads</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.social/tags/webtools" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webtools</span></a> <a href="https://mastodon.social/tags/datacollection" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datacollection</span></a> <a href="https://mastodon.social/tags/automation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>automation</span></a> <a href="https://mastodon.social/tags/growthhacking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>growthhacking</span></a></p>
Jonathan Bailey<p>An architecture firm has filed a lawsuit against Pinterest over alleged scraping. However, the case is a real blast from the past.</p><p><a href="https://www.plagiarismtoday.com/2025/07/09/architect-sues-pinterest-over-scraping/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">plagiarismtoday.com/2025/07/09</span><span class="invisible">/architect-sues-pinterest-over-scraping/</span></a></p><p><a href="https://mastodon.world/tags/Copyright" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Copyright</span></a> <a href="https://mastodon.world/tags/Pinterest" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Pinterest</span></a> <a href="https://mastodon.world/tags/Scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraping</span></a></p>
Lawrence B. Almeida<p>2025: Uploading your mind into cyberspace?<br>Best I can do is make a half-baked simulacrum from some blog posts, a 2014 Twitter bio and 2 potatoes. <br><a href="https://mastodon.social/tags/showerthoughts" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>showerthoughts</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.social/tags/tech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tech</span></a> <a href="https://mastodon.social/tags/thoughts" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>thoughts</span></a></p>

That's the logic I don't get, I guess I will never be rich, unless I win the lottery??..
Scraping is a huge business nowadays.
NB: many LLMs are based on something called the Pile; it is weird and shady, to say the least. I don't think using LLMs for business is good for one's reputation. But clearly, we are not really allowed to think otherwise (the Physics Nobel Prize for AI was the end of the argument for me), and I want to work. It is MY fault; I should've known better.