#dataquality

Aneesh Sathe<p><strong>Beyond the Dataset</strong></p><p>On the recent season of the show Clarkson’s Farm, J.C. goes to great lengths to buy the right pub. Like any sensible buyer, the team does a thorough teardown followed by a big build-up before the place opens for business. They survey how the place is built, located, and accessed. In their refresh they ensure that each part of the pub is built with purpose, even the tractor on the ceiling. The art is in answering the question: <em>How was this place put together?</em></p><p>A data scientist should be equally fussy. Until we trace how every number was collected, corrected, and cleaned (who measured it, what tool warped it, what assumptions skewed it), we can’t trust the next step in our business to flourish.</p><a href="https://aneeshsathe.com/wp-content/uploads/2025/07/image-from-rawpixel-id-3590280-jpeg.jpg" rel="nofollow noopener" target="_blank"></a>Old sound (1925) painting in high resolution by Paul Klee. Original from the Kunstmuseum Basel. Digitally enhanced by rawpixel.<p><strong><strong>Two load-bearing pillars</strong></strong></p><p>While there are many flavors of data science, I’m concerned with the analysis done in scientific spheres and startups. In this world, the structure is held up by two pillars:</p><ol><li><strong>How we measure</strong>: the trip from reality to raw numbers. Feature extraction.</li><li><strong>How we compare</strong>: the rules that let those numbers answer a question. Statistics and causality.</li></ol><p>Both of these relate to a deep understanding of the data generation process, each from a different angle. A crack in either pillar and whatever sits on top crumbles. Plots, significance tests, and AI predictions mean nothing.</p><p><strong><strong>How we measure</strong></strong></p><p>A misaligned microscope is the digital equivalent of crooked lumber. No amount of massage can birth a photon that never hit the sensor. 
In fluorescence imaging, the <strong>point-spread function</strong> tells you how a pinpoint of light smears across neighboring pixels; <strong>noise</strong> reminds you that light itself arrives, and is recorded, with at least some randomness. Misjudge either and the cell you call “twice as bright” may be a mirage.</p><p>In this data generation process the nuances of the instrument control what you see. Understanding them lets us judge which kind of post-processing is right and which may destroy or invent data. For simpler analyses the post-processing can stop at cleaner raw data. For developing AI models, the process extends to labeling and analyzing data distributions. Andrew Ng’s data-centric AI approach insists that tightening labels, fixing sensor drift, and writing clear provenance notes often beat fancier models.</p><p><strong><strong>How we compare</strong></strong></p><p>Now suppose Clarkson were to test a new fertilizer, fresh goat pellets, only on sunny plots. Any bumper harvest that follows says more about sunshine than about the pellets. Sound comparisons begin long before the data arrive. A deep understanding of the science behind the experiment is critical before conducting any statistics. Faulty randomization, inadequate controls, and lurking confounders eat away at the foundation of statistics.</p><p>This information is <em>not</em> in the data. Only understanding how the experiment was designed and which events preclude others enables us to build a model of the world of the experiment. Taking this lightly carries large risks for startups with limited budgets and smaller experiments. A false positive leads to wasted resources, while a false negative carries opportunity costs.</p><p>The stakes climb quickly. Early in the COVID-19 pandemic, some regions bragged of lower death rates. Age, testing access, and hospital load varied wildly, yet headlines crowned local policies as miracle cures. 
When later studies re-leveled the footing, the miracles vanished.</p><p><strong><strong>Why the pillars get skipped</strong></strong></p><p>Speed, habit, and misplaced trust. Leo Breiman warned in 2001 that many analysts chase algorithmic accuracy and skip the question of how the data were generated, a divide he called the “two cultures.” Today’s tooling tempts us even more: auto-charts, one-click models, pretrained everything. They save time, until they cost us the answer.</p><p>The other issue is the lack of a culture that communicates and shares a common language. Only in academic training is it possible to train a single person to understand the science, the instrumentation, and the statistics well enough for their research to be taken seriously. Even then we prefer peer review. There is no such scope in startups. Tasks and expertise must be split. It falls to the data scientist to ensure clarity and to collect information horizontally. It is the job of leadership to enable this or accept dumb risks.</p><p><strong><strong>Opening day</strong></strong></p><p>Clarkson’s pub opening was a monumental task with a thousand details tracked and tackled by an army of experts. Follow the journey from phenomenon to file, guard the twin pillars of <em>measure</em> and <em>compare</em>, and reinforce them with careful curation and open culture. 
Do that, and your analysis leaves room for the most important thing: inquiry.</p><p><a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/ai/" target="_blank">#AI</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/causal-inference/" target="_blank">#causalInference</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/clean-data/" target="_blank">#cleanData</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/data-centric-ai/" target="_blank">#dataCentricAI</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/data-provenance/" target="_blank">#dataProvenance</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/data-quality/" target="_blank">#dataQuality</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/data-science/" target="_blank">#dataScience</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/evidence-based-decision-making/" target="_blank">#evidenceBasedDecisionMaking</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/experiment-design/" target="_blank">#experimentDesign</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/feature-extraction/" target="_blank">#featureExtraction</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/foundation-engineering/" target="_blank">#foundationEngineering</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/instrumentation/" target="_blank">#instrumentation</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/measurement-error/" 
target="_blank">#measurementError</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/science/" target="_blank">#science</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/startup-analytics/" target="_blank">#startupAnalytics</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/statistical-analysis/" target="_blank">#statisticalAnalysis</a> <a rel="nofollow noopener" class="hashtag u-tag u-category" href="https://aneeshsathe.com/tag/statistics/" target="_blank">#statistics</a></p>
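<p>The fertilizer thought experiment in the post above can be made concrete with a small simulation. This is a minimal Python sketch, not the author's code: the yield model (a baseline of 50, a sunshine bonus of 10, Gaussian noise, and a true pellet effect of exactly zero) and the function name <code>estimate_pellet_effect</code> are assumptions chosen purely for illustration. It compares the naive treated-minus-control difference when pellets go only on sunny plots with the same difference when pellets are assigned by coin flip.</p>

```python
import random
import statistics


def estimate_pellet_effect(randomized, n_plots=4000, seed=0):
    """Return mean(treated yield) - mean(control yield) for simulated plots.

    Hypothetical yield model (an assumption for illustration only):
    baseline 50, sunshine adds 10, pellets add 0, plus Gaussian noise.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_plots):
        sunny = rng.random() < 0.5
        if randomized:
            # Coin-flip assignment, independent of sunshine.
            gets_pellets = rng.random() < 0.5
        else:
            # Confounded design: pellets go only on sunny plots.
            gets_pellets = sunny
        plot_yield = 50 + (10 if sunny else 0) + rng.gauss(0, 2)
        (treated if gets_pellets else control).append(plot_yield)
    return statistics.mean(treated) - statistics.mean(control)


confounded_estimate = estimate_pellet_effect(randomized=False)
randomized_estimate = estimate_pellet_effect(randomized=True)
print(f"sunny-only design: apparent effect = {confounded_estimate:.2f}")
print(f"randomized design: apparent effect = {randomized_estimate:.2f}")
```

<p>Under these assumptions the sunny-only design reports an apparent effect close to the sunshine bonus, while the randomized design reports a difference near zero, which matches the true pellet effect in the model. Nothing in the yield numbers alone distinguishes the two cases; only knowledge of how plots were assigned does.</p>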
HERMES Datenkompetenzzentrum<p>🛠️ <span class="h-card" translate="no"><a href="https://fedihum.org/@corinnaberg" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>corinnaberg</span></a></span> and Ksenia Stanicka have developed and published a lesson on <a href="https://fedihum.org/tags/Metadaten" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Metadaten</span></a> (metadata) and metadata standards in the Data Carpentries format. </p><p>🔗 News item: <a href="https://hermes-hub.de/aktuelles/news/release-2025-05-07.html" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">hermes-hub.de/aktuelles/news/r</span><span class="invisible">elease-2025-05-07.html</span></a> <br>🔗 Straight to the lesson: <a href="https://hermes-hub.de/lernen/datacarpentrieslektionen/lektionen/data-and-metadata-in-the-humanities.html" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">hermes-hub.de/lernen/datacarpe</span><span class="invisible">ntrieslektionen/lektionen/data-and-metadata-in-the-humanities.html</span></a> </p><p>🎯 Interested in a hands-on introduction? </p><p>📍 Historikertag 2025 Bonn, Praxislabor (practice lab) <br>📅 16 
September 2025, 14:00–15:40 <br>🌐 <a href="https://digigw.hypotheses.org/6357" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">digigw.hypotheses.org/6357</span><span class="invisible"></span></a> </p><p><a href="https://fedihum.org/tags/DH" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DH</span></a> <a href="https://fedihum.org/tags/DigitalHumanities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalHumanities</span></a> <a href="https://fedihum.org/tags/DigitaleGeisteswissenschaften" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitaleGeisteswissenschaften</span></a> <a href="https://fedihum.org/tags/metadata" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>metadata</span></a> <a href="https://fedihum.org/tags/dataquality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataquality</span></a></p>
Recce - Trust, Verify, Ship<p>Is high-quality data the same as correct data?<br>No, data can pass every test, but still be wrong 😱 </p><p>✅ Schema checks<br>✅ Null constraints<br>🚫 No correctness validation</p><p>Recce introduces a workflow built around data correctness</p><p>Find and fix silent errors:<br><a href="https://reccehq.com/blog/high-quality-data-can-still-be-wrong/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">reccehq.com/blog/high-quality-</span><span class="invisible">data-can-still-be-wrong/</span></a></p><p><a href="https://mastodon.social/tags/dataquality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataquality</span></a> <a href="https://mastodon.social/tags/datavalidation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datavalidation</span></a> <a href="https://mastodon.social/tags/dataengineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataengineering</span></a></p>
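<p>The gap the post above describes, data that passes schema and null checks yet is still wrong, can be shown with a toy example. This is a minimal Python sketch and not Recce's API or workflow: the rows, the function names <code>schema_ok</code>, <code>no_nulls</code>, and <code>matches_reference</code>, and the reference total of 24.99 are all hypothetical. One amount has been stored in cents instead of dollars, a silent error that structural tests cannot see.</p>

```python
# Hypothetical order rows; order 1 should be 19.99 but was loaded in cents.
rows = [
    {"order_id": 1, "amount": 1999.0},
    {"order_id": 2, "amount": 5.0},
]


def schema_ok(rows):
    """Structural check: every row has an int id and a float amount."""
    return all(
        isinstance(r.get("order_id"), int) and isinstance(r.get("amount"), float)
        for r in rows
    )


def no_nulls(rows):
    """Null constraint: no field in any row is None."""
    return all(value is not None for r in rows for value in r.values())


def matches_reference(rows, expected_total, tolerance=0.01):
    """Correctness check: compare the total against a trusted reference value."""
    actual_total = sum(r["amount"] for r in rows)
    return abs(actual_total - expected_total) <= tolerance


print("schema check passes:", schema_ok(rows))
print("null check passes:", no_nulls(rows))
print("matches reference total:", matches_reference(rows, expected_total=24.99))
```

<p>The first two checks pass and the third fails, because only the third compares the data against something known to be true outside the table itself.</p>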
Jessica Bennet<p>AI adoption matures, but big challenges remain</p><p>68% of companies now run custom AI in production, with 81% spending $1M+ annually. But issues like poor data, tough training, and project delays still slow progress. As AI goes mainstream, control and trust are the next big frontiers.</p><p><a href="https://mastodon.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://mastodon.social/tags/AIDeployment" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIDeployment</span></a> <a href="https://mastodon.social/tags/EnterpriseAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EnterpriseAI</span></a> <a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MachineLearning</span></a> <a href="https://mastodon.social/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> </p><p><a href="https://www.artificialintelligence-news.com/news/ai-adoption-matures-deployment-hurdles-remain/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">artificialintelligence-news.co</span><span class="invisible">m/news/ai-adoption-matures-deployment-hurdles-remain/</span></a></p>
Yonhap Infomax News<p>Fed Chair Powell underscores the critical public value of high-quality economic data amid mounting concerns over statistical reliability.<br><a href="https://mastodon.social/tags/YonhapInfomax" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>YonhapInfomax</span></a> <a href="https://mastodon.social/tags/Powell" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Powell</span></a> <a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/PublicBenefit" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PublicBenefit</span></a> <a href="https://mastodon.social/tags/FederalReserve" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FederalReserve</span></a> <a href="https://mastodon.social/tags/EconomicStatistics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EconomicStatistics</span></a> <a href="https://mastodon.social/tags/Economics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Economics</span></a> <a href="https://mastodon.social/tags/FinancialMarkets" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FinancialMarkets</span></a> <a href="https://mastodon.social/tags/Banking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Banking</span></a> <a href="https://mastodon.social/tags/Securities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Securities</span></a> <a href="https://mastodon.social/tags/Bonds" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Bonds</span></a> <a href="https://mastodon.social/tags/StockMarket" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>StockMarket</span></a> <br><a href="https://en.infomaxai.com/news/articleView.html?idxno=68256" rel="nofollow 
noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">en.infomaxai.com/news/articleV</span><span class="invisible">iew.html?idxno=68256</span></a></p>
Nayla Salibi<p>Are we facing a "digital pollution" that threatens the future of <a href="https://social.tchncs.de/tags/%D8%A7%D9%84%D8%B0%D9%83%D8%A7%D8%A1_%D8%A7%D9%84%D8%A7%D8%B5%D8%B7%D9%86%D8%A7%D8%B9%D9%8A" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>الذكاء_الاصطناعي</span></a> (artificial intelligence)?<br>Since the launch of <a href="https://social.tchncs.de/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> in 2022, AI experts have been likening what happened to the detonation of the first atomic bomb! Why?<br>👇👇👇<br><a href="https://social.tchncs.de/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://social.tchncs.de/tags/ModelCollapse" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ModelCollapse</span></a> <a href="https://social.tchncs.de/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://social.tchncs.de/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://social.tchncs.de/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://social.tchncs.de/tags/Ethics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Ethics</span></a> <a href="https://social.tchncs.de/tags/TechPolicy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechPolicy</span></a> </p><p><a href="https://tinyurl.com/5n9xhc6v" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">tinyurl.com/5n9xhc6v</span><span class="invisible"></span></a></p>
PromptCloud<p>Scraping isn’t just about data collection.</p><p>It’s about precision:<br>✔️ Accurate values<br>✔️ Consistent formats<br>✔️ Real-time reliability</p><p>General-purpose AI often falls short.</p><p>That’s why more teams trust PromptCloud for scalable, structured web data.</p><p>📖 Read the full breakdown: <a href="https://shorturl.at/1oTaR" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">shorturl.at/1oTaR</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/DataStrategy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataStrategy</span></a> <a href="https://mastodon.social/tags/AIrisks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIrisks</span></a> <a href="https://mastodon.social/tags/OpenWeb" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenWeb</span></a> <a href="https://mastodon.social/tags/PromptCloud" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PromptCloud</span></a> <a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/BusinessTech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BusinessTech</span></a></p>
PromptCloud<p>Bots don’t scroll — they crawl. 🕷️</p><p>Today’s <a href="https://mastodon.social/tags/UncomplicateSeries" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>UncomplicateSeries</span></a> explains what a web crawler is and why it matters.</p><p>👉 <a href="https://bit.ly/43In4ur" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">bit.ly/43In4ur</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/CleanData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CleanData</span></a> <a href="https://mastodon.social/tags/BigData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigData</span></a></p>
PromptCloud<p>Others are still setting up proxies.</p><p>PromptCloud? Already delivered the data.</p><p>Pricing. Benchmarking. Market research, at scale.</p><p>⚡ That’s what winning looks like.<br>👉 <a href="https://bit.ly/43VArWP" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">bit.ly/43VArWP</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/BigData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigData</span></a></p>
Bytes Europe<p>When economic data quality deteriorates: Two thoughts for investors <a href="https://www.byteseu.com/1087842/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">byteseu.com/1087842/</span><span class="invisible"></span></a> <a href="https://pubeurope.com/tags/BureauOfLaborStatistics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BureauOfLaborStatistics</span></a> <a href="https://pubeurope.com/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://pubeurope.com/tags/EconomicData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EconomicData</span></a> <a href="https://pubeurope.com/tags/economy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>economy</span></a> <a href="https://pubeurope.com/tags/investors" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>investors</span></a> <a href="https://pubeurope.com/tags/LaborMarket" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LaborMarket</span></a> <a href="https://pubeurope.com/tags/PaulDonovan" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PaulDonovan</span></a> <a href="https://pubeurope.com/tags/ResponseRates" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ResponseRates</span></a> <a href="https://pubeurope.com/tags/StockMarket" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>StockMarket</span></a></p>
PromptCloud<p>Think you’re human? <br>Prove it. <br>That’s what a CAPTCHA asks.</p><p>Today’s <a href="https://mastodon.social/tags/UncomplicateSeries" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>UncomplicateSeries</span></a> breaks down CAPTCHA types &amp; what bypassing them means in web scraping.</p><p>📌 How do smart bots get past them?</p><p>👉 <a href="https://bit.ly/4kSRrUA" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">bit.ly/4kSRrUA</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a></p>
GESIS<p><a href="https://sciences.social/tags/dataquality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataquality</span></a> <a href="https://sciences.social/tags/Surveydata" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Surveydata</span></a> <a href="https://sciences.social/tags/digitalbehavioraldata" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digitalbehavioraldata</span></a> <a href="https://sciences.social/tags/linkeddatasources" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>linkeddatasources</span></a><br>Official launch of the <a href="https://sciences.social/tags/KODAQS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>KODAQS</span></a> <a href="https://sciences.social/tags/Toolbox" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Toolbox</span></a> in July 2025 </p><p>The KODAQS Toolbox is a new, open platform for assessing and improving data quality in the social sciences. It supports researchers in systematically reflecting on the quality of their data - along three central data types: Survey data, digital behavioral data (e.g. app or sensor data) and linked data sources (e.g. register and geospatial data).<br><a href="https://kodaqs-toolbox.gesis.org/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">kodaqs-toolbox.gesis.org/</span><span class="invisible"></span></a></p>
PromptCloud<p>Imagine waking up to fresh, structured, compliant data.</p><p>Every. Single. Day.</p><p>That’s not a dream. That’s <a href="https://mastodon.social/tags/PromptCloud" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PromptCloud</span></a>!</p><p><a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/CleanData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CleanData</span></a> <a href="https://mastodon.social/tags/BigData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigData</span></a> <a href="https://mastodon.social/tags/DataExtraction" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataExtraction</span></a></p>
Julien Benedetti<p>So, yesterday a consultation on AI and culture (well, really the cultural industry) was launched by C. Chappaz and R. Dati via the CSPLA. Both speeches mention data quality and reliable data. I admit I laughed, and I mean really laughed. cc <span class="h-card" translate="no"><a href="https://mastodon.social/@CharlesNepote" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>CharlesNepote</span></a></span> <a href="https://framapiaf.org/tags/DataLove" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataLove</span></a> <a href="https://framapiaf.org/tags/dataquality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataquality</span></a> <a href="https://framapiaf.org/tags/IA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>IA</span></a> <a href="https://framapiaf.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a></p>
SAP S/4HANA®-Experts<p>Quality Assurance in SAP Data Migrations</p><p>The SAP migration run is usually repeated several times to improve data quality and eliminate errors. Usually, a SAP system copy is created before the data migration so that the system can be reset to this state at any time. This allows iterative improvement processes in which data migrations can be repeated multiple times. Check out the core magazine to learn more:</p><p><a href="https://s4-experts.com/2024/01/16/sap-s-4hana-datenmigration-nicht-ohne-qualitatssicherung/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">s4-experts.com/2024/01/16/sap-</span><span class="invisible">s-4hana-datenmigration-nicht-ohne-qualitatssicherung/</span></a></p><p><a href="https://saptodon.org/tags/SAP" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SAP</span></a> <a href="https://saptodon.org/tags/Migration" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Migration</span></a> <a href="https://saptodon.org/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://saptodon.org/tags/qualityassurance" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>qualityassurance</span></a></p>
PromptCloud<p>Web scraping needs vary widely, so should your approach.<br>Should you:</p><p>• Build your own custom scrapers?<br>• Use a plug-and-play scraping tool?<br>• Go fully managed with a web scraping service?</p><p>In this blog, we simplify the decision-making process with a no-fluff comparison of:<br>✅ Cost<br>✅ Control<br>✅ Scalability<br>✅ Maintenance</p><p>🔗 Read the full blog: <a href="https://bit.ly/3ZHWxL6" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">bit.ly/3ZHWxL6</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/CleanData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CleanData</span></a> <a href="https://mastodon.social/tags/BigData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigData</span></a> <a href="https://mastodon.social/tags/DataExtraction" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataExtraction</span></a> <a href="https://mastodon.social/tags/productdata" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>productdata</span></a> <a href="https://mastodon.social/tags/DataEngineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataEngineering</span></a> <a href="https://mastodon.social/tags/TechForBusiness" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechForBusiness</span></a> <a href="https://mastodon.social/tags/MarketInsights" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MarketInsights</span></a></p>
Yuna<p>Garbage in, garbage out – even Agentic AI can’t save you from yourself.</p><p>Artificial intelligence is only as brilliant as the data it’s spoon-fed – and spoiler alert: your data is often trash.<br>Whether it’s traditional machine learning, generative models, or your shiny new agentic systems, the pattern remains insultingly consistent:<br> • Bad data? Expect bad decisions.<br> • Incomplete data? Enjoy half-baked ideas.<br> • Outdated data? Say hello to irrelevant nonsense.</p><p>I often talk about what AI can or tragically still can’t do.<br>But here’s the real twist: the problem isn’t the system. It’s you. Or more specifically, the glorious mess you call your “data foundation.”</p><p>You don’t have a lack of innovation.<br>You have a lack of clean data structures, maintained knowledge bases, and basic contextual awareness.<br>And then you expect the AI to magically fill gaps that should never have existed in the first place.</p><p><a href="https://hachyderm.io/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://hachyderm.io/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MachineLearning</span></a> <a href="https://hachyderm.io/tags/DataScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataScience</span></a> <a href="https://hachyderm.io/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a> <a href="https://hachyderm.io/tags/DataManagement" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataManagement</span></a> <a href="https://hachyderm.io/tags/BigData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigData</span></a> <a href="https://hachyderm.io/tags/coding" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>coding</span></a> <a 
href="https://hachyderm.io/tags/Programming" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Programming</span></a></p>
GESIS<p><a href="https://sciences.social/tags/GESISGuides" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GESISGuides</span></a> <a href="https://sciences.social/tags/DBD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DBD</span></a> <a href="https://sciences.social/tags/DataQuality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataQuality</span></a><br>Three new GESIS Guides to Digital Behavioral Data out now - get helpful information on data quality now:</p><p>* Bleier, A.: What is Computational Reproducibility? </p><p>* Fröhling, L., Birkenmaier, L., Lux, V., &amp; Daikeler, J.: How to Find and Explore Data Quality Frameworks for Digital Behavioral Data</p><p>*Lux, V., &amp; Wieland, M.: How to Set up and Monitor App-based Data Collections</p><p>Check out the whole collection of our Guides to DBD:<br><a href="https://www.gesis.org/en/gesis-guides/gesis-guides-to-digital-behavioral-data" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">gesis.org/en/gesis-guides/gesi</span><span class="invisible">s-guides-to-digital-behavioral-data</span></a></p>
HEDDA.IO<p>Building data pipelines is hard enough—keeping them reliable shouldn't be a guessing game.</p><p>Our blog post covers practical <a href="https://mastodon.social/tags/DataObservability" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataObservability</span></a> for engineers—catch issues early, validate better, and build trust in your workflows.</p><p>👉 Read more: <a href="https://hedda.io/data-observability-for-data-engineers/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">hedda.io/data-observability-fo</span><span class="invisible">r-data-engineers/</span></a></p><p><a href="https://mastodon.social/tags/dataengineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataengineering</span></a> <a href="https://mastodon.social/tags/dataquality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataquality</span></a></p>
SODa<p>Do you have questions about OpenRefine &amp; need support with your projects? Then come to our regular OpenRefine office hour!</p><p>🗓 When?<br>Thu 22 May, 15:00 – 16:00<br>📍 Where?<br>Online</p><p>Take the opportunity to get your questions answered, pick up tips, or work together on your data projects.<br>All info &amp; link: <a href="https://sammlungen.io/termine/openrefine-sprechstunde?utm_campaign=coschedule&amp;utm_source=mastodon&amp;utm_medium=SODa%40fedihum.org" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">sammlungen.io/termine/openrefi</span><span class="invisible">ne-sprechstunde?utm_campaign=coschedule&amp;utm_source=mastodon&amp;utm_medium=SODa%40fedihum.org</span></a><br><a href="https://fedihum.org/tags/SODaZentrum" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SODaZentrum</span></a> <a href="https://fedihum.org/tags/OpenRefine" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenRefine</span></a> <a href="https://fedihum.org/tags/Dataquality" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Dataquality</span></a> <a href="https://fedihum.org/tags/DataLiteracy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataLiteracy</span></a></p>