IBBoard<p>End of day conclusion: it sucks and there's going to be some false positives that I probably can't get rid of. </p><p>A two-phase approach ("get what I want, get what I don't want, and subtract the latter from the former in case 'what I want' had false positives") had WORSE behaviour because "what I don't want" was returning far too much of "what I want" as well. </p><p>spaCy and part-of-speech tagging were more powerful, more accurate, easier to manage and more enjoyable to work with, BUT more prone to missing things or extracting slightly less natural terms (because I was trying not to extract excessively verbose and unnecessary terms).</p><p><a href="https://hachyderm.io/tags/GenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenAI</span></a> <a href="https://hachyderm.io/tags/NLP" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NLP</span></a> <a href="https://hachyderm.io/tags/spaCy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>spaCy</span></a></p>