I haven't thought "I should try to build my *own* web spider, then maybe I could find things" since... well, since 1998.
:/
To get my bookmarks out of the browser & remove that dependency, I’ve resorted to manually collecting & curating links as I find them, with personal notes + tags reminding me why they’re of interest. They’re always 100% searchable & findable.
Given the inconsiderate, effectively DDoS-like behavior of AI scraper bots, adding to that melee with more robo-indexing may not produce a usable search index - https://mastodon.social/@dahukanna/113741237599333856
I'm thinking of something much more modest:
… extract links from within the post and links to the source post?
I think so, yes. Basically I want a database of every single link that's been posted to *my* feed. It would also contain any hashtags used with the link and the post ID, so I can go back and see the context.
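Roughly this, as a sketch (Python + SQLite, pulling my home timeline from Mastodon's REST API; the instance URL, token, and table layout are placeholder assumptions, and it ignores boosts and pagination):

```python
import re
import sqlite3
import requests

INSTANCE = "https://mastodon.example"  # placeholder instance
TOKEN = "..."                          # access token with read scope

db = sqlite3.connect("feedlinks.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS links (
    url      TEXT NOT NULL,
    post_id  TEXT NOT NULL,   -- status ID, for going back to the context
    post_url TEXT,
    PRIMARY KEY (url, post_id)
);
CREATE TABLE IF NOT EXISTS link_tags (
    url     TEXT NOT NULL,
    post_id TEXT NOT NULL,
    tag     TEXT NOT NULL,
    PRIMARY KEY (url, post_id, tag)
);
""")

HREF = re.compile(r'href="([^"]+)"')

def ingest():
    """Fetch one page of the home timeline and record its links."""
    resp = requests.get(f"{INSTANCE}/api/v1/timelines/home",
                        headers={"Authorization": f"Bearer {TOKEN}"},
                        params={"limit": 40})
    resp.raise_for_status()
    for status in resp.json():
        tags = [t["name"].lower() for t in status["tags"]]
        # hrefs in the HTML body that are just hashtag/mention links
        noise = {t["url"] for t in status["tags"]}
        noise |= {m["url"] for m in status["mentions"]}
        for url in HREF.findall(status["content"]):
            if url in noise:
                continue
            db.execute("INSERT OR IGNORE INTO links VALUES (?, ?, ?)",
                       (url, status["id"], status["url"]))
            for tag in tags:
                db.execute("INSERT OR IGNORE INTO link_tags VALUES (?, ?, ?)",
                           (url, status["id"], tag))
    db.commit()

ingest()
```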
Next I'd strip out all of the "big sites" and focus more on the obscure.
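That pass could be as dumb as a hostname denylist (the list below is illustrative, not a real inventory of "big sites"):

```python
from urllib.parse import urlparse

BIG_SITES = {"youtube.com", "twitter.com", "nytimes.com",
             "en.wikipedia.org", "github.com"}

def is_obscure(url: str) -> bool:
    """Keep only links whose host isn't on the personal denylist."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return host not in BIG_SITES
```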
Then if I'm curious about, say, #fossils, I would get links mentioned in that context.
And if #fossils is often used with the tag #crinoids, I could move laterally and find more links.
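Against the tables sketched above, both lookups are one query each (the tag names here are just the examples from this thread):

```python
# Links that have been posted alongside #fossils.
fossil_links = db.execute(
    "SELECT DISTINCT url FROM link_tags WHERE tag = ?", ("fossils",)
).fetchall()

# Tags that co-occur with #fossils on the same posts, most frequent
# first -- the lateral move toward things like #crinoids.
co_tags = db.execute("""
    SELECT b.tag, COUNT(DISTINCT b.post_id) AS n
    FROM link_tags a
    JOIN link_tags b ON a.post_id = b.post_id AND b.tag != a.tag
    WHERE a.tag = ?
    GROUP BY b.tag
    ORDER BY n DESC
""", ("fossils",)).fetchall()
```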
Importantly, this database would grow over time; it wouldn't be focused on "what's new" ... basically I have a high level of trust in the way people #onhere associate hashtags with links, and I think that'd be a great way to find things.
In fact I do it manually often enough, but it's time-consuming. I just want all of the links sometimes.