Dealing with Link Rot, with auto-archiving
Links die; it is a fact of the internet. Images in particular accumulate lots of links to external sites, and over time many of those links become broken.
We could help end users/visitors find a copy of the original page, even if the site is offline.
For example, we could:
1) Extract all links from all images (and articles/collections?)
2) For each one, check if it is already held in an archive repository somewhere (archive.org, webarchive.org.uk etc.), ideally in a copy taken as close as possible to the date the link was first added.
... if so, make a note of that location
... if not, do what we can to encourage one or more of them to take an archive copy ASAP (and if successful, note its new location!)
3) Check every link to see if it is still online.
... if so, great
... if not, we could then either a) modify the original link to point to the archive copy, or maybe b) offer the archive link as an additional link on the photo page.
Importantly, this is not a one-off task but one that needs repeating often, because new links are added all the time and more of the existing links will inevitably die.
(Maybe later we will need a 4): check the archive copy itself hasn't disappeared, and if so replace it with a different archive link!)
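As a rough sketch of the repeating part and the possible step 4, the bookkeeping might look something like this. Everything here is an assumption made for illustration: the `LinkRecord` structure, the 30-day interval, and the `is_available` callback (which stands in for whatever check confirms an archive copy still exists).

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass
class LinkRecord:
    """Hypothetical per-link record; the field names are illustrative only."""
    url: str
    last_checked: Optional[date] = None
    archive_urls: List[str] = field(default_factory=list)


def due_for_check(records, today, interval=timedelta(days=30)):
    """Return records never checked, or last checked at least `interval` ago.
    The 30-day default is an arbitrary assumption."""
    return [
        r for r in records
        if r.last_checked is None or today - r.last_checked >= interval
    ]


def pick_archive(record, is_available: Callable[[str], bool]):
    """Step 4: return the first known archive copy that still exists, or None."""
    for candidate in record.archive_urls:
        if is_available(candidate):
            return candidate
    return None
```

Newly added links have no `last_checked` date, so they are picked up on the next run without any special handling.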
