Processing done on Geograph Images

Published: 8 August 2017

Processing done by Geograph

Note: although Geograph itself runs the images through these algorithms, the algorithms themselves are generally created by third parties.

Term Extraction

Various online APIs, such as the Yahoo Term Extraction API (link), are used to extract 'key' terms from the contributor-supplied textual description.
These terms act as a sort of tag, so they could allow faceted browsing.
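To show how extracted terms could drive faceted browsing, here is a minimal sketch. It assumes the terms have already been extracted (by whichever API) and stored per image; the image ids, terms, and the function names `build_index` and `facet_filter` are hypothetical and not part of Geograph. Selecting terms narrows the image set, and the remaining terms on the matching images become the next facets a user could pick.

```python
from collections import Counter

# Hypothetical extracted terms per image (image id -> terms); not real Geograph data.
IMAGE_TERMS = {
    1: ["church", "village", "tower"],
    2: ["church", "graveyard"],
    3: ["river", "bridge", "village"],
    4: ["bridge", "canal"],
}


def build_index(image_terms):
    """Invert image -> terms into term -> set of image ids."""
    index = {}
    for image_id, terms in image_terms.items():
        for term in terms:
            index.setdefault(term, set()).add(image_id)
    return index


def facet_filter(index, image_terms, selected):
    """Return the images tagged with every selected term, plus counts of the
    other terms on those images (the facets a user could drill into next)."""
    matches = set(image_terms)
    for term in selected:
        matches &= index.get(term, set())
    counts = Counter(
        term
        for image_id in matches
        for term in image_terms[image_id]
        if term not in selected
    )
    return matches, counts


index = build_index(IMAGE_TERMS)
matches, facets = facet_filter(index, IMAGE_TERMS, ["village"])
print(sorted(matches))          # [1, 3]
print(sorted(facets.items()))   # remaining facets among images 1 and 3
```

An inverted index keeps each facet selection a set intersection, which stays cheap even when many images share a term.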

We've run the vast majority of Geograph images through one API or another (the API used has changed as API availability has changed).

The data is available via: and is used in a few places on the Geograph website.

Example: (link)

Cluster Labels

Using the Carrot2 clustering engine (link), cluster labels have been assigned to the majority of images.
In some cases this can pick up associations not actually present in the original metadata.

The data is available via: and is used in a few places on the Geograph website.

Example: (link)
See also the 'Automatic Clusters' sidebar in (link).

Computer Vision API

A pre-trained artificial intelligence (AI) model (link) is used to predict labels from the image itself, not from any text associated with the image.

We've run most Geograph images through the system, enough to experiment with the data. See:
(link)

Land Cover

Land cover describes the physical material on the surface of the country. For example: grassland, woodland, rivers & lakes and artificial materials such as roads and buildings.

A computer algorithm predicts the land cover from satellite data (imagery etc.); we then classify each image by the predicted land cover.
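The classification step can be sketched as a simple point-in-grid lookup. This is a minimal sketch under stated assumptions: it supposes the predicted land cover is stored as a grid of square cells keyed by easting and northing (1 km cells here), and that each image has a grid position. The grid contents, class names, cell size, and the functions `land_cover_for` and `classify_images` are all hypothetical; the article does not describe how Geograph stores or joins the data.

```python
# Hypothetical predicted land-cover grid: (easting // CELL, northing // CELL) -> class.
CELL = 1000  # assumed cell size in metres (1 km squares)
LAND_COVER = {
    (529, 180): "Urban",
    (530, 180): "Suburban",
    (429, 305): "Broadleaved woodland",
}


def land_cover_for(easting, northing, grid=LAND_COVER, cell=CELL):
    """Return the predicted class for the cell containing the point, or None."""
    return grid.get((easting // cell, northing // cell))


def classify_images(images, grid=LAND_COVER, cell=CELL):
    """Group image ids by the predicted land cover at each image's position.

    `images` maps image id -> (easting, northing); images outside the
    grid are grouped under "Unknown".
    """
    groups = {}
    for image_id, (easting, northing) in images.items():
        cover = land_cover_for(easting, northing, grid, cell)
        groups.setdefault(cover or "Unknown", []).append(image_id)
    return groups


# Hypothetical image positions.
groups = classify_images({
    101: (529450, 180120),
    102: (530010, 180999),
    103: (429900, 305500),
    104: (100000, 100000),
})
print(groups)
```

Because the lookup uses only the image's position, the result describes the land around the camera location, not necessarily what the photo actually shows.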

For the moment, see: (link)

Link Checking and Web-Archiving

We have extracted all links from all image descriptions and checked whether they are still valid. At the same time, we check whether the referenced pages are available in any online web archive, so that if a page ever disappears we can link to the archived version instead. If a page is not found in an archive, we request that it be archived.

See: Checking External Links

Nearest Placename

We look up the settlement nearest to the image by consulting various gazetteers (generally the best we have available). This allows us to show a 'near placename' on the photo page. Most gazetteers also list a county and country for each place, so this gives an approximate county/country for the image too (though it is not exact).
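The lookup is essentially a nearest-neighbour search. A minimal sketch, assuming gazetteer entries and the image share one national grid (eastings/northings in metres) so straight-line distance is meaningful; the place names, counties, and coordinates below are made up for illustration, and `nearest_place` is a hypothetical helper. It also shows why the county is only approximate: the image simply inherits the county of whichever settlement is closest, which near a boundary may lie across it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    county: str
    country: str
    easting: int   # metres on an assumed national grid
    northing: int


# Made-up gazetteer entries, for illustration only.
GAZETTEER = [
    Place("Exampleton", "Northshire", "England", 451000, 206000),
    Place("Sampleford", "Northshire", "England", 455500, 209200),
    Place("Testbury", "Westshire", "Wales", 320000, 210000),
]


def nearest_place(easting, northing, gazetteer=GAZETTEER):
    """Return (place, distance in metres) for the closest gazetteer entry.

    Linear scan for clarity; a full gazetteer would call for a spatial
    index such as a k-d tree.
    """
    return min(
        ((p, math.hypot(p.easting - easting, p.northing - northing)) for p in gazetteer),
        key=lambda pair: pair[1],
    )


place, distance = nearest_place(454000, 208000)
print(place.name, place.county, round(distance))  # Sampleford Northshire 1921
```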

Processing done by Third Parties

Scenic predictions

A sample of some 200,000 images around London has been processed to predict 'scenicness'. The model was trained with data from (link), which rated a different selection of images via an online game.

Academic paper:
Seresinhe CI, Preis T, Moat HS (2017) Using deep learning to quantify the beauty of outdoor places. Royal Society Open Science 4(7): 170170. (link)

Output data (please also cite this Dryad data package):
Seresinhe CI, Preis T, Moat HS (2017) Data from: Using deep learning to quantify the beauty of outdoor places. Dryad Digital Repository. (link)


Wikimedia Commons

Each image that is cross-submitted to Wikimedia Commons ends up being assigned a category (such as 'Churches in England'). Various algorithms do this (i.e. the category is assigned by a bot), but the result is subject to human curation too.

Text © Copyright August 2017, Barry Hunter; licensed for reuse under a Creative Commons Licence.
With contributions by Penny Mayes.