Sample Selection Methods

Published: 23 April 2011
This document is intended for interested developers looking at various methods of selecting a sample of images. For anyone else it's probably gibberish, and can be ignored!

Currently the browse page just picks a method at random from the following list. Once a method has been selected, it's used for a while, to avoid the appearance of the selection changing on every page load.
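As a rough illustration of "pick at random, then stick with it for a while", here is a minimal Python sketch. It is not the site's actual code: the function name pick_method, the square key, and the one-hour period are all assumptions made for the example. It seeds a random generator from the key and the current time bucket, so repeated page loads inside the same bucket get the same method.

```python
import random

# Method names as listed in this document.
METHODS = ["category", "groups", "user+category", "spaced",
           "any", "random", "latest", "few"]

def pick_method(key, now, period=3600):
    """Pick a method pseudo-randomly, stable within each time bucket.

    key    -- hypothetical identifier for the page, e.g. a grid square
    now    -- current time in seconds (e.g. from time.time())
    period -- assumed length of "a while", in seconds
    """
    bucket = int(now // period)
    # Seeding with key + bucket makes the choice repeatable in the bucket.
    rng = random.Random(f"{key}:{bucket}")
    return rng.choice(METHODS)
```

Storing the chosen method in the visitor's session would work just as well; the time bucket simply avoids needing any stored state.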


All the following methods assume we are aiming for $limit images in the sample. $where contains the criteria used for the current selection.


category

SELECT * FROM gridimage_search WHERE $where GROUP BY imageclass ORDER BY seq_no LIMIT $limit

Very simple and quick. The main drawback is that it takes no account of how many categories there are: there might be only two, or there could be hundreds, and the selection just picks a few arbitrary ones. Thinking about it now, SELECT *,COUNT(*) c ... ORDER BY c DESC would be slightly better.
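To make the COUNT(*) idea concrete, here is a small Python sketch of the same logic on in-memory rows. It is only an illustration: the function name is made up, and it assumes the representative image for each category is the one with the lowest seq_no, whereas MySQL's GROUP BY with SELECT * may return any row from the group.

```python
from collections import Counter

def sample_by_category(images, limit):
    """One image per imageclass, most populated categories first."""
    counts = Counter(img["imageclass"] for img in images)

    # Assumed representative: the earliest image (by seq_no) in each category.
    first = {}
    for img in sorted(images, key=lambda i: i["seq_no"]):
        first.setdefault(img["imageclass"], img)

    # Equivalent of ORDER BY c DESC, with seq_no as a tie-breaker.
    ranked = sorted(first.values(),
                    key=lambda i: (-counts[i["imageclass"]], i["seq_no"]))
    return ranked[:limit]
```

With this ordering a small $limit at least returns the categories that cover the most images, rather than whichever ones happen to come first.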

groups

SELECT DISTINCT * FROM gridimage_search INNER JOIN gridimage_group USING (gridimage_id) WHERE $where GROUP BY label ORDER BY seq_no LIMIT $limit

Similar issues to the category one. No idea how many groups/clusters there are for the square.

The 'clusters' are terms associated with each image, extracted by the carrot2 software.

user+category


CREATE TEMPORARY TABLE $table SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC, REVERSE(gridimage_id)

ALTER IGNORE TABLE $table ADD UNIQUE (user_id),ADD UNIQUE (imageclass)

SELECT * FROM $table LIMIT $limit

Based on the idea presented here: Link

Again, a similar issue to the first two: no idea if it is going to give an appropriate number of images.
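The effect of adding the two UNIQUE keys with IGNORE is that rows are kept in their existing order, and any row that repeats an already-seen user_id or imageclass is silently dropped. A minimal Python sketch of that filtering follows; the function name is hypothetical, and the input is assumed to be already sorted the way the temporary table was filled. It may also be worth knowing that ALTER IGNORE TABLE was removed in MySQL 5.7, so an approach along these lines would be needed on newer servers.

```python
def sample_unique_user_and_category(images, limit):
    """Keep rows in order, at most one per user_id and one per imageclass."""
    seen_users = set()
    seen_classes = set()
    kept = []
    for img in images:  # assumed to be in the preferred order already
        if img["user_id"] in seen_users or img["imageclass"] in seen_classes:
            continue  # would violate one of the two UNIQUE keys
        seen_users.add(img["user_id"])
        seen_classes.add(img["imageclass"])
        kept.append(img)
    return kept[:limit]
```

The number of rows that survive depends entirely on how many distinct contributors and categories there are, which is why the result can fall well short of $limit.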


spaced


$space = max(1,floor($square->imagecount/$limit));

SET @c := -1

SELECT *,IF(@c=$space,@c:=0,@c:=@c+1) AS c FROM gridimage_search WHERE $where HAVING c=0 ORDER BY seq_no

Pretty good: picks images from throughout the results, which helps avoid runs when a contributor submits lots of similar images.
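The counter trick can be easier to follow outside SQL. This Python sketch mirrors the same counter step by step, on the assumption that $square->imagecount equals the number of rows matching $where (the function name is made up for the example). Note that because the counter runs from 0 up to $space before resetting, a row is kept every $space + 1 rows, which keeps the result at or under $limit.

```python
def sample_spaced(images, limit):
    """Keep every (space + 1)-th image, mirroring the @c counter in the query."""
    # Assumption: the image count is simply the number of matching rows.
    space = max(1, len(images) // limit)
    c = -1  # same starting value as SET @c := -1
    kept = []
    for img in images:
        # Same as IF(@c=$space, @c:=0, @c:=@c+1)
        c = 0 if c == space else c + 1
        if c == 0:  # same as HAVING c=0
            kept.append(img)
    return kept
```

For example, with 100 images and a limit of 10, the spacing is 10 and rows 0, 11, 22 and so on up to 99 are kept, giving exactly 10 images.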

any

SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC LIMIT $limit

Favours the first images of each contributor.


random

SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC, REVERSE(gridimage_id) LIMIT $limit

Not actually 'random': it uses a predictable ordering, but one that isn't apparent, so it might as well be a random order.
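To show why the ordering looks shuffled yet is repeatable, here is a Python sketch of the same sort. It assumes REVERSE() treats the numeric id as a string of digits and that the result is compared as text; ftf is treated as an opaque integer column, and the function name is hypothetical.

```python
def pseudo_random_order(images, limit):
    """Order like: ORDER BY ftf BETWEEN 1 AND 4 DESC, REVERSE(gridimage_id)."""
    def sort_key(img):
        favoured = 1 <= img["ftf"] <= 4
        # False sorts before True, so favoured rows (not favoured == False) come first.
        # Reversing the digits makes the last digit the most significant one.
        return (not favoured, str(img["gridimage_id"])[::-1])
    return sorted(images, key=sort_key)[:limit]
```

Consecutive ids differ mainly in their last digit, so once reversed they scatter across the ordering, which breaks up runs of images submitted together while still giving the same result on every page load.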


latest

SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC, seq_no DESC LIMIT $limit

Shows the most recent, but favours the first geographs from each contributor.

few

SELECT * FROM gridimage_search WHERE $where ORDER BY seq_no LIMIT $limit

Just picks the first images. Very simple, but if a contributor submits a run of similar images, they will overrun the results. It cannot be considered representative in a selection of hundreds of images!




Text © Copyright April 2011, Barry Hunter; licensed for reuse under a Creative Commons Licence.
With contributions by Penny Mayes.