Sample Selection Methods
Published: 23 April 2011
This document is intended for developers interested in the various methods of selecting a sample of images. For anyone else it's probably gibberish, and can be ignored!
Currently the browse page just picks a method at random from the following list. Once a method has been selected, it's used for a while, to avoid the selection appearing to change on every page load.
All the following methods assume we are aiming for $limit images in the sample. $where contains the criteria used for the current selection.
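How the chosen method is kept "for a while" isn't described here, so the following is only a sketch of one way it could be done: hash the square together with a time window, so the choice is stable inside the window and changes afterwards. The function name pick_method, the period_seconds parameter and the example grid reference are all made up for illustration.

```python
import hashlib
import time

# The method names listed in this document.
METHODS = ["category", "groups", "user+category", "spaced",
           "any", "random", "latest", "few"]

def pick_method(square_key, now=None, period_seconds=3600):
    """Pick a method pseudo-randomly, keeping it fixed for one time window.

    Assumption: stickiness is derived from the clock; the real page may
    well use a session or a cache instead.
    """
    if now is None:
        now = time.time()
    window = int(now // period_seconds)  # same value for the whole window
    digest = hashlib.md5(f"{square_key}:{window}".encode()).hexdigest()
    return METHODS[int(digest, 16) % len(METHODS)]

# Two page loads inside the same hour get the same method.
print(pick_method("TQ3080", now=1000) == pick_method("TQ3080", now=2000))
```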
category
SELECT * FROM gridimage_search WHERE $where GROUP BY imageclass ORDER BY seq_no LIMIT $limit
Very simple and quick. The main drawback is that it takes no account of how many categories there are: there might be only two categories, or there could be hundreds, and the selection just picks a few arbitrary ones. Thinking about it now, SELECT *,COUNT(*) c ... ORDER BY c DESC would be slightly better, as it favours the largest categories.
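To make the COUNT(*) idea concrete, here is a rough Python sketch of the same logic on rows already fetched: one image per category, largest categories first. The function name sample_by_category is made up, and taking the lowest seq_no as the representative of each category is an assumption (with GROUP BY, MySQL is free to return any row of the group).

```python
def sample_by_category(rows, limit):
    """Return one image per imageclass, biggest categories first."""
    groups = {}
    for row in sorted(rows, key=lambda r: r["seq_no"]):
        groups.setdefault(row["imageclass"], []).append(row)
    # Equivalent of ORDER BY c DESC; the sort is stable, so categories of
    # equal size stay in first-seen order.
    ranked = sorted(groups.values(), key=len, reverse=True)
    # Assumption: the first (lowest seq_no) image represents its category.
    return [members[0] for members in ranked[:limit]]

rows = [
    {"gridimage_id": 1, "seq_no": 1, "imageclass": "Church"},
    {"gridimage_id": 2, "seq_no": 2, "imageclass": "Farm"},
    {"gridimage_id": 3, "seq_no": 3, "imageclass": "Farm"},
    {"gridimage_id": 4, "seq_no": 4, "imageclass": "River"},
    {"gridimage_id": 5, "seq_no": 5, "imageclass": "Farm"},
    {"gridimage_id": 6, "seq_no": 6, "imageclass": "River"},
]
print([r["gridimage_id"] for r in sample_by_category(rows, 2)])
```

It still can't tell two categories from two hundred, but at least the few it does pick are the well-populated ones.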
groups
SELECT DISTINCT * FROM gridimage_search INNER JOIN gridimage_group USING (gridimage_id) WHERE $where GROUP BY label ORDER BY seq_no LIMIT $limit
Similar issues to the category one: no idea how many groups/clusters there are for the square.
The 'clusters' are terms associated with each image, extracted by the carrot2 software.
user+category
CREATE TEMPORARY TABLE $table SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC, REVERSE(gridimage_id)
ALTER IGNORE TABLE $table ADD UNIQUE (user_id),ADD UNIQUE (imageclass)
SELECT * FROM $table LIMIT $limit
Based on the idea presented here: Link
Again, a similar issue to the earlier methods: no idea if it is going to give an appropriate number of images.
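The ALTER IGNORE trick relies on MySQL silently dropping any row that would duplicate a unique key (note that the IGNORE option was removed from ALTER TABLE in MySQL 5.7.4, so it only works on older servers). The effect, as I understand it, can be sketched in Python as below. The function name is made up, and it assumes rows are considered in the order they were inserted into the temporary table.

```python
def sample_unique_user_and_category(rows, limit):
    """Keep a row only if its user_id and imageclass are both unseen so far."""
    seen_users, seen_classes = set(), set()
    kept = []
    for row in rows:  # assumption: rows arrive in the temporary table's order
        if row["user_id"] in seen_users or row["imageclass"] in seen_classes:
            continue  # would violate one of the two UNIQUE keys
        seen_users.add(row["user_id"])
        seen_classes.add(row["imageclass"])
        kept.append(row)
    return kept[:limit]

rows = [
    {"gridimage_id": 1, "user_id": 10, "imageclass": "Church"},
    {"gridimage_id": 2, "user_id": 10, "imageclass": "Farm"},
    {"gridimage_id": 3, "user_id": 11, "imageclass": "Church"},
    {"gridimage_id": 4, "user_id": 11, "imageclass": "Farm"},
    {"gridimage_id": 5, "user_id": 12, "imageclass": "River"},
]
print([r["gridimage_id"] for r in sample_unique_user_and_category(rows, 10)])
```

So each contributor and each category appears at most once, which is why the final count depends entirely on how varied the square is.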
spaced
set @c := -1
$space = max(1,floor($square->imagecount/$limit));
select *,if(@c=$space,@c:=0,@c:=@c+1) AS c FROM gridimage_search WHERE $where HAVING c=0 ORDER by seq_no
Pretty good: picks images spread throughout the results, which helps avoid runs when a contributor submits lots of similar images.
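The counter logic is easier to follow outside SQL. This is a small Python simulation of the @c variable above, with rows assumed to arrive in the order MySQL scans them; the function name spaced_sample is made up.

```python
def spaced_sample(rows, image_count, limit):
    """Simulate the @c counter: keep a row whenever the counter is 0."""
    space = max(1, image_count // limit)  # same as the PHP max(1, floor(...))
    c = -1                                # set @c := -1
    picked = []
    for row in rows:
        # if(@c=$space, @c:=0, @c:=@c+1)
        c = 0 if c == space else c + 1
        if c == 0:                        # HAVING c=0
            picked.append(row)
    return picked

# 20 images, aiming for 5: space is 4, and rows 0, 5, 10, 15 are kept.
print(spaced_sample(list(range(20)), image_count=20, limit=5))
```

Worth noting from the simulation: the counter runs from 0 up to $space before resetting, so it keeps one row in every $space + 1, and the sample can come out a little under $limit.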
any
SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC LIMIT $limit
Favours the first images of each contributor.
random
SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC, REVERSE(gridimage_id) LIMIT $limit
Not actually 'random': it uses a predictable ordering, but one that isn't apparent, so it might as well be a random order.
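To show why reversing the id scrambles things, here is a Python sketch of the same ordering. REVERSE() turns the id into a string, so the comparison is on the reversed digits as text; the function name pseudo_random_sample and the sample rows are made up.

```python
def pseudo_random_sample(rows, limit):
    """Order by (ftf BETWEEN 1 AND 4) DESC, then by the reversed id string."""
    def sort_key(row):
        favoured = 1 <= row["ftf"] <= 4
        # False sorts before True, so "not favoured" puts ftf 1..4 rows first.
        return (not favoured, str(row["gridimage_id"])[::-1])
    return sorted(rows, key=sort_key)[:limit]

rows = [
    {"gridimage_id": 101, "ftf": 0},
    {"gridimage_id": 102, "ftf": 1},
    {"gridimage_id": 213, "ftf": 0},
    {"gridimage_id": 320, "ftf": 2},
    {"gridimage_id": 45, "ftf": 0},
]
print([r["gridimage_id"] for r in pseudo_random_sample(rows, 4)])
```

Consecutive ids differ in their last digit, which becomes the first character after reversing, so neighbouring submissions end up far apart in the ordering while the result stays the same on every page load.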
latest
SELECT * FROM gridimage_search WHERE $where ORDER BY ftf BETWEEN 1 AND 4 DESC, seq_no DESC LIMIT $limit
Shows the most recent, but favours the first geographs from each contributor.
few
SELECT * FROM gridimage_search WHERE $where ORDER BY seq_no LIMIT $limit
Just picks the first images. Very simple, but if a contributor submits a run of similar images, they will overrun the results. Cannot be considered representative in a selection of hundreds of images!