  GIFT user's guide
  Wolfgang Mueller
  0.1.4, 8th of October 2000

  The GIFT is a content based image retrieval system.

  ______________________________________________________________________

  Table of Contents


  1. Introduction

  2. MRML - Why we publish GIFT this way

  3. Installation

  4. Prepare and run the server

     4.1 $GIFT_HOME: where the configuration data goes
     4.2 Indexing a collection
        4.2.1 Indexing in multiple runs
        4.2.2 Handling files in public_html

  5. Getting started with the Charmer interface

  6. Troubleshooting

     6.1 No connection
     6.2 Instead of images, I see empty frames
        6.2.1 Wrong image file format
        6.2.2 JAVA security gets in our way
     6.3 Digging in the indexing files
        6.3.1 url2fts
           6.3.1.1 Where is the url2fts for a given collection?
           6.3.1.2 Modifying url2fts for moving a collection
           6.3.1.3 JAVA: file:/... URLs vs. http://.. URLs

  7. How to analyze GIFT

  8. ToDo List

     8.1 Code and documentation quality
     8.2 Known shortcomings
     8.3 Features to be added soon
     8.4 Interesting things to be added
     8.5 MRML
     8.6 Research
     8.7 Call for participation

  9. Contact details

     9.1 GIFT


  ______________________________________________________________________

  11..  IInnttrroodduuccttiioonn

  Caveat: THIS PROGRAM NEEDS MORE PREREQUISITES THAN OTHER GNU PACKAGES
  (see below). At present, it installs neither info files nor man pages.
  Apart from that, the install procedure tries to be GNU conformant.

  GIFT is a content based image retrieval system (CBIRS). It gives the
  user the possibility to index and search images without having to
  annotate them first. Indexing is done using image properties such as
  color and texture.

  The query consists of one or several image examples.

  Content based image retrieval (CBIR) is presently an area of active
  research, which has yielded many different systems.

  Our system differs from other systems in its indexing method. We use
  a very large number of very simple features, which are translated
  into a kind of "pseudo text" for each image. On this representation
  we use inverted files as the indexing technique. In our opinion this
  representation has two great advantages:

  +o  The representation is very flexible, in that it allows good
     incorporation of relevance feedback.

  +o  The representation is very flexible, in that it will allow
     completely integrated treatment of text (ASCII, HTML etc.) and
     images.

  Both are achieved at a reasonable speed, which we hope to increase
  further. The current implementation still leaves room for
  optimization.

  This document is the user's guide to the GIFT content based image
  retrieval system. It is aimed at the user of such a system. Its
  companion documents will be a programmer's manual, elaborating
  details of interfacing etc., and a document describing MRML, the
  protocol we use for the communication between the user interface and
  the image retrieval system.



  22..  MMRRMMLL -- WWhhyy wwee ppuubblliisshh GGIIFFTT tthhiiss wwaayy

  One big problem of CBIRS research is the lack of a common benchmark
  for measuring the quality of retrieval results. In our project we
  have worked both on reasonable measures for benchmarks and on
  software for running them. A precondition for a common benchmark is
  a common interface (in the programming sense of the word), permitting
  the connection of the benchmarking program to the benchmarked
  program. MRML is an XML based language providing such an interface.

  The other great effect of such an interface language is the
  possibility to create GUIs for CBIRS which can be used with all MRML
  compliant servers. Charmer, provided by our partners at EPFL (contact
  details in the Author field), is a very beautiful interface which
  will also become free software.

  We (EPF Lausanne and CUI, Geneva) provide this software with the goal
  of promoting the use of MRML. Widespread use of MRML could help both
  scientists (less work and easier exchange) and normal users (easy
  combination of software packages from different sources). Think, for
  example, of a GIMP plug-in which would help users organize their
  images.

  In fact, the idea of having interested real-world users test a system
  like ours is very attractive, as data about real users are hard to
  get. We hope to motivate users to share anonymized, preprocessed log
  files in order to help us improve our systems.

  33..

  IInnssttaallllaattiioonn


  Caveat: THIS PROGRAM NEEDS MORE PREREQUISITES THAN OTHER GNU PACKAGES.
  Apart from that, the install procedure is GNU conformant.

  Please tell us about any bugs in the installation procedure.

  The distribution has so far been tested on a SuSE GNU/Linux system
  and on a SUN Ultra Sparc 10 running Solaris. Both machines had at
  least 64 MB of memory.

  To install: you have received the package GIFT-0.1.4.tgz. Now:

  +o  Before doing anything else, make sure you have installed

     +o  a recent g++ (we suggest 2.95; egcs 1.1 should work, too)

     +o  Image Magick (the feature extraction uses it to convert image
        formats to PPM files)

     +o  Perl newer than 5.003 (available at www.cpan.org)

     +o  the Perl modules

        +o  XML::Parser (also available at www.cpan.org)

        +o  XML::Writer

        +o  HTML::Parser (if not already provided by your GNU/Linux
           distribution)

        +o  libnet (if not already provided by your GNU/Linux
           distribution)

        +o  libwww (if not already provided by your GNU/Linux
           distribution)

     You have to check this yourself, because the current makefiles
     test for these programs but do not act on the result.

  +o  Unpack it by doing

     ___________________________________________________________________
     tar -xvf GIFT-0.1.4.tar
     ___________________________________________________________________



  +o  Follow the install instructions in the file

     ___________________________________________________________________
     INSTALL
     ___________________________________________________________________



  +o  If you have installed KDOC, change into the directory Doc, and do

     ___________________________________________________________________
     make system-doc
     ___________________________________________________________________


  You can now look at the system documentation by typing

  ______________________________________________________________________
  lynx Doc/autoDoc/HTML/hier.html
  ______________________________________________________________________

  +o  Now unpack the Charmer archive:

     ___________________________________________________________________
     tar -xvf Charmer-0.2.tar
     ___________________________________________________________________



  +o  Run the configuration script

     ___________________________________________________________________
     cd Charmer-0.2;perl write-applet-frame.pl
     ___________________________________________________________________
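
  Since the makefiles do not act on missing prerequisites, it can help
  to check for them by hand before building. The following is only a
  minimal sketch: the function names are made up for this example, and
  it assumes that Image Magick is reachable as "convert" and that
  libnet and libwww can be detected through their Net::FTP and LWP
  modules.

```shell
#!/bin/sh
# Print every command from the argument list that is not on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Print every Perl module from the argument list that perl cannot load.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Anything printed here still has to be installed.
missing_commands g++ convert perl
missing_perl_modules XML::Parser XML::Writer HTML::Parser Net::FTP LWP
```

  If the script prints nothing, all the checked prerequisites were
  found. It does not check version numbers.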



  44..  PPrreeppaarree aanndd rruunn tthhee sseerrvveerr

  We assume that you have successfully installed GIFT. Now you want to
  start the server, I suppose. But first you have to give to GIFT an
  image collection.

  44..11..  $$GGIIFFTT__HHOOMMEE:: wwhheerree tthhee ccoonnffiigguurraattiioonn ddaattaa ggooeess

  By default, the indexing data of the collections, as well as the
  configuration files reside in your home directory.

  To change this default, set the environment variable GIFT_HOME the to
  absolute path of the directory where you want your gift-indexing-data
  etc. to reside and put the line:


  ______________________________________________________________________
  export GIFT_HOME=/absolute/path/to/my_gift_home
  ______________________________________________________________________



  into your .bashrc.
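
  The default rule described here can be written down in one line of
  shell. This is only a sketch for checking which directory applies in
  your current shell; the function name is hypothetical, and treating
  an empty GIFT_HOME like an unset one is an assumption.

```shell
#!/bin/sh
# Print the directory that applies according to the rule above:
# $GIFT_HOME if it is set and non-empty, otherwise $HOME.
gift_home() {
    printf '%s\n' "${GIFT_HOME:-$HOME}"
}

gift_home
```

  Run it in a new shell after editing your .bashrc to confirm that the
  export line has taken effect.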

  Now you are ready to index a collection.

  44..22..  IInnddeexxiinngg aa ccoolllleeccttiioonn

  The vanilla way of indexing a collection is by typing

  ______________________________________________________________________
  gift-add-collection.pl /absolute/name/of/a/directory/tree/containing/images/
  ______________________________________________________________________


  This script will then create the appropriate configuration files and
  index directories in your home directory (or in $GIFT_HOME). Please
  note that it should be possible to index all images within your home
  directory tree by typing


  ______________________________________________________________________
  gift-add-collection.pl ~
  ______________________________________________________________________





  The script, as it is now, takes 20-30 seconds per image to create the
  base indexing data for inverted file creation. The inverted file
  creation itself will take from about 5 minutes to about an hour,
  depending on the size of the collection. The biggest collections we
  have indexed so far contain 13000 images; this took our fast server
  two days. 500 images get indexed on my portable AMD-K6-2-based
  computer in less than 3 hours.
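
  To get a feeling for how long your own collection will take, the
  per-image figure can be multiplied out. The sketch below does this
  in whole minutes; the function name is hypothetical, the seconds per
  image are an assumption you have to adapt to your machine, and the
  inverted file creation is not included.

```shell
#!/bin/sh
# Rough feature extraction time in whole minutes (rounded down).
# $1 = number of images, $2 = assumed seconds per image.
estimate_minutes() {
    echo $(( $1 * $2 / 60 ))
}

estimate_minutes 500 20    # prints 166
estimate_minutes 500 30    # prints 250
```

  For 500 images this gives roughly 2.8 to 4.2 hours, which is in the
  same range as the laptop figure quoted above.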

  44..22..11..  IInnddeexxiinngg iinn mmuullttiippllee rruunnss

  If you have to index a large collection or if you are indexing your
  collection on a portable computer, indexing a collection in one run
  becomes inacceptable. If you stop the

  ______________________________________________________________________
  gift-add-collection.pl
  ______________________________________________________________________


  during feature generation and indexing, you can resume by simply
  restarting the program. On restart, it will check whether each file
  was correctly generated and resume operation at the first file which
  was not.
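
  The idea behind such a resumable run can be illustrated with a few
  lines of shell. This is not the check gift-add-collection.pl
  actually performs: the function name, the ".fts" naming rule and the
  "non-empty file" test are all assumptions made for this sketch.

```shell
#!/bin/sh
# Print the images in directory $1 that do not yet have a non-empty
# feature file in directory $2. The naming rule used here
# (image name plus ".fts") is a placeholder for this sketch.
pending_images() {
    img_dir=$1
    fts_dir=$2
    for img in "$img_dir"/*; do
        [ -f "$img" ] || continue
        fts="$fts_dir/$(basename "$img").fts"
        [ -s "$fts" ] || printf '%s\n' "$img"
    done
}

# Example: pending_images /path/to/images /path/to/feature_files
```

  A resumed run then only has to process the files this function
  lists; everything that was completed before the interruption is
  skipped.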

  44..22..22..  HHaannddlliinngg ffiilleess iinn ppuubblliicc__hhttmmll

  Most web server configurations have for each user a directory which is
  published on the web (if the file permissions are set accordingly). On
  most systems this directory is  /public_html, and the associated URL
  is http://localhost/ your_username/ . gift-add-collection.pl takes
  these settings into account. If an image file

  ~/public_html/a_file

  gift-add-collection.pl will generate a

  http://localhost/~your_username/a_file

  which are elsewhere, urls of the kind

  file:/a/path/another_file


  will be generated. If you are not happy with the settings, we suggest
  you to run

  ______________________________________________________________________
  gift-add-collection.pl --help
  ______________________________________________________________________


  and read the "Digging in the indexing files" section of this
  document.
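
  The mapping from file names to URLs described in this section can be
  sketched as follows. This only restates the rule from the text; it
  is not taken from gift-add-collection.pl, and the function name and
  its parameters are hypothetical.

```shell
#!/bin/sh
# Map an absolute image path to the URL form described above.
# $1 = image path, $2 = user name, $3 = that user's home directory.
image_url() {
    path=$1
    user=$2
    home=$3
    case $path in
        "$home"/public_html/*)
            # Published on the web: use an http URL.
            printf 'http://localhost/~%s/%s\n' "$user" \
                "${path#"$home"/public_html/}" ;;
        *)
            # Anywhere else: fall back to a file URL.
            printf 'file:%s\n' "$path" ;;
    esac
}

image_url /home/your_username/public_html/a_file your_username /home/your_username
image_url /a/path/another_file your_username /home/your_username
```

  The two calls print http://localhost/~your_username/a_file and
  file:/a/path/another_file, matching the examples in the text.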

  When the indexing is done, you can start the GIFT server:

  ______________________________________________________________________
  gift
  ______________________________________________________________________


  At the time of writing, gift prints a large amount of debugging output
  to the screen. Most of the time it will also tell you why it dies, if
  it dies. In the cases known to me, the usual reasons for dying are
  inappropriate config file locations, as well as the untrapped
  possibility of crashing the server with faulty XML or a non-existent
  session ID.

  Once started, the server listens on port 12789 for connecting
  clients.

  While 12789 is the default port number, you can override it by giving
  another port number as the first parameter. In addition, you can
  override GIFT_HOME by giving a directory as the second parameter to
  gift, for example:

  ______________________________________________________________________
  gift 12888 /usr/local/share/shared-gift-collections/
  ______________________________________________________________________
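
  If you start the server from scripts, a small wrapper can supply the
  default port and catch an obviously wrong one. The sketch below only
  prints the command line it would run; the function name is
  hypothetical and nothing in it is part of GIFT itself.

```shell
#!/bin/sh
# Print the gift command line for an optional port ($1) and an
# optional GIFT_HOME directory ($2), using 12789 as the default port.
gift_cmdline() {
    port=${1:-12789}
    case $port in
        *[!0-9]*)
            echo "port must be a number: $port" >&2
            return 1 ;;
    esac
    if [ -n "${2:-}" ]; then
        printf 'gift %s %s\n' "$port" "$2"
    else
        printf 'gift %s\n' "$port"
    fi
}

gift_cmdline
gift_cmdline 12888 /usr/local/share/shared-gift-collections/
```

  The first call prints "gift 12789", the second reproduces the
  example above.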



  55..  GGeettttiinngg ssttaarrtteedd wwiitthh tthhee CChhaarrmmeerr iinntteerrffaaccee


  +o  The installation is not yet GNU-ish: Simply unpack the archive and
     do:

     ___________________________________________________________________
     cd Charmer-0.2;perl write-applet-frame.pl
     ___________________________________________________________________



  +o  Start the Charmer interface by typing

     ___________________________________________________________________
     cd Charmer-0.2;appletviewer Charmer.html
     ___________________________________________________________________



  Note that you _h_a_v_e to cd into the directory where Charmer resides.
  Otherwise you would have to play with CLASSPATHs.

  +o  Click on the button which symbolizes a handshake.

  +o  Fill in the request: In the first line you give the host and the
     port, separated by a colon (e.g. localhost:12789). In the second
     line you give a username. This serves for opening a session under
     your name. We plan to add persistent user management to gift, to
     make it possible for the user to choose between different sessions.

  +o  As a reaction to pressing OK in the request, the interface changes:
     It now gives you a choice between different algorithms and
     collections. In this case all the algorithms are weighting
     functions for ranked queries on inverted files. However, in
     principle, you can put anything there.

  +o  Click the dice symbol. You will get a random selection of images
     from the collection you chose.

  +o  You now have the possibility to click on one or several of these
     images (they will get a green frame when clicked), and send a query
     for them by clicking on the binoculars button.

  +o  Once you get back your query result, you can improve it. You can
     either click on some of the query results (thus adding "positive"
     images to your query) or click the right mouse button on an image
     and use the menu that pops up to indicate that you want to exclude
     it from further consideration.

  +o  To clear your query, press the button with the curved arrow. This
     deletes your query, but does not clear the display of the result
     set.

  66..  TTrroouubblleesshhoooottiinngg

  66..11..  NNoo ccoonnnneeccttiioonn

  On connecting to a remote server, you do not get the choice between
  different algorithms (and at least one collection).

  Be aware, that an applet can only connect to the server it came from.
  So you have to see to it that your appletviewer fetches the
  SnakeCharmer applet from the remote server.

  66..22..  IInnsstteeaadd ooff iimmaaggeess,, II sseeee eemmppttyy ffrraammeess

  There are several possible reasons for this

  66..22..11..  WWrroonngg iimmaaggee ffiillee ffoorrmmaatt

  For generality, we allow quite a number of image formats: files with
  the extension png, gif, jpg, jpeg, eps, and ppm.  unfortunately JAVA
  is only able to digest GIF and JPEG.
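
  To find out which files of a collection are affected, you can test
  the file name extensions. The following sketch uses a hypothetical
  function name and looks only at the extension, not at the actual
  file contents.

```shell
#!/bin/sh
# Succeed if the file name in $1 has an extension that, according to
# the text above, JAVA can display (gif, jpg or jpeg, in any case).
java_displayable() {
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case $ext in
        gif|jpg|jpeg) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: list the files in the current directory that will show up
# as empty frames.
for f in *; do
    [ -f "$f" ] || continue
    java_displayable "$f" || printf '%s\n' "$f"
done
```

  Files listed by the loop would have to be converted to GIF or JPEG,
  for example with Image Magick, before they can be displayed.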

  We also create a 128x128 thumbnail in JPEG format for each image.
  Unfortunately, SnakeCharmer still displays the original images instead
  of the thumbnails. This is on our ToDo list.

  66..22..22..  JJAAVVAA sseeccuurriittyy ggeettss iinn oouurr wwaayy

  JAVA 1.1.x and JAVA 1.2 have different security concepts. While in
  JAVA 1.1.x the appletviewer does not load any remote IDs (at least I
  had once trouble with that), JAVA 1.2's appletviewer is extremely
  restrictive, as restrictive as a browser would be. As a consequence,
  you have to put your images and thumbnails into a web-published area,
  if you want to see anything.

  See the "digging in the indexing files" section for more information

  66..33..  DDiiggggiinngg iinn tthhee iinnddeexxiinngg ffiilleess

  66..33..11..  uurrll22ffttss

  MRML is based on URLs. Based on the url, it will see if it knows
  already the image, if the image is in the indexed collection etc. . If
  the image is unknown, it will create new features for finding a
  corresponding image etc. . As a consequence, if you move images that
  have been indexed from one directory to another, you will have to
  change the translation table from url to feature file. This resides in
  the "url2fts" files.

  66..33..11..11..  WWhheerree iiss tthhee uurrll22ffttss ffoorr aa ggiivveenn ccoolllleeccttiioonn??

  Assuming you have a collection containing images from

  ______________________________________________________________________
  /this/is/a/path/to/my_collection/
  ______________________________________________________________________


  You will find the indexing data in the directory

  ______________________________________________________________________
  ${GIFT_HOME:-$HOME}/gift-indexing-data/my_collection/
  ______________________________________________________________________


  The translation table from URL to feature file resides in

  ______________________________________________________________________
  ${GIFT_HOME:-$HOME}/gift-indexing-data/my_collection/url2fts
  ______________________________________________________________________
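
  The location rule above can also be computed from the collection
  directory. In this sketch the function name is hypothetical, and the
  base directory (your home directory or whatever you configured
  instead) is passed in explicitly rather than read from the
  environment.

```shell
#!/bin/sh
# Print the url2fts path for a collection.
# $1 = base directory holding gift-indexing-data
# $2 = directory the collection's images were indexed from
url2fts_path() {
    printf '%s/gift-indexing-data/%s/url2fts\n' "$1" "$(basename "$2")"
}

url2fts_path "$HOME" /this/is/a/path/to/my_collection/
```

  With HOME set to /home/someone this prints
  /home/someone/gift-indexing-data/my_collection/url2fts.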



  66..33..11..22..  MMooddiiffyyiinngg uurrll22ffttss ffoorr mmoovviinngg aa ccoolllleeccttiioonn

  Do

  ______________________________________________________________________
  emacs \
  ${GIFT_HOME:-$HOME}/gift-indexing-data/my_collection/url2fts
  ______________________________________________________________________


  What you get to see is a long table of the structure


  image_location_url thumbnail_location_url feature_file_name



  Clearly, if you move the images from one location to another, you have
  to adjust image_location_url and thumbnail_location_url. Usually this
  amounts to a simple query-replace operation.
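
  Before and after editing, it can be useful to check that every line
  still has the three columns described above. The sketch below
  assumes that the columns are separated by whitespace and that
  neither URLs nor file names contain spaces; the function name is
  hypothetical.

```shell
#!/bin/sh
# Print the line numbers of url2fts entries (read from standard input)
# that do not consist of exactly three whitespace-separated fields.
malformed_url2fts_lines() {
    awk 'NF != 3 { print NR }'
}

# Example: malformed_url2fts_lines < url2fts
```

  No output means that every line has an image URL, a thumbnail URL
  and a feature file name.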

  For example: at my place I have the collection TSR500, and the first
  line looks like this:

  http://localhost/~muellerw/images/TSR500/b_1002_scaled_small.jpg http://localhost/~muellerw/images/TSR500_thumbnails/b_1002_scaled_small_thumbnail_jpg.jpg /home/muellerw/gift-indexing-data/TSR500/b_1002_scaled_small_jpg.fts


  If I want to move this collection from

  ~muellerw/public_html/images/TSR500/


  to

  ~muellerw/public_html/images/TSR501/


  all I have to do is to replace each string

  images/TSR500


  by

  images/TSR501


  in url2fts (and to restart the GIFT, of course).
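
  Instead of an editor, the same replacement can be done with sed.
  This is a sketch with a hypothetical function name, under the
  assumption that neither string contains "|" or characters that are
  special to sed. Note the trailing slash in the example: with it,
  only the image directory is renamed and the TSR500_thumbnails
  directory is left alone; without it, the thumbnail URLs are
  rewritten as well.

```shell
#!/bin/sh
# Replace every occurrence of $1 by $2 in the url2fts table read from
# standard input and write the result to standard output.
# Neither argument may contain '|' or sed special characters.
retarget_urls() {
    sed "s|$1|$2|g"
}

# Example (write to a new file first, check it, then move it into
# place and restart the server):
#   retarget_urls images/TSR500/ images/TSR501/ < url2fts > url2fts.new
```

  The feature file names in the third column are not touched by this
  replacement, because they do not contain the string "images/".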

  66..33..11..33..  JJAAVVAA:: ffiillee:://...... UURRLLss vvss.. hhttttpp::////.... UURRLLss

  In my humble experience, old (1.1.x) appletviewers had problems with
  http:// URLs. "Problems" means, that they simply were neither
  downloaded nor displayed. "Uncool", in a word. The opposite now
  happens with a recent (1.2) appletviewer. The images are shown, if
  they come by http from a location on the local host, they will not be
  shown, if they are specified by a file:/... URL. This renders giving
  defaults which work for everybody impossible. We give possibilites to
  specify all this in gift-add-collection.pl (try --help for viewing the
  options). However, if you are experiencing trouble you can simply do
  replacement of URLs in the url2fts file, as described in the section
  above.

  77..  HHooww ttoo aannaallyyzzee GGIIFFTT

  In any case, if you are interested in adding anything to the GIFT, we
  would be happy to hear from you. Please mail the maintainer, Wolfgang
  Mueller: Wolfgang.Mueller@cui.unige.ch . He will be happy to help.

  We use KDOC as a system documentation tool. This means we put
  JavaDoc-like comments into the headers. If you want to find out what
  the different classes are doing, we suggest that you run KDOC on
  libInvertedFile/include/*.h and just browse. Comments in the *.cc
  files are usually shorter and only geared towards implementation.
  Later we plan to include a script which synchronizes comments between
  the headers and the prototypes in the *.cc files.

  For the developer there are several alternatives:

  1. Use as much code as possible: inherit something from CAccessor
     (access to a given indexing method and database) and/or CQuery
     (using this method to actually process queries). It would then be
     useful to understand how CSessionManager works. If you have
     suggestions for changing what we have, please notify us and help
     us release. We are currently improving the factory design for
     accessors. This should make enhancing GIFT much easier.

  2. Intermediate: use the parsing code, but not much more. The
     knowledge about MRML is in the ...handler functions in
     CSessionManager and CCommunicationHandler/Server. This might serve
     as an inspiration.

  3. As little code as possible: Simply take the DTD. If you write a
     server, check whether

     ___________________________________________________________________
     gift-mrml-client.pl
     ___________________________________________________________________


     works with your system. Try to make it work seamlessly with an
     unmodified version of the Charmer interface.

  Always: patches are welcome. If you are missing MRML functionality, we
  would also be very interested in providing this functionality or in
  helping you to provide it yourself. Please contact the maintainer of
  GIFT/Charmer as named in the AUTHORS file.

  We are currently putting together some documentation which will
  include a "gift extension howto". This is going hand in hand with some
  redesign of the session manager, in order to make cooperation easier.

  88..  TTooDDoo LLiisstt

  As you see, the version number of GIFT is quite low. It implies, that
  we see many areas which need more work.

  88..11..  CCooddee aanndd ddooccuummeennttaattiioonn qquuaalliittyy



  +o  proper generation and installation of .info and .texinfo files


  +o  Write decent Makefile.am

  +o  improve CAccessor factory design for easy integration of foreign
     packages

  +o  Class graph of GIFT

  +o  HOWTOs for adding new query engines to GIFT

  +o  More documentation in the headers

  +o  Port MRML tech report from LaTeX to SGML to have it in high quality
     on line for development

  +o  Write script to synchronize prototype comments used in the cc file
     with the declaration in the include file.

  +o  Decent parameter parsing for all tools

  88..22..  KKnnoowwnn sshhoorrttccoommiinnggss


  +o  Exceptions: GIFT has been designed to throw exceptions in case of
     errors. Unfortunately this feature is still too buggy in 2.8.1 to
     be used. So, in many cases, GIFT aborts where it should not.  Some
     of the aborts were deliberately left in to make some other errors
     more visible for us during development.

  +o  Socket programming: get rid of silly workarounds: we sleep before
     closing down our socket. Works, but quite embarrassing.

  +o  Make server multithreaded (not easy, since it is not stateless)

  88..33..  FFeeaattuurreess ttoo bbee aaddddeedd ssoooonn


  +o  Adding images during runtime

  +o  Persistent administration of sessions, including interaction
     history. Here we will have to choose a database to link GIFT to
     (probably the GPLed version of MySQL or PostgreSQL). Otherwise, we
     waste simply too much time on issues which are not really image
     processing.

  +o  Integrate Christoph Giess's CORBA code

  88..44..  IInntteerreessttiinngg tthhiinnggss ttoo bbee aaddddeedd


  +o  More flexible interfaces for both GIFT and Charmer, allowing them
     better to grow together with MRML. (Partly fixed, there is a new
     interface for CQuery and CAccessor which is quite flexible. We now
     need something like that for Charmer, too).

  +o  APIs for XML parsing

  +o  Feature: granting rights to users

  +o  Making an MRML interface a plug-in to the GIMP

  88..55..  MMRRMMLL

  We would like to add (at least) the following features to MRML

  +o  Add image to collection during query

  +o  Query by annotation is soon to be integrated

  +o  Queries and transmission of segments

  +o  A format for transmitting relevance information for benchmarks

  +o  ...

  88..66..  RReesseeaarrcchh


  +o  search for better feature sets. The current feature set is still
     the very first version.

  +o  add fast flexible browsing support

  88..77..  CCaallll ffoorr ppaarrttiicciippaattiioonn

  It is obvious that a group of three people cannot attack all this.
  However, we are hoping for help both from users and scientists. While
  users probably will be most interested in treating as many file types
  as possible, scientists may profit from enhancing the capabilities of
  MRML as well as a persistent session management, which would permit
  learning about each user and thus improve query performance.


  99..  CCoonnttaacctt ddeettaaiillss

  99..11..  GGIIFFTT

  GIFT has been developed at

  Centre Universitaire d'Informatique

  Vision Group

  24, rue du General Dufour

  1211 Geneva 4

  by (in the "order of appearance" in the group) Dr. David McG. Squire,
  Wolfgang Mueller, Henning Mueller and Dr. Stephane Marchand-Maillet,
  supervised by Prof. Dr. Thierry Pun.

  See the AUTHORS file for information on who did what.



















