
Index-update-concept





General
Because simultaneous writing and reading on the same index is not possible (write-lock exception), a secondary, temporary index must be created, which is later merged into the main-index. Keep in mind that the main-index can become very large once the whole data volume of 360 GB has been successfully indexed.
In theory, one could instead open and close the index every time, but performance is expected to suffer from this.
The following procedure should only be used for updating the index. It is not appropriate for the initial creation of the index, because in that case the secondary-index holds all the data (the main-index is empty) and the merge can be very time-consuming. Merging a small index into a large one is expected to be faster than the reverse.

Detection if files are up-to-date
Check whether the modification time (last-modified) of the file has changed since the last indexing run. This information must therefore be stored in the index.
A content-based comparison (e.g. via an MD5 hash) is not necessary when only the last-modified time of a file has changed. In that case the whole file is simply written to the index again.
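
The last-modified check described above can be sketched as follows. This is only a minimal illustration in Python, not Lucene code: the function name `is_up_to_date` and the parameter `indexed_mtime` (the timestamp that was stored in the index at the last indexing run) are hypothetical, and the timestamp is assumed to be stored in nanoseconds so that it can be compared exactly.

```python
import os


def is_up_to_date(path, indexed_mtime):
    """Return True if the file's last-modified time matches the stored value.

    indexed_mtime is the timestamp (in nanoseconds) saved in the index at the
    last indexing run, or None if the file has never been indexed.
    Hypothetical helper for illustration only.
    """
    if indexed_mtime is None:
        # The file is not in the index at all, so it cannot be up to date.
        return False
    # No content comparison (e.g. MD5): any change of the timestamp counts.
    return os.stat(path).st_mtime_ns == indexed_mtime
```

Any difference in the timestamp is treated as "not up-to-date", so a file whose content is unchanged but whose last-modified time was touched would still be re-indexed, as stated above.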

Complete procedure 
- Open an IndexReader on the main-index (IndexReader: somewhat surprisingly, this class contains the functionality to delete documents)
- Create an IndexWriter on a new secondary / temporary index (IndexWriter: contains the functionality to add documents)
- Walk recursively over the directories and files of the file-server
- For each file, query the main-index for the associated indexed document (if present)
- Case differentiation:
- 1st case: file is up-to-date according to the index
=> do nothing 
- 2nd case: file is in index, but not up-to-date
=> Delete old document from main-index
=> Write new document to secondary-index
- 3rd case: file is not in index
=> Write document to secondary-index
- 4th case: (special case) the index contains a file that no longer exists on the file-server (it was deleted)
=> Delete document from main-index
Comment: To detect this case, an appropriate and efficient detection strategy must still be worked out
- main-index now contains only current documents
- secondary-index now contains all new and updated documents
- Merge secondary-index into main-index (Lucene provides the appropriate functionality for this)
- Delete secondary-index
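
The procedure above can be sketched roughly as follows. This is a simplified model in Python and does not use the Lucene API: both indexes are represented as plain dictionaries mapping a file path to its stored last-modified time, and the names `update_index`, `main_index`, `secondary_index` and `seen` are hypothetical. For the 4th case the sketch assumes one possible detection strategy (remember every path seen during the walk, then delete every indexed path that was not seen); this is only an assumption, not necessarily the strategy that should finally be chosen.

```python
import os


def update_index(root, main_index):
    """Update main_index (dict: path -> last-modified in ns) from the files under root.

    Returns the number of documents merged in from the secondary index.
    Simplified model of the procedure, not Lucene code.
    """
    secondary_index = {}  # temporary index for new and updated documents
    seen = set()          # every path found on the file server in this run

    # Walk recursively over directories and files.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seen.add(path)
            mtime = os.stat(path).st_mtime_ns
            indexed = main_index.get(path)  # look up the indexed document
            if indexed == mtime:
                continue                    # 1st case: up to date, do nothing
            if indexed is not None:
                del main_index[path]        # 2nd case: delete the old document
            secondary_index[path] = mtime   # 2nd and 3rd case: write new document

    # 4th case (assumed strategy): indexed paths that were not seen are gone.
    for path in list(main_index):
        if path not in seen:
            del main_index[path]

    merged = len(secondary_index)
    main_index.update(secondary_index)      # merge secondary into main
    secondary_index.clear()                 # delete the secondary index
    return merged
```

Keeping all seen paths in memory is easy to follow but may not scale to the full data volume, so the sketch only illustrates the control flow of the four cases and the final merge.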



© 2004-2013 by Markus Krebs