http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/ByTask/Feeds/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/ByTask/Feeds/package.md b/src/Lucene.Net.Benchmark/ByTask/Feeds/package.md new file mode 100644 index 0000000..ea5f904 --- /dev/null +++ b/src/Lucene.Net.Benchmark/ByTask/Feeds/package.md @@ -0,0 +1,19 @@ + + + +Sources for benchmark inputs: documents and queries. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/ByTask/Programmatic/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/ByTask/Programmatic/package.md b/src/Lucene.Net.Benchmark/ByTask/Programmatic/package.md new file mode 100644 index 0000000..50799ee --- /dev/null +++ b/src/Lucene.Net.Benchmark/ByTask/Programmatic/package.md @@ -0,0 +1,19 @@ + + + +Sample performance test written programmatically - no algorithm file is needed here. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/ByTask/Stats/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/ByTask/Stats/package.md b/src/Lucene.Net.Benchmark/ByTask/Stats/package.md new file mode 100644 index 0000000..fc3147a --- /dev/null +++ b/src/Lucene.Net.Benchmark/ByTask/Stats/package.md @@ -0,0 +1,19 @@ + + + + Statistics maintained when running benchmark tasks. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/ByTask/Tasks/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/ByTask/Tasks/package.md b/src/Lucene.Net.Benchmark/ByTask/Tasks/package.md new file mode 100644 index 0000000..d1b7f42 --- /dev/null +++ b/src/Lucene.Net.Benchmark/ByTask/Tasks/package.md @@ -0,0 +1,19 @@ + + + +Extendable benchmark tasks. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/ByTask/Utils/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/ByTask/Utils/package.md b/src/Lucene.Net.Benchmark/ByTask/Utils/package.md new file mode 100644 index 0000000..ca581f3 --- /dev/null +++ b/src/Lucene.Net.Benchmark/ByTask/Utils/package.md @@ -0,0 +1,19 @@ + + + +Utilities used for the benchmark, and for the reports. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/ByTask/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/ByTask/package.md b/src/Lucene.Net.Benchmark/ByTask/package.md new file mode 100644 index 0000000..9efd463 --- /dev/null +++ b/src/Lucene.Net.Benchmark/ByTask/package.md @@ -0,0 +1,499 @@ + + + + + Benchmarking Lucene By Tasks + + +Benchmarking Lucene By Tasks. +
+ + This package provides "task based" performance benchmarking of Lucene. One can use the predefined benchmarks, or create new ones. + + Contained packages: + + + + + + + + + + + + + + + + + + + + + + + + + + +
**Package****Description**
[stats](stats/package-summary.html)Statistics maintained when running benchmark tasks.
[tasks](tasks/package-summary.html)Benchmark tasks.
[feeds](feeds/package-summary.html)Sources for benchmark inputs: documents and queries.
[utils](utils/package-summary.html)Utilities used for the benchmark, and for the reports.
[programmatic](programmatic/package-summary.html)Sample performance test written programmatically.
+ +## Table Of Contents + + 1. [Benchmarking By Tasks](#concept) 2. [How to use](#usage) 3. [Benchmark "algorithm"](#algorithm) 4. [Supported tasks/commands](#tasks) 5. [Benchmark properties](#properties) 6. [Example input algorithm and the result benchmark report.](#example) 7. [Results record counting clarified](#recsCounting) + +## Benchmarking By Tasks + + Benchmark Lucene using task primitives. + + A benchmark is composed of some predefined tasks, allowing for creating an index, adding documents, optimizing, searching, generating reports, and more. A benchmark run takes an "algorithm" file that contains a description of the sequence of tasks making up the run, and some properties defining a few additional characteristics of the benchmark run. + +## How to use + + Easiest way to run a benchmarks is using the predefined ant task: * ant run-task +- would run the `micro-standard.alg` "algorithm". * ant run-task -Dtask.alg=conf/compound-penalty.alg +- would run the `compound-penalty.alg` "algorithm". * ant run-task -Dtask.alg=[full-path-to-your-alg-file] +- would run `your perf test` "algorithm". * java org.apache.lucene.benchmark.byTask.programmatic.Sample +- would run a performance test programmatically - without using an alg file. This is less readable, and less convenient, but possible. + + You may find existing tasks sufficient for defining the benchmark *you* need, otherwise, you can extend the framework to meet your needs, as explained herein. + + Each benchmark run has a DocMaker and a QueryMaker. These two should usually match, so that "meaningful" queries are used for a certain collection. Properties set at the header of the alg file define which "makers" should be used. You can also specify your own makers, extending DocMaker and implementing QueryMaker. + +> **Note:** since 2.9, DocMaker is a concrete class which accepts a ContentSource. 
In most cases, you can use the DocMaker class to create Documents, while providing your own ContentSource implementation. For example, the current Benchmark package includes ContentSource implementations for TREC, Enwiki and Reuters collections, as well as others like LineDocSource which reads a 'line' file produced by WriteLineDocTask. + + Benchmark .alg file contains the benchmark "algorithm". The syntax is described below. Within the algorithm, you can specify groups of commands, assign them names, specify commands that should be repeated, do commands in serial or in parallel, and also control the speed of "firing" the commands. + + This allows, for instance, to specify that an index should be opened for update, documents should be added to it one by one but not faster than 20 docs a minute, and, in parallel with this, some N queries should be searched against that index, again, no more than 2 queries a second. You can have the searches all share an index reader, or have them each open its own reader and close it afterwords. + + If the commands available for use in the algorithm do not meet your needs, you can add commands by adding a new task under org.apache.lucene.benchmark.byTask.tasks - you should extend the PerfTask abstract class. Make sure that your new task class name is suffixed by Task. Assume you added the class "WonderfulTask" - doing so also enables the command "Wonderful" to be used in the algorithm. + + External classes: It is sometimes useful to invoke the benchmark package with your external alg file that configures the use of your own doc/query maker and or html parser. 
You can work this out without modifying the benchmark package code, by passing your class path with the benchmark.ext.classpath property: * ant run-task -Dtask.alg=[full-path-to-your-alg-file] -Dbenchmark.ext.classpath=/mydir/classes -Dtask.mem=512M External tasks: When writing your own tasks under a package other than **org.apache.lucene.benchmark.byTask.tasks** specify that package thru the alt.tasks.packages property. + +## Benchmark "algorithm" + + The following is an informal description of the supported syntax. + +1. **Measuring**: When a command is executed, statistics for the elapsed + execution time and memory consumption are collected. + At any time, those statistics can be printed, using one of the + available ReportTasks. + +2. **Comments** start with '#'. + +3. **Serial** sequences are enclosed within '{ }'. + +4. **Parallel** sequences are enclosed within + '[ ]' + +5. **Sequence naming:** To name a sequence, put + '"name"' just after + '{' or '['. + +Example - { "ManyAdds" AddDoc } : 1000000 - + would + name the sequence of 1M add docs "ManyAdds", and this name would later appear + in statistic reports. + If you don't specify a name for a sequence, it is given one: you can see it as + the algorithm is printed just before benchmark execution starts. + +6. **Repeating**: + To repeat sequence tasks N times, add ': N' just + after the + sequence closing tag - '}' or + ']' or '>'. + +Example - [ AddDoc ] : 4 - would do 4 addDoc + in parallel, spawning 4 threads at once. + +Example - [ AddDoc AddDoc ] : 4 - would do + 8 addDoc in parallel, spawning 8 threads at once. + +Example - { AddDoc } : 30 - would do addDoc + 30 times in a row. + +Example - { AddDoc AddDoc } : 30 - would do + addDoc 60 times in a row. + +**Exhaustive repeating**: use * instead of + a number to repeat exhaustively. 
+ This is sometimes useful, for adding as many files as a doc maker can create, + without iterating over the same file again, especially when the exact + number of documents is not known in advance. For instance, TREC files extracted + from a zip file. Note: when using this, you must also set + doc.maker.forever to false. + +Example - { AddDoc } : * - would add docs + until the doc maker is "exhausted". + +7. **Command parameter**: a command can optionally take a single parameter. + If the certain command does not support a parameter, or if the parameter is of + the wrong type, + reading the algorithm will fail with an exception and the test would not start. + Currently the following tasks take optional parameters: + + * **AddDoc** takes a numeric parameter, indicating the required size of + added document. Note: if the DocMaker implementation used in the test + does not support makeDoc(size), an exception would be thrown and the test + would fail. + + * **DeleteDoc** takes numeric parameter, indicating the docid to be + deleted. The latter is not very useful for loops, since the docid is + fixed, so for deletion in loops it is better to use the + `doc.delete.step` property. + + * **SetProp** takes a `name,value` mandatory param, + ',' used as a separator. + + * **SearchTravRetTask** and **SearchTravTask** take a numeric + parameter, indicating the required traversal size. + + * **SearchTravRetLoadFieldSelectorTask** takes a string + parameter: a comma separated list of Fields to load. + + * **SearchTravRetHighlighterTask** takes a string + parameter: a comma separated list of parameters to define highlighting. See that + tasks javadocs for more information + +Example - AddDoc(2000) - would add a document + of size 2000 (~bytes). + +See conf/task-sample.alg for how this can be used, for instance, to check + which is faster, adding + many smaller documents, or few larger documents. 
+ Next candidates for supporting a parameter may be the Search tasks, + for controlling the query size. + +8. **Statistic recording elimination**: - a sequence can also end with + '>', + in which case child tasks would not store their statistics. + This can be useful to avoid exploding stats data, for adding say 1M docs. + +Example - { "ManyAdds" AddDoc > : 1000000 - + would add million docs, measure that total, but not save stats for each addDoc. + +Notice that the granularity of System.currentTimeMillis() (which is used + here) is system dependant, + and in some systems an operation that takes 5 ms to complete may show 0 ms + latency time in performance measurements. + Therefore it is sometimes more accurate to look at the elapsed time of a larger + sequence, as demonstrated here. + +9. **Rate**: + To set a rate (ops/sec or ops/min) for a sequence, add + ': N : R' just after sequence closing tag. + This would specify repetition of N with rate of R operations/sec. + Use 'R/sec' or + 'R/min' + to explicitly specify that the rate is per second or per minute. + The default is per second, + +Example - [ AddDoc ] : 400 : 3 - would do 400 + addDoc in parallel, starting up to 3 threads per second. + +Example - { AddDoc } : 100 : 200/min - would + do 100 addDoc serially, + waiting before starting next add, if otherwise rate would exceed 200 adds/min. + +10. **Disable Counting**: Each task executed contributes to the records count. + This count is reflected in reports under recs/s and under recsPerRun. + Most tasks count 1, some count 0, and some count more. + (See [Results record counting clarified](#recsCounting) for more details.) + It is possible to disable counting for a task by preceding it with -. + +Example - -CreateIndex - would count 0 while + the default behavior for CreateIndex is to count 1. + +11. 
**Command names**: Each class "AnyNameTask" in the + package org.apache.lucene.benchmark.byTask.tasks, + that extends PerfTask, is supported as command "AnyName" that can be + used in the benchmark "algorithm" description. + This allows to add new commands by just adding such classes. + +## Supported tasks/commands + + Existing tasks can be divided into a few groups: regular index/search work tasks, report tasks, and control tasks. + +1. **Report tasks**: There are a few Report commands for generating reports. + Only task runs that were completed are reported. + (The 'Report tasks' themselves are not measured and not reported.) + + * RepAll - all (completed) task runs. + + * RepSumByName - all statistics, + aggregated by name. So, if AddDoc was executed 2000 times, + only 1 report line would be created for it, aggregating all those + 2000 statistic records. + + * RepSelectByPref   prefixWord - all + records for tasks whose name start with + prefixWord. + + * RepSumByPref   prefixWord - all + records for tasks whose name start with + prefixWord, + aggregated by their full task name. + + * RepSumByNameRound - all statistics, + aggregated by name and by Round. + So, if AddDoc was executed 2000 times in each of 3 + rounds, 3 report lines would be + created for it, + aggregating all those 2000 statistic records in each round. + See more about rounds in the NewRound + command description below. + + * RepSumByPrefRound   prefixWord - + similar to RepSumByNameRound, + just that only tasks whose name starts with + prefixWord are included. + + If needed, additional reports can be added by extending the abstract class + ReportTask, and by + manipulating the statistics data in Points and TaskStats. + +2. **Control tasks**: Few of the tasks control the benchmark algorithm + all over: + + * ClearStats - clears the entire statistics. + Further reports would only include task runs that would start after this + call. 
+ + * NewRound - virtually start a new round of + performance test. + Although this command can be placed anywhere, it mostly makes sense at + the end of an outermost sequence. + +This increments a global "round counter". All task runs that + would start now would + record the new, updated round counter as their round number. + This would appear in reports. + In particular, see RepSumByNameRound above. + +An additional effect of NewRound, is that numeric and boolean + properties defined (at the head + of the .alg file) as a sequence of values, e.g. + merge.factor=mrg:10:100:10:100 would + increment (cyclic) to the next value. + Note: this would also be reflected in the reports, in this case under a + column that would be named "mrg". + + * ResetInputs - DocMaker and the + various QueryMakers + would reset their counters to start. + The way these Maker interfaces work, each call for makeDocument() + or makeQuery() creates the next document or query + that it "knows" to create. + If that pool is "exhausted", the "maker" start over again. + The ResetInputs command + therefore allows to make the rounds comparable. + It is therefore useful to invoke ResetInputs together with NewRound. + + * ResetSystemErase - reset all index + and input data and call gc. + Does NOT reset statistics. This contains ResetInputs. + All writers/readers are nullified, deleted, closed. + Index is erased. + Directory is erased. + You would have to call CreateIndex once this was called... + + * ResetSystemSoft - reset all + index and input data and call gc. + Does NOT reset statistics. This contains ResetInputs. + All writers/readers are nullified, closed. + Index is NOT erased. + Directory is NOT erased. + This is useful for testing performance on an existing index, + for instance if the construction of a large index + took a very long time and now you would to test + its search or update performance. + +3. 
Other existing tasks are quite straightforward and would + just be briefly described here. + + * CreateIndex and + OpenIndex both leave the + index open for later update operations. + CloseIndex would close it. + + OpenReader, similarly, would + leave an index reader open for later search operations. + But this have further semantics. + If a Read operation is performed, and an open reader exists, + it would be used. + Otherwise, the read operation would open its own reader + and close it when the read operation is done. + This allows testing various scenarios - sharing a reader, + searching with "cold" reader, with "warmed" reader, etc. + The read operations affected by this are: + Warm, + Search, + SearchTrav (search and traverse), + and SearchTravRet (search + and traverse and retrieve). + Notice that each of the 3 search task types maintains + its own queryMaker instance. + + CommitIndex and + ForceMerge can be used to commit + changes to the index then merge the index segments. The integer + parameter specifies how many segments to merge down to (default + 1). + + WriteLineDoc prepares a 'line' + file where each line holds a document with *title*, + *date* and *body* elements, separated by [TAB]. + A line file is useful if one wants to measure pure indexing + performance, without the overhead of parsing the data. + + You can use LineDocSource as a ContentSource over a 'line' + file. + + ConsumeContentSource consumes + a ContentSource. Useful for e.g. testing a ContentSource + performance, without the overhead of preparing a Document + out of it. + +## Benchmark properties + + Properties are read from the header of the .alg file, and define several parameters of the performance test. As mentioned above for the NewRound task, numeric and boolean properties that are defined as a sequence of values, e.g. 
merge.factor=mrg:10:100:10:100 would increment (cyclic) to the next value, when NewRound is called, and would also appear as a named column in the reports (column name would be "mrg" in this example). + + Some of the currently defined properties are: + +1. analyzer - full + class name for the analyzer to use. + Same analyzer would be used in the entire test. + +2. directory - valid values are + This tells which directory to use for the performance test. + +3. **Index work parameters**: + Multi int/boolean values would be iterated with calls to NewRound. + There would be also added as columns in the reports, first string in the + sequence is the column name. + (Make sure it is no shorter than any value in the sequence). + + * max.buffered + +Example: max.buffered=buf:10:10:100:100 - + this would define using maxBufferedDocs of 10 in iterations 0 and 1, + and 100 in iterations 2 and 3. + + * merge.factor - which + merge factor to use. + + * compound - whether the index is + using the compound format or not. Valid values are "true" and "false". + + Here is a list of currently defined properties: + +1. **Root directory for data and indexes:** +2. * work.dir (default is System property "benchmark.work.dir" or "work".) + +3. **Docs and queries creation:** +4. * analyzer + + * doc.maker + + * doc.maker.forever + + * html.parser + + * doc.stored + + * doc.tokenized + + * doc.term.vector + + * doc.term.vector.positions + + * doc.term.vector.offsets + + * doc.store.body.bytes + + * docs.dir + + * query.maker + + * file.query.maker.file + + * file.query.maker.default.field + + * search.num.hits + +5. **Logging**: + + * log.step + + * log.step.[class name]Task ie log.step.DeleteDoc (e.g. log.step.Wonderful for the WonderfulTask example above). + + * log.queries + + * task.max.depth.log + +6. **Index writing**: + + * compound + + * merge.factor + + * max.buffered + + * directory + + * ram.flush.mb + +7. **Doc deletion**: + + * doc.delete.step + +8. 
**Spatial**: Numerous; see spatial.alg + +9. **Task alternative packages**: + + * alt.tasks.packages + - comma separated list of additional packages where tasks classes will be looked for + when not found in the default package (that of PerfTask). If the same task class + appears in more than one package, the package indicated first in this list will be used. + + For sample use of these properties see the *.alg files under conf. + +## Example input algorithm and the result benchmark report + + The following example is in conf/sample.alg: # -------------------------------------------------------- # # Sample: what is the effect of doc size on indexing time? # # There are two parts in this test: # - PopulateShort adds 2N documents of length L # - PopulateLong adds N documents of length 2L # Which one would be faster? # The comparison is done twice. # # -------------------------------------------------------- # ------------------------------------------------------------------------------------- # multi val params are iterated by NewRound's, added to reports, start with column name. 
merge.factor=mrg:10:20 max.buffered=buf:100:1000 compound=true analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer directory=FSDirectory doc.stored=true doc.tokenized=true doc.term.vector=false doc.add.log.step=500 docs.dir=reuters-out doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker query.maker=org.apache.lucene.bench mark.byTask.feeds.SimpleQueryMaker # task at this depth or less would print when they start task.max.depth.log=2 log.queries=false # ------------------------------------------------------------------------------------- { { "PopulateShort" CreateIndex { AddDoc(4000) > : 20000 Optimize CloseIndex > ResetSystemErase { "PopulateLong" CreateIndex { AddDoc(8000) > : 10000 Optimize CloseIndex > ResetSystemErase NewRound } : 2 RepSumByName RepSelectByPref Populate + + The command line for running this sample: +`ant run-task -Dtask.alg=conf/sample.alg` + + The output report from running this test contains the following: Operation round mrg buf runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem PopulateShort 0 10 100 1 20003 119.6 167.26 12,959,120 14,241,792 PopulateLong - - 0 10 100 - - 1 - - 10003 - - - 74.3 - - 134.57 - 17,085,208 - 20,635,648 PopulateShort 1 20 1000 1 20003 143.5 139.39 63,982,040 94,756,864 PopulateLong - - 1 20 1000 - - 1 - - 10003 - - - 77.0 - - 129.92 - 87,309,608 - 100,831,232 + +## Results record counting clarified + + Two columns in the results table indicate records counts: records-per-run and records-per-second. What does it mean? + + Almost every task gets 1 in this count just for being executed. Task sequences aggregate the counts of their child tasks, plus their own count of 1. So, a task sequence containing 5 other task sequences, each running a single other task 10 times, would have a count of 1 + 5 * (1 + 10) = 56. 
+ + The traverse and retrieve tasks "count" more: a traverse task would add 1 for each traversed result (hit), and a retrieve task would additionally add 1 for each retrieved doc. So, regular Search would count 1, SearchTrav that traverses 10 hits would count 11, and a SearchTravRet task that retrieves (and traverses) 10, would count 21. + + Confusing? this might help: always examine the `elapsedSec` column, and always compare "apples to apples", .i.e. it is interesting to check how the `rec/s` changed for the same task (or sequence) between two different runs, but it is not very useful to know how the `rec/s` differs between `Search` and `SearchTrav` tasks. For the latter, `elapsedSec` would bring more insight. + +
+
 
+ + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/Quality/Trec/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/Quality/Trec/package.md b/src/Lucene.Net.Benchmark/Quality/Trec/package.md new file mode 100644 index 0000000..d535da5 --- /dev/null +++ b/src/Lucene.Net.Benchmark/Quality/Trec/package.md @@ -0,0 +1,19 @@ + + + +Utilities for Trec related quality benchmarking, feeding from Trec Topics and QRels inputs. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/Quality/Utils/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/Quality/Utils/package.md b/src/Lucene.Net.Benchmark/Quality/Utils/package.md new file mode 100644 index 0000000..da1d900 --- /dev/null +++ b/src/Lucene.Net.Benchmark/Quality/Utils/package.md @@ -0,0 +1,19 @@ + + + +Miscellaneous utilities for search quality benchmarking: query parsing, submission reports. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/Quality/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/Quality/package.md b/src/Lucene.Net.Benchmark/Quality/package.md new file mode 100644 index 0000000..5ac7b84 --- /dev/null +++ b/src/Lucene.Net.Benchmark/Quality/package.md @@ -0,0 +1,73 @@ + + + +## Search Quality Benchmarking. + +This package allows to benchmark search quality of a Lucene application. + +In order to use this package you should provide: + +* A [IndexSearcher]({@docRoot}/../core/org/apache/lucene/search/IndexSearcher.html). +* [Quality queries](QualityQuery.html). +* [Judging object](Judge.html). +* [Reporting object](utils/SubmissionReport.html). 
+ +For benchmarking TREC collections with TREC QRels, take a look at the +[trec package](trec/package-summary.html). + +Here is a sample code used to run the TREC 2006 queries 701-850 on the .Gov2 collection: + + File topicsFile = new File("topics-701-850.txt"); + File qrelsFile = new File("qrels-701-850.txt"); + IndexReader ir = DirectoryReader.open(directory): + IndexSearcher searcher = new IndexSearcher(ir); + + int maxResults = 1000; + String docNameField = "docname"; + + PrintWriter logger = new PrintWriter(System.out,true); + + // use trec utilities to read trec topics into quality queries + TrecTopicsReader qReader = new TrecTopicsReader(); + QualityQuery qqs[] = qReader.readQueries(new BufferedReader(new FileReader(topicsFile))); + + // prepare judge, with trec utilities that read from a QRels file + Judge judge = new TrecJudge(new BufferedReader(new FileReader(qrelsFile))); + + // validate topics & judgments match each other + judge.validateData(qqs, logger); + + // set the parsing of quality queries into Lucene queries. + QualityQueryParser qqParser = new SimpleQQParser("title", "body"); + + // run the benchmark + QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, docNameField); + SubmissionReport submitLog = null; + QualityStats stats[] = qrun.execute(maxResults, judge, submitLog, logger); + + // print an avarage sum of the results + QualityStats avg = QualityStats.average(stats); + avg.log("SUMMARY",2,logger, " "); + +Some immediate ways to modify this program to your needs are: + +* To run on different formats of queries and judgements provide your own + [Judge](Judge.html) and + [Quality queries](QualityQuery.html). +* Create sophisticated Lucene queries by supplying a different + [Quality query parser](QualityQueryParser.html). 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/Utils/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/Utils/package.md b/src/Lucene.Net.Benchmark/Utils/package.md new file mode 100644 index 0000000..03bb3f4 --- /dev/null +++ b/src/Lucene.Net.Benchmark/Utils/package.md @@ -0,0 +1,19 @@ + + + +Benchmark Utility functions. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/overview.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/overview.md b/src/Lucene.Net.Benchmark/overview.md new file mode 100644 index 0000000..b786443 --- /dev/null +++ b/src/Lucene.Net.Benchmark/overview.md @@ -0,0 +1,22 @@ + + + + benchmark + + + benchmark \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Benchmark/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Benchmark/package.md b/src/Lucene.Net.Benchmark/package.md new file mode 100644 index 0000000..b96f567 --- /dev/null +++ b/src/Lucene.Net.Benchmark/package.md @@ -0,0 +1,46 @@ + + + + + Lucene Benchmarking Package + + +The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora. +
+ + ANT will + download the corpus automatically, place it in a temp directory and then unpack it to the working.dir directory specified in the build. + The temp directory + and working directory can be safely removed after a run. However, the next time the task is run, it will need to download the files again. + + Classes implementing the Benchmarker interface should have a no-argument constructor if they are to be used with the Driver class. The Driver + class is provided for convenience only. Feel free to implement your own main class for your benchmarker. + + The StandardBenchmarker is meant to be just that, a standard that runs out of the box with no configuration or changes needed. + Other benchmarking classes may derive from it to provide alternate views or to take in command line options. When reporting benchmarking runs + you should state any alterations you have made. + + To run the short version of the StandardBenchmarker, call "ant run-micro-standard". This should take a minute or so to complete and give you a preliminary idea of how your change affects the code + + To run the long version of the StandardBenchmarker, call "ant run-standard". This takes considerably longer. + + The original code for these classes was donated by Andrzej Bialecki at http://issues.apache.org/jira/browse/LUCENE-675 and has been updated by Grant Ingersoll to make some parts of the code reusable in other benchmarkers +
+
 
+ + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Classification/Utils/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Classification/Utils/package.md b/src/Lucene.Net.Classification/Utils/package.md new file mode 100644 index 0000000..39d41ec --- /dev/null +++ b/src/Lucene.Net.Classification/Utils/package.md @@ -0,0 +1,20 @@ + + + + +Utilities for evaluation, data preparation, etc. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Classification/overview.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Classification/overview.md b/src/Lucene.Net.Classification/overview.md new file mode 100644 index 0000000..fa0f140 --- /dev/null +++ b/src/Lucene.Net.Classification/overview.md @@ -0,0 +1,22 @@ + + + + classification + + +Provides a classification module which leverages Lucene index information. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Classification/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Classification/package.md b/src/Lucene.Net.Classification/package.md new file mode 100644 index 0000000..bb44fbe --- /dev/null +++ b/src/Lucene.Net.Classification/package.md @@ -0,0 +1,20 @@ + + +Uses already seen data (the indexed documents) to classify new documents. 
+Currently only contains a (simplistic) Lucene-based Naive Bayes classifier
+and a k-Nearest Neighbor classifier. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/Appending/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/Appending/package.md b/src/Lucene.Net.Codecs/Appending/package.md new file mode 100644 index 0000000..eaf6006 --- /dev/null +++ b/src/Lucene.Net.Codecs/Appending/package.md @@ -0,0 +1,19 @@ + + + +Codec for append-only outputs, such as plain output streams and append-only filesystems. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/BlockTerms/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/BlockTerms/package.md b/src/Lucene.Net.Codecs/BlockTerms/package.md new file mode 100644 index 0000000..7174622 --- /dev/null +++ b/src/Lucene.Net.Codecs/BlockTerms/package.md @@ -0,0 +1,19 @@ + + + +Pluggable term index / block terms dictionary implementations. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/Bloom/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/Bloom/package.md b/src/Lucene.Net.Codecs/Bloom/package.md new file mode 100644 index 0000000..c767523 --- /dev/null +++ b/src/Lucene.Net.Codecs/Bloom/package.md @@ -0,0 +1,19 @@ + + + +Codec PostingsFormat for fast access to low-frequency terms such as primary key fields. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/DiskDV/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/DiskDV/package.md b/src/Lucene.Net.Codecs/DiskDV/package.md new file mode 100644 index 0000000..c9264b6 --- /dev/null +++ b/src/Lucene.Net.Codecs/DiskDV/package.md @@ -0,0 +1,19 @@ + + + +DocValuesFormat that accesses values directly from disk. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/IntBlock/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/IntBlock/package.md b/src/Lucene.Net.Codecs/IntBlock/package.md new file mode 100644 index 0000000..2deb968 --- /dev/null +++ b/src/Lucene.Net.Codecs/IntBlock/package.md @@ -0,0 +1,19 @@ + + + +IntBlock: base support for fixed- or variable-length block integer encoders. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/Memory/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/Memory/package.md b/src/Lucene.Net.Codecs/Memory/package.md new file mode 100644 index 0000000..6410444 --- /dev/null +++ b/src/Lucene.Net.Codecs/Memory/package.md @@ -0,0 +1,19 @@ + + + +Term dictionary, DocValues or Postings formats that are read entirely into memory. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/Pulsing/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/Pulsing/package.md b/src/Lucene.Net.Codecs/Pulsing/package.md new file mode 100644 index 0000000..43f9564 --- /dev/null +++ b/src/Lucene.Net.Codecs/Pulsing/package.md @@ -0,0 +1,19 @@ + + + +Pulsing Codec: inlines low-frequency terms' postings into the terms dictionary. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/Sep/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/Sep/package.md b/src/Lucene.Net.Codecs/Sep/package.md new file mode 100644 index 0000000..704450b --- /dev/null +++ b/src/Lucene.Net.Codecs/Sep/package.md @@ -0,0 +1,19 @@ + + + +Sep: base support for separate files (doc, frq, pos, skp, pyl). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/SimpleText/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/SimpleText/package.md b/src/Lucene.Net.Codecs/SimpleText/package.md new file mode 100644 index 0000000..9e45eb2 --- /dev/null +++ b/src/Lucene.Net.Codecs/SimpleText/package.md @@ -0,0 +1,19 @@ + + + +SimpleText Codec: writes human-readable postings. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Codecs/overview.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Codecs/overview.md b/src/Lucene.Net.Codecs/overview.md new file mode 100644 index 0000000..0070a37 --- /dev/null +++ b/src/Lucene.Net.Codecs/overview.md @@ -0,0 +1,18 @@ + + +Collection of useful codec, postings format and terms dictionary implementations. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Demo/Facet/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Demo/Facet/package.md b/src/Lucene.Net.Demo/Facet/package.md new file mode 100644 index 0000000..d6ff7d2 --- /dev/null +++ b/src/Lucene.Net.Demo/Facet/package.md @@ -0,0 +1,19 @@ + + + +Facets example code. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Demo/overview.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Demo/overview.md b/src/Lucene.Net.Demo/overview.md new file mode 100644 index 0000000..ad0bdd0 --- /dev/null +++ b/src/Lucene.Net.Demo/overview.md @@ -0,0 +1,132 @@ + + +The demo module offers simple example code to show the features of Lucene. + +# Apache Lucene - Building and Installing the Basic Demo + +
+ +* [About this Document](#About_this_Document) +* [About the Demo](#About_the_Demo) +* [Setting your CLASSPATH](#Setting_your_CLASSPATH) +* [Indexing Files](#Indexing_Files) +* [About the code](#About_the_code) +* [Location of the source](#Location_of_the_source) +* [IndexFiles](#IndexFiles) +* [Searching Files](#Searching_Files) +
+ +## About this Document + +
+ +This document is intended as a "getting started" guide to using and running the Lucene demos. It walks you through some basic installation and configuration. + +
+ +## About the Demo + +
+ +The Lucene command-line demo code consists of an application that demonstrates various functionalities of Lucene and how you can add Lucene to your applications. + +
+ +## Setting your CLASSPATH + +
+ +First, you should [download](http://www.apache.org/dyn/closer.cgi/lucene/java/) the latest Lucene distribution and then extract it to a working directory. + +You need four JARs: the Lucene JAR, the queryparser JAR, the common analysis JAR, and the Lucene demo JAR. You should see the Lucene JAR file in the core/ directory you created when you extracted the archive -- it should be named something like lucene-core-{version}.jar. You should also see files called lucene-queryparser-{version}.jar, lucene-analyzers-common-{version}.jar and lucene-demo-{version}.jar under queryparser, analysis/common/ and demo/, respectively. + +Put all four of these files in your Java CLASSPATH. + +
+ +## Indexing Files + +
+ +Once you've gotten this far you're probably itching to go. Let's **build an index!** Assuming you've set your CLASSPATH correctly, just type:

    java org.apache.lucene.demo.IndexFiles -docs {path-to-lucene}/src

This will produce a subdirectory called index
which will contain an index of all of the Lucene source code.

To **search the index** type:

    java org.apache.lucene.demo.SearchFiles

You'll be prompted for a query. Type in a gibberish or made-up word (for example:
"supercalifragilisticexpialidocious").
You'll see that there are no matching results in the Lucene source code.
Now try entering the word "string". That should return a whole bunch
of documents. The results will page at every tenth result and ask you whether
you want more results.
+ +## About the code + +
+ +In this section we walk through the sources behind the command-line Lucene demo: where to find them, their parts and their function. This section is intended for Java developers wishing to understand how to use Lucene in their applications. + +
+ +## Location of the source + +
+ +The files discussed here are linked into this documentation directly:

* [IndexFiles.cs](https://github.com/apache/lucenenet/blob/{tag}/src/Lucene.Net.Demo/IndexFiles.cs): code to create a Lucene index.
* [SearchFiles.cs](https://github.com/apache/lucenenet/blob/{tag}/src/Lucene.Net.Demo/SearchFiles.cs): code to search a Lucene index.
+ +## IndexFiles + +
+ +As we discussed in the previous walk-through, the [IndexFiles](https://github.com/apache/lucenenet/blob/{tag}/src/Lucene.Net.Demo/IndexFiles.cs) class creates a Lucene Index. Let's take a look at how it does this. + +The main() method parses the command-line parameters, then in preparation for instantiating [](xref:Lucene.Net.Index.IndexWriter IndexWriter), opens a [](xref:Lucene.Net.Store.Directory Directory), and instantiates [](xref:Lucene.Net.Analysis.Standard.StandardAnalyzer StandardAnalyzer) and [](xref:Lucene.Net.Index.IndexWriterConfig IndexWriterConfig). + +The value of the -index command-line parameter is the name of the filesystem directory where all index information should be stored. If IndexFiles is invoked with a relative path given in the -index command-line parameter, or if the -index command-line parameter is not given, causing the default relative index path "index" to be used, the index path will be created as a subdirectory of the current working directory (if it does not already exist). On some platforms, the index path may be created in a different directory (such as the user's home directory). + +The -docs command-line parameter value is the location of the directory containing files to be indexed. + +The -update command-line parameter tells IndexFiles not to delete the index if it already exists. When -update is not given, IndexFiles will first wipe the slate clean before indexing any documents. + +Lucene [](xref:Lucene.Net.Store.Directory Directory)s are used by the IndexWriter to store information in the index. In addition to the [](xref:Lucene.Net.Store.FSDirectory FSDirectory) implementation we are using, there are several other Directory subclasses that can write to RAM, to databases, etc. + +Lucene [](xref:Lucene.Net.Analysis.Analyzer Analyzer)s are processing pipelines that break up text into indexed tokens, a.k.a. terms, and optionally perform other operations on these tokens, e.g. 
downcasing, synonym insertion, filtering out unwanted tokens, etc. The Analyzer we are using is StandardAnalyzer, which creates tokens using the Word Break rules from the Unicode Text Segmentation algorithm specified in [Unicode Standard Annex #29](http://unicode.org/reports/tr29/); converts tokens to lowercase; and then filters out stopwords. Stopwords are common language words such as articles (a, an, the, etc.) and other tokens that may have less value for searching. It should be noted that there are different rules for every language, and you should use the proper analyzer for each. Lucene currently provides Analyzers for a number of different languages (see the javadocs under [lucene/analysis/common/src/java/org/apache/lucene/analysis](../analyzers-common/overview-summary.html)).

The IndexWriterConfig instance holds all configuration for IndexWriter. For example, we set the OpenMode to use here based on the value of the -update command-line parameter.

Looking further down in the file, after IndexWriter is instantiated, you should see the indexDocs() code. This recursive function crawls the directories and creates [](xref:Lucene.Net.Documents.Document Document) objects. The Document is simply a data object to represent the text content from the file as well as its creation time and location. These instances are added to the IndexWriter. If the -update command-line parameter is given, the IndexWriterConfig OpenMode will be set to [](xref:Lucene.Net.Index.IndexWriterConfig.OpenMode.CREATE_OR_APPEND OpenMode.CREATE_OR_APPEND), and rather than adding documents to the index, the IndexWriter will **update** them in the index by attempting to find an already-indexed document with the same identifier (in our case, the file path serves as the identifier); deleting it from the index if it exists; and then adding the new document to the index.
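The delete-then-add update behavior described above can be sketched as a toy model. This is illustrative Python, not the Lucene API; `ToyIndex` and its methods are invented names that only model the semantics of updating by identifier (here, the file path):

```python
# Toy model of IndexWriter's update-by-identifier behavior: an "update" is a
# delete of any previously indexed document with the same identifier, followed
# by an add of the new document. The file path serves as the identifier.

class ToyIndex:
    def __init__(self):
        self.docs = {}  # path -> document contents

    def add_document(self, path, contents):
        # Plain add, as used when the index is built from scratch (CREATE mode).
        self.docs[path] = contents

    def update_document(self, path, contents):
        # CREATE_OR_APPEND mode: remove the stale version first, then add.
        self.docs.pop(path, None)
        self.add_document(path, contents)

index = ToyIndex()
index.update_document("src/a.txt", "old contents")
index.update_document("src/a.txt", "new contents")

assert index.docs["src/a.txt"] == "new contents"  # stale version replaced
assert len(index.docs) == 1                       # no duplicate entries
```

The real IndexWriter achieves the same effect at the segment level; the point here is only that an update never leaves two live documents with the same identifier.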
+ +## Searching Files + +
+ +The [SearchFiles](https://github.com/apache/lucenenet/blob/{tag}/src/Lucene.Net.Demo/SearchFiles.cs) class is quite simple. It primarily collaborates with an [](xref:Lucene.Net.Search.IndexSearcher IndexSearcher), a [](xref:Lucene.Net.Analysis.Standard.StandardAnalyzer StandardAnalyzer) (which is also used in the [IndexFiles](https://github.com/apache/lucenenet/blob/{tag}/src/Lucene.Net.Demo/IndexFiles.cs) class) and a [](xref:Lucene.Net.QueryParsers.Classic.QueryParser QueryParser). The query parser is constructed with an analyzer used to interpret your query text in the same way the documents are interpreted: finding word boundaries, downcasing, and removing useless words like 'a', 'an' and 'the'. The [](xref:Lucene.Net.Search.Query) object produced by the [](xref:Lucene.Net.QueryParsers.Classic.QueryParser QueryParser) is passed to the searcher. Note that it's also possible to programmatically construct a rich [](xref:Lucene.Net.Search.Query) object without using the query parser. The query parser just enables decoding the [Lucene query syntax](../queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description) into the corresponding [](xref:Lucene.Net.Search.Query Query) object.

SearchFiles uses the [](xref:Lucene.Net.Search.IndexSearcher.Search(Lucene.Net.Search.Query,int) IndexSearcher.Search(query,n)) method that returns [](xref:Lucene.Net.Search.TopDocs TopDocs) with at most n hits. The results are printed in pages, sorted by score (i.e. relevance).
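The "top-n hits, paged by ten" flow above can be sketched in a few lines. This is a hedged illustration, not Lucene code: the scores are made up, and `top_docs`/`pages` are invented helper names modeling what IndexSearcher.Search and the SearchFiles paging loop do:

```python
# Sketch of the search result flow: rank hits by score (relevance), keep the
# top n, and split them into pages of ten as SearchFiles does.

def top_docs(scores, n):
    """Return up to n (doc_id, score) hits, best score first."""
    hits = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return hits[:n]

def pages(hits, page_size=10):
    """Split the hit list into pages; the demo prompts after every tenth result."""
    return [hits[i:i + page_size] for i in range(0, len(hits), page_size)]

# Fabricated scores: doc0 is the most relevant, doc24 the least.
scores = {f"doc{i}": 1.0 / (i + 1) for i in range(25)}
hits = top_docs(scores, n=25)

assert hits[0] == ("doc0", 1.0)                    # best hit first
assert [len(p) for p in pages(hits)] == [10, 10, 5]  # paged every tenth result
```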
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Demo/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Demo/package.md b/src/Lucene.Net.Demo/package.md new file mode 100644 index 0000000..e1ab5b1 --- /dev/null +++ b/src/Lucene.Net.Demo/package.md @@ -0,0 +1,19 @@ + + + +Demo applications for indexing and searching. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Expressions/JS/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Expressions/JS/package.md b/src/Lucene.Net.Expressions/JS/package.md new file mode 100644 index 0000000..3bf25be --- /dev/null +++ b/src/Lucene.Net.Expressions/JS/package.md @@ -0,0 +1,35 @@ + + +# JavaScript expressions

+A JavaScript expression is a numeric expression specified using an expression syntax that's based on JavaScript expressions. You can construct expressions using:

+* Integer, floating point, hex and octal literals
+* Arithmetic operators: `+ - * / %`
+* Bitwise operators: `| & ^ ~ << >> >>>`
+* Boolean operators (including the ternary operator): `&& || ! ?:`
+* Comparison operators: `< <= == >= >`
+* Common mathematical functions: `abs ceil exp floor ln log2 log10 logn max min sqrt pow`
+* Trigonometric library functions: `acosh acos asinh asin atanh atan atan2 cosh cos sinh sin tanh tan`
+* Distance functions: `haversin`
+* Miscellaneous functions: `min, max`
+* Arbitrary external variables - see [](xref:Lucene.Net.Expressions.Bindings)

 JavaScript order of precedence rules apply for operators. Short-circuit evaluation is used for logical operators: the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression `a || b`, `b` is only evaluated if `a` is not true. 
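Short-circuit evaluation can be demonstrated in Python, whose `or`/`and` behave like the JavaScript `||`/`&&` described above (the `operand` helper is invented for illustration; it simply records which operands actually get evaluated):

```python
# Demonstration of short-circuit evaluation: the second operand is evaluated
# only when the first one does not already decide the result.

evaluated = []

def operand(name, value):
    evaluated.append(name)  # record that this operand was actually evaluated
    return value

# a || b with a true: b is never evaluated.
result = operand("a", True) or operand("b", True)
assert result is True
assert evaluated == ["a"]

# a || b with a false: b must be evaluated to determine the result.
evaluated.clear()
result = operand("a", False) or operand("b", True)
assert result is True
assert evaluated == ["a", "b"]
```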
+ + To compile an expression, use [](xref:Lucene.Net.Expressions.Js.JavascriptCompiler). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Expressions/overview.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Expressions/overview.md b/src/Lucene.Net.Expressions/overview.md new file mode 100644 index 0000000..d3f1c5a --- /dev/null +++ b/src/Lucene.Net.Expressions/overview.md @@ -0,0 +1,24 @@ + + +# The Expressions Module for Apache Lucene + + The expressions module is new to Lucene 4.6. It provides an API for dynamically computing per-document values based on string expressions. + + The module is organized in two sections: 1. [](xref:Lucene.Net.Expressions) - The abstractions and simple utilities for common operations like sorting on an expression 2. [](xref:Lucene.Net.Expressions.Js) - A compiler for a subset of JavaScript expressions + + For sample code showing how to use the API, see [](xref:Lucene.Net.Expressions.Expression). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Expressions/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Expressions/package.md b/src/Lucene.Net.Expressions/package.md new file mode 100644 index 0000000..07ef42d --- /dev/null +++ b/src/Lucene.Net.Expressions/package.md @@ -0,0 +1,24 @@ + + +# expressions + + [](xref:Lucene.Net.Expressions.Expression) - result of compiling an expression, which can evaluate it for a given document. Each expression can have external variables that are resolved by `Bindings`. + + [](xref:Lucene.Net.Expressions.Bindings) - abstraction for binding external variables to a source of per-document values for those variables (ValueSource). 
+ + [](xref:Lucene.Net.Expressions.SimpleBindings) - default implementation of bindings, which provides easy ways to bind sort fields and other expressions to external variables. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Facet/Range/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Facet/Range/package.md b/src/Lucene.Net.Facet/Range/package.md new file mode 100644 index 0000000..c441011 --- /dev/null +++ b/src/Lucene.Net.Facet/Range/package.md @@ -0,0 +1,18 @@ + + +Provides range faceting capabilities. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Facet/SortedSet/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Facet/SortedSet/package.md b/src/Lucene.Net.Facet/SortedSet/package.md new file mode 100644 index 0000000..ae01d92 --- /dev/null +++ b/src/Lucene.Net.Facet/SortedSet/package.md @@ -0,0 +1,18 @@ + + +Provides faceting capabilities over facets that were indexed with [](xref:Lucene.Net.Facet.Sortedset.SortedSetDocValuesFacetField). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Facet/Taxonomy/Directory/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Facet/Taxonomy/Directory/package.md b/src/Lucene.Net.Facet/Taxonomy/Directory/package.md new file mode 100644 index 0000000..80f5921 --- /dev/null +++ b/src/Lucene.Net.Facet/Taxonomy/Directory/package.md @@ -0,0 +1,18 @@ + + +Taxonomy index implementation built on top of a Directory. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Facet/Taxonomy/WriterCache/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Facet/Taxonomy/WriterCache/package.md b/src/Lucene.Net.Facet/Taxonomy/WriterCache/package.md new file mode 100644 index 0000000..dcf4583 --- /dev/null +++ b/src/Lucene.Net.Facet/Taxonomy/WriterCache/package.md @@ -0,0 +1,18 @@ + + +Improves indexing time by caching a map from each CategoryPath to its Ordinal. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Facet/Taxonomy/package.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Facet/Taxonomy/package.md b/src/Lucene.Net.Facet/Taxonomy/package.md new file mode 100644 index 0000000..779af44 --- /dev/null +++ b/src/Lucene.Net.Facet/Taxonomy/package.md @@ -0,0 +1,40 @@ + + +# Taxonomy of Categories

 Facets are defined using a hierarchy of categories, known as a *Taxonomy*.
 For example, the taxonomy of a book store application might have the following structure:

* Author

  * Mark Twain
  * J. K. Rowling

* Date

  * 2010

    * March
    * April

  * 2009

 The *Taxonomy* translates category paths into integer identifiers (often termed *ordinals*) and vice versa.
 The category `Author/Mark Twain` adds two nodes to the taxonomy: `Author` and
 `Author/Mark Twain`, each of which is assigned a different ordinal. The taxonomy maintains the invariant that a
 node's ordinal is always smaller than the ordinals of all of its children.
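The ordinal invariant above can be sketched with a toy taxonomy. This is not the Lucene taxonomy API; `ToyTaxonomy` is an invented class that only shows why creating missing ancestors before their children guarantees every parent's ordinal is smaller than its children's:

```python
# Toy taxonomy sketch: adding a category path first creates any missing
# ancestor nodes, so ordinals are handed out parent-before-child and the
# invariant parent_ordinal < child_ordinal always holds.

class ToyTaxonomy:
    def __init__(self):
        self.ordinals = {}  # path tuple -> ordinal

    def add(self, *path):
        """Add a path like ("Author", "Mark Twain") and return its ordinal."""
        for i in range(1, len(path) + 1):
            node = path[:i]
            if node not in self.ordinals:
                self.ordinals[node] = len(self.ordinals)  # next free ordinal
        return self.ordinals[path]

tax = ToyTaxonomy()
tax.add("Author", "Mark Twain")
tax.add("Author", "J. K. Rowling")

# The parent "Author" was assigned its ordinal before either child.
assert tax.ordinals[("Author",)] < tax.ordinals[("Author", "Mark Twain")]
assert tax.ordinals[("Author",)] < tax.ordinals[("Author", "J. K. Rowling")]
```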
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/lucenenet/blob/6a95ad43/src/Lucene.Net.Facet/overview.md ---------------------------------------------------------------------- diff --git a/src/Lucene.Net.Facet/overview.md b/src/Lucene.Net.Facet/overview.md new file mode 100644 index 0000000..5e5b8c1 --- /dev/null +++ b/src/Lucene.Net.Facet/overview.md @@ -0,0 +1,20 @@ + + +Provides faceted indexing and search capabilities. Check out [this](http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html)
+and [this](http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html) blog post for an overview of the facets module,
+as well as source code examples [here](../demo). \ No newline at end of file