Within the PostgreSQL, the database is divided by Tables. For the CMIP5 dataset, datasets have differing variables (and, therefore, differing Name schemes) from the Models the dataset is based upon. As such, each Table is determined by the Model. From there, we have the Columns which are each component of the Name. And, finally, the Rows are the individual Name entries for each dataset.
While a tempting optimization would be to combine Models with the same Name schemes, it is unknown if the Name schemes will always be the same. As such, it is recommended they stay separate.
The Tables include a first Column of the full URI NDN Name. All Columns use URI (String) types.
Table A (for CIMP5)
Name|sha256|Activity|Product|Organization|Model|Experiment|Frequency|modeling realm|variable name|ensemble member
Table B (for CESM)
Name|sha256|Activity|Product|Organization|Model|Ensemble|Experiment|Sample Granularity|Start Time
Table C (For GigaLES)
Name|sha256|Activity|Product|Organization|Model|field campaign|optical properties for radiation|grid resolution|output type|timestamp|
On a search, the standard queries are defined as "[field]=[search]" with [field] being a Column and [search] being any exact matches in those Columns. These are comma-seperated lists, so we can continue to use JOINs to narrow the results down. If any of the fields are Model, then we can identify the Table and eliminate a large portion of the search; Model was chosen as it was observed to be the most commonly and first used for narrowing down results by users.
In the autocomplete query case, the query follows a "?=[search]" pattern. The question mark indicates we do not know what to look for, and the search is what has already been entered by the user. The autocomplete can be used with the standard query to do an initial narrowing down of results, prior to doing a regular expression search for what can come next.