
Help Data Mining Jargon

Things you will find in a help viewer - MVP Rob Chandler


Search

Search is a pivotal part of any help viewer. Here is the HTML Help Search tab.


  1. Text Entry
    Type the word or phrase you want to find.

  2. Operators: AND, OR, NEAR, NOT
    Allow you to refine your search.

  3. List Topics
    Perform a search and list the results (4).

  4. Search Results
    Lists all topics that contain the search terms.

  5. Search options
    1. Search only in the current search results.
    2. Match similar words (e.g. add, adds, added)
    3. Search the topic titles only

  6. Display
    Select the topic you want from the search results (4), then click the "Display" button to view the topic (right). Or double-click the topic.

    Search terms found are usually highlighted in the contents pane (right).



                                                                                                          
Although Microsoft didn't document HH search, they did document the old HH MSDN Search (which shared much the same code and GUI).


Help jargon and terminology

Compounding

Some languages allow compounding, which is the formation of new words by combining existing words. Google's algorithms allow for compounding. For example, Google returns results for a Swedish credit card for both the compounded [Visakort] and non-compounded [visa kort] queries.

Diacritical Marks

Some languages have diacritical marks, which alter pronunciation and sometimes change the meaning of words. Google's algorithms allow users to mistype or completely ignore diacritical marks. For example, in Thai, [ข้าว] is "rice," with completely different results than [ข่าว], which is "news"; or in Slovak, results for "child" [dieťa] are different from results for "diet".
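
One common way for a search tool to ignore diacritical marks is to fold them away before comparing terms. The Python sketch below is only an illustration of that idea. It is not how Google does it, and the function name fold_diacritics is made up for this note. It decomposes each character and drops the combining marks.

```python
import unicodedata

def fold_diacritics(text):
    """Return text with combining marks (accents, carons, etc.) removed."""
    # NFD splits a character such as "ť" into "t" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so those are dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The Slovak word for "child" with its caron folded away.
print(fold_diacritics("dieťa"))
```

Folding helps users who cannot type the mark, but it can also merge words that have different meanings, so a search tool would probably still want to rank exact matches first.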

Disambiguate

Another word used by data mining tools (i.e. a disambiguator removes ambiguities).
A fancy word for a simple thing. If there is more than one result, help shows a window (the disambiguator) so you can select which result to view.

Here the Index item "Tutorials" has 3 associated topics so a window opens to let you select a topic to view.


Spell corrections

You will notice that Google recognizes when you might have spelt a word incorrectly and suggests an alternative spelling.

Stemming

The "Match similar words" option. Stemming expands the search results by including other common forms of the words searched. For example, "Stored" could match "Store" and "Stores".
  • Wikipedia reference: Stemming
  • Derivation is when you create a new word by adding an affix to a stem.
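
To make the idea concrete, here is a deliberately naive suffix-stripping stemmer in Python. It is a sketch only: it is not the algorithm HTML Help uses (that is not documented here), the names naive_stem and match_similar are invented for this note, and real stemmers handle far more cases.

```python
# Suffixes are tried in this order; the first one that fits is removed.
SUFFIXES = ("ing", "ed", "es", "s", "e")

def naive_stem(word):
    """Lowercase the word and strip one common English suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def match_similar(query, words):
    """Return the words whose stem equals the stem of the query."""
    target = naive_stem(query)
    return [w for w in words if naive_stem(w) == target]

print(match_similar("Stored", ["Store", "Stores", "Story", "Restore"]))
```

With this sketch "Stored", "Store" and "Stores" all reduce to the same stem, while "Story" and "Restore" do not.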

Synonyms

Match words that have the same meaning. For example, a search for "buy" also matches "purchase". Although synonyms are not used in Microsoft HTML Help, you can of course create Indexes where words and phrases of similar meaning all point to the same help topic (see Index tab).

Taxonomy

Wikipedia says that taxonomy is the practice and science of classification. During one visit to MS (Redmond campus Seattle) I noticed the Help PMs (Program Managers) were using the word Taxonomy when referring to the table of contents data. 

I guess it makes sense for AP Help. In older help platforms there was a Table of Contents file that was displayed in its entirety in a Contents navigation tab. In AP Help (Vista Help), if you select Contents, you only see a list of topics. The list is not expandable (like a TOC); however, the user can click list items and drill deeper into the help. So it's like a TOC in structure, but you only see one level and section at a time. Open Vista Help and see for yourself.

Word Wheel

Used by the HTML Help Index page (also by WinHelp and other applications). As you type a word or phrase, the list of index items or search results scrolls to show the matching text. It allows you to quickly find a word or phrase in the list. Once the list has scrolled to the required area, you can scroll down the list to view neighboring items.

Here in my HTML Help Index I've typed "In" and the closest match "Index Editor Notes" is selected and scrolled into view. I can also click items in the list to select them and double-click (or hit Display) to load the topic associated with the index item.
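
The word wheel behaviour can be approximated with a binary search over a sorted list. The Python sketch below is my own illustration, not the HTML Help implementation. The function name word_wheel_position is invented, the list is assumed to be non-empty and already sorted without regard to case, and the "Glossary" and "Installation" entries are made up to pad out the example.

```python
from bisect import bisect_left

def word_wheel_position(sorted_items, typed):
    """Return the position of the first item that sorts at or after the typed text."""
    keys = [item.lower() for item in sorted_items]
    pos = bisect_left(keys, typed.lower())
    # If the typed text sorts after every item, stay on the last item.
    return min(pos, len(sorted_items) - 1)

index_items = ["Glossary", "Index Editor Notes", "Installation", "Tutorials"]
selected = word_wheel_position(index_items, "In")
print(index_items[selected])
```

Because the match is a position in the list rather than a filtered subset, the neighboring items stay visible, which is what lets you keep scrolling from the match.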


Key words / Tags / Indexes

Keywords are often referred to as tags. Adding keywords to (tagging) a document is becoming a popular method of finding stuff. For non-text items such as images, tags are essential. Vista allows you to tag your local files, e.g. tag family photos with the tag "family". Blog posts and articles on the web are also tagged. Click a tag and you find all documents associated with that tag. "Tag clouds" are also getting popular on many web sites. These list the most popular tags. The most frequently used tags are often larger in text size or have a stronger text color.

Index

Key words are usually stored (at least in help files) as a list of Name, Value string pairs. The Name stores the tag, while the Value stores the URL associated with the tag.

We call this list of Keyword, URL pairs an "Index".
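
As a rough model of such an index, the Python sketch below stores Keyword, URL pairs and looks a keyword up. Everything in it is hypothetical: the function names, the file names, and the choice to ignore case are assumptions made for illustration, not a description of the .HHK format.

```python
from collections import defaultdict

def build_index(pairs):
    """Group (keyword, url) pairs so one keyword can point at several topics."""
    index = defaultdict(list)
    for keyword, url in pairs:
        index[keyword.lower()].append(url)
    return index

def lookup(index, keyword):
    """Return every URL for the keyword, or an empty list if there is none."""
    return index.get(keyword.lower(), [])

# Hypothetical topics; the file names are invented for this example.
pairs = [
    ("Tutorials", "tutorials/intro.htm"),
    ("Tutorials", "tutorials/indexes.htm"),
    ("Tutorials", "tutorials/search.htm"),
    ("buy", "ordering.htm"),
    ("purchase", "ordering.htm"),
]
index = build_index(pairs)

topics = lookup(index, "Tutorials")
if len(topics) > 1:
    # More than one topic: this is where a viewer would show the disambiguator.
    print("Choose a topic:", topics)
```

Note how "buy" and "purchase" point to the same topic, which is the index-based way of handling synonyms.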

ALink & KLink

HTML Help supports 2 kinds of index.
KLinks - The visible keywords you see listed in the Index navigation tab.
ALinks - Associated links form a non-visible index. Optional and not used much outside Microsoft.

ALinks are cool because you can open a file (using the Help API) using just its ALink keyword. With MSDN, your collection name, filename, etc. can change, but you can still display a page if the ALink is kept the same.

Microsoft defines key words as meta tags in the file header. Most authors (outside MS) create an Index (.HHK) file to define their visible KLink keywords.

HH based MSDN key words were defined in HTML headers like this...
     <META NAME="MS-HKWD" CONTENT="Usage Peak counter">

For multi-level use a comma...
    <META NAME="MS-HKWD" CONTENT="Page Faults, Job Object Details">

ALinks use the same syntax but have the name "MS-HAID" instead of "MS-HKWD".
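
If you want to pull these keywords out of a topic yourself, Python's standard html.parser module is enough. This is a minimal sketch, not a Microsoft tool: the class name KeywordMetaParser is invented, the ALink value in the sample is a placeholder, and splitting on commas simply follows the multi-level convention described above.

```python
from html.parser import HTMLParser

class KeywordMetaParser(HTMLParser):
    """Collect MS-HKWD (KLink) and MS-HAID (ALink) meta keywords."""

    def __init__(self):
        super().__init__()
        self.klinks = []
        self.alinks = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").upper()
        content = attributes.get("content") or ""
        # A comma separates the levels of a multi-level keyword.
        levels = [part.strip() for part in content.split(",")]
        if name == "MS-HKWD":
            self.klinks.append(levels)
        elif name == "MS-HAID":
            self.alinks.append(levels)

sample = """<HEAD>
<META NAME="MS-HKWD" CONTENT="Usage Peak counter">
<META NAME="MS-HKWD" CONTENT="Page Faults, Job Object Details">
<META NAME="MS-HAID" CONTENT="example_alink_keyword">
</HEAD>"""

parser = KeywordMetaParser()
parser.feed(sample)
print(parser.klinks)
print(parser.alinks)
```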

MS Help 2.0 Indexes

MS Help 2.0 (used by VS/MSDN 2002, 2003, 2005, 2008) allows unlimited indexes. MSDN uses these 4:
  • "A" Index - Same as ALinks above. Invisible Associated keyword index.
  • "K" Index - Same as KLinks above. Visible keywords index.
  • "F" Index - MSDN uses this for F1 context help lookup.
  • "NamedUrlIndex" Index - Special index with a fixed set of keywords that define the home page URL etc. Allowed keywords include: HomePage, DefaultPage, NavFailPage, FilterEditPage, SampleDirPage, SupportPage, SearchHelpPage, HelpPage, AboutPageInfo, AboutPageIcon, ReleaseRegKey, RegisteredUser, RegisteredOrganization, ProductID.
Example: Doing a "view source" on any random file in MSDN library gave me the following Index tags defined...

      <MSHelp:Keyword Index="K" Term="AnyEventHandler delegate" />
      <MSHelp:Keyword Index="K" Term="Microsoft.Build.Framework.AnyEventHandler delegate" />
      <MSHelp:Keyword Index="F" Term="Microsoft.Build.Framework.AnyEventHandler" />
      <MSHelp:Keyword Index="F" Term="AnyEventHandler" />
      <MSHelp:Keyword Index="A" Term="frlrfMicrosoftBuildFrameworkAnyEventHandlerClassTopic" />
      <MSHelp:Keyword Index="A" Term="T:Microsoft.Build.Framework.AnyEventHandler" />
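
To see which terms land in which index, the tags above can be grouped by their Index attribute. The Python sketch below uses a simple regular expression and assumes the attributes always appear in the order shown (Index, then Term); the function name group_keywords is made up, and a real tool would want a proper XML parser.

```python
import re
from collections import defaultdict

# Matches tags written exactly like the MSDN sample: Index first, then Term.
KEYWORD_TAG = re.compile(r'<MSHelp:Keyword\s+Index="([^"]*)"\s+Term="([^"]*)"\s*/>')

def group_keywords(source):
    """Return a dict mapping each index name to its list of terms."""
    by_index = defaultdict(list)
    for index_name, term in KEYWORD_TAG.findall(source):
        by_index[index_name].append(term)
    return dict(by_index)

sample = """
<MSHelp:Keyword Index="K" Term="AnyEventHandler delegate" />
<MSHelp:Keyword Index="K" Term="Microsoft.Build.Framework.AnyEventHandler delegate" />
<MSHelp:Keyword Index="F" Term="Microsoft.Build.Framework.AnyEventHandler" />
<MSHelp:Keyword Index="F" Term="AnyEventHandler" />
<MSHelp:Keyword Index="A" Term="frlrfMicrosoftBuildFrameworkAnyEventHandlerClassTopic" />
<MSHelp:Keyword Index="A" Term="T:Microsoft.Build.Framework.AnyEventHandler" />
"""

groups = group_keywords(sample)
for index_name in sorted(groups):
    print(index_name, groups[index_name])
```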

Q. What if I open a topic using an "A" keyword that is associated with more than one file?
A. Usually the help viewer will display the disambiguator window and ask you which topic you want to view.
