DIVA software

 

TermCooccur

Last edited by Steven Morris 1 yr ago

Term cooccurrence analysis

 

This page will cover analysis of both ID and DE terms. Analysis of 1-word, 2-word, and 3-word terms will be added at a later date.

 

 

Loading the paper to ID term (or DE term) matrix

 

  • Before term co-occurrence clustering, the paper to ID term (or DE term) matrix must be loaded from the database. To do this, in the TFGUI window, select:

 

    • Terms > load ID terms matrix for ID terms.
    • Terms > load DE terms matrix for DE terms.

 

When loading is complete, the message "done loading paper to DE (or ID) matrix" will appear in the MATLAB command window.

 

Clustering ID (or DE) terms

 

  • The appropriate paper to terms matrix must be loaded into DIVA before co-occurrence clustering can be performed. See the previous paragraph.
  • In the TFGUI window, go to the Terms menu and select Cluster XX terms by paper where XX is either ID or DE depending on which term is to be clustered. The cooccur_gui will appear. The primary entity will be ID term (or DE term) and the secondary entity will be paper. This means that ID terms (or DE terms) will be clustered on cooccurrence in papers.
  • If the number of terms to cluster is greater than the number of clusters specified in cooccur_gui, then DIVA will produce the number of clusters specified. Otherwise, DIVA will cluster down to individual terms, however many there are.
  • It is normally best to cluster down to individual terms, with the number of terms between 50 and 200 depending on the size of the dataset. A good rule of thumb is to use 50 terms per 500 papers in the dataset.
  • In index term clustering the occurrence threshold corresponds to the minimum number of papers an index term is associated with. Terms with more papers than the occurrence threshold are retained.
  • You should experiment with the occurrence threshold to get close to the desired number of retained terms.
  • Set the occurrence threshold and click Execute. An overwrite dialog will appear; click OK. A second dialog will then report the number of items (terms) that will be clustered and ask whether to continue.
  • If the number of items is too few, click No, then go back and reduce the occurrence threshold; if it is too many, click No and increase the threshold. If the number of items that will be clustered is close to the desired number, click Yes and clustering will proceed.
  • The clustering itself will finish quickly, but the simulated annealing used to seriate the dendrogram may take a very long time if many items are to be clustered.
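
The threshold selection described in the steps above can be sketched outside DIVA. The Python below is only an illustration under stated assumptions: it is not DIVA's code, every function name is hypothetical, and papers are modelled simply as sets of index terms. It uses the strict "more papers than the threshold" rule as worded above, clamps the 50-terms-per-500-papers rule of thumb to the 50 to 200 range, and counts co-occurrence as the number of papers two retained terms share.

```python
from itertools import combinations


def target_term_count(n_papers, low=50, high=200):
    """Rule of thumb from the text: about 50 terms per 500 papers, kept within 50 to 200."""
    return max(low, min(high, round(50 * n_papers / 500)))


def effective_cluster_count(n_terms, n_clusters_requested):
    """More terms than requested clusters gives the requested number; otherwise one per term."""
    return n_clusters_requested if n_terms > n_clusters_requested else n_terms


def paper_counts(papers):
    """Map each term to the number of papers it is associated with."""
    counts = {}
    for terms in papers:
        for term in set(terms):
            counts[term] = counts.get(term, 0) + 1
    return counts


def retained_terms(papers, threshold):
    """Keep terms that occur in MORE papers than the threshold (strict, as the text words it)."""
    return sorted(t for t, c in paper_counts(papers).items() if c > threshold)


def choose_threshold(papers, target):
    """Try every threshold and return the one whose retained-term count is closest to target.

    Ties are resolved in favour of the lower threshold (an arbitrary choice for this sketch).
    """
    counts = paper_counts(papers)
    max_count = max(counts.values(), default=0)

    def distance(threshold):
        kept = sum(1 for c in counts.values() if c > threshold)
        return (abs(kept - target), threshold)

    return min(range(max_count + 1), key=distance)


def cooccurrence(papers, terms):
    """Count, for each pair of retained terms, the papers in which both appear."""
    keep = set(terms)
    pairs = {}
    for paper in papers:
        present = sorted(keep & set(paper))
        for a, b in combinations(present, 2):
            pairs[(a, b)] = pairs.get((a, b), 0) + 1
    return pairs


# Invented toy data, for illustration only.
example_papers = [
    {"fuel cell", "membrane", "catalyst"},
    {"fuel cell", "membrane"},
    {"fuel cell", "catalyst", "platinum"},
    {"membrane", "polymer"},
]

threshold = choose_threshold(example_papers, target=3)
terms = retained_terms(example_papers, threshold)
pair_counts = cooccurrence(example_papers, terms)
```

In DIVA the same trial-and-error loop is done by hand: set a threshold, read the item count in the confirmation dialog, and adjust. The sketch simply automates that search for a small in-memory dataset.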

 

 

 


