File Format

File format:
Headline 1: The number of antigens and the number of antisera, separated by a tab.
Headline 2: The names of all antisera, separated by tabs.
Headline 3: The reference HI values the user wants to use for normalization. For the first three normalization options, the maximum HI value in each column is taken as the reference HI value, so users who choose one of those options can simply leave zeros in the third row.
Dataline: An antigen and its group, followed by the immunological data entries. If an entry is empty, the user should put 0.
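
The layout above can be read with a few lines of Python. This is only a minimal sketch, not the server's own parser: the function name parse_hi_table and the sample names are hypothetical, and it assumes that the antigen name and the group are separate tab-delimited fields at the start of each data line.

```python
def parse_hi_table(text):
    """Parse the tab-delimited HI table layout described above."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]

    # Headline 1: number of antigens and number of antisera.
    n_antigens, n_antisera = (int(x) for x in lines[0].split("\t")[:2])
    # Headline 2: antiserum names.
    antisera = lines[1].split("\t")
    # Headline 3: reference HI values (zeros if unused).
    references = [float(x) for x in lines[2].split("\t")]

    antigens, groups, values = [], [], []
    for ln in lines[3:]:
        fields = ln.split("\t")
        # Assumption: field 0 is the antigen, field 1 is its group.
        antigens.append(fields[0])
        groups.append(fields[1])
        values.append([float(x) for x in fields[2:]])

    # Check the counts declared in Headline 1 against the content.
    if len(antigens) != n_antigens:
        raise ValueError("antigen count does not match Headline 1")
    if len(antisera) != n_antisera or len(references) != n_antisera:
        raise ValueError("antiserum count does not match Headline 1")
    if any(len(row) != n_antisera for row in values):
        raise ValueError("a data line has the wrong number of entries")

    return {
        "antisera": antisera,
        "references": references,
        "antigens": antigens,
        "groups": groups,
        "values": values,
    }


# Hypothetical two-antigen, three-antiserum example; 0 marks an empty entry.
sample = (
    "2\t3\n"
    "S1/98\tS2/99\tS3/00\n"
    "0\t0\t0\n"
    "A1/98\tG1\t640\t320\t0\n"
    "A2/99\tG2\t80\t0\t1280\n"
)
table = parse_hi_table(sample)
```

Checking the declared counts up front catches a misaligned row before it reaches normalization.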

Low Reactions

Three types of data points are present in the HI matrix: observed values, unobserved values and low reactors. A low reactor indicates that the tested antigen and antiserum do not react strongly with each other. Low reactors provide some information, but they are not as accurate as observed values, so AntigenMap gives them special treatment. The web server asks the user to define the low reactor threshold. Any observed value smaller than this threshold will be regarded as a low reactor.
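
The threshold rule can be sketched as follows. The function name classify_entry and the example threshold of 40 are hypothetical, and the sketch assumes that 0 marks an unobserved entry, as in the file format; it does not show how AntigenMap treats low reactors afterwards.

```python
def classify_entry(value, threshold):
    """Label one HI entry as unobserved, low_reactor or observed."""
    if value == 0:
        # Assumption: 0 is the placeholder for an empty entry.
        return "unobserved"
    if value < threshold:
        # Smaller than the user-defined threshold: low reactor.
        return "low_reactor"
    return "observed"


# Hypothetical row with a user-defined threshold of 40.
labels = [classify_entry(v, 40) for v in [640, 20, 0, 40]]
```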


Five normalization options are provided:
N1: Each observed value is normalized by the overall maximum value max(Hij) and the maximum value for each column max(Hj), and the normalized value will be transformed into.
N2: The user can provide normalized data. AntigenMap will perform matrix completion on the input data without normalization.
N3: The data will be normalized by column (antiserum). Within each column, each observed value is divided by the column maximum, max(Hj), giving the normalized value Hij/max(Hj).
N4: The data will be normalized by column (antiserum). Within each column, each observed value is divided by the reference value, Rj, giving the normalized value Hij/Rj.
N5: The data will be normalized by column (antiserum). Within each column, each observed value is divided by the reference value, Rj. As in N4, the normalized value is Hij/Rj if Hij/Rj < 1; otherwise it is 1.
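
The column-wise options N3, N4 and N5 can be illustrated with a short sketch. It is not AntigenMap's implementation: the function names and sample values are hypothetical, entries equal to 0 are assumed to be unobserved and are returned as None, low reactors get no special handling, and N5 is read as capping the ratio at 1.

```python
def column_maxima(matrix):
    """Return max(Hj) for every column j of a rectangular matrix."""
    return [max(col) for col in zip(*matrix)]


def normalize_columns(matrix, refs, cap_at_one=False):
    """Divide each observed Hij by refs[j]; optionally cap the ratio at 1."""
    normalized = []
    for row in matrix:
        out_row = []
        for h, r in zip(row, refs):
            if h == 0:
                # Assumption: 0 marks an unobserved entry; leave it empty.
                out_row.append(None)
                continue
            value = h / r
            if cap_at_one and value >= 1:
                value = 1.0
            out_row.append(value)
        normalized.append(out_row)
    return normalized


# Hypothetical HI values and reference values.
hi = [[640, 320, 0], [80, 0, 1280]]
refs = [320, 320, 640]

n3 = normalize_columns(hi, column_maxima(hi))        # Hij / max(Hj)
n4 = normalize_columns(hi, refs)                     # Hij / Rj
n5 = normalize_columns(hi, refs, cap_at_one=True)    # Hij / Rj, capped at 1
```

N3 is the same computation as N4 with the column maxima used as the reference values, which is why zeros in the third row are enough when the user does not supply references.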


Rank: the dimension of the space into which the HI data are embedded during the matrix completion process. Our simulation study suggests that rank 2 is usually enough for small data sets with fewer than 20 viruses and antisera. However, rank 6 or more is required for an HI table with more than 100 entries in either dimension.

Temporal Model

After the low-rank matrix completion step, the antigens (and antisera) need to be projected onto a two- or three-dimensional map. As described in [1], the observed values in an HI matrix are not randomly distributed. Ordinary MDS works well on datasets that span a short time interval. To obtain accurate global distances over a long time interval, we incorporate a temporal model into MDS. We suggest using the temporal model when the time interval is large enough, e.g. 16 years for H3N2 influenza A virus. The temporal model should be chosen with care, because the data distribution may need to have a banded structure, as described in [1].

The only difference in the file format for the temporal model is the naming of antigens and antisera. Every name in the input file for the temporal model should end with "/Year", where "Year" is written with two digits. For example, the name of a virus isolated in 1998 should end with "/98".
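
Names can be checked against this convention before uploading. The sketch below is a hypothetical helper, not part of AntigenMap; it returns the two-digit suffix as a string and does not guess the century.

```python
import re

# A slash followed by exactly two digits at the very end of the name.
_YEAR_SUFFIX = re.compile(r"/(\d{2})$")


def year_suffix(name):
    """Return the two-digit year suffix of a name, or raise ValueError."""
    match = _YEAR_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"name does not end with '/YY': {name!r}")
    return match.group(1)


# Hypothetical strain name for a virus isolated in 1998.
suffix = year_suffix("A/Example/1/98")
```

A name that ends with a four-digit year, such as "A/Example/1/1998", is rejected by this check and would need to be shortened to "/98".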