uv data files contain information about the coherence function of a wavefront at random locations and random times. Consequently, the way this information is stored on disk differs from that for images, where the pixels lie on a regular grid. AIPS uv data are stored on disk in a manner similar to the way they would be organized on a FITS “random group” tape. The data are stored as logical records; each record (a “visibility”) contains all the data taken on one baseline at a given time. Thus, a record may contain information for several IFs, several frequencies at each of those IFs, and more than one polarization combination for each frequency/IF. The first part of each logical record contains what are known as “random parameters”, e.g., spatial frequency coordinates and time. After the random parameters, there is a small, regular array of data.
For a multi-source data set such as might be created by FILLM, the random parameter group will include the following. UU-L-SIN, VV-L-SIN, and WW-L-SIN give the spatial frequency coordinates, computed with a sine projection in units of wavelengths at the reference frequency. TIME1 is the label for the time in days. BASELINE is the baseline number (256 × ant1 + ant2 + subarray/100) and SOURCE is the source number corresponding to an entry in the source table. If you have frequency table identifiers (which is usually the case these days), then there will be an additional random parameter, FREQSEL. For a compressed database, two additional random parameters will be required — WEIGHT to give a single data weight for all samples in the record and SCALE to give the gain used to compress that record.
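As an illustration only (the function below is not part of AIPS), a baseline number can be unpacked in Python following the formula quoted above:

    def decode_baseline(baseline):
        """Unpack a BASELINE random parameter, assuming the convention
        quoted above: baseline = 256*ant1 + ant2 + subarray/100."""
        whole = int(baseline)                      # 256*ant1 + ant2
        ant1, ant2 = divmod(whole, 256)
        subarray = int(round((baseline - whole) * 100))
        return ant1, ant2, subarray

    # Antennas 3 and 17 in subarray 1: 256*3 + 17 + 1/100 = 785.01
    print(decode_baseline(785.01))                 # (3, 17, 1)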
The regular data array is similar to an image array in that the order of axes is arbitrary. However, the convention is for the first axis to be of type COMPLEX, having a dimension of 3 for uncompressed data (real, imaginary, weight) and a dimension of 1 for compressed data. The other axes of the regular array are IF, RA, DEC, FREQ and STOKES.
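The same structure, random parameters followed by a small regular data array, can be inspected if the data are written out as a FITS random-groups file (e.g., with FITTP) and read with a package such as astropy; the sketch below is only illustrative, and the file name is hypothetical:

    from astropy.io import fits

    with fits.open("multisource.uvfits") as hdul:
        groups = hdul[0].data               # one "group" per visibility record
        print(groups.parnames)              # the random parameters (UU..., VV...,
                                            #   WW..., DATE, BASELINE, SOURCE, ...)
        print(groups.data.shape)            # the regular data array: one row per
                                            #   visibility, with the RA, DEC, IF,
                                            #   FREQ, STOKES, and COMPLEX axes
        baselines = groups.par("BASELINE")  # baseline numbers as described above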
The number of words in each “visibility” is given by

N_words = N_rp + N_cx × N_IF × N_freq × N_Stokes

where N_rp is the number of random parameters and N_cx is the dimension of the COMPLEX axis (3 for uncompressed data, 1 for compressed data); each word occupies 4 bytes (32 bits).
For example, a 12-hour observation with 30-second integrations, one frequency, and 2 IFs with RR, RL, LR, and LL written in compressed format (DOUVCOMP = TRUE in FILLM) will occupy about 34 Mbytes on disk. In practice, the uv file will usually be a little larger due to the way the system allocates space on the disks. You must also remember to allow room for the extension tables — see §F.3. If this database had been written in uncompressed format, the uv data would have occupied around 62 Mbytes, but would have carried information about different data weights for different IFs.
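A rough version of this arithmetic is sketched below; the function and its name are not part of AIPS, and a 27-antenna VLA (351 baselines) is assumed, since the text does not state the array size:

    def uv_bytes(n_vis, n_if, n_freq, n_stokes, n_random, compressed=True):
        """Approximate size of the uv data (excluding extension tables),
        following the words-per-visibility formula above."""
        n_cx = 1 if compressed else 3          # dimension of the COMPLEX axis
        words = n_random + n_cx * n_if * n_freq * n_stokes
        return n_vis * words * 4               # 32-bit (4-byte) words

    n_vis = (12 * 3600 // 30) * 351            # 505440 records in 12 h at 30 s
    print(uv_bytes(n_vis, n_if=2, n_freq=1, n_stokes=4, n_random=9))
    # about 34 Mbytes when compressed
    print(uv_bytes(n_vis, n_if=2, n_freq=1, n_stokes=4, n_random=7,
                   compressed=False))
    # about 62 Mbytes uncompressed (no WEIGHT or SCALE random parameters)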
Consider another example illustrated by the IMHEAD listing below:
This compressed (COMPLEX Pixels = 1) uv database contains 9 random parameters, 4 polarizations, 1 frequency, and 2 IFs for each of 105198 visibilities. The size of the database file itself is, therefore, (9 + 1 × 4 × 1 × 2) × 4 bytes × 105198 = 7153464 bytes, or about 7 Mbytes.
The use of “compressed” data can make substantial savings in the amount of disk space that you require, particularly for spectral-line databases. All tasks should now be able to handle either the compressed or the uncompressed formats. Compressed data files can be identified by the dimension of 1 for the COMPLEX axis in the database header. (Uncompressed data will have a dimension of 3.) The savings can be close to a factor of three for spectral line observations.
This is achieved by converting all data weights into a single WEIGHT random parameter, by finding a single SCALE random parameter with which to scale all real and imaginary parts of the visibilities into 16-bit integers, and by packing the real and imaginary terms into one 32-bit location, using magic-value blanking for flagged data. This is to be compared with the uncompressed format, in which the real, imaginary, and weight terms are each stored in a 32-bit floating-point location. The use of a single weight value masks real differences in system temperature between polarizations and IFs, information that should be retained to achieve the lowest possible noise in imaging.
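The following numpy sketch conveys the idea; the scale and weight choices, packing order, and magic value used here are illustrative and are not the exact conventions that AIPS uses:

    import numpy as np

    MAGIC = -32768        # an illustrative 16-bit "magic" value for flagged data

    def compress_record(vis, weights):
        """Pack complex visibilities into one 32-bit word each, in the spirit
        of the scheme described above; weights <= 0 mark flagged samples."""
        weight = weights[weights > 0].mean() if np.any(weights > 0) else 0.0
        peak = np.abs(np.concatenate([vis.real, vis.imag])).max()
        scale = peak / 32767.0 if peak > 0 else 1.0   # fit into 16-bit integers
        re = np.round(vis.real / scale).astype(np.int16)
        im = np.round(vis.imag / scale).astype(np.int16)
        re[weights <= 0] = MAGIC                      # magic-value blanking
        im[weights <= 0] = MAGIC
        packed = (re.astype(np.int32) << 16) | im.astype(np.uint16)
        return packed, scale, weight

    def decompress_record(packed, scale, weight):
        """Recover scaled visibilities and a per-sample weight."""
        re = (packed >> 16).astype(np.int16)
        im = (packed & 0xFFFF).astype(np.int16)
        flagged = (re == MAGIC) & (im == MAGIC)
        vis = (re.astype(float) + 1j * im.astype(float)) * scale
        return np.where(flagged, np.nan, vis), np.where(flagged, 0.0, weight)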
In general, data compression is a good thing and should be used, but with a little caution. With a single frequency, single IF, and single polarization, you will not save any disk space. In all other cases, there are respectable savings to be made. However, the use of a packed data word for the real and imaginary parts of the visibility function, along with magic-value blanking, imposes a restriction on the “spectral dynamic range” of the data set of around 32000:1. Consequently, there are some situations in which compressed data should not be used. For example, if the spectral dynamic range in the uv database is likely to be greater than, say, 1000:1, you must use the uncompressed data format to avoid loss of accuracy. This situation can arise in maser spectra, in which there may be lines of 1 Jy alongside lines of > 32000 Jy; in such cases, you should never use compressed data. Bandpass calibration can cause large correction factors to be applied to the edge channels of a database; in the presence of noise or interference, bad channels can become very much greater in amplitude than good channels. In such cases you must either use the uncompressed format or be very careful to flag bad channels or to drop them with the BCHAN and ECHAN adverbs as you apply the bandpass calibration. In general, continuum data sets can be loaded with data compression, since these dynamic-range considerations will not normally apply; the savings in disk space may not be worth the loss of weight information, however. Compressed data take less time to read and write because fewer bytes are involved, but cost some cpu time for the extra packing and unpacking computations.
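Using the illustrative compress/decompress sketch above, the dynamic-range restriction is easy to demonstrate: a 1 Jy feature stored alongside a 40000 Jy feature is quantized in steps of more than 1 Jy.

    strong_and_weak = np.array([40000.0 + 0j, 1.0 + 0j])       # Jy
    packed, scale, weight = compress_record(strong_and_weak, np.ones(2))
    recovered, _ = decompress_record(packed, scale, weight)
    print(scale)       # about 1.22 Jy per integer step
    print(recovered)   # the 40000 Jy feature survives, but the 1 Jy feature
                       # comes back as about 1.22 Jy, a ~20% quantization error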
If there has been on-line or later flagging that depends on polarization, IF, or spectral channel (e.g., RFI excision), or if there are differences in the intrinsic weights between polarizations, IFs, or spectral channels (e.g., different system temperatures for different IFs), then data compression causes a serious loss of the information carried by the data weights.