CCITT Definition

CCITT stands for Comité Consultatif International Téléphonique et Télégraphique, the standards body that defined it. In imaging, the name refers to a lossless compression format historically used for two-color images such as black-and-white scans.

The commonly used variants of this format are Group 3, which encodes every line separately, and Group 4, which encodes each line with reference to the previous one.

When used for TIFF compression, the format relies on a technique typically referred to as Huffman encoding to pack the data into a significantly smaller compressed stream. The method is most useful for images that contain long runs of pixels of a single colour.
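
The idea behind Huffman encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CCITT schemes rely on fixed, predefined code tables, whereas the hypothetical `huffman_code` helper below builds a table from sample data purely to show that frequent values receive shorter bit strings.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code for a non-empty sequence of symbols.

    Illustrative helper only; it does not reproduce the fixed CCITT tables.
    """
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: code-so-far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        # Prefix the two least frequent groups with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Made-up sample of values to encode: 2 is the most common, 8 the rarest.
sample = [2, 2, 2, 2, 3, 3, 8]
codes = huffman_code(sample)
print(codes)
```

In this sample the most common value ends up with the shortest code, which is where the size reduction comes from.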

Compression technique and use cases of CCITT

Compression starts by looking for repetition in the pixels to be stored. Instead of recording every single pixel, the algorithm counts the number of contiguous white or black pixels.

The large white margins of a typical document make it very well suited to this kind of compression: the algorithm only records “500 white pixels” instead of listing the pixels one by one.
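
The run counting described above can be sketched as follows. This is a minimal illustration, not the actual CCITT bitstream: the `run_lengths` helper and the sample scan line are assumptions made for the example, with 0 standing for white and 1 for black.

```python
from itertools import groupby

def run_lengths(row):
    """Return (pixel, count) pairs for one scan line; 0 = white, 1 = black."""
    return [(pixel, len(list(group))) for pixel, group in groupby(row)]

# Hypothetical 1000-pixel scan line: a white margin, a short black mark,
# then white again up to the right edge.
row = [0] * 500 + [1] * 12 + [0] * 488
print(run_lengths(row))
```

The 1000 pixels collapse into three pairs, the first of which corresponds to the "500 white pixels" case.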

The format was used for decades in fax machines. Documents were compressed with Group 3 before being sent, which made transmission faster and cheaper.

Every scan line was encoded independently, so even if part of the transmission failed, the document that arrived would still be partially readable.

Group 4, however, goes a step further and compares each line with the one above it. Text documents, letters in particular, show a lot of similarity from one scan line to the next, because the shapes of words produce patterns that repeat line after line. The algorithm does not encode each line anew but only records the differences, reducing the file size even further.
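
A much simplified sketch of this line-to-line comparison is shown below. It is an assumption-laden illustration rather than the real Group 4 coder: the `transitions` and `vertical_offsets` helpers are hypothetical, and the sketch only handles the case where both lines have the same number of colour changes, whereas the actual standard defines further coding modes for other situations.

```python
def transitions(row):
    """Positions where the colour changes, assuming a white pixel before the row."""
    previous = 0
    positions = []
    for i, pixel in enumerate(row):
        if pixel != previous:
            positions.append(i)
            previous = pixel
    return positions

def vertical_offsets(reference, current):
    """Offset of each colour change in `current` from the one in `reference`.

    Returns None when the lines differ in their number of changes, a case
    this simplified sketch does not model.
    """
    ref, cur = transitions(reference), transitions(current)
    if len(ref) != len(cur):
        return None
    return [c - r for r, c in zip(ref, cur)]

# Two adjacent scan lines through the same stroke (0 = white, 1 = black).
above = [0, 0, 1, 1, 1, 0, 0, 0]
below = [0, 1, 1, 1, 1, 0, 0, 0]
print(vertical_offsets(above, below))
```

For lines this similar the offsets are all small numbers close to zero, which is why describing a line relative to its neighbour takes less space than describing it from scratch.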

Business documents, architectural blueprints, and scanned text are the typical images compressed with this method. Such images have large solid areas and sharp contrast between black and white. For typical office documents the compression ratio can reach 10:1 or more.

The downside shows up with photographs and natural images scanned from film or paper. These always contain some gradients, slight noise around edges, and other detail that makes long runs of identical pixels rare.

As a result, the algorithm finds few usable patterns and compression becomes ineffective. Color graphics are ruled out as well, because the CCITT standard is restricted to two colors.
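
The effect of noise on run-based compression can be illustrated with a small sketch. The sample rows and the noise pattern (every seventh pixel flipped) are assumptions chosen only to keep the example deterministic; real scanner noise is irregular.

```python
from itertools import groupby

def count_runs(row):
    """Number of runs of identical pixels in one scan line."""
    return sum(1 for _ in groupby(row))

# Clean line: white, a solid black bar, white again (0 = white, 1 = black).
clean = [0] * 100 + [1] * 50 + [0] * 100
# Same line with every seventh pixel flipped to imitate speckle noise.
noisy = [pixel ^ 1 if i % 7 == 0 else pixel for i, pixel in enumerate(clean)]

print(count_runs(clean), count_runs(noisy))
```

The clean line consists of three runs, while the noisy one breaks into many short runs, each of which has to be encoded separately.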

Even recent PDF files still commonly contain elements encoded with CCITT. Despite considerable technological progress since the format was introduced, it remains a compact way to store content that is purely black and white.