The need for hexadecimal representations of binary values brings up an important point about digital data: there's a lot of it, and it can quickly munch into the number of bytes you have for storage (no pun intended). Large files like photos and videos are also notoriously difficult to send over text or email.
That's why data compression is needed. **Data compression** is the process (or processes) of reducing the size of shared or stored data. In other words, it reduces the number of bits needed to represent that data.
The amount you can shrink your file by depends on two things:
- the amount of redundancy, or repeated information, that you can remove from your original data
- the method you use to compress your file
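To see how much redundancy matters, here is a minimal sketch that runs the same compression method over two inputs of equal size. It assumes Python's built-in `zlib` module as the compression method; the two sample inputs are made up for illustration.

```python
import os
import zlib

# Highly redundant input: the same 4 bytes repeated over and over.
redundant = b"ABCD" * 2500           # 10,000 bytes
# Input with almost no redundancy: random bytes from the operating system.
random_like = os.urandom(10_000)     # 10,000 bytes

for label, data in [("redundant", redundant), ("random", random_like)]:
    compressed = zlib.compress(data)
    print(f"{label}: {len(data)} bytes -> {len(compressed)} bytes")
```

The redundant input should shrink to a tiny fraction of its size, while the random input barely shrinks at all (it may even grow by a few bytes), because there is no repetition to remove.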
Many data compression methods work by using symbols to shorten the data.
A simple form of data compression is known as run-length encoding. It works by replacing each run of repeated data, such as colors in an image or letters in a document, with a count and a single copy of the repeated value. For example, the string "FFFFFIIIIIIVVVVVVVEEEE" would be stored as 5F6I7V4E, shrinking it from 22 characters to 8.
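The example above can be sketched in a few lines of Python. This is one simple way to write run-length encoding, not a standard format: the function names `rle_encode` and `rle_decode` are made up for illustration, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its count, then the character."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

def rle_decode(encoded: str) -> str:
    """Rebuild the original text. Assumes the original text contains no digits."""
    result = []
    count = ""
    for char in encoded:
        if char.isdigit():
            count += char              # a count can have more than one digit
        else:
            result.append(char * int(count))
            count = ""
    return "".join(result)

original = "F" * 5 + "I" * 6 + "V" * 7 + "E" * 4
encoded = rle_encode(original)
print(encoded)                          # 5F6I7V4E
print(rle_decode(encoded) == original)  # True
```

Notice that this only pays off when the data has long runs. A string with no repeats, like "ABC", would become "1A1B1C", which is longer than what you started with.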
Run-length encoding is used to compress some simple images, such as bitmaps. It is also used by fax machines.
Another method of data compression that replaces repeating data with symbols is known as the LZW compression algorithm. It's used to compress text and images, most notably in GIFs.
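Here is a rough teaching sketch of the core LZW idea: build a dictionary of sequences you've already seen, and output a short code whenever one shows up again. It is not the exact variant used in GIF files, which pack codes into bits and add extra control codes. The function names are made up for illustration, and the sketch assumes every character in the text has a code point below 256.

```python
def lzw_compress(text: str) -> list[int]:
    # Start with one code for every single character (code points 0-255).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for char in text:
        candidate = current + char
        if candidate in dictionary:
            current = candidate                # keep growing the known sequence
        else:
            codes.append(dictionary[current])  # emit code for the longest match
            dictionary[candidate] = next_code  # remember the new, longer sequence
            next_code += 1
            current = char
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes: list[int]) -> str:
    if not codes:
        return ""
    # The decoder rebuilds the same dictionary as it reads the codes.
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the sequence being built right now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "characters ->", len(codes), "codes")  # 24 characters -> 16 codes
print(lzw_decompress(codes) == text)                    # True
```

The dictionary never has to be stored with the compressed data, since the decoder can rebuild it from the codes alone.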
Let's take a look at the two main types of compression: lossless and lossy.
Lossless compression algorithms allow you to reduce your file size without sacrificing any of the original data in the process. This means you can always restore your original file exactly. Run-length encoding and the LZW algorithm are both examples of lossless compression because they only change how the data is written down, and all the information remains the same.
If your main concern is the quality of your file or if you need to be able to reconstruct your original file, lossless algorithms are usually the better option.
This type of compression might be important in databases, where any difference between the compressed and uncompressed versions of a file could skew the information being represented.
This concern about skew also applies to both medical and satellite imaging, where small differences in the data could have large impacts. Many software downloads also use lossless compression methods because the programs need to be recreated exactly on your computer in order to work.
In contrast, lossy compression algorithms sacrifice some data in order to achieve greater compression than you can achieve with a lossless method. They usually do this by removing details, such as replacing similar colors with the same one in a photo.
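As a rough illustration of "replacing similar colors with the same one," here is a minimal sketch that snaps each color channel to the middle of a fixed-size bucket. The bucket size of 32, the function names, and the sample pixels are all made-up assumptions. Real photo formats use far more sophisticated techniques.

```python
def quantize(value: int, step: int = 32) -> int:
    """Snap a 0-255 color channel to the middle of its bucket of `step` values."""
    return (value // step) * step + step // 2

def quantize_pixel(pixel: tuple, step: int = 32) -> tuple:
    return tuple(quantize(channel, step) for channel in pixel)

# Four slightly different shades of orange as (red, green, blue) values.
pixels = [(250, 140, 10), (247, 143, 14), (252, 138, 9), (249, 141, 12)]
compressed = [quantize_pixel(p) for p in pixels]

print(compressed[0])                         # (240, 144, 16)
print(len(set(pixels)), "colors ->", len(set(compressed)), "color")  # 4 colors -> 1 color
```

Once all four pixels have become the same color, there's no way to tell which shade each one started as. That lost detail is what makes the method lossy, and the newly identical pixels are what make the data easier to shrink.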
Photo by Krisztian Tabori on Unsplash
An image of tacos before and after lossy compression. Although the image on the bottom uses 62% less data than the image on the top, the two images look practically identical.
If your main concern is minimizing how big your file is or how long it'll take to send or receive it, go with a lossy method! A lot of lossy compression methods make changes that are barely detectable or even undetectable to your average viewer, and they can save you a lot of space. Lossy compression is commonly used for photos, audio, and video, especially for downloading purposes.
Although there are two main types of data compression, you don't have to choose just one. Indeed, many modern compression software systems use a combination of the two methods in some way.
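Here is a toy sketch of what combining the two might look like: a lossy step that throws away fine detail, followed by a lossless step that squeezes out the repetition the lossy step created. It assumes Python's built-in `zlib` for the lossless step, and the "image" is just a made-up list of brightness values, so treat it as an illustration rather than how any particular format works.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up grayscale data: a smooth gradient with a little noise added.
samples = bytes(
    min(255, max(0, i // 16 + rng.randint(-3, 3)))
    for i in range(4096)
)

# Lossless only: every noisy value has to be preserved.
lossless_only = zlib.compress(samples)

# Lossy step first (snap values to multiples of 16), then the same lossless step.
quantized = bytes((value // 16) * 16 for value in samples)
lossy_then_lossless = zlib.compress(quantized)

print("original:           ", len(samples), "bytes")
print("lossless only:      ", len(lossless_only), "bytes")
print("lossy then lossless:", len(lossy_then_lossless), "bytes")
```

The combined version should come out much smaller, because the lossy step turns lots of slightly different values into long stretches of mostly repeated ones, which is exactly what lossless methods are good at shrinking.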