Computers use binary - the digits 0 and 1 - to store data. A binary digit, or bit, is the smallest unit of data in computing. It is represented by a 0 or a 1. Binary numbers are made up of binary digits (bits), eg the binary number 1001. The circuits in a computer's processor are made up of billions of transistors. A transistor is a tiny switch that is activated by the electronic signals it receives. The digits 1 and 0 used in binary reflect the on and off states of a transistor. Computer programs are sets of instructions. Each instruction is translated into machine code - simple binary codes that activate the CPU. Programmers write computer code and this is converted by a translator into binary instructions that the processor can execute. All software, music, documents, and any other information that is processed by a computer, is also stored using binary. [1]

To include strings, integers, characters and colours. This should include considering the space taken by data, for instance the relation between the hexadecimal representation of colours and the number of colours available.

Computers store and process data in binary, as sequences of 0s and 1s known as bits. Bits are grouped into larger units, and those groups are interpreted as the various data types a computer works with.

Units of Data

The following standardized units measure the amount of data stored in a computer:

  • Bit (b) : The smallest unit of information, a single binary digit (0 or 1).
  • Byte (B) : A byte is a group of 8 bits. It is the most common unit used for representing characters and data in computer systems.
  • Kilobyte (KB) : 1 kilobyte is equal to 1,000 bytes (10^3 bytes). It is often used to describe small amounts of data, such as text documents or small images.
  • Megabyte (MB) : 1 megabyte is equal to 1,000 kilobytes (10^6 bytes). It is commonly used to measure the size of files, larger documents, images, or short audio recordings.
  • Gigabyte (GB) : 1 gigabyte is equal to 1,000 megabytes (10^9 bytes). It is used to describe larger files, such as high-resolution images, longer audio recordings, or small videos.
  • Terabyte (TB) : 1 terabyte is equal to 1,000 gigabytes (10^12 bytes). It is used for large-scale data storage, such as hard drives, servers, or high-definition video recordings.
  • Petabyte (PB) : 1 petabyte is equal to 1,000 terabytes (10^15 bytes). It is used to measure very large amounts of data, such as the contents of data centers or big data analytics workloads.

Larger units follow the same pattern.

There are also units such as KiB and MiB. KB and KiB (and likewise MB and MiB) differ in their base: KB is defined in powers of 10 (decimal), while KiB is defined in powers of 2 (binary).

In summary:

  • 1 KB = 10^3 bytes = 1,000 bytes (decimal)
  • 1 KiB = 2^10 bytes = 1,024 bytes (binary)
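
The difference between decimal and binary units can be checked with a small calculation. The following is a minimal Python sketch; the names `DECIMAL_UNITS`, `BINARY_UNITS`, and `to_bytes` are illustrative and not part of any standard library.

```python
# Unit sizes in bytes: decimal units use powers of 10, binary units powers of 2.
DECIMAL_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
BINARY_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}


def to_bytes(amount, unit):
    """Convert an amount expressed in `unit` into a number of bytes."""
    factors = {"B": 1, **DECIMAL_UNITS, **BINARY_UNITS}
    return amount * factors[unit]


print(to_bytes(1, "KB"))   # 1000
print(to_bytes(1, "KiB"))  # 1024
# A drive sold as "500 GB" holds fewer than 500 GiB:
print(to_bytes(500, "GB") / to_bytes(1, "GiB"))
```

The gap between the two conventions grows with the unit: it is 2.4% at the kilobyte level and about 7.4% at the gigabyte level.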

File Format

In computing, a file is a collection of data stored on a storage device such as a hard drive. Whenever a file is created, modified, or saved, it is represented as a sequence of binary data consisting of 0s and 1s. The file's contents, along with its metadata (such as file name, size, creation date, and permissions), are stored on the storage device.

A file format defines how a file is structured and organized: how its data is stored, encoded, and interpreted. For example, a document file may record which font the document uses, so that the computer reading it knows what to display.

Many file formats are described in the digital media formats section of digital media processing.

Data Representation

In a computer, a color is represented as a number in binary. Each distinct bit pattern represents a different color.

RGB (Red, Green, Blue) : RGB is the most widely used color model in computer graphics and digital displays. It represents colors by specifying the intensities of red, green, and blue primary colors. By combining different intensities of these three primary colors, a wide range of colors can be produced.

RGB has three color components (also called color channels), each typically stored as an 8-bit value ranging from 0 to 255. For example:

  • Red: RGB(255, 0, 0) / RGB(11111111, 00000000, 00000000) in binary.
  • Green: RGB(0, 255, 0) / RGB(00000000, 11111111, 00000000).
  • Blue: RGB(0, 0, 255) / RGB(00000000, 00000000, 11111111).
  • Purple: RGB(128, 0, 128) / RGB(10000000, 00000000, 10000000).

CMYK (Cyan, Magenta, Yellow, Key/Black) : CMYK is primarily used in printing and represents colors in terms of the amounts of cyan, magenta, yellow, and black ink required to reproduce a specific color. It uses subtractive color mixing: the more ink is added, the darker the color becomes. As with RGB, each CMYK component is typically stored as an 8-bit value. In the examples below, the decimal values are percentages (0 to 100), and the binary values show each percentage scaled to the 8-bit range 0 to 255.

  • Cyan: CMYK(100, 0, 0, 0) / CMYK(11111111, 00000000, 00000000, 00000000)
  • Magenta: CMYK(0, 100, 0, 0) / CMYK(00000000, 11111111, 00000000, 00000000)
  • Yellow: CMYK(0, 0, 100, 0) / CMYK(00000000, 00000000, 11111111, 00000000)
  • Black: CMYK(0, 0, 0, 100) / CMYK(00000000, 00000000, 00000000, 11111111)
  • Orange: CMYK(0, 50, 100, 0) / CMYK(00000000, 01111111, 11111111, 00000000)

HSL/HSV (Hue, Saturation, Lightness/Value) : HSL and HSV are alternative color models that represent colors based on their perceived attributes. Hue represents the dominant wavelength of the color, saturation represents the intensity or purity of the color, and lightness or value represents the brightness. Hue is usually given as an angle (0 to 360 degrees), and saturation and lightness as percentages or decimal values. The binary forms below simply write each number as an unsigned integer: 9 bits for hue (360 does not fit in 8 bits) and 8 bits each for the saturation and lightness percentages.

  • Red: HSL(0, 100%, 50%) / HSL(000000000, 01100100, 00110010)
  • Lime Green: HSL(120, 100%, 50%) / HSL(001111000, 01100100, 00110010)
  • Blue: HSL(240, 100%, 50%) / HSL(011110000, 01100100, 00110010)
  • Light Yellow: HSL(60, 100%, 75%) / HSL(000111100, 01100100, 01001011)
  • Magenta: HSL(300, 100%, 50%) / HSL(100101100, 01100100, 00110010)

Hexadecimal Color : Hexadecimal color is another commonly used representation for colors in computer systems. It uses the hexadecimal numbering system to represent colors, where each color component is represented by a two-digit hexadecimal value ranging from 00 to FF.

  • Red: #FF0000
  • Green: #00FF00
  • Blue: #0000FF
  • Yellow: #FFFF00
  • Purple: #800080
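
The link between the hexadecimal form and the number of available colors can be made concrete with a short sketch. The Python below is illustrative only (the function names are made up for this example). It converts between `#RRGGBB` strings and RGB triples, and counts how many colors a given number of hexadecimal digits can express: each digit carries 4 bits, so six digits give 24 bits.

```python
def hex_to_rgb(hex_color):
    """Parse a '#RRGGBB' string into an (r, g, b) tuple of integers 0..255."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair of hex digits is one 8-bit color component.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(r, g, b):
    """Format three 0..255 components as a '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


def colours_available(hex_digits):
    """Number of distinct values expressible with this many hex digits."""
    return 16 ** hex_digits


print(hex_to_rgb("#800080"))   # (128, 0, 128)
print(rgb_to_hex(255, 255, 0)) # #FFFF00
print(colours_available(6))    # 16777216, i.e. 2**24
```

Six hexadecimal digits take 3 bytes per color and allow 16,777,216 colors; a three-digit shorthand would allow only 16^3 = 4,096.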

Sound in its analog form is a continuous wave; in a computer it is represented discretely. A continuous wave is turned into discrete data through a process called sampling, which measures the amplitude of the sound wave at specific points in time. The rate at which these measurements are taken is known as the sampling rate. For example, a sound sampled at 44.1 kHz is measured 44,100 times per second.

Each sample represents the amplitude of the sound wave at a particular moment. To convert this analog amplitude into a digital representation, the sample is quantized: it is assigned a numerical value, which is stored as binary digits. Because there are far more possible amplitudes than available values, each sample is rounded to the closest available level, which reduces the size of the data at the cost of some accuracy. The number of available levels is 2^(bit depth), so the bit depth determines the resolution, or precision, of the quantized representation.
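
Sampling and quantization can be sketched in a few lines. The Python below is a simplified illustration, not how any particular audio library works. It assumes the "sound" is a pure sine tone with amplitude between -1 and 1, and the function names are made up for this example.

```python
import math


def quantization_levels(bit_depth):
    """Number of distinct values available at a given bit depth."""
    return 2 ** bit_depth


def sample_and_quantize(frequency_hz, sample_rate_hz, bit_depth, n_samples):
    """Sample a sine tone and round each sample to the nearest quantization level."""
    levels = quantization_levels(bit_depth)
    quantized = []
    for n in range(n_samples):
        t = n / sample_rate_hz                                # time of the n-th sample
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # value in [-1, 1]
        # Map [-1, 1] onto the integer levels 0 .. levels - 1 and round.
        quantized.append(round((amplitude + 1) / 2 * (levels - 1)))
    return quantized


# A 440 Hz tone sampled at 44.1 kHz with a (deliberately coarse) 3-bit depth.
samples = sample_and_quantize(440, 44_100, 3, 10)
print(quantization_levels(3))                 # 8
print([format(s, "03b") for s in samples])    # each sample as 3 binary digits
```

With a bit depth of 16, as commonly used for audio, the same function would choose among 65,536 levels instead of 8, so the rounding error per sample becomes much smaller.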

Once converted to binary, the samples can be stored in a file. Storing and reading the file involves encoding and decoding, that is, representing the data in a specific format or code that digital systems can process, transmit, store, or interpret.

As explained earlier, the stored file uses a specific file format. Audio can, for example, be stored in the MP3 format, which allows audio files to be stored, transferred, and played back efficiently on a wide range of digital devices.

Sound sample

For more on waves in computing, see digital signal processing, especially the part on signal transmission, and digital media processing.

A database is a collection of structured data. A common way to store a database is to organize the data into tables consisting of rows and columns. Each row represents a record or entity, and each column represents a specific attribute or field of that record.

Besides the rows and columns themselves, a database records the structure and organization of its tables, the data types used, and so on.

A database file is typically divided into fixed-size chunks, each containing a specific number of records or a portion of the database file. The database is stored in a specific file format that defines how the file is structured. It may consist of a header containing important information about the file, the metadata, and the actual data.

Example of a structured database in table

See also database system .

Character Encoding

Encoding refers to the process of converting information from one representation or format to another. It involves converting data into a specific format that can be processed, transmitted, stored, or interpreted by digital systems.

Character encoding is the specific kind of encoding used to represent characters, symbols, and textual data in a computer.

ASCII (American Standard Code for Information Interchange) is one of the simplest character encodings and was widely used in the early days of computing. ASCII represents each character with a 7-bit code, so it can represent 2^7 = 128 different characters.

ASCII provides a standardized mapping between these characters and their corresponding numerical codes. For example, the uppercase letter "A" is represented by the code 65, the lowercase letter "a" is represented by 97, and the digit "0" is represented by 48.

Example of "goodbye" encoded in ASCII

As an example of how ASCII works, the letter "g" is defined as 103 in decimal, or 01100111 in binary.

While ASCII provides a simple way to represent characters, its character set is very limited and focuses on the English language.
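
The ASCII codes for a whole word such as "goodbye" can be listed with a short script. This is a minimal Python sketch; `ascii_bits` is an illustrative name, and the codes are padded to 8 bits because each 7-bit code is normally stored in one byte.

```python
def ascii_bits(text):
    """Return (character, decimal code, 8-bit binary string) for each character."""
    encoded = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return [(chr(code), code, format(code, "08b")) for code in encoded]


for char, code, bits in ascii_bits("goodbye"):
    print(char, code, bits)
# The first line printed is: g 103 01100111
```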

Unicode is a widely used universal character encoding standard for text in all of the world's writing systems and languages. It can even represent emoji. Unicode version 15.1, released in September 2023, defines 149,813 characters.

Unicode assigns a unique numerical value, called a code point, to each character in its repertoire. Code points are written in hexadecimal notation, such as U+0041 for the uppercase letter "A" and U+4E2D for the Chinese character "中".

The UTF (Unicode Transformation Format) encodings, such as UTF-8, UTF-16, and UTF-32, are the character encoding schemes used to represent Unicode characters in binary form.

UTF-8 : UTF-8 is a variable-length encoding scheme that represents each Unicode character using one to four 8-bit units (bytes). In UTF-8, characters from the ASCII character set (U+0000 to U+007F) are represented using a single byte, making it backward compatible with ASCII. Characters outside the ASCII range are represented using multiple bytes.

UTF-8 uses the leading bits of the first byte to indicate how many bytes a sequence has. Every following (continuation) byte starts with '10' and carries 6 bits of the code point.

  • A single-byte UTF-8 character (ASCII) starts with a '0' bit, followed by the 7-bit ASCII representation.
  • A two-byte UTF-8 character starts with '110' and carries 11 bits of the character's code point across its two bytes.
  • A three-byte UTF-8 character starts with '1110' and carries 16 bits of the character's code point across its three bytes.
  • A four-byte UTF-8 character starts with '11110' and carries 21 bits of the character's code point across its four bytes.
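
The bit patterns above can be turned into a small encoder. The Python below is a sketch for illustration; `utf8_encode` is a made-up name, and the sketch does not reject the surrogate range U+D800 to U+DFFF, which a conforming encoder must. In practice Python's built-in `str.encode("utf-8")` does this work.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes using the prefix patterns."""
    continuation = lambda bits: 0b10000000 | (bits & 0b111111)  # '10' + 6 bits
    if code_point < 0x80:            # up to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:           # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      continuation(code_point)])
    if code_point < 0x10000:         # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      continuation(code_point >> 6),
                      continuation(code_point)])
    if code_point <= 0x10FFFF:       # up to 21 bits: 11110xxx + three continuations
        return bytes([0b11110000 | (code_point >> 18),
                      continuation(code_point >> 12),
                      continuation(code_point >> 6),
                      continuation(code_point)])
    raise ValueError("code point out of Unicode range")


encoded = utf8_encode(0x4E2D)  # the character "中"
print(" ".join(format(byte, "08b") for byte in encoded))
# 11100100 10111000 10101101
print(encoded == "中".encode("utf-8"))  # True
```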

UTF-16 : UTF-16 is a variable-length encoding scheme that represents each Unicode character using one or two 16-bit code units. Characters in the Basic Multilingual Plane (BMP), which holds the most commonly used characters across various writing systems (including the ASCII characters), are represented using a single 16-bit unit. Characters outside the BMP are represented using two 16-bit units, known as a surrogate pair. In this way UTF-16 can handle the entire Unicode character set.

UTF-32 : UTF-32 is a fixed-length encoding scheme that represents all Unicode characters using 32-bit units. Each character is encoded using a single 32-bit unit, regardless of its Unicode code point value. UTF-32 provides a straightforward and uniform representation for all characters, but it requires more storage space compared to UTF-8 and UTF-16.
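
The storage trade-off between the three schemes can be observed directly. The sketch below uses Python's built-in codecs; `encoded_sizes` is an illustrative name. The big-endian variants (`utf-16-be`, `utf-32-be`) are chosen so that no byte-order mark is added to the output.

```python
def encoded_sizes(text):
    """Number of bytes needed to store `text` in each UTF encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # -be: no byte-order mark
        "utf-32": len(text.encode("utf-32-be")),
    }


for char in ["A", "中", "\U0001F600"]:  # ASCII, BMP, and outside the BMP
    print("U+{:04X}".format(ord(char)), encoded_sizes(char))
# U+0041  {'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
# U+4E2D  {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
# U+1F600 {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```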

UTF-8 Example

ASCII Character "A": The ASCII character "A" has a Unicode code point of U+0041. In UTF-8, since the code point for "A" falls within the ASCII range (U+0000 to U+007F), it can be represented using a single byte. The UTF-8 binary representation of "A" is: 01000001.

Non-ASCII Character "中": The non-ASCII character "中" has a Unicode code point of U+4E2D. In UTF-8, since the code point for "中" is outside the ASCII range, it requires multiple bytes for representation. The UTF-8 binary representation of "中" is: 11100100 10111000 10101101. Here, the first byte starts with three leading '1' bits followed by a '0' bit (indicating a three-byte sequence), while the subsequent bytes start with '10'.

UTF comparison

Base Encoding

Base encoding is the process of representing data or information in a specific numerical base. The most common encoding in computing is the base-2 encoding, where we represent data using only two symbols: 0 and 1.

Base64 is an encoding scheme that represents binary data as an ASCII string. It uses a set of 64 characters: the uppercase and lowercase letters of the alphabet, the digits, the "+" symbol, and the "/" symbol. It also uses the "=" symbol as padding, to ensure that the length of the resulting encoded string is a multiple of 4 characters.

Here's how the conversion from binary data to Base64 works (Base64 encoding):

  • Input Binary Data: The binary data is divided into groups of 3 bytes.
  • Split Into 6-bit Chunks: The three 8-bit bytes of a group are joined into one 24-bit value, which is then split into four 6-bit chunks.
  • Map to Base64: Each 6-bit chunk is mapped to the corresponding character from the Base64 character set.
  • Padding: If the length of the input data is not evenly divisible by 3 (i.e., the last group has fewer than 3 bytes), padding is added to ensure that the length of the encoded string is a multiple of 4 characters.

The process to get binary data back from a string encoded in Base64 will be the reverse process of this, and it's called Base64 decoding .

Base64 table defined

For example, consider the ASCII characters "Man", which have the 8-bit binary values 01001101, 01100001, and 01101110, respectively. Joining the bytes together gives 010011010110000101101110. We then split this into the 6-bit chunks 010011 010110 000101 101110, which map to T, W, F, and u, respectively. Thus, "Man" in ASCII becomes "TWFu" when Base64 encoded.
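
The encoding steps can be written out by hand and checked against Python's standard `base64` module. The function `base64_encode` below is a teaching sketch with an illustrative name; real code would simply call `base64.b64encode`.

```python
import base64

# The 64-character Base64 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def base64_encode(data):
    """Encode bytes as Base64 by following the 3-byte / 6-bit chunk procedure."""
    encoded = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Join the bytes of the group into one bit string.
        bits = "".join(format(byte, "08b") for byte in group)
        # A short final group is filled with zero bits up to a multiple of 6.
        bits += "0" * (-len(bits) % 6)
        chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
        # Pad with "=" so that every group produces exactly 4 characters.
        encoded.append("".join(chars) + "=" * (4 - len(chars)))
    return "".join(encoded)


print(base64_encode(b"Man"))                                      # TWFu
print(base64_encode(b"Ma"))                                       # TWE=
print(base64_encode(b"Man") == base64.b64encode(b"Man").decode()) # True
```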

Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication. It involves the creation and study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information". [ 1 ]

A primary goal of data visualization is to communicate information clearly and efficiently via statistical graphics, plots and information graphics. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables. [ 1 ]

Data visualization is both an art and a science. It is viewed as a branch of descriptive statistics by some, but also as a grounded theory development tool by others. Increased amounts of data created by Internet activity and an expanding number of sensors in the environment are referred to as "big data" or Internet of things. Processing, analyzing and communicating this data present ethical and analytical challenges for data visualization. The field of data science and practitioners called data scientists help address this challenge. [ 1 ]

This is a collection of our favorite visualizations, infographics, and other projects built on open data from Wikipedia and other Wikimedia projects, curated by Stephen LaPorte and Mahmoud Hashemi.

To introduce this subject, let us consider an example that may help you to understand more clearly the idea of representing one thing by another. Take the word cat . It refers to a class of animals, often kept as pets by humans, whose members have certain common characteristics, such as that they have claws, fur, and make purring noises. It is unlikely that you would ever confuse the word cat with the species that it represents or with any particular member of that species.

Digression: At the risk of becoming pedantic, let us go one step farther. Consider that which appears, centered on the screen (or page), between here and the next paragraph.

cat

Is what appears immediately above the word cat itself, or is it just a representation of that word, formed by a pattern of black and white pixels on your computer screen (or ink stains on a sheet of paper, if you're reading a "hard copy" version of this document)? The point is that one could reasonably view each occurrence of the character sequence cat (or any similar sequence that spells some word) appearing on a page, or a computer screen, or a blackboard, etc., as simply a representation of the corresponding word. End of Digression.

Few people would confuse the word cat with the type of animal to which it refers, but many people routinely confuse numerals with the numbers that they represent. For example, consider

35024

This is a five-digit numeral that represents the same number as is represented by the phrase thirty-five thousand twenty-four (which can also be considered to be a numeral!). Just as words refer to (or represent) objects, actions, and various other concepts, numerals refer to (or represent) numbers. In our day-to-day lives, most of us rarely need to make such subtle distinctions. But because computers store representations of concepts, and manipulate those representations, a good understanding of computers requires that you appreciate the difference between a thing and a representation thereof.

Computers are capable of storing and processing data of many different kinds. Among the most common types of data are numeric, textual (composed of characters), logical (i.e., true and false values), visual (i.e., images), and audio (i.e., sound). Yet computers store all data in terms of only 0's and 1's! (Or at least that's the point of view taken by computer scientists. The physical manifestation of those 0's and 1's (i.e., by what means the 0's and 1's are represented on whatever physical medium they are stored) is the concern of people who work at levels of abstraction closer to physical reality, such as electronics engineers and physicists.)

How can so many different kinds of data all be expressed in terms of 0's and 1's? The answer lies in encoding schemes!

Numeric Data

Unsigned Integers

We begin by considering unsigned (i.e., nonnegative) integers, or the so-called natural numbers. Most peoples of the world employ the decimal (or base ten) numeral system. In this system, the ten distinct symbols 0, 1, 2, ..., 9 (also called the decimal digits) represent the numbers zero through nine. To express larger numbers, we form sequences of digits and follow the convention that the "worth" of each digit in such a sequence depends not only upon which digit it is (i.e., 4 vs. 7) but also upon its position within the sequence. (Sometimes this is called positional notation.)

More specifically, the positions become increasingly significant as we go from right to left. We say that the rightmost digit is in the 1's column, its neighbor to the left is in the 10's column, the next digit to the left is in the 100's column, the next is in the 1000's column, etc. That is, the weights, or place values, of the columns are the powers of 10 (i.e., 1 (or 10^0), 10 (or 10^1), 100 (or 10^2), 1000 (or 10^3), etc.). Here is an illustration for the numeral 7326:

This numeral means the same thing as

(7 × 1000) + (3 × 100) + (2 × 10) + (6 × 1)

This system works quite nicely because every nonnegative integer can be expressed as a sum of the form (d_k × 10^k) + (d_(k-1) × 10^(k-1)) + ... + (d_1 × 10^1) + (d_0 × 10^0) for some natural number k, where each d_i is a decimal digit (i.e., one of 0, 1, 2, ..., 9). Hence, each such number can be represented by the corresponding numeral

d_k d_(k-1) ... d_1 d_0

Why do we use ten as the base of our numeral system? Is there something inherent about ten that makes it better than any other choice? No! Rather, anthropologists point to evidence that many ancient civilizations adopted counting systems convenient for counting on the hands, which have ten fingers.

We could, for example, just as well use eight as the base (giving rise to the octal system) or 16 (giving rise to the hexadecimal system) or any other integer greater than 1. (There is such a thing as the base 1 (or unary) system, although it is not entirely analogous.)

As an example, consider the octal (i.e., base 8) system. In this system, numerals are formed from the (eight) digits 0 through 7 and the column weights are the powers of eight (1 = 8^0, 8 = 8^1, 64 = 8^2, 512 = 8^3, etc.). Take, for example, the octal numeral 5207:

Analogous to the decimal numeral example above, we calculate (using base 10 numerals!) that the number represented by the octal numeral 5207 is (5 × 512) + (2 × 64) + (0 × 8) + (7 × 1), which works out to 2695 (expressed in decimal). That is, we have

5207_8 = 2695_10

Note that we place a (decimal numeral) subscript to the right of a numeral in order to indicate its base explicitly.
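
The same weighted-sum calculation works for any base. Here is a minimal Python sketch of it; `numeral_to_number` and `DIGITS` are illustrative names, and bases above 16 are left out for simplicity. Python's built-in `int(numeral, base)` performs the same conversion.

```python
DIGITS = "0123456789ABCDEF"  # digit symbols for bases up to 16


def numeral_to_number(numeral, base):
    """Sum digit x column weight, where the weights are powers of the base."""
    values = [DIGITS.index(symbol) for symbol in numeral.upper()]
    if any(value >= base for value in values):
        raise ValueError("digit not valid in base {}".format(base))
    # The rightmost digit has weight base**0, the next base**1, and so on.
    return sum(value * base ** position
               for position, value in enumerate(reversed(values)))


print(numeral_to_number("7326", 10))  # 7326
print(numeral_to_number("5207", 8))   # 2695
print(numeral_to_number("FF", 16))    # 255
```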

For reasons having to do with the concerns of engineering (such as reliability and cost), devices on which digital data are stored are built in such a way that each atomic unit of memory/storage is a switch, meaning that, at any moment in time, it is in one of two possible states. By convention, we refer to these states as 0 and 1, which, of course, correspond to the two digits that are available in the binary (or base 2) numeral system. One might call each of these a binary digit, from which we get the contraction bit. It would seem natural, then, for computers to employ the binary numeral system for representing numbers.

As an example, take the binary numeral 10100110_2:

Notice that the column weights are the powers of two. Analogous to the examples above, we have that 10100110_2 represents the number corresponding to the sum (expressed in decimal numerals)

(1 × 128) + (0 × 64) + (1 × 32) + (0 × 16) + (0 × 8) + (1 × 4) + (1 × 2) + (0 × 1)

which comes out (in decimal) to 166.

In general, to translate a binary numeral into its decimal equivalent, do exactly as we did in arriving at 166 in the above example: simply add up the weights of the columns in which the binary numeral contains 1's.

Translating from decimal to binary is only a little more difficult. Perhaps the most intuitively appealing approach is to find the powers of two that sum up to the desired number. We illustrate this with an example: Suppose that we want to express the number 75 (here expressed in decimal notation, as usual) in binary notation. First find the largest power of two that is less than or equal to 75. That would be 64 (or 2^6), because the next higher power of two is 128, which is too big. As 75 − 64 = 11, it remains to find powers of two that sum to 11. Following the same technique as before, find the largest power of two no greater than 11. That would be 8 (or 2^3). As 11 − 8 = 3, it remains to find powers of two summing to 3. The largest power of two no greater than 3 is 2 (or 2^1). As 3 − 2 = 1, it remains to find powers of two summing to 1. The largest power of two no greater than 1 is 1 (or 2^0). As 1 − 1 = 0, we are done. What we have determined is that 75 can be written as the sum of powers of two as follows:

75 = 64 + 8 + 2 + 1

which is to say that the binary representation of 75 has 1's in the 64's, 8's, 2's and 1's columns and 0's in every other column. Omitting leading 0's (in the columns with weights greater than 64), this yields

That is, the binary numeral we seek is 1001011_2.

Arithmetic Operations

For a computer to be useful as a "number cruncher", it needs not only to be able to encode integer values, but also to be able to perform arithmetic operations upon them. How can addition, for example, be carried out upon numbers encoded using the binary numeral system? Well, it turns out that addition, as well as the other arithmetic operations, can be performed in binary (or any other base) similarly to how humans perform it in decimal.

Here is an example:
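
As a sketch of such an addition, the Python below first converts two numbers to binary with the largest-power-of-two procedure described earlier, then adds the numerals column by column from right to left, carrying into the next column just as in decimal addition. The function names and the choice of 75 and 11 as operands are illustrative assumptions.

```python
def to_binary(number):
    """Greedy conversion: repeatedly take the largest power of two that fits."""
    if number == 0:
        return "0"
    power = 1
    while power * 2 <= number:   # find the largest power of two <= number
        power *= 2
    bits = []
    while power >= 1:
        if power <= number:      # this power is part of the sum: write a 1
            bits.append("1")
            number -= power
        else:                    # otherwise this column holds a 0
            bits.append("0")
        power //= 2
    return "".join(bits)


def add_binary(a, b):
    """Add two binary numerals column by column, carrying as in decimal."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    columns = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry
        columns.append(str(column_sum % 2))  # digit written in this column
        carry = column_sum // 2              # 1 is carried when the sum is 2 or 3
    if carry:
        columns.append("1")
    return "".join(reversed(columns))


print(to_binary(75))                             # 1001011
print(to_binary(11))                             # 1011
print(add_binary(to_binary(75), to_binary(11)))  # 1010110, which is 86
```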

The larger point being made here is that, regardless of how many bits are chosen as being the "standard size" for representing integers (or any other type of data), the set of values that is encodable inside any fixed-length chunk of storage is finite. Hence, if the (accurate) result of some particular computation is outside this set, the result that actually gets stored will be in error. For example, if we are working in the realm of 8-bit numerals represented using the 2's complement scheme and we try to add 95 (01011111_2) and 67 (01000011_2), we cannot get the correct result (162), simply because that value is outside the range (namely, -128 to +127) of values representable using 2's complement 8-bit numerals.

Real Numbers

A detailed discussion of how real numbers are encoded is omitted for now. But we note that, like integers, real numbers are typically stored in fixed-length chunks of memory, typically either 32 or 64 bits. As with integers, this limits the range of possible values that can be represented. In addition, however, it limits the precision or accuracy with which real numbers can be stored. For example, in the most common 32-bit representation scheme for real numbers (called single-precision floating point), we cannot accurately represent numbers with more than seven significant (decimal) digits. Hence, for example, the closest we could come (using 32 bits) to representing the number 53.000006372 (having eleven significant digits) might be something closer to 53.00001 (which has only seven digits and is rounded to the nearest one hundred thousandth). Indeed, if the computer were instructed to add 53.0 and 0.000006372, the result would likely be 53.00001.
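
Both limits can be observed from Python, whose own integers are unbounded and whose floats are 64-bit. The sketch below therefore simulates the narrower formats. The helper names are illustrative. `wrap_to_int8` assumes the common behavior of keeping only the low 8 bits of a result, and `to_float32` uses the standard `struct` module to round a value to single precision.

```python
import struct


def wrap_to_int8(value):
    """Keep the low 8 bits and read them as an 8-bit 2's complement integer."""
    low_bits = value & 0xFF
    return low_bits - 256 if low_bits >= 128 else low_bits


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


print(wrap_to_int8(95 + 67))     # -94 rather than the true sum 162
stored = to_float32(53.000006372)
print(stored)                    # close to, but not exactly, 53.000006372
print(stored == 53.000006372)    # False
```

The stored single-precision value agrees with the original only to about seven significant digits, which is the precision limit the text describes.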

Extended ASCII extends regular ASCII by using an eighth bit, thereby resulting in a coding scheme for 256 (2^8) different characters.

In color images , each pixel has a color. Following the RGB color model , in which red, green, and blue are the primary colors, each pixel's appearance can be described by an RGB triple that describes the intensities of red, green, and blue, respectively, present in that pixel. One standard representation, called truecolor , uses 24 bits to store the RGB value of each pixel, eight bits for each of the three components (which, of course, are viewed as integers in the range 0..255). Each cell in the table below is labeled with the RGB value of its background color.

255,0,0 255,127,0 255,255,0 255,127,127 255,255,127 255,0,127
0,255,0 127,255,0 255,0,255 127,255,127 32,32,32 127,127,127
0,0,255 127,0,255 127,127,255 0,127,127 255,255,255

If you want to play with different combinations of RGB values to see what colors they give rise to, click here. (Note that there the color intensities are described on a scale from 0 to 1 (e.g., 0.64) rather than from 0 to 255.) A similar tool can be found here.

If you want to view lots of examples of colors and see how they are represented in RGB, click here. (Rather than using three decimal numerals to describe the intensities of red, green, and blue, however, on this site an RGB value is shown as a six-digit hexadecimal (base 16) numeral, with the first two digits giving red's intensity, the next two digits green's intensity, and the last two blue's intensity. A two-digit hexadecimal numeral can represent any integer in the range 0..255. In hexadecimal, we use A through F as "digits" corresponding to values 10 through 15, respectively.)

So far we've talked about how individual pixels are represented. What about an image as a whole? Remember, an image is just a two-dimensional grid of pixels, or rows and columns of pixels. To encode an image as a whole, we can "linearize" the two-dimensional grid into a sequence of pixels by, for example, starting with the first row of pixels, then moving to the second, and then to the third, etc. For example, consider the 5 × 5 table below, which is supposed to illustrate an image with five rows and five columns of pixels. (The image forms a somewhat crude upper case N .)
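
Row-by-row linearization can be sketched as follows. The pixel grid here is a hypothetical stand-in for the 5 × 5 "N" image (1 for a dark pixel, 0 for a light one), not the original table, and the function names are illustrative.

```python
# Hypothetical 5 x 5 image of a crude "N": 1 = dark pixel, 0 = light pixel.
IMAGE = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
]


def linearize(grid):
    """Flatten a grid into one sequence: first row, then second row, and so on."""
    return [pixel for row in grid for pixel in row]


def delinearize(pixels, width):
    """Rebuild the grid from the sequence, given the number of columns."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


sequence = linearize(IMAGE)
print("".join(str(pixel) for pixel in sequence))
# 1000111001101011001110001
print(delinearize(sequence, 5) == IMAGE)  # True
```

The sequence alone is not enough to recover the image: the decoder also needs the width (or the number of rows), which is why image file formats store the dimensions alongside the pixel data.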

A compression technique is said to be lossless if it can be reversed, meaning that data compressed using that technique can be decompressed to recover the original representation. A compression technique is said to be lossy if, in general, it cannot be reversed, which is to say that decompression will yield something close to the original representation, but (probably) not matching it exactly. Because the human vision system has only a certain degree of sensitivity, and hence cannot distinguish two images that differ only in subtle ways, most compression techniques that are used for digital images are lossy. The same is true for representations of audio (e.g., music). In contrast, to use lossy compression on numeric or textual data could be disastrous, because, for most applications, it is imperative that that kind of data be recoverable in exact form.

While data comes in many forms, mathematical models are limited to real numbers. As a result, we often have to engineer our inputs prior to model development and inference. Data representation is best illustrated with an example.

This is what the top 3 rows of our dataset looks like -- we can assume that we have at least a few thousand observations. The goal is to train a probabilistic model to determine each person's likelihood of buying a Magazine subscription after being given a free trial.

customer_id | purchased_subscription | income_annualized | age | review | linked_payment_method

We want to predict purchased_subscription using the other columns as predictors.

  • income_annualized is a continuous variable between 0 and 10 million
  • age is a positive integer under 120
  • review is on the scale of 1 to 5 (1 being the worst, 5 being the best)
  • linked_payment_method is one of PAYPAL, CREDIT_CARD or NONE if no payment method is linked
  • customer_id is only for identification purposes (excluded from model)

Continuous Variables

Income is the only continuous variable out of our predictors. We can keep income as it is or transform it to another set of real numbers (for instance, we can divide income by 1000 or take its logarithm value). In either case, income remains a real number and needs no further preprocessing.

We can also group income into buckets if we think people in the same income bucket share similar purchasing behavior. For instance, any income under 30,000 would be Group 1, any income between 30,000 and 50,000 would be Group 2, etc. This effectively turns income into a categorical variable .
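A minimal sketch of these income transformations follows. The first two bucket boundaries come from the text; treating everything above 50,000 as Group 3 is an assumption made for illustration, as is using `log1p` so that an income of 0 is handled.

```python
import math


def income_in_thousands(income):
    """Rescale income; the result is still a real number."""
    return income / 1000


def log_income(income):
    """Log-transform income. log1p(x) = log(1 + x) is an assumed choice so 0 works."""
    return math.log1p(income)


def income_bucket(income):
    """Turn income into a categorical group number."""
    if income < 30_000:
        return 1
    if income <= 50_000:
        return 2
    return 3  # assumed single bucket for all higher incomes


for income in (25_000, 45_000, 78_000):
    print(income, income_in_thousands(income), income_bucket(income))
```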

Categorical Variables

linked_payment_method is a categorical variable represented as a string. If we think that all linked payment types are equal, we can represent payment as a binary indicator (payment linked versus no payment linked). If we think that linking Paypal affects Magazine purchase differently than linking Credit Card (maybe there's a hefty fee for Paypal?), then we should divide this into three classes: no payment linked, paypal and credit card.

Let's go with the second assumption. The next step is to expand our single linked_payment_method predictor into 3 different predictors: no_payment_linked, credit_card_linked, paypal_linked . Each of these predictors will evaluate to 1 if the customer is linked to that method; otherwise, it will evaluate to 0. The sum of these three predictors should equal one for every observation.

This method of enumerating categorical variables is known as one-hot encoding . If we one-hot encoded our categorical variable linked_payment_method , our data would look like:

customer_id | purchased_subscription | income_annualized | age | review | no_payment_linked | credit_card_linked | paypal_linked
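One-hot encoding of linked_payment_method can be sketched in plain Python as below. The helper name `one_hot_payment` is illustrative; in practice a data-frame or machine-learning library would usually do this step.

```python
# Map each category to the name of its indicator column (names from the text).
COLUMN_FOR_METHOD = {
    "NONE": "no_payment_linked",
    "CREDIT_CARD": "credit_card_linked",
    "PAYPAL": "paypal_linked",
}


def one_hot_payment(method):
    """Return the three indicator predictors for one customer."""
    if method not in COLUMN_FOR_METHOD:
        raise ValueError("unknown payment method: " + repr(method))
    return {
        column: int(category == method)
        for category, column in COLUMN_FOR_METHOD.items()
    }


encoded = one_hot_payment("PAYPAL")
print(encoded)
# {'no_payment_linked': 0, 'credit_card_linked': 0, 'paypal_linked': 1}
print(sum(encoded.values()))  # 1, as required for every observation
```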

Ordinal Variables

review is an ordinal variable. Ordinal variables are categorical variables where the possible values are ordered. We have the flexibility here to leave them as is or to one-hot encode them. If we one-hot encode them, we will lose information about the order.

Interaction Effects between Variables

The effect of certain variables might differ based on the values of another variable. For instance, the purchase behavior of a 50+ year old making 150,000 a year might be different from the purchase behavior of a 20-year-old making the same amount of money.

While many models and algorithms can learn these relationships implicitly, it is sometimes better to create new model features to explicitly capture this behavior.

Interaction between Continuous Variables

We can treat both age and income as continuous variables and define a new variable, age_income_interaction, as their product. We can then use age_income_interaction as a predictor in our model.

customer_id | income_annualized | age | age_income_interaction
12 | 78,000 | 49 | 49 x 78,000 = 3,822,000
328 | 45,000 | 21 | 21 x 45,000 = 945,000
89 | 120,000 | 50 | 50 x 120,000 = 6,000,000
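The product interaction in the table above can be reproduced with a short sketch. The list-of-dictionaries layout is an assumption for illustration; the values are those shown in the table.

```python
customers = [
    {"customer_id": 12, "income_annualized": 78_000, "age": 49},
    {"customer_id": 328, "income_annualized": 45_000, "age": 21},
    {"customer_id": 89, "income_annualized": 120_000, "age": 50},
]

# Add the interaction predictor as the product of the two continuous variables.
for row in customers:
    row["age_income_interaction"] = row["age"] * row["income_annualized"]

for row in customers:
    print(row["customer_id"], row["age_income_interaction"])
# 12 3822000
# 328 945000
# 89 6000000
```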

Interaction between Categorical Variables

Alternatively, we can define age_income_interaction as a bucket that captures the combination of an age range and an income range. We would then one-hot encode this predictor, just as we did with the linked_payment_method predictor.

                | Age                 | Age                 | Age > 50
Income          | Age-Income Bucket 1 | Age-Income Bucket 2 | Age-Income Bucket 3
Income > 50,000 | Age-Income Bucket 4 | Age-Income Bucket 5 | Age-Income Bucket 6

Interaction between Categorical and Continuous Variable

Assume we want to stick with our buckets for income , but keep age as a continuous variable. This creates new predictors income_bucket_1_age and income_bucket_2_age . income_bucket_1_age will reflect the customer's age if they fall under bucket 1 and 0 otherwise. Similarly, income_bucket_2_age will reflect the customer's age if they fall under bucket 2 and 0 otherwise.
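A sketch of this categorical-by-continuous interaction is shown below. The bucket boundaries reuse the income groups given earlier (under 30,000 and 30,000 to 50,000); the third bucket and the helper names are assumptions for illustration.

```python
def income_bucket(income):
    """Income groups from the earlier example; bucket 3 is an assumed catch-all."""
    if income < 30_000:
        return 1
    if income <= 50_000:
        return 2
    return 3


def bucket_age_features(income, age, n_buckets=3):
    """Return income_bucket_k_age predictors: age in the customer's bucket, 0 elsewhere."""
    bucket = income_bucket(income)
    return {
        "income_bucket_{}_age".format(k): (age if k == bucket else 0)
        for k in range(1, n_buckets + 1)
    }


print(bucket_age_features(45_000, 21))
# {'income_bucket_1_age': 0, 'income_bucket_2_age': 21, 'income_bucket_3_age': 0}
```

This lets a model fit a separate age effect within each income bucket.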

While these examples deal with interactions between two variables, we can capture interactions for any number of variables. For instance, we can use the interaction among age, income, and review as a predictor.


Data Representation: Definition, Types, Examples

Data Representation: Data representation is a technique for analysing numerical data. The relationship between facts, ideas, information, and concepts is depicted in a diagram via data representation. It is a fundamental learning strategy that is simple and easy to understand. It is always determined by the data type in a specific domain. Graphical representations are available in many different shapes and sizes.

In mathematics, a graph is a chart in which statistical data is represented by curves or lines drawn across the coordinate points indicated on its surface. It aids in the investigation of a relationship between two variables by allowing one to evaluate the change in one variable's amount in relation to another over time. It is useful for analysing series and frequency distributions in a given context. On this page, we will go through several types of graphs that can be used to display data graphically.

Data Representation in Maths

Definition: After collecting the data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data.

Any information gathered may be organised in a frequency distribution table, and then shown using pictographs or bar graphs. A bar graph is a representation of numbers made up of equally wide bars whose lengths are determined by the frequency and scale you choose.

The collected raw data can be placed in any one of the given ways:

  • Serial order or alphabetical order
  • Ascending order
  • Descending order

Data Representation Example

Example: Let the marks obtained by \(30\) students of class VIII in a class test, out of \(50\), according to their roll numbers, be:



The data in the given form is known as raw data or ungrouped data. The above-given data can be placed in the serial order as shown below:

Data Representation Example

Now, say you want to analyse the standard of achievement of the students. Arranging the marks in ascending or descending order will give you a better picture.

Ascending order:



Descending order:



Raw data placed in ascending or descending order of magnitude is known as an array or arrayed data.

Graph Representation in Data Structure

A few graphical representations of data are given below:

  • Frequency distribution table

Pictorial Representation of Data: Bar Chart

The bar graph represents qualitative data visually. The information is displayed horizontally or vertically and compares items like amounts, characteristics, times, and frequency.

The bars are arranged in order of frequency, so more critical categories are emphasised. By looking at all the bars, it is easy to tell which categories in a set of data dominate the others. Bar graphs come in several forms, such as single, stacked, or grouped.

Bar Chart

Graphical Representation of Data: Frequency Distribution Table

A frequency table or frequency distribution is a method to present raw data in which one can easily understand the information contained in the raw data.

The frequency distribution table is constructed by using the tally marks. Tally marks are a form of a numerical system with the vertical lines used for counting. The cross line is placed over the four lines to get a total of \(5\).

Frequency Distribution Table

Consider a jar containing the different colours of pieces of bread as shown below:

Frequency Distribution Table Example

Construct a frequency distribution table for the data mentioned above.

Frequency Distribution Table Example

Graphical Representation of Data: Histogram

The histogram is another kind of graph that uses bars in its display. The histogram is used for quantitative data: ranges of values, known as classes, are listed at the bottom, and the classes with greater frequencies have the taller bars.

A histogram and the bar graph look very similar; however, they are different because of the data level. Bar graphs measure the frequency of the categorical data. A categorical variable has two or more categories, such as gender or hair colour.
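To make the idea of classes concrete, the sketch below groups quantitative values into class intervals and prints a text histogram. The heights are hypothetical sample data, and counting a boundary value such as 135 in the upper class (135 to 140) is an assumed convention.

```python
def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical heights in cm, not taken from the article.
heights = [131, 134, 135, 138, 139, 141, 144, 147, 152, 133]

for lower, upper, count in class_frequencies(heights, start=130, width=5, n_classes=5):
    print("{}-{}: {} {}".format(lower, upper, "#" * count, count))
# 130-135: ### 3
# 135-140: ### 3
# 140-145: ## 2
# 145-150: # 1
# 150-155: # 1
```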


Graphical Representation of Data: Pie Chart

The pie chart is used to represent the numerical proportions of a dataset. This graph involves dividing a circle into different sectors, where each of the sectors represents the proportion of a particular element as a whole. Thus, it is also known as a circle chart or circle graph.

Pie Chart

Graphical Representation of Data: Line Graph

A graph that uses points and lines to represent change over time is defined as a line graph. In other words, it is the chart that shows a line joining multiple points or a line that shows the link between the points.

The diagram illustrates the quantitative data between two changing variables with a straight line or curve that joins a series of successive data points. Line graphs compare two variables on the vertical and the horizontal axis.

Line Graph

General Rules for Visual Representation of Data

We have a few rules to present the information in the graphical representation effectively, and they are given below:

  • Suitable Title:  Ensure that the appropriate title is given to the graph, indicating the presentation’s subject.
  • Measurement Unit:  Introduce the measurement unit in the graph.
  • Proper Scale:  To represent the data accurately, choose an appropriate scale.
  • Index:  Use the index to explain the colours, shades, lines, and designs used in the graph for better understanding.
  • Data Sources:  At the bottom of the graph, include the source of information wherever necessary.
  • Keep it Simple:  Build the graph in a way that everyone should understand easily.
  • Neat:  Choose the size, fonts, colours, etc. so that the graph presents the information clearly.

Solved Examples on Data Representation

Q.1. Construct the frequency distribution table for the data on heights in \(({\rm{cm}})\) of \(20\) boys using the class intervals \(130 – 135,135 – 140\) and so on. The heights of the boys in \({\rm{cm}}\) are: 

Data Representation Example 1

Ans: The frequency distribution for the above data can be constructed as follows:

Data Representation Example

Q.2. Write the steps for constructing a bar graph.
Ans: To construct the bar graph, follow the given steps:
1. Take a graph paper, draw two lines perpendicular to each other, and call them the horizontal and vertical axes.
2. Mark the information given in the data, like days, weeks, months, years, places, etc., at uniform gaps along the horizontal axis.
3. Choose a suitable scale to decide the heights of the rectangles or bars, and mark the heights on the vertical axis.
4. Draw the bars or rectangles on the horizontal axis with equal width and equal spacing, using the heights marked in the previous step.
The figure so obtained will be the bar graph representing the given numerical data.

Q.3. Read the bar graph and then answer the given questions:
I. Write the information provided by the given bar graph.
II. What is the order of change of the number of students over several years?
III. In which year is the increase in the number of students maximum?
IV. State whether true or false: The enrolment during \(1996 – 97\) is double that of \(1995 – 96\).

pictorial representation of data

Ans:
I. The bar graph represents the number of students in class \({\rm{VI}}\) of a school during the academic years \(1995 – 96\) to \(1999 – 2000\).
II. The number of students is increasing, as the heights of the bars are growing.
III. The increase in the number of students is uniform, as the increase in the height of the bars is uniform. Hence, in this case, the increase is not maximum in any one of the years.
IV. The enrolment in \(1996 – 97\) is \(200\), and the enrolment in \(1995 – 96\) is \(150\). The enrolment in \(1996 – 97\) is not double the enrolment in \(1995 – 96\), so the statement is false.

Q.4. Write the frequency distribution for the given ages of \(25\) students of class VIII in a school.
\(15,\,16,\,16,\,14,\,17,\,17,\,16,\,15,\,15,\,16,\,16,\,17,\,15\)
\(16,\,16,\,14,\,16,\,15,\,14,\,15,\,16,\,16,\,15,\,14,\,15\)
Ans: Frequency distribution of ages of \(25\) students:

Data Representation Example
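The frequency distribution for the ages in Q.4 can also be worked out with a short Python sketch. Writing a completed group of five tally marks as `||||/` is an assumed plain-text stand-in for four vertical lines with a cross line.

```python
from collections import Counter

ages = [15, 16, 16, 14, 17, 17, 16, 15, 15, 16, 16, 17, 15,
        16, 16, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15]


def tally(count):
    """Render a count as tally marks, grouping every five marks together."""
    groups = ["||||/"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)


frequency = Counter(ages)
for age in sorted(frequency):
    print(age, tally(frequency[age]), frequency[age])
# 14 |||| 4
# 15 ||||/ ||| 8
# 16 ||||/ ||||/ 10
# 17 ||| 3
```

The frequencies add up to 25, the number of students, which is a quick check that no value was missed.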

Q.5. There are \(20\) students in a classroom. The teacher asked the students to talk about their favourite subjects. The results are listed below:

Data Representation Example

By looking at the above data, which is the most liked subject? Ans: Representing the above data in the frequency distribution table by using tally marks as follows:

Data Representation Example

From the above table, we can see that the maximum number of students \((7)\) likes mathematics.


In this article, we discussed data representation with an example. We then covered graphical representations such as the bar graph, frequency table, and pie chart, followed by the general rules for graphical representation. Finally, you can find solved examples along with a few FAQs. These will help you gain further clarity on this topic.

FAQs on Data Representation

Q.1: How is data represented? A: The collected data can be expressed in various ways like bar graphs, pictographs, frequency tables, line graphs, pie charts and many more. It depends on the purpose of the data, and accordingly, the type of graph can be chosen.

Q.2: What are the different types of data representation? A: A few types of data representation are given below: 1. Frequency distribution table 2. Bar graph 3. Histogram 4. Line graph 5. Pie chart

Q.3: What is data representation, and why is it essential? A: After collecting the data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data. Importance: Data visualization gives us a clear understanding of what the information means by displaying it visually through maps or graphs. This makes the data easier to comprehend and makes it easier to identify trends and outliers within large data sets.

Q.4: What is the difference between data and representation? A: The term data refers to a collection of specific facts, often quantitative, such as heights or the number of children. Data representation is that data after it has been processed, arranged, and presented in a form that gives it meaning.

Q.5: Why do we use data representation? A: Data visualization gives us a clear understanding of what the information means by displaying it visually through maps or graphs. This makes the data easier to comprehend and makes it easier to identify trends and outliers within large data sets.


The process of collecting data and analyzing it in large quantities is known as statistics. It is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of numerical facts and figures.

Statistics helps us to collect and analyze data in large quantities. It is based on two concepts:

  • Statistical Data 
  • Statistical Science

Statistics must be expressed numerically and should be collected systematically.

Data Representation

The word data refers to facts about people, things, events, or ideas. A piece of data can be a title, an integer, or anything else. After collecting data, the investigator has to condense them in tabular form to study their salient features. Such an arrangement is known as the presentation of data.

Data representation refers to the process of condensing the collected data in tabular or graphical form.

The raw data can be placed in different orders: ascending order, descending order, or alphabetical order.

Example: Let the marks obtained by 10 students of class V in a class test, out of 50, according to their roll numbers, be:

39, 44, 49, 40, 22, 10, 45, 38, 15, 50

The data in the given form is known as raw data. The above data can be placed in serial order as shown below:

Roll No. | Marks
1 | 39
2 | 44
3 | 49
4 | 40
5 | 22
6 | 10
7 | 45
8 | 38
9 | 15
10 | 50

Now, suppose you want to analyse the standard of achievement of the students. Arranging the marks in ascending or descending order will give you a better picture.

Ascending order: 10, 15, 22, 38, 39, 40, 44, 45, 49, 50

Descending order: 50, 49, 45, 44, 40, 39, 38, 22, 15, 10

Raw data placed in ascending or descending order is known as arrayed data.
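Arranging the raw marks from this example into arrayed data takes one call each way in Python. This is a minimal sketch using the built-in `sorted` function; the variable names are illustrative.

```python
# Raw data: marks listed by roll number, as given in the example.
raw_marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]

ascending = sorted(raw_marks)
descending = sorted(raw_marks, reverse=True)

print(ascending)    # [10, 15, 22, 38, 39, 40, 44, 45, 49, 50]
print(descending)   # [50, 49, 45, 44, 40, 39, 38, 22, 15, 10]
# Once arrayed, the lowest and highest marks can be read off directly.
print(ascending[0], ascending[-1])  # 10 50
```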

Types of Graphical Data Representation

A bar chart helps us to represent the collected data visually. The collected data, such as amounts and frequencies, can be shown horizontally or vertically in a bar chart. The bars can be grouped or single. A bar chart helps us in comparing different items: by looking at all the bars, it is easy to say which categories in a group of data dominate the others.

Let us understand the bar chart with an example. Let the marks obtained by 5 students of class V in a class test, out of 10, according to their names, be: 7, 8, 4, 9, 6. The data in the given form is known as raw data. The data to be shown in the bar chart is:

Name | Marks
Akshay | 7
Maya | 8
Dhanvi | 4
Jaslen | 9
Muskan | 6
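As a rough illustration, the marks from this example can be drawn as a horizontal bar chart in plain text. The function name `bar_chart_lines` and the use of `#` for bar segments are assumptions of this sketch; a plotting library would normally be used instead.

```python
marks = {"Akshay": 7, "Maya": 8, "Dhanvi": 4, "Jaslen": 9, "Muskan": 6}


def bar_chart_lines(data, symbol="#"):
    """Return one text line per item; bar length is proportional to the value."""
    width = max(len(name) for name in data)
    return [
        "{} | {} {}".format(name.ljust(width), symbol * value, value)
        for name, value in data.items()
    ]


for line in bar_chart_lines(marks):
    print(line)
# Akshay | ####### 7
# Maya   | ######## 8
# Dhanvi | #### 4
# Jaslen | ######### 9
# Muskan | ###### 6
```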

A histogram is a graphical representation of data. It looks similar to a bar graph, but the two differ in the kind of data they show. A bar graph measures the frequency of categorical data, that is, data based on two or more categories such as gender or months, whereas a histogram is used for quantitative data.


A graph that uses lines and points to show change over time is known as a line graph. Line graphs can show, for example, the number of animals left on earth, the growth of the world's population, or the increase or decrease in the number of bitcoins day by day. Line graphs tell us about changes occurring over time, and a single line graph can show two or more such changes at once.


A pie chart is a type of graph that shows numerical proportions as sectors of a circle. In most cases it can be replaced by other plots such as a bar chart, box plot, or dot plot. Research has shown that it is difficult to compare the different sections of a given pie chart, or to compare data across different pie charts.

Frequency Distribution Table

A frequency distribution table is a chart that summarises the values in a data set and how often each occurs. The table has two columns: the first column lists the various outcomes in the data, while the second column lists the frequency of each outcome. Putting the data into such a table makes it easier to understand and analyze.

For Example: To create a frequency distribution table, we would first need to list all the outcomes in the data. In this example, the outcomes are 0 runs, 1 run, 2 runs, and 3 runs. We would list these numbers in numerical order in the first column. Next, we count how many times each outcome occurred. The team scored 0 runs in the 1st, 4th, 7th, and 8th innings, 1 run in the 2nd, 5th, and 9th innings, 2 runs in the 6th inning, and 3 runs in the 3rd inning. We put the frequency of each outcome in the second column. You can see that the table is a much more useful way to show this data.

Baseball Team Runs Per Inning

Number of Runs | Frequency
0 | 4
1 | 3
2 | 1
3 | 1
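The same two-column table can be built programmatically. In this sketch the list `runs_per_inning` is written out from the innings described in the example (1st through 9th), and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Runs scored in innings 1 through 9, as described in the example.
runs_per_inning = [0, 1, 3, 0, 1, 2, 0, 0, 1]


def frequency_table(outcomes):
    """Return (outcome, frequency) rows with outcomes in numerical order."""
    counts = Counter(outcomes)
    return [(outcome, counts[outcome]) for outcome in sorted(counts)]


print("Number of Runs | Frequency")
for runs, frequency in frequency_table(runs_per_inning):
    print("{} | {}".format(runs, frequency))
# 0 | 4
# 1 | 3
# 2 | 1
# 3 | 1
```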

Sample Questions

Question 1: The school fee submission status of 10 students of class 10 is given below:

Muskan  Paid
Kritika Not paid
Anmol Not paid
Raghav Paid
Nitin Paid
Dhanvi Paid
Jasleen Paid
Manas Not paid
Anshul Not paid
Sahil Paid
In order to draw the bar graph for the data above, we prepare the frequency table as given below.

Fee submission | No. of Students
Paid | 6
Not paid | 4

Now we have to represent the data using a bar graph. It can be drawn by following the steps given below:

Step 1: First, draw the two axes of the graph, the X-axis and the Y-axis. The categories of the data go on the X-axis (the horizontal line) and the frequencies of the data go on the Y-axis (the vertical line).

Step 2: Give a numeric scale to the Y-axis. It should start from zero and end at the highest value in the data.

Step 3: Choose a suitable interval for the numeric scale on the Y-axis, such as 0, 1, 2, 3, …, or 0, 10, 20, 30, …, or 0, 20, 40, 60, …

Step 4: Label the X-axis appropriately.

Step 5: Draw the bars according to the data, keeping in mind that all the bars should be of the same width and there should be the same distance between adjacent bars.

Question 2: Look at the following pie chart, which shows the money spent by Megha at the funfair. The colours indicate the amount paid for each item. The total value of the data is 15, and the amount paid for each item is as follows:

Chocolates – 3

Wafers – 3

Toys – 2

Rides – 7

To convert this into pie chart percentages, we apply the formula: (Frequency / Total Frequency) × 100. Let us convert the above data into percentages:

Amount paid on rides: (7/15) × 100 ≈ 47%
Amount paid on toys: (2/15) × 100 ≈ 13%
Amount paid on wafers: (3/15) × 100 = 20%
Amount paid on chocolates: (3/15) × 100 = 20%
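The percentage formula above can be checked with a short sketch. The sector angle is not part of the original question; it is included on the assumption that each sector of the circle is drawn in proportion to its share of 360 degrees.

```python
amounts = {"Chocolates": 3, "Wafers": 3, "Toys": 2, "Rides": 7}


def pie_shares(data):
    """Return {label: (percentage, sector angle in degrees)} for a pie chart."""
    total = sum(data.values())
    return {
        label: (value * 100 / total, value * 360 / total)
        for label, value in data.items()
    }


for label, (percent, angle) in pie_shares(amounts).items():
    print("{}: {}% ({} degrees)".format(label, round(percent), round(angle)))
# Chocolates: 20% (72 degrees)
# Wafers: 20% (72 degrees)
# Toys: 13% (48 degrees)
# Rides: 47% (168 degrees)
```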

Question 3: The line graph given below shows how Devdas's height changes as he grows. Observe the graph and answer the questions below.

(i) What was Devdas's height at 8 years? Answer: 65 inches
(ii) What was Devdas's height at 6 years? Answer: 50 inches
(iii) What was Devdas's height at 2 years? Answer: 35 inches
(iv) How much did Devdas grow from 2 to 8 years? Answer: 65 − 35 = 30 inches
(v) When was Devdas 35 inches tall? Answer: At 2 years.

