This specification extends the ImageBitmap interface so that JavaScript developers can read the underlying data out of an ImageBitmap and write external data into it, in any of a set of supported ImageFormats.

This document is not complete and is subject to change. Early experimentation is encouraged so that the Media Capture Task Force can evolve the specification based on technical discussions within the Task Force, implementation experience gained from early implementations, and feedback from other groups and individuals.


The ImageBitmap interface was originally designed as a purely opaque handle to an image data buffer inside a browser, so that how the buffer is stored is unknown to users and can be optimized for each platform.

Currently, the Media Capture Stream with Video Worker specification [mediacapture-worker] proposes a video processing framework on the web platform. It gives web applications a way to hook a processing script (which runs in a VideoWorker) to a MediaStreamTrack with video data so that each frame is processed accordingly. If a VideoWorker is successfully hooked to a track, then for each video frame of the track, the framework posts a VideoProcessEvent to the VideoWorker. A VideoProcessEvent has an inputImageBitmap property and an outputImageBitmap property, so that JavaScript developers can read the raw image data of the current video frame out of the inputImageBitmap and then, after some processing, write the processed image data back into the outputImageBitmap, which is handled by the framework thereafter. This specification extends the ImageBitmap interface with new APIs to read data from and write data back into an ImageBitmap.

The Media Capture Stream with Video Worker specification [mediacapture-worker] chooses ImageBitmap (instead of ImageData) as the container of video frames because the decoded video frame data might exist in either CPU or GPU memory, which matches the nature of ImageBitmap as an opaque handle.

Developers can process video frames in two possible ways: with JavaScript (or asm.js) or with WebGL.

  1. If developers use WebGL, they treat an ImageBitmap as an opaque container and pass it into the WebGL context; the browser handles how to upload the raw image data into GPU memory. Possibly, the data is already in GPU memory, so no further operation is needed.
  2. If developers use JavaScript (or asm.js) to process the frames, then new APIs to access the raw data inside an ImageBitmap are needed, since the current ImageBitmap interface provides no way for developers to do so.

In this specification, the original ImageBitmap interface is extended with four methods that let developers read data from and write data into an ImageBitmap in a set of supported ImageFormats. In addition, two interfaces, ImageFormatPixelLayout and ChannelPixelLayout, are proposed to work with the extended ImageBitmap methods and describe how the accessed image data is arranged in memory.


The ImageBitmap interface that this specification extends is defined in [[!HTML51]].

The Media Capture Stream with Video Worker specification [mediacapture-worker] proposes a video processing framework on the web platform.

The VideoWorker is defined in [mediacapture-worker].

The VideoProcessEvent is defined in [mediacapture-worker].

Image format

An image or a video frame is conceptually a two-dimensional array of data, and each element in the array is called a pixel. However, the pixels are usually stored in a one-dimensional array and can be arranged in a variety of ImageFormats. Developers need to know how the pixels are formatted so that they are able to process them. An ImageFormat describes how the pixels in an image are arranged, and all pixels in one single image are arranged in the same way.

A single pixel has at least one, but usually multiple, pixel values. The range of a pixel value varies, which means different ImageFormats use different data types to store a single pixel value. The most popular data type is the 8-bit unsigned integer, whose range is from 0 to 255; others include 16-bit integers, 32-bit floating-point numbers and so forth. The number of pixel values of a single pixel is called the number of channels of the ImageFormat. The pixel values of a pixel are used together to describe the captured property, which could be color or depth information. For example, if the data is a color image in the RGB color space, then it is a three-channel ImageFormat and a pixel is described by three pixel values, R, G and B, each ranging from 0 to 255. As another example, if the data is a gray image, then it is a single-channel ImageFormat with the 8-bit unsigned integer data type, and the pixel value describes the gray scale. Depth data is a single-channel ImageFormat too, but the data type is the 16-bit unsigned integer and the pixel value is the depth level.

For those ImageFormats whose pixels contain multiple pixel values, the pixel values might be arranged in a planar way or an interleaving way:

  1. Planar pixel layout: each channel has its pixel values stored consecutively in a separate buffer (a.k.a. a plane), and then all channel buffers are stored consecutively in memory. (Ex: RRRRRR......GGGGGG......BBBBBB......)
  2. Interleaving pixel layout: each pixel has its pixel values from all channels stored together, so the channels interleave in memory. (Ex: RGBRGBRGBRGBRGB......)

ImageFormats that belong to the same color space might have different pixel layouts.
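The difference between the two layouts can be sketched as a conversion from one to the other. This is only an illustration: the function name `interleavedToPlanar` is hypothetical, and the sketch assumes 8-bit pixel values and rows without padding.

```javascript
// Convert an interleaved buffer (RGBRGB...) into a planar one (RRR...GGG...BBB...).
// Assumptions for illustration only: 8-bit pixel values and no row padding.
function interleavedToPlanar(interleaved, channelCount) {
  const pixelCount = interleaved.length / channelCount;
  const planar = new Uint8Array(interleaved.length);
  for (let pixel = 0; pixel < pixelCount; pixel++) {
    for (let channel = 0; channel < channelCount; channel++) {
      // Source: the channel-th value of this pixel.
      // Destination: the pixel-th value of this channel's plane.
      planar[channel * pixelCount + pixel] = interleaved[pixel * channelCount + channel];
    }
  }
  return planar;
}

// Two RGB pixels: (10, 20, 30) and (11, 21, 31).
const rgbInterleaved = Uint8Array.from([10, 20, 30, 11, 21, 31]);
const rgbPlanar = interleavedToPlanar(rgbInterleaved, 3);
// rgbPlanar now holds 10, 11, 20, 21, 30, 31
```

Both buffers carry the same six pixel values; only their order in memory differs.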



An enumeration ImageFormat defines a list of image formats which are exposed to users. The extended APIs of ImageBitmap use this enumeration to negotiate the format when reading the underlying data of an ImageBitmap and when writing data back.

We need to elaborate this list for standardization.


Channel order: R, G, B, A

Channel size: full rgba-channels

Pixel layout: interleaving rgba-channels

Data type: 8-bit unsigned integer


Channel order: B, G, R, A

Channel size: full bgra-channels

Pixel layout: interleaving bgra-channels

Data type: 8-bit unsigned integer


Channel order: R, G, B

Channel size: full rgb-channels

Pixel layout: interleaving rgb-channels

Data type: 8-bit unsigned integer


Channel order: B, G, R

Channel size: full bgr-channels

Pixel layout: interleaving bgr-channels

Data type: 8-bit unsigned integer


Channel order: GRAY

Channel size: full gray-channel

Pixel layout: planar gray-channel

Data type: 8-bit unsigned integer


Channel order: Y, U, V

Channel size: full yuv-channels

Pixel layout: planar yuv-channels

Data type: 8-bit unsigned integer


Channel order: Y, U, V

Channel size: full y-channel, half uv-channels

Pixel layout: planar yuv-channels

Data type: 8-bit unsigned integer


Channel order: Y, U, V

Channel size: full y-channel, quarter uv-channels

Pixel layout: planar yuv-channels

Data type: 8-bit unsigned integer


Channel order: Y, U, V

Channel size: full y-channel, quarter uv-channels

Pixel layout: planar y-channel, interleaving uv-channels

Data type: 8-bit unsigned integer


Channel order: Y, V, U

Channel size: full y-channel, quarter uv-channels

Pixel layout: planar y-channel, interleaving vu-channels

Data type: 8-bit unsigned integer


Channel order: H, S, V

Channel size: full hsv-channels

Pixel layout: interleaving hsv-channels

Data type: 8-bit unsigned integer


Channel order: l, a, b

Channel size: full lab-channels

Pixel layout: interleaving lab-channels

Data type: 8-bit unsigned integer


Channel order: DEPTH

Channel size: full depth-channel

Pixel layout: planar depth-channel

Data type: 16-bit unsigned integer


Two interfaces, ImageFormatPixelLayout and ChannelPixelLayout, work together to generalize the variety of pixel layouts among image formats.

The ImageFormatPixelLayout represents the pixel layout of a certain image format. Since an image format is composed of at least one channel, an ImageFormatPixelLayout contains at least one ChannelPixelLayout.

Although an image or a video frame is a two-dimensional structure, its data is usually stored in a one-dimensional array in row-major order, and the ChannelPixelLayout uses the following properties to describe how pixel values are arranged in that one-dimensional array buffer.

  1. offset: where each channel's data starts. (Relative to the beginning of the one-dimensional video data array.)
  2. width and height: how many samples are in each channel.
  3. data type: the data type used to store one single pixel value.
  4. stride: the total bytes of each row plus the padding bytes of each row.
  5. skip: the number of bytes between two adjacent pixel values of the channel; it is used to describe interleaving layouts. (For planar layouts, this property is zero.)

          Example1: RGBA image, width = 620, height = 480, stride = 2560

          channel_r: offset = 0, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
          channel_g: offset = 1, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
          channel_b: offset = 2, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
          channel_a: offset = 3, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3

                  <---------------------------- stride ---------------------------->
                  <---------------------- width x 4 ---------------------->
          [index] 01234   8   12  16  20  24  28                           2479    2559
          [data]  RGBARGBARGBARGBARGBAR___R___R...                         A%%%%%%%%
          [data]  RGBARGBARGBARGBARGBAR___R___R...                         A%%%%%%%%
          [data]  RGBARGBARGBARGBARGBAR___R___R...                         A%%%%%%%%

          Example2: YUV420P image, width = 620, height = 480, stride = 640

          channel_y: offset = 0, width = 620, height = 480, data type = uint8, stride = 640, skip = 0
          channel_u: offset = 307200, width = 310, height = 240, data type = uint8, stride = 320, skip = 0
          channel_v: offset = 384000, width = 310, height = 240, data type = uint8, stride = 320, skip = 0

                  <--------------------------- y-stride --------------------------->
                  <----------------------- y-width ----------------------->
          [index] 012345                                                  619      639
          [data]  YYYYYYYYYYYYYYYYYYYYYYYYYYYYY...                        Y%%%%%%%%%
          [data]  YYYYYYYYYYYYYYYYYYYYYYYYYYYYY...                        Y%%%%%%%%%
          [data]  YYYYYYYYYYYYYYYYYYYYYYYYYYYYY...                        Y%%%%%%%%%
          [data]  ......
                  <-------- u-stride ---------->
                  <----- u-width ----->
          [index] 307200              307509   307519
          [data]  UUUUUUUUUU...       U%%%%%%%%%
          [data]  UUUUUUUUUU...       U%%%%%%%%%
          [data]  UUUUUUUUUU...       U%%%%%%%%%
          [data]  ......
                  <-------- v-stride ---------->
                  <----- v-width ----->
          [index] 384000              384309   384319
          [data]  VVVVVVVVVV...       V%%%%%%%%%
          [data]  VVVVVVVVVV...       V%%%%%%%%%
          [data]  VVVVVVVVVV...       V%%%%%%%%%
          [data]  ......

          Example3: YUV420SP_NV12 image, width = 620, height = 480, stride = 640

          channel_y: offset = 0, width = 620, height = 480, data type = uint8, stride = 640, skip = 0
          channel_u: offset = 307200, width = 310, height = 240, data type = uint8, stride = 640, skip = 1
          channel_v: offset = 307201, width = 310, height = 240, data type = uint8, stride = 640, skip = 1

                  <--------------------------- y-stride -------------------------->
                  <----------------------- y-width ---------------------->
          [index] 012345                                                 619      639
          [data]  YYYYYYYYYYYYYYYYYYYYYYYYYYYYY...                       Y%%%%%%%%%
          [data]  YYYYYYYYYYYYYYYYYYYYYYYYYYYYY...                       Y%%%%%%%%%
          [data]  YYYYYYYYYYYYYYYYYYYYYYYYYYYYY...                       Y%%%%%%%%%
          [data]  ......
                  <--------------------- u-stride / v-stride -------------------->
                  <------------------ u-width + v-width ----------------->
          [index] 307200(u-offset)                                       307818  307839
          [index] |307201(v-offset)                                      |307819 |
          [data]  UVUVUVUVUVUVUVUVUVUVUVUVUVUVUV...                      UV%%%%%%%
          [data]  UVUVUVUVUVUVUVUVUVUVUVUVUVUVUV...                      UV%%%%%%%
          [data]  UVUVUVUVUVUVUVUVUVUVUVUVUVUVUV...                      UV%%%%%%%
                   ^            ^
                  u-skip        v-skip

          Example4: DEPTH image, width = 640, height = 480, stride = 1280

          channel_d: offset = 0, width = 640, height = 480, data type = uint16, stride = 1280, skip = 0

                  <----------------------- d-stride ---------------------->
                  <----------------------- d-width ----------------------->
          [index] 012345                                                  1280
          [data]  DDDDDDDDDDDDDDDDDDDDDDDDDDDDD...                        D
          [data]  DDDDDDDDDDDDDDDDDDDDDDDDDDDDD...                        D
          [data]  DDDDDDDDDDDDDDDDDDDDDDDDDDDDD...                        D
          [data]  ......
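The indices in the diagrams above can be reproduced with a small helper. This is a sketch, not part of the proposed API: the function name `sampleByteIndex` and the plain objects standing in for ChannelPixelLayout are hypothetical, and it assumes that skip counts the bytes between the end of one pixel value and the start of the next one in the same row.

```javascript
// Bytes per pixel value for the two data types that appear in the examples.
const BYTES_PER_VALUE = { uint8: 1, uint16: 2 };

// Byte position of the pixel value at column x, row y of one channel.
// Assumption: skip is the gap, in bytes, between adjacent values of the channel.
function sampleByteIndex(channel, x, y) {
  if (x < 0 || x >= channel.width || y < 0 || y >= channel.height) {
    throw new RangeError("coordinates are outside the channel");
  }
  const step = BYTES_PER_VALUE[channel.dataType] + channel.skip;
  return channel.offset + y * channel.stride + x * step;
}

// The alpha channel of Example1 and the U channel of Example2.
const channelA = { offset: 3, width: 620, height: 480, dataType: "uint8", stride: 2560, skip: 3 };
const channelU = { offset: 307200, width: 310, height: 240, dataType: "uint8", stride: 320, skip: 0 };

console.log(sampleByteIndex(channelA, 619, 0)); // 2479, the last A of the first row
console.log(sampleByteIndex(channelU, 309, 0)); // 307509, the last U of the first row
console.log(sampleByteIndex(channelU, 0, 1));   // 307520, the first U of the second row
```

The same formula covers planar and interleaving layouts, because a planar channel simply has a skip of zero.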


[Constant] readonly attribute unsigned long offset

The beginning position of this channel's data, relative to the ArrayBuffer passed to the mapDataInto() method.

[Constant] readonly attribute unsigned long width

The width of this channel. Channels in an image format may have different widths.

[Constant] readonly attribute unsigned long height

The height of this channel. Channels in an image format may have different heights.

[Constant] readonly attribute DataType dataType

The data type used to store one single pixel value.

[Constant] readonly attribute unsigned long stride

The stride of this channel. The stride is the number of bytes between the beginnings of two consecutive rows in memory, that is, the total bytes of each row plus the padding bytes of each row.

[Constant] readonly attribute unsigned long skip

This describes how many bytes lie between two adjacent pixel values in this channel.

Possible values:

  • zero: for planar format.
  • a positive integer: for interleaving format.

8-bit unsigned integer.
8-bit integer.
16-bit unsigned integer.
16-bit integer.
32-bit unsigned integer.
32-bit integer.
32-bit IEEE floating point number.
64-bit IEEE floating point number.


[Constant, Cached] readonly attribute sequence<ChannelPixelLayout> channels

Channel information of this image format. Each image format has at least one channel.

ImageBitmap extensions

ImageFormat findOptimalFormat()

Find the best image format for receiving data.

@return one of the possibleFormats, or the empty string if no format in the list is supported. If possibleFormats is not given, returns the most suitable image format for this ImageBitmap among all supported image formats.

optional sequence<ImageFormat> possibleFormats

A list of image formats that users can handle.
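The selection rule described for findOptimalFormat() can be modelled as a plain function. This is a hypothetical model, not how a browser is expected to implement it: the name `pickFormat` is invented here, and the sketch assumes that the supported formats are already ordered from most to least suitable for the ImageBitmap at hand. The format names are the ones used in the layout examples of this document.

```javascript
// Hypothetical model of the selection rule; not a browser implementation.
// Assumption: supportedFormats is ordered from most to least suitable.
function pickFormat(supportedFormats, possibleFormats) {
  if (possibleFormats === undefined) {
    // No list given: fall back to the most suitable supported format.
    return supportedFormats.length > 0 ? supportedFormats[0] : "";
  }
  // Otherwise return the most suitable format that the caller can handle.
  const match = supportedFormats.find((format) => possibleFormats.includes(format));
  return match === undefined ? "" : match;
}

const supported = ["YUV420SP_NV12", "YUV420P", "DEPTH"];
console.log(pickFormat(supported, ["DEPTH", "YUV420P"]));     // "YUV420P"
console.log(pickFormat(supported, ["SOME_UNSUPPORTED_FORMAT"])); // "" (empty string)
console.log(pickFormat(supported));                            // "YUV420SP_NV12"
```

Callers are expected to check for the empty string before going on to request the data.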

[Throws] long mappedDataLength()

Calculates the length of the mapped data when the image is represented in the given format.

Throws if format is not supported.

@return the length (in bytes) of the image data represented in the given format.

ImageFormat format

The format that users want.
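The returned length has to cover every channel of the layout in which the data is later mapped. As a rough illustration, a lower bound can be derived from a set of channel descriptions. The helper name `minimumDataLength` and the plain objects are hypothetical, the interpretation of skip as the gap between adjacent values is an assumption, and an implementation may well report a larger value, for example by counting the padding of the last row.

```javascript
const VALUE_SIZE = { uint8: 1, uint16: 2 };

// Smallest byte length that still contains the last pixel value of every channel.
// This is a lower bound; trailing row padding is deliberately not counted.
function minimumDataLength(channels) {
  let length = 0;
  for (const channel of channels) {
    const size = VALUE_SIZE[channel.dataType];
    const lastValueStart =
      channel.offset +
      (channel.height - 1) * channel.stride +
      (channel.width - 1) * (size + channel.skip);
    length = Math.max(length, lastValueStart + size);
  }
  return length;
}

// The YUV420P layout from Example2 (620 x 480, Y stride 640, U and V stride 320).
const yuv420p = [
  { offset: 0, width: 620, height: 480, dataType: "uint8", stride: 640, skip: 0 },
  { offset: 307200, width: 310, height: 240, dataType: "uint8", stride: 320, skip: 0 },
  { offset: 384000, width: 310, height: 240, dataType: "uint8", stride: 320, skip: 0 },
];
console.log(minimumDataLength(yuv420p)); // 460790
```

With the padding of the last V row included, the same layout occupies 384000 + 240 x 320 = 460800 bytes.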

[Throws] Promise<ImageFormatPixelLayout> mapDataInto()

Makes a copy of the underlying image data in the given format into the given buffer at the given offset, filling at most length bytes, and returns an ImageFormatPixelLayout object which describes the pixel layout.

Throws if format is not supported.

Each invocation of this method returns a new ImageFormatPixelLayout object.

@return an ImageFormatPixelLayout object which describes the pixel layout.

ImageFormat format

The format that users want.

ArrayBuffer buffer

A container for receiving the mapped image data.

long offset

The beginning position of the buffer to place the mapped data.

long length

The length of space in the buffer that could be filled.
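Taken together, the three read-side methods suggest the following calling sequence. This is a sketch under assumptions: the methods are only proposed by this specification, so the code runs against `fakeBitmap`, a hypothetical stand-in that imitates a 2 x 2 DEPTH image; `readFrame` is likewise an illustrative name.

```javascript
// Sketch of the read path, assuming `bitmap` implements the methods proposed
// in this specification.
async function readFrame(bitmap, possibleFormats) {
  const format = bitmap.findOptimalFormat(possibleFormats);
  if (format === "") {
    throw new Error("none of the requested formats is supported");
  }
  const length = bitmap.mappedDataLength(format);
  const buffer = new ArrayBuffer(length);
  const layout = await bitmap.mapDataInto(format, buffer, 0, length);
  return { format, buffer, layout };
}

// Hypothetical stand-in for an ImageBitmap holding a 2 x 2 DEPTH image.
// It exists only so that the sketch can run outside a browser.
const fakeBitmap = {
  findOptimalFormat: (formats) =>
    formats === undefined || formats.includes("DEPTH") ? "DEPTH" : "",
  mappedDataLength: () => 8, // 4 values x 2 bytes
  mapDataInto: async (format, buffer, offset, length) => {
    new Uint16Array(buffer, offset, 4).set([100, 200, 300, 400]);
    return {
      channels: [{ offset, width: 2, height: 2, dataType: "uint16", stride: 4, skip: 0 }],
    };
  },
};

readFrame(fakeBitmap, ["DEPTH"]).then(({ buffer, layout }) => {
  const depth = new Uint16Array(buffer, layout.channels[0].offset, 4);
  console.log(Array.from(depth)); // the four depth values 100, 200, 300, 400
});
```

Allocating the buffer from mappedDataLength() keeps the caller from guessing the size, and the returned layout is what tells the caller where each channel sits in that buffer.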

[Throws] boolean setDataFrom()

Sets external image data into an ImageBitmap.

ImageFormat format

The format of the external image data.

ArrayBuffer buffer

A container of the external image data.

long offset

The beginning position of the buffer where the external image data is placed.

long length

The length of the space in the buffer where the external image data is placed.

ImageFormatPixelLayout layout

The pixel layout of the external image data, which describes how the data is arranged in the given buffer as the given format.
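The write path can be sketched in the same spirit: process the bytes according to the layout, then hand the buffer back with setDataFrom(). The names `invertChannel`, `writeInvertedFrame` and `fakeOutput` are hypothetical, the stand-in object only records what it receives, and the sketch assumes that a return value of true means the data was accepted, which this document does not state.

```javascript
// Invert the values of one 8-bit channel in place. Rows are walked with the
// channel's stride and values with 1 + skip, so padding bytes stay untouched.
function invertChannel(bytes, channel) {
  for (let y = 0; y < channel.height; y++) {
    for (let x = 0; x < channel.width; x++) {
      const index = channel.offset + y * channel.stride + x * (1 + channel.skip);
      bytes[index] = 255 - bytes[index];
    }
  }
}

// Sketch of the write path: invert the first channel, then pass the buffer on.
// Assumes `output` implements setDataFrom() as proposed in this specification.
function writeInvertedFrame(output, format, buffer, layout) {
  invertChannel(new Uint8Array(buffer), layout.channels[0]);
  return output.setDataFrom(format, buffer, 0, buffer.byteLength, layout);
}

// Hypothetical stand-in for the output ImageBitmap; it records the received bytes.
const received = [];
const fakeOutput = {
  setDataFrom: (format, buffer, offset, length, layout) => {
    received.push(Array.from(new Uint8Array(buffer, offset, length)));
    return true;
  },
};

// A 2 x 2 YUV420P frame: Y rows padded to a stride of 4, then one U and one V value.
const frame = Uint8Array.from([10, 20, 0, 0, 30, 40, 0, 0, 128, 128]);
const frameLayout = {
  channels: [
    { offset: 0, width: 2, height: 2, dataType: "uint8", stride: 4, skip: 0 },
    { offset: 8, width: 1, height: 1, dataType: "uint8", stride: 1, skip: 0 },
    { offset: 9, width: 1, height: 1, dataType: "uint8", stride: 1, skip: 0 },
  ],
};

const accepted = writeInvertedFrame(fakeOutput, "YUV420P", frame.buffer, frameLayout);
console.log(accepted, received[0]);
// true; the Y values become 245, 235, 225, 215 while padding, U and V are unchanged
```

Passing the layout along with the buffer matters because the same format can arrive with different strides, and the receiver has no other way to learn them.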


Thanks to Jeff Muizelaar for providing insightful opinions on the API design and the experimental implementation.