This specification extends the ImageBitmap interface to allow JavaScript developers to read the underlying data out of, and write external data into, an ImageBitmap in a set of supported ImageFormats.
This document is not complete and is subject to change. Early experimentation is encouraged to allow the Media Capture Task Force to evolve the specification based on technical discussions within the Task Force, implementation experience gained from early implementations, and feedback from other groups and individuals.
The ImageBitmap interface was originally designed as a pure opaque handle to an image data buffer inside a browser, so that how the buffer is stored is unknown to users and can be optimized for each platform.
Currently, the Media Capture Stream with Video Worker specification [mediacapture-worker] proposes a video processing framework on the web platform. It provides web applications a way to hook a processing script (which runs in a VideoWorker) to a MediaStreamTrack with video data so that each frame will be processed accordingly. If a VideoWorker is successfully hooked to a track, then for each video frame of the track, the framework will post a VideoProcessEvent to the VideoWorker. A VideoProcessEvent has an inputImageBitmap property and an outputImageBitmap property so that JavaScript developers can read the raw image data of the current video frame from the inputImageBitmap and, after some processing, write the processed image data back into the outputImageBitmap, which is handled by the framework thereafter.
This specification is meant to extend the ImageBitmap interface with new APIs to read data from and write data back into an ImageBitmap.
The Media Capture Stream with Video Worker specification [mediacapture-worker] chooses ImageBitmap (instead of ImageData) as the container of video frames because the decoded video frame data might exist in either CPU or GPU memory, which perfectly matches the nature of ImageBitmap as an opaque handle.
Considering how developers would process video frames, there are two possible ways: via JavaScript (or asm.js) or via WebGL. However, the original ImageBitmap interface provides no way for developers to do so.
In this specification, the original ImageBitmap interface is extended with four methods to let developers read data from and write data into an ImageBitmap in a set of supported ImageFormats. Also, two interfaces, ImageFormatPixelLayout and ChannelPixelLayout, are proposed to work with the extended ImageBitmap methods to describe how the accessed image data is arranged in memory.
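The following non-normative sketch illustrates the intended usage of these extensions inside a VideoWorker. Only mapDataInto() is named explicitly in this document, so the other method names (findOptimalFormat(), mappedDataLength(), setDataFrom()), the onvideoprocess handler, and the "RGBA32" format value are assumptions made for illustration only.

// Non-normative sketch: processing one video frame inside a VideoWorker.
// Method names other than mapDataInto(), the onvideoprocess handler and the
// "RGBA32" format value are illustrative assumptions, not normative names.
onvideoprocess = function (event) {
  var input = event.inputImageBitmap;
  var output = event.outputImageBitmap;

  // Negotiate a format that both the browser and this script can handle.
  var format = input.findOptimalFormat(["RGBA32"]);
  if (format === "") {
    return; // none of the requested formats is supported
  }

  // Allocate a buffer of the right size and copy the pixel data out.
  var length = input.mappedDataLength(format);
  var buffer = new ArrayBuffer(length);
  var layout = input.mapDataInto(format, buffer, 0, length);

  // ... process the pixel values in buffer, guided by layout ...

  // Write the processed pixel data into the output frame.
  output.setDataFrom(format, buffer, 0, length, layout);
};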
The ImageBitmap interface that this specification extends is defined in [[!HTML51]].
The Media Capture Stream with Video Worker specification proposes a video processing framework on the web platform. [mediacapture-worker]
The VideoWorker is defined in [mediacapture-worker].
The VideoProcessEvent is defined in [mediacapture-worker].
An image or a video frame is conceptually a two-dimensional array of data, and each element in the array is called a pixel. However, the pixels are usually stored in a one-dimensional array and could be arranged in a variety of ImageFormats. Developers need to know how the pixels are formatted so that they are able to process them.
An ImageFormat describes how pixels in an image are arranged; all pixels in one single image are arranged in the same way. A single pixel has at least one, but usually multiple, pixel values. The range of a pixel value varies, which means that different ImageFormats use different data types to store a single pixel value. The most common data type is the 8-bit unsigned integer, whose range is from 0 to 255; others could be 16-bit integers or 32-bit floating point values and so forth.
The number of pixel values of a single pixel is called the number of channels of the ImageFormat. The multiple pixel values of a pixel are used together to describe the captured property, which could be color or depth information. For example, if the data is a color image in the RGB color space, then it is a three-channel ImageFormat and a pixel is described by three pixel values, R, G and B, each with a range from 0 to 255. As another example, if the data is a gray image, then it is a single-channel ImageFormat with an 8-bit unsigned integer data type and the pixel value describes the gray scale. For depth data, it is a single-channel ImageFormat too, but the data type is a 16-bit unsigned integer and the pixel value is the depth level.
For those ImageFormats whose pixels contain multiple pixel values, the pixel values might be arranged in a planar way or an interleaving way:
In a planar layout, all pixel values of one channel are stored contiguously in their own plane, and the planes are stored one after another.
In an interleaving layout, the pixel values of different channels alternate, so all values belonging to a single pixel are stored next to each other.
ImageFormats that belong to the same color space might have different pixel layouts.
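As a non-normative illustration of the difference, the following sketch shows how a script might index the same pixel at column x and row y in an interleaved RGB buffer and in a planar YUV buffer with three full-resolution planes; the helper functions and the assumption of 8-bit samples are made up for this example.

// Interleaved layout: the R, G and B values of one pixel sit next to each other.
function readInterleavedRGB(data, stride, x, y) {
  var base = y * stride + x * 3;
  return { r: data[base], g: data[base + 1], b: data[base + 2] };
}

// Planar layout: each channel occupies its own plane, one plane after another.
function readPlanarYUV444(data, stride, height, x, y) {
  var planeSize = stride * height;
  return {
    y: data[y * stride + x],
    u: data[planeSize + y * stride + x],
    v: data[2 * planeSize + y * stride + x]
  };
}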
An enumeration ImageFormat defines a list of image formats which are exposed to users. The extended APIs of ImageBitmap use this enumeration to negotiate the format when accessing the underlying data of an ImageBitmap and when writing data back.
We need to elaborate this list for standardization.
Channel order: R, G, B, A
Channel size: full rgba-channels
Pixel layout: interleaving rgba-channels
Data type: 8-bit unsigned integer
Channel order: B, G, R, A
Channel size: full bgra-channels
Pixel layout: interleaving bgra-channels
Data type: 8-bit unsigned integer
Channel order: R, G, B
Channel size: full rgb-channels
Pixel layout: interleaving rgb-channels
Data type: 8-bit unsigned integer
Channel order: B, G, R
Channel size: full bgr-channels
Pixel layout: interleaving bgr-channels
Data type: 8-bit unsigned integer
Channel order: GRAY
Channel size: full gray-channel
Pixel layout: planar gray-channel
Data type: 8-bit unsigned integer
Channel order: Y, U, V
Channel size: full yuv-channels
Pixel layout: planar yuv-channels
Data type: 8-bit unsigned integer
Channel order: Y, U, V
Channel size: full y-channel, half uv-channels
Pixel layout: planar yuv-channels
Data type: 8-bit unsigned integer
Channel order: Y, U, V
Channel size: full y-channel, quarter uv-channels
Pixel layout: planar yuv-channels
Data type: 8-bit unsigned integer
Channel order: Y, U, V
Channel size: full y-channel, quarter uv-channels
Pixel layout: planar y-channel, interleaving uv-channels
Data type: 8-bit unsigned integer
Channel order: Y, V, U
Channel size: full y-channel, quarter uv-channels
Pixel layout: planar y-channel, interleaving vu-channels
Data type: 8-bit unsigned integer
Channel order: H, S, V
Channel size: full hsv-channels
Pixel layout: interleaving hsv-channels
Data type: 8-bit unsigned integer
Channel order: L, a, b
Channel size: full lab-channels
Pixel layout: interleaving lab-channels
Data type: 8-bit unsigned integer
Channel order: DEPTH
Channel size: full depth-channel
Pixel layout: planar depth-channel
Data type: 16-bit unsigned integer
Two interfaces, ImageFormatPixelLayout and ChannelPixelLayout, work together to generalize the variety of pixel layouts among image formats.
An ImageFormatPixelLayout represents the pixel layout of a certain image format; since an image format is composed of at least one channel, an ImageFormatPixelLayout contains at least one ChannelPixelLayout.
Although an image or a video frame is a two-dimensional structure, its data is usually stored in a one-dimensional array in row-major order, and ChannelPixelLayout uses the following properties to describe how pixel values are arranged in the one-dimensional array buffer.
Example 1: RGBA image, width = 620, height = 480, stride = 2560.
channel_r: offset = 0, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
channel_g: offset = 1, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
channel_b: offset = 2, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
channel_a: offset = 3, width = 620, height = 480, data type = uint8, stride = 2560, skip = 3
The four channels are interleaved as R, G, B, A, R, G, B, A, ... within each row; a row holds width × 4 = 2480 bytes of pixel data followed by padding up to the 2560-byte stride, and adjacent values of the same channel are 4 bytes apart (1 data byte plus 3 skipped bytes).
Example 2: YUV420P image, width = 620, height = 480, stride = 640.
channel_y: offset = 0, width = 620, height = 480, data type = uint8, stride = 640, skip = 0
channel_u: offset = 307200, width = 310, height = 240, data type = uint8, stride = 320, skip = 0
channel_v: offset = 384000, width = 310, height = 240, data type = uint8, stride = 320, skip = 0
The three planes are stored one after another: the full-resolution Y plane first (640 × 480 = 307200 bytes), then the quarter-resolution U plane, then the quarter-resolution V plane, each row padded to its own stride.
Example 3: YUV420SP_NV12 image, width = 620, height = 480, stride = 640.
channel_y: offset = 0, width = 620, height = 480, data type = uint8, stride = 640, skip = 0
channel_u: offset = 307200, width = 310, height = 240, data type = uint8, stride = 640, skip = 1
channel_v: offset = 307201, width = 310, height = 240, data type = uint8, stride = 640, skip = 1
The Y plane is stored first; the U and V values then share a single plane in which they are interleaved as U, V, U, V, ..., so both channels have the same stride and each skips one byte between two of its adjacent values.
Example 4: DEPTH image, width = 640, height = 480, stride = 1280.
channel_d: offset = 0, width = 640, height = 480, data type = uint16, stride = 1280, skip = 0
A single plane of 16-bit depth values with no padding; each row occupies exactly 640 × 2 = 1280 bytes.
The beginning position of this channel's data (relative to the given ArrayBuffer parameter of the mapDataInto() method).
The width of this channel. Channels in an image format may have different widths.
The height of this channel. Channels in an image format may have different heights.
The data type used to store one single pixel value.
The stride of this channel. The stride is the number of bytes between the beginnings of two consecutive rows in memory; it equals the total bytes of each row plus the padding bytes at the end of each row.
The number of bytes between two adjacent pixel values in this channel.
Possible values:
Channel information of this image format. Each image format has at least one channel.
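As a non-normative illustration, a script could locate one pixel value of a channel from these properties as sketched below; the attribute spellings (offset, stride, skip) follow the descriptions above and are assumptions about the final IDL.

// Compute the byte position of the pixel value at column x, row y of a channel,
// where bytesPerValue is the size of the channel's data type (e.g. 1 for uint8,
// 2 for a 16-bit depth value) and skip is the number of bytes between two
// adjacent pixel values of this channel.
function pixelValueByteOffset(channel, x, y, bytesPerValue) {
  return channel.offset + y * channel.stride + x * (bytesPerValue + channel.skip);
}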
Find the best image format for receiving data.
@return one of the possibleFormats, or the empty string if none of the formats in the list is supported. If possibleFormats is not given, then returns the most suitable image format for this ImageBitmap from all supported image formats.
A list of image formats that users can handle.
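A non-normative sketch of the negotiation described above (the method name findOptimalFormat() and the format strings are assumptions for illustration):

// bitmap is an ImageBitmap; ask for one of the formats this script can handle,
// and fall back to whatever the browser considers most suitable if none matches.
var format = bitmap.findOptimalFormat(["YUV420P", "RGBA32"]);
if (format === "") {
  format = bitmap.findOptimalFormat(); // no argument: browser picks the best format
}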
Calculate the length of mapped data while the image is represented in the given format.
Throws if format is not supported.
@return the length (in bytes) of the image data represented in the given format.
The format that users want.
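A non-normative sketch of sizing a buffer with this method (the method name mappedDataLength() is an assumption for illustration):

// bitmap is an ImageBitmap and format is an ImageFormat value.
try {
  var length = bitmap.mappedDataLength(format); // throws if format is unsupported
  var buffer = new ArrayBuffer(length);
} catch (e) {
  // the requested format is not supported by this ImageBitmap
}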
Makes a copy of the underlying image data in the given format into the given buffer at the given offset, filling at most length bytes, and returns an ImageFormatPixelLayout object which describes the pixel layout.
Throws if format is not supported.
Each time this method is invoked, it returns a new ImageFormatPixelLayout object.
@return an ImageFormatPixelLayout object which describes the pixel layout.
The format that users want.
A container for receiving the mapped image data.
The beginning position in the buffer at which to place the mapped data.
The length of space in the buffer that may be filled.
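A non-normative sketch of mapping the data into a larger buffer at a non-zero offset and inspecting the returned layout (the method names and the channels attribute spelling are assumptions for illustration):

// bitmap is an ImageBitmap and format is an ImageFormat value.
// Reserve 16 bytes in front of the mapped pixel data.
var headerBytes = 16;
var length = bitmap.mappedDataLength(format);          // name assumed, see above
var buffer = new ArrayBuffer(headerBytes + length);
var layout = bitmap.mapDataInto(format, buffer, headerBytes, length);

// Each invocation returns a fresh ImageFormatPixelLayout describing the mapped copy.
// Per this specification, each channel's offset is relative to the given ArrayBuffer.
var firstChannel = layout.channels[0];                  // attribute name assumed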
Sets external image data into an ImageBitmap.
The format of the external image data.
A container holding the external image data.
The beginning position in the buffer where the external image data is placed.
The length of space in the buffer that the external image data occupies.
The pixel layout of the external image data, which describes how the data is arranged in the given buffer in the given format.
Thanks to Jeff Muizelaar for providing insightful opinions on the API design and the experimental implementation.