Visu3d - Transform (go/v3d-transform)#

If you’re new to v3d, please look at the intro first.

Installation#

We use the same installation/imports as in the intro.

!pip install visu3d etils[ecolab] jax[cpu] tf-nightly tfds-nightly sunds
from __future__ import annotations
from etils.ecolab.lazy_imports import *

Transformations#

v3d makes it easy to project back and forth across coordinate frames.

3d <> 3d#

v3d.Transform stores the position, rotation and scale of an object.

It is used to transform objects (e.g. from world to camera 3d coordinates).

v3d.Transform is composed of R (rotation, scale) and t (translation) components:

tr = v3d.Transform(
    R=[  # Define a rigid rotation
       [-1/3, -(1/3)**.5, (1/3)**.5],
       [1/3, -(1/3)**.5, -(1/3)**.5],
       [-2/3, 0, -(1/3)**.5],
    ],
    t=[2, 2, 2],
)

# Fig display the (x, y, z) basis of the transformation
tr.fig

v3d.Transform can be composed with all types of objects:

Transformations are applied through the Python __matmul__ operator: tr @ <obj>

v3d.make_fig([
    tr,
    tr @ np.array([[0, 0, 0], [1, 1, 1]]),
    tr @ v3d.Point3d(p=[0, 0, 2], rgb=[255, 0, 0]),
    tr @ v3d.Ray(pos=[0, 0, 0], dir=[0, 1, 1]),
    tr @ v3d.Transform(R=np.eye(3), t=[0, 0, 3]),
])
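Under the hood, applying a transform to an (..., 3) point array amounts to a matrix multiply followed by a translation. A minimal NumPy sketch of that math, reusing the R and t values defined above (this is an illustration of the affine formula p' = R p + t, not visu3d's actual implementation):

```python
import numpy as np

# Same R / t values as the transform defined above
R = np.array([
    [-1/3, -(1/3)**.5, (1/3)**.5],
    [1/3, -(1/3)**.5, -(1/3)**.5],
    [-2/3, 0, -(1/3)**.5],
])
t = np.array([2., 2., 2.])

points = np.array([[0., 0., 0.], [1., 1., 1.]])

# Equivalent of `tr @ points`: rotate/scale each point, then translate
transformed = points @ R.T + t
print(transformed[0])  # The origin maps onto the translation: [2. 2. 2.]
```

Writing `points @ R.T` (rather than looping `R @ p`) lets the same line handle any (..., 3) batch shape through broadcasting.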

Inverting a transformation is trivial:

tr.inv
Transform(
    R=array([[-0.5       ,  0.5       , -1.        ],
           [-0.86602545, -0.86602545, -0.        ],
           [ 0.57735026, -0.57735026, -0.57735026]], dtype=float32),
    t=array([2.       , 3.4641018, 1.1547005], dtype=float32),
)
tr.inv @ tr  # `tr.inv @ tr` is identity
Transform(
    R=array([[1.        , 0.        , 0.        ],
           [0.        , 1.        , 0.        ],
           [0.        , 0.        , 0.99999994]], dtype=float32),
    t=array([0., 0., 0.], dtype=float32),
)
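For intuition, the inverse can be sketched in plain NumPy: invert R, then map the translation through the inverted rotation. This is a sketch of the underlying math for an invertible R, not visu3d's internals:

```python
import numpy as np

R = np.array([
    [-1/3, -(1/3)**.5, (1/3)**.5],
    [1/3, -(1/3)**.5, -(1/3)**.5],
    [-2/3, 0, -(1/3)**.5],
])
t = np.array([2., 2., 2.])

# Inverse transform: p = R_inv @ (p' - t) = R_inv @ p' + t_inv
R_inv = np.linalg.inv(R)
t_inv = -R_inv @ t

# Composing a transform with its inverse yields the identity
assert np.allclose(R_inv @ R, np.eye(3))
assert np.allclose(R_inv @ t + t_inv, 0.)
```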

See the API for all properties (.matrix4x4, .x_dir, .y_dir, .z_dir,…).

3d <> 2d (Camera pixel projections)#

Let’s create a camera looking at the center.

# Camera looking at the center
cam = v3d.Camera.from_look_at(
    spec=v3d.PinholeCamera.from_focal(
        resolution=(128, 170),
        focal_in_px=120,
    ),
    pos=[2, -0.5, 1.7],
    target=[0, 0, 0],  # < TODO(epot): Rename end -> look_at
)

# Point cloud of arbitrary `(..., 3)` shape
rng = np.random.default_rng(0)
point_cloud = v3d.Point3d(
    p=(rng.random((50, 50, 3)) - 0.5) * 3,
    rgb=rng.integers(255, size=(50, 50, 3)),
)

We can project 3d points into 2d pixel coordinates using px_from_world:

# Convert (world 3d) -> (px 2d) coordinates
px_coord = cam.px_from_world @ point_cloud

Which is equivalent to:

# Convert (world 3d) -> (camera 3d) -> (px 2d) coordinates
px_coord = cam.spec.px_from_cam @ cam.cam_from_world @ point_cloud
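As a rough sketch of the px_from_cam step: a pinhole camera divides by depth and scales by the focal length. The helper below is a hypothetical stand-in (not visu3d's API) that assumes the principal point sits at the image center; the real PinholeCamera spec carries these intrinsics for you:

```python
import numpy as np

def px_from_cam(points_cam, focal_px=120., resolution=(128, 170)):
  """Pinhole projection sketch: (..., 3) camera points -> (..., 2) pixels."""
  h, w = resolution
  cx, cy = w / 2, h / 2  # Assumption: principal point at the image center
  x, y, z = points_cam[..., 0], points_cam[..., 1], points_cam[..., 2]
  # Perspective divide, then scale by focal length and shift to pixel origin
  u = focal_px * x / z + cx
  v = focal_px * y / z + cy
  return np.stack([u, v], axis=-1)

p = np.array([0., 0., 2.])  # A point on the optical axis...
print(px_from_cam(p))       # ...projects onto the principal point
```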

v3d.Point2d can be visualized in the pixel space:

# Truncate coordinates outside the screen
# Use `(w, h)` as pixels are in `(i, j)` coordinates
px_coord = px_coord.clip(min=0, max=cam.wh)

px_coord.fig

v3d.Point3d -> v3d.Point2d will preserve the depth and rgb values, which allows projecting back to 3d without any information loss:

px_coord.flatten()[0]
Point2d(
    p=array([ 63.00236, 118.86679], dtype=float32),
    depth=array([3.1114159], dtype=float32),
    rgb=array([ 48, 134,  92], dtype=uint8),
)
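Keeping the depth is what makes the projection invertible. A minimal round-trip sketch with a hypothetical pinhole camera (the focal length and principal point values are illustrative, not taken from the camera above):

```python
import numpy as np

focal, cx, cy = 120., 85., 64.  # Hypothetical intrinsics for illustration

def project(p):
  """3d -> (2d pixel, depth). Keeping the depth makes this invertible."""
  x, y, z = p
  return np.array([focal * x / z + cx, focal * y / z + cy]), z

def unproject(px, depth):
  """(2d pixel, depth) -> 3d, undoing the perspective divide."""
  u, v = px
  return np.array([(u - cx) / focal * depth, (v - cy) / focal * depth, depth])

p = np.array([0.5, -0.2, 3.0])
px, depth = project(p)
assert np.allclose(unproject(px, depth), p)  # Lossless round-trip
```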

The transformation preserves the shape (*shape, 3) -> (*shape, 2).

print(f'{point_cloud.p.shape} -> {px_coord.p.shape}')
(50, 50, 3) -> (50, 50, 2)

When the depth is missing, points are back-projected at z=1 in camera coordinates:

px_coord = px_coord.replace(depth=None)

# Convert (px 2d) -> (world 3d) coordinates
projected_points = cam.world_from_px @ px_coord

v3d.make_fig([
    point_cloud,
    projected_points,
    cam,
])
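The z=1 back-projection can be sketched as the reverse of the pinhole projection: shift by the principal point, divide by the focal length, and place every point on the z=1 plane. This helper is hypothetical (assuming a simple pinhole model with the principal point at the image center), not visu3d's API:

```python
import numpy as np

def cam_from_px(px, focal_px=120., resolution=(128, 170)):
  """Back-project (..., 2) pixels to (..., 3) camera points at depth z=1."""
  h, w = resolution
  cx, cy = w / 2, h / 2  # Assumption: principal point at the image center
  u, v = px[..., 0], px[..., 1]
  x = (u - cx) / focal_px
  y = (v - cy) / focal_px
  z = np.ones_like(x)  # Without depth, every point lands on the z=1 plane
  return np.stack([x, y, z], axis=-1)

print(cam_from_px(np.array([85., 64.])))  # Principal point -> optical axis
```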

Supporting the Transform protocol#

To support v3d.Transform, you only need to implement the apply_transform protocol.

from etils.array_types import f32

class MyRay(v3d.DataclassArray):
  pos: f32['*shape 3']
  dir: f32['*shape 3']

  def apply_transform(self, tr: v3d.Transform):
    """Supports `tr @ my_ray`."""
    return self.replace(
        pos=tr @ self.pos,
        # `tr.apply_to_dir` only applies the rotation (tr.R), but NOT the
        # translation (tr.t)
        dir=tr.apply_to_dir(self.dir),
    )


my_ray = MyRay(pos=[0, 0, 0], dir=[0, 0, 1])
cam.world_from_cam @ my_ray
MyRay(
    pos=array([ 2. , -0.5,  1.7], dtype=float32),
    dir=array([-0.74848115,  0.18712029, -0.636209  ], dtype=float32),
)

Similarly, to support 3d <-> 2d pixel projection, you need to implement the apply_px_from_cam and apply_cam_from_px protocols. See v3d.Point3d for an implementation example.

For more info on how to create your custom v3d.DataclassArray primitives, have a look at the dataclass array tutorial.