2 Tensor
In any deep learning library, the Tensor is the fundamental building block. It is more than just a multi-dimensional array; it is a container for data that remembers its own history.
In this chapter, we will build the foundation of our library, which we will call babygrad.
```
project/
├─ .venv/      # virtual environment folder
├─ babygrad/   # source code
├─ examples/   # examples using babygrad
└─ tests/      # tests
```
2.1 Tensor
An array is a collection of elements, each identified by at least one index.
But we will not use plain Python lists here. Instead, we will hand that responsibility to NumPy.
A NumPy Array is a high-performance, N-dimensional block of numbers.
- It stores data in a fixed block of memory.
- It allows us to perform “vectorized” operations (adding 1 million numbers at once without a for loop; see the snippet after this list).
- It is the “raw material” that our Tensors will wrap.
- It is very fast.
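A quick illustration of why we lean on NumPy: adding a million numbers in a single vectorized call.

```python
import numpy as np

a = np.arange(1_000_000, dtype="float32")
b = np.ones_like(a)

c = a + b  # one vectorized call; no Python for-loop over a million elements
```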
A Tensor is a container (or “wrapper”) that holds an array in a structured way, allowing you to perform various operations efficiently.
```python
import babygrad

a = babygrad.Tensor([1, 2, 3])
b = babygrad.Tensor([1, 2, 3])
c = a + b

print(c._inputs)
# [Tensor([1. 2. 3.], requires_grad=True), Tensor([1. 2. 3.], requires_grad=True)]

print(c._op)
# <babygrad.ops.Add object at 0x7f0cfcfcc3a0>
```
Here the inputs of the tensor c are clearly a and b. We will need this information when we do the backward pass.
The _op attribute records the operation (Add) that was performed to produce the output c.
```python
x = babygrad.Tensor([1, 2, 3])
print(x.__dict__)
# {
#     'data': array([1., 2., 3.], dtype=float32),
#     'grad': None,
#     'requires_grad': True,
#     '_op': None,
#     '_inputs': [],
#     '_device': 'cpu'
# }
```
From the output above, we can clearly see which properties the Tensor class requires:
- data: the NumPy array holding the values (dtype "float32" by default).
- grad: the gradients calculated during the backward pass (None until then).
- requires_grad: if False, this tensor is excluded from backpropagation.
- _op: the math operation that created this Tensor.
- _inputs: the input tensors that created this Tensor.
- _device: where the data lives ('cpu' for now).
The _op and _inputs attributes are required to build the computation graph, which is essential for the backward pass.
The computation graph is a graph that shows (a tiny printer for it is sketched after this list):
- Numbers (Tensors) as nodes.
- Operations (ops) as nodes.
- Edges showing how data flows from inputs → operations → outputs.
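To make this concrete, here is a small helper that recursively follows _inputs and prints the graph for the c = a + b example above. Note that walk is not part of babygrad; it is a hypothetical illustration, assuming the _op and _inputs attributes described above.

```python
def walk(t, depth=0):
    """Print the computation graph rooted at tensor t, one node per line."""
    label = type(t._op).__name__ if t._op is not None else "leaf"
    print("  " * depth + f"{label}: {t}")
    for inp in t._inputs:
        walk(inp, depth + 1)

# walk(c) would print something like:
# Add: [2. 4. 6.]
#   leaf: [1. 2. 3.]
#   leaf: [1. 2. 3.]
```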
Now that we understand the basics of tensors, let’s start implementing the Tensor class.
Let’s start by creating babygrad/tensor.py.
What are the different types of data one can pass when creating a Tensor?
- A plain list or scalar.
- A NumPy ndarray.
- Another Tensor.
So we need to handle all three cases in the initializer. Whatever the input is, we want to make sure self.data always ends up as an NDArray (np.ndarray).
Let’s write the Tensor class.
```python
import numpy as np

NDArray = np.ndarray


class Tensor:
    def __init__(self, data, *, device=None, dtype="float32",
                 requires_grad=True):
        """
        Create a new tensor.

        Args:
            data: Array-like data (list, numpy array, or another Tensor)
            device: Device placement (currently ignored, CPU only)
            dtype: Data type for the array
            requires_grad: Whether to track gradients for this tensor

        Design decision: requires_grad defaults to True (unlike PyTorch).
        (We will change it to False later, when introducing Parameter.)
        """
        # if data is an instance of Tensor
        # if data is an instance of np.ndarray
        # if data is a list/scalar
        # Your solution here.
        raise NotImplementedError

    def __repr__(self):
        """
        Detailed representation showing data and gradient tracking.

        Example:
            >>> x = Tensor([1, 2, 3])
            >>> print(repr(x))
            Tensor([1. 2. 3.], requires_grad=True)
        """
        return f"Tensor({self.data}, requires_grad={self.requires_grad})"

    def __str__(self):
        """
        Simple string representation (just the data).

        Example:
            >>> x = Tensor([1, 2, 3])
            >>> print(x)
            [1. 2. 3.]
        """
        return str(self.data)

    def backward(self, out_grad=None):
        pass  # we will do this in the next chapter!
```
Hints: if the data is not of type np.ndarray, convert it to np.ndarray first. The _op and _inputs attributes need to be initialized too.
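If you get stuck, here is one possible way to fill in __init__. This is a sketch, not the only valid solution; using astype for the dtype conversion is one design choice among several.

```python
import numpy as np


class Tensor:
    def __init__(self, data, *, device=None, dtype="float32",
                 requires_grad=True):
        if isinstance(data, Tensor):
            # Re-wrapping another Tensor: reuse its underlying array.
            array = data.data.astype(dtype)
        elif isinstance(data, np.ndarray):
            array = data.astype(dtype)
        else:
            # Lists, tuples, and scalars all go through np.array.
            array = np.array(data, dtype=dtype)

        self.data = array
        self.grad = None
        self.requires_grad = requires_grad
        self._op = None
        self._inputs = []
        self._device = device if device is not None else "cpu"
```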
2.2 Data operations
Now that we have the basic Tensor class, let’s extend it by adding a few simple properties that are helpful for working with tensors. For now, we’ll focus on:
- Shape
- Dtype
- Device
- ndim
- Size
These are simple properties of the Tensor class.
```python
class Tensor:
    # existing code...

    @property
    def shape(self):
        """Shape of the tensor."""
        return self.data.shape

    @property
    def dtype(self):
        """Data type of the tensor."""
        return self.data.dtype

    @property
    def ndim(self):
        """Number of dimensions."""
        return self.data.ndim

    @property
    def size(self):
        """Total number of elements."""
        return self.data.size

    @property
    def device(self):
        """Device where the tensor lives."""
        return self._device
```
Now that we have the basic properties, let’s focus on getting at the actual data stored in a Tensor.
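A quick sanity check of these properties (assuming the __init__ sketched above, which stores data as float32):

```python
x = Tensor([[1, 2, 3], [4, 5, 6]])

print(x.shape)   # (2, 3)
print(x.dtype)   # float32
print(x.ndim)    # 2
print(x.size)    # 6
print(x.device)  # cpu
```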
We’ll start with a simple method: numpy().
- The purpose of numpy() is to extract the raw NumPy array from a Tensor.
- This is useful when you want to inspect, visualize, or use the data with other Python libraries without disturbing the computation graph.
```python
class Tensor:
    # existing code...

    def numpy(self):
        """
        Return the data as a NumPy array (detached from the autograd graph).

        This returns a copy, so modifying the result will not affect
        the tensor's data.

        Examples:
            >>> x = Tensor([1, 2, 3])
            >>> y = x + 1          # y is still a Tensor, part of the graph
            >>> z = x.numpy() + 1  # z is a NumPy array, not part of the graph

        Returns:
            np.ndarray: A copy of the tensor's data as a NumPy array.
        """
        return self.data.copy()
```
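Because numpy() returns a copy, mutating the result leaves the tensor untouched. A small check (assuming the class as defined so far):

```python
x = Tensor([1, 2, 3])
arr = x.numpy()

arr[0] = 99      # modify the copy...
print(x)         # [1. 2. 3.]  ...the tensor's data is unchanged
```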
Sometimes we want a clone of a Tensor that has the same data but is not connected to the computation graph.
This is useful when we want to inspect or manipulate the data without affecting the graph.
The detach() method creates a new Tensor with the same underlying data as the original.
```python
class Tensor:
    # existing code...

    def detach(self):
        """
        Creates a new Tensor with the same data but no gradient tracking.

        Useful when you want to use values without building the
        computation graph.

        Returns:
            Tensor: New tensor with requires_grad=False

        Example:
            >>> x = Tensor([1, 2, 3], requires_grad=True)
            >>> y = x.detach()  # y doesn't track gradients
            >>> z = y * 2       # this operation won't be in the graph
        """
        return Tensor(self.data, requires_grad=False)
```
detach() creates a Tensor with requires_grad=False. That means it won’t participate in the computation graph.
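A quick demonstration (assuming the __init__ sketched earlier, which initializes _inputs to an empty list):

```python
x = Tensor([1, 2, 3], requires_grad=True)
y = x.detach()

print(y.requires_grad)  # False
print(y._inputs)        # []  -- no link back to x in the graph
```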