2 Tensor
In any deep learning library, the Tensor is the fundamental building block. It is more than just a multi-dimensional array; it is a container for data that remembers its own history.
In this chapter, we will build the foundation of our library, which we will call babygrad.
```
project/
├─ .venv/      # virtual environment folder
├─ babygrad/   # source code
├─ examples/   # examples using babygrad
└─ tests/      # tests
```
2.1 Tensor
An array is a collection of elements, each identified by at least one index.
But we will not use plain Python lists here. Instead, we will hand that responsibility to NumPy.
A NumPy Array is a high-performance, N-dimensional block of numbers.
- It stores data in a fixed block of memory.
- It allows us to perform “vectorized” operations (adding 1 million numbers at once without a for loop; see the snippet after this list).
- It is the “raw material” that our Tensors will wrap.
- It is very fast.
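A quick illustration of why we lean on NumPy: adding a million numbers in a single vectorized call.

```python
import numpy as np

a = np.arange(1_000_000, dtype="float32")
b = np.ones_like(a)

c = a + b  # one vectorized call; no Python for-loop over a million elements
```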
A Tensor is a container (or “wrapper”) that holds an array in a structured way, allowing you to perform various operations efficiently.
```python
import babygrad

a = babygrad.Tensor([1, 2, 3])
b = babygrad.Tensor([1, 2, 3])
c = a + b

print(c._inputs)
# [Tensor([1. 2. 3.], requires_grad=True), Tensor([1. 2. 3.], requires_grad=True)]

print(c._op)
# <babygrad.ops.Add object at 0x7f0cfcfcc3a0>
```
Here the inputs of the tensor c are clearly a and b. We will need this information when we do the backward pass.
The _op attribute records the operation (Add) that was performed to produce the output c.
```python
x = babygrad.Tensor([1, 2, 3])
print(x.__dict__)
# {
#     'data': array([1., 2., 3.], dtype=float32),
#     'grad': None,
#     'requires_grad': True,
#     '_op': None,
#     '_inputs': [],
#     '_device': 'cpu'
# }
```
From the output above, we can clearly see which properties the Tensor class requires:
- data: the NumPy array holding the values (dtype "float32" by default).
- grad: the gradients calculated during the backward pass (None until then).
- requires_grad: if False, this tensor is excluded from backpropagation.
- _op: the math operation that created this Tensor.
- _inputs: the input tensors that created this Tensor.
- _device: where the data lives ('cpu' for now).
The _op and _inputs attributes are required to build the computation graph, which is essential for the backward pass.
The computation graph is a graph that shows (a tiny printer for it is sketched after this list):
- Numbers (Tensors) as nodes.
- Operations (ops) as nodes.
- Edges showing how data flows from inputs → operations → outputs.
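To make this concrete, here is a small helper that recursively follows _inputs and prints the graph for the c = a + b example above. Note that walk is not part of babygrad; it is a hypothetical illustration, assuming the _op and _inputs attributes described above.

```python
def walk(t, depth=0):
    """Print the computation graph rooted at tensor t, one node per line."""
    label = type(t._op).__name__ if t._op is not None else "leaf"
    print("  " * depth + f"{label}: {t}")
    for inp in t._inputs:
        walk(inp, depth + 1)

# walk(c) would print something like:
# Add: [2. 4. 6.]
#   leaf: [1. 2. 3.]
#   leaf: [1. 2. 3.]
```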
Now that we understand the basics of tensors, let’s start implementing the Tensor class.
Let’s start by creating babygrad/tensor.py.
What are the different types of data one can pass when creating a Tensor?
- A plain list or scalar.
- A NumPy ndarray.
- Another Tensor.
So we need to handle all three cases in the initializer. Whatever the input is, we want to make sure self.data always ends up as an NDArray (np.ndarray).
Let’s write the Tensor class.
```python
import numpy as np

NDArray = np.ndarray


class Tensor:
    def __init__(self, data, *, device=None, dtype="float32",
                 requires_grad=True):
        """
        Create a new tensor.

        Args:
            data: Array-like data (list, numpy array, or another Tensor)
            device: Device placement (currently ignored, CPU only)
            dtype: Data type for the array
            requires_grad: Whether to track gradients for this tensor

        Design decision: requires_grad defaults to True (unlike PyTorch).
        (We will change it to False later, when introducing Parameter.)
        """
        # if data is an instance of Tensor
        # if data is an instance of np.ndarray
        # if data is a list/scalar
        # Your solution here.
        raise NotImplementedError

    def __repr__(self):
        """
        Detailed representation showing data and gradient tracking.

        Example:
            >>> x = Tensor([1, 2, 3])
            >>> print(repr(x))
            Tensor([1. 2. 3.], requires_grad=True)
        """
        return f"Tensor({self.data}, requires_grad={self.requires_grad})"

    def __str__(self):
        """
        Simple string representation (just the data).

        Example:
            >>> x = Tensor([1, 2, 3])
            >>> print(x)
            [1. 2. 3.]
        """
        return str(self.data)

    def backward(self, out_grad=None):
        pass  # we will do this in the next chapter!
```
Hints: if the data is not of type np.ndarray, convert it to np.ndarray first. The _op and _inputs attributes need to be initialized too.
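If you get stuck, here is one possible way to fill in __init__. This is a sketch, not the only valid solution; using astype for the dtype conversion is one design choice among several.

```python
import numpy as np


class Tensor:
    def __init__(self, data, *, device=None, dtype="float32",
                 requires_grad=True):
        if isinstance(data, Tensor):
            # Re-wrapping another Tensor: reuse its underlying array.
            array = data.data.astype(dtype)
        elif isinstance(data, np.ndarray):
            array = data.astype(dtype)
        else:
            # Lists, tuples, and scalars all go through np.array.
            array = np.array(data, dtype=dtype)

        self.data = array
        self.grad = None
        self.requires_grad = requires_grad
        self._op = None
        self._inputs = []
        self._device = device if device is not None else "cpu"
```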
2.2 Data operations
Now that we have the basic Tensor class, let’s extend it by adding a few simple properties that are helpful for working with tensors. For now, we’ll focus on:
- Shape
- Dtype
- Device
- ndim
- Size
These are simple properties of the Tensor class.
```python
class Tensor:
    # existing code...

    @property
    def shape(self):
        """Shape of the tensor."""
        return self.data.shape

    @property
    def dtype(self):
        """Data type of the tensor."""
        return self.data.dtype

    @property
    def ndim(self):
        """Number of dimensions."""
        return self.data.ndim

    @property
    def size(self):
        """Total number of elements."""
        return self.data.size

    @property
    def device(self):
        """Device where the tensor lives."""
        return self._device
```
Now that we have the basic properties, let’s focus on getting at the actual data stored in a Tensor.
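A quick sanity check of these properties (assuming the __init__ sketched above, which stores data as float32):

```python
x = Tensor([[1, 2, 3], [4, 5, 6]])

print(x.shape)   # (2, 3)
print(x.dtype)   # float32
print(x.ndim)    # 2
print(x.size)    # 6
print(x.device)  # cpu
```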
We’ll start with a simple method: numpy().
- The purpose of numpy() is to extract the raw NumPy array from a Tensor.
- This is useful when you want to inspect, visualize, or use the data with other Python libraries without disturbing the computation graph.
```python
class Tensor:
    # existing code...

    def numpy(self):
        """
        Return the data as a NumPy array (detached from the autograd graph).

        This returns a copy, so modifying the result will not affect
        the tensor's data.

        Examples:
            >>> x = Tensor([1, 2, 3])
            >>> y = x + 1          # y is still a Tensor, part of the graph
            >>> z = x.numpy() + 1  # z is a NumPy array, not part of the graph

        Returns:
            np.ndarray: A copy of the tensor's data as a NumPy array.
        """
        return self.data.copy()
```
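Because numpy() returns a copy, mutating the result leaves the tensor untouched. A small check (assuming the class as defined so far):

```python
x = Tensor([1, 2, 3])
arr = x.numpy()

arr[0] = 99      # modify the copy...
print(x)         # [1. 2. 3.]  ...the tensor's data is unchanged
```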
Sometimes we want a clone of a Tensor that has the same data but is not connected to the computation graph.
This is useful when we want to inspect or manipulate the data without affecting the graph.
The detach() method creates a new Tensor with the same underlying data as the original.
```python
class Tensor:
    # existing code...

    def detach(self):
        """
        Creates a new Tensor with the same data but no gradient tracking.

        Useful when you want to use values without building the
        computation graph.

        Returns:
            Tensor: New tensor with requires_grad=False

        Example:
            >>> x = Tensor([1, 2, 3], requires_grad=True)
            >>> y = x.detach()  # y doesn't track gradients
            >>> z = y * 2       # this operation won't be in the graph
        """
        return Tensor(self.data, requires_grad=False)
```
detach() creates a Tensor with requires_grad=False. That means it won’t participate in the computation graph.
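A quick demonstration (assuming the __init__ sketched earlier, which initializes _inputs to an empty list):

```python
x = Tensor([1, 2, 3], requires_grad=True)
y = x.detach()

print(y.requires_grad)  # False
print(y._inputs)        # []  -- no link back to x in the graph
```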