NumPy Arrays: Efficient Numerical Computing

Welcome to NumPy Arrays! NumPy is Python's core library for numerical computing: it stores data in compact, typed arrays and runs mathematical operations over them in compiled code, which makes it the foundation of scientific computing in Python.

Why NumPy Matters

Plain Python lists are slow for math on large datasets, because every element is processed one at a time in an interpreted loop:

# Slow Python list approach
python_list = [1, 2, 3, 4, 5]
result = []
for x in python_list:
    result.append(x * 2 + 10)
# Takes milliseconds for large lists

With NumPy, the same operation is blazing fast:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
result = arr * 2 + 10  # Vectorized operation!
# The loop runs in compiled code, so it is typically far faster than the Python loop on large arrays

Creating Arrays

From Lists

import numpy as np

# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d)  # [1 2 3 4 5]
print(type(arr1d))  # <class 'numpy.ndarray'>

# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d)
# [[1 2 3]
#  [4 5 6]]

# 3D array
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(arr3d.shape)  # (2, 2, 2)

Special Arrays

# Arrays of zeros
zeros = np.zeros(5)  # [0. 0. 0. 0. 0.]
zeros_2d = np.zeros((3, 4))  # 3x4 matrix of zeros

# Arrays of ones
ones = np.ones(5)  # [1. 1. 1. 1. 1.]
ones_2d = np.ones((2, 3))  # 2x3 matrix of ones

# Identity matrix
identity = np.eye(3)  # 3x3 identity matrix
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Range arrays
range_arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]
step_arr = np.arange(0, 10, 2)  # [0 2 4 6 8]

# Linear spacing
linspace_arr = np.linspace(0, 1, 5)  # [0.   0.25 0.5  0.75 1.  ]

# Random arrays
random_arr = np.random.rand(5)  # 5 random floats in [0, 1)
random_ints = np.random.randint(1, 100, 10)  # 10 random integers from 1 to 99 (upper bound is exclusive)

Data Types

# Specify data type
int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = np.array([1.0, 2.0, 3.0], dtype=np.float64)
bool_arr = np.array([True, False, True], dtype=bool)

# Convert data types
float_arr = int_arr.astype(np.float64)
int_arr = float_arr.astype(np.int32)

# Check data type
print(int_arr.dtype)    # int32
print(float_arr.dtype)  # float64
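
One detail of astype is easy to miss: converting floats to integers drops the fractional part instead of rounding. The sketch below illustrates this with made-up values; the names values, truncated and rounded are illustrative only.

```python
import numpy as np

values = np.array([1.7, -1.7, 2.5])

# astype to an integer type truncates toward zero
truncated = values.astype(np.int32)
print(truncated)  # [ 1 -1  2]

# Round first if nearest-integer behaviour is wanted
# (np.round sends exact halves to the nearest even integer, so 2.5 -> 2)
rounded = np.round(values).astype(np.int32)
print(rounded)  # [ 2 -2  2]
```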

Array Properties

arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape:", arr.shape)  # (2, 3) - rows, columns
print("Size:", arr.size)   # 6 - total elements
print("Dimensions:", arr.ndim)  # 2 - number of dimensions
print("Data type:", arr.dtype)  # int64 on most platforms (int32 on Windows with NumPy < 2)
print("Item size:", arr.itemsize)  # 8 bytes per element for int64
print("Total bytes:", arr.nbytes)  # 48 bytes total (6 elements x 8 bytes)

Indexing and Slicing

1D Arrays

arr = np.array([10, 20, 30, 40, 50])

# Single element
print(arr[0])   # 10
print(arr[2])   # 30
print(arr[-1])  # 50 (last element)

# Slicing
print(arr[1:4])   # [20 30 40]
print(arr[:3])    # [10 20 30]
print(arr[2:])    # [30 40 50]
print(arr[::2])   # [10 30 50] (every other element)
print(arr[::-1])  # [50 40 30 20 10] (reversed)

2D Arrays

arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# Single element
print(arr2d[0, 0])  # 1 (row 0, column 0)
print(arr2d[1, 2])  # 7 (row 1, column 2)

# Entire row
print(arr2d[0])     # [1 2 3 4]
print(arr2d[1, :])  # [5 6 7 8]

# Entire column
print(arr2d[:, 0])  # [1 5 9]
print(arr2d[:, 2])  # [3 7 11]

# Sub-matrix
print(arr2d[0:2, 1:3])  # [[2 3] [6 7]]
print(arr2d[::2, ::2])  # [[1 3] [9 11]] (every other row and column)

Boolean Indexing

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Filter elements
mask = arr > 5
print(mask)  # [False False False False False  True  True  True  True  True]
print(arr[mask])  # [6 7 8 9 10]

# Multiple conditions
mask = (arr > 3) & (arr < 8)
print(arr[mask])  # [4 5 6 7]

# Even numbers
even_mask = arr % 2 == 0
print(arr[even_mask])  # [2 4 6 8 10]
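
Masks can also be combined with | (or) and ~ (not), used on the left-hand side of an assignment, or passed to np.where. The following is a small sketch of those variations; the variable names (outside, not_even, capped, labels) are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# OR (|) and NOT (~) combine masks the same way & does
outside = arr[(arr < 3) | (arr > 8)]
print(outside)  # [ 1  2  9 10]

not_even = arr[~(arr % 2 == 0)]
print(not_even)  # [1 3 5 7 9]

# Assigning through a mask changes only the selected elements
capped = arr.copy()
capped[capped > 5] = 5
print(capped)  # [1 2 3 4 5 5 5 5 5 5]

# np.where picks between two values element by element
labels = np.where(arr > 5, "high", "low")
print(labels[0], labels[-1])  # low high
```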

Mathematical Operations

Element-wise Operations

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Addition
print(a + b)  # [6 8 10 12]
print(np.add(a, b))  # Same result

# Subtraction
print(a - b)  # [-4 -4 -4 -4]

# Multiplication
print(a * b)  # [5 12 21 32]

# Division
print(b / a)  # [5. 3. 2.33333333 2.]

# Power
print(a ** 2)  # [1 4 9 16]

# Modulo
print(b % a)  # [0 0 1 0]

Broadcasting

arr = np.array([1, 2, 3, 4])

# Scalar operations (broadcasting)
print(arr + 10)  # [11 12 13 14]
print(arr * 2)   # [2 4 6 8]
print(arr ** 2)  # [1 4 9 16]

# Broadcasting with different shapes
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
print(arr2d + scalar)  # Adds 10 to every element

# Broadcasting arrays
arr1 = np.array([[1], [2], [3]])  # Shape (3, 1)
arr2 = np.array([10, 20, 30])     # Shape (3,)
print(arr1 + arr2)  # Shapes (3, 1) and (3,) broadcast to (3, 3)
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]
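
Broadcasting compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below, using made-up data, shows a shape pair that works, one that fails, and how np.newaxis repairs the failing case.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])       # shape (3, 2)

col_means = data.mean(axis=0)         # shape (2,)
centered = data - col_means           # (3, 2) with (2,) -> (3, 2)
print(centered.mean(axis=0))          # [0. 0.]

# np.broadcast_shapes reports the result shape without doing any arithmetic
print(np.broadcast_shapes((3, 2), (2,)))  # (3, 2)

row_offsets = np.array([100.0, 200.0, 300.0])  # shape (3,)
try:
    data + row_offsets                # trailing dimensions 2 and 3 do not match
except ValueError as exc:
    print("Cannot broadcast:", exc)

# Adding an axis turns (3,) into (3, 1), which is compatible with (3, 2)
shifted = data + row_offsets[:, np.newaxis]
print(shifted[0])  # [101. 110.]
```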

Mathematical Functions

arr = np.array([1, 4, 9, 16, 25])

# Square root
print(np.sqrt(arr))  # [1. 2. 3. 4. 5.]

# Trigonometric functions
angles = np.array([0, np.pi/4, np.pi/2, np.pi])
print(np.sin(angles))  # approximately [0, 0.7071, 1, 0]; sin(pi) prints as about 1.22e-16 because of floating-point rounding

# Exponential and logarithmic
print(np.exp(arr))   # e^1, e^4, e^9, etc.
print(np.log(arr))   # ln(1), ln(4), ln(9), etc.

# Rounding
float_arr = np.array([1.234, 2.567, 3.891])
print(np.round(float_arr, 1))  # [1.2 2.6 3.9]
print(np.floor(float_arr))     # [1. 2. 3.]
print(np.ceil(float_arr))      # [2. 3. 4.]

Array Manipulation

Reshaping

arr = np.arange(12)  # [0 1 2 3 4 5 6 7 8 9 10 11]

# Reshape to 2D
reshaped = arr.reshape(3, 4)
print(reshaped)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Reshape to 3D
reshaped_3d = arr.reshape(2, 2, 3)
print(reshaped_3d.shape)  # (2, 2, 3)

# Flatten array
flat = reshaped.flatten()
print(flat)  # [0 1 2 3 4 5 6 7 8 9 10 11]

# Transpose
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.T)
# [[1 4]
#  [2 5]
#  [3 6]]
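
Two reshaping details are worth a quick sketch: passing -1 lets NumPy infer one dimension, and flatten() always copies while ravel() returns a view when the memory layout allows it. The names below are illustrative.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the array's size
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 4)

grid = arr.reshape(3, 4)
flat_copy = grid.flatten()  # always a new array
flat_view = grid.ravel()    # a view here, because grid is contiguous

print(np.shares_memory(grid, flat_copy))  # False
print(np.shares_memory(grid, flat_view))  # True
```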

Joining Arrays

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concatenate
print(np.concatenate([a, b]))  # [1 2 3 4 5 6]

# Stack vertically
print(np.vstack([a, b]))
# [[1 2 3]
#  [4 5 6]]

# Stack horizontally
print(np.hstack([a, b]))  # [1 2 3 4 5 6]

# 2D arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print(np.vstack([arr1, arr2]))
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

print(np.hstack([arr1, arr2]))
# [[1 2 5 6]
#  [3 4 7 8]]
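
vstack and hstack are conveniences on top of the axis argument. As a sketch, np.stack joins arrays along a new axis, while np.concatenate with an explicit axis joins along an existing one (variable names are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.stack creates a new axis; its position decides the layout
as_rows = np.stack([a, b], axis=0)
as_cols = np.stack([a, b], axis=1)
print(as_rows.shape)  # (2, 3)
print(as_cols.shape)  # (3, 2)

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

# For 2D inputs, concatenate along axis 0 / 1 matches vstack / hstack
same_as_vstack = np.array_equal(np.concatenate([m1, m2], axis=0), np.vstack([m1, m2]))
same_as_hstack = np.array_equal(np.concatenate([m1, m2], axis=1), np.hstack([m1, m2]))
print(same_as_vstack, same_as_hstack)  # True True
```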

Splitting Arrays

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# Split into equal parts
print(np.split(arr, 2))  # [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]

# Split at specific indices
print(np.split(arr, [3, 7]))  # [array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])]

# 2D array splitting
arr2d = np.arange(16).reshape(4, 4)
print(np.vsplit(arr2d, 2))  # Two arrays of 2 rows each, shape (2, 4)
print(np.hsplit(arr2d, 2))  # Two arrays of 2 columns each, shape (4, 2)

Statistical Operations

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print("Sum:", np.sum(arr))           # 55
print("Mean:", np.mean(arr))         # 5.5
print("Median:", np.median(arr))     # 5.5
print("Standard deviation:", np.std(arr))  # 2.872...
print("Variance:", np.var(arr))      # 8.25
print("Minimum:", np.min(arr))       # 1
print("Maximum:", np.max(arr))       # 10
print("Range:", np.ptp(arr))         # 9 (peak-to-peak)

# Percentiles
print("25th percentile:", np.percentile(arr, 25))  # 3.25
print("75th percentile:", np.percentile(arr, 75))  # 7.75

# 2D array statistics
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print("Sum of all elements:", np.sum(matrix))      # 45
print("Column sums (axis=0):", np.sum(matrix, axis=0))    # [12 15 18]
print("Row sums (axis=1):", np.sum(matrix, axis=1))       # [ 6 15 24]
print("Column means (axis=0):", np.mean(matrix, axis=0))  # [4. 5. 6.]
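
The axis argument names the dimension that gets collapsed: axis=0 collapses the rows and leaves one value per column, axis=1 leaves one value per row. A short sketch, including keepdims=True so the result can be broadcast back against the original matrix (names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = matrix.sum(axis=0)   # one value per column
row_sums = matrix.sum(axis=1)   # one value per row
print(col_sums.shape, row_sums.shape)  # (3,) (3,)

# keepdims=True keeps the collapsed axis with length 1: shape (3, 1)
row_totals = matrix.sum(axis=1, keepdims=True)
print(row_totals.shape)  # (3, 1)

# Each row divided by its own total, so every row of shares sums to 1
shares = matrix / row_totals
print(np.allclose(shares.sum(axis=1), 1.0))  # True
```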

Random Number Generation

# Set seed for reproducible results
np.random.seed(42)

# Uniform random numbers [0, 1)
print(np.random.rand(5))  # [0.37454012 0.95071431 0.73199394 0.59865848 0.15601864]

# Random integers
print(np.random.randint(1, 10, 5))  # 5 random integers from 1 to 9

# Normal distribution
print(np.random.normal(0, 1, 5))  # 5 numbers from standard normal distribution

# Choice (sampling)
colors = ['red', 'blue', 'green', 'yellow']
print(np.random.choice(colors, 3))  # Random sample of 3 colors

# Shuffle array
arr = np.arange(10)
np.random.shuffle(arr)
print(arr)  # Shuffled in place

# Permutation
arr = np.arange(5)
print(np.random.permutation(arr))  # Random permutation
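
The functions above use NumPy's legacy global random state. NumPy's documentation recommends the Generator interface for new code; the sketch below repeats the same tasks with np.random.default_rng. Outputs are not shown because they differ from the legacy functions even with the same seed.

```python
import numpy as np

# A Generator carries its own state instead of relying on a global seed
rng = np.random.default_rng(42)

uniform = rng.random(5)                  # floats in [0, 1)
ints = rng.integers(1, 10, size=5)       # integers from 1 to 9 (upper bound exclusive)
normal = rng.normal(0, 1, size=5)        # standard normal samples

colors = ['red', 'blue', 'green', 'yellow']
picked = rng.choice(colors, size=3)      # sample with replacement

deck = np.arange(10)
rng.shuffle(deck)                        # shuffles in place

# The same seed reproduces the same stream
repeat = np.random.default_rng(42).random(5)
print(np.array_equal(uniform, repeat))   # True
```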

Practical Examples

Example 1: Image Processing Basics

# Create a simple 3x3 grayscale image
image = np.array([[0, 128, 255],
                  [64, 192, 128],
                  [255, 64, 0]], dtype=np.uint8)

print("Image shape:", image.shape)
print("Image data type:", image.dtype)
print("Min pixel value:", np.min(image))
print("Max pixel value:", np.max(image))
print("Mean pixel value:", np.mean(image))

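# Brightness adjustment
# uint8 arithmetic wraps around (255 + 50 becomes 49), so widen to int16 before clipping
wide = image.astype(np.int16)
brighter = np.clip(wide + 50, 0, 255).astype(np.uint8)
darker = np.clip(wide - 50, 0, 255).astype(np.uint8)

# Simple edge detection (difference between adjacent pixels)
# Uses the int16 copy so negative differences are not wrapped before np.abs
edges = np.abs(np.diff(wide, axis=1))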
print("Edge detection result:")
print(edges)
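
Because an image is just an array, several other basic operations reduce to arithmetic, slicing, and comparison. A minimal sketch on the same 3x3 image follows; the threshold of 127 is an arbitrary choice for illustration.

```python
import numpy as np

image = np.array([[0, 128, 255],
                  [64, 192, 128],
                  [255, 64, 0]], dtype=np.uint8)

# Invert: 255 - pixel always stays within 0..255, so uint8 is safe here
inverted = 255 - image
print(inverted[0])  # [255 127   0]

# Mirror left-to-right by reversing the column axis
flipped = image[:, ::-1]
print(flipped[0])   # [255 128   0]

# Threshold into a boolean mask of "bright" pixels
bright = image > 127
print(bright.sum())  # 5
```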

Example 2: Financial Calculations

# Stock prices over 10 days
prices = np.array([100, 102, 98, 105, 108, 95, 110, 115, 120, 118])

# Daily returns
daily_returns = np.diff(prices) / prices[:-1] * 100
print("Daily returns (%):", np.round(daily_returns, 2))

# Cumulative returns
cumulative_returns = np.cumprod(1 + daily_returns / 100) - 1
print("Cumulative returns (%):", np.round(cumulative_returns * 100, 2))

# Volatility (standard deviation of returns)
volatility = np.std(daily_returns)
print(f"Volatility: {volatility:.2f}%")

# Maximum drawdown
peak = np.maximum.accumulate(prices)
drawdown = (prices - peak) / peak
max_drawdown = np.min(drawdown)
print(f"Maximum drawdown: {max_drawdown:.2%}")

# Moving averages
def moving_average(data, window):
    return np.convolve(data, np.ones(window)/window, mode='valid')

ma_3 = moving_average(prices, 3)
ma_5 = moving_average(prices, 5)
print("3-day moving average:", ma_3)
print("5-day moving average:", ma_5)

Example 3: Matrix Operations

# Coefficient matrix for system of equations:
# 2x + 3y = 8
# 4x - y = 2

A = np.array([[2, 3], [4, -1]])
b = np.array([8, 2])

# Solve Ax = b
x = np.linalg.solve(A, b)
print(f"Solution: x = {x[0]:.2f}, y = {x[1]:.2f}")

# Matrix multiplication
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
product = np.dot(matrix1, matrix2)
print("Matrix product:")
print(product)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix1)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:")
print(eigenvectors)

# Matrix inverse
inverse = np.linalg.inv(matrix1)
identity = np.dot(matrix1, inverse)
print("Matrix inverse check (should be identity):")
print(np.round(identity, 10))
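
Two habits make linear algebra code easier to check: the @ operator, which is equivalent to np.dot for 2D arrays, and np.allclose, which compares floating-point results within a tolerance instead of relying on rounding. A short sketch using the same matrices:

```python
import numpy as np

A = np.array([[2, 3], [4, -1]])
b = np.array([8, 2])

x = np.linalg.solve(A, b)

# Substitute the solution back into the system to verify it
print(np.allclose(A @ x, b))  # True

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# @ gives the same matrix product as np.dot for 2D arrays
print(np.array_equal(matrix1 @ matrix2, np.dot(matrix1, matrix2)))  # True

# Check the inverse against the identity with a tolerance
inverse = np.linalg.inv(matrix1)
print(np.allclose(matrix1 @ inverse, np.eye(2)))  # True
```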

Example 4: Data Analysis Simulation

# Simulate exam scores for 100 students
np.random.seed(42)
scores = np.random.normal(75, 15, 100)  # Mean 75, std dev 15
scores = np.clip(scores, 0, 100)  # Keep scores between 0-100

print("Exam Statistics:")
print(f"Mean score: {np.mean(scores):.1f}")
print(f"Median score: {np.median(scores):.1f}")
print(f"Standard deviation: {np.std(scores):.1f}")
print(f"Highest score: {np.max(scores):.1f}")
print(f"Lowest score: {np.min(scores):.1f}")

# Grade distribution
def assign_grade(score):
    if score >= 90: return 'A'
    elif score >= 80: return 'B'
    elif score >= 70: return 'C'
    elif score >= 60: return 'D'
    else: return 'F'

grades = np.array([assign_grade(score) for score in scores])
unique_grades, counts = np.unique(grades, return_counts=True)

print("\nGrade Distribution:")
for grade, count in zip(unique_grades, counts):
    percentage = count / len(scores) * 100
    print(f"{grade}: {count} students ({percentage:.1f}%)")

# Correlation with study hours (simulated independently of the scores, so expect a value near 0)
study_hours = np.random.normal(20, 5, 100)
correlation = np.corrcoef(scores, study_hours)[0, 1]
print(f"\nCorrelation between scores and study hours: {correlation:.3f}")
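
The list comprehension above calls assign_grade once per score in Python. One possible vectorized alternative is np.digitize, which maps each score to the index of its bin. The sketch below uses a few hand-picked scores, and the names bins and letters are illustrative.

```python
import numpy as np

scores = np.array([95.0, 83.5, 70.0, 61.2, 42.0])

# Lower bounds of grades D, C, B, A; anything below 60 falls into bin 0
bins = np.array([60, 70, 80, 90])
letters = np.array(['F', 'D', 'C', 'B', 'A'])

# np.digitize returns i such that bins[i-1] <= score < bins[i]
grade_index = np.digitize(scores, bins)
grades = letters[grade_index]
print(grades)  # ['A' 'B' 'C' 'D' 'F']
```

The boundaries behave like the >= comparisons in assign_grade: a score of exactly 70 lands in the C bin.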

Performance Comparison

import time

# Large dataset
size = 1000000
python_list = list(range(size))
numpy_array = np.arange(size)

# Python list operation
# perf_counter is a monotonic, high-resolution clock intended for timing code
start = time.perf_counter()
result_list = [x * 2 + 1 for x in python_list]
python_time = time.perf_counter() - start

# NumPy operation
start = time.perf_counter()
result_numpy = numpy_array * 2 + 1
numpy_time = time.perf_counter() - start

print(f"Python list time: {python_time:.4f} seconds")
print(f"NumPy array time: {numpy_time:.4f} seconds")
print(f"NumPy is {python_time/numpy_time:.1f}x faster!")

# Verify results are the same
print("Results match:", np.array_equal(result_list, result_numpy))
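
A single timed run can be noisy. As a sketch, the standard-library timeit module repeats each measurement and lets you keep the best run; the size and repeat counts below are arbitrary choices, and the actual speedup depends on the machine.

```python
import timeit

import numpy as np

size = 100_000
python_list = list(range(size))
numpy_array = np.arange(size)

# Run each statement 10 times per measurement, take the best of 3 measurements
list_time = min(timeit.repeat(lambda: [x * 2 + 1 for x in python_list],
                              number=10, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_array * 2 + 1,
                               number=10, repeat=3))

print(f"List comprehension: {list_time:.4f} s for 10 runs")
print(f"NumPy expression:   {numpy_time:.4f} s for 10 runs")
```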

Best Practices

1. Vectorize Operations

# Good - vectorized
arr = np.array([1, 2, 3, 4, 5])
result = arr * 2 + 10

# Bad - loops
result = []
for x in arr:
    result.append(x * 2 + 10)

2. Use Appropriate Data Types

# Use smaller data types when the values fit (int8 holds -128 to 127)
large_array = np.zeros(1000000, dtype=np.int8)   # 1MB
# vs
large_array = np.zeros(1000000, dtype=np.int64)  # 8MB

3. Avoid Unnecessary Copies

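# Good - slicing returns a view, so no data is copied
arr = np.arange(10)
view = arr[2:8]  # No copy made; writing to view also changes arr

# Copy only when you need independent data
arr = np.arange(10)
copy = arr[2:8].copy()  # Explicit copy; changes to copy leave arr untouched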
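
The practical difference between a view and a copy shows up when you write to them. A small sketch (names are illustrative):

```python
import numpy as np

arr = np.arange(10)

view = arr[2:8]
view[0] = 99                 # writes through to arr
print(arr[2])                # 99

independent = arr[2:8].copy()
independent[1] = -1          # arr is unaffected
print(arr[3])                # 3

# np.shares_memory tells the two cases apart
print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```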

4. Use Broadcasting Wisely

# Good - broadcasting
arr = np.random.rand(1000, 1000)
normalized = (arr - arr.min()) / (arr.max() - arr.min())

# Be careful with broadcasts that create large intermediate arrays
arr1 = np.random.rand(100)
arr2 = np.random.rand(100)
# The result has shape (100, 100); for inputs of length n the output needs n * n elements
result = arr1[:, np.newaxis] + arr2

Practice Exercises

Exercise 1: Array Creation

Create the following arrays:

A 1D array of numbers from 1 to 50
A 2D array (5x5) of random integers between 10-99
A 3D array (2x3x4) filled with ones
An identity matrix of size 6x6

Exercise 2: Array Operations

Given two arrays:

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

Perform the following operations:

Add them element-wise
Multiply them element-wise
Calculate a squared plus b
Find elements where a > 2 and b < 40

Exercise 3: Matrix Calculations

Create a 4x4 matrix and:

Calculate the sum of each row
Calculate the sum of each column
Find the maximum value in each row
Transpose the matrix

Exercise 4: Statistical Analysis

Generate 1000 random numbers from a normal distribution and:

Calculate mean, median, and standard deviation
Find the 25th and 75th percentiles
Count how many values are within 1 standard deviation of the mean
Create a histogram of the data

Exercise 5: Image Manipulation

Create a 10x10 “image” (2D array) and:

Set the border pixels to 255 (white)
Set the inner pixels to 0 (black)
Add random noise to the inner pixels
Calculate the mean brightness of the image

Summary

NumPy arrays provide efficient numerical computing:

Creating Arrays:

import numpy as np

# From lists
arr = np.array([1, 2, 3, 4, 5])

# Special arrays
zeros = np.zeros(5)
ones = np.ones((3, 4))
identity = np.eye(3)
range_arr = np.arange(10)

Array Operations:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise operations
print(a + b)    # [5 7 9]
print(a * b)    # [4 10 18]
print(a ** 2)   # [1 4 9]

# Broadcasting
print(a + 10)   # [11 12 13]

Indexing and Slicing:

arr = np.array([10, 20, 30, 40, 50])

print(arr[0])      # 10
print(arr[1:4])    # [20 30 40]
print(arr[::2])    # [10 30 50]

Key Concepts:

Vectorized operations for speed
Broadcasting for flexible operations
Array indexing and slicing
Statistical functions
Array manipulation (reshape, join, split)

Next: Pandas DataFrames - Data manipulation and analysis! 📊
