NumPy Arrays: Efficient Numerical Computing
Welcome to NumPy Arrays! Think of NumPy as Python’s calculator on steroids - it handles complex mathematical operations with lightning speed, making it the foundation of scientific computing.
Why NumPy Matters
Plain Python lists are slow for element-wise math because every element is processed by an interpreted loop:
# Slow Python list approach
python_list = [1, 2, 3, 4, 5]
result = []
for x in python_list:
    result.append(x * 2 + 10)
# Takes milliseconds for large lists
With NumPy, the same operation is a single vectorized expression that runs in compiled code:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
result = arr * 2 + 10 # Vectorized operation!
# No Python-level loop; typically far faster than the list version on large arrays
Creating Arrays
From Lists
import numpy as np
# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d) # [1 2 3 4 5]
print(type(arr1d)) # <class 'numpy.ndarray'>
# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d)
# [[1 2 3]
# [4 5 6]]
# 3D array
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(arr3d.shape) # (2, 2, 2)
Special Arrays
# Arrays of zeros
zeros = np.zeros(5) # [0. 0. 0. 0. 0.]
zeros_2d = np.zeros((3, 4)) # 3x4 matrix of zeros
# Arrays of ones
ones = np.ones(5) # [1. 1. 1. 1. 1.]
ones_2d = np.ones((2, 3)) # 2x3 matrix of ones
# Identity matrix
identity = np.eye(3) # 3x3 identity matrix
# [[1. 0. 0.]
# [0. 1. 0.]
# [0. 0. 1.]]
# Range arrays
range_arr = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
step_arr = np.arange(0, 10, 2) # [0 2 4 6 8]
# Linear spacing
linspace_arr = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
# Random arrays
random_arr = np.random.rand(5) # 5 random numbers between 0-1
random_ints = np.random.randint(1, 100, 10) # 10 random integers 1-99
Data Types
# Specify data type
int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = np.array([1.0, 2.0, 3.0], dtype=np.float64)
bool_arr = np.array([True, False, True], dtype=bool)
# Convert data types
float_arr = int_arr.astype(np.float64)
int_arr = float_arr.astype(np.int32)
# Check data type
print(int_arr.dtype) # int32
print(float_arr.dtype) # float64
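One detail worth checking when converting types: a float-to-integer astype drops the fractional part instead of rounding. Here is a minimal sketch; the array and variable names are illustrative and not part of the lesson above.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.7])

# astype to an integer type truncates toward zero
truncated = values.astype(np.int32)
print(truncated)  # [ 1 -1  2]

# Round first if nearest-integer behavior is wanted
rounded = np.round(values).astype(np.int32)
print(rounded)  # [ 2 -2  3]
```

If the fractional part matters, it is usually safer to keep the float array and convert only at the point where integers are required.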
Array Properties
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape) # (2, 3) - rows, columns
print("Size:", arr.size) # 6 - total elements
print("Dimensions:", arr.ndim) # 2 - number of dimensions
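print("Data type:", arr.dtype) # int64 (platform-dependent default, e.g. int32 on Windows with NumPy 1.x)
print("Item size:", arr.itemsize) # 8 bytes per element for int64
print("Total bytes:", arr.nbytes) # 48 bytes total (6 elements x 8 bytes)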
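The properties above are related to each other, and the sketch below checks those relationships. It assumes an explicit float32 dtype (my choice for illustration) so the byte counts do not depend on the platform's default integer type.

```python
import numpy as np

# Explicit dtype so the numbers below do not depend on the platform default
arr = np.zeros((2, 3), dtype=np.float32)

print(arr.ndim == len(arr.shape))                # True
print(arr.size == arr.shape[0] * arr.shape[1])   # True
print(arr.itemsize)                              # 4 (bytes per float32)
print(arr.nbytes == arr.size * arr.itemsize)     # True (24 bytes)
```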
Indexing and Slicing
1D Arrays
arr = np.array([10, 20, 30, 40, 50])
# Single element
print(arr[0]) # 10
print(arr[2]) # 30
print(arr[-1]) # 50 (last element)
# Slicing
print(arr[1:4]) # [20 30 40]
print(arr[:3]) # [10 20 30]
print(arr[2:]) # [30 40 50]
print(arr[::2]) # [10 30 50] (every other element)
print(arr[::-1]) # [50 40 30 20 10] (reversed)
2D Arrays
arr2d = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Single element
print(arr2d[0, 0]) # 1 (row 0, column 0)
print(arr2d[1, 2]) # 7 (row 1, column 2)
# Entire row
print(arr2d[0]) # [1 2 3 4]
print(arr2d[1, :]) # [5 6 7 8]
# Entire column
print(arr2d[:, 0]) # [1 5 9]
print(arr2d[:, 2]) # [3 7 11]
# Sub-matrix
print(arr2d[0:2, 1:3]) # [[2 3] [6 7]]
print(arr2d[::2, ::2]) # [[1 3] [9 11]] (every other row and column)
Boolean Indexing
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Filter elements
mask = arr > 5
print(mask) # [False False False False False True True True True True]
print(arr[mask]) # [6 7 8 9 10]
# Multiple conditions
mask = (arr > 3) & (arr < 8)
print(arr[mask]) # [4 5 6 7]
# Even numbers
even_mask = arr % 2 == 0
print(arr[even_mask]) # [2 4 6 8 10]
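Masks can also be combined with OR and NOT, used for assignment, or passed to np.where. The following is a small sketch of those three uses; the names outside, not_even, capped, and labels are made up for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# OR and NOT use | and ~ (the keywords `or` / `not` raise ValueError on multi-element arrays)
outside = arr[(arr < 3) | (arr > 8)]
print(outside)  # [ 1  2  9 10]

not_even = arr[~(arr % 2 == 0)]
print(not_even)  # [1 3 5 7 9]

# Assigning through a mask modifies only the selected elements
capped = arr.copy()
capped[capped > 5] = 5
print(capped)  # [1 2 3 4 5 5 5 5 5 5]

# np.where picks one of two values per element and returns a new array
labels = np.where(arr > 5, "high", "low")
print(labels[4], labels[5])  # low high
```

Note the parentheses around each comparison: & and | bind more tightly than < and >, so leaving them out changes the meaning.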
Mathematical Operations
Element-wise Operations
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Addition
print(a + b) # [6 8 10 12]
print(np.add(a, b)) # Same result
# Subtraction
print(a - b) # [-4 -4 -4 -4]
# Multiplication
print(a * b) # [5 12 21 32]
# Division
print(b / a) # [5. 3. 2.33333333 2.]
# Power
print(a ** 2) # [1 4 9 16]
# Modulo
print(b % a) # [0 0 1 0]
Broadcasting
arr = np.array([1, 2, 3, 4])
# Scalar operations (broadcasting)
print(arr + 10) # [11 12 13 14]
print(arr * 2) # [2 4 6 8]
print(arr ** 2) # [1 4 9 16]
# Broadcasting with different shapes
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
print(arr2d + scalar) # Adds 10 to every element
# Broadcasting arrays
arr1 = np.array([[1], [2], [3]]) # Shape (3, 1)
arr2 = np.array([10, 20, 30]) # Shape (3,)
print(arr1 + arr2) # Shapes (3, 1) and (3,) broadcast to (3, 3)
# [[11 21 31]
# [12 22 32]
# [13 23 33]]
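The rule behind broadcasting is that shapes are compared from the rightmost dimension, and two dimensions are compatible when they are equal or one of them is 1. A rough sketch of the rule in action, including a case that fails (all variable names here are illustrative):

```python
import numpy as np

grid = np.zeros((2, 3))
col_offsets = np.array([1, 2, 3])      # shape (3,), treated as (1, 3)
row_offsets = np.array([[10], [20]])   # shape (2, 1)

# (2, 3) + (3,): the offsets are repeated for every row
print(grid + col_offsets)
# [[1. 2. 3.]
#  [1. 2. 3.]]

# (2, 3) + (2, 1): the offsets are repeated for every column
print(grid + row_offsets)
# [[10. 10. 10.]
#  [20. 20. 20.]]

# (2, 1) + (3,): both sides stretch, giving (2, 3)
combined = row_offsets + col_offsets
print(combined.shape)  # (2, 3)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails
try:
    np.ones((2, 3)) + np.ones(4)
    broadcast_ok = True
except ValueError as exc:
    broadcast_ok = False
    print("Cannot broadcast:", exc)
```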
Mathematical Functions
arr = np.array([1, 4, 9, 16, 25])
# Square root
print(np.sqrt(arr)) # [1. 2. 3. 4. 5.]
# Trigonometric functions
angles = np.array([0, np.pi/4, np.pi/2, np.pi])
print(np.sin(angles)) # approximately [0. 0.70710678 1. 0.]; sin(pi) prints as about 1.22e-16, not exactly 0
# Exponential and logarithmic
print(np.exp(arr)) # e^1, e^4, e^9, etc.
print(np.log(arr)) # ln(1), ln(4), ln(9), etc.
# Rounding
float_arr = np.array([1.234, 2.567, 3.891])
print(np.round(float_arr, 1)) # [1.2 2.6 3.9]
print(np.floor(float_arr)) # [1. 2. 3.]
print(np.ceil(float_arr)) # [2. 3. 4.]
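Two floating-point details are easy to trip over with these functions: results such as sin(pi) are only approximately zero, and np.round sends exact halves to the nearest even value. A short sketch of both (the variable names are my own):

```python
import numpy as np

# sin(pi) is not exactly zero in floating point
s = np.sin(np.pi)
print(s == 0.0)            # False
print(np.isclose(s, 0.0))  # True

# np.round rounds exact halves to the nearest even value
halves = np.array([0.5, 1.5, 2.5, 3.5])
print(np.round(halves))    # [0. 2. 2. 4.]
```

For comparisons, np.isclose (or np.allclose for whole arrays) is generally the safer choice than ==.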
Array Manipulation
Reshaping
arr = np.arange(12) # [0 1 2 3 4 5 6 7 8 9 10 11]
# Reshape to 2D
reshaped = arr.reshape(3, 4)
print(reshaped)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
# Reshape to 3D
reshaped_3d = arr.reshape(2, 2, 3)
print(reshaped_3d.shape) # (2, 2, 3)
# Flatten array
flat = reshaped.flatten()
print(flat) # [0 1 2 3 4 5 6 7 8 9 10 11]
# Transpose
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.T)
# [[1 4]
# [2 5]
# [3 6]]
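A few reshaping details are worth trying out: -1 lets NumPy infer a dimension, reshape on a contiguous array gives a view while flatten copies, and a size mismatch is an error. The sketch below is illustrative only and uses np.shares_memory to check whether data is shared.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# reshape of a contiguous array returns a view; flatten always copies
print(np.shares_memory(arr, inferred))            # True
print(np.shares_memory(arr, inferred.flatten()))  # False

# 12 elements cannot fill a 5x3 array, so this raises ValueError
try:
    arr.reshape(5, 3)
    reshape_ok = True
except ValueError:
    reshape_ok = False
print(reshape_ok)  # False
```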
Joining Arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Concatenate
print(np.concatenate([a, b])) # [1 2 3 4 5 6]
# Stack vertically
print(np.vstack([a, b]))
# [[1 2 3]
# [4 5 6]]
# Stack horizontally
print(np.hstack([a, b])) # [1 2 3 4 5 6]
# 2D arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
print(np.vstack([arr1, arr2]))
# [[1 2]
# [3 4]
# [5 6]
# [7 8]]
print(np.hstack([arr1, arr2]))
# [[1 2 5 6]
# [3 4 7 8]]
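The stacking helpers above are closely related to np.concatenate and np.stack with an explicit axis argument. The sketch below shows the correspondence as I understand it, using the same small arrays (rows, cols, m1, and m2 are illustrative names):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.stack adds a new axis; axis=0 matches vstack for 1D inputs
rows = np.stack([a, b], axis=0)
print(rows.shape)  # (2, 3)

# axis=1 pairs the elements up as columns instead
cols = np.stack([a, b], axis=1)
print(cols)
# [[1 4]
#  [2 5]
#  [3 6]]

# For 2D inputs, concatenate with axis=0 / axis=1 matches vstack / hstack
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
print(np.array_equal(np.concatenate([m1, m2], axis=0), np.vstack([m1, m2])))  # True
print(np.array_equal(np.concatenate([m1, m2], axis=1), np.hstack([m1, m2])))  # True
```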
Splitting Arrays
arr = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
# Split into equal parts
print(np.split(arr, 2)) # [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
# Split at specific indices
print(np.split(arr, [3, 7])) # [array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])]
# 2D array splitting
arr2d = np.arange(16).reshape(4, 4)
print(np.vsplit(arr2d, 2)) # Two (2, 4) arrays: rows 0-1 and rows 2-3
print(np.hsplit(arr2d, 2)) # Two (4, 2) arrays: columns 0-1 and columns 2-3
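np.split with an integer only works when the array divides evenly. When it does not, np.array_split is the usual alternative. A brief sketch of the difference (variable names are illustrative):

```python
import numpy as np

arr = np.arange(10)

# np.split requires an even division and raises ValueError otherwise
try:
    np.split(arr, 3)
    even_split = True
except ValueError:
    even_split = False
print(even_split)  # False

# np.array_split allows uneven parts; the earlier parts get the extra elements
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```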
Statistical Operations
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print("Sum:", np.sum(arr)) # 55
print("Mean:", np.mean(arr)) # 5.5
print("Median:", np.median(arr)) # 5.5
print("Standard deviation:", np.std(arr)) # 2.872...
print("Variance:", np.var(arr)) # 8.25
print("Minimum:", np.min(arr)) # 1
print("Maximum:", np.max(arr)) # 10
print("Range:", np.ptp(arr)) # 9 (peak-to-peak)
# Percentiles
print("25th percentile:", np.percentile(arr, 25)) # 3.25
print("75th percentile:", np.percentile(arr, 75)) # 7.75
# 2D array statistics
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Sum of all elements:", np.sum(matrix)) # 45
print("Column sums (axis=0):", np.sum(matrix, axis=0)) # [12 15 18]
print("Row sums (axis=1):", np.sum(matrix, axis=1)) # [6 15 24]
print("Column means (axis=0):", np.mean(matrix, axis=0)) # [4. 5. 6.]
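The axis argument names the dimension that gets collapsed: axis=0 reduces over rows and leaves one value per column, while axis=1 leaves one value per row. The sketch below adds two related options, keepdims and ddof, that often come up next; treat it as an illustration, not a complete tour.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# keepdims=True keeps the reduced axis with length 1, so the result broadcasts back
row_means = matrix.mean(axis=1, keepdims=True)
print(row_means.shape)  # (3, 1)

centered = matrix - row_means
print(centered)
# [[-1.  0.  1.]
#  [-1.  0.  1.]
#  [-1.  0.  1.]]

# np.var / np.std default to the population formula (ddof=0);
# ddof=1 divides by n - 1 to give the sample estimate
data = np.arange(1, 11)
print(np.var(data))                      # 8.25
print(round(float(np.var(data, ddof=1)), 4))  # 9.1667
```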
Random Number Generation
# Set seed for reproducible results
np.random.seed(42)
# Uniform random numbers [0, 1)
print(np.random.rand(5)) # [0.37454012 0.95071431 0.73199394 0.59865848 0.15601864]
# Random integers
print(np.random.randint(1, 10, 5)) # 5 random integers from 1 to 9
# Normal distribution
print(np.random.normal(0, 1, 5)) # 5 numbers from standard normal distribution
# Choice (sampling)
colors = ['red', 'blue', 'green', 'yellow']
print(np.random.choice(colors, 3)) # 3 colors sampled with replacement (the default)
# Shuffle array
arr = np.arange(10)
np.random.shuffle(arr)
print(arr) # Shuffled in place
# Permutation
arr = np.arange(5)
print(np.random.permutation(arr)) # Random permutation
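The functions above use NumPy's legacy global random state. Current NumPy documentation recommends the Generator interface created with np.random.default_rng for new code. The sketch below mirrors the calls above with that interface; the seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

# A Generator carries its own state instead of relying on a global seed
rng = np.random.default_rng(42)

uniform = rng.random(5)                # floats in [0, 1)
ints = rng.integers(1, 10, size=5)     # integers from 1 to 9 (upper bound excluded)
normal = rng.normal(0, 1, size=5)      # standard normal samples
picked = rng.choice(['red', 'blue', 'green', 'yellow'], size=3, replace=False)
shuffled = rng.permutation(10)         # shuffled copy of 0..9

# Re-creating the generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```

Passing a generator object into functions, instead of calling np.random.seed once at the top, keeps each piece of code reproducible on its own.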
Practical Examples
Example 1: Image Processing Basics
# Create a simple 3x3 grayscale image
image = np.array([[0, 128, 255],
[64, 192, 128],
[255, 64, 0]], dtype=np.uint8)
print("Image shape:", image.shape)
print("Image data type:", image.dtype)
print("Min pixel value:", np.min(image))
print("Max pixel value:", np.max(image))
print("Mean pixel value:", np.mean(image))
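# Brightness adjustment
# Cast to a wider signed type first: uint8 arithmetic wraps around (255 + 50 -> 49)
wide = image.astype(np.int16)
brighter = np.clip(wide + 50, 0, 255).astype(np.uint8)
darker = np.clip(wide - 50, 0, 255).astype(np.uint8)
# Simple edge detection (difference between adjacent pixels)
edges = np.abs(np.diff(wide, axis=1))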
print("Edge detection result:")
print(edges)
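Arithmetic on uint8 pixel data deserves extra care, because results wrap around modulo 256 before any clipping can happen. A tiny sketch of the pitfall and one common workaround (widening to int16 is my choice here; any type wide enough for the intermediate values would do):

```python
import numpy as np

pixels = np.array([0, 100, 250], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256, so clipping afterwards is too late
wrapped = np.clip(pixels + np.uint8(50), 0, 255)
print(wrapped)  # [ 50 150  44]

# Widening first keeps the true sum, which clip can then bound
safe = np.clip(pixels.astype(np.int16) + 50, 0, 255).astype(np.uint8)
print(safe)  # [ 50 150 255]
```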
Example 2: Financial Calculations
# Stock prices over 10 days
prices = np.array([100, 102, 98, 105, 108, 95, 110, 115, 120, 118])
# Daily returns
daily_returns = np.diff(prices) / prices[:-1] * 100
print("Daily returns (%):", np.round(daily_returns, 2))
# Cumulative returns
cumulative_returns = np.cumprod(1 + daily_returns / 100) - 1
print("Cumulative returns (%):", np.round(cumulative_returns * 100, 2))
# Volatility (standard deviation of returns)
volatility = np.std(daily_returns)
print(f"Volatility: {volatility:.2f}%")
# Maximum drawdown
peak = np.maximum.accumulate(prices)
drawdown = (prices - peak) / peak
max_drawdown = np.min(drawdown)
print(f"Maximum drawdown: {max_drawdown:.2%}")
# Moving averages
def moving_average(data, window):
    return np.convolve(data, np.ones(window) / window, mode='valid')
ma_3 = moving_average(prices, 3)
ma_5 = moving_average(prices, 5)
print("3-day moving average:", ma_3)
print("5-day moving average:", ma_5)
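A quick sanity check on calculations like these: compounding the daily returns should reproduce the overall price change, and a 'valid' moving average should have len(data) - window + 1 points. The sketch below verifies both on the same price series, with returns kept as fractions instead of percentages.

```python
import numpy as np

prices = np.array([100, 102, 98, 105, 108, 95, 110, 115, 120, 118])

# Simple returns as fractions (not percent)
returns = np.diff(prices) / prices[:-1]

# Compounding the daily returns must reproduce the overall price change
cumulative = np.cumprod(1 + returns) - 1
total_return = prices[-1] / prices[0] - 1
print(np.isclose(cumulative[-1], total_return))  # True
print(f"{total_return:.2%}")                     # 18.00%

# A 'valid' moving average has len(data) - window + 1 points
window = 3
ma = np.convolve(prices, np.ones(window) / window, mode='valid')
print(len(ma))                              # 8
print(np.isclose(ma[0], prices[:3].mean())) # True (mean of 100, 102, 98)
```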
Example 3: Matrix Operations
# Coefficient matrix for system of equations:
# 2x + 3y = 8
# 4x - y = 2
A = np.array([[2, 3], [4, -1]])
b = np.array([8, 2])
# Solve Ax = b
x = np.linalg.solve(A, b)
print(f"Solution: x = {x[0]:.2f}, y = {x[1]:.2f}")
# Matrix multiplication
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
product = np.dot(matrix1, matrix2)
print("Matrix product:")
print(product)
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix1)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:")
print(eigenvectors)
# Matrix inverse
inverse = np.linalg.inv(matrix1)
identity = np.dot(matrix1, inverse)
print("Matrix inverse check (should be identity):")
print(np.round(identity, 10))
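Results from np.linalg are easiest to trust once they are substituted back into the original equations. The sketch below does that for the solve and eig calls, and also contrasts the @ matrix-product operator with element-wise *; it reuses the matrices from the example, with m1 and m2 as shortened illustrative names.

```python
import numpy as np

A = np.array([[2, 3], [4, -1]])
b = np.array([8, 2])

x = np.linalg.solve(A, b)
# Substitute back: A @ x should reproduce b up to rounding error
print(np.allclose(A @ x, b))   # True
print(np.allclose(x, [1, 2]))  # True

# @ is the matrix product; * is element-wise
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
print(m1 @ m2)
# [[19 22]
#  [43 50]]
print(m1 * m2)
# [[ 5 12]
#  [21 32]]

# Each eigenpair satisfies m1 @ v = lambda * v (eigenvectors are the columns)
eigenvalues, eigenvectors = np.linalg.eig(m1)
print(np.allclose(m1 @ eigenvectors, eigenvectors * eigenvalues))  # True
```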
Example 4: Data Analysis Simulation
# Simulate exam scores for 100 students
np.random.seed(42)
scores = np.random.normal(75, 15, 100) # Mean 75, std dev 15
scores = np.clip(scores, 0, 100) # Keep scores between 0-100
print("Exam Statistics:")
print(f"Mean score: {np.mean(scores):.1f}")
print(f"Median score: {np.median(scores):.1f}")
print(f"Standard deviation: {np.std(scores):.1f}")
print(f"Highest score: {np.max(scores):.1f}")
print(f"Lowest score: {np.min(scores):.1f}")
# Grade distribution
def assign_grade(score):
    if score >= 90: return 'A'
    elif score >= 80: return 'B'
    elif score >= 70: return 'C'
    elif score >= 60: return 'D'
    else: return 'F'
grades = np.array([assign_grade(score) for score in scores])
unique_grades, counts = np.unique(grades, return_counts=True)
print("\nGrade Distribution:")
for grade, count in zip(unique_grades, counts):
    percentage = count / len(scores) * 100
    print(f"{grade}: {count} students ({percentage:.1f}%)")
# Correlation with study hours (simulated independently of the scores, so expect a value near 0)
study_hours = np.random.normal(20, 5, 100)
correlation = np.corrcoef(scores, study_hours)[0, 1]
print(f"\nCorrelation between scores and study hours: {correlation:.3f}")
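The grade assignment above loops over the scores in Python. One possible vectorized alternative, sketched here with np.digitize and a handful of hand-picked scores, maps each score to a bin index and then looks up the letter. The boundaries and letters arrays are helper names introduced for this example.

```python
import numpy as np

scores = np.array([95.0, 85.0, 75.0, 65.0, 55.0, 90.0, 60.0])

# Grade boundaries; np.digitize returns how many boundaries each score has reached
boundaries = np.array([60, 70, 80, 90])
letters = np.array(['F', 'D', 'C', 'B', 'A'])

grades = letters[np.digitize(scores, boundaries)]
print(grades)  # ['A' 'B' 'C' 'D' 'F' 'A' 'D']

unique_grades, counts = np.unique(grades, return_counts=True)
print(dict(zip(unique_grades.tolist(), counts.tolist())))
# {'A': 2, 'B': 1, 'C': 1, 'D': 2, 'F': 1}
```

A score exactly on a boundary (90 or 60 here) lands in the higher grade, which matches the >= comparisons in assign_grade.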
Performance Comparison
import time
# Large dataset
size = 1000000
python_list = list(range(size))
numpy_array = np.arange(size)
# Python list operation
start = time.perf_counter() # perf_counter is a high-resolution clock meant for timing
result_list = [x * 2 + 1 for x in python_list]
python_time = time.perf_counter() - start
# NumPy operation
start = time.perf_counter()
result_numpy = numpy_array * 2 + 1
numpy_time = time.perf_counter() - start
print(f"Python list time: {python_time:.4f} seconds")
print(f"NumPy array time: {numpy_time:.4f} seconds")
print(f"NumPy is {python_time/numpy_time:.1f}x faster!")
# Verify results are the same
print("Results match:", np.array_equal(result_list, result_numpy))
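A single timed run can be noisy. As a rough alternative, the standard-library timeit module can repeat each measurement and report the best run. The sketch below uses a smaller array and helper functions (with_list, with_numpy) that are my own; the actual numbers will vary by machine.

```python
import timeit
import numpy as np

size = 100_000
python_list = list(range(size))
numpy_array = np.arange(size)

def with_list():
    return [x * 2 + 1 for x in python_list]

def with_numpy():
    return numpy_array * 2 + 1

# Take the best of several repeats to reduce noise from other processes
list_time = min(timeit.repeat(with_list, number=10, repeat=5)) / 10
numpy_time = min(timeit.repeat(with_numpy, number=10, repeat=5)) / 10

print(f"List comprehension: {list_time * 1e3:.3f} ms per run")
print(f"NumPy expression:   {numpy_time * 1e3:.3f} ms per run")
print("Results match:", np.array_equal(with_list(), with_numpy()))
```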
Best Practices
1. Vectorize Operations
# Good - vectorized
arr = np.array([1, 2, 3, 4, 5])
result = arr * 2 + 10
# Bad - loops
result = []
for x in arr:
    result.append(x * 2 + 10)
2. Use Appropriate Data Types
# Use smaller data types when possible
large_array = np.zeros(1000000, dtype=np.int8) # 1MB
# vs
large_array = np.zeros(1000000, dtype=np.int64) # 8MB
3. Avoid Unnecessary Copies
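# Slicing returns a view: no data is copied, and writes through the view change arr
arr = np.arange(10)
view = arr[2:8] # No copy made
# Copy only when you need data that is independent of the original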
arr = np.arange(10)
copy = arr[2:8].copy() # Explicit copy
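To see the difference between a view and a copy, try writing to each and then inspecting the original. The sketch below does this and uses np.shares_memory and the base attribute as checks.

```python
import numpy as np

arr = np.arange(10)

view = arr[2:8]          # basic slice: shares memory with arr
copy = arr[2:8].copy()   # independent data

view[0] = 99             # writes through to arr[2]
copy[1] = -1             # arr is unaffected

print(arr)                          # [ 0  1 99  3  4  5  6  7  8  9]
print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True
```

The flip side of cheap views is that an accidental write through one silently changes the original, so copy when the data must stay independent.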
4. Use Broadcasting Wisely
# Good - broadcasting
arr = np.random.rand(1000, 1000)
normalized = (arr - arr.min()) / (arr.max() - arr.min())
# Watch memory when broadcasting builds a large intermediate array
arr1 = np.random.rand(100)
arr2 = np.random.rand(100)
# The result has shape (100, 100): inputs of length N and M produce N * M elements
result = arr1[:, np.newaxis] + arr2
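As a concrete case of broadcasting used well, here is a sketch of per-column min-max normalization, followed by a check of how many elements an outer sum allocates. The data is synthetic, and the names (data, col_min, col_max, outer) are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: three columns on different scales
data = rng.random((1000, 3)) * np.array([1.0, 10.0, 100.0])

# Per-column min and max have shape (3,), which broadcasts across all 1000 rows
col_min = data.min(axis=0)
col_max = data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min)

print(normalized.min(axis=0))  # [0. 0. 0.]
print(normalized.max(axis=0))  # [1. 1. 1.]

# An outer sum of two length-n vectors allocates n * n elements
n = 100
outer = np.arange(n)[:, np.newaxis] + np.arange(n)
print(outer.shape, outer.size)  # (100, 100) 10000
```

The normalization only ever holds arrays the size of data, while the outer sum grows quadratically with n, which is where memory becomes the limiting factor.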
Practice Exercises
Exercise 1: Array Creation
Create the following arrays:
- A 1D array of numbers from 1 to 50
- A 2D array (5x5) of random integers between 10-99
- A 3D array (2x3x4) filled with ones
- An identity matrix of size 6x6
Exercise 2: Array Operations
Given two arrays:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
Perform the following operations:
- Add them element-wise
- Multiply them element-wise
- Calculate a squared plus b
- Find elements where a > 2 and b < 40
Exercise 3: Matrix Calculations
Create a 4x4 matrix and:
- Calculate the sum of each row
- Calculate the sum of each column
- Find the maximum value in each row
- Transpose the matrix
Exercise 4: Statistical Analysis
Generate 1000 random numbers from a normal distribution and:
- Calculate mean, median, and standard deviation
- Find the 25th and 75th percentiles
- Count how many values are within 1 standard deviation of the mean
- Create a histogram of the data
Exercise 5: Image Manipulation
Create a 10x10 “image” (2D array) and:
- Set the border pixels to 255 (white)
- Set the inner pixels to 0 (black)
- Add random noise to the inner pixels
- Calculate the mean brightness of the image
Summary
NumPy arrays provide efficient numerical computing:
Creating Arrays:
import numpy as np
# From lists
arr = np.array([1, 2, 3, 4, 5])
# Special arrays
zeros = np.zeros(5)
ones = np.ones((3, 4))
identity = np.eye(3)
range_arr = np.arange(10)
Array Operations:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise operations
print(a + b) # [5 7 9]
print(a * b) # [4 10 18]
print(a ** 2) # [1 4 9]
# Broadcasting
print(a + 10) # [11 12 13]
Indexing and Slicing:
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # 10
print(arr[1:4]) # [20 30 40]
print(arr[::2]) # [10 30 50]
Key Concepts:
- Vectorized operations for speed
- Broadcasting for flexible operations
- Array indexing and slicing
- Statistical functions
- Array manipulation (reshape, join, split)
Next: Pandas DataFrames - Data manipulation and analysis! 📊