NumPy Interview

Question: What is NumPy and why is it important in data science?

Answer: NumPy is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

  • Efficient storage and manipulation of numerical data.
  • Vectorized mathematical operations run in compiled code, typically much faster than equivalent loops over Python lists.
  • Serves as the foundation for other data science libraries such as pandas and scikit-learn.
```python
import numpy as np
array = np.array([1, 2, 3, 4, 5])
# Display the array
print(array)
```
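To make the "efficient storage" point concrete, you can inspect an array's metadata. This sketch fixes `dtype=np.int64`, an assumption made only so the byte counts do not depend on the platform's default integer type.

```python
import numpy as np

# Assumption: explicit 64-bit integers so the byte counts are platform-independent
array = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(array.dtype)     # int64
print(array.shape)     # (5,)
print(array.itemsize)  # 8 bytes per element
print(array.nbytes)    # 40 bytes for the whole data buffer
```

Every element occupies the same fixed number of bytes in one contiguous buffer, which is what lets NumPy loop over the data in compiled code.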

Question: How do you create a NumPy array?

Answer: You can create a NumPy array using the np.array() function by passing a Python list or tuple.

  • Creating a 1D array from a flat list.
  • Creating a 2D array from a list of lists.
  • Creating arrays with a specific data type via the dtype argument.
```python
import numpy as np
# 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Array with specific data type
array_float = np.array([1, 2, 3], dtype='float')
print(array_1d)
print(array_2d)
print(array_float)
```
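Since the answer mentions tuples as well as lists, here is a short sketch that builds an array from a tuple and then checks the attributes interviewers often ask about: `ndim`, `shape`, and `dtype`.

```python
import numpy as np

# np.array() accepts a tuple just as it accepts a list
from_tuple = np.array((1.5, 2.5, 3.5))
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Number of dimensions, shape, and element type
print(from_tuple.ndim, from_tuple.shape, from_tuple.dtype)  # 1 (3,) float64
print(array_2d.ndim, array_2d.shape)                         # 2 (2, 3)
```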

Question: How can you perform element-wise operations on NumPy arrays?

Answer: NumPy allows you to perform element-wise operations directly on arrays using standard arithmetic operators.

  • Addition, subtraction, multiplication, and division.
  • Exponentiation with the ** operator.
  • Universal functions such as np.sin() or np.log(), which apply to every element.
```python
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise addition
c = a + b
# Element-wise multiplication
d = a * b
# Element-wise sine
e = np.sin(a)
print(c)
print(d)
print(e)
```
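The remaining operations listed above (subtraction, division, exponentiation, and np.log()) follow the same element-wise pattern. A minimal sketch reusing the same two arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

diff = b - a        # element-wise subtraction: 3, 3, 3
ratio = b / a       # true division always yields floats: 4.0, 2.5, 2.0
power = a ** 2      # element-wise exponentiation: 1, 4, 9
logs = np.log(a)    # natural logarithm of each element: 0.0, ln 2, ln 3

print(diff, ratio, power, logs)
```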

Question: What is broadcasting in NumPy?

Answer: Broadcasting is the mechanism that lets NumPy perform arithmetic on arrays of different shapes. The smaller array is treated as if it were stretched along missing or size-1 dimensions to match the larger one, without actually copying its data.

  • Enables operations between arrays of different dimensions.
  • Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.
  • Avoids the need to replicate data explicitly (for example with np.tile()).
```python
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([10, 20, 30])            # shape (3,)
# Broadcasting addition: b is added to each row of a
c = a + b
print(c)
```
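The shape-compatibility rule can be seen by broadcasting a column vector against a row vector, and by trying a pair of shapes that break the rule. This is an illustrative sketch; the array names are arbitrary.

```python
import numpy as np

col = np.array([[1], [2]])        # shape (2, 1)
row = np.array([10, 20, 30])      # shape (3,)

# Trailing dims: 1 vs 3 -> 3; leading dims: 2 vs (missing) -> 2
grid = col + row                  # shape (2, 3): [[11, 21, 31], [12, 22, 32]]
print(grid.shape)

# (2, 3) and (2,) are incompatible: trailing dims 3 and 2 differ and neither is 1
try:
    np.ones((2, 3)) + np.array([1, 2])
except ValueError as err:
    print("Broadcast error:", err)
```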

Question: How do you index and slice NumPy arrays?

Answer: Indexing and slicing let you access and modify specific elements or subsets of an array. Integer indexing and slicing work like they do for Python lists, extended to multiple dimensions with comma-separated indices such as a[row, col].

  • Accessing individual elements using indices.
  • Slicing to obtain subarrays.
  • Using boolean indexing for conditional selection.
```python
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access element at row index 1, column index 2 (zero-based), i.e. 6
element = a[1, 2]
print(element)
# Slice the first two rows and the first two columns
slice_a = a[:2, :2]
print(slice_a)
# Boolean indexing: select all elements greater than 5 (returns a 1D array)
bool_idx = a > 5
print(a[bool_idx])
```
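Because the answer mentions modifying elements, it is worth knowing that a basic slice is a view of the original data, not a copy. The sketch below shows a write through a slice, an independent copy made with .copy(), and assignment through a boolean mask.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Basic slicing returns a view: writing to it changes the original array
top_left = a[:2, :2]
top_left[0, 0] = 100
print(a[0, 0])          # 100

# .copy() gives an independent subarray
safe = a[:2, :2].copy()
safe[0, 0] = -1
print(a[0, 0])          # still 100

# Boolean masks also work for assignment: every element above 5
# (including the 100 written earlier) becomes 0
a[a > 5] = 0
print(a)
```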

Question: How do you reshape a NumPy array?

Answer: Reshaping a NumPy array changes its dimensions without changing its data. This can be done using the reshape() method.

  • Changing the shape of an array to have different dimensions.
  • Ensuring the total number of elements remains the same.
  • Using -1 to automatically calculate the size of one dimension.
```python
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
# Reshape to 2x3 array
b = a.reshape((2, 3))
print(b)
# Reshape using -1: NumPy infers 3 rows (6 elements / 2 columns)
c = a.reshape((-1, 2))
print(c)
```
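The "same total number of elements" requirement is enforced: NumPy raises a ValueError when the requested shape does not match the array's size. A short sketch, which also shows reshape(-1) flattening an array back to 1D:

```python
import numpy as np

a = np.arange(6)                  # 0, 1, 2, 3, 4, 5

# -1 asks NumPy to infer that dimension: 6 elements / 3 columns = 2 rows
print(a.reshape(-1, 3).shape)     # (2, 3)

# reshape(-1) flattens any array back to 1D
print(a.reshape(2, 3).reshape(-1))

# 6 elements cannot fill a 4x2 array
try:
    a.reshape(4, 2)
except ValueError as err:
    print("Cannot reshape:", err)
```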

Question: How do you compute the mean, median, and standard deviation using NumPy?

Answer: NumPy provides built-in functions to compute statistical measures like mean, median, and standard deviation on arrays.

  • np.mean() computes the average of the array elements.
  • np.median() computes the median of the array elements.
  • np.std() computes the standard deviation of the array elements; by default this is the population standard deviation (ddof=0).
```python
import numpy as np
data = np.array([10, 20, 30, 40, 50])
# Compute mean
mean = np.mean(data)
print('Mean:', mean)
# Compute median
median = np.median(data)
print('Median:', median)
# Compute standard deviation
std_dev = np.std(data)
print('Standard Deviation:', std_dev)
```
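Two parameters come up often in follow-up questions: ddof, which switches np.std() from the population to the sample formula, and axis, which computes a statistic along one dimension of a 2D array. A minimal sketch; the matrix values are arbitrary:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Squared deviations from the mean (30) sum to 1000
print(np.std(data))             # population: sqrt(1000 / 5), about 14.142
print(np.std(data, ddof=1))     # sample:     sqrt(1000 / 4), about 15.811

# axis=0 gives one value per column, axis=1 one value per row
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(np.mean(matrix, axis=0))    # column means: 2.5, 3.5, 4.5
print(np.median(matrix, axis=1))  # row medians: 2.0, 5.0
```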

Question: How do you handle missing values in NumPy arrays?

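Answer: Unlike pandas, NumPy has no general-purpose missing-value marker. The usual convention is to represent missing values with np.nan, which is a floating-point value and therefore only fits float arrays, and to use functions that detect or skip NaN. The numpy.ma module offers masked arrays as an alternative.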

  • Using np.nan to represent missing values.
  • Using np.isnan() to detect missing values (a comparison such as x == np.nan is always False).
  • Using np.nanmean(), np.nanmedian(), etc., to compute statistics ignoring nan values.
```python
import numpy as np
data = np.array([1, 2, np.nan, 4, 5])
# Check for nan values
nan_mask = np.isnan(data)
print('NaN Mask:', nan_mask)
# Compute mean ignoring nan
mean = np.nanmean(data)
print('Mean ignoring NaN:', mean)
```
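Beyond the NaN-aware statistics, the two common handling strategies are dropping missing values and filling them. The sketch below fills with the mean of the observed values; that choice is only an example, not a general recommendation.

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])

# Ordinary reductions propagate NaN
print(np.mean(data))                 # nan

# Drop missing values with an inverted boolean mask
clean = data[~np.isnan(data)]        # 1.0, 2.0, 4.0, 5.0

# Or fill them, here with the mean of the observed values (3.0)
filled = np.where(np.isnan(data), np.nanmean(data), data)
print(clean, filled)
```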

Question: What are some advantages of using NumPy over Python lists?

Answer: NumPy offers several advantages over Python lists, especially for numerical computations and data manipulation:

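  • Performance: NumPy arrays hold elements of a single type in a contiguous block of memory, so operations run as compiled loops instead of in the Python interpreter.
  • Memory Efficiency: each element occupies a fixed number of bytes (for example 8 for int64), whereas a Python list stores pointers to separate Python objects.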
  • Convenient Operations: Supports vectorized operations, which are more concise and readable.
  • Rich Functionality: Provides a wide range of mathematical, logical, and statistical functions.
  • Integration: Easily integrates with other scientific libraries like pandas, matplotlib, and scikit-learn.
```python
import time
import numpy as np
# Create a large list and NumPy array
large_list = list(range(1_000_000))
# Explicit int64 so x**2 cannot overflow where the default integer is 32-bit
large_array = np.arange(1_000_000, dtype=np.int64)
# Time list comprehension (runs in the Python interpreter)
start_time = time.perf_counter()
squared_list = [x**2 for x in large_list]
print('List Comprehension Time:', time.perf_counter() - start_time)
# Time vectorized operation (runs in compiled code)
start_time = time.perf_counter()
squared_array = large_array**2
print('NumPy Vectorized Operation Time:', time.perf_counter() - start_time)
```
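The memory claim can be checked roughly as well. This sketch approximates a list's footprint as its pointer array plus the size of each int object; that is an assumption of the estimate, and it slightly overstates the total because CPython shares cached small integers.

```python
import sys
import numpy as np

n = 1_000_000
large_list = list(range(n))
large_array = np.arange(n, dtype=np.int64)

# Rough estimate: list pointer array + each int object
# (slight overestimate, since small ints are shared by CPython)
list_bytes = sys.getsizeof(large_list) + sum(sys.getsizeof(x) for x in large_list)
array_bytes = large_array.nbytes     # 8 bytes per int64 element

print(f"list : ~{list_bytes / 1e6:.1f} MB")
print(f"array: {array_bytes / 1e6:.1f} MB")
```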

Question: How can you concatenate and split NumPy arrays?

Answer: NumPy provides functions like np.concatenate(), np.vstack(), np.hstack() for concatenation, and np.split() for splitting arrays.

  • np.concatenate() joins two or more arrays along an existing axis.
  • np.vstack() stacks arrays vertically (row-wise).
  • np.hstack() stacks arrays horizontally (column-wise).
  • np.split() splits an array into multiple sub-arrays, either into equal sections or at given indices.
```python
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Concatenate along axis 0
c = np.concatenate((a, b), axis=0)
print('Concatenated along axis 0:', c)
# Split the array into two
split_a, split_b = np.split(c, 2, axis=0)
print('First Split:', split_a)
print('Second Split:', split_b)
```
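For completeness, np.vstack() and np.hstack() can be shown on the same two arrays, along with splitting at explicit indices. An integer section count must divide the axis evenly; np.array_split() is the variant that tolerates uneven sections.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.vstack((a, b))    # stacks rows, like concatenate on axis 0 -> shape (4, 2)
h = np.hstack((a, b))    # for 2D inputs, joins along axis 1 -> shape (2, 4)
print(v.shape, h.shape)

# A list of indices cuts at those positions; sections may differ in size
x = np.arange(10)
parts = np.split(x, [3, 7])          # [0, 1, 2], [3, 4, 5, 6], [7, 8, 9]
print([p.tolist() for p in parts])

# 10 elements cannot be split into 3 equal sections
try:
    np.split(x, 3)
except ValueError as err:
    print("Uneven split:", err)
```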