NumPy Interview
Mejbah Ahammad
Question: What is NumPy and why is it important in data science?
Answer: NumPy is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.
- Efficient storage and manipulation of numerical data.
- Mathematical operations are performed faster than with standard Python lists.
- Supports integration with other data science libraries like pandas and scikit-learn.
Question: How do you create a NumPy array?
Answer: You can create a NumPy array using the np.array() function by passing a Python list or tuple.
- Creating a 1D array:
- Creating a 2D array:
- Creating arrays with specific data types:
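The three cases listed above can be sketched as follows. The variable names and sample values are illustrative only and are not taken from the original.

```python
import numpy as np

# 1D array from a Python list
a = np.array([1, 2, 3])

# 2D array from a nested list (2 rows, 3 columns)
b = np.array([[1, 2, 3], [4, 5, 6]])

# Array with an explicit data type, set through the dtype argument
c = np.array([1, 2, 3], dtype=np.float64)

print(a.shape)  # (3,)
print(b.shape)  # (2, 3)
print(c.dtype)  # float64
```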
Question: How can you perform element-wise operations on NumPy arrays?
Answer: NumPy allows you to perform element-wise operations directly on arrays using standard arithmetic operators.
- Addition, subtraction, multiplication, and division.
- Exponentiation and other mathematical functions.
- Applying functions like np.sin() or np.log().
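A minimal sketch of these element-wise operations, using two small sample arrays chosen here for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 2.0, 2.0])

# Arithmetic operators act on matching elements
total = x + y     # [3., 4., 6.]
diff = x - y      # [-1., 0., 2.]
prod = x * y      # [2., 4., 8.]
quot = x / y      # [0.5, 1., 2.]

# Exponentiation is also element-wise
squared = x ** 2  # [1., 4., 16.]

# Universal functions apply to every element
sines = np.sin(x)
logs = np.log(x)  # natural logarithm
```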
Question: What is broadcasting in NumPy?
Answer: Broadcasting is a mechanism in NumPy that allows arithmetic operations on arrays of different but compatible shapes. It treats the smaller array as if it were expanded to match the shape of the larger array, without actually making copies of the data.
- Enables operations between arrays of different dimensions.
- Rules determine how the arrays are broadcast together.
- Prevents the need for explicit replication of data.
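As an illustration, shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The arrays below are made-up examples.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

# (2, 3) with (3,): the row is applied to each row of the matrix
added_row = matrix + row   # [[11, 22, 33], [14, 25, 36]]

# (2, 3) with (2, 1): the column is applied to each column of the matrix
added_col = matrix + col   # [[101, 102, 103], [204, 205, 206]]

# (2, 3) with (2,): trailing dimensions 3 and 2 do not match
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```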
Question: How do you index and slice NumPy arrays?
Answer: Indexing and slicing in NumPy arrays allow you to access and modify specific elements or subsets of the array. NumPy supports both integer indexing and slicing similar to Python lists.
- Accessing individual elements using indices.
- Slicing to obtain subarrays.
- Using boolean indexing for conditional selection.
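The access patterns above might look like this on a small sample array (values chosen for illustration):

```python
import numpy as np

arr = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])

# Individual element: row 1, column 2
element = arr[1, 2]        # 60

# Slicing: first two rows, last two columns
sub = arr[:2, 1:]          # [[20, 30], [50, 60]]

# Boolean indexing: keep only the elements greater than 50
mask = arr > 50
selected = arr[mask]       # [60, 70, 80, 90]

# Boolean indexing also works for assignment (done on a copy here)
clipped = arr.copy()
clipped[clipped > 50] = 0  # [[10, 20, 30], [40, 50, 0], [0, 0, 0]]
```

Note that basic slices such as `sub` are views of the original array, so writing to them changes `arr`, while boolean indexing returns a copy.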
Question: How do you reshape a NumPy array?
Answer: Reshaping a NumPy array changes its dimensions without changing its data. This can be done using the reshape() method.
- Changing the shape of an array to have different dimensions.
- Ensuring the total number of elements remains the same.
- Using -1 to automatically calculate the size of one dimension.
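A short sketch of these points, using a 12-element array as an assumed example:

```python
import numpy as np

data = np.arange(12)          # 12 elements: 0, 1, ..., 11

# 12 elements rearranged into 3 rows and 4 columns
grid = data.reshape(3, 4)

# -1 lets NumPy infer the missing dimension: 12 / 2 = 6 columns
auto = data.reshape(2, -1)

# Flatten back to one dimension
flat = grid.reshape(-1)

# The total number of elements must match, so 5 x 3 = 15 is rejected
try:
    data.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(grid.shape)  # (3, 4)
print(auto.shape)  # (2, 6)
```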
Question: How do you compute the mean, median, and standard deviation using NumPy?
Answer: NumPy provides built-in functions to compute statistical measures like mean, median, and standard deviation on arrays.
- np.mean() computes the average of the array elements.
- np.median() computes the median of the array elements.
- np.std() computes the standard deviation of the array elements.
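For example, on a small made-up dataset these functions could be used as shown below. Note that np.std() computes the population standard deviation by default (ddof=0); passing ddof=1 gives the sample version.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(values)                # 5.0
median = np.median(values)            # 4.5
std_pop = np.std(values)              # 2.0 (population, ddof=0)
std_sample = np.std(values, ddof=1)   # sample standard deviation

# The axis argument computes statistics per column or per row
table = np.array([[1, 2],
                  [3, 4]])
col_means = np.mean(table, axis=0)    # [2., 3.]
row_means = np.mean(table, axis=1)    # [1.5, 3.5]
```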
Question: How do you handle missing values in NumPy arrays?
Answer: NumPy does not have a dedicated missing-value type like pandas. However, you can represent missing values using np.nan (a floating-point value) and handle them using functions that detect or ignore nan values.
- Using np.nan to represent missing values.
- Functions like np.isnan() to detect missing values.
- Using np.nanmean(), np.nanmedian(), etc., to compute statistics ignoring nan values.
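A minimal sketch of this approach, with sample data invented for illustration. Filling missing entries with 0.0 at the end is just one possible strategy, not a recommendation from the original.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Detect missing values
missing = np.isnan(data)            # [False, True, False, True, False]
n_missing = int(missing.sum())      # 2

# Ordinary statistics propagate nan
plain_mean = np.mean(data)          # nan

# nan-aware versions skip the missing entries
clean_mean = np.nanmean(data)       # 3.0
clean_median = np.nanmedian(data)   # 3.0

# Replace missing values with a fill value (0.0 is an arbitrary choice)
filled = np.where(missing, 0.0, data)   # [1., 0., 3., 0., 5.]
```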
Question: What are some advantages of using NumPy over Python lists?
Answer: NumPy offers several advantages over Python lists, especially for numerical computations and data manipulation:
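- Performance: NumPy arrays store elements of a single data type in a contiguous block of memory, which allows much faster computations.
- Memory Efficiency: For numerical data, NumPy arrays typically use less memory than Python lists, which hold a reference to a separate object for each element.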
- Convenient Operations: Supports vectorized operations, which are more concise and readable.
- Rich Functionality: Provides a wide range of mathematical, logical, and statistical functions.
- Integration: Easily integrates with other scientific libraries like pandas, matplotlib, and scikit-learn.
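As a rough illustration of the vectorization and memory points, the sketch below computes the same squares with a list comprehension and with a single array expression. It deliberately makes no timing claims, and the sizes used are arbitrary.

```python
import numpy as np

# Plain Python: an explicit loop over every element
values = list(range(1000))
squares_list = [v * v for v in values]

# NumPy: one vectorized expression over the whole array
arr = np.arange(1000, dtype=np.int64)
squares_arr = arr * arr

same_result = squares_arr.tolist() == squares_list

# Each int64 element takes a fixed 8 bytes in the array's data buffer
print(same_result)   # True
print(arr.itemsize)  # 8
print(arr.nbytes)    # 8000
```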
Question: How can you concatenate and split NumPy arrays?
Answer: NumPy provides functions like np.concatenate(), np.vstack(), and np.hstack() for concatenation, and np.split() for splitting arrays.
- np.concatenate() joins two or more arrays along an existing axis.
- np.vstack() stacks arrays vertically (row-wise).
- np.hstack() stacks arrays horizontally (column-wise).
- np.split() splits an array into multiple sub-arrays.
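These functions can be sketched on two small 2D arrays (sample values chosen for illustration):

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# Join along an existing axis
rows = np.concatenate([a, b], axis=0)   # shape (4, 2)
cols = np.concatenate([a, b], axis=1)   # shape (2, 4)

# For 2D inputs, vstack and hstack match axis=0 and axis=1
v = np.vstack([a, b])   # [[1, 2], [3, 4], [5, 6], [7, 8]]
h = np.hstack([a, b])   # [[1, 2, 5, 6], [3, 4, 7, 8]]

# Split a 6-element array into 3 equal parts
parts = np.split(np.arange(6), 3)       # [0, 1], [2, 3], [4, 5]
```

np.split() raises an error when the array cannot be divided into equal parts; np.array_split() is the variant that tolerates uneven splits.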