NumPy Interview Questions

Question: What is NumPy and why is it important in data science?

Answer: NumPy is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

Efficient storage and manipulation of numerical data.
Vectorized mathematical operations run much faster than equivalent loops over Python lists.
Integration with other data science libraries such as pandas and scikit-learn, which are built around NumPy arrays.

        import numpy as np

        array = np.array([1, 2, 3, 4, 5])

        # Display the array
        print(array)
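You can see the storage point directly in an array's attributes. In this minimal sketch, every element shares one dtype and sits in a single buffer, so the total size is the item size times the element count. The default integer dtype depends on the platform and NumPy version, so the exact numbers printed may differ.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# Every element shares one dtype and lives in a single contiguous buffer
print(array.dtype)     # element type, e.g. int64 on most 64-bit platforms
print(array.itemsize)  # bytes per element
print(array.nbytes)    # total bytes = itemsize * number of elements
print(array.shape, array.ndim)
```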

Question: How do you create a NumPy array?

Answer: You can create a NumPy array using the np.array() function by passing a Python list or tuple.

Creating a 1D array from a flat list.
Creating a 2D array from a nested list.
Creating arrays with a specific data type via the dtype argument.

        import numpy as np

        # 1D array
        array_1d = np.array([1, 2, 3, 4, 5])

        # 2D array
        array_2d = np.array([[1, 2, 3], [4, 5, 6]])

        # Array with specific data type
        array_float = np.array([1, 2, 3], dtype='float')

        print(array_1d)
        print(array_2d)
        print(array_float)
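Every array created this way also carries its shape, number of dimensions, and dtype, and these are worth checking. The short sketch below reuses the arrays from the example. `mixed` is an illustrative name added here to show that NumPy picks one common dtype for all elements.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5])
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
array_float = np.array([1, 2, 3], dtype='float')

# shape gives the length along each axis; ndim is the number of axes
print(array_1d.shape, array_1d.ndim)  # (5,) 1
print(array_2d.shape, array_2d.ndim)  # (2, 3) 2

# dtype='float' maps to NumPy's default float type, float64
print(array_float.dtype)  # float64

# Mixing ints and floats in the input upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```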

Question: How can you perform element-wise operations on NumPy arrays?

Answer: NumPy allows you to perform element-wise operations directly on arrays using standard arithmetic operators.

Arithmetic with +, -, *, and /, applied element by element.
Exponentiation with ** and other mathematical operations.
Universal functions such as np.sin() or np.log(), applied to every element.

        import numpy as np

        a = np.array([1, 2, 3])
        b = np.array([4, 5, 6])

        # Element-wise addition
        c = a + b

        # Element-wise multiplication
        d = a * b

        # Element-wise sine
        e = np.sin(a)

        print(c)
        print(d)
        print(e)
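Arithmetic operators and comparisons both act element by element, so `*` is not matrix multiplication. The minimal sketch below contrasts `*` with the `@` operator, which computes the dot product for two 1D arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# ** raises each element to a power
print(a ** 2)   # [1 4 9]

# Comparison operators are element-wise too and return a boolean array
print(a > 1)    # [False  True  True]

# * is element-wise; @ (or np.dot) computes the dot product for 1D arrays
print(a * b)    # [ 4 10 18]
print(a @ b)    # 32
```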

Question: What is broadcasting in NumPy?

Answer: Broadcasting is the mechanism NumPy uses to perform arithmetic on arrays of different shapes. The smaller array is treated as if it were stretched along its missing or size-1 dimensions to match the larger one, without actually copying the data.

Enables operations between arrays of different shapes.
Shapes are compared from the trailing dimension backward; two dimensions are compatible when they are equal or one of them is 1.
Avoids explicitly replicating data (for example with np.tile()).

        import numpy as np

        a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
        b = np.array([10, 20, 30])            # shape (3,)

        # Broadcasting addition: b is added to each row of a
        c = a + b
        print(c)
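The compatibility rule also covers a case the example above does not show: a column vector combined with a row vector, which broadcasts in both directions. This sketch also uses `np.broadcast_shapes` (available in NumPy 1.20 and later) to compute a result shape without doing the arithmetic, and it shows the error raised for incompatible shapes. `col`, `row`, and `grid` are illustrative names.

```python
import numpy as np

# (3, 1) vs (4,): trailing dims are 1 and 4, so the 1 stretches to 4;
# (4,) has no second-to-last dim, so it is treated as 1 and stretches to 3
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)
grid = col + row
print(grid.shape)  # (3, 4)

# Compute the broadcast result shape without allocating any arrays
print(np.broadcast_shapes((2, 3), (3,)))  # (2, 3)

# Incompatible trailing dimensions (3 vs 2) raise a ValueError
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("Cannot broadcast:", err)
```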

Question: How do you index and slice NumPy arrays?

Answer: Indexing and slicing in NumPy arrays allow you to access and modify specific elements or subsets of the array. NumPy supports both integer indexing and slicing similar to Python lists.

Accessing individual elements using indices.
Slicing to obtain subarrays.
Using boolean indexing for conditional selection.

        import numpy as np

        a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

        # Access element at row 1, column 2 (indices are zero-based), i.e. 6
        element = a[1, 2]
        print(element)

        # Slice the first two rows and the first two columns
        slice_a = a[:2, :2]
        print(slice_a)

        # Boolean indexing: the mask selects elements greater than 5
        bool_idx = a > 5
        print(a[bool_idx])
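Basic slicing returns a view that shares memory with the original array, while boolean indexing returns a copy. The minimal sketch below shows the difference, plus in-place assignment through a boolean mask. `view` and `selected` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Basic slicing returns a view: writing through it modifies the original array
view = a[:2, :2]
view[0, 0] = 100
print(a[0, 0])  # 100

# Boolean indexing returns a copy: writing to it leaves the original unchanged
selected = a[a > 5]
selected[:] = 0
print(a[1, 2])  # 6

# A boolean mask on the left-hand side modifies matching elements in place
a[a > 5] = 0
print(a)
```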

Question: How do you reshape a NumPy array?

Answer: Reshaping gives a NumPy array new dimensions without changing its data. Use the reshape() method, which returns a view of the original data when possible and a copy otherwise.

Changing the shape of an array to have different dimensions.
The total number of elements must remain the same.
Passing -1 for one dimension lets NumPy infer its size from the others.

        import numpy as np

        a = np.array([1, 2, 3, 4, 5, 6])

        # Reshape to 2x3 array
        b = a.reshape((2, 3))
        print(b)

        # Reshape using -1 (NumPy infers 3 rows from 6 elements / 2 columns)
        c = a.reshape((-1, 2))
        print(c)
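The sketch below shows all three rules: `-1` inference, the size check, and the fact that `reshape()` returns a view for contiguous data, so writes through the reshaped array show up in the original. It uses `ravel()` to flatten the array back to 1D.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy infer one dimension: 6 elements / 3 columns = 2 rows
b = a.reshape(-1, 3)
print(b.shape)  # (2, 3)

# For a contiguous array, reshape returns a view, so writes show up in the original
b[0, 0] = 99
print(a[0])  # 99

# The total number of elements must match; 6 elements cannot form a 4x2 array
try:
    a.reshape(4, 2)
except ValueError as err:
    print("Reshape failed:", err)

# ravel() flattens back to 1D (returning a view when possible)
print(b.ravel())
```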

Question: How do you compute the mean, median, and standard deviation using NumPy?

Answer: NumPy provides built-in functions to compute statistical measures like mean, median, and standard deviation on arrays.

np.mean() computes the average of the array elements.
np.median() computes the median of the array elements.
np.std() computes the standard deviation of the array elements; by default this is the population standard deviation (ddof=0).

        import numpy as np

        data = np.array([10, 20, 30, 40, 50])

        # Compute mean
        mean = np.mean(data)
        print('Mean:', mean)

        # Compute median
        median = np.median(data)
        print('Median:', median)

        # Compute standard deviation
        std_dev = np.std(data)
        print('Standard Deviation:', std_dev)
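Two parameters change these results. `np.std()` uses `ddof=0` by default (it divides by N, giving the population standard deviation), and `axis` selects the dimension to reduce over. In the minimal sketch below, the values in the comments follow from the data above: the mean is 30 and the squared deviations sum to 1000. `table` is an illustrative 2D array.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# np.std defaults to ddof=0 (population std: divide by N);
# ddof=1 divides by N - 1 for the sample standard deviation
print(np.std(data))          # sqrt(200), about 14.142
print(np.std(data, ddof=1))  # sqrt(250), about 15.811

# On 2D data, axis=0 aggregates down each column and axis=1 across each row
table = np.array([[1, 2, 3], [4, 5, 6]])
print(np.mean(table, axis=0))  # [2.5 3.5 4.5]
print(np.mean(table, axis=1))  # [2. 5.]
```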

Question: How do you handle missing values in NumPy arrays?

Answer: NumPy has no dedicated missing-value marker comparable to pandas' pd.NA. The usual convention is to mark missing entries in floating-point arrays with np.nan and use NaN-aware functions that ignore them; the numpy.ma module also provides masked arrays for this purpose.

Using np.nan to represent missing values (only possible in float arrays).
Using np.isnan() to detect missing values.
Using np.nanmean(), np.nanmedian(), etc., to compute statistics that ignore NaN values.

        import numpy as np

        data = np.array([1, 2, np.nan, 4, 5])

        # Check for nan values
        nan_mask = np.isnan(data)
        print('NaN Mask:', nan_mask)

        # Compute mean ignoring nan
        mean = np.nanmean(data)
        print('Mean ignoring NaN:', mean)
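Without the NaN-aware functions, ordinary reductions such as `np.mean()` return `nan` as soon as one element is missing. The sketch below shows that behavior and two common remedies: dropping NaN values with an inverted mask, or filling them with `np.where()`. Filling with the mean is only one illustrative choice, not a general recommendation.

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])

# Regular reductions propagate NaN: the result is nan
print(np.mean(data))  # nan

# Drop missing values with an inverted mask
clean = data[~np.isnan(data)]
print(clean)  # [1. 2. 4. 5.]

# Or fill them, here with the mean of the observed values (3.0)
filled = np.where(np.isnan(data), np.nanmean(data), data)
print(filled)  # [1. 2. 3. 4. 5.]

# NaN is a floating-point value, so the array was created as float64
print(data.dtype)  # float64
```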

Question: What are some advantages of using NumPy over Python lists?

Answer: NumPy offers several advantages over Python lists, especially for numerical computations and data manipulation:

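Performance: NumPy arrays store elements in a contiguous, typed buffer, so operations run in compiled loops instead of the Python interpreter.
Memory Efficiency: For numeric data, arrays use much less memory than lists, which hold a pointer to a separate Python object for each element.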
Convenient Operations: Supports vectorized operations, which are more concise and readable.
Rich Functionality: Provides a wide range of mathematical, logical, and statistical functions.
Integration: Easily integrates with other scientific libraries like pandas, matplotlib, and scikit-learn.

        import time

        import numpy as np

        # Create a large list and NumPy array
        # (int64 avoids overflow when squaring on platforms where the default int is 32-bit)
        large_list = list(range(1000000))
        large_array = np.arange(1000000, dtype=np.int64)

        # Time the list comprehension (perf_counter is a high-resolution timer)
        start_time = time.perf_counter()
        squared_list = [x**2 for x in large_list]
        print('List Comprehension Time:', time.perf_counter() - start_time)

        # Time the vectorized operation
        start_time = time.perf_counter()
        squared_array = large_array**2
        print('NumPy Vectorized Operation Time:', time.perf_counter() - start_time)
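You can check the memory claim roughly with `sys.getsizeof`. This is an approximate sketch. `getsizeof` on a list counts only its pointer array, so the element objects are summed separately. Small integers (-5 to 256) are cached by CPython and are therefore slightly overcounted. The array uses an explicit `int64` dtype so that its size is predictable.

```python
import sys

import numpy as np

n = 1_000_000
large_list = list(range(n))
large_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both the list
# and each element object (a rough estimate, not an exact accounting)
list_bytes = sys.getsizeof(large_list) + sum(sys.getsizeof(x) for x in large_list)

# An array stores raw fixed-size values in one buffer
array_bytes = large_array.nbytes

print('List (approx.) bytes:', list_bytes)
print('NumPy array bytes:', array_bytes)  # 8000000 (8 bytes per int64)
```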

Question: How can you concatenate and split NumPy arrays?

Answer: NumPy provides functions like np.concatenate(), np.vstack(), np.hstack() for concatenation, and np.split() for splitting arrays.

np.concatenate() joins two or more arrays along an existing axis.
np.vstack() stacks arrays vertically (row-wise).
np.hstack() stacks arrays horizontally (column-wise).
np.split() splits an array into equal sub-arrays, or at given indices; np.array_split() also allows unequal sections.

        import numpy as np

        a = np.array([[1, 2], [3, 4]])
        b = np.array([[5, 6], [7, 8]])

        # Concatenate along axis 0
        c = np.concatenate((a, b), axis=0)
        print('Concatenated along axis 0:', c)

        # Split the array into two
        split_a, split_b = np.split(c, 2, axis=0)
        print('First Split:', split_a)
        print('Second Split:', split_b)
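The sketch below covers the related stacking and splitting variants: `np.vstack()` and `np.hstack()` on the same 2x2 arrays, `np.split()` with a list of cut indices, and `np.array_split()`. Unlike `np.split()`, `np.array_split()` accepts a section count that does not divide the axis evenly. `x`, `parts`, and `uneven` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# vstack matches concatenating along axis 0 for 2D inputs;
# hstack concatenates along axis 1
print(np.vstack((a, b)).shape)  # (4, 2)
print(np.hstack((a, b)).shape)  # (2, 4)

x = np.arange(7)

# np.split with a list of indices cuts the array before each index
parts = np.split(x, [2, 5])
print([p.tolist() for p in parts])  # [[0, 1], [2, 3, 4], [5, 6]]

# An integer section count must divide the axis evenly for np.split;
# np.array_split allows uneven sections instead
uneven = np.array_split(x, 3)
print([p.tolist() for p in uneven])  # [[0, 1, 2], [3, 4], [5, 6]]
```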