# Data Science – Coding Interview Questions

**Category:** Data Science

**Posted:** Mar 29, 2019

**By:** Ashley Morrison

**1. Given an array of integers (positive and negative), write a program that finds the largest continuous sum. Return the total sum, not the sequence. Here are a few clarifying examples:**

**[7,8,9] answer is: 7+8+9 = 24**

**[-1,7,8,9,-10] answer is: 7+8+9 = 24**

**[2,3,-10,9,2] answer is: 9+2 = 11**

**[2,11,-10,9,2] answer is: 2+11-10+9+2 = 14**

**[12,-10,7,-8,4,6] answer is 12.**

**Solution:** We sum the numbers as we go and store the total in a current-sum variable. After adding each element, we check whether the current sum is larger than the maximum sum encountered so far, and if it is, we update the maximum. As long as the current sum is positive, we keep adding numbers. When the current sum becomes negative, we start a new current sum from the next element, because a negative running total can only decrease the sum of any future sequence.

Note that we do not reset the current sum to 0, because the array may contain only negative integers. In that case the result should be the largest (least negative) element.
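The steps above can be sketched in Python as follows. This is a minimal illustration, not the article's original listing: the function name `largest_continuous_sum` and the choice to return 0 for an empty array are assumptions made here.

```python
def largest_continuous_sum(arr):
    """Return the largest sum of any contiguous run of elements in arr."""
    if not arr:
        return 0  # assumption: an empty array yields 0

    # Start from the first element rather than 0 so that an
    # all-negative array returns its largest element.
    current_sum = max_sum = arr[0]

    for num in arr[1:]:
        # Extend the current run, or start a new run at num if the
        # running total so far would only drag the sum down.
        current_sum = max(current_sum + num, num)
        max_sum = max(max_sum, current_sum)

    return max_sum


print(largest_continuous_sum([2, 11, -10, 9, 2]))
```

Each element is visited once, so the running time is linear in the length of the array.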

**2. Given a string of the form ‘AAAABBBBCCCCCDDEEEE’, compress it to become ‘A4B4C5D2E4’. For this problem, you can falsely “compress” strings of single or double letters. For instance, it is okay for ‘AAB’ to return ‘A2B1’ even though this technically takes more space. The function should also be case sensitive, so that the string ‘AAAaaa’ returns ‘A3a3’.**

**Solution:** Our strategy is to walk along the string, keeping a running count of the current run of identical letters. Once we detect a change in letter, we “compress” that run by emitting the letter followed by its count.
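One possible way to write this strategy in Python is shown below. The function name `compress` and the handling of the empty string are assumptions for illustration.

```python
def compress(s):
    """Run-length encode s, e.g. 'AAB' -> 'A2B1' (case sensitive)."""
    if not s:
        return ""  # assumption: empty input compresses to empty output

    parts = []
    count = 1
    # Compare each character with the one that follows it.
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            # The run of prev has ended: emit it and start a new count.
            parts.append(f"{prev}{count}")
            count = 1

    # Emit the final run, which the loop never closes.
    parts.append(f"{s[-1]}{count}")
    return "".join(parts)


print(compress("AAAABBBBCCCCCDDEEEE"))
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation inside the loop.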

**3. You are given an array of historical stock prices per day, for example: [6, 13, 2, 10, 3, 5]. Write an algorithm that figures out on which days (indices of the array) you should buy and sell the stock for maximum profit.**

**You are only allowed to buy the stock once and sell it once. Shorting the stock is not allowed: you have to buy before selling. For example, given [6, 13, 2, 10, 3, 5], you make the most profit by buying on day 3 (price of $2) and selling on day 4 (price of $10), netting you an $8 gain.**

**Hint: You should be able to solve this problem by only going through the array once!**

**Solution:** One thing to notice right off the bat is that we cannot just find the maximum price and the lowest price and subtract the two, because the maximum could come before the minimum. Instead, we will use a greedy approach. We iterate through the list of stock prices while keeping track of our maximum profit. For every price, we keep track of the lowest price seen so far and check whether selling at the current price would give a better profit than our current maximum profit.
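A minimal single-pass sketch of this greedy idea follows. The function name `best_trade`, the 0-based indices in the return value, and the error raised for fewer than two prices are assumptions made for this example.

```python
def best_trade(prices):
    """Return (buy_index, sell_index, profit) for one buy followed by one sell.

    Indices are 0-based, so index 2 corresponds to "day 3" in the question.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices to buy and then sell")

    min_idx = 0                          # index of the lowest price so far
    buy_idx, sell_idx = 0, 1             # best trade found so far
    max_profit = prices[1] - prices[0]

    for i in range(1, len(prices)):
        # Profit if we bought at the lowest earlier price and sold today.
        profit = prices[i] - prices[min_idx]
        if profit > max_profit:
            max_profit = profit
            buy_idx, sell_idx = min_idx, i
        # Update the minimum only after using it, so we never
        # buy and sell on the same day.
        if prices[i] < prices[min_idx]:
            min_idx = i

    return buy_idx, sell_idx, max_profit


print(best_trade([6, 13, 2, 10, 3, 5]))
```

If prices only fall, the function still returns the least bad trade, with a negative profit.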

**4. Consider an array of non-negative integers. A second array is formed by shuffling the elements of the first array and deleting a random element. Given these two arrays, find which element is missing in the second array.**

**For example, given:**

**[1, 2, 3, 4, 5] and [1, 3, 4, 5]**

**The missing value was 2. Try to solve this problem in multiple ways.**

**Solution:** There are many possible solutions to this problem; here are two straightforward ones. Since all the numbers are non-negative integers, we can simply sum both arrays and take the difference, which is the missing element. Another approach is to sort both arrays and walk through them in parallel until the elements do not match.
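Both approaches can be sketched in a few lines of Python. The function names below are assumptions chosen for illustration.

```python
def find_missing_by_sum(full, partial):
    """The difference of the two sums is the deleted element."""
    return sum(full) - sum(partial)


def find_missing_by_sort(full, partial):
    """Sort both arrays and return the first element that does not line up."""
    full_sorted = sorted(full)
    partial_sorted = sorted(partial)
    for a, b in zip(full_sorted, partial_sorted):
        if a != b:
            return a
    # Every pair matched, so the missing element is the largest one.
    return full_sorted[-1]


print(find_missing_by_sum([1, 2, 3, 4, 5], [1, 3, 4, 5]))
print(find_missing_by_sort([1, 2, 3, 4, 5], [5, 3, 1, 4]))
```

The sum version runs in linear time, while the sort version costs O(n log n) but also handles duplicate values correctly.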

**More Questions on Coding:**

**5. What is the output of 'raining'.find('z')?**

**A) Type error**

**B) ‘ ‘**

**C) -1**

**D) Not found**

**Solution:** C. If the substring is not found, find() returns the integer -1.
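A quick check of this behaviour, with `str.index` shown for contrast (the variable names are just for illustration):

```python
word = "raining"

missing = word.find("z")   # substring absent: find() returns -1
found = word.find("n")     # substring present: index of the first match

# str.index() behaves like find() but raises ValueError when nothing matches.
try:
    word.index("z")
    index_raised = False
except ValueError:
    index_raised = True

print(missing, found, index_raised)
```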

**6. What is the output of the following code?**

**x = 2**

**y = 10**

**x *= y * x + 1**

**A) 42**

**B) 41**

**C) 40**

**D) 39**

**Solution:** A. x *= y * x + 1 means x = x * (y * x + 1), so x = 2 * (10 * 2 + 1) = 42.
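Running the snippet confirms that the whole right-hand side is evaluated before the multiplication is applied:

```python
x = 2
y = 10

# The right-hand side y * x + 1 is evaluated first (21),
# then x is multiplied by that result.
x *= y * x + 1

print(x)
```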

**7. Suppose we have two sets A & B, then A<B is:**

**A) True if len(A) is less than len(B).**

**B) True if A is a proper subset of B.**

**C) True if the elements in A when compared are less than the elements in B.**

**D) True if A is a proper superset of B.**

**Solution:** B. If A is a proper subset of B, then all elements of A are in B, but B contains at least one element that is not in A.
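A small illustration with made-up sets shows that `<` tests for a proper subset and not for length:

```python
a = {1, 2}
b = {1, 2, 3}
c = {7, 8, 9, 10}

proper_subset = a < b     # True: every element of a is in b, and b has more
shorter_only = a < c      # False: a is shorter than c but not contained in it
equal_sets = b < b        # False: a set is not a proper subset of itself
subset_or_equal = b <= b  # True: <= also allows equality

print(proper_subset, shorter_only, equal_sets, subset_or_equal)
```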

**8. What is the output of the following?**

**print("abbzxyzxzxabb".count('abb', -10, -1))**

**A) 2**

**B) 0**

**C) 1**

**D) Error**

**Solution:** B. It counts the number of times the substring 'abb' occurs between index -10 (that is, index 3) and index -1 (index 12, exclusive) of the given string. That slice is 'zxyzxzxab', which does not contain 'abb', so the result is 0.
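The search window can be made visible by slicing the string with the same bounds:

```python
text = "abbzxyzxzxabb"

# count(sub, start, end) searches the same region as text[start:end].
window = text[-10:-1]
count_in_window = text.count("abb", -10, -1)
count_everywhere = text.count("abb")

print(window, count_in_window, count_everywhere)
```

The last 'abb' is cut off because the end index -1 excludes the final character.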

**9. Which of the following is invalid for the set s = {100, 101, 102, 103}?**

**A) len(s)**

**B) sum(s)**

**C) print(s[3])**

**D) max(s)**

**Solution:** C. Sets are unordered and do not support indexing, so s[3] raises a TypeError.
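A short check of the four options, catching the error from the indexing attempt:

```python
s = {100, 101, 102, 103}

size = len(s)
total = sum(s)
largest = max(s)

# Sets have no positions, so subscripting is not defined for them.
try:
    s[3]
    indexing_error = None
except TypeError as exc:
    indexing_error = exc

print(size, total, largest, type(indexing_error).__name__)
```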

**10. Which option creates an empty set in Python?**

**A) {}**

**B) ()**

**C) []**

**D) set()**

**Solution:** D. An empty set can only be created by calling the built-in set(). Empty braces {} create an empty dictionary, () an empty tuple, and [] an empty list.
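Checking the type of each literal makes the difference explicit:

```python
# The type produced by each of the four options, in order A to D.
kinds = (type({}), type(()), type([]), type(set()))

empty_set = set()

print([k.__name__ for k in kinds], len(empty_set))
```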

**11. Suppose you are given the below string**

**str = “””Email_Address,Nickname,Group_Status,Join_Year
[email protected],aa,Owner,2014
[email protected],bb,Member,2015
[email protected],cc,Member,2017
[email protected],dd,Member,2016
[email protected],ee,Member,2020
“”” **

**In order to extract only the domain names from the email addresses in the above string (e.g. “aaa”, “bbb”, ...), you write the following code:**

**What number should be mentioned instead of “__” to index only the domains?**

**A) 0**

**B) 1**

**C) 2**

**D) 3**

**Solution:** C

**12. What should be the value of “pattern” in regular expression?**

**Note: Python regular expression library has been imported as re.**

**A) pattern = ‘(i|ie)(,)’**

**B) pattern = ‘(i$|ie$)(,)’**

**C) pattern = ‘([a-zA-Z]+i|[a-zA-Z]+ie)(,)’**

**D) None of these**

**Solution:** B. You have to find the pattern that ends in either “i” or “ie”, so the correct option is B.

**13. Assume, you are given two lists:**

**a = [1,2,3,4,5] and b = [6,7,8,9]**

**Create a list which has all the elements of a and b in one dimension.**

Output: **a = [1,2,3,4,5,6,7,8,9]**

**Which of the following options would you choose?**

**A) a.append(b)**

**B) a.extend(b)**

**C) Any of the above**

**D) None of these**

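**Solution:** B. a.extend(b) adds each element of b to a, whereas a.append(b) would add the whole list b as a single nested element.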
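The two methods can be compared side by side on copies of the list from the question (the names `appended` and `extended` are only for illustration):

```python
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9]

appended = list(a)
appended.append(b)   # adds b itself as one nested element

extended = list(a)
extended.extend(b)   # adds each element of b individually

print(appended)
print(extended)
```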

**14. You have built a machine learning model which you wish to freeze now and use later. Which of the following commands can perform this task for you?**

**A) push(model, “file”)**

**B) save(model, “file”)**

**C) dump(model, “file”)**

**D) freeze(model, “file”)**

**Solution:** C
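The question does not say which library `dump` comes from. As one possibility, the sketch below uses `pickle.dump` from the standard library, which takes an open file object rather than a file name. The dictionary standing in for a trained model and the file name `model.pkl` are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model object.
model = {"weights": [0.1, 0.2], "bias": 0.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Freeze the model: serialize it to disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Later: load it back and use it.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.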

**15. We want to convert the below string into a date-time value:**

**To convert the above string, what should be written in place of date_format?**

**A) “%d/%m/%y”**

**B) “%D/%M/%Y”**

**C) “%d/%M/%y”**

**D) “%d/%m/%Y”**

**Solution:** D
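The string from the question is not shown above, so the sketch below uses a hypothetical date with a four-digit year to illustrate why `%Y` is needed. In `strptime` format codes, `%d` is the day, `%m` the month, `%Y` a four-digit year, `%y` a two-digit year, and `%M` the minute.

```python
from datetime import datetime

date_string = "25/12/2019"  # hypothetical example, not the question's string

parsed = datetime.strptime(date_string, "%d/%m/%Y")

# With %y only two year digits are consumed, so the rest is left over
# and strptime raises ValueError.
try:
    datetime.strptime(date_string, "%d/%m/%y")
    two_digit_year_failed = False
except ValueError:
    two_digit_year_failed = True

print(parsed, two_digit_year_failed)
```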

**16. How would you join the two arrays? Note: Numpy library has been imported as np**

**A) resulting_set = train_set.append(test_set)**

**B) resulting_set = np.concatenate([train_set, test_set])**

**C) resulting_set = np.vstack([train_set, test_set])**

**D) None of these**

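**Solution:** C. Option A fails because NumPy arrays have no append method. Option B, np.concatenate, joins along the first axis by default but requires all inputs to have the same number of dimensions. np.vstack stacks the arrays row-wise and treats 1-D inputs as single rows, which gives the vertical stacking we want. So option C is correct.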

**17. How would you import a decision tree classifier in sklearn?**

**A) from sklearn.decision_tree import DecisionTreeClassifier**

**B) from sklearn.ensemble import DecisionTreeClassifier**

**C) from sklearn.tree import DecisionTreeClassifier**

**D) None of these**

**Solution:** C

**18. Suppose you want to join train and test data set (both are two numpy arrays train_set and test_set) into a resulting array (resulting_set) to do data processing on it simultaneously. This is as follows:**

**How would you join the two arrays? Note: Numpy library has been imported as np**

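**A) resulting_set = train_set.append(test_set)**

**B) resulting_set = np.concatenate([train_set, test_set])**

**C) resulting_set = np.vstack([train_set, test_set])**

**D) None of these**

**Solution:** C. Option A fails because NumPy arrays have no append method. Option B, np.concatenate, joins along the first axis by default but requires all inputs to have the same number of dimensions. np.vstack stacks the arrays row-wise and treats 1-D inputs as single rows, which gives the vertical stacking we want. So option C is correct.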

**19. What is the difference between the two data series given below?**

**df[‘Name’] and df.loc[:, ‘Name’] Note: Pandas has been imported as pd**

**A) 1 is view of original dataframe and 2 is a copy of original dataframe.**

**B) 2 is view of original dataframe and 1 is a copy of original dataframe.**

**C) Both are copies of original dataframe.**

**D) Both are views of original dataframe**

**Solution:** B

**20. Consider a function “fun” which is defined below:**

**Now you define a list that has three numbers in it: g = [10,11,12]. Which of the following will be the output of the given print statement:**

**print fun(g), g**

**A) [5, 11, 12] [5, 11, 12]**

**B) [5, 11, 12] [10, 11, 12]**

**C) [10, 11, 12] [10, 11, 12]**

**D) [10, 11, 12] [5, 11, 12]**

**Solution: A**
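The definition of `fun` is not shown above. Answer A is what you would see if the function changes its argument in place, so the sketch below assumes a hypothetical body that sets the first element to 5 and returns the same list. It uses the Python 3 form of `print`.

```python
def fun(x):
    # Hypothetical body (the original definition is not shown):
    # modify the list in place and return that same list.
    x[0] = 5
    return x


g = [10, 11, 12]
result = fun(g)

# Lists are passed by reference, so the change made inside fun
# is visible through g as well.
print(result, g)
```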