[Image: "Snake and flowers 2" by pikaole]

Sometimes a company advertises for a data scientist but actually needs a Python developer. So when preparing for an interview, it makes sense to brush up on Python, not just on algorithms.

The Cloud Solutions team translated an article by a developer who has repeatedly found himself in this situation and, drawing on that experience, compiled a list of 53 questions and answers to help you prepare for an interview. Most data scientists write a lot of code, so the list is useful for both data scientists and engineers: for job candidates, for interviewers, and for anyone simply learning Python.

The questions are in no particular order. Let's begin.

1. What is the difference between a list and a tuple?

I was asked this question at literally every Python/data science interview. Know the answer inside out:

  1. Lists are mutable: they can be changed after creation.
  2. Tuples are immutable: they cannot be changed after creation.
  3. Lists have order. A list is an ordered sequence of objects, usually of the same type. For example, all usernames ordered by creation date: `["Seth", "Ema", "Eli"]`.
  4. Tuples have structure. Each index can hold a different type of data. For example, a database record in memory, `(2, "Ema", "2020-04-16")`, holding an id, a name, and a creation date.

2. How is string interpolation done?

Without importing the `Template` class from the `string` module, there are three ways to interpolate strings:

```python
name = 'Chris'

# 1. f-strings
print(f'Hello {name}')

# 2. % operator
print('Hey %s %s' % (name, name))

# 3. str.format
print("My name is {}".format(name))
```
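For completeness, here is a minimal sketch of the `Template` option mentioned above, from the standard `string` module. It uses `$`-style placeholders; the variable names `greeting` and `partial` are just for illustration:

```python
from string import Template

name = 'Chris'

# Template uses $name placeholders and fills them in by keyword
greeting = Template('Hello $name').substitute(name=name)
print(greeting)  #=> Hello Chris

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
partial = Template('$greeting, $name').safe_substitute(name=name)
print(partial)  #=> $greeting, Chris
```

`Template` is handy when the format string comes from users, since it cannot evaluate arbitrary expressions the way f-strings can.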

3. What is the difference between `is` and `==`?

When I was a novice developer, I did not see the difference... hello, bugs. So, for the record: `is` checks for identity, and `==` checks for equality.

Consider an example. Create a few lists and bind names to them. Note that below, `b` points to the same object as `a`:

```python
a = [1, 2, 3]
b = a
c = [1, 2, 3]
```

Check equality and note that all objects are equal:

```python
print(a == b)
print(a == c)
#=> True
#=> True
```

But are they all identical? No:

```python
print(a is b)
print(a is c)
#=> True
#=> False
```

We can verify this by printing their object identifiers:

```python
print(id(a))
print(id(b))
print(id(c))
#=> 4369567560
#=> 4369567560
#=> 4369567624
```

The identifier of `c` differs from the identifiers of `a` and `b`.

4. What is a decorator?

Another question I was asked at every interview. The topic deserves a separate article, but for basic preparation it is enough to be able to write your own example.

A decorator lets you add new functionality to an existing function. It works like this: the function is passed to the decorator, which returns a new function that runs the additional code as well as the original one.

We’ll write a decorator that logs calls to another function.

Write the decorator function. It takes the function `func` as an argument and defines a function `log_function_called`, which runs `print(f'{func} called.')` and then calls `func`. Finally, the decorator returns the function it defined:

```python
def logging(func):
    def log_function_called():
        print(f'{func} called.')
        func()
    return log_function_called
```

Next, write two ordinary functions that we will decorate in a moment:

```python
def my_name():
    print('chris')

def friends_name():
    print('naruto')

my_name()
friends_name()
#=> chris
#=> naruto
```

Now add a decorator to both of them:

```python
@logging
def my_name():
    print('chris')

@logging
def friends_name():
    print('naruto')

my_name()
friends_name()
#=> <function my_name at 0x10fca5a60> called.
#=> chris
#=> <function friends_name at 0x10fca5f28> called.
#=> naruto
```

Now it's easy to add logging to any function we write: just put `@logging` above its definition.
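The decorator above only works for functions that take no arguments, and it discards their return value. A slightly more general sketch forwards `*args`/`**kwargs`, returns the result, and uses the standard `functools.wraps` to keep the original function's name. The `add` function is just an illustration, and the decorator keeps the article's name `logging`, which shadows the standard `logging` module if both are used in one file:

```python
import functools

def logging(func):
    # functools.wraps copies func's __name__, __doc__, etc. onto the wrapper
    @functools.wraps(func)
    def log_function_called(*args, **kwargs):
        print(f'{func.__name__} called.')
        # forward any arguments and return the wrapped function's result
        return func(*args, **kwargs)
    return log_function_called

@logging
def add(x, y):
    return x + y

result = add(2, 3)   # prints: add called.
print(result)        #=> 5
print(add.__name__)  #=> add
```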

5. Explain the range function

`range` generates a sequence of integers. It can be used in three ways.

The function takes one to three arguments. Note that I wrapped each example in a list comprehension to see the generated values.

`range(stop)` generates integers from 0 up to, but not including, `stop`:

```python
[i for i in range(10)]
#=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

`range(start, stop)` generates integers from `start` up to, but not including, `stop`:

```python
[i for i in range(2, 10)]
#=> [2, 3, 4, 5, 6, 7, 8, 9]
```

`range(start, stop, step)` generates integers from `start` up to, but not including, `stop`, in increments of `step`:

```python
[i for i in range(2, 10, 2)]
#=> [2, 4, 6, 8]
```

Serge Boremchuk suggested a more suitable way:

```python
list(range(2, 10, 2))
#=> [2, 4, 6, 8]
```
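In Python 3, `range` returns a lazy `range` object rather than a list, which is why the examples wrap it in `list()` or a comprehension. A short sketch of two consequences, a negative step and constant-size storage; the bounds are arbitrary:

```python
# a negative step counts down; stop is still excluded
print(list(range(10, 0, -3)))  #=> [10, 7, 4, 1]

# range stores only start, stop and step, so huge ranges are cheap
r = range(0, 10**12, 2)
print(len(r))   #=> 500000000000
print(r[-1])    #=> 999999999998
print(7 in r)   #=> False
```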

6. Define a car class with two attributes: color and speed. Then create an instance and return speed

Here's how to do it:

```python
class Car:
    def __init__(self, color, speed):
        self.color = color
        self.speed = speed

car = Car('red', '100mph')
car.speed
#=> '100mph'
```

7. What is the difference between instance methods, class methods, and static methods in Python?

Instance methods: take the `self` parameter and operate on a specific instance of the class.

Static methods: use the `@staticmethod` decorator, are not tied to a specific instance, and are self-contained (they receive neither `self` nor `cls`, so they do not change class or instance attributes).

Class methods: use the `@classmethod` decorator and take the `cls` parameter, so they can modify the class itself.

Let's illustrate the difference with a fictional `CoffeeShop` class:

```python
class CoffeeShop:
    specialty = 'espresso'

    def __init__(self, coffee_price):
        self.coffee_price = coffee_price

    # instance method
    def make_coffee(self):
        print(f'Making {self.specialty} for ${self.coffee_price}')

    # static method
    @staticmethod
    def check_weather():
        print('Its sunny')

    # class method
    @classmethod
    def change_specialty(cls, specialty):
        cls.specialty = specialty
        print(f'Specialty changed to {specialty}')
```

The `CoffeeShop` class has the attribute `specialty` (its signature drink), set to `'espresso'` by default. Each `CoffeeShop` instance is initialized with a `coffee_price` attribute. The class also has three methods: an instance method, a static method, and a class method.

Let's create an instance with `coffee_price` set to `'5'`, then call the instance method `make_coffee`:

```python
coffee_shop = CoffeeShop('5')
coffee_shop.make_coffee()
#=> Making espresso for $5
```

Now call the static method. Static methods cannot change the state of the class or the instance, so they are typically used for utility functions, such as adding two numbers. Ours checks the weather. It's sunny. Great!

```python
coffee_shop.check_weather()
#=> Its sunny
```

Now use the class method to change the `specialty`, and then call `make_coffee` again:

```python
coffee_shop.change_specialty('drip coffee')
#=> Specialty changed to drip coffee

coffee_shop.make_coffee()
#=> Making drip coffee for $5
```

Note that `make_coffee` used to make espresso and now makes drip coffee.
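Because `change_specialty` modifies the class attribute, the change is visible from every instance, not only from the one it was called on. Class methods are also a common way to write alternative constructors. A minimal sketch with a trimmed-down `CoffeeShop`; `from_dollars` is a hypothetical method added only for illustration:

```python
class CoffeeShop:
    specialty = 'espresso'

    def __init__(self, coffee_price):
        self.coffee_price = coffee_price

    @classmethod
    def change_specialty(cls, specialty):
        # cls is the class itself, so this updates state shared by all instances
        cls.specialty = specialty

    @classmethod
    def from_dollars(cls, dollars):
        # hypothetical alternative constructor: builds an instance from a number
        return cls(str(dollars))

shop_a = CoffeeShop('5')
shop_b = CoffeeShop.from_dollars(3)

shop_a.change_specialty('drip coffee')
print(shop_b.specialty)     #=> drip coffee
print(shop_b.coffee_price)  #=> 3
```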

8. What is the difference between func and func()?

The question checks your understanding that all functions in Python are also objects:

```python
def func():
    print('Im a function')

func
#=> <function func at 0x...>

func()
#=> Im a function
```

`func` is the object representing the function; it can be assigned to a variable or passed to another function. `func()`, with parentheses, calls the function and returns its result.
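A small sketch of what "functions are objects" means in practice; `call_twice` and `alias` are hypothetical names used only for illustration:

```python
def func():
    print('Im a function')

def call_twice(f):
    # f is just an object; f() runs the function it refers to
    f()
    f()

alias = func           # no parentheses: another name for the same function object
print(alias is func)   #=> True

call_twice(func)       # pass the function itself, not its result
#=> Im a function
#=> Im a function

result = func()        #=> Im a function
print(result)          #=> None (func has no return statement)
```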

9. Explain how the map function works

`map()` returns an iterator that applies a function to each element of a sequence. If needed, the result can be converted to a list:

```python
def add_three(x):
    return x + 3

li = [1, 2, 3]
list(map(add_three, li))
#=> [4, 5, 6]
```

Here we add the number 3 to each item in the list.
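A few more `map` behaviours worth knowing, sketched below: it accepts a `lambda`, it can take several iterables at once, and because it returns an iterator, it can be consumed only once:

```python
li = [1, 2, 3]

# a lambda works just as well as a named function
print(list(map(lambda x: x * 10, li)))        #=> [10, 20, 30]

# with several iterables, the function receives one item from each
print(list(map(pow, [2, 3, 4], [2, 2, 2])))   #=> [4, 9, 16]

# map returns an iterator, so it is exhausted after one pass
m = map(str, li)
print(list(m))  #=> ['1', '2', '3']
print(list(m))  #=> []
```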

10. Explain how reduce function works

It can be hard to grasp at first, until you have used it a few times.

`reduce()` takes a function and a sequence and walks through the sequence. At each step, the function receives the result accumulated so far and the current element. At the end, a single value is returned:

```python
from functools import reduce

def add(x, y):
    return x + y

li = [1, 2, 3, 5]
reduce(add, li)
#=> 11
```

11 is returned, which is the sum of 1 + 2 + 3 + 5.
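To make the accumulation explicit, here is a sketch of the same computation with a `lambda`, the optional starting value, and the equivalent plain loop:

```python
from functools import reduce

li = [1, 2, 3, 5]

# reduce(f, [a, b, c, d]) computes f(f(f(a, b), c), d)
total = reduce(lambda acc, x: acc + x, li)
print(total)  #=> 11

# the optional third argument is the starting value,
# which also makes reduce safe on an empty sequence
print(reduce(lambda acc, x: acc + x, [], 0))  #=> 0

# equivalent explicit loop
acc = li[0]
for x in li[1:]:
    acc = acc + x
print(acc)  #=> 11
```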

11. Explain how the filter function works

The function does literally what its name says: it filters the elements in the sequence.

Each element is passed to a function: if the function returns `True`, the element is kept, and if it returns `False`, the element is discarded:

```python
def is_even(x):
    return x % 2 == 0

li = [1, 2, 3, 4, 5, 6, 7, 8]
[i for i in filter(is_even, li)]
#=> [2, 4, 6, 8]
```

Note how all items that are not divisible by 2 are removed.
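The same filtering is often written as a list comprehension. `filter` also accepts `None` instead of a function, in which case it keeps only truthy items. A short sketch:

```python
li = [1, 2, 3, 4, 5, 6, 7, 8]

# the filter above written as a list comprehension
evens = [x for x in li if x % 2 == 0]
print(evens)  #=> [2, 4, 6, 8]

# with None as the function, falsy values (0, '', None, []) are dropped
values = [0, 1, '', 'a', None, [], [1]]
print(list(filter(None, values)))  #=> [1, 'a', [1]]
```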

12. Are variables in Python passed by reference or by value?

Be prepared to go down the rabbit hole of semantics if you google this question and read the first few pages.

In short, Python passes references to objects (sometimes called "pass by object reference" or "call by sharing"). A name is just a label bound to an object; assigning a name or passing it to a function never copies the object.

Let's see how this works with strings. Bind a name to a string object, bind a second name to the same object, then delete the first name:

```python
x = 'some text'
y = x
x is y
#=> True

del x  # deletes the name 'x', but not the object in memory

z = y
y is z
#=> True
```

We see that the remaining names still point to the same object in memory, which was left untouched by `del x`.

Here is another interesting example with a function:

```python
name = 'text'

def add_chars(str1):
    print(id(str1))  #=> 4353702856
    print(id(name))  #=> 4353702856

    # new name, same object
    str2 = str1

    # creates a new object and rebinds the name str1 (same name as before) to it
    str1 += 's'
    print(id(str1))  #=> 4387143328

    # the original object has not changed
    print(id(str2))  #=> 4353702856

add_chars(name)
print(name)
#=> text
```

Note that `str1 += 's'` inside the function does not modify the original string. Strings are immutable, so it creates a new object and rebinds the local name `str1` to it, even though the name stays the same. `str2` and the caller's `name` still point to the original object.
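The practical consequence: a function can change a *mutable* object it receives, but rebinding the parameter name never affects the caller. A minimal sketch; `append_item` and `rebind` are hypothetical helpers:

```python
def append_item(items):
    # mutates the very object the caller passed in
    items.append(4)

def rebind(items):
    # binds the local name to a new list; the caller's list is untouched
    items = [0]
    return items

nums = [1, 2, 3]
append_item(nums)
print(nums)  #=> [1, 2, 3, 4]

rebind(nums)
print(nums)  #=> [1, 2, 3, 4]
```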

13. How to reverse a list?

Call `reverse()` on the list. It modifies the list in place and returns `None` rather than the reversed list:

```python
li = ['a', 'b', 'c']
print(li)
li.reverse()
print(li)
#=> ['a', 'b', 'c']
#=> ['c', 'b', 'a']
```
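If you need a reversed copy and want to keep the original order, slicing and `reversed()` are two common alternatives, sketched here:

```python
li = ['a', 'b', 'c']

# a slice with step -1 returns a new, reversed list
print(li[::-1])            #=> ['c', 'b', 'a']

# reversed() returns an iterator; wrap it in list() to see the values
print(list(reversed(li)))  #=> ['c', 'b', 'a']

# the original list is unchanged by both
print(li)                  #=> ['a', 'b', 'c']
```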

14. How does string multiplication work?

Let's see the result of multiplying the string `'cat'` by `3`:

```python
'cat' * 3
#=> 'catcatcat'
```

As a result, the contents of the string are repeated three times.

15. How does list multiplication work?

Let's look at the result of multiplying the list `[1,2,3]` by `2`:

```python
[1, 2, 3] * 2
#=> [1, 2, 3, 1, 2, 3]
```

The contents of the `[1,2,3]` list are repeated twice.
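One classic pitfall: multiplying a list copies *references*, not objects. With nested lists, every "row" ends up being the same inner list. A sketch of the problem and the usual fix:

```python
# * repeats references: all three rows are the same inner list
shared = [[0] * 3] * 3
shared[0][0] = 1
print(shared)    #=> [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# a list comprehension creates a separate inner list for each row
separate = [[0] * 3 for _ in range(3)]
separate[0][0] = 1
print(separate)  #=> [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```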

16. What does self mean in a class?

`self` refers to the instance of the class. It lets a method access and update the object it belongs to.

Using `self` in `__init__` below lets you set the instance's color during initialization:

```python
class Shirt:
    def __init__(self, color):
        self.color = color

s = Shirt('yellow')
s.color
#=> 'yellow'
```

17. How to combine lists in Python?

Adding two lists with `+` concatenates them. Note that NumPy arrays behave differently: for them, `+` adds element-wise:

```python
a = [1, 2]
b = [3, 4, 5]
a + b
#=> [1, 2, 3, 4, 5]
```
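Two related options, sketched below: unpacking builds a new list from any iterables, and `+=` extends the existing list in place instead of creating a new one:

```python
a = [1, 2]
b = [3, 4, 5]

# unpacking builds a new list
c = [*a, *b]
print(c)  #=> [1, 2, 3, 4, 5]

# += modifies the list object a already refers to
before = id(a)
a += b
print(a)                #=> [1, 2, 3, 4, 5]
print(id(a) == before)  #=> True
```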

18. What is the difference between shallow and deep copies?

We discuss this in the context of a mutable object, a list. For immutable objects, deep and shallow copies are usually indistinguishable.

Consider three scenarios.

I) Assign a reference to the original object. This binds the new name `li2` to the same object that `li1` points to, so any change made through `li1` is also visible through `li2`:

```python
li1 = [['a'], ['b'], ['c']]
li2 = li1
li1.append(['d'])
print(li2)
#=> [['a'], ['b'], ['c'], ['d']]
```

II) Create a shallow copy of the original. You can do this with the `list()` constructor or with `copy.copy()`.

A shallow copy creates a new list but fills it with references to the same inner objects as the original. So appending a new item to `li3` is not reflected in `li4`, but modifying one of the nested lists inside `li3` is:

```python
li3 = [['a'], ['b'], ['c']]
li4 = list(li3)

li3.append([4])
print(li4)
#=> [['a'], ['b'], ['c']]

li3[0][0] = ['X']
print(li4)
#=> [[['X']], ['b'], ['c']]
```

III) Create a deep copy. This is done with `copy.deepcopy()`. The original and the copy are completely independent, and changes to one do not affect the other:

```python
import copy

li5 = [['a'], ['b'], ['c']]
li6 = copy.deepcopy(li5)

li5.append([4])
li5[0][0] = ['X']
print(li6)
#=> [['a'], ['b'], ['c']]
```

19. What is the difference between lists and arrays?

Note: The Python standard library has an `array` module, but here we specifically discuss arrays from the popular NumPy library.

Lists can hold different types of data at each index. Arrays require homogeneous elements.

Arithmetic operators on lists concatenate (`+`) or repeat (`*`) them. Arithmetic on arrays is element-wise, in line with linear algebra.

Arrays use less memory and offer far more numerical functionality.
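The difference in arithmetic is easiest to see side by side. A minimal sketch, assuming NumPy is installed:

```python
import numpy as np

# on lists, + concatenates and * repeats
print([1, 2] + [3, 4])  #=> [1, 2, 3, 4]
print([1, 2] * 2)       #=> [1, 2, 1, 2]

# on NumPy arrays, the same operators work element-wise
x = np.array([1, 2])
y = np.array([3, 4])
print(x + y)  #=> [4 6]
print(x * 2)  #=> [2 4]
```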

20. How to combine two arrays?

Remember that arrays are not lists. They come from NumPy, where linear-algebra semantics apply, so `+` would add two arrays element-wise instead of joining them.

To combine arrays, use the corresponding NumPy function:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.concatenate((a, b))
#=> array([1, 2, 3, 4, 5, 6])
```

21. What do you like about Python?

Note: This is a very subjective question, and it makes sense to adapt your answer to the position you are applying for.

Python is very readable, and there is usually a "Pythonic" way to solve almost any task: the clearest, most concise code.

This seems like the opposite of Ruby, where there are often many ways to solve a problem without clear guidelines on which one is preferable.

22. What is your favorite library in Python?

Note: This is also subjective; see question 21.

When working with large amounts of data, it is hard to find anything more useful than pandas. It makes data processing and visualization easy.

23. Name mutable and immutable objects

Immutability means that the state cannot be changed after creation. Examples: int, float, bool, string and tuple.

The state of mutable objects can be changed. Examples: list, dict, and set.
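A short sketch of the difference in practice: item assignment fails on a tuple but works on a list, and "modifying" a string actually produces a new object while the original stays intact:

```python
t = (1, 2, 3)
try:
    t[0] = 10             # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)      #=> False

li = [1, 2, 3]
li[0] = 10                # lists do
print(li)                 #=> [10, 2, 3]

# "changing" an immutable string builds a new object
original = 'abc'
s = original
s += 'd'
print(s, original)        #=> abcd abc
print(s is original)      #=> False
```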

24. How to round a number to three decimal places?

Use the `round()` function:

```python
a = 5.12345
round(a, 3)
#=> 5.123
```
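Two details that sometimes surprise people in interviews: `round()` sends exact ties to the nearest even number, and binary floating point can make an apparent tie fall slightly below the midpoint. If you only need to display three decimals, string formatting is often enough:

```python
# ties go to the nearest even number ("banker's rounding")
print(round(2.5))        #=> 2
print(round(3.5))        #=> 4

# 2.675 is stored as slightly less than 2.675, so it rounds down
print(round(2.675, 2))   #=> 2.67

# formatting returns a string with a fixed number of decimals
print(f'{5.12345:.3f}')  #=> 5.123
```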

25. How to slice a list?

Slice syntax takes up to three values, `[start:stop:step]`, where `step` is the interval at which elements are returned:

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(a[:2])     #=> [0, 1]
print(a[8:])     #=> [8, 9]
print(a[2:8])    #=> [2, 3, 4, 5, 6, 7]
print(a[2:8:2])  #=> [2, 4, 6]
```
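Negative indices count from the end, a negative step walks backwards, and out-of-range bounds are clipped instead of raising an error. A short sketch:

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(a[-3:])     #=> [7, 8, 9]  (last three items)
print(a[::-1])    #=> [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(a[8:2:-2])  #=> [8, 6, 4]

# slices never raise IndexError; bounds are clipped to the list
print(a[8:100])   #=> [8, 9]
```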

26. What is pickle?

Pickle is a module for serializing and deserializing objects in Python.

In the example below, we serialize and deserialize the list of dictionaries:

```python
import pickle

obj = [
    {'id': 1, 'name': 'Stuffy'},
    {'id': 2, 'name': 'Fluffy'},
]

with open('file.p', 'wb') as f:
    pickle.dump(obj, f)

with open('file.p', 'rb') as f:
    loaded_obj = pickle.load(f)

print(loaded_obj)
#=> [{'id': 1, 'name': 'Stuffy'}, {'id': 2, 'name': 'Fluffy'}]
```
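`pickle` can also work entirely in memory with `dumps`/`loads`, which return and accept `bytes`. A sketch below; note that the result is an equal but distinct object, and that unpickling can execute arbitrary code, so it should only be used on trusted data:

```python
import pickle

obj = [{'id': 1, 'name': 'Stuffy'}, {'id': 2, 'name': 'Fluffy'}]

# serialize to bytes instead of a file
data = pickle.dumps(obj)
print(type(data))       #=> <class 'bytes'>

# deserialize back into an equal, but separate, object
restored = pickle.loads(data)
print(restored == obj)  #=> True
print(restored is obj)  #=> False
```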

27. What is the difference between dictionaries and JSON?

A dict (dictionary) is a Python data type: a collection of key-value pairs indexed by key. Since Python 3.7, dicts preserve insertion order.

JSON is just a string that follows a specific format and is designed for transmitting data.
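The standard `json` module converts between the two. A minimal sketch showing the string form, the round trip, and one place where they differ (JSON object keys are always strings):

```python
import json

d = {'id': 1, 'name': 'Stuffy', 'active': True, 'tags': None}

# dict -> JSON string (True becomes true, None becomes null)
s = json.dumps(d)
print(s)        #=> {"id": 1, "name": "Stuffy", "active": true, "tags": null}
print(type(s))  #=> <class 'str'>

# JSON string -> dict
back = json.loads(s)
print(back == d)  #=> True

# non-string keys do not survive the round trip
print(json.loads(json.dumps({1: 'a'})))  #=> {'1': 'a'}
```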

28. What ORMs did you use in Python?

ORM (object-relational mapping) maps data models, usually defined in the application, to database tables and simplifies working with the database.

In the Flask context, SQLAlchemy is usually used, and Django has its own ORM.

29. How do any() and all() work?

`any()` returns `True` if at least one element in the sequence is true (truthy).

`all()` returns `True` only if every element in the sequence is true.

```python
a = [False, False, False]
b = [True, False, False]
c = [True, True, True]

print(any(a))
print(any(b))
print(any(c))
#=> False
#=> True
#=> True

print(all(a))
print(all(b))
print(all(c))
#=> False
#=> False
#=> True
```
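In practice, `any()` and `all()` are usually combined with a generator expression to test a condition. Their behaviour on an empty iterable is also a common trick question. A sketch:

```python
nums = [1, 3, 5, 8]

# test a condition across the sequence
print(any(n % 2 == 0 for n in nums))  #=> True
print(all(n > 0 for n in nums))       #=> True

# on an empty iterable, any() is False and all() is True
print(any([]))  #=> False
print(all([]))  #=> True
```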

30. Which is faster to search: dictionaries or lists?

Finding a value in a list takes O(n) time, because in the worst case you have to go through the whole list.

Looking up a key in a dictionary takes O(1) time on average, because a dictionary is a hash table.

The difference can be huge when there are many values, so dictionaries (or sets) are usually recommended for fast lookups. But they have other constraints, such as the need for unique, hashable keys.
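A rough way to see this yourself is the standard `timeit` module. Exact numbers depend on the machine, so none are shown here; the list timing should grow with `n` while the dictionary timing should stay roughly constant. The size `n` and the looked-up key are arbitrary choices:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)

# the last element is the worst case for a linear scan of the list
target = n - 1

list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)
print(f'list: {list_time:.5f}s  dict: {dict_time:.5f}s')
```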

31. What is the difference between a module and a package?

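A module is a single file of Python code that can be imported:

```python
import random
```

A package is a directory of modules (traditionally containing an `__init__.py` file). For example, scikit-learn is a package, and `model_selection` is one of its subpackages (older tutorials import `sklearn.cross_validation`, which was removed in scikit-learn 0.20):

```python
from sklearn import model_selection
```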

Thus, packages are modules, but not all modules are packages.
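One way to see the difference concretely is to build a throwaway package on disk and import it. The sketch below is self-contained; the names `mypkg`, `helpers`, and `greet` are hypothetical. It relies on the fact that imported packages have a `__path__` attribute (used to find their submodules), while plain modules do not:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# create mypkg/__init__.py and mypkg/helpers.py in a temporary directory
root = Path(tempfile.mkdtemp())
pkg_dir = root / 'mypkg'
pkg_dir.mkdir()
(pkg_dir / '__init__.py').write_text('')
(pkg_dir / 'helpers.py').write_text('def greet():\n    return "hi"\n')

# make the temporary directory importable
sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg = importlib.import_module('mypkg')
mod = importlib.import_module('mypkg.helpers')

print(hasattr(pkg, '__path__'))  #=> True   (package)
print(hasattr(mod, '__path__'))  #=> False  (plain module)
print(mod.greet())               #=> hi
```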

32. How to increment and decrement an integer in Python?

Python has no `++` or `--` operators; use `+=` and `-=` instead:

```python
value = 5
value += 1
print(value)
#=> 6

value -= 1
value -= 1
print(value)
#=> 4
```

33. How to return the binary code of an integer?

Use the `bin()` function:

```python
bin(5)
#=> '0b101'
```
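If you need the digits without the `0b` prefix, or want to convert back, `format()` and `int()` with a base cover both directions. A short sketch:

```python
n = 5

# binary digits without the '0b' prefix
print(format(n, 'b'))   #=> 101
print(f'{n:08b}')       #=> 00000101  (zero-padded to 8 digits)

# int() with base 2 converts back; the '0b' prefix is accepted too
print(int('101', 2))    #=> 5
print(int('0b101', 2))  #=> 5
```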

34. How to remove duplicates from the list?

This can be done by converting the list to a set and then back to the list:

```python
a = [1, 1, 1, 2, 3]
a = list(set(a))
print(a)
#=> [1, 2, 3]
```

Note that sets do not necessarily preserve the order of the list.
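If order matters, a common alternative relies on dictionary keys being unique and, since Python 3.7, kept in insertion order. A minimal sketch:

```python
a = [3, 1, 3, 2, 1]

# keeps the first occurrence of each value, in the original order
print(list(dict.fromkeys(a)))  #=> [3, 1, 2]
```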

35. How to check if a value exists in a list?

Use the `in` operator:

```python
'a' in ['a', 'b', 'c']
#=> True
'a' in [1, 2, 3]
#=> False
```

36. What is the difference between append and extend?

`append` adds a single value to the end of a list, while `extend` adds each value from another list (or any iterable):

```python
a = [1, 2, 3]
b = [1, 2, 3]

a.append(6)
print(a)
#=> [1, 2, 3, 6]

b.extend([4, 5])
print(b)
#=> [1, 2, 3, 4, 5]
```
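The difference shows up clearly when both receive a list: `append` nests it as one element, while `extend` unpacks it. A short sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]

a.append([4, 5])  # the whole list becomes a single element
b.extend([4, 5])  # each item is added separately
print(a)  #=> [1, 2, 3, [4, 5]]
print(b)  #=> [1, 2, 3, 4, 5]

# extend accepts any iterable, e.g. a string
b.extend('xy')
print(b)  #=> [1, 2, 3, 4, 5, 'x', 'y']
```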

37. How to get the absolute value of an integer?

This can be done with the `abs()` function:

```python
abs(2)
#=> 2
abs(-2)
#=> 2
```

38. How to combine two lists into a list of tuples?

To combine lists into a list of tuples, use the `zip()` function. It works not only with two lists but with three or more.

```python
a = ['a', 'b', 'c']
b = [1, 2, 3]
[(k, v) for k, v in zip(a, b)]
#=> [('a', 1), ('b', 2), ('c', 3)]
```
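A sketch of `zip()` with three lists of different lengths (it stops at the shortest one), plus the common `dict(zip(...))` idiom for turning two lists into a mapping:

```python
a = ['a', 'b', 'c']
b = [1, 2, 3]
c = [True, False]

# any number of iterables; the result is as long as the shortest input
print(list(zip(a, b, c)))  #=> [('a', 1, True), ('b', 2, False)]

# pair keys with values
print(dict(zip(a, b)))     #=> {'a': 1, 'b': 2, 'c': 3}
```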

39. How to sort a dictionary by key, in alphabetical order?

A dictionary has no `sort()` method, but you can get a sorted list of (key, value) tuples from it:

```python
d = {'c': 3, 'd': 4, 'b': 2, 'a': 1}
sorted(d.items())
#=> [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```
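Since Python 3.7 dictionaries keep insertion order, so you can also rebuild a new dict from the sorted items. A sketch sorting by key and, with a `key` function, by value:

```python
d = {'c': 3, 'd': 4, 'b': 2, 'a': 1}

# new dict whose insertion order follows the sorted keys
by_key = dict(sorted(d.items()))
print(by_key)    #=> {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# sort by value instead, largest first
by_value = dict(sorted(d.items(), key=lambda kv: kv[1], reverse=True))
print(by_value)  #=> {'d': 4, 'c': 3, 'b': 2, 'a': 1}
```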

40. How is class inheritance implemented in Python?

In the example below, the class `Audi` inherits from `Car`, and with it the parent class's instance methods:

```python
class Car:
    def drive(self):
        print('vroom')

class Audi(Car):
    pass

audi = Audi()
audi.drive()
#=> vroom
```
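A child class can also extend the parent's behaviour instead of just inheriting it. A minimal sketch using `super()`; the `color` and `model` attributes and the `' (quietly)'` suffix are illustrative additions, not part of the original example:

```python
class Car:
    def __init__(self, color):
        self.color = color

    def drive(self):
        return 'vroom'

class Audi(Car):
    def __init__(self, color, model):
        super().__init__(color)  # let the parent set up its attributes
        self.model = model

    def drive(self):
        # override the parent method but reuse its result
        return super().drive() + ' (quietly)'

audi = Audi('black', 'A4')
print(audi.color, audi.model)  #=> black A4
print(audi.drive())            #=> vroom (quietly)
print(isinstance(audi, Car))   #=> True
```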

41. How to remove all spaces from a string?

You can split the string at spaces, and then concatenate again without spaces:

```python
s = 'A string with white space'
''.join(s.split())
#=> 'Astringwithwhitespace'
```

Two readers recommended `replace()` as a more canonical method, in keeping with the Python principle that "explicit is better than implicit". It also avoids creating an intermediate list:

```python
s = 'A string with white space'
s.replace(' ', '')
#=> 'Astringwithwhitespace'
```
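The two approaches are not equivalent when the string contains other whitespace: `split()` with no argument splits on any run of whitespace (spaces, tabs, newlines), while `replace(' ', '')` removes only the space character. A sketch:

```python
s = 'A string\twith  white\nspace'

# removes spaces, tabs and newlines
print(''.join(s.split()))         #=> Astringwithwhitespace

# removes only spaces; the tab and newline remain
print(repr(s.replace(' ', '')))   #=> 'Astring\twithwhite\nspace'
```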

42. Why do we use enumerate() when iterating over a sequence?

`enumerate()` lets you track the index while iterating over a sequence. This is more Pythonic than defining and incrementing an integer that represents the index:

```python
li = ['a', 'b', 'c', 'd', 'e']
for idx, val in enumerate(li):
    print(idx, val)
#=> 0 a
#=> 1 b
#=> 2 c
#=> 3 d
#=> 4 e
```
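`enumerate()` also takes an optional `start` argument, handy for 1-based numbering, and simply yields `(index, value)` tuples. A short sketch:

```python
li = ['a', 'b', 'c']

# start sets the first index
for idx, val in enumerate(li, start=1):
    print(idx, val)
#=> 1 a
#=> 2 b
#=> 3 c

print(list(enumerate(li)))  #=> [(0, 'a'), (1, 'b'), (2, 'c')]
```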

43. What is the difference between pass, continue and break?

`pass` means "do nothing". It is a statement, not a function; we use it because Python does not allow a class, function, or `if` statement with an empty body.

In the example below, leaving the body of the `if` empty would cause an error, so we use `pass`:

```python
a = [1, 2, 3, 4, 5]
for i in a:
    if i > 3:
        pass
    print(i)
#=> 1
#=> 2
#=> 3
#=> 4
#=> 5
```

`continue` skips the rest of the current iteration and moves on to the next element. So `print` is never reached for values less than 3:

```python
for i in a:
    if i < 3:
        continue
    print(i)
#=> 3
#=> 4
#=> 5
```

`break` exits the loop entirely. Here the loop stops at the number 3, so neither 3 nor any later element is printed:

```python
for i in a:
    if i == 3:
        break
    print(i)
#=> 1
#=> 2
```

44. Convert the following for loop to a list comprehension

Given the following `for` loop:

```python
a = [1, 2, 3, 4, 5]
a2 = []
for i in a:
    a2.append(i + 1)
print(a2)
#=> [2, 3, 4, 5, 6]
```

As a list comprehension:

```python
a3 = [i + 1 for i in a]
print(a3)
#=> [2, 3, 4, 5, 6]
```

A list comprehension is generally considered more Pythonic, as long as it stays readable.
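Comprehensions can also filter and branch. A trailing `if` drops items, while a conditional expression before `for` transforms every item. A sketch:

```python
a = [1, 2, 3, 4, 5]

# trailing if: keep only even items, then add 1
evens_plus_one = [i + 1 for i in a if i % 2 == 0]
print(evens_plus_one)  #=> [3, 5]

# conditional expression: label every item
labels = ['even' if i % 2 == 0 else 'odd' for i in a]
print(labels)  #=> ['odd', 'even', 'odd', 'even', 'odd']
```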

45. Give an example of a ternary operator

The ternary (conditional) operator is a single-line if/else statement.

The syntax is: `value_if_true if condition else value_if_false`.

```python
x = 5
y = 10

'greater' if x > 6 else 'less'
#=> 'less'

'greater' if y > 6 else 'less'
#=> 'greater'
```

46. Check that a string contains only numbers

You can use `isnumeric()`:

```python
'123a'.isnumeric()
#=> False
'123'.isnumeric()
#=> True
```
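`isnumeric()` checks the characters, not whether the string is a valid number, so a few edge cases are worth knowing. If the goal is to parse a number, attempting the conversion is usually more reliable; `parse_float` below is a hypothetical helper:

```python
print(''.isnumeric())      #=> False  (empty string)
print('-5'.isnumeric())    #=> False  (the minus sign is not numeric)
print('3.14'.isnumeric())  #=> False  (neither is the decimal point)
print('½'.isnumeric())     #=> True   (Unicode numeric characters count)

def parse_float(text):
    # hypothetical helper: returns a float, or None if text is not a number
    try:
        return float(text)
    except ValueError:
        return None

print(parse_float('-3.14'))  #=> -3.14
print(parse_float('12a'))    #=> None
```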

47. Verify that the string contains only letters

You can use `isalpha()`:

```python
'123a'.isalpha()
#=> False
'a'.isalpha()
#=> True
```

48. Verify that the string contains only letters and numbers

Here you can use `isalnum()`:

```python
'123abc...'.isalnum()
#=> False
'123abc'.isalnum()
#=> True
```

49. Get the list of keys from the dictionary

This can be done by passing the dictionary to the `list()` constructor:

```python
d = {'id': 7, 'name': 'Shiba', 'color': 'brown', 'speed': 'very slow'}
list(d)
#=> ['id', 'name', 'color', 'speed']
```

50. How to translate a string to upper/lower case?

You can use the string methods `upper()` and `lower()`:

```python
small_word = 'potatocake'
big_word = 'FISHCAKE'

small_word.upper()
#=> 'POTATOCAKE'
big_word.lower()
#=> 'fishcake'
```

51. What is the difference between remove, del and pop?

`remove()` removes the first matching value:

```python
li = ['a', 'b', 'c', 'd']
li.remove('b')
li
#=> ['a', 'c', 'd']
```

`del` removes an item by its index:

```python
li = ['a', 'b', 'c', 'd']
del li[0]
li
#=> ['b', 'c', 'd']
```

`pop()` removes an item by its index and returns it:

```python
li = ['a', 'b', 'c', 'd']
li.pop(2)
#=> 'c'
li
#=> ['a', 'b', 'd']
```
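A few related details: `pop()` without an argument removes the last item, `del` also works with slices, and `remove()` raises `ValueError` when the value is not in the list. A sketch:

```python
li = ['a', 'b', 'c', 'd']

print(li.pop())  #=> d  (no index: removes the last item)

del li[0:2]      # del also accepts slices
print(li)        #=> ['c']

try:
    li.remove('z')      # the value is absent
except ValueError:
    print('not found')  #=> not found
```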

52. Give an example of a dict comprehension

Below we will create a dictionary with letters of the alphabet as keys and indices as values:

```python
import string

# create a list of letters
alphabet = list(string.ascii_lowercase)

# build the dictionary
d = {val: idx for idx, val in enumerate(alphabet)}
d
#=> {'a': 0,
#=>  'b': 1,
#=>  'c': 2,
#=>  ...
#=>  'x': 23,
#=>  'y': 24,
#=>  'z': 25}
```

53. How is exception handling done in Python?

For exception handling, Python provides a three-part construct: `try`, `except`, and `finally`.

The syntax looks something like this:

```python
try:
    ...  # try to do this
except Exception:
    ...  # if the try block raised an error, do this
finally:
    ...  # always do this
```

Below is a simplified example. The `try` block fails because you cannot add an integer and a string. The `except` block sets `val = 10`, and then the `finally` block prints `complete`:

```python
try:
    val = 1 + 'A'
except TypeError:
    val = 10
finally:
    print('complete')

print(val)
#=> complete
#=> 10
```
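There is also an optional `else` block that runs only when the `try` block raised nothing. A minimal sketch combining all four parts; `to_int` is a hypothetical helper, and catching a specific exception such as `ValueError` is generally preferred over a bare `except`:

```python
def to_int(text):
    # hypothetical helper showing try/except/else/finally together
    try:
        value = int(text)
    except ValueError:
        print(f'{text!r} is not an integer')
        return None
    else:
        # runs only if the try block raised no exception
        print('converted')
        return value
    finally:
        # runs in every case, even after a return
        print('done')

print(to_int('42'))
#=> converted
#=> done
#=> 42

print(to_int('4x'))
#=> '4x' is not an integer
#=> done
#=> None
```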

Of course, you can never predict with 100% certainty which questions will come up in an interview. The best way to prepare is to program, and then program some more, to gain experience.

Still, the list above should help anyone preparing for an interview for a data scientist or junior/middle Python developer position.
