Important functions in Data Analysis and Visualizations

1) Zip function

  • The zip function is a Python built-in that combines elements from multiple iterables (such as lists, tuples, or strings) and returns an iterator of tuples, where the i-th tuple holds the i-th element of each input. It stops when the shortest input iterable is exhausted.

  • Example -

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 22]

zipped_data = zip(names, ages)

for name, age in zipped_data:
    print(f'{name} is {age} years old')
  • Output -
Alice is 25 years old
Bob is 30 years old
Charlie is 22 years old
  • Note - You can also unzip data back into separate sequences by passing the zipped pairs to zip with the * operator, as in zip(*pairs). The results are tuples, not lists.

  • You can use zip to create dictionaries from two lists.

keys = ['name', 'age']
values = ['Alice', 25]

data_dict = dict(zip(keys, values))

print(data_dict)  # Output: {'name': 'Alice', 'age': 25}
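The unzipping note and the stop-at-shortest behaviour described above can be sketched as follows. This is a minimal illustration reusing the names and ages lists from the first example; itertools.zip_longest is a standard-library alternative that is not part of the original text, shown only for contrast.

```python
from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 22]

# Zip into pairs, then unzip with the * operator.
pairs = list(zip(names, ages))
unzipped_names, unzipped_ages = zip(*pairs)  # each result is a tuple

# zip() stops at the shortest input: 'Charlie' has no partner here.
short = list(zip(names, [25, 30]))

# zip_longest() pads the shorter input instead of truncating.
padded = list(zip_longest(names, [25, 30], fillvalue=None))

print(unzipped_names)  # ('Alice', 'Bob', 'Charlie')
print(unzipped_ages)   # (25, 30, 22)
print(short)           # [('Alice', 25), ('Bob', 30)]
print(padded)          # [('Alice', 25), ('Bob', 30), ('Charlie', None)]
```

If silently dropping unmatched items would hide a data problem, comparing the input lengths before zipping is a simple safeguard.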

2) Lambda function

  • A lambda function is a concise way to create an anonymous function whose body is a single expression. Syntax -
lambda arguments: expression
  • Example - To square a given number, we write a regular function and its equivalent lambda function:
# Regular function
def square(x):
    return x**2

# Equivalent lambda function
lambda_square = lambda x: x**2

print(square(5))          # Output: 25
print(lambda_square(5))   # Output: 25
  • Note - Lambda functions are often used with map() and filter() functions to process sequences of data.
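# MAP FUNCTION
numbers = [1, 2, 3, 4, 5]
# map() returns a lazy iterator, so wrap it in list() to see the values
squared_numbers = list(map(lambda x: x**2, numbers))
print(squared_numbers)  # Output: [1, 4, 9, 16, 25]
# FILTER FUNCTION
# filter() is lazy too; it keeps the items for which the lambda returns True
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers)  # Output: [2, 4]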
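One detail worth seeing in action: in Python 3, map() and filter() return one-shot iterators rather than lists. The sketch below, which reuses the numbers list from above, shows an iterator being exhausted after one pass, and how chaining map() and filter() compares with a list comprehension. The variable names are illustrative only.

```python
numbers = [1, 2, 3, 4, 5]

# map() returns a one-shot iterator, not a list.
squares_iter = map(lambda x: x**2, numbers)
first_pass = list(squares_iter)   # consumes the iterator
second_pass = list(squares_iter)  # nothing left to yield

# Chaining: square only the even numbers.
even_squares = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, numbers)))

# The same result written as a list comprehension.
even_squares_comp = [x**2 for x in numbers if x % 2 == 0]

print(first_pass)         # [1, 4, 9, 16, 25]
print(second_pass)        # []
print(even_squares)       # [4, 16]
print(even_squares_comp)  # [4, 16]
```

Which form to use is largely a matter of taste; the comprehension tends to read more easily once a map and a filter are combined.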

3) Cumsum

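  • cumsum is a pandas method that computes the cumulative (running) sum of a Series or of each column in a DataFrame: each output value is the sum of all values up to and including that row. Example -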
import pandas as pd

# Sample DataFrame
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Cumulative sum of the 'Values' column
df['Cumulative_Sum'] = df['Values'].cumsum()

print(df)

Output :

   Values  Cumulative_Sum
0       1               1
1       2               3
2       3               6
3       4              10
4       5              15
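To make the running-sum idea concrete, here is a plain-Python sketch of the same computation on the same five values. It is an illustration of what a cumulative sum is, not how pandas implements cumsum; itertools.accumulate is the standard-library equivalent.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]

# Manual running total: each output is the sum of all values so far.
running_total = 0
cumulative = []
for value in values:
    running_total += value
    cumulative.append(running_total)

# itertools.accumulate produces the same running sums lazily.
cumulative_stdlib = list(accumulate(values))

print(cumulative)         # [1, 3, 6, 10, 15]
print(cumulative_stdlib)  # [1, 3, 6, 10, 15]
```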

4) Cut

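  • cut is used for binning or discretization of continuous values into discrete intervals (bins). By default each bin includes its right edge, so the bins [0, 2, 4, 6] define the intervals (0, 2], (2, 4], and (4, 6]. Example -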
# Binning 'Values' into three bins: Low, Medium, High
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
bins = [0, 2, 4, 6]
labels = ['Low', 'Medium', 'High']

df['Category'] = pd.cut(df['Values'], bins=bins, labels=labels)

print(df)

Output-

   Values  Category
0       1       Low
1       2       Low
2       3    Medium
3       4    Medium
4       5      High
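The bin assignment can be reasoned about without pandas. The sketch below assumes right-inclusive intervals (the pd.cut default) and uses a hypothetical helper, assign_bin, built on the standard-library bisect module. It illustrates the rule, not pandas' actual implementation.

```python
from bisect import bisect_left

def assign_bin(value, bins, labels):
    """Hypothetical helper: label of the right-inclusive bin (lo, hi] holding value."""
    idx = bisect_left(bins, value) - 1
    if 0 <= idx < len(labels):
        return labels[idx]
    return None  # outside every bin (pandas reports NaN for such values)

bins = [0, 2, 4, 6]
labels = ['Low', 'Medium', 'High']

categories = [assign_bin(v, bins, labels) for v in [1, 2, 3, 4, 5]]
print(categories)                   # ['Low', 'Low', 'Medium', 'Medium', 'High']
print(assign_bin(0, bins, labels))  # None: the left edge 0 is excluded
print(assign_bin(6, bins, labels))  # High: the right edge 6 is included
```

Because the right edge belongs to the bin, the value 2 lands in Low and the value 4 lands in Medium.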

5) Qcut

  • qcut is used for quantile-based binning. It chooses the bin edges from the quantiles of the data, so each interval contains approximately the same number of points.
# Quantile-based binning of 'Values' into three bins
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
df['Quantile_Category'] = pd.qcut(df['Values'], q=3, labels=['Low', 'Medium', 'High'])

print(df)

Output -

   Values  Quantile_Category
0       1                Low
1       2                Low
2       3             Medium
3       4               High
4       5               High
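To see where the quantile edges come from, here is a rough pure-Python sketch. It assumes evenly spaced quantiles with linear interpolation between sorted values and that the lowest edge belongs to the first bin. Both helpers, quantile_edges and assign_quantile_bin, are hypothetical and only meant for values inside the data range; this is not pandas' implementation.

```python
from bisect import bisect_left

def quantile_edges(values, q):
    """Hypothetical helper: q + 1 bin edges at evenly spaced quantiles."""
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = []
    for k in range(q + 1):
        pos = k / q * last      # fractional position in the sorted data
        lo = int(pos)
        hi = min(lo + 1, last)
        # Linear interpolation between the two neighbouring values.
        edges.append(ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo]))
    return edges

def assign_quantile_bin(value, edges, labels):
    """Hypothetical helper: right-inclusive bins; the lowest edge joins the first bin."""
    idx = max(bisect_left(edges, value) - 1, 0)
    return labels[idx]

values = [1, 2, 3, 4, 5]
labels = ['Low', 'Medium', 'High']

edges = quantile_edges(values, 3)  # roughly [1, 2.33, 3.67, 5]
categories = [assign_quantile_bin(v, edges, labels) for v in values]
print(categories)  # ['Low', 'Low', 'Medium', 'High', 'High']
```

With five points and three bins the split cannot be perfectly even, which is why the bins hold 2, 1, and 2 points.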

6) Generator function

  • A generator function is a function that uses the yield keyword to produce values one at a time. Calling it returns a generator object, and each step of iteration resumes the function until the next yield, so you can iterate over a potentially large sequence without building the whole sequence in memory. This makes it more memory-efficient than generating a full list.

  • Example-

      def fibonacci_generator(limit):
          a, b = 0, 1
          while a < limit:
              yield a
              a, b = b, a + b
    
      # Using the generator function
      fibonacci_limit = 20
      fibonacci_gen = fibonacci_generator(fibonacci_limit)
    
      # Iterating through the generator
      for number in fibonacci_gen:
          print(number, end=' ')
    
      # Output: 0 1 1 2 3 5 8 13
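Because values are produced only on demand, a generator can even describe an unbounded sequence. The sketch below is a variation on the example above with the limit removed; it uses itertools.islice from the standard library to take just the first few values. The function name fibonacci is illustrative.

```python
from itertools import islice

def fibonacci():
    """Yield Fibonacci numbers indefinitely; each value is computed on demand."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take only the first eight values; the rest are never computed.
first_eight = list(islice(fibonacci(), 8))
print(first_eight)  # [0, 1, 1, 2, 3, 5, 8, 13]

# next() pulls one value at a time from a generator object.
gen = fibonacci()
first_three = [next(gen), next(gen), next(gen)]
print(first_three)  # [0, 1, 1]
```

Calling list() directly on fibonacci() would never finish, which is exactly why the consumer, not the generator, decides how many values to request.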