Module II - Day 5

Python Made Easy: Science and Math Edition

Sep-Dec 2025 batch, Vikrant Patil

Date: 13 Dec 2025

Click here for All Notes

Live note are here https://vikrant.dev/python-made-easy-science-math/module2-day5.html

Please login to https://traininghub.vikrant.dev and create a notebook with name module2-day5.ipynb

© Vikrant Patil

String formating

Suppose you want to generate some text based on template

notes_template="""
---
title: "Training Notes"
---

Python Made Easy: Science and Math Edition

Sep-Dec 2025 batch, Vikrant Patil

© Vikrant Patil


## Regular Batch

### Module I
{module1_notes}

### Module II
{module2_notes}
"""
solution = 42
f"Solution to this problem is {solution}"
'Solution to this problem is 42'
problem_name = "life"
problem_solution = 42
f"Solution to problem of {problem_name} is {problem_solution}"
'Solution to problem of life is 42'
module1 = [f"- module1-day{i}.ipynb" for i in range(1,6)]
module1
['- module1-day1.ipynb',
 '- module1-day2.ipynb',
 '- module1-day3.ipynb',
 '- module1-day4.ipynb',
 '- module1-day5.ipynb']
"\n".join(module1)
'- module1-day1.ipynb\n- module1-day2.ipynb\n- module1-day3.ipynb\n- module1-day4.ipynb\n- module1-day5.ipynb'
print("\n".join(module1))
- module1-day1.ipynb
- module1-day2.ipynb
- module1-day3.ipynb
- module1-day4.ipynb
- module1-day5.ipynb
module1_notes = "\n".join(module1)
def create_module_subsection(module_name, days):
    return "\n".join([f"- {module_name}-day{i}" for i in range(1, days+1)])
create_module_subsection("module1", 5)
'- module1-day1\n- module1-day2\n- module1-day3\n- module1-day4\n- module1-day5'
text = "this is some template with a variable {x}"
print(text)
this is some template with a variable {x}
text.format(x=42)
'this is some template with a variable 42'
print(notes_template.format(module1_notes=create_module_subsection("module1", 5),
                      module2_notes=create_module_subsection("module2", 5)))

---
title: "Training Notes"
---

Python Made Easy: Science and Math Edition

Sep-Dec 2025 batch, Vikrant Patil

© Vikrant Patil


## Regular Batch

### Module I
- module1-day1
- module1-day2
- module1-day3
- module1-day4
- module1-day5

### Module II
- module2-day1
- module2-day2
- module2-day3
- module2-day4
- module2-day5
def create_module_subsection(module_name, days):
    return "\n".join([f"- {module_name}-day{i}.ipynb" for i in range(1, days+1)])

def make_notes_qml(filename):
    notes_template="""
---
title: "Training Notes"
---

Python Made Easy: Science and Math Edition

Sep-Dec 2025 batch, Vikrant Patil

© Vikrant Patil


## Regular Batch

### Module I
{module1_notes}

### Module II
{module2_notes}
"""
    with open(filename, "w") as f:
        m1 = create_module_subsection("module1", 5)
        m2 = create_module_subsection("module2", 5)
        contents = notes_template.format(module1_notes = m1,
                                         module2_notes = m2)
        f.write(contents)
        
make_notes_qml("notes_test.qmd")
!cat notes_test.qmd

---
title: "Training Notes"
---

Python Made Easy: Science and Math Edition

Sep-Dec 2025 batch, Vikrant Patil

© Vikrant Patil


## Regular Batch

### Module I
- module1-day1.ipynb
- module1-day2.ipynb
- module1-day3.ipynb
- module1-day4.ipynb
- module1-day5.ipynb

### Module II
- module2-day1.ipynb
- module2-day2.ipynb
- module2-day3.ipynb
- module2-day4.ipynb
- module2-day5.ipynb

Handling CSV files

import pandas as pd
csvfile = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
pd.read_csv(csvfile)
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

wallet = pd.read_csv(csvfile)
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

import requests
def downloadfile(url, filename):
    r = requests.get(url)
    with open(filename, "wb") as f:
        f.write(r.contents)
csvurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
downloadfile(csvurl, "wallet.csv")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[39], line 2
      1 csvurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
----> 2 downloadfile(csvurl, "wallet.csv")

Cell In[38], line 4, in downloadfile(url, filename)
      2 r = requests.get(url)
      3 with open(filename, "wb") as f:
----> 4     f.write(r.contents)

AttributeError: 'Response' object has no attribute 'contents'
r = requests.get(csvurl)
r.status_code
200
def downloadfile(url, filename):
    r = requests.get(url)
    with open(filename, "wb") as f:
        f.write(r.content)
downloadfile(csvurl, "wallet.csv")
files_to_download =[
    "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv",
    "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.xlsx"
]
names = [ url.split("/")[-1] for url in files_to_download]
names
['wallet.csv', 'wallet.xlsx']
for url, name in zip(files_to_download, names):
    print(f"Downloading {name}")
    downloadfile(url, name)
Downloading wallet.csv
Downloading wallet.xlsx
pd.read_csv("wallet.csv")
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

pd.read_excel("wallet.xlsx")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File ~/usr/local/default/lib/python3.13/site-packages/pandas/compat/_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)
    134 try:
--> 135     module = importlib.import_module(name)
    136 except ImportError:

File /usr/lib/python3.13/importlib/__init__.py:88, in import_module(name, package)
     87         level += 1
---> 88 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1387, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1360, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1324, in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'openpyxl'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[50], line 1
----> 1 pd.read_excel("wallet.xlsx")

File ~/usr/local/default/lib/python3.13/site-packages/pandas/io/excel/_base.py:495, in read_excel(io, sheet_name, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, date_format, thousands, decimal, comment, skipfooter, storage_options, dtype_backend, engine_kwargs)
    493 if not isinstance(io, ExcelFile):
    494     should_close = True
--> 495     io = ExcelFile(
    496         io,
    497         storage_options=storage_options,
    498         engine=engine,
    499         engine_kwargs=engine_kwargs,
    500     )
    501 elif engine and engine != io.engine:
    502     raise ValueError(
    503         "Engine should not be specified when passing "
    504         "an ExcelFile - ExcelFile already has the engine set"
    505     )

File ~/usr/local/default/lib/python3.13/site-packages/pandas/io/excel/_base.py:1567, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options, engine_kwargs)
   1564 self.engine = engine
   1565 self.storage_options = storage_options
-> 1567 self._reader = self._engines[engine](
   1568     self._io,
   1569     storage_options=storage_options,
   1570     engine_kwargs=engine_kwargs,
   1571 )

File ~/usr/local/default/lib/python3.13/site-packages/pandas/io/excel/_openpyxl.py:552, in OpenpyxlReader.__init__(self, filepath_or_buffer, storage_options, engine_kwargs)
    534 @doc(storage_options=_shared_docs["storage_options"])
    535 def __init__(
    536     self,
   (...)    539     engine_kwargs: dict | None = None,
    540 ) -> None:
    541     """
    542     Reader using openpyxl engine.
    543 
   (...)    550         Arbitrary keyword arguments passed to excel engine.
    551     """
--> 552     import_optional_dependency("openpyxl")
    553     super().__init__(
    554         filepath_or_buffer,
    555         storage_options=storage_options,
    556         engine_kwargs=engine_kwargs,
    557     )

File ~/usr/local/default/lib/python3.13/site-packages/pandas/compat/_optional.py:138, in import_optional_dependency(name, extra, errors, min_version)
    136 except ImportError:
    137     if errors == "raise":
--> 138         raise ImportError(msg)
    139     return None
    141 # Handle submodules: if we have submodule, grab parent module from sys.modules

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.
!pip install openpyxl
Collecting openpyxl

  Using cached openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)

Collecting et-xmlfile (from openpyxl)

  Using cached et_xmlfile-2.0.0-py3-none-any.whl.metadata (2.7 kB)

Using cached openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)

Using cached et_xmlfile-2.0.0-py3-none-any.whl (18 kB)

Installing collected packages: et-xmlfile, openpyxl

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/2 [openpyxl]━━ 1/2 [openpyxl]

Successfully installed et-xmlfile-2.0.0 openpyxl-3.1.5
pd.read_excel("wallet.xlsx")
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

wallet = pd.read_excel("wallet.xlsx")
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

wallet.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   100 non-null    int64  
 1   date         100 non-null    object 
 2   category     100 non-null    object 
 3   description  100 non-null    object 
 4   debit        100 non-null    float64
dtypes: float64(1), int64(1), object(3)
memory usage: 4.0+ KB
wallet.head()
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
wallet.tail()
Unnamed: 0 date category description debit
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501
wallet['category']
0       Music
1        Food
2       Books
3     Utility
4       Books
       ...   
95    Utility
96      Books
97    Utility
98     Travel
99       Food
Name: category, Length: 100, dtype: object
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

type(wallet)
pandas.core.frame.DataFrame
type(wallet['category'])
pandas.core.series.Series
s = pd.Series([34, 45, 56],index=['a','b','c'])
s
a    34
b    45
c    56
dtype: int64
s[0]
/tmp/ipykernel_62707/243613605.py:1: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  s[0]
np.int64(34)
s['a']
np.int64(34)
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

pd.Series(["some", "text", "value"])
0     some
1     text
2    value
dtype: object
values = pd.Series([235, 233, 500, 2000], index=['AAPLE','AT&T','SIMPLE','INFY'])
values
AAPLE      235
AT&T       233
SIMPLE     500
INFY      2000
dtype: int64
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

wallet.groupby(["category"]).sum(numeric_only=True)
Unnamed: 0 debit
category
Books 604 4929.750393
Food 1376 8281.189173
Music 970 4233.451868
Travel 721 6052.931876
Utility 1279 7562.267233
wallet.groupby(['description']).sum(numeric_only=True)
Unnamed: 0 debit
description
Amazon 411 2504.690567
Amazon Kindle 171 1389.052376
Auto 121 2210.428935
Electricity 536 2885.064355
Flipcart 335 2503.255216
Hotel 565 2752.174732
Metro 139 1216.463665
Netflix 394 1546.567562
Phone 743 4677.202878
Swiggy 386 1936.495366
Taxi 461 2626.039276
Zomato 425 3592.519075
spotify 263 1219.636541
wallet.groupby(['category','description']).sum(numeric_only=True)
Unnamed: 0 debit
category description
Books Amazon 98 1037.442802
Amazon Kindle 171 1389.052376
Flipcart 335 2503.255216
Food Hotel 565 2752.174732
Swiggy 386 1936.495366
Zomato 425 3592.519075
Music Amazon 313 1467.247766
Netflix 394 1546.567562
spotify 263 1219.636541
Travel Auto 121 2210.428935
Metro 139 1216.463665
Taxi 461 2626.039276
Utility Electricity 536 2885.064355
Phone 743 4677.202878
grp_cat = wallet.groupby(['category']).sum(numeric_only=True)
grp_cat
Unnamed: 0 debit
category
Books 604 4929.750393
Food 1376 8281.189173
Music 970 4233.451868
Travel 721 6052.931876
Utility 1279 7562.267233
grp_cat['debit']
category
Books      4929.750393
Food       8281.189173
Music      4233.451868
Travel     6052.931876
Utility    7562.267233
Name: debit, dtype: float64
grp_cat.debit.sum()
np.float64(31059.59054317728)
grp_cat.debit.mean()
np.float64(6211.9181086354565)
wallet['debit'].sum()
np.float64(31059.590543177284)
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

grp_cat
Unnamed: 0 debit
category
Books 604 4929.750393
Food 1376 8281.189173
Music 970 4233.451868
Travel 721 6052.931876
Utility 1279 7562.267233
grp_cat.to_csv("wallet-category.csv")