Module III - Day 2

Python Foundation

Nov-Jan 2025 batch, Vikrant Patil

Date: 10 Feb 2026

Click here for All Notes

login to https://traininghub.vikrant.dev/ with your username and create a notebook with name module3-day2.ipynb

© Vikrant Patil

Pandas

!ls 
about.qmd             index.qmd       poem.txt
add.py                index.qmd~      push
adventures_in_iterations.pdf  Makefile        __pycache__
api.py                Makefile~       _quarto.yml
api.py~               module1-day1.ipynb  _quarto.yml~
args.py               module1-day2.ipynb  sample1.html
bank0.py              module1-day3.ipynb  sample.html
bank1.py              module1-day4.ipynb  say_hello.py
checkname.py              module2-day1.ipynb  _site
command_fib.py            module2-day2.ipynb  styles.css
data.csv              module2-day3.ipynb  tax_computation.py
day4.org              module2-day4.ipynb  test
day4.org~             module3-day1.ipynb  test.py
digits.bin            module3-day1.org    test.txt
digits.txt            module3-day2.ipynb  todays_date.py
empty.py              notes.qmd       topics.qmd
fibs1.py              notes.qmd~      topics.qmd~
fibs.py               numbers.txt     trainer.qmd
hello.py              passkey.txt     wallet.xlsx
%%file stocks.csv
ticker,value,high,low,volume
IBM,255,256,253.5,100
MICROSOFT,350.5,350.5,345.7,300
AGILLENT,450.5,450.5,445.7,400
INFY,355,356,353.5,300
Overwriting stocks.csv
import pandas as pd
pd.read_csv("stocks.csv")
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks = pd.read_csv("stocks.csv")
stocks
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
type(stocks)
pandas.DataFrame
url = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
wallet = pd.read_csv(url)
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

excelurl = "https://github.com/vikipedia/python-trainings/raw/master/online_course/source/module2/wallet.xlsx"
wallet = pd.read_excel(excelurl)
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

try this Try to load data from url “https://www.moneycontrol.com/markets/indian-indices/” using pandas.read_html function. What is the type of result returned?

data = pd.read_html("https://www.moneycontrol.com/markets/indian-indices/")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File ~/usr/local/default/lib/python3.13/site-packages/pandas/compat/_optional.py:158, in import_optional_dependency(name, extra, min_version, errors)
    157 try:
--> 158     module = importlib.import_module(name)
    159 except ImportError as err:

File /usr/lib/python3.13/importlib/__init__.py:88, in import_module(name, package)
     87         level += 1
---> 88 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1387, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1360, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1310, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:488, in _call_with_frames_removed(f, *args, **kwds)

File <frozen importlib._bootstrap>:1387, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1360, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1324, in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'lxml'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In[12], line 1
----> 1 data = pd.read_html("https://www.moneycontrol.com/markets/indian-indices/")

File ~/usr/local/default/lib/python3.13/site-packages/pandas/io/html.py:1226, in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only, extract_links, dtype_backend, storage_options)
   1222 check_dtype_backend(dtype_backend)
   1224 io = stringify_path(io)
-> 1226 return _parse(
   1227     flavor=flavor,
   1228     io=io,
   1229     match=match,
   1230     header=header,
   1231     index_col=index_col,
   1232     skiprows=skiprows,
   1233     parse_dates=parse_dates,
   1234     thousands=thousands,
   1235     attrs=attrs,
   1236     encoding=encoding,
   1237     decimal=decimal,
   1238     converters=converters,
   1239     na_values=na_values,
   1240     keep_default_na=keep_default_na,
   1241     displayed_only=displayed_only,
   1242     extract_links=extract_links,
   1243     dtype_backend=dtype_backend,
   1244     storage_options=storage_options,
   1245 )

File ~/usr/local/default/lib/python3.13/site-packages/pandas/io/html.py:967, in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, storage_options, **kwargs)
    965 retained = None
    966 for flav in flavor:
--> 967     parser = _parser_dispatch(flav)
    968     p = parser(
    969         io,
    970         compiled_match,
   (...)    975         storage_options,
    976     )
    978     try:

File ~/usr/local/default/lib/python3.13/site-packages/pandas/io/html.py:914, in _parser_dispatch(flavor)
    912     import_optional_dependency("bs4")
    913 else:
--> 914     import_optional_dependency("lxml.etree")
    915 return _valid_parsers[flavor]

File ~/usr/local/default/lib/python3.13/site-packages/pandas/compat/_optional.py:161, in import_optional_dependency(name, extra, min_version, errors)
    159 except ImportError as err:
    160     if errors == "raise":
--> 161         raise ImportError(msg) from err
    162     return None
    164 # Handle submodules: if we have submodule, grab parent module from sys.modules

ImportError: `Import lxml` failed.  Use pip or conda to install the lxml package.
!pip install lxml
Collecting lxml
  Using cached lxml-6.0.2-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.metadata (3.6 kB)
Using cached lxml-6.0.2-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (5.2 MB)
Installing collected packages: lxml
Successfully installed lxml-6.0.2
data = pd.read_html("https://www.moneycontrol.com/markets/indian-indices/")
data
[  Stock Name   Sector      LTP   Change     %Chg
 0    No Data  No Data  No Data  No Data  No Data,
   Stock Name   Sector      LTP   Change     %Chg
 0    No Data  No Data  No Data  No Data  No Data,
        Stock Name                  Sector       LTP  Change  %Chg
 0   Tech Mahindra    Computers - Software   1512.70  -21.70 -1.41
 1   Maruti Suzuki     Auto - Cars & Jeeps  15051.00 -186.00 -1.22
 2   Bajaj Finance          Finance - NBFC   1012.75  -12.00 -1.17
 3            TMPV      Auto - LCVs & HCVs    377.25   -3.00 -0.79
 4   Eicher Motors      Auto - LCVs & HCVs   8010.50  -54.50 -0.68
 5             M&M     Auto - Cars & Jeeps   3509.70  -23.70 -0.67
 6      Bajaj Auto   Auto - 2 & 3 Wheelers   9697.50  -62.50 -0.64
 7           Trent                  Retail   4229.50  -22.50 -0.53
 8   Jio Financial        Finance - Others    262.55   -1.40 -0.53
 9           Wipro    Computers - Software    213.26   -0.83 -0.39
 10        Infosys    Computers - Software   1365.60   -3.50 -0.26
 11     ICICI Bank  Banks - Private Sector   1411.60   -3.00 -0.21
 12       Hindalco            Iron & Steel    907.45   -1.55 -0.17,
          Stock Name                             Sector       LTP  Change  %Chg
 0         HDFC Bank             Banks - Private Sector    925.80   21.90  2.42
 1   Adani Enterpris                            Trading   2184.60   48.00  2.25
 2    Max Healthcare     Hospital & Healthcare Services   1075.85   22.25  2.11
 3         Axis Bank             Banks - Private Sector   1358.30   26.00  1.95
 4             Cipla                    Pharmaceuticals   1356.40   24.90  1.87
 5   Shriram Finance  Finance - Leasing & Hire Purchase   1085.40   19.60  1.84
 6              NTPC  Power - Generation & Distribution    369.10    6.10  1.68
 7              ONGC       Oil Drilling And Exploration    271.55    4.15  1.55
 8       Adani Ports           Infrastructure - General   1540.10   21.10  1.39
 9     Bajaj Finserv              Finance - Investments   2050.90   27.70  1.37
 10              ITC                        Diversified    317.95    4.20  1.34
 11        JSW Steel                      Steel - Large   1248.20   16.20  1.31
 12     Asian Paints                 Paints & Varnishes   2397.20   30.80  1.30
 13       Tata Steel                       Iron & Steel    205.81    2.63  1.29
 14         Reliance                         Refineries   1437.10   17.50  1.23
 15    Bharti Airtel       Telecommunications - Service   2029.40   24.70  1.23
 16   Kotak Mahindra             Banks - Private Sector    425.60    4.95  1.18
 17        HDFC Life            Life & Health Insurance    703.90    7.10  1.02
 18  Apollo Hospital       Hospitals & Medical Services   7618.00   75.50  1.00
 19           Grasim                        Diversified   2912.70   24.70  0.86
 20              SBI              Banks - Public Sector   1208.10    9.50  0.79
 21           Larsen           Infrastructure - General   4201.50   27.60  0.66
 22  TATA Cons. Prod         Plantations - Tea & Coffee   1139.20    7.00  0.62
 23              HUL                      Personal Care   2319.00   13.80  0.60
 24           Nestle                    Food Processing   1289.80    7.20  0.56
 25      Bharat Elec                Aerospace & Defence    438.00    2.45  0.56
 26              TCS               Computers - Software   2706.60   14.40  0.53
 27          Eternal                    Online Services    286.60    1.40  0.49
 28         HCL Tech               Computers - Software   1461.80    6.60  0.45
 29  SBI Life Insura            Life & Health Insurance   2042.40    8.20  0.40
 30   Interglobe Avi              Transport & Logistics   4940.80   11.60  0.24
 31       Sun Pharma                    Pharmaceuticals   1700.50    3.00  0.18
 32  UltraTechCement                     Cement - Major  12981.00   18.00  0.14
 33   Dr Reddys Labs                    Pharmaceuticals   1269.40    1.30  0.10
 34    Titan Company                      Miscellaneous   4181.10    1.90  0.05,
         Stock Name                             Sector    LTP  Change  %Chg
 0  Power Grid Corp  Power - Generation & Distribution  300.5   13.30  4.63
 1       Coal India                  Mining & Minerals  422.5   13.55  3.31,
   Stock Name   Sector      LTP   Change     %Chg
 0    No Data  No Data  No Data  No Data  No Data]
type(data)
list
len(data)
6
data[0]
Stock Name Sector LTP Change %Chg
0 No Data No Data No Data No Data No Data
data[1]
Stock Name Sector LTP Change %Chg
0 No Data No Data No Data No Data No Data
data[2]
Stock Name Sector LTP Change %Chg
0 Tech Mahindra Computers - Software 1512.70 -21.70 -1.41
1 Maruti Suzuki Auto - Cars & Jeeps 15051.00 -186.00 -1.22
2 Bajaj Finance Finance - NBFC 1012.75 -12.00 -1.17
3 TMPV Auto - LCVs & HCVs 377.25 -3.00 -0.79
4 Eicher Motors Auto - LCVs & HCVs 8010.50 -54.50 -0.68
5 M&M Auto - Cars & Jeeps 3509.70 -23.70 -0.67
6 Bajaj Auto Auto - 2 & 3 Wheelers 9697.50 -62.50 -0.64
7 Trent Retail 4229.50 -22.50 -0.53
8 Jio Financial Finance - Others 262.55 -1.40 -0.53
9 Wipro Computers - Software 213.26 -0.83 -0.39
10 Infosys Computers - Software 1365.60 -3.50 -0.26
11 ICICI Bank Banks - Private Sector 1411.60 -3.00 -0.21
12 Hindalco Iron & Steel 907.45 -1.55 -0.17
data[3]
Stock Name Sector LTP Change %Chg
0 HDFC Bank Banks - Private Sector 925.80 21.90 2.42
1 Adani Enterpris Trading 2184.60 48.00 2.25
2 Max Healthcare Hospital & Healthcare Services 1075.85 22.25 2.11
3 Axis Bank Banks - Private Sector 1358.30 26.00 1.95
4 Cipla Pharmaceuticals 1356.40 24.90 1.87
5 Shriram Finance Finance - Leasing & Hire Purchase 1085.40 19.60 1.84
6 NTPC Power - Generation & Distribution 369.10 6.10 1.68
7 ONGC Oil Drilling And Exploration 271.55 4.15 1.55
8 Adani Ports Infrastructure - General 1540.10 21.10 1.39
9 Bajaj Finserv Finance - Investments 2050.90 27.70 1.37
10 ITC Diversified 317.95 4.20 1.34
11 JSW Steel Steel - Large 1248.20 16.20 1.31
12 Asian Paints Paints & Varnishes 2397.20 30.80 1.30
13 Tata Steel Iron & Steel 205.81 2.63 1.29
14 Reliance Refineries 1437.10 17.50 1.23
15 Bharti Airtel Telecommunications - Service 2029.40 24.70 1.23
16 Kotak Mahindra Banks - Private Sector 425.60 4.95 1.18
17 HDFC Life Life & Health Insurance 703.90 7.10 1.02
18 Apollo Hospital Hospitals & Medical Services 7618.00 75.50 1.00
19 Grasim Diversified 2912.70 24.70 0.86
20 SBI Banks - Public Sector 1208.10 9.50 0.79
21 Larsen Infrastructure - General 4201.50 27.60 0.66
22 TATA Cons. Prod Plantations - Tea & Coffee 1139.20 7.00 0.62
23 HUL Personal Care 2319.00 13.80 0.60
24 Nestle Food Processing 1289.80 7.20 0.56
25 Bharat Elec Aerospace & Defence 438.00 2.45 0.56
26 TCS Computers - Software 2706.60 14.40 0.53
27 Eternal Online Services 286.60 1.40 0.49
28 HCL Tech Computers - Software 1461.80 6.60 0.45
29 SBI Life Insura Life & Health Insurance 2042.40 8.20 0.40
30 Interglobe Avi Transport & Logistics 4940.80 11.60 0.24
31 Sun Pharma Pharmaceuticals 1700.50 3.00 0.18
32 UltraTechCement Cement - Major 12981.00 18.00 0.14
33 Dr Reddys Labs Pharmaceuticals 1269.40 1.30 0.10
34 Titan Company Miscellaneous 4181.10 1.90 0.05
data[4]
Stock Name Sector LTP Change %Chg
0 Power Grid Corp Power - Generation & Distribution 300.5 13.30 4.63
1 Coal India Mining & Minerals 422.5 13.55 3.31
data[5]
Stock Name Sector LTP Change %Chg
0 No Data No Data No Data No Data No Data

DataFrame and Series

stocks.columns
Index(['ticker', 'value', 'high', 'low', 'volume'], dtype='str')
type(stocks['value'])
pandas.Series
s1 = pd.Series([421, 328, 123, 234])
s1
0    421
1    328
2    123
3    234
dtype: int64
s = pd.Series([421, 328, 123, 234], index=["APPLE","AT&T","IBM","NIKE"])
s
APPLE    421
AT&T     328
IBM      123
NIKE     234
dtype: int64
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
{"one": 1, "two": 2}
{'one': 1, 'two': 2}
s['APPLE']
np.int64(421)
labels = ["APPLE","AT&T","IBM","NIKE"]
value = pd.Series([234.5, 221.6, 125.7, 100.5], index=labels)
high = pd.Series([240.32, 222.5, 127.3, 105.0], index=labels)
low = pd.Series([233.0, 220.0, 123.0, 104.0], index=labels)
volume = pd.Series([100, 200, 50, 1000], index=labels)
pd.DataFrame({"value":value, 
              "high":high, 
              "low":low, 
              "volume":volume})
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
pd.DataFrame({"value" : [234.5, 221.6, 125.7, 100.5],
              "high" : [240.32, 222.5, 127.3, 105.0],
              "low" : [233.0, 220.0, 123.0, 104.0],
              "volume" : [100, 200, 50, 1000]})
value high low volume
0 234.5 240.32 233.0 100
1 221.6 222.50 220.0 200
2 125.7 127.30 123.0 50
3 100.5 105.00 104.0 1000
stocks1 = pd.DataFrame({"value" : [234.5, 221.6, 125.7, 100.5],
              "high" : [240.32, 222.5, 127.3, 105.0],
              "low" : [233.0, 220.0, 123.0, 104.0],
              "volume" : [100, 200, 50, 1000]}, index=["APPLE","AT&T","IBM","NIKE"])
stocks1
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
stocks
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks.set_index("ticker")
value high low volume
ticker
IBM 255.0 256.0 253.5 100
MICROSOFT 350.5 350.5 345.7 300
AGILLENT 450.5 450.5 445.7 400
INFY 355.0 356.0 353.5 300
stocks['value']
0    255.0
1    350.5
2    450.5
3    355.0
Name: value, dtype: float64
stocks.value # you can access columns like an attribute if column name does not have space or special char
0    255.0
1    350.5
2    450.5
3    355.0
Name: value, dtype: float64
d = pd.DataFrame({"high value": [1, 2, 3, 4, 5],
                  "low_value": [1, 1,1 , 1, 1]})
d
high value low_value
0 1 1
1 2 1
2 3 1
3 4 1
4 5 1
d['high value']
0    1
1    2
2    3
3    4
4    5
Name: high value, dtype: int64
d.low_value
0    1
1    1
2    1
3    1
4    1
Name: low_value, dtype: int64
d['low_value']
0    1
1    1
2    1
3    1
4    1
Name: low_value, dtype: int64
data[2]
Stock Name Sector LTP Change %Chg
0 Tech Mahindra Computers - Software 1512.70 -21.70 -1.41
1 Maruti Suzuki Auto - Cars & Jeeps 15051.00 -186.00 -1.22
2 Bajaj Finance Finance - NBFC 1012.75 -12.00 -1.17
3 TMPV Auto - LCVs & HCVs 377.25 -3.00 -0.79
4 Eicher Motors Auto - LCVs & HCVs 8010.50 -54.50 -0.68
5 M&M Auto - Cars & Jeeps 3509.70 -23.70 -0.67
6 Bajaj Auto Auto - 2 & 3 Wheelers 9697.50 -62.50 -0.64
7 Trent Retail 4229.50 -22.50 -0.53
8 Jio Financial Finance - Others 262.55 -1.40 -0.53
9 Wipro Computers - Software 213.26 -0.83 -0.39
10 Infosys Computers - Software 1365.60 -3.50 -0.26
11 ICICI Bank Banks - Private Sector 1411.60 -3.00 -0.21
12 Hindalco Iron & Steel 907.45 -1.55 -0.17
data[2]['Stock Name']
0     Tech Mahindra
1     Maruti Suzuki
2     Bajaj Finance
3              TMPV
4     Eicher Motors
5               M&M
6        Bajaj Auto
7             Trent
8     Jio Financial
9             Wipro
10          Infosys
11       ICICI Bank
12         Hindalco
Name: Stock Name, dtype: str

how to access rows?

stocks
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks1
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
stocks1.loc['APPLE'] # this is how you access a row from dataframe
value     234.50
high      240.32
low       233.00
volume    100.00
Name: APPLE, dtype: float64
stocks
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks.loc[0]
ticker      IBM
value     255.0
high      256.0
low       253.5
volume      100
Name: 0, dtype: object
stocks1.loc[['APPLE','IBM']]
value high low volume
APPLE 234.5 240.32 233.0 100
IBM 125.7 127.30 123.0 50
subset = stocks1.loc[['APPLE','IBM']]
subset
value high low volume
APPLE 234.5 240.32 233.0 100
IBM 125.7 127.30 123.0 50
stocks1.loc['APPLE','volume']
np.int64(100)
stocks1.loc[['APPLE','IBM'],['value','volume']]
value volume
APPLE 234.5 100
IBM 125.7 50

Basic working with dataframe

url = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
wallet = pd.read_csv(url)
wallet
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

wallet.head()
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
wallet.head(10)
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
5 5 2021-07-28 19:53:28.377359 Utility Electricity 219.941711
6 6 2021-04-16 11:53:28.377359 Books Amazon Kindle 270.322595
7 7 2021-02-15 10:53:28.377359 Food Zomato 457.183104
8 8 2021-08-10 19:53:28.377359 Utility Phone 151.496373
9 9 2020-11-29 14:53:28.377359 Travel Auto 443.618884
top10 = wallet.head(10)
top10
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
5 5 2021-07-28 19:53:28.377359 Utility Electricity 219.941711
6 6 2021-04-16 11:53:28.377359 Books Amazon Kindle 270.322595
7 7 2021-02-15 10:53:28.377359 Food Zomato 457.183104
8 8 2021-08-10 19:53:28.377359 Utility Phone 151.496373
9 9 2020-11-29 14:53:28.377359 Travel Auto 443.618884
wallet.tail()
Unnamed: 0 date category description debit
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501
wallet.tail(10)
Unnamed: 0 date category description debit
90 90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
91 91 2021-01-22 17:53:28.377359 Food Hotel 232.663468
92 92 2021-01-12 19:53:28.377359 Travel Taxi 356.842638
93 93 2021-01-11 09:53:28.377359 Utility Electricity 111.720809
94 94 2021-01-04 13:53:28.377359 Utility Phone 431.185537
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501
wallet.sum()
Unnamed: 0                                                  4950
date           2021-03-07 14:53:28.3773592020-10-08 09:53:28....
category       MusicFoodBooksUtilityBooksUtilityBooksFoodUtil...
description    AmazonSwiggyAmazonPhoneFlipcartElectricityAmaz...
debit                                               31059.590543
dtype: object
del wallet['Unnamed: 0']  # here 'Unnamed: 0' is name of a column we want to delete
wallet
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

wallet['ones'] = [1]*100 # it should match with number of rows 
wallet
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1
... ... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213 1
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469 1
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1

100 rows × 5 columns

wallet.sum(numeric_only=True)
debit    31059.590543
ones       100.000000
dtype: float64
wallet.std(numeric_only=True)
debit    121.178218
ones       0.000000
dtype: float64
wallet.min(numeric_only=True)
debit    100.455501
ones       1.000000
dtype: float64
wallet.max(numeric_only=True)
debit    499.858182
ones       1.000000
dtype: float64
wallet.cumsum(numeric_only=True)
debit ones
0 421.207327 1
1 749.647407 2
2 994.326844 3
3 1217.083162 4
4 1711.211654 5
... ... ...
95 29728.694577 96
96 30196.249139 97
97 30517.038572 98
98 30959.135042 99
99 31059.590543 100

100 rows × 2 columns

[1]*5
[1, 1, 1, 1, 1]
stocks
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks.rename(columns={"ticker":"name", "value":"price"})
name price high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks
ticker value high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks_new = stocks.rename(columns={"ticker":"name", "value":"price"})
stocks_new
name price high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300
stocks.rename(columns={"ticker":"name", "value":"price"}, inplace=True)
stocks
name price high low volume
0 IBM 255.0 256.0 253.5 100
1 MICROSOFT 350.5 350.5 345.7 300
2 AGILLENT 450.5 450.5 445.7 400
3 INFY 355.0 356.0 353.5 300

Combining dataframes

Dataframes with same columns and different rows

labels = ["APPLE","AT&T","IBM","NIKE"]
value = pd.Series([234.5, 221.6, 125.7, 100.5], index=labels)
high = pd.Series([240.32, 222.5, 127.3, 105.0], index=labels)
low = pd.Series([233.0, 220.0, 123.0, 104.0], index=labels)
volume = pd.Series([100, 200, 50, 1000], index=labels)
stocks = pd.DataFrame({"value":value, "high":high, "low":low, "volume":volume})

stocks1 = pd.DataFrame({
    "value" : [125, 500.0, 300.4, 423.9],
    "low" : [125.0, 490.0, 299.5, 421.1],
    "high" : [130.0, 500.0, 305.0, 425.5],
    "volume" : [123, 50, 100, 80]
    },
    index = ["BELL","XEROX","FORD","TESLA"]
)
stocks
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
stocks1
value low high volume
BELL 125.0 125.0 130.0 123
XEROX 500.0 490.0 500.0 50
FORD 300.4 299.5 305.0 100
TESLA 423.9 421.1 425.5 80
pd.concat([stocks, stocks1])
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
BELL 125.0 130.00 125.0 123
XEROX 500.0 500.00 490.0 50
FORD 300.4 305.00 299.5 100
TESLA 423.9 425.50 421.1 80

When data has at least one common column and other different columns

df1 = pd.DataFrame(
{
"a":[1,2, 3, 4, 5],
"b":[34, 56, 76, 87, 9],
"labels": ["x","y","z","m","n"]
}
)

df2 = pd.DataFrame(
{
"c":[1,2, 3, 4],
"d":[34, 56, 76, 87],
"labels":["x","y","z","m"]}
)
df1
a b labels
0 1 34 x
1 2 56 y
2 3 76 z
3 4 87 m
4 5 9 n
df2
c d labels
0 1 34 x
1 2 56 y
2 3 76 z
3 4 87 m
pd.merge(df2, df1)
c d labels a b
0 1 34 x 1 34
1 2 56 y 2 56
2 3 76 z 3 76
3 4 87 m 4 87
pd.merge(df2, df1, on="labels")
c d labels a b
0 1 34 x 1 34
1 2 56 y 2 56
2 3 76 z 3 76
3 4 87 m 4 87

Join

import pandas as pd
dfj1 = pd.DataFrame(
{
"a":[1,2, 3, 4, 5],
"b":[34, 56, 76, 87, 9]
},
index=["x","y","z","m","n"]
)
dfj1
   
a b
x 1 34
y 2 56
z 3 76
m 4 87
n 5 9
dfj2 = pd.DataFrame(
{
"c":[1,2, 3, 4],
"d":[34, 56, 76, 87],
},
index=["x","y","z","m"]
)
dfj2
c d
x 1 34
y 2 56
z 3 76
m 4 87
dfj1.join(dfj2) # left join
a b c d
x 1 34 1.0 34.0
y 2 56 2.0 56.0
z 3 76 3.0 76.0
m 4 87 4.0 87.0
n 5 9 NaN NaN
dfj2.join(dfj1) #left join
c d a b
x 1 34 1 34
y 2 56 2 56
z 3 76 3 76
m 4 87 4 87
df = dfj1.join(dfj2)
df['c'] = df.c.convert_dtypes(int)
df
a b c d
x 1 34 1 34.0
y 2 56 2 56.0
z 3 76 3 76.0
m 4 87 4 87.0
n 5 9 <NA> NaN
df.convert_dtypes(int)
a b c d
x 1 34 1 34
y 2 56 2 56
z 3 76 3 76
m 4 87 4 87
n 5 9 <NA> <NA>

Selection

url = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
wallet = pd.read_csv(url)
wallet['Unnamed: 0']
0      0
1      1
2      2
3      3
4      4
      ..
95    95
96    96
97    97
98    98
99    99
Name: Unnamed: 0, Length: 100, dtype: int64
wallet['ones'] = [1]*100
wallet
Unnamed: 0 date category description debit ones
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1
... ... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213 1
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469 1
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1

100 rows × 6 columns

del wallet['Unnamed: 0']
wallet
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1
... ... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213 1
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469 1
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1

100 rows × 5 columns

s = pd.Series(range(10))
s
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
s > 5 #this will return a series of type boolean
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8     True
9     True
dtype: bool
s[s > 5] # it will filter outer series where there is True a[s > 5]
6    6
7    7
8    8
9    9
dtype: int64
wallet.category 
0       Music
1        Food
2       Books
3     Utility
4       Books
       ...   
95    Utility
96      Books
97    Utility
98     Travel
99       Food
Name: category, Length: 100, dtype: str
wallet.category == "Music"
0      True
1     False
2     False
3     False
4     False
      ...  
95    False
96    False
97    False
98    False
99    False
Name: category, Length: 100, dtype: bool
wallet[wallet.category=="Music"]
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003 1
23 2020-12-11 10:53:28.377359 Music Netflix 354.940241 1
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078 1
49 2021-03-16 09:53:28.377359 Music spotify 232.303402 1
54 2020-11-16 10:53:28.377359 Music spotify 160.817543 1
65 2020-10-23 18:53:28.377359 Music Netflix 188.748743 1
67 2021-07-31 14:53:28.377359 Music Netflix 324.786917 1
76 2020-11-17 09:53:28.377359 Music Netflix 197.534600 1
78 2020-09-09 16:53:28.377359 Music spotify 415.372894 1
79 2021-08-17 09:53:28.377359 Music Netflix 321.763416 1
82 2020-09-22 09:53:28.377359 Music spotify 411.142701 1
84 2020-09-21 10:53:28.377359 Music Netflix 158.793646 1
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908 1
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173 1
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276 1
wallet[wallet.description=="spotify"]
date category description debit ones
49 2021-03-16 09:53:28.377359 Music spotify 232.303402 1
54 2020-11-16 10:53:28.377359 Music spotify 160.817543 1
78 2020-09-09 16:53:28.377359 Music spotify 415.372894 1
82 2020-09-22 09:53:28.377359 Music spotify 411.142701 1
wallet[wallet.description=="Amazon"]
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003 1
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496 1
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078 1
72 2021-06-30 18:53:28.377359 Books Amazon 294.662869 1
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908 1
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173 1
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276 1
wallet[(wallet.description=="Amazon") & (wallet.category=="Music") ]
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003 1
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078 1
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908 1
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173 1
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276 1
wallet[(wallet.description=="Amazon") & (wallet.category=="Books") ]
date category description debit ones
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496 1
72 2021-06-30 18:53:28.377359 Books Amazon 294.662869 1
wallet[(wallet.description=="Flipcart") & (wallet.category=="Books") ]
date category description debit ones
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1
19 2021-05-16 10:53:28.377359 Books Flipcart 109.325909 1
31 2020-09-25 11:53:28.377359 Books Flipcart 246.503527 1
47 2020-10-28 10:53:28.377359 Books Flipcart 310.408610 1
55 2021-01-21 19:53:28.377359 Books Flipcart 423.749708 1
83 2021-03-18 09:53:28.377359 Books Flipcart 451.584407 1
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1
def filter_by_desc_category(dataframe, desc, cat):
    return dataframe[(dataframe.description==desc) & (wallet.category==cat)]
filter_by_desc_category(wallet, "Amazon", "Books")
date category description debit ones
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496 1
72 2021-06-30 18:53:28.377359 Books Amazon 294.662869 1
wallet[wallet.debit < 200]
date category description debit ones
8 2021-08-10 19:53:28.377359 Utility Phone 151.496373 1
13 2021-06-06 10:53:28.377359 Utility Phone 154.044949 1
19 2021-05-16 10:53:28.377359 Books Flipcart 109.325909 1
26 2020-08-26 15:53:28.377359 Books Amazon Kindle 138.806578 1
27 2021-05-01 15:53:28.377359 Utility Electricity 103.680791 1
29 2021-06-20 10:53:28.377359 Utility Electricity 184.557728 1
39 2021-05-17 17:53:28.377359 Food Swiggy 112.333160 1
45 2021-05-27 16:53:28.377359 Utility Electricity 154.742873 1
53 2021-01-27 09:53:28.377359 Utility Electricity 183.186662 1
54 2020-11-16 10:53:28.377359 Music spotify 160.817543 1
61 2021-02-25 13:53:28.377359 Food Hotel 124.658278 1
65 2020-10-23 18:53:28.377359 Music Netflix 188.748743 1
70 2021-08-17 13:53:28.377359 Utility Phone 125.229773 1
73 2021-08-15 17:53:28.377359 Travel Metro 117.588729 1
76 2020-11-17 09:53:28.377359 Music Netflix 197.534600 1
84 2020-09-21 10:53:28.377359 Music Netflix 158.793646 1
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908 1
86 2021-05-07 16:53:28.377359 Food Zomato 198.450672 1
88 2021-04-18 09:53:28.377359 Utility Phone 124.221248 1
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276 1
93 2021-01-11 09:53:28.377359 Utility Electricity 111.720809 1
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1
wallet['date'] = pd.to_datetime(wallet.date)
wallet.date
0    2021-03-07 14:53:28.377359
1    2020-10-08 09:53:28.377359
2    2021-02-23 09:53:28.377359
3    2020-11-01 14:53:28.377359
4    2021-06-05 13:53:28.377359
                ...            
95   2021-07-19 13:53:28.377359
96   2021-01-12 19:53:28.377359
97   2021-03-25 11:53:28.377359
98   2021-05-13 15:53:28.377359
99   2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: datetime64[us]
import datetime
maydate = datetime.datetime(year=2021, month=5, day=1)
maydate
datetime.datetime(2021, 5, 1, 0, 0)
wallet[wallet.date < maydate]
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1
6 2021-04-16 11:53:28.377359 Books Amazon Kindle 270.322595 1
... ... ... ... ... ...
93 2021-01-11 09:53:28.377359 Utility Electricity 111.720809 1
94 2021-01-04 13:53:28.377359 Utility Phone 431.185537 1
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1

61 rows × 5 columns

Groupby

wallet
date category description debit ones
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1
... ... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213 1
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469 1
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1

100 rows × 5 columns

grpby = wallet.groupby("category")
grpby
<pandas.api.typing.DataFrameGroupBy object at 0x7fac638b92b0>
grpby.sum(numeric_only=True)
debit ones
category
Books 4929.750393 14
Food 8281.189173 27
Music 4233.451868 16
Travel 6052.931876 16
Utility 7562.267233 27
grpby.max(numeric_only=True)
debit ones
category
Books 498.100496 1
Food 489.143483 1
Music 421.207327 1
Travel 494.124399 1
Utility 499.858182 1
grpby.mean(numeric_only=True)
debit ones
category
Books 352.125028 1.0
Food 306.710710 1.0
Music 264.590742 1.0
Travel 378.308242 1.0
Utility 280.083972 1.0
grpby.first()
date description debit ones
category
Books 2021-02-23 09:53:28.377359 Amazon 244.679437 1
Food 2020-10-08 09:53:28.377359 Swiggy 328.440080 1
Music 2021-03-07 14:53:28.377359 Amazon 421.207327 1
Travel 2020-11-29 14:53:28.377359 Auto 443.618884 1
Utility 2020-11-01 14:53:28.377359 Phone 222.756318 1
wallet.groupby(['category', 'description']).sum(numeric_only=True)
debit ones
category description
Books Amazon 1037.442802 3
Amazon Kindle 1389.052376 4
Flipcart 2503.255216 7
Food Hotel 2752.174732 9
Swiggy 1936.495366 8
Zomato 3592.519075 10
Music Amazon 1467.247766 6
Netflix 1546.567562 6
spotify 1219.636541 4
Travel Auto 2210.428935 5
Metro 1216.463665 4
Taxi 2626.039276 7
Utility Electricity 2885.064355 12
Phone 4677.202878 15

Other computation

wallet['discount'] = [10]*100
wallet['fianl_amount'] = wallet.debit - wallet.discount
wallet
date category description debit ones discount fianl_amount
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1 10 411.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1 10 318.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1 10 234.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1 10 212.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1 10 484.128492
... ... ... ... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213 1 10 378.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1 10 457.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1 10 310.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469 1 10 432.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1 10 90.455501

100 rows × 7 columns

wallet.discount*5
0     50
1     50
2     50
3     50
4     50
      ..
95    50
96    50
97    50
98    50
99    50
Name: discount, Length: 100, dtype: int64
s
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
s + 5*s -5 
0    -5
1     1
2     7
3    13
4    19
5    25
6    31
7    37
8    43
9    49
dtype: int64
wallet.sum(numeric_only=True)
debit           31059.590543
ones              100.000000
discount         1000.000000
fianl_amount    30059.590543
dtype: float64
wallet.mean(numeric_only=True)
debit           310.595905
ones              1.000000
discount         10.000000
fianl_amount    300.595905
dtype: float64
wallet.std(numeric_only=True)
debit           121.178218
ones              0.000000
discount          0.000000
fianl_amount    121.178218
dtype: float64
wallet.quantile(numeric_only=True)
debit           318.110776
ones              1.000000
discount         10.000000
fianl_amount    308.110776
Name: 0.5, dtype: float64
wallet.plot()

wallet[['debit']].plot()

Writing

wallet
date category description debit ones discount fianl_amount
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327 1 10 411.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080 1 10 318.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437 1 10 234.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318 1 10 212.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492 1 10 484.128492
... ... ... ... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213 1 10 378.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562 1 10 457.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434 1 10 310.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469 1 10 432.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501 1 10 90.455501

100 rows × 7 columns

wallet.to_csv("wallet_pd.csv")
!head wallet_pd.csv
,date,category,description,debit,ones,discount,fianl_amount
0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991,1,10,411.2073272347991
1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426,1,10,318.4400802428426
2,2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511356,1,10,234.67943701511356
3,2020-11-01 14:53:28.377359,Utility,Phone,222.7563175805277,1,10,212.7563175805277
4,2021-06-05 13:53:28.377359,Books,Flipcart,494.1284923793595,1,10,484.1284923793595
5,2021-07-28 19:53:28.377359,Utility,Electricity,219.94171130968408,1,10,209.94171130968408
6,2021-04-16 11:53:28.377359,Books,Amazon Kindle,270.32259514795845,1,10,260.32259514795845
7,2021-02-15 10:53:28.377359,Food,Zomato,457.1831036346536,1,10,447.1831036346536
8,2021-08-10 19:53:28.377359,Utility,Phone,151.49637259947792,1,10,141.49637259947792
!pip install xlsxwriter
Requirement already satisfied: xlsxwriter in /home/vikrant/usr/local/default/lib/python3.13/site-packages (3.2.9)
writer = pd.ExcelWriter("wallet_pd.xlsx", engine="xlsxwriter")
wallet.to_excel(writer, sheet_name="wallet")
writer.close()

Documenatation

pandas documentation

  • Stock market
    • portfolio stats
    • moving trades for today
    • top gainers and loosers
    • plots
  • Automated email
  • backups/sorted