Linux: Python Modules for Data Analysis, Installation and Basics with Jupyter Notebook
Alright, so the modules we need right now are:
1. NumPy (Numerical Python)
2. Pandas
3. Matplotlib
Installation and setup (python3)
NumPy
To install NumPy on Ubuntu 20.04, run the following command.
Python 3:
$ sudo apt install python3-numpy
Alternatively, using pip:
pip3 install numpy
Check the version:
python3 -c "import numpy; print(numpy.__version__)"
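Note: Some users run into problems with pip3. The module installs successfully, but importing it still raises ModuleNotFoundError. This usually happens when pip3 installs into a different Python interpreter or environment than the one running your code. Running python3 -m pip install numpy installs the module for the same interpreter that python3 starts.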
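To see which interpreter is actually running your code, a small check like the one below can help. It is only a sketch: the helper name is_installed is made up for this example, and it simply reports whether the current interpreter can find a module.

```python
import importlib.util
import sys

# The interpreter running this code; pip has to install into this one.
print("Interpreter:", sys.executable)


def is_installed(module_name):
    """Return True if the current interpreter can find the named module.

    Hypothetical helper for this example, not part of any library.
    """
    return importlib.util.find_spec(module_name) is not None


status = "found" if is_installed("numpy") else "missing"
print("numpy:", status)
```

If numpy shows up as missing, compare the printed interpreter path with the Python that pip3 belongs to (pip3 --version prints it).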
Pandas
pip3 install pandas
Check the pandas version from Python:
import pandas as pd
print(pd.__version__)
Matplotlib
pip3 install matplotlib
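Once all three modules are installed, you can confirm them in one go from Python or a notebook cell. This is a minimal sketch; installed_version is a helper name invented for this example.

```python
import importlib


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported.

    Hypothetical helper for this example.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None
    return getattr(module, "__version__", "unknown")


for name in ("numpy", "pandas", "matplotlib"):
    version = installed_version(name)
    if version is None:
        print(name, "is not installed for this interpreter")
    else:
        print(name, version)
```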
Basic Data Analysis
Download the file: usa_gov
This file serves as our data file.
Using this data file, we can analyse various aspects of the data and even create graphs with matplotlib.
Let us start by reading the file. I'm assuming you're familiar with Jupyter Notebook, so we go straight to opening the file.
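Reading the file could look roughly like the sketch below. It rests on assumptions the post does not state: that the file holds one JSON object per line, that the records carry a "tz" (time zone) field, and that the file is saved as usa_gov.txt. The sample records are made up so the example runs on its own.

```python
import io
import json


def read_records(lines):
    """Parse one JSON object per non-empty line (assumed file layout)."""
    return [json.loads(line) for line in lines if line.strip()]


# Made-up stand-in for the downloaded file, not real data.
sample = io.StringIO(
    '{"tz": "America/New_York", "c": "US"}\n'
    '{"tz": "", "c": "US"}\n'
    '{"c": "BR"}\n'
    '{"tz": "America/Sao_Paulo", "c": "BR"}\n'
)
records = read_records(sample)
print(len(records), "records read")
print(records[0])

# With the real file (the file name is an assumption):
# with open("usa_gov.txt", encoding="utf-8") as f:
#     records = read_records(f)
```

Each record ends up as a plain Python dict, which is easy to hand over to pandas afterwards.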
Counting time zones with Pandas
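One way to do the counting is with value_counts() on a DataFrame column. This is a sketch, not necessarily what the notebook does: the "tz" field name is an assumption, and the records below are made up to stand in for the parsed file.

```python
import pandas as pd

# Made-up records; some lack "tz" entirely and some leave it empty.
records = [
    {"tz": "America/New_York"},
    {"tz": "America/New_York"},
    {"tz": ""},
    {"c": "BR"},
    {"tz": "America/Sao_Paulo"},
]

frame = pd.DataFrame(records)

# Label missing and empty values so they still appear in the counts.
clean_tz = frame["tz"].fillna("Missing").replace("", "Unknown")

# value_counts() sorts from most to least frequent.
tz_counts = clean_tz.value_counts()
print(tz_counts.head(10))
```

Filling the gaps first keeps records without a time zone visible instead of silently dropping them.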
And here is the output you get:
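For a graph of such counts, a minimal matplotlib sketch could look like this. The numbers are made up and only stand in for the result of value_counts(); the labels and title are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for the real value_counts() result.
tz_counts = pd.Series(
    {"America/New_York": 5, "Unknown": 3, "America/Chicago": 2}
)

fig, ax = plt.subplots()

# Reverse the order so the largest bar ends up at the top of the chart.
tz_counts.head(10)[::-1].plot(kind="barh", ax=ax)
ax.set_xlabel("Number of records")
ax.set_title("Top time zones")
fig.tight_layout()
```

In a Jupyter notebook the figure appears below the cell; in a plain script, call plt.show() at the end.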
Download this notebook for reference.
Here are all the notebooks on various data sources and their analysis.
Download the database from here:
Notebook for this database