Data science is the study of large amounts of data to find patterns and draw meaningful insights, mostly to aid business decision making.
Historically, DIKW (data, information, knowledge, wisdom) has been a well-known model: we start with data, draw meaningful information from it and gain knowledge, with a vision of achieving wisdom.
At a high level, data science involves 5 simple steps:
- Extract the raw data from all possible sources, in structured and unstructured formats
- Process the data and convert it into a standardized form
- Explore the data to come to a conclusion
- Use suitable data models and fit the raw data onto them to draw out meaningful inferences
- Provide proper data visualization for reporting, to support better insights and inferences
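The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline CSV, its column names (month, ad_spend, sales) and the choice of a straight-line model are made up for the example, and the sketch relies on the pandas, NumPy, scikit-learn and Matplotlib packages being installed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# 1. Extract: a tiny made-up CSV stands in for a real data source.
RAW_CSV = """month,ad_spend,sales
1,10,25
2,20,45
3,,65
4,40,85
5,50,105
"""
raw = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Process: drop the row whose ad_spend value is missing.
clean = raw.dropna().reset_index(drop=True)

# 3. Explore: summary statistics and the correlation between the two columns.
summary = clean["sales"].describe()
correlation = np.corrcoef(clean["ad_spend"], clean["sales"])[0, 1]

# 4. Model: fit a straight line, sales = slope * ad_spend + intercept.
model = LinearRegression()
model.fit(clean[["ad_spend"]], clean["sales"])
slope = float(model.coef_[0])
intercept = float(model.intercept_)

# 5. Visualize: observed points plus the fitted line, written to an
#    in-memory PNG (in a notebook the chart would simply be displayed).
fig, ax = plt.subplots()
ax.scatter(clean["ad_spend"], clean["sales"], label="observed")
ax.plot(clean["ad_spend"], model.predict(clean[["ad_spend"]]), label="fitted line")
ax.set_xlabel("ad_spend")
ax.set_ylabel("sales")
ax.legend()
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)

print(summary)
print(f"correlation={correlation:.3f} slope={slope:.3f} intercept={intercept:.3f}")
```

Because the made-up data lies exactly on a line, the fitted slope and intercept come out very close to 2 and 5. Real data would need far more care at the processing and exploration steps.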
Python, which is widely used for data science, is free and open source, and can be learnt conveniently through the myriad tutorials, videos and references available on the web.
Anaconda is the go-to distribution to install to get started with Python and get a feel for the power of the language.
The following is a very convenient link for installing Anaconda and getting started with simple Python programming, as a first step toward data science.
Once the Anaconda installation goes through, the following important packages are available for programming, and especially data science, with Python:
- Pandas: for converting raw data in any form to a standardized form for further exploration.
- NumPy and SciPy: for data exploration using numerical and scientific functions.
- Scikit-learn: for machine learning, i.e. coming up with a data model that fits the data once it has been converted from raw to standardized form.
- Matplotlib: for data visualization in the form of charts, to support insights on the data for business decisions.
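One way to confirm that these packages are actually present after installation is to ask Python itself. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `PACKAGES` mapping are made up for this example and are not part of Anaconda.

```python
from importlib import metadata, util

# Import name -> distribution name (the two differ for scikit-learn).
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
}


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its version string, or None if missing."""
    report = {}
    for import_name, dist_name in packages.items():
        if util.find_spec(import_name) is None:
            report[dist_name] = None  # package cannot be imported
            continue
        try:
            report[dist_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[dist_name] = "unknown"  # importable, but no metadata found
    return report


for name, version in installed_versions().items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Any package reported as not installed can then be added through Anaconda before moving on.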
In addition, the Anaconda installation includes the web-based Jupyter Notebook, which hosts Python sources, as well as an IDE, Spyder.
To start coding with Python:
Right-click the Jupyter Notebook shortcut icon (it should be on the desktop) and, in its properties, replace %USERPROFILE% with the destination folder in which to store the .ipynb sources.
Then follow the steps below.
- Launch the Anaconda Prompt.
- Launch Jupyter Notebook; the console shows the URL at which Jupyter comes up.
- Create a new notebook to start coding in Python.
- The Jupyter Notebook toolbar has self-explanatory shortcuts for running the Python code in a particular cell.
- To rename a source, click on the source name and give it a meaningful name.
- The sources are saved with a .ipynb extension.
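As a first cell to try, the sketch below prints the folder the notebook is running in and the .ipynb sources saved there, which is a quick way to check that the destination folder was picked up. The helper name `list_notebooks` is made up for this example.

```python
from pathlib import Path


def list_notebooks(folder="."):
    """Return the names of the .ipynb files directly inside folder, sorted."""
    return sorted(path.name for path in Path(folder).glob("*.ipynb"))


print("Working folder:", Path.cwd())
print("Notebooks here:", list_notebooks())
```

Run the cell with the toolbar's run shortcut; a freshly created and renamed notebook should appear in the list once it has been saved.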