Model Development using PySpark in Jupyter Notebook
Navigate to the ‘Notebooks’ tab of your project and open the ‘Create and Save Model’ notebook in Jupyter, selecting the latest (highest-version) Spark environment. To choose the environment, click the three vertical dots to the right of the notebook's name.
Follow the notebook instructions and execute all cells as directed.
Please note the following:
- You can read the data from a database/data warehouse, a virtual data source, or a flat file, depending on how you completed the earlier Data Ingestion and Data Organization steps. Run the corresponding cells in the notebook.
- When saving the model, prefix the model name with your name or user id. This ensures you do not save a model under the same name as another participant.
- Wherever user credentials are needed, use your own user id and password.
- Wherever URLs are needed, use the Cloud Pak for Data URL (IP/host name and port) provided to you.
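As a minimal sketch of the naming and URL conventions above, the snippet below shows how a user-prefixed model name and a Cloud Pak for Data base URL might be assembled before running the save cells. All values here (user id, host, port, model name) are hypothetical placeholders, not values from your environment; substitute your own.

```python
# Hypothetical placeholders -- replace with the values provided to you.
USER_ID = "jdoe"              # your Cloud Pak for Data user id
CPD_HOST = "cpd.example.com"  # host/IP provided to you
CPD_PORT = 443                # port provided to you


def prefixed_model_name(user_id: str, base_name: str) -> str:
    """Prefix the model name with the user id so it does not
    collide with models saved by other participants."""
    return f"{user_id}_{base_name}"


def cpd_base_url(host: str, port: int) -> str:
    """Build the Cloud Pak for Data base URL from host and port."""
    return f"https://{host}:{port}"


model_name = prefixed_model_name(USER_ID, "churn_model")
url = cpd_base_url(CPD_HOST, CPD_PORT)
print(model_name)  # jdoe_churn_model
print(url)         # https://cpd.example.com:443
```

The actual model training and save calls remain in the notebook cells; this only illustrates the conventions the notes ask you to follow.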