Install pySpark. Before installing pySpark, you must have Python and Spark installed. I am using Python 3 in the following examples, but you can easily adapt them to Python 2. Go to the official Python website to install it. I also encourage you to set up a virtualenv. To install Spark, make sure you have Java 8 or higher installed on your computer. To install pip, go through "How to install PIP on Windows?" and follow the instructions provided. Installing Jupyter Notebook using Anaconda: Anaconda is an open-source distribution that bundles Jupyter, Spyder, and other tools used for large-scale data processing, data analytics, and heavy scientific computing.
This post is now outdated since Apple released TensorFlow 2.5 optimized for Macs, which is twice as fast as TensorFlow 2.4 was on my M1. See how to install it at http://blog.wafrat.com/installing-tensorflow-2-5-and-jupyter-lab/.
A few weeks ago (see Getting started with ML: Colab or self-hosted Jupyter?), I said I would eventually install JupyterLab locally on my Mac with M1.
The date is May 22, 2021. After 3 months of trial and error, I finally managed to run TensorFlow and Jupyter Lab on M1. It turns out it's about 6 times faster than Colab.
The quickest way to do it is to follow this guide with one change (see below):
Instead of using the provided yml, you should use this yml:
The only difference is that it pins numpy to 1.19.5 instead of 1.20.*. Version 1.20.* is known to not play well with TensorFlow. If you're interested, you can read my notes on how I ended up with this solution.
First I had to install pip. I followed https://pip.pypa.io/en/stable/installing/. However, running
python get-pip.py returned this error:
ERROR: This script does not work on Python 2.7 The minimum supported Python version is 3.6. Please use https://bootstrap.pypa.io/2.7/get-pip.py instead.
Instead I had to run
This is strange: why does it say that it installed pip 21, then tell me I am using pip 19?
Anyway, to finish I add Python's bin folder to the path with
vim .zshrc. When I reopen the terminal and run pip, it works as expected.
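As a rough sketch of the PATH step, the snippet below prints a line that could be pasted into .zshrc. It assumes pip was installed into the per-user location, which the post does not confirm; the function name `user_bin_export_line` is made up for illustration.

```python
import os
import site


def user_bin_export_line() -> str:
    """Build a shell line that appends the per-user bin folder to PATH."""
    # Assumption: pip was installed with the per-user scheme, so its
    # command-line scripts live in the bin/ subfolder of the user base.
    user_bin = os.path.join(site.getuserbase(), "bin")
    return f'export PATH="$PATH:{user_bin}"'


if __name__ == "__main__":
    # Paste the printed line into ~/.zshrc, then reopen the terminal.
    print(user_bin_export_line())
```

Run it with the same Python interpreter that installed pip, since the user base folder depends on the interpreter version.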
Over at https://pipenv.pypa.io/en/latest/install/#pragmatic-installation-of-pipenv, I follow their install command:
Then try it by running
pipenv, and it works.
JupyterLab installs fine.
But it failed for TensorFlow.
The URL is indeed reachable. I get the same error when I run the recommended
pipenv lock --clear. When I run
pipenv graph, however, I do see TensorFlow installed:
Following https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html#pipenv, I run:
It looks like keras is not automatically available here as it is in Colab. Judging from the screenshot on https://keras.io/, I should be importing it from tensorflow instead:
But every time I run this import, the Kernel crashes.
So it might be that TensorFlow did in fact not install correctly.
I am using a new Mac Mini with an M1 processor. According to this post, it seems that TensorFlow requires specific Python versions and hardware.
So let's check the requirements on the official site. I remembered the announcement that TensorFlow supported M1 processors. Here's the blog post: https://blog.tensorflow.org/2020/11/accelerating-tensorflow-performance-on-mac.html.
It turns out the instructions to install this special build are in a separate repository, which is being merged into the main repository. So until then, we have to use the separate repository and instructions at https://github.com/apple/tensorflow_macos.
The installation guide is really nice. You can run their script, and it will create a venv environment with TensorFlow installed. After installing, I switch to the environment, install Jupyter Lab, run it and try importing tf again:
But I still get the same crash.
It also didn't output a precise error message, and when running, Jupyter doesn't say whether there is a log file.
Googling the error some more, some recent posts recommend upgrading NumPy:
The second link also recommends trying the import in a separate Python script first, as does https://github.com/tensorflow/tensorflow/issues/9829#issuecomment-300783730, but for other reasons.
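One way to try an import outside the notebook without losing the session is to run it in a child interpreter and inspect how that process ended. This is a minimal sketch, not what the linked posts prescribe; `try_import` is a hypothetical helper name.

```python
import signal
import subprocess
import sys


def try_import(module_name: str) -> str:
    """Import a module in a child interpreter and describe what happened."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On macOS and Linux, a negative return code means the child
        # was killed by a signal instead of exiting normally.
        return f"killed by {signal.Signals(-result.returncode).name}"
    # Otherwise Python raised an exception; its last stderr line names it.
    lines = result.stderr.strip().splitlines()
    return f"failed: {lines[-1] if lines else 'no error output'}"


if __name__ == "__main__":
    print(try_import("tensorflow"))
```

A hard crash then shows up as a signal name rather than a silent kernel restart, which gives something concrete to search for.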
First I upgraded numpy just to see if it would work with dumb luck.
pip list reveals I had version 1.18.5. Running
pip install -U numpy upgraded it to version 1.20.1.
I ran JupyterLab and it again failed to load tf.
Finally I tried with the python interpreter:
Now we're getting somewhere! Googling the error, I end up on https://stackoverflow.com/questions/65383338/zsh-illegal-hardware-instruction-python-when-installing-tensorflow-on-macbook. They say you should set up the environment to use the arm64 build of Python. Another person says you should disable Rosetta to run in the Terminal, which is roughly the same thing.
Trying to run the interpreter with arch, I get:
I don't know what this means. Could it be that I should have run the install script without Rosetta enabled in the first place? The original install command was:
So this time I tried:
It ran fine when activating with
. '/Users/[username]/tensorflow_macos_venv/bin/activate', but when testing in the interpreter, I got the same error as above.
I tried using arch to run the activate script, but I get strange arch-related errors about commands not being in the PATH.
Since I really don't know what I am doing with this arch command, I decided to disable running Terminal with Rosetta and try again. Still no luck:
Let's see what the TF for M1 repo says:
Let's see what my platform is:
Weird. I thought I had disabled Rosetta?... I close all my windows and try again.
Alright, let's download the internet and reinstall for the nth time:
And it worked!!
Phew, the key information was using
uname -m to make sure you're running Terminal in arm64 mode. I wish this information was shown prominently in the installation notes. I might contribute a PR later.
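The same check can be done from inside Python, which is handy in a notebook. A small sketch, assuming that `platform.machine()` reports the same value as `uname -m` for the running interpreter; `describe_arch` is a name invented here.

```python
import platform


def describe_arch(machine: str) -> str:
    """Explain what a machine string means on an M1 Mac."""
    if machine == "arm64":
        return "native arm64"
    if machine == "x86_64":
        return "x86_64 (on an M1 Mac this means Rosetta)"
    return f"unexpected architecture: {machine}"


if __name__ == "__main__":
    # Run this with the interpreter that will import TensorFlow.
    print(describe_arch(platform.machine()))
```

If this prints the Rosetta line, the interpreter is running translated, and the M1 build of TensorFlow is likely to crash on import.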
I am now able to import tensorflow.keras, set up my model and image generators. When trying to train the model, I get this error:
Apparently I should install a package called pillow: https://stackoverflow.com/a/52230898. But running
pip install Pillow throws an error about missing jpeg dependency:
On Mac OS, you can install it with Homebrew:
But now that Rosetta is disabled, Homebrew won't work:
The fix is to run the command using arch, according to https://stackoverflow.com/a/64997047:
Back to installing Pillow:
For Pillow to be available in the JupyterLab environment, I had to restart the kernel.
Running model.fit again, I get a new error:
To make sure my image files were not corrupted, I unzipped my dataset file again. It failed again. So some of my images are truncated. According to this issue on the Pillow repository, I can make Pillow ignore truncation errors with a flag:
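Pillow's documented switch for this is, to my knowledge, `ImageFile.LOAD_TRUNCATED_IMAGES = True`. Separately, to find out which files are affected, here is a heuristic sketch that uses only the standard library. It assumes the dataset is made of .jpg files and that a complete JPEG ends with the end-of-image marker; both helper names are hypothetical.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # start-of-image marker
JPEG_END = b"\xff\xd9"    # end-of-image marker


def looks_truncated(data: bytes) -> bool:
    """Heuristic: JPEG data that starts correctly but lacks the end marker."""
    # Trailing zero bytes are ignored, since some writers pad files.
    return data.startswith(JPEG_START) and not data.rstrip(b"\x00").endswith(JPEG_END)


def find_truncated_jpegs(folder: str) -> list:
    """Return the .jpg files under folder that fail the heuristic."""
    return [
        path
        for path in sorted(Path(folder).rglob("*.jpg"))
        if looks_truncated(path.read_bytes())
    ]
```

This only catches files that were cut off before the end marker. Files with extra data after the marker may be reported even though they decode fine, so treat the result as a list of suspects.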
Let's run model.fit again. Another error pops up:
But the TensorFlow for M1 notes at https://github.com/apple/tensorflow_macos#notes state that SciPy is not available for M1:
SciPy and dependent packages and Server/Client TensorBoard packages
Let's try anyway:
I'll skip the rest of the 4000 lines of logs. Basically it doesn't work. After some more digging, I found the open issue on scipy: https://github.com/scipy/scipy/issues/13409.
Numpy is fine though: https://github.com/apple/tensorflow_macos/blob/301e41cf699068ae1b55292472c2eb0b15d7a0f3/scripts/install_venv.sh#L51
So until scipy is properly fixed to run on M1, I am stuck with Colab. It only took me ten hours to find out!
Some people report having better luck with Conda, including installing scipy:
But to try it properly, I'd have to learn conda. I'll try later.
According to this post, it seems pretty straightforward with Miniforge3 as well. Let's try it.
A few gotchas. Right after installing Miniforge, it put me into the 'base' environment. So when I tried to create a new environment as instructed, it would just spit out some weird error. I had to
deactivate, then create the environment as instructed, using their yml file.
After installing the environment, then the custom build of TensorFlow for M1, I tried training a sequential model. The code looks like this:
However, it throws this error when instantiating LSTM:
NotImplementedError: Cannot convert a symbolic Tensor (strided_slice:0) to a numpy array.
The solution, according to this post, is to either use Python 3.8, or downgrade numpy.
I ran python --version and confirmed that I am using 3.8.1, and I used
pip list to figure out my version of numpy. It's 1.20.3. I tried installing an earlier version with
pip install numpy==1.19.5. Among the many errors, I got things like:
I have no idea what these mean. One person on StackOverflow seems to have managed to install it. See the post. They mention building from source and disabling some option. But I'm surprised that Miniforge was able to install it without any strange config. So instead, I pinned version 1.19.5 in the yml file I used to create the environment earlier.
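A quick guard can confirm that the environment really ended up with a numpy older than 1.20 before importing TensorFlow. This is a sketch under the assumption that only the major and minor numbers matter; `parse_major_minor` and `numpy_is_compatible` are illustrative names, not part of numpy.

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Turn a version string such as '1.19.5' into (1, 19)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:2])


def numpy_is_compatible(version: str) -> bool:
    """True for numpy releases older than 1.20, the range that worked here."""
    return parse_major_minor(version) < (1, 20)


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        print(numpy.__version__, numpy_is_compatible(numpy.__version__))
```

The cutoff of 1.20 comes from this post's experience with the M1 build, not from an official compatibility table.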
... And it worked! Finally!
Let's run it on Colab, and compare performance.
8s vs 53s. So it's about 6 times faster. The speedup may differ with the type of model. I'll perform more tests later.
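The ratio behind that figure, worked out explicitly. It assumes 8 s is the M1 time and 53 s is the Colab time, which is what the "faster than Colab" claim implies.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the quicker run is than the slower one."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # 53 s on Colab against 8 s on the M1 (assumed assignment).
    print(f"{speedup(53, 8):.1f}x")  # prints 6.6x
```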
This is my preferred way to install Python and Jupyter notebook for doing scientific data analysis. There are many alternative ways of doing this that you can find on Google. I’m doing this on a MacBook Pro (Retina, 13-inch, Early 2015) with macOS High Sierra 10.13.3.
In the past, I used
virtualenv to manage virtual environments with Python 2. Python3 has built-in handling of virtual environments, so I use that here instead. If you need to use Python 2, then you’ll want to install
virtualenv (see first link at the bottom).
All of these steps are done in the Mac OS Terminal, so start that first.
First install Xcode:
Open or create the file ~/.bash_profile and write:
Install Python 3
As of 2018-4-9, this will install Python 3 (I think previously it installed Python 2):
Set up virtual environment
By default, Python 3 comes with the ability to create virtual environments.
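That built-in ability lives in the standard `venv` module, which can also be driven from Python itself. The sketch below is one possible way to do it, not necessarily the commands used in this post; the helper name and the folder names in the usage note are placeholders.

```python
import venv
from pathlib import Path


def create_env(parent: Path, name: str, with_pip: bool = True) -> Path:
    """Create a virtual environment called name inside the parent folder."""
    parent.mkdir(parents=True, exist_ok=True)
    env_dir = parent / name
    # with_pip=True bootstraps pip into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

For example, `create_env(Path.home() / "envs", "jupyter")` would create `~/envs/jupyter`, which can then be activated in the terminal by sourcing its `bin/activate` script.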
Make a folder to host your virtual envs:
Create a virtual env for Jupyter:
Run virtual environment and Jupyter
Start the virtual env:
Install packages for scientific computing:
A browser window will open with the Jupyter file browser in your current working directory.
Exit Jupyter and virtual environment
Jupyter notebook will run in your terminal window until you close it (with Ctrl-C).
You can close the virtual environment with:
UPDATE 2018-04-19: A very useful (and IMO essential) addition to Jupyter notebook is the Table of Contents extension. I show how I install this in a different blog post.
- The steps above are mostly based on Maria Mele’s “Install Python 2.7, virtualenv and virtualenvwrapper on OS X Mavericks/Yosemite”
- Documentation on Python 3 virtual environments
- Explanation of how Homebrew installs Python, i.e. why Python 3 isn’t linked to the command `python`, which motivated some of my deviations from the above blog post