How to set up CS224U sans Anaconda

Here is a way to run the CS224U notebooks without Anaconda, which is handy if you already have Python installed and don’t want to be tied to Anaconda.

Setup

First of all, I run a 64-bit *nix system (Linux; Mac is probably fine too) and have Python 2.7 preinstalled (which is why I don’t want to install Anaconda). This means we will mostly be using pip and virtualenv, the command-line tools of native Python.

Python 2.7 (from 2.7.9 on) comes with pip in the box. If you don’t have virtualenv, install it via

>> sudo pip install virtualenv

Here is a good introduction to the idea of virtual environments for Python. Basically, I use one because I primarily use Python 3.5 and don’t want the packages used in this course to interfere with my other projects.

Open a virtual environment

Clone the project

git clone https://github.com/cgpotts/cs224u.git && cd cs224u

We are now inside the cs224u directory. Make a virtual environment (venv) here.

>> virtualenv .

This will create some additional files and directories, such as bin/ and lib/ (where the packages will be installed). If they bother you, add them to .gitignore so git won’t track them.
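
If you would rather script the .gitignore step, here is a minimal Python sketch. The helper name add_ignores and the entry list are my own assumptions, not something from the course repo: bin/ and lib/ are the directories mentioned above, while include/ is a guess at what else virtualenv creates, so check what actually appeared in your checkout.

```python
import os


def add_ignores(gitignore_path, entries):
    """Append entries not yet listed in the file; return the ones added."""
    text = ""
    if os.path.exists(gitignore_path):
        with open(gitignore_path) as f:
            text = f.read()
    existing = set(line.strip() for line in text.splitlines())
    missing = [entry for entry in entries if entry not in existing]
    if missing:
        with open(gitignore_path, "a") as f:
            # Start on a fresh line if the file lacks a trailing newline.
            if text and not text.endswith("\n"):
                f.write("\n")
            for entry in missing:
                f.write(entry + "\n")
    return missing


# Hypothetical entry list: bin/ and lib/ come from the text, include/ is an assumption.
VENV_ENTRIES = ["bin/", "include/", "lib/"]
```

Calling add_ignores(".gitignore", VENV_ENTRIES) from the cs224u directory appends whichever entries are missing; running it a second time adds nothing.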

Now we can enter the virtual environment by

>> source bin/activate

activate is a script; by sourcing it (running it in the current shell) you change the Python path and a few other settings so that you are cut off from the system-wide setup. You are now in a clean state: whatever you install in this environment won’t affect the system-wide site-packages. Also notice that your prompt now displays the name of the venv to remind you that you are inside it. To see for yourself:

(venv) >> pip list
pip (8.1.1)
setuptools (20.6.6)
wheel (0.29.0)

Nothing else!
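
You can also ask Python itself whether it is running inside the venv. The sketch below is my own check, not part of the course material; the function name in_virtualenv is made up for illustration. It relies on two interpreter attributes: older virtualenv releases set sys.real_prefix, while venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases record the original interpreter here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases: base_prefix points at the system Python.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("prefix: %s" % sys.prefix)
print("inside a virtual environment: %s" % in_virtualenv())
```

Run it with the venv activated and the prefix should point into the cs224u directory; run it after deactivate and the second line should read False.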

Install required packages

If you get lazy and just run

(venv) >> pip install -r requirements.txt

you will run into problems while fetching. Apparently, this is because the package names in this file correspond to those in the Anaconda package repository, which differ from the names on PyPI, where pip fetches from.

Somebody pointed this out in this pull request; the following list seems to be the correct one for pip:

python >= 2.7.10
numpy >= 1.10
scipy >= 0.11.0
matplotlib >= 1.5.1
scikit-learn >= 0.17
nltk >= 3.0
python-dateutil >= 2.4
unicodecsv >= 0.14
jupyter>=1.0.0
# tensorflow==0.7.1

Overwrite requirements.txt with the list above and pip should install all the packages without problems. It might take a while, though.
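
Once pip finishes, it is worth confirming that the packages actually import from inside the venv. Here is a small sketch of such a check; the helper name check_imports and the mapping are mine, not from the course repo. Note that the name you install is not always the name you import: scikit-learn is imported as sklearn and python-dateutil as dateutil.

```python
import importlib

# PyPI package name -> module name to import (they differ for some packages).
IMPORT_NAMES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
    "python-dateutil": "dateutil",
    "unicodecsv": "unicodecsv",
}


def check_imports(import_names):
    """Map each package name to its version string, or None if the import fails."""
    report = {}
    for package, module_name in import_names.items():
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # A broken install can raise more than ImportError; treat all as missing.
            report[package] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            report[package] = str(getattr(module, "__version__", "unknown"))
    return report
```

Calling check_imports(IMPORT_NAMES) returns a dictionary you can print; any package that maps to None did not install properly and is worth reinstalling on its own to see the full error.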

Install tensorflow

You may have noticed that tensorflow is commented out in requirements.txt. This is because, for some reason, the package is not on PyPI. If you follow the link given in setup.ipynb and try

>> pip install -i https://pypi.anaconda.org/jjhelmus/simple tensorflow

it will download the package, then fail to find a dependency called protobuf. It turns out that although the tensorflow package is hosted on Anaconda, the protobuf package there is not available at the required version, so the install fails.

I was able to get around the problem by following a different set of instructions here on the official TensorFlow website, which fetches the package from Google’s storage:

>> pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl

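This wheel file may come with its dependencies packaged in; more likely, since this command does not override the package index with -i, pip can fetch protobuf from PyPI as usual, which sidesteps the earlier problem.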
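
The wheel's filename also tells you which interpreter and platform it was built for, which matters if you are not on 64-bit Linux with Python 2.7. The sketch below splits a filename according to the wheel naming convention from PEP 427 (name-version[-build]-python-abi-platform); the function name parse_wheel_filename is my own, and this is only an illustration, not how pip does it internally.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename layout: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("tensorflow-0.7.1-cp27-none-linux_x86_64.whl")
print("%(name)s %(version)s for %(python)s on %(platform)s" % info)
# prints: tensorflow 0.7.1 for cp27 on linux_x86_64
```

Here cp27 means CPython 2.7 and linux_x86_64 means 64-bit Linux, so on a Mac or with a different Python version you would need a different wheel from the TensorFlow instructions.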

Conclusion

With this setup I was able to run the first notebook without problems. I have not yet tried a notebook that uses TensorFlow, but if there are any problems I will update this post accordingly. Until then.