
Switch from Anaconda to Miniconda for your data project environment

Mini can get the job done, even better than the big snake

Opinion

Photo by Tanner Boriack on Unsplash

When I first started my career as a data scientist, one of the tools people kept recommending to me was Anaconda. The official docs even say so themselves.

Why the Anaconda docs say you should choose it - Screenshot of docs by Author

For a while, I also thought Anaconda was cool. I mean, look at the first page of Anaconda Navigator, the one you see as soon as you open the tool.

Anaconda Navigator - Image from Official Docs
Photo by National Cancer Institute on Unsplash

When I first saw that landing page, during my first stint as a data scientist, I felt the way a kid imagines a real scientist at work, sitting in front of complex and awesome tools.

All I needed then to complete the set was a pair of glasses.

At the time, I also understood why Anaconda was so widely recommended for data scientists. All the tools, including extra packages, code editors, and viz tools, were already provided.

One or Two Years Later…

After working on dozens of analytics use cases, each with its unique challenges and requirements, I have grown wearier and wearier of Anaconda.

In summary, it became a bloated, useless app on my computer.

  1. The preinstalled packages eat storage and memory. Even worse, most of them were useless to me. All those packages really weighed down my computer’s performance. They took up a few GB of storage, which can make a pretty big difference, especially on a 128 GB MacBook Pro. Even after running conda clean --all, Anaconda still took up quite a lot of space. I did not use many of the packages that came with it, like pomegranate, proj4, pyopengl, and so many more. I’m not sure what they are all for.
  2. Managing the Python packages became a very slow process. Even updating a single package with conda update [some package] felt like it dragged on for way too long. I tried everything from a single environment (not really recommended, by the way) to separate virtual environments with specific packages for different use cases (one for data exploration, one for numerical analysis and modeling, and another for image processing). Both setups were still slow.
  3. The Navigator became obsolete. I can just open my preferred code editor either from the Start menu/Applications folder or via the command prompt/Terminal. In fact, opening the app through Navigator is way slower. Updating and managing packages is also more convenient via the command prompt. Making new virtual environments? Open the command prompt/Terminal and type in the conda command. Why bother with the Environments page in Anaconda Navigator? Even more so, why bother with Navigator at all?
  4. It makes my job a lot harder. When deployment time arrives for some of my clients/users, I have to create a virtual environment with the same required packages and config to ensure the model(s) can run smoothly on their production server(s). For the exact same reasons I have just listed, there was no way I was going with Anaconda for that (and I likely never will).
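
To make that last point concrete, here is a minimal sketch of how an environment can be captured on one machine and rebuilt on another using conda itself. The environment name modeling and the file name modeling-env.yml are hypothetical, and the two helper functions are illustrative wrappers I made up for this example, not part of conda.

```shell
# Sketch only: "modeling" and "modeling-env.yml" are made-up names.

# Write the package spec of an existing environment to a YAML file.
export_env() {
  env_name="$1"
  out_file="$2"
  conda env export --name "$env_name" > "$out_file"
}

# Rebuild an environment from that file, e.g. on the production server.
recreate_env() {
  conda env create --file "$1"
}

# Example usage (commented out so sourcing this file has no side effects):
#   export_env modeling modeling-env.yml
#   recreate_env modeling-env.yml
```

The smaller the environment, the shorter the exported file and the less there is to resolve and download on the target machine.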

So what’s the alternative…

Miniconda!

Basically, it is just the conda package management system + Python + its base packages.

That’s it.

  • No extra (useless!) tools and installations.
  • Need some packages for a specific use case/project? You can still create virtual environments and install just the packages you need there.
  • Need some code editors installed? Just download them directly from the official website. They are maintained directly by the developers and are just as good as (or even better than) the ones installed via Anaconda Navigator.
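
As a rough sketch of that per-project workflow, assuming Miniconda is installed and conda is on your PATH: the helper functions, the environment name imaging, and the package choices below are all assumptions for illustration, not a recommended stack.

```shell
# Sketch only: function names, environment names, and packages are examples.

# Create a fresh environment that contains only the packages you list.
new_project_env() {
  env_name="$1"
  shift
  conda create --yes --name "$env_name" "$@"
}

# Add more packages to that environment later, without activating it first.
add_packages() {
  env_name="$1"
  shift
  conda install --yes --name "$env_name" "$@"
}

# Example usage (commented out so sourcing this file has no side effects):
#   new_project_env imaging python pillow
#   add_packages imaging scikit-image
#   conda activate imaging
```

Each environment starts empty apart from what you ask for, so nothing gets installed that the project does not use.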

With Miniconda, I actually feel like I am building lean data science and analytics projects.

Conclusions

I am not saying Anaconda is an outdated tool that no data scientist/analyst should ever use. It still has its potential. There is even an Enterprise version of it. I think first-time data scientists/analysts could benefit from starting their career/training with Anaconda. For example, it can help them learn what an optimal virtual environment for their data projects looks like. They might even have some use for those 1,500 extra packages. Who knows.

I am saying that if you have a firm grasp of what you want to build or use with Python (e.g. you want to be a time series expert, deep learning engineer, or a data-driven marketing specialist), Miniconda is an efficient and recommended tool to use.

