Learning a programming language is a bit like learning a foreign language. Before you can handle complicated constructions, you need to master the basics of grammar. In programming, you need to understand the syntax of commands in order to write code that will perform tasks like data analysis and data visualization. When you study data analytics and visualization, you will certainly come across both Python and R, two of the most popular programming languages for data analysis.
If you’re new to data science and data analytics, learning a programming language can seem daunting. Fortunately, both Python and R have support for those who are learning to program for the first time. And learning through online courses can be a great way to quickly get up to speed in a new programming language.
Python is one of the fastest-growing programming languages in the world. A solid grounding in the Python programming language is a skill that will make you an in-demand hire. A recent search on Glassdoor found over 2,000 Python programmer jobs with salaries ranging as high as $231,400.
The R programming language is less widely used than Python, but it is also gaining in popularity. Your decision about the language you prefer using for programming may depend on the focus of your data analysis work. Each programming language has different strengths and weaknesses.
Here are some of the factors you may want to consider as you decide to learn about programming in R or Python (or both).
The history of Python and R
Both Python and R are free and open-source. An open-source programming language includes code that others can use, edit and build on with few or no restrictions. Nonprofit foundations have been organized to oversee the development of both R and Python, and to support the communities of programmers who work with each programming language.
The development of the Python programming language
Guido van Rossum began developing Python in late 1989 at the Centrum Wiskunde & Informatica in the Netherlands, and the first public release followed in 1991. It has been through two major updates (and many minor ones) since it was first introduced. Python 2.0 was released in 2000; Python 3.0 was released in 2008. Both versions are still in use, though Python 2.7 will be the last update to Python 2 and support for the second version will sunset in 2020.
Python is managed by the Python Software Foundation. Part of the origin story of the Python programming language is rooted in humor: the name Python comes not from the snake, but from the British comedy series Monty Python’s Flying Circus. Tim Peters, a longtime Python contributor, summed up the guiding ideas behind programming in Python in a set of aphorisms called “The Zen of Python.” Perhaps the core values of Python programming can be summed up in these three principles:
- “Simple is better than complex.”
- “Sparse is better than dense.”
- “Readability counts.”
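The full set of aphorisms ships with Python itself. As a minimal sketch, the snippet below reads the text from the standard library's `this` module and checks that the three principles quoted above appear in it. It assumes the CPython detail that `this.s` holds the text in ROT13-encoded form; importing the module normally prints the text, so the sketch suppresses that output.

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen of Python as a side effect,
# so redirect stdout while the import runs.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# Assumption: CPython stores the text ROT13-encoded in `this.s`.
zen_text = codecs.decode(this.s, "rot13")
zen_lines = zen_text.splitlines()

principles = (
    "Simple is better than complex.",
    "Sparse is better than dense.",
    "Readability counts.",
)

# Report whether each quoted principle is a line of the Zen.
for principle in principles:
    print(principle, principle in zen_lines)
```

In an interactive session, simply typing `import this` displays the whole list.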
This programming language was built to be extensible. This means that the language itself is concise, but also easy to add to. So when it comes to creating new features on top of existing application programming interfaces (APIs), the code-based tools that allow two different programs to interact with relative ease, Python is a good choice. This clean and economical design was a radical departure from some earlier programming languages and it is probably one reason for the increasing use of Python. Its nimbleness is an asset in the complex and connected online ecosystem.
Python’s original programmer has continued to shepherd the development of the programming language he originated. The Python community, tongue firmly in cheek, named him Benevolent Dictator for Life. He stepped down from that post in 2018 but continues to serve as a member of the five-person steering council that oversees Python.
The genesis of the R programming language
The story of the beginnings of the R programming language and the community of R programmers is slightly less colorful than Python’s. R was developed in the early 1990s, and the first stable version, 1.0.0, was released in 2000. The name comes from the first names of the co-developers of the programming language: Ross Ihaka and Robert Gentleman. The name R is also a reference to the programming language S, which was developed in 1976 by John Chambers at Bell Laboratories, because R is related to S. The R programming language can run most of the code that is written in S.
R programming is overseen by The R Project for Statistical Computing and supported by The R Foundation. R is part of the GNU Project. GNU is an umbrella for free software developed to support open-source programming.
Choosing between Python and R
To choose between learning to program in Python or R, consider what your goals are. R is considered to be more applicable to statistical data analysis. Python has the benefit of working in a wide array of applications, so you can apply your programming skills to data visualization and data analysis and beyond. Both Python and R have extensive user-created libraries that can give you the tools you need to easily complete almost any data analysis project.
For deep learning, Python users can turn to the popular Keras library. Keras allows you to prototype and develop deep learning programs that run on top of Theano (another Python library), TensorFlow (an open-source framework for machine learning) or CNTK (the Microsoft Cognitive Toolkit, a set of open-source deep learning tools created by Microsoft).
For data visualization, the power of R can be a big asset. Popular R libraries for data visualization include ggplot2 and Plotly, among many others. The R programming language gives you the ability to create a data visualization in a range of formats, including scatter plots, bar charts and heat maps. You can also add color to make your visualization pop.
Python is a more accessible and adaptable programming language while R may be better for complex data analysis and data visualization projects. In the end, you may decide to add both R and Python to your programming toolkit.
The Python problem: the end of Python 2
There is an issue looming in the Python programming community: the demise of Python 2. Originally scheduled to be phased out in 2015, Python extended support for version 2 until 2020. Phasing out the older version of the Python programming language has caused concern because Python 3 isn’t fully backwards compatible. This means that projects whose programming was done in Python 2 can’t easily be converted to Python 3. After support ends for Python 2, data analysts could be left without bug fixes. Since there is a lot of Python 2 code deployed around the internet, this could cause problems. However, the Python community (they call themselves Pythonistas) cares passionately about the future of this programming language and Python 2 still has fans. It wouldn’t be surprising if an enterprising group of coders steps into the breach once the Python Software Foundation stops supporting Python 2.
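To make the compatibility gap concrete, here is a small sketch, written for Python 3, of three well-known behaviors that differ from Python 2. The variable names are illustrative only, and the comments describe what the same expressions did in Python 2.

```python
# 1. print is a function in Python 3. The Python 2 statement form
#    (print "hello") is a SyntaxError in Python 3.
print("hello")

# 2. Dividing two integers with / gives a float in Python 3.
#    In Python 2 the same expression truncated to 3.
true_quotient = 7 / 2

# Floor division with // behaves the same in both versions.
floor_quotient = 7 // 2

# 3. Text and bytes are separate types in Python 3. A str holds
#    Unicode text, and encoding it produces a distinct bytes object.
text = "café"
encoded = text.encode("utf-8")

print(true_quotient, floor_quotient)
print(type(text).__name__, type(encoded).__name__)
```

Differences like these are why Python 2 code often needs manual review when it is ported, rather than running unchanged.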
Python versus R for data science
Before considering which programming language is better for data science, it’s important to make the distinction between data science and data analytics. Those two terms are often used interchangeably but they define different scopes of work.
Data scientists are the professionals who glean meaningful information from disorganized sets of big data. Data science involves producing reports from this data that organizations can use to guide their decision-making. Data scientists search for data sources and design queries to uncover patterns and trends. Deep learning and artificial intelligence are two tools that data scientists may deploy to help them mine big data for insights.
Data analytics, on the other hand, works with datasets that already exist. Data analysis involves extrapolating information from the data. This can include a statistical analysis, for which the R programming language is a great tool. When you perform data analytics, you home in on a specific question. You may use data visualization to convey the results of your data analysis to other people in your organization.
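R is the tool named above for statistical analysis, but the same kind of focused question can also be sketched in Python. The example below uses only the standard library's `statistics` module to summarize a small dataset. The sales figures are hypothetical and exist purely for illustration.

```python
import statistics

# Hypothetical dataset: units sold on seven consecutive days.
daily_sales = [120, 135, 128, 150, 142, 138, 160]

# Central tendency: arithmetic mean and median.
mean_sales = statistics.mean(daily_sales)
median_sales = statistics.median(daily_sales)

# Spread: sample standard deviation (divides by n - 1).
stdev_sales = statistics.stdev(daily_sales)

print("mean:", mean_sales)
print("median:", median_sales)
print("sample standard deviation:", round(stdev_sales, 2))
```

For larger projects, analysts usually move to dedicated libraries, but the standard library is enough to answer a simple, specific question like this one.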
If you want to use artificial intelligence for data science, you can use either the R or Python programming language. Both have numerous packages available to assist you in deploying deep learning systems for exploring big data.
Although Python’s core code is compact, the packages give it a great deal of power. You will usually need package management software such as pip or conda to install the packages you need in Python, however. Most packages are published on the Python Package Index (PyPI), though some are distributed through other channels.
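Before reaching for a package manager, it can help to check what is already installed. The sketch below uses the standard library's `importlib.metadata` (available in Python 3.8 and later). The helper name `installed_version` and the package names in the loop are illustrative assumptions, not part of any particular workflow.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The package names here are examples only; the last one is deliberately fake.
for name in ("numpy", "pandas", "a-package-that-does-not-exist"):
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

Any package reported as not installed can then be added with whichever package manager you use.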
CRAN (short for the Comprehensive R Archive Network) serves as a library where you can access all the packages for the R programming language in one place. Because R has statistical functionality built in, many in the programming community feel that the R programming language is the more powerful tool for data science.
Python is a great choice for data science projects if you are learning to program. Because the Python community is large and active, you’ll find it easy to get help answering questions and solving problems. Python is also somewhat easier to learn than R. You might consider the Python programming language as your gateway to data science.
However, if you already have some experience with programming languages—and particularly if you have some background in statistics—you might find that R is your preferred tool for deep learning projects. While R has a smaller set of users, you can still find support and assistance when you are programming a project using R.
How can you learn Python and R?
The first step to start learning Python or R is to download the software, which is freely available on the internet. You can find R downloads at CRAN. You can download Python from the Python Software Foundation.
The next step is to figure out your best source for learning programming. On the web, you can find written tutorials and YouTube videos. You could go to a coding camp to learn a new programming language. Or you could attend classes online or in person.
The best way for you to learn a new programming language depends on your goals. If you want to dip your toe in and find out if the Python or R programming language is for you, you might start by watching a video or finding a tutorial on the web. However, if you want to accelerate your career in data visualization, deep learning or artificial intelligence, your best choice is probably an online degree from an accredited university, such as the MS in Data Analytics and Visualization Online from the Katz School of Science and Health at Yeshiva University (YU).
Getting a master’s degree in data analytics and visualization gives you a credential that can open doors to higher earning potential and more exciting career paths. When you enroll in a master’s program, you aren’t learning to program in a vacuum. You get to solve real-world problems with the tools you will need to use in your future career. You’ll get instruction in the application of programming languages to deep learning problems and artificial intelligence projects. You’ll also understand how to use the tools you need to complete a stunning visualization that helps your team grasp your data analysis.
The Katz MS in Data Analytics and Visualization online
An online master’s degree program gives you the best of both worlds. You get the community and support of a university program of study combined with the convenience of the online learning environment. With an online MS in Data Analytics and Visualization from the Katz School of Science and Health, you get the benefit of the faculty and resources of YU, one of the top 100 academic institutions in the US, according to U.S. News & World Report.
In the Katz online master’s program, you get the benefit of a personal student success coach to support you throughout the program, in addition to a virtual student community. Classes are limited in size, so you have time to ask questions, connect with your professors and get the most out of your coursework.
Best of all, the online MS in Data Analytics and Visualization is a part-time program. You can complete your degree while continuing to work at your current job. You can pace your master’s program to fit your life. You can finish the program and get your MS degree in as little as 18 months, or take up to two and a half years.
If you’re excited to get started learning Python or R (or both), your future is waiting for you. An online MS from Katz can open the door to careers on the frontlines of the tech revolution, using the cutting-edge techniques that employers demand most, including artificial intelligence and deep learning.