Hi folks,
For those of you dealing with data science / analysis, I'm curious to see what tools you use for exploratory data analysis.
I'm doing most of it in Python / Plotly myself, but wondered whether you prefer other solutions. Sometimes I feel like I'm repeating the same bits of analysis quite a lot.
Cheers!
I'm absolutely too lazy to write my own tools for something, unless I want to automate said task. I think fewer than 5% of the data exploration experiments I do end up in some automated system, so it's rarely worth it.
So I try to stick to graphical tools; a great one is KNIME, which allows you to visually create a data transformation path. Another one is JMP, which is primarily useful for data exploration. If I need to model the relationships between data, I mostly use Mathematica, as it lets the computer do a lot of the mathematical heavy lifting (mostly algebra, though).
Hey, thanks, these are great! KNIME is especially cool - JMP is rather expensive! Do you use it for work?
JMP Pro is incredibly expensive, but thankfully many universities offer it for free to their students. I believe there are cheaper (and free) alternatives available which do more or less the same, but I have no idea where to find those, nor have I searched for those very hard.
No worries, thanks! That was a great help.
I just started using https://github.com/plouc/nivo and it's been wonderful. I'd recommend it if you end up building a web-app
That's a nice library.
(rues my lack of JS skills)
Oh I like it! I'm going to use the calendar to visualize the number of flights someone made in the past year :)
We created a platform for individuals and small teams for collecting data, building pipelines and automating repetitive work - and it has a completely free tier. You're welcome to check it out if you're still searching for a solution :) Any feedback is much appreciated since we're just rolling out the new experience. https://www.keboola.com/paygo/data-scientists
Shameless plug 🔌 from me too.
You should check out Pasteur (https://www.intersectlabs.io/pasteur) - it is a no-code data preparation tool, where you can do your data exploration without any coding. We were constantly running local scripts for various functions... until we built a function library with a visual GUI to use them. You can integrate it with your apps and databases too.
I would love to hear your thoughts on this.
Mathematica (https://www.wolfram.com/mathematica/) is actually quite good for visualizing stuff, and has most of the tools you need for most use cases. The downside is that it takes a bit of time to get used to, and it is not cheap.
I generally do a lot of my data exploration in Excel/Google Sheets, depending on the number of rows. Above roughly 30 MB / 200k rows of data, performance takes a real hit, though.
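One way to stay under a size like that is to sample the file down before opening it in a spreadsheet. Below is a minimal sketch in Python using only the standard library. The function name `sample_csv`, its parameters, and the 200,000-row default are assumptions made for illustration, and it assumes the CSV has a header row.

```python
import csv
import random


def sample_csv(src_path, dst_path, max_rows=200_000, seed=0):
    """Copy the header plus a uniform random sample of at most max_rows data rows.

    Uses reservoir sampling, so the source file is read once and only
    max_rows rows are held in memory. Returns the number of rows written.
    """
    rng = random.Random(seed)
    sample = []  # reservoir of (original_index, row) pairs
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first line is a header
        for i, row in enumerate(reader):
            if len(sample) < max_rows:
                sample.append((i, row))
            else:
                # Keep the new row with probability max_rows / (i + 1).
                j = rng.randint(0, i)
                if j < max_rows:
                    sample[j] = (i, row)

    # Restore the original row order so the sample still reads top to bottom.
    sample.sort(key=lambda pair: pair[0])

    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(row for _, row in sample)
    return len(sample)
```

Calling `sample_csv("big.csv", "small.csv")` (hypothetical file names) would write a smaller file to import. A random sample is fine for eyeballing distributions, but totals and counts computed on it will not match the full data.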
Shameless plug 🔌
We started Phiona (https://phiona.com) for a similar reason: we felt we were doing a lot of repetitive data work and it didn't seem efficient. I was a PM who didn't know Python (you're light years ahead of me) yet wanted a quick way to validate data assumptions. Would love to show you what we're up to, if you're interested.
I use Jupyter notebooks a lot (https://jupyter.org/) for visualizing the output of Python scripts. It is very easy to try out different things and quickly see results.
I use Python. For visualisation during analysis I might use matplotlib or export data to Google Sheets where I can work with data more efficiently and create plots.
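On the repetition mentioned in the original question: the per-column checks that get rerun on every new dataset could be wrapped in one small helper. This is only a rough sketch using the Python standard library. The name `summarize` and the keys it returns are made up for illustration, and it assumes a CSV file with a header row.

```python
import csv
import statistics


def summarize(path):
    """Return a {column_name: stats_dict} summary for a CSV file with a header."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v not in ("", None)]
        info = {"count": len(values), "missing": len(values) - len(present)}

        # Treat the column as numeric only if every non-empty value parses.
        try:
            nums = [float(v) for v in present]
        except ValueError:
            nums = None

        if nums:
            info.update(
                min=min(nums),
                max=max(nums),
                mean=statistics.fmean(nums),
                median=statistics.median(nums),
            )
        else:
            # Text column (or nothing present): report distinct values instead.
            info["unique"] = len(set(present))
        summary[col] = info
    return summary
```

Running `summarize("data.csv")` (hypothetical file name) in a notebook cell gives a quick first look before plotting anything in matplotlib or Plotly.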