
PandasAI — Exploratory Data Analysis with Pandas and AI prompts

I came across PandasAI while looking for ways to integrate AI with Pandas dataframes. My primary objective is fast exploratory data analysis on new datasets, which then guides how I approach further analysis, and PandasAI appeared to meet that need. In short, PandasAI is a Python library that integrates generative AI models (e.g. OpenAI's) into Pandas, letting users perform basic Pandas operations through simple text prompts. Note that PandasAI is designed to complement Pandas, not replace it.

What I like about PandasAI

  1. Alternative LLM Integration: Besides OpenAI, PandasAI supports Hugging Face’s StarCoder, which is free to use and works quite well with PandasAI.
  2. Returns DataFrame Objects: PandasAI returns dataframe objects that can be further processed by Pandas or by PandasAI itself.
  3. Simplified Plotting: PandasAI simplifies common plotting tasks for quick data visualization.

In the following sections, we explore a range of common tasks that can be performed by prompting the dataframe instead of writing the usual pandas operations. We use the sample “penguins” dataset loaded from seaborn as our case study, together with the free Hugging Face StarCoder LLM. That said, I find that OpenAI is better at delivering the right output for longer and more complex prompts.
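For comparison, a first look at a new dataset in plain pandas usually involves a handful of calls like the ones below. This is a minimal sketch, not code generated by PandasAI. To keep it runnable offline, it builds a small hypothetical stand-in for the seaborn penguins data (same column names, illustrative values) instead of downloading the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sns.load_dataset("penguins"):
# same column names, illustrative values only (not the real dataset).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "island": ["Torgersen", "Torgersen", "Biscoe", "Dream", "Biscoe"],
    "bill_length_mm": [39.1, np.nan, 46.5, 49.0, 47.2],
    "body_mass_g": [3750.0, np.nan, 5000.0, 3700.0, 5200.0],
    "sex": ["Male", np.nan, "Female", "Male", np.nan],
})

# Summary statistics for the numeric columns only (the default for describe()).
summary = penguins.describe()

# Number of rows per species.
species_counts = penguins["species"].value_counts()

# Mean body mass per species; groupby().mean() skips NaN values.
mass_by_species = penguins.groupby("species")["body_mass_g"].mean()

print(summary)
print(species_counts)
print(mass_by_species)
```

Each of these one-liners is the kind of operation a short prompt such as "show the average body mass per species" is meant to replace.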

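```python
# Install PandasAI (the "!" prefix runs a shell command from a Jupyter/Colab cell)
!pip install pandasai

# Set up for prompting
import pandas as pd
import seaborn as sns
from pandasai import PandasAI
from pandasai.llm.starcoder import Starcoder
from pandasai.llm.openai import OpenAI

# Instantiate an LLM
# OpenAI: uncomment and supply your own API key
# llm = OpenAI(api_token="YOUR_OPENAI_API_KEY")

# StarCoder via Hugging Face: supply your own Hugging Face API token
llm = Starcoder(api_token="YOUR_HUGGING_FACE_API_KEY")
pandas_ai = PandasAI(llm)

# Load the sample dataset
penguins = sns.load_dataset("penguins")
```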
Basic Operations prompt
NA operations
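The prompts in this section target missing values. As a point of reference, the equivalent pandas calls are sketched below. This is not the code PandasAI produced; it runs on a hypothetical stand-in frame with the penguins column names and illustrative values, so it works without the seaborn download.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the penguins dataset (illustrative values only).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "island": ["Torgersen", "Torgersen", "Biscoe", "Dream", "Biscoe"],
    "bill_length_mm": [39.1, np.nan, 46.5, 49.0, 47.2],
    "body_mass_g": [3750.0, np.nan, 5000.0, 3700.0, 5200.0],
    "sex": ["Male", np.nan, "Female", "Male", np.nan],
})

# Number of missing values in each column.
na_counts = penguins.isna().sum()

# Rows that contain at least one missing value.
rows_with_na = penguins[penguins.isna().any(axis=1)]

# Drop every row with a missing value; this returns a new DataFrame
# and leaves `penguins` unchanged.
complete = penguins.dropna()

print(na_counts)
print(rows_with_na)
```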
Fillna and row operations
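For reference, here is a minimal pandas sketch of filling missing values and then filtering and sorting rows. The fill strategy (column mean for numeric columns, a hypothetical "Unknown" label for `sex`) is an assumption chosen for illustration, and the data is the same kind of hypothetical stand-in for the penguins dataset, not the real values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the penguins dataset (illustrative values only).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "island": ["Torgersen", "Torgersen", "Biscoe", "Dream", "Biscoe"],
    "bill_length_mm": [39.1, np.nan, 46.5, 49.0, 47.2],
    "body_mass_g": [3750.0, np.nan, 5000.0, 3700.0, 5200.0],
    "sex": ["Male", np.nan, "Female", "Male", np.nan],
})

# Work on a copy so the original frame is left untouched.
filled = penguins.copy()

# Fill missing numeric values with each column's mean
# (fillna with a Series fills column by column, matched on column name).
num_cols = ["bill_length_mm", "body_mass_g"]
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].mean())

# Fill missing categorical values with a placeholder label.
filled["sex"] = filled["sex"].fillna("Unknown")

# Row operations: keep only Torgersen penguins, and sort by body mass.
torgersen = filled[filled["island"] == "Torgersen"]
heaviest_first = filled.sort_values("body_mass_g", ascending=False)

print(heaviest_first)
```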

In some cases I did not manage to get the expected output (the OpenAI LLM might do a better job), such as the following.

```python
# Instead of returning a copy, it set the values on the penguins dataframe itself.
penguins_update = pandas_ai(penguins, prompt='return a copy. penguin[ bill_length_mm] = 0 if island = Torgersen', show_code=True)

# Does not return any output.
# Note: converting mm to cm means dividing by 10, so "/100" in this prompt would give decimetres.
penguins_newcol = pandas_ai(penguins, prompt='Add new column "bill_length_cm" by taking "bill_length_mm" /100.')
```
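For comparison, both failed prompts are short, deterministic pandas operations. The sketch below is one way to write them, not PandasAI output, and it runs on a hypothetical stand-in frame with illustrative values. It divides by 10, since 1 cm = 10 mm.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the penguins dataset (illustrative values only).
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "island": ["Torgersen", "Torgersen", "Biscoe", "Dream", "Biscoe"],
    "bill_length_mm": [39.1, np.nan, 46.5, 49.0, 47.2],
    "body_mass_g": [3750.0, np.nan, 5000.0, 3700.0, 5200.0],
    "sex": ["Male", np.nan, "Female", "Male", np.nan],
})

# Prompt 1: return a copy where bill_length_mm is 0 for Torgersen penguins.
# Using .copy() plus .loc guarantees the original frame is not modified.
penguins_update = penguins.copy()
penguins_update.loc[penguins_update["island"] == "Torgersen", "bill_length_mm"] = 0

# Prompt 2: add a bill_length_cm column (mm / 10).
# assign() returns a new DataFrame and leaves `penguins` unchanged.
penguins_newcol = penguins.assign(bill_length_cm=penguins["bill_length_mm"] / 10)

print(penguins_update.head())
print(penguins_newcol.head())
```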

In conclusion, PandasAI works well for simple, clean exploratory analysis, and its integration with StarCoder removes cost concerns. However, it is less reliable with longer and more complex prompts, especially with StarCoder. While PandasAI offers useful functionality, you will still rely on Pandas for more extensive data manipulation and analysis tasks.

This post has also been published on Medium