Close Menu
AI News TodayAI News Today

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Slots & Daggers review: Sometimes, a short game hits the spot

    How one founder’s bet on ‘the old school web’ is paying off

    This weekend’s two biggest movies were both directed by YouTubers

    Facebook X (Twitter) Instagram
    • About Us
    • Contact Us
    Facebook X (Twitter) Instagram Pinterest Vimeo
    AI News TodayAI News Today
    • Home
    • Shop
    • AI News
    • AI Reviews
    • AI Tools
    • AI Tutorials
    • Chatbots
    • Free AI Tools
    AI News TodayAI News Today
    Home»AI Tools»Pandas Isn’t Going Anywhere: Why It’s Still My Go-To for Data Wrangling
    AI Tools

    Pandas Isn’t Going Anywhere: Why It’s Still My Go-To for Data Wrangling

    By No Comments6 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Pandas Isn’t Going Anywhere: Why It’s Still My Go-To for Data Wrangling
    Share
    Facebook Twitter LinkedIn Pinterest Email

    learning data science in 2020, Pandas was one of the most popular tools. Although new tools focus on improving Pandas’ weaknesses in handling very large datasets, I still use Pandas for many data cleaning, processing, and analysis tasks. Yes, Pandas gives me a hard time when working with billions of rows, but it is definitely more than enough for working with anything below that.

    I see Pandas being used in not only for EDA or in notebooks but also in production systems.

    In this article, I’ll go over some data cleaning and processing operations to demonstrate how capable Pandas is.

    Let’s start with the dataset, which contains stock keeping units (SKUs) and a search API responses for these SKUs.

    import pandas as pd
    
    search_results = pd.read_csv("search_results.csv")
    
    search_results.head()

    Search result is a list of dictionaries and looks like this:

    search_results.loc[0, "search_result"]
    
    "[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}}, 
    {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}}, 
    {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}}, 
    {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}}, 
    {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}}, 
    {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}}, 
    {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}}, 
    {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}}, 
    {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}}, 
    {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}] 
    ... and 5 entities remaining"

    As we see in the output, it’s not a proper list of dictionary format because of the last part (“… and 5 entities remaining”). Also, it’s saved as a single string.

    In order to make better use of it, we need to convert it to a proper list of dictionaries. The following line of code removes the last part by splitting the string at “…” and takes the first split.

    search_results.loc[0, "search_result"].split("...")[0].strip()

    However, the output is still a single string. We can use the built-in ast module of Python to convert it to a list:

    import ast
    
    res = ast.literal_eval(search_results.loc[0, "search_result"].split("...")[0].strip())
    
    res
    
    [{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}},
     {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}},
     {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}},
     {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}},
     {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}},
     {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}},
     {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}]

    We now have the search results as a proper list of dictionaries. This was only for a single row. We need to apply the same operation to all SKUs (i.e. entire SKU column).

    One option is to go over all the rows in a for loop and perform the same operation. However, this is not the best option. We should prefer vectorized operations when we can. A vectorized operation basically means executing the code on all rows at once.

    On a single row, I used splitting to get rid of the last part of the string but it did not work in a vectorized operation. A more robust option seems to be using a regex.

    search_results.loc[:, 'search_result'] = search_results['search_result'].str.replace(r"....*", "", regex=True).str.strip()

    This code selects “…” and everything that comes after it and replaces them with nothing. In other words, it removes “… and 5 entities remaining” part.

    We now have all the rows in the search results column as a proper list of dictionaries.

    search_results.loc[10, "search_result"]
    
    "[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}},
     {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}},
     {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}},
     {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}},
     {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}},
     {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}},
     {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}]"

    They’re still saved as a string but I can easily convert them to a list using the ast module, which I will do in the next step.

    What I’m interested in is the SKUs returned in the search results. I’ll create a new column by extracting the SKUs in the dictionaries. I can access them using the “my_id” key of the dictionary.

    There are 3 parts of this operation:

    • Convert the search result string to list using the literal_eval function
    • Extract SKU from the my_id key of the dictionary
    • Do this in a list comprehension to get SKUs from all the dictionaries in the list

    We can do all these operations by applying a lambda function to all rows as follows:

    search_results.loc[:, "result_skus"] = 
    search_results["search_result"].apply(lambda x: [item['my_id'] for item in ast.literal_eval(x)])
    
    search_results.head()

    Each row in the result_skus column contains a list of 10 SKUs. Let’s say I need to have these 10 SKUs in different rows. For each row in the sku column, there will be 10 rows created from the list in the result_skus column. There is a very simple way of doing this in Pandas, which is the explode function.

    data = search_results[["sku", "result_skus"]].explode("result_skus", ignore_index=True)
    
    data.head()

    We created a new dataframe with sku and result_skus column. The drawing below demonstrates what the explode function does:

    Consider the opposite. We have a dataframe as shown above but want to have all results for an sku in a single row.

    We can use the groupby function to group the rows by sku and then apply the list function on the result_skus column:

    new_data = data.groupby("sku", as_index=False)["result_skus"].apply(list)
    
    new_data.head()

    This will get us back to the previous step:

    Using the explode function, we created a dataframe with a separate row for each sku in the result_skus column. What if we need to have them separated to different columns instead of rows?

    One option is to apply the pd.Series function to the result_skus column and concatenate the resulting columns to the original dataframe.

    new_cols = new_data["result_skus"].apply(pd.Series)
    
    new_data = pd.concat([new_data, new_cols], axis=1)
    
    new_data.head()

    Columns from 0 to 9 contains the 10 SKUs in the result_skus column. This code using the apply function is not a vectorized operation.

    We have another option, which is vectorized and much faster.

    new_cols = pd.DataFrame(new_data["result_skus"].tolist())
    
    new_data = pd.concat([new_data, new_cols], axis=1)

    This code will give us the same dataframe as above but much faster.

    I demonstrated a typical data cleaning and processing task a data scientist or analyst may encounter in their job. I’ve been in the field for over 5 years and Pandas has always been enough to do what I need except for when working very large datasets (e.g. billions of rows).

    The tools that are better fit for such large datasets have similar syntax to Pandas. For example, PySpark is kind of a mixture of Pandas and SQL. Polars is very similar to Pandas in terms of syntax. Thus, learning and practicind Pandas is still a highly valuable skill for anyone working in the data science and AI domain.

    Thank you for reading.

    Data goto isnt Pandas Wrangling
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleFujifilm’s X Half is even more whimsical with a $300 price cut
    Next Article Doctors Explain Why Your Smartwatch Is Giving You Anxiety, and How to Stop It
    • Website

    Related Posts

    AI Tools

    Meta-Cognitive Regulation Might Be the Most Important AI Skill Nobody Is Talking About

    AI Tools

    Embeddings Aren’t Magic: The Predictable Failure Modes of RAG Retrieval

    AI Tools

    Qdrant TurboQuant Explained: Is TurboQuant the Silver Bullet?

    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Slots & Daggers review: Sometimes, a short game hits the spot

    0 Views

    How one founder’s bet on ‘the old school web’ is paying off

    0 Views

    This weekend’s two biggest movies were both directed by YouTubers

    0 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews
    AI Tutorials

    Quantization from the ground up

    AI Tools

    David Sacks is done as AI czar — here’s what he’s doing instead

    AI Reviews

    Judge sides with Anthropic to temporarily block the Pentagon’s ban

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    Slots & Daggers review: Sometimes, a short game hits the spot

    0 Views

    How one founder’s bet on ‘the old school web’ is paying off

    0 Views

    This weekend’s two biggest movies were both directed by YouTubers

    0 Views
    Our Picks

    Quantization from the ground up

    David Sacks is done as AI czar — here’s what he’s doing instead

    Judge sides with Anthropic to temporarily block the Pentagon’s ban

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Terms & Conditions
    • Privacy Policy
    • Disclaimer

    © 2026 ainewstoday.co. All rights reserved. Designed by DD.

    Type above and press Enter to search. Press Esc to cancel.