Faster String Slicing in NumPy

A new vectorized approach to slicing string arrays in NumPy, measured at roughly 26x faster than a Python loop in our benchmark!

A recent NumPy update (version 2.3) introduced numpy.strings.slice (docs), a native function for slicing string arrays. It supports negative indices, per-element start and stop offsets, and StringDType, NumPy’s variable-width string data type (docs).

Let’s say that you have a list of strings and you need to slice each one to extract text. Typically, you’d leverage a for-loop:

invoices = [
    '2344_invoice.pdf', 
    '2345_invoice.pdf', 
    '2346_invoice.pdf'
]

# Slice using a `for` loop (written here as a list comprehension)
customer_id = [i[:4] for i in invoices]
print(customer_id) # ['2344', '2345', '2346']

While this works for small inputs, you’ll see performance issues when you’re processing thousands of strings: the loop is not vectorized, so every string is sliced one at a time by the Python interpreter, which makes it slower for larger datasets.

» NumPy is a powerful Python library implemented in C/C++ that enables blazing-fast numerical computing, including array and list manipulations, mathematical operations, and more.

If we implement the same logic using numpy.strings.slice, we get something like this:

import numpy as np


customer_invoices = np.array([
    '2344_invoice.pdf',
    '2345_invoice.pdf',
    '2346_invoice.pdf'
])


# Slice using `np.strings.slice`
customer_id = np.strings.slice(customer_invoices, 0, 4)
print(customer_id) # ['2344' '2345' '2346']

Benchmark: for-loop vs numpy.strings.slice

We benchmarked both approaches with line_profiler and found that numpy.strings.slice is ~26x quicker than the for-loop:

Results from line_profiler

» Note: an array of 10M strings, each 10 characters long, was used for this benchmark.

If we plot the performance as the input size scales, we find that the advantage of numpy.strings.slice grows as the input gets larger. In fact, for small inputs, its performance is (negligibly) worse than the for-loop:

» Want to be able to benchmark this yourself? Download the Jupyter Notebook (by the way, this is a Python Snacks Pro feature and you’re getting it for free today!)

When to use numpy.strings.slice?

Use it when you are:

  • Working with large string arrays

  • Handling multilingual or Unicode text

  • Needing element-wise slicing logic

Final thoughts: If you're working with large string arrays, choosing numpy.strings.slice over a traditional for-loop makes string handling both simpler and faster.

Bonus content: A run-down of StringDType

StringDType is NumPy’s variable-width string data type. It works with numpy.strings.slice and the other np.strings functions (docs).

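Unlike the older fixed-width Unicode dtype ('U'), StringDType stores variable-length UTF-8 strings, so each element takes only as much space as it needs. Both dtypes handle emoji and international characters; the difference is in how the text is stored.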

This example showcases both the Unicode-safe capabilities of StringDType and the start and stop arrays, which give you element-wise control over each slice:

import numpy as np

customer_invoices = [
    '2346_账单_四月.pdf',               # Chinese
    '2347_فاتورة_مايو.pdf',                # Arabic
    '2🚀50_invoice_july.pdf',          # Emoji
]

# Create a NumPy array with StringDType
invoices_array = np.array(
    customer_invoices,  
    dtype=np.dtypes.StringDType()
)

customer_id = np.strings.slice(
    invoices_array,
    [2, 1, 0], # start index for each string
    [5, -1, -2] # stop index for each string (exclusive)
)
print(customer_id) 
# ['46_' '347_فاتورة_مايو.pd' '2🚀50_invoice_july.p']

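Pro tip: Consider converting string arrays to StringDType before modifying them or passing them to np.strings functions. A fixed-width 'U' array silently truncates any value you assign that is longer than the array's width, whereas StringDType allocates space per element and handles variable-length text accurately.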

Happy coding!


This article may contain affiliate links. Affiliate links come at no cost to you and support the costs of this blog. Should you purchase a product/service from an affiliate link, it will come at no additional cost to you.
