Chris Padilla/Blog


My passion project! Posts spanning music, art, software, books, and more. Equal parts journal, sketchbook, mixtape, dev diary, and commonplace book.


    Coordinating Multiple Serverless Functions on AWS Lambda

    Sharing a bit of initial research for a serverless project I'm working on.

    I've been wrestling this week with coordinating a cron job on AWS Lambda. A single script running on Lambda is ideal: there's no server overhead, the service is isolated and easy to debug, and it's cost effective. The challenge, though, is how to scale the process when the run time increases.

    Lambda's runtime limit at the moment is 15 minutes. Fairly generous, but it's still a limitation. My process involves web scraping, which, if done sequentially, could easily eat those 15 minutes if I have several processes to run.

    The process looks something like this so far:

    1. Initial function start: get a list of pages to crawl.
    2. Crawl those pages (a separate Lambda function, called sequentially for each page).
    3. After each page crawl, send a status update and write the results to the DB.

    Simple when it's 3 or 5 pages to crawl at a minute each, but an issue when that scales up. Not to mention the inefficiency of waiting for all processes to end before sending any results from the crawl.

    This would be a great opportunity to lean on the power of the microservice structure by switching to concurrent tasks. The crawls can already be called individually; the change would be figuring out how to send the notification after each crawl completes.

    To do this, each of those steps above can be broken up into their own separate lambda functions.

    Once they're divided into their own serverless functions, the challenge is to coordinate them.

    Self Orchestration

    One option here would be to adjust my functions to pass state between each other. Above, the first Lambda would grab the list of pages and fire off an instance of the second Lambda for each page. The crawler could receive an option to notify and update the DB.

    It's a fine use case for it! With only three steps, it wouldn't be overly complicated.

    To call a Lambda function asynchronously, the invocation simply needs to be marked with the "Event" type.

    import json

    import boto3

    lambda_client = boto3.client('lambda', region_name='us-east-2')

    page = { ... }

    # The "Event" invocation type runs the function asynchronously
    lambda_client.invoke(
      FunctionName='api-crawler',
      InvocationType='Event',
      Payload=json.dumps({'page': page})
    )
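
    That invocation is fire-and-forget. For the fan-out in step one, the orchestrating function could look something like this. A minimal sketch, assuming a hypothetical get_pages_to_crawl() helper for step 1:

    import json

    import boto3

    lambda_client = boto3.client('lambda', region_name='us-east-2')

    def handler(event, context):
        # Hypothetical helper that returns the list of pages from step 1
        pages = get_pages_to_crawl()

        for page in pages:
            # Fire off a concurrent crawler Lambda for each page
            lambda_client.invoke(
                FunctionName='api-crawler',
                InvocationType='Event',
                Payload=json.dumps({'page': page, 'notify': True})
            )

    Each crawler then handles its own notification and DB update when it finishes, so nothing waits on the whole batch.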

    Step Functions

    Say it were more complicated, though! Three more steps, or fail conditions to handle!

    Another option I explored was using Step Functions and a state machine approach. AWS allows you to orchestrate multiple Lambda functions in a variety of patterns, such as chaining, branching, and parallelism. "Dynamic parallelism" is the pattern that would suit my case here, though I may not necessarily need its primary benefit of shared, persistent state.
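
    As a rough sketch, dynamic parallelism uses a Map state to fan a task out over a list. Something like this state machine definition, where the ARNs and field names are placeholders:

    {
      "StartAt": "GetPages",
      "States": {
        "GetPages": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-2:111111111111:function:get-pages",
          "Next": "CrawlPages"
        },
        "CrawlPages": {
          "Type": "Map",
          "ItemsPath": "$.pages",
          "Iterator": {
            "StartAt": "Crawl",
            "States": {
              "Crawl": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-2:111111111111:function:api-crawler",
                "End": true
              }
            }
          },
          "Next": "Notify"
        },
        "Notify": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-2:111111111111:function:notify",
          "End": true
        }
      }
    }

    The Map state runs the crawler for each entry in $.pages and gathers the results before moving on, which is where that shared, persistent state comes in.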

    The Plan

    For this use case, I'm leaning towards self-orchestration. The state being passed is not overly complex: a list of pages from step 1 to step 2, and then a success or failure result from step 2 to step 3. The process has resources in place to log errors at each step, and there's no need for corrective action partway through.

    Next step is the implementation. To be continued!


    Time to Boogie Woogie!

    Listen on YouTube

    Walking that bass! Trying out playing without looking at my hands as well.

    Followed this great video tutorial for the tune


    Aang

    The boy who lived? ⚡No, wrong YA protagonist.

    Really enjoyed Netflix's Avatar! Such a feast for the eyes 🤩


    New Album — Flowers 🌸

    🌼

    In case you missed it! Celebrating the arrival of spring! I've always loved how these two flowers speckle the hills and fields this time of year.

    Purchase on 🤘 Bandcamp and Listen on 🙉 Spotify or any of your favorite streaming services!


    SSL Certifications in Python on Mac

    Busy week, but I have a quick edge case to share:

    We've upgraded to a newer version of Python for our microservices at work. While making the transition, I bumped into an error sending HTTP requests through urllib3:

    SSL: CERTIFICATE_VERIFY_FAILED

    I've come across this issue before, having to manually add certs. The solution involved low-level work and was gnarly to handle! Thankfully, with Python 3.6 and higher, the fix is more straightforward.

    It's largely outlined in this Stack Overflow post by Craig Glennie. The gist is that on Mac, you actually have to run a post-install script located in your Applications/Python folder.
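
    If memory serves, the script in question is the Install Certificates.command file that ships with the python.org installer (the version number in the path will vary):

    open "/Applications/Python 3.12/Install Certificates.command"

    It installs the certifi package and points the bundled Python's SSL module at that certificate store.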

    If you follow along with Craig's post, you'll find that the documentation for it is even there in the Python README. Always good to read the manual!


    Satchmo

    Louis!

    Listened exclusively to early Hot Five and Hot Seven recordings of Louis Armstrong the other week. Nothing better to brighten your mood!


    Lazy Sunday Improv


    Pup at Twilight

    Sweet little Lucy loving on this pretty sky

    The most beautiful time... Oh, and today is Lucy's birthday!!


    Customizing Title Bar Colors in VS Code

    We're doing a lot of migration from one repo to another at work. And it's easy to mix up which-window-is-which when looking at two files that are so similar!

    Enter custom Title Bar Colors! Not only does doing this look cool, it's actually incredibly practical!

    I first came across this in Wes Bos' Advanced React course, and it's succinctly covered on Zachary Minner's blog: How to set title bar color in VS Code per project. So I'll let you link-hop over there!
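
    The gist, for quick reference: add a workbench color override to the project's .vscode/settings.json. The colors below are just an example:

    {
      "workbench.colorCustomizations": {
        "titleBar.activeBackground": "#1e90ff",
        "titleBar.activeForeground": "#ffffff"
      }
    }

    Each repo gets its own color, and suddenly the right window is obvious at a glance.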

    If you don't want to pick your colors, it looks like there's even a VS Code plugin that will do this automatically for you! (Thank you Stack Overflow!)

    The Plugin in question


    Blue Bossa


    The Boy and the Heron

    🐦

    Miranda and I caught The Boy and the Heron in theaters! Wildly beautiful!! 🐦


    Regex Capture Groups in Node

    I've been writing some regex for the blog. My blog is in markdown, so there's a fair amount of parsing I'm doing to get the data I need.

    But it took some figuring out to understand how I could get my capture group through regex. Executing regex on single-line strings wasn't a problem, but passing in a multiline markdown file proved challenging.

    So here's a quick reference for using Regex in Node:

    RegExp Class

    new RegExp('\\[([^\\n\\]]+)\\]', 'gi').exec('[a][b]')
    // ['[a]', 'a']

    Returns the full match first, then the capture group. Note the doubled backslashes: when the pattern comes in as a string, every escape has to be escaped again, which tripped me up at first. There's also a groups attribute on the result if you log it in a Chrome browser, but it's undefined in the above example; it's only populated when you use named capture groups.

    The issue here comes with multiline strings. If my match were down a few lines, I would get no result.

    You would think to flip on the global flag, but trouble arises. More on that soon.

    RegExp Literal

    /\[(a)\]/g.exec('[a][b]')
    // ["[a]", "a"]

    Returns the full match first, then the grouped result. The same shape as the constructor version!

    I was curious and checked that it's an instance of the RegExp class:

    /fdsf/ instanceof RegExp
    // true

    And so it is! A literal is simply another way of constructing the same RegExp object; the difference that bit me was the string escaping, not the methods.

    String Match

    '[a][b]'.match(/\[(a)\]/g)
    // ["[a]"]

    You can also run a regex against a string directly, but with the global flag you'll only get the full matches back. No capture group isolation.

    Again, problems here when it comes to multiline strings and passing the global flag.
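
    One more string method worth noting: on modern Node, matchAll keeps the capture groups even with the global flag, returning an iterator of full match results. A quick example of my own:

    [...'[a][b]'.matchAll(/\[([^\]]+)\]/g)].map(match => match[1])
    // ['a', 'b']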

    The Global Option

    The global option interferes with the result in a sneaky way: a regex with the g flag is stateful. Each exec call picks up from the pattern's lastIndex, so reusing the same regex object can skip matches or come back null when you'd expect a result.
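
    A quick illustration of that statefulness, with a minimal example of my own:

    const re = /\[([^\]]+)\]/g;

    re.exec('[a][b]'); // ['[a]', 'a'] (lastIndex is now 3)
    re.exec('[a][b]'); // ['[b]', 'b'] (lastIndex is now 6)
    re.exec('[a][b]'); // null (no matches left; lastIndex resets to 0)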

    The only working method I've found for a multiline regex search with capture groups is using the RegExp literal.

    My working example:

    // Capture the URL from the first markdown image tag in the post
    const artUrlRegex = /[.|\n]]*\!\[.*\]\((.*)\)/;
    
    const urls = artUrlRegex.exec(postMarkDown);
    
    if (urls) {
        // Index 0 is the full match; index 1 is the capture group (the URL)
        const imageUrl = urls[1];
        . . . 
    }

    Surprisingly complicated! Hope this helps direct any of your future regex use.


    Fly Me To The Moon

    Listen on YouTube

    Walking the Bass, flying to the moon 🌙


    Oatchi

    What a gooooood boy!

    Oatchi from Pikmin 4 is THE BEST! Whenever Lucy is being a rascal, I've started telling her "No, be like Oatchi!"


    Triggering Notifications through Github Actions

    My Mom and Dad are not on RSS (surprising, I know!). Nor are they on Instagram or Twitter or anywhere! But, naturally, they want to stay in the loop with music and art!

    I've done a bit of digging to see how I could set up a notification system. Here's what I've got:

    This blog is updated through pushing markdown files to the GitHub repo. That triggers the build over on Vercel where I'm hosting things. So, the push is the starting point. I can:

    1. Call a script with github-script.
    2. From said script, scan for the latest post.
    3. Grab the relevant content (an image or video link).
    4. Integrate with my messaging system of choice, either:
      • Twilio to send a text
      • Amazon Simple Email Service

    The config for github-script is very straightforward:

    name: Hey Mom
    
    on: [push]
    
    jobs:
      hey-mom:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - uses: actions/github-script@v7
            with:
              script: |
                const script = require('./.github/scripts/heyMom.js')
                console.log(script({github, context}))
    ./.github/workflows/heymom.yml

    The one concern I have here is external dependencies. From my preliminary research, it's best to keep the GitHub Action light, without any dependencies. Mostly, I want it to find the content quickly and send it off so the website can finish up the deploy cycle.

    This would be a great case for passing that off to AWS Lambda. The lambda can take in the link and message body, load in dependencies for integrating with the service, and send it off from there.
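
    As a sketch of that hand-off, the Lambda could stay quite small. Here I'm assuming Amazon SES as the messaging service; the addresses and event shape are placeholders:

    import boto3

    ses = boto3.client('ses', region_name='us-east-2')

    def handler(event, context):
        # The GitHub Action passes along the link and message body
        ses.send_email(
            Source='blog@example.com',  # placeholder sender
            Destination={'ToAddresses': ['mom@example.com']},  # placeholder recipient
            Message={
                'Subject': {'Data': 'New post is up!'},
                'Body': {'Text': {'Data': f"{event['message']}\n{event['link']}"}},
            },
        )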

    Though, as I write this, I'm forgetting that Vercel, of course, is actually handling the build CI. This GitHub Action would be separate from that process.

    Well, more to research! No code ready for this just yet, but we'll see how this all goes. Maybe the first text my folks get will be the blog post saying how I developed this. 🙂