Polishing the T-urtle

blogs.perl.org

Published by lichtkind on Friday 21 February 2025 16:29

Cellgraph 0.7 is out - and I will tell you about the great new features in the next paragraph and how it helps you to play with logical structures and deepen your understanding of them. But first please let me mention the why!

sqrt.png

Almost everybody here has side projects that might be a waste of time - but you build them anyway because you feel a connection to the idea. Cellgraph is such a project for me. After looking at all my stalled and half-baked projects, I wanted to have something completed: well designed, beautiful code, a nice interface, a joy to use - as perfect as I'm capable of. Something that can rekindle your love for programming. In one way it has already made me a better programmer, with better habits. Every time I spot a flaw in the code, the code structure, or even the architecture, I fix it. Yes, it's tedious. But I have gained a sense of what good code looks like and what is needed to get there, so I can get there faster in other projects.

Plus, there is a lot going on under the hood that you cannot see, but that is needed for smooth functionality. Why can you use a slider or arrow buttons to change a value and still get smooth undo and redo that work as you expect? There is a data history class that replaces the current entry when new values come in faster than they can be recorded individually. It will probably be released as its own module some day.

The other why is: I like beautiful, personalized graphics. They are the opposite of AI output - you can tweak the details in the sense of directly touching them, and understand what you did. Many of the new features help you with that. Maybe the strongest new feature is the variable subrule mapping. Big words. Well, the state of a cell and its neighbors determine the next state of that cell. If you have a large neighborhood and many states, you get a combinatorial explosion. So there are now four levels of bundling similar subrules, so it stays manageable. The groupings also introduce certain graphical behavior.

Also good for learning what is going on: many parameters have randomize functions. Push one a few times to see what actually changes, and thus the influence of that parameter. The undo and redo buttons make it easier to go back to the best result you found, or to experiment in general. The action rules also got completely reworked. They are much more powerful now, but in the end they produce certain areas where the usual pattern just breaks down. There are also a lot of functions that produce related rules - helpful for understanding, though not a new feature.

Lastly, something that added a whole new layer of possibilities: the result application. Cellular automata are just simple functions from the state of the neighborhood cells to a new cell state. But what if you do not simply replace the old state with the new one, but add it (modulo the maximum) or apply other operations? This is especially useful when a rule produces almost no useful output image and you cannot figure out which subrule is the bottleneck preventing the patterns from changing. Just drop in another result application and you get a totally different pattern, which you can then tweak to your liking.

Beyond that, a lot of effort went into cleaning up the UI and fixing the documentation. Rendering and startup got faster, the UI offers better guidance, and more bugs were squashed than I want to admit were there. Please give it a try. Thank you.

Lists of Perl distributions on metacpan.org use blue for regular distributions and red for development releases (those containing an underscore). But some are colored grey, for reasons I fail to understand, and I found no explanation on the site.

release_schedule.pod - RL to release 5.41.10

Perl commits on GitHub

Published by richardleach on Thursday 20 February 2025 22:35

release_schedule.pod - RL to release 5.41.10

mem_collxfrm: Handle above-Unicode code points

Perl commits on GitHub

Published by khwilliamson on Thursday 20 February 2025 19:54

mem_collxfrm: Handle above-Unicode code points

As stated in the comments added by this commit, it is undefined behavior
to call strxfrm() on above-Unicode code points, and especially calling
it with Perl's invented extended UTF-8.  This commit changes all such
input into a legal value, replacing all above-Unicode with the highest
permanently unassigned code point, U+10FFFF.

run/locale.t: Hoist code out of a block

Perl commits on GitHub

Published by khwilliamson on Thursday 20 February 2025 19:54

run/locale.t: Hoist code out of a block

The next commit will want to use the results later.

run/locale.t: Add detail to test names

Perl commits on GitHub

Published by khwilliamson on Thursday 20 February 2025 19:54

run/locale.t: Add detail to test names

utf8.h: Split a macro into components

Perl commits on GitHub

Published by khwilliamson on Thursday 20 February 2025 19:54

utf8.h: Split a macro into components

This creates an internal macro that skips some error checking for use
when we don't care if it is completely well-formed or not.

What's new on CPAN - January 2025

r/perl

Published by /u/oalders on Thursday 20 February 2025 18:32

Step-by-Step Guide to Learning PERL Programming: From Novice to Expert

Perl on Medium

Published by Chandan Kumar on Thursday 20 February 2025 01:54

Learn From Basic to Advanced in this blog

Hosting a Secure Static Website with S3 and CloudFront: Part IIb

Introduction

In Part IIa, we detailed the challenges we faced when automating the deployment of a secure static website using S3, CloudFront, and WAF. Service interdependencies, eventual consistency, error handling, and AWS API complexity all presented hurdles. This post details the actual implementation journey.

We didn’t start with a fully fleshed-out solution that just worked. We had to “lather, rinse and repeat”. In the end, we built a resilient automation script robust enough to deploy secure, private websites across any organization.

The first takeaway: the importance of logging and visibility. While logging wasn’t the first thing we actually tackled, it was what eventually turned a mediocre automation script into something worth publishing.


1. Laying the Foundation: Output, Errors, and Visibility

1.1. run_command()

While automating the process of creating this infrastructure, we need to feed the output of one or more commands into the pipeline. The output of one command feeds another. But each step of course can fail. We need to both capture the output for input to later steps and capture errors to help debug the process. Automation without visibility is like trying to discern the elephant by looking at the shadows on the cave wall. Without a robust solution for capturing output and errors we experienced:

  • Silent failures
  • Duplicated output
  • Uncertainty about what actually executed

When AWS CLI calls failed, we found ourselves staring at the terminal trying to reconstruct what went wrong. Debugging was guesswork.

The solution was our first major building block: run_command().

    echo "Running: $*" >&2
    echo "Running: $*" >>"$LOG_FILE"

    # Create a temp file to capture stdout
    local stdout_tmp
    stdout_tmp=$(mktemp)

    # Detect if we're capturing output (not running directly in a terminal)
    if [[ -t 1 ]]; then
        # Not capturing → Show stdout live
        "$@" > >(tee "$stdout_tmp" | tee -a "$LOG_FILE") 2> >(tee -a "$LOG_FILE" >&2)
    else
        # Capturing → Don't show stdout live; just log it and capture it
        "$@" >"$stdout_tmp" 2> >(tee -a "$LOG_FILE" >&2)
    fi

    local exit_code=${PIPESTATUS[0]}

    # Append stdout to log file
    cat "$stdout_tmp" >>"$LOG_FILE"

    # Capture stdout content into a variable
    local output
    output=$(<"$stdout_tmp")
    rm -f "$stdout_tmp"

    if [ $exit_code -ne 0 ]; then
        echo "ERROR: Command failed: $*" >&2
        echo "ERROR: Command failed: $*" >>"$LOG_FILE"
        echo "Check logs for details: $LOG_FILE" >&2
        echo "Check logs for details: $LOG_FILE" >>"$LOG_FILE"
        echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >&2
        echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >>"$LOG_FILE"
        exit 1
    fi

    # Output stdout to the caller without adding a newline
    if [[ ! -t 1 ]]; then
        printf "%s" "$output"
    fi
}

This not-so-simple wrapper gave us:

  • Captured stdout and stderr for every command
  • Real-time terminal output and persistent logs
  • Clear failures when things broke

run_command() became the workhorse for capturing our needed inputs to other processes and our eyes into failures.
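For example, later steps capture a command’s output into a variable while everything is still logged; because run_command() prints the captured stdout when its own output is being captured, plain command substitution works (a usage sketch based on a later step in the script):

# Grab the current distribution config for the jq patch, while the same
# output is appended to $LOG_FILE for debugging.
DISTRIBUTION_CONFIG=$(run_command $AWS cloudfront get-distribution-config --id $DISTRIBUTION_ID)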

1.2. Lessons from the Evolution

We didn’t arrive at run_command() fully formed. We learned it the hard way:

  • Our first iterations printed output twice
  • Capturing both streams without swallowing stdout took fine-tuning
  • We discovered that without proper telemetry, we were flying blind

2. Automating the Key AWS Resources

2.1. S3 Bucket Creation

The point of this whole exercise is to host content, and for that, we need an S3 bucket. This seemed like a simple first task - until we realized it wasn’t. This is where we first collided with a concept that would shape the entire script: idempotency.

S3 bucket names are globally unique. If you try to create one that exists, you fail. Worse, AWS error messages can be cryptic:

  • “BucketAlreadyExists”
  • “BucketAlreadyOwnedByYou”

Our naive first attempt just created the bucket. Our second attempt checked for it first:

create_s3_bucket() {
    if run_command $AWS s3api head-bucket --bucket "$BUCKET_NAME" --profile $AWS_PROFILE 2>/dev/null; then
        echo "Bucket $BUCKET_NAME already exists."
        return
    fi

    run_command $AWS s3api create-bucket \
        --bucket "$BUCKET_NAME" \
        --create-bucket-configuration LocationConstraint=$AWS_REGION \
        --profile $AWS_PROFILE
}

Making the script “re-runnable” was essential - unless, of course, we could guarantee we did everything right and things worked the first time. When has that ever happened? Of course, we then wrapped the creation of the bucket in run_command(), because every AWS call still had the potential to fail spectacularly.

And so, we learned: If you can’t guarantee perfection, you need idempotency.

2.2. CloudFront Distribution with Origin Access Control

Configuring a CloudFront distribution using the AWS Console offers a streamlined setup with sensible defaults. But we needed precise control over CloudFront behaviors, cache policies, and security settings - details the console abstracts away. Automation via the AWS CLI gave us that control - but there’s no free lunch. Prepare yourself to handcraft deeply nested JSON payloads, get jiggy with jq, and manage the dependencies between S3, CloudFront, ACM, and WAF. This is the path we would need to take to build a resilient, idempotent deployment script - and crucially, to securely serve private S3 content using Origin Access Control (OAC).

Why do we need OAC?

Since our S3 bucket is private, we need CloudFront to securely retrieve content on behalf of users without exposing the bucket to the world.

Why not OAI?

AWS has deprecated Origin Access Identity in favor of Origin Access Control (OAC), offering tighter security and more flexible permissions.
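The distribution template below references an $OAC_ID, and the OAC itself has to exist first. The article doesn’t reproduce that step here, but a minimal sketch with the AWS CLI might look like this (the function and OAC names are illustrative, not the script’s exact code):

create_oac() {
    # Create an Origin Access Control that signs requests to S3 with SigV4.
    # The returned Id is what the distribution template uses as $OAC_ID.
    run_command $AWS cloudfront create-origin-access-control \
        --origin-access-control-config \
        "Name=oac-$BUCKET_NAME,SigningProtocol=sigv4,SigningBehavior=always,OriginAccessControlOriginType=s3" \
        --profile $AWS_PROFILE
}

OAC_ID=$(create_oac | jq -r '.OriginAccessControl.Id')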

Why do we need jq?

In later steps we create a WAF Web ACL to firewall our CloudFront distribution. In order to associate the WAF Web ACL with our distribution we need to invoke the update-distribution API which requires a fully fleshed out JSON payload updated with the Web ACL id.

GOTCHA: Attaching a WAF WebACL to an existing CloudFront distribution requires that you use the update-distribution API, not associate-web-acl as one might expect.

Here’s the template for our distribution configuration (some of the Bash variables used will be evident when you examine the completed script):

{
  "CallerReference": "$CALLER_REFERENCE",
   $ALIASES
  "Origins": {
    "Quantity": 1,
    "Items": [
      {
        "Id": "S3-$BUCKET_NAME",
        "DomainName": "$BUCKET_NAME.s3.amazonaws.com",
        "OriginAccessControlId": "$OAC_ID",
        "S3OriginConfig": {
          "OriginAccessIdentity": ""
        }
      }
    ]
  },
  "DefaultRootObject": "$ROOT_OBJECT",
  "DefaultCacheBehavior": {
    "TargetOriginId": "S3-$BUCKET_NAME",
    "ViewerProtocolPolicy": "redirect-to-https",
    "AllowedMethods": {
      "Quantity": 2,
      "Items": ["GET", "HEAD"]
    },
    "ForwardedValues": {
      "QueryString": false,
      "Cookies": {
        "Forward": "none"
      }
    },
    "MinTTL": 0,
    "DefaultTTL": $DEFAULT_TTL,
    "MaxTTL": $MAX_TTL
  },
  "PriceClass": "PriceClass_100",
  "Comment": "CloudFront Distribution for $ALT_DOMAIN",
  "Enabled": true,
  "HttpVersion": "http2",
  "IsIPV6Enabled": true,
  "Logging": {
    "Enabled": false,
    "IncludeCookies": false,
    "Bucket": "",
    "Prefix": ""
  },
  $VIEWER_CERTIFICATE
}

The create_cloudfront_distribution() function is then used to create the distribution.

create_cloudfront_distribution() {
    # Snippet for brevity; see full script
    run_command $AWS cloudfront create-distribution --distribution-config file://$CONFIG_JSON
}

Key lessons:

  • use update-distribution, not associate-web-acl, for CloudFront distributions
  • leverage jq to modify the existing configuration to add the WAF Web ACL id
  • manually configuring CloudFront provides more granularity than the console, but requires some attention to the details

2.3. WAF IPSet + NAT Gateway Lookup

Cool. We have a CloudFront distribution! But it’s wide open to the world. We needed to restrict access to our internal VPC traffic - without exposing the site publicly. AWS WAF provides this firewall capability using Web ACLs. Here’s what we need to do:

  1. Look up our VPC’s NAT Gateway IP (the IP CloudFront would see from our internal traffic).
  2. Create a WAF IPSet containing that IP (our allow list).
  3. Build a Web ACL rule using the IPSet.
  4. Attach the Web ACL to the CloudFront distribution.

Keep in mind that CloudFront is designed to serve content to the public internet. When clients in our VPC access the distribution, their traffic needs to exit through a NAT gateway with a public IP. We’ll use the AWS CLI to query the NAT gateway’s public IP and use that when we create our allow list of IPs (step 1).

find_nat_ip() {
    run_command $AWS ec2 describe-nat-gateways --filter "Name=tag:Environment,Values=$TAG_VALUE" --query "NatGateways[0].NatGatewayAddresses[0].PublicIp" --output text --profile $AWS_PROFILE
}

We take this IP and build our first WAF component: an IPSet. This becomes the foundation for the Web ACL we’ll attach to CloudFront.

The firewall we create will be composed of an allow list of IP addresses (step 2)…

create_ipset() {
    run_command $AWS wafv2 create-ip-set \
        --name "$IPSET_NAME" \
        --scope CLOUDFRONT \
        --region us-east-1 \
        --addresses "$NAT_IP/32" \
        --ip-address-version IPV4 \
        --description "Allow NAT Gateway IP"
}

…that form the rules for our WAF Web ACL (step 3).

create_web_acl() {
    run_command $AWS wafv2 create-web-acl \
        --name "$WEB_ACL_NAME" \
        --scope CLOUDFRONT \
        --region us-east-1 \
        --default-action Block={} \
        --rules '[{"Name":"AllowNAT","Priority":0,"Action":{"Allow":{}},"Statement":{"IPSetReferenceStatement":{"ARN":"'$IPSET_ARN'"}},"VisibilityConfig":{"SampledRequestsEnabled":true,"CloudWatchMetricsEnabled":true,"MetricName":"AllowNAT"}}]' \
        --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName="$WEB_ACL_NAME"
}

This is where our earlier jq surgery becomes critical - attaching the Web ACL requires updating the entire CloudFront distribution configuration. And that’s how we finally attach that Web ACL to our CloudFront distribution (step 4).

DISTRIBUTION_CONFIG=$(run_command $AWS cloudfront get-distribution-config --id $DISTRIBUTION_ID)

# get-distribution-config also returns the ETag needed for the update call
ETAG=$(echo "$DISTRIBUTION_CONFIG" | jq -r '.ETag')

# Use jq to inject WebACLId into config JSON
UPDATED_CONFIG=$(echo "$DISTRIBUTION_CONFIG" | jq --arg ACL_ARN "$WEB_ACL_ARN" '.DistributionConfig | .WebACLId=$ACL_ARN')

# Pass updated config back into update-distribution
echo "$UPDATED_CONFIG" > updated-config.json
run_command $AWS cloudfront update-distribution --id $DISTRIBUTION_ID --if-match "$ETAG" --distribution-config file://updated-config.json

At this point, our CloudFront distribution is no longer wide open. It is protected by our WAF Web ACL, restricting access to only traffic coming from our internal VPC NAT gateway.

For many internal-only sites, this simple NAT IP allow list is enough. WAF can handle more complex needs like geo-blocking, rate limiting, or request inspection - but those weren’t necessary for us. Good design isn’t about adding everything; it’s about removing everything that isn’t needed. A simple allow list was also the most secure.

2.4. S3 Bucket Policy Update

When we set up our bucket, we blocked public access - an S3-wide security setting that prevents any public access to the bucket’s contents. However, this also prevents CloudFront (even with OAC) from accessing S3 objects unless we explicitly allow it. Without this policy update, requests from CloudFront would fail with Access Denied errors.

At this point, we need to allow CloudFront to access our S3 bucket. The update_bucket_policy() function will apply the policy shown below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$BUCKET_NAME/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID"
        }
      }
    }
  ]
}

Modern OAC best practice is to use the AWS:SourceArn condition to ensure only requests from your specific CloudFront distribution are allowed.

It’s more secure because it ties bucket access directly to a single distribution ARN, preventing other CloudFront distributions (or bad actors) from accessing your bucket.

"Condition": {
    "StringEquals": { "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID" }
}

With this policy in place, we’ve completed the final link in the security chain. Our S3 bucket remains private but can now securely serve content through CloudFront - protected by OAC and WAF.
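A minimal sketch of update_bucket_policy() might look like the following, assuming the policy above lives in a template file whose placeholders are expanded with envsubst (the file names here are illustrative):

update_bucket_policy() {
    # Expand $BUCKET_NAME, $AWS_ACCOUNT and $DISTRIBUTION_ID in the template,
    # then apply the resulting policy to the bucket.
    envsubst <bucket-policy.json.tmpl >bucket-policy.json
    run_command $AWS s3api put-bucket-policy \
        --bucket "$BUCKET_NAME" \
        --policy file://bucket-policy.json \
        --profile $AWS_PROFILE
}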


3. Putting It All Together

We are now ready to wrap a bow around these steps in an idempotent Bash script.

  1. Create an S3 Bucket (or verify it exists)
    • This is where we first embraced idempotency. If the bucket is already there, we move on.
  2. Create a CloudFront Distribution with OAC
    • The foundation for serving content securely, requiring deep JSON config work and the eventual jq patch.
  3. Restrict Access with WAF
    • Discover the NAT Gateway’s IP – the public IP representing our VPC.
    • Create a WAF IPSet (Allow List) – build the allow list with our NAT IP.
    • Create a WAF Web ACL – bundle the allow list into a rule.
    • Attach the Web ACL to CloudFront – using jq and update-distribution.
  4. Grant CloudFront Access to S3
    • Update the bucket policy to allow OAC-originating requests from our distribution.

Each segment of our script is safe to rerun. Each is wrapped in run_command(), capturing results for later steps and ensuring errors are logged. We now have a script we can commit and re-use with confidence whenever we need a secure static site. Together, these steps form a robust, idempotent deployment pipeline for a secure S3 + CloudFront website - every time.
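Boiled down, the top level of such a script can read like a table of contents (a simplified sketch; attach_web_acl_to_distribution() is a made-up name for the jq/update-distribution step, the other functions are shown above):

main() {
    create_s3_bucket                 # 1. create (or verify) the private bucket
    create_cloudfront_distribution   # 2. distribution with OAC attached
    NAT_IP=$(find_nat_ip)            # 3a. the public IP CloudFront will see
    create_ipset                     # 3b. allow list containing the NAT IP
    create_web_acl                   # 3c. Web ACL built from the IPSet
    attach_web_acl_to_distribution   # 3d. jq patch + update-distribution
    update_bucket_policy             # 4. let this distribution read the bucket
}

main "$@"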

You can find the full script here.


4. Running the Script

A hallmark of a production-ready script is an ‘-h’ option. Oh wait - your script has no help or usage? I’m supposed to RTFC? It ain’t done skippy until it’s done.

Scripts should include the ability to pass options that make it a flexible utility. We may have started out writing a “one-off” but recognizing opportunities to generalize the solution turned this into another reliable tool in our toolbox.

Be careful though - not every one-off needs to be a Swiss Army knife. Just because aspirin is good for a headache doesn’t mean you should take the whole bottle.

Our script now supports the necessary options to create a secure, static website with a custom domain and certificate. We even added the ability to include additional IP addresses for your allow list in addition to the VPC’s public IP.

Now, deploying a private S3-backed CloudFront site is as easy as:

Example:

./s3-static-site.sh -b my-site -t dev -d example.com -c arn:aws:acm:us-east-1:cert-id

Inputs:

  • -b - the bucket name
  • -t - the tag I used to identify my VPC NAT gateway
  • -c - the certificate ARN I created for my domain
  • -d - the domain name for my distribution

This single command now deploys an entire private website - reliably and repeatably. It only takes a little longer to do it right!
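Under the hood, parsing those flags needs nothing more than a getopts loop (a sketch; usage() and the variable names are illustrative, not the script’s exact code):

while getopts "b:t:d:c:h" opt; do
    case "$opt" in
        b) BUCKET_NAME="$OPTARG" ;;   # bucket name
        t) TAG_VALUE="$OPTARG" ;;     # tag that identifies the NAT gateway
        d) ALT_DOMAIN="$OPTARG" ;;    # custom domain for the distribution
        c) CERT_ARN="$OPTARG" ;;      # ACM certificate ARN
        h) usage; exit 0 ;;           # print help and exit
        *) usage; exit 1 ;;
    esac
done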


5. Key Takeaways from this Exercise

The process of working with ChatGPT to construct a production-ready script that creates static websites took many hours. In the end, several lessons were reinforced and some gotchas were discovered. Writing this blog itself was a collaborative effort that dissected both the technology and the process used to implement it. Overall, it was a productive, fun and rewarding experience. For those not familiar with ChatGPT or who are afraid to give it a try, I encourage you to explore this amazing tool.

Here are some of the things I took away from this adventure with ChatGPT.

  • ChatGPT is a great accelerator for this type of work - but not perfect. Ask questions. Do not copy & paste without understanding what it is you are copying and pasting!
  • If you have some background and general knowledge of a subject ChatGPT can help you become even more knowledgeable as long as you ask lots of follow-up questions and pay close attention to the answers.

With regard to the technology, some lessons were reinforced, some new knowledge was gained:

  • Logging (as always) is an important feature when multiple steps can fail
  • Idempotency guards make sure you can iterate when things go wrong
  • Discovering the NAT IP and subsequently adding a WAF firewall rule was needed because of the way CloudFront works
  • Use the update-distribution API call not associate-web-acl when adding WAF ACLs to your distribution!

Thanks to ChatGPT for being an ever-present back seat driver on this journey. Real AWS battle scars + AI assistance = better results.

Wrap Up

In Part III we wrap it all up as we learn more about how CloudFront and WAF actually protect your website.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

What's new on CPAN - January 2025

perl.com

Published on Thursday 20 February 2025 00:00

Welcome to “What’s new on CPAN”, a curated look at last month’s new CPAN uploads for your reading and programming pleasure. Enjoy!

APIs & Apps

Config & Devops

Data

Development & Version Control

Science & Mathematics

Other

Hosting a Secure Static Website with S3 and CloudFront: Part IIa

Overcoming Challenges in AWS Automation: Lessons from Deploying a Secure S3 + CloudFront Static Website

Introduction

After designing a secure static website on AWS using S3, CloudFront, and WAF as discussed in Part I of this series, we turned our focus to automating the deployment process. While AWS offers powerful APIs and tools, we quickly encountered several challenges that required careful consideration and problem-solving. This post explores the primary difficulties we faced and the lessons we learned while automating the provisioning of this infrastructure.

1. Service Interdependencies

A key challenge when automating AWS resources is managing service dependencies. Our goal was to deploy a secure S3 website fronted by CloudFront, secured with HTTPS (via ACM), and restricted using WAF. Each of these services relies on others, and the deployment sequence is critical:

  • CloudFront requires an ACM certificate before a distribution with HTTPS can be created.
  • S3 needs an Origin Access Control (OAC) configured before restricting bucket access to CloudFront.
  • WAF must be created and associated with CloudFront after the distribution is set up.

Missteps in the sequence can result in failed or partial deployments, which can leave your cloud environment in an incomplete state, requiring tedious manual cleanup.

2. Eventual Consistency

AWS infrastructure often exhibits eventual consistency, meaning that newly created resources might not be immediately available. We specifically encountered this when working with ACM and CloudFront:

  • ACM Certificate Validation:
    • After creating a certificate, DNS validation is required. Even after publishing the DNS records, it can take minutes (or longer) before the certificate is validated and usable.
  • CloudFront Distribution Deployment:
    • When creating a CloudFront distribution, changes propagate globally, which can take several minutes. Attempting to associate a WAF policy or update other settings during this window can fail.

Handling these delays requires building polling mechanisms into your automation or using backoff strategies to avoid hitting API limits.
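A small polling helper with exponential backoff captures the idea (a sketch of the pattern, not the exact code we used; cert_is_issued() and $CERT_ARN are illustrative):

retry_with_backoff() {
    # Retry a command until it succeeds, doubling the wait between attempts.
    local max_attempts=$1; shift
    local delay=5 attempt=1
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "ERROR: still failing after $max_attempts attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

cert_is_issued() {
    # CloudFront certificates live in us-east-1.
    [ "$(aws acm describe-certificate --certificate-arn "$CERT_ARN" \
        --query Certificate.Status --output text --region us-east-1)" = "ISSUED" ]
}

retry_with_backoff 10 cert_is_issued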

3. Error Handling and Idempotency

Reliable automation is not simply about executing commands; it requires designing for resilience and repeatability:

  • Idempotency:
    • Your automation must handle repeated executions gracefully. Running the deployment script multiple times should not create duplicate resources or cause conflicts.
  • Error Recovery:
    • AWS API calls occasionally fail due to rate limits, transient errors, or network issues. Implementing automatic retries with exponential backoff helps reduce manual intervention.

Additionally, logging the execution of deployment commands proved to be an unexpected challenge. We developed a run_command function that captured both stdout and stderr while logging the output to a file. However, getting this function to behave correctly without duplicating output or interfering with the capture of return values required several iterations and refinements. Reliable logging during automation is critical for debugging failures and ensuring transparency when running infrastructure-as-code scripts.

4. AWS API Complexity

While the AWS CLI and SDKs are robust, they are often verbose and require a deep understanding of each service:

  • CloudFront Distribution Configuration:
    • Defining a distribution involves deeply nested JSON structures. Even minor errors in JSON formatting can cause deployment failures.
  • S3 Bucket Policies:
    • Writing secure and functional S3 policies to work with OAC can be cumbersome. Policy errors can lead to access issues or unintended public exposure.
  • ACM Integration:
    • Automating DNS validation of ACM certificates requires orchestrating multiple AWS services (e.g., Route 53) and carefully timing validation checks. We did not actually implement an automated process for this resource. Instead, we considered this a one-time operation better handled manually via the console.

Lessons Learned

Throughout this process, we found that successful AWS automation hinges on the following principles:

  • Plan the dependency graph upfront:
    • Visualize the required services and their dependencies before writing any automation.
  • Integrate polling and backoff mechanisms:
    • Design your scripts to account for delays and transient failures.
  • Prioritize idempotency:
    • Your infrastructure-as-code (IaC) should be safe to run repeatedly without adverse effects.
  • Test in a sandbox environment:
    • Test your automation in an isolated AWS account to catch issues before deploying to production.
  • Implement robust logging:
    • Ensure that all automation steps log their output consistently and reliably to facilitate debugging and auditing.

Conclusion

Automating AWS deployments unlocks efficiency and scalability, but it demands precision and robust error handling. Our experience deploying a secure S3 + CloudFront website highlighted common challenges that any AWS practitioner is likely to face. By anticipating these issues and applying resilient practices, teams can build reliable automation pipelines that simplify cloud infrastructure management.

Next up, Part IIb where we build our script for creating our static site.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

Hosting a Secure Static Website with S3 and CloudFront: Part I

Introduction

While much attention is given to dynamic websites, there are still many uses for the good ol’ static website. Whether for hosting documentation, internal portals, or lightweight applications, static sites remain relevant. In my case, I wanted to host an internal CPAN repository for storing and serving Perl modules. AWS provides all of the necessary components for this task, but choosing the right approach and configuring it securely and automatically can be a challenge.

Whenever you make an architectural decision, various approaches are possible. It’s a best practice to document that decision in an Architectural Decision Record (ADR). This type of documentation justifies your design choice, spelling out precisely how each approach either meets or fails to meet functional or non-functional requirements. In the first part of this blog series we’ll discuss the alternatives and why we ended up choosing our CloudFront-based approach. This is our ADR.

Requirements

  1. HTTPS website for hosting a CPAN repository - will be used internally, but we would like secure transport
  2. Controlled access - can only be accessed from within a private subnet in our VPC
  3. Scalable - should be able to handle increasing storage without reprovisioning
  4. Low-cost - ideally less than $10/month
  5. Low-maintenance - no patching or maintenance of applications or configurations
  6. Highly available - should be available 24x7, and content should be backed up

Alternative Approaches

Now that we’ve defined our functional and non-functional requirements let’s look at some approaches we might take in order to create a secure, scalable, low-cost, low-maintenance static website for hosting our CPAN repository.

Use an S3 Website-Enabled Bucket

This solution at first glance seems like the quickest shot on goal. While S3 does offer a static website hosting feature, it doesn’t support HTTPS by default, which is a major security concern and does not match our requirements. Additionally, website-enabled S3 buckets do not support private access controls - they are inherently public if enabled. Had we been able to accept an insecure HTTP site and public access, this approach would have been the easiest to implement. If we wanted to accept public access but required secure transport, we could have put CloudFront in front of the website-enabled bucket, either using CloudFront’s default certificate or creating our own custom domain with its own certificate.

Since our goal is to create a private static site, we can however use CloudFront as a secure, caching layer in front of S3. This allows us to enforce HTTPS, control access using Origin Access Control (OAC), and integrate WAF to restrict access to our VPC. More on this approach later…

Pros:

  • Quick & Easy Setup: Enables static website hosting with minimal configuration.
  • No Additional Services Needed: Can serve files directly from S3 without CloudFront.
  • Lower Cost: No CloudFront request or data transfer fees when accessed directly.

Cons:

  • No HTTPS Support: Does not natively support HTTPS, which is a security concern.
  • Public by Default: Cannot enforce private access controls; once enabled, it’s accessible to the public.
  • No Fine-Grained Security: Lacks built-in protection mechanisms like AWS WAF or OAC.
  • Not VPC-Restricted: Cannot natively block access from the public internet while still allowing internal users.

Analysis:

While using an S3 website-enabled bucket is the easiest way to host static content, it fails to meet security and privacy requirements due to public access and lack of HTTPS support.

Deploying a Dedicated Web Server

Perhaps the obvious approach to hosting a private static site is to deploy a dedicated Apache or Nginx web server on an EC2 instance. This method involves setting up a lightweight Linux instance, configuring the web server, and implementing a secure upload mechanism to deploy new content.

Pros:

  • Full Control: You can customize the web server configuration, including caching, security settings, and logging.
  • Private Access: When used with a VPC, the web server can be accessed only by internal resources.
  • Supports Dynamic Features: Unlike S3, a traditional web server allows for features such as authentication, redirects, and scripting.
  • Simpler Upload Mechanism: Files can be easily uploaded using SCP, rsync, or an automated CI/CD pipeline.

Cons:

  • Higher Maintenance: Requires ongoing security patching, monitoring, and potential instance scaling.
  • Single Point of Failure: Unless deployed in an autoscaling group, a single EC2 instance introduces availability risks.
  • Limited Scalability: Scaling is manual unless configured with an ALB (Application Load Balancer) and autoscaling.

Analysis:

Using a dedicated web server is a viable alternative when additional flexibility is needed, but it comes with added maintenance and cost considerations. Given our requirements for a low-maintenance, cost-effective, and scalable solution, this may not be the best approach.

Using a Proxy Server with a VPC Endpoint

A common approach I have used to securely serve static content from an S3 bucket is to use an internal proxy server (such as Nginx or Apache) running on an EC2 instance within a private VPC. In fact, this is the approach I have used to create my own private yum repository, so I know it would work effectively for my CPAN repository. The proxy server retrieves content from an S3 bucket via a VPC endpoint, ensuring that traffic never leaves AWS’s internal network. This approach requires managing an EC2 instance, handling security updates, and scaling considerations. Let’s look at the cost of an EC2 based solution.

The following cost estimates are based on AWS pricing for us-east-1:

EC2 Cost Calculation (t4g.nano instance)

  • Instance type: t4g.nano (cheapest ARM-based instance) - $0.0052/hour
  • Monthly usage: 730 hours (assuming 24/7 uptime) - 0.0052 x 730 = $3.80/month

Pros:

  • Predictable costs: No per-request or per-GB transfer fees beyond the instance cost.
  • Avoids external traffic costs: All traffic remains within the VPC when using a private endpoint.
  • Full control over the web server: Can customize caching, security, and logging as needed.

Cons:

  • Higher maintenance
    • Requires OS updates, security patches, and monitoring.
  • Scaling is manual
    • Requires autoscaling configurations or manual intervention as traffic grows.
  • Potential single point of failure
    • Needs HA (High Availability) setup for reliability.

Analysis:

If predictable costs and full server control are priorities, EC2 may be preferable. However, this solution requires maintenance and may not scale with heavy traffic. Moreover, to create an HA solution would require additional AWS resources.

CloudFront + S3 + WAF

As alluded to before, CloudFront + S3 might fit the bill. To create a secure, scalable, and cost-effective private static website, we chose to use Amazon S3 with CloudFront (sprinkling in a little AWS WAF for good measure). This architecture allows us to store our static assets in an S3 bucket while CloudFront acts as a caching and security layer in front of it. Unlike enabling public S3 static website hosting, this approach provides HTTPS support, better scalability, and fine-grained access control.

CloudFront integrates with Origin Access Control (OAC), ensuring that the S3 bucket only allows access from CloudFront and not directly from the internet. This eliminates the risk of unintended public exposure while still allowing authorized users to access content. Additionally, AWS WAF (Web Application Firewall) allows us to restrict access to only specific IP ranges or VPCs, adding another layer of security.

Let’s look at costs:

  • Data Transfer Out: first 10 TB at $0.085 per GB; for 25 GB/month of traffic: 25 x 0.085 = $2.13
  • HTTP Requests: $0.0000002 per request; for 250,000 requests/month: 250,000 x 0.0000002 = $0.05
  • Total CloudFront cost: $2.13 (data transfer) + $0.05 (requests) = $2.18/month

Pros:

  • Scales effortlessly
    • AWS handles scaling automatically based on demand.
  • Lower maintenance
    • No need to manage servers or perform security updates.
  • Includes built-in caching & security
    • CloudFront integrates WAF and Origin Access Control (OAC).

Cons:

  • Traffic-based pricing
    • Costs scale with data transfer and request volume.
  • External traffic incurs costs
    • Data transfer fees apply for internet-accessible sites.
  • Less customization
    • Cannot modify web server settings beyond what CloudFront offers.
  • May require cache invalidations for frequently updated assets

Analysis:

And the winner is…CloudFront + S3!

Using just a website-enabled S3 bucket fails to meet the basic requirements, so let’s eliminate that solution right off the bat. If predictable costs and full server control are priorities, using EC2 either as a proxy or as a full-blown web server may be preferable. However, for a low-maintenance, auto-scaling solution, CloudFront + S3 is the superior choice. EC2 is slightly more expensive but avoids CloudFront’s external traffic costs. Overall, our winning approach is ideal because it scales automatically, reduces operational overhead, and provides strong security mechanisms without requiring a dedicated EC2 instance to serve content.

CloudFront+S3+WAF

  • CloudFront scales better - cost remains low per GB served, whereas EC2 may require scaling for higher traffic.
  • CloudFront includes built-in caching & security, while EC2 requires maintenance and patching.

Bash Scripting vs Terraform

Now that we have our agreed upon approach (the “what”) and documented our “architectural decision”, it’s time to discuss the “how”. How should we go about constructing our project? Many engineers would default to Terraform for this type of automation, but we had specific reasons for thinking this through and looking at a different approach. We’d like:

  • Full control over execution order (we decide exactly when & how things run).
  • Faster iteration (no need to manage Terraform state files).
  • No external dependencies - just AWS CLI.
  • Simple solution for a one-off project.

Why Not Terraform?

While Terraform is a popular tool for infrastructure automation, it introduces several challenges for this specific project. Here’s why we opted for a Bash script over Terraform:

  • State Management Complexity

    Terraform relies on state files to track infrastructure resources, which introduces complexity when running and re-running deployments. State corruption or mismanagement can cause inconsistencies, making it harder to ensure a seamless idempotent deployment.

  • Slower Iteration and Debugging

    Making changes in Terraform requires updating state, planning, and applying configurations. In contrast, Bash scripts execute AWS CLI commands immediately, allowing for rapid testing and debugging without the need for state synchronization.

  • Limited Control Over Execution Order

    Terraform follows a declarative approach, meaning it determines execution order based on dependencies. This can be problematic when AWS services have eventual consistency issues, requiring retries or specific sequencing that Terraform does not handle well natively.

  • Overhead for a Simple, Self-Contained Deployment

    For a relatively straightforward deployment like a private static website, Terraform introduces unnecessary complexity. A lightweight Bash script using AWS CLI is more portable, requires fewer dependencies, and avoids managing an external Terraform state backend.

  • Handling AWS API Throttling

    AWS imposes API rate limits, and handling these properly requires implementing retry logic. While Terraform has some built-in retries, it is not as flexible as a custom retry mechanism in a Bash script, which can incorporate exponential backoff or manual intervention if needed.

  • Less Direct Logging and Error Handling

    Terraform’s logs require additional parsing and interpretation, whereas a Bash script can log every AWS CLI command execution in a simple and structured format. This makes troubleshooting easier, especially when dealing with intermittent AWS errors.

When Terraform Might Be a Better Choice

Although Bash was the right choice for this project, Terraform is still useful for more complex infrastructure where:

  • Multiple AWS resources must be coordinated across different environments.
  • Long-term infrastructure management is needed with a team-based workflow.
  • Integrating with existing Terraform deployments ensures consistency.

For our case, where the goal was quick, idempotent, and self-contained automation, Bash scripting provided a simpler and more effective approach. This approach gave us the best of both worlds - automation without complexity, while still ensuring idempotency and security.


Next Steps

  • In Part IIa of the series we’ll discuss the challenges we faced with AWS automation.
  • Part IIb we’ll discuss in detail the script we built.
  • Finally, Part III will wrap things up with a better explanation of why this all works.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

Hosting a Secure Static Website with S3 and CloudFront: Part III

This is the last in our three part series where we discuss the creation of a private, secure, static website using Amazon S3 and CloudFront.

Introduction

Amazon S3 and CloudFront are powerful tools for hosting static websites, but configuring them securely can be surprisingly confusing - even for experienced AWS users. After implementing this setup for my own use, I discovered a few nuances that others often stumble over, particularly around CloudFront access and traffic routing from VPC environments. This post aims to clarify these points and highlight a potential gap in AWS’s offering.

The Secure S3 + CloudFront Website Setup

The typical secure setup for hosting a static website using S3 and CloudFront looks like this:

  1. S3 Bucket: Store your website assets. Crucially, this bucket should not be publicly accessible.
  2. CloudFront Distribution: Distribute your website content, with HTTPS enabled and custom domain support via ACM.
  3. Origin Access Control (OAC): Grant CloudFront permission to read from your private S3 bucket.
  4. S3 Bucket Policy: Configure it to allow access only from the CloudFront distribution (via OAC).

This setup ensures that even if someone discovers your S3 bucket URL, they won’t be able to retrieve content directly. All access is routed securely through CloudFront.

The VPC Epiphany: Why Is My Internal Traffic Going Through NAT?

For many AWS users, especially those running workloads inside a VPC, the first head-scratcher comes when internal clients access the CloudFront-hosted website. You might notice that this traffic requires a NAT gateway, and you’re left wondering:

  • “Isn’t this all on AWS’s network? Why is it treated as public?”
  • “Can I route CloudFront traffic through a private path in my VPC?”

Here’s the key realization:

CloudFront is a public-facing service. Even when your CloudFront distribution is serving content from a private S3 bucket, your VPC clients are accessing CloudFront through its public endpoints.

  • CloudFront -> S3: This is private and stays within the AWS network.
  • VPC -> CloudFront: This is treated as public internet traffic, even though it often stays on AWS’s backbone.

This distinction is not immediately obvious, and it can be surprising to see internal traffic going through a NAT gateway and showing up with a public IP.
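You can see this for yourself from an instance in a private subnet (assuming its outbound route goes through the NAT gateway); the address that comes back is the NAT gateway’s public IP, which is exactly what CloudFront and WAF will see:

# From an EC2 instance in a private subnet: prints the public IP seen by CloudFront/WAF.
curl -s https://checkip.amazonaws.com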

CloudFront+S3+WAF

Why This Feels Like a Product Gap

For my use case, I wasn’t interested in CloudFront’s global caching or latency improvements; I simply wanted a secure, private website hosted on S3, with a custom domain and HTTPS. AWS currently lacks a streamlined solution for this. A product offering like “S3 Secure Website Hosting” could fill this gap by combining:

  • Private S3 bucket
  • Custom domain + HTTPS
  • Access control (VPC, IP, IAM, or WAF)
  • No CloudFront unless explicitly needed

Securing Access to Internal Clients

To restrict access to your CloudFront-hosted site, you can use AWS WAF with an IPSet containing your NAT gateway’s public IP address. This allows only internal VPC clients (routing through the NAT) to access the website while blocking everyone else.

Conclusion

The S3 + CloudFront setup is robust and secure - once you understand the routing and public/private distinction. However, AWS could better serve users needing simple, secure internal websites by acknowledging this use case and providing a more streamlined solution.

Until then, understanding these nuances allows you to confidently deploy secure S3-backed websites without surprises.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

1. Introduction

Ever locked yourself out of your own S3 bucket? That’s like asking a golfer if he’s ever landed in a bunker. We’ve all been there.

Scenario:

A sudden power outage knocks out your internet. When service resumes, your ISP has assigned you a new IP address. Suddenly, the S3 bucket you so carefully protected with that fancy bucket policy that restricts access by IP… is protecting itself from you. Nice work.

And here’s the kicker: you can’t change the policy because…you can’t access the bucket! Time to panic? Read on…

This post will cover:

  • Why this happens
  • How to recover
  • How to prevent it next time with a “safe room” approach to bucket policies

2. The Problem: Locking Yourself Out

S3 bucket policies are powerful and absolute. A common security pattern is to restrict access to a trusted IP range, often your home or office IP. That’s fine, but what happens when those IPs change without prior notice?

That’s the power outage scenario in a nutshell.

Suddenly (and without warning), I couldn’t access my own bucket. Worse, there was no easy way back in because the bucket policy itself was blocking my attempts to update it. Whether you go to the console or drop to a command line, you’re still hitting that same brick wall—your IP isn’t in the allow list.

At that point, you have two options, neither of which you want to rely on in a pinch:

  1. Use the AWS root account to override the policy.
  2. Open a support ticket with AWS and wait.

The root account is a last resort (as it should be), and AWS support can take time you don’t have.


3. The Safe Room Approach

Once you regain access to the bucket, it’s time to build a policy that includes an emergency backdoor from a trusted environment. We’ll call that the “safe room”. Your safe room is your AWS VPC.

While your home IP might change with the weather, your VPC is rock solid. If you allow access from within your VPC, you always have a way to manage your bucket policy.

Even if you rarely touch an EC2 instance, having that backdoor in your pocket can be the difference between a quick fix and a day-long support ticket.


4. The Recovery & Prevention Script

A script to implement our safe room approach must at least:

  • Allow S3 bucket listing from your home IP and your VPC.
  • Grant bucket policy update permissions from your VPC.
  • Block all other access.

Options & Nice-To-Haves

  • Automatically detect the VPC ID (from the instance metadata).
    • …because you don’t want to fumble for it in an emergency
  • Accept your home IP as input.
    • …because it’s likely changed and you need to specify it
  • Support AWS CLI profiles.
    • …because you should test this stuff in a sandbox
  • Include a dry-run mode to preview the policy.
    • …because policies are dangerous to test live

This script helps you recover from lockouts and prevents future ones by ensuring your VPC is always a reliable access point.
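The heart of the generated policy is a single Deny that fires only when a request comes neither from the trusted IP nor from inside the VPC. A minimal sketch of the core idea follows (simplified: it treats the home IP and the VPC the same rather than granting policy updates only from the VPC; the variable names are placeholders, and the aws:SourceVpc condition assumes the bucket is reached through an S3 VPC endpoint):

# Both conditions must match for the Deny to apply, so either escape hatch
# (home IP or VPC) keeps you in; everything else is blocked.
cat >safe-room-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnlessHomeIpOrVpc",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::$BUCKET_NAME",
        "arn:aws:s3:::$BUCKET_NAME/*"
      ],
      "Condition": {
        "NotIpAddress": { "aws:SourceIp": "$HOME_IP/32" },
        "StringNotEquals": { "aws:SourceVpc": "$VPC_ID" }
      }
    }
  ]
}
EOF

aws s3api put-bucket-policy --bucket "$BUCKET_NAME" --policy file://safe-room-policy.json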


5. Using the Script

Our script is light on dependencies, but you will need to have curl and the aws CLI installed on your EC2 instance.

A typical use of the command requires only your new IP address and the bucket name. The aws CLI will try credentials from the environment, your ~/.aws config, or an instance profile - so you only need -p if you want to specify a different profile. Here’s the minimum you’d need to run the command if you are executing the script in your VPC:

./s3-bucket-unlock.sh -i <your-home-ip> -b <bucket-name>

Options:

  • -i Your current public IP address (e.g., your home IP).
  • -b The S3 bucket name.
  • -v (Optional) VPC ID; auto-detected if not provided.
  • -p (Optional) AWS CLI profile (defaults to $AWS_PROFILE or default).
  • -n Dry run (show policy, do not apply).

Example with dry run:

./s3-bucket-unlock.sh -i 203.0.113.25 -b my-bucket -n

The dry run option lets you preview the generated policy before making any changes—a good habit when working with S3 policies.


6. Lessons Learned

Someone once said that we learn more from our failures than from our successes. At this rate I should be on the AWS support team soon…lol. Well, I probably need a lot more mistakes under my belt before they hand me a badge. In any event, ahem, we learned something from our power outage. Stuff happens - best be prepared. Here’s what this experience reinforced:

  • IP-based policies are brittle.
    • Your home IP will change. Assume it.
  • We should combine IP AND VPC-based controls.
    • VPC access is more stable and gives you a predictable backdoor. VPC access is often overlooked when setting up non-production projects.
  • Automation saves future you under pressure.
    • This script is simple, but it turns a frustrating lockout into a 60-second fix.
  • Root accounts are a last resort, but make sure you have your password ready!
    • Avoid the need to escalate by designing resilient access patterns upfront.

Sometimes it’s not a mistake - it’s a failure to realize how fragile access is. My home IP was fine…until it wasn’t.


7. Final Thoughts

Our script will help us apply a quick fix. The process of writing it was a reminder that security balances restrictions with practical escape hatches.

Next time you set an IP-based bucket policy, ask yourself:

  • What happens when my IP changes?
  • Can I still get in without root or AWS support?

Disclaimer

Thanks to ChatGPT for being an invaluable backseat driver on this journey. Real AWS battle scars + AI assistance = better results.

Best Books To Learn Perl Programming (in 2025)

Perl on Medium

Published by Robin on Wednesday 19 February 2025 14:47

Discover Expert-Recommended Perl Books and Practical Tips to Begin Your Programming Journey

Perl Weekly Issue #708 - Perl is growing...

r/perl

Published by /u/briandfoy on Wednesday 19 February 2025 12:31

Hi everyone,

As part of my learning of Perl I would like to use tools to analyze Perl code and render documentation for it, in a way that Doxygen analyzes C and C++ source code.

I found Doxygen::Filter::Perl and will try to experiment with it to render documentation for Perl code written a long time ago that I have to maintain.

Is this what people use? Are there other tools? What do you use?

submitted by /u/prouleau001
[link] [comments]

PEVANS Core Perl 5: Grant Report for December 2024

r/perl

Published by /u/briandfoy on Tuesday 18 February 2025 12:31

CGI::Tiny - Perl CGI, but modern

blogs.perl.org

Published by Grinnz on Tuesday 18 February 2025 08:14

Originally published at dev.to

In a previous blog post, I explored the modern way to write CGI scripts using frameworks like Mojolicious. But as pointed out in comments, despite the many benefits, there is one critical problem: when you actually need to deploy to a regular CGI server, where the scripts will be loaded each time and not persisted, frameworks designed for persistent applications add lots of overhead to each request.

CGI scripts have historically been written using the CGI module (or even more ancient libraries). But this module is bulky, crufty, and has serious design issues that led to it being removed from Perl core.

Enter CGI::Tiny. It is built for one thing only: serving the CGI protocol. In most cases, frameworks are still the right answer, but in the case of scripts that are forced to run under the actual CGI protocol (such as shared web hosting), or when you want to just drop in CGI scripts with no need to scale, CGI::Tiny provides a modern alternative to CGI.pm. You can explore the interface differences from CGI.pm or suggested ways to extend CGI::Tiny scripts.

So without further ado, here is the equivalent CGI::Tiny script to my previous blog post's examples:

#!/usr/bin/env perl
use strict;
use warnings;
use CGI::Tiny;

cgi {
  my $cgi = $_;
  my $input = $cgi->param('input');
  $cgi->render(json => {output => uc $input});
};

Building Automated Scripting Applications with Perl

Perl on Medium

Published by Gathering Insight on Tuesday 18 February 2025 07:22

From Development to Deployment

bulk download cpan meta data?

r/perl

Published by /u/gruntastics on Tuesday 18 February 2025 00:23

Hello, this might be a strange thing to do, but is there a way to download the data behind fastapi.metacpan.org? I want to download everything so I don't clobber the api. In particular, I want to analyze dependencies, activity (of various kinds), number of open issues, etc. I realize a lot of that would be on github these days but not all.

(Why do I need such data? I have a side project idea to measure the "health" of a programming language by analyzing the liveliness of its package/module ecosystem (cpan for perl, npm for node, etc)).

submitted by /u/gruntastics
[link] [comments]

Perl 🐪 Weekly #708 - Perl is growing...

dev.to #perl

Published by Gabor Szabo on Monday 17 February 2025 07:31

Originally published at Perl Weekly 708

Hi there,

There are many interpretations of what it means to grow. I am using the term for new features. We get lots of improvements and new features with every release of Perl. In v5.38, the experimental class feature was rolled out in core. In the next maintenance release, Perl v5.40, the new field attribute :reader was added along with many other improvements. The next thing we were all waiting for was the field attribute :writer. Luckily, it is already part of the development release v5.41.7. I made this gist demonstrating the core changes.

If you are new to the Perl release policy, there are two types of releases: maintenance and development. Even version numbers are reserved for maintenance releases, e.g. v5.38 and v5.40, whereas odd numbers are for development releases, e.g. v5.39 and v5.41. The maintenance releases are the ones intended for production use.

If you are interested in the release history, then please check out the version history page. I also found an interesting proposal with regard to the version numbering.

Recently, I have been exploring the different facets of parallel and concurrent programming. Please find below the list of topics covered so far.

  1. Thread Lifecycle
  2. Multi-threading
  3. Multi-processing
  4. Thread Synchronization
  5. Process Synchronization
  6. Read/Write Lock
  7. Re-entrant Lock
  8. Livelock
  9. CPU bound Thread Performance
  10. IO bound Thread Performance

Enjoy the rest of the newsletter.

--
Your editor: Mohammad Sajid Anwar.

Announcements

nicsell supports the German Perl Workshop

nicsell is now supporting the German Perl Workshop. nicsell is a domain backorder service, also known as a dropcatcher, which allows you to bid on a large number of domains that are currently being deleted.

Articles

Premium XS Integration, Pt 2

This is a continuation of a series of articles about how to write XS libraries that are more convenient and foolproof for the Perl users, while not blocking them from using the actual C API.

Grants

PEVANS Core Perl 5: Grant Report for December 2024 - January 2025

The Weekly Challenge

The Weekly Challenge by Mohammad Sajid Anwar will help you step out of your comfort zone. You can even win prize money of $50 by participating in the weekly challenge. We pick one champion at the end of the month from among all of the month's contributors, thanks to the sponsor Lance Wicks.

The Weekly Challenge - 309

Welcome to a new week with a couple of fun tasks: "Mind Gap" and "Min Diff". If you are new to the weekly challenge, why not join us and have fun every week? For more information, please read the FAQ.

RECAP - The Weekly Challenge - 308

Enjoy a quick recap of last week's contributions by Team PWC dealing with the "Count Common" and "Decode XOR" tasks in Perl and Raku. You will find plenty of solutions to keep you busy.

TWC308

Apart from the Perl magic, the CPAN gem Data::Show is used as well. Cool, keep up the great work.

Exclusive or Common

Nice bunch of one-liners in Raku. Raku Rocks!!!

Perl Weekly Challenge: Week 308

It is one post where we get Perl and Raku magic together, topped with a detailed discussion. Incredible.

Common Encodings

Compact solutions in Perl and PDL. New to PDL? You must check it out.

lazyness

Welcome back with yet more quality contributions in Raku. Great work.

Perl Weekly Challenge 308

The post reminded me of the good old truth table, very handy for covering the test cases. Thanks for sharing.

Avoid Common Traps, and Reduce the XOR

Lots of mathematical magic shared in this week's contribution. Bitwise operations are always tricky. Well done.

AND and XOR

A great, detailed look at the XOR operation; very interesting and definitely not to be missed. Thanks for the contributions.

The Weekly Challenge #308

A simple and straightforward approach makes it so easy to decode. Nice work, thanks for sharing.

Count Common from The Weekly Challenge 308

Clever use of sets in Raku and Python, each ending up as a one-liner. Keep up the great work.

Count Xor, ha ha ha

My personal favourite, the Postscript one-liner, is the USP of the post. Highly recommended.

Counting the XOR

Python makes me fall in love again and again. Incredibly powerful and easy to follow. Well done and keep it up.

Rakudo

2025.06 It’s A Bot!

Weekly collections

NICEPERL's lists

Great CPAN modules released last week.

Events

Boston.pm monthly meeting

Virtual event

Paris.pm monthly meeting

Paris, France

Boston.pm monthly meeting

Virtual event

German Perl/Raku Workshop Conference 2025

Munich, Germany

The Perl and Raku Conference 2025

Greenville, South Carolina, USA

You joined the Perl Weekly to get weekly e-mails about the Perl programming language and related topics.

Want to see more? See the archives of all the issues.

Not yet subscribed to the newsletter? Join us free of charge!

(C) Copyright Gabor Szabo
The articles are copyright the respective authors.

The Weekly Challenge - 309

The Weekly Challenge

Published on Monday 17 February 2025 03:42

Welcome to the Week #309 of The Weekly Challenge.

RECAP - The Weekly Challenge - 308

The Weekly Challenge

Published on Monday 17 February 2025 02:03

Thank you Team PWC for your continuous support and encouragement.

The Weekly Challenge - Guest Contributions

The Weekly Challenge

Published on Sunday 16 February 2025 09:35

As you know, The Weekly Challenge primarily focuses on Perl and Raku. During Week #018, we received solutions to The Weekly Challenge - 018 by Orestis Zekai in Python. It was a pleasant surprise to receive solutions in something other than Perl and Raku. Ever since, regular team members have also been contributing in other languages like Ada, APL, Awk, BASIC, Bash, Bc, Befunge-93, Bourne Shell, BQN, Brainfuck, C3, C, CESIL, Chef, COBOL, Coconut, C Shell, C++, Clojure, Crystal, D, Dart, Dc, Elixir, Elm, Emacs Lisp, Erlang, Excel VBA, F#, Factor, Fennel, Fish, Forth, Fortran, Gembase, GNAT, Go, GP, Groovy, Haskell, Haxe, HTML, Hy, Idris, IO, J, Janet, Java, JavaScript, Julia, K, Kap, Korn Shell, Kotlin, Lisp, Logo, Lua, M4, Maxima, Miranda, Modula 3, MMIX, Mumps, Myrddin, Nelua, Nim, Nix, Node.js, Nuweb, Oberon, Octave, OCaml, Odin, Ook, Pascal, PHP, Python, PostgreSQL, Postscript, PowerShell, Prolog, R, Racket, Rexx, Ring, Roc, Ruby, Rust, Scala, Scheme, Sed, Smalltalk, SQL, Standard ML, SVG, Swift, Tcl, TypeScript, Uiua, V, Visual BASIC, WebAssembly, Wolfram, XSLT, YaBasic and Zig.

Weekly Challenge: Counting the XOR

dev.to #perl

Published by Simon Green on Sunday 16 February 2025 04:23

Weekly Challenge 308

Each week Mohammad S. Anwar sends out The Weekly Challenge, a chance for all of us to come up with solutions to two weekly tasks. My solutions are written in Python first, and then converted to Perl. It's a great way for us all to practice some coding.

Challenge, My solutions

Task 1: Count Common

Task

You are given two arrays of strings, @str1 and @str2.

Write a script to return the count of common strings in both arrays.

My solution

The task and examples don't mention what to do if a string appears more than once in both arrays. I've made the assumption that we only need to count it once.

For the command line input, I take two strings that are space separated as shown in the example.

In Python this is a one-liner. I turn the lists into sets (which only hold unique values) and take the length of the intersection of these two sets.

def count_common(str1: list, str2: list) -> int:
    return len(set(str1) & set(str2))

Perl does not have sets or intersections built in. For the Perl solution, I turn both strings into hashes with the strings as keys. I then iterate through the keys of the first hash to see if they appear in the second hash. If they do, I increment the count variable.

sub main (@inputs) {
    my %str1 = map { $_, 1 } split( /\s+/, $inputs[0] );
    my %str2 = map { $_, 1 } split( /\s+/, $inputs[1] );

    my $count = 0;
    foreach my $str ( keys %str1 ) {
        $count++ if exists $str2{$str};
    }

    say $count;
}

Examples

$ ./ch-1.py "perl weekly challenge" "raku weekly challenge"
2

$ ./ch-1.py "perl raku java" "python java"
1

$ ./ch-1.py "guest contribution" "fun weekly challenge"
0

Task 2: Decode XOR

Task

You are given an encoded array and an initial integer.

Write a script to find the original array that produced the given encoded array. It was encoded such that encoded[i] = orig[i] XOR orig[i + 1].

My solution

This is relatively straightforward. For the command line input, I take the last value as the initial integer, and the rest as the encoded integers.

For this task, I create the orig list (array in Perl) with the initial value. I then iterate over each item in the encoded list and take the exclusive-or of it and the last value in the orig list.

def decode_xor(encoded: list, initial: int) -> list:
    orig = [initial]

    for i in encoded:
        orig.append(i ^ orig[-1])

    return orig
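
The Perl translation of the same loop is not shown in the post; a direct sketch of it (my code, not the author's) could look like this:

sub decode_xor {
    my ($initial, @encoded) = @_;
    my @orig = ($initial);
    push @orig, $_ ^ $orig[-1] for @encoded;
    return @orig;
}

# decode_xor(1, 1, 2, 3) returns (1, 0, 2, 1)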

Examples

$ ./ch-2.py 1 2 3 1
[1, 0, 2, 1]

$ ./ch-2.py 6 2 7 3 4
[4, 2, 0, 7, 4]

I have read all the "Similar questions" posts related to my question, but none have been helpful other than to suggest a subroutine might not have returned a value.

This is part of a proprietary CMS: create pages, add content, etc. that a number of websites on the same server are using. Only one of the sites returns this error, but only for one of the features.

First, the error:

Bareword "Common::dbconnect" not allowed while "strict subs" in use

Now the code:

A database object is created in Super.pm:

use Common;
my ( $dbh ) = Common::dbconnect;

Common.pm has the dbconnect subroutine:

sub dbconnect {
    my ($host, $database, $username, $password) = getvariables();
    use DBI qw(:sql_types);
    $dbh_cache->{member} = DBI->connect("DBI:mysql:".$database.":".
                                                     $host,
                                                     $username,
                                                     $password,
    { RaiseError => 1 },) or die "Connect failed: $DBI::errstr\n";
    return ($dbh_cache->{member});
}

Debugging this has been frustrating:

  • this looks like a compile error, so I can't use a data dumper to check values in either module
  • other CMS functions call the same routine for the same purpose, but no errors
  • other sites using the CMS return no errors when calling the same routine

What am I missing? What should I be trying?

UPDATE:

I should have added that this is at the top of Common.pm (which contains dbconnect):

use vars qw( @EXPORT );
@EXPORT = qw( checksession dbconnect);

And like I said, all other sites using the same exact code and calling the same exact methods work without error. The better question might be "how do I debug this?"

(dxxxv) 2 great CPAN modules released last week

Niceperl

Published by Unknown on Saturday 15 February 2025 22:54

Updates for great CPAN modules released last week. A module is considered great if its favorites count is greater than or equal to 12.

  1. Perl::Tidy - indent and reformat perl scripts
    • Version: 20250214 on 2025-02-13, with 142 votes
    • Previous CPAN version: 20250105 was 1 month, 8 days before
    • Author: SHANCOCK
  2. YAML::PP - YAML 1.2 Processor
    • Version: v0.39.0 on 2025-02-09, with 17 votes
    • Previous CPAN version: v0.38.1 was 16 days before
    • Author: TINITA

SFTP script hangs when run via Apache [closed]

Perl questions on StackOverflow

Published by Welcho on Friday 14 February 2025 21:23

I need to upload a file to a server using SFTP.

When I run the script from the command line, it works perfectly. However, when executed through Apache, it hangs indefinitely and never returns a response.

I'm using:

  • Windows 11
  • Apache 2.4.52
  • Strawberry Perl 5.32

For some reason it is imperative to use module "Net::SFTP::Foreign".

It’s clearly an Apache-related issue, but after days of troubleshooting, I haven't made any progress. What can I try next?

Here’s my code:

#!/usr/bin/perl
use strict;
use warnings;
use Net::SFTP::Foreign;

my $host = 'host';
my $user = 'user';

my $remote_path = '/var/www/html/img/';
my $local_file = 'file.txt';

print "Content-type: text/html\n\n";

my $sftp = Net::SFTP::Foreign->new(
    host=> $host,
    user=> $user,
    ssh_cmd => '"C:\\Program Files\\PuTTY\\plink.exe"', 
    more => ['-i', 'D:\\OneDriveIQUE\\trabajo\\localhost\\administrator\\temp\\sftp_key_2.ppk'],
    stderr_discard => 1,
);

$sftp->error and die "Error de conexión: " . $sftp->error;

print "Conectado a $host\n";

$sftp->put($local_file, "$remote_path$local_file") or die "Error al subir el archivo: " . $sftp->error;
print "Archivo $local_file subido correctamente a $remote_path\n";

my @files = $sftp->ls($remote_path, names_only => 1);
print "Archivos en $remote_path:\n", join("\n", @files), "\n";

exit;

I can't install a Perl module. I have it installed and working OK on another laptop. Can I copy the whole installation from that laptop? What I tried:

  • I have Strawberry Perl on my current laptop. I tried to install Image::Magick, but it gives an error. I read that in principle it is possible to install Image::Magick into Strawberry from cpan, but it seems it's going to be quite complicated.
  • I tried to install ActiveState Perl instead, but its installation procedure is too complicated. Plus, I would prefer to keep my Strawberry installation alongside ActiveState for legacy programs. So I felt tempted to try a simpler quick-and-dirty shortcut:
  • I have ActiveState Perl on my old laptop, with Image::Magick installed and working OK. So I hoped to copy it completely to my current laptop:
  • I copied the whole tree c:\Perl64 and prepended its \bin to the Path variable. Now perl --version successfully presents itself as ActiveState. However, it still looks for modules under c:\Strawberry, not under c:\Perl64\lib, where Image::Magick really is present (copied from the old laptop).

Is there an environment variable to control where perl will look for modules? Is there anything else that I should copy from my old laptop? I would prefer simple solutions, without compiling things locally, and without changing my Perl code itself. Both systems (the new target and the old source) are Windows 10.

Very strangely, perl -V says:

@INC: C:/Perl64/site/lib C:/Perl64/lib

And still it does not find Image::Magick, which really is present in C:\Perl64\site\lib\Image\Magick.pm. Strangely, the error message says:

Can't locate Image/Magick.pm in @INC (you may need to install the Image::Magick module) (@INC entries checked: C:/Strawberry/perl/site/lib C:/Strawberry/perl/vendor/lib C:/Strawberry/perl/lib)

Note that these @INC entries do not correspond to the value of @INC reported by perl -V (see above). Interestingly, on the old (source) laptop, perl -V reports the same as on the new (target) one:

@INC: C:/Perl64/site/lib C:/Perl64/lib

And there Image::Magick works OK, with the same tree of c:\Perl64.

How to include a table title in Perl Text::Table::More

Perl questions on StackOverflow

Published by Zilore Mumba on Thursday 13 February 2025 19:35

I have made a simple version of a table using Text::Table::More as given here. I have been able to include a title by having the first row span all the columns.

I tried to remove the top border with "top_border => 0" but it does not work. Can this be done?

Also, the guide for this module (which is unclear to me) suggests that one can color rows. Is this doable? My code is below.

#!perl
use 5.010001;
use strict;
use warnings;
use Text::Table::More qw/generate_table/;

my $rows = [
# header row
[{text=>"Upcoming Program Achievements in Entertainment", align => "left", colspan=>5}],
# first data row
["Year",
"Comedy",
"Drama",
"Variety",
"Lead Comedy Actor"],
# second data row
[1962,
"The Bob Newhart Show (NBC)",
"The Defenders (CBS)",
"The Garry Moore Show (CBS)",
"E. G. Marshall (CBS)"],
# third data row
[1963,
"The Dick Van Dyke Show (CBS)",
"The Dick Van Dyke Show (CBS)",
"The Andy Williams Show (NBC)",
"The Andy Williams Show (NBC)"],
# fourth data row
[1964,
"The Danny Kaye Show (CBS)",
"Dick Van Dyke (CBS)",
"Mary Tyler Moore (CBS)",
"The Andy Williams Show (NBC)"],
];

binmode STDOUT, "utf8";
print generate_table(
    rows => $rows,          # required
    top_border => 0,        # top border was put here, this is what I said doesn't work
    header_row => 1,        # optional, default 0
    separate_rows => 1,     # optional, default 0
    border_style => "UTF8::SingleLineBoldHeader",
    row_attrs => [
        [0, {align=>'middle', bottom_border=>1}],
    ],
    col_attrs => [
        [2, {valign=>'middle'}],
    ],
);

The Weekly Challenge - 308

The Weekly Challenge

Published on Thursday 13 February 2025 09:22

Welcome to the Week #308 of The Weekly Challenge.

The Weekly Challenge - 307

The Weekly Challenge

Published on Thursday 13 February 2025 09:22

Welcome to the Week #307 of The Weekly Challenge.

When Laziness Isn't

blogs.perl.org

Published by silent11 on Tuesday 11 February 2025 17:00

I just needed a few rows of UUIDs in a column of a spreadsheet, more for esthetics than anything else. uuidgen to the rescue.

At the time I didn't realize that uuidgen natively supports outputting multiple ids like so
uuidgen -C 8


The truly lazy path would have been to read the fine uuidgen manual.

Alas, supposing I needed to make multiple calls to uuidgen, I went with a Perl one-liner with a loop, as I couldn't recall the Bash loop syntax.

Here comes the laziness... I didn't want to write something like this:

perl -e 'print `uuidgen` for @{[1..5]}';


I'm not so fond of Perl's dereference syntax these days; also, that array reference/range was giving me "the ick", as my kids would say. I needed something lazier, cleaner. I wondered if there were any default/exported arrays available to me that don't have too many elements to them.... Ah, I know!



$ perl -e 'print `uuidgen` for @INC';

d2c9c4b9-2126-4eda-ba52-ca30fdc55db0
eac4f86a-04eb-4c1a-aba1-fb1fa5c7dcda
2a2c416c-00bc-46d8-b7ce-c639f73cef26
4cc052cc-6423-4420-bbf5-595a7ad28c51
0bb78a2e-f4e9-44cd-80ae-e463197398f5
37728b6c-69dc-4669-99e7-2814b0d5e2a6
5acf78b2-6938-465b-ad8a-3bf29037e749
87d6d4ef-e85c-40bb-b3c2-acf9dc88f3e1


This is more a case of (ab)using a variable for an unintended purpose, but today it got the job done, even if it wasn't the laziest approach. Hubris? Maybe.

nicsell supports the German Perl Workshop

blogs.perl.org

Published by Max Maischein on Tuesday 11 February 2025 13:41

You bid, we catch!
nicsell is a domain backorder service, also known as a dropcatcher, which allows you to bid on a large number of expiring domains that are currently in the deletion phase.
You can take part in our auctions starting from a low opening bid of just €10 and have a chance to get your desired domain.
By the way: we are looking for dedicated Perl developers (m/f/d) to strengthen our team in Osnabrück. If you are interested, we look forward to receiving your application!

Nicsell

Premium XS Integration, Pt 2

blogs.perl.org

Published by Nerdvana on Tuesday 11 February 2025 07:52

This is a continuation of a series of articles about how to write XS libraries that are more convenient and foolproof for the Perl users, while not blocking them from using the actual C API.

If you spot anything wrong, or want to contribute suggestions, open an issue at the GitHub repo

Wrapping Transient Objects

One frequent and difficult problem you will encounter when writing XS wrappers around a C library is what to do when the C library exposes a struct which the user needs to see, but the lifespan of that struct is controlled by something other than the reference the user is holding onto.

For example, consider the Display and Screen structs of libX11. When you connect to an X server, the library gives you a Display pointer. Within that Display struct are Screen structs. Some of the X11 API uses those Screen pointers as parameters, and you need to expose them in the Perl interface. But, if you call XCloseDisplay on the Display pointer those Screen structs get freed, and now accessing them will crash the program. The Perl user might still be holding onto a X11::Xlib::Screen Perl object, so how do you stop them from crashing the program when they check an attribute of that object?

Indirect References

For the case of X11 Screens there was an easy workaround: The Screen structs are numbered, and a pair of (Display, ScreenNumber) can refer to the Screen struct without needing the pointer to it. Because the Perl Screen object references the Perl Display object, the methods of Screen can check whether the display is closed before resolving the pointer to a Screen struct, and die with a useful message instead of a crash.

From another perspective, you can think of them like symlinks. You reference one Perl object which has control over its own struct’s lifecycle and then a relative path from that struct to whatever internal data structure you’re wrapping with the current object.

While this sounds like a quick solution, there’s one other detail to worry about: cyclical references. If the sub-object is referring to the parent object, and the parent refers to a collection of sub-objects, Perl will never free these objects. For the case of X11 Screens, the list of screen structs is known at connection-time and is almost always just one Screen, and doesn’t change at runtime. [1] An easy solution for a case like this is to have a strong reference from Display to Screen, and weak references (Scalar::Util::weaken) from Screen to Display, and create all the Screen objects as soon as the Display is connected.

1) this API is from an era before people thought about connecting new monitors while the computer was powered up, and these days can more accurately be thought of as a list of graphics cards rather than “screens”
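
Here is a minimal pure-Perl sketch of that strong/weak arrangement, using hypothetical class names rather than the actual X11::Xlib code:

use strict;
use warnings;
use Scalar::Util 'weaken';

package MyDisplay {
    sub connect {
        my ($class, $screen_count) = @_;
        my $self = bless { closed => 0, screens => [] }, $class;
        for my $n (0 .. $screen_count - 1) {
            # strong reference: Display -> Screen
            my $screen = bless { display => $self, number => $n }, 'MyScreen';
            # weak reference: Screen -> Display, so the pair can be freed together
            weaken($screen->{display});
            push @{ $self->{screens} }, $screen;
        }
        return $self;
    }
    sub close_display { $_[0]{closed} = 1 }
}

package MyScreen {
    sub width {
        my ($self) = @_;
        die "Display is closed\n"
            if !$self->{display} || $self->{display}{closed};
        ...;  # only now is it safe to resolve the underlying Screen pointer
    }
}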

Lazy Cache of Wrapper Objects

If the list of Screens were dynamic, or if I just didn’t want to allocate them all upfront for some reason, another approach is to wrap the C structs on demand. You could literally create a new wrapper object each time they access the struct, but you’d probably want to return the same Perl object if they access two references to the same struct. One way to accomplish this is with a cache of weak references.

In Perl it would look like:

package MainObject {
  use Moo;
  use Scalar::Util 'weaken';

  has is_closed         => ( is => 'rwp' );

  # MainObject reaches out to invalidate all the SubObjects
  sub close($self) {
    ...
    $self->_set_is_closed(1);
  }

  has _subobject_cache => ( is => 'rw', default => sub {+{}} );

  sub _new_cached_subobject($self, $ptr) {
    my $obj= $self->_subobject_cache->{$ptr};
    unless (defined $obj) {
      $obj= SubObject->new(main_ref => $self, data_ptr => $ptr);
      weaken($self->_subobject_cache->{$ptr}= $obj);
    }
    return $obj;
  }

  sub find_subobject($self, $search_key) {
    my $data_ptr= _xs_find_subobject($self, $search_key);
    return $self->_new_cached_subobject($data_ptr);
  }
}

package SubObject {
  use Moo;

  has main_ref => ( is => 'ro' );
  has data_ptr => ( is => 'ro' );

  sub method1($self) {
    # If main is closed, stop all method calls
    croak "Object is expired"
      if $self->main_ref->is_closed;
    ... # operate on data_ptr
  }

  sub method2($self) {
    # If main is closed, stop all method calls
    croak "Object is expired"
      if $self->main_ref->is_closed;
    ... # operate on data_ptr
  }
}

Now, the caller of find_subobject gets a SubObject, and it has a strong reference to MainObject, and MainObject’s cache holds a weak reference to the SubObject. If we call that same method again with the same search key while the first SubObject still exists, we get the same Perl object back. As long as the user holds onto the SubObject, the MainObject won’t expire, but the SubObjects can get garbage collected as soon as they aren’t needed.

One downside of this exact design is that every method of SubObject which uses data_ptr will need to first check that main_ref isn’t closed (like shown in method1). If you have frequent method calls and you’d like them to be a little more efficient, here’s an alternate version of the same idea:

package MainObject {
  ...

  # MainObject reaches out to invalidate all the SubObjects
  sub close($self) {
    ...
    $_->data_ptr(undef)
      for grep defined, values $self->_subobject_cache->%*;
  }

  ...
}

package SubObject {
  ...

  sub method1($self) {
    my $data_ptr= $self->data_ptr
      // croak "SubObject belongs to a closed MainObject";
    ... # operate on data_ptr
  }

  sub method2($self) {
    my $data_ptr= $self->data_ptr
      // croak "SubObject belongs to a closed MainObject";
    ... # operate on data_ptr
  }

  ...
}

In this pattern, the sub-object doesn’t need to consult anything other than its own pointer before getting to work, which comes in really handy with the XS Typemap. The sub-object also doesn’t need a reference to the main object (unless you want one to prevent the main object from getting freed while a user holds SubObjects) so this design is a little more flexible. The only downside is that closing the main object takes a little extra time as it invalidates all of the SubObject instances, but in XS that time won’t be noticeable.

Lazy Cache of Wrapper Objects, in XS

So, what does the code above look like in XS? Here we go…

/* First, the API for your internal structs */

struct MainObject_info {
  SomeLib_MainObject *obj;
  HV *wrapper;
  HV *subobj_cache;
  bool is_closed;
};

struct SubObject_info {
  SomeLib_SubObject *obj;
  SomeLib_MainObject *parent;
  HV *wrapper;
};

struct MainObject_info*
MainObject_info_create(HV *wrapper) {
  struct MainObject_info *info= NULL;
  Newxz(info, 1, struct MainObject_info);
  info->wrapper= wrapper;
  return info;
}

void MainObject_info_close(struct MainObject_info* info) {
  if (info->is_closed) return;
  /* All SubObject instances are about to be invalid */
  if (info->subobj_cache) {
    HE *pos;
    hv_iterinit(info->subobj_cache);
    while (pos= hv_iternext(info->subobj_cache)) {
      /* each value of the hash is a weak reference,
         which might have become undef at some point */
      SV *subobj_ref= hv_iterval(info->subobj_cache, pos);
      if (subobj_ref && SvROK(subobj_ref)) {
        struct SubObject_info *s_info =
          SubObject_from_magic(SvRV(subobj_ref), 0);
        if (s_info) {
          /* it's an internal piece of the parent, so
             no need to call a destructor here */
          s_info->obj= NULL;
          s_info->parent= NULL;
        }
      }
    }
  }
  SomeLib_MainObject_close(info->obj);
  info->obj= NULL;
  info->is_closed= true;
}

void MainObject_info_free(struct MainObject_info* info) {
  if (info->obj)
    MainObject_info_close(info);
  if (info->subobj_cache)
    SvREFCNT_dec((SV*) info->subobj_cache);
  /* The lifespan of 'wrapper' is handled by perl,
   * probably in the process of getting freed right now.
   * All we need to do is delete our struct.
   */
  Safefree(info);
}

The gist here is that MainObject has a set of all SubObject wrappers which are still held by the Perl script, and during “close” (which, in this hypothetical library, invalidates all SubObject pointers) it can iterate that set and mark each wrapper as being invalid.

The Magic setup for MainObject goes just like in the previous article:

static int MainObject_magic_free(pTHX_ SV* sv, MAGIC* mg) {
  MainObject_info_free((struct MainObject_info*) mg->mg_ptr);
  return 0;
}
static MGVTBL MainObject_magic_vtbl = {
  ...
};

struct MainObject_info *
MainObject_from_magic(SV *objref, int flags) {
  ...
}

The destructor for the magic will call the destructor for the info struct. The “from_magic” function instantiates the magic according to ‘flags’, and so on.

Now, the Magic handling for SubObject works a little differently. We don’t get to decide when to create or destroy SubObject, we just encounter these pointers in the return values of the C library functions, and need to wrap them in order to show them to the perl script.

/* Return a new ref to an existing wrapper, or
 * create a new wrapper and cache it.
 */
SV * SubObject_wrap(SomeLib_SubObject *sub_obj) {
  /* If your library doesn't have a way to get the main object
   * from the sub object, this gets more complicated.
   */
  SomeLib_MainObject *main_obj= SomeLib_SubObject_get_main(sub_obj);
  SV **subobj_entry= NULL;
  struct SubObject_info *s_info= NULL;
  HV *wrapper= NULL;
  SV *objref= NULL;
  MAGIC *magic;

  /* lazy-allocate the cache */
  if (!main_obj->subobj_cache)
    main_obj->subobj_cache= newHV();

  /* See if the SubObject has already been wrapped.
   * Use the pointer as the key
   */
  subobj_entry= hv_fetch(
    main_obj->subobj_cache,
    (const char*) &sub_obj, sizeof(void*), 1
  );
  if (!subobj_entry)
    croak("lvalue hv_fetch failed"); /* should never happen */

  /* weak references may have become undef */
  if (*subobj_entry && SvROK(*subobj_entry))
    /* we can re-use the existing wrapper */
    return newRV_inc( SvRV(*subobj_entry) );

  /* Not cached. Create the struct and wrapper. */
  Newxz(s_info, 1, struct SubObject_info);
  s_info->obj= sub_obj;
  s_info->wrapper= newHV();
  s_info->parent= main_obj;
  objref= newRV_noinc((SV*) s_info->wrapper);
  sv_bless(objref, gv_stashpv("YourProject::SubObject", GV_ADD));

  /* Then attach the struct pointer to its wrapper via magic */
  magic= sv_magicext((SV*) s_info->wrapper, NULL, PERL_MAGIC_ext,
      &SubObject_magic_vtbl, (const char*) s_info, 0);
#ifdef USE_ITHREADS
  magic->mg_flags |= MGf_DUP;
#else
  (void)magic; // suppress warning
#endif

  /* Then add it to the cache as a weak reference */
  *subobj_entry= sv_rvweaken( newRV_inc((SV*) s_info->wrapper) );

  /* Then return a strong reference to it */
  return objref;
}

Again, this is roughly equivalent to the Perl implementation of new_cached_subobject above.

Now, when methods are called on the SubObject wrapper, we want to throw an exception if the SubObject is no longer valid. We can do that in the function that the Typemap uses:

struct SubObject_info *
SubObject_from_magic(SV *objref, int flags) {
  struct SubObject_info *ret= NULL;

  ... /* inspect magic */

  if (flags & OR_DIE) {
    if (!ret)
      croak("Not an instance of SubObject");
    if (!ret->obj)
      croak("SubObject belongs to a closed MainObject");
  }
  return ret;
}

Now, the Typemap:

TYPEMAP
struct MainObject_info *   O_SomeLib_MainObject_info
SomeLib_MainObject*        O_SomeLib_MainObject
struct SubObject_info *    O_SomeLib_SubObject_info
SomeLib_SubObject*         O_SomeLib_SubObject

INPUT
O_SomeLib_MainObject_info
  $var= MainObject_from_magic($arg, OR_DIE);

INPUT
O_SomeLib_MainObject
  $var= MainObject_from_magic($arg, OR_DIE)->obj;

INPUT
O_SomeLib_SubObject_info
  $var= SubObject_from_magic($arg, OR_DIE);

INPUT
O_SomeLib_SubObject
  $var= SubObject_from_magic($arg, OR_DIE)->obj;

OUTPUT
O_SomeLib_SubObject
  sv_setsv($arg, sv_2mortal(SubObject_wrap($var)));

This time I added an “OUTPUT” entry for SubObject, because we can safely wrap any SubObject pointer that we see in any of the SomeLib API calls, and get the desired result.

There’s nothing stopping you from automatically wrapping MainObject pointers with an OUTPUT typemap, but that’s prone to errors because sometimes an API returns a pointer to the already-existing MainObject, and you don’t want perl to put a second wrapper on the same MainObject. This problem doesn’t apply to SubObject, because we re-use any existing wrapper by checking the cache. (of course, you could apply the same trick to MainObject and have a global cache of all the known MainObject instances, and actually I do this in X11::Xlib)

But in general, for objects like MainObject I prefer to special-case my constructor (or whatever method initializes the instance of SomeLib_MainObject) with a call to _from_magic(..., AUTOCREATE) on the INPUT typemap rather than returning the pointer and letting Perl’s typemap wrap it on OUTPUT.

After all that, it pays off when you add a bunch of methods in the rest of the XS file.

Looking back to the find_subobject method of the original Perl example, all you need in the XS is basically the prototype for that function of SomeLib:

SomeLib_SubObject *
find_subobject(main, search_key)
  SomeLib_MainObject *main
  char *search_key

and XS translation handles the rest!

Reduce Redundancy in your Typemap

I should mention that you don’t need a new typemap INPUT/OUTPUT macro for every single data type. The macros for a typemap provide you with a $type variable (and others, see perldoc xstypemap) which you can use to construct function names, as long as you name your functions consistently. If you have lots of different types of sub-objects, you could extend the previous typemap like this:

TYPEMAP
struct MainObject_info *    O_INFOSTRUCT_MAGIC
SomeLib_MainObject*         O_LIBSTRUCT_MAGIC

struct SubObject1_info *    O_INFOSTRUCT_MAGIC
SomeLib_SubObject1*         O_LIBSTRUCT_MAGIC_INOUT

struct SubObject2_info *    O_INFOSTRUCT_MAGIC
SomeLib_SubObject2*         O_LIBSTRUCT_MAGIC_INOUT

struct SubObject3_info *    O_INFOSTRUCT_MAGIC
SomeLib_SubObject3*         O_LIBSTRUCT_MAGIC_INOUT

INPUT
O_INFOSTRUCT_MAGIC
  $var= @{[ $type =~ / (\w+)/ ]}_from_magic($arg, OR_DIE);

INPUT
O_LIBSTRUCT_MAGIC
  $var= @{[ $type =~ /_(\w*)/ ]}_from_magic($arg, OR_DIE)->obj;

INPUT
O_LIBSTRUCT_MAGIC_INOUT
  $var= @{[ $type =~ /_(\w*)/ ]}_from_magic($arg, OR_DIE)->obj;

OUTPUT
O_LIBSTRUCT_MAGIC_INOUT
  sv_setsv($arg, sv_2mortal(@{[ $type =~ /_(\w*)/ ]}_wrap($var)));

Of course, you can choose your function names and type names to fit more conveniently into these patterns.

Finding the MainObject for a SubObject

Now, you maybe noticed that I made the convenient assumption that the C library has a function that looks up the MainObject of a SubObject:

SomeLib_MainObject *main= SomeLib_SubObject_get_main(sub_obj);

That isn’t always the case. Sometimes the library authors assume you have both pointers handy and don’t bother to give you a function to look one up from the other.

The easiest workaround is if you can assume that any function which returns a SubObject also took a parameter of the MainObject as an input. Then, just standardize the variable name given to the MainObject and use that variable name in the typemap macro.

OUTPUT
O_SomeLib_SubObject
  sv_setsv($arg, sv_2mortal(SubObject_wrap(main, $var)));

This macro blindly assumes that “main” will be in scope where the macro gets expanded, which is true for my example:

SomeLib_SubObject *
find_subobject(main, search_key)
  SomeLib_MainObject *main
  char *search_key

But, what if it isn’t? What if the C API is basically walking a linked list, and you want to expose it to Perl in a way that the user can write:

for (my $subobj= $main->first; $subobj; $subobj= $subobj->next) {
  ...
}

The problem is that the “next” method is acting on one SubObject and returning another SubObject, with no reference to “main” available.

Well, if a subobject wrapper exists, then it knows the main object, so you just need to look at that SubObject info’s pointer to parent (the MainObject) and make that available for the SubObject’s OUTPUT typemap:

SomeLib_SubObject *
next(prev_obj_info)
  struct SubObject_info *prev_obj_info;
  INIT:
    SomeLib_MainObject *main= prev_obj_info->parent;
  CODE:
    RETVAL= SomeLib_SubObject_next(prev_obj_info->obj);
  OUTPUT:
    RETVAL

So, now there is a variable ‘main’ in scope when it’s time for the typemap to construct a wrapper for the SomeLib_SubObject.

Conclusion

In Perl, the lifespan of objects is nicely defined: the destructor runs when the last reference is lost, and you use a pattern of strong and weak references to control the order the destructors run. In C, the lifespan of objects is dictated by the underlying library, and you might need to go to some awkward lengths to track which ones the Perl user is holding onto, and then flag those objects when they become invalid. While somewhat awkward, it’s very possible thanks to weak references and hashtables keyed on the C pointer address, and the users of your XS library will probably be thankful when they get a useful error message about violating the lifecycle of objects, instead of a mysterious segfault.

Writing git extensions in Perl

dev.to #perl

Published by Juan Julián Merelo Guervós on Monday 10 February 2025 09:34

Introduction

Most people will tell you git is a source control tool; some people will tell you that git is a content-addressable filesystem. It's all that, but the interesting thing is that it's a single-tool interface to frameworks that allow you to create products as a team.
Enter the absolutely simple extension mechanism that git has: write an executable called git-xxx and git will dutifully call it when you run git xxx. Which is why, to make an easier onramp for students in my 7th-semester class in Computer Science, I created an extension called git iv (IV is the acronym for the class). The extension allows them to create branches with specific names, as well as upload those branches, without needing to remember specific git commands.
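
To see how little is needed, here is a hypothetical minimal extension (not part of git iv): save it as git-hello somewhere on your PATH, make it executable, and git hello will run it.

#!/usr/bin/perl
use strict;
use warnings;

print "Hello from a Perl git extension!\n";
print "Arguments passed along by git: @ARGV\n" if @ARGV;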

You might argue that remembering git commands is what students should do, but in fact they don't, and since this is not part of the core of the class, I prefer to eliminate sources of trouble for them (which eventually become sources of trouble for me) using this.

Writing the extension in Perl

There are many good things that can be said about Perl, for this or for anything else. But in this case there's a thing that makes it ideal for writing extensions: git includes a Perl module called Git, which is a Perl interface to all the Git commands. This is distributed with git, so if you've got git, you've got this library.

The whole extension is now hosted in this GitHub repo, which will contain the most up-to-date version as well as documentation and other stuff.

So here's the preamble to the extension:

use strict;
use warnings;
use lib qw( /Library/Developer/CommandLineTools/usr/share/git-core/perl
            /usr/share/perl5 );
use Git;

use v5.14;

my $HELP_FLAG = "-h";
my $USAGE_STRING = <<EOC;
Uso:
    git iv objetivo <número> -- crea una rama para ese objetivo
    git iv sube-objetivo     -- sube al repo remoto la rama

    git iv $HELP_FLAG -- imprime este mensaje
EOC

The main caveat about the extension is that some flags will be handled by git itself. There are probably quite a few of those, but one of them is --help: git xxx --help will try to look up a manual page for git xxx. This is why a different help flag is defined above, as well as a usage string, which is helpful when you don't remember the exact shape of the subcommands. In this case, I use git iv as the extension name and as the interface to the things that need to be done, but there are subcommands that will do different things. These are implemented later:

my @subcommands = qw(objetivo sube-objetivo);
push( @subcommands, quotemeta $HELP_FLAG);

die( usage_string() ) unless @ARGV;
my $subcommand = shift;

die "No se reconoce el subcomando $subcommand" unless grep( /\Q$subcommand/, @subcommands );

my @args = @ARGV;

I prefer not to include any dependencies; there are powerful command line flag libraries out there, but in this case a single script is best. So you handle whatever comes after iv uniformly, be it a subcommand or a flag. The issue with the flag is that it includes a dash (-), so we wrap it with quotemeta so that it can be used safely in regexes, like the one, for instance, four lines below: if the subcommand at the front of the command line is not part of the list, the script bails out, showing the usage string.

Anything after the subcommand will be gobbled into @args.

if ( $subcommand eq $HELP_FLAG ) {
  say $USAGE_STRING;
} else {

  my $repo;

  eval {
    $repo = Git->repository;
  } or die "Aparentemente, no estás en un repositorio";

  if ( $subcommand eq "objetivo" ) {
    die $USAGE_STRING unless @args;
    $repo->command( "checkout", "-b", "Objetivo-" . $args[0]);
  }

  if ( $subcommand eq "sube-objetivo" ) {
    my $branch = $repo->command( "rev-parse", "--abbrev-ref", "HEAD" );
    chomp($branch);
    $repo->command ( "push", "-u", "origin", $branch );
  }
}

Now it's a matter of processing the subcommand. If it's the flag -h, print the usage string; if it's any of the other subcommands, we need to work with the git repository.

$repo = Git->repository; creates an object out of the Git library we mentioned before, which we then use to issue the different plumbing or high-level commands. One of the subcommands does a checkout: $repo->command( "checkout", "-b", "Objetivo-" . $args[0]); converts itself to the equivalent git command. You can even work with plumbing commands such as rev-parse to check the branch you're in and create that branch remotely, as the other subcommand does.

Concluding

Perl saves you a whole lot of trouble when writing this kind of thing. Besides, the fact that it will most probably already be installed on any system you use to develop (Mac, Linux or WSL) will save you the trouble of asking for prerequisites for this script.

perl

dev.to #perl

Published by RAK on Monday 10 February 2025 08:10

(dxxxiv) 12 great CPAN modules released last week

Niceperl

Published by Unknown on Saturday 08 February 2025 23:42

Updates for great CPAN modules released last week. A module is considered great if its favorites count is greater than or equal to 12.

  1. App::Netdisco - An open source web-based network management tool.
    • Version: 2.083001 on 2025-02-06, with 17 votes
    • Previous CPAN version: 2.082001 was 8 days before
    • Author: OLIVER
  2. Crypt::Passphrase - A module for managing passwords in a cryptographically agile manner
    • Version: 0.021 on 2025-02-04, with 17 votes
    • Previous CPAN version: 0.020 was 25 days before
    • Author: LEONT
  3. CryptX - Cryptographic toolkit
    • Version: 0.085 on 2025-02-08, with 51 votes
    • Previous CPAN version: 0.084 was 3 months, 23 days before
    • Author: MIK
  4. Imager - Perl extension for Generating 24 bit Images
    • Version: 1.026 on 2025-02-08, with 67 votes
    • Previous CPAN version: 1.025 was 2 months, 22 days before
    • Author: TONYC
  5. IO::Prompter - Prompt for input, read it, clean it, return it.
    • Version: 0.005002 on 2025-02-07, with 27 votes
    • Previous CPAN version: 0.005001 was 1 year, 6 months, 22 days before
    • Author: DCONWAY
  6. Mozilla::CA - Mozilla's CA cert bundle in PEM format
    • Version: 20250202 on 2025-02-02, with 19 votes
    • Previous CPAN version: 20240924 was 4 months, 8 days before
    • Author: LWP
  7. PerlPowerTools - BSD utilities written in pure Perl
    • Version: 1.049 on 2025-02-06, with 40 votes
    • Previous CPAN version: 1.048 was 1 month, 28 days before
    • Author: BRIANDFOY
  8. Rex - the friendly automation framework
    • Version: 1.16.0 on 2025-02-05, with 86 votes
    • Previous CPAN version: 1.15.0 was 3 months before
    • Author: FERKI
  9. SPVM - The SPVM Language
    • Version: 0.990043 on 2025-02-07, with 35 votes
    • Previous CPAN version: 0.990042 was 16 days before
    • Author: KIMOTO
  10. Sys::Virt - libvirt Perl API
    • Version: v11.0.0 on 2025-02-07, with 17 votes
    • Previous CPAN version: v10.9.0 was 3 months, 6 days before
    • Author: DANBERR
  11. Test::Warnings - Test for warnings and the lack of them
    • Version: 0.038 on 2025-02-02, with 18 votes
    • Previous CPAN version: 0.037 was 28 days before
    • Author: ETHER
  12. YAML::LibYAML - Perl YAML Serialization using XS and libyaml
    • Version: v0.903.0 on 2025-02-02, with 57 votes
    • Previous CPAN version: v0.902.0 was 4 months, 12 days before
    • Author: TINITA

(dc) metacpan weekly report - Perlmazing

Niceperl

Published by Unknown on Saturday 08 February 2025 23:39

This is the weekly favourites list of CPAN distributions. Votes count: 48

Week's winners (+3): Perlmazing 

Build date: 2025/02/08 22:37:34 GMT


Clicked for first time:


Increasing its reputation:


Recently I’ve been working on a project with a Vue front-end and two back-ends, one in Python using the Django framework and one in Perl using the Mojolicious framework. So, it’s a good time to spend some words to share the experience and do a quick comparison.

Previously I wrote a post about Perl web frameworks, and now I’m expanding the subject into another language.

Django was chosen for this project because it’s been around for almost 20 years now and provides the needed maturity and stability to be long-running and low-budget. In this regard, it has proved a good choice so far. Recently it saw a major version upgrade without any problems to speak of. It could be argued that I should have used the Django REST Framework instead of plain Django. However, at the time the decision was made, adding a framework on top of another seemed a bit excessive. I don’t have many regrets about this, though.

Mojolicious is an old acquaintance. It used to have fast-paced development but seems very mature now, and it’s even been ported to JavaScript.

Both frameworks have just a few dependencies (which is fairly normal in the Python world, but not in the Perl one) and excellent documentation. They both follow the model-view-controller pattern. Let’s examine the components.

Views

Both frameworks come with a built-in template system (which can be swapped out with something else), but in this project we can skip the topic altogether as both frameworks are used only as back-end for transmitting JSON, without any HTML rendering involved.

However, let’s see how the rendering looks for the API we’re writing.

use Mojo::Base 'Mojolicious::Controller', -signatures;
sub check ($self) {
    $self->render(json => { status => 'OK' });
}

from django.http import JsonResponse
def status(request):
    return JsonResponse({ "status":  "OK" })

Nothing complicated here, just provide the right call.

Models

Django

Usually a model in context of web development means a database and here we are going to keep this assumption.

Django comes with a comprehensive object-relational mapping (ORM) system and it feels like the natural thing to use. I don’t think it makes much sense to use another ORM, or even to use raw SQL queries (though it is possible).

You usually start a Django project by defining the model. The Django ORM gives you the tools to manage the migrations, providing abstraction from the SQL. You need to define the field types and the relationships (joins and foreign keys) using the appropriate class methods.

For example:

from django.db import models
from django.contrib.auth.models import AbstractUser

class User(AbstractUser):
    email = models.EmailField(null=False, blank=False)
    site = models.ForeignKey(Site, on_delete=models.CASCADE, related_name="site_users")
    libraries = models.ManyToManyField(Library, related_name="affiliated_users")
    expiration = models.DateTimeField(null=True, blank=True)
    created = models.DateTimeField(auto_now_add=True)
    last_modified = models.DateTimeField(auto_now=True)

These calls provide not only the SQL type to use, but also the validation. For example, the blank parameter is a validation option specifying whether Django will accept an empty value. It is different from the null option, which directly correlates to SQL. You can see we’re quite far from working with SQL, at least two layers of abstraction away.

In the example above, we’re also defining a foreign key between a site and a user (many-to-one), so each user belongs to one site. We also define a many-to-many relationship with the libraries record. I like how these relationships are defined, it’s very concise.

Thanks to these definitions, you get a whole admin console almost for free, which your admin users are sure to like. However, I’m not sure this is a silver bullet for solving all problems. With large tables and relationships the admin pages load slowly and they could become unusable very quickly. Of course, you can tune that by filtering out what you need and what you don’t, but that means things are not as simple as “an admin dashboard for free” — at the very least, there’s some configuring to do.

As for the query syntax, you usually need to call Class.objects.filter(). As you would expect from an ORM, you can chain the calls and finally get objects out of that, representing a database row, which, in turn, you can update or delete.

The syntax for the filter() call is based on the double underscore separator, so you can query over the relationships like this:

for agent in (Agent.objects.filter(canonical_agent_id__isnull=False)
              .prefetch_related('canonical_agent')
              .order_by('canonical_agent__name', 'name')
              .all()):
    agent.name = "Dummy"
    agent.save()

In this case, provided that we defined the foreign keys and the attributes in the model, we can search/​order across the relationship. The __isnull suffix, as you can imagine, results in a WHERE canonical_agent_id IS NOT NULL query, while in the order_by call we sort over the joined table using the name column. Looks nice and readable, with a touch of magic.

Of course things are never so simple, so you can build complex queries with the Q class combined with bitwise operators (&, |).

Here’s an example of a simple case-insensitive search for a name containing multiple words:

import re

from django.db.models import Q

def api_list(request):
    term = request.GET.get('search')
    if term:
        words = [ w for w in re.split(r'\W+', term) if w ]
        if words:
            query = Q(name__icontains=words.pop())
            while words:
                query = query & Q(name__icontains=words.pop())
            # logger.debug(query)
            agents = Agent.objects.filter(query).all()

To sum up, the ORM is providing everything you need to stay away from the SQL. In fact, it seems like Django doesn’t like you doing raw SQL queries.

Mojolicious and Perl

In the Perl world things are a bit different.

The Mojolicious tutorial doesn’t even mention the database. You can use any ORM or no ORM at all, if you prefer so. However, Mojolicious makes the DB handle available everywhere in the application.

You could use DBIx::Connector, DBIx::Class, Mojo::Pg (which was developed with Mojolicious), or whatever you prefer.

For example, to use Mojo::Pg in the main application class:

package MyApp;
use Mojo::Base 'Mojolicious', -signatures;
use Mojo::Pg;
use Data::Dumper::Concise;

sub startup ($self) {
    my $config = $self->plugin('NotYAMLConfig');
    $self->log->info("Starting up with " . Dumper($config));
    $self->helper(pg => sub {
                      state $pg = Mojo::Pg->new($config->{dbi_connection_string});
                  });

In the routes you can call $self->pg to get the database object.

The three approaches I’ve mentioned here are different.

DBIx::Connector is basically a way to get you a safe DBI handle across forks and DB connection failures.
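
For example, a minimal sketch of that pattern (hypothetical DSN and table):

use DBIx::Connector;

my $conn = DBIx::Connector->new(
    'dbi:Pg:dbname=myapp', 'user', 'secret',
    { RaiseError => 1, AutoCommit => 1 },
);

# run() hands the block a handle that is re-validated (and re-connected if
# needed) across forks and dropped connections; $_ is the DBI handle.
my $rows = $conn->run(fixup => sub {
    $_->selectall_arrayref(
        'SELECT * FROM texts WHERE sid = ?',
        { Slice => {} }, 42,
    );
});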

Mojo::Pg gives you the ability to write abstract queries, but it also gives you some convenient methods to get at the results. I wouldn’t call it an ORM; from a query you usually get hashes, not objects, you don’t need to define the database layout, and it won’t produce migrations for you, though there is some migration support.

Here’s an example of standard and abstract queries:

sub list_texts ($self) {
    my @all;
    if (my $sid = $self->param('sid')) {
        my $sql = 'SELECT * FROM texts WHERE sid = ? ORDER BY sorting_index';
        @all = $self->pg->db->query($sql, $sid)->hashes->each;
    }
    $self->render(json => { texts => \@all });
}

The query above can be rewritten with an abstract query, using the same module.

@all = $self->pg->db->select(texts => undef,
                             { sid => $sid },
                             { order_by => 'sorting_index' })->hashes->each;

If it’s a simple, static query, it’s basically a matter of taste: do you prefer to see the SQL or not? The second version is usually nicer if you want to build a different query depending on the parameters, so you add or remove keys in the hash which maps to the query and finally execute it.
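
For example, a quick sketch of that incremental style (hypothetical $sid and $status parameters, same select() call as above):

my %where;
$where{sid}    = $sid    if defined $sid;      # only filter when the param was given
$where{status} = $status if defined $status;

my @all = $self->pg->db->select(
    texts => undef,                 # undef means all columns
    \%where,
    { order_by => 'sorting_index' },
)->hashes->each;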

Now, speaking of taste, for complex queries with a lot of joins I honestly prefer to see the SQL query instead of wondering if the abstract one is producing the correct SQL. This is true regardless of the framework. I have the impression that it is faster, safer, and cleaner to have the explicit SQL in the code rather than leaving future developers (including future me) to wonder if the magic is happening or not.

Finally, nothing stops you from using DBIx::Class, which is the best ORM for Perl, even if it’s not exactly light on dependencies.

It’s very versatile, it can build queries of arbitrary complexity, and you usually get objects out of the queries you make. It doesn’t come with an admin dashboard, it doesn’t enforce the data types and it doesn’t ship any validation by default (of course, you can implement that manually). The query syntax is very close to the Mojo::Pg one (which is basically SQL::Abstract).

The gain here is that, like in Django’s ORM, you can attach your methods to the classes representing the rows, so the data definitions live with the code operating on them.
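
As a rough sketch (hypothetical table and columns, not taken from the project), a DBIx::Class row class keeps the column definitions and the business logic side by side:

package MyApp::Schema::Result::Text;
use strict;
use warnings;
use base 'DBIx::Class::Core';

__PACKAGE__->table('texts');
__PACKAGE__->add_columns(qw( id sid title sorting_index ));
__PACKAGE__->set_primary_key('id');

# Business logic lives right next to the data definition:
sub display_title {
    my ($self) = @_;
    return sprintf '%s (#%d)', $self->title, $self->id;
}

1;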

However, the fact that it builds an object for each result means you’re paying a performance penalty which sometimes can be very high. I think this is a problem common to all ORMs, regardless of the language and framework you’re using.

The difference with Django is that once you have chosen it as your framework, you are basically already committed to its ORM. With Mojolicious and other Perl frameworks (Catalyst, Dancer), you can still make that decision and, at least in theory, change it down the road.

My recommendation would be to keep the model, both code and business logic, decoupled from the web-specific code. This is not really doable with Django, but is fully doable with the Perl frameworks. Just put the DB configuration in a dedicated file and the business code in appropriate classes. Then you should be able to, for example, run a script without loading the web and the whole framework configuration. In this ideal scenario, the web framework just provides the glue between the user and your model.

Controllers

Routes are defined similarly between Django and Mojolicious. Usually you put the code in a class and then point to it, attaching a name to it so you can reference it elsewhere. The language is different, the style is different, but they essentially do the same thing.

Django:

from django.urls import path
from . import views
urlpatterns = [
    path("api/agents/<int:agent_id>", views.api_agent_view, name="api_agent_view"),
]

The function views.api_agent_view will receive the request with the agent_id as a parameter.

Mojolicious:

sub startup ($self) {
    # ....
    my $r = $self->routes;
    $r->get('/list/:sid')->to('API#list_texts')->name('api_list_texts');
}

The ->to method routes the request to Myapp::Controller::API::list_texts, which will receive the request with the sid as a parameter.

This is pretty much the core business of every web framework: routing a request to a given function.

Mojolicious also has the ability to chain routes (pretty much taken from Catalyst). The typical use is authorization:

sub startup ($self) {
    ...
    my $r = $self->routes;
    my $api = $r->under('/api/v1', sub ($c) {
        if ($c->req->headers->header('X-API-Key') eq 'testkey') {
            return 1;
        }
        $c->render(text => 'Authentication required!', status => 401);
        return undef;
    });
    $api->get('/check')->to('API#check')->name('api_check');

So a request to /api/v1/check will first go through the under block, and the chain will abort if the API key is not set in the header. Otherwise it will proceed to run the API module’s check function.

Conclusion

I’m a Perl guy and so I’m a bit biased toward Mojolicious, but I also have a pragmatic approach to programming. Python is widely used (they teach it in schools) while Perl is seen as old-school, if not dead (like all mature technologies). So Python could potentially attract more developers to your project, and this is important to consider.

Learning a new language like Python is not a big leap; it and Perl are quite similar despite the different syntax. I’d throw Ruby in the same basket.

Of course both languages provide high quality modules you can use, and these two frameworks are an excellent example.

Building a Simple Web Scraper with Perl

Perl on Medium

Published by Mayur Koshti on Tuesday 04 February 2025 17:33

Extracting Specific Data from HTML Elements

Proposed Perl Changes (part 2)

Perl Hacks

Published by Dave Cross on Sunday 02 February 2025 17:18

At the end of my last post, we had a structure in place that used GitHub Actions to run a workflow every time a change was committed to the PPC repository. That workflow would rebuild the website and publish it on GitHub Pages.

All that was left for us to do was to write the middle bit – the part that actually takes the contents of the repo and creates the website. This involves writing some Perl.

There are three types of pages that we want to create:

  • The PPCs themselves, which are in Markdown and need to be converted to HTML pages
  • There are a few other pages that describe the PPC process, also in Markdown, which should be converted to HTML
  • An index page which should contain links to the other pages. This page should include a table listing various useful details about the PPCs so visitors can quickly find the ones they want more information on

I’ll be using the Template Toolkit to build the site, with a sprinkling of Bootstrap to make it look half-decent. Because there is a lot of Markdown-to-HTML conversion, I’ll use my Template::Provider::Pandoc module which uses Pandoc to convert templates into different formats.

Parsing PPCs and extracting data

The first thing I did was parse the PPCs themselves, extracting the relevant information. Luckily, each PPC has a “preamble” section containing most of the data we need. I created a basic class to model PPCs, which included a really hacky parser to extract this information and create an object of the class.

Building the site

This class abstracts away a lot of the complexity which means the program that actually builds the site is less than eighty lines of code. Let’s look at it in a bit more detail:

#!/usr/bin/perl

use v5.38;
use JSON;
use File::Copy;
use Template;
use Template::Provider::Pandoc;

use PPC;

There’s nothing unusual in the first few lines. We’re just loading the modules we’re using. Note that use v5.38 automatically enables strict and warnings, so we don’t need to load them explicitly.

my @ppcs;

my $outpath = './web';
my $template_path = [ './ppcs', './docs', './in', './ttlib' ];

Here, we’re just setting up some useful variables. @ppcs will contain the PPC objects that we create. One potential clean-up here is to reduce the size of that list of input directories.

my $base = shift || $outpath;
$base =~ s/^\.//;
$base = "/$base" if $base !~ m|^/|;
$base = "$base/" if $base !~ m|/$|;

This is a slightly messy hack that is used to set a <base> tag in the HTML.
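
The base variable is then presumably emitted by the page wrapper, along these lines (a minimal illustration, not the real page.tt):

<head>
  <base href="[% base %]">
  <link rel="stylesheet" href="style.css">
</head>

With a <base> tag in place, relative links and assets resolve correctly whether the site is served from a repo sub-path or from a custom domain.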

my $provider = Template::Provider::Pandoc->new({
  INCLUDE_PATH => $template_path,
});

my $tt = Template->new({
  LOAD_TEMPLATES => [ $provider ],
  INCLUDE_PATH => $template_path,
  OUTPUT_PATH => $outpath,
  RELATIVE => 1,
  WRAPPER => 'page.tt',
  VARIABLES => {
    base => $base,
  }
});

Here, we’re setting up our Template Toolkit processor. Some of you may not be familiar with using a Template provider module. These modules change how TT retrieves templates: if the template has an .md extension, then the text is passed through Pandoc to convert it from Markdown to HTML before it’s handed to the template processor. It’s slightly annoying that we need to pass the template include path to both the provider and the main template engine.

for (<ppcs/*.md>) {
  my $ppc = PPC->new_from_file($_);
  push @ppcs, $ppc;

  $tt->process($ppc->in_path, {}, $ppc->out_path)
    or warn $tt->error;
}

This is where we process the actual PPCs. For each PPC we find in the /ppcs directory, we create a PPC object, store that in the @ppcs variable and process the PPC document as a template – converting it from Markdown to HTML and writing it to the /web directory.

my $vars = {
  ppcs => \@ppcs,
};

$tt->process('index.tt', $vars, 'index.html')
  or die $tt->error;

Here’s where we process the index.tt file to generate the index.html for our site. Most of the template is made up of a loop over the @ppcs variable to create a table of the PPCs.
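
I won’t reproduce the whole of index.tt here, but the heart of it is a loop over that variable, roughly like this (the columns are illustrative and assume the PPC class exposes id, title, author and status):

<table id="ppcs" class="table">
  <thead>
    <tr><th>ID</th><th>Title</th><th>Author</th><th>Status</th></tr>
  </thead>
  <tbody>
[% FOREACH ppc IN ppcs -%]
    <tr>
      <td>[% ppc.id %]</td>
      <td><a href="[% ppc.out_path %]">[% ppc.title %]</a></td>
      <td>[% ppc.author %]</td>
      <td>[% ppc.status %]</td>
    </tr>
[% END -%]
  </tbody>
</table>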

for (<docs/*.md>) {
  s|^docs/||;
  my $out = s|\.md|/index.html|r;

  $tt->process($_, {}, $out)
    or die $tt->error;
}

There are a few other documents in the /docs directory describing the PPC process. So in this step, we iterate across the Markdown files in that directory and convert each of them into HTML. Unfortunately, one of them is the template.md which is intended to be used as the template for new PPCs – so it would be handy if that one wasn’t converted to HTML. That’s something to think about in the future.
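
One cheap fix, which isn’t in the current code, would be to skip that file explicitly:

for (<docs/*.md>) {
  next if $_ eq 'docs/template.md';   # leave the PPC template as raw Markdown

  s|^docs/||;
  my $out = s|\.md|/index.html|r;

  $tt->process($_, {}, $out)
    or die $tt->error;
}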

mkdir 'web/images';
for (<images/*>) {
  copy $_, "web/$_";
}

if (-f 'in/style.css') {
  copy 'in/style.css', 'web/style.css';
}

if (-f 'CNAME') {
  copy 'CNAME', "web/CNAME";
}

We’re on the home straight now. And this section is a bit scrappy. You might recall from the last post that we’re building the website in the /web directory. There are a few other files that need to be copied into that directory so that they are deployed to the web server along with the generated pages. So we just copy the files. You might not know what a CNAME file is – it’s the file that GitHub Pages uses to tell their web server that you’re serving your website from a custom domain name.

my $json = JSON->new->pretty->canonical->encode([
  map { $_->as_data } @ppcs
]);

open my $json_fh, '>', 'web/ppcs.json' or die $!;

print $json_fh $json;

And, finally, we generate a JSON version of our PPCs and write that file to the /web directory. No-one asked for this, but I thought someone might find this data useful. If you use this for something interesting, I’d love to hear about it.
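
As an example of the kind of thing you could do with it, here’s a quick script that lists the PPC titles. It assumes the file ends up at https://davorg.dev/PPCs/ppcs.json once the site is deployed, and that each record has a title key:

use v5.38;
use HTTP::Tiny;
use JSON;

# Assumed location of the generated file on the preview site
my $url = 'https://davorg.dev/PPCs/ppcs.json';

my $resp = HTTP::Tiny->new->get($url);
die "Failed to fetch $url: $resp->{status} $resp->{reason}\n"
    unless $resp->{success};

my $ppcs = JSON->new->decode($resp->{content});
say $_->{title} // '(untitled)' for @$ppcs;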

Other bits and pieces

A few other bits and pieces to be aware of.

  • I use a page wrapper to ensure that every generated page has a consistent look and feel
  • The navigation in the page wrapper is hard-coded to contain links to the pages in /docs. It would make sense to change that so it’s generated from the contents of that directory
  • I used a Javascript project called Simple Datatables to turn the main table into a data table. That means it’s easy to sort, page and filter the data that’s displayed
  • There’s a basic hack that hides the email addresses when they appear in the main table. But it’s currently not applied to the PPC pages themselves. I’ve idly contemplated writing a TT filter that would be called something like Template::Filter::RemoveEmailAddresses; a rough sketch of the idea follows this list
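
For what it’s worth, a throwaway version of that filter could be defined inline in the Template constructor rather than shipped as a module (the regex is deliberately crude, and remove_emails is just a name I’ve made up):

use Template;

my $tt = Template->new({
  FILTERS => {
    remove_emails => sub {
      my $text = shift;
      $text =~ s/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/<email hidden>/g;
      return $text;
    },
  },
});

# In a template it would be used as: [% ppc.author | remove_emails %]
$tt->process( \'[% text | remove_emails %]', { text => 'Contact: someone@example.com' } )
  or die $tt->error;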

In conclusion

But there you are. That’s the system that I knocked together in a few hours a couple of weeks ago. As I mentioned in the last post, the idea was to make the PPC process more transparent to the Perl community outside of the Perl 5 Porters and the Perl Steering Council. I hope it achieves that and, further, I hope it does so in a way that keeps out of people’s way. As soon as someone updates one of the documents in the repository, the workflow will kick in and publish a new version of the website. There are a few grungy corners of the code and there are certainly some improvements that can be made. I’m hoping that once the pull request is merged, people will start proposing new pull requests to add new features.

The post Proposed Perl Changes (part 2) first appeared on Perl Hacks.

What's new on CPAN - December 2024

perl.com

Published on Sunday 02 February 2025 12:42

Welcome to “What’s new on CPAN”, a curated look at last month’s new CPAN uploads for your reading and programming pleasure. Enjoy!

APIs & Apps

  • Automatically generate changelogs based on Git commit history with App::Changelog (OLOOEEZ)
  • Webservice::Sendy::API (OODLER) provides an interface to the Sendy e-mail marketing service, with the purpose of superseding a comparable module that is no longer maintained
  • Manage standup preparation and presentation notes with App::Standup::Diary (SMONFF)
  • Bluesky (SANKO) provides a high-level interface to the Bluesky social network
  • App::datasection (PLICEASE) lets you manage the DATA section of source files from the command line
  • Repeat a command an arbitrary number of times using App::repeat (PERLANCAR)

Config & Devops

Data

Development & Version Control

  • Programmatically update the DATA section of source files with Data::Section::Writer (PLICEASE). Compatible with formats of many DATA section-reading modules
  • Use Test::SpellCheck (PLICEASE) to spellcheck POD within your tests
  • Tie::Hash::DataSection (PLICEASE) lets you access the DATA section of source files via tied hash

Language & International

Science & Mathematics

Web

(dxxxiii) 6 great CPAN modules released last week

Niceperl

Published by Unknown on Saturday 01 February 2025 21:49

Updates for great CPAN modules released last week. A module is considered great if its favorites count is greater than or equal to 12.

  1. App::DBBrowser - Browse SQLite/MySQL/PostgreSQL databases and their tables interactively.
    • Version: 2.423 on 2025-01-27, with 14 votes
    • Previous CPAN version: 2.422 was 4 days before
    • Author: KUERBIS
  2. App::Netdisco - An open source web-based network management tool.
    • Version: 2.082001 on 2025-01-29, with 17 votes
    • Previous CPAN version: 2.081004 was 10 days before
    • Author: OLIVER
  3. Crypt::JWT - JSON Web Token
    • Version: 0.036 on 2025-01-26, with 26 votes
    • Previous CPAN version: 0.035 was 1 year, 3 months, 23 days before
    • Author: MIK
  4. IO::Interactive - Utilities for interactive I/O
    • Version: 1.026 on 2025-01-26, with 16 votes
    • Previous CPAN version: v0.0.3 was 18 years, 11 months, 9 days before
    • Author: BRIANDFOY
  5. Text::CSV_XS - Comma-Separated Values manipulation routines
    • Version: 1.60 on 2025-01-31, with 102 votes
    • Previous CPAN version: 1.59 was 26 days before
    • Author: HMBRAND
  6. Unicode::Tussle - Tom's Unicode Scripts So Life is Easier
    • Version: 1.121 on 2025-01-29, with 13 votes
    • Previous CPAN version: 1.119 was 26 days before
    • Author: BRIANDFOY

Enhancing your MIDI devices with Perl

perl.com

Published on Wednesday 29 January 2025 00:00

This article was originally published at fuzzix.org.

Introduction

These days, even modestly priced MIDI hardware comes stuffed with features. These features may include a clock, sequencer, arpeggiator, chord voicing, Digital Audio Workstation (DAW) integration, and transport control.

Fitting all this into a small device’s form factor may result in some amount of compromise — perhaps modes aren’t easily combined, or some amount of menu diving is required to switch between modes. Your device may even lack the precise functionality you require.

This post will walk through the implementation of a pair of features to augment those found in a MIDI keyboard — an M-Audio Oxygen Pro 61 in this case, though the principle should apply to any device.

Feature 1 : Pedal Tone

A pedal tone (or pedal note, or pedal point) is a sustained single note, over which other potentially dissonant parts are played. A recent video by Polarity Music opened with some exploration of using a pedal tone in Bitwig Studio to compose progressions. In this case, the pedal tone was gated by the keyboard, and the fifth interval of the played note was added, resulting in a three-note chord for a single played note. This simple setup resulted in some dramatic progressions.

There are, of course, ways to achieve this effect in other DAW software. I was able to use FL Studio’s Patcher to achieve a similar result with two instances of VFX Key Mapper:

FL Studio Patcher with MIDI input routed to FLEX and two instances of VFX Key Mapper

One instance of VFX Key Mapper transposes the incoming note by 7 semitones. The other will replace any incoming note. Alongside the original note, these mappers are routed to FLEX with a Rhodes sample set loaded. It sounds like this (I’m playing just one or two keys at a time here):

A similar method can be used to patch this in other modular environments. In VCV Rack, a pair of quantizers provide the fifth-note offset and pedal tone signals. The original note, the fifth, and the pedal tone are merged and sent to the Voltage Controlled Oscillator (VCO). The gate signal from the keyboard triggers an envelope to open the Voltage Controlled Amplifier (VCA) and Voltage Controlled Filter (VCF).

VCV Rack with the patch described above

This patch is a little less flexible than the FL Studio version — further work is required to support playing multiple notes on the keyboard, for example.

The FL Studio version also has a downside. The played sequence only shows the played notes in the piano roll, not the additional fifth and pedal tone. Tweaking timing and velocity, or adding additional melody is not trivial - any additional notes in the piano roll will play three notes in the Patcher instrument.

If we could coax our MIDI device into producing these additional notes, there would be no need for tricky patching, and we might end up with a more flexible result.

Perl Tone

The approach described here will set up a new software-defined MIDI device which will proxy events from our hardware, while applying any number of filters to events before they are forwarded. These examples will make use of Perl bindings to RtMidi.

We’re going to need a little bit of framework code to get started. While the simplest RtMidi callback examples just sleep to let the RtMidi event loop take over, we may wish to schedule our own events later. I went into some detail previously on Perl, IO::Async, and the RtMidi event loop.

The framework will need to set up an event loop, manage two or more MIDI devices, and store some state to influence decision-making within filter callback functions. Let’s start with those:

use v5.40;
use experimental qw/ class /;

class MidiFilter {
    field $loop       = IO::Async::Loop->new;
    field $midi_ch    = IO::Async::Channel->new;
    field $midi_out   = RtMidiOut->new;
    field $input_name = $ARGV[0];
    field $filters    = {};
    field $stash      = {};

Aside from our event $loop and $midi_out device, there are fields for getting $input_name from the command line, a $stash for communication between callbacks and a store for callback $filters. The callback store will hold callbacks keyed on MIDI event names, e.g. “note_on”. The channel $midi_ch will be used to receive events from the MIDI input controller.

Methods for creating new filters and accessing the stash are as follows:

    method add_filter( $event_type, $action ) {
        push $filters->{ $event_type }->@*, $action;
    }

    method stash( $key, $value = undef ) {
        $stash->{ $key } = $value if defined $value;
        $stash->{ $key };
    }

Adding a filter requires an event type, plus a callback. Callbacks are pushed into $filters for each event type in the order they are declared. If a $value is supplied while accessing the stash, it will be stored for the given $key. The value for the given $key is returned in any case.

Let’s add some methods for sending MIDI events:

    method send( $event ) {
        $midi_out->send_event( $event->@* );
    }

    method delay_send( $delay_time, $event ) {
        $loop->add(
            IO::Async::Timer::Countdown->new(
                delay => $delay_time,
                on_expire => sub { $self->send( $event ) }
            )->start
        )
    }

The send method simply passes the supplied $event to the configured $midi_out device. The delay_send method does the same thing, except it waits for some specified amount of time before sending.

Methods for filtering incoming MIDI events are as follows:

    method _filter_and_forward( $event ) {
        my $event_filters = $filters->{ $event->[0] } // [];

        for my $filter ( $event_filters->@* ) {
            return if $filter->( $self, $event );
        }

        $self->send( $event );
    }

    async method _process_midi_events {
        while ( my $event = await $midi_ch->recv ) {
            $self->_filter_and_forward( $event );
        }
    }

These methods are denoted as “private” via the ancient mechanism of “Add an underscore to the start of the name to indicate that this method shouldn’t be used”. The documentation for Object::Pad (which acts as an experimental playground for perl core class features) details the lexical method feature, which allows for block scoped methods unavailable outside the class. The underscore technique will serve us for now.

The _process_midi_events method awaits receiving a message, passing each message received to _filter_and_forward. The _filter_and_forward method retrieves callbacks for the current event type (The first element of the $event array) and delegates the event to the available callbacks. If no callbacks are available, or if none of the callbacks return true, the event is forwarded to the MIDI output device untouched.

The final pieces are the setup of MIDI devices and the communications channel:

    method _init_out {
        return $midi_out->open_port_by_name( qr/loopmidi/i )
            if ( grep { $^O eq $_ } qw/ MSWin32 cygwin / );

        $midi_out->open_virtual_port( 'Mister Fancy Pants' );
    }

    method go {
        my $midi_rtn = IO::Async::Routine->new(
            channels_out => [ $midi_ch ],
            code => sub {
                my $midi_in = RtMidiIn->new;
                $midi_in->open_port_by_name( qr/$input_name/i ) ||
                    die "Unable to open input device";

                $midi_in->set_callback_decoded(
                    sub( $ts, $msg, $event, $data ) {
                        $midi_ch->send( $event );
                    }
                );

                sleep;
            }
        );
        $loop->add( $midi_rtn );
        $loop->await( $self->_process_midi_events );
    }

    ADJUST {
        $self->_init_out;
    }

The _init_out method takes care of some shortcomings in Windows MIDI, which does not support the creation of virtual ports. On this platform messages will be routed via loopMIDI. On other platforms the virtual MIDI port “RtMidi Output Client:Mister Fancy Pants” is created. The ADJUST block ensures this is done during construction of the MidiFilter instance.

The go method creates a routine which instantiates a RtMidi instance, and connects to the hardware MIDI device specified on the command line. A callback is created to send incoming events over the communications channel, then we simply sleep and allow RtMidi’s event loop to take over the routine.

The final step is to await _process_midi_events, which should process events from the hardware until the program is terminated.

Writing Callbacks

Callbacks are responsible for managing the stash, and sending filtered messages to the output device. A callback receives the MidiFilter instance and the incoming event.

In order to implement the pedal tone feature described earlier, we need to take incoming “note on” events and transform them into three “note on” events, then send these to the output MIDI device. A similar filter is needed for “note off” — all three notes must be stopped after being played:

use constant PEDAL => 55; # G below middle C

sub pedal_notes( $note ) {
    ( PEDAL, $note, $note + 7 );
}

sub pedal_tone( $mf, $event ) {
    my ( $ev, $channel, $note, $vel ) = $event->@*;
    $mf->send( [ $ev, $channel, $_, $vel ] ) for pedal_notes( $note );
    true;
}

my $mf = MidiFilter->new;

$mf->add_filter( note_on  => \&pedal_tone );
$mf->add_filter( note_off => \&pedal_tone );

$mf->go;

We start by setting a constant containing a MIDI note value for the pedal tone. The sub pedal_notes returns this pedal tone, the played note, and its fifth. The callback function pedal_tone sends a MIDI message to output for each of the notes returned by pedal_notes. Note the callback yields true in order to prevent falling through to the default action. The callback function is applied to both the “note on” and “note off” events. We finish by calling the go method of our MidiFilter instance in order to await and process incoming messages from the keyboard.

The last step is to run the script:

$ ./midi-filter.pl ^oxy

Rather than specify a fully qualified device name, we can pass in a regex which should match any device whose name starts with “oxy” - there is only one match on my system, the Oxygen Pro.

The device “RtMidi Output Client:Mister Fancy Pants” or “loopMIDI”, depending on your platform, can now be opened in the DAW to receive played notes routed through the pedal tone filter. This filter is functionally equivalent to the FL Studio Patcher patch from earlier, with the added benefit of being DAW-agnostic. If recording a sequence from this setup, all notes will be shown in the piano roll.

Feature 2 : Controller Banks

The Oxygen Pro has four “banks” or sets of controls. Each bank can have different assignments or behaviour for the knobs, keys, sliders, and pads.

A problem with this feature is that there is limited feedback when switching banks - it’s not always visible on screen, depending on the last feature used. Switching banks does not affect the keyboard. Also, perhaps 4 banks isn’t enough.

A simpler version of this feature might be to use the pads to select the bank, where the bank just sets the MIDI channel for all future events. There are 16 pads on the device, one for each of the 16 channels. It should then be more obvious which bank (or channel) was selected last, and if not, you can just select it again.

This can also be applied to the keyboard by defining callbacks for “note on” and “note off” (or rather, modifying the existing ones). For this device, we also need callbacks for “pitch wheel change” and “channel aftertouch”. The callback for “control change” should handle the mod wheel without additional special treatment.

The pads on this device are set up to send notes on channel 10, usually reserved for drums. Watching for specific notes incoming on channel 10, and stashing the corresponding channel should be enough to allow other callbacks to route events appropriately:

sub set_channel( $mf, $event ) {
    my ( $ev, $channel, $note, $vel ) = $event->@*;
    return false unless $channel == 9;

    my $new_channel = $note - 36;
    $mf->stash( channel => $new_channel );
    true;
}

$mf->add_filter( note_on  => \&set_channel );
$mf->add_filter( note_on  => \&pedal_tone );
$mf->add_filter( note_off => \&set_channel );
$mf->add_filter( note_off => \&pedal_tone );

If the event channel sent to set_channel is not 10 (or rather 9, as we are working with zero-indexed values) we return false, allowing the filter to fall through to the next callback. Otherwise, the channel is stashed and we stop processing further callbacks. As the pad notes are numbered 36 to 51, the channel can be derived by subtracting 36 from the incoming note.

This callback needs to be applied to both “note on” and “note off” events — remember, there is an existing “note off” callback which will erroneously generate three “note off” events unless intercepted. The order of callbacks is also important. If pedal_tone were first, it would prevent set_channel from happening at all.

We can now retrieve the stashed channel in pedal_tone:

sub pedal_tone( $mf, $event ) {
    my ( $ev, $channel, $note, $vel ) = $event->@*;
    $channel = $mf->stash( 'channel' ) // $channel;
    $mf->send( [ $ev, $channel, $_, $vel ] ) for pedal_notes( $note );
    true;
}

The final piece of this feature is to route some additional event types to the selected channel:

sub route_to_channel( $mf, $event ) {
    my ( $ev, $channel, @params ) = $event->@*;
    $channel = $mf->stash( 'channel' ) // $channel;
    $mf->send( [ $ev, $channel, @params ] );
    true;
}

$mf->add_filter( pitch_wheel_change  => \&route_to_channel );
$mf->add_filter( control_change      => \&route_to_channel );
$mf->add_filter( channel_after_touch => \&route_to_channel );

We can now have different patches respond to different channels, and control each patch with the entire MIDI controller (except the pads, of course).

Pickup

You may have spotted a problem with the bank feature. Imagine we are on bank 1 and we set knob 1 to a low value. We then switch to bank 2, and turn knob 1 to a high value. When we switch back to bank 1 and turn the knob, the control will jump to the new high value.

A feature called “pickup” (or “pick up”) allows for bank switching by only engaging the control for knob 1, bank 1 when the knob passes its previous value. That is, the control only starts changing again when the knob goes beyond its previous low value.

Pickup could be implemented in our filters by stashing the last value for each control/channel combination. This would not account for knob/channel combinations which were never touched - large jumps in control changes would still be possible, with no way to prevent them. One would need to set initial values by tweaking all controls on all channels before beginning a performance.
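
For completeness, a naive version of that stash-based approach might look like the sketch below. It only works for controls the filter has already seen, as described above, and it will also swallow fast knob turns that jump more than a couple of values at once, which is part of what makes it half-baked:

use constant PICKUP_WINDOW => 2;

sub pickup( $mf, $event ) {
    my ( $ev, $channel, $controller, $value ) = $event->@*;
    $channel = $mf->stash( 'channel' ) // $channel;

    my $last = $mf->stash( 'cc_values' ) // $mf->stash( cc_values => {} );
    my $key  = "$channel:$controller";

    # Swallow the event until the knob gets close to where we left it
    return true
        if defined $last->{ $key }
        && abs( $value - $last->{ $key } ) > PICKUP_WINDOW;

    $last->{ $key } = $value;
    $mf->send( [ $ev, $channel, $controller, $value ] );
    true;
}

# $mf->add_filter( control_change => \&pickup );  # in place of route_to_channel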

Many DAWs and synths support pickup, and it is better handled there rather than implementing a half-baked and inconsistent solution here.

Feature 1a: Strum

So far we have not taken complete advantage of our event loop. You might remember we implemented a delay_send method which accepts a delay time alongside the event to be sent.

We can exploit this to add some expressiveness (of a somewhat robotic variety) to the pedal tone callback:

use constant STRUM_DELAY => 0.05; # seconds

sub pedal_tone( $mf, $event ) {
    my ( $ev, $channel, $note, $vel ) = $event->@*;
    $channel = $mf->stash( 'channel' ) // $channel;
    my @notes = pedal_notes( $note );

    $mf->send( [ $ev, $channel, shift @notes, $vel ] );

    my $delay_time = 0;
    for my $note ( @notes ) {
        $delay_time += STRUM_DELAY;
        $mf->delay_send( $delay_time, [ $ev, $channel, $note, $vel ] );
    }
    true;
}

We now store the notes and send the first immediately. Remaining notes are sent with an increasing delay. The delay_send method will schedule the notes and return immediately, allowing further events to be processed.

Scheduling the “note off” events is also a good idea. Imagine a very quick keypress on the keyboard. If the keyboard note off happens before we finish sending the scheduled notes, sending all “note off” events instantaneously would leave some scheduled notes ringing out. Scheduling “note off” events with the same cadence as the “note on” events should prevent this. That is, the same callback can continue to service both event types.

With that change, playing a single key at a time sounds like this:

Demo Patch

This VCV Rack patch should demonstrate the complete set of features built in this post. On the right is an additive voice which responds to MIDI channel 2. The mod wheel is patched to control feedback, which should influence the brightness of the sound.

The left side is a typical subtractive patch controlled by channel 3, with an envelope controlling a VCA and VCF to shape incoming sawtooths. The mod wheel is patched to allow a Low-Frequency Oscillator (LFO) to frequency modulate the VCO for a vibrato effect.

VCV Rack patch with FM OP controlled by channel 2 and a subtractive patch controlled by channel 3

This is what it sounds like - we first hear the additive patch on channel 2, then the subtractive one on channel 3. Switching channels is as simple as pushing the respective pad on the controller:

Not very exciting, I know — it’s just to demonstrate the principle.

Keen eyes may have spotted an issue with the bank switching callback. After switching to channel 10, playing keyboard keys which overlap with those assigned to the pads may dump you unexpectedly onto a different channel! I will leave resolving this as an exercise for the reader — perhaps one of the pads could be put to another use.

Latency

While I haven’t measured latency of this project specifically, previous experiments with async processing of MIDI events in Perl showed a latency of a fraction of a millisecond. I expect the system described in this post to have a similar profile.

Source Code

There is a gist with the complete source of the MidiFilter project.

It’s also included below:

#!/usr/bin/env perl

# There is currently an issue with native callbacks and threaded perls, which leads to a crash.
# As of Jan 2025, all the available pre-built perls I am aware of for Windows are threaded.
# I was able to work around this by building an unthreaded perl with cygwin / perlbrew... but
# you might want to just try this on Linux or Mac instead :)

use v5.40;
use experimental qw/ class /;

class MidiFilter {
    use IO::Async::Loop;
    use IO::Async::Channel;
    use IO::Async::Routine;
    use IO::Async::Timer::Countdown;
    use Future::AsyncAwait;
    use MIDI::RtMidi::FFI::Device;

    field $loop       = IO::Async::Loop->new;
    field $midi_ch    = IO::Async::Channel->new;
    field $midi_out   = RtMidiOut->new;
    field $input_name = $ARGV[0];
    field $filters    = {};
    field $stash      = {};

    method _init_out {
        return $midi_out->open_port_by_name( qr/loopmidi/i )
            if ( grep { $^O eq $_ } qw/ MSWin32 cygwin / );

        $midi_out->open_virtual_port( 'Mister Fancy Pants' );
    }

    method add_filter( $event_type, $action ) {
        push $filters->{ $event_type }->@*, $action;
    }

    method stash( $key, $value = undef ) {
        $stash->{ $key } = $value if defined $value;
        $stash->{ $key };
    }

    method send( $event ) {
        $midi_out->send_event( $event->@* );
    }

    method delay_send( $dt, $event ) {
        $loop->add(
            IO::Async::Timer::Countdown->new(
                delay => $dt,
                on_expire => sub { $self->send( $event ) }
            )->start
        )
    }

    method _filter_and_forward( $event ) {
        my $event_filters = $filters->{ $event->[0] } // [];

        for my $filter ( $event_filters->@* ) {
            return if $filter->( $self, $event );
        }

        $self->send( $event );
    }

    async method _process_midi_events {
        while ( my $event = await $midi_ch->recv ) {
            $self->_filter_and_forward( $event );
        }
    }

    method go {
        my $midi_rtn = IO::Async::Routine->new(
            channels_out => [ $midi_ch ],
            code => sub {
                my $midi_in = RtMidiIn->new;
                $midi_in->open_port_by_name( qr/$input_name/i ) ||
                    die "Unable to open input device";

                $midi_in->set_callback_decoded(
                    sub( $ts, $msg, $event, $data ) {
                        $midi_ch->send( $event );
                    }
                );

                sleep;
            }
        );
        $loop->add( $midi_rtn );
        $loop->await( $self->_process_midi_events );
    }

    ADJUST {
        $self->_init_out;
    }
}

use constant PEDAL => 55; # G below middle C
use constant STRUM_DELAY => 0.05; # seconds

sub pedal_notes( $note ) {
    ( PEDAL, $note, $note + 7 );
}

sub pedal_tone( $mf, $event ) {
    my ( $ev, $channel, $note, $vel ) = $event->@*;
    $channel = $mf->stash( 'channel' ) // $channel;
    my @notes = pedal_notes( $note );

    $mf->send( [ $ev, $channel, shift @notes, $vel ] );

    my $dt = 0;
    for my $note ( @notes ) {
        $dt += STRUM_DELAY;
        $mf->delay_send( $dt, [ $ev, $channel, $note, $vel ] );
    }
    true;
}

sub set_channel( $mf, $event ) {
    my ( $ev, $channel, $note, $vel ) = $event->@*;
    return false unless $channel == 9;

    my $new_channel = $note - 36;
    $mf->stash( channel => $new_channel );
    true;
}

sub route_to_channel( $mf, $event ) {
    my ( $ev, $channel, @params ) = $event->@*;
    $channel = $mf->stash( 'channel' ) // $channel;
    $mf->send( [ $ev, $channel, @params ] );
    true;
}

my $mf = MidiFilter->new;

$mf->add_filter( note_on  => \&set_channel );
$mf->add_filter( note_on  => \&pedal_tone );
$mf->add_filter( note_off => \&set_channel );
$mf->add_filter( note_off => \&pedal_tone );

$mf->add_filter( pitch_wheel_change  => \&route_to_channel );
$mf->add_filter( control_change      => \&route_to_channel );
$mf->add_filter( channel_after_touch => \&route_to_channel );

$mf->go;

BEGIN {
    $ENV{PERL_FUTURE_DEBUG} = true;
}

Conclusion

After describing some of the shortcomings of a given MIDI controller, and an approach for adding to a performance within a DAW, we walked through the implementation of a framework to proxy a MIDI controller’s facilities through software-defined filters.

The filters themselves are implemented as simple callbacks which may decide to store data for later use, change the parameters of the incoming message, forward new messages to the virtual hardware proxy device, and/or cede control to further callbacks in a chain.

Callbacks are attached to MIDI event types and a single callback function may be suitable to attach to multiple event types.

We took a look at some simple functionality to build upon the device — a filter which turns a single key played into a strummed chord with a pedal tone, and a bank-switcher which sets the channel of all further events from the hardware device.

These simple examples served to demonstrate the principle, but the practical limit to this approach is your own imagination. My imagination is limited, but some next steps might be to add “humanising” random fluctuations to sequences, or perhaps extending the system to combine the inputs of multiple hardware devices into one software-defined device with advanced and complex facilities. If your device has a DAW mode, you may be able to implement visual feedback for the actions and state of the virtual device. You could also coerce non-MIDI devices, e.g. Gamepads, into sending MIDI messages.

Proposed Perl Changes

Perl Hacks

Published by Dave Cross on Sunday 26 January 2025 16:36

“Many thanks to Dave Cross for providing an initial implementation of a PPC index page.” (Perl Steering Council meeting #177)

Maybe I should explain that in a little more detail. There’s a lot of detail, so it will take a couple of blog posts.

About two weeks ago, I got a message on Slack from Philippe Bruhat, a member of the Perl Steering Council. He asked if I would have time to look into building a simple static site based on the GitHub repo that stores the PPCs that are driving a lot of Perl’s development. The PSC thought that reading these important documents on a GitHub page wasn’t a great user experience and that turning it into a website might lead to more people reading the proposals and, hence, getting involved in discussions about them.

I guess they had thought of me as I’ve written a bit about GitHub Pages and GitHub Actions over the last few years and these were exactly the technologies that would be useful in this project. In fact, I have already created a website that fulfills a similar role for the PSC meeting minutes – and I know they know about that site because they’ve been maintaining it themselves for several months.

I was about to start working with a new client, but I had a spare day, so I said I’d be happy to help. And the following day, I set to work.

Reviewing the situation

I started by looking at what was in the repo.

All of these documents were in Markdown format. The PPCs seemed to have a pretty standardised format.

Setting a target

Next, I listed what would be essential parts of the new site.

  • An index page containing a list of the PPCs – which links to a page for each of the PPCs
  • The PPCs, converted to HTML
  • The other documents, also converted to HTML
  • The site should be automatically rebuilt whenever a change is made to any of the input files

This is exactly the kind of use case that a combination of GitHub Pages and GitHub Actions is perfect for. Perhaps it’s worth briefly describing what those two GitHub features are.

Introducing GitHub Pages

GitHub Pages is a way to run a website from a GitHub repo. The feature was initially introduced to make it easy to run a project website alongside your GitHub repo – with the files that make up the website being stored in the same repo as the rest of your code. But, as often happens with useful features, people have been using the feature for all sorts of websites. The only real restriction is that it only supports static sites – you cannot use GitHub’s servers to run any kind of back-end processing.

The simplest way to run a GitHub Pages website is to construct it manually, put the HTML, CSS and other files into a directory inside your repo called /docs, commit those files and go to the “Settings -> Pages” settings for your repo to turn on Pages for the repo. Within minutes your site will appear at the address USERNAME.github.io/REPONAME. Almost no-one uses that approach.

The most common approach is to use a static site builder to build your website. The most popular is Jekyll – which is baked into the GitHub Pages build/deploy cycle. You edit Markdown files and some config files. Then each time you commit a change to the repo, GitHub will automatically run Jekyll over your input files, generate your website and deploy that to its web servers. We’re not going to do that.

We’ll use the approach I’ve used for many GitHub Pages sites. We’ll use GitHub Actions to do the equivalent of the “running Jekyll over your input files to generate your website” step. This gives us more flexibility and, in particular, allows us to generate the website using Perl.

Introducing GitHub Actions

GitHub Actions is another feature that was introduced with one use case in mind but which has expanded to be used for an incredible range of ideas. It was originally intended for CI/CD – a replacement for systems like Jenkins or Travis CI – but that only accounts for about half of the things I use it for.

A GitHub Actions run starts in response to various triggers. You can then run pretty much any code you want on a virtual machine, generating useful reports, updating databases, releasing code or (as in this case) generating a website.

GitHub Actions is a huge subject (luckily, there’s a book!). We’re only going to touch on one particular way of using it. Our workflow will be:

  • Wait for a commit to the repo
  • Then regenerate the website
  • And publish it to the GitHub Pages web servers

Making a start

Let’s make a start on creating a GitHub Actions workflow to deal with this. Workflows are defined in YAML files that live in the .github/workflows directory in our repo. So I created the relevant directories and a file called buildsite.yml.

There will be various sections in this file. We’ll start simply by defining a name for this workflow:

name: Generate website

The next section tells GitHub when to trigger this workflow. We want to run it when a commit is pushed to the “main” branch. We’ll also add the “workflow_dispatch” trigger, which allows us to manually trigger the workflow – it adds a button to the workflow’s page inside the repo:

on:
  push:
    branches: 'main'
  workflow_dispatch:

The main part of the workflow definition is the next section – the one that defines the jobs and the individual steps within them. The start of that section looks like this:

jobs:
  build:
    runs-on: ubuntu-latest
    container: perl:latest

    steps:
    - name: Perl version
      run: perl -v

    - name: Checkout
      uses: actions/checkout@v4

The “build” there is the name of the first job. You can name jobs anything you like – well, anything that can be the name of a valid YAML key. We then define the working environment for this job – we’re using an Ubuntu virtual machine and, on that, we’re going to download and run the latest Perl container from the Docker Hub.

The first step isn’t strictly necessary, but I like to have a simple but useful step to ensure that everything is working. This one just prints the Perl version to the workflow log. The second step is one you’ll see in just about every GitHub Actions workflow. It uses a standard, prepackaged library (called an “action”) to clone the repo to the container.

The rest of this job will make much more sense once I’ve described the actual build process in my next post. But here it is for completeness:

    - name: Install pandoc and cpanm
      run: apt-get update && apt-get install -y pandoc cpanminus

    - name: Install modules
      run: |
        cpanm --installdeps --notest .

    - name: Get repo name into environment
      run: |
        echo "REPO_NAME=${GITHUB_REPOSITORY#$GITHUB_REPOSITORY_OWNER/}" >> $GITHUB_ENV

    - name: Create pages
      env:
        PERL5LIB: lib
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        mkdir -p web
        perl bin/build $REPO_NAME

    - name: Update pages artifact
      uses: actions/upload-pages-artifact@v3
      with:
        path: web/

Most of the magic (and all of the Perl – for those of you who were wondering) happens in the “Create pages” step. If you can’t wait until the next post, you can find the build program and the class it uses in the repo.

But for now, let’s skim over that and look at the final step in this job. That uses another pre-packaged action to build an artifact (which is just a tarball) which the next job will deploy to the GitHub Pages web server. You can pass it the name of a directory and it will build the artifact from that directory. So you can see that we’ll be building the web pages in the web/ directory.

The second (and final) job is the one that actually carries out the deployment. It looks like this:

  deploy:
    needs: build
    permissions:
      pages: write
      id-token: write
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

It uses another standard, pre-packaged action and most of the code here is configuration. One interesting line is the “needs” key. That tells the workflow engine that the “build” job needs to have completed successfully before this job can be run.

But once it has run, the contents of our web/ directory will be on the GitHub Pages web server and available for our adoring public to read.

All that is left is for us to write the steps that will generate the website. And that is what we’ll be covering in my next post.

Oh, and if you want to preview the site itself, it’s at https://davorg.dev/PPCs/ and there’s an active pull request to merge it into the main repo.

The post Proposed Perl Changes first appeared on Perl Hacks.

(dxxxii) 15 great CPAN modules released last week

Niceperl

Published by Unknown on Saturday 25 January 2025 23:32

Updates for great CPAN modules released last week. A module is considered great if its favorites count is greater than or equal to 12.

  1. App::DBBrowser - Browse SQLite/MySQL/PostgreSQL databases and their tables interactively.
    • Version: 2.422 on 2025-01-23, with 14 votes
    • Previous CPAN version: 2.421 was 14 days before
    • Author: KUERBIS
  2. App::Netdisco - An open source web-based network management tool.
    • Version: 2.081004 on 2025-01-19, with 17 votes
    • Previous CPAN version: 2.081003 was 19 days before
    • Author: OLIVER
  3. DBI - Database independent interface for Perl
    • Version: 1.647 on 2025-01-20, with 275 votes
    • Previous CPAN version: 1.646 was 9 days before
    • Author: HMBRAND
  4. Function::Parameters - define functions and methods with parameter lists ("subroutine signatures")
    • Version: 2.002005 on 2025-01-19, with 60 votes
    • Previous CPAN version: 2.002004 was 1 year, 6 months, 4 days before
    • Author: MAUKE
  5. Math::BigInt - Pure Perl module to test Math::BigInt with scalars
    • Version: 2.003004 on 2025-01-23, with 13 votes
    • Previous CPAN version: 2.003003 was 7 months, 27 days before
    • Author: PJACKLAM
  6. Module::CoreList - what modules shipped with versions of perl
    • Version: 5.20250120 on 2025-01-20, with 43 votes
    • Previous CPAN version: 5.20241220 was 1 month before
    • Author: BINGOS
  7. Net::Curl - Perl interface for libcurl
    • Version: 0.57 on 2025-01-22, with 18 votes
    • Previous CPAN version: 0.56 was 9 months, 21 days before
    • Author: SYP
  8. PDL - Perl Data Language
    • Version: 2.099 on 2025-01-23, with 57 votes
    • Previous CPAN version: 2.098 was 20 days before
    • Author: ETJ
  9. perl - The Perl 5 language interpreter
    • Version: 5.040001 on 2025-01-18, with 427 votes
    • Previous CPAN version: 5.40.1 was 2 days before
    • Author: SHAY
  10. Spreadsheet::ParseXLSX - parse XLSX files
    • Version: 0.36 on 2025-01-24, with 19 votes
    • Previous CPAN version: 0.35 was 10 months, 5 days before
    • Author: NUDDLEGG
  11. SPVM - The SPVM Language
    • Version: 0.990042 on 2025-01-22, with 34 votes
    • Previous CPAN version: 0.990039 was 5 days before
    • Author: KIMOTO
  12. Syntax::Construct - Explicitly state which non-feature constructs are used in the code.
    • Version: 1.040 on 2025-01-20, with 14 votes
    • Previous CPAN version: 1.038 was 3 months, 19 days before
    • Author: CHOROBA
  13. Test::Simple - Basic utilities for writing tests.
    • Version: 1.302209 on 2025-01-22, with 190 votes
    • Previous CPAN version: 1.302207 was 25 days before
    • Author: EXODIST
  14. Test2::Harness - A new and improved test harness with better Test2 integration.
    • Version: 1.000156 on 2025-01-22, with 17 votes
    • Previous CPAN version: 1.000155 was 1 year, 3 months, 19 days before
    • Author: EXODIST
  15. YAML::PP - YAML 1.2 Processor
    • Version: v0.38.1 on 2025-01-24, with 17 votes
    • Previous CPAN version: v0.38.0 was 11 months, 26 days before
    • Author: TINITA