Using AI, Creating Regular Expressions in TiddlyWiki Is Easy

I have never gone beyond basic regular expressions, even though they are very powerful.

Here is a tested procedure using AI.
STEP 1

  1. Open Regexer
  2. Describe clearly what you want
  3. Press create

STEP 2
4. Check the test section and make sure the created pattern works as expected.
5. To be one hundred percent sure:
5.1. do some experiments on regex101 (build, test, and debug regex)
5.2. and write some test cases in your TiddlyWiki

STEP 3
6. Use it in your TiddlyWiki

IMPORTANT NOTES:

  1. I tested a few AI sites that generate regexes. Most produce incorrect patterns when you ask a complex question, but Regexer works well in most cases.
  2. NEVER trust an AI answer before carefully testing it. Most of the time AI gives inaccurate answers.
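
As a rough illustration of the testing step, here is a minimal table-driven check in Python. TiddlyWiki itself uses JavaScript regular expressions, so this is only a sketch of the idea, not something to paste into a wiki. The date pattern and the helper name `failing_cases` are hypothetical and stand in for whatever pattern the tool generated.

```python
import re

def failing_cases(pattern, cases, flags=0):
    """Return the (text, expected) pairs where the pattern disagrees with the expectation."""
    compiled = re.compile(pattern, flags)
    return [
        (text, expected)
        for text, expected in cases
        if (compiled.search(text) is not None) != expected
    ]

# Hypothetical generated pattern meant to find dates such as 2024-01-31.
DATE_PATTERN = r"^\d{4}-\d{2}-\d{2}$"

# Each case pairs an input with whether we WANT it to match.
cases = [
    ("2024-01-31", True),
    ("2024-1-31", False),
    ("2024-13-45", False),  # right shape, impossible month and day
]

print(failing_cases(DATE_PATTERN, cases))
```

The last case is reported as a failure: the pattern only checks the shape of the text, so it accepts an impossible date. Edge cases like this are the ones worth writing down before trusting a generated pattern.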

Example 1:
Use regular expression to extract all valid emails from a tiddler.

STEP 1.

  1. Open Regexer
  2. In the "create a regular text that" box, enter: finds valid emails
  3. Press enter

STEP 2.
4. Check the test section and make sure the created pattern works as expected.
5. To be one hundred percent sure:
5.1. do some experiments on regex101 (build, test, and debug regex)
5.2. and write some test cases in your TiddlyWiki

STEP 3.
Create a procedure that takes a tiddler's text and extracts the valid emails. For example:

\procedure extract-email(src)
<$let emailpattern="^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" >
<$list filter="[<src>splitregexp[\s]search:title:regexp<emailpattern>]" variable=email>
<$text text=<<email>>/><br/>
</$list>
</$let>
\end

Give it a try on https://tiddlywiki.com
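
For readers who want to check the logic outside TiddlyWiki, the procedure above can be approximated in Python: split the text on whitespace, then keep the tokens that match the whole pattern. This is a sketch under the assumption that Python's `re` module and the JavaScript engine used by TiddlyWiki agree on the constructs in this pattern; the function name `extract_emails` and the sample text are made up for illustration.

```python
import re

# Same pattern as in the procedure above.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def extract_emails(text):
    """Split on single whitespace characters and keep tokens matching the anchored pattern."""
    return [token for token in re.split(r"\s", text) if EMAIL_PATTERN.search(token)]

sample = (
    "Write to alice@example.com or bob.smith+wiki@mail.example.org today, "
    "not to carol@localhost and not to dave@example.com, please"
)
print(extract_emails(sample))
```

Because the pattern is anchored with `^` and `$`, a token such as `dave@example.com,` is dropped: the trailing comma is part of the token after splitting on whitespace. If addresses in your notes are often followed by punctuation, that is a case to test.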


I have several concerns here. But the biggest one is whether you understand the code that’s being generated.

For instance, do you know why the regex generated by this tool does not accept the legal email, a~b@example.com? If you do understand why, and choose to accept the restriction, fine: exact regexes for email are notoriously difficult. If you don’t understand it, though, what are you going to do when things break?
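
A small Python sketch can make the reason visible: the first character class in the generated pattern lists the only characters it allows before the `@`, and `~` is not among them. The helper name `unsupported_local_chars` is hypothetical, and Python's `re` is assumed to behave like TiddlyWiki's JavaScript engine for this pattern.

```python
import re

# The generated pattern from the original post.
GENERATED = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

# The character class the pattern uses for the part before the "@".
LOCAL_PART_CLASS = re.compile(r"[a-zA-Z0-9_.+-]")

def unsupported_local_chars(address):
    """List the characters before the first '@' that the pattern's class does not allow."""
    local = address.split("@", 1)[0]
    return [ch for ch in local if not LOCAL_PART_CLASS.fullmatch(ch)]

print(GENERATED.search("a~b@example.com") is None)  # the address is rejected
print(unsupported_local_chars("a~b@example.com"))   # the character that causes it
print(unsupported_local_chars("a_b@example.com"))   # nothing to complain about
```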

Well yes, your concern is valid. AI, like any software people use in technical work, produces results, but it is the user's job to make sure the answer is correct!

The AI is an assistant here. In step 2, I explained how to run tests to make sure the solution works for your cases.
You can use regex101 or similar tools to explain the generated pattern. If you have edge cases (like the one you mentioned), you can see whether it handles them or not!
In my opinion it is a good help, especially for patterns that are not complex.

If you haven’t yet seen it in this forum, you should know that I’m a big sceptic of the current crop of “AI”.

When it comes to writing code, I’m already seeing the result from our recent hires at GigantiCorp… and from some more experienced developers who really should know better.

We’re getting a lot of code that works for 75 - 90% of the scenarios it has to cover, and falls down miserably for the rest. Much of this is created by generative AI tools. The problem arises when the AI user doesn’t understand the generated code.

You offer tests as a solution. The trouble is, how do you write good tests if you don’t understand the possibilities in great detail?

I know of only two ways to write good tests. One is to do Test-Driven Development (TDD), writing the code alongside the tests, adding more and more sophisticated tests as you notice potential weaknesses in the code. The other is to do Property-based Testing (PBT), which lets you dictate generic properties which the code must pass and then allow a random input-generator to sniff around the expected inputs and many edge cases.

TDD requires you to understand the code in detail. PBT requires you to understand the data domain in detail. The first won’t work if you’re using generative AI to do anything besides offering a very rough first draft. The second might work, but if you already have such a detailed understanding of the model, and you know the code techniques to implement it, then why bother with the AI at all? If you don’t know the code techniques, then who will maintain that code?
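
To make the property-based idea concrete, here is a hand-rolled sketch in Python that uses only the standard library; a real project would more likely use a dedicated library such as Hypothesis. The property checked is "every syntactically valid address is accepted". The generator is an assumption of this sketch: it builds local parts from the characters RFC 5322 allows in an unquoted local part ("atext") and uses a plain lowercase label with `.com` as the domain.

```python
import random
import re
import string

# The generated pattern from the original post.
GENERATED = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

# RFC 5322 "atext": characters allowed in an unquoted local part.
ATEXT = string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~"

def random_valid_address(rng):
    """Build a short, syntactically valid address such as 'x#9@abc.com'."""
    local = "".join(rng.choice(ATEXT) for _ in range(rng.randint(1, 12)))
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(1, 10)))
    return f"{local}@{label}.com"

def find_counterexamples(trials=500, seed=0):
    """Return generated valid addresses that the pattern rejects."""
    rng = random.Random(seed)
    candidates = (random_valid_address(rng) for _ in range(trials))
    return [address for address in candidates if not GENERATED.search(address)]

failures = find_counterexamples()
print(len(failures), "valid addresses rejected, for example:", failures[:3])
```

Writing the generator is where the domain knowledge goes, which is the point made above: the random search is only as good as the description of what counts as valid input.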

I would rather not be negative. But I can see no good coming out of generative AI, except in the very narrow range of providing a list of options for further considerations.

For this specific case, do you have any sense of what sorts of legal email addresses it will reject? (There are many!) Do you have any sense of what sorts of illegal email addresses it will accept? (As far as I can tell, only those with parts too long, but I haven’t looked closely.)


Thank you for your detailed explanation! I really enjoyed reading it! Your precise technical approach to the problem is appreciated.
I confirm your concern. You are absolutely correct.

The issue with AI is much worse when it comes to education and research, when students and researchers ask generative AI to solve their problems and homework or write their term papers! Yes, generative AI at its current level is not trustworthy!

Still, with all the above shortcomings, for the case I gave in the OP, I think it is a good help! From the pattern one can see that it validates common emails, and that is enough for my personal notebook!

I also use Copilot from Microsoft, and I am happy with it! For example, I take a screenshot of the equations in an old math book (using my cell phone) and give it to Copilot. It produces the TeX equivalent and I just paste the TeX code into TiddlyWiki. It takes me some time to check equation by equation and symbol by symbol to make sure the TeX equivalent is correct! I never trust it, but it saves me a huge amount of time. For code, I ask Copilot to document my code well! It does the work and I just read through it and make sure everything is correct! So with the current state of AI, I believe it is a very good assistant, if we use it in the right way!


That strikes me as exactly the sort of task Generative AI is useful for: giving a reasonable first draft.

Anyone who uses that as a final draft deserves the scorn they’ll likely face.

I think AI was a good help with your e-mail regexp for search. But:

It does not validate. For example, it will show name@example..com, which is OK (and probably expected) for a search, but two consecutive dots are not valid. If you want to validate an e-mail address, more checks are needed.

I did not find a regexp that covers everything: one that eliminates invalid addresses without also eliminating valid ones along the way.

With regexp, there are always compromises between maintainability and correctness. Usually, more external tests have to be done in addition to the regexp filter.
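
The difference between searching and validating can be sketched in Python as follows. The pattern is the generated one from the original post; the helper `passes_extra_checks` and its particular set of checks (no consecutive dots, plus the 64 and 254 character limits that appear in the test data of this thread) are assumptions chosen for illustration, not a complete validator.

```python
import re

# The generated pattern from the original post.
GENERATED = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def found_by_search(address):
    """True if the generated pattern matches, which is what the search filter uses."""
    return GENERATED.search(address) is not None

def passes_extra_checks(address):
    """Regex match plus a few procedural checks the pattern does not enforce."""
    if not found_by_search(address):
        return False
    local = address.rpartition("@")[0]
    return ".." not in address and len(local) <= 64 and len(address) <= 254

for address in ["name@example.com", "name@example..com"]:
    print(address, found_by_search(address), passes_extra_checks(address))
```

Keeping such checks in ordinary code next to a simple pattern is one way to trade a little extra logic for a regexp that stays readable.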

Test data and regexp are provided by RegexBuddy 4, which I always use whenever regexp is a topic.

There are some valid addresses which I personally would always block, for example an IP address instead of a domain name.

Valid addresses:
================
president@whitehouse.gov
ip@1.2.3.123
pharaoh@egyptian.museum
john.doe+regexbuddy@gmail.com
Mike.O'Dell@ireland.com
"Mike\ O'Dell"@ireland.com
IPguy@[1.2.3.4]
The email address president@whitehouse.gov is valid.
fabio@disapproved.solutions has a long TLD
fabio@email.validating.solutions

Invalid addresses:
==================
1024x768@60Hz
not.a.valid.email
invalid@ifon.nonexistingtld
john@aol...com
Mike\ O'Dell@ireland.com
joe@a_domain_name_with_more_than_sixty-four_characters_is_invalid_6465.com
a_local_part_with_more_than_sixty-four_characters_is_invalid_6465@mail.com
the_total_length_of_an_email_address_is_limited@two-hundred-fifty-four-characters.because-the-SMTP-protocol-for-sending-email.does-not-support-more-than-that.really-hard-to-come-up-with-a-bogus-address-as-long-as-this.still-not-long-enough.too-long-now.com

Possible regexp for (limited) validation. More tests are recommended.

regexp: ^(?=[A-Z0-9][A-Z0-9@._%+-]{5,253}$)[A-Z0-9._%+-]{1,64}@(?:[A-Z0-9-]{1,63}\.)+[A-Z]{2,63}$

Description
Email address 4 (anchored; limit total length and length of each part)
Use this anchored version to check whether a valid email address was entered when you cannot use procedural code to check the length of the input.
This regex makes two passes over the string to enforce limits on the overall length of the email address as well as the length of each part.
Does not match email addresses using an IP address instead of a domain name.
Requires the “case insensitive” option to be ON.
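
As a quick cross-check, the pattern above can be run against part of the test data in Python. This is a sketch assuming Python's `re` treats the lookahead and character classes the same way as the JavaScript engine in TiddlyWiki; the two addresses built from repeated `a` characters are made up here to probe the 64-character limit.

```python
import re

# Pattern quoted above, compiled case-insensitively as its description requires.
LIMITED = re.compile(
    r"^(?=[A-Z0-9][A-Z0-9@._%+-]{5,253}$)"
    r"[A-Z0-9._%+-]{1,64}@(?:[A-Z0-9-]{1,63}\.)+[A-Z]{2,63}$",
    re.IGNORECASE,
)

def accepted(address):
    return LIMITED.search(address) is not None

listed_valid = [
    "president@whitehouse.gov", "ip@1.2.3.123", "pharaoh@egyptian.museum",
    "john.doe+regexbuddy@gmail.com", "Mike.O'Dell@ireland.com", "IPguy@[1.2.3.4]",
]
listed_invalid = [
    "1024x768@60Hz", "not.a.valid.email", "invalid@ifon.nonexistingtld",
    "john@aol...com", "a" * 65 + "@mail.com",
]

for address in listed_valid + listed_invalid + ["a" * 64 + "@mail.com"]:
    print(f"{accepted(address)!s:5} {address[:40]}")
```

Two things stand out when reading the results. Valid but unusual addresses (an IP address as domain, an apostrophe in the local part) are rejected, which matches the stated limitations. And `invalid@ifon.nonexistingtld` is accepted, because a pattern can check the shape of a top-level domain but not whether it exists.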

hope that helps
-mario

PS: I am not related to RegexBuddy 4 or RegexMagic – I’m just a satisfied customer.
PPS: They also maintain the https://www.regular-expressions.info/ page, which I used to learn regexp.

Thank you Mario for your detailed review of different scenarios! Yes, email addresses have very different complex formats and it is difficult to cover all of them.

This is one of the better ones I’ve seen, capturing the length requirements well. But it also has some false positives. It properly accepts “scott@sau-yet.com” and “scott@sa-uy-et.com”, but it also improperly accepts “scott@sau---yet.com”, since domain names cannot have consecutive hyphens.
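
One possible way to tighten the domain labels is sketched below in Python. The `STRICTER` pattern is a hypothetical variant written for this illustration, not something produced by RegexBuddy: each label must start and end with a letter or digit and may contain only single hyphens, while a lookahead keeps the 63-character label limit. A known trade-off is that it also rejects internationalized domain labels, which start with `xn--`.

```python
import re

# RegexBuddy pattern quoted earlier in the thread.
LIMITED = re.compile(
    r"^(?=[A-Z0-9][A-Z0-9@._%+-]{5,253}$)"
    r"[A-Z0-9._%+-]{1,64}@(?:[A-Z0-9-]{1,63}\.)+[A-Z]{2,63}$",
    re.IGNORECASE,
)

# Hypothetical label: 1-63 characters, no leading, trailing, or doubled hyphen.
LABEL = r"(?=[A-Z0-9-]{1,63}\.)[A-Z0-9]+(?:-[A-Z0-9]+)*"

STRICTER = re.compile(
    r"^(?=[A-Z0-9][A-Z0-9@._%+-]{5,253}$)"
    r"[A-Z0-9._%+-]{1,64}@(?:" + LABEL + r"\.)+[A-Z]{2,63}$",
    re.IGNORECASE,
)

for address in ["scott@sau-yet.com", "scott@sa-uy-et.com", "scott@sau---yet.com"]:
    print(address, bool(LIMITED.search(address)), bool(STRICTER.search(address)))
```

Whether the extra strictness is worth the longer pattern is the same maintainability versus correctness compromise discussed above.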

As noted, there is simply no perfect regex email validator, and you’ll have to choose what exactly you want to cover.


Me too, for many years. I also use the same author’s full-scale “PowerGrep”. All three apps also teach you regex while you develop the expressions.

Given the heavy use of regex in TW, it is worth naming such complementary tools.