paulgorman.org

< ^ txt

Sat Jan 1 06:00:01 EST 2022 ======================================== Slept from eleven-thirty to six-thirty, without waking. Cloudy. Chance of rain showers and snow showers until late afternoon, then snow showers, possibly mixed with rain showers early in the evening. Accumulations up to half an inch. Highs in the upper 30s. North winds 5 to 10 mph increasing to 10 to 15 mph with gusts to around 30 mph in the late morning and afternoon. Chance of precipitation 70 percent. Happy new year (hopefully, a much better year than the last couple)! https://willdemaine.ghost.io/grep-from-first-principles-in-golang/ > To understand how regular expressions work, we first need to understand what a regular language is. The definition of a regular language is often given as "any language which can be expressed with a regular expression", which sounds like a tautology to me. The actual formal langauge theory here is a bit esoteric, so I'm going to instead try and demonstrate regular and non-regular languages with a few examples. > > Given any alphabet, say the latin alphabet, or maybe just one which constists of two symbols {a,b}, some regular languages are: > > * The empty string "" > * The symbol a or the symbol b > * Other langauges combined, e.g. aa or ab. > * Any language 0 or more times, like abab > > These simple examples consitute the base cases of regular languages. From those, you can build up some slightly more interesting things like: > > * "Any string with an even number of a's" -> (aa)* > * "One or more a, followed by one or more bs -> aabb (or a+b+) > > To understand where the limits of regular languages lie, let's look at some examples of languages which are not regular. If I asked you to write me a regular expression which matched any valid Go program, you'd rightfully tell me to get stuffed, but what is it about a Go program that is inherently more complex than the languages we defined above? > > Basically (and this is a massive over-simplification), it comes down to memory. For a language to be regular, we need to be able to recognise (i.e. check its validity) it without storing anything in memory. To properly understand a Go program, we need to keep track of state as we're parsing it to check if the next token we're looking at is valid. With regular expressions, we have a current state, but we don't have any way to remember things (in fact, this is why modern regular expression engines which support things like backracking can parse languages which are not formally regular). > > To demonstrate this, consider the language defined as "3 a's followed by 3 b's" which we could represent as aaabbb or a{3}b{3}. This is a totally valid regular language (and therefore a valid regular expression). But what if I said we wanted a language which was "N a's followed by N b's"? The most obvious step of inference would be to write a{N}b{N}. If you give me a value for N, I can generate an expression which matches a language, but importantly we can't have a single regular language which matches for every value of N. > > So why not? Well, to do that, we'd need some memory. We'd have to count all the a's to discover our N, record that number somewhere, and then make sure there are the same number of b's. > > Another canonical example of something a regular expression can't do is to test whether a string has 'balanced brackets', e.g. match (()(())) but not ()((). To do that, we'd need to keep track of every time we saw an opening bracket to ensure that we closed it. And with no memory, no keeping track. Watched episodes of the British reality series Portrait Artist of The Year. Nap in the afternoon. Read the second volume of Witch Hat Atelier. Servings: grains 3/6, fruit 1/4, vegetables 2/4, dairy 0/2, meat 2/3, nuts 0/0.5 Breakfast: potato chips Brunch: banana, spaghetti with chicken Dinner: green tea, carrots, potato chips

< ^ txt