A little about regular expression syntax in the Apache .htaccess file

18.03.2011

This is some of the most important material, so I recommend paying special attention to it. Any printable character, including a space, can be used in a regular expression, but some characters have a special meaning.

  • Parentheses () group characters together. The text matched by each group can later be referenced as $1, $2, and so on.

  • The ^ symbol marks the beginning of a line.

  • The $ symbol marks the end of a line.

  • The . symbol stands for any single character.

  • The | symbol denotes an alternative. For example, the expressions "A|B" and "(ABC|DEF)" mean "A or B" and "ABC or DEF", respectively.

  • The ? symbol is placed after a character (or group of characters) that may or may not be present. For example, the expression "jpe?g" matches both the string "jpg" and the string "jpeg". An example of an expression with a group of characters: "super-(puper-)?site".

  • The * character is placed after a character (or group of characters) that may be absent or present an unlimited number of times in a row. For example, the expression "jpe*g" would match the strings "jpg", "jpeg", and "jpeeeeeeg".

  • The + symbol acts like the * symbol, with the only difference that the character (or group) preceding it must be present at least once. For example, the expression "jpe+g" would match the strings "jpeg" and "jpeeeeg", but not "jpg".

  • Square brackets [] are used to enumerate valid characters. For example, the expression "[abc]" is equivalent to the expression "a|b|c", but the bracketed version is usually faster. Ranges can be used inside brackets: for example, the expression "[0-9]" is equivalent to the expression "[0123456789]". If the characters inside the square brackets begin with a ^, the expression means any character other than those listed in the brackets. For example, the expression "[^0-9]+" means a string of one or more characters other than digits.

  • The \ character is placed before special characters if they are needed in their literal form. For example, the expression "jpe\+g" matches only the string "jpe+g".

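  • In the .htaccess file itself, a line that begins with the '#' character is considered a comment. A comment must be on its own line; it cannot follow a directive on the same line.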
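The examples from the list above can be checked outside Apache. The following is a minimal sketch in Python, assuming that Python's re module treats these basic constructs the same way as the regular expression engine Apache uses. The helper name matches and the CASES table are illustrative and not part of any Apache tooling.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern (like ^pattern$)."""
    return re.fullmatch(pattern, text) is not None


# (pattern, candidate string, expected result), taken from the list above.
CASES = [
    (r"jpe?g", "jpg", True),                       # ? : optional character
    (r"jpe?g", "jpeg", True),
    (r"jpe?g", "jpeeg", False),
    (r"super-(puper-)?site", "super-site", True),  # ? applied to a group
    (r"super-(puper-)?site", "super-puper-site", True),
    (r"jpe*g", "jpg", True),                       # * : zero or more
    (r"jpe*g", "jpeeeeeeg", True),
    (r"jpe+g", "jpg", False),                      # + : one or more
    (r"jpe+g", "jpeeeeg", True),
    (r"ABC|DEF", "DEF", True),                     # | : alternative
    (r"[abc]", "b", True),                         # [] : character list
    (r"[0-9]", "7", True),                         # range inside brackets
    (r"[^0-9]+", "abc-def", True),                 # ^ inside brackets negates
    (r"[^0-9]+", "abc1", False),
    (r"jpe\+g", "jpe+g", True),                    # \ : literal special char
    (r"jpe\+g", "jpeeg", False),
]

for pattern, text, expected in CASES:
    result = matches(pattern, text)
    print(f"{pattern!r:28} {text!r:20} -> {result} (expected {expected})")
```

Each printed result should agree with the expected value next to it. If one does not, the pattern behaves differently from the description in the list.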

An example of a fragment of a .htaccess file for human-readable (clean) URLs

  • AddDefaultCharset utf-8
  • Options +FollowSymLinks
  • RewriteEngine On
  • RewriteBase /
  • RewriteRule ^([a-zA-Z0-9_-]+)/*$ index.php/$1
  • RewriteRule ^(article|news)/(.+).html$ index.php/$1?view=$1&id=$2
  • RewriteRule ^(book)/(.+).html$ index.php/$1?view=$1&book_id=$2
  • RewriteRule ^(catalog)/(.+).htm$ index.php/$1?view=$1&id=$2
  • RewriteRule ^(catalog)/(.+).html$ index.php/$1?view=$1&parent_id=$2
  • RewriteRule ^(catalog)/(.+).html/([0-9]+)$ index.php/$1?view=$1&parent_id=$2&step=$3
  • RewriteRule ^(search)/(.+)$ index.php/$1?view=$1&text=$2
  • RewriteRule ^([a-zA-Z0-9_-]+)/([0-9]+)$ index.php/$1?view=$1&step=$2
  • RewriteCond %{REQUEST_FILENAME} !-f
  • RewriteCond %{REQUEST_FILENAME} !-d
  • RewriteCond %{REQUEST_URI} !^/index.php
  • RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ [NC]
  • RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

AddDefaultCharset utf-8 - Sets the default character set to UTF-8.

Options +FollowSymLinks - Allows the server to follow symbolic links.

RewriteEngine On - Turns on the URL rewriting engine.

RewriteBase / - Sets the base path of the site to / (many poorly configured hosts add an extra index.php/ to the path, which breaks the site).

RewriteRule ^([a-zA-Z0-9_-]+)/*$ index.php/$1 - If the server sees a path like news or news/, where news can be any combination of Latin letters, digits, dashes and underscores, optionally followed by one or more slashes, it replaces the path with index.php/news. The browser's address bar still shows the first form, while the server processes the second.
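The effect of this rule can be imitated outside Apache. Below is a minimal sketch in Python that applies the same pattern and substitution to a few paths. It assumes the pattern behaves the same way in Python's re module as in Apache, and it does not model anything else mod_rewrite does (flags, conditions, or the rules that follow). The names SECTION_RULE and rewrite_section are illustrative.

```python
import re

# The pattern from the rule above, unchanged.
SECTION_RULE = re.compile(r"^([a-zA-Z0-9_-]+)/*$")


def rewrite_section(path):
    """Return the rewritten path, or the path unchanged if the rule does not match."""
    match = SECTION_RULE.match(path)
    if match is None:
        return path
    # match.group(1) plays the role of $1 in the RewriteRule target.
    return "index.php/" + match.group(1)


for path in ["news", "news/", "news///", "news/2", "index.php"]:
    print(f"{path!r} -> {rewrite_section(path)!r}")
```

The paths news/2 and index.php are left unchanged: the first contains a second path segment, and the second contains a dot, which is not in the allowed character list.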

RewriteRule ^(catalog)/(.+).html/([0-9]+)$ index.php/$1?view=$1&parent_id=$2&step=$3 - Replaces a string like catalog/whatever.html/2 with the string index.php/catalog?view=catalog&parent_id=whatever&step=2. Here $1 is the first expression in parentheses and will always be equal to the string catalog. $2 is the second expression in parentheses and can consist of any letters and symbols, including Cyrillic and any other printable UTF-8 characters, up to the final .html. $3 is the third expression in parentheses and consists only of digits. Note that, as written, .+ also matches slashes (use [^/]+ to exclude them), and the dot in .html is not escaped, so it matches any single character; writing \.html matches only a literal dot.
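To see how the three groups are filled in, the rule can be tried on sample paths. This is a minimal sketch in Python under the same assumption as before, namely that the pattern behaves in Python's re module as it does in Apache. The names CATALOG_RULE and rewrite_catalog are illustrative.

```python
import re

# The pattern from the rule above, unchanged (the dot before "html" is unescaped).
CATALOG_RULE = re.compile(r"^(catalog)/(.+).html/([0-9]+)$")


def rewrite_catalog(path):
    """Return the rewritten path, or None if the rule does not match."""
    match = CATALOG_RULE.match(path)
    if match is None:
        return None
    # The three groups correspond to $1, $2 and $3 in the RewriteRule target.
    section, parent_id, step = match.groups()
    return f"index.php/{section}?view={section}&parent_id={parent_id}&step={step}"


for path in [
    "catalog/whatever.html/2",   # the example from the text
    "catalog/a/b.html/3",        # .+ also accepts a slash inside $2
    "catalog/whateverXhtml/2",   # the unescaped dot accepts any character
    "catalog/whatever.html",     # no step number, so this rule does not apply
]:
    print(f"{path!r} -> {rewrite_catalog(path)!r}")
```

The second and third paths show why a stricter pattern such as [^/]+ for the item name and \.html for the extension may be preferable.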

And so on.
