Manual of PHP

 
 
 

Regular expressions in PHP

A regular expression is a series of characters that forms a pattern, which can be compared with another group of characters.

Regular expressions are series of characters that form a pattern, normally standing for a larger group of characters, so that we can compare the pattern with another set of characters and look for matches.

Regular expressions are available in almost every programming language, but although their syntax is relatively uniform, each language uses its own dialect.

If this is the first time you have come across regular expressions (regex for short), you will be glad to know that you have almost certainly used them already, even without knowing it, at least in their most basic form. For example, when we run dir *.* in a DOS window to list all the files in a directory, we are using the idea behind regular expressions, where the pattern * matches any string of characters.

A few simplified examples:

<?
am // this is our pattern. If we compare it with:
am // it matches
panorama // it matches
ambition // it matches
camp // it matches
hand // it does not match
?>

The idea is simply to compare a pattern (in this example, the letter sequence 'am') with a string (the subject) and see whether the same sequence appears inside it. If it does, we say that we have found a match.

Another example:

pattern: el
el ala aleve del leve abanico
(the sequence "el" is found twice: in the word "el" and inside "del")

So far the examples have been simple, since the patterns used were literals, that is, we only find a match when there is an exact occurrence.

If we know in advance the exact string we are looking for, there is no need to struggle with a complicated pattern: we can use the exact string as the pattern, and that string and no other will be the match. So, if we are looking for the data of the user pepe in a list of names, we can use pepe as the pattern. But if, besides pepe, we also want to find occurrences of pepa and pepito, literals are not enough.

The power of regular expressions lies precisely in the flexibility of patterns, which can be matched against any word or text string that has a known structure.

In fact, it is normally not necessary to use regular expression functions if we are only going to use literal patterns. There are other functions (the string functions) that work faster and more efficiently with literals.
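As a rough illustration of the last two points, the sketch below compares a literal search with strpos(), one of PHP's standard string functions, against a pattern search with preg_match(), one of the PCRE functions covered later in this manual. The list of names and the variable names are made up for the example.

```php
<?php
// Hypothetical sample data, not part of the original article.
$names = array('pepe', 'pepa', 'pepito', 'juan');

// Literal search: strpos() returns the position of the substring, or false.
$literal = array();
foreach ($names as $name) {
    if (strpos($name, 'pepe') !== false) {
        $literal[] = $name;
    }
}

// Pattern search: "pep" followed by "e", "a" or "ito".
$byPattern = array();
foreach ($names as $name) {
    if (preg_match('/^pep(e|a|ito)$/', $name) === 1) {
        $byPattern[] = $name;
    }
}

print_r($literal);   // only "pepe"
print_r($byPattern); // "pepe", "pepa" and "pepito"
```

The literal search is enough for pepe alone; the pattern is only needed once several variants have to be found.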

Characters and metacharacters

Our pattern can be made up of a set of characters (a group of letters, numbers or signs) or of metacharacters, which stand for other characters or allow a contextual search.

Metacharacters get their name because they do not represent themselves; instead they are interpreted in a special way.

Here is the list of the most commonly used metacharacters:

. * ? + [] () {} ^ $ |

We will look at how they are used, grouping them by purpose.

Position metacharacters, or anchors

The signs ^ and $ indicate where our pattern must be located inside the string for a match to exist.

When we use the sign ^ we mean that the pattern must appear at the beginning of the string being compared. When we use the sign $ we indicate that the pattern must appear at the end of the string or, more exactly, just before a newline character at the end of the string. Like this:

<?

^am // our pattern
am // it matches
cama // it does not match
ambidextrous // it matches
Pam // it does not match
caramba // it does not match

am$
am // it matches
salam // it matches
amber // it does not match
Pam // it matches

^am$
am // it matches
salam // it does not match
amber // it does not match

?>

or, as in the previous example:

pattern: ^el
el ala aleve del leve abanico
(only the "el" at the very beginning matches)

Regular expressions that use anchors return only one occurrence since, for example, only one sequence can exist at the beginning of the string.

pattern: el$
el ala aleve del leve abanico

And here we find none, since in the string being compared (the line, in this case) the pattern "el" is not located at the end.
To show a match in this example, we would have to search for "co":

pattern: co$
con el ala aleve del leve abanico

We have started with rather special metacharacters, since ^ and $ do not represent other characters, but positions in a string. That is why they are also known as anchors.
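The anchor tables above can be checked with a short script. This is only a sketch: it uses preg_match() from the PCRE functions described later (the patterns are therefore wrapped in / delimiters), and the array keys are names chosen for the example.

```php
<?php
// Each subject is tested against the three anchored patterns from the text.
$subjects = array('am', 'salam', 'amber', 'Pam');
$results = array();
foreach ($subjects as $s) {
    $results[$s] = array(
        'start' => preg_match('/^am/', $s),  // pattern at the beginning
        'end'   => preg_match('/am$/', $s),  // pattern at the end
        'whole' => preg_match('/^am$/', $s), // the whole string is "am"
    );
}
print_r($results);
```

preg_match() returns 1 for a match and 0 otherwise, so each row of the result can be read against the "it matches / it does not match" comments in the tables.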

Escaping characters

We may need to include a metacharacter in our pattern as a literal sign, that is, standing for itself. To indicate this we use an escape character, the backslash (\).
This way, a pattern defined as 12\$ does not match a string ending in 12, but does match one ending in 12$:

pattern: 100$
el ala aleve del leve abanico cuesta 100$
(no match)

pattern: 100\$
el ala aleve del leve abanico cuesta 100$
(matches the final "100$")

Look closely at the previous examples. In the first one there is no match, because the pattern is read as "look for the sequence 100 at the end of the string", and the string does not end in 100, but in 100$.
To specify that we are looking for the string 100$, we must escape the sign $.

As a general rule, the backslash turns special characters into normal ones, and makes normal characters special.
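A small sketch of the same idea in code, using the PCRE function preg_match() (described later in this manual). The subject string is an assumption made for the example; preg_quote() is a standard PHP function that adds the backslashes automatically.

```php
<?php
$subject = 'the fan costs 100$'; // sample text chosen for this example

// Unescaped: $ is an anchor, so this asks for "100" at the very end.
$unescaped = preg_match('/100$/', $subject);

// Escaped: \$ is a literal dollar sign.
$escaped = preg_match('/100\$/', $subject);

// preg_quote() escapes the metacharacters of a literal string;
// the second argument is the delimiter that also has to be escaped.
$quoted = preg_quote('100$', '/');

var_dump($unescaped, $escaped, $quoted);
```

Single-quoted PHP strings are used so that the backslash reaches the regular expression engine unchanged.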

The dot . as a metacharacter

If a metacharacter is a character that can stand for others, then the dot is the metacharacter par excellence. A dot in the pattern represents any character except a newline.

And as we have just seen, if what we want to find in the string is precisely a dot, we will have to escape it: \.

pattern: '.l'
el ala aleve del leve abanico

Notice in the previous example how the pattern is any character (including a blank space) followed by an l.

Quantifier metacharacters

The metacharacters we have seen so far tell us whether our pattern matches the string being compared. But what if we want to compare our string with a pattern that may appear once or several times, or may not appear at all? For this we use a special type of metacharacter: the multipliers.

These metacharacters, which apply to the character or group of characters that precedes them, indicate how many times it must be present in the string for there to be a match.

That is why they are called quantifiers or multipliers. The most commonly used are * ? +

<?
* // matches if the character (or group of characters) that
// precedes it is present 0 or more times
// ab* matches "a", "ab", "abbb", etc.
// example:
cant*a // matches cana, canta, cantttta

? // matches if the character (or group of characters) that precedes it
// is present 0 or 1 times
// ab? matches "a", "ab"; it does not match "abb"
// example:
cant?a // matches canta and cana
d?el // matches del and el
(ala)?cena // matches cena and alacena

+ // matches if the character (or group) that precedes it is
// present 1 or more times
// ab+ matches "ab", "abbb", etc.; it does not match "a"
// example:
cant+a // matches canta, canttttta; it does not match
// cana
?>

pattern: 'a*le'
el ala aleve del leve abanico
(matches "ale" in "aleve" and "le" in "leve")

pattern: 'a?le'
el ala aleve del leve abanico
(same matches as above)

pattern: 'a+le'
el ala aleve del leve abanico
(matches only "ale" in "aleve")

Besides these simple quantifiers, we can also specify the minimum and maximum number of times something must occur for there to be a match:

pattern: (.a){2}
el ala aleve del leve abanico

<?
{x,y} // matches if the letter (or group) that precedes it is present
// at least "x" times and at most "y" times
// "ab{2}" matches "abb": exactly two occurrences of "b"
// "ab{2,}" matches "abb", "abbbb"... At least two
// occurrences of b, no maximum
// "ab{3,5}" matches "abbb", "abbbb" or "abbbbb": at least
// three occurrences, at most 5

a{2,3} // matches casaa, casaaa

a{2,} // matches any word that has at
// least two consecutive "a" or more: casaa or casaaaaaa, not casa

a{0,3} // matches from 0 to 3 consecutive
// letters "a"
// NOTE: you can leave the maximum value unspecified. You can NOT
// leave the initial value empty

a{5} // exactly 5 letters "a"
?>

Therefore, the quantifiers * + ? can also be expressed this way:

* is equivalent to {0,} (0 or more times)
+ is equivalent to {1,} (1 or more times)
? is equivalent to {0,1} (0 or 1 times)
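The quantifier examples can be tried out with a few lines of PHP. This is a sketch under some assumptions: it uses the PCRE function preg_grep() (listed at the end of this manual), and the patterns are anchored with ^ and $ so that the whole word has to fit, which the tables above leave implicit.

```php
<?php
$words = array('cana', 'canta', 'cantttta');
$patterns = array('cant*a', 'cant?a', 'cant+a', 'cant{2,}a');

// For each pattern, keep the words that match it from start to end.
$matched = array();
foreach ($patterns as $p) {
    $matched[$p] = array_values(preg_grep('/^' . $p . '$/', $words));
}
print_r($matched);
```

With these words, * accepts all three, ? rejects the word with several t, + rejects the word with none, and {2,} accepts only the word with at least two.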

Range metacharacters

Square brackets [] in a pattern let us specify a range of valid characters to compare. It is enough for any one of them to be present for the condition to be met:

<?

[abc] // The pattern matches the string if it contains
// any of these three characters: a, b, c

[a-c] // matches if there is a letter in the range ("a", "b" or "c")
c[ao]sa // matches casa and cosa

[^abc] // The pattern matches a character that is
// none of these three: a, b, c
// Note that the sign ^ here means exclusion

c[^ao]sa // Matches cesa, cusa, cisa (etc); it does not match
// casa or cosa

[0-9] // Matches a string that contains any
// digit between 0 and 9

[^0-9] // Matches any character that is not a
// digit

[A-Z] // Matches any alphabetic character
// in upper case. It does not include numbers.

[a-z] // Like the previous one, in lower case

[a-zA-Z] // Any alphabetic character, upper or lower case

?>

One thing to remember is that the syntax rules of regular expressions do not apply in the same way inside square brackets. For example, the metacharacter ^ does not act here as an anchor, but as a negation character. Nor is it necessary to escape every metacharacter with the backslash. Only the following need to be escaped: ] \ ^ -

The remaining metacharacters can be included as they are, since inside square brackets they are treated as normal characters.

pattern: [aeiou]
el ala aleve del leve abanico
(matches every vowel)

pattern: [^aeiou]
el ala aleve del leve abanico
(matches every consonant and every space)

pattern: [a-d]
el ala aleve del leve abanico
(matches each a, b, c and d)

As these patterns are used again and again, there are shortcuts:

<?
// shortcut   equivalent to   meaning

\d [0-9] // digits from 0 to 9
\D [^0-9] // the opposite of \d

\w [0-9A-Za-z_] // any digit, letter or underscore
\W [^0-9A-Za-z_] // the opposite of \w, a character that is
// not a letter, a digit or an underscore

\s [ \t\n\r] // white space: includes space,
// tab, newline or carriage return

\S [^ \t\n\r] // the opposite of \s, any character
// that is not white space

// POSIX classes (the form used by the POSIX functions)
[[:alpha:]] // any alphabetic character, a-z or A-Z
[[:digit:]] // any digit, 0-9
[[:alnum:]] // any alphanumeric character, a-z, A-Z or 0-9
[[:space:]] // white space
?>
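The following sketch puts character ranges and their shortcuts side by side. It relies on the PCRE functions preg_grep(), preg_match_all() and preg_replace(), which are introduced later in this manual; the sample strings are invented for the example.

```php
<?php
$words = array('casa', 'cosa', 'cesa', 'cusa');

// c[ao]sa accepts only "a" or "o" in the second position.
$inRange = array_values(preg_grep('/^c[ao]sa$/', $words));

// c[^ao]sa accepts anything except "a" or "o" there.
$notInRange = array_values(preg_grep('/^c[^ao]sa$/', $words));

// \d is shorthand for [0-9]; both patterns find the same digits.
preg_match_all('/\d/', 'abc 123', $short);
preg_match_all('/[0-9]/', 'abc 123', $long);

// \s matches white space: replace each run of it with a single space.
$normalized = preg_replace('/\s+/', ' ', "one \t two\nthree");

print_r($inRange);
print_r($notInRange);
print_r($short[0]);
echo $normalized, "\n";
```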

Alternation and grouping metacharacters

<?
(xyz) // matches the exact sequence xyz
x|y // matches if x or y is present

(Don|Doña) // matches if "Don" or "Doña" is present
?>

pattern: (a)
el ala aleve del leve abanico

pattern: a(l|b)
el ala aleve del leve abanico
(matches "al" in "ala" and "aleve", and "ab" in "abanico")

Parentheses serve not only to group character sequences, but also to capture subpatterns that can then be returned to the script (backreferences). We will say more about this when we deal with the POSIX and PCRE functions, on the following pages.

A typical example would be a regular expression whose pattern captures valid URL addresses and generates links from them on the fly:

<? $text = "one of the best pages is http://www.blasten.com";
$text = ereg_replace("http://(.*\.(com|net|org))",
"<a href=\"\\0\">\\1</a>", $text);
print $text;
?>

The previous example would produce a usable link, where the URL is taken from backreference \0 and the visible part from backreference \1: one of the best pages is www.blasten.com

Notice that in the previous example we use two (nested) groups of parentheses, so two captures take place. Backreference \0 corresponds to the whole match; no parentheses are needed to capture it.

Backreference \1 corresponds in this case to "www.blasten.com" and is captured by the parentheses (.*\.(com|net|org))

Backreference \2 corresponds to "com" and belongs to the nested parentheses (com|net|org)

Bear in mind that this ability to capture occurrences and keep them available for backreferences consumes system resources. If you want to use parentheses in your regular expressions, but you know in advance that you are not going to reuse the occurrences and can do without the capture, place ?: right after the opening parenthesis. This syntax belongs to the PCRE functions described later (the POSIX functions do not accept it), so the example uses preg_replace:

<?
$text = preg_replace("#http://(.*\.(?:com|net|org))#",
"<a href=\"\\0\">\\1</a>", $text);
?>

By writing (?:com|net|org), the subpattern between parentheses is still grouped, but the match is no longer captured.

As a final note on this topic, PHP can capture up to 99 subpatterns for backreference purposes, or up to 200 (in total) if we search for subpatterns without capturing them.
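To see what capturing and non-capturing groups actually return, here is a minimal sketch with the PCRE functions preg_match() and preg_replace() (covered later). The # delimiter and the variable names are choices made for this example; in PCRE replacements a backreference can be written as \1 or $1.

```php
<?php
$text = 'one of the best pages is http://www.blasten.com';

// Two nested capturing groups: index 1 is the host, index 2 the suffix.
preg_match('#http://(.*\.(com|net|org))#', $text, $capturing);

// Same pattern, but (?: ) groups the alternatives without capturing them.
preg_match('#http://(.*\.(?:com|net|org))#', $text, $nonCapturing);

// The captured host is reused in the replacement as $1; $0 is the whole match.
$linked = preg_replace(
    '#http://(.*\.(?:com|net|org))#',
    '<a href="$0">$1</a>',
    $text
);

print_r($capturing);
print_r($nonCapturing);
echo $linked, "\n";
```

The second result array has one element fewer, because the suffix group no longer stores anything.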

A practical example

We have said that regular expressions are one of the most useful tools in any programming language. What can we use them for? One of their most typical uses is validating the data that visitors to a page send us through HTML forms.
The most common example is an email address. Suppose we want to filter the addresses entered by visitors, to avoid storing in the database the typical junk address ghghghghghghg. We all know the structure of an email address, made up of the string username, the sign @ and the string domainname. We also know that domainname is made up of two substrings, 'domainname' and a suffix 'com', 'net', 'es' or similar, separated by '.'.
So the solution to our problem is to design a regular expression that identifies a typical valid email address, and to compare it with the string (the email address) submitted by the visitor.
For example:

<?
^[^@ ]+@[^@ ]+\.[^@ .]+$
?>

Let us dissect our regular expression:

<?
^ // we mean that the first character we look for
// must be at the beginning of the string being compared.

[^@ ] // this first sign must be neither the sign @
// nor a space

+ // and it is repeated one or more times
@ // then we look for the sign @

[^@ ]+ // followed by another sign that is neither an @ nor a
// space, repeated one or more times

\. // followed by a .

[^@ .] // followed by a character that is neither an @,
// nor a space, nor a dot

+$ // which is repeated one or more times, the last one being
// at the end of the string
?>

To check it in practice, we use one of the PHP functions related to regular expressions: ereg().
Turning to the PHP manual, we find that this function has the following syntax:
ereg(string pattern, string string)
It searches string for matches of the regular expression pattern. The search is case sensitive.
It returns TRUE if a match was found, or FALSE if no matches were found or an error occurred. We could use this function for an email validator with something like:

<?

// with register_globals off, read the form fields explicitly
$op = isset($_GET['op']) ? $_GET['op'] : '';
$email = isset($_GET['email']) ? $_GET['email'] : '';

// we set up a conditional sequence: if the variable $op does not exist or
// is not equal to "ds", a form is shown

if ($op != "ds") { ?>
<form>
<input type="hidden" name="op" value="ds">
<strong>Your email:</strong> <br/>
<input type="text" name="email" value="" size="25"/>
<input type="submit" name="submit" value="Send"/>
</form>
<?
}

// If $op exists and is equal to "ds", the function ereg is run, looking for
// our pattern inside the string $email, which is the address sent by
// the user from the previous form

else if ($op == "ds")
{
if (ereg("^[^@ ]+@[^@ ]+\.[^@ .]+$", $email))
{
print "<BR> This address is correct: $email";}
else {echo "$email is not a valid address";}
}

?>

Needless to say, this is a very elementary example, which will accept as valid any email address that has a minimal appearance of normality (for example, it would accept 'midireccionnn@noteimporta.commm').
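Since ereg() is no longer available from PHP 7.0 onwards, the same check can be sketched with preg_match(). The function name looks_like_email and the test addresses are hypothetical; the pattern is the one dissected above, wrapped in / delimiters. The built-in filter_var() with FILTER_VALIDATE_EMAIL is shown next to it as a stricter alternative.

```php
<?php
// Hypothetical helper name; the pattern is the one from this section.
function looks_like_email($email)
{
    return preg_match('/^[^@ ]+@[^@ ]+\.[^@ .]+$/', $email) === 1;
}

$checks = array(
    'user@example.com'                => looks_like_email('user@example.com'),
    'ghghghghghghg'                   => looks_like_email('ghghghghghghg'),
    'two words@example.com'           => looks_like_email('two words@example.com'),
    'midireccionnn@noteimporta.commm' => looks_like_email('midireccionnn@noteimporta.commm'),
);
print_r($checks);

// PHP's own validator; it returns the address, or false if it is not valid.
$builtin = filter_var('user@example.com', FILTER_VALIDATE_EMAIL) !== false;
```

As the text warns, the last address passes the pattern even though its suffix is not a real one; the pattern only checks the general shape.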

To keep at hand...
A brief reference of the metacharacters and their meaning, taken from a comment in the php.net manual.

<?

^ // Start of the string
$ // End of the string

n* // Zero or more "n" (where n is the preceding character)
n+ // One or more "n"
n? // An optional "n"

n{2} // Exactly two "n"
n{2,} // Two or more "n"
n{2,4} // From two to four "n"

() // Parentheses to group expressions
(n|a) // Either "n" or "a"

. // Any character

[1-6] // a digit between 1 and 6
[c-h] // a lower-case letter between c and h
[D-M] // an upper-case letter between D and M
[^a-z] // not a lower-case letter from a to z
[_a-zA-Z] // an underscore or any letter of the alphabet

^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$

/* A string that begins with any two characters
Followed by one or two letters (in lower case)
Followed by an optional underscore _
Followed by zero or more digits
Followed by a digit from 1 to 6 or a letter from a to f
Followed by two characters that are not digits from 1 to 9
Followed by one or more "a" characters at the end of the string
Taken from a note in the php.net manual, by mholdgate -
wakefield dot co dot uk */

?>
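As a quick check of the long pattern in the reference above, this sketch runs it through the PCRE function preg_match() against two strings invented for the purpose: one built piece by piece from the description, and the same string without its final "a" characters.

```php
<?php
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// "xy" + "ab" + "_" + "123" + "4" + "zz" + "aa"
$ok = preg_match($pattern, 'xyab_1234zzaa');

// The same string without the trailing "a" characters.
$bad = preg_match($pattern, 'xyab_1234zz');

var_dump($ok, $bad);
```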

PHP functions for regular expressions

PHP has two different sets of functions related to regular expressions, called POSIX and PCRE.

The "PCRE" functions are "Perl compatible", that is, similar to Perl's native functions, although with slight differences. They are considerably more powerful than the POSIX functions, and correspondingly more complex. Note that the POSIX functions (the ereg family and split) were deprecated in PHP 5.3 and removed in PHP 7.0, so current code must use the PCRE functions.

POSIX
It includes six different functions. As a common rule, you pass the pattern (the expression to search for) first and the string to be compared as the second argument.

ereg compares the string with the search pattern and returns TRUE or FALSE depending on whether it finds it.
eregi like the previous one, WITHOUT distinguishing between upper and lower case.

<?
ereg("^am", "america"); // TRUE

$es_com = ereg("(\.)(com$)", $url);
// we look for two subpatterns in $url. The first one is
// a (literal) dot, which is why it is escaped with the backslash.
// The second subpattern, also literal, looks for the sequence "com"
// at the end of the string.

?>

ereg_replace: looks for any occurrence of the pattern in the string and replaces it with another one.
eregi_replace: like the previous one, but without distinguishing upper and lower case.

Arguments: pattern, replacement, string to be compared

<?
$cadena = ereg_replace("^am", "Hispano-am", "america"); // $cadena = "Hispano-america"
?>

split() divides a string into pieces (which are placed in an array) using regular expressions.
spliti: like the previous one, without distinguishing upper and lower case.

It is basically the same as explode, but uses regular expressions to divide the string instead of literal expressions.

<?
$date = "24-09-2003"; // dd-mm-yyyy
list($day, $month, $year) = split('[/.-]', $date);

?>
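On current PHP versions, where split() no longer exists, the equivalent can be sketched with the PCRE function preg_split(). The only change assumed here is the pair of # delimiters around the pattern.

```php
<?php
$date = '24-09-2003'; // dd-mm-yyyy

// Split on a slash, a dot or a hyphen.
list($day, $month, $year) = preg_split('#[/.-]#', $date);

echo "$year-$month-$day\n";
```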

Storing the results with ereg

We can pass a pattern with subpatterns grouped in parentheses. If we use ereg in this case, we can add a third parameter: the name of an array that will store the occurrences; $array[1] will contain the substring that starts at the first left parenthesis, $array[2] the one that starts at the second, and so on. $array[0] will contain a copy of the complete string matched.

<?
$date = "24-09-2003"; // we pass a date in dd-mm-yyyy format

if (ereg("([0-9]{1,2})-([0-9]{1,2})-([0-9]{4})", $date, $mi_array)) {
echo "$mi_array[3].$mi_array[2].$mi_array[1]"; // it matches. We show it in reverse order because that is how we are :)
} else {
echo "Invalid date format: $date"; // it does not match
}
?>
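The same capture can be sketched with preg_match(), whose third argument plays the role of the results array. The anchors ^ and $ are an addition made for this example so that nothing else is allowed around the date; $parts and $reversed are names chosen here.

```php
<?php
$date = '24-09-2003'; // dd-mm-yyyy

if (preg_match('/^([0-9]{1,2})-([0-9]{1,2})-([0-9]{4})$/', $date, $parts)) {
    // $parts[0] is the whole match; 1, 2 and 3 are day, month and year.
    $reversed = "$parts[3].$parts[2].$parts[1]";
} else {
    $reversed = null;
}

var_dump($reversed);
```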

Storing the results with ereg_replace: backreferences

In a very similar way to ereg, the search-and-replace functions ereg_replace and eregi_replace can store and reuse sub-occurrences (subpatterns found in the string).

The main difference is the way the stored sub-occurrences are referenced, since we need to use the backslash: \0, \1, \2, and so on up to a maximum of 9.

The first reference, \0, refers to the match of the entire pattern; the rest refer to the sub-occurrences of the subpatterns, from left to right.

For example, we are going to use this ability to turn a string that is a URL into a link:

<?
$url = "the page blasten.com (http://www.blasten.com)";
$url = ereg_replace("(http|ftp)://(www\.)?(.+)\.(com|net|org)", "<a href=\"\\0\">\\3</a>", $url);
echo $url; ?>
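A sketch of the same replacement with preg_replace(), for PHP versions without ereg_replace(). It assumes # as the delimiter and a single-quoted replacement string, so that \0 and \3 reach the function unchanged; $linked is a name chosen for the example.

```php
<?php
$url = 'the page blasten.com (http://www.blasten.com)';

// \0 is the whole match, \3 the third subpattern (the name between
// the optional "www." and the suffix).
$linked = preg_replace(
    '#(http|ftp)://(www\.)?(.+)\.(com|net|org)#',
    '<a href="\0">\3</a>',
    $url
);

echo $linked, "\n";
```

Only the part that starts with http:// is turned into a link; the bare "blasten.com" at the beginning of the string is left alone because it has no protocol.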

PCRE functions

The PCRE functions are "Perl compatible". Perl is one of the programming languages with the best regular expression engine, and it is also very well known. This style of regular expression is used (in different variants) not only by Perl, but also by PHP itself and in other environments, such as the Apache server, Python or KDE.

The PCRE regular expression functions of PHP are more flexible, powerful and fast than the POSIX ones.

There is practically no syntactic difference between a PCRE pattern and a POSIX one; they are often interchangeable. Naturally, some differences exist. The clearest is that our pattern must be enclosed at its beginning and end by delimiters, normally two slashes:

/pattern/

Delimiters

Any special (non-alphanumeric) character except the backslash and white space can be used as a delimiter.

The most widespread custom is, as we have seen, to use the slash /. However, if we later need to include the delimiter character itself in the pattern, we will have to escape it, so in those cases it makes sense to use different delimiters: (), {}, [] or <>
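The effect of the delimiter choice is easiest to see with a pattern full of slashes. In this sketch the subject string is made up, and # is used as one more non-alphanumeric delimiter alongside the bracket pairs mentioned above.

```php
<?php
$subject = 'see http://www.blasten.com/contenidos';

// With / as delimiter every slash in the pattern needs a backslash.
$withSlashes = preg_match('/http:\/\/www\.blasten\.com\//', $subject);

// With # as delimiter the slashes can stay as they are.
$withHashes = preg_match('#http://www\.blasten\.com/#', $subject);

// Bracket-style delimiters come in pairs: opening and closing.
$withBraces = preg_match('{http://www\.blasten\.com/}', $subject);

var_dump($withSlashes, $withHashes, $withBraces);
```

All three calls test exactly the same pattern; only its readability changes.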

Modifiers

They are placed after the closing delimiter of the pattern:

<?
m // multiline. If our string contains several physical lines (\n),
// these line breaks are respected, which means, for example,
// that the anchors ^ $ are not applied at the start and end of the
// string, but at the start and end of each line

s // The metacharacter . represents any character except
// newline. With the modifier "s" it also represents the newline.

i // the pattern is compared with the string ignoring upper and lower case

x // ignores spaces (unless they are escaped or included
// inside a character range). It also ignores any character
// after a hash sign (#) up to the next newline. It serves to include
// comments and make the pattern more readable.

e // only in preg_replace. Evaluates the replacement as PHP code before
// performing the substitution. (Deprecated in PHP 5.5 and removed in
// PHP 7.0; preg_replace_callback is used instead.)

A // The pattern is forced to be "anchored", that is, there will only be a
// match if it is at the beginning of the string.

D // the character $ in the pattern will match only at the end of the string.
// Without this modifier, $ also matches immediately
// before a final newline character.

U // this modifier inverts the "greediness" of the quantifiers.
// If we apply U, * becomes lazy and *? returns to the
// normal (greedy) behavior.
// A quantifier is "greedy" when it tries to capture as many
// characters as possible, and lazy when it captures
// the shortest possible occurrence.
?>
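Several of the modifiers can be observed with one small test string. The following is a sketch with made-up text and variable names; each pair of calls differs only in the modifier letter after the closing delimiter.

```php
<?php
$text = "First line\nsecond line";

// i: ignore upper and lower case.
$i = preg_match('/first/i', $text);

// m: ^ and $ also apply at the start and end of each line.
$withoutM = preg_match('/^second/', $text);
$withM    = preg_match('/^second/m', $text);

// s: the dot also matches the newline.
$withoutS = preg_match('/line.second/', $text);
$withS    = preg_match('/line.second/s', $text);

// D: $ matches only at the very end, not before a final newline.
$withoutD = preg_match('/line$/', "second line\n");
$withD    = preg_match('/line$/D', "second line\n");

// x: white space and # comments inside the pattern are ignored.
$x = preg_match('/
    ^First   # the first word
    \s       # one white space character
    line     # the second word
/x', $text);

var_dump($i, $withoutM, $withM, $withoutS, $withS, $withoutD, $withD, $x);
```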

Word delimiters

In the PCRE functions you can use \b, which indicates the beginning or end of a word (or rather, of an alphanumeric sequence):
/\bpattern\b/

\B, on the contrary, refers to a position that is not at the beginning or end of a word.
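A short sketch of the difference between \b and \B, using preg_match_all() on an invented sentence in which the same letters appear once as a whole word and once as the start of a longer one.

```php
<?php
$subject = 'the patron saint of patronage';

// \b on both sides: only the standalone word qualifies.
preg_match_all('/\bpatron\b/', $subject, $whole);

// \b before and \B after: the letters must continue into a longer word.
preg_match_all('/\bpatron\B/', $subject, $prefix);

echo count($whole[0]), ' ', count($prefix[0]), "\n";
```

Without any word delimiter, the pattern /patron/ would find both occurrences.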

Greedy or not

Regular expressions that use quantifiers tend to be as greedy as they are allowed to be, as long as they respect the pattern to be followed. With the modifier U this behavior is inverted.

In greedy mode the pattern will match as much as it can, while in lazy or ungreedy mode it will match as little as possible.
Note that both solutions are correct.

pattern: /http:\/\/.*\.(com|net|org)/
this is a link: http://www.abc.com and this is another one: http://www.blah.com
(one match, from the first "http" to the final ".com")

pattern: /http:\/\/.*\.(com|net|org)/U
this is a link: http://www.abc.com and this is another one: http://www.blah.com
(two matches, one for each URL)
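The two behaviors can be compared directly with preg_match_all(). This sketch assumes # as the delimiter to avoid escaping the slashes, and adds a third variant with the lazy quantifier .*? to show that it gives the same result as the U modifier.

```php
<?php
$text = 'this is a link: http://www.abc.com and this is another one: http://www.blah.com';

// Greedy: .* runs as far as it can, up to the last suffix.
preg_match_all('#http://.*\.(com|net|org)#', $text, $greedy);

// Ungreedy through the U modifier.
preg_match_all('#http://.*\.(com|net|org)#U', $text, $lazyU);

// Ungreedy through the lazy quantifier .*?
preg_match_all('#http://.*?\.(com|net|org)#', $text, $lazyQ);

print_r($greedy[0]);
print_r($lazyU[0]);
```

In greedy mode there is a single long match that swallows the text between the two addresses; in lazy mode each address is matched on its own.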

The functions

The main PCRE functions are these five:

preg_match()
preg_match_all()
preg_replace()
preg_split()
preg_grep()

preg_match looks for the pattern inside the string, returning 1 if it is found (0 if it is not, FALSE on error):
Arguments: pattern, string [, array of results]

If we supply the third parameter, we get the results in an array; $array[0] will contain the occurrence of the whole pattern, $array[1] will hold the string that matches the first subpattern, and so on.

preg_match_all finds all the occurrences of the pattern in the string.
Arguments: pattern, string, array of results, order

If we supply the fourth parameter, it will place the occurrences in the array following the stated order.

preg_replace searches for the pattern in the string and replaces it. Both the pattern and the replacement can be passed as arrays, and the replacement can contain backreferences.
A maximum number of replacements can also be specified.

preg_split works like split, dividing a string into an array with a Perl-compatible regular expression as the separator.

preg_grep looks for the pattern inside an array, and returns another array with the elements that match.
Arguments: pattern, array
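To round off the list, here is a brief sketch that calls four of these functions on invented sample data. PREG_SET_ORDER is one of the standard values for the "order" argument of preg_match_all(); the variable names are chosen for the example.

```php
<?php
$text = 'Dates: 24-09-2003 and 01-01-2004';

// preg_match(): stops at the first match.
$found = preg_match('/(\d{2})-(\d{2})-(\d{4})/', $text, $first);

// preg_match_all(): every match; PREG_SET_ORDER groups the results per match.
$count = preg_match_all('/(\d{2})-(\d{2})-(\d{4})/', $text, $all, PREG_SET_ORDER);

// preg_replace(): the optional fourth argument limits the number of replacements.
$once = preg_replace('/\d{4}/', 'YYYY', $text, 1);

// preg_grep(): the elements of an array that match the pattern (keys are kept).
$coms = preg_grep('/\.com$/', array('blasten.com', 'php.net', 'abc.com'));

print_r($first);
print_r($all);
echo $once, "\n";
print_r($coms);
```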

Author: Blasten
http://www.blasten.com/contenidos/19073?pag=7

 

 