lex program to scan reserved word and identifiers || c program || compiler design

Lex is a program generator that is used to generate lexical analyzers. A lexical analyzer, also known as a lexer or scanner, is a program that reads input character by character and groups them into tokens that can be understood by a parser or compiler.

Lex works by generating a program in C that scans input and matches regular expressions to produce tokens. The regular expressions and actions for each token are specified in a file with the extension .lex or .l. This file is processed by the Lex program to generate a C program that performs the lexical analysis.

A sample flowchart diagram for this program is shown below:


The basic structure of a Lex program consists of three parts:

1. Declarations: This section contains any C code that is needed by the program, such as #include statements and variable declarations.
2. Rules: This section contains regular expressions and the actions to be taken when a match is found.
The syntax for a rule is:

    <regular expression>   { <action> }
3. User code: This section contains any additional C code that is needed by the program, such as function definitions or main().

  • Once the Lex program has been processed, the resulting C program can be compiled and run to perform the lexical analysis on the input. The tokens generated by the lexical analyzer can then be passed on to a parser or compiler for further processing.
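A minimal sketch of this three-part layout (assuming the flex implementation of Lex; the pattern and action shown are only illustrative):

```lex
%{
/* Declarations: C code copied verbatim into the generated scanner */
#include <stdio.h>
%}

%%
[0-9]+      { printf("number: %s\n", yytext); }  /* Rules: <pattern> { <action> } */
.|\n        { /* ignore everything else */ }
%%

/* User code: helper functions and main() */
int main(void)
{
    yylex();        /* run the generated scanner on stdin */
    return 0;
}

int yywrap(void) { return 1; }  /* signal end of input */
```

Saved as, say, sample.l, this file would be processed with `lex sample.l` (or `flex sample.l`), and the generated lex.yy.c compiled with an ordinary C compiler.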

ALGORITHM :-
1. Start.
2. Define an array called 'reservedWords' that contains the reserved words of the C language.
3. Define a string called 'specialChars' that contains the special characters used in the C language.
4. Prompt the user to input a C program and store it in the 'input' string variable.
5. Use 'strtok' to split the input string into tokens, using the characters in 'specialChars' as delimiters.
6. For each token, check whether it is a reserved word by comparing it against the words in the 'reservedWords' array. If it is, print a message indicating that it is a reserved word.
7. If the token is not a reserved word, check whether it is an identifier: the first character must be a letter or an underscore. If so, print a message indicating that it is an identifier.
8. If the token is neither a reserved word nor an identifier, check whether its first character is a punctuation mark; if so, print a message indicating that it is a special character.
9. Repeat steps 6-8 for each token in the input string.
10. Return 0 to indicate successful completion of the program.
11. Stop.

Uses of Lex Program:-

1. Generating lexical analyzers for programming languages: Lex can be used to generate efficient and accurate lexical analyzers for programming languages. These lexical analyzers can then be used to tokenize input programs, making them easier to parse and compile.


2. Building text processing tools: Lex can also be used to build various text processing tools, such as parsers, compilers, and interpreters. By defining regular expressions to match specific patterns in the input, Lex can generate code to recognize these patterns and perform actions based on them.


3. Parsing configuration files: Many applications use configuration files to store settings and preferences. Lex can be used to parse these configuration files and extract the relevant information, making it easier to process and use the settings.


4. Analyzing log files: Log files can contain vast amounts of data that need to be analyzed to extract meaningful insights. Lex can be used to analyze log files and extract specific information, such as error messages, warnings, or performance metrics.


5. Developing network protocols: Lex can be used to develop network protocols by generating code to parse and generate messages conforming to the protocol specification.

Overall, Lex is a powerful tool that can be used to generate efficient and accurate lexical analyzers for a wide range of applications, making it an essential tool for software developers and researchers alike.

Write a Lex program to scan reserved words and identifiers of the C language.
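Although the source code below answers this in plain C, the task can also be solved directly in Lex. A minimal sketch, assuming flex and only a representative subset of C's keywords:

```lex
%{
#include <stdio.h>
%}

%%
"int"|"float"|"char"|"double"|"for"|"while"|"do"|"static"|"switch"|"case" {
                           printf("%s is a keyword\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*   { printf("%s is an identifier\n", yytext); }
[0-9]+                   { printf("%s is a number\n", yytext); }
[ \t\n]                  ;  /* skip whitespace */
.                        { printf("%s is a special character\n", yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}

int yywrap(void) { return 1; }
```

Built with `lex scan.l && cc lex.yy.c -o scan`, this reads a C fragment from stdin and labels each token as it is matched.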

SOURCE CODE:-

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Print whether str is a C keyword or an identifier. */
void keyword(char str[10])
{
    if (strcmp("for", str) == 0 || strcmp("while", str) == 0 ||
        strcmp("do", str) == 0 || strcmp("int", str) == 0 ||
        strcmp("float", str) == 0 || strcmp("char", str) == 0 ||
        strcmp("double", str) == 0 || strcmp("static", str) == 0 ||
        strcmp("switch", str) == 0 || strcmp("case", str) == 0)
        printf("\n%s is a keyword", str);
    else
        printf("\n%s is an identifier", str);
}

int main(void)
{
    FILE *f1, *f2, *f3;
    int c;                       /* int, not char, so EOF can be detected */
    char str[10];
    int num[100], lineno = 0, tokenvalue = 0, i = 0, j = 0, k = 0;

    /* Copy the user's program from stdin into the file "input". */
    printf("\n Enter the c program : ");
    f1 = fopen("input", "w");
    while ((c = getchar()) != EOF)
        putc(c, f1);
    fclose(f1);

    /* Scan "input" character by character, routing identifiers to one
       file and special characters to another. */
    f1 = fopen("input", "r");
    f2 = fopen("identifier", "w");
    f3 = fopen("specialchar", "w");
    while ((c = getc(f1)) != EOF) {
        if (isdigit(c)) {
            tokenvalue = c - '0';
            c = getc(f1);
            while (isdigit(c)) {
                tokenvalue = tokenvalue * 10 + (c - '0');
                c = getc(f1);
            }
            num[i++] = tokenvalue;
            ungetc(c, f1);
        }
        else if (isalpha(c)) {
            putc(c, f2);
            c = getc(f1);
            while (isdigit(c) || isalpha(c) || c == '_' || c == '$') {
                putc(c, f2);
                c = getc(f1);
            }
            putc(' ', f2);       /* space-separate each word */
            ungetc(c, f1);
        }
        else if (c == ' ' || c == '\t')
            ;                    /* ignore blanks and tabs */
        else if (c == '\n')
            lineno++;
        else
            putc(c, f3);
    }
    fclose(f2);
    fclose(f3);
    fclose(f1);

    printf("\n The no's in the program are : ");
    for (j = 0; j < i; j++)
        printf("%d ", num[j]);
    printf("\n");

    /* Re-read the words and classify each one. */
    f2 = fopen("identifier", "r");
    k = 0;
    printf("The keywords and identifiers are:");
    while ((c = getc(f2)) != EOF) {
        if (c != ' ')
            str[k++] = c;
        else {
            str[k] = '\0';
            keyword(str);
            k = 0;
        }
    }
    fclose(f2);

    f3 = fopen("specialchar", "r");
    printf("\n Special characters are : ");
    while ((c = getc(f3)) != EOF)
        printf("%c", c);
    printf("\n");
    fclose(f3);

    printf("Total no. of lines are : %d\n", lineno);
    return 0;
}

Output :

Enter the c program : a+b*c
Ctrl-D

The no's in the program are :

The keywords and identifiers are:
a is an identifier
b is an identifier
c is an identifier

Special characters are : +*

Total no. of lines are : 1

Important questions:

1. What is a lexical analyzer?

2. Which tool is used for lexical analysis?

3. What is the output of a lexical analyzer?

4. Which finite state machines are used in lexical analyzer design?

5. What is the role of regular expressions and grammars in a lexical analyzer?