CS 115 Program 3 Search Engine Fall 2017


Due Dates:
Individual Design: Wednesday, November 29, midnight
Source code: "officially" due Sunday, December 3, midnight
  actually due by Wednesday, December 6, midnight
  accepted until Friday, December 8, midnight, with some penalty

NOT accepted during Finals Week at all!

Assignment total points = Design (20 points) + Runs test cases (20 points) + Implementation (70 points) + Style and documentation (15 points) = 125 points

Submit all program materials (.py files) using the submission link.

When submitting, choose "Program 3" and select either "Code" or "Design" as the type.
NOTE: The case-sensitivity feature is now worth 5 bonus points. A program must be turned in ON TIME to be eligible for the bonus, which means by Wednesday, 12/6.

Educational Goals: the student should use the concepts of

You are going to write a program that performs some of the actions a search engine goes through. This involves file input and output. Your program will produce an HTML file as output, which can be viewed in a web browser such as Firefox or Internet Explorer. See this page for a short tutorial.

The user of a search engine provides a key phrase they want to find. The search engine program looks through a database file to see whether the search phrase appears. Every line containing one or more occurrences of the phrase is displayed to the user; these lines are the "hits".
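The hit test can be sketched as follows, assuming the database lines have already been split into (URL, keywords) pairs and that, as in the examples below, only the keywords are searched. The name `find_hits` and the list-of-tuples layout are hypothetical choices for this illustration.

```python
def find_hits(records, phrase):
    """Return the (url, keywords) records whose keywords contain phrase.

    records is a list of (url, keywords) string pairs; this layout is an
    assumption made for this sketch.
    """
    hits = []
    for url, keywords in records:
        # Plain substring test, so "car" also matches "cars".
        if phrase in keywords:
            hits.append((url, keywords))
    return hits


records = [
    ("http://www.toyota.com", "car sales"),
    ("http://www.google.com", "search engine"),
    ("http://www.avis.com", "rental car service"),
    ("http://www.amazon.com", "sales books garden clothes shoes toys cars videos"),
]
print(len(find_hits(records, "car")))  # 3
```

Because the test is a substring match, a phrase containing spaces such as "rental car" needs no special handling.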

The user will specify the name of the database file. The database file is a text file in which each line has this format:

a URL (uniform resource locator), a comma, then keywords separated by spaces
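One way to split a line in this format, assuming the URL itself never contains a comma, is to split on the first comma only and then strip the surrounding spaces and newline. `parse_line` is a hypothetical helper name.

```python
def parse_line(line):
    """Split one database line into (url, keywords).

    Assumes the URL contains no comma, so splitting on the first comma
    separates the URL from the keyword list.
    """
    url, keywords = line.split(",", 1)
    # strip() removes the space after the comma and the trailing newline.
    return url.strip(), keywords.strip()


print(parse_line("http://www.avis.com, rental car service\n"))
# ('http://www.avis.com', 'rental car service')
```

A line with no comma at all (for example a blank line) would make the unpacking raise `ValueError`, so blank lines should be skipped before calling it.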

If the database file cannot be opened, the program should report this and ask the user for another filename, repeating until the user gives a filename that opens.
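A minimal sketch of the re-prompting loop. It relies on `OSError`, which Python's `open()` raises when a file is missing or unreadable. The function name `open_database`, the `ask` parameter (standing in for `input()` so the loop can be tested), and the wording of the error message are assumptions.

```python
def open_database(ask=input):
    """Keep asking for a database name until one opens; return its lines.

    ask is the prompting function; it defaults to input() but can be
    replaced for testing (an assumption of this sketch).
    """
    while True:
        name = ask("Enter name of database file (.txt will be added): ") + ".txt"
        try:
            with open(name, "r") as infile:
                # Keep only non-blank lines, without their newlines.
                lines = [line.rstrip("\n") for line in infile if line.strip()]
            print("Data retrieved from file")
            return lines
        except OSError:
            # Hypothetical message; report the failure and loop again.
            print("Could not open", name, "- please try again.")
```

In the real program it would simply be called as `lines = open_database()`.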

Example database file:

http://www.aol.com, Commercial Internet service
http://www.toyota.com, car sales
http://www.uky.edu, school research Kentucky university Lexington
http://www.google.com, search engine
http://www.eku.edu, school research Kentucky University Richmond
http://www.yahoo.com, search engine
http://www.youtube.com, video archive popular
http://www.avis.com, rental car service
http://www.msn.com, commercial Internet service
http://www.toysrus.com, toys bikes 
http://www.amazon.com, sales books garden clothes shoes toys cars videos
http://petsmart.com, pets adoptions

Example interaction with the user:
Big Blue Search Engine

Your user name? keen
Enter name of database file (.txt will be added): database1
Data retrieved from file

Enter a keyword to search for: car
Enter name for web page of results (extension of .html will be added): mycars
Do you want the search to be case sensitive? (y/n) n   (ask this only if you are doing the bonus)

Done. 3 hits. Results in mycars.html
The web page created would be in a file called mycars.html:
<html>
<title>Search Findings</title>
<body>
<h2><p align=center>Search for "car"</h2>
<p align=center>
<table border>
<tr><th>URL<th>Hit</tr>
<tr><td><a href="http://www.toyota.com"> http://www.toyota.com</a> <td>  <b>car</b> sales </tr>
<tr><td><a href="http://www.avis.com"> http://www.avis.com</a> <td>  rental <b>car</b> service </tr>
<tr><td><a href="http://www.amazon.com"> http://www.amazon.com</a> <td>  sales books garden clothes shoes toys <b>car</b>s videos </tr>
</table>
</body>
</html>

Note that the search phrase "car" is displayed in bold on the results page.
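Bolding can be done with `str.replace`, which replaces every occurrence and is case sensitive. The sketch below, with hypothetical helper names `bold_phrase` and `table_row`, rebuilds the amazon row of the example page exactly, applying the bold only to the keywords so the URL is left unchanged.

```python
def bold_phrase(keywords, phrase):
    """Wrap every occurrence of phrase in keywords with <b>...</b> tags."""
    return keywords.replace(phrase, "<b>" + phrase + "</b>")


def table_row(url, keywords, phrase):
    """Build one results-table row in the format of the example page.

    Only the keywords are bolded; the URL is used as-is.
    """
    return ('<tr><td><a href="' + url + '"> ' + url + '</a> <td>  '
            + bold_phrase(keywords, phrase) + ' </tr>')


print(table_row("http://www.amazon.com",
                "sales books garden clothes shoes toys cars videos", "car"))
```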

If the search phrase is not found, the interaction looks like this:

Big Blue Search Engine

Your user name? keen
Enter name of database file (.txt will be added): sampledb
Data retrieved from file

Enter a keyword to search for: chair
Enter name for web page of results (extension of .html will be added): mychairs
Do you want the search to be case sensitive? (y/n) n   (ask this only if you are doing the bonus)

Done. 0 hits. Results in mychairs.html
The output web page, in a file called mychairs.html, is:
<html>
<title>Search Findings</title>
<body>
<h2><p align=center>Search for "chair"</h2>
<p align=center>
<table border>
<tr><td>chair not found! </tr>
</table>
</body>
</html>
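Writing the whole page might look like the following sketch, which covers both the hit case and the not-found case. `write_results` and its `rows` parameter (a list of already-built `<tr>` strings) are hypothetical; the lines it writes reproduce the example pages, including their markup as given.

```python
def write_results(filename, phrase, rows):
    """Write the results page; rows is a list of ready-made <tr> strings.

    When rows is empty, a single "not found" row is written instead of
    the URL/Hit header, matching the chair example.
    """
    with open(filename, "w") as outfile:
        outfile.write("<html>\n<title>Search Findings</title>\n<body>\n")
        outfile.write('<h2><p align=center>Search for "' + phrase + '"</h2>\n')
        outfile.write("<p align=center>\n<table border>\n")
        if rows:
            outfile.write("<tr><th>URL<th>Hit</tr>\n")
            for row in rows:
                outfile.write(row + "\n")
        else:
            outfile.write("<tr><td>" + phrase + " not found! </tr>\n")
        outfile.write("</table>\n</body>\n</html>\n")


write_results("mychairs.html", "chair", [])
```

The phrase is written as-is; if phrases may contain characters such as `<` or `&`, passing them through `html.escape` first would keep the page valid.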

Another example, with a space in the search phrase:

Big Blue Search Engine

Your user name? keen
Enter name of database file (.txt will be added): sampledb
Data retrieved from file

Enter a keyword to search for: rental car
Enter name for web page of results (extension of .html will be added): myhits
Do you want the search to be case sensitive? (y/n) n   (ask this only if you are doing the bonus)

Done. 1 hits. Results in myhits.html
The output file would be myhits.html:
<html>
<title>Search Findings</title>
<body>
<h2><p align=center>Search for "rental car"</h2>
<p align=center>
<table border>
<tr><th>URL<th>Hit</tr>
<tr><td><a href="http://www.avis.com"> http://www.avis.com</a> <td>  <b>rental car</b> service </tr>
</table>
</body>
</html>

Some more examples:
Search for "er": note that every occurrence of "er" in the keywords is bolded.
Case-sensitive search: note that it picks up only the lowercase "commercial", not the one that starts with a capital C. (Bonus only.)
Search for "toy": note that the "toy" in the URL is NOT bolded.
You can right-click these links, save the files, and open them in Notepad to read the HTML, or you can load them into a browser and use its "View/Page Source" option.
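For the case-insensitive bonus, plain `str.replace` is not enough, because the bolded text should keep the capitalization found in the database. One possible approach, swapping in the standard `re` module, is sketched below; the names `bold_phrase` and `has_hit` and the `case_sensitive` flag are hypothetical. Both functions work on the keywords only, so a URL is never bolded.

```python
import re


def bold_phrase(keywords, phrase, case_sensitive=True):
    """Bold every occurrence of phrase in keywords.

    With case_sensitive=False, matching ignores case but the original
    capitalization in keywords is kept inside the <b> tags.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape makes characters such as "." or "+" match literally.
    pattern = re.compile(re.escape(phrase), flags)
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", keywords)


def has_hit(keywords, phrase, case_sensitive=True):
    """Return True if phrase occurs in keywords under the chosen rule."""
    if case_sensitive:
        return phrase in keywords
    return phrase.lower() in keywords.lower()


print(bold_phrase("Commercial Internet service", "er"))
# Comm<b>er</b>cial Int<b>er</b>net s<b>er</b>vice
print(bold_phrase("Commercial Internet service", "commercial", case_sensitive=False))
# <b>Commercial</b> Internet service
```

A loop built on `str.lower()` and `str.find()` would also work if regular expressions are not an option.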

  • In addition to creating the HTML file with the search results, the program will do something else that search engine companies commonly do: it will keep a file called "secret.txt" that records every user name given to it and the search phrase that person searched for. This file grows every time the program is run; it is a "log file" that is appended to, never overwritten.
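Opening the log in append mode (`"a"`) is what makes it grow across runs: Python creates the file if it is missing and otherwise writes at its end. In this sketch the function name `log_search`, and the one-line-per-search layout with a tab between user name and phrase, are assumptions; the assignment does not specify a format.

```python
def log_search(user, phrase, log_name="secret.txt"):
    """Append one line recording who searched for what.

    Mode "a" creates the file if needed and otherwise adds to the end,
    so earlier entries are kept. The "user<TAB>phrase" layout is an
    assumption of this sketch.
    """
    with open(log_name, "a") as logfile:
        logfile.write(user + "\t" + phrase + "\n")
```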

    Test Cases

    Test cases posted 11/25/17

    Read the three pages about testing loops, testing files, and testing if statements.
    The "sample database" mentioned in the Test Cases page is the sample file given at the top of the assignment. Your program must handle all these test cases correctly to get the 20 points.

    Design


    Decide what steps you will need to perform to solve this problem. Save your design as a Python file named "design3.py" and submit it using the link above.

    For each function described below, you need to write the three P's. This page gives some examples of writing function designs. Besides the prolog (the three P's), you need to design the function body just as we've always done. State which control structures you are using and how. What loops do you need? What if statements? How do you get the return value, if any?

    Now that you are using more complicated data structures such as lists and strings, describe what they look like and how they are constructed.
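A hypothetical example of one function design as it might appear in design3.py, assuming the three P's are purpose, preconditions, and postconditions. The function `count_hits` and the list-of-lists data structure are illustrations only; follow the examples on the course design page for the required form.

```python
def count_hits(records, phrase):
    # Purpose: count how many database records contain the search phrase
    # Preconditions: records is a list of [url, keywords] lists, one per
    #   line of the database file; phrase is a non-empty string
    # Postconditions: returns the number of records whose keywords
    #   contain phrase; nothing is printed
    #
    # Data structure: records looks like
    #   [["http://www.toyota.com", "car sales"],
    #    ["http://www.google.com", "search engine"]]
    #   and is built by splitting each file line at its first comma.
    #
    # Design:
    # set a counter to 0
    # for each record in records (a for loop over the list)
    #     if phrase appears in the keywords part, record[1]
    #         add 1 to the counter
    # return the counter
    count = 0
    for record in records:
        if phrase in record[1]:
            count += 1
    return count
```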

    Design Notes

    You MUST write and call these functions. You can also provide others that you think would be useful. Yes, you will have to figure out what parameters some functions need.

    Partial main function design

    # prolog 
    # display title
    # ask user for user name
    # get data from database file
    # ask user for keyword search phrase
    # ask user for name of file for results
    # ask user if they want the search to be case sensitive (bonus only)
    # design for handling case sensitivity goes here (bonus only)
    # do the search and get number of hits
    # report that the search is done, how many hits and the name of the file that has the results
    # add the user and keyword to the log file
    
    Please read the documentation standard on the class web page. As the grading sheet shows, we will check how well you meet these standards.