Overview

Foogle is a search engine aggregator: it compiles results from other widely used search engines, stores them in a MySQL database, and displays the most relevant results by keyword to the end-user.

Why is this technology necessary? Traditional search engines are run by profit-oriented companies, so they prioritize displaying ads in search results and use ranking algorithms that are typically influenced by business interests. Foogle aims to overcome these limitations by using a word-frequency algorithm that is not influenced by business motives. It also filters ads out of the compiled results, aiming for a more relevant and unbiased search experience. Finally, because the results from other search engines are stored in a database, Foogle can run secondary searches over them using natural language processing and keyword ranking, which is intended to make searches more robust and accurate.

Technical Details:
Tech Stack: Python (Asyncio, Aiohttp, BeautifulSoup, MySQL Connector, NLTK), MySQL, Flask, HTML / CSS

Process Overview for Populating the Database:
  1. Retrieve query from end-user
  2. Send HTTP requests to Google, Bing, Yahoo, and DuckDuckGo to obtain URLs
  3. Perform ad and duplicate filtering
  4. Clean up and parse the HTML of each page to extract the wanted information (URL, title, and text)
  5. Populate the database with the information
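
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the ad-domain list, the URL normalization rule, and the function names are all assumptions, and the search engines are passed in as coroutines so the example runs without network access (in the project, those requests would go through Aiohttp).

```python
import asyncio
from urllib.parse import urlsplit

# Hypothetical ad-serving domains; the project's real filtering rules are not documented here.
AD_HOST_HINTS = ("googleadservices.com", "doubleclick.net")


def is_ad(url):
    """Treat a URL as an ad if its host is, or is a subdomain of, a listed ad domain."""
    host = urlsplit(url).netloc.lower()
    return any(host == hint or host.endswith("." + hint) for hint in AD_HOST_HINTS)


def normalize(url):
    """Build a comparison key: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def filter_urls(urls):
    """Drop ads and duplicates while keeping first-seen order."""
    seen = set()
    kept = []
    for url in urls:
        if is_ad(url):
            continue
        key = normalize(url)
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


async def gather_urls(query, engines):
    """Query every engine concurrently, then merge and filter the results."""
    batches = await asyncio.gather(*(engine(query) for engine in engines))
    merged = [url for batch in batches for url in batch]
    return filter_urls(merged)
```

Each entry in `engines` is assumed to be an async function that takes the query string and returns a list of result URLs. Running the engines through `asyncio.gather` means the slowest engine, rather than the sum of all of them, bounds the wait.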
Process Overview for Querying the Database:
  1. Retrieve websites from the database whose stored text is relevant to the end-user query, using Natural Language Processing techniques
  2. Retrieve keywords from the end-user query by removing stopwords
  3. Count the keywords within the text of the retrieved websites
  4. Rank the retrieved websites based on keyword counts
  5. Display the results to the end-user on the website
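
Steps 2 through 4 of the query process can be illustrated with a short sketch. It is an assumption-laden example rather than the project's implementation: the stopword set below is a small hand-written stand-in (the project lists NLTK in its stack, whose stopwords corpus would be the natural source), and the tokenizer and function names are hypothetical.

```python
import re
from collections import Counter

# Small stand-in stopword list, used here only so the example has no dependencies.
STOPWORDS = {"a", "an", "and", "are", "for", "how", "in", "is", "of", "the", "to", "what"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(query):
    """Step 2: keep the unique non-stopword tokens of the query, in order."""
    keywords = []
    for token in tokenize(query):
        if token not in STOPWORDS and token not in keywords:
            keywords.append(token)
    return keywords


def keyword_count(keywords, text):
    """Step 3: total number of keyword occurrences in a page's text."""
    counts = Counter(tokenize(text))
    return sum(counts[keyword] for keyword in keywords)


def rank_pages(query, pages):
    """Step 4: score each (url, text) pair and sort by score, highest first."""
    keywords = extract_keywords(query)
    scored = [(url, keyword_count(keywords, text)) for url, text in pages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with the query "What is async in Python?" the keywords are `async` and `python`, so a page that mentions both terms repeatedly ranks above one that mentions only `python` once.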
The following tools are used to collect input from the end-user, send it to the back-end, and display the results:
  1. HTML / CSS
  2. Flask Web Framework

Results

With the above implementation, Foogle requests information from over 50 different websites asynchronously and populates the database with each website's URL, title, and text in under 30 seconds on average. The MySQL Connector step that populates the database offers the greatest opportunity for further speed improvement.
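
One common way to reduce per-row overhead in a database write step is to send all rows in a single batched call inside one transaction. Whether the project already does this is not stated, so the sketch below is only an illustration of the idea. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table name and columns are hypothetical. MySQL Connector cursors also provide an `executemany` method, with `%s` placeholders instead of `?`.

```python
import sqlite3


def store_pages(conn, pages):
    """Insert (url, title, body) rows in one batch and return the table's row count."""
    # Hypothetical schema: one row per scraped page.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    # 'with conn' wraps the batch in a single transaction and commits on success.
    with conn:
        conn.executemany(
            "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)",
            pages,
        )
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]


conn = sqlite3.connect(":memory:")
stored = store_pages(
    conn,
    [
        ("https://example.com/a", "Page A", "text of page A"),
        ("https://example.org/b", "Page B", "text of page B"),
    ],
)
```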

Please review the presentation slide deck for a summary of the search engine implementation: Slides

Want to connect?

Connect with me through LinkedIn, or reach out by email or phone.

Email

allenlau3@outlook.com

Phone

(484) 855-9707