Project Database Enhancements
My plan for the database enhancement is to use more advanced administrative methods to investigate and apply performance tuning to the database, with the end goal of improving its efficiency by speeding up reads and writes.
I learned quite a bit about how NoSQL distributed databases work. They are very efficient because they store data only as "rows," except that the concept of a "row" is replaced with a more flexible model called a "document," whereas SQL databases use columns and rows that make up tables (Chodorow, 2013, p. 3). Documents can be indexed on any of their fields (Chodorow, 2013, p. 81). When a search is performed on an indexed field, only that field is examined, and a pointer leads straight to the matching documents (Chodorow, 2013, p. 81). This allows for higher performance and efficiency as the database grows (Chodorow, 2013, pp. 3-4).
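To illustrate the idea, here is a minimal PyMongo sketch showing a document being stored and an index created on one of its fields; the database, collection, and field names are only hypothetical examples, not the artifact's real data.

```python
from pymongo import MongoClient, ASCENDING

# Connect to a local MongoDB instance (hypothetical host and port).
client = MongoClient('localhost', 27017)
collection = client['exampledb']['customers']

# A "document" is a flexible, JSON-like record; fields can vary per document.
collection.insert_one({'name': 'Ada', 'city': 'Boston', 'age': 34})

# Create an index on "city" so searches on that field consult the index
# instead of scanning every document in the collection.
collection.create_index([('city', ASCENDING)])

# This query can now be satisfied through the index.
print(collection.find_one({'city': 'Boston'}))
```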
One of the first skills I learned was how to create a MongoDB database by using the import tool to load a JSON file containing the company's database documents and then verifying its creation. I then learned how to carry out manipulations within the mongo shell using MongoDB statements that create, read, update, and delete documents (known as CRUD). I learned how to carry out queries and aggregations within the shell, both with and without projections. Projections help to organize and display data in an orderly way for the user. I also learned about older, superseded aggregation commands like "distinct," which finds all distinct values for a given key (Chodorow, 2013, p. 146). Some of these commands may be outdated, but some, like "distinct" or the "$regex" operator, still carry out useful tasks.
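The following sketch shows roughly what those CRUD operations, a projected read, an aggregation, and a "distinct" call look like in PyMongo; the "company.employees" collection and its fields are made-up stand-ins for the real documents loaded with the import tool.

```python
from pymongo import MongoClient

# Hypothetical database and collection; the JSON file would first be loaded
# with the mongoimport tool before connecting from Python.
client = MongoClient('localhost', 27017)
employees = client['company']['employees']

# Create
employees.insert_one({'name': 'Lee', 'department': 'IT', 'salary': 60000})

# Read with a projection: return only "name" and "department", hide "_id".
for doc in employees.find({'department': 'IT'},
                          {'name': 1, 'department': 1, '_id': 0}):
    print(doc)

# Update
employees.update_one({'name': 'Lee'}, {'$set': {'salary': 65000}})

# An aggregation: average salary per department.
pipeline = [{'$group': {'_id': '$department', 'avg_salary': {'$avg': '$salary'}}}]
for row in employees.aggregate(pipeline):
    print(row)

# "distinct" returns every distinct value for a given key.
print(employees.distinct('department'))

# Delete
employees.delete_one({'name': 'Lee'})
```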
I then gained knowledge of how to implement indexes and how to use them efficiently. Indexing can speed up reads, but it can also slow down reads and writes if used incorrectly (Chodorow, 2013, p. 160). Indexing requires forethought and is not for all situations; an index can actually slow down a query that must scan over roughly 30% of its collection (Chodorow, 2013, pp. 102-103). I learned how to use unique indexes, single-field indexes, and multi-field indexes called compound indexes. A compound index can narrow a search of an entire collection by more than one field and is useful for keys with many distinct values (called "high cardinality") (Chodorow, 2013, pp. 84, 98). What I learned inside the shell later translated into coding modules outside the shell. These modules carry out the same operations as the shell commands through PyMongo, the Python driver that connects Python to MongoDB.
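As a rough example, these are the kinds of index definitions described above expressed in PyMongo; the field names are hypothetical.

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

employees = MongoClient('localhost', 27017)['company']['employees']

# Unique index: rejects documents with a duplicate "employee_id".
employees.create_index([('employee_id', ASCENDING)], unique=True)

# Single-field index.
employees.create_index([('department', ASCENDING)])

# Compound index: narrows a search by more than one field; placing a
# high-cardinality key first tends to make the index more selective.
employees.create_index([('last_name', ASCENDING), ('hire_date', DESCENDING)])
```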
I met the planned objective of enhancing the artifact's database by applying more advanced administrative methods to improve the efficiency of its read and write speeds. I investigated all the database's queries and aggregations by using intensive profiling on the database. Profiling is an admin feature that one can turn on to log more information about the operations being carried out (Chodorow, 2013, p. 339). The higher the profiling level, the more information is recorded. I set the profile level to its highest and set the profiler to flag any query taking longer than ten milliseconds (ms). I also wanted to know which index, if any, each query and aggregation was using. I performed profile queries to gather this information. I discovered that my compound index was not being used by the query it was intended for; the query used a single-field index instead. I rewrote the compound index, tested it with profiling, and found it was no faster than my single-field index at 0 ms. I decided to remove the compound index, thereby improving write speed.
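A simplified sketch of that profiling workflow in PyMongo might look like the following; the database name is hypothetical, and the exact fields written to system.profile vary between MongoDB versions.

```python
from pymongo import MongoClient

db = MongoClient('localhost', 27017)['company']

# Turn the profiler up to its highest level (2 = log all operations) and
# flag anything slower than 10 ms.
db.command('profile', 2, slowms=10)

# ... run the application's queries and aggregations here ...

# Profile entries are written to the system.profile collection; each entry
# records the operation, its namespace, and how long it took in milliseconds.
for entry in db['system.profile'].find({'millis': {'$gte': 10}}).sort('millis', -1):
    print(entry.get('op'), entry.get('ns'), entry.get('millis'))
```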
What I did seems simple, but it was not simple for me because this was my first time profiling. I had to overcome many obstacles: first implementing profiling, then figuring out whether queries were using the indexes that were designed for them, and then figuring out how to get millisecond feedback for all queries and aggregations. Query execution information is normally found through Mongo's "explain" feature, but I tried many suggested command styles and could not get that feature to work with aggregations. The version of MongoDB I am using is likely the cause; the latest version is 4.4 and I am using version 2.6.12. This is what led me to profiling, yet I could not get profiling to work in MongoDB's shell on the school's Codio platform either.
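For ordinary queries, the "explain" feature can be reached from PyMongo as well; this is only a sketch, and the shape of the output differs between MongoDB 2.6 and newer releases.

```python
from pymongo import MongoClient

employees = MongoClient('localhost', 27017)['company']['employees']

# explain() on a find cursor reports how the query was executed, including
# which index (if any) was chosen. In 2.6 the plan appears under fields such
# as "cursor" (e.g. "BtreeCursor ..."); newer versions use "queryPlanner".
plan = employees.find({'department': 'IT'}).explain()
print(plan)
```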
I learned about a tool that could help me profile: IntelliShell, part of Studio 3T. Installing it and correctly emulating my database was somewhat challenging and a time risk. I downloaded a trial of Studio 3T and used tutorials and guides to set it up. After making sure MongoDB 2.6.12 was working in Studio 3T, I configured the local connection and the path to my database directory. I also needed to emulate the real database by applying my indexes.
Using a Studio 3T guide, I learned how to use profiling by applying profile queries, but first I needed to execute my module's queries. I wrote all my queries and aggregations as MongoDB shell-style commands, first testing them in my original project for accuracy and then running them in IntelliShell. I did this because once the profiler is on, bad entries get logged and can cause confusion later. The profiler logged my entries, and I then ran the profile-based queries given by the guide to investigate each query and aggregation. I honestly did not change as much as I wanted to in fine-tuning reads and writes, because almost all my indexes proved to be speeding up reads and there were no needed indexes that I had missed (Factor, 2020).
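A profile-based query of the kind the guide describes could look roughly like this in PyMongo; the namespace is hypothetical, and field names such as "nscanned" and "nreturned" are specific to older server versions like 2.6.

```python
from pymongo import MongoClient

db = MongoClient('localhost', 27017)['company']

# Inspect the profiler entries for one collection's read operations.
# "nscanned" much larger than "nreturned" suggests a query is not using an
# index effectively; "millis" gives the elapsed time for each operation.
profile = db['system.profile']
for entry in profile.find({'ns': 'company.employees', 'op': 'query'}).sort('millis', -1):
    print(entry.get('millis'), 'ms |',
          'scanned:', entry.get('nscanned'),
          'returned:', entry.get('nreturned'))
```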
One feature of IntelliShell that I think is exceptional is that it shows the time an aggregation takes. I conducted research in many places, and most sources claimed that one cannot accurately time aggregations through the mongo shell in version 2.6.12 (StackOverflow, 2012). The millisecond figures IntelliShell reported were very consistent with real-time results when I compared queries.
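As a rough cross-check of those times, an aggregation can also be timed from the client side; this sketch measures wall-clock time in Python, which includes network overhead, so it is only an approximation of the server-side figure a tool like IntelliShell reports.

```python
import time
from pymongo import MongoClient

employees = MongoClient('localhost', 27017)['company']['employees']
pipeline = [{'$group': {'_id': '$department', 'avg_salary': {'$avg': '$salary'}}}]

# Time the aggregation from the client side; list() forces the cursor to be
# fully consumed so the measurement covers the whole operation.
start = time.perf_counter()
results = list(employees.aggregate(pipeline))
elapsed_ms = (time.perf_counter() - start) * 1000
print(f'{len(results)} groups in {elapsed_ms:.1f} ms')
```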