Monday, April 20, 2020

SQL Server 2005: Data Manipulation Essay Example

SQL Server 2005’s data manipulation capabilities are wide-ranging. Standard data manipulation language (DML) statements such as SELECT, INSERT, UPDATE and DELETE are used to implement data manipulation methods such as sorts and joins. A large number of data transformations, which allow various manipulations and copies of data, are also offered. Aggregate functions, such as AVG and CHECKSUM_AGG, are provided.

The most basic form of data manipulation takes place through DML statements. These statements allow the user to modify or display data according to their own specifications. SELECT allows the user to choose records to query or operate on according to criteria defined in the query. The ORDER BY and GROUP BY clauses of the SELECT statement allow for sorting and grouping of the output data. INSERT adds new rows to a table. UPDATE changes existing rows of a table to conform to the information provided, and DELETE removes the specified rows from the table. These statements, used in conjunction, offer a simple and powerful implementation of methods such as joins.

A join is a data manipulation method which allows a single query to access columns from two or more tables. There are a number of different types of joins; the two basic categories are inner and outer joins. Subtypes of the inner join include cross joins, equi-joins and natural joins; subtypes of the outer join are left outer joins, right outer joins and full outer joins. Figure 1 and Figure 2 are sample tables that will be used to illustrate joins.

PartNo  PartName
25      Widget
26      Sprocket
27      Thingy
28      Whatsit

Figure 1: Parts table

PartNo  NumberSold
25      2500
26      116
27      11948

Figure 2: Sales table

A Cartesian join, or cross join, is the most basic form of an inner join, and the least often used.
A Cartesian join takes the Cartesian product of the two tables, pairing each record in the first table with every record in the second. For the sample tables, the Cartesian join can be expressed as

    SELECT * FROM Parts, Sales;

and returns 4 x 3 = 12 rows:

Parts.PartNo  Parts.PartName  Sales.PartNo  Sales.NumberSold
25            Widget          25            2500
25            Widget          26            116
25            Widget          27            11948
26            Sprocket        25            2500
26            Sprocket        26            116
26            Sprocket        27            11948
27            Thingy          25            2500
27            Thingy          26            116
27            Thingy          27            11948
28            Whatsit         25            2500
28            Whatsit         26            116
28            Whatsit         27            11948

Figure 3: Cartesian join

More useful is the equi-join, which keeps only the combinations whose join columns match, so that only records present in both tables are returned. Of the rows in Figure 3, only the three with matching part numbers would remain; part 28 would not appear at all, since it is present only in the Parts table and not in the Sales table. The equi-join, or explicit inner join, is the most common inner join. It can be achieved by the query:

    SELECT * FROM Parts, Sales
    WHERE Parts.PartNo = Sales.PartNo;

A natural join is not recommended: it joins implicitly on every column name the two tables share, so its result depends on context and can change when a column is added or renamed. Transact-SQL does not provide a NATURAL JOIN keyword in any case.

Outer joins differ in which table's unmatched records are preserved (left, right or both). A left outer join returns every record in the first (left) table, together with any matching records from the second; a right outer join returns every record in the second (right) table, and a full outer join returns every record from both tables. In order to execute a left outer join, you can use a query like:

    SELECT DISTINCT * FROM Parts
    LEFT OUTER JOIN Sales
    ON Parts.PartNo = Sales.PartNo;

This query will return the following information:

Parts.PartNo  Parts.PartName  Sales.PartNo  Sales.NumberSold
25            Widget          25            2500
26            Sprocket        26            116
27            Thingy          27            11948
28            Whatsit         NULL          NULL

Figure 4: Output of a left outer join

The left outer join returns NULL values in the right table's columns for any record represented in the left table but not in the right table. Any record in the right table that isn’t reflected in the left table will not be displayed. The right outer join is the mirror image of the left outer join; it returns all records in the right table, substituting NULL for records not represented in the left table, and will not return any records in the left table that aren’t in the right table as well.
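The cross join, equi-join and left outer join described above can be tried out with a short script. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server 2005, and the column types are guesses, since the essay does not give them. The join syntax shown is common to both engines.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server 2005.
# The tables mirror Figure 1 and Figure 2; the column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (PartNo INTEGER PRIMARY KEY, PartName TEXT);
    CREATE TABLE Sales (PartNo INTEGER, NumberSold INTEGER);
    INSERT INTO Parts VALUES (25, 'Widget'), (26, 'Sprocket'),
                             (27, 'Thingy'), (28, 'Whatsit');
    INSERT INTO Sales VALUES (25, 2500), (26, 116), (27, 11948);
""")

# Cross join: every Parts row paired with every Sales row.
cross_rows = conn.execute(
    "SELECT * FROM Parts CROSS JOIN Sales"
).fetchall()

# Equi-join: keep only the pairs whose part numbers match.
inner_rows = conn.execute("""
    SELECT Parts.PartNo, Parts.PartName, Sales.NumberSold
    FROM Parts INNER JOIN Sales ON Parts.PartNo = Sales.PartNo
    ORDER BY Parts.PartNo
""").fetchall()

# Left outer join: every Parts row, with NULL (None in Python) where
# there is no matching Sales row.
left_rows = conn.execute("""
    SELECT Parts.PartNo, Parts.PartName, Sales.PartNo, Sales.NumberSold
    FROM Parts LEFT OUTER JOIN Sales ON Parts.PartNo = Sales.PartNo
    ORDER BY Parts.PartNo
""").fetchall()

print(len(cross_rows), "rows in the cross join")
print(len(inner_rows), "rows in the equi-join")
for row in left_rows:
    print(row)
```

Right and full outer joins are left out of the sketch because older SQLite releases do not support them; on SQL Server the same queries work with RIGHT OUTER JOIN or FULL OUTER JOIN in place of LEFT OUTER JOIN.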
A full outer join performs a combination of these two joins; it returns all records from both the left and right tables, substituting NULL values where a record isn’t represented in the opposite table.

The Merge Join transformation is a particularly notable form of join. It combines two sorted inputs into a single output, using an inner, left outer or full outer join. In order to be joined, both inputs need to be sorted on the join columns, and the metadata and types of those columns must match.

There are a number of other transformations offered by SQL Server 2005 through SQL Server Integration Services (SSIS). Some of the most notable include Union All and Merge, Pivot, Fuzzy Grouping, Multicast and Data Conversion. Transformations typically perform data manipulation which is complicated to perform in raw SQL. All the following transformations are available through the SSIS Designer tool; simply open the package and apply the transformations using the Toolbox.

The Union All and Merge transformations are similar in function. The main differences are that Merge accepts exactly two inputs, requires them to be sorted and produces sorted output, whereas Union All accepts two or more unsorted inputs and produces unsorted output. The purpose of both transformations is to produce a single output from separate inputs, mapping columns from the inputs onto each column of the output. In order to map the inputs to the outputs, the column metadata has to match; for a string column, the length of the column in the second input must be less than or equal to that of the first.

The Pivot transformation serves a similar purpose to the pivot table function in Excel.
By pivoting on a column, the transformation turns that column's distinct values into output columns, so that data which was spread across many rows is summarized in a single, more compact row for each key.

The Fuzzy Grouping transformation is a data mining and comparison tool which allows for grouping of almost, but not quite, identical records. Similar rows are identified using fuzzy matching principles. This tool can be difficult to use, as several different thresholds can be set, each of which affects the outcome of the grouping. Fuzzy Grouping is slow and can create sizable temporary tables in order to perform its transformation; however, it is a very important tool for data comparison and data mining.

The Multicast transformation is one of the simplest, and one of the most useful, of the available transformations. It takes a single input and creates multiple logical copies of it, so that several operations can be performed on the same data without affecting one another.

The Data Conversion transformation converts the data in a column from its current type to another type. This can be useful for formatting data loaded from flat text files, supporting a data type switch, or producing output in a form unavailable to the original data type. For example, if a text field contains dates, it may be more efficient to process, compare and sort that field if its contents are stored as a date type. This can be done easily using the Data Conversion transformation.

Aggregate functions are another form of data manipulation offered by SQL Server 2005. Aggregate functions, such as AVG and CHECKSUM_AGG, are used in the select list of a SELECT query, usually together with GROUP BY, and may also appear in HAVING and COMPUTE BY clauses. Aggregate functions can be used in conjunction with joins, merges or other data manipulation methods to create or calculate information from the table’s existing records.
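As a rough illustration of using aggregate functions in conjunction with a join, the sketch below aggregates over a left outer join of the sample Parts and Sales tables. It assumes Python's built-in sqlite3 module and an in-memory SQLite database in place of SQL Server 2005; the column types are guesses, and COALESCE is used here simply to show missing sales as 0.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server 2005.
# The tables mirror Figure 1 and Figure 2; the column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (PartNo INTEGER PRIMARY KEY, PartName TEXT);
    CREATE TABLE Sales (PartNo INTEGER, NumberSold INTEGER);
    INSERT INTO Parts VALUES (25, 'Widget'), (26, 'Sprocket'),
                             (27, 'Thingy'), (28, 'Whatsit');
    INSERT INTO Sales VALUES (25, 2500), (26, 116), (27, 11948);
""")

# Aggregates over the joined rows: COUNT(*) counts every joined row,
# while COUNT(Sales.PartNo) skips the NULLs produced for unsold parts.
summary = conn.execute("""
    SELECT COUNT(*), COUNT(Sales.PartNo), SUM(Sales.NumberSold)
    FROM Parts LEFT OUTER JOIN Sales ON Parts.PartNo = Sales.PartNo
""").fetchone()

# The same join grouped by part name, reporting 0 for parts with no sales.
per_part = conn.execute("""
    SELECT Parts.PartName, COALESCE(SUM(Sales.NumberSold), 0)
    FROM Parts LEFT OUTER JOIN Sales ON Parts.PartNo = Sales.PartNo
    GROUP BY Parts.PartName
    ORDER BY Parts.PartName
""").fetchall()

print(summary)
for name, total in per_part:
    print(name, total)
```

Because the join is a left outer join, the part with no sales still appears in the grouped output instead of disappearing, which is often what a sales summary needs.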
CHECKSUM, which is often grouped with the aggregate functions, computes a hash value over the columns of a row or over a list of expressions; its true aggregate counterpart, CHECKSUM_AGG, computes a checksum across a group of rows. The hash value is derived from the contents of the selected information, but it is not guaranteed to be unique. This value can be used to build a hash index in order to speed data access and perform equality checks. In order to use CHECKSUM, use the expression

    CHECKSUM(*) or CHECKSUM(expression [, ...n])

in a SELECT or UPDATE query.

Mathematical aggregate functions such as COUNT, MIN, MAX, SUM and AVG offer a snapshot of a data set as a whole, without the need to inspect each individual row. COUNT returns the number of records in a table matching the specified criteria. Using Figure 1 above:

    SELECT COUNT(*) FROM Parts
    WHERE PartNo = 28;

This query would be expected to return 1, as the PartNo field is a unique identifier.

MIN and MAX return the minimum and maximum values of the specified field:

    SELECT MIN(PartNo) FROM Parts;

would return 25, and

    SELECT MAX(PartNo) FROM Parts;

would return 28.

AVG and SUM are aggregate functions which return mathematical calculations on the selected field. As such, they are only meaningful for a numeric field such as INT rather than a text or date based field. AVG computes the average of the selected values. It has two operational modes: ALL includes every value, whereas DISTINCT uses each unique value only once. ALL is the default, so if the column is known to contain duplicate values that should be counted only once, DISTINCT must be specified. Using Figure 2, we can compute the average sales volume for all products.

    SELECT AVG(NumberSold) FROM Sales;

The three values in Figure 2 average (2500 + 116 + 11948) / 3, or about 4854.67 items sold per product. If NumberSold is an INT column, SQL Server returns the integer 4854, because AVG of an integer column yields an integer; converting the column to a decimal type first preserves the fraction.

SUM works in a similar manner to AVG, and has the same behavior for ALL and DISTINCT.
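The COUNT, MIN, MAX and AVG queries above can be checked with a small script. This is a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's built-in sqlite3 module instead of SQL Server 2005, and the column types are guesses. One known difference is that SQLite's AVG always returns a floating-point value, whereas SQL Server returns an integer for an INT column. The values 10, 10 and 40 used to contrast ALL with DISTINCT are made up for illustration, since Figure 2 contains no duplicates.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server 2005.
# The tables mirror Figure 1 and Figure 2; the column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (PartNo INTEGER PRIMARY KEY, PartName TEXT);
    CREATE TABLE Sales (PartNo INTEGER, NumberSold INTEGER);
    INSERT INTO Parts VALUES (25, 'Widget'), (26, 'Sprocket'),
                             (27, 'Thingy'), (28, 'Whatsit');
    INSERT INTO Sales VALUES (25, 2500), (26, 116), (27, 11948);
""")

# COUNT with a filter on the unique PartNo column.
count_28 = conn.execute(
    "SELECT COUNT(*) FROM Parts WHERE PartNo = 28"
).fetchone()[0]

# MIN and MAX of the part numbers in a single query.
min_no, max_no = conn.execute(
    "SELECT MIN(PartNo), MAX(PartNo) FROM Parts"
).fetchone()

# AVG over Figure 2; SQLite returns a float here.
avg_sold = conn.execute(
    "SELECT AVG(NumberSold) FROM Sales"
).fetchone()[0]

# ALL (the default) versus DISTINCT on hypothetical values 10, 10, 40.
avg_all, avg_distinct = conn.execute("""
    SELECT AVG(x), AVG(DISTINCT x)
    FROM (SELECT 10 AS x UNION ALL SELECT 10 UNION ALL SELECT 40)
""").fetchone()

print(count_28, min_no, max_no)
print(round(avg_sold, 2))
print(avg_all, avg_distinct)
```

With the duplicate 10 counted twice the average is 20, and with DISTINCT it is 25, which shows why the choice between the two modes matters when a column holds repeated values.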
In order to compute the sum of the sales from Figure 2:

    SELECT SUM(NumberSold) FROM Sales;

This query will return 14564, the total number of units sold across the product range.

SQL Server 2005 has a large number of data manipulation facilities, implemented in both Transact-SQL and the graphical design tools. There are dozens of transformations and potential queries in addition to those outlined here. Data manipulation tools include those to sort, transform and aggregate data, as well as to change display and output methods. SQL Server 2005 is a fully functional data manipulation engine in addition to being a database server.