Teradata viewpoint

Analysing ‘Big Data’ from a database with real world data requires retrieving data from the database by writing relevant queries. As part of “Managing Big Data with MySQL” by Duke University on Cousera, I dealt with real world data that comprise hundreds to millions of entries/rows.

This is the database of Dillard’s Department stores, specifically, the

UA_DILLARDS that contains sales transaction data for 453 Dillard’s Department stores between August 2004 and July 2005.

The full actual and details of the tables in the database can be found on the University of Arkansas website

(Note: there was an additional table provided in the course which gave more data to the SKSTINFO  and that table is the SKSTINFO_FIX. Hence there are 7 tables that will be looked at here.  )

for reference, the full list of tables from the Dillard’s dataset are

  • DEPTINFO
  • SKSTINFO
  • SKSTINFO_FIX
  • SKUINFO
  • STORE_MSA
  • STRINFO
  • TRNSACT

The SQL queries here were written as part of assignments in Coursera course. These are my personal codes and criticisms and additions are welcome

Exercise 1. Use HELP and SHOW to confirm that the relational schema provided to us for the Dillard’s dataset shows the correct column names and primary keys for each table.

 

Exercise 2. Look at examples of data from each of the tables. Pay particular attention to the skuinfo table.

 

Exercise 3. Examine lists of distinct values in each of the tables.

 

Exercise 4. Examine instances of transaction table where “amt” is different than “sprice”. What did you learn about how the values in “amt”, “quantity”, and “sprice” relate to one another?

When the “amt” is different than “sprice” then the quantity purchased was more than 1

Exercise 5: Even though the Dillard’s dataset had primary keys declared and there were not many NULL values, there are still many strange entries that likely reflect entry errors. To see some examples of these likely errors, examine: (a) rows in the trsnact table that have “0” in their orgprice column (how could the original price be 0?), (b) rows in the skstinfo table where both the cost and retail price are listed as 0.00, and (c) rows in the skstinfo table where the cost is greater than the retail price (although occasionally retailers will sell an item at a loss for strategic reasons, it is very unlikely that a manufacturer would provide a suggested retail price that is lower than the cost of the item).


Exercise 6. Write your own queries that retrieve multiple columns in a precise order from a table, and that restrict the rows retrieved from those columns using “BETWEEN”, “IN”, and references to text strings.

– retrieve multiple columns in a precise order from a table

– select distinct rows from a table – rename columns in a query output

– restrict the data you retrieve to meet certain criteria – sort your output – reference parts of text “strings”

– use “BETWEEN” and “IN” in your query statements

Similar Posts

One Comment

  1. Wow, amazing weblog structure! How lengthy have you ever been blogging for?
    you make blogging look easy. The overall glance
    of your site is fantastic, let alone the content!

Leave a Reply

Your email address will not be published. Required fields are marked *