Select first row in each GROUP BY group?

Introduction to GROUP BY in SQL

In SQL, the GROUP BY clause is a powerful tool for summarizing data. It works by organizing rows with matching values in one or more columns into groups. This allows you to perform calculations on these groups, such as counting the number of items in each group (using COUNT()), finding the average value (using AVG()), or identifying the minimum or maximum values (using MIN() and MAX()). GROUP BY is typically used with a SELECT statement to extract specific data and often works alongside aggregate functions to generate insightful summaries from your database tables.

You can do following operations using GROUP BY in sql

  • To group your data in SQL, you’ll use the GROUP BY clause along with the SELECT statement.
  • The GROUP BY clause typically goes after the WHERE clause in your query, where you can specify conditions to filter your data before grouping.
  • If you also want to sort your grouped data, the ORDER BY clause comes after the GROUP BY clause in the query.
  • For filtering groups based on specific conditions, you can use the HAVING clause after the GROUP BY clause. This lets you apply conditions to the groups themselves, rather than the individual rows.

Syntax:

FROM table_name

WHERE condition

GROUP BY column1, column2

ORDER BY column1, column2;

Here, function_name refers to the specific calculation you want to perform on a column, like finding the sum (SUM()), average (AVG()), MIN(), MAX() or other statistical values.

table_name is simply the name of the table you’re retrieving data from.

condition is an optional part where you can specify a rule to filter the data. Only rows that meet the condition will be included in the results.

Examples of Using GROUP BY in SQL

You can create table with SQL Queries. I am here Using PostgreSQL. You can use any other sql database such as MySQL, ORACLE etc. The following output is the screenshot of

CREATE TABLE student (
  student_id INT PRIMARY KEY,
  student_name VARCHAR(50),
  course_title VARCHAR(50),
  marks DECIMAL(10,2),
  age INT
);

SELECT * FROM student;

Output:

image 2 - Select first row in each GROUP BY group?

Now insert some values in the table with query:

INSERT INTO student (student_id, student_name, course_title, marks, age) VALUES 
	(1, 'Alice Smith', 'Mathematics', 85, 19),
(2, 'Bob Johnson', 'History', 72, 20),
(3, 'Charlie Brown', 'Computer Science', 90, 18),
(4, 'Alice Smith', 'English Literature', 88, 21),
(5, 'Ethan Lee', 'Biology', 65, 19),
(6, 'Fiona Miller', 'Chemistry', 78, 20),
(7, 'Bob Johnson', 'Physics', 95, 18),
(8, 'Bob Johnson', 'Art History', 82, 21),
(9, 'Isaac Clark', 'Psychology', 68, 19),
(10, 'Jasmine Moore', 'Economics', 80, 20);

SELECT * FROM student;

Output:

image 3 - Select first row in each GROUP BY group?

Using GROUP BY with SUM() Function:

When we GROUP BY a single column, it’s like sorting through a pile of papers. We take all the papers with the same information on that specific column and put them in their own stacks.

SELECT student_name, SUM(marks) FROM student 
GROUP BY student_name;

Output:

image 4 - Select first row in each GROUP BY group?

The reason we see these totals is because there are multiple rows for both Bob Johnson and Alice Smith. In the data, Bob Johnson’s marks appear three times (72, 95, and 82), which adds up to 249. Similarly, Alice Smith’s marks appear twice (85 and 88) for a total of 173.

In other words, the GROUP BY clause has grouped together all the entries for each student, allowing us to calculate total marks for each one.

Untitled design - Select first row in each GROUP BY group?

Using GROUP BY with AVG() Function:

SELECT student_name, AVG(marks) FROM student
GROUP BY student_name;

Output:

image 5 - Select first row in each GROUP BY group?

Similarly, the results show us separate entries for students who took the course multiple times. For instance, Bob Johnson has three entries with marks 72, 95, and 82. Their average is calculated as (72 + 95 + 82) / 3 = 83. Similarly, Alice Smith has marks 85 and 88, and their average is (85 + 88) / 2 = 86.5.

In essence, the GROUP BY clause has categorized all the entries by student name, allowing us to compute the average marks earned by each student in the course.

Select First Row in Each GROUP by Distinct On

DISTINCT ON is a PostgreSQL-specific clause that allows you to retrieve rows based on distinct values of one or more specified columns. It is particularly useful when you want to retrieve only the first row for each distinct value of a column or combination of columns, based on a specified order.

Syntax:

SELECT DISTINCT ON (column1, column2, ...) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC/DESC;

SELECT DISTINCT ON (column1, column2, …): Specifies the columns on which you want to retrieve distinct rows.
column1, column2, …: Columns you want to include in the output. These columns can be different from the ones specified in the DISTINCT ON clause.
FROM table_name: Specifies the table from which you are selecting data.
ORDER BY column1, column2, …: Determines the order in which the rows are considered when identifying distinct rows. The first row encountered for each combination of values in the specified columns is returned.

SELECT DISTINCT ON (student_name)
       student_name, marks
FROM   student;

Output:

image 6 - Select first row in each GROUP BY group?
Share The Tutorial With Your Friends
Twiter
Facebook
LinkedIn
Email
WhatsApp
Skype
Reddit

Check Our Ebook for This Online Course

Advanced topics are covered in this ebook with many practical examples.