Thread: Counting # of consecutive rows with specified value(s)?
I'm working with an attendance table, where each person gets a record for each day of class, with an attendance code (ABSENT, ATTENDED, ...). I'm trying to figure out how to get the number of consecutive absences a person has. I'm guessing this can be done without writing a function, but I need some help here.
I can currently test whether someone has at least a specified number of consecutive absences with the query below, but it would be better to get the actual number.
As a second question, what about getting the number of consecutive records for a set of values? (e.g., attendance_code IN ('ATTENDED','EXCUSED')
Any ideas or suggestions? Thanks.
Ken
This query checks for 4 consecutive absences:
SELECT client_id,
array(
SELECT attendance_code
FROM attendance
WHERE client_id=enrollment.client_id
ORDER BY attended_on DESC
LIMIT 4
)=array_fill('ABSENT'::varchar,array[4]) AS absent_last_4
FROM enrollment;

AGENCY Software
A data system that puts you in control
(253) 245-3801
On Jun 6, 2012, at 22:20, Ken Tanzer <ken.tanzer@gmail.com> wrote: > I'm working with an attendance table, where each person gets a record for each day of class, with an attendance code (ABSENT,ATTENDED, ...). I'm trying to figure out how to get the number of consecutive absences a person has. I'm guessingthis can be done without writing a function, but I need some help here. > > I can currently test whether someone has at least a specified number of consecutive absences with the query below, butit would be better to get the actual number. > > As a second question, what about getting the number of consecutive records for a set of values? (e.g., attendance_codeIN ('ATTENDED','EXCUSED') > > Any ideas or suggestions? Thanks. > > Ken > > This query checks for 4 consecutive absences: > > SELECT client_id, > array( > SELECT attendance_code > FROM attendance > WHERE client_id=enrollment.client_id > ORDER BY attended_on DESC > LIMIT 4 > )=array_fill('ABSENT'::varchar,array[4]) AS absent_last_4 > FROM enrollment; > Quick thought if you have window functions. If you perform a rank over the appropriate subset of columns (with order by)you might be able to get consecutive records with matching codes to have the same rank. The in an outer query you cansimply sum with the rank as the group. I am unsure which function gives you equal values for ties, or whether this approachwill even work, but it seems worth considering at least. Using a min date your result might be readable as "startingon 9/1, 4 absences; starting on 9/5, 21 attendeds; etc...". Consider using a frame of 1 prior row to see whether a given date/code is different than the immediately preceding day. If it is that day qualifies as the possible beginning of a chain otherwise it has to be added to an existing chain. Youprobably need to do this first then use the dates of the applicable chain starts (and ends maybe...) as fields in thepartition mentioned above. HTH David J.
Le 2012-06-06 à 22:20, Ken Tanzer a écrit :
I can currently test whether someone has at least a specified number of consecutive absences with the query below, but it would be better to get the actual number.As a second question, what about getting the number of consecutive records for a set of values? (e.g., attendance_code IN ('ATTENDED','EXCUSED')Any ideas or suggestions? Thanks.
This is similar to the islands and gaps problem. Search for that on StackOverflow and you'll get it.
Bye!
François
Thank you both for the suggestions. I started playing with the window functions, but found and copied an "islands and gaps" example that didn't need them, and was simpler than I thought. This query seems to do the trick:
SELECT
client_id,
count(*)
FROM
(SELECT
client_id,
attendance_code
FROM recovery_circle_attendance rca
WHERE attended_on >
(SELECT max(attended_on)
FROM recovery_circle_attendance
WHERE client_id=rca.client_id AND attendance_code != 'ABSENT')
) foo
GROUP BY client_id;
It's a fairly small dataset, so at least right now I'm not too worried about performance, but am curious if this is a reasonably well-optimized way to get this info, or if there are any glaring issues or room for improvement in this regard?
Cheers,
Ken
--

On Thu, Jun 7, 2012 at 12:35 PM, François Beausoleil <francois@teksol.info> wrote:
Le 2012-06-06 à 22:20, Ken Tanzer a écrit :I can currently test whether someone has at least a specified number of consecutive absences with the query below, but it would be better to get the actual number.As a second question, what about getting the number of consecutive records for a set of values? (e.g., attendance_code IN ('ATTENDED','EXCUSED')Any ideas or suggestions? Thanks.This is similar to the islands and gaps problem. Search for that on StackOverflow and you'll get it.Bye!François

AGENCY Software
A data system that puts you in control
(253) 245-3801