The tna
package accepts sequence data (a
stslist
object created by the TraMineR
package), a wide data.frame
where each row represents a
sequence and each column represents a timepoint, or a transition matrix.
When our data does not follow any of those formats, we can use the
prepare_data
function from tna
to get it in
the right shape.
Let’s start by creating an example dataframe containing event logs
where each row represents an action performed by a user
at
a specific timestamp.
The achievement
column
indicates whether the user was associated with high or low
achievement.
<- tribble(
df ~user, ~timestamp, ~event, ~achievement, ~order,
2, "2025-02-27 18:01:32", "Plan", "High", 1,
2, "2025-02-27 18:03:32", "Goals", "High", 2,
1, "2025-02-27 18:08:32", "Goals", "High", 1,
3, "2025-02-27 18:15:32", "Plan", "High", 1,
5, "2025-02-27 18:16:32", "Help", "Low", 1,
3, "2025-02-27 18:19:32", "Goals", "High", 2,
5, "2025-02-27 18:19:32", "Plan", "Low", 2,
2, "2025-02-27 18:20:32", "Environment", "High", 3,
1, "2025-02-27 18:25:32", "Task", "High", 2,
4, "2025-02-27 18:25:32", "Help", "Low", 1,
5, "2025-02-27 18:26:32", "Task", "Low", 3,
3, "2025-02-27 18:32:32", "Metacognition", "High", 3,
4, "2025-02-27 18:33:32", "Goals", "Low", 2,
1, "2025-02-27 18:36:32", "Environment", "High", 3,
1, "2025-02-27 18:44:32", "Task", "High", 4,
1, "2025-02-27 18:45:32", "Task", "High", 5,
5, "2025-02-27 18:46:32", "Help", "Low", 4,
1, "2025-02-27 19:01:32", "Plan", "High", 6,
2, "2025-02-27 19:06:32", "Environment", "High", 4,
4, "2025-02-27 19:06:32", "Plan", "Low", 3,
1, "2025-02-27 19:13:32", "Metacognition", "High", 7,
3, "2025-02-27 19:13:32", "Metacognition", "High", 4,
4, "2025-02-27 19:15:32", "Goals", "Low", 4,
4, "2025-02-27 19:20:32", "Metacognition", "Low", 5,
4, "2025-02-27 19:20:32", "Environment", "Low", 6,
5, "2025-02-27 19:23:32", "Metacognition", "Low", 5,
3, "2025-02-27 19:25:32", "Help", "High", 5,
2, "2025-02-27 19:27:32", "Metacognition", "High", 5,
2, "2025-02-27 19:33:32", "Environment", "High", 6,
4, "2025-02-27 19:46:32", "Environment", "Low", 7,
5, "2025-02-27 19:49:32", "Plan", "Low", 6,
2, "2025-02-27 19:55:32", "Goals", "High", 7
)
When we are interested in the whole dataset as a single sequence, and
the data is already ordered chronologically, we just need to specify the
action
column, which in this case is called
event
. We can pass the result of calling
prepare_data
directly to the tna
function to
create a tna
model.
<- prepare_data(df, action = "event")
by_classroom <- tna(by_classroom)
tna_by_classroom plot(tna_by_classroom)
When we want to create a sequence per user, in addition to
action
, we need to specify the actor
column,
which in this case is user
. If the data is already ordered,
we do not need to provide any additional arguments. If it is not
ordered, we can provide a column to order the data by, in this case
order
.
<- prepare_data(df, actor = "user", action = "event", order = "order")
by_user <- tna(by_user)
tna_by_user plot(tna_by_user)
If rather than the order we only have the timestamps, we can provide
it as time
column. By default, events happening less than
15 minutes apart will be grouped in the same sequence, while events that
happen after a longer time, will mark the start of a new sequence (i.e.,
session). If both time
and order
are provided,
the data will be first ordered by time
, and in case of a
tie, by order
.
<- prepare_data(df, actor = "user", time = "timestamp", action = "event")
by_session <- tna(by_session)
tna_by_session plot(tna_by_session)
If we want to customize the time gap that marks the start of a new
sequence, we can do so by customizing the time_threshold
argument (in minutes).
<- prepare_data(df, actor = "user", time = "timestamp", action = "event",
by_session_custom time_threshold = 10*60) # 10 minutes
<- tna(by_session_custom)
tna_by_session_custom plot(tna_by_session_custom)
Another advantage of using
prepare_data
prior to
constructing the tna
model is that we get to keep other
variables of the data and use them in our analysis. For instance, we can
use group_tna
to create a tna
model by
achievement group just by passing the result of
prepare_data
as a first argument, and indicating the name
of the column in the data that we want to group by.
<- group_tna(by_user, group = "achievement")
gtna plot(gtna)