R: How to visualize change in binary/categorical data over time
-
    > dput(data)
    structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
                   Dx = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1),
                   Month = c(0, 6, 12, 18, 24, 0, 6, 12, 18, 24, 0, 6, 12, 18, 24),
                   score = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)),
              .Names = c("ID", "Dx", "Month", "score"),
              row.names = c(NA, -15L), class = "data.frame")

    > data
       ID Dx Month score
    1   1  1     0     0
    2   1  1     6     0
    3   1  1    12     0
    4   1  1    18     1
    5   1  1    24     1
    6   2  1     0     1
    7   2  1     6     1
    8   2  2    12     1
    9   2  2    18     0
    10  2  2    24     1
    11  3  1     0     0
    12  3  1     6     0
    13  3  1    12     0
    14  3  1    18     0
    15  3  1    24     0

Suppose I have the above data.frame. I have 3 patients (ID = 1, 2 or 3). Dx is the diagnosis (Dx = 1 is normal, Dx = 2 is diseased). Month is the visit month, and score is a test score. The test score is binary: it can change from 0 to 1, or revert from 1 to 0.

I am having trouble coming up with a way to visualize this data. I would like an informative graph that shows:

- the trend of the participants' test scores over time;
- how that trend compares to the participants' diagnosis over time.

My real dataset has over 800 participants, so I do not want to construct 800 separate graphs. The binary test score is what really has me stumped. Any help would be appreciated.
-
Answer:
Note: most of the data manipulation below is needed for part 2 (comparing the scores with the diagnosis). Part 1 (the score trend) is simpler, and it fits into the same plot.

Uses:

    library(data.table)  # also provides melt() for data.tables
    library(ggplot2)

(data.table has its own melt(); loading reshape2 after data.table would mask it, and the := step below needs a data.table.)

To compare

First, recode Dx from 1/2 to 0/1 (assuming that a score of 0 corresponds to Dx = 1, i.e. normal):

    data$Dx <- data$Dx - 1

Now create a matrix that returns 1 for a diagnosis of 1 with a test score of 0, and -1 for a test score of 1 with a diagnosis of 0 (rows are Dx, columns are score):

    compare <- matrix(c(0, 1, -1, 0), ncol = 2, dimnames = list(c(0, 1), c(0, 1)))

    > compare
      0  1
    0 0 -1
    1 1  0

Now score every event. This looks up the matrix above for every row of your data:

    data$calc <- diag(compare[as.character(data$Dx), as.character(data$score)])

*Note: this builds an n-by-n matrix and keeps only its diagonal, so it can be sped up for large data by matching each row directly, but it is a quick fix for smaller sets like yours.

To allow data.table aggregation:

    data <- data.table(data)

Now create the summary variables:

    tograph <- melt(data[, list(ScoreTrend = sum(score) / .N,
                                Type       = sum(calc) / length(calc[calc != 0]),
                                Measure    = sum(abs(calc))),
                         by = Month],
                    id.vars = c("Month"))

- ScoreTrend: the proportion of positive scores in each month. Shows the trend of scores over time.
- Type: the balance of -1 vs. 1 events over time. If this returns -1, all mismatched events were score = 1 with Dx = 0. If it returns 1, all were Dx = 1 with score = 0. A zero means a balance between the two.
- Measure: the raw number of incorrect (mismatched) events.

We melt this table along Month so that we can create a faceted graph.

If a month has no incorrect events, Type is 0/0, i.e. NaN. Any comparison with NaN gives NA, so value == NaN never matches; use is.nan() to set these to 0:

    tograph[is.nan(value), value := 0]

Finally, we can plot:

    ggplot(tograph, aes(x = Month, y = value)) +
      geom_line() +
      # free_y: the three measures are on different scales
      facet_wrap(~variable, ncol = 1, scales = "free_y")

We can now see, in one plot:

- the proportion of positive scores by month
- the balance of under- vs. over-diagnosis
- the number of incorrect diagnoses.
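Because the recoded Dx and score are both 0/1, the compare lookup is equivalent to a subtraction, calc = Dx - score: 1 for a diseased diagnosis with a score of 0, -1 for a normal diagnosis with a score of 1, and 0 when the two agree. The minimal base-R sketch below assumes the example data from the question. It checks the subtraction against a vectorized lookup that indexes compare with a two-column matrix of (row, column) names, which avoids the n-by-n intermediate that diag() needs, and then computes ScoreTrend, Type and Measure per month with the NaN case set to 0. The helper summarise_month() and the name calc_lookup are hypothetical.

```r
# Example data from the question (Dx: 1 = normal, 2 = diseased)
data <- data.frame(
  ID    = rep(1:3, each = 5),
  Dx    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1),
  Month = rep(c(0, 6, 12, 18, 24), times = 3),
  score = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
)
data$Dx <- data$Dx - 1  # recode to 0 = normal, 1 = diseased

# Lookup table from the answer (rows = Dx, columns = score)
compare <- matrix(c(0, 1, -1, 0), ncol = 2, dimnames = list(c(0, 1), c(0, 1)))

# Vectorized lookup: one (row name, column name) pair per observation
calc_lookup <- compare[cbind(as.character(data$Dx), as.character(data$score))]

# For 0/1 codes, a plain subtraction gives the same values
data$calc <- data$Dx - data$score
stopifnot(all(calc_lookup == data$calc))

# Per-month summaries: ScoreTrend, Type (NaN replaced by 0) and Measure
summarise_month <- function(d) {
  mismatches <- d$calc[d$calc != 0]
  type <- sum(mismatches) / length(mismatches)  # 0 / 0 = NaN if no mismatches
  data.frame(
    Month      = d$Month[1],
    ScoreTrend = mean(d$score),
    Type       = if (is.nan(type)) 0 else type,
    Measure    = sum(abs(d$calc))
  )
}
summary_by_month <- do.call(rbind, lapply(split(data, data$Month), summarise_month))
rownames(summary_by_month) <- NULL
print(summary_by_month)
```

Month 12 is the case the is.nan() step exists for: every patient's score agrees with the diagnosis there, so Type would otherwise be NaN.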
Adrian at Stack Overflow
Other answers
With ggplot2 you can make faceted plots with a subplot for each patient (see my suggestions for dealing with the large number of plots below). An example visualization:

    library(ggplot2)
    ggplot(data, aes(x = Month, y = score, color = factor(Dx))) +
      geom_point(size = 5) +
      scale_x_continuous(breaks = c(0, 6, 12, 18, 24)) +
      scale_color_discrete("Diagnosis", labels = c("normal", "diseased")) +
      facet_grid(. ~ ID) +
      theme_bw()

This gives one panel per patient, with each score plotted against Month and coloured by diagnosis.

Including 800 patients in one plot might be too much, as already mentioned in the comments on the question. There are several ways around this:

- Aggregate the data.
- Create patient subgroups and make a plot for each subgroup.
- Filter out all the patients who have never been ill.

For the last suggestion, you can use the following code (which I adapted from an answer to one of my own questions, http://stackoverflow.com/a/24715757/2204410). It uses the original coding, where Dx = 1 is normal:

    deleteable <- with(data, ave(Dx, ID, FUN = function(x) all(x == 1)))
    data2 <- data[deleteable == 0, ]

You can also use this to create a new variable identifying patients who have been ill (neverill is 1 for patients who were never ill and 0 for those who were ill at some visit):

    data$neverill <- with(data, ave(Dx, ID, FUN = function(x) all(x == 1)))

Then you can, for example, aggregate the data by several grouping variables (e.g. Month and neverill).
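The aggregation step can be sketched as follows, assuming the example data from the question with its original coding (Dx = 1 is normal). It computes the proportion of positive scores per month separately for patients who were never ill and for those who were ill at some visit, giving one line per group instead of one panel per patient. The names trend and Group are hypothetical, and the plot is only built if ggplot2 is installed.

```r
# Example data from the question (Dx: 1 = normal, 2 = diseased)
data <- data.frame(
  ID    = rep(1:3, each = 5),
  Dx    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1),
  Month = rep(c(0, 6, 12, 18, 24), times = 3),
  score = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
)

# neverill: 1 if a patient's Dx is 1 (normal) at every visit, else 0
data$neverill <- with(data, ave(Dx, ID, FUN = function(x) all(x == 1)))

# Proportion of positive scores per month within each group
trend <- aggregate(score ~ Month + neverill, data = data, FUN = mean)
trend <- trend[order(trend$neverill, trend$Month), ]
trend$Group <- factor(trend$neverill, levels = c(0, 1),
                      labels = c("ill at some visit", "never ill"))
print(trend)

# One line per group instead of one panel per patient (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(trend, aes(x = Month, y = score, colour = Group)) +
    geom_line() +
    labs(y = "Proportion with score = 1") +
    theme_bw()
  print(p)
}
```

Further grouping variables, such as a patient-subgroup label, can be added to the aggregate() formula in the same way.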
Jaap