Abstract

   The high background tumor mutation burden in cutaneous melanoma limits
   the ability to identify significantly mutated genes (SMGs) that drive
   this cancer. To address this, we performed a mutation significance
   study of over 1,000 melanoma exomes, combined with a multi-omic
   analysis of 470 cases from The Cancer Genome Atlas. We discovered
   several SMGs with co-occurring loss-of-heterozygosity and
   loss-of-function mutations, including PBRM1, PLXNC1 and PRKAR1A, which
   encodes a protein kinase A holoenzyme subunit. Deconvolution of bulk
   tumor transcriptomes into cancer, immune and stromal components
   revealed a melanoma-intrinsic oxidative phosphorylation signature
   associated with protein kinase A pathway alterations. We also
   identified SMGs on the X-chromosome, including the RNA helicase DDX3X,
   whose loss-of-function mutations were exclusively observed in males.
   Finally, we found that tumor mutation burden and immune infiltration
   contain complementary information on survival of patients with
   melanoma. In summary, our multi-omic analysis provides insights into
   melanoma etiology and supports contribution of specific mutations to
   the sex bias observed in this cancer.

INTRODUCTION

   Cutaneous melanoma is the most aggressive form of skin cancer. Most
   frequently it develops on non-acral, sun-exposed skin, linked with DNA
   damage from ultraviolet radiation (UVR). It can also arise on acral
   skin, such as the soles of the feet, palms of the hands, and fingernail
   matrix, where UVR is thought to play a lesser role.^[46]1 Melanomas
   originating from sun-exposed skin display one of the highest tumor
   mutation burden (TMB) among all malignancies.^[47]2–[48]4 The majority
   of these mutations are UVR-induced C>T transitions occurring at
   dipyrimidines.^[49]5

   An important yet poorly understood aspect of melanoma is that males
   have higher incidence and worse prognosis at all clinical
   stages.^[50]6,[51]7 The mechanisms that mediate these differences
   remain unclear. Recently, differential expression of a gonosomal gene,
   PPP2R3B, between sexes in melanoma was proposed to explain some of
   these differences.^[52]8 However, the cumulative effect of
   X-inactivation-escaping genes on melanoma biology remains largely
   unknown.

   Despite methodological advances in the identification of significantly
   mutated genes (SMGs)^[53]2,[54]9–[55]12, it remains difficult to
   determine which genes are under positive selection in melanoma. For
   instance, recent studies have reported a context-specific mutational
   signature characterized by extremely high mutation rates in ETS
   transcription factor binding sites.^[56]13–[57]16 This phenomenon
   occurs at cytosines flanked by a specific sequence ([C]TTCCG)^[58]13,
   where transcription factor binding causes conformational changes
   increasing DNA vulnerability to UVR-induced damage^[59]14 and reducing
   repair efficiency.^[60]15,[61]16 Classical trinucleotide mutation
   models do not account for this context-specific
   signature^[62]12,[63]13, which can lead to spurious evidence of
   positive selection. Additionally, the large proportion of passenger
   mutations greatly reduces the statistical power to detect genes under
   positive selection.^[64]2 Previous estimates suggest ~1,000 melanoma
   exomes are needed to achieve the same sensitivity provided by 200
   breast cancer cases.^[65]17 The largest integrative analysis of
   cutaneous melanoma from The Cancer Genome Atlas (TCGA) included 331
   cases, identifying 13 SMGs^[66]4, and a more recent analysis of 437
   cases identified 17 SMGs.^[67]12 Thus, a comprehensive catalogue of
   oncogenes and tumor suppressors is still lacking for cutaneous
   melanoma.

   Here, we performed a mutation analysis of cutaneous melanoma, combining
   whole exome somatic variants for 1,014 melanomas from five
   studies^[68]2,[69]4,[70]18–[71]20, with integration of the complete
   melanoma TCGA cohort of 470 cases with copy number, transcriptomic,
   methylation and clinical data. We controlled for background mutational
   processes by analyzing samples with different mutational signatures
   separately and limited the risk of false positives by accounting for
   ETS-binding sites and other confounding factors. For several identified
   SMGs, we observed independent evidence of positive selection, such as
   co-occurring mutations and loss-of-heterozygosity (LOH). The power
   gained by analyzing over 1,000 melanoma exomes, along with our
   integrative analysis, facilitated the identification of previously
   unrecognized SMGs in cutaneous melanoma, uncovered the importance of a
   male-specific tumor suppressor, DDX3X, and provided insights into the
   relationship of UVR, TMB, and immune infiltration with patient
   survival.

RESULTS

Summary of samples

   We collected and uniformly annotated whole exome somatic variant calls
   for 1,014 melanomas (623 males, 390 females, and one unannotated) from
   four whole exome sequencing studies^[72]2,[73]4,[74]19,[75]20 and one
   whole genome sequencing study^[76]18 ([77]Supplementary Tables 1 and
   [78]2). The combined cohort comprised 219 primary, 663 metastatic, and
   132 unannotated samples. The majority (n = 772) originated on non-acral
   skin, and the rest were from acral (n = 51), mucosal (n = 14), or of
   unknown, uncertain, or unavailable origin (n = 177). We referred to a
   published curated annotation to define non-acral cutaneous melanomas in
   TCGA.^[79]21 Cases from the Hayward et al. study (n = 183) and the
   majority from TCGA (n = 470) were systemic and radiation treatment
   naïve prior to tumor sample procurement ([80]Supplementary Table 2).
   The other cohorts were not restricted to treatment naïve
   samples^[81]2,[82]19,[83]20. Only the TCGA cohort had matching gene
   expression, methylation and copy number data ([84]Supplementary Table
   3).

Identification of significantly mutated genes

   We identified SMGs using OncodriveFML^[85]11 (OFML), an algorithm that
   detects positive selection by comparing the average impact score of the
   mutations in a gene with its expected distribution under the hypothesis
   of neutral evolution. While OFML uses a permutation approach that
   controls for variations of the mutation rate across the genome, it
   relies on a global estimate of the tri-nucleotide background mutation
   rates. Consequently, we stratified our cohort according to the dominant
   tri-nucleotide mutational signature in each sample using non-negative
   matrix factorization (NMF). The optimal NMF decomposition consisted of
   three mutational signatures ([86]Extended Data Fig. 1a–[87]d), which we
   compared to a set of 65 pan-cancer signatures from the COSMIC database
   ([88]Extended Data Fig. 1f, [89]g).^[90]22 Our first signature matched
   UVR-associated mutational signatures (SBS7a and 7b) that dominated the
   majority of non-acral cutaneous melanomas ([91]Extended Data Fig. 1e).
   Our second signature was a mixture of an aging-associated signature
   (SBS1) and another signature of unknown etiology (SBS39), most
   prevalent in acral and mucosal melanomas ([92]Extended Data Fig. 1d,
   [93]e, [94]g). Our third signature corresponded to an alkylating
   agent-associated mutational signature (SBS11) dominant in 13 samples,
   likely due to prior treatment with an alkylating agent ([95]Extended
   Data Fig. 1d). We performed separate mutation significance analyses on
   UVR-high (>50% UVR-mutations, n = 824) and UVR-low samples (≤50%
   UVR-mutations, n = 177), excluding samples with a dominant alkylating
   signature (n = 13).

   OFML employs the CADD score^[96]23, which combines multiple annotations
   (e.g. conservation measures such as phyloP^[97]24 and protein-level
   scores such as SIFT^[98]25) into a single metric to reflect the
   relative functional impact of any single nucleotide change. It does not
   explicitly distinguish between gain-of-function (GoF) and
   loss-of-function (LoF) mutations. To improve our ability to detect
   tumor suppressor genes (TSGs), we used an additional score that
   considers high confidence LoF mutations (frameshifts, loss of
   translation start sites, premature stop codons, and splice site
   mutations).^[99]2

   We identified 38 SMGs (false discovery rate (FDR) < 1%) in our combined
   OFML analyses ([100]Supplementary Table 4 and [101]Extended Data Fig.
   2a–[102]d). These included established melanoma oncogenes and tumor
   suppressors in pathways related to RTK-RAS-MAPK kinase signaling (BRAF,
   NRAS, NF1, KIT, MAP2K1, RAC1), apoptosis and cell cycle (TP53, CDKN2A,
   RB1, CDK4), PI 3-kinase signaling (PTEN), immune evasion (B2M),
   epigenetic regulation (ARID2), and mRNA splicing (SF3B1) ([103]Fig. 1a,
   [104]b, [105]Extended Data Fig. 2e). Comparing mutational frequencies
   across acral, mucosal, and UVR-high and -low non-acral cutaneous
   melanomas, we observed that KIT and SF3B1 were found significantly
   mutated only in the UVR-low analysis ([106]Extended Data Fig. 2d) and
   had higher mutation frequency in mucosal melanomas (~21% [3 of 14] for
   SF3B1 and ~14% [2 of 14] for KIT), as reported previously
   ([107]Extended Data Fig. 2f).^[108]18 Although KIT mutations were more
   frequent in acral (~8% [4 of 51]) compared to non-acral cutaneous
   melanomas (~4% [29 of 772])^[109]26,[110]27, the UVR-low subset of
   non-acral cutaneous melanomas had a KIT mutation frequency comparable
   to acral melanomas (~10% [8 of 82]; [111]Extended Data Fig.
   2g).^[112]26

Figure 1. The landscape of somatic driver mutations in melanoma.

   Figure 1.
   [113]Open in a new tab

   (a) Gene mutation frequencies in 1,014 melanoma tumors. Select
   biological roles are indicated by black boxes. The significantly
   mutated set (n = 27 genes) comprises genes inferred to undergo positive
   selection (OFML FDR < 1%). The UVR-group colour bar indicates whether a
   gene is significantly mutated in UVR-low or UVR-high tumors. The
   enriched set (n = 5 genes) comprises genes which passed a less
   conservative OFML FDR cut-off (<10%), selected based on their
   involvement in MAPK signalling, BAF, and MLL protein complexes. The
   other drivers set (n = 7 genes) comprises genes previously linked to
   melanoma. (b) Mutation and DNA copy number profiles of melanomas from
   the TCGA (n = 449 tumors). From top to bottom: (1) Number of single
   nucleotide variants (y-axis) per tumor (x-axis), plotted on a square
   root scale. (2) Tumor purity inferred by ABSOLUTE. (3) Patient age at
   the time of tumor sample procurement. (4) Gene-by-tumor matrix of
   mutations in significantly mutated genes, enriched genes, and other
   drivers. (5) Gene-by-tumor matrix of copy number alterations in genes
   that exhibited co-occurrence of mutations and loss-of-heterozygosity
   (LOH) or DNA copy gain (p<0.05; one-tailed Fisher’s exact test). (6)
   Single nucleotide substitution frequencies per tumor. (c) Proportion of
   mutations per gene attributed to each mutational signature identified
   by NMF (n = 1,014 tumors). (d) Saturation analysis of SMG detection
   using OFML with CADD or LoF scores. The y-axes show the mean number of
   SMGs detected (FDR < 1%) across ten random subsamples of a given sample
   size, indicated on the x-axis. Genes are stratified by mutation
   frequency in UVR-high tumors (n = 824 tumors). Dotted lines correspond
   to the min and max values observed across the 10 replicates.

Filtering potential false positives

   While OFML and similar well-established
   algorithms^[114]2,[115]9–[116]11 have demonstrated their proficiency in
   the identification of cancer driver genes, their mutational models
   remain a simplification of a more complex and heterogeneous process.
   For instance, several ETS binding sites exhibit high neutral mutation
   rates in melanoma ([117]Extended Data Fig. 3a). This can lead to
   recurrent mutations that do not confer a selective
   advantage,^[118]13–[119]16 but still deviate from background mutational
   models. While these mutations are usually located near the
   transcription start sites of actively transcribed genes, they can
   overlap with the coding region of low or non-expressed isoforms and be
   mis-annotated as non-synonymous variants. We believe this to be the
   case for STK19, SLC27A5, and SUCO among our SMGs ([120]Extended Data
   Fig. 3b, [121]c). We also observed nine SMGs (PDE7B, KCNQ, RNF217,
   SLC27A5, IVL, DACH1, RUNX1T1, HS3ST4, and DSPP) that had extremely low
   mRNA abundance and/or high neutral mutation rates ([122]Extended Data
   Fig. 3d,[123]e), two well-known discriminative features of false
   positives.^[124]10 We omitted these genes from downstream analyses.

Significantly Mutated Genes

   Our SMG analysis highlighted evidence of positive selection for the
   recently reported candidate oncogene CNOT9/RQCD1^[125]28 (mRNA
   helicase), the candidate tumor suppressor SETD2^[126]19 (histone lysine
   methyltransferase), and members of the SWI/SNF (BAF) complex family,
   ARID1A and BRD7.^[127]2,[128]12,[129]19 Here, we report significant
   enrichment of LoF mutations in an additional member of the SWI/SNF
   complexes, PBRM1, in ~4% of melanoma cases. Altogether, SWI/SNF complex
   subunits highlighted by our study (ARID2, ARID1A, ARID1B, PBRM1, and
   BRD7) exhibited LoF mutations in >12% of melanoma samples
   ([130]Extended data Fig. 4c). We also observed LoF mutations in a
   transmembrane receptor for semaphorins, PLXNC1, in ~5% of cases.
   Finally, the cAMP-protein kinase A (PKA) signaling pathway is known to
   play an important role in melanoma; however, driver somatic mutations
   affecting this pathway have remained elusive.^[131]29 We observed a
   significant enrichment of LoF mutations in PRKAR1A, a regulatory
   subunit of the cAMP-dependent PKA holoenzyme, which were found in ~2%
   of samples. PRKAR1A loss is known to activate PKA signalling and is
   observed in an autosomal dominant syndrome called Carney Complex, which
   is associated with the development of multiple neural-crest-derived
   tumors.^[132]30 Over 50% of mutations in most SMGs were likely acquired
   due to UVR mutagenesis ([133]Fig. 1c). Our significance analysis
   omitted several established melanoma-associated genes, possibly due to
   their low mutation frequency or the limitations of OFML, and a
   saturation analysis suggests that additional low frequency driver genes
   would be uncovered in larger cohorts ([134]Fig. 1d). These genes
   included APC, CTNNB1, EZH2, IDH1, KRAS, HRAS and PPP6C ([135]Fig. 1a,
   [136]b). We considered these genes false negatives and included them in
   downstream analyses.

   To identify trending genes that did not meet our 1% FDR significance
   cut-off, we performed gene set enrichment analysis (GSEA) on 75 genes
   with an OFML FDR <10%. We found an expected enrichment of MAPK pathway
   genes ([137]Extended data Fig. 4a), including two recently reported
   RASopathy genes with tumor suppressor functions, SPRED1 and
   RASA2.^[138]19,[139]31 We identified one member of the mixed-lineage
   leukemia (MLL) complex family, KMT2B, as significantly mutated, and an
   enrichment for other members, KMT2A, MEN1, and KANSL1 in our mutation
   analysis ([140]Extended data Fig. 4b). These MLL complex genes
   collectively exhibited LoF mutations in ~7% of samples ([141]Extended
   data Fig. 4c).

   Finally, three SMGs identified at <1% FDR were located on the X
   chromosome: DDX3X (a DEAD-box RNA helicase), CCNQ/FAM58A (the
   activating cyclin for CDK10), and ZFX (a C2H2 zinc finger transcription
   factor).^[142]3,[143]4,[144]12 Despite sex being one of the strongest
   independent prognostic factors in melanoma,^[145]6,[146]7 sex
   differences in driver mutations have not yet been reported in melanoma.

DDX3X is a sex-specific tumor suppressor in cutaneous melanoma

   Some tumor suppressors escape X chromosome inactivation (XCI), which
   has been proposed to explain the protective effect of the X chromosome
   against cancer.^[147]32 We compared TMB between sexes and observed
   lower values for autosomes in females relative to males ([148]Fig.
   2a).^[149]33 We observed no significant difference for the X
   chromosome, likely explained by the accumulation of mutations on the
   additional copy in females ([150]Fig. 2a). We compared the mutation
   frequency of the SMGs identified in our analysis and observed that
   autosomal SMGs were mutated more frequently in males than females, but
   these differences were not statistically significant when controlling
   for the difference in TMB between sexes ([151]Fig. 2b and
   [152]Supplementary Table 5). The three X-linked SMGs, DDX3X, CCNQ and
   ZFX were also more frequently mutated in males ([153]Fig. 2b). This was
   unexpected given the similar TMB observed between sexes for the X
   chromosome. DDX3X showed the only significant imbalance in our analyzed
   cohort (FDR < 1%; two-tailed Fisher’s exact test), with its LoF
   mutations (n = 19) found exclusively in males ([154]Fig. 2c). This
   result remained significant when controlling for age, study, and TMB
   using a logistic regression approach ([155]Extended Data Fig. 5a).

Figure 2. DDX3X is enriched in loss-of-function (LoF) mutations in males and
escapes X-inactivation in females.

   Figure 2.
   [156]Open in a new tab

   (a) Box and whisker plots showing the number of autosomal and
   X-chromosome single nucleotide variants (SNVs) in male (n = 623) and
   female (n = 390) melanoma tumors. Each point represents one tumor.
   Boxes indicate the first, second, and third quartiles. Whiskers extend
   to the minimum and maximum data points, no further than 1.5 times the
   inter-quartile range from the hinges. P-values are from a two-tailed
   Wilcoxon rank sum test. (b) Volcano plot showing the relationship
   between patient sex and the mutation frequencies of 39 driver genes
   considered in this study. Each point represents one gene. X-linked
   genes are labelled in gold. The x-axis shows the odds ratio (OR) of
   mutation in females (n = 390 tumors) relative to males (n = 623
   tumors). The y-axis corresponds to FDR-adjusted p-values from a
   two-tailed Fisher’s exact test of divergence from the expected OR,
   denoted by a dark grey dotted line for autosomal genes and gold dotted
   line for X-linked genes. A horizontal red dotted line marks the FDR
   cut-off of 1%. (c) Frequency of DDX3X LoF in females and males,
   stratified by patient cohort. (d) Box and whisker plot of DDX3X mRNA
   expression in male TCGA tumors, stratified by DDX3X mutation status
   (n[LoF] = 12; n[missense] = 12; n[WT] = 265 tumors). Each point
   represents one tumor. Box plot elements are described in (a). Violin
   widths correspond to the density of points. P-values are from a
   two-tailed Wilcoxon rank sum test comparing mRNA expression levels
   between DDX3X wild-type (WT) and mutant tumors. (e) Allele frequencies
   of DDX3X mutations in TCGA tumors plotted against tumor purity (i.e.
   proportion of cancer cells) (n = 24 tumors). Each point represents one
   tumor. The diagonal lines represent the expected allelic frequencies
   for clonal mutations in males and heterozygous females. (f) Volcano
   plot showing differences in mRNA expression of 757 X-linked genes
   between females (n = 174 tumors) and males (n = 273 tumors) from the
   TCGA cohort. Each point represents one gene. The x-axis corresponds to
   the difference in mean expression relative to males and the y-axis
   shows FDR-adjusted p-values (using the Benjamini-Hochberg procedure).
   The horizontal red dotted line marks an FDR cut-off of 1%. The
   fold-changes and FDR values were estimated using DESeq2, parameterized
   to perform a two-tailed Wald test on negative binomial generalized
   linear model coefficients. (g) Number of RNA-seq reads supporting the A
   (x-axis) and C (y-axis) alleles of SNP rs5963957 at the DDX3X locus in
   the TCGA cohort (n = 468 tumors). Each point represents one tumor.
   Density plots show the distribution of points along the x- and y-axes
   separately for males and females. (h) Distribution of mean DNA
   methylation at the DDX3X promoter in female tumors (blue line; n = 180
   tumors) compared to the promoters of other X-linked genes either
   upregulated (red line; n = 7,200 promoters across 180 tumors), or not
   upregulated (black line; n = 98,916 promoters across 180 tumors) in
   females compared to males. Blue and black distributions were compared
   using a one-tailed Kolmogorov-Smirnov test for a rightward shift in the
   black distribution relative to the blue distribution. TSS is an acronym
   for transcription start site.

   LoF mutations in DDX3X were associated with a decrease in its mRNA
   expression ([157]Fig. 2d). A comparison of mutated allele frequencies
   with tumor sample purity derived computationally by ABSOLUTE suggests
   that most DDX3X LoF mutations are homozygous clonal ([158]Fig. 2e),
   indicating they likely occurred prior to clonal expansion. Our NMF
   mutation signature analysis revealed ~75% of DDX3X mutations are
   attributable to UVR ([159]Fig. 1c).

   We examined all X-linked genes for differential expression between
   sexes and identified 45 genes significantly upregulated in females
   ([160]Fig. 2f), which suggests they escape XCI. DDX3X expression was
   ~1.3-fold higher in melanomas from females (FDR < 1%). We observed
   biallelic expression of a common single nucleotide polymorphism
   (rs5963957) located in DDX3X ([161]Fig. 2g). Furthermore, upregulated
   X-linked genes, and specifically DDX3X, had lower levels of promoter
   methylation ([162]Fig. 2h). These results indicate that females are
   protected against complete loss of DDX3X in the event of a single
   mutation, as opposed to males, which could explain some of the observed
   sex bias in melanoma incidence and outcomes.

   To gain insight into the biological consequences of DDX3X mutations in
   melanoma, we compared mRNA expression profiles of wild-type samples to
   those harboring LoF and missense DDX3X mutations in TCGA. We controlled
   for potential confounding factors, such as tumor purity, and confined
   our analysis to male samples. We identified 57 upregulated and 10
   downregulated genes (FDR < 20%) ([163]Fig. 3a), including DVL1, which
   exhibited 50% upregulation in mutant samples. DVL1 is a regulator of
   the WNT/β-catenin signaling axis, one of the best-characterized
   DDX3X-regulated pathways.^[164]34 Given the high genetic heterogeneity
   in these tumors, we sought additional evidence supporting these mutant
   DDX3X associated changes. We analysed public RNA-Seq data of DDX3X
   knockdown in three cell lines (K562, HepG2, and the melanoma cell line,
   HT144).^[165]35,[166]36 We observed substantial concordance between
   expression differences in these lines and tumors ([167]Extended Data
   Fig. 5b, [168]c).

Figure 3. Genes and pathways dysregulated as a result of DDX3X mutations.

   Figure 3.
   [169]Open in a new tab

   (a) Volcano plot showing differences in mRNA expression of genes
   between DDX3X mutant and wildtype (WT) male tumors from the TCGA cohort
   (n = 22 mutant; n = 167 wildtype tumors). Each point represents one
   gene. The x-axis shows to the log2 fold-change in mean expression
   relative to WT samples, and the y-axis shows corresponding p-values
   (unadjusted). Eight genes with the smallest p-values are labelled on
   the plot. Fold-changes and p-values were estimated using the Limma R
   package, parameterized to perform a two-tailed t-test on linear model
   coefficients. (b) Distribution of eCLIP peaks for DDX3X, and other RNA
   binding proteins (RBPs), along transcripts in K562 and HepG2 cell
   lines. The x-axis corresponds to the gene length percentile, beginning
   at the 5’UTR. The y-axis shows the proportion of genes with overlapping
   RNA binding protein (RBP) peaks at a given percentile relative to the
   total number of genes bound by the RBP. (c) A heatmap showing the
   difference in densities of differentially expressed DDX3X targets in
   TCGA tumors (x-axis) and K562 cells (y-axis), relative to the targets
   of other RBPs (n = 22 DDX3X mutant tumors, n = 164 DDX3X wildtype
   tumors; n = 2 biological replicates of DDX3X knockdown, n = 2
   biological replicates of non-targeting controls in K562 cells; n =
   1,196 DDX3X targets, n = 2,808 other RBP targets). (d) Gene set
   enrichment analysis (GSEA) of mutant DDX3X-associated transcriptional
   changes in male melanomas from the TCGA (n = 22 mutant tumors, n = 164
   WT tumors). The x-axis corresponds to the effect size (i.e. the
   coefficient from a logistic regression GLM). A positive or negative
   effect size indicates whether a gene set on the y-axis is upregulated
   or downregulated upon DDX3X loss, respectively. The colour of each bar
   indicates the FDR-adjusted p-value (using Benjamini-Hochberg procedure)
   associated with the effect size. Only pathways which passed an FDR
   cut-off of <1% and were concordantly dysregulated in HT144 cells after
   DDX3X knockdown (p < 0.05) are shown (n = 2 biological replicates of
   knockdown, n = 2 biological replicates of non-targeting controls). GLM
   coefficient p-values were computed using a two-tailed z Wald test.

   Considering DDX3X is a DEAD-box protein family member that has
   ATP-dependent RNA helicase activity^[170]37, we used enhanced
   crosslinking and immunoprecipitation (eCLIP) data from ENCODE project
   to examine whether DDX3X binding sites are enriched in differentially
   expressed genes.^[171]35,[172]38 Given the strong positional enrichment
   of DDX3X peaks in 5’UTRs ([173]Fig. 3b), we defined a set of DDX3X
   target genes, whose 5’UTRs overlap DDX3X binding sites. We compared
   these to a set of control genes, whose 5’UTRs overlap at least one
   binding site from a compendium of RNA binding proteins (RBPs), to
   account for potential biases associated with eCLIP experiments. In both
   cell lines and tumors, we observed enrichment of DDX3X targets in genes
   upregulated due to DDX3X knockdown or mutation compared to the control
   gene set ([174]Fig. 3c, [175]Extended Data Fig. 5d).

   To identify pathways impacted by DDX3X mutations, we performed GSEA on
   DDX3X-associated gene expression differences in melanomas from TCGA. We
   identified 100 gene sets exhibiting differential regulation (FDR < 1%).
   Overall, 34 were concordantly differentially regulated in the HT144
   melanoma line (p < 0.05) ([176]Fig. 3d). Upregulated gene sets were
   related to metastatic processes, as well as RAS, PI3K, β-catenin and
   neuronal signaling pathways. Downregulated gene sets were involved in
   cell cycle processes and RNA metabolism. Altogether, this analysis
   suggests that DDX3X loss is associated with de-differentiation,
   invasiveness and reduced proliferation, consistent with a recent
   functional study.^[177]36

The DNA copy number landscape of cutaneous melanoma

   The landmark melanoma TCGA study analyzed copy number data from 331
   melanomas.^[178]4 To gain insight into additional genetic driver events
   targeted by copy number alterations, we obtained estimates of tumor
   purity, ploidy, and genome-wide copy number for the TCGA cohort using
   ABSOLUTE^[179]39 ([180]Fig. 4). We confirmed that our copy number calls
   are positively correlated with mRNA expression of driver genes
   ([181]Fig. 4b). Overall, the most frequent chromosome arm alterations
   included gain of 6p (40%), 7q (40%), 1q (35%), 7p (35%), and 8q (32%);
   and loss of 9p (63%), 10q (50%), 6q (45%), 10p (40%), 9q (38%), and 11q
   (32%) ([182]Fig. 4c). None of the examined autosomal arms were
   completely lost ([183]Fig. 4 e, [184]f). Recurrent focal homozygous
   loss was observed for a few genes, including CDKN2A (25%), PTEN (5%),
   LINC00290 (3%), and SPRED1 (1%) ([185]Fig. 4d). Most LOH events in
   samples that have undergone genome duplication were copy-neutral (i.e.
   at loci with a copy number of 2) ([186]Fig. 4 e, [187]f), supporting
   the notion they occur prior to genome duplication.^[188]39

Figure 4. The landscape of somatic DNA copy number alterations in cutaneous
melanoma.

   Figure 4.
   [189]Open in a new tab

   (a) Scatter plot with marginal histograms showing the distribution of
   mean tumor sample ploidy (x-axis) and purity (y-axis), inferred using
   ABSOLUTE (n = 449 tumors from the TCGA). (b) Bar plot showing the
   Spearman correlation between gene RNA expression (from Kallisto) and
   relative copy number (n = 448 tumors). The plot includes all 36
   autosomal drivers considered in this study, and genes highlighted by
   the TCGA’s 2015 GISTIC analysis of DNA copy number alterations in
   melanoma (GISTIC-2015 genes). (c) Bar plot showing the gain and loss
   frequencies of autosomal chromosome arms, defined for each arm by the
   sign of the difference between its median copy number and the overall
   median copy number in a tumor (n = 449 tumors). (d) Bar plot showing
   the homozygous deletion (HD) frequencies of SMGs and GISTIC-2015 genes
   that have one or more HDs in 449 TCGA tumors. (e) Genome-wide
   frequencies (in 10kb bins) of copy number alterations in all TCGA
   melanomas without genome duplication (n = 289 tumors). From bottom to
   top: (1) Genome-wide frequencies of loss-of-heterozygosity (LOH) and
   (2) DNA copy number. (3) Significantly amplified or deleted genomic
   regions (q-value < 0.01), identified here using GISTIC version 2.0 (a
   right-tailed permutation test performed independently for gains and
   losses, with p-values adjusted for multiple testing using the
   Benjamini-Hochberg procedure). GISTIC was run on 470 TCGA tumors. (4)
   Genomic loci for a subset of the 36 autosomal driver genes considered
   in this study that overlapped significant GISTIC regions, exhibited
   homozygous deletion in one or more samples, or whose mutations
   co-occurred with LOH or copy gain. GISTIC-2015 genes are also shown.
   (f) Genome-wide frequencies (in 10kb bins) of copy number alterations
   in all TCGA melanomas that have undergone one genome duplication (n =
   146 tumors). (g) Genome-wide frequencies (in 10kb bins) of copy number
   alterations in non-acral cutaneous TCGA melanomas without genome
   duplication. The top and middle panels contain UVR-low and UVR-high
   tumors, respectively (n = 29 UVR-low tumors, n = 234 UVR-high tumors).
   The bottom panel shows the unadjusted p-values from a two-tailed
   Fisher’s exact test comparing UVR-low and UVR-high distributions in
   each genomic bin, multiplied by the sign of the difference in mean copy
   number between the groups, such that a positive value indicates a
   higher copy number in the UVR-low group.

   We compared the copy-number profiles of UVR-high and UVR-low non-acral
   cutaneous melanomas. We observed chromosome arms 4p, 5p, 8q, 11q, and
   22q were more frequently amplified in UVR-low cases ([190]Fig. 4g),
   while chromosome arm 9q was more frequently deleted in UVR-high cases.
   Finally, a region of 15q overlapping SPRED1 and B2M was preferentially
   deleted in UVR-low melanomas.

   We observed statistically significant co-occurrence between segmental
   LOH and LoF mutations in several tumor suppressors including B2M, MEN1,
   CDKN2A, PTEN, TP53, APC, NF1, and RB1 ([191]Fig. 5a, [192]Supplementary
   Table 6). In addition, BRD7 (OR = 10.40, P = 2.57×10^−3), PLXNC1 (OR =
   7.36, P = 8.01×10^−3), and PBRM1 (OR = 6.07, P = 1.89×10^−2) also
   exhibited association between LOH and LoF mutations. All PRKAR1A LoF
   mutations were concurrent with LOH (P = 2.75×10^−04). Similarly, we
   observed significant co-occurrence between DNA copy gain and recurrent
   amino acid substitutions in three activators of the MAPK signalling
   pathway: KIT, BRAF, and NRAS ([193]Fig. 5b, [194]Supplementary Table
   6). Overall, the frequency of local copy loss of SMGs was positively
   correlated with their enrichment of LoF mutations ([195]Fig. 5c,
   [196]d). Finally, we used GISTIC^[197]40 to identify significantly
   recurrent copy number alterations (q-value < 0.01) ([198]Supplementary
   Tables 7, [199]8). Three SMGs (CDK4, KIT, and BRAF), in addition to
   EZH2, overlapped significantly amplified regions, and four SMGs (BRD7,
   B2M, CDKN2A, and PTEN), in addition to SPRED1 and KMT2A, overlapped
   significantly deleted regions ([200]Fig. 4e).

Figure 5. Co-occurrence of copy number alterations and mutations in driver
genes.

   Figure 5.
   [201]Open in a new tab

   (a) Forest plot showing the odds-ratio of co-occurrence between
   loss-of-function (LoF) mutations and segmental loss-of-heterozygosity
   (LOH). The p-values are from a right-tailed Fisher’s exact test. A
   vertical red dotted line marks a p-value of 0.05. (b) Forest plot
   showing the odds-ratio of co-occurrence between recurrent missense
   mutations and segmental copy gain. The p-values are from a right-tailed
   Fisher’s exact test. A vertical red dotted line marks a p-value of
   0.05. For panels (a) and (b), only a subset of the 36 autosomal driver
   genes with 3 or more mutations and 3 or more copy number alterations
   and a Fisher’s test p-value less than one are shown. Per gene sample
   sizes are provided in [202]Supplementary Table 6. (c) Bar plot showing
   the frequency of local gain and loss in driver genes. Shown here are
   all 36 autosomal drivers considered in this study, and GISTIC-2015
   genes. Local gain and loss were defined per sample for each gene by the
   sign of the difference between its absolute copy number and the median
   absolute copy number of its host arm (n = 449 tumors). (d) Scatter plot
   showing the relationship between the local deletion of genes (y-axis)
   and enrichment of LoF mutations (x-axis). Each point corresponds to one
   gene. The correlation p-value was computed using a two-sided Spearman’s
   test via the asymptotic t approximation (n = 36 genes).

Deconvolution of melanoma intrinsic and extrinsic expression profiles

   To gain insight into the relationship between the mutational landscape
   and transcriptome, we screened for associations between driver gene
   alterations and cancer-cell intrinsic mRNA signatures. Previous studies
   used unsupervised clustering of mRNA profiles to group melanomas based
   on their dominant gene expression signatures.^[203]4,[204]41 Four major
   signatures have been found in cutaneous melanoma: immune, keratin,
   MITF-Low, and MITF-high. Because some of these signatures can originate
   from stromal and immune cells, tumor purity can greatly impact
   transcriptomic grouping. Our analysis of tumor purity across TCGA
   samples revealed melanoma tumors vary widely in their stromal cell
   content (interquartile range of 15%−49%; [205]Fig. 6a). Strong negative
   correlations were observed between tumor purity and expression for a
   large number of genes ([206]Fig. 6b), implying a significant proportion
   of variance in expression reflects stromal cell content variations
   rather than differences in cancer cell gene expression.

Figure 6. Deconvolving melanoma and stromal mRNA expression in bulk tumor
samples.

   Figure 6.
   [207]Open in a new tab

   (a) Histogram of tumor purity (proportion of cancer cells) in TCGA
   melanomas, estimated using ABSOLUTE (n = 449 tumors). (b) Histogram of
   correlations between tumor purity and mRNA expression, per gene (n =
   447 tumors, n = 17,481 genes). (c) Forest plot of the coefficients of a
   multivariable beta regression between all five NMF signature weights
   and tumor purity, with their associated 95% confidence intervals (CI)
   and unadjusted p-values (computed using a two-tailed z Wald test of
   coefficients) (n = 447 tumors). (d) Matrix of NMF signature weights
   across tumors. Each row corresponds to a signature, and each column to
   a tumor sample (n = 468 tumors). (e) Spearman’s correlation between the
   weights of each NMF signature across samples and each of 64 cell-type
   specific signature weights measured per sample using xCell (n = 468
   tumors). (f) Kaplan-Meier survival curves for TCGA melanoma patients
   with a metastatic regional lymph node tumor sample, stratified
   according to immune signature weight (n = 216 patients). (g)
   Distribution of tumor pigmentation scores in three melanoma intrinsic
   mRNA subgroups. P-values are from a two-tailed Wilcoxon rank sum test
   (n[OxPhos] = 47, n[Common] = 205, n[MITF-low] = 59 tumors). Violin
   widths correspond to the density of data points at a given pigmentation
   score.

   To untangle cancer-cell-intrinsic and -extrinsic mRNA signatures, we
   applied NMF to gene expression data from 468 TCGA samples. In contrast
   to partitional clustering, NMF considers samples as a mix of k unknown
   signatures and proceeds to deconvolve each sample into its constitutive
   parts.^[208]42 An advantage of NMF is that it can be used to assign
   signature weights to samples when signatures are not discrete. This is
   highly relevant for immune related signatures, as the degree of
   infiltration is a continuous predictor of patient outcome
   ([209]Extended Data Fig. 6a). The most stable NMF solution involved
   five signatures ([210]Extended Data Fig. 6b–[211]d), which we
   characterized using GSEA and the xCell tool.^[212]43–[213]45

   One signature showed a strong negative correlation with purity
   ([214]Fig. 6c), consistent with a normal-cell origin. It was associated
   with an array of immune cell types ([215]Fig. 6e) and predictive of
   patient survival ([216]Fig. 6f).^[217]4,[218]41 All samples exhibited
   some level of expression of this immune signature ([219]Fig. 6d,
   [220]Extended Data Fig. 6e). The second signature was characterized by
   high keratin expression and correlated with skin cells, such as
   keratinocytes and sebocytes ([221]Fig. 6e, [222]Extended Data Fig. 6f,
   [223]g). This keratin signature was present almost exclusively in
   primary samples ([224]Extended Data Fig. 6h) and likely explained by
   the presence of normal skin cells in those samples.

   In contrast, the other three expression signatures had a positive
   correlation with tumor purity ([225]Fig. 6c) and showed a pattern of
   mutual exclusivity ([226]Fig. 6d, [227]Extended Data Fig. 6e),
   suggesting they constitute well-defined cancer-cell intrinsic
   subgroups. This is further supported by the presence of highly
   concordant subgroups when performing classical clustering on
   purity-adjusted expression data ([228]Extended Data Fig. 6i, [229]j).

   The first subgroup (n = 76) corresponded to the well-known melanoma
   mRNA subgroup characterized by low levels of the lineage-specific
   transcription factor, MITF (MITF-low) ([230]Extended Data Fig.
   7a–[231]c).^[232]41 The second subgroup (n = 72) exhibited higher
   expression of genes that regulate oxidative phosphorylation (OxPhos)
   ([233]Extended Data Fig. 7d, [234]e), had the lowest expression of
   hypoxia-related genes, including HIF1A and VEGFA ([235]Extended Data
   Fig. 7b, [236]c), as well as the highest level of pigmentation
   ([237]Fig. 6g). The third subgroup constituted the majority (n = 291)
   of melanoma samples (Common), characterized by higher expression of
   MITF, interferon signalling genes, and genes co-expressed with the
   SWI/SNF chromatin-remodelling subunit, SMARCA2 ([238]Extended Data Fig.
   7a, [239]d, [240]e). Whereas tumors within the OxPhos mRNA subgroup
   exhibited gene expression patterns resembling differentiated
   melanocytes, the Common and MITF-low signatures resembled other
   lineages of the neural crest origin as determined using xCell
   ([241]Fig. 6e).^[242]41

   We examined the relationship between our mRNA signatures and other
   genomic features, including TMB and UVR signature (expressed as the
   proportion of UVR-associated mutations) in non-acral cutaneous samples
   from TCGA. We observed no significant association between TMB or UVR
   and our intrinsic mRNA subgroups ([243]Extended Data Fig. 7f, [244]g).
   However, we observed a modest but robust correlation of our immune
   signature with TMB and the UVR signature ([245]Extended Data Fig. 7h).
   We found 4 SMGs differentially mutated across our mRNA subgroups (FDR <
   20%) ([246]Figure 7a, [247]Supplementary Table 9). CDKN2A and TP53 were
   preferentially mutated and had lower expression in MITF-Low and Common
   samples ([248]Figure 7a, [249]b), whereas PRKAR1A was preferentially
   mutated and had lower expression in the OxPhos samples. Finally, CTNNB1
   and KIT had relatively more mutations and higher expression in OxPhos
   samples.

Figure 7. Alteration frequency of melanoma drivers differ across mRNA
subgroups.

   Figure 7.
   [250]Open in a new tab

   (a) Bottom to top: (1) Bar plot showing the frequency of coding
   mutations, homozygous deletions, and local amplifications in each mRNA
   subgroup for all 39 driver genes considered in this study (n[OxPhos] =
   72, n[Common] = 299, n[MITF-low] = 76 tumors). The number of altered
   tumors per subgroup per gene is reported in [251]Supplementary Table 9.
   (2) FDR values from a two-tailed Fisher’s exact test for differential
   alteration frequency between subgroups (p-values were adjusted for
   multiple hypothesis testing using the Benjamini-Hochberg procedure).
   (3) Fold-difference in median gene expression per subgroup, relative to
   OxPhos, in samples not carrying the gene alterations listed in (1). (b)
   Box and whisker plots of mRNA expression for the four SMGs with the
   smallest p-values in panel (a), in addition to the catalytic protein
   kinase A (PKA) subunit, PRKACA, and its regulatory subunit PRKAR1A.
   Each point corresponds to one tumor. X-axes correspond to mRNA subgroup
   and Y-axes correspond to mRNA expression in log2 transformed counts per
   million (CPM). Boxes indicate first, second, and third quartiles.
   Whiskers extend to the minimum and maximum data points, no further than
   1.5 times the inter-quartile range from the hinges. P-values are from a
   two-tailed Wilcoxon rank sum test (n[OxPhos] = 72, n[Common] = 299,
   n[MITF-low] = 76 tumors). (c) Illustration of protein kinase A (PKA)
   regulation by PRKAR1A. Alpha-Melanocyte-stimulating hormone (αMSH)
   ligand binds to and activates melanocortin 1 receptor (MC1R), inducing
   adenylyl cyclase (AC). AC catalyzes the cyclization of adenosine
   triphosphate (ATP) into second messenger molecule, cyclic adenosine
   monophosphate (cAMP). Binding of cAMP to PRKAR1A relieves its
   inhibitory effect on PRKACA. Activated PRKACA is able to catalyze
   phosphorylation of target proteins by hydrolyzing ATP to adenosine
   diphosphate (ADP). cAMP dependent phosphodiesterases (PDEs) negatively
   regulate PKA signalling by hydrolyzing cAMP to adenosine monophosphate
   (AMP).

Correlates of immune infiltration and survival

   We next asked whether mutations in individual SMGs were associated with
   our immune signature. Because infiltrated tumors have lower proportions
   of tumor originating sequencing reads, we controlled for purity and
   sequencing coverage using a partial correlation model. Only mutations
   in one SMG, PRKAR1A, showed a negative correlation with the immune
   signature following multiple hypothesis correction (FDR < 5%;
   [252]Supplementary Table 10).

   Previous studies observed that high TMB is associated with improved
   response to immune checkpoint inhibitors (ICIs)^[253]20,[254]46 and
   longer survival in the cutaneous melanoma TCGA cohort.^[255]47 High TMB
   is thought to increase the likelihood that a tumor will express
   non-self antigens recognized by the immune system. More recently,
   UVR-induced DNA damage has been linked to improved survival^[256]21 and
   reported as a potential determinant in response to ICI.^[257]48,[258]49
   Here, we investigated the relationship of TMB, the UVR signature, and
   other clinical variables with melanoma post-accession survival (i.e.
   survival relative to time of tumor sample procurement) in patients with
   non-acral cutaneous melanoma in TCGA^[259]4. We first tested an initial
   set of clinical, pathological, and molecular features using univariate
   Cox proportional-hazards models and a p-value threshold of 0.05.
   Statistically significant predictors consisted of the immune signature,
   TMB, UVR signature, age, and tumor tissue site ([260]Extended Data Fig.
   8a). We then considered multivariable Cox proportional-hazards models
   for all possible subsets of predictors and compared the effect of TMB
   and UVR-signature inclusion on their quality, using the Akaike
   Information Criterion (AIC). The best models included the immune
   signature, tumor tissue site, age at sample procurement, and either
   UVR-signature or TMB ([261]Extended Data Fig. 8b). We observed that the
   immune signature, UVR-signature and TMB were also amongst the best
   predictors of overall survival (i.e. survival relative to time of
   initial diagnosis) ([262]Extended data Fig. 9). These results indicate
   wthat UVR-signature and TMB provide prognostic information
   complementary to immune infiltration ([263]Fig. 8a, [264]b). Including
   both UVR and TMB simultaneously did not significantly improve AIC or
   concordance index ([265]Extended Data Fig. 8c, [266]d), which is not
   surprising due to their substantial correlation (Spearman rho of 0.73)
   ([267]Fig. 8d). Notably, when restricting our analysis solely to
   UVR-high samples, TMB, but not the proportion of UVR mutations,
   provided a significant improvement to the model ([268]Fig. 8c,
   [269]Extended Data Fig. 8e, [270]f). This suggests that TMB provides
   information on melanoma patient survival not included in the UVR
   signature.

Figure 8. Correlates of immune infiltration, UV-signature, TMB and survival.

   Figure 8.
   [271]Open in a new tab

   (a and b) Multivariable Cox regression models of post-accession
   survival times for patients with non-acral cutaneous melanoma (n = 347
   patients). The model in (a) includes proportion of UVR mutations
   whereas (b) includes TMB. At the top of each panel are the model
   coefficients and their 95% confidence intervals, expressed in log2
   hazard-ratios, with p-values on the right (two-tailed z Wald test of
   coefficients). At the bottom of each panel are Kaplan-Meier survival
   curves for post-accession survival time of patients with regional lymph
   node metastasis samples. The patients are stratified into four groups
   based on median dichotomized immune signature and UVR-group in (a) or
   TMB-group in (b). The UVR groups are defined using a 50% cut-off for
   the proportion of UVR mutations. The TMB groups are based on a
   15th-percentile cut-off, in order to obtain similar UVR and TMB group
   sizes (23 UVR-low, 142 UVR-high, 29 TMB-low, 136 TMB-high patients).
   (c) Relative quality of different multivariable Cox regression models
   of post-accession survival in patients with non-acral cutaneous
   melanoma and a high UVR signature (n = 301 patients). All models
   include age at procurement and tumor tissue site, in addition to the
   predictors specified on the x-axis (symbolized using ellipses). The
   y-axis shows the Akaike information criteria (AIC). Lower AIC indicates
   better relative quality. (d) Proportion of UVR mutations versus TMB.
   Each data point represents one tumor (the number of tumors per group
   are: acral = 51, acral or non-cutaneous = 11, mucosal = 14, non-acral
   cutaneous = 772, unknown or unavailable = 166). (e) Schematic
   representation of the pipeline used to predict antigenic mutations.
   Note that each increment of neoantigen tier includes all previous
   tiers. (f) Scatter plot showing the number of predicted antigenic
   mutations per tumor sample (y-axis) versus the total number of
   expressed missense mutations (median TPM > 1, x-axis). For
   visualization purposes, one hypermutated sample is not shown in this
   plot. (g) Top recurrent predicted antigenic peptides (the left red
   dots) with associated mutations (right blue dots). Numbers indicate how
   many TCGA patients are expressing them. All 4 tiers were included.
   *Likely not expressed (see Filtering potential false positives in text
   and [272]Extended Data Fig. 3)

   We next explored if tumor neoantigen load is more informative than TMB
   regarding patient survival. The recent TCGA Pan-Cancer Atlas neoantigen
   study limited their analysis to primary tumors of ~100
   melanomas.^[273]50 We implemented a pipeline to predict neo-peptide
   binding to MHC class I for the complete TCGA cohort (n = 457)
   ([274]Fig. 8e). To maximize the sensitivity of our analysis, we
   considered different levels of stringency by grouping antigenic
   mutations into four tiers, based on the predicted binding affinities of
   the mutated and wild-type peptides. As expected, we observed extremely
   high correlation (Pearson > 0.99) between TMB and neoantigen load
   ([275]Fig. 8f)^[276]48. Substituting TMB by neoantigen load did not
   improve our survival models ([277]Extended Data Fig. 10).

   We next sought for evidence of negative selection acting upon the
   accumulation of antigenic mutations by comparing the number of
   predicted HLA-mutation pairs to the distribution obtained with 1,000
   random permutations of the HLA alleles across patients. We did not
   observe significant depletion for any tier. These results are
   consistent with a prior analysis that did not detect evidence of
   negative selection in 99 melanoma samples, and with a recent study that
   estimated ~99% of missense mutations are tolerated and escape negative
   selection.^[278]12

   Despite the absence of a strong immunoediting signal in the melanoma
   TCGA cohort, studies have shown that specific neoantigens can be
   exploited therapeutically.^[279]51 We looked for recurrent antigenic
   peptides and their associated mutations in our extended cohort. In
   addition to known recurrent neoantigens in BRAF and H/K/NRAS, we
   highlight here less appreciated recurrent neoantigens predicted for
   RAC1 and CDKN2A ([280]Fig. 8g). Whether these neoantigens are
   therapeutically relevant for the development of personalized tumor
   vaccines requires further investigation.

DISCUSSION

Male specific DDX3X loss-of-function mutations

   Women have lower melanoma incidence and better prognosis than men.
   Epidemiological studies estimate on average, for a 20-year old
   individual, the risk of any mole transforming into a melanoma by the
   age of 80 is 3 times higher in males than females.^[281]52 This has
   been attributed to behavioral factors; however, sex has been shown to
   be an independent prognostic factor in cutaneous melanoma and evidence
   clearly points to either tumor-intrinsic or host-related biological sex
   differences.^[282]6,[283]7 Here, we provided evidence that DDX3X
   escapes XCI and is preferentially mutated in male melanoma patients,
   potentially explaining some of the sex differences observed in this
   malignancy. We also performed an integrative analysis of multiple
   datasets that support dysregulation of RAS, PI3K, β-catenin pathways
   upon DDX3X loss.

   Our findings raise many questions. First, it is unclear what role
   DDX3Y, the Y-linked paralog of DDX3X, plays in melanoma. We observed
   that males carrying DDX3X mutations had concurrent mRNA expression of
   DDX3Y and did not observe significant co-occurrence between DDX3X and
   DDX3Y mutations ([284]Extended Data Fig. 5e, [285]f). Although these
   paralogs share 92% amino acid identity, genetic studies have shown that
   DDX3Y does not compensate for loss of DDX3X.^[286]53 Specifically,
   germline mutations in DDX3X have been associated to intellectual
   disability (ID), and pedigree analysis of ID-affected families have
   reported cases of DDX3X mutations causing ID in males, but not in
   carrier females within the same family.^[287]53 This is consistent with
   reports indicating that although DDX3Y mRNA is found in many human
   tissues, DDX3Y protein is observed only in spermatocytes.^[288]54
   Conversely, a CRISPR-Cas9 screening study observed that DDX3Y was
   essential in a DDX3X mutant cancer cell line of male origin^[289]55.
   Future studies characterizing DDX3X and DDX3Y expression and function
   in melanoma are required. Furthermore, trends are emerging in
   meta-analyses of sex differences in overall survival rates in ICI
   trials.^[290]56 Whether DDX3X plays a role in modulating response to
   ICI requires further examination.

The cAMP-PKA signaling pathway

   Recently, LoF mutations in PRKAR1A were reported in 2 of 27 whole-exome
   sequencing cases of spitzoid melanoma; however, none were reported in
   conventional non-acral cutaneous melanoma.^[291]57 Spitzoid melanoma is
   an uncommon melanocytic neoplasm composed of large atypical epithelioid
   or spindled cells, more frequently presented in childhood or
   adolescence as an unpigmented nodule.^[292]1 Here, we identified
   PRKAR1A as a SMG in ~2% of cases. To determine whether these mutations
   were solely in spitzoid melanomas, two dermatopathologists examined the
   digitized tumor slides, pathology reports and clinical data for 4
   primary and 3 metastatic cases harbouring a PRKAR1A LoF mutation in the
   TCGA dataset. Both dermatopathologists indicated none of these
   melanomas either displayed spitzoid morphology nor had clinical
   features associated with spitzoid melanoma. These results indicate that
   PRKAR1A loss is an infrequent but significant genetic event in
   conventional non-acral cutaneous melanoma.

   PRKAR1A encodes for the regulatory type IA subunit for the
   cAMP-dependent PKA holoenzyme.^[293]29 The holoenzyme exists as an
   inactive tetramer, which consists of two pairs of regulatory and
   catalytic subunits ([294]Fig. 7c). Loss of PRKAR1A function is known to
   activate PKA signalling, and germline LoF variants in PRKAR1A have been
   linked to the Carney Complex syndrome.^[295]30 By performing
   cross-platform integrative analysis, we observed that PRKAR1A LoF
   mutations are enriched in melanomas belonging to the OxPhos mRNA
   subgroup, which exhibits high expression of the PRKACA catalytic
   subunit ([296]Fig. 7b). A similar OxPhos expression signature has been
   linked to BRAF inhibitor resistance.^[297]58 A genome-wide
   open-reading-frame screen identified PRKACA as the highest scoring
   serine/threonine kinase to promote BRAF inhibitor resistance.^[298]59
   When examining published sequencing studies of BRAF inhibitor pre- and
   post-resistance melanoma samples, PRKAR1A mutations were found in 2 of
   45 (4.4%) post-treatment resistant cases.^[299]60 Whether PRKAR1A loss
   is associated with BRAF inhibitor resistance requires further
   investigation.

UVR and TMB in melanoma patient survival

   Studies have linked high TMB with improved ICI response and survival in
   patients with melanoma^[300]20,[301]46. However, two recent reports
   have suggested that these results are confounded by different melanoma
   subtypes (acral, mucosal and uveal), which generally have lower ICI
   responses, but also lack a UVR mutation signature and have lower
   TMB.^[302]48,[303]49 Here, we examined the relationship of UVR, TMB,
   the immune signature and other clinical variables with patient survival
   in non-acral cutaneous melanomas from TCGA that were predominantly
   procured prior to the widespread implementation of ICI therapies in the
   clinic. We observed TMB provides complementary information to immune
   infiltration on patient survival, even when restricting our analysis to
   non-acral cutaneous melanomas with a high UVR signature, although this
   effect was weaker in the latter case. These results support the notion
   that TMB is not simply distinguishing melanoma subtypes (non-acral
   versus acral, mucosal, and uveal), but is having an impact on patient
   survival. It will be interesting to see if the association between TMB
   and ICI response in melanoma re-emerges when analyzing larger cohorts
   of patients with a more comprehensive characterization of immune
   infiltration.

METHODS

Variant processing

   Aggregated somatic mutation files from 470 TCGA-SKCM samples were
   downloaded from the GDC^[304]61 portal. To mitigate sequencing errors
   and alignment artefacts, we only considered TCGA variants that were
   reported by at least three callers in at least one sample. Variants
   from the 183 MELA-AU whole genomes^[305]18 were downloaded from the
   ICGC data portal^[306]62. Variants from the Hodis^[307]2,
   Krauthammer^[308]19 and VanAllen^[309]20 cohorts were retrieved from
   the associated publications. Variants from hg19-based datasets were
   mapped to the hg38 reference using the rtracklayer R package. We
   discarded any variants with ambiguous coordinates (non-bijective
   mapping between hg19 and hg38) or discordant reference allele. The
   hg19-based coordinates of TCGA variants were similarly determined.
   Adjacent SNVs within each sample were identified using the
   GenomicRanges R package^[310]63 and merged back into MNVs. The combined
   set of variants from all five studies was re-annotated with snpEFF
   v.4.3s (2017–10-25)^[311]64 using Ensembl GRCh38.86 gene models and
   dbSNP build 150. Common germline variants were excluded from downstream
   analysis.

Mutational signatures analysis

   Mutational signatures were identified using non-negative matrix
   factorization (NMF) from the NNLM R package (version 0.4.2),
   considering a trinucleotide context model without strand specificity
   (96 mutation types). Thus, the mutation counts for the 1,014 melanoma
   samples were arranged in a 96-by-1014 matrix V, and NMF was applied to
   obtain a decomposition
   [MATH: <mi>V</mi><mo>≃</mo><mi>W</mi><mi>H</mi> :MATH]
   , where W is a 96-by-k matrix containing k mutational signatures, and H
   is a k-by-1014 matrix representing the signatures’ absolute
   contribution to each sample. NMF was run with the Kullback-Leibler
   divergence loss function and a maximum of 50,000 iterations. The
   optimal decomposition rank k (i.e. number of mutation signatures) was
   determined using three repetitions of five-fold cross-validation. For
   each fold, one-fifth of the input matrix V was randomly masked, and the
   mean squared error (MSE) between the predicted and original values of
   the masked entries was computed. The rank with the smallest mean MSE
   was selected. The final NMF decomposition is provided in
   [312]Supplementary Tables 12 and [313]13 for matrices W and H,
   respectively.

   To estimate the proportion of mutations attributed to a mutational
   signature k in each sample, we first multiplied the signature’s
   corresponding column in W by its row in H to produce a matrix,
   W[*,k]H[k,*], that contains estimated sample-wise tri-nucleotide
   mutation counts for the signature. We then divided the column sums of
   W[*,k]H[k,*] by the column sums of WH. A similar procedure was used to
   estimate gene-wise signature contributions.

Significantly mutated genes

   We used OncodriveFML 2.0.3^[314]11 to identify genes under positive
   selection. Analyses were done separately for the UVR-high (n = 824) and
   UVR-low (n = 177) samples, defined as having ≤50% or >50% of their
   mutations originating from the UV-signature. Samples with >50% of their
   mutations originating from the alkylating signature (n = 13) were
   omitted from these analyses.

   We ran OncodriveFML twice for each UVR group, using default CADD
   scores^[315]23 and custom LoF scores devised for the identification of
   tumor suppressor genes. LoF scores were obtained by generating all
   possible single nucleotide variants across the coding genome, followed
   by snpEff annotation (v.4.3s, Ensembl GRCh37.75 gene models). Variants
   with a loss-of-function consequence on any protein coding transcripts
   were given a score of 1 and all other variants were given a score of 0.
   These consequences were considered loss-of-functions: stop_gained,
   start_lost, splice_acceptor and splice_donor. Since frameshift variants
   are treated independently by OncodriveFML, they were not explicitly
   included in the LoF scores.

   For each OncodriveFML run, genes with less than 10 mutations were
   discarded and p-values were adjusted for multiple hypothesis testing
   using the Benjamini-Hochberg procedure to control the false discovery
   rate (FDR). Genes that passed an FDR cut-off of <1% were labelled
   “significantly mutated”. Results of all OncodriveFML runs are provided
   in [316]Supplementary Table 4.

   To compute an LoF enrichment score ([317]Fig. 5d), we estimated the
   expected (neutral) proportions of LoF and synonymous variants in each
   gene, according to a penta-nucleotide context^[318]12, and use the
   following formula:
   [MATH:
   <mi>L</mi><mi>o</mi><mi>F</mi><mi>e</mi><mi>n</mi><mi>r</mi><mi>i</mi><
   mi>c</mi><mi>h</mi><mi>m</mi><mi>e</mi><mi>n</mi><mi>t</mi><mi>s</mi><m
   i>c</mi><mi>o</mi><mi>r</mi><mi>e</mi><mo>=</mo><mfenced
   separators="|"><mrow><mfrac><mrow><mi>o</mi><mi>b</mi><mi>s</mi><mi>e</
   mi><mi>r</mi><mi>v</mi><mi>e</mi><msub><mrow><mi>d</mi></mrow><mrow><mi
   >L</mi><mi>o</mi><mi>F</mi></mrow></msub><mi
   mathvariant="normal">*</mi><mi>e</mi><mi>x</mi><mi>p</mi><mi>e</mi><mi>
   c</mi><mi>t</mi><mi>e</mi><msub><mrow><mi>d</mi></mrow><mrow><mi>s</mi>
   <mi>y</mi><mi>n</mi></mrow></msub></mrow><mrow><mi>o</mi><mi>b</mi><mi>
   s</mi><mi>e</mi><mi>r</mi><mi>v</mi><mi>e</mi><msub><mrow><mi>d</mi></m
   row><mrow><mi>s</mi><mi>y</mi><mi>n</mi></mrow></msub><mi
   mathvariant="normal">*</mi><mi>e</mi><mi>x</mi><mi>p</mi><mi>e</mi><mi>
   c</mi><mi>t</mi><mi>e</mi><msub><mrow><mi>d</mi></mrow><mrow><mi>L</mi>
   <mi>o</mi><mi>F</mi></mrow></msub></mrow></mfrac></mrow></mfenced>
   :MATH]

   The following variants were considered as LoF: stop_gained, start_lost,
   splice_acceptor and splice_donor. To prevent extreme values for genes
   with few mutations, we added three pseudo-counts to both the numerator
   and denominator of the plotted estimates.

Saturation analysis

   To measure the influence of sample size on the number of identified
   SMGs, we ran OncodriveFML ten times using n = (100, 150, …, 800) tumors
   randomly chosen from the high-UV datasets. The number of genes that
   passed an FDR cut-off of <1% in each run was then plotted against the
   number of considered samples.

Identification of potential false positives

   We considered three criteria to identify potential false positive SMGs:
   (1) high proportions of mutations in ETS transcription factor binding
   sites (>10% of all mutations in gene), (2) high neutral mutation rate
   (>3 × 10^−05 mutations per nucleotide per sample), (3) lack of or low
   gene expression in melanoma cell lines (90th percentile of RPKM < 1).

   ENCODE’s clustered ChIP-seq data for 161 transcription factors^[319]65
   was downloaded from the UCSC Genome Browser^[320]66. We selected peaks
   with an ENCODE’s normalized score ≥500 from the following ETS factors:
   ETS1, GABPA, ELF1, ELK1 and ELK4. Overlapping variants were identified
   using the GenomicRanges package.

   To estimate neutral mutation rates, we used the mutation data from the
   183 melanoma whole genomes (MELA-AU). For each gene, we considered a
   centered window of at least 100kb spanning its complete set of
   transcripts. We then excluded any coding, evolutionary conserved or low
   mappability regions ([321]Supplementary Table 19; neutral mutation rate
   estimation). The gene-level mutation rate was computed as the number of
   variants falling within the non-excluded regions, divided by their
   total size. This method was implemented in R with the rtracklayer and
   GenomicRanges packages.

Mutated genes pathway enrichment analysis

   We tested if genes with an OFML FDR < 10% were enriched for biological
   pathways or complexes from the Reactome^[322]67 “ENSEMBL to pathways”
   databaseand EpiFactors database^[323]68. The enrichment of each pathway
   or complex was tested using a one-tailed Fisher’s exact test. For each
   test, the “gene universe” was defined as the set of genes tested for
   mutational significance in any of the four OFML runs (CADD-UVR-high,
   CADD-UVR-low, LoF-UVR-high and LoF-UVR-low). P-values were adjusted for
   multiple testing using the Benjamini-Hochberg procedure independently
   for Reactome and EpiFactors.

Copy number and purity analysis

ABSOLUTE

   Haplotype phasing and copy-ratio segmentation was done with HAPSEG
   (version 1.1.1) ^[324]69 using Affymetrix SNP6 microarray data from 463
   TCGA tumor-normal pairs acquired from the legacy GDC archive. Somatic
   tumor variants and HAPSEG segmentations were processed with
   ABSOLUTE^[325]39 (version 1.0.6) to obtain purity, ploidy and
   genome-wide allelic copy numbers (solution obtained for 449 samples).
   ABSOLUTE segmentation was intersected with gene coordinates (Ensembl
   GRCh37.75) to obtain gene level LOH and total copy numbers in each
   tumor sample. Genes overlapped by multiple segments were assigned the
   lowest total copy number and were considered to exhibit LOH if at least
   one segment supported it. The local gain or loss of a gene was
   determined using the ratio of its absolute copy number relative to the
   median copy number of the chromosome arm where it resides. When this
   ratio was greater than or equal to three, a gene was considered
   amplified.

Co-occurrence between mutations and copy gain or LOH

   Co-occurrence enrichment between mutations and segmental LOH or copy
   gain in candidate driver genes was tested using a one-tailed Fisher’s
   exact test. For LOH, we considered LoF mutations only. For copy gain,
   we considered missense mutations at recurrently mutated amino acids (N
   > 1) only. To ensure sufficient power, only genes having mutations and
   segmental events in at least three samples were tested. For any given
   gene locus, samples with homozygous deletion (0 copy) were excluded
   from the tests.

Significantly amplified or deleted regions

   We used GISTIC to identify significantly amplified or deleted regions.
   Segmented copy ratios (germline CNV masked) for 470 TCGA tumor samples
   were acquired from the GDC data portal. Segments that exceeded the
   telomeric- or centromeric-most array probes were truncated to be within
   covered genomic regions. To improve sensitivity, we applied in silico
   admixture removal^[326]70 to samples for which ABSOLUTE ploidy and
   purity estimates were available, using the following formula:
   [MATH:
   <mi>a</mi><mi>d</mi><mi>j</mi><mi>u</mi><mi>s</mi><mi>t</mi><mi>e</mi><
   mi>d</mi><mi>c</mi><mi>o</mi><mi>p</mi><mi>y</mi><mi>r</mi><mi>a</mi><m
   i>t</mi><mi>i</mi><mi>o</mi><mo>=</mo> :MATH]
   [MATH:
   <mi>c</mi><mi>o</mi><mi>p</mi><mi>y</mi><mi>r</mi><mi>a</mi><mi>t</mi><
   mi>i</mi><mi>o</mi><mo>+</mo><mrow><mrow><mn>2</mn><mi
   mathvariant="normal">*</mi><mfenced
   separators="|"><mrow><mn>1</mn><mo>-</mo><mi>p</mi><mi>u</mi><mi>r</mi>
   <mi>i</mi><mi>t</mi><mi>y</mi></mrow></mfenced><mi
   mathvariant="normal">*</mi><mfenced
   separators="|"><mrow><mi>c</mi><mi>o</mi><mi>p</mi><mi>y</mi><mi>r</mi>
   <mi>a</mi><mi>t</mi><mi>i</mi><mi>o</mi><mo>-</mo><mn>1</mn></mrow></mf
   enced></mrow><mo>/</mo><mrow><mfenced
   separators="|"><mrow><mi>p</mi><mi>u</mi><mi>r</mi><mi>i</mi><mi>t</mi>
   <mi>y</mi><mi
   mathvariant="normal">*</mi><mi>p</mi><mi>l</mi><mi>o</mi><mi>i</mi><mi>
   d</mi><mi>y</mi></mrow></mfenced></mrow></mrow> :MATH]

   Non-positive copy ratios were capped to 1e-3. Adjusted ratios were log2
   transformed, centered on their mode and passed to GISTIC. In [327]Fig.
   4e, GISTIC wide peak boundary coordinates were converted from hg38 to
   hg19 using liftOver to be visualized with ABSOLUTE copy number
   profiles.

Transcriptome analysis

RNA-seq processing

   Raw RNA-Seq read counts were download from the GDC portal for the
   TCGA-SKCM cohort, and from the CCLE data portal for the melanoma cell
   lines. Counts of protein coding genes were converted to CPM after TMM
   normalization using the edgeR package. RPKM/FPKM values were calculated
   using the rpkm function.

Deconvolution of transcriptomic profiles by NMF

   We used NMF (as implemented in the NMF^[328]71 R package) to deconvolve
   cancer and stromal transcriptomic profiles in the TCGA-SKCM cohort. The
   output of NMF consists of 2 matrices, W and H, whose product is the
   approximated matrix of observed CPM values. In this context, W is a
   gene-by-signature matrix containing the weights of each gene’s
   contribution to a signature and H is a signature-by-sample matrix
   containing the weight of each signature’s contribution to a sample.
   Here, signatures can be seen as cell-type specific gene expression
   modules. NMF was applied to a matrix of CPM values for 5000 genes and
   470 samples, with the Brunet optimization algorithm. The genes were
   selected to have the largest mean absolute deviation (calculated using
   log-transformed CPM values) amongst autosomal protein coding genes with
   a mean RPKM > 1. The optimal number of signatures (i.e. decomposition
   rank) was determined using the proportion of ambiguous clustering
   (PAC).^[329]72 The final NMF decomposition is provided in
   [330]Supplementary Tables 14 (matrix W) and [331]15 (matrix H).

Confirmation of intrinsic profiles using PCA and clustering

   We recovered the three intrinsic NMF signatures using a classical
   clustering approach on purity adjusted gene expression. To mitigate the
   effect of stromal cell contamination, we restricted our analysis to
   genes with at least one read in all samples and whose expression
   (log-transformed CPM) positively correlated with tumor purity (Pearson
   correlation > 0.1) and not strongly positively correlated with NMF’s
   keratin signature (Pearson correlation < 0.2). The log-transformed CPM
   values of 5166 retained genes were regressed (linearly) on tumor
   purity. Genes were ranked by decreasing variance of the residuals, and
   the top 1500 were used for clustering of the tumor samples using the
   ConsensusClusterPlus^[332]73 R package, with 1000 resampling iterations
   of kmeans clustering with k = 3. The transcriptomic subgroup of each
   sample was assigned based on their membership to one of the 3 clusters.

Transcriptomic signatures and xCell analysis

   A gene by sample matrix of mRNA RPKM expression values for 468 TCGA
   samples was passed to xCell to obtain cell-type enrichment scores for
   each sample. Spearman’s correlation between cell-type’s scores and NMF
   component weights was computed and plotted in [333]Figure 6e.

Transcriptomic subgroup gene set enrichment analysis

   Differential gene expression analysis was performed using
   DESeq2^[334]74, comparing samples in each transcriptomic subgroup to
   all other samples. For each comparison, log2 fold-differences were
   supplied to the GSEA tool,^[335]44,[336]45 using default parameters
   with the Hallmarks and Curated (C2) gene sets.

Differential gene alteration analysis across transcriptomic subgroups

   Differential alteration frequencies (coding mutations, homozygous
   deletions, and local amplifications) of candidate driver genes across
   transcriptional subgroups was assessed using a two-tailed Fisher’s
   exact test. For each gene, the test was performed on a two-by-three
   contingency table of alteration counts (gain and loss) and mRNA
   subgroups. P-values were adjusted for multiple testing using the
   Benjamini-Hochberg procedure.

X-linked analysis

Sex-biased mutation frequency

   A two-tailed Fisher’s exact test was used to determine if a given SMG
   is differentially mutated (missense, inframe-indel or LoF) between
   males and females. To control for the different neutral mutation burden
   observed in males and females (see [337]Figure 2A), separate null
   hypotheses [i.e. expected odds ratio (ORs)] were considered for
   autosomal and X-linked genes. Specifically, we set the expected OR of
   the Fisher’s test (i.e. “or” parameter in R’s fisher.test() function)
   to the median OR observed across all non-SMGs (mutated in at least 10
   samples to ensure reliable estimates), considering autosomal and
   X-linked genes separately. P-values were adjusted for multiple testing
   using the Benjamini-Hochberg procedure.

   We complemented the Fisher’s test using a logistic regression approach,
   whereby the mutation probability of each gene is modeled as a function
   of sex and additional covariates specified in [338]Extended Data Fig.
   5a. We used the number of SNVs (log-transformed) on the autosomes (or X
   chromosome for X-linked genes) to account for differences in mutation
   burden across samples. We note that this approach cannot be used when
   the outcome variable completely separates one or more of the predictor
   variables, as is the case for DDX3X LoF mutations that were found
   exclusively in males.

Differential gene expression between sexes

   Differential gene expression analysis of X-linked genes between males
   (n = 174) and females (n = 273) was performed using DESeq2^[339]74.
   Specifically, gene expression was modeled as a function of gender,
   tumor purity, and tumor tissue site (i.e. primary, regional cutaneous
   or sub-cutaneous, regional lymph node, and distant metastasis). DESeq2
   was initially run on all protein-encoding genes to ensure precise
   estimates of dispersion. Expression fold-differences between genders
   and their respective P-values for X-linked genes were subset and
   adjusted for multiple testing using the Benjamini-Hochberg procedure
   independently of other genes.

Promoter methylation

   Promoter methylation was calculated by taking the mean Beta value of
   all methylation probes 2kb upstream of a gene’s most 5’ transcription
   start site, in each of 180 female samples.

Bi-allelic expression of DDX3X

   RNA-seq BAM slices of the DDX3X locus were downloaded from GDC for all
   TCGA-SKCM samples, and nucleotide counts were determined at each
   genomic position using the Rsamtools package.^[340]75 We then looked
   for common SNPs (average heterozygosity >= 20%, dbSNP150) located
   within any DDX3X exon and covered by at least 10 reads in >50% of the
   samples. Only one SNP fulfilled these criteria, rs5963957 (A/C,
   hg38:chrX:41349057; avHet = 0.43), with a median coverage of 274 reads
   across samples. The distribution of nucleotide counts at this position
   confirmed bi-allelic expression in most female samples.

DDX3X functional analysis

Differential gene expression (DGE) analysis of mutant and WT DDX3X tumors

   We applied a linear model framework for transformed RNA-seq read
   counts, implemented in the limma R package^[341]76, to RNA-seq data
   from the TCGA. Starting with a gene-by-sample matrix of read counts, we
   retained protein coding genes that in at least 50 samples had a
   counts-per-million (CPM) value ≥ a CPM corresponding to 10 reads in the
   sample with the smallest library (number of genes = 14,011). Then,
   sample-wise normalization factors were calculated using the TMM method
   implemented in edgeR^[342]77 and were subsequently provided along with
   the read counts to limma’s voom function to estimate observation
   weights. Linear models were fitted to the voom-weighted observations
   using limma’s lmFit function and differential expression estimates were
   moderated using limma’s eBayes function. P-values corresponding to the
   log2 fold-changes were adjusted for multiple testing using the
   Benjamini-Hochberg procedure.

   We restricted our analysis to male samples, which harbored the vast
   majority of DDX3X mutations. We modeled gene expression as a function
   of (1) the mutation status of DDX3X (LoF, missense, or wildtype), (2)
   the intrinsic gene expression signatures from NMF, (3) the immune
   signature, (4) top three principal components corresponding to the gene
   expression (log2CPM + a prior count of 5) of “Keratinization” and
   “Formation of the cornified envelope” related genes listed in the
   Reactome database (downloaded October 6, 2019), as these captured more
   of the variance in Keratinocyte gene expression than NMF’s Keratin
   signature, and (5) the expression level of DDX3Y (high or low, based on
   a [log2CPM + a prior count of 5] > 5 cut-off determined based on the
   relation of DDX3Y expression and tumor purity). We computed the
   fold-change in gene expression between DDX3X mutant and wildtype
   samples that had high DDX3Y expression (22 and 167 samples
   respectively), as the majority of DDX3X mutations occurred in DDX3Y
   expressing tumors.

Differential gene expression analysis of DDX3X KD and control cell lines

   For each cell line (melanoma HT144 cells^[343]36, hepatocellular
   carcinoma HepG2 cells, and chronic myelogenous leukemia K562
   cells^[344]35), we quantified mRNA expression using Kallisto (default
   parameters and GENCODE v22 gene annotations). We used DESeq2^[345]74
   with default parameters to estimate differences in gene expression
   between DDX3X knockdown and control conditions. P-values were adjusted
   for multiple testing using the Benjamini-Hochberg procedure. SRA
   accessions for RNA-seq data are in [346]Supplementary Tables 16 and
   [347]17.

Positional enrichment of DDX3X eCLIP peaks

   Enhance crosslinking immunoprecipitation (eCLIP) peaks that passed an
   irreproducible discovery rate (IDR) cut-off <0.01 were acquired from
   ENCODE for 150 RNA binding proteins (RBPs) (103 for HepG2 and 120 for
   K562) including DDX3X. To determine the positional enrichment of these
   peaks in gene bodies, we first binned the genomic coordinates of each
   gene into 1000 tiles, with the first tile starting at the 5’end of the
   gene. Then, for each tile [1–1000], we computed the proportion of genes
   that overlap at least one peak for the RBP of interest at that tile as
   a fraction of all genes that overlap one or more peaks for that RBP at
   any tile. The ENCODE IDs of eCLIP data are in [348]Supplementary Table
   18.

Enrichment of DDX3X targets in differentially regulated genes

   Genes were divided into two groups based on whether their 5’UTR(s)
   exclusively overlap at least one IDR eCLIP peak for DDX3X or another
   RNA binding protein (RBP). For each group, 2D kernel density estimates
   for differential gene expression in tumors (DDX3X mutant vs. WT) and
   cell lines (DDX3X knockdown vs. control) were estimated using the kde2d
   function from the MASS package in R. The bandwidth parameter of the
   function was set to 0.4. The difference in densities between the two
   groups of genes was computed and plotted in [349]Fig. 3c and
   [350]Extended data Fig. 5d.

Gene set enrichment analyses for DDX3X differential expression

   We tested for enrichment of gene sets in differentially expressed genes
   using weighted logistic regression models. For each gene set in the
   Reactome database (downloaded October 6, 2019), we modeled the presence
   or absence of each gene in the set (i.e. number of ‘successes’), as a
   fraction of the total number of sets to which the gene is annotated
   (i.e. number of ‘trials’), using differential gene expression as an
   explanatory variable. In this model, each observation (fraction of
   successes) is weighted by the number of trials. We extracted the
   differential gene expression coefficients and their corresponding
   P-values from each model for plotting and further analysis. P-values
   were adjusted for multiple hypothesis testing using the
   Benjamini-Hochberg procedure.

Survival analysis

   The majority of TCGA specimens analyzed were from metastases (363 of
   465). For some patients, this was years after their primary melanoma
   diagnosis. Because the biology of a metastatic melanoma and its immune
   cell content may differ from that of its original primary melanoma, we
   focused on post-accession survival times as in the melanoma TCGA marker
   publication.^[351]4 In summary, patient vital status and the number of
   days from primary melanoma diagnosis to death or last follow-up were
   acquired from the GDC data portal (overall survival). We also obtained
   the number of days between primary diagnosis and sample procurement
   (sample procurement times) from the Broad Institute TCGA GDAC Firehose
   website:

   ([352]https://gdac.broadinstitute.org/). We subtracted these sample
   procurement times from the overall survival times to obtain
   “post-accession survival times”.

   We modeled survival time using Cox proportional hazards regression in
   R. Kaplan-Meier estimator plots were generated using the survfit
   function from the survival package (version 2.43–3) in R. In all
   Kaplan-Meier plots, we limited our survival analysis to patients with
   molecularly profiled metastatic melanoma lymph node specimens only (n =
   216). P-values associated with Kaplan-Meier plots are from a log-rank
   or Mantel-Haenszel test performed using the survdiff R function.

Neoantigen analysis

   HLA typing of the TCGA-SKCM samples was performed with Optitype^[353]78
   using the BAM files from the normal tissue samples. MHC-I binding
   predictions were obtained with netMHCpan4^[354]79. Variant processing
   was performed as follow. We first extracted the mutated and wild-type
   sequences of a 17aa window centered on each missense mutation using the
   Biostrings and ensembldb R packages. These sequences were then
   processed with netMHCpan4 to predict their MHC-I binding affinity,
   using a 9aa window. We used the default percentile rank thresholds
   provided by netMHCpan4 to classify peptides into strong (<0.5%) or weak
   (<2%) binders. Predicted antigenic mutations were grouped into 4 tiers
   of decreasing specificity as follow: Tier 1 includes mutations creating
   at least one peptide with strong binding prediction but whose wild-type
   form is not predicted to be a strong binder. Tier 2 includes any
   mutation with a strong binding prediction, without regard to the
   binding predictions of the wild-type forms. Tier 3 includes mutations
   creating at least one peptide with weak binding prediction, but whose
   wild-type form is not predicted to bind. Tier 4 includes any mutation
   with weak binding prediction, without regard to the binding predictions
   of the wild-type forms. Finally, all tiers were updated to include
   mutations from less specific tiers (i.e. tier k includes any mutations
   in tier k-1). Only variants with median expression > 1 TPM (as
   estimated by Kallisto)^[355]80 were considered as potential
   neoantigens.

   To test for evidence of negative selection, we compared the number of
   predicted antigenic mutations in the TCGA-SKCM cohort with the
   distribution obtained over 1000 random permutations of the HLA alleles
   across patients. Importantly, to remove bias that could results from
   population structure or the HLA-typing step, we considered the sum of
   predicted antigenic mutations over the six HLA alleles in each patient
   (i.e. antigenic mutations recognised by homozygous HLA loci are counted
   twice). We estimated the statistical power of this approach by applying
   the same procedure on randomized datasets in which varying proportions
   of antigenic mutations were specifically removed to simulate increasing
   levels of negative selection. For predicted MHC-I strong binding
   peptides (tiers 1 and 2), power reached 80% when 7.5% of antigenic
   mutations were removed.

Adjusting TMB for WES coverage and purity

   For each TCGA sample, we determined the proportion of the coding genome
   that has sufficient read coverage to provide 80% power for mutation
   detection using ABSOLUTE estimates. We divided the observed TMB by this
   value to obtain the expected TMB if coverage was sufficient for 100% of
   the coding genome.

Statistics and reproducibility

   In this study, we aimed to analyze the largest possible cohort of
   melanoma whole exomes. No statistical method was used to predetermine
   sample size, as this number was dictated by the availability of
   published datasets.

   Four TCGA patients had multiple corresponding tumor samples. Prior to
   our analyses, we decided to exclude the following redundant samples,
   arbitrarily prioritizing primaries over metastases: TCGA-ER-A19T-06A,
   TCGA-ER-A2NF-06A, TCGA-D3-A1Q6–07A and TCGA-D3-AlQA-07A.

   Statistical analyses were performed in R (v3.3.0-v3.5.3). These
   included one-sided and two-sided Fisher’s exact test, two-sided
   Mann–Whitney U test, one-sided Kolmogorov–Smirnov test and generalized
   linear models, as indicated. P-values were adjusted for multiple
   testing using the Benjamini-Hochberg procedure, as indicated. A
   detailed list of R packages and software programs used in this study is
   provided in [356]Supplementary Table 20. The experiments were not
   randomized. The Investigators were not blinded to allocation during
   experiments and outcome assessment.

DATA AVAILABILITY

   Previously published melanoma somatic variants that were reanalyzed in
   this study are available from the associated publications:

   Hodis et al. 2012 ([357]https://doi.org/10.1016/j.cell.2012.06.024,
   Supplementary Table S4A),

   Krauthammer et al. 2015 ([358]https://doi.org/10.1038/ng.3361,
   Supplementary Data 3) and

   Van Allen at al. 2015 ([359]https://doi.org/10.1126/science.aad0095,
   Supplementary Table S1).

   The human melanoma data generated by the TCGA Research Network
   ([360]http://cancergenome.nih.gov/) can be accessed from the GDC Data
   Portal ([361]https://portal.gdc.cancer.gov/), after approval for dbGap
   Study Accession phs000178
   ([362]https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study
   _id=phs000178.v10.p8), due to the presence of personally identifiable
   information, such as a patient’s germline DNA variants.

   The following MAFs were used:

   TCGA.SKCM.muse.4cd49f89-d7e2–4333-9872–0bff5327c896.protected.maf

   TCGA.SKCM.mutect.bd022199-d399–45db-8474–6dc1f3aad457.protected.maf

   TCGA.SKCM.somaticsniper.4ff8ab0f-1a75–44f6-af48–2b30fc6d5a08.protected.
   maf

   TCGA.SKCM.varscan.a83548c2-e6b2–45cf-a7c3-ec099daf30ce.protected.maf

   The somatic variants from 183 human melanoma whole genomes (Hayward et
   al. 2017) can be accessed from the International Cancer Genome
   Consortium (ICGC) data portal
   ([363]https://dcc.icgc.org/releases/release_23/Projects/MELA-AU),
   without restriction.

   RNA-seq data from DDX3X knockdown in HT144 cell lines can be accessed
   from the Sequence Read Archive ([364]https://www.ncbi.nlm.nih.gov/sra),
   using accession identifiers provided in Supplementary Table 14.

   eCLIP data and expression data from DDX3X knockdown in K562 and HepG2
   human cell lines can be downloaded from the ENCODE portal
   ([365]https://www.encodeproject.org/), using accession identifiers
   provided in Supplementary Table 16 and 15, respectively.

   Regions considered for neutral mutation rate estimation were defined
   using the following files available from Ensembl or the UCSC Genome
   Browser website:

   [366]http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/phastConsE
   lements100way.txt.gz

   [367]http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeM
   apability/wgEncodeCrgMapabilityAlign75mer.bigWig

   [368]http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeM
   apability/wgEncodeDukeMapabilityRegionsExcludable.bed.gz

   [369]http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeM
   apability/wgEncodeDacMapabilityConsensusExcludable.bed.gz

   [370]http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/pseudoYale
   60.txt.gz

   [371]ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens
   .GRCh37.75.gtf.gz

   ENCODE’s ETS transcription factor binding sites were downloaded from
   the UCSC Genome Browser website:

   [372]https://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncode
   RegTfbsClustered/wgEncodeRegTfbsClusteredWithCellsV3.bed.gz

   CCLE cell lines gene expression data was obtained from:

   [373]https://portals.broadinstitute.org/ccle/data/CCLE_DepMap_18q3_RNAs
   eq_reads_20180718.gct

   Cell line annotations were obtained from DepMap:

   [374]https://depmap.org/portal/download/all/DepMap-2018q4-celllines.csv

   Gene lengths used for RPKM calculations were obtained from:

   [375]ftp://ftp.ensembl.org/pub/release-86/gtf/homo_sapiens/Homo_sapiens
   .GRCh38.86.gtf.gz

   The mutated genes pathway enrichment analysis was based on the

   EpiFactors database

   (downloaded on 2018–01-21, [376]http://epifactors.autosome.ru/) and the

   Reactome database

   (downloaded on 2018–01-20, [377]https://reactome.org/, ENSEMBL-
   to-pathways).

   The mRNA subgroups pathway enrichment analysis was based on

   MSigDB (v6.2): [378]https://www.gsea-msigdb.org/gsea/msigdb/index.jsp

   We obtained transcript level expression (in TPM) for TCGA-SKCM from:
   [379]https://osf.io/gqrz9

   Gene set enrichment analyses for DDX3X differential expression was
   based on the

   Reactome database (downloaded on 2019–10-06):
   [380]https://reactome.org/

   For the GISTIC2 analysis of recurrent focal copy-number alteration, we
   used the following reference file provided by the GDC:
   snp6.na35.liftoverhg38.txt.zip

   ([381]https://gdc.cancer.gov/about-data/data-harmonization-and-generati
   on/gdc-reference-files/)

   The COSMIC Mutation Signature definitions were downloaded from the
   DeconstructSigs website:

   [382]https://github.com/raerose01/deconstructSigs/blob/master/data/sign
   atures.exome.cosmic.v3.may2019.rda

   The combined set of reannotated variants, excluding those protected by
   the TCGA, can be accessed at our GitHub repository:

   [383]https://github.com/ianwatsonlab/multiomic_melanoma_study_2019

CODE AVAILABILITY

   Code related to the main findings of the study is available at GitHub
   at: [384]https://github.com/ianwatsonlab/multiomic_melanoma_study_2019

Extended Data

Extended Data Figure 1. Identification of melanoma mutational signatures
using NMF.

   Extended Data Figure 1.
   [385]Open in a new tab

   (a) Determination of the optimal NMF decomposition rank (k) based on
   the average of the mean squared error (MSE) between observed
   trinucleotide mutation counts and predictions of masked values (y-axis)
   imputed by NMF. The average, calculated across three repetitions of
   5-fold cross validation, is plotted against the decomposition ranks
   (x-axis). Error bars represent the standard error of the mean (SEM).
   (b) Sample-wise Spearman’s correlation between the observed and NMF’s
   fitted trinucleotide mutation counts (n = 96 trinucleotide mutations, n
   = 1,014 tumors). The color gradient represents the number of mutated
   trinucleotides in each tumor sample and is meant to highlight that
   lower correlations result simply from the low sparsity of NMF’s fit.
   (c) Percentage contribution of trinucleotide mutations for each
   mutational signature. (d) Percent contribution of each mutational
   signature to the total number of mutations per tumor. (e) The
   proportion of UVR-signature mutations per tumor. Melanoma subtypes are
   distinguished by different colours. (f) Comparisons of our
   trinucleotide mutational signatures to the Catalogue of Somatic
   Mutations in Cancer (COSMIC) set of signatures. Left panels show the
   Person’s correlation (y-axis) between the percent contribution of
   trinucleotide mutations to our signatures (the values in c) and each of
   65 signatures in COSMIC (x-axis) (n = 96 trinucleotide mutations).
   Right panels show the mean squared difference (y-axis) between the
   percent contributions (n = 96 trinucleotide mutations). (g) Heatmap
   showing the column-sum normalized weights of COSMIC mutational
   signature (rows) in our set of 1,014 tumor samples (columns), estimated
   using non-negative linear regression (via the nnlm() function
   implemented in the NNLM R package). Our unsupervised estimates of
   mutation signature contributions are shown at the top. There is strong
   agreement between our estimates and those based on the COSMIC
   signatures.

Extended Data Figure 2. Summary of OncodriveFML (OFML) results.

   Extended Data Figure 2.
   [386]Open in a new tab

   (a) Quantile-quantile (Q-Q) plots of OFML (right-tailed permutation
   test) p-values (y-axis) plotted against uniformly distributed p-values
   (x-axis) (n = 177 UVR-low tumor samples, n = 824 UVR-high tumors). Each
   point represents one gene. Genes with an FDR adjusted p-value of <10%
   are labelled. (b-c) Venn diagrams showing the overlap between genes
   identified in each UVR group (b) and using each scoring function (c).
   (d) Detailed breakdown of OFML subset analyses. Each row of the matrix
   contains the genes identified as significantly mutated (FDR < 1%) in an
   analysis using the labelled score (leftmost row label), the UVR group
   (centre) and melanoma subset (right). Sample sizes are as follows:
   (UVR-low, UVR-high, all subtypes n = 1,001 tumors); (UVR-low, non-acral
   cutaneous n = 77 tumors); (UVR-low, all subtypes n = 177 tumors);
   (UVR-high, non-acral cutaneous n = 690 tumors); (UVR-high, all subtypes
   n = 824 tumors). (e) Mutation frequency of SMGs stratified by cohort,
   (f) melanoma subtype and (g) UVR-group. The number of mutated tumors is
   indicated above each bar for panels (f) and (g). All FDR adjusted
   p-values were obtained using the Benjamini-Hochberg procedure.

Extended Data Figure 3. Summary of criteria used to flag potential false
positive SMGs.

   Extended Data Figure 3.
   [387]Open in a new tab

   (a) Distribution of mutation frequency near transcription start sites
   (TSS) of expressed transcripts (> 1 TPM) that overlap (red) or do not
   overlap (black) ETS transcription factor peaks. (b) Number and
   proportion of gene mutations that fall within an ETS transcription
   factor peak. (c) Recurrent mutations at ETS binding sites overlapping
   SMGs (STK19, SLC27A5, SUCO). For each gene, the top to bottom panels
   show (1) the gene locus. (2) the various transcripts at that locus. (3)
   the cumulative median expression of all transcripts, per DNA strand, in
   units of transcripts per million (TPM). (4) the cumulative median
   expression of all transcripts, per DNA strand, restricted to their
   coding regions (CDS), in TPM. (5) the number of mutations at each
   position in the region. (6) the locations of ETS transcription factor
   ChIP-seq peaks. (d) Neutral mutation frequency, per tumor sample per
   nucleotide, for SMGs and other potential drivers (see methods). (e)
   mRNA expression of genes in melanoma cell lines from the cancer cell
   line encyclopedia (CCLE) (n = 55 cell lines) (top) and melanoma TCGA
   tumors (second to fourth panels). Each point on the plot represents one
   tumor or cancer cell line. Expression levels are in log transformed
   units of reads per kilobase per million (RPKM). For each gene, TCGA
   tumors were stratified by mutation status. The colour of TCGA data
   points corresponds to the Spearman’s correlation between gene mRNA
   expression and tumor purity. The number of tumors used to compute the
   correlation coefficient is denoted for each gene and mutation type at
   the top of each panel.

Extended Data Figure 4. Enrichment of biological processes and protein
complexes in the top OncodriveFML (OFML) hits.

   Extended Data Figure 4.
   [388]Open in a new tab

   (a and b) Left panel: gene set membership for 66 genes which had an
   OFML FDR < 10%. Right panel: P-values from a one-tailed Fisher’s exact
   test of the overlap of Reactome (a) or EpiFactors (b) gene sets with
   genes that passed an OncodriveFML FDR of <10%. Only gene sets that
   passed an FDR threshold of <20% are shown. FDRs were computed form
   p-values using the Benjamini-Hochberg procedure. (c) Bar plot showing
   the frequency of somatic mutations in (SWI/SNF)/BAF complex subunits
   (ARID2, ARID1A, ARID1B, BRD7, and PBRM1) and MLL complex subunits
   (KMT2A, KMT2B, KANSL1, and MEN1) that passed an OncodriveFML FDR of
   <10%. Number of samples = 1,014 tumors.

Extended Data Figure 5. DDX3X functional analysis and DDX3Y expression and
mutation profiles.

   Extended Data Figure 5.
   [389]Open in a new tab

   (a) Volcano plots showing the relationship between patient sex and
   mutation status of SMGs. Each panel corresponds to a logistic
   regression model, where the probability of mutations in each SMG is
   modeled as a function of sex and potential confounders such as tumor
   mutation burden (TMB) and age. Each point corresponds to one gene. The
   x-axis corresponds to the value of the sex coefficient in the model
   (log2), the y-axis corresponds to its associated FDR-adjusted p-value
   (derived from a two-tailed z Wald test and adjusted for the number of
   hypotheses tested using the Benjamini-Hochberg procedure). These models
   show that the imbalance of DDX3X mutations between sex are not
   confounded by factors such as tumor mutation burden and age. There were
   a limited number of samples for which age was available. Therefore, the
   FDR value corresponding to DDX3X increases when age is included to the
   model, as do the FDR values of other SMGs. Models not including age at
   diagnosis were fitted to 1,013 tumors (59 DDX3X mutant tumors). The
   model including age at diagnosis was fitted to 841 tumors (50 DDX3X
   mutant tumors). Data is only shown for genes with at least five
   mutations. (b) Scatter plots showing DDX3X associated differences in
   gene expression in tumors (x-axis; n = 22 mutant DDX3X tumors vs n =
   167 wildtype tumors) compared against gene expression differences in
   cancer cell lines (y-axis; two biologically independent replicates of
   DDX3X knockdown per cell line vs two biologically independent controls
   per cell line). In each panel, the (log2) odds ratio (OR) between the
   sign of expression differences in the corresponding cell line vs the
   sign of expression differences in the tumors are shown. Only genes that
   had a differential expression p-value < 0.05 in tumors were considered
   in this analysis (p-values were estimated using the Limma R package;
   parameterized to perform a two-tailed t-test on linear model
   coefficients). (c) Expression differences of individual genes between
   DDX3X mutant and wildtype samples (TCGA, left panel) or DDX3X knockdown
   and control samples (HT144 cell line, right panel). See (b) for sample
   sizes and statistical test used with TCGA data. P-values for TCGA were
   adjusted for multiple testing using the Benjamini-Hochberg procedure.
   For HT144 data, a two-tailed z Wald test was performed on negative
   binomial model coefficients fitted using DESeq2. Genes are ordered
   according to differential expression p-values in HT144. (d) Heatmaps
   showing the difference in density of differentially expressed DDX3X
   targets relative to the targets of other RBPs. Each panel corresponds
   to a different combination of datasets used to determine differential
   expression (x and y-axis) and DDX3X or other RBP targets (indicated at
   top of panel). Target genes were identified based on the overlap of
   their 5’UTR with eCLIP peaks in K562 (left column), HepG2 (middle
   column) or the union of peaks in both cell lines (right column). (e)
   mRNA expression of DDX3X and DDX3Y in TCGA tumors from female (left)
   and male (right) patients (n = 289 male tumors, n = 179 female tumors).
   Each point represents one tumor, with the color representing DDX3X
   mutation status. Expression levels are in log transformed units of
   reads per kilobase per million (RPKM). (f) Mutation matrix of
   loss-of-function and missense mutations of DDX3X and DDX3Y across all
   samples analyzed in this study.

Extended Data Figure 6. Deconvolution of melanoma transcriptomes using NMF.

   Extended Data Figure 6.
   [390]Open in a new tab

   (a) Kaplan-Meier survival curves for 216 patients from TCGA with
   metastatic tumor samples from a regional lymph node, stratified by mRNA
   expression percentiles of lymphocytic markers. Each panel corresponds
   to one lymphocytic maker. (b-d) Determining the optimal NMF
   decomposition for RNA-seq data (n = 468 tumors). (b) Average of tumor
   sample connectivity matrices across 100 randomly initialized NMF runs.
   (c) Cumulative distribution function (CDF) of averaged tumor
   connectivity matrices. (d) Proportion of ambiguous clustering (PAC) by
   NMF – used to evaluate the stability of NMFs solution at each rank (k).
   PAC measurements using five different definitions of ambiguity are
   shown. (e-h) Distribution of NMF’s expression signatures and their
   relationship with non-melanocyte skin cells. (e) Distribution of NMF
   signature weights in TCGA tumors. (f and g) Scatter plots of each
   tumor’s NMF keratin weights (y-axis) and xCell keratinocyte and
   sebocyte signature scores (x-axis) (n = 468 tumors). Correlation
   p-value was computed using a two-sided Spearman’s test. (h)
   Distribution of keratin weights across TCGA tumor tissue sites
   (n[primary] = 101 and n[other] = 362 tumor samples). Each point
   corresponds to one tumor sample. P-value is from a two-tailed Wilcoxon
   rank sum test. Boxes indicate first, second, and third quartiles.
   Whiskers extend to the minimum and maximum data points, no further than
   1.5 times the inter-quartile range from the hinges. (i) Classical
   clustering recapitulates melanoma cell intrinsic expression signatures
   when the effect of varying tumor purity is subtracted from gene
   expression data. See methods for additional details. (j) Agreement
   between NMFs proposed cancer intrinsic signatures and the groups
   uncovered in (i). Each panel includes samples from a single mRNA
   subgroup identified in (i), indicated on the right side of the panel
   (n[Common] = 299, n[MITF-low] = 76 and n[OxPhos] = 72 tumors). Shown on
   the y-axis are the NMF weights corresponding to each NMF signature
   indicated on the x-axis. Boxes indicate first, second, and third
   quartiles. Whiskers extend to the minimum and maximum data points, no
   further than 1.5 times the inter-quartile range from the hinges.

Extended Data Figure 7. Characterization of melanoma mRNA subgroups.

   Extended Data Figure 7.
   [391]Open in a new tab

   (a-c) mRNA expression of MITF and hypoxia markers HIF1A and VEGFA in
   melanoma mRNA subgroups (n[Common] = 299, n[MITF-low] = 76 and
   n[OxPhos]= 72 tumors). The y-axis corresponds to mRNA expression in log
   transformed counts per million (CPM). P-values are from a two-tailed
   Wilcoxon rank sum test. Boxes indicate first, second, and third
   quartiles. Whiskers extend to the minimum and maximum data points, no
   further than 1.5 times the inter-quartile range from the hinges. (d-e)
   Gene set enrichment analysis (GSEA), performed on genes differentially
   expressed in each mRNA subgroup compared to all out-of-group samples.
   Genes were first ranked by fold-difference in RNA expression (estimated
   using DESeq2). The ranked log transformed fold-differences were
   provided to the Broad Institute’s GSEA tool, which performs a
   one-tailed permutation test of a Kolmogorov-Smirnov-like statistic
   (number of genes = 17,481). The MSigDB hallmarks (d) and curated gene
   sets (C2) (e) databases were used. For each gene set (y-axes) an
   enrichment score (x-axes) and a corresponding FDR value (colour
   gradient) are assigned. Positive and negative enrichment scores
   indicate that a gene set is enriched in upregulated and downregulated
   genes, respectively. The gene sets shown here are the top seven per
   group that passed an FDR cut-off <1%. GSEA computes FDR values using a
   permutation approach. (f) Distribution of TMB in non-acral cutaneous
   samples, stratified by their dominant intrinsic mRNA signature. See
   panel (a) legend for sample sizes. (g) Distribution of UVR-mutation
   proportions in non-acral cutaneous samples stratified by their dominant
   intrinsic mRNA signature. See panel (a) legend for sample sizes.
   P-values for (a) and (b) are based on a Kruskal-Wallis test. (h)
   Bootstrap estimates (10,000 iterations) of the Spearman correlation
   between immune signature and TMB (left) or proportion of UVR mutations
   (right) in each tumor (n = 394 non-acral cutaneous tumors). In panels
   (f-h), Boxes indicate first, second, and third quartiles. Whiskers
   extend to the minimum and maximum data points, no further than 1.5
   times the inter-quartile range from the hinges.

Extended Data Figure 8. Single predictors evaluation and relative quality of
multivariable post-accession survival models in non-acral cutaneous
melanomas.

   Extended Data Figure 8.
   [392]Open in a new tab

   (a) Univariable Cox model Hazard ratio and unadjusted p-value of single
   predictors (n = 347 tumor samples). (b) Relative quality of univariable
   and multivariable Cox survival models including all subsets of
   predictors. Models are ordered from left to right by increasing Akaike
   Information Criteria (AIC, top). The bottom panel (binary matrix)
   indicates which predictors were included in each model (n = 347 tumor
   samples). (c and e) Relative quality of multivariable Cox regression
   models including different subsets of predictors for unstratified (n =
   347) and UVR-high (n = 301) tumor samples. All models include age at
   procurement and tumor tissue site, in addition to the predictors
   specified on the x-axis (indicated by ellipses). The y-axis shows the
   Akaike information criteria (AIC, top subpanel) and concordance index
   (lower subpanel). (d and f) Coefficients of the multivariable Cox
   regression models shown in (c) and (e). For each subpanel, the left
   part shows the coefficients, expressed in log2 hazard-ratios with 95%
   confidence intervals, and the right part shows the coefficient
   p-values. Cox model coefficient p-values in all panels were computed
   using a two-tailed z Wald test.

Extended Data Figure 9. Single predictors evaluation and relative quality of
multivariable overall survival models in non-acral cutaneous melanomas.

   Extended Data Figure 9.
   [393]Open in a new tab

   (a) Univariable Cox model Hazard ratio and unadjusted p-value of single
   predictors (n = 347 tumor samples). (b) Relative quality of univariable
   and multivariable Cox survival models including all subsets of
   predictors. Models are ordered from left to right by increasing Akaike
   Information Criteria (AIC, top). The bottom panel (binary matrix)
   indicates which predictors were included in each model (n = 347 tumor
   samples). (c and e) Relative quality of multivariable Cox regression
   models including different subsets of predictors for unstratified (n =
   347) and UVR-high (n = 301) tumor samples. All models include age at
   initial diagnosis, pathologic stage, and mRNA subgroup in addition to
   the predictors specified on the x-axis (indicated by ellipses). The
   y-axis shows the Akaike information criteria (AIC, top subpanel) and
   concordance index (lower subpanel). (d and f) Coefficients of the
   multivariable Cox regression models shown in (c) and (e). For each
   subpanel, the left part shows the coefficients, expressed in log2
   hazard-ratios with 95% confidence intervals, and the right part shows
   the coefficient p-values. Cox model coefficient p-values in all panels
   were computed using a two-tailed z Wald test.

Extended Data Figure 10. Relative quality of multivariable Cox regression
models including TMB or neoantigen load.

   Extended Data Figure 10.
   [394]Open in a new tab

   (a) Concordance index. (b) Akaike information criteria. Models were
   restricted to non-acral cutaneous melanomas with no missing data for
   all predictors (n = 336 tumors, of which 293 tumors were UVR-high).

Supplementary Material

   supplementary_tables
   [395]NIHMS1756655-supplement-supplementary_tables.xlsx^ (8.8MB, xlsx)

ACKNOWLEDGEMENTS