Epigenetic inheritance of diet-induced and sperm-borne mitochondrial RNAs
HFD–LFD experiment in mice (F0–F1)
Animal housing, diet composition and breeding strategies
C57BL/6N male and female mice were purchased from Charles River Laboratories Germany. All animals were fed ad libitum and housed at constant temperature (22 ± 1 °C) and controlled humidity in ventilated cages on a 12 h:12 h light/dark cycle.
For eHFD treatment, 6-week-old male mice were randomly assigned to two groups fed for 2 weeks with purified HFD (rodent diet with 60 kcal% from fat; Research Diet D12492i) or LFD control diet (rodent diet with 10 kcal% from fat; Research Diet D12450B) and subsequently mated with a single unexposed, virgin female of the same age. For sHFD treatment, 6-week-old male mice were randomly assigned to two groups fed for 2 weeks with purified HFD (rodent diet with 60 kcal% from fat; Research Diet D12492i) or LFD control diet (rodent diet with 10 kcal% from fat; Research Diet D12450B), mated with a single unexposed virgin female (to empty the epididymis) and moved back to a standard chow diet for 4 weeks. These animals were subsequently mated with a single, virgin and aged-matched female to generate the offspring cohort.
At the time of mating, both males and females were fed ad libitum on a standard chow diet. To avoid isolation during gestation, males were removed from the cage after delivery and the mothers were maintained individually throughout newborn nursing and lactation. Litter size was adjusted to 6–8 whenever the number of pups was higher, to avoid undernourishment.
Offspring from LFD or HFD males and WT unexposed females were named F1. F1 animals were weaned at 21 days post-partum and kept on ad libitum chow diet for their entire life.
All animal experiments have been carried out according to the European Union directive 2010/63/EU and were approved by the responsible authorities of the government of Upper Bavaria, under licence number ROB-55.2-2532.Vet_02-17-33.
All efforts have been made to minimize suffering by considerate housing and husbandry. All phenotyping procedures were examined for potential refinements. Animal welfare was assessed routinely for all mice involved. Data from animal experiments are reported in accordance with the ARRIVE guidelines59.
Phenotyping pipeline
Body weight and relative lean and fat mass were measured in exposed F0 animals before and after the dietary challenge, as well as in chow-fed F1 animals bi-weekly from 4 to 14 weeks of age. Body composition was determined by nuclear magnetic resonance spectroscopy with a Minispec NMR analyser (Brucker Optics), according to the manufacturer’s instructions. ipGTTs were conducted on 8- or 12-week-old F0 mice and 16- and 24-week-old F1 mice after an overnight fasting period of 16 h (from 6 pm to 10 am). A ratio of 2 g of glucose per kilogram of fasting body weight was injected. Blood glucose levels were determined before and after the injection at 15, 30, 60 and 120 min using the Roche AccuChek Aviva blood glucose meter. Plasma samples were separated from the whole blood, collected in EDTA-coated microvettes (Sarstedt) at time 0, 30 and 60 min, and snap-frozen in liquid nitrogen for further analyses.
Insulin tolerance tests were carried out on 25-week-old F1 mice after 6 h fasting (from 6 am to 12 pm). Mice were injected with 0.5 U insulin per kilogram of body weight. Blood glucose was measured before the intraperitoneal injection and at 15, 30, 60 and 120 min using the Roche AccuChek Aviva blood glucose meter.
Analysis of spermatogenesis and sperm function
Testis from 8-week-old mice exposed to 2 weeks of LFD- or HFD-feeding were dissected and processed for histology (n = 5 mice per diet), and purification of round spermatids for RNA and sncRNA-seq (n = 3 mice per diet) or further processed for single-cell RNA-seq (n = 3 mice per diet).
Testis staining and histology
Testis sections were fixed for 48 h in 10% formalin, dehydrated through ethanol series, cleared in xylene and embedded in paraffin. After rehydration, 4-μm sections were stained with haematoxylin and eosin, according to the manufacturer’s instructions. For immunohistochemical analysis, 1.5-μm sections were dewaxed by standard techniques. Heat treatment was carried out for antigen retrieval in sodium citrate buffer. Endogenous peroxidase activity was quenched with 3% H2O2 in methanol at room temperature for 5 min. Incubation with primary antibodies was carried out overnight at 4 °C in blocking buffer (TBS–Tween 1%), and chromogenic reactions were carried out. Staining was carried out using the automatic Discovery XT (Ventana Systems) stainer, following preset protocols. Sections were subjected to EDTA-based antigen retrieval for 20 min and protein block (Dako, DS9390) for 12 min. Sections were then examined under an Olympus microscope. The primary antibody was TRA98 (Abcam Ab82527; 1:1,000); secondary antibody was rabbit anti-rat IgG H&L (HRP; Abcam Ab6734; 1:1,000). The diameters of 60–70 seminiferous tubuli were measured from cross-sectional areas of TRA98-stained testes and expressed as the average of the horizontal and vertical diameters.
Round spermatid isolation
Round spermatids were isolated using a modified density gradient protocol optimized for round spermatids as previously described60. A minimum of 85–90% purity was microscopically checked. Total RNA was prepared using the RNAeasy Mini Kit (QIAGEN) according to the manufacturer’s instructions. RNA concentration and integrity were controlled on a Bioanalyzer system (Agilent) and only RNA samples with a RIN (RNA integrity number) value > 7 were considered for downstream applications. A 50 ng quantity of total RNA was used to prepare total (Ribo-minus) and sncRNA-seq libraries as described below.
Cauda sperm isolation and analysis
Mature cauda spermatozoa were purified by a swim-up procedure61 from mice fed on HFD or LFD for 2 weeks (eHFD) and/or allowed to recover on chow diet for 4 weeks after the dietary challenge (sHFD).
Briefly, the cauda and the vas deferens were cut into small pieces, placed in 500 μl Donners medium (25 mM NaHCO3, 20 mg ml−1 BSA, 1 mM sodium pyruvate and 0.53% (vol/vol) sodium dl-lactate in Donners stock; Donners stock: 135 mM NaCl, 5 mM KCl, 1 mM MgSO4, 2 mM CaCl2 and 30 mM HEPES, pH 7.4, filtered through a 0.22-μM filter and stored at room temperature), filled up to 2 ml, centrifuged for 2 min at 1,200 r.p.m. and then incubated at 37 °C for 1 h. Afterwards, 1.5 ml of supernatant (composed mainly of motile sperm) was transferred into a new 1.5-ml tube, centrifuged for 2 min at 1,200 r.p.m. and incubated for 30 min at 37 °C. A 1 ml volume of supernatant was then collected, and centrifuged for 5 min at 4,800 r.p.m. The supernatant was discarded and the pellet was resuspended in a cell lysis buffer (SDS 0.01%, Triton X-100 0.005%, dissolved in RNAse-free water) and incubated on ice for 30 min. Samples were then centrifuged at 4,800 r.p.m., washed with cold 1× PBS, resuspended in 500 μl of TRIzol reagent (Thermo Fisher) and stored at −80 °C until further processing. Total RNA was prepared using the RNAeasy Mini Kit (QIAGEN 74104) or the miRNeasy Micro Kit (QIAGEN 1071023) according to the manufacturer’s instructions.
A 10–15-μl aliquot of PBS-washed motile spermatozoa was used to microscopically check somatic cell contamination (only samples with no visible somatic cells were considered for downstream applications) and for functional analyses. Sperm concentration, motility and progressive motility have been calculated using an automated computer-assisted semen analysis (Hamilton Thorn IVOS II) according to the manufacturer’s instructions.
IVF and embryo analysis
Oocyte isolation and IVF were conducted following standardized procedures of the INFRAFRONTIER consortium as previously described21. Briefly, male gamete donors were euthanized at 8 weeks of age, after the 2-week exposure to LFD or HFD. Mature sperm cells were obtained from cauda epididymis as described above. WT unexposed female oocyte donors were euthanized the same day at 10–11 weeks of age, after superovulation induced with 7.5 U of pregnant mare serum gonadotropin and 7.5 U of human chorionic gonadotropin before being killed for oocyte collection. The sperm and the oocytes were co-cultured for 4–6 h. Subsequently, the oocytes were transferred and incubated in high-calcium human tubal fluid (HTF) culture medium at 37 °C and 5% CO2. First-cleavage (zygote to two-cell embryo) rate and rate of blastocyst development were measured to assess embryonic development and complement sperm functional analysis in determining the effects of HFD on male reproductive fitness.
Morulae were microscopically checked and individually picked after incubating fertilized oocytes in high-calcium HTF for 72 h.
Testes single-cell RNA-seq library preparation
Testes from 8-week-old mice fed on HFD or LFD for 2 weeks (n = 3 per group) were processed to obtain a single-cell suspension60. Briefly, testes were decapsulated and incubated with 1 mg ml−1 collagenase IV (3 min, 37 °C), washed twice with warm 1× KREBS (10× KREBS: 3.26 g KH2PO4, 139.5 g NaCl, 5.89 g MgSO4⋅7H2O, 50 g dextrose, 3.78 g CaCl2⋅2H2O, 7.12 g KCl in 2 l water and filtered with 0.22 µm) by sedimentation, incubated with DNAse I plus 0.25% trypsin (Gibco; 15–20 min at 34 °C), filtered with a 40-μm filter, centrifuged at 600g, 5 min at 4 °C, washed twice with cold 1× KREBS and resuspended with 1 ml 1× PBS + 0.04% BSA. Ten thousand cells were targeted using the Chromium Next GEM Chip G Single Cell and the Chromium Next GEM Single Cell 3′ Gel Bead Kit v3.1. Chromium Next GEM Single Cell 3′ GEM Kit v3.1 was used for cDNA synthesis and Chromium Next GEM Single Cell 3′ Library Kit v3.1 was used for library preparation following the ChromiumNextGEMSingleCell3_v3.1_Rev_D user guide (10x Genomics). Libraries were verified using a 2100 Bioanalyzer (Agilent). Samples were paired-end sequenced on an Illumina NovaSeq 6000 platform following the number of cycles recommended by 10x Genomics.
Testes single-cell RNA-seq analysis
Raw sequencing data were demultiplexed using the Cell Ranger (10x Genomics) mkfastq to obtain fastq files for each sample. Demultiplexed reads were processed and mapped to the mouse genome (refdata-gex-mm10-2020-A), and filtering and unique molecular identifier counting were carried out to obtain gene transcript counts per cell (gene barcode matrix) using the Cell Ranger (version 6.0.1) count function (Supplementary Table 1).
Cell Ranger-filtered count matrices for each sample were imported into R and Seurat R objects (version 4.3.0) were created62. For each sample, further filtering was conducted to select high-quality cells. We filtered raw count matrices by excluding cells expressing fewer than 200 detectably expressed genes and genes expressed in fewer than three cells. Cells with more than 40,000 detected features (nFeature_RNA) per cell were excluded. All of the samples were combined using the merge function. Gene expression counts were normalized with a scale factor of 10,000 and a log(1 + n) transformation, using the Seurat NormalizeData function. The CellCycleScoring function was used to infer cell cycle phase, as it determines relative expression of a large set of G2/M- and S-phase genes. The highly variable genes were identified using the FindVariableFeatures function. The Seurat object was subsequently scaled and analysed by PCA. After PCA, we used the RunHarmony function in the Harmony R package63 for integration.
The top 30 principal components were used to carry out the uniform manifold approximation and projection dimensional reduction. We then constructed the nearest-neighbour graph with the FindNeighbors function with the reduction as ‘harmony’, and dimensionality reduction as 1:30. Clusters were then identified using the FindClusters function with the resolution parameter of 0.3, which resulted in 18 clusters. Markers of each cluster were identified using the FindAllMarkers function with a Wilcoxon rank sum test. Cell types were assigned on the basis of publicly available testes single-cell datasets from age-matched mice64. The raw count gene expression data were used to carry out differential gene expression for all of the clusters and bulk using the DESeq2 package (Supplementary Table 2).
sncRNA-seq library preparation
A 10–50 ng quantity of total RNA from round spermatids (50 ng; n = 3 per group) and cauda spermatozoa (10 ng; n = 3 per group) from mice fed on HFD or LFD for 2 weeks or cauda spermatozoa from mice fed on HFD or LFD for 2 weeks, mated and allowed to recover on chow diet for 4 weeks (10 ng; n = 3 per group) was used for sncRNA library preparation. Libraries were prepared using the NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB E7560S) with 5′ and 3′ adaptors diluted 1:5 and 15 PCR amplification cycles. Libraries were verified using a 2100 Bioanalyzer (Agilent) and paired-end (read length = 150 base pairs (bp)) sequenced with the Illumina NovaSeq 6000 platform. Although allowing robust detection of all sncRNA biotypes, this library preparation method does not efficiently capture highly modified sncRNAs, such as tsRNAs and rsRNAs.
sncRNA-seq analysis
Raw sequencing data were quality checked using MultiQC v1.11. Reads were trimmed using cutadapt 2.8 according to the kit manufacturer’s instructions. Trimmed and quality-filtered reads were used for further analysis. Sequencing files were aligned and annotated to the mouse genome (mm10) using the SPORTS1.1 pipeline65 with default parameters and a maximum number of mismatches of 2. Reference genome and small RNA annotation databases were downloaded from the SPORTS website (https://github.com/junchaoshi/sports1.1). This included the mm10 genome files, miRNA from miRbase 21, rRNA from National Center for Biotechnology Information (NCBI) Nucleotide, tRNA from GtRNAdb, piRNA from pirBase and piRNAbank, other ncRNA from ensembl (release-89), and rfam 12.3. The raw count tables generated by SPORTS were annotated to small RNA biotypes. Averages were aggregated across biotypes (rsRNA, tsRNA, miRNA, piRNA and so on) using the default annotations in SPORTS result output files.
The downstream analysis was carried out as previously described66 with few modifications. Briefly, counts were converted into reads per million (RPM). The fragments with at least 0.01 RPM in all of the samples and lengths between 16 and 45 nucleotides were retained for further analysis. Next, edgeR was used to identify differentially expressed fragments. All of the analyses were carried out using different packages in R version 4.1.2 and Bioconductor version 3.14.
RNA-seq library construction
Round spermatids, cauda spermatozoa and morula library construction and sequencing were outsourced to IGA Technology Services. Libraries were constructed using the Nextera Library Prep Kit (Illumina) according to the manufacturer’s instructions and sequenced on an Illumina HiSeq 2500 at 75-bp paired-ended (round spermatids and morula) or single-ended (cauda spermatozoa), with a minimum output of 40 million reads per sample.
Liver, muscle and epididymal white adipose tissue total RNA was prepared using TRIzol reagent (Thermo Fisher) according to the manufacturer’s instructions. RNA concentration and integrity were controlled on a Bioanalyzer system (Agilent) and only RNA samples with RIN values > 7 were used for downstream applications. Sequencing libraries were prepared by using the Quantseq 3′ mRNA-Seq mRNA Library Prep Kit FWD for Illumina (Lexogen) with i7 indexes (Lexogen) according to the manufacturer’s instructions. Libraries were sequenced on an Illumina HiSeq 2500 at 50-bp single-ended, with a minimum output of 40–50 million reads per sample.
RNA-seq data analysis
Reads mapping and differential expression analysis were carried out using the A.I.R. (Artificial Intelligence RNA-Seq) software from Sequentia Biotech with the following pipeline: BBDuk (reads trimming; BBDUkguide), STAR (reads mapping to the mouse genome GRCm38 (ENSEMBL); https://github.com/alexdobin/STAR), featureCounts (gene expression quantification; https://subread.sourceforge.net/featureCounts.html) and NOISeq (statistical analysis of differentially expressed genes; http://bioinfo.cipf.es/noiseq/doku.php). Compared to other methods to calculate differential expression, NOISeq is a data-adaptive non-parametric method specifically designed to account for high variability across replicates and genes with low expression levels67, a feature of RNA-seq datasets from both germ cells and developing embryos. Heat map and PCA analyses were carried out with the web application ClustVis using default parameters68 (Fig. 1c and Extended Data Figs. 1h, 2h, 6b,d and 8d,f) or with GraphPad Prism 8 or 9. Enrichment analyses were carried out with Enrichr69 or g:Profiler70 with default parameters.
For the heat map in Fig. 4f, plotted are manually annotated gene sets associated with multiple mitochondrial functions (see Supplementary Table 8 for the full list of genes). As most of these genes have tissue-specific expression, we plotted the average log2[FC(treated versus control)], for which the treated conditions are: direct HFD exposure for F0 tissues (gastrocnemius, epididymal white adipose tissue and sperm), paternal exposures for morula (HFD versus LFD; as we have only bulk RNA-seq data), and the phenotypic discordant HFD groups for both two-cell embryo (HFD_A versus HFD_B; see Fig. 3 and Extended Data Fig. 6 for the clustering) and adult F1 tissues (HFDi versus HFDt; see Fig. 1 for the group definition).
Heteroplasmy experiment
Animals and husbandry conditions
A total of 60 female mice of the strain C57BL/6N-mtST (nuclear DNA: C57BL/6N; mtDNA: ST, GenBank accession number KC663621) of the age of 3 to 16 weeks were used. Mice were specific pathogen free according to FELASA recommendations and maintained in a barrier rodent facility. Groups of 3 to 4 females were housed in type IIL IVC cages (Blue Line, Tecniplast). The cages were lined with 120 g bedding (Lignocel Select, 3.5–4.5 mm poplar chips, Rettenmaier) and enriched with nesting material (PurZellin, Paul Hartmann) (photoperiod 12 h:12 h light/dark). Food (V1534, Ssniff Spezialdiaeten) and tap water in 250-ml bottles were available ad libitum.
These experiments were carried out at the University of Vienna (Austria) and all of the experimental procedures were discussed and approved by the Ethics and Welfare Committee of the University of Veterinary Medicine, Vienna and the national authority (Austrian Federal Ministry of Education, Science and Research) according to section 26ff of the Animal Experiments Act, Tierversuchsgesetz 2012–TVG 2012 under licence number 2021-0.731.149.
Experimental procedure
Female mice were treated in groups of 15 for superovulation by an intraperitoneal injection of 0.1 ml CARD HyperOva (CosmoBio, by Hölzel) and 48 h later 5 IU in 0.1 ml hCG (Chorulon; Intervet). Animals were euthanized 14 h after hCG injection, oviducts were dissected and ampullae were opened in drops of 90 μl HTF medium to collect the cumulus oocyte complexes. Oocytes were in vitro fertilized with sperm from LFD- or HFD-fed mice. Sperm from four different males per group were used as frozen–thawed sperm according to the EMMA protocol (https://www.infrafrontier.eu/). On each experimental date, oviducts from 7.5 females were used for IVF with a HFD male and oviducts from 7.5 females for IVF with LFD a male. On each date, different males were used. Briefly, thawed sperm was dissolved in 90 μl TYH medium and incubated for 30 min. A 10 μl volume of the sperm solution was added to a group of cumulus oocyte complexes and incubated for 4–6 h. After washing in HTF medium, oocytes were incubated overnight. Two-cell-stage embryos were washed the next morning in PBS, aliquoted in 0.2-ml tubes with lysis buffer71, snap-frozen in liquid nitrogen and stored at −80 °C.
Single-embryo RNA-seq and data analysis
A total of 122 HFD and 99 LFD single early two-cell embryos were processed to obtain RNA-seq libraries according to the Smart-seq2 protocol71 with 16 cycles of PCR pre-amplification. Libraries were verified using a 2100 Bioanalyzer (Agilent), pooled and paired-end sequenced on an Illumina NovaSeq 6000 platform (read length = 150 bp).
Data quality was assessed with MultiQC (version 1.11) and Nextera adaptors were removed from paired-end reads using trim_galore (version 0.6.6). Read mapping, gene expression quantification and differential expression analysis were carried out using the following pipeline: STAR (reads mapping to the mouse genome GRCm38 (ENSEMBL); https://github.com/alexdobin/STAR; after replacing the mtDNA sequence according to the strain C57BL/6N-mtST; GenBank accession number KC663621); featureCounts (gene expression quantification; https://subread.sourceforge.net/featureCounts.html); DESeq2 (package 1.34.0; differential gene expression analysis). Gene Ontology (GO) enrichment analysis of differentially expressed genes has been carried out using g:Profiler72 with default parameters. Plotted are the driver GO terms from the ‘Molecular function’ and ‘Biological processes’ categories.
Individual embryos were sexed using the exclusive expression of Y-linked genes. For the analysis of mtDNA heteroplasmy, uniquely mapped reads were extracted from the BAM files using samtools and used to estimate mtDNA heteroplasmy with MitoHEAR (mitochondrial heteroplasmy analyzeR; https://github.com/ScialdoneLab/MitoHEAR)73.
Principal component and clustering analyses and visualization were carried out with the web application ClustVis68 with default parameters.
For Seurat-based clustering of the single embryos, the counts data matrix from featureCounts was used as input to the Seurat package to treat the individual sample as a single cell. The Seurat NormalizeData function was used to normalize counts. The highly variable genes were identified using the FindVariableFeatures function. The Seurat object was subsequently scaled using ScaleData and PCA was carried out using RunPCA. The FindNeighbors function was used to construct the nearest-neighbour graph with a dimensionality reduction of 1:15. Clusters were then identified using the FindClusters function with the resolution parameter of 0.8.
As one specific limitation of this method, smart-seq2 cannot detect fragmented tRNAs, as they are not poly-adenylated. To account for the partial penetrance of the reported phenotypes, we should have profiled sncRNAs in single embryos. Nevertheless, according to the genetic distance between the ST and the BL6 mtDNAs (which we used to calculate the heteroplasmy), only 5′ fragments of mt-Tp would be reliably detectable (which has an SNP at the 5′ and 3′ of the mature tRNA sequence). Therefore, although very likely, our data do not show transfer of mt-tsRNAs while demonstrating inheritance of paternal mature mt-tRNAs.
Human studies
For the assessment of association of parental weight and the clinical phenotype of the offspring children, we analysed data from the Leipzig Childhood Obesity Cohort (NCT04491344) and the LIFE Child Study (NCT02550236), a prospective regional population-based longitudinal observational study aimed at characterizing contributing factors for civilization disease with comprehensive phenotyping (for example, clinical, laboratory, psychosocial data and biosamples) including data on parental BMI and BMI of the offspring conducted in the city of Leipzig, Germany32,33. With recruitment age ranging between the 24th week of gestation and 16 years of child age and annual follow-ups, the study combines a cross-sectional with a longitudinal design and covers a broad age range. The LIFE Child obesity sub-study specifically focuses on the origin and sequelae of childhood obesity in the frame of the Leipzig Childhood Obesity Cohort. After exclusion of children with present or past severe disease (for example, type 1 diabetes, syndromal obesity or cancer) and present or past interfering medical treatment (for example, insulin, immunosuppressives or growth hormone), we included those children with both parental BMIs available into the analysis (n = 3431; Supplementary Table 3). In the case of multiple visits, we used anthropometric data of the most recent visit of the child. For parental data, we used the earliest (that is, closest to the first pregnancy visit).
Expression of candidate genes derived from genome-wide expression data was analysed from children included in the Leipzig Adipose Tissue Childhood Cohort, for which subcutaneous adipose tissue samples and phenotypic data had been obtained as previously described30 (NCT02208141). RNA transcript levels were quantified using the Illumina HumanHT-12 v4.0 Expression BeadChip arrays and imputation, background correction and quality control were carried out according to previously published protocols74.
To correct for polygenic association of children’s BMI-SDS, we analysed 2,987 children of our cohorts for which genome-wide SNP array data, BMI-SDS and BMI of parents are available. Relatedness matrix was estimated according to ref. 75. The polygenic effect on BMI-SDS of children was subtracted by linear mixed model analysis using the R package GenABEL. We then tested the effects of parent BMI on BMI-SDS of children by linear regression analysis adjusting for age and sex.
sncRNA-seq from human spermatozoa
Sample collection. Samples were provided by our research collaborators from the University of Turku. After an informed consent, semen samples were given by 18 volunteering young men (19–21 years old). The men participated in a study on male reproductive health, which was approved by The Joint Ethics Committee of the University of Turku and Turku University Hospital. Ejaculation abstinence of at least 48 h was recommended before the semen sample collection. Standard semen analysis, including semen volume, pH, sperm concentration, total sperm counts and percentage of motile sperm, was carried out according to the World Health Organization Laboratory criteria in the Andrology laboratory of the Institute of Biomedicine. Spermatozoa were purified by centrifuging through a 50% gradient of Puresperm (Nidacon International). After washing the sample with PBS, the purity was evaluated microscopically, spermatozoa were counted, and the sample was evaluated for somatic cell contamination. The pellet was resuspended in Sperm CryoProtec II (Nidacon International AB) and frozen pellet was shipped to Germany. In Germany, spermatozoa were further purified from somatic cell contamination by washing with somatic cell lysis buffer76. The purity of samples was validated microscopically. Detailed sample information including sperm and metabolic parameters can be found in Supplementary Table 4.
RNA isolation. Total RNA was extracted from spermatozoa with the TRIzol–chloroform phase separation method followed by precipitation and wash steps with 100% and 70% ethanol, respectively. RNA quality control was carried out with Agilent Bioanalyzer and only the samples with a RIN value of between 2 and 4.5 and no evident intact ribosomal RNA peaks were further processed.
Library preparation. Sequencing libraries were prepared using NEBNext Small RNA Library Prep Set for Illumina (New England Biolabs) with 100–120 ng RNA as a total input. To minimize adaptor-dimer formation, adaptors were diluted 1:6, whereas reverse transcription primer was applied undiluted. At the final PCR stage, 1:2 diluted primers were used. Amplified libraries were cleaned with Monarch PCR & DNA Cleanup Kit (New England Biolabs) and size selected with AMPure XP (Beckman Coulter) using 1.0× and 3.7× beads for long-fragment removal and target-size retention, respectively. Pooled libraries were sequenced on a NovaSeq 6000, SP flow cell, 100 cycles (Illumina) at an average depth of 53.18 million reads per sample.
Data analysis. All sequencing files were quality checked using MultiQC v1.11. After a preliminary quality check, single-end read sequences were analysed using SPORTS1.1 (ref. 65). The adaptor sequence (5′-AGATCGGAAGAGCACACGTCTGAACTCCAGTCA-3′) was trimmed using in-built Cutadapt pipeline and sequences with the length of 15–45 nucleotides were further processed. Using human genome version hg19 and allowed mismatch number set to 2, SPORTS1.1 mapped reads to small RNA subtypes. After joining sequences from all samples into a common count matrix, a filter was applied to exclude sequences with no reads in more than 80% of samples. Samples were further analysed on the basis of BMI variation. Sequence-based differential expression analysis was carried out using DESeq2 (ref. 77) addressing BMI as a continuous variable. Briefly, a generalized linear model-based algorithm estimated the association of each sequence with the BMI across samples and output log2[FC] reflecting change in sequence expression level per unit of increment of BMI. Benjamini–Hochberg Padj < 0.1 served as a measure of significance for each result. Pearson-based correlation analysis between donors’ BMI and sperm sncRNA biotypes (variance-stabilizing transformation-normalized expression) has been carried out with GraphPad Prism 8 using default parameters.
The following string represents the code used for the continuous DESeq2 analysis of human sperm sncRNA-seq data: dd_obj <- DESeqDataSetFromMatrix(countData = [your sequence count matrix], colData = [data frame with sample ID and BMI values], design = ~BMI).
International Mouse Phenotyping Consortium data collection and analysis
The goal is to provide evidence in support of the hypothesis that paternal mitochondrial dysfunction leads to intergenerational alterations of metabolic homeostasis. The IMPC offers an invaluable resource of functional genetics studies29. Briefly, we used the IMPC dataset to study the (epi)genetic, intergenerational consequences of paternal manipulation of genes involved in mitochondrial structure and function.
For this, we compared a battery of 11 numerical metabolic phenotypes (fat/body weight; initial response to intraperitoneal glucose load; AUCipGTT; total food intake; respiratory exchange ratio; total cholesterol; HDL cholesterol; triglycerides; fasting blood glucose; albumin; alkaline phosphatase) in two populations of isogenic WT C57BL/6N mice generated from a pure WT lineage (control) or from heterozygous matings (WT—parental information (father × mother): het_×_het; het_×_WT; WT_×_het).
Gene selection
The selection of genes involved in mitochondria structure and function has been carried out with the following steps: from the list of IMPC genes (Data Release 11), we extracted those presenting with one or more of the above listed metabolic phenotypes in heterozygosity; functional enrichment analysis (GO and KEGG pathway analysis using Enrichr69); list of selected genes compiled by merging genes belonging to the following mt-related terms (GO:0005739; GO:0005743; GO:0005759; GO:0031966; GO:0042645; GO:0005747; GO:0005761; GO:0005763; GO:0005749; GO:0032592; KEGG:mmu00190; Supplementary Table 5).
Data collection and analysis
As the individual IMPC phenotyping centres phenotype control and/or WT animals (see definition above) and consider them as WT control for gene-specific phenotypes, specific control or WT data are not directly identifiable from the IMPC website. Therefore, we were granted special access to the IMPC data for specific collection of control and WT phenotypic information. Following collection, the dataset was arranged in a multidimensional data matrix for further analysis. Animals with at least three replicates per group and sex were included for further steps. Imputation for the missing data was carried out with a random forest imputation algorithm using the missForestR package with default parameters78. The data matrix was then scaled and the prcomp function in R was used to determine the principal components of the dataset. To quantify the difference between the genes on the basis of phenotypes, we use the Pearson correlation method using the get_dist function from the factoextra R package. These correlation coefficients were calculated to identify similarity patterns in gene–phenotype pairs and visualized in a heat map generated by using the ComplexHeatmap package from R. Cluster-specific phenotypes were visualized in a heat map including the 11 phenotypes, as well as sex- and parent-of-origin-specific information. Ranked AUCipGTT (expressed as log2[FC(WT versus control)]) was plotted as a horizontal bar plot using GraphPad Prism 9.
Cryopreserved sperm collection
Cryopreserved sperm samples containing a pool of purified cauda spermatozoa from 10 heterozygous mice were obtained from EMMA56. Genes were selected for availability with the exception of Tsfm (Ts translation elongation factor, mitochondrial), which represents a suitable negative control for initial mechanistic dissection.
RNA isolation and sncRNA-seq libraries construction
Individual straws were thawed directly in TRIzol and RNA was extracted with the miRNeasy Micro Kit (QIAGEN 1071023) according to the manufacturer’s instructions. A 10 ng quantity of total RNA was used for sncRNA library preparation. Libraries were prepared using the NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB E7560S) with 5′ and 3′ adaptors diluted 1:5 and 15 PCR amplification cycles. Libraries were verified using a 2100 Bioanalyzer (Agilent) and paired-end (read length = 150 bp) sequenced with the Illumina NovaSeq 6000 platform.
sncRNA-seq data analysis
sncRNA-seq data were analysed as described above. Briefly, sequencing raw data were quality checked using MultiQC v1.11. Reads were trimmed using cutadapt 2.8 according to the kit manufacturer instructions. Trimmed and quality-filtered reads were used for further analysis. Sequencing files were aligned and annotated to the mouse genome (mm10) using the SPORTS1.1 pipeline65 with default parameters and a maximum number of mismatch of 2. Reference genome and small RNA annotation databases were downloaded from the SPORTS website (https://github.com/junchaoshi/sports1.1). This included the mm10 genome files, miRNA from miRbase 21, rRNA from NCBI Nucleotide, tRNA from GtRNAdb, piRNA from pirBase and piRNAbank, other ncRNA from ensembl (release-89), and rfam 12.3. The raw count tables generated by SPORTS were annotated to small RNA biotypes. Averages were aggregated across biotypes (rsRNA, tsRNA, miRNA, piRNA and so on) using the default annotations in SPORTS result output files.
Statistical analysis
All figures and statistical analyses (as needed and appropriate) were generated using GraphPad Prism 8 or 9. Statistical significance was tested by Student’s t-test, or ANOVA as appropriate. Correlation analyses were used to test for linear regression. Odds ratios have been calculated using MedCalc (https://www.medcalc.org/calc/odds_ratio.php). All data are expressed as mean ± s.e.m. and a two-tailed P value < 0.05 was considered to indicate statistical significance unless otherwise specified in the text.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.