Exploring GBS data and some file conversions

I’ve been organizing my scripts for analyzing GBS data for my dissertation chapter on the phylogeography of the Olympia oyster. There is still a lot of work to do, but here are a couple of notebooks. All notebooks can be found on my GitHub under 2016_Notebooks, and the scripts referenced in them are in my GitHub Scripts repository.

GBS_File_Conversions.ipynb: I’m planning on using this notebook to describe the various file conversion scripts and code snippets I’ve written for working with GBS data. I may end up splitting them into separate notebooks, but we’ll see. Currently, it describes how to add population information to the Structure (.str) output from ipyrad/pyrad, either as integers (required by the Structure program itself) or as strings (useful for programs like adegenet in R). It also describes how to take a .geno or .str file and split it into many .geno files for looking at pairwise Fst and nucleotide diversity in R.
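As a rough illustration of the first conversion, here is a minimal Python sketch (not the notebook's actual script) that inserts a population column after the sample name in a .str file. It assumes each line is whitespace-separated with the sample name first and genotype calls after it, and it uses a hypothetical naming rule in which the population is everything before the last underscore (so BC1_12 becomes BC1). The function names population_of and add_population_column are made up for this example.

```python
# Sketch: insert a population column into Structure (.str) lines.
# Assumes (hypothetically) whitespace-separated lines with the sample name
# first, and that the population is the sample name before its last "_".


def population_of(sample: str) -> str:
    """Hypothetical naming rule: 'BC1_12' -> 'BC1'."""
    return sample.rsplit("_", 1)[0]


def add_population_column(lines, as_integers=True):
    """Return tab-separated lines with a population column after the name.

    as_integers=True gives 1, 2, 3... (what the Structure program expects);
    as_integers=False keeps the population string (e.g. for adegenet in R).
    """
    codes = {}  # population string -> integer code, in order of first appearance
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample, genotypes = fields[0], fields[1:]
        pop = population_of(sample)
        if pop not in codes:
            codes[pop] = len(codes) + 1
        label = str(codes[pop]) if as_integers else pop
        out.append("\t".join([sample, label] + genotypes))
    return out


if __name__ == "__main__":
    # Toy rows (not real genotypes): two rows per diploid individual.
    example = [
        "BC1_12\t1\t3\t-9",
        "BC1_12\t1\t3\t-9",
        "CA1_18\t2\t3\t0",
        "CA1_18\t0\t3\t0",
    ]
    for row in add_population_column(example):
        print(row)
```

If the real files pad the name with empty tab-separated columns, splitting on whitespace would collapse them, so the parsing step would need adjusting.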

Exploring_GBS_Data_With_R.ipynb: This notebook has code snippets for exploring pairwise Fst and nucleotide diversity (pi). I will eventually expand it to include other R-based analysis methods.
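For a sense of what a pairwise Fst calculation involves, the sketch below computes Hudson's Fst estimator between two groups of samples in Python, summing the per-SNP numerators and denominators before dividing (a ratio of averages). This is a swapped-in estimator for illustration, not necessarily the method used in the R notebook. It assumes .geno-style input: one line per SNP, one character per sample, genotypes coded 0, 1 or 2 with 9 for missing data, and population membership given as hypothetical lists of column indices.

```python
# Sketch: Hudson's Fst between two groups of samples, summed over SNPs
# (ratio of averages). Assumes .geno-style lines: one SNP per line, one
# character per sample, 0/1/2 = copies of one allele, 9 = missing.


def allele_stats(row, columns):
    """(allele frequency, number of sampled alleles) for one SNP, or (None, 0)."""
    calls = [int(row[i]) for i in columns if row[i] != "9"]
    n_alleles = 2 * len(calls)  # diploid: two alleles per genotyped sample
    if n_alleles == 0:
        return None, 0
    return sum(calls) / n_alleles, n_alleles


def hudson_fst(geno_lines, pop1_cols, pop2_cols):
    """Hudson's Fst for samples in pop1_cols vs pop2_cols (hypothetical inputs)."""
    num = den = 0.0
    for line in geno_lines:
        row = line.strip()
        if not row:
            continue
        p1, n1 = allele_stats(row, pop1_cols)
        p2, n2 = allele_stats(row, pop2_cols)
        if n1 == 0 or n2 == 0:
            continue  # SNP not genotyped in one of the populations
        # Numerator: squared frequency difference minus sampling-error terms
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Denominator: probability that one allele from each population differs
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    # Toy genotypes (not real data): samples 0-2 in one group, 3-5 in the other.
    toy = ["000222", "001122", "012012"]
    print(hudson_fst(toy, [0, 1, 2], [3, 4, 5]))
```

Because of the sample-size correction, values can come out slightly negative when two groups are essentially undifferentiated.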

 


10/23/15

Common Garden Project

DNA extraction from Olympia oyster broodstock

  1. NF2_1
  2. NF2_2
  3. NF2_3
  4. NF2_4
  5. NF2_5
  6. NF2_6
  7. NF2_7
  8. NF2_9
  9. NF2_10
  10. NF2_11
  11. NF2_12
  12. NF2_13
  13. NF2_14
  14. NF2_15
  15. NF2_16
  16. NF2_17
  17. NF2_18
  18. NF2_19
  19. NF2_20
  20. NF3_1
  21. NF3_2
  22. NF3_3
  23. NF3_4
  24. NF3_5

Did not find NF2_8…

Olympia Oyster Population Structure Project

I started preparing genotyping-by-sequencing (GBS) libraries using the samples dried down on 10/21/15. Did an ApeKI digestion and adaptor ligation.

Digestion:

Reagent        1x       50x
NEB Buffer 3   2 μL     100 μL
ApeKI          1 μL     50 μL
H2O            17 μL    850 μL

Added 20 uL of digestion master mix to each of the 48 wells. Incubated for 2 hours at 75degC, then held at 4degC until the ligation master mix was ready.

Ligation:

Reagent            1x        51x
T4 Ligase Buffer   5 μL      255 μL
T4 Ligase          1.6 μL    81.6 μL
H2O                23.4 μL   1193.4 μL

Added 30 uL of ligation master mix to each well to give 50 uL total. Incubated at 19degC for 1 hour, then at 65degC for 20 minutes. Cooled to 0degC and put in the freezer.
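The master mix volumes for the digestion and ligation are the 1x (per-well) recipe multiplied by the number of reactions. A small Python sketch of that scaling, using the 1x volumes from the two recipes; the 50x and 51x multipliers for 48 wells presumably leave extra volume for pipetting loss. The names are illustrative only.

```python
# Sketch: scale a per-well (1x) recipe up to an N-reaction master mix.
# The 1x volumes (uL) are copied from the digestion and ligation recipes.
DIGESTION_1X = {"NEB Buffer 3": 2.0, "ApeKI": 1.0, "H2O": 17.0}
LIGATION_1X = {"T4 Ligase Buffer": 5.0, "T4 Ligase": 1.6, "H2O": 23.4}


def master_mix(recipe_1x, n_reactions):
    """uL of each reagent for n_reactions, rounded to 0.1 uL."""
    return {name: round(vol * n_reactions, 1) for name, vol in recipe_1x.items()}


def per_well_volume(recipe_1x):
    """Total uL that one well receives from a 1x recipe."""
    return round(sum(recipe_1x.values()), 1)


if __name__ == "__main__":
    # 48 wells, mixed at 50x (digestion) and 51x (ligation) as in the notes
    for label, recipe, n in [("digestion", DIGESTION_1X, 50), ("ligation", LIGATION_1X, 51)]:
        print(label, master_mix(recipe, n), "per well:", per_well_volume(recipe), "uL")
```

The per-well totals (20 uL of digestion mix plus 30 uL of ligation mix) give the 50 uL final volume per well.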

10/21/15

Common Garden Samples

  • DNA extraction of broodstock tissue. Let digest for 2.5 hours.
    1. SS2_9
    2. SS2_10
    3. SS2_11
    4. SS2_12
    5. SS2_13
    6. SS2_14
    7. SS2_15
    8. SS2_16
    9. SS2_17
    10. S2_18
    11. S2_19
    12. S3_3
    13. S3_4
    14. S3_7
    15. S3_8
    16. S3_9
    17. S3_10
    18. S3_11
    19. S3_12
    20. S3_13
    21. S3_14
    22. S3_15
    23. S3_16
    24. S3_17

Running low on EZNA kit columns, so I ordered another kit.

Phylogeography Project

Set up a dry-down of 200 ng of DNA from each of 48 samples.

BC1_12
BC1_8
BC2_7: 10.6
BC2_13: 11.5
BC2_9: 13 (8)
BC3_13: 108
BC3_12: 15 (8)
BC4_2: 8.84
BC4_17: 11.4 (8)
CA1_18: 24.4
CA1_1: 36
CA1_2: 17.4 (8)
CA2_9: 43.8
CA2_12: 18.8 (5)
CA3_6: 62.3 (5)
CA4_1: 5.1
CA4_7: 19.8 (8)
CA5_10: 48.8 (8)
CA6_8: 5.14
CA6_15: 25.2 (7)
CA7_11: 7.4
CA7_16: 19.1
CA7_15: 14.9 (8)
OR1_1: 7.66
OR1_5: 29.2 (7)
OR2_6: 12.9
OR2_12: 15.8
OR2_15: 18.2 (7)
OR3_15: 9.24
OR3_20: 13.7 (7)
WA10_16: 25.4
WA10_13: 9.52 (8)
WA11_10: 18.7
WA11_22: 17.4
WA11_4: 11.7
WA11_17: 4.78 (8)
WA12_15: 24.4
WA12_11: 4.18 (8)
WA13_12: 7.2
WA13_5: 4.4
WA13_15: 45.1 (9)
WA9_2: 11.4 (7)
WA1_16: 62 (8)
Conch_1: 12.4
conch_2: 12.3
conch_4: 9.65
CA5_13: 21.8 (repeat with 2)
CA5_10 (repeat within)

I'm keeping the calculations of how many uL of each sample to add on page 3 of this Google Sheet.
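The volume to dry down for each sample is 200 ng divided by its concentration. A minimal Python sketch of that calculation, assuming the numbers after each sample name are Qubit concentrations in ng/uL (as labeled in the 9/23/15 Qubit list); the numbers in parentheses are not explained in these notes, so the sketch ignores them. The function name volume_for_mass is hypothetical.

```python
# Sketch: volume of each extract that holds a fixed mass of DNA for dry-down.
TARGET_NG = 200.0  # ng of DNA per sample, from the 10/21/15 entry


def volume_for_mass(conc_ng_per_ul, target_ng=TARGET_NG):
    """uL of extract containing target_ng at the given concentration (ng/uL)."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Assumed ng/uL values, copied from three entries in the list above.
    qubit = {"BC2_7": 10.6, "BC3_13": 108.0, "WA12_11": 4.18}
    for sample, conc in qubit.items():
        print(f"{sample}: {volume_for_mass(conc):.2f} uL")
```

Low-concentration extracts such as WA12_11 would need close to 48 uL, while BC3_13 needs under 2 uL.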

Friday 10/16/15

Common Garden 2b-RAD

Gel-extracted the 2b-RAD PCR samples from Wednesday 10/14/15 using the Qiagen Gel Extraction Kit, incubating 40 uL of elution buffer on the column for 5 minutes. Decided that going forward I will run PCR products on a low-melt gel and use GELase instead of the kit to save time and money.

Quantified the gel-extracted samples with the High Sensitivity Qubit assay. Multiplied 37 uL by the lowest concentration to get the ng per sample to add to the pool (37 * 0.373 = 13.8 ng), then divided 13.8 ng by each sample's concentration to get how many uL of that sample to add to the pool.

Population   Sample   ng/uL after gel   Vol to add to pool (uL)
Oyster Bay   BS_2_6   0.907             15.21499449
Oyster Bay   BS_2_7   0.388             35.56701031
Oyster Bay   BS_2_8   0.661             20.8774584
Hood Canal   BS_1_5   0.731             18.87824897
Hood Canal   BS_1_6   1.45              9.517241379
Hood Canal   BS_1_7   0.853             16.17819461
Hood Canal   BS_1_8   0.96              14.375
Fidalgo      BS_1_4   0.873             15.80756014
Fidalgo      BS_1_5   0.373             36.99731903
Fidalgo      BS_1_6   1.07              12.89719626
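The pooling arithmetic can be checked with a short Python sketch. It takes the concentrations from the table, sets the per-sample target to 37 uL times the lowest concentration rounded to 0.1 ng (13.8 ng), and divides that target by each concentration; with that rounding it reproduces the volumes in the table. Samples are keyed by population and sample name because BS_1_5 and BS_1_6 appear under both Hood Canal and Fidalgo. The function name pool_volumes is hypothetical.

```python
# Sketch: equal-mass pooling volumes for the gel-extracted 2b-RAD samples.
ELUTION_UL = 37.0  # uL multiplied by the lowest concentration, as in the notes

# (population, sample) -> ng/uL after gel extraction, copied from the table.
# Population is part of the key because BS_1_5 and BS_1_6 occur twice.
CONC = {
    ("Oyster Bay", "BS_2_6"): 0.907,
    ("Oyster Bay", "BS_2_7"): 0.388,
    ("Oyster Bay", "BS_2_8"): 0.661,
    ("Hood Canal", "BS_1_5"): 0.731,
    ("Hood Canal", "BS_1_6"): 1.45,
    ("Hood Canal", "BS_1_7"): 0.853,
    ("Hood Canal", "BS_1_8"): 0.96,
    ("Fidalgo", "BS_1_4"): 0.873,
    ("Fidalgo", "BS_1_5"): 0.373,
    ("Fidalgo", "BS_1_6"): 1.07,
}


def pool_volumes(conc, elution_ul=ELUTION_UL):
    """Return (target ng per sample, {key: uL to add to the pool})."""
    # Target mass: elution volume x lowest concentration, rounded to 0.1 ng
    # (37 * 0.373 = 13.8 ng in the notes).
    target_ng = round(elution_ul * min(conc.values()), 1)
    return target_ng, {key: target_ng / c for key, c in conc.items()}


if __name__ == "__main__":
    target, vols = pool_volumes(CONC)
    print(f"target: {target} ng per sample")
    for (pop, sample), vol in vols.items():
        print(f"{pop:<11}{sample:<8}{vol:7.2f} uL")
```

Rounding the target down to 13.8 ng keeps the most dilute sample (Fidalgo BS_1_5) just under the 37 uL available.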

GBS Population Structure

Made new annealed adaptors for GBS, as the current stock was over 2 years old. Followed the Buckler lab protocol (buckler_lab_genotyping_by_sequencing_protocol_20110808).

Wednesday 9/23/15

  • Did more extractions of broodstock samples. I think once this test library is done, the broodstock libraries should be the first priority, with samples from the stressor experiment second and other larvae samples third.
    • Used the EZNA kit with a 4-hour digestion. All the tissue for HC1_16 was used up.
  1. HC1_11
  2. HC1_12
  3. HC1_13
  4. HC1_14
  5. HC1_15
  6. HC1_16
  7. HC1_17
  8. HC1_18
  9. HC1_19
  10. HC2_1
  11. HC2_2
  12. HC2_3
  13. HC2_4
  14. HC2_5
  15. HC2_6
  16. HC2_7
  17. HC2_8
  18. HC2_9
  19. HC2_10
  20. HC2_11
  21. HC2_12
  22. HC2_13
  23. HC2_14
  24. HC2_15
  • Finished the ethanol precipitation started on Tuesday 9/22/15. Left samples in 10 uL of water at 4degC overnight.
  • Ran Qubit on samples for the phylogeographic study and made a list of samples for the GBS library.
      1. BC1_12: 25.2
      2. BC1_8: 6.44 (8)
      3. BC2_7: 10.6
      4. BC2_13: 11.5
      5. BC2_9: 13 (8)
      6. BC3_13: 108
      7. BC3_12: 15 (8)
      8. BC4_2: 8.84
      9. BC4_17: 11.4 (8)
      10. CA1_18: 24.4
      11. CA1_1: 36
      12. CA1_2: 17.4 (8)
      13. CA2_9: 43.8
      14. CA2_12: 18.8 (5)
      15. CA3_6: 62.3 (5)
      16. CA4_1: 5.1
      17. CA4_7: 19.8 (8)
      18. CA5_10: 48.8 (8)
      19. CA6_8: 5.14
      20. CA6_15: 25.2 (7)
      21. CA7_11: 7.4
      22. CA7_16: 19.1
      23. CA7_15: 14.9 (8)
      24. OR1_1: 7.66
      25. OR1_5: 29.2 (7)
      26. OR2_6: 12.9
      27. OR2_12: 15.8
      28. OR2_15: 18.2 (7)
      29. OR3_15: 9.24
      30. OR3_20: 13.7 (7)
      31. WA10_16: 25.4
      32. WA10_13: 9.52 (8)
      33. WA11_10: 18.7
      34. WA11_22: 17.4
      35. WA11_4: 11.7
      36. WA11_17: 4.78 (8)
      37. WA12_15: 24.4
      38. WA12_11: 4.18 (8)
      39. WA13_12: 7.2
      40. WA13_5: 4.4
      41. WA13_15: 45.1 (9)
      42. WA9_2: 11.4 (7)
      43. WA1_16: 62 (8)
      44. Conch_1: 12.4
      45. conch_2: 12.3
      46. conch_4: 9.65
      47. CA5_12: 21.8 (repeat with 2)
      48. CA5_10 repeat within

Friday 9/11/15

I thought I’d be switching my online notebook over to the Open Notebook Science network, but when I tried to write this post I found that I could not add a new category. I will try to figure that out, but until then I will keep posting here.

Took my samples and reagents from the freezer in my lab on campus to the Pritzker DNA Lab at the Field Museum, where I do the bulk of my molecular work. Spent the day alternating between writing my Doctoral Dissertation Improvement Grant (DDIG) and some lab work.

Common Garden Experiment

  • Extracted DNA from 14 broodstock samples using the EZNA Mollusc kit (protocol here). Left to digest for 2.5 hours.
    1. SS2_6
    2. SS2_7
    3. SS2_8
    4. HC1_6
    5. HC1_7
    6. HC1_8
    7. HC1_9
    8. HC1_10
    9. NF1_3
    10. NF1_4
    11. NF1_5
    12. NF1_6
    13. NF1_7
    14. NF1_8
  • Ran a gel of the DNA extractions of larvae samples done on 8/11/15 at Manchester.
    • #   Population   Tank           Family  Size   Date        Storage      Est. #   Date extracted
      1   Hood Canal                  LC      >100   7/13/2015   75/95 EtOH            8/11/2015
      2   Fidalgo Bay  NA             LC      100    7/13/2015   75% EtOH              8/11/2015
      3   South Sound  NA             LC      100    7/13/2015   RNALater              8/11/2015
      4   Hood Canal   NA             HC2     100    7/17/2015   RNALater              8/11/2015
      5   Hood Canal   NA             LC      100    7/17/2015   RNALater              8/11/2015
      6   South Sound  NA             LC      100    7/17/2015   RNALater              8/11/2015
      7   Hood Canal   NA             LC      100    7/20/2015   RNALater              8/11/2015
      8   Fidalgo Bay  NA             LC      100    7/20/2015   RNALater              8/11/2015
      9   South Sound  NA             LC      100    7/20/2015   RNALater              8/11/2015
      10  Hood Canal   HC_Tank1_160   NA      160    7/20/2015   RNALater              8/11/2015
      11  Fidalgo Bay  NF_Tank1_new   NA      160    7/20/2015   RNALater              8/11/2015
      12  South Sound  SS_Tank1_new   NA      160    7/15/2015   RNALater              8/11/2015
      13  Hood Canal   HC_Tank1_new   NA      160    7/24/2015   RNALater              8/11/2015
      14  Fidalgo Bay  NF_Tank1_new   NA      160    7/24/2015   RNALater              8/11/2015
      15  South Sound  SS_Tank1_new   NA      160    7/24/2015   RNALater              8/11/2015
      16  Hood Canal   HC_Tank1_new   NA      160    7/27/2015   RNALater              8/11/2015
      17  Fidalgo Bay  NF_Tank1_new   NA      160    7/27/2015   RNALater              8/11/2015
      18  South Sound  SS_Tank1_new   NA      160    7/27/2015   RNALater              8/11/2015
      19  Hood Canal   HC_Tank2_160           224    8/3/2015    RNALater              8/11/2015
      20  Fidalgo Bay  NF_Tank2_160           224    8/3/2015    RNALater              8/11/2015
      21  South Sound  SS_Tank2_160           224    8/3/2015    RNALater              8/11/2015
      22  Hood Canal   HC_Tank2_160           >224   8/7/2015    RNALater     350      8/11/2015
      23  Fidalgo Bay  NF_Tank2_160           >224   8/7/2015    RNALater     504      8/11/2015
      24  Oyster Bay   SS_Tank2_160           >224   8/7/2015    RNALater     641      8/11/2015
    • The computer hooked up to the UV camera wouldn’t recognize a USB drive, so for now I only have a phone photo of a printout of the gel image. Lanes 12 and 13 are cut off, but they did not show up on the gel. For gels of DNA extracts, I give a ranking from 1 (did not work at all) to 5 (bright band of high-molecular-weight DNA). These rankings are recorded on the Sample Master Sheet.
      • gel_9_11_15
        1. 3.5
        2. 4
        3. 4
        4. 1 (rerun)
        5. 4
        6. 4.5
        7. 3.5
        8. 3 (low)
        9. 5
        10. 4
        11. 1 (rerun)
        12. 1 (rerun)
        13. 1 (rerun)
        14. 1 (rerun)
        15. 5
        16. 3 (degrad)
        17. 2.5 (degrad)
        18. 2 (degrad)
        19. 5
        20. 1 (rerun)
        21. 4.5
        22. 3 (degrad)
        23. 2.5 (low)
        24. 5
  • Playing around with data!
    • Looked at larval release from each population over time. I did this partly out of burning curiosity and partly because it would be nice to have some preliminary data for my DDIG proposal.
    • First, I had to edit the Google Sheet to include zeros for days when a population released no larvae and to have total counts across families. Then in R:
    • > library(ggplot2)
      > larvae <- read.csv("Larval counts - Day 1 (1).csv", header = TRUE)
      > names(larvae)

       [1] "Date" "Population" "Family" "Tank.added.to" "Volume.of.tripour..mL." "Vol.of.drop.counts" "Ethanol.used."
       [8] "Live.Count.1" "Live.Count.2" "Live.Count.3" "Live.Count.4" "Total.Live.Larvae" "X…" "Notes"
      [15] "Dead.count..1" "Dead..2" "Dead..3" "Dead..4" "Total.Dead" "Total.Larvae" "Total.by.date"

      > # Assumed step (pop_Total_Date is not defined in the original notes):
      > # keep date, population and daily total; na.omit then drops rows with no total
      > pop_Total_Date <- larvae[, c("Date", "Population", "Total.by.date")]
      > pop_Total_Date_na <- na.omit(pop_Total_Date)
      > ggplot(data = pop_Total_Date_na, aes(x = Date, y = Total.by.date,
      +        group = Population, colour = Population)) + geom_line() + geom_point()

Day1Larvae_date

  • Interestingly, the South Sound population produced more larvae earlier. This mirrors the reciprocal transplant experiment, where SS oysters reached their maximum percentage of brooding females sooner at two of the four sites.
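The spreadsheet edit described above (summing counts across families and adding zeros for days when a population released no larvae) could also be done in code before plotting. A stdlib Python sketch, assuming a simplified, hypothetical input of (date, population, count) rows; the toy rows in the example are made up and are not real counts. Dates are ISO strings so that sorting them as text also sorts them in time.

```python
from collections import defaultdict


def totals_by_date(rows):
    """Sum counts over families for each (date, population) pair and fill
    in 0 for every date on which a population released no larvae.

    rows: iterable of (date, population, count); dates must sort in
    chronological order (e.g. ISO "YYYY-MM-DD" strings).
    """
    rows = list(rows)
    totals = defaultdict(int)
    for date, population, count in rows:
        totals[(date, population)] += count
    dates = sorted({date for date, _, _ in rows})
    populations = sorted({pop for _, pop, _ in rows})
    # Every date x population combination gets a value; missing ones are 0.
    return {(d, p): totals.get((d, p), 0) for d in dates for p in populations}


if __name__ == "__main__":
    # Toy rows for illustration only (not real larval counts).
    toy = [
        ("2015-07-13", "Hood Canal", 10),
        ("2015-07-13", "Hood Canal", 5),
        ("2015-07-17", "South Sound", 3),
    ]
    for (date, pop), total in totals_by_date(toy).items():
        print(date, pop, total)
```

Filling explicit zeros matters for the line plot: without them, a population's line would jump straight between release days instead of dropping to zero.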

Oly Population Structure

  • In addition to making libraries for samples from the common garden, I need to start getting DNA ready for one more sequencing run for my project looking at rangewide population structure in Olympia oysters. Did 10 extractions concurrently with the common garden extractions.
  1. WA1_16
  2. WA1_14
  3. BC1_19
  4. BC1_18
  5. CA6_16
  6. CA6_17
  7. CA6_18
  8. CA7_13
  9. OR3_7
  10. OR3_20