New parents get a lot of advice about sleep. What they don't get is a baseline. Is my kid sleeping a normal amount? Is the longest stretch actually getting longer, or do I just feel like it does at 3am? Are we drifting toward an earlier bedtime or am I imagining the trend?
I had one advantage: we'd been logging every nap and night in Huckleberry since the first weeks. That export is a CSV, and a CSV is a dataset. So I did what any sleep-deprived engineer does and opened a notebook.
This post is the write-up. The full analysis — code, charts, and all — is embedded at the bottom and lives at /baby_sleep_analysis.html.
The data
The Huckleberry export came out as 2,776 rows across 8 columns:
['Type', 'Start', 'End', 'Duration', 'Start Condition',
'Start Location', 'End Condition', 'Notes']
Type mixes sleep with feedings, diapers, and everything else you tap a button for at 4am. Filtering to just sleep dropped it to 667 sessions spanning August 2024 through January 2025 — roughly six months.
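import pandas as pd
df = pd.read_csv(path_to_data)  # path_to_data: location of the Huckleberry CSV export
df = df[df['Type'] == 'Sleep'].copy()  # keep sleep sessions only
df['Sleep Start'] = pd.to_datetime(df['Start'])
df['Sleep End'] = pd.to_datetime(df['End'])
# Session length in hours
df['Sleep Duration'] = (df['Sleep End'] - df['Sleep Start']).dt.total_seconds() / 3600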
Two derived columns did most of the analytical heavy lifting later: Is Night Sleep (a flag for evening/overnight sessions) and an age axis computed from date of birth, so I could plot against Days Old / Weeks Old instead of calendar dates. Plotting a baby against its own age rather than the wall calendar is the whole trick — it's what makes the trends legible.
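Here is a minimal sketch of how those two derived columns could be built. The date of birth and the night-time cutoff hours are placeholders I made up for illustration; the real values aren't stated above, and the function name add_derived_columns is hypothetical too.

```python
import pandas as pd

# Hypothetical date of birth; the real one is not given in the post.
DATE_OF_BIRTH = pd.Timestamp("2024-07-01")

# Assumed definition of "night": a session starting at or after 18:00
# or before 06:00. The actual cutoff used in the notebook is not stated.
NIGHT_START_HOUR = 18
NIGHT_END_HOUR = 6


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'Is Night Sleep', 'Days Old' and 'Weeks Old' added."""
    out = df.copy()
    start_hour = out["Sleep Start"].dt.hour
    out["Is Night Sleep"] = (start_hour >= NIGHT_START_HOUR) | (start_hour < NIGHT_END_HOUR)
    # Age in whole days, measured from midnight of the session's start date.
    out["Days Old"] = (out["Sleep Start"].dt.normalize() - DATE_OF_BIRTH).dt.days
    out["Weeks Old"] = out["Days Old"] // 7
    return out


# Three made-up sessions: an afternoon nap, an evening bedtime, a 2am stretch.
sessions = pd.DataFrame({
    "Sleep Start": pd.to_datetime(
        ["2024-10-01 13:00", "2024-10-01 20:30", "2024-10-09 02:15"]
    ),
})
sessions = add_derived_columns(sessions)
print(sessions)
```

Computing the flag and the age columns once, up front, means every later groupby can refer to them by name instead of repeating the datetime arithmetic.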
Tooling was the boring, correct stack: pandas and numpy for wrangling, seaborn/matplotlib for charts, statsmodels and scipy for the regressions.
Question 1: Is the longest stretch really getting longer?
This is the question every exhausted parent actually cares about. I grouped by month and pulled the distribution of sleep-session durations:
Monthly Sleep Statistics (hours):
Month    count  mean   std   min    max
2024-08     15  1.71  0.52  0.35   2.42
2024-09     23  1.96  1.35  0.25   4.97
2024-10    200  2.07  1.45  0.18   8.13
2024-11    182  2.40  2.17  0.15  10.50
2024-12    198  2.19  2.12  0.08  10.85
2025-01     49  1.88  1.71  0.28   6.23
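A table like the one above can be produced with a single groupby. This is a sketch on a tiny made-up sample, not the real export; the column names match the ones defined earlier, and everything else is illustrative.

```python
import pandas as pd

# Tiny made-up sample standing in for the real export; values are illustrative only.
df = pd.DataFrame({
    "Sleep Start": pd.to_datetime([
        "2024-08-10 13:00", "2024-08-11 21:00",
        "2024-09-02 09:00", "2024-09-02 20:00", "2024-09-03 14:00",
    ]),
    "Sleep Duration": [1.5, 2.0, 0.5, 4.0, 1.5],  # hours
})

# Bucket each session by calendar month, then summarize durations per bucket.
month = df["Sleep Start"].dt.to_period("M")
monthly_stats = (
    df.groupby(month)["Sleep Duration"]
      .agg(["count", "mean", "std", "min", "max"])
      .round(2)
)

print("Monthly Sleep Statistics (hours):")
print(monthly_stats)
```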
Look at the max column, not the mean. The longest single stretch climbs from 2.4 hours in August to 10.85 hours by December — a newborn who couldn't go two hours turning into a baby who occasionally sleeps nearly eleven. A linear regression on longest-stretch over time confirmed the trend was real and not just a couple of lucky nights.
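One way to run that kind of check is sketched below with scipy.stats.linregress. Which library call the notebook used for this regression isn't stated here, so treat the function choice, the column names, and the weekly numbers as assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# Made-up weekly data: the longest single session (hours) logged in each week of age.
# With the real frame this could come from something like
# df.groupby("Weeks Old")["Sleep Duration"].max().
longest = pd.DataFrame({
    "Weeks Old": [6, 8, 10, 12, 14, 16, 18, 20],
    "Longest Stretch": [2.4, 3.1, 4.9, 5.2, 7.8, 8.1, 9.6, 10.5],
})

# Ordinary least-squares fit of longest stretch against age.
fit = stats.linregress(longest["Weeks Old"], longest["Longest Stretch"])

print(f"slope: {fit.slope:.2f} hours per week")
print(f"r^2: {fit.rvalue ** 2:.2f}")
print(f"p-value for slope = 0: {fit.pvalue:.4f}")
```

A positive slope with a small p-value is what "real and not just a couple of lucky nights" looks like numerically: the upward drift is unlikely under the assumption of no trend.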
The mean, meanwhile, barely moves (1.7 → 2.4 → back to 1.9). That's the trap: averaging a night of 10 hours together with five 20-minute catnaps flattens exactly the signal you care about. The story was in the tail of the distribution, not its center — a lesson that generalizes well beyond babies.
Question 2: When does bedtime happen, and is it getting more consistent?
I filtered to evening sessions (onset after 6pm) and tracked the average start time, week over week:
Weekly Sleep Onset (hour of day, 24h):
Weeks Old mean onset
14 21.83 (~9:50pm)
17 22.57 (~10:34pm)
20 20.16 (~8:10pm)
21 20.45 (~8:27pm)
Bedtime drifts from nearly 10pm down toward 8pm as the weeks pass — earlier and more civilized. But the more satisfying metric was consistency. I measured the standard deviation of onset time, in minutes:
Weekly Onset Variability (minutes):
Weeks Old std
15 52.81
20 80.90
24 21.74
Outside a noisy patch around week 20, the spread tightens from ~53 minutes down to ~22. The baby wasn't just going to bed earlier — bedtime was becoming predictable. If you've ever clung to the promise of a routine, there it is in the data.
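Both onset tables come from the same small computation, sketched here on made-up evening sessions. The sample values and the to_clock helper are hypothetical, and Weeks Old is assumed to be precomputed from date of birth.

```python
import pandas as pd

# Made-up evening sessions; 'Weeks Old' is assumed to be precomputed.
evening = pd.DataFrame({
    "Sleep Start": pd.to_datetime([
        "2024-10-07 21:30", "2024-10-08 22:15", "2024-10-09 21:45",
        "2024-11-18 20:00", "2024-11-19 20:20", "2024-11-20 20:10",
    ]),
    "Weeks Old": [14, 14, 14, 20, 20, 20],
})

# Keep sessions that start at or after 18:00 (the "after 6pm" filter).
# Bedtimes that slip past midnight would be dropped by this simple filter.
evening = evening[evening["Sleep Start"].dt.hour >= 18].copy()

# Express onset as a decimal hour of the day, e.g. 21:30 -> 21.5.
evening["Onset Hour"] = (
    evening["Sleep Start"].dt.hour + evening["Sleep Start"].dt.minute / 60
)

weekly = evening.groupby("Weeks Old")["Onset Hour"].agg(["mean", "std"])
weekly["std_minutes"] = weekly["std"] * 60  # hours -> minutes


def to_clock(decimal_hour: float) -> str:
    """Format a decimal hour such as 21.83 as 'HH:MM' on a 24-hour clock."""
    total_minutes = int(round(float(decimal_hour) * 60))
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


weekly["mean_clock"] = weekly["mean"].map(to_clock)
print(weekly[["mean", "mean_clock", "std_minutes"]].round(2))
```

Working in decimal hours keeps the mean and standard deviation simple; the conversion back to clock time happens only for display.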
Question 3: Are we normal?
I pulled the AAP / National Sleep Foundation guidelines into a small reference frame and compared month by month:
Month 2: Total Sleep: 4.8h (Guideline 14-17h) - BELOW RANGE
Month 3: Total Sleep: 12.0h (Guideline 14-17h) - BELOW RANGE
Month 4: Total Sleep: 14.0h (Guideline 14-17h) - WITHIN RANGE
Month 5: Total Sleep: 14.8h (Guideline 14-17h) - WITHIN RANGE
At face value, months 2 and 3 look alarming: well below the recommended range. This is where being honest about your own data matters more than how the chart looks. August had only 15 logged sessions the entire month. We weren't logging a sleepless infant; we were a brand-new family that hadn't built the habit of tapping the button yet. "4.8 hours of sleep in month 2" isn't a finding about the baby; it's a finding about us and the dataset.
By months 4 and 5, once logging was consistent, totals land squarely in range (14–15 hours/day) and nap counts converge toward the typical 3–4. Overall, across the whole window: 3.79 ± 1.15 naps a day, 4.83 hours of naps, and 8.33 hours of night sleep.
The biggest analytical lesson of the whole project lives in that "BELOW RANGE" line: missing data and a real signal can look identical on a chart. Distinguishing the two is the actual job. A plot that doesn't account for collection coverage will confidently tell you something false.
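One way to make a comparison coverage-aware is to check how much was logged before checking the guideline. In this sketch the monthly totals echo the figures quoted above, but the session counts per month of age and the 100-session threshold are placeholders I chose for illustration, not values from the notebook.

```python
import pandas as pd

# Guideline range quoted in the text (hours of total sleep per day).
GUIDELINE_LOW, GUIDELINE_HIGH = 14.0, 17.0

# Hypothetical coverage threshold: months with fewer logged sessions than this
# are treated as under-logged rather than as evidence about the baby.
MIN_SESSIONS_PER_MONTH = 100

# Totals echo the text; 'Sessions Logged' values are illustrative placeholders.
monthly = pd.DataFrame({
    "Month": [2, 3, 4, 5],
    "Total Sleep": [4.8, 12.0, 14.0, 14.8],
    "Sessions Logged": [15, 23, 200, 182],
})


def classify(row: pd.Series) -> str:
    """Label a month, checking logging coverage before the guideline range."""
    if row["Sessions Logged"] < MIN_SESSIONS_PER_MONTH:
        return "INSUFFICIENT DATA"
    if row["Total Sleep"] < GUIDELINE_LOW:
        return "BELOW RANGE"
    if row["Total Sleep"] > GUIDELINE_HIGH:
        return "ABOVE RANGE"
    return "WITHIN RANGE"


monthly["Status"] = monthly.apply(classify, axis=1)
print(monthly)
```

The ordering of the checks is the design choice: a month can only be called below range once it has cleared the coverage bar.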
The pandas patterns that did the work
Strip away the domain and this was three idioms, over and over:
- groupby + unstack to pivot long event logs into day-by-category matrices (day sleep vs. night sleep per date).
- Datetime-derived features (.dt.date, .dt.hour, age-since-DOB), so I could regress and group against meaningful axes.
- A boolean classification column (Is Night Sleep) computed once and reused everywhere, which kept every downstream groupby a one-liner.
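# Sum sleep hours per calendar date, split by the Is Night Sleep flag
daily_sleep = (df.groupby([df['Sleep Start'].dt.date, 'Is Night Sleep'])
               ['Sleep Duration'].sum().unstack())
# Rename by key rather than by position, so column order can't mislabel a category
daily_sleep = daily_sleep.rename(columns={False: 'Day Sleep', True: 'Night Sleep'})
# A date with no logged day sleep or no logged night sleep gets NaN here
daily_sleep['Total Sleep'] = daily_sleep['Day Sleep'] + daily_sleep['Night Sleep']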
What the data couldn't tell me
Plenty. Quality isn't duration — ten logged hours with three wakeups isn't ten hours of rest, and the export doesn't capture that. Week-over-week change averaged a meaningless 0.01 hours, with a +1.65h swing one week and a −0.94h drop the next; sleep is noisy and any single week is mostly weather. And n=1 is n=1 — this is one kid, not a study.
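The week-over-week figure is a plain first difference. Here is a sketch on made-up weekly averages (the numbers are illustrative, not the real series):

```python
import pandas as pd

# Made-up weekly averages of total daily sleep (hours), indexed by weeks of age.
weekly_total = pd.Series(
    [12.0, 13.65, 12.71, 13.4, 12.9, 13.1],
    index=pd.Index([14, 15, 16, 17, 18, 19], name="Weeks Old"),
)

# Difference between each week and the one before; the first week has no predecessor.
change = weekly_total.diff().dropna()

print(change.round(2))
print(
    f"mean change: {change.mean():.2f} h, "
    f"largest gain: {change.max():+.2f} h, "
    f"largest drop: {change.min():+.2f} h"
)
```

The mean of the differences telescopes to (last week minus first week) divided by the number of steps, which is why it can sit near zero while individual weeks swing by an hour or more.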
But that was never the point. The point was to replace anxiety with a baseline, and a baseline is exactly what a CSV and an afternoon of pandas can buy you. The stretches really were getting longer. Bedtime really was settling. Some nights you just have to trust the regression line over how you feel at 3am.
The full analysis
The complete notebook — every chart and every cell — is embedded below:
— Parker Jones, parkerjones.dev