Job Description

Lead - Server Engineer

21-12-2025 17:06:54

Job_303409

5 - 10 years

  • Chennai, Tamil Nadu, India (CHN)

A summary of the key tasks and activities across critical areas of server management, per our understanding, is provided below:

Server Landscape        Windows (physical & virtual)
Physical Server         Perform all physical device tasks with the exception of the data centre itself
Design                  Design hardware configuration to meet customer needs
Monitoring              Track utilization metrics (e.g., CPU, I/O, Memory) and alert for anomalies
Reporting               Aggregate metrics on regular schedule
Audit                   Verify metrics through alternative means (e.g., physical inspection)
Install                 Physically install servers
Move / Add / Change     Move, add, or upgrade servers
Remote Hands            Provide simple on-site services (reboot, vendor escort, etc.)
Server OS               Manage the server operating system
Capacity                Monitor capacity and deploy new instances / servers as needed
Image                   Deploy a pre-built server image from master and keep master updated
Change                  Execute and track all changes (user-requested or required) in change management system




Copyright©2025 Neurealm Private Limited. All rights reserved.


Preventive Maintenance  Perform regular maintenance based on supplier schedule
Patch                   Ensure the OS is patched and up to date; perform vulnerability and patch management to remediate infrastructure findings, using the customer-supplied Tenable
Upgrade                 Perform major upgrades as required by client
Break/fix               Respond to, track, escalate, and resolve OS problems as needed
Secure                  Manage access, apply security patches, manage antivirus, run regular security audits, penetration tests and intrusion detection
Hardening               Plan & implement company policy for hardening
Monitor                 Track OS-level metrics such as number of concurrent users
Report                  Summarize metrics on a regular basis
Cluster                 Manage the software that load-balances resource-intensive tasks, detects device failures, and manages fail-over to another device
Virtualization          Manage the software that hosts multiple server images on one or more devices


Key Responsibilities

• A Windows Level 3 (L3) Administrator is the subject matter expert (SME) responsible for the strategic design, advanced security, and deep-dive problem resolution of the entire Windows Server infrastructure.

• Focus on complex, non-routine tasks that require architectural knowledge and high-level scripting proficiency.

• Deep-Dive Troubleshooting & Root Cause Analysis (RCA)

• L3 is the final escalation point, handling the most complex, systemic, and persistent issues.

• Systemic Performance Analysis:

• Utilizing Windows Performance Monitor (Perfmon), Resource Monitor, and advanced tools like Windows Performance Analyzer (WPA) and Process Monitor/Explorer to diagnose high CPU utilization, memory leaks, and I/O bottlenecks that impact application performance across the environment.




• Analyzing crash dumps and hangs using tools like the Windows Debugger (WinDbg) to identify root-cause code-level or driver issues.

• Active Directory (AD) Recovery and Advanced Health:

• Troubleshooting and resolving complex, non-replicating AD environments, including issues like Lingering Objects, USN Rollbacks, and Intersite Replication failures.

• Performing Authoritative and Non-Authoritative Restores of AD, or using the Active Directory Recycle Bin for critical object recovery.

• Deep-diving into Kerberos authentication failures, complex DNS Service Location (SRV) record issues, and FSMO role holder stability.

• Networking & Protocol Diagnostics:

• Using Wireshark or Microsoft Network Monitor (NetMon) for deep-packet inspection to diagnose complex network issues like fragmentation, MTU mismatches, or unexpected protocol behavior impacting server communications.

• Resolving advanced issues with DFS (Distributed File System) Replication stability and integrity.

• Architecture, Design, & Infrastructure Projects

• L3 administrators are involved in planning and implementing major changes to the server environment.

• Active Directory Design:

• Planning and executing Forest/Domain consolidations or splitting existing forests, including complex trust relationship architectures.

• Designing the optimal Group Policy Object (GPO) structure and inheritance for large, complex organizations to ensure stability and security compliance.

• High Availability (HA) & Clustering:

• Designing, configuring, and troubleshooting advanced features in Windows Failover Clustering (WSFC), including Cluster Shared Volumes (CSV) and Quorum configuration.

• Implementing and managing database high availability solutions like SQL Server Always On Availability Groups.

• Hybrid & Cloud Integration:

• Planning and deploying tools like Azure AD Connect (now Microsoft Entra Connect) for synchronization optimization and complex attribute filtering.

• Architecting solutions using Azure Arc to bring on-premises servers under centralized cloud management and governance.

• Virtualization Management:

• Designing and optimizing large Hyper-V or VMware virtual server platforms, focusing on resource allocation, host clustering, and virtual machine performance tuning.

• Security, Governance, & Compliance




• This involves implementing and maintaining the highest security posture across the Windows environment.

• Security Hardening:

• Implementing and enforcing CIS Benchmarks or DISA STIGs (Security Technical Implementation Guides) for Windows Server.

• Designing JEA (Just Enough Administration) and JIT (Just-in-Time) access for administrative accounts using tools like Privileged Access Management (PAM) solutions.

• Advanced GPO Management:

• Deploying and managing advanced security settings via GPO, such as AppLocker or Windows Defender Application Control (WDAC) to restrict unauthorized executables.

• Developing and maintaining detailed Security Auditing Policies for logon/logoff events, object acc