IT Operations Engineer (Application Support) - AI & Data Technology
Point72
- Company: Point72
- Salary: Not mentioned
- Experience: 3–6 years
- Qualification: Bachelor's degree
Overview AI Summary
This role at Point72 suits an IT Operations Engineer who wants to support cutting-edge AI and data technologies. You will maintain the operational health, performance, and reliability of critical production platforms that underpin the firm's investing strategies. The work combines proactive monitoring, advanced troubleshooting, incident management, and automation, and it directly affects the availability of the firm's AI tools and data infrastructure. The ideal candidate has a strong background in application support or production engineering, with hands-on Python and SQL skills for debugging and automation. Working knowledge of cloud operations (AWS and Azure preferred) and experience with AI/ML platform operations, such as model serving and inference pipelines, are also required. The position calls for strong analytical and problem-solving skills and a proactive approach to improving system resilience and supportability.

Joining Point72 means becoming part of a forward-thinking technology team that values innovation and professional growth. You will collaborate closely with the AI and Data engineering teams, contributing to a culture that embraces open-source solutions and agile methodologies. The role offers a chance to deepen your expertise in a specialized domain, make a significant operational impact, and grow within a leading global investment firm committed to its people's long-term success.
Job Description
Overview
Point72 is actively enhancing its IT infrastructure to support the evolving landscape of investing. The firm’s Technology group focuses on leveraging open-source solutions and adopting enterprise agile methodologies. This role within the AI & Data Technology team is critical for ensuring the operational stability, performance, and reliability of compliance-approved AI and data platforms. The team fosters professional growth and encourages innovative contributions.
Key Responsibilities
- Manage the daily operational health, availability, and performance of compliance-approved production AI and Data platforms.
- Monitor AI and Data services, including model inference layers, APIs, and data dependencies, utilizing logs, metrics, dashboards, and alerts.
- Provide specialized production-focused user support for AI tools and data platforms, prioritizing efficient issue resolution.
- Lead incident triage, coordination, and resolution efforts for platform outages or service degradations, working collaboratively with development and infrastructure teams.
- Perform deep technical troubleshooting across various layers including applications, data, and underlying systems.
- Enhance observability, alerting mechanisms, and operational runbooks to reduce the mean time to detect (MTTD) and mean time to resolve (MTTR) incidents.
- Conduct thorough post-incident root cause analysis and drive the implementation of corrective and preventive improvements.
- Support production deployments, configuration changes, and platform upgrades, with a strong emphasis on risk mitigation and stability.
- Automate repetitive operational tasks and support workflows using Python and other scripting tools.
- Collaborate closely with AI and Data engineering teams to improve platform resilience, scalability, and overall supportability.
Required Skills
- Bachelor’s degree in Computer Science, Engineering, Mathematics, Physics, or a related technical discipline.
- 3-6 years of experience in roles such as application support, production engineering, Site Reliability Engineering (SRE), or platform operations.
- Strong proficiency in Python for debugging, automation, and developing operational tooling.
- Proficiency in SQL for data validation, issue investigation, and platform troubleshooting.
- Working knowledge of cloud operations, with a preference for experience in AWS and Azure.
- Familiarity with Windows environments, .NET applications, SQL Server, and Databricks.
- A good understanding of production systems, APIs, and distributed services.
- Experience supporting or operating AI/ML platforms, including knowledge of model serving, inference pipelines, and dependency management.
- Excellent analytical, troubleshooting, and incident management skills.
- Demonstrated commitment to the highest ethical standards.
Preferred Skills
- Prior experience supporting Reference/Alternative Data applications.
Benefits
Point72 is dedicated to investing in its employees' careers, health, and well-being, offering a range of benefits including:
- Comprehensive health care benefits
- Maternity, adoption, and related leave policies
- Generous paternity and family care leave policies
- Employee Assistance Program and mental wellness programs
- Transportation support
- Tuition assistance
Additional Information
Point72 is a leading global alternative investment firm led by Steven A. Cohen. Drawing on over 30 years of investing experience, the firm aims to deliver superior returns through both fundamental and systematic investing strategies across various asset classes and geographies. Point72 is committed to attracting and retaining top talent by cultivating an investor-led culture and supporting the long-term growth of its people. For more information, please visit point72.com.