We’re hiring a Research Engineer strongly committed to the principles of free knowledge, open source and open data, transparency, privacy, and collaboration to join the Research team. As a Research Engineer on our team, you will support the research scientists in addressing knowledge gaps on the Wikimedia projects, supporting the Wikimedia volunteers in improving knowledge integrity, and building a more global community of Wikimedia researchers. We’re accepting applications until the 31st of August, with a start date on or before October 30th.
You’ll work remotely with a distributed team, with members spread between Europe and North America. Here are some things we’ve worked on recently that might give you a better sense of what you could be working on:
We built a hyperlink recommendation algorithm (by building on past research) to support the Growth team in their newcomer task recommendations.
We used readers’ trajectories on Wikipedia to inform Wikipedia editors about the COVID-19-related pages that readers turn to for information. (code)
We worked with the Analytics, Legal, and Security teams to find a privacy-respectful way to store COVID-19 related page-view traces beyond the 90-day limit that is our standard for purging this data. (code)
We ran surveys on Wikipedia across 14 languages, collecting data on readers’ demographics, motivations, and needs to study the effect of demographics on reader behavior. (ongoing results)
We built an NLP model to identify Unsourced Statements in Wikipedia articles. (paper, code)
You can learn more about what we have done in the past six months by reading our biannual report.
You will be responsible for:
Defining engineering projects to improve the research scientists’ workflows. For example, in collaboration with the Legal, Security, and Analytics teams you will be developing a process for public data releases by the team.
Collaborating with the Analytics Engineering and Machine Learning Platform teams to improve data collection, sanitization, and processing
Building experimental APIs for the models developed by the team
Writing distributed computing code in Spark for the algorithms developed by the research scientists
Acting as the Research team’s engineering contact for internal and external conversations and decision making
Skills and experience:
Experience working as a research or data engineer on complex applied research projects
Comfortable with mathematics and the basics of statistics
Strong understanding of Computer Science fundamentals such as algorithms, data structures, and complexity
Familiarity with scientific computing libraries in Python. Experience with open source machine learning libraries such as scikit-learn and deep learning frameworks such as Keras, TensorFlow, or PyTorch
Experience with Hadoop and related technologies: HDFS, YARN, MapReduce, Hive, Spark, etc. (more info about our Hadoop cluster and analytics servers)
Experience with MySQL/Postgres technologies
Experience developing RESTful APIs for data retrieval
Strong written and oral communication skills in English, including the ability to communicate complex technical issues to a cross-team and cross-functional audience
BS, MS, or PhD in Computer Science, Mathematics, Statistics, or a closely related engineering field; or the equivalent in related work experience
We know that you won’t know how all of our systems work on day one. With solid fundamentals and teamwork, you will get there.
Qualities that are important to us:
Commitment to the mission of the organization
Commitment to our guiding principles
Ability to disagree respectfully and still work towards a solution
Willingness to understand math and algorithms
Good at async communication
Solution-focused. The Wikimedia ecosystem is complex, resources are limited, and our guiding principles are ambitious. We want you to work to find solutions embracing these factors.
Ability to navigate through ambiguity and bring a project to completion with limited directions
Curiosity and commitment to learn
Additionally, we’d love it if you have:
A portfolio of open source programming projects
Experience in label collection using crowdsourcing platforms or large-scale systems
Production-level experience with Hadoop, Spark, Flink, Hive, Kafka, etc.
Experience working with volunteers
Experience editing Wikipedia or other Wikimedia or open data / knowledge projects
The Wikimedia Foundation is...
...the nonprofit organization that hosts and operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.
As an equal opportunity employer, the Wikimedia Foundation values having a diverse workforce and continuously strives to maintain an inclusive and equitable workplace. We encourage people with a diverse range of backgrounds to apply. We do not discriminate against any person based upon their race, traits historically associated with race, religion, color, national origin, sex, pregnancy or related medical conditions, parental status, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, or any other legally protected characteristics.
If you are a qualified applicant requiring assistance or an accommodation to complete any step of the application process due to a disability, you may contact us at firstname.lastname@example.org or (415) 839-6885.
U.S. Benefits & Perks*
Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, babysitting, continuing education and much more
The 401(k) retirement plan offers matched contributions at 4% of annual salary
Flexible and generous time off - vacation, sick and volunteer days, plus 19 paid holidays - including the last week of the year.
Family friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, fully equipped lactation room.
For those emergency moments - long and short term disability, life insurance (2x salary) and an employee assistance program
Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
Telecommuting and flexible work schedules available
Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
Great colleagues - diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people
*Please note that for remote roles located outside of the U.S., we defer to our PEO to ensure alignment with local labor laws.