Answered
Hey! Starting an MLOps Director position in 2 weeks. I'm thinking about architecture. Has anyone ever tried to use ClearML as an experiment tracker, but used a different orchestrator like Metaflow, Airflow, Prefect, etc.? I'm struggling to find guides or ...

Hey! Starting an MLOps Director position in 2 weeks. I'm thinking about architecture.

Has anyone ever tried to use ClearML as an experiment tracker, but used a DIFFERENT orchestrator like Metaflow, Airflow, Prefect, etc.? I'm struggling to find guides or "hot takes" online for this.

  
  
Posted one month ago
Votes Newest

Answers 9


I have tried:
Airflow - a pain to set up, dated UI, and other problems.

Prefect - I literally just tried to set up a simple distributed system; it took me a week. I do not recommend this tool at all: horrible documentation, and no one helps on Slack.

Dagster - an absolute beauty: nice UI, easy to set up (as a pip package or just Docker + Postgres); I highly recommend this tool. It takes a bit to get used to. In the coming week I'll try the Dagster + ClearML combo, where I periodically check some things and, if some criteria are met, spawn ClearML jobs that get put into a ClearML queue and executed (rough sketch below).
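Roughly what I have in mind, as a sketch only (I haven't built it yet): a Dagster op that clones a ClearML "template" task and enqueues it for a ClearML agent, run on a schedule. The task ID, queue name, and cron expression below are placeholders, not real values.

```python
# Sketch: Dagster job that clones a ClearML template task and enqueues it.
# Task ID, queue name, and schedule are placeholders.
from clearml import Task
from dagster import OpExecutionContext, ScheduleDefinition, job, op


@op
def enqueue_clearml_job(context: OpExecutionContext) -> None:
    # Clone a previously registered "template" task...
    template = Task.get_task(task_id="<template-task-id>")  # placeholder ID
    cloned = Task.clone(source_task=template, name="scheduled-run")
    # ...and push the clone onto a queue that a ClearML agent listens to.
    Task.enqueue(cloned, queue_name="default")  # placeholder queue
    context.log.info(f"Enqueued ClearML task {cloned.id}")


@job
def periodic_clearml_trigger():
    enqueue_clearml_job()


# Periodic trigger; a Dagster sensor could be used instead if the jobs
# should react to an event rather than run on a fixed schedule.
every_hour = ScheduleDefinition(job=periodic_clearml_trigger, cron_schedule="0 * * * *")
```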

  
  
Posted one month ago

Dang! @<1590514584836378624:profile|AmiableSeaturtle81> awesome answer, thank you! You seem like an awesome person to know. Definitely connect if you'd like to talk ops stuff sometime.

  
  
Posted one month ago

@<1523701482157772800:profile|AnxiousSeal95> I see a lot of people here migrating data from one data source to another.
For us, it was that we experimented with ClearML to get a feel for it, and we used ClearML's built-in file storage to save debug images and all other artifacts.

Then we grew rapidly and had to migrate to S3 storage.
I had to write a script that goes through Elasticsearch and MongoDB to point to the new S3 links where the data was migrated.
I do understand, however, that migration in itself is not easy and there isn't a magical button to solve this issue. Still, an exposed API that could change the artifact file path prefix could be useful.
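For illustration only, a rough sketch of the prefix-rewrite idea (this is not the actual script I ran; ClearML's real MongoDB/Elasticsearch schema should be inspected first, the collection and connection details below are placeholders, and the databases should be backed up before attempting anything like this):

```python
# Sketch: rewrite file-server URL prefixes to S3 prefixes in MongoDB documents.
# All names (database, collection, prefixes) are placeholders/assumptions.
from pymongo import MongoClient

OLD_PREFIX = "https://files.example.com/"   # hypothetical old fileserver prefix
NEW_PREFIX = "s3://my-bucket/clearml/"      # hypothetical new S3 prefix

client = MongoClient("mongodb://localhost:27017")
db = client["backend"]                      # placeholder database name


def rewrite(value):
    # Recursively replace the old prefix in any string field of a document.
    if isinstance(value, str) and value.startswith(OLD_PREFIX):
        return NEW_PREFIX + value[len(OLD_PREFIX):]
    if isinstance(value, dict):
        return {k: rewrite(v) for k, v in value.items()}
    if isinstance(value, list):
        return [rewrite(v) for v in value]
    return value


for doc in db["task"].find():               # placeholder collection name
    updated = rewrite(doc)
    if updated != doc:
        db["task"].replace_one({"_id": doc["_id"]}, updated)
```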

  
  
Posted one month ago

I'm also curious about using external orchestrators as opposed to ClearML's built-in orchestration.

  
  
Posted one month ago

@<1541954607595393024:profile|BattyCrocodile47> Thanks a lot for the explanation! These inputs help us a lot in building our tools and, eventually, building users' trust in them 🙂 Let us know which orchestrator you ended up with and how it's going!

  
  
Posted one month ago

@<1590514584836378624:profile|AmiableSeaturtle81> yeah I can see what you mean. So you reuploaded everything from the ClearML file server into S3 and just changed the links?

  
  
Posted one month ago

I've also used Airflow and Dagster in prod, but not integrated them with an exp tracker.

  
  
Posted one month ago

@<1590514584836378624:profile|AmiableSeaturtle81> Cool to see the community building such things! 🙂 If this works out for you, we'll be happy if you share your process!

A question both to you and @<1541954607595393024:profile|BattyCrocodile47> , what compels you to use a different orchestrator? Anything missing from the ClearML orchestration layer?

  
  
Posted one month ago

Hey @<1523701482157772800:profile|AnxiousSeal95> ! I think ClearML's orchestrator is a great fit for ad-hoc experimentation, but not for (event-triggered) batch inference jobs that need to be relied on in production.

I'd only feel comfortable supporting pipelines that serve end users on a tool known for exactly that, e.g. Metaflow, Dagster, or Airflow, mainly because those tools emphasize good monitoring and integration with the wider data ecosystem.

  
  
Posted one month ago