Hi guys! I am new to ClearML. I was trying out this simple code and it took 4 min to run. Is this normal?

Posted one year ago
Answers 8

It shouldn't take as long as 4 minutes. We did just have a server update, so maybe there are some delays from that?

That said, it will take a little time, even if the code itself is very simple. This is a fixed cost overhead of ClearML analysing your runtime environment, packages etc.
This runs async though, so when the code itself takes more time, you won't notice this fixed cost as much. Of course, when the code just prints text, the overhead will be the majority of the runtime 🙂

Posted one year ago

How large are the datasets? To learn more, you can always try running something like line_profiler/kernprof to see exactly how long a specific Python line takes. How fast/stable is your internet?
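As a rough sketch of the line_profiler workflow suggested above: decorate the function you suspect with `@profile` and run the script through `kernprof`. The function `load_and_sum` below is a hypothetical stand-in, not ClearML code; the fallback line only exists so the script still runs under plain `python`.

```python
import builtins
import time

# `kernprof -l` injects a `profile` decorator into builtins. When the script is
# run with plain `python`, fall back to a no-op decorator instead.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def load_and_sum(n):
    """Hypothetical stand-in for the slow call you want to measure."""
    values = [i * i for i in range(n)]
    time.sleep(0.01)  # simulates waiting on I/O or a network request
    return sum(values)


if __name__ == "__main__":
    print(load_and_sum(1000))
```

Running `kernprof -l -v your_script.py` should then print a per-line table (hits, time, per hit, % time) for every decorated function.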

Posted one year ago

Thanks for the reply, but now I am trying to submit new tasks that include getting datasets, and it still takes ~5 min (excluding running my code). Any idea what could be happening or how to debug it?

Posted one year ago

Ok, the files are 20 MB or less. I will try to profile the code and get back to you with the results. I have never had issues with internet stability, so it shouldn't be that.

Posted one year ago

Ok, good to know! Thank you very much for doing this!

Posted one year ago

Here are the kernprof results from putting @profile before dataset.get:
It took 37 s to access a dataset that must be in the local cache, since I use the same dataset multiple times. The dataset is a few MB, stored in Azure data storage. Is this a normal duration? (I had to cut the report because of the message size limit.)
Total time: 37.5027 s
File: /home/francisco/miniconda3/envs/py310/lib/python3.10/site-packages/clearml/datasets/dataset.py
Function: get at line 1559

Line # Hits Time Per Hit % Time Line Contents

1685 4 11.4 2.9 0.0 if not dataset_id:
1686 4 2315410.9 578852.7 6.2 dataset_id, _ = cls.get_dataset_id(
1687 4 3.1 0.8 0.0 dataset_project=dataset_project,
1688 4 3.4 0.8 0.0 dataset_name=dataset_name,
1689 4 3.4 0.8 0.0 dataset_version=dataset_version,
1690 4 61.0 15.2 0.0 dataset_filter=dict(
1691 4 3.6 0.9 0.0 tags=dataset_tags,
1692 4 2.1 0.5 0.0 system_tags=system_tags,
1693 4 149.4 37.4 0.0 type=[str(Task.TaskTypes.data_processing)],
1694 4 4.2 1.1 0.0 status=["published"]
1695 4 3.3 0.8 0.0 if only_published
1696 4 3.7 0.9 0.0 else ["published", "completed", "closed"]
1697 4 2.2 0.6 0.0 if only_completed
1698 4 2.2 0.6 0.0 else None,
1699 ),
1700 4 2.6 0.7 0.0 shallow_search=shallow_search
1701 )
1702 4 6.8 1.7 0.0 if not dataset_id and not auto_create:
1703 raise ValueError(
1704 "Could not find Dataset {} {}".format(
1705 "id" if dataset_id else "project/name/version",
1706 dataset_id if dataset_id else (dataset_project, dataset_name, dataset_version),
1707 )
1708 )
1709 4 2.0 0.5 0.0 orig_dataset_id = dataset_id
1711 4 21.6 5.4 0.0 if alias and overridable and running_remotely():
1712 remote_task = Task.get_task(task_id=get_remote_task_id())
1713 dataset_id = remote_task.get_parameter("{}/{}".format(cls.__hyperparams_section, alias))
1715 4 3.4 0.8 0.0 if not dataset_id:
1716 if not auto_create:
1717 raise ValueError(
1718 "Could not find Dataset {} {}".format(
1719 "id" if dataset_id else "project/name/version",
1720 dataset_id if dataset_id else (dataset_project, dataset_name, dataset_version),
1721 )
1722 )
1723 instance = Dataset.create(
1724 dataset_name=dataset_name, dataset_project=dataset_project, dataset_tags=dataset_tags
1725 )
1726 return finish_dataset_get(instance, instance._id)
1727 4 12669482.4 3167370.6 33.8 instance = get_instance(dataset_id)
1728 # Now we have the requested dataset, but if we want a mutable copy instead, we create a new dataset with the
1729 # current one as its parent. So one can add files to it and finalize as a new version.
1730 4 3.5 0.9 0.0 if writable_copy:
1731 writeable_instance = Dataset.create(
1732 dataset_name=instance.name,
1733 dataset_project=instance.project,
1734 dataset_tags=instance.tags,
1735 parent_datasets=[instance.id],
1736 )
1737 return finish_dataset_get(writeable_instance, writeable_instance.id)
1739 4 22510379.1 5627594.8 60.0 return finish_dataset_get(instance, orig_dataset_id
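To make the report easier to read, here is a small sketch that adds up the three lines that dominate it. It assumes the Time column is in microseconds (the "Timer unit" header was cut from the report, but this unit is consistent with the 37.5027 s total); the numbers are copied from the table above and the shortened labels are mine.

```python
# Values copied from the "Time" column above; assumed to be microseconds.
TOTAL_SECONDS = 37.5027
HITS = 4  # each line was hit 4 times, i.e. 4 calls to get()

slow_lines = {
    "cls.get_dataset_id(...)": 2315410.9,
    "get_instance(dataset_id)": 12669482.4,
    "finish_dataset_get(...)": 22510379.1,
}


def summarize(lines, total_seconds, hits):
    """Return (label, total seconds, seconds per call, % of total) per line."""
    rows = []
    for label, micros in lines.items():
        seconds = micros / 1e6
        rows.append((label, seconds, seconds / hits, 100 * seconds / total_seconds))
    return rows


rows = summarize(slow_lines, TOTAL_SECONDS, HITS)
for label, seconds, per_call, share in rows:
    print(f"{label:26s} {seconds:6.2f} s total  {per_call:5.2f} s/call  {share:5.1f} %")

accounted = sum(seconds for _, seconds, _, _ in rows)
print(f"accounted for: {accounted:.2f} s of {TOTAL_SECONDS} s")
```

Under that assumption, these three calls account for essentially all of the 37.5 s, which is spread over 4 calls to get(), so roughly 9.4 s per call.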

Posted one year ago

Hi @PleasantCoral12, thanks for sending me the details. Out of curiosity, could it be that your codebase / environment (apart from the ClearML code, e.g. the whole git repo) is quite large? ClearML scans your repo and packages every time a task is initialized, so maybe that could be it. In the meantime, I'm asking our devs if they can see any weird lag with your account on our end 🙂

Posted one year ago

Hey @PleasantCoral12, thanks for doing the profiling! This looks pretty normal to me, although 37 seconds for a dataset.get is definitely too much. I just checked, and for me it takes 3.7 seconds. Mind you, the .get() method doesn't actually download the data, so the dataset size is irrelevant here.

But the slowdowns do seem to occur only when doing API requests. Possible next steps could be:

  • Send me your username and email address (maybe dm if you don't want it public)
  • Test the same code but on e.g. your mobile hotspot to try and exclude the internet connection as a possible issue
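For the hotspot comparison, a minimal timing sketch like the one below could help put numbers on it. The helper names `time_calls` and `describe` are made up for illustration; the idea is simply to run the same call a few times on each connection and compare the spread.

```python
import statistics
import time


def time_calls(func, repeats=5):
    """Call func() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def describe(durations):
    """Summarize a list of durations as min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Example with a dummy call; replace the lambda with the call you want to
# measure, e.g. your own Dataset.get(...) lookup (arguments are yours to fill in).
print(describe(time_calls(lambda: time.sleep(0.01), repeats=3)))
```

If the median drops sharply on the hotspot, the home or office connection is the likely culprit; if it stays the same, the delay is more likely on the server side.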
Posted one year ago