Eureka! We just inherit from logging.Handler and use that in our logging.config.dictConfig; weird thing is that it still logs most of the tasks, just not the last one?
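For context, a rough sketch of the pattern we use (the handler name and the forwarding logic here are illustrative, not our actual code):
import logging
import logging.config

class TaskLogHandler(logging.Handler):
    # Hypothetical handler; the real one forwards records to the task logger
    def emit(self, record):
        print("[forwarded]", self.format(record))

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # "()" tells dictConfig to instantiate the custom handler class
        "task": {"()": TaskLogHandler, "level": "INFO"},
    },
    "root": {"handlers": ["task"], "level": "INFO"},
})

logging.getLogger(__name__).info("hello")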
I'll try with 1.1.5 first, then 1.1.6rc0
e.g. a separate structured user guide with common tips, usability, best practices - https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html
vs the doc, where each function is its own page, e.g.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
I... did not, ashamed to admit. The documentation says only boolean values.
And last but not least, for dictionaries for example, it would be really cool if one could do:
my_config = task.connect_configuration(my_config, name=name)
my_other_config = task.connect_configuration(my_other_config, name=other_name)
my_other_config['bar'] = my_config  # Creates the link automatically between the dictionaries
I can navigate through the projects, but selecting one task in one project, then navigating to another project and selecting a different task -> there is no suggestion to compare the tasks.
In the projects page if I show all - I just see the projects. If I search for a task of similar name, I get results, but I can't compare them via the UI.
The only way I managed so far was to create a pseudo-comparison between unrelated tasks in the same project, then remove one task from the comparison, and u...
Another example - trying to validate dataset interactions ends with
else:
    self._created_task = True
    dataset_project, parent_project = self._build_hidden_project_name(dataset_project, dataset_name)
    task = Task.create(
        project_name=dataset_project, task_name=dataset_name, task_type=Task.TaskTypes.data_processing)
    if bool(Session.check_min_api_server_version(Dataset.__min_api_version)):
        get_or_create_proje...
I have seen this quite frequently as well tbh!
CostlyOstrich36 That looks promising, but I don't see any documentation on the returned schema (i.e. workers.worker_stats is not specified anywhere?)
We have an internal mono-repo and some of the packages are required - they’re all available correctly for the controller, only some are required for the individual tasks, but the “magic” doesn’t happen 😞
That is, the controller does not identify them as a requirement, so they’re not installed in the tasks environment.
It’s just that for the packages argument, ClearML says:
If not provided, packages are automatically added based on the imports used inside the wrapped function.
So… 🤔
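As a workaround we could spell the requirements out by hand; a minimal sketch, assuming the packages argument of add_function_step (the step and package names below are made up):
pipe.add_function_step(
    name="preprocess",
    function=preprocess_step,                        # hypothetical step function
    packages=["pandas==2.0.3", "our-internal-lib"],  # or a path to a requirements.txt
)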
I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
Any follow up thoughts SuccessfulKoala55 or CostlyOstrich36 ?
The deferred_init input argument to Task.init is bool by default, so checking type(deferred_init) == int makes no sense to begin with, and is altering the flow.
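To illustrate why the strict type check never matches the default:
deferred_init = False                    # the documented default
print(type(deferred_init) == int)        # False -- a bool never passes a strict int check
print(isinstance(deferred_init, int))    # True  -- bool is a subclass of int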
Last but not least - can I cancel the offline zip creation if I'm not interested in it 🤔
EDIT: I see not, guess one has to patch ZipFile
...
FWIW running clearml==1.9.1 with WebApp: 1.9.2-317 • Server: 1.9.2-317 • API: 2.23
Happens with the latest version indeed.
I can’t share our code, but the gist of it is:
pipe = PipelineController(name=..., project=..., version=...)
pipe.add_function_step(...) # Many calls
pipe.set_default_execution_queue(...)
pipe.start(queue=..., wait=True)
So the pipeline runs successfully, I can find all the different tasks, but I cannot see them in the Pipelines tab…
Thanks SuccessfulKoala55 and AgitatedDove14 ! We'll go through the hoops of setting up mongo on AWS then.
We're working to decouple the data from the helm chart; it seems like a dangerous idea to store long-term data on k8s in case of failure 😅
We're using a self-hosted account
I am indeed
nevermind! Found and answered (solution in the issue linked above)
Yes; I tried running it both outside a venv and inside one. No idea why it uses 2.7?
I’ve tracked it down further, it seems the pigar utility does not apply any smart logic there.
The case we have is the following:
- We have a monorepo, but all modules/libs share a common namespace foo; so e.g. working on module mod, we use from foo.mod import …
- This then looks for a module called foo, even though it's just a namespace (see the sketch after this list)
- In the dist-info requirement, it seems any hyphen, dot, etc. are swapped for an underscore, so our site-packages represents this as foo_m...
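A small sketch of what I mean, using the placeholder namespace foo from above:
# foo/ has no __init__.py anywhere, so it is an implicit namespace package (PEP 420);
# there is no installable distribution literally named "foo".
import foo
print(foo.__path__)                    # _NamespacePath spanning the monorepo checkouts
print(getattr(foo, "__file__", None))  # typically None/missing for a namespace package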
minio was a tiny bit of headache to configure, but I'd be happy to help if you want CrookedWalrus33 , I just went through this process yesterday and today (see a few threads up...)
If everything is managed with a git repo, does this also mean PRs will have a messy metadata file attached to them?
Thanks Alon. In the full/official documentation the clearml-data CLI is not mentioned anywhere, so perhaps it should be refreshed 😉
I think we're referring to different things here.
I won't be using the UI (and neither will my team).
But as mentioned, we've used DVC before and it adds a lot of junk metadata files to each GitHub PR (many dvc.yaml, dvc.lock and .gitignore files). We're trying to avoid that as much as possible, hence my question about GitHub pull...
Thanks AgitatedDove14 , I'll first have to prove viability with the free version :)
Is there some default Docker image you ship with ClearML that you'd recommend, or can/should we use our own? 🙂
The overall flow I currently have is e.g.
1. Start an internal task (not a ClearML Task; MLOps not initialized yet)
2. Call some pre_init function with args so I can upload the environment file via StorageManager to S3
3. Call some start_run function with the configuration dictionary loaded, so I can upload the relevant CSV files and configuration file
4. Finally initialize the MLOps (ClearML), start a task, execute remotely (rough sketch below)
I can play around with 3/4 (so e.g. upload CSVs and configuratio...
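Roughly, as a sketch (pre_init/start_run are our own helpers, and the bucket and file paths are made up):
from clearml import Task, StorageManager

def pre_init(args):
    # 2. upload the environment file before any ClearML task exists
    StorageManager.upload_file("environment.yml", "s3://my-bucket/run/environment.yml")

def start_run(config):
    # 3. upload the relevant CSV files and the configuration file
    StorageManager.upload_file(config["csv_path"], "s3://my-bucket/run/data.csv")
    StorageManager.upload_file(config["config_path"], "s3://my-bucket/run/config.yaml")

# 4. only now initialize ClearML, start the task and go remote
task = Task.init(project_name="my-project", task_name="my-run")
task.execute_remotely(queue_name="default")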