Yes, I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code which reads from BigQuery.
Are you certain you have no artifacts on GCS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashes when starting?
BTW MagnificentSeaurchin79 just making sure here:
but I don't see the loss plot in scalars
This is only with Detect API ?
You mean, is one solution better than combining, maintaining, and automating 3+ solutions (DVC/LakeFS + MLflow + Kubeflow/Airflow)?
Yes, I'd say it is. BTW, if you have Airflow running for other automations you can very easily combine that automation with ClearML and have a single Airflow automation for everything. The main difference is that now Airflow only launches logic, never actual compute/data (which are launched and scaled via ClearML).
Does that make sense?
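For illustration, a minimal sketch of what that split could look like (this assumes a recent Airflow 2.x, an existing template task, and a queue named "gpu_queue" served by a clearml-agent; all names are placeholders, not part of any standard setup):

from datetime import datetime
from airflow.decorators import dag, task
from clearml import Task

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def trigger_training():
    @task
    def enqueue_training():
        # Airflow only runs this lightweight orchestration logic; the actual
        # training compute is executed by whichever clearml-agent listens on the queue
        template = Task.get_task(project_name="my_project", task_name="train_template")
        cloned = Task.clone(source_task=template)
        Task.enqueue(cloned, queue_name="gpu_queue")
        return cloned.id

    enqueue_training()

trigger_training()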
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
Any chance you can PR a fix to the docs?
Nicely found @<1595587997728772096:profile|MuddyRobin9> !
UPD: works on 1.7.0 as well; the bug was introduced in 1.8.0
Thanks JitteryCoyote63, just to be clear, is this only in the comparison, or also on the individual Tasks?
You do not need the cudatoolkit package; it is automatically installed if the agent is using conda as the package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
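For reference, the relevant bit of clearml.conf looks roughly like this (illustrative snippet; see the linked default conf for the full set of options):

agent {
    package_manager {
        # "pip" (default), "conda" or "poetry"; with conda the agent
        # also pulls in cudatoolkit automatically when needed
        type: conda,
    }
}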
I’m not sure if https will work because I want to use ssh keys for creds.
BTW: I was not aware GitHub provides a PyPI-like artifactory, do they?
Regarding SSH keys, they are passed from the host machine (i.e. in venv mode it will use the SSH keys of the user running the agent, and in docker mode they are automatically mapped into the container).
Hmm, let me check, I think we changed the offline mode to use the latest API version (because by definition it cannot know what the server is).
Let me check if you can override it
Woot woot! 🤩
I was thinking mainly about AWS.
Meaning S3?
Let me rerun the code and check
HappyDove3 where are you running the code?
(the upload is done in the background, but it seems the python interpreter closed?!)
You can also wait for the upload:
task.upload_artifact(name="my artifact", artifact_object=np.eye(3,3), wait_on_upload=True)
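For context, a minimal end-to-end sketch (project/task names are just placeholders):

import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload")

# wait_on_upload=True blocks until the artifact is actually uploaded,
# so nothing is lost if the interpreter exits right after
task.upload_artifact(
    name="my artifact",
    artifact_object=np.eye(3, 3),
    wait_on_upload=True,
)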
Hi MelancholyElk85
So the way datasets now work is that they are actually an entity (a folder) inside a project, all under the hidden .datasets sub-project.
This is so all data and tasks live in the same project, but at the same time will not intersect with sub-projects of the same name. Does that make sense?
I still have name my_name, but the project name is my_project/.datasets/my_name rather than my_project/.datasets
Yes, this is the expected behavior
And I don't see any new projects / subprojects where that dataset creation Task is stored
They are marked "hidden", hence by default you cannot see them in the UI (so they will only appear in the Datasets page);
you can toggle the UI hidden flag by going to your settings page and selecting "Con...
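To make it concrete, using the names from the example above (the ./data path is just a placeholder):

from clearml import Dataset

# creates a dataset "my_name" under project "my_project"; its backing task
# lives in the hidden sub-project "my_project/.datasets/my_name"
ds = Dataset.create(dataset_name="my_name", dataset_project="my_project")
ds.add_files(path="./data")
ds.upload()
ds.finalize()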
Did you set an agent on a machine? (See clearml agent in docs for details)
Hi SuperiorCockroach75
You mean like turning on caching ? What do you mean by taking too long?
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched into cached folders, and ClearML takes care of cache cleanup.
That said, notice that the mutable copy (get_mutable_local_copy) goes to a target you specify; in that case you should definitely delete it after usage. Wdyt?
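Something like this (dataset names and the target folder are just example values):

import shutil
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# cached, read-only copy: clearml manages this folder and its cleanup
cached_path = ds.get_local_copy()

# mutable copy goes to a target *you* specify, so delete it yourself after use
mutable_path = ds.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")
# ... modify / consume the files ...
shutil.rmtree(mutable_path)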
I think what you are looking for is clearml-agent daemon
https://clear.ml/docs/latest/docs/clearml_agent
https://clear.ml/docs/latest/docs/getting_started/video_tutorials/agent_remote_execution_and_automation
You need to adjust it to your setup; specifically, change the queue name to one you have. Does that make sense?
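For example (replace "my_queue" with a queue that exists on your server):

# venv mode
clearml-agent daemon --queue my_queue
# or docker mode, running each task inside a container
clearml-agent daemon --queue my_queue --docker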
CooperativeFox72 could you expand on "not working"?
If you have a yaml file, I would do:
import yaml

local_path = './my_config.yaml'   # your local yaml file
path = task.connect_configuration(local_path, name=name)   # "name" is your configuration section name
if task.running_locally():
    with open(local_path, "r") as config_file:
        my_params_dict = yaml.load(config_file, Loader=yaml.FullLoader)
    my_params_dict['change_me'] = 'new value'
    my_params_text = yaml.dump(my_params_dict)
    # store back the change, my_params_text is the content of the param file (text)
    task.set_configuration_object(name, config_text=my_params_text)
Hi @<1523702932069945344:profile|CheerfulGorilla72>
Please tell me what RAM metric is tracked by ClearML?
Free RAM is the entire machine's free RAM
Yeah, htop shows odd numbers as it doesn't "count" allocated buffers
specifically you can see the code here:
None
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
try this RC, let me know if it works 🙂
pip install clearml==1.13.3rc1
Hi AmiableFish73
Hi all - is there an easier way to track the set of datasets used by a particular task?
I think the easiest is to give the Dataset an alias; it will automatically appear in the Configuration section:
Dataset.get(..., alias="train dataset")
wdyt?
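i.e. something along these lines (project/dataset names are placeholders):

from clearml import Dataset, Task

task = Task.init(project_name="my_project", task_name="training")

# the alias= argument logs the resolved dataset ID under the task's
# Configuration section, so you can see which datasets the task consumed
train_path = Dataset.get(
    dataset_project="my_project",
    dataset_name="my_dataset",
    alias="train dataset",
).get_local_copy()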
Hi MysteriousBee56 ,
Yes, this is a permissions issue: the docker creates all folders as root (as it is the root user running inside the docker). Then when you execute in venv mode, you are running as your own user, which obviously cannot change root-created folders.
RoundMosquito25 do notice the agent is pulling the code from the remote repo, so you do need to push the local commits, but the uncommitted changes ClearML will apply for you. Make sense?
Hi FierceHamster54
This is already supported; unfortunately the open-source version only supports static allocation (i.e. you can spin up multiple agents and connect each one to a specific set of GPUs), while the dynamic option (where a single agent allocates jobs across multiple GPUs / slices) is only part of the enterprise edition.
(there is the hidden assumption there that if you spent so much on a DGX you are probably not a small team 🙂 )
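e.g. static allocation on a single multi-GPU box could look like this (queue names are just examples):

clearml-agent daemon --queue single_gpu --gpus 0 --detached
clearml-agent daemon --queue single_gpu --gpus 1 --detached
clearml-agent daemon --queue dual_gpu --gpus 2,3 --detached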
Hi MotionlessCoral18
You can set all mount points here:
https://github.com/allegroai/clearml-agent/blob/6e31171d314a6e9b276c36d45314025783956b00/docs/clearml.conf#L241
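Roughly of this shape (values here are illustrative; the exact keys are listed in the linked default conf):

agent {
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        pip_cache: "/root/.cache/pip"
        ssh_folder: "/root/.ssh"
        venv_build: "~/.clearml/venvs-builds"
    }
}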
I mean test with:
pipe.start_locally(run_pipeline_steps_locally=False)
This actually creates the steps as Tasks and launches them on remote machines
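i.e. something along these lines (project/queue/step names are placeholders):

from clearml import PipelineController

def step_one():
    print("running step one")

def step_two():
    print("running step two")

pipe = PipelineController(name="my_pipeline", project="my_project", version="1.0.0")
pipe.set_default_execution_queue("default")
pipe.add_function_step(name="step_one", function=step_one)
pipe.add_function_step(name="step_two", function=step_two, parents=["step_one"])

# the pipeline *logic* runs locally, but every step is created as a Task
# and enqueued, so agents on remote machines execute the actual steps
pipe.start_locally(run_pipeline_steps_locally=False)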