but when I removed output_uri from Task.init, the pickled model has path
When you run the job on the k8s pod?
Hi StraightDog31
I am having trouble using the StorageManager to upload files to a GCP bucket
Are you using the StorageManager directly, or are you using task.upload_artifact?
Did you provide the GS credentials in the clearml.conf file, see example here:
https://github.com/allegroai/clearml/blob/c9121debc2998ec6245fe858781eae11c62abd84/docs/clearml.conf#L110
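For reference, that section of clearml.conf looks roughly like the sketch below. The bucket, project and key path are placeholders, and the exact keys should be checked against the linked example file:
```
sdk {
    google.storage {
        credentials = [
            {
                bucket: "my-bucket"
                project: "my-project"
                # path to the service-account JSON key on the machine running the code
                credentials_json: "/path/to/credentials.json"
            },
        ]
    }
}
```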
Sure LazyTurkey38, here's a nice hack for that:
```python
# code here ...
task.execute_remotely(queue_name=None, clone=False, exit_process=False)

# patch the Task and actually send it for execution
if Task.running_locally():
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
```
You can also clear the git diff by passing "diff": ""
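For completeness, clearing the uncommitted diff would be the same update_task call; the payload shape below is an assumption based on the branch/repository fields above:
```python
# sketch: clear the stored uncommitted diff as well
task.update_task(task_data={'script': {'diff': ''}})
```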
wdyt?
SmugSnake6 I think the latest version (1.8.0) tries to parallelize it
You can also control max_workers
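If this is the dataset upload/download path, the knob would look roughly like the sketch below. The availability of max_workers on Dataset.upload is an assumption tied to recent clearml versions, so verify it against your SDK:
```python
from clearml import Dataset

# assumption: recent clearml versions (>= 1.8) accept max_workers on Dataset.upload()
ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("data/")
ds.upload(max_workers=8)  # limit or raise the number of parallel upload workers
ds.finalize()
```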
Ohh, yes, we need to map the correct clearml.conf, sorry, try (I fixed both the clearml.conf mapping and the .ssh folder mapping):
```bash
docker run -t --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /home/dwhitena/clearml.conf:/root/clearml.conf -v /home/dwhitena/.ssh:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/arc...
```
@<1724960464275771392:profile|DepravedBee82> I just realized, the agent is Not running in docker mode, correct? (i.e. venv mode)
If this is the case, how come it is running as root? (could it be it is running inside a container? how was that container spun up?)
JitteryCoyote63 I found it
Are you working in docker mode or venv mode?
Thank you @<1719524641879363584:profile|ThankfulClams64> for opening the GitHub issue, hopefully we will be able to reproduce it and fix it quickly
Noooooooooo, it is still working
Okay, I think I understand, but I'm missing something. It seems you call get_parameters from the old API. Is your code actually calling get_parameters? The trains-agent runs the code externally; whatever happens inside the agent should have no effect on the code. So who exactly is calling task.get_parameters, and well, why? :)
So can you verify it can download the model ?
ValueError('Task object can only be updated if created or in_progress')
It seems the task is not "running", hence the error. Could that be the case?
Hi @<1689446563463565312:profile|SmallTurkey79>
App Credentials now disappear on restart.
You mean in the web UI?
Hi EagerOtter28
Let's say we query another time and get 60k images. Now it is not trivial to create a new dataset B but only upload the diff: ...
Use Dataset.sync (or clearml-data sync) to check which files were changed/added.
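Roughly, the SDK side of that would look like the sketch below; the project, name, parent id and folder are placeholders, and the exact sync call should be checked against your clearml version (the CLI equivalent is clearml-data sync):
```python
from clearml import Dataset

# sketch: create dataset B on top of dataset A and register only the diff
ds = Dataset.create(
    dataset_project="examples",
    dataset_name="dataset_b",
    parent_datasets=["<dataset_a_id>"],
)
# compares the local folder against the parent listing and registers
# only files that were added, modified or removed
ds.sync_folder(local_path="./query_results")
ds.upload()
ds.finalize()
```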
All files are already hashed, right? I wonder why clearml-data does not keep files in a semi-flat hierarchy and groups them together into datasets?
It kind of does, it has a full listing of all the files with their hash (SHA2) values, ...
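For example, you can inspect that listing without downloading anything; a small sketch, with the project/name arguments as placeholders:
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="dataset_b")
# returns the relative paths registered in the dataset; no files are downloaded
print(ds.list_files())
```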
Could it be pandas was not installed on the local machine?
PanickyMoth78
and I would definitely prefer the command executing_pipeline to not kill the process that called it.
I understand why it would be odd from a notebook perspective, the issue is that the actual code is being "sent" to the backend to be executed on a remote machine. It is important to understand that this is the end of the current process. Does that make sense?
(not saying we could not add an argument for that, just trying to ...
Hi FriendlyKoala70, trains will report all the tensorboard graphs, I'm assuming that's what's creating the epoch_lr graph. On top of that, you can always report manually with the logger (as you pointed out). Does that make sense to you?
I would like to start off by saying that I absolutely love clearml.
@<1547028031053238272:profile|MassiveGoldfish6> thank you for saying that!
Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?
Well, by default files are packaged into multiple zip files; you can control the size of the zip file for finer granularity, but in the end, when you download, you are downloading the entire packaged ...
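As a sketch of that knob (the chunk size unit and partial-download arguments are assumptions to verify against your clearml version):
```python
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("data/")
# smaller chunks give finer download granularity (chunk_size assumed to be in MB)
ds.upload(chunk_size=100)
ds.finalize()

# later, fetch only a subset of the packaged chunks instead of everything
partial_path = Dataset.get(
    dataset_project="examples", dataset_name="my_dataset"
).get_local_copy(part=0, num_parts=4)
```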
That seems like the k8s routing, can you try the web server curl?
JitteryCoyote63 virtualenv v20 is supported, pip v21 needs the latest trains/trains-agent RC,
Hmm, what's the clearml version? What's the Python version, what's the OS? And the PyTorch version?
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think that what you are looking for is a multi-machine cache (which is fully supported). Basically mount an NFS/SMB folder from a NAS on any of those machines, configure the cache folder to point to it, and then you do not need to worry about affinity.
no?
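Concretely, in clearml.conf on each of those machines it would be something along these lines; the mount path is a placeholder:
```
sdk {
    storage {
        cache {
            # point the local cache at the shared mount so every machine reuses the same cache
            default_base_dir: "/mnt/shared_clearml_cache"
        }
    }
}
```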
Is there a way to group A and B into a sub-pipeline, h...
which to my understanding has to be given before a call to an argparser,
SmarmySeaurchin8 You can call argparse before Task.init, no worries it will catch the arguments and trains-agent will be able to override them :)
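i.e. something like this minimal sketch (project/task names and the argument are placeholders):
```python
from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()

# Task.init is called after argparse; the arguments are still captured,
# and an agent re-running the task can override them from the UI
task = Task.init(project_name="examples", task_name="argparse_before_init")
```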
WackyRabbit7
Cool - so that means the fileserver which comes with the host will stay emtpy? Or is there anything else being stored there?
Debug Images and artifacts will be automatically stored to the file server.
If you want your models to be automagically uploaded add the following:
task = Task.init('example', 'experiment', output_uri=' ')
(You can obviously point it to any other http/S3/GS/Azure storage)
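For example, pointing it at a bucket instead of the file server; the bucket name and path here are placeholders:
```python
from clearml import Task

# models (and artifacts uploaded by the task) will be stored on the bucket
task = Task.init(
    project_name="example",
    task_name="experiment",
    output_uri="s3://my-bucket/models",  # or "gs://..." / "azure://..." / "https://..."
)
```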
Hi @<1545216070686609408:profile|EnthusiasticCow4> let me know if this one solves the issue
pip install clearml==1.14.2rc0
You can always click on the name of the series and remove it from the display.
Why would you need three graphs?
each epoch runs about 55 minutes, and that screenshot I posted earlier kind of shows the logs for the rest of the info being output, if you wanted to check that out
I thought you disabled the stdout log, no?
Maybe ClearML is using tensorboard in ways that I can fine tune? I ...
You can open your TB and see, every report there is logged into clearml
Hi UptightBeetle98
The hyper parameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which they are now, aka pending), set up the environment for the jobs (venv or docker+venv) and execute the job with the specific arguments the optimizer chose.
Make sense?
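For context, a rough sketch of the optimizer side that enqueues those jobs for the agents; the base task id, parameter range, metric names and queue are placeholders, and the exact constructor arguments should be checked against your clearml version:
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformIntegerParameterRange

task = Task.init(project_name="examples", task_name="hpo_controller")

optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",   # the experiment the agents will re-run with new arguments
    hyper_parameters=[
        UniformIntegerParameterRange("Args/batch_size", min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    max_number_of_concurrent_tasks=2,
    execution_queue="default",           # the queue your agents are listening on
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```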