Hi UnsightlyHorse88
Hmm, try adding this to your clearml.conf file: `agent.cpu_only = true`
If that does not work, try adding to the OS environment: `export CLEARML_CPU_ONLY=1`
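As a sketch, the clearml.conf change would sit in the agent section like this (HOCON layout assumed; the env-var alternative is shown as a comment):

```
# clearml.conf -- force the agent to ignore GPUs
agent {
    cpu_only: true
}

# alternatively, before starting the agent:
#   export CLEARML_CPU_ONLY=1
```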
that is because my own machine has 10.2 (not the docker, the machine the agent is on)
No that has nothing to do with it, the CUDA is inside the container. I'm referring to this image https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker, it points to CUDA version 10.2
Am I missing something ?
When you are running the base-task, are you providing any arguments to it?
Can you share the "execution" Tab? and the Args tab of the base-task ?
Oh sorry, from the docstring, this will work:
:param bool continue_last_task: Continue the execution of a previously executed Task (experiment)
.. note::
When continuing the execution of a previously executed Task,
all previous artifacts / models / logs are intact.
New logs will continue iteration/step based on the previous-execution maximum iteration value.
For example:
The last train/loss scalar reported was iteration 100, the next report will b...
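To illustrate just the iteration numbering described above (plain Python, not ClearML internals): if the last reported iteration was 100, reports from the continued run pick up right after it.

```python
# Plain-Python illustration of how continued reports are numbered
# (not ClearML code -- just the arithmetic described in the docstring).
def continued_iterations(previous_max, new_steps):
    """Iteration numbers assigned to reports after continuing a task."""
    return [previous_max + step for step in range(1, new_steps + 1)]

print(continued_iterations(100, 3))  # new reports start right after iteration 100
```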
I think I was not able to fully express my point. Let me try again.
When you are running the pipeline fully locally (both logic and components), the assumption is that this is for debugging purposes.
This means that the code of each component is locally available, could that be a reason?
Let me check, it was supposed to be automatically aborted
What do you have in "server_info['url']" ?
print(requests.get(url='
Hmm what do you have here?
os.system("cat /var/log/studio/kernel_gateway.log")
This is strange, let me see if we can get around it, because I'm sure it worked 🙂
Hi @<1523701066867150848:profile|JitteryCoyote63>
RC is out,
pip3 install clearml-agent==1.5.3rc3
Then set `pytorch_resolve: "direct"` in the conf file
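For reference, a sketch of where that setting would live in clearml.conf (section layout assumed, check the linked conf for the exact placement):

```
# clearml.conf on the agent machine (sketch)
agent {
    package_manager {
        # resolve PyTorch wheels via their direct links
        pytorch_resolve: "direct"
    }
}
```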
Let me know if it worked
Can clearml-agent currently detect this?
Hmm, you mean will the agent clean itself up?
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? This seems out of scope for this channel
Can you expand?
In venv mode yes; in docker mode you can pass them by setting the `-e` flag in `docker_extra_flags`
https://github.com/allegroai/trains-agent/blob/121dec2a62022ddcbb0478ded467a7260cb60195/docs/trains.conf#L98
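A sketch of what that could look like (key name taken from the message above; exact format should follow the linked trains.conf, and the variable names here are hypothetical):

```
# trains.conf (agent section) -- hypothetical env variables
agent {
    docker_extra_flags: ["-e MY_VAR=my_value", "-e OTHER_VAR=123"]
}
```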
Hi BitterStarfish58
Where are you uploading it to?
Hmm BitterStarfish58 what's the error you are getting ?
Any chance you are over the free tier quota ?
It might be that the file upload was broken?
BitterStarfish58 I would suspect the upload was corrupted (I think this is the discrepancy between the file size logged and the actual file size uploaded)
GiganticTurtle0
If there are several tasks running concurrently, which task should `Task.current_task()` return?
How could you have that ?
Per process, there is one Main current Task (until you close it).
Are you referring to a pipeline with multiple steps ?
If this is the case, task.current_task
will return the Task of the component (if executed from the component) and the pipeline Task (if called from the pipeline logic function).
Notice we added the ability to s...
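Conceptually, the per-process behavior is like a module-level singleton; here is a plain-Python sketch (illustrative only, not ClearML's actual implementation):

```python
# Minimal per-process "current task" singleton, for illustration only
# (not ClearML's real implementation).
_current_task = None  # one slot per process

def init_task(name):
    """Initialize and remember the process-wide current task."""
    global _current_task
    _current_task = {"name": name}
    return _current_task

def current_task():
    # Return whatever was initialized in this process (None if nothing was).
    return _current_task

init_task("my-experiment")
print(current_task()["name"])
```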
Okay, let me see...
Is there any known issue with Amazon SageMaker and ClearML?
On the contrary it actually works better on Sagemaker...
Here is what I did on SageMaker:
- created a new SageMaker instance
- opened Jupyter notebook
- started a new notebook (conda_python3 / conda_py3_pytorch)
Then I just did `!pip install clearml` and Task.init
Is there any difference ?
DeterminedToad86 were you running a jupyter notebook or a jupyter console ?
Yey!
My pleasure 🙂
I mean clone the Task in the UI (right click Clone), then go to the execution Tab, to the "installed packages" section, then click on Edit -> go to the torchvision http link, and replace it with torchvision == 0.7.0
and save.
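After the edit, the relevant line in the "installed packages" section would read something like this (requirements syntax; the torch line is hypothetical, only the torchvision line is the actual change):

```
# "installed packages" after replacing the direct http link
torch == 1.6.0
torchvision == 0.7.0
```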
Then right-click Enqueue the Task (to the default queue) and see if the Agent can run it,
DeterminedToad86 Make sense ?
LOL, okay, I'm not sure we can do something about that one.
You should probably increase the storage on your instance 🙂
BTW: from the instance name it seems like it is a VM with preinstalled pytorch, why don't you add system site packages, so the venv will inherit all the preinstalled packages, it might also save some space 🙂
DeterminedToad86 see here:
https://github.com/allegroai/clearml-agent/blob/0462af6a3d3ef6f2bc54fd08f0eb88f53a70724c/docs/clearml.conf#L55
Change it in the agent's conf file to: `system_site_packages: true`
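In context, the setting sits under the package_manager section (layout as in the linked clearml.conf):

```
# clearml.conf on the agent machine
agent {
    package_manager {
        # let the created venv inherit the system's preinstalled packages
        system_site_packages: true
    }
}
```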