but I don't see any change...where is the link to the file removed from
In the meta data section, check the artifacts "state" object
How are these two datasets different?
Like comparing two experiments :)
My bad you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
Hi @<1631102016807768064:profile|ZanySealion18>
sorry missed that one
The cache doesn't work, it attempts to download the dataset every time.
just making sure the dataset itself contains all the files?
Once I used the clearml-data add --folder * CLI command, everything worked correctly (though all files, recursively, ended up in the root; luckily they all had different names).
Not sure I follow here, is the problem the creation of the dataset or fetching it? is this a single version or multi...
Thanks!
fyi: This section is not necessary if you have a clearml.conf file in ~/
Task.set_credentials(api_host=" ", web_host=" ", files_host=" ", key='********************', secret='***********************')
Let me check the code for a min
PanickyMoth78 ScantMoth28
With several models saved by the training process (whose code is not task-aware)
You can actually specify which models are saved:
task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
This way you can upload only the model you need.
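To get a feel for which files a wildcard such as '*.pt' would select, here is a minimal sketch using Python's standard fnmatch module. It only illustrates shell-style wildcard matching: the helper name and the file names are made up for the example, and how ClearML applies the pattern internally may differ.

```python
from fnmatch import fnmatchcase


def select_models(filenames, patterns):
    """Return the file names that match at least one shell-style wildcard."""
    return [
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical files written by a training script.
saved = ["best.pt", "last.pt", "optimizer.pth", "config.yaml"]
print(select_models(saved, ["*.pt"]))
```

Note that the pattern has to match the whole file name, so "optimizer.pth" is not selected by '*.pt'.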
If there was an SSL issue it should log to console right?
correct, also the agent is able to report, so I'm assuming configuration is correct
@<1724960464275771392:profile|DepravedBee82> could you try to put the clearml import + Task.init at the top of your code?
Most likely yes, but I don't see how clearml would have an impact here, I am more inclined to think it would be a pytorch dataloader issue, although I don't see why
These are most certainly dataloader processes. But clearml-agent, when killing the process, should also kill all subprocesses, and it might be that something is going on that prevents it from killing the subprocesses ...
Is this easily reproducible? Can you verify it is still the case with the latest RC of clearml-agent?
Hi GiddyTurkey39 ,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...
but is there any other way to get env vars / any value or secret from the host to the docker of a task?
if this is docker, passing -e/--env as an argument would do the same:
-e VAR=somevalue
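As a rough illustration of the -e form, the sketch below flattens a dict of environment variables into the argument list docker expects. The helper name is hypothetical; the resulting list has the same shape as the extra_docker_arguments value shown earlier in this log.

```python
def docker_env_args(env):
    """Flatten {"VAR": "value"} into ["-e", "VAR=value", ...]."""
    args = []
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    return args


# Example variable taken from the message above.
print(docker_env_args({"VAR": "somevalue"}))
```

Each variable becomes its own "-e", "NAME=value" pair, so several secrets can be passed by adding more keys to the dict.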
WickedGoat98 Notice this is not the "clearml-agent-services" docker but "clearml-agent" docker image
Also the default docker image is "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
Other than that quite similar :)
I'm suggesting to make it public.
Actually I'm thinking of enabling users to register Drivers in runtime, expanding the capability to support any type of URL link, meaning you can register "azure://" with AzureDriver, and the StorageHelper will automatically use the driver you provide.
This will make sure any part of the system will be able to transparently use any custom driver.
wdyt?
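A minimal sketch of what such runtime registration could look like, assuming a simple scheme-to-class mapping. Everything here (register_driver, get_driver, the placeholder AzureDriver class) is hypothetical and only illustrates the idea; it is not the StorageHelper API.

```python
from urllib.parse import urlparse

# Hypothetical registry: URL scheme -> driver class.
_DRIVERS = {}


def register_driver(scheme, driver_cls):
    """Associate a URL scheme such as "azure" with a driver class."""
    _DRIVERS[scheme.lower()] = driver_cls


def get_driver(url):
    """Instantiate the driver registered for the URL's scheme."""
    scheme = urlparse(url).scheme
    if scheme not in _DRIVERS:
        raise ValueError(f"no driver registered for scheme {scheme!r}")
    return _DRIVERS[scheme]()


class AzureDriver:
    """Placeholder driver used only for this illustration."""

    def describe(self, url):
        return f"AzureDriver handles {url}"


register_driver("azure", AzureDriver)
url = "azure://container/path/file.bin"
print(get_driver(url).describe(url))
```

Looking the driver up by scheme is what lets any caller stay unaware of which backend it is talking to.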
Hi ScantChimpanzee51
having the ClearML auto scaler at all is super great and an impressive tool!
Thank you! 😍
As all data resides within the container, it is lost afterwards.
Nothing to fear there, if you are using the StorageManager, the destination is always the cache folder, which the agent automatically mounts to the host machine.
That said if the EC2 instance is taken down (i.e. idle) then the cache is lost with it.
Make sense?
you should have something like 192.168... or 10.0 ....
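Assuming the point is that the machine should show a private network address, a quick check can be sketched with Python's standard ipaddress module. The sample addresses are made up.

```python
import ipaddress


def is_private_address(host):
    """True for private-range addresses such as 192.168.x.x or 10.x.x.x."""
    return ipaddress.ip_address(host).is_private


# Hypothetical addresses for illustration.
for host in ("192.168.1.20", "10.0.0.5", "8.8.8.8"):
    print(host, is_private_address(host))
```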
I'm assuming this is related to this thread:
None
https://www.geeksforgeeks.org/invalid-decimal-literal-in-python/
This is the warning hence my question
Hi TartSeal39
So the thing is, the agent does not support yaml env for conda. Currently if the requirements section is empty, the agent will use the requirements.txt of the repo. We first need to add support for conda yaml, and then allow you to disable the auto requirements or push the specific yaml. Would that work? Also is there a reason the auto package is not working?
I don't know how I would be able to get the description and name?
Good point, how about doing that in code, then you have all the information and you can store it in jsons / pickle next to the data folder?
wdyt?
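One way this could look, as a sketch only: write the name and description to a JSON file that sits beside the data folder. The file naming scheme and both helper functions are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(data_folder, name, description):
    """Store name/description in '<folder>.meta.json' beside the data folder."""
    folder = Path(data_folder)
    meta_path = folder.parent / f"{folder.name}.meta.json"
    meta_path.write_text(json.dumps({"name": name, "description": description}, indent=2))
    return meta_path


def read_metadata(meta_path):
    """Load the metadata dict back from disk."""
    return json.loads(Path(meta_path).read_text())


# Demo in a temporary directory with a made-up folder name.
with tempfile.TemporaryDirectory() as tmp:
    data_folder = Path(tmp) / "images"
    data_folder.mkdir()
    path = write_metadata(data_folder, "my dataset", "first version")
    print(path.name, read_metadata(path))
```

JSON keeps the file readable by hand; pickle would work the same way if the values are not JSON-serializable.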
AttributeError: 'NoneType' object has no attribute 'base_url'
can you print the model object ?
(I think the error is a bit cryptic, but generally it might be the model is missing an actual URL link?)
print(model.id, model.name, model.url)
Also, for a single parameter you can use:
cloned_task.set_parameter(name="Args/artifact_name", value="test-artifact", description="my help text that will appear in the UI next to the value")
This way, you are not overwriting the other parameters, you are adding to them.
(Similar to update_parameters , only for a single parameter)
I have an idea, can you try with:
task = Task.init(..., reuse_last_task_id=False)
I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally" which makes them overwrite the configuration of one another.
SmarmyDolphin68
Debug Samples tab and not the Plots,
Are you doing plt.imshow ?
Also make sure you have report_image=False when calling the report_matplotlib_figure
(if it is True it will upload it as an image to "debug samples")
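A small sketch of a wrapper that always passes report_image=False. The wrapper name is made up, and `logger` is assumed to be a ClearML Logger, for example the one returned by task.get_logger().

```python
def report_figure_as_plot(logger, figure, title, series, iteration=0):
    """Report a matplotlib figure as a plot, not as a debug sample image.

    `logger` is assumed to be a clearml Logger and `figure` a matplotlib figure.
    """
    logger.report_matplotlib_figure(
        title=title,
        series=series,
        iteration=iteration,
        figure=figure,
        report_image=False,  # True would upload it as an image to "debug samples"
    )
```

Keeping the flag inside one helper avoids forgetting it at individual call sites.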