This works.
great!
So it is still in master and should be included in 1.0.5?
correct, RC will be released soon with this fix included
Hmm I see your point.
Any chance you can open a github issue with a small code snippet to make sure we can reproduce and fix it?
Hi PanickyMoth78
Yes I think you are correct, this looks like GS throttling your connection. You can control the number of concurrent uploads with max_workers=1
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/datasets/dataset.py#L604
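A minimal sketch (assuming a recent clearml SDK where Dataset.upload() exposes max_workers; dataset/project names are placeholders):

from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="./data")
ds.upload(max_workers=1)  # limit to a single concurrent upload so GS stops throttling
ds.finalize()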
Let me know if it works
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm so maybe a "glob"-like parameter, e.g. get_local_copy(select_filter='subfolder/*') ?
Hi ShortElephant92
This isn't an issue if the user is using a Service Account JSON Key,
Are you saying that when you are using GS python sdk directly it works?
For context, the google cloud storage SDK accepts authorized user credentials as well.
ClearML actually uses the google python SDK; the JSON is just a way to pass the credentials to the google SDK. I'm not sure it has to point to a "service account"? Where did that requirement come from?
is it from here ` Service account info was n...
OutrageousGrasshopper93
tensorflow-gpu is not needed, it will convert tensorflow to tensorflow-gpu based on the detected CUDA version (you can see it in the summary configuration when the experiment spins inside the docker)
How can I set the base python version for the newly created conda env?
You mean inside the docker ?
HarebrainedBear62 this is what I have.
clearml-data will store all the files for you and version the entire thing, making it a breeze to abstract the dataset from the code. Querying data is available using Apache Drill (though currently it is still not built into the platform, but we are planning to get there soon). Since this is image-based data/meta-data, I know the paid tier of ClearML has an additional dedicated data management solution specifically for images, with full ability to query m...
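A rough sketch of that abstraction (dataset/project names are placeholders, assuming the clearml-data SDK):

from clearml import Dataset

# create a new version on top of an existing dataset and add the new files
parent = Dataset.get(dataset_project="vision", dataset_name="images-raw")
child = Dataset.create(dataset_name="images-raw", dataset_project="vision", parent_datasets=[parent])
child.add_files("./new_images")
child.upload()
child.finalize()

# the training code only references the dataset by name, never by local path
local_path = Dataset.get(dataset_project="vision", dataset_name="images-raw").get_local_copy()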
Hi ColossalAnt7, I think we ran into it on a few dockers, I believe the bug was fixed in the latest trains-agent RC. Could you verify please?
Hmm DepressedChimpanzee34 my bad, it seems the loading is done via a YAML loader, but the dumping is straightforward str casting...
https://github.com/allegroai/clearml/blob/6e6271fb91f2aeb2aa7a13c6d07d4e635baaa670/clearml/backend_interface/task/task.py#L934
What would you expect to get? (BTW "value\blah" is not a valid literal-backslash string assignment in Python, as \b is interpreted as an escape character; it should be "value\\blah", which translates into the text value\blah)
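Quick illustration of the escaping:

print("value\\blah")   # escaped backslash, prints: value\blah
print(r"value\blah")   # raw string, same output: value\blah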
Usually in the /tmp folder under a temp filename (it is generated automatically when spun up)
In the case of the services, this will be inside the docker itself
You should have a download button when you hover over the table, I guess that would be the easiest.
If needed I can send an SDK code but unfortunately there is no single call for that
Hi @<1523708901155934208:profile|SubstantialBaldeagle49>
If you report on the same iteration with the same title/series you are essentially overwriting the data (as expected)
Regarding the plotly report size.
Two options:
- round down the numbers (by default it will store all the digits, and usually after the fourth it's quite useless, and it will drastically decrease the plot size)
- Use logger.report_scatter2d, it is more efficient and has a mechanism to subsample extremely large graphs (see the quick sketch below)
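A quick sketch of both options (my_points is a made-up list of (x, y) pairs, project/series names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="plot size demo")
logger = task.get_logger()

# option 1: round the values before plotting, so fewer digits are stored per point
scatter = [[round(x, 4), round(y, 4)] for x, y in my_points]

# option 2: report_scatter2d subsamples extremely large graphs for you
logger.report_scatter2d(title="predictions", series="run_1", iteration=0,
                        scatter=scatter, xaxis="x", yaxis="y", mode="markers")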
Hi RobustHippopotamus53
The way "latest from branch" works:
- On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix)
- The agent then pulls the latest commit from that branch and updates the Task back to the current commit ID (the latest on the branch at the time of execution)
- This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed.
Could it be that you "force-pushed" a commit/squash, hence the "origina...
K8s + clearml-agent integration.
Hmm is this an on-prem k8s cluster?
Not sure I follow, you mean to launch it on the kubernetes cluster from the ClearML UI?
(like the clearml-k8s-glue ?)
Hmm, so currently you can provide help, so users know what they can choose from, but there is no way to limit it.
I know the Enterprise version has something similar that allows users to create a custom "application" from a Task; there you can define a drop-down and such, but that might be an overkill here, wdyt?
Hi RipeGoose2 all PR's are welcome, feel free to submit :)
The issue is upload progress reporting for http uploads (object storage will report upload progress). Basically the http upload is a POST with urllib, which does not support upload callbacks for progress reporting. If you have an idea here, we will gladly add it (as you mentioned, it can be quite annoying to have to open the network manager to verify the upload is progressing)
ItchyJellyfish73
Unfortunately this needs backend support and is only available in the enterprise version. What is your use case for it? (It was designed to allow out-of-the-box bare-metal multi-GPU dynamic allocation, think a DGX with 8 GPUs where, instead of spinning down agents when you want to change the queue->num-gpu mapping, you can do it on the fly)
Q. Would someone mind outlining what the steps are to configuring the default storage locations, such that any artefacts or data which are pushed to the server are stored by default on the Azure Blob Store?
Hi VivaciousPenguin66
See my reply here on configuring the default output uri on the agent: https://clearml.slack.com/archives/CTK20V944/p1621603564139700?thread_ts=1621600028.135500&cid=CTK20V944
Regarding permission setup:
You need to make sure you have the Azure blob credenti...
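As a sketch, you can also set the destination per task from code (account/container names are placeholders; the matching credentials still go under the azure.storage section of clearml.conf):

from clearml import Task

task = Task.init(project_name="examples", task_name="azure output demo",
                 output_uri="azure://<account-name>.blob.core.windows.net/<container>")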
Are you seeing the argparse arguments in the UI (when running locally) ?
I assume the account name and key refer to the storage account credentials that you can get from Azure Storage Explorer?
correct
ElegantCoyote26
parser = get_parser()
args_ = vars(parser.parse_args())
task.connect(args_)
There is no need to connect args_
Task.init will automatically catch the argparser.
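i.e. something like this is enough (project/argument names are just for illustration):

from argparse import ArgumentParser
from clearml import Task

task = Task.init(project_name="examples", task_name="argparse demo")  # hooks argparse automatically

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()  # the arguments show up in the UI without task.connect()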
Is there still an issue? Could it be the browser cannot access the file server directly?
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
But the provided command is missing the URL target for curl, so it is not complete.
Not sure I followed. Did you specify "NEW_ADDRESS"?
Or is it that in both cases the URL is localhost?
Thanks @<1634001106403069952:profile|DefeatedMole42>
A follow up, (1) how are you spinning the agent ? (2) could it be the docker image "ultralytics/yolov5" does not have Bash as entry point ?
you can force that with
@PipelineDecorator.component(return_values=['int'], cache=False,
                             task_type='training',
                             docker="ultralytics/yolov5",
                             docker_args="--entrypoint /bin/bash",
                             pa...
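A fuller sketch of the same idea (the function body and argument are placeholders, assuming a recent clearml version):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['int'], cache=False,
                             task_type='training',
                             docker="ultralytics/yolov5",
                             docker_args="--entrypoint /bin/bash")
def train_step(epochs: int) -> int:
    # training code goes here; anything imported inside runs on the remote machine
    return epochs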
@<1533620191232004096:profile|NuttyLobster9> I think we found the issue: when you are passing a direct link to the python venv, the agent fails to detect the python version, and since the python version is required for fetching the correct torch, it fails to install it. This is why passing CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none works,
because it skips resolving the torch / cuda version (which requires parsing the python version)
Sure thing, anyhow we will fix this bug so next version there is no need for a workaround (but the workaround will still hold so you won't need to change anything)
- but the pytorch/main.py file doesn't run.
What do you have on the Task itself? is this the correct script ?
Any chance you can send a full log ? (you can DM it if it helps)
Great!
BTW: you can take some inspiration from here:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
Or from the full pipeline:
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py