In the "installed packages" section you should have "nvidia-dali-cuda110". In the agent's clearml.conf you should add: extra_index_url: ["", ]
https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L78
Should solve the issue
like this.. But when I am cloning the pipeline and changing the parameters, it is running on default parameters, given when pipeline was 1st run
Just making sure, you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example ?
SmallAnt76
see https://clear.ml/pricing/ , under "What plan should I choose?"
What you are looking for is the first column, "open-source". Make sense?
multiple machines and reporting to the same task.
Out of curiosity , how do you launch it on multiple machines?
reporting to the same task.
So the "funny" thing is, they all report on top of (overwriting) each other...
In order for them to report individually, it might be that you need multiple Tasks (i.e. one per machine)
Maybe we could somehow prefix the cpu/network etc. with the rank?! Or should it be a different "title", wdyt?
Ohh... I would not delete them then ...
Maybe some kind of heuristic (files created more than a week ago can be deleted?!)
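A minimal sketch of that kind of heuristic, assuming "old" means a modification time more than a week ago (creation time is not portable across filesystems). The function name and the one-week default are illustrative and not part of ClearML; the function only lists candidates and does not delete anything:

```python
import os
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


def find_stale_files(folder, max_age_seconds=WEEK_SECONDS, now=None):
    """Return sorted paths under `folder` last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    stale = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Modification time is used as a stand-in for "created a week ago".
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

Reviewing the returned list before removing anything keeps the "I would not delete them" concern in check.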
An easier fix for now will probably be some kind of warning to the user that a task is created but not connected
That is a good point, maybe if you do not have a "main" Task, then we print the warning (with some flag to disable the warning) ?
UnevenDolphin73 I would use APIClient:
APIClient().projects.edit(project=project_id, system_tags=[])
*I might have a few typos above but that should be the gist
If possible, I would like to avoid the fileserver altogether and write everything to S3 (without needing every user to change their config)
There is no current way to "globally" change the default files server (I think this is part of the enterprise version, alongside vault etc.).
What you can do is use an OS environment variable to override the conf file: CLEARML_FILES_HOST=""
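For instance, a launcher script could set the variable for a child process. This is only a sketch: the bucket URL and the helper name are made up, since the actual value is not shown above.

```python
import os
import subprocess
import sys

# Hypothetical destination; the original message does not show the real value.
FILES_HOST = "s3://my-bucket/clearml"


def env_with_files_host(files_host, base_env=None):
    """Return a copy of the environment with CLEARML_FILES_HOST overridden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_FILES_HOST"] = files_host
    return env


if __name__ == "__main__":
    # The child process sees the override without any clearml.conf change.
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['CLEARML_FILES_HOST'])"],
        env=env_with_files_host(FILES_HOST),
        capture_output=True,
        text=True,
        check=True,
    )
    print(child.stdout.strip())
```

The same variable can of course be exported in the shell or the container definition instead.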
PricklyRaven28 wdyt?
When looking at the worker details, it says "No queues currently assigned to this worker"
Yes, I think we should have better information there, the "AWS service" is not directly pulling jobs from any specific queue, hence nothing there. It is "listening" to queues and launching machines, those machines will be listening to the queue. I wonder if it is just easier to also make sure it is listed as "assigned" to those queues, wdyt?
I can't seem to find a difference between the two, why would matplotlib get listed and pandas does not... Any other package that is missing?
BTW: as an immediate "hack", before your Task.init call add the following: Task.add_requirements("pandas")
We do upload the final model manually.
If this is the case just name it based on the parameters, no? Am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229
I was just wondering if i can make the autologging usable.
It kind of assumes these are different "checkpoints" on the same experiment, and then stores them based on the file name
You can however change the model names later:
` Task.current_task().mo...
Hmm I see your point.
Any chance you can open a github issue with a small code snippet to make sure we can reproduce and fix it?
Hi RattyBat71
Do you tend to create separate experiments for each fold?
If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the index of the CV fold as an argument) makes sense, then you can compare / sort the results based on a specific metric. That said, if speed is not important, just having a single script with multiple CVs might be easier to implement?!
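Roughly, the "one execution per fold" option could look like this. It is a plain-Python sketch with a hypothetical --fold argument and a simple interleaved split; the training itself is left out.

```python
import argparse


def fold_indices(n_samples, n_folds, fold):
    """Return (train_idx, val_idx) for one fold of an interleaved k-fold split."""
    if not 0 <= fold < n_folds:
        raise ValueError("fold must be in [0, n_folds)")
    val = [i for i in range(n_samples) if i % n_folds == fold]
    train = [i for i in range(n_samples) if i % n_folds != fold]
    return train, val


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--fold", type=int, default=0)
    parser.add_argument("--n-folds", type=int, default=5)
    parser.add_argument("--n-samples", type=int, default=10)
    args = parser.parse_args(argv)
    train, val = fold_indices(args.n_samples, args.n_folds, args.fold)
    # A real script would train on `train`, evaluate on `val`,
    # and report the metric so the executions can be compared.
    return train, val


if __name__ == "__main__":
    print(main())
```

Each execution then differs only in the value of --fold, which makes the runs easy to sort by metric afterwards.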
Can I get GPU usage over a time frame via the API as well?
task.get_reported_scalars
But this will get you all the scalars, I think the next version of the server supports asking for a specific one as well.
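Until then you can filter on the client side. The sketch below assumes the call returns a nested dict shaped like {title: {series: {"x": [...], "y": [...]}}}; the sample data and the ":monitor:gpu" / "gpu_0_utilization" names are illustrative, so check them against your own task's scalars.

```python
def pick_series(scalars, title, series, x_min=None, x_max=None):
    """Return (x, y) points for one title/series pair, optionally limited to an x range."""
    entry = scalars.get(title, {}).get(series)
    if entry is None:
        return []
    return [
        (x, y)
        for x, y in zip(entry["x"], entry["y"])
        if (x_min is None or x >= x_min) and (x_max is None or x <= x_max)
    ]


# Hypothetical sample in the assumed shape, standing in for the real return value.
sample = {
    ":monitor:gpu": {"gpu_0_utilization": {"x": [0, 1, 2], "y": [10.0, 55.0, 80.0]}},
    "loss": {"train": {"x": [0, 1, 2], "y": [1.2, 0.9, 0.7]}},
}

gpu_points = pick_series(sample, ":monitor:gpu", "gpu_0_utilization", x_min=1)
```

With a real task you would pass the result of task.get_reported_scalars() instead of sample.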
How are you implementing the alert monitoring?
Is it a stateless process starting every X min, or is it a stateful process running and monitoring?
Hi ObedientDolphin41
I keep bumping against the ModuleNotFoundError: No module named exception.
Import the package inside the component function (the one you decorated), it will make sure it lists it in the requirements section automatically.
You can also set it manually by passing it as the "packages" argument to the decorator.
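Something along these lines, as a sketch: the step name and the choice of pandas are placeholders, and the stand-in class in the except branch is only there so the snippet can be imported on a machine without clearml.

```python
try:
    from clearml.automation.controller import PipelineDecorator
except ImportError:
    # Stand-in for dry runs without clearml installed: leaves the function unchanged.
    class PipelineDecorator:
        @staticmethod
        def component(**_kwargs):
            return lambda func: func


# `packages` states the step's requirements explicitly (pandas is a placeholder).
@PipelineDecorator.component(return_values=["frame"], packages=["pandas"])
def make_frame(rows=3):
    # Importing inside the decorated function is the "automatic" route described above.
    import pandas as pd

    return pd.DataFrame({"x": list(range(rows))})
```

Either approach on its own should be enough; the example shows both only for comparison.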
If the same Task is run with different parameters...
ShinyWhale52 sorry, I kind of missed that in the explanation
The pipeline will always* create a new copy (clone) of the original Task (step), then modify the step's inputs etc.
The idea is that you have the experiment management (read execution management) to create full transparency into the pipelines and steps. Think of it as the missing part in a lot of pipelines platforms where after you executed the pipeline you need to furthe...
ReassuredTiger98
Okay, but you should have had the prints ...uploading artifact
and done uploading artifact
So I suspect something is going on with the agent.
Did you manage to run any experiment on this agent ?
EDIT: Can you try with the artifacts example we have in the repo:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
Wait, why aren't you just calling Popen? (or os.system), I'm not sure how it relates to the torch multiprocess example. What am I missing ?
why are all defined components shown in the UI Results/Plots/PipelineDetails/ExecutionDetails section? Shouldn't it make more sense to show only the ones that are used in that pipeline?
They are listed there (because of the decorator, you basically "say" these are steps so they are listed), the actual resolving (i.e. which steps are actually being called) is done in "real-time"
Make sense ?
I specifically set it as empty with export_data['script']['requirements'] = {} in order to reduce overhead during launch. I have everything installed inside the container
Do you have everything inside the container inside a venv?
Can I make the Tasks that I'm adding to the pipeline also run locally, such that the entire pipeline runs locally?
Ohh I think only if you have an agent running on your machine.
What is the use case ? (maybe we can add local execution as well?!)
Yes I think the writer.add_figure somehow crops the image
CLEARML_AGENT_GIT_USER is your git user (on whatever git host/server you are using, GitHub/GitLab/BitBucket etc.)
The 'on-premise' server fails to connect to the ClearML server because of the VPN I think
I think you are correct.
You can quickly test it: try to run curl http://local-server:8008 and see if that works
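If curl is not available on that machine, a rough Python equivalent of the same check is below. The helper name is an assumption, and the host is simply the one from the command above.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5.0):
    """Return True if any HTTP response comes back from `url`, False on connection failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status: the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return False


if __name__ == "__main__":
    print(can_reach("http://local-server:8008"))
```

A False result from the on-premise machine would support the VPN theory; True would point the investigation elsewhere.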
Would that go under arguments?
yes
Also what is the base path where the git repo is cloned? So if my repo is called myProject.git, what would the full path be?
For example https://github.com/<user>/myProject.git
btw: how come you do not have this field auto populated from running the code locally or using the clearml-task CLI?
Thanks GentleSwallow91
That's a good tip, where in the docs would you add it?
BeefyCow3 On the plot itself click on the json download button
I'm thinking it's generically a kernel gateway issue, but I'm not sure if other platforms are using that yet
The odd thing is that you can access the notebook, but it returns zero kernels ..