Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
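For illustration, a minimal sketch of that flow, assuming the object being moved is a ClearML Dataset (the hosts, keys, and IDs below are placeholders; in practice you may want two separate runs, since the SDK caches its session after the first call):
` from clearml import Task, Dataset

# while credentials point at the local server, grab a local copy
local_path = Dataset.get(dataset_id="LOCAL_DATASET_ID").get_local_copy()

# switch credentials to the hosted server (placeholder values)
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="HOSTED_ACCESS_KEY",
    secret="HOSTED_SECRET_KEY",
)

# re-upload to the hosted server
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(local_path)
ds.upload()
ds.finalize() `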
I am providing a helper to run a task in a queue after running it locally in the notebook
Is this part of a pipeline process or just part of the workflow?
(reason for asking is that if this is a pipeline thing we might be able to support it in v2)
yes, that makes sense to me.
What is your specific use case, meaning when/how do you stop / launch the hpo?
Would it make sense to continue from a previous execution and just provide the Task ID? Wdyt?
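For context, a hedged sketch of what "continue from a previous execution" could look like, assuming the continue_last_task argument of Task.init (which also accepts a previous Task ID) is used; the ID below is a placeholder:
` from clearml import Task

# resume reporting into a previous execution by passing its Task ID
task = Task.init(
    project_name="hpo",
    task_name="optimizer",
    continue_last_task="PREVIOUS_TASK_ID",
) `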
It's just the print (`__repr__`) not showing the data:
for w in client.workers.get_all():
    print(w.data)
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that, as you mentioned, ClearML does not "detect" the exit code when `os._exit()` is called, which is why it "misses" the failed test (as mentioned, all exceptions are caught). This should be fixed in the next RC
Is the code in this "other" repo downloaded to the agent's machine? Or is the component's code pushed to the machine hosting that repository?
Yes this repo is downloaded into the agent, so your code has access to it
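As a hedged illustration (the repo URL, branch, and imported module are placeholders), pointing a component at another repository can be done with the repo argument of PipelineDecorator.component:
` from clearml.automation.controller import PipelineDecorator

# the agent clones "repo" for this step, so its code is importable here
@PipelineDecorator.component(
    repo="https://github.com/user/other-repo.git",
    repo_branch="main",
)
def step(x):
    from other_repo_module import helper  # hypothetical module from that repo
    return helper(x) `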
How can I ensure that additional tasks aren't created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them?
(Also, how come the git diff warning is there with the latest clearml? I think there was some fix related to that)
do you have your Task.init call inside the "train.py" script? (and if you do, what are you getting in the Execution tab of the task?)
That somehow the PV never worked and it was all local inside the pod
@<1539780258050347008:profile|CheerfulKoala77> make sure the AMI id matches the zone of the EC2 machine
Yes! I checked, it should work (it checks if you have a load(...) function on the preprocess class, and if you do it will use it):
` def load(self, local_file_name):
    self._model = joblib.load(local_file_name)
    self._preprocess_model = joblib.load(Model(hard_coded_model_id).get_weights()) `
Yes that should work. The only thing is you need to call Task.init on the master process (and make sure you call Task.current_task() on the subprocesses if you want the automagic to kick in; that said, usually there is no need, they are supposed to report everything back to the main one anyhow)
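A minimal sketch of that pattern (project/task names are placeholders), with Task.init on the master process and Task.current_task() in the subprocesses:
` from multiprocessing import Process
from clearml import Task

def worker(rank):
    # attach to the task created by the master process
    task = Task.current_task()
    task.get_logger().report_scalar("loss", f"rank_{rank}", value=0.0, iteration=0)

if __name__ == "__main__":
    Task.init(project_name="examples", task_name="multiprocess demo")
    procs = [Process(target=worker, args=(r,)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join() `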
basically
` @call_parse
def main(
    gpus:Param("The GPUs to use for distributed training", str)='all',
    script:Param("Script to run", str, opt=False)='',
    args:Param("Args to pass to script", nargs=...
I think the crux of the issue is the subprocess calls I removed.
That kind of makes sense, though if the subprocess function also had a Task.init call, it should have worked.
Would that be the setup to try to replicate?
Hmm, maybe the right way to do so is to "abuse" models, which are full entities: you can set a system_tag on them, they can store a folder (and extract it when you need it), they belong to projects, and they are cloned along with a task and can be changed.
wdyt?
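A hedged sketch of that idea (names and the model ID are placeholders): package a folder as an OutputModel, then fetch and extract it elsewhere:
` from clearml import Task, OutputModel, Model

task = Task.init(project_name="examples", task_name="store folder")

# package an entire folder as a model (zipped and uploaded)
out = OutputModel(task=task, name="my-folder", tags=["folder-store"])
out.update_weights_package(weights_path="/path/to/folder")

# elsewhere: fetch the model and extract the folder
local_dir = Model(model_id="MODEL_ID").get_local_copy(extract_archive=True) `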
Can you share the StorageManager usage, and the error you are getting?
SmugOx94 Yes, we just introduced it 🙂 with 0.16.3
Discussion was here (I'll make sure to update the issue that the version is out)
https://github.com/allegroai/trains/issues/222
In your trains.conf add the following line:
sdk.development.store_code_diff_from_remote = true
It will store the diff from the remote HEAD instead of the local one.
PompousParrot44 the fundamental difference is that artifacts are uploaded manually (i.e. a user will specifically "ask" to upload an artifact), while models are logged automatically and a user might not want them uploaded (imagine debugging sessions, or testing).
By adding the 'upload_uri' argument, you can specify to trains that you want all models to be automatically uploaded (not just logged).
Now here is the nice thing, when running using the trains-agent, you can have:
Always upload the mod...
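For reference, a minimal sketch of turning on automatic model upload; in current clearml this is the output_uri argument of Task.init (the destination below is a placeholder):
` from clearml import Task

# all automatically logged models are also uploaded to this destination
task = Task.init(
    project_name="examples",
    task_name="auto upload models",
    output_uri="s3://my-bucket/models",
) `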
Hi ResponsiveCamel97
Let me explain how it works: essentially it creates a new venv inside the docker, inheriting all the packages from the main system packages.
This allows it to use the installed packages if the versions match, and upgrade/change them if you need, all without rebuilding a new container. Make sense?
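(If it helps, I believe the related setting in clearml.conf is agent.package_manager.system_site_packages: true, which controls whether the venv created inside the container inherits the system site-packages.)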
One last question: Is it possible to set the pip_version task-dependent?
no... but why would it matter on a Task basis? (meaning, what would be a use case for changing the pip version per Task?)
My bad, I wrote refresh and then edited it to the correct "reload" 🙂
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? Do they need their own environment?
What would be the easiest pipeline interface to run them locally? (I'd like us to support this workflow; it seems you are not alone in this approach, and of course you can always run them remotely, i.e. clone the pipeline and launch it on an agent)
Hi DilapidatedCow43
I'm assuming the returned object cannot be pickled (which is ClearML's way of serializing it)
You can upload it as a model with
` uploaded_model_url = Task.current_task().update_output_model(model_path="/path/to/local/model")
...
return uploaded_model_url `
wdyt?
Yes you can drag it in the UI :) it's a new feature in v1
Is this a common case? maybe we should change the run_pipeline_steps_locally argument to False?
(The idea of run_pipeline_steps_locally=True is that it will be easier to debug the entire pipeline on the same machine)
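A short hedged example of that debugging flow, assuming a PipelineController instance named pipe:
` # debug the whole pipeline (controller + steps) on this machine
pipe.start_locally(run_pipeline_steps_locally=True)

# production-style: controller runs locally, steps are enqueued to agents
# pipe.start_locally(run_pipeline_steps_locally=False) `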
Hi John, sort of. It seems that archiving pipelines does not also archive the tasks that they contain, so
This is correct, the rationale is that the components (i.e. Tasks) might be used (or already used) as cached steps ...
SoggyFrog26 there is a full pythonic interface, why don't you use this one instead, much cleaner 🙂
VexedCat68 I think this is the issue described here:
https://github.com/allegroai/clearml/issues/491
Can you test with the latest RC:
pip install clearml==1.1.5rc1
CheerfulGorilla72
yes, IP-based access,
hmm, so this is the main downside of using an IP-based server: the links (debug images, models, artifacts) store the full URL (e.g. http://IP:8081/...). This means if you switch IPs they will no longer work. Any chance to fix the new server to the old IP?
(the other option is somehow edit the DB with the links, I guess doable but quite risky)